Ben, their terms and conditions that I was referred to state:
Users may not, through a cron job, CGI script, interactive commands, or any other means, take the following actions on pair Networks servers: * Run any process that requires more than 16MB of memory space. * Run any program that requires more than 30 CPU seconds to complete.
They said they doubled our memory limit, so I assume we get killed whenever we exceed 32 MB.
The following templates are all rebuilt with the indexes:
Atom feed (atom.xml)
Bridgeplate feed (bridgeplate.rdf)
Dynamic Site Bootstrapper (mtview.php) [can this be unchecked?]
Full Post w/comments (comments.xml)
Main Index (index.html)
Master Archives (archives.html)
Mobile (mobile.html)
RSD (rsd.xml)
RSS 1.0 (index.rdf)
RSS 2.0 (index.xml)
As a test, you could try turning off the re-indexing and just temporarily remove the latest comments list on the front page. If that speeds everything up, the diagnosis would be confirmed.
If most of the time on a spam request to mt-comments is taken up by updating the page indices, then that's the problem. There's no need to update page indices if a comment was blacklisted. Otherwise I don't see how avoiding the reindexing would help that much.
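The point above (no index rebuild for a blacklisted comment) can be sketched as follows. This is only an illustration in Python, not Movable Type's actual code path: the `Comment` type, the `BLACKLIST` terms, and `handle_post` are all hypothetical names, and the rebuild step is passed in as a callback.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str


# Hypothetical blacklist terms; a real install would use its own spam rules.
BLACKLIST = ("casino", "viagra")


def is_blacklisted(comment: Comment) -> bool:
    """Return True if the comment body contains any blacklisted term."""
    body = comment.text.lower()
    return any(term in body for term in BLACKLIST)


def handle_post(comment: Comment, rebuild_indexes) -> bool:
    """Accept or reject a comment; rebuild indexes only for accepted ones."""
    if is_blacklisted(comment):
        # Reject early: the expensive rebuild is never started.
        return False
    rebuild_indexes()
    return True
```

Whether this helps depends on the diagnosis: it only saves anything if the rebuild, rather than the spam check itself, is the expensive part of a spam request.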
The recent comments sidebar on the archive templates has always been hosed. It reflected what comments were recent when the archive was created or some such.
I'm of course totally in the dark on this, but why are people thinking it's a spam problem at all? Obviously spam is bad independently of page slowdowns and errors, but why would it be the cause of those?
I had gotten the impression somewhere that the spambots were causing most of the requests to mt-comments, and that the resources consumed were about the same for each request.
that would return all comments after comment id yyy in a thread and just be added dynamically to the page. That would probably reduce comments-refreshing bandwidth (and probably server cpu) by 95+ percent.
But I'm not sure if it would help with unfogged's problem, since I'm not sure what's taking up all the time here.
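The incremental-refresh idea (send the browser only the comments newer than the last one it already has) might look roughly like this on the server side. It is a minimal sketch that assumes comments are dicts with an integer `id`; the function name `comments_after` is made up for illustration.

```python
def comments_after(comments, last_seen_id):
    """Return the comments whose id is greater than last_seen_id, in id order."""
    newer = [c for c in comments if c["id"] > last_seen_id]
    return sorted(newer, key=lambda c: c["id"])
```

The page would then append only the returned comments instead of reloading the whole thread. As noted above, this cuts refresh bandwidth but does nothing for whatever happens when a comment is posted.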
Do we really need all of our RSS feeds? I think we need a post-only feed, but we have three of those right now: Atom, RSS 1.0, and RSS 2.0. I think the Bridgeplate feed (comments only) and post+comments feeds are good options, too, but do we really need 3 flavors of post-only?
Now only the following are rebuilt with indexes:
Atom feed (atom.xml)
Bridgeplate feed (bridgeplate.rdf) Dynamic Site Bootstrapper (mtview.php)
Full Post w/comments (comments.xml)
Main Index (index.html) Master Archives (archives.html)
Mobile (mobile.html) RSD (rsd.xml) RSS 1.0 (index.rdf)
RSS 2.0 (index.xml)
Ah, ok, I get 32 now. Makes sense. I still think the real problem is the site's size, not the spam, but pursuing anti-spam measures is certainly a good idea.
The idea I proposed of turning off rebuilding on commenting was a stupid one, I now realize -- it'd break a bunch of other things. But perhaps the rebuild script could be taught to ignore all comments prior to a particular date or ID -- that might speed up the rebuild process.
The XmlHttpRequest thing is a pleasantly geeky idea, but would actually result in a lot more load on the site. The merit of the rebuilding system is that these calculations only have to be done once, then are cached on disk.
I was thinking that the new comments info could be written to disk in a file that acted like a ring buffer (uh, somehow—look at my hands wave!) and that would be how the comment sidebar was replaced, one way or another. Though, of course, then the comment-count would never get updated, etc.
Further evidence that the memory wall is hit when rebuilding indices is that the comment is actually posted, but no email is sent and the sidebar doesn't get rebuilt right away. However, the entry does get rebuilt. That leaves rebuilding the indices as the only candidate.
91: wordpress actually has similar problems of its own -- instead of running periodic tasks on a cron, it has an event loop that fires on a percentage of all requests. jumps in traffic can result in much more load than is actually necessary. I'm no WP expert, but a coworker has been having big trouble with his site as a result (and similar problems finding an ISP that will tolerate the load he introduces to their system).
You know, I just had an idea, which may be MADNESS, but I thought I would mention it.
In order to update the "Recent comments" sidebar the site has to rebuild the main page. But it is possible to make the sidebar not an integral part of the main page, but an extra blog that publishes into a file that is then included into the main page. That's how my sidebar works. (Like this.)
Would it be any use to shunt the sidebar and "Recent Comments" into an extra blog like that, so that the comments process would involve making extra entries in that blog rather than doing stuff to the sidebar? Thinking it over, I suspect not, because after making entries to the new blog you have to rebuild the main page anyway, but I thought I'd mention it.
Another thought: Would it help in any way to drop the "Recent Comments" from archive pages? Those are always messed up anyway.
I realize that Becks is celebrating transferring the reading group archives, so feel free to put this in a little envelope marked "do not open till Xmas," or to ignore it entirely.
As a Wordpress user, I have gotten tremendously satisfying results from Akismet. They just released a version for MT, as well.
Posted by Robust McManlyPants | Link to this comment | 04-18-06 11:29 AM
I am for anything that would reduce the demand/load on mt-comments.cgi to help keep the server load down. (I know - broken record!) Would this accomplish this or not because the spam messages would still hit mt-comments.cgi and just not get published?
Posted by Becks | Link to this comment | 04-18-06 11:31 AM
WordPress seems to have a Bayesian spam filter, but I'm guessing you guys aren't about to change blog technologies.
Here's one for MT, but it doesn't seem to recommend itself highly.
CAPTCHA+whitelist sounds more hackable than, say, requiring registration and login, which is basically the same thing. You want to authenticate trusted commenters, right?
Posted by mrh | Link to this comment | 04-18-06 11:32 AM
Is the (or, is a) problem that MT uses the same script for posting and viewing comments?
Posted by mrh | Link to this comment | 04-18-06 11:33 AM
I should note that the Akismet peeps are pretty circumspect about identifying how everything works, but it certainly sounds like it's Bayesian filtering.
Posted by Robust McManlyPants | Link to this comment | 04-18-06 11:33 AM
CAPTCHA+whitelist sounds more hackable than, say, requiring registration and login
Registration + login places a greater burden on new commenters, though. It took me forever to finally leave a comment at Tim Burke's blog, because of the registration process. That probably works to Burke's favor, since he is a Serious Fellow, but we here at unfogged should be more accommodating to passers-through. And while a simple-minded CAPTCHA is more hackable, the relevant question is, would it be hacked? I don't think comment spammers take their cues from what we're up to.
Is the (or, is a) problem that MT uses the same script for posting and viewing comments?
If it weren't for that, we could rename the posting-comments script.
Posted by ben wolfson | Link to this comment | 04-18-06 11:42 AM
Oh, it's not the CAPTCHA that I'm worried about being hackable... it's the checkbox and cookie. But I'm sure there's more to your scheme that I'm not understanding yet.
And yeah, I get that renaming the mt-comments script won't fly because it would break links; what I guess I meant was, do you think the load problem is because spammers are beating on mt-comments.cgi? Or because all of us clowns keep reloading it?
Posted by mrh | Link to this comment | 04-18-06 11:47 AM
Related, in case anyone reading this can help: the reason we're getting the Internal Server Errors is that mt-comments.cgi is a memory hog and occasionally hits the limit that causes it to get auto-killed by our new host. I suppose this is preferable to what happened with our old host, which is that they would let it spin out of control until things got so bad that they locked out our site. They doubled our memory limit, which is why the ISEs are a little less frequent now, but I just spoke with them and there's no way for us to upgrade to another shared hosting plan with a higher memory limit. All of their shared plans (including their high-volume ones) have the same limit that we currently have (actually, a lower one since they raised ours) and the only way to get more memory is a dedicated server plan, which is hella expensive.
So, any tips/tweaks that people know that can help us reduce the memory used by mt-comments.cgi would be much appreciated.
Posted by Becks | Link to this comment | 04-18-06 11:53 AM
Oh, it's not the CAPTCHA that I'm worried about being hackable... it's the checkbox and cookie.
Yeah, the whole thing. (Have you seen the Fistful of Euros CAPTCHA? It's … not that complex. But apparently works fine.) Do comment spammers send along cookies?
Becks, can you share what the current memory limit is?
Posted by ben wolfson | Link to this comment | 04-18-06 11:56 AM
Could I speak up for a new kind of captcha? I don't mind them. But the random-series-of-characters ones just don't do anything for me spiritually. Why not a graphic that requests the user to type in what it is a picture of? Or alternately a photo of a cock that asks the user to type in its state of tumescence. That would be more fun than SXXILV.
Posted by The Captcha Kid | Link to this comment | 04-18-06 12:00 PM
Can I just say that I'm loving The [Choose-Your-Own Adventure] Kid's metacommentary on his comments?
Posted by Cala | Link to this comment | 04-18-06 12:01 PM
Or even just a line drawing, with instructions "Please identify this cock as flaccid or engorged." I don't think spam bots could handle that.
Posted by The Captcha Kid | Link to this comment | 04-18-06 12:01 PM
Again, are we worried about breaking intra-site links or links to Unfogged from other sites? Because changing the intra-site links to point to a renamed mt-comments.cgi would be a pretty easy case of exporting the data from the database, running some data conversion commands, and reimporting it. I just ran a similar script last night to fix links to cached articles...and none of you were the wiser.
Posted by Becks | Link to this comment | 04-18-06 12:01 PM
Hey thanks for the props, Cala!
Posted by The Grateful Kid | Link to this comment | 04-18-06 12:02 PM
I am the wiser!
Posted by The Mimicking-Apostropher Kid | Link to this comment | 04-18-06 12:03 PM
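Ben, their terms and conditions that I was referred to state:
Users may not, through a cron job, CGI script, interactive commands, or any other means, take the following actions on pair Networks servers:
* Run any process that requires more than 16MB of memory space.
* Run any program that requires more than 30 CPU seconds to complete.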
They said they doubled our memory limit, so I assume we get killed whenever we exceed 32 MB.
Posted by Becks | Link to this comment | 04-18-06 12:04 PM
What are our bandwidth/space uses?
Posted by ben wolfson | Link to this comment | 04-18-06 12:11 PM
Hey thanks for the props, Cala!
Yes, well, I give you the opposite of props. I give you: SPORP. So you have zero net approbations.
Posted by Standpipe Bridgeplate | Link to this comment | 04-18-06 12:15 PM
We need at least 2 GB of space and 40 GB bandwidth.
Posted by Becks | Link to this comment | 04-18-06 12:16 PM
If we knew what kind of CPU resources we needed, and what the hell unixshell means by a "unit", and were willing to move hosts, and to move to a host where we'd have to install and configure everything ourselves, we could use the unixshell 160 or 192 plans. That would involve some low-level mucking, of course.
Posted by ben wolfson | Link to this comment | 04-18-06 12:21 PM
Who is this "A" who is planning for spam?
Posted by Jackmormon | Link to this comment | 04-18-06 12:23 PM
Firefox remembers our captcha for me. I'm grateful for that feature, b/c MT's 'remember info?' cookie hardly ever works for me nowadays.
Posted by David Weman | Link to this comment | 04-18-06 12:26 PM
A representative of the aesthetic stage of life.
Posted by ben wolfson | Link to this comment | 04-18-06 12:26 PM
Who is this "A" who is planning for spam?
Surely you of all people have read Perec?
Posted by The OuLiPo Kid | Link to this comment | 04-18-06 12:27 PM
I must say that unixshell's "we give you no support whatsoever" disclaimer gives me pause.
Posted by Becks | Link to this comment | 04-18-06 12:30 PM
That's par for the course for that kind of virtual-server setup, from what I've been able to tell; after all, you're installing your own OS+programs, so their ability to support you is somewhat hampered by your ability to do whatever the hell you like.
Posted by ben wolfson | Link to this comment | 04-18-06 12:36 PM
Why is submitting a comment using 32M of memory?
Posted by mrh | Link to this comment | 04-18-06 12:47 PM
27 - That's what I want to know.
26 - The idea of moving servers again, I suppose I could handle. But the idea of redoing everything we've done in the last week PLUS installing everything makes me want to curl up in the corner and cry. How bare is this server we're talking about? We're not just talking "install Movable Type" -- we'd even have to do crap like configure sendmail, right?
Posted by Becks | Link to this comment | 04-18-06 12:54 PM
The checkbox thing is not a great idea, I don't think -- spammers can use cookies (I don't know if they would, but it's possible). I'd suggest making commenters periodically get a cookie with a captcha on the main page (in the sidebar, maybe? duplicate it in comments?).
This could potentially help a *lot* for reducing mt-comments load: you can filter specific requests (e.g. POST to mt-comments.cgi) within .htaccess based on the presence of a cookie (I don't know how, but I believe the ancient legends our server guy tells us -- I'm sure the unfogged technical hivemind could figure it out). That would pretty effectively prevent spammers from introducing a load to the system.
(atm)
Posted by tom | Link to this comment | 04-18-06 12:59 PM
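The decision logic behind the cookie filter suggested above can be sketched like this. It is Python, not the .htaccess rule the comment has in mind, and the cookie name `passed_captcha` is a hypothetical placeholder. It checks only that the cookie is present, so, as the comment itself points out, a spammer who sends cookies could still get through.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name, set after a visitor passes the CAPTCHA.
COOKIE_NAME = "passed_captcha"


def should_reject(method: str, path: str, cookie_header: str) -> bool:
    """Return True for a POST to mt-comments.cgi that lacks the cookie."""
    if method.upper() != "POST" or not path.endswith("mt-comments.cgi"):
        return False  # reads and other scripts are left alone
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    return COOKIE_NAME not in cookies
```

The load benefit comes from running a check like this in the web server, before the CGI script is ever started; doing it inside mt-comments.cgi would still pay the startup cost on every spam request.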
Becks: we'd have to do crap like install sendmail. And, for that matter, choose a distro and install that.
Posted by ben w | Link to this comment | 04-18-06 1:08 PM
I was going to say, how are we sure that the memory problem happens on the POST rather than on comment reads, but then I remembered that we never get the 500s on GETs, just on POSTs.
I don't have an MT installation handy where I can look at the mt-comments source, but I kind of want to check to see what that beast is doing.
Posted by mrh | Link to this comment | 04-18-06 1:11 PM
Wait, couldn't you split the posting and reading functions into two different scripts? Make mt-comments.cgi just be responsible for reading comments, thereby making sure all old links work, but have it just exit immediately on a POST request.
Then you can make an mt-post-comment.cgi, or whatever, that does all kinds of crazy CAPTCHA stuff, or whatever, and redirects to mt-comments.cgi when it's done. Eh?
Posted by mrh | Link to this comment | 04-18-06 1:13 PM
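A rough sketch of the split proposed above, with each script reduced to a Python function that returns a status code, headers, and a body. The function names and the `captcha_ok` flag are assumptions made for illustration; nothing here is Movable Type's own code.

```python
def read_only_script(method: str):
    """Stand-in for the well-known mt-comments.cgi: serves reads, refuses posts."""
    if method.upper() == "POST":
        # Exit immediately, before any expensive work is done.
        return 405, {}, "Posting is not accepted at this address."
    return 200, {}, "comment listing goes here"


def posting_script(method: str, captcha_ok: bool):
    """Stand-in for the renamed posting script (e.g. mt-post-comment.cgi)."""
    if method.upper() != "POST":
        return 405, {}, "This address only accepts posts."
    if not captcha_ok:
        return 403, {}, "CAPTCHA check failed."
    # A real script would save the comment here, then send the
    # browser back to the read-only script.
    return 303, {"Location": "mt-comments.cgi"}, ""
```

Old links keep working because the read path stays at the old name, while spambots that POST straight to mt-comments.cgi get a cheap refusal.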
30 - Ben, I don't see myself having the time to do anything like that in the next month. I'd be pushing it just to find the time to do another data move.
Posted by Becks | Link to this comment | 04-18-06 1:15 PM
Also, this seems to be moving in the complete opposite direction of our (well, my) "I don't have time to do a lot of site maintenance so let's find an easy, low-maintenance hosting solution".
Before we bog down in the details of unixshell, I think we need to figure out (1) do we really need to change hosts and (2) if so, isn't there someone who offers more memory but not a bare-bones setup?
Posted by Becks | Link to this comment | 04-18-06 1:21 PM
32 is awesome. That is totally what you should do. Excellent idea, Mr. H.
Posted by The Agreeing Kid | Link to this comment | 04-18-06 1:23 PM
I am so rivetingly ignorant about all this stuff, but are there any huge-comment-volume blogs out there whose brains we could pick? I mean, we get a lot of comments, but there are plenty of blogs who get more -- do they all have these problems?
And other than that, I am totally in favor of throwing money rather than time and expertise at this.
Posted by LizardBreath | Link to this comment | 04-18-06 1:30 PM
Changing hosts should be an option of last resort. 32MB is WAY too much memory for this process to consume. Our server guy considers an Apache thread over 16MB to be abnormal, and we're serving sites that are considerably heavier than Unfogged. It *must* be reindexing old comments, which is a totally stupid thing for it to do. There must be a way to stop it.
My suggestion would be to post a frustrated blog entry that expresses your infinite, cosmic disappointment with Mo/vab/le Ty/pe, the platform you love so well, and lament that you can no longer recommend it to the many well-heeled folks who come to you asking for blog software advice. Ja/y Al/len of Si/xA/part is pretty good about trolling technorati for MT mentions; odds of him swooping in and hooking you up with someone who can diagnose/retrofit mt-comments.cgi are pretty high, I'd say.
Posted by tom | Link to this comment | 04-18-06 1:32 PM
might be worth noting that refreshing the comments takes a long time for me, too. maybe you should consider junking the dropdowns and seeing if entry archive-based comments perform better?
Posted by tom | Link to this comment | 04-18-06 1:41 PM
Sorry, meant popups, not dropdowns (I'm working on some dropdowns atm)
Posted by tom | Link to this comment | 04-18-06 1:41 PM
I know very little about this as well, but I have noticed that the recent comments sidebar looks different from different archive pages. So that the "recent" comments on say, a page from 2004 are actually comments from a few hours or more before whatever time it is you're looking at them. Does that have anything to do with the reindexing?
(Apologies if you've already noticed this, and if it's irrelevant.)
Posted by eb | Link to this comment | 04-18-06 2:17 PM
To Tom's third sentence in 37, I say that the last thing the post() function of Comments.pm does before returning is this:
MT::Util::start_background_task(sub {
    $app->rebuild_indexes( Blog => $blog )
        or return $app->errtrans("Rebuild failed: [_1]",
                                 $app->errstr);
    $app->_send_comment_notification($comment, $comment_link,
                                     $entry, $blog, $commenter);
    _expire_sessions($cfg->CommentSessionTimeout)
});
It also rebuilds the entry synchronously.
But isn't that necessary for the comment to be reflected on the (static) index page (and also the archive pages, though their new comments sidebar always lags behind, for some reason)? If you think I'm going to examine what rebuild_indexes does, when I should be preparing for a presentation on Kant, well, you're wrong.
Posted by ben wolfson | Link to this comment | 04-18-06 2:21 PM
People! Where is the love for 32? It would contribute mightily to the anti-spam battle, and it is way easy!
Posted by The Modesto Kid | Link to this comment | 04-18-06 2:24 PM
I already thought of the idea proposed in 32, but didn't post about it. Consequently I have no LOVE for it, only (as is appropriate for the day) HATRED for you for stealing my THUNDARR!
Posted by ben wolfson | Link to this comment | 04-18-06 2:26 PM
Alright... well, is "built w/ indexes" turned off for every index template that doesn't need to be updated whenever a new comment is posted? I assume it is, but want to be sure.
If the rebuild line can be isolated as the cause of the problem, it can probably be written around. It shouldn't be too hard to write some CGI updating a file (to be included by PHP in the index template) to reflect the recent comments -- this would likely be a lot more efficient than whatever involved process MT goes through to rebuild its index templates. I recently switched to doing this for my own archive section, and I like it a lot better as a setup. It appears to me that MT can be made to run a lot more efficiently if you ditch its tag/template system in favor of PHP when appropriate.
Posted by tom | Link to this comment | 04-18-06 2:30 PM
The Geens turing test is very effective, but I don't know if it will help w bandwidth. Most spambots go straight for mt-comments.cgi, bypassing the comments box.
I do think you should try out various easily implemented things like dropping popups or turing tests before you start thinking about stuff that involves a lot of work.
Posted by David Weman | Link to this comment | 04-18-06 2:32 PM
I'm not sure I get why 32 is great, so I suspect I'm misunderstanding what it would do. How would the captcha interfere with reading old comments? And is that what 32 proposes to solve?
Posted by tom | Link to this comment | 04-18-06 2:34 PM
Are you running your MT install on mod_perl? If not, would that help? Would it be possible?
Posted by pdf23ds | Link to this comment | 04-18-06 2:35 PM
Since (apparently) comment-spammers look for scripts called "mt-comments.cgi", if we renamed our script to something else, like mt-fuckyou.cgi, then they wouldn't find it in as great volume. But then all the links to the comments would be broken. So the proposal was to keep the old script name, in read-only fashion.
Posted by ben wolfson | Link to this comment | 04-18-06 2:40 PM
The more I think about it, the stupider reevaluating the main page template every time there's a new comment seems.
Posted by ben wolfson | Link to this comment | 04-18-06 2:41 PM
32 is awesome because TMK said so, and TMK is awesome. (Also, Ben: I will be retaining custody of your thunder until such time as you admit that you're just frontin'.)
(I should also note that I spent a few minutes paralyzed by the surplus of thunder/thundar/thundercats jokes I could have made.)
My idea was this: if the problem is that spammers are overloading mt-comments.cgi by posting spam comments, then we can stop that by renaming mt-comments.cgi. However, that breaks links to old comments. So my suggestion was to separate the two functions of mt-comments (posting and viewing) into two separate scripts, one of which is well-known and useless to spammers, and one of which is cleverly named and may or may not use CAPTCHAs, as the bloggers decide.
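A toy Python sketch of that split, just to make the routing concrete. Nothing here is real MT behavior: `post-comment.cgi` is a placeholder for whatever the cleverly named script would be called, and the status codes are only one plausible choice.

```python
def route(script_name, method, post_script="post-comment.cgi"):
    """Return a (status, action) pair for a request; all names are hypothetical."""
    if script_name == "mt-comments.cgi":
        # Well-known legacy name: viewing only, so old links keep working.
        if method == "GET":
            return 200, "view"
        return 405, "rejected"  # spam POSTs get turned away cheaply
    if script_name == post_script and method == "POST":
        return 200, "post"
    return 404, "not found"
```

The point of the design is that a POST to the old name is rejected before any comment handling or rebuilding happens.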
Posted by mrh | Link to this comment | 04-18-06 2:43 PM
The following templates are all rebuilt with the indexes:
Atom feed (atom.xml)
Bridgeplate feed (bridgeplate.rdf)
Dynamic Site Bootstrapper (mtview.php) [can this be unchecked?]
Full Post w/comments (comments.xml)
Main Index (index.html)
Master Archives (archives.html)
Mobile (mobile.html)
RSD (rsd.xml)
RSS 1.0 (index.rdf)
RSS 2.0 (index.xml)
Posted by Becks | Link to this comment | 04-18-06 2:44 PM
As a test, you could try turning off the re-indexing and just temporarily remove the latest comments list on the front page. If that speeds everything up, the diagnosis would be confirmed.
Posted by mrh | Link to this comment | 04-18-06 2:45 PM
If most of the time on a spam request to mt-comments is taken up by updating the page indices, then that's the problem. There's no need to update page indices if a comment was blacklisted. Otherwise I don't see how avoiding the reindexing would help that much.
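In sketch form (Python, with made-up callables standing in for the blacklist check, the save step, and the index rebuild; this is not how Comments.pm is actually structured), the ordering I mean is:

```python
def handle_comment(comment, is_blacklisted, save, rebuild_indexes):
    """Only pay for the rebuild when the comment is actually accepted."""
    if is_blacklisted(comment):
        return False  # rejected: nothing saved, nothing rebuilt
    save(comment)
    rebuild_indexes()
    return True
```

Whether this helps depends on where the blacklist check already sits relative to the rebuild, which I haven't verified.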
Posted by pdf23ds | Link to this comment | 04-18-06 2:45 PM
The recent comments sidebar on the archive templates has always been hosed. It reflected what comments were recent when the archive was created or some such.
Posted by Becks | Link to this comment | 04-18-06 2:47 PM
I also like Tom's suggestions in 38/39.
Posted by mrh | Link to this comment | 04-18-06 2:47 PM
I'm of course totally in the dark on this, but why are people thinking it's a spam problem at all? Obviously spam is bad independently of page slowdowns and errors, but why would it be the cause of those?
Posted by washerdreyer | Link to this comment | 04-18-06 2:48 PM
I agree with w/d. It could simply be caused by our own commenting practices.
Posted by ben wolfson | Link to this comment | 04-18-06 2:51 PM
btw, the title is supposed to be a reference to the title of Graham's initial essay, "A Plan for Spam". Except I had two plans, see?
Posted by ben wolfson | Link to this comment | 04-18-06 2:52 PM
I had gotten the impression somewhere that the spambots were causing most of the requests to mt-comments, and that the resources consumed were about the same for each request.
Posted by pdf23ds | Link to this comment | 04-18-06 2:53 PM
Is it just me or does the site seem kinda snappier now?
Posted by Becks | Link to this comment | 04-18-06 2:53 PM
Testing
Posted by Becks | Link to this comment | 04-18-06 2:54 PM
Not fast but a wee bit quicker? I unchecked four files from "build with indexes". (There used to be even more being built than I had listed in 51.)
Posted by Becks | Link to this comment | 04-18-06 2:55 PM
Are there any more in 51 people think I can uncheck?
Posted by Becks | Link to this comment | 04-18-06 2:58 PM
Now, I had this really good idea the other day. You could use an XmlHttpRequest in the javascript in the comments page to a page like this:
unfogged.com/new-comments.php?postid=xxx&lastcommentid=yyy
that would return all comments after comment id yyy in a thread and just be added dynamically to the page. That would probably reduce comments-refreshing bandwidth (and probably server cpu) by 95+ percent.
But I'm not sure if it would help with unfogged's problem, since I'm not sure what's taking up all the time here.
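The server side of that could be very small. Here is a hedged Python sketch of what the new-comments page might compute; the comment fields (`id`, `post_id`, `text`) and the JSON response format are assumptions for the example, not anything the site has.

```python
import json


def new_comments(comments, post_id, last_comment_id):
    """Return, as JSON, the comments on `post_id` newer than `last_comment_id`."""
    fresh = [
        c for c in comments
        if c["post_id"] == post_id and c["id"] > last_comment_id
    ]
    fresh.sort(key=lambda c: c["id"])  # oldest first, ready to append to the page
    return json.dumps(fresh)
```

The script on the page would remember the highest id it has seen and pass that as lastcommentid on the next request, so each poll returns only what is new.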
Posted by pdf23ds | Link to this comment | 04-18-06 2:58 PM
You could do something like that to replace the recent-comments sidebar, too, as long as you had an efficient way of getting those.
Except that wouldn't help with updating the comment-count reflected for each post.
Posted by ben wolfson | Link to this comment | 04-18-06 3:01 PM
Do we really need all of our RSS feeds? I think we need a post-only feed, but we have three of those right now: Atom, RSS 1.0, and RSS 2.0. I think the Bridgeplate feed (comments only) and post+comments feeds are good options, too, but do we really need 3 flavors of post-only?
Also, is Master Archives even used?
Posted by Becks | Link to this comment | 04-18-06 3:08 PM
There's one good way to find out!
Posted by ben wolfson | Link to this comment | 04-18-06 3:09 PM
Testing
Posted by Becks | Link to this comment | 04-18-06 3:12 PM
Also, is Master Archives even used?
I'd be fairly surprised if it wasn't.
Do you rebuild your indexes that often anyway?
Posted by David Weman | Link to this comment | 04-18-06 3:13 PM
Now only the following are rebuilt with indexes:
Atom feed (atom.xml)
Bridgeplate feed (bridgeplate.rdf)
Dynamic Site Bootstrapper (mtview.php)
Full Post w/comments (comments.xml)
Main Index (index.html)
Master Archives (archives.html)
Mobile (mobile.html)
RSD (rsd.xml)
RSS 1.0 (index.rdf)
RSS 2.0 (index.xml)
Posted by Becks | Link to this comment | 04-18-06 3:13 PM
Do you rebuild your indexes that often anyway?
Apparently, yes. See 41.
Posted by Becks | Link to this comment | 04-18-06 3:14 PM
Well, things are feeling snappier now for me.
Posted by pdf23ds | Link to this comment | 04-18-06 3:21 PM
Testing.
Posted by Becks | Link to this comment | 04-18-06 3:23 PM
I just got an internal server error.
Posted by Becks | Link to this comment | 04-18-06 3:23 PM
And another.
Posted by Becks | Link to this comment | 04-18-06 3:24 PM
Oh!
Nothing but index.php needs to be checked, I don't think.
Wouldn't it be simpler to change comments.pm, though?
Posted by David Weman | Link to this comment | 04-18-06 3:24 PM
Index.html, that is.
Posted by David Weman | Link to this comment | 04-18-06 3:25 PM
testing
Posted by Becks | Link to this comment | 04-18-06 3:39 PM
Ah, ok, I get 32 now. Makes sense. I still think the real problem is the site's size, not the spam, but pursuing anti-spam measures is certainly a good idea.
The idea I proposed of turning off rebuilding on commenting was a stupid one, I now realize -- it'd break a bunch of other things. But perhaps the rebuild script could be taught to ignore all comments prior to a particular date or ID -- that might speed up the rebuild process.
The XmlHttpRequest thing is a pleasantly geeky idea, but would actually result in a lot more load on the site. The merit of the rebuilding system is that these calculations only have to be done once, then are cached on disk.
Posted by tom | Link to this comment | 04-18-06 3:45 PM
I was thinking that the new comments info could be written to disk in a file that acted like a ring buffer (uh, somehow—look at my hands wave!) and that would be how the comment sidebar was replaced, one way or another. Though, of course, then the comment-count would never get updated, etc.
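To wave my hands slightly less, here is a minimal Python sketch of the ring-buffer file: keep only the last N comments in a small JSON file and drop the oldest on each append. The file format, the function name, and the default size are all invented for the example.

```python
import json
import os
from collections import deque


def append_recent(path, comment, size=10):
    """Append `comment` to the file at `path`, keeping only the newest `size`."""
    recent = deque(maxlen=size)  # deque discards the oldest item when full
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            recent.extend(json.load(handle))
    recent.append(comment)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(list(recent), handle)
    return list(recent)
```

This reads and rewrites the whole file on each comment, which is fine for a sidebar-sized list; it does nothing about the comment counts, as noted.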
Posted by ben wolfson | Link to this comment | 04-18-06 3:51 PM
Testing.
Posted by Armsmasher | Link to this comment | 04-18-06 4:06 PM
(I just want to feel helpful)
Posted by Armsmasher | Link to this comment | 04-18-06 4:07 PM
Armshasher gets the Spirit Award.
Posted by apostropher | Link to this comment | 04-18-06 4:07 PM
And they spelled your name wrong on the trophy. Those bastards.
Posted by apostropher | Link to this comment | 04-18-06 4:09 PM
Testing.
Posted by Becks | Link to this comment | 04-18-06 5:07 PM
Testing
Posted by Becks | Link to this comment | 04-18-06 5:09 PM
Testing
Posted by Becks | Link to this comment | 04-18-06 5:11 PM
More testing
Posted by Becks | Link to this comment | 04-18-06 5:12 PM
And another. I know, this is entertaining stuff.
Posted by Becks | Link to this comment | 04-18-06 5:12 PM
I just tried turning off pop-ups and it didn't seem to have an effect.
Posted by Becks | Link to this comment | 04-18-06 5:15 PM
Uhh...so is this all a function of MT rebuilding pages every time someone comments? And isn't that why Henley, among others, moved to WordPress?
NB: I wouldn't swear I know what any of the words in the above mean.
Posted by SomeCallMeTim | Link to this comment | 04-18-06 5:16 PM
Further evidence that the memory wall is hit when rebuilding indices is that the comment is actually posted, but no email is sent and the sidebar doesn't get rebuilt right away. However, the entry does get rebuilt. That leaves rebuilding the indices as the only candidate.
Posted by ben wolfson | Link to this comment | 04-18-06 6:36 PM
91: wordpress actually has similar problems of its own -- instead of running periodic tasks on a cron, it has an event loop that fires on a percentage of all requests. jumps in traffic can result in much more load than is actually necessary. I'm no WP expert, but a coworker has been having big trouble with his site as a result (and similar problems finding an ISP that will tolerate the load he introduces to their system).
Posted by tom | Link to this comment | 04-18-06 7:15 PM
test
Posted by Anonymous | Link to this comment | 04-18-06 7:39 PM
test
Posted by Anonymous | Link to this comment | 04-18-06 7:40 PM
final test
Posted by Anonymous | Link to this comment | 04-18-06 7:41 PM
Yeah, WP isn't perfect, but it has some decent caching options.
Posted by mrh | Link to this comment | 04-18-06 7:55 PM
testing
Posted by Becks | Link to this comment | 04-19-06 6:22 AM
Testy today, are we?
Posted by apostropher | Link to this comment | 04-19-06 7:25 AM
One hundred! Test! More testing!
Posted by washerdreyer | Link to this comment | 04-19-06 9:51 AM
Wow, it is awesome and meta to get comments spam on a thread of this title.
Posted by The Modesto Kid | Link to this comment | 04-23-06 5:11 AM
You know, I just had an idea, which may be MADNESS, but I thought I would mention it.
In order to update the "Recent comments" sidebar the site has to rebuild the main page. But it is possible to make the sidebar not an integral part of the main page, but an extra blog that publishes into a file that is then included into the main page. That's how my sidebar works. (Like this.)
Would it be any use to shunt the sidebar and "Recent Comments" into an extra blog like that, so that the comments process would involve making extra entries in that blog rather than doing stuff to the sidebar? Thinking it over, I suspect not, because after making entries to the new blog you have to rebuild the main page anyway, but I thought I'd mention it.
Another thought: Would it help in any way to drop the "Recent Comments" from archive pages? Those are always messed up anyway.
I realize that Becks is celebrating transferring the reading group archives, so feel free to put this in a little envelope marked "do not open till Xmas," or to ignore it entirely.
Posted by Matt Weiner | Link to this comment | 04-23-06 8:34 AM