WP-Gatekeeper
Published 18 years, 4 months past
In my post on rel="nofollow", I mentioned the use of easily human-comprehensible challenge questions like “What is Eric’s first name?” as a way to defeat spambots. There were two points made in the comments that I had considered but hadn’t brought up, given that they were tangential to the point of the post. They were:
- Spammers could set up a database of questions and answers used on sites. They might or might not share it with each other, but the point is that if I set up “What is Eric’s first name?” as the sole challenge, the human running the spambot could build the ability to answer the question into the spambot, thus defeating it. Quite true.
- In order to make it more difficult to do this, there could be a set of challenges from which one is picked randomly. So I might have three challenges asking for the first names of myself, Kat, and Carolyn. Every time a comment form is delivered to a browser, one of the three challenges, picked at random, is included. This would make it more difficult for a human spammer, since he (or she) would have to find all of the challenge questions, work out the responses, and build them all into a database, keyed to each site’s domain.
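Here is a minimal PHP sketch of the random-challenge idea, offered as an illustration rather than WP-Gatekeeper’s actual code. The `$challenges` array and the `pick_challenge()` and `check_response()` helpers are hypothetical, and it assumes the form passes back which challenge was asked (say, in a hidden field) and that answers are compared case-insensitively.

```php
<?php
// Hypothetical challenge list; a real install would load these from the database.
$challenges = [
    ['question' => "What is Eric's first name?",    'answer' => 'Eric'],
    ['question' => "What is Kat's first name?",     'answer' => 'Kat'],
    ['question' => "What is Carolyn's first name?", 'answer' => 'Carolyn'],
];

// Pick one challenge at random each time the comment form is delivered.
function pick_challenge(array $challenges): int
{
    return random_int(0, count($challenges) - 1);
}

// Compare the submitted response with the stored answer,
// ignoring case and surrounding whitespace.
function check_response(array $challenges, int $index, string $response): bool
{
    if (!isset($challenges[$index])) {
        return false; // unknown or tampered-with challenge index
    }
    return strcasecmp(trim($response), $challenges[$index]['answer']) === 0;
}

$index = pick_challenge($challenges);
echo $challenges[$index]['question'], PHP_EOL;
```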
So over the weekend I built a WordPress package to do what I described in the second point above, as a proof of concept (and also as an exercise in learning more about how PHP, MySQL, and WordPress work). It’s called WP-Gatekeeper, available from my WordPress Tools page, and if you’re brave you can give it a try. Why brave? Because the installation involves hacking a few WP files and adding a new entry to the admin menu, not to mention firing up a plugin. And if you do it in the wrong order, you can break commenting for a short period. There are DIY installation instructions on the WP-Gatekeeper page, for those who still want to proceed. You also need to be brave because if you install it, you’re running code written (well, actually, adapted) by someone with only beginner-to-intermediate PHP skills. I’ve been testing it locally and everything seems fine, but this is even more “use at your own risk” software than usual. Got it? Good.
Accordingly, WP-Gatekeeper is currently considered beta software. I’m making it available now in the hopes that people more experienced than I with PHP and WordPress can take a look, hack on the code, and make it more efficient and the whole package easier to install. I’m already aware that in WP 1.5, adding the admin page is much easier and doesn’t require hacking files, but I wrote WP-Gatekeeper in 1.2 and want it to work there, since that’s the latest public version. Thus, any optimizations should work in 1.2. When 1.5 (or whatever the next version number is) comes out, then I’ll worry about it.
Of course, there’s still nothing that prevents a spammer from registering questions and answers into a database, but the admin page makes it easy for a blogger to add, remove, modify, and re-key the challenges. That will make tracking them more difficult, so long as a blogger puts effort into maintaining the list of challenges. It gets back, in the end, to maintaining your blog. The more maintenance you put into something, the better its shape will stay.
Got feedback? Let’s hear it.
Weblog Tools Collection » Eric’s Archived Thoughts: WP-Gatekeeper
[…] ived Thoughts: WP-Gatekeeper Categories – WordPress Hack LinkyLoo — Mark Eric Meyer: WP-Gatekeeper: A more accessible CAPTCHA solution to prevent comment Spam […]
$foo.bar » Blog Archive » Awaiting the Onslaught
[…] Meyer seems to have come up with an entirely accessible Turing Test for blog comments, in WP Gatekeeper. […]
Down the Pipe » Blog Protection for WordPress
[…] pammers out of your WordPress blog. It’s called WP-Gatekeeper. Filed under: Development — […]
$foo.bar » Blog Archive » They’re Here
[…] e minimal state is inspiring as well as right. I tried to install WP Gatekeeper, which I have mentioned before, but it had some pretty severe problems […]
Eric, why not just do what other companies do: a randomly generated security code, applied to an image, with an input box that must match, as GoDaddy does?
This could be done even more easily, since no admin support would be needed, right?
There are numerous OOP PHP scripts available at PHPclasses.org that use CAPTCHA images to validate for human users. Look under Security. I can’t say which is best. I’ve downloaded a bunch, but have yet to work with one.
Because CAPTCHA images aren’t accessible. If you can’t see an image, you can’t pass the test — even though you might well be human…
No image CAPTCHAs. Too many accessibility problems, and too easy to defeat. The hope is that WP-Gatekeeper will be a little tougher to defeat while remaining fully accessible. But if not, then people should explain why rather than replying to “how can this be improved?” with “do something completely different instead”. That isn’t answering the question, it’s dismissing the question and the effort behind it. If I wanted that kind of treatment, I could go post to any one of a number of W3C mailing lists.
Sorry, Eric, I have a one-track mind, so I apologize to all for that debacle. But I do have suggestions. One way to make it harder for bots and easier for admins would be to hook in several automatic questions that would randomly display along with the admin-entered ones, e.g.:
Fill in the blank:
Today’s date is January __, 2005
and simple math questions.
Now, this could be an accessibility issue if a math-challenged person arrives, but overall it would work. You could easily create hundreds of these; together with the admin-entered ones and text box name changing, I believe that would be nearly impossible to defeat.
This looks like an interesting idea. I have pondered something similar myself in the past, although my PHP skills are probably even poorer than yours, and I haven’t had the time.
Michael Duff’s comment about changing the text box name sounds like a (relatively) easy addition that would make writing a script/bot more of a challenge.
I’ve been thinking about a way to solve this for a while, and plan to use it if and when my site starts getting comment spam. It seems to me that as long as your pages are going to be dynamic, why not have variable names for your comment form’s inputs? As far as automatic spamming goes, it seems that the database also has to include the correct POST variables in order to add the comment. If a POST does not have the correct variables, then you just direct whoever sent the request to a page letting them know that they are trying to post to an out-of-date version (in case they’re using a cached page). You could just use a database to keep track of which variables the pages should use. I know that it is another thing that a human could easily defeat, but as long as the algorithm isn’t predictable, it should at least make it so they have to take more time. Finally (sorry, this is getting long), you could also change the structure of the input tag so that regular expression searches are harder.
This kind of solution is great for defeating human spammers (or at least severely slow them down). Instead of using questions that can be answered just by looking at the question, make a simple ‘reading comprehension’ question for each post. This means a person would at least have had to skim read the post before commenting. For example for this post, the question would be ‘What is the name of the WordPress plugin I wrote?’.
I would use little tasks like a textbox filled with “foobar” and the task “remove letter O from textbox”. There is no fixed answer a bot can store, because it needs to read the value of the textbox and do something with it. And it can’t simply be evaluated in a scripting language the way “2*6” can.
Michael: date- and math-based challenges worry me in that they might be more easily defeatable by bots. After all, the bot can figure out what the date is almost as easily as a human, and with the same error possibilities, thanks to time zones. It’s definitely something to keep in mind, though.
Michael and Donald: the original version of WP-Gatekeeper allowed the administrator to define a list of fieldnames for the input box. I modified that to the “key” concept, where every challenge has a unique, randomly generated key associated with it. These can be replaced at any point with a new set of keys by the administrator. The problem I had was I couldn’t figure out how to easily check the returned HTTP variables for the Gatekeeper response if it was going to change every time. If someone can point me to a good explanation of how to handle that, I’d be glad to reinstate the feature.
Nico: note that a similar thing can be done with the current system. The challenge could be “Take the B from ABC, and what’s left?” Correct response: “AC”. Still, I might in a future version allow the ability to fill in a value like “foobar” so that the challenge could be “Remove the ‘b’ from this input” (correct response: ‘fooar’). It’s not a bad idea at all, and a good way of thinking for people wanting to devise their own challenges.
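A “remove this letter” challenge could compute its own expected answer, so the admin only supplies the seed word and the letter. This is a hypothetical sketch; `make_removal_challenge()` and its array keys are invented for illustration, and `str_replace()` is case-sensitive and removes every occurrence of the letter.

```php
<?php
// Hypothetical helper: build a "remove this letter" challenge from a seed word.
// The expected answer is derived from the word, so it never has to be typed in by hand.
function make_removal_challenge(string $word, string $letter): array
{
    return [
        'prefill'  => $word,                              // value placed in the input box
        'question' => "Remove the '$letter' from this input",
        'answer'   => str_replace($letter, '', $word),    // case-sensitive, removes every occurrence
    ];
}

$c = make_removal_challenge('foobar', 'b');
echo $c['question'], ' => ', $c['answer'], PHP_EOL;
```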
You could also consider questions of the sort:
* what is the fourth word of the third sentence in the post?
* what is the second letter of the first word in this sentence?
The thought being that questions containing several number words require at least a little more analysis from an automated system.
Interesting idea by Eric Meyer on how to defeat comment spam. His solution, WP-Gatekeeper, uses a randomly generated question that users must answer in order to post a comment.
For example, the question I was asked when I looked into posting a comme…
I think your solution may not work out the way you intended. Rather than increasing the complexity for automated spammers, you’ve actually made it much easier.
In the current structure, “What color is an orange?”, it’d be a really simple matter to write a program that would (a) strike out all the question words & articles and then (b) try the remaining words sequentially.
Instead of a universe of potentially infinite matches, you’ve basically reduced it to only two (“color” and “orange”) in this case.
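To make the objection concrete, a rough PHP sketch of the word-stripping attack follows. The stop-word list is a small hypothetical example; a real bot would use a longer one, but the idea is the same: whatever survives becomes the short list of guesses.

```php
<?php
// Sketch of the attack: drop question words and articles,
// then treat whatever is left as candidate answers to try in turn.
function candidate_answers(string $question): array
{
    $stopWords = ['what', 'who', 'which', 'is', 'are', 'a', 'an', 'the', 'of'];
    // Lowercase, then split on anything that is not a letter or apostrophe.
    $words = preg_split("/[^a-z']+/", strtolower($question), -1, PREG_SPLIT_NO_EMPTY);
    // Remove the stop words and renumber the remaining candidates from 0.
    return array_values(array_diff($words, $stopWords));
}

print_r(candidate_answers('What color is an orange?'));
```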
Also, in terms of real usability, you may also unintentionally exclude international users who don’t have full mastery of English idioms.
Like Michael, I also apologize if I offended. I was just offering some thoughts on what I had planned to use as a base for human verification, I certainly did not mean to offend or belittle the effort you put forward with this undertaking. I agree that CAPTCHA images have limitations, but it is possible to work around them, providing accessible options for human verification.
That said, I do love the questions. It is a tack I had not considered before seeing this post. I think they make a lot of sense and provide a great opportunity to show off your keen sense of humor as well.
The attack model isn’t that of spammers building up a database of questions and choosing the answer in advance; it’s real-time, and no database of answers is necessary.
– Spambot visits your comment form
– Your comment form generates a captcha and puts it in the form
– Spambot takes the same captcha and generates a completely separate form with the same captcha
– Spambot presents the form to an unwitting human who wants to see free porn
– Unwitting human answers captcha
– Spambot receives human’s answer, presents free porn to unwitting human as reward
– Spambot completes submission to your comment form with unwitting human’s answer to captcha
– Your comment submission code has received a human’s answer to the captcha, and is happy to accept the comment submission
The captcha is answered in real time, and changing the form each time won’t help, unless you change it so much that it’s non-functional for humans as well. If your site is popular enough, the spammers will keep up with the changes.
There’s no shortage of humans willing to answer any kind of question, at all hours of the day, so that the spambot can get an answer in less than a minute; you can’t timeout faster than that without making it non-functional to humans as well.
Essentially, because the answers can be farmed out by proxy in real time, a captcha that can be solved by a human can be automatically defeated.
I like where you’re going here.
My only little quibble with this sort of technique is that it still requires extra effort on the part of your users. Now, I admit, answering “what color is an orange?” isn’t exactly taxing me to the point I won’t leave a comment — but if you’ll allow me to be a bit of a purist I (perhaps blindly) think we can come up with something to keep the discussion in the blogosphere going that doesn’t require any extra effort on the part of our users.
So far it’s blocked about 28,000 messages with no false positives, and it’s only let about 1,000 messages through to the moderation queue which were in fact comment spam. If I hadn’t made a post about it my users would be none the wiser.
Blocking spam takes effort. Effort if we have to clean out our inboxes every five minutes, effort if we put into place hacks and plug-ins, and effort on the part of our users if they have to fill in an extra box when they want to leave a comment. My purist self leads me to believe we can minimize the effort required by our users.
Blogging Pro - Blogging News, Tools and Hacks
WordPress Plugin: WP-Gatekeeper
WP-Gatekeeper is a plugin/hack to help combat comment spam. The plugin asks the commenter a question which requires the correct answer to get past the Gatekeeper….
Ben Finney (comment 15), the scenario you presented is quite interesting, and at first seemed like there wasn’t much you could do to prevent that from happening. Then I realised that it wouldn’t be possible if the question asked could only be answered in the context of the correct web site. For example:
If any of those questions were to appear on a spammer’s web site, it would not be possible for any unsuspecting user to get the answer correct, since the answers are only available from this site.
One day your comment form will have a mini-IQ test you must take (and pass) before commenting.
Eric Meyer has written a WordPress extension called WP-Gatekeeper. The whole thing works very
This is all reminding me of the old days when code-wheels were provided with computer games, and you had to use them if you wanted to unlock the game, or questions like “What is the first word on page 83 of the manual?”. Page 83 would then be one of those ‘impossible to photocopy’ pages that were also ‘impossible to read by normal humans’.
Keep the questions simple, I think – the more hoops you make your readers jump through to comment, the fewer comments you’ll end up with.
As an aside, Eric, you’re missing a tabindex on the question field below, so when I tab to fill it in, it jumps straight past to the submit button.
Asking users to answer a question before they can post is likely to put people off commenting at all. But what can you do?
Talking of accessibility, I find inserted text a problem on websites because it appears underlined – but it is not a link. Also, what is the definition of “CAPTCHA”? Shouldn’t an abbreviation or acronym tag be used there?
I’d think any concern that you’ll get fewer comments because you’re making the commenter answer a question is immaterial. After all, you’re merely asking that commenters and comments stick to the topic of the post, right?
Likewise, if you’re interested in eliminating any and all kinds of comment spam (and by that I’m including comments that are not on topic too) and you are willing to maintain your blog, why not add one additional (optional?) item to each post you write: the question you want to challenge commenters with?
If the author of the post cares to ask something related to what (s)he is posting, that would pretty much force all commenters to at least know what the post was about. So what if you’re asking that they understand at least something of what you wrote? If the author leaves the challenge blank, WP would take over and use a randomly selected question.
@ Chris: CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
It would be nice if you could set a cookie if someone has already answered the question correctly; seeing as the name, email and URI fields are already doing this, I would imagine it wouldn’t be too difficult to add the challenge question (though I don’t know how that would work if you wanted to change questions easily).
Chris: CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart”, and while most of the tests mentioned here are CAPTCHAs, it’s most often used to describe the image tests. The actual acronym is a trademark.
Eric: Interesting addon, I may use it myself! One thing to note is that individual sites have different levels of attention, both by visitors and spambots. While it certainly would be nice to invent a bulletproof solution, it’s not necessary in the majority of installations which have few visitors and only a few spams a week. In those situations a dumb, static solution will do. Therefore IMHO a good anti-spam plugin should have an adjustable security level to match the site.
Though saying that, I can imagine thousands of abandoned blogs with simple challenges that have been foiled long ago by the spambots. In that case, it may be wise to enforce a good system for every blog (or disallowing comments on old posts).
This is an interesting idea. At first it seemed like a bit of work on the admin’s part to come up with questions and answers, but then I had a thought similar to Lachlan’s (comment 18) – what about using the title of the post?
This seems to have certain advantages, mainly since the key is already there, the administrator doesn’t have to create a series of questions and answers.
I’m sure that there are downsides to it; for example, entering a really long post title could be annoying and invite typing errors (if one does not copy and paste). Any other downsides?
I would have to disagree with Lachlan Hunt in comment 18. He said it wouldn’t be possible if the question asked could only be answered in the context of the correct web site.
I would most definitely have to disagree just based off of the chosen questions. When you consider that:
these types of questions would seem to be easy enough for a would be spammer to plan for.
I’m not sure what the answer is, or where it lies, but I am sure someone somewhere will figure it out, and I am pretty sure I’ll find out about it here.
I have come to the conclusion that comment spam will never be defeated, just like browser security issues will never be solved: where there’s a will, there’s a way. With that said, the only thing that can be done is to make a plugin that can be “updated” by the admin so that the security can be changed as spambots get wiser.
With regard to image CAPTCHAs being used by porn front-ends to feed spam back-ends, why not include in the CAPTCHA image the URI of the blog on which the CAPTCHA is operating?
Many CAPTCHA implementations include only a single line of text which the human is expected to parse. Would including additional identifying text not used for the verification process make the system too complex? I should think that additional text in the body of the image should make it even harder for automated systems to try to parse.
Math questions are too easy to beat thanks to functions like eval(). Even Google can answer questions like “two plus two plus two”.
I contemplated adding such a system back in September when I was overhauling my homegrown system, and the idea wasn’t original then, either. The problem with this sort of test is that it assumes everyone who uses the form speaks the same language. This is all well and good for a fair majority of websites. However, there are a few out there that are multilingual and encourage comments in any language the author(s) speak. No matter which language you choose, someone will be unable to read it.
I didn’t take the time to look at the source, so I can’t say how similar this is to the one I wrote. However, rather than requiring users to type in the answer, I gave them a multiple-choice select box: it displayed the question, with its answer mixed in among 3 random answers to other questions from the database. So, you might end up with “Moscow is the capital of which country?” with the options being Russia, Minnie Mouse, blue, and dog.
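A rough PHP sketch of that select-box approach might look like the following. It is not the commenter’s code; `build_options()` and the sample data are hypothetical, and it assumes each challenge row holds a `question` and an `answer`.

```php
<?php
// Hypothetical sketch: the correct answer plus a few answers borrowed from
// other challenges, shuffled so the correct one is not always first.
function build_options(array $challenges, int $index, int $distractors = 3): array
{
    $correct = $challenges[$index]['answer'];
    $others = [];
    foreach ($challenges as $i => $c) {
        if ($i !== $index && $c['answer'] !== $correct) {
            $others[] = $c['answer'];
        }
    }
    $others = array_unique($others);   // avoid offering the same wrong answer twice
    shuffle($others);                  // shuffle() also renumbers the keys
    $options = array_merge([$correct], array_slice($others, 0, $distractors));
    shuffle($options);
    return $options;
}

$challenges = [
    ['question' => 'Moscow is the capital of which country?', 'answer' => 'Russia'],
    ['question' => 'Who is Mickey Mouse\'s girlfriend?',       'answer' => 'Minnie Mouse'],
    ['question' => 'What color is the sky?',                   'answer' => 'blue'],
    ['question' => 'Which animal barks?',                      'answer' => 'dog'],
    ['question' => 'Which animal meows?',                      'answer' => 'cat'],
];
print_r(build_options($challenges, 0));
```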
The more I think about it, the more I feel that systems that require human intervention are not the right way to go. Adopting such solutions on a large scale could lead to an escalating war, which will eventually make forms complex enough to discourage users from commenting. I think a much better solution lies in implementing a number of back-end measures; for example, random form variable names, which could be encrypted using a hashing function on a hidden key, and Bayesian filtering of posted comments. Such measures could be just as effective and much less intrusive than Turing-type systems…
Working on an idea you mentioned in a reply, Eric, could you…
Automatically generate the input name based on some variables, then make a hash of this variable name. Store the hash in the database, along with the real input name, and write the hash variable into a hidden form tag. Then, when the form is submitted, the server checks the hash from the form against the database; this gives the real input name, which you can then do a request.form (or whatever the PHP equivalent is) on, and voila!
Since a hash is only one way, there’s no way a bot will ever be able to work out what the input field name is. You can generate a new one every time the page is called, and clear them out after some decent timeout period…
Does this make sense?
I guess, the bot could be clever enough to scrape your page and find out what the input field is named after xyz element on the page, but that would have to be specifically targeted at your site then…
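The hashed field-name scheme described above could be sketched in PHP roughly as follows. Everything here is hypothetical: the `$store` array stands in for the database table, `issue_field()` and `read_field()` are invented names, and the single-use and timeout behavior are assumptions based on the “clear them out after some decent timeout period” remark. In real use, `$post` would be `$_POST`.

```php
<?php
// When the form is rendered: invent a field name and remember it under its hash.
function issue_field(array &$store): array
{
    $name  = 'f' . bin2hex(random_bytes(8));  // random input name
    $token = hash('sha256', $name);          // one-way hash, written to a hidden input
    $store[$token] = ['name' => $name, 'issued' => time()];
    return ['name' => $name, 'token' => $token];
}

// On submission: map the hidden token back to the real field name and read it.
function read_field(array &$store, array $post, int $maxAge = 3600): ?string
{
    $token = $post['token'] ?? '';
    if (!is_string($token) || !isset($store[$token])) {
        return null;                         // unknown or already-used token
    }
    $entry = $store[$token];
    unset($store[$token]);                   // each token works only once
    if (time() - $entry['issued'] > $maxAge) {
        return null;                         // expired, e.g. a stale cached page
    }
    $value = $post[$entry['name']] ?? null;
    return is_string($value) ? $value : null;
}

$store = [];
$field = issue_field($store);
echo '<input name="token" type="hidden" value="', $field['token'], '">', PHP_EOL;
echo '<input name="', $field['name'], '" type="text">', PHP_EOL;
```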
Jeff (comment 30), you seem to have missed the point of my suggestion (comment 18). Of course the answers may be available in the feed, and they’re also available directly from the web page. But compared with the current question “What color is an orange?”, where the answer is right there in the question, it’s an improvement.
Both the current question and my suggested questions require a human to provide the initial answer, but my suggested questions need to be asked in the context of the correct site.
Say, for example, the question “What is the domain name of this site?” was reproduced by a spammer on a porn site for an unsuspecting user to answer, how is that user supposed to know the correct answer is “meyerweb.com” and not “xxx-porn.example.com”. Thus any attempt by the spammer to use the answer provided will fail.
Also, keep in mind that the questions will change, and even if a bot were programmed to fetch the title of the article from the feed for this week’s question, that will fail for next week’s.
c. s. (comment 33), the problem with providing all the possible answers in a select box is that it only requires the spammer to post the spam multiple times, each time selecting a different answer. So, if there were three possible answers provided for any question, and the spammer hit the site multiple times, approximately one in three spams would get through, no matter what the question is, since changing the question wouldn’t change the method of obtaining the answers.
I am starting to wonder if this isn’t all turning into a Rube Goldberg web experiment. All these dramatic devices to prevent comment spam. Maybe the simple solution is to just require registration to comment. Your readers will have to go to the trouble of registering once, but after that, they can comment to their hearts’ content, as opposed to having to jump through a hoop every time. Have all registrations approved by the admin, and you eliminate any bots from joining.
It’s a simple solution, but requiring registration to comment just puts people off commenting altogether.
How about a form asking for the nth word from a sentence in the post? Or a word from another post you’d have to manually find first? Or a word from another site altogether?
How about asking ‘What colour is Thursday?’ or perhaps require the commentator to list all months with an ‘r’ in them, then type in the alphabet backwards.
You keep track of the question being asked somewhere, right? Well, keep track of the field names of the answers/post/etc. in the same spot, and then on the page that is posted to, pull those from wherever (the database or session variables, presumably), and then continue as normal:
$answerFieldName = $mysqlResultRow['answerFieldName']; // or wherever the names are stored
$emailFieldName = $mysqlResultRow['emailFieldName'];
// get the variables out of the POST, defaulting to '' if a field is missing
$answer = $_POST[$answerFieldName] ?? '';
$email = $_POST[$emailFieldName] ?? '';
// continue as before
No spam please we’re British
Eric Meyer has written a little WordPress plugin that aims to foil the evil computer programs which are filling people’s websites with comments that link back to commercial sites. The purpose is to catch the unwary reader, just as spam e-mail does, as…
I’m curious as to why no one has mentioned Hashcash! I use a Hashcash solution for WP that I wrote, and since implementation have received no spam at all.
Eric’s Archived Thoughts: WP-Gatekeeper
Eric’s Archived Thoughts: WP-Gatekeeper (more, dl & instructions)…
I apologize in advance for continuing the diversion away from the topic of WP-Gatekeeper, but I just stumbled upon Proposal for an Accessible Captcha and I thought some of the readers here may be interested in the read.
CAPTCHA State of the Union 2005
I have to say, this week is really going well. On Monday, Eric Meyer (whom you may know for having more CSS knowledge in his pinky toe than most of us have in our entire bodies) released a WordPress anti-spam plugin that uses logic puzzles. This is ano…
I’ve been experimenting with a captcha method, using CSS rather than images.
The idea is to output the code as text, but attempt to obfuscate it by outputting the code in random order, and adding noise characters to prevent a bot from easily scraping the code.
Experimental PHP5 Code.
The random code generator is probably too harsh; replacing it with a random selection from the 1,000 most common English words longer than, say, 3 characters would be much friendlier.
I have a solution that might also have potential. It might work if you added radio buttons, where you had to click the correct, randomly selected button before posting. I’m not sure if spammers can fill out radio buttons as of yet. Combine that with another challenge and it might be complicated for spammers, but not for humans.
Amazing how so many people hit on similar solutions. After someone last week pointed me (http://tinyurl.com/4hcw2#comment-340) to http://www.syndic8.com/~jeff/blog/index.php?p=103, I started designing an almost identical solution. I happened to stumble across this one today. I really like the simple interface. I haven’t gotten it to work on my site since I’m on WP 1.5, but I’m sure I could get it in if I want.
I have one suggestion, though. Instead of keys stored in the database, why not send a hash of the correct answer, together with some static, but private string, in a hidden variable? Then the table would need to hold only the questions and there answers, and the verification wouldn’t need to take a DB hit.
$key= ' <input name="key" type="hidden" value="' . md5($challenges[$cr]->passcode . "secret string") . '"' . $closeslash . '> ';
And then you can hash the returned answer and compare the results. Does this make sense?
Oops, typos in my previous:
Then the table would need to hold only the questions and their answers
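Both halves of the hidden-key scheme suggested above could be sketched in PHP like this. It is a sketch under assumptions, not the commenter’s or WP-Gatekeeper’s code: it swaps the `md5(passcode . "secret string")` concatenation for `hash_hmac()` with SHA-256 and compares with `hash_equals()`; `GATEKEEPER_SECRET`, `normalise_answer()`, `answer_key()`, and `verify_answer()` are hypothetical names. Only the hidden input name `key` comes from the original snippet.

```php
<?php
// Hypothetical site secret; never sent to the browser.
const GATEKEEPER_SECRET = 'change-me';

// Normalize answers so "Orange " and "orange" produce the same hash.
function normalise_answer(string $answer): string
{
    return strtolower(trim($answer));
}

// Value for the hidden "key" input, computed when the form is rendered.
function answer_key(string $passcode): string
{
    return hash_hmac('sha256', normalise_answer($passcode), GATEKEEPER_SECRET);
}

// On submission: hash the visitor's response and compare in constant time.
function verify_answer(string $key, string $response): bool
{
    return hash_equals($key, answer_key($response));
}

$key = answer_key('orange');
echo '<input name="key" type="hidden" value="', htmlspecialchars($key), '">', PHP_EOL;
```

One trade-off worth noting: because the key depends only on the answer and the secret, a bot that learns one valid key/answer pair can replay it until the challenge changes; mixing a timestamp or nonce into the hashed data would be one way to limit that.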
I have downloaded it from http://meyerweb.com/eric/tools/wordpress/wp-gatekeeper.html, but the version that I read is 1.5RC1. That site says that the version is 1.5RC2. I found http://dev.wp-plugins.org/browser/wp-gatekeeper/. So, which version (the latest) should I be monitoring: the one from this site or the one from dev.wp-plugins.org? I like this script very much; there is no accessibility problem.
Having had serious spam when using MoveableType and after transferring to WordPress I decided I needed to take drastic action.
I was on the uni server when I used MT, but it didn’t have the PHP graphics module so the Captcha didn’t work. I …
Great plugin, man. And it’s better than CAPTCHA ’cos it’s blind-people friendly. But you guys should include this in the instructions:
“after installing you have to log out of your wordpress control panel if you want to test it, because by default the challenge is not posed for the administrator (you)”.
I wonder why you would create this plugin and not use it on your own site (this blog). That’s kind of ironic, don’t you think? Well, great plugin all the same. Peace!
What makes you happy ? » Gatekeeper in.
[…] ich I have always quite liked when using it on others’ blogs – is Eric Meyer’s Gatekeeper. I pose a range of questions, you answer them. So it is now installed and r […]
Web Communication Link Relationships - Lachy’s Log
[…] es, on the condition that it is removed after moderation.
Of course, there are much better ways to block spam that sho […]
> No image CAPTCHAs. Too many accessibility problems, and too easy to defeat.
Hmm… I am wondering how far a spammer would go, especially when he or she has to crack an image security code.
They might hire some cheap Chinese workers to send their spam, entering the image security code each time.
Hey… just testing that the plugin works!
Wordpress Comment Plugins: Building A Fortress To Defend Against Spam! at MUSicTECHnology.net
[…] there are a couple of other great plugins that are similar like Meyerweb’s WP-Gatekeeper , Did You PASS Math?, ΛορδΧηαος’s Challenge Plugin and Captcha!, I am a […]
Defeating contact form spam by hiding the webmail script | Ardamis.com
[…] correctly answered before the form could be submitted. Eric Meyer wrote a very inspiring piece at WP-Gatekeeper on the use of easily human-comprehensible challenge questions like “What is Eric’s […]
A colleague of mine just had an idea. What about using an image button for the submit button which would pass in the x/y coordinates? If the way spammers are exploiting this is by posting to the form from elsewhere it seems that even checking the referring url might work.
How to avoid spam? | paul olyslager
[…] questions into the form, which needs to be answered by the visitor. Eric Meyer wrote an article at WP-gatekeeper about this. He suggested to ask simple questions, eg. “what is Eric’s first […]
Defeating contact form spam by hiding the webmail script | Ardamis
[…] must be correctly answered before the form is submitted. Eric Meyer wrote a very inspiring piece at WP-Gatekeeper on the use of easily human-comprehensible challenge questions like “What is Eric’s […]