More Spam To Follow

Published 19 years, 2 months past

So… rel="nofollow".  Now there’s a way to deny Google juice to things that are linked.  Will it stop comment spam?  That’s what I first thought, but I’ve come to realize that it’ll very likely make the problem worse.  In the last few hours, I’ve been hearing things that support this conclusion.

First, the by-now required disclaimer: I think it’s great that Google is making a foray into link typing, and I don’t think they should reverse course.  For that matter, it would be nice if they paid attention to VoteLinks as well, and heck, why not collect XFN values while they’re at it?  After all, despite what Bob DuCharme thinks, the rel attribute hasn’t been totally ignored these past twelve years.  There is link typing out there, and it’s spreading.  Why not allow people to search their network of friends?  It’s another small step toward Google Grid… but I digress.

The point is this: rather than discourage comment spammers, nofollow seems likely to encourage them to new depths of activity.  Basically, Google’s move validates their approach: by offering bloggers a way to deny Google juice, Google has acknowledged that comment spam is effective.  This doesn’t mean the folks at Google are stupid or evil.  In their sphere of operation, getting comment spam filtered out of search results is a good thing.  It improves their product.  The validation provided to spammers is an unfortunate, possibly even unanticipated, side effect.

There is also the possibility, as many have said, that nofollow will harm the Web and Google’s results, because blindly applying a nofollow to every comment-based link will deny Google juice to legitimate, interesting stuff.  That might be true if nofollow is used like a sledgehammer, but there are more nuanced solutions aplenty.  One is to apply nofollow to links for the first week or two after a comment is posted, and then remove it.  As long as any spam is deleted before the end of the probation period, it would be denied Google juice, while legitimate comments and links would eventually get indexed and affect Google’s results (for the better).

In such a case, though, we’re talking about a managed blog—exactly the kind of place where comment spam had the least impact anyway.  Sure, occasionally the Googlebot might pick up some spam links before the spam was removed from the site, but in general spam doesn’t survive on managed sites long enough to make that much of a difference.
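To make the probation idea a little more concrete, here's a minimal sketch of how a blog's comment renderer might handle it. Everything in it is an assumption on my part: the function name, the 14-day window (the post only says "the first week or two"), and the idea that the software knows when each comment was posted. It isn't how any particular blogging package does it.

```python
from datetime import datetime, timedelta, timezone
from html import escape

# Hypothetical probation length; "the first week or two" is all we have to go on.
PROBATION = timedelta(days=14)

def render_comment_link(url, text, posted_at, now=None):
    """Return an <a> tag, adding rel="nofollow" while the comment is on probation."""
    now = now or datetime.now(timezone.utc)
    on_probation = (now - posted_at) < PROBATION
    rel = ' rel="nofollow"' if on_probation else ""
    # Escape the URL and link text so a comment can't inject markup.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'
```

Because the attribute is decided at render time, nothing has to go back and rewrite old comments: once a link outlives the window (and the spam sweep), the nofollow simply stops being emitted.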

Like Scoble, I might find nofollow of use if I wanted to link to the site of a group or person I severely disliked in order to support a claim or argument I was making.  It would be a small thing, but still useful on a personal level.  (I’d probably also vote-against the target of such a link, on the chance that one day indexers other than Technorati’s would pay attention.)

No matter what, the best defense against comment spam will be to prevent it from ever appearing in the first place.  There are of course a variety of methods to accomplish this, although most of them seem doomed to fail sooner or later.  I’m using three layers of defense myself, the outer of which is currently about 99.9% effective in preventing spam from ever hitting the moderation queue, let alone making it onto the site.  One day, that layer’s effectiveness will very suddenly drop to zero.  The second layer was about 95% effective at catching spam when it was the outer layer, and since it’s content-based it will likely stay at that level over time.  The final layer is a last-ditch picket line that only works in certain cases, but is quite effective at what it does.

So what are these layers, exactly?  I’m not telling.  Why not?  Because the longer these methods stay off the spammers’ radar, the longer the defenses will be effective.  Take that outer layer I talked about a moment ago: I know exactly how it could be completely defeated, and for all time.  Think I’m about to explain how?  You must be mad.

The only spam-blocking method I can think of that has any long-term hope of effectiveness is the kind that requires a human brain to circumvent.  As an example, I might put an extra question on my comment form that says “What is Eric’s first name?”  Filling in the right answer gets the post through.  (As Matt pointed out to me, Jeremy Zawodny does this, and that’s where I got the idea.)  That’s the sort of thing a spambot couldn’t possibly get right unless it was specifically programmed to do so for my site—and there’s no reason why any spammer would bother to program a bot to do so.  That would leave only human-driven spam, the kind that’s copy-and-pasted into the comment form by an actual human, and nothing besides having to personally approve every single post will be able to stop that completely.
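For the curious, a gatekeeper question like that takes very little code on the server side. The following is only a rough sketch under my own assumptions: the form arrives as a plain dictionary, the answer lives in a field called `gk` (a name borrowed from the comments below), and matching is case-insensitive. None of these names come from any real blogging package.

```python
import unicodedata

# Hypothetical challenge/response pair, taken from the example in the post.
QUESTION = "What is Eric's first name?"
ACCEPTED_ANSWERS = {"eric"}

def normalize(answer):
    """Trim, case-fold, and Unicode-normalize so ' Eric ' and 'ERIC' both pass."""
    return unicodedata.normalize("NFKC", answer).strip().casefold()

def passes_gatekeeper(form):
    """True if the submitted form carries an accepted answer in the assumed 'gk' field."""
    return normalize(form.get("gk", "")) in ACCEPTED_ANSWERS
```

A post whose form fails `passes_gatekeeper` would be rejected or dropped into moderation; being forgiving about case and stray whitespace keeps the check from tripping up actual humans.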

So, to sum up: it’s cool that Google is getting hip to link typing, even though I don’t think the end result of this particular move is going to be everything we might have hoped.  More active forms of spam defense will be needed, both now and in the future, and the best defense of all is active management of your site.  Spammers are still filthy little parasites, and ought to be keelhauled.  In other words: same as it ever was.  Carry on.


Comments (35)

  1. Pingback ::

    Fighting Spam as Secret Art :: DenkZEIT

    […] : Englisch Spam — Steffen @ 11:52 Reading Eric’s More Spam To Follow on the newly introduced rel=”nofollow” and his measures aga […]

  2. Pingback ::

    Carol Ott Online »

    […] Interesting commentary going on about comment spam and how to kill it. Eric Meyer has given his valuable input, and I really like this guy’s idea for a “ […]

  3. What are people’s thoughts on using an image of a string of characters that users have to type before submitting? I’ve seen it used in a few places. Surely that’s at least as effective as asking a question (like Eric’s first name question).

    I guess the main danger is character recognition programs. I imagine, though, that you can sufficiently confuse these with image ‘noise’.

    I’m asking because this seems to me to be about as effective as the question option but a bit easier for genuine commenters to complete.

    …I think I just answered my own question. I guess it won’t work for the visually disabled and text-only browsers. Oops!

  4. What are people’s thoughts on using an image of a string of characters that users have to type before submitting?

    Accessibility issues. What do blind people or people who are hard of seeing do?

  5. Adding the question is proof against bots, but isn’t proof against the current alleged best attack against captchas – hire someone in the third world for pennies to do the spamming for you.
    A real human brain at a bot price :(

  6. I don’t see how Google acknowledging that it works helps spammers in any way. It’s trivial to determine whether it works or not, you don’t need Google to tell you.

  7. All this rel="nofollow" stuff looks like just a PR move with a real intent to protect PR.

    Although I also doubt it is PR that spammers are after. They are after traffic to their sites, and while good positions in SERPs would help to achieve that, it is not vital, IMHO. Many of those sites come and go so quickly that they simply do not have enough time to build any PageRank, nor do they need any.

    So we are left with the same old comment spam, new referer spam, new attribute with misleading value (shouldn’t it be “nopagerank”?) and new tool for PageRank manipulation.

    I will stick to moderation, but it is easy for me, because I get only a few spam comments per day. I may consider other options, but captchas and the like are out of consideration; they hurt ordinary visitors and they bear witness to our defeat.

    Web is an interesting place.

  8. It shouldn’t be “nopagerank” because Google, being sensible, is trying to create a standard.

    Other search engines don’t use “PageRank”; they have their own systems, so why specify just Google? If, say, MSNBot gets updated with this (and MS probably will, I imagine), then that’s most searches right there!

    Also it fits in with the normal convention of robots.txt phrasing.

    My view is that it’s a good thing, but with limited effectiveness. I couldn’t care less about the supposed impact of links in comments not being indexed. If it’s THAT good a comment, people will blog about it, or update the article.

  9. Good points. I too like the added control that rel=nofollow will provide over posted links, but it’s obviously wrong to label it as something that will “prevent comment spam”, as Google has.

    Incidentally, I had never heard of VoteLinks until you mentioned it here.

  10. Comment spam needs to be stopped before it “explodes”, although can it ever really be killed? (See comment No. 3 and “captchas”.)

    The search engines taking this step is a positive action, and is a step in the right direction.

    As for the PR, it’s only an aspect of it, although it was easy for the search engines to add to their systems.

  11. Also it fits in with the normal convention of robots.txt phrasing.

    Only if Google don’t actually follow the link for indexing purposes. A lot of people seem to think that it simply doesn’t count for pagerank purposes, and the information coming out of Google is ambiguous at the moment.

  12. I’m curious what Eric thinks of the reaction on the W3C’s www-html list, which can best be described as “dismissive.” Only after the first 20 messages has anyone come up with an alternative, but there seems universal agreement among W3C-supporters that the only way is to go through the W3C’s process.

    By expressing even mild support for this initiative, they would seem to think that you’re undermining the whole standards process. Do you agree?

  13. Regarding the problem of having the “nofollow” attribute in legitimate commenters’ links, what would be the downside of having both a “Preview” and a “Post” button for comments, and including the “nofollow” attribute only if the commenter posts without previewing first? A little explanatory text under the comment box (such as what appears on this page) could alert savvy commenters to the distinction. I’m probably missing something obvious here, since I’m relatively new to this arena, but I offer it up for discussion anyway…

  14. “The only spam-blocking method I can think of that has any long-term hope of effectiveness is the kind that requires a human brain to circumvent. As an example, I might put an extra question on my comment form that says ‘What is Eric’s first name?’ Filling in the right answer gets the post through. … That’s the sort of thing a spambot couldn’t possibly get right unless it was specifically programmed to do so for my site—and there’s no reason why any spammer would bother to program a bot to do so.”

    But what if spammer networks set up some sort of private, globally-accessible database that provides their spambots the answer to such questions? It would only require one spammer to add the answer to such a database (when their software tells them that an answer is needed) before this would be defeated.

  15. In regard to the comment about the www-html list mentioned by Sandy (comment 10), I don’t believe the W3C is entirely dismissive of the need to address the comment spam situation, nor that the only way is to go through the W3C process. However, you are correct in that most people who have replied to the thread completely agree that nofollow is an extremely poor choice, and is nothing more than a proprietary extension comparable with the proprietary elements and attributes added in the past, the only difference being that this extension still validates. The value still lacks useful semantics, and is nothing but abuse of the misunderstood rel attribute.

    Regarding the suggestions for alternate names, I have explained why such alternatives are also unsuitable and also discussed the better alternative approach, which I will be publishing the full details of on my own site shortly.

    Finally, I really like the idea about offering a question for legitimate users to answer before submitting their comment, and I have a suggestion for the problem mentioned by Martin (comment 12) about spammers just adding the answer to their database. The question asked could be randomly selected from a set for each visitor. Additionally, new questions could be added and old ones removed, so any Q&A stored by a spammer will eventually time out and become useless.

  16. Trackback ::

    geek ramblings

    Meme Roundup

    There have been several new memes lately that I have thoughts on, but I just haven’t had the time to comment on them here. I still don’t have time to discuss them in as much depth as I’d like, but here’s a quick summary:

    RSS 1.1:
    I have mix…

  17. Lachlan: your idea about giving a maintainer the ability to manage a list of randomly-rotated “gatekeeper” questions is just what I was thinking. I also had the idea that there should be a way to easily alter the input field name; for example, the software might have a field that stores a prefix/suffix that is to be added to the “gk” field. Thus I might input “mw”, and the blog software would auto-alter the input tag to say:

    <input type="text" name="mwgk" />

    …or “gkmw” or “mw-gk” or “mw_gk” or whatever. I could then change my field-name modifier at will. It might even make sense to have a list of modifiers, just like a list of challenge/response pairs. This would, again, mean that a managed blog would be relatively spamproof, because the author would stay one step ahead of any hypothetical global answer key.

    And my feeling is that spammers wouldn’t bother with the global answer key, because there’d be no need. There will still be comment forms without any protection, and as long as those serve as a transmission vector, it will be less work to do extra spamming runs than it will be to try to defeat measures like gatekeeper questions. It would require huge numbers of blogs taking up that approach before they’d bother.

  18. Trackback ::

    Nokrev

    Views on nofollow
    Well, it seems everybody who is anybody is talking about [`rel=”nofollow”`](http://www.google.com/googleblog/2005/01/preventing-comment-spam.html “View Google’s release of the nofollow rel attribute”), and I can understand why. This at first seems…

  19. Trackback ::

    Reflexive-Blog

    Imagine the Web without pagerank
    Imagine a Web with pagerank, Google created it. Pagerank is the value of a page, in the search results of Google, based on the number of pages doing an hyperlink to the page in question, and on their respective ranks. The more a page is successful, the…

  20. More about Captcha (automated are-you-a-human test) methods, including how they can always be broken:
    Wikipedia’s Captcha article

  21. > blindly applying a nofollow to every comment-based link will deny Google juice to legitimate, interesting stuff

    I don’t agree – Google will surely find the target pages in other ways, if the site is legitimate.

  22. funny this is exactly what eric’s talking about, except for other kinds of technology:
    Exploring the law of unintended consequences [printer-friendly] | The Register

  23. nice write up eric, never ceases to amaze me how spam bots are written and the tremendous amount of man hours that are required to stop them.

    I do tip my hat to you for continuing to make commenting available for us, as so many have abandoned the feature.

    Unfortunately, the spam bots have done a lot of damage to the very basis of what drives the web – interactivity. Guess it is us against the machines.

  24. From comment number 8 we see this: Comment spam needs to be stopped before it “explodes”.

    I would venture to say that a large group of people are suffering a lot of collateral damage…the explosion is HERE.

    I applaud the work of Eric and others who work so hard to give us simple bloggers the tools to stem the tide. Thanks to you all!

    Despite my own cynicism about the corporate reasons for implementing “nofollow” I still support any effort to make life miserable for spammers. I just don’t want to make life miserable for myself as a site admin nor do I wish to penalize my site visitors in any way for posting legitimate links to sites and articles of interest.

  25. Personally, the only way I could see all this comment spam stopping is to stop leaving blogs up and running if not maintained. Probably the best way to do this is to set all blogging software to disable comments (and maybe even comment links) after X weeks. I am sure there is a way to strip out the link tags after the fact.

  26. Trackback ::

    Minh’s Notes

    Ever closer
    Google is getting ever closer to becoming the Grid.

  27. Trackback ::

    The Red Baron Blog

    Follow Google to Spamsville
    I know what Seth is saying. Dieting books are all the rage, and yet, they don’t seem to truly solve the problem, because statistics prove that losing weight…Is a losing battle. And yet, here I am…Living proof that it can be done. So, if you wer…

  28. Eric,

    I’ve recently implemented a solution that’s along the lines of what you’re currently doing, and your response in comment #15. I don’t think it’s quite ready for prime-time yet, but it’s at once user-friendly (that is, it’s transparent to the end-user), and completely effective at stopping spambots.

    Seems to me that continues to be the best path: differentiating between bona fide human interaction and bot-related activity.

  29. Trackback ::

    Burningbird

    Conversation
    I am working on a follow up post on tags and folksonomies, but the going is slow, not the least because I’ve been helping folks with trackback spam and various other technical problems. Too much so at one point because I think I deleted good trackback…

  30. see blog of Ben Hammersley

    “Let no fellow nofollow, lest we all lie fallow”

    http://www.benhammersley.com/weblog/2005/01/20/let_no_fellow_nofollow_lest_we_all_lie_fallow.html

  31. Trackback ::

    Mike's Blog => Say No To Nofollows!

    Trackbacking your entry…

    […] in fact it may have even trigger more trackback spam, .. So I think if people contribute to my blog by adding a comment they deserve a link, a real link without a nofollow! ..Say no to nofollows! […]

  32. Pingback ::

    geek ramblings » Follow you, follow me

    […] It had no way to know whether it was de-juicing a good guy or a bad guy. (Eric Meyer had some good thoughts on this subject, […]

  33. Pingback ::

    A Fool’s Wisdom » Do Follow WordPress

    […] from Google. But it still feels like it was a sledge hammer and that is reflected by Eric Meyer discussions on the issue at the time. But most of the services seemed to, which speaks to the real pain they […]

  34. Pingback ::

    No follow, no interconnection at Onno Bruins

    […] +Eric Meyer: More Spam to Follow […]

  35. Pingback ::

    » The Comments Conundrum | Web Development Blog: Heidi Adams Cool

    […] More Spam To Follow (Eric Meyer on rel=”nofollow”) […]
