Posts in the Tech Category

More Spam To Follow

Published 21 years, 2 days past

So… rel="nofollow".  Now there’s a way to deny Google juice to things that are linked.  Will it stop comment spam?  That’s what I first thought, but I’ve come to realize that it’ll very likely make the problem worse.  In the last few hours, I’ve been hearing things that support this conclusion.

First, the by-now required disclaimer: I think it’s great that Google is making a foray into link typing, and I don’t think they should reverse course.  For that matter, it would be nice if they paid attention to VoteLinks as well, and heck, why not collect XFN values while they’re at it?  After all, despite what Bob DuCharme thinks, the rel attribute hasn’t been totally ignored these past twelve years.  There is link typing out there, and it’s spreading.  Why not allow people to search their network of friends?  It’s another small step toward Google Grid… but I digress.
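
For anyone who hasn’t run into link typing before, all of these mechanisms are just attribute values on ordinary links.  A quick illustration, with placeholder URLs (and note that VoteLinks, as I understand it, uses the rev attribute rather than rel):

    <!-- Google's new anti-spam value -->
    <a href="http://example.com/" rel="nofollow">a commenter's link</a>

    <!-- XFN: describing your relationship to the link's target -->
    <a href="http://example.org/" rel="friend met">a friend's site</a>

    <!-- VoteLinks: endorsing (or, here, disavowing) the target -->
    <a href="http://example.net/" rev="vote-against">a site I cite but don't endorse</a>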

The point is this: rather than discourage comment spammers, nofollow seems likely to encourage them to new depths of activity.  Basically, Google’s move validates their approach: by offering bloggers a way to deny Google juice, Google has acknowledged that comment spam is effective.  This doesn’t mean the folks at Google are stupid or evil.  In their sphere of operation, getting comment spam filtered out of search results is a good thing.  It improves their product.  The validation provided to spammers is an unfortunate, possibly even unanticipated, side effect.

There is also the possibility, as many have said, that nofollow will harm the Web and Google’s results, because blindly applying a nofollow to every comment-based link will deny Google juice to legitimate, interesting stuff.  That might be true if nofollow is used like a sledgehammer, but there are more nuanced solutions aplenty.  One is to apply nofollow to links for the first week or two after a comment is posted, and then remove it.  As long as any spam is deleted before the end of the probation period, it would be denied Google juice, while legitimate comments and links would eventually get indexed and affect Google’s results (for the better).
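
How might the probation period work?  Here’s a rough sketch of the logic, in JavaScript purely for illustration; in a real weblog system it would live in the comment template, and the two-week window is just an example.

    // Sketch: emit rel="nofollow" only while a comment is on probation.
    var PROBATION_MS = 14 * 24 * 60 * 60 * 1000;  // two weeks, in milliseconds

    function relForComment(postedDate) {
        var age = (new Date()).getTime() - postedDate.getTime();
        return (age < PROBATION_MS) ? ' rel="nofollow"' : '';
    }

    // Building a comment link's markup:
    // '<a href="' + url + '"' + relForComment(posted) + '>' + linkText + '</a>'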

In such a case, though, we’re talking about a managed blog—exactly the kind of place where comment spam had the least impact anyway.  Sure, occasionally the Googlebot might pick up some spam links before the spam was removed from the site, but in general spam doesn’t survive on managed sites long enough to make that much of a difference.

Like Scoble, where I might find nofollow of use would be if I wanted to link to the site of a group or person I severely disliked in order to support a claim or argument I was making.  It would be a small thing, but still useful on a personal level.  (I’d probably also vote-against the target of such a link, on the chance that one day indexers other than Technorati’s would pay attention.)

No matter what, the best defense against comment spam will be to prevent it from ever appearing in the first place.  There are of course a variety of methods to accomplish this, although most of them seem doomed to fail sooner or later.  I’m using three layers of defense myself, the outer of which is currently about 99.9% effective in preventing spam from ever hitting the moderation queue, let alone making it onto the site.  One day, the layer’s effectiveness will very suddenly drop to zero.  The second layer was about 95% effective at catching spam when it was the outer layer, and since it’s content-based it will likely stay at that level over time.  The final layer is a last-ditch picket line that only works in certain cases, but is quite effective at what it does.

So what are these layers, exactly?  I’m not telling.  Why not?  Because the longer these methods stay off the spammers’ radar, the longer the defenses will be effective.  Take that outer layer I talked about a moment ago: I know exactly how it could be completely defeated, and for all time.  Think I’m about to explain how?  You must be mad.

The only spam-blocking method I can think of that has any long-term hope of effectiveness is the kind that requires a human brain to circumvent.  As an example, I might put an extra question on my comment form that says “What is Eric’s first name?”  Filling in the right answer gets the post through.  (As Matt pointed out to me, Jeremy Zawodny does this, and that’s where I got the idea.)  That’s the sort of thing a spambot couldn’t possibly get right unless it was specifically programmed to do so for my site—and there’s no reason why any spammer would bother to program a bot to do so.  That would leave only human-driven spam, the kind that’s copy-and-pasted into the comment form by an actual human, and nothing besides having to personally approve every single post will be able to stop that completely.
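
The check itself is trivial, which is part of the appeal.  Something along these lines, where the field name and accepted answer are obviously made up for the example:

    // Sketch: refuse any comment whose challenge answer is wrong.
    // A spambot would have to be hand-tailored to this one site to pass.
    function passesChallenge(formValues) {
        var answer = (formValues.challenge || '');
        answer = answer.replace(/^\s+|\s+$/g, '').toLowerCase();  // trim and normalize
        return answer == 'eric';  // "What is Eric's first name?"
    }

    // if (!passesChallenge(submitted)) { /* reject, or hold for moderation */ }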

So, to sum up: it’s cool that Google is getting hip to link typing, even though I don’t think the end result of this particular move is going to be everything we might have hoped.  More active forms of spam defense will be needed, both now and in the future, and the best defense of all is active management of your site.  Spammers are still filthy little parasites, and ought to be keelhauled.  In other words: same as it ever was.  Carry on.


S5 1.1b3

Published 21 years, 1 week past

Well, there was time off for the holidays, but now S5 is back and ready to increment its beta number.  So, without too much ado: S5 1.1b3 (248KB ZIP file).  Here’s the current testbed presentation, for those who just want to play around with it.  Because of the long holiday break, I want to add another beta round or two just to work out as many kinks as possible.  So this isn’t the last version before going final on 1.1; still, I’m interested in any problems that people encounter.

There’s really only one notable change from the previous version.  I incorporated Jordan Liggitt’s “type slide number” code into this version.  Why his, when others have done similar things?  Because his version was well-marked with comments, and thus easy for me to figure out what he’d done and how he’d done it.  So here’s how it works:

  • If the user types a number (multi-digit is allowed), the script stores the number.  Inputting any non-number key clears the entered number.
  • If the user hits Enter/Return while there is a number stored, the slide show jumps to that slide.  Any attempt to jump directly to a slide past the end of the slide show results in no action, although the number is still cleared.
  • Hitting any of the “Next” or “Previous” keys while there is a number entered causes the slide show to skip that many slides in the appropriate direction.  Thus, entering “3” and hitting the space bar would jump forward three slides; entering “5” and hitting Page Up would jump backward five slides.  Skipping past the end of the slide show will drop you on the title slide, which is something I’m thinking about changing, though I’m not entirely certain in what way.
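
In outline, the behavior described above looks something like the following sketch.  To be clear, this is not Jordan’s actual code, and the variables standing in for S5’s internals (currentSlide, slideCount, go) are placeholders:

    // Placeholders for S5's internal state and navigation:
    var currentSlide = 0;   // assume the title slide is slide 0
    var slideCount = 20;
    function go(n) { currentSlide = n; /* the real go() does far more */ }

    var enteredNumber = '';  // digits typed so far

    function handleKey(key) {
        if (key >= '0' && key <= '9') {
            enteredNumber += key;                  // accumulate multi-digit input
        } else if (key == 'Enter') {
            if (enteredNumber != '') {
                var target = parseInt(enteredNumber, 10);
                if (target <= slideCount) go(target);  // past the end: no action
            }
            enteredNumber = '';                    // cleared either way
        } else if (key == 'Next' || key == 'Previous') {
            var n = (enteredNumber != '') ? parseInt(enteredNumber, 10) : 1;
            go(currentSlide + (key == 'Next' ? n : -n));
            enteredNumber = '';
        } else {
            enteredNumber = '';                    // any non-number key clears it
        }
    }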

I’m mulling over which keys should invoke which jumping behavior.  For example, a couple of times I’ve typed a slide number and then hit the space bar to advance directly to that slide.  Instead, I jumped forward by that number, which is correct but obviously not what I was subconsciously expecting.  So I’m thinking about further restricting the keys that trigger the “jump n slides” behavior.  Anyone have suggestions based on other slide show software?

At this stage, I’m likely to put off adding the multiple-author meta that I toyed with in earlier versions.  The general need is still there, but I’m just not able to think the problem through with the kind of clarity I want.  It will have to wait for another day.  I’m also dithering a bit about the licensing, though at this point I’m leaning pretty heavily toward using Expat.  My hesitation is largely based on my very desire to make the right choice so that I never, ever have to worry about it again, you know?

Anyway, as always, feedback is welcome.


Tabular Weirdness

Published 21 years, 1 week past

Recently I was doing some table styling for a client and ran into what I can only call tabular weirdness.  There were two different things that I stumbled across, and interestingly, they were the kinds of problems you wouldn’t be likely to encounter in layout tables.  These would come up much more often in data tables.

In the first case, the general idea was to put some space between the tables and the surrounding material, but as these were data tables, they came with captions.  So I of course put the caption text in caption elements.  That’s when things started to get inconsistent.

To be more precise, the problems began after I left Safari to check the page in other browsers. In Safari, you see, the caption’s element box is basically made a part of the table box.  It sits, effectively, between the top table border and the top margin.  That allows the caption’s width to inherently match the width of the table itself, and causes any top margin given to the table to sit above the caption.  Makes sense, right?  It certainly did to me.

However, according to section 17.4 of CSS2.1 and the figure that accompanies it, the caption sits entirely outside the table’s box, and that includes the table’s margin.  The two are still tied together by the generation of an anonymous box, but the upshot is that if you give the table left and right margins, then the caption does not follow suit.  If you give the table a top margin, it pushes the caption away from the table. This is the behavior evinced by Firefox 1.0, and as unintuitive as it might be, it’s what the specification demands.
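
To make that concrete, given a ruleset like this:

    table { margin: 2em 1em; }

    /* Per CSS2.1 section 17.4 (the Firefox 1.0 behavior), the caption sits
       entirely outside the table's box, margins included: the 1em side
       margins do nothing to the caption's width, and the 2em top margin
       opens up between the caption and the table.  Safari instead keeps
       the caption inside the table's margin box. */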

The second piece of strangeness was found in IE/Win.  What I’d done was simply to say that some cell borders should be solid—nothing more complicated than border-bottom: 1px solid.  The idea was that it would, as borders do, pick up the foreground color of the cell, but IE/Win had other ideas.  As best I could tell, the borders were a light gray.  You can see it happen in the testcase I constructed to create the images in this entry.  Explicitly specifying a border color fixes the problem, of course, but it was a bit of weirdness I thought I’d pass along in case anyone runs into the same thing.
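
For anyone who wants to compare, the two declarations side by side:

    td { border-bottom: 1px solid; }        /* IE/Win draws this light gray */
    td { border-bottom: 1px solid black; }  /* an explicit color renders as
                                               expected (assuming black text) */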


Mickey Prints

Published 21 years, 1 week past

Since Kat and I were going to be visiting Florida so often last year and this, which of course meant visiting Disney World a lot, we decided to buy annual passes.  I was quite interested to find that when you buy an annual pass, the Disney folks take the prints of your right hand’s first and second fingers.  That data is associated with the card; whether it’s encoded onto the card’s strip or not, I don’t know.  But either way, some of your biometric data is associated with your Disney pass.  When you enter the park, you run the pass through the turnstile and stick your fingers into a reader.  If the fingers don’t match the card, you can’t get in, so you can’t share an annual pass with anyone else.

Now, suppose the Disney database stores that biometric data.  They then have that data tied to a credit card number, purchasing patterns in the parks, probably a home address and phone number, and so on.  Interesting.  Guess what?  As of 2 January 2005, Disney is doing that for all passes: day passes, park hopper passes, every kind of pass.  Get a pass, get your fingers scanned.  (Okay, yes, you can opt out and be required to show photo ID instead, but how many people will bother?)

That’s a whole lot of biometric data associated with a whole lot of consumer data.  Interesting, don’t you think?


Don’t Care About Market Share

Published 21 years, 1 month past

In a fashion vaguely reminiscent of the process by which weeds keep growing back no matter how you try to rid yourself of them, the question of browser market share has once again been rearing its foul, misshapen head.  Dan kicked off a round of it over at Simplebits, but it’s recently been popping up other places as well.  I heard discussions about market share at SES Chicago, perhaps unsurprisingly, but I’ve also been seeing the question on various mailing lists and other forums.

The only thing more frustrating than the persistent recurrence of this unnecessary question is the inappropriate gravity it seems to acquire in so many minds.

Look, I’ll make this very simple for everyone.  If you’re trying to figure out what browsers to support (or not) in terms of layout consistency on a given site, then the answer is very easy.  Whatever the site’s access logs tell you.  End.  Of.  Story!

For example, the stats for the past few days’ worth of visitors to Complex Spiral Consulting tell me the following:

User Agent     Portion of hits
Firefox        43%
IE6            30.8%
Mozilla         8.8%
Safari          8.6%
Opera           2.4%

(For those who are curious, IE5.5 makes up 0.8% of hits.  Various flavors of IE5.x below IE5.5 total roughly 1.2%, but note that Windows and Mac users are lumped together there.)

Those statistics tell me quite a bit about the people who visit the CSC site, and I can use that information to decide what to do about browser support.  You know what those numbers tell you about which browsers to support (or not) in your designs for sites on which you work?  Absolutely squat.  Anyone who uses those access statistics to make decisions for their own work is a fool, and a misinformed fool at that.

In every design, we have to ask what browsers need to have a consistent experience, which ones can be given a reduced experience, and which ones get no design at all.  The user logs from another site are useless in trying to make this decision.  The “global statistics” from firms like WebSideStory are just as useless in this case.  They may be entirely accurate, but they are also entirely irrelevant when it comes to making design support decisions.  The only stats that matter are the ones that come from the site you’re designing.

In a like manner, I don’t care if you think visitors to your site or some other favorite site of yours are an accurate reflection of the overall Internet population or not: that opinion is similarly irrelevant.  It’s rather like me claiming that the people who come to our annual holiday party are an accurate reflection of partygoers in general.  Maybe they are and maybe they aren’t, but either way I don’t think you should plan your all-night rave to accommodate the kinds of people who drop by our house to have homemade bread and soup and chat about babies, politics, science-fiction movies, and the weather.  And vice versa.

(Do remember that your site’s stats may reflect its current behavior instead of your potential audience.  If your site is already broken past the point of usefulness in Safari, then you’re going to see very low Safari numbers.  Make sure that you’re comparing apples to apples, and only compare the numbers in your access logs for browsers that can already use the site.)

As for the related question of “at what percentage level do I decide a browser isn’t worth bothering about”—well, that’s really up to you, isn’t it?  I certainly can’t tell you when it’s worthwhile to stop worrying about IE5.0, or Netscape 4.7, or Mosaic 1.2.  I know what I think is appropriate for the sites I work on—and the process of finding the answer is different for every site.  It has to be, because every site is different.

Now, if you want to share your user demographics with anyone who wants a peek, hey, have fun with that.  If data exhibitionism is your thing, who am I to judge?  Just don’t pretend that the bits of data you’re exposing to the world are representative of everyone else’s, because I guarantee you that they are not.  As for anyone who happens to glance at your data: I hope they realize the same thing.


SES Chicago Report

Published 21 years, 1 month past

Due to some weather-related travel upheavals, I didn’t get to spend as much time at SES Chicago as I would have liked—I ended up flying in Tuesday afternoon, speaking before lunch Wednesday, and leaving Wednesday evening.  Still, the panel went very well, the speakers were quite gracious, and I didn’t even need a fire extinguisher.

Based on what was said in the panel and the fleeting conversations I was able to have (sometimes from the podium) with Matt Bailey and Shari Thurow, here’s what I took away from the conference:

  • Semantic markup does not hurt your search engine rankings.  It may even provide a small lift.  However, the lift will be tiny, and it isn’t a purely semantic consideration.  Search engines seem to use markup the same way humans do: headings and elements that cause increased presentational weight, such as <strong> and <i>, will slightly raise the weight of the content within said elements.  So even the presentational-effect elements can have an effect.  The panelists also stated that if you’re using elements solely to increase ranking, you’re playing a loser’s game.
  • The earlier content sits in the document, the more weight it has… but again, this is a very minor effect.
  • Hyperlink title attribute and longdesc text have no effect, positive or negative, on search engine ranking.  The advice given was to have a link’s title text be the same as its content, and that anything you’d put into a longdesc should just go into the page itself.  (Remember: this advice is ruthlessly practical and specific to search-engine ranking, not based on any notions of purity.)
  • Having a valid document neither helps nor hurts ranking; validation is completely ignored.  The (paraphrased) statement from a Yahoo! representative was that validation doesn’t help find better information for the user, because good information can (and usually does) appear on non-valid pages.
  • Search engine indexers don’t care about smaller pages, although the people who run them do care about reducing bandwidth consumption, so they like smaller pages for that reason.  But not enough to make it affect rankings.
  • A lot of things that we take for granted as being good, like image-replacement techniques and Flash replacement techniques, are technologically indistinguishable from search-engine spamming techniques.  (Mostly because these things are often used for the purpose of spamming search engines.)  Things like throwing the text offscreen in order to show a background image, hiding layers of text for dynamic display, and so forth are all grouped together under the SEO-industry term “cloaking”.  As the Yahoo! guy put it, 95% of cloaking is done for the specific purpose of spamming or otherwise rigging search engine results.  So the 5% of it that isn’t… is us.  And we’re taking a tiny risk of search-engine banishment because our “make this look pretty” tools are so often used for evil.
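
For example, classic offscreen image replacement, the kind countless standards-oriented designs use for mastheads, looks like this (a generic example, not anyone’s production code):

    h1 {
        background: url(masthead.png) no-repeat;  /* show the pretty version... */
        text-indent: -9999px;                     /* ...and throw the real text
                                                     far offscreen */
    }

To an indexer, text that has been shoved offscreen is indistinguishable from a pile of hidden spam keywords; only the intent differs.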

Reading that last point, you might be wondering: how much of a risk are you taking?  Very little, as it turns out.  Search engine indexers do not try to detect cloaking and then slam you into a blacklist—at least, they don’t do that right now.  To get booted from a search engine, someone needs to have reported your site as trying to scam search engines.  If that happens, then extra detection and evaluation measures kick in.  That’s when you’re at risk of being blacklisted.  Note that it takes, in effect, a tattletale to make this even a possibility.  It’s also the case that if you find you’ve been booted and you think the booting unfair, you can appeal for a human review of your site.

So using standards will not, of itself, increase your risk of banishment from Google.  If someone claims to Google that you’re a dirty search spammer, there’s a small but nonzero chance that you’ll get booted, especially if you’re using things like hidden text.  If you do get booted and tell Google you aren’t a spammer, and they check and agree with you, you’ll be back in the index immediately.

So there’s no real reason to panic.  But it’s still a bit dismaying to realize that the very same tools we use to make the Web better are much more often used to pollute it.  I don’t suppose it’s surprising, though.

Due to my radically compressed schedule, I was unfortunately not able to ask most of the questions people suggested, and for that I’m very sorry.  There was some talk of having me present at future SES conferences, however, so hopefully I’ll have more chances in the future.  I’ll also work the e-mail contacts I developed to see what I can divine.


S5 1.1b2

Published 21 years, 1 month past

Behold: v1.1 beta 2 of S5.  This version has a few changes, all of which are being floated as trial balloons.  Feedback on them all is appreciated.

  • Change in file structure.  Now the ui/ directory will contain only directories.  Thus, the default theme and scripts live in ui/default/.  The reason for this is so that other themes can be put in the ui/ directory without things getting too confusing.  For example, the current beta version has a v11b2/ directory (the beta’s version of ui/) that contains default/ and i18n/.  Switching between them does require manual editing of the XHTML file, as I decided to punt on dynamic theme switching for now.  This does, however, let an author carry around a single ui/ directory with a number of themes contained inside.  That way, he might have four presentations to give, each one with a different theme, but all of them sharing the same ui/ directory.

    Another advantage to changing the directory structure is that v1.0 presentations won’t be compatible with v1.1 themes.  That’s actually a good thing, since the XHTML structure changed in small but significant ways in v1.1b1.

    I thought about further splitting the default directory into “script” and “style” subdirectories, but this seemed like a bit of overkill.  However, I’m starting to wonder how to handle things like IE/Win behaviors, which I suspect will be needed before too much longer.  Why?  Look at the images in the v1.1b2 testbed: they all have flat white backgrounds.  I’d like to turn them all into PNGs with alpha channels, and I’d like to have those work as intended in IE/Win.  The only way to make that work right now is via behaviors like this one.  I’ll want to drop those behaviors into the default/ directory—my leading candidate would actually be IE7, once it gets close to being stable, mostly because it would add quite a lot to theme authors’ CSS toolkits.  But all those behavior files could clutter up the directory, for which the easiest fix is to drop them all in a subdirectory… you probably see where I’m going with this.

    That’s all for another version, though; v1.1 won’t have any behaviors packaged by default.  It’s just on my radar, and I thought I’d toss it out to see if anyone has bright ideas.

  • A “header” file for themes.  You can see an example at v11b2/i18n/00_head.txt.  Briefly, this file contains material destined for the head element of any presentation that’s going to use this theme; there’s a sketch of one just after this list.  In the case of i18n, the only thing that changes is the link element pointing to slides.css.  Nevertheless, the header file provides all of the link and script elements that should appear in the presentation file.  This should make it easier for an editor program to just grab the block and paste it over the existing block in the presentation file.  It will also reduce ambiguity for anyone doing a manual edit to change themes.  (Open header file, drag-select, copy; open presentation file, drag-select, paste, save, done.)

  • Changes to incremental class names.  In earlier versions, incremental-display objects were marked with a class of inc, and any list that should start out already showing the first list item got a class of psf.  I’ve changed those to incremental and show-first.  The new names require a little more typing, but they’re much less ambiguous and therefore much more author-friendly.  I’m interested to see if anyone has ideas for better names, especially for show-first.
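
As promised, here’s a sketch of what a theme’s header file contains.  The exact elements, attributes, and paths will vary from theme to theme (and depend on where your theme directory lives), so take this as illustrative rather than verbatim:

    <!-- 00_head.txt: everything destined for the presentation's head element -->
    <link rel="stylesheet" href="ui/i18n/slides.css" type="text/css" media="projection" />
    <script src="ui/default/slides.js" type="text/javascript"></script>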

As for the issue of licensing, I guess I’m a little further along, but not all the way there yet.  The discussion did help me focus on what I want.

  • Presentation content should be under whatever license/terms the author desires.  I do not want to force all S5 presentation content to be public domain, or GPL’ed, or whatever.  If someone wants to give a highly confidential talk using S5, they should be able to put the most restrictive license in the Universe on the content… but only on the content.

  • Themes should similarly have their licenses, or lack thereof, determined by each one’s author.  If Joe Consultant wants to create a MyCoolCo theme and release it under copyright so that anyone can use the theme for their own presentations but nobody is allowed to re-use his images/look-and-feel/whatever outside the S5 theme, there should be nothing that stops him from doing so.

  • The S5 system (JS, core CSS, and the way they’re put together) should be forever free to use by anyone who wants to do so.  It should be open for future development, in the event that I stop developing it and someone wants to keep going.  This would also allow anyone to fork off their own variant on S5 at any time, but that’s okay too.

    Here’s where it gets a little tricky: S5 should be able to be incorporated into any other project, commercial or not, without restriction.  Attribution to the original source is to be strongly encouraged, but not an absolute requirement.  But at no time, and in no way, should use of S5 in a closed environment ever cause a back-flow of restrictions to the original project.

In other words, anyone should be able to use S5 or a derivative work in their for-profit, wholly proprietary, patented software (or in any other circumstance).  They can even make modifications, if they like.  However, there should be no way for their use of it in a closed system to infect the original S5 source, and if their modifications make it into a future version of S5, the same should hold true.  I don’t even know if that’s possible, but it’s in the spirit of the Share Alike terms in the Creative Commons licenses.  You want to build S5 support into your $49.95 fully copyrighted and licensed editor?  Fine, no problem.  You want to extend S5 to do more cool stuff?  Also fine, but freely contribute the changes back to the place you got the code in the first place.  Don’t try to claim the original project has no right to the additions you made to it, or that the addition of those changes to the original project makes the whole thing yours.

(Not that I think any of you would do such a thing, but I have to think ahead to when S5 catches the interest of someone… well, let’s say less scrupulous.)

In a sense, I want to prevent major infection of licensing terms in both directions.  I’m not entirely sure where that leaves me, but I’d like to work it out before 1.1 goes final.


Unjustified Caption Text

Published 21 years, 1 month past

I just stumbled across a browser bug this evening that I thought you all might like to know about.  So we all know that IE6/Win supports text-align: justify, right?  Wrong.  Sorry, but it’s the truth: IE6 does not fully support text-align: justify.  True, it mostly supports that declaration, but if you apply said declaration to a caption element, guess what?  You get centered, non-justified text instead.  It’s very much as though, in the case of caption elements, IE6 replaces justify with center.
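
A reduced testcase is about as minimal as they come:

    <style type="text/css">
      /* IE6/Win renders this caption centered and unjustified, as though
         'justify' had been silently swapped for 'center' */
      caption { text-align: justify; }
    </style>
    <table>
      <caption>Some sufficiently long caption text, enough to wrap onto
      two or more lines, since justification is only visible when the
      text wraps.</caption>
      <tr><td>data</td></tr>
    </table>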

I just thought I’d pass this along in case it might help anyone else avoid some furious head-scratching.  And, I’m sorry to say, I don’t know of a workaround.  If anyone finds one, please leave a comment to that effect.

This is a perfect illustration of how difficult it is to do comprehensive CSS testing.  Every CSS support chart I’ve ever seen has marked down IE6 as supporting justified text; I mean, why wouldn’t they?  It clearly did so… for the specific test cases used to create that support chart.  The odds of a test page including a caption element are vanishingly small, unless of course we’re talking about a test page that includes every single XHTML element in existence.  And to test every element known for every property-value combination… well, I’ve talked about that before.  No need to trample the same ground even flatter.

