Posts in the Tech Category

S5 Update

Published 21 years, 4 months past

I know it’s been a while since the last beta version of S5 was released, but between doing client work, flying to and from Albany (speaking of that, big ups to Dan “So Fine” Feinberg, Ed “The Shark” Skawinski, Ric “Darin” DiDonato, and the rest of the ITU Crew), diversions into PHP hacking, judging a markover contest, starting ballroom dancing classes, and spending time with my family, time has been a wee bit tight. Things will only get worse once March rolls into town, so I’m going to try to push 1.1 into final status before February is done.

In the meantime, I wanted to point to some cool things that I’ve heard about with regard to S5.

S5 was adapted to create an online tour of Epocrates, a popular medical reference package for handhelds. Kat uses it, as a matter of fact.
Ludovic Dubost, developer of XWiki, created an XWiki-based S5 creator, which you can read more about in his blog entry about it.
Pelle Braendgaard launched soapbx.com, a Web-driven S5 editor. You can pick a theme, write the content in a wiki-like form, and get a slideshow. It was apparently developed using Ruby On Rails.
Not quite ten hours after getting Pelle’s e-mail, a message from Lucas Carlson arrived regarding the creation of his own S5 creator: s5presents.com. It too was developed using Ruby On Rails.
Earlier today, Eric Eggert reported that S5 got coverage in the German version of Internet World magazine. I’m sort of hoping to see a scan of the article at some point. (Is it copyright infringement if I possess a scanned copy but can’t understand what it says? Just wondering.) Update: I’ve seen a copy of the article, so there’s no more need for scans.

In other, less specific news, I know that people have created or are working on creating translators of one kind or another. A popular request seems to be an OPML-to-S5 translator of some kind, and there’s always the Keynote-to-S5 idea. So I’m going to throw open comments for people to post links to S5-related projects, translators, and what have you. Heck, if you’ve recently done a presentation using S5, let’s see it, especially if you created a new theme. Just please leave this post’s comments clear of bug reports or feature requests. As of this writing, you can drop those on the S5 1.1b3 post, or else wait for the forthcoming post on 1.1b4. I hope that’ll go up in the next couple of days, but no promises.

S5 Update was published on Wednesday, January 26th, 2005.
It was assigned to the S5 category.
There have been thirteen replies.

Gatekeeper In Perspective

Published 21 years, 4 months past

So when I said on Monday:

Got feedback? Let’s hear it.

…what I actually meant was:

Got feedback about the code or how the package works once it’s installed in WordPress? Let’s hear it.

I should have realized that otherwise, the comments would turn into an argument about comment spam, fighting it, ways the general idea could be defeated, and more. Which they did.

Look, folks, despite what some people might tell you, I’m not so arrogant as to think that I could single-handedly solve the comment spamming problem for all time. Even if I were, I very much doubt I’d be so clueless as to think that WP-Gatekeeper was that solution. And if both those things were the case, I’m pretty darned near certain I would have very explicitly made the claim of having beaten the spammers. Likely in big, boldfaced, red, capitalized, blinking letters, plus a background MIDI of “We Are The Champions”.

WP-Gatekeeper is not going to stop every possible comment spam attack, human or automated, for the rest of time. Neither is any other defense you can name, without exception. There may be measures that currently have 100% resistance to scripted attacks. They will one day fail—I can pretty much guarantee it. Even today, they are defeatable by actual humans sitting at computers and posting comment spam on every site they find. That kind of spamming is very, very rare, but it happens. I had such an incident within the last month. If I hadn’t been keeping a close eye on new comments just then, I’d likely have missed it completely.

I’m fully aware that there are ways a spambot could defeat WP-Gatekeeper. At the moment, none of them can. That will one day change, of course, assuming challenges become at all popular. Comment spam and the fighting thereof is a dance, a tennis match, an arms race. Neither side will ever win. As one side adopts a new tactic, the other side will move to counter it. The countermeasure will itself be countered. And so it goes. Eventually, either spambots or spam defenses (or the two in combination) will become so advanced that they’ll gain self-awareness, and then we’ll all be royally hosed.

I know this. You know this. Let’s move on from there, okay?

In the end, the goal is to add another arrow to the quiver at the disposal of spam fighters. Think this approach is wrongheaded, annoying, or otherwise pointless? Fine. Don’t use it. For those who want to add this kind of capability—and since I instituted it on meyerweb, I’ve had not a single piece of spam make it onto the site or hit the moderation queue, whereas in my pre-defense days, I’d get at least twenty every day—then the package is there. You can combine it with other defenses, if you like, for even more coverage. I may upgrade it in the future, depending how far I get in learning PHP, mySQL, and form handling, and what feedback I get from people who know PHP better than I do. I may not, in which case the system as it stands is effective, and probably will be for a while. Even if I do one day abandon further development, the code is out there for someone else to improve if they so choose.

In the meantime, if there’s anyone who is using WP-Gatekeeper or has looked at the code, and has feedback on the coding or the way it works for the administrator of a WP blog, please feel free to share. Also, if anyone can point me to an example of PHP code for collecting all of the HTTP_VARS that are returned by an XHTML form and then looking through them, even when the variable names aren’t necessarily known ahead of time, I’d really like to see it. Thanks.
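
The general shape of that last request can be sketched in Python rather than PHP. This is only an illustration of the idea, not WP-Gatekeeper code: the function names and the `gk_` field prefix are hypothetical, and it assumes the form arrives as an ordinary URL-encoded body. It walks every submitted field without knowing the names ahead of time, then filters on whatever pattern is of interest.

```python
from urllib.parse import parse_qsl


def collect_fields(body):
    """Parse a URL-encoded form body into a list of (name, value) pairs."""
    # keep_blank_values=True so that fields submitted empty still show up.
    return parse_qsl(body, keep_blank_values=True)


def find_fields(body, prefix):
    """Return the fields whose names start with prefix (names not known in advance)."""
    return {
        name: value
        for name, value in collect_fields(body)
        if name.startswith(prefix)
    }
```

Calling `find_fields(body, "gk_")` on a submitted comment would return just the fields carrying that (made-up) prefix, whatever the rest of their names happen to be.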

Gatekeeper In Perspective was published on Wednesday, January 26th, 2005.
It was assigned to the Tools and WordPress categories.
There have been twenty-six replies.

WP-Gatekeeper

Published 21 years, 4 months past

In my post on rel="nofollow", I mentioned the use of easily human-comprehensible challenge questions like “What is Eric’s first name?” as a way to defeat spambots. There were two points made in the comments that I had considered but hadn’t brought up, given that they were tangential to the point of the post. They were:

Spammers could set up a database of questions and answers used on sites. They might or might not share it with each other, but the point is that if I set up “What is Eric’s first name?” as the sole challenge, the human running the spambot could build the ability to answer the question into the spambot, thus defeating it. Quite true.
In order to make it more difficult to do this, there could be a set of challenges from which one is picked randomly. So I might have three challenges asking for the first names of myself, Kat, and Carolyn. Every time a comment form is delivered to a browser, one of the three challenges, picked at random, is included. This would make it more difficult for a human spammer, since he (or she) would have to find all of the challenge questions, work out the responses, and build them all into a database, keyed to each site’s domain.
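
The second point can be sketched roughly as follows. This is a minimal Python illustration, not the actual WP-Gatekeeper code (which is PHP); the challenge keys, the `CHALLENGES` table, and both function names are assumptions made up for the example, as is the choice to ignore case and surrounding spaces when checking an answer.

```python
import random

# Hypothetical challenge pool: key -> (question, expected answer).
CHALLENGES = {
    "eric": ("What is Eric's first name?", "Eric"),
    "kat": ("What is Kat's first name?", "Kat"),
    "carolyn": ("What is Carolyn's first name?", "Carolyn"),
}


def pick_challenge(rng=random):
    """Return (key, question) for one challenge chosen at random."""
    key = rng.choice(sorted(CHALLENGES))
    return key, CHALLENGES[key][0]


def check_answer(key, submitted):
    """Accept the answer if it matches, ignoring case and outer whitespace."""
    if key not in CHALLENGES:
        return False
    expected = CHALLENGES[key][1]
    return submitted.strip().lower() == expected.lower()
```

The key travels with the comment form so the server knows which question was asked; re-keying the challenges, as described further down, would amount to changing those keys.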

So over the weekend, I built as a proof of concept (and also as an exercise in learning more about how PHP, mySQL, and WordPress work) a WordPress package to do what is described in the second point above. It’s called WP-Gatekeeper, available from my WordPress Tools page, and if you’re brave you can give it a try. Why brave? Because the installation involves hacking a few WP files and adding a new entry to the admin menu, not to mention firing up a plugin. And if you do it in the wrong order, you can break commenting for a short period. There are DIY installation instructions on the WP-Gatekeeper page, for those who still want to proceed. You also need to be brave because if you install it, you’re running code written (well, actually, adapted) by someone with only beginner-to-intermediate PHP skills. I’ve been testing it locally and everything seems fine, but this is even more “use at your own risk” software than usual. Got it? Good.

Accordingly, WP-Gatekeeper is currently considered beta software. I’m making it available now in the hopes that people more experienced than I with PHP and WordPress can take a look, hack on the code, and make it more efficient and the whole package easier to install. I’m already aware that in WP 1.5, adding the admin page is much easier and doesn’t require hacking files, but I wrote WP-Gatekeeper in 1.2 and want it to work there, since that’s the latest public version. Thus, any optimizations should work in 1.2. When 1.5 (or whatever the next version number is) comes out, then I’ll worry about it.

Of course, there’s still nothing that prevents a spammer from registering questions and answers into a database, but the admin page makes it easy for a blogger to add, remove, modify, and re-key the challenges. That will make tracking them more difficult, so long as a blogger puts effort into maintaining the list of challenges. It gets back, in the end, to maintaining your blog. The more maintenance you put into something, the better its shape will stay.

I’m also interested in suggestions for how the overall system could be made harder to bypass with a bot, and easier for a WP admin to run. One feature I plan to add before going final is the ability to have the keys replaced on a regular basis, with the interval (daily/weekly/monthly/etc.) set by the admin. The other driving consideration here is that the system should be fully capable of working even if JavaScript is disabled. It’s an accessibility thing; just go with me on this. (Accessibility is the main reason I did this rather than install an image CAPTCHA solution, as it happens.)

Got feedback? Let’s hear it.

WP-Gatekeeper was published on Monday, January 24th, 2005.
It was assigned to the Tools and WordPress categories.
There have been sixty-eight replies.

More Spam To Follow

Published 21 years, 4 months past

So… rel="nofollow". Now there’s a way to deny Google juice to things that are linked. Will it stop comment spam? That’s what I first thought, but I’ve come to realize that it’ll very likely make the problem worse. In the last few hours, I’ve been hearing things that support this conclusion.

First, the by-now required disclaimer: I think it’s great that Google is making a foray into link typing, and I don’t think they should reverse course. For that matter, it would be nice if they paid attention to VoteLinks as well, and heck, why not collect XFN values while they’re at it? After all, despite what Bob DuCharme thinks, the rel attribute hasn’t been totally ignored these past twelve years. There is link typing out there, and it’s spreading. Why not allow people to search their network of friends? It’s another small step toward Google Grid… but I digress.

The point is this: rather than discourage comment spammers, nofollow seems likely to encourage them to new depths of activity. Basically, Google’s move validates their approach: by offering bloggers a way to deny Google juice, Google has acknowledged that comment spam is effective. This doesn’t mean the folks at Google are stupid or evil. In their sphere of operation, getting comment spam filtered out of search results is a good thing. It improves their product. The validation provided to spammers is an unfortunate, possibly even unanticipated, side effect.

There is also the possibility, as many have said, that nofollow will harm the Web and Google’s results, because blindly applying a nofollow to every comment-based link will deny Google juice to legitimate, interesting stuff. That might be true if nofollow is used like a sledgehammer, but there are more nuanced solutions aplenty. One is to apply nofollow to links for the first week or two after a comment is posted, and then remove it. As long as any spam is deleted before the end of the probation period, it would be denied Google juice, while legitimate comments and links would eventually get indexed and affect Google’s results (for the better).
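
As a rough sketch of that probation idea, in Python and under stated assumptions: the fourteen-day window is just one reading of “the first week or two”, and the function names are invented for the example rather than taken from any blogging package.

```python
from datetime import datetime, timedelta
from html import escape

# Assumed probation length; the text only says "the first week or two".
PROBATION = timedelta(days=14)


def rel_for_comment_link(posted_at, now):
    """Return "nofollow" while the comment is on probation, else None."""
    if now - posted_at < PROBATION:
        return "nofollow"
    return None


def render_link(url, text, posted_at, now):
    """Build an anchor tag, adding rel="nofollow" only during probation."""
    rel = rel_for_comment_link(posted_at, now)
    rel_attr = f' rel="{rel}"' if rel else ""
    return f'<a href="{escape(url)}"{rel_attr}>{escape(text)}</a>'
```

Because the attribute is decided at render time, nothing has to go back and edit stored comments once the probation period runs out.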

In such a case, though, we’re talking about a managed blog—exactly the kind of place where comment spam had the least impact anyway. Sure, occasionally the Googlebot might pick up some spam links before the spam was removed from the site, but in general spam doesn’t survive on managed sites long enough to make that much of a difference.

Like Scoble, where I might find nofollow of use would be if I wanted to link to the site of a group or person I severely disliked in order to support a claim or argument I was making. It would be a small thing, but still useful on a personal level. (I’d probably also vote-against the target of such a link, on the chance that one day indexers other than Technorati‘s would pay attention.)

No matter what, the best defense against comment spam will be to prevent it from ever appearing in the first place. There are of course a variety of methods to accomplish this, although most of them seem doomed to fail sooner or later. I’m using three layers of defense myself, the outer of which is currently about 99.9% effective in preventing spam from ever hitting the moderation queue, let alone making it onto the site. One day, the layer’s effectiveness will very suddenly drop to zero. The second layer was about 95% effective at catching spam when it was the outer layer, and since it’s content-based will likely stay at that level over time. The final layer is a last-ditch picket line that only works in certain cases, but is quite effective at what it does.

So what are these layers, exactly? I’m not telling. Why not? Because the longer these methods stay off the spammers’ radar, the longer the defenses will be effective. Take that outer layer I talked about a moment ago: I know exactly how it could be completely defeated, and for all time. Think I’m about to explain how? You must be mad.

The only spam-blocking method I can think of that has any long-term hope of effectiveness is the kind that requires a human brain to circumvent. As an example, I might put an extra question on my comment form that says “What is Eric’s first name?” Filling in the right answer gets the post through. (As Matt pointed out to me, Jeremy Zawodny does this, and that’s where I got the idea.) That’s the sort of thing a spambot couldn’t possibly get right unless it was specifically programmed to do so for my site—and there’s no reason why any spammer would bother to program a bot to do so. That would leave only human-driven spam, the kind that’s copy-and-pasted into the comment form by an actual human, and nothing besides having to personally approve every single post will be able to stop that completely.

So, to sum up: it’s cool that Google is getting hip to link typing, even though I don’t think the end result of this particular move is going to be everything we might have hoped. More active forms of spam defense will be needed, both now and in the future, and the best defense of all is active management of your site. Spammers are still filthy little parasites, and ought to be keelhauled. In other words: same as it ever was. Carry on.

More Spam To Follow was published on Friday, January 21st, 2005.
It was assigned to the Web category.
There have been thirty-five replies.

S5 1.1b3

Published 21 years, 5 months past

Well, there was time off for the holidays, but now S5 is back and ready to increment its beta number. So, without too much ado: S5 1.1b3 (248KB ZIP file). Here’s the current testbed presentation, for those who just want to play around with it. Because of the long holiday break, I want to add another beta round or two just to work out as many kinks as possible. So this isn’t the last version before going final on 1.1; still, I’m interested in any problems that people encounter.

There’s really only one notable change from the previous version. I incorporated Jordan Liggitt’s “type slide number” code into this version. Why his, when others have done similar things? Because his version was well-marked with comments, and thus easy for me to figure out what he’d done and how he’d done it. So here’s how it works:

If the user types a number (multi-digit is allowed), the script stores the number. Inputting any non-number key clears the entered number.
If the user hits Enter/Return while there is a number stored, the slide show jumps to that slide. Any attempt to jump directly to a slide past the end of the slide show results in no action, although the number is still cleared.
Hitting any of the “Next” or “Previous” keys while there is a number entered causes the slide show to skip the number entered in the appropriate direction. Thus, entering “3” and hitting the space bar would jump forward three slides; entering 5 and hitting Page Up would jump backward five slides. Skipping past the end of the slide show will drop you on the title slide, which is something I’m thinking about changing, though I’m not entirely certain in what way.
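
The three rules above can be modeled as a small state machine. What follows is a Python sketch of the logic only, not the S5 JavaScript itself. The key names, the zero-based numbering with the title slide as slide 0, and the clamping of plain one-step moves and backward skips are all assumptions made for the illustration.

```python
# Hypothetical key names; real key handling depends on the browser.
DIGITS = set("0123456789")
NEXT_KEYS = {" ", "ArrowRight", "ArrowDown", "PageDown"}
PREV_KEYS = {"ArrowLeft", "ArrowUp", "PageUp"}


class SlideNavigator:
    def __init__(self, slide_count):
        self.slide_count = slide_count
        self.current = 0      # assumption: slide 0 is the title slide
        self.pending = ""     # digits typed so far

    def press(self, key):
        """Handle one key press and return the resulting slide number."""
        if key in DIGITS:
            self.pending += key
            return self.current
        typed = int(self.pending) if self.pending else None
        self.pending = ""     # any non-number key clears the entry
        if key == "Enter":
            # Jumping past the end does nothing (the number is still cleared).
            if typed is not None and typed < self.slide_count:
                self.current = typed
        elif key in NEXT_KEYS:
            self._move(1 if typed is None else typed, typed is not None)
        elif key in PREV_KEYS:
            self._move(-1 if typed is None else -typed, typed is not None)
        return self.current

    def _move(self, delta, numbered):
        target = self.current + delta
        if target >= self.slide_count:
            # Numbered skip past the end lands on the title slide;
            # a plain "next" staying on the last slide is an assumption.
            target = 0 if numbered else self.slide_count - 1
        elif target < 0:
            target = 0        # assumption: backward moves stop at the title
        self.current = target
```

With a ten-slide deck, typing 3 and pressing the space bar moves from slide 0 to slide 3, and typing 5 followed by Page Up from slide 7 lands on slide 2.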

I’m mulling over which keys should invoke which jumping behavior. For example, a couple of times I’ve typed a slide number and then hit the space bar to advance directly to that slide. Instead, I jumped forward by that number, which is correct but obviously not what I was subconsciously expecting. So I’m thinking about further restricting the keys that trigger the “jump n slides” behavior. Anyone have suggestions based on other slide show software?

At this stage, I’m likely to put off adding the multiple-author meta that I toyed with in earlier versions. The general need is still there, but I’m just not able to think the problem through with the kind of clarity I want. It will have to wait for another day. I’m also dithering a bit about the licensing, though at this point I’m leaning pretty heavily toward using Expat. My hesitation is largely based on my very desire to make the right choice so that I never, ever have to worry about it again, you know?

Anyway, as always, feedback is welcome.

S5 1.1b3 was published on Thursday, January 13th, 2005.
It was assigned to the S5 and Tools categories.
There have been thirty-seven replies.

Tabular Weirdness

Published 21 years, 5 months past

Recently I was doing some table styling for a client and ran into what I can only call tabular weirdness. There were two different things that I stumbled across, and interestingly, they were the kinds of problems you wouldn’t be likely to encounter in layout tables. These would come up much more often in data tables.

In the first case, the general idea was to put some space between the tables and the surrounding material, but as these were data tables, they came with captions. So I of course put the caption text in caption elements. That’s when things started to get inconsistent.

To be more precise, the problems began after I left Safari to check the page in other browsers. In Safari, you see, the caption’s element box is basically made a part of the table box. It sits, effectively, between the top table border and the top margin. That allows the caption’s width to inherently match the width of the table itself, and causes any top margin given to the table to sit above the caption. Makes sense, right? It certainly did to me.

However, according to section 17.4 of CSS2.1 and the figure that accompanies it, the caption sits entirely outside the table’s box, and that includes the table’s margin. The two are still tied together by the generation of an anonymous box, but the upshot is that if you give the table left and right margins, then the caption does not follow suit. If you give the table a top margin, it pushes the caption away from the table. This is the behavior evinced by Firefox 1.0, and as unintuitive as it might be, it’s what the specification demands.

The second piece of strangeness was found in IE/Win. What I’d done was simply said that some cell borders should be solid, nothing more complicated than border-bottom: 1px solid. The idea was that it would, as borders do, pick up the foreground color of the cell, but IE/Win had other ideas. As best I could tell, the borders were a light gray. You can see it happen in the testcase I constructed to create the images in this entry. Explicitly specifying a border color fixes the problem, of course, but it was a bit of weirdness I thought I’d pass along in case anyone runs into the same thing.

Tabular Weirdness was published on Tuesday, January 11th, 2005.
It was assigned to the Browsers and CSS categories.
There have been fifteen replies.

Mickey Prints

Published 21 years, 5 months past

Since Kat and I were going to be visiting Florida so often last year and this, and therefore we of course had to visit Disney World a lot, we decided to buy annual passes. I was quite interested that when you buy an annual pass, the Disney folks take the prints of your right hand’s first and second fingers. That data is associated with the card; whether it’s encoded onto the card’s strip or not, I don’t know. But either way, some of your biometric data is associated with your Disney pass. When you enter the park, you run the pass through the turnstile and stick your fingers into a reader. If the fingers don’t match the card, you can’t get in, so you can’t share an annual pass with anyone else.

Now, suppose the Disney database stores that biometric data. Now they have that data tied to a credit card number, purchasing patterns in the parks, probably a home address and phone number, and so on. Interesting. Guess what? As of 2 January 2005, Disney is doing that for all passes: day passes, park hopper passes, all kinds of passes. Every kind of pass. Get a pass, get your fingers scanned. (Okay, yes, you can opt out and be required to show photo ID, but how many people will bother?)

That’s a whole lot of biometric data associated with a whole lot of consumer data. Interesting, don’t you think?

Mickey Prints was published on Sunday, January 9th, 2005.
It was assigned to the General and Tech categories.
There have been twenty-nine replies.

Don’t Care About Market Share

Published 21 years, 6 months past

In a fashion vaguely reminiscent of the process by which weeds keep growing back no matter how you try to rid yourself of them, the question of browser market share has once again been rearing its foul, misshapen head. Dan kicked off a round of it over at Simplebits, but it’s recently been popping up other places as well. I heard discussions about market share at SES Chicago, perhaps unsurprisingly, but I’ve also been seeing the question on various mailing lists and other forums.

The only thing more frustrating than the persistent recurrence of this unnecessary question is the inappropriate gravity it seems to acquire in so many minds.

Look, I’ll make this very simple for everyone. If you’re trying to figure out what browsers to support (or not) in terms of layout consistency on a given site, then the answer is very easy. Whatever the site’s access logs tell you. End. Of. Story!

For example, the stats for the past few days’ worth of visitors to Complex Spiral Consulting tell me the following:

User Agent	Portion of hits
Firefox	43%
IE6	30.8%
Mozilla	8.8%
Safari	8.6%
Opera	2.4%

(For those who are curious, IE5.5 makes up 0.8% of hits. Various flavors of IE5.x below IE5.5 total roughly 1.2%, but note that Windows and Mac users are lumped together there.)
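
A tally like the one above could be produced from a site’s own logs along these lines. This is a minimal Python sketch, and it assumes each hit has already been reduced to a browser label; turning raw User-Agent strings into those labels is the messy part and is left out. The function name is made up for the example.

```python
from collections import Counter


def browser_shares(hits):
    """Return {browser: percent of hits}, largest share first."""
    counts = Counter(hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        name: round(100 * n / total, 1)
        for name, n in counts.most_common()
    }
```

Feeding it the labels from a few days of one site’s access log gives that site’s percentages, which is the only place the numbers mean anything.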

Those statistics tell me quite a bit about the people who visit the CSC site, and I can use that information to decide what to do about browser support. You know what those numbers tell you about which browsers to support (or not) in your designs for sites on which you work? Absolutely squat. Anyone who uses those access statistics to make decisions for their own work is a fool, and a misinformed fool at that.

In every design, we have to ask what browsers need to have a consistent experience, which ones can be given a reduced experience, and which ones get no design at all. The user logs from another site are useless in trying to make this decision. The “global statistics” from firms like WebSideStory are just as useless in this case. They may be entirely accurate, but they are also entirely irrelevant when it comes to making design support decisions. The only stats that matter are the ones that come from the site you’re designing.

In a like manner, I don’t care if you think visitors to your site or some other favorite site of yours are an accurate reflection of the overall Internet population or not: that opinion is similarly irrelevant. It’s rather like me claiming that the people who come to our annual holiday party are an accurate reflection of partygoers in general. Maybe they are and maybe they aren’t, but either way I don’t think you should plan your all-night rave to accommodate the kinds of people who drop by our house to have homemade bread and soup and chat about babies, politics, science-fiction movies, and the weather. And vice versa.

(Do remember that your site’s stats may reflect its current behavior instead of your potential audience. If your site is already broken past the point of usefulness in Safari, then you’re going to see very low Safari numbers. Make sure that you’re comparing apples to apples, and only compare the numbers in your access logs for browsers that can already use the site.)

As for the related question of “at what percentage level do I decide a browser isn’t worth bothering about”—well, that’s really up to you, isn’t it? I certainly can’t tell you when it’s worthwhile to stop worrying about IE5.0, or Netscape 4.7, or Mosaic 1.2. I know what I think is appropriate for the sites I work on—and the process of finding the answer is different for every site. It has to be, because every site is different.

Now, if you want to share your user demographics with anyone who wants a peek, hey, have fun with that. If data exhibitionism is your thing, who am I to judge? Just don’t pretend that the bits of data you’re exposing to the world are representative of everyone else’s, because I guarantee you that they are not. As for anyone who happens to glance at your data: I hope they realize the same thing.

Don’t Care About Market Share was published on Monday, December 20th, 2004.
It was assigned to the Browsers category.
There have been forty-one replies.
