A fair portion of the feedback I get whenever I talk about microformats runs along the lines of “How is this any different from stuff like RDF, besides it being written using a far less structured vocabulary?” Tantek has laid down the basics of the answer to that question. In a severely limited nutshell: the more visible the data, the more likely it is to be made relevant and to be kept that way.
What about search engine spamming? Well, it’s usually easily recognizable as such by a human, so that’s in keeping with visibility and human friendliness. If we suppose a spammer uses CSS to hide the spam from humans, as many do, it’s become invisible: exactly the same as traditional metadata, and exactly what happened to meta-based keywords before the search engines started ignoring them. Some day (soon?) the search engines may start ignoring any content that’s been hidden, and as far as I’m concerned that would be just fine.
Now, what about farther down the road: will semantic information always have to be visible? An interesting question. Tantek and I have had some pretty energetic arguments about whether the kind of stuff we’re putting into microformats will eventually move into the invisible realm of Semantic Web-style metainformation. As you might guess from his post, Tantek says no way; I’m more agnostic about it. Not every case of structured data lends itself to being visible, and in fact making some kinds of structuring data visible would be distinctly human-unfriendly. There’s a reason browsers don’t (by default) display a page’s markup.
Besides, to some extent there’s invisible information in microformats, although it’s pretty much always tied to visible information (dates in hCalendar being one such example). Sure, the class names and title values are there in the markup as opposed to off in some other file, but from a user point of view, they’re as invisible as meta keywords or RDF. Usually it’s stuff we don’t want to be in the user’s face: markers telling which bits of content correspond to what, ISO versions of human-readable dates, that kind of thing.
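To make that split concrete, here is a minimal Python sketch that takes one hCalendar-style date and separates what a reader sees from what only a machine sees. The snippet, the date, and the names `VisibilitySplitter` and `split_visibility` are all made up for illustration; only the standard library's `html.parser` is real.

```python
from html.parser import HTMLParser

# Hypothetical markup: the visible text is the friendly date, while the
# class name and the ISO date in the title attribute stay out of sight.
SNIPPET = '<abbr class="dtstart" title="2005-10-05">October 5th</abbr>'


class VisibilitySplitter(HTMLParser):
    """Collects rendered text separately from class/title attribute values."""

    def __init__(self):
        super().__init__()
        self.visible = []  # text a browser would show to the reader
        self.hidden = []   # (attribute, value) pairs only a parser sees

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("class", "title"):
                self.hidden.append((name, value))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.visible.append(text)


def split_visibility(markup):
    """Return (visible_text, hidden_attributes) for a chunk of markup."""
    parser = VisibilitySplitter()
    parser.feed(markup)
    parser.close()
    return parser.visible, parser.hidden


visible, hidden = split_visibility(SNIPPET)
print("Reader sees: ", visible)
print("Machine sees:", hidden)
```

The point of the sketch is only that the invisible bits ride along on the same element as the visible ones, which is what keeps them from drifting out of date the way a separate metadata file can.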
Then again, the truth is that the kind of information most people want to consume and manipulate is the kind of information that lends itself to being visible. Structuring that data in such a way that the same data is useful to both humans and machines—turning the stuff you’re showing to people into the stuff that machines process—is a much more elegant approach, and one that frankly stands a higher chance of success, at least in the short term.
(A quick example: as Andy Baio says, “If hCalendar gets popular, Upcoming.org could scrape events off of websites instead of people entering them directly into Upcoming”. Bands, who are already maintaining their own touring pages, could mark up said pages using hCalendar, and Upcoming would just suck in the information. The advantages? The band’s webmaster doesn’t have to set up the tour page and then go enter all the information into Upcoming; he just creates or updates the page and can then ping Upcoming, or wait for its spider to drop by. The visible information, which is structured in a machine-parsable way, only has to be updated once. Of course, the same would be true with regard to any event aggregator, not just Upcoming, and that’s another advantage right there.)
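As a rough illustration of what "suck in the information" might involve, here is a Python sketch that pulls events out of a tour page marked up with hCalendar class names. It is not how Upcoming or any real aggregator works; the tour page, the venues, and the names `HCalendarScraper` and `scrape_events` are assumptions of mine, and the parser handles only three properties (dtstart, summary, location) in reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Hypothetical band tour page: ordinary visible content, with hCalendar
# class names marking which bits are which.
TOUR_PAGE = """
<ul>
  <li class="vevent">
    <abbr class="dtstart" title="2005-10-05">October 5th</abbr>:
    <span class="summary">Album release show</span> at
    <span class="location">The Example Club, Cleveland</span>
  </li>
  <li class="vevent">
    <abbr class="dtstart" title="2005-10-07">October 7th</abbr>:
    <span class="summary">Acoustic set</span> at
    <span class="location">Sample Hall, Columbus</span>
  </li>
</ul>
"""

FIELDS = ("dtstart", "summary", "location")
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never closed


class HCalendarScraper(HTMLParser):
    """Collects one dict per element carrying the 'vevent' class."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._event = None     # event currently being filled in, if any
        self._depth = 0        # element nesting depth inside that event
        self._field = None     # property whose text is being collected
        self._field_depth = 0  # depth at which that property opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._event is None:
            if "vevent" in classes:
                self._event, self._depth = {}, 1
            return
        self._depth += 1
        for field in FIELDS:
            if field in classes:
                if field == "dtstart" and attrs.get("title"):
                    # Prefer the ISO date tucked away in the title attribute.
                    self._event[field] = attrs["title"]
                else:
                    self._event[field] = ""
                    self._field, self._field_depth = field, self._depth
                break

    def handle_data(self, data):
        if self._event is not None and self._field:
            self._event[self._field] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._event is None:
            return
        if self._field and self._depth == self._field_depth:
            # Property element closed: collapse whitespace in its text.
            text = self._event[self._field]
            self._event[self._field] = " ".join(text.split())
            self._field = None
        self._depth -= 1
        if self._depth == 0:  # the vevent element itself just closed
            self.events.append(self._event)
            self._event = None


def scrape_events(html):
    """Return a list of event dicts found in the given HTML string."""
    scraper = HCalendarScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.events


for event in scrape_events(TOUR_PAGE):
    print(event["dtstart"], "|", event["summary"], "|", event["location"])
```

Nothing in the sketch is specific to one aggregator, which is the other advantage mentioned above: any service that understands the same class names could read the same page.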
But will the semantic information stay baked into the visible information? That’s a harder trend to forecast. I remember when presentation was baked into the structure, and it’s been a massive struggle to get the two even partially separated. On the other hand, it makes sense to me to pull presentation and structure apart, so that the former can rest upon the latter instead of having them bolted together. I’m not sure it makes sense to do the same with semantics and structure. Of course, what that really means is that I don’t think it makes sense to argue for their separation now. Perhaps we’ll look back in a decade or two and, with new approaches in hand, chuckle over the thought that we’d ever bolted them together. Alternatively, perhaps we’ll look back from that vantage and wonder why we ever thought the two could, let alone should, be separated.
In either case, it seems clear to me that the way forward is with visible data being used for both human and machine consumption; that is, with the microformat approach. It’s a lightweight, easily grasped, infinitely extensible, and infinitely flexible solution, totally in keeping with the design principles that underpin the Web itself.