Long-Term Visibility
Published 17 years, 11 months past
A fair portion of the feedback I get whenever I talk about microformats runs along the lines of “How is this any different from stuff like RDF, besides it being written using a far less structured vocabulary?”. Tantek has laid down the basics of the answer to that question. In a severely limited nutshell: the more visible the data, the more likely it is to be made relevant and to be kept that way.
What about search engine spamming? Well, it’s usually easily recognizable as such by a human, so that’s in keeping with visibility and human friendliness. If we suppose a spammer uses CSS to hide the spam from humans, as many do, it’s become invisible: exactly the same as traditional metadata, and exactly what happened to meta-based keywords before the search engines started ignoring them. Some day (soon?) the search engines may start ignoring any content that’s been hidden, and as far as I’m concerned that would be just fine.
Now, what about farther down the road: will semantic information always have to be visible? An interesting question. Tantek and I have had some pretty energetic arguments about whether the kind of stuff we’re putting into microformats will eventually move into the invisible realm of Semantic Web-style metainformation. As you might guess from his post, Tantek says no way; I’m more agnostic about it. Not every case of structured data lends itself to being visible, and in fact making some kinds of structuring data visible would be distinctly human-unfriendly. There’s a reason browsers don’t (by default) display a page’s markup.
Besides, to some extent there’s invisible information in microformats, although it’s pretty much always tied to visible information (dates in hCalendar being one such example). Sure, the class names and title values are there in the markup as opposed to off in some other file, but from a user point of view, they’re as invisible as meta keywords or RDF. Usually it’s stuff we don’t want to be in the user’s face: markers telling which bits of content correspond to what, ISO versions of human-readable dates, that kind of thing.
Then again, the truth is that the kind of information most people want to consume and manipulate is the kind of information that lends itself to being visible. Structuring that data in such a way that the same data is useful to both humans and machines—turning the stuff you’re showing to people into the stuff that machines process—is a much more elegant approach, and one that frankly stands a higher chance of success, at least in the short term.
(A quick example: as Andy Baio says, “If hCalendar gets popular, Upcoming.org could scrape events off of websites instead of people entering them directly into Upcoming”. Bands, who are already maintaining their own touring pages, could mark up said pages using hCalendar, and Upcoming would just suck in the information. The advantages? The band’s webmaster doesn’t have to set up the tour page and then go enter all the information into Upcoming; he just creates or updates the page and can then ping Upcoming, or wait for its spider to drop by. The visible information, which is structured in a machine-parsable way, only has to be updated once. Of course, the same would be true with regard to any event aggregator, not just Upcoming, and that’s another advantage right there.)
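To make the scraping idea a little more concrete, here is a minimal sketch of how an aggregator might pull events out of a tour page, using only Python’s standard library. The class names (vevent, summary, dtstart, location) are hCalendar’s own; everything else is an assumption made for illustration: the sample markup, the band and venue, and the names HCalendarParser and scrape_events are all hypothetical, and this is not a description of how Upcoming or any real parser works. It also assumes well-nested markup with no void elements (such as br) inside an event.

```python
from html.parser import HTMLParser


class HCalendarParser(HTMLParser):
    """Collect summary, dtstart and location from each element classed 'vevent'."""

    FIELDS = ("summary", "dtstart", "location")

    def __init__(self):
        super().__init__()
        self.events = []
        self._event = None  # the event currently being filled in, if any
        self._field = None  # the field whose visible text is being collected
        self._depth = 0     # nesting depth inside the current vevent

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._event is None:
            if "vevent" in classes:
                self._event = {}
                self._depth = 1
            return
        self._depth += 1
        for name in self.FIELDS:
            if name in classes:
                if tag == "abbr" and attrs.get("title"):
                    # The title of an abbr carries the machine-readable value.
                    self._event[name] = attrs["title"]
                else:
                    # Otherwise the visible text is the value.
                    self._field = name
                    self._event[name] = ""

    def handle_data(self, data):
        if self._event is not None and self._field:
            self._event[self._field] += data

    def handle_endtag(self, tag):
        if self._event is None:
            return
        self._field = None
        self._depth -= 1
        if self._depth == 0:  # the vevent element itself has closed
            self.events.append(self._event)
            self._event = None


def scrape_events(html):
    """Return a list of dicts, one per vevent found in the markup."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.events


# Hypothetical tour page: the same markup people read is what gets parsed.
TOUR_PAGE = """
<ul>
  <li class="vevent">
    <span class="summary">The Examples live</span>,
    <abbr class="dtstart" title="20050610">June 10th</abbr>,
    <span class="location">Cleveland, OH</span>
  </li>
</ul>
"""

events = scrape_events(TOUR_PAGE)
print(events)
```

The point of the sketch is that nothing here is specific to one aggregator: any service that understands the class names could run the same kind of pass over the same page.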
But will the semantic information stay baked into the visible information? That’s a harder trend to forecast. I remember when presentation was baked into the structure, and it’s been a massive struggle to get the two even partially separated. On the other hand, it makes sense to me to pull presentation and structure apart, so that the former can rest upon the latter instead of having them bolted together. I’m not sure it makes sense to do the same with semantics and structure. Of course, what that really means is that I don’t think it makes sense to argue for their separation now. Perhaps we’ll look back in a decade or two and, with new approaches in hand, chuckle over the thought that we’d ever bolted them together. Alternatively, perhaps we’ll look back from that vantage and wonder why we ever thought the two could, let alone should, be separated.
In either case, it seems clear to me that the way forward is with visible data being used both for human and machine consumption; that is, with the microformat approach. It’s a lightweight, easily grasped, infinitely extensible, and infinitely flexible solution, totally in keeping with the design principles that underpin the Web itself.
Great summary, and food (including some good questions) for thought.
One minor clarification. You wrote:
“Sure, the class names and title values are there in the markup as opposed to off in some other file, but from a user point of view, they’re as invisible as meta keywords or RDF.”
Not quite. Meta keywords and RDF are totally invisible without “view source”, and even then, the latter tends to be in messy SGML comments or separate files etc.
OTOH, title values are actually somewhat visible, in that by simply hovering over the element with the title, you see a popup with the value.
One might ask, why does that matter? After all, the intent, as with the use of abbr for presenting human dates vs. ISO8601 dates, is to hide “unfriendly” information from the user.
But even ISO8601 datetimes are *possible* for a human to verify. While simply viewing the page (i.e. without doing a view source), one can simply hover over a:
<abbr title="20050610">June 10th</abbr>
And actually verify that yes, the ISO8601 date is the same as the date in the human visible portion. Thus even in the case of using abbr and title to move the machine readable date aside, it’s still kept where a human can easily view it / verify it, should they want to, with a simple gesture, without having to navigate somewhere else.
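The same check a person does by hovering can be done by a machine. Below is a rough sketch, assuming the title holds an ISO 8601 basic-format date (YYYYMMDD) and the visible text is an English month name plus a day, as in the example above. The function name abbr_date_matches is made up for illustration, and month-name parsing here depends on an English (or C) locale.

```python
import re
from datetime import datetime


def abbr_date_matches(title, visible_text):
    """Return True if a YYYYMMDD title names the same month and day
    as human-readable text such as 'June 10th'."""
    machine = datetime.strptime(title, "%Y%m%d")
    # Drop ordinal suffixes: '10th' -> '10', '1st' -> '1'.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", visible_text.strip())
    # The visible text has no year, so borrow a leap year purely for parsing;
    # only the month and day are compared.
    human = datetime.strptime("2000 " + cleaned, "%Y %B %d")
    return (machine.month, machine.day) == (human.month, human.day)


print(abbr_date_matches("20050610", "June 10th"))
print(abbr_date_matches("20050611", "June 10th"))
```

The first call compares matching values and the second a mismatched pair, which is the kind of drift that is easy to catch when both values live on the same element.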
I completely appreciate microformats, especially as they are quick to deploy, and seem to be more popular than RDF (or, more visible…) even though they are relatively young.
However, I don’t think that the difference between RDF and microformats can be summed up with “microformats are more visible” – they are different technologies (and, I think, compatible).
RDF has a number of benefits… the simple model (something has a property of a value) – which allows all RDF information to be broken down into the same kinds of statements (unlike microformats). Then there is the adoption of URIs for parts or all of RDF statements, which allows two people talking about ‘London’ to use http://www.places.org/UK/London (for example), to ensure that a specific instance of London is uniquely identified. And more RDF information can be made available at the end of URIs, really allowing a ‘semantic web’ to be built.
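As a rough illustration of that model, here is a sketch that treats each statement as a plain (subject, property, value) tuple rather than using any real RDF library. The UK London URI is the example one from this comment; the second London URI, the property URIs under example.org, and the describe function are hypothetical names invented for the sketch.

```python
# Example URI from the comment, plus a hypothetical one for a different London.
LONDON_UK = "http://www.places.org/UK/London"
LONDON_CA = "http://www.places.org/CA/London"  # hypothetical

# Hypothetical property URIs.
COUNTRY = "http://example.org/terms/country"
LABEL = "http://example.org/terms/label"

# Two people publish statements independently. Each statement has the same
# shape: something has a property of a value.
source_a = {
    (LONDON_UK, COUNTRY, "United Kingdom"),
    (LONDON_CA, COUNTRY, "Canada"),
}
source_b = {
    (LONDON_UK, LABEL, "London"),
}


def describe(graph, subject):
    """Return {property: set of values} for one subject URI."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


# Because both sources use the same URI for the same London, merging them
# is just a set union; nothing has to guess which London is meant.
merged = source_a | source_b
print(describe(merged, LONDON_UK))
print(describe(merged, LONDON_CA))
```

Both places are labelled ‘London’ by people, but the statements about them never get mixed up, because the URI, not the name, does the identifying.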
Having said that, it is extremely disappointing that RDF is so complex, is not communicated effectively, and has so many arguments within the internal community that people are put off adopting it.
Microformats could be a big step towards helping people visualise the benefits of publishing semantic information – but I hope people don’t think they are a full alternative to RDF, which is a different kettle of fish…