Posts in the semantic web Category

Customizing Your Markup

Published 12 years, 7 months past

So HTML5 allows you (at the moment) to create your own custom elements.  Only, not really.

(Ed. note: this post has been corrected since its publication, and a followup clarification has been posted.)

Suppose you’re creating a super-sweet JavaScript library to improve text presentation — like, say, TypeButter — and you need to insert a bunch of elements that won’t accidentally pick up pre-existing CSS.  That rules span right out the door, and anything else would be either a bad semantic match, likely to pick up CSS by mistake, or both.

Assuming you don’t want to spend the hours and lines of code necessary to push ahead with span and a whole lot of dynamic CSS rewriting, the obvious solution is to invent a new element and drop that into place.  If you’re doing kerning, then a kern element makes a lot of sense, right?  Right.  And you can certainly do that in browsers today, as well as years back.  Stuff in a new element, hit it up with some CSS, and you’re done.
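As a rough sketch of that approach: the element name kern comes from the paragraph above, while the letter-spacing value is an assumption picked purely for illustration, not anything TypeButter actually generates.

```html
<!-- Sketch only: "kern" is an invented element, not part of any HTML spec. -->
<style>
  /* Browsers treat an unrecognized element as a plain inline box,
     so rules that target span never match it. */
  kern { letter-spacing: -0.05em; }
</style>

<h1><kern>A</kern><kern>u</kern>tumn</h1>
```

Because the element name is new, the only CSS that touches it is CSS written for it (plus anything inherited or matched by universal selectors).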

Now, how does this fit with the HTML5 specification?  Not at all well.  HTML5 does not allow you to invent new elements and stuff them into your document willy-nilly.  You can’t even do it with a prefix like x-kern, because hyphens aren’t valid characters for element names (unless I read the rules incorrectly, which is always possible).

No, here’s what you do instead:

  1. Define the element customization you want with an element element.  That’s not a typo.
  2. To your element element, add an extends attribute whose value is the HTML5 element you plan to extend.  We’ll use span, but you can extend any element.
  3. Now add a name attribute that gives your custom “element” its name, like x-kern.
  4. Okay, you’re ready!  Now anywhere you want to add a customized element, drop in the element named by extends and then supply the name via an is attribute.

Did you follow all that?  No?  Okay, maybe this will make it a bit less unclear.  (Note: the following code block was corrected 10 Apr 12.)

<element extends="span" name="x-kern"></element>
<h1>
<span is="x-kern" style="…">A</span>
<span is="x-kern" style="…">u</span>
<span is="x-kern" style="…">t</span>
<span is="x-kern" style="…">u</span>
mn
</h1>
<p>...</p>
<p>...</p>
<p>...</p>

(Based on markup taken from the TypeButter demo page.  I simplified the inline style attributes that TypeButter generates for purposes of clarity.)

So that’s how you create “custom elements” in HTML5 as of now.  Which is to say, you don’t.  All you’re doing is attaching a label to an existing element; you’re sort of customizing an existing element, not creating a customized element.  That’s not going to help prevent CSS from being mistakenly applied to those elements.

Personally, I find this a really, really, really clumsy approach — so clumsy that I don’t think I could recommend its use.  Given that browsers will accept, render, and style arbitrary elements, I’d pretty much say to just go ahead and do it.  Do try to name your elements so they won’t run into problems later, such as prefixing them with an “x” or your username or something, but since browsers support it, may as well capitalize on their capabilities.

I’m not in the habit of saying that sort of thing lightly, either.  While I’m not the wild-eyed standards-or-be-damned radical some people think I am, I have always striven to play within the rules when possible.  Yes, there are always situations where you work counter to general best practices or even the rules, but I rarely do so lightly.  As an example, my co-founders and I went to some effort to play nice when we created the principles for Microformats, segregating our semantics into attribute values — but only because Tantek, Matt, and I cared a lot about long-term stability and validation.  We went as far as necessary to play nice, and not one millimeter further, and all the while we wished mightily for the ability to create custom attributes and elements.

Most people aren’t going to exert that much effort: they’re going to see that something works and never stop to question if what they’re doing is valid or has long-term stability.  “If the browser let me do it, it must be okay” is the background assumption that runs through our profession, and why wouldn’t it?  It’s an entirely understandable assumption to make.

We need something better.  My personal preference would be to expand the “foreign elements” definition to encompass any unrecognized element, and let the parser deal with any structural problems like lack of well-formedness.  Perhaps also expand the rules about element names to permit hyphens, so that we could do things like x-kern or emeyer-disambiguate or whatever.  I could even see my way clear to defining a way to let an author list their customized elements.  Say, something like <meta name="custom-elements" content="kern lead follow embiggen shrink"/>.  I just made that up off the top of my head, so feel free to ignore the syntax if it’s too limiting.  The general concept is what’s important.

The creation of customized elements isn’t a common use case, but it’s an incredibly valuable ability, and people are going to do it.  They’re already doing it, in fact.  It’s important to figure out how to make the process of doing so simpler and more elegant.


Long-Term Visibility

Published 19 years, 5 months past

A fair portion of the feedback I get whenever I talk about microformats runs along the lines of “How is this any different from stuff like RDF, besides it being written using a far less structured vocabulary?”.  Tantek has laid down the basics of the answer to that question.  In a severely limited nutshell: the more visible the data, the more likely it is to be made relevant and to be kept that way.

What about search engine spamming?  Well, it’s usually easily recognizable as such by a human, so that’s in keeping with visibility and human friendliness.  If we suppose a spammer uses CSS to hide the spam from humans, as many do, it’s become invisible—exactly the same as traditional metadata, and exactly what happened to meta-based keywords before the search engines started ignoring them.  Some day (soon?) the search engines may start ignoring any content that’s been hidden, and as far as I’m concerned that would be just fine.

Now, what about farther down the road—will semantic information always have to be visible?  An interesting question.  Tantek and I have had some pretty energetic arguments about whether the kind of stuff we’re putting into microformats will eventually move into the invisible realm of Semantic Web-style metainformation.  As you might guess from his post, Tantek says no way; I’m more agnostic about it.  Not every case of structured data lends itself to being visible, and in fact making some kinds of structuring data visible would be distinctly human-unfriendly.  There’s a reason browsers don’t (by default) display a page’s markup.

Besides, to some extent there’s invisible information in microformats, although it’s pretty much always tied to visible information (dates in hCalendar being one such example).  Sure, the class names and title values are there in the markup as opposed to off in some other file, but from a user point of view, they’re as invisible as meta keywords or RDF.  Usually it’s stuff we don’t want to be in the user’s face: markers telling which bits of content correspond to what, ISO versions of human-readable dates, that kind of thing.
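To make that concrete, here is a minimal hCalendar sketch.  The event and its date are invented for illustration; the class names (vevent, summary, dtstart) and the use of abbr with a title attribute are the hCalendar conventions being described.

```html
<!-- Hypothetical event; only the class names and the abbr/title pattern are real conventions. -->
<div class="vevent">
  <span class="summary">Example Web Conference</span>,
  <!-- The reader sees "May 14th"; a parser reads the ISO date from title. -->
  <abbr class="dtstart" title="2005-05-14">May 14th</abbr>
</div>
```

The visible text is what a person reads; the class names and the title value ride along unseen, which is exactly the kind of "invisible but tied to visible" information at issue here.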

Then again, the truth is that the kind of information most people want to consume and manipulate is the kind of information that lends itself to being visible.  Structuring that data in such a way that the same data is useful to both humans and machines—turning the stuff you’re showing to people into the stuff that machines process—is a much more elegant approach, and one that frankly stands a higher chance of success, at least in the short term.

(A quick example: as Andy Baio says, “If hCalendar gets popular, Upcoming.org could scrape events off of websites instead of people entering them directly into Upcoming”.  Bands, who are already maintaining their own touring pages, could mark up said pages using hCalendar, and Upcoming would just suck in the information.  The advantages?  The band’s webmaster doesn’t have to set up the tour page and then go enter all the information into Upcoming; he just creates or updates the page and can then ping Upcoming, or wait for its spider to drop by.  The visible information, which is structured in a machine-parsable way, only has to be updated once.  Of course, the same would be true with regard to any event aggregator, not just Upcoming, and that’s another advantage right there.)

But will the semantic information stay baked into the visible information?  That’s a harder trend to forecast.  I remember when presentation was baked into the structure, and it’s been a massive struggle to get the two even partially separated.  On the other hand, it makes sense to me to pull presentation and structure apart, so that the former can rest upon the latter instead of having them bolted together.  I’m not sure it makes sense to do the same with semantics and structure.  Of course, what that really means is that I don’t think it makes sense to argue for their separation now.  Perhaps we’ll look back in a decade or two and, with new approaches in hand, chuckle over the thought that we’d ever bolted them together.  Alternatively, perhaps we’ll look back from that vantage and wonder why we ever thought the two could, let alone should, be separated.

In either case, it seems clear to me that the way forward is with visible data being used both for human and machine consumption; that is, with the microformat approach.  It’s a lightweight, easily grasped, infinitely extensible, and infinitely flexible solution, totally in keeping with the design principles that underpin the Web itself.


Getting Onto The Calendar

Published 19 years, 6 months past

Over at Complex Spiral Consulting, I maintain a list of upcoming appearances at conferences, workshops, and the like.  These are the “public” events; that is, events which are accessible by members of the public, assuming they pay whatever registration fee is being charged by the people in charge of the event.  This is in contrast to “private” events; that is, client work that isn’t open to anyone except employees of the client.

Occasionally I’m asked if I have an RSS feed of those events, or send out e-mail updates, or otherwise provide any sort of notice other than just changing the web page.  For a long time, the answer was basically “no”.  Now it’s “yes”, and it’s an example of a microformat in action.

If you’re using iCal on OS X, or any other webcal:-aware calendaring program, then all you have to do is hit the following link: Complex Spiral upcoming events calendar.  Your calendar program should come to the foreground and let you add the URI as a subscribed calendar.  And hey presto!  You’re done.  Any changes to the web page will be reflected in your calendar the next time the subscription is refreshed, and iCal lets you set your refresh interval to be 15 minutes, once a day, once a week, and so on.

What’s happening there is you’re pouring the home page of complexspiral.com through an XSLT recipe called X2V written by Brian Suda.  His XSLT pulls out the hCalendar markup and turns it into an ICS file, one fully conformant with RFC 2445.  So I don’t have to figure out how to produce and provide my own ICS file.  Providing the hCalendar markup is enough, thanks to Brian’s work.

Of course, the number of people who would want to subscribe to my professional appearances schedule is fairly small.  This is just a demonstration, though.  Suppose a site like, oh, upcoming.org were to publish their event calendars with hCalendar markup?  Then all you’d have to do is find the page that corresponds to your city, run it through Brian’s script, and you’d have your very own regularly updated local events calendar, just like that.

Guess what?  You can do that right now:  upcoming.org is publishing its information using hCalendar markup.  For example, here’s the calendar for Cleveland, Ohio, ready for one-click subscription: Cleveland events calendar.  If you just want the ICS file to be downloaded to your hard drive, then you can use this link instead: Cleveland events ICS file.  The only difference between the two links is that the former uses the webcal: scheme identifier, whereas the second uses the more familiar http:.
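In markup, the two kinds of link differ only in the scheme.  The host and path below are placeholders, not the real X2V or upcoming.org addresses:

```html
<!-- Placeholder URLs (example.com); only the scheme differs between the two links. -->
<a href="webcal://example.com/cleveland-events.ics">Cleveland events calendar</a>
<a href="http://example.com/cleveland-events.ics">Cleveland events ICS file</a>
```

A webcal:-aware calendaring program picks up the first as a subscription; the second is fetched like any other file.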

I personally think there needs to be some work done on their hCalendar markup, like properly marking up location information.  The time information for some events seems to be a bit wonky as well, although the dates are accurate.  The great thing is that the hCalendar information could be fixed in very short order.  In fact, from what I’ve heard, they added basic hCalendar markup to the site in under an hour.  Adding more, or fixing any problems in what they have, shouldn’t take much longer.

Imagine how much further this could go.  Suppose Basecamp marked up its project calendars with hCalendar, and used a script like Brian’s to turn it into ICS information.  Its users could have project milestones right there in their personal calendar programs.  Ditto for the To-Do lists, because that sort of information is all defined in the iCalendar specification.  The TiVo site could provide customized schedules, like all the showings of American Idol or Masterpiece Theater.  The IMDB could publish movie opening dates in hCalendar format; studios could do the same.  Want a calendar schedule that shows what DVDs are coming out, when?  Or what new albums are being released for the next month?  All it takes is a little slice of a webmonkey’s time.

The point being, there’s nothing for which said webmonkey has to wait.  The tools are already here.  No browser has to be upgraded.  In fact, in many ways this bypasses the browser to send information directly to the calendaring program… but the information is provided in a browser- and search-engine-friendly way, so they can access and use the same data in their own ways.  No alternate files.  Just a single set of information, made more rich and useful through easily understood mechanisms.

How cool is that?


Microformats and Semantics in Japan

Published 19 years, 6 months past

In our post-game analysis, Tantek and I felt that the Developers Day track on microformats went incredibly well.  Not only did we get a lot of good feedback, I think we turned a lot of heads.  The ideas we presented stood up to initial scrutiny by a pretty tough crowd, and our demonstrations of already-deployed uses of formats like XFN, such as XHTMLfriends.net and an automated way to subscribe to hCalendars and hCards, drew a favorable response.

Even better, our joint panel with the Semantic Web folks had a far greater tone of agreement than of acrimony, the latter of which I feared would dominate.  I learned some things there, in fact.  For example, the idea that the Semantic Web efforts are inherently top-down turns out to be false.  It may be that many of the efforts have been top-down, but that doesn’t mean that they have to be.  We also saw examples where Semantic Web technologies are far more appropriate than a microformat would be.  The example Jim Hendler brought up was an oncology database that defines and uses some 600,000 terms.  I would not want to try to capture that in a microformat—although it could be done, I suspect.

Here’s one thing I think is key about microformats: they cause the semantics people already use to be impressed onto the web.  They capture, or at least make it very easy to capture, the current zeitgeist.  This makes them almost automatically human-friendly, which is always a big plus in my book.

The other side of that key is this:  it may be that by allowing authors to quickly annotate their information, microformats will be the gateway through which the masses’ data is brought to the more formal systems the Semantic Web allows.  It very well may be that, in the future, we’ll look back and realize that microformats were the bootstrap needed to haul the web into semanticity.

Tantek and I have had some spirited debates around that last point, and are actually in the middle of one right now.  After all, maybe things won’t go that way; maybe microformats will lead to something else, some other way of spreading machine-recognizable semantic information.  It’s fun to debate where things might go, and why, but I think in the end we’re both willing to keep pushing the concept and use of microformats forward, and see how things turn out down the road.

What’s fascinating is how fired up people get about microformats.  After SXSW05, there was an explosion of interest and experimentation.  Several microformats got created or proposed, covering all kinds of topics—from folksonomy formalization to political categorization.  A similar effect seemed to be occurring at WWW2005.  One person who’s been around long enough to know said that the enthusiasm and excitement surrounding microformats reminded him of the early days of the web itself.

As someone who’s at the center of the work on microformats, it’s hard for me to judge that sort of thing.  But I was there for some of the early WWW conferences, and I remember the energy there.  As I rode home from WWW2 in Chicago, I was convinced that the world was in the process of changing, and I wanted more than anything to be a part of that change.  To hear that there’s a similar energy swirling around something I’m helping to create and define is profoundly humbling.

That all sounds great, of course, but if it remains theoretical it’s not much good, right?  Fortunately, it isn’t staying theoretical at all, and I’m not just talking about XFN.  Want an example of how you could make use of microformatted information right now, as in today?  That’s coming up in the next post, where I’ll show how to make use of a resource I mentioned earlier in this post.


WWW2005: Microformats Track

Published 19 years, 7 months past

As recently announced by Mark Baker, Tantek Çelik and I will be co-chairing a full-day track on microformats as part of Developers’ Day at WWW2005.  We’ll announce the details in the near future, but we can already say that we have some great speakers and topics lined up.  I encourage anyone who can to come check it out.  You can register at the WWW2005 site; make sure to check the option for “Developers’ Day, 5/14” when you do.

Tantek and I will also be presenting a poster on XMDP at the conference, and on Tuesday, May 10th, I’ll be delivering a half-day tutorial on Standards-Based Design—assuming enough people register, anyway—as well as delivering the afternoon keynote at, and participating in the closing panel for, the 2nd International Cross-Disciplinary Workshop on Web Accessibility (W4A).

Add to that an expected public appearance in Tokyo the evening of Friday the 13th (for which I hope to have details very soon) as well as a few other agenda items, and I’ve started to wonder if I’m going to have any time to sightsee while I’m there.  That’s becoming something of a theme, actually: I’m not expecting to have more than a day or so to make the rounds when I’m in London this June.

For some reason, I’m reminded of Mel Brooks in Blazing Saddles: “Work work work work work!”


Emergent Semantics

Published 19 years, 8 months past

Just a quick link to my slide deck (when did that term gain currency, and why didn’t I get a memo?) for “Emergent Semantics”.  I was honestly surprised by the number of attendees, and there were some great questions and ideas from audience members.  Throughout the rest of the day, I had some great conversations with people about their own microformat ideas.  Another measure of the level of interest in microformats and the semantic web was attendance at Tantek’s “The Elements of Meaningful XHTML”, which was so heavy that after the seats and floor space in his room filled up, a knot of people stood outside the door, turning their heads slightly and standing on tiptoe in an attempt to hear what he was saying.

On a very related note, I’ve updated my blogroll with some new met values.  I’ve met a ton of people I’d never met before, and hope to meet still more—so if I do assemble a metroll, it’ll have to wait until I get home.


Social Protocols

Published 19 years, 8 months past

Seems like half the Web is already at SXSW, and I’ll be there myself soon.  For those of you who love to build networks out of your social contacts at such events, Tantek’s recently shared the secret of metrolling, which is a great way to get into XFN if you haven’t already.  I’m already planning to add metrolling to my presentation on Sunday as an example of ground-up semantics.  (And I really wish I could be at the Semantic Web panel on Monday, but it’s at the same time as “Women of Web Design”… oh well.)

It’s interesting to see how interest in evolutionary semantics is itself evolving.  A recent example of this is David Berlind’s ZDNet article “Will social networks give way to social protocols?”.  I firmly believe the answer to be “yes”, even though there are a lot of skeptics (some of them on conference review committees, as it turns out).  Berlind clearly understands the advantage of social protocols.

You might then wonder, “Then what’s up with you writing a whole document about how to set up XFN ‘me’ values in a bunch of services?”  At this stage of social networking, that sort of thing is necessary.  Without interim steps, the information sitting in those services will stay scattered and isolated.  Thanks to the me value, XFN offers a very simple, lightweight solution to the problem of identity consolidation.  As I recently wrote in a poster proposal:

As the Web has evolved, a number of personal-information sites have arisen.  Some of these sites exist to help create and increase professional contacts; others are intended to help bring together one’s friends or even find potential mates.  In every case, however, the user must create a new profile for each site.  Each of these profiles constitutes a small island of identity.  Over time, a person can end up with a fairly extensive identity archipelago.

Unfortunately, there has… been no easily created machine-discoverable way to bridge the gaps between the islands.  An author might publish a page containing links to all his profiles, but to an indexing engine, these links are no more or less notable than his links to the latest amusing Flash animations.

With XFN, it becomes very easy for an author to annotate a link to indicate that its destination is one of the islands in his identity archipelago.  This kind of link is referred to as a “me” link throughout the rest of this paper.  By creating symmetric links between the islands, the author can make it possible to consolidate the various pieces of his online identity into a more cohesive whole.

The same is true for a person’s links to other people.  By pulling them all into one place, or at least by marking them all with XFN and then using “me” links to tie together all the bits of his identity archipelago, real social networking starts to emerge.
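A minimal sketch of what those annotations look like in markup.  Every URL and name below is a placeholder invented for illustration; the rel values (me, friend, met) are XFN terms.

```html
<!-- On the author's own site (hypothetical URLs throughout) -->
<a href="http://profiles.example.com/jdoe" rel="me">My profile on a networking site</a>

<!-- On that profile page, a link back makes the "me" relationship symmetric -->
<a href="http://jdoe.example.org/" rel="me">My home page</a>

<!-- A link to another person, annotated with XFN relationship values -->
<a href="http://friend.example.net/" rel="friend met">A friend I have met</a>
```

A spider that finds "me" links pointing in both directions has reasonable grounds to treat the two pages as islands in the same identity archipelago.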

Now, one of the things that people like to carp about is the limits of XFN.  The first of the two most common complaints is that it’s impossible to capture the full range of human relationships in fifteen words.  We agree.  The other complaint is that we only picked “positive” terms; that is, we have friend but not enemy.  We did that on purpose, as we explained; besides, it’s called XHTML Friends Network, which should be kind of a clue.  Apparently this choice makes us arrogant, or clueless, or some combination thereof.  Maybe that’s so.  What I find interesting is that the people who complain that we didn’t include their preferred relationship terms never do anything about it.  They just complain.  What’s so interesting to me is that the guys who decided to focus on the positive went out and did something; those who want to mix in the negative seem to have nothing to offer except complaints.  That says something, I think.

Because XFN is not, nor was it ever meant or represented to be, the final word on social protocols.  We fully expect it to either be improved, or else superseded.  Suppose one of the critics actually did something to address his concerns, and published an “XHTML Relationships Network”.  This could include all the XFN values, plus their negative counterparts, plus whatever else is thought to be useful by the author(s) of this new XRN.  At that point, you have competing protocols.  The more useful one will win.  The loser will be eventually discarded, although some of its memetic genes may live on.  This isn’t a problem: it’s a strength.

It’s also in many ways the entire point of XHTML Meta Data Profiles.  See a need to fill?  Fill it!  At the end of his column, Berlind says in an update:

Looking at the XFN profile, it suddenly dawned on me that perhaps there should be an XBN/XB2BN that’s strictly for the relationships between businesspeople/businesses. Thoughts?

Here are my thoughts: go for it!  He’s almost certainly right that there’s utility in such a protocol.  All it takes now is for someone to look at the problem and write up an XMDP-based protocol that solves the problem.  The microformat approach makes this so simple, pretty much anyone could do it.  What’s needed is someone who actually will do it.

At some point down the road, it’s possible that the protocols that define personal and professional relationships would merge.  Again, that’s completely in keeping with the vision we have.  The whole point of this kind of ad-hoc semantic enrichment is that it’s evolutionary.  New players will enter the field, and will either prosper or wither.  Anyone can join in.  There is no star chamber of lofty experts to say whether your idea passes some sort of ideological muster.  It’s a great big landscape, and there are a million conceptual niches to be filled.

As those niches are filled, the ways in which different protocols interact can trigger truly astounding results… but for thoughts on that aspect of the whole subject, you’ll have to come see my talk.


License To rel

Published 20 years, 8 months past

If you thought XFN or VoteLinks were the last (or only) word on lightweight semantic link annotation, think again.  Tantek writes about the idea of adding a license value to indicate a link that points to licensing terms.  In his post, the expression of this idea is centered around Creative Commons (CC) licenses, but as he says, any license-link could be so annotated.  Apparently the CC folks agree, because their license generator has been updated to include rel="license" in the markup it creates. Accordingly, I’ve updated my CC license link for the Color Blender to carry rel="license", thus making it easier for a spider to auto-discover the licensing terms for the Color Blender.
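For reference, an annotated license link looks something like the following.  The specific license URL and the link text are assumptions for illustration, not the Color Blender’s actual markup.

```html
<!-- Illustrative only: the license URL and link text are assumptions. -->
<a rel="license" href="http://creativecommons.org/licenses/by/2.0/">Some rights reserved</a>
```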

Tantek also said, of the idea of applying CSS that uniquely styles license-links:

I wonder who will be the first to post a user style sheet that demonstrates this.

Ooo, me, me!  Well, not quite.  I don’t have a complete user stylesheet for download, but here are some quick rules I devised to highlight license links.  Add any of them to your user stylesheet, or you can use these as the basis for your own styles.  (Sorry, but they won’t work in Internet Explorer, which doesn’t support attribute selectors.)

/* simple styles */
*[rel~="license"] {font-weight: bold;}
*[rel~="license"] img {border: 3px double; color: inherit;
  padding: 1px;}

/* add a "legal" icon at the beginning of the link */
*[rel~="license"]:before {content: url(legal.gif);}

Here’s my question: should the possible values be extended?  Because I’d really like to be able to insert information based on what kind of license is being referenced.  For example, suppose there were a c-commons value for rel; that way, authors could declare a link to be rel="c-commons license".  Then we could use a rule like:

*[rel~="c-commons"]:before {content: url(c-commons.gif);}

…thus inserting a Creative Commons logo before any link that points to a CC license.  At the moment, it’s highly likely that the only rel="license" links are going to point to CC licenses, but as we move forward I suspect that will be less and less true.  I hope we’ll soon see some finer grains to this particular semantic extension.
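Here is a small sketch of how that would play out.  Keep in mind that c-commons is the made-up value proposed above, and the link destinations are placeholders.

```html
<style>
  /* ~= matches one whole word in a space-separated attribute value. */
  *[rel~="license"] {font-weight: bold;}
  *[rel~="c-commons"]:before {content: url(c-commons.gif);}
</style>

<!-- Matches both rules: bold, with the logo inserted before the link text. -->
<a rel="c-commons license" href="http://example.com/cc-terms">License terms</a>

<!-- Matches only the first rule: bold, no logo. -->
<a rel="license" href="http://example.com/other-terms">License terms</a>
```

Since rel takes a space-separated list, adding a finer-grained value costs nothing for tools that only look for license.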

If you don’t like using generated content for whatever reason, you could modify the rule to put the icon in the background instead, using a rule something like this:

*[rel~="c-commons"] {background: url(c-commons.gif) no-repeat;
  padding-left: 15px;}

The usual reason to avoid generated content is that IE doesn’t support it, but then IE doesn’t support attribute selectors either, as I mentioned.  So don’t add any of these rules to an IE user stylesheet.  Use Firefox, Safari, Opera, or one of the other currently-in-development browsers instead.

In other news, I was tickled pink (or maybe a dusky red) to see that for sol 34, one of the “wake-up” songs for the Spirit team was The Bobs’ Pounded on a Rock.  My hat’s off to you, Dr. Adler!  I’ve been listening to that particular album recently, mostly to relearn the lyrics.  I’ve been singing to Carolyn when I feed her, and some favorites of ours are Plastic or Paper, Now I Am A Hippie Again, Corn Dogs, and of course Food To Rent.  It’s awfully cute that she smiles at me when I sing to her, mostly because I know one day she’ll grow up, learn about things like “being on key,” and stop smiling when I sing.

In the meantime, though, she’s perfectly happy to rock on!

(Photo: Carolyn, sitting in a chair with her lower half covered by a blanket, raises her left hand above her head with the index and pinky fingers extended, exactly in the manner of hard rockers and head-bangers the world over.)

