Posts from September 2009

HTML5 And You

Published 15 years, 4 months past

I mentioned in my previous post that I “had come away with my head reeling from the massive length and depth of the often-changing specification”, which is entirely true.  Printouts of the current draft of the HTML5 spec can reach, depending on your operating system and installed fonts, somewhere north of 900 pages.  Yes: nine hundred.  There are unabridged Stephen King novels that run shorter.

You might well say to yourself: “Self, is it just me, or are the people doing this completely off their everlovin’ rockers?  Because the specification for something as fundamentally simple as HTML should reach maybe 200 pages, max.”  You might even despair that the entire enterprise is doomed to failure precisely because nobody sane will ever sit down to read that entire doorstop.

But there’s no real reason to panic, because here’s the thing about the HTML5 specification that might not be obvious right away:  it’s not for you.  It’s for implementors.  And that’s a good thing.

If you do start reading the HTML5 draft, you’ll start running into really lengthy, excruciatingly detailed algorithms for, say, parsing a time component.  Or moving through the browser’s history.  Or submitting a form.  There’s an entire (long) chapter on how to process the HTML syntax.

Those are all good things, actually.  They greatly increase the chances of interoperability actually happening within our lifetimes.  There’s no guessing about, well, much of anything.  It’s all been exactingly defined, to the extent that one can exactingly define anything using a human language.  A browser team doesn’t have to wonder, or even guess, what to do when the document has been completely parsed.  It’s all spelled out.  And the people on those browser teams will, in the end, be the people who read that entire doorstop.  (Their sanity is another matter, and not discussed here.)

How is all that stuff relevant to you, the author?  In the sense that when browser teams follow the spec, their products will be interoperable, which is to say consistent.  (Just imagine that for a moment.)

Beyond that, though, the detailed implementation stuff isn’t relevant to you.  You are not expected to know all those algorithms in order to write HTML documents.  Pretty much all you need to know is the markup.  That’s the part that should be no more than 200 pages, yeah?

Turns out it is, and by a comfortable margin.  Michael(tm) Smith’s HTML5: The Markup Language is a version of the HTML5 draft with all of those eye-wateringly pedantic implementor sections stripped out, and when I generated a PDF it came in at 147 pages.  That’s what you really need in order to get up to speed on what’s in HTML5.  It’s for you.


Nine Into Five

Published 15 years, 4 months past

Like so many others, I had tried to dig into the meat of HTML5 and figure out just what the heck was going on.  Like so many others, I had come away with my head reeling from the massive length and depth of the often-changing specification, unsure of the real meaning of much of what I had read.  And like so many others, I had gone to read the commentary surrounding HTML5 and come away deeply dispirited by the confusion, cross-claims, and rancor I found.

Then I received an invitation to join a small, in-person gathering of like-minded people, many of them just as confused and dispirited as I, to turn our collective focus to the situation and see what we found.  I already had plans for the meeting’s scheduled dates.  I altered the plans.

Over two long days, we poked and prodded and pounded on the HTML5 specification—doing our best to figure out what was meant by, and what would result from, this phrase or that example; trying to reconcile seemingly arbitrary design choices with what we knew of the web and its history and the stated goals of the HTML5 specification; puzzling over the implications of example code and detailed algorithms and non-normative notes.

In the end, we came away with a better understanding of what’s going on, and out of that arose some concerns and suggestions.  But in the main, we felt much better about what’s going on in HTML5, and have now said so publicly.

Personally, there are two markup changes I’d like most to see:

  1. The content model of footer should match that of header. As others have said, the English-language name of the footer element creates expectations about what it is and how it should work.  As the spec now stands, most of those expectations will be wrong.  To wit: if your page’s footer includes navigation links, and especially if you have an HTML5-structured “fat footer“, you can’t use footer to contain it.

    If this feels a little familiar, it should: the same problem happened with address, which was specified to mean only the contact information for the author of a page.  It was quite explicitly specified to not accept mailing addresses.  Of course, tons of people did just that, because they had an address and there was an address element, so of course they went together!

    A lot of us cringed every time this came up in the last ten years of conducting training, because it meant we’d have to spend a few minutes explaining that the meaning of the element’s name clashed with its technical design.  We saw a lot of furrowed brows, rolled eyes, and derisively shaken heads.  That will be magnified a millionfold with footer if things are allowed to stand as they are.

    As I said, the fix is simple: just change the content model of footer to state:

    Flow content, but with no header or footer element descendants.

    That’s exactly the same content model as header, and for the same reasons.

  2. time needs to be less restrictive.  That’s not very precise, I know.  But as things stand now, you can only apply time to Gregorian datetimes, and you’re not supposed to use it for anything that couldn’t be easily represented in a calendaring program.  The HTML5 specification says:

    The time element is not intended for encoding times for which a precise date or time cannot be established.

    That makes me wonder, in a manner not at all like Robert Plant, how precise do we have to be?  The answer, I’m sorry to say, is too much.

    To pick an example: I have what I think of as a great use case for the time element, and while it uses the Gregorian calendar, it’s only accurate to whole months (as is Wikipedia’s version).  In some cases I could get the values down to specific days; but in others, maybe not.  So I can’t use the datetime attribute, which requires at least year-month-day, if not actual hours and minutes.  I could omit the attribute, and just have this:

    <time>October 2007</time>
    

    In that case, the content has to be a valid date string in content—which is to say, a valid date string with optional whitespace.  So that won’t work.

    I’ve pondered how best to tackle this, as did the Super Friends.  Our suggestion is to allow bare year and month-day values as permitted in ISO8601.  In addition, I think we should allow a valid date string to only require a year, with month, day, and time optional.  That seems good enough as long as we’re going to go with the idea that the Gregorian calendar contains all the time we ever want to structure.

    But what about other, older dates, some of which are fairly precisely known within their own calendars?  On that point, though the historian in me clamors for a fix, I’m uncertain as to what.  PPK, on the other hand, has put alot of thought into this and written a piece that I have skimmed but never, perhaps ironically, found the time to read in its entirety.

These are not my only concerns, but they’re the big ones.  For the rest, I concur with the hiccups guide, though of course to varying degrees.  I’m still trying to decide how much I care (or don’t) about the subtle differences between article and section, for example, or the way aside fits (or doesn’t) with its cousin elements.  And dialog just bugs me, but I’m not sure I have a better proposal, so I’ll leave it be for the time being.

At the other end of the two days, I felt a good deal more calm and hopeful than I did going in.  As Jeffrey said, “the more I study the direction HTML5 is taking, the better I like it”.  While there are still rough edges to be smoothed, there is time to smooth them.  We’ve already seen responsiveness on some of the points we addressed in the hiccups guide, and discussions around others.  The specification itself is daunting, especially to those who might remember the compact simplicity of the HTML2 spec.  Fortunately, it has good internal cross-linking so that you can, with effort, track down exactly what’s meant by “valid date string with optional time” or “sectioning content” or “formatBlock candidate“.

With HTML5, the web is not ending, nor is it starting over.  It’s evolving, slowly and in full view of the public, with an opportunity for anyone to have their say (which is not, of course, the same as having one’s proposals accepted).  It’s the next step, and I feel quite a bit more confident that it’s a step onto solid ground.


Browse the Archive