HTML5 And You

Published 14 years, 10 months past

I mentioned in my previous post that I “had come away with my head reeling from the massive length and depth of the often-changing specification”, which is entirely true.  Printouts of the current draft of the HTML5 spec can reach, depending on your operating system and installed fonts, somewhere north of 900 pages.  Yes: nine hundred.  There are unabridged Stephen King novels that run shorter.

You might well say to yourself: “Self, is it just me, or are the people doing this completely off their everlovin’ rockers?  Because the specification for something as fundamentally simple as HTML should reach maybe 200 pages, max.”  You might even despair that the entire enterprise is doomed to failure precisely because nobody sane will ever sit down to read that entire doorstop.

But there’s no real reason to panic, because here’s the thing about the HTML5 specification that might not be obvious right away:  it’s not for you.  It’s for implementors.  And that’s a good thing.

If you do start reading the HTML5 draft, you’ll start running into really lengthy, excruciatingly detailed algorithms for, say, parsing a time component.  Or moving through the browser’s history.  Or submitting a form.  There’s an entire (long) chapter on how to process the HTML syntax.

Those are all good things, actually.  They greatly increase the chances of interoperability actually happening within our lifetimes.  There’s no guessing about, well, much of anything.  It’s all been exactingly defined, to the extent that one can exactingly define anything using a human language.  A browser team doesn’t have to wonder, or even guess, what to do when the document has been completely parsed.  It’s all spelled out.  And the people on those browser teams will, in the end, be the people who read that entire doorstop.  (Their sanity is another matter, and not discussed here.)
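
To give a flavor of what “excruciatingly detailed” means, here is a rough Python sketch loosely modeled on the kind of step-by-step rules the draft gives for parsing a time component.  The function name, the helper, and the exact rules are my own simplification for illustration, not the spec’s actual algorithm; treat every check as an assumption about what a browser would have to pin down.

```python
from typing import Optional, Tuple

ASCII_DIGITS = "0123456789"


def parse_time_component(text: str) -> Optional[Tuple[int, int, float]]:
    """Parse "HH:MM", "HH:MM:SS" or "HH:MM:SS.fff" (illustrative rules only).

    Returns (hour, minute, second), or None if any step fails.
    """
    pos = 0

    def collect_digits() -> str:
        # Consume a run of ASCII digits starting at the current position.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in ASCII_DIGITS:
            pos += 1
        return text[start:pos]

    # Step 1: exactly two digits for the hour, in the range 0..23.
    hour_text = collect_digits()
    if len(hour_text) != 2 or int(hour_text) > 23:
        return None

    # Step 2: a mandatory colon.
    if pos >= len(text) or text[pos] != ":":
        return None
    pos += 1

    # Step 3: exactly two digits for the minute, in the range 0..59.
    minute_text = collect_digits()
    if len(minute_text) != 2 or int(minute_text) > 59:
        return None

    # Step 4: optional seconds, with an optional fractional part.
    second = 0.0
    if pos < len(text) and text[pos] == ":":
        pos += 1
        second_text = collect_digits()
        if len(second_text) != 2:
            return None
        if pos < len(text) and text[pos] == ".":
            pos += 1
            fraction = collect_digits()
            if not fraction:
                return None
            second_text += "." + fraction
        second = float(second_text)
        if second >= 60:
            return None

    # Step 5 (an assumption of this sketch): nothing may follow the time.
    if pos != len(text):
        return None

    return int(hour_text), int(minute_text), second
```

Even this toy version has to decide what happens with one-digit hours, a trailing colon, or an empty fraction.  Multiply that by every feature in HTML and the page count starts to make sense.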

How is all that stuff relevant to you, the author?  In the sense that when browser teams follow the spec, their products will be interoperable, which is to say consistent.  (Just imagine that for a moment.)

Beyond that, though, the detailed implementation stuff isn’t relevant to you.  You are not expected to know all those algorithms in order to write HTML documents.  Pretty much all you need to know is the markup.  That’s the part that should be no more than 200 pages, yeah?

Turns out it is, and by a comfortable margin.  Michael(tm) Smith’s HTML5: The Markup Language is a version of the HTML5 draft with all of those eye-wateringly pedantic implementor sections stripped out, and when I generated a PDF it came in at 147 pages.  That’s what you really need in order to get up to speed on what’s in HTML5.  It’s for you.

Comments (18)

  1. Well said, Eric. I’ve read various parts of the spec, from the parsing section to the element definition sections, to the form parsing section, and I’m savvy enough to understand it fairly well (I was trying to implement a parser at the time, because PHP’s HTML parser sucks), and indeed very little of the spec is of particular interest to authors.

    We do need another informative document like “The Markup Language”, however: “JavaScript for the Web”. HTML5 contains many sections which add or clarify JavaScript interfaces, and some sections which are the only real spec for a few well-known scripting features. That stuff’s relevant for authors, too, and it’s a shame it’s not easier to comb through.

  2. So basically the html5 spec is a browser in pseudocode?

  3. (not browser, html render engine)

  4. Actually, the HTML5 specification is for both authors and browsers (and various other actors). The WHATWG copy of the HTML5 specification has the option to hide text that is specific to implementors. (The W3C copy might feature it as an alternate style sheet; I forget.)

    The main “issue” is that the specification text is written to be pedantically correct. This does not always make it accessible to everyone, but does give you the definitive resource if you want to figure out whether something is correct or incorrect per HTML5.

    The hope is that the introductory sections (many yet to be written) make things more accessible and that many tutorials will be published outside the realm of the W3C and WHATWG, much as has happened with other published specifications.

  5. I forgot to mention that DanC from the W3C generated a static copy of the specification without implementation requirements and put it online here:

  6. Turns out it is, and by a comfortable margin. Michael(tm) Smith’s HTML5: The Markup Language is a version of the HTML5 draft with all of those eye-wateringly pedantic implementor sections stripped out

    There’s a bunch of other stuff stripped out: most of the text concerning common attributes (all of it for events), all the DOM/API stuff as well as the specification of Canvas (aka the useful part), the clear grouping of HTML elements is replaced by a soulless alpha list, microdata is gone as well, …
    As advertised, “HTML5: The Markup Language” contains only a bare description of the HTML5 markup and as a result a lot of stuff useful to language users has disappeared.

  7. As Anne said, there’s also a version of the actual spec itself with the implementation details hidden.

    The WHATWG single-page version:

    The WHATWG multipage version:

    It roughly halves the length of the printed version of the spec (from 506 to 250 pages, with the recent changes to the default printing styles).

  8. The proof of this will be how easy it is to implement a 900-page standard. If it takes vast armies of volunteers and years of mind-numbing programming and debugging before a browser will begin to look finished, then maybe this isn’t such a good idea. If all major browsers quickly and correctly implement the whole standard, then it might be a success.

    Looking at it from the user perspective, it is said that it shrinks down to “only” 250 pages. I have to cringe at the directions for reading the document: “This specification should be read like all other specifications. First, it should be read cover-to-cover, multiple times. Then, it should be read backwards at least once. Then it should be read by picking random sections from the contents list and following all the cross-references.” I hope they’re joking, but I’m not sure.

    Maybe there’s a misunderstanding. Rather than a statement of rules, this document appears to consist mostly of details of internal implementation. And it’s verbose, even by computer standards. If it all boils down to simple, clear rules that are easy to implement, then it will surely be a success. If it’s crufty, then I fear we’re in for more of the same.

  9. > If it takes vast armies of volunteers and years of mind-numbing
    > programming and debugging before a browser will begin to look
    > finished,

    It certainly would. However, that’s what it takes to write a browser engine for the real Web.

    You could simplify the spec by leaving out a lot of details about what Web pages expect, which makes writing a Web browser take longer because you have to reverse-engineer all those details yourself. Or you could simplify the spec by specifying behaviour simpler than what Web pages expect, in which case a browser that follows your spec can no longer browse the real Web.

  10. >It certainly would. However, that’s what it takes to write a browser
    >engine for the real Web.

    It all seems a bit out of control. I don’t know, maybe they ought to decide what’s most important and take it in smaller bites. It doesn’t seem like a standard that isn’t fully implemented is really a standard. If it gets fully implemented right away and it’s really simple for authors, that will be great.

  11. It won’t be fully implemented right away…however, it’s being implemented in chunks by the browser vendors. Opera has implemented several parts, as have Firefox and Safari. I think even IE8 has one or two of the HTML5 bits.

    Granted, there’s not lots of overlap, but many of the “features” of HTML5 can be used as is with current browsers without them specifically implementing them, especially the various new tags (such as ARTICLE, SECTION, HEADER and FOOTER). I won’t say much about it here; there are plenty of resources that discuss them, including several from Mr. Meyer himself.

  12. is worth reading on this topic.

  13. >“It won’t be fully implemented right away…however, it’s being implemented in chunks by the browser vendors.”

    All right, let’s spell it out. That only works if browsers fully implement the same chunks. If browser A implements chunk X and browser B implements chunk Y, but Web author C uses chunk Z or feature Q, which isn’t fully implemented, then it’s back to the usual mess. A standard is not very useful until authors can rely on implementation.

    So maybe it needs to be rolled out in phases, with a timetable for implementation, so authors can _rely_ on implementation. Any standard ought to include only those features that are well thought out and are simple to use _and_ implement. I hope I’m wrong about this and you guys have it covered, but still, that’s a big standard.

    I’m just looking at the current situation on browser compliance. Look at this and following sections, for example: . The current situation is a crazy quilt of partial implementation. I’ve been told that the important parts have been implemented. Well great, I guess that means that the other parts are unimportant. Somebody needs to cut the cruft.

  14. As long as we don’t have to read and to know the content of the 900 pages, HTML 5 can come :) I haven’t used it yet but I’m pretty excited how it will change the way we develop our HTML Pages (I started with HTML 4 so 5 will be a greater step for me, than from html4 to xhtml).

  15. If HTML5 were a business project and its chief editor was a project manager in business he would have long since been fired. Every project has to deal with issues of scope creep and sometimes such problems are beyond the control of the project manager, but nonetheless that project manager will work their hardest to do their job by controlling costs and keeping things on target and within scope. I honestly believe that if HTML5 were a business venture the business would have long ago gone bankrupt and been forgotten, but standards efforts sometimes do not reflect the real world. Unlike business where scope creep is a commonly feared fire breathing beast it actually seems to be encouraged upon the HTML5 specification. That is bad.

    Here is what I have noticed so far:

    1) HTML5 is not a standards specification for HTML. It does not know what it is. Its authors, in my opinion, cannot tell the difference between the application environments that are the web and a markup language that is a data structure. Prior versions of HTML, all of them, were markup languages. HTML5, however, is clearly more than that as it proposed standards for video and audio codecs, session data, and other application features that have nothing to do with markup languages. As a result this is not a specification for HTML, but could easily be renamed from HTML5 to Web5 as it seems intent to engulf absolutely all aspects of the web except, possibly, for HTTP itself. Although maybe I am wrong, maybe it is intent on swallowing parts of HTTP as well.

    2) The HTML5 specification has no boundaries. Every project, business venture, or technology initiative has a defined list of requirements. Completion of those written requirements is the ONLY objective. Changes may occur, but changes are appended to the requirements and the requirements are always well known. HTML5 does not have requirements since it cannot even define itself. In business, in the real world, you never begin work on a project without knowing the requirements. If radical tangents are raised that become important then a proposal is drafted for a separate project or projects.

    3) HTML5 does not know what it wants to be. Since its requirements are not defined, and completely unknown, it does not know what the final product should be. This is why work must not begin until the requirements are written down. Since HTML5 does not know what it wants to be it can be neither a success nor a failure as there is nothing to compare it against, but it will fail everything dependent upon it.

    4) Relevant external considerations always exist to any project. In this case the most important external considerations are the business costs of adoption into the user-agent applications. In the real world business costs, and the cost of doing business with business partners, can kill any project. Partners evaluate technology for its merits, costs, and potential prospects for generating revenue or opening new market sources. HTML5 seems to partially ignore this issue. It, since it does not know what it is, does not know what its costs are. It can’t. The result is that because the complexity is so vast, since there is no scope, it will never be implemented uniformly.

    5) HTML5 cannot identify what should be its target audience. The audience of any web business related venture is the persons who use the web and consume its data and services. A business, partnership, or consortium of companies never says “to hell with the users; what is best for our developers?” When that does occur, the users, whose very existence the business exists to serve, are alienated away until sufficient market share is destroyed. I believe HTML5 has unintentionally done this very thing by focusing on what is most popular and usable to developers first before ever bothering to think about what is best for the end user or how the core technology can be improved to help focus on what the end users consider most important.

    By focusing, any business venture, on usability first instead of the requirements of the technology or the sustainability of the business model the venture represents the end result tends to be frighteningly less usable. This occurs because web developers believe they are very good at knowing what the user wants, so they are quick to tell the user what they want without asking them. By catering to the developers of the technology without regard for the interests of the consumers to that technology HTML5 has done exactly this, which if it were a business with a product or service, would have alienated its consumers and destroyed its business market share.

    6) Client-side script from the web is the greatest security failure in human history by quantity of incidents. More than 95% of all reported security vulnerabilities are related to some form of client-side scripting. The average cost of a single compromise incident in 2008 was $11.3 million US. If HTML5 does not address a solution to this problem then why would anybody spend the time and money to adopt HTML5? In business a new product version is released because it solves problems or meets some consumer demand. This is logical since new software costs money to develop. HTML5, despite consuming all technology not related to markup languages, does not solve this problem. It does not even address it. In business a technology that does not solve important problems or meet consumer demands would suffer a natural boycott. Microsoft Vista is a perfect such example. It is perfectly understandable that HTML5 would not even address this concern since it does not know what it is.

    Sadly, HTML5 does not seem to care that it is a business calamity. Several people have raised such concerns about scope creep, interoperability, security, accessibility, and so on. Sadly, when those concerns are heard as contrary to objectives of media or usability the subject is changed or the topic quashed into avoidance. It is a standard without competition, and that is all that matters since it will be the future of the web— right?

    Businesses, unlike technology standards, are flexible and shift to better serve their market. Unlike standards businesses face competition and strive to develop innovative solutions to better serve their market in order to be competitive. While standards, at least certain open standards, are not subject to competition technology always is. If HTML5 proves to be the technology and the colossal business swell the specification appears to be it will only create a vacuum for something else that does what it cannot.

    It will only be a matter of time before a superior technology is drafted that is easier to read, cheaper to implement, and solves the problems HTML5 does not choose to solve. The rise of such a technology, if passed into the open domain and allowed for open consumption, will be adopted in favor of HTML. The cheaper technology that better addresses the needs of its business always wins. If such a technology is cheaper to implement and it is easier to write for there is no reason why it could not be adopted.

  16. Dive into HTML5 is shaping up nicely as a great guide for HTML authors.

  17. Andreas said:

    So basically the html5 spec is a browser in pseudocode?

    This is much truer than you probably realize.

  18. Pingback ::

    HTML 5 |Novedades y Tutorial |

    […] A4. However, there is a lot of strictly technical information that does not affect web design, as Eric Meyer says. For example, how to parse the time component or how […]
