So the feeds I read have been buzzing the past few days with running commentary of the WebKit and Opera teams’ race to be the first to hit 100/100 on Acid3, and then after that the effort to get a pixel-perfect match with the reference image. Last I saw, Opera claimed to have gotten to 100 first but it looked like WebKit had gotten both with something publicly available, but I haven’t verified any of this for myself. Nor do I have any particular plans to do so.
Because as lovely as it is to see that you can, in fact, get one or more browser implementation teams to jump in a precisely defined sequence through a series of cunningly (one might say sadistically) placed hoops, half of which are on fire and the other half lined with razor wire, it doesn’t strike me as the best possible use of the teams’ time and energy.
No, I don’t hate standards, though I may hate freedom (depends on who’s asking). What I disagree with is the idea that if you cherry-pick enough obscure and difficult corners of a bunch of different specifications and mix them all together into a spicy meatball of difficulty, it constitutes a useful test of the specifications you cherry-picked. Because the one does not automatically follow from the other.
For example, suppose I told you that WebKit had implemented just the bits of SMIL-related SVG needed to pass the test, and that in doing so they exposed a woefully incomplete SVG implementation, one that gets something like 2% pass rates on actual SMIL/SVG tests. Laughable, right? Yes, well.
Of course, that’s in a nightly build, and they might totally support SMIL by the time the corresponding final version is released, and we’ll all look back on this and laugh the carefree laugh of children in springtime. Maybe. The real point here is that the Acid3 test isn’t a broad-spectrum standards-support test. It’s a showpiece, and something of a Potemkin village at that. Which is a shame, because what’s really needed right now is exhaustive test suites for specifications: XHTML, CSS, DOM, SVG, you name it. We’ve been seeing more of these emerge recently, but they’re not enough. I’d have been much more firmly in the cheering section had the effort that went into Acid3 gone into, say, an obsessively thorough DOM test suite.
I’d had this post in mind for a while now, really ever since Acid3 was released. Then the horse race started to develop, and I told myself I really needed to get around to writing that post—and I got overtaken. Well, that’s being busy for you. It’s just as well I waited, really, because much of what I was going to say got covered by Mike Shaver in his piece explaining why Firefox 3 isn’t going to hit 100% on Acid3. For example:
Ian’s Acid3, unlike its predecessors, is not about establishing a baseline of useful web capabilities. It’s quite explicitly about making browser developers jump… the Acid tests shouldn’t be fair to browsers, they should be fair to the web; they should be based on how good the web will be as a platform if all browsers conform, not about how far any given browser has to stretch to get there.
That’s no doubt more concisely and clearly stated than I would have managed, so it’s all for the best that he got to say it first.
By the by, I was quite intrigued by this part of Mike’s post:
You might ask why Mozilla’s not racking up daily gains, especially if you’re following the relevant bugs and seeing that people have produced patches for some issues that are covered by Acid3.
The most obvious reason is Firefox 3. We’re in the end-game of building what I really do believe is the best browser the web has ever known, and we expect to be putting it in the hands of more than 170 million users in a pretty short period of time. We’re still taking fixes for important issues, but virtually none of the issues on the Acid3 list are important enough for us to take at this stage. We don’t want to be rushing fixes in, or rushing out a release, only to find that we’ve broken important sites or regressed previous standards support, or worse introduced a security problem. Every API that’s exposed to content needs to be tested for compliance and security and reliability… We think these remaining late-stage patches are worth the test burden, often because they help make the web platform much more powerful, and reflect real-web compatibility and capability issues. Acid3’s contents, sadly, are not as often of that nature.
You know, it’s weird, but that seems really familiar, like I’ve heard or read something like that before. Now if only I could remember… Oh yeah! It’s basically what the IE team said about not passing Acid2 when the IE7 betas came out, for which they were promptly excoriated.
Well, never mind that now. Of course it was a totally different set of circumstances and core motivations, and I’m sure there’s absolutely no parallel to be drawn between the two situations. At all.
Returning to the main point here: I’m a little bit sad, to tell the truth. The original acid test was a perfect example of what I think makes for a good stress test. Recall that the test’s original name, before it got shorthanded, was the “Box Model Acid Test”. It was a test of CSS box model handling, including floats. That’s all it was designed to do. It did that fairly well for its time, considering it was part of a CSS1 test suite. It didn’t try to combine box model testing with tests for PNG support, HTML parse error recovery, and DOM scripting.
To me, the ideal CSS test suite is one that has a bunch of basic property/value tests, like the ones I’ve been responsible for creating (1, 2), along with a bunch of acid tests for specific areas or concepts in that specification. So an acidified CSS test suite would have individual acid tests for the box model, positioning, fonts, selectors, table layout, and so on. It would not involve scripting or markup parsing (beyond what’s needed to handle selectors). It would not use animated SVG icons. Hell, it probably wouldn’t even use PNGs, except possibly alphaed PNGs when testing opacity and RGBA colors. And maybe not even then.
So in a DOM test suite, you’d have one test page for each method or attribute, and then build some acid tests out of related bits (say, on an entire interface or set of closely related interfaces). And maybe, at the end, you’d build an overarching acid test that rolled everything in the DOM spec into one fiendishly difficult test. But it would be just about the DOM and whatever absolute minimum of other stuff you needed, like text rendering and maybe GIF support. (Similarly, the CSS tests had to assume some basic HTML and CSS selector support, or else everything else fell down.)
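To make the “one test page per method” idea concrete, here’s a minimal sketch of what a single-method test might look like, using getElementById as the example. The stub document object and the function names are purely illustrative, so the harness idea can be shown outside a browser; a real test page would just run the check against the live document.

```javascript
// Sketch of a single-method DOM test: checks only getElementById,
// nothing else. One page (or function) per method, as described above.
function testGetElementById(doc) {
  var el = doc.getElementById("target");
  if (el === null) return "FAIL: #target not found";
  if (el.tagName !== "DIV") return "FAIL: wrong element returned";
  if (doc.getElementById("no-such-id") !== null) {
    return "FAIL: nonexistent id should return null";
  }
  return "PASS";
}

// Hypothetical stand-in for a browser document containing <div id="target">,
// so the sketch runs anywhere; in a browser you'd pass `document` itself.
var stubDocument = {
  elements: { target: { tagName: "DIV" } },
  getElementById: function (id) {
    return this.elements.hasOwnProperty(id) ? this.elements[id] : null;
  }
};

console.log(testGetElementById(stubDocument));
```

The point of keeping each test this narrow is that a failure tells you exactly which method is broken, which is precisely what a kitchen-sink acid test can’t do.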
And then, after all those test suites have been built up and a series of acid tests woven into them, with each one culminating in its own spec-spanning acid test, you might think about taking those end-point acid tests and slamming them all together into one super-ultra-hyper-mega acid test, something that even the xenomorphs from the Alien series would look at and say, “That’s gonna sting”. That would be awesome. But that’s not what we have.
I fully acknowledge that a whole lot of very clever thinking went into the construction of Acid3 (as was true of Acid2), and that a lot of very smart people have worked very hard to pass it. Congratulations all around, really. I just can’t help feeling like some broader and more important point has been missed. To me, it’s kind of like meeting the general challenge of finding an economical way to loft broadband transceivers to an altitude of 25,000 feet (in order to get full coverage of large metropolitan areas while avoiding the jetstream) by daring a bunch of teams to plant a transceiver near the summit of Mount Everest—and then getting them to do it. Progress toward the summit can be demonstrated and kudos bestowed afterward, but there’s a wider picture that seems to have been overlooked in the process.