So it turns out that crackers can mess up your Web site with nothing more than a malformed HTTP request. You might think something as simple as HTTP would be basically risk-free, but no, I’m afraid not. All it takes is interaction between programs that handle HTTP data slightly differently, and hey presto, you’ve got a security hole.
Ben Laurie weighed in on this:
“It is interesting that being liberal in what you accept is the base cause of this misbehaviour,” Laurie says. “Perhaps it is time the idea was revisited.”
That’s a reference to the late Jon Postel’s dictum (from RFC 793) of “be conservative in what you do, be liberal in what you accept from others”. This is done in the name of robustness: if you’re liberal in what you accept, you can recover from data corruption caused by unanticipated problems.
Laurie’s right. The problem is that being liberal in what you accept inevitably leads to systemic corruption. Look at the display layer of the Web. For years, browsers have been liberal in what markup they accept. What did it get us? Tag soup. The minute browsers allowed authors to be lazy, authors were lazy. The tools written to help authors encoded that laziness. Browsers had to make sure they could deal with even more laziness, and the tools kept up. Just to get CSS out of that death spiral, we (as a field) had to invent, implement, and explain DOCTYPE switching.
The XML specification requires that a user agent stop and throw an error on malformed markup. No error recovery attempts, just a big old “this is broken” message. Gecko already does this, if you get it into full-on XML mode. It won’t do it on HTML, or on XHTML served as text/html, because too many Web pages would just break. But if you serve up XHTML as application/xhtml+xml, and it’s malformed, you’ll be treated to an error message. Period.
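You can see the two philosophies side by side with nothing fancier than Python’s standard library (a sketch, not anything Gecko actually runs): the XML parser treats a mismatched tag as fatal, while the HTML parser shrugs and keeps whatever it can recover.

```python
# Draconian XML parsing vs. liberal HTML parsing, sketched with
# Python's standard library.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

malformed = "<p>An unclosed <em>emphasis tag</p>"

# The XML parser must reject this outright: a mismatched tag is fatal.
try:
    ET.fromstring(malformed)
    xml_result = "parsed"
except ET.ParseError:
    xml_result = "fatal error"

# The HTML parser soldiers on, recovering whatever structure it can.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(malformed)  # no exception raised

print(xml_result)      # fatal error
print(collector.tags)  # ['p', 'em']
```

Same input, two completely different outcomes. That gap between implementations is exactly the room an attacker needs.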
And would that be so bad, even for HTML? After all, if IE did it, you can be sure that people would fix their markup. If browsers had done it from the beginning, markup would not have been malformed in the first place. (Weird and abnormal, perhaps, but not actually malformed.) Håkon said five years ago that “be liberal in what you accept” is what broke the Web, markup- and style-wise. The fight to lift the Web back out of that morass has already run longer than that, and the job still isn’t done.
Authors of feed aggregators have similar dilemmas. If someone subscribes to a feed, thus indicating their interest in it, and the feed is malformed, what do you do? Do you undertake error recovery in an attempt to give the user what they want, or do you just throw up an error message? If you go the error route, what happens when a competitor does the error recovery, and thus gets a reputation as being a better program, even though you know it’s actually worse? That righteous knowledge won’t pay the heating bills, come winter.
“So what?” you may shrug. “It’s not like RSS feeds can be used to breach security”.
Which is just what anyone would have said about HTTP, until very recently.
In the end, the real problem is that liberal acceptance of data will always be exploited. Even if every single HTTP implementor in the world got together and made sure all their implementations did exactly the same strictly correct conservatively defined thing, there would still be people sending out malformed data. They’d be crackers, script kiddies—the people who have every incentive to not be conservative in what they send. The only way to stop them from sending out that malformed data is to be conservative in what your program accepts.
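What does conservative acceptance look like for HTTP? Roughly: when a message’s framing is ambiguous, reject it rather than guess. Here’s a minimal sketch of that idea; the function name, error strings, and header checks are my own illustration, not any real server’s API, though the ambiguities it rejects (conflicting Content-Length values, Content-Length alongside Transfer-Encoding) are the classic request-smuggling vectors.

```python
# A sketch of conservative acceptance for HTTP message framing:
# refuse to guess when two implementations could disagree about
# where one message ends and the next begins.

def check_framing(headers):
    """Return None if framing is unambiguous, else a reason to reject.
    `headers` is a list of (name, value) pairs."""
    content_lengths = [v.strip() for n, v in headers
                       if n.lower() == "content-length"]
    transfer_encodings = [v for n, v in headers
                          if n.lower() == "transfer-encoding"]

    # Two different Content-Length values: downstream programs may
    # disagree about which one wins -- the smuggling vector itself.
    if len(set(content_lengths)) > 1:
        return "conflicting Content-Length headers"
    if content_lengths and not content_lengths[0].isdigit():
        return "non-numeric Content-Length"
    # Content-Length next to Transfer-Encoding is another classic
    # source of disagreement between implementations.
    if content_lengths and transfer_encodings:
        return "both Content-Length and Transfer-Encoding present"
    return None

# A smuggling-style request: two different Content-Length values.
suspect = [("Host", "example.com"),
           ("Content-Length", "10"),
           ("Content-Length", "45")]
print(check_framing(suspect))  # conflicting Content-Length headers
```

The liberal alternative, picking one of the two lengths and moving on, is precisely how a proxy and a server end up parsing the same bytes into two different requests.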
Even then, it might be possible to exploit loopholes, but at least they’d be flaws in the protocol itself. Finding and fixing those is important. Attempting to cope with the twisted landscape of bizarrely interacting error-recovery routines is a fool’s errand at best. Unfortunately, it’s an errand we’re all running.