meyerweb.com


Archive: 'Tech' Category

Transiently Damaged PDF Attachments

I have this very odd problem that seems to be some combination of PDF, Acrobat, Outlook, Thunderbird, and maybe even IMAP and GMail. I know, right?

The problem is that certain PDFs sent to me by a single individual won’t open at first. I’ll get one as an email attachment. I drag the attachment to a folder in my (Snow Leopard) Finder and double-click it to open. The error dialog I immediately get from Acrobat Professional is:

There was an error opening this document. The file is damaged and could not be repaired.

Preview, on the other hand, tells me:

The file “[redacted]” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

When this happens, I tell the person who sent me the file that The Problem has happened again. She sends me the exact same file as an attachment. Literally, she just takes the same file she sent before and drags it onto the new message to send to me again.

And this re-sent file opens without incident. Every time. Furthermore, extra re-sends open without incident. I recently had her send me the same initially damaged file five times, some attached to replies and others to brand-new messages. All of them opened flawlessly. The initially damaged file remained damaged.

Furthermore, if I go through the GMail web interface, I can view the initial attached PDF (the one my OS X applications say is damaged) through the GMail UI without trouble. If I download that attachment to my hard drive, it similarly opens in Acrobat (and Preview) without trouble.

A major indication of damage: that first download is a different size than all the others. In the most recent instance, the damaged file is 680,302 bytes. The undamaged files are all 689,188 bytes. If only I knew why it’s damaged the first time, and not all the others!
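One way to test whether the first download is simply cut short is to compare the two copies byte by byte. The sketch below is a hypothetical Python helper, not part of the troubleshooting described here. It checks each copy for the %PDF- header and for the %%EOF trailer that a complete PDF normally ends with, and it finds the first offset where the copies differ. If the damaged copy turned out to be an exact prefix of the good one (680,302 of 689,188 bytes, so 8,886 bytes short), that would point toward a truncated transfer rather than a problem in the PDF itself. That is a hypothesis to check, not a diagnosis.

```python
import sys
from pathlib import Path
from typing import Optional

def describe_pdf(data: bytes) -> dict:
    """Cheap structural checks on raw PDF bytes (a diagnostic sketch, not a validator)."""
    return {
        "size": len(data),
        # A PDF file begins with a %PDF- header line.
        "has_header": data.startswith(b"%PDF-"),
        # A complete PDF normally ends with an %%EOF marker; a truncated copy usually lacks it.
        "has_eof": b"%%EOF" in data[-1024:],
    }

def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if no byte differs over the shorter length."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

if __name__ == "__main__" and len(sys.argv) == 3:
    damaged, good = (Path(p).read_bytes() for p in sys.argv[1:3])
    print("damaged:", describe_pdf(damaged))
    print("good:   ", describe_pdf(good))
    print("size difference:", len(good) - len(damaged), "bytes")
    print("first differing byte:", first_difference(damaged, good))
```

Run it as, for example, `python compare_pdfs.py damaged.pdf good.pdf` (the script name is arbitrary). A first differing byte of None together with a missing %%EOF in the damaged copy would be the truncation signature.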

So far, I’ve yet to see this happen with PDFs from anyone else, but then I receive very few attached PDFs from people other than this one (our events manager at An Event Apart, who sends and receives PDFs and Office documents like they’re conversational speech—an occupational hazard of her line of work), and it only seems to happen with PDFs of image scans that she’s created. Other types of PDFs, whether she generated them or not, seem to come through fine; ditto for other file types, like Word documents. I’d be tempted to blame the scanning software, but again: the exact same file is damaged the first time, and fine on every subsequent re-attachment.

I’ve done some Googling, and found scattered advice on ways to clear up corrupted-PDF-attachment problems in Thunderbird. I’ve followed these pieces of advice, and nothing has helped. In summary, I have so far:

  1. Set mail.server.default.fetch_by_chunks to false.
  2. Set mail.imap.mime_parts_on_demand to false.
  3. Set mail.server.default.mime_parts_on_demand to false.
  4. Tried the Thunderbird extension OPENATTACHMENTBYEXTENSION. That failed, and so I immediately uninstalled it because handling files by extension alone is just asking to be pwned, regardless of your operating system or personal level of datanoia. (I wouldn’t have left it installed had it worked; I just wanted to see if it did work as a data point.)

Here’s what I know about the various systems in play here:

  • I’m using Thunderbird 11.0.1 on OS X 10.6.8.
  • The attachments are always sent via Outlook 2010 on Windows 7.
  • The software used for the scanning is the HP scanning software that was installed with the scanner. Scans are saved to the hard drive, renamed, and then manually attached to the email. On resend, the same file is manually attached to the email.
  • My email account is a GMail IMAP account.

So. Any ideas?

Customizing Your Markup

So HTML5 allows you (at the moment) to create your own custom elements. Only, not really.

(Ed. note: this post has been corrected since its publication, and a followup clarification has been posted.)

Suppose you’re creating a super-sweet JavaScript library to improve text presentation—like, say, TypeButter—and you need to insert a bunch of elements that won’t accidentally pick up pre-existing CSS. That rules span right out the door, and anything else would be either a bad semantic match, likely to pick up CSS by mistake, or both.

Assuming you don’t want to spend the hours and lines of code necessary to push ahead with span and a whole lot of dynamic CSS rewriting, the obvious solution is to invent a new element and drop that into place. If you’re doing kerning, then a kern element makes a lot of sense, right? Right. And you can certainly do that in browsers today, as well as years back. Stuff in a new element, hit it up with some CSS, and you’re done.

Now, how does this fit with the HTML5 specification? Not at all well. HTML5 does not allow you to invent new elements and stuff them into your document willy-nilly. You can’t even do it with a prefix like x-kern, because hyphens aren’t valid characters for element names (unless I read the rules incorrectly, which is always possible).

No, here’s what you do instead (corrected 10 Apr 12):

  1. Define the element customization you want with an element element. That’s not a typo.
  2. To your element element, add an extends attribute whose value is the HTML5 element you plan to extend. We’ll use span, but you can extend any element.
  3. Now add a name attribute that names your custom “element” name, like x-kern.
  4. Okay, you’re ready! Now anywhere you want to add a customized element, drop in the elements named by extends and then supply the name via an is attribute.

Did you follow all that? No? Okay, maybe this will make it a bit less unclear. (Note: the following code block was corrected 10 Apr 12.)

<element extends="span" name="x-kern"></element>
<h1>
<span is="x-kern" style="…">A</span>
<span is="x-kern" style="…">u</span>
<span is="x-kern" style="…">t</span>
<span is="x-kern" style="…">u</span>
mn
</h1>
<p>...</p>
<p>...</p>
<p>...</p>

(Based on markup taken from the TypeButter demo page. I simplified the inline style attributes that TypeButter generates for purposes of clarity.)

So that’s how you create “custom elements” in HTML5 as of now. Which is to say, you don’t. All you’re doing is attaching a label to an existing element; you’re sort of customizing an existing element, not creating a customized element. That’s not going to help prevent CSS from being mistakenly applied to those elements.
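A quick way to see the “label, not a new element” point is to hand the markup to an HTML parser and look at the tag names it reports. The sketch below uses Python’s standard html.parser as a stand-in for a browser’s parser (the two are not identical, but both see these elements as span); the TagLister class is made up for this example. Every is="x-kern" element comes back as a span, so any author rule aimed at span still applies to it.

```python
from html.parser import HTMLParser
from typing import Optional

class TagLister(HTMLParser):
    """Collect (tag name, value of the is attribute) for every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs).get("is")))

lister = TagLister()
lister.feed('<h1><span is="x-kern">A</span><span is="x-kern">u</span>tumn</h1>')
print(lister.tags)
# [('h1', None), ('span', 'x-kern'), ('span', 'x-kern')]
```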

Personally, I find this a really, really, really clumsy approach—so clumsy that I don’t think I could recommend its use. Given that browsers will accept, render, and style arbitrary elements, I’d pretty much say to just go ahead and do it. Do try to name your elements so they won’t run into problems later, such as prefixing them with an “x” or your username or something, but since browsers support it, may as well capitalize on their capabilities.

I’m not in the habit of saying that sort of thing lightly, either. While I’m not the wild-eyed standards-or-be-damned radical some people think I am, I have always striven to play within the rules when possible. Yes, there are always situations where you work counter to general best practices or even the rules, but I rarely do so lightly. As an example, my co-founders and I went to some effort to play nice when we created the principles for Microformats, segregating our semantics into attribute values—but only because Tantek, Matt, and I cared a lot about long-term stability and validation. We went as far as necessary to play nice, and not one millimeter further, and all the while we wished mightily for the ability to create custom attributes and elements.

Most people aren’t going to exert that much effort: they’re going to see that something works and never stop to question if what they’re doing is valid or has long-term stability. “If the browser let me do it, it must be okay” is the background assumption that runs through our profession, and why wouldn’t it? It’s an entirely understandable assumption to make.

We need something better. My personal preference would be to expand the “foreign elements” definition to encompass any unrecognized element, and let the parser deal with any structural problems like lack of well-formedness. Perhaps also expand the rules about element names to permit hyphens, so that we could do things like x-kern or emeyer-disambiguate or whatever. I could even see my way clear to defining a way to let an author list their customized elements. Say, something like <meta name="custom-elements" content="kern lead follow embiggen shrink"/>. I just made that up off the top of my head, so feel free to ignore the syntax if it’s too limiting. The general concept is what’s important.

The creation of customized elements isn’t a common use case, but it’s an incredibly valuable ability, and people are going to do it. They’re already doing it, in fact. It’s important to figure out how to make the process of doing so simpler and more elegant.

Invented Elements

This morning I caught a pointer to TypeButter, which is a jQuery library that does “optical kerning” in an attempt to improve the appearance of type. I’m not going to get into its design utility because I’m not qualified; I only notice kerning either when it’s set insanely wide or when it crosses over into keming. I suppose I’ve been looking at web type for so many years, it looks normal to me now. (Well, almost normal, but I’m not going to get into my personal typographic idiosyncrasies now.)

My reason to bring this up is that I’m very interested in how TypeButter accomplishes its kerning: it inserts kern elements with inline style attributes that bear letter-spacing values. Not span elements, kern elements. No, you didn’t miss an HTML5 news bite; there is no kern element, nor am I aware of a plan for one. TypeButter basically invents a specific-purpose element.
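The technique itself is simple to sketch. TypeButter is a jQuery library and its actual code is not reproduced here; the following is a hypothetical Python function that builds the same kind of markup as a string, wrapping selected letters in an invented element whose inline style carries a letter-spacing value. The spacing values are made-up placeholders, not TypeButter’s optical kerning calculations, and the element name is a parameter so it could be switched to a prefixed name.

```python
from html import escape

def wrap_kerning(text: str, spacings: dict[int, str], tag: str = "kern") -> str:
    """Wrap chosen characters of text in an element carrying inline letter-spacing.

    spacings maps a character index to a CSS length, e.g. {0: "-0.05em"}.
    Characters without an entry are emitted as plain, HTML-escaped text.
    """
    parts = []
    for i, ch in enumerate(text):
        if i in spacings:
            style = escape(f"letter-spacing: {spacings[i]}")
            parts.append(f'<{tag} style="{style}">{escape(ch)}</{tag}>')
        else:
            parts.append(escape(ch))
    return "".join(parts)

print(wrap_kerning("Autumn", {0: "-0.05em", 1: "0.02em"}))
# <kern style="letter-spacing: -0.05em">A</kern><kern style="letter-spacing: 0.02em">u</kern>tumn
```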

I believe I understand the reasoning. Had they used span, they would’ve likely tripped over existing author styles that apply to span. Browsers these days don’t really have a problem accepting and styling arbitrary elements, and any that do would simply render type their usual way. Because the markup is script-generated, markup validation services don’t throw conniption fits. There might well be browser performance problems, particularly if you optically kern all the things, but used in moderation (say, on headings) I wouldn’t expect too much of a hit.

The one potential drawback I can see, as articulated by Jake Archibald, is the possibility of a future kern element that might have different effects, or at least be styled by future author CSS and thus get picked up by TypeButter’s kerns. The currently accepted way to avoid that sort of problem is to prefix with x-, as in x-kern. Personally, I find it deeply unlikely that there will ever be an official kern element; it’s too presentationally focused. But, of course, one never knows.

If TypeButter shifted to generating x-kern before reaching v1.0 final, I doubt it would degrade the TypeButter experience at all, and it would indeed be more future-proof. It’s likely worth doing, if only to set a good example for libraries to follow, unless of course there’s a downside I haven’t thought of yet. It’s definitely worth discussing, because as more browser enhancements are written, this sort of issue will come up more and more. Settling on some community best practices could save us some trouble down the road.

Update 23 Mar 12: it turns out custom elements are not as simple as we might prefer; see the comment below for details. That throws a fairly large wrench into the gears, and requires further contemplation.

Negative Proximity

There’s a subtle aspect of CSS descendant selectors that most people won’t have noticed because it rarely comes up: selectors have no notion of element proximity. Here’s the classic demonstration of this principle:

body h1 {color: red;}
html h1 {color: green;}

Given those styles, all h1 elements will be green, not red. That’s because the selectors have equal specificity, so the last one wins. The fact that the body element is “closer to” the h1 than the html element in the document tree is irrelevant. CSS has no mechanism for measuring proximity within the tree, and if I had to place a bet on the topic I’d bet that it never will.
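To make the “no proximity” point concrete, here is a deliberately tiny Python model of descendant matching. The helper name matches and the ancestor-list representation are assumptions for illustration; real selector engines are far more involved. What it preserves is the relevant property: matching answers only yes or no, so once both rules match, nothing but specificity and source order can separate them.

```python
def matches(selector: str, ancestors: list[str], element: str) -> bool:
    """Match a two-part descendant selector such as 'body h1' (hypothetical helper).

    ancestors lists the element's ancestors from the root down, e.g. ['html', 'body'].
    The answer is a plain bool: how far away the matching ancestor sits is never recorded.
    """
    ancestor, subject = selector.split()
    return subject == element and ancestor in ancestors

# Rules in source order; both selectors have specificity 0,0,2.
rules = [("body h1", "red"), ("html h1", "green")]
chain = ["html", "body"]

# Among equally specific matching rules, the last one in source order wins.
color = [value for sel, value in rules if matches(sel, chain, "h1")][-1]
print(color)  # green
```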

I bring this up because it can get you into trouble when you’re using the negation pseudo-class. Consider:

div:not(.one) p {font-weight: bold;}
div.one p {font-weight: normal;}

<div class="one">
  <div class="two">
    <p>Hi there!</p>
  </div>
</div>

Given these styles, the paragraph will not be boldfaced. That’s because both rules match, so the last one wins. The paragraph will be normal-weight.

“AHA!” you cry. “But the first rule has a higher specificity, so it wins regardless of the order they’re written in!” You’d think so, wouldn’t you? But it turns out that the negation pseudo-class isn’t counted as a pseudo-class. It, like the universal selector, doesn’t contribute to specificity at all, though the .one inside it does count, so div:not(.one) p and div.one p end up with identical specificity (0,1,2):

Selectors inside the negation pseudo-class are counted like any other, but the negation itself does not count as a pseudo-class.

—Selectors Level 3, section 9: Calculating a selector’s specificity

If you swapped the order of the rules, you’d get a boldfaced paragraph thanks to the “all-other-things-being-equal-the-last-rule-wins” step in the cascade. However, that wouldn’t keep you from getting a red-on-red paragraph in this case:

div:not(.one) p {color: red;}
div.one p {background: red;}

<div class="one">
  <div class="two">
    <p>Hi there!</p>
  </div>
</div>

The paragraph is a child of a div that doesn’t have a class of one, but it’s also descended from a div that has a class of one. Both rules apply.
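The arithmetic is easy to check mechanically. Below is a minimal Python sketch of Selectors Level 3 specificity for a small subset of syntax: type, class, ID, and universal selectors, plus :not() with a simple argument. Pseudo-elements, attribute selectors, and other pseudo-classes are not handled, and the function names are made up for this example. Following the rule quoted above, :not() adds nothing itself while its argument counts normally.

```python
import re

# One simple selector: #id, .class, :not(...), the universal *, or a type name.
TOKEN = re.compile(r"#[\w-]+|\.[\w-]+|:not\(([^)]*)\)|\*|[a-zA-Z][\w-]*")

def specificity(selector: str) -> tuple[int, int, int]:
    """Return (ids, classes, types) for a small subset of Selectors Level 3 syntax."""
    a = b = c = 0
    for m in TOKEN.finditer(selector):
        token = m.group(0)
        if token.startswith(":not("):
            # The negation adds nothing itself; the selector inside it counts normally.
            ia, ib, ic = specificity(m.group(1))
            a, b, c = a + ia, b + ib, c + ic
        elif token.startswith("#"):
            a += 1
        elif token.startswith("."):
            b += 1
        elif token == "*":
            pass  # the universal selector contributes nothing
        else:
            c += 1  # type selector
    return (a, b, c)

def cascade_winner(selectors: list[str]) -> str:
    """Among rules that all match, pick the highest specificity, ties going to the later rule."""
    return max(enumerate(selectors), key=lambda pair: (specificity(pair[1]), pair[0]))[1]

print(specificity("div:not(.one) p"))  # (0, 1, 2)
print(specificity("div.one p"))        # (0, 1, 2)
print(cascade_winner(["div:not(.one) p", "div.one p"]))  # div.one p
```

Swapping the order of the two selectors in the list flips the winner, which is exactly the order dependence described above.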

(Thanks to Stephanie Hobson for first bringing this to my attention.)

The Web Ahead, Episode #18: Me!

Last Thursday, I had the rare honor and privilege of chatting with Jen Simmons as a guest on The Web Ahead. (I’ve also chatted with Jen in real life. That’s even awesomer!) As is my wont, I completely abused that privilege by chatting for two hours—making it the second-longest episode of The Web Ahead to date—about the history of the web and CSS, what’s coming up that jazzes me the most, and all kinds of stuff. I even revealed, toward the end of the conversation, the big-picture projects I dearly wish I had time to work on.

The finished product was published last Friday morning. I know it’s a bit of a lengthy beast, but if you’re at all interested in how we got to where we are with CSS, you might want to give this a listen: The Web Ahead, Episode #18. Available for all your finer digital audio players via embedded Flash player, iTunes, RSS, and MP3 download.

My deepest thanks to Jen for inviting me to be part of the show!

Finding Unicode

A little while back, I was reading some text when I realized the hyphens didn’t look quite right. A little too wide, I thought. Not em-dash wide, but still…wide. Wide-ish? But when I copied some of the text into a BBEdit window, they looked just like the hyphens I typed into the document.

Of course, I know Unicode is filled with all manner of symbols and that the appearance of those symbols can vary from one font face to another. So I changed the font face, made the size really huge, and behold: they were indeed different characters. At this point, I was really curious about what I’d found. What exactly was it? How would I find out?

For the record, here’s the character in question:

−

Googling “−” and “− Unicode” got me nothing useful. I knew I could try the Character Viewer in OS X, and eventually I did, but I was wondering if there was a better (read: lazier) solution. I asked the Twittersphere for advice, and while I don’t know if these solutions are any lazier, here are the best of the suggestions I received.

  • Unicode Lookup, a site that lets you input or paste in any character and get a report on what it is and how one might call it in various encodings.
  • Richard Ishida’s UniView Lite, which does much the same as Unicode Lookup with the caveat that once you’ve input your character, you have to hit the “Chars” button, not the “Search” button. The latter is apparently how you search Unicode character names for a word or other string, like “dash” or “quot”.
  • UnicodeChecker (OS X), a nice utility that includes a character list pane as well as the ability to type or paste a character into an input and instantly get its gritty details.

Any of those will tell you that the − in question is MINUS SIGN, codepoint 8722 (decimal) / 2212 (UTF-16 hex) / U+2212 (Unicode hex) / et cetera, et cetera. Did you know it was designated in Unicode 1.1? Now you do, thanks to UnicodeChecker and this post. You’re welcome.
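For the programmatically lazy, Python’s standard unicodedata module can answer the same question. This is a small sketch (the describe_char name is made up here); note that unicodedata reports a character’s name but not the Unicode version in which it was introduced, so the 1.1 tidbit still needs a tool like UnicodeChecker.

```python
import unicodedata

def describe_char(ch: str) -> str:
    """Summarize one character: code point, Unicode name, decimal value, UTF-8 bytes."""
    cp = ord(ch)
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{cp:04X} {name} (decimal {cp}, UTF-8 {ch.encode('utf-8').hex(' ')})"

print(describe_char("\u2212"))  # U+2212 MINUS SIGN (decimal 8722, UTF-8 e2 88 92)
print(describe_char("-"))       # U+002D HYPHEN-MINUS (decimal 45, UTF-8 2d)
```

Pasting the mystery character straight into the call works just as well as the escape sequence, which makes the hyphen-versus-minus comparison a two-line job.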

Update 2 Mar 12: Philippe Wittenberg points out in the comments that you can add a UnicodeChecker service. With that enabled, all you have to do is highlight a character, summon the contextual menu (right-click, for most of us), and have it shown in UnicodeChecker. Now that’s the kind of laziness I was trying to attain!

“The Vendor Prefix Predicament” at ALA

Published this morning in A List Apart #344: an interview I conducted with Tantek Çelik, web standards lead at Mozilla, on the subject of Mozilla’s plan to honor -webkit- prefixes on some properties in their mobile browser. Even better: Lea Verou’s Every Time You Call a Proprietary Feature ‘CSS3,’ a Kitten Dies. Please—think of the kittens!

My hope is that the interview brings clarity to a situation that has suffered from a number of misconceptions. I do not necessarily hope that you agree with Tantek, nor for that matter do I hope you disagree. While I did press him on certain points, my goal for the interview was to provide him a chance to supply information, and insight into his position. If that job was done, then the reader can fairly evaluate the claims and plans presented. What conclusion they reach is, as ever, up to them.

We’ve learned a lot over the past 15-20 years, but I’m not convinced the lessons have settled in deeply enough. At any rate, there are interesting times ahead. If you care at all about the course we chart through them, be involved now. Discuss. Deliberate. Make your own case, or support someone else’s case if they’ve captured your thoughts. Debate with someone who has a different case to make. Don’t just sit back and assume everything will work out—for while things usually do work out, they don’t always work out for the best. Push for the best.

And fix your browser-specific sites already!

Unfixed

Right in the middle of AEA Atlanta—which was awesome, I really must say—there were two announcements that stand to invalidate (or at least greatly alter) portions of the talk I delivered. One, which I believe came out as I was on stage, was the publication of the latest draft of the CSS3 Positioned Layout Module. We’ll see if it triggers change or not; I haven’t read it yet.

The other was the publication of the minutes of the CSS Working Group meeting in Paris, where it was revealed that several vendors are about to support the -webkit- vendor prefix in their own very non-WebKit browsers. Thus, to pick but a single random example, Firefox would throw a drop shadow on a heading whose entire author CSS is h1 {-webkit-box-shadow: 2px 5px 3px gray;}.

As an author, it sounds good as long as you haven’t really thought about it very hard, or if perhaps you have a very weak sense of the history of web standards and browser development. It fits right in with the recurring question, “Why are we screwing around with prefixes when vendors should just implement properties completely correctly, or not at all?” Those idealized end-states always sound great, but years of evidence (and reams upon reams of bug-charting material) indicate it’s an unrealistic approach.

As a vendor, it may be the least bad choice available in an ever-competitive marketplace. After all, if there were a few million sites that you could render as intended if only the authors used your prefix instead of just one, which would you rather: embark on a protracted, massive awareness campaign that would probably be contradicted to death by people with their own axes to grind; or just support the damn prefix and move on with life?

The practical upshot is that browsers “supporting alien CSS vendor prefixes”, as Craig Grannell put it, seriously cripples the whole concept of vendor prefixes. It may well reduce them to outright pointlessness. I am on record as being a fan of vendor prefixes, and furthermore as someone who advocated for the formalization of prefixing as a part of the specification-approval process. Of course I still think I had good ideas, but those ideas are currently being sliced to death on the shoals of reality. Fingers can point all they like, but in the end what matters is what happened, not what should have happened if only we’d been a little smarter, a little more angelic, whatever.

I’ve seen a proposal that vendors agree to only support other prefixes in cases where they are un-prefixing their own support. To continue the previous example, that would mean that when Firefox starts supporting the bare box-shadow, they will also support -webkit-box-shadow (and, one presumes, -ms-box-shadow and -o-box-shadow and so on). That would mitigate the worst of the damage, and it’s probably worth trying. It could well buy us a few years.

Developers are also trying to help repair the damage before it’s too late. Christian Heilmann has launched an effort to get GitHub-based projects updated to stop being WebKit-only, and Aaron Gustafson has published a UNIX command to find all your CSS files containing webkit along with a call to update anything that’s not cross-browser friendly. Others are making similar calls and recommendations. You could use PrefixFree as a quick stopgap while going through the effort of doing manual updates. You could make sure your CSS pre-processor, if that’s how you swing, is set up to do auto-prefixing.
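In the same spirit, here is a rough Python sketch for auditing a project. It is not Aaron Gustafson’s command: it walks a directory for .css files and, going a step further than a plain search for webkit, lists -webkit- properties that never appear unprefixed in the same file. It uses regular expressions rather than a real CSS parser, so comments or strings can fool it, and an unprefixed declaration anywhere in the file counts, even in a different rule. The names webkit_only and scan are hypothetical.

```python
import re
from pathlib import Path

# A -webkit- prefixed property name followed by the colon that starts its value.
PREFIXED = re.compile(r"-webkit-([a-z-]+)\s*:")

def webkit_only(css: str) -> list[str]:
    """Properties declared with -webkit- that never appear unprefixed in the same text."""
    missing = []
    for prop in sorted(set(PREFIXED.findall(css))):
        # A bare declaration: the name not glued to a preceding prefix, then a colon.
        bare = re.compile(rf"(?<![\w-]){re.escape(prop)}\s*:")
        if not bare.search(css):
            missing.append(prop)
    return missing

def scan(root: str) -> dict[str, list[str]]:
    """Map each .css file under root to its WebKit-only properties (files with none are skipped)."""
    report = {}
    for path in sorted(Path(root).rglob("*.css")):
        found = webkit_only(path.read_text(encoding="utf-8", errors="replace"))
        if found:
            report[str(path)] = found
    return report

print(webkit_only("h1 {-webkit-box-shadow: 2px 5px 3px gray;}"))  # ['box-shadow']
```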

Non-WebKit vendors are in a corner, and we helped put them there. If the proposed prefix change is going to be forestalled, we have to get them out. Doing that will take a lot of time and effort and awareness and, above all, widespread interest in doing the right thing.

Thus my fairly deep pessimism. I’d love to be proven wrong, but I have to assume the vendors will push ahead with this regardless. It’s what we did at Netscape ten years ago, and almost certainly would have done despite any outcry. I don’t mean to denigrate or undermine any of the efforts I mentioned before—they’re absolutely worth doing even if every non-WebKit browser starts supporting -webkit- properties next week. If nothing else, it will serve as evidence of your commitment to professional craftsmanship. The real question is: how many of your fellow developers come close to that level of commitment?

And I identify that as the real question because it’s the question vendors are asking—must ask—themselves, and the answer serves as the compass for their course.
