Archive: 2007

Survey Analysis Service

During our analysis of the responses to the Web Design Survey, one of the things I thought seriously about doing was dumping the whole dataset into a database and building a web front end to query it.  Then I remembered that as a back-end developer, I’m an excellent book author.  I know some MySQL and PHP, but I’m right in that sour spot: my moderate but incomplete knowledge of the languages is enough to make development slow and error-prone, but not enough to design the project correctly from the outset.  So I stuck to Excel and the like, which can be cumbersome but are quickly learned.

I was a little sad, though.  I’d had the thought that if I built an interface to the survey results, it could be released publicly once we were done.  With such a tool, anyone could generate their own pivot tables without having to learn the process in Excel (or deal with Excel’s handling of enormous data files).  That seemed like a really good thing.

Well, the dataset is public now.  So how about one or more of you super-sharp developer types, the ones who didn’t check any of the boxes on the question about gaps in your back-end coding skills, doing what I could not?

The basic scope of the project would be to list the various data points (gender, ethnicity, age bracket, salary, geographic location, perception of bias, etc., etc., etc.) and let a user pick the two they want to analyze against each other.  So if someone wanted a table showing the breakdown of gender by ethnicity, they would pick one to go on the top and the other to go on the left.  The table generated would give those numbers.  I’d have it spit out raw numbers, but allowing the user to optionally get the results as percentages might be a nice touch.  Though then you’d have to let the user say which way the percentages are calculated: by column, or by row.
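To make the idea concrete, here is a minimal sketch of that two-way table in Python, assuming each response has already been loaded as a dictionary keyed by data point.  The function name `crosstab`, the field names, and the sample rows are all made up for illustration; none of them come from the actual dataset.

```python
from collections import Counter

def crosstab(rows, top, left, percent=None):
    """Count responses for each (left value, top value) pair.

    percent: None for raw counts, or "row" / "column" to get
    percentages calculated along that axis.
    """
    counts = Counter((r[left], r[top]) for r in rows)
    if percent is None:
        return dict(counts)
    # Sum the counts along whichever axis the percentages are based on.
    totals = Counter()
    for (l, t), n in counts.items():
        totals[l if percent == "row" else t] += n
    return {
        (l, t): round(100 * n / totals[l if percent == "row" else t], 1)
        for (l, t), n in counts.items()
    }

# Made-up rows, purely for illustration; not survey data.
rows = [
    {"gender": "F", "age": "19-21"},
    {"gender": "M", "age": "19-21"},
    {"gender": "M", "age": "19-21"},
    {"gender": "F", "age": "22-27"},
]
raw = crosstab(rows, top="gender", left="age")
by_row = crosstab(rows, top="gender", left="age", percent="row")
```

Here `raw` holds plain counts, while `by_row` expresses each cell as a share of its row, so the "19-21" row splits into roughly a third and two thirds.  Passing `percent="column"` divides by the column totals instead.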

For extra deepness, one could also filter the results based on the value for a third data point.  With that sort of feature, one could get the breakdown of gender by ethnicity for only the EU respondents.  I might do it by letting the user click on a data point and then pick the specific filtering value via a dropdown.  Maybe three radio buttons: one for top, one for left, and one for filter.  Or, heck, do a whole Web 2.0 drag-n-drop interface.  That part’s not important.  What matters is giving anyone the ability to easily get numbers out of the massive dataset.
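As a rough sketch of the filtering step, assuming the responses are a list of dictionaries, one could drop everything that doesn’t match the third data point before counting.  The function name `filtered_counts`, the field names, and the values below are hypothetical placeholders, not the survey’s real columns.

```python
from collections import Counter

def filtered_counts(rows, top, left, filter_field, filter_value):
    """Cross-tabulate only the rows where filter_field equals filter_value."""
    kept = [r for r in rows if r[filter_field] == filter_value]
    return dict(Counter((r[left], r[top]) for r in kept))

# Made-up rows, purely for illustration; not survey data.
rows = [
    {"gender": "F", "ethnicity": "A", "region": "EU"},
    {"gender": "M", "ethnicity": "A", "region": "EU"},
    {"gender": "M", "ethnicity": "B", "region": "US"},
]
eu_only = filtered_counts(
    rows, top="gender", left="ethnicity",
    filter_field="region", filter_value="EU",
)
```

Only the two "EU" rows survive the filter, so `eu_only` has no entry for ethnicity "B" at all.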

The only real challenge I can foresee is where questions allowed more than one answer, like the location of work question and the skill questions.  In the dataset, they’re just comma-separated value lists.  Those would need to somehow be broken out into subtables or Boolean columns or something.  The actual structure of the solution interests me a whole lot less than simply having one.
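One possible shape for the Boolean-columns approach, sketched in Python under the assumption that no individual answer contains a comma of its own (if any do, a naive split like this would break them apart).  The helper name `explode_multi` and the sample answers are invented for the example.

```python
def explode_multi(rows, field):
    """Replace a comma-separated field with one True/False column per answer."""
    def parse(value):
        return {opt.strip() for opt in value.split(",") if opt.strip()}

    # Every distinct answer seen anywhere in the field becomes a column.
    options = sorted(set().union(*(parse(r[field]) for r in rows)))
    result = []
    for r in rows:
        chosen = parse(r[field])
        new_row = {k: v for k, v in r.items() if k != field}
        for opt in options:
            new_row[f"{field}:{opt}"] = opt in chosen
        result.append(new_row)
    return result

# Made-up rows, purely for illustration; not survey data.
rows = [
    {"id": 1, "skills": "PHP, MySQL"},
    {"id": 2, "skills": "PHP"},
    {"id": 3, "skills": ""},
]
wide = explode_multi(rows, "skills")
```

Each row in `wide` ends up with a `skills:MySQL` and a `skills:PHP` column, which can then be cross-tabulated like any single-answer question.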

I’m quite sure this is the kind of thing a real programmer could create in about a day.  As I am not a real programmer any more, it would take me a month or four.  Let’s not wait.  Anyone out there able to take the idea and run with it?

Digging Into the Data

One of the practical reasons we released the anonymized data sets from the 2007 Web Design Survey was that we knew we couldn’t ask every possible question, let alone report on the results.  For that matter, we knew we wouldn’t even be able to come up with every possible question.  It’s one thing to approach this enormous mountain of data with a specific question in mind; those questions always seem obvious to the questioner.  In that case, there’s a clear path to the summit.  But we didn’t come at this with a specific angle in mind.  We just wanted to know what the profession looks like.  So we not only didn’t have a clear path to the summit, we didn’t even have a summit to reach.  Instead, we had thousands.  The tyranny of choice came down on our heads like a, well, like a falling mountain.

So the obvious choice was to release the data for others to analyze in search of their own summits, and I’m really glad to see people already doing so.  One gentleman is looking to produce an analysis of UK respondents, for example.  Others are asking specific questions and getting surprising results.

For example, Rebekah Ahrens grabbed a copy of the dataset and pulled out the answer to a straightforward question: what’s the gender distribution for the various age groups?  What she found was that almost without exception, the younger the age group, the smaller the percentage of women.  Here’s a chart showing the results she found in graphic form.

Wow.  What is causing that?  It’s a pronounced enough pattern that I initially wondered if it was somehow an artifact of the analysis method.  Several times during the authoring of the report I’d think I’d found some amazing and previously unsuspected trend… only to discover I’d divided some numbers by the wrong total, or charted column-wise when the table was row-oriented.  Mistakes along those lines.  They happen.  But I really don’t think that’s the case here.

Now, the really important question is why this pattern exists, and that’s where the data fail us—we can’t get the numbers to reveal all the forces that went into their collection.  There are any number of reasons why this pattern might exist.  I thought of three hypotheses in quick succession, and I’m sure there are many more of equal or greater plausibility.

  1. Younger women didn’t hear about the survey, and so didn’t take it.
  2. Women are losing interest in the field, instead heading into other career paths, and so those who have stuck with the field longer are more prevalent.
  3. Increasing margins of error at the low and high ends of the age spectrum reduce the confidence of the numbers to the point that we can’t draw any conclusions.

Remember, these are all hypotheses, any one of which could be true or not.  So how would we go about proving or disproving them?

  1. Conduct a survey of women in the field to see if they answered the survey, if they know others who did, the ages of themselves and those others, and so on.  Difficult to undertake, but not impossible.
  2. Ditto #1, although a possibly useful followup analysis would be to look at the gender distributions by longevity in the field and then cross-reference the two results.
  3. Get someone who is a statistician to figure out the likely margins of error to see if that might explain things.  I’d do it, but I have no idea how.  I would tend to be skeptical of this as an explanation given the clear trend, but I suppose it is possible.

I can, however, create the gender-by-longevity chart mentioned in #2 there.

Check it out: above six years’ longevity, women are consistently more represented (compared to the overall average) than they are below six years’ longevity.  The only exception is a spike at “1 year or less”.  Is that enough to explain the trend Rebekah spotted?  It doesn’t look like it to me, but then I’m not a statistician.  I also wonder a bit about the spikes at the edges of the longevity spectrum.
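For anyone who wants to poke at hypothesis #3, the usual textbook starting point is the normal-approximation margin of error for a proportion, z × sqrt(p(1 − p) / n).  The sketch below assumes a simple random sample, which a self-selected web survey is not, so treat it as a rough gauge rather than a verdict.  The function name and the numbers plugged in are hypothetical, not figures from the survey.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p observed in n responses.

    z=1.96 corresponds to the conventional 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical inputs: a 20% share in groups of 400 and 100 respondents.
larger_group = margin_of_error(0.2, 400)
smaller_group = margin_of_error(0.2, 100)
```

With these made-up inputs, the group of 400 comes out at roughly plus or minus 3.9 percentage points and the group of 100 at about double that, which is the general effect hypothesis #3 describes: the smaller the age bracket, the fuzzier its percentage.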

I’m not trying to propose an explanation here, because I don’t have one.  I don’t even have an unsubstantiated belief as to what’s happening here.  I know just enough to know that I don’t know enough to know the answer.  What I’m saying is this: the great thing is that anyone can do this sort of analysis; and that even better, having done so, we can start to figure out what questions we need to be asking of ourselves and each other.

Analytical Breakdowns

As a member of the incredible team that’s worked so hard to create it, I couldn’t be more pleased to announce the publication of the findings of the 2007 Web Design Survey.

It’s amazing how many things this process broke.  Right from the outset, the enormity of the entire enterprise pushed all sorts of things past their limits.

The first thing the survey broke was our expectations.  I can only speak authoritatively for myself, but I think the general sense in the team was that we’d be really happy if the number of responses approached 10,000; I myself was expecting somewhere in the 7,000 – 8,000 range.  Instead, we got 32,831 responses.  So many people took the survey at once that it slowed down the Event Apart server, where the survey was hosted.

Although the survey software collected all that information with nary a hitch, getting it back out was another question.  Over the first few days, I was grabbing snapshots of the results as they came in.  That had to stop when an export request brought the entire server to a grinding halt.  Turns out we’d hit a limitation in the software’s code that made exports of tables above a certain size run very, very slowly.

When everything was finished and it was time to run the final export, the same problem bit us again: only this time, we couldn’t get the data out at all.  It was finally Mark Huot who extracted it by hacking the software a bit and running a custom-configured local server instance just to get it extracted in CSV format.

Which left us with a 35.8MB CSV file that needed to be cleaned a bit, as it had some funky characters in it that prevented a straight import into Excel (or anything else).  Have you ever tried to load a 35.8MB text file into a text editor?  It takes a while.  At one point, I honestly feared I’d locked up BBEdit.  And that was just to load it.  Imagine doing find-and-replace operations.  With grep.  (Now imagine constructing the grep expressions needed without gacking half the file.)

But eventually we got the file scrubbed clean and imported into Excel.  The end result was a 21.7MB Excel file.  Have you ever tried to work with a 21.7MB file in Excel?  It takes a while.  Hell, just hitting “Save” sometimes meant a three-minute wait.  And then there’s doing data analysis on a file that large.  Have you ever…?  Right, you get the idea.

The next thing that broke was our ability to process the data in meaningful ways.  I was able to pull some high-level numbers—what you might call surface analyses—for each of the questions.  For example, I could tell you the breakdown of titles, or genders, or education levels; in effect, all the charts in the final report’s introduction (that is, Figures i – xxviii).  At one point, I could even tell you what proportion of the respondents had 401(k)s.  But when it came to looking more in-depth, I was out of my depth.  Want to know the educational-level distribution of female part-time workers at non-profits?  I was lost.  A statistician I’m not.

So we hired two.  Jeffrey and I decided to use some of the proceeds from An Event Apart to retain two professional statistical consultants, Alan Brickman and Larry Yu, and we worked with them to identify interesting questions and patterns.  They did a superb job of not only breaking down the data for us, but also keeping our pattern-recognition impulses in check.  It’s tempting to look at a spike in numbers and invent reasons for its existence, and they reined us in more than once.

For example, and here’s a tiny peek into the results, we found that the higher a respondent’s salary, the more likely they are to participate in formal training (Fig. 9.6, p. 73, in case you want to go straight there).  But why?  Do they make more because they undergo more training; or undergo more training because they can afford it, or can afford to make their company pay for it?  Do not know, so cannot say.

Of course, it’s one thing to get a few hundred tables of numbers and a lengthy written summary from your consultants.  It’s quite another to turn it into a report that distills it all down to the essence, and looks good to boot.  And that’s where we started to have real trouble.

I mean, sure, Excel can spit out charts like nobody’s business, but I have news: visually speaking, they really aren’t very good charts.  I know you’re shocked to hear this.  Perhaps under most circumstances that would be okay, but the charts that would go into our final report needed to look professional.  And by that, I mean designery professional.

As Head Chart Guy, I grappled with Excel (the Office X version) for weeks.  I learned more about Excel charting than I’d ever known, and still I could not get it to do what I wanted.  Right aligning the left labels on a chart?  The only way was to set the text rotation to something other than zero.  Then they’d all right-align, but also not be straight horizontal labels.  Plus they looked like crap that way, because there was no anti-aliasing happening on the text, or really anywhere else.  And so on.  We absolutely needed Excel to analyze the data, but its graphical output wasn’t what we needed.

So we looked around.  We considered PlotKit; we thought about pure CSS graphs; we even considered hand-drawing every last chart in Illustrator.  Eventually, we decided to give Numbers a try.  And immediately broke it.

Remember the big ol’ data files I mentioned earlier?  It turns out that Numbers was unable to even open them.  We couldn’t do any of our analysis in Numbers.  In hindsight, this was probably a good thing, because I don’t think it does anything like pivot tables, and those were absolutely fundamental to the whole process.  One personal benefit of this whole process for me is that I finally learned how to create pivot tables in Excel.  Strangely, it’s both easier and harder than you might expect.

So in the end, I produced needed data tables in Excel, copied them over to Numbers, and produced the charts there.  In the process, I learned way more about Numbers than I ever really wanted to know.  Jason and I could probably write a medium-sized book on all the things we learned, and learned to hate, about Numbers.  (Like, legends aren’t directly attached to their charts.  WTF?)

Which is not to say Numbers was a total disaster:  far from it.  All the charts you see in the report were created using Numbers, exported to PDF, and dropped into InDesign.  There were a lot of things about Numbers that made the process a lot easier than it would have been, and that very definitely helped us stay organized.  Of course, that just highlighted its limitations all the more harshly.  (You can’t change the separation distance between labels and the chart area?  WTF?)

And then there was the actual assembly, authoring, and proofreading.  Finding typos that inverted claims, mismatches between numbers, charts that were incomprehensible, summary tables that were calculated along the wrong axis.  All the innumerable errors that creep into a process, intensified and magnified by the unusually high information density of this particular project.  Late-stage changes of approach, starting over in places, realizing that things were not as we thought they were.  It’s the kind of thing that might lead one to break a web service.

And now it’s done.  It’s out.  We can rest.

It’s possible that you will not choose to rest, however.  I’m very proud to say that we’re giving the data back to the community that gave it to us in the first place.  Along with the report, we’re also providing anonymized copies of the complete survey data.  Every single one of those 32,831 responses, available in three formats, with only the identifying information stripped away.  If you want to fact-check our results, you can.  If you want to dig deeper, go for it.  If you want to investigate an area we passed over, please do.  It’s all there.  It’s all yours.  It always was yours—we just collected it and held onto it for a little while, and for that trust and honor, we thank you.

So: go read it and let us know what you think.  Tell us how we could do things better next year.  Show us what we missed by producing your own analyses.  And be ready for spring 2008, when we start the cycle anew.

Primal Tweet

It seems that Twitter just can’t handle the display of primal screams.

See, I had need to let loose a really good primal scream today.  Uncharacteristically, I decided to share it with the online world.  It seemed like the perfect method was to Twitter it.  And for me, the correct form of a primal scream is “AAAAAAAAA…”, so that’s what I Twittered.  Only, I filled the limit: I held down shift-A in Twitterrific until I’d generated 140 upper-case “A”s, no breaks, no punctuation.  Just, you know, primal screaming.


What didn’t occur to me was the fact that browsers are really bad at word-wrapping big long chunks of unbroken characters.  So my primal tweet seriously disrupted the layout of Twitter for me, and for all 768 people following me (at the time), as a layout table got super-expanded and the scream overflowed various and sundry other element boxes.

Oops.  Sorry ’bout that, folks.  Though I have to admit there is the part of me that’s secretly pleased: a primal scream should be disruptive.  And in some cases, the effect is unintentionally funny and appropriate: like the individual display of that tweet, where the scream runs right out of the “text balloon” and just keeps going and going and going.  The failure states become extra levels of commentary on what’s been said.  Screamed.  They accidentally reinforce the intended message instead of subverting it.

Honestly, that’s kind of cool.  I find it all the more delightful because I didn’t intend any of that to happen.  I was just blowing off 140 characters worth of steam.

As for why I felt the need to scream so primally, odds are very high you’ll hear all about it tomorrow.

Director’s Commentary

In this latest resurgence of the “are blog comments distilled joy or pure evil?” conversation, I got tagged by Alastair Campbell as someone whose site has good comments.  This would be a great time for an abashedly mumbled “aw, shucks, it ’tweren’t nuthin’”, except saying so would be a complete lie.

Want to know how I get good comments?  I work for them.  Part of that is striving to write good posts that will get good comments.  Another part is leading by example.  And a third part is a willingness to filter the comments I get, and saying as much up front.

The first part is in many ways the hardest, because what I think is a good post may be judged otherwise by my readers.  I could spend days and days and days on a post that merits a collective “meh”.  From that, I learn.  If this were solely a “me shouting whatever comes to mind” site, then I wouldn’t care so much, but this is a conversation site.  The goal here is not for me to pronounce my views from on high and thus change the world.  The goal here is to share information, with the sharing going in both directions, and thus change ourselves.

The second part is a lot easier for me, but seems to be harder for some.  It’s very, very rare that I will post confrontationally or abusively.  The few times I’ve done so, I’ve gotten some strong pushback, and no wonder.  The vast majority of the time, the posts conform to the site’s overall Airbag Blog Advisory System warning level of Guarded (“Someone might disagree with you, but only after apologizing for it first”).  Unsurprisingly, the non-spam comments that come in are respectful, helpful, and civil about 99% of the time.  Whether the tone of the site only draws people who are naturally that way or it shepherds all kinds of people in that direction is unknown, and to me wholly irrelevant.

It’s also the case that when the comments and the post match in tone, it’s much more likely that subsequent comments will keep to the same tone; that is, that the commenters help me in setting the tone for the site.  And I always, always appreciate that they do so.  (So thank you!)

The third part can be taken one of three ways.  You can say that I’m like a curator who lovingly tends a growing collection of thoughts and contributions, only excising those that would damage the overall whole; or that I’m a tiny fascist who only lets through comments that meet my personal standards of acceptability.  The thing is, both are true (which is the third way you can take it).

Y’see, this here blog is to me an extension of my home.  In my house, some things are acceptable and others are not.  Nobody is allowed to smoke in my house, for example.  In terms of speech, I’m pretty tolerant of what others have to say, but there are lines and I will enforce them.  Have enforced them, in fact, usually gently—although I once very nearly ejected an out-of-town friend who was staying with us from my house for something that was said.  And no, I won’t tell you who or why.

So as I say, you can take that as either an abrogation of my visitors’ freedom of speech or an exercise of my right to make my home the kind of place I want it to be.  It really is both.  I don’t see that as a problem.

I treat css-discuss the same way, actually.  It is, and always has been, a benevolent dictatorship, with policies that are enforced.  When people go off-topic, the moderators say so and end the threads.  When people get abusive, they’re warned to stop it, and can and will be ejected from the list.  In fact, I’ve done so twice in the list’s five-year history.  The moderators and I work actively to shape the list, and it has paid off.  The community now mostly polices itself, and the need for moderator intervention is becoming more rare.  Over time, it’s become a very helpful community with a very high signal-to-noise ratio.  Others have observed that it’s spawned one of the few truly useful group-run wikis in the world.  None of that just magically happened.  It required years of effort by me, and then by the moderation team.

(And before someone says that a small mailing list is different than a globally available blog, remember that css-d has over 8,300 subscribers from all over the world.)

The other person who got mentioned in Alastair’s post as having a good-comment home was Roger Johansson, who recently contributed his own thoughts on the topic.  In that post, he hits a lot of the pros and cons of allowing comments: the feedback, the spam, the community, the abuse.  One of the most subtle effects of comments is that it does make you think harder about what you post:

I realized that as I was writing I had started to subconsciously think about what kind of comments a post would trigger. I found it harder and harder to write freely, and to express myself the way I really want to.

It’s the same for me.  The few times I’ve posted things I knew were going to be contentious, it was after a lot of thought and consideration.  In fact, almost everything I post goes through some degree of pre-approval based on what kinds of comments I think it will trigger.

Where I would seem to part ways with Roger is that I don’t think that’s a bad thing.  One of the things those opposed to blog comments cite is John Gabriel’s Greater Internet F—wad Theory (warning: contains strong language)—as if it only applied to commenters.  The basic anonymity of the Internet isn’t a case of not knowing names: it springs from the very, very low chance that we’ll ever meet in person.  It applies just as much to those who write blog posts as those who comment on said posts.

Still, that sounds like I’m allowing the community to impose some constraints on me.  Actually, it doesn’t sound like that: it is that.  But I choose to do that, and frankly, I don’t think anything is lost in the bargain.  Quite the contrary.  I think more than a little is gained.

So weirdly enough, I find myself in disagreement with Joel Spolsky when he says:

The important thing to notice here is that Dave [Winer] does not see blog comments as productive to the free exchange of ideas. They are a part of the problem, not the solution. You don’t have a right to post your thoughts at the bottom of someone else’s thoughts. That’s not freedom of expression, that’s an infringement on their freedom of expression.

It’s the last sentence where I disagree, not the rest of it, which is provided for context.  That last sentence is like saying that when I have an in-person conversation, anything the other person says is an infringement on my freedom of speech.  In fact, it’s like saying that my response to his post infringes on his freedom of speech.  Which is just silly.

Neither is my pre-filtering of posts an infringement of my speech.  I am not forced to allow comments, nor to pre-judge my posts based on the expected reaction.  It is something I voluntarily accept as part of having an extended conversation.  If I felt that was becoming too much of a burden, I’d turn off comments.

I don’t have comments here out of obligation to some imagined right.  I have them because they’re an invitation to contribute, to enrich, to converse.  Just look at what happened with the reset styles: over the course of a few posts, my original work was built upon and improved.  The same thing happened in the early days of S5.  Without comments, neither of those efforts would have gone as far nor been as well-developed as they eventually were.  No, not even with e-mail, which is one-to-one and so doesn’t allow for the commenters to converse with each other.

Of course, not everyone wants to have a conversation right on their site, which is fine as well.  I don’t think Daring Fireball is lessened for its not having comments.  But part of the reason I think that is that John, being a strongly opinionated sort, would probably get the same kinds of comments in return; the bread you cast upon the waters will be returned to you tenfold.  And the fact that much of his posting is about Mac and Windows wouldn’t help much, either.  Nothing invites comment incoherency faster than having a blog about a contentious issue.  (See also: political blogs.)

As well, there’s nothing that says one must have comments always on, or always off.  It’s generally the case that I don’t open comments on the most personal of my posts, particularly those about Carolyn.  In those cases, I close comments because I’m writing them for me and to share those moments with the world, and don’t want positive or negative feedback.  They’re not meant to be conversations, in my view.  They’re snapshots.

(I’ll admit that I may be influenced by the fact that it was a Carolyn-related post that earned me one of the most vitriolic personal attacks that I’ve allowed to stand.)

It is absolutely the case that having good comments is hard work.  It requires leading by example and a willingness to curate/censor the comments that do come in.  And I absolutely think that anyone unwilling or unable to do that work should disable comments.  Because when a site’s comments devolve into “Useless noise[;] Thoughtless drivel written by some anonymous non-entity”, that’s as much the responsibility of the site’s author as of the commenters themselves.

Thus, for meyerweb, I hold to the inverse of Jeremy’s corollary of Sturgeon’s law: here, comments should be enabled 90% of the time.  I would not think to apply either rule to the world at large, of course.  For many sites, comments probably should be off by default.  But not for mine.

A Vast Wasteland

I think it’s pretty much obvious to anyone with half a brain that information wants to be free—both free as in beer as well as free as in speech.  If it weren’t for huge soulless megacorporations imprisoning content behind unreasonably high paywalls and fascist licensing terms, we’d already be collectively a lot richer than we are today.  Anyway, it’s not like people would pay for most of their crap anyway, and since they never would have gotten that money, then it’s not like they’ve lost anything.  Hell, chances are that by being able to preview merchandise in full, sales are actually improved.

What?  Wasn’t this Talk Like A Pirate Day?

Another Soul Lost to SketchUp

Hitting a link shared by Unstoppabot, who really needs to get around to fixing his feed linking policy (“View this site”?  Lame!), I was seized with another spasm of appreciation for the deliciousness that is SketchUp.

I did a moderate amount of 3D modeling back in the day.  The specific day in question would be the one where we all thought that images of rendered 3D models and, whenever possible, blobs of text were the absolute last word in Great Web Design.  Remember that?  Wasn’t it fantastic?  When every page title could be a bunch of extruded and beveled sans-serif letters viewed slightly from above, with the whole mess of text angled away from the observer?

Good times.

So anyway, while I was cranking out renderings of page title text and university logos, I also spent some time creating scenes of other stuff.  You can find some of the results if you dig deeply enough here on meyerweb, but that’s not my point.  What I’m trying to say is that I enjoy a bit o’ three dee more than most, and have some knowledge of how difficult it can be to construct models.

When I first heard about SketchUp, I was intrigued but didn’t really buy into all the hype.  It couldn’t be that easy, could it?  And then I watched someone using it—at An Event Apart, as it happens; and no, it wasn’t one of the attendees—and was captivated.  I downloaded the installer while I was sitting there, watching him create and modify shapes as easily as sketching them on paper.  And then I left it uninstalled, because I was afraid of what it would do to my free time.

A few days ago, I finally broke down.  I actually did have a legitimate reason to install and use it, a really good one, but of course I’d been waiting for any reasonable pretense to launch the .dmg and make with the modeling.  So I did.

Color me deeply impressed.  While you’re at it, add some heavy tints of addicted.  I started by modeling our kitchen, and now I want to do the whole frickin’ house.  I’m starting to eye local landmarks for recreation and contribution to the Warehouse and Google Earth.

I don’t have time for this.  I need help.  Stop me before I model again!

Browsers Boosted

In response to my post about Camino and Firefox, Simon and Smokey Ardisson sent along the following:

  • AsceticBar by Stuart Morgan removes all the icons in Camino’s Bookmarks bar without killing them off in the actual Bookmarks menu.  Exactly what I wanted.  Thanks to Smokey for pointing it out, and Stuart for writing it!

  • A Firefox extension that fixes its last-tab behavior in OS X when “always show tab bar” is turned on.  This was contributed by Simon in a comment on Bugzilla bug 348031, and once I installed it, I found Firefox much easier to tolerate.  I hope that this gets permanently fixed in a future version of Firefox, but until that happens, we’ve got the extension to paper over the problem.

My heartfelt thanks to both gentlemen for their pointers and efforts!
