meyerweb.com


Analytical Breakdowns

As a member of the incredible team that’s worked so hard to create it, I couldn’t be more pleased to announce the publication of the findings of the 2007 Web Design Survey.

It’s amazing how many things this process broke.  Right from the outset, the enormity of the entire enterprise pushed all sorts of things past their limits.

The first thing the survey broke was our expectations.  I can only speak authoritatively for myself, but I think the general sense in the team was that we’d be really happy if the number of responses approached 10,000; I myself was expecting somewhere in the 7,000 – 8,000 range.  Instead, we got 32,831 responses.  So many people took the survey at once that it slowed down the Event Apart server, where the survey was hosted.

Although the survey software collected all that information with nary a hitch, getting it back out was another question.  Over the first few days, I was grabbing snapshots of the results as they came in.  That had to stop when an export request brought the entire server to a grinding halt.  Turns out we’d hit a limitation in the software’s code that made exports of tables above a certain size run very, very slowly.

When everything was finished and it was time to run the final export, the same problem bit us again: only this time, we couldn’t get the data out at all.  It was finally Mark Huot who extracted it by hacking the software a bit and running a custom-configured local server instance just to get it extracted in CSV format.

Which left us with a 35.8MB CSV file that needed to be cleaned a bit, as it had some funky characters in it that prevented a straight import into Excel (or anything else).  Have you ever tried to load a 35.8MB text file into a text editor?  It takes a while.  At one point, I honestly feared I’d locked up BBEdit.  And that was just to load it.  Imagine doing find-and-replace operations.  With grep.  (Now imagine constructing the grep expressions needed without gacking half the file.)
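For anyone facing a similar cleanup, one alternative to find-and-replace in an editor is to stream the file through a small script, so the whole thing never has to sit in an editor buffer. The sketch below is only an illustration. The post doesn't say which characters caused the trouble, so the assumption here (flagged again in the code) is that they were ASCII control characters, and the file names in the usage comment are hypothetical.

```python
import re

# ASSUMPTION: the "funky characters" are ASCII control characters other than
# tab, line feed, and carriage return. The actual offenders are not known.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def scrub_line(line: str) -> str:
    """Return the line with the assumed problem characters removed."""
    return _CONTROL_CHARS.sub("", line)


def scrub_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> int:
    """Copy src_path to dst_path one line at a time, scrubbing as we go.

    Returns the number of characters removed. Undecodable bytes are replaced
    with U+FFFD rather than aborting the run.
    """
    removed = 0
    with open(src_path, "r", encoding=encoding, errors="replace", newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            cleaned = scrub_line(line)
            removed += len(line) - len(cleaned)
            dst.write(cleaned)
    return removed


# Hypothetical usage:
# removed = scrub_file("survey-raw.csv", "survey-clean.csv")
```

Because it works line by line, memory use stays flat no matter how large the export is, and the returned count gives a quick sanity check that the pattern didn't eat half the file.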

But eventually we got the file scrubbed clean and imported into Excel.  The end result was a 21.7MB Excel file.  Have you ever tried to work with a 21.7MB file in Excel?  It takes a while.  Hell, just hitting “Save” sometimes meant a three-minute wait.  And then there’s doing data analysis on a file that large.  Have you ever…?  Right, you get the idea.

The next thing that broke was our ability to process the data in meaningful ways.  I was able to pull some high-level numbers—what you might call surface analyses—for each of the questions.  For example, I could tell you the breakdown of titles, or genders, or education levels; in effect, all the charts in the final report’s introduction (that is, Figures i – xxviii).  At one point, I could even tell you what proportion of the respondents had 401(k)s.  But when it came to looking more in-depth, I was out of my depth.  Want to know the educational-level distribution of female part-time workers at non-profits?  I was lost.  A statistician I’m not.
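To make that kind of question concrete, here is a rough sketch of a filtered breakdown done in a script instead of a spreadsheet. It is not how the report's numbers were produced. The column names and the six sample rows are invented for illustration and do not come from the survey data.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the real survey's column names and values will differ.
SAMPLE = """gender,employment,org_type,education
female,part-time,non-profit,bachelor
female,part-time,non-profit,master
female,part-time,non-profit,bachelor
male,part-time,non-profit,bachelor
female,full-time,non-profit,master
female,part-time,corporate,bachelor
"""


def distribution(rows, target, **filters):
    """Share of each `target` value among rows matching every filter."""
    counts = Counter(
        row[target]
        for row in rows
        if all(row.get(col) == val for col, val in filters.items())
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
shares = distribution(
    rows,
    "education",
    gender="female",
    employment="part-time",
    org_type="non-profit",
)
for level, share in sorted(shares.items()):
    print(f"{level}: {share:.0%}")
```

Getting the numbers out is the easy part. Knowing whether a subgroup is big enough for its percentages to mean anything is where a statistician earns their keep.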

So we hired two.  Jeffrey and I decided to use some of the proceeds from An Event Apart to retain two professional statistical consultants, Alan Brickman and Larry Yu, and we worked with them to identify interesting questions and patterns.  They did a superb job of not only breaking down the data for us, but also keeping our pattern-recognition impulses in check.  It’s tempting to look at a spike in numbers and invent reasons for its existence, and they reined us in more than once.

For example, and here’s a tiny peek into the results, we found that the higher a respondent’s salary, the more likely they are to participate in formal training (Fig. 9.6, p. 73, in case you want to go straight there).  But why?  Do they make more because they undergo more training; or undergo more training because they can afford it, or can afford to make their company pay for it?  Do not know, so cannot say.

Of course, it’s one thing to get a few hundred tables of numbers and a lengthy written summary from your consultants.  It’s quite another to turn it into a report that distills it all down to the essence, and looks good to boot.  And that’s where we started to have real trouble.

I mean, sure, Excel can spit out charts like nobody’s business, but I have news: visually speaking, they really aren’t very good charts.  I know you’re shocked to hear this.  Perhaps under most circumstances that would be okay, but the charts that would go into our final report needed to look professional.  And by that, I mean designery professional.

As Head Chart Guy, I grappled with Excel (the Office X version) for weeks.  I learned more about Excel charting than I’d ever known, and still I could not get it to do what I wanted.  Right aligning the left labels on a chart?  The only way was to set the text rotation to something other than zero.  Then they’d all right-align, but also not be straight horizontal labels.  Plus they looked like crap that way, because there was no anti-aliasing happening on the text, or really anywhere else.  And so on.  We absolutely needed Excel to analyze the data, but its graphical output wasn’t what we needed.

So we looked around.  We considered PlotKit; we thought about pure CSS graphs; we even considered hand-drawing every last chart in Illustrator.  Eventually, we decided to give Numbers a try.  And immediately broke it.

Remember the big ol’ data files I mentioned earlier?  It turns out that Numbers was unable to even open them.  We couldn’t do any of our analysis in Numbers.  In hindsight, this was probably a good thing, because I don’t think it does anything like pivot tables, and those were absolutely fundamental to the whole process.  One personal benefit of this whole process for me is that I finally learned how to create pivot tables in Excel.  Strangely, it’s both easier and harder than you might expect.
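If you've never met a pivot table, the core idea is just counting records along two fields at once: one field supplies the rows, the other the columns. The sketch below shows that idea in a few lines. It is not what Excel does internally, and the field names and records are hypothetical.

```python
from collections import defaultdict


def pivot_counts(records, row_field, col_field):
    """Count records for every (row value, column value) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_field]][rec[col_field]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {row: dict(cols) for row, cols in table.items()}


def row_shares(table):
    """Turn each row of counts into fractions that sum to 1 across the row."""
    shares = {}
    for row, cols in table.items():
        total = sum(cols.values())
        shares[row] = {col: n / total for col, n in cols.items()}
    return shares


# Hypothetical records, not taken from the survey.
records = [
    {"title": "designer", "training": "yes"},
    {"title": "designer", "training": "no"},
    {"title": "developer", "training": "yes"},
    {"title": "designer", "training": "yes"},
]

table = pivot_counts(records, "title", "training")
for title, cols in sorted(table.items()):
    print(title, cols)
```

Note that `row_shares` normalizes across each row. Normalizing down each column instead answers a different question, so it pays to decide up front which axis the percentages should run along.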

So in the end, I produced needed data tables in Excel, copied them over to Numbers, and produced the charts there.  In the process, I learned way more about Numbers than I ever really wanted to know.  Jason and I could probably write a medium-sized book on all the things we learned, and learned to hate, about Numbers.  (Like, legends aren’t directly attached to their charts.  WTF?)

Which is not to say Numbers was a total disaster:  far from it.  All the charts you see in the report were created using Numbers, exported to PDF, and dropped into InDesign.  There were a lot of things about Numbers that made the process a lot easier than it would have been, and that very definitely helped us stay organized.  Of course, that just highlighted its limitations all the more harshly.  (You can’t change the separation distance between labels and the chart area?  WTF?)

And then there was the actual assembly, authoring, and proofreading.  Finding typos that inverted claims, mismatches between numbers, charts that were incomprehensible, summary tables that were calculated along the wrong axis.  All the innumerable errors that creep into a process, intensified and magnified by the unusually high information density of this particular project.  Late-stage changes of approach, starting over in places, realizing that things were not as we thought they were.  It’s the kind of thing that might lead one to break a web service.

And now it’s done.  It’s out.  We can rest.

It’s possible that you will not choose to rest, however.  I’m very proud to say that we’re giving the data back to the community that gave it to us in the first place.  Along with the report, we’re also providing anonymized copies of the complete survey data.  Every single one of those 32,831 responses, available in three formats, with only the identifying information stripped away.  If you want to fact-check our results, you can.  If you want to dig deeper, go for it.  If you want to investigate an area we passed over, please do.  It’s all there.  It’s all yours.  It always was yours—we just collected it and held onto it for a little while, and for that trust and honor, we thank you.

So: go read it and let us know what you think.  Tell us how we could do things better next year.  Show us what we missed by producing your own analyses.  And be ready for spring 2008, when we start the cycle anew.

25 Responses»

    • #1
    • Comment
    • Tue 16 Oct 2007
    • 2030
    orrin wrote in to say...

    I regularly open gigabyte-sized text files in UltraEdit. Nothing beats UltraEdit except for the fact that you have to pay for it.

    • #2
    • Comment
    • Tue 16 Oct 2007
    • 2121
    Todd wrote in to say...

    Thank you to you and the entire ALA team that made this possible.

    • #3
    • Comment
    • Tue 16 Oct 2007
    • 2244
    Jim wrote in to say...

    I’ve been waiting for this! Thanks so much for your hard work and for sharing the results.

    • #4
    • Comment
    • Wed 17 Oct 2007
    • 0322
    Simon wrote in to say...

    To be a know-it-all: Why didn’t you import the CSV into a db such as SQLite? This would have made processing a lot easier…

    Also NoteTab Light has no problems processing files > 1’000’000 lines, e.g. webserver logfiles.

    • #5
    • Comment
    • Wed 17 Oct 2007
    • 0446
    Jylan Wynne wrote in to say...

    Thanks for putting all this great information together, Eric (and all the others).

    • #6
    • Pingback
    • Wed 17 Oct 2007
    • 0528
    Received from Web Design Umfrage | ScreenOrigami

    [...] well worth reading is Eric Meyer’s lively account of the survey analysis and the technical limits of common [...]

    • #7
    • Comment
    • Wed 17 Oct 2007
    • 0641
    Paolo wrote in to say...

    great work, Eric!

    • #8
    • Comment
    • Wed 17 Oct 2007
    • 0907
    Dave MacEwan wrote in to say...

    Ditto @Paolo. Rock on, Eric, and Jason, and the hired-gun statisticians, and ALA. This kind of thing can be interesting and fun, but just be careful that “survey nerd” doesn’t unwittingly become your next career move, Eric!

    • #9
    • Comment
    • Wed 17 Oct 2007
    • 0911
    Eric Wiley wrote in to say...

    That was a great behind-the-scenes snapshot of the process, Eric. Really enjoyed it, thank you.

    • #10
    • Comment
    • Wed 17 Oct 2007
    • 1025
    Rachel wrote in to say...

    Sounds like you had a good old time managing all that data! Did you try an enterprise reporting tool like CrystalReports? I suppose that requires that you have the resources for such software, but in my experience these types of tools can handle a lot of data, direct from the database. Maybe someone’s company would volunteer/donate the server space and software access. Crystal allows you to create charts and graphs that help you analyze the data pretty easily if you know SQL (you’d still have to export some data for the actual graphical layouts – the charts may not be up to your standards visually).

    • #11
    • Comment
    • Wed 17 Oct 2007
    • 1047
    Kees wrote in to say...

    Thanks for the quick reply on Jason’s site – after reading your post I found out that you guys struggled just as much as I normally do – pretty interesting :) Next time, if you need help, count me in!

    • #12
    • Comment
    • Wed 17 Oct 2007
    • 1147
    bruce wrote in to say...

    I’m very, very frustrated by this survey. All this data on people who make web sites, and nowhere does it tell me who is best at it.

    • #13
    • Pingback
    • Wed 17 Oct 2007
    • 1303
    Received from Web Design Survey Results Released | NerdStarGamer

    [...] Eric Meyer has written in-depth about the process of collecting the data and creating the report. Definitely read this post after [...]

    • #14
    • Comment
    • Wed 17 Oct 2007
    • 1320
    Lydia Mann wrote in to say...

    Kudos, Eric. I am repeating here my comment from the survey discussion on ALA:

    Fascinating, and a bit overwhelming. I am no statistician, but after reading the survey results I feel rectified in my position that the lack of diversity in conference line-ups merely represents the field in general. Since that is what got this whole thing going (see http://www.kottke.org/07/02/gender-diversity-at-web-conferences) I hope facts will ease perception of gender and racial injustice among conference presenters. As with most inequality, education is key. There are initiatives to bring design education to grades K-12 in wide-ranging communities (see http://www.aiga.org/content.cfm/education, for example). Though we may not see a tidal shift in our generation, conference speakers will likely be a more varied lot in the future. Good work, A List Apart. This was a huge and valuable endeavor and I look forward to future surveys.

    • #15
    • Pingback
    • Wed 17 Oct 2007
    • 1818
    Received from All your blogs…

    [...] Eric Meyer reports that the results are available from the 2007 Web Design Survey: Analytical Breakdowns. [...]

    • #16
    • Comment
    • Thu 18 Oct 2007
    • 0227
    skierpage wrote in to say...

    I loaded a 23MB file into gvim in less than 2 seconds. And it’s free open source (vim.org). And there are vim scripts for working with CSV files. gvim is mostly WYSIWYG, but for all the power you need to learn vi/vim.

    • #17
    • Comment
    • Thu 18 Oct 2007
    • 0327
    prisca wrote in to say...

    Eric,
    thanks to you and the ALA team for all your work on this – should prove an interesting read ;)

    • #18
    • Pingback
    • Thu 18 Oct 2007
    • 1443
    Received from tyHATCH/ Notebook » Blog Archive » Eric Meyer on the 2007 Web Design Survey

    [...] Meyer has posted about his involvement in putting together the 2007 Web Design Survey. Thanks [...]

    • #19
    • Comment
    • Thu 18 Oct 2007
    • 1557
    Jon Cram wrote in to say...

    I recently worked on parsing data out of 150MB text files, producing CSV files around 90MB in size.

    I found TextPad good for opening the raw text files, and for viewing the CSV files ‘directly’ – perhaps around 10 seconds for a 150MB text file on a not-too-fancy middle-aged work-supplied laptop.

    And, yes, doing anything in Excel with large files is a pain – try playing around with a 90MB CSV file without remembering to turn autosave off …

    • #20
    • Comment
    • Thu 18 Oct 2007
    • 1612
    Eric Meyer wrote in to say...

    It turns out my text-editing slowness problems were caused by having the text soft-wrapped. I didn’t realize it would cause that much of a performance hit. So, my bad!

    • #21
    • Pingback
    • Fri 19 Oct 2007
    • 0735
    Received from Jeffrey Zeldman Presents : Faster, pussycat

    [...] Eric Meyer, the survey’s co-author and co-sponsor, has written nice pieces about practical problems overcome in the survey’s creation, and how to keep probing the data for new answers and new [...]

    • #22
    • Comment
    • Fri 19 Oct 2007
    • 1124
    Travis Fleck wrote in to say...

    @Eric – I didn’t see mention of trying to create the tables automatically in Illustrator. I’m not sure if you considered this but you wouldn’t have to draw out each individually. Illustrator can create certain types of graphs based on a data set. It would allow much more flexibility in the design elements that could be modified after the graph was created.

    Illustrator Graphs CS3

    • #23
    • Comment
    • Fri 19 Oct 2007
    • 1304
    Jason Santa Maria wrote in to say...

    @Travis Fleck – I explored Illustrator early on for graph creation and found it fairly anemic. Illustrator can do graphs, but it’s certainly not what it does best. The actual problem is that EVERYTHING is editable. It treats each graph as a collection of many shapes and text fields, meaning that editing and resizing can be a big pain. Numbers understands what a graph is and makes it ridiculously simple to edit and alter them, while maintaining a good relationship between the graphic elements. This makes resizing and editing a snap. Minus a few odd quirks, it actually worked out very well for graph creation.

    • #24
    • Pingback
    • Sat 20 Oct 2007
    • 1336
    Received from a work on process » Web Design Survey findings

    [...] note that the A List Apart Web Design Survey findings have now been published. Eric Meyer has some notes about the analysis process which are worth a look for anyone who may find themself managing [...]

    • #25
    • Comment
    • Wed 24 Oct 2007
    • 1632
    Tanner Christensen wrote in to say...

    What a great look at the process behind the survey results; thank you Eric.

    The entire time I was reading about your graphing troubles I was thinking to myself “Numbers. Try Numbers.” and “Numbers can make it beautiful.” But then I read those three little words and almost cried… “immediately broke it.”

    I’m surprised the premier Apple spreadsheet app can’t handle an entire boat full of data. But, I guess not very many applications could.

    It’s good to see you did find some use for my favorite little iWork app though.

    The results are beautiful, and your hard work (especially sitting for countless hours while the file saved) is greatly appreciated.
