meyerweb.com

Skip to: site navigation/presentation
Skip to: Thoughts From Eric

Survey Analysis Service

During our analysis of the responses to the Web Design Survey, one of the things I thought seriously about doing was dumping the whole dataset into a database and building a web front end to query it.  Then I remembered that as back-end developer, I’m an excellent book author.  I know some MySQL and PHP, but I’m right in that sour spot of knowing enough to make the development process slow and error-prone due to my moderate but incomplete knowledge of the languages while not knowing enough to correctly design the project from the outset.  So I stuck to Excel and the like, which can be cumbersome but quickly learned.

I was a little sad, though.  I’d had the thought that if I built an interface to the survey results, it could be released publicly once we were done.  With such a tool, anyone could generate their own pivot tables without having to learn the process in Excel (or deal with Excel’s handling of enormous data files).  That seemed like a really good thing.

Well, the dataset is public now.  So how about one or more of you super-sharp developer types, the ones who didn’t check any of the boxes on the question about gaps in your back-end coding skills, doing what I could not?

The basic scope of the project would be to list the various data points (gender, ethnicity, age bracket, salary, geographic location, perception of bias, etc., etc., etc.) and let a user pick the two they want to analyze against.  So if someone wanted a table showing the breakdown of gender by ethnicity, they would pick one to go on the top and the other to go on the left.  The table generated would give those numbers.  I’d have it spit out raw numbers, but allowing the user to optionally get the results as percentages might be a nice touch.  Though then you’d have to let the user say which way the percentages are calculated: by column, or by row.

For extra deepness, one could also filter the results based on the value for a third data point.  With that sort of feature, one could get the breakdown of gender by ethnicity for only the EU respondents.  I might do it by letting the user click on a data point and then pick the specific filtering value via a dropdown.  Maybe three radio buttons: one for top, one for left, and one for filter.  Or, heck, do a whole Web 2.0 drag-n-drop interface.  That part’s not important.  What matters is giving anyone the ability to easily get numbers out of the massive dataset.

The only real challenge I can foresee is where questions allowed more than one answer, like the location of work question and the skill questions.  In the dataset, they’re just comma-separated value lists.  Those would need to somehow be broken out into subtables or Boolean columns or something.  The actual structure of the solution interests me a whole lot less than simply having one.

I’m quite sure this is the kind of thing a real programmer could create in about a day.  As I am not a real programmer any more, it would take me a month or four.  Let’s not wait.  Anyone out there able to take the idea and run with it?

25 Responses»

    • #1
    • Comment
    • Mon 22 Oct 2007
    • 1124
    zeldman wrote in to say...

    Jason and I have been discussing this as well (possibly for next year).

    • #2
    • Comment
    • Mon 22 Oct 2007
    • 1135
    John wrote in to say...

    I’ve already started working on something like this – are there others already on it?

    • #3
    • Comment
    • Mon 22 Oct 2007
    • 1201
    Eric Meyer wrote in to say...

    zeldman: Yeah, Jason and I kicked the idea around, and I talked about it with Aaron too. In the end, I decided to throw the idea out publicly and see what resulted. There may well be four different implementations of the same basic idea, which is cool by me. That’s four different approaches to compare, and to cross-fertilize each other, and to inform the design of future attempts at the same idea.

    John: I don’t know, but hopefully if they are they’ll comment here.

    • #4
    • Comment
    • Mon 22 Oct 2007
    • 1212
    Art wrote in to say...

    Hi Eric.

    I’ve enjoyed your books, etc. over the years.

    Do the techniques outlined by the folks on the Simile project lend themselves to what you want to do?

    See…

    • #5
    • Comment
    • Mon 22 Oct 2007
    • 1244
    zagrycha wrote in to say...

    I limited myself to generate 32831 plain html files of it at http://www.webdesignsurvey.zagrycha.com and I am currently waiting to be indexed by Google to make their hits count a tool for analysis. I shall see if it makes sense soon. This might be not exactly what you envisioned, but it’ll definitely be very flexible.

    • #6
    • Comment
    • Mon 22 Oct 2007
    • 1254
    John wrote in to say...

    The hardest part was getting the csv into something a database engine could work with – lots of multivalued fields and such. I’ll be working on this today if anyone wants to test or help out.

    • #7
    • Pingback
    • Mon 22 Oct 2007
    • 1328
    Received from eliteab » Blog Archive » Survey Analysis Service

    [...] sourced here [...]

    • #8
    • Comment
    • Mon 22 Oct 2007
    • 1346
    Dominik wrote in to say...

    Even more interesting than static data is data in motion. Perhaps at some time in the future there will be enough web design surveys done and data gathered to interest smart brains that are behind applications like Gapminder to put them into more interesting and more revealing visualisations.

    “Don’t just show the notes, play the music!”

    • #9
    • Comment
    • Mon 22 Oct 2007
    • 1352
    Grant Palin wrote in to say...

    Interesting. For the last while I’ve been working on an ASP.NET project that allows use of any specified database and generates a web form based on what are called “criteria columns”, essentially the keys of the database. It’s somewhat involved to explain here and now, but I’ll try running the provided dataset through the system and see what happens.

    • #10
    • Comment
    • Mon 22 Oct 2007
    • 1404
    Eric Meyer wrote in to say...

    Agreed, Dominik: that’s definitely I’d like to see happen over the long term, so that people can chart changes in pattern from year to year. For now, though, I’d settle for something that will quickly and simply spit out numbers from this year’s survey!

    • #11
    • Comment
    • Mon 22 Oct 2007
    • 1440
    Bob Monsour wrote in to say...

    Guys, this is a perfect fit for DabbleDB. It’s mean to allow anyone to do this sort of multi-variable pivot table. See dabbledb.com. I just went to their site, started a chat and suggested that they take this on as a public service.

    Regards,
    -Bob

    • #12
    • Comment
    • Mon 22 Oct 2007
    • 1441
    Bob Monsour wrote in to say...

    Check out… http://www.dabbledb.com/chat/

    They just responded with…”maybe we’ll set up a commons database and invite eric :)”

    This is wierd, I’m in a chat on a site and posting comments about the chat. What planet is this?

    Regards,
    -Bob

    • #13
    • Comment
    • Mon 22 Oct 2007
    • 1444
    Bob Monsour wrote in to say...

    Lastly, check out http://www.dabbledb.com/commons/

    You can create a “commons” database for sharing. The database has the same features as the main Dabble product.

    • #14
    • Comment
    • Mon 22 Oct 2007
    • 1511
    John wrote in to say...

    I have a database created, but Dabble only supports 15,000 rows or less for a dataset. I can’t host it there. :(

    • #15
    • Comment
    • Mon 22 Oct 2007
    • 1550
    Bob Monsour wrote in to say...

    John, you can do more, but have to do it a piece at a time. That’s what they told me in the chat. They also seemed intrigued with the idea of setting it up under their commons area, where anyone can see/use it.

    Regards,
    -Bob

    • #16
    • Comment
    • Mon 22 Oct 2007
    • 1601
    David Carrington wrote in to say...

    When I was attempting to make graphs of UK-only data during my lunch hour the other day, a colleague suggested that I could put the database online and let people choose a country to run queries against.

    I’d have a go but I’d be worried for my poor little shared hosting server. I’ve imported the CSV data to MySQL (mostly as ENUM fields) and a project like this isn’t too far out of reach to anyone that wanted to try and ticked the “back end” skill on the survey.

    I’d be happy to trade ideas with anyone else attempting to tackle this.

    • #17
    • Comment
    • Mon 22 Oct 2007
    • 1634
    Steph Gray wrote in to say...

    Now this isn’t the tool for a 30,000+ response survey, but for smaller datasets, you might consider my lightweight survey tool AskPeople. It’s a PHP download, flat-file based, with pretty much pure CSS templates and fairly clean XHTML code, and comes with a built in reporting module to look at results of one question crossed by another.

    Sorry for the shameless plug, but someone here might find it useful!

    • #18
    • Comment
    • Mon 22 Oct 2007
    • 1643
    Karl wrote in to say...

    I want the dataset in a SQL file is that possible?

    • #19
    • Comment
    • Mon 22 Oct 2007
    • 1646
    John wrote in to say...

    @Bob:

    I’ve got over a million rows I’d need to import. I don’t think that’s going to work.

    My approach is a bit different than David Carringtons (and others?) because I’ve normalized it a bit – rather than leaving everyones skills in a single field, I’ve broken it out into another table (same with perks, etc).

    Not sure if I should just post the DB somewhere, or build a simple app to view it. :/

    • #20
    • Comment
    • Tue 23 Oct 2007
    • 0635
    Jeff wrote in to say...

    @John,
    Well, depending on the size of the database, if it would fit into a SQL Express 2005 DB which I believe has a 4GB size limit then anyone with any of the free Microsoft ASP.NET development tools could try working out a solution. Otherwise if you have all the SQL Scripts to generate a MySql4 dtabase this could still work out as well..

    You know what they say “normalize your data till it hurts, and denormalize until it makes sense!”

    • #21
    • Comment
    • Tue 23 Oct 2007
    • 0636
    Jeff wrote in to say...

    @John,
    Well, depending on the size of the database, if it would fit into a SQL Express 2005 DB which I believe has a 4GB size limit then anyone with any of the free Microsoft ASP.NET development tools could try working out a solution at home and as a means of learning the software. Otherwise if you have all the SQL Scripts to generate a MySql4 database this could still work out as well..

    You know what they say “normalize your data till it hurts, and denormalize until it makes sense!”

    • #22
    • Comment
    • Tue 23 Oct 2007
    • 0738
    Abdelrahman Osama wrote in to say...

    Thanks Bob, I’m using dabble db to create analysis data for Middle Eastern respondents which are 0.6% of respondents it’s delicious, I’m still working to create more views.

    And will post on my blog when I’m done.

    • #23
    • Comment
    • Tue 23 Oct 2007
    • 1049
    John wrote in to say...

    @Jeff

    Right – I have no problem putting it in something – my comments were more directed to the setup at Dabble DB.

    Well, that and I wouldn’t touch .NET/SQLServer with a 32′ long pole. :)

    I’ll keep tinkering today to see what I can come up with.

    • #24
    • Comment
    • Wed 24 Oct 2007
    • 1006
    John wrote in to say...

    Don’t have time to create an app, but here’s some SQL if you’d like it.

    http://www.johndavidanderson.net/blog/2007/10/23/ala-2007-web-design-survey-in-sql/

    • #25
    • Comment
    • Fri 7 Dec 2007
    • 0427
    Felipe wrote in to say...

    I began working on this tool, too.
    John, I first used your data but the employee field seems to have a problem. So I parsed it again, but nevertheless the data you provided were of a great help to begin faster and the tables structures were a good reference.
    No form or user input yet: the HTML/CSS tables and graphs are generated from a PHP array of strings written prior to execution of the script but I’m working on it for next week.

Leave a Comment

Line and paragraph breaks automatic, e-mail address required but never displayed, HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>



Remember to encode character entities if you're posting markup examples! Management reserves the right to edit or remove any comment—especially those that are abusive, irrelevant to the topic at hand, or made by anonymous posters—although honestly, most edits are a matter of fixing mangled markup. Thus the note about encoding your entities. If you're satisfied with what you've written, then go ahead...


October 2007
SMTWTFS
September November
 123456
78910111213
14151617181920
21222324252627
28293031  

Sidestep

Feeds

Extras