Survey Analysis Service

Published 11 years, 3 days ago

During our analysis of the responses to the Web Design Survey, one of the things I thought seriously about doing was dumping the whole dataset into a database and building a web front end to query it.  Then I remembered that as back-end developer, I’m an excellent book author.  I know some MySQL and PHP, but I’m right in that sour spot of knowing enough to make the development process slow and error-prone due to my moderate but incomplete knowledge of the languages while not knowing enough to correctly design the project from the outset.  So I stuck to Excel and the like, which can be cumbersome but quickly learned.

I was a little sad, though.  I’d had the thought that if I built an interface to the survey results, it could be released publicly once we were done.  With such a tool, anyone could generate their own pivot tables without having to learn the process in Excel (or deal with Excel’s handling of enormous data files).  That seemed like a really good thing.

Well, the dataset is public now.  So how about one or more of you super-sharp developer types, the ones who didn’t check any of the boxes on the question about gaps in your back-end coding skills, doing what I could not?

The basic scope of the project would be to list the various data points (gender, ethnicity, age bracket, salary, geographic location, perception of bias, etc., etc., etc.) and let a user pick the two they want to analyze against.  So if someone wanted a table showing the breakdown of gender by ethnicity, they would pick one to go on the top and the other to go on the left.  The table generated would give those numbers.  I’d have it spit out raw numbers, but allowing the user to optionally get the results as percentages might be a nice touch.  Though then you’d have to let the user say which way the percentages are calculated: by column, or by row.

For extra deepness, one could also filter the results based on the value for a third data point.  With that sort of feature, one could get the breakdown of gender by ethnicity for only the EU respondents.  I might do it by letting the user click on a data point and then pick the specific filtering value via a dropdown.  Maybe three radio buttons: one for top, one for left, and one for filter.  Or, heck, do a whole Web 2.0 drag-n-drop interface.  That part’s not important.  What matters is giving anyone the ability to easily get numbers out of the massive dataset.

The only real challenge I can foresee is where questions allowed more than one answer, like the location of work question and the skill questions.  In the dataset, they’re just comma-separated value lists.  Those would need to somehow be broken out into subtables or Boolean columns or something.  The actual structure of the solution interests me a whole lot less than simply having one.

I’m quite sure this is the kind of thing a real programmer could create in about a day.  As I am not a real programmer any more, it would take me a month or four.  Let’s not wait.  Anyone out there able to take the idea and run with it?


  1. Jason and I have been discussing this as well (possibly for next year).

  2. I’ve already started working on something like this – are there others already on it?

  3. zeldman: Yeah, Jason and I kicked the idea around, and I talked about it with Aaron too. In the end, I decided to throw the idea out publicly and see what resulted. There may well be four different implementations of the same basic idea, which is cool by me. That’s four different approaches to compare, and to cross-fertilize each other, and to inform the design of future attempts at the same idea.

    John: I don’t know, but hopefully if they are they’ll comment here.

  4. Hi Eric.

    I’ve enjoyed your books, etc. over the years.

    Do the techniques outlined by the folks on the Simile project lend themselves to what you want to do?

    See…

  5. I limited myself to generate 32831 plain html files of it at http://www.webdesignsurvey.zagrycha.com and I am currently waiting to be indexed by Google to make their hits count a tool for analysis. I shall see if it makes sense soon. This might be not exactly what you envisioned, but it’ll definitely be very flexible.

  6. The hardest part was getting the csv into something a database engine could work with – lots of multivalued fields and such. I’ll be working on this today if anyone wants to test or help out.

  7. […] sourced here […]

  8. Even more interesting than static data is data in motion. Perhaps at some time in the future there will be enough web design surveys done and data gathered to interest smart brains that are behind applications like Gapminder to put them into more interesting and more revealing visualisations.

    “Don’t just show the notes, play the music!”

  9. Interesting. For the last while I’ve been working on an ASP.NET project that allows use of any specified database and generates a web form based on what are called “criteria columns”, essentially the keys of the database. It’s somewhat involved to explain here and now, but I’ll try running the provided dataset through the system and see what happens.

  10. Agreed, Dominik: that’s definitely I’d like to see happen over the long term, so that people can chart changes in pattern from year to year. For now, though, I’d settle for something that will quickly and simply spit out numbers from this year’s survey!

  11. Guys, this is a perfect fit for DabbleDB. It’s mean to allow anyone to do this sort of multi-variable pivot table. See dabbledb.com. I just went to their site, started a chat and suggested that they take this on as a public service.

    Regards,
    -Bob

  12. Check out… http://www.dabbledb.com/chat/

    They just responded with…”maybe we’ll set up a commons database and invite eric :)”

    This is wierd, I’m in a chat on a site and posting comments about the chat. What planet is this?

    Regards,
    -Bob

  13. Lastly, check out http://www.dabbledb.com/commons/

    You can create a “commons” database for sharing. The database has the same features as the main Dabble product.

  14. I have a database created, but Dabble only supports 15,000 rows or less for a dataset. I can’t host it there. :(

  15. John, you can do more, but have to do it a piece at a time. That’s what they told me in the chat. They also seemed intrigued with the idea of setting it up under their commons area, where anyone can see/use it.

    Regards,
    -Bob

  16. When I was attempting to make graphs of UK-only data during my lunch hour the other day, a colleague suggested that I could put the database online and let people choose a country to run queries against.

    I’d have a go but I’d be worried for my poor little shared hosting server. I’ve imported the CSV data to MySQL (mostly as ENUM fields) and a project like this isn’t too far out of reach to anyone that wanted to try and ticked the “back end” skill on the survey.

    I’d be happy to trade ideas with anyone else attempting to tackle this.

  17. Now this isn’t the tool for a 30,000+ response survey, but for smaller datasets, you might consider my lightweight survey tool AskPeople. It’s a PHP download, flat-file based, with pretty much pure CSS templates and fairly clean XHTML code, and comes with a built in reporting module to look at results of one question crossed by another.

    Sorry for the shameless plug, but someone here might find it useful!

  18. I want the dataset in a SQL file is that possible?

  19. @Bob:

    I’ve got over a million rows I’d need to import. I don’t think that’s going to work.

    My approach is a bit different than David Carringtons (and others?) because I’ve normalized it a bit – rather than leaving everyones skills in a single field, I’ve broken it out into another table (same with perks, etc).

    Not sure if I should just post the DB somewhere, or build a simple app to view it. :/

  20. @John,
    Well, depending on the size of the database, if it would fit into a SQL Express 2005 DB which I believe has a 4GB size limit then anyone with any of the free Microsoft ASP.NET development tools could try working out a solution. Otherwise if you have all the SQL Scripts to generate a MySql4 dtabase this could still work out as well..

    You know what they say “normalize your data till it hurts, and denormalize until it makes sense!”

  21. @John,
    Well, depending on the size of the database, if it would fit into a SQL Express 2005 DB which I believe has a 4GB size limit then anyone with any of the free Microsoft ASP.NET development tools could try working out a solution at home and as a means of learning the software. Otherwise if you have all the SQL Scripts to generate a MySql4 database this could still work out as well..

    You know what they say “normalize your data till it hurts, and denormalize until it makes sense!”

  22. Thanks Bob, I’m using dabble db to create analysis data for Middle Eastern respondents which are 0.6% of respondents it’s delicious, I’m still working to create more views.

    And will post on my blog when I’m done.

  23. @Jeff

    Right – I have no problem putting it in something – my comments were more directed to the setup at Dabble DB.

    Well, that and I wouldn’t touch .NET/SQLServer with a 32′ long pole. :)

    I’ll keep tinkering today to see what I can come up with.

  24. Don’t have time to create an app, but here’s some SQL if you’d like it.

    http://www.johndavidanderson.net/blog/2007/10/23/ala-2007-web-design-survey-in-sql/

  25. I began working on this tool, too.
    John, I first used your data but the employee field seems to have a problem. So I parsed it again, but nevertheless the data you provided were of a great help to begin faster and the tables structures were a good reference.
    No form or user input yet: the HTML/CSS tables and graphs are generated from a PHP array of strings written prior to execution of the script but I’m working on it for next week.

Leave a Comment

Management reserves the right to edit or remove any comment, especially when abusive or irrelevant to the topic at hand. HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <em> <i> <q cite=""> <s> <strong> <pre class=""> <kbd>


Comment Preview

If you're satisfied with what you've written, then go ahead...