During our analysis of the responses to the Web Design Survey, one of the things I thought seriously about doing was dumping the whole dataset into a database and building a web front end to query it. Then I remembered that as a back-end developer, I’m an excellent book author. I know some MySQL and PHP, but I’m right in that sour spot of knowing enough to make the development process slow and error-prone due to my moderate but incomplete knowledge of the languages, while not knowing enough to correctly design the project from the outset. So I stuck to Excel and the like, which can be cumbersome but are quickly learned.
I was a little sad, though. I’d had the thought that if I built an interface to the survey results, it could be released publicly once we were done. With such a tool, anyone could generate their own pivot tables without having to learn the process in Excel (or deal with Excel’s handling of enormous data files). That seemed like a really good thing.
Well, the dataset is public now. So how about one or more of you super-sharp developer types, the ones who didn’t check any of the boxes on the question about gaps in your back-end coding skills, doing what I could not?
The basic scope of the project would be to list the various data points (gender, ethnicity, age bracket, salary, geographic location, perception of bias, etc., etc., etc.) and let a user pick the two they want to analyze against each other. So if someone wanted a table showing the breakdown of gender by ethnicity, they would pick one to go on the top and the other to go on the left, and the generated table would give those numbers. I’d have it spit out raw numbers, but letting the user optionally get the results as percentages might be a nice touch. Then, though, you’d have to let the user say which way the percentages are calculated: by column or by row.
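For anyone wondering what the core of that table generator boils down to, here’s a minimal sketch of a cross-tabulation in Python. It assumes each survey response has already been loaded as a dict keyed by field name with string values; the function name `crosstab` and the field names are made up for illustration and don’t come from the actual dataset.

```python
from collections import Counter

def crosstab(records, top, left, percent=None):
    """Cross-tabulate two survey fields (hypothetical helper).

    records: list of dicts, one per response (an assumed shape for the data).
    top:     field whose values become the columns.
    left:    field whose values become the rows.
    percent: None for raw counts, "row" or "column" for percentages.
    Returns (row_labels, column_labels, table) where table[row][col] is a number.
    """
    # Count every (row value, column value) pair that actually occurs.
    counts = Counter((r[left], r[top]) for r in records)
    rows = sorted({r[left] for r in records})
    cols = sorted({r[top] for r in records})
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}

    # Every label came from at least one record, so no total below is zero.
    if percent == "row":
        # Each row sums to 100: what share of this row falls in each column?
        for row in rows:
            total = sum(counts[(row, col)] for col in cols)
            for col in cols:
                table[row][col] = 100.0 * counts[(row, col)] / total
    elif percent == "column":
        # Each column sums to 100: what share of this column falls in each row?
        for col in cols:
            total = sum(counts[(row, col)] for row in rows)
            for row in rows:
                table[row][col] = 100.0 * counts[(row, col)] / total
    elif percent is not None:
        raise ValueError("percent must be None, 'row', or 'column'")
    return rows, cols, table
```

So something like `crosstab(responses, top="gender", left="ethnicity", percent="column")` would answer “of the women who responded, what share fall in each ethnicity?”, while `percent="row"` flips the question around. Libraries such as pandas offer a ready-made `crosstab` with normalization options if someone would rather not roll their own.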
For extra depth, one could also filter the results based on the value for a third data point. With that sort of feature, one could get the breakdown of gender by ethnicity for only the EU respondents. I might do it by letting the user click on a data point and then pick the specific filtering value via a dropdown. Maybe three radio buttons: one for top, one for left, and one for filter. Or, heck, do a whole Web 2.0 drag-n-drop interface. That part’s not important. What matters is giving anyone the ability to easily get numbers out of the massive dataset.
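The filtering step is really just “throw away the responses that don’t match, then count as before.” Here’s a rough sketch of that idea in Python, again assuming responses are dicts of strings; `filtered_counts` and the `region` field are hypothetical names, not anything from the real dataset.

```python
from collections import Counter

def filtered_counts(records, top, left, filter_field=None, filter_value=None):
    """Count (left, top) value pairs, optionally keeping only responses where
    record[filter_field] == filter_value (e.g. only the EU respondents).
    Hypothetical helper; field names are assumptions about the data."""
    if filter_field is not None:
        # Responses missing the filter field are treated as non-matching.
        records = [r for r in records if r.get(filter_field) == filter_value]
    return Counter((r[left], r[top]) for r in records)
```

A call like `filtered_counts(responses, "gender", "ethnicity", "region", "EU")` would give the gender-by-ethnicity breakdown for EU respondents only. Doing the filtering before the counting means the percentage options work unchanged on the smaller set.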
The only real challenge I can foresee involves the questions that allowed more than one answer, like the location-of-work question and the skill questions. In the dataset, those answers are just comma-separated lists. They would need to be broken out somehow, into subtables or Boolean columns or something. The actual structure of the solution interests me a whole lot less than simply having one.
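Both of the options mentioned above are fairly mechanical. Here’s a minimal Python sketch of each, under the assumption that a multi-answer field holds a string like `"Home, Office"` and that no individual answer contains a comma itself (if one does, a naive split will mangle it). The helper names and the `id` and `work_location` fields are hypothetical.

```python
def explode_multi(records, field, sep=","):
    """Replace a comma-separated multi-answer field with one Boolean column
    per distinct answer. Returns (options, new_records)."""
    def answers(r):
        # Split, trim whitespace, and drop empty entries (e.g. a blank answer).
        return {a.strip() for a in (r.get(field) or "").split(sep) if a.strip()}

    options = sorted(set().union(*(answers(r) for r in records)))
    out = []
    for r in records:
        chosen = answers(r)
        row = {k: v for k, v in r.items() if k != field}
        for opt in options:
            row[f"{field}: {opt}"] = opt in chosen
        out.append(row)
    return options, out


def to_pairs(records, id_field, field, sep=","):
    """The 'subtable' alternative: one (respondent id, answer) pair per answer,
    suitable for a separate table joined back on the respondent id."""
    return [
        (r[id_field], a.strip())
        for r in records
        for a in (r.get(field) or "").split(sep)
        if a.strip()
    ]
```

The Boolean-column version keeps everything in one flat table, which makes each answer behave like any other yes/no data point in the cross-tab. The pair version is the more database-friendly shape, since adding a new answer option doesn’t mean adding a new column.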
I’m quite sure this is the kind of thing a real programmer could create in about a day. As I am not a real programmer any more, it would take me a month or four. Let’s not wait. Anyone out there able to take the idea and run with it?