As we did last year, we’ve turned an anonymized copy of the data collected in the 2008 Survey over to some professional statisticians, and we’re waiting to hear back from them before we start writing the full report. But there’s no reason we can’t have a little fun while we wait, right?
So, calling all mapping ninjas: here’s a 136KB zip archive containing two tab-separated text files listing the countries and postcodes supplied by takers of the survey. Before anyone has a privacy-related aneurysm, though, let me explain how they’re structured.
One of the two files is sorted alphabetically by country, with the postcodes as the second “column of data” (it’s country name, tab, postcode). The second is the reverse: it’s sorted alphabetically by postcode, with the country names following each postcode. This sorting should break any association they might have with the released data set, given that we won’t be including the postcodes in the released set. (More on that in a moment.)
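For anyone who wants to get started quickly, here is a minimal sketch of reading either file into (country, postcode) pairs. The function name `read_pairs` and the `postcode_first` flag are my own inventions for illustration, and the sketch assumes nothing about the files beyond the two-column, tab-separated layout described above.

```python
import csv
from typing import Iterable, Iterator, Tuple


def read_pairs(lines: Iterable[str],
               postcode_first: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (country, postcode) tuples from tab-separated lines.

    Set postcode_first=True for the file sorted by postcode, where the
    postcode is the first column and the country name follows it.
    """
    # QUOTE_NONE keeps stray quote characters in the data from being
    # interpreted as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            # Skip blank or malformed lines rather than failing.
            continue
        first, second = row[0].strip(), row[1].strip()
        yield (second, first) if postcode_first else (first, second)
```

An open file object can be passed straight in as `lines`, since it iterates line by line; a plain list of strings works too.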
A word of warning: though I cleaned out some of the more obvious cases of people heaping abuse on us for even daring to ask the question, I can’t guarantee that the data set is perfectly clean. There may be drops of bile here and there along with the usual collection of mistyped postcodes. I know there’s at least one bit of obvious humor that I chose to leave in, so enjoy that when you find it.
We have two reasons to release this data this way at this point. The first is to see what people do with it—heatmaps, perhaps, or one of those proportion-distortion maps, or a list of top-ten global postcodes or cities (or both). Hey, go crazy! I’d love to see a number of Google Maps/Yahoo! Maps/OpenMap/whatever mashups with this data. That would be awesome.
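As a starting point for the top-ten idea, something like the following sketch would count the most common country-and-postcode combinations. The name `top_postcodes` is hypothetical, and the normalization step (uppercasing and stripping spaces) is purely my assumption about how you might want to fold together variant spellings of the same postcode; adjust or drop it as you see fit.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_postcodes(pairs: Iterable[Tuple[str, str]],
                  n: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n most frequent (country, postcode) combinations."""
    counts = Counter(
        # Assumed normalization: "v6b 1a1" and "V6B1A1" count as one code.
        (country, postcode.upper().replace(" ", ""))
        for country, postcode in pairs
    )
    return counts.most_common(n)
```

Postcodes are counted together with their country because the same string of digits can be a valid postcode in more than one country.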
The second reason is to ask for help with an API challenge. Like I said, we’re not including the postcodes in the released data set. What I would like to do instead is translate the postcodes into administrative regions (states, provinces, etc.) and put those in the data set. That way, we can include things like “Ohio” and “British Columbia” and “Oaxaca,” which gives us somewhat finer geographic granularity, an area of weakness in the 2007 survey.
Thanks to reading a couple of articles, I know how to do this for a single postcode. But how does one do it for 26,457 postcode-and-country combinations without having to submit every single postcode as a separate request? I’ve yet to see an explanation, and maybe there isn’t one, but I’d like to know either way. And if someone does come up with a way, please show the work instead of just spitting out the result! I’m hoping to learn a few things from the solution, but I obviously can’t do that without seeing the code.
One note: in cases where a postcode isn’t recognized or some kind of an error is returned, I’d like to have a little dash or “ERR” or something put in the result file. That way we can get a handle on what percentage of the responses were resolvable. Thanks.
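To make the shape of the result concrete, here is a rough sketch of the bookkeeping side of the problem. It does not answer the batching question: `lookup` is a stand-in for whatever geocoding call you end up using (I'm not assuming any particular service or API), and the names `resolve_regions` and `resolvable_share` are hypothetical. All it does is make sure each distinct combination is looked up only once, mark failures with “ERR”, and report the share that resolved.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Pair = Tuple[str, str]  # (country, postcode)


def resolve_regions(pairs: Iterable[Pair],
                    lookup: Callable[[str, str], Optional[str]],
                    placeholder: str = "ERR") -> List[Tuple[str, str, str]]:
    """Map each (country, postcode) to a region, or to the placeholder.

    `lookup` is a hypothetical callable supplied by the caller; it may
    return None or raise an exception for postcodes it cannot resolve.
    """
    cache: Dict[Pair, str] = {}
    results = []
    for pair in pairs:
        if pair not in cache:
            try:
                region = lookup(*pair)
            except Exception:
                # Any lookup failure is recorded, not propagated.
                region = None
            cache[pair] = region if region else placeholder
        results.append((pair[0], pair[1], cache[pair]))
    return results


def resolvable_share(results: List[Tuple[str, str, str]],
                     placeholder: str = "ERR") -> float:
    """Fraction of rows whose region was resolved."""
    if not results:
        return 0.0
    resolved = sum(1 for _, _, region in results if region != placeholder)
    return resolved / len(results)
```

Caching by combination means repeated postcodes cost one request instead of many, which should help no matter how the lookups themselves end up being sent.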
Anyway, map and enjoy!