Skip to: site navigation/presentation
Skip to: Thoughts From Eric

Archive: 18 October 2007

Digging Into the Data

One of the practical reasons we released the anonymized data sets from the 2007 Web Design Survey was that we knew we couldn’t ask every possible question, let alone report on the results.  For that matter, we knew we wouldn’t even be able to come up with every possible question.  It’s one thing to approach this enormous mountain of data with a specific question in mind; those questions always seem obvious to the questioner.  In that case, there’s a clear path to the summit.  But we didn’t come at this with a specific angle in mind.  We just wanted to know what the profession looks like.  So we not only didn’t have a clear path to the summit, we didn’t even have a summit to reach.  Instead, we had thousands.  The tyranny of choice came down on our heads like a, well, like a falling mountain.

So the obvious choice was to release the data for others to analyze in search of their own summits, and I’m really glad to see people already doing so.  One gentleman is looking to produce an analysis of UK respondents, for example.  Others are asking specific questions and getting surprising results.

For example, Rebekah Ahrens grabbed a copy of the dataset and pulled out the answer to a straightforward question: what’s the gender distribution for the various age groups?  What she found was that almost without exception, the younger the age group, the smaller the percentage of women.  Here’s a chart showing the results she found in graphic form.

Wow.  What is causing that?  It’s a pronounced enough pattern that I initially wondered if it was somehow an artifact of the analysis method.  Several times during the authoring of the report I’d think I’d found some amazing and previously unsuspected trend… only to discover I’d divided some numbers by the wrong total, or charted column-wise when the table was row-oriented.  Mistakes along those lines.  They happen.  But I really don’t think that’s the case here.

Now, the really important question is why this pattern exists, and that’s where the data fail us—we can’t get the numbers to reveal all the forces that went into their collection.  There are any number of reasons why this pattern might exist.  I thought of three hypotheses in quick succession, and I’m sure there are many more of equal or greater plausibility.

  1. Younger women didn’t hear about the survey, and so didn’t take it.
  2. Women are losing interest in the field, instead heading into other career paths, and so those who have stuck with the field longer are more prevalent.
  3. Increasing margins of error at the low and high ends of the age spectrum reduces the confidence of the numbers to the point that we can’t draw any conclusions.

Remember, these are all hypotheses, any one of which could be true or not.  So how would we go about proving or disproving them?

  1. Conduct a survey of women in the field to see if they answered the survey, if they know others who did, the ages of themselves and those others, and so on.  Difficult to undertake, but not impossible.
  2. Ditto #1, although a possibly useful followup analysis would be to look at the gender distributions by longevity in the field and then cross-reference the two results.
  3. Get someone who is a statistician to figure out the likely margins of error to see if that might explain things.  I’d do it, but I have no idea how.  I would tend to be skeptical of this as an explanation given the clear trend, but I suppose it is possible.

I can, however, create the gender-by-longevity chart mentioned in #2 there.

Check it out: above six years’ longevity, women are consistently more represented (compared to the overall average) than they are below six years’ longevity.  The only exception is a spike at “1 year or less”.  Is that enough to explain the trend Rebekah spotted?  It doesn’t look like it to me, but then I’m not a statistician.  I also wonder a bit about the spikes at edges of the longevity spectrum.

I’m not trying to propose an explanation here, because I don’t have one.  I don’t even have an unsubstantiated belief as to what’s happening here.  I know just enough to know that I don’t know enough to know the answer.  What I’m saying is this: the great thing is that anyone can do this sort of analysis; and that even better, having done so, we can start to figure out what questions we need to be asking of ourselves and each other.

October 2007
September November