Sunday, June 27, 2010

Transparency and Visualization of City Data

Last year my city was considering closing one or two schools. One of the justifications cited was a decline in student enrollment. Though I can no longer find it anywhere online, one of the enrollment artifacts put forth was this image of a column chart, presumably from an Excel spreadsheet.


To look at this picture (of data) one would assume there was a HUGE drop in enrollment, something in the vicinity of 50-60%.

But a closer look at the values reveals quite a different quantitative result. In fact, the actual decline from the peak in 1994-95 to the valley in 2008-09 appears to be 5,600 students. Keep in mind, I'm doing my best to derive the actual number based on the Y-axis scale on the picture. The data weren't provided. To make matters more difficult, the column chart is shown in 3D. This undermines the utility of the visualization by making it hard to determine an actual value on the chart.

The actual difference is 6,824 students, from the high in 1994-95 to the projected low in 2012-13.

6,824 students is 18.1% of the peak (6,824/37,707). While significant, 18.1% is a far cry from what appears to be a 50-60% decline.
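The arithmetic is easy to check. Here is a minimal Python sketch using only the two figures quoted above; the function name percent_decline is my own, chosen for illustration.

```python
def percent_decline(peak, drop):
    """Return the drop expressed as a percentage of the peak value."""
    return 100.0 * drop / peak


peak_enrollment = 37_707  # 1994-95 high, as quoted in the post
drop = 6_824              # decline to the projected 2012-13 low, as quoted in the post

print(f"{percent_decline(peak_enrollment, drop):.1f}%")  # prints 18.1%
```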

Fortunately, when I requested the data they were provided to me.

This is valuable for two reasons. First and foremost, I get the data, not an interpretation of the data. Second, and more to the point of this post, I can show what this chart should look like and what else the data have to say.

Notice the slope of the decline and the differences between yearly observations. Neither is as exaggerated as in the first image.

The differences between the two views are the aspect ratio of the graph and the value of the Y-axis origin. In the first chart the aspect ratio of the picture is nearly square, or 1:1, and the Y-axis starts at 28,000 (an arbitrary value not found anywhere in the data). In the second chart the Y-axis begins at zero (0) and the aspect ratio is closer to 3:1 (which is probably too wide).
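To see how much the Y-axis origin alone changes the impression, here is a rough Python sketch. It compares how much a column appears to shrink when the axis starts at 28,000 versus zero. It uses the peak of 37,707 and my estimate of a 5,600-student drop read off the first chart. The function apparent_decline is a name I made up for this illustration, and it ignores the aspect ratio and the 3D effect entirely.

```python
def apparent_decline(peak, low, baseline=0):
    """Percentage by which the drawn column shrinks when the Y-axis starts at `baseline`."""
    if baseline >= peak:
        raise ValueError("baseline must be below the peak value")
    peak_height = peak - baseline  # visible height of the tallest column
    low_height = low - baseline    # visible height of the shortest column
    return 100.0 * (peak_height - low_height) / peak_height


peak = 37_707        # 1994-95 high, from the data
low = peak - 5_600   # valley as estimated from the first chart (an eyeballed figure)

for baseline in (28_000, 0):
    shrink = apparent_decline(peak, low, baseline)
    print(f"Y-axis starting at {baseline:>6,}: column appears to shrink by {shrink:.1f}%")
```

With the axis starting at 28,000 the column appears to shrink by roughly 58%, which lines up with the 50-60% impression the first chart gives. Starting at zero, the same drop is roughly 15% of the column's height.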

So then, which image and which set of chart variables is correct?

The answer depends on the basis for the query...or what you are trying to say. Actually, the answer has nothing to do with "what you are trying to say," since you/we shouldn't be trying to say anything except to represent quantitative values in visual form. The right thing is to understand the nature of the analysis and the decision being made.

In this case, we can assume with confidence that the Y-axis has no business starting at any value other than zero. What is being claimed, via what is shown, is a long-term (since 1990) decline, not a year-to-year fluctuation. Over the 14-year span the decline is obviously not in the range of 50-60%, so the chart should not imply that it is. On the other hand, if year-to-year deltas are of interest, it probably does not make sense to show a 14-year span.

By the way, the recent yearly changes in enrollment are -216, -200, +30, and -17 between 2006 and 2010. A decline of about 200 students citywide is hardly justification for closing a school. (Our school has over 600 students.)

In the end, my city chose not to close a school. I think that was a wise choice. But I have to wonder what was going on in the minds of the decision makers. What analyses were they using? Did they, too, realize the truth in these numbers, or was it some other (political) factor? Did we just get lucky?

When cities make claims based on data, they should publish the data. When cities, or anyone, show you a picture of data, you should question the picture and politely request a reference to the data. At minimum, know what you are seeing.
