December 21, 2011

When you use a smoothed line chart, your data is not affected, it’s misrepresented!

This past week, I was watching a presentation on Q3 performance and up pop a bunch of charts that were clearly created in Excel with smoothed lines.  I hadn’t seen smoothed line charts in quite a while, so I was taken aback.  I almost, but thankfully didn’t, stand up and call out the junk.

It was incredibly clear to me that the smoothed lines were distorting the data, not much, but distorting it nonetheless. And I have a problem with THAT!

Let’s first take a look at some examples to see how badly the data can be distorted.  The first chart is obviously the smoothed lines.  Nice and pretty, I agree.  It makes me feel like I’m going up a chairlift then skiing down the slopes of Keystone, Colorado.

image

I added the gridlines, though I would never do this if I were presenting this for real, so that you can see where the points truly intersect.  It should be abundantly clear now that the line is trying to connect points that don’t exist.  Look between July & August 2009 or between August & September 2008.  In both of these instances, and many more across the chart, the lines go beyond where they should in an attempt to make the chart nice and smooth.

If I were to look at this quickly, I might think that my sales increased from July to August 2009, but in fact, there was a slight decrease.  In order for the line to connect smoothly to September 2009, the line has to go around August 2009.  Think about all of the people that don’t used zero-based axes.  Imagine how distorted their data could look.

Contrast the smoothed line chart to this standard line chart.

image

You now easily see that sales decreased from July to August 2009.  It’d be tough to interpret anything from this chart between the months because the lines clearly connect month to month.  The smoothed lines lead you to believe that there is more data being connected.

Now, let’s look at how the smoothed and straight lines look on the same chart.  For illustrative purposes, we’re only looking at 2008.  Now that dip after August really stands out. 

image

Jon Peltier of the Peltier Tech Blog sums it up best in his post about the charts to choose and avoid in Excel 2010:

Smoothed lines are abused. If you are plotting measured data, the only valid connecting curve between points is a straight line (or a line which is fitted to a function that comes from a physical model of the data). A smoothed curve implies that the data goes places where it has not been measured. Smoothed lines without points are even worse, because the person trying to interpret the chart doesn’t even know what points on the smoothed curve belong there.

My advise?  NEVER use smoothed lines.  The ONLY possible outcome is misinterpretation.

Let me wrap up with what I find to be a bit of a funny line from Microsoft’s help for creating smoothed lines:

When you use this procedure to soften the jagged edges of a line chart, your data is not affected.

This is very true.  Your data is not affected, it’s merely misrepresented.  Semantics?