December 21, 2011

When you use a smoothed line chart, your data is not affected, it’s misrepresented!

This past week, I was watching a presentation on Q3 performance and up pop a bunch of charts that were clearly created in Excel with smoothed lines.  I hadn’t seen smoothed line charts in quite a while, so I was taken aback.  I almost, but thankfully didn’t, stand up and call out the junk.

It was incredibly clear to me that the smoothed lines were distorting the data, not much, but distorting it nonetheless. And I have a problem with THAT!

Let’s first take a look at some examples to see how badly the data can be distorted.  The first chart is obviously the smoothed lines.  Nice and pretty, I agree.  It makes me feel like I’m going up a chairlift then skiing down the slopes of Keystone, Colorado.

image

I added the gridlines, though I would never do this if I were presenting this for real, so that you can see where the points truly intersect.  It should be abundantly clear now that the line is trying to connect points that don’t exist.  Look between July & August 2009 or between August & September 2008.  In both of these instances, and many more across the chart, the lines go beyond where they should in an attempt to make the chart nice and smooth.

If I were to look at this quickly, I might think that my sales increased from July to August 2009, but in fact, there was a slight decrease.  In order for the line to connect smoothly to September 2009, the line has to go around August 2009.  Think about all of the people that don’t used zero-based axes.  Imagine how distorted their data could look.

Contrast the smoothed line chart to this standard line chart.

image

You now easily see that sales decreased from July to August 2009.  It’d be tough to interpret anything from this chart between the months because the lines clearly connect month to month.  The smoothed lines lead you to believe that there is more data being connected.

Now, let’s look at how the smoothed and straight lines look on the same chart.  For illustrative purposes, we’re only looking at 2008.  Now that dip after August really stands out. 

image

Jon Peltier of the Peltier Tech Blog sums it up best in his post about the charts to choose and avoid in Excel 2010:

Smoothed lines are abused. If you are plotting measured data, the only valid connecting curve between points is a straight line (or a line which is fitted to a function that comes from a physical model of the data). A smoothed curve implies that the data goes places where it has not been measured. Smoothed lines without points are even worse, because the person trying to interpret the chart doesn’t even know what points on the smoothed curve belong there.

My advise?  NEVER use smoothed lines.  The ONLY possible outcome is misinterpretation.

Let me wrap up with what I find to be a bit of a funny line from Microsoft’s help for creating smoothed lines:

When you use this procedure to soften the jagged edges of a line chart, your data is not affected.

This is very true.  Your data is not affected, it’s merely misrepresented.  Semantics?

10 comments:

  1. Why? Why would someone do this to their data? Especially when there are more appropriate ways to smooth out a line like using trend lines, or a moving average!

    ReplyDelete
  2. Good post! I'll never look at a smoothed line chart the same way again :) Actually I was amazed at how much of a false upward trend was established between Apr and May when you over-laid the charts in the third image.

    ReplyDelete
  3. It can get worse! To test Excel's smoothed lines, I created a graph with the Y-values 0,0,100,100,0,0. Between each pair of 0 values, the smoothed line dips down to about -6, and between the pair of 100 values, it peaks at 110! Excel should have no reason to assume my data could reach such extremes! (Imagine that my data represent percentages, where valid values are 0–100 only!)

    ReplyDelete
  4. Brandon, the unfortunate thing is that people use the smoothed lines purely out of ignorance and have absolutely no idea that they're distorting the data.

    ReplyDelete
  5. How does one select "non-smooth line"?

    ReplyDelete
  6. Go to Format Data Series -> Line Style, then uncheck Smoothed Line.

    ReplyDelete
  7. Out of interest would you mind elaborating on this:

    "I added the gridlines, though I would never do this if I were presenting this for real..."

    ReplyDelete
    Replies
    1. I typically avoid using gridlines because they can distract from the parts of the graph you want your reader to focus on, in this case, the shape of the data.

      Delete
    2. Thanks for the reply... and for the blog in general; most useful and informative.

      Delete
  8. Andy, I liked the post. I do agree that smoothing a line misrepresents the data, though I would say that even a straight line misrepresents the data. A straight line in any of these charts implies that there are an infinite number of points which is likely not the case since we are only talking about discrete values. We could argue that straight lines misrepresent data "less" that smoothed lines, but degree of misrepresentation seems like a silly discussion. If accuracy is our biggest concern, then just plotting points or using tables would be a better choice. I believe that a smoothed curve is more appealing than a jagged one and since the entire exercise is about creating visualizations that are informative and appealing, then there is nothing wrong with smoothing a line.

    ReplyDelete