May 14, 2012

How common is your birthday? Find out exactly with an interactive heat map.

Matt Stiles posted a heat map on his blog yesterday that I thought was pretty well done.  I decided to get the data from NYTimes.com and recreate it in Tableau.

Recreating it in Tableau takes under 20 seconds and fewer than 10 clicks, or more like 15 seconds if you've been using Tableau longer.

Matt chose a brownish color palette, but I wanted to try lots of different colors.  Tableau makes it incredibly simple to try out many options very quickly.  I tested green, blue, gray and orange-blue palettes before settling on an orange palette.  For my eye, the orange palette made distinguishing the colors easiest.

Creating this as an interactive viz in Tableau allows you to provide the reader/viewer/interactor with more information.  Hover over your birthday and you will see exactly where it ranks.  Try it! 

In a static version, you’re left to guess at the approximate range in which it falls.

Check out Matt’s post and the comments.  There are some interesting insights from the readers including:

  • Matt struggled with getting the colors just right using Illustrator.  With Tableau, it’s all built in.  There’s no need to tinker.
  • Doctors apparently don't like having their vacations disturbed.  Notice how fewer babies are born around major holidays (July 4th, Thanksgiving, Christmas).
  • September clearly has many of the top days (in fact it has all of the top 10), but July and August aren’t far behind.  It looks like people conceive during all of those Thanksgiving, Christmas and New Year’s parties.
  • A reader noted that the 13th seems to be least common on average.  Perhaps that’s because many people see that as an unlucky day.
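The conception reading in the third point can be sanity-checked with simple date arithmetic. This is a minimal Python sketch, assuming an average of about 266 days (38 weeks) from conception to birth; real pregnancies vary by weeks, and `estimated_conception` is a hypothetical helper, not part of the original analysis.

```python
from datetime import date, timedelta

# Assumption: an average of about 266 days (38 weeks) from conception to
# birth. Real pregnancies vary by weeks, so this is only a rough guide.
GESTATION_DAYS = 266


def estimated_conception(birth: date) -> date:
    """Step back one assumed average gestation period from a birth date."""
    return birth - timedelta(days=GESTATION_DAYS)


# Hypothetical example: the start, middle and end of September 2011.
for birth in (date(2011, 9, 1), date(2011, 9, 15), date(2011, 9, 30)):
    print(birth, "->", estimated_conception(birth))
```

Under that assumption, births from September 1 to September 30, 2011 map back to December 9, 2010 through January 7, 2011, a window that covers Christmas and New Year's.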

41 comments:

  1. I wish the data set weren't aggregated to the day and were available by year; then we could see whether there's been any shift and look at day-of-the-week trends.

  2. Yes, that would certainly allow for deeper analysis.

    I added the top 10 and bottom 10 on the right. Note that I excluded Feb 29 from the bottom 10 since it only occurs every 4 years.
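Picking the top and bottom 10 from the rankings, with Feb 29 excluded from the bottom list, can be sketched in Python as follows. This assumes the rankings are held as a mapping from (month, day) to rank (1 = most common, 366 = least common); the function name and the sample values are invented for illustration and are not the NYTimes.com data.

```python
def top_and_bottom(ranks: dict[tuple[int, int], int], n: int = 10):
    """Return the n most common and n least common (month, day) pairs.

    `ranks` maps (month, day) to a rank where 1 is the most common birthday
    and 366 the least common.
    """
    ordered = sorted(ranks, key=ranks.get)  # most common first
    top = ordered[:n]
    # Feb 29 only exists in leap years, so it is left out of the bottom list.
    eligible = [md for md in ordered if md != (2, 29)]
    bottom = eligible[-n:][::-1]  # least common first
    return top, bottom


# Invented ranks for illustration only; these are not the real rankings.
sample = {(3, 3): 1, (3, 4): 2, (5, 5): 364, (6, 6): 365, (2, 29): 366}
top, bottom = top_and_bottom(sample, n=2)
print(top)     # [(3, 3), (3, 4)]
print(bottom)  # [(6, 6), (5, 5)]
```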

    1. Hi Andy,
      did you duplicate Month and Day again and place the copies on the Rows and Columns shelves?

  3. Andy,

    Nice job! I agree with the use of the orange palette. I've been using orange (and the blue and orange palette) for a while now on many of my dashboards for work.

    I also like the addition of the top and bottom 10.

    Just curious about the white lines. Is it just a formatting thing (heat maps can be tricky to format, I know), or do they signify something?

  4. Robert,

    That's a Tableau formatting problem. There's no way to fix it either. I agree that it's distracting.

    I have Tableau set to fill the entire cell, but it sure doesn't look like it does.

    Andy

  5. I have encountered that as well. I will admit that I have spent more time than I should adjusting the size of the grid a hair's width at a time just to try to get it right. I have actually found that using square shapes can create a better-formatted heat map than the default Tableau heat map (in some cases). But it isn't as quick.

  6. Excellent visualization.

    To see a smoothed version of similar data, see the article "The Most Likely Birthday in the US."

    For a discussion of how holidays affect US Births, see "The effect of holidays on US Births."

  7. Craig and I are currently working on a maternity data set for the US chaps, with one row per birth; we'll let you connect to it when it's done. Currently we have 5 years of data, and it looks much like the data you have.

  8. Thanks for the additional links Rick!

    Tom, I'll play around with it when you're done. It should be fun!

  9. I have raw data with thousands of DOBs. I've already created a heatmap based on the raw counts. Can I create the ranks inside Tableau?

  10. If you need to calculate the rank, use the INDEX() function in Tableau.

    Here's a link to the Tableau help for how it works.

    http://onlinehelp.tableausoftware.com/v7.0/pro/online/en-us/functions_functions_tablecalculation.html
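For a rough Python analogue of what Kevin describes (counting raw dates of birth per day, sorting by count, then numbering rows the way INDEX() numbers positions within a sorted partition), here is a minimal sketch. The function name, the tie-breaking rule and the sample dates are all hypothetical; Tableau's own result depends on how the table calculation is addressed and sorted.

```python
from collections import Counter
from datetime import date


def rank_birthdays(dobs: list[date]) -> dict[tuple[int, int], int]:
    """Rank (month, day) pairs by how many dates of birth fall on them."""
    # Aggregate raw dates of birth to (month, day), ignoring the year.
    counts = Counter((d.month, d.day) for d in dobs)
    # Most common first; ties are broken by calendar order (an arbitrary choice).
    ordered = sorted(counts, key=lambda md: (-counts[md], md))
    # Like INDEX() along a sorted partition, each day gets its 1-based
    # position, so tied days still receive distinct ranks.
    return {md: i for i, md in enumerate(ordered, start=1)}


# Hypothetical sample of dates of birth.
dobs = [date(1979, 1, 1), date(1990, 1, 1), date(2001, 1, 1),
        date(1980, 9, 16), date(1991, 9, 16), date(1985, 7, 4)]
print(rank_birthdays(dobs))  # {(1, 1): 1, (9, 16): 2, (7, 4): 3}
```

Days with no births in the input get no rank at all; with a full year of raw data that is unlikely to matter, but small samples will leave gaps.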

  11. Kevin,

    Here is an example workbook:
    http://public.tableausoftware.com/views/ColoronRank/ColoronRank

    Depending on your exact situation, you may need to be aware of how Tableau pads data when dealing with table calcs and dates. (Note the use of string and number fields and the aggregated pills. There are many routes; this is just one option.)

    Andy and Robert,

    If you look at this workbook you will see I used the square mark type and made panes instead of cells to get Tableau to pixel-fit to the pane instead of the mark.

    (I thought I had posted a previous comment with additional details; maybe it got blocked.)

  12. Joe, as always, thanks for the tip. You would think the default functionality for heat maps would fill the cells.

    I've updated my viz to use squares instead. It's amazing how such a simple change can make it look so much better.

  13. Andy,

    I see you changed the mark type to square, but that is not enough. Currently you are using cells (only one discrete/blue pill on both the Rows and Columns shelves), and you need to be using panes, more than one discrete pill on both the Rows and Columns shelves.

    Look, you can simply duplicate your "Day" and "Month" fields and place the copies on those shelves as well; that turns your cells into panes. (For formatting, you can turn off the headers and adjust the level of the borders.)

    As for your difficulty earlier about the white lines, this is just an artifact of what I consider one of Tableau's great features, pixel-fitting, for more on that term see http://dcurt.is/pixel-fitting.

    What is going on here is that Tableau performs pixel-fitting differently for marks (what you experienced with the Bar mark type) than it does for panes. For bar marks, Tableau tries to set them side by side without overlapping. With just squares in cells and no panes, there is overlap/overprinting. With panes and squares, the square's color fills the pane but does not spill over.
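Joe's explanation of the white lines can be illustrated with a toy model. This Python sketch is not Tableau's renderer; it simply assumes that marks are drawn either at one fixed whole-pixel width (the gap-prone case) or stretched from one rounded pane edge to the next (the pane case). The 31 columns and 500-pixel width are hypothetical.

```python
def edges(n_cells: int, width_px: int) -> list[int]:
    """Round each ideal cell boundary i * width_px / n_cells to a whole pixel."""
    # Integer arithmetic: (i * W + N // 2) // N rounds half up without floats.
    return [(i * width_px + n_cells // 2) // n_cells for i in range(n_cells + 1)]


def fixed_width_marks(n_cells: int, width_px: int) -> list[tuple[int, int]]:
    """Every mark gets the same whole-pixel width, starting at its cell's edge."""
    e = edges(n_cells, width_px)
    w = width_px // n_cells
    return [(e[i], e[i] + w) for i in range(n_cells)]


def pane_filling_marks(n_cells: int, width_px: int) -> list[tuple[int, int]]:
    """Each mark stretches from its cell's left edge to the next cell's edge."""
    e = edges(n_cells, width_px)
    return [(e[i], e[i + 1]) for i in range(n_cells)]


def uncovered(spans: list[tuple[int, int]], width_px: int) -> list[int]:
    """Pixel columns that no mark paints; these show up as thin white lines."""
    painted = set()
    for start, end in spans:
        painted.update(range(start, end))
    return [x for x in range(width_px) if x not in painted]


# Hypothetical layout: 31 day columns squeezed into 500 pixels.
print(uncovered(fixed_width_marks(31, 500), 500))   # [64, 193, 322, 451]
print(uncovered(pane_filling_marks(31, 500), 500))  # []
```

Because 500 pixels don't divide evenly into 31 columns, the fixed-width marks leave four one-pixel gaps, while the pane-filling marks cover every pixel.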

  14. I see your point Joe. I've changed the viz again using your technique. It's only a couple more clicks and a bit of formatting to get it to look good.

    My total time to rebuild it from scratch would still be under one minute.

  15. Thanks so much Andy and Joe! I am now introduced to table calculations, and I have a heatmap.

  16. This is helpful, but it fails to capture the fact that I share a birthday with Arnold Schwarzenegger.

  17. I am a true believer in Astrology. Here are some of my common birthday mates. You can obviously see the connection between all of us:
    Woody Allen
    Woody Strode
    Kareem Abdul-Jabbar
    Pat Robertson
    Joe Biden
    Nancy Sinatra
    and.....Albert Pujols

  18. Great job. But, shouldn't the Top 10 be a group of the darkest days and the Bottom 10 be a group of the lightest?

  19. I intentionally did not leave all of the top 10 as the darkest and all of the bottom 10 as the lightest because you wouldn't be able to distinguish the order they are in.

    For example, if I use the scale from 1-366 for the top 10, they are all dark orange and you can't see which one is #1 vs. #10. With the way I designed it, you can more easily see where they stand.

    The color legend at the top is still applicable because it goes from most common to least common within that subset of days.
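Andy's point about the color scale is about which range of ranks the colors are stretched over. A minimal Python sketch, assuming a simple linear ramp (Tableau's actual color interpolation may differ) and a hypothetical `ramp_position` helper:

```python
def ramp_position(rank: int, lo: int, hi: int) -> float:
    """Place a rank on a 0..1 color ramp that spans ranks lo..hi (0 = darkest)."""
    return (rank - lo) / (hi - lo)


top10 = range(1, 11)
# Stretched over all 366 days, the top 10 share the first ~2.5% of the ramp,
# so they all render as nearly the same dark orange.
global_scale = [ramp_position(r, 1, 366) for r in top10]
# Stretched over only the top 10, they spread across the whole ramp.
subset_scale = [ramp_position(r, 1, 10) for r in top10]

print(f"{global_scale[-1]:.3f}")  # 0.025
print(f"{subset_scale[-1]:.3f}")  # 1.000
```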

  20. My one comment is that some of your causal reasoning may need to be taken even further. It may not just be that doctors don't want their holidays disturbed. I would posit that it is also people not wanting to be in the hospital for the holidays when choosing to induce.

  21. I find the lighter band across all months on the 13th interesting.

  22. Bottom 10 is missing February 29 (366/366)

  23. JimR,

    Feb 29 was intentionally left off because it only occurs every four years.

    I've added a note on the viz.

    Andy
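The reason for leaving Feb 29 out can be put in numbers. In the Gregorian calendar, Feb 29 occurs in 97 of every 400 years, so even if births were spread evenly across days, it would collect only about a quarter as many birthdays as a typical date over a long span. Python's standard `calendar` module confirms the count:

```python
import calendar

# Count the years containing a Feb 29 in one full 400-year Gregorian cycle.
leap_years = sum(calendar.isleap(y) for y in range(2000, 2400))
share = leap_years / 400

print(leap_years, share)  # 97 0.2425
```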

  24. FYI: it's "palette" in this case. Not "pallet."

    1. Thanks Stephanie. I must have had Home Depot on my mind. I've corrected it.

  25. Surely modern-day C-sections are the main explanation for why there aren't as many births on holidays. I would love to see a similar chart with only natural births.

  26. Anonymous, birth dates are public data. But natural vs. induced birth is medical information and therefore private. A natural-birth-only map is therefore not possible.

  27. Haha, it's funny how Christmas is the second-least common birthday. It's like Jesus is going 'Fuck off. Get back in there. It's my birthday. MY BIRTHDAY!'

  28. My birthday says "tooltip unavailable (500)". Any info on September 1?

  29. It would be interesting to see the number of births on each day (is that available somewhere?). I'm a Jan 3 baby (one of the ten least frequent), and it would be interesting to know, on an absolute basis, how much less frequent my birthday is than a day in September.

  30. Please try the tooltip again. It's working for me. Perhaps there was a blip with Tableau Public's server.

    As for the actual number of births, I don't have that data. All I used was the rankings. I'm sure it's out there somewhere though.

  31. I don't know why I like this so much, but I do. (Probably because my birthday is in the bottom 20, woot.)

    I would be very curious to see this data broken down by geographic and/or climatic regions. When I see summer babies, my first thought is of cold winter snuggling. In that case, though, the distribution ought to shift a bit in Southern California, where Christmas day is usually sunny & 70 degrees. (Says the January-born Southern Californian.) Or, despite our year-round fertility, is it possible that we humans have season-based rhythms?

    Anyway, thanks for sharing the data, I've gotten a kick out of speculating.

  32. Clearly, peak copulation periods are October to December. That's the first thing that came to mind (take it as you will), looking at the concentration of birthdays in July to Sept.

  33. seems like people don't like giving birth on holidays (July 4th, Thanksgiving week, Xmas, new year's day).

    1. A common, unproven, theory is that doctors don't want to work on those days, so they either induce their patients or make them wait.
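One detail relevant to the "Thanksgiving week" observation: unlike July 4th or Christmas, Thanksgiving has no fixed date. It is the fourth Thursday of November, so it always lands between November 22 and 28. When births from many years are ranked by calendar day, a Thanksgiving dip would plausibly be spread across that week rather than concentrated on one cell. A short sketch with Python's standard `datetime` module (the `thanksgiving` helper is illustrative):

```python
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """US Thanksgiving: the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    # date.weekday(): Monday is 0, so Thursday is 3.
    first_thursday = nov1 + timedelta(days=(3 - nov1.weekday()) % 7)
    return first_thursday + timedelta(weeks=3)


print(thanksgiving(2011), thanksgiving(2012))  # 2011-11-24 2012-11-22
# The date always falls between November 22 and 28.
print(all(22 <= thanksgiving(y).day <= 28 for y in range(1950, 2051)))  # True
```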

  34. My birthday is ranked number 9, which surprises me. In 27 years, I've only met one other person, 10 years older, with the same birthday as me. In school, most of my friends had summer birthdays. Of course this is anecdotal, but when I opened this page, I was guessing it would be one of the least popular.

  35. What's the percentage difference between the date with the fewest births and the date with the most births? There's no mention of this here. If it's tiny, the whole graph is just noise...

    1. James, the only data that was provided was the ranking. Showing the percentage difference from the max wouldn't change the rank. The purpose of the heat map is to show the frequency, which it does.

    2. Andy: 'Frequency' isn't quite what this display shows. It shows the rank order of frequencies, not the values of frequency scores, which is the usual meaning of frequencies. James' question is significant: if the difference between the most common and least common is as low as, say, 5% of frequency, there really isn't much 'signal' in this data at all. While it is a nice display of the commonness of birthdates, it doesn't give information that would support correct judgements of scarcity of birthdate-mates, though I believe this will be a common interpretation. My conclusion: Visualisation good; potential for misinterpretation also good.
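The point above is easy to demonstrate: two data sets can produce identical rankings, and therefore an identical rank-based heat map, while the gap between the most and least common days differs enormously. This Python sketch uses invented counts for three hypothetical days, not the birth data (which was available here only as ranks):

```python
def spread_pct(counts: dict[str, int]) -> float:
    """Percent by which the most common day exceeds the least common day."""
    hi, lo = max(counts.values()), min(counts.values())
    return 100 * (hi - lo) / lo


def rank_order(counts: dict[str, int]) -> dict[str, int]:
    """1 = most common; this is all a rank-only heat map can show."""
    ordered = sorted(counts, key=lambda day: -counts[day])
    return {day: i for i, day in enumerate(ordered, start=1)}


# Two invented data sets for hypothetical days A, B and C.
flat = {"A": 1030, "B": 1010, "C": 1000}
steep = {"A": 3000, "B": 1500, "C": 1000}

print(rank_order(flat) == rank_order(steep))  # True: same heat map
print(spread_pct(flat), spread_pct(steep))    # 3.0 200.0
```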

  36. I take the point, Redwood Rhiadra, about private data so you can't disaggregate 'natural' and 'intervention' births. But one could perhaps compare countries with dissimilar rates of c-section/induction vs natural births (which nonetheless share the same Christian-based set of public holidays, e.g. one of the Scandinavian countries or the Netherlands http://www.oecd-ilibrary.org/sites/health_glance-2011-en/04/09/g4-09-01.html?itemId=/content/chapter/health_glance-2011-37-en) (need to ignore Thanksgiving).
