March 10, 2012

Baseball Predictions: How good is the Marcel the Monkey Forecasting System?

If you haven’t heard of the Marcels, you need to read up on them here. I can’t possibly summarize the Marcels better than their own website does:

The Marcel the Monkey Forecasting System (or the Marcels for short) is the most advanced forecasting system ever conceived.

Not.

Actually, it is the most basic forecasting system you can have, that uses as little intelligence as possible. So, that's the allusion to the monkey. It uses 3 years of MLB data, with the most recent data weighted heavier. It regresses towards the mean. And it has an age factor.

Yes, that’s it.  Don’t read too much into it.  The whole point of the system is its simplicity.  But you know what?  It does a decent job.
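
To make the three ingredients in that description concrete, here is a minimal Python sketch of a Marcel-style projection for a rate stat. It is not the official Marcels code. The function name, the 5/4/3 weights, the 1200 PA of regression, the peak age of 29, and the 0.006 age step are all assumptions picked for illustration; none of them are quoted in this post.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3), regression_pa=1200,
                      peak_age=29, age_step=0.006):
    """Project a per-plate-appearance rate from up to three seasons.

    seasons: list of (events, plate_appearances), most recent season first.
    All default constants are illustrative assumptions, not quoted from the post.
    """
    # Step 1: weight the most recent season most heavily.
    weighted_events = sum(w * ev for w, (ev, _pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_ev, pa) in zip(weights, seasons))

    # Step 2: regress toward the mean by blending in league-average PAs.
    regressed = (weighted_events + league_rate * regression_pa) / (
        weighted_pa + regression_pa)

    # Step 3: apply a simple linear age factor around an assumed peak age.
    age_factor = 1 + (peak_age - age) * age_step
    return regressed * age_factor


# Hypothetical player: HR totals of 30, 20, 10 over three 600-PA seasons.
projected_hr_rate = marcel_style_rate(
    [(30, 600), (20, 600), (10, 600)], league_rate=0.03, age=29)
```

Multiplying the projected rate by an expected number of plate appearances would give a counting-stat forecast; how playing time itself gets projected is left out of this sketch.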

I took the Marcels forecast data (both batting and pitching) from 2001-2012 and matched it up with the “actual” data from Lahman for the same time period.  I was only interested in seeing how the Marcels performed, so I only looked at records where the year and player existed in both data sets.  Get the complete set here.
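
The matching step amounts to an inner join on player and year. Here is a rough sketch in plain Python, assuming each data set has already been loaded into a dict keyed by a (player, year) pair. The player IDs and numbers are made up, and `match_records` is a hypothetical helper, not something from the Marcels or Lahman files.

```python
def match_records(forecasts, actuals):
    """Keep only the (player_id, year) keys that appear in both dicts."""
    shared_keys = forecasts.keys() & actuals.keys()
    # Pair each forecast with its actual value: (marcel, actual).
    return {key: (forecasts[key], actuals[key]) for key in sorted(shared_keys)}


# Made-up HR values keyed by (player_id, year).
forecasts = {("playerA", 2010): 25, ("playerB", 2010): 12, ("playerC", 2011): 8}
actuals = {("playerA", 2010): 31, ("playerC", 2011): 5, ("playerD", 2011): 40}

matched = match_records(forecasts, actuals)
```

Here playerB (forecast only) and playerD (actual only) drop out, which is the behavior described above.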

I Tableaud the data to build this interactive viz (download the workbook here).  There are two tabs, one for batting and one for pitching.  You use them exactly the same way.

  1. Start by picking the stat you would like to view
  2. The viz at the top compares the Marcels prediction and the actual stat for all year/player records. 
    • Hover over a point to see the details (e.g., player, year, data)
    • Click on a point and the chart at the bottom will update with records for only that player
    • The points are color-coded by the prediction error (i.e., (Actual-Marcels)/Marcels)
  3. The charts at the bottom summarize all of the data for the stat chosen
    • The lines show the Actual and Marcels data across the years
    • The bars show the % error for the total year (red = under forecast, black = over forecast)
  4. If you want to analyze a specific player without having to hunt and peck in the scatter plot, simply pick him from the list
  5. Rinse and repeat with the pitching data
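
For anyone who wants to reproduce the error calculations outside of Tableau, here is a small Python sketch. The per-record error uses the (Actual-Marcels)/Marcels formula from the list above. The yearly figure assumes the bars apply the same formula to the yearly totals, which is my reading of "% error for the total year" rather than something read out of the workbook. The function names and sample numbers are made up.

```python
from collections import defaultdict


def prediction_error(actual, marcel):
    """(Actual - Marcels) / Marcels; returns None when the forecast is zero."""
    if marcel == 0:
        return None
    return (actual - marcel) / marcel


def yearly_error(records):
    """records: iterable of (year, actual, marcel) tuples.

    Sums actual and forecast values per year, then applies the same formula
    to the totals (an assumption about how the bars are built).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for year, actual, marcel in records:
        totals[year][0] += actual
        totals[year][1] += marcel
    return {year: prediction_error(a, m) for year, (a, m) in sorted(totals.items())}


# Made-up records: (year, actual, marcel).
sample = [(2010, 30, 25), (2010, 10, 15), (2011, 18, 20)]
errors_by_year = yearly_error(sample)
```

With this formula a positive value means the actual stat beat the forecast (an under forecast), and a negative value means the Marcels over forecast.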

One of the features that I like best is that you can enter a stat minimum.  For example, you first pick HR for the stat, but you are only interested in seeing players that hit 30+ HRs in a season.  Enter 30 in the box and hit enter.  Voila!  The charts update.
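
In code terms, the stat minimum is just a filter on the records. The sketch below assumes the minimum is compared against the actual value (players who really hit 30+ HRs); the record layout and the `apply_stat_minimum` name are hypothetical.

```python
def apply_stat_minimum(records, minimum):
    """Keep records whose actual value meets the minimum (assumed behavior)."""
    return [record for record in records if record["actual"] >= minimum]


# Made-up HR records.
hr_records = [
    {"player": "playerA", "year": 2010, "actual": 31, "marcel": 25},
    {"player": "playerB", "year": 2010, "actual": 12, "marcel": 14},
    {"player": "playerC", "year": 2011, "actual": 30, "marcel": 33},
]

sluggers = apply_stat_minimum(hr_records, 30)
```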

Two overall themes stuck out to me:

  1. The Marcels over forecast nearly all of the “raw” batting and pitching stats…things like PA, R, H, HR, RBI, SB, W, ER, etc.
  2. The Marcels tend to forecast “calculated stats” very well, e.g., BA, HR %, OBP, SLG, OPS, ISO, ERA, WHIP

Finally, before you complain about the axes not showing the correct number of decimals for things like BA, OBP, etc., know that I’m using a parameter in Tableau from which I’ve built a calculation.  The best you can do, as far as I know, to get close to the correct number formatting is to leave it set to automatic.  I don’t know of a way to force the format of the field to update based on the parameter selection.
