Hi all,
To follow up on a previous post, I wanted to share the results of my run total prediction model. I spent about 6 months painstakingly processing, parsing, and integrating Retrosheet data with historical closing lines from covers.com for the years 2004-2012.
I know some people bash the Covers lines as "dirty," but I've checked them fairly thoroughly and found that they are a good reflection of actual closing lines. I'd also argue that additional value can be found by line shopping and tracking line movement, so if anything, results based on these lines should be a conservative estimate relative to the full range of lines offered before game time.
I tried a lot of "sophisticated" modelling approaches: logistic/linear regressions, SVMs, HMMs, and a method that scaled previous offensive run counts according to the opposing starting pitcher and then applied a bootstrapping step to estimate run totals. They all failed (which was especially disappointing for the bootstrap method; it was actually pretty clever, and I was proud of it and confident in it when I started coding it).
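For anyone curious what a "scale, then bootstrap" approach can look like, here is a minimal sketch. It is not my actual code: the starter factors are hypothetical multipliers (for example, starter ERA divided by league ERA), and each replicate simply draws one past game's run count per offense, scales it by the opposing starter's factor, and checks whether the combined total clears the line.

```python
import random
from typing import Sequence


def bootstrap_over_probability(
    home_runs: Sequence[int],
    away_runs: Sequence[int],
    home_starter_factor: float,
    away_starter_factor: float,
    line: float,
    n_boot: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate P(total runs > line) by resampling past run counts.

    home_runs / away_runs: runs each offense scored in previous games.
    *_starter_factor: HYPOTHETICAL multiplier for how a starter suppresses
    (< 1) or inflates (> 1) the opposing offense, e.g. starter ERA / league ERA.
    """
    rng = random.Random(seed)
    overs = 0
    for _ in range(n_boot):
        # The home offense faces the away starter, and vice versa.
        home = rng.choice(home_runs) * away_starter_factor
        away = rng.choice(away_runs) * home_starter_factor
        if home + away > line:
            overs += 1
    return overs / n_boot
```

Usage: `bootstrap_over_probability([3, 5, 2, 7, 4], [6, 1, 4, 3, 5], 0.9, 1.1, 8.5)` returns the share of resampled games that go over 8.5.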
The method that worked wonderfully was actually a lot simpler, and it relies on only a few (very predictive) statistics. For each season it identified about 200-250 outliers. The results look too good to be true (2004 and 2005 were a little weaker than the rest), but I assure you they are not due to a coding error: I've quadruple-checked everything. I've also checked my data files and run permutation tests that bet at random, to make sure the resulting distribution has a negative expected value (as it should, given the vig).

So with that said, this year will be my "beta test": I'm going to post my totals picks here and track the results as the season goes along. For now, here are the running unit totals for the 2004-2012 seasons, à la the ones found on RAS. Feel free to comment. Looking forward to April...
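The random-betting sanity check can be sketched roughly like this. It is a Monte Carlo null distribution rather than my exact test, and it assumes each game is a `(line, total_runs)` pair and every bet is 1 unit at -110 (the odds are an assumption; Covers totals often carry different juice). Each trial bets the same number of games the model picked, on a random side, and the model's season total is compared against that null.

```python
import random


def settle(side: str, line: float, total_runs: int, odds: int = -110) -> float:
    """Units won or lost on a 1-unit totals bet; a push returns 0."""
    if total_runs == line:
        return 0.0
    won = (side == "over") == (total_runs > line)
    if not won:
        return -1.0
    # American odds: -110 pays 100/110 units, +120 pays 1.2 units.
    return 100 / abs(odds) if odds < 0 else odds / 100


def random_betting_null(games, n_bets, n_trials=10_000, seed=0):
    """Season unit totals from betting n_bets random games on a random side."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        picks = rng.sample(games, n_bets)
        results.append(
            sum(settle(rng.choice(("over", "under")), line, runs) for line, runs in picks)
        )
    return results


def empirical_p_value(observed, null):
    """Share of random seasons at least as good as the model (+1 correction)."""
    return (1 + sum(x >= observed for x in null)) / (1 + len(null))
```

Because of the vig, the mean of the null should come out negative; a model whose season total sits far out in the right tail (small p-value) is doing better than random betting.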
RunningBalancePlot2004.jpg
RunningBalancePlot2005.jpg
RunningBalancePlot2006.jpg
RunningBalancePlot2007.jpg
RunningBalancePlot2008.jpg
RunningBalancePlot2009.jpg
RunningBalancePlot2010.jpg
RunningBalancePlot2011.jpg
RunningBalancePlot2012.jpg
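A running-balance curve like the ones in these plots is just the cumulative unit total after each bet. A minimal sketch, assuming flat 1-unit bets at -110 unless other odds are given (the outcome labels are hypothetical):

```python
from itertools import accumulate


def unit_result(outcome: str, odds: int = -110) -> float:
    """Units won on a 1-unit bet; outcome is 'win', 'loss', or 'push'."""
    if outcome == "push":
        return 0.0
    if outcome == "loss":
        return -1.0
    if outcome == "win":
        # American odds: negative pays 100/|odds|, positive pays odds/100.
        return 100 / abs(odds) if odds < 0 else odds / 100
    raise ValueError(f"unknown outcome: {outcome!r}")


def running_balance(outcomes, odds=-110):
    """Cumulative unit total after each bet, in the order the bets were placed."""
    return list(accumulate(unit_result(o, odds) for o in outcomes))
```

Plotting `running_balance(season_outcomes)` against bet number gives one season's curve.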