Predicting Odds

**CrimsonTiger** · 08-01-14, 07:35 PM

The morning line IS the projected odds a horse will go off at. The track handicapper does it all for you. If you want to improve on it then start with the ML and make some modifications.

Let's say a race has 10 horses. Without knowing anything, each horse has a 10% chance. Add a point to the top 5 horses based on, say, speed rating in the last race, and subtract a point from the bottom 5. That would give you 5 horses with an 11% chance of winning and 5 with a 9% chance of winning. You keep doing this with the stats/info you think are important, like jockey, $ per start, class, etc. Some categories are more important than others, so you might want to assign 2 points to some.
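A rough sketch of that point scheme in Python, in case it helps to see it written out. The function names, the `rankings` layout, and the `weights` dict are made up for illustration. It assumes each factor gives you the field ordered best to worst; in an odd-sized field the middle horse is left alone.

```python
def point_adjusted_chances(horses, rankings, weights=None):
    """Start every horse at an equal share, then move points per factor.

    horses   : list of horse names in the race
    rankings : dict of factor name -> list of horses ordered best to worst
               (hypothetical layout, not from any data vendor)
    weights  : optional dict of factor name -> points to move (default 1)
    """
    n = len(horses)
    chances = {h: 100.0 / n for h in horses}  # 10 horses -> 10% each
    half = n // 2
    weights = weights or {}
    for factor, order in rankings.items():
        points = weights.get(factor, 1.0)
        for h in order[:half]:        # top half gains points
            chances[h] += points
        for h in order[n - half:]:    # bottom half loses the same points
            chances[h] -= points
    return chances


def fair_odds(chance_pct):
    """Convert a win chance in percent to odds-to-1 with no takeout."""
    return (100.0 - chance_pct) / chance_pct


if __name__ == "__main__":
    field = [f"Horse{i}" for i in range(1, 11)]
    # Pretend the field is already sorted best to worst on the speed factor.
    result = point_adjusted_chances(field, {"speed": field})
    for name, pct in result.items():
        print(f"{name}: {pct:.1f}% (about {fair_odds(pct):.1f}-1)")
```

Because the same number of points is added and subtracted, the chances still total 100%. Nothing here stops a horse from reaching zero or below once enough factors pile up, so a floor may be needed in practice.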

**yak merchant** · 08-01-14, 11:05 PM

Originally posted by TravisVOX

In my current project I'm trying to predict the odds a horse will go off at in the race. While a perfect prediction is impossible, I'd like to get as close as possible.

Has anyone tried this or been somewhat successful in doing so? There are a few approaches I've read about and tinkered with, but any thoughts out there...?

Many many years ago I did something similar for greyhound racing. It was (and probably still is) the only application of neural networks that ever worked for me. I took all the pertinent data, stuck it in a dataset, and ran it through some neural net software (http://www.wardsystems.com/predictor.asp if I remember right; it has been around a long time). I'm sure you could find a free R package to do it. With greyhound racing there are (almost) always 8 entrants, so the model was pretty consistent. For horse racing, you would probably have to segregate your data sets by number of entrants, and scratches would make that a mess, but you could plug in things like morning line, last Beyer, best Beyer, avg Beyer last 5, odds last out, etc. The hard part would be class drops and shippers, but I think some of the data files have "class" grades in them nowadays. You could probably take it as far as you wanted: hot jockey, hot trainer, "only grey horse in the race", and all the other crazy stuff the "crowd" buys into.

But if you put together a consistent model and run it through a NN or GA, I'm sure you would get very good results. People are actually pretty predictable.
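For what it's worth, here is a minimal sketch of turning the inputs listed above (morning line, last Beyer, best Beyer, avg Beyer last 5, odds last out) into one row per horse. `PastLine`, `feature_row`, and the field names are hypothetical, not the layout of any real data file, and field size is carried as a column so the data can be split by number of entrants.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PastLine:
    """One previous start for a horse (hypothetical record layout)."""
    beyer: int    # Beyer speed figure earned in that start
    odds: float   # final odds-to-1 in that start, e.g. 3.5 means 7-2


def feature_row(morning_line, past_lines, field_size):
    """Build one model input row for a horse.

    past_lines must be ordered most recent first. First-time starters
    have no past lines, so their history fields come back as None.
    """
    beyers = [pl.beyer for pl in past_lines]
    return {
        "field_size": field_size,
        "morning_line": morning_line,
        "last_beyer": beyers[0] if beyers else None,
        "best_beyer": max(beyers) if beyers else None,
        "avg_beyer_last5": mean(beyers[:5]) if beyers else None,
        "odds_last_out": past_lines[0].odds if past_lines else None,
    }


if __name__ == "__main__":
    history = [PastLine(80, 4.0), PastLine(90, 2.5), PastLine(70, 10.0)]
    print(feature_row(5.0, history, field_size=8))
```

Rows like this could then be grouped by `field_size` before training, which is one way to handle the varying number of entrants mentioned above. Scratches would still need their own handling.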

**TravisVOX** · 08-03-14, 10:42 PM

Great stuff, yak. Currently I'm trying a logistic regression with the dependent variable whether or not the horse won... but the public is still only right 30-40% of the time... I feel like I need something that captures how the public tends to overbet certain angles/scenarios/horses/trainers etc.

In your experience, how close did you get in terms of accuracy?
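One possible way to aim the model at the public rather than at the winner is to use each horse's share of the win pool as the target instead of a won/lost flag. This is only a sketch under that assumption, and `public_win_shares` is a made-up name. It relies on the standard conversion from odds-to-1 to implied probability, 1 / (odds + 1), then rescales so the field sums to 1, since takeout and breakage push the raw total above 1.

```python
def public_win_shares(final_odds):
    """Estimate each horse's share of the win pool from final odds.

    final_odds : dict of horse name -> final odds-to-1 (e.g. 2.0 for 2-1)
    Returns a dict of horse name -> share, with shares summing to 1.
    """
    # Implied probability from odds-to-1, before removing the takeout.
    raw = {h: 1.0 / (o + 1.0) for h, o in final_odds.items()}
    total = sum(raw.values())  # greater than 1 when takeout is present
    return {h: r / total for h, r in raw.items()}


if __name__ == "__main__":
    odds = {"A": 1.0, "B": 2.0, "C": 2.0}
    for horse, share in public_win_shares(odds).items():
        print(f"{horse}: {share:.3f}")
```

The shares are approximate because breakage rounds the posted odds, but they give a per-horse number that reflects what the crowd did rather than what the result was.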

**yak merchant** · 08-04-14, 01:03 AM

Well, I never did extensive analysis on accuracy, as A.) pools are very small, and B.) due to the small pools (especially in the Win pool), the "exact" odds were not what I was after. I was just trying to get rankings of the 8 entrants and ballpark interest in backing them, as I was just using it to try to project trifecta payouts. It was close enough for government work. I'm not an expert in regression, but my guess is that you will end up with too much smoothing for what you are trying to do. While NNs and GAs can suffer from overfitting, I think they may fit the bill in your case. On one hand, predicting the odds doesn't really help you unless you can predict the results more accurately than the public anyway, but filtering out the difference between "public money" and "smart money" on underlays seems like the most promising application.