A way of evaluating a predictive model's reasonableness

Waterstpub87 · 12-16-18, 09:01 PM

I'd watch out for the KenPom predictions, especially when it comes to totals. Last year I got absolutely creamed at the end of the season and in the tournament using a KenPom-based system. Not that the same thing would happen to you, but it is something to keep in mind.

I don't really bother to backtest much anymore. After so many times where stuff worked in the backtest and failed live, I don't waste a lot of time on it. To be honest, my models are all several years in, so much of my recent work has been on better operational stuff. Usually, when I am changing something, I will run through the first few months of a season. I focus more on how close I am to the line. If I am within a point or so of the NBA closing line on 80+% of games, I know I have a decent model. For new stuff, I will generate close to a season's worth of data; sometimes it works out really well, and sometimes it crashes and burns.

Most of the time though, I am not doing anything too crazy, so if I am close to the line, I am pretty sure my model is good.
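A minimal sketch of the "within a point or so of the closing line on 80%+ of games" check, assuming the model's spread and the closing spread are available as parallel lists. The function name, the 1-point tolerance default, and the example numbers are hypothetical.

```python
def share_within_line(model_lines, closing_lines, tolerance=1.0):
    """Fraction of games where the model's number is within `tolerance` points of the closing line."""
    if not model_lines or len(model_lines) != len(closing_lines):
        raise ValueError("need two non-empty lists of equal length")
    close = sum(1 for m, c in zip(model_lines, closing_lines) if abs(m - c) <= tolerance)
    return close / len(model_lines)


# Hypothetical spreads (home team's perspective) for five games.
model = [-5.5, 3.0, 7.5, -1.0, 10.0]
closing = [-6.0, 2.5, 9.5, -1.5, 10.0]

share = share_within_line(model, closing)
print(f"{share:.0%} of games within 1 point of the close")  # 80% of games within 1 point of the close
print("decent model" if share >= 0.80 else "keep working")
```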

HeeeHAWWWW · 12-17-18, 06:44 AM

Brier scores work well for this: calculate it for your model's predictions, and for the implied prob of the market.

It's not perfect because of differing subsets, but it's quick.
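A minimal Python sketch of that comparison, assuming you already have the model's win probability, the market's odds, and the 0/1 result for each game. To sidestep the differing-subsets problem, score both on exactly the same games. The helper names and numbers are hypothetical, and the bookmaker's margin is removed here by simple normalization, which is only one of several ways to de-vig.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need two non-empty lists of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def implied_prob(american_odds):
    """Implied probability of American odds (still includes the bookmaker's margin)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig(prob_a, prob_b):
    """Remove the margin from a two-way market by normalizing the implied probabilities."""
    total = prob_a + prob_b
    return prob_a / total, prob_b / total


# Hypothetical games: model's win probability, market's no-vig probability, result (1 = win).
model_probs = [0.62, 0.45, 0.70, 0.30]
market_probs = [0.60, 0.50, 0.65, 0.35]
outcomes = [1, 0, 1, 0]

print("model :", round(brier_score(model_probs, outcomes), 4))
print("market:", round(brier_score(market_probs, outcomes), 4))
print("-110/-110 no-vig:", no_vig(implied_prob(-110), implied_prob(-110)))  # (0.5, 0.5)
```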

Bsims · 12-17-18, 07:34 AM

Correlation Summary

Here are the results of the correlations between the predicted scores and the actual scores for CBB games in the 2017-18 season.

Predictor Scores	# Predictions	Avg Regulation Score Team 1	Avg Pred Team 1	Corr 1	Avg Regulation Score Team 2	Avg Pred Team 2	Corr 2
LV implied scores	3,971	69.6	70.0	0.524	74.6	75.0	0.572
Like games average scores	3,505	70.3	71.0	0.498	73.5	74.1	0.508
KenPom predicted scores	3,884	69.6	70.0	0.502	74.6	74.8	0.543
Power ratings predicted scores	2,725	69.9	69.9	0.447	74.1	74.5	0.441

Total Games on Scores File	3,975
Games at Neutral Sites	590
Percent of Neutral Site Games	14.8%

Note that I only used the score at the end of regulation play, ignoring overtime. Normally team 1 is the visitor, and team 2 is the home team. For games at neutral sites, both are considered visitors. This may raise some questions. I’ll try to deal with some obvious ones in subsequent posts. I have put a spreadsheet with the source data and summary in the cloud. Hopefully you can access it via the following URL, bit.ly/2A1f8pE
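The post doesn't say which correlation was used; assuming it is the ordinary Pearson coefficient, a sketch of the calculation for one column of the table could look like this. The `neutral` flag mirrors the neutral-site split discussed in the thread, and the rows are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def score_correlation(games, exclude_neutral=False):
    """Correlate predicted vs. actual regulation scores, optionally dropping neutral-site games."""
    rows = [g for g in games if not (exclude_neutral and g["neutral"])]
    return pearson_r([g["pred"] for g in rows], [g["actual"] for g in rows])


# Hypothetical team-2 rows: predicted score, actual regulation score, neutral-site flag.
games = [
    {"pred": 75.0, "actual": 80, "neutral": False},
    {"pred": 68.5, "actual": 62, "neutral": False},
    {"pred": 72.0, "actual": 75, "neutral": True},
    {"pred": 80.5, "actual": 77, "neutral": False},
    {"pred": 64.0, "actual": 70, "neutral": False},
]
print("all games   :", round(score_correlation(games), 3))
print("non-neutral :", round(score_correlation(games, exclude_neutral=True), 3))
```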

Bsims · 12-17-18, 07:36 AM

Originally posted by HeeeHAWWWW

Brier scores work well for this: calculate it for your model's predictions, and for the implied prob of the market.

It's not perfect because of differing subsets, but it's quick.

Interesting, I'll have to learn more about this. I can think of some other applications.

HeeeHAWWWW · 12-17-18, 08:44 AM

Originally posted by Bsims

Interesting, I'll have to learn more about this. I can think of some other applications.

Other possibilities are the other proper scoring rules: log loss and spherical loss. Log loss is usually the most practical of the three for most purposes, but given that most bets are in the middle of the probability range, Brier is likely best for most people.
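For binary outcomes the three rules fit in a few lines. A sketch with hypothetical probabilities; note that the spherical rule is written here as a score (higher is better), so a "spherical loss" would be its negative or one minus it, depending on convention.

```python
import math


def brier(p, outcome):
    """Squared error of the forecast probability; lower is better."""
    return (p - outcome) ** 2


def log_loss(p, outcome, eps=1e-15):
    """Negative log-likelihood of the outcome; clipped so p = 0 or 1 doesn't blow up."""
    p = min(max(p, eps), 1 - eps)
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))


def spherical_score(p, outcome):
    """Probability given to what happened, divided by the norm of the forecast; higher is better."""
    p_realized = p if outcome == 1 else 1 - p
    return p_realized / math.sqrt(p ** 2 + (1 - p) ** 2)


def average(rule, probs, outcomes):
    return sum(rule(p, o) for p, o in zip(probs, outcomes)) / len(probs)


probs = [0.60, 0.55, 0.48, 0.70]  # hypothetical win probabilities
outcomes = [1, 0, 0, 1]           # 1 = the bet won
for rule in (brier, log_loss, spherical_score):
    print(f"{rule.__name__:16s} {average(rule, probs, outcomes):.4f}")
```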

nash13 · 12-17-18, 10:32 AM

https://mathematicalfootballpredictions.com/montecarlo/

I guess this is enough to evaluate your betting process.

yak merchant · 12-17-18, 02:40 PM

Originally posted by HeeeHAWWWW

Brier scores work well for this: calculate it for your model's predictions, and for the implied prob of the market.

It's not perfect because of differing subsets, but it's quick.

So how do you deal with interval/ratio data types with Brier scores? Do you convert everything to moneyline probabilities, or are you binning results? Every example I've ever seen analyzes predicted vs. actual probabilities for nominal or ordinal types.

HeeeHAWWWW · 12-17-18, 05:06 PM

Originally posted by yak merchant

So how do you deal with interval/ratio data types with Brier scores? Do you convert everything to Moneyline probabilities or are you binning results?

No need for binning, it inherently calibrates across the whole range.

All you need is the (binary) outcome and the prediction %. This is a better metric than traditional ones that compare binary outcomes with binary predictions (e.g. accuracy, kappa, AUC), because those throw away a lot of information about the prediction.

yak merchant · 12-17-18, 05:28 PM

Originally posted by HeeeHAWWWW

No need for binning, it inherently calibrates across the whole range.

All you need is the (binary) outcome and the prediction %. This is a better metric than traditional ones that compare binary outcomes with binary predictions (e.g. accuracy, kappa, AUC), because those throw away a lot of information about the prediction.

Well, I guess that is my question: the model in question compares predicted scores to actual scores, not a binary outcome.

peacebyinches · 12-17-18, 06:09 PM

I look forward to seeing how this works out, Bsims.

HeeeHAWWWW · 12-17-18, 06:12 PM

Originally posted by yak merchant

Well I guess that is my question the model in question is comparing predicted scores to actually scores not a binary outcome.

Ahh, gotcha. I suppose you could use traditional regression metrics (mean squared error, etc.): take your predicted line and the market's midpoint. That's problematic in lower-scoring sports, though, or those with irregular scoring distributions.

Binary over/under or a particular handicap also has the nice advantage of focusing your prediction efforts on improving accuracy in the area that matters - ie exactly the thing you're trying to predict and bet on.
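A sketch of the regression-metric route, assuming the model's predicted totals, the market's totals, and the actual totals are available for the same games. Names and numbers are hypothetical. For the binary over/under route, the model's over probabilities and the 0/1 results would instead go into a Brier score.

```python
def mean_squared_error(preds, actuals):
    """Average squared miss in points (lower is better)."""
    if not preds or len(preds) != len(actuals):
        raise ValueError("need two non-empty lists of equal length")
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)


# Hypothetical CBB totals for the same four games.
model_totals = [141.5, 150.0, 133.0, 145.5]
market_totals = [140.0, 148.5, 136.0, 146.0]  # market midpoint / closing total
actual_totals = [138, 161, 129, 150]

print("model  MSE:", mean_squared_error(model_totals, actual_totals))   # 42.375
print("market MSE:", mean_squared_error(market_totals, actual_totals))  # 56.3125
```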

danshan11 · 12-17-18, 06:35 PM

I think closing lines are more predictive than the actual scores of past games. The big issue I see with the idea is the injuries, rest, and suspensions of players that actually change the line. A perfect example: the Rockets without Harden are a different team. Also consider that CBB teams are very different from one season to the next, especially with the loss of one-and-done superstars.

Waterstpub87 · 12-17-18, 08:25 PM

Originally posted by danshan11

I think closing lines are more predictive than the actual scores of past games. The big issue I see with the idea is the injuries, rest, and suspensions of players that actually change the line. A perfect example: the Rockets without Harden are a different team. Also consider that CBB teams are very different from one season to the next, especially with the loss of one-and-done superstars.

If you are actually testing realistically, you should account for injuries. When I was testing NBA models, I set up a scraper that would scrape the game lineups for a particular day. All I had to do was hit 2 buttons: one to pull the lineups and one to process the results.

If you are testing CBB it is a little different. But you should account for returning starters when projecting next year. I calculated returning minutes, and went from there.
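Returning minutes reduce to a simple share. A sketch assuming per-player minutes from last season and a set of departing players, both hypothetical; how the share is then used to adjust a rating isn't described in the post.

```python
def returning_minutes_share(minutes_by_player, departed):
    """Share of last season's minutes played by players who are coming back."""
    total = sum(minutes_by_player.values())
    if total <= 0:
        raise ValueError("no minutes recorded")
    returning = sum(m for player, m in minutes_by_player.items() if player not in departed)
    return returning / total


# Hypothetical roster: last season's total minutes per player.
last_season = {"PG": 1000, "SG": 900, "SF": 700, "PF": 400, "C": 200}
leaving = {"PG", "SF"}  # e.g. a graduating senior and a one-and-done
print(f"returning minutes: {returning_minutes_share(last_season, leaving):.1%}")  # 46.9%
```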

danshan11 · 12-17-18, 08:32 PM

I don't think his model is doing that, and in order to do it successfully you need an algorithm for player worth. I use a team weight system: I give each player a value and compare that to the total team value!
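A rough sketch of the "player value vs. total team value" idea, assuming each player already has a value on some scale. The values and names are made up, and how danshan11 converts the missing share into a line adjustment isn't stated.

```python
def available_value_share(player_values, unavailable):
    """Fraction of total team value still available when some players are out."""
    total = sum(player_values.values())
    if total <= 0:
        raise ValueError("team value must be positive")
    missing = sum(v for player, v in player_values.items() if player in unavailable)
    return 1 - missing / total


# Hypothetical player values (made-up numbers on an arbitrary scale).
team = {"star": 30.0, "guard": 15.0, "center": 12.0, "wing": 8.0, "forward": 5.0}
print(f"{available_value_share(team, {'star'}):.1%} of team value without the star")
```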

Bsims · 12-17-18, 08:47 PM

Originally posted by danshan11

I think closing lines are more predictive than the actual scores of past games. The big issue I see with the idea is the injuries, rest, and suspensions of players that actually change the line. A perfect example: the Rockets without Harden are a different team. Also consider that CBB teams are very different from one season to the next, especially with the loss of one-and-done superstars.

Agree. The problem with any handicapping or predictive model is that unknown information like injuries will make some wagers look too good. Somehow one must account for these and be leery of these wagers. I tend to compute a return per dollar and bet on those with returns above $1.00. If the return is something like $1.25, be very careful.
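A sketch of the return-per-dollar screen, assuming American odds and a model win probability. Return per dollar here is the expected payout including the stake (win probability times decimal odds), ignoring pushes; the $1.00 and $1.25 cutoffs come from the post, while the function names are hypothetical.

```python
def decimal_odds(american):
    """Convert American odds to decimal odds (total payout per $1 staked, stake included)."""
    return 1 + (american / 100 if american > 0 else 100 / -american)


def return_per_dollar(win_prob, american):
    """Expected payout per $1 staked, ignoring pushes; above 1.00 means positive expectation."""
    return win_prob * decimal_odds(american)


def screen_bet(win_prob, american, suspicious_above=1.25):
    r = return_per_dollar(win_prob, american)
    if r <= 1.0:
        return "pass"
    if r >= suspicious_above:
        return "check news/injuries first"
    return "bet"


for p, odds in [(0.55, -110), (0.52, -110), (0.55, 150)]:
    print(p, odds, round(return_per_dollar(p, odds), 3), screen_bet(p, odds))
```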

Your second point is also good. CBB is a good example of where a team might change significantly from year to year. Of the 4 models, the LV one and like games (since it comes from LV) are probably the best early on. KenPom probably considers player changes; I'm skeptical about how well this can be done. The power rating system won't generate ratings for a team until it has scores for at least 3 games at the appropriate site. That's why it has about a thousand fewer games than the others.

I'm planning on a follow up study that will look at correlations by month. I would expect the power rating system to improve the most. In a previous study the ratings got better with more data.

Bsims · 12-17-18, 08:58 PM

One issue I always face is how to account for home court advantage. Three of the four models take this into account; only the power rating system faces this problem. One approach is to adjust the predicted scores by some home court advantage. I don't like this approach.

Since basketball teams play lots of games, I look at each team as two different teams, one on the road and one at home. Thus I have two ratings for Duke, one for vDuke and the other for hDuke.
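A sketch of the "two teams per school" bookkeeping, using raw average margins only. This is not Bsims's rating method (which isn't described); it only shows how games split into hDuke/vDuke-style keys, with neutral-site games counted as road games for both sides, as in the earlier post. The games are hypothetical.

```python
from collections import defaultdict


def venue_split_margins(games):
    """Average point margin per venue-specific 'team': hDuke for home games, vDuke for road games.

    Neutral-site games count as road games for both sides.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for home, away, home_pts, away_pts, neutral in games:
        margin = home_pts - away_pts
        home_key = ("v" if neutral else "h") + home
        for key, m in ((home_key, margin), ("v" + away, -margin)):
            totals[key] += m
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


games = [  # hypothetical: (home, away, home points, away points, neutral site?)
    ("Duke", "Army", 90, 60, False),
    ("Navy", "Duke", 65, 70, False),
    ("Duke", "Yale", 80, 75, True),
]
print(venue_split_margins(games))
```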

tsty · 12-17-18, 11:08 PM

You can do regression with past odds instead of results? Lol

Waterstpub87 · 12-18-18, 12:34 AM

Originally posted by Bsims

One issue I always face is how to account for home court advantage. Three of the four models take this into account; only the power rating system faces this problem. One approach is to adjust the predicted scores by some home court advantage. I don't like this approach.

Since basketball teams play lots of games, I look at each team as two different teams, one on the road and one at home. Thus I have two ratings for Duke, one for vDuke and the other for hDuke.

You have to consider it per possession, not as a flat number. Consider that much of the home vs. away difference comes from things like penalties and fouls. If a team produces 0.25 fewer fouls per possession, 60 vs. 100 possessions makes a large difference.
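A sketch of expressing home edge per possession and converting it to points for a given pace. The 4.0 points per 100 possessions figure is hypothetical, and averaging the two teams' tempos is a crude pace estimate chosen for illustration.

```python
def hca_points(hca_per_100, home_tempo, away_tempo):
    """Scale a per-100-possession home edge to points for one game.

    Expected possessions are crudely taken as the average of the two teams' tempos.
    """
    expected_possessions = (home_tempo + away_tempo) / 2
    return hca_per_100 * expected_possessions / 100


HCA_PER_100 = 4.0  # hypothetical home edge, points per 100 possessions
print(hca_points(HCA_PER_100, 60, 60))    # slow game: about 2.4 points
print(hca_points(HCA_PER_100, 100, 100))  # fast game: 4.0 points

# The fouls example from the post: 0.25 fewer fouls per possession.
for possessions in (60, 100):
    print(possessions, "possessions ->", 0.25 * possessions, "fewer fouls")
```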

I've always been the opposite on home vs. away. By the time you get to 10 home and 10 away games, most of the season is gone. Right now, you are probably somewhere around 4 home, 2 neutral, and 2 away, or something similar. Any results you get, especially extreme ones, are much more likely to be random than an actual signal.

If instead you use a constant, you can use thousands of games to generate the home vs. away advantage, meaning the number is much more likely to be valid. In CBB this may not be exact, because many teams play weaker teams at home, like Duke playing Abilene Christian in the first game of the season. Also, some teams (Denver comes to mind) benefit extra because the conditions there are more extreme. But in general, this is a much cleaner and more accurate approach.
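One way to estimate a single league-wide constant from many games while sidestepping the "weaker teams at home" bias is to use home-and-home pairs: if the home margin is HCA + (A - B) in one game and HCA + (B - A) in the return game, averaging the two cancels team strength. This is a swapped-in technique for illustration, not necessarily what Waterstpub87 does; the games are made up.

```python
def paired_hca(games):
    """Estimate a league-wide home advantage from home-and-home pairs.

    games: list of (home, away, home_pts, away_pts) for non-neutral games.
    If margin = HCA + (home strength - away strength), then averaging the
    margins of A-at-B and B-at-A cancels the strength terms and leaves HCA.
    """
    margins = {}
    for home, away, home_pts, away_pts in games:
        margins.setdefault((home, away), []).append(home_pts - away_pts)
    estimates = []
    for (home, away), first_leg in margins.items():
        return_leg = margins.get((away, home))
        if return_leg and home < away:  # count each pairing once
            m1 = sum(first_leg) / len(first_leg)
            m2 = sum(return_leg) / len(return_leg)
            estimates.append((m1 + m2) / 2)
    if not estimates:
        raise ValueError("no home-and-home pairs found")
    return sum(estimates) / len(estimates)


games = [  # hypothetical conference games
    ("Duke", "UNC", 80, 72), ("UNC", "Duke", 78, 76),
    ("Iowa State", "Kansas", 70, 65), ("Kansas", "Iowa State", 81, 77),
]
print(paired_hca(games))  # 4.75
```

With thousands of games, most conference opponents meet home and away, so the pairs provide a large sample; non-conference games without a return leg are simply skipped.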

Bsims · 12-18-18, 05:46 AM

If I were to use home court advantage, I'd probably use KenPom's instead of a constant value. Currently his biggest HCAs are for Colorado (4.5) and Iowa State (4.4). His lowest are Grambling St. and Navy (1.6). His median is 3.2.
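A sketch of plugging team-specific HCA into a neutral-court margin. The only values hard-coded here are the ones quoted in the post; the median fallback for unlisted teams and the function names are assumptions, and there is no KenPom API call involved, just a lookup table you would have to fill in yourself.

```python
# Values quoted in the post (KenPom, December 2018); any other team falls back to the median.
KENPOM_HCA = {"Colorado": 4.5, "Iowa State": 4.4, "Grambling St.": 1.6, "Navy": 1.6}
MEDIAN_HCA = 3.2


def home_court_advantage(team, table=KENPOM_HCA, default=MEDIAN_HCA):
    return table.get(team, default)


def predicted_home_margin(neutral_margin, home_team, neutral_site=False):
    """Add the home team's HCA to a neutral-court margin unless the game is at a neutral site."""
    if neutral_site:
        return neutral_margin
    return neutral_margin + home_court_advantage(home_team)


print(predicted_home_margin(2.0, "Colorado"))                 # 6.5
print(predicted_home_margin(2.0, "Navy"))                     # 3.6
print(predicted_home_margin(2.0, "Duke", neutral_site=True))  # 2.0
```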

HeeeHAWWWW · 12-18-18, 06:04 AM

Originally posted by Bsims

I tend to compute a return per dollar and bet on those with returns above $1.00. If the return is something like $1.25, be very careful.

Strongly agree with this (at least in any liquid market). You can prove it with sufficient betting history too: your edge estimates have errors, and as the edge increases, typically those will become asymmetrical - ie the real edge will be well below your estimate.

There's a good logical explanation: very large edges represent where the market knows something your model doesn't.

For anyone using Kelly this all becomes rather important :-)
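A sketch of why this matters for Kelly staking: the full-Kelly stake f* = (bp - q)/b grows with the estimated edge, so overrating an even-money bet from 52% to 60% multiplies the stake by five. Fractional Kelly (the `multiplier` argument) is a common hedge against that, not something prescribed in the thread; the probabilities are hypothetical.

```python
def kelly_fraction(win_prob, decimal_odds, multiplier=1.0):
    """Kelly stake as a fraction of bankroll: f* = (b*p - q) / b, with b = decimal_odds - 1.

    `multiplier` < 1 gives fractional Kelly; negative-edge bets return 0.
    """
    b = decimal_odds - 1
    f = (b * win_prob - (1 - win_prob)) / b
    return max(0.0, f) * multiplier


# The same even-money bet under an optimistic and a more realistic edge estimate.
print(kelly_fraction(0.60, 2.0))       # about 0.20 of bankroll
print(kelly_fraction(0.52, 2.0))       # about 0.04 of bankroll
print(kelly_fraction(0.60, 2.0, 0.5))  # half Kelly on the optimistic estimate: about 0.10
```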

tsty · 12-18-18, 07:22 AM

Originally posted by HeeeHAWWWW

Strongly agree with this (at least in any liquid market). You can prove it with sufficient betting history too: your edge estimates have errors, and as the edge increases, typically those will become asymmetrical - ie the real edge will be well below your estimate.

There's a good logical explanation: very large edges represent where the market knows something your model doesn't.

For anyone using Kelly this all becomes rather important :-)

Selectively following your model is wrong imo

Either 100% or nothing

Bsims · 12-18-18, 08:36 AM

I've eliminated the neutral site games. All the correlations went up a bit. Each model does a better job of predicting the home score than the visitors' score, except the power rating system. Maybe I need to rethink my home court advantage.

Predictor Scores (eliminating neutral site games)	# Predictions	Avg Regulation Score Team 1	Avg Pred Team 1	Corr 1	Avg Regulation Score Team 2	Avg Pred Team 2	Corr 2
LV implied scores	3,381	69.6	70.0	0.530	74.9	75.1	0.585
Like games average scores	2,956	70.4	71.1	0.502	73.7	74.1	0.516
KenPom predicted scores	3,311	69.7	69.9	0.510	74.9	74.9	0.554
Power ratings predicted scores	2,411	70.2	70.0	0.455	74.3	74.6	0.445

danshan11 · 12-18-18, 08:41 AM

Originally posted by tsty

You can do regression with past odds instead of results? Lol

What is more accurate as a predictor of future scores: a team's closing total of 31 points, or its actual score of 67 because the opponent's starting center had the worst night of his career?

If the books had Yale with team totals of
31, 33, 35, 41, 39
and the actual scores were
39, 20, 33, 29, 65
which do you think is more indicative of their next game score:
37.2, the actual score average, or
35.8, the average line?
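The arithmetic in the Yale example, checked with Python's `statistics` module. The medians are added only to show how much the single 65-point game moves the results average.

```python
from statistics import mean, median

closing_totals = [31, 33, 35, 41, 39]  # Yale team totals from the example
actual_scores = [39, 20, 33, 29, 65]   # Yale's actual scores

line_avg = mean(closing_totals)   # 35.8
result_avg = mean(actual_scores)  # 37.2
print(line_avg, result_avg)
# The single 65-point game drags the results average up; the medians are 35 vs 33.
print(median(closing_totals), median(actual_scores))
```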

danshan11 · 12-18-18, 09:00 AM

Really, I don't see the idea or edge in doing this; you are not doing anything more advanced than a basic model. I don't see how this system could give you any edge. Do you think it is possible to use this to predict more accurately than the closing line can?

vampire assassin · 12-18-18, 09:42 AM

If you look at the set of wagers where your projected ROR (rate of return) is >10%, these will typically do worse than your 3-6% range. As you said, there is an injury or other big change, and your +EV bet has turned into a coin flip.

If you have a large data set, you can flag matches >10% (or <-10%), or find the sweet spot where you discard matches due to informational disadvantage. If you do this when betting, you'll save a fortune. I lost a 6-fig fortune on the sum of these small positives.
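A sketch of the "find the sweet spot" analysis, assuming a list of settled bets with the projected rate of return, profit, and stake recorded for each. The bucket edges and the 10% review threshold follow the numbers in the post; the function names and the bet history are hypothetical.

```python
def realized_roi_by_bucket(history, cuts=(0.0, 0.03, 0.06, 0.10, float("inf"))):
    """Realized ROI grouped by projected edge.

    history: list of (projected_ror, profit, stake) for settled bets.
    Returns {(low, high): total profit / total stake} for buckets that have bets.
    """
    stats = {}
    for low, high in zip(cuts, cuts[1:]):
        rows = [(profit, stake) for ror, profit, stake in history if low <= ror < high]
        staked = sum(stake for _, stake in rows)
        if staked:
            stats[(low, high)] = sum(profit for profit, _ in rows) / staked
    return stats


def flag_for_review(projected_ror, limit=0.10):
    """True when the projected edge is so large (either sign) that the market likely knows something."""
    return abs(projected_ror) > limit


history = [  # hypothetical settled $1 bets at roughly -110
    (0.02, -1.00, 1.0), (0.04, 0.91, 1.0), (0.05, -1.00, 1.0),
    (0.05, 0.91, 1.0), (0.12, -1.00, 1.0), (0.15, -1.00, 1.0),
]
for (low, high), roi in realized_roi_by_bucket(history).items():
    print(f"projected {low:.0%}-{high:.0%}: realized ROI {roi:+.1%}")
print(flag_for_review(0.12), flag_for_review(0.05))  # True False
```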

u21c3f6 · 12-18-18, 10:56 AM

Originally posted by HeeeHAWWWW

...
There's a good logical explanation: very large edges represent where the market knows something your model doesn't. ...

Ding, ding, ding!!! We have a winner! (From my point of view)

The above is in large part the focus of what I look for when making selections. You see this phenomenon mentioned in various forms in many threads (think "lock" threads for one form) but not many actually try to use this to their advantage IMO.

Joe.

ChuckyTheGoat · 12-18-18, 12:29 PM

Good work, Bsims. Best of luck.

tsty · 12-19-18, 12:24 AM

Originally posted by danshan11

What is more accurate as a predictor of future scores: a team's closing total of 31 points, or its actual score of 67 because the opponent's starting center had the worst night of his career?

If the books had Yale with team totals of
31, 33, 35, 41, 39
and the actual scores were
39, 20, 33, 29, 65
which do you think is more indicative of their next game score:
37.2, the actual score average, or
35.8, the average line?

How do you write a model without using past results? It's literally the only way lol

Using past odds is retarded since it was less accurate in the past

danshan11 · 12-19-18, 10:45 AM

Originally posted by tsty

How do you write a model without using past results? It's literally the only way lol

Using past odds is retarded since it was less accurate in the past

Because past results are not indicative of future performance; past lines are better.
If a team wins 10 straight games by 40 points, is that more indicative of their power ranking as -40 favorites, or is the average line of -8 a more accurate guide to future performance? Also, past lines are a compilation of past game results.

I think the Yankees' average of 12 runs over their last 10 games is less indicative of their offensive power than their average team total line of 7.5 over those 10.
I would use the 7.5, not the 12; the 7.5 is a better indicator of future performance than the 12.

For example, take Kluber: in his last game, 9 runs were scored.
Do you think that 9 is a better number for future games than the total of 6.5? Which is more indicative of future performance, the line or the result?

When I say past results, I mean the last 10 games up to a season, not the last 25 years.

tsty · 12-19-18, 01:52 PM

Lol u just completely ignored my question but w/e

I'll ask a different one then

How did the bookies make those odds? Where were they derived from?

danshan11 · 12-19-18, 02:29 PM

Lines are made with power rankings, weather, and injuries, and I believe books adjust for teams and situations that they have tons of data on. For example, the Patriots at home probably get a little extra push from the books: even though the rankings say X, they are probably X plus a dash of salt.

danshan11 · 12-19-18, 02:29 PM

Originally posted by tsty

Lol u just completely ignored my question but w/e

I'll ask a different one then

How did the bookies make those odds? Where were they derived from?

You did not answer any of my questions.

danshan11 · 12-19-18, 02:30 PM

Originally posted by danshan11

Because past results are not indicative of future performance; past lines are better.
If a team wins 10 straight games by 40 points, is that more indicative of their power ranking as -40 favorites, or is the average line of -8 a more accurate guide to future performance? Also, past lines are a compilation of past game results.

I think the Yankees' average of 12 runs over their last 10 games is less indicative of their offensive power than their average team total line of 7.5 over those 10.
I would use the 7.5, not the 12; the 7.5 is a better indicator of future performance than the 12.

For example, take Kluber: in his last game, 9 runs were scored.
Do you think that 9 is a better number for future games than the total of 6.5? Which is more indicative of future performance, the line or the result?

When I say past results, I mean the last 10 games up to a season, not the last 25 years.

I bolded the question to help you see it

danshan11 · 12-19-18, 02:31 PM

I also just read that some books are now focusing more on line history than on power rankings, to try to keep the line more stable from start to finish.