Backtesting Questions.

  • ProphetofProfit
    SBR Rookie
    • 03-24-11
    • 26

    #1
    Backtesting Questions.
    When backtesting a model using historical odds, is it fair to cull all bets where the edge is over a certain amount? As a starting point, I currently place a wager whenever my probability is more than 0.05 above the probability implied by the odds (1/odds).

    However, I don't have injury information included in my probabilities, or other information like clubs resting their whole first team for whatever reason, or clubs selling off many players in a short period of time. The result is that in some circumstances my probabilities come out more than 0.1 above the implied probabilities.

    As it turns out, if I bet whenever the probability difference is >0.05, my ROI is 3.4% over 3700 bets.

    Whenever the probability difference is >0.11, my ROI is -3% over 550 bets.

    Am I deceiving myself if I remove all bets where the difference is >0.11? I know it doesn't change much, but it improves my confidence in the model if I can do this knowing that I wouldn't have bet on these games anyway. Well, maybe I'd have bet on some of them, but I'm fairly sure the discrepancy comes down to the unusual situations listed above.
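
    For clarity, the rule looks roughly like this in code (a minimal sketch in Python/pandas; the file and column names are placeholders, not my actual data):

    ```python
    import pandas as pd

    # One row per match; 'model_prob', 'odds' (decimal) and 'won' (0/1)
    # are placeholder column names for whatever the backtest data holds.
    df = pd.read_csv("backtest.csv")
    df["implied_prob"] = 1.0 / df["odds"]    # probability implied by the odds
    df["edge"] = df["model_prob"] - df["implied_prob"]

    # Flat 1-unit stakes: a win returns odds - 1, a loss costs 1 unit.
    df["pnl"] = df["won"] * (df["odds"] - 1.0) - (1 - df["won"])

    def roi(bets):
        return bets["pnl"].sum() / len(bets)

    print(roi(df[df["edge"] > 0.05]))   # the 0.05 rule (3.4% over 3700 bets for me)
    print(roi(df[df["edge"] > 0.11]))   # the suspect region (-3% over 550 bets)
    ```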
  • Justin7
    SBR Hall of Famer
    • 07-31-06
    • 8577

    #2
    Yes. You'll frequently see this in many sports if you are not tracking individual players.
    • pedro803
      SBR Sharp
      • 01-02-10
      • 309

      #3
      Originally posted by ProphetofProfit
      I don't have injury information included in my probabilities, or other information like clubs resting their whole first team for whatever reason, or clubs selling off many players in a short period of time. The result is that in some circumstances my probabilities come out more than 0.1 above the implied probabilities.

      As it turns out, if I bet whenever the probability difference is >0.05, my ROI is 3.4% over 3700 bets.

      Whenever the probability difference is >0.11, my ROI is -3% over 550 bets.

      I don't understand why the situations you have described result in your model yielding a greater edge. What is the link?

      It seems that the model should indicate the edge that the play would have had if all the regular players were present -- so why would this edge be systematically higher than the edge predicted when all the regular players are actually present?

      I think you definitely need to control for the types of circumstances you describe, but I don't see why those circumstances would systematically lead to a higher calculated edge. I am probably missing something -- please explain!
      Last edited by pedro803; 06-28-11, 09:03 AM.
      • pedro803
        SBR Sharp
        • 01-02-10
        • 309

        #4
        OK, I figured it out -- sorry for the detour. It works this way because you are comparing your model's prediction to the line, and the line takes the missing players into account while your model does not, which leads to the big discrepancy between the line and the prediction you have produced. Duh -- sorry for the brain fart.

        now I get it!! (if that's not it then please explain)

        and if that is it, then yes, I agree it would be fair to remove those when backtesting since, as you have said, you would probably have had enough foresight not to bet them.

        If you had the time, an even better solution would be to look at those cases individually and decide which ones fit the circumstances you describe, because some of them may be legitimate misses that you are throwing out -- which could wrongly inflate your confidence in the model.

        two possibilities:
        1) identify the circumstances where blatant examples of this occurred, then add a variable to your retro data to flag them -- e.g. Boston rested players for the last 5 games of the season -- put a marker in there

        2) on a less labor-intensive front, you could examine the thresholds more closely and act accordingly -- i.e. you have said that over .05 is a play, but over .11 is likely an error -- ideally there would be an empty spot somewhere in between to mark the difference.

        that is -- with your hypothesis I would expect the frequency of valid plays to decrease as the predicted edge gets closer to .11, and ideally there would be a gap in there, so you could have some confidence that all the cases you are throwing away really are the error-prone ones you have described.
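
        something like this would show it, as a rough sketch (Python/pandas; 'edge', 'odds' and 'won' are placeholder column names for whatever your backtest spits out):

        ```python
        import pandas as pd

        # ROI broken out by 0.01-wide edge bands, to see where profit stops.
        df = pd.read_csv("backtest.csv")             # placeholder file name
        df["pnl"] = df["won"] * (df["odds"] - 1.0) - (1 - df["won"])
        df["band"] = (df["edge"] // 0.01) * 0.01     # bucket into 0.01-wide bands

        # mean pnl per band is the ROI per 1-unit bet in that band
        print(df.groupby("band")["pnl"].mean())
        ```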
        Last edited by pedro803; 06-28-11, 09:23 AM.
        • ProphetofProfit
          SBR Rookie
          • 03-24-11
          • 26

          #5
          Yep, that's right, Pedro -- everything over a certain probability difference leads to losses. I checked a few of the massive >0.2 differences and they were all either at the start of the season, involving recently promoted teams with no game history, or at the end of the season, where one team had nothing to play for.

          Unfortunately your first suggestion would be too time-consuming, since there are about 13000 games to check.

          As for the second, I don't understand what you mean by 'an empty spot to mark the difference'. Currently I remove anything where the difference is <0.05 or >0.13, because outside these boundaries the ROI goes negative.

          As for the results, it's been demoralising for sure. Over the first 7000 games it was +190 units, betting 1 unit at average odds of about 1.95. Happy days, I thought. Then it imploded, and it looks to be finishing at +145 units after 13000 games. Should I be pissed off? Because that's where I am at the moment. I can console myself with the fact that injuries matter, transfers matter, motivation matters, and that I'll be able to account for this in the future -- and the win rate might not be so bad after all.

          But the downswings! I thought major downswings only happened to other people.

          Each year consists of 2500 games.

          -20 units in 70 bets.
          -80 units in 1200 bets.
          -50 units in 350 bets.

          I don't know if I could keep my sanity betting real money. I'll post a graph if anyone wants to see a trainwreck.
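
          One way I can sanity-check whether downswings this size are just variance is a quick simulation (a sketch only; it assumes flat 1-unit stakes and independent bets at my average odds):

          ```python
          import random

          # Are downswings like -80 units in 1200 bets plausible for a 3.4% ROI
          # edge at average decimal odds of 1.95? Flat 1-unit stakes assumed.
          ODDS = 1.95
          P_WIN = (1 + 0.034) / ODDS    # win prob consistent with 3.4% ROI at 1.95

          def max_drawdown(n_bets):
              bankroll = peak = worst = 0.0
              for _ in range(n_bets):
                  bankroll += (ODDS - 1.0) if random.random() < P_WIN else -1.0
                  peak = max(peak, bankroll)
                  worst = min(worst, bankroll - peak)
              return worst

          runs = sorted(max_drawdown(13000) for _ in range(200))
          print("median worst drawdown over 13000 bets:", runs[len(runs) // 2])
          ```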
          • pedro803
            SBR Sharp
            • 01-02-10
            • 309

            #6
            Sounds pretty good to me -- but I am not experienced at this. I am still at the stage of learning to scrape my own stats, build a db, and manipulate it.

            What I meant by 'an empty spot to mark the difference':

            imagine your db didn't have any of these anomalous cases you are talking about here, so you were left with only valid cases where your model is in full effect, giving you valid feedback/results/predictions about these games

            and as you have said above, .05 is a play, and ostensibly as this number increases it becomes an even stronger play (maybe a wrong assumption on my part, as I don't know anything about your model)

            that is, if you are sure enough to put money on .05 and above, then in most cases you are more sure of .07 than of .05;
            and if that is the case, then I would expect to see more .05's than .06's, more .06's than .07's, and so on.

            so in this imaginary db with no anomalous cases -- over the course of a season you might see:

            80 plays @ .05
            50 plays @ .06
            25 plays @ .07
            5 plays @ .08

            and you would get no cases above a certain point

            So back to your real db, which is interspersed with said anomalous cases -- it's hard to see exactly where the valid cases top out, because the anomalous cases are in there clouding up the picture. To put it simply: if you knew the probability difference of your highest valid case, then you would know to throw everything above that out of your backtest sample.

            so if the valid cases topped out at about .09 and the anomalies didn't kick in until .11, then the gap between .09 and .11 would mark the demarcation line.
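
            you could look for that gap with something like this (a sketch again, 'edge' being a placeholder for your model probability minus the implied probability):

            ```python
            import pandas as pd

            # Count plays per 0.01 edge band; a run of empty bands between where
            # the valid cases top out and where the anomalies start is the gap.
            df = pd.read_csv("backtest.csv")               # placeholder file name
            sub = df[df["edge"] >= 0.05].copy()
            sub["band"] = (sub["edge"] * 100).astype(int)  # 5 means [0.05, 0.06)
            print(sub.groupby("band").size())
            ```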

            But if you stop to think about it, the problem is probably that the anomalous status of these cases is a continuous variable in reality -- that is, the cases are invalid to a degree, rather than just yes or no (which would be a discrete variable).

            So you have the extreme cases here, but all along your season these events are coloring the picture every single game day -- injuries and so forth. No set of games is 'pure' of these imperfections; it's just that some are very badly skewed by the types of circumstances you described above.

            So I guess the short answer goes right back to what Justin said above: any model that doesn't take individual players into account is subject to this problem. I think you could take some solace in knowing that you could always check the matchups every day and cull the plays where injuries and so forth are involved -- much more proactively than just the extreme examples we are talking about here. Just use a very cautious approach: if anything like that looks off, make it a no-play.

            Sorry to run on -- sounds to me like you are doing well. Hang in there!
            • pedro803
              SBR Sharp
              • 01-02-10
              • 309

              #7
              to clarify a little -- I am saying there probably is no demarcation gap, because the skewed cases vary across a great range of magnitudes of skew -- so even down in the .05 and .04 range there are plenty of skewed games, i.e. games where major players are out and so forth. In that case it would just be a matter of trial and error to place the upper threshold: try different numbers until the results optimize (see the sketch below).

              so actually, if you are culling those by hand, so to speak, your model could be stronger than the backtest indicates. You could also look to see whether trade deadlines and such affect how accurate your model is in certain weeks of the season, and compare year to year to see whether the dips come about in similar places.
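
              the sweep itself could be as simple as this (same placeholder columns as before):

              ```python
              import pandas as pd

              # Try candidate upper cutoffs and report bet count and ROI for each.
              df = pd.read_csv("backtest.csv")
              df["pnl"] = df["won"] * (df["odds"] - 1.0) - (1 - df["won"])

              for upper in (0.08, 0.09, 0.10, 0.11, 0.12, 0.13):
                  plays = df[(df["edge"] > 0.05) & (df["edge"] <= upper)]
                  print(f"cutoff {upper}: {len(plays)} bets, ROI {plays['pnl'].mean():.4f}")
              ```

              one caveat: a cutoff tuned on the same sample you test on will flatter the backtest, so it would be worth holding a season or two out to validate whatever number you settle on.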
              • Wrecktangle
                SBR MVP
                • 03-01-09
                • 1524

                #8
                This is fairly simple, as all early model efforts suffer from this. Either you have not built a model that considers all the relevant factors (injuries are important, but only one), or you haven't fitted them well, or you are "adding things up" in a linear fashion where some non-linear method would be better, etc.

                In sum, the model is telling you that it does not fit the facts well.
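
                One quick way to test the linear-versus-non-linear point is to fit both on the same features and compare out-of-sample. A sketch (synthetic data stands in for your game features and outcomes):

                ```python
                # Compare a linear fit to a non-linear one on the same features.
                from sklearn.datasets import make_classification
                from sklearn.ensemble import GradientBoostingClassifier
                from sklearn.linear_model import LogisticRegression
                from sklearn.model_selection import cross_val_score

                # Synthetic stand-in for real game features and win/loss outcomes.
                X, y = make_classification(n_samples=2000, n_features=10,
                                           random_state=0)

                for model in (LogisticRegression(max_iter=1000),
                              GradientBoostingClassifier()):
                    scores = cross_val_score(model, X, y, cv=5,
                                             scoring="neg_log_loss")
                    print(type(model).__name__, round(scores.mean(), 4))
                ```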
                • arwar
                  SBR High Roller
                  • 07-09-09
                  • 208

                  #9
                  hey pedro check your pms
                  • uva3021
                    SBR Wise Guy
                    • 03-01-07
                    • 537

                    #10
                    pedro, you can learn a lot by just parsing the Retrosheet database and inserting the data into SQL. SQL is pretty easy; it's merely a particular kind of English

                    after you've done that, you'll be able to extract and manipulate any data you want
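
                    a minimal sketch of the idea, using Python's built-in sqlite3 (the file name and the handful of fields kept here are illustrative -- check the Retrosheet game-log spec for the full layout):

                    ```python
                    import csv
                    import sqlite3

                    # Load a Retrosheet game log (headerless CSV, e.g. GL2010.TXT)
                    # into SQLite. Per the game-log spec: field 0 = date,
                    # 3 = visiting team, 6 = home team, 9/10 = visitor/home score.
                    conn = sqlite3.connect("retrosheet.db")
                    conn.execute("""CREATE TABLE IF NOT EXISTS games (
                                      date TEXT, visitor TEXT, home TEXT,
                                      visitor_runs INTEGER, home_runs INTEGER)""")

                    with open("GL2010.TXT", newline="") as f:
                        rows = [(r[0], r[3], r[6], int(r[9]), int(r[10]))
                                for r in csv.reader(f)]
                    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)",
                                     rows)
                    conn.commit()

                    # the 'kind of English' part -- home wins per team:
                    for team, wins in conn.execute(
                            "SELECT home, COUNT(*) FROM games "
                            "WHERE home_runs > visitor_runs GROUP BY home"):
                        print(team, wins)
                    ```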
                    • pedro803
                      SBR Sharp
                      • 01-02-10
                      • 309

                      #11
                      thanks UVA I will look into that