Adjusting for team power/SOS in NCAAF when examining box score variables

brettd · 07-03-13 09:37 AM

I'm looking to incorporate some sort of methodology for adjusting/standardizing NCAAF box score variables in order to project those variables going forward. You commonly come across a team that has run up its box score outputs against big dogs, which distorts box score matchup analysis against a team that has had a recent run against tougher rivals.

My initial idea is to incorporate a basic efficiency +/- system by regressing dog/fave performance on any box score metric against what has historically been achieved at that spread.

EG:

A regression equation shows the average +5 dog has a rushing yards differential of -50. Team X in its previous game as a +5 dog had a rushing differential of -40. Therefore, team X has a prior week 'ATS efficiency adjusted' metric of +10.
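The example above can be sketched in a few lines. This is only a rough illustration, assuming a hypothetical history of (spread, rushing differential) pairs; every number is made up, and `fit_line` and `ats_adjusted` are illustrative names rather than anything from a library.

```python
# Hypothetical history: (closing spread for the team, rushing-yard differential).
# Positive spread = underdog. All numbers are made up for illustration.
history = [(-14, 95), (-7, 60), (-3, 20), (3, -35), (5, -50), (7, -65), (14, -110)]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ats_adjusted(spread, actual, intercept, slope):
    """Actual differential minus the differential expected at that spread."""
    return actual - (intercept + slope * spread)


intercept, slope = fit_line(history)
expected_at_5 = intercept + slope * 5
print(f"expected differential at +5: {expected_at_5:.1f}")
print(f"Team X adjusted metric: {ats_adjusted(5, -40, intercept, slope):+.1f}")
```

With a fitted line that expects -50 at +5, a team that posted -40 comes out at +10, which is the 'ATS efficiency adjusted' number described above.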

What are the pros/cons of this approach? What are some better ways of looking at this problem?

brettd · 07-03-13 11:47 PM

No one have any ideas on this?

yak merchant · 07-04-13 01:26 AM

Originally Posted by brettd

No one have any ideas on this?

I'll play, though I'm sure my answer will be of no help. I guess it depends on how you feel about using the number you are trying to beat as an input into your model. I'm not going to say using line history never has value, but for me, basing adjustments on it kind of defeats the point of doing SOS adjustments, especially if you are analyzing underlying stats and not scores. The whole scenario I'm trying to exploit is historical results that seem consistent with the lines issued, where analyzing the stats and adjusting for SOS in isolation from the line can hopefully identify some value. Most importantly, comparing the stats to the lines still doesn't solve the big anomalies in the data that derail good models.

It may "smooth" the data, but regardless of the line, when a game is a blowout, weird things happen. Yes, blowouts are more likely to happen in games with big lines, but think about the following two scenarios:

Team A is playing Team B at home. Team A is favored by 21.
Team A wins 27-7. Its offense gains 4.1 YPC and 7.0 YPA, and its defense gives up 3.0 YPC and 5.0 YPA.

Team C is playing Team D at home. Team C is favored by 21.
Team C wins 42-23. Its offense gains 3.7 YPC and 6.5 YPA, and its defense gives up 3.1 YPC and 8.0 YPA.

If Team C plays Team A and you just use the final stats and the line history to build your model, your model will spit out Team A winning almost every time.

However, what if I tell you that Team A was winning 10-7 at the half in the first game, the game wasn't decided until the 4th quarter, and there was a pick six in the last minute to take the score from 20-7 to 27-7,

and

that Team C was winning 42-3 at halftime, put in the second string in the second half, and did nothing but run dives on offense and play prevent on defense?

Would you still want to wager on Team A?

Granted, these are extreme examples from a single set of games, but scenarios like this are the hardest to model around. Over the years, the most troublesome scenario for my model has always been this:

Crappy Sun Belt team G is just starting league play and has played Oregon, Nebraska, and Troy in its first three games.

Crappy Sun Belt team H played Memphis, North Texas, and Duke in its first three.

Due to blowouts against Oregon and Nebraska (consistent with the lines issued by Vegas), Team G gets all kinds of garbage-time yards and its stats are boosted.

Team H stays in its games and doesn't get the same amount of garbage time.

Run the stats through the model, and the model says Team G kills Team H due to good stats that are then adjusted up even more for strong strength of schedule. Make the bet on Team G; Team H covers easily.

There are other complications, but I would be careful about "smoothing" data by using the line in SOS adjustments, as I can't really see how doing so actually increases the potency of the adjusted stats (versus other SOS adjustment methods). Plus, some of the more complex issues with SOS adjustment aren't going to be addressed by that method anyway.
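The post doesn't say how to deal with garbage time, so the following is only one possible way to illustrate the problem, not the poster's method. It assumes play-by-play data is available, and the field names, the 21-point cutoff, and the "first half always counts" rule are all hypothetical choices.

```python
# Hypothetical play-by-play rows for one team; fields and numbers are made up.
# "margin" is the team's score minus the opponent's score before the play.
plays = [
    {"quarter": 1, "margin": 0, "kind": "rush", "yards": 2},
    {"quarter": 2, "margin": -14, "kind": "rush", "yards": 3},
    {"quarter": 3, "margin": -28, "kind": "rush", "yards": 9},
    {"quarter": 4, "margin": -35, "kind": "rush", "yards": 12},
    {"quarter": 4, "margin": -35, "kind": "pass", "yards": 25},
]


def is_competitive(play, max_margin=21):
    """Assumed rule: first-half plays always count; later plays only if close."""
    return play["quarter"] <= 2 or abs(play["margin"]) <= max_margin


def yards_per_carry(rows):
    """Average yards on rushing plays, or None if there were no rushes."""
    rushes = [p["yards"] for p in rows if p["kind"] == "rush"]
    return sum(rushes) / len(rushes) if rushes else None


raw_ypc = yards_per_carry(plays)
competitive_ypc = yards_per_carry([p for p in plays if is_competitive(p)])
print(raw_ypc, competitive_ypc)
```

In this made-up game the raw YPC is inflated by late carries against a defense that had already won, which is the Team G effect described above.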

Yes I know. I'm of no help. Good luck.

chunk · 07-04-13 02:54 AM

For a change, a good question. I do have an opinion on this, but I would like to hear from some others before I opine.

brettd · 07-04-13 04:07 AM

Awesome reply man! Thanks. It got me thinking. Keep the ideas flowing! Hopefully this starts a quality thread on actual sports modelling, something the HTT doesn't see too often these days.

Juret · 07-04-13 07:32 AM

Originally Posted by brettd

EG:

A regression equation shows the average +5 dog has a rushing yards differential of -50. Team X in its previous game as a +5 dog had a rushing differential of -40. Therefore, team X has a prior week 'ATS efficiency adjusted' metric of +10.

You should include a home team dummy, as that would explain some of why the line was where it was in addition to the past stats. Or do you adjust to neutral-field stats somehow?

brettd · 07-04-13 08:32 AM

Ah, good point. Yeah, I'll split the regression between 'home +5 dogs' and 'away +5 dogs'. I'm also thinking of splitting by conference and/or team type.
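A rough sketch of the home/away split, assuming hypothetical rows of (is_home, spread, rushing differential). The numbers are made up and are not meant to suggest which way a real home/away effect runs; `fit_by_venue` and `expected` are illustrative names.

```python
# Hypothetical rows: (is_home, spread, rushing-yard differential). Made-up data.
games = [
    (True, 3, -20), (True, 5, -38), (True, 7, -52), (True, 10, -80),
    (False, 3, -40), (False, 5, -55), (False, 7, -75), (False, 10, -98),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_venue(rows):
    """Fit one spread -> differential line for home teams and one for away."""
    fits = {}
    for venue in (True, False):
        points = [(spread, diff) for home, spread, diff in rows if home == venue]
        fits[venue] = fit_line(points)
    return fits


def expected(fits, is_home, spread):
    """Expected differential at this spread for the given venue."""
    intercept, slope = fits[is_home]
    return intercept + slope * spread


fits = fit_by_venue(games)
print(expected(fits, True, 5), expected(fits, False, 5))
```

Splitting further by conference or team type would work the same way, but each extra split thins out the sample behind every fitted line.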

A +5 dog might rack up a fantastic rushing differential against a poor rush defence, but perhaps only because its passing game is terrible and/or it is up against an excellent pass defence, so it has no choice but to rush.

So in some cases a positive ATS adjusted efficiency differential may reflect a good performance, but in other cases it may just reflect a lopsided team.

This gets more complicated the more I think about it :/

chunk · 07-04-13 01:02 PM

Actually, it gets more complicated the more I think of it also. Do you think that Phil Steele might do something similar when he gives each team a performance rating for each game?

brettd · 07-04-13 01:38 PM

Originally Posted by chunk

Actually, it gets more complicated the more I think of it also. Do you think that Phil Steele might do something similar when he gives each team a performance rating for each game?

No idea.

How about this for an idea: generate a k-nearest neighbor solution to find the most similar opponents in terms of spread and pass/rush offence & defence. That way you'd be comparing like with like, and this particular ATS +/- efficiency may become more pertinent.
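A minimal sketch of the nearest-neighbour idea, assuming each opponent is described by a hypothetical profile of (spread, pass offence, rush offence, pass defence, rush defence). The names, numbers, and the choice of z-scored Euclidean distance are all assumptions for illustration.

```python
import math

# Hypothetical opponent profiles: (spread, pass off, rush off, pass def, rush def)
# in points and yards per game. All names and numbers are made up.
profiles = {
    "Opp1": (5.0, 240, 150, 220, 160),
    "Opp2": (4.5, 250, 140, 230, 150),
    "Opp3": (-10.0, 310, 200, 180, 110),
    "Opp4": (14.0, 180, 120, 270, 210),
}


def column_stats(rows):
    """Mean and standard deviation of each feature column."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def zscore(row, means, sds):
    """Standardize a profile so yardage does not swamp the spread."""
    return tuple((v - m) / s for v, m, s in zip(row, means, sds))


def nearest(target, profiles, k=1):
    """Names of the k profiles closest to target in standardized space."""
    means, sds = column_stats(list(profiles.values()))
    zt = zscore(target, means, sds)
    ranked = sorted((math.dist(zscore(row, means, sds), zt), name)
                    for name, row in profiles.items())
    return [name for _, name in ranked[:k]]


similar = nearest((5.0, 245, 145, 225, 155), profiles, k=2)
print(similar)
```

The ATS +/- efficiency could then be judged only against games versus the returned neighbours, so that like is compared with like.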

daringly · 07-08-13 11:20 AM

Get "Who's #1" by Langville/Meyer. The book spells out how to use matrices for rankings that incorporate SOS. It is the best thing I have read on this topic.

brettd · 07-10-13 12:17 AM

Originally Posted by daringly

Get "Who's #1" by Langville/Meyer. The book spells out how to use matrices for rankings that incorporate SOS. It is the best thing I have read on this topic.

I've got the book. Just getting lazy and wanted to find a quick and dirty way of ranking game variables using the spread. I might have to go down this path after all, and generate a ranking matrix based on each box score variable.
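As a rough idea of what a ranking matrix on a single box score variable might look like, here is a sketch of the Massey least-squares method (one of the methods covered in that book) applied to rushing-yard differential instead of points. The games and numbers are hypothetical, and the small `solve` helper is just a plain Gaussian elimination written out to keep the example self-contained.

```python
# Hypothetical results: (team, opponent, rushing-yard differential for team).
games = [("A", "B", 40), ("B", "C", 25), ("C", "A", -30),
         ("A", "D", 80), ("D", "B", -35)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def massey(games):
    """Massey ratings: solve M r = p, with the last row forcing sum(r) = 0."""
    teams = sorted({t for g in games for t in g[:2]})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for team, opp, diff in games:
        i, j = idx[team], idx[opp]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += diff
        p[j] -= diff
    # M is singular as built; replace the last equation with sum(r) = 0.
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


ratings = massey(games)
print(ratings)
```

Because every team's rating is solved for simultaneously, a big differential against a weak opponent is worth less than the same differential against a strong one, which is the SOS adjustment this thread is after. Running it once per box score variable would give one rating vector per variable.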
