1. #1
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Adjusting for team power/SOS in NCAAF when examining box score variables

    I'm looking to incorporate some sort of methodology for adjusting/standardizing NCAAF box score variables in order to project those variables going forward in time. Commonly you come across a team that has run up its box score outputs against big dogs, which distorts box score matchup analysis against a team that has had a recent run against tougher rivals.

    My initial idea is to build a basic efficiency +/- system by regressing dog/fave performance in any box score metric against what has historically been achieved at that spread.

    EG:

    A regression equation shows the average +5 dog has a rushing yards differential of -50. Team X in its previous game as a +5 dog had a rushing differential of -40. Therefore, team X has a prior week 'ATS efficiency adjusted' metric of +10.
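
    A minimal sketch of the example above, assuming a simple straight-line fit of rushing differential on the closing spread. The sample data, the variable names, and the `ats_adjusted` helper are all hypothetical; the numbers are invented so that the fitted line expects -50 at a +5 spread, matching the example.

```python
import numpy as np

# Hypothetical historical sample (values are made up for illustration):
# closing spread for the team (positive = dog) and that team's
# rushing-yards differential in the same game.
spreads = np.array([-14.0, -7.0, -3.0, 0.0, 3.0, 5.0, 7.0, 14.0])
rush_diff = np.array([140.0, 70.0, 30.0, 0.0, -30.0, -50.0, -70.0, -140.0])

# Fit rush_diff = slope * spread + intercept by ordinary least squares.
slope, intercept = np.polyfit(spreads, rush_diff, 1)


def ats_adjusted(spread, actual_diff):
    """Actual differential minus what the fitted line expects at this spread."""
    expected = slope * spread + intercept
    return actual_diff - expected


# Team X: a +5 dog with a rushing differential of -40.
adj = ats_adjusted(5.0, -40.0)
print(round(adj, 1))
```

    The adjusted metric is just the regression residual, so a positive value means the team out-performed what the spread implied for that one variable.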


    What are the pros/cons of this approach? What are some better ways of looking at this problem?

  2. #2
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Does no one have any ideas on this?

  3. #3
    yak merchant
    Join Date: 11-04-10
    Posts: 109
    Betpoints: 6170

    Originally Posted by brettd
    Does no one have any ideas on this?
    I'll play, though I'm sure my answer will be of no help. I guess it depends on how you feel about using the number you are trying to beat as an input into your model. I'm not going to say using line history never has value, but for me, basing adjustments on it kind of defeats the point of doing SOS adjustments, especially if you are analyzing underlying stats and not scores. The whole scenario I'm trying to exploit is historical results that seem consistent with the lines issued, but where analyzing the stats and adjusting for SOS in isolation from the line can hopefully identify some value. Most importantly for me, comparing the stats to the lines still doesn't solve the big anomalies in the data that derail good models.

    It may "smooth" the data, but regardless of the line, weird things happen when a game is a blowout. Now, yes, blowouts are more likely to happen in games with big lines, but think about the following two scenarios:

    Team A is playing Team B at home. Team A is favored by 21.
    Team A wins 27-7. Its offense gains 4.1 YPC and 7.0 YPA, and its defense gives up 3.0 YPC and 5.0 YPA.

    Team C is playing Team D at home. Team C is favored by 21.
    Team C wins 42-23. Its offense gains 3.7 YPC and 6.5 YPA, and its defense gives up 3.1 YPC and 8.0 YPA.

    If Team C plays Team A and you just use the final stats and the line history to build your model, the model will spit out Team A winning almost every time.

    However, if I tell you that Team A was winning 10-7 at the half in the first game, that the game wasn't decided until the 4th quarter, and that there was a pick-six in the last minute to take the score from 20-7 to 27-7,

    and

    that Team C was winning 42-3 at halftime, put in the second string in the second half, and did nothing but run dives on offense and play prevent on defense,

    would you still want to wager on Team A?

    Granted, these are extreme examples from a single set of games, but scenarios like this are the hardest to model around. Over the years, the most troublesome scenario for my model has always been this:

    Crappy Sun Belt Team G is just starting league play and has played Oregon, Nebraska, and Troy in its first three games.

    Crappy Sun Belt Team H played Memphis, North Texas, and Duke in its first three.

    Due to blowouts against Oregon and Nebraska (consistent with the lines issued by Vegas), Team G gets all kinds of garbage-time yards and its stats are boosted.

    Team H stays in its games and doesn't get the same amount of garbage time.

    Run the stats through the model, and the model says Team G kills Team H due to good stats that are then adjusted up even more for strong strength of schedule. Make a bet on Team G, and Team H covers easily.
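
    One way to see how much garbage time can move a stat is to recompute it on competitive snaps only. This is a rough sketch, not something proposed in the thread: the play-by-play rows are invented, and the cutoff (`GARBAGE_MARGIN`, second half only) is an arbitrary assumption chosen for illustration.

```python
# Hypothetical play-by-play rows for one offense: (quarter, absolute score
# margin before the snap, play type, yards gained). All values are invented.
plays = [
    (1, 0, "rush", 6),
    (1, 7, "rush", 5),
    (2, 14, "rush", 7),
    (2, 28, "rush", 6),
    (3, 39, "rush", 2),
    (3, 39, "rush", 1),
    (4, 32, "rush", 3),
    (4, 25, "rush", 2),
]

GARBAGE_MARGIN = 24  # assumed cutoff in points, not taken from the thread


def is_garbage_time(quarter, margin):
    """Flag second-half snaps where either side leads by the cutoff or more."""
    return quarter >= 3 and abs(margin) >= GARBAGE_MARGIN


def yards_per_carry(rows):
    """Average yards over the rushing plays in rows."""
    rushes = [yards for (_, _, kind, yards) in rows if kind == "rush"]
    return sum(rushes) / len(rushes)


competitive = [p for p in plays if not is_garbage_time(p[0], p[1])]

raw_ypc = yards_per_carry(plays)
filtered_ypc = yards_per_carry(competitive)
print(raw_ypc, filtered_ypc)
```

    In this made-up game the full-game YPC understates the offense, as in the Team C scenario; for a team on the wrong end of a blowout the distortion can run the other way.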

    There are other complications, but I would be careful about "smoothing" data by using the line in SOS adjustments, as I can't really see how doing so actually increases the potency of the adjusted stats (versus other SOS adjustment methods). Plus, some of the more complex issues with SOS adjustment aren't going to be addressed by that method anyway.

    Yes, I know. I'm of no help. Good luck.

  4. #4
    chunk
    Join Date: 02-08-11
    Posts: 805
    Betpoints: 19168

    For a change, a good question. I do have an opinion on this, but I would like to hear from some others before I opine.

  5. #5
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Awesome reply man! Thanks. It got me thinking. Keep the ideas flowing! Hopefully this starts a quality thread on actual sports modelling, something the HTT doesn't see too often these days.

  6. #6
    Juret
    Join Date: 07-18-10
    Posts: 113
    Betpoints: 1239

    Originally Posted by brettd
    EG:

    A regression equation shows the average +5 dog has a rushing yards differential of -50. Team X in its previous game as a +5 dog had a rushing differential of -40. Therefore, team X has a prior week 'ATS efficiency adjusted' metric of +10.
    You should include a home-team dummy, as that would explain some of why the line was where it was, in addition to the past stats. Or do you adjust to neutral-field stats somehow?
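
    A sketch of what adding a home-team dummy to the regression could look like, assuming an ordinary least-squares fit with an intercept, the spread, and a 0/1 home flag. The data and the `expected_rush_diff` helper are hypothetical; the numbers are invented so the fit is exact.

```python
import numpy as np

# Hypothetical games (values invented for illustration): spread for the team
# (positive = dog), home flag (1 = home, 0 = away), rushing-yards differential.
spread = np.array([5.0, 5.0, -3.0, -3.0, 10.0, 10.0, -7.0, -7.0])
home = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
rush_diff = np.array([-40.0, -60.0, 40.0, 20.0, -90.0, -110.0, 80.0, 60.0])

# Design matrix columns: intercept, spread, home dummy.
X = np.column_stack([np.ones_like(spread), spread, home])
coef, *_ = np.linalg.lstsq(X, rush_diff, rcond=None)
intercept, b_spread, b_home = coef


def expected_rush_diff(s, is_home):
    """Expected rushing differential at spread s, for a home or away team."""
    return intercept + b_spread * s + b_home * (1.0 if is_home else 0.0)


# An away +5 dog that posts -40 beat expectation by more than a home +5 dog would.
away_adj = -40.0 - expected_rush_diff(5.0, False)
home_adj = -40.0 - expected_rush_diff(5.0, True)
print(round(away_adj, 1), round(home_adj, 1))
```

    With the dummy in the model, the same raw differential at the same spread gets a different adjusted value depending on venue, which is the effect the dummy is meant to capture.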

  7. #7
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Ah, good point. Yeah, I'll split the regression between 'home +5 dogs' and 'away +5 dogs'. I'm also thinking of splitting by conference and/or team type.

    A +5 dog might rack up a fantastic rushing differential against a poor rush defence, but perhaps only because their passing game is terrible and/or comes up against an excellent pass defence, so they have no choice but to rush.

    So in some cases a positive ATS-adjusted efficiency differential may reflect a good performance, but in other cases it may just reflect a lopsided team.

    This gets more complicated the more I think about it :/

  8. #8
    chunk
    Join Date: 02-08-11
    Posts: 805
    Betpoints: 19168

    Actually, it gets more complicated the more I think of it also. Do you think that Phil Steele might do something similar when he gives each team a performance rating for each game?

  9. #9
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Originally Posted by chunk
    Actually, it gets more complicated the more I think of it also. Do you think that Phil Steele might do something similar when he gives each team a performance rating for each game?
    No idea.

    How about this for an idea: generate a k-nearest-neighbor solution to find the most similar opponent in terms of spread and pass/rush offence & defence. That way you'd be comparing like with like, and this particular ATS +/- efficiency may become more pertinent.
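
    A rough sketch of the nearest-neighbor lookup, assuming each past opponent is described by spread plus pass/rush offence and defence in yards per game, z-scored so that no single column dominates the Euclidean distance. The profiles, names, and the `nearest_opponents` helper are hypothetical.

```python
import numpy as np

# Hypothetical past-opponent profiles (values invented for illustration):
# [spread faced, pass offence, rush offence, pass defence, rush defence]
names = ["Opp1", "Opp2", "Opp3", "Opp4"]
profiles = np.array([
    [5.0, 250.0, 150.0, 220.0, 140.0],
    [21.0, 310.0, 200.0, 180.0, 100.0],
    [4.0, 240.0, 160.0, 230.0, 150.0],
    [-7.0, 200.0, 120.0, 260.0, 180.0],
])
target = np.array([4.5, 245.0, 155.0, 225.0, 145.0])  # upcoming opponent


def nearest_opponents(target, profiles, k):
    """Return indices of the k profiles closest to target after z-scoring."""
    mean = profiles.mean(axis=0)
    std = profiles.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    z_profiles = (profiles - mean) / std
    z_target = (target - mean) / std
    dist = np.linalg.norm(z_profiles - z_target, axis=1)
    return np.argsort(dist)[:k]


idx = nearest_opponents(target, profiles, k=2)
print([names[i] for i in idx])
```

    The ATS +/- from games against the returned opponents would then be the like-for-like comparison; the choice of k and of the feature scaling are open judgment calls.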

  10. #10
    daringly
    Join Date: 08-10-05
    Posts: 114
    Betpoints: 4671

    Get "Who's #1?" by Langville/Meyer. The book spells out how to use matrices for rankings that incorporate SOS. It is the best thing I have read on this topic.

  11. #11
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Originally Posted by daringly
    Get "Who's #1?" by Langville/Meyer. The book spells out how to use matrices for rankings that incorporate SOS. It is the best thing I have read on this topic.

    I've got the book. Just getting lazy and wanted to find a quick and dirty way of ranking game variables using the spread. I might have to go down this path after all, and generate a ranking matrix based on each box score variable.
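
    As an illustration of a ranking matrix built on a box score variable, here is a sketch of a Massey-style least-squares rating applied to rushing-yards differential instead of points. The games, the team labels, and the `massey_ratings` helper are hypothetical, and swapping in a box score variable for the score margin is an assumption about how the idea would be applied, not a recipe from the thread.

```python
import numpy as np

# Hypothetical results (values invented for illustration):
# (team_i, team_j, rushing-yards differential from team_i's point of view).
teams = ["A", "B", "C", "D"]
games = [
    ("A", "B", 30.0),
    ("B", "C", 20.0),
    ("C", "D", 10.0),
    ("A", "D", 60.0),
]


def massey_ratings(teams, games):
    """Solve M r = p, where M counts games and p sums differentials."""
    index = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    M = np.zeros((n, n))
    p = np.zeros(n)
    for ti, tj, diff in games:
        i, j = index[ti], index[tj]
        M[i, i] += 1
        M[j, j] += 1
        M[i, j] -= 1
        M[j, i] -= 1
        p[i] += diff
        p[j] -= diff
    # M is singular, so replace the last row with a sum-to-zero constraint.
    M[-1, :] = 1.0
    p[-1] = 0.0
    return dict(zip(teams, np.linalg.solve(M, p)))


ratings = massey_ratings(teams, games)
print({t: round(float(r), 1) for t, r in ratings.items()})
```

    Because every rating is solved for simultaneously, a team's number already reflects the strength of the opponents it posted its differentials against. This requires a connected schedule, which early-season NCAAF data may not provide.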
