1. #1
    VegasRandy
    Join Date: 12-30-07
    Posts: 103
    Betpoints: 48

    Math Help: Reducing Multiple Stats to a Single Number

    I’ve created a system that ranks NFL teams from best to worst. To do this I calculated seven different statistics and measured how they related to winning. I did these calculations for every game, then found the overall average for a game. I then went back and compared every game to this average, and found wins/losses based on being above or below average. Games where teams played above the NFL average on the seven statistics had the following winning %.

    Stat           NFL Avg    Off W%    Def W%    W% Off & Def
    DSR            69.1%      66.5%     65.4%     89.14%
    ANPY/A         5.338      70.3%     70.7%     89.10%
    Turnovers      1.76       67.8%     67.5%     82.5%
    Yds/Drive      28.36      63.2%     63.6%     81.5%
    TOP/Drive      2.77       65.4%     65.3%     79.6%
    Yds/Play       5.09       62.2%     61.6%     77.4%
    FD/Drive       1.62       61.0%     61.2%     76.4%
    3rd&4th Dwn    39.1%      62.6%     61.8%     75.11%

    The next step is what I need help with. I would like to combine these 7 statistics into a single number for each team’s offense and defense. Ideally, I would like each stat weighted based on the winning % above.
    Here are the statistics for the Detroit Lions Offense:

    DSR                 61.29%
    ANPY/A              3.45
    Turnovers Per Gm    2.24
    Yds/Drive           22.41
    TOP/Drive           2.12
    Yds/Play            3.95
    FD/Drive            1.34
    3rd & 4th Dwn       23.4%



    Any ideas?

    Thanks,
    Randy
    Last edited by VegasRandy; 08-20-09 at 11:20 PM.

  2. #2
    Justin7
    Join Date: 07-31-06
    Posts: 8,577
    Betpoints: 1506

    I would try a 7-variable regression. For each team, you have 7 stats you have recorded. You have 1 stat you care about - future win percentage. A data regression will do it.

    Another thing to throw in for a LOT more power: an 8th variable for each data set, namely how many weeks of data you already have. Your model probably works a lot better after 8 weeks than after 2. If you put this in, you can try two different approaches:
    1. Regression of data, focusing on forward win rate as a function of weeks of data, and
    2. Regression focusing on differential between spread and your prediction as a function of cumulative data.
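
    For anyone who wants to see what that looks like in code, here is a minimal sketch in Python with NumPy. The data is randomly generated (these are not Randy's numbers), and the names `X`, `y`, `true_w`, and `power_rating` are made up for illustration. It only shows the shape of the idea: seven stats per team go in, one fitted number per team comes out.

```python
import numpy as np

# Hypothetical data: one row per team, one column per stat.
rng = np.random.default_rng(0)
n_teams, n_stats = 32, 7
X = rng.normal(size=(n_teams, n_stats))  # made-up, standardized stats

# Made-up "true" weights, used only to generate a fake win % to fit against.
true_w = np.array([0.10, 0.12, -0.08, 0.05, 0.04, 0.03, 0.02])
y = 0.5 + X @ true_w + rng.normal(scale=0.05, size=n_teams)

# Add an intercept column, then solve the ordinary least-squares problem.
A = np.column_stack([np.ones(n_teams), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

intercept, weights = coef[0], coef[1:]  # one fitted weight per stat
power_rating = A @ coef                 # one predicted win % per team

print("fitted weights:", np.round(weights, 3))
print("first five ratings:", np.round(power_rating[:5], 3))
```

    The fitted `weights` play the role of the stat weights Randy asked about, and `power_rating` is the single number per team.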

  3. #3
    GELATINOUS CUBE
    SBR's 94.4% handicapper
    Join Date: 08-09-09
    Posts: 4,534

    Or you can just kidnap key offensive and defensive players that you plan to bet against.

  4. #4
    VegasRandy
    Join Date: 12-30-07
    Posts: 103
    Betpoints: 48

    Quote Originally Posted by Justin7 View Post
    I would try a 7-variable regression. For each team, you have 7 stats you have recorded. You have 1 stat you care about - future win percentage. A data regression will do it.

    Another thing to throw in for a LOT more power: an 8th variable for each data set, namely how many weeks of data you already have. Your model probably works a lot better after 8 weeks than after 2. If you put this in, you can try two different approaches:
    1. Regression of data, focusing on forward win rate as a function of weeks of data, and
    2. Regression focusing on differential between spread and your prediction as a function of cumulative data.
    Thanks but those methods are beyond me.

    I was looking for a formula where I could plug the seven stats into and the output would be a single number.
    Last edited by VegasRandy; 08-21-09 at 01:18 AM.

  5. #5
    Justin7
    Join Date: 07-31-06
    Posts: 8,577
    Betpoints: 1506

    Quote Originally Posted by VegasRandy View Post
    Thanks but those methods are beyond me.

    I was looking for a formula where I could plug the seven stats into and the output would be a single number.
    That's what you're doing. A regression tells you how to use those 7 numbers as inputs to estimate a future win percentage. The number it creates is a powerful power ranking.

  6. #6
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    TEAM AVG*W% (OFF or DEF)/NFL AVG

    Sum the totals and divide by 7.
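
    Here is a rough Python sketch of that formula applied to the Lions offense numbers from the first post. Two things in it are assumptions and not part of the formula as posted: the first post says seven stats but the table lists eight rows, so the sketch divides by however many stats are supplied, and the optional `flip_lower_is_better` switch (a made-up name) inverts the ratio for turnovers, where a lower number is better for an offense.

```python
# Lions offense, NFL averages, and offensive W% copied from the first post.
# Percent stats are entered as plain numbers (61.29 means 61.29%).
stats = {
    # name:        (team,  nfl_avg, off_win_pct)
    "DSR":         (61.29, 69.1,  0.665),
    "ANPY/A":      (3.45,  5.338, 0.703),
    "Turnovers":   (2.24,  1.76,  0.678),
    "Yds/Drive":   (22.41, 28.36, 0.632),
    "TOP/Drive":   (2.12,  2.77,  0.654),
    "Yds/Play":    (3.95,  5.09,  0.622),
    "FD/Drive":    (1.34,  1.62,  0.610),
    "3rd&4th Dwn": (23.4,  39.1,  0.626),
}

# Assumption (not in the formula as posted): stats where lower is better.
LOWER_IS_BETTER = {"Turnovers"}


def composite(stats, flip_lower_is_better=False):
    """Average of TEAM AVG * W% / NFL AVG over all supplied stats."""
    total = 0.0
    for name, (team, nfl_avg, win_pct) in stats.items():
        ratio = team / nfl_avg
        if flip_lower_is_better and name in LOWER_IS_BETTER:
            ratio = nfl_avg / team  # reward fewer turnovers, not more
        total += ratio * win_pct
    return total / len(stats)


literal = composite(stats)
flipped = composite(stats, flip_lower_is_better=True)
print(f"formula as posted:  {literal:.3f}")
print(f"turnovers inverted: {flipped:.3f}")
```

    Taken literally, the formula rewards the Lions for turning the ball over more than average, which is why the inverted version is shown next to it.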

  7. #7
    maxdalury
    Join Date: 05-28-09
    Posts: 67

    Download R and look at a tutorial on the basics. You can do regressions with it, plus R is free.

  8. #8
    head_strong
    Join Date: 07-02-08
    Posts: 4,318
    Betpoints: 500

    Looks good to me.....I would just disregard 99.99% of everything listed above.

  9. #9
    MonkeyF0cker
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    You're on the right track, Randy. Take Justin's advice. There are plenty of statistics books where you can learn the fundamentals of regression.

  10. #10
    VegasRandy
    Join Date: 12-30-07
    Posts: 103
    Betpoints: 48

    Looks like I’ll be investing my time into learning regression.

    If you have Excel you can go to the tools tab and select Add-Ins. Then select Analysis ToolPak.
    Insert your Excel disc. You can now use the data analysis functions in Excel, which include multiple regression. Didn’t realize this was available until I started doing some research on the best approach to learning regression.

    Thanks everyone.

  11. #11
    1
    Join Date: 07-05-09
    Posts: 30

    Quote Originally Posted by maxdalury View Post
    Download R and look at a tutorial on the basics. You can do regressions with it, plus R is free.
    What is "R"?

  12. #12
    VegasRandy
    Join Date: 12-30-07
    Posts: 103
    Betpoints: 48

    Quote Originally Posted by 1 View Post
    What is "R"?
    http://www.r-project.org/

    Interesting site.

  13. #13
    1
    Join Date: 07-05-09
    Posts: 30

    Thank you

  14. #14
    Formulawiz
    Join Date: 01-12-09
    Posts: 1,589

    Quote Originally Posted by VegasRandy View Post
    I’ve created a system that ranks NFL teams from best to worst. To do this I calculated seven different statistic and measured how they related to winning. I did these calculations for every game, then found the overall average for a game. I then went back and compared every game to this average, and found win/losses based on being above or below average. Games where teams played above the NFL average on the seven statistics had the following winning %.

    I believe the win/loss records are SU (straight up) and not ATS (against the spread).

  15. #15
    VegasRandy
    Join Date: 12-30-07
    Posts: 103
    Betpoints: 48

    I did the regression for the Buffalo Bills but I don’t know how to interpret the output. What statistic do I use for the power rating? Coefficient, Multiple R, R squared, Adjusted R squared? Something else?


    Below is the input I used for the Bills regression:

    Y is the dependent variable "winning %"
    X is the independent variable "stats"

    Winning %    Stat            Buff Avg
    0.665        DSR             80.49%
    0.703        ANPY/A          8.36
    0.678        TO/Gm           2
    0.632        Yds/Drive       39.04
    0.654        TOP/Drive       3.13
    0.622        Yds/Play        8.69
    0.61         1stDwn/Drive    1.41
    0.626        3rd&4thDwn      41.83%

    Y=Sum(0.665+0.703+0.678+0.632+0.654+0.622+0.61+0.626)

    X=Sum(80.49+8.36+2+39.04+3.13+8.69+1.41+41.83)


    I tried to copy the results to the post but everything was bunched together so I attached the results instead.


    Thanks for any help-

    Randy

  16. #16
    maxdalury
    Join Date: 05-28-09
    Posts: 67

    OK. First off, R squared is a rough measure of how well your regression fits the data. You want a high R squared.

    Also, you should not sum up all the x variables.
    The point of the regression is to find the weight of each individual x variable as it affects the win percentage. By summing them up you are not getting anything. I would suggest that you take one dependent variable (pts scored, win percentage) and then add each of the x variables individually.

    Then you will have a y = b0 + b1*x1 + b2*x2 + ... equation, where y is a function of x1, x2, ... and each variable is multiplied by its fitted weight.
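
    As a rough illustration of both points (the fitted equation and R squared), here is a small Python sketch with NumPy. The helper name `fit_and_r_squared` and the toy numbers are made up; the toy data is built to be exactly linear, so the fit recovers the weights and R squared comes out at 1, which real football data will not do.

```python
import numpy as np


def fit_and_r_squared(X, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by least squares; return (coef, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    ss_res = np.sum((y - y_hat) ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation
    return coef, 1.0 - ss_res / ss_tot


# Toy data: two made-up stats and a y that is an exact weighted sum of them.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y = 0.2 + 0.05 * X[:, 0] + 0.01 * X[:, 1]

coef, r2 = fit_and_r_squared(X, y)
print("intercept and weights:", np.round(coef, 4))
print("R squared:", round(r2, 4))
```

    The coefficients, not R squared, are the weights for each stat. R squared only says how much of the variation in y the fitted equation explains.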

  17. #17
    VegasRandy
    Join Date: 12-30-07
    Posts: 103
    Betpoints: 48

    Thanks for the reply.

    I might have misspoken when I stated:

    Y=Sum(0.665+0.703+0.678+0.632+0.654+0.622+0.61+0.626)

    X=Sum(80.49+8.36+2+39.04+3.13+8.69+1.41+41.83)

    When Excel asked for the Input Y range I highlighted the column with the winning %. The Y range was expressed as $C$92:$C$99. The Input X range was stated as $E$92:$E$99. I didn't type these ranges in manually; Excel did this automatically once I highlighted the ranges.

    When I try to compute the regression any other way (i.e. individually) I get an error message stating either "The number of rows and columns in X range cannot be the same" or "X range and Y range must have the same number of rows regardless of labels."

    Not sure what I'm doing wrong.

  18. #18
    maxdalury
    Join Date: 05-28-09
    Posts: 67

    OK. You should have a spreadsheet where the first column is a single dependent variable for each team (pts. scored, win percentage, etc.). Then you should have all the independent variables in the same row, one per column.

    So it would be

    win % | DSR | ANPY/A | TO/GM | YDS/DRIVE | TOP/DRIVE | YDS/PLAY | 1stDwn/Drive | 3rd & 4th DOWN

    for each team and so on.

    The key is the first column, because in layman's terms the regression is trying to find the best-fit formula for the first column based on all the other columns.

    Hopefully that makes sense.
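
    The same layout can be sketched in Python. Everything in the table below is hypothetical (team labels A to F, the numbers, and the column names `win_pct`, `DSR`, `ANPY_A`, `TO_GM`), and it uses only three stats to stay short. The point is the shape: one row per team, the dependent variable in its own column, and one column per stat.

```python
import csv
import io

import numpy as np

# Hypothetical numbers, for layout only: one row per team,
# dependent variable (win_pct) first, then one column per stat.
table = """team,win_pct,DSR,ANPY_A,TO_GM
A,0.750,0.74,6.9,1.2
B,0.625,0.71,6.1,1.5
C,0.500,0.69,5.4,1.8
D,0.438,0.68,5.0,1.9
E,0.313,0.65,4.4,2.1
F,0.125,0.61,3.5,2.3
"""

rows = list(csv.DictReader(io.StringIO(table)))
stat_names = ["DSR", "ANPY_A", "TO_GM"]

y = np.array([float(r["win_pct"]) for r in rows])                # shape (6,)
X = np.array([[float(r[s]) for s in stat_names] for r in rows])  # shape (6, 3)

# X and y need the same number of rows, and more rows than fitted weights.
assert X.shape[0] == y.shape[0] and X.shape[0] > X.shape[1] + 1

A = np.column_stack([np.ones(len(y)), X])  # intercept column first
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights = dict(zip(["intercept"] + stat_names, coef))
print(weights)
```

    Randy's Excel attempt had it the other way around: eight rows, one per stat, with a single X column, so there was nothing for the regression to weight per stat.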

  19. #19
    threeg5
    All In A Days Work
    Join Date: 07-18-09
    Posts: 488
    Betpoints: 321

    Awesome

    So is this equivalent to sabermetrics in baseball? Man, numbers are extremely fascinating, are they not?

  20. #20
    Formulawiz
    Join Date: 01-12-09
    Posts: 1,589

    Quote Originally Posted by threeg5 View Post
    So is this equivalent to sabermetrics in baseball? Man, numbers are extremely fascinating, are they not?
    As I mentioned previously, the high win % obtained using these stats is based on SU, and as we all know SU won't get you anywhere. You need to go back and see how these stats all performed ATS. That is quite a bit of work, and I can assure you the win/loss % ATS will be in the 50% range.
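
    To make the SU (straight up) versus ATS (against the spread) distinction concrete, here is a minimal Python sketch. The function name `ats_result` and the five games are made up, and it assumes the usual convention that the spread is quoted from the team's point of view and is negative for a favorite.

```python
def ats_result(margin, spread):
    """Grade one game against the spread from one team's point of view.

    margin: that team's points minus the opponent's points.
    spread: that team's line, negative when it is the favorite (e.g. -7).
    """
    adjusted = margin + spread
    if adjusted > 0:
        return "win"
    if adjusted < 0:
        return "loss"
    return "push"


# Hypothetical games as (margin, spread) pairs.
games = [(10, -7), (3, -7), (7, -7), (-3, 3.5), (-10, 3.5)]

su_wins = sum(1 for margin, _ in games if margin > 0)
ats = [ats_result(margin, spread) for margin, spread in games]

print("SU wins:", su_wins, "of", len(games))
print("ATS results:", ats)
```

    In this toy set the team wins three of five straight up but goes 2-2 with a push against the spread, which is the gap Formulawiz is pointing at.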

  21. #21
    threeg5
    All In A Days Work
    Join Date: 07-18-09
    Posts: 488
    Betpoints: 321

    This may be true but....

    Quote Originally Posted by Formulawiz View Post
    As I mentioned previously, the high win % obtained using these stats is based on SU, and as we all know SU won't get you anywhere. You need to go back and see how these stats all performed ATS. That is quite a bit of work, and I can assure you the win/loss % ATS will be in the 50% range.
    But with the stats given it is trial and error, albeit quite a bit of it. However, if different weights are given to each individual stat, that could potentially give one an idea, within a %, of how team A will perform against team B. You may not know a decisive number for the score, but you will know that team A should outperform team B by x%, and you will have what team B has the potential to score, thereby giving an approximation of ability ATS. I say this is but a start. Maybe it works better, or just as well, with NCAA. In its current format it may not get completed early, or this year for that matter, but there's football till December (minus playoff contention) and there will be ball next year.
    Does this make any sense?

  22. #22
    Formulawiz
    Join Date: 01-12-09
    Posts: 1,589

    Quote Originally Posted by threeg5 View Post
    But with the stats given it is trial and error, albeit quite a bit of it. However, if different weights are given to each individual stat, that could potentially give one an idea, within a %, of how team A will perform against team B. You may not know a decisive number for the score, but you will know that team A should outperform team B by x%, and you will have what team B has the potential to score, thereby giving an approximation of ability ATS. I say this is but a start. Maybe it works better, or just as well, with NCAA. In its current format it may not get completed early, or this year for that matter, but there's football till December (minus playoff contention) and there will be ball next year.
    Does this make any sense?
    If you look at all the winning systems that are presented, you will find quite a bit of redundancy between them. If a team performs well on DSR, it will also perform well on ANPY/A, TO, YDS/DR, RZ EFF, etc. The opposite goes for teams performing miserably. I suggest you use only the ANPY/A formula to keep it simple; you will probably get the same win/loss results anyway.
    I still think it's important to see how the formulas performed ATS, otherwise you're spinning your wheels.
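
    One way to check for that kind of redundancy is to look at how strongly the stats correlate with each other before putting them all in a regression. A minimal Python sketch with NumPy follows. The data is randomly generated so that three made-up stats all mostly reflect one shared "team quality" factor; the names `quality`, `loadings`, and `stats` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_games = 200
quality = rng.normal(size=n_games)  # made-up shared "team quality" factor

# Three made-up stats that all mostly reflect the same underlying quality.
loadings = np.array([1.0, 0.8, 1.2])
stats = quality[:, None] * loadings + rng.normal(scale=0.1, size=(n_games, 3))

# Pairwise correlations between the stat columns (3x3 matrix).
corr = np.corrcoef(stats, rowvar=False)
print(np.round(corr, 2))
```

    When the off-diagonal entries are close to 1, the stats are telling nearly the same story, and the individual regression weights become hard to interpret.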

  23. #23
    threeg5
    All In A Days Work
    Join Date: 07-18-09
    Posts: 488
    Betpoints: 321

    Concur

    Quote Originally Posted by Formulawiz View Post
    If you look at all the winning systems that are presented, you will find quite a bit of redundancy between them. If a team performs well on DSR, it will also perform well on ANPY/A, TO, YDS/DR, RZ EFF, etc. The opposite goes for teams performing miserably. I suggest you use only the ANPY/A formula to keep it simple; you will probably get the same win/loss results anyway.
    I still think it's important to see how the formulas performed ATS, otherwise you're spinning your wheels.
    I agree about ATS. What I am saying is that we as a whole should continue the research to find what shows these teams in either their power or their vulnerability; either way it will help produce a winning result. After all, the $$$$ is in the spread.

  24. #24
    maxdalury
    Join Date: 05-28-09
    Posts: 67

    This is a start on the right track...and a learning process. Clearly doing this alone will not be profitable in the long run.

  25. #25
    laxdjock
    Anyone but the SEC.
    Join Date: 09-15-07
    Posts: 4,074
    Betpoints: 12

    The problem you will run into when combining 7 factors into one is that each factor influences the outcome to a different degree, so you'd need to weight each factor to reflect how much it actually matters. You really need to establish some hypothesis about which factors go together, or you'll be crunching your numbers for hours without much to show for it. You'll also run into a problem with statistical significance: your samples have a natural limit, because going year to year can open you up to some pretty powerful confounding variables. There are some post-hoc adjustments you can make to help estimate the numbers, but that gets really sticky unless you are very comfortable with a stats package and/or have some home-brewed adjustments.

    I've found that isolating on certain factors within a few teams is easier to manage week to week, as you can become more familiar with their numbers. Once you have a few factors, you can run them against each opponent, and then look at the spread #'s compared to those factors. ATS (basic SDs, etc) across books can also produce some interesting numbers, particularly when you factor in open and close #'s. This is where having others involved is beneficial, as there are a lot of ways to dice the information, but it takes a good approach and skill to get workable data.

  26. #26
    maxdalury
    Join Date: 05-28-09
    Posts: 67

    Quote Originally Posted by laxdjock View Post
    The problem you will run into when combining 7 factors into one is that each factor influences the outcome to a different degree, so you'd need to weight each factor to reflect how much it actually matters.
    That's what a regression does.

  27. #27
    laxdjock
    Anyone but the SEC.
    Join Date: 09-15-07
    Posts: 4,074
    Betpoints: 12

    Quote Originally Posted by maxdalury View Post
    That's what a regression does.
    I know. I was trying to write a reply that was a bit easier to understand, as his initial inquiry was asking about it, and based on his replies I'm not sure it was clear why a regression was needed. My main point was to establish a rationale for what you are looking for, as the statistics won't mean much without it. He could dump the data into SPSS and see what sticks, but that can be a dangerous approach to interpretation.

  28. #28
    maxdalury
    Join Date: 05-28-09
    Posts: 67

    True. As everyone has said before: it is easy to make a model that predicts the past effectively, but it is hard to predict the future effectively.
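
    A quick way to see this for yourself is to fit on one part of the data and score the fit on a part the model never saw. The Python sketch below uses NumPy and pure random noise in place of real stats, so there is nothing real to find; the names `r2_past` and `r2_future` are made up for the illustration.

```python
import numpy as np


def r_squared(y, y_hat):
    """Share of the variation in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(2)
n_rows, n_stats = 32, 7
X = rng.normal(size=(n_rows, n_stats))           # made-up stats, no real signal
y = rng.normal(loc=0.5, scale=0.2, size=n_rows)  # made-up "win %", pure noise

A = np.column_stack([np.ones(n_rows), X])
train, test = slice(0, 16), slice(16, 32)

# Fit the weights on the first half only.
coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

r2_past = r_squared(y[train], A[train] @ coef)  # data the fit has seen
r2_future = r_squared(y[test], A[test] @ coef)  # same weights, unseen data

print("R squared on the past:  ", round(r2_past, 3))
print("R squared on the future:", round(r2_future, 3))
```

    With seven predictors and only sixteen rows, the in-sample number is typically well above zero even on noise, while the out-of-sample number is usually near or below zero.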

  29. #29
    threeg5
    All In A Days Work
    Join Date: 07-18-09
    Posts: 488
    Betpoints: 321

    yes indeed

    Quote Originally Posted by maxdalury View Post
    True. As everyone has said before: it is easy to make a model that predicts the past effectively, but it is hard to predict the future effectively.

    You can never create a number, or factor in a number, that estimates a "bad day" or "everything going right".


  30. #30
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Quote Originally Posted by maxdalury View Post
    True. As everyone has said before: it is easy to make a model that predicts the past effectively, but it is hard to predict the future effectively.
    Exactly. I can give you a hundred formulas that would've made you a millionaire if you had them in the past... going forward they're only coin flips though. Tough to beat the books with simple stats that are easily accessible to all.

  31. #31
    Kaplan
    Join Date: 01-15-11
    Posts: 165
    Betpoints: 865

    Quote Originally Posted by maxdalury View Post
    OK. First off, R squared is a rough measure of how well your regression fits the data. You want a high R squared.

    Also, you should not sum up all the x variables.
    The point of the regression is to find the weight of each individual x variable as it affects the win percentage. By summing them up you are not getting anything. I would suggest that you take one dependent variable (pts scored, win percentage) and then add each of the x variables individually.

    Then you will have a y = b0 + b1*x1 + b2*x2 + ... equation, where y is a function of x1, x2, ... and each variable is multiplied by its fitted weight.

    I am a relative newb to using regression. Kind of addicting. How do you determine what a high or acceptable R squared is? I seem to get a lot of R squared values in the 20-30% range.
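
    There is no universal cutoff for an acceptable R squared, but one check that may help is the adjusted R squared that Excel also reports, which discounts the raw figure for the number of predictors relative to the number of rows. A minimal Python sketch of the standard formula, with a made-up function name and made-up sample sizes:

```python
def adjusted_r_squared(r2, n_rows, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Same raw R squared of 25%, seven predictors, two hypothetical sample sizes.
small_sample = adjusted_r_squared(0.25, n_rows=32, n_predictors=7)
large_sample = adjusted_r_squared(0.25, n_rows=256, n_predictors=7)

print("32 rows: ", round(small_sample, 4))
print("256 rows:", round(large_sample, 4))
```

    A raw 25% from 32 rows and seven stats shrinks to almost nothing once adjusted, while the same 25% from a few hundred rows holds up much better.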
