Math Help: Reducing Multiple Stats to a Single Number

  • VegasRandy
    SBR High Roller
    • 12-30-07
    • 103

    #1
    Math Help: Reducing Multiple Stats to a Single Number
    I’ve created a system that ranks NFL teams from best to worst. To do this I calculated seven different statistics and measured how they related to winning. I did these calculations for every game, then found the overall average for a game. I then went back and compared every game to this average, and found wins/losses based on being above or below average. Games where teams played above the NFL average on the seven statistics had the following winning %.

    Stat           NFL Avg   Off W%   Def W%   W% Off & Def
    DSR            69.1%     66.5%    65.4%    89.14%
    ANPY/A         5.338     70.3%    70.7%    89.10%
    Turnovers      1.76      67.8%    67.5%    82.5%
    Yds/Drive      28.36     63.2%    63.6%    81.5%
    TOP/Drive      2.77      65.4%    65.3%    79.6%
    Yds/Play       5.09      62.2%    61.6%    77.4%
    FD/Drive       1.62      61.0%    61.2%    76.4%
    3rd&4th Dwn    39.1%     62.6%    61.8%    75.11%

    The next step is what I need help with. I would like to combine these 7 statistics into a single number for each team’s offense and defense. Ideally, I would like each stat weighted based on the winning % above.
    Here are the statistics for the Detroit Lions Offense:

    DSR                 61.29%
    ANPY/A              3.45
    Turnovers Per Gm    2.24
    Yds/Drive           22.41
    TOP/Drive           2.12
    Yds/Play            3.95
    FD/Drive            1.34
    3rd & 4th Dwn       23.4%



    Any ideas?

    Thanks,
    Randy
    Last edited by VegasRandy; 08-20-09, 11:20 PM.
  • Justin7
    SBR Hall of Famer
    • 07-31-06
    • 8577

    #2
    I would try a 7-variable regression. For each team, you have 7 stats you have recorded. You have 1 stat you care about - future win percentage. A data regression will do it.

    Another thing to throw in for a LOT more power: an 8th variable for each data set, namely how many weeks of data you already have. Your model probably works a lot better after 8 weeks than 2. If you put this in, you can try two different approaches:
    1. Regression of data, focusing on forward win rate as a function of weeks of data, and
    2. Regression focusing on differential between spread and your prediction as a function of cumulative data.
    Comment
    • GELATINOUS CUBE
      SBR MVP
      • 08-09-09
      • 4534

      #3
      Or you can just kidnap key offensive and defensive players that you plan to bet against.
      blog '09-'10: 37-16: +$31,900
      mlb 2010; 16-12: +$4,540
      gellyhoops 2010: 10-6 +$3,150
      overall: 63-34 +$40,290
      Comment
      • VegasRandy
        SBR High Roller
        • 12-30-07
        • 103

        #4
        Originally posted by Justin7
        I would try a 7-variable regression. For each team, you have 7 stats you have recorded. You have 1 stat you care about - future win percentage. A data regression will do it.

        Another thing to throw in for a LOT more power: an 8th variable for each data set, namely how many weeks of data you already have. Your model probably works a lot better after 8 weeks than 2. If you put this in, you can try two different approaches:
        1. Regression of data, focusing on forward win rate as a function of weeks of data, and
        2. Regression focusing on differential between spread and your prediction as a function of cumulative data.
        Thanks but those methods are beyond me.

        I was looking for a formula where I could plug the seven stats into and the output would be a single number.
        Last edited by VegasRandy; 08-21-09, 01:18 AM.
        Comment
        • Justin7
          SBR Hall of Famer
          • 07-31-06
          • 8577

          #5
          Originally posted by VegasRandy
          Thanks but those methods are beyond me.

          I was looking for a formula where I could plug the seven stats into and the output would be a single number.
          That's what you're doing. A regression tells you how to use those 7 numbers as inputs to estimate a future win percentage. The number it creates is a powerful power ranking.
          Comment
          • roasthawg
            SBR MVP
            • 11-09-07
            • 2990

            #6
            TEAM AVG*W% (OFF or DEF)/NFL AVG

            Sum the totals and divide by 7.
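A rough Python sketch of the formula above, using the offense numbers from post #1. Two things here are assumptions rather than part of the post: the table lists eight rows, so the sketch divides by however many stats are passed in instead of a fixed 7, and turnovers are inverted (league average over team value) so that fewer turnovers scores higher. The function and variable names are made up for illustration.

```python
# Values copied from the tables in post #1 (offense columns).
NFL_AVG = {"DSR": 69.1, "ANPY/A": 5.338, "Turnovers": 1.76, "Yds/Drive": 28.36,
           "TOP/Drive": 2.77, "Yds/Play": 5.09, "FD/Drive": 1.62, "3rd&4th": 39.1}
OFF_WIN_PCT = {"DSR": 0.665, "ANPY/A": 0.703, "Turnovers": 0.678, "Yds/Drive": 0.632,
               "TOP/Drive": 0.654, "Yds/Play": 0.622, "FD/Drive": 0.610, "3rd&4th": 0.626}
LIONS_OFF = {"DSR": 61.29, "ANPY/A": 3.45, "Turnovers": 2.24, "Yds/Drive": 22.41,
             "TOP/Drive": 2.12, "Yds/Play": 3.95, "FD/Drive": 1.34, "3rd&4th": 23.4}


def composite_rating(team, league, win_pct, lower_is_better=("Turnovers",)):
    """Average of (team avg / NFL avg) * W% over all stats supplied."""
    terms = []
    for stat, value in team.items():
        ratio = value / league[stat]
        if stat in lower_is_better:
            # Assumption: flip the ratio so a low turnover count helps the rating.
            ratio = league[stat] / value
        terms.append(ratio * win_pct[stat])
    return sum(terms) / len(terms)


lions_rating = composite_rating(LIONS_OFF, NFL_AVG, OFF_WIN_PCT)
league_rating = composite_rating(NFL_AVG, NFL_AVG, OFF_WIN_PCT)
print(f"Lions offense: {lions_rating:.3f}  league-average team: {league_rating:.3f}")
```

A team sitting exactly on every league average scores the plain mean of the W% column, so ratings below that mark indicate a below-average unit. This is a hand-weighted index, not a regression.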
            Comment
            • maxdalury
              Restricted User
              • 05-28-09
              • 67

              #7
              Download R and look at a tutorial on the basics; then you can do regressions. Plus, R is free.
              Comment
              • head_strong
                SBR MVP
                • 07-02-08
                • 4318

                #8
                Looks good to me.....I would just disregard 99.99% of everything listed above.
                Comment
                • MonkeyF0cker
                  SBR Posting Legend
                  • 06-12-07
                  • 12144

                  #9
                  You're on the right track, Randy. Take Justin's advice. There are plenty of statistics books where you can learn the fundamentals of regression.
                  Comment
                  • VegasRandy
                    SBR High Roller
                    • 12-30-07
                    • 103

                    #10
                    Looks like I’ll be investing my time into learning regression.

                    If you have Excel you can go to the tools tab and select Add-Ins. Then select Analysis ToolPak.
                    Insert your Excel disc. You can now use the data analysis functions in Excel, which include multiple regression. Didn’t realize this was available until I started doing some research on the best approach to learning regression.

                    Thanks everyone.
                    Comment
                    • 1
                      SBR Rookie
                      • 07-05-09
                      • 30

                      #11
                      Originally posted by maxdalury
                      Download R and look at a tutorial on the basics; then you can do regressions. Plus, R is free.
                      What is "R"?
                      Comment
                      • VegasRandy
                        SBR High Roller
                        • 12-30-07
                        • 103

                        #12
                        Originally posted by 1
                        What is "R"?


                        Interesting site.
                        Comment
                        • 1
                          SBR Rookie
                          • 07-05-09
                          • 30

                          #13
                          Thank you
                          Comment
                          • Formulawiz
                            Restricted User
                            • 01-12-09
                            • 1589

                            #14
                            Originally posted by VegasRandy
                            I’ve created a system that ranks NFL teams from best to worst. To do this I calculated seven different statistics and measured how they related to winning. I did these calculations for every game, then found the overall average for a game. I then went back and compared every game to this average, and found wins/losses based on being above or below average. Games where teams played above the NFL average on the seven statistics had the following winning %.

                            Stat           NFL Avg   Off W%   Def W%   W% Off & Def
                            DSR            69.1%     66.5%    65.4%    89.14%
                            ANPY/A         5.338     70.3%    70.7%    89.10%
                            Turnovers      1.76      67.8%    67.5%    82.5%
                            Yds/Drive      28.36     63.2%    63.6%    81.5%
                            TOP/Drive      2.77      65.4%    65.3%    79.6%
                            Yds/Play       5.09      62.2%    61.6%    77.4%
                            FD/Drive       1.62      61.0%    61.2%    76.4%
                            3rd&4th Dwn    39.1%     62.6%    61.8%    75.11%

                            The next step is what I need help with. I would like to combine these 7 statistics into a single number for each team’s offense and defense. Ideally, I would like each stat weighted based on the winning % above.
                            Here are the statistics for the Detroit Lions Offense:

                            DSR                 61.29%
                            ANPY/A              3.45
                            Turnovers Per Gm    2.24
                            Yds/Drive           22.41
                            TOP/Drive           2.12
                            Yds/Play            3.95
                            FD/Drive            1.34
                            3rd & 4th Dwn       23.4%



                            Any ideas?

                            Thanks,
                            Randy
                            I believe the win/loss records are SU and not ATS
                            Comment
                            • VegasRandy
                              SBR High Roller
                              • 12-30-07
                              • 103

                              #15
                              I did the regression for the Buffalo Bills but I don’t know how to interpret the output. What statistic do I use for the power rating? Coefficient, Multiple R, R squared, Adjusted R squared? Something else?


                              Below is the input I used for the Bills regression:

                              Y is the dependent variable "winning %"
                              X is the independent variable "stats"

                              Winning %    Stat            Buff Avg
                              0.665        DSR             80.49%
                              0.703        ANPY/A          8.36
                              0.678        TO/Gm           2
                              0.632        Yds/Drive       39.04
                              0.654        TOP/Drive       3.13
                              0.622        Yds/Play        8.69
                              0.61         1stDwn/Drive    1.41
                              0.626        3rd&4thDwn      41.83%

                              Y=Sum(0.665+0.703+0.678+0.632+0.654+0.622+0.61+0.626)

                              X=Sum(80.49+8.36+2+39.04+3.13+8.69+1.41+41.83)


                              I tried to copy the results to the post but everything was bunched together so I attached the results instead.


                              Thanks for any help-

                              Randy
                              Attached Files
                              Comment
                              • maxdalury
                                Restricted User
                                • 05-28-09
                                • 67

                                #16
                                OK. First off, R squared is a rough measure of how well your regression fits the data. You want a high R squared.

                                And you should not sum up all the x variables.
                                The point of the regression is to find the weight of each individual x variable as it affects the win percentage. By summing them you are not getting anything. I would suggest that you take one dependent variable (points scored, win percentage) and then add each of the x variables individually.

                                Then you will have an equation of the form y = b0 + b1*x1 + b2*x2 + ..., where y is a function of x1, x2, ... with each variable given its appropriate weight.
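Written out with explicit weights, the equation described above takes the usual multiple-regression form. The symbols are notation introduced here, not the poster's: \hat{y} is the predicted win percentage, x_1 through x_k are the stats, \beta_0 is an intercept, and \beta_1 through \beta_k are the fitted weights, chosen to minimize squared error over the n rows of data.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k,
\qquad
(\beta_0, \dots, \beta_k) = \arg\min_{\beta} \sum_{j=1}^{n} \bigl( y_j - \hat{y}_j \bigr)^2
```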
                                Comment
                                • VegasRandy
                                  SBR High Roller
                                  • 12-30-07
                                  • 103

                                  #17
                                  Thanks for the reply.

                                  I might have misspoken when I stated:

                                  Y=Sum(0.665+0.703+0.678+0.632+0.654+0.622+0.61+0.626)

                                  X=Sum(80.49+8.36+2+39.04+3.13+8.69+1.41+41.83)

                                  When Excel asked for the Input Y range I highlighted the column with the winning %. The Y range was expressed as $C$92:$C$99. Input X range was stated as $E$92:$E$99. I didn't type these ranges in manually; Excel did this automatically once I highlighted the ranges.

                                  When I try to compute the regression any other way (i.e. individually) I get an error message stating either "The number of rows and columns in X range cannot be the same," or "X range and Y range must have the same number of rows regardless of labels."

                                  Not sure what I'm doing wrong.
                                  Comment
                                  • maxdalury
                                    Restricted User
                                    • 05-28-09
                                    • 67

                                    #18
                                    OK. You should have a spreadsheet in which the first column is a single dependent variable for each team (points scored, win percentage, etc.). Then put all the independent variables in the same row, one per column.

                                    So it would be

                                    win % | DSR | ANPY/A | TO/GM | YDS/DRIVE | TOP/DRIVE | YDS/PLAY | 1stDwn/Drive | 3rd & 4th DOWN

                                    for each team and so on.

                                    The key is the first column because, in layman's terms, the regression is trying to find the best-fit formula for the first column based on all the other columns.

                                    Hopefully that makes sense.
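To make the layout concrete, here is a minimal pure-Python sketch of a least-squares fit on rows shaped as described above: first column the dependent variable, remaining columns the stats. It is not what Excel or R run internally; it solves the normal equations directly, which is fine for a handful of columns. The rows in `toy_rows` are made-up numbers with only two stats, and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be redundant")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """rows: list of [y, x1, x2, ...]. Returns [intercept, b1, b2, ...]."""
    ys = [r[0] for r in rows]
    xs = [[1.0] + list(r[1:]) for r in rows]  # leading 1.0 gives the intercept
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, stats):
    """Apply fitted weights to one team's stats to get a single number."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], stats))


# Made-up rows: win %, stat 1, stat 2 (one row per team).
toy_rows = [
    [0.5, 5, 2],
    [0.7, 6, 1],
    [0.3, 4, 3],
    [0.7, 7, 2],
    [0.6, 5, 1],
]
coefs = fit_ols(toy_rows)
print("intercept and weights:", [round(c, 4) for c in coefs])
print("rating for stats [6, 2]:", round(predict(coefs, [6, 2]), 4))
```

The value returned by `predict` is the single number per team that the thread is after. Note that the fit needs more rows than columns, which is why one row per team (or per team-game) is required rather than one row per stat.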
                                    Comment
                                    • threeg5
                                      SBR Sharp
                                      • 07-18-09
                                      • 488

                                      #19
                                      Awesome

                                      So is this equivalent to sabermetrics in baseball? Man, numbers are extremely fascinating, are they not?
                                      Do what you did to get it and don't stop just go and get it!!
                                      Comment
                                      • Formulawiz
                                        Restricted User
                                        • 01-12-09
                                        • 1589

                                        #20
                                        Originally posted by threeg5
                                        So is this equivalent to sabermetrics in baseball? Man, numbers are extremely fascinating, are they not?
                                        As I mentioned previously, the high win % obtained using these stats is based on SU, and as we all know, SU won't get you anywhere. You need to go back and see how these stats performed ATS. That is quite a bit of work, and I can assure you the win/loss % ATS will be in the 50% range.
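The SU versus ATS distinction can be checked with a few lines of Python. This is a sketch under stated assumptions: each game is a (margin, spread) pair from one team's point of view, the spread is that team's line (for example -7 for a 7-point favorite), pushes are dropped from the ATS denominator, and the sample games are made up.

```python
def su_and_ats_record(games):
    """Return (straight-up win rate, against-the-spread win rate).

    games: list of (margin, spread) tuples for one team, where
    margin = points for minus points against, and spread is the team's line.
    """
    su_wins = sum(1 for margin, _ in games if margin > 0)
    ats_wins = sum(1 for margin, spread in games if margin + spread > 0)
    pushes = sum(1 for margin, spread in games if margin + spread == 0)
    decided = len(games) - pushes
    ats_rate = ats_wins / decided if decided else None
    return su_wins / len(games), ats_rate


# Made-up sample: the team wins three of five outright but covers only two.
sample_games = [(10, -7), (3, -7), (-3, 6), (7, -7), (-10, 3)]
su_rate, ats_rate = su_and_ats_record(sample_games)
print(f"SU: {su_rate:.3f}  ATS: {ats_rate:.3f}")
```

Running the same filter used for the W% table (games where a team was above the NFL average on a stat) through a function like this would show whether the edge survives the spread.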
                                        Comment
                                        • threeg5
                                          SBR Sharp
                                          • 07-18-09
                                          • 488

                                          #21
                                          This may be true but....

                                          Originally posted by Formulawiz
                                          As I mentioned previously, the high win % obtained using these stats is based on SU, and as we all know, SU won't get you anywhere. You need to go back and see how these stats performed ATS. That is quite a bit of work, and I can assure you the win/loss % ATS will be in the 50% range.
                                          But with the stats given it is trial and error, albeit quite a bit of it. If different weights are given to each individual stat, they could give an idea, within a percentage, of how team A will perform against team B. You may not know a decisive number for the score, but you will know that team A should outperform team B by x%, and what team B has the potential to score, which gives an approximation of ability ATS. I say this is but a start. Maybe it works better, or just as well, with NCAA. In its current format it may not get completed early, or this year for that matter, but there's football until December (minus playoff contention) and there will be ball next year.
                                          Does this make any sense?
                                          Do what you did to get it and don't stop just go and get it!!
                                          Comment
                                          • Formulawiz
                                            Restricted User
                                            • 01-12-09
                                            • 1589

                                            #22
                                            Originally posted by threeg5
                                            But with the stats given it is trial and error, albeit quite a bit of it. If different weights are given to each individual stat, they could give an idea, within a percentage, of how team A will perform against team B. You may not know a decisive number for the score, but you will know that team A should outperform team B by x%, and what team B has the potential to score, which gives an approximation of ability ATS. I say this is but a start. Maybe it works better, or just as well, with NCAA. In its current format it may not get completed early, or this year for that matter, but there's football until December (minus playoff contention) and there will be ball next year.
                                            Does this make any sense?
                                            If you look at all the winning systems that are presented, you will find quite a bit of redundancy between them. If a team performs well on DSR, it will also perform well on ANPY/A, TO, YDS/DR, RZ EFF, etc. The opposite goes for teams performing miserably. I suggest you use only the ANPY/A formula to keep it simple; you will probably get the same win/loss results anyway.
                                            I still think it's important to see how the formulas performed ATS; otherwise you're spinning your wheels.
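One way to check the redundancy described above is to compute pairwise correlations between the stat columns; values near 1 or -1 suggest two stats carry much the same information. The sketch below is plain Python, the helper names are hypothetical, and the three short columns are made-up numbers, not real team data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """columns: dict of stat name -> list of per-team values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up per-team values for three stats.
toy_columns = {
    "ANPY/A": [3.5, 5.3, 6.1, 7.0, 8.4],
    "Yds/Play": [4.0, 5.1, 5.4, 6.0, 6.9],
    "TO/Gm": [2.2, 1.8, 1.9, 1.4, 1.1],
}
for name, row in correlation_matrix(toy_columns).items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

If most stats turn out to be strongly correlated with ANPY/A on real data, that would support using the single stat; it would also explain unstable weights when all of them are put into one regression.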
                                            Comment
                                            • threeg5
                                              SBR Sharp
                                              • 07-18-09
                                              • 488

                                              #23
                                              Concur

                                              Originally posted by Formulawiz
                                              If you look at all the winning systems that are presented, you will find quite a bit of redundancy between them. If a team performs well on DSR, it will also perform well on ANPY/A, TO, YDS/DR, RZ EFF, etc. The opposite goes for teams performing miserably. I suggest you use only the ANPY/A formula to keep it simple; you will probably get the same win/loss results anyway.
                                              I still think it's important to see how the formulas performed ATS; otherwise you're spinning your wheels.
                                              I agree about ATS. What I am saying is that we as a whole should continue the research to find what shows these teams' strengths or their vulnerabilities; either will help produce a winning result. After all, the $$$$ is in the spread.
                                              Do what you did to get it and don't stop just go and get it!!
                                              Comment
                                              • maxdalury
                                                Restricted User
                                                • 05-28-09
                                                • 67

                                                #24
                                                This is a start on the right track...and a learning process. Clearly doing this alone will not be profitable in the long run.
                                                Comment
                                                • laxdjock
                                                  SBR MVP
                                                  • 09-15-07
                                                  • 4074

                                                  #25
                                                  The problem you will run into with taking 7 factors into one is that each factor will influence the outcome at a different percentage, therefore you'd need to weight each factor so there is a more accurate relevance between the numbers. You really need to establish some hypothesis to see if certain factors go together, or you'll be crunching your numbers for hours without much to show for it. You'll also run into a problem with statistical significance, as your samples will have a natural limit because going year to year can really open you up to some pretty powerful confounding variables. There are some post-hoc adjustments you can make to help estimate the numbers, but that starts getting really sticky unless you are very comfortable with a stats package and/or have some home-brewed adjustments.

                                                  I've found that isolating on certain factors within a few teams is easier to manage week to week, as you can become more familiar with their numbers. Once you have a few factors, you can run them against each opponent, and then look at the spread #'s compared to those factors. ATS (basic SDs, etc) across books can also produce some interesting numbers, particularly when you factor in open and close #'s. This is where having others involved is beneficial, as there are a lot of ways to dice the information, but it takes a good approach and skill to get workable data.
                                                  Comment
                                                  • maxdalury
                                                    Restricted User
                                                    • 05-28-09
                                                    • 67

                                                    #26
                                                    Originally posted by laxdjock
                                                    The problem you will run into with taking 7 factors into one is that each factor will influence the outcome at a different percentage, therefore you'd need to weight each factor so there is a more accurate relevance between the numbers.
                                                    That's what a regression does.
                                                    Comment
                                                    • laxdjock
                                                      SBR MVP
                                                      • 09-15-07
                                                      • 4074

                                                      #27
                                                      Originally posted by maxdalury
                                                      That's what a regression does.
                                                      I know. I was trying to write a reply that was a bit easier to understand, as his initial inquiry was asking about it, and based on his replies I'm not sure if it was clear why a regression was needed. My main point was to try and establish a rationale for what you are looking for, as the statistics won't mean much without it. He could dump the data into SPSS and see what sticks, but that can be a dangerous approach to interpretation.
                                                      Comment
                                                      • maxdalury
                                                        Restricted User
                                                        • 05-28-09
                                                        • 67

                                                        #28
                                                        true. what everyone has said before: it is easy to make a model that predicts the past effectively but it is hard to predict the future effectively.
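A simple guard against a model that only predicts the past is to fit on earlier weeks and score it on later weeks it never saw. The sketch below uses a one-variable least-squares line to stay short; the records, the week cutoff, and the function names are all made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys). Returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, xs, ys):
    """Average absolute gap between actual ys and the line's predictions."""
    intercept, slope = model
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)) / len(xs)


def split_by_week(records, last_train_week):
    """records: list of (week, x, y). Earlier weeks train, later weeks test."""
    train = [r for r in records if r[0] <= last_train_week]
    test = [r for r in records if r[0] > last_train_week]
    return train, test


# Made-up records: (week, stat value, outcome).
records = [(1, 4.0, 0.40), (2, 5.0, 0.50), (3, 6.0, 0.65),
           (4, 5.5, 0.45), (5, 6.5, 0.50)]
train, test = split_by_week(records, last_train_week=3)
model = fit_line([r[1] for r in train], [r[2] for r in train])
in_sample = mean_abs_error(model, [r[1] for r in train], [r[2] for r in train])
out_of_sample = mean_abs_error(model, [r[1] for r in test], [r[2] for r in test])
print(f"in-sample error: {in_sample:.3f}  out-of-sample error: {out_of_sample:.3f}")
```

The out-of-sample number is the one that speaks to future performance; a large gap between the two is the pattern the post is warning about.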
                                                        Comment
                                                        • threeg5
                                                          SBR Sharp
                                                          • 07-18-09
                                                          • 488

                                                          #29
                                                          yes indeed

                                                          Originally posted by maxdalury
                                                          true. what everyone has said before: it is easy to make a model that predicts the past effectively but it is hard to predict the future effectively.

                                                          You can never create a number or factor in a number that estimates a "bad day" or "everything going right"

                                                          Do what you did to get it and don't stop just go and get it!!
                                                          Comment
                                                          • roasthawg
                                                            SBR MVP
                                                            • 11-09-07
                                                            • 2990

                                                            #30
                                                            Originally posted by maxdalury
                                                            true. what everyone has said before: it is easy to make a model that predicts the past effectively but it is hard to predict the future effectively.
                                                            Exactly. I can give you a hundred formulas that would've made you a millionaire if you had them in the past... going forward they're only coin flips though. Tough to beat the books with simple stats that are easily accessible to all.
                                                            Comment
                                                            • Kaplan
                                                              SBR High Roller
                                                              • 01-15-11
                                                              • 165

                                                              #31
                                                              Originally posted by maxdalury
                                                              OK. First off, R squared is a rough measure of how well your regression fits the data. You want a high R squared.

                                                              And you should not sum up all the x variables.
                                                              The point of the regression is to find the weight of each individual x variable as it affects the win percentage. By summing them you are not getting anything. I would suggest that you take one dependent variable (points scored, win percentage) and then add each of the x variables individually.

                                                              Then you will have an equation of the form y = b0 + b1*x1 + b2*x2 + ..., where y is a function of x1, x2, ... with each variable given its appropriate weight.

                                                              I am a relative newbie to using regression. Kind of addicting. How do you determine what a high or acceptable R squared is? I seem to get a lot of R squared values in the 20-30% range.
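For reference, R squared is one minus the ratio of the model's squared error to the squared error of always guessing the mean. The short sketch below computes it from actual and predicted values; the function name is hypothetical. It does not answer what counts as acceptable, since that depends on how noisy the outcome is, but it shows what a value of 20-30% means: the model removes that share of the squared error a mean-only guess would leave.

```python
def r_squared(actual, predicted):
    """1 - SS_residual / SS_total for paired actual and predicted values."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect predictions
print(r_squared([1, 2, 3], [2, 2, 2]))  # always guessing the mean
```

Computing the same figure on games held out from the fit is a more honest check than the in-sample value a regression tool reports.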
                                                              Comment