Regression analysis

  • Megaman
    SBR Rookie
    • 12-11-09
    • 23

    #1
    Regression analysis
    I'm working on a model to predict soccer scores and am having some problems.

    For example, let's say that I'm looking at season totals of a few variables to see how they affect goals scored for teams playing at home.
    Using these variables (all are season totals):
    HGF - goals scored by team when playing at home (what I want to predict)
    HCF - number of corners by team when playing at home
    HCA - number of corners against team when playing at home

    Doing linear regression on these results in HCA being insignificant and a 4.7*10^-4 significance level on HCF.

    ===============================================
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 19.59352    5.39474   3.632 0.000472 ***
    HCF          0.09459    0.05293   1.787 0.077397 .
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 8.125 on 88 degrees of freedom
    (15 observations deleted due to missingness)
    Multiple R-squared: 0.03501, Adjusted R-squared: 0.02405
    F-statistic: 3.193 on 1 and 88 DF, p-value: 0.0774
    ===============================================

    This seems all good.
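
    A quick way to double-check how the rows of such a table fit together: each t value is the estimate divided by its standard error, and with a single predictor the F-statistic is the square of that predictor's t value (up to rounding of the printed numbers). The sketch below recomputes both from the output above; the Python names are only for illustration. Note that, reading row by row, the 0.000472 p-value sits on the (Intercept) row, while the HCF row shows 0.077397, the same p-value reported for the F-test.

```python
# Numbers copied from the regression output above (goals at home ~ HCF).
rows = {
    "(Intercept)": {"estimate": 19.59352, "std_error": 5.39474, "t_reported": 3.632},
    "HCF": {"estimate": 0.09459, "std_error": 0.05293, "t_reported": 1.787},
}


def t_value(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


for name, row in rows.items():
    t = t_value(row["estimate"], row["std_error"])
    print(f"{name:12s} t = {t:.3f} (reported {row['t_reported']})")

# With one predictor, the overall F-statistic equals the squared t value
# of that predictor; the output above reports F = 3.193.
f_from_t = t_value(rows["HCF"]["estimate"], rows["HCF"]["std_error"]) ** 2
print(f"F from t^2 = {f_from_t:.3f} (reported 3.193)")
```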

    Now if I try to add these variables:
    ACF - number of corners by team when playing away
    ACA - number of corners against team when playing away

    When adding both in the regression I get this:
    ===============================================
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 32.188142   9.960041   3.232  0.00174 **
    HCF         -0.005014   0.058215  -0.086  0.93157
    ACA         -0.125732   0.054220  -2.319  0.02277 *
    ACF          0.126266   0.074806   1.688  0.09505 .
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7.628 on 86 degrees of freedom
    (15 observations deleted due to missingness)
    Multiple R-squared: 0.1689, Adjusted R-squared: 0.1399
    F-statistic: 5.825 on 3 and 86 DF, p-value: 0.001134
    ===============================================

    Now HCF has become insignificant! ACF is also insignificant; all I have left is ACA.
    Regression on ACA gives:

    ===============================================
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 46.29063    4.67482   9.902 5.71e-16 ***
    ACA         -0.17510    0.04693  -3.731 0.000337 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7.686 on 88 degrees of freedom
    (15 observations deleted due to missingness)
    Multiple R-squared: 0.1366, Adjusted R-squared: 0.1268
    F-statistic: 13.92 on 1 and 88 DF, p-value: 0.0003369
    ===============================================

    This seems to me very strange, it suggests that the number of corners
    against a team when playing away is a better predictor of goals scored at home
    than the number of corners for the team (be it at home or away).
    The R^2 value is clearly much larger in the second case also. (0.127 vs 0.024)
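
    The two figures compared here (0.127 and 0.024) are the adjusted R-squared values, which penalise R-squared for the number of predictors. As a sanity check, the sketch below recomputes them from the reported multiple R-squared with the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The sample size n = 90 is inferred from the 88 residual degrees of freedom plus two estimated coefficients; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 90  # inferred: 88 residual df + intercept + one slope

# (label, reported multiple R^2, number of predictors, reported adjusted R^2)
models = [
    ("HGF ~ HCF", 0.03501, 1, 0.02405),
    ("HGF ~ HCF + ACA + ACF", 0.1689, 3, 0.1399),
    ("HGF ~ ACA", 0.1366, 1, 0.1268),
]

for label, r2, p, reported in models:
    adj = adjusted_r_squared(r2, N_OBS, p)
    print(f"{label:24s} adjusted R^2 = {adj:.4f} (reported {reported})")
```

    The recomputed values agree with the reported ones up to rounding of the printed R-squared.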

    I see this kind of thing all the time in my analyses: variables that are significant
    become insignificant when I add other variables. Variables that I think would be
    insignificant throw out previously significant variables.

    How should I handle this?
  • adlai
    SBR Wise Guy
    • 03-11-10
    • 778

    #2
    i don't watch any soccer mind you, but it seems like you might have an endogeneity issue. your other problem, and again i know very little about soccer, but there is hardly any variation in your dataset, which makes a simple ordinary least squares regression absolutely worthless.

    i've said it a thousand times, econometrics has no useful application in sports betting. i'm an applied econometrician, it is what i do for a living. trust me, you're wasting your time.

    also, your r2 is moving from .127 to .024... so it's insignificant regardless. sports statistics in a regression sense are 100% coincidence, there is no useful trend in any of this data.
    • donjuan
      SBR MVP
      • 08-29-07
      • 3993

      #3
      Originally posted by adlai
      i've said it a thousand times, econometrics has no useful application in sports betting. i'm an applied econometrician, it is what i do for a living. trust me, you're wasting your time.
      I'm sure econometrics has little use in sports betting. Statistics, however, has a lot of use. If you think otherwise you clearly have no clue what you are talking about and should stop stealing from whatever company pays you to sit and twiddle your thumbs.
      • roasthawg
        SBR MVP
        • 11-09-07
        • 2990

        #4
        Originally posted by Megaman
        This seems to me very strange, it suggests that the number of corners
        against a team when playing away is a better predictor of goals scored at home
        than the number of corners for the team (be it at home or away).
        The R^2 value is clearly much larger in the second case also. (0.127 vs 0.024)
        I'm seeing it opposite of this I believe... I think the HCF variable is the "most" significant one according to these results. The fact that it was significant in the first regression proves this to me as no other variables were significant in the remaining tests.
        • adlai
          SBR Wise Guy
          • 03-11-10
          • 778

          #5
          Originally posted by donjuan
          I'm sure econometrics has little use in sports betting. Statistics, however, has a lot of use. If you think otherwise you clearly have no clue what you are talking about and should stop stealing from whatever company pays you to sit and twiddle your thumbs.
          well this gentleman is dealing with an econometric model. and i'm a contractual employee of several different companies, i do solid work buddy.

          sports betting itself is statistics. why do you think i was initially attracted to the business?
          • donjuan
            SBR MVP
            • 08-29-07
            • 3993

            #6
            I hope they get analysis better than this:

            Sports betting and handicapping forum: discuss picks, odds, and predictions for upcoming games and results on latest bets.


            It's pretty LOL when people with lots of training in applied math/stats can't figure out sports betting.
            • Megaman
              SBR Rookie
              • 12-11-09
              • 23

              #7
              Originally posted by roasthawg
              I'm seeing it opposite of this I believe... I think the HCF variable is the "most" significant one according to these results. The fact that it was significant in the first regression proves this to me as no other variables were significant in the remaining tests.
              What do you mean by "as no other variables were significant in the remaining tests"?
              Isn't ACA significant in both of the following tests? (sign. level 0.02277 and 0.00033)

              If I had gone the other way around and started with ACF,ACA and then tried to add HCF,HCA I would think HCF to be insignificant.
              • Megaman
                SBR Rookie
                • 12-11-09
                • 23

                #8
                Originally posted by adlai
                i don't watch any soccer mind you, but it seems like you might have an endogeneity issue. your other problem, and again i know very little about soccer, but there is hardly any variation in your dataset, which makes a simple ordinary least squares regression absolutely worthless.
                Have to think about this one.

                Originally posted by adlai
                i've said it a thousand times, econometrics has no useful application in sports betting. i'm an applied econometrician, it is what i do for a living. trust me, you're wasting your time.
                Maybe I don't understand what econometrics means, but I thought econometrics was about predicting future events/results based on statistics of previous events/results?
                If so, I don't understand how statistics can be useful but econometrics useless.

                Originally posted by adlai
                also, your r2 is moving from .127 to .024... so it's insignificant regardless. sports statistics in a regression sense are 100% coincidence, there is no useful trend in any of this data.
                It moves from .024 to .127, what is insignificant about it? Are the values too small? Is the increase not large enough?
                • Wrecktangle
                  SBR MVP
                  • 03-01-09
                  • 1524

                  #9
                  The value of straight regression in sports modeling has been mostly pounded out of the line since everyone does this first. If you are to find value in "econometric" type modeling, you need to develop new approaches. These days with all the information on the net and easily built dbs (no building "by hand" from the newspapers) you must do this or find your efforts in vain.
                  • adlai
                    SBR Wise Guy
                    • 03-11-10
                    • 778

                    #10
                    Originally posted by donjuan
                    I hope they get analysis better than this:

                    Sports betting and handicapping forum: discuss picks, odds, and predictions for upcoming games and results on latest bets.


                    It's pretty LOL when people with lots of training in applied math/stats can't figure out sports betting.
                    haha... well that bet had another winner last night with the red wings.

                    my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year. look at the results before you bash buddy. but hey, these are just results, i'd rather make up some bullshit regression model to let me forecast my picks.

                    knowing the failure of statistics in sports betting is why i have been so successful over the last decade.
                    • adlai
                      SBR Wise Guy
                      • 03-11-10
                      • 778

                      #11
                      Originally posted by Megaman
                      Have to think about this one.

                      Maybe I don't understand what econometrics means, but I thought econometrics was about predicting future events/results based on statistics of previous events/results?
                      If so, I don't understand how statistics can be useful but econometrics useless.

                      It moves from .024 to .127, what is insignificant about it? Are the values too small? Is the increase not large enough?
                      r2 is a simple goodness of fit measure. it ranges from 0-1, 1 being the better of the two. if this is your only measure of the goodness of fit of this model, no, it is not good.
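
                      For reference, R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with a made-up five-point dataset (not the poster's data) and a least-squares line; the function names are illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """R^2 = 1 - SS_residual / SS_total for the fitted line."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative data, not taken from the thread.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
print(f"R^2 = {r_squared(xs, ys):.3f}")
```

                      A low R^2 says the predictors explain little of the variance; that is a separate question from whether an individual coefficient is statistically significant.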

                      and econometrics is the study or practice of regression analysis.
                      • roasthawg
                        SBR MVP
                        • 11-09-07
                        • 2990

                        #12
                        Originally posted by Megaman
                        What do you mean by "as no other variables were significant in the remaining tests"?
                        Isn't ACA significant in both of the following tests? (sign. level 0.02277 and 0.00033)

                        If I had gone the other way around and started with ACF,ACA and then tried to add HCF,HCA I would think HCF to be insignificant.
                        My bad, I was highly medicated when I replied last night. Yeah, apparently the level of competition is the more important factor here.
                        • durito
                          SBR Posting Legend
                          • 07-03-06
                          • 13173

                          #13
                          Originally posted by adlai
                          haha... well that bet had another winner last night with the red wings.

                          my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year. look at the results before you bash buddy. but hey, these are just results, i'd rather make up some bullshit regression model to let me forecast my picks.

                          knowing the failure of statistics in sports betting is why i have been so successful over the last decade.
                          define successful
                          • skrtelfan
                            SBR MVP
                            • 10-09-08
                            • 1913

                            #14
                            Originally posted by adlai
                            haha... well that bet had another winner last night with the red wings.
                            "Another" winner? That angle went 0-4 last year!

                            my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year.
                            And that one was 0-2 last year.
                            • donjuan
                              SBR MVP
                              • 08-29-07
                              • 3993

                              #15
                              Originally posted by adlai
                              haha... well that bet had another winner last night with the red wings.

                              my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year. look at the results before you bash buddy. but hey, these are just results, i'd rather make up some bullshit regression model to let me forecast my picks.

                              knowing the failure of statistics in sports betting is why i have been so successful over the last decade.
                              For an econometrician to back up his statement by saying some angle is 6-0 this year is just lolololololololol. Would love to hear why you think this will be +ev going forward. Like Durito, I'd also be interested to hear what you define as successful.
                              • adlai
                                SBR Wise Guy
                                • 03-11-10
                                • 778

                                #16
                                Originally posted by skrtelfan
                                "Another" winner? That angle went 0-4 last year!

                                And that one was 0-2 last year.
                                so, in the 2 years you want to compare, and we aren't even halfway through the nhl playoffs, the record stands at 6-2. you are correct, this is an awful return on your investment.

                                the betting on a team facing elimination on home court is more powerful in the nba. but over the long run has proven to be profitable in both the mlb and nhl as well. like i said before, you are all much more intelligent than me with your fancy ti83 calculators and meaningless data. i knew i would get some crap for posting the comment... i regret it, i admit. if you don't want the advice, don't take it. but i'll be taking the flyers tonight.

                                people in these forums always want to bash other people's ideas on strategy. this is not the only take on betting the playoffs, it is one of many that i use. maybe i should have worded it, "never bet against," would this make anyone more happy... probably not. anyway, good luck with your overworked statistical crap, keep finding more sophisticated ways of losing money.

                                i won't be checking this thread again.

                                gl.
                                • ForgetWallStreet
                                  SBR Sharp
                                  • 04-27-07
                                  • 342

                                  #17
                                  Originally posted by adlai
                                  so, in the 2 years you want to compare, and we aren't even halfway through the nhl playoffs, the record stands at 6-2. you are correct, this is an awful return on your investment.

                                  the betting on a team facing elimination on home court is more powerful in the nba. but over the long run has proven to be profitable in both the mlb and nhl as well. like i said before, you are all much more intelligent than me with your fancy ti83 calculators and meaningless data. i knew i would get some crap for posting the comment... i regret it, i admit. if you don't want the advice, don't take it. but i'll be taking the flyers tonight.

                                  people in these forums always want to bash other people's ideas on strategy. this is not the only take on betting the playoffs, it is one of many that i use. maybe i should have worded it, "never bet against," would this make anyone more happy... probably not. anyway, good luck with your overworked statistical crap, keep finding more sophisticated ways of losing money.

                                  i won't be checking this thread again.

                                  gl.
                                  I feel legitimately bad for your employer.
                                  • skrtelfan
                                    SBR MVP
                                    • 10-09-08
                                    • 1913

                                    #18
                                    Originally posted by adlai
                                    so, in the 2 years you want to compare, and we aren't even halfway through the nhl playoffs, the record stands at 6-2. you are correct, this is an awful return on your investment.
                                    No, that would be 7-6 for your two angles. You boasted about "another winner" from two angles that didn't even win a single game last season!

                                    the betting on a team facing elimination on home court is more powerful in the nba.
                                    Yes, it's so powerful that the team facing elimination is 5-16 straight up and 5-13-3 ATS since 2002.

                                    people in these forums always want to bash other people's ideas on strategy. this is not the only take on betting the playoffs, it is one of many that i use. maybe i should have worded it, "never bet against," would this make anyone more happy... probably not. anyway, good luck with your overworked statistical crap, keep finding more sophisticated ways of losing money.

                                    i won't be checking this thread again.

                                    gl.
                                    I wouldn't check the thread again either if I was shown to be full of crap.
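
                                    To put records like these in perspective, one rough check is to ask how often a coin-flip bettor would do at least as well. A minimal sketch, assuming independent bets that each win with probability 0.5 (an assumption made only for illustration; it ignores the vig and pushes):

```python
from math import comb


def prob_at_least(wins, bets, p_win=0.5):
    """P(X >= wins) for X ~ Binomial(bets, p_win)."""
    return sum(
        comb(bets, k) * p_win**k * (1.0 - p_win) ** (bets - k)
        for k in range(wins, bets + 1)
    )


# Records quoted in the thread, as (wins, losses).
records = {"6-0 this year": (6, 0), "7-6 over two years": (7, 6)}

for label, (wins, losses) in records.items():
    p = prob_at_least(wins, wins + losses)
    print(f"{label}: P(at least {wins} wins in {wins + losses}) = {p:.4f}")
```

                                    A 6-0 run looks unlikely in isolation, but a coin flip matches or beats a 7-6 record exactly half the time.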
                                    • Wrecktangle
                                      SBR MVP
                                      • 03-01-09
                                      • 1524

                                      #19
                                      I don't know why it bothers me so much when a "math/stat" guy goes over to the dark side of angle plays, but it does.
                                      • mathdotcom
                                        SBR Posting Legend
                                        • 03-24-08
                                        • 11689

                                        #20
                                        Megaman,

                                        Your dependent variable is a count variable. It is not continuous. You should not be running OLS.

                                        The problems with count models are mostly the same as those with probit/logit models for binary outcomes. A probit/logit model a) is very likely to give you insignificant coefficients on variables other than market price (which doesn't help you), and b) has interpretation issues. The coefficients don't relate nicely to things you're interested in like "bias in market price", etc.
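
                                        For readers who want to see what a count model looks like, here is a minimal sketch of Poisson regression with a log link, fitted by Newton's method in plain Python. The data are made up for illustration, the function name is hypothetical, and in practice one would use a statistics package rather than hand-rolled code.

```python
from math import exp, log


def fit_poisson(xs, ys, n_iter=50, tol=1e-10):
    """Fit log(mu) = a + b*x by Newton's method; returns (a, b)."""
    a = log(sum(ys) / len(ys))  # start at the intercept-only fit
    b = 0.0
    for _ in range(n_iter):
        mus = [exp(a + b * x) for x in xs]
        # Score vector (gradient of the log-likelihood).
        g_a = sum(y - mu for y, mu in zip(ys, mus))
        g_b = sum(x * (y - mu) for x, y, mu in zip(xs, ys, mus))
        # Fisher information (negative Hessian), a 2x2 matrix.
        h_aa = sum(mus)
        h_ab = sum(x * mu for x, mu in zip(xs, mus))
        h_bb = sum(x * x * mu for x, mu in zip(xs, mus))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * step = g.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


# Made-up data: x could be a team-strength rating, y goals in one match.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 1, 1, 2, 1, 3, 2, 4]

a, b = fit_poisson(xs, ys)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
print(f"expected count at x = 2.0: {exp(a + 2.0 * b):.2f}")
```

                                        The fitted coefficients are on the log scale, so exp(b) is the multiplicative change in the expected count per unit of x.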

                                        Also as others have said you seem to have quite a few endogeneity problems.
                                        • mathdotcom
                                          SBR Posting Legend
                                          • 03-24-08
                                          • 11689

                                          #21
                                          Megaman, I'm confused about a couple other things. Your dependent variable is number of goals scored by home team. How do you even have as an independent variable the number of goals scored when away? Is this the number of goals they scored in their last away game, or something like that? Don't trust any of your results sir, at least not so far.
                                          • Megaman
                                            SBR Rookie
                                            • 12-11-09
                                            • 23

                                            #22
                                            Originally posted by mathdotcom
                                            Megaman,

                                            Your dependent variable is a count variable. It is not continuous. You should not be running OLS.
                                            What do you suggest I do instead then?

                                            Originally posted by mathdotcom
                                            Megaman, I'm confused about a couple other things. Your dependent variable is number of goals scored by home team. How do you even have as an independent variable the number of goals scored when away? Is this the number of goals they scored in their last away game, or something like that? Don't trust any of your results sir, at least not so far.
                                            All data in my example are final season totals. Right now I'm mostly playing around and trying to get a hang of the regression process. Learning how regression works, the modeling process, what variables might be significant and so on. That is mostly why I asked the question in the first place.
                                            • mathdotcom
                                              SBR Posting Legend
                                              • 03-24-08
                                              • 11689

                                              #23
                                              No offense but you really need to pick up a book on regression analysis. You need to understand:

                                              - endogeneity
                                              - omitted variables bias

                                              You can't just add or remove variables and see what happens. If you don't understand what the program is computing for you, you're going to run into issues like the ones you've posted here about and be left clueless. I can't teach you regression analysis in the thread.
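
                                              The omitted-variable point can be seen in a small simulation. In the sketch below (synthetic data, hypothetical names x1 and x2), y depends only on x2, but x1 is correlated with x2. Regressed alone, x1 picks up a large coefficient; once x2 is included, the x1 coefficient shrinks toward zero. That is a pattern similar to HCF losing significance once ACA and ACF were added, though the simulation says nothing about what actually drives the soccer data.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(us, vs):
    """Covariance with divisor n (the divisor cancels in the ratios below)."""
    mu, mv = mean(us), mean(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / len(us)


def slope_alone(x, y):
    """Slope from regressing y on x only (with intercept)."""
    return cov(x, y) / cov(x, x)


def slopes_joint(x1, x2, y):
    """Slopes from regressing y on x1 and x2 together (with intercept)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [v + rng.gauss(0.0, 0.5) for v in x2]       # correlated with x2
y = [2.0 * v + rng.gauss(0.0, 1.0) for v in x2]  # depends on x2 only

b1_alone = slope_alone(x1, y)
b1_joint, b2_joint = slopes_joint(x1, x2, y)
print(f"x1 alone:   b1 = {b1_alone:.2f}")
print(f"x1 with x2: b1 = {b1_joint:.2f}, b2 = {b2_joint:.2f}")
```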