  1. #1
    Megaman
    Join Date: 12-11-09
    Posts: 23
    Betpoints: 384

    Regression analysis

    I'm working on a model to predict soccer scores and am having some problems.

    For example, let's say that I'm looking at season totals of a few variables to see how they affect goals scored by teams playing at home.
    I'm using these variables (all are season totals):
    HGF - goals scored by team when playing at home (what I want to predict)
    HCF - number of corners by team when playing at home
    HCA - number of corners against team when playing at home

    Doing linear regression on these results in HCA being insignificant, so I dropped it. HCF alone gets a p-value of 0.077 (the 4.7*10^-4 in the output is the intercept's p-value):

    ===============================================
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 19.59352    5.39474   3.632 0.000472 ***
    HCF          0.09459    0.05293   1.787 0.077397 .
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 8.125 on 88 degrees of freedom
    (15 observations deleted due to missingness)
    Multiple R-squared: 0.03501, Adjusted R-squared: 0.02405
    F-statistic: 3.193 on 1 and 88 DF, p-value: 0.0774
    ===============================================

    This seems all good.

    Now if I try to add these variables:
    ACF - number of corners by team when playing away
    ACA - number of corners against team when playing away

    When adding both in the regression I get this:
    ===============================================
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 32.188142   9.960041   3.232  0.00174 **
    HCF         -0.005014   0.058215  -0.086  0.93157
    ACA         -0.125732   0.054220  -2.319  0.02277 *
    ACF          0.126266   0.074806   1.688  0.09505 .
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7.628 on 86 degrees of freedom
    (15 observations deleted due to missingness)
    Multiple R-squared: 0.1689, Adjusted R-squared: 0.1399
    F-statistic: 5.825 on 3 and 86 DF, p-value: 0.001134
    ===============================================

    Now HCF has become insignificant! ACF is also insignificant at the 5% level, so all I have left is ACA.
    Regression on ACA alone gives:

    ===============================================
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 46.29063    4.67482   9.902 5.71e-16 ***
    ACA         -0.17510    0.04693  -3.731 0.000337 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7.686 on 88 degrees of freedom
    (15 observations deleted due to missingness)
    Multiple R-squared: 0.1366, Adjusted R-squared: 0.1268
    F-statistic: 13.92 on 1 and 88 DF, p-value: 0.0003369
    ===============================================

    This seems very strange to me: it suggests that the number of corners
    against a team when playing away is a better predictor of goals scored at home
    than the number of corners for the team (be it at home or away).
    The adjusted R^2 is also clearly much larger for ACA alone than for HCF alone (0.127 vs 0.024).

    I see this kind of thing all the time in my analyses: variables that are significant
    become insignificant when I add other variables, and variables that I expected to be
    insignificant push out previously significant ones.

    How should I handle this?
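
One plausible reason for the flip (an assumption, since the thread never shows the raw data) is that HCF and ACA are both noisy proxies for the same underlying team strength. The sketch below uses synthetic data, with made-up names `x1`, `x2`, `strength` and made-up coefficients, to show how a predictor that looks useful on its own can shrink once a better-correlated proxy enters the model.

```python
import math
import random


def invert(m):
    """Gauss-Jordan inverse of a small square matrix."""
    k = len(m)
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def ols(rows, y):
    """OLS with an intercept. Returns (coefficients, t_values), intercept first."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t = [beta[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, t


def simulate(n, seed):
    """Hypothetical data: x1 and x2 both track a hidden strength; y depends only on strength."""
    rng = random.Random(seed)
    strength = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [s + rng.gauss(0, 2.0) for s in strength]    # noisy proxy (think HCF)
    x2 = [-s + rng.gauss(0, 0.5) for s in strength]   # sharper proxy (think ACA)
    y = [25 + 5 * s + rng.gauss(0, 3.0) for s in strength]
    alone = ols([(a,) for a in x1], y)
    joint = ols(list(zip(x1, x2)), y)
    return alone, joint


alone, joint = simulate(2000, seed=1)
print("x1 alone:   coef %.3f, t %.2f" % (alone[0][1], alone[1][1]))
print("x1 with x2: coef %.3f, t %.2f" % (joint[0][1], joint[1][1]))
```

With 2000 simulated teams the shrunken coefficient may still be measurable; with roughly 90 observations, as in the output above, the same shrinkage could easily push it below significance. Whether this is what happens in the real data would need checking, for instance by looking at the correlation between HCF and ACA.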

  2. #2
    adlai
    Join Date: 03-11-10
    Posts: 778

    i don't watch any soccer mind you, but it seems like you might have an endogeneity issue. your other problem, and again i know very little about soccer, but there is hardly any variation in your dataset, which makes a simple ordinary least squares regression absolutely worthless.

    i've said it a thousand times, econometrics has no useful application in sports betting. i'm an applied econometrician, it is what i do for a living. trust me, you're wasting your time.

    also, your r2 is moving from .127 to .024... so it's insignificant regardless. sports statistics in a regression sense are 100% coincidence, there is no useful trend in any of this data.

  3. #3
    donjuan
    Join Date: 08-29-07
    Posts: 3,993
    Betpoints: 7537

    Quote Originally Posted by adlai View Post
    i've said it a thousand times, econometrics has no useful application in sports betting. i'm an applied econometrician, it is what i do for a living. trust me, you're wasting your time.
    I'm sure econometrics has little use in sports betting. Statistics, however, has a lot of use. If you think otherwise you clearly have no clue what you are talking about and should stop stealing from whatever company pays you to sit and twiddle your thumbs.

  4. #4
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Quote Originally Posted by Megaman View Post
    This seems very strange to me: it suggests that the number of corners
    against a team when playing away is a better predictor of goals scored at home
    than the number of corners for the team (be it at home or away).
    The adjusted R^2 is also clearly much larger for ACA alone than for HCF alone (0.127 vs 0.024).
    I'm seeing it opposite of this I believe... I think the HCF variable is the "most" significant one according to these results. The fact that it was significant in the first regression proves this to me as no other variables were significant in the remaining tests.

  5. #5
    adlai
    Join Date: 03-11-10
    Posts: 778

    Quote Originally Posted by donjuan View Post
    I'm sure econometrics has little use in sports betting. Statistics, however, has a lot of use. If you think otherwise you clearly have no clue what you are talking about and should stop stealing from whatever company pays you to sit and twiddle your thumbs.
    well this gentleman is dealing with an econometric model. and i'm a contractual employee of several different companies, i do solid work buddy.

    sports betting itself is statistics. why do you think i was initially attracted to the business?

  6. #6
    donjuan
    Join Date: 08-29-07
    Posts: 3,993
    Betpoints: 7537

    I hope they get analysis better than this:

    http://www.sportsbookreview.com/forum/players-ta...al-sports.html

    It's pretty LOL when people with lots of training in applied math/stats can't figure out sports betting.

  7. #7
    Megaman
    Join Date: 12-11-09
    Posts: 23
    Betpoints: 384

    Quote Originally Posted by roasthawg View Post
    I'm seeing it opposite of this I believe... I think the HCF variable is the "most" significant one according to these results. The fact that it was significant in the first regression proves this to me as no other variables were significant in the remaining tests.
    What do you mean by "as no other variables were significant in the remaining tests"?
    Isn't ACA significant in both of the following tests? (p-values 0.02277 and 0.00034)

    If I had gone the other way around and started with ACF, ACA and then tried to add HCF, HCA, I would have concluded that HCF is insignificant.

  8. #8
    Megaman
    Join Date: 12-11-09
    Posts: 23
    Betpoints: 384

    Quote Originally Posted by adlai View Post
    i don't watch any soccer mind you, but it seems like you might have an endogeneity issue. your other problem, and again i know very little about soccer, but there is hardly any variation in your dataset, which makes a simple ordinary least squares regression absolutely worthless.
    Have to think about this one.

    Quote Originally Posted by adlai View Post
    i've said it a thousand times, econometrics has no useful application in sports betting. i'm an applied econometrician, it is what i do for a living. trust me, you're wasting your time.
    Maybe I don't understand what econometrics means, but I thought econometrics was about predicting future events/results based on statistics of previous events/results?
    If so, I don't understand how statistics can be useful but econometrics useless.

    Quote Originally Posted by adlai View Post
    also, your r2 is moving from .127 to .024... so it's insignificant regardless. sports statistics in a regression sense are 100% coincidence, there is no useful trend in any of this data.
    It moves from .024 to .127, what is insignificant about it? Are the values too small? Is the increase not large enough?

  9. #9
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    The value of straight regression in sports modeling has been mostly pounded out of the line since everyone does this first. If you are to find value in "econometric" type modeling, you need to develop new approaches. These days with all the information on the net and easily built dbs (no building "by hand" from the newspapers) you must do this or find your efforts in vain.

  10. #10
    adlai
    Join Date: 03-11-10
    Posts: 778

    Quote Originally Posted by donjuan View Post
    I hope they get analysis better than this:

    http://www.sportsbookreview.com/forum/players-ta...al-sports.html

    It's pretty LOL when people with lots of training in applied math/stats can't figure out sports betting.
    haha... well that bet had another winner last night with the red wings.

    my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year. look at the results before you bash buddy. but hey, these are just results, i'd rather make up some bullshit regression model to let me forecast my picks.

    knowing the failure of statistics in sports betting is why i have been so successful over the last decade.

  11. #11
    adlai
    Join Date: 03-11-10
    Posts: 778

    Quote Originally Posted by Megaman View Post
    Have to think about this one.



    Maybe I don't understand what econometrics means, but I thought econometrics was about predicting future events/results based on statistics of previous events/results?
    If so, I don't understand how statistics can be useful but econometrics useless.



    It moves from .024 to .127, what is insignificant about it? Are the values too small? Is the increase not large enough?
    r2 is a simple goodness of fit measure. it ranges from 0-1, 1 being the better of the two. if this is your only measure of the goodness of fit of this model, no, it is not good.

    and econometrics is the study or practice of regression analysis.
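
The 0.024 and 0.127 quoted in this exchange are the adjusted R-squared values from the R output in the first post. As a check on how they relate to the multiple R-squared, here is the standard adjustment formula applied to the posted figures. The sample size n = 90 is inferred from the 88 residual degrees of freedom plus two estimated parameters, and the function name is only illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# (label, multiple R^2 as posted, number of predictors)
models = [
    ("HCF only", 0.03501, 1),
    ("HCF + ACA + ACF", 0.1689, 3),
    ("ACA only", 0.1366, 1),
]

for label, r2, p in models:
    print("%-16s R2 %.4f  adjusted %.4f" % (label, r2, adjusted_r2(r2, 90, p)))
```

The adjustment penalizes each extra predictor, which is why the three-variable model gains less in adjusted terms than its raw R-squared suggests. Small differences from R's printed values are possible because the inputs here are already rounded.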

  12. #12
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Quote Originally Posted by Megaman View Post
    What do you mean by "as no other variables were significant in the remaining tests"?
    Isn't ACA significant in both of the following tests? (p-values 0.02277 and 0.00034)

    If I had gone the other way around and started with ACF, ACA and then tried to add HCF, HCA, I would have concluded that HCF is insignificant.
    My bad, I was highly medicated when I replied last night. Yeah, apparently the level of competition is the more important factor here.

  13. #13
    durito
    escarabajo negro
    Join Date: 07-03-06
    Posts: 13,173
    Betpoints: 438

    Quote Originally Posted by adlai View Post
    haha... well that bet had another winner last night with the red wings.

    my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year. look at the results before you bash buddy. but hey, these are just results, i'd rather make up some bullshit regression model to let me forecast my picks.

    knowing the failure of statistics in sports betting is why i have been so successful over the last decade.
    define successful

  14. #14
    skrtelfan
    Join Date: 10-09-08
    Posts: 1,913
    Betpoints: 3337

    Quote Originally Posted by adlai View Post
    haha... well that bet had another winner last night with the red wings.
    "Another" winner? That angle went 0-4 last year!

    my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year.
    And that one was 0-2 last year.

  15. #15
    donjuan
    Join Date: 08-29-07
    Posts: 3,993
    Betpoints: 7537

    Quote Originally Posted by adlai View Post
    haha... well that bet had another winner last night with the red wings.

    my second piece of advice, always bet for a team in game 2 that lost game 1 on their home ice, is 6-0 this year. look at the results before you bash buddy. but hey, these are just results, i'd rather make up some bullshit regression model to let me forecast my picks.

    knowing the failure of statistics in sports betting is why i have been so successful over the last decade.
    For an econometrician to back up his statement by saying some angle is 6-0 this year is just lolololololololol. Would love to hear why you think this will be +ev going forward. Like Durito, I'd also be interested to hear what you define as successful.

  16. #16
    adlai
    Join Date: 03-11-10
    Posts: 778

    Quote Originally Posted by skrtelfan View Post
    "Another" winner? That angle went 0-4 last year!



    And that one was 0-2 last year.
    so, in the 2 years you want to compare, and we aren't even halfway through the nhl playoffs, the record stands at 6-2. you are correct, this is an awful return on your investment.

    the betting on a team facing elimination on home court is more powerful in the nba. but over the long run has proven to be profitable in both the mlb and nhl as well. like i said before, you are all much more intelligent than me with your fancy ti83 calculators and meaningless data. i knew i would get some crap for posting the comment... i regret it, i admit. if you don't want the advice, don't take it. but i'll be taking the flyers tonight.

    people in these forums always want to bash other people's ideas on strategy. this is not the only take on betting the playoffs, it is one of many that i use. maybe i should have worded it, "never bet against," would this make anyone more happy... probably not. anyway, good luck with your overworked statistical crap, keep finding more sophisticated ways of losing money.

    i won't be checking this thread again.

    gl.

  17. #17
    ForgetWallStreet
    Join Date: 04-27-07
    Posts: 342

    Quote Originally Posted by adlai View Post
    so, in the 2 years you want to compare, and we aren't even halfway through the nhl playoffs, the record stands at 6-2. you are correct, this is an awful return on your investment.

    the betting on a team facing elimination on home court is more powerful in the nba. but over the long run has proven to be profitable in both the mlb and nhl as well. like i said before, you are all much more intelligent than me with your fancy ti83 calculators and meaningless data. i knew i would get some crap for posting the comment... i regret it, i admit. if you don't want the advice, don't take it. but i'll be taking the flyers tonight.

    people in these forums always want to bash other people's ideas on strategy. this is not the only take on betting the playoffs, it is one of many that i use. maybe i should have worded it, "never bet against," would this make anyone more happy... probably not. anyway, good luck with your overworked statistical crap, keep finding more sophisticated ways of losing money.

    i won't be checking this thread again.

    gl.
    I feel legitimately bad for your employer.

  18. #18
    skrtelfan
    Join Date: 10-09-08
    Posts: 1,913
    Betpoints: 3337

    Quote Originally Posted by adlai View Post
    so, in the 2 years you want to compare, and we aren't even halfway through the nhl playoffs, the record stands at 6-2. you are correct, this is an awful return on your investment.
    No, that would be 7-6 for your two angles. You boasted about "another winner" from two angles that didn't even win a single game last season!

    the betting on a team facing elimination on home court is more powerful in the nba.
    Yes, it's so powerful that the team facing elimination is 5-16 straight up and 5-13-3 ATS since 2002.

    people in these forums always want to bash other people's ideas on strategy. this is not the only take on betting the playoffs, it is one of many that i use. maybe i should have worded it, "never bet against," would this make anyone more happy... probably not. anyway, good luck with your overworked statistical crap, keep finding more sophisticated ways of losing money.

    i won't be checking this thread again.

    gl.
    I wouldn't check the thread again either if I was shown to be full of crap.
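
To put the 6-0 and 7-6 records in perspective, a quick binomial calculation helps. This sketch assumes every bet is an independent coin flip (p = 0.5) and ignores the vig; both are simplifications that are not taken from the thread, and the function name is illustrative.

```python
from math import comb


def prob_at_least(wins, bets, p=0.5):
    """P(X >= wins) for X ~ Binomial(bets, p)."""
    return sum(comb(bets, k) * p**k * (1 - p) ** (bets - k) for k in range(wins, bets + 1))


# Chance of a perfect 6-0 run from pure luck
print("6-0 by chance: %.4f" % prob_at_least(6, 6))
# Chance of doing at least as well as the combined 7-6 record
print("7+ wins in 13: %.4f" % prob_at_least(7, 13))
```

Under the coin-flip assumption a 6-0 run has probability 1/64, about 1.6%, which is not rare when many angles are being tried, and at least 7 wins in 13 has probability exactly one half.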

  19. #19
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    I don't know why it bothers me so much when a "math/stat" guy goes over to the dark side of angle plays, but it does.

  20. #20
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Megaman,

    Your dependent variable is a count variable. It is not continuous. You should not be running OLS.

    The problems with count models are mostly the same as those with probit/logit models for binary outcomes. A probit/logit model a) is very likely to give you insignificant coefficients on variables other than market price (which doesn't help you), and b) has interpretation issues. The coefficients don't relate nicely to things you're interested in like "bias in market price", etc.

    Also as others have said you seem to have quite a few endogeneity problems.
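
The post does not name a specific alternative to OLS. One common choice for count outcomes is Poisson regression with a log link, so the following is a sketch under that assumption, fitted by Newton-Raphson on synthetic data. The function names, the single standardized predictor, and the true coefficients 3.0 and 0.3 are all made up for illustration.

```python
import math
import random


def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = a + b*x by Newton-Raphson. Returns (a, b)."""
    a, b = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information: negative Hessian of the log-likelihood
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for means of a few dozen."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(42)
x = [rng.uniform(-1, 1) for _ in range(500)]  # hypothetical standardized predictor
y = [poisson_draw(rng, math.exp(3.0 + 0.3 * xi)) for xi in x]  # simulated goal totals
a_hat, b_hat = fit_poisson(x, y)
print("intercept %.3f, slope %.3f" % (a_hat, b_hat))
```

The slope is on the log scale: a one-unit increase in x multiplies the expected count by exp(b). In practice a library routine such as R's `glm(..., family = poisson)` would also report standard errors, which this sketch omits.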

  21. #21
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Megaman, I'm confused about a couple other things. Your dependent variable is number of goals scored by home team. How do you even have as an independent variable the number of goals scored when away? Is this the number of goals they scored in their last away game, or something like that? Don't trust any of your results sir, at least not so far.

  22. #22
    Megaman
    Join Date: 12-11-09
    Posts: 23
    Betpoints: 384

    Quote Originally Posted by mathdotcom View Post
    Megaman,

    Your dependent variable is a count variable. It is not continuous. You should not be running OLS.
    What do you suggest I do instead then?

    Quote Originally Posted by mathdotcom View Post
    Megaman, I'm confused about a couple other things. Your dependent variable is number of goals scored by home team. How do you even have as an independent variable the number of goals scored when away? Is this the number of goals they scored in their last away game, or something like that? Don't trust any of your results sir, at least not so far.
    All data in my example are final season totals. Right now I'm mostly playing around and trying to get a hang of the regression process. Learning how regression works, the modeling process, what variables might be significant and so on. That is mostly why I asked the question in the first place.

  23. #23
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    No offense but you really need to pick up a book on regression analysis. You need to understand:

    - endogeneity
    - omitted variables bias

    You can't just add or remove variables and see what happens. If you don't understand what the program is computing for you, you're going to run into issues like the ones you've posted here about and be left clueless. I can't teach you regression analysis in the thread.
