1. #1
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    Regression Help

    I'm performing a linear regression over multiple years worth of data.

    I know of a few categories that really make one team have a significant advantage over another team.

    The issue is that in the regression these categories come out with higher p-values than I'd like, meaning their significance over the games evaluated is low.

    This is because most teams, and therefore most outcomes, do not have these advantages.

    Also, the coefficients for these categories come out with the opposite sign from what they ought to have. For example, if a team is very good in a category where a higher value is better, say shots on goal (I don't think it's that simple, though), you would expect the coefficient in the regression to be positive (more shots = more goals), but it actually comes out negative. So the teams that are better in this category are hurt more than they should be.

    Any help on how to fix this and/or incorporate these categories? I'm being vague on purpose, but hopefully the gist is there.

    Thanks.

  2. #2
    statnerds
    Put me in coach
    statnerds's Avatar Become A Pro!
    Join Date: 09-23-09
    Posts: 4,047
    Betpoints: 103

    two things. you acknowledge the lack of specificity, which limits how much input i can give. guys who are further advanced might be able to offer insight with limited info. sorry.

    the other thing is you reminded me of Jaguar Sports, oh how i miss them. they were the only book i ever found that listed total shots as a prop. used to cash on those a lot. prob why no other book did/does them.

  3. #3
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    Let me try another example.

    Three NFL teams score at least 1 TD on defense or special teams, every game, no matter what.

    The 29 other teams rarely score these TDs, so when performing a regression to see how stats contribute to points scored, it's likely that this category would have a negative multiplier (coefficient), since 29/32 teams do not have these TDs contributing to their overall points.

    But this stat sets these teams above the rest and cannot be discarded.
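    One way to check this intuition is on synthetic data. The sketch below assumes 32 teams with 16 games each, three of which carry a made-up 7-point bonus from return TDs; the yards effect, noise level, and variable names are all hypothetical. In this setup the rare indicator gets a noisy but positive coefficient, which suggests rarity alone widens the standard error rather than flipping the sign.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teams, n_games = 32, 16
team = np.repeat(np.arange(n_teams), n_games)

# Hypothetical indicator: 1 for the three teams that score return TDs.
has_return_td = (team < 3).astype(float)
yards = rng.normal(350, 60, size=team.size)

# Assumed "true" model (made up): 0.06 pts per yard, +7 pts for the rare
# category, plus game-to-game noise with a standard deviation of 7 points.
points = 0.06 * yards + 7.0 * has_return_td + rng.normal(0, 7, size=team.size)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(team.size), yards, has_return_td])
beta, *_ = np.linalg.lstsq(X, points, rcond=None)

# Classical standard errors: sqrt(diag(sigma^2 * (X'X)^-1)).
resid = points - X @ beta
sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

for name, b, s in zip(["intercept", "yards", "has_return_td"], beta, se):
    print(f"{name:>14}: coef={b:8.3f}  se={s:6.3f}  t={b / s:6.2f}")
```

    If the real data still shows a negative sign, that points more toward correlated predictors or a missing variable than toward the category simply being rare.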

  4. #4
    antonyp22
    antonyp22's Avatar Become A Pro!
    Join Date: 01-12-14
    Posts: 78
    Betpoints: 2528

    With respect to the significance levels of the variables, the higher-than-expected p-values could be caused by some variables already being accounted for by others (i.e., correlated predictors).
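    A common way to look for this kind of overlap is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R^2). The sketch below is a hand-rolled version on synthetic data; the column names (`shots`, `shots_on_goal`, `penalties`) and all numbers are made up for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
n = 300
shots = rng.normal(30, 5, n)
shots_on_goal = 0.5 * shots + rng.normal(0, 0.5, n)  # nearly redundant with shots
penalties = rng.normal(4, 1, n)                      # unrelated to the other two

X = np.column_stack([shots, shots_on_goal, penalties])
vifs = vif(X)
for name, v in zip(["shots", "shots_on_goal", "penalties"], vifs):
    print(f"{name:>14}: VIF = {v:.1f}")
```

    A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a warning sign. Heavily overlapping predictors tend to have inflated p-values and unstable, sometimes sign-flipped, coefficients.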

  5. #5
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    Analyzing the regression formula I came up with: out of 164 games to choose from, I played 86 of them, with a 66.3% win pct in the NFL.
    28/41 wins playing overs (2 pt or larger difference), and 29/45 wins playing unders (3 pt or larger difference).

  6. #6
    bihon
    bihon's Avatar Become A Pro!
    Join Date: 11-03-09
    Posts: 731

    Quote Originally Posted by a4u2fear
    Out of 164 games to choose from. I played 86 of them, with a 66.3% win pct in the NFL.
    For O/U, assume decimal odds of about 1.9.

    It looks promising, although the sample is not really significant. If you can keep the win pct above 55% after a few hundred bets, that will be great.

    Keep posting please.
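    To put rough numbers on this, the sketch below computes the break-even win rate at decimal odds of 1.9 and an exact one-sided binomial tail probability for the 57 wins in 86 bets reported above (28/41 overs plus 29/45 unders). The function names are made up, and the test assumes every bet is independent and priced at the same odds, which is a simplification.

```python
from math import comb

def breakeven(decimal_odds):
    """Win probability at which a flat bet at these odds has zero expected profit."""
    return 1.0 / decimal_odds

def binom_tail(wins, bets, p):
    """P(X >= wins) for X ~ Binomial(bets, p), computed exactly."""
    return sum(comb(bets, k) * p**k * (1 - p) ** (bets - k)
               for k in range(wins, bets + 1))

p0 = breakeven(1.9)          # about 0.526
wins, bets = 57, 86          # 28/41 overs + 29/45 unders
p_value = binom_tail(wins, bets, p0)

# Expected profit per unit staked at the observed win rate.
roi = (wins / bets) * 1.9 - 1.0

print(f"break-even win rate: {p0:.3f}")
print(f"observed win rate:   {wins / bets:.3f}")
print(f"one-sided p-value:   {p_value:.4f}")
print(f"ROI per unit staked: {roi:.3f}")
```

    A small tail probability here does not account for selection effects, such as picking the 2 and 3 point thresholds after looking at the results, so it is best read as an optimistic bound until it holds up on fresh games.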

  7. #7
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    Quote Originally Posted by bihon
    For O/U, assume decimal odds of about 1.9.

    It looks promising, although the sample is not really significant. If you can keep the win pct above 55% after a few hundred bets, that will be great.

    Keep posting please.
    I've gotten there before: over 150 bets at 62%, though that only used a regression on points and yards with an adjusted formula.

    This regression contains many more variables.

    Got tons of snow in Buffalo so I've had time off work. If I'm off again tomorrow I can import another full season.

  8. #8
    bihon
    bihon's Avatar Become A Pro!
    Join Date: 11-03-09
    Posts: 731

    Quote Originally Posted by a4u2fear
    Let me try another example.

    Three NFL teams score at least 1 TD on defense or special teams, every game, no matter what.

    The 29 other teams rarely score these TDs, so when performing a regression to see how stats contribute to points scored, it's likely that this category would have a negative multiplier (coefficient), since 29/32 teams do not have these TDs contributing to their overall points.

    But this stat sets these teams above the rest and cannot be discarded.

    Maybe one way would be to find the importance of that particular variable and scale up the values of only the positive teams accordingly, so you don't have to use that variable in further calculations.

    e.g.
    t1=5; totval1 (w/o t1)= 40
    t2=3; totval2=60
    t3=7; totval3=60
    trest=0; various totvals
    avgtotval (32 teams)=50

    Importance factor =1.3 (50*0.3*3 teams=45)

    Corrected values (each team's share of t1+t2+t3=15, times 45, added to its totval):

    totval1=0.33*45+40=54.85
    totval2=0.2*45+60=69
    totval3=0.47*45+60=81.15

    ...or something in that direction.
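    The arithmetic above can be reproduced in a few lines. This is only a sketch of the example as posted, with made-up dictionary and variable names; it is not a tested adjustment method. Note that with unrounded shares (5/15 and 7/15 instead of 0.33 and 0.47) the corrected values come out to 55 and 81 rather than 54.85 and 81.15.

```python
# Values taken from the example: t is the rare-category value per team,
# totval is the team's rating computed without that category.
t = {"team1": 5, "team2": 3, "team3": 7}
totval = {"team1": 40, "team2": 60, "team3": 60}

avg_totval = 50      # league-average totval over 32 teams
importance = 1.3     # assumed importance factor from the example

# Pool of extra rating to distribute: 50 * 0.3 * 3 teams = 45.
pool = avg_totval * (importance - 1) * len(t)

# Each team receives the pool in proportion to its share of the summed t values.
t_sum = sum(t.values())
corrected = {name: totval[name] + t[name] / t_sum * pool for name in t}

for name, value in corrected.items():
    print(f"{name}: share={t[name] / t_sum:.2f}  corrected totval={value:.2f}")
```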
    Last edited by bihon; 11-19-14 at 04:27 PM.

  9. #9
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    yea, I will try that, see if I can come up with something.

    off of work again today due to snow, now have 2 years worth of checking data:
    playing overs, with difference of 2 pts, 52/83=62.7%
    with difference of 3 pts, 43/63=65.2%

    unders, with difference of 3 pts, 73/135=54.1%

    overs remain good, unders falling off, more research needed, will try for third season import.

  10. #10
    bihon
    bihon's Avatar Become A Pro!
    Join Date: 11-03-09
    Posts: 731

    Quote Originally Posted by bihon
    Importance factor =1.3 (50*0.3*3 teams=45)
    Of course, summing the three teams was too hasty and logically wrong.

    't' value must be directly related to 'totval' value, such as t=0.05*totval or similar.

  11. #11
    bihon
    bihon's Avatar Become A Pro!
    Join Date: 11-03-09
    Posts: 731

    Quote Originally Posted by a4u2fear
    off of work again today due to snow, now have 2 years worth of checking data:
    playing overs, with difference of 2 pts, 52/83=62.7%
    with difference of 3 pts, 43/63=65.2%
    That's not really much data for testing, so be careful with it.
    E.g. don't throw all the data in at once and tweak. Instead, fit on the first set of data and apply the results to a second set (one that is not part of the first).
    If you obtain similar results, then you're onto something.
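    A minimal sketch of that fit-on-one-set, test-on-another workflow is below. Everything in it is a stand-in: the data is synthetic, the three predictors and the posted lines are made up, and `fit_ols`, `predict`, and `hit_rate` are hypothetical helper names. The point is only that the coefficients are estimated on seasons 0 and 1 and then scored on season 2, which they never saw.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Predicted totals for feature rows X using coefficients from fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def hit_rate(pred, line, actual, min_edge):
    """Win rate and bet count when |pred - line| >= min_edge (over if pred > line)."""
    edge = pred - line
    play = np.abs(edge) >= min_edge
    won = np.where(edge > 0, actual > line, actual < line)
    n_plays = int(play.sum())
    rate = float(won[play].mean()) if n_plays else float("nan")
    return rate, n_plays

# Synthetic stand-in data: 3 seasons x 256 games, 3 made-up predictors.
rng = np.random.default_rng(2)
n = 3 * 256
season = np.repeat([0, 1, 2], 256)
X = rng.normal(size=(n, 3))
signal = X @ np.array([3.0, 2.0, 0.0])
actual = 44 + signal + rng.normal(0, 10, n)      # game totals
line = 44 + 0.8 * signal + rng.normal(0, 1, n)   # hypothetical posted totals

train, test = season < 2, season == 2
beta = fit_ols(X[train], actual[train])          # fit on seasons 0-1 only
rate, n_plays = hit_rate(predict(beta, X[test]), line[test], actual[test], 2.0)
print(f"held-out season: {n_plays} plays, win rate {rate:.3f}")
```

    Any threshold such as the 2 or 3 point difference should also be chosen on the fitting seasons only; picking it after seeing the held-out results turns the test set back into training data.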

  12. #12
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    Quote Originally Posted by bihon
    That's not really much data for testing, so be careful with it.
    E.g. don't throw all the data in at once and tweak. Instead, fit on the first set of data and apply the results to a second set (one that is not part of the first).
    If you obtain similar results, then you're onto something.
    Yep, fitted on a few years, tested on another year not included in the original fit.

  13. #13
    a4u2fear
    TEASE IT
    a4u2fear's Avatar SBR PRO
    Join Date: 01-29-10
    Posts: 8,147
    Betpoints: 35459

    Small update: I've had good success with regressions and NFL totals, not so much with sides.

    I would compare the estimated totals for each team, create the line, and see which side was worth betting, never getting really good results.

    So I went back with the notion that if you can pick the winner of the game, you will cover the spread 75-80% of the time. Doing this, I was immediately able to pick 60% winners over 200+ games.

    Very intriguing, yet so simple.

  14. #14
    antonyp22
    antonyp22's Avatar Become A Pro!
    Join Date: 01-12-14
    Posts: 78
    Betpoints: 2528

    Not surprising considering that NFL totals would be a softer market than NFL sides.

    Try a regression with the line itself being your dependent variable/output rather than estimated totals for both teams - keep in mind the NFL sides market is one of the most efficient markets in the world.

  15. #15
    peacebyinches
    pull the trigger
    peacebyinches's Avatar SBR PRO
    Join Date: 02-13-10
    Posts: 1,108
    Betpoints: 7802

    would you be able to post your design matrix? I worry it may not be full rank (i.e., watch out for collinear columns / multicollinearity), which could potentially be trouble.
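    A rank check is quick to run with NumPy. In the sketch below the column names and numbers are invented; `total_yards` is deliberately built as an exact sum of two other columns, so the design matrix has four columns but only rank 3, and dropping the redundant column restores full rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Made-up predictors; total_yards is an exact linear combination of the others.
pass_yards = rng.normal(240, 50, n)
rush_yards = rng.normal(110, 30, n)
total_yards = pass_yards + rush_yards

X = np.column_stack([np.ones(n), pass_yards, rush_yards, total_yards])

rank = np.linalg.matrix_rank(X)
cond = np.linalg.cond(X)
print(f"columns={X.shape[1]}  rank={rank}  condition number={cond:.3g}")

# Dropping the redundant column gives a full-rank design matrix.
X_fixed = X[:, :3]
rank_fixed = np.linalg.matrix_rank(X_fixed)
print(f"after dropping total_yards: columns={X_fixed.shape[1]}  rank={rank_fixed}")
```

    When the rank is lower than the number of columns, the least-squares coefficients are not uniquely determined, so individual signs and p-values for the overlapping columns cannot be trusted.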
