1. #1
    clarkacal
    Join Date: 11-03-09
    Posts: 353

    Data snooping

    I'm relatively new to sports betting but not new to gambling. I have always taken an analytical approach through research and data analysis, but I'm worried about data snooping when collecting data from previous years to find profitable betting strategies. How big of a problem is it, and at what point is the sample size large enough to be reliable?

  2. #2
    Wojo
    Join Date: 03-19-10
    Posts: 1,764
    Betpoints: 9513

    I don't understand what you mean by data snooping.

  3. #3
    clarkacal
    Join Date: 11-03-09
    Posts: 353

    When you think you've found parameters that are +EV, but they turn out to be +EV only for a specific set of data and not necessarily for future data.
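    A minimal sketch of that effect, assuming every candidate "system" is really just a coin flip. The strategy count, bet count, and function name are made up for illustration. The best of many worthless systems tends to look good on the data it was picked from, and there is no reason for it to hold up on new data.

```python
import random

def simulate_snooping(n_strategies=200, n_bets=200, seed=0):
    """Each 'strategy' is pure noise: every bet wins with probability 0.5.
    Pick the strategy with the best in-sample win rate, then re-test it
    on fresh bets it has never seen."""
    rng = random.Random(seed)
    in_sample = [
        sum(rng.random() < 0.5 for _ in range(n_bets)) / n_bets
        for _ in range(n_strategies)
    ]
    best = max(range(n_strategies), key=lambda i: in_sample[i])
    # Out-of-sample: the same coin-flip strategy on new data.
    out_rate = sum(rng.random() < 0.5 for _ in range(n_bets)) / n_bets
    return in_sample[best], out_rate

best_in, best_out = simulate_snooping()
print(f"best in-sample win rate:      {best_in:.3f}")
print(f"same strategy, out-of-sample: {best_out:.3f}")
```

    The more parameter combinations you try against the same history, the larger the gap between the two numbers is likely to be.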

  4. #4
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    This is the classic question in inference and it does not have a universal answer.

    Depends on the sport, whether the rules changed recently [hockey lockout, steroid crackdown], etc. etc.

    You will never know if something that has been profitable every year til now will be profitable in 2010.

  5. #5
    yankeerick
    Join Date: 09-07-09
    Posts: 1,171
    Betpoints: 79


  6. #6
    clarkacal
    Join Date: 11-03-09
    Posts: 353

    Mathdotcom, I understand you can't be sure; that's why it's gambling. But what I'm wondering is whether the initial theory is simply an artifact of that particular data, or whether it comes from a large enough sample that it can be expected to repeat with approximately the same results.
    For example, take a game with known odds such as craps, but suppose you didn't know the odds and were just testing data: you gathered data from 5000 rolls of the dice.

    theory a: a seven rolls more than an 8 at 6.25:5 and a seven rolls more than a 6 at 5.75:5

    theory b: first time left handed female shooters have an average roll of 7.75 so you should place the high numbers

    Theory a isn't perfect but it approximates the true odds. Theory b is obviously a joke and you'll go broke, but if you're only dealing with numbers in your theory it isn't as obvious. How do you know which category your theory fits into?
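    The craps example can be checked by simulation. The sketch below assumes fair dice and repeats the 5000-roll experiment ten times to see how far the observed 7-to-8 ratio drifts from the true 6:5. The 20-roll subgroup at the end is a hypothetical stand-in for a narrow slice like the one in theory b; the seed and sizes are arbitrary.

```python
import random
from collections import Counter

def seven_to_eight_ratio(n_rolls, rng):
    """Roll two fair dice n_rolls times; return sevens per 5 eights."""
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return 5 * counts[7] / counts[8]

rng = random.Random(42)
true_ratio = 5 * (6 / 36) / (5 / 36)  # 6 sevens for every 5 eights

# Ten independent samples of 5000 rolls each.
ratios = [seven_to_eight_ratio(5000, rng) for _ in range(10)]
for r in ratios:
    print(f"observed {r:.2f}:5   (true {true_ratio:.0f}:5)")

# A tiny subgroup (assumed size: 20 rolls) gives a much noisier average.
small = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(20)]
small_avg = sum(small) / len(small)
print(f"average of 20 rolls: {small_avg:.2f}   (true 7.00)")
```

    With 5000 rolls the ratio should land reasonably close to 6:5, so an estimate like 6.25:5 is plausible noise around the truth. An average taken over a few dozen rolls can sit well away from 7 purely by chance.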

  7. #7
    Flying Dutchman
    Floggings continue until morale improves
    Join Date: 05-17-09
    Posts: 2,467
    Betpoints: 759

    http://en.wikipedia.org/wiki/Data-snooping_bias

    some folks might also refer to this as overfitting...

  8. #8
    SportsbetTracker
    Join Date: 04-30-10
    Posts: 26
    Betpoints: 55

    I have been testing systems myself. In fact, it's my major focus. I am in the process of obtaining every box score for the four professional sports and NCAA Division I for basketball and football since 2000. For now, I have the final scores and final lines of all teams for all events, which helps me with some systems (Morrison, et al.).

    Let's face it, sports gamblers are people who A) have a sense of rationality, but also, B) a desire to "fit" facts to theories, rather than let the theories acknowledge the facts. For the most part, system players promote A heavily, while implying B just as heavily. They don't consider the unstated C) Facts are facts and cannot be modified.

    To that end, though, sports betting IS based upon the HUMAN factor, and NOT the physics factor. Vegas was built on craps, roulette, and slots, NOT on the Super Bowl or the NBA Finals. So there is always going to be handicapping the events, and with every event played, there is another event that can be scrutinized and analyzed for future calculations.

    Data mining has its advantages in allowing sharps to utilize historical trends, but is only a tool, and not a true system process.

  9. #9
    Pokerjoe
    Join Date: 04-17-09
    Posts: 704
    Betpoints: 307

    Here's the basic conundrum: the smaller the set, the less valid the results, obv. But the further back in time you go to build a bigger set, the less relevant the data is to the current environment. And game environments change in subtle ways. It isn't only things as obvious as the NFL adopting the 2pt conversion.

    Often, things work (say, certain passing strategies) only until other coaches see that they work and adopt countering defensive measures. No rule change, just, first, a change of offensive strategy (maybe leading you to say, hey, teams with this offensive strategy/statistical pattern have covered like crazy for the last two years!) followed by a corresponding and countering change in defensive strategy (which is implemented just as you start betting on the offensive strategy).

    There is no Holy Grail.
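    To put rough numbers on the "smaller set, less valid" half of that tradeoff, here is a back-of-envelope sketch. It assumes independent bets at a fixed true win rate and standard -110 pricing (breakeven 110/210, about 52.4%); neither assumption comes from the thread, and the function name is made up. It estimates how many bets it takes before the expected win rate sits z standard errors above breakeven.

```python
import math

def bets_needed(true_rate, breakeven=110 / 210, z=2.0):
    """Rough number of bets before the expected win rate sits z standard
    errors above breakeven, assuming independent bets at a fixed win rate."""
    edge = true_rate - breakeven
    if edge <= 0:
        raise ValueError("true_rate must exceed breakeven")
    return math.ceil(z ** 2 * true_rate * (1 - true_rate) / edge ** 2)

for rate in (0.53, 0.55, 0.58):
    print(f"{rate:.0%} bettor: about {bets_needed(rate)} bets")
```

    The required count grows with the inverse square of the edge, which is why small edges push you toward older data that may no longer describe the current environment.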

  10. #10
    DRZ
    Join Date: 02-23-10
    Posts: 918

    Lots of systems out there; tough to choose one.

  11. #11
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    Quote Originally Posted by Pokerjoe View Post
    Here's the basic conundrum: the smaller the set, the less valid the results, obv. But the further back in time you go to build a bigger set, the less relevant the data is to the current environment. And game environments change in subtle ways. It isn't only things as obvious as the NFL adopting the 2pt conversion.

    Often, things work (say, certain passing strategies) only until other coaches see that they work and adopt countering defensive measures. No rule change, just, first, a change of offensive strategy (maybe leading you to say, hey, teams with this offensive strategy/statistical pattern have covered like crazy for the last two years!) followed by a corresponding and countering change in defensive strategy (which is implemented just as you start betting on the offensive strategy).

    There is no Holy Grail.
    As much as I like the NFL, with its short season compared to other sports this is the dominant issue. In longer-season sports the market learns about the teams, and where you might have an advantage early on, it can erode by the end of the season. The NBA shows this pattern nearly every year.

    As for a holy grail, the time machine in the movie Back to the Future worked.

  12. #12
    ZombieWolverine
    Join Date: 06-05-10
    Posts: 306

    They need to do a remake of that movie. I think that would be sweet.

  13. #13
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Answer is still the same.

    There is no way to eliminate the tradeoff between small sample issues and introducing biased data.

    First thing to do is estimate it both ways and see if it even differs. You may not have a problem after all.

  14. #14
    Flying Dutchman
    Floggings continue until morale improves
    Join Date: 05-17-09
    Posts: 2,467
    Betpoints: 759

    Quote Originally Posted by mathdotcom View Post
    Answer is still the same.

    There is no way to eliminate the tradeoff between small sample issues and introducing biased data.

    First thing to do is estimate it both ways and see if it even differs. You may not have a problem after all.
    And how would I do that?

  15. #15
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Quote Originally Posted by Wrecktangle View Post
    As much as I like the NFL, with its short season compared to other sports this is the dominant issue. In longer-season sports the market learns about the teams, and where you might have an advantage early on, it can erode by the end of the season. The NBA shows this pattern nearly every year.

    As for a holy grail, the time machine in the movie Back to the Future worked.
    Yeah, it's tough to make much money in the NFL... college football has been much easier for me.

    As to your point about the NBA, one thing that I've noticed is that the "eroded early season edge" returns in the playoffs... imo this is due to the fact that there is more "public" money on the games come playoff time so it's profitable for the books to have a lean.

  16. #16
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Quote Originally Posted by Flying Dutchman View Post
    And how would I do that?
    Analyze both sets of data (small and large) and see if the results differ significantly.
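    One possible way to make "differ significantly" concrete is a two-proportion z-test on the win rates. This is a sketch, not necessarily what mathdotcom or roasthawg had in mind. It assumes the two samples do not overlap (for example, recent seasons versus older seasons) and that bets are independent. The records below are made up for illustration.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """z statistic and two-sided p-value for H0: both samples share one win rate."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical records: 140-110 in the recent sample, 1060-940 in the older one.
z, p = two_proportion_z(wins_a=140, n_a=250, wins_b=1060, n_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

    A large p-value here only says the two samples are consistent with a single win rate; it does not show that the system is profitable.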
