1. #1
    str8chedda
    Join Date: 01-25-10
    Posts: 16

    sample size and confidence interval/level for a system

    I have been an avid sports bettor for about a year and finally decided to try to come up with my first system. I was wondering what most of you successful/semi-successful cappers need to be hitting, and at what confidence level/interval, before you try a system. I recently found a system for NBA that would have hit 65% over a 74-game sample this year. I realize that 65% is not realistically sustainable in the long run and my sample size is not that large, so I ran some confidence intervals on it and came up with this:

    for a 74-game sample that hit 65% ATS:

    99% sure that the true % it hit is between 49.30% and 78.47%.
    95% sure that the true % it hit is between 52.89% and 75.61%.
    90% sure that the true % it hit is between 54.73% and 74.08%.
    85% sure that the true % it hit is between 55.92% and 73.06%.
    80% sure that the true % it hit is between 56.84% and 72.27%.

    basically, what I am asking is: what is a good confidence level (the first %), and what should the lower end of my interval (second %) be, before I actually put the system to the test? I'm not a trial-and-error person, since I bet about 1k per game on average, so I like to be pretty confident (if that changes anything).
    Last edited by str8chedda; 03-09-10 at 01:46 PM.
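
The post does not say which interval method produced the figures above. As a rough cross-check, here is a minimal sketch using the Wilson score interval (one common choice, not necessarily the poster's), assuming "65% over 74 games" means 48 wins and 26 losses. The function name `wilson_interval` is made up for illustration, and the bounds it prints should land close to, but not exactly on, the quoted numbers.

```python
from statistics import NormalDist


def wilson_interval(wins: int, games: int, confidence: float) -> tuple[float, float]:
    """Two-sided Wilson score interval for a win probability."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    p_hat = wins / games
    denom = 1 + z * z / games
    center = (p_hat + z * z / (2 * games)) / denom
    half = z * ((p_hat * (1 - p_hat) / games + z * z / (4 * games * games)) ** 0.5) / denom
    return center - half, center + half


# Assumption: "65% over 74 games" means a 48-26 record.
for level in (0.99, 0.95, 0.90, 0.85, 0.80):
    low, high = wilson_interval(48, 74, level)
    print(f"{level:.0%}: {low:.2%} to {high:.2%}")
```

Any interval of this kind only describes sampling noise. It says nothing about whether the 74 games were picked after the fact, which is the issue the replies below raise.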

  2. #2
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    first, I would go back and see how your system did in previous years
    you might just have lucked into a winning subset this year due to datamining

  3. #3
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    Out of 74 games, a winning percentage of 65% has roughly a 1-in-100 rarity of occurrence, meaning that out of a study of 100 systems of the same sample size you are likely to come across at least 1 random occurrence like this, which is good but not exactly flawless. I would recommend digging back to prior seasons and developing a larger sample size to ensure that your 65% win ratio is consistent. Also, if while you were studying this system you made any adjustments due to additional trends you noticed, you would have to discard all of the games that you used for the original system and start over with new games, because the previous games you tested are now tainted.
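
The "1 in 100" figure can be sanity-checked with an exact binomial tail. This is a sketch under two assumptions that are not in the post: each ATS bet is treated as a 50/50 coin flip for a system with no edge, and 65% of 74 is rounded to 48 wins. The helper name `tail_probability` is hypothetical.

```python
from math import comb


def tail_probability(wins: int, games: int, p: float = 0.5) -> float:
    """P(at least `wins` wins in `games` bets) when each bet wins with probability p."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(wins, games + 1))


p_single = tail_probability(48, 74)       # one no-edge system going 48-26 or better
p_any_of_100 = 1 - (1 - p_single) ** 100  # at least one of 100 such systems does it
print(f"single system: {p_single:.4f} (about 1 in {1 / p_single:.0f})")
print(f"at least one of 100 systems: {p_any_of_100:.2f}")
```

The second number is the one that matters when many candidate systems were tried before this one was found.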

  4. #4
    BigdaddyQH
    Join Date: 07-13-09
    Posts: 19,530
    Betpoints: 8638

    I agree with the other posters. Your sample is way too small. Normally, 5 years is needed to validate any system. 74 games is basically a drop in the bucket.

  5. #5
    str8chedda
    Join Date: 01-25-10
    Posts: 16

    peregrine, yea that is my next step in the process, haven't had much time to do it yet.

    bigdaddyqh, sample size issues are built in to the math behind confidence intervals. no matter the sample size there is a low end to the interval at a specific confidence level. a system tested on a 75 game sample can have the same likelihood of being profitable as a system tested on a 750 game sample at certain levels of confidence and different % hit rates. so by the looks of it you guys would agree that i would need like a 55%+ at a confidence level of like 99.5%?

    also sharpcat, just curious: if you think that a system is only bad 1 in 100, why would that not be enough to start using it? if you had 100 of these systems and, for argument's sake, this was even a 1 in 10 occurrence, you would have 10 systems that did not show profit (so they would on average lose at -2.5%), but you would have 90 of them that won at like 5%. you would be more profitable than just having 10 systems that you knew were 100% to be profitable because they have a VERY large sample. this is assuming that the 10 systems that were 100% known to be profitable didn't generate an absurd amount of plays to even out the fact that there are fewer systems.
    Last edited by str8chedda; 03-10-10 at 12:34 AM.
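
The portfolio arithmetic in this post can be written out. The sketch below assumes standard -110 pricing, which the thread never states, and the function name `roi_per_bet` is made up for illustration. Under that assumption a coin-flip system loses about 4.5% per unit risked rather than 2.5% (the 2.5% figure is closer to -105 pricing), while a true 55% system returns about +5%.

```python
def roi_per_bet(win_rate: float, american_odds: int = -110) -> float:
    """Expected profit per unit risked; only handles negative prices such as -110."""
    payout = 100 / abs(american_odds)  # profit on a winning 1-unit bet
    return win_rate * payout - (1 - win_rate)


# Hypothetical mix from the post: 90% of systems truly hit 55%, 10% are coin flips.
blended = 0.9 * roi_per_bet(0.55) + 0.1 * roi_per_bet(0.50)
print(f"break-even win rate at -110: {110 / 210:.2%}")
print(f"ROI at 55%: {roi_per_bet(0.55):+.2%}, at 50%: {roi_per_bet(0.50):+.2%}")
print(f"blended ROI: {blended:+.2%}")
```

The blended figure is only as good as the 90/10 split, and that split is exactly what tainted data would overstate.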

  6. #6
    u21c3f6
    Join Date: 01-17-09
    Posts: 790
    Betpoints: 5198

    Quote Originally Posted by str8chedda View Post
    I recently found a system for NBA that would have hit 65% over a 74 game sample this year.
    It depends on how the "sample" was obtained. The math is only valid if you have "untainted" data. If the system was found by datamining the sample, then the results are more than likely useless regardless of the confidence math. The true test would be to apply your system to 74 games going forward.

    Joe.
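
Joe's point about datamining can be illustrated with a small simulation. This is a sketch, not a model of any real system: it assumes every candidate system is worthless (each bet a 50/50 coin flip), that 100 candidates are tried on the same 74 games, and that the best-looking one is kept. All constants and names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
GAMES, SYSTEMS, REPEATS = 74, 100, 100


def win_rate(n_games: int) -> float:
    """Win rate of a system with no edge: every bet is a 50/50 coin flip."""
    return sum(rng.random() < 0.5 for _ in range(n_games)) / n_games


best_in_sample, best_forward = [], []
for _ in range(REPEATS):
    # "Datamine" 100 worthless systems and keep whichever looked best on the sample.
    best_in_sample.append(max(win_rate(GAMES) for _ in range(SYSTEMS)))
    # The chosen system still has no edge, so its next 74 games are fresh coin flips.
    best_forward.append(win_rate(GAMES))

print(f"average backtest rate of the mined system: {sum(best_in_sample) / REPEATS:.1%}")
print(f"average rate on the next 74 games:         {sum(best_forward) / REPEATS:.1%}")
```

The mined system looks strong on the games it was selected from and ordinary on new ones, which is why a confidence interval computed on the mined sample is not informative.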

  7. #7
    str8chedda
    Join Date: 01-25-10
    Posts: 16

    Quote Originally Posted by u21c3f6 View Post
    It depends on how the "sample" was obtained. The math is only valid if you have "untainted" data. If the system was found by datamining the sample, then the results are more than likely useless regardless of the confidence math. The true test would be to apply your system to 74 games going forward.

    Joe.
    would coming up with a system and THEN testing it on previous data be considered tainted data?

    Over the 74 games there is one common denominator between them (there could be something else, but the chance of that has to be very small). I feel like if there is truly 1 common denominator between all 74 games then the data has to be "untainted" cause there would really be no difference in the sample of 74 games going forward and the 74 games that were backtested other than the time they were tested (a.k.a previous data and future data).
    Last edited by str8chedda; 03-10-10 at 10:16 AM.

  8. #8
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    basically if you studied 100 past games to see "if when team A was favored by -3 do they win more frequently?" and during the course of studying these games you noticed that they were winning at -3.5 a lot also and decide to add this into your hypothesis. Well now that you have changed your hypothesis based on information that you noticed while testing your first 100 games these games are now tainted and now in order to properly test your new hypothesis you would have to use an entirely new set of data.

    Therefore testing a system with previous data is fine as long as your hypothesis was not developed from a trend that you noticed in the 100 games you plan to use to test with. If this was the case you would need to find a set of games that you have not examined yet.

    If this is not the case then your test is safe, but regardless, when testing a hypothesis on such a limited sample your results have a very high risk of being flawed. Basically it would be recommended to continue testing this system, and if you are looking for a little action you could even begin to invest in the system, but I would recommend not betting large amounts on these games until you have tested many more games.

    Look at it this way: if you flip a coin 74 times you have a much better probability of flipping heads 65% of the time than if you were to flip that coin 740 times. If after 740 tosses you are still at 65%, well, then you are looking at a rarity in occurrence of more like 1 in 10,000 as compared to 1 in 100.
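
The coin-flip comparison can be computed exactly. The sketch below assumes a fair coin and rounds 65% to 48 of 74 and 481 of 740; the function name `fair_coin_tail` is made up for illustration. The direction of the argument holds, though under this model the 740-flip case comes out far rarer than 1 in 10,000.

```python
from math import comb


def fair_coin_tail(wins: int, flips: int) -> float:
    """Exact P(at least `wins` heads in `flips` fair flips), using integer arithmetic."""
    favorable = sum(comb(flips, k) for k in range(wins, flips + 1))
    return favorable / 2**flips


p_74 = fair_coin_tail(48, 74)     # 65% of 74, rounded to 48
p_740 = fair_coin_tail(481, 740)  # 65% of 740
print(f"74 flips:  {p_74:.2e}")
print(f"740 flips: {p_740:.2e}")
```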

  9. #9
    str8chedda
    Join Date: 01-25-10
    Posts: 16

    Quote Originally Posted by sharpcat View Post
    basically if you studied 100 past games to see "if when team A was favored by -3 do they win more frequently?" and during the course of studying these games you noticed that they were winning at -3.5 a lot also and decide to add this into your hypothesis. Well now that you have changed your hypothesis based on information that you noticed while testing your first 100 games these games are now tainted and now in order to properly test your new hypothesis you would have to use an entirely new set of data.

    Therefore testing a system with previous data is fine as long as your hypothesis was not developed from a trend that you noticed in the 100 games you plan to use to test with. If this was the case you would need to find a set of games that you have not examined yet.

    If this is not the case then your test is safe, but regardless, when testing a hypothesis on such a limited sample your results have a very high risk of being flawed. Basically it would be recommended to continue testing this system, and if you are looking for a little action you could even begin to invest in the system, but I would recommend not betting large amounts on these games until you have tested many more games.

    Look at it this way: if you flip a coin 74 times you have a much better probability of flipping heads 65% of the time than if you were to flip that coin 740 times. If after 740 tosses you are still at 65%, well, then you are looking at a rarity in occurrence of more like 1 in 10,000 as compared to 1 in 100.
    i get what you are saying, but even over a 74 game sample (as long as the data was not tainted), if you see a 65% hit rate, it is more likely that it is not occurring because of variance. granted there are no guarantees, but as long as you can be like 70% sure that it is a profitable system then why not bet it? if you bet all of your systems that you knew were 70% likely to be profitable you would have 30% losing systems, but that would be negated by the fact that you have 70% winning systems. overall still +EV, and i feel like this would maximize profits the most?

    I understand the statistics, variance, and whatnot, and now understand the tainted data part, but my question is how sure do you have to be before you bet a system? By the looks of it you guys want to be like 99.9% sure, but why? i feel like you would only be able to come up with like 2-3 systems if you were that conservative about it.
    Last edited by str8chedda; 03-10-10 at 12:40 PM.

  10. #10
    70kgman
    Join Date: 01-31-10
    Posts: 4,354
    Betpoints: 1895

    I am in the first year of testing an NBA totals system I came up with. The first two months of the season (approximately 60-70 games), it was hitting at an incredible rate of about 68%, but from January to the present it has just been hovering right around the .500 mark (1 game above .500 since Jan 1st to be exact). While the overall season numbers are still great, the last 2+ months of basically breaking even have me wondering if it really has any substance. I guess my point is 74 games is definitely way too small of a sample size. I thought I had stumbled upon a goldmine 74 games into the system I was referring to, but now I am very unsure of it.

  11. #11
    str8chedda
    Join Date: 01-25-10
    Posts: 16

    Quote Originally Posted by 70kgman View Post
    I am in the first year of testing an NBA totals system I came up with. The first two months of the season (approximately 60-70 games), it was hitting at an incredible rate of about 68%, but from January to the present it has just been hovering right around the .500 mark (1 game above .500 since Jan 1st to be exact). While the overall season numbers are still great, the last 2+ months of basically breaking even have me wondering if it really has any substance. I guess my point is 74 games is definitely way too small of a sample size. I thought I had stumbled upon a goldmine 74 games into the system I was referring to, but now I am very unsure of it.
    that really isn't my question. i know that 74 games is too small of a sample to be sure that it is profitable at a high percent, but when i ran the confidence math on it, it basically says that i have a 85% chance that this system is hitting greater than 55%. if i had 10 of these systems, 1.5 of them on average would lose at (-2.5% a.k.a juice) and the other 8.5 systems would be winning at a rate of 55% or higher. so OVERALL you would be winning a good amount of money. my question is how sure do you have to be? and why do you guys think that you have to be 99.99% sure that it is profitable before you use it?

  12. #12
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    Quote Originally Posted by str8chedda View Post
    that really isn't my question. i know that 74 games is too small of a sample to be sure that it is profitable at a high percent, but when i ran the confidence math on it, it basically says that i have a 85% chance that this system is hitting greater than 55%. if i had 10 of these systems, 1.5 of them on average would lose at (-2.5% a.k.a juice) and the other 8.5 systems would be winning at a rate of 55% or higher. so OVERALL you would be winning a good amount of money. my question is how sure do you have to be? and why do you guys think that you have to be 99.99% sure that it is profitable before you use it?
    This is gonna come down to preference. Me, I am a gambler and I agree with you; I am just telling you to proceed with caution. The most important thing here is how you manage your bankroll, whether you use Kelly or just flat bet. I just flat bet, and in this scenario I would probably bet 2-3% of my bank per bet until things turn for the worse. It still does not hurt to continue testing to get a better feel for where you stand. I hit 68% of my NCAA and NBA picks last month over 107 bets, but I am not going to quit my day job because of this, and I am not going to quit betting either.

    The answer here is: bet when you feel you have value. If you are correct in your assumption that you have value, then you will be successful long term. Nobody is telling you not to bet; they are telling you not to quit your day job after a study of 74 games.
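
For readers unfamiliar with the Kelly option mentioned above, here is a minimal sketch of the classic Kelly formula for a single bet, assuming -110 pricing (not stated in the thread). The function name `kelly_fraction` and the three win rates are illustrative, not anyone's actual numbers.

```python
def kelly_fraction(win_rate: float, american_odds: int = -110) -> float:
    """Kelly stake as a fraction of bankroll; 0 when the bet has no positive edge."""
    b = 100 / abs(american_odds)          # net profit per unit staked on a win
    edge = win_rate - (1 - win_rate) / b  # classic Kelly formula: p - q / b
    return max(edge, 0.0)


for label, p in [("point estimate", 0.65), ("modest edge", 0.55), ("coin flip", 0.50)]:
    print(f"{label} ({p:.0%}): stake {kelly_fraction(p):.1%} of bankroll")
```

The stake is very sensitive to the assumed win rate, which is one reason to size bets from a conservative estimate rather than the raw 65%, and why the flat 2-3% described above is a far more cautious choice than full Kelly on the point estimate.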
    Points Awarded:

    str8chedda gave sharpcat 5 SBR Point(s) for this post.



  13. #13
    u21c3f6
    Join Date: 01-17-09
    Posts: 790
    Betpoints: 5198

    Quote Originally Posted by str8chedda View Post
    that really isn't my question.
    The question that needs to be answered first is: Does your system have any part of it based on what you learned from that 74 game sample? Or in other words, did the system exist in its entirety prior to testing it on the 74 game sample?

    If you answered yes to the first question above and/or no to the second question above, your confidence calculations are not valid.

    Joe.
    Points Awarded:

    str8chedda gave u21c3f6 5 SBR Point(s) for this post.


  14. #14
    str8chedda
    Join Date: 01-25-10
    Posts: 16

    never gonna quit my day job (poker), but anyways thanks for the responses, have a better idea of what is a valid system and what isn't now

  15. #15
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    As a poker player I am sure you understand bankroll management which IMO is the most important aspect of any system.

    With all that spare time you got on your hands you should check out a few books on statistics, you can find some great info and may even be able to improve your poker game as well.

  16. #16
    louis.ana
    Join Date: 02-08-09
    Posts: 359
    Betpoints: 446

    Each wager you make should be about 2% of your bankroll. If you are placing 1k bets and your bankroll is not 50k, then you should apply better money management to complement your system. Your money management and system should go hand in hand.

    Large sample sizes are nice but not always needed, try a random sample size of 31 and see what results you achieve. You can:
    - apply your system over a 31 game subset of the 74 game sample size and see if it hits 65%
    or
    - apply your NBA system over the next 31 days and see how many of those games would be a play and track their results
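
To get a feel for what the first option would show, here is a sketch that draws random 31-game subsets from the original sample, assuming the 74 games went 48-26 (the thread only gives the rounded 65%). The seed and trial count are arbitrary.

```python
import random

rng = random.Random(7)  # fixed seed for repeatability
results = [1] * 48 + [0] * 26  # assumption: the 74-game sample went 48-26

# Draw many random 31-game subsets and record the win rate of each.
rates = [sum(rng.sample(results, 31)) / 31 for _ in range(10_000)]

print(f"mean subset win rate: {sum(rates) / len(rates):.1%}")
print(f"lowest / highest:     {min(rates):.1%} / {max(rates):.1%}")
print(f"subsets below 52.4%:  {sum(r < 0.524 for r in rates) / len(rates):.1%}")
```

A subset of the same 74 games only reshuffles results already seen, so it is not independent evidence. The second option, tracking new games, is the one that adds information.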

    Not much time left in the NBA season. You could be dealing with some teams that just don't care anymore. My best runs in the NBA have been early in the season and right after the All-Star game.

  17. #17
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    One thing folks seem to forget is: the leagues change. Coaching "fashion" changes, rules change, refs change, players change (less of a problem), etc. This will run against your implied assumption that you are drawing from the same distribution. I once used to think you needed at least 5 years of data, now I think that can skew good working models. If you ask me what is the best period, I don't know and it depends on the sport.
