1. #1
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Can Someone Confirm My Happiness?

    I think I just created a frickin sweet model, but I'm paranoid that I did something wrong with testing that invalidates the data. If someone could verify that I tested correctly that would be hugely appreciated.

    The process:
    I created a model, backtested using data from April 1st, 2009 - April 30th, 2010, and then made repeated changes to the model until backtesting on that data set yielded the best result. Then, once I was confident in my model, I "forward tested" from May 1st, 2010 through yesterday.

    The results (from May 1st onward):
    494 - 292, +285.99 units, Z-Score = 6.05 (All bets were between 1 and 5 units)

    Since I didn't backtest with the data I used for "forward testing" does that mean that these results are valid? They almost seem too good to be true, which has me worried.
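    For anyone wanting to sanity-check a figure like this, here is a minimal sketch of a z-score for a win-loss record. It assumes flat one-unit bets and a break-even win rate of 50% (both are assumptions, not details from the post), so it will not reproduce the 6.05 quoted above, which presumably reflects the actual lines and unit sizes. The function name `record_z_score` is made up for illustration.

```python
import math


def record_z_score(wins: int, losses: int, p_null: float = 0.5) -> float:
    """Z-score of a win-loss record against a null win probability.

    Assumes flat bets and independent games; p_null is the win rate
    you would expect with no edge (0.5 is an assumption here).
    """
    n = wins + losses
    expected_wins = n * p_null
    std_dev = math.sqrt(n * p_null * (1.0 - p_null))
    return (wins - expected_wins) / std_dev


wins, losses = 494, 292  # record from the post
z = record_z_score(wins, losses)
print(f"bets = {wins + losses}, win rate = {wins / (wins + losses):.3f}, z = {z:.2f}")
```

    If the average line is not close to even money, `p_null` should be the break-even rate implied by the lines actually bet, not 0.5.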

  2. #2
    hutennis
    Join Date: 07-11-10
    Posts: 847
    Betpoints: 3253

    What is "forward testing" ?

  3. #3
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by hutennis View Post
    What is "forward testing" ?
    I don't know if there is such a thing. I didn't know what to call it, so I made up a name. Basically, I just mean that the data set I used to create the model and the data set I used to come up with the results are two different sets.

  4. #4
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    Kind of alarming that you came up with 786 plays in 92 days. If this is baseball alone, you would have bet 60-70% of the games on the board.

  5. #5
    hutennis
    Join Date: 07-11-10
    Posts: 847
    Betpoints: 3253

    So it looks like you have not actually tried to place any bets.

    Why don't you do that?

    Start betting money. Small money. 1 unit = $1.

    If your model is good, you'll quietly make a lot of money, gradually increase your stakes, and soon become financially independent.
    A lot of money in the bank will be the best possible confirmation of your happiness.
    No amount of the guessing you are asking for in your post (trying to find a black cat in a dark room when there is no cat there in the first place) can even come close.

    If, after placing real bets, you see that sigma 6 is nowhere in sight, the money is lost, and the happy dreams of financial independence have vanished like a fart in the wind, then you can come back to the forums, disclose the details of your now worthless model, and ask for help figuring out why it worked great on paper and failed miserably in real life.
    I'm sure you're going to get all the help you need.

    Yep, looks reasonable to me.
    Last edited by hutennis; 08-02-10 at 01:17 PM.

  6. #6
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Sounds solid to me.

  7. #7
    u21c3f6
    Join Date: 01-17-09
    Posts: 790
    Betpoints: 5198

    Quote Originally Posted by Grind-It-Out View Post
    I think I just created a frickin sweet model, but I'm paranoid that I did something wrong with testing that invalidates the data. If someone could verify that I tested correctly that would be hugely appreciated.

    The process:
    I created a model, backtested using data from April 1st, 2009 - April 30th, 2010, and then made repeated changes to the model until backtesting on that data set yielded the best result. Then, once I was confident in my model, I "forward tested" from May 1st, 2010 through yesterday.

    The results (from May 1st onward):
    494 - 292, +285.99 units, Z-Score = 6.05 (All bets were between 1 and 5 units)

    Since I didn't backtest with the data I used for "forward testing" does that mean that these results are valid? They almost seem too good to be true, which has me worried.
    I think you need to either base your stats on a flat bet basis or separate your stats by the number of units and then test those results for confidence levels. You may have hit a larger % of 5 unit than 1 unit wagers which may skew your results if you comingle them.

    Joe.

    PS. You may also have to separate by odds ranges as a winner or two or three at very high odds may also skew your results.
    Last edited by u21c3f6; 08-02-10 at 03:05 PM. Reason: PS.

  8. #8
    bztips
    Join Date: 06-03-10
    Posts: 283

    Quote Originally Posted by u21c3f6 View Post
    I think you need to either base your stats on a flat bet basis or separate your stats by the number of units and then test those results for confidence levels. You may have hit a larger % of 5 unit than 1 unit wagers which may skew your results if you comingle them.

    Joe.

    PS. You may also have to separate by odds ranges as a winner or two or three at very high odds may also skew your results.
    If you're using Kelly, you would EXPECT to hit a higher % on your larger bets (that's why you bet more).

  9. #9
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by u21c3f6 View Post
    I think you need to either base your stats on a flat bet basis or separate your stats by the number of units and then test those results for confidence levels. You may have hit a larger % of 5 unit than 1 unit wagers which may skew your results if you comingle them. Joe. PS. You may also have to separate by odds ranges as a winner or two or three at very high odds may also skew your results.
    The z-score would account for any outliers like that.

    That being said, the breakdown by unit size is:

    1: 359 - 242 (+101.18)
    2: 99 - 41 (+97.28)
    3: 23 - 8 (+35.73)
    4: 9 - 1 (+31.80)
    5: 4 - 0 (+20.00)

    Also, there aren't any ridiculously large lines. They all pretty much fall between -200 and +200.

    Quote Originally Posted by bztips View Post
    If you're using Kelly, you would EXPECT to hit a higher % on your larger bets (that's why you bet more).
    Precisely! Obviously Kelly doesn't recommend a flat 1-5 unit scale, but I'm not good enough (yet) to predict with any more precision than that.
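    As a quick consistency check on the breakdown by unit size, the sketch below sums the tiers and prints a win rate per tier. The numbers are copied from the post; the variable and function names are only illustrative.

```python
# unit size -> (wins, losses, net units), copied from the breakdown in the post
breakdown = {
    1: (359, 242, 101.18),
    2: (99, 41, 97.28),
    3: (23, 8, 35.73),
    4: (9, 1, 31.80),
    5: (4, 0, 20.00),
}


def tier_summary(tiers):
    """Return (unit size, number of bets, win rate, net units) per tier."""
    rows = []
    for units, (wins, losses, net) in sorted(tiers.items()):
        bets = wins + losses
        rows.append((units, bets, wins / bets, net))
    return rows


total_wins = sum(wins for wins, _, _ in breakdown.values())
total_losses = sum(losses for _, losses, _ in breakdown.values())
total_units = sum(net for _, _, net in breakdown.values())

for units, bets, win_rate, net in tier_summary(breakdown):
    print(f"{units}u: {bets:4d} bets, win rate {win_rate:.3f}, net {net:+.2f}")
print(f"total: {total_wins} - {total_losses}, {total_units:+.2f} units")
```

    The tiers add up to the headline 494 - 292 and +285.99 units. Note how thin the top tiers are (10 bets at 4 units, 4 bets at 5 units), so their win rates say very little on their own.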

  10. #10
    Indecent
    Join Date: 09-08-09
    Posts: 758
    Betpoints: 1156

    Quote Originally Posted by Grind-It-Out View Post
    The process:
    I created a model, backtested using data from April 1st, 2009 - April 30th, 2010, and then made repeated changes to the model until backtesting on that data set yielded the best result. Then, once I was confident in my model, I "forward tested" from May 1st, 2010 through yesterday.
    Ideally, for each tweak you would use a new set of games (validation set) to compare the results (forward testing as you called it). Tweaking the same set of data would make me concerned about data-mining/overtraining issues. If you had more games to test the model on you would be able to get a better idea of what it could do moving forward. Even games that occurred in previous seasons are new to your model and could be used as a validation set to further verify the model.

    With that said, I can't say for sure if you would suffer from data-mining issues given what you've provided.

    Gl
    Last edited by Indecent; 08-02-10 at 04:53 PM.

  11. #11
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    Quote Originally Posted by Indecent View Post
    Ideally, for each tweak you would use a new set of games (validation set) to compare the results (forward testing as you called it). Tweaking the same set of data would make me concerned about data-mining/overtraining issues. If you had more games to test the model on you would be able to get a better idea of what it could do moving forward. Even games that occurred in previous seasons are new to your model and could be used as a validation set to further verify the model.

    With that said, I can't say for sure if you would suffer from data-mining issues given what you've provided.

    Gl
    This is a very good point; I completely missed it when I first read this post.

    Every time you alter a system you are testing, you have to use a new set of data, since a pattern noticed in the original set of data may have contributed to your new, improved system.

    If possible I would look into backtesting prior seasons.

  12. #12
    MarketMaker
    Join Date: 07-19-10
    Posts: 44

    My guess would be that if you made an error it would be using data from a date past the date of the game to find your bets.

    For example, if you had data from April 1st 2009 to yesterday and then used all of that data to try to predict a game on May 1st, 2010 you would seemingly have a very successful model. It is important that you only use data up to and not including the date of the event to come up with your prediction.

    It is also possible that you just have a really successful model. A win rate of 62.8% is not something I have ever heard of in baseball if your median line is +/-100 but I know it is possible in other sports.

  13. #13
    MarketMaker
    Join Date: 07-19-10
    Posts: 44

    Since I didn't backtest with the data I used for "forward testing" does that mean that these results are valid? They almost seem too good to be true, which has me worried.

    This has me worried. So if you didn't backtest with the data you used for "forward testing", and you stated that you backtested with data from April 1st, 2009 - April 30th, 2010, what data did you use in your model beginning on May 1st, 2010?

  14. #14
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by MarketMaker View Post
    Since I didn't backtest with the data I used for "forward testing" does that mean that these results are valid? They almost seem too good to be true, which has me worried.

    This has me worried. So if you didn't backtest with the data you used for "forward testing", and you stated that you backtested with data from April 1st, 2009 - April 30th, 2010, what data did you use in your model beginning on May 1st, 2010?
    By "forward testing", I suppose I still meant backtesting, but on a different data set.

    I used data from April 1st, 2009 - April 30th, 2010 to build and test the model. Data from May 1st onward was not used in any way until I was 100% done with development. Then, I backtested again from May 1st onward. What I was trying to convey was that I didn't alter the model based on this data.

  15. #15
    MarketMaker
    Join Date: 07-19-10
    Posts: 44

    So from May 1st onward what data did you use?

  16. #16
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by MarketMaker View Post
    So from May 1st onward what data did you use?
    I had data up until August 1st. I knew I didn't want to compromise the results by backtesting on the same data I used for final testing, so I simply separated one data set into two data sets.

  17. #17
    MarketMaker
    Join Date: 07-19-10
    Posts: 44

    I don't think you are understanding what I am getting at. For your predictions for the games specifically on May 1st, 2010 what data did you use?

  18. #18
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    Here's a rephrasing of my own: You're running a test for games on June 1, 2010. Did you use data from April 1, 2009 to May 31, 2010 to test the games on June 1? Or are you using data from April 1, 2009 to August 1, 2010 or May 1, 2010 to August 1, 2010 to test on June 1?

  19. #19
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by MarketMaker View Post
    I don't think you are understanding what I am getting at. For your predictions for the games specifically on May 1st, 2010 what data did you use?
    On May 1st, 2010, I used data from April 1st 2009 - April 30th, 2010.
    On May 2nd, 2010, I used data from April 1st 2009 - May 1st, 2010.
    etc.
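    The day-by-day scheme described here is an expanding-window walk-forward test: each day's predictions only see data strictly before that day. A minimal sketch of generating those windows follows. The function name `walk_forward_windows` is hypothetical and nothing here is the poster's actual code; the dates are the ones mentioned in the thread.

```python
from datetime import date, timedelta


def walk_forward_windows(data_start, first_test_day, last_test_day):
    """Yield (train_start, train_end, test_day) for each test day.

    train_end is always the day before test_day, so no information
    from the test day or later can leak into the prediction.
    """
    day = first_test_day
    while day <= last_test_day:
        yield data_start, day - timedelta(days=1), day
        day += timedelta(days=1)


windows = list(
    walk_forward_windows(date(2009, 4, 1), date(2010, 5, 1), date(2010, 8, 1))
)

for train_start, train_end, test_day in windows[:2]:
    print(f"predict {test_day} using {train_start} .. {train_end}")
```

    The look-ahead bug the earlier replies were probing for would show up as a `train_end` on or after `test_day`.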

  20. #20
    MarketMaker
    Join Date: 07-19-10
    Posts: 44

    If that is the case your data is valid.

  21. #21
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    Also, is this just moneylines or ML, RL, Total, other derivatives, etc?

  22. #22
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by djiddish98 View Post
    Also, is this just moneylines or ML, RL, Total, other derivatives, etc?
    ML, RL and total.

    For the ML and RL, I take the ML if it is -200 or better, and the RL otherwise.

  23. #23
    MarketMaker
    Join Date: 07-19-10
    Posts: 44

    Quote Originally Posted by Grind-It-Out View Post
    ML, RL and total.

    For the ML and RL, I take the ML if it is -200 or better, and the RL otherwise.
    What exactly does your model output? Predicted runs for each team?

  24. #24
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    Why is there a ceiling of -200? I would think a model wouldn't really see a RL as more valuable since it is merely derived from the ml, unless the market was somehow overpricing the probability of a long dog losing by 1.

    Also, what line / book are you grading against? Opener, closer, etc?

  25. #25
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Sounds rogue

  26. #26
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by djiddish98 View Post
    Why is there a ceiling of -200? I would think a model wouldn't really see a RL as more valuable since it is merely derived from the ml, unless the market was somehow overpricing the probability of a long dog losing by 1.

    Also, what line / book are you grading against? Opener, closer, etc?
    The idea is that I want to introduce as little juice as possible. There is also a ceiling of +200 for the same reason.

    I'd rather bet a runline with lines of +120/-130 than a moneyline with lines of +240/-280.
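    To put numbers on the juice in those two example markets, the sketch below converts American odds to implied probabilities and sums both sides; anything above 100% is the book's margin (overround). The lines are the hypothetical ones from the post, and the function names are only for illustration.

```python
def implied_probability(american_odds: int) -> float:
    """Implied win probability (vig included) for an American-odds price."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def overround(side_a: int, side_b: int) -> float:
    """Book margin for a two-way market: summed implied probabilities minus 1."""
    return implied_probability(side_a) + implied_probability(side_b) - 1.0


runline_juice = overround(+120, -130)    # example runline from the post
moneyline_juice = overround(+240, -280)  # example moneyline from the post

print(f"runline   +120/-130: {runline_juice:.2%} overround")
print(f"moneyline +240/-280: {moneyline_juice:.2%} overround")
```

    A lower overround only says the market is cheaper to bet into on average; it does not by itself say which of the two bets has more value for a given side.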

  27. #27
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    I get what you're saying, but what book is offering 40 cent spreads on the ml and 10 cent spreads on the rl?

    There is a small window where you might think the ml should be -278, so the rl could become +ev based on your model. However, I would think typically if a ml isn't valuable, the rl isn't either.

    What is the average odds bet? How many bets on totals, rl, ml?

  28. #28
    wrongturn
    Join Date: 06-06-06
    Posts: 2,228
    Betpoints: 3726

    Check whether your code has a bug. It is too good to be true.

  29. #29
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Quote Originally Posted by Grind-It-Out View Post
    The idea is that I want to introduce as little juice as possible. There is also a ceiling of +200 for the same reason.

    I'd rather bet a runline with lines of +120/-130 than a moneyline with lines of +240/-280.
    I hope you know why this should not be an absolute statement, otherwise I'm sure your model is wrong.

  30. #30
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by mathdotcom View Post
    I hope you know why this should not be an absolute statement, otherwise I'm sure your model is wrong.
    I don't know how you can say that without knowing anything about my model. My model doesn't tell me how much of an edge a team has, only if they have one or not. The unit size tells me the confidence that an edge exists, not the size of the edge.

    So, armed with limited information, I believe my attempt to reduce vig is a valid one. My thought is that the oddsmakers will do a very good job (most of the time) converting the moneyline into a comparable runline, and vice versa.

  31. #31
    lasker
    Join Date: 01-27-10
    Posts: 1,683
    Betpoints: 114

    I can't give technical advice, but I would say just bet small at first and see how it goes. Or don't bet it at all yet, just track the results... If your model can truly produce so many picks at such a high percentage, you'll have your answer soon enough.

    And let us know if it turns out to be gold or fool's gold!
    Last edited by lasker; 08-03-10 at 12:13 PM.

  32. #32
    CrimsonQueen
    Join Date: 08-12-09
    Posts: 1,068
    Betpoints: 1660

    I disagree that the oddsmakers will make the RL and ML comparable all (or even most) of the time. What I'm saying is there are two games on the board RIGHT NOW:
    Game 1: (Phillies)
    ML = -200 RL -120
    Game 2: (Cardinals)
    ML = -225 RL -105

    My point is sometimes a team is more likely to win the game but not to cover the RL.
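    The two games quoted can be compared side by side by converting each price to an implied probability. This is only a rough sketch: the probabilities still include the book's vig, and the runline is assumed to be the favorite at -1.5, which the post does not state. The helper name is made up for illustration.

```python
def implied_probability(american_odds: int) -> float:
    """Implied win probability (vig included) for an American-odds price."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


# team -> (moneyline, runline) prices quoted in the post
games = {
    "Phillies": (-200, -120),
    "Cardinals": (-225, -105),
}

implied = {
    team: (implied_probability(ml), implied_probability(rl))
    for team, (ml, rl) in games.items()
}

for team, (p_ml, p_rl) in implied.items():
    print(f"{team}: ML implies {p_ml:.1%}, RL implies {p_rl:.1%}")
```

    On these prices the Cardinals are the bigger moneyline favorite but the smaller runline favorite, which is the pattern the post is pointing at.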

  33. #33
    jgilmartin
    Join Date: 03-31-09
    Posts: 1,119

    Home favorites and away favorites will have different runlines.

  34. #34
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by CrimsonQueen View Post
    I disagree that the oddsmakers will make the RL and ML comparable all (or even most) of the time. What I'm saying is there are two games on the board RIGHT NOW:
    Game 1: (Phillies)
    ML = -200 RL -120
    Game 2: (Cardinals)
    ML = -225 RL -105

    My point is sometimes a team is more likely to win the game but not to cover the RL.
    Right, and the books account for this with the lines they set, thus making them comparable.

  35. #35
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    I think books basically convert ML to RL using historical Home/Away 1 run win statistics, but these are still 2 separate markets, so even though they usually move closely together they do not have to. Whether to bet the ML or the RL (moneyline or spread) should come down to which side has more value, and just because a market has high juice does not mean that you cannot find value in it.
