I have recently come to the conclusion that anything less than strong protection against back fitting would leave me with useless models (i.e. I cannot rely on methods that require me to be judgmental in any way, or assume I have taken the right precautions, simply because I do not really understand when I am crossing the line).
For example, I recently gathered up 27 variables, which encompass the whole universe of these types of variables available to me (without unrealistic and considerable extra effort, anyway). My hypothesis is that these types of variables would be good predictors of positive EV. Also, these variables are all raw numbers, in the sense that I have not convoluted them to represent obscure things (like winning on a Sunday when the temperature is exactly 10 degrees at the start of the game and they lost their last game by exactly 12 points, etc.). These are raw, hard, untampered-with numbers. In fact, I had only ever looked at 7 of these variables before, specifically so I would not taint the final process when I looked at the rest. So, I ran a regression analysis on all 27 and came up with the output below (there is a rough sketch of this kind of fit after the output):
Observations 608.000
Sum of weights 608.000
DF 580.000
R² 0.081
Adjusted R² 0.039
MSE 1.158
RMSE 1.076
MAPE 94.240
DW 1.934
Cp 28.000
AIC 116.315
SBC 239.800
PC 1.007
Analysis of variance (computed against the model Y = Mean(Y)):

Source            DF    Sum of squares    Mean squares    F        Pr > F
Model             27    59.481            2.203           1.903    0.004
Error             580   671.408           1.158
Corrected Total   607   730.889
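For anyone curious about the mechanics, the fit itself is nothing exotic. Here is a rough sketch in Python with statsmodels of the kind of regression I ran; the file name and column names are made-up stand-ins for my actual data, not the real ones:

    # Sketch of the full 27-variable fit (hypothetical file and column names).
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("bets.csv")                   # one row per bet (hypothetical file)
    y = df["return_to_win_1"]                      # +1 for a win, minus the amount risked for a loss
    X = sm.add_constant(df.drop(columns=["return_to_win_1"]))   # the 27 raw predictors plus an intercept

    model = sm.OLS(y, X).fit()
    print(model.rsquared, model.rsquared_adj)      # the R-squared and adjusted R-squared reported above
    print(model.f_pvalue)                          # overall F-test p-value (the 0.004 above)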
With a p-value of 0.004 and an R² of 0.081 (which I took to mean the model explained 28.44% of the variability of the actual ROI), I felt that I had a profitable model. Keep in mind that the dependent variable was the return of betting to win 1 unit (so either +1, or the negative of the amount lost trying to win 1). Because the adjusted R² was so much lower than the R², this indicated to me that a significant number of variables contributed little to the 0.081. So, rather than try (what would be impossible anyway) to check every possible combination of 27 or fewer of the variables, I decided to check the adjusted R² of each model obtained by dropping one variable from the 27 (26-variable models) and see which gave the highest adjusted R². If one beat the original 0.039, I would keep it and then try dropping each of the remaining variables from my new 26-variable model to find the best adjusted R², and so on until no further improvement could be made (a rough sketch of this loop follows the output below). The best I came up with had 17 variables. None of the 16-variable models produced a better result, so I accepted the 17-variable model:
Observations 716.000
Sum of weights 716.000
DF 698.000
R² 0.082
Adjusted R² 0.060
MSE 1.121
RMSE 1.059
MAPE 94.404
DW 1.910
Cp 18.000
AIC 99.752
SBC 182.079
PC 0.965
Analysis of variance:

Source            DF    Sum of squares    Mean squares    F        Pr > F
Model             17    70.032            4.120           3.674    < 0.0001
Error             698   782.677           1.121
Corrected Total   715   852.709
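As mentioned above, here is a rough sketch of that elimination loop, again in Python with statsmodels; y and X_27 stand in for my dependent variable and a DataFrame of the 27 predictors, and the loop simply keeps dropping whichever single variable raises adjusted R² the most until no drop helps:

    # Backward elimination on adjusted R-squared, a sketch of the loop described above.
    import statsmodels.api as sm

    def adj_r2(y, X):
        """Fit OLS with an intercept and return the adjusted R-squared."""
        return sm.OLS(y, sm.add_constant(X)).fit().rsquared_adj

    def backward_eliminate(y, X):
        """Drop one variable at a time as long as adjusted R-squared keeps improving."""
        kept = list(X.columns)
        best = adj_r2(y, X[kept])
        while len(kept) > 1:
            # score every model that drops exactly one of the remaining variables
            scores = {col: adj_r2(y, X[[c for c in kept if c != col]]) for col in kept}
            drop, score = max(scores.items(), key=lambda kv: kv[1])
            if score <= best:
                break              # no single drop improves adjusted R-squared, so stop
            best = score
            kept.remove(drop)
        return kept, best

    # kept, best = backward_eliminate(y, X_27)   # X_27 = DataFrame of the 27 predictors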
Now, before everyone reading this starts to respond with "back fitting" etc., I employed (at least to my understanding) one of the statistically strongest controls against back fitting that there is, the Bonferroni method, which Wikipedia describes as follows:
"In order to retain the same overall rate of false positives (rather than a higher rate) in a test involving more than one comparison, the standards for each comparison must be more stringent. Intuitively, reducing the size of the allowable error (alpha) for each comparison by the number of comparisons will result in an overall alpha which does not exceed the desired limit, and this can be mathematically proved to be true using Bonferroni's inequality, regardless of independence or dependence among test statistics.
However, it can be demonstrated that this technique (called the Bonferroni method) is overly conservative, i.e., it will actually result in a true alpha that is substantially smaller than 0.05 when the test statistics are highly dependent and/or when many of the nulls are false; thereby failing to identify an unnecessarily high percentage of the true differences. For example, in fMRI analysis, tests are done over 100000 voxels in the brain. The Bonferroni method would require p-values to be smaller than .05/100000 to declare significance; this threshold might be considered too stringent for practical use."
So, in total I actually only looked at the p-value and adjusted R² of 27+26+25+24+23+22+21+20+19+18+17 = 242 different combinations. That is 242 out of the 2^27 − 1 (roughly 134 million) possible combinations of 27 or fewer variables. Furthermore, according to the Bonferroni method, as long as the p-value of whatever I find and want to call statistically significant (I'll use α = 0.05 as the default) is less than 0.05/242 = 0.0002066, I have met the requirements of one of the most conservative tests there is against back fitting. Well, my p-value is so low it is reported only as < 0.0001, and for what it is worth, most of the models I rejected were also < 0.0001. Since < 0.0001 is less than 0.0002066, I have not crossed the line to where my process could be biased (at least to my understanding).
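To spell out the arithmetic I am relying on (using 0.0001 as a stand-in for the software's "< 0.0001" readout), in Python:

    # Bonferroni check on the 242 model comparisons I actually looked at.
    alpha = 0.05
    m = 27 + 26 + 25 + 24 + 23 + 22 + 21 + 20 + 19 + 18 + 17    # 242 comparisons
    threshold = alpha / m                      # 0.05 / 242 = 0.0002066...
    observed_p = 0.0001                        # upper bound; the software only reports "< 0.0001"
    print(threshold, observed_p < threshold)   # prints roughly 0.0002066 True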
Now, this is where I think I am seeing a contradiction.
When I look at the Holm-Bonferroni method (as opposed to the plain Bonferroni method), Wikipedia states:
"Suppose there are k hypotheses to be tested and the overall type 1 error rate is α. Start by ordering the p-values and comparing the smallest p-value to α/k. If that p-value is less than α/k, then reject that hypothesis and start all over with the same α and test the remaining k - 1 hypothesis, i.e. order the k - 1 remaining p-values and compare the smallest one to α/(k - 1). Continue doing this until the hypothesis with the smallest p-value cannot be rejected. At that point, stop and accept all hypotheses that have not been rejected at previous steps.
Here is an example. Four hypotheses are tested with α = 0.05. The four unadjusted p-values are 0.01, 0.03, 0.04, and 0.005. The smallest of these is 0.005. Since this is less than 0.05/4, hypothesis four is rejected. The next smallest p-value is 0.01, which is smaller than 0.05/3. So, hypothesis one is also rejected. The next smallest p-value is 0.03. This is not smaller than 0.05/2. Therefore, hypotheses one and four are rejected while hypotheses two and three are not rejected."
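To make sure I am reading that correctly, here is the procedure coded straight from that description and run on the example's four p-values; it rejects hypotheses four and one, exactly as the quote says:

    # Holm-Bonferroni, written directly from the quoted description.
    def holm(p_values, alpha=0.05):
        """Return the 1-based indices of the rejected hypotheses."""
        order = sorted(range(len(p_values)), key=lambda i: p_values[i])
        rejected = []
        k = len(p_values)
        for step, i in enumerate(order):
            if p_values[i] < alpha / (k - step):   # smallest remaining p vs alpha/(k - step)
                rejected.append(i + 1)
            else:
                break                              # stop at the first hypothesis that survives
        return rejected

    print(holm([0.01, 0.03, 0.04, 0.005]))   # [4, 1]: hypotheses four and one rejected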
That seems to say that I am supposed to throw out the combinations with the lowest p-values. I am completely lost. I thought I understood this for a moment, but now I am completely baffled. I thought I was supposed to be searching for low p-value models, not high p-value models.
Please try and explain this to me.
Thanks Ganchrow. I will be refreshing constantly while I wait on this one.