1. #1
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Random Date Testing

    Other than going over entire seasons to backtest a method, is there a 'best' approach in terms of testing randomly selected dates or games? Something along the lines of exit polls during elections that are reliable within a few percentage points.

    Something related to distribution of results, beyond merely sample size (and z-score).
    Last edited by Dark Horse; 05-18-09 at 04:23 AM.

  2. #2
    MonkeyF0cker
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    Is a few percentage points of error worth not testing the entire population? One would think that would have a significant impact on the validity of your results. Edges would typically fall within the range of standard error of a sample. I don't see how it could possibly conclusively confirm or deny an edge.

  3. #3
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    It's a few percentage points for exit polls, but the same idea could be tightened up.

    Theoretically, one could make a case that there are no complete populations, because these would also have to include the future. As such, again theoretically, an advanced random distribution approach may even produce more reliable results (going forward) than the past, because it would filter out extremes that throw off the model.

    In any case, some systems are extremely time-consuming to backtest against an entire population, so there would be value in a scientific shortcut. The Z-score gives enough of a measure of reliability, so my question here is about statistical distribution.
    Last edited by Dark Horse; 05-18-09 at 04:43 PM.

  4. #4
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    DH, with the variability from year to year in some sports, you would want to test at least two years (the NFL is a good example). In some sports (NBA and CBB) you will see easier lines early in the season and tighter lines later on. In sports there are NO guarantees; everything varies.

  5. #5
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Let's reason backwards. The objective is a Z-score of 2.5 to 3.0 (leaving out vig for now). What kind of sample size would you need for that?

    For this range of Z-scores, derived from a predetermined minimum winning percentage, what would the sample size have to be? Fill in the number.

    Now all I have to do is to find as neutral and random a sample as possible for that number of games. There is no need to focus on seasons as such. If the total number of available games spans twenty seasons, I still only need that number of games.

    So my question is about random game selection. I suppose I could simply throw dice to determine season, month, day, and game, but I was hoping for something a little more scientific.
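    Throwing dice would work, but a pseudo-random number generator does the same job more cleanly. A minimal sketch in Python (the game list and seed are placeholders, not a real data source): first solve for the sample size a target Z-score implies, then draw that many games uniformly from the full schedule.

    ```python
    import math
    import random

    def sample_size_for_z(win_pct, target_z):
        """Smallest n at which win_pct produces target_z,
        from z = 2 * sqrt(n) * (win_pct - 0.5) solved for n."""
        return math.ceil((target_z / (2.0 * (win_pct - 0.5))) ** 2)

    def pick_random_games(all_games, n, seed=None):
        """Uniform random sample of n games, without replacement."""
        rng = random.Random(seed)
        return rng.sample(all_games, n)

    # A 54% system at Z = 2.5 needs roughly a thousand games:
    n = sample_size_for_z(0.54, 2.5)   # 977
    ```

    Seeding the generator just makes the draw reproducible so the same sample can be re-checked later.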
    Last edited by Dark Horse; 05-18-09 at 11:48 PM.

  6. #6
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Roughly, you could say that if your system has to meet a minimum requirement of 54% and a Z-score of 2.5, then your sample size has to be 1000.

    Z-score for a sample size of 1000 and 54% (540-460) is 2.53.
    Z-score for a sample size of 1000 and 55% (550-450) is 3.16. So if the winning percentage were established at 55% or higher, you don't really need a sample size of 1000 to meet the 2.5 Z-score 'requirement'.
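    The arithmetic above can be reproduced with the usual fair-coin formula (a minimal sketch, ignoring pushes and vig):

    ```python
    import math

    def z_score(wins, losses):
        """Z-score of a record against a 50% null: (wins - n/2) / (sqrt(n)/2)."""
        n = wins + losses
        return (wins - n / 2.0) / (math.sqrt(n) / 2.0)

    print(round(z_score(540, 460), 2))  # 2.53
    print(round(z_score(550, 450), 2))  # 3.16
    ```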

    Hence my question.

    This could be a pretty cool tool for the toolkit: a random game selector for backtesting.
    Last edited by Dark Horse; 05-19-09 at 12:03 AM.

  7. #7
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    DH: if sports were the result of static distributions, I'd agree that setting a decent z-score might be enough. But unfortunately I contend that sports distributions have enough heteroskedasticity in them that you will fool yourself into thinking you have enough data to test against, even when you have seriously good z-scores of 4 or 5. The leagues change, and the line changes to reflect that and sports betting "fashions".

  8. #8
    MonkeyF0cker
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    Quote Originally Posted by Dark Horse View Post
    It's a few percentage points for exit polls, but the same idea could be tightened up.

    Theoretically, one could make a case that there are no complete populations, because these would also have to include the future. As such, again theoretically, an advanced random distribution approach may even produce more reliable results (going forward) than the past, because it would filter out extremes that throw off the model.

    In any case, some systems are extremely time-consuming to backtest against an entire population, so there would be value in a scientific shortcut. The Z-score gives enough of a measure of reliability, so my question here is about statistical distribution.
    Well, your original question asked if you could sample within a few percentage points. Of course, you could. I'm not sure why you edited it.

    What do you consider extremes? If those results are part of the distribution and you eliminate them, then you are skewing your results in the first place. If these "extremes" had happened previously, why are they less likely to occur in the future? And why would they not be statistically relevant?

    The best shortcut you can take is to learn a programming language to automate your backtesting. Depending on sampling with such a small population could lead to disastrous results.
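    For what it's worth, the automation doesn't have to be elaborate. A minimal sketch, assuming a made-up game schema (dicts with 'home_score'/'away_score' keys) and a rule function the bettor supplies; at -110 each loss costs 1.1 units against 1 unit won:

    ```python
    def backtest(games, system, price=-110):
        """Run a betting rule over every game; tally record and flat-stake profit.

        `system(game)` returns 'home', 'away', or None (no bet).
        """
        risk = abs(price) / 100.0   # units risked to win 1 unit at -110
        wins = losses = pushes = 0
        profit = 0.0
        for game in games:
            side = system(game)
            if side is None:
                continue
            margin = game['home_score'] - game['away_score']
            winner = 'home' if margin > 0 else 'away' if margin < 0 else None
            if winner is None:
                pushes += 1
            elif winner == side:
                wins += 1
                profit += 1.0
            else:
                losses += 1
                profit -= risk
        return wins, losses, pushes, profit
    ```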

  9. #9
    MonkeyF0cker
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    Quote Originally Posted by Dark Horse View Post
    Roughly, you could say that if your system has to meet a minimum requirement of 54% and a Z-score of 2.5, then your sample size has to be 1000.

    Z-score for a sample size of 1000 and 54% (540-460) is 2.53.
    Z-score for a sample size of 1000 and 55% (550-450) is 3.16. So if the winning percentage were established at 55% or higher, you don't really need a sample size of 1000 to meet the 2.5 Z-score 'requirement'.

    Hence my question.

    This could be a pretty cool tool for the tool kit. Random game selector for back testing.
    A sample size of 1000 would include almost 4 seasons of NFL games, nearly an entire year of NBA, and nearly half of a season in MLB. How many seasons are included in your population? If you're going to sample this many games, why not just backtest your entire population and eliminate the sampling error?
    Last edited by MonkeyF0cker; 05-19-09 at 08:10 AM.

  10. #10
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    I've done what you suggest, and I believe in its value.

    My hypothesis is that too much is made of looking at entire populations, and that very reliable results may be obtained from randomly selected games. So I was asking if anyone was aware of a method to randomly select.

    The suggestion, upfront, that such results will not be accurate enough is irrelevant. After all, one could always compare the results of the random method with the results of an entire population.

    As to extremes, there is a way to throw them out without polluting your sample (extremes could be viewed as being a pollutant). It's not really relevant to my question, though. Parameters could be defined upfront and a filter could be introduced into the random selector. No data mining.

  11. #11
    MonkeyF0cker
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    What exactly would an "extreme" consist of? And how do you extrapolate that these "extremes" are statistically irrelevant?

    There are plenty of methods to randomly sample. The best method to use in this instance would probably be stratified sampling. But, again, you add additional error to your results on top of the error that would be considered when testing your entire population (i.e., multiple sources of error).

    I really don't understand how accuracy isn't a primary concern. If it isn't, why bother backtesting in the first place? And if you do care about accuracy, then comparing your sample to the entire population is fairly worthless, as you've just done the work that you were attempting to avoid. Are you suggesting that these "extremes" you speak of are somehow tainting your results? I simply don't understand what you're getting at. It makes no sense.

  12. #12
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    'Stratified sampling'. That's food for thought...

    From wikipedia:
    In statistics, stratified sampling is a method of sampling from a population.

    When sub-populations vary considerably, it is advantageous to sample each subpopulation (stratum) independently. Stratification is the process of grouping members of the population into relatively homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then random or systematic sampling is applied within each stratum. This often improves the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population.
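    The definition above translates directly to code. A minimal sketch with seasons as the strata (the 'season' key is an assumed schema, not a real feed), allocating draws to each stratum in proportion to its size:

    ```python
    import random
    from collections import defaultdict

    def stratified_sample(games, n, key=lambda g: g['season'], seed=None):
        """Draw roughly n games, proportionally allocated across strata.

        `key` picks the stratum; within each stratum the draw is
        simple random sampling without replacement.
        """
        rng = random.Random(seed)
        strata = defaultdict(list)
        for g in games:
            strata[key(g)].append(g)
        total = len(games)
        sample = []
        for members in strata.values():
            k = round(n * len(members) / total)   # proportional allocation
            sample.extend(rng.sample(members, min(k, len(members))))
        return sample
    ```

    Stratifying by season (or by month, for the early-line vs. late-line point made above) guards against a plain random draw landing too heavily in one era of a league.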

    As to the number of games, obviously MLB, NHL, and NBA are in a different ballpark than the NFL. MLB, in particular, can be very challenging to backtest.

    Regarding extremes, I tend to toss them out because that has worked for me (that doesn't mean it will work here, but it could). For instance, if 99% of games end in totals within certain parameters, and the remaining 1% is way off, then that 1% could easily become a pollutant. In random sampling this would be important, because a random selection process could exaggerate the effect of an extreme if it happened to pick one.

    Question. Wouldn't it be interesting if, for highly effective systems, a sample size of a few hundred would turn out to be enough?
    Underlying question: is there such a thing as high quality random selection? Or does the idea of 'quality' undermine the randomness, and if so, is there a way to combine quality and randomness?
    Last edited by Dark Horse; 05-19-09 at 07:59 PM.

  13. #13
    MonkeyF0cker
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    But how are they not statistically relevant? Those high totals do occur and they will again.

  14. #14
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    They will occur again, but I'm willing to be surprised by rare exceptions if it means getting a 'cleaner' sample size in return.
