Props and Question for Ganchrow

tacomax · 03-27-07, 02:09 PM

I'll put my neck on the line and then ganchrow will come and set you straight and put me in my place. I'm assuming that you make 200 wagers per year, and that the results follow a binomial distribution like a coin flip, where there are two outcomes (win/lose - we'll ignore the push) and the probabilities sum to 1. I'll also assume that if you picked your games randomly, you'd get a 50/50 split of wins/losses.

So, the probability of a win in a random pick is 0.5, the probability of a loss is 0.5. Now you can calculate if you're doing better than a random pick.

You say you are up 4-5% per year - is that in monetary terms or % wins (i.e. are you hitting 54-55%)? If it's the former, you need to know what price you generally pay - making 5% return on your money at a -110 shop makes you a better player than making 5% at a -105 shop. Since I don't know, I'm going to assume the latter - that you hit 55%.

Therefore, you get 110 wins out of 200 when a "coin flip" system should give you 100. Over 7 years, that's a 770-630 record when you should expect 700-700 if you were picking randomly.

The variance of this binomial distribution is 350 (200*7*0.5*0.5), so the standard deviation is 18.71. Your 770 wins are 70 above the expected 700, which puts your record after 7 years about 3.74 standard deviations away from the expected mean - and that means that there is strong statistical evidence that your picking ability is not 50/50 - i.e. you're hitting 55% by judgment rather than luck.
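For anyone who wants to check the arithmetic, here is a minimal sketch of the same coin-flip test in Python. The variable names and the `normal_cdf` helper are mine, not from the post, and it uses the same assumptions as above (200 bets a year, 55% winners, pushes ignored).

```python
import math

def normal_cdf(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

bets = 200 * 7      # assumed: 200 wagers a year for 7 years
wins = 770          # assumed: 55% of 1,400
null_p = 0.5        # null hypothesis: coin-flip picking

mean = bets * null_p                            # 700 expected wins
sd = math.sqrt(bets * null_p * (1 - null_p))    # sqrt(350)
z = (wins - mean) / sd                          # distance from the mean in SDs
p_value = 1.0 - normal_cdf(z)                   # one-sided tail probability

print(f"sd = {sd:.2f}, z = {z:.2f}, one-sided p = {p_value:.5f}")
```

The z-score comes out near 3.74, which is where the "over 3 standard deviations" figure comes from.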

That's my very quick attempt - I'll let Ganchrow the maths & stats daddy take over from here. I'm assuming lots of things and might have made an error somewhere. Now I'm off to my stats class.

Ganchrow · 03-27-07, 08:13 PM

Originally posted by Rain Man

Been keeping my eye on the forum for a fair bit now and always notice what a good dude ganchrow is for lending his great knowledge to everyone so that real logic gets applied.

Thanks for the kind words. Taco's pretty much right on with his analysis, although I'd have to temper his conclusions.

Originally posted by Rain Man

I don't bother doing real handicap work for lack of time so for fun over the years, i've kept stats on basketball and football looking for what i'll call intuitive logical results patterns in hopes of finding the ones that hold true over time. Over the years I unearthed two particular ones that I started betting six or seven years ago and have come out ahead in a tight range of 4-5 percent each and every year. Each of these has roughly 150-200 wagering opportunities a year.

My question then is how statistically significant are the results to date in gauging whether this is something that can hold true from now till i'm too old to remember what i'm doing in front of a computer with a betting page open in front of me.
Is this a case where perhaps the law of large numbers is lurking around the corner with an uzi.

I'll assume you've averaged a profit of 4.5% of amount wagered and have done so at -110. This implies a win probability of 1.045/(1 + 10/11) ≈ 54.738%.

So now let's lay out our hypothesis. One might propose to test, as taco has done, whether your picking has been significantly better than 50/50. I'd contend that that isn't the relevant factor when evaluating betting strategies, and instead we should be testing whether you've actually been profitable to a significant degree. Therefore our null hypothesis will be that you've been picking at a rate of no better than the -110 breakeven rate of 11/21 ≈ 52.381%.
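Both rates quoted above come out of the same expected-value equation: if a win pays b units per unit risked, then p*(1 + b) - 1 = ROI, so p = (1 + ROI)/(1 + b). A small sketch, where the function name and the handling of American odds are my own illustration rather than anything from the thread:

```python
def implied_win_prob(roi, american_odds=-110):
    """Win rate needed to earn `roi` per unit risked at the given American odds."""
    if american_odds < 0:
        payout = 100 / -american_odds   # e.g. -110 pays 10/11 per unit risked
    else:
        payout = american_odds / 100    # e.g. +120 pays 1.2 per unit risked
    return (1 + roi) / (1 + payout)

print(f"4.5% ROI at -110: {implied_win_prob(0.045):.5f}")
print(f"breakeven at -110: {implied_win_prob(0.0):.5f}")
print(f"breakeven at -105: {implied_win_prob(0.0, -105):.5f}")
```

The -105 line shows taco's earlier point: the cheaper the price, the lower the win rate needed for the same return.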

We'll do this two different ways: first we'll approximate with a Z-test (which would yield exact results asymptotically, meaning as the number of trials approached infinity); and then compare that to the exact answer we get from using the binomial distribution directly.

So you've had 175 trials per year and, over a 7-year period, have hit at a rate of 54.738%. This corresponds to a record of about 671 wins and 554 losses. Now if you were in fact only a breakeven bettor, your expectation over 1,225 picks would be 1,225 * 52.381% ≈ 642 correct, and your standard deviation over 1,225 picks would be sqrt(1,225 * 52.381% * (1-52.381%)) ≈ 17.48 picks. At 671 picks correct you're (671-642) / 17.48 ≈ 1.66 standard deviations away from what's expected (1.652 if you carry the unrounded figures 670.54 and 641.67 through).

Using Excel, we see that a Z-score of 1.652 corresponds to a cumulative probability of 95.072% (=NORMSDIST(1.652)), i.e. a one-sided p-value of about 4.9%. This means that at the 5% significance level we'd reject the null hypothesis that your picking over the last 7 years was no better than breakeven. We'd be unable to reject the null hypothesis at stricter significance levels such as 1%.
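The same Z-test can be reproduced outside Excel. This is only a sketch using the assumptions already stated (175 bets a year for 7 years, 4.5% return at -110); the `normal_cdf` helper stands in for NORMSDIST and the variable names are mine.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, equivalent to Excel's NORMSDIST."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 175 * 7                       # 1,225 picks
p_null = 11 / 21                  # breakeven rate at -110 (52.381%)
p_obs = 1.045 / (1 + 10 / 11)     # observed rate implied by a 4.5% return

expected = n * p_null                         # about 641.67 wins
sd = math.sqrt(n * p_null * (1 - p_null))     # about 17.48 wins
z = (n * p_obs - expected) / sd               # unrounded wins, about 670.54

print(f"expected = {expected:.2f}, sd = {sd:.2f}, z = {z:.3f}")
print(f"cumulative probability = {normal_cdf(z):.5f}")
```

Carrying the unrounded win count through gives z ≈ 1.652, matching the figure fed to NORMSDIST.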

Now we'll try the same test using the binomial distribution. The idea is to determine what the probability of achieving fewer than 671 wins would be were you to pick at a rate of 52.381%. Using Excel, it's rather simple. We have =BINOMDIST(670,1225,52.381%,1). This works out to about 95.058%, which is absurdly close to our Z-test results of 95.072%.
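For those without Excel, the cumulative binomial can be summed directly. The sketch below is one way to do it, not necessarily how BINOMDIST works internally; `binom_cdf` is a name I've made up, and the sum is done in log space because the binomial coefficients for n = 1,225 are far too large for ordinary floating point.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        # log of C(n, i) * p^i * (1-p)^(n-i)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total

cdf = binom_cdf(670, 1225, 0.52381)   # same arguments as the BINOMDIST call
print(f"P(wins <= 670 | p = 52.381%) = {cdf:.5f}")
```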

So what can we make of this? Well the returns are certainly intriguing, to say the least (we could be ~99.95% confident these models are better than 50/50), but to determine just how strong they really are we'd need to look more closely at how you came up with these numbers. You should certainly ask yourself these questions:

Might there be survivorship bias, meaning did you start off with a larger number of strategies, whittling the number down when some might have proven themselves ineffective? Are these actual returns, or do they represent changes you may have applied retroactively to your model (for example, maybe for the first 4 years you treated home and away teams in the same manner, but then 3 years ago started treating them differently and have applied the home/away team correction to your earlier picks)? Did you start with this strategy more than 7 years ago but only find it effective over the past 7?

All these are the very sorts of issues that can greatly skew the results of hypothesis tests. You really need to make sure you're confident with the nature of each one.
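To put a rough number on the survivorship issue: if you had tried several strategies that were all truly breakeven, the chance that at least one of them clears a 5% test by luck alone grows quickly. The sketch below assumes the strategies are independent, which real ones rarely are, and the strategy counts are purely hypothetical.

```python
def prob_any_false_positive(num_strategies, alpha=0.05):
    """Chance at least one of several independent breakeven strategies passes at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_strategies

for k in (1, 5, 10, 20):   # hypothetical numbers of strategies tried
    print(f"{k:2d} strategies tried: {prob_any_false_positive(k):.3f}")
```

With 10 such strategies the chance of at least one lucky "significant" result is already about 40%.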

Originally posted by Rain Man

I should also mention that I do understand that this presumes that the dependent variables remain constant. In fact would there even be a way to calculate as well what the likelihood is of the dependent variables staying where they should for all this to work?

Yes, you are correct about the dependent variables. You'd want to consider testing them for stability over time. How exactly you'd go about doing this would be a function of the nature of the variables themselves.

Originally posted by Rain Man

I hope I explained this well enough for you to answer ganchrow and thanks very much in advance.

You did and no problem. Hope this helps.

raiders72002 · 03-27-07, 08:33 PM

If taco were hitting 55%, throw out all that crap as regression to the mean will occur in a hurry and it won't stop there.

Rain Man · 03-28-07, 09:26 AM

Thank you for the response and help gentlemen. Sorry I can't put your quotes in the right places here but I don't have the time right now to figure out how to do it.

Special thanks Ganchrow for the quick tutorial on the calculative and deductive process for this type of situation. After some concentrated interpretation, I actually feel I understand and can do it myself for future reference. Also now know how to use some of those excel functions.

The point regarding survivorship bias was dead on the mark. (love that term......survivorship bias......really does nail it)

Back in the early 90's when I first started to do this type of thing, I repeatedly had promising looking ones go down the tube for this reason. Took a while of self admittal to get this very point thru my thick head. I feel confident admitting this no longer happens. I stay super stringent on the assessments and in fact go so far as to err the other way....sorta like putting vig on myself just to be sure the results are pure and honest.

At this point it looks like the only real wild card out there that might derail me is a change in some key dependent variable that I don't know exists as part of it. Sort of why I was hoping some brain had come up with a good number-based theory, or with a way of statistically assessing the probability of change in unknown dependent variables in a large sample where they must have remained constant to present.

I truly do appreciate very much the time spent in your responses......very helpful indeed.

All the best