This is partly intended for Ganchrow, but anyone can respond:
Let's say I want to use statistics to evaluate a certain angle, or capper, or tout, or whatever.
I know that if I have some observations, I can use the binomial distribution, or the normal approximation for large samples, to calculate a p-value, given a null hypothesis. So if I observe an ATS record of 9-2, and I make the null hypothesis that each of this system/capper/etc.'s picks hits with probability q,
p = (11 choose 2) * q^9 * (1-q)^2
+ (11 choose 1) * q^10 * (1-q)
+ (11 choose 0) * q^11
And I can choose some significance level p*, and reject the null if p < p*. I hope that's right, anyway.
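Just to check my own arithmetic, here's how I'd compute that in Python (binom_pvalue is just a name I made up, and q = 0.5 is only an example value for the null):

from math import comb

def binom_pvalue(wins, losses, q):
    # One-sided p-value: probability of at least `wins` hits in wins+losses
    # picks when each pick hits independently with probability q.
    n = wins + losses
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(wins, n + 1))

# The 9-2 example against a coin-flip null:
print(binom_pvalue(9, 2, 0.5))   # ~0.0327, so it clears p* = 0.05 but not 0.01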
What I'd like to know is how things change if I have lots of sets of observations - say I'm data-mining and looking at 100 different angles at once. Or I'm evaluating lots of handicappers - maybe the BTP contest - and I want to know, quantitatively, "Can any of these guys really cap?" Obviously, I can calculate a p-value for each individual set of picks. But intuitively, those p-values don't seem to mean the same thing anymore. If the smallest p-value out of 700 is less than some p*, you would probably expect somebody to do that well from sheer luck alone, given how many records are being compared.
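That intuition is easy to put a number on, at least under the toy assumption that all 700 are coin-flippers making 11 independent picks each:

# Chance that at least one of 700 independent coin-flippers (11 picks each)
# goes 9-2 or better purely by luck.
n_cappers = 700
p_single = 0.0327                         # the 9-2 p-value from above
print(1 - (1 - p_single) ** n_cappers)    # ~1.0: practically guaranteed to happen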
I did work out that if you take the null to be "all of these guys' picks hit with the same probability q", then you can add all the results together and get a single p (for each q) for the entire sample. But if you rejected that null, all you would be able to say is "at least one of these cappers wins with probability greater than q", which isn't that useful. What I'm wondering is whether there is a way to come up with an "adjusted" p for each individual record that takes into account that it was one of many.
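For concreteness, the pooled version I mean is just this (a sketch reusing binom_pvalue from above; the records are made up):

# Null: every capper's picks hit with the same probability q (0.5 here).
# Pool all the wins and all the losses into one binomial tail.
records = [(9, 2), (15, 18), (40, 37)]    # hypothetical (wins, losses) pairs
total_wins = sum(w for w, l in records)
total_losses = sum(l for w, l in records)
p_pooled = binom_pvalue(total_wins, total_losses, 0.5)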
I hope that's clear. Thanks in advance for any input.
~ Max