Quick and easy (for you) question
illfuuptn (SBR MVP)
- 03-17-10
- 1860
#1
I know we typically use 99%, 95%, and 90% confidence levels (alpha = 1%, 5%, and 10%) for confidence intervals. But say we're trying to determine whether something is PROBABLY significant. Am I correct in thinking that we would use 51%?
buby74 (SBR Hustler)
- 06-08-10
- 92
#2
No! A p value is not a percentage measure of whether or not you are "right."
The alpha (or p value) is another way of saying:
"IF I had no skill and my results were just random, then I would see a result as good as this only x% of the time."
So if it is 5% or 1%, you might persuade yourself that you are not just on a lucky streak and are better than a random number generator. But if you got a result with a p value of 0.49, that is not impressive (put 49% where x% is in the quoted sentence and say it out loud).
Of course, even if you get a decent p value, Dr. Bonferroni will then come along and smack you down for your impudence... statistics is a miserable science.
The p value tells you how your results compare to the null hypothesis of no skill.
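To make the "if I had no skill" sentence concrete, here is a minimal Python sketch. It assumes a no-skill bettor wins each bet independently with probability 0.5; the 60-40 record and the m = 10 systems in the Bonferroni line are made-up numbers for illustration, and `binomial_p_value` is a hypothetical helper, not a library function.

```python
from math import comb

def binomial_p_value(wins: int, n: int, p_null: float = 0.5) -> float:
    """One-sided p value: P(X >= wins) when X ~ Binomial(n, p_null).

    This is the "if I had no skill" probability: how often pure chance
    (win probability p_null on every independent bet) produces a record
    at least this good.
    """
    return sum(comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
               for k in range(wins, n + 1))

# Hypothetical record: 60 wins in 100 bets, null of no skill = 50% win rate.
p = binomial_p_value(60, 100)
print(f"p value = {p:.4f}")

# Bonferroni: if you tried m systems and report the best one,
# compare its p value against alpha / m instead of alpha (m is hypothetical).
alpha, m = 0.05, 10
print("significant after Bonferroni:", p < alpha / m)
```

The same record that looks good against a 5% threshold can fail once the threshold is divided across every system you tried.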
illfuuptn (SBR MVP)
- 03-17-10
- 1860
#3
I'm not talking about p values at all. The p value is measured against alpha to determine significance.
buby74 (SBR Hustler)
- 06-08-10
- 92
#4
If you set alpha at 50% and then get excited about a p value of 0.49, you have fallen into the same trap of thinking this result is "probably" significant.
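The trap can be quantified: at alpha = 0.5, a bettor with no skill at all gets declared "significant" close to half the time. A minimal sketch, assuming independent bets with a 50% win probability under the null and a hypothetical 100-bet sample; it sums the exact binomial probability of every record whose one-sided p value falls at or below alpha.

```python
from math import comb

def pmf(k, n, p=0.5):
    """Binomial probability of exactly k wins in n independent bets."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_value(wins, n, p=0.5):
    """One-sided P(X >= wins) under the no-skill null."""
    return sum(pmf(k, n, p) for k in range(wins, n + 1))

def false_positive_rate(alpha, n, p=0.5):
    """Probability that a bettor with NO skill still gets p value <= alpha."""
    return sum(pmf(k, n, p) for k in range(n + 1) if p_value(k, n, p) <= alpha)

for alpha in (0.05, 0.50):
    print(alpha, round(false_positive_rate(alpha, 100), 3))
```

Because the record count is discrete, the rate comes out a little below alpha rather than exactly equal to it.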
illfuuptn (SBR MVP)
- 03-17-10
- 1860
#5
If you construct a 95% confidence interval, wouldn't you say "I'm 95% confident that the population mean lies within this interval"?
uva3021 (SBR Wise Guy)
- 03-01-07
- 537
#6
When asking this question you have to provide the parameters and logic behind your research.
As humans we can often come to sound reasoning and isolate things that are unrelated, regardless of whether the descriptive statistics are neat and tidy.
smmtea (SBR Rookie)
- 08-11-11
- 8
#7
Technically, you can use 51% if you want. 99, 95, and 90 are just numbers that people picked because they are fairly large and look nice; there is no reason one cannot use 97% or 93%.
However, using 51% will probably not convince anyone to trust your results: a 51% interval procedure misses the true mean 49% of the time, which is far too often.
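How the confidence level sets the width: a minimal sketch assuming a normal-approximation (z) interval of the form mean ± z × standard error, where z is the (1 + level) / 2 quantile of the standard normal. `z_critical` is a hypothetical helper; `statistics.NormalDist` is in the Python standard library (3.8+).

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value: the (1 + confidence) / 2 quantile of N(0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.51, 0.90, 0.95, 0.99):
    z = z_critical(level)
    print(f"{level:.0%} interval: mean +/- {z:.3f} standard errors; "
          f"the procedure misses the true mean {1 - level:.0%} of the time")
```

A 51% interval is less than half as wide as a 95% one, which is exactly why it misses so often.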
TomG (SBR Wise Guy)
- 10-29-07
- 500
#8
Create the 95% confidence interval and see where the value falls within it. Does it fall right in the middle or near the ends of the CI? You don't need a rigid set of accept/reject rules for something like this.
illfuuptn (SBR MVP)
- 03-17-10
- 1860
#9
Okay then. Let's say a player has hit fastballs at an 80% clip for the first 2 months of the season. Then he hits them at 97% over the following 2 weeks (say it was 100 fastballs total in these 2 weeks). I have a basic program in Excel that spits out confidence intervals. In this case I know the population standard deviation, and I also input the sample size of 100 and the sample mean of 97. Then it gives me a 51% confidence interval of, let's say (and this is just hypothetical), 94 to 100.
Let's say if I did the same thing with a 95% interval it would be 78 to 100.
Wouldn't it be silly to go by the 95% confidence interval and not change anything about handicapping that player? After all, there is a 51% chance that his true fastball contact % is at least 94% over the last 2 weeks. So he is PROBABLY better now than he was before.
Assume all other variables constant (pitcher, count, game situation).
On another note, if the above is correct, how would I determine what his true average over that 2-week span is? It seems silly to assume that it's right in the center of the confidence interval at the mean of 97. Is there a way of determining that?
Thanks for any help.
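A sketch of what such a spreadsheet might compute for 97 hits in 100 fastballs, assuming each fastball is an independent hit/miss trial with the same contact probability. For a proportion the standard deviation is sqrt(p(1 - p)), so it is estimated from the sample rate of 0.97 rather than known separately. The Wilson score interval is a swapped-in alternative (not something from the thread) that behaves better than the plain normal approximation when the rate is close to 100%.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(hits, n, confidence):
    """Normal approximation p_hat +/- z * sqrt(p_hat(1 - p_hat)/n), clipped to [0, 1]."""
    p_hat = hits / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def wilson_interval(hits, n, confidence):
    """Wilson score interval; stays inside [0, 1] and is less lopsided near the bounds."""
    p_hat = hits / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

for conf in (0.51, 0.95):
    print(conf, wald_interval(97, 100, conf), wilson_interval(97, 100, conf))
```

The actual numbers depend on the interval method, so it is worth checking which formula the Excel program uses before comparing against the 80% baseline.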
illfuuptn (SBR MVP)
- 03-17-10
- 1860
#10
TomG, could you elaborate on your post and maybe answer my post above?
buby74 (SBR Hustler)
- 06-08-10
- 92
#11
Is the player better than he was, or has he just been lucky? Do a chi-square test between the first two months and the two weeks in which he has been hitting 97% of fastballs: is the difference enough to change your handicapping of the player?
What was the confidence interval during the first two months compared to the last two weeks?
Last edited by buby74; 08-16-11, 11:39 AM.
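A minimal sketch of the chi-square test buby74 suggests, written with the standard library only. The thread never gives the first two months' sample size, so the 400 fastballs (320 hits at 80%) are a hypothetical assumption; the last two weeks use the 97 of 100 from the example. It is a Pearson test of independence on a 2x2 table without a continuity correction, and `chi_square_2x2` is a hypothetical helper name.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    table = [[hits_a, misses_a], [hits_b, misses_b]]
    Returns (statistic, p value). With 1 degree of freedom the statistic is a
    squared standard normal, so P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts: 320 of 400 in the first two months, 97 of 100 in the last two weeks.
stat, p = chi_square_2x2([[320, 80], [97, 3]])
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

With a smaller first-period sample the same rates would give a weaker result, so the real counts matter.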
illfuuptn (SBR MVP)
- 03-17-10
- 1860
#12
^^So if a chi-square test finds the last 2 weeks to be significant, how do I then determine the most likely true mean for the past 2 weeks? It seems stupid to assume the true mean is 97%. But at the same time it doesn't seem right to just take the bottom end of the confidence interval.
The expected frequency just doesn't give the current streak enough credit IMO. In this example I think the expected frequency would be okay, but what if this player then hit 97% for 2 months? Then the expected frequency would be the mean of the two numbers, 88.5%. But at the same time, a 95% confidence interval for that two-month span would probably not contain 88.5.
Last edited by illfuuptn; 08-16-11, 12:20 PM.
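The 88.5% scenario can be checked directly. A sketch assuming, hypothetically, 400 fastballs in each two-month span (80% then 97%), independent trials, and a normal-approximation interval; with equal sample sizes the size-weighted pooled rate equals the simple average of the two rates.

```python
from math import sqrt
from statistics import NormalDist

def pooled_rate(hits_a, n_a, hits_b, n_b):
    """Overall hit rate across both spans, weighting each span by its sample size."""
    return (hits_a + hits_b) / (n_a + n_b)

def wald_ci(hits, n, confidence=0.95):
    """Normal-approximation interval for a single proportion."""
    p_hat = hits / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Hypothetical counts: 320 of 400, then 388 of 400.
pooled = pooled_rate(320, 400, 388, 400)
lo, hi = wald_ci(388, 400)
print(f"pooled rate {pooled:.3f}; 95% CI for the second span ({lo:.3f}, {hi:.3f})")
print("pooled rate inside that CI?", lo <= pooled <= hi)
```

Under these assumed counts the pooled 88.5% does fall well outside the second span's interval, consistent with the intuition in the post.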
brettd (SBR High Roller)
- 01-25-10
- 229
#13
You could Monte Carlo it. Run a proportion 0.51 of your simulation runs with his fastball hit % randomly assigned between 78% and 100% (or you could model a better breakdown via a Gaussian distribution, with 78% and 100% as the tails), then assign the remaining 0.49 of the runs to his original fastball hit % figure, again drawn from an appropriate distribution (this one would be tighter, since you should know his distribution around his current mean).
A bit of work, but you would get the best of both worlds this way. You could keep your 51% confidence and model the outcome of adopting it over x runs, while also looking at the long-term EV effect of him NOT actually having changed his hit %.
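A minimal sketch of brettd's mixture simulation. Assumptions: 51% of runs draw the "changed" hit rate uniformly from 78% to 100% (his first option), the other 49% draw the old rate from a normal distribution centered at 80% with a hypothetical spread of 0.02, and each run then simulates 100 independent fastballs. The function names are illustrative; plugging the simulated hits into an EV model is left to your own handicapping.

```python
import random

def simulate_hit_rates(runs, p_new=0.51, seed=0):
    """Draw one hit rate per simulation run from the two-part mixture.

    With probability p_new the player has "really" changed: rate ~ Uniform(0.78, 1.00).
    Otherwise he is unchanged: rate ~ Normal(0.80, 0.02), clipped to [0, 1].
    The 0.02 spread is a hypothetical stand-in for his known distribution.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(runs):
        if rng.random() < p_new:
            rate = rng.uniform(0.78, 1.00)
        else:
            rate = min(1.0, max(0.0, rng.gauss(0.80, 0.02)))
        rates.append(rate)
    return rates

def simulate_hits(rates, fastballs=100, seed=1):
    """For each drawn rate, simulate how many of the next `fastballs` he hits."""
    rng = random.Random(seed)
    return [sum(rng.random() < r for _ in range(fastballs)) for r in rates]

rates = simulate_hit_rates(20_000)
hits = simulate_hits(rates)
print("mean simulated hit rate:", sum(rates) / len(rates))
print("mean hits per 100 fastballs:", sum(hits) / len(hits))
```

The blended mean sits between the old 80% and the streak, weighted by how much credence (here 51%) you give the change.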
TomG (SBR Wise Guy)
- 10-29-07
- 500
#14
The "true" average is something that only exists in theory, and in some people's theory (Bayesians') the idea of a constant true average doesn't even exist at all. The best you can do is come up with an estimate that you feel is accurate going forward.
In your example 80% is near the bottom of the 95% confidence interval, so I would feel confident enough that it's outside the range. But post the actual CI; it will probably be wider and much less conclusive.
rsigley (SBR Sharp)
- 02-23-08
- 304
#15
I would recommend just using a t test with your sample variance. People say that if n is large enough you can just use a z test, but computers can easily compute the t test, so you might as well use it. The reason textbooks tell you to use the normal for n > 30 is that the tables in the back usually don't list every t degree of freedom. Just remember: for a 95% CI use the 0.975 quantile of t (t_0.975), not t_0.95.
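The t_0.975 point, sketched in plain Python: a two-sided 95% interval leaves 2.5% in each tail, so it uses the 0.975 quantile. The standard library has no t distribution, so this sketch integrates the t density numerically (Simpson's rule) and inverts it by bisection; in practice a statistics package would supply the quantile. The sample values below are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the integral of the density from 0 to |x| (Simpson's rule)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(q, df):
    """Inverse CDF for 0.5 < q < 1 by bisection (assumes the quantile is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(sample, confidence=0.95):
    """Two-sided CI for the mean using the sample standard deviation."""
    n = len(sample)
    q = (1 + confidence) / 2  # 0.975 for a 95% interval, not 0.95
    half = t_quantile(q, n - 1) * stdev(sample) / sqrt(n)
    return mean(sample) - half, mean(sample) + half

# Hypothetical sample of 10 observations (df = 9).
sample = [12.1, 9.8, 11.4, 10.2, 10.9, 9.5, 11.8, 10.4, 10.0, 11.1]
print("t_0.975 with 9 df:", t_quantile(0.975, 9))
print("95% CI for the mean:", t_interval(sample))
```

With 9 degrees of freedom the 0.975 quantile is about 2.26, noticeably larger than the 1.96 a z interval would use.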
In general, IMO CIs are horrible for predictive models and you should use credible intervals instead. Just think about what the definition of a confidence interval is.
Let's say with 95% confidence TomG will make between 85 and 95 bad bets a day. I can't say that today there is a 95% chance TomG will make between 85 and 95 bad bets. Instead, the 95% describes the procedure: if you sampled over and over and built an interval each time, about 95% of those intervals would contain the true mean, but there's no indication of whether this particular interval [85, 95] contains it. The reason is that when you construct a confidence interval you treat the mean as fixed; it doesn't have a probability distribution. So [85, 95] isn't a probability region around the parameter, and the thing TomG said about being near a bound doesn't apply.
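The repeated-sampling definition can be checked by simulation: fix a true mean, draw many samples, build an interval from each, and count how often the intervals contain the truth. A minimal sketch assuming normally distributed data with a known standard deviation; the true mean, sigma, sample size, and trial count are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_mean=10.0, sigma=2.0, n=30, confidence=0.95, trials=10_000, seed=42):
    """Fraction of repeated-sample z intervals that contain the fixed true mean.

    Assumes normal data with known sigma, so each interval is
    sample_mean +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half = z * sigma / sqrt(n)
    covered = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half <= true_mean <= sample_mean + half:
            covered += 1
    return covered / trials

print(coverage())
print(coverage(confidence=0.51))
```

The long-run fraction lands near the nominal level (about 0.95, or about 0.51), which is a statement about the procedure rather than about any single interval.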
Whereas with credible intervals (and Bayesian statistics) you can accurately say there is a 95% probability that the mean is between 85 and 95.
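A credible interval for the fastball example, sketched with a Beta-binomial model (a standard Bayesian choice for a hit rate, swapped in here as an illustration rather than taken from the thread). Assumptions: a flat Beta(1, 1) starting point, the first two months counted as a hypothetical 320 hits and 80 misses, then the 97 of 100 streak. The interval is estimated from random draws with `random.betavariate`, which is in the Python standard library.

```python
import random

def beta_posterior(prior_a, prior_b, hits, misses):
    """Beta-binomial update: Beta(a, b) prior plus hits/misses gives Beta(a + hits, b + misses)."""
    return prior_a + hits, prior_b + misses

def credible_interval(a, b, level=0.95, draws=100_000, seed=7):
    """Equal-tailed credible interval for Beta(a, b), estimated from random draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_index = int((1 - level) / 2 * draws)
    hi_index = int((1 + level) / 2 * draws) - 1
    return samples[lo_index], samples[hi_index]

# Hypothetical prior: Beta(1, 1) plus 320 hits / 80 misses from the first two months,
# then update with the 97 hits / 3 misses from the last two weeks.
a, b = beta_posterior(1 + 320, 1 + 80, hits=97, misses=3)
print("posterior mean:", a / (a + b))
print("95% credible interval:", credible_interval(a, b))
```

Counting every old fastball at full weight makes the prior strong, so the posterior mean (about 0.83) moves only partway toward 97%. Down-weighting the old counts is a judgment call about how much the player may have changed; that weighting is an assumption, not something the thread specifies.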
But if you prefer to use old tests that aren't accurate, just follow this:
Variance known, mean unknown = z-test (if the variance is estimated from the sample, use a t-test; a t-test with infinite degrees of freedom is a z-test)
Inference about an unknown variance = chi-squared
Comparing two variances = F-test (ratio of two chi-squared variables, each divided by its degrees of freedom)
Also, you may find a lot of benefit in computing the power function of the test. It's a good visual representation of how good the test is.
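A power function gives, for each possible true value, the probability that the test rejects the null. A minimal sketch for the fastball example, assuming a one-sided test of H0: p = 0.80 against p > 0.80, a hypothetical 100 fastballs, alpha = 0.05, and the normal approximation for the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

def power(p_true, p0=0.80, n=100, alpha=0.05):
    """Power of a one-sided z test of H0: p = p0 vs H1: p > p0 (normal approximation).

    Reject H0 when the sample proportion exceeds p0 + z_(1-alpha) * sqrt(p0(1-p0)/n);
    power is the probability of that event when the true rate is p_true (0 < p_true < 1).
    """
    std = NormalDist()
    critical = p0 + std.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std.cdf((critical - p_true) / se_true)

for p_true in (0.80, 0.82, 0.85, 0.88, 0.90, 0.95):
    print(f"true rate {p_true:.2f}: power {power(p_true):.3f}")
```

Power equals alpha at the null value of 0.80 and climbs toward 1 as the true rate moves away from it; plotting it over a grid of true rates gives the visual described above.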