Urgent Help Needed - Confidence Intervals

**hutennis** · 07-22-12, 12:56 PM

Originally posted by 339955

let's say I backtest something and come up with a result of -13.58 units over 688 bets resulting in a ROI of -.0197. how can i find out what my "true ROI" is with a 95% confidence level?

Are you looking for what your "true ROI" was, is or will be?

**339955** · 07-22-12, 12:59 PM

i want to find out what range my true ROI was in, at 95% or 90% confidence levels. i have been looking at stats tutorials online for a couple of days now but am having a hard time with it. this was with flat betting of 1 unit on bets between +150 and -150.

**hutennis** · 07-22-12, 01:09 PM

Even after getting help or educating yourself to the point that you can solve this elementary problem on your own, you still have to keep in mind that you should not apply whatever number you're going to get to your future betting activities.
It is not going to work.

**339955** · 07-22-12, 01:15 PM

hutennis, what do you think is a good method to use in backtesting to check whether a model is accurate?

FWIW I am computing -.0197 + 1.96 * 1.4 / sqrt(688) and -.0197 - 1.96 * 1.4 / sqrt(688), but someone told me this is wrong because "true ROI isn't a random variable".
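A minimal Python sketch of the interval described above, assuming (as the formula implies) that 1.4 is the standard deviation of profit per 1-unit bet and that the normal approximation is adequate for 688 bets. The function name `roi_confidence_interval` is hypothetical. With these inputs the interval runs from roughly -0.124 to +0.085, so it comfortably includes zero.

```python
import math
from statistics import NormalDist

def roi_confidence_interval(profit_units, n_bets, sd_per_bet, level=0.95):
    """Normal-approximation CI for mean profit per 1-unit bet (the ROI).

    sd_per_bet is assumed to be the standard deviation of profit per bet.
    """
    roi = profit_units / n_bets                      # sample ROI
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for 95%
    half_width = z * sd_per_bet / math.sqrt(n_bets)  # z * standard error
    return roi, roi - half_width, roi + half_width

roi, lo, hi = roi_confidence_interval(-13.58, 688, 1.4)
print(f"ROI {roi:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```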

**HUY** · 07-23-12, 01:30 AM

What the hell is "true ROI"? You should brush up on your statistics terminology.

The ROI you get after a number of bets is definitely a random variable.

**rsigley** · 07-23-12, 08:08 AM

"true roi" is an unknown constant, not a random variable. the r.v.s are the sample roi and sample variance used to construct the interval. once you collect data and form the CI, you can't make probability statements about ROI because it's not a random variable (e.g. "my ROI is between 1% and 5% with 95% probability" is wrong)

**TomG** · 07-23-12, 10:22 AM

ok. just say, "the confidence interval is 95% likely to contain my true roi." be careful with your sampling and calculations, and you can calculate a confidence interval without delving into a decades-long debate on the philosophical underpinnings of statistical inference on a fking gambling forum

**rsigley** · 07-23-12, 11:31 AM

that isn't true either

it's that 95% of the confidence intervals generated this way will contain it, which is different
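rsigley's distinction can be checked by simulation. The sketch below assumes even-money bets (win +1 unit, lose -1 unit) and a hypothetical fixed true ROI of -2%; it repeats a 688-bet backtest many times, builds a 95% interval from each, and counts how often the interval contains the true value. The 95% describes that long-run hit rate of the procedure, not any single interval. The function name and seed are illustrative.

```python
import math
import random
from statistics import NormalDist

def ci_coverage(true_roi=-0.02, n_bets=688, n_trials=1000, level=0.95, seed=42):
    """Share of simulated backtests whose CI contains the fixed true ROI."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    p_win = (1 + true_roi) / 2                  # even money: E[profit] = 2p - 1
    hits = 0
    for _ in range(n_trials):
        # One simulated backtest: +1 unit per win, -1 unit per loss
        profits = [1.0 if rng.random() < p_win else -1.0 for _ in range(n_bets)]
        mean = sum(profits) / n_bets
        var = sum((x - mean) ** 2 for x in profits) / (n_bets - 1)
        half_width = z * math.sqrt(var / n_bets)
        if mean - half_width <= true_roi <= mean + half_width:
            hits += 1
    return hits / n_trials

coverage = ci_coverage()
print(f"{coverage:.1%} of the intervals contained the true ROI")
```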

**339955** · 07-24-12, 02:19 PM

thanks for the explanations, not sure it helps much on a practical level though

**rsigley** · 07-24-12, 02:44 PM

sure it does. it tells you that if you want to make probabilistic inferences about things like that, you can't use frequentist statistics and need to use bayesian statistics
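For comparison, here is a sketch of the Bayesian route rsigley mentions, using a conjugate normal prior on ROI. The prior (mean 0, standard deviation 0.03) is a purely illustrative assumption, as is reusing 1.4 as the per-bet standard deviation; the resulting probability statements are only as good as that prior. `normal_posterior` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def normal_posterior(roi_hat, n_bets, sd_per_bet, prior_mean, prior_sd):
    """Conjugate normal update for the mean profit per bet (the ROI)."""
    data_precision = n_bets / sd_per_bet ** 2     # 1 / (standard error)^2
    prior_precision = 1 / prior_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * roi_hat) / post_precision
    return NormalDist(post_mean, math.sqrt(1 / post_precision))

# Illustrative prior: ROI ~ Normal(0, 0.03^2). This is an assumption, not data.
posterior = normal_posterior(-13.58 / 688, 688, 1.4, prior_mean=0.0, prior_sd=0.03)
lo, hi = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
print(f"P(ROI > 0) = {1 - posterior.cdf(0):.2f}")
print(f"95% credible interval: ({lo:.4f}, {hi:.4f})")
```

Unlike the confidence interval, the credible interval can be read as "ROI lies in this range with 95% probability", but only conditional on the chosen prior.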

**339955** · 07-26-12, 10:00 PM

okay thanks i will think on this some more

**mathdotcom** · 07-27-12, 12:14 AM

Originally posted by 339955

let's say I backtest something and come up with a result of -13.58 units over 688 bets resulting in a ROI of -.0197. how can i find out what my "true ROI" is with a 95% confidence level?

I am 95% confident that the data suggest it is more likely your model is a dud rather than a stud.

Thanks
MDC Industries

**339955** · 07-27-12, 10:58 AM

thanks dotcom but i am not really looking for a boy's intuition but for some statistical analysis

**MidnightToker** · 07-27-12, 11:02 AM

I'm sure someone will be by to hold your hand through this too.

**mathdotcom** · 07-27-12, 11:13 AM

Originally posted by 339955

thanks dotcom but i am not really looking for a boy's intuition but for some statistical analysis

Null hypothesis: ROI = 0
Alternative hypothesis: ROI > 0

Data: ROI < 0

p-value > 0.5

We cannot reject the null hypothesis in favor of the alternative hypothesis.
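mathdotcom's test, sketched with the normal approximation and the per-bet standard deviation of 1.4 from earlier in the thread (an assumption carried over, not something mathdotcom specified). Because the sample ROI is negative, z is negative and the one-sided p-value is necessarily above 0.5. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_p_value(roi_hat, n_bets, sd_per_bet, null_roi=0.0):
    """z and p-value for H0: ROI = null_roi vs H1: ROI > null_roi (normal approx.)."""
    z = (roi_hat - null_roi) / (sd_per_bet / math.sqrt(n_bets))
    return z, 1 - NormalDist().cdf(z)   # upper-tail probability

z, p = one_sided_p_value(-13.58 / 688, 688, 1.4)
print(f"z = {z:.3f}, one-sided p = {p:.3f}")
```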

If you can't figure this out for yourself, then your model is no doubt complete horseshit and I pity you if you bet it. What's even scarier is that the backtesting showed you that you have a dud, but you still think you're onto something.

I really feel sorry for guys who have no background at all in statistics and are trying to create a successful model. Spurious correlations and poorly thought out regressions are no better than the methods used by squares looking for stupid trends like "Pitcher X is 10-0 when pitching on a Friday west of the Mississippi with the wind blowing north."

**big0mar** · 07-27-12, 11:18 AM

No offense, but I would recommend getting a better grasp on statistics before getting yourself into this type of analysis.

**a4u2fear** · 07-27-12, 11:33 AM

Originally posted by mathdotcom

I am 95% confident that the data suggest it is more likely your model is a dud rather than a stud.

Thanks
MDC Industries

Yup, this guy is completely useless unless you call him useless, then he tries to justify himself.

**mathdotcom** · 07-27-12, 12:17 PM

Originally posted by a4u2fear

Yup, this guy is completely useless unless you call him useless, then he tries to justify himself.

My original statement is identical to the 'formal' one presented after.

Do you guys realize what a retarded question he asked? He wants a single number and a range at the same time. I told him what the main implication of his results is -- that he is more likely to have a shit model than a good model.

So many of these guys must be losing their shirts because they did not take Stats 101. Sad.

**MidnightToker** · 07-27-12, 01:30 PM

I wouldn't spend much time trying to help those who aren't willing to help themselves. There's enough material out there and enough obvious weaknesses in unluckyboy's knowledge that there's really no reason for him to not just shut up and work through things by himself for some months. I guess that's just too hard though, and asking other people is just too easy.

**339955** · 07-27-12, 02:03 PM

lol midnight toker. i'm not willing to help myself? yeah that conveniently ignores the fact that i got the data and built the program to backtest it. i am studying statistics but need help. everyone needs help. i get some help from books and online videos and now i come here for more help.

i get it dude, you are a jerk, but stop trying to impose your moral system of indifference and selfishness on others. out of decency, please keep it to yourself.

**339955** · 07-27-12, 02:13 PM

mathdotcom thanks for the post. while i am working through this P-value stuff i have another question.

i have a DB of full game lines and quarter-by-quarter scores. so i just do a weighted regression of the quarter result on the full game line, weighted by how many games in the sample have a given full game line. in what ways would you think to improve on this? i have tried changing the regression to include only games with a full game line closer to that of the game being modeled, but have found these to be less accurate. it looks like a good linear relationship.

fwiw the results i gave above were for just some of the data backtested. on other subsets the results can be more favorable, though i am not really sure what kind of ROI people are averaging with derivatives...

**mathdotcom** · 07-27-12, 02:18 PM

Originally posted by 339955

mathdotcom thanks for the post. while i am working through this P-value stuff i have another question.

i have a DB of full game lines and quarter-by-quarter scores. so i just do a weighted regression of the quarter result on the full game line, weighted by how many games in the sample have a given full game line. in what ways would you think to improve on this? i have tried changing the regression to include only games with a full game line closer to that of the game being modeled, but have found these to be less accurate. it looks like a good linear relationship.

Can you rephrase this part? Not following.

For now I would say don't re-run your model by changing the underlying sample. If you do that enough you will eventually find that your model works well for game spreads between -1.5 and +3 (for example), but it is unlikely that it will for those spreads going forward. The reason is that if you restrict your sample enough, eventually any model will make predictions that would've beaten the market line. Instead, you should always be guided by theory: what do you think is the true relationship between the derivative cashing and the other variables you've collected? Do you have data for all those relevant variables? If not, what is missing? Does it matter? If it does, how can you try to fix that? etc.
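A small simulation illustrates mathdotcom's point about restricting the sample. Every bet below is a hypothetical even-money coin flip, so the true ROI is exactly zero; each backtest is then sliced into 20 made-up spread buckets and checked for any bucket that "beats" ROI = 0 in a one-sided z-test at the 5% level. Bucket count, seed, and sizes are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def mined_backtest(rng, n_bets=688, n_buckets=20, alpha=0.05):
    """One zero-edge backtest, sliced into hypothetical spread buckets.

    Returns the best bucket ROI and whether any bucket 'beats' ROI = 0
    in a one-sided z-test at level alpha.
    """
    buckets = [[] for _ in range(n_buckets)]
    for _ in range(n_bets):
        profit = 1.0 if rng.random() < 0.5 else -1.0   # coin flip: true ROI is 0
        buckets[rng.randrange(n_buckets)].append(profit)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    best_roi, any_significant = -math.inf, False
    for profits in buckets:
        if not profits:
            continue
        roi = sum(profits) / len(profits)
        z = roi * math.sqrt(len(profits))             # per-bet SD is 1 for +/-1 bets
        best_roi = max(best_roi, roi)
        any_significant = any_significant or z > z_crit
    return best_roi, any_significant

rng = random.Random(2012)
runs = [mined_backtest(rng) for _ in range(500)]
false_positive_rate = sum(sig for _, sig in runs) / len(runs)
print(f"zero-edge backtests with a 'winning' bucket: {false_positive_rate:.0%}")
```

With 20 buckets, a large share of these zero-edge backtests contain at least one bucket that looks significant, which is why a subset found this way says little about future results.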

**339955** · 07-27-12, 02:30 PM

yeah this p-value stuff is what i was looking for all along. i will give it a try:

hypothesis - my ROI is 2% for each bet

z = (sample mean - population mean) / (standard deviation / sqrt(sample size))

z = (-.02 - .02)/ (1.1/26) = -.945
since the z-score is not outside the -1.96 to 1.96 range, the result is consistent with the hypothesis and we cannot reject it.
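The same two-sided test in Python, using the exact sqrt(688) rather than 26 and the per-bet standard deviation of 1.1 from this post. Note that failing to reject 2% is not evidence that the ROI is 2%: with these numbers the test also fails to reject an ROI of 0. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_z_test(roi_hat, null_roi, n_bets, sd_per_bet):
    """z statistic and two-sided p-value for H0: ROI = null_roi."""
    z = (roi_hat - null_roi) / (sd_per_bet / math.sqrt(n_bets))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = two_sided_z_test(-13.58 / 688, 0.02, 688, 1.1)
print(f"z = {z:.3f}, two-sided p = {p:.3f}, reject at 5%: {p < 0.05}")
```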

**339955** · 07-27-12, 02:37 PM

so i have data points where x is the full game line, and y is the percent of times teams with that full game line beat the quarter line. clearly as x goes up y will go up. then i put a regression line through all those data points. it appears to me that the relationship is linear. in other words, if you have a quarter line of 7, full game lines of 30 will be just as helpful in predicting the first quarter result as full game lines of 60.
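One way to fit the weighted regression described above is weighted least squares, with each full game line weighted by the number of games observed at that line. This is a sketch, not necessarily the poster's exact method; the function name and the example numbers are hypothetical.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: full game line, share of games beating a fixed quarter
# line, and number of games observed at that full game line (the weight).
lines = [30.0, 40.0, 50.0, 60.0]
cover_rate = [0.41, 0.47, 0.54, 0.60]
games = [150, 210, 90, 25]
intercept, slope = weighted_linear_fit(lines, cover_rate, games)
print(f"cover rate ~ {intercept:.3f} + {slope:.4f} * full game line")
```

Weighting by game count is a reasonable default because the sampling variance of a cover rate estimated from n games shrinks roughly like 1/n, so lines with more games deserve more influence on the fit.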

i have data going from 2008 to 2012 but not earlier, because i think the game may have changed if we go back too far and those results may not be relevant.

i modeled the quarter line simply as a derivative of the full game line. i was told by someone that the full game line is the only variable that matters, since a first quarter line is a true derivative. so i am not sure how i could improve the model at this point.

**mathdotcom** · 07-27-12, 02:45 PM

Originally posted by 339955

so i have data points where x is the full game line, and y is the percent of times teams with that full game line beat the quarter line. clearly as x goes up y will go up. then i put a regression line through all those data points. it appears to me that the relationship is linear. in other words, if you have a quarter line of 7, full game lines of 30 will be just as helpful in predicting the first quarter result as full game lines of 60.

i have data going from 2008 to 2012 but not earlier, because i think the game may have changed if we go back too far and those results may not be relevant.

i modeled the quarter line simply as a derivative of the full game line. i was told by someone that the full game line is the only variable that matters, since a first quarter line is a true derivative. so i am not sure how i could improve the model at this point.

If you continue with this your model is basically going to predict the Q1 spread as (game spread/4), which is the same number the books post. Given the juice on Q1 lines, you will have to bring something new to the analysis to try and get an edge. Maybe some teams are perennially slow starters, maybe in some games the spread takes into account the effect that star players will be rested in the 2H if they have a big enough lead, etc. Whatever idea you get, you'll have to quantify it and if you're lucky it'll be significant and change your predictions enough to give you an edge.
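To see how much the juice matters, the break-even win rate and expected ROI for a single 1-unit bet follow directly from American odds. The sketch assumes standard American-odds payouts; -110 is used only as a familiar example, since the thread does not state the actual Q1 price. Both function names are hypothetical.

```python
def break_even_probability(american_odds):
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def expected_roi(win_prob, american_odds):
    """Expected profit per 1-unit stake at the given odds and win probability."""
    payout = american_odds / 100 if american_odds > 0 else 100 / -american_odds
    return win_prob * payout - (1 - win_prob)

for odds in (-150, -110, 150):
    print(odds, f"break-even {break_even_probability(odds):.1%}")
print(f"expected ROI at 50% and -110: {expected_roi(0.5, -110):.2%}")
```

For example, a 50% win rate at -110 works out to an expected ROI of about -4.5%, and the poster's +150 / -150 range corresponds to break-even rates of 40% and 60%.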