Introduction & General Questions

scratchmode · 09-05-12 07:46 PM

Just wanted to make an introduction. Very interesting approaches to sports betting here. Love the confluence of markets, modelling and sports. A few questions on what people are seeing out there:

- What kind of R-squared values have people's models been hitting? At about what value do you think you're getting close to something usable?
- Anyone testing models and/or individual variables for statistical significance?
- Any predictions on how long it would take before sportsbooks start using machine-learning models? When/if this happens (if it hasn't already), do we all lose our edge?

Pretty freaking excited to have found this forum. Very interested in people's thoughts on this stuff!

Justin7 · 09-05-12 07:50 PM

I can quickly answer your last question. Sportsbooks are unlikely to ever spend much time, energy or intellect to develop very advanced handicapping approaches. It is relatively cheap for them to put up a number that is pretty close, and let the market correct it. And, those that are able to develop advanced models or "machine-learning models" (I'm not quite sure what your definition of this is) can probably make more money betting than booking.

sayhey69 · 09-05-12 07:56 PM

univariate linear regression is machine learning

mathdotcom · 09-05-12 08:58 PM

Finding the line of best fit = machine learning? lol it's no different than using a calculator

sayhey69 · 09-05-12 09:18 PM

ok ill rephrase. using gradient descent for linear regression as opposed to the normal equations is machine learning. and youre missing the point of my post. you can make incredibly stupid models using incredibly complex machine learning algorithms that tell you incredibly nothing about incredibly anything.

oh and while im at it. R-squared is a retarded statistic for retards that want to be retarded for the rest of their life and never model anything meaningful

mathdotcom · 09-06-12 08:07 AM

Why would you use gradient descent for linear regression? There's a closed-form solution.

Correct about R-squared
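For anyone following along, here is a rough sketch of the two fitting routes being argued about: the closed-form least-squares solution and batch gradient descent on mean squared error. The data points, the learning rate `lr`, and the step count are made up for illustration; on a small, well-behaved problem like this both routes should land on essentially the same line.

```python
# Univariate least squares two ways: closed form vs. batch gradient descent.
# The data below is made up for illustration only.

def fit_closed_form(xs, ys):
    """Return (intercept, slope) from the closed-form least-squares solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Return (intercept, slope) by iteratively minimizing mean squared error."""
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        # Gradient of (1/n) * sum((b0 + b1*x - y)^2) with respect to b0 and b1.
        g0 = sum(2 * (b0 + b1 * x - y) for x, y in zip(xs, ys)) / n
        g1 = sum(2 * (b0 + b1 * x - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

closed = fit_closed_form(xs, ys)
descent = fit_gradient_descent(xs, ys)
print("closed form:     ", closed)
print("gradient descent:", descent)
```

The fitted line is the same either way; the only difference is how the coefficients are computed, which is the point both sides of the exchange above are circling.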

scratchmode · 09-06-12 08:53 AM

Originally Posted by sayhey69

R-squared is a retarded statistic for retards that want to be retarded for the rest of their life and never model anything meaningful

In the interests of having a productive conversation (and me not being such a retard), what is it that enlightened folks such as yourself use to determine the representativeness of your models? The crux of my question is whether people just use their models for "directional" input, or whether they're calculating p-values, etc. Are you implying that OLS regression doesn't cut it for sports betting? If that's the case, then that's interesting. It would do us all a favor if you could elaborate a bit. Most of the financial markets applications I've seen use OLS regression, and people still put money behind that, so maybe it just depends on what you're trying to find out? I'll do the forum a favor and not pretend to be all-knowing, but this is interesting stuff. Thanks for any constructive insight you're able to offer.

mathdotcom · 09-06-12 09:11 AM

The point is that there are a number of ways to get a very high R-squared, such as simply running 1000 versions of the model until one comes out very high. (That's why you need to start with a theory and not just blindly look for patterns. I've had animated discussions on here before about the difference between a theoretical model and its empirical counterpart. Many argue they are the same thing, which is wrong.)

And for some models, your R-squared can be minuscule and still deliver great results. Most derivative models have very low R-squared. Adding more variables will never lower your in-sample R-squared (and will almost always raise it), so some fools do this thinking their model is getting stronger as a result.
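A quick way to see the "more variables, higher R-squared" effect is to keep bolting pure-noise columns onto a regression and watch the in-sample fit. The sketch below is only an illustration: the data is synthetic, the helper names (`solve`, `r_squared`, `adjusted_r_squared`) are made up for this example, and it solves the normal equations directly, which is fine on a tiny well-conditioned problem like this one.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(features, ys):
    """In-sample R-squared of OLS with an intercept; features is a list of rows."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for using k regressors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


rng = random.Random(0)
n = 40
signal = [rng.gauss(0, 1) for _ in range(n)]
ys = [0.5 * s + rng.gauss(0, 1) for s in signal]

features = [[s] for s in signal]  # start with the one real predictor
results = []
for k in range(1, 7):
    r2 = r_squared(features, ys)
    results.append((k, r2, adjusted_r_squared(r2, n, k)))
    # Append a pure-noise column that has nothing to do with ys.
    features = [row + [rng.gauss(0, 1)] for row in features]

for k, r2, adj in results:
    print(k, round(r2, 4), round(adj, 4))
```

Plain R-squared cannot go down as junk columns are added, while the adjusted version charges a penalty per regressor and is free to fall.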

Backtesting is probably the most useful test, along with being careful when creating your model along the way. You have to understand your raw data, and the coefficients typically have to make sense. Sometimes you'll be surprised, but typically when you're surprised something is wrong. If you're predicting totals and you get a negative coefficient on pitcher ERAs, you know something is wrong. There are whole books written on how things can go wrong with OLS, and if you're familiar with OLS then you know what I mean.
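As a bare-bones illustration of that workflow, the sketch below fits on the earlier games only, sanity-checks the sign of the coefficient, and then scores the model on later games it never saw, next to a naive "predict the training average" baseline. Everything here is assumed for the example: the ERA and totals numbers are synthetic, the 70/30 split is arbitrary, and a real backtest would also compare against the posted lines rather than just against a baseline.

```python
import random


def fit_line(xs, ys):
    """Closed-form univariate least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def mean_abs_error(preds, actuals):
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)


# Synthetic "games" in chronological order: higher ERA -> higher total, plus noise.
rng = random.Random(1)
era = [rng.uniform(3.0, 6.0) for _ in range(100)]
totals = [2.0 + 1.5 * e + rng.gauss(0, 0.5) for e in era]

split = 70  # fit on the first 70 games, test on the last 30
intercept, slope = fit_line(era[:split], totals[:split])

# Sanity check on the coefficient: totals should rise with pitcher ERA.
if slope <= 0:
    print("warning: negative ERA coefficient, check the data or the model")

model_preds = [intercept + slope * e for e in era[split:]]
train_mean = sum(totals[:split]) / split
baseline_preds = [train_mean] * (len(era) - split)

model_mae = mean_abs_error(model_preds, totals[split:])
baseline_mae = mean_abs_error(baseline_preds, totals[split:])
print("slope:", round(slope, 3))
print("out-of-sample MAE, model:   ", round(model_mae, 3))
print("out-of-sample MAE, baseline:", round(baseline_mae, 3))
```

The split is chronological rather than random so the model is never scored on games that came before the ones it was fitted on.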

alukk · 09-21-12 12:34 AM

R^2 depends on the kind of data you have. With some data, like GDP for example, having an R^2 smaller than .85 is pretty bad, but with other kinds of data, having an R^2 higher than .20 is pretty good. Some people get sick trying to get high results for R^2.

durito · 09-21-12 01:34 AM

Justin7 gave mathdotcom 2 SBR Point(s) for this post. = Irony

uva3021 · 09-21-12 11:09 PM

the normal equation is not always precise
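One reading of this is numerical: forming X^T X squares the condition number of X, so nearly collinear columns can make the normal equations break down in floating point even though the least-squares problem itself is well defined. The sketch below uses a standard textbook-style matrix (not anything from this thread) whose two columns are linearly independent, yet whose X^T X rounds to an exactly singular matrix in double precision.

```python
# Two linearly independent columns that differ only at the 1e-8 level.
eps = 1e-8
X = [
    [1.0, 1.0],
    [eps, 0.0],
    [0.0, eps],
]

# Form X^T X, as the normal equations require.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]

# The diagonal entries are 1 + eps**2 in exact arithmetic, but eps**2 = 1e-16
# is too small to survive being added to 1.0 in double precision.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
print(xtx)
print("determinant:", det)
```

Solvers built on a QR or SVD factorization of X work with X directly and avoid squaring the condition number, which is why they are usually preferred when columns are close to collinear.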
