1. #1
    scratchmode
    Join Date: 09-03-12
    Posts: 2
    Betpoints: 36

    Introduction & General Questions

    Just wanted to make an introduction. Very interesting approaches to sports betting here. Love the confluence of markets, modelling and sports. A few questions on what people are seeing out there:

    - What kind of R-squared values have people's models been hitting? At about what value do you think you're getting close to something usable?
    - Anyone testing models and/or individual variables for statistical significance?
    - Any predictions on how long it would take before sportsbooks start using machine-learning models? When/if this happens (if it hasn't already), do we all lose our edge?

    Pretty freaking excited to have found this forum. Very interested in people's thoughts on this stuff!

  2. #2
    Justin7
    Join Date: 07-31-06
    Posts: 8,577
    Betpoints: 1506

    I can quickly answer your last question. Sportsbooks are unlikely to ever spend much time, energy or intellect to develop very advanced handicapping approaches. It is relatively cheap for them to put up a number that is pretty close, and let the market correct it. And, those that are able to develop advanced models or "machine-learning models" (I'm not quite sure what your definition of this is) can probably make more money betting than booking.

  3. #3
    sayhey69
    Join Date: 04-16-12
    Posts: 50
    Betpoints: 398

    univariate linear regression is machine learning

  4. #4
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Finding the line of best fit = machine learning? lol it's no different than using a calculator

  5. #5
    sayhey69
    Join Date: 04-16-12
    Posts: 50
    Betpoints: 398

    ok, I'll rephrase. using gradient descent for linear regression as opposed to the normal equations is machine learning. and you're missing the point of my post. you can make incredibly stupid models using incredibly complex machine learning algorithms that tell you incredibly nothing about incredibly anything.

    oh and while I'm at it. R-squared is a retarded statistic for retards that want to be retarded for the rest of their life and never model anything meaningful
    Last edited by sayhey69; 09-06-12 at 03:02 AM.

  6. #6
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    Why would you use gradient descent for linear regression? There's a closed-form solution.

    Correct about R-squared
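
    A minimal sketch of the two approaches being argued about, on synthetic data. The data-generating line, learning rate, and iteration count are all assumptions chosen for illustration, not anything from the posters' models. Both routes should land on essentially the same coefficients for a small, well-behaved problem like this one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 2 + 3x + noise
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])  # intercept column + one predictor

# Closed form: solve the normal equations (X'X) b = X'y
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error
beta_gd = np.zeros(2)
learning_rate = 0.1  # assumed; small enough to converge on this data
for _ in range(2000):
    gradient = (2.0 / n) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print("closed form:     ", beta_closed)
print("gradient descent:", beta_gd)
```

    Gradient descent tends to earn its keep only when the problem is too large to solve directly or the loss has no closed form; for ordinary least squares on a dataset this size, the direct solve is simpler.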

  7. #7
    scratchmode
    Join Date: 09-03-12
    Posts: 2
    Betpoints: 36

    Quote Originally Posted by sayhey69
    R-squared is a retarded statistic for retards that want to be retarded for the rest of their life and never model anything meaningful

    In the interests of having a productive conversation (and me not being such a retard), what is it that enlightened folks such as yourself use to determine the representativeness of your models? The crux of my question is whether people just use their models for "directional" input, or whether they're calculating p-values, etc. Are you implying that OLS regression doesn't cut it for sports betting? If that's the case, then that's interesting. It would do us all a favor if you could elaborate a bit. Most of the financial markets applications I've seen use OLS regression, and people still put money behind that, so maybe it just depends on what you're trying to find out? I'll do the forum a favor and not pretend to be all-knowing, but this is interesting stuff. Thanks for any constructive insight you're able to offer.

  8. #8
    mathdotcom
    Join Date: 03-24-08
    Posts: 11,689
    Betpoints: 1943

    The point is that there are a number of ways to get a very high R-squared, such as simply running 1000 versions of the model until one comes out very high. (That's why you need to start with a theory and not just blindly look for patterns. I've had animated discussions on here before about the difference between a theoretical model and its empirical counterpart. Many argue they are the same thing, which is wrong.)

    And for some models, the R-squared can be minuscule and the model can still deliver great results. Most derivative models have very low R-squared. Adding more variables will never decrease your in-sample R-squared (and almost always increases it), so some fools do this thinking their model is getting stronger as a result.
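
    A rough illustration of the "more variables" trap, assuming synthetic data with one real predictor and a pile of pure-noise columns (the sample size, coefficients, and the helper name `r_squared` are made up for this sketch). The in-sample R-squared creeps upward as junk is added, even though none of it has anything to do with the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(X, y):
    """In-sample R-squared of an OLS fit; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Assumed setup: one predictor that matters, plus noise in the outcome
n = 100
signal = rng.normal(size=n)
y = 1.0 + 0.5 * signal + rng.normal(size=n)

X = np.column_stack([np.ones(n), signal])
r2_path = [r_squared(X, y)]
for _ in range(30):
    X = np.column_stack([X, rng.normal(size=n)])  # append a pure-noise column
    r2_path.append(r_squared(X, y))

print(f"R^2 with the one real predictor: {r2_path[0]:.3f}")
print(f"R^2 after adding 30 junk columns: {r2_path[-1]:.3f}")
```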

    Backtesting is probably the most useful test, along with being careful as you build your model. You have to understand your raw data, and the coefficients typically have to make sense. Sometimes you'll be surprised, but typically when you're surprised something is wrong. If you're predicting totals and you get a negative coefficient on pitcher ERAs, you know something is wrong. There are whole books written on how things can go wrong with OLS, and if you're familiar with OLS then you know what I mean.
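
    One simple way to picture the backtesting idea is a chronological holdout: fit on the earlier games, then score on later games the model never saw. The sketch below assumes synthetic "games" in time order, a 60/60 split, and hypothetical helpers `fit_ols` and `mse`. It compares a small model against one stuffed with junk variables, and is not a description of any poster's actual process.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mse(X, y, beta):
    """Mean squared prediction error."""
    return float(np.mean((y - X @ beta) ** 2))

# Assumed data: 120 games in chronological order, one real predictor, 40 junk ones
n_games, n_junk = 120, 40
signal = rng.normal(size=n_games)
junk = rng.normal(size=(n_games, n_junk))
y = 1.0 + 0.5 * signal + rng.normal(size=n_games)

X_small = np.column_stack([np.ones(n_games), signal])
X_big = np.column_stack([X_small, junk])

split = 60  # fit on the first 60 games, backtest on the last 60
results = {}
for name, X in [("small", X_small), ("big", X_big)]:
    beta = fit_ols(X[:split], y[:split])
    in_sample = mse(X[:split], y[:split], beta)
    out_of_sample = mse(X[split:], y[split:], beta)
    results[name] = (in_sample, out_of_sample)
    print(f"{name:5s} in-sample MSE {in_sample:.3f} | out-of-sample MSE {out_of_sample:.3f}")
```

    The big model always fits the training games at least as well, which is exactly why the in-sample fit is a poor guide; the held-out games are where the junk variables get exposed.
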
    Points Awarded:

    Justin7 gave mathdotcom 2 SBR Point(s) for this post.


  9. #9
    alukk
    Join Date: 01-29-09
    Posts: 1,544
    Betpoints: 8012

    R^2 depends on the kind of data you have. With some data, like GDP for example, an R^2 below .85 is pretty bad, but with other kinds of data an R^2 above .20 is pretty good. Some people make themselves sick trying to get high R^2 values.

  10. #10
    durito
    escarabajo negro
    Join Date: 07-03-06
    Posts: 13,173
    Betpoints: 438

    Justin7 gave mathdotcom 2 SBR Point(s) for this post. = Irony

  11. #11
    uva3021
    Join Date: 03-01-07
    Posts: 537
    Betpoints: 381

    the normal equations are not always numerically precise (e.g. when the predictors are nearly collinear)
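
    One common reading of this remark (an assumption here, since the post does not elaborate) is numerical: forming X'X roughly squares the condition number of the design matrix, so with nearly collinear predictors the normal equations can lose digits that an SVD- or QR-based solver keeps. A small sketch using polynomial terms as a deliberately ill-conditioned design:

```python
import numpy as np

# Assumed design for illustration: powers of t on [0, 1], which are nearly collinear
t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, N=8, increasing=True)   # columns 1, t, t^2, ..., t^7
beta_true = np.ones(8)
y = X @ beta_true                        # noise-free, so the exact answer is known

cond_X = np.linalg.cond(X)
cond_XtX = np.linalg.cond(X.T @ X)       # roughly cond_X squared

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # SVD-based least squares

err_normal = np.abs(beta_normal - beta_true).max()
err_lstsq = np.abs(beta_lstsq - beta_true).max()

print(f"cond(X)   = {cond_X:.2e}")
print(f"cond(X'X) = {cond_XtX:.2e}")
print(f"max coefficient error, normal equations: {err_normal:.2e}")
print(f"max coefficient error, lstsq:            {err_lstsq:.2e}")
```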
