Correlated Gaussian

**HUY** · 11-19-13, 01:29 PM

You mean modelling the distribution of the scores as a bivariate Gaussian distribution? If so, I've gone down that road a few times. What exactly are you trying to do?

**flsaders85** · 11-19-13, 03:26 PM

I'm using multinomial logistic regression to predict the points distribution (0, 1, 2, 3) within a possession, plus pace. I noticed my absolute error is higher in games with big point spreads. Somebody referred me to correlated Gaussian to reduce my prediction error, especially in games with big point spreads. Correlated Gaussian accounts for the tendency of teams to play to the level of their competition, which matters most in games with larger point spreads. Below is the source of the formula, via Dean Oliver. It's proven to be slightly more accurate than Pythag.

I want to use the formula to predict the win % of an individual game based on my game predictions. I'd like to plug my predictions into the numerator and account for the consistency, or lack of consistency, in the denominator.

Correlated Gaussian Method

http://www.rawbw.com/~deano/helpscrn/corrgauss.html

**HUY** · 11-19-13, 05:07 PM

Your denominator is:

SQRT[Var(Rtg)+Var(Opp.Rtg)-2*Cov(Rtg,Opp.Rtg)]

If you didn't notice, this is simply the expression for the standard deviation (i.e. the square root of the variance) of the difference of two random variables. You can reach the same expression by substituting into the formula provided here: https://en.wikipedia.org/wiki/Varian...ated_variables (keeping in mind that you have to change the sign of the covariance term, since that formula applies to the sum while you want to apply it to the difference).
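Written out, with X standing for Rtg and Y for Opp.Rtg, the identity is the one below. The second line is how I read the linked page: the predicted margin goes in the numerator and the result is passed through the standard normal CDF (written Φ here, my notation), so treat it as a sketch of the method rather than a quote from it.

```latex
\begin{align}
\operatorname{Var}(X - Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y) \\
\text{Win\%} &\approx \Phi\!\left(
  \frac{\mathrm{E}[X] - \mathrm{E}[Y]}
       {\sqrt{\operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X, Y)}}
\right)
\end{align}
```

A positive covariance shrinks the denominator, which pushes the favorite's win probability further from 50%.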

In principle you need a variance for the rating of each team, and also a covariance for every pair of teams. If you think this is overkill or is likely to lead to overfitting, then maybe you can get away with a variance that depends linearly on the rating; for example, you can assume that teams with a mean score of 100 points have a 20-point standard deviation. You can do something similar for the covariance. If you still have questions post away, and also share your results :-)
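A minimal Python sketch of that shortcut, assuming the win % is the standard normal CDF of (predicted margin) / (the denominator above). The function names, the `corr` parameter, and the 0.20 ratio (taken from the "100 points, 20-point standard deviation" example) are illustrative assumptions, not something from Oliver's page.

```python
from math import erf, sqrt

# Assumption for illustration: SD is a fixed fraction of the predicted score,
# matching the "100 points -> 20-point SD" example in the post.
SD_PER_POINT = 0.20


def win_probability(rtg, opp_rtg, sd, opp_sd, corr=0.0):
    """P(team outscores opponent) if the two scores are jointly Gaussian.

    rtg, opp_rtg : predicted scores (the numerator uses their difference)
    sd, opp_sd   : standard deviations of the two scores
    corr         : assumed correlation between the two scores
    """
    cov = corr * sd * opp_sd
    var_diff = sd ** 2 + opp_sd ** 2 - 2.0 * cov  # Var(X - Y)
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (rtg - opp_rtg) / sqrt(var_diff)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z


def win_probability_linear_sd(rtg, opp_rtg, corr=0.0):
    """Same as win_probability, with SD proportional to the predicted score."""
    return win_probability(
        rtg, opp_rtg, SD_PER_POINT * rtg, SD_PER_POINT * opp_rtg, corr
    )
```

For instance, `win_probability_linear_sd(105, 100, corr=0.3)` gives the favorite's win probability under these assumptions; raising `corr` moves it further above 0.5.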

Of course, it is always possible to estimate variance per team, as I have done here: http://www.statsfair.com/iplot?sport=snooker (note the dotted lines in the plot, which follow the mean rating plus or minus one standard deviation).
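If you do go the per-team route, the denominator can be estimated straight from game logs. The sketch below is one way to do it, assuming you have paired lists of a team's ratings and its opponents' ratings, one entry per game; the function names are hypothetical and this is not how the plot above was produced.

```python
from statistics import mean


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def diff_sd(rtgs, opp_rtgs):
    """SQRT[Var(Rtg) + Var(Opp.Rtg) - 2*Cov(Rtg, Opp.Rtg)] from game logs."""
    var_diff = (
        sample_cov(rtgs, rtgs)            # Var(Rtg)
        + sample_cov(opp_rtgs, opp_rtgs)  # Var(Opp.Rtg)
        - 2.0 * sample_cov(rtgs, opp_rtgs)
    )
    return var_diff ** 0.5
```

As a sanity check, `diff_sd(rtgs, opp_rtgs)` should match the sample standard deviation of the per-game margins `rtg - opp_rtg`, since that is exactly what the formula expands to.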