Alright, trying to get something more serious here... I want to come up with a model to pick winners. To make sure that I'm looking for the right things, I have some questions.
Let's take NBA over/under bets as an example. I have a simple model that predicts the total score of a game (it doesn't work yet, of course) and I want to tweak it to make it more accurate. In Minitab (statistical software) I ran a "Descriptive Statistics" analysis on the difference between my prediction and the actual total, which tells me how far off I am by way of the standard deviation. Here is what it told me:
Descriptive Statistics: diff
Variable      N  N*    Mean  SE Mean   StDev   Minimum       Q1  Median      Q3  Maximum
diff      13194   0  -1.196    0.155  17.813  -113.469  -12.512  -0.759  10.891   61.307
That is a standard deviation of 17.8. What is my goal here? Is it to bring the standard deviation down to a minimum? Would the model be more predictive if the standard deviation were, let's say, 4.6?
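For reference, here is a minimal sketch of how the numbers in that output relate to each other. It only uses the summary values printed above. The variable names are made up for illustration, and the RMSE line is an extra measure that Minitab did not print: it folds the average miss (Mean) and the spread (StDev) into a single figure.

```python
import math

# Summary values copied from the Minitab output for `diff`
n = 13194
mean_diff = -1.196   # average error, i.e. the bias of the predictions
sd_diff = 17.813     # sample standard deviation of the errors

# Standard error of the mean: how precisely the bias itself is estimated
se_mean = sd_diff / math.sqrt(n)

# Root-mean-square error combines bias and spread in one number.
# The (n - 1) / n factor converts the sample variance to the population form.
rmse = math.sqrt(mean_diff**2 + sd_diff**2 * (n - 1) / n)

print(f"SE Mean: {se_mean:.3f}")
print(f"RMSE:    {rmse:.3f}")
```

The SE Mean line should reproduce the 0.155 that Minitab reports. The RMSE comes out only slightly above the standard deviation because the bias of about -1.2 points is small next to a spread of about 17.8.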
I also checked how far off the bookmakers' actual over/under line is. There isn't much of a difference:
Descriptive Statistics: diffBooks
Variable       N  N*    Mean  SE Mean   StDev   Minimum       Q1  Median      Q3  Maximum
diffBooks  13194   0  -0.411    0.151  17.292  -109.000  -11.500   0.000  11.500   69.000
Lines are way off too. Besides standard deviation, is there anything else I should look at? I have a very basic understanding of statistics, so I'm not sure what some of the numbers represent or what is of interest to me.
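A minimal sketch of a side-by-side comparison that goes beyond the standard deviation, assuming the raw predictions and results are available as lists. The function name `error_summary`, the sign convention (predicted minus actual), and the four toy games are all hypothetical and only there to make the example runnable. They are not taken from the data set above.

```python
import math
import statistics

def error_summary(predicted, actual):
    """Return bias, spread, MAE and RMSE of (predicted - actual)."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return {
        "mean": statistics.mean(errors),                 # bias
        "stdev": statistics.stdev(errors),               # spread around the bias
        "mae": statistics.mean(abs(e) for e in errors),  # typical miss in points
        "rmse": math.sqrt(statistics.mean(e * e for e in errors)),
    }

# Toy numbers, made up for illustration only
actual_totals = [200, 190, 210, 205]
model_totals = [195, 198, 204, 210]
book_lines = [198, 193, 207, 206]

for name, preds in [("model", model_totals), ("books", book_lines)]:
    print(name, error_summary(preds, actual_totals))
```

Running both sets of predictions through the same function makes it easier to see whether one of them is consistently closer to the final score, rather than comparing two separate printouts.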
I also ran a regression analysis on the same variables that are used to predict the total score. Here is the result:
Regression Analysis: actualScore versus pvscore, phscore

The regression equation is
actualScore = - 8.15 + 1.06 pvscore + 1.03 phscore

Predictor     Coef  SE Coef      T      P
Constant    -8.148    3.126  -2.61  0.009
pvscore    1.05628  0.03409  30.99  0.000
phscore    1.03302  0.03418  30.23  0.000

S = 17.7913   R-Sq = 23.8%   R-Sq(adj) = 23.8%

Analysis of Variance
Source             DF       SS      MS        F      P
Regression          2  1303249  651625  2058.64  0.000
Residual Error  13191  4175375     317
Total           13193  5478624

Source   DF   Seq SS
pvscore   1  1014079
phscore   1   289171
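As a small illustration of what the regression equation does, the sketch below applies the fitted coefficients from the Predictor table to a pair of inputs. The function name `predict_total` is hypothetical, and reading `pvscore` and `phscore` as the predicted visitor and home scores is an assumption based on the variable names.

```python
def predict_total(pvscore, phscore):
    """Apply the fitted regression equation from the Minitab output.

    Assumes pvscore / phscore are the predicted visitor and home scores.
    """
    return -8.148 + 1.05628 * pvscore + 1.03302 * phscore

# Example: both teams predicted to score 100 points
print(round(predict_total(100, 100), 3))
```

With both coefficients close to 1 and a constant of about -8, the fitted total for this example lands near 200.8, so the regression is nudging the raw sum of the two predictions rather than replacing it.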
I have a question about the value "R-Sq". Does the 23.8% mean that I have only captured 23.8% of the factors that make up an accurate predictive model? Do I need to search for the other 76.2%?
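For reference, R-Sq and S can both be recomputed from the Analysis of Variance table, which shows what they measure: R-Sq is the share of the variation in actualScore (sum of squares) that the regression accounts for, and S is the typical size of the leftover error in points. A minimal sketch using only the printed values, with variable names chosen for illustration:

```python
import math

# Values copied from the Analysis of Variance table
ss_regression = 1303249
ss_residual = 4175375
ss_total = 5478624
df_residual = 13191

# R-Sq: fraction of the total sum of squares explained by the regression
r_squared = ss_regression / ss_total

# S: square root of the residual mean square (SS / DF)
s = math.sqrt(ss_residual / df_residual)

print(f"R-Sq = {r_squared:.1%}")
print(f"S    = {s:.4f}")
```

So the 23.8% is a proportion of variance, not a count of factors, and the remaining 76.2% is variation in the final score that these two predictors leave unexplained.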
I hope someone can help me with this.