Hi all,
I haven't done full-on stats since college, but I started tinkering with R and wanted to use it to research NCAA tournament underdogs. My theory was this: underdogs that cover point spreads most likely maximize their own possessions and give opponents fewer opportunities to score. In other words, teams that rebound well on both ends and don't turn the ball over should do better than others.
To test this, I performed the following:
- Pulled data on all NCAA tournament teams for the past 6 years
- Pulled all spreads and assigned a "1" to underdogs that covered in the 1st round and a "0" to all others
- Uploaded that data into R (I attached the data to this message)
- Ran a logistic regression (binomial glm) and did not get the results I expected; details below:
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1604  -0.6199  -0.4857  -0.3028   2.4929

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  15.332281   7.139957   2.147  0.03176 *
Reb           0.002679   0.002510   1.067  0.28587
opp_reb      -0.002790   0.002649  -1.053  0.29223
Turn         -0.017394   0.044127  -0.394  0.69344
opp_turns     0.010915   0.044040   0.248  0.80425
opp_fg_per   -1.657098  12.996452  -0.128  0.89854
fg_diff       0.708769  10.636560   0.067  0.94687
to_margin     0.140202   1.413622   0.099  0.92100
RPI          -0.035033   0.014554  -2.407  0.01608 *
SOS           0.015673   0.005987   2.618  0.00885 **
winper      -16.215107   5.006029  -3.239  0.00120 **
asst_to      -1.827639   1.551475  -1.178  0.23880
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 278.73  on 323  degrees of freedom
Residual deviance: 257.22  on 312  degrees of freedom
AIC: 281.22

Number of Fisher Scoring iterations: 5
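As a quick check on the model as a whole, the two deviance lines above can be turned into a likelihood-ratio test against the intercept-only model. This is only arithmetic on the printed numbers, a rough sketch rather than a full diagnostic:

```r
# Overall likelihood-ratio test built from the printed deviances
null_dev  <- 278.73   # null deviance, 323 df
resid_dev <- 257.22   # residual deviance, 312 df

lr_stat <- null_dev - resid_dev   # drop in deviance from adding the 11 predictors
lr_df   <- 323 - 312              # number of predictors added

# Upper-tail chi-square probability for that drop in deviance
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

cat("LR statistic:", lr_stat, "on", lr_df, "df; p-value:", signif(p_value, 3), "\n")
```

The statistic is 21.51 on 11 degrees of freedom, which sits between the 5% and 2.5% chi-square critical values, so the eleven predictors together do only modestly better than an intercept alone.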
If I'm interpreting the output right, and I'd like to think that I am, the only statistically significant variables (at the 0.05 level) are RPI (Rating Percentage Index), SOS (strength of schedule), and winper (team's winning percentage), whereas Reb (team's rebounds), opp_reb (opponent's rebounds), and Turn (team's turnovers) are not significant, which isn't what I had hoped.
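To make the coefficient sizes easier to read, the log-odds estimates can be converted to odds ratios. A small sketch using the printed estimates; the assumption that winper is stored as a 0-1 proportion (so one point of winning percentage is 0.01) is a guess and should be checked against the data:

```r
# Estimates copied from the summary output (log-odds scale)
est <- c(RPI = -0.035033, SOS = 0.015673, winper = -16.215107)

# Odds ratio for a one-unit increase in each predictor
odds_ratio <- exp(est)

# Assumption: winper is a 0-1 proportion, so one percentage point = 0.01
or_winper_1pt <- exp(est[["winper"]] * 0.01)

print(round(odds_ratio, 4))
cat("Odds ratio per 1-point increase in winper:", round(or_winper_1pt, 3), "\n")
```

Under that assumption, each extra point of winning percentage multiplies the odds of an underdog cover by roughly 0.85, holding the other variables fixed.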
What I'm stuck on: is this a *#$% model? I mean, I get that I should be on the lookout for teams that may have a better RPI than the favorite, but does this make sense to the group? If there is some value, how do I take it to the next level, what do I do with it now, and how do I apply it to this year's field? I'm somewhat familiar with R (I did the above just by reading up on the subject) but by no means an expert. I added the procedures I performed to the Word document attached to this message (always show your work, right?).
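One way to apply a fitted model to this year's field is predict() with type = "response", which returns a cover probability per team. The sketch below uses simulated data and a reduced set of predictors purely for illustration; the data frames `past` and `this_year`, the column `covered`, and all simulated values are hypothetical stand-ins for the attached data:

```r
set.seed(1)

# Hypothetical stand-in for the historical data (324 rows, matching the 323 null df)
n <- 324
past <- data.frame(
  RPI    = sample(1:200, n, replace = TRUE),
  SOS    = sample(1:300, n, replace = TRUE),
  winper = runif(n, 0.5, 0.95)
)
# Simulated outcome: 1 = underdog covered, 0 = otherwise
past$covered <- rbinom(n, 1, plogis(1 - 0.01 * past$RPI - 2 * past$winper))

# Logistic regression, the same model family as the summary output
fit <- glm(covered ~ RPI + SOS + winper, data = past, family = binomial)

# Hypothetical rows for this year's teams, using the same column names
this_year <- data.frame(
  team   = c("Team A", "Team B", "Team C"),
  RPI    = c(35, 80, 120),
  SOS    = c(40, 110, 150),
  winper = c(0.70, 0.78, 0.85)
)

# Predicted probability that each team covers, highest first
this_year$p_cover <- predict(fit, newdata = this_year, type = "response")
print(this_year[order(-this_year$p_cover), ])
```

Comparing such probabilities against a season held out of the fit would give a more honest read on whether the model has any value.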
Some other thoughts: I would like to push it a bit further and add offensive efficiency (the last piece of what I "believe" makes a good underdog). Or do I "flip" the whole model and instead look for the traits of favorites that covered?
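Whether offensive efficiency earns its place could be checked by comparing the model with and without it. A sketch on simulated data; `off_eff`, `past`, and `covered` are hypothetical names, not columns from the attached file, and the points-per-100-possessions scale is assumed:

```r
set.seed(2)

# Hypothetical data: 324 teams with winning percentage and offensive efficiency
n <- 324
past <- data.frame(
  winper  = runif(n, 0.5, 0.95),
  off_eff = rnorm(n, mean = 105, sd = 5)   # assumed scale: points per 100 possessions
)
# Simulated outcome: 1 = underdog covered, 0 = otherwise
past$covered <- rbinom(
  n, 1,
  plogis(-1 - 4 * (past$winper - 0.7) + 0.05 * (past$off_eff - 105))
)

base_fit <- glm(covered ~ winper, data = past, family = binomial)
big_fit  <- update(base_fit, . ~ . + off_eff)   # same model plus the new predictor

# Likelihood-ratio test for the added term, plus AIC for both models
print(anova(base_fit, big_fit, test = "Chisq"))
print(AIC(base_fit, big_fit))
```

A small p-value in the anova table and a lower AIC for the larger model would both suggest the new variable adds something beyond what is already in the model.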
Any guidance would be appreciated; PM me if you feel the need.
data and code.zip