Logit regression, R, and the NCAA tournament

  • zeros_and_ones
    SBR Rookie
    • 03-05-12
    • 3

    #1
    Logit regression, R, and the NCAA tournament
    Hi all,

    Haven't done full-on stats since college, but I started tinkering with R and wanted to use it to research NCAA tournament underdogs. My theory was this: underdogs that cover point spreads most likely maximize their possessions and give opponents fewer chances to score. In other words, teams that rebound well on both ends and don't turn the ball over should do better against the spread than others.

    To test this, I performed the following:

    1. Pulled data on all NCAA tournament teams for the past 6 years
    2. Pulled all spreads and assigned a "1" to underdogs that covered in the 1st round and a "0" to all others
    3. Uploaded that data into R (I attached the data to this message)
    4. Did not receive the responses I expected, details below:
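    For reference, the fit itself looked roughly like this (a sketch of what's in the attached code; hoops is the data frame name, and the file name below is just a placeholder):

    # load the attached data and fit a logit model on the 0/1 cover flag
    hoops <- read.csv("hoops.csv")  # placeholder name; the real data is in the attached zip
    hoopslogit <- glm(X1stdogwin ~ Reb + opp_reb + Turn + opp_turns + opp_fg_per +
                          fg_diff + to_margin + RPI + SOS + winper + asst_to,
                      family = binomial, data = hoops)
    summary(hoopslogit)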


    Deviance Residuals:
        Min      1Q  Median      3Q     Max
    -1.1604 -0.6199 -0.4857 -0.3028  2.4929

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)  15.332281   7.139957   2.147  0.03176 *
    Reb           0.002679   0.002510   1.067  0.28587
    opp_reb      -0.002790   0.002649  -1.053  0.29223
    Turn         -0.017394   0.044127  -0.394  0.69344
    opp_turns     0.010915   0.044040   0.248  0.80425
    opp_fg_per   -1.657098  12.996452  -0.128  0.89854
    fg_diff       0.708769  10.636560   0.067  0.94687
    to_margin     0.140202   1.413622   0.099  0.92100
    RPI          -0.035033   0.014554  -2.407  0.01608 *
    SOS           0.015673   0.005987   2.618  0.00885 **
    winper      -16.215107   5.006029  -3.239  0.00120 **
    asst_to      -1.827639   1.551475  -1.178  0.23880
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 278.73 on 323 degrees of freedom
    Residual deviance: 257.22 on 312 degrees of freedom
    AIC: 281.22

    Number of Fisher Scoring iterations: 5

    If I'm interpreting the output right, and I'd like to think that I am, the most statistically significant variables are RPI (Rating Percentage Index), SOS (strength of schedule), and winper (team's winning percentage), whereas Reb (team's rebounds), opp_reb (opponent's rebounds), and Turn (team's turnovers) weren't what I had hoped.

    What I'm stuck on: is this a *#$% model? I mean, I get that I should be on the lookout for teams with a better RPI than the favorite, but does this make sense to the group? If there is some value here, how do I take it to the next level? What do I do with it now, and how do I apply it to this year's field? I'm somewhat familiar with R (I did the above just by reading up on the subject) but am by no means an expert. (I added the procedures I performed to the Word document attached to this message; always show your work, right?)
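    To make the "apply it" question concrete, I assume scoring a new field would look something like this (newfield is a hypothetical data frame of this year's matchups with the same columns as my data):

    # predicted probability that each dog covers, per the fitted model
    newfield$cover_prob <- predict(hoopslogit, newdata = newfield, type = "response")
    # sort so the most promising dogs float to the top
    newfield[order(-newfield$cover_prob), ]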

    Some other thoughts: I would like to push this a bit further and add in offensive efficiency (the last piece of what I "believe" makes a good underdog). Or do I "flip" the whole model and instead look for the traits that covering favorites share?

    Any guidance would be appreciated, PM me if you feel the need.

    data and code.zip
  • TomG
    SBR Wise Guy
    • 10-29-07
    • 500

    #2
    the good news is that it's not so hard, right? r does all the work for you.

    the bad news is that your regression equation is a jumbled mess. it's a good example of why you can't just throw a bunch of stuff into the formula and expect to get anything meaningful out of it. read up on the underlying assumptions for a regression first.



    hint: try typing in pairs(RPI, SOS, winper, ft_per, score_margin, ft_per, reb_margin, to_margin, asst_to) and post a pic of the output
    • RickySteve
      Restricted User
      • 01-31-06
      • 3415

      #3
      Tommy, Heritage is originating on MLB derivatives this year.
      • TomG
        SBR Wise Guy
        • 10-29-07
        • 500

        #4
        100 limits on mlb derivatives there for me
        • RickySteve
          Restricted User
          • 01-31-06
          • 3415

          #5
          I'm sure you have 2nd cousins with higher limits.
          • zeros_and_ones
            SBR Rookie
            • 03-05-12
            • 3

            #6
            Originally posted by TomG
            the good news is that it's not so hard, right? r does all the work for you.

            the bad news is that your regression equation is a jumbled mess. it's a good example of why you can't just throw a bunch of stuff into the formula and expect to get anything meaningful out of it. read up on the underlying assumptions for a regression first.



            hint: try typing in pairs(RPI, SOS, winper, ft_per, score_margin, ft_per, reb_margin, to_margin, asst_to) and post a pic of the output
            Appreciate the response. To clarify your hint, are you suggesting running the model for each variable separately? As in the following:

            # assuming the variables live in the hoops data frame
            hoopslogit1 <- glm(X1stdogwin ~ RPI, family = binomial, data = hoops)
            summary(hoopslogit1)

            hoopslogit2 <- glm(X1stdogwin ~ reb_margin, family = binomial, data = hoops)
            summary(hoopslogit2)

            and so forth?
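            Or, just sketching, maybe a loop could run them all in one pass (again assuming everything lives in hoops):

            # one single-predictor logit per candidate variable
            for (v in c("RPI", "reb_margin", "to_margin")) {
                fit <- glm(reformulate(v, response = "X1stdogwin"), family = binomial, data = hoops)
                print(summary(fit))
            }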

            Overall, taking a step back, I agree with your feedback that the model is a bit jumbled. My first iteration focused just on rebound-related information, but none of those variables were statistically significant, hence me throwing some other stuff in. Makes sense that I can't bake a cake by throwing a bunch of crap in.
            • TomG
              SBR Wise Guy
              • 10-29-07
              • 500

              #7
              oops, looks like i didn't do that right. just do pairs(hoops) to check for multicollinearity, and don't include highly correlated variables as predictors. there are lots of ways to build models--prune top down or build bottom up. just follow the regression assumptions and see which model has the best adjusted r-squared, aic, bic, or whatever selection criterion you want to work with. i don't even understand what you are trying to predict, though, so i think you have a ways to go.
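              something like this, assuming your full fit from post #1 is called hoopslogit and your data frame is hoops:

              # eyeball pairwise relationships and correlations among candidate predictors
              # (assumes these columns are numeric)
              vars <- c("RPI", "SOS", "winper", "reb_margin", "to_margin", "asst_to")
              pairs(hoops[, vars])
              round(cor(hoops[, vars]), 2)

              # prune the full model top down by aic
              step(hoopslogit, direction = "backward")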
              • zeros_and_ones
                SBR Rookie
                • 03-05-12
                • 3

                #8
                Originally posted by TomG
                oops, looks like i didn't do that right. just do pairs(hoops) to check for multicollinearity, and don't include highly correlated variables as predictors. there are lots of ways to build models--prune top down or build bottom up. just follow the regression assumptions and see which model has the best adjusted r-squared, aic, bic, or whatever selection criterion you want to work with. i don't even understand what you are trying to predict, though, so i think you have a ways to go.
                Hi Tom,

                Appreciate the feedback; I just need to play around with it a bit more. To answer your last question (re: what I'm trying to predict): I wanted to figure out which attributes (i.e., rebound margin, turnovers, etc.) successful underdogs possess. To get the initial data set, I took all past underdog winners and coded them as 1.
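                Roughly, the coding step looked like this (dog_margin and spread are stand-ins for my actual column names):

                # 1 if the dog's final margin beat the spread, 0 otherwise
                hoops$X1stdogwin <- ifelse(hoops$dog_margin + hoops$spread > 0, 1, 0)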