I'm working on a model to predict soccer scores and am having some problems.
For example, let's say that I'm looking at season totals of a few variables to see how they affect goals scored for teams playing at home.
I'm using these variables (all are season totals):
HGF - goals scored by team when playing at home (what I want to predict)
HCF - number of corners by team when playing at home
HCA - number of corners against team when playing at home
Doing linear regression on these results in HCA being insignificant. The fit with HCF alone (below) gives HCF a p-value of 0.077; the 4.7*10^-4 in the table is the intercept's p-value.
===============================================
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.59352    5.39474   3.632 0.000472 ***
HCF          0.09459    0.05293   1.787 0.077397 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.125 on 88 degrees of freedom
(15 observations deleted due to missingness)
Multiple R-squared: 0.03501, Adjusted R-squared: 0.02405
F-statistic: 3.193 on 1 and 88 DF, p-value: 0.0774
===============================================
This seems all good.
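To make sure each p-value is read against the right row, here is a minimal sketch (Python, standard library only) that recomputes the t values from the estimates and standard errors printed above. The names `rows` and `t_value` are only for illustration; the numbers are copied from the output.

```python
# Coefficient table copied from the HCF-only output above:
# term -> (estimate, standard error, reported t value, reported p-value)
rows = {
    "(Intercept)": (19.59352, 5.39474, 3.632, 0.000472),
    "HCF": (0.09459, 0.05293, 1.787, 0.077397),
}


def t_value(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


for term, (est, se, t_reported, p_reported) in rows.items():
    t = t_value(est, se)
    # Each row's p-value belongs to that row's term only.
    print(f"{term:12s} t = {t:6.3f} (reported {t_reported}), p = {p_reported}")
```

The recomputed t values agree with the printed ones, so the 0.000472 belongs to the intercept row and the 0.077397 to the HCF row.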
Now if I try to add these variables:
ACF - number of corners by team when playing away
ACA - number of corners against team when playing away
Adding both to the regression, I get this:
===============================================
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.188142   9.960041   3.232  0.00174 **
HCF         -0.005014   0.058215  -0.086  0.93157
ACA         -0.125732   0.054220  -2.319  0.02277 *
ACF          0.126266   0.074806   1.688  0.09505 .
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.628 on 86 degrees of freedom
(15 observations deleted due to missingness)
Multiple R-squared: 0.1689, Adjusted R-squared: 0.1399
F-statistic: 5.825 on 3 and 86 DF, p-value: 0.001134
===============================================
Now HCF has become insignificant! ACF is also insignificant; all I have left is ACA.
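One thing that might be worth checking here is how strongly the corner variables are correlated with each other, since correlated predictors share explanatory power and can push each other's p-values up. The sketch below uses simulated data, not my real data: it assumes, purely for illustration, a hidden `strength` value per team that drives all three corner totals. The scales (100, 85, 15, 8) are made up. It computes pairwise Pearson correlations and the two-predictor variance inflation factor 1 / (1 - r^2).

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical latent team strength for 90 team-seasons (an assumption).
strength = [rng.gauss(0, 1) for _ in range(90)]

# Simulated season totals: stronger teams win more corners and concede fewer.
hcf = [100 + 15 * s + rng.gauss(0, 8) for s in strength]
acf = [85 + 15 * s + rng.gauss(0, 8) for s in strength]
aca = [100 - 15 * s + rng.gauss(0, 8) for s in strength]

r_hcf_acf = pearson(hcf, acf)
r_hcf_aca = pearson(hcf, aca)

# With exactly two predictors, the variance inflation factor is 1 / (1 - r^2).
vif_two = 1.0 / (1.0 - r_hcf_acf ** 2)

print(f"cor(HCF, ACF) = {r_hcf_acf:.2f}")
print(f"cor(HCF, ACA) = {r_hcf_aca:.2f}")
print(f"VIF for HCF next to ACF = {vif_two:.2f}")
```

If the real HCF, ACF and ACA columns turn out to be correlated like this, the individual t tests would be answering "does this variable add anything beyond the others?", which is a different question from "is this variable related to HGF?".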
Regression on ACA gives:
===============================================
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.29063    4.67482   9.902 5.71e-16 ***
ACA         -0.17510    0.04693  -3.731 0.000337 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.686 on 88 degrees of freedom
(15 observations deleted due to missingness)
Multiple R-squared: 0.1366, Adjusted R-squared: 0.1268
F-statistic: 13.92 on 1 and 88 DF, p-value: 0.0003369
===============================================
This seems very strange to me: it suggests that the number of corners
against a team when playing away is a better predictor of goals scored at home
than the number of corners for the team (be it at home or away).
The adjusted R^2 is also clearly much larger for the ACA model than for the HCF model (0.127 vs 0.024).
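For reference, the adjusted R^2 values can be reproduced from the multiple R^2 values in the outputs, and the same numbers give a partial F statistic for whether HCF and ACF jointly add anything to the ACA-only model. This is a sketch in Python using only the printed figures; it assumes all three fits used the same 90 complete observations (consistent with the residual degrees of freedom and the 15 deleted rows, but not something the output proves). The function names are illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (plus intercept) on n rows."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def partial_f(r2_full, r2_reduced, q, df_resid_full):
    """F statistic for adding q predictors to a nested (reduced) model."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid_full)


n = 90  # 88 residual df + 2 estimated coefficients in the one-predictor fits

adj_hcf = adjusted_r2(0.03501, n, 1)   # HCF only
adj_aca = adjusted_r2(0.1366, n, 1)    # ACA only
adj_full = adjusted_r2(0.1689, n, 3)   # HCF + ACA + ACF

# Does adding HCF and ACF (q = 2) improve on the ACA-only model?
f_added = partial_f(0.1689, 0.1366, q=2, df_resid_full=86)

print(f"adjusted R^2: HCF {adj_hcf:.4f}, ACA {adj_aca:.4f}, full {adj_full:.4f}")
print(f"partial F for adding HCF and ACF to ACA: {f_added:.3f} on 2 and 86 df")
```

The partial F would be compared against an F distribution with 2 and 86 degrees of freedom. The HCF-only and ACA-only models are not nested in each other, so this test only applies to ACA-only versus the three-variable model.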
I see this kind of thing all the time in my analyses: variables that are significant
become insignificant when I add other variables, and variables that I expected to be
insignificant displace previously significant ones.
How should I handle this?