I'm performing a linear regression over multiple years' worth of data.
I know of a few categories that really give one team a significant advantage over another.
The issue is that, in the regression, these categories have higher p-values than I'd like, meaning they don't come out as statistically significant over the games evaluated.
This is because most teams, and therefore most outcomes, do not have these advantages.
Also, the coefficients on these categories come out with the opposite sign from what they ought to have. For example, if a team is very good in a category and has a higher value in it, say more shots on goal (I don't think it's that simple, though), you would expect the coefficient in the regression to be positive (more shots = more goals), but it actually comes out negative. So the teams that are better in this category end up penalized by the model rather than rewarded.
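To make the symptom concrete, here is a minimal sketch in Python with made-up data. This is not my real data: the names `shots`, `attack_time`, and `goals`, the helper `ols`, and all the numbers are hypothetical. It assumes the category is strongly correlated with another predictor in the model, which is one possible way (not necessarily the cause in my case) for the sign to come out negative in a multiple regression even though the one-variable slope is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors: 'shots' is strongly correlated with 'attack_time'.
attack_time = rng.normal(size=n)
shots = attack_time + 0.3 * rng.normal(size=n)

# Made-up outcome: mostly driven by attack_time, with a small negative
# partial effect of shots once attack_time is held fixed.
goals = 2.0 * attack_time - 0.5 * shots + 0.5 * rng.normal(size=n)


def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Regress goals on shots alone, then on shots together with attack_time.
simple = ols(shots, goals)
joint = ols(np.column_stack([shots, attack_time]), goals)

print("corr(shots, attack_time):", np.corrcoef(shots, attack_time)[0, 1])
print("shots slope, alone:      ", simple[1])
print("shots slope, jointly:    ", joint[1])
```

In this made-up setup the slope of `goals` on `shots` alone is positive, while the `shots` coefficient turns negative once `attack_time` is included, because a multiple-regression coefficient measures the effect of `shots` with the other predictors held fixed.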
Any help on how to fix this and/or incorporate these categories? I'm being vague on purpose, but hopefully the gist is there.
Thanks.