Would like some input and criticism...(esp ganch)
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#1
Alright, so for the past couple of weeks I've been toying around with a small subset of data for one NFL team from '92-'08, trying AR, MA, and ARMA models for predicting NFL totals. From the AR modeling, I'm noticing hardly any correlation beyond lag 1, so I figure anything more than an AR(1) is meaningless. From the MA modeling on my feedback, I'm noticing that some NFL statistics are pretty well correlated with totals (possibly with a lag). So basically I've been trying to use an ARMA model to fit totals over an extended period of time. I guess this question goes mostly to ganch, because I'm sure 99% of you have no idea about ARMA modeling: I want to create a categorical variable for each team that switches on when the team plays a specific opponent, to possibly help fit the line better. Is that worth looking into, or would I just be wasting my time?
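For readers who want to reproduce the kind of lag check described in the opening post, here is a minimal sketch of a sample autocorrelation calculation. The `totals` list is made-up illustrative data, not the poster's data set, and the function name and the rough ±1.96/√n band are assumptions chosen for the example rather than anything the poster said he used.

```python
import math

def sample_autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: sum of products of deviations that are `lag` steps apart.
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Hypothetical game totals for one team, in chronological order (made-up numbers).
totals = [41, 37, 44, 51, 38, 30, 47, 45, 33, 40, 52, 36, 43, 39, 48, 34]

# Rough 95% band for "no autocorrelation": about +/- 1.96 / sqrt(n).
band = 1.96 / math.sqrt(len(totals))

for lag in range(1, 5):
    r = sample_autocorrelation(totals, lag)
    flag = "outside band" if abs(r) > band else "inside band"
    print(f"lag {lag}: r = {r:+.3f} ({flag}, band = +/-{band:.3f})")
```

With only a handful of games per season the band is wide, so a lag that falls inside it is weak evidence either way.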
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#2
We just keep coming back to the same issue again.
Why do you believe the underlying factors to be ARMA in the first place?
You shouldn't be saying to yourself, "Hey, I just learned some neat new math, let's keep testing new and different usages of it until I find something that works," but rather, "OK, I have a theory based on my prior knowledge that, now that I've learned this neat new math, I can finally test."
To paraphrase the aphorism, "When you first learn how to use a hammer, every new problem looks like a nail."
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#3
Originally posted by Ganchrow:
We just keep coming back to the same issue again.
Why do you believe the underlying factors to be ARMA in the first place?
You shouldn't be saying to yourself, "Hey, I just learned some neat new math, let's keep testing new and different usages of it until I find something that works," but rather, "OK, I have a theory based on my prior knowledge that, now that I've learned this neat new math, I can finally test."
To paraphrase the aphorism, "When you first learn how to use a hammer, every new problem looks like a nail."

Now, I'm not saying this is the key to life and whatever it has to offer, but if the data says the series isn't very correlated with itself after a couple of games, while a couple of variables are highly correlated over a certain lag of games, I don't see why I shouldn't experiment with a technique like this.
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#4
By all means do experiment, but to be clear, what you're describing is data mining.
Just because you have a model that's descriptive of the past doesn't mean it'll serve to be predictive of the future. If your work's built on several iterations of data fitting rather than on a foundation of firm prior knowledge of the underlying "economics" and market structure, it's going to be difficult to build a model that's predictive in excess of that which is already priced into the market.
But give it a try in-sample. Just make sure to maintain a pristine out-of-sample data set (or two) to verify your conclusions.
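As a rough illustration of the in-sample versus out-of-sample advice above, the sketch below holds out the most recent seasons before any fitting is done. The record layout (`season`, `total`), the cutoff year, and the function name are hypothetical choices for the example.

```python
def split_by_season(games, last_in_sample_season):
    """Split game records chronologically into in-sample and out-of-sample sets."""
    in_sample = [g for g in games if g["season"] <= last_in_sample_season]
    out_of_sample = [g for g in games if g["season"] > last_in_sample_season]
    return in_sample, out_of_sample

# Hypothetical records: one dict per game with a season and a game total.
games = [
    {"season": 2005, "total": 41},
    {"season": 2006, "total": 37},
    {"season": 2007, "total": 48},
    {"season": 2008, "total": 44},
]

# Fit and tune only on seasons up to 2006; touch the rest once, at the end.
in_sample, out_of_sample = split_by_season(games, last_in_sample_season=2006)
print(len(in_sample), "in-sample games;", len(out_of_sample), "held-out games")
```

Splitting by time rather than at random keeps later games from leaking into the fitting step, which matters for anything with a lag structure.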
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#5
Originally posted by Ganchrow:
By all means do experiment, but to be clear, what you're describing is data mining.
Just because you have a model that's descriptive of the past doesn't mean it'll serve to be predictive of the future. If your work's built on several iterations of data fitting rather than on a foundation of firm prior knowledge of the underlying "economics" and market structure, it's going to be difficult to build a model that's predictive in excess of that which is already priced into the market.
But give it a try in-sample. Just make sure to maintain a pristine out-of-sample data set (or two) to verify your conclusions.
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#6
Because you're using the econometrics to determine which variables (of many) work, while ideally it's theory that should tell you which variables you expect to work, at which point you'd then go about testing your hypothesis using your econometrics.
But we've talked about this exact issue many times before, you just don't seem to want to believe me. Coming from an academic background myself I can totally identify with your mode of thought. When I first started on Wall Street doing hedge fund quant work I thought about it in a similar fashion, too. "I know tons of econometrics from grad school, let me try it every which way I can and just see what works best."
But after a decade and a half of experience I can tell you that that's generally not the best way to proceed. If one throws enough darts at a board, one's bound to eventually hit a bullseye -- but that alone speaks little to one's bullseye chances on the next shot.
But hey, you're under no obligation whatsoever to believe me. I'm just giving you my opinion. Try it out yourself and see what you find. At the very worst it'll be a learning experience, and at the very best maybe you'll find you're onto something, in which case you can feel free to laugh all the way to the bank.
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#7
Originally posted by Ganchrow:
Because you're using the econometrics to determine which variables (of many) work, while ideally it's theory that should tell you which variables you expect to work, at which point you'd then go about testing your hypothesis using your econometrics.
But we've talked about this exact issue many times before, you just don't seem to want to believe me. Coming from an academic background myself I can totally identify with your mode of thought. When I first started on Wall Street doing hedge fund quant work I thought about it in a similar fashion, too. "I know tons of econometrics from grad school, let me try it every which way I can and just see what works best."
But after a decade and a half of experience I can tell you that that's generally not the best way to proceed. If one throws enough darts at a board, one's bound to eventually hit a bullseye -- but that alone speaks little to one's bullseye chances on the next shot.
But hey, you're under no obligation whatsoever to believe me. I'm just giving you my opinion. Try it out yourself and see what you find. At the very worst it'll be a learning experience, and at the very best maybe you'll find you're onto something, in which case you can feel free to laugh all the way to the bank.

I'm mainly experimenting with several techniques I've learned over the past couple of years and seeing how they can be applied to, say, sports data.
I guess my next question for you is: what sort of techniques are useful in this field, then?
reno cool SBR MVP
- 07-02-08
- 3567
#8
Good to have a theory, or hypothesis, let's say. But even there you will risk erroneously confirming your idea. Correlations are a dangerous thing.
bird bird da bird's da word
marcoforte SBR High Roller
- 08-10-08
- 140
#9
Originally posted by Ganchrow:
By all means do experiment, but to be clear, what you're describing is data mining.
Just because you have a model that's descriptive of the past doesn't mean it'll serve to be predictive of the future. If your work's built on several iterations of data fitting rather than on a foundation of firm prior knowledge of the underlying "economics" and market structure, it's going to be difficult to build a model that's predictive in excess of that which is already priced into the market.
But give it a try in-sample. Just make sure to maintain a pristine out-of-sample data set (or two) to verify your conclusions.
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#10
Originally posted by marcoforte:
I don't disagree with your premise about knowing the underlying structures. But at what point does data mining become predictive? If a subset of data has a 20-year history of winning, and you've played it each and every year forward, does it reach the predictive stage, assuming n>60 and the winning percentage is >60%?

The real problem with data mining is that it tends to produce spurious correlations.
Flip 100 coins and you'll get 65 or more heads with probability of roughly 0.1759%.
But if 1,000 people each flip 100 coins, there's an 82.80% probability that at least one of the group will flip 65 or more heads. Does that mean that we should believe such a person to be an expert coin flipper? No. It just means that it becomes increasingly likely for one to observe a rare occurrence as the number of trials increases.
But then again, if a number of the players who flipped 65 or more heads the first time are able to repeat the feat a second and third and fourth time, then you very well might have identified a set of skilled coin flippers.
I've posted on this topic frequently in the past, especially in conversations with posters Dark Horse and VideoReview. You might want to search for posts talking about data segmenting (in-sample vs. out-of-sample) as well as the Bonferroni Method (which as I recall was only discussed in brief).
Good luck!
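The coin-flip figures in the post above can be checked with an exact binomial tail sum. This is only a sketch: the function and variable names are made up for the example, and the final Bonferroni-style step (dividing an assumed overall α of 0.05 by the number of flippers) is one simple way to apply the method mentioned, not necessarily how it was discussed in the earlier threads.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Exact binomial tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# One person flipping 100 fair coins: chance of 65 or more heads.
p_single = prob_at_least(65, 100)

# 1,000 independent people: chance that at least one of them gets 65 or more.
people = 1000
p_any = 1 - (1 - p_single) ** people

# Bonferroni-style threshold: divide the overall alpha by the number of tests.
alpha = 0.05  # assumed overall significance level
per_test_alpha = alpha / people

print(f"P(one person flips >= 65 heads) = {p_single:.4%}")
print(f"P(at least one of {people} does) = {p_any:.2%}")
print(f"Bonferroni per-test threshold = {per_test_alpha:.5f}")
print(f"Single result survives correction? {p_single < per_test_alpha}")
```

The same logic applies to testing many candidate variables or angles: the more you try, the stricter the bar each one has to clear before it counts as more than luck.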
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#11
"Correlation does not imply causation," I know, I know... Observing the AR and MA models clearly indicates that there is very little correlation between the variables, or of the variables with themselves, even on a lag-1 basis. My only idea right now is to toy around with multiple aspects of statistics and see what kind of inferences may come up...
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#12
The outliers wouldn't be increasingly likely, only more numerous as n increases, assuming normality, right?
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#13
Originally posted by Quebb Diesel:
But when n becomes large, wouldn't you normally approximate your data? And when a distribution is Gaussian, isn't the number of outliers .4+.007*n on average?
The outliers wouldn't be increasingly likely, only more numerous as n increases, assuming normality, right?

I'm not sure what you're getting at here. If you search long and hard enough you'll find something with probability approaching 1. The question remains, however: will it be predictive?
Quebb Diesel SBR MVP
- 01-26-08
- 3045
#14
Originally posted by Ganchrow:
When searching for a potentially profitable strategy it's going to be the outliers which will be of most interest in the first place.
I'm not sure what you're getting at here. If you search long and hard enough you'll find something with probability approaching 1. The question remains, however: will it be predictive?
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#15
Originally posted by Quebb Diesel:
No, no, I'm not talking about my case, just about when you were referring to a binomially distributed situation like flipping coins. With large n, don't you typically use the normal approximation? I'm just kinda thrown off because you said outliers will be increasingly likely, but outliers in a Gaussian distribution tend to follow a simple algebraic equation on average, no?

The more people you have flipping 100 coins each, the more likely it becomes that one or more will flip 65 or more heads.
Art Vandeleigh SBR MVP
- 12-31-06
- 1494
#17
Originally posted by Ganchrow:
We just keep coming back to the same issue again.
Why do you believe the underlying factors to be ARMA in the first place?
You shouldn't be saying to yourself, "Hey, I just learned some neat new math, let's keep testing new and different usages of it until I find something that works," but rather, "OK, I have a theory based on my prior knowledge that, now that I've learned this neat new math, I can finally test."
To paraphrase the aphorism, "When you first learn how to use a hammer, every new problem looks like a nail."

Can I ask a question? I don't want to start a new thread.
If I were an alien who had landed on Earth, and I had never experienced thunder or lightning on my home planet, how many times would I need to observe these two seemingly separate phenomena before I was 95% certain that there was 100% correlation between them?
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#18
Originally posted by Art Vandeleigh:
Can I ask a question? I don't want to start a new thread.
If I were an alien who had landed on Earth, and I had never experienced thunder or lightning on my home planet, how many times would I need to observe these two seemingly separate phenomena before I was 95% certain that there was 100% correlation between them?

If so, the answer is ∞.
reno cool SBR MVP
- 07-02-08
- 3567
#19
Is that because of the 100% part?
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#20
Yes. Exactly.
As I demonstrated in a PM to Art that I had originally posted here but subsequently deleted for being too esoteric and off-topic:
In general, to be able to say that thunder followed lightning with probability of at least p† given a confidence level of (1-α), we'd need to observe this occurring without fail at least log(α)/log(p) - 1 times (technically, the least integer greater than or equal to that term, as one can't have a fractional number of observations).
Hence, our aliens would need to observe thunder following lightning 298 times without fail to be ≥ 95% certain that thunder followed lightning at least 99% of the time.
† This assumes that our aliens had no prior knowledge of the nature of this phenomenon and so by default assumed all possible values of p equally likely.
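The 298 figure above can be reproduced in a few lines. The sketch assumes the uniform prior stated in the footnote: after n successes in n trials the posterior for the success probability is Beta(n + 1, 1), so the posterior chance that it lies below p is p^(n+1), and requiring p^(n+1) ≤ α gives n ≥ log(α)/log(p) - 1. The function names are made up for the example.

```python
import math

def observations_needed(p, alpha):
    """Smallest n with p**(n + 1) <= alpha, i.e. ceil(log(alpha) / log(p) - 1)."""
    return math.ceil(math.log(alpha) / math.log(p) - 1)

def posterior_prob_below(p, n):
    """Posterior P(success rate < p) after n successes in n trials, uniform prior.

    The posterior is Beta(n + 1, 1), whose CDF at p is p**(n + 1).
    """
    return p ** (n + 1)

n = observations_needed(p=0.99, alpha=0.05)
print("observations needed:", n)
print("posterior P(rate < 0.99) after n:    ", posterior_prob_below(0.99, n))
print("posterior P(rate < 0.99) after n - 1:", posterior_prob_below(0.99, n - 1))
```

As p approaches 1, log(p) approaches 0 and the required count grows without bound, which is why the answer for exactly 100% is ∞.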
Art Vandeleigh SBR MVP
- 12-31-06
- 1494
#21
First off, sorry Queb for hijacking this thread a bit, but the main subject seemed to be about correlation, so I thought I'd stick the question here instead of starting a new thread.
And to try and equate this thunder/lightning example to sports...
I have observed that an NBA player, after he misses a 3-point shot, will not attempt further 3-pointers until he makes another shot somewhere inside the 3-point line.
I would need to observe this 298 consecutive times before I am 95% certain that there is a 99% correlation between the 2 events (missing a 3-pointer/not attempting again until a 2-pointer has been made).
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#22
"Correlation" has a specific meaning (or set of meanings) in probability. Rather than correlation you really mean that there's at least a 99% probability that one event does (or does not) follow the other.
The correlation coefficient between two random variables, as most frequently defined, is the covariance of the two variables divided by the product of their standard deviations. It represents a normalized covariance, and as such will necessarily be ≥ -1 and ≤ +1.
In Excel you can determine the correlation and covariance between two arrays of data by using the correl() and covar() functions, respectively.
One should of course always be mindful of the oft-repeated maxim that correlation does not imply causation.
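For anyone working outside Excel, the definition above translates directly into code. This is a minimal sketch using the population covariance (dividing by n); the function names and the sample data are made up for the example. Whether one divides by n or n - 1 makes no difference to the correlation, since the factor cancels.

```python
import math

def covariance(xs, ys):
    """Population covariance (divides by n) of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance over the product of the standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # variance is covariance with itself
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Hypothetical paired observations (made-up numbers).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print("covariance: ", covariance(xs, ys))
print("correlation:", correlation(xs, ys))
```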
Ganchrow SBR Hall of Famer
- 08-28-05
- 5011
#23
Originally posted by Art Vandeleigh:
And to try and equate this thunder/lightning example to sports...
I have observed that an NBA player, after he misses a 3-point shot, will not attempt further 3-pointers until he makes another shot somewhere inside the 3-point line.
I would need to observe this 298 consecutive times before I am 95% certain that there is a 99% correlation between the 2 events (missing a 3-pointer/not attempting again until a 2-pointer has been made).

In other words, for the above to hold, you'd need to believe, prior to making any observations, that the probability is equally likely to lie within any two intervals of equal size. So, for example, there'd be as much chance of the probability lying between 45% and 55% as between 90% and 100%.
Unfortunately, from our prior knowledge of the game of basketball, we can say that this is almost certainly not the case.