Would like some input and criticism...(esp ganch)

  • Quebb Diesel
    SBR MVP
    • 01-26-08
    • 3045

    #1
    Alright, so for the past couple of weeks I've been toying around w/ a small subset of data (one NFL team, '92-'08) and with AR, MA, and ARMA models for predicting totals in the NFL. From my AR modeling I'm noticing that there's hardly any autocorrelation beyond lag 1, so I'm figuring anything more than an AR(1) is meaningless, and from my MA modeling on my feedback term I'm noticing that some NFL statistics are pretty correlated with totals (possibly at a lag). So basically I've been trying to use an ARMA model to fit totals over an extended period of time. I guess this question goes mostly to Ganch b/c I'm sure 99% of you have no idea about ARMA modeling, but I want to create a categorical variable for each team that turns on and off when a team plays a specific opponent, to possibly help fit the line better. My question right now is: is it worth looking into, or would I just be wasting my time?
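A minimal sketch of the lag-by-lag autocorrelation check described in the post, assuming the totals sit in a plain Python list in chronological order. The function name `sample_acf` and the numbers in `totals` are hypothetical, made up for illustration, not real NFL data. It uses the usual estimator that divides by the full sum of squared deviations, and the common ±1.96/√n white-noise band as a rough screen rather than a formal test.

```python
import math


def sample_acf(series: list[float], lag: int) -> float:
    """Lag-k sample autocorrelation: cross-products of deviations from the
    mean at distance `lag`, divided by the total sum of squared deviations."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Hypothetical game totals in chronological order (illustrative numbers only).
totals = [41.0, 37.0, 44.0, 30.0, 52.0, 38.0, 45.0, 33.0, 47.0, 40.0]

# Rough white-noise yardstick: |r_k| beyond about 1.96 / sqrt(n) stands out.
band = 1.96 / math.sqrt(len(totals))
for k in range(1, 4):
    r = sample_acf(totals, k)
    flag = "outside" if abs(r) > band else "within"
    print(f"lag {k}: r = {r:+.3f} ({flag} +/-{band:.3f})")
```

Note that checking several lags against the band is itself several tests, so an occasional value outside it is expected even for pure noise.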
  • Ganchrow
    SBR Hall of Famer
    • 08-28-05
    • 5011

    #2
    We just keep coming back to the same issue again.

    Why do you believe the underlying factors to be ARMA in the first place?

    You shouldn't be saying to yourself, "Hey, I just learned some neat new math, let's keep testing new and different usages of it until I find something that works," but rather, "OK, I have a theory based on my prior knowledge that, now that I've learned this neat new math, I can finally test."

    To paraphrase the aphorism, "When you first learn how to use a hammer, every new problem looks like a nail."
    • Quebb Diesel
      SBR MVP
      • 01-26-08
      • 3045

      #3
      Originally posted by Ganchrow
      We just keep coming back to the same issue again.

      Why do you believe the underlying factors to be ARMA in the first place?

      You shouldn't be saying to yourself, "Hey, I just learned some neat new math, let's keep testing new and different usages of it until I find something that works," but rather, "OK, I have a theory based on my prior knowledge that, now that I've learned this neat new math, I can finally test."

      To paraphrase the aphorism, "When you first learn how to use a hammer, every new problem looks like a nail."
      So even if I were to fit an ARMA model to historical data using a number of variables and lags, why would that not be relevant? Wouldn't the feedback (AR) and input (MA) models be telling you that there are obviously correlations with the variables that could be used to generate an ARMA model? If I notice a correlation in the feedback term by plotting the autocorrelations at different lags, why not use the corresponding lag terms in the model? Then I observe cross-correlations between my feedback term and other impact variables and select the ones that appear high for the model. If, say, there appear to be high cross-correlations between my feedback term and YPP, 1st downs, and month, why not include these variables in my AR model? Say that model fits the actual data pretty dead on... why wouldn't it be a good idea to use that model to try to predict future scores?

      Now I'm not saying this is the key to life and whatever it has to offer, but if the data says the series isn't very correlated with itself after a couple of games, and a couple of variables are highly correlated over a certain lag of games, I don't see why I shouldn't experiment with a technique like this.
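A minimal sketch of the cross-correlation screen described above, assuming two equal-length chronological lists. `cross_corr` is a hypothetical helper, and the `ypp` and `totals` values are invented for illustration. A positive `lag` means the input series leads the totals series by that many games.

```python
import math


def cross_corr(x: list[float], y: list[float], lag: int) -> float:
    """Sample cross-correlation of x_t with y_{t+lag} (lag >= 0 means x leads y),
    normalized by the full-sample sums of squares of both series."""
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant series: cross-correlation is undefined")
    num = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag))
    return num / math.sqrt(sxx * syy)


# Hypothetical example: does an input series (say, yards per play) lead totals?
ypp = [5.1, 4.8, 6.0, 5.5, 4.9, 5.8, 6.2, 5.0, 5.4, 5.9]
totals = [41.0, 37.0, 44.0, 30.0, 52.0, 38.0, 45.0, 33.0, 47.0, 40.0]
for k in range(3):
    print(f"ypp leads totals by {k}: r = {cross_corr(ypp, totals, k):+.3f}")
```

Every extra variable and lag screened this way is one more comparison, which matters for how much weight the largest values deserve.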
      • Ganchrow
        SBR Hall of Famer
        • 08-28-05
        • 5011

        #4
        By all means do experiment, but to be clear what you're describing is data mining.

        Just because you have a model that's descriptive of the past doesn't mean it'll prove predictive of the future. If your work's built on several iterations of data fitting rather than on a foundation of firm prior knowledge of the underlying "economics" and market structure, it's going to be difficult to build a model that's predictive in excess of what's already priced into the market.

        But give it a try in-sample. Just make sure to maintain a pristine out-of-sample data set (or two) to verify your conclusions.
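A minimal sketch of keeping a pristine out-of-sample set (or two), assuming each game is a record tagged with its season. The `Game` layout and the cutoff seasons (fit on 1992-2002, hold out 2003-2005 and 2006-2008) are hypothetical choices for illustration; the key design choice is splitting chronologically rather than shuffling, so the held-out tranches are strictly later than anything used for fitting.

```python
from typing import NamedTuple


class Game(NamedTuple):
    season: int   # e.g. 1992 (hypothetical record layout)
    total: float  # combined points scored


def chronological_split(games, in_sample_last, oos1_last):
    """Split by season, never shuffling: fit on the earliest seasons and keep
    two later, untouched out-of-sample tranches for verification."""
    in_sample = [g for g in games if g.season <= in_sample_last]
    oos_1 = [g for g in games if in_sample_last < g.season <= oos1_last]
    oos_2 = [g for g in games if g.season > oos1_last]
    return in_sample, oos_1, oos_2


# Hypothetical: one placeholder game per season for 1992-2008.
games = [Game(season, 0.0) for season in range(1992, 2009)]
fit, check_1, check_2 = chronological_split(games, in_sample_last=2002, oos1_last=2005)
print(len(fit), len(check_1), len(check_2))  # 11 3 3
```

The second tranche only stays pristine if it is consulted once, after the model is frozen on the first.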
        • Quebb Diesel
          SBR MVP
          • 01-26-08
          • 3045

          #5
          Originally posted by Ganchrow
          By all means do experiment, but to be clear what you're describing is data mining.

          Just because you have a model that's descriptive of the past doesn't mean it'll prove predictive of the future. If your work's built on several iterations of data fitting rather than on a foundation of firm prior knowledge of the underlying "economics" and market structure, it's going to be difficult to build a model that's predictive in excess of what's already priced into the market.

          But give it a try in-sample. Just make sure to maintain a pristine out-of-sample data set (or two) to verify your conclusions.
          So if I'm using MA models to find which variables have strong correlations with total points scored, why would that be considered data mining? Wouldn't it be better to fit the response with correlated variables than to have a model consisting of every NFL stat known to man? Wouldn't fitting an abundance of variables to a model send the variance of the fit sky high?
          • Ganchrow
            SBR Hall of Famer
            • 08-28-05
            • 5011

            #6
            Because you're using the econometrics to determine which variables (of many) work, while ideally it's theory that should tell you which variables you expect to work, at which point, you'd then go about testing your hypothesis using your econometrics.

            But we've talked about this exact issue many times before; you just don't seem to want to believe me. Coming from an academic background myself, I can totally identify with your mode of thought. When I first started on Wall Street doing hedge fund quant work, I thought about it in a similar fashion, too: "I know tons of econometrics from grad school, let me try it every which way I can and just see what works best."

            But after a decade and a half of experience, I can tell you that that's generally not the best way to proceed. If one throws enough darts at a board, one's bound to eventually hit the bullseye, but that alone speaks little to one's bullseye chances on the next shot.

            But hey, you're under no obligation whatsoever to believe me. I'm just giving you my opinion. Try it out yourself and see what you find. At the very worst it'll be a learning experience and at the very best maybe you'll find you're onto something in which case you can feel free to laugh all the way to the bank.
            • Quebb Diesel
              SBR MVP
              • 01-26-08
              • 3045

              #7
              Originally posted by Ganchrow
              Because you're using the econometrics to determine which variables (of many) work, while ideally it's theory that should tell you which variables you expect to work, at which point, you'd then go about testing your hypothesis using your econometrics.

              But we've talked about this exact issue many times before; you just don't seem to want to believe me. Coming from an academic background myself, I can totally identify with your mode of thought. When I first started on Wall Street doing hedge fund quant work, I thought about it in a similar fashion, too: "I know tons of econometrics from grad school, let me try it every which way I can and just see what works best."

              But after a decade and a half of experience, I can tell you that that's generally not the best way to proceed. If one throws enough darts at a board, one's bound to eventually hit the bullseye, but that alone speaks little to one's bullseye chances on the next shot.

              But hey, you're under no obligation whatsoever to believe me. I'm just giving you my opinion. Try it out yourself and see what you find. At the very worst it'll be a learning experience and at the very best maybe you'll find you're onto something in which case you can feel free to laugh all the way to the bank.
              No, I'm not knocking you at all; I know you're probably the most knowledgeable person around these forums when it comes to stat theory.

              I'm mainly experimenting w/ several techniques I've learned over the past couple of years and seeing how they can be applied to, say, sports data.

              I guess my next question for you is: what sort of techniques are useful in this field, then?
              • reno cool
                SBR MVP
                • 07-02-08
                • 3567

                #8
                Good to have a theory, or hypothesis, let's say. But even then you risk erroneously confirming your idea. Correlations are a dangerous thing.
                bird bird da bird's da word
                • marcoforte
                  SBR High Roller
                  • 08-10-08
                  • 140

                  #9
                  Originally posted by Ganchrow
                  By all means do experiment, but to be clear what you're describing is data mining.

                  Just because you have a model that's descriptive of the past doesn't mean it'll prove predictive of the future. If your work's built on several iterations of data fitting rather than on a foundation of firm prior knowledge of the underlying "economics" and market structure, it's going to be difficult to build a model that's predictive in excess of what's already priced into the market.

                  But give it a try in-sample. Just make sure to maintain a pristine out-of-sample data set (or two) to verify your conclusions.
                  I don't disagree with your premise about knowing the underlying structures. But at what point does data mining become predictive? If a subset of data has a 20-year history of winning and you've played it each and every year going forward, does it reach the predictive stage, assuming n > 60 and a winning percentage > 60%?
                  • Ganchrow
                    SBR Hall of Famer
                    • 08-28-05
                    • 5011

                    #10
                    Originally posted by marcoforte
                    I don't disagree with your premise about knowing the underlying structures. But at what point does data mining become predictive? If a subset of data has a 20-year history of winning and you've played it each and every year going forward, does it reach the predictive stage, assuming n > 60 and a winning percentage > 60%?
                    What you need to do is look at the strategy p-value (preferably from a Bayesian perspective), making sure not to neglect all the other strategies you might have considered along the way. This is why segmenting data into in-sample and out-of-sample tranches (preferably as many of the latter as the size of your data set would reasonably permit) is so useful.

                    The real problem with data mining is that it tends to produce spurious correlations.

                    Flip 100 coins and you'll get 65 or more heads with probability of roughly 0.1759%.

                    But if 1,000 people each flip 100 coins, there's an 82.80% probability that at least one of the group will flip 65 or more heads. Does that mean that we should believe such a person to be an expert coin flipper? No. It just means that it becomes increasingly likely for one to observe a rare occurrence as the number of trials increases.

                    But then again, if a number of the players who flipped 65 or more heads the first time are able to repeat the feat a second and third and fourth time, then you very well might have identified a set of skilled coin flippers.
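The coin-flip numbers above can be reproduced with an exact binomial tail sum, as a quick check. The helper names `binom_upper_tail` and `prob_at_least_one` are illustrative, not from the thread. The last loop sketches the "repeat the feat" point under the assumption that each session is independent: for a fair flipper the chance of hitting 65+ heads in k separate sessions is p^k, so the expected number of fair repeaters out of 1,000 collapses quickly.

```python
from math import comb


def binom_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_at_least_one(p_single: float, people: int) -> float:
    """Chance that at least one of `people` independent tries hits an event of probability p_single."""
    return 1 - (1 - p_single) ** people


p65 = binom_upper_tail(100, 65)               # one person, 100 fair flips
group = prob_at_least_one(p65, 1000)          # 1,000 people flipping
print(f"single person: {p65:.4%}")            # roughly 0.1759%
print(f"at least one of 1,000: {group:.2%}")  # roughly 82.80%

# A fair flipper who repeats the feat in k separate sessions: p65 ** k.
for k in (2, 3, 4):
    print(f"expected fair repeaters out of 1,000 for {k} sessions: {1000 * p65**k:.2e}")
```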

                    I've posted on this topic frequently in the past, especially in conversations with posters Dark Horse and VideoReview. You might want to search for posts talking about data segmenting (in-sample vs. out-of-sample) as well as the Bonferroni method (which, as I recall, was only discussed in brief).
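A minimal sketch of the Bonferroni idea applied to a strategy search, assuming every strategy tried gets a one-sided binomial p-value from its win/loss record. This is the plain frequentist version, not the Bayesian treatment recommended above. The function names, the 20 "systems", their p-values, the 40-21 record, and the 52.38% breakeven (which assumes -110 pricing) are all hypothetical illustration values.

```python
from math import comb


def record_p_value(wins: int, bets: int, breakeven: float) -> float:
    """One-sided binomial p-value: chance a strategy with true win rate equal to
    `breakeven` would win at least `wins` of `bets` by luck alone."""
    return sum(comb(bets, i) * breakeven**i * (1 - breakeven) ** (bets - i)
               for i in range(wins, bets + 1))


def bonferroni_survivors(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Keep only strategies whose p-value clears alpha / m, where m counts every
    strategy tried along the way, not just the one that looked best."""
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p <= threshold]


# Hypothetical screen of 20 "systems": one looks good, the rest are noise.
# Breakeven of 0.5238 assumes -110 pricing (an assumption for illustration).
p_vals = {f"system_{i}": 0.5 for i in range(19)}
p_vals["system_hot"] = record_p_value(wins=40, bets=61, breakeven=0.5238)
print(f"hot system p-value: {p_vals['system_hot']:.4f}")
print("survivors:", bonferroni_survivors(p_vals))
```

Bonferroni is conservative: it guards against the "1,000 coin flippers" effect at the cost of sometimes discarding a real edge, which is one reason a fresh out-of-sample tranche remains the stronger check.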

                    Good luck!
                    • Quebb Diesel
                      SBR MVP
                      • 01-26-08
                      • 3045

                      #11
                      Originally posted by reno cool
                      Good to have a theory, or hypothesis, let's say. But even then you risk erroneously confirming your idea. Correlations are a dangerous thing.
                      "Correlation does not imply causation," I know, I know... Observing the AR and MA models clearly indicates that there is very little correlation between the variables, or of each variable with itself, even at lag 1... My only idea right now is to toy around w/ multiple aspects of statistics and see what kind of inferences come up...
                      • Quebb Diesel
                        SBR MVP
                        • 01-26-08
                        • 3045

                        #12
                        Originally posted by Ganchrow
                        Does that mean that we should believe such a person to be an expert coin flipper? No. It just means that rare occurrences become increasingly likely as the number of trials increases.
                        But when n becomes large, wouldn't you normally approximate the distribution with a normal? And when a distribution is Gaussian, isn't the number of outliers (points beyond the 1.5×IQR boxplot fences) about .4 + .007*n on average?

                        The outliers wouldn't be increasingly likely, just more numerous in observed count as n increases, assuming normality, right?
                        • Ganchrow
                          SBR Hall of Famer
                          • 08-28-05
                          • 5011

                          #13
                          Originally posted by Quebb Diesel
                          But when n becomes large, wouldn't you normally approximate the distribution with a normal? And when a distribution is Gaussian, isn't the number of outliers (points beyond the 1.5×IQR boxplot fences) about .4 + .007*n on average?

                          The outliers wouldn't be increasingly likely, just more numerous in observed count as n increases, assuming normality, right?
                          When searching for a potentially profitable strategy it's going to be the outliers which will be of most interest in the first place.

                          I'm not sure what you're getting at here. If you search long and hard enough, you'll find something with probability approaching 1. The question remains, however: will it be predictive?
                          • Quebb Diesel
                            SBR MVP
                            • 01-26-08
                            • 3045

                            #14
                            Originally posted by Ganchrow
                            When searching for a potentially profitable strategy it's going to be the outliers which will be of most interest in the first place.

                            I'm not sure what you're getting at here. If you search long and hard enough, you'll find something with probability approaching 1. The question remains, however: will it be predictive?
                            No, no, I'm not talking about my case... just when you were referring to a binomially distributed situation like flipping coins. With large n, don't you typically use a normal approximation? I was just kind of thrown off b/c you said outliers will be increasingly likely, but outliers in a Gaussian distribution tend to follow a simple algebraic equation on average, no?
                            • Ganchrow
                              SBR Hall of Famer
                              • 08-28-05
                              • 5011

                              #15
                              Originally posted by Quebb Diesel
                              No, no, I'm not talking about my case... just when you were referring to a binomially distributed situation like flipping coins. With large n, don't you typically use a normal approximation? I was just kind of thrown off b/c you said outliers will be increasingly likely, but outliers in a Gaussian distribution tend to follow a simple algebraic equation on average, no?
                              In context, what I was saying was that as n increases, the appearance of one or more rare occurrences becomes increasingly likely.

                              The more people you have flipping 100 coins each, the more likely it becomes that one or more will flip 65 or more heads.
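To connect the normal-approximation question with the point being made, a short sketch comparing the exact binomial tail for 65+ heads in 100 fair flips with the normal approximation (with continuity correction). The helper names are illustrative. Whichever tail probability one uses, the chance that at least one of N flippers clears the bar still climbs with N.

```python
import math
from math import comb


def exact_upper_tail(n: int, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def normal_upper_tail(n: int, k: int) -> float:
    """Normal approximation with continuity correction: P(Z >= (k - 0.5 - n/2) / sqrt(n/4))."""
    z = (k - 0.5 - n / 2) / math.sqrt(n / 4)
    return 0.5 * math.erfc(z / math.sqrt(2))


exact = exact_upper_tail(100, 65)
approx = normal_upper_tail(100, 65)
print(f"exact {exact:.5f} vs normal approx {approx:.5f}")

# Either way, the chance that at least one of N flippers gets 65+ heads grows with N.
for people in (1, 10, 100, 1000, 10000):
    print(people, f"{1 - (1 - exact) ** people:.4f}")
```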
                              • Quebb Diesel
                                SBR MVP
                                • 01-26-08
                                • 3045

                                #16
                                Originally posted by Ganchrow
                                What I was saying was that as n increases, the appearance of one or more rare occurrences becomes increasingly likely.
                                Okay, I think I took what you said the wrong way.
                                • Art Vandeleigh
                                  SBR MVP
                                  • 12-31-06
                                  • 1494

                                  #17
                                  Originally posted by Ganchrow
                                  We just keep coming back to the same issue again.

                                  Why do you believe the underlying factors to be ARMA in the first place?

                                  You shouldn't be saying to yourself, "Hey, I just learned some neat new math, let's keep testing new and different usages of it until I find something that works," but rather, "OK, I have a theory based on my prior knowledge that, now that I've learned this neat new math, I can finally test."

                                  To paraphrase the aphorism, "When you first learn how to use a hammer, every new problem looks like a nail."

                                  Can I ask a question? I don't want to start a new thread.

                                  If I were an alien who had landed on Earth, and I had never experienced thunder or lightning on my home planet, how many times would I need to observe these two seemingly separate phenomena before I was 95% certain that there was 100% correlation between them?
                                  • Ganchrow
                                    SBR Hall of Famer
                                    • 08-28-05
                                    • 5011

                                    #18
                                    Originally posted by Art Vandeleigh
                                    Can I ask a question? I don't want to start a new thread.

                                    If I were an alien who had landed on Earth, and I had never experienced thunder or lightning on my home planet, how many times would I need to observe these two seemingly separate phenomena before I was 95% certain that there was 100% correlation between them?
                                    What I assume you're asking is: if we ignore the varying amount of time between lightning and the concomitant thunder, how many observations of lightning+thunder would we need before we could be 95% certain that thunder follows lightning 100% of the time?

                                    If so, the answer is ∞.
                                    • reno cool
                                      SBR MVP
                                      • 07-02-08
                                      • 3567

                                      #19
                                      Is that because of the 100% part?
                                      • Ganchrow
                                        SBR Hall of Famer
                                        • 08-28-05
                                        • 5011

                                        #20
                                        Originally posted by reno cool
                                        Is that because of the 100% part?
                                        Yes. Exactly.

                                        As I demonstrated in a PM to Art that I had originally posted here but subsequently deleted for being too esoteric and off-topic:
                                        In general, to be able to say that thunder follows lightning with probability of at least p at a confidence level of (1-α), we'd need to observe this occurring without fail (log(α)/log(p) - 1) times (well, technically it would be the least integer greater than or equal to that term, as one can't have a fractional number of observations).

                                        Hence, our aliens would need to observe thunder following lightning 298 times without fail to be ≥ 95% certain that thunder followed lightning at least 99% of the time.

                                        This assumes that our aliens had no prior knowledge of the nature of this phenomenon and so by default assumed all possible values of p equally likely.
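A short sketch of where the (log(α)/log(p) - 1) count comes from, under the same uniform-prior assumption. After n straight successes the posterior on the true rate is Beta(n + 1, 1), whose upper tail above p is 1 - p^(n+1); requiring that to be at least 1 - α and solving for n gives the formula, rounded up. The function names are illustrative. The second loop shows the requirement growing without bound as p approaches 1, which is why the answer for exactly 100% is ∞.

```python
import math


def required_streak(p: float, alpha: float) -> int:
    """Smallest number of straight successes n such that, under a uniform prior
    on the true rate q, P(q >= p | n successes) >= 1 - alpha.

    After n straight successes the posterior is Beta(n + 1, 1), whose upper tail
    is P(q >= p) = 1 - p**(n + 1).  Requiring p**(n + 1) <= alpha gives
    n >= log(alpha) / log(p) - 1, rounded up to a whole observation."""
    if not (0 < p < 1 and 0 < alpha < 1):
        raise ValueError("p and alpha must lie strictly between 0 and 1")
    return max(0, math.ceil(math.log(alpha) / math.log(p) - 1))


def posterior_confidence(n: int, p: float) -> float:
    """P(q >= p) after n straight successes under a uniform prior."""
    return 1 - p ** (n + 1)


n = required_streak(p=0.99, alpha=0.05)
print(n, posterior_confidence(n, 0.99), posterior_confidence(n - 1, 0.99))

# As p approaches 1 the requirement grows without bound (p = 1 itself is never reachable).
for p in (0.9, 0.99, 0.999, 0.9999):
    print(p, required_streak(p, 0.05))
```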
                                        • Art Vandeleigh
                                          SBR MVP
                                          • 12-31-06
                                          • 1494

                                          #21
                                          First off, sorry Quebb for hijacking this thread a bit, but the main subject seemed to be correlation, so I thought I'd stick the question here instead of starting a new thread.

                                          And to try and equate this thunder/lightning example to sports...

                                          I have observed that an NBA player, after he misses a 3-point shot, will not attempt further 3-pointers until he makes another shot somewhere within the 3-point line.

                                          I would need to observe this 298 consecutive times before I am 95% certain that there is a 99% correlation between the 2 events (missing a 3-pointer/not attempting again until a 2-pointer has been made)
                                          • Ganchrow
                                            SBR Hall of Famer
                                            • 08-28-05
                                            • 5011

                                            #22
                                            Originally posted by Art Vandeleigh
                                            I would need to observe this 298 consecutive times before I am 95% certain that there is a 99% correlation between the 2 events (missing a 3-pointer/not attempting again until a 2-pointer has been made)
                                            Just a minor note on terminology here.

                                            "Correlation" has a specific meaning (or set of meanings) in probability. Rather than correlation you really mean that there's at least a 99% probability that one event does (or does not) follow the other.

                                            The correlation coefficient between two random variables, as most frequently defined, is the covariance of the two variables divided by the product of their standard deviations. It represents a normalized covariance and as such will necessarily be ≥ -1 and ≤ +1.

                                            In Excel you can determine the correlation and covariance between two arrays of data by using the correl() and covar() functions, respectively.
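A minimal Python sketch of the same definition, mirroring what Excel's CORREL() and COVAR() compute. COVAR() returns the population covariance (dividing by n); in the correlation the n versus n-1 choice cancels anyway. The function names `covar` and `correl` are chosen here only to echo the Excel functions. One side effect relevant to the lightning/3-pointer example: if an indicator never varies (the follow-up event happens every single time), its standard deviation is zero and the correlation coefficient is undefined, another reason "correlation" isn't the right word there.

```python
import math


def covar(xs: list[float], ys: list[float]) -> float:
    """Population covariance (divide by n), the convention Excel's COVAR() uses."""
    n = len(xs)
    if n == 0 or len(ys) != n:
        raise ValueError("need two non-empty arrays of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def correl(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance over the product of standard deviations."""
    sx, sy = math.sqrt(covar(xs, xs)), math.sqrt(covar(ys, ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant array has zero variance; correlation is undefined")
    return covar(xs, ys) / (sx * sy)


print(covar([1, 2, 3, 4], [2, 4, 5, 9]), correl([1, 2, 3, 4], [2, 4, 5, 9]))
```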

                                            One should of course always be mindful of the oft-repeated maxim that correlation does not imply causation.
                                            • Ganchrow
                                              SBR Hall of Famer
                                              • 08-28-05
                                              • 5011

                                              #23
                                              Originally posted by Art Vandeleigh
                                              And to try and equate this thunder/lightning example to sports...

                                              I have observed that an NBA player, after he misses a 3-point shot, will not attempt further 3-pointers until he makes another shot somewhere within the 3-point line.

                                              I would need to observe this 298 consecutive times before I am 95% certain that there is a 99% correlation between the 2 events (missing a 3-pointer/not attempting again until a 2-pointer has been made)
                                              I'll also point out that this only holds if you believe the prior distribution of the probability to be uniform.

                                              In other words, for the above to hold, you'd need to believe, prior to making any observations, that the probability is equally likely to lie in any two intervals of equal width. So, for example, there'd be as much of a chance of the probability lying between 45% and 55% as between 90% and 100%.

                                              Unfortunately, from our prior knowledge of the game of basketball, we can say that this is almost certainly not the case.
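To see how much the prior matters, here is a sketch that swaps the uniform prior for a Beta(a, b) prior; the uniform prior is the special case Beta(1, 1). The Beta family and the specific hyperparameters below are assumptions for illustration; nothing in the thread says what prior a basketball-savvy observer should use. After n straight successes the posterior is Beta(a + n, b), and the upper tail is integrated numerically with the midpoint rule (stdlib only, assuming b ≥ 1 so the density stays bounded near 1).

```python
import math


def beta_upper_tail(a: float, b: float, x0: float, steps: int = 4000) -> float:
    """P(X >= x0) for X ~ Beta(a, b), via the midpoint rule on [x0, 1].
    Assumes b >= 1 so the density stays bounded near 1."""
    if b < 1:
        raise ValueError("this simple integrator assumes b >= 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = (1.0 - x0) / steps
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * h  # midpoints never touch x = 1
        total += math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
    return total * h


def required_streak_with_prior(a: float, b: float, p: float, alpha: float,
                               max_n: int = 1_000_000) -> int:
    """Smallest n such that, starting from a Beta(a, b) prior on the true rate,
    n straight successes give P(rate >= p) >= 1 - alpha.
    The posterior Beta(a + n, b) tail rises with n, so an exponential search
    followed by bisection finds the boundary."""
    def enough(n: int) -> bool:
        return beta_upper_tail(a + n, b, p) >= 1 - alpha

    if enough(0):
        return 0
    hi = 1
    while not enough(hi):
        hi *= 2
        if hi > max_n:
            raise ValueError("no streak up to max_n is enough")
    lo = hi // 2  # invariant: enough(lo) is False, enough(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if enough(mid):
            hi = mid
        else:
            lo = mid
    return hi


print(required_streak_with_prior(1, 1, 0.99, 0.05))   # uniform prior: 298, matching the figure above
print(required_streak_with_prior(50, 1, 0.99, 0.05))  # hypothetical prior leaning toward "always"
print(required_streak_with_prior(1, 5, 0.99, 0.05))   # hypothetical prior leaning away from it
```

A prior that already leans toward the behavior shortens the required streak; one that leans against it lengthens it considerably.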