Flukes/records?

  • Dark Horse
    SBR Posting Legend
    • 12-14-05
    • 13764

    #1
    I'm keeping a record for a rather big MLB project, going forward. I have no interest in making the record look better than it is. I want it unbiased.

    Yesterday a fluke happened. I had identified a strong play, largely based on starting pitching. The pitcher got hit by a ball in the second inning and had to leave the game. The team went on to lose, and I recorded it as a loss. Am I correct to include this? Flukes happen, and I couldn't specify upfront what a fluke is anyway (and deciding it afterwards would be more dangerous to an objective record than a true fluke). Or does including this result throw the record off, if only ever so slightly?

    The game was scoreless when the pitcher left, and he had recorded just 4 outs. I'm counting it because the book counted it. End of story. But is that the correct approach, statistically speaking? Technically, the pitcher played the game, but realistically he had no impact on it. If the pitcher got hit by lightning after recording 1 out, would I count that too? Where does act-of-God territory start?

    I'm including the result. Wrong or right? The alternative would be to record the game as if the replacement pitcher had started (which, for this method, would have greater objective value going forward than including this fluke). A third possibility would be to not record it at all. Which is most objective?
    Last edited by Dark Horse; 04-09-10, 07:15 AM.
  • gryfyn1
    SBR MVP
    • 03-30-10
    • 3285

    #2
    Originally posted by Dark Horse
    . I'm counting it because the book counted it. End of story.
    Sounds like the way to go. If you are testing a theory/system, I might hesitate to include obvious flukes like this when determining the validity of the system.
    • Wrecktangle
      SBR MVP
      • 03-01-09
      • 1524

      #3
      DH, I record everything, but I note it in comments. What you are getting at is the problem with "outliers." I think too many statisticians are too outlier-happy and toss them to fit their favorite theories. There is a lot of work on this and I'm sure you've read up on it, but if you pull too many of these events out you can bias your prediction.

      One of the areas I have a fair number of problems with is overtime games, which I'd love to toss because I know they bend my methods, so I normalize those to fit better with the regular games. I also bin them to see how they work as a category. I'm still not happy with the normalization, and there are too many of these outliers to toss, so it gives me something (else) to think about when I'm swilling a beer by the pool, I guess.
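Wrecktangle's "record everything, but note it" approach can be sketched in Python. This is an illustration under assumptions, not his actual tooling: the `Play` record, its free-text `note` field (a non-empty note marks a flagged game) and the `category` label used for binning (e.g. overtime) are all hypothetical. The idea is that the full record and the record without flagged games are both reported, so nothing is silently dropped.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Play:
    won: bool
    category: str = "regular"  # hypothetical bin label, e.g. "overtime"
    note: str = ""             # free-text comment; non-empty means the game is flagged


def win_rate(plays):
    """Fraction of plays won, or None for an empty list."""
    return sum(p.won for p in plays) / len(plays) if plays else None


def summarize(plays):
    """Report the full record alongside the record with flagged games set aside."""
    clean = [p for p in plays if not p.note]
    bins = defaultdict(list)
    for p in plays:
        bins[p.category].append(p)
    return {
        "all": win_rate(plays),
        "excluding_flagged": win_rate(clean),
        "n_flagged": len(plays) - len(clean),
        "by_category": {c: win_rate(ps) for c, ps in bins.items()},
    }


# Illustrative record only; these results are made up.
record = [
    Play(True), Play(True), Play(False),
    Play(False, note="starter hit by ball in 2nd, left after 4 outs"),
    Play(True, category="overtime"),
    Play(False, category="overtime"),
]
print(summarize(record))
```

Because the flagged game stays in `record`, any later decision to filter it shows up as the gap between `all` and `excluding_flagged` rather than disappearing from view.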
      • Wrecktangle
        SBR MVP
        • 03-01-09
        • 1524

        #4
        Double post; some sort of software glitch, I guess.
        Last edited by Wrecktangle; 04-09-10, 07:23 AM. Reason: deleted a double post
        • dodger33
          SBR MVP
          • 08-14-09
          • 3962

          #5
          I bet the Lackey game last year, where he got tossed on the second pitch of the game... I think over time flukes like these will even out.
          • dwaechte
            SBR Hall of Famer
            • 08-27-07
            • 5481

            #6
            You need to identify why you were playing it, and what went right/wrong based on your predictions. If the play was mostly based on starting pitching as you said, then I wouldn't count it... nothing happened that would be predictable in future games unless your predictions were somehow correlated with the odds of a pitcher getting hurt early in a game, which is almost certainly not the case. I would hope there's more to the system than just starting pitching though.
            • u21c3f6
              SBR Wise Guy
              • 01-17-09
              • 790

              #7
              IMO you must include it because as rare as these events are, they do happen. If you don't include this event, then you must also exclude any "fluke" that was to your advantage.

              Joe.
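Joe's symmetry point is easy to see with a toy example. In the Python sketch below, the record and the fluke labels ("against" for a fluke that hurt the play, "for" for one that helped it) are invented for illustration. Tossing only the adverse fluke moves the win rate in the bettor's favor, while a rule applied to flukes in both directions has no built-in tilt.

```python
# Hypothetical record: (won, fluke), where fluke is "against" (hurt the play),
# "for" (helped the play), or None. Data invented for illustration.
record = [
    (True, None), (False, None), (True, None), (False, None),
    (False, "against"),  # e.g. our starter knocked out early
    (True, "for"),       # e.g. the opposing starter knocked out early
]


def win_rate(results):
    """Fraction of True values in a non-empty list of booleans."""
    return sum(results) / len(results)


all_games = [won for won, _ in record]
drop_adverse_only = [won for won, fluke in record if fluke != "against"]
drop_both_ways = [won for won, fluke in record if fluke is None]

print(win_rate(all_games))          # 0.5
print(win_rate(drop_adverse_only))  # 0.6
print(win_rate(drop_both_ways))     # 0.5
```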
              • MadTiger
                SBR MVP
                • 04-19-09
                • 2724

                #8
                Great comments here.

                For the OP, I would say that anytime a starting pitcher leaves before a reasonable pitch count (5+ innings), and is not losing, then it would be a "fluke." Injury, illness, family situation, etc.
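MadTiger's definition is precise enough to write down as a predicate. In this Python sketch, "5+ innings" is read as 15 recorded outs and "is not losing" as the starter's team not trailing at the moment he leaves; both readings are assumptions, and the function name is hypothetical. The reason for the exit (injury, illness, family) is not checked, because under this reading the score serves as the stand-in for "not removed for performance."

```python
OUTS_IN_FIVE_INNINGS = 15


def is_fluke_exit(outs_recorded: int, team_runs: int, opp_runs: int) -> bool:
    """One reading of MadTiger's rule of thumb: the starter left before
    completing five innings while his team was not trailing."""
    left_early = outs_recorded < OUTS_IN_FIVE_INNINGS
    not_losing = team_runs >= opp_runs
    return left_early and not_losing


print(is_fluke_exit(4, 0, 0))   # Dark Horse's game: 4 outs, scoreless -> True
print(is_fluke_exit(6, 1, 5))   # chased while trailing -> False
print(is_fluke_exit(18, 2, 2))  # completed six innings -> False
```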
                • Dark Horse
                  SBR Posting Legend
                  • 12-14-05
                  • 13764

                  #9
                  Thanks for the input guys. Very helpful.

                  I liked MadTiger's idea that the definition of a fluke can be quite broad and still be objective, as long as it is precise and not forced. Wrecktangle brought up OT, and reminded me that I already use a similarly broad filter for NBA totals, where OT has no place. Why not? Because of the key element that dwaechte brought up: it has no predictive value.

                  Based on this feedback, I'm tossing the game out. It's the right thing.
                  • Dirty Sanchez
                    SBR Posting Legend
                    • 03-01-10
                    • 16031

                    #10
                    My personal opinion is I place everything in my overall record once I call it a play...period. I've taken some goofy losses over the years, but normally they even out.
                    • Dave Head
                      SBR Hustler
                      • 07-22-09
                      • 73

                      #11
                      Originally posted by Dark Horse
                      ...I'm counting it because the book counted it. End of story. ...
                      No. Ask yourself these questions:

                      1) "Does this game accurately represent both teams?" and
                      2) "Would I be willing to bet money on either team based on this game?"

                      Those questions are the litmus test. If a starting pitcher is forced out of a game by an injury in the 2nd inning, then this game fails the test.

                      Your database is not supposed to be an accurate record of history. It's a tool for predicting future games as accurately as possible.

                      Originally posted by dodger33
                      ... I think over time flukes like these will even out.
                      You wouldn't look farther back than 10 to 30 games when predicting a game. So, even if it were true that "over time flukes ... even out", this statement is irrelevant. Flukes do not even out over the short term that you will be using.

                      Saying something like "It all evens out" is lazy. When I die, my ghost will kick people in the ass who say "It all evens out".
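Dave Head's short-window argument comes down to simple arithmetic: one result that flips inside an n-game window moves that window's win rate by 100/n percentage points, and it keeps that full weight for as long as it stays in the window. A quick Python check, where the 500-game long-run record is a hypothetical figure added only for comparison:

```python
def shift_points(window: int) -> float:
    """Percentage-point change in win rate when one result inside a window of
    `window` games flips from a win to a loss (or the reverse)."""
    return 100 / window


# 10 and 30 are the look-back range from the post; 500 is a hypothetical long-run record.
for n in (10, 30, 500):
    print(n, round(shift_points(n), 2))
```

A single fluke is worth 10 points in a 10-game window but only a fraction of a point over the long-run record, which is why "it evens out over time" says little about a short look-back.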
                      • BigdaddyQH
                        SBR Posting Legend
                        • 07-13-09
                        • 19530

                        #12
                        You must include it. A win is a win and a loss is a loss.
                        • Dark Horse
                          SBR Posting Legend
                          • 12-14-05
                          • 13764

                          #13
                          Originally posted by dwaechte
                          I would hope there's more to the system than just starting pitching though.
                          Starting pitchers against the lineup. Plus a little extra.
                          • skrtelfan
                            SBR MVP
                            • 10-09-08
                            • 1913

                            #14
                            Originally posted by Dave Head
                            Saying something like "It all evens out" is lazy. When I die, my ghost will kick people in the ass who say "It all evens out".
                            While I agree that "it all evens out" is often used as an excuse not to do additional work, in this case I think you have it backwards. If there's any reason it doesn't all "even out", it's because certain pitchers are more injury-prone than others. I'd guess that ground-ball pitchers are a bit less susceptible to getting hit by a sharp comebacker than pitchers who allow more line drives. As minuscule as the difference in one pitcher's odds of getting hit by a line drive versus another's might be, that difference still exists.

                            That's sort of tangential, though. The biggest reason you need to include games like this in the analysis is that removing them could hide the fact that you're underrating the bullpen. There will be times the starting pitcher leaves with an injury, and those times are included in the distribution of "expected number of innings pitched" by the starter.
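skrtelfan's bullpen point can be made concrete with invented numbers. The Python sketch below counts a starter's workload in outs (to avoid the 5.1/5.2 innings notation) and assumes a regulation 27-out game; the ten starts are made up, with one early exit after the starter is hit. Dropping that start raises the starter's expected workload and so lowers the share of the game the model expects the bullpen to cover.

```python
OUTS_PER_GAME = 27  # regulation nine innings; ignores extra innings and skipped bottom halves

# Hypothetical outs recorded by one starter in ten starts (illustrative only).
starter_outs = [18, 21, 15, 18, 4, 19, 20, 17, 18, 16]
exited_early = [False, False, False, False, True, False, False, False, False, False]

kept = [outs for outs, early in zip(starter_outs, exited_early) if not early]

with_all = sum(starter_outs) / len(starter_outs)  # 16.6 outs
without_exits = sum(kept) / len(kept)             # 18.0 outs

print(round(OUTS_PER_GAME - with_all, 2))       # expected bullpen outs, all starts: 10.4
print(round(OUTS_PER_GAME - without_exits, 2))  # expected bullpen outs, early exit dropped: 9.0
```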
                            • Dark Horse
                              SBR Posting Legend
                              • 12-14-05
                              • 13764

                              #15
                              I've heard of injury prone pitchers, but never of pitchers prone to getting hit by a ball. The first, as you point out, could have some predictive value.

                              I suppose there are two questions. The first is whether a filter should ever be applied retroactively to improve the predictive value of a system. If the answer to that question is yes, then the second question is about the purity of that filter. Obviously, if a filter has any bias it is counterproductive.
                              • Dark Horse
                                SBR Posting Legend
                                • 12-14-05
                                • 13764

                                #16
                                I adjusted my view. The problem is retroactive adjustment. Even though throwing out this one result would indeed improve the predictive value of the system, I can't make this adjustment after becoming aware of the fluke, because I don't know how many times a similar instance may have occurred in the past and is already part of the record.

                                Retroactive science ('wait a minute, you didn't tell me that, that doesn't count...') is never acceptable if it serves to maintain a hypothesis, because it depends on whether the observer happens to be aware of the event.

                                However, this makes a difference of only one W/L result. Now that I'm aware of the fluke, I can apply it as a filter going forward, as long as I define that filter precisely and apply it to all results from here on out. While this may ever so slightly affect the consistency of the system already in place, that adjustment will, in this case, be so small (because it's a true fluke) that it won't have a tangible effect.

                                In this case, the filter would be something like: "if game is tied in first 5 innings, and either starting pitcher has to leave game because he was hit by a ball, toss out the result".

                                This would be a narrow filter, which I find preferable because it doesn't mess with the system I already have in place. Broader filters can work just as well, or much better, but have to be specified upfront, and would have a far greater impact on the system already in place. So one would have to start from scratch.
                                Last edited by Dark Horse; 04-10-10, 05:11 AM.
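Dark Horse's final position, a precisely worded filter that applies only from the day it is defined, can be sketched in Python. The field names, the `FILTER_EFFECTIVE` date and the sample games are hypothetical, and "tied in first 5 innings" is read as the score being tied when a starter leaves, within the first five innings, after being hit by a ball. An earlier game like the one that prompted the thread stays in the record because it predates the filter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical date the filter was defined; earlier results are never re-filtered.
FILTER_EFFECTIVE = date(2010, 4, 10)


@dataclass
class Game:
    played: date
    won: bool
    hit_exit_inning: Optional[int] = None  # inning in which either starter left after being hit by a ball
    tied_at_exit: bool = False             # score tied when that starter left


def tossed_by_filter(g: Game) -> bool:
    """Narrow filter: a starter on either side left within the first five
    innings after being hit by a ball, with the game tied at that moment.
    Applied only to games played on or after FILTER_EFFECTIVE."""
    return (
        g.played >= FILTER_EFFECTIVE
        and g.hit_exit_inning is not None
        and g.hit_exit_inning <= 5
        and g.tied_at_exit
    )


def counted(record):
    """Games that remain in the record after the prospective filter."""
    return [g for g in record if not tossed_by_filter(g)]


# Illustrative games only.
record = [
    Game(date(2010, 4, 8), won=False, hit_exit_inning=2, tied_at_exit=True),    # predates filter: stays
    Game(date(2010, 4, 15), won=True),
    Game(date(2010, 4, 20), won=False, hit_exit_inning=3, tied_at_exit=True),   # tossed
    Game(date(2010, 4, 22), won=False, hit_exit_inning=4, tied_at_exit=False),  # not tied: stays
]
kept = counted(record)
print(len(kept), sum(g.won for g in kept))  # 3 1
```

Keeping the effective date in the rule itself is what separates this from the retroactive adjustment rejected above: the filter's outcome does not depend on when the observer noticed a particular game.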