1. #1
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Flukes/records?

    I'm keeping a record for a rather big MLB project, going forward. I have no interest in making the record look better than it is. I want it unbiased.

    Yesterday a fluke happened. I had identified a strong play, largely based on starting pitching. The pitcher got hit by a ball in the second inning and had to leave the game. The team went on to lose, and I recorded it as a loss. Am I correct to include this? Flukes happen, and I couldn't specify up front what a fluke is anyway (and determining it afterwards would be more dangerous to an objective record than a true fluke). Or does including this result throw the record off, if only ever so slightly?

    The game was scoreless when the pitcher left, and he had recorded just 4 outs. I'm counting it because the book counted it. End of story. But is that the correct approach, statistically speaking? Technically, the pitcher played the game, but realistically he had no impact on it. If the pitcher got hit by lightning after recording 1 out, would I count that too? Where does act-of-God territory start?

    I'm including the result. Wrong or right? The alternative would be to record the game as if the replacement pitcher had started (which, for this method, would have greater objective value going forward than including this fluke). A third possibility would be to not record it at all. Which is most objective?
    Last edited by Dark Horse; 04-09-10 at 07:15 AM.

  2. #2
    gryfyn1
    Join Date: 03-30-10
    Posts: 3,285
    Betpoints: 48

    Quote (originally posted by Dark Horse):
    I'm counting it because the book counted it. End of story.
    Sounds like the way to go. If you are testing a theory/system, though, I might hesitate to include obvious flukes like this when determining the validity of the system.

  3. #3
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    DH, I record everything, but I note it in comments. What you are getting at is the problem with "outliers." I think too many statisticians are too outlier happy and toss them to fit their favorite theories. There is a lot of work on this and I'm sure you've read up on it, but if you pull too many of these events out you can bias your prediction.

    One of the areas I have a fair amount of problems with is overtime games which I'd love to toss as I know they bend my methods, so I normalize those to fit better with the regular games. I also bin them to see how they work as a category. I'm still not happy with the normalization, and there are too many of these outliers to toss, so it gives me something (else) to think about when I'm swilling a beer by the pool, I guess.
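    A minimal sketch of the "record everything, but note it in comments" approach, with a category tag so flagged games can be binned and reviewed separately. The `Play` fields, the tag names, and the first two sample rows are made up for illustration; only the third row mirrors the game discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Play:
    date: str          # ISO date of the game
    won: bool          # result exactly as the book graded it
    comment: str = ""  # free-text note about anything unusual
    tag: str = ""      # hypothetical category label, e.g. "early_exit"


def record(plays):
    """Return (wins, losses) over every recorded play."""
    wins = sum(p.won for p in plays)
    return wins, len(plays) - wins


def record_by_tag(plays):
    """Bin the record by tag so each category can be inspected on its own."""
    bins = {}
    for p in plays:
        wins, losses = bins.get(p.tag, (0, 0))
        bins[p.tag] = (wins + 1, losses) if p.won else (wins, losses + 1)
    return bins


# Illustrative rows only: the first two are invented.
plays = [
    Play("2010-04-06", True),
    Play("2010-04-07", False),
    Play("2010-04-08", False, "starter hit by ball, left after 4 outs", "early_exit"),
]

print(record(plays))         # full record, nothing tossed
print(record_by_tag(plays))  # same games, split by category
```

    Nothing is deleted here: the full record stays intact, and the tag only decides which bin a game lands in when you look at categories.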

  4. #4
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    Double post. Some sort of software glitch, I guess.
    Last edited by Wrecktangle; 04-09-10 at 07:23 AM. Reason: deleted a double post

  5. #5
    dodger33
    Kershnasty
    Join Date: 08-14-09
    Posts: 3,962
    Betpoints: 244

    I bet the Lackey game last year where he got tossed on the second pitch of the game... I think over time flukes like these will even out.

  6. #6
    dwaechte
    Join Date: 08-27-07
    Posts: 5,481
    Betpoints: 235

    You need to identify why you were playing it, and what went right/wrong based on your predictions. If the play was mostly based on starting pitching as you said, then I wouldn't count it... nothing happened that would be predictable in future games unless your predictions were somehow correlated with the odds of a pitcher getting hurt early in a game, which is almost certainly not the case. I would hope there's more to the system than just starting pitching though.

  7. #7
    u21c3f6
    Join Date: 01-17-09
    Posts: 790
    Betpoints: 5198

    IMO you must include it because, as rare as these events are, they do happen. If you don't include this event, then you must also exclude any "fluke" that was to your advantage.

    Joe.

  8. #8
    MadTiger
    Wait 'til next year!
    Join Date: 04-19-09
    Posts: 2,724
    Betpoints: 47

    Great comments here.

    For the OP, I would say that any time a starting pitcher leaves before a reasonable workload (5+ innings) and is not losing, it would be a "fluke": injury, illness, family situation, etc.
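    That definition is precise enough to write down as a rule. A rough sketch, assuming the workload is measured in outs recorded (15 outs = 5 innings); the function and parameter names are hypothetical:

```python
MIN_OUTS = 15  # 5 full innings, the threshold suggested in the post above


def is_fluke_exit(outs_recorded: int, team_trailing: bool,
                  min_outs: int = MIN_OUTS) -> bool:
    """True if the starter left before the threshold while his team was not losing."""
    return outs_recorded < min_outs and not team_trailing


# The game from the opening post: 4 outs recorded, scoreless when he left.
print(is_fluke_exit(outs_recorded=4, team_trailing=False))
```

    A starter pulled early while trailing does not qualify under this rule, since getting knocked out of a game is exactly what the handicap is supposed to account for.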

  9. #9
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Thanks for the input guys. Very helpful.

    I liked MadTiger's idea that the definition of a fluke can be quite broad and still be objective, as long as it is precise and not forced. Wrecktangle brought up OT, and reminded me that I already use a similarly broad filter for NBA totals, where OT has no place. Why not? Because of the key element that dwaechte brought up: it has no predictive value.

    Based on this feedback, I'm tossing the game out. It's the right thing.

  10. #10
    Dirty Sanchez
    Two time SBR Academy Award winner
    Join Date: 03-01-10
    Posts: 16,031
    Betpoints: 26

    My personal opinion is I place everything in my overall record once I call it a play...period. I've taken some goofy losses over the years, but normally they even out.

  11. #11
    Dave Head
    Join Date: 07-22-09
    Posts: 73

    Quote (originally posted by Dark Horse):
    ...I'm counting it because the book counted it. End of story. ...
    No. Ask yourself these questions:

    1) "Does this game accurately represent both teams?" and
    2) "Would I be willing to bet money on either team based on this game?"

    Those questions are the litmus test. If a starting pitcher is forced out of a game by an injury in the 2nd inning, then this game fails the test.

    Your database is not supposed to be an accurate record of history. It's a tool for predicting future games as accurately as possible.

    Quote (originally posted by dodger33):
    ... I think over time flukes like these will even out.
    You wouldn't look farther back than 10 to 30 games when predicting a game. So even if it were true that "over time flukes ... even out", the statement is irrelevant: flukes do not even out over the short window that you will be using.

    Saying something like "It all evens out" is lazy. When I die, my ghost will kick people in the ass who say "It all evens out".
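    The short-window point is simple arithmetic: one flipped result moves a win rate by 1/n, so the smaller the sample, the louder the fluke. A quick sketch (the 500-game row is an added assumption for contrast, not a figure from this thread):

```python
def win_rate_shift(n_games: int, flipped: int = 1) -> float:
    """Change in win rate (as a fraction) when `flipped` results change in n_games."""
    return flipped / n_games


for n in (10, 30, 500):
    # Shift expressed in percentage points, rounded for display.
    print(n, round(100 * win_rate_shift(n), 1))
```

    Over a 10-game lookback a single fluke is worth 10 percentage points; over 30 games it is still more than 3.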

  12. #12
    BigdaddyQH
    BigdaddyQH
    Join Date: 07-13-09
    Posts: 19,530
    Betpoints: 8638

    You must include it. A win is a win and a loss is a loss.

  13. #13
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Quote (originally posted by dwaechte):
    I would hope there's more to the system than just starting pitching though.
    Starting pitchers against the lineup. Plus a little extra.

  14. #14
    skrtelfan
    Join Date: 10-09-08
    Posts: 1,913
    Betpoints: 3337

    Quote (originally posted by Dave Head):
    Saying something like "It all evens out" is lazy. When I die, my ghost will kick people in the ass who say "It all evens out".
    While I agree that "it all evens out" is often used as an excuse not to do additional work, in this case I think you have it backwards. If there's any reason it doesn't all "even out", it's because certain pitchers are more injury prone than others. I'd guess that ground ball pitchers are a bit less susceptible to getting hit by a sharp comebacker than pitchers who allow more line drives. As minuscule as the difference might be between one pitcher getting hit by a line drive and another, that difference still exists.

    That's sort of tangential, though. The biggest reason you need to include games like this in the analysis is that removing them could hide the fact that you're underrating the bullpen. There will be times the starting pitcher leaves with an injury, and those times are included in the distribution of "expected number of innings pitched" by the starter.
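    To illustrate the bullpen point with made-up numbers (only the 4-out start mirrors the game in this thread): dropping early exits from the sample shrinks the share of the game you expect the bullpen to cover. The 27-out game length is a simplification that assumes a full nine innings pitched.

```python
from statistics import mean

GAME_OUTS = 27  # simplification: assumes the staff pitches a full nine innings


def expected_bullpen_outs(starter_outs):
    """Average number of outs left for the bullpen, given outs recorded per start."""
    return GAME_OUTS - mean(starter_outs)


# Hypothetical starts for one pitcher, in outs recorded.
starter_outs = [21, 18, 4, 20, 19]
typical_only = [o for o in starter_outs if o >= 15]  # early exit filtered out

print(round(expected_bullpen_outs(starter_outs), 1))  # all starts included
print(round(expected_bullpen_outs(typical_only), 1))  # early exit removed
```

    With the early exit removed, the estimate of bullpen workload drops, which is the underrating effect described above.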

  15. #15
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    I've heard of injury-prone pitchers, but never of pitchers prone to getting hit by a ball. The first, as you point out, could have some predictive value.

    I suppose there are two questions. The first is whether a filter should ever be applied retroactively to improve the predictive value of a system. If the answer to that question is yes, then the second question is about the purity of that filter. Obviously, if a filter has any bias, it is counterproductive.

  16. #16
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    I adjusted my view. The problem is retroactive adjustment. Even though throwing out this one result would indeed improve the predictive value of the system, I can't make this adjustment after becoming aware of the fluke, because I don't know how many times a similar instance may have occurred in the past and is already part of the record.

    Retroactive science ('wait a minute, you didn't tell me that, that doesn't count...') is never acceptable if it serves to maintain a hypothesis, because it is a function of the awareness or lack of awareness of the observer.

    However, this makes a difference of only one W/L result. Now that I'm aware of the fluke, I can, going forward, apply it as a filter, as long as I define that filter precisely and apply it to all results from here on out. While this may, ever so slightly, affect the consistency of the system already in place, that adjustment will, in this case, be so extremely small (because it's a true fluke) that it won't have a tangible effect.

    In this case, the filter would be something like: "if game is tied in first 5 innings, and either starting pitcher has to leave game because he was hit by a ball, toss out the result".

    This would be a narrow filter, which I find preferable because it doesn't mess with the system I already have in place. Broader filters can work just as well, or much better, but have to be specified up front, and would have a far greater impact on the system already in place. So one would have to start from scratch.
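    The narrow filter described above, applied only from a fixed date forward, could be sketched roughly as follows. The field names and the start date are assumptions for illustration, and "tied in first 5 innings" is read here as "tied at the moment the starter leaves, no later than the 5th".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

FILTER_START = date(2010, 4, 10)  # assumed: the day the rule was written down


@dataclass
class Game:
    played: date
    won: bool
    hit_exit_inning: Optional[int]  # inning a starter left after being hit by a ball, else None
    tied_at_exit: bool              # score tied when that starter left


def tossed(game: Game) -> bool:
    """True if the forward-only fluke filter removes this game from the record."""
    if game.played < FILTER_START:
        return False  # never applied retroactively
    if game.hit_exit_inning is None:
        return False  # no starter was knocked out by a ball
    return game.hit_exit_inning <= 5 and game.tied_at_exit


def filtered_record(games):
    """Return (wins, losses) over the games the filter keeps."""
    kept = [g for g in games if not tossed(g)]
    wins = sum(g.won for g in kept)
    return wins, len(kept) - wins
```

    Under this reading, the game that started the thread stays in the record as a loss because it predates the rule; only identical cases from the start date onward are tossed.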
    Last edited by Dark Horse; 04-10-10 at 05:11 AM.
