Flukes/records?

**gryfyn1** · 04-09-10, 07:11 AM

Originally posted by Dark Horse

. I'm counting it because the book counted it. End of story.

Sounds like the way to go. That said, if you are testing a theory/system, I might hesitate to include obvious flukes like this in determining the validity of the system.

**Wrecktangle** · 04-09-10, 07:21 AM

DH, I record everything, but I note it in comments. What you are getting at is the problem with "outliers." I think too many statisticians are too outlier happy and toss them to fit their favorite theories. There is a lot of work on this and I'm sure you've read up on it, but if you pull too many of these events out you can bias your prediction.

One of the areas I have a fair amount of problems with is overtime games which I'd love to toss as I know they bend my methods, so I normalize those to fit better with the regular games. I also bin them to see how they work as a category. I'm still not happy with the normalization, and there are too many of these outliers to toss, so it gives me something (else) to think about when I'm swilling a beer by the pool, I guess.
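For anyone wondering what "normalize" and "bin" could look like in practice, here is a minimal sketch. It is not Wrecktangle's method, which the post doesn't describe. It assumes NBA totals (48-minute regulation, 5-minute overtime periods) and an even scoring pace across the whole game, and the function names `normalize_total` and `bin_by_overtime` are made up for illustration.

```python
REGULATION_MINUTES = 48  # NBA regulation length (assumed sport)
OT_PERIOD_MINUTES = 5    # length of one NBA overtime period


def normalize_total(total_points: float, ot_periods: int) -> float:
    """Scale a game total back to regulation length, assuming even scoring pace."""
    minutes_played = REGULATION_MINUTES + OT_PERIOD_MINUTES * ot_periods
    return total_points * REGULATION_MINUTES / minutes_played


def bin_by_overtime(games):
    """Group (total_points, ot_periods) pairs into regulation and overtime bins.

    Totals are normalized before binning so the two categories can be
    compared on the same scale.
    """
    bins = {"regulation": [], "overtime": []}
    for total_points, ot_periods in games:
        key = "overtime" if ot_periods > 0 else "regulation"
        bins[key].append(normalize_total(total_points, ot_periods))
    return bins
```

A per-minute scaling like this is the crudest option. Overtime pace tends to differ from regulation pace, which may be part of why the normalization still feels unsatisfying.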

**dodger33** · 04-09-10, 07:55 AM

I bet the Lackey game last year where he got tossed on the second pitch of the game... I think over time flukes like these will even out.

**dwaechte** · 04-09-10, 09:56 AM

You need to identify why you were playing it, and what went right/wrong based on your predictions. If the play was mostly based on starting pitching as you said, then I wouldn't count it... nothing happened that would be predictable in future games unless your predictions were somehow correlated with the odds of a pitcher getting hurt early in a game, which is almost certainly not the case. I would hope there's more to the system than just starting pitching though.

**u21c3f6** · 04-09-10, 10:43 AM

IMO you must include it because as rare as these events are, they do happen. If you don't include this event, then you must also exclude any "fluke" that was to your advantage.

Joe.

**MadTiger** · 04-09-10, 12:47 PM

Great comments here.

For the OP, I would say that anytime a starting pitcher leaves before a reasonable pitch count (5+ innings), and is not losing, then it would be a "fluke." Injury, illness, family situation, etc.

**Dark Horse** · 04-09-10, 01:33 PM

Thanks for the input guys. Very helpful.

I liked MadTiger's idea that the definition of a fluke can be quite broad and still be objective, as long as it is precise and not forced. Wrecktangle brought up OT, and reminded me that I already use a similarly broad filter for NBA totals, where OT has no place. Why not? Because of the key element that dwaechte brought up: it has no predictive value.

Based on this feedback, I'm tossing the game out. It's the right thing.

**Dirty Sanchez** · 04-09-10, 01:51 PM

My personal opinion is I place everything in my overall record once I call it a play...period. I've taken some goofy losses over the years, but normally they even out.

**Dave Head** · 04-09-10, 01:56 PM

Originally posted by Dark Horse

...I'm counting it because the book counted it. End of story. ...

No. Ask yourself these questions:

1) "Does this game accurately represent both teams?" and
2) "Would I be willing to bet money on either team based on this game?"

Those questions are the litmus test. If a starting pitcher is forced out of a game by an injury in the 2nd inning, then this game fails the test.

Your database is not supposed to be an accurate record of history. It's a tool for predicting future games as accurately as possible.

Originally posted by dodger33

... I think over time flukes like these will even out.

You wouldn't look farther back than 10 to 30 games when predicting a game. So, even if it were true that "over time flukes ... even out", this statement is irrelevant. Flukes do not even out over the short term that you will be using.

Saying something like "It all evens out" is lazy. When I die, my ghost will kick people in the ass who say "It all evens out".

**BigdaddyQH** · 04-09-10, 03:35 PM

You must include it. A win is a win and a loss is a loss.

**Dark Horse** · 04-09-10, 03:42 PM

Originally posted by dwaechte

I would hope there's more to the system than just starting pitching though.

Starting pitchers against the lineup. Plus a little extra.

**skrtelfan** · 04-09-10, 06:42 PM

Originally posted by Dave Head

Saying something like "It all evens out" is lazy. When I die, my ghost will kick people in the ass who say "It all evens out".

While I agree that "it all evens out" is often used as an excuse not to do additional work, in this case, I think you have it backwards. If there's any reason it doesn't all "even out" it's because certain pitchers are more injury prone than others. I'd guess that ground ball pitchers are a bit less susceptible to getting hit by a sharp comebacker than pitchers who allow more line drives. As minuscule as the difference might be between one pitcher getting hit by a line drive and another, that difference still exists.

That's sort of tangential, though. The biggest reason you need to include games like this in the analysis is that removing them could hide the fact that you're underrating the bullpen. There will be times the starting pitcher leaves with an injury, and those times are included in the distribution of "expected number of innings pitched" by the starter.

**Dark Horse** · 04-10-10, 03:35 AM

I've heard of injury prone pitchers, but never of pitchers prone to getting hit by a ball. The first, as you point out, could have some predictive value.

I suppose there are two questions. The first is if a filter should ever be applied retroactively to improve the predictive value of a system. If the answer to that question is yes, then the second question is about the purity of that filter. Obviously, if a filter has any bias it is counterproductive.

**Dark Horse** · 04-10-10, 04:43 AM

I adjusted my view. The problem is retroactive adjustment. Even though throwing out this one result would indeed improve the predictive value of the system, I can't make this adjustment after I became aware of the fluke, because I don't know how many times a similar instance may have occurred in the past and is already part of the record.

Retroactive science ('wait a minute, you didn't tell me that, that doesn't count...') is never acceptable if it serves to maintain a hypothesis, because it is a function of the awareness or lack of awareness of the observer.

However, this makes a difference of only one W/L result. Now that I'm aware of the fluke, I can, going forward, apply it as a filter, as long as I define that filter precisely, and as long as I apply it to all results from here on out. While this may, ever so slightly, affect the consistency of the system already in place, that adjustment will, in this case, be so extremely small (because it's a true fluke) that it won't have a tangible effect.

In this case, the filter would be something like: "if game is tied in first 5 innings, and either starting pitcher has to leave game because he was hit by a ball, toss out the result".
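Written out as code, a rule like that might look roughly like the sketch below. The `Game` fields, `is_fluke`, and `record` are all hypothetical names, and the sketch reads "tied in first 5 innings" as "the score was tied when the starter left, and he left in inning 5 or earlier", which is one possible interpretation. The `filter_start` date is what keeps the filter from being applied retroactively.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Game:
    # All field names are hypothetical, chosen for illustration only.
    played_on: date
    # Inning in which a starting pitcher left after being hit by a ball,
    # or None if that never happened.
    starter_hit_exit_inning: Optional[int]
    # True if the score was tied at the moment that starter left.
    tied_at_exit: bool
    won: bool


def is_fluke(game: Game) -> bool:
    """Narrow filter: tied game, starter knocked out by a ball in innings 1-5."""
    inning = game.starter_hit_exit_inning
    return inning is not None and inning <= 5 and game.tied_at_exit


def record(games: list[Game], filter_start: date) -> tuple[int, int]:
    """Return (wins, losses), tossing flukes only on or after filter_start.

    Games played before filter_start are always counted, so results
    already in the record are never adjusted after the fact.
    """
    wins = losses = 0
    for g in games:
        if g.played_on >= filter_start and is_fluke(g):
            continue  # tossed: matches the filter and falls in the forward window
        if g.won:
            wins += 1
        else:
            losses += 1
    return wins, losses
```

Because the rule is a pure function of recorded game facts, it applies the same way to wins and losses and leaves no room for deciding case by case.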

This would be a narrow filter, which I find preferable because it doesn't mess with the system I already have in place. Broader filters can work just as well, or much better, but have to be specified upfront, and would have a far greater impact on the system already in place. So one would have to start from scratch.