Justin7 mentioned in another thread that ML calculations should consider totals. I'm also seeing ML differences that depend on location (home, visitor, neutral). Whether these differences are significant or not... well, that's why I'm posting this.
Using a window of closing spreads from -5.5 to -4.5 and a window of total scores from 110 to 130 (which might be used to evaluate a -5 spread and a 120 o/u), here's what I'm seeing using data scraped since 1997 from Covers.com:
VISITOR wins: 257 Losses: 118 Prob: 0.685 Odds: -218
HOME wins: 600 Losses: 296 Prob: 0.670 Odds: -203
NEUTRAL wins: 155 Losses: 72 Prob: 0.683 Odds: -215
At first glance, this says that home teams in this particular subset are slightly overrated: they win less frequently (0.670) for the given spreads and totals than visitors (0.685) or teams playing at neutral sites (0.683). I know the relatively small sample means this could be meaningless, but I'm seeing the same effect on other subsets of the data. For example, using spreads from -3.5 to -2.5 on the same totals:
VISITOR wins: 383 Losses: 204 Prob: 0.652 Odds: -188
HOME wins: 553 Losses: 382 Prob: 0.591 Odds: -145
NEUTRAL wins: 177 Losses: 117 Prob: 0.602 Odds: -151
and for spreads from -2.5 to -1.5 over the same (110-130) range of totals:
VISITOR wins: 348 Losses: 223 Prob: 0.609 Odds: -156
HOME wins: 435 Losses: 393 Prob: 0.525 Odds: -111
NEUTRAL wins: 151 Losses: 140 Prob: 0.519 Odds: -108
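For anyone who wants to reproduce the Prob and Odds columns: they are consistent with a plain no-vig conversion from the win/loss counts. A minimal sketch, assuming that conversion is what's intended (the function name `fair_moneyline` is just made up for illustration):

```python
def fair_moneyline(wins, losses):
    """Return (win probability, no-vig American moneyline) from a W/L record.

    Assumes both wins and losses are greater than zero.
    """
    p = wins / (wins + losses)
    if p >= 0.5:
        # Favorite: negative line, amount risked to win 100
        odds = -100 * p / (1 - p)
    else:
        # Underdog: positive line, amount won on a 100 stake
        odds = 100 * (1 - p) / p
    return p, round(odds)


# Records from the -5.5 to -4.5 spread window above
for label, (w, l) in {"VISITOR": (257, 118),
                      "HOME": (600, 296),
                      "NEUTRAL": (155, 72)}.items():
    p, odds = fair_moneyline(w, l)
    print(f"{label} Prob: {p:.3f} Odds: {odds:+d}")
```

These are break-even lines only; no juice is built in.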
My first test in evaluating anything gleaned from data mining is to ask, "Is there any kind of reasonable theory that would explain this?" If, for example, I discovered that teams with more than four vowels in their names did better than teams with fewer than four vowels, the answer to this would be a quick "no" (or more likely, a "NFW!"). In this case, I can come up with some theories (spreads are slightly biased to over-favor home teams) but nothing that's absolutely convincing. And, there's always the possibility that it's all within the range of error. I confess: I haven't done the math on this (yet).
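On the "range of error" question, one rough way to do the math would be a two-proportion z-test comparing the visitor and home records in each window. This is only a sketch under assumptions: it treats every game as an independent trial, uses the normal approximation, and does nothing to correct for the fact that I'm looking at several windows at once. The helper name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(w1, l1, w2, l2):
    """Pooled two-proportion z-test on two W/L records.

    Returns (z, two-sided p-value). Assumes independent games and
    samples large enough for the normal approximation.
    """
    n1, n2 = w1 + l1, w2 + l2
    p1, p2 = w1 / n1, w2 / n2
    pooled = (w1 + w2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# (visitor W, L), (home W, L) for each spread window, totals 110-130
windows = {
    "-5.5 to -4.5": ((257, 118), (600, 296)),
    "-3.5 to -2.5": ((383, 204), (553, 382)),
    "-2.5 to -1.5": ((348, 223), (435, 393)),
}

for label, (visitor, home) in windows.items():
    z, p = two_proportion_z(*visitor, *home)
    print(f"spread {label}: z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value in one window wouldn't settle it on its own, since slicing the data enough ways will eventually turn up something that looks significant.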
Thoughts? I'm sure this has been investigated before.