If that 3-effect is real, it's something really, really strange. I created 1 million synthetic 13- to 15-game salamis by picking that many random games from the last 8 years and adding up their MOVs. My push percentages for each number in range were 2.3x%, and there were no abnormally weak or strong numbers for 13, 14, or 15 games.
CRIS Salami Line
tomcowley (SBR MVP)
- 10-01-07
- 1129
#36
Ganchrow (SBR Hall of Famer)
- 08-28-05
- 5011
#37 And with all that, I'm seeing essentially the same results you've described. I still have no explanation for the observed 3-effect (and 3-multiple effect). But based on these results, I have no choice but to concede that my earlier dismissal of the possibility of a statistical aberration was likely made considerably too hastily (although I'm not ruling out programmer error on my part earlier, either).
In case anyone's interested, here's the hastily hacked-together Perl code I used to simulate the 15-game Salamis:
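```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple ();    # used only to fetch the seed
use Math::Random::MT;  # Mersenne Twister module available from CPAN
                       # http://search.cpan.org/~ams/Math-Random-MT-1.11/MT.pm
                       # requires Perl 5.10.0 or higher

use constant SIZE   => 15;          # games per simulated salami
use constant TRIALS => 10_000_000;  # number of simulated salamis

my (@movs, %sum_count);

# Seed the Mersenne Twister with 624 32-bit words (its full state), each
# built from a pair of 16-bit integers fetched from random.org.
warn "Seeding random number generator.\n";
my $RAND_URL = "http://random.org/integers/?num=1248&min=0&max=65535&col=2&base=10&format=plain&rnd=new";
my $page = LWP::Simple::get($RAND_URL);
die "Could not fetch seed from random.org\n" unless defined $page;
my @seed;
foreach (split /\n/, $page) {
    next unless m/^([0-9]+)\s+([0-9]+)$/;   # skip malformed lines
    push @seed, $1 + $2 * 2**16;
}
my $rand_gen = Math::Random::MT->new(@seed);
warn "Random number generator seeded.\n";

# Read per-game lines ("date away home mov", date as YYYYMMDD) from the
# files named on the command line or STDIN, keeping only the MOVs.
while (<>) {
    next unless m/^[12][0-9]{7}/;
    chomp;
    my ($date, $away, $home, $mov) = split;
    push @movs, $mov;
}
die "Need at least ", SIZE, " games\n" if @movs < SIZE;

# Each trial sums the MOVs of SIZE distinct games drawn at random.
for my $i (1 .. TRIALS) {
    my %selected;
    my $sum = 0;
    warn "TRIAL# $i\n" unless $i % 10_000;
    for (my $j = 0; $j < SIZE; $j++) {
        my $r = int($rand_gen->rand(scalar @movs));
        redo if $selected{$r};   # game already used: draw again
        $selected{$r} = 1;
        $sum += $movs[$r];
    }
    $sum_count{$sum}++;
}

# Print each simulated salami MOV with its frequency.
foreach my $sum (sort { $a <=> $b } keys %sum_count) {
    print "$sum\t$sum_count{$sum}\n";
}
```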
And the results from the 10,000,000 sim:
Hard to argue with that.
Data (SBR MVP)
- 11-27-07
- 2236
#38 So for a 15-game Salami the CLT would predict a mean and standard deviation of 1.94139 and 17.00553 runs, respectively. The following table compares the predicted frequencies (using a continuity correction) with actual 15-game Salami results over the in-sample time period:
I think the most interesting takeaway from this would be the low observed frequency of a 3-run home Salami MOV (and, it turns out, of subsequent multiples of 3).
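For anyone wanting to reproduce the CLT column, here is a minimal Perl sketch of the continuity-corrected estimate P(S = k) ≈ Φ((k + 0.5 - μ)/σ) - Φ((k - 0.5 - μ)/σ), plugging in the 15-game mean and standard deviation quoted above. It assumes the salami MOV is integer-valued, and it swaps in the Abramowitz & Stegun 7.1.26 approximation to erf so that only core Perl is needed; `norm_cdf` and `clt_point_prob` are made-up names, not anything from the thread.

```perl
use strict;
use warnings;

# Standard normal CDF via the Abramowitz & Stegun 7.1.26 approximation to
# erf (absolute error below about 1.5e-7).
sub norm_cdf {
    my ($z) = @_;
    my $x = abs($z) / sqrt(2);
    my $t = 1 / (1 + 0.3275911 * $x);
    my $poly = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
             + $t * (-1.453152027 + $t * 1.061405429))));
    my $erf = 1 - $poly * exp(-$x * $x);
    return $z >= 0 ? 0.5 * (1 + $erf) : 0.5 * (1 - $erf);
}

# CLT estimate of P(S = k) for an integer-valued salami MOV S with mean $mu
# and standard deviation $sigma, using a +/-0.5 continuity correction.
sub clt_point_prob {
    my ($k, $mu, $sigma) = @_;
    return norm_cdf(($k + 0.5 - $mu) / $sigma)
         - norm_cdf(($k - 0.5 - $mu) / $sigma);
}

# Mean and SD quoted above for a 15-game salami.
my ($mu, $sigma) = (1.94139, 17.00553);
for my $k (-6 .. 9) {
    printf "%4d  %.4f\n", $k, clt_point_prob($k, $mu, $sigma);
}
```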
flyingillini (SBR Aristocracy)
- 12-06-06
- 41217
#39 It's nice to see Ganch posting this information!
The Mossad
The Institute for Intelligence and Special Operations
Ganchrow (SBR Hall of Famer)
- 08-28-05
- 5011
#40 I just cannot see any significance in the number 3. However, I suspect that 7 should produce noticeable anomalies. As you can see, MOV -4 is less frequent than expected, as is MOV 3. So I am wondering what the numbers are for MOV -11 and MOV 10. The effect should exist there too, albeit to a lesser extent.
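One way to check both hypotheses in a single pass is to pool counts by residue class: -11, -4, 3 and 10 are all congruent to 3 (mod 7), while the multiples of 3 are congruent to 0 (mod 3). The sketch below assumes you already have observed and expected counts per MOV (expected from the CLT or a simulation); `residue_ratios` is a hypothetical helper, not something posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: pools observed and expected counts (hashrefs of
# MOV => count) by residue class MOV mod $m and returns residue =>
# observed/expected. With a positive modulus Perl's % never returns a
# negative number, so -11, -4, 3 and 10 all fall in class 3 for $m = 7.
sub residue_ratios {
    my ($obs, $expected, $m) = @_;
    my (%o, %e);
    for my $mov (keys %$expected) {
        my $r = $mov % $m;
        $o{$r} += $obs->{$mov} // 0;   # MOVs never observed count as 0
        $e{$r} += $expected->{$mov};
    }
    my %ratio;
    $ratio{$_} = $e{$_} ? $o{$_} / $e{$_} : undef for keys %e;
    return \%ratio;
}
```

Calling it with `$m = 7` tests the 7-run idea and with `$m = 3` the 3-multiple effect, against the same expected counts.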
Unless we're both missing something, however, tomcowley's experiment (followed by my reproduction of his results; now that's real science: reproducible experimentation) does rather strongly suggest that it is an aberration.
Also, don't discount the possibility that I simply made a mistake in my earlier data culling. While I have rechecked it, someone else might want to verify my initial findings. The fact is that it does represent a fairly large outlier (although, once again, because we're dealing with several strata of categorical data, the results are not actually as extreme as they might appear at first glance).
Anyway, if you're interested in CLT-predicted vs. actual results over a larger support:
Just remember, of course, that CLT convergence drops off as we move further out into the distribution tails.
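To see how far the CLT drifts in the tails without running a simulation, one option is to compute the salami distribution exactly by repeated convolution of the per-game MOV distribution. This sketch assumes games are independent draws with replacement from the empirical MOVs (the simulations above draw without replacement, which should make little difference with thousands of games), and compares against the normal density at each integer, a local-CLT approximation that for σ near 17 is practically the same as the continuity-corrected one. All sub names are hypothetical.

```perl
use strict;
use warnings;
use constant PI => 4 * atan2(1, 1);

# Exact distribution (MOV => probability) of the sum of $n independent
# draws from the empirical per-game MOVs in @$movs, via repeated
# discrete convolution.
sub salami_distribution {
    my ($movs, $n) = @_;
    my %single;
    $single{$_} += 1 / @$movs for @$movs;
    my %dist = (0 => 1);
    for (1 .. $n) {
        my %next;
        for my $s (keys %dist) {
            for my $m (keys %single) {
                $next{$s + $m} += $dist{$s} * $single{$m};
            }
        }
        %dist = %next;
    }
    return \%dist;
}

# CLT parameters for n games: n times the per-game mean, sqrt(n) times
# the per-game (population) standard deviation.
sub clt_params {
    my ($movs, $n) = @_;
    my ($mean, $var) = (0, 0);
    $mean += $_ / @$movs for @$movs;
    $var  += ($_ - $mean)**2 / @$movs for @$movs;
    return ($n * $mean, sqrt($n * $var));
}

# Local-CLT approximation to P(S = k): the normal density at k.
sub clt_density {
    my ($k, $mu, $sigma) = @_;
    my $z = ($k - $mu) / $sigma;
    return exp(-$z * $z / 2) / ($sigma * sqrt(2 * PI));
}
```

With the `@movs` array from the simulation script, comparing `$dist->{$k}` with `clt_density($k, clt_params(\@movs, 15))` for large |k| shows where the normal approximation starts to slip.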
Oh, and btw, Data, I just checked my ledger and it appears that you still owe me a drink. Please don't make me call you a stiff on the open forum.
Data (SBR MVP)
- 11-27-07
- 2236
#42
Here is my "back of the envelope" take on this. A 15-game salami will fall into one of 16 subsets, according to how many of the games (0 to 15) the home teams win. Let's assume that each "step" from one subset to the next changes the Home team MOV by 2.6 and the Away team MOV by 3. (Note that these numbers seem reasonably close to the medians and should not be raising eyebrows.) With that assumption, here are the expected maximums and minimums in each subset's distribution, with the minimums positioned right in the middle.
As far as I can tell, this table matches your real observed results pretty well. I am not saying this is nearly as accurate as the 1,000,000-simulation results, but I tend to think it is a better approach, both in the logic behind it and in the results.
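A quick way to reproduce the peak and trough positions from this model is sketched below. It reads the model (an assumption) as each home win contributing +2.6 runs and each away win -3 runs to the home salami MOV, so neighboring subsets sit 5.6 runs apart and each trough lies halfway between two peaks. With these numbers the two troughs nearest zero come out at -3.0 and +2.6. The constant and sub names are illustrative only.

```perl
use strict;
use warnings;
use constant { HOME_STEP => 2.6, AWAY_STEP => 3, GAMES => 15 };

# Centre (peak) of the subset in which the home sides win $w of GAMES games.
sub subset_center {
    my ($w) = @_;
    return HOME_STEP * $w - AWAY_STEP * (GAMES - $w);
}

for my $w (0 .. GAMES) {
    my $line = sprintf "%2d home wins: peak %6.1f", $w, subset_center($w);
    if ($w < GAMES) {
        # Expected trough halfway between this peak and the next one.
        my $trough = (subset_center($w) + subset_center($w + 1)) / 2;
        $line .= sprintf ", next trough %6.1f", $trough;
    }
    print "$line\n";
}
```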
Last edited by Data; 07-04-10, 01:11 PM.
Ganchrow (SBR Hall of Famer)
- 08-28-05
- 5011
#43 First off, Data, all I can say is that finally someone other than myself has begun using the [table][/table] BB tags. Good on you, ya Cossack. A trend brewing, perhaps? It warms my otherwise frigid heart.
I am sorry, but I do not put much weight on these results, because I completely disagree with the blind assumption that the n-game salami distribution is sufficiently close to the distribution of n randomly selected games. There are parameters that make any given salami a non-random set; some of them, like the teams' relative strength or a hot/cold season, are self-evident, while there are likely others that are not immediately obvious. Please note that this assumption may be proven correct in the end, with more research, but the initial assumption I would make is that non-randomness must not be ignored.
In an attempt to be as fair-minded as you (at least in this post), however, I do have to concede that this certainly presents the beginnings of what could be a strong counter-theory.
With that in mind, and in an attempt to further dissect the data, the following are frequency analyses broken down by period:
These do rather clearly show that the 3-run gap appears fairly uniformly from year to year (excepting 1990-1998, a period notable for the paucity of 15-game Salamis). Less so for the 6-run gap, but still not to what might be construed as a negligible extent.
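Breakdowns like these could be produced with something along the lines of the following sketch. It assumes a day's salami consists of every game on that date, read from the same "date away home mov" lines (date as YYYYMMDD) that the simulation script takes as input, and it ignores any book-specific rules about which games count; `tally_by_period` and the period subs are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one salami per date from per-game lines in the
# "date away home mov" layout (date as YYYYMMDD), keeps dates with exactly
# $size games, and returns period => [hits, salamis], where a hit is a
# salami MOV equal to $target. $period_of maps a date to a period label.
sub tally_by_period {
    my ($lines, $size, $target, $period_of) = @_;
    my (%games, %sum);
    for my $line (@$lines) {
        next unless $line =~ /^[12][0-9]{7}/;
        my ($date, undef, undef, $mov) = split ' ', $line;
        $games{$date}++;
        $sum{$date} += $mov;
    }
    my %tally;
    for my $date (keys %games) {
        next unless $games{$date} == $size;
        my $bucket = $tally{ $period_of->($date) } //= [0, 0];
        $bucket->[1]++;
        $bucket->[0]++ if $sum{$date} == $target;
    }
    return \%tally;
}

# Example period labels: calendar year (YYYY) and month (MM).
my $by_year  = sub { substr $_[0], 0, 4 };
my $by_month = sub { substr $_[0], 4, 2 };
```

For instance, `tally_by_period(\@lines, 15, 3, $by_month)` would give, per month, how many 15-game salamis landed exactly on +3 and how many there were in total.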
Now looking at it from month to month:
So it does indeed seem that (March/April excluded) this is a consistent phenomenon from month to month. How statistically relevant is this in light of both our prior in-sample observation of the 3-run phenomenon and the small sample sizes of each of our data partitions? Well, that's a bit too much multinomial statistics for me to wade through on a Monday, but my first inclination would be "not irrelevant but still probably less relevant than it might appear at first glance".
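As a rough first pass at the significance question, one could compute a normal-approximation z-score for the 3-run cell in each partition and combine them with Stouffer's method. This is only a sketch: the normal approximation is poor for thinly populated partitions such as 1990-1998, and because the 3-run cell was singled out after looking at in-sample data, the resulting z-scores will overstate significance. The helper names are hypothetical.

```perl
use strict;
use warnings;

# z-score for one cell of a frequency table: $hits of $n salamis landed on
# the target MOV, versus a model probability $p (e.g. from the CLT).
sub cell_z {
    my ($hits, $n, $p) = @_;
    my $z = ($hits - $n * $p) / sqrt($n * $p * (1 - $p));
    return $z;
}

# Stouffer's method (unweighted): combine per-partition z-scores into one.
sub stouffer {
    my @z = @_;
    my $sum = 0;
    $sum += $_ for @z;
    return $sum / sqrt(scalar @z);
}
```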
Anyway, as I've already reversed my opinion on this at least once and arguably twice, I'm going to temporarily recuse myself and wait and see if any other analysts among us can make some compelling arguments.
Stiff.
Data (SBR MVP)
- 11-27-07
- 2236
#44 Cossack, stiff... With Ganchrow's departure the TT went downhill, with posters resorting to name-calling and posting pictures. Pathetic...
Data (SBR MVP)
- 11-27-07
- 2236
#45 My primary objection, however, would be that by reducing the variance through considering only the medians of the two states (i.e., home win and home loss) and ignoring the tails, we'd necessarily be creating a results distribution more discrete than what we'd find in reality.
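To put numbers on this objection, the sketch below splits per-game home MOVs into the two states and reports each state's median margin together with its standard deviation, which is the spread a medians-only model throws away. It assumes an array of per-game home MOVs like the one Ganchrow's script builds; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Median of a list (mean of the two middle values for an even count).
sub median {
    my @s = sort { $a <=> $b } @_;
    return undef unless @s;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Split home MOVs into home wins and home losses (losses as positive
# margins) and report median, population SD and count for each state.
sub state_summary {
    my @movs = @_;
    my %state = (win  => [grep { $_ > 0 } @movs],
                 loss => [map { -$_ } grep { $_ < 0 } @movs]);
    my %out;
    for my $k (keys %state) {
        my @x = @{ $state{$k} };
        next unless @x;
        my ($mean, $var) = (0, 0);
        $mean += $_ / @x for @x;
        $var  += ($_ - $mean)**2 / @x for @x;
        $out{$k} = { median => median(@x), sd => sqrt($var), n => scalar @x };
    }
    return \%out;
}
```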
Ganchrow (SBR Hall of Famer)
- 08-28-05
- 5011
#46
Ganchrow (SBR Hall of Famer)
- 08-28-05
- 5011
mathdotcom (SBR Posting Legend)
- 03-24-08
- 11689
#50 So today we have:
+155/-175 at CRIS (and currently +150/-160 at Pinn)
Home -4.5 -105 at CRIS
If we take the fair line to be 155, then using Ganchrow's table the probability of the home MOV being more than 4 is ~0.5226, which exceeds the break-even probability of 0.5122 at -105.
With a fair line of 165, the probability of home MOV > 4 is 0.5374.
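The arithmetic here can be checked with a couple of small helpers: the break-even probability implied by an American price (105/205 ≈ 0.5122 at -105) and the expected profit per unit staked at a given win probability. The probabilities plugged in are the ones quoted above from Ganchrow's table; the sub names are illustrative.

```perl
use strict;
use warnings;

# Break-even win probability implied by an American price.
sub breakeven {
    my ($odds) = @_;
    return $odds < 0 ? -$odds / (-$odds + 100) : 100 / ($odds + 100);
}

# Expected profit per unit staked at American $odds with win probability
# $p (no push is possible on a half-point line).
sub ev {
    my ($p, $odds) = @_;
    my $win = $odds < 0 ? 100 / -$odds : $odds / 100;
    return $p * $win - (1 - $p);
}

printf "break-even at -105: %.4f\n", breakeven(-105);   # 0.5122
printf "EV at p=0.5226, -105: %+.4f\n", ev(0.5226, -105);
printf "EV at p=0.5374, -105: %+.4f\n", ev(0.5374, -105);
```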
Pinn has -4.5 @ -103, too. What am I missing?
tomcowley (SBR MVP)
- 10-01-07
- 1129
#51 The points are going to be worth more in general than the push%s above suggest, because those push%s are for all salamis (it's like estimating the NFL push % on 3 by looking at all games instead of only the games lined in the neighborhood of 3). Also, it's a 12-game salami today, so the points are worth a bit more.
mathdotcom (SBR Posting Legend)
- 03-24-08
- 11689
#52 Good point, tom.
I'll be back the next day there are 15 games.
mathdotcom (SBR Posting Legend)
- 03-24-08
- 11689
#53 CRIS:
Away +4 -105
Home -4 -115
Away ML +150
Home ML -170
If fair odds on ML are 160, then again using Ganch's table the probability of Home MOV > 4 is ~ 0.5301, which suggests a fair line of:
Home -4.5 -113
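Going the other way, a win probability p converts to a fair American price of -100·p/(1 - p) when p ≥ 0.5 (and 100·(1 - p)/p otherwise), which is how 0.5301 becomes roughly -113. A minimal sketch; `fair_american` is a hypothetical name. The second line shows the home win probability implied by a fair 160 moneyline.

```perl
use strict;
use warnings;

# Fair (no-vig) American line for an outcome with win probability $p.
sub fair_american {
    my ($p) = @_;
    return $p >= 0.5 ? -100 * $p / (1 - $p) : 100 * (1 - $p) / $p;
}

printf "%.1f\n", fair_american(0.5301);   # -112.8, i.e. about -113
printf "%.4f\n", 160 / 260;               # home win prob at a fair -160 ML
```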
Pinnacle currently has -4.5 -107 with Away/Home as +156/-166.
Nothing to get excited about, but there seems to be a small bias.