CRIS Salami Line

**tomcowley** · 07-03-10, 01:10 PM

If that 3-effect is real, it's something really really strange. I created 1 million synthetic 13-15 game salamis by picking that many random games from the last 8 years and adding up the MOVs. My push%s for each number in range were 2.3x% and there were no abnormally weak or strong numbers for 13, 14, or 15 games.
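For anyone who wants to try this kind of check, here is a minimal Python sketch of a synthetic-salami simulation. It is not the code used in this thread. The function names, the placeholder margin list, and the choice to draw without replacement within each salami are all assumptions made for illustration; real per-game home MOVs would have to be supplied in place of the placeholder data.

```python
import random
from collections import Counter


def simulate_salami_totals(game_movs, games_per_salami, n_sims, seed=0):
    """Return a Counter mapping summed home MOV -> number of simulated salamis."""
    rng = random.Random(seed)  # Python's default generator is a Mersenne Twister
    totals = Counter()
    for _ in range(n_sims):
        # rng.sample draws without replacement within a single salami (assumption)
        totals[sum(rng.sample(game_movs, games_per_salami))] += 1
    return totals


def frequency(totals, value):
    """Share of simulated salamis whose total home MOV equals `value`."""
    return totals[value] / sum(totals.values())


if __name__ == "__main__":
    # Placeholder data only: made-up home MOVs, NOT real game results.
    # Zero is excluded because a baseball game cannot end tied.
    placeholder_movs = [m for m in range(-10, 11) if m != 0] * 50
    totals = simulate_salami_totals(placeholder_movs, 15, 20_000)
    for mov in range(-6, 7):
        print(f"{mov:+d}: {frequency(totals, mov):.4%}")
```

With real data, an unusually low frequency at 3 (or its multiples) relative to its neighbours is what to look for.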

**Ganchrow** · 07-03-10, 04:12 PM

Originally posted by tomcowley

If that 3-effect is real, it's something really really strange. I created 1 million synthetic 13-15 game salamis by picking that many random games from the last 8 years and adding up the MOVs. My push%s for each number in range were 2.3x% and there were no abnormally weak or strong numbers for 13, 14, or 15 games.

Yeah, following your lead, I ran 10,000,000 15-game synthetic Salamis (without replacement per Salami, FWIW) randomly drawn from 1990-2009 data via the 32-bit Mersenne Twister PRNG, fully seeded via random.org.

And with all that I'm seeing essentially the same results as you've described. I still have no explanation for the observed 3-effect (and 3-multiple effect). But based upon these results, I have no choice but to concede that my earlier dismissal of the possibility of a statistical aberration was likely made considerably too hastily. (Although I'm not ruling out earlier programmer error on my part either.)

In case anyone's interested, here's the hastily hacked together Perl code I used to simulate the 15-game Salamis:

And the results from the 10,000,000 sim:

Hard to argue with that.

**Data** · 07-03-10, 06:20 PM

Originally posted by Ganchrow

So for a 15-game Salami the CLT would predict a mean and standard deviation of 1.94139 and 17.00553 runs respectively. The following table compares the predicted frequencies (using a continuity correction) with actual 15-game Salami results over the in-sample time period:

I think the most interesting take-away from this would be the low observed frequency of the 3-run home Salami MOV (and, it turns out, for subsequent multiples of 3).

I just cannot see any significance of the number 3. However, I suspect that 7 should produce noticeable anomalies. As you can see, MOV -4 is less than expected, same as MOV 3. So, I am wondering what are the numbers for MOV -11 and MOV 10. The effect should exist albeit to a lesser extent.
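The CLT side of that comparison can at least be reproduced from the two quoted parameters. The sketch below is a rough illustration, not Ganchrow's code: it treats the 15-game total as normal with the quoted mean and standard deviation and applies a continuity correction, i.e. P(MOV = k) is approximated by the normal probability of the interval [k - 0.5, k + 0.5]. The function names are made up for the example, and it says nothing about the actual observed frequencies.

```python
import math

MEAN_15 = 1.94139  # CLT mean for a 15-game Salami, as quoted above
SD_15 = 17.00553   # CLT standard deviation for a 15-game Salami, as quoted above


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal(mean, sd) variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def clt_frequency(mov, mean=MEAN_15, sd=SD_15):
    """Normal approximation to P(total MOV == mov), with continuity correction."""
    return normal_cdf(mov + 0.5, mean, sd) - normal_cdf(mov - 0.5, mean, sd)


if __name__ == "__main__":
    for mov in (-11, -4, 3, 10):
        print(f"MOV {mov:+d}: predicted {clt_frequency(mov):.4%}")
```

The predicted value for a 3-run home win comes out a little above 2.3%, which lines up with the push percentages reported from the simulations.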

**flyingillini** · 07-04-10, 12:51 AM

It's nice to see Ganch posting this information!

**Ganchrow** · 07-04-10, 01:22 AM

Originally posted by Data

I just cannot see any significance of the number 3. However, I suspect that 7 should produce noticeable anomalies. As you can see, MOV -4 is less than expected, same as MOV 3. So, I am wondering what are the numbers for MOV -11 and MOV 10. The effect should exist albeit to a lesser extent.

Nor can I.

Unless we're both missing something, however, TomCowley's experiment (followed by my reproduction of his results -- now that's real science: reproducible experimentation) does rather strongly suggest that to be an aberration.

Also don't discount the possibility that I simply made a mistake in my earlier data culling. While I have rechecked it, someone else might want to verify my initial findings. The fact is that it does represent a fairly large outlier (although once again, because we're dealing with several strata of categorical data, the results are not actually as extreme as they might appear at first glance).

Anyway, if you're interested in CLT predicted vs. actual results over a larger support: Just remember of course that CLT convergence drops off as we further approach the distribution tails.

Oh and btw, Data, I just checked my ledger and it appears that you still owe me a drink. Please don't make me call you a stiff on the open forum.

**Ganchrow** · 07-04-10, 01:26 AM

Originally posted by flyingillini

It's nice to see Ganch posting this information!

Thanks for the kind words.

And I find it nicer still to see so many people getting involved in these conversations.

**Data** · 07-04-10, 12:51 PM

Originally posted by Ganchrow

TomCowley's experiment (followed by my reproduction of his results -- now that's real science: reproducible experimentation) does rather strongly suggest that to be an aberration.

I am sorry but I do not put too much weight into these results due to my complete disagreement with a blind assumption that n-game salami distribution is sufficiently close to a distribution of randomly selected n games. There are parameters that make any given salami a non-random set, some of those parameters like teams' relative strength or cold/hot season are self-evident while likely there are others that are not immediately obvious. Please note, this assumption may be proven correct at the end with more research done but the initial assumption that I would make is that non-randomness must not be ignored.

Here is my "back of the envelope" take on this. A 15-game salami will fall into one of 16 subsets, in which the home team wins 0 to 15 games. Let's assume that each "step" from one subset to another will result in changing the Home team MOV by 2.6 and the Away team MOV by 3. (Note that these numbers seem reasonably close to the medians and should not be raising eyebrows.) With that assumption, here are the expected maximums and minimums in each subset's distribution, with minimums positioned right in the middle.

As far as I can tell, this table matches your real observed results pretty well. I am not saying this is nearly as accurate as the 1,000,000-simulation results, but I tend to think that this is a better approach in both the underlying logic and the results.
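One possible reading of that step model can be sketched in a few lines of Python. This is an interpretation, not a reconstruction of the table in the post: it assumes that each home win contributes +2.6 runs and each away win contributes -3 runs to the total home MOV, so that moving from one subset to the next shifts the subset's center by 5.6 runs. Peaks are placed at the subset centers and troughs halfway between neighbouring centers. All names are hypothetical.

```python
HOME_STEP = 2.6  # typical home-win margin assumed in the post
AWAY_STEP = 3.0  # typical away-win margin assumed in the post
GAMES = 15


def subset_center(home_wins, games=GAMES):
    """Typical total home MOV when the home side wins `home_wins` of `games`."""
    return HOME_STEP * home_wins - AWAY_STEP * (games - home_wins)


def peaks_and_troughs(games=GAMES):
    """Peaks at each subset center; troughs halfway between neighbouring centers."""
    peaks = [subset_center(h, games) for h in range(games + 1)]
    troughs = [(a + b) / 2 for a, b in zip(peaks, peaks[1:])]
    return peaks, troughs


if __name__ == "__main__":
    peaks, troughs = peaks_and_troughs()
    for home_wins, center in enumerate(peaks):
        print(f"{home_wins:2d} home wins: center {center:+.1f}")
    print("troughs:", ", ".join(f"{t:+.1f}" for t in troughs))
```

Under this reading, the trough between 8 and 9 home wins falls at about +2.6. Whether that is what the original table showed cannot be confirmed from the post alone.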

Originally posted by Ganchrow

Oh and btw, Data, I just checked my ledger and it appears that you still owe me a drink. Please don't make me call you a stiff on the open forum.

Please be reminded that our payout method requires customer's physical presence in ***** city. Should you satisfy this requirement you can request your payout on any day.

**Ganchrow** · 07-05-10, 10:59 AM

First off, Data, all I can say is that finally someone other than myself has begun using the [table][/table] BB tags. Good on you, ya Cossack. A trend brewing perhaps? It warms my otherwise frigid heart.

Originally posted by Data

I am sorry but I do not put too much weight into these results due to my complete disagreement with a blind assumption that n-game salami distribution is sufficiently close to a distribution of randomly selected n games. There are parameters that make any given salami a non-random set, some of those parameters like teams' relative strength or cold/hot season are self-evident while likely there are others that are not immediately obvious. Please note, this assumption may be proven correct at the end with more research done but the initial assumption that I would make is that non-randomness must not be ignored.

Fair enough, although I'm not at first blush particularly inclined to agree. Still, I can't immediately offer compelling evidence to the contrary.

Originally posted by Data

Here is my "back of the envelope" take on this. A 15-game salami will fall into one of 16 subsets, in which the home team wins 0 to 15 games. Let's assume that each "step" from one subset to another will result in changing the Home team MOV by 2.6 and the Away team MOV by 3. (Note that these numbers seem reasonably close to the medians and should not be raising eyebrows.) With that assumption, here are the expected maximums and minimums in each subset's distribution, with minimums positioned right in the middle.

As far as I can tell, this table matches your real observed results pretty well. I am not saying this is nearly as accurate as the 1,000,000-simulation results, but I tend to think that this is a better approach in both the underlying logic and the results.

I think that's certainly a fair initial step in what ideally would become a larger combinatorial analysis. My primary objection, however, would be that by reducing the variance via solely considering the medians of the two states (i.e., home win and home loss) and ignoring the tails, we'd necessarily be creating a results distribution more discrete than what we'd find in reality.

In an attempt to be as fair-minded as you, however (at least in this post), I do have to concede that this certainly does present the beginnings of what could be a strong counter-theory.

With that in mind and in attempt to further dissect the data, following are frequency analyses broken down by periods:

These do rather clearly show that the 3-run gap appears fairly uniformly from year to year (excepting 1990-1998, a period notable for the paucity of 15-game Salamis). Less so for the 6-run, but still not to what might be construed a negligible extent.

Now looking at it from month to month:

So it does indeed seem that (March/April excluded) this is a consistent phenomenon from month to month. How statistically relevant is this in light of both our prior in-sample observation of the 3-run phenomenon and the small sample sizes of each of our data partitions? Well, that's a bit too much multinomial statistics for me to wade through on a Monday, but my first inclination would be "not irrelevant but still probably less relevant than it might appear at first glance".

Anyway, as I've already reversed my opinion on this at least once and arguably twice, I'm going to temporarily recuse myself and wait and see if any other analysts among us can make some compelling arguments.

Originally posted by Data

Please be reminded that our payout method requires customer's physical presence in ***** city. Should you satisfy this requirement you can request your payout on any day.


Stiff.

**Data** · 07-05-10, 01:12 PM

Cossack, stiff... With Ganchrow's departure the TT went downhill with posters resorting to name calling and posting pictures. Pathetic...

**Data** · 07-05-10, 01:23 PM

Originally posted by Ganchrow

My primary objection, however, would be that by reducing the variance via solely considering the medians of the two states (i.e., home win and home loss) and ignoring the tails, we'd necessarily be creating a results distribution more discrete than what we'd find in reality.

Sure, but I was not going to ignore the tails. I was merely attempting to make some sense of the "bumps". Kind of taking the sims as a first approximation and then introducing some small "waves" instead of a curve line.

**Ganchrow** · 07-05-10, 02:32 PM

Originally posted by Data

Sure, but I was not going to ignore the tails. I was merely attempting to make some sense of the "bumps". Kind of taking the sims as a first approximation and then introducing some small "waves" instead of a curve line.

Of course. What we're all just trying to get at is a reasonable explanation for the relative dearth of 3-run Home Salami wins. You and Cowley have each produced somewhat competing arguments, each with merits, each with holes. Me, I'm just hoping to be convinced one way or the other before I have to do any serious thinking.

**Ganchrow** · 07-05-10, 02:35 PM

Originally posted by Data

Cossack, stiff... With Ganchrow's departure the TT went downhill with posters resorting to name calling and posting pictures. Pathetic...

Did you miss your last appointment with Dr. Soong (nerd alert) for installation of your upgraded humor chip?

**Data** · 07-05-10, 02:58 PM

Originally posted by Ganchrow

Did you miss your last appointment with Dr. Soong (nerd alert) for installation of your upgraded humor chip?

Perhaps, but this only says that, unlike for you, there is a hope for me.

**Ganchrow** · 07-05-10, 03:05 PM

Originally posted by Data

Perhaps, but this only says that, unlike for you, there is a hope for me.

I abandoned all hope years ago.

**mathdotcom** · 07-08-10, 11:11 AM

So today we have:

+155/-175 at CRIS (and currently +150/-160 at Pinn)
Home -4.5 -105 at CRIS

If we take the fair line to be 155, then using Ganchrow's table the probability of the home MOV being more than 4 is ~0.5226, which exceeds the break-even probability of 0.5122 at -105.

With a fair line of 165, the probability of home MOV > 4 is 0.5374.
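The break-even arithmetic here is easy to check. The following is a small Python sketch with made-up function names, assuming standard American odds and ignoring pushes (reasonable for a half-run line such as -4.5). The win probabilities plugged in are the ones quoted from the table, not independently derived.

```python
def break_even_probability(american_odds):
    """Win probability at which a bet at these American odds has zero expected value."""
    if american_odds < 0:
        risk = -american_odds
        return risk / (risk + 100)
    return 100 / (american_odds + 100)


def expected_value_per_unit(win_prob, american_odds):
    """Expected profit per 1 unit staked, ignoring pushes."""
    if american_odds < 0:
        payout = 100 / -american_odds
    else:
        payout = american_odds / 100
    return win_prob * payout - (1 - win_prob)


if __name__ == "__main__":
    print(f"break-even at -105: {break_even_probability(-105):.4f}")
    print(f"break-even at -103: {break_even_probability(-103):.4f}")
    for p in (0.5226, 0.5374):
        print(f"p = {p}: EV per unit at -105 = {expected_value_per_unit(p, -105):+.4f}")
```

A positive expected value here only means the quoted probability beats the price; it does not address whether that probability is the right one for a given day's slate.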

Pinn has -4.5 @ -103, too. What am I missing?

**tomcowley** · 07-08-10, 11:22 AM

The points are going to be worth more in general than the push%s above because those push %s are for all salamis (it's like asking what the NFL 3 push % is by looking at all the games instead of the games lined in the neighborhood of 3). Also, 12 game salami today, so the points are worth a bit more.

**mathdotcom** · 07-08-10, 01:40 PM

Good point tom

I will be back the next day there are 15 games

**mathdotcom** · 07-09-10, 09:47 AM

Cris:
Away +4 -105
Home -4 -115

Away ML +150
Home ML -170

If fair odds on ML are 160, then again using Ganch's table the probability of Home MOV > 4 is ~ 0.5301, which suggests a fair line of:

Home -4.5 -113

Pinnacle currently has -4.5 -107 with Away/Home as +156/-166.

Nothing to get excited about but there seems to be a small bias.
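For reference, the step from a win probability to a fair price such as -113 can be sketched as follows. This is a minimal Python example with a hypothetical function name; it assumes a two-outcome market with no push and no vig, and takes the 0.5301 figure as given from the table.

```python
def fair_american_odds(win_prob):
    """Convert a win probability into zero-vig American odds (no-push market)."""
    if not 0.0 < win_prob < 1.0:
        raise ValueError("win_prob must be strictly between 0 and 1")
    if win_prob >= 0.5:
        # Favourite: amount that must be risked to win 100
        return -100.0 * win_prob / (1.0 - win_prob)
    # Underdog: amount won on a 100 stake
    return 100.0 * (1.0 - win_prob) / win_prob


if __name__ == "__main__":
    print(f"fair price for p = 0.5301: {fair_american_odds(0.5301):.1f}")
```

Rounded to the nearest whole number, 0.5301 converts to -113, matching the figure in the post.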