I could use a little help here. I was reading Ganch's post, and I understand the underlying theory, but the math is still elusive.
From Mike Orkin (sorry, the software he's talking about here is no longer available):
Z-VALUES
The z-value measures the likelihood that an observed event would occur due only to chance. The larger the z-value, the less likely it is that chance alone would cause the event. A z-value of 2 or larger is typically called "statistically significant" -- the likelihood is less than 5% that chance would cause such an event. A z-value of 3 or larger indicates that the likelihood is less than three in a thousand that chance would cause the observed event. The Optimizer allows you to save any situations with z-values of 2 or larger.
Z-values are more useful than percentages in measuring the strengths of win-loss statistics because, for example, percentages don't take into account the total number of games played. For example a record of 2-0 versus the point spread gives a win-loss record of 100% and so does a record of 10-0, yet 10-0 is clearly a better record. On the other hand, the z-value of a 2-0 record is 1.41, whereas the z-value of a 10-0 record is 3.16, correctly indicating that 10-0 is the stronger result.
If each team in a game is equally likely to cover the point spread, you can think of the game as a coin toss. Under this assumption, a 2-0 record is equivalent to tossing a coin twice and getting heads on each toss, which has probability 1 in 4. A 10-0 record is equivalent to tossing a coin ten times and getting heads on each toss, which has probability 1 in 1,024. This is reflected by z-values. The higher the z-value, the less likely it is that the record would arise due to chance.
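The coin-toss figures in that paragraph can be checked directly. This is only a sketch under the same assumption Orkin states (each game is a fair 50/50 toss); the function name `prob_at_least` is made up for illustration and is not from his software.

```python
from fractions import Fraction
from math import comb

def prob_at_least(wins: int, games: int) -> Fraction:
    """Exact probability of at least `wins` heads in `games` fair coin tosses."""
    favorable = sum(comb(games, k) for k in range(wins, games + 1))
    return Fraction(favorable, 2 ** games)

print(prob_at_least(2, 2))    # 1/4
print(prob_at_least(10, 10))  # 1/1024
```

The same function also covers imperfect records, for example `prob_at_least(8, 10)` for a record of 8-2 or better.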
Technically, z-values can be either positive or negative, depending on the direction of the deviation from the average value. For example, a win-loss record of 10-0 has z-value = 3.16, and a win-loss record of 0-10 has z-value = -3.16. Because of this symmetry, the Optimizer displays only absolute z-values.
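Orkin's text does not spell out his formula, but his numbers are reproduced by the usual normal approximation for a fair coin: with n = wins + losses games, the expected number of wins is n/2 and the standard deviation is sqrt(n)/2, so z = (wins - n/2) / (sqrt(n)/2), which simplifies to (wins - losses) / sqrt(wins + losses). A minimal sketch, assuming that is the formula he used (the name `z_value` is hypothetical):

```python
from math import sqrt

def z_value(wins: int, losses: int) -> float:
    """z-value of a win-loss record, assuming each game is a fair coin toss."""
    games = wins + losses
    if games == 0:
        raise ValueError("need at least one game")
    expected_wins = games / 2      # mean of a fair-coin binomial
    std_dev = sqrt(games) / 2      # standard deviation of a fair-coin binomial
    return (wins - expected_wins) / std_dev

print(round(z_value(2, 0), 2))   # 1.41
print(round(z_value(10, 0), 2))  # 3.16
print(round(z_value(0, 10), 2))  # -3.16
```

Under the same assumption, a spreadsheet with wins in A1 and losses in B1 should give the same number with =(A1-B1)/SQRT(A1+B1).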
When searching through large amounts of data, it's not a good idea to rely only on large z-values to identify predictable patterns. An important statistical law, known as the Lottery Principle, asserts that given enough opportunity, weird events will happen due to chance alone. When you toss a coin enough times, you'll eventually get ten heads in a row. Similarly, if you use software like the Optimizer to search through data, you will uncover seemingly strong situations that may be due only to chance. Separating the good stuff from random noise may require further work. For example, you could apply situations with large z-values to fresh data or look at them from different perspectives. The ability to detect predictable patterns in a sea of data can translate into long-term profit.
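The Lottery Principle can be seen in a quick simulation. The sketch below is not Orkin's Optimizer; it just generates many "situations" made of pure coin tosses and counts how many reach |z| >= 2 anyway. The function name, the seed, and the sizes (1000 situations of 100 games each) are arbitrary choices for illustration.

```python
import random
from math import sqrt

def count_false_signals(num_situations: int, games_per_situation: int,
                        threshold: float = 2.0, seed: int = 0) -> int:
    """Count pure-chance records whose absolute z-value reaches the threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_situations):
        # Each game is a fair coin toss: no real edge exists anywhere.
        wins = sum(rng.random() < 0.5 for _ in range(games_per_situation))
        losses = games_per_situation - wins
        z = (wins - losses) / sqrt(games_per_situation)
        if abs(z) >= threshold:
            hits += 1
    return hits

hits = count_false_signals(num_situations=1000, games_per_situation=100)
print(f"{hits} of 1000 pure-chance situations reached |z| >= 2")
```

Every hit here is noise by construction, which is why testing a situation on fresh data matters.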
Questions I have:
1) Are a Z score and a Z value the same thing?
2) How could you use Excel to get the same numbers Orkin has, e.g. a z-value of 3.16 for a 10-0 record? Is it possible to have just two inputs, wins and losses, and get the z-value?
3) In Ganch's example:
Z(han. A) = (79 units - 0 units) / 121.696 units ≈ 0.6492
Z(han. B) = (26.5 units - 0 units) / 18.28 units ≈ 1.45
Neither handicapper has a Z value above 2. Since Orkin says a z-value of 2 or larger is "statistically significant", should we conclude that neither handicapper has proven himself yet, regardless of how the two compare to each other?
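To check the arithmetic in Ganch's example, here is a small sketch that recomputes both Z values and converts them to two-sided tail probabilities under a standard normal distribution. The profit and standard deviation figures are taken from the example as printed; the function names and the use of a normal approximation are my assumptions, not something from Ganch's post.

```python
from math import erfc, sqrt

def z_from_units(profit: float, expected: float, std_dev: float) -> float:
    """Z value: how many standard deviations the profit sits above expectation."""
    return (profit - expected) / std_dev

def two_sided_p(z: float) -> float:
    """Chance of a deviation at least this large in either direction (standard normal)."""
    return erfc(abs(z) / sqrt(2))

z_a = z_from_units(79, 0, 121.696)
z_b = z_from_units(26.5, 0, 18.28)

for name, z in (("A", z_a), ("B", z_b)):
    print(f"handicapper {name}: z = {z:.4f}, "
          f"two-sided p = {two_sided_p(z):.3f}, |z| >= 2: {abs(z) >= 2}")
```

As a sanity check against Orkin's thresholds, `two_sided_p(2)` comes out below 0.05 and `two_sided_p(3)` below 0.003.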