Looking for market inefficiency vs. predicting the line

**uva3021** · 05-12-10, 01:42 PM

Isn't the prediction of "Team A Wins" a potentially time-consuming number in and of itself? The numbers have to come from somewhere.

**RockyV** · 09-11-10, 03:21 PM

This is a pretty powerful idea. Rather than building a model from scratch, use the line itself as a predictor variable and see whether some other variable you suspect the market hasn't taken into account is significant or not.

I think it would be hard to do this successfully in practice, though. Like, let's pretend we did a simple linear regression of the game outcome on the Pinnacle line 1 hour before the game starts and some variable X we think is interesting. We run the regression and get an insignificant coefficient for this variable X. Does this mean that Pinnacle has already taken this variable into account when figuring out its line? Maybe, maybe not. It is possible that Pinnacle took X into account. It is also possible that they didn't, and in fact both you and Pinnacle are ignoring the highly predictive variable X^2.
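
To make the X versus X^2 point concrete, here is a minimal sketch on made-up data. Nothing in it is real Pinnacle data: the noise levels, the 1.5 effect size, and the helper name `ols_t_stats` are all assumptions picked for illustration. The simulated outcome depends on X^2 only, so a regression on the line plus X should tend to show little for X, while adding X^2 should show a clear effect.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS coefficients and t-statistics; X must already contain an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical OLS covariance of beta
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 5000
line = rng.normal(0.0, 6.0, n)   # synthetic stand-in for the Pinnacle line (points)
x = rng.normal(0.0, 1.0, n)      # synthetic candidate variable X

# Assumed truth for this toy example: the outcome depends on X^2, not on X itself.
margin = line + 1.5 * (x**2 - 1.0) + rng.normal(0.0, 10.0, n)

ones = np.ones(n)
beta_linear, t_linear = ols_t_stats(np.column_stack([ones, line, x]), margin)
beta_square, t_square = ols_t_stats(np.column_stack([ones, line, x, x**2]), margin)

print("t-stat on X   (regressors: line, X):     ", round(float(t_linear[2]), 2))
print("t-stat on X^2 (regressors: line, X, X^2):", round(float(t_square[3]), 2))
```

The takeaway is only that an insignificant coefficient on X tells you about the functional form you tested, not about whether the information in X is already priced in.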

That is what makes this stuff a bit delicate. Still though, interesting post.

**HedgeHog** · 09-11-10, 05:36 PM

Originally posted by mathdotcom

Looking for market inefficiency = given prices, is some variable a significant predictor of outcomes?

Yes, the actual final score of the game in question is the most significant factor. Knowing that will simplify your handicapping. Glad I could help.

**mathdotcom** · 09-11-10, 08:17 PM

Valuable insight as always, Hedgehog.

**That Foreign Guy** · 09-12-10, 07:06 AM

Yeah, when in doubt assume that the market consensus is a pretty accurate estimate. I hadn't thought of using it to test whether the bookie models take a specific variable into account though. Nice post.

**HedgeHog** · 09-12-10, 03:41 PM

**bztips** · 09-12-10, 10:17 PM

Very perceptive view of the issue.

I have thought of it both ways: initially, I built a model (for MLB) that seems to work well, i.e., it appears to generate some +EV opportunities, although I don't have a big enough sample yet.

AFTER building the model, I added in the Pinny closing line to the model -- and guess what, it was statistically INsignificant. Not surprisingly, if I reverse course by starting with only the Pinny line, it's significant by itself. But if I then add in my model variables one at a time, it eventually goes insignificant. (I know, I know, that's not technically the right way to either test the Pinny line or the other model variables.) But my conclusion from running this little exercise is that model building is still an art (mixed in with some science), and that it's very difficult to use statistics to show how efficient a market is or isn't.

**RockyV** · 09-13-10, 01:33 AM

^-- Hrm, interesting.

So you have some sort of technique for guessing the line yourself, call it MyLine, and want to test say the variable "Pinnacle line 1 hour before the game starts"?

So you're looking at, say, the model:

ActualGameLine = B_1 * PinnacleLine + B_2 * MyLine + e

In a perfect world, with say a perfect or near-perfect line given to you by a genie, you'd run this and get a tiny coefficient for B_1 (For example, if your line were perfect and exactly equal to ActualGameLine, you'd run the regression and get B_1 = 0), and thus get the sense that your line is a lot better than Pinnacle's.

However in practice, my suspicion is that you'll run the regression and get significant coefficients for both B_1 and B_2. This doesn't necessarily mean that your own line sucks; you and Pinnacle are using many of the same variables and so your lines will be pretty correlated. And as predictor variables become more and more correlated, regression in general has a harder time figuring out the right coefficient for each predictor variable (see this article for more info: http://en.wikipedia.org/wiki/Multicollinearity).
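
Here is a rough sketch of that collinearity effect on simulated data. Everything is invented for illustration: `shared` stands for the information both lines use, `own_noise` is how much each line deviates from it, and `coef_std_errors` is just a helper name. As the two lines become more correlated, the standard error on B_1 should grow, which is what makes the individual coefficients hard to pin down.

```python
import numpy as np

def coef_std_errors(X, y):
    """Classical OLS standard errors; X must already contain an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(2)
n = 2000
shared = rng.normal(0.0, 6.0, n)   # synthetic information both lines are built on

results = {}
for own_noise in (3.0, 1.0, 0.3):  # smaller own_noise -> the two lines agree more
    pinnacle = shared + rng.normal(0.0, own_noise, n)
    my_line = shared + rng.normal(0.0, own_noise, n)
    actual = shared + rng.normal(0.0, 10.0, n)

    X = np.column_stack([np.ones(n), pinnacle, my_line])
    corr = float(np.corrcoef(pinnacle, my_line)[0, 1])
    se_b1 = float(coef_std_errors(X, actual)[1])   # standard error of B_1
    results[own_noise] = (corr, se_b1)
    print(f"corr(PinnacleLine, MyLine) = {corr:.3f}   SE(B_1) = {se_b1:.3f}")
```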

So unless you have an enormous amount of data, I'm not sure a regression procedure like this is a good way to test lines against each other. Instead, you probably want to do some sort of Cross-Validation procedure where you build your model on say 5k games, then compare its prediction results (say in a squared error sense, or something) to Pinnacle's on another block of games.
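
A minimal sketch of that kind of out-of-sample comparison, again on synthetic data. Strictly speaking this is a single train/holdout split rather than full k-fold cross-validation, and every number in it (the feature weights, the noise levels, the 5k/2k split, the helper `holdout_mse`) is an assumption for the sake of the example. "My model" is a plain least-squares fit on the first block of games; the Pinnacle stand-in is the true expected margin plus a little noise.

```python
import numpy as np

def holdout_mse(pred, actual):
    """Mean squared error of predictions against actual results."""
    return float(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2))

rng = np.random.default_rng(1)
n_train, n_test = 5000, 2000
n = n_train + n_test

features = rng.normal(size=(n, 3))            # synthetic game features
expected = features @ np.array([3.0, -2.0, 2.0])
actual = expected + rng.normal(0.0, 10.0, n)  # synthetic game margins

# Stand-in for the Pinnacle line: sees all three features, with small noise.
pinnacle = expected + rng.normal(0.0, 1.0, n)

train, test = slice(0, n_train), slice(n_train, n)

# "My model": least squares on the training block, using only two of the features.
X = np.column_stack([np.ones(n), features[:, :2]])
beta, *_ = np.linalg.lstsq(X[train], actual[train], rcond=None)
my_line = X @ beta

mse_mine = holdout_mse(my_line[test], actual[test])
mse_pinny = holdout_mse(pinnacle[test], actual[test])
print(f"holdout MSE, my line:       {mse_mine:.1f}")
print(f"holdout MSE, Pinnacle line: {mse_pinny:.1f}")
```

With game-to-game noise this large, the two MSE figures can land very close together, so the size of the holdout block matters a lot before reading anything into the gap.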

I think Mathdotcom's idea is more useful for cases where you have some raw individual variable in mind that you think Pinnacle hasn't taken into account. Hopefully this variable is uncorrelated with the Pinnacle line, or very close...that way you'll minimize this collinearity issue. And then if things turn out well, you then discover that this variable is statistically significant in your regression model. (Though as the example I gave above showed with X and X^2, it might be necessary for you to transform the variable in the right way for it to improve your prediction results.)

Another way to interpret Mathdotcom's idea is that you have two options when trying to build a profitable model:

1) Build a model that predicts the actual line.
2) Build a model that predicts the difference between the actual line and the Pinnacle line.

His argument is that task #2 is conceptually easier, because:

a) You don't really give a damn what the final outcome of the game is; you are hunting for cases where Pinnacle has incorrectly set the line.
b) If the market is close to efficient, then you won't be able to do task #1 well unless you are also doing #2.
c) You already know certain variables are not going to be helpful in doing task #2...home court advantage for example has already been taken into account by the line. So you can spend your time actually hunting for things that Pinnacle has not already discovered, rather than reinventing the wheel.

Like others have said, I think this is a very useful way to think about things conceptually...but probably hard to do in practice (though very profitable if you are able to find such variables.)

**statictheory** · 09-13-10, 06:58 PM

"Depending on your question, these can amount to same thing. But most are really interested in the first case. You can always create a model, get it to predict a line, and compare it to the market line. But this is time consuming."(quote)

For football I used to just divide the average yards allowed (over the last 2-4 games) by the opposing team's "yards per point" (also averaged over 2-4 games) to get a score, and add 3 points for home field, and I was shocked how close to the actual opening line I was, considering there are so many other factors. It only took me an hour a week to make my lines, and that was just with a calculator. I'm sure all the handicappers will laugh this off, but that simple formula gives a fair prediction.

**pedro803** · 09-14-10, 06:07 AM

Originally posted by statictheory

divide the average yds allowed (2-4 games) by "yards per point" (avg 2-4 games) of the opposing team to get a score, and add 3 points for home field

Can you explain what you mean by "yards per point"? Is this on offense? Thanks.

Maybe a quick example of the formula would help.

**statictheory** · 09-14-10, 11:52 AM

Originally posted by pedro803

Can you explain what you mean by "yards per point"? Is this on offense? Thanks.

Maybe a quick example of the formula would help.

Take the two teams that are playing each other and do this:

Take the average of yards gained over the last 4 games (you can start doing this at 2 games). Always use the last 4 games for your average rather than the season as a whole.
Get the average of points scored over the last 4 games (can start at 2 games).
Now divide the average yards gained by the average points scored and you get yards per point. You do this for both teams.

You now need the average yards allowed for both teams by the same method as above.

To get the score, you now take the average yards allowed by team A and divide it by your yards per point number from team B. This is your projected score for team B.

Now do the opposite by dividing team B's average yards allowed by team A's yards per point number to get team A's projected score.
Add 2-3 points for the home team.

Where there is a large discrepancy between your line and the book's line is hopefully where you'll find value. Very basic, try it and see if it helps. Good luck.
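
For anyone who would rather script this than use a calculator, the steps above can be sketched roughly as follows. The class and function names (`RecentForm`, `recent_avg`, `projected_margin`) and all of the game numbers are made up for illustration; the 4-game window, the 2-game minimum, and the 3-point home edge come from the description above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

@dataclass
class RecentForm:
    """Per-game totals for one team, oldest game first (hypothetical container)."""
    yards_gained: Sequence[float]
    points_scored: Sequence[float]
    yards_allowed: Sequence[float]

def recent_avg(values: Sequence[float], window: int = 4) -> float:
    """Average of the last `window` games; needs at least 2 games."""
    recent = list(values)[-window:]
    if len(recent) < 2:
        raise ValueError("need at least 2 games")
    return mean(recent)

def yards_per_point(team: RecentForm) -> float:
    return recent_avg(team.yards_gained) / recent_avg(team.points_scored)

def projected_score(offense: RecentForm, defense: RecentForm) -> float:
    """Defense's average yards allowed divided by the offense's yards per point."""
    return recent_avg(defense.yards_allowed) / yards_per_point(offense)

def projected_margin(home: RecentForm, away: RecentForm, home_edge: float = 3.0) -> float:
    """Home projected score plus home edge, minus away projected score."""
    return projected_score(home, away) + home_edge - projected_score(away, home)

# Invented numbers, just to show the arithmetic.
home = RecentForm(yards_gained=[350, 400, 380, 370],
                  points_scored=[24, 28, 21, 27],
                  yards_allowed=[304, 336, 320, 320])
away = RecentForm(yards_gained=[300, 320, 340, 320],
                  points_scored=[20, 17, 23, 20],
                  yards_allowed=[330, 360, 345, 345])

print("home projected score:", projected_score(home, away))
print("away projected score:", projected_score(away, home))
print("projected home margin:", projected_margin(home, away))
```

Comparing `projected_margin` with the book's spread then gives the discrepancy to look at; lowering `home_edge` toward 2 covers the "2-3 points" range.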

**pedro803** · 09-15-10, 04:08 AM

Ok Statictheory, I understand now, thanks -- I may try to set that up in a spreadsheet just to see how it looks. First, I gotta get scraping with Excel dynamic web queries working for me!

**Joe Dogs** · 09-15-10, 06:57 AM

Statictheory

Interesting......I will use this concept after week 2 and see what kind of numbers I get........Thanks.

**statictheory** · 09-16-10, 02:22 PM

Originally posted by Joe Dogs

Statictheory

Interesting......I will use this concept after week 2 and see what kind of numbers I get........Thanks.

It'll give you a base, but you'll still have to interpret turnovers, severe losses the week before, injuries, etc. Good luck.

**statnerds** · 09-16-10, 03:27 PM

I don't think enough people give YPPT the recognition or respect it deserves.

As to Mathy's argument, or any argument that involves market efficiency: once you accumulate enough data to find that statistically significant factor, the market will have evolved. Unless of course you have discovered the mother of all inefficient markets. In that case, take the money and run.

And if we assume the market is efficient, then by definition all knowable information is already factored into the line. In this case, I would think that the market would evolve and smooth out any defects. But I like the concept Mathy presented, and it is a way to make money. You either have to discover a variable others failed to consider or discover one that is factored in, but at an improper value.

**marcoforte** · 09-19-10, 07:10 PM

I've been using statictheory's model for 20+ years. It's from a long out of date book. It's been a long road to find value. You can find it in a few areas but the game changes over time so what worked 20, 10, 5 years ago does not work now. Regression to the mean I guess. Finding areas of value is what I do in the off-season.

**statictheory** · 09-20-10, 11:05 AM

Originally posted by marcoforte

I've been using statictheory's model for 20+ years. It's from a long out of date book. It's been a long road to find value. You can find it in a few areas but the game changes over time so what worked 20, 10, 5 years ago does not work now. Regression to the mean I guess. Finding areas of value is what I do in the off-season.

I was shown it over 10 years ago by a friend, but if it was in a book it wouldn't surprise me. As I said, it's basic, but yards per point accounts for a lot of things since total yards are affected by turnovers, penalties, etc. It does match up with the majority of lines out there; it's hard, though, to interpret the lines where it doesn't match up. I do something other than football these days, but I also know that making a good line in MLB uses something similar in concept.