1. #1
    metaldome
    Join Date: 02-18-08
    Posts: 22

    For anyone with Sabermetrics knowledge...

    I can't wait for baseball to start. This is the first season I am trying to come up with my own system of projecting baseball scores so I can bet games where I see value in the lines. I want to explain to you some of what I am doing so you can tell me where I may be going wrong.

    1) I tested a bunch of pitching and batting statistics to see how well they correlated to wins and runs scored and found OPS and OPSA to be the best. I know there are a few stats that are supposed to be a little better (wOBA, 1.8OPS, etc) but OPS and OPSA were easier to find and use than any of the other stats, and the difference between them was fairly small (less than one percent).
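
    A minimal sketch of that kind of screening, assuming each candidate stat and the runs-per-game figures are stored as parallel per-team lists. The function names are hypothetical, and stats are ranked by r squared (the same quantity reported in step 2).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_stats(stats, runs_per_game):
    """Rank candidate stats (name -> per-team values) by r squared against runs per game."""
    scored = {name: pearson_r(values, runs_per_game) ** 2 for name, values in stats.items()}
    # Highest r squared first.
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```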

    2) Next, I plotted every major league team's season OPS against its average RPG (and OPSA against average opponents' RPG) for the last three years on one graph and found the best-fit line. R squared came out to about .87 (the line explains about 87% of the variance, which corresponds to a correlation of about .93), and the formula was off by an average of 2.5%, or about 0.12 RPG, according to my calculations.
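
    A sketch of the same fit in plain Python (no libraries assumed), computing R squared plus the two error measures quoted above. R squared is the share of variance the line explains; the correlation coefficient is its square root. Function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_quality(xs, ys, slope, intercept):
    """Return (r_squared, mean absolute error, mean absolute percent error)."""
    n = len(xs)
    mean_y = sum(ys) / n
    preds = [slope * x + intercept for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total variation
    r_squared = 1 - ss_res / ss_tot
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / n
    mape = sum(abs(y - p) / y for y, p in zip(ys, preds)) / n
    return r_squared, mae, mape
```

    With team OPS as `xs` and runs per game as `ys`, `mae` corresponds to the "about 0.12 RPG" figure and `mape` to the "2.5%" figure.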

    I did not make any league adjustments because when I compared each league separately to MLB as a whole, I found that the difference was pretty small (again, less than one percent), and I wanted to keep this as simple as possible. I also figured that any difference between having a designated hitter (or not) would already be reflected in each team's OPS (and therefore the projected score).

    Although I did not find much of a correlation between OPSA and unearned runs (only about 17%; unearned runs obviously depend more on fielding than on pitching), it seems that including them does not lower the accuracy of the formula. For this reason, I decided to include both earned and unearned runs, as this should make the predicted score and total closer to the actual game results. Would you agree?

    I also wondered whether I should put the formula in the context of runs per game (R/G) or runs per nine innings (or runs per inning). I decided on runs per game, thinking it will be easier (it is hard to get total offensive innings, although close to innings pitched for the team, it would not be exact) and closer to actual scores (you can't know whether a game will go into extra innings beforehand) than the others. Do you think this is best?

    I have not yet added park adjustments because so far I am unsure whether they can be calculated with any degree of accuracy, how I should do it, or whether it will make much of a difference (except for a few teams like Colorado and San Diego). It would also be almost impossible to come up with anything for teams with new stadiums. As far as home field advantage, again I don’t know how accurate you could really get, and was thinking of just going with the major league average of about four percent. Any ideas?
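
    If park and home-field adjustments are added later, one simple option is to treat both as multipliers on projected runs. The sketch below uses a basic park-factor ratio (runs per game by both teams in a team's home games divided by the same figure in its road games) and applies the four percent figure directly to the home team's projected runs. Both the multiplier form and the function names are assumptions, not a method established in this thread.

```python
def basic_park_factor(home_runs_total, home_games, road_runs_total, road_games):
    """Runs per game (both teams) in home games divided by runs per game in road games."""
    return (home_runs_total / home_games) / (road_runs_total / road_games)


def adjust_projection(away_runs, home_runs, park_factor=1.0, home_edge=0.04):
    """Scale both projected scores by the park factor, then give the home team a flat edge.

    Assumption: the ~4% home advantage is applied as a multiplier on home runs.
    """
    away = away_runs * park_factor
    home = home_runs * park_factor * (1 + home_edge)
    return away, home
```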

    I think that last year my decisions were too heavily influenced by the last three games for a pitcher and last ten games for a team. This year I would like to use at least a year's worth of data in my calculations. For pitchers this is a piece of cake, but I am not sure how I could do this easily for team OPS (lineups change and players get injured or traded throughout the season, and from season to season). Any ideas? (Remember, I want to keep this somewhat simple and don't need to be totally exact. I can't spend eight hours a day collecting information and doing calculations.)
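
    One low-effort way to use a year's worth of data while still following lineup changes is to rebuild team OPS from the players currently expected to play: take each regular's OPS over the last full year (or more) and weight it by the plate appearances you expect him to get. This is a sketch with a hypothetical function name; PA-weighting OPS is an approximation, because OBP and SLG use different denominators, and the expected-PA figures are your own estimates.

```python
def roster_ops(players):
    """PA-weighted OPS for the expected lineup.

    players: list of (player_ops, expected_plate_appearances) tuples, where
    player_ops is each hitter's OPS over a year or more of data.
    """
    total_pa = sum(pa for _, pa in players)
    if total_pa <= 0:
        raise ValueError("expected plate appearances must sum to a positive number")
    return sum(ops * pa for ops, pa in players) / total_pa
```

    When a regular is injured or traded, you only swap his tuple for his replacement's and recompute, instead of waiting for the team's season line to catch up.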

    Lastly, from looking at other predictive models, it seems that I should adjust the numbers to show how teams would do against a league-average pitching staff (or, for pitchers, against a league-average offense) before calculating scores (based on my numbers for offense, starters, bullpens, and the number of innings I think they will pitch). I think I know how I could do this, but I am not exactly sure why it is important. Can anyone explain it to me?
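
    The reason to express everything relative to league average is that a team's raw offensive numbers already contain the quality of the pitching it happened to face, and a pitcher's raw numbers contain the offenses he faced. Once each side is a ratio against league average, an offense and a pitching staff can be combined without counting the league baseline twice. One common way to combine them is multiplicative, sketched below together with a starter/bullpen blend by expected innings. Treat the formula and the names as one hypothetical approach, not the standard.

```python
def blended_ra9(starter_ra9, bullpen_ra9, starter_innings):
    """Runs allowed per 9 for one game, weighting starter and bullpen by expected innings."""
    starter_innings = min(max(starter_innings, 0.0), 9.0)  # clamp to a 9-inning game
    return (starter_ra9 * starter_innings + bullpen_ra9 * (9.0 - starter_innings)) / 9.0


def expected_runs(offense_rpg, pitching_ra9, league_rpg):
    """Multiplicative matchup estimate.

    offense_rpg: the offense's runs per game against league-average pitching.
    pitching_ra9: the opposing staff's runs allowed per 9 against a league-average offense.
    Assumes both are on the same per-9 (or per-game) scale as league_rpg.
    """
    offense_index = offense_rpg / league_rpg    # >1 means better than average
    pitching_index = pitching_ra9 / league_rpg  # <1 means better than average
    return league_rpg * offense_index * pitching_index
```

    For example, in a 4.5-run league, an offense worth 5.0 runs against average pitching facing a staff that allows 4.0 against an average offense projects to 5.0 × 4.0 / 4.5, about 4.44 runs.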

    Sorry if this was long. I hope it wasn't too confusing and that some people got something out of it. Any help with the questions above will be greatly appreciated.
    Last edited by metaldome; 03-13-09 at 02:42 AM.

  2. #2
    Data
    Join Date: 11-27-07
    Posts: 2,236

    The first question you should ask yourself is: what is your edge? If you plan to get an edge by building a model based on stats, then you should dig deeper and not take any shortcuts. Any past or future success of a system based on stats and methods that are common knowledge is due to pure chance.

  3. #3
    MrX
    Join Date: 01-10-06
    Posts: 1,540

    Data is spot on with his advice, but your post is a well thought-out request for advice and I'll try to give some.

    Quote Originally Posted by metaldome View Post
    I tested a bunch of pitching and batting statistics to see how well they correlated to wins and runs scored and found OPS and OPSA to be the best. I know there are a few stats that are supposed to be a little better (wOBA, 1.8OPS, etc) but OPS and OPSA were easier to find and use than any of the other stats, and the difference between them was fairly small (less than one percent).
    As Data pointed out, you're going to have a tough time when you start sacrificing accuracy for the sake of "easier to find and use."

    Those are exactly the kind of "fairly small" differences that are going to separate you from the herd. The market is largely driven by people using the easy-to-find stats.

    I'd do more research before you use OPSA (or its equivalents) to evaluate pitchers. It correlates well with runs allowed and win%, but there are much better ways to evaluate the ability of a pitcher.

    Quote Originally Posted by metaldome View Post
    I also wondered whether I should put the formula in the context of runs per game (R/G) or runs per nine innings (or runs per inning). I decided on runs per game, thinking it will be easier (it is hard to get total offensive innings, although close to innings pitched for the team, it would not be exact) and closer to actual scores (you can't know whether a game will go into extra innings beforehand) than the others. Do you think this is best?
    I doubt it would make much of a difference, but runs per 9 innings is more logical. You wouldn't want a team's offensive stats inflated just because they've been in an unusual number of extra-inning games.

    You need to get a little more creative and do your own manipulation of the easy-to-find stats. You can easily derive offensive innings as well as the other hard-to-find stats mentioned earlier.
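
    For example, a team's offensive innings in a game equal the innings the opposing staff pitched against it, which every box score lists. A sketch, assuming you've collected (runs scored, opponent innings pitched) for each game, with innings in the usual box-score notation where 8.2 means 8 2/3; the function names are hypothetical.

```python
def ip_to_innings(ip):
    """Convert box-score innings notation ('8.2' = 8 2/3 innings) to a float."""
    whole, _, frac = str(ip).partition(".")
    outs = int(frac) if frac else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + outs / 3


def runs_per_nine(games):
    """games: iterable of (runs_scored, opponent_innings_pitched) per game."""
    runs = 0
    innings = 0.0
    for runs_scored, opp_ip in games:
        runs += runs_scored
        innings += ip_to_innings(opp_ip)  # innings the team actually batted
    return 9 * runs / innings
```

    Three games of 5 runs in 9 innings, 3 runs in 8 (a home win with no bottom of the ninth) and 4 runs in 11 give 4.00 runs per game but about 3.86 runs per nine, because the extra-inning game inflates the per-game figure.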

    Quote Originally Posted by metaldome View Post
    I have not yet added park adjustments because so far I am unsure whether they can be calculated with any degree of accuracy, how I should do it, or whether it will make much of a difference (except for a few teams like Colorado and San Diego). It would also be almost impossible to come up with anything for teams with new stadiums. As far as home field advantage, again I don’t know how accurate you could really get, and was thinking of just going with the major league average of about four percent. Any ideas?
    Park factors are hard... really hard, actually, to do well. At the point you're at right now, I'd work on other things but be aware that you're going to have to address it at some point to handle certain teams.

    Quote Originally Posted by metaldome View Post
    I think that last year my decisions were too heavily influenced by the last three games for a pitcher and last ten games for a team. This year I would like to use at least a year's worth of data in my calculations. For pitchers this is a piece of cake, but I am not sure how I could do this easily for team OPS (lineups change and players get injured or traded throughout the season, and from season to season). Any ideas? (Remember, I want to keep this somewhat simple and don't need to be totally exact. I can't spend eight hours a day collecting information and doing calculations.)
    It's a very good sign for you that you realized this. The betting markets tend to over-value recent performance and putting yourself in that camp is not good.

    Study up on regression to the mean.
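
    A minimal sketch of the idea, assuming the common shrinkage form in which an observed rate is pulled toward the league mean with weight n / (n + k). Here n is the sample size (plate appearances, batters faced, games) and k is a constant you would have to estimate from your own data; any value you plug in is illustrative until you do.

```python
def regress_to_mean(observed, sample_size, league_mean, k):
    """Shrink an observed rate toward the league mean.

    k is the sample size at which the observed rate and the league mean get
    equal weight; it has to be estimated from data (hypothetical here).
    """
    if sample_size < 0 or k <= 0:
        raise ValueError("sample_size must be >= 0 and k must be > 0")
    weight = sample_size / (sample_size + k)  # more data -> trust the observation more
    return weight * observed + (1 - weight) * league_mean
```

    A hot ten-game stretch has a small n, so it barely moves the estimate; a full season moves it much more.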

    I'm sure you can come up with a way to adjust for roster changes and injuries. You can always simply throw out games with major changes.
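
    Throwing out games played under a very different lineup can be as simple as tagging each game with how many of the team's regulars were missing and filtering on that. The class, the field names and the default threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamGame:
    runs_scored: int
    offensive_innings: float
    regulars_missing: int  # hypothetical tag you would record yourself


def usable_games(games, max_missing=1):
    """Keep only games where at most max_missing regulars were out of the lineup."""
    return [g for g in games if g.regulars_missing <= max_missing]
```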

    Quote Originally Posted by metaldome View Post
    Sorry if this was long. I hope it wasn't too confusing and that some people got something out of it. Any help with the questions above will be greatly appreciated.
    Really you need to decide if you want a successful model, OR if you want simplicity. You can't really have both.

    By the way, the key to making a sophisticated model easy to use is becoming proficient with a programming language.
