1. #36
    suicidekings
    Join Date: 03-23-09
    Posts: 9,962

    Quote Originally Posted by illfuuptn View Post
    Hello all. I want to build a model for the upcoming baseball season and I literally have NO IDEA where to start. I read Justin7's book and it was very helpful but it still doesn't change the fact that I know nothing about modelling software and such. So here are my many questions(not necessarily in order).
    Seeing as I'm completely reconstructing my own MLB model this year, I took some time over the last couple of days to build the model described in Justin7's book from scratch, just to give it a look, using only Excel with:

    - Player data from Sports Prospectus
    - Park Factors from ESPN
    - Projected rosters from mlbdepthcharts.com (you can find them a lot of places)

    Admittedly, I have experience building models and solid Excel skills, but the most complicated functions you need are HLOOKUP/VLOOKUP and SUMIFS, and I feel like anyone with even midrange Excel skills can build this in a couple of days at most. For the moment, it really doesn't matter whether your data is being updated daily from web sources. Regular season games don't start for 6 weeks, so don't worry about the data collection side yet. It's far more important to focus on the actual model building process (the part that does the calculations) and to develop an understanding of exactly what data you need, how important each stat is to the model's success, and how sensitive the model is to it.
    Last edited by suicidekings; 02-15-11 at 08:22 PM.

  2. #37
    Buried_PIRATE
    Shaved my chest.
    Join Date: 12-28-10
    Posts: 546

    @Kings...

    What factors do you use in your MLB model?

    WAR?

    Do you take the lineup projected for the day, get that team's WAR, compare it with their opponent's, and then factor in some multiple for the pitcher?

    The thing is, if you come up with a team that has a 50 WAR vs a team that has a 45 WAR, how do you decide whether to bet at -150 or +150 for the game, etc.?

    These are the types of things that make it confusing for me... getting the data to work with is work, but it's the easy part.

  3. #38
    suicidekings
    Join Date: 03-23-09
    Posts: 9,962

    Quote Originally Posted by Buried_PIRATE View Post
    @Kings... What factors do you use in your MLB model? WAR? Do you take the lineup projected for the day, get that team's WAR, compare it with their opponent's, and then factor in some multiple for the pitcher? The thing is, if you come up with a team that has a 50 WAR vs a team that has a 45 WAR, how do you decide whether to bet at -150 or +150 for the game, etc.? These are the types of things that make it confusing for me... getting the data to work with is work, but it's the easy part.
    I personally don't like WAR because it's not an intuitive statistic to me. It's like looking at a pie chart instead of the spreadsheet the graph is based on. I want to see and have control over the details.

    When it comes to assessing the strength of plays, handicapping is all about experience: being confident in the numbers you produce, understanding why your model breaks down when it does (and it will, repeatedly), and knowing how to fix it. Data collection and Excel grunt work are tedious compared to deciding where to lay money, but until you have confidence in your numbers vs the market, you have no baseline for those decisions at all.

    As for determining where the value lies, if your model generates +150/-150 as a fair line for a game, which side to bet depends entirely on what prices are available and which way you expect the line to move (again, you need confidence in your numbers to determine this). If you see a +120/-130 overnight line, the value would appear to lie with the favourite, since -130 is a cheaper price than your fair -150. If the game closes at +135/-145, the market has moved toward your number, which supports it. If it closes at +105/-115, your assessment of the game is probably wrong.
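
    To make the arithmetic behind that comparison concrete, here is a minimal Python sketch. It assumes standard American odds conventions; the function names `american_to_prob` and `expected_profit` are made up for illustration and are not from Justin7's book or any library.

```python
def american_to_prob(odds: int) -> float:
    """Break-even win probability implied by an American price."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def expected_profit(fair_prob: float, offered_odds: int, stake: float = 100.0) -> float:
    """Expected profit on `stake` if the true win chance is `fair_prob`."""
    if offered_odds < 0:
        win_amount = stake * 100 / -offered_odds
    else:
        win_amount = stake * offered_odds / 100
    return fair_prob * win_amount - (1 - fair_prob) * stake


# Model's fair line is +150/-150, i.e. 60% favourite / 40% underdog.
fav_prob = american_to_prob(-150)
dog_prob = american_to_prob(150)

# Market offers +120/-130 overnight.
fav_edge = expected_profit(fav_prob, -130)  # positive: favourite is underpriced
dog_edge = expected_profit(dog_prob, 120)   # negative: underdog is overpriced

print(f"favourite EV per $100: {fav_edge:.2f}")
print(f"underdog EV per $100: {dog_edge:.2f}")
```

    The edge only exists if the model's 60% is right, which is why confidence in your own numbers comes first.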

    Long term, consistently beating the closing number is what's important, but that's many steps beyond where you are now. First get your model together and start playing around with it.
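
    One rough way to track whether you are beating the closing number is to compare the implied probability of the price you took with that of the closing price. The sketch below is just one possible measure under that assumption; `implied_prob` and `beat_the_close` are hypothetical helper names.

```python
def implied_prob(odds: int) -> float:
    """Break-even win probability implied by an American price."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def beat_the_close(bet_odds: int, closing_odds: int) -> float:
    """Positive when the close implies a higher win chance than the price you took."""
    return implied_prob(closing_odds) - implied_prob(bet_odds)


# Bet the favourite at -130 overnight.
moved_your_way = beat_the_close(-130, -145)  # closes -145: you got the better price
moved_against = beat_the_close(-130, -115)   # closes -115: the close was cheaper

print(f"{moved_your_way:+.4f}")
print(f"{moved_against:+.4f}")
```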
    Last edited by suicidekings; 02-16-11 at 12:17 AM.

  4. #39
    Buried_PIRATE
    Shaved my chest.
    Join Date: 12-28-10
    Posts: 546

    Thanks for the post... I'll consider it as baseball season gets closer

    In the data collection phase right now =**

  5. #40
    Flight
    Join Date: 01-27-09
    Posts: 1,979

    Quote Originally Posted by Borat38 View Post
    What about SQL, guys? Am fine w/ Excel, but in terms of pulling up data with certain criteria, would Sql be a good choice for a non-programmer like me?
    SQL is not a good choice for non-programmers, as it is a full language in itself.

  6. #41
    Flight
    Join Date: 01-27-09
    Posts: 1,979

    Quote Originally Posted by illfuuptn View Post
    So am I correct about the following? I need a database(like mysql) to house all of the information I find, information which I scrape from the web using a programming code I build/make(in java?). What exactly do I use to run tests on the data I find? Is that also done in mysql or java or is that completely different?
    Yes, that is a good architecture. Here is the tool and process flow, with my recommendation for each component in parentheses.

    - Data Gatherer (C# Windows Forms Application)
    - Database (MS SQL Server Express)
    - Analysis / Construction (Excel / R / Matlab)
    - Prediction Tool (C# Windows Forms Application)
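
    The four stages above can be sketched end to end in a few lines. This is only an illustration of the flow: it swaps in Python and an in-memory SQLite database for the C# and MS SQL Server recommendations, and the team codes, scores, and the run-differential "rating" are made-up placeholders.

```python
import sqlite3


def gather():
    """Data gatherer: stands in for a scraper and returns placeholder rows."""
    return [
        ("2011-04-01", "AAA", 5, 3),
        ("2011-04-02", "AAA", 2, 4),
        ("2011-04-01", "BBB", 1, 6),
    ]


def store(conn, rows):
    """Database: persist the gathered rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games "
        "(game_date TEXT, team TEXT, runs_for INTEGER, runs_against INTEGER)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def analyse(conn):
    """Analysis: average run differential per team (a toy rating)."""
    cur = conn.execute(
        "SELECT team, AVG(runs_for - runs_against) FROM games GROUP BY team"
    )
    return dict(cur.fetchall())


def predict(ratings, home, away):
    """Prediction tool: pick the team with the better toy rating."""
    return home if ratings[home] >= ratings[away] else away


conn = sqlite3.connect(":memory:")
store(conn, gather())
ratings = analyse(conn)
pick = predict(ratings, "AAA", "BBB")
print(ratings, pick)
```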

  7. #42
    JOHNPRUSSELL
    Join Date: 01-18-11
    Posts: 7

    A word of encouragement to you and everyone else. Here are two reasons, among others, why you can beat Vegas. First, Vegas has to put up a line on every game, but you don't have to play every line; you get to decide which games you want to play. Vegas may be better than the individual on most lines, but it can't be better all the time on all the lines.

    Second, Vegas has to put out the same line for everyone. Sharps and squares bet the same line. A good example is that even though the Dallas Cowboys were really bad this year, the public tends to play them regardless. Vegas knows Dallas stinks, but it also knows that, pretty much regardless of the line, the public is going to bet Dallas. Therefore, it has to inflate the Dallas line a little beyond what it really thinks it should be. This presents opportunity.

    Keep moving in a positive direction.

    I have a formula that works decently that I will post tomorrow. It will allow you to get by without having to compile all of the data yourself. Essentially, it takes "Sagarin" ratings and lets you plug them into a formula. Judging by the skill levels of some of the posters on this site, I am sure they have strong opinions about Sagarin (good or bad), but for where you are right now, let him compile the data for you, and make your predictions from his data.

  8. #43
    masticore
    Join Date: 07-24-09
    Posts: 1,177
    Betpoints: 456

    There is an excellent database online at http://killersports.com/ (yes, it's free).
    It covers NBA, NFL and MLB.

    I'm looking for the same for other sports. Anyone know of any?

    regards
    mikke

  9. #44
    Buried_PIRATE
    Shaved my chest.
    Join Date: 12-28-10
    Posts: 546

    Interesting link masticore

    Also @JohnRussell some good points there sir!

  10. #45
    masticore
    Join Date: 07-24-09
    Posts: 1,177
    Betpoints: 456

    I also agree with JohnRussell.

    The bookies are not interested in the "real" lines; they want the line the public will play.
    So if they find a line where the public bets 50/50, then it's a perfect line, and it will also be a perfect line for the "sharks".
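
    A quick worked example shows why a 50/50 split is attractive to the book. The sketch assumes both sides are priced at -110, which is not stated in the post, and the stake amounts are invented for illustration.

```python
def payout(odds: int, stake: float) -> float:
    """Profit paid out to a winning bettor at an American price."""
    return stake * 100 / -odds if odds < 0 else stake * odds / 100


def book_profit(stake_a: float, odds_a: int, stake_b: float, odds_b: int):
    """Book's profit if side A wins, and if side B wins."""
    if_a_wins = stake_b - payout(odds_a, stake_a)
    if_b_wins = stake_a - payout(odds_b, stake_b)
    return if_a_wins, if_b_wins


# Balanced action: the book keeps the same amount whoever wins.
balanced = book_profit(1100, -110, 1100, -110)

# Lopsided action: the book is now gambling on side B.
lopsided = book_profit(1650, -110, 550, -110)

print(balanced)
print(lopsided)
```

    With balanced money the book's result does not depend on who wins; with lopsided money it does, which is the risk the line is set to avoid.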

  11. #46
    antifoil
    Join Date: 11-11-09
    Posts: 3,993
    Betpoints: 6611

    http://www.baseballthinkfactory.org/files/oracle/

    Has anyone used these ZiPS projections, and how do they fare in comparison to Baseball Prospectus and Bill James?

  12. #47
    hubie69
    I am JJs bookie
    Join Date: 09-16-10
    Posts: 7,329
    Betpoints: 617

    I use MySQL on a Linux box for my data collection needs (only one sport), and it automatically scrapes data daily. The best thing I can tell you is that getting data by hand every day sucks. Find a way to automatically scrape it and put it into a .csv (which you can then open in Excel AND import into any version of SQL). This is easier said than done, but well worth it in the end.
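
    The .csv-then-SQL half of that workflow might look something like the sketch below. It is not hubie69's setup: it uses Python's standard `csv` module and SQLite instead of MySQL, leaves out the scraping itself, and the rows, column names, and file name are placeholders.

```python
import csv
import os
import sqlite3
import tempfile

FIELDS = ["game_date", "team", "runs"]

# Placeholder values standing in for one day's scrape.
todays_rows = [
    {"game_date": "2011-04-01", "team": "AAA", "runs": 5},
    {"game_date": "2011-04-01", "team": "BBB", "runs": 3},
]


def append_to_csv(path, rows):
    """Append rows to the CSV, writing the header only for a new file."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)


def load_csv_into_db(path, conn):
    """Import every row of the CSV into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores (game_date TEXT, team TEXT, runs INTEGER)"
    )
    with open(path, newline="") as f:
        conn.executemany(
            "INSERT INTO scores VALUES (:game_date, :team, :runs)",
            csv.DictReader(f),
        )
    conn.commit()


csv_path = os.path.join(tempfile.mkdtemp(), "scores.csv")
append_to_csv(csv_path, todays_rows)

conn = sqlite3.connect(":memory:")
load_csv_into_db(csv_path, conn)
total_runs = conn.execute("SELECT SUM(runs) FROM scores").fetchone()[0]
print(total_runs)
```

    The same .csv file opens directly in Excel, so one daily job can feed both tools.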

  13. #48
    Miz
    Join Date: 08-30-09
    Posts: 693
    Betpoints: 3132

    I agree that Excel can do almost anything you want. It is a bit inefficient and cumbersome at times though, so I am exploring alternatives. Lookup tables and sheet linking are important things to learn. Retrosheet is a wonderful resource, as is dougstats. Retrosheet has almost everything except the lines. Remember that you'll probably have to combine 3-4 data sources to get all the info you need to test, etc. Unfortunately, they aren't usually in one spot, and they require some manipulation. I also think that going through the exercise of building your model (regardless of whether or not it is immediately successful) is important. You will learn a lot about the process, and the next time you do it will be much easier.
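
    Combining sources mostly comes down to agreeing on a shared key, much like a lookup table in Excel. Here is a small Python sketch under that assumption; the key layout (date, home, away), the field names, and all the values are invented for illustration and do not reflect Retrosheet's actual format.

```python
# Source 1: game results (placeholder values).
results = {
    ("2011-04-01", "AAA", "BBB"): {"home_runs": 5, "away_runs": 3},
    ("2011-04-02", "AAA", "BBB"): {"home_runs": 2, "away_runs": 4},
}

# Source 2: betting lines from somewhere else (placeholder values).
lines = {
    ("2011-04-01", "AAA", "BBB"): {"home_line": -130},
}


def combine(results, lines):
    """Join the two sources on the shared key and report games with no line."""
    merged, missing = [], []
    for key, res in results.items():
        line = lines.get(key)
        if line is None:
            missing.append(key)
            continue
        game_date, home, away = key
        merged.append({"game_date": game_date, "home": home, "away": away, **res, **line})
    return merged, missing


merged, missing = combine(results, lines)
print(merged)
print("no line found for:", missing)
```

    Keeping a list of unmatched games is useful, since mismatched team codes or dates between sources are the usual reason a join silently loses rows.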

  14. #49
    mebaran
    Con los terroristas
    Join Date: 09-16-09
    Posts: 1,540
    Betpoints: 330

    This is turning into a great thread, guys.
    I have a database set up with play-by-play data from Retrosheet back to 1950. Is this data, along with the past lines, enough to make a decent model? I'm still trying to figure out what data I actually have (the way Retrosheet formats its data by play is tripping me up a bit).

  15. #50
    Sportsguy_USA
    Join Date: 09-29-10
    Posts: 59

    I would first define your criteria / requirements for which metrics you want to evaluate to determine your outcomes. Once you have this, research which sites post the data inputs you need (e.g. lines, ratings, etc.). Then, lastly, hire a programmer (hell, go offshore, it's cheap) to build the requirements and an interface that allows you to play with multiple scenarios.
    Then tweak this to address each sport you desire.

    I'm modeling something like this right now... it's been very efficient and cheap, and has saved me a ton of time. Completely giddy to get it launched!
