1,000,000 questions for model builders...

  • suicidekings
    SBR Hall of Famer
    • 03-23-09
    • 9962

    #36
    Originally posted by illfuuptn
    Hello all. I want to build a model for the upcoming baseball season and I literally have NO IDEA where to start. I read Justin7's book and it was very helpful, but it still doesn't change the fact that I know nothing about modelling software and such. So here are my many questions (not necessarily in order).
    Seeing as I'm completely reconstructing my own MLB model this year, I took some time over the last couple of days to build the model described in Justin7's book from scratch, just to give it a look, using only Excel with:

    - Player data from Sports Prospectus
    - Park Factors from ESPN
    - Projected rosters from mlbdepthcharts.com (you can find them a lot of places)

    Admittedly, I have experience building models and solid Excel skills, but the most complicated functions you need are HLOOKUP/VLOOKUP and SUMIFS, and I feel like anyone with even midrange Excel skills can build this in a couple of days at most. For the moment, it really doesn't matter whether your data is being updated daily from web sources. Regular season games don't start for 6 weeks, so don't worry about the data collection side of this yet. It's far more important to focus on the actual model building process (the part that does the calculations) and to develop an understanding of exactly what data you need and how sensitive the model's success is to different stats.
    Last edited by suicidekings; 02-15-11, 09:22 PM.
    • Buried_PIRATE
      SBR Wise Guy
      • 12-28-10
      • 546

      #37
      @Kings...

      What factors do you use in your MLB model?

      WAR?

      Do you take the lineup projected for the day, get that team's WAR, compare it with their opponent's, and then factor in some multiple for the pitcher?

      The thing is, if you come up with a team that has a 50 WAR vs a team that has a 45 WAR, how do you decide whether to bet at -150 or +150 for the game, etc.?

      These are the types of things that make it confusing for me... getting the data to work with is work, but it's the easy part.
      • suicidekings
        SBR Hall of Famer
        • 03-23-09
        • 9962

        #38
        Originally posted by Buried_PIRATE
        @Kings... What factors do you use in your MLB model? WAR? Do you take the lineup projected for the day, get that team's WAR, compare it with their opponent's, and then factor in some multiple for the pitcher? The thing is, if you come up with a team that has a 50 WAR vs a team that has a 45 WAR, how do you decide whether to bet at -150 or +150 for the game, etc.? These are the types of things that make it confusing for me... getting the data to work with is work, but it's the easy part.
        I personally don't like WAR because it's not an intuitive statistic to me. It's like looking at a pie chart instead of the spreadsheet the graph is based on. I want to see and have control over the details.

        When it comes to assessing the strength of plays, handicapping is all about experience, being confident in the numbers you produce, and being able to understand why your model breaks down when it does (and it will, repeatedly) and how to fix it. Data collection and Excel grunt work are tedious compared to making decisions on where to lay money, but until you have confidence in your numbers vs the market, you have no baseline for decisions at all.

        As for determining where the value lies, if your model generates +150/-150 as a fair line for a game, the answer of which side to bet entirely depends on what prices are available and which way you expect the line to move (again, you need to have confidence in your numbers to determine this). If you see a +120/-130 overnight line, the value would appear to lie with the favourite. If the game closes at +135/-145, the market is reinforcing your numbers as being correct. If it closes at +105/-115, your assessment of the game is probably wrong.
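The comparison in the paragraph above can be checked with a little arithmetic. What follows is only a rough sketch, not anyone's actual model: it converts American odds to break-even probabilities and compares the model's fair price with the market price. The function names `implied_prob` and `edge` are made up for illustration.

```python
def implied_prob(american: int) -> float:
    """Break-even win probability for a bet at the given American odds."""
    if american < 0:
        # Favourite: risk |odds| to win 100.
        return -american / (-american + 100)
    # Underdog: risk 100 to win `odds`.
    return 100 / (american + 100)


def edge(fair_odds: int, market_odds: int) -> float:
    """Model probability minus the market break-even probability, same side.

    Positive means the market price is better than the model's fair price.
    """
    return implied_prob(fair_odds) - implied_prob(market_odds)


# Numbers from the post: model fair line +150/-150, overnight line +120/-130.
fav_edge = edge(-150, -130)  # favourite: fair -150, available at -130
dog_edge = edge(150, 120)    # underdog: fair +150, available at only +120
print(f"favourite edge: {fav_edge:+.4f}")
print(f"underdog edge:  {dog_edge:+.4f}")

# Closing-line check for a bet placed on the favourite at -130.
for close in (-145, -115):
    beat_close = implied_prob(-130) < implied_prob(close)
    print(f"close {close}: beat the closing number = {beat_close}")
```

With these inputs the favourite shows a positive edge and the underdog a negative one, which matches the reasoning in the post. Note that the market prices include the bookmaker's margin, so the two market probabilities sum to more than 1.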

        Long term, beating the closing number consistently is what's important, but that's many steps beyond where you are now. First get your model together and start playing around with it.
        Last edited by suicidekings; 02-16-11, 01:17 AM.
        • Buried_PIRATE
          SBR Wise Guy
          • 12-28-10
          • 546

          #39
          Thanks for the post... I'll consider it as baseball season gets closer

          In the data collection phase right now =**
          • Flight
            Restricted User
            • 01-28-09
            • 1979

            #40
            Originally posted by Borat38
            What about SQL, guys? I'm fine w/ Excel, but in terms of pulling up data with certain criteria, would SQL be a good choice for a non-programmer like me?
            SQL is not a good choice for non-programmers, as it is a full language in itself.
            • Flight
              Restricted User
              • 01-28-09
              • 1979

              #41
              Originally posted by illfuuptn
              So am I correct about the following? I need a database (like MySQL) to house all of the information I find, information which I scrape from the web using code I write (in Java?). What exactly do I use to run tests on the data I find? Is that also done in MySQL or Java, or is that completely different?
              Yes, that is a good architecture. Here is the tool and process flow, with my recommendation for each component in parentheses.

              - Data Gatherer (C# Windows Forms Application)
              - Database (MS SQL Server Express)
              - Analysis / Construction (Excel / R / Matlab)
              - Prediction Tool (C# Windows Forms Application)
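To show how the four components hand off to each other, here is a minimal sketch of the same flow. It swaps in Python and the standard-library `sqlite3` module for the C# and MS SQL Server recommendations above, purely so the whole thing fits in one runnable file. The table layout, team codes, scores and the "higher scoring average wins" rule are all placeholders, not a real model.

```python
import sqlite3


def gather():
    """Stage 1, data gatherer: stand-in for a scraper. Rows are made up."""
    return [
        ("2011-04-01", "AAA", "BBB", 5, 3),
        ("2011-04-02", "BBB", "AAA", 2, 4),
        ("2011-04-03", "AAA", "CCC", 1, 6),
    ]


def store(conn, rows):
    """Stage 2, database: persist the gathered rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        "game_date TEXT, home TEXT, away TEXT, "
        "home_runs INTEGER, away_runs INTEGER)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def avg_runs(conn, team):
    """Stage 3, analysis: average runs scored by a team, home and away."""
    row = conn.execute(
        """
        SELECT AVG(runs) FROM (
            SELECT home_runs AS runs FROM games WHERE home = ?
            UNION ALL
            SELECT away_runs AS runs FROM games WHERE away = ?
        )
        """,
        (team, team),
    ).fetchone()
    return row[0]


def predict(conn, home, away):
    """Stage 4, prediction tool: toy rule, pick the higher scoring average."""
    return home if avg_runs(conn, home) >= avg_runs(conn, away) else away


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
store(conn, gather())
print(predict(conn, "AAA", "BBB"))
```

The point of keeping the stages separate is that each one can be replaced without touching the others: a real scraper for `gather`, a server database for `sqlite3`, and a proper model for the toy rule.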
              • JOHNPRUSSELL
                SBR Rookie
                • 01-18-11
                • 7

                #42
                A word of encouragement to you and everyone else. Here are two reasons, among others, why you can beat Vegas. First, Vegas has to put up a line on every game, but you don't have to play every line; you get to decide which games you want to play. Vegas may be better than the individual on most lines, but it can't be better all the time on all the lines.

                Second, Vegas has to put out the same line for everyone. Sharps and squares bet the same line. A good example is the Dallas Cowboys: even though they were really bad this year, the public tends to play them regardless. Vegas knows Dallas stinks, but it also knows that the public is going to bet Dallas pretty much regardless of the line. Therefore, it has to inflate the Dallas line a little beyond what it really thinks it should be. This presents opportunity.

                Keep moving in a positive direction.

                I have a formula that works decently that I will post tomorrow. It will let you get by without having to compile all of the data yourself. Essentially, it takes "Sagarin" ratings and lets you plug them into a formula. Judging by the skill levels of some of the posters on this site, I am sure they have some strong opinions about Sagarin (good or bad), but for where you are right now, let him compile the data for you, and make your predictions from his data.
                • masticore
                  SBR MVP
                  • 07-24-09
                  • 1177

                  #43
                  There is an excellent database online at http://killersports.com/ (yes, it's free).
                  It covers NBA, NFL and MLB.

                  I'm looking for the same for other sports. Does anyone know of any?

                  regards
                  mikke
                  • Buried_PIRATE
                    SBR Wise Guy
                    • 12-28-10
                    • 546

                    #44
                    Interesting link masticore

                    Also @JohnRussell some good points there sir!
                    • masticore
                      SBR MVP
                      • 07-24-09
                      • 1177

                      #45
                      I also agree with JohnRussell.

                      The bookies are not interested in the "real" lines; they want the line that the public will play on both sides.
                      So if they find a line where the public bets 50/50, it's a perfect line for them, and it will also be a perfect line for the "sharks".
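A quick worked example of why evenly split action is attractive to a book. This is a sketch under a simplifying assumption that is not in the post: both sides are priced at -110 and the same amount is staked on each. The function names are made up.

```python
def payout_per_unit(american: int) -> float:
    """Profit paid to the bettor per 1 unit staked at American odds."""
    return 100 / -american if american < 0 else american / 100


def book_profit(stake_a, odds_a, stake_b, odds_b):
    """Book's profit under each outcome: (if side A wins, if side B wins).

    The book keeps the losing side's stakes and pays the winning side's profit.
    """
    if_a_wins = stake_b - stake_a * payout_per_unit(odds_a)
    if_b_wins = stake_a - stake_b * payout_per_unit(odds_b)
    return if_a_wins, if_b_wins


# Balanced action at -110 on both sides: the book profits either way.
print(book_profit(110, -110, 110, -110))

# Lopsided action: the result now depends on which side wins.
print(book_profit(220, -110, 110, -110))
```

In the balanced case the book keeps the same amount whichever side wins; in the lopsided case it is effectively taking a position on the game.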
                      • antifoil
                        SBR MVP
                        • 11-11-09
                        • 3993

                        #46


                        Has anyone used these ZiPS projections, and how do they fare in comparison to Baseball Prospectus and Bill James?
                        • hubie69
                          SBR Hall of Famer
                          • 09-16-10
                          • 7329

                          #47
                          I use MySQL on a Linux box for my data collection needs (only one sport), and the box automatically scrapes data daily. The best thing I can tell you is that getting data by hand every day sucks. Find a way to scrape it automatically and put it into a .csv (which you can then open in Excel AND import into any version of SQL). This is easier said than done, but well worth it in the end.
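As a rough illustration of the ".csv as the hand-off format" idea, here is a sketch using only the Python standard library. It leaves out the scraping itself and uses `sqlite3` in place of MySQL so it runs anywhere. The column names, file name and sample row are assumptions made up for the example.

```python
import csv
import os
import sqlite3
import tempfile

FIELDS = ["game_date", "team", "runs"]  # hypothetical column layout


def write_csv(path, rows):
    """Write scraped rows (a list of dicts) to a CSV file that Excel can open."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path, conn):
    """Import the same CSV into a SQL table; returns the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_runs "
        "(game_date TEXT, team TEXT, runs INTEGER)"
    )
    with open(path, newline="") as f:
        rows = [
            (r["game_date"], r["team"], int(r["runs"]))
            for r in csv.DictReader(f)
        ]
    conn.executemany("INSERT INTO team_runs VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Usage with a made-up row standing in for one day's scrape.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "runs.csv")
    write_csv(csv_path, [{"game_date": "2011-04-01", "team": "AAA", "runs": 5}])
    db = sqlite3.connect(":memory:")
    print(load_csv(csv_path, db), "row(s) imported")
```

Run daily from a scheduler (cron on a Linux box, for instance), a script along these lines produces the same file for both Excel and the database.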
                          • Miz
                            SBR Wise Guy
                            • 08-30-09
                            • 695

                            #48
                            I agree that Excel can do almost anything you want, though it is a bit inefficient and cumbersome at times. I am exploring alternatives. Lookup tables and sheet linking are important things to learn. Retrosheet is a wonderful resource, as is dougstats. Retrosheet has almost everything except the lines. Remember that you'll probably have to combine 3-4 data sources to get all the info you need for testing. Unfortunately, they usually aren't in one spot, and they require some manipulation. I also think that going through the exercise of building your model (regardless of whether it is immediately successful) is important. You will learn a lot about the process, and the next time you do it will be much easier.
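Combining sources mostly comes down to matching rows on a shared key, which is what a lookup table does in Excel. Below is a small sketch of that step. It assumes game results and betting lines can both be keyed by date and home team. The field names and sample values are invented, and real data would need extra handling (doubleheaders, differing team abbreviations).

```python
def merge_sources(results, lines):
    """Attach the home-team line to each game result.

    Both inputs are lists of dicts keyed by ("date", "home"). Returns
    (merged_rows, unmatched_keys) so gaps between sources are visible.
    Assumes at most one game per key, which doubleheaders would break.
    """
    lines_by_key = {(row["date"], row["home"]): row for row in lines}
    merged, unmatched = [], []
    for game in results:
        key = (game["date"], game["home"])
        line = lines_by_key.get(key)
        if line is None:
            unmatched.append(key)
        else:
            merged.append({**game, "home_line": line["home_line"]})
    return merged, unmatched


# Made-up rows standing in for a results source and a lines source.
results = [
    {"date": "2011-04-01", "home": "AAA", "away": "BBB", "home_won": True},
    {"date": "2011-04-02", "home": "CCC", "away": "AAA", "home_won": False},
]
lines = [{"date": "2011-04-01", "home": "AAA", "home_line": -130}]

merged, unmatched = merge_sources(results, lines)
print(len(merged), "matched;", len(unmatched), "without a line")
```

Tracking the unmatched keys is the useful part: it shows where one source is missing games or uses different team names.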
                            • mebaran
                              SBR MVP
                              • 09-16-09
                              • 1540

                              #49
                              This is turning into a great thread guys.
                              I have a database set up with play-by-play data from Retrosheet going back to 1950. Is this data, along with the past lines, enough to make a decent model? I'm still trying to figure out what data I actually have (the way Retrosheet formats its data by play is tripping me up a bit).
                              • Sportsguy_USA
                                SBR Hustler
                                • 09-29-10
                                • 59

                                #50
                                I would first define your criteria/requirements for which metrics you want to evaluate to determine your outcomes. Once you have this, research which sites post the data inputs you need (e.g. lines, ratings, etc.). Lastly, hire a programmer (hell, go offshore, it's cheap) to build the requirements and an interface that allows you to play with multiple scenarios.
                                Then tweak this to address each sport you desire.

                                I'm modeling something like this right now... it's been very efficient and cheap, and has saved me a ton of time. Completely giddy to get it launched!