1,000,000 questions for model builders...

  • illfuuptn
    SBR MVP
    • 03-17-10
    • 1860

    #1
    1,000,000 questions for model builders...
    Hello all. I want to build a model for the upcoming baseball season and I literally have NO IDEA where to start. I read Justin7's book and it was very helpful, but it doesn't change the fact that I know nothing about modeling software and such. So here are my many questions (not necessarily in order).

    1.) What modeling program do I use? Is Free Pascal good or is there a better one?
    2.) How do I learn how to program? Is that what I'll be doing in Pascal, programming?
    3.) Where can I find a place that has historical lines (opening through closing) for MLB? How do I put this information into my model?
    4.) Where can I find a historical database that has the correct lineups for each different game for the last 5 seasons or so? Again, how would I get this information into my model?

    Many more questions to come once I think of them and/or have questions about any replies.
  • dr_wolf
    SBR Sharp
    • 07-20-10
    • 417

    #2
    What do you want to obtain from this software?
    • illfuuptn
      SBR MVP
      • 03-17-10
      • 1860

      #3
      Originally posted by dr_wolf
      What do you want to obtain from this software?
      I want to build a model to use for handicapping the MLB.
      • FourLengthsClear
        SBR MVP
        • 12-29-10
        • 3808

        #4
        On the basis that you are a complete beginner in terms of modeling, it is probably best to start with MS Excel, MS Access or Python. Excel and Access need little or no programming ability to get started; Python does require programming, but it is widely considered beginner-friendly. Once you have an idea of what your key parameters/variables are, you can start to look at more advanced software and techniques.

        A large part of modeling successfully is correctly structuring your data. You have asked for historic sources on line movements and team line-ups. Do you have any idea yet what you want to do with this information?
        • illfuuptn
          SBR MVP
          • 03-17-10
          • 1860

          #5
          Originally posted by FourLengthsClear
          On the basis that you are a complete beginner in terms of modeling, it is probably best to start with MS Excel, MS Access or Python. Excel and Access need little or no programming ability to get started; Python does require programming, but it is widely considered beginner-friendly. Once you have an idea of what your key parameters/variables are, you can start to look at more advanced software and techniques.

          A large part of modeling successfully is correctly structuring your data. You have asked for historic sources on line movements and team line-ups. Do you have any idea yet what you want to do with this information?
          I would use the line movements, at first, to grade the strength of my plays against the opinions of sharps. I want team lineups and updated stats to construct rolling power rankings for each team as the season progresses. Any further suggestions would be much appreciated.
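
          A minimal sketch of the rolling power ranking idea, in Python (one of the tools suggested earlier in the thread). It rates each team by its average run differential over its last few games. The window size, team codes, and scores are made-up assumptions for illustration, and run differential is only one of many possible rating inputs.

```python
from collections import defaultdict, deque

WINDOW = 3  # assumed window size; tune against your own data


def rolling_ratings(games, window=WINDOW):
    """Return {team: average run differential over its last `window` games}.

    `games` is an iterable of (home, away, home_runs, away_runs) tuples
    in chronological order.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for home, away, home_runs, away_runs in games:
        diff = home_runs - away_runs
        recent[home].append(diff)
        recent[away].append(-diff)
    return {team: sum(d) / len(d) for team, d in recent.items()}


# Hypothetical results, for illustration only.
sample_games = [
    ("BOS", "MIN", 5, 3),
    ("MIN", "BOS", 2, 6),
    ("BOS", "NYY", 1, 4),
    ("NYY", "MIN", 3, 4),
]
ratings = rolling_ratings(sample_games)
for team, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {rating:+.2f}")
```

          Because the deque drops the oldest game automatically, re-running this after each day's results keeps the ratings current as the season goes on.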
          • hubric
            SBR Rookie
            • 02-07-11
            • 2

            #6
            Just echoing the suggestion of starting with Excel.
            The cell-based layout will make it easier to start getting results without having to learn about debugging. In tandem with that adventure, try to duplicate your results in whatever language seems interesting to you, to get up the learning curve.
            • specialronnie29
              SBR High Roller
              • 09-19-10
              • 140

              #7
              you're in over your head

              start with props
              • illfuuptn
                SBR MVP
                • 03-17-10
                • 1860

                #8
                Originally posted by specialronnie29
                you're in over your head

                start with props
                I'm obviously in over my head. That's why I'm asking for help.
                • YorkHunt
                  SBR Hall of Famer
                  • 12-11-10
                  • 7496

                  #9
                  How well do you know Excel?
                  • illfuuptn
                    SBR MVP
                    • 03-17-10
                    • 1860

                    #10
                    Originally posted by YorkHunt
                    How well do you know Excel?
                    I know Excel pretty well, but how effective can Excel really be when it comes to having data constantly imported into my spreadsheets?
                    • Wrecktangle
                      SBR MVP
                      • 03-01-09
                      • 1524

                      #11
                      Originally posted by illfuuptn
                      I know Excel pretty well, but how effective can Excel really be when it comes to having data constantly imported into my spreadsheets?
                      Excel is so good these days you can prototype an entire modeling approach in a linked set of sheets if not one sheet WITHOUT having to program in VB. This means you don't have to necessarily be a "power" user, but you will need a good grounding in statistics and the Excel functions.

                      Never before has so much power been placed in the hands of the layman with computers.
                      • jamesbettor
                        SBR Rookie
                        • 12-12-10
                        • 17

                        #12
                        May I suggest looking through Baseball Reference - http://www.baseball-reference.com/ - for historical line-ups and stats.

                        For example, 2010 Box Scores for Boston Red Sox can be found here - http://www.baseball-reference.com/te...e-scores.shtml - which includes line-ups.

                        Example of a Box Score from last season - http://www.baseball-reference.com/bo...01004040.shtml

                        Hope that helps. Good luck.
                        • u21c3f6
                          SBR Wise Guy
                          • 01-17-09
                          • 790

                          #13
                          Excel can be used very effectively, but IMO MS Access is worth learning to set up databases with more functionality.

                          Joe.
                          • dr_wolf
                            SBR Sharp
                            • 07-20-10
                            • 417

                            #14
                            Excel is very good, but it depends on what you need. For example, how do you solve a system of equations in Excel?
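
                            For comparison, here is a rough sketch of how a small linear system can be solved outside Excel, in plain Python, using Gaussian elimination with partial pivoting. The function name and the 2x2 system are arbitrary illustrations, not anything from this thread.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back-substitute from the last row up.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


# Arbitrary example: 2x + y = 5 and x - y = 1.
solution = solve_linear_system([[2, 1], [1, -1]], [5, 1])
print(solution)
```

                            In Excel itself, the same kind of system can be handled with the built-in MINVERSE and MMULT worksheet functions, so this is not a hard limit of the spreadsheet.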
                            • mebaran
                              SBR MVP
                              • 09-16-09
                              • 1540

                              #15
                              There is some really good information in tech forums around the web. Just poke your head in with a question and you'll get a ton of feedback. Also, if you do use Excel, you can contact Microsoft if need be.
                              • Buried_PIRATE
                                SBR Wise Guy
                                • 12-28-10
                                • 546

                                #16
                                I can crush VB and I know how to use regressions using STATA/SAS etc but how the hell do you even begin to model baseball?

                                Scenario: Red Sox vs Twins, Lester on the hill on short rest vs Blackburn on full rest, a couple of minor injuries, Papelbon used on back-to-back days... prediction, go!

                                Or am I missing something?
                                • Kaplan
                                  SBR High Roller
                                  • 01-15-11
                                  • 165

                                  #17
                                  Originally posted by Buried_PIRATE
                                  I can crush VB and I know how to use regressions using STATA/SAS etc but how the hell do you even begin to model baseball?

                                  Scenario: Red Sox vs Twins, Lester on the hill on short rest vs Blackburn on full rest, a couple of minor injuries, Papelbon used on back-to-back days... prediction, go!

                                  Or am I missing something?
                                  I've been looking around the web trying to decide on a software product that can do regressions. My computer abilities are limited to Excel (pretty good). Do I have a chance of learning to do regressions with Stata, given my limited exposure?

                                  Thanks.
                                  • Buried_PIRATE
                                    SBR Wise Guy
                                    • 12-28-10
                                    • 546

                                    #18
                                    Good thread to bump. Hopefully some senior model builders can weigh in on my previous question.

                                    @Kaplan: Yes, I think Stata would be a fine program. The computer resources required to run it are low and basic functionality is pretty easy to understand. I think the more important thing to understand is what regressions actually mean / do. I would recommend you check out an introductory text in econometrics or something along those lines.
                                    • suicidekings
                                      SBR Hall of Famer
                                      • 03-23-09
                                      • 9962

                                      #19
                                      Originally posted by Buried_PIRATE
                                      I can crush VB and I know how to use regressions using STATA/SAS etc but how the hell do you even begin to model baseball?

                                      Scenario: Red Sox vs Twins, Lester on the hill on short rest vs Blackburn on full rest, a couple of minor injuries, Papelbon used on back-to-back days... prediction, go!

                                      Or am I missing something?
                                      In general, that's correct. Handicapping a particular game would involve:

                                      Starting pitcher A vs Hitters B + Relief Pitching A vs Team B = Team Total B
                                      Starting pitcher B vs Hitters A + Relief Pitching B vs Team A = Team Total A
                                      Use respective runs scored to calculate the total and no-vig moneyline

                                      You'd have to know: how many innings each starter is expected to pitch, how rested each bullpen is (this affects different teams by different amounts), the exact starting lineup for each team and their expected performance both in this game and against LHP/RHP, Ballpark rating (hitters park vs pitchers park), weather, etc.

                                      There are a TON of stats in baseball, and some are much better assessments of ability than others. Basically, what I'm saying is that describing the process is very simple, but building the model is not. The measure of your success in modeling will be how you assemble that massive collection of stats and extract the useful information into something that accurately describes what will happen in a particular game.

                                      Rule #1 for computer modeling: Garbage in, Garbage out.
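
                                      The last step above (team totals to a game total and a no-vig moneyline) can be sketched roughly as follows. The post does not say how to turn runs into a win probability; this sketch swaps in a Pythagorean-style formula with an exponent of 1.83, which is an assumption and only an approximation for a single game. The projected totals are hypothetical.

```python
def win_probability(runs_for, runs_against, exponent=1.83):
    """Pythagorean-style win probability (the exponent is an assumption)."""
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


def american_odds(prob):
    """Convert a win probability (0 < prob < 1) to a no-vig American moneyline."""
    if prob >= 0.5:
        return -100.0 * prob / (1.0 - prob)
    return 100.0 * (1.0 - prob) / prob


# Hypothetical projected team totals, for illustration only.
team_total_a = 4.6
team_total_b = 4.1

game_total = team_total_a + team_total_b
p_a = win_probability(team_total_a, team_total_b)
print(f"Projected total: {game_total:.1f}")
print(f"Team A: {p_a:.3f} -> {american_odds(p_a):+.0f}")
print(f"Team B: {1 - p_a:.3f} -> {american_odds(1 - p_a):+.0f}")
```

                                      The two prices are mirror images of each other because no vig is applied; comparing them with the posted line is what shows whether the model sees any value.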
                                      • Boner_18
                                        SBR Hall of Famer
                                        • 08-24-08
                                        • 8301

                                        #20
                                        I was hoping someone could better explain to me how to create B multipliers for Smyth's BaseRuns equation(s). I know the equation to use (essentially, solve for the multipliers), but I am having trouble conceptualizing what that means, what data to use, etc. Any info/discussion would be much appreciated.
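
                                        One possible reading of "solve for the multipliers", sketched in Python under stated assumptions: BaseRuns has the form A * B / (B + C) + D, so for each team line you can compute the B value that would reproduce actual runs exactly, then fit the weights so the formula's B comes close to that target. The weights inside `raw_advancement` are one commonly quoted version of the B term, not necessarily the one meant here; the sketch fits only a single overall scale factor, and the team lines are made-up numbers, not real statistics.

```python
def a_factor(h, bb, hr):
    """Baserunners: hits plus walks minus home runs."""
    return h + bb - hr


def raw_advancement(tb, h, hr, bb):
    """Unscaled advancement term using one commonly quoted set of weights."""
    return 1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb


def base_runs(a, b, c, d):
    """BaseRuns estimate: A * B / (B + C) + D."""
    return a * b / (b + c) + d


def implied_b(runs, a, c, d):
    """B value that makes BaseRuns reproduce `runs` exactly for one team line."""
    scored_from_base = runs - d
    return scored_from_base * c / (a - scored_from_base)


def fit_b_multiplier(team_lines):
    """Least-squares scale k (through the origin) so that k * raw B ~ implied B."""
    num = den = 0.0
    for t in team_lines:
        a = a_factor(t["H"], t["BB"], t["HR"])
        c = t["AB"] - t["H"]  # outs made on balls in play and strikeouts
        target = implied_b(t["R"], a, c, t["HR"])
        raw = raw_advancement(t["TB"], t["H"], t["HR"], t["BB"])
        num += raw * target
        den += raw * raw
    return num / den


# Made-up team-season lines, for illustration only (not real statistics).
lines = [
    {"AB": 5500, "H": 1450, "TB": 2300, "HR": 170, "BB": 550, "R": 760},
    {"AB": 5450, "H": 1380, "TB": 2150, "HR": 140, "BB": 500, "R": 680},
]
k = fit_b_multiplier(lines)
print(f"fitted multiplier: {k:.3f}")
```

                                        Fitting a separate weight for each event would be a multiple regression against the same implied-B target, using many more team seasons than the two shown.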
                                        • illfuuptn
                                          SBR MVP
                                          • 03-17-10
                                          • 1860

                                          #21
                                          All of these follow-up questions are great and I hope some people can answer them as well, but the answers I need are much simpler.
                                          How do I build a model? What program do I use? How do I learn the programming language (no experience whatsoever) for said program?
                                          • EXhoosier10
                                            SBR MVP
                                            • 07-06-09
                                            • 3122

                                            #22
                                            Build a model - import data into your spreadsheet (you can use Excel web queries)
                                            Program - pick the one you're most familiar with. If you know Excel, you can use Excel. Otherwise, proceed to next step
                                            How to learn - go to your local library, rent "Blank Blank for Dummies", sit down next to your computer with said book, open to page 1, and begin reading.
                                            • illfuuptn
                                              SBR MVP
                                              • 03-17-10
                                              • 1860

                                              #23
                                              Originally posted by EXhoosier10
                                              Build a model - import data into your spreadsheet (you can use Excel web queries)
                                              Program - pick the one you're most familiar with. If you know Excel, you can use Excel. Otherwise, proceed to next step
                                              How to learn - go to your local library, rent "Blank Blank for Dummies", sit down next to your computer with said book, open to page 1, and begin reading.
                                              Yeah, but how could I just "import" years' worth of baseball results and statistics into Excel in a fast, readable format? And then how could I use that information to run backtests on specific algorithms to determine if I'm profitable?
                                              • EXhoosier10
                                                SBR MVP
                                                • 07-06-09
                                                • 3122

                                                #24


                                                If you choose MS Excel (I'm not sure if this is the best way; it's just the only way I know), google "web scraping +excel" and that should get you started. Then learn Visual Basic (http://www.google.com/search?aq=f&so...ic+for+dummies) to collect it all.
                                                • Buried_PIRATE
                                                  SBR Wise Guy
                                                  • 12-28-10
                                                  • 546

                                                  #25
                                                  Originally posted by illfuuptn
                                                  Yeah, but how could I just "import" years' worth of baseball results and statistics into Excel in a fast, readable format? And then how could I use that information to run backtests on specific algorithms to determine if I'm profitable?
                                                  I know you can backtest on covers, I have been doing it for the NBA

                                                  If you are really serious about it I think you have to start your own database in MS Access...

                                                  You would want to include all the basics: opening line, totals, relevant injuries (if you can track them), back-to-back games, important game notes, etc.

                                                  I mean... it isn't easy, but you can bet that Vegas is doing similar things and then some, with many more people looking at it than just yourself.
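
                                                  As a rough illustration of what such a database might hold, here is a sketch that uses SQLite through Python's standard library instead of MS Access (a swapped-in tool, chosen only because it needs no extra software). The table name, column names, and the sample row are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to keep the data
conn.execute(
    """
    CREATE TABLE games (
        game_id        INTEGER PRIMARY KEY,
        game_date      TEXT NOT NULL,      -- ISO date string
        home_team      TEXT NOT NULL,
        away_team      TEXT NOT NULL,
        opening_line   INTEGER,            -- home moneyline, American odds
        closing_line   INTEGER,
        total          REAL,               -- posted over/under
        home_runs      INTEGER,
        away_runs      INTEGER,
        notes          TEXT                -- injuries, rest, anything else
    )
    """
)

# Hypothetical row, for illustration only.
conn.execute(
    "INSERT INTO games (game_date, home_team, away_team, opening_line,"
    " closing_line, total, home_runs, away_runs, notes)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("2011-04-01", "AAA", "BBB", -120, -130, 8.5, 5, 3, "closer used two straight days"),
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
print(row_count)
```

                                                  Keeping one row per game with the lines and results side by side makes later backtesting queries much simpler than working from separate sheets.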
                                                  • sharpcat
                                                    Restricted User
                                                    • 12-19-09
                                                    • 4516

                                                    #26
                                                    Originally posted by illfuuptn
                                                    Yeah, but how could I just "import" years' worth of baseball results and statistics into Excel in a fast, readable format? And then how could I use that information to run backtests on specific algorithms to determine if I'm profitable?
                                                    You are in way over your head trying to model MLB if you do not know the answers to these questions.
                                                    • illfuuptn
                                                      SBR MVP
                                                      • 03-17-10
                                                      • 1860

                                                      #27
                                                      Originally posted by sharpcat
                                                      You are in way over your head trying to model MLB if you do not know the answers to these questions.
                                                      Hence why I'm asking these questions. We all have to start somewhere.
                                                      • Flight
                                                        Restricted User
                                                        • 01-28-09
                                                        • 1979

                                                        #28
                                                        You need to start with data. You should have a program that can automatically import the historic and current boxscores into your database.

                                                        You're almost out of time for MLB. Give yourself time to properly develop and test before deploying.
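
                                                        For a sense of what testing before deploying can look like, here is a minimal flat-stake backtest sketch in Python. It assumes you already have, for each historical game your rules would have bet, the American odds taken and whether the bet won; the bet history shown is hypothetical.

```python
def profit_per_unit(american_odds, won):
    """Profit on a 1-unit stake at American odds."""
    if not won:
        return -1.0
    if american_odds > 0:
        return american_odds / 100.0
    return 100.0 / -american_odds


def backtest(bets):
    """Return (total profit, ROI) for flat 1-unit bets.

    `bets` is a list of (american_odds, won) pairs that a strategy
    would have placed on historical games.
    """
    profit = sum(profit_per_unit(odds, won) for odds, won in bets)
    roi = profit / len(bets) if bets else 0.0
    return profit, roi


# Hypothetical bet history, for illustration only.
history = [(-120, True), (150, False), (-110, True), (130, True), (-140, False)]
profit, roi = backtest(history)
print(f"profit: {profit:+.2f} units, ROI: {roi:+.1%}")
```

                                                        Holding back the most recent seasons, so the rules are never tuned on the same games they are scored on, gives a more honest read than a single pass over all the data.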
                                                        • suicidekings
                                                          SBR Hall of Famer
                                                          • 03-23-09
                                                          • 9962

                                                          #29
                                                          A starting point for data:


                                                          Complete source for baseball history including complete major league player, team, and league stats, awards, records, leaders, rookies and scores.



                                                          As was stated above, it's not feasible for you to expect to be guided through the model building process step by step. The data is available in CSV format, which can be imported into Excel easily. It's up to you to learn how to use the program to manage the databases. As for the actual process of building the model, you need to come up with that on your own as well.
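
                                                          If you later move beyond Excel, reading a CSV file is similarly short in Python. This is only a sketch: the in-memory text stands in for a downloaded file, and the column names and rows are assumptions, not the layout of any particular data source.

```python
import csv
import io

# Stand-in for a downloaded file; the column names are assumptions.
raw_csv = io.StringIO(
    "date,home,away,home_runs,away_runs\n"
    "2011-04-01,AAA,BBB,5,3\n"
    "2011-04-02,BBB,AAA,2,4\n"
)


def load_games(handle):
    """Read game rows from a CSV file object, converting runs to integers."""
    games = []
    for row in csv.DictReader(handle):
        row["home_runs"] = int(row["home_runs"])
        row["away_runs"] = int(row["away_runs"])
        games.append(row)
    return games


games = load_games(raw_csv)
print(len(games), games[0]["home"], games[0]["home_runs"])
```

                                                          With a real file, the same function works on `open("games.csv", newline="")`, where the file name is a placeholder.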

                                                          Start by looking back through the Thinktank for the dozens of threads that already exist about model building and do some reading. I can think of several very helpful threads that are in there, particularly those written by a poster named Ganchrow that explain many fundamental principles.
                                                          Last edited by suicidekings; 02-14-11, 03:17 AM.
                                                          • illfuuptn
                                                            SBR MVP
                                                            • 03-17-10
                                                            • 1860

                                                            #30
                                                            So am I correct about the following? I need a database (like MySQL) to house all of the information I find, information which I scrape from the web using code I write (in Java?). What exactly do I use to run tests on the data I find? Is that also done in MySQL or Java, or is that something completely different?
                                                            • Dark Horse
                                                              SBR Posting Legend
                                                              • 12-14-05
                                                              • 13764

                                                              #31
                                                              Originally posted by illfuuptn
                                                              Hence why I'm asking these questions. We all have to start somewhere.
                                                              You only need one question to get started. When you find the answer, by actually starting somewhere, you may come across another question. The questions lead the way, and as you find the answers you become more empowered and certain in your ways. The title of this thread is not hopeful when it comes to your ability to think for yourself.
                                                              • roasthawg
                                                                SBR MVP
                                                                • 11-09-07
                                                                • 2990

                                                                #32
                                                                Excel has gotten me through years of modeling. Only recently have I run into the issue of memory overload. Excel is the way to go in my experience for sure.
                                                                • Borat38
                                                                  SBR High Roller
                                                                  • 10-15-10
                                                                  • 177

                                                                  #33
                                                                  What about SQL, guys? Am fine w/ Excel, but in terms of pulling up data with certain criteria, would SQL be a good choice for a non-programmer like me?
                                                                  • ManBearPig
                                                                    SBR MVP
                                                                    • 12-04-08
                                                                    • 2473

                                                                    #34
                                                                    Originally posted by Borat38
                                                                    What about SQL, guys? Am fine w/ Excel, but in terms of pulling up data with certain criteria, would SQL be a good choice for a non-programmer like me?
                                                                    Unless you're well versed in SQL, SQL by itself won't be much use to you, because once you have the data you have to know how you want to mine it. I guess you could create a bunch of views, use those as secondary datasets, and pull results based on them, but it would be really time-consuming.

                                                                    The reason most suggest using a language like VB or C# in conjunction is that you need a stack that lets you not only query the data but also do something with it: usually applying logic via functions/procedures to get some real work done. There really is almost no limit to which languages you can use if you know how to use them correctly and like playing architect.

                                                                    Excel is good to start with because you can use it as a DB, although a limited one, and it can do some calculations on this data using Excel VB, all stacked into a single application. Personally I'd rather use something like Oracle and PL/SQL, but for that you need a more solid, beyond-basic understanding of DBs. You also need the time to do it right and set everything up. With Access or Excel you can do it on the fly with minimal programming knowledge and effort...that's what I did. In the off-season, I'm going to move it to something more useful to me.
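
                                                                    To make the "query plus logic" point concrete, here is a small sketch that pairs SQL with Python rather than VB or C# (a substitution made only for illustration). SQL selects the games that match a criterion, and the surrounding function does the counting. The table layout and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (home TEXT, away TEXT, closing_line INTEGER, home_won INTEGER)"
)
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?)",
    [
        ("AAA", "BBB", -150, 1),
        ("AAA", "CCC", 120, 0),
        ("BBB", "CCC", -135, 0),
        ("CCC", "AAA", -110, 1),
    ],
)


def home_favorite_record(connection, max_line):
    """Return (wins, losses) for home teams priced at `max_line` or heavier.

    The SQL does the filtering (for example, closing_line <= -130);
    the counting logic lives in Python.
    """
    rows = connection.execute(
        "SELECT home_won FROM games WHERE closing_line <= ?", (max_line,)
    ).fetchall()
    wins = sum(won for (won,) in rows)
    return wins, len(rows) - wins


wins, losses = home_favorite_record(conn, -130)
print(wins, losses)
```

                                                                    The split matters less at this size, but once the logic grows beyond counting, keeping it in functions is easier to test than stacking views.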
                                                                    • bolekblues
                                                                      SBR Sharp
                                                                      • 12-06-08
                                                                      • 420

                                                                      #35
                                                                      I have had some success with Excel as well, though I have an NBA database, not MLB. I have to agree that if you can use it properly (advanced search functions, if-clauses, maybe some simple macros), you do not really need any other program. Good luck.