1. #1
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    1,000,000 questions for model builders...

    Hello all. I want to build a model for the upcoming baseball season and I literally have NO IDEA where to start. I read Justin7's book and it was very helpful, but it still doesn't change the fact that I know nothing about modelling software and such. So here are my many questions (not necessarily in order).

    1.) What modeling program do I use? Is Free Pascal good or is there a better one?
    2.) How do I learn how to program? Is that what I'll be doing in Pascal, programming?
    3.) Where can I find a place that has historical lines (opening through closing) for MLB? How do I put this information into my model?
    4.) Where can I find a historical database that has the correct lineups for each different game for the last 5 seasons or so? Again, how would I get this information into my model?

    Many more questions to come once I think of them and/or have questions about any replies.

  2. #2
    dr_wolf
    SBR PRO
    Join Date: 07-20-10
    Posts: 417
    Betpoints: 19055

    What do you want to obtain from this software?

  3. #3
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    Quote Originally Posted by dr_wolf View Post
    What do you want to obtain from this software?
    I want to build a model to use for handicapping the MLB.

  4. #4
    FourLengthsClear
    King of the Idiots
    Join Date: 12-29-10
    Posts: 3,808
    Betpoints: 508

    On the basis that you are a complete beginner in terms of modeling, it is probably best to start with MS Excel or MS Access, neither of which needs any programming ability (Python is another option, but it does mean learning to program). Once you have an idea of what your key parameters/variables are, you can start to look at more advanced software and techniques.

    A large part of modeling successfully is correctly structuring your data. You have asked for historic sources on line movements and team line-ups. Do you have any idea yet what you want to do with this information?

  5. #5
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    Quote Originally Posted by FourLengthsClear View Post
    On the basis that you are a complete beginner in terms of modeling, it is probably best to start with MS Excel or MS Access, neither of which needs any programming ability (Python is another option, but it does mean learning to program). Once you have an idea of what your key parameters/variables are, you can start to look at more advanced software and techniques.

    A large part of modeling successfully is correctly structuring your data. You have asked for historic sources on line movements and team line-ups. Do you have any idea yet what you want to do with this information?
    I would use the line movements, at first, to essentially grade the strength of my plays based on the opinions of sharps. I want to have team lineups and updated stats to construct rolling power rankings for each team as the season progresses. Any further suggestions would be much appreciated.

  6. #6
    hubric
    Join Date: 02-07-11
    Posts: 2

    Just echoing the suggestion of starting with Excel.
    The cell-based layout will make it easier to start getting results without having to learn about debugging. In tandem with that adventure, try to duplicate your results in whatever language seems interesting to you, to get up the learning curve.

  7. #7
    specialronnie29
    Join Date: 09-19-10
    Posts: 140

    you're in over your head

    start with props

  8. #8
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    Quote Originally Posted by specialronnie29 View Post
    you're in over your head

    start with props
    I'm obviously in over my head. That's why I'm asking for help.

  9. #9
    YorkHunt
    I JUST RELEASED
    Join Date: 12-11-10
    Posts: 7,496
    Betpoints: 1507

    How well do you know Excel?

  10. #10
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    Quote Originally Posted by YorkHunt View Post
    How well do you know Excel?
    I know Excel pretty well, but how effective can Excel really be when it comes to having data constantly imported into my spreadsheets?

  11. #11
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    Quote Originally Posted by illfuuptn View Post
    I know Excel pretty well, but how effective can Excel really be when it comes to having data constantly imported into my spreadsheets?
    Excel is so good these days you can prototype an entire modeling approach in a linked set of sheets if not one sheet WITHOUT having to program in VB. This means you don't have to necessarily be a "power" user, but you will need a good grounding in statistics and the Excel functions.

    Never before has so much power been placed in the hands of the layman with computers.

  12. #12
    jamesbettor
    Join Date: 12-11-10
    Posts: 17

    May I suggest looking through Baseball Reference - http://www.baseball-reference.com/ - for historical line-ups and stats.

    For example, 2010 Box Scores for Boston Red Sox can be found here - http://www.baseball-reference.com/te...e-scores.shtml - which includes line-ups.

    Example of a Box Score from last season - http://www.baseball-reference.com/bo...01004040.shtml

    Hope that helps. Good luck.

  13. #13
    u21c3f6
    Join Date: 01-17-09
    Posts: 790
    Betpoints: 5198

    Excel can be used very effectively, but IMO MS Access is worth learning in order to set up databases with more functionality.

    Joe.

  14. #14
    dr_wolf
    SBR PRO
    Join Date: 07-20-10
    Posts: 417
    Betpoints: 19055

    Excel is very good, but it depends on what you need. For example, how do you solve a system of equations in Excel?

  15. #15
    mebaran
    Con los terroristas
    Join Date: 09-16-09
    Posts: 1,540
    Betpoints: 330

    There is some really good information in tech forums around the web. Just poke your head in with a question and you'll get a ton of feedback. Also, if you do use Excel, you can contact Microsoft if need be.

  16. #16
    Buried_PIRATE
    Shaved my chest.
    Join Date: 12-28-10
    Posts: 546

    I can crush VB and I know how to use regressions using STATA/SAS etc but how the hell do you even begin to model baseball?

    Scenario: Red Sox vs Twins, Lester on the hill on short rest vs Blackburn on full rest, a couple of minor injuries, Papelbon used on back-to-back days... prediction, go!

    Or am I missing something?

  17. #17
    Kaplan
    Join Date: 01-15-11
    Posts: 165
    Betpoints: 865

    Quote Originally Posted by Buried_PIRATE View Post
    I can crush VB and I know how to use regressions using STATA/SAS etc but how the hell do you even begin to model baseball?

    Scenario: Red Sox vs Twins, Lester on the hill on short rest vs Blackburn on full rest, a couple of minor injuries, Papelbon used on back-to-back days... prediction, go!

    Or am I missing something?
    I've been looking around the web trying to decide on a software product that can do regressions. My computer abilities are limited to Excel (pretty good). Do I have a chance of learning to do regressions with Stata, given my limited exposure?

    Thanks.

  18. #18
    Buried_PIRATE
    Shaved my chest.
    Join Date: 12-28-10
    Posts: 546

    Good thread to bump. Hopefully some senior model builders can weigh in on my previous question.

    @Kaplan: Yes, I think Stata would be a fine program. The computer resources required to run it are low and basic functionality is pretty easy to understand. I think the more important thing to understand is what regressions actually mean / do. I would recommend you check out an introductory text in econometrics or something along those lines.

  19. #19
    suicidekings
    Update your status
    Join Date: 03-23-09
    Posts: 9,962

    Quote Originally Posted by Buried_PIRATE View Post
    I can crush VB and I know how to use regressions using STATA/SAS etc but how the hell do you even begin to model baseball?

    Scenario: Red Sox vs Twins, Lester on the hill on short rest vs Blackburn on full rest, a couple of minor injuries, Papelbon used on back-to-back days... prediction, go!

    Or am I missing something?
    In general, that's correct. Handicapping a particular game would involve:

    Starting pitcher A vs Hitters B + Relief Pitching A vs Team B = Team Total B
    Starting pitcher B vs Hitters A + Relief Pitching B vs Team A = Team Total A
    Use respective runs scored to calculate the total and no-vig moneyline

    You'd have to know: how many innings each starter is expected to pitch, how rested each bullpen is (this affects different teams by different amounts), the exact starting lineup for each team and their expected performance both in this game and against LHP/RHP, Ballpark rating (hitters park vs pitchers park), weather, etc.

    There are a TON of stats in baseball, and some are much better assessments of ability than others. Basically, what I'm saying is that describing the process is very simple, but building the model is not. The measure of your success in modeling will be how well you assemble that massive collection of stats and extract the useful information into something that accurately describes what will happen in a particular game.

    Rule #1 for computer modeling: Garbage in, Garbage out.
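
    As a rough sketch of the last step in the outline above (turning two projected team totals into a game total and a no-vig moneyline), the Python below uses the Pythagorean expectation with an exponent of 1.83. That conversion is an assumption made here for illustration, not something stated in this thread, and it was designed for season-level run totals, so treat the single-game output as a first approximation. The function names and the projected team totals are hypothetical.

```python
def win_probability(runs_for: float, runs_against: float, exponent: float = 1.83) -> float:
    """Pythagorean expectation: estimated chance of winning given projected runs.

    Assumption: the season-level formula is reused for a single game.
    """
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


def no_vig_moneyline(prob: float) -> int:
    """Convert a win probability (0 < prob < 1) into a fair American moneyline."""
    if prob >= 0.5:
        # Favorite: negative line, the amount risked to win 100.
        return round(-100 * prob / (1 - prob))
    # Underdog: positive line, the amount won per 100 risked.
    return round(100 * (1 - prob) / prob)


# Hypothetical projected team totals (runs) for one game.
team_total_a = 4.6
team_total_b = 4.1

game_total = team_total_a + team_total_b
p_a = win_probability(team_total_a, team_total_b)
line_a = no_vig_moneyline(p_a)
line_b = no_vig_moneyline(1 - p_a)

print(f"Projected total: {game_total:.1f}")
print(f"Team A win probability: {p_a:.3f}, fair line {line_a:+d}")
print(f"Team B win probability: {1 - p_a:.3f}, fair line {line_b:+d}")
```

    The hard part is still producing the two team totals. This only shows that once you have them, the pricing step is a few lines of arithmetic.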

  20. #20
    Boner_18
    Update your status
    Join Date: 08-24-08
    Posts: 8,301
    Betpoints: 1031

    I was hoping someone could better explain to me how to create B multipliers for Smyth's BaseRuns equation(s). I know the equation to use (essentially, solve for the multipliers), but I am having trouble conceptualizing what that means, what data to use, etc. Any info/discussion would be much appreciated.
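
    For reference, one way to read "solving for the multiplier" can be sketched in Python. This assumes one commonly cited version of the BaseRuns formula, BsR = A*B/(B + C) + D, with A = H + BB - HR, C = AB - H, D = HR and a raw B of 1.4*TB - 0.6*H - 3*HR + 0.1*BB. Under that assumption, the multiplier is whatever scale factor on the raw B makes the formula reproduce the actual runs scored for the sample you choose (a league-season, for instance). The function names are illustrative and the league totals are hypothetical round numbers, not real data.

```python
def base_runs_factors(ab, h, tb, hr, bb):
    """Return the A, raw B, C and D factors of the assumed BaseRuns version."""
    a = h + bb - hr                                 # baserunners
    b_raw = 1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb  # advancement, before scaling
    c = ab - h                                      # outs
    d = hr                                          # runs that score automatically
    return a, b_raw, c, d


def solve_b_multiplier(ab, h, tb, hr, bb, runs):
    """Find the scale on raw B that makes BaseRuns equal the actual runs.

    Rearranging runs = A*B/(B + C) + D gives
    B = (runs - D) * C / (A - (runs - D)).
    """
    a, b_raw, c, d = base_runs_factors(ab, h, tb, hr, bb)
    needed_b = (runs - d) * c / (a - (runs - d))
    return needed_b / b_raw


def base_runs(ab, h, tb, hr, bb, b_multiplier):
    """Estimate runs using a previously fitted B multiplier."""
    a, b_raw, c, d = base_runs_factors(ab, h, tb, hr, bb)
    b = b_raw * b_multiplier
    return a * b / (b + c) + d


# Hypothetical league-season totals (illustrative only).
league = dict(ab=165000, h=42500, tb=67000, hr=4600, bb=15800)
league_runs = 21000

multiplier = solve_b_multiplier(runs=league_runs, **league)
check = base_runs(b_multiplier=multiplier, **league)
print(f"B multiplier: {multiplier:.4f}")
print(f"BaseRuns with fitted multiplier: {check:.1f} (actual {league_runs})")
```

    Once fitted on league totals, the same multiplier would be applied to individual team or lineup stat lines from that run environment.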

  21. #21
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    All of these follow-up questions are great and I hope some people can answer those questions as well, but the answers I need are much simpler.
    How do I build a model? What program do I use? How do I learn the programming language (no experience whatsoever) for said program?

  22. #22
    EXhoosier10
    Join Date: 07-06-09
    Posts: 3,122
    Betpoints: 4390

    Build a model - import data into your spreadsheet (you can use excel web queries)
    Program - pick the one you're most familiar with. If you know excel, you can use excel. Otherwise, proceed to next step
    How to learn - go to your local library, rent "Blank Blank for Dummies", sit down next to your computer with said book, open to page 1, and begin reading.
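
    If you end up outside Excel, the "import data" step might look something like this minimal Python sketch. The column names and the three sample rows are hypothetical. A real file from one of the stats sites will have its own layout, so the field names would need to change to match it.

```python
import csv
import io

# Hypothetical sample of a results file: one row per game.
SAMPLE_CSV = """date,away,home,away_runs,home_runs
2010-04-05,AAA,BBB,3,5
2010-04-06,AAA,BBB,6,4
2010-04-07,CCC,DDD,2,7
"""


def load_games(handle):
    """Read game rows from an open CSV file and convert run counts to integers."""
    games = []
    for row in csv.DictReader(handle):
        row["away_runs"] = int(row["away_runs"])
        row["home_runs"] = int(row["home_runs"])
        games.append(row)
    return games


def home_win_rate(games):
    """Fraction of games won by the home team."""
    home_wins = sum(1 for g in games if g["home_runs"] > g["away_runs"])
    return home_wins / len(games)


# io.StringIO stands in for a file; with a real file use open("games.csv", newline="").
games = load_games(io.StringIO(SAMPLE_CSV))
print(f"Loaded {len(games)} games, home win rate {home_win_rate(games):.3f}")
```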

  23. #23
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    Quote Originally Posted by EXhoosier10 View Post
    Build a model - import data into your spreadsheet (you can use excel web queries)
    Program - pick the one you're most familiar with. If you know excel, you can use excel. Otherwise, proceed to next step
    How to learn - go to your local library, rent "Blank Blank for Dummies", sit down next to your computer with said book, open to page 1, and begin reading.
    Yeah, but how could I just "import" years' worth of baseball results and statistics into Excel in a fast, readable format? And then how could I use that information to run backtesting on specific algorithms to determine if I'm profitable?

  24. #24
    EXhoosier10
    Join Date: 07-06-09
    Posts: 3,122
    Betpoints: 4390

    http://www.excelforum.com/excel-gene...-scraping.html

    If you choose MS Excel (I'm not sure if this is the best way, but it's the only way I know), google "web scraping +excel" and that should get you started. Then learn Visual Basic (http://www.google.com/search?aq=f&so...ic+for+dummies) to collect it all.

  25. #25
    Buried_PIRATE
    Shaved my chest.
    Join Date: 12-28-10
    Posts: 546

    Quote Originally Posted by illfuuptn View Post
    Yeah, but how could I just "import" years' worth of baseball results and statistics into Excel in a fast, readable format? And then how could I use that information to run backtesting on specific algorithms to determine if I'm profitable?
    I know you can backtest on Covers; I have been doing it for the NBA.

    If you are really serious about it I think you have to start your own database in MS Access...

    You would want to include all the basics: opening line, totals, relevant injuries (if you can track them), back-to-back games, important game notes, etc.
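
    To make the idea concrete, here is a minimal sketch of such a database and a first backtest, written in Python with SQLite standing in for MS Access. That substitution, the table layout, the column names and the four sample rows are all assumptions for illustration. The "system" tested (flat 100-unit bets on every home underdog at the opening line) is just a placeholder rule.

```python
import sqlite3


def profit_per_100(moneyline: int, won: bool) -> float:
    """Profit on a flat 100-unit stake at an American moneyline."""
    if not won:
        return -100.0
    if moneyline > 0:
        return float(moneyline)          # underdog: win the line amount
    return 100.0 * 100.0 / -moneyline    # favorite: win 100 per |line| risked


conn = sqlite3.connect(":memory:")  # use a file path instead to keep the data
conn.execute(
    """
    CREATE TABLE games (
        game_date         TEXT,
        away              TEXT,
        home              TEXT,
        opening_line_home INTEGER,  -- home team's opening moneyline
        total             REAL,     -- opening over/under
        home_won          INTEGER,  -- 1 if the home team won, else 0
        notes             TEXT      -- injuries, back-to-backs, etc.
    )
    """
)

# Hypothetical rows, not real games or lines.
rows = [
    ("2010-04-05", "AAA", "BBB", 120, 8.5, 1, ""),
    ("2010-04-06", "AAA", "BBB", -130, 9.0, 1, "closer used two days running"),
    ("2010-04-07", "CCC", "DDD", 105, 7.5, 0, ""),
    ("2010-04-08", "CCC", "DDD", 140, 8.0, 0, "starter on short rest"),
]
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Placeholder system: bet every home underdog at the opening line.
bets = conn.execute(
    "SELECT opening_line_home, home_won FROM games WHERE opening_line_home > 0"
).fetchall()
profit = sum(profit_per_100(line, bool(won)) for line, won in bets)
print(f"{len(bets)} bets, net profit {profit:+.1f} units")
```

    Swapping the WHERE clause for a different rule is all it takes to test another angle against the same table.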

    I mean... it isn't easy, but you can bet that Vegas is doing similar things and then some, with many more people looking at it than just yourself.

  26. #26
    sharpcat
    Join Date: 12-19-09
    Posts: 4,516

    Quote Originally Posted by illfuuptn View Post
    Yeah, but how could I just "import" years' worth of baseball results and statistics into Excel in a fast, readable format? And then how could I use that information to run backtesting on specific algorithms to determine if I'm profitable?
    You are in way over your head trying to model MLB if you do not know the answers to these questions.

  27. #27
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    Quote Originally Posted by sharpcat View Post
    You are in way over your head trying to model MLB if you do not know the answers to these questions.
    Hence why I'm asking these questions. We all have to start somewhere.

  28. #28
    Flight
    Update your status
    Join Date: 01-27-09
    Posts: 1,979

    You need to start with data. You should have a program that can automatically import the historic and current boxscores into your database.

    You're almost out of time for MLB. Give yourself time to properly develop and test before deploying.

  29. #29
    suicidekings
    Update your status
    Join Date: 03-23-09
    Posts: 9,962

    A starting point for data:

    http://www.baseballprospectus.com/statistics/sortable/
    http://www.baseball-reference.com/
    http://www.retrosheet.org/

    As was stated above, it's not feasible for you to expect to be guided through the model building process step by step. The data is available in CSV format, which can be imported into Excel easily. It's up to you to learn how to use the program to manage the databases. As for the actual process of building the model, you need to come up with that on your own as well.

    Start by looking back through the Thinktank for the dozens of threads that already exist about model building and do some reading. I can think of several very helpful threads that are in there, particularly those written by a poster named Ganchrow that explain many fundamental principles.
    Last edited by suicidekings; 02-14-11 at 02:17 AM.

  30. #30
    illfuuptn
    Join Date: 03-17-10
    Posts: 1,860

    So am I correct about the following? I need a database (like MySQL) to house all of the information I find, information which I scrape from the web using code I write myself (in Java?). What exactly do I use to run tests on the data I find? Is that also done in MySQL or Java, or is that completely different?

  31. #31
    Dark Horse
    Deus Ex Machina
    Join Date: 12-14-05
    Posts: 13,764

    Quote Originally Posted by illfuuptn View Post
    Hence why I'm asking these questions. We all have to start somewhere.
    You only need one question to get started. When you find the answer, by actually starting somewhere, you may come across another question. The questions lead the way, and as you find the answers you become more empowered and certain in your ways. The title of this thread is not hopeful, where it comes to your ability to think for yourself.

  32. #32
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Excel has gotten me through years of modeling. Only recently have I run into the issue of memory overload. Excel is the way to go in my experience for sure.

  33. #33
    Borat38
    Join Date: 10-15-10
    Posts: 177
    Betpoints: 132

    What about SQL, guys? Am fine w/ Excel, but in terms of pulling up data with certain criteria, would SQL be a good choice for a non-programmer like me?

  34. #34
    ManBearPig
    Join Date: 12-04-08
    Posts: 2,473

    Quote Originally Posted by Borat38 View Post
    What about SQL, guys? Am fine w/ Excel, but in terms of pulling up data with certain criteria, would SQL be a good choice for a non-programmer like me?
    Unless you're well versed in SQL... SQL by itself won't be much use to you, because once you have the data you have to know how you want to mine it. I guess you could create a bunch of views, use those as secondary datasets and pull results based on them, but it would be really time consuming.

    The reason why most suggest using a language like VB or C# in conjunction is that you need a stack that will allow you to not only query the data but also do something with it...usually apply logic via functions/procedures and get some real work done. There really is almost no limit as to what languages you can use if you know how to use them correctly and like playing architect.

    Excel is good to start with because you can use it as a DB, although a limited one, and it can do some calculations on this data using Excel VB, so everything can be stacked into a single application. Personally I'd rather use something like Oracle and PL/SQL, but for that you have to have a solid understanding of DBs at a more than basic level. You also need to have the time to do it right and set everything up. With Access or Excel you can do it on the fly with minimal programming knowledge and effort... that's what I did. In the off-season, I'm going to move it to something more useful to me.

  35. #35
    bolekblues
    Join Date: 12-06-08
    Posts: 420

    I have had some success with Excel as well, though I have an NBA database, not MLB. I have to agree that if you can use it properly (advanced search functions, if-clauses, maybe some simple macros) you do not really need any other program. Good luck.
