1. #1
    lyon804
    Join Date: 11-02-09
    Posts: 6,526
    Betpoints: 1011

    Computer Modeling and Data Mining

    I am looking to build a database and create models that do basically what I have been doing manually for the last 10+ years. I enjoy handicapping and the challenge of it, but I see the need to branch out into something more efficient. My goal is to put what I know into a model to find more +EV opportunities across a large number of bets. I still work a traditional job, but I am self-employed, which allows more time and freedom to do what I really enjoy: sports handicapping.


    I respect many of you here and know that you are well past what I am trying to accomplish with modeling and data mining. I respect those who already have this built and, as I understand it, are grinding out profits over several sports.


    Without giving me the golden egg, please tell me some general things. For example: Where did you get your data? How far back is relevant/irrelevant? What kind of software or program did you build, or did you buy something canned?


    Any general information would be much appreciated.



    Thanks in advance to all that contribute to the think tank.

  2. #2
    MrX
    Join Date: 01-10-06
    Posts: 1,540

    Quote Originally Posted by lyon804 View Post
    Without giving me the golden egg, please tell me some general things. For example: Where did you get your data?
    That depends on the sport and what kind of data you want. For baseball game-data, look no further than retrosheet.org. Most other sports present more of a challenge. You're going to have to learn how to scrape websites to get the more useful data.

    Quote Originally Posted by lyon804 View Post
    How far back is relevant/irrelevant?
    That's something you have to work out on your own. Regression analysis is your friend. It's going to vary from stat to stat. Also, you probably want to think more in terms of how relevant is data from last game and how relevant is data from 30 games ago, etc., rather than a cutoff point between relevant and irrelevant.
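
    A minimal sketch of the "last game vs. 30 games ago" idea, assuming an exponential decay (a game k games back gets weight decay**k). The decay scheme, the function names, and the candidate decay values are illustrative assumptions, not MrX's method; the backtest simply picks whichever candidate predicted each next game best on your own data.

```python
from typing import Sequence


def recency_weighted_mean(values: Sequence[float], decay: float) -> float:
    """Average of values (oldest first); a game k games back gets weight decay**k."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def backtest_error(values: Sequence[float], decay: float, warmup: int = 5) -> float:
    """Mean squared error when each game is predicted from the games before it."""
    if len(values) <= warmup:
        raise ValueError("need more games than the warmup period")
    errors = [
        (recency_weighted_mean(values[:i], decay) - values[i]) ** 2
        for i in range(warmup, len(values))
    ]
    return sum(errors) / len(errors)


def best_decay(values: Sequence[float],
               candidates: Sequence[float] = (0.5, 0.7, 0.8, 0.9, 0.95, 1.0),
               warmup: int = 5) -> float:
    """Return the candidate decay with the lowest backtest error."""
    return min(candidates, key=lambda d: backtest_error(values, d, warmup))
```

    A decay of 1.0 is a plain average (every game counts equally); smaller values lean harder on recent games. Expect the best value to differ from stat to stat.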


    Quote Originally Posted by lyon804 View Post
    What kind of software or program did you build or did you buy something canned?
    If by "something canned" you mean commercial handicapping software, I wouldn't recommend it. If you mean database and statistical software packages, then Excel, R, Stata, and MySQL are all valuable tools.

    Personally, I use Visual Studio (VB.NET) and a MySQL database, but just about any combination of programming environment and database will work.
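
    To make the "programming environment plus database" combination concrete, here is a rough sketch that uses Python with the built-in sqlite3 module as a stand-in for the VB.NET/MySQL setup described above. The table layout, column names, and helper functions are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical one-table schema; a real database would likely add odds, lines, etc.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    game_id    INTEGER PRIMARY KEY,
    game_date  TEXT NOT NULL,
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL,
    home_score INTEGER NOT NULL,
    away_score INTEGER NOT NULL,
    UNIQUE (game_date, home_team, away_team)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the games table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_game(conn, game_date, home_team, away_team, home_score, away_score):
    """Insert one result; a repeat of the same date and matchup is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO games "
        "(game_date, home_team, away_team, home_score, away_score) "
        "VALUES (?, ?, ?, ?, ?)",
        (game_date, home_team, away_team, home_score, away_score),
    )
    conn.commit()


def home_win_rate(conn, team):
    """Fraction of a team's home games that it won, or None if it has none."""
    row = conn.execute(
        "SELECT AVG(home_score > away_score) FROM games WHERE home_team = ?",
        (team,),
    ).fetchone()
    return row[0]
```

    The UNIQUE constraint means re-running a loader over the same results does not create duplicate rows.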

  3. #3
    lyon804
    Join Date: 11-02-09
    Posts: 6,526
    Betpoints: 1011

    Quote Originally Posted by MrX View Post
    That depends on the sport and what kind of data you want. For baseball game-data, look no further than retrosheet.org. Most other sports present more of a challenge. You're going to have to learn how to scrape websites to get the more useful data.


    Thank you! A lot of what you are talking about I have discussed with my father. He can copy and paste past results from a website such as covers.com into Word and write a program to scrape or load them into a database. He did this for horse racing handicapping. Is that what you were referring to?



    Quote Originally Posted by MrX View Post
    That's something you have to work out on your own. Regression analysis is your friend. It's going to vary from stat to stat. Also, you probably want to think more in terms of how relevant is data from last game and how relevant is data from 30 games ago, etc., rather than a cutoff point between relevant and irrelevant.


    Good point.




    Quote Originally Posted by MrX View Post
    If by "something canned" you mean commercial handicapping software, I wouldn't recommend it. If you mean database and statistical software packages, then Excel, R, Stata, and MySQL are all valuable tools.

    Personally, I use Visual Studio (VB.NET) and a MySQL database, but just about any combination of programming environment and database will work.


    You are spot on with that. That is exactly what I was saying.



    MrX, I appreciate your insight. Thank you for taking the time to point me in the right direction.

  4. #4
    MrX
    Join Date: 01-10-06
    Posts: 1,540

    Quote Originally Posted by lyon804 View Post
    Thank you! A lot of what you are talking about I have discussed with my father. He can copy and paste past results from a website such as covers.com into Word and write a program to scrape or load them into a database. He did this for horse racing handicapping. Is that what you were referring to?
    Anything that involves manual copy/paste is going to end up being very time-consuming, though it could get the job done. There is a sticky-thread in the Think Tank with an excellent intro to data-scraping.

  5. #5
    whitey
    SBROdds Developer
    Join Date: 04-01-08
    Posts: 485
    Betpoints: 7175

    If you have any programming experience and know a little about XPath, you can scrape any website easily.

    There is a tool called Tidy that will convert any HTML document into XML. Once the website is in XML, it's very simple to scrape any of the information.
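
    A small sketch of the XPath step, assuming the page has already been run through Tidy and saved as well-formed XML. The sample markup, the table id, and the column order are made up for illustration, and the XHTML namespace that Tidy's XML output normally carries is left out to keep the paths short. It uses Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical Tidy output for a scores page (namespace omitted for brevity).
SAMPLE_XHTML = """<html>
  <body>
    <table id="scores">
      <tr><th>Away</th><th>Home</th><th>Away score</th><th>Home score</th></tr>
      <tr><td>NYY</td><td>BOS</td><td>7</td><td>9</td></tr>
      <tr><td>TB</td><td>BAL</td><td>4</td><td>3</td></tr>
    </table>
  </body>
</html>"""


def extract_scores(xhtml):
    """Return (away, home, away_score, home_score) tuples from the scores table."""
    root = ET.fromstring(xhtml)
    rows = []
    # Limited XPath: any <table> with id="scores", then its direct <tr> children.
    for tr in root.findall(".//table[@id='scores']/tr"):
        cells = [td.text for td in tr.findall("td")]
        if len(cells) == 4:  # the header row has <th> cells, so it is skipped
            away, home, away_score, home_score = cells
            rows.append((away, home, int(away_score), int(home_score)))
    return rows
```

    Raw HTML from a live site usually will not parse this way, which is the gap the Tidy step is meant to fill.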
    Points Awarded:

    Maverick22 gave whitey 20 SBR Point(s) for this post.


  6. #6
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Quote Originally Posted by whitey View Post
    If you have any programming experience and know a little about XPath, you can scrape any website easily.

    There is a tool called Tidy that will convert any HTML document into XML. Once the website is in XML, it's very simple to scrape any of the information.
    I owe you a beer (prolly a keg's worth) for this one whitey... Thanks bro!

    Can I buy a keg of beer with betpoints? if so, i'll fwd you the points

  7. #7
    lyon804
    Join Date: 11-02-09
    Posts: 6,526
    Betpoints: 1011

    Quote Originally Posted by whitey View Post
    If you have any programming experience and know a little about XPath, you can scrape any website easily.

    There is a tool called Tidy that will convert any HTML document into XML. Once the website is in XML, it's very simple to scrape any of the information.


    Outstanding Whitey! Thanks for sharing this information.

  8. #8
    whitey
    SBROdds Developer
    Join Date: 04-01-08
    Posts: 485
    Betpoints: 7175

    Quote Originally Posted by Maverick22 View Post
    I owe you a beer (prolly a keg's worth) for this one whitey... Thanks bro!

    Can I buy a keg of beer with SBR points? if so, i'll fwd you the points
    I'm not very savvy with sports betting, but if you have any questions about programming I probably have an answer. Mostly C# though; I don't know a lot about other languages.

  9. #9
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Quote Originally Posted by whitey View Post

    I'm not very savvy with sports betting, but if you have any questions about programming I probably have an answer. Mostly C# though; I don't know a lot about other languages.
    Nah, no questions. It's just that the tool sounds like it may be worth its weight in gold... If there is one thing I can do (and till the cows come home), it's parse some XML... Imma fire it at some sites and see how it works... It may save me a few weeks of programming...

    Hahah unless you know how to program device drivers...then I have a gillion questions...But I digress from the original post's intent...

  10. #10
    pedro803
    Join Date: 01-02-10
    Posts: 309
    Betpoints: 5708

    Has anyone tried the application named Tidy referenced above? If you have, has it made scraping easier for you?

  11. #11
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    lyon, scraping one data set is one thing, but they all have dropouts and errors, are too short, don't have all the right data, etc. Swapping with others can get you up the curve faster. I typically work on a one-for-one swap arrangement. Any interest? If so, PM me.
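
    One way to catch the dropouts and errors mentioned above before they reach a model is a quick sanity pass over each scraped set. This is only a sketch: the field names, the sample rows, and the three checks (missing fields, duplicate games, negative scores) are assumptions for illustration, and real data will need sport-specific checks on top.

```python
REQUIRED = ("date", "home", "away", "home_score", "away_score")

# Made-up rows: one clean, one duplicate, one with a dropout, one with a bad score.
SAMPLE_GAMES = [
    {"date": "2010-04-05", "home": "BOS", "away": "NYY", "home_score": 9, "away_score": 7},
    {"date": "2010-04-05", "home": "BOS", "away": "NYY", "home_score": 9, "away_score": 7},
    {"date": "2010-04-06", "home": "BOS", "away": "NYY", "home_score": None, "away_score": 6},
    {"date": "2010-04-07", "home": "BOS", "away": "NYY", "home_score": -1, "away_score": 3},
]


def find_problems(games):
    """Return (row_index, description) pairs for rows that look wrong."""
    problems = []
    seen = set()
    for i, game in enumerate(games):
        # Dropouts: a required field is absent or empty.
        missing = [field for field in REQUIRED if game.get(field) is None]
        if missing:
            problems.append((i, "missing: " + ", ".join(missing)))
            continue
        # Duplicates: same date and matchup scraped twice.
        key = (game["date"], game["home"], game["away"])
        if key in seen:
            problems.append((i, "duplicate game"))
        seen.add(key)
        # Obvious errors: a score can never be negative.
        if game["home_score"] < 0 or game["away_score"] < 0:
            problems.append((i, "negative score"))
    return problems
```

    Running the same checks on a data set received in a swap gives a quick read on its quality before you merge it with your own.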

  12. #12
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Check out simplehtmldom for php too if you're interested in making your scraping a little easier.

  13. #13
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Quote Originally Posted by pedro803 View Post
    Has anyone tried the application named Tidy referenced above? If you have, has it made scraping easier for you?
    After a lot of fiddling, I got this to work. Depending on how you use it, you will get mixed results, so YMMV. I just got it working moments ago, but I expect it to make scraping easier because of the tree/recursive structure of XML.

    But as Wreck said, data may not be consistent, so you will have to keep that in mind.

  14. #14
    strixee
    I think, therefore I win
    Join Date: 05-31-10
    Posts: 432

    I'm going to learn about data-mining for betting purposes. I have at least basic knowledge of statistics, graph theory, both discrete and analytic mathematics, linear algebra, formal languages, FSM, probabilistic decision making, standard logics etc. What book do you think I should read first of the following ones?
    Advances in Data Mining (Petra Perner)
    Data Mining Methods and Models (Daniel T. Larose)
    Data Mining: Practical Machine Learning Tools and Techniques (Ian H. Witten, Eibe Frank)
    Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management (Michael J. A. Berry, Gordon S. Linoff)
    Data Mining: Practical Machine Learning Tools and Techniques (Ian H. Witten, Eibe Frank, Mark A. Hall)
    Mastering Data Mining: The Art and Science of Customer Relationship Management (Michael J. A. Berry, Gordon S. Linoff)
    Optimization Based Data Mining: Theory and Applications (Yong Shi, Yingjie Tian, Gang Kou, Yi Peng, Jianping Li)
