1. #1
    geebo18
    Join Date: 09-25-11
    Posts: 7
    Betpoints: 387

    Building Database

    I am working on building an MLB database and was hoping to open up a discussion on the best way to go about this. My vision was to include game-by-game data for each player. I have built databases before via scraping with Ruby, but I'm not sure that will be the most efficient way to compile this database. Curious to hear how others have built/bought theirs. Are there any good dbs for sale? If so, on what sites? And if anyone has scraped, which sites do you like? Retrosheet, FanGraphs...
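
    For anyone picturing what "game-by-game data for each player" might look like in storage, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and stat columns are all hypothetical and picked for illustration only; they do not come from any particular site or product.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical two-table layout: players plus one row per player per game."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS players (
            player_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS batting_game_logs (
            player_id INTEGER NOT NULL REFERENCES players(player_id),
            game_date TEXT    NOT NULL,            -- ISO date, e.g. 'YYYY-MM-DD'
            game_num  INTEGER NOT NULL DEFAULT 1,  -- separates doubleheader games
            at_bats   INTEGER,
            hits      INTEGER,
            PRIMARY KEY (player_id, game_date, game_num)
        );
        """
    )


def upsert_game_log(conn, row):
    """Insert one game log, replacing any existing row with the same key.

    Re-running a daily load therefore does not create duplicate rows.
    """
    conn.execute(
        """
        INSERT OR REPLACE INTO batting_game_logs
            (player_id, game_date, game_num, at_bats, hits)
        VALUES (:player_id, :game_date, :game_num, :at_bats, :hits)
        """,
        row,
    )
```

    Keying each row on player, date, and game number means the same day can be loaded twice without doubling up, whichever source the rows come from.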

2. #2
    EXhoosier10
    Join Date: 07-06-09
    Posts: 3,122
    Betpoints: 4390

    I've been pulling in FanGraphs data for players using Excel web queries for a while. I moved over to Python at the end of the season, and it was a bit of a pain to navigate the site to pull YTD data. In all honesty, it wasn't that much easier than the Excel web queries, in my opinion.

    I know FanGraphs began including daily stat updates on individual player pages, but scraping hundreds of those each day is definitely not going to be efficient, especially once minor league players start to get called up. I'm not sure if they have/had a leaderboard for every player on one page, but I imagine that would be helpful.

    I think the website you want to scrape from depends on what data you want to look at, as each site offers a somewhat different set of stats.
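
    To illustrate why a single all-players page would help: one fetch and one parse yields every row, instead of hundreds of requests per day. Below is a rough sketch using only Python's standard-library html.parser. It assumes the page is a plain HTML table with one header row; the names TableRows and parse_leaderboard are made up for this example, no real site's markup is implied, and fetching the page is left out.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. a link around a name) is kept too.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_leaderboard(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, cells)) for cells in body]
```

    Each returned dict is one player's line, so the whole page can be looped over and written to a database in a single pass.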
