1. #1
    therber2
    Barrels of whiskey
    therber2's Avatar Become A Pro!
    Join Date: 12-22-08
    Posts: 3,715

    Printable MLB historical box scores

    Does anyone have a good source for this? Something that scrapes nicely into excel.

  2. #2
    Flight
    Update your status
    Flight's Avatar Become A Pro!
    Join Date: 01-27-09
    Posts: 1,979

    Probably better to build an automated tool considering the vast number of games each season.

    Iterative HTML crawler + RegEx parser

  3. #3
    therber2
    Barrels of whiskey
    therber2's Avatar Become A Pro!
    Join Date: 12-22-08
    Posts: 3,715

    Quote Originally Posted by Flight View Post
    Probably better to build an automated tool considering the vast number of games each season.

    Iterative HTML crawler + RegEx parser
    That's a start. In english though? :-)

  4. #4
    therber2
    Barrels of whiskey
    therber2's Avatar Become A Pro!
    Join Date: 12-22-08
    Posts: 3,715

    Flight

    Here Flight,

    Check out my file in progress:

    Attached Files

  5. #5
    Flight
    Update your status
    Flight's Avatar Become A Pro!
    Join Date: 01-27-09
    Posts: 1,979

  6. #6
    Flight
    Update your status
    Flight's Avatar Become A Pro!
    Join Date: 01-27-09
    Posts: 1,979

    I looked at your spreadsheet.

    Were you looking for data for this year? It looks like you're trying to track teams as they progress through the current season (as opposed to analyzing data from past seasons in order to run regressions and build a model).

  7. #7
    therber2
    Barrels of whiskey
    therber2's Avatar Become A Pro!
    Join Date: 12-22-08
    Posts: 3,715

    Quote Originally Posted by Flight View Post
    I looked at your spreadsheet.

    Were you looking for data for this year? It looks like you're trying to track teams as they progress through the current season (as opposed to analyzing data from past seasons in order to run regressions and build a model).
    You got it flight. I need day by day results as opposed to a cummulative at any point. The rates of change are what I am interested in here.

    I saw that website before. Could you show me and example of how you could scrape all of the data for one team if possible?

    Thanks

  8. #8
    Flight
    Update your status
    Flight's Avatar Become A Pro!
    Join Date: 01-27-09
    Posts: 1,979

    I recommend going fully automated, but that does require programming skill. If you want to do it manually, continue reading. The reason I say go auto is because you will have to repeat this exercise every time you want to update data and get the latest numbers. I hate repeating tasks.

    http://boston.redsox.mlb.com/schedul...=bos&year=2010

    The team "sortable schedule" pages from mlb.com seem to do well for a manual job. Highlight the table with your mouse and CTRL+C. Go to Excel, right click and select paste special and past it as Text. If using internet explorer, you can probably just hit CTRL+V, but I find other browsers like Firefox and Chrome require a paste-special-text.

    This should get you a table in excel for one team.

    You can optionally delete the games that were "Postponed" to make your data clean, but it may screw up your dates. I think you may have other issues regarding date as well, as sometimes there are double headers and no-game days, and it won't line up with the Excel table you attached where you have every day listed. I would recommend leaving date out of the your table and just saying "Game 1, Game 2, etc" for your column headers.

    Now... We need to clean up the scores and get columns of RS and RA, right? Insert a column to the right of the Result column C. Paste this formula:
    Code:
    =RIGHT(C2,LEN(C2)-2))
    This should strip off the W/L text. Now insert two columns to the right of column D, column E = RS, column F = RA.

    RS formula
    Code:
    =LEFT(D2,FIND("-",D2)-1)
    RA Formula
    Code:
    =RIGHT(D2,FIND("-",D2)-1)
    Now that you have an RS column, you need an RS row (according to the spreadsheet you had dates across the columns, unorthodox). Highlight the entire RS column (you can skip the header row) and copy. Click an empty cell (or your destination cell in your attached worksheet), right click the cell and select paste special. Two selections to make here are transpose (to convert column to row) and values (to leave those crazy formulas behind).

    This should get you one team's runs scored for each game in the format you need. Now repeat for RA's, and for all the other teams. I recommend hiring a high school kid, poor Yugoslavian, or a monkey to finish the job.

    Hope this helps.

  9. #9
    arwar
    arwar's Avatar Become A Pro!
    Join Date: 07-09-09
    Posts: 208
    Betpoints: 1544

    brute force

Top