Printable MLB historical box scores

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • therber2
    Restricted User
    • 12-22-08
    • 3715

    #1
    Printable MLB historical box scores
    Does anyone have a good source for this? Something that scrapes nicely into excel.
  • Flight
    Restricted User
    • 01-28-09
    • 1979

    #2
    Probably better to build an automated tool considering the vast number of games each season.

    Iterative HTML crawler + RegEx parser
    Comment
    • therber2
      Restricted User
      • 12-22-08
      • 3715

      #3
      Originally posted by Flight
      Probably better to build an automated tool considering the vast number of games each season.

      Iterative HTML crawler + RegEx parser
      That's a start. In english though? :-)
      Comment
      • therber2
        Restricted User
        • 12-22-08
        • 3715

        #4
        Flight

        Here Flight,

        Check out my file in progress:

        Attached Files
        Comment
        • Flight
          Restricted User
          • 01-28-09
          • 1979

          #5



          They don't have 2010 data available yet.
          Comment
          • Flight
            Restricted User
            • 01-28-09
            • 1979

            #6
            I looked at your spreadsheet.

            Were you looking for data for this year? It looks like you're trying to track teams as they progress through the current season (as opposed to analyzing data from past seasons in order to run regressions and build a model).
            Comment
            • therber2
              Restricted User
              • 12-22-08
              • 3715

              #7
              Originally posted by Flight
              I looked at your spreadsheet.

              Were you looking for data for this year? It looks like you're trying to track teams as they progress through the current season (as opposed to analyzing data from past seasons in order to run regressions and build a model).
              You got it flight. I need day by day results as opposed to a cummulative at any point. The rates of change are what I am interested in here.

              I saw that website before. Could you show me and example of how you could scrape all of the data for one team if possible?

              Thanks
              Comment
              • Flight
                Restricted User
                • 01-28-09
                • 1979

                #8
                I recommend going fully automated, but that does require programming skill. If you want to do it manually, continue reading. The reason I say go auto is because you will have to repeat this exercise every time you want to update data and get the latest numbers. I hate repeating tasks.

                The official website of the Boston Red Sox with the most up-to-date information on news, tickets, schedule, stadium, roster, rumors, scores, and stats.


                The team "sortable schedule" pages from mlb.com seem to do well for a manual job. Highlight the table with your mouse and CTRL+C. Go to Excel, right click and select paste special and past it as Text. If using internet explorer, you can probably just hit CTRL+V, but I find other browsers like Firefox and Chrome require a paste-special-text.

                This should get you a table in excel for one team.

                You can optionally delete the games that were "Postponed" to make your data clean, but it may screw up your dates. I think you may have other issues regarding date as well, as sometimes there are double headers and no-game days, and it won't line up with the Excel table you attached where you have every day listed. I would recommend leaving date out of the your table and just saying "Game 1, Game 2, etc" for your column headers.

                Now... We need to clean up the scores and get columns of RS and RA, right? Insert a column to the right of the Result column C. Paste this formula:
                Code:
                =RIGHT(C2,LEN(C2)-2))
                This should strip off the W/L text. Now insert two columns to the right of column D, column E = RS, column F = RA.

                RS formula
                Code:
                =LEFT(D2,FIND("-",D2)-1)
                RA Formula
                Code:
                =RIGHT(D2,FIND("-",D2)-1)
                Now that you have an RS column, you need an RS row (according to the spreadsheet you had dates across the columns, unorthodox). Highlight the entire RS column (you can skip the header row) and copy. Click an empty cell (or your destination cell in your attached worksheet), right click the cell and select paste special. Two selections to make here are transpose (to convert column to row) and values (to leave those crazy formulas behind).

                This should get you one team's runs scored for each game in the format you need. Now repeat for RA's, and for all the other teams. I recommend hiring a high school kid, poor Yugoslavian, or a monkey to finish the job.

                Hope this helps.
                Comment
                • arwar
                  SBR High Roller
                  • 07-09-09
                  • 208

                  #9
                  brute force
                  Comment
                  SBR Contests
                  Collapse
                  Top-Rated US Sportsbooks
                  Collapse
                  Working...