How to scrape your own NCAA game result database

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Waterstpub87
    SBR MVP
    • 09-09-09
    • 4102

    #1
    How to scrape your own NCAA game result database
    This post will give you instructions on how to scrape your own database for NCAA Football using only Microsoft excel and an internet connection

    Please don't ask me for a different database with a bunch of different statistics. SBR wouldn't allow me to charge you, and I'm not doing it for free. Hiring someone to program a scrapper would run you in the hundreds of dollars, and it is not something terribly difficult to do by yourself with some practice.

    I use foxsports for all of my scrapping. Their web-address is easy to parse and iterate in excel.

    Please don't respond till I say "Done". It is hard to keep track of a post with instructions if there are posts interrupting it.

    Lets get started:

    For example, the address for the 2015 Miami Beach Bowl between Western Kentucky and USF is



    We can see the part that we need to pull the correct games is the ID number on the end.

    So we need a list of game numbers to scrape
  • Waterstpub87
    SBR MVP
    • 09-09-09
    • 4102

    #2
    air-force-falcons
    akron-zips
    alabama-crimson-tide
    appalachian-state-mountaineers
    arizona-state-sun-devils
    arizona-wildcats
    arkansas-razorbacks
    arkansas-state-red-wolves
    army-west-point-black-knights
    auburn-tigers
    ball-state-cardinals
    baylor-bears
    boise-state-broncos
    boston-college-eagles
    bowling-green-falcons
    buffalo-bulls
    byu-cougars
    california-golden-bears
    central-michigan-chippewas
    charlotte-49ers
    cincinnati-bearcats
    clemson-tigers
    colorado-buffaloes
    colorado-state-rams
    connecticut-huskies
    duke-blue-devils
    east-carolina-pirates
    eastern-michigan-eagles
    fiu-golden-panthers
    florida-atlantic-owls
    florida-gators
    florida-state-seminoles
    fresno-state-bulldogs
    georgia-bulldogs
    georgia-southern-eagles
    georgia-state-panthers
    georgia-tech-yellow-jackets
    hawaii-rainbow-warriors
    houston-cougars
    idaho-vandals
    illinois-fighting-illini
    indiana-hoosiers
    iowa-hawkeyes
    iowa-state-cyclones
    kansas-jayhawks
    kansas-state-wildcats
    kent-state-golden-flashes
    kentucky-wildcats
    louisiana-monroe-warhawks
    louisiana-ragin-cajuns
    louisiana-tech-bulldogs
    louisville-cardinals
    lsu-tigers
    marshall-thundering-herd
    maryland-terrapins
    massachusetts-minutemen
    memphis-tigers
    miami-(fl)-hurricanes
    miami-(oh)-redhawks
    michigan-state-spartans
    michigan-wolverines
    middle-tennessee-blue-raiders
    minnesota-golden-gophers
    mississippi-state-bulldogs
    missouri-tigers
    navy-midshipmen
    nebraska-cornhuskers
    nevada-wolf-pack
    new-mexico-lobos
    new-mexico-state-aggies
    north-carolina-state-wolfpack
    north-carolina-tar-heels
    northern-illinois-huskies
    north-texas-mean-green
    northwestern-wildcats
    notre-dame-fighting-irish
    ohio-bobcats
    ohio-state-buckeyes
    oklahoma-sooners
    oklahoma-state-cowboys
    old-dominion-monarchs
    ole-miss-rebels
    oregon-ducks
    oregon-state-beavers
    penn-state-nittany-lions
    pittsburgh-panthers
    purdue-boilermakers
    rice-owls
    rutgers-scarlet-knights
    san-diego-state-aztecs
    san-jose-state-spartans
    smu-mustangs
    south-alabama-jaguars
    south-carolina-gamecocks
    southern-miss-golden-eagles
    south-florida-bulls
    stanford-cardinal
    syracuse-orange
    tcu-horned-frogs
    temple-owls
    tennessee-volunteers
    texas-am-aggies
    texas-longhorns
    texas-state-bobcats
    texas-tech-red-raiders
    toledo-rockets
    troy-trojans
    tulane-green-wave
    tulsa-golden-hurricane
    ucf-knights
    ucla-bruins
    unlv-rebels
    usc-trojans
    utah-state-aggies
    utah-utes
    utep-miners
    utsa-roadrunners
    vanderbilt-commodores
    virginia-cavaliers
    virginia-tech-hokies
    wake-forest-demon-deacons
    washington-huskies
    washington-state-cougars
    western-kentucky-hilltoppers
    western-michigan-broncos
    west-virginia-mountaineers
    wisconsin-badgers
    wyoming-cowboys
    Comment
    • Waterstpub87
      SBR MVP
      • 09-09-09
      • 4102

      #3
      Above is the list of NCAA football teams, properly configured for our scraping program
      Comment
      • Waterstpub87
        SBR MVP
        • 09-09-09
        • 4102

        #4
        How to scrape gamenumber (2015):

        Open a new excel workbook
        In Column DH, Paste the above list of teams
        In Column DI, paste the following formula =CONCATENATE(DH1,"-team-schedule"), drag to the bottom of the list
        Go to Developer->Visual Basic
        Left Click on Modules in the right hand side, the insert then Module
        You will see a blank window to the right
        Copy and Paste This there:

        Public Function GetURL(c As Range) As String
        On Error Resume Next
        GetURL = c.Hyperlinks(1).Address
        End Function

        Exit the VBA window and return to the worksheet

        Open up Macros and create a new macro, paste the below:





        Code:
        Dim Counter As Integer
        Counter = 1
        
        Do While Counter < 129
        Dim Team As String
        Team = Cells(Counter, "DI").Value
        
        
            With ActiveSheet.QueryTables.Add(Connection:= _
                "URL;http://www.foxsports.com/college-football/" & Team & "?season=2015", Destination _
                :=Range("$A$1"))
                .Name = "mlb-baseball"
                .FieldNames = True
                .RowNumbers = False
                .FillAdjacentFormulas = False
                .PreserveFormatting = False
                .RefreshOnFileOpen = False
                .BackgroundQuery = True
                .RefreshStyle = xlOverwriteCells
                .SavePassword = False
                .SaveData = True
                .AdjustColumnWidth = True
                .RefreshPeriod = 0
                .WebSelectionType = xlEntirePage
                .WebFormatting = xlWebFormattingAll
                .WebPreFormattedTextToColumns = True
                .WebConsecutiveDelimitersAsOne = True
                .WebSingleBlockTextImport = False
                .WebDisableDateRecognition = False
                .WebDisableRedirections = False
                .Refresh BackgroundQuery:=False
            End With
          Range("DD2:DD20").Copy
          Sheets("Sheet2").Select
          Range("A60000").End(xlUp).Offset(1, 0).Select
          Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
          :=False, Transpose:=False
          Sheets("Sheet1").Select
          Counter = Counter + 1
          Loop
        Application.ScreenUpdating = True
        End Sub
        Last edited by Waterstpub87; 07-25-17, 02:30 PM.
        Comment
        • Waterstpub87
          SBR MVP
          • 09-09-09
          • 4102

          #5
          The smile above is a colon

          Code:
          Create a second sheet called sheet 2
          
          Select Sheet 1
          
          In Cells DA2 Write 1
          In Cells DB2 Write =MATCH(DA2,A:A,0)
          In Cells DC2 write =geturl(INDEX(D:D,DB2))
          In Cells DD2 Write =RIGHT(DC2,5)
          
          Drag your DA2 result down, insuring that you get the numbers 1 to 15
          In DA17 Write Bowl
          In DA18 Write B1
          
          Drag DB2:DD2 down to Row 18. Everything should be blank but don't worry we will go from there
          Last edited by Waterstpub87; 07-25-17, 06:45 PM.
          Comment
          • Waterstpub87
            SBR MVP
            • 09-09-09
            • 4102

            #6
            Any Time its a smiley, it is colon letter D.

            Go to Developer->Macros and open up your macro

            Where it says : Do While Counter <129, change that to a 2, so it should read Do While Counter <2

            Say a prayer and hit the run button.

            If your sheet is correct, you will see in column A around row 468 or so, will be the Air Force Falcons Schedule

            Go to the DD Column at the top and you will see a list of game numbers, which have been pasted to sheet 2

            If you have game numbers on sheet 2, your process is working correctly.

            Go Back to sheet 1, and go to Macros_> Macro1_>Edit

            Change your Do While Counter <2 to Do While Counter <129

            Click the run button, and wait roughly 10 minutes, and you will have all of the game numbers for the 128 teams pasted in duplicate on sheet 2.
            Last edited by Waterstpub87; 07-24-17, 09:33 PM.
            Comment
            • Waterstpub87
              SBR MVP
              • 09-09-09
              • 4102

              #7
              Once this has finished running, select sheet 2. Rows 2 through roughly 2211 should have data or errors.

              Select all of column A. Select the Data tab at the top, and then click Remove duplicate (Every Game is copied twice because it is on the home and away teams schedules), click ok at the prompt, it will pop up a message say 1339 duplicates remove, 874 values remain.

              In A5, there should be an error. Delete row 5 moving everything up. These are the games that occurred during 2015. Insure that you save the sheet as a macro enabled workbook.

              In the future, if you want to scrape other years, easy fix:

              Change the year after season=2015 to whatever year you would like. I have not test this besides 2014 as of yet.

              "URL;http://www.foxsports.com/college-football/" & Team & "?season=2015", Destination _
              :=Range("$A$1"))

              We are ready to move on to scraping our data.

              Open up another excel work book.
              Comment
              • Waterstpub87
                SBR MVP
                • 09-09-09
                • 4102

                #8
                Rename Sheet 1 to Games
                Rename Sheet 2 to Data

                Copy and Paste your gamenumbers that we generated in the other sheet into Column AH
                Click on Developer->Macros>-Create New Macro with name of Macro1

                Copy and Paste the Following into the Macro
                Dim Counter As Integer
                Counter = 1
                Application.ScreenUpdating = False
                Do While Counter < 2
                Dim gamenum As Integer
                gamenum = Cells(Counter, "AH").Value

                With ActiveSheet.QueryTables.Add(Connection:= _
                "URL;http://www.foxsports.com/college-football/boxscore?id=" & gamenum & "&type=3", _
                Destination:=Range("$A$1"))
                .Name = "boxscore?id=19564&type=3"
                .FieldNames = True
                .RowNumbers = False
                .FillAdjacentFormulas = False
                .PreserveFormatting = True
                .RefreshOnFileOpen = False
                .BackgroundQuery = True
                .RefreshStyle = xlOverwriteCells
                .SavePassword = False
                .SaveData = True
                .AdjustColumnWidth = True
                .RefreshPeriod = 0
                .WebSelectionType = xlEntirePage
                .WebFormatting = xlWebFormattingNone
                .WebPreFormattedTextToColumns = True
                .WebConsecutiveDelimitersAsOne = True
                .WebSingleBlockTextImport = False
                .WebDisableDateRecognition = False
                .WebDisableRedirections = False
                .Refresh BackgroundQuery:=False
                End With
                Sheets("Games").Calculate

                Range("L5:AG5").Copy
                Sheets("Data").Select
                Range("A60000").End(xlUp).Offset(1, 0).Select
                Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
                :=False, Transpose:=False
                Sheets("Games").Select

                Range("L8:AG8").Copy
                Sheets("Data").Select
                Range("A60000").End(xlUp).Offset(1, 0).Select
                Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
                :=False, Transpose:=False
                Sheets("Games").Select
                Counter = Counter + 1
                Loop
                Application.ScreenUpdating = True
                End Sub
                Last edited by Waterstpub87; 07-24-17, 10:30 PM.
                Comment
                • Waterstpub87
                  SBR MVP
                  • 09-09-09
                  • 4102

                  #9
                  Click the run botton

                  In column A, you should now see a lot of information. This is the scrapped data of the game whose number is in AH1

                  In L4 write Date
                  In M4 Write Away
                  In N4 Write OPP
                  In O4 Write Where
                  In P4 Write Passing Yards
                  In Q4 Passing Attempts
                  In R4 Completions
                  In S4 Interceptions
                  In T4 Rush Attempts
                  In U4 Rush Yards
                  In V4 Special Teams TDS
                  In W4 Def TDs
                  In X4 Opponent Passing Yards
                  In Y4 Opponent Pass Attempts
                  In Z4 Opponent Completions
                  In AA4 Opponent Interceptions
                  In AB4 Opponent Rushes
                  IN AC4 OPP Rush Yards
                  IN AD4 Opponent ST TDS
                  IN AE4 Opp DEF TD
                  IN AF4 Points
                  IN AG4 Points Against

                  Copy these header fields, and paste into L7. Change the Away to Home.

                  In L5 =INDEX(A:A,MATCH("Final",A:A,0)+1)
                  In M5 =INDEX(A:A,MATCH("1st Downs",A:A,0)-2)
                  In N5 =M8
                  In O5 = Away
                  In P5 =INDEX(B:B,MATCH("Comp-Att",A:A,0)-1)/1
                  In Q5 =IFERROR(MID(INDEX(B:B,MATCH("Comp-Att",A:A,0)),SEARCH("-",INDEX(B:B,MATCH("Comp-Att",A:A,0)))+1,3)/1,DAY(INDEX(B:B,MATCH("Comp-Att",A:A,0))))

                  In R5 =IFERROR(LEFT(INDEX(B:B,MATCH("Comp-Att",A:A,0)),SEARCH("-",INDEX(B:B,MATCH("Comp-Att",A:A,0)))-1)/1,MONTH(INDEX(B:B,MATCH("Comp-Att",A:A,0))))

                  In S5 =INDEX(B:B,MATCH("Comp-Att",A:A,0)+2)/1

                  In t5 =INDEX(B:B,MATCH("Attempts",A:A,0))/1

                  in U5 =INDEX(B:B,MATCH("Attempts",A:A,0)-1)/1

                  In V5 =INDEX(B:B,MATCH("Special Teams TDs",A:A,0))/1

                  In W5 =INDEX(B:B,MATCH("Def TDs",A:A,0))/1

                  In X5 =P8

                  In Y5 =Q8

                  In Z5 =R8

                  In AA5 =S8

                  In AB5 =T8

                  In AC5 =U8

                  in AD5 =V8

                  In AE5 =W8

                  in AF5 =INDEX(A:A,MATCH("Box Score Box Team Stats Team Stats Play-By-Play PBP Scoring Summary Scoring ",A:A,0)-9)

                  In AG5 =AF8
                  Comment
                  • Waterstpub87
                    SBR MVP
                    • 09-09-09
                    • 4102

                    #10
                    Continuing on to parsing the home data, almost done

                    Code:
                    In L8 =INDEX(A:A,MATCH("Final",A:A,0)+1)
                    M8 =INDEX(C:C,MATCH("1st Downs",C:C,0)-2)
                    N8  =M5
                    O8 = Home
                    P8  =INDEX(D:D,MATCH("Comp-Att",A:A,0)-1)/1
                    
                    Q8 =IFERROR(MID(INDEX(D:D,MATCH("Comp-Att",C:C,0)),SEARCH("-",INDEX(D:D,MATCH("Comp-Att",C:C,0)))+1,3)/1,DAY(INDEX(D:D,MATCH("Comp-Att",C:C,0))))
                    
                    R8 =IFERROR(LEFT(INDEX(D:D,MATCH("Comp-Att",C:C,0)),SEARCH("-",INDEX(D:D,MATCH("Comp-Att",C:C,0)))-1)/1,MONTH(INDEX(D:D,MATCH("Comp-Att",C:C,0))))
                    
                    S8 =INDEX(D:D,MATCH("Comp-Att",C:C,0)+2)/1
                    
                    T8 =INDEX(D:D,MATCH("Attempts",C:C,0))/1
                    
                    U8 =INDEX(D:D,MATCH("Attempts",C:C,0)-1)/1
                    
                    V8 =INDEX(D:D,MATCH("Special Teams TDs",C:C,0))/1
                    
                    W8 =INDEX(D:D,MATCH("Def TDs",C:C,0))/1
                    
                    X8 =P5
                    
                    Y8 =Q5
                    
                    Z8 =R5
                    
                    AA8 =S5 
                    
                    AB8 =T5
                    AC8 =U5
                    AD8 [B]=[/B]V5
                    AE8 =W5
                    
                    AF8 =INDEX(A:A,MATCH("Box Score Box Team Stats Team Stats Play-By-Play PBP Scoring Summary Scoring ",A:A,0)-8)
                    
                    AG =AF5
                    
                    You should now see the game statistics populated for both teams. 
                    
                    Copy L4: AG4
                    Select Sheet Data, and paste in Row 1
                    Select sheet games

                    Click Macros and Edit Macro One

                    Find the line that says "Do While Counter < 2", Change this to 50 and click run botton. Wait roughly 5 minutes. Click on Data, and you should have game results for 49 games pasted there.

                    Going forward, I usually run 100 to 150 at a time. This stops the program from crashing, as it tend to crash sometimes if you set the numbers to high, and I like to save as I go.

                    To Run this, change the "Do While Counter < 2" to whatever number, and change the "Counter = 1" to the number last time

                    For example, if you ran the first 50, change the "Do While Counter < 100 " and Counter = 50

                    I run this while I watch baseball, and just watch for when it finishes running. After you scrap all the data, your sheet might be a little sluggish. It is good to copy the data to another workbook and work on it from there.

                    This should be all, I might make edits when I try to walk through doing this based on these instructions.

                    DONE
                    Last edited by Waterstpub87; 07-25-17, 06:48 PM.
                    Comment
                    • Waterstpub87
                      SBR MVP
                      • 09-09-09
                      • 4102

                      #11
                      Remeber the Smiles are : D

                      Should be no more smileys, saving in the code gets rid of smiles.

                      Feel free to post any questions
                      Last edited by Waterstpub87; 07-25-17, 06:49 PM.
                      Comment
                      • eastvan09
                        SBR MVP
                        • 09-30-09
                        • 1400

                        #12
                        Just read this as I am back on the forum after several years away. The easy to follow instructions are great. I have not tried this yet but am considering doing so. How many years back will you scrape? Once you have this data, how to decide on calculations which use the data to highlight games which are a play in the upcoming schedule?
                        Comment
                        • Waterstpub87
                          SBR MVP
                          • 09-09-09
                          • 4102

                          #13
                          Originally posted by eastvan09
                          Just read this as I am back on the forum after several years away. The easy to follow instructions are great. I have not tried this yet but am considering doing so. How many years back will you scrape? Once you have this data, how to decide on calculations which use the data to highlight games which are a play in the upcoming schedule?
                          This is season is on going, so my scraping is already being put into practice. I scraped 3 years. Used regressions to determine the results of individuals plays based on differences in power ratings in past data to apply to this season.
                          Comment
                          • eastvan09
                            SBR MVP
                            • 09-30-09
                            • 1400

                            #14
                            Originally posted by Waterstpub87
                            This is season is on going, so my scraping is already being put into practice. I scraped 3 years. Used regressions to determine the results of individuals plays based on differences in power ratings in past data to apply to this season.
                            Thanks Water!
                            Comment
                            • kodiakken
                              SBR Rookie
                              • 08-21-17
                              • 1

                              #15
                              Thanks. I'm trying to learn how to use Excel in similar ways for different sports and this will give me something to study.
                              Comment
                              • Waterstpub87
                                SBR MVP
                                • 09-09-09
                                • 4102

                                #16
                                Originally posted by kodiakken
                                Thanks. I'm trying to learn how to use Excel in similar ways for different sports and this will give me something to study.
                                Glad to help. Will be posting scraping program for NCAA Basketball, likely end of October or so.
                                Comment
                                • dbouchot
                                  SBR Hustler
                                  • 07-16-17
                                  • 94

                                  #17
                                  hi Waters, first of all ty for this thread!
                                  Im sorry to bother you with this but hopefully you can help me, on the step of running the macro after changing 129 to 2 im getting a runtime 438 error, when i debug it highlighs .preserveformatting = false, do you have any idea what am I doing wrong?
                                  Comment
                                  • Waterstpub87
                                    SBR MVP
                                    • 09-09-09
                                    • 4102

                                    #18
                                    Originally posted by dbouchot
                                    hi Waters, first of all ty for this thread!
                                    Im sorry to bother you with this but hopefully you can help me, on the step of running the macro after changing 129 to 2 im getting a runtime 438 error, when i debug it highlighs .preserveformatting = false, do you have any idea what am I doing wrong?
                                    Try exiting out of the work book, and then running again. Sometimes, if there has been an error where the website did not load correctly, some of those options get stuck.
                                    Comment
                                    SBR Contests
                                    Collapse
                                    Top-Rated US Sportsbooks
                                    Collapse
                                    Working...