1. #1
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    How to scrape your own NCAA game result database

    This post will give you instructions on how to scrape your own database for NCAA Football using only Microsoft excel and an internet connection

    Please don't ask me for a different database with a bunch of different statistics. SBR wouldn't allow me to charge you, and I'm not doing it for free. Hiring someone to program a scrapper would run you in the hundreds of dollars, and it is not something terribly difficult to do by yourself with some practice.

    I use foxsports for all of my scrapping. Their web-address is easy to parse and iterate in excel.

    Please don't respond till I say "Done". It is hard to keep track of a post with instructions if there are posts interrupting it.

    Lets get started:

    For example, the address for the 2015 Miami Beach Bowl between Western Kentucky and USF is

    http://www.foxsports.com/college-foo...score?id=25507

    We can see the part that we need to pull the correct games is the ID number on the end.

    So we need a list of game numbers to scrape
    Nomination(s):
    This post was nominated 2 times . To view the nominated thread please click here. People who nominated: KVB, and Bluehorseshoe

  2. #2
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    air-force-falcons
    akron-zips
    alabama-crimson-tide
    appalachian-state-mountaineers
    arizona-state-sun-devils
    arizona-wildcats
    arkansas-razorbacks
    arkansas-state-red-wolves
    army-west-point-black-knights
    auburn-tigers
    ball-state-cardinals
    baylor-bears
    boise-state-broncos
    boston-college-eagles
    bowling-green-falcons
    buffalo-bulls
    byu-cougars
    california-golden-bears
    central-michigan-chippewas
    charlotte-49ers
    cincinnati-bearcats
    clemson-tigers
    colorado-buffaloes
    colorado-state-rams
    connecticut-huskies
    duke-blue-devils
    east-carolina-pirates
    eastern-michigan-eagles
    fiu-golden-panthers
    florida-atlantic-owls
    florida-gators
    florida-state-seminoles
    fresno-state-bulldogs
    georgia-bulldogs
    georgia-southern-eagles
    georgia-state-panthers
    georgia-tech-yellow-jackets
    hawaii-rainbow-warriors
    houston-cougars
    idaho-vandals
    illinois-fighting-illini
    indiana-hoosiers
    iowa-hawkeyes
    iowa-state-cyclones
    kansas-jayhawks
    kansas-state-wildcats
    kent-state-golden-flashes
    kentucky-wildcats
    louisiana-monroe-warhawks
    louisiana-ragin-cajuns
    louisiana-tech-bulldogs
    louisville-cardinals
    lsu-tigers
    marshall-thundering-herd
    maryland-terrapins
    massachusetts-minutemen
    memphis-tigers
    miami-(fl)-hurricanes
    miami-(oh)-redhawks
    michigan-state-spartans
    michigan-wolverines
    middle-tennessee-blue-raiders
    minnesota-golden-gophers
    mississippi-state-bulldogs
    missouri-tigers
    navy-midshipmen
    nebraska-cornhuskers
    nevada-wolf-pack
    new-mexico-lobos
    new-mexico-state-aggies
    north-carolina-state-wolfpack
    north-carolina-tar-heels
    northern-illinois-huskies
    north-texas-mean-green
    northwestern-wildcats
    notre-dame-fighting-irish
    ohio-bobcats
    ohio-state-buckeyes
    oklahoma-sooners
    oklahoma-state-cowboys
    old-dominion-monarchs
    ole-miss-rebels
    oregon-ducks
    oregon-state-beavers
    penn-state-nittany-lions
    pittsburgh-panthers
    purdue-boilermakers
    rice-owls
    rutgers-scarlet-knights
    san-diego-state-aztecs
    san-jose-state-spartans
    smu-mustangs
    south-alabama-jaguars
    south-carolina-gamecocks
    southern-miss-golden-eagles
    south-florida-bulls
    stanford-cardinal
    syracuse-orange
    tcu-horned-frogs
    temple-owls
    tennessee-volunteers
    texas-am-aggies
    texas-longhorns
    texas-state-bobcats
    texas-tech-red-raiders
    toledo-rockets
    troy-trojans
    tulane-green-wave
    tulsa-golden-hurricane
    ucf-knights
    ucla-bruins
    unlv-rebels
    usc-trojans
    utah-state-aggies
    utah-utes
    utep-miners
    utsa-roadrunners
    vanderbilt-commodores
    virginia-cavaliers
    virginia-tech-hokies
    wake-forest-demon-deacons
    washington-huskies
    washington-state-cougars
    western-kentucky-hilltoppers
    western-michigan-broncos
    west-virginia-mountaineers
    wisconsin-badgers
    wyoming-cowboys

  3. #3
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Above is the list of NCAA football teams, properly configured for our scraping program

  4. #4
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    How to scrape gamenumber (2015):

    Open a new excel workbook
    In Column DH, Paste the above list of teams
    In Column DI, paste the following formula =CONCATENATE(DH1,"-team-schedule"), drag to the bottom of the list
    Go to Developer->Visual Basic
    Left Click on Modules in the right hand side, the insert then Module
    You will see a blank window to the right
    Copy and Paste This there:

    Public Function GetURL(c As Range) As String
    On Error Resume Next
    GetURL = c.Hyperlinks(1).Address
    End Function

    Exit the VBA window and return to the worksheet

    Open up Macros and create a new macro, paste the below:





    Code:
    Dim Counter As Integer
    Counter = 1
    
    Do While Counter < 129
    Dim Team As String
    Team = Cells(Counter, "DI").Value
    
    
        With ActiveSheet.QueryTables.Add(Connection:= _
            "URL;http://www.foxsports.com/college-football/" & Team & "?season=2015", Destination _
            :=Range("$A$1"))
            .Name = "mlb-baseball"
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = False
            .RefreshOnFileOpen = False
            .BackgroundQuery = True
            .RefreshStyle = xlOverwriteCells
            .SavePassword = False
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .WebSelectionType = xlEntirePage
            .WebFormatting = xlWebFormattingAll
            .WebPreFormattedTextToColumns = True
            .WebConsecutiveDelimitersAsOne = True
            .WebSingleBlockTextImport = False
            .WebDisableDateRecognition = False
            .WebDisableRedirections = False
            .Refresh BackgroundQuery:=False
        End With
      Range("DD2:DD20").Copy
      Sheets("Sheet2").Select
      Range("A60000").End(xlUp).Offset(1, 0).Select
      Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
      :=False, Transpose:=False
      Sheets("Sheet1").Select
      Counter = Counter + 1
      Loop
    Application.ScreenUpdating = True
    End Sub
    Last edited by Waterstpub87; 07-25-17 at 02:30 PM.

  5. #5
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    The smile above is a colon

    Code:
    Create a second sheet called sheet 2
    
    Select Sheet 1
    
    In Cells DA2 Write 1
    In Cells DB2 Write =MATCH(DA2,A:A,0)
    In Cells DC2 write =geturl(INDEX(D:D,DB2))
    In Cells DD2 Write =RIGHT(DC2,5)
    
    Drag your DA2 result down, insuring that you get the numbers 1 to 15
    In DA17 Write Bowl
    In DA18 Write B1
    
    Drag DB2:DD2 down to Row 18. Everything should be blank but don't worry we will go from there
    Last edited by Waterstpub87; 07-25-17 at 06:45 PM.

  6. #6
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Any Time its a smiley, it is colon letter D.

    Go to Developer->Macros and open up your macro

    Where it says : Do While Counter <129, change that to a 2, so it should read Do While Counter <2

    Say a prayer and hit the run button.

    If your sheet is correct, you will see in column A around row 468 or so, will be the Air Force Falcons Schedule

    Go to the DD Column at the top and you will see a list of game numbers, which have been pasted to sheet 2

    If you have game numbers on sheet 2, your process is working correctly.

    Go Back to sheet 1, and go to Macros_> Macro1_>Edit

    Change your Do While Counter <2 to Do While Counter <129

    Click the run button, and wait roughly 10 minutes, and you will have all of the game numbers for the 128 teams pasted in duplicate on sheet 2.
    Last edited by Waterstpub87; 07-24-17 at 09:33 PM.

  7. #7
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Once this has finished running, select sheet 2. Rows 2 through roughly 2211 should have data or errors.

    Select all of column A. Select the Data tab at the top, and then click Remove duplicate (Every Game is copied twice because it is on the home and away teams schedules), click ok at the prompt, it will pop up a message say 1339 duplicates remove, 874 values remain.

    In A5, there should be an error. Delete row 5 moving everything up. These are the games that occurred during 2015. Insure that you save the sheet as a macro enabled workbook.

    In the future, if you want to scrape other years, easy fix:

    Change the year after season=2015 to whatever year you would like. I have not test this besides 2014 as of yet.

    "URL;http://www.foxsports.com/college-football/" & Team & "?season=2015", Destination _
    :=Range("$A$1"))

    We are ready to move on to scraping our data.

    Open up another excel work book.

  8. #8
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Rename Sheet 1 to Games
    Rename Sheet 2 to Data

    Copy and Paste your gamenumbers that we generated in the other sheet into Column AH
    Click on Developer->Macros>-Create New Macro with name of Macro1

    Copy and Paste the Following into the Macro
    Dim Counter As Integer
    Counter = 1
    Application.ScreenUpdating = False
    Do While Counter < 2
    Dim gamenum As Integer
    gamenum = Cells(Counter, "AH").Value

    With ActiveSheet.QueryTables.Add(Connection:= _
    "URL;http://www.foxsports.com/college-football/boxscore?id=" & gamenum & "&type=3", _
    Destination:=Range("$A$1"))
    .Name = "boxscore?id=19564&type=3"
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlOverwriteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .WebSelectionType = xlEntirePage
    .WebFormatting = xlWebFormattingNone
    .WebPreFormattedTextToColumns = True
    .WebConsecutiveDelimitersAsOne = True
    .WebSingleBlockTextImport = False
    .WebDisableDateRecognition = False
    .WebDisableRedirections = False
    .Refresh BackgroundQuery:=False
    End With
    Sheets("Games").Calculate

    Range("L5:AG5").Copy
    Sheets("Data").Select
    Range("A60000").End(xlUp).Offset(1, 0).Select
    Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
    :=False, Transpose:=False
    Sheets("Games").Select

    Range("L8:AG8").Copy
    Sheets("Data").Select
    Range("A60000").End(xlUp).Offset(1, 0).Select
    Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
    :=False, Transpose:=False
    Sheets("Games").Select
    Counter = Counter + 1
    Loop
    Application.ScreenUpdating = True
    End Sub
    Last edited by Waterstpub87; 07-24-17 at 10:30 PM.

  9. #9
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Click the run botton

    In column A, you should now see a lot of information. This is the scrapped data of the game whose number is in AH1

    In L4 write Date
    In M4 Write Away
    In N4 Write OPP
    In O4 Write Where
    In P4 Write Passing Yards
    In Q4 Passing Attempts
    In R4 Completions
    In S4 Interceptions
    In T4 Rush Attempts
    In U4 Rush Yards
    In V4 Special Teams TDS
    In W4 Def TDs
    In X4 Opponent Passing Yards
    In Y4 Opponent Pass Attempts
    In Z4 Opponent Completions
    In AA4 Opponent Interceptions
    In AB4 Opponent Rushes
    IN AC4 OPP Rush Yards
    IN AD4 Opponent ST TDS
    IN AE4 Opp DEF TD
    IN AF4 Points
    IN AG4 Points Against

    Copy these header fields, and paste into L7. Change the Away to Home.

    In L5 =INDEX(A:A,MATCH("Final",A:A,0)+1)
    In M5 =INDEX(A:A,MATCH("1st Downs",A:A,0)-2)
    In N5 =M8
    In O5 = Away
    In P5 =INDEX(B:B,MATCH("Comp-Att",A:A,0)-1)/1
    In Q5 =IFERROR(MID(INDEX(B:B,MATCH("Comp-Att",A:A,0)),SEARCH("-",INDEX(B:B,MATCH("Comp-Att",A:A,0)))+1,3)/1,DAY(INDEX(B:B,MATCH("Comp-Att",A:A,0))))

    In R5 =IFERROR(LEFT(INDEX(B:B,MATCH("Comp-Att",A:A,0)),SEARCH("-",INDEX(B:B,MATCH("Comp-Att",A:A,0)))-1)/1,MONTH(INDEX(B:B,MATCH("Comp-Att",A:A,0))))

    In S5 =INDEX(B:B,MATCH("Comp-Att",A:A,0)+2)/1

    In t5 =INDEX(B:B,MATCH("Attempts",A:A,0))/1

    in U5 =INDEX(B:B,MATCH("Attempts",A:A,0)-1)/1

    In V5 =INDEX(B:B,MATCH("Special Teams TDs",A:A,0))/1

    In W5 =INDEX(B:B,MATCH("Def TDs",A:A,0))/1

    In X5 =P8

    In Y5 =Q8

    In Z5 =R8

    In AA5 =S8

    In AB5 =T8

    In AC5 =U8

    in AD5 =V8

    In AE5 =W8

    in AF5 =INDEX(A:A,MATCH("Box Score Box Team Stats Team Stats Play-By-Play PBP Scoring Summary Scoring ",A:A,0)-9)

    In AG5 =AF8

  10. #10
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Continuing on to parsing the home data, almost done

    Code:
    In L8 =INDEX(A:A,MATCH("Final",A:A,0)+1)
    M8 =INDEX(C:C,MATCH("1st Downs",C:C,0)-2)
    N8  =M5
    O8 = Home
    P8  =INDEX(D:D,MATCH("Comp-Att",A:A,0)-1)/1
    
    Q8 =IFERROR(MID(INDEX(D:D,MATCH("Comp-Att",C:C,0)),SEARCH("-",INDEX(D:D,MATCH("Comp-Att",C:C,0)))+1,3)/1,DAY(INDEX(D:D,MATCH("Comp-Att",C:C,0))))
    
    R8 =IFERROR(LEFT(INDEX(D:D,MATCH("Comp-Att",C:C,0)),SEARCH("-",INDEX(D:D,MATCH("Comp-Att",C:C,0)))-1)/1,MONTH(INDEX(D:D,MATCH("Comp-Att",C:C,0))))
    
    S8 =INDEX(D:D,MATCH("Comp-Att",C:C,0)+2)/1
    
    T8 =INDEX(D:D,MATCH("Attempts",C:C,0))/1
    
    U8 =INDEX(D:D,MATCH("Attempts",C:C,0)-1)/1
    
    V8 =INDEX(D:D,MATCH("Special Teams TDs",C:C,0))/1
    
    W8 =INDEX(D:D,MATCH("Def TDs",C:C,0))/1
    
    X8 =P5
    
    Y8 =Q5
    
    Z8 =R5
    
    AA8 =S5 
    
    AB8 =T5
    AC8 =U5
    AD8 =V5
    AE8 =W5
    
    AF8 =INDEX(A:A,MATCH("Box Score Box Team Stats Team Stats Play-By-Play PBP Scoring Summary Scoring ",A:A,0)-8)
    
    AG =AF5
    
    You should now see the game statistics populated for both teams. 
    
    Copy L4: AG4
    Select Sheet Data, and paste in Row 1
    Select sheet games

    Click Macros and Edit Macro One

    Find the line that says "Do While Counter < 2", Change this to 50 and click run botton. Wait roughly 5 minutes. Click on Data, and you should have game results for 49 games pasted there.

    Going forward, I usually run 100 to 150 at a time. This stops the program from crashing, as it tend to crash sometimes if you set the numbers to high, and I like to save as I go.

    To Run this, change the "Do While Counter < 2" to whatever number, and change the "Counter = 1" to the number last time

    For example, if you ran the first 50, change the "Do While Counter < 100 " and Counter = 50

    I run this while I watch baseball, and just watch for when it finishes running. After you scrap all the data, your sheet might be a little sluggish. It is good to copy the data to another workbook and work on it from there.

    This should be all, I might make edits when I try to walk through doing this based on these instructions.

    DONE
    Last edited by Waterstpub87; 07-25-17 at 06:48 PM.

  11. #11
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Remeber the Smiles are : D

    Should be no more smileys, saving in the code gets rid of smiles.

    Feel free to post any questions
    Last edited by Waterstpub87; 07-25-17 at 06:49 PM.

  12. #12
    eastvan09
    eastvan09's Avatar Become A Pro!
    Join Date: 09-30-09
    Posts: 1,400
    Betpoints: 1728

    Just read this as I am back on the forum after several years away. The easy to follow instructions are great. I have not tried this yet but am considering doing so. How many years back will you scrape? Once you have this data, how to decide on calculations which use the data to highlight games which are a play in the upcoming schedule?

  13. #13
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by eastvan09 View Post
    Just read this as I am back on the forum after several years away. The easy to follow instructions are great. I have not tried this yet but am considering doing so. How many years back will you scrape? Once you have this data, how to decide on calculations which use the data to highlight games which are a play in the upcoming schedule?
    This is season is on going, so my scraping is already being put into practice. I scraped 3 years. Used regressions to determine the results of individuals plays based on differences in power ratings in past data to apply to this season.

  14. #14
    eastvan09
    eastvan09's Avatar Become A Pro!
    Join Date: 09-30-09
    Posts: 1,400
    Betpoints: 1728

    Quote Originally Posted by Waterstpub87 View Post
    This is season is on going, so my scraping is already being put into practice. I scraped 3 years. Used regressions to determine the results of individuals plays based on differences in power ratings in past data to apply to this season.
    Thanks Water!

  15. #15
    kodiakken
    kodiakken's Avatar Become A Pro!
    Join Date: 08-21-17
    Posts: 1
    Betpoints: 120

    Thanks. I'm trying to learn how to use Excel in similar ways for different sports and this will give me something to study.

  16. #16
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by kodiakken View Post
    Thanks. I'm trying to learn how to use Excel in similar ways for different sports and this will give me something to study.
    Glad to help. Will be posting scraping program for NCAA Basketball, likely end of October or so.

  17. #17
    dbouchot
    dbouchot's Avatar Become A Pro!
    Join Date: 07-16-17
    Posts: 94
    Betpoints: 148

    hi Waters, first of all ty for this thread!
    Im sorry to bother you with this but hopefully you can help me, on the step of running the macro after changing 129 to 2 im getting a runtime 438 error, when i debug it highlighs .preserveformatting = false, do you have any idea what am I doing wrong?

  18. #18
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by dbouchot View Post
    hi Waters, first of all ty for this thread!
    Im sorry to bother you with this but hopefully you can help me, on the step of running the macro after changing 129 to 2 im getting a runtime 438 error, when i debug it highlighs .preserveformatting = false, do you have any idea what am I doing wrong?
    Try exiting out of the work book, and then running again. Sometimes, if there has been an error where the website did not load correctly, some of those options get stuck.

Top