Excel based tutorial for web scraping

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • oilcountry99
    SBR Wise Guy
    • 08-29-10
    • 707

    #176
    Originally posted by b_rad_1983
    Got it Oil...

    This will search row a1 through a100. Look for the word STARTERS. Once found it will move up 2 rows and color the entire row as well as change the font color.


    Dim r As Range
    Dim i As Long
    Set r = Range("A1:A100")
    For i = r.Rows.Count To 1 Step -1
    With r.Cells(i, 1)
    If .Value = "STARTERS" Then
    Range("A" & i - 2).EntireRow.Interior.Color = RGB(25, 25, 112)
    Range("A" & i - 2).EntireRow.Font.Color = vbWhite
    End If
    End With
    Next I
    Another thought would be adding conditional formatting to your 'final' sheet or wherever the data is pasted to from the temp sheet.
    Comment
    • b_rad_1983
      SBR High Roller
      • 01-07-13
      • 127

      #177
      This is my current copy selection

      Worksheets("Test").Range("A2299").Copy _


      this is a way i found to get to a cell and clear it.

      Dim cell As Range
      For Each cell In [a:a]
      If cell.Value = "SPORTS" Then cell.ClearContents 'put any value you want here
      Next cell


      What i would like to do is use them both together, to either delete everything after cell.value or use cell.value in my copy instead of p99

      have having a hard time at combining everything
      Comment
      • b_rad_1983
        SBR High Roller
        • 01-07-13
        • 127

        #178
        Instead of copying I just deleted un wanted stuff and formatted on the main page. At the end of my code it renames the sheet.
        So I guess you could say I figured it out.
        Comment
        • b_rad_1983
          SBR High Roller
          • 01-07-13
          • 127

          #179
          Text Splitting...

          Pittsburgh Pirates at Chicago Cubs
          Milwaukee Brewers at Philadelphia Phillies
          New York Mets at Atlanta Braves

          That's a1 trough a3

          I would like to have the home teams in r1 through r3 and visiting teams in s1 through s3.
          Matchups will not always be in the same order.

          It seems that for some teams what I tried works, but with others it doesn't.
          Comment
          • b_rad_1983
            SBR High Roller
            • 01-07-13
            • 127

            #180
            New York Mets at Atlanta Braves =RIGHT(A59,LEN(A59)-(1)-FIND("at",A59))
            that works to get Atlanta

            Pittsburgh Pirates at Chicago Cubs =RIGHT(A64,LEN(A64)-(1)-FIND("at",A64))
            same formula doesn't work to get Chicago
            Result= es at Chicago Cubs
            Comment
            • b_rad_1983
              SBR High Roller
              • 01-07-13
              • 127

              #181
              Got it =TRIM(RIGHT(SUBSTITUTE(A64," at",REPT(" ",LEN(A64))),LEN(A64)))
              Comment
              • b_rad_1983
                SBR High Roller
                • 01-07-13
                • 127

                #182
                On the current sheet that I'm using to import my data from Web. If I type in a formula and fill down it keeps all the same result, but the actual formula updates by row like it should. I need to double click the cell then hit enter for the correct results.
                Is this how excel is supposed to act? I've never imported data with Excel before but am I supposed to close the connection after? Is this a setting issue?
                Comment
                • b_rad_1983
                  SBR High Roller
                  • 01-07-13
                  • 127

                  #183
                  Sub indirect()


                  Dim r As Range
                  Dim i As Long
                  Dim cell As Range


                  Set r = Range("A1:A300")
                  For i = r.Rows.Count To 1 Step -1
                  With r.Cells(i, 1)
                  If .Value = "STARTERS" Then
                  Range("A" & i - 2).Range("$r$1").FormulaR1C1 = _
                  "=TRIM(RIGHT(SUBSTITUTE(RC[-17],"" at"",REPT("" "",LEN(RC[-17]))),LEN(RC[-17])))"
                  End If
                  End With
                  Next i
                  End Sub


                  This will search a1 through a300 and look for they key word "starters". Once it finds starters it will move up two rows and split the matchup and place the home team in column r

                  table i imported is http://espn.go.com/mlb/probables
                  Comment
                  • HeeeHAWWWW
                    SBR Hall of Famer
                    • 06-13-08
                    • 5487

                    #184
                    Good thread this - wish it was available a few years ago, would have saved me an enormous amount of time :-)
                    Comment
                    • b_rad_1983
                      SBR High Roller
                      • 01-07-13
                      • 127

                      #185
                      Originally posted by HeeeHAWWWW
                      Good thread this - wish it was available a few years ago, would have saved me an enormous amount of time :-)

                      How advanced are you?

                      Im looking take a team name and match it to another sheet and grab some cells based on the team, can you help with that?
                      Comment
                      • HeeeHAWWWW
                        SBR Hall of Famer
                        • 06-13-08
                        • 5487

                        #186
                        Originally posted by b_rad_1983
                        How advanced are you?

                        Im looking take a team name and match it to another sheet and grab some cells based on the team, can you help with that?

                        Haven't done much of this lately (ended up paying someone to write a scraper!), but if I'm reading you correctly that sounds like a job for a vlookup.
                        Comment
                        • b_rad_1983
                          SBR High Roller
                          • 01-07-13
                          • 127

                          #187
                          Originally posted by HeeeHAWWWW
                          Haven't done much of this lately (ended up paying someone to write a scraper!), but if I'm reading you correctly that sounds like a job for a vlookup.

                          I was able to do it with INDEX. Looks like there may be 2 ways!

                          =IFERROR(INDEX(RPI!$C$3:$C$32,MATCH($R4, RPI!$B$3:$B$32,0)),"")
                          Comment
                          • b_rad_1983
                            SBR High Roller
                            • 01-07-13
                            • 127

                            #188
                            I'm looking to import pitcher stats that include WHIP

                            I'm currently using http://espn.go.com/mlb/probables

                            But there is no walks, so I cannot use a formula to figure out WHIP stats.

                            Any suggestions of a better site than espn?

                            Or a way to figure out whip?
                            Comment
                            • b_rad_1983
                              SBR High Roller
                              • 01-07-13
                              • 127

                              #189
                              a4u2fear

                              im looking to import 5 different web pages with a "loop" doing the exact same formatting, then renaming the tab to a specific name.

                              as of now, after the standard pull in code i have the following.

                              'Deletes Test SheetApplication.DisplayAlerts = False
                              Sheets("Test").Select
                              ActiveWindow.SelectedSheets.Delete


                              'Add Titles
                              Sheets("MR").Select
                              Range("H9").Select
                              ActiveCell.FormulaR1C1 = "Us Odds"
                              Range("I9").Select
                              ActiveCell.FormulaR1C1 = "Decimal Odds"
                              Range("I10").Select
                              Columns("I:I").EntireColumn.AutoFit
                              Range("J9").Select
                              ActiveCell.FormulaR1C1 = "Win/Loss"
                              Range("J10").Select
                              Columns("J:J").EntireColumn.AutoFit

                              'autofits all rows
                              Sheets("MR").Select
                              Columns("A:I").Select
                              Columns("A:I").EntireColumn.AutoFit
                              Range("A3").Select






                              I would like to have something along the line of
                              for i = 1 to 5

                              1= site 1
                              2= site 2

                              With Sheets("NHLData").QueryTables.Add(Connec tion:= _
                              "URL;(1), Destination:= _
                              Range("$A$1"))

                              'Deletes Test SheetApplication.DisplayAlerts = False
                              Sheets("Test").Select
                              ActiveWindow.SelectedSheets.Delete


                              'Add Titles
                              Sheets("MR").Select
                              Range("H9").Select
                              ActiveCell.FormulaR1C1 = "Us Odds"
                              Range("I9").Select
                              ActiveCell.FormulaR1C1 = "Decimal Odds"
                              Range("I10").Select
                              Columns("I:I").EntireColumn.AutoFit
                              Range("J9").Select
                              ActiveCell.FormulaR1C1 = "Win/Loss"
                              Range("J10").Select
                              Columns("J:J").EntireColumn.AutoFit

                              'autofits all rows
                              Sheets("MR").Select
                              Columns("A:I").Select
                              Columns("A:I").EntireColumn.AutoFit
                              Range("A3").Select


                              next i

                              I know im way off, i just need a direction to go in. Do not do the work for me lol
                              Comment
                              • oilcountry99
                                SBR Wise Guy
                                • 08-29-10
                                • 707

                                #190
                                I'm looking to format the following:
                                Cell A1 contains: Jake Peavy (R)
                                I want cell B1 to contain: J. Peavy

                                I know this calls for right left replace trim etc etc commands but I don't get how that works for the life of me.

                                Any help here would be great

                                TIA
                                OC99
                                Last edited by oilcountry99; 06-09-14, 07:15 AM. Reason: Changed cell reference
                                Comment
                                • oilcountry99
                                  SBR Wise Guy
                                  • 08-29-10
                                  • 707

                                  #191
                                  Originally posted by b_rad_1983
                                  I'm looking to import pitcher stats that include WHIP

                                  I'm currently using http://espn.go.com/mlb/probables

                                  But there is no walks, so I cannot use a formula to figure out WHIP stats.

                                  Any suggestions of a better site than espn?

                                  Or a way to figure out whip?
                                  Complete source for baseball history including complete major league player, team, and league stats, awards, records, leaders, rookies and scores.
                                  Comment
                                  • b_rad_1983
                                    SBR High Roller
                                    • 01-07-13
                                    • 127

                                    #192
                                    Oil,

                                    give this a try

                                    =LEFT(A1,1) & ". " & B1 & MID(A1,SEARCH(" ", A1,1)+1,(SEARCH(" ",A1,SEARCH(" ",A1,1)+1)-1)-SEARCH(" ",A1,1))
                                    Comment
                                    • b_rad_1983
                                      SBR High Roller
                                      • 01-07-13
                                      • 127

                                      #193
                                      To be a bit more clear on what i want to do is to get data from 5 different pages (all with different names) with a loop to avoid having 5 different set of info. Just a bit more simpler code... and alternate method.

                                      As of now I have two modules, module 2 has -one sub that gets site 1, I copied that down and have a another sub to get site 2 and so on..

                                      Then in module 1 i have my main code, which is the one I run.

                                      sub main
                                      call site 1
                                      call site 2
                                      call site 3
                                      ...

                                      end sub
                                      Comment
                                      • akphidelt
                                        SBR MVP
                                        • 07-24-11
                                        • 1228

                                        #194
                                        Awesome thread guys. Hope I can help. Expert in excel. Also produced web programs, was on another site with an NBA tracker. Have it on www.akptracker.com. Can scrape any data off any site.

                                        B_rad, I promise you if you took the time to do this through php and MySQL you would never use excel again. And it's completely free to do. Just download Xampp from apachefriends.org. Either way, hope I can help. Love this stuff.
                                        Comment
                                        • oilcountry99
                                          SBR Wise Guy
                                          • 08-29-10
                                          • 707

                                          #195
                                          Originally posted by b_rad_1983
                                          Oil,

                                          give this a try

                                          =LEFT(A1,1) & ". " & B1 & MID(A1,SEARCH(" ", A1,1)+1,(SEARCH(" ",A1,SEARCH(" ",A1,1)+1)-1)-SEARCH(" ",A1,1))
                                          b_rad, thanks for the reply....no success however. I made an error in my request as well and have since updated post #190 to reflect that.

                                          I appreciate your efforts and any other suggestions you may have
                                          Comment
                                          • b_rad_1983
                                            SBR High Roller
                                            • 01-07-13
                                            • 127

                                            #196
                                            I had an error! copy and paste this in b1.


                                            =LEFT(A1,1) & ". " & MID(A1,SEARCH(" ", A1,1)+1,(SEARCH(" ",A1,SEARCH(" ",A1,1)+1)-1)-SEARCH(" ",A1,1))
                                            Comment
                                            • oilcountry99
                                              SBR Wise Guy
                                              • 08-29-10
                                              • 707

                                              #197
                                              Works great! My next question is what if the name is Jorge De La Rosa (L)?

                                              I end up with J. De
                                              Comment
                                              • b_rad_1983
                                                SBR High Roller
                                                • 01-07-13
                                                • 127

                                                #198
                                                You will need to play around with that one. Do a Google search for excel text splitting from space to bracket.

                                                I was able to overcome my problem by using

                                                Dim URL As Range


                                                For Each URL In Sheets("Urls").Range("A1").Cells 'Sheet Urls is where I have the different urls to add on to the site.
                                                ActiveWorkbook.Worksheets.Add
                                                With ActiveSheet.QueryTables.Add(Connection:= _
                                                "URL;www.firstpartofsite.com/" & URL, Destination:=Range("$A$1"))
                                                .Name = "table"
                                                .FieldNames = True
                                                .RowNumbers = False
                                                .FillAdjacentFormulas = False
                                                .PreserveFormatting = True
                                                .RefreshOnFileOpen = False
                                                .BackgroundQuery = True
                                                .RefreshStyle = xlOverwriteCells
                                                .SavePassword = False
                                                .SaveData = True
                                                .AdjustColumnWidth = False
                                                .RefreshPeriod = 0
                                                .WebSelectionType = "1"
                                                .WebFormatting = xlWebFormattingNone
                                                .WebPreFormattedTextToColumns = True
                                                .WebConsecutiveDelimitersAsOne = True
                                                .WebSingleBlockTextImport = False
                                                .WebDisableDateRecognition = True
                                                .WebDisableRedirections = False
                                                .Refresh BackgroundQuery:=False
                                                End With

                                                Next
                                                End Sub
                                                Comment
                                                • b_rad_1983
                                                  SBR High Roller
                                                  • 01-07-13
                                                  • 127

                                                  #199
                                                  a12 = Jorge De La Rosa (L)

                                                  =LEFT(A12,1) & ". " & MID(A12,SEARCH(" ", A12,1)+1,(SEARCH(" (",A12,SEARCH(" ",A12,1)+1)-1)-SEARCH(" ",A12,1))


                                                  b12 = J. De La Rosa
                                                  Comment
                                                  • oilcountry99
                                                    SBR Wise Guy
                                                    • 08-29-10
                                                    • 707

                                                    #200
                                                    Originally posted by b_rad_1983
                                                    a12 = Jorge De La Rosa (L)

                                                    =LEFT(A12,1) & ". " & MID(A12,SEARCH(" ", A12,1)+1,(SEARCH(" (",A12,SEARCH(" ",A12,1)+1)-1)-SEARCH(" ",A12,1))


                                                    b12 = J. De La Rosa
                                                    Thanks again for your help! This doesn't make any sense to me, i need to study it.
                                                    Last edited by oilcountry99; 06-09-14, 01:23 PM.
                                                    Comment
                                                    • akphidelt
                                                      SBR MVP
                                                      • 07-24-11
                                                      • 1228

                                                      #201
                                                      Originally posted by b_rad_1983
                                                      I'm looking to import pitcher stats that include WHIP

                                                      I'm currently using http://espn.go.com/mlb/probables

                                                      But there is no walks, so I cannot use a formula to figure out WHIP stats.

                                                      Any suggestions of a better site than espn?

                                                      Or a way to figure out whip?
                                                      What about getting the statistics for the pitchers from ESPN stats pages... like this...

                                                      ESPN is the place for MLB stats! Discover the All MLB Pitching stat leaders of the 2025 MLB Regular Season.


                                                      You can loop through the "count" and store all the pitcher data in a sheet, then just look up that data from the Probables page you have by pitcher?

                                                      You'd just have to adjust the pitcher count and the qualified/false or qualified/true parameters and loop through that beast and you should have all the pitching data you can handle.
                                                      Comment
                                                      • b_rad_1983
                                                        SBR High Roller
                                                        • 01-07-13
                                                        • 127

                                                        #202
                                                        Originally posted by akphidelt
                                                        What about getting the statistics for the pitchers from ESPN stats pages... like this...

                                                        ESPN is the place for MLB stats! Discover the All MLB Pitching stat leaders of the 2025 MLB Regular Season.


                                                        You can loop through the "count" and store all the pitcher data in a sheet, then just look up that data from the Probables page you have by pitcher?

                                                        You'd just have to adjust the pitcher count and the qualified/false or qualified/true parameters and loop through that beast and you should have all the pitching data you can handle.


                                                        On the probables page it all there and easy to get. The pitching link you posted, its over many of pages. I'm not advanced enough to take on this task.

                                                        I would love to make a sheet that has every pitcher in mlb and all thier stats.
                                                        Comment
                                                        • akphidelt
                                                          SBR MVP
                                                          • 07-24-11
                                                          • 1228

                                                          #203
                                                          Originally posted by b_rad_1983
                                                          On the probables page it all there and easy to get. The pitching link you posted, its over many of pages. I'm not advanced enough to take on this task.

                                                          I would love to make a sheet that has every pitcher in mlb and all thier stats.
                                                          I'll take a stab at it after lunch... might be a little rusty on VBA but it should come back.
                                                          Comment
                                                          • b_rad_1983
                                                            SBR High Roller
                                                            • 01-07-13
                                                            • 127

                                                            #204
                                                            Rusty or not. Point us in the direction and we should be able to help out!
                                                            Comment
                                                            • akphidelt
                                                              SBR MVP
                                                              • 07-24-11
                                                              • 1228

                                                              #205
                                                              Originally posted by b_rad_1983
                                                              Rusty or not. Point us in the direction and we should be able to help out!
                                                              This is super ghetto but it gets the job done on my end...

                                                              Code:
                                                              Sub GetPitchingStats()
                                                              
                                                              Dim startPitcher As Long
                                                              Dim lastPitcher As Long
                                                              Dim qualified As String
                                                              Dim lastRow As Long
                                                              
                                                              startPitcher = 1
                                                              lastPitcher = 800
                                                              qualified = "false"
                                                              lastRow = 1
                                                              
                                                              Application.ScreenUpdating = False
                                                              
                                                              Do Until startPitcher > lastPitcher
                                                                  With ActiveSheet.QueryTables.Add(Connection:= _
                                                                      "URL;http://espn.go.com/mlb/stats/pitching/_/count/" & startPitcher & "/qualified/" & qualified & "/order/false" _
                                                                      , Destination:=Range("$A$" & lastRow))
                                                                      .Name = "false"
                                                                      .FieldNames = True
                                                                      .RowNumbers = False
                                                                      .FillAdjacentFormulas = False
                                                                      .PreserveFormatting = True
                                                                      .RefreshOnFileOpen = False
                                                                      .BackgroundQuery = True
                                                                      .RefreshStyle = xlInsertDeleteCells
                                                                      .SavePassword = False
                                                                      .SaveData = True
                                                                      .AdjustColumnWidth = True
                                                                      .RefreshPeriod = 0
                                                                      .WebSelectionType = xlEntirePage
                                                                      .WebFormatting = xlWebFormattingNone
                                                                      .WebPreFormattedTextToColumns = True
                                                                      .WebConsecutiveDelimitersAsOne = True
                                                                      .WebSingleBlockTextImport = False
                                                                      .WebDisableDateRecognition = False
                                                                      .WebDisableRedirections = False
                                                                      .Refresh BackgroundQuery:=False
                                                                  End With
                                                                  
                                                                  'delete all blank cells in column R
                                                                  Columns("R:R").Select
                                                                  Selection.SpecialCells(xlCellTypeBlanks).Select
                                                                  Selection.EntireRow.Delete
                                                                  
                                                                  'start back at cell B1
                                                                  Range("B1").Select
                                                                  
                                                                  'add one to the last row so it starts on a blank row next time
                                                                  lastRow = Selection.End(xlDown).Row + 1
                                                                  
                                                                  'increase the pitcher count by 40
                                                                  startPitcher = startPitcher + 40
                                                              Loop
                                                              
                                                              'a little cleanup
                                                              lastRow = Range("$B$5000").End(xlUp).Row
                                                              For i = 2 To lastRow
                                                              If Cells(i, 2) = "PLAYER" Then
                                                                  Rows(i & ":" & i).EntireRow.Delete
                                                              End If
                                                              Next i
                                                              
                                                              Application.ScreenUpdating = True
                                                              
                                                              End Sub
                                                              Comment
                                                              • b_rad_1983
                                                                SBR High Roller
                                                                • 01-07-13
                                                                • 127

                                                                #206
                                                                Originally posted by akphidelt
                                                                This is super ghetto but it gets the job done on my end...

                                                                Code:
                                                                Sub GetPitchingStats()
                                                                
                                                                Dim startPitcher As Long
                                                                Dim lastPitcher As Long
                                                                Dim qualified As String
                                                                Dim lastRow As Long
                                                                
                                                                startPitcher = 1
                                                                lastPitcher = 800
                                                                qualified = "false"
                                                                lastRow = 1
                                                                
                                                                Application.ScreenUpdating = False
                                                                
                                                                Do Until startPitcher > lastPitcher
                                                                    With ActiveSheet.QueryTables.Add(Connection:= _
                                                                        "URL;http://espn.go.com/mlb/stats/pitching/_/count/" & startPitcher & "/qualified/" & qualified & "/order/false" _
                                                                        , Destination:=Range("$A$" & lastRow))
                                                                        .Name = "false"
                                                                        .FieldNames = True
                                                                        .RowNumbers = False
                                                                        .FillAdjacentFormulas = False
                                                                        .PreserveFormatting = True
                                                                        .RefreshOnFileOpen = False
                                                                        .BackgroundQuery = True
                                                                        .RefreshStyle = xlInsertDeleteCells
                                                                        .SavePassword = False
                                                                        .SaveData = True
                                                                        .AdjustColumnWidth = True
                                                                        .RefreshPeriod = 0
                                                                        .WebSelectionType = xlEntirePage
                                                                        .WebFormatting = xlWebFormattingNone
                                                                        .WebPreFormattedTextToColumns = True
                                                                        .WebConsecutiveDelimitersAsOne = True
                                                                        .WebSingleBlockTextImport = False
                                                                        .WebDisableDateRecognition = False
                                                                        .WebDisableRedirections = False
                                                                        .Refresh BackgroundQuery:=False
                                                                    End With
                                                                    
                                                                    'delete all blank cells in column R
                                                                    Columns("R:R").Select
                                                                    Selection.SpecialCells(xlCellTypeBlanks).Select
                                                                    Selection.EntireRow.Delete
                                                                    
                                                                    'start back at cell B1
                                                                    Range("B1").Select
                                                                    
                                                                    'add one to the last row so it starts on a blank row next time
                                                                    lastRow = Selection.End(xlDown).Row + 1
                                                                    
                                                                    'increase the pitcher count by 40
                                                                    startPitcher = startPitcher + 40
                                                                Loop
                                                                
                                                                'a little cleanup
                                                                lastRow = Range("$B$5000").End(xlUp).Row
                                                                For i = 2 To lastRow
                                                                If Cells(i, 2) = "PLAYER" Then
                                                                    Rows(i & ":" & i).EntireRow.Delete
                                                                End If
                                                                Next i
                                                                
                                                                Application.ScreenUpdating = True
                                                                
                                                                End Sub

                                                                I had to add Dim i As Integer, then it worked great!

                                                                Thanks!
                                                                Comment
                                                                • akphidelt
                                                                  SBR MVP
                                                                  • 07-24-11
                                                                  • 1228

                                                                  #207
                                                                  Originally posted by b_rad_1983
                                                                  I had to add Dim i As Integer, then it worked great!

                                                                  Thanks!
                                                                  Ahhh, you must have it set to where all variables need to be declared. Well that's good that you got it to work. Always so many little things that can screw stuff up, lol.
                                                                  Comment
                                                                  • b_rad_1983
                                                                    SBR High Roller
                                                                    • 01-07-13
                                                                    • 127

                                                                    #208
                                                                    In the time it took you to do that, how long would it take to accomplish it in php?
                                                                    Comment
                                                                    • lamichaeljames
                                                                      SBR Rookie
                                                                      • 06-02-14
                                                                      • 40

                                                                      #209
                                                                      Good thread!
                                                                      Comment
                                                                      • akphidelt
                                                                        SBR MVP
                                                                        • 07-24-11
                                                                        • 1228

                                                                        #210
                                                                        Originally posted by b_rad_1983
                                                                        In the time it took you to do that, how long would it take to accomplish it in php?
                                                                        Just to get that data would be about 10-15 minutes. Would be another 10-15 minutes if I wanted to set up a table in the DB to store it.

                                                                        The beauty of it though, you could scrape every games data, store it and calculate that data yourself if you wanted to.
                                                                        Comment
                                                                        SBR Contests
                                                                        Collapse
                                                                        Top-Rated US Sportsbooks
                                                                        Collapse
                                                                        Working...