1. #1
    Waterstpub87
    Slan go foill
    Waterstpub87's Avatar Become A Pro!
    Join Date: 09-09-09
    Posts: 4,044
    Betpoints: 7292

    Python Series: MLB Default Lineups from Rotowire

    Back in the day, maybe 4 or 5 years ago, it would take me 45 minutes to update my MLB spreadsheet every night. One large task was updating lineups, specifically substituting the platoon players against left-handed pitchers. At the time, there was a website that had a grid of all the batting lineups. That site was taken down, forcing me to look elsewhere for default lineups. My original scraper was written in VBA; I wrote this script in Python to capture the daily lineups from Rotowire.

    Sharing here, hope that it can help someone save some time.

    Please don't reply until sequence is finished.

    Thanks.

    Will consider doing a series of these types of automations if there is interest.
    Points Awarded:

    Optional gave Waterstpub87 2 Betpoint(s) for this post.

    Optional gave Waterstpub87 150 Betpoint(s) for this post.

    peacebyinches gave Waterstpub87 100 Betpoint(s) for this post.

    Nomination(s):
    This post was nominated 2 times. People who nominated: PPP_JP and peacebyinches

  2. #2
    Waterstpub87

    Step 1: Installing Python:

    I use the Anaconda distribution. I find it easy to use and easy to troubleshoot code in.

    Where to get it: https://www.anaconda.com/products/distribution

    Install it and move on to the next step.

  3. #3
    Waterstpub87

    Step 2: Installing Selenium:

    The Python package I use for web scraping is Selenium. This particular script uses Google Chrome. If you need a different browser, you will have to search for the proper driver (Step 3).

    1. Search for Anaconda Prompt in your start menu
    2. It will load a CMD-like screen
    3. Type pip install selenium

    This will download the selenium package to your machine for use in Python.
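
One way to confirm the install worked, offered here as a sketch and not part of the original steps, is to ask Python whether it can find the package. The helper name `is_installed` is made up for this example; `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Should report True once `pip install selenium` has succeeded in this environment
    print("selenium installed:", is_installed("selenium"))
```

If this reports False inside Spyder, the package was probably installed into a different Python environment than the one Spyder is running.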

  4. #4
    Waterstpub87

    Step 3: Getting the Chrome Driver

    You need a driver to be able to use selenium. It basically uses the browser in test mode to go to the website, and run the scraping.

    Go here to retrieve it: https://chromedriver.chromium.org/downloads

    Select the one that matches your version of Chrome. If Chrome updates itself, you will need to re-download the driver for the new version.

    It will download a folder. I tend to copy the driver out of it, and put it in documents or something easy to get to.
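
Since the driver has to match the installed Chrome, comparing version numbers can catch a stale driver before the script fails. The sketch below is illustrative only: the function names and the version strings are made up, and it assumes that comparing the first number of each version (the major version) is the check that matters. As a side note, Selenium 4.6 and later ship with Selenium Manager, which can usually locate or download a matching driver on its own, so the manual download may not be needed on newer installs.

```python
def major_version(version):
    """Return the leading number of a dotted version string, e.g. '101.0.0.1' -> 101."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(chrome_version, driver_version):
    """True when Chrome and the driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Example version strings, not read from a real install
    print(driver_matches_browser("101.0.0.1", "101.0.0.2"))  # True
    print(driver_matches_browser("101.0.0.1", "100.0.0.9"))  # False
```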
    Last edited by Waterstpub87; 04-24-22 at 11:23 PM.

  5. #5
    Waterstpub87

    Step 4: Opening python

    I use Spyder as the development environment.

    Search for Spyder in your start menu.

    Click to open

    It will bring it up, with a temp file loaded.

    Click new file

    Save as "MLB lineup scrape" or a file name of your choosing. Save it where you want your end files to go, something like documents or a folder on your desktop.
    Paste the code below

    Edit the path to your driver file. It needs to be inside the single quotes ' '. It does not need an extension; it should just end in chromedriver. The path needs double backslashes \\, like:
    'C:\\user\\documents\\chromedriver'

    Then click the green run button.
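
The doubled backslashes are needed because a single backslash starts an escape sequence in an ordinary Python string (for instance, the `\u` in `C:\user` is read as the start of a Unicode escape and raises a syntax error). As a small sketch, here are three ways to write the same path; the path itself is just the placeholder from above, and the variable names are arbitrary.

```python
from pathlib import PureWindowsPath

# Three ways to write the same Windows path (the path is a placeholder)
escaped = 'C:\\user\\documents\\chromedriver'   # doubled backslashes
raw = r'C:\user\documents\chromedriver'         # raw string, single backslashes
built = str(PureWindowsPath('C:/user/documents/chromedriver'))  # forward slashes, converted

# All three produce the identical string
print(escaped == raw == built)  # True
```

Any of the three forms should work wherever the script asks for the driver path.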

    Code:
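    import pandas as pd

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Selenium 4 syntax: the driver path goes through Service, and elements are
    # located with find_elements(By.XPATH, ...). The older positional path and
    # find_elements_by_xpath forms were removed in later Selenium 4 releases.
    driver = webdriver.Chrome(service=Service('Path to chrome driver'))

    # Team abbreviations used in the Rotowire URL
    teams = ['BAL', 'BOS', 'CWS', 'CLE', 'DET',
             'HOU', 'KC', 'LAA', 'MIN', 'NYY',
             'OAK', 'SEA', 'TB', 'TEX', 'TOR',
             'ARI', 'ATL', 'CHC', 'CIN', 'COL',
             'LAD', 'MIA', 'MIL', 'NYM', 'PHI',
             'PIT', 'SD', 'SF', 'STL', 'WAS']

    vsRight = None
    VsLeft = None

    for x in teams:

        driver.get('https://www.rotowire.com/baseball/batting-orders.php?team=' + x)
        # Every lineup list on the team page is an <ol> with this class
        battertables = driver.find_elements(By.XPATH, '//ol[@class="list is-rankings pad-5-10"]')
        # battertables[1] is the default lineup vs right-handers, battertables[2] vs left-handers
        rbatteritems = battertables[1].find_elements(By.XPATH, './/li[@class="md-text"]')
        rbatters = [z.text for z in rbatteritems]
        Lbatteritems = battertables[2].find_elements(By.XPATH, './/li[@class="md-text"]')
        Lbatters = [z.text for z in Lbatteritems]

        # One single-column table per team, named after the team
        teamvsr = pd.DataFrame({x: rbatters})
        teamvsl = pd.DataFrame({x: Lbatters})

        # Start with the first team, then add one column per team
        if vsRight is None:
            vsRight = teamvsr
            VsLeft = teamvsl
        else:
            vsRight = vsRight.join(teamvsr)
            VsLeft = VsLeft.join(teamvsl)

    # quit() closes the browser and also shuts down the driver process
    driver.quit()
    vsRight.to_csv('VsRightBattingLineups.csv')
    VsLeft.to_csv('VsLeftBattingLineups.csv')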
    The script should open Chrome, and you should see it moving between the team pages. It should output two CSV files, one with the default lineups vs right-handed pitchers, the other with the default lineups vs left-handed pitchers.
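
To give a sense of what the output files look like and how they might be read back, here is a small sketch using only the standard library. It assumes the layout pandas writes by default (an unnamed index column followed by one column per team). The player names are placeholders, and `load_lineups` is a name invented for this example.

```python
import csv
import io


def load_lineups(csv_file):
    """Read a lineup CSV into {team: [batters in order]}, skipping the index column."""
    reader = csv.reader(csv_file)
    header = next(reader)          # first cell is the empty index header
    teams = header[1:]
    lineups = {team: [] for team in teams}
    for row in reader:
        for team, batter in zip(teams, row[1:]):
            if batter:             # blank cell means no batter in that slot
                lineups[team].append(batter)
    return lineups


if __name__ == "__main__":
    # Placeholder data in the same shape as VsRightBattingLineups.csv
    sample = io.StringIO(",BAL,BOS\n0,Player A,Player C\n1,Player B,Player D\n")
    print(load_lineups(sample)["BAL"])  # ['Player A', 'Player B']
```

With the real files, the same function could be called as `load_lineups(open('VsRightBattingLineups.csv', newline=''))`.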
    Last edited by Waterstpub87; 04-24-22 at 11:42 PM.

  6. #6
    Waterstpub87

    That should be it. Feel free to reply. Let me know if you run into any issues.

  7. #7
    KVB
    It's not what they bring...
    KVB's Avatar SBR PRO
    Join Date: 05-29-14
    Posts: 74,849
    Betpoints: 7576

    Very nice Waterst.


  8. #8
    Optional
    Optional's Avatar Moderator
    Join Date: 06-10-10
    Posts: 57,796
    Betpoints: 9194

    Nice easy to understand tutorial.

    Nice example to help new people get started with Python too

  9. #9
    LT Profits
    LT Profits's Avatar Become A Pro!
    Join Date: 10-27-06
    Posts: 90,963
    Betpoints: 5179

    If you are referring to Roster Resource, the grid still exists; it has moved to FanGraphs along with its manager, Jason Martinez.

  10. #10
    Waterstpub87

    Quote Originally Posted by LT Profits View Post
    If you are referring to Roster Resource, grid still exists, it has moved to FanGraphs with its manager Jason Martinez
    I was. Emailed back and forth with him many times.

    I could never find the grid again. Maybe they added it back this year. The grid stopped being updated, maybe in 2020. They then moved to FanGraphs, and maybe did not have the grid.

    At that point, I had built the VBA to scrape the actual pages, the underlying ones from Roster Resource. The FanGraphs pages never loaded in VBA, something about how they were structured.

    My issue with that was that the platoon was never listed as a separate lineup. It required a manual fix, which is annoying as hell.

    This particular process does not require that; it only needs a true/false with an index to load the vs left-handed lineup when required.
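
The true/false switch described above could look something like the following in Python. This is only a guess at the idea, not the actual spreadsheet logic: the function name, the 'L'/'R' codes, and the sample data are all assumptions made for illustration.

```python
def pick_lineup(team, pitcher_throws, vs_right, vs_left):
    """Return the team's default lineup for the opposing pitcher's throwing hand.

    pitcher_throws is 'L' or 'R'; vs_right and vs_left map team -> list of batters.
    """
    use_left = pitcher_throws.upper() == "L"   # the true/false flag
    source = vs_left if use_left else vs_right
    return source[team]


if __name__ == "__main__":
    # Placeholder lineups, shortened to two hitters each
    vs_right = {"BAL": ["Player A", "Player B"]}
    vs_left = {"BAL": ["Player A", "Player C"]}
    print(pick_lineup("BAL", "L", vs_right, vs_left))  # ['Player A', 'Player C']
```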

  11. #11
    LT Profits

    Yes, Jason and I used to message often, although we have not since last season. I used to help him with Opener pitchers (e.g., Long tonight for the Giants).

    I still update my defaults manually because a major flaw with the script is that it does not account for players who are known to be out injured but are still on the active roster, as they are still listed in the default lineup. Two examples are JD Martinez currently (pending tonight) and Buxton of Minnesota last week.

  12. #12
    KVB

    Yes, players injured but still in the default lineup is an issue.

    With all the juggling of lineups and pitchers these days, it has to be addressed.

    I often feel like every time I get things updated and running, I just create another issue where something manual must still be done.

    Always something manual...lol.

  13. #13
    Waterstpub87

    If you had an injured player list, which you could generate somewhere, you could add something to the above script to replace the player or print a message alerting you to do so.
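
A rough sketch of the alert idea, under some loud assumptions: the injured list comes from somewhere else (this thread does not say where), names match exactly between the two sources, and `flag_injured` plus all the sample names are invented for the example.

```python
def flag_injured(lineups, injured):
    """Return {team: [injured players found in that team's lineup]}."""
    injured_set = set(injured)
    flagged = {}
    for team, batters in lineups.items():
        hits = [b for b in batters if b in injured_set]
        if hits:
            flagged[team] = hits
    return flagged


if __name__ == "__main__":
    # Placeholder names; a real injured list would have to come from another source
    lineups = {"MIN": ["Player A", "Player B"], "BOS": ["Player C", "Player D"]}
    for team, names in flag_injured(lineups, ["Player A"]).items():
        print(f"Check {team}: {', '.join(names)} listed but injured")
```

Printing a message leaves the actual substitution to you, which avoids guessing who the replacement is.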

    I don't know your level of scripting knowledge, so my apologies if an obvious solution.

    Like I said, originally I was using Jason's grid. His grid looked like it loaded from underlying team pages. I audited this once and discovered massive differences; I think it was after the trade deadline a few years ago. On the grid, the traded players were on the old team, and on the individual team pages, they were on the new team. I wrote him, and he said it was in error, and that he was moving to FanGraphs. No blame, it was a great thing to get for free, and it's probably a lot of work, so I totally get it. You're not going to get something institutional quality for free, run by one guy.

    The old team pages were still in Google Docs, which fed to FanGraphs. So I wrote VBA to scrape the individual pages. The VBA took like 15 minutes or so to run, and VBA with scraping has a tendency to hang. On top of this, I still had to manually adjust the lineups.

    I built this earlier to get around it. I might add the injury thing, but my feeling is that places like Rotowire are likely doing a decent enough job, and I don't have a lot of free time. It adds a bit of error to the process I'm sure, but that's why you have tolerances.

    Edit to say I might be making this up. It was a few years ago, there was a reason I stopped using the grid. I could look at my emails from then, but it isn't important.
    Last edited by Waterstpub87; 04-25-22 at 12:31 PM.
    Points Awarded:

    KVB gave Waterstpub87 1 Betpoint(s) for this post.


  14. #14
    LT Profits

    Thing is I use actual anticipated lineups, so for example subbing Buxton's replacement in Buxton's leadoff spot does not work when that guy bats 9th. Plate appearances is a model component.
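
One way to picture the problem: if the replacement normally hits ninth, dropping him into the leadoff spot misstates his plate appearances. A rough sketch of an alternative, which is an assumption for illustration and not LT Profits' actual method, is to remove the injured hitter, move everyone below him up a spot, and bat the replacement last. The function name and the lineup entries are placeholders.

```python
def replace_at_bottom(lineup, injured, replacement):
    """Drop the injured hitter, shift the rest up, and bat the replacement last."""
    if injured not in lineup:
        return list(lineup)        # nothing to change
    remaining = [b for b in lineup if b != injured]
    return remaining + [replacement]


if __name__ == "__main__":
    lineup = ["Leadoff", "Second", "Third"]
    print(replace_at_bottom(lineup, "Leadoff", "Sub"))  # ['Second', 'Third', 'Sub']
```

A real anticipated lineup would depend on the manager, so this is at best a starting default to be checked by hand.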

  15. #15
    oilcountry99
    oilcountry99's Avatar Become A Pro!
    Join Date: 08-29-10
    Posts: 707
    Betpoints: 1094

    @Waterstreetpub87
    Thanks for this. I don't use Python or scrape, but it's a great working example. Would love to see more. Thanks for sharing.

  16. #16
    Waterstpub87

    Quote Originally Posted by LT Profits View Post
    Thing is I use actual anticipated lineups, so for example subbing Buxton's replacement in Buxton's leadoff spot does not work when that guy bats 9th. Plate appearances is a model component.
    I like this idea. Agree with you, actual batting position is important. If the rotowire stuff is actually accurate, I hope this solves it.

  17. #17
    oilcountry99

    Do you know if these lineups are more accurate than Rotogrinders' expected lineups?

  18. #18
    Waterstpub87

    Quote Originally Posted by oilcountry99 View Post
    Do you know if theses lineups are more accurate than Rotogrinders expected lineups?
    No idea. I've spot-checked it a few times when games start. They seem decent enough. I bet a lot of props, and I don't have a lot of missing players vs what DraftKings has.

    Also, it is being updated at least daily. I have a part of the Excel model that checks if I have data on a player. I keep having to add new players as the lineups change. I have to do this daily, so I assume it's pretty frequently updated.
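
The "do I have data on this player" check lives in Excel in the original setup. In Python the same idea might be sketched as a set difference, as below. The function name and the sample names are made up, and it assumes player names are spelled the same way in both places.

```python
def missing_players(lineups, known_players):
    """Return a sorted list of lineup names that have no data yet."""
    known = set(known_players)
    scraped = {b for batters in lineups.values() for b in batters}
    return sorted(scraped - known)


if __name__ == "__main__":
    # Placeholder names standing in for scraped lineups and the model's player list
    lineups = {"BAL": ["Player A", "Player B"], "BOS": ["Player C"]}
    print(missing_players(lineups, ["Player A", "Player C"]))  # ['Player B']
```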
    Last edited by Waterstpub87; 04-25-22 at 01:34 PM.

  19. #19
    Waterstpub87

    Quote Originally Posted by oilcountry99 View Post
    @Waterstreetpub87
    Thanks for this, I don't use python or scrape but its a great working example. Would love to see more. Thanks for sharing.
    Never too late to start. I've had to teach several people Python at work. I was a VBA guy and became a Python guy. It's like going from a shitty 1980s Honda to a Ferrari. They're both cars, but there is a world of difference.

    Appreciate the kind words from SBR luminaries such as KVB and Optional

  20. #20
    potamushippo
    potamushippo's Avatar SBR PRO
    Join Date: 03-06-19
    Posts: 13
    Betpoints: 9187

    Thanks. Tried to give you points but forum throws message
    This user unable to receive points.

  21. #21
    Optional

    Quote Originally Posted by potamushippo View Post
    Thanks. Tried to give you points but forum throws message
    This user unable to receive points.
    I assume what you mean is that you can only send 2 points a day as a non-Pro member.

    It's free to upgrade to Pro membership right now. Just click here and choose any option and submit the form and you will be approved. https://www.sportsbookreview.com/forum/sbr-pro/
