1. #1
    gui_m_p
    Join Date: 09-18-13
    Posts: 123
    Betpoints: 1537

    How to scrape oddsportal?

    I've got the basics of programming in R, and would like to scrape the odds from this page:

    http://www.oddsportal.com/matches/soccer/

    However, the content (the odds) apparently is not in the page's HTML or XML, so the XML/RCurl packages are not working.

    What is the best approach? Is it possible to do it within R, or do I have to switch to another language?

    Thanks!

  2. #2
    TheDonger
    Join Date: 11-16-13
    Posts: 352
    Betpoints: 869

    What do you mean by "not in HTML"? Any content you see on the web is HTML unless it's a Java app. I've double-checked, and the page you linked is in HTML. If you download Google Chrome and press Ctrl+Shift+J, you can use the developer tools to browse the page's code in a clean layout.

    In my personal opinion, PHP is great for scraping HTML. You can read the content easily and search through it effectively as a string. You then connect that to a MySQL DB. Once that's done, you have pretty much all the data you need with simple queries. GL.
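The workflow described above (search the page source as a string, store the hits in a database, then query them) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with the built-in sqlite3 module as a stand-in for PHP and MySQL, and the sample markup, the class names `match` and `odds`, and the table and column names are all made up rather than taken from oddsportal.

```python
import re
import sqlite3

# Made-up sample markup; the class names are placeholders, not oddsportal's real ones.
SAMPLE_HTML = """
<table>
<tr><td class="match">Team A - Team B</td><td class="odds">1.85</td></tr>
<tr><td class="match">Team C - Team D</td><td class="odds">2.40</td></tr>
</table>
"""

# One fixture cell followed by one odds cell.
ROW_PATTERN = re.compile(
    r'<td class="match">(.*?)</td>\s*<td class="odds">([\d.]+)</td>'
)


def extract_rows(html):
    """Search the page source string and return (fixture, price) tuples."""
    return [(name, float(odds)) for name, odds in ROW_PATTERN.findall(html)]


def store_rows(conn, rows):
    """Create the (hypothetical) odds table if needed and insert the rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS odds (fixture TEXT, price REAL)")
    conn.executemany("INSERT INTO odds (fixture, price) VALUES (?, ?)", rows)
    conn.commit()


def fixtures_above(conn, threshold):
    """Example of a simple query: fixtures priced above a threshold."""
    cur = conn.execute(
        "SELECT fixture, price FROM odds WHERE price > ? ORDER BY price",
        (threshold,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    store_rows(conn, extract_rows(SAMPLE_HTML))
    print(fixtures_above(conn, 2.0))
```

Regular expressions are brittle against real-world markup, so a pattern like this would probably need adjusting whenever the page layout changes.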

  3. #3
    thom321
    Join Date: 06-17-11
    Posts: 112
    Betpoints: 4983

    I am not familiar with R, but I typically download the web page source code as a string and then write my own parsing code depending on what data I would like to extract. A lot of sites make this hard, since the URL the data actually comes from is not the URL displayed in the address bar.
    Last edited by thom321; 05-12-14 at 01:46 PM.
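As a rough sketch of the "download the source as a string, then parse it yourself" approach, the Python below uses only the standard library. The helper names `fetch_source` and `extract_cells` and the `CellCollector` class are hypothetical, and the code assumes the wanted values sit in `<td>` cells that can be identified by a class name. If the data is loaded from a different URL than the one in the address bar, `fetch_source` would need to be pointed at that URL instead.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.cells = []
        self._buffer = None  # holds text pieces while inside a matching <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Nested tables are not handled; this is enough for flat rows.
        if tag == "td" and self._buffer is not None:
            self.cells.append("".join(self._buffer).strip())
            self._buffer = None


def fetch_source(url):
    """Download a page and return its source as a string."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_cells(html, target_class):
    """Return the text of all matching cells found in an HTML string."""
    parser = CellCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.cells
```

Keeping the download and the parsing in separate functions means the parser can be tried out on a saved copy of the page before any live requests are made.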

  4. #4
    lamichaeljames
    Join Date: 06-02-14
    Posts: 40
    Betpoints: 109

    Did you figure this out?

  5. #5
    Omaga
    I GET IT...
    Join Date: 07-10-12
    Posts: 460
    Betpoints: 313

    Here is an iMacros script I made for you.

    1) Download iMacros for Firefox or Internet Explorer
    2) Copy the script below into Notepad and save it with an .iim extension
    3) Place the .iim file in your iMacros folder and run it
    4) The output is saved as oddsportal.csv in your iMacros downloads folder

    The forum turns ":o" (a colon followed by the letter o, not a zero) into a smiley face. If you see a smiley in the script, replace it with those two characters, with no space between them, so the class reads CLASS:odds-nowrp.

    VERSION BUILD=8032216
    TAB T=1
    SET !ERRORIGNORE YES
    SET !REPLAYSPEED FAST
    SET !TIMEOUT_STEP 1
    SET !TIMEOUT_PAGE 30
    SET !LOOP 1
    'Increase the current position in the file with each loop
    SET !DATASOURCE_LINE {{!LOOP}}
    SET !EXTRACT_TEST_POPUP NO
    URL GOTO=http://www.oddsportal.com/matches/soccer/
    'Spaces inside class names are written as <SP>
    TAG POS={{!LOOP}} TYPE=TD ATTR=CLASS:name<SP>table-participant EXTRACT=TXT
    SET !VAR0 {{!EXTRACT}}
    SET !EXTRACT NULL
    TAG POS={{!LOOP}} TYPE=TD ATTR=CLASS:center<SP>bold<SP>table-odds<SP>table-score EXTRACT=TXT
    SET !VAR1 {{!EXTRACT}}
    SET !EXTRACT NULL
    TAG POS={{!LOOP}} TYPE=TD ATTR=CLASS:odds-nowrp EXTRACT=TXT
    SET !VAR2 {{!EXTRACT}}
    SET !EXTRACT NULL
    TAG POS={{!LOOP}} TYPE=TD ATTR=CLASS:odds-nowrp EXTRACT=TXT
    SET !VAR3 {{!EXTRACT}}
    SET !EXTRACT NULL
    TAG POS={{!LOOP}} TYPE=TD ATTR=CLASS:odds-nowrp<SP>result-ok<SP>in-coupon EXTRACT=TXT
    SET !VAR4 {{!EXTRACT}}
    SET !EXTRACT NULL
    TAG POS={{!LOOP}} TYPE=TD ATTR=CLASS:center<SP>info-value EXTRACT=TXT
    SET !VAR5 {{!EXTRACT}}
    SET !EXTRACT NULL

    SET !EXTRACT NULL
    ADD !EXTRACT {{!VAR0}}
    ADD !EXTRACT {{!VAR1}}
    ADD !EXTRACT {{!VAR2}}
    ADD !EXTRACT {{!VAR3}}
    ADD !EXTRACT {{!VAR4}}
    ADD !EXTRACT {{!VAR5}}
    SAVEAS TYPE=EXTRACT FOLDER=* FILE=oddsportal.csv
    SET !EXTRACT NULL
    Last edited by Omaga; 06-03-14 at 04:46 PM.
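Once the macro has produced oddsportal.csv, the file could be loaded for further processing along these lines. This is a sketch under assumptions: it presumes the file holds comma-separated, optionally quoted values with six fields per row (one per `!VAR0` to `!VAR5`), and the column names in `COLUMNS` are hypothetical labels chosen here, since the script itself does not name them. The sample row is made up purely to show the shape of the result.

```python
import csv
import io

# Hypothetical labels for the six values the macro saves per row (!VAR0..!VAR5).
COLUMNS = ["participants", "score", "odds_a", "odds_b", "odds_c", "info"]


def read_extract(handle):
    """Turn each CSV row into a dict keyed by COLUMNS, skipping incomplete rows."""
    rows = []
    for record in csv.reader(handle):
        if len(record) < len(COLUMNS):
            continue  # a row where some extractions failed or a blank line
        rows.append(dict(zip(COLUMNS, (field.strip() for field in record))))
    return rows


if __name__ == "__main__":
    # Made-up sample row; a real run would open the saved file instead, e.g.
    # with open("oddsportal.csv", newline="") as handle: ...
    sample = io.StringIO('"Team A - Team B","2:1","1.85","3.40","4.20","12"\n')
    for row in read_extract(sample):
        print(row["participants"], row["odds_a"])
```

The values are kept as strings because an extraction can come back empty or non-numeric, so any conversion to numbers is best done after checking each field.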

  6. #6
    lamichaeljames
    Join Date: 06-02-14
    Posts: 40
    Betpoints: 109

    thanks!
