1. #1
    TomG
    Join Date: 10-29-07
    Posts: 500

    Parsing Pinnacle XML in R

    Since we're on the subject of Pinnacle XML, I wanted to throw this challenge out there to my fellow "R-Tards." For those without any experience in R, it's probably beyond your ability, but I guess it could still be a good learning experience. I spent a few hours yesterday trying to parse out Pinnacle's XML file in R using the XML package without much success.

    Download R
    http://www.r-project.org/

    XML Package Documentation
    http://cran.r-project.org/web/packages/XML/XML.pdf

    I've attached yesterday's Pinnacle XML file for download so we don't have to hammer their server every time we load the data (download it and edit the code to direct it to the new directory location). I've also attached some sample code to copy/paste into R's GUI to get the ball rolling.

    Ideally, I'd like the parsed XML in a data.frame format within R. That is, an easy-to-reference table with columns similar to the way Excel parses the XML file.

    500 betpoints, plus my gratitude and respect, go to anyone who can finish the job.
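    A minimal sketch of the data.frame goal, using only the XML package's xmlParse(), getNodeSet(), xpathSApply() and xmlValue(). The feed layout below (event, gamenumber, league, participants/participant/participant_name) is a hypothetical, simplified stand-in, not Pinnacle's actual schema, so the XPath strings would need adjusting against the real file. The approach: select the repeating event nodes, pull each field with a relative XPath, and bind one row per event.

```r
library(XML)

# Hypothetical, simplified stand-in for a Pinnacle-style feed.
# Element names and nesting are assumptions, not the real schema.
feed <- '<pinnacle_feed><events>
  <event>
    <gamenumber>101</gamenumber>
    <league>NFL</league>
    <participants>
      <participant><participant_name>Team A</participant_name></participant>
      <participant><participant_name>Team B</participant_name></participant>
    </participants>
  </event>
  <event>
    <gamenumber>102</gamenumber>
    <league>NBA</league>
    <participants>
      <participant><participant_name>Team C</participant_name></participant>
      <participant><participant_name>Team D</participant_name></participant>
    </participants>
  </event>
</events></pinnacle_feed>'

# For the downloaded file, use xmlParse("path/to/file.xml") instead.
doc <- xmlParse(feed, asText = TRUE)

# Text of the first node matching a relative XPath, or NA if none.
first_text <- function(node, path) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# One row per <event>; the nested participants become fixed columns.
event_to_row <- function(ev) {
  teams <- xpathSApply(ev, "./participants/participant/participant_name", xmlValue)
  data.frame(
    gamenumber = first_text(ev, "./gamenumber"),
    league     = first_text(ev, "./league"),
    team1      = if (length(teams) >= 1) teams[1] else NA_character_,
    team2      = if (length(teams) >= 2) teams[2] else NA_character_,
    stringsAsFactors = FALSE
  )
}

events <- getNodeSet(doc, "//event")
odds <- do.call(rbind, lapply(events, event_to_row))
print(odds)
```

    Fixed team1/team2 columns are a design choice for two-sided events; odds or period blocks could be pulled the same way with more relative XPaths.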

  2. #2
    TomG
    Join Date: 10-29-07
    Posts: 500

    PS: the xmlToDataFrame() function in the XML package is probably what will ultimately get the job done, but I couldn't get it to work with Pinnacle's XML format.
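    As far as I can tell, xmlToDataFrame() works best when each record node has only flat, leaf-level children, each of which becomes a column; nested blocks (a participants list, say) tend to get collapsed into one concatenated text value, which may be why it chokes on the full feed. A sketch on a hypothetical flat subset (element names are assumptions), pointing the function at the record nodes explicitly:

```r
library(XML)

# Hypothetical flat records: each <event> has only leaf children.
flat <- '<events>
  <event><gamenumber>101</gamenumber><league>NFL</league><total>44.5</total></event>
  <event><gamenumber>102</gamenumber><league>NBA</league><total>210</total></event>
</events>'

doc <- xmlParse(flat, asText = TRUE)

# Hand xmlToDataFrame the repeating record nodes;
# each child element name becomes a column.
df <- xmlToDataFrame(nodes = getNodeSet(doc, "//event"),
                     stringsAsFactors = FALSE)

# Values come back as text; convert numeric columns yourself.
df$total <- as.numeric(df$total)
str(df)
```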

  3. #3
    MonkeyF0cker
    Update your status
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    Perhaps an easier solution:

    Load the XML into Excel. Save as CSV.

    In R:

    > tree <- read.csv(file="C:\\foo\\whatever.csv",header=FALSE,sep=",");

  4. #4
    MonkeyF0cker
    Update your status
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    Or are you looking to automate it?

  5. #5
    TomG
    Join Date: 10-29-07
    Posts: 500

    That's a potential workaround, and it's my backup plan. I already have calls within R to read Excel files for other stuff. But the whole goal is to transition away from Excel and have everything automated within R.

    There are some pretty helpful examples here. But it looks like it's not going to be a simple feat. I'm going to need to write an event handler to visit every node in the tree. It seems crazy that there isn't already a function in the package to do that for me.
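    Instead of a SAX-style event handler, one alternative is a recursive walk over the parsed tree that visits every element and records each leaf's path and text. A sketch using xmlRoot(), xmlChildren(), xmlName() and xmlValue(); the function name walk_leaves and the sample elements are hypothetical. The long (path, value) output can then be reshaped into whatever wide layout you want.

```r
library(XML)

# Hypothetical snippet; element names are assumptions, not Pinnacle's schema.
feed <- '<events><event><gamenumber>101</gamenumber>
  <participants>
    <participant><participant_name>Team A</participant_name></participant>
    <participant><participant_name>Team B</participant_name></participant>
  </participants></event></events>'

doc <- xmlParse(feed, asText = TRUE)

# Recursively visit every element; emit one (path, value) row per leaf.
walk_leaves <- function(node, prefix = "") {
  path <- if (nzchar(prefix)) paste(prefix, xmlName(node), sep = "/") else xmlName(node)
  # Keep element children only (drops text, comment, and whitespace nodes).
  kids <- Filter(function(k) inherits(k, "XMLInternalElementNode"),
                 xmlChildren(node))
  if (length(kids) == 0) {
    return(data.frame(path = path, value = xmlValue(node),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, lapply(unname(kids), walk_leaves, prefix = path))
}

leaves <- walk_leaves(xmlRoot(doc))
print(leaves)
```

    For very large files, the package's xmlEventParse() provides the event-handler (SAX) route without building the whole tree in memory; that variant isn't shown here.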

    I'll play with it some more, though, and hopefully get something hacked together. Otherwise I'll just settle for some use of Excel. There is a read.xls() function in the "gdata" package that I've been using.

  6. #6
    MonkeyF0cker
    Update your status
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    You could parse the XML with a frontend like C# and load the data into R via the COM interface.

    That's how I'd handle it...

  7. #7
    rsigley
    Join Date: 02-23-08
    Posts: 304
    Betpoints: 186

    xmlToDataFrame

    R is slow for big XML files; better to work directly with a local DB, or use Python to grab and parse the XML, then send the commands you want to R.

  8. #8
    MonkeyF0cker
    Update your status
    Join Date: 06-12-07
    Posts: 12,144
    Betpoints: 1127

    I threw together a command line Pinny XML downloader/parser from some of my C# classes that writes a CSV file.

    You could call it from R with system() and load the CSV into a data.frame afterward.

    PM me if you're interested.

  9. #9
    uva3021
    Join Date: 03-01-07
    Posts: 537
    Betpoints: 381

    Search for pitchfx and R. MLB's Gameday data is delivered as XML, and many people have used R to extract and manipulate it.

  10. #10
    statdude
    always sweating
    Join Date: 09-11-12
    Posts: 117

    Quote Originally Posted by MonkeyF0cker
    Perhaps an easier solution:

    Load the XML into Excel. Save as CSV.

    In R:

    > tree <- read.csv(file="C:\\foo\\whatever.csv",header=FALSE,sep=",");

    Can you clarify what you mean by "load" the XML into Excel?

    Do you mean using VBA, or through Query? It doesn't seem to import directly the way other books' XML files do. Or am I missing something?

  11. #11
    HUY
    Join Date: 04-29-09
    Posts: 253
    Betpoints: 3257

    Quote Originally Posted by TomG
    Since we're on the subject of Pinnacle XML, I wanted to throw this challenge out there to my fellow "R-Tards." For those without any experience in R, it's probably beyond your ability, but I guess it could still be a good learning experience. I spent a few hours yesterday trying to parse out Pinnacle's XML file in R using the XML package without much success.

    Download R
    http://www.r-project.org/

    XML Package Documentation
    http://cran.r-project.org/web/packages/XML/XML.pdf

    I've attached yesterday's Pinnacle XML file for download so we don't have to hammer their server every time we load the data (download it and edit the code to direct it to the new directory location). I've also attached some sample code to copy/paste into R's GUI to get the ball rolling.

    Ideally, I'd like the parsed XML in a data.frame format within R. That is, an easy-to-reference table with columns similar to the way Excel parses the XML file.

    500 betpoints, plus my gratitude and respect, go to anyone who can finish the job.

    Why would you parse XML in R? Use the right tool for the job, i.e., Python to parse the XML, then export it to some format that R can work with.

  12. #12
    TomG
    Join Date: 10-29-07
    Posts: 500

    I originally had grand plans of creating an entire package within R related to betting objects.

    I ended up just hacking things together (as usual) using a mix of Excel, Notepad, and R: Excel for the parsing, Notepad as an intermediary, and R for the data analysis. I was originally importing the data into R directly from Excel, but it was just too damn slow. I think R loads the entire Excel sheet into memory first instead of just reading the file. Janky solution, but story of my life.

  13. #13
    Zesty41
    Join Date: 07-02-13
    Posts: 3
    Betpoints: 108

    What about using PHP to parse the XML into MySQL, and then using MySQL queries against the DB for the data analysis?
