Parsing Pinnacle XML in R

  • TomG
    SBR Wise Guy
    • 10-29-07
    • 500

    #1
    Since we're on the subject of Pinnacle XML, I wanted to throw this challenge out there to my fellow "R-Tards." For those without any R experience it's probably beyond your ability, but it could still be a good learning experience. I spent a few hours yesterday trying to parse Pinnacle's XML file in R using the XML package, without much success.

    Download R


    XML Package Documentation


    I've attached yesterday's Pinnacle XML file so we don't have to hammer their server every time we load the data (download it and edit the file path in the code to point to wherever you saved it). I've also attached some sample code to copy/paste into the R GUI to get the ball rolling.

    Ideally, I'd like the parsed XML as a data.frame in R: an easy-to-reference table with columns, similar to the way Excel lays out the XML file.
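A minimal sketch of one way to get there with the XML package (assumed installed via install.packages("XML")): parse the feed, select every event node with XPath, and pull each event's nested values up into the columns of a single row. The sample feed, its element names (event, participant, visiting_home_draw, period, moneyline_home and so on) and the helper names node_text() and event_row() are hypothetical stand-ins; check the real names against the attached Pinnacle file. For a file on disk, pass its path to xmlParse() instead of the string with asText = TRUE.

```r
library(XML)

# Hypothetical two-event sample imitating a nested event/participant/period
# layout. Element names are assumptions, not the confirmed Pinnacle schema.
feed <- '<pinnacle_line_feed><events>
  <event>
    <gamenumber>101</gamenumber>
    <league>MLB</league>
    <participants>
      <participant><participant_name>Team A</participant_name><visiting_home_draw>Visiting</visiting_home_draw></participant>
      <participant><participant_name>Team B</participant_name><visiting_home_draw>Home</visiting_home_draw></participant>
    </participants>
    <periods><period>
      <period_number>0</period_number>
      <moneyline><moneyline_visiting>-110</moneyline_visiting><moneyline_home>100</moneyline_home></moneyline>
    </period></periods>
  </event>
  <event>
    <gamenumber>102</gamenumber>
    <league>MLB</league>
    <participants>
      <participant><participant_name>Team C</participant_name><visiting_home_draw>Visiting</visiting_home_draw></participant>
      <participant><participant_name>Team D</participant_name><visiting_home_draw>Home</visiting_home_draw></participant>
    </participants>
  </event>
</events></pinnacle_line_feed>'

doc <- xmlParse(feed, asText = TRUE)

# Text of the first node matching a path relative to `node`, or NA if absent.
node_text <- function(node, path) {
  hit <- getNodeSet(node, path)
  if (length(hit) == 0) NA_character_ else xmlValue(hit[[1]])
}

# Flatten one <event> into a one-row data.frame.
event_row <- function(ev) {
  data.frame(
    gamenumber  = node_text(ev, "./gamenumber"),
    league      = node_text(ev, "./league"),
    visitor     = node_text(ev, ".//participant[visiting_home_draw='Visiting']/participant_name"),
    home        = node_text(ev, ".//participant[visiting_home_draw='Home']/participant_name"),
    ml_visiting = as.numeric(node_text(ev, ".//period[period_number='0']/moneyline/moneyline_visiting")),
    ml_home     = as.numeric(node_text(ev, ".//period[period_number='0']/moneyline/moneyline_home")),
    stringsAsFactors = FALSE
  )
}

# One row per event, stacked into a single table.
odds <- do.call(rbind, lapply(getNodeSet(doc, "//event"), event_row))
print(odds)
```

Missing markets come through as NA instead of breaking the row, which matters because not every event carries every period or line type.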

    500 SBR points, plus my gratitude and respect, go to anyone who can finish the job.
  • TomG
    SBR Wise Guy
    • 10-29-07
    • 500

    #2
    PS: the xmlToDataFrame() function in the XML package is probably what will ultimately get the job done, but I couldn't get it to work with Pinnacle's XML format.
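One plausible reason it fails, offered as a guess: as I understand it, when handed a whole document, xmlToDataFrame() treats each child of the root as a row and each of that child's children as a column, which suits a flat list of records but not a feed whose values sit several levels deep. It does work when pointed at a set of flat records through its nodes argument. The sample and element names below are hypothetical.

```r
library(XML)

# Hypothetical nested sample; element names are assumptions, not the real feed.
feed <- '<pinnacle_line_feed><events>
  <event><gamenumber>101</gamenumber><participants>
    <participant><participant_name>Team A</participant_name><rotnum>901</rotnum></participant>
    <participant><participant_name>Team B</participant_name><rotnum>902</rotnum></participant>
  </participants></event>
</events></pinnacle_line_feed>'

doc <- xmlParse(feed, asText = TRUE)

# Each <participant> becomes a row; each of its leaf children becomes a column.
participants <- xmlToDataFrame(nodes = getNodeSet(doc, "//participant"),
                               stringsAsFactors = FALSE)

# That table loses which event each row came from, so look up the enclosing
# <event>'s gamenumber for every participant via the ancestor axis.
participants$gamenumber <- xpathSApply(doc, "//participant", function(p)
  xmlValue(getNodeSet(p, "ancestor::event/gamenumber")[[1]]))

print(participants)
```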
    • MonkeyF0cker
      SBR Posting Legend
      • 06-12-07
      • 12144

      #3
      Perhaps an easier solution:

      Load the XML into Excel. Save as CSV.

      In R:

      tree <- read.csv(file = "C:\\foo\\whatever.csv", header = FALSE, sep = ",")
      • MonkeyF0cker
        SBR Posting Legend
        • 06-12-07
        • 12144

        #4
        Or are you looking to automate it?
        • TomG
          SBR Wise Guy
          • 10-29-07
          • 500

          #5
          That's a potential workaround and it's my backup plan. I already have calls within R to read Excel files for other stuff. But the whole goal is to transition away from Excel and have everything automated within R.

          There are some pretty helpful examples here. But it looks like it's not going to be a simple feat. I'm going to need to write an event handler to visit every node in the tree. It seems crazy that there isn't already a function in the package to do that for me.
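For the node-visiting approach, the XML package's xmlEventParse() takes a list of SAX callbacks (startElement, text, endElement) and fires them while streaming through the file, so the full tree never has to be built in memory. The sketch below keeps just enough state to collect one row per event. It assumes every child of an event is a simple text element, and both the sample and the function name events_via_sax() are hypothetical.

```r
library(XML)

# Collect one row per <event> in a single streaming (SAX) pass.
# Assumes each child of <event> is a leaf element holding text; nested
# blocks such as participants would need extra state.
events_via_sax <- function(xml_text) {
  rows <- list()   # finished rows, one data.frame per <event>
  current <- NULL  # fields of the <event> being read, or NULL outside one
  field <- NULL    # name of the leaf element being read, or NULL

  handlers <- list(
    startElement = function(name, attrs, ...) {
      if (name == "event") {
        current <<- list()
      } else if (!is.null(current)) {
        field <<- name
      }
    },
    text = function(content, ...) {
      if (!is.null(field)) {
        # Text may arrive in pieces, so append rather than overwrite.
        old <- if (is.null(current[[field]])) "" else current[[field]]
        current[[field]] <<- paste0(old, content)
      }
    },
    endElement = function(name, ...) {
      if (name == "event") {
        rows[[length(rows) + 1]] <<- as.data.frame(current, stringsAsFactors = FALSE)
        current <<- NULL
      }
      field <<- NULL
    }
  )

  xmlEventParse(xml_text, handlers = handlers, asText = TRUE)
  do.call(rbind, rows)
}

# Hypothetical flat sample; element names are assumptions.
feed <- '<events><event><gamenumber>101</gamenumber><league>MLB</league></event><event><gamenumber>102</gamenumber><league>NBA</league></event></events>'
res <- events_via_sax(feed)
print(res)
```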

          I'll play with it some more, though, and hopefully get something hacked together. Otherwise I'll just settle for using Excel. There is a read.xls() function in the "gdata" package that I've been using.
          • MonkeyF0cker
            SBR Posting Legend
            • 06-12-07
            • 12144

            #6
            You could parse the XML with a frontend like C# and load the data into R via the COM interface.

            That's how I'd handle it...
            • rsigley
              SBR Sharp
              • 02-23-08
              • 304

              #7
              xmlToDataFrame

              R is slow for big XML files; it's better to work directly with a local database, or to use Python to grab and parse the XML and then send the commands you want to R.
              • MonkeyF0cker
                SBR Posting Legend
                • 06-12-07
                • 12144

                #8
                I threw together a command-line Pinny XML downloader/parser from some of my C# classes that writes a CSV file.

                You could call it from R with system() and load the CSV into a data.frame afterward.

                PM me if you're interested.
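From the R side, that route could look like the sketch below: system2() runs the tool and read.csv() loads what it wrote. The function name run_parser_and_load() is hypothetical, and the executable, its arguments and the CSV path are placeholders, since the parser's actual interface isn't described in this thread.

```r
# Run an external command-line parser, then load the CSV it writes.
# `exe`, `args` and `csv_path` are placeholders for whatever the real tool
# expects; nothing here reflects that tool's actual interface.
run_parser_and_load <- function(exe, args = character(), csv_path) {
  status <- system2(exe, args = args)  # waits for the tool, returns exit status
  if (status != 0) stop("parser exited with status ", status)
  if (!file.exists(csv_path)) stop("parser did not write ", csv_path)
  read.csv(csv_path, stringsAsFactors = FALSE)
}
```

system2() quotes the command itself but passes args through unquoted, so wrap any path containing spaces in shQuote().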
                • uva3021
                  SBR Wise Guy
                  • 03-01-07
                  • 537

                  #9
                  Search for pitchfx and R. MLB's Gameday data is distributed as XML, and many people have used R to extract and manipulate it.
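As far as I know, Gameday-style files keep most values in attributes rather than child elements, and that layout converts easily: xmlAttrs() returns a node's attributes as a named character vector, and stacking those vectors gives one row per node. The snippet and attribute names below are made up for illustration.

```r
library(XML)

# Made-up attribute-style snippet; attribute names are illustrative only.
xml_text <- '<atbat><pitch type="B" start_speed="92.1"/><pitch type="S" start_speed="84.7"/></atbat>'
doc <- xmlParse(xml_text, asText = TRUE)

# One named character vector per <pitch>. rbind() stacks them by position,
# so this assumes every node carries the same attributes in the same order.
attr_list <- xpathApply(doc, "//pitch", xmlAttrs)
pitches <- as.data.frame(do.call(rbind, attr_list), stringsAsFactors = FALSE)
pitches$start_speed <- as.numeric(pitches$start_speed)
print(pitches)
```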
                  • statdude
                    SBR High Roller
                    • 09-11-12
                    • 117

                    #10
                    Originally posted by MonkeyF0cker
                    Perhaps an easier solution:

                    Load the XML into Excel. Save as CSV.

                    In R:

                    tree <- read.csv(file = "C:\\foo\\whatever.csv", header = FALSE, sep = ",")
                    Can you clarify what you mean by "load" the XML into Excel?

                    Do you mean using VBA, or through Query? It doesn't seem to import directly the way other books' XML files do. Or am I missing something?
                    • HUY
                      SBR Sharp
                      • 04-29-09
                      • 253

                      #11
                      Why would you parse XML in R? Use the right tool for the job, i.e. Python to parse the XML and then export to some format that R can work with.
                      • TomG
                        SBR Wise Guy
                        • 10-29-07
                        • 500

                        #12
                        I originally had grand plans of creating an entire package within R related to betting objects.

                        I ended up just hacking things together (as usual) using a mix of Excel, Notepad, and R: Excel for the parsing, Notepad as an intermediary, and R for the data analysis. I was originally importing the data into R directly from Excel, but it was just too damn slow. I think R loads the entire Excel sheet into memory first instead of just reading the file. Janky solution, but story of my life.
                        • Zesty41
                          SBR Rookie
                          • 07-02-13
                          • 3

                          #13
                          What about using PHP to parse the XML into MySQL, and then running MySQL queries against the database for the data analysis?