  1. #1
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Programmers Wanted For Group NFL Web Scraper Project

    Hi all...

    Someone mentioned having a community/group-created NFL scraper... So I figured I would spearhead this effort...

    Was looking to scrape webpages similar to this: http://sports.espn.go.com/nfl/boxscore?gameId=290914017

    I am pretty crispy on Java (did my MLB scraper in Java). But I am open to whatever "the group" wants to use...

    Is there any interest in this?

  2. #2
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    I'm game if we do it in PHP. I have zero interest in ever using Java again.

  3. #3
    fightingwarrior
    Join Date: 05-06-10
    Posts: 7,818

    I can do it. Seems easy enough.

  4. #4
    FreeFall
    Join Date: 02-20-08
    Posts: 3,365
    Betpoints: 2391

    I don't see why it matters. Just something that is platform independent or hits the big three: Linux, Mac, Win. Either way, aren't we all just looking for the end goal, and that's the DB?

  5. #5
    Grind-It-Out
    Join Date: 05-04-10
    Posts: 537
    Betpoints: 942

    Quote Originally Posted by FreeFall View Post
    I don't see why it matters. Just something that is platform independent or hits the big three: Linux, Mac, Win. Either way, aren't we all just looking for the end goal, and that's the DB?
    Yes, but by "group" project I took it to mean we would all be working on the same script. If that's the case, it would have to be in the same language. Although it definitely seems easy enough for one person to do.

  6. #6

  7. #7
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    Quote Originally Posted by Grind-It-Out View Post

    Yes, but by "group" project I took it to mean we would all be working on the same script. If that's the case, it would have to be in the same language. Although it definitely seems easy enough for one person to do.
    Yes, it is easy enough for a single person. It is not hard... but quite tedious... I could write it myself... But then I would hoard the code and it would never see the light of day... Why not pool talents and roll out a "product" quicker than a single man can... I'm awesome... but I'm not Superman...

    So... According to this website, the guy did a single mass upload of the stats, and WELL after the season was over. For what most of us have in mind, this needs to be done every day... An end-of-season upload does little good. Unless I missed the link where it is noted that the stats will be updated/uploaded nightly... This link is only good in the offseason...

  8. #8
    gimpy
    Join Date: 10-17-10
    Posts: 510
    Betpoints: 3817

    Is there still interest for this? I'd like to see it happen.

  9. #9
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    I have now fully created this scraper. So I guess there is no more need for a collaborative effort.

    I did it the 'long' way, but it is the 'right' and most 'robust' way. I am an expert programmer after all.

  10. #10
    uva3021
    Join Date: 03-01-07
    Posts: 537
    Betpoints: 381

    Quote Originally Posted by Maverick22 View Post
    I have now fully created this scraper. So I guess there is no more need for a collaborative effort.

    I did it the 'long' way, but it is the 'right' and most 'robust' way. I am an expert programmer after all.
    Share the source so we can begin collaborating.

    Hopefully you did it in Python or PHP.

  11. #11
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    ...
    Last edited by Maverick22; 09-23-11 at 05:12 AM.

  12. #12
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    ...
    Last edited by Maverick22; 09-23-11 at 05:12 AM.

  13. #13
    chilidog
    Join Date: 04-05-09
    Posts: 10,304
    Betpoints: 956

    Quote Originally Posted by Maverick22 View Post
    Once upon a time, a little red hen lived in a small cottage. She worked hard to keep her family fed. One day, when the little red hen was out walking with her friends, the goose, the cat, and the pig, she found a few grains of wheat....
    Such a moral doesn't work anymore. In modern times, the cat and pig would've just killed the hen and taken her bread.

  14. #14
    pedro803
    Join Date: 01-02-10
    Posts: 309
    Betpoints: 5708

    Quote Originally Posted by chilidog View Post
    Such a moral doesn't work anymore. In modern times, the cat and pig would've just killed the hen and taken her bread.

    This is probably the reason Maverick ain't sharing his address!

  15. #15
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    ...
    Last edited by Maverick22; 09-23-11 at 05:12 AM.

  16. #16
    podonne
    Join Date: 07-01-11
    Posts: 104

    This is part of my frustration with the current state of affairs in the sports-information marketplace. Everyone writes this kind of code for themselves and hoards it, or is forced to pay hundreds of dollars a month to websites. I really think we could all benefit from a simple service that charged a nominal fee ($10/month?) to download box scores and odds lines in XML format. After all, the information is free (scraped), so they really shouldn't charge that much to receive it.

  17. #17
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    ...
    Last edited by Maverick22; 09-23-11 at 05:12 AM.

  18. #18
    uva3021
    Join Date: 03-01-07
    Posts: 537
    Betpoints: 381

    Get over yourself Maverick22

  19. #19
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    ...
    Last edited by Maverick22; 09-23-11 at 05:12 AM.

  20. #20
    rsigley
    Join Date: 02-23-08
    Posts: 304
    Betpoints: 186

    fwiw the nfl.com boxscore has much more info than espn's that could be useful. that is the one i scrape

    also

    w1 lines:
    http://www.squaresportsbetting.com/NFLW12011.csv
    w2 lines:
    http://www.squaresportsbetting.com/NFLW2Lines.csv

    w1 box:
    http://www.squaresportsbetting.com/NFLBSW12011.csv
    w2 box:
    http://www.squaresportsbetting.com/NFLW2Box.csv

  21. #21
    podonne
    Join Date: 07-01-11
    Posts: 104

    Quote Originally Posted by rsigley View Post
    fwiw nfl.com boxscore has much more info than espn that could be useful. that is the one i scrape
    What nfl.com box scores are you looking at? I can't find a proper box score on the site, except for a "Gamebook" but it's in PDF. Are you parsing the PDF file?

    A game page: http://www.nfl.com/gamecenter/200909...ghts&tab=recap

  22. #22

  23. #23
    subs
    Join Date: 04-30-10
    Posts: 1,412
    Betpoints: 969

    the exact opposite of what it's called

  24. #24
    rsigley
    Join Date: 02-23-08
    Posts: 304
    Betpoints: 186

    Quote Originally Posted by podonne View Post
    What nfl.com box scores are you looking at? I can't find a proper box score on the site, except for a "Gamebook" but it's in PDF. Are you parsing the PDF file?

    A game page: http://www.nfl.com/gamecenter/200909...ghts&tab=recap
    http://www.nfl.com/gamecenter/200909...ts&tab=analyze

    but using top tier methods to bypass the ajax, which makes it pretty impossible to scrape

  25. #25
    Maverick22
    Join Date: 04-10-10
    Posts: 807
    Betpoints: 58

    top tier methods? What does that mean?

  26. #26
    podonne
    Join Date: 07-01-11
    Posts: 104

    Quote Originally Posted by rsigley View Post
    http://www.nfl.com/gamecenter/200909...ts&tab=analyze

    but using top tier methods to bypass the ajax, which makes it pretty impossible to scrape
    That's pretty good! It's making a call for a formatted JSON file with all the box score information. Love when it's in a non-HTML format!

    Try this from the game above: http://www.nfl.com/liveupdate/game-c...=1316930020000

    If the random number is meaningful, you could always just load up an Internet Explorer WebBrowser control and use WebBrowser.Navigate(). Wait a few seconds and then parse the DOM. It'd be a lot slower, but it can't be defeated by trickery.

    If you don't mind parsing the HTML, try a link like this one:

    http://www.nfl.com/widget/gc/2011/ta...eId=2009091303

    Returns an HTML version of the box score. Just set the date in YYYYMMDD format and append a two-digit game number from "01" to "16" and you should get what you need. No random numbers...

  27. #27
    podonne
    Join Date: 07-01-11
    Posts: 104

    Here's a URL for play-by-play using the same method as for the box score:

    http://www.nfl.com/widget/gc/2011/ta...eId=2009091303

  28. #28
    rsigley
    Join Date: 02-23-08
    Posts: 304
    Betpoints: 186

    oh yep i use those links. if anyone wants to know how to find them, on firefox you can install a plugin called firebug (the one on chrome is better, but it's harder to find these urls unless you know where to look)

    when you click on the boxscore then play by play tab you get (you have to enable console for this)



    So you can see that when you click the boxscore button it's sending a GET request to get the data from that URL, and it seems there's a variable involved called "gameID".
    But how do you figure out the gameID? If you click scores at the top and go to a week, you can see that it's embedded in every gamecenter url

    so the logic behind a parser would be

    - Go to current week in scores (they use the convenient REGX where X = week of the season)
    - Grab all the game center URLs (they're of the form /gamecenter/XXXX not the full www.nfl.com/gamecenter/etc)
    - Find a tricky way to get the gameID out of the URL. One way I use is to string replace /gamecenter/ and then explode on the "/", so it will be the first element in the new array. You could also just explode on / and look at the second element, but I ran into some errors on the older boxscores that way. I mean you could do checks to see if it's numeric or whatever, but why bother when something like this would work:

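    // Assumes $dom2 is the scores page loaded with a simple_html_dom-style parser
    // and $c is the index of the game box on that page (both set up elsewhere).
    $link2 = str_replace("/gamecenter/", "", $dom2->find('div[id="score-boxes"]', 0)->find('div[class="game-center-area"]', $c)->find('a', 0)->href);
    // With the /gamecenter/ prefix stripped, the first path segment is the gameId.
    $bsid = explode("/", $link2);
    $BSLink = "http://www.nfl.com/widget/gc/2011/tabs/cat-post-boxscore?gameId=" . $bsid[0];
    $PBPLink = "http://www.nfl.com/widget/gc/2011/tabs/cat-post-playbyplay?gameId=" . $bsid[0];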

    Then you can just get those files and parse accordingly.
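    For anyone not on PHP, here is a rough Python sketch of the same gameId step, including the "is it numeric" check mentioned above. The function names and the sample href are made up for illustration; the two widget URLs are the ones from the PHP snippet. Treat it as a starting point under those assumptions, not a tested scraper.

```python
from urllib.parse import urlparse

# Widget base URL taken from the PHP snippet in this post.
WIDGET_BASE = "http://www.nfl.com/widget/gc/2011/tabs/"


def game_id_from_href(href):
    """Return the path segment right after 'gamecenter', or None.

    Works for relative links (/gamecenter/XXXX/...) and full URLs alike.
    """
    parts = [p for p in urlparse(href).path.split("/") if p]
    if "gamecenter" not in parts:
        return None
    idx = parts.index("gamecenter")
    if idx + 1 >= len(parts):
        return None
    candidate = parts[idx + 1]
    # Guard against older pages where the segment is not a numeric id.
    return candidate if candidate.isdigit() else None


def widget_urls(game_id):
    """Build the box score and play-by-play widget URLs for one game."""
    return {
        "boxscore": WIDGET_BASE + "cat-post-boxscore?gameId=" + game_id,
        "playbyplay": WIDGET_BASE + "cat-post-playbyplay?gameId=" + game_id,
    }


# Hypothetical href; everything after the id is invented for the example.
example_id = game_id_from_href("/gamecenter/2009091303/2009/REG1/example")
example_urls = widget_urls(example_id)
```

    Looking for the segment after "gamecenter" instead of a fixed array position should sidestep the off-by-one problems on older box scores, though that is a guess about what caused them.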

    This method is useful for parsing any website that uses ajax to load the data. Some (like Covers) require you to send POST requests to the server to get the data back as JSON, but that's not that big of a deal. Just use firebug to see which variables the requests are sending and what is coming back, and use your script to send/receive those.
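    A minimal Python sketch of the POST-and-get-JSON case, using only the standard library. The endpoint URL and the form field names here are placeholders, not anything a real site uses; you would copy the real ones from what the browser console shows.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(url, form_fields):
    """Build a form-encoded POST request like the one the page's ajax sends."""
    body = urlencode(form_fields).encode("utf-8")
    return Request(url, data=body, method="POST")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body of the response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and fields; replace with the values seen in the console.
request = build_post_request(
    "http://example.com/ajax/scores",
    {"week": 3, "season": 2011},
)
# data = fetch_json(request)  # uncomment once the URL and fields are real
```

    Some sites also check cookies or extra headers, so if the response comes back empty, compare the request headers in the console against what the script sends.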
    Last edited by rsigley; 09-25-11 at 08:58 AM.

  29. #29
    chemicalbrother
    Under the Influence
    Join Date: 01-26-11
    Posts: 4,086

    rsigley: not just for trolling anymore.

  30. #30
    podonne
    Join Date: 07-01-11
    Posts: 104

    rsigley: Can't you just start from July 1, 2000 and increment day by day, putting each date in YYYYMMDD format and pulling every value between "01" and "16"?

    You'll have to figure out which game you've pulled, but it should be easier than parsing the gamecenter urls.
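    Something like this Python sketch would generate the candidate ids. It assumes the gameId really is just the date in YYYYMMDD form followed by "01" to "16", as described earlier in the thread; the function name is made up. Most candidates will not correspond to a real game, so whatever fetches them needs to handle empty or error responses.

```python
from datetime import date, timedelta


def candidate_game_ids(start, end, max_games=16):
    """Yield YYYYMMDDNN strings for every day from start to end inclusive."""
    day = start
    while day <= end:
        stamp = day.strftime("%Y%m%d")
        for n in range(1, max_games + 1):
            # Two-digit game number: 01, 02, ..., 16
            yield f"{stamp}{n:02d}"
        day += timedelta(days=1)


# Two days of candidates around the game id used earlier in the thread.
ids = list(candidate_game_ids(date(2009, 9, 13), date(2009, 9, 14)))
```

    Starting from July 1, 2000 this produces 16 candidates per day, including every offseason day, so restricting the date range to the season would cut the number of requests considerably.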

  31. #31
    rsigley
    Join Date: 02-23-08
    Posts: 304
    Betpoints: 186

    you could if you want, but it's easier to just get the link than not knowing what you're getting

    w3 if anyone's interested:

    w3 lines
    http://www.squaresportsbetting.com/NFLLinesW3.csv

    w3 boxscore
    http://www.squaresportsbetting.com/NFLBSW3.csv
