An introduction to research

  • pedro803
    SBR Sharp
    • 01-02-10
    • 309

    #106
    Originally posted by uva3021
    stick a comment mark, ', before the "On Error Resume Next" line, then post the error message, if any

    it could merely be statfox being offline, or a bad internet connection

    I get:

    Run-time error '9':
    Subscript out of range



    Thanks for the help, and again thanks for the heads-up that Excel can scrape web pages!
    • uva3021
      SBR Wise Guy
      • 03-01-07
      • 537

      #107
      click debug and tell me what line it highlights
      • pedro803
        SBR Sharp
        • 01-02-10
        • 309

        #108
        I clicked Debug, then Step Into, in the VB code window, and it highlighted the very first line:

        Sub NFLfromStatfox()

        but I kinda don't think that is what you were looking for. I don't know how to use Debug.
        • uva3021
          SBR Wise Guy
          • 03-01-07
          • 537

          #109
          did you define this range?

          Code:
          For i = 1 To Range("NFLteams").Rows.Count
          • pedro803
            SBR Sharp
            • 01-02-10
            • 309

            #110
            well that line is in the code that you provided, so I guess I did

            in your instructions you wrote:

            Then select all the teams in the table and define a name for the range; a brief survey of the code shows I named the range "NFLTeams"

            I wasn't exactly sure what to do. I highlighted the column of teams, clicked the "Define Name" button on the Formulas tab at the top, and named it NFLteams
            • pedro803
              SBR Sharp
              • 01-02-10
              • 309

              #111
              Stepping through the code, when it gets to the line:

              Sheets(sht).Select

              I get the error message:

              Run-time error '1004':

              Application-defined or object-defined error
              • uva3021
                SBR Wise Guy
                • 03-01-07
                • 537

                #112
                that's because "sht" doesn't exist as a sheet. There is something wrong with your naming conventions

                I.e., this is how my range, "NFLTeams", is structured:

                Arizona2009
                Atlanta2009
                ....
                NY+JETS2004
                ....
                SAN+DIEGO2000

                Every team from 2009 to 2000 is named in accordance with how it is formatted in the statfox link

                Copy the names from a team report page, replace all spaces with a "+", then run the code
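As a Python illustration of the naming rule above (the workbook itself uses VBA, and Excel's Find & Replace does the same job), this sketch assumes each range entry is the team name as it appears on a statfox team report page, with spaces turned into "+" and the four-digit season appended. The function names statfox_name and build_range are hypothetical, and the exact casing still has to match whatever the statfox link uses.

```python
from __future__ import annotations


def statfox_name(team: str, season: int) -> str:
    """Turn a team name and season into a range entry such as "NY+JETS2004".

    Assumes the convention from the post: spaces become "+" and the
    four-digit season is appended with no separator.
    """
    return team.strip().replace(" ", "+") + str(season)


def build_range(teams: list[str], first: int, last: int) -> list[str]:
    """Every team for every season, newest season first (like the NFLTeams range)."""
    return [statfox_name(team, year)
            for year in range(last, first - 1, -1)
            for team in teams]


if __name__ == "__main__":
    # One entry per line, ready to paste into a column before naming the range.
    for entry in build_range(["Arizona", "Atlanta", "NY JETS", "SAN DIEGO"], 2000, 2009):
        print(entry)
```

Each generated entry also has to exist as a sheet name in the workbook, or a line like Sheets(sht).Select has nothing to select.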
                • pedro803
                  SBR Sharp
                  • 01-02-10
                  • 309

                  #113
                  I am giving up for the night; I have tried everything I can think of for now. I did import the table from the destination page with the Excel browser (I could only import the whole page; I wasn't able to get the table separately), I have done the find and replace, and I have done my best to name the range, but I am not sure I am doing this right. I get the names of all the sheets, e.g. NY+Giants2004, but none of the sheets have anything in them.

                  Thanks for all of your help. I will come back to this!
                  • thechaoz
                    SBR Posting Legend
                    • 10-23-09
                    • 12155

                    #114
                    Amazing thread. Thanks for all the great info.
                    • ScoreProphet
                      SBR Rookie
                      • 09-01-10
                      • 11

                      #115
                      Hi everyone, new guy here.

                      I didn't read this entire thread, but I got the gist of it from the first few pages. I started handicapping a few years ago as a hobby, beginning with calculations on spreadsheets. Since then I've moved on to running all-out, full-blown simulations of football games with some Python scripts I wrote. I run each matchup 10,000 times, and it gives me each team's % chance of winning straight up or against a given spread, average scores, rush attempts and yards, and pass attempts, completions, and yards. I also use the scripts to rank the teams, and I like my rankings better than most of the ones used in the BCS.

                      I'm not here to gloat. I can't, actually, because I haven't yet used my results to gamble with. I also don't have hard numbers on exactly how successful my projections are, though I hope to in the coming days. All I know for sure is that I consistently do very well in ESPN.com's college pick'em, which was my initial reason for starting all of this. That said, I'm only here to answer any questions anyone has about my methods, my scripts, or whatever else you can think of.

                      A little more detail:
                      I've built a database of each and every college football play for the last 2 years (and I can go back further just by running a script). For every play, the database has the down & distance, the yardline, the quarter and time left, the current score, the type of play, yards gained or lost, turnover, penalty... the whole shebang. With this information I can build my own boxscores with almost any type of information I need. More importantly, I use the info to build a sort of profile of each team, with their individual offensive, defensive, and special teams strengths and weaknesses.
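A play-by-play table like the one described can be stored and rolled up into boxscores with Python's built-in sqlite3. This is only a sketch: the post lists the fields but not the actual schema, so the table name, column names, the yardline convention and the play-type labels ('run', 'pass') are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per play, using the fields listed in the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS plays (
    game_id      TEXT    NOT NULL,
    offense      TEXT    NOT NULL,            -- team with the ball
    quarter      INTEGER NOT NULL,
    seconds_left INTEGER NOT NULL,            -- time left in the quarter (assumed)
    down         INTEGER,
    distance     INTEGER,
    yardline     INTEGER,                     -- assumed: yards from own goal line
    off_score    INTEGER,
    def_score    INTEGER,
    play_type    TEXT    NOT NULL,            -- e.g. 'run', 'pass', 'punt'
    yards        INTEGER NOT NULL DEFAULT 0,  -- gained (negative = lost)
    turnover     INTEGER NOT NULL DEFAULT 0,  -- 1 if possession was lost
    penalty      INTEGER NOT NULL DEFAULT 0   -- 1 if a penalty was called
);
"""

# Roll one game's plays up into a simple per-team boxscore.
BOXSCORE_SQL = """
SELECT offense,
       SUM(play_type = 'run')                                  AS rush_att,
       SUM(CASE WHEN play_type = 'run'  THEN yards ELSE 0 END) AS rush_yds,
       SUM(play_type = 'pass')                                 AS pass_att,
       SUM(CASE WHEN play_type = 'pass' THEN yards ELSE 0 END) AS pass_yds,
       SUM(turnover)                                           AS turnovers
FROM plays
WHERE game_id = ?
GROUP BY offense
ORDER BY offense;
"""


def insert_plays(conn: sqlite3.Connection, rows: list) -> None:
    """Insert play tuples in the column order of SCHEMA (13 values each)."""
    conn.executemany("INSERT INTO plays VALUES (" + ",".join("?" * 13) + ")", rows)
    conn.commit()


def boxscore(conn: sqlite3.Connection, game_id: str) -> list:
    """Return (team, rush_att, rush_yds, pass_att, pass_yds, turnovers) rows."""
    return conn.execute(BOXSCORE_SQL, (game_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    insert_plays(conn, [
        ("G1", "Alpha", 1, 900, 1, 10, 25, 0, 0, "run", 4, 0, 0),
        ("G1", "Alpha", 1, 870, 2, 6, 29, 0, 0, "pass", 12, 0, 0),
        ("G1", "Beta", 1, 800, 1, 10, 20, 0, 0, "run", -2, 0, 0),
    ])
    for row in boxscore(conn, "G1"):
        print(row)
```

Because every play is kept, other boxscore columns (third-down conversions, red-zone trips and so on) are just different WHERE and SUM clauses over the same table.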

                      These team profiles consist of a series of ratings which, when compared to any given opponent's ratings, can be fed to the simulation script, which churns out 10,000 simulated games between the given teams.
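The "run it 10,000 times and count" idea can be shown with a toy Monte Carlo in Python. The scoring model here is an assumption standing in for the play-by-play simulation described in the post, which isn't shown: each team's points are drawn from a normal distribution around a hypothetical expected score, and the function and parameter names are made up for the example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class SimResult:
    home_win_pct: float    # share of games the home team wins outright
    home_cover_pct: float  # share of games the home team covers home_spread
    avg_home: float
    avg_away: float


def simulate(exp_home: float, exp_away: float, home_spread: float,
             n_games: int = 10_000, sd: float = 10.0,
             seed: int | None = None) -> SimResult:
    """Monte Carlo over n_games.

    home_spread follows the betting convention: -3.5 means the home team
    is favored by 3.5. Ties and pushes count as neither a win nor a cover.
    """
    rng = random.Random(seed)
    wins = covers = 0
    total_home = total_away = 0
    for _ in range(n_games):
        # Assumed scoring model: normal around the expected score,
        # rounded to whole points and floored at zero.
        home = max(0, round(rng.gauss(exp_home, sd)))
        away = max(0, round(rng.gauss(exp_away, sd)))
        total_home += home
        total_away += away
        wins += home > away
        covers += home + home_spread > away
    return SimResult(wins / n_games, covers / n_games,
                     total_home / n_games, total_away / n_games)


if __name__ == "__main__":
    print(simulate(exp_home=28, exp_away=21, home_spread=-3.5, seed=1))
```

Passing a seed makes a run repeatable, which helps when comparing two versions of the ratings against the same simulated games.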

                      That's the basics... if you have any questions or would like any tips, ask away.
                      • ScoreProphet
                        SBR Rookie
                        • 09-01-10
                        • 11

                        #116
                        First post took a while for moderator approval, and then went up twice. Sorry!
                        Last edited by ScoreProphet; 09-01-10, 03:13 PM. Reason: double post
                        • Indecent
                          SBR Wise Guy
                          • 09-08-09
                          • 758

                          #117
                          Originally posted by ScoreProphet
                          More importantly, I use the info to build a sort of profile of each team, with their individual offensive, defensive, and special teams strengths and weaknesses.

                          These team profiles consist of a series of ratings which, when compared to any given opponent's ratings, can be fed to the simulation script, which churns out 10,000 simulated games between the given teams.

                          That's the basics... if you have any questions or would like any tips, ask away.
                          Sounds interesting. The bolded part reminds me a bit of opponent modeling in poker-botting. If you don't mind me asking, what type of information/stats do you use for the team profiles? If your information is detailed enough, I think you could create a pretty robust play-by-play prediction system.
                          • ScoreProphet
                            SBR Rookie
                            • 09-01-10
                            • 11

                            #118
                            Originally posted by Indecent
                            If you don't mind me asking, what type of information/stats do you use for the team profiles?
                            Without going into mind-numbing detail, each team is rated on their run "power" and pass "power", and also such things as their punt/kick return ability and whatnot. The rating system itself is on a scale centered around 1.0, which would be an average rating. On offense, anything over 1.0 is good, and on defense less than 1.0 is good. Let's say that across all FBS teams, the average running play is 5 yards per carry. If Team A has a run rating of 1.1, then that team should average about 5 * 1.1 = 5.5 yards per carry against an average (1.0 run defense) team. If instead of an average team, Team A played against a team with a run defense rating of 0.8 (very good), then Team A will probably average about 5 * 1.1 * 0.8 = 4.4 yards per rush.
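The rating arithmetic above multiplies a league baseline by the offense's rating and the opposing defense's rating. Here is a minimal Python version of that formula. The function and parameter names are hypothetical, and the numbers are the ones from the post.

```python
def expected_ypc(league_avg: float, off_rating: float, def_rating: float = 1.0) -> float:
    """Expected yards per carry: league average x offense rating x opposing defense rating.

    Ratings are centered on 1.0. Above 1.0 is good for an offense;
    below 1.0 is good for a defense.
    """
    return league_avg * off_rating * def_rating


if __name__ == "__main__":
    print(round(expected_ypc(5.0, 1.1), 2))       # vs. an average (1.0) run defense: 5.5
    print(round(expected_ypc(5.0, 1.1, 0.8), 2))  # vs. a very good (0.8) run defense: 4.4
```

The same shape works for any per-play rate in the profile (pass yards per attempt, return yards), with its own league baseline.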

                            The simulation script itself randomizes Team A's carries throughout the games so that, after 10,000 games, they average the "correct" yards per play. It does the same for each pass completion (and similarly for completion %), and it also accounts for turnovers and penalties and, as I mentioned, kick/punt returns. Each team's profile also has info on its pass/rush ratio, which is also accounted for in the simulation.
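One simple way to get per-carry randomness that still averages out to the "correct" yards per play is to draw each carry from a distribution whose mean is the target. This Python sketch assumes a normal distribution with a hypothetical spread of 6 yards, rounded to whole yards. The post doesn't say which distribution the script uses, and real carry yardage is skewed (many short gains, a few long runs), so take this only as an illustration of the averaging idea.

```python
from __future__ import annotations

import random


def simulate_carries(target_ypc: float, n: int, sd: float = 6.0,
                     seed: int | None = None) -> list[int]:
    """Draw n carries (whole yards) whose long-run mean approaches target_ypc.

    The normal distribution and sd=6.0 are assumptions for illustration.
    """
    rng = random.Random(seed)
    return [round(rng.gauss(target_ypc, sd)) for _ in range(n)]


if __name__ == "__main__":
    carries = simulate_carries(4.4, 100_000, seed=42)
    print(sum(carries) / len(carries))  # close to the 4.4 target
```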

                            Simulated coaching decisions, such as passing more often when you're trailing toward the end of the game (or running more with a large lead), are also taken into account to provide more realistic results.
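Here is a hypothetical Python sketch of how a game-state adjustment to a team's run/pass mix might look. The 8-minute window, the 9- and 17-point thresholds, and the adjustment sizes are all invented for illustration. The post only says that trailing late pushes toward passing and a large lead pushes toward running.

```python
import random


def pass_probability(base_pass_rate: float, score_diff: int, seconds_left: int) -> float:
    """Adjust a team's base pass rate (from its profile) for the game situation.

    score_diff is the offense's score minus the defense's score;
    seconds_left is the time remaining in the game.
    All thresholds and adjustment sizes below are hypothetical.
    """
    late_game = seconds_left <= 8 * 60
    p = base_pass_rate
    if late_game:
        if score_diff <= -9:        # trailing by more than one score: throw more
            p += 0.25
        elif score_diff < 0:        # trailing by one score: lean toward passing
            p += 0.10
        elif score_diff >= 17:      # large lead: run to burn clock
            p -= 0.25
    return min(max(p, 0.05), 0.95)  # always leave some chance of either play


if __name__ == "__main__":
    rng = random.Random(7)
    # On each snap, the simulation would pick pass vs. run with this probability.
    call = "pass" if rng.random() < pass_probability(0.55, -10, 240) else "run"
    print(call)
```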
                            • CrimsonQueen
                              SBR MVP
                              • 08-12-09
                              • 1068

                              #119
                              ScoreProphet: How do you then backtest this? I have somewhat of a stats database, and some formulas I've created that, like yours, rate each thing around 1.0... I have limited knowledge of Python, but I really want to backtest my data with my formulas, comparing the projected final scores against the actual scores and spreads.
                              Currently... I have a drop-down box with each team, and it pulls all their stats into the fields for my formulas to read (using an array formula in Excel)... but it's insanely time-consuming (and outright laughable, really) to switch every single matchup, look at every single score, and compare them all by hand... then change the formula slightly to make it more accurate and redo all of this by hand again...
                              Anyone who wants to help, thanks!
                              • ScoreProphet
                                SBR Rookie
                                • 09-01-10
                                • 11

                                #120
                                Originally posted by CrimsonQueen
                                ScoreProphet: How do you then backtest this? I have somewhat of a stats database, and some formulas I've created that, like yours, rate each thing around 1.0... I have limited knowledge of Python, but I really want to backtest my data with my formulas, comparing the projected final scores against the actual scores and spreads.
                                Currently... I have a drop-down box with each team, and it pulls all their stats into the fields for my formulas to read (using an array formula in Excel)... but it's insanely time-consuming (and outright laughable, really) to switch every single matchup, look at every single score, and compare them all by hand... then change the formula slightly to make it more accurate and redo all of this by hand again...
                                Anyone who wants to help, thanks!
                                Yeah, as long as you're dealing with spreadsheets, you will be doing a lot of things by hand. My knowledge of Python and programming in general is pretty limited as well, but I know enough to get by for my needs. You could use Python to work with CSV files, but storing all of your data in a database is much neater (SQLite works great). If you're going to get more into Python, look into SQLAlchemy... it makes working with databases in Python pretty easy. It's one more thing to learn, but it beats having to write SQL queries all the time, then figuring out how to deal with the resulting lists, and so on. It handles a lot of the legwork and confusing bits for you.
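Moving spreadsheet data exported as CSV into SQLite takes only the Python standard library (csv and sqlite3). SQLAlchemy would wrap the same steps in a higher-level API. This is a minimal sketch: the games table, its column names, and the expected CSV header are hypothetical.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_games_csv(conn: sqlite3.Connection, f: TextIO) -> int:
    """Load a CSV with columns home,away,home_score,away_score into a games table.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        "home TEXT, away TEXT, home_score INTEGER, away_score INTEGER)"
    )
    rows = [
        (r["home"], r["away"], int(r["home_score"]), int(r["away_score"]))
        for r in csv.DictReader(f)
    ]
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # With a real export you would pass open("games.csv", newline="") instead.
    sample = io.StringIO("home,away,home_score,away_score\nAlpha,Beta,24,17\n")
    load_games_csv(conn, sample)
    print(conn.execute("SELECT home, away, home_score - away_score FROM games").fetchall())
```

Once the data is in a table, a question like "average margin for every home team" is one SQL query instead of a hand-built sheet.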

                                As for your situation, I haven't dealt with spreadsheets in a while, so it's hard for me to say how you should backtest your results, especially without knowing the details of how you have all your data laid out. It sounds to me, though, that instead of using the dropdown lists, you should find a way to import the week's results onto one sheet. Just a long list, with each row containing one game. I would imagine columns A&B having the home and away team names, C&D your predicted scores, and E&F the actual scores. This way it's easy to put formulas in G&H for the difference between your projections and the actual scores, or whatever other calculations you want to see.
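The same one-row-per-game layout is easy to score outside Excel too. This Python sketch follows the A to F columns suggested above (home and away teams, predicted scores, actual scores) and adds a seventh field for the home spread, which is an assumption beyond that layout. It returns the per-game differences (the G and H columns), the mean absolute error of the predicted margin, and an against-the-spread hit rate. A game counts as a hit when the predicted and actual margins land on the same side of the line. The function name is hypothetical.

```python
from statistics import mean
from typing import Iterable, Optional


def backtest(rows: Iterable[tuple]) -> tuple[list, float, Optional[float]]:
    """rows: (home, away, pred_home, pred_away, act_home, act_away, home_spread).

    home_spread uses the betting convention (-3 means home favored by 3).
    Returns (per-game [(home, away, home_err, away_err)],
             mean absolute margin error, ATS hit rate or None).
    Raises statistics.StatisticsError if rows is empty.
    """
    per_game, margin_errors = [], []
    hits = graded = 0
    for home, away, ph, pa, ah, aa, spread in rows:
        per_game.append((home, away, ph - ah, pa - aa))  # like columns G and H
        pred_margin, act_margin = ph - pa, ah - aa
        margin_errors.append(abs(pred_margin - act_margin))
        pred_side = pred_margin + spread  # > 0: model has home covering
        act_side = act_margin + spread    # > 0: home actually covered
        if pred_side != 0 and act_side != 0:  # skip pushes and no-opinion games
            graded += 1
            hits += (pred_side > 0) == (act_side > 0)
    return per_game, mean(margin_errors), (hits / graded if graded else None)
```

Changing a formula and re-running this over a whole season replaces the switch-every-matchup routine described above.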
                                • craigpb
                                  SBR Wise Guy
                                  • 06-19-08
                                  • 699

                                  #121
                                  Thanks for all the great info guys; really helpful.
                                  • hubie69
                                    SBR Hall of Famer
                                    • 09-16-10
                                    • 7329

                                    #122
                                    I use a MySQL db with a series of bash scripts on a Linux box for my college basketball stuff. It was a f*ck ton of work at the beginning to get it going, but now that it's running it doesn't require much from me. It only does college basketball, though.
                                    • nmr123321
                                      SBR Wise Guy
                                      • 01-06-10
                                      • 609

                                      #123
                                      thank you very much for this
                                      • dmolition
                                        SBR High Roller
                                        • 10-10-08
                                        • 106

                                        #124
                                        This thread is really great, thanks a lot. I have some questions about using br.set_handle_robots(False) in mechanize when a site has a robots.txt file. I know there are legal or ethical issues concerning this.
                                        I want to try scraping, but from what I've read you need to set timeouts in your scripts so your IP doesn't get banned, among other measures.

                                        Are there any sites that are "OK" with being scraped for stats (SBR?)? Or should you be really careful with your scraping, since I would guess most don't like it? What other things should we consider?
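Rather than switching robots handling off, a scraper can read a site's robots.txt itself and honor any Crawl-delay it requests. This is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the example.com URLs and the user-agent string are all made up. Keep in mind that robots.txt only states crawling preferences. It says nothing about a site's terms of use or copyright.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a RobotFileParser from robots.txt text you already have.

    For a live site you would instead call
    rp.set_url("https://example.com/robots.txt") followed by rp.read(),
    which downloads and parses the file.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def allowed(rp: RobotFileParser, agent: str, url: str) -> tuple[bool, float | None]:
    """Return (may this agent fetch url?, requested crawl delay in seconds or None)."""
    return rp.can_fetch(agent, url), rp.crawl_delay(agent)


if __name__ == "__main__":
    rp = make_parser(SAMPLE_ROBOTS)
    agent = "my-stats-bot"  # hypothetical user-agent string
    for url in ("https://example.com/boxscores/2010/week1",
                "https://example.com/private/odds"):
        print(url, allowed(rp, agent, url))
```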
                                        • Wrecktangle
                                          SBR MVP
                                          • 03-01-09
                                          • 1524

                                          #125
                                          dmolition, most sites are NOT OK with scraping due to copyright, and not a few will actively block you. And it seems that even those who tolerate it change formats so often that you are always tweaking code to get around the changes.
                                          • Maverick22
                                            SBR Wise Guy
                                            • 04-10-10
                                            • 807

                                            #126
                                            Which sites are you referring to that will block you?
                                            • lucaario83
                                              Restricted User
                                              • 10-05-10
                                              • 180

                                              #127
                                              very interesting stuff
                                              • dmolition
                                                SBR High Roller
                                                • 10-10-08
                                                • 106

                                                #128
                                                Originally posted by Wrecktangle
                                                dmolition, most sites are NOT OK with scraping due to copyright, and not a few will actively block you. And it seems that even those who tolerate it change formats so often that you are always tweaking code to get around the changes.
                                                Yeah, I figured as much. So to scrape 10 seasons of any sport, I imagine I need multiple IPs, timeouts in my scripts, constant checking for changes in the DOM structure of the HTML, etc. Now I know the cost of data.
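For the "checking for changes in DOM structure" part, one cheap safeguard is to confirm that the table headers you expect are still on the page before trusting the parsed numbers. This Python sketch uses only the standard html.parser, and the expected header names are hypothetical. It catches renamed or dropped columns but not reordered ones. It also assumes the page closes its th tags, which HTML technically allows a page to omit.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collect the text of every <th> cell in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: list[str] = []
        self._in_th = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            self.headers.append("".join(self._buf).strip())
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:  # includes text nested in links inside the cell
            self._buf.append(data)


def missing_headers(html: str, expected: set[str]) -> set[str]:
    """Return the expected column headers that no longer appear in the page."""
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    return expected - set(parser.headers)
```

If the returned set is non-empty, it is usually safer to stop the run and look at the page than to keep writing rows that may be shifted by a column.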

                                                I'm going to research this, and maybe if I gather enough data I'll be willing to trade it (after I validate it, of course).
                                                It would be nice to have a list of sites that enforce anti-scraping policies more strictly, or where NOT to try it, so we can have a little peace of mind.

                                                Also, I'm taking the hard road and learning R and Python (checking out SciPy too) for data analysis; I'm savvy with software development. Once I can actually start doing some serious data analysis, if anyone wants to exchange technical tips on how to do this and that, maybe we can open a "hacking/data analysis stuff" thread to ask general questions, share tips, and contribute in general.
                                                • uva3021
                                                  SBR Wise Guy
                                                  • 03-01-07
                                                  • 537

                                                  #129
                                                  i abuse statfox and have yet to be banned
                                                  • Data
                                                    SBR MVP
                                                    • 11-27-07
                                                    • 2236

                                                    #130
                                                    Originally posted by Wrecktangle
                                                    dmolition, most sites are NOT OK with scraping due to copyright and not a few will actively block you.
                                                    Hm, most sites? No way. The largest projects, traffic-wise, are collecting several years' worth of box scores and play-by-plays. Everything else is peanuts. The only site that ever temporarily blocked my scraping was Yahoo!
                                                    • dmolition
                                                      SBR High Roller
                                                      • 10-10-08
                                                      • 106

                                                      #131
                                                      OK, I'm starting to collect the data, and so far so good, but the next step is to check its integrity.
                                                      I'm comparing my data against covers.com and espn.com mostly.
                                                      Are these sites accurate in their stat records?

                                                      Which sites are more reliable, in your opinion, for comparing stats?
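Whichever site you treat as the reference, the comparison itself can be automated. This Python sketch diffs your scraped final scores against a second source keyed by a shared game ID. The ID scheme and the (home_score, away_score) dictionaries are hypothetical. In practice, building a common key across sites (team-name spellings, dates) is usually the harder part.

```python
from __future__ import annotations


def compare_sources(mine: dict[str, tuple[int, int]],
                    reference: dict[str, tuple[int, int]]) -> dict[str, list]:
    """Compare (home_score, away_score) per game ID between two sources."""
    report: dict[str, list] = {
        "only_mine": sorted(mine.keys() - reference.keys()),
        "only_reference": sorted(reference.keys() - mine.keys()),
        "mismatched": [],
    }
    for game_id in sorted(mine.keys() & reference.keys()):
        if mine[game_id] != reference[game_id]:
            report["mismatched"].append((game_id, mine[game_id], reference[game_id]))
    return report
```

Games that disagree are the ones to check by hand against a third source before deciding which site to trust.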
                                                      • jscar3
                                                        SBR High Roller
                                                        • 02-10-09
                                                        • 130

                                                        #132
                                                        i will look this up to see the sense in it. thanks.
                                                        • LegitBet
                                                          Restricted User
                                                          • 05-25-10
                                                          • 538

                                                          #133
                                                          what would be nice is 'data for dummies', but that comes with many challenges for the sharpies...
                                                          my 2 cents
                                                          • Jeremy Nguyen
                                                            SBR Rookie
                                                            • 10-25-10
                                                            • 1

                                                            #134
                                                            Last Monday night, 10/18

                                                            Hello, everyone.
                                                            Does anybody remember what time the ball was kicked off for the 2nd half between Tenn and Jacksonville? Please!! Thank you
                                                            • Chachieguy
                                                              SBR Rookie
                                                              • 10-27-10
                                                              • 3

                                                              #135
                                                              Looking forward to learning more. Thank you
                                                              • Flying Dutchman
                                                                SBR MVP
                                                                • 05-17-09
                                                                • 2467

                                                                #136
                                                                Originally posted by Data
                                                                Hm, most sites? No way. The largest projects, traffic-wise, are collecting several years' worth of box scores and play-by-plays. Everything else is peanuts. The only site that ever temporarily blocked my scraping was Yahoo!
                                                                I'm hearing both sides on this. I had trouble with Covers last year in the NBA, and then I went to a piece of software where I could change my IP, and the problem went away. Then I quit changing my IP for a while and didn't get blocked. Or were they just having site trouble?

                                                                I also had trouble on FoxSports and CBS as I recall.

                                                                • Data
                                                                  SBR MVP
                                                                  • 11-27-07
                                                                  • 2236

                                                                  #137
                                                                  While scraping boxscores, I make a courtesy 1 sec pause after processing each boxscore. Not sure if everybody does this but they should.
                                                                  • Indecent
                                                                    SBR Wise Guy
                                                                    • 09-08-09
                                                                    • 758

                                                                    #138
                                                                    Originally posted by Data
                                                                    While scraping boxscores, I make a courtesy 1 sec pause after processing each boxscore. Not sure if everybody does this but they should.
                                                                    If you don't mind me asking, how long have you been using this method?

                                                                    I have my scraper pause for a random number of seconds (usually 10-25 but it will go shorter and longer) to try to simulate a human browsing the pages. If you've been using 1 second delay successfully for a while with no bans, etc, I might have to drop my delay times considerably.
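Both pause styles mentioned here, a fixed 1-second courtesy pause and a random 10-25 second pause, fit the same loop. This is a Python sketch. fetch is a placeholder for whatever download function you already use. The sleep and rng parameters are injectable only so the delays can be checked without actually waiting. Neither setting is guaranteed to keep a site from blocking you.

```python
from __future__ import annotations

import random
import time
from typing import Callable, Iterable


def polite_fetch_all(urls: Iterable[str],
                     fetch: Callable[[str], str],
                     min_delay: float = 1.0,
                     max_delay: float | None = None,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: random.Random | None = None) -> list[str]:
    """Fetch each URL in turn, pausing after every page.

    max_delay=None gives a fixed pause of min_delay seconds (the 1-second
    courtesy pause); otherwise each pause is drawn uniformly from
    [min_delay, max_delay] (the randomized, human-looking pause).
    """
    rng = rng or random.Random()
    pages = []
    for url in urls:
        pages.append(fetch(url))
        delay = min_delay if max_delay is None else rng.uniform(min_delay, max_delay)
        sleep(delay)
    return pages
```

With your own list of boxscore URLs and download function (both hypothetical here), the fixed version is polite_fetch_all(urls, fetch=download) and the randomized one adds min_delay=10, max_delay=25.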
                                                                    • Data
                                                                      SBR MVP
                                                                      • 11-27-07
                                                                      • 2236

                                                                      #139
                                                                      I finished my last big scraping project about a year ago. I only scrape boxscores nowadays, as I calculate all the stats I need myself. Well, I do import some stuff into Excel too, but not that much.
                                                                      • pro-style
                                                                        SBR High Roller
                                                                        • 07-20-10
                                                                        • 177

                                                                        #140
                                                                        Where is the best place to scrape boxscores?