An introduction to research

#32

Default

Relatively similar? LOL. I hope you're joking.

Some people have only box scores, some have play-by-play data, some have pitch-by-pitch data. Some have linked line history tables, some have closing number columns, some have linked player tables with keys, and some have individual tables for each game. Some people perform simple system or prop queries; others run a series of queries to populate and process a model. The list goes on. Not to mention, there are probably more than 3 billion ways to go about doing the exact same thing.

AGAIN, IT ALL DEPENDS ON YOUR DATASET AND HOW YOU PLAN ON USING YOUR DATA!!!!!!!

Anyone whose profession is in data warehousing should be able to grasp this simple concept the first time they are told. However, they certainly shouldn't need to be told this in the first place.
Last edited by MonkeyF0cker; 04-16-10 at 11:13 PM.
Points Awarded:

Maverick22 gave MonkeyF0cker 5 SBR Point(s) for this post.

#34

I'm always struck by how hard it can be to express yourself in print, and by the fact that we all use different terms to label the same items. I'm not a database guy, but data dictionary "thingies" are important even in my simplistic world. I would like to see us stay away from the Players Talk way of solving differences of opinion here in the Tank, however.

I keep saying this to no observable progress: I'd like to see a group form where the interest is in sharing checked-out data sets. I spend way too much time cross-checking data and way too little time on model building and analysis, especially the analysis.
#35

The reason his statement was confusing is that he wasn't using the proper terminology, Wrecktangle. If someone attacks my integrity in here, I'm certainly going to prove my point. If you don't design your data tables to coincide with your end product, you'll likely create a ton of unnecessary work for yourself and inefficiencies in the modelling phase. It can make your queries a nightmare to code and process.
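
To make the table-design point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (teams, games, closing_lines, home_spread) and the sample rows are made up for illustration; they are not anyone's actual schema. The assumption is that the end product is a question like "how often does the home side cover the closing spread?", so the tables are keyed so that the question becomes a single join.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE teams (
    team_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE games (
    game_id    INTEGER PRIMARY KEY,
    game_date  TEXT NOT NULL,
    home_id    INTEGER NOT NULL REFERENCES teams(team_id),
    away_id    INTEGER NOT NULL REFERENCES teams(team_id),
    home_score INTEGER NOT NULL,
    away_score INTEGER NOT NULL
);
CREATE TABLE closing_lines (
    game_id     INTEGER PRIMARY KEY REFERENCES games(game_id),
    home_spread REAL NOT NULL
);
"""


def build_db():
    """Create an in-memory database and load a couple of made-up games."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO teams VALUES (?, ?)",
                     [(1, "Home Town"), (2, "Road City")])
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, "2010-04-10", 1, 2, 5, 2),
                      (2, "2010-04-11", 2, 1, 3, 4)])
    conn.executemany("INSERT INTO closing_lines VALUES (?, ?)",
                     [(1, -1.5), (2, -1.5)])
    return conn


def home_cover_rate(conn):
    """Share of games where the home side covered the closing spread."""
    row = conn.execute("""
        SELECT AVG(g.home_score + c.home_spread > g.away_score)
        FROM games g JOIN closing_lines c USING (game_id)
    """).fetchone()
    return row[0]
```

Calling home_cover_rate(build_db()) runs the whole question as one query. With a table per game, or with lines stored in a separate file that has no shared key, the same question would likely need a loop and hand-written matching logic.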

As far as sharing datasets, I have no interest in that. I do everything programmatically and I think I have far more reliable data than the vast majority of posters here. I really doubt I'd get any desirable reciprocation for my work. Not to mention, I'm not one to trust other people's work when it comes to these things. If someone gave me a set of data, the first thing I'd do is verify its integrity. So it would be a completely unproductive process for me.
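
As a rough illustration of what verifying a received data set might involve, here is a small Python sketch. The record layout (game_id, home, away, home_score, away_score) and the three checks are assumptions chosen for the example, not a description of any poster's actual process; a real check would also compare against an independent source.

```python
from collections import Counter


def integrity_problems(games):
    """Return a list of human-readable problems found in `games`.

    `games` is a list of dicts with hypothetical keys:
    game_id, home, away, home_score, away_score.
    """
    problems = []

    # Check 1: each game should appear exactly once.
    counts = Counter(g["game_id"] for g in games)
    for game_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate game_id {game_id} ({n} rows)")

    for g in games:
        gid = g["game_id"]
        # Check 2: a team cannot play itself.
        if g["home"] == g["away"]:
            problems.append(f"game {gid}: team plays itself")
        # Check 3: scores must be non-negative integers.
        for key in ("home_score", "away_score"):
            score = g[key]
            if not isinstance(score, int) or score < 0:
                problems.append(f"game {gid}: bad {key} {score!r}")
    return problems
```

An empty list only means the data passed these particular checks. It says nothing about whether the scores themselves are right.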
#37

I really don't think the discussion we had is too far off topic. The thread is about programmatically scraping data. Anyone who wants to pursue that needs to think about how the data is stored, organized, and accessed. CSVs simply aren't practical for most profitable modeling applications.
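
For anyone starting from scraped CSV files, one possible path is to load them into a database once and query from there. The sketch below uses Python's standard csv and sqlite3 modules; the column names, the team_games table, and the inline sample rows are hypothetical stand-ins for a real scraped file.

```python
import csv
import io
import sqlite3

# Stand-in for a scraped CSV file; the columns are hypothetical.
CSV_TEXT = """game_date,team,opponent,goals_for,goals_against
2010-04-10,A,B,5,2
2010-04-11,B,A,3,4
2010-04-12,A,C,1,2
"""


def load_csv(conn, text):
    """Copy CSV rows into an indexed SQLite table; return the row count."""
    conn.execute("""
        CREATE TABLE team_games (
            game_date     TEXT NOT NULL,
            team          TEXT NOT NULL,
            opponent      TEXT NOT NULL,
            goals_for     INTEGER NOT NULL,
            goals_against INTEGER NOT NULL
        )
    """)
    rows = [
        (r["game_date"], r["team"], r["opponent"],
         int(r["goals_for"]), int(r["goals_against"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO team_games VALUES (?, ?, ?, ?, ?)", rows)
    # The index lets per-team, per-date lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_team_date ON team_games (team, game_date)")
    return len(rows)


def goals_per_game(conn, team):
    """Average goals_for over the rows whose team column matches `team`."""
    return conn.execute(
        "SELECT AVG(goals_for) FROM team_games WHERE team = ?", (team,)
    ).fetchone()[0]
```

With a real file, io.StringIO(text) would be replaced by an open file handle. The typed columns and the index are what a flat CSV does not give you on its own.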
Last edited by MonkeyF0cker; 04-17-10 at 08:22 PM.
#38

Ijump12,

Please continue with your thread and do not let Monkey destroy it, which has clearly been his intention since day 1.

This is a great thread for those of us interested in the subject, and it has inspired me to purchase a few books to make this a little more understandable for me. The subject has piqued my interest for a while now, and your thread has helped point me in the right direction to get started. I also have an interest in programming outside of gambling, but this is a great starting point to give me the motivation to learn.
#44

Quote, originally posted by Meestermike:
"Thanks Daniel. Any idea on the hockey DB addy?"

The hockey stats are only available if you're a member of a Yahoo group (free). The data is also CSV-only and updated after each season, but it contains just about every NHL season's worth of individual and team stats.

It's managed by a single guy, but he appreciates comments and discussion on the Yahoo group's mailing list.

http://sports.groups.yahoo.com/group/hockey-databank/
#45

Thanks, Daniel. Much appreciated.