Hello everyone!
I've worked on a scraper to download horse-racing data and I now have data for approximately one thousand races.
My main question:
The data I have scraped is "starting list information" (I don't know if that's the correct English term) combined with the results for each race. The problem is that the data doesn't fit into neat lines. What I mean is: each race has about ten horses, and for each horse the results of its last five races are reported (together with date and track code). I want to use the data from those five races to somehow estimate the "fitness" of the horse, but I don't know how to structure the data. The ML library that I usually use (Orange for Python 2.7) only takes tab-delimited data with one record per line, whereas my data would be better written out in a tree-like structure (or something similar). Do you have any ideas on how I could make this work? Currently I have list objects with data from previous races as data points, like so:
horse_name driver_name [data for prev. race 1] [data for prev. race 2] [etc.]
(I realize this question is somewhat unclear, but it's probably because I don't even know if it's the right question to be asking to begin with)
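For illustration, one possible way to squeeze the nested layout above into single tab-delimited lines is to give each of the five previous races its own fixed set of columns, with one row per horse per race. This is only a rough sketch under assumptions: the field names (`date`, `track`, `place`, `finish_place`, `race_id`), the dictionary layout, and the `?` placeholder for missing values are all made up for the example and would need to match whatever the scraper and the ML tool actually use. It is written for Python 3.

```python
import csv
import io

N_PREV = 5                                # previous races reported per horse
PREV_FIELDS = ["date", "track", "place"]  # hypothetical per-race fields
MISSING = "?"                             # assumed placeholder for missing values


def header():
    """Column names: identifiers, then one column group per previous race."""
    cols = ["race_id", "horse_name", "driver_name"]
    for i in range(1, N_PREV + 1):
        cols += ["prev%d_%s" % (i, f) for f in PREV_FIELDS]
    cols.append("finish_place")           # result of the current race
    return cols


def flatten(race_id, horse):
    """Turn one nested horse record into a single flat row."""
    row = [race_id, horse["horse_name"], horse["driver_name"]]
    prev = horse["prev_races"][:N_PREV]
    for i in range(N_PREV):
        if i < len(prev):
            row += [prev[i].get(f, MISSING) for f in PREV_FIELDS]
        else:
            # Horse has fewer than N_PREV earlier races: pad the columns.
            row += [MISSING] * len(PREV_FIELDS)
    row.append(horse.get("finish_place", MISSING))
    return row


def write_tsv(races, out):
    """Write a header line plus one tab-delimited line per horse per race."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header())
    for race in races:
        for horse in race["horses"]:
            writer.writerow(flatten(race["race_id"], horse))


if __name__ == "__main__":
    # Made-up example data, not real results.
    example = [{
        "race_id": 1,
        "horses": [{
            "horse_name": "Example Horse",
            "driver_name": "Example Driver",
            "prev_races": [{"date": "2020-01-01", "track": "S", "place": 2}],
            "finish_place": 1,
        }],
    }]
    buf = io.StringIO()
    write_tsv(example, buf)
    print(buf.getvalue())
```

Every row ends up with the same number of columns no matter how many previous races a horse actually has, which is what a one-record-per-line format needs. The raw columns could later be replaced or supplemented by summary features (for example, an average placing over the last five races) if that turns out to describe "fitness" better.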
Less important second question:
Do you find the odds on pari-mutuel betting markets to be better or worse than those given by betting firms?
Thank you all for your time; this is my first post but I want to thank you all for the great discussions you've had on this board so far! It's been a great joy for me to read!
EDIT: I should probably add that I'm working with Python, and I'm writing the data to a .txt file. I really don't know anything about databases, so please excuse me if this is a dumb question.