I'm writing this post to serve as an intro to using computers for sports betting. Programming isn't as hard as most people think, and the basic skills can be picked up in a weekend. This is by no means an extensive resource; it's a brief introduction. It is my belief that the best way to beat the books is with extensive research and backtesting. What is taught here will not give you the answers. There are no 20* play GOTY locks in this thread, only the tools that will allow you to succeed. Also note that this is very much a work in progress, and I will post new sections as I write them. If you have suggestions or would like to contribute, just post!
Sections:
1) Intro to programming (Taught in Python)
   a) What you need to get started
   b) Basics of programming
   c) Basics of data input & output
   d) How to scrape the internet for data
   e) How to manipulate the data for Excel
2) Intro to Excel
   a) How to load in data files
   b) What can be done in Excel
3) Intro to standard wagering ideas
   a) Arbitrage
   b) Kelly Criterion
Python is one of many programming languages, and it allows us to gather, manipulate and apply data. I believe Python is the best language for a beginner to learn because it reads like English but is still extremely powerful.
Section A) What you need to get started..
Since you're reading this thread, I'll assume you have a computer. Python is a platform-independent scripting language, which means that it *should* run the same across different operating systems [Windows, Mac, Unix etc]. For this tutorial, I'm going to assume you have Mac/Linux because that is what I'm familiar with. However, it should be pretty easy to generalize to Windows.
Downloading Python
If you're on Windows you will need to download Python and IDLE [ http://www.python.org/download/ ]
Get version 2.6.* -- don't get version 3. A lot has changed in version 3, and most old code won't run on it without changes, making it a pain in the ass. The examples in this thread use version 2 syntax (the print statement, for one) and will not run as written on version 3. Trust me on this. Version 2.6.* is what you want.
Good news: if you're on Mac or Linux, you probably already have Python!
Open up Terminal [Mac users hit apple+space to bring up Spotlight, and type in terminal].
Type in "python -V" and press enter. It should tell you which version of Python is installed. Even if it's not version 2.6.*, it will probably still do, as long as it's > 2.3 and < 3.0.
Writing Python Programs
Python programs should be written in a text editor, in a monospaced font...
Windows Users: There's a good editor called "Notepad++"; google it. Alternatively, when you download Python it will come with an editor (IDLE). You could use that...
Mac Users: I like a program called "TextMate", though you need to pay for it. There's probably a free trial somewhere.
Section B) Basics of Programming..
Learning Python:
I could type up a basic tutorial in Python, but I'd be reinventing the wheel. John wrote a great introduction to programming that you can find here: http://books.google.com/books?id=aJQ...age&q=&f=false
I'd suggest you read this through. Read at least the first 4 chapters. Spend a day and DO THE EXAMPLES. The only way to learn programming is by doing. It's really not hard stuff, it just takes some time to get the basics. Again, don't just read it or you will learn nothing. Take some time and practice, practice, practice. You can post questions or snippets of code in this thread if you're having problems. I'm sure I or someone else can find and fix your problem.
Section C) Basics of data input & output..
If you have gotten to this point, you should already know the basics of python. You should know what an "if statement" is, what a "for loop" is, and how to print "Hello World!".
In general, the tasks we are trying to do with Python will either be taking data from Excel and manipulating/running tests on it, or getting data from the internet and writing it to an Excel file for easier access. We can do both with Python! Excel can open what is known as a "CSV" (comma-separated values) file and display it in spreadsheet format, so all we have to do is have our Python program output a file that is comma separated -- and we can load it right into Excel.
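To make the comma-separated idea concrete, here is a minimal sketch of what CSV text looks like and how Python's csv module splits it into columns. The sample lines and column names are made up for illustration (they are not the contents of any real file), and print is called with a single parenthesized value so the same lines should run on Python 2 and 3.

```python
import csv

# Made-up sample text for illustration only; not taken from a real data file.
# Each string is one line of a CSV file: a header line, then one line per game.
SAMPLE_LINES = [
    'Date,Teams,OU_Line,Runs',
    '4/1/10,aaa at bbb,8.5,9',
    '4/1/10,"ccc at ddd, game 2",9,7',  # the quotes protect the comma inside this field
]


def parse_rows(lines):
    """Split each CSV line into a list of column values (all strings)."""
    return list(csv.reader(lines, delimiter=',', quotechar='"'))


rows = parse_rows(SAMPLE_LINES)
for row in rows:
    print(row)
```

Every value comes back as a string, which is why numbers have to be converted with float() or int() before doing any math on them.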
Let's start with a simple example. I have uploaded a .csv file to my website; it contains MLB game information for a single day. Download and save this file into the same directory that your Python script will run from. If you open the file in Excel, you will get a better idea of what is inside it. You'll find the file here: http://atbgreen.com/mlb_ex_1.csv
[Opening and Reading a .csv File]
Code:
## Created on 4/1/10
## This example shows how to open and read a .csv file.
##
## Notes: The file mlb_ex_1.csv should be downloaded and in the same folder
## .. as this script
## Tell Python to import the csv module, because we will be reading a .csv
import csv
## First we need to open the file
mlb_file = open("mlb_ex_1.csv","r") ## Open the MLB .csv file for reading
## Now we need to tell Python to read it as a csv file
## We are opening the mlb_file as defined above, it is delimited by commas,
## and our quote character is a regular quote (")
mlbReader = csv.reader(mlb_file, delimiter=',', quotechar='"')
## Grab the first line, because it is the headers..
headers = mlbReader.next()
## It's now time to iterate through the file row by row...
for row in mlbReader:
    ## Let's try and only print the Over/Under Line, and the actual runs scored
    ## .. in the game. If you look at the .csv in Excel you will see these are
    ## .. in the 6th & 7th columns. But since the computer starts counting at 0,
    ## .. we ask for positions 5 and 6
    ou_line = float(row[5]) ## This should be a float, because it can be .5
    runs_scored = int(row[6]) ## This will be an int, because runs are integers
    print "The line was",ou_line,"and",runs_scored,"runs were scored"
mlb_file.close() ## Close the file once we are done reading
## End of program
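Save and run the code. I've commented it generously so you can tell exactly what's going on. It looks long, but that's only because I've tried to make it as clear as possible. If I wanted, I could compress the code into 3 lines -- but it's not nearly as easy to understand. (The 3-line version also skips the step of setting the header row aside, so its first line of output is built from the column headers.)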
[Opening and Reading a .csv File (in 3 lines)]
Code:
import csv
mlbReader = csv.reader(open("mlb_ex_1.csv"),delimiter=',',quotechar='"')
for row in mlbReader: print "The line was",row[5],"and",row[6],"runs were scored"
Let's go a step further this time, and do some calculations with our file. Let's determine whether the game went over or under.
[Opening and Reading a .csv File, and determining over or under]
Code:
## Created on 4/1/10
## This example shows how to open and read a .csv file, and perform
## .. some simple calculations
##
## Notes: The file mlb_ex_1.csv should be downloaded and in the same folder
## .. as this script
## Tell Python to import the csv module, because we will be reading a .csv
import csv
total_overs = 0 ## Initialize the total number of overs to 0
total_unders = 0 ## Initialize the total number of unders to 0
## First we need to open the file
mlb_file = open("mlb_ex_1.csv","r") ## Open the MLB .csv file for reading
## Now we need to tell Python to read it as a csv file
## We are opening the mlb_file as defined above, it is delimited by commas,
## and our quote character is a regular quote (")
mlbReader = csv.reader(mlb_file, delimiter=',', quotechar='"')
## Grab the first line, because it is the headers..
headers = mlbReader.next()
## It's now time to iterate through the file row by row...
for row in mlbReader:
    ## First we need to get the OU_Line, and runs scored out of the file.
    ou_line = float(row[5]) ## This should be a float, because it can be .5
    runs_scored = int(row[6]) ## This will be an int, because runs are integers
    ## Now let's compare the two with an if statement to see what happened:
    ## .. more runs than the line is an Over, fewer runs is an Under
    if runs_scored > ou_line:
        ou_result = "Over"
        total_overs += 1
    elif runs_scored < ou_line:
        ou_result = "Under"
        total_unders += 1
    else:
        ou_result = "Push"
    ## Finally let's put it all together in one print statement
    print "The line was",ou_line,"and",runs_scored,"runs were scored, so the game went",ou_result
## END OF FOR LOOP
mlb_file.close() ## Close the file once we are done reading
print "There were",total_overs,"Overs"
print "There were",total_unders,"Unders"
## Calculate the percent of non-push games that went over, rounded to 2 decimal
## .. places. Skip it if every game pushed, so we never divide by zero.
decided_games = total_overs + total_unders
if decided_games > 0:
    over_under_percentage = round(100.0 * total_overs / decided_games, 2)
    print over_under_percentage,"percent of games went Over"
## End of program
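The over/under comparison can also be pulled out into small functions, which makes it easy to check a few cases by hand. This is a minimal sketch and not part of the program above: classify_total and over_percentage are names made up here, the (line, runs) pairs are invented, and the convention assumed is the usual one, that a game goes Over when more runs are scored than the line and Under when fewer are scored.

```python
def classify_total(ou_line, runs_scored):
    """Return "Over", "Under" or "Push" for one game."""
    if runs_scored > ou_line:
        return "Over"
    if runs_scored < ou_line:
        return "Under"
    return "Push"


def over_percentage(results):
    """Percent of non-push results that were "Over", or None if there are none."""
    overs = results.count("Over")
    unders = results.count("Under")
    decided = overs + unders
    if decided == 0:
        return None  # every game pushed (or the list was empty), nothing to divide by
    return round(100.0 * overs / decided, 2)


# Hypothetical (line, runs) pairs, invented for illustration
games = [(8.5, 9), (9.0, 7), (10.0, 10), (7.5, 12)]
results = [classify_total(line, runs) for line, runs in games]
print(results)
print(over_percentage(results))
```

Pushes are left out of the percentage on purpose, since they neither win nor lose an over/under bet.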
That's really all there is to reading in a file. What you do after you have read the file in is completely up to you. All the columns are accessible in the "row" list (Python's version of an array), and can be accessed by asking for a position in the list. Remember the position is always one less than its column number. For example, if you want the 7th column, you would use row[6].
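Counting columns by hand gets error-prone once a file has a lot of them. One alternative, sketched here under the assumption that the first line of the file holds the column names, is to look the position up from the headers. The column names and rows below are invented for the example, and read_columns is a made-up helper name; check the real header row of your own file before relying on a name.

```python
import csv

# Invented sample lines; the header names in a real file may differ.
SAMPLE_LINES = [
    'Date,Teams,OU_Line,Runs',
    '4/1/10,aaa at bbb,8.5,9',
    '4/1/10,ccc at ddd,9,7',
]


def read_columns(lines, wanted):
    """Return one tuple per data row holding only the columns named in wanted."""
    reader = csv.reader(lines)
    headers = next(reader)  # the first line is the header row
    # headers.index raises ValueError if a requested name is not in the header row
    positions = [headers.index(name) for name in wanted]
    return [tuple(row[p] for p in positions) for row in reader]


pairs = read_columns(SAMPLE_LINES, ["OU_Line", "Runs"])
for ou_line, runs in pairs:
    print("line %s, runs %s" % (ou_line, runs))
```

Note that next(reader) needs Python 2.6 or later; on an older version, reader.next() does the same job.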
Let's move on to data output. Let's expand on our old example and say that after we calculate whether the game went over or under, we want to write it to a new file. We want our new file to have three columns: Date, Teams, OverUnder. If we look in our sheet we will see that the date and teams are in columns 1 and 3 respectively. We will call our new file MLB_output.csv.
Code:
## Created on 4/1/10
## This example shows how to open and read a .csv file, perform some simple
## .. calculations, and write the results to a new .csv file
##
## Notes: The file mlb_ex_1.csv should be downloaded and in the same folder
## .. as this script
## Tell Python to import the csv module, because we will be reading a .csv
import csv
## First we need to open both files
mlb_file = open("mlb_ex_1.csv","r") ## Open the MLB .csv file for reading
## Open the output file for writing (the "b" keeps Windows from adding blank rows)
output_file = open("MLB_output.csv","wb")
## Now we need to tell Python to read it as a csv file
## We are opening the mlb_file as defined above, it is delimited by commas,
## and our quote character is a regular quote (")
mlbReader = csv.reader(mlb_file, delimiter=',', quotechar='"')
## We'll do the same for our writer. We need to tell it where we will be writing
## .. to, and what kind of delimiters we want to use.
mlbWriter = csv.writer(output_file, delimiter=',', quotechar='"')
## Grab the first line, because it is the headers..
headers = mlbReader.next()
## It's now time to iterate through the file row by row...
for row in mlbReader:
    ## First we need to get the OU_Line, and runs scored out of the file.
    ou_line = float(row[5]) ## This should be a float, because it can be .5
    runs_scored = int(row[6]) ## This will be an int, because runs are integers
    ## Now let's get the other information we need out (Date and Teams)
    date = row[0]
    teams = row[2]
    ## Now let's compare the two with an if statement to see what happened:
    ## .. more runs than the line is an Over, fewer runs is an Under
    if runs_scored > ou_line:
        ou_result = "Over"
    elif runs_scored < ou_line:
        ou_result = "Under"
    else:
        ou_result = "Push"
    ## Instead of printing here like we did before, we want to write to the file
    mlbWriter.writerow([date,teams,ou_result])
## END OF FOR LOOP
mlb_file.close() ## Close the input file once we are done reading
output_file.close() ## Close the file after we have written everything
print "The program has written everything!"
## End of program
Try running the program. After you do, you should see that a new file has been created. This file will contain exactly what we expect:
Code:
9/21/09,atl at nyn,Over
9/21/09,bal at tor,Over
9/21/09,bos at kca,Over
9/21/09,chn at mil,Over
9/21/09,min at cha,Under
9/21/09,nya at ana,Under
9/21/09,sdn at pit,Over
9/21/09,sln at hou,Over
9/21/09,tex at oak,Over
That's really all there is to basic input and output of files!
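A quick way to convince yourself the output is right is to write a few rows and read them straight back. The sketch below does that with a throwaway file in a temporary folder. The rows are invented, the helper names (open_csv, write_rows, read_rows) are made up for this example, and the file modes follow the csv module documentation: Python 2 wants the file opened with the "b" flag, while Python 3 wants text mode with newline="".

```python
import csv
import os
import sys
import tempfile


def open_csv(path, mode):
    """Open path for the csv module; mode is "r" or "w"."""
    if sys.version_info[0] >= 3:
        return open(path, mode, newline="")  # Python 3: text mode, no newline translation
    return open(path, mode + "b")  # Python 2: binary mode


def write_rows(path, rows):
    """Write a list of rows (each a list of values) to path as CSV."""
    out = open_csv(path, "w")
    try:
        csv.writer(out, delimiter=',', quotechar='"').writerows(rows)
    finally:
        out.close()  # closing makes sure everything reaches the disk


def read_rows(path):
    """Read every row of the CSV file at path back as a list of strings."""
    src = open_csv(path, "r")
    try:
        return list(csv.reader(src, delimiter=',', quotechar='"'))
    finally:
        src.close()


# Invented rows in the Date, Teams, OverUnder layout
demo_rows = [["4/1/10", "aaa at bbb", "Over"], ["4/1/10", "ccc at ddd", "Push"]]
demo_path = os.path.join(tempfile.mkdtemp(), "roundtrip_demo.csv")
write_rows(demo_path, demo_rows)
print(read_rows(demo_path) == demo_rows)
```

If what comes back differs from what went in, the problem is in the writing step rather than in Excel.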
Section D) How to scrape the internet for data
To be continued......