Creating betting models?

dude_bg · 04-12-11 06:51 AM

I have been following the NBA for many years (I do have a bit above-average knowledge). This is my first year betting basketball. I see a lot of people have betting models they follow. I would be happy for any hints on how to make one and backtest it in some way. I am not expecting anyone to reveal their secrets, but I would appreciate any guidance.

demens · 04-12-11 10:56 AM

Yeah, good luck with that. No one wants to share shit around here. Plus there aren't too many that are even worth listening to in the NBA section. You're better off searching through the ThinkTank section.

dude_bg · 04-12-11 11:30 AM

Originally Posted by demens

Yeah, good luck with that. No one wants to share shit around here. Plus there aren't too many that are even worth listening to in the NBA section. You're better off searching through the ThinkTank section.

Thanks for this, this section seems very useful.

demens · 04-12-11 11:53 AM

Originally Posted by dude_bg

Thanks for this, this section seems very useful.

You might want to check this link out for some ideas.
http://www.rawbw.com/~deano/methdesc.htm

Lots of people recommend the book in the SBR Store too.

ManBearPig · 04-12-11 01:43 PM

There's no easy way, but step 1 is to figure out what to store and how to store it. Depending on your skill level this could be a database or a simple Excel spreadsheet.

From there you start by trying to form some type of hypothesis and then try to break it. I recently came across this post and it is an approach that I've been trying to stick to, and it sums it up best.

Hope this begins to answer your questions. Oh and get a couple of books to start you out as well...Basketball on Paper, Mathletics, and Conquering Risk are some of my favs. These are only initial steps as there is a plethora of other things you could consider, but understanding how to recognize a profitable trend is the number one focus you should have. There is no quicker way to lose money than to start betting a system/trend that holds no long-term weight. Whether people care or not this all comes down to math, and the sooner you understand how, the more successful you can be. A lot of people win without systems, but if they don't understand why they're winning, it doesn't mean they will keep making money.

Same link that Demens posted (his didn't work) -> http://www.rawbw.com/~deano/

I agree with him as well that you'll be hard-pressed to get much help with this. Some are more helpful than others, but it's going to come down to you and the time you want to put into researching. There is a lot of information on this site if you search posts from older posters and such - I found that helpful in spots too.

Dear Friends,
Two sections to this post. I will be around to answer questions as I can.
1) Here's a post I had a while back regarding data mining trends vs predictive trends: Distinguishing trends that are data fitting from truly predictive ones is key.
I suggest you don't use the entire database for your queries. Develop your systems using no more than 30-40% of the available years (e.g. always include 1998<=season<=2002 in your queries). Now, apply the system to the remaining seasons' unseen data. Did your system hold up? If it did, you're on your way to a predictive system. Sometimes you will need to optimize your system with additional query parameters to tip it to a profitable state. This can happen due to the smaller sample size you're using to develop your angle. Keeping it simple is desirable, but simple systems may be difficult to find using the above procedure. How do you tell when you've over-optimized and are now curve fitting data?

There is an advanced modeling technique based on how neural nets learn which can help in these cases. If you want to give it a shot you will need 3 datasets instead of 2. In the above example we have a training set (the data we're using to develop our system) and a test set (the unseen data). Instead of that, simply divide your data into 3 sets - a training set, a test set, and a test-while-training set. Example - the NBA database has 10 years, so set aside 3 years for the training a.k.a. development of your system, 3 years for the test-while-training, and 4 years to test its predictive ability.

Why do this? It's easier to have an intermediary step (test-while-training) to provide iterative feedback on the development of your system, because you will now be able to fine-tune your system with predictive query strings and not data-mined crap, and still have raw data (test data) that your system hasn't seen. How you might use this procedure:

1) Develop your system query using your training data
2) Apply it to the test-while-training data.
3a) If the results are similar that is a good sign. If you're happy already, go ahead and apply the query to the test data. If the results are still good, you're done and you have a very good probability you have a predictive model. If you want to see if you can further improve the ROI go to step 3b.
3b) If the results are inconclusive or the ROI drops off too much after applying the training query to the test-while-training data, go back to the training data and optimize it, one step at a time. You can remove query parameters one by one and then apply to the test-while-training data or you can add parameters. There are infinite combinations. You are most likely on the right track when the test-while-training data has similar or slightly better results to the training data. Keep going back and forth between the training data and the test-while-training data until you like the consistency between the datasets.
4) Now for the big moment. Run your query on the test data. Don't be too upset if it crashes - that's what will happen most of the time. If you've gone through all the steps and you crash there is a high probability your angle isn't worth pursuing. Let me tell you though, it's pretty cool when it does hold up. Then it's Miller time.
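
A rough Python sketch of the three-dataset procedure, for anyone who wants to try it outside a query tool. The field names (season, home_dog, covered), the example "system" and the rows themselves are made up for illustration; the season ranges just follow the 3/3/4 split described above.

```python
def split_by_season(games, train, tune, test):
    """Partition game records into three non-overlapping sets by season."""
    train, tune, test = set(train), set(tune), set(test)
    if train & tune or train & test or tune & test:
        raise ValueError("season sets must not overlap")
    out = {"train": [], "tune": [], "test": []}
    for g in games:
        if g["season"] in train:
            out["train"].append(g)
        elif g["season"] in tune:
            out["tune"].append(g)   # the test-while-training set
        elif g["season"] in test:
            out["test"].append(g)   # untouched until the very end
    return out


def record(games, system):
    """Return (wins, losses) against the spread for games the system selects."""
    picks = [g for g in games if system(g)]
    wins = sum(1 for g in picks if g["covered"])
    return wins, len(picks) - wins


# Made-up rows, only to show the mechanics.
games = [
    {"season": 2001, "home_dog": True, "covered": True},
    {"season": 2002, "home_dog": True, "covered": False},
    {"season": 2004, "home_dog": True, "covered": True},
    {"season": 2005, "home_dog": False, "covered": True},
    {"season": 2008, "home_dog": True, "covered": False},
]

sets = split_by_season(games, range(2001, 2004), range(2004, 2007), range(2007, 2011))


def system(g):
    """Hypothetical angle: play every home underdog."""
    return g["home_dog"]


for name in ("train", "tune", "test"):
    print(name, record(sets[name], system))
```

The point of keeping the split in one function is that the test seasons are fixed up front, so you can't quietly peek at them while you go back and forth between the other two sets.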
2) How do mathematicians measure trend quality? The most common way is called the Z-value or Z-score. In a nutshell it's a measure of how many standard deviations a trend's record sits from a coin flip.

The formula assumes all bets at -110:
(Wins - Losses)/Square Root(Wins + Losses)
Example: A trend is 300 wins, 250 losses, thus:
(300-250)/Square Root(300+250)
Further reduced:
50/Square Root(550)
Further reduced:
50/23.452 or a Z-score of 2.13

Z-scores above 2, or 2 standard deviations, are worth investigating further. Z-scores above 3 are rare: under a normal approximation, a coin flip produces one only about 1 time in 700. The realistic scale is 0 to about 4 or 5. 0 to 2 is inconclusive, 2-3 is of interest, 3+ is highly interesting.
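
The Z-score formula above is easy to script. A minimal sketch in Python: the first function is the formula exactly as posted, and the second (my addition, using the standard normal approximation) converts a Z-score into the chance of doing at least that well by pure coin flipping. Keep in mind that if you test a lot of trends, some will clear 2 by luck alone.

```python
import math


def z_score(wins, losses):
    """(Wins - Losses) / sqrt(Wins + Losses), i.e. distance from a 50/50 record."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one graded bet")
    return (wins - losses) / math.sqrt(n)


def one_sided_p(z):
    """Probability of a Z this high or higher from coin flips (normal approximation)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = z_score(300, 250)
print(round(z, 2))      # 2.13, matching the worked example
print(one_sided_p(z))   # chance of a record this good from coin flips
```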

dude_bg · 04-20-11 01:02 AM

My aim will be to handicap the games with my system once it is ready, and to see how big a difference between my line and the actual line is good enough to hit something like 55-58%.
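
One way that check could be scripted once there is a history of model lines, market lines and results. This is only a sketch under assumed conventions: lines are quoted from the home team's side (negative means the home team is favored), home_margin is home points minus away points, and the field names and rows are made up.

```python
def hit_rate_by_edge(games, thresholds):
    """For each threshold, grade bets where |market - model| >= threshold."""
    results = {}
    for t in thresholds:
        wins = losses = 0
        for g in games:
            # edge > 0: the model rates the home team higher than the market does
            edge = g["market_line"] - g["model_line"]
            if edge == 0 or abs(edge) < t:
                continue  # no opinion, or not enough disagreement to bet
            ats = g["home_margin"] + g["market_line"]  # > 0: home covered
            if ats == 0:
                continue  # push, no decision
            bet_home = edge > 0
            if (ats > 0) == bet_home:
                wins += 1
            else:
                losses += 1
        results[t] = (wins, losses)
    return results


# Made-up history, only to show the mechanics.
games = [
    {"market_line": -12, "model_line": -8, "home_margin": 25},
    {"market_line": -3, "model_line": -6, "home_margin": 7},
    {"market_line": 2, "model_line": 1, "home_margin": -5},
]

for t, (w, l) in hit_rate_by_edge(games, [1, 3]).items():
    print(f"edge >= {t}: {w}-{l}")
```

With real data you would want hundreds of graded bets per threshold before reading anything into the win rate.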

suicidekings · 04-20-11 03:19 AM

Originally Posted by dude_bg

Thanks for this, this section seems very useful.

Read everything Justin7 and Ganchrow have written, and pick up the book "Conquering Risk" from the SBR Store. The first model you ever build will suck, straight up. But your models will get progressively better as you gain a better appreciation for the math behind them.

demens · 04-20-11 09:49 AM

Honestly, I'm a novice at this but I really doubt there is a model out there that hits 58% season to season. No matter how complicated it is, or how advanced the math is (and I've seen some insane shit), at the end of the day this is still sports. I don't care what kind of prediction algorithms you have, this is still real life where anything can happen at any time and you can't predict anything. You're betting on the human element, not robots.

I just found a cool stat for you. The average NBA win is by about 10 points, the average NBA spread is about 3. Spreads are off the end results by an average of 7 points. So when people run around the forums talking about how the lines are sharp I really think they don't have a clue what they are actually saying.

You're really going against yourself by dedicating too much time to perfecting a model. Are you really hoping to discover something Vegas doesn't already know? And even if you do, and let's say your lines are better than Vegas lines, you're still gonna lose plenty because the lines will never consistently reflect exactly how the game is played.

The best use of a model is just to have a starting point to compare lines. If your model gives you what you consider fair lines you can compare them with Vegas lines and have a better understanding of WHY the Vegas line is what it is, which I think is a major key to success.

But at the end of the day, say the Lakers spread is inflated to -12 because they are the Lakers, and you get -8 for that game, so you know the +EV play is betting against LA. And you do, and LA wins by 25. Because the shit is unpredictable. And anyone that runs some probability on this scenario and tells you that it might lose due to variance but in the long run is gonna hit at whatever % is just guessing. Because there is more to it than math: maybe LA just beats that opponent senseless all the time, maybe the opponent's best player has a mistress in LA and is exhausted in all the games. There are a million outside factors that can make this mathematical quest useless.

No matter what genius model you create, it's still gonna be extremely rare that you find mistakes in Vegas lines, and if you do it does not even mean that the game will go like you expect it to.

But if you are talking about creating something for an unpopular sport then you might have a chance. Then you do have a chance to have much more "accurate" lines than Vegas that should produce winners. But are you willing to do extensive research on a sport no one gives a shit about, even watch it sometimes? And probably develop your own databases and keep all the stats, because they are not kept online for you like for NBA or NFL, etc.?

Anyway, my 2 cents.

widebody2 · 04-20-11 11:27 AM

Hey Demens did you get that NBA point spread stat from my thread? Average line is somewhere in the 3.xx range while I thought the average game would be decided by around 10.5. The only math that I did to figure out the actual point spread was based on info that was given to me by someone else so it may or may not be accurate, but anyway I came up with 6.32 as the average margin of victory for NBA games in 2010-11.

Just to clarify a few things, the Vegas point spreads of 3.xx do not actually correlate exactly to how much Vegas thinks the Lakers will beat the Cavaliers by, for example; that 3.xxx is actually only a tipping point where Vegas thinks 50% of bettors will be on each side. They probably came up with Lakers by 5-7 but need to adjust down to correct for the possibility that the Lakers may not win at all. The spreads are also a basic tipping point where Vegas believes favorites will finish above this point 50% of the time and below it 50% of the time. So it is an average win margin for favorites that has been reduced by the correction made for the 30% of the time that the underdog actually manages a win.

The average game is decided by 6.xxx, the average spread is half of that. That is because 30% of the time the underdog will win: the Cavaliers will beat the Lakers by 6.xxx. So for every 2 times the Lakers beat the Cavs by 6.xxx the Cavs will beat the Lakers once. This is why the Vegas lines are so much lower than the actual average margins. But on average whoever wins the game will win by a great deal more than the predicted point spread.

I have never seen any evidence of this but I do believe a model with a 55% win rate is possible, even betting every single game long term. Queries on various sites will show you that Vegas is giving us 2% or more in multiple areas. For example if you simply take the underdog always you will be up by 2% or so. This actually does not mean that Vegas is giving up that 2%. I would venture that it means an extra 2% of the population is always betting on the favorites, and Vegas knows this, so they keep their spreads right where they are. But nevertheless, that 2% is there for us to take.

Obviously using math you will not be able to pinpoint human performance. But I think you can do pretty well. The right team only wins 70% of the time anyway. We are not trying to be exact. We are only trying to improve by 5% over the tipping point that Vegas decides will produce 50% bettors on each side of the line.

Does anyone know of anyone, ever, using a formula that produced 54% or higher wins? I believe that the Computer Group's math guy was producing upwards of 60% but I would not consider that relevant anymore since he was not competing against other formulas...he was the first and was a big impetus for change in the way Vegas created their lines.

ManBearPig · 04-20-11 12:22 PM

Originally Posted by widebody2

Does anyone know of anyone, ever, using a formula that produced 54% or higher wins? I believe that the Computer Group's math guy was producing upwards of 60% but I would not consider that relevant anymore since he was not competing against other formulas...he was the first and was a big impetus for change in the way Vegas created their lines.

This isn't entirely true...there's a guy who was part of the computer group that is posting plays (service) and hits at very high rates using a lot of the same methods they used back then...you probably won't believe me so Google Computer Prediction and you should come across it.

***Only read if you don't get bored easily***

Whether systems or math are actually worth your while or time has been beaten to a pulp, so here's my semi-quick addition. I think it comes down to the individual, how much time they want/like to spend crunching numbers, and what they are looking to accomplish, which could be more than just hitting 55%.

Demens is correct that you can't expect to put a model together that will churn out winner after winner at a 60% clip. No matter how smart your algorithm is you can't predict the unpredictable, and at the end of the day there is always going to be some form of variance and regression to the mean.

I think models are best used as a handicapping tool that helps you find the plays that give you the best edge, so you can make your decision based on that and some additional factors. You're still handicapping, but your model is only a tool that saves you time in making an educated decision. No one person will be able to predict the future 100%, and although there are some smart people who do a good job, you can't get it all right.

Do you know that the government actually has a guy they consult, a professor at Harvard or Princeton, who uses advanced modeling techniques to predict future world events and, from what I understood, does a pretty good job at it? Anyone that tries to tell you that modeling and math is crap and useless is too dumb or ignorant to understand, and their opinion won't mean much anyways.

Like any good handicapper you have to be willing to evolve if you want to have any sort of long-term success. You can't approach this game the same way and expect the same results each and every year. Models are no different: trends that may have been successful to bet on last year won't necessarily be successful this year or in 5 years. Models need maintenance like anything, and part of the tricky part is recognizing when your 60% hit rate has regressed and you're actually only hitting at 51% now. I actually find myself spending a lot of time not finding plays but experimenting with ways to analyze data, and I suppose one day I'll find something I like and go from there. I actually have found it a fun way to put my programming to use on something more interesting than what I do at work.

It won't make you an instant winner or a millionaire, but understanding the math and certain statistical methods will not hurt you and can be profitable in the long run. I've read at least 5 books on this subject, not counting the countless websites and articles, in the last couple months and it's amazing how many people do this type of thing just for the fun of it and don't even bet - PhD types essentially.

Get Conquering Risk to start; anything by Stanford Wong or King Yao, and Basketball on Paper is gold, to name a few...there are more advanced books you can look at from there if you want to get serious.

As you can see there's no right or wrong way to approach this and it's not a very small topic, but that's why you start small and build from there. If you're already an accomplished bettor just look at it as a way to take what's in your head and have a computer do it for you.

widebody2 · 04-20-11 01:27 PM

So ManBearPig, this is obviously a topic that you are very interested in and have put a lot of thought into. Do you think 55% over the course of 900-1000 bets per NBA season is possible using a system alone?

Over that many bets 55% is nothing to sneeze at.
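
To put a number on that, here is a quick back-of-the-envelope calculation in Python. It assumes flat betting with 1 unit risked on every play and every bet priced at -110; real prices and stake sizes will vary.

```python
def break_even(odds=-110):
    """Win rate needed to break even at negative American odds."""
    if odds >= 0:
        raise ValueError("this sketch only handles negative (favorite-style) odds")
    risk = -odds                 # units risked to win 100
    return risk / (risk + 100)


def expected_units(win_rate, n_bets, odds=-110):
    """Expected profit in units when risking 1 unit per bet."""
    if odds >= 0:
        raise ValueError("this sketch only handles negative (favorite-style) odds")
    payout = 100 / -odds         # profit per unit risked on a winning bet
    return n_bets * (win_rate * payout - (1 - win_rate))


print(round(break_even(), 4))                 # 0.5238
print(round(expected_units(0.55, 1000), 1))   # 50.0
```

So at -110 you need about 52.4% just to break even, and 55% over 1000 flat bets works out to roughly 50 units of expected profit.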

dude_bg · 04-20-11 02:37 PM

Guys, I would think that the betting model would be easy to implement for totals, as both the winner and the loser contribute to the outcome anyway.
You don't need to predict the side, just the effort.
I guess that's why 70kg man is hitting like 60% on totals; he has said he uses a model.

demens · 04-20-11 02:45 PM

How long has he been doing it for? Cause the league goes through phases, something that was hitting 60% this season might hit 40% next.

I think to be successful with totals you have to keep up with pace stats. Not so much an individual team's pace but more how the opponent reacts to it. And you have to do that for every team in the league. I think it's simple enough, it just takes a bit of setting up, but I'm sure there is a clever way of doing it so it does most of the work by itself. But at the end of the day you gotta ask yourself, is your totals model really stronger than Vegas?

suicidekings · 04-20-11 03:19 PM

Originally Posted by demens

How long has he been doing it for? Cause the league goes through phases, something that was hitting 60% this season might hit 40% next.

I think to be successful with totals you have to keep up with pace stats. Not so much an individual team's pace but more how the opponent reacts to it. And you have to do that for every team in the league. I think it's simple enough, it just takes a bit of setting up, but I'm sure there is a clever way of doing it so it does most of the work by itself. But at the end of the day you gotta ask yourself, is your totals model really stronger than Vegas?

Who says Vegas always puts out lines that are accurate representations of the predicted score? Sometimes there's a big difference between where the line should be and where it is. With that in mind, the proposition becomes one where you're identifying when the line offered by the books is not one that's entirely based on the numbers. The relative power ratings between teams change, but public perception tends to move a lot slower. There's your edge.

demens · 04-20-11 03:37 PM

Manbear,

I remember you talking about creating a database. I'm not familiar with how it's done so I'll ask a simple question. Do you enter data manually (as in for each game) or do you have the setup automated?

dude_bg · 04-21-11 01:16 AM

I have read some of the ThinkTank; they do it automatically by importing CSV files into a database. But I'm not sure how the data is collected. I guess they pull it from the boxscore pages.

suicidekings · 04-21-11 01:19 AM

Originally Posted by dude_bg

I have read some of the ThinkTank; they do it automatically by importing CSV files into a database. But I'm not sure how the data is collected. I guess they pull it from the boxscore pages.

Player stats are available in numerous locations online, such as USA Today's sports section. For boxscores, you can scrape them yourself or pay a service like nbastuffer.com a nominal fee for the info, updated weekly.

demens · 04-21-11 09:44 AM

I'm not sure if I can write a prog myself for scraping. My prog skills are very rusty, plus I never liked doing this crap in school anyway. I was good at it though, so maybe something I'm interested in would be fun.

Not sure which way to go to get started with the database. I'm definitely not the type to pay for something I can do myself.

1 option is to use Excel (don't think I can write the macro code for OO Calc that I have, cause people are saying it doesn't use Visual Basic). I've seen some examples of Visual Basic code that was used for scraping. Just have to see if I can make it fit the websites I want. I looked at USA Today and their pages look pretty simple, I like it. Another site is basketball-reference. Have not thought about which site to use for lines; it would be nice to get openers and closers. Also 2h lines.

If I'm successful with Excel it shouldn't be too hard to format the data into a CSV file and move it to a database. I've never used any database proggy so I have to look those up. From what I can tell you can't go wrong with MySQL. I'm sure they all do the same thing anyway.
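
For what it's worth, the CSV-to-database step can be very short. A minimal sketch using SQLite through Python's built-in sqlite3 module rather than MySQL (my substitution, just because it needs no server setup). The column names and the two rows are made up, and the CSV is an inline string here where a real run would open a file.

```python
import csv
import io
import sqlite3

# Stand-in for a real file exported from Excel; the layout is hypothetical.
csv_text = """date,home,away,home_pts,away_pts
2011-04-12,AAA,BBB,102,95
2011-04-13,CCC,AAA,88,91
"""


def load_games(conn, fileobj):
    """Create the games table if needed and bulk-insert rows from a CSV file object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games "
        "(date TEXT, home TEXT, away TEXT, home_pts INTEGER, away_pts INTEGER)"
    )
    rows = [
        (r["date"], r["home"], r["away"], int(r["home_pts"]), int(r["away_pts"]))
        for r in csv.DictReader(fileobj)
    ]
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")   # use a file path to keep the data on disk
loaded = load_games(conn, io.StringIO(csv_text))
print(loaded, "games loaded")
```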

1 thing I don't like about the idea of Excel is that, from the VBA code I've seen (posted by uva in ThinkTank), it'll create tons of worksheets, which might be a pain.

I might try using Python and following the stuff from the intro to research thread (too bad the guy never finished it, it could have been the best thread on the forum). Again the goal is to have the data in CSV format or a simple txt file.

Now I'm thinking ahead and trying to figure out how to structure the database after that. The sample off NBAStuffer looks nice. Date, teams, qtr scores, final score, all the game stats, some adv stats and starting 5s, all in 1 line (this is in Excel).

That seems simple enough for the database, like in table format. But I would like to save the player stats as well. Might be jumping the gun with this question cause it might be something very simple that I'll see once I start playing with the prog, but for now I'm not sure if I can keep a record of all the games in a similar style, line by line in a table, but have each game have like a tree format (kind of like folders) that shows player stats when you click on it. It would look too clunky if full boxscores were just listed 1 by 1 (like I assume it will be in the CSV file).
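
The usual database answer to the "tree" idea is two tables: one row per game, and one row per player per game that points back to its game through a shared id. A small sketch with Python's built-in sqlite3 module (SQLite standing in for MySQL here); the table names, columns and rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (
    game_id   INTEGER PRIMARY KEY,
    game_date TEXT,
    home      TEXT,
    away      TEXT,
    home_pts  INTEGER,
    away_pts  INTEGER
);
CREATE TABLE player_stats (
    game_id INTEGER REFERENCES games(game_id),  -- links each line to its game
    player  TEXT,
    team    TEXT,
    minutes INTEGER,
    points  INTEGER
);
""")

# Placeholder rows, not real results.
conn.execute("INSERT INTO games VALUES (1, '2011-04-12', 'AAA', 'BBB', 102, 95)")
conn.executemany(
    "INSERT INTO player_stats VALUES (?, ?, ?, ?, ?)",
    [(1, "Player One", "AAA", 36, 28), (1, "Player Two", "BBB", 34, 22)],
)


def box_score(conn, game_id):
    """Return the player lines for one game, highest scorer first."""
    return conn.execute(
        "SELECT player, team, minutes, points FROM player_stats "
        "WHERE game_id = ? ORDER BY points DESC",
        (game_id,),
    ).fetchall()


for line in box_score(conn, 1):
    print(line)
```

The games table stays one line per game like the NBAStuffer sample, and "clicking into" a game is just a query on player_stats for that game_id.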

Am I going in the right direction here? The guys that have databases with boxscores, what is your setup like?

They should move this to the ThinkTank btw.

therushishere · 08-02-12 04:36 AM

Originally Posted by ManBearPig

I've read at least 5 books on this subject, not counting the countless websites and articles, in the last couple months and it's amazing how many people do this type of thing just for the fun of it and don't even bet - PhD types essentially.

Where do you find the PhD types putting together models and not using them?
