Hi, I've been interested in betting on tennis for the last 2 years and have messed around with various models and variations. At the moment I mostly use match results and some publicly available iterative algorithms, and I get a pretty good prediction percentage.
I'd like to know what a ballpark minimum correct-prediction percentage is. It seems the bookies get the favourite right about 80% of the time in top-level tennis, and closer to 70% at lower levels. Should I be looking to outpredict them, or is there a better strategy? Some weeks I do outpredict the bookies, but other weeks not so much, and I drop money. I use a Kelly system, but perhaps I don't understand what I'm doing, because I lose with it when I say WTF and bet whatever my Kelly calculations say. Then I go back to flat betting to build my bank up again. I'm not sure whether Wikipedia's description is correct, because I read here in the forum that I should be staking my edge, not what the Kelly formula says (e.g. a 0.65 probability of winning at $3.00 gives (0.65 × 3 − 1)/(3 − 1) = 0.475, a big bet of almost half your bank at full Kelly).
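A minimal sketch of the full-Kelly stake for a single bet, assuming the standard formula for decimal odds d and win probability p: f = (p·d − 1)/(d − 1), i.e. the edge (expected profit per unit staked, p·d − 1) divided by the net odds (d − 1). It returns 0 when there is no edge, and for p = 0.65 at $3.00 it gives 0.475. Written this way, "stake your edge" and full Kelly only coincide at even money ($2.00), where d − 1 = 1. The function name is hypothetical.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly stake as a fraction of bankroll; 0.0 when there is no edge."""
    net_odds = decimal_odds - 1.0           # profit per unit staked if the bet wins
    edge = p_win * decimal_odds - 1.0       # expected profit per unit staked
    if net_odds <= 0 or edge <= 0:
        return 0.0                          # no positive expectation: don't bet
    return edge / net_odds

full = kelly_fraction(0.65, 3.0)
print(f"full Kelly: {full:.3f} of bank, half Kelly: {full / 2:.4f} of bank")
```

Fractional Kelly (half, quarter) simply scales this number down, which limits the damage when the estimated p is too optimistic.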
I do a lot of backtesting, but my odds data is always incomplete because I scrape a few sites and none of them seem to offer odds on all tennis tournaments. I seem to predict well overall for a week's games, but on the ones I bet on, I only get about 2/3 of the correct-prediction rate I achieve overall.
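One way to dig into that 2/3 gap, sketched under assumptions: if the bets are mostly picks at longer odds, a lower hit rate on them is expected rather than a sign that something is broken. Comparing each group's observed hit rate with its mean model probability separates the two effects. The record layout and function name below are hypothetical.

```python
from statistics import mean

# Hypothetical record layout (not from any particular data source):
# (model probability that the picked player wins, did the pick win?, was a bet placed?)
def hit_rates(records):
    """Return {group: (count, observed hit rate, expected hit rate)} for all picks and for bets."""
    groups = {"all": records, "bet": [r for r in records if r[2]]}
    summary = {}
    for name, rows in groups.items():
        if not rows:
            continue  # e.g. no bets placed yet
        observed = mean(1.0 if won else 0.0 for _, won, _ in rows)
        expected = mean(prob for prob, _, _ in rows)
        summary[name] = (len(rows), observed, expected)
    return summary

# Toy input, purely for illustration.
example = [(0.8, True, False), (0.8, True, False), (0.5, False, True), (0.5, True, True)]
for group, (n, obs, exp) in hit_rates(example).items():
    print(f"{group}: n={n} observed={obs:.2f} expected={exp:.2f}")
```

If the bet group's observed rate sits well below its own expected rate over many bets, the problem is in how bets are selected, not just in which odds they are at.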
I've read a bit of the forum so far, and it's good, though there's a lot to wade through. I guess I'd like some pointers on the best way to statistically analyse what I'm doing: not my predictions, which seem fine, but my betting strategy. I seem to be great at betting on the wrong match.
Honestly, I know nothing about tennis. But I can tell you this much: it seems like you're still in the developmental stage of your model. Nothing wrong with that, gotta start somewhere. However, you definitely CANNOT use Kelly (or even 1/2 Kelly) without being able to accurately quantify your edge and your standard error. Personally, if I were you I'd stick to flat betting a small percentage.
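As a rough sketch of what quantifying your edge and your standard error can look like in practice, assuming flat one-unit stakes: the edge is the mean profit per bet, and its standard error is the sample standard deviation of the per-bet profits divided by √n. The input format and function name are assumptions.

```python
import math
from statistics import mean, stdev

def edge_and_standard_error(returns):
    """returns: profit per unit of flat stake for each settled bet,
    e.g. +1.5 for a win at $2.50 and -1.0 for a loss (assumed input format)."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two settled bets")
    edge = mean(returns)                     # average profit per unit staked
    se = stdev(returns) / math.sqrt(n)       # standard error of that average
    return edge, se

edge, se = edge_and_standard_error([1.5, -1.0, -1.0, 0.9, 1.2, -1.0])
print(f"edge per bet: {edge:+.3f} units, standard error: {se:.3f}")
print(f"rough 95% interval: {edge - 2 * se:+.3f} to {edge + 2 * se:+.3f}")
```

If that interval still straddles zero, the record can't yet tell an edge apart from luck, which is exactly when sizing bets by Kelly is risky.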
One small piece of advice I would offer is quite simple: is your model improving over time? More precisely, does it perform better as it acquires more information? This might seem obvious, but you'd be surprised how easy it is to overlook. Quantifying prediction error for tennis matches is probably difficult, but I'm sure there's a way. I know that feeling very well, the one where you feel like you're picking winners, but when you throw the money down you just seem to pick the wrong match. Each year your model will get a little bit better, and over time these scenarios will be a thing of the past.
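One common way to put a number on prediction error, offered as a suggestion rather than anything tennis-specific: score each predicted win probability against the 0/1 result with the Brier score or log loss (lower is better for both). Computing the score per season or per month shows whether the model improves as it gets more data, and computing it on the bookies' implied probabilities for the same matches gives a benchmark. The function names are illustrative.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the actual outcomes."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total -= o * math.log(p) + (1 - o) * math.log(1 - p)
    return total / len(probs)

# Toy example: probability that player A wins, and whether A actually won (1) or not (0).
model = [0.70, 0.55, 0.80, 0.40]
won = [1, 0, 1, 0]
print(f"Brier: {brier_score(model, won):.4f}  log loss: {log_loss(model, won):.4f}")
```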
As for your question about what winning percentage you need to make a sustainable profit, I can't really help. I do know that tennis betting involves a lot of chalk, and that kind of variance can be a real bankroll destroyer (yet another reason to avoid Kelly altogether). You seem like you're heading down the right path, though. Hope this was at least a little bit helpful. BOL.
Thanks for your reply. I agree that I'm in the developmental stage. One of the reasons I'm trying Kelly betting is to keep improving: Kelly seemed like a way to avoid betting when there was no long-term way to make a profit.
I think the model is improving. I recently tweaked it to take recent form into account, which has upped the win percentage by a few points. I'm just running out of ideas to improve it further. I guess I don't have a sufficient statistical background. I did a statistics unit or two at uni for a psych course, but not much of it seemed applicable, at least the parts I remember.
Hi, I have another question. How do I make sure my data is good? Are there any tests? I regularly get data from one site (I won't include links, I've just learned the hard way), which is pretty good: it has timestamps and some odds data. It does have the odd incorrect match, and it marks walkovers as a 1-0 set win. I use some of the main tennis organizations' sites to correct the data periodically. This is probably pretty nebulous, but are there any rules of thumb, guidelines or tests?
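A few automated checks can catch the kinds of problems described here. This is a sketch under assumptions: each record is a dict with hypothetical fields (`date`, `winner`, `loser`, `sets` as (winner's sets, loser's sets), optional `best_of`, `odds_w`, `odds_l`), matches are best of three unless marked otherwise, and a bookmaker's two-way book should sum to between 100% and 115% (an assumed range; exchange prices can legitimately dip below 100%).

```python
from collections import Counter

def check_matches(matches, max_overround=1.15):
    """Return (index, problem) pairs for suspicious match records.
    Field names and the overround limit are assumptions, not a standard."""
    problems = []
    seen = Counter((m["date"], m["winner"], m["loser"]) for m in matches)
    for i, m in enumerate(matches):
        if m["winner"] == m["loser"]:
            problems.append((i, "same player on both sides"))
        if seen[(m["date"], m["winner"], m["loser"])] > 1:
            problems.append((i, "duplicate record"))
        won, lost = m["sets"]
        need = m.get("best_of", 3) // 2 + 1   # sets needed to win a completed match
        if won <= lost:
            problems.append((i, "winner did not win more sets"))
        elif won < need:
            problems.append((i, "incomplete score: walkover or retirement?"))
        ow, ol = m.get("odds_w"), m.get("odds_l")
        if ow is None or ol is None:
            problems.append((i, "missing odds"))
        elif ow <= 1.0 or ol <= 1.0:
            problems.append((i, "decimal odds must be above 1.00"))
        else:
            book = 1 / ow + 1 / ol           # sum of the two implied probabilities
            if not 1.0 <= book <= max_overround:
                problems.append((i, f"implausible overround {book:.3f}"))
    return problems
```

The flagged rows are the ones worth cross-checking against the official tournament sites, rather than re-checking every record by hand.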
Thanks again.
Does anybody know where I can get an electronic (Kindle/PDF/whatever) version of 'without a tout'? I found a site (research publishers) that has a download copy for sale, but their shopping cart is borked.
Thanks. Perhaps there aren't that many people betting on tennis, then? An immature market?
I'll post some results from a test I ran over three-and-a-bit years. If anybody would like to critique it or tell me where I went wrong, please do.
I got the idea from Conquering Risk, which I saw mentioned in a thread here and bought from Amazon.
Anyway, tell me if any of my assumptions for the Z-score calculations are wrong.
I put the (decimal) odds into 5-cent buckets, and every result for a game with odds in a bucket's range goes into that bucket (e.g. $1.02 and $1.04 both go into the $1.00 bucket, which creates a divide-by-zero in the first bucket, because an implied probability of 1/1.00 = 1 gives a standard deviation of zero; but I'm never going to bet on a $1.04 favourite, so meh). It reduces the numbers to digest, and having, say, 20 bets for $1.00-$1.04 instead of 2 at $1.01 and 3 at $1.02 seems more statistically valid. I figured that using each bucket's odds (the lowest odds in it) would give a conservative result. Are these assumptions bogus?
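A minimal sketch of this bucketing, assuming 5-cent buckets keyed by their lowest odds. It works in whole cents so that floating-point rounding can't push a price sitting on a bucket boundary into the wrong bucket. The function names are hypothetical.

```python
from collections import defaultdict

def bucket_floor(decimal_odds: float, width_cents: int = 5) -> float:
    """Round decimal odds down to the bottom of their bucket (e.g. 1.04 -> 1.00)."""
    cents = round(decimal_odds * 100)              # work in whole cents
    return (cents // width_cents) * width_cents / 100

def bucket_results(bets):
    """bets: iterable of (decimal_odds, won) pairs -> {bucket_floor: [won, ...]}"""
    buckets = defaultdict(list)
    for odds, won in bets:
        buckets[bucket_floor(odds)].append(won)
    return dict(buckets)

print(bucket_results([(1.02, True), (1.04, False), (2.10, True)]))
```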
Second, to get the probability used for a bet in an odds bucket, take the inverse of the bucket's odds. That is, 1/$2.00 = 50%, or a 0.5 chance of winning. This is the implied probability of the odds. I used this value to calculate the mean expected result: prob × number of games played in the bucket = mep. I think of the mep as the number of implied wins. Because it comes from the lowest odds in the bucket (1/$1.00, not 1/$1.04), this probability will always be the highest in the bucket, so it's conservative. Another bogus assumption?
The standard deviation is just the square root of (implied probability × probability of a loss (1 - implied prob.) × number of items in the bucket). All the problems/features of using implied probability carry over to the standard deviation.
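Written out, this is the binomial model, assuming the n bets in a bucket are independent and each is won with the implied probability p of the bucket odds d:

```latex
p = \frac{1}{d}, \qquad
\text{mep} = n\,p, \qquad
\sigma = \sqrt{n\,p\,(1 - p)}
```

At d = 1.00 this gives p = 1 and σ = 0, which is where the first bucket's divide-by-zero comes from.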
Finally, the Z-score is just the number of wins minus the implied wins (mep), divided by the standard deviation. It's only as accurate as the values used to calculate it.
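Putting the three steps together for one bucket, as a sketch: implied probability of the bucket odds, implied wins (mep), standard deviation, then the Z-score. It treats each bet as an independent win/lose trial and uses the bookmaker's implied probability as-is, margin included; the function name is hypothetical.

```python
import math

def bucket_z_score(bucket_odds: float, wins: int, n: int) -> float:
    """Z-score of actual wins versus the wins implied by the bucket's decimal odds."""
    p = 1.0 / bucket_odds                  # implied probability (bookmaker margin included)
    mep = n * p                            # mean expected result: the "implied wins"
    sd = math.sqrt(n * p * (1.0 - p))      # binomial standard deviation
    if sd == 0:
        # happens for the $1.00 bucket, where p = 1
        raise ValueError("zero standard deviation: Z-score undefined")
    return (wins - mep) / sd

print(bucket_z_score(2.00, wins=60, n=100))
```

For example, 60 wins from 100 bets in the $2.00 bucket gives p = 0.5, mep = 50, standard deviation 5, and Z = (60 − 50)/5 = 2.0.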
So, if that's rubbish, the results I'm posting will likewise be rubbish. Sorry about the formatting, I'm still learning how to use the forum.