Hi, this is my first post here. I wasn't a sports bettor until a few days ago, when I started live testing a model I developed for NBA spreads. I'm primarily a software developer, mainly focused on machine learning and statistical analysis. I was browsing Kaggle (a community of data scientists and machine learners) and found a rather messy NBA dataset that included moneylines and spreads for some 15k games. I cleaned the dataset up and scraped some additional info, including player/team advanced stats as well as totals lines. After working on it for a while I developed a binary classifier that performs really well in back testing. The classifier outputs a list of probabilities, e.g. [0.3, 0.7]; the prediction is the index of the largest number (1 in the example case). The output probabilities aren't very well calibrated: over the validation data (the last 2,500 games) the model's Brier score was 0.305, while Pinnacle's Brier score over the same sample (using the implied probabilities) was 0.251. Still, the higher the probability, the higher the accuracy rate, just not at 1:1. The back test results were pretty promising given the difficulty of the task: the accuracy rate over the 2,500 games I held out for validation data/back testing was ~64%. Here is the classification report (sklearn) for the validation data:
Code:
validation report:
              precision    recall  f1-score   support

           0       0.64      0.67      0.65      1281
           1       0.63      0.61      0.62      1219

   micro avg       0.64      0.64      0.64      2500
   macro avg       0.64      0.64      0.64      2500
weighted avg       0.64      0.64      0.64      2500

I also binned the output probabilities into 3 bins and evaluated the accuracy rate of each bin on the validation data. I named them bronze, silver, and gold:

Code:
total 2500 games in the back test/validation data
bronze rating is 55.87% accurate over 1235.0 sample
silver rating is 66.20% accurate over  722.0 sample
gold   rating is 78.82% accurate over  543.0 sample
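For anyone curious how numbers like these come out of the raw predictions, here's a rough sketch of the Brier score and the bin accuracies. The 0.5/0.6/0.7 bin edges are my guesses for illustration; I haven't said where the actual cutoffs were drawn.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between the predicted win probability
    and the 0/1 outcome (lower is better)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

def bin_accuracy(probs, hits, edges=(0.5, 0.6, 0.7, 1.0)):
    """Bucket the winning-side probability into bronze/silver/gold
    bins and report each bin's hit rate and sample size.
    `edges` are hypothetical cutoffs, not the ones I actually used."""
    probs = np.asarray(probs, dtype=float)
    hits = np.asarray(hits, dtype=float)  # 1 = prediction was right
    report = {}
    for name, lo, hi in zip(["bronze", "silver", "gold"], edges[:-1], edges[1:]):
        # include the top edge in the last (gold) bin
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo)
        if mask.any():
            report[name] = (float(hits[mask].mean()), int(mask.sum()))
    return report
```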
Now I'm not really sure what the optimal bet sizes for each class/bin would be. I've read up on the Kelly criterion and found a wealth of academic papers on the subject, but from what I've read that is most likely not the best route. For example, full Kelly would say to bet roughly 50% of the bankroll on a pick with 78% accuracy against the standard 52.38% implied probability (-110), but intuitively that seems like a bad idea.
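To show where that ~50% figure comes from, here's the standard Kelly formula worked out at -110 (which pays 10/11 per unit staked):

```python
def kelly_fraction(p, decimal_odds):
    """Full-Kelly stake as a fraction of bankroll: f* = (b*p - q) / b,
    where b is the net decimal odds (profit per unit staked)."""
    b = decimal_odds - 1.0
    q = 1.0 - p
    return (b * p - q) / b

# -110 American odds = decimal odds of 1 + 10/11 (~1.909)
f = kelly_fraction(0.78, 1 + 10/11)
print(f"{f:.1%}")  # ~53.8% of the bankroll on a single game
```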
I've thought about using a unit-based system where a unit is some fixed percentage of the bankroll (typical sports bettors range between 1-5%) and then betting more or less based on the accuracy rate of the output class: for example, 1 unit on bronze, 2.5 on silver, and 5 on gold. But betting upwards of 25% of the bankroll on any one match seems excessive to me, while 5% on an almost 79% chance to win seems overly cautious. Basically I'm trying to minimize risk/volatility while maximizing gain.
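One compromise I've seen discussed is fractional Kelly, using each bin's back-tested hit rate as p. This is only a sketch: the quarter-Kelly multiplier is an arbitrary choice on my part, and plugging in back-tested hit rates overstates the edge if live accuracy regresses.

```python
def fractional_kelly(p, decimal_odds, fraction=0.25):
    """Scale the full-Kelly stake by a fixed fraction (quarter-Kelly
    here) to cut volatility; never bet when the edge is negative."""
    b = decimal_odds - 1.0
    full = (b * p - (1.0 - p)) / b
    return max(0.0, full * fraction)

odds = 1 + 10/11  # standard -110 line
for name, p in [("bronze", 0.5587), ("silver", 0.6620), ("gold", 0.7882)]:
    print(f"{name}: {fractional_kelly(p, odds):.1%} of bankroll")
```

With these numbers, quarter-Kelly lands around 1.8% on bronze, 7.3% on silver, and 13.9% on gold, which sits between the "overly cautious" flat 5% and the "excessive" 25% extremes.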
Hopefully some pros can shed some light on how they would structure their bet sizes given these back testing results.