Backtesting, theoretical and empirical data
I have a question regarding backtesting and introducing a correlation factor between empirical and theoretical data. The testing is for soccer, and I'm currently compiling data to backtest the past 15 seasons (roughly before that point there was a rule change that altered the number of points awarded for a win).
Let's say that over the past 15 years in the English Premier League, teams with a theoretical probability of winning of 0.62 (based on my spreadsheet score/result predictor) actually won 59% of the time (0.59). In other words, you would need to multiply the theoretical probability by 0.952 (0.59/0.62) to convert the theoretical value into its real, empirical value. I call this a correlation factor.
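To make that concrete, here is a minimal sketch of how I imagine computing the factor from historical results. The data format (predicted win probability plus a won/didn't-win flag) and the numbers are hypothetical placeholders, not my actual spreadsheet output:

```python
# Minimal sketch: estimating the "correlation factor" from historical matches.
# Each record is assumed to be (predicted_win_prob, won), with `won` True/False.
# The data below is made up purely for illustration.

def correlation_factor(records, target_prob, tolerance=0.02):
    """Empirical win rate divided by the theoretical probability, taken over
    all predictions within `tolerance` of `target_prob`."""
    bucket = [won for prob, won in records if abs(prob - target_prob) <= tolerance]
    if not bucket:
        return None  # no matches near that predicted probability
    empirical = sum(bucket) / len(bucket)
    return empirical / target_prob

# Hypothetical example: predictions around 0.62.
history = [(0.61, True), (0.62, False), (0.63, True), (0.62, True), (0.60, False)]
print(correlation_factor(history, 0.62))
```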
Would it be realistic to apply this correlation factor to the current season? I.e. whenever my spreadsheet favours a team to win 62% of the time, should I always scale that figure by the correlation factor of 0.952? Of course, if I did this, I would also need to know how much of the difference gets redistributed to the draw result and how much to the loss result, and adjust those probabilities accordingly (see the sketch below).
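One way I could see doing the redistribution is to scale the win probability and then split the freed-up probability between draw and loss in proportion to their original values. That proportional split is purely my assumption; the backtest might show the difference goes mostly to one of the two outcomes instead.

```python
# Sketch: apply the correlation factor to the win probability, then push the
# freed-up probability mass onto draw and loss in proportion to their original
# values. The proportional split is an assumption, not a tested rule.

def adjust_probs(win, draw, loss, factor):
    new_win = win * factor
    freed = win - new_win            # probability removed from the win outcome
    remainder = draw + loss
    new_draw = draw + freed * (draw / remainder)
    new_loss = loss + freed * (loss / remainder)
    return new_win, new_draw, new_loss

print(adjust_probs(0.62, 0.22, 0.16, 0.952))
# -> roughly (0.590, 0.237, 0.173), still summing to 1.0
```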
But will this increase the accuracy of my spreadsheet predictions?
To answer that, I think you first need to answer another question: does the correlation factor stay fairly constant across the historical seasons, or does it vary greatly from season to season?
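This is the kind of check I have in mind, sketched with hypothetical season/probability/result tuples (the field names and sample data are placeholders for whatever the spreadsheet export ends up looking like):

```python
# Sketch: compute the correlation factor separately for each season, for
# predictions falling in a given probability band, to see how stable it is.
from collections import defaultdict

def factor_by_season(records, lo=0.60, hi=0.64):
    """Correlation factor per season for predictions in the [lo, hi] band."""
    by_season = defaultdict(list)
    for season, prob, won in records:
        if lo <= prob <= hi:
            by_season[season].append((prob, won))
    factors = {}
    for season, rows in by_season.items():
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(w for _, w in rows) / len(rows)
        factors[season] = win_rate / mean_pred
    return factors

sample = [("2008-09", 0.62, True), ("2008-09", 0.61, False),
          ("2009-10", 0.63, True), ("2009-10", 0.62, True)]
print(factor_by_season(sample))
```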
I'll have the answer to that once I finish compiling all the data for my 15-year backtest. However, I would love a second opinion. What does everyone think about all this, and has anyone run tests like this themselves? Does my idea sound good in principle, or is it seriously flawed in a way I've yet to discover?
Even if I can't use the data to improve the accuracy of current-season predictions, I suppose it will still be useful for backtesting the accuracy of the spreadsheet predictor over past seasons.
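For that accuracy check, one standard way to score probability forecasts (not something from my spreadsheet, just a suggestion) is the Brier score: the mean squared error between the predicted probability and what actually happened, where lower is better. A tiny sketch with made-up numbers:

```python
# Brier score for probability forecasts: mean squared error between the
# predicted probability and the 1/0 outcome. Illustrative data only.

def brier_score(pairs):
    """pairs: list of (predicted_probability, outcome) with outcome 1 or 0."""
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

raw      = [(0.62, 1), (0.62, 0), (0.62, 1)]
adjusted = [(0.59, 1), (0.59, 0), (0.59, 1)]
print(brier_score(raw), brier_score(adjusted))
```

Comparing the raw and factor-adjusted predictions on the same past seasons would show directly whether the adjustment actually helps.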
Cheers for any input into this.