In building a model, I have a factor whose levels are observed with very different frequencies.
For example, if I were modeling bowling and had 10 players with varying numbers of games played, what could I do to that dataset so each player's average score is brought somewhat into line?
If six players have 100 games, two have 80, one has 70, and the last has just five games... what can be done to the data to (I'm not sure of the word here) normalize or smooth it? The end result would be that the five-game player's value is adjusted to account for his small sample size.
...or perhaps none if this makes sense? Any help is appreciated!
Statistically minded people would probably just not bet on games involving the person with 5 games played. That might simplify what you are setting out to do.
I am working on a similar project with rating systems, modeling their long-term success. I use a z-transformation to compare different sample sizes.
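In case it helps, here is a minimal sketch of what I mean by z-transformation: standardizing each set of scores to mean 0 and standard deviation 1 so that values from differently sized samples are on a comparable scale. The function name is just illustrative.

```python
import statistics

def z_scores(values):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return [(v - mu) / sigma for v in values]

# Example: scores from two players with different numbers of games
# end up on the same standardized scale.
print(z_scores([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```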
Thanks for the responses. Sorry I was slow to get back here - been busy.
I was told by one person to do it this way...
If n = the number of games played, you take n / (n + 2) and multiply it by his/her win percentage.
You then take 1 - (n / (n + 2)) and multiply that by the population average. Finally, you add both components together to get a "weighted" or "smoothed" value. You can also, I'm told, use whatever constant you want in the denominator: n / (n + 2), n / (n + 5), etc.
Now, I'm not a mathematician, so don't shoot me here. I'm sure there is a term for this. I also question at what point you no longer need to do this because a player has a sufficient number of events. However, for some of my projects, where there are thousands of subjects with a wide range of "events played", this is supposed to pull those values together so the modeler can use the factors effectively.
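For what it's worth, the recipe above is a form of shrinkage toward the population mean, and it's easy to sketch in code. This is just an illustration of the formula as described (the function and parameter names are mine, not from any particular library); the constant k is the "+ 2" in the denominator.

```python
def smoothed_average(n, player_avg, population_avg, k=2):
    """Shrink a player's average toward the population average.

    weight = n / (n + k) approaches 1 as games accumulate, so
    well-sampled players keep their own average, while players
    with few games are pulled toward the population mean.
    """
    weight = n / (n + k)
    return weight * player_avg + (1 - weight) * population_avg

# A 5-game player with an 0.800 win rate gets pulled well toward
# a 0.500 population average; a 100-game player barely moves.
print(smoothed_average(5, 0.8, 0.5))    # about 0.714
print(smoothed_average(100, 0.8, 0.5))  # about 0.794
```

With zero games played the weight is 0, so the result is just the population average, which is the behavior you'd want for a brand-new player.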