In building a model, I have a factor whose levels are observed with very different frequencies.
For example, if I were modeling bowling and had 10 players with varying numbers of games played, what could I do to that dataset so each player's average score is brought somewhat into line?
If six players have 100 games, two have 80, one has 70, and the last has just five games... what can be done to the data to (I'm not sure of the word here) normalize or smooth it? The end result would be that the five-game player's value is adjusted to account for his small sample size.
...or perhaps none if this makes sense? Any help is appreciated!
Statistically minded people would probably just not bet on games involving the person with 5 games played. That might simplify what you are setting out to do.
I am working on a similar project with rating systems, modeling their long-term success. I use a z-transformation to compare different sample sizes.
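In case it helps, here is a minimal sketch of what I mean by z-transformation: standardizing each set of scores to mean 0 and standard deviation 1 so that values from differently sized samples are on a comparable scale. The function name is just illustrative.

```python
import statistics

def z_scores(values):
    """Standardize a list of scores to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return [(v - mu) / sigma for v in values]

# Example: scores from two players with different numbers of games
# end up on the same standardized scale.
print(z_scores([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```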
Thanks for the responses. Sorry I was slow to get back here - been busy.
I was told by one person to do it this way...
If n = the number of games played, you take n / (n + 2) and multiply it by his/her win percentage.
You then take 1 - (n / (n + 2)) and multiply that by the population average. Finally, you add both components together to get a "weighted" or "smoothed" value. You can also, I'm told, use whatever constant you want in the denominator: n / (n + 2), n / (n + 5), etc.
Now, I'm not a mathematician, so don't shoot me here. I'm sure there is a term for this. I also question at what point you no longer need to do this because a player has a sufficient number of events. However, for some of my projects, where there are thousands of subjects with a wide range of "events played", this is supposed to pull those values together so the modeler can use the factors effectively.
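For what it's worth, the recipe above is a form of shrinkage toward the population mean, and it's easy to sketch in code. This is just an illustration of the formula as described (the function and parameter names are mine, not from any particular library); the constant k is the "+ 2" in the denominator.

```python
def smoothed_average(n, player_avg, population_avg, k=2):
    """Shrink a player's average toward the population average.

    weight = n / (n + k) approaches 1 as games accumulate, so
    well-sampled players keep their own average, while players
    with few games are pulled toward the population mean.
    """
    weight = n / (n + k)
    return weight * player_avg + (1 - weight) * population_avg

# A 5-game player with an 0.800 win rate gets pulled well toward
# a 0.500 population average; a 100-game player barely moves.
print(smoothed_average(5, 0.8, 0.5))    # about 0.714
print(smoothed_average(100, 0.8, 0.5))  # about 0.794
```

With zero games played the weight is 0, so the result is just the population average, which is the behavior you'd want for a brand-new player.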