Model factor and sample size

  • TravisVOX
    SBR Rookie
    • 12-25-12
    • 30

    #1
    Model factor and sample size
    I'm building a model, and one of its factors comes from observations whose sample sizes vary widely.

    For example, if I was modeling bowling and had 10 players with varying amounts of games played, what can I do to that dataset so the average score is brought somewhat into line?

    Say six guys have 100 games, two have 80, another has 70, and one has just five. What can be done to the data to (not sure of the word here) normalize or smooth it? The end result would be that the five-game player's value is adjusted to account for its small sample size.

    ...or perhaps none of this makes sense? Any help is appreciated!
  • matthew919
    SBR Sharp
    • 11-21-12
    • 421

    #2
    Look into using a hierarchical Bayesian model for this.

    PM me if you'd like to work on a model together. I could go for a good degen project.
    Last edited by matthew919; 01-24-14, 10:03 AM.
    • Grease King
      SBR Sharp
      • 10-29-13
      • 383

      #3
      Statistically minded people would probably just not bet on games involving the person with only 5 games played. That may simplify what you are setting out to do.
      • nash13
        SBR MVP
        • 01-21-14
        • 1122

        #4
        I am working on a similar project with rating systems and modelling their long-term success. I use a z-transformation to compare different sample sizes.
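
        The post does not say which z-transformation is meant. One common reading, assumed here, is a z-score of each player's average against the population average, scaled by the standard error so that small samples count for less. The function name, the population standard deviation and all numbers below are made up for illustration:

```python
import math


def z_score(sample_mean, population_mean, population_sd, n):
    """z-score of a sample mean based on n observations.

    Assumption: the spread of a single observation (population_sd) is known,
    so the standard error of the mean is population_sd / sqrt(n).
    """
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical bowling numbers, for illustration only.
for n_games, player_avg in [(5, 210.0), (100, 158.0)]:
    z = z_score(player_avg, population_mean=150.0, population_sd=30.0, n=n_games)
    print(f"{n_games:>3} games, average {player_avg:.1f}: z = {z:.2f}")
```

        With the same raw average, a player with more games gets a larger z-score, because the standard error shrinks as n grows.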
        • TravisVOX
          SBR Rookie
          • 12-25-12
          • 30

          #5
          Thanks for the responses. Sorry I was slow to get back here - been busy.

          I was told by one person to do it this way...

          If n = number of games played, you take n / (n + 2) and multiply that by his/her win percentage.

          You then take 1 - n / (n + 2) and multiply that by the population average. Finally, you add both components together to get a "weighted" or "smoothed" value. You can also, I'm told, use whatever value you want in the divisor: n / (n + 2), n / (n + 5), etc.
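
          A minimal sketch of the weighting described above, reading "n / n + 2" as n / (n + 2). This kind of adjustment is commonly called shrinkage toward the mean. The function name, the parameter k (the constant in the divisor) and the bowling numbers are assumptions for illustration, not data from the thread:

```python
def smoothed_value(player_avg, n_games, population_avg, k=2):
    """Blend a player's own average with the population average.

    weight = n / (n + k): near 0 with few games, near 1 with many.
    k is the constant in the divisor (2 in the post; 5 is another option).
    """
    weight = n_games / (n_games + k)
    return weight * player_avg + (1 - weight) * population_avg


# Hypothetical bowling numbers, for illustration only.
population_avg = 150.0
for n_games, player_avg in [(5, 210.0), (70, 165.0), (100, 158.0)]:
    adjusted = smoothed_value(player_avg, n_games, population_avg, k=5)
    print(f"{n_games:>3} games: raw {player_avg:.1f} -> smoothed {adjusted:.1f}")
```

          The weight climbs toward 1 as n grows, so a player with many games is barely adjusted, while the five-game player is pulled strongly toward the population average. A larger k pulls every player harder.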

          Now, I'm not a mathematician, so don't shoot me here. I'm sure there is a term for this. I also wonder at what point you no longer need to do this because a player has a sufficient number of events. However, for some of my projects, where there are thousands of subjects with a wide range of "events played", this is supposed to help bring those together so the modeler can use the factors effectively.
          • marcoforte
            SBR High Roller
            • 08-10-08
            • 140

            #6
            I'm not a statistician, but I use stats for quality control in biological products. I wouldn't use a sample size of less than 60 per contestant.