1. #1
    podonne
    Join Date: 07-01-11
    Posts: 104

    Interesting article/alforithm on classifying winners

    I read an interesting article in last week's Economist about a method of classifying artworks by looking at a large set of quantifiable factors. As usual when I read things like this, I started thinking about how to apply it to sports betting, namely classifying winners.

    From the article:
    All told, the computer identified 4,027 different numerical descriptors. Once their values had been established for each of the 513 artworks that had been fed into it, it was ready to do the analysis.
    Dr Shamir’s aim was to look for quantifiable ways of distinguishing between the work of different artists. If such things could be established, it might make the task of deciding who painted what a little easier. Such decisions matter because, even excluding deliberate forgeries, there are many paintings in existence that cannot conclusively be attributed to a master rather than his pupils, or that may be honestly made copies whose provenance is now lost.
    To look for such distinguishing features, Dr Shamir programmed the computer to use a statistical method that scores the strength of the distance between the values of two or more descriptors for each pair of artists. As a result, he was able to rank each of the 4,027 descriptors by how useful it was at discriminating between artists.
    Src: http://www.economist.com/node/21524699
    Replace the word "artwork" with "teams in games" and "artists" with "winners" and you can see why this was interesting. Through a little research I found the original paper and I got very excited when I read that he only used 513 artworks total (that's only about 256 games) and got these results:

    Each classifier was tested 50 times such that in each run the images were randomly allocated for training and test sets. The automatic classification between the paintings of Van Gogh and Pollock using low-level image content descriptors was accurate in just 92% of the cases, while the accuracy of the two-way classifiers between Pollock and Monet or Pollock and Renoir was 100% in both cases [38]. The classification accuracy was also perfect when classifying Pollock and other painters such as Dali.
    Src: http://vfacstaff.ltu.edu/lshamir/publications/vangogh_pollock%20_final.pdf
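    The evaluation protocol in that quote (many random train/test splits, averaged) is easy to replicate for games. Here's a bare-bones sketch in Python; the data is invented (one toy "descriptor" per game) and the nearest-mean classifier is just a stand-in for whatever model you'd actually train:

```python
import random

def repeated_split_accuracy(values, labels, runs=50, train_frac=0.5, seed=1):
    """Average accuracy over `runs` random train/test splits, mirroring
    the 50-run protocol in the quoted paper. `values` holds one toy
    descriptor per game; real use would have thousands per game."""
    rng = random.Random(seed)
    idx = list(range(len(values)))
    accs = []
    for _ in range(runs):
        rng.shuffle(idx)
        cut = int(len(idx) * train_frac)
        train, test = idx[:cut], idx[cut:]
        # Nearest-mean classifier, fit on the training half only.
        win_vals = [values[i] for i in train if labels[i]]
        loss_vals = [values[i] for i in train if not labels[i]]
        mu_win = sum(win_vals) / len(win_vals)
        mu_loss = sum(loss_vals) / len(loss_vals)
        hits = sum(
            (abs(values[i] - mu_win) < abs(values[i] - mu_loss)) == bool(labels[i])
            for i in test
        )
        accs.append(hits / len(test))
    return sum(accs) / len(accs)

# Toy data: winners' descriptor clusters near 1.0, losers' near 0.0.
data_rng = random.Random(0)
values = ([data_rng.gauss(0.0, 0.3) for _ in range(128)]
          + [data_rng.gauss(1.0, 0.3) for _ in range(128)])
labels = [0] * 128 + [1] * 128
print(round(repeated_split_accuracy(values, labels), 3))
```

    The point of the repeated splits is just to stop one lucky partition from flattering the model; the average is a fairer accuracy estimate.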
    Pretty good results. The source code for the algorithm he used is called WND-CHARM and was originally written for classifying biological images. Available here: http://www.scfbm.org/content/3/1/13
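    For anyone curious what the core idea looks like in code, here is a minimal sketch, not the actual WND-CHARM implementation, with made-up toy data: compute many numerical descriptors per game, then rank them by a Fisher-style score of between-class vs. within-class spread, just as the quoted passage describes ranking the 4,027 descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: rows = games, columns = quantifiable descriptors
# (stand-ins for the thousands of features the paper computes).
n_games, n_features = 256, 50
X = rng.normal(size=(n_games, n_features))
y = rng.integers(0, 2, size=n_games)   # 1 = home win, 0 = home loss
X[y == 1, :5] += 1.0                   # make the first 5 features informative

def fisher_scores(X, y):
    """Score each descriptor by between-class vs. within-class variance."""
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        a, b = X[y == 1, j], X[y == 0, j]
        between = (a.mean() - b.mean()) ** 2
        within = a.var() + b.var() + 1e-12   # guard against divide-by-zero
        scores[j] = between / within
    return scores

scores = fisher_scores(X, y)
top = np.argsort(scores)[::-1][:10]    # the 10 most discriminative descriptors
print("most informative descriptors:", top)
```

    On this toy data the planted informative features come out on top; with real game data the interesting part is which descriptors float up.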

    All I ask is that if you make something of this you'll share your results!

  2. #2
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Automatic "classification"? "Training" sets and "test" sets?

    Sounds like discriminant analysis. Nothing new there, just applied to some interesting subject matter.

    http://en.wikipedia.org/wiki/Discriminant_analysis
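    For reference, the two-class version fits in a few lines. A sketch of Fisher's linear discriminant on invented data (numpy only; not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two toy classes with shifted means, e.g. "winner" vs "loser" feature vectors.
X0 = rng.normal(loc=0.0, size=(100, 3))
X1 = rng.normal(loc=1.0, size=(100, 3))

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
w = np.linalg.solve(Sw, mu1 - mu0)                        # discriminant direction
threshold = w @ (mu0 + mu1) / 2                           # midpoint decision rule

predict = lambda X: (X @ w > threshold).astype(int)
acc = np.r_[predict(X0) == 0, predict(X1) == 1].mean()
print(f"training accuracy: {acc:.2f}")
```

    "Training" and "test" sets then just mean fitting w on one subset and scoring the rule on a held-out one.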

  3. #3
    Jontheman
    Join Date: 09-09-08
    Posts: 139
    Betpoints: 4073

    Van Gogh and Pollock are very different, and anyone off the street with no knowledge, given five minutes' training on styles and distinguishing features, could achieve a 100% success rate in assigning a work to one or the other. I don't know why you're impressed that a computer could only manage 92%.

  4. #4
    podonne
    Join Date: 07-01-11
    Posts: 104

    Quote Originally Posted by Jontheman View Post
    Van Gogh and Pollock are very different, and anyone off the street with no knowledge, given five minutes' training on styles and distinguishing features, could achieve a 100% success rate in assigning a work to one or the other. I don't know why you're impressed that a computer could only manage 92%.
    One word: scalability. Also, you should read the article. The author found that Van Gogh and Pollock were more similar to each other than to artists conventionally associated with Van Gogh, like Monet or Renoir:

    Surprisingly, the values of 19 of the 20 most informative descriptors showed dramatically higher similarities between Van Gogh (left below) and Pollock (right) than between Van Gogh and painters such as Monet and Renoir, who conventional art criticism would think more closely related to Van Gogh’s oeuvre than Pollock’s is. (Dalí and Ernst, by contrast, were farther apart than expected.)
    What is interesting, according to Dr Shamir, is that no single feature makes Pollock’s artistic style similar to Van Gogh’s. Instead, the connection is based on a broad set of image-content descriptors which reflect many aspects of the two artists’ styles, including a shared preference for low-level textures and shapes, and similarities in the ways they employed lines and edges.

  5. #5
    Jontheman
    Join Date: 09-09-08
    Posts: 139
    Betpoints: 4073

    I don't think that invalidates my point. To a computer they may have similarities, but they are easy to tell apart for any human, even one with an untrained eye. In other words, it confirms that computers are a LONG way behind in this respect.

    Why would you want to scale something that is currently inferior to every human at making sense of information?

  6. #6
    podonne
    Join Date: 07-01-11
    Posts: 104

    Quote Originally Posted by Jontheman View Post
    I don't think that invalidates my point. To a computer they may have similarities, but they are easy to tell apart for any human, even one with an untrained eye. In other words, it confirms that computers are a LONG way behind in this respect.

    Why would you want to scale something that is currently inferior to every human at making sense of information?
    Well, we're talking about an application to sports betting, and I think I'm safe in saying that it's not easy for a human to distinguish between a team that will win and a team that will lose. It's not too hard to build a computer program that is better than 50% of humans at picking winners (and that's generous).

    Second, there are a finite number of games a human can consider, even a human who can distinguish between winners and losers. If I required you to calculate 8,000+ numbers (4,000 for each team) for every matchup, you would be hard-pressed to handicap a day's worth of NCAA basketball. A computer can do it in a matter of minutes.

    Third, the whole idea is to find things that distinguish winners from losers that are NOT easily distinguishable to the average person. Anything that's easy to see will already be included in the line. It's only the hidden things (like the subtle yet significant combinations of thousands of factors that make Van Gogh more similar to Pollock than to Monet) that will make you money.

  7. #7
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    alforithm?

  8. #8
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    simple models make better predictions than complex ones

    simple models with human input make even better predictions

  9. #9
    chunk
    Join Date: 02-08-11
    Posts: 805
    Betpoints: 19168

    Smart bird there.

  10. #10
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    Quote Originally Posted by podonne View Post

    All I ask is that if you make something of this you'll share your results!
    Right, count on it.

  11. #11
    vyomguy
    Join Date: 12-08-09
    Posts: 5,794
    Betpoints: 234

    Quote Originally Posted by Peregrine Stoop View Post
    simple models make better predictions than complex ones; simple models with human input make even better predictions
    This. You need human input to the models to have success.

  12. #12
    FuzzyMathGuru
    Join Date: 07-27-11
    Posts: 3

    These models never work. They are always backward-facing, meaning they make sense of what happened in the past to predict the future, but they are NEVER graded on their predictions. They never apply the model to held-out data for verification. When you use 1990-2000 data to predict what will happen in 2001 and then compare against the actual results, you'll see no advantage. Pass.
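    For what it's worth, grading a model that way is straightforward to set up: fit only on past seasons, predict the next one, and score against what actually happened. A minimal walk-forward splitter (hypothetical season numbers, Python):

```python
def walk_forward_splits(seasons, train_window):
    """Yield (train_seasons, test_season) pairs so the model is always
    graded on a season it has never seen -- e.g. fit on 1990-1999,
    then score the 2000 predictions against the actual results."""
    for i in range(train_window, len(seasons)):
        yield seasons[i - train_window:i], seasons[i]

splits = list(walk_forward_splits(list(range(1990, 2005)), train_window=10))
print(splits[0])   # first fit window and its held-out season
```

    Any model whose edge survives every held-out season in a loop like this has at least been graded, which is the complaint here.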

  13. #13
    vyomguy
    Join Date: 12-08-09
    Posts: 5,794
    Betpoints: 234

    The biggest problem is extracting the features for the training data.

  14. #14
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    Quote Originally Posted by FuzzyMathGuru View Post
    These models never work. They are always backward-facing, meaning they make sense of what happened in the past to predict the future, but they are NEVER graded on their predictions. They never apply the model to held-out data for verification. When you use 1990-2000 data to predict what will happen in 2001 and then compare against the actual results, you'll see no advantage. Pass.
    do you know every model being used?
