1. #1
    podonne
    Join Date: 07-01-11
    Posts: 104

    Interesting article/alforithm on classifying winners

    I read an interesting article in last week's Economist about a method of classifying artworks by looking at a large set of quantifiable factors. As usual when I read things like this, I started thinking about how to apply it to sports betting, namely classifying winners.

    From the article:
    All told, the computer identified 4,027 different numerical descriptors. Once their values had been established for each of the 513 artworks that had been fed into it, it was ready to do the analysis.
    Dr Shamir’s aim was to look for quantifiable ways of distinguishing between the work of different artists. If such things could be established, it might make the task of deciding who painted what a little easier. Such decisions matter because, even excluding deliberate forgeries, there are many paintings in existence that cannot conclusively be attributed to a master rather than his pupils, or that may be honestly made copies whose provenance is now lost.
    To look for such distinguishing features, Dr Shamir programmed the computer to use a statistical method that scores the strength of the distance between the values of two or more descriptors for each pair of artists. As a result, he was able to rank each of the 4,027 descriptors by how useful it was at discriminating between artists.
    Src: http://www.economist.com/node/21524699
    Replace the word "artwork" with "teams in games" and "artists" with "winners" and you can see why this was interesting. Through a little research I found the original paper and I got very excited when I read that he only used 513 artworks total (that's only about 256 games) and got these results:

    Each classifier was tested 50 times such that in each run the images were randomly allocated for training and test sets. The automatic classification between the paintings of Van Gogh and Pollock using low-level image content descriptors was accurate in just 92% of the cases, while the accuracy of the two-way classifiers between Pollock and Monet or Pollock and Renoir was 100% in both cases [38]. The classification accuracy was also perfect when classifying Pollock and other painters such as Dali.
    Src: http://vfacstaff.ltu.edu/lshamir/publications/vangogh_pollock%20_final.pdf
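    The evaluation protocol in that quote (many random train/test splits, averaged) is easy to replicate for games. Here's a bare-bones sketch in Python; the data is invented (one toy "descriptor" per game) and the nearest-mean classifier is just a stand-in for whatever model you'd actually train:

```python
import random

def repeated_split_accuracy(values, labels, runs=50, train_frac=0.5, seed=1):
    """Average accuracy over `runs` random train/test splits, mirroring
    the 50-run protocol in the quoted paper. `values` holds one toy
    descriptor per game; real use would have thousands per game."""
    rng = random.Random(seed)
    idx = list(range(len(values)))
    accs = []
    for _ in range(runs):
        rng.shuffle(idx)
        cut = int(len(idx) * train_frac)
        train, test = idx[:cut], idx[cut:]
        # Nearest-mean classifier, fit on the training half only.
        win_vals = [values[i] for i in train if labels[i]]
        loss_vals = [values[i] for i in train if not labels[i]]
        mu_win = sum(win_vals) / len(win_vals)
        mu_loss = sum(loss_vals) / len(loss_vals)
        hits = sum(
            (abs(values[i] - mu_win) < abs(values[i] - mu_loss)) == bool(labels[i])
            for i in test
        )
        accs.append(hits / len(test))
    return sum(accs) / len(accs)

# Toy data: winners' descriptor clusters near 1.0, losers' near 0.0.
data_rng = random.Random(0)
values = ([data_rng.gauss(0.0, 0.3) for _ in range(128)]
          + [data_rng.gauss(1.0, 0.3) for _ in range(128)])
labels = [0] * 128 + [1] * 128
print(round(repeated_split_accuracy(values, labels), 3))
```

    The point of the repeated splits is just to stop one lucky partition from flattering the model; the average is a fairer accuracy estimate.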
    Pretty good results. The source code for the algorithm he used is called WND-CHARM and was originally written for classifying biological images. Available here: http://www.scfbm.org/content/3/1/13
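    For anyone curious what the core idea looks like in code, here is a minimal sketch, not the actual WND-CHARM implementation, with made-up toy data: compute many numerical descriptors per game, then rank them by a Fisher-style score of between-class vs. within-class spread, just as the quoted passage describes ranking the 4,027 descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: rows = games, columns = quantifiable descriptors
# (stand-ins for the thousands of features the paper computes).
n_games, n_features = 256, 50
X = rng.normal(size=(n_games, n_features))
y = rng.integers(0, 2, size=n_games)   # 1 = home win, 0 = home loss
X[y == 1, :5] += 1.0                   # make the first 5 features informative

def fisher_scores(X, y):
    """Score each descriptor by between-class vs. within-class variance."""
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        a, b = X[y == 1, j], X[y == 0, j]
        between = (a.mean() - b.mean()) ** 2
        within = a.var() + b.var() + 1e-12   # guard against divide-by-zero
        scores[j] = between / within
    return scores

scores = fisher_scores(X, y)
top = np.argsort(scores)[::-1][:10]    # the 10 most discriminative descriptors
print("most informative descriptors:", top)
```

    On this toy data the planted informative features come out on top; with real game data the interesting part is which descriptors float up.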

    All I ask is that if you make something of this you'll share your results!

  2. #2
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    Automatic "classification"? "Training" sets and "test" sets?

    Sounds like discriminant analysis. Nothing new there, just applied to some interesting subject matter.

    http://en.wikipedia.org/wiki/Discriminant_analysis
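    For reference, the two-class version fits in a few lines. A sketch of Fisher's linear discriminant on invented data (numpy only; not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two toy classes with shifted means, e.g. "winner" vs "loser" feature vectors.
X0 = rng.normal(loc=0.0, size=(100, 3))
X1 = rng.normal(loc=1.0, size=(100, 3))

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
w = np.linalg.solve(Sw, mu1 - mu0)                        # discriminant direction
threshold = w @ (mu0 + mu1) / 2                           # midpoint decision rule

predict = lambda X: (X @ w > threshold).astype(int)
acc = np.r_[predict(X0) == 0, predict(X1) == 1].mean()
print(f"training accuracy: {acc:.2f}")
```

    "Training" and "test" sets then just mean fitting w on one subset and scoring the rule on a held-out one.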

  3. #3
    Jontheman
    Join Date: 09-09-08
    Posts: 139
    Betpoints: 4073

    Van Gogh and Pollock are very different, and anyone off the street with no knowledge, given five minutes' training on styles and distinguishing features, could achieve a 100% success rate in assigning a work to one or the other. I don't know why you're impressed that a computer could only manage 92%.

  4. #4
    podonne
    Join Date: 07-01-11
    Posts: 104

    Quote Originally Posted by Jontheman View Post
    Van Gogh and Pollock are very different, and anyone off the street with no knowledge, given five minutes' training on styles and distinguishing features, could achieve a 100% success rate in assigning a work to one or the other. I don't know why you're impressed that a computer could only manage 92%.
    One word: scalability. Also, you should read the article. The author found that Van Gogh and Pollock were more similar to each other than to artists conventionally associated with Van Gogh, like Monet or Renoir:

    Surprisingly, the values of 19 of the 20 most informative descriptors showed dramatically higher similarities between Van Gogh (left below) and Pollock (right) than between Van Gogh and painters such as Monet and Renoir, who conventional art criticism would think more closely related to Van Gogh’s oeuvre than Pollock’s is. (Dalí and Ernst, by contrast, were farther apart than expected.)
    What is interesting, according to Dr Shamir, is that no single feature makes Pollock’s artistic style similar to Van Gogh’s. Instead, the connection is based on a broad set of image-content descriptors which reflect many aspects of the two artists’ styles, including a shared preference for low-level textures and shapes, and similarities in the ways they employed lines and edges.

  5. #5
    Jontheman
    Join Date: 09-09-08
    Posts: 139
    Betpoints: 4073

    I don't think that invalidates my point. To a computer they may have similarities, but they are easy to tell apart for any human, even one with an untrained eye. In other words, it confirms that computers are a LONG way behind in this respect.

    Why would you want to scale something that is currently inferior to every human at making sense of information?

  6. #6
    podonne
    Join Date: 07-01-11
    Posts: 104

    Quote Originally Posted by Jontheman View Post
    I don't think that invalidates my point. To a computer they may have similarities, but they are easy to tell apart for any human, even one with an untrained eye. In other words, it confirms that computers are a LONG way behind in this respect.

    Why would you want to scale something that is currently inferior to every human at making sense of information?
    Well, we're talking about an application to sports betting, and I think I'm safe in saying that it's not easy for a human to distinguish between a team that will win and a team that will lose. It's not too hard to build a computer program that is better than 50% of humans at picking winners (and that's generous).

    Second, there are a finite number of games a human can consider, even a human who can distinguish between winners and losers. If I required you to calculate 8,000+ numbers (4,000 for each team) for every matchup, you would be hard-pressed to handicap a day's worth of NCAA basketball. A computer can do it in a matter of minutes.

    Third, the whole idea is to find things that distinguish winners from losers that are NOT easily distinguishable to the average person. Anything that's easy to see will already be included in the line. It's only the hidden things (like the subtle yet significant combinations of thousands of factors that make Van Gogh more similar to Pollock than to Monet) that will make you money.

  7. #7
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    alforithm?

  8. #8
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    simple models make better predictions than complex ones

    simple models with human input make even better predictions

  9. #9
    chunk
    Join Date: 02-08-11
    Posts: 805
    Betpoints: 19168

    Smart bird there.

  10. #10
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    Quote Originally Posted by podonne View Post

    All I ask is that if you make something of this you'll share your results!
    Right, count on it.

  11. #11
    vyomguy
    Join Date: 12-08-09
    Posts: 5,794
    Betpoints: 234

    Quote Originally Posted by Peregrine Stoop View Post
    simple models make better predictions than complex ones; simple models with human input make even better predictions
    This. You need human input to the models to have success.

  12. #12
    FuzzyMathGuru
    Join Date: 07-27-11
    Posts: 3

    These models never work. They are always backward-facing, meaning they make sense of what happened in the past to predict the future, but they are NEVER graded on their predictions. They never apply the model to held-out data for verification. When you use 1990-2000 data to predict what will happen in 2001 and then compare against the actual results, you'll see no advantage. Pass.
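    For what it's worth, grading a model that way is straightforward to set up: fit only on past seasons, predict the next one, and score against what actually happened. A minimal walk-forward splitter (hypothetical season numbers, Python):

```python
def walk_forward_splits(seasons, train_window):
    """Yield (train_seasons, test_season) pairs so the model is always
    graded on a season it has never seen -- e.g. fit on 1990-1999,
    then score the 2000 predictions against the actual results."""
    for i in range(train_window, len(seasons)):
        yield seasons[i - train_window:i], seasons[i]

splits = list(walk_forward_splits(list(range(1990, 2005)), train_window=10))
print(splits[0])   # first fit window and its held-out season
```

    Any model whose edge survives every held-out season in a loop like this has at least been graded, which is the complaint here.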

  13. #13
    vyomguy
    Join Date: 12-08-09
    Posts: 5,794
    Betpoints: 234

    The biggest problem is extracting the features for the training data.

  14. #14
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    Quote Originally Posted by FuzzyMathGuru View Post
    These models never work. They are always backward-facing, meaning they make sense of what happened in the past to predict the future, but they are NEVER graded on their predictions. They never apply the model to held-out data for verification. When you use 1990-2000 data to predict what will happen in 2001 and then compare against the actual results, you'll see no advantage. Pass.
    do you know every model being used?
