Oleg Zabluda's blog
Thursday, November 29, 2012
 
For the 2012 presidential election, among the ~25 polling firms with 5+ polls in the last 21 days, there were plenty of polling firms that beat a simple average of all polls in accuracy.

Among the 5 top poll aggregators, Nate Silver was the most accurate [1,2] with RMSE=1.93. But he didn't beat a simple average of those 5 (RMSE=1.67).
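As a rough illustration of that comparison (the numbers below are made up for the example, not the actual 2012 forecasts), the "simple average of the aggregators" just means averaging their projected margins state by state and computing the RMSE against the actual results:

import math

# Hypothetical actual state margins (in points) and hypothetical final projections
# from three aggregators -- illustrative stand-ins, not the real 2012 numbers.
actual = {"OH": 3.0, "FL": 0.9, "VA": 3.9, "CO": 5.4}
aggregators = {
    "agg_1": {"OH": 3.6, "FL": -0.2, "VA": 2.5, "CO": 3.0},
    "agg_2": {"OH": 2.0, "FL": 1.8, "VA": 4.6, "CO": 6.3},
    "agg_3": {"OH": 4.5, "FL": 0.2, "VA": 3.1, "CO": 4.8},
}

def rmse(pred):
    return math.sqrt(sum((pred[s] - actual[s]) ** 2 for s in actual) / len(actual))

for name, pred in aggregators.items():
    print(name, round(rmse(pred), 2))

# The "simple average" aggregator: unweighted mean of the individual projections.
average = {s: sum(p[s] for p in aggregators.values()) / len(aggregators) for s in actual}
print("simple average", round(rmse(average), 2))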

Among polling firms with 5+ polls in the last 21 days, the most accurate was IBD/TIPP, with 11 polls and Avg. Err = 0.9%.

Average polling error by methodology:
wholly or partially online - 2.1%
live telephone interviewers - 3.5%
robopolls by automated script - 5.0%
phone calls w/o cellphones - 4.7%
phone calls with cellphones - 3.5%

http://marginoferror.org/2012/11/08/aggregating-the-aggregates/

http://fivethirtyeight.blogs.nytimes.com/2012/11/10/which-polls-fared-best-and-worst-in-the-2012-presidential-race/

[1] A very cool post, which shows all pollsters with their error margins, plus Brier scores of all aggregators (as well as "coin toss" and "2008 repeat" baselines); by that measure Nate Silver was not the best but #3, apparently suffering from incorrectly stated confidence:
http://appliedrationality.org/2012/11/09/was-nate-silver-the-most-accurate-2012-election-pundit/
http://appliedrationality.org/2012/11/09/was-nate-silver-the-most-accurate-2012-election-pundit/#comment-93
http://www.gwern.net/2012%20election%20predictions
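For reference, the Brier score used in [1] is just the mean squared error of stated win probabilities against the 0/1 outcomes, so a forecaster who overstates his confidence gets punished hard on the states he misses even if his calls are mostly right. A minimal sketch with made-up probabilities:

# Brier score: mean squared error of forecast probabilities vs. binary outcomes (lower is better).
def brier(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

# Made-up win probabilities for four states; outcome 1 means that candidate actually won.
outcomes      = [1, 1, 0, 0]
calibrated    = [0.80, 0.65, 0.30, 0.60]   # misses the last state, but hedged
overconfident = [0.99, 0.97, 0.02, 0.95]   # same calls, overstated confidence

print(round(brier(calibrated, outcomes), 3))     # 0.153
print(round(brier(overconfident, outcomes), 3))  # 0.226 -- the confident miss dominates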

[2] For Senate races, he was pretty much dead last out of five. Simple average was average:
http://marginoferror.org/2012/11/12/its-good-to-be-average/
http://marginoferror.org/2012/11/08/aggregating-the-aggregates/



 
Discussion (mostly in the comments) of collaborative filtering history, going back through YouTube (2010), Amazon (1998), Firefly (1996), academic work by Resnick et al. (1994), and Xerox PARC Tapestry (1992):

http://glinden.blogspot.com/2011/02/youtube-uses-amazons-recommendation.html

Amazon invented a particularly efficient and effective algorithm for recommender systems, which has endured almost unchanged since 1998, through all the work on the Netflix Prize, etc. YouTube's 2010 paper is a minor variation on Amazon's 1998 paper.

http://glinden.blogspot.com/2011/02/youtube-uses-amazons-recommendation.html?showComment=1296883888409#c1403641366477195631 """
There are 2 implementations of collaborative filtering, both based on the same user-item matrix. Let users be the rows, and items be the columns.

User-user collaborative filtering finds rows which are similar, yielding similar users - then if user B is similar to user A, we can recommend items consumed by user A to user B. This was the original method in Resnick et al. and was used by Net Perceptions.

Item-item collaborative filtering finds columns which are similar, yielding similar items - then if item A and item B are similar, we show item B to users who consumed item A. This is what Amazon uses, as do most commercial applications today.

Item-item collaborative filtering has been shown to be more effective, because the sparsity of the data makes conclusions about user-user similarity rather suspect.

Statistically, item-item collaborative filtering is an implementation of a clustering algorithm, with the formula used to determine the similarity of two different item column vectors acting as the similarity measure between items that clustering needs [an example is the cosine of the two vectors]. The advantage of using collaborative filtering for clustering, rather than traditional methods like k-means or even SVD, is that the former does not require iterating over the set of columns; it is [invariably] a one-pass algorithm, at the expense of reduced yet acceptable optimality.

In general, though, all collaborative filtering methods suffer from the typical problems listed in many academic papers, like the cold start problem, popular item dominance, etc. They really work best in massive-scale systems [both # of users and # of items]. This is why it did not work that well at Netflix, which has a small number of items [in the 100Ks rather than the 100s of millions at Amazon and YouTube], so they needed better methods, like the ensemble approach that won the Netflix Prize.
"""
http://glinden.blogspot.com/2011/02/youtube-uses-amazons-recommendation.html
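A minimal sketch of the item-item scheme described in the quoted comment -- cosine similarity between item columns of a toy user-item matrix, then recommending the nearest neighbors of what a user has already consumed (the matrix and names are invented for illustration):

import math
from collections import defaultdict

# Toy user-item matrix: users are rows, items are columns; 1 = consumed, 0 = not.
matrix = {
    "alice": {"A": 1, "B": 1, "C": 0, "D": 0},
    "bob":   {"A": 1, "B": 1, "C": 1, "D": 0},
    "carol": {"A": 0, "B": 1, "C": 1, "D": 1},
    "dave":  {"A": 0, "B": 0, "C": 1, "D": 1},
}
items = ["A", "B", "C", "D"]

def column(item):
    # The item's column vector across all users.
    return [matrix[u][item] for u in matrix]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

# One pass over item pairs builds the item-item similarity table.
sim = {i: {j: cosine(column(i), column(j)) for j in items if j != i} for i in items}

def recommend(user, k=2):
    consumed = {i for i in items if matrix[user][i]}
    scores = defaultdict(float)
    for i in consumed:
        for j, s in sim[i].items():
            if j not in consumed:
                scores[j] += s   # accumulate similarity to items the user already consumed
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice"))   # ['C', 'D'] -- C is closest to the A and B that alice consumed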



 
Search engines have massive logs of people asking for directions from A to B, with precise locations. This often means that a person is interested in B, especially if they happen to be at or near A.

It appears this data may be as useful as, or more useful than, user reviews of businesses, and maybe GPS trails, for local search ranking. At least 20% of web queries have local intent, and on mobile the share may be twice as high.

These findings are important because driving-direction logs are orders of magnitude more plentiful than user reviews, and cheaper to collect. Further, the logs provide near real-time evidence of changing sentiment and are available for broader types of locations.
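To make the idea concrete, here is a toy sketch of turning such logs into a ranking signal (the log records, field layout, and log1p weighting are all invented for the example): count how often each business shows up as a destination and blend that into whatever base relevance score the local ranker already has.

from collections import Counter
import math

# Invented direction-query log: each record is (origin, destination business id).
direction_logs = [
    ("home_1",   "cafe_1"),
    ("office_3", "cafe_1"),
    ("hotel_2",  "cafe_2"),
    ("home_1",   "museum_1"),
    ("office_3", "cafe_1"),
]

# Popularity signal: how often each place is requested as a destination.
destination_counts = Counter(dest for _, dest in direction_logs)

def local_rank(candidates, base_score):
    # Blend a (hypothetical) base relevance score with the log-derived signal.
    return sorted(candidates,
                  key=lambda b: base_score.get(b, 0.0) + math.log1p(destination_counts[b]),
                  reverse=True)

print(local_rank(["cafe_1", "cafe_2", "museum_1"],
                 {"cafe_1": 0.2, "cafe_2": 0.5, "museum_1": 0.1}))
# ['cafe_1', 'cafe_2', 'museum_1'] -- cafe_1 wins on destination popularity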

http://glinden.blogspot.com/2011/05/value-of-google-maps-directions-logs.html



 
Google's paper at the ICML 2011 conference, "Suggesting (More) Friends Using the Implicit Social Graph" (PDF), not only describes the technology behind Gmail's fun "Don't forget Bob!" and "Got the right Bob?" features, but may also be part of the friend suggestions in Google+ Circles.
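The flavor of the algorithm, as I understand it (the group data and scoring below are simplified stand-ins, not the paper's actual interaction-rank formula): given the recipients already entered in a compose window, look at past email groups that contain them and suggest the contacts that co-occur with them most.

from collections import Counter

# Invented history of past recipient groups for one user (the "implicit social graph").
past_groups = [
    {"alice", "bob", "carol"},
    {"alice", "bob", "dave"},
    {"alice", "carol"},
    {"bob", "dave"},
]

def dont_forget(seed, groups, k=2):
    # "Don't forget Bob!": score contacts by weighted co-occurrence with the seed recipients.
    scores = Counter()
    for g in groups:
        overlap = len(seed & g)
        if overlap:
            for contact in g - seed:
                scores[contact] += overlap   # groups that overlap the seed more count more
    return [c for c, _ in scores.most_common(k)]

print(dont_forget({"alice", "carol"}, past_groups))   # ['bob', 'dave']

Presumably "Got the right Bob?" runs the same kind of check in reverse, flagging an entered recipient whose co-occurrence with the rest of the list is unusually low compared to a similarly named contact.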

http://glinden.blogspot.com/2011/07/google-and-suggesting-friends.html



