Oleg Zabluda's blog
Thursday, November 29, 2012
 
Discussion (mostly in the comments) of collaborative filtering history, going back to YouTube (2010), Amazon (1998), Firefly (1996), Resnick et al. in academia (1994), and Xerox PARC's Tapestry (1992)

http://glinden.blogspot.com/2011/02/youtube-uses-amazons-recommendation.html

Amazon invented a particularly efficient and effective algorithm for recommender systems, which has endured almost unchanged since 1998, through all the work on the Netflix Prize and beyond. YouTube's 2010 paper is a minor variation of Amazon's 1998 paper.

http://glinden.blogspot.com/2011/02/youtube-uses-amazons-recommendation.html?showComment=1296883888409#c1403641366477195631 """
There are 2 implementations of collaborative filtering, both based on the same user-item matrix. Let users be the rows, and items be the columns.

User-user collaborative filtering finds rows which are similar, yielding similar users - then if user B is similar to user A, we can recommend items consumed by user A to user B. This was the original method in Resnick et al. and was used by Net Perceptions.

Item-item collaborative filtering finds columns which are similar, yielding similar items - then if item A and item B are similar, we show item B to users who consumed item A. This is what Amazon uses, as do most commercial applications today.

Item-item collaborative filtering has been shown to be more effective, because sparsity of data makes conclusions about user-user similarity rather suspect.

Statistically, item-item collaborative filtering is an implementation of a clustering algorithm, with the formula used to compare two item column vectors acting as the similarity measure between items that clustering needs [an example is the cosine of the two vectors]. The advantage of using collaborative filtering for clustering rather than traditional methods like k-means or even SVD is that the former does not require iterating over the set of columns; it is [invariably] a one-pass algorithm, at the expense of reduced yet acceptable optimality.

In general though, all collaborative filtering methods suffer from the typical problems listed in many academic papers, like the cold start problem, popular item dominance, etc. They really work best in massive-scale systems [both # of users and # of items]. This is why it did not work that well at Netflix, which has a small number of items [in the 100Ks rather than the 100s of millions at Amazon and YouTube], so they needed better methods like the ensemble approach that won the Netflix Prize.
"""
http://glinden.blogspot.com/2011/02/youtube-uses-amazons-recommendation.html
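
To make the row/column distinction in the quoted comment concrete, here is a minimal sketch in Python. It is not Amazon's or YouTube's actual algorithm. The toy matrix, the item names, and the helper functions (`cosine`, `item_similarities`, `recommend_item_item`, `recommend_user_user`) are all made up for illustration, and cosine is used only because the comment mentions it as an example similarity measure. Note that the item-item similarities are computed directly from the columns, with no iterative refinement step.

```python
from math import sqrt

# Hypothetical toy data: rows are users, columns are items.
# A 1 means the user consumed the item.
ITEMS = ["book", "film", "song", "game"]
MATRIX = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
    [1, 0, 0, 0],  # user 3
]


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def item_similarities(matrix):
    """Item-item: compare COLUMNS of the user-item matrix pairwise."""
    n_items = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_items)]
    return [[cosine(cols[i], cols[j]) for j in range(n_items)] for i in range(n_items)]


def recommend_item_item(matrix, user, top_n=2):
    """Score each unconsumed item by its summed similarity to the user's consumed items."""
    sims = item_similarities(matrix)
    consumed = [j for j, v in enumerate(matrix[user]) if v]
    scores = {
        j: sum(sims[i][j] for i in consumed)
        for j, v in enumerate(matrix[user])
        if not v
    }
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return [j for j in ranked if scores[j] > 0][:top_n]


def recommend_user_user(matrix, user):
    """User-user: compare ROWS, then suggest what the single nearest neighbor consumed."""
    sims = [(cosine(matrix[user], row), u) for u, row in enumerate(matrix) if u != user]
    best_sim, neighbor = max(sims)
    if best_sim == 0:
        return []
    return [j for j, v in enumerate(matrix[neighbor]) if v and not matrix[user][j]]


if __name__ == "__main__":
    print([ITEMS[j] for j in recommend_item_item(MATRIX, 3)])
    print([ITEMS[j] for j in recommend_user_user(MATRIX, 3)])
```

With only four users, the user-user neighbor for user 3 is chosen on the strength of a single shared item, which is a small-scale picture of why the comment calls user-user similarity suspect when data is sparse.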
