Monday, March 26, 2007

Collaborative Filtering: An introduction

Recommendation algorithms are hot right now. NetFlix, Amazon, and StumbleUpon all have examples of working recommendation algorithms that are not only interesting, but useful. Collaborative filtering is the method of prediction that has been widely adopted across these sites.

The underlying assumption with collaborative filtering is that your preferences in the past will help predict your preferences in the future. Assuming that this is true, CF makes itself unique by finding users with similar preferences as you. Once this neighborhood of users is found, it can be reasonably assumed any items they like, you will also like, and hence, an intelligent recommendation can be made.

Using the NetFlix analogy, if you and another user named "Jack", who is completely unknown to you, have rated the same movies in the same way, it can be assumed that you have similar tastes. Therefore, if Jack rates a new movie very highly, the likelihood that you will also like that movie is very high, and it can be recommended.

There are issues with this algorithm, of course. First, you have to build up a reasonable list of preferences for the algorithm to make successful matches of other users. If you only rate 1 item ever, you can't expect great recommendations to come your way. Second, if your tastes change, the algorithm will be able to pick it up, but only with time. There must be time for the new change to affect the weight of your past selections. Last, this algorithm doesn't scale very well. For massive sites like NetFlix and Amazon, I'm sure there are some sort of caching or iterative methods involved so that the recommendations aren't recomputed everytime you login to the site. With millions of users and millions of products, it would simply take too long to load any sort of user-specific recommendation.

There is another method way to implement CF outside of the user-centric CF model described above. You can do an item-centric recommendation too. Rather than find a neighborhood of users with similar tastes, you can look at the specific items you've rated well and find a neighborhood of items with that share similar traits. If you're in a world where the number of items doesn't grow as fast as the number of users, this algorithm would scale much better.

No comments: