Saturday, May 31, 2008

Netflix Prize

One of my summer projects will be working on the Netflix Prize, a competition to write a program that predicts user ratings of movies. We are provided with a huge dataset of actual user ratings from the Netflix database, along with a test set of <user, movie> tuples for which we need to predict ratings. Netflix already has the actual ratings for the test set, which is how they score the predictions: after a submission, Netflix returns the root mean squared error (RMSE) for a subset of the test set. The three submissions I have made so far have gotten the following RMSE:


Netflix's own algorithm (Cinematch) gets an RMSE of 0.9525. To win the competition and the one-million-dollar prize, a team must have a submission with an RMSE below 0.8572. The best team currently has a score of 0.8643. The three submissions I have made so far just use basic statistics for the predictions. I have three main ideas for how to approach the problem: two of them involve clustering algorithms and one uses temporal neural networks.
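
To make the scoring concrete, here is a minimal Python sketch of how an RMSE can be computed, together with one possible "basic statistics" predictor that guesses each movie's average rating. The function names (rmse, movie_mean_predictor) and the toy ratings are made up for illustration. This is not Netflix's scoring code, and it is only an assumption about what a simple statistical baseline might look like, not the code behind the submissions above.

```python
import math
from collections import defaultdict


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def movie_mean_predictor(ratings):
    """Build a predictor from (user, movie, rating) triples.

    Hypothetical baseline: predict the movie's mean training rating,
    falling back to the global mean for movies never seen in training.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    global_mean = sum(totals.values()) / sum(counts.values())

    def predict(user, movie):
        # The user is ignored by this baseline; only the movie matters.
        if counts.get(movie):
            return totals[movie] / counts[movie]
        return global_mean

    return predict


# Toy data, invented for this example (not taken from the Netflix dataset).
train = [("u1", "m1", 4), ("u2", "m1", 2), ("u1", "m2", 5)]
test_pairs = [("u3", "m1"), ("u2", "m2"), ("u3", "m3")]
held_out = [3, 4, 4]  # the "actual" ratings that only the scorer knows

predict = movie_mean_predictor(train)
predictions = [predict(user, movie) for user, movie in test_pairs]
score = rmse(predictions, held_out)
print(f"RMSE on toy data: {score:.4f}")
```

Because the errors are squared before averaging, a few badly wrong predictions hurt the score more than many slightly wrong ones, which is part of why small RMSE improvements are hard to come by.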
