Byron's Blog: Netflix Prize

Saturday, May 31, 2008

Netflix Prize

One of my summer projects will be working on the Netflix Prize. It is a competition to write a program that predicts user ratings of movies. We are provided with a huge dataset of actual user ratings from the Netflix database. We are also provided with a test set of <user, movie> pairs for which we need to predict ratings. After we submit the predictions, Netflix returns the root mean squared error (RMSE) for a subset of the test set. Netflix already has the actual ratings for the test set, which is how they score the predictions. The three submissions I have made so far received the following RMSE scores:

1.0533
0.9992
0.9844
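
For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings, so lower is better. The following is a minimal Python sketch of that calculation. The function name `rmse` and the sample numbers are made up for illustration and are not taken from Netflix's scoring code or from the actual dataset.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

# Hypothetical predictions and true ratings for four <user, movie> pairs.
predictions = [3.5, 4.0, 2.0, 5.0]
ratings = [4, 4, 1, 3]
score = rmse(predictions, ratings)
print(round(score, 3))  # 1.146
```

Because the errors are squared before averaging, one prediction that is off by 2 stars costs as much as sixteen predictions that are each off by half a star.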

Netflix's own algorithm (Cinematch) gets an RMSE of 0.9525. In order to win the competition and the one-million-dollar prize, a team must have a submission with an RMSE below 0.8572. The best team currently has a score of 0.8643. The three submissions I have made so far just use basic statistics for the predictions. I have three main ideas on how to approach the problem: two of them involve clustering algorithms and one of them uses temporal neural networks.
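
The post does not say which statistics those first submissions used. As one plausible example of a "basic statistics" predictor, and purely as an assumption, the sketch below predicts a rating as the movie's average rating plus the user's average deviation from the movie averages. The names `fit_baseline` and `predict` and the toy data are hypothetical.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Fit per-movie means and per-user offsets from (user, movie, rating) triples."""
    if not ratings:
        raise ValueError("need at least one rating")
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # Average rating for each movie.
    movie_totals = defaultdict(lambda: [0.0, 0])  # movie -> [sum, count]
    for _, movie, r in ratings:
        movie_totals[movie][0] += r
        movie_totals[movie][1] += 1
    movie_mean = {m: total / n for m, (total, n) in movie_totals.items()}

    # How far, on average, each user rates above or below the movie means.
    user_totals = defaultdict(lambda: [0.0, 0])  # user -> [sum of deviations, count]
    for user, movie, r in ratings:
        user_totals[user][0] += r - movie_mean[movie]
        user_totals[user][1] += 1
    user_offset = {u: total / n for u, (total, n) in user_totals.items()}

    return global_mean, movie_mean, user_offset

def predict(model, user, movie):
    """Movie mean plus user offset, clamped to the 1-5 star range."""
    global_mean, movie_mean, user_offset = model
    estimate = movie_mean.get(movie, global_mean) + user_offset.get(user, 0.0)
    return min(5.0, max(1.0, estimate))

# Toy data (assumed): user "a" rates one star above each movie's mean, "b" one below.
sample = [("a", "m1", 5), ("a", "m2", 3), ("b", "m1", 3), ("b", "m2", 1)]
model = fit_baseline(sample)
print(predict(model, "a", "m2"))  # 3.0
```

Unseen movies fall back to the global mean and unseen users get an offset of zero, so the function always returns a usable prediction.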

2 comments:

Unknown said...: I've looked into the Netflix algorithm and tried to learn about neural networks, but I'm afraid I don't get it. I have no problems with the SVD methods.

If we look at your demo, it predicts how I will vote based on how I voted in the past (0,1). How does this help me with Netflix, beyond showing that users with a lower grade average will generally grade lower than users with a high grade average? Don't you have to combine this with some kind of cluster analysis in order to get any valuable results?; 3:56 AM
Byron Knoll said...: You are right, combining different types of analysis and feeding additional information into the neural network is the only way it is going to make decent predictions. Today I submitted predictions from a neural network which only had <user rating, average movie score> as inputs, and the RMSE was 0.9914. I think one advantage of using neural networks is that they can detect patterns unique to an individual. For example, let's say that a user rates all of his/her movies with either a 1 or a 5. The neural network will be able to detect this pattern and will be less likely to predict a 2, 3, or 4. I have decided that for now I will continue using neural networks but work on adding additional inputs which might be useful for making predictions.; 8:21 PM
