Saturday, May 31, 2008

Netflix Prize

One of my summer projects will be working on the Netflix Prize. It is a competition to write a program that predicts user ratings of movies. We are provided with a huge dataset of actual user ratings from the Netflix database, along with a test set of <user, movie> tuples for which we need to predict ratings. Netflix already has the actual ratings for the test set, which is how they score the predictions: after each submission, Netflix returns the root mean squared error (RMSE) for a subset of the test set. The three submissions I have made so far have gotten the following RMSE scores:

1.0533
0.9992
0.9844
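
For reference, RMSE is the square root of the average squared difference between predicted and actual ratings, so lower is better. Here is a minimal sketch of that calculation in Python. The function name `rmse` and the sample numbers are purely illustrative, and this is not Netflix's actual scoring code:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

# Made-up example: always predicting 3 for a user who actually rated 1 and 5.
print(rmse([3, 3], [1, 5]))
```

Because the errors are squared before averaging, a few badly wrong predictions hurt the score more than many slightly wrong ones.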

Netflix's own algorithm (Cinematch) gets an RMSE of 0.9525. In order to win the competition and get the one million dollar prize, a team must have a submission with an RMSE below 0.8572. The best team currently has a score of 0.8643. The three submissions I have made so far just use basic statistics for the predictions. I have three main ideas on how to approach the problem: two of them involve clustering algorithms and one of them uses temporal neural networks.
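
To give a feel for what a prediction from "basic statistics" might look like, here is a rough sketch of one common baseline: start from the global average rating, then add an offset for the movie and an offset for the user. This is an assumption for illustration only, not necessarily what the submissions above did. The names `fit_baseline` and `predict` and the tiny dataset are hypothetical, and ratings are assumed to be on the 1 to 5 star scale:

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Fit averages from a list of (user, movie, rating) tuples."""
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # Movie offset: how far each movie's average sits from the global average.
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for _, movie, r in ratings:
        movie_sum[movie] += r
        movie_n[movie] += 1
    movie_offset = {m: movie_sum[m] / movie_n[m] - global_mean for m in movie_sum}

    # User offset: the user's average leftover after removing the movie effect.
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, movie, r in ratings:
        user_sum[user] += r - global_mean - movie_offset[movie]
        user_n[user] += 1
    user_offset = {u: user_sum[u] / user_n[u] for u in user_sum}

    return global_mean, movie_offset, user_offset

def predict(model, user, movie):
    """Predict a rating, falling back to zero offset for unseen users or movies."""
    global_mean, movie_offset, user_offset = model
    raw = global_mean + movie_offset.get(movie, 0.0) + user_offset.get(user, 0.0)
    return min(5.0, max(1.0, raw))  # clip to the assumed 1-5 star range

# Made-up data: user "a" rates generously, user "b" rates harshly.
model = fit_baseline([("a", "m1", 5), ("a", "m2", 3), ("b", "m1", 3), ("b", "m2", 1)])
print(predict(model, "a", "m1"), predict(model, "b", "m2"))
```

A baseline like this captures only "this movie is popular" and "this user rates high or low", which is why ideas such as clustering or neural networks are interesting as a next step.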

2 comments:

Unknown said...

I've looked into the Netflix algorithm and tried to learn about neural networks, but I'm afraid I don't get it. The SVD methods I have no problems with.

If we look at your demo, it predicts how I will vote based on how I voted in the past (0,1). How does this help me with Netflix, beyond showing that users with a lower average rating will generally rate lower than users with a high average rating? Don't you have to combine this with some kind of cluster analysis in order to get any valuable results?

Byron Knoll said...

You are right: combining different types of analysis and feeding additional information into the neural network is the only way it is going to make decent predictions. Today I submitted predictions from a neural network which only had <user rating, average movie score> as inputs, and the RMSE was 0.9914. I think one advantage of using neural networks is that they can detect patterns unique to an individual. For example, let's say that a user rates all of his/her movies with either a 1 or a 5. The neural network will be able to detect this pattern and will be less likely to predict a 2, 3, or 4. I have decided that for now I will continue using neural networks but work on adding additional inputs which might be useful for making predictions.