Netflix Data Statistics
February 1st, 2007
I found some interesting analysis of the Netflix Prize dataset, but it was not exactly what I was looking for, so I downloaded the dataset and decided to do some analysis myself. I went with what I thought was the easy route by installing pyflix, which in hindsight might not have been the fastest approach (pyflix took about 90 minutes to install after I figured out how to install NumPy in Ubuntu). Here are the results of my analysis of movie ratings over time.
The first graph shows the average rating of movies over release year. The average movie rating is not an average of each movie’s average rating, but instead an average of all the ratings of movies released in a given year.
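To make the distinction concrete, here is a minimal sketch in plain Python of the two ways of averaging. It does not use pyflix; the input format, a list of (release_year, movie_id, stars) tuples, and both function names are assumptions made for illustration, and the sample ratings are made up.

```python
from collections import defaultdict

def pooled_average_by_year(ratings):
    """Average of all individual ratings for movies released in each year."""
    total = defaultdict(int)
    count = defaultdict(int)
    for year, _movie_id, stars in ratings:
        total[year] += stars
        count[year] += 1
    return {year: total[year] / count[year] for year in total}

def mean_of_movie_means_by_year(ratings):
    """Average of each movie's own average rating, grouped by release year."""
    per_movie = defaultdict(list)
    for year, movie_id, stars in ratings:
        per_movie[(year, movie_id)].append(stars)
    movie_means = defaultdict(list)
    for (year, _movie_id), stars_list in per_movie.items():
        movie_means[year].append(sum(stars_list) / len(stars_list))
    return {year: sum(means) / len(means) for year, means in movie_means.items()}

# Made-up sample: movie "A" has four 5-star ratings, movie "B" has a single 1-star rating.
sample = [
    (1999, "A", 5), (1999, "A", 5), (1999, "A", 5), (1999, "A", 5),
    (1999, "B", 1),
]

pooled = pooled_average_by_year(sample)
by_movie = mean_of_movie_means_by_year(sample)
print(pooled[1999], by_movie[1999])
```

With the pooled average, heavily rated movies count for more: the one 1-star rating for movie "B" barely moves the 1999 figure, whereas averaging the per-movie averages would give "A" and "B" equal weight.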
The second graph shows the average number of ratings per movie by release year, which can be used as a basic measure of popularity.
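The popularity measure can be sketched the same way: count the ratings for each release year and divide by the number of distinct movies from that year. As before, the (release_year, movie_id, stars) tuple format and the function name are assumptions for illustration rather than pyflix's API, and the sample data is made up. A movie with no ratings at all would not appear in such a list, so this sketch only counts movies that were rated at least once.

```python
from collections import defaultdict

def ratings_per_movie_by_year(ratings):
    """Average number of ratings per movie, grouped by release year."""
    rating_count = defaultdict(int)
    movies = defaultdict(set)
    for year, movie_id, _stars in ratings:
        rating_count[year] += 1
        movies[year].add(movie_id)  # a set keeps each movie counted once
    return {year: rating_count[year] / len(movies[year]) for year in movies}

# Made-up sample: two movies from 1999 share five ratings, one movie from 2000 has three.
sample_ratings = [
    (1999, "A", 5), (1999, "A", 4), (1999, "A", 3), (1999, "A", 5),
    (1999, "B", 1),
    (2000, "C", 2), (2000, "C", 3), (2000, "C", 4),
]

popularity = ratings_per_movie_by_year(sample_ratings)
print(popularity)
```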
The last graph shows the number of movies in the Netflix dataset by release year.
Ilya Grigorik said:
Very cool, I never thought of looking at the first two graphs you made. The ‘Average Movie Rating’ is especially interesting: there seems to be a fairly consistent downward trend, albeit with a small rebound at the end. Perhaps movie execs should stop worrying about users stealing content and instead focus on making better movies!
Colin Green said:
That first graph is fascinating. There is a clear trend line there and it’s interesting to speculate on the psychological mechanisms behind such a pattern, and indeed most/all of the patterns that can be extracted from the Netflix data set. E.g. one can imagine that, say, users’ expectations of older films are lower and therefore they are (on aggregate) pleasantly surprised when they view them, hence the hump in the years from about 1920 to the 1970s. On the flip side, people may have a higher expectation of more recent films. Or perhaps as more and more films were made the overall quality went down, so your chances of getting a bum movie from a later year are slightly higher. All speculation of course, but clearly there is something going on there.