Netflix Data Statistics

February 1st, 2007

I found some interesting analysis of the Netflix Prize dataset, but it was not exactly what I was looking for, so I downloaded the dataset and decided to do some analysis myself. I went with what I thought was the easy route by installing pyflix, which in hindsight might not have been the fastest approach (pyflix took about 90 minutes to install after I figured out how to install NumPy in Ubuntu). Here are the results of my analysis of movie ratings over time.

The first graph shows the average rating of movies by release year. The average movie rating is not an average of each movie’s average rating, but instead an average of all the individual ratings given to movies released in a given year.
Average Movie Rating by Year
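
To make the distinction between the two kinds of average concrete, here is a minimal sketch in plain Python. It does not use pyflix, and the ratings list is made-up toy data rather than anything from the Netflix dataset; the function names are hypothetical. The first function pools every rating for a release year (the method described above), and the second averages each movie's own average instead.

```python
from collections import defaultdict

# Made-up toy data (an assumption for illustration, not Netflix data):
# each row is (movie_id, release_year, rating).
ratings = [
    (1, 1999, 5), (1, 1999, 5), (1, 1999, 5), (1, 1999, 5),
    (2, 1999, 1),
    (3, 2000, 3), (3, 2000, 4),
]

def pooled_average_by_year(rows):
    """Average of all individual ratings for movies released in each year."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of ratings, count]
    for _movie, year, rating in rows:
        totals[year][0] += rating
        totals[year][1] += 1
    return {year: s / n for year, (s, n) in totals.items()}

def mean_of_movie_means_by_year(rows):
    """Average of each movie's own average rating, grouped by release year."""
    per_movie = defaultdict(list)  # (movie_id, year) -> list of ratings
    for movie, year, rating in rows:
        per_movie[(movie, year)].append(rating)
    per_year = defaultdict(list)  # year -> list of per-movie averages
    for (_movie, year), movie_ratings in per_movie.items():
        per_year[year].append(sum(movie_ratings) / len(movie_ratings))
    return {year: sum(means) / len(means) for year, means in per_year.items()}

pooled = pooled_average_by_year(ratings)
mean_of_means = mean_of_movie_means_by_year(ratings)
print(pooled[1999], mean_of_means[1999])
```

In the toy data, the heavily rated movie dominates the pooled figure for 1999, while the per-movie version weights both 1999 movies equally. That is why the pooled average leans toward whatever the most-rated movies of a year received.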

This shows the average number of ratings per movie by release year, which can be used as a basic measure of popularity.
Average Number of Ratings per Movie by Year

The last graph shows the number of movies in the Netflix dataset by release year.
Number of Movies in Netflix by Year
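
The two counts behind the last two graphs can be sketched the same way. This is again plain Python over made-up toy rows, not pyflix and not the real dataset, and the function name is hypothetical. It assumes every movie has at least one rating; a movie with no ratings would never appear in the rows and so would not be counted.

```python
from collections import defaultdict

# Made-up toy data (an assumption for illustration, not Netflix data):
# each row is (movie_id, release_year, rating).
ratings = [
    (1, 1999, 5), (1, 1999, 5), (1, 1999, 5), (1, 1999, 5),
    (2, 1999, 1),
    (3, 2000, 3), (3, 2000, 4),
]

def movie_stats_by_year(rows):
    """Return {year: (number of movies, average ratings per movie)}."""
    movies = defaultdict(set)         # year -> distinct movie ids
    rating_counts = defaultdict(int)  # year -> total number of ratings
    for movie, year, _rating in rows:
        movies[year].add(movie)
        rating_counts[year] += 1
    return {
        year: (len(ids), rating_counts[year] / len(ids))
        for year, ids in movies.items()
    }

stats = movie_stats_by_year(ratings)
for year in sorted(stats):
    n_movies, ratings_per_movie = stats[year]
    print(year, n_movies, ratings_per_movie)
```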


4 Comments to “Netflix Data Statistics”


  1. Ilya Grigorik said:

    Very cool, I never thought of looking at the first two graphs you made. The ‘Average Movie Rating’ is especially interesting; there seems to be a fairly consistent downward trend, albeit with a small rebound at the end. Perhaps movie execs should stop worrying about users stealing content and instead focus on making better movies! :)


  2. Colin Green said:

    That first graph is fascinating. There is a clear trend line there, and it’s interesting to speculate on the psychological mechanisms behind such a pattern, and indeed behind most or all of the patterns that can be extracted from the Netflix data set. E.g., one can imagine that, say, users’ expectations of older films are lower and therefore they are (in aggregate) pleasantly surprised when they view them, hence the hump in the years from about 1920 to the 1970s. On the flip side, people may have higher expectations of more recent films. Or perhaps as more and more films were made the overall quality went down, so your chances of getting a bum movie from a later year are slightly higher. All speculation of course, but clearly there is something going on there.


  3. Zach Pilchen said:

    I think your mistake here is assuming that each year’s sample of movies offered by Netflix is representative of all the movies released in that year.

    What about the possibility that movies aren’t actually diminishing in quality, but that Netflix has only obtained the best older movies, the ones that have stood the test of time? Any new movie that comes out on DVD today, no matter how much it flopped in theaters, will likely be obtained by Netflix. (See: “Meet Dave,” starring Eddie Murphy.)

    Conversely, major flops from 1939 (the most highly rated year for Netflix movies) are likely excluded. Instead of offering flops from that year (like “Midnight Shadow”) that haven’t stood the test of time, Netflix offers the films that people are most likely to rent. From 1939, these include such highly regarded classics as “Gone With the Wind,” “The Wizard of Oz,” and “Mr. Smith Goes to Washington.”

    Ratings dip toward the 1910 region because, frankly, movies really sucked back then. Most movies didn’t even have sound.


  4. kbrower said:

    Good point. Perhaps I should analyze the 100 best movies per year.
