Stats & IMDb: Machine Learning and Movie Recommendations – Part I

I have been using IMDb as a tracking tool for movies. I rate the movies I have watched, and I use IMDb’s average member rating and metascores (critics’ ratings) to decide what else I’d like to see. However, these ratings often deviate from my own scores. In preparation for a machine learning course, I am using this mismatch to set up a new project: the objective is to make better predictions and give more accurate movie recommendations. This is the first of at least two posts. This one focuses on exploring the data and establishing that IMDb ratings and metascores are indeed poor predictors of my own ratings; the next will focus on machine learning algorithms and finding a better prediction model.

I scraped my own IMDb page with user ratings – using a web crawler written in Python – and retrieved basic metadata – title, year, genres, IMDb rating, metascore – for every movie I have rated since I started using the account. I excluded TV series, but documentaries and specials are included with the movies. In total, this yielded 212 entries.

I started out with 212 rated movies[1], of which 207 records were kept because they contain no missing data. I looked at the rating distributions first (Figure 1)[2]. All three distributions are left-skewed: most of the mass sits at the high end, centred around values 7 and 8 on a ten-point scale[3], with a tail towards lower values. For user ratings, this is to be expected, as I intentionally pick movies I think I’ll appreciate, which is consequently reflected in higher-than-average ratings. The distribution of user ratings resembles that of the metascores, but IMDb ratings are distributed somewhat differently. They are very narrowly centred around values 7 and 8, with a single movie rated 5 and every other record rated between 6 and 9. This is also to be expected, because I use IMDb ratings as my main indicator when deciding whether to watch a certain movie. It therefore makes sense that there are no low scores in this distribution.

Figure 1 – Rating frequencies (n = 207).
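The data preparation behind Figure 1 (dropping incomplete records, rescaling metascores to a ten-point scale, and binning to whole numbers) can be sketched as follows. This is a minimal illustration, not the code used for the post: the record layout, the field names and the values are all invented, and only the binning rule (7.5 ≤ rating < 8.5 falls in bin 8) is taken from the footnotes.

```python
import math
from collections import Counter

# Hypothetical records: field names and values are invented for illustration.
records = [
    {"title": "A", "user": 8, "imdb": 7.6, "metascore": 81},
    {"title": "B", "user": 6, "imdb": 7.5, "metascore": 64},
    {"title": "C", "user": 7, "imdb": 8.4, "metascore": None},  # missing metascore
    {"title": "D", "user": 9, "imdb": 8.5, "metascore": 95},
]


def round_half_up(x):
    """Bin so that 7.5 <= x < 8.5 lands in bin 8.

    Python's built-in round() rounds halves to the nearest even number,
    which would put 8.5 in bin 8 instead of bin 9.
    """
    return math.floor(x + 0.5)


# Keep only records without missing values.
complete = [r for r in records if all(v is not None for v in r.values())]

# Rescale the 100-point metascore to the ten-point scale of the other ratings.
complete = [{**r, "meta10": r["metascore"] / 10} for r in complete]

# Frequency tables per rating type, binned to whole numbers.
user_bins = Counter(r["user"] for r in complete)
imdb_bins = Counter(round_half_up(r["imdb"]) for r in complete)
meta_bins = Counter(round_half_up(r["meta10"]) for r in complete)

for name, bins in [("user", user_bins), ("imdb", imdb_bins), ("meta", meta_bins)]:
    print(name, sorted(bins.items()))
```

The half-up rounding matters only for values that end in .5, but with ratings recorded to one decimal those occur often enough to shift bars in a histogram.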

At first glance, these distributions seem to imply that metascores track user ratings more closely than IMDb ratings do. However, we cannot really say that yet: the figure says nothing about the statistical significance of any differences we may find, and it in no way accounts for the differences between scores for individual movies.

Paired-sample t-tests
So, is there really a significant difference between these ratings? The mean user rating is 7.0, the mean IMDb rating is 7.3 and the mean metascore is 6.6. To test whether these means differ statistically, I performed three paired-sample t-tests, one for each pair of rating types. As displayed in Table 1, all three tests give significant results[4].

Table 1 – Results for t-tests between rating types.

This means that for my machine learning project, I might want to consider each of these individually, because they likely have different predictive values.
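For readers who want to see what a paired-sample t-test computes, here is a small sketch using only the standard library. The ratings below are invented, not the 207 records from this post, and the p-value uses a normal approximation, which is reasonable with a couple of hundred degrees of freedom but rough for a sample as small as this example.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired-sample t-test on two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]   # per-movie difference
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)        # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Invented example data: my rating and the IMDb rating for the same eight movies.
user = [8, 6, 7, 9, 5, 7, 8, 6]
imdb = [7.6, 7.5, 7.9, 8.1, 6.8, 7.2, 7.7, 7.4]

t, df = paired_t(user, imdb)

# Two-sided p-value under a normal approximation (an assumption of this sketch).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.3f}, df = {df}, approximate p = {p_approx:.3f}")
```

If SciPy is available, `scipy.stats.ttest_rel(user, imdb)` runs the same test and returns a p-value based on the t-distribution itself.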

Correlation and predictive value
I then computed three Pearson correlations, which tell me how closely related these rating scales are (Table 2). The results are also significant: they show moderate positive correlations of similar size between user ratings and IMDb ratings (r = 0.40) and between user ratings and metascores (r = 0.36), and a strong positive correlation between IMDb ratings and metascores (r = 0.69). The coefficient indicates the extent to which high ratings on one scale are associated with high ratings on the other. For a recommendation model, the associations with user ratings seem rather weak.

Table 2 – Pearson correlation between rating types.
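To make the link between a correlation and explained variance concrete, the sketch below computes Pearson's r by hand and then squares the reported user/IMDb correlation. It assumes the values reported above are Pearson's r rather than its square, and that they are rounded to two decimals; the helper names are my own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Reported (rounded) correlation between user ratings and IMDb ratings.
r_user_imdb = 0.40
n = 207

r2 = r_user_imdb ** 2              # share of variance explained by one predictor
adj = adjusted_r2(r2, n, k=1)

print(round(r2, 2), round(adj, 2))  # prints: 0.16 0.16
```

In a regression with a single predictor, R² equals the squared correlation, so a correlation of 0.40 leaves about 84% of the variance in user ratings unexplained.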

When plugging these variables into an OLS model, IMDb ratings and metascores together explain 17% of the variance in user ratings (adj. R² = 0.17, p < 0.001), which is not a whole lot. Also, though the effect of IMDb ratings is significant (t = 3.373, p = 0.001), the effect of metascores is not (t = 1.794, p = 0.074). In fact, dropping metascores from the model altogether only reduces the explained variance by one percentage point (adj. R² = 0.16, p < 0.001). It should also be noted that Jarque-Bera tests for both models show that the residuals are not normally distributed, so the reported t-tests and p-values should be read with some caution. Again, this seems to show that IMDb ratings and metascores are not ideal predictors in a movie recommendation model.
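The quantities discussed above (OLS coefficients, adjusted R² and the Jarque-Bera statistic on the residuals) can be sketched with the standard library alone. This is an illustration under assumptions: the data are invented, the function names are my own, and it is not the code that produced the numbers in this post. The Jarque-Bera p-value uses the fact that the statistic is asymptotically chi-square distributed with two degrees of freedom.

```python
import math


def ols_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2          # zero if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det  # Cramer's rule on the centred normal equations
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def adjusted_r2(y, fitted, k):
    """Adjusted R-squared for k predictors."""
    n = len(y)
    my = sum(y) / n
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)


def jarque_bera(resid):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(resid)
    m = sum(resid) / n
    m2 = sum((e - m) ** 2 for e in resid) / n
    m3 = sum((e - m) ** 3 for e in resid) / n
    m4 = sum((e - m) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)        # chi-square survival function, 2 df


# Invented example data on a ten-point scale.
user = [8, 6, 7, 9, 5, 7, 8, 6, 7, 4]
imdb = [7.6, 7.5, 7.9, 8.1, 6.8, 7.2, 7.7, 7.4, 7.0, 6.5]
meta = [8.1, 6.4, 7.0, 9.0, 5.5, 6.8, 7.4, 7.1, 6.0, 5.2]

b0, b1, b2 = ols_two_predictors(user, imdb, meta)
fitted = [b0 + b1 * i + b2 * m for i, m in zip(imdb, meta)]
resid = [u - f for u, f in zip(user, fitted)]

adj = adjusted_r2(user, fitted, k=2)
jb, jb_p = jarque_bera(resid)
print(f"adj. R2 = {adj:.2f}, JB = {jb:.2f}, p(JB) = {jb_p:.3f}")
```

A small Jarque-Bera p-value points to skewed or heavy-tailed residuals. That does not by itself bias the coefficient estimates, but it does weaken the t-tests and p-values that rely on normality, especially in small samples. In practice a library such as statsmodels reports all of these figures in its OLS summary.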

Now what?
In this post I have attempted to show that IMDb ratings and metascores are poor predictors of my own movie ratings, and consequently that this might be a good starting point for a machine learning project. IMDb ratings and metascores differ significantly from each other and from user ratings. They show only moderate positive correlations with user ratings, and in an OLS model only the IMDb rating is a significant predictor of user rating. However, this model explains a mere 17% of the variance and is also not a great fit to the data. In subsequent posts, I will explore better ways to deal with recommendations.

[1] Henceforth, I refer to my own ratings as user ratings, because that is how they are recorded in the data.
[2] Metascores are recorded on a 100-point scale, but for easier comparison in figures they were converted to a ten-point scale, the same as IMDb ratings and user ratings.
[3] IMDb ratings and metascores are accurate to one decimal; frequencies are for rounded numbers, where the bin for ratings of 8 includes 7.5 ≤ rating < 8.5.
[4] The significance level used in all tests is α = 0.05.

