Meeting invitation
Welcome to Blockbuster Inc., the only movie producer obsessed with achieving the highest financial results in the film industry. You have the chance to take part in our latest board meeting, where our hard-working interns sum up their most promising findings. During this meeting of the highest importance, they will reveal the secret sauce that turns a movie into a worldwide success. If you are reluctant to attend this summit, rest assured that the investigation presented will be ground-breaking, since our newly hired interns were given unlimited access to our next-generation quantum compute clusters and were generously compensated for their work. We strongly encourage you to attend the meeting. Stay tuned!
Names of our talented interns: Adam Benslama, Dusan Cvijetic, Gilles Moreillon, Marko Simic, Romain Pythoud
Start of the meeting
8:00-8:30: Introduction to the dataset | Speaker: Romain
For this investigation, the CMU Movie Corpus dataset will be used to dive into the realm of the movie industry. This dataset includes movie titles, featured actors, related genres, corresponding box office revenues and more. Regrettably, the dataset does not include any information on movie ratings, a useful component of our analysis since ratings capture the overall public perception of a movie. To address this issue, we supplemented the main dataset with the IMDb Movie Dataset, which provides the movie ratings.
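For the technically inclined board members, here is a minimal sketch of how ratings could be attached to the main dataset. It assumes that movies are matched on a normalized title plus release year; the actual join key used by the interns is not stated, and the field names (`title`, `year`, `revenue`, `rating`) and toy records are hypothetical.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so that titles compare reliably."""
    return " ".join(title.lower().split())


def attach_ratings(movies, ratings):
    """Left join: keep every movie, add its rating when a match exists.

    Assumption: (normalized title, release year) identifies a movie
    in both datasets.
    """
    lookup = {
        (normalize_title(r["title"]), r["year"]): r["rating"] for r in ratings
    }
    merged = []
    for movie in movies:
        key = (normalize_title(movie["title"]), movie["year"])
        # Movies without a match keep a rating of None.
        merged.append({**movie, "rating": lookup.get(key)})
    return merged


# Toy records, made up for illustration only.
movies = [
    {"title": "Movie A", "year": 1999, "revenue": 1_000_000.0},
    {"title": "Movie  B", "year": 2005, "revenue": None},
]
ratings = [
    {"title": "movie a", "year": 1999, "rating": 7.5},
    {"title": "Movie C", "year": 2010, "rating": 6.0},
]

merged = attach_ratings(movies, ratings)
```

A left join is used here so that movies without a rating remain in the dataset rather than being silently dropped.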
Furthermore, a quick look at the main dataset shows that most of the box office revenue data is missing. This observation is confirmed by the following plot.
Although the proportion of missing revenue data is high, an analysis of the overall box office revenues can still provide relevant insights, as the number of movies that do report a revenue remains significant.
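The missing-data check can be sketched as follows. This is an illustrative snippet, assuming each movie is a dictionary whose `revenue` field is `None` when the value is missing; the field name and the toy records are hypothetical.

```python
def missing_share(records, field):
    """Return the fraction of records whose `field` is missing (None)."""
    if not records:
        return 0.0
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / len(records)


# Toy records, made up for illustration only.
toy_movies = [
    {"title": "Movie A", "revenue": 1_000_000.0},
    {"title": "Movie B", "revenue": None},
    {"title": "Movie C", "revenue": None},
    {"title": "Movie D", "revenue": None},
]

share = missing_share(toy_movies, "revenue")
remaining = len(toy_movies) - round(share * len(toy_movies))
```

Reporting the absolute number of remaining movies alongside the missing share makes it easier to judge whether the subset is still large enough to analyse.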
The movie revenues over the years are plotted below.
From this plot, we can see that movie revenues grow roughly exponentially over the years. This is supported by the positive slope of the linear regression fitted to the data points on a logarithmic revenue axis. This exponential increase in box office revenues is largely due to the compounding effect of monetary inflation year over year. We can account for this effect by adjusting the revenues, expressed in USD, using inflation data from the US Consumer Price Index (CPI). Although other factors, such as the size of the addressable market, also affect the revenue distribution over the years, the resulting plot is a more reliable representation of the financial success of a movie through history.
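Both steps described above can be sketched in a few lines. The first function fits a straight line to the logarithm of the revenues, so a positive slope corresponds to exponential growth; the second rescales a revenue to the price level of a chosen base year. This is a sketch under stated assumptions, not the interns' actual code: the function names, the base year and the CPI values are hypothetical.

```python
import math


def fit_log_linear(years, revenues):
    """Least-squares fit of log(revenue) = intercept + slope * year."""
    logs = [math.log(r) for r in revenues]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjust_for_inflation(revenue, year, cpi_by_year, base_year):
    """Express a revenue from `year` in `base_year` dollars using CPI ratios."""
    return revenue * cpi_by_year[base_year] / cpi_by_year[year]


# Synthetic revenues growing by exactly 5% per year (made-up data).
years = [2000, 2001, 2002, 2003, 2004]
revenues = [100.0 * 1.05 ** (y - 2000) for y in years]
slope, intercept = fit_log_linear(years, revenues)
annual_growth = math.exp(slope) - 1  # recovers the 5% yearly growth rate

# Made-up CPI values, for illustration only (not real CPI figures).
toy_cpi = {2000: 100.0, 2010: 125.0}
adjusted = adjust_for_inflation(1_000_000.0, 2000, toy_cpi, base_year=2010)
```

Converting the slope back with `exp(slope) - 1` gives an average yearly growth rate, which is easier to compare with typical inflation rates than the raw slope.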
8:30-9:00: Movie genres clustering | Speaker: Adam
We created a pie chart to visualise how the movie genres are distributed. This allows us to see at a glance how the movies are spread across the main genres.
The genre “Drama & Romance” is the most common in the dataset, making up a quarter of the movies. “Comedy” and “Action & Adventure” also represent significant portions, while “Horror & Thriller” and “Crime & Mystery” are less represented but still significant.
The “Others” category, which represents the aggregated total of the least common genres, makes up a larger slice than two individual genres, indicating that the movie genres follow a heavy-tailed distribution.
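The aggregation behind such a pie chart can be sketched as follows. The snippet assumes that each movie has already been assigned a single main genre and that the least common genres are lumped into one “Others” slice; the function name, the cut-off `top_n` and the toy labels are hypothetical.

```python
from collections import Counter


def genre_shares(main_genres, top_n):
    """Share of movies per genre, with all but the top_n genres merged."""
    counts = Counter(main_genres)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = {genre: count / total for genre, count in top}
    rest = total - sum(count for _, count in top)
    if rest:
        # Everything outside the top_n genres goes into a single slice.
        shares["Others"] = rest / total
    return shares


# Toy genre labels, made up for illustration only.
toy_genres = ["Drama"] * 5 + ["Comedy"] * 3 + ["Western", "Musical"]
shares = genre_shares(toy_genres, top_n=2)
```

When the “Others” slice stays large even though each genre inside it is rare, the distribution has a long tail, which is the pattern described above.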
So far, however, this analysis says little about how the genres change over time or how they are distributed with respect to the revenue they generated. The next plot should be more informative in this regard.