Research Article

How words matter: machine learning & movie success

ABSTRACT

We employed a machine learning framework to examine the relationships between word choice in Internet Movie Database (IMDb) comedy movie descriptions and overall performance. Our measures of success were ticket sales, user ratings, and Metacritic scores. We used linear regressions, along with recurrent neural networks implementing a Long Short-Term Memory (LSTM) framework, for textual sentiment analysis. Employing conservative p-values, our results revealed possible gender bias favouring male-centric themes, as well as negative effects for holiday comedies, paranormal movies, and crime films.

Disclosure statement

We have no potential conflicts of interest to disclose.

Notes

1 The tf–idf value increases with the number of times a certain word appears in a description, but it is reduced based on the number of descriptions in the dataset that contain that word. Each word had to appear in at least three descriptions to be considered, while very common words, such as ‘the’ and ‘a’ (known as stop words), were excluded entirely, since they provide no useful information. While there is no uniform definition of stop words, the term generally describes ‘common’ words that are removed during the pre-processing stage of semantic analysis.
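As an illustration, here is a minimal sketch of this vectorization using scikit-learn's TfidfVectorizer (the paper does not name the library used, and the toy descriptions below are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical movie descriptions, for illustration only.
descriptions = [
    "A slacker and his dog save the neighbourhood.",
    "Two brothers open a dog hotel for the holidays.",
    "A dog detective stumbles onto a paranormal crime.",
]

# stop_words removes very common English words such as 'the' and 'a'.
# The paper required each word to appear in at least three descriptions,
# i.e. min_df=3; we use min_df=1 here so the toy corpus is not emptied.
vectorizer = TfidfVectorizer(min_df=1, stop_words="english")
tfidf_matrix = vectorizer.fit_transform(descriptions)

print(vectorizer.get_feature_names_out())
print(tfidf_matrix.toarray().round(2))
```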

2 RNNs are a class of artificial neural networks in which the output of each node feeds directly into the next node in a directed sequence, allowing the network to better represent temporally ordered data, such as speech or written text. Nodes usually maintain an internal memory for storing information from previous inputs. Notable applications that use RNNs include language translation and ‘smart speakers’ (such as the Amazon Echo) that can parse verbal commands.
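A minimal sketch of a single recurrent step in Python (dimensions and weights below are illustrative, not taken from the paper) shows how the stored memory is combined with each new input:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: combine the current input with the
    previous hidden state (the node's memory) to produce the next state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 4                 # illustrative sizes
W_x = rng.normal(size=(input_dim, hidden_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                     # memory starts empty
for x_t in rng.normal(size=(5, input_dim)):  # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_x, W_h, b)        # memory is updated at each step
```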

3 An LSTM framework is a method for computers to ‘understand’ the meaning of natural language words presented in a sequence. In our context, a recurrent neural network was the only semantic method that could achieve the desired purpose, as no alternative methodology matched its capabilities. In our model, the framework employs a layering structure: words are first converted into sequences (embedding layer), two additional layers convert these sequences into something of meaning for the computer (LSTM layer and dense layer), and a final layer answers our question of whether the value is above or below the mean. We also note that the LSTM layer contains a sequence of nodes, each receiving the output from the previous node along with a stored memory value. The current node combines this information with the latest input word in the text to generate the output and memory values for the next node. The dense layer takes all 64 sequential output values and combines them into the final binary prediction layer.
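As an illustration, a minimal Keras sketch consistent with this layering might look as follows (the 64 LSTM units follow the description above; the vocabulary size, embedding dimension, and intermediate dense width are assumptions, not values reported in the paper):

```python
import tensorflow as tf

vocab_size, embed_dim = 5000, 32  # assumed hyperparameters

model = tf.keras.Sequential([
    # Embedding layer: converts word indices into dense vectors.
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    # LSTM layer: each node combines its stored memory with the next word.
    tf.keras.layers.LSTM(64),
    # Dense layer: combines the 64 sequential output values.
    tf.keras.layers.Dense(16, activation="relu"),
    # Final layer: binary prediction, above or below the mean.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```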

4 The network was trained in 13 epochs with a batch size of 16. The training process of an artificial neural network optimizes the internal weights of the connections within the model so that it can accurately predict the results of previously unseen examples. This is done by selecting examples from the training set to input into the network using the current weights. If the model outputs the wrong predictions, the weights are adjusted using gradient descent to minimize the error between the predictions and the correct labels. This is repeated iteratively until a desired number of epochs, each representing one complete cycle through the entire available training data, has been completed. To conserve computing resources, examples are often run in batches, where a batch is a subset of an epoch, with the weights updated only after all the examples in each batch have been run. The test set, which is the data held back from the training set, is reserved for assessing the accuracy of the model on new examples.
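Continuing the sketch from note 3, the training step might look as follows in Keras (X_train, y_train, X_test, and y_test are hypothetical arrays of padded word-index sequences and binary above/below-mean labels):

```python
# 'adam' is a gradient-descent variant; the loss compares the model's
# binary predictions against the correct labels.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# 13 epochs with batches of 16, as in the paper; weights are updated
# only after each batch of 16 examples has been run.
model.fit(X_train, y_train, epochs=13, batch_size=16)

# The held-back test set assesses accuracy on previously unseen examples.
test_loss, test_accuracy = model.evaluate(X_test, y_test)
```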

5 The results do not use movie runtime as a control, since we found that it had no additional impact on our results. To provide generally conservative estimates, in this table and in the next, we only display the coefficient impact of words having p-values at the 1% level of significance or better.

6 In this method, words that occur commonly in the complete corpus are considered less informative, and are weighted less than unusual words, which are considered to have greater impact.
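Concretely, under the standard formulation (our notation; the paper does not state the exact formula), the weight of term t in description d is tf–idf(t, d) = tf(t, d) × log(N / df(t)), where tf(t, d) counts occurrences of t in d, N is the number of descriptions in the corpus, and df(t) is the number of descriptions containing t; the logarithm shrinks the weight of words that appear in many descriptions.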

7 Notice that we cannot dismiss the possibility of a seasonality effect, since our use of final ticket sales did not control for the initial release date. Users also seem averse to some of these crime movies, with poor ratings for ‘prison’. There may also be a religious component, as with holiday movies, in how users negatively rate ‘inspiration’. It is an interesting generational quirk that the words ‘paranormal’, ‘challenged’, ‘normal’, and ‘annual’ also yield worse outcomes from users. Apparently, this generation of users is less interested in sci-fi, or in the distinction between what is normal and what is not.

8 ‘boy’ made this list as well (significant at the 0.05 level, but not at 0.01), as did ‘girl’ (again, at the 0.05 level only), and the positive impact of ‘boy’ was at least triple that of the word ‘girl’ in the description.
