F.R.I.E.N.D.S: Character Classification Perspective

Jan 23, 2021

Friends is one of the most famous TV series; it ran for 10 seasons (1994–2004). The series follows the lives of people in their late 20s through six main characters: Monica, Ross, Rachel, Joey, Phoebe, and Chandler. Even though the series ended more than 15 years ago, it is still a favorite among streaming audiences.

Data analyses of this series have also spread online from different angles, such as The One With The Data Scientist, F.R.I.E.N.D.S — Data Analysis, and Who the Real Lead Character Was on "Friends". Those articles mostly offer deep analyses of the characters, runtime, sentiment, and relationships based on the dialogue scripts, with really good visualizations.

Unlike those articles, this one looks at something else that is interesting: how well the dialogue script describes the characters themselves. The big question here is: can a model predict which character is speaking, given a particular scene of the script?

The one with Data

This article uses the data source from here; after some data engineering work, the data looks like the sample below.

As usual, the data is split into two parts: seen_data and unseen_data. seen_data contains all dialogue from seasons 1 to 8 and is used to train the model, while unseen_data contains the remaining seasons and is used to check how well the model works.
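A minimal sketch of this season-based split, assuming each row is a dict with an integer "season" key (the article's actual dataframe columns are not shown, so these names are illustrative):

```python
# Sketch of the season-based split: seasons 1-8 -> seen_data, later seasons -> unseen_data.
# Assumption: every row is a dict with an integer "season" key.
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_season(rows: List[Row], last_seen_season: int = 8) -> Tuple[List[Row], List[Row]]:
    """Return (seen_data, unseen_data) split on the season number."""
    seen_data = [r for r in rows if r["season"] <= last_seen_season]
    unseen_data = [r for r in rows if r["season"] > last_seen_season]
    return seen_data, unseen_data


rows = [{"season": s, "dialogue": f"line from season {s}"} for s in range(1, 11)]
seen_data, unseen_data = split_by_season(rows)
print(len(seen_data), len(unseen_data))  # 8 2
```

Splitting by season rather than at random means the model is judged on later episodes it has never seen, which is closer to the question of predicting characters in new scripts.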

The data consists of the episode, the character (and its label), the scene (and its label), and the dialogue (both raw and cleaned). Cleaning the dialogue is quite tricky because it also contains the characters' actions, which are excluded from this analysis.
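One way to strip the actions is a regular expression. This sketch assumes actions and scene notes are wrapped in parentheses or square brackets (for example "(to Ross)"); the lowercasing and punctuation removal are also assumptions about what "clean" means here, not the author's exact cleaning steps:

```python
import re

# Assumption: stage directions are wrapped in (...) or [...], e.g. "(to Ross)".
ACTION_PATTERN = re.compile(r"\([^)]*\)|\[[^\]]*\]")


def clean_dialogue(raw: str) -> str:
    """Drop bracketed actions, lowercase, and keep only letters and apostrophes."""
    text = ACTION_PATTERN.sub(" ", raw)
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return " ".join(text.split())  # collapse repeated whitespace


print(clean_dialogue("(to Ross) Hey! How you doin'?"))  # hey how you doin'
```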

The one when you realize

The obvious feature for describing the "scene" when predicting the character is the dialogue itself. After grouping every character other than the six main characters as "others", here is a quick look at dialogue lines shorter than 5 words.
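A small sketch of these two steps, assuming speaker names arrive as free-form strings and each row stores its cleaned text under a "dialogue" key (both are illustrative assumptions):

```python
MAIN_CHARACTERS = {"monica", "ross", "rachel", "joey", "phoebe", "chandler"}


def normalize_character(name: str) -> str:
    """Map any speaker outside the six main characters to "others"."""
    name = name.strip().lower()
    return name if name in MAIN_CHARACTERS else "others"


def short_lines(rows, limit=5):
    """Keep rows whose dialogue has fewer than `limit` words."""
    return [r for r in rows if len(r["dialogue"].split()) < limit]


rows = [
    {"character": normalize_character("Gunther"), "dialogue": "hi rachel"},
    {"character": normalize_character("Joey"), "dialogue": "how you doin"},
    {"character": normalize_character("Ross"), "dialogue": "we were on a break okay"},
]
print([r["character"] for r in short_lines(rows)])  # ['others', 'joey']
```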

The word clouds show that most of the words are almost the same across characters, so generic words like 'yeah', 'okay', and 'hey' are removed.
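The word-cloud comparison boils down to per-character word counts after dropping generic words. A minimal sketch; the stop list below contains only the three words named above, since the full list is not given:

```python
from collections import Counter, defaultdict

# Illustrative stop list built from the generic words named above; the full list is not given.
GENERIC_WORDS = {"yeah", "okay", "hey"}


def top_words_per_character(rows, stop_words=GENERIC_WORDS, k=3):
    """Count words per character after dropping generic words; return the k most common."""
    counts = defaultdict(Counter)
    for r in rows:
        words = [w for w in r["dialogue"].split() if w not in stop_words]
        counts[r["character"]].update(words)
    return {char: [w for w, _ in cnt.most_common(k)] for char, cnt in counts.items()}


rows = [
    {"character": "monica", "dialogue": "hey honey"},
    {"character": "monica", "dialogue": "okay honey"},
    {"character": "joey", "dialogue": "yeah pheebs"},
]
print(top_words_per_character(rows))  # {'monica': ['honey'], 'joey': ['pheebs']}
```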

The analysis focuses on lines of 5 words or fewer, since the median (p50) dialogue length is 7 words. It is just a simple exploration to remove the noisy words that cannot distinguish the characters. This removal brings out words like "honey" for Monica and "pheebs" for Joey.

After text cleansing, the dataset looks like this:

where the label mapping is as follows (ignoring "others"):

{0: 'rachel', 1: 'ross', 2: 'monica', 3: 'chandler', 4: 'joey', 5: 'phoebe'}
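Applying this mapping means inverting it (name to id) and dropping the "others" rows. A minimal sketch, assuming rows are dicts with a "character" key:

```python
LABEL_MAP = {0: 'rachel', 1: 'ross', 2: 'monica', 3: 'chandler', 4: 'joey', 5: 'phoebe'}
CHARACTER_TO_LABEL = {name: idx for idx, name in LABEL_MAP.items()}


def encode_labels(rows):
    """Drop "others" rows and attach the integer label for each main character."""
    return [
        dict(r, label=CHARACTER_TO_LABEL[r["character"]])
        for r in rows
        if r["character"] in CHARACTER_TO_LABEL
    ]


print(encode_labels([{"character": "joey"}, {"character": "others"}]))
# [{'character': 'joey', 'label': 4}]
```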

The label proportions are also roughly balanced.
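Checking the balance only needs a frequency count. A small sketch over a plain list of integer labels:

```python
from collections import Counter


def label_proportions(labels):
    """Return each label's share of the dataset, sorted by label id."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


print(label_proportions([0, 0, 1, 2]))  # {0: 0.5, 1: 0.25, 2: 0.25}
```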

With the data ready, an LSTM model is trained with MAX_LEN = 20 and EMBED = 128. Here is the result of predicting the unseen data (seasons 9 and 10).
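MAX_LEN presumably means every line is turned into a fixed-length sequence of 20 word ids before it reaches the embedding layer (of size EMBED = 128) and the LSTM. A stdlib sketch of that preprocessing step; the vocabulary scheme and the choice to pad at the end are assumptions (Keras's `pad_sequences`, for comparison, defaults to `padding="pre"` and `truncating="pre"`):

```python
MAX_LEN = 20  # sequence length from the article
# EMBED = 128 would be the output size of the embedding layer that consumes these ids.


def build_vocab(texts):
    """Assign an integer id to every word; 0 is reserved for padding, 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in text.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_padded_ids(text, vocab, max_len=MAX_LEN):
    """Convert a line to word ids, keep the first max_len, then pad at the end with 0."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in text.split()][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocab(["how you doin"])
print(to_padded_ids("how you doin pheebs", vocab)[:6])  # [2, 3, 4, 1, 0, 0]
```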

The macro-averaged F1 score is 0.21, which is not very good. Most predictions fall on Rachel or Monica, which suggests that most of the words used by the other characters are also used by these two.
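Macro-averaged F1 computes F1 for each character and then takes the unweighted mean, so a model that collapses onto one or two characters is punished heavily. A from-scratch sketch (labels are taken from the union of true and predicted values, which matches scikit-learn's default for `average="macro"`):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# A model that always predicts label 0 (e.g. always "rachel") on a two-class toy set.
print(round(macro_f1([0, 0, 1, 1], [0, 0, 0, 0]), 4))  # 0.3333
```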

This shows that a single line is not enough to predict the character; the model needs more context. In a scene, a line is not produced in isolation but usually responds to the interlocutor. Therefore, a second dataset is generated:

Here, text1 is the interlocutor's line and text2 is the line of the character to be predicted. Using an LSTM with two inputs, here are the results on the unseen data.
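The text1/text2 pairs can be built by walking through each scene in order. This sketch assumes the interlocutor is simply whoever spoke the immediately preceding line in the same scene, and that pairs never cross scene boundaries; the article does not spell out its exact rule:

```python
def build_pairs(scene_lines):
    """scene_lines: ordered (character, line) tuples from ONE scene.

    Each line becomes text2, paired with the line spoken just before it (text1).
    The first line of a scene has no preceding line, so it produces no pair.
    """
    pairs = []
    for (_, prev_line), (char, line) in zip(scene_lines, scene_lines[1:]):
        pairs.append({"text1": prev_line, "text2": line, "character": char})
    return pairs


scene = [("ross", "hi"), ("rachel", "hey"), ("ross", "we were on a break")]
for pair in build_pairs(scene):
    print(pair)
```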

The result is better, but not significantly. The model still largely ignores Chandler, Joey, and Phoebe. One hypothesis is that these characters do not use distinctive terms, which makes them really hard for the model to distinguish. Note, however, that the model here has not been built optimally, which could also be a factor.

The one where lines are not enough

It seems that even with optimal text cleansing and an optimal model, text data alone will not be enough to predict the character. Since the premise says that the model's input is a "scene of the script", we can also use the place and the interlocutors in the scene.

The script mentions many places, sometimes with different names for the same place. For example, "Monica and Rachel's apartment" and "Monica and Chandler’s apartment" are the same place in the series. Manually labeling the places yields 13 distinct places. Here is a simple exploration of characters versus places.
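The manual labeling amounts to an alias table that maps every spelling of a place to one canonical name. A minimal sketch; the alias entries and the "other place" fallback are hypothetical examples, not the author's 13-place list. It also normalizes curly apostrophes, since the script mixes ’ and ':

```python
# Hypothetical alias table: several raw script names -> one canonical place.
PLACE_ALIASES = {
    "monica and rachel's apartment": "monica's apartment",
    "monica and chandler's apartment": "monica's apartment",
    "monica's apartment": "monica's apartment",
    "chandler and joey's apartment": "chandler and joey's apartment",
    "joey and chandler's apartment": "chandler and joey's apartment",
    "ross's apartment": "ross's apartment",
    "central perk": "central perk",
}


def normalize_place(raw_place: str) -> str:
    """Lowercase, unify apostrophes and whitespace, then look up the canonical place."""
    key = " ".join(raw_place.lower().replace("\u2019", "'").split())
    return PLACE_ALIASES.get(key, "other place")  # hypothetical fallback bucket


print(normalize_place("Monica and Chandler\u2019s apartment"))  # monica's apartment
```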

The table shows, for each place, the percentage taken up by each character. For example, Ross accounts for 41.08% of place label 4 (Ross's apartment). Similarly, Chandler and Joey dominate place label 3 (Chandler and Joey's apartment).
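A table like this is a row-normalized cross-tabulation of place against character. The stdlib sketch below assumes each row is one line of dialogue with "place" and "character" keys; the original table may count scenes instead. With pandas, `pd.crosstab(df["place"], df["character"], normalize="index")` gives the same shares as fractions.

```python
from collections import Counter, defaultdict


def character_share_by_place(rows):
    """For each place, the percentage of rows spoken by each character."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r["place"]][r["character"]] += 1
    table = {}
    for place, cnt in counts.items():
        total = sum(cnt.values())
        table[place] = {char: round(100 * n / total, 2) for char, n in cnt.items()}
    return table


rows = [{"place": 4, "character": "ross"}] * 3 + [{"place": 4, "character": "rachel"}]
print(character_share_by_place(rows))  # {4: {'ross': 75.0, 'rachel': 25.0}}
```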

Besides the places, the interactions between characters also turn out to be quite important:

One significant change happens after season 4: Chandler and Monica start interacting much more intensely. This follows the events in "London", which really change the interaction proportions. Interactions between Joey and Rachel also increase once they become roommates in season 7. Based on this analysis, we can hypothesize that the early seasons are no longer relevant for predicting the characters in seasons 9 and 10.
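The shift can be quantified as the share of a pair's interactions within each season. A minimal sketch, assuming interactions are recorded as (season, speaker, interlocutor) tuples; the pair is treated as unordered, so Monica-to-Chandler and Chandler-to-Monica both count:

```python
from collections import Counter


def pair_share_by_season(interactions, pair):
    """interactions: iterable of (season, speaker, interlocutor) tuples.

    Returns, per season, the fraction of interactions that involve exactly `pair`.
    """
    target = frozenset(pair)
    totals = Counter()
    hits = Counter()
    for season, speaker, interlocutor in interactions:
        totals[season] += 1
        if frozenset((speaker, interlocutor)) == target:
            hits[season] += 1
    return {season: hits[season] / totals[season] for season in sorted(totals)}


interactions = [
    (4, "chandler", "monica"),
    (4, "ross", "rachel"),
    (5, "monica", "chandler"),
    (5, "chandler", "monica"),
]
print(pair_share_by_season(interactions, ("monica", "chandler")))  # {4: 0.5, 5: 1.0}
```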

Based on the exploration above, here is the data after adding the person each character interacts with (label_prev) and the place of the scene.
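A sketch of how these two context features could be attached, assuming label_prev is the previous speaker in the same scene and every line in a scene shares that scene's (already normalized) place. The "none" placeholder for a scene's first line is hypothetical; the article does not say how that case is handled:

```python
def add_context_features(scene_lines, place):
    """scene_lines: ordered (character, line) tuples for ONE scene at `place`."""
    rows = []
    prev_char = "none"  # hypothetical placeholder: the first line has no previous speaker
    for char, line in scene_lines:
        rows.append({"dialogue": line, "character": char, "label_prev": prev_char, "place": place})
        prev_char = char
    return rows


for row in add_context_features([("monica", "hi"), ("chandler", "hey")], "monica's apartment"):
    print(row)
```

The names in label_prev and place would then be mapped to integer ids (or one-hot vectors) before being fed to the model.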

This third dataset is then divided into an "All seasons dataset" and an "After season 3 dataset" (seasons 1 to 3 removed) to test the hypothesis. Using the same Keras structure, the confusion matrices below compare the "All seasons dataset" (Confusion matrix 3) with the "After season 3 dataset" (Confusion matrix 4).

They show that the new features make a significant improvement; some characters even reach a recall of almost 50%. Moreover, removing seasons 1 to 3 noticeably improves the predictions for Rachel, Ross, Monica, and Chandler. Meanwhile, Joey's accuracy decreases, and the model really cannot distinguish him from Rachel, Ross, and Monica. The model also treats Phoebe almost the same as Ross.
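Per-character recall is read straight off a confusion matrix: the diagonal cell divided by its row total. The off-diagonal cells in a row show which characters a given character is confused with, e.g. Joey's row spreading onto Rachel, Ross, and Monica. A stdlib sketch using the common convention of rows as true labels and columns as predictions:

```python
def confusion_matrix(y_true, y_pred, n_labels):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_labels for _ in range(n_labels)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def recall_per_label(matrix):
    """Diagonal count divided by the row total (0.0 for labels never seen)."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


m = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_labels=2)
print(m)                    # [[1, 1], [0, 2]]
print(recall_per_label(m))  # [0.5, 1.0]
```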

The one which still needs an answer

The model for predicting characters given a scene still needs improvement. The trained model above has not been optimized, and there may be unexplored aspects of "the scene" that could improve it.

In conclusion, modelling is not only about the algorithm but also about the features, which requires understanding the case. Here, the model still cannot predict the characters well, but it improves (in terms of macro-averaged F1) from 0.24 to 0.37 thanks to more exploration of the features. Sometimes it is fine not to succeed in building a model at the beginning, since the modeller needs several iterations to learn the case and reach the best model in the end.

Written by Alamsyah Hanza