When a Robot Writes a Play: Automatically Generating a Theatre Play Script

We report on AI: When a Robot Writes a Play, a theatre play with a mostly artificially generated script. We describe the THEaiTRobot 1.0 tool, which was used to generate the script. We discuss various issues encountered in the process, including those that we solved to some extent and those that we plan to solve in a future version of the system.


The Play
On 26th February 2021, we premiered a theatre play titled AI: When a Robot Writes a Play in the Švanda Theatre in Prague, Czechia. The 60-minute play, followed by a 75-minute discussion, was streamed online via the webpage of the THEaiTRE project (Rosa et al., 2020), within which it was created. The play was performed in Czech with English subtitles.
The script (i.e., the dialogues) for the play was generated by our THEaiTRobot 1.0 system (Rosa et al., 2021) based on the GPT-2 language model (Radford et al., 2019) with very little human intervention. Specifically, our analysis (to be published soon) shows that 92% of the words in the script were generated by GPT-2. We describe some aspects of the generation process and some issues that we had to resolve. The subsequent staging of the play, on the other hand, was completely in the hands of human theatre professionals.
The THEaiTRobot 1.0 system has several limitations, such as only being able to generate scripts for individual scenes, not for a full play. We discuss some of the most important issues and our intended approach to solving them. This will eventually lead to a THEaiTRobot 2.0 system, which will be used to generate a script for a new theatre play, with a premiere planned for January 2022.

Generative Art
There is already a range of works of art created with the help of artificial intelligence. Among the most notable are the short sci-fi movie Sunspring (Benjamin et al., 2016), the musical Beyond the Fence (Colton et al., 2016), and the theatre play Lifestyle of the Richard and Family (Helper, 2018). However, to the best of our knowledge, our project is the first to produce a full-length theatre play with such a small amount of human intervention in the script.

The Generation Process
The process of generating the script of a theatre play scene starts with the user (a theatre dramaturge in our case) defining the start of the scene: a scene setting and two initial lines of dialogue. For the first play, we defined a set of inputs revolving around a common topic to ensure some basic coherence of the whole play. The THEaiTRobot tool then uses the vanilla GPT-2 XL model to generate continuing lines.
For the play, 727 lines of script were generated. The user had the option to discard any generated line (together with all subsequent lines), prompting the tool to generate a different continuation (used 46×). The user could also manually enter a line into the script, which then became part of the input for GPT-2 (used 8×). The script was then post-edited by deleting 214 lines and changing 362 words (8%) on 146 lines.
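The interactive loop described above (generate a line, discard a line together with everything after it, or enter a line manually) can be illustrated with a small data structure. This is a hypothetical sketch, not code from THEaiTRobot: the class SceneScript and its methods are illustrative names, and the line generator is assumed to be any function that maps the current prompt to one new line, standing in for GPT-2 XL.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical generator type: maps the current prompt to one new script line.
LineGenerator = Callable[[str], str]


@dataclass
class SceneScript:
    """Illustrative model of one interactively generated scene."""
    setting: str
    initial_lines: List[str]
    lines: List[str] = field(init=False, default_factory=list)
    origins: List[str] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # The setting and the user's opening lines form the initial prompt.
        self.lines = list(self.initial_lines)
        self.origins = ["prompt"] * len(self.lines)

    def prompt(self) -> str:
        # Everything so far (setting plus all lines) is the model input.
        return "\n".join([self.setting, *self.lines])

    def generate_next(self, generate: LineGenerator) -> str:
        line = generate(self.prompt())
        self.lines.append(line)
        self.origins.append("generated")
        return line

    def discard_from(self, index: int) -> None:
        # Discarding a line also drops every line after it, so generation
        # resumes from that point.
        del self.lines[index:]
        del self.origins[index:]

    def add_manual_line(self, line: str) -> None:
        # A user-written line becomes part of the input for later lines.
        self.lines.append(line)
        self.origins.append("manual")
```

Recording the origin of every line is one simple way to later report which parts of a script were written by humans, as in the statistics above.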
The tool itself is implemented as a web application with a server backend, using the Hugging Face Transformers library (Wolf et al., 2020). We have published a video showing the operation of THEaiTRobot 1.0, a sample of its outputs, and its source code. We will also soon publish the full script of the theatre play with the human interventions marked.

Resolved Issues
Set of characters
The model does not naturally work with a limited set of characters; it tends to forget characters and to invent new ones too often. We resolve this by modifying the next-token probability distribution of the GPT-2 model so that, at the start of a new line, only tokens corresponding to character names present in the input prompt are allowed. We also boost the probabilities of characters that have not spoken for a long time.
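The speaker constraint can be sketched as a function applied to the next-token scores at the start of each line. This is a simplified illustration under stated assumptions, not the actual implementation: each character name is assumed to be a single token, and the linear bonus for silent characters (the hypothetical parameter boost_per_line) is our own choice, since the exact boosting scheme is not specified here. In a Transformers-based setup, logic of this kind would typically be placed in a custom LogitsProcessor.

```python
import math
from typing import Dict, List


def restrict_to_characters(
    logits: Dict[str, float],
    characters: List[str],
    speakers_so_far: List[str],
    boost_per_line: float = 0.1,
) -> Dict[str, float]:
    """Next-speaker distribution limited to the prompt's characters.

    logits: model scores for candidate tokens at the start of a new line.
    characters: names found in the input prompt (assumed single tokens).
    speakers_so_far: speaker of each previous line, oldest first.
    boost_per_line: hypothetical bonus per line a character has been silent.
    """
    scores = {}
    for name in characters:
        # Count lines since this character last spoke (all lines if never).
        silence = len(speakers_so_far)
        for distance, speaker in enumerate(reversed(speakers_so_far)):
            if speaker == name:
                silence = distance
                break
        scores[name] = logits[name] + boost_per_line * silence
    # Softmax over the allowed names only; every other token gets probability 0.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}
```

Tokens that are not character names (including names the model invents) simply receive zero probability, while a large enough boost lets a long-silent character overtake one who spoke recently.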
Repetitiveness
Generation with GPT-2 may get stuck in a loop, producing one or several lines again and again. We managed to resolve this by modifying the generation hyperparameters of GPT-2, changing the repetition penalty from 1.00 to 1.01. As a backup, we also automatically discard any generated line that repeats an earlier one and prompt the model to generate another continuing line.
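Both safeguards can be sketched in a few lines. The first function follows the CTRL-style rule used by the repetition_penalty argument of Transformers' generate: scores of tokens already in the context are divided by the penalty if positive and multiplied by it if negative. The second is a hypothetical repeated-line filter; whether THEaiTRobot compares lines exactly, as assumed here, or more loosely is not stated in the text.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], context_ids: Iterable[int], penalty: float = 1.01
) -> List[float]:
    """Penalize every token ID that already occurs in the context.

    Positive scores are divided by the penalty and negative ones multiplied,
    so in both cases the token becomes less likely; penalty=1.0 is a no-op.
    """
    penalized = list(logits)
    for token_id in set(context_ids):
        score = penalized[token_id]
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


def first_new_line(script: List[str], candidates: Iterable[str]) -> str:
    """Return the first candidate line that does not repeat an existing line.

    candidates: hypothetical stream of freshly sampled continuations.
    """
    existing = {line.strip() for line in script}
    for candidate in candidates:
        if candidate.strip() not in existing:
            return candidate
    raise RuntimeError("every candidate repeated an earlier line")
```

With the value 1.01 used for the play, a token scored 5.0 is lowered only to about 4.95, i.e., the penalty is very mild.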

Limited context
The GPT-2 model has a limit of 1024 subword tokens, within which both the input prompt and the generated output must fit. The typical solution is to truncate the input from the beginning so that it fits into the window with sufficient space left for generating the output. However, this means forgetting potentially important information from the input prompt and from the previously generated text, which can lead to unwanted continual topic drift and to contradictory text: the text is still locally consistent, but it may be inconsistent as a whole. This severely limits the ability of GPT-2 to generate longer texts.
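The standard truncation described above amounts to keeping only the most recent tokens that still leave room for the output. A minimal sketch, assuming the prompt is already a list of token IDs and that max_new_tokens (a hypothetical parameter name here) is the space reserved for generation:

```python
from typing import List


def crop_to_window(prompt_ids: List[int], max_new_tokens: int, window: int = 1024) -> List[int]:
    """Keep only the most recent prompt tokens that leave room for the output."""
    budget = window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the window")
    # Everything before the last `budget` tokens is forgotten.
    return prompt_ids[-budget:]
```

For a 2000-token script with 100 tokens reserved for output, only the last 924 tokens survive; the first 1076, including the scene setting, are lost, which is exactly the failure mode described above.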
To handle this issue, we introduce automated extractive summarization into the process, hoping that the summarization algorithm will identify the most important pieces of information to remember. Whenever the already generated script no longer fits into the limited window, we use the TextRank algorithm by Mihalcea and Tarau (2004) to summarize a large part of the script into only 5 lines. We concatenate these with the last 250 tokens of the script and use the result as the input for the generation process. In this way, we aim to preserve both the global and the local consistency of the script.
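The summarize-then-append scheme can be sketched in plain Python. This is an illustrative reimplementation, not the project's code: it uses the sentence-similarity measure and damped iteration from Mihalcea and Tarau (2004), treats each script line as a sentence, counts whitespace-separated words instead of GPT-2 subword tokens, and keeps whole lines in the roughly 250-token tail. All of these are simplifying assumptions.

```python
import math
from typing import List


def similarity(a: List[str], b: List[str]) -> float:
    """Shared words normalized by the log lengths of both sentences."""
    overlap = len(set(a) & set(b))
    if overlap == 0:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return overlap / denom if denom > 0 else 0.0


def textrank(lines: List[str], damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Score lines with weighted PageRank over the similarity graph."""
    words = [line.lower().split() for line in lines]
    n = len(lines)
    weights = [[similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(lines: List[str], k: int = 5) -> List[str]:
    """Pick the k highest-scoring lines, kept in their original order."""
    scores = textrank(lines)
    best = sorted(range(len(lines)), key=lambda i: scores[i], reverse=True)[:k]
    return [lines[i] for i in sorted(best)]


def build_prompt(script: List[str], tail_tokens: int = 250, k: int = 5) -> str:
    """Summary of the older lines followed by the most recent lines."""
    tail: List[str] = []
    used = 0
    for line in reversed(script):
        cost = len(line.split())  # whitespace words stand in for GPT-2 tokens
        if tail and used + cost > tail_tokens:
            break
        tail.insert(0, line)
        used += cost
    head = script[: len(script) - len(tail)]
    summary = summarize(head, k) if len(head) > k else head
    return "\n".join(summary + tail)
```

Lines that share vocabulary with many other lines score highest, so the summary tends to keep recurring topics and names, while the verbatim tail preserves local continuity.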

Machine translation
The GPT-2 model operates on English, while we want to generate a Czech script. We therefore automatically translate the generated script using the CUBBITT neural translation model by Popel et al. (2020). As the translation tends to drop character names from the lines, we identify the names in the input, translate them independently, and add them back to the translated lines.
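Handling names separately from the dialogue can be sketched as follows. The translate argument is a hypothetical stand-in for a call to the CUBBITT model, the "NAME: speech" line format is an assumption, and caching each name's translation (so that a character keeps the same name throughout the script) is an illustrative design choice of this sketch.

```python
from typing import Callable, Dict


def translate_line(line: str, translate: Callable[[str], str], names: Dict[str, str]) -> str:
    """Translate a 'NAME: speech' line with the name handled separately.

    translate: hypothetical MT call (e.g. a wrapper around CUBBITT).
    names: cache of already translated character names, updated in place.
    """
    name, sep, speech = line.partition(":")
    if not sep:
        return translate(line)  # no speaker prefix found
    name = name.strip()
    if name not in names:
        names[name] = translate(name)
    # Only the speech goes through the sentence-level MT model, so the
    # name cannot be dropped from the output.
    return f"{names[name]}: {translate(speech.strip())}"
```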

Unresolved issues and future plans
Generating a whole play
The model is not able to generate a long and complex text such as a full theatre play script. To resolve this, we intend to generate the script hierarchically: first generating a synopsis for the whole play, then expanding it into synopses for individual scenes, and finally generating each scene individually based on its synopsis. This approach is inspired by the work of Fan et al. (2018, 2019), who take a similar coarse-to-fine approach to story generation. Our situation is, however, more complex, as we plan to use one more step of the hierarchy.

Character personalities
The characters in the play do not seem to have independent personalities in the generated script; the model seems to simply ensure consistency with the already generated text, without taking the character names into account. The character personalities thus appear to switch and merge. We intend to resolve this by learning theatre character embeddings and using them to condition the language model. Specifically, we plan to cluster our data into several basic character personality types (Azab et al., 2019) and then train separate character-aware language models, either by fine-tuning the GPT-2 model or by using adapter models (Madotto et al., 2020; Wang et al., 2020).

Dramatic situations
The text is generated word by word and line by line, whereas human authors of theatre plays typically operate on a more abstract level, such as dramatic situations (Polti, 1921). While there is some work on identifying dramatic turning points (Papalampidi et al., 2019, 2020), it is too coarse-grained for our application. We are thus currently annotating a corpus of theatre play scripts with a modified set of dramatic situations, and we plan to enhance the tool with this abstraction, either by adding one more layer to the hierarchical setup or by using special tokens or embeddings to mark dramatic situations in the generated text.

Machine translation issues
The MT model we use is tuned for news text, not theatre scripts, and translates each sentence independently. This leads to various issues, including errors in morphological gender (which should match the gender of the character), inconsistent use of the honorific T-V distinction (which may differ between pairs of characters but should be consistent within each pair), and erroneous sentence splitting. We intend to tackle these issues by using a document-level translation system that takes a larger context into account, fine-tuning the model on a corpus of theatre play scripts, and adding various heuristic modifications where necessary.

Conclusion
We have described the process behind generating the script for the play AI: When a Robot Writes a Play, which was created by THEaiTRobot 1.0 with minimal human intervention. We have also discussed numerous issues, some of which we managed to resolve, while others are waiting to be tackled in THEaiTRobot 2.0.