This project was completed with Amy Weng, Heidi Smith, and Jennifer Ouyang for the Spring 2023 section of COMPSCI 390: Computer Science Special Topics at Duke University.
If we wanted to be cutesy, we could start this with a cliché, but that would be like beating a dead horse, now wouldn’t it? Following van Cranenburgh’s (2018) work on the relationship between clichés and literary quality in modern Dutch literature, we wanted to explore whether this phenomenon was replicable in 19th-century English novels. Expanding on the original question, we compared markers of scholarly acclaim, quality, and popularity, based on Goodreads and Modern Language Association data, against the percentage of words in each author’s Project Gutenberg texts that belong to clichéd expressions. Our results show that, as in the original study, scholarly acclaim is negatively correlated with cliché ratio, whereas the other two predictor variables are positively correlated with it. We also summarize our findings on the clichés that appear in our corpus.
👉 The code used for this paper is available HERE
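As a rough illustration of what a "cliché ratio" can look like in practice, here is a minimal sketch that computes the share of word tokens in a text that fall inside any phrase from a cliché list. This is not the project's actual code: the function name `cliche_ratio`, the simple regex tokenizer, and the exact-phrase matching strategy are all assumptions made for the example.

```python
import re

# Hypothetical tokenizer: lowercase words, keeping internal apostrophes.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*")


def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplistic scheme)."""
    return TOKEN_RE.findall(text.lower())


def cliche_ratio(text, cliches):
    """Return the fraction of tokens in `text` covered by any cliché phrase.

    Overlapping matches are counted once, so the result stays in [0, 1].
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0

    covered = [False] * len(tokens)
    for phrase in cliches:
        words = tokenize(phrase)
        n = len(words)
        if n == 0:
            continue
        # Slide a window over the text and mark every exact phrase match.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                for j in range(i, i + n):
                    covered[j] = True

    return sum(covered) / len(tokens)


sample = "It was like beating a dead horse, and he knew it."
print(cliche_ratio(sample, ["beating a dead horse"]))
```

In this toy sentence, 4 of the 11 tokens belong to the cliché. A real pipeline would likely need more flexible matching (inflected forms, inserted words) than the exact comparison used here.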