Who Wrote this Novel? Authorship Attribution across Three Languages

Authors

  • Jacques Savoy, Institut d’informatique, Université de Neuchâtel

DOI:

https://doi.org/10.26034/tranel.2011.2792

Abstract

Based on different definitions of writing style, various authorship attribution schemes have been proposed to identify the real author of a given text or text excerpt. In this article we analyze the relative performance of word types and lemmas as features for representing styles and texts. As a second objective we compare two authorship attribution approaches: one based on principal component analysis (PCA), and a new method based on specific vocabulary (the Z score classification scheme). As a third goal we carry out our experiments on three corpora written in three different languages (English, French, and German). In the first we categorize 52 text excerpts (taken from 19th century English novels) written by nine authors. In the second we work with 44 segments taken from French novels (mainly 19th century) written by eleven authors. In the third we extract 59 German text excerpts written by 15 authors and covering the 19th and early 20th centuries. Based on these collections and two specific features (word types or lemmas), we show that the Z score method performs better than PCA and that lemmas tend to produce slightly better performance than word types.

Published

01-01-2011

How to cite

Savoy, J. (2011). Who Wrote this Novel? Authorship Attribution across Three Languages. Travaux neuchâtelois de linguistique, (55), 59–75. https://doi.org/10.26034/tranel.2011.2792

Section

Thematic article