
Robust stylometric analysis and author attribution based on tones and rimes

Hou, R., & Huang, C. R. (2020). Robust stylometric analysis and author attribution based on tones and rimes. Natural Language Engineering, 26(1), 49-71. https://doi.org/10.1017/S135132491900010X

 

Abstract

In this article, we propose an innovative and robust approach to stylometric analysis that requires no annotation and leverages lexical and sub-lexical information. In particular, we propose to leverage the phonological information of tones and rimes in Mandarin Chinese, automatically extracted from unannotated texts. The texts from different authors were represented by tones, tone motifs, and word length motifs as well as rimes and rime motifs. Support vector machines and random forests were used to establish the text classification model for authorship attribution. From the results of the experiments, we conclude that the combination of bigrams of rimes, word-final rimes, and segment-final rimes can discriminate the texts from different authors effectively when using random forests to establish the classification model. This robust approach can in principle be applied to other languages with an established phonological inventory of onsets and rimes.
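To make the feature types named in the abstract more concrete, the following is a minimal sketch of how rime bigrams and a tone sequence might be derived from text. It is not the authors' pipeline: it assumes the text has already been converted to tone-numbered pinyin syllables (for example `zhong1`), and the names `ONSETS`, `split_syllable`, `rime_bigrams`, and `tone_sequence` are hypothetical helpers introduced only for illustration. Treating `y` and `w` as onsets is a simplification of this sketch.

```python
from collections import Counter

# Pinyin initials (onsets). Two-letter initials come first so that "zh", "ch",
# and "sh" are matched before "z", "c", and "s".
# Assumption of this sketch: "y" and "w" are treated as onsets.
ONSETS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
          "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a tone-numbered pinyin syllable into (onset, rime, tone)."""
    # Assumption: a syllable without a trailing digit is neutral tone, coded "5".
    tone = syllable[-1] if syllable[-1].isdigit() else "5"
    body = syllable.rstrip("0123456789")
    for onset in ONSETS:
        # Require at least one character after the onset so a rime always exists.
        if body.startswith(onset) and len(body) > len(onset):
            return onset, body[len(onset):], tone
    return "", body, tone  # zero-onset syllable such as "an1"


def rime_bigrams(syllables):
    """Count adjacent pairs of rimes in a sequence of syllables."""
    rimes = [split_syllable(s)[1] for s in syllables]
    return Counter(zip(rimes, rimes[1:]))


def tone_sequence(syllables):
    """Concatenate the tone digits of a sequence of syllables."""
    return "".join(split_syllable(s)[2] for s in syllables)


if __name__ == "__main__":
    text = ["wo3", "men5", "xue2", "xi2", "zhong1", "wen2"]
    print(rime_bigrams(text))
    print(tone_sequence(text))
```

Counts of this kind, computed per text, could then serve as the feature vector passed to a classifier such as a random forest or a support vector machine, the two model families the abstract mentions.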

 


