Journal Articles Published
Robust stylometric analysis and author attribution based on tones and rimes
Hou, R., & Huang, C. R. (2020). Robust stylometric analysis and author attribution based on tones and rimes. Natural Language Engineering, 26(1), 49-71. https://doi.org/10.1017/S135132491900010X
Abstract
In this article, we propose an innovative and robust approach to stylometric analysis that requires no annotation and leverages both lexical and sub-lexical information. In particular, we propose to leverage the phonological information of tones and rimes in Mandarin Chinese, automatically extracted from unannotated texts. The texts from different authors were represented by tones, tone motifs, and word length motifs, as well as rimes and rime motifs. Support vector machines and random forests were used to build the text classification models for authorship attribution. From the experimental results, we conclude that the combination of bigrams of rimes, word-final rimes, and segment-final rimes can discriminate the texts of different authors effectively when random forests are used to build the classification model. This robust approach can in principle be applied to other languages with an established phonological inventory of onsets and rimes.
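The three rime-based feature types named in the abstract (bigrams of rimes, word-final rimes, and segment-final rimes) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the text has already been word-segmented and converted to tone-numbered pinyin, whereas the article extracts this information automatically from raw text. The function names, the feature-name prefixes, and the simplified onset list (which treats the orthographic y and w as onsets and ignores the u/ü distinction) are hypothetical choices made for illustration only.

```python
from collections import Counter

# Simplified list of pinyin initials (onsets); two-letter initials come first
# so that "zh" is matched before "z". Treating "y" and "w" as onsets is a
# simplifying assumption of this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a tone-numbered pinyin syllable such as 'zhong1' into (rime, tone)."""
    # Assumption: a syllable without a tone digit is treated as neutral tone "5".
    tone = syllable[-1] if syllable[-1].isdigit() else "5"
    base = syllable.rstrip("012345")
    for initial in INITIALS:
        if base.startswith(initial):
            return base[len(initial):], tone
    return base, tone  # no onset, e.g. "an1" or "er2"


def rime_features(segments):
    """Count rime bigrams, word-final rimes, and segment-final rimes.

    `segments` is a list of segments; each segment is a list of words;
    each word is a list of tone-numbered pinyin syllables.
    """
    counts = Counter()
    for words in segments:
        rimes = [split_syllable(s)[0] for word in words for s in word]
        # Bigrams over consecutive rimes within the segment.
        for first, second in zip(rimes, rimes[1:]):
            counts[f"bigram:{first}_{second}"] += 1
        # Rime of the last syllable of each word.
        for word in words:
            if word:
                counts[f"word_final:{split_syllable(word[-1])[0]}"] += 1
        # Rime of the last syllable of the segment.
        if rimes:
            counts[f"segment_final:{rimes[-1]}"] += 1
    return counts


# One segment of three words: wo3 men5 / xi3 huan1 / zhong1 wen2
example = [[["wo3", "men5"], ["xi3", "huan1"], ["zhong1", "wen2"]]]
feats = rime_features(example)
```

Count dictionaries of this kind, computed per text, could then be vectorised and passed to a random forest or support vector machine classifier, which is the classification step the abstract describes; that step is left out here.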