Automated lexical and time series modelling for critical discourse research: A case study of Hong Kong protest editorials
Abstract
This paper advances a novel approach to critical synchronic and diachronic discourse analysis using automated lexical and time series modelling. It is illustrated by a case study of near-daily editorials (N = 201; 300,081 words) from 9 June to 2 October 2019 on the Hong Kong protest movement in three ideologically contrasting sources – China Daily (CD), South China Morning Post (SCMP), and Hong Kong Free Press (HKFP). Lexical analysis with Linguistic Inquiry and Word Count (LIWC) first reveals four predominant socio-psychological word categories – relativity, drive, cognitive, and affect. Overall, HKFP expresses anger at the government, CD lays blame on protestors’ violent actions, and SCMP occupies a middle position, focusing on less political aspects. Time series modelling is then applied to redirect attention from these aggregated differences to how they unfold day-to-day. The results show that while positive affect words are characterized by short-term consistencies and fluctuations, most variables exhibit random variation across time. The approach allows precise description of how linguistic variables in neighbouring time periods inter-relate, offering rich interpretative possibilities for different linguistic/discourse contexts. Furthermore, determining whether a variable is ‘modelable’ offers a systematic and replicable way to interrogate the assumption that discourse inevitably serves to construe social reality.
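The idea of checking whether a variable is ‘modelable’ can be illustrated with a minimal sketch. It assumes, purely for illustration, that a daily LIWC category score counts as modelable when it shows significant autocorrelation under a Ljung-Box test; the abstract does not state which test or model the study uses. The function names, the lag of 5, and the synthetic series are all hypothetical, and the series are not data from the study.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no variation to correlate
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


def is_modelable(series, max_lag=5, critical=11.07):
    """Assumed criterion: reject 'random variation' when Q exceeds the
    chi-square critical value (11.07 for 5 degrees of freedom, alpha = 0.05)."""
    return ljung_box_q(series, max_lag) > critical


# Synthetic stand-ins for daily category scores (NOT data from the study).
rng = random.Random(0)
noise_series = [rng.gauss(0, 1) for _ in range(120)]  # no day-to-day structure

ar_series = [0.0]
for _ in range(119):
    # Each day depends on the previous day: short-term consistency.
    ar_series.append(0.8 * ar_series[-1] + rng.gauss(0, 1))

print("lag-1 autocorrelation (noise):", round(autocorrelation(noise_series, 1), 3))
print("lag-1 autocorrelation (AR):   ", round(autocorrelation(ar_series, 1), 3))
print("AR series modelable:", is_modelable(ar_series))
print("noise series modelable:", is_modelable(noise_series))
```

Under this assumed criterion, a series that passes would go on to be fitted with a time series model, while one that fails would be treated as varying randomly from day to day.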