Applied Corpus Linguistics: A Synergy of Corpora and Generative AI
As the pace of technological innovation reshapes how we teach, learn, and analyze language, the Department of English and Communication (ENGL) at The Hong Kong Polytechnic University has taken a leading role in fostering dialogue between tradition and innovation. In my dual capacity as Editor-in-Chief of Applied Corpus Linguistics (ACORP) and Head of ENGL, I am proud to reflect on our journal’s initiative to feature short communication pieces on the intersection of corpora and generative AI approaches.
The inspiration for this feature emerged from a pressing need to bridge two seemingly disparate worlds: the empirical rigor of corpus linguistics, with its roots in analyzing real-world language data, and the transformative potential of generative AI, which is redefining how learners and researchers engage with language. While corpus methods have long provided a bedrock of authenticity—revealing patterns in vocabulary, grammar, and cultural nuance—generative AI tools like ChatGPT or DeepSeek offer unprecedented opportunities for personalized, interactive language practice. Yet too often, these fields have advanced in parallel rather than in conversation. Our goal was to create a platform where scholars could explore their synergies, confront their tensions, and chart a collaborative path forward.
Since this ACORP feature’s launch in 2023, we have published concise, impactful contributions from leading scholars, including members of our editorial board and global experts. Prof. Phoebe Lin of ENGL contributed an important piece entitled “ChatGPT: Friend or Foe (to Corpus Linguists?)”. These contributions illustrate how corpus data can ground AI tools in authentic language use, ensuring that generated content, whether for business negotiations, essay feedback, or vocabulary drills, reflects real-world communication.
Yet the initiative has also sparked candid discussions about challenges. Generative AI’s “black-box” opacity raises questions about accountability: How do we ensure that AI-generated language advice aligns with corpus-validated norms? How do we address biases embedded in training data, which may exclude regional dialects or perpetuate stereotypes? Contributors have argued for ethical frameworks that prioritize transparency, urging developers to document corpus sources and educators to critically evaluate AI outputs. These debates resonate deeply in Hong Kong’s multilingual context, where language education must balance global English standards with respect for local linguistic identities.
Practical hurdles also persist. While large institutions increasingly adopt AI-driven tools, resource constraints can leave smaller programs behind. Our feature has highlighted this divide, with scholars calling for open-access corpora and collaborative models to democratize innovation. Such efforts align with our department’s commitment to inclusivity, as seen in our research on AI-assisted language learning for second-language speakers of English and our partnerships with regional educators.
Looking ahead, the feature will delve into emerging frontiers. How might multimodal corpora—integrating text, speech, and gesture—train AI to teach paralinguistic skills like persuasive delivery or intercultural body language? Can AI help track linguistic shifts in real time, using social media corpora to dynamically update teaching materials? Contributors are already exploring these questions through projects that blend expertise in corpus-assisted communication studies with cutting-edge AI development.
This synergy between corpora and generative AI reflects a broader ethos within ENGL and our journal: one that embraces technological change without compromising scholarly rigor. By fostering interdisciplinary collaboration—between linguists, computer scientists, and educators—we are shaping tools that enhance human creativity rather than replace it. As we move forward, we remain committed to interrogating both the promises and pitfalls of this integration.
ENGL hopes to continue leading by example, hosting workshops on corpus-informed AI pedagogy and supporting student-led innovations. In July 2025, we will host an AI in Education Summit for the Education Bureau of Hong Kong, and several ENGL staff and researchers are pursuing important initiatives in this domain. In a field often polarized between techno-optimism and skepticism, I believe that ENGL offers a third way: one where innovation is driven by evidence, ethics, and a shared dedication to empowering learners. As educators and researchers at the forefront of this movement, we are not merely observers of change—we are its architects.