
Cantonese Natural Language Processing in the Transformers Era: A Survey and Current Challenges

Xiang, R., Chersoni, E., Li, Y., Li, J., Huang, C.-R., Pan, Y., & Li, Y. (2024). Cantonese Natural Language Processing in the Transformers Era: A Survey and Current Challenges. Language Resources and Evaluation. https://doi.org/10.1007/s10579-024-09744-w


Abstract

Despite being spoken by a large population worldwide, Cantonese is under-resourced in terms of data scale and diversity compared to other major languages. This limitation has excluded it from the current "pre-training and fine-tuning" paradigm dominated by Transformer architectures. In this paper, we provide a comprehensive review of the existing resources and methodologies for Cantonese Natural Language Processing, covering recent progress in language understanding, text generation and the development of language models. Finally, we discuss two aspects of the Cantonese language that could make it challenging even for state-of-the-art architectures: colloquialism and multilinguality.
