
Prof. Emmanuele CHERSONI, Department of Chinese and Bilingual Studies

 

A General Evaluation of Distributional Semantic Models for Mandarin Chinese. 6th Academic Forum on Language Acquisition, Cognition, and Brain Science. Center for the Cognitive Science of Language (CCSL), Beijing Language and Culture University, Beijing, China, 13 December 2024.

Abstract
Since their popularization in the early 2010s, word embeddings have become a standard in Natural Language Processing (NLP), as they provide an efficient method for deriving word meaning representations from corpus data.
Based on the Distributional Hypothesis (Harris, 1954), which claims that words occurring in similar contexts also have similar meanings, the word embedding approach models the semantic similarity between lexical items as proximity in high-dimensional vector spaces. Significantly, the estimated similarities between word vectors have been shown to predict human performance in several psycholinguistic tasks. Moreover, in the era of language models, a new type of word embedding has emerged, based on contextualized vectors that can account even for subtle variations of lexical meaning in context.
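The idea of modeling semantic similarity as proximity in vector space can be sketched with a toy example: in practice, similarity between word vectors is typically measured with the cosine of the angle between them. The vectors below are invented illustrative values, not output from any real embedding model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional word vectors (toy values for illustration only)
cat = [0.7, 0.2, 0.1, 0.5]
dog = [0.6, 0.3, 0.2, 0.5]
car = [0.1, 0.9, 0.8, 0.0]

# Words appearing in similar contexts end up close together in the space,
# so "cat" should be nearer to "dog" than to "car".
print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, car))
```

Real models (e.g. word2vec or fastText) produce vectors of hundreds of dimensions, but the similarity computation is the same.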
In this contribution, I will illustrate the historical development of word embeddings, from the early days of Distributional Semantics to the contextualized vectors used by modern language models. Additionally, since most model evaluations in the literature have been carried out on English, I will present the results of our recent study on Mandarin Chinese datasets and discuss how the findings compare with similar experiments on Western languages.

