Dr Lee is an expert in voice biometrics, speech anti-spoofing, paralinguistic information processing, voice privacy and security, and speech database collection and experiment design. He received the Singapore IES Prestigious Engineering Achievement Award in 2013, and the Outstanding Service Award from IEEE ICME in 2020. He serves as an Editorial Board Member for Elsevier Computer Speech and Language (2016 – present) and as an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017 – 2021). He was the General Chair of the Speaker Odyssey 2020 Workshop.
Research highlights
Xi-vector Speaker Embedding introduces a Bayesian framework for deep speaker embedding, where the xi-vector serves as the Bayesian variant of the x-vector, incorporating an uncertainty estimate. The xi-vector represents a natural extension of the well-established x-vector by integrating the uncertainty modeling capacity of the i-vector. Thus, it is referred to as the xi-vector, pronounced as the “/zai/” vector [Paper: SPL]. The framework is extended further for speech disentanglement [Paper: NeurIPS2023].
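To give a feel for what an uncertainty-aware embedding can look like, here is a minimal sketch of precision-weighted pooling in NumPy. It is an illustration under stated assumptions, not the published implementation: it assumes each frame yields a point estimate plus a diagonal log-precision, and that a Gaussian prior is combined with the frames through standard Gaussian posterior inference. The function name `posterior_pooling` and all variable names are hypothetical.

```python
import numpy as np

def posterior_pooling(frame_means, frame_log_precisions, prior_mean, prior_log_precision):
    """Combine frame-level estimates into one utterance-level Gaussian posterior.

    frame_means:          (T, D) per-frame point estimates
    frame_log_precisions: (T, D) per-frame log-precisions (assumed diagonal)
    prior_mean:           (D,)   mean of the assumed Gaussian prior
    prior_log_precision:  (D,)   log-precision of the assumed Gaussian prior
    """
    frame_prec = np.exp(frame_log_precisions)
    prior_prec = np.exp(prior_log_precision)

    # Precisions add up: confident frames contribute more to the total.
    post_prec = frame_prec.sum(axis=0) + prior_prec

    # The posterior mean is the precision-weighted average of frames and prior.
    weighted_sum = (frame_prec * frame_means).sum(axis=0) + prior_prec * prior_mean
    post_mean = weighted_sum / post_prec
    return post_mean, post_prec

# Toy example: two frames, one dimension; the second frame is twice as confident.
means = np.array([[2.0], [6.0]])
log_prec = np.log(np.array([[1.0], [2.0]]))
embedding, precision = posterior_pooling(means, log_prec, np.zeros(1), np.zeros(1))
```

In this sketch the posterior mean plays the role of the embedding, and the posterior precision is the accompanying uncertainty estimate: frames with low precision pull the embedding less than confident ones.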
ASVspoof Challenges. Like other biometric systems, automatic speaker verification (ASV) is susceptible to various forms of spoofing, including replay, speech synthesis, and voice conversion attacks. To address these vulnerabilities, the development of spoofing countermeasures or Presentation Attack Detection (PAD) systems is crucial. The ASVspoof challenge initiative was established to promote research in anti-spoofing techniques and to offer standardized platforms for evaluating and comparing spoofing countermeasures. Dr Lee takes pride in his involvement in organizing the ASVspoof series of challenges. [Challenge website: ASVspoof.org] [Paper: T-PAMI]
CORAL+ domain adaptation in action! It is common for the domains (e.g., language, demographics) in which a speaker recognition system is deployed to differ from those on which it was trained. CORAL+ was designed to bridge this gap. It is an unsupervised adaptation algorithm that learns from a small amount of unlabelled in-domain data. [Paper: ICASSP 2019, T-IFS]
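The idea of adapting with unlabelled in-domain data can be illustrated with plain feature-level CORAL (correlation alignment), the simpler technique that CORAL+ builds on. The sketch below is not CORAL+ itself: it only whitens out-of-domain embeddings with their own covariance and re-colors them with the in-domain covariance. The function names, the mean shift to the in-domain mean, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def _sym_matrix_power(cov, power, eps=1e-8):
    """Raise a symmetric positive semi-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)  # guard against tiny or negative eigenvalues
    return (vecs * vals ** power) @ vecs.T

def coral_transform(out_domain, in_domain):
    """Align second-order statistics of out-of-domain data to the in-domain data.

    out_domain: (N, D) embeddings from the training (out-of-domain) set
    in_domain:  (M, D) unlabelled embeddings from the deployment domain
    """
    c_out = np.cov(out_domain, rowvar=False)
    c_in = np.cov(in_domain, rowvar=False)

    whiten = _sym_matrix_power(c_out, -0.5)   # remove out-of-domain correlations
    recolor = _sym_matrix_power(c_in, 0.5)    # impose in-domain correlations

    centered = out_domain - out_domain.mean(axis=0)
    # Assumption: also move the data to the in-domain mean.
    return centered @ whiten @ recolor + in_domain.mean(axis=0)

# Toy data: two domains with different covariance structure.
rng = np.random.default_rng(0)
out_domain = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
in_domain = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4)) + 1.0
adapted = coral_transform(out_domain, in_domain)
```

After the transform, the adapted out-of-domain embeddings share the in-domain covariance, so a back-end trained on them sees statistics closer to those of the deployment domain. No speaker labels from the in-domain data are needed, which is what makes the adaptation unsupervised.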