Computational Linguistics Summit in the Era of Large Language Models cum International Symposium on Collaborative Innovations between The Hong Kong Polytechnic University and The China Computer Federation

大模型時代的計算語言學高峰論壇暨香港理工大學與中國計算機學會合作創新國際研討會

Poster and Registration:

CCFPolyUPoster

Should you be interested, please click HERE to register.

 

Programme Rundown:

Day 1, 22 Aug 2024 (Thursday)

Time  Venue  Session
13:30 - 14:00  PQ306  Welcoming
14:00 - 15:15 PQ306

Seminar 1

What is Digital Humanities?

Prof. Kam Fai Wong (黃錦輝 教授) 

Zoom Meeting Link: https://polyu.zoom.us/j/85605655395?pwd=XGbSk8QSO8fp7qa8K1l9bl7thvJb8s.1
Meeting ID: 856 0565 5395
Passcode: 455340

15:30 - 16:45 PQ306

Seminar 2 (Online seminar)

Language Mechanisms from the Perspective of Cognitive Science (從認知科學看待語言機制; talk delivered in Chinese)

Prof. Guodong Zhou (周國棟 教授)

Zoom Meeting Link: https://polyu.zoom.us/j/85605655395?pwd=XGbSk8QSO8fp7qa8K1l9bl7thvJb8s.1
Meeting ID: 856 0565 5395
Passcode: 455340

Day 2, 23 Aug 2024 (Friday)

Time Venue Session 
09:15 - 10:30 PQ305

Seminar 3

Advancing LLM Evaluation: Comprehensive Evaluation on Long-Context, Multi-Turn, and Instruction-Following

Dr. Xingshan Zeng (曾幸山 博士)

10:30 - 10:45  Tea Break
10:45 - 12:00  PQ305

 Seminar 4

Metaphor and Synesthesia Analysis via Computational Linguistic Methods

Dr. Zhongqing Wang (王中卿 博士)

Zoom Meeting Link: https://polyu.zoom.us/j/85605655395?pwd=XGbSk8QSO8fp7qa8K1l9bl7thvJb8s.1
Meeting ID: 856 0565 5395
Passcode: 455340

12:00 - 14:00 Lunch Break*
14:00 - 15:15  PQ305

Seminar 5 (Online Seminar)

Knowledge Retrieval Augmentation: Paradigm and Key Technologies

Prof. Haofen Wang (王昊奮 教授)

Zoom Meeting Link: https://polyu.zoom.us/j/85605655395?pwd=XGbSk8QSO8fp7qa8K1l9bl7thvJb8s.1
Meeting ID: 856 0565 5395
Passcode: 455340

15:15 - 16:30  PQ305

Seminar 6 

On Detection of Machine Generated Text

Prof. Yue Zhang (張岳 教授 )

Zoom Meeting Link: https://polyu.zoom.us/j/85605655395?pwd=XGbSk8QSO8fp7qa8K1l9bl7thvJb8s.1
Meeting ID: 856 0565 5395
Passcode: 455340

16:30 - 16:45  Tea Break
16:45 - 18:00  PQ305

Seminar 7

Eliciting Alignments in Foundation Language Models

Dr. Derek F. Wong (黃輝 博士)

Zoom Meeting Link: https://polyu.zoom.us/j/85605655395?pwd=XGbSk8QSO8fp7qa8K1l9bl7thvJb8s.1
Meeting ID: 856 0565 5395
Passcode: 455340

*Conference lunch is not provided.


Keynote Speeches and Abstracts

Keynote Speech 1

What is Digital Humanities?

Prof. Kam Fai Wong (黃錦輝)

The Chinese University of Hong Kong

In recent years, large models like ChatGPT and GPT-4 have driven significant advancements in the field of artificial intelligence, revolutionizing various research domains. This talk begins with an introduction to the concept of Digital Humanities and discusses the impact of digital technologies on various fields within the humanities, including education, language, history, philosophy, and the arts. However, large models also present challenges such as privacy leakage, black-box behavior, and poor reliability. This talk addresses these issues by introducing methods developed by our research team, focusing on enhancing the forgettability, reliability, adaptability, multiplicity, and explainability (FRAME) of large models.


Keynote Speech 2

Language Mechanisms from the Perspective of Cognitive Science (从认知科学看待语言机制)

Prof. Guodong Zhou (周國棟)

Soochow University

The study of language mechanisms lies at the heart of cognitive science. This lecture examines the question of language mechanisms from several theoretical perspectives, including formalism, functionalism, and cognitivism. It explores in depth how these different approaches deepen our understanding of language processing in the brain and assesses their implications for computational models. By integrating insights from cognitive science, we will show how this perspective can advance natural language processing (NLP) technology and open new directions for interdisciplinary research between cognitive science and computational linguistics.

 

Keynote Speech 3

Advancing LLM Evaluation: Comprehensive Evaluation on Long-Context, Multi-Turn, and Instruction-Following

Dr. Xingshan Zeng (曾幸山)

Huawei Noah's Ark Lab

As Large Language Models (LLMs) are increasingly integrated into real-world applications, the need for comprehensive and systematic evaluation has never been more crucial. Traditional evaluations have largely focused on diverse tasks and broad knowledge domains, often neglecting the specific skills that are essential for practical applications. This talk introduces three cutting-edge benchmarks - M4LE, MT-Eval, and FollowBench - that address this gap by systematically evaluating the core skills (long-context comprehension, multi-turn conversational capabilities, and fine-grained instruction following) of LLMs.

M4LE introduces a benchmark tailored for assessing LLMs' ability to manage long sequences across diverse tasks and domains, revealing significant challenges in multi-span attention and semantic retrieval. MT-Eval shifts the focus to multi-turn interactions, highlighting how LLMs perform in complex, real-world conversational settings and identifying key factors affecting their multi-turn performance. FollowBench takes a different angle, evaluating LLMs on their ability to adhere to multi-level, fine-grained constraints in instruction following, uncovering critical weaknesses and suggesting areas for improvement. Together, these benchmarks provide a more nuanced understanding of LLM performance, highlighting critical areas for improvement and guiding future advancements in model evaluation.

 

Keynote Speech 4

Metaphor and Synesthesia Analysis via Computational Linguistic Methods

Dr. Zhongqing Wang (王中卿)

Soochow University

Metaphor, distinct from simile, is a rhetorical device that makes implicit comparisons without explicit figurative words. Textual metaphor detection refers to the automatic identification of metaphorical phenomena in text. Due to the absence of clear trigger words and the complexity of linguistic expressions in metaphors, current research in this area is still relatively preliminary. Synesthesia refers to a phenomenon in metaphor where the expressed sensation differs from the original sensation associated with a word. Detecting synesthesia requires not only textual analysis but also an understanding of cognitive science to analyze the relationships between different sensations. This report will elaborate on the corpus primarily used for metaphor detection, the latest detection methods, and recent research findings on the detection of synesthesia.

 

Keynote Speech 5

Knowledge Retrieval Augmentation: Paradigm and Key Technologies

Prof. Haofen Wang (王昊奮)

Tongji University

Knowledge retrieval augmentation technologies provide additional knowledge sources for large language models, effectively alleviating hallucination problems and issues concerning the timeliness of knowledge. These technologies have quickly become pivotal in optimizing large model practices. During technological iterations, various technologies such as Retrieval Augmented Generation (RAG), structural indexing optimization, knowledge graphs, vector databases, large model fine-tuning, and prompt engineering have been deeply integrated. Numerous functional modules have been proposed one after another, presenting a challenge for researchers to comprehensively understand RAG.

This talk aims to thoroughly review and analyze RAG from the perspectives of paradigms, key technologies, and application development, with the goal of grasping the development trends and future directions of the technology from a higher level. Through a comprehensive analysis of the current research status, we propose a research paradigm of modular RAG and RAG Flow. We summarize six major functional modules, comprising more than 50 operator operations, and distill seven typical RAG Flow design patterns from over a hundred papers, providing guidance for designing RAG systems.

Based on these paradigms, we further advance the open-source work of the OpenRAG series. We have built the OpenRAG Knowledge Base, which comprehensively covers the information required by RAG researchers and developers and offers support for highly customizable multidimensional analysis views. Additionally, we have established the OpenRAG Playground to assist researchers and engineers in quickly building cutting-edge baseline methods and rapidly validating and comparing different RAG Flows on public or custom datasets.

 

Keynote Speech 6

On Detection of Machine Generated Text

Prof. Yue Zhang (張岳)

Westlake University

With advances in large language models, machine-generated text has been spreading rapidly over the Internet and in business and educational settings. However, such text is not always desirable, and in some situations maliciously generated text can cause harm to society. We consider the task of automatically detecting machine-generated text in the open-domain setting, where a detector does not need to know the model generating the textual content, the domain of the content, or the language. We discuss both supervised settings, where the detection system learns from human-labeled data, and unsupervised settings, where it makes decisions without supervised tuning. Both evaluation settings and detection algorithms are discussed. Our final model fulfills the task with over 96% accuracy in detecting ChatGPT.

 

Keynote Speech 7

Eliciting Alignments in Foundation Language Models

Dr. Derek F. Wong (黃輝)

The University of Macau

Foundation large language models (LLMs) require supervised fine-tuning (SFT) to develop instruction-following capabilities for downstream tasks. Yet, collecting and annotating data for SFT is often expensive, especially for cross-lingual or non-English tasks. This raises a critical research question: How can we unlock and align the non-English knowledge within foundation models to serve minority groups effectively? In this seminar, we will systematically explore this issue by addressing the following questions:

  • Do foundation models possess adequate cross-lingual knowledge?
  • How do foundation LLMs compare to general-purpose SFT LLMs in handling cross-lingual tasks?
  • What unsupervised training strategies can be employed to enhance the cross-lingual capabilities of foundation LLMs?

Our discussion aims to provide insights into the development of multilingual LLMs and promote their application in low-resource settings.

 

Online Conference Handbook

Booklet CCFPolyU


  1. 中文参考语法
  2. 语言学前沿丛书
  3. 计算语言学与语言科技原文丛书
  4.  Lingua Sinica

Prof. HUANG Chu Ren 黃居仁

Chair Professor

PhD, MA, BA

Tel.: 2766 4832

Email: (address protected from spambots; enable JavaScript to view)

Office: GH513

 

Prof. CHEN Baoya (陳保亞)

Distinguished Professor of Humanities, Peking University

Director, Center for Chinese Linguistics at Peking University, a key humanities and social sciences research base of the Ministry of Education

Email: (address protected from spambots; enable JavaScript to view) (Department of Chinese Language and Literature, Peking University)

Conferences

Hosted the "Computational Linguistics Summit in the Era of Large Language Models cum International Symposium on Collaborative Innovations between The Hong Kong Polytechnic University and The China Computer Federation", held at The Hong Kong Polytechnic University on 22-23 August 2024. Conference webpage: https://www.polyu.edu.hk/cbs/rp2u2/hk/research/research-activities/18-people/56-ccf-polyu/

Hosted the 16th International Conference on Yue Dialects, held at The Hong Kong Polytechnic University on 15-16 December 2011. Conference webpage: http://www.yue2011.cbs.polyu.edu.hk

Hosted the 6th International Workshop on Theoretical East Asian Linguistics (TEAL-6), held at Peking University on 15-16 August 2010. Conference webpage: http://www.teal.cbs.polyu.edu.hk/

Co-organized the 8th International Conference on Generative Linguistics (GLOW in Asia VIII), held at Beijing Language and Culture University on 12-14 August 2010. Conference webpage: http://www.blcu.edu.cn/CLT/glow/glowasia8.html

Co-organized the symposium commemorating the 90th anniversary of the birth of Prof. Zhu Dexi and the 50th year of teaching of Prof. Lu Jianming, held at Peking University on 17-18 August 2010. Conference webpage: http://ccl.pku.edu.cn/zhulu2010/

 

Lectures

Prof. Huang Chu-Ren served as the keynote speaker of the 3rd Lingnan Academy of Chinese Linguistics (中国语言学岭南书院) on 12 December 2022. Lecture webpage: https://cuhk.edu.hk/ics/clrc/institute/2022/index.html

Prof. Huang Chu-Ren spoke in the Chair Professor Distinguished Lecture Series 2022/23 on 7 December 2022. Lecture webpage: