Yong Dai
Researcher @ Tencent AI Lab

Email: daiyongya@outlook.com

GitHub | Google Scholar
Brief Bio

I received my Ph.D. from UESTC in Chengdu, advised by Zenglin Xu. My research focuses on pre-training and fine-tuning language models for both understanding and generation tasks. Recently, I have also become interested in multi-modal processing.
News


2024.01 | We released our multi-modal web agent paper "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models".
2023.12 | We released our efficient diffusion paper for image generation "Emage: Non-Autoregressive Text-to-Image Generation".
2023.12 | We released our diversified reward training paper "On Diversified Preferences of Large Language Model Alignment".
2023.12 | We released our efficient alignment paper "Adversarial Preference Optimization".
2023.12 | We released our evaluation paper "TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs".
2023.12 | We released our reward adaptation paper "Everyone deserves a reward: Learning customized human preferences".
2023.12 | We released our long-text processing paper "Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers".
2023.06 | Our paper "SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills" is accepted to ICASSP 2024.
2022.12 | Our paper "When Federated Learning Meets Pre-trained Language Models' Parameter-Efficient Tuning Methods" is accepted to Findings of ACL 2023.
2022.10 | Our paper "Leveraging Only the Category Name for Aspect Detection through Prompt-based Constrained Clustering" is accepted to Findings of EMNLP 2022.
2022.08 | We released our technical report "Effidit: Your AI Writing Assistant".
2022.08 | We released our paper "Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors".
2022.05 | We released our multimodal data processing paper "One model, multiple modalities: A sparsely activated approach for text, sound, image, video and code".
2022.03 | We released our paper "MarkBERT: Marking Word Boundaries Improves Chinese BERT".
2022.03 | Our paper "Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction" is accepted to Findings of ACL 2022.
2022.03 | Our paper "Exploring and Adapting Chinese GPT to Pinyin Input Method" is accepted to ACL 2022.
2022.01 | Our paper "Graph fusion network for text classification" is accepted to KBS.
2021.09 | Our paper "Unsupervised sentiment analysis by transferring multi-source knowledge" is accepted to CC.
2020.10 | Our paper "Contextualize knowledge bases with transformer for end-to-end task-oriented dialogue systems" is accepted to EMNLP 2021 (Oral).
2020.04 | Our paper "Adversarial training based multi-source unsupervised domain adaptation for sentiment analysis" is accepted to AAAI 2020.
Publications --- after ChatGPT


On Diversified Preferences of Large Language Model Alignment
Dun Zeng, Yong Dai*, Pengyu Cheng, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu
arXiv:2312.07401
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu
arXiv:2311.05374
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu
arXiv:2401.13919
Adversarial Preference Optimization
Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Nan Du
arXiv:2311.08045
Everyone deserves a reward: Learning customized human preferences
Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du
arXiv:2309.03126
Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du
arXiv:2308.13191
Publications --- pre-training techniques


"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction
Yong Dai*, Linyang Li*, Cong Zhou*, Zhangyin Feng, Enbo Zhao, Xipeng Qiu, Piji Li, Duyu Tang
Findings of ACL 2022
MarkBERT: Marking Word Boundaries Improves Chinese BERT
Linyang Li, Yong Dai, Duyu Tang, Xipeng Qiu, Zenglin Xu, Shuming Shi
NLPCC 2023
Publications --- applications of PLMs


Effidit: Your AI Writing Assistant
Shuming Shi, Enbo Zhao, Duyu Tang, Yan Wang, Piji Li, Wei Bi, Haiyun Jiang, Guoping Huang, Leyang Cui, Xinting Huang, Cong Zhou, Yong Dai, Dongyang Ma
arXiv:2208.01815
One model, multiple modalities: A sparsely activated approach for text, sound, image, video and code
Yong Dai*, Duyu Tang*, Liangxin Liu*, Minghuan Tan*, Cong Zhou*, Jingquan Wang*, Zhangyin Feng, Fan Zhang, Xueyu Hu, Shuming Shi
Under review
SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills
Zhangyin Feng, Yong Dai, Fan Zhang, Duyu Tang, Xiaocheng Feng, Shuangzhi Wu, Bing Qin, Yunbo Cao, Shuming Shi
ICASSP 2024
Emage: Non-Autoregressive Text-to-Image Generation
Zhangyin Feng, Runyi Hu, Liangxin Liu, Fan Zhang, Duyu Tang, Yong Dai, Xiaocheng Feng, Jiwei Li, Bing Qin, Shuming Shi
arXiv:2312.14988
Exploring and Adapting Chinese GPT to Pinyin Input Method
Minghuan Tan*, Yong Dai*, Duyu Tang, Zhangyin Feng, Guoping Huang, Jing Jiang, Jiwei Li, Shuming Shi
ACL 2022
Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors
Cong Zhou, Yong Dai, Duyu Tang, Enbo Zhao, Zhangyin Feng, Li Kuang, Shuming Shi
arXiv:2204.12052
When Federated Learning Meets Pre-trained Language Models' Parameter-Efficient Tuning Methods
Zhuo Zhang, Yuanhang Yang, Yong Dai, Lizhen Qu, Zenglin Xu
Findings of ACL 2023
SkillNet-NLU: A sparsely activated model for general-purpose natural language understanding
Fan Zhang, Duyu Tang, Yong Dai, Cong Zhou, Shuangzhi Wu, Shuming Shi
arXiv:2203.03312
Contextualize knowledge bases with transformer for end-to-end task-oriented dialogue systems
Yanjie Gou, Yinjie Lei, Lingqiao Liu, Yong Dai, Chunxu Shen
EMNLP 2021 (Oral)
Publications --- unsupervised domain adaptation (UDA) and graph learning


Adversarial training based multi-source unsupervised domain adaptation for sentiment analysis
Yong Dai, Jian Liu, Xiancong Ren, Zenglin Xu
AAAI 2020
Unsupervised sentiment analysis by transferring multi-source knowledge
Yong Dai, Jian Liu, Jian Zhang, Hongguang Fu, Zenglin Xu
Cognitive Computation
Graph fusion network for text classification
Yong Dai, Linjun Shou, Ming Gong, Xiaolin Xia, Zhao Kang, Zenglin Xu, Daxin Jiang
Knowledge-Based Systems
Experience


05/21 - Present: Research Intern, then Researcher, Tencent AI Lab
11/20 - 04/21: Visiting student, Westlake University
09/19 - 10/20: Research Intern, Microsoft STCA (NLP Group)
09/18 - 08/19: Project lead, collaboration project with Nuance
Reviewer Services


ACL: 2022, 2023
EMNLP: 2022, 2023
COLING: 2022
AAAI: 2022, 2023, 2024
IJCAI: 2023
ECAI: 2023
ACM MM: 2023
TASLP
Knowledge-Based Systems
Neurocomputing
Neural Networks

Last updated: 2024-02-06