Ke-Han Lu
I am Ke-Han Lu, a first-year Ph.D. student at National Taiwan University, working under the supervision of Prof. Hung-Yi Lee. My research interests lie in multimodal language models, with a particular focus on cross-modal alignment and on leveraging powerful language models to enhance multimodal systems.
Research Experience
Instruction-following Speech Language Models: I am currently developing instruction-following speech language models in collaboration with NVIDIA. We proposed DeSTA [9, 12], a scalable and robust framework for training these general-purpose speech systems, and I have co-authored papers on evaluation benchmarks [7] and systems [10, 11] in this research direction. I have experience fine-tuning large-scale language models with NeMo and Megatron-LM.
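For illustration, below is a minimal PyTorch sketch of the general speech-text alignment recipe behind systems of this kind: speech-encoder features are projected into a frozen LLM's embedding space and trained with next-token prediction on paired text. All module and variable names are hypothetical, and this is a simplified sketch rather than the actual DeSTA implementation.

```python
# Minimal sketch: align speech features with a frozen causal LLM by
# training only a small adapter with next-token prediction on paired text.
# All names here are illustrative; this is not the actual DeSTA code.
import torch
import torch.nn as nn

class SpeechAdapter(nn.Module):
    """Projects speech-encoder features into the LLM embedding space."""
    def __init__(self, speech_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(speech_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, frames, speech_dim) -> (batch, frames, llm_dim)
        return self.proj(speech_feats)

def alignment_step(llm, adapter, speech_feats, text_embeds, text_labels):
    """One training step: condition the frozen LLM on the projected speech
    prefix and compute cross-entropy only over the text positions."""
    speech_embeds = adapter(speech_feats)
    inputs_embeds = torch.cat([speech_embeds, text_embeds], dim=1)
    # Mask out the speech prefix so the loss covers text tokens only.
    ignore = torch.full(speech_embeds.shape[:2], -100,
                        dtype=text_labels.dtype, device=text_labels.device)
    labels = torch.cat([ignore, text_labels], dim=1)
    # Works with Hugging Face causal LMs, which accept inputs_embeds/labels.
    return llm(inputs_embeds=inputs_embeds, labels=labels).loss
```

Keeping the LLM frozen and training only the adapter is what makes this kind of recipe scalable: the speech side learns to speak the LLM's "language" without touching the LLM itself.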
Automatic Speech Recognition: I have focused on improving the recognition accuracy of non-autoregressive ASR systems by injecting linguistic knowledge from pre-trained language models through cross-modal alignment [4] and knowledge distillation [5]. I have experience training ASR systems with ESPnet and pre-training Mandarin wav2vec 2.0 with fairseq.
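As a rough illustration of the knowledge-distillation side, the sketch below combines a standard CTC objective with a KL-divergence term that pushes the ASR student's output distribution toward a pre-trained language-model teacher. The temperature and loss weighting are illustrative assumptions, not the exact formulation from [5].

```python
# Minimal sketch: CTC training combined with token-level knowledge
# distillation from a teacher's output distribution. The temperature and
# the alpha weighting are illustrative assumptions, not the setup in [5].
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between temperature-softened distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

def total_loss(log_probs, targets, input_lens, target_lens,
               student_logits, teacher_logits, alpha: float = 0.5):
    """Weighted sum of the CTC objective and the distillation term."""
    # log_probs: (time, batch, vocab) log-softmax outputs, as F.ctc_loss expects.
    ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens, blank=0)
    return alpha * ctc + (1.0 - alpha) * kd_loss(student_logits, teacher_logits)
```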
Publications
- [12] Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
- [11] SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning
- [10] Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
- [9] DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
  Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee
  InterSpeech 2024, Paper
- [8] HypR: A comprehensive study for ASR hypothesis revising with a reference corpus
- [7] Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
- [6] Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-Text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
  Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
  ICASSP Workshop 2024, Paper
- [5] A Context-aware Knowledge Transferring Strategy for CTC-based ASR
- [4] Non-autoregressive ASR Modeling using Pre-trained Language Models for Chinese Speech Recognition
  Fu-Hao Yu, Kuan-Yu Chen, Ke-Han Lu
  IEEE/ACM Transactions on Audio, Speech, and Language Processing, Paper
- [3] A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
  Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen
  Poster spotlight, VQA Workshop, CVPR 2021, Paper, Video, Leaderboard
- [2] ntust-nlp-2 at ROCLING-2021 Shared Task: BERT-based semantic analyzer with word-level information
  Ke-Han Lu, Kuan-Yu Chen
  ROCLING 2021: Conference on Computational Linguistics and Speech Processing, Paper
- [1] A Preliminary Study of Formosa Speech Recognition Challenge 2020 – Taiwanese ASR
  Fu-Hao Yu, Ke-Han Lu, Yi-Wei Wang, Wei-Zhe Chang, Wei-Kai Huang, Kuan-Yu Chen
  International Journal of Computational Linguistics and Chinese Language Processing, Paper
Education
- National Taiwan University
  Ph.D. in Communication Engineering, Feb 2024 - Present
- National Taiwan University of Science and Technology
  M.S. in Computer Science and Information Engineering, Sep 2020 - Feb 2023
- National Taiwan University of Science and Technology
  B.S. in Computer Science and Information Engineering, Sep 2016 - Jun 2020
Awards
- NSTC Graduate Research Fellowship (NSTC-GRF)
- 16th TaiwanTech Outstanding Youth Award
Skills
- Programming: Python, PyTorch, JavaScript, LaTeX
- Software and tools: Linux, Docker, Git, NeMo, Megatron-LM, ESPnet, Hugging Face Transformers, fairseq
- Languages: Mandarin (native), English (fluent)