Junwen Bai

Mountain View, CA 94043 · junwen AT cs DOT cornell DOT edu

I'm a Research Scientist at Google. I received my PhD from the Department of Computer Science at Cornell University in 2022, advised by Prof. Carla P. Gomes. I received my Bachelor's degree in 2017 from Shanghai Jiao Tong University, where I spent four years in the ACM Class. I am interested in the general areas of machine learning and language technology, with a research focus on sequence representation learning and probabilistic modeling, often in low-supervision settings. I have developed scalable and general machine learning methods for real-world problems including automatic speech recognition, climate change, and scientific discovery. My full CV can be found here.


Technical Reports:

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team Google: ..., Trevor Strohman, Junwen Bai, Slav Petrov, Yonghui Wu, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals
Google Blog: Tech Report, 2024.


Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland

The 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024.

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024.

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang

Advances in Neural Information Processing Systems (NeurIPS), 2023.

Efficient Domain Adaptation for Speech Foundation Models

Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.

Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction

Junwen Bai, Yuanqi Du, Yingheng Wang, Shufeng Kong, John Gregoire, Carla Gomes

NeurIPS Workshop on AI for Science, 2022.

Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification

Junwen Bai, Shufeng Kong, Carla Gomes

International Conference on Machine Learning (ICML), 2022.


A workshop version was presented at NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2021.

Joint Unsupervised and Supervised Training for Multilingual ASR

Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.

A GNN-RNN Approach for Harnessing Geospatial and Temporal Information: Application to Crop Yield Prediction

Joshua Fan*, Junwen Bai*, Zhiyun Li*, Ariel Ortiz-Bobea, Carla Gomes

AAAI Conference on Artificial Intelligence (AAAI), 2022.


A workshop version won Best ML Innovation Paper award at NeurIPS workshop on Tackling Climate Change with Machine Learning, 2021.

Scaling End-to-End Models for Large-Scale Multilingual ASR

Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021.

Contrastively Disentangled Sequential Variational Autoencoder

Junwen Bai, Weiran Wang, Carla Gomes

Advances in Neural Information Processing Systems (NeurIPS), 2021.

Representation Learning for Sequence Data with Deep Autoencoding Predictive Components

Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong

International Conference on Learning Representations (ICLR), 2021.

HOT-VAE: Learning High-Order Label Correlation for Multi-Label Classification via Attention-Based Variational Autoencoders

Wenting Zhao, Shufeng Kong, Junwen Bai, Daniel Fink, Carla Gomes

AAAI Conference on Artificial Intelligence (AAAI), 2021.

Disentangled Variational Autoencoder based Multi-Label Classification with Covariance-Aware Multivariate Probit Model

Junwen Bai, Shufeng Kong, Carla Gomes

International Joint Conference on Artificial Intelligence - Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI), 2020. (Acceptance rate: 12.6%)

Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation

Shufeng Kong, Junwen Bai, Jae Hee Lee, Di Chen, Andrew Allyn, Michell Stuart, Malin Pinsky, Kathy Mills, Carla Gomes

International Joint Conference on Artificial Intelligence - Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI), 2020. (Acceptance rate: 12.6%)

SWALP: Stochastic Weight Averaging in Low-Precision Training

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Wilson, Chris De Sa

International Conference on Machine Learning (ICML), 2019.

Imitation Refinement For X-Ray Diffraction Signal Processing

Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, John Gregoire, Carla P. Gomes

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

An Efficient Relaxed Projection Method for Constrained Non-negative Matrix Factorization with Application to the Phase-Mapping Problem in Materials Science

Junwen Bai, Sebastian Ament, Guillaume Perez, John M. Gregoire, Carla P. Gomes

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), 2018.

Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery

Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), 2017.

Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery

Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert B. van Dover, John Gregoire, Carla P. Gomes

AAAI Conference on Artificial Intelligence (AAAI), 2017. (Innovative Application Award)


CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures

Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K Suram, R Bruce van Dover, John M Gregoire

In MRS Communications 9 (2), 600-608, 2019.

Phase Mapper: Accelerating Materials Discovery with AI

Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire, Carla P. Gomes

In AI Magazine 39 (1), 15-26, 2018. (Cover story)

Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System

Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, Robert B. van Dover, Carla P. Gomes, John M. Gregoire

American Chemical Society Combinatorial Science 19 (1), 37-46, 2017. (Editor's choice)

Professional Services


PC/reviewer: IJCAI '20, AAAI '21, IJCAI '21, ICML '21, NeurIPS '21 (Outstanding Reviewer), AAAI '22, SAS@AAAI '22, ICLR '22, ICASSP '22, IJCAI '22, AI4Good@IJCAI '22, ICML '22, NeurIPS '22, AAAI '23, ICLR '23, ICASSP '23, IJCAI '23, ICML '23, NeurIPS '23, ASRU '23, ICLR '24, ICASSP '24, Interspeech '24, ICML '24, NeurIPS '24, SLT '24

Journal reviewer: Journal of Chemometrics and Intelligent Laboratory Systems, Computational Materials Science, Transactions on Image Processing (TIP), Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Journal of Selected Topics in Signal Processing (JSTSP), Transactions on Machine Learning Research (TMLR), GeoInformatica, Transactions on Audio, Speech and Language Processing (TASL), IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), Neurocomputing, Knowledge-Based Systems, SN Computer Science, International Journal of Applied Earth Observation and Geoinformation (JAG), Applied Energy

Session Chair: IJCAI '20, ICML '22


Cornell University

PhD, Computer Science
July 2017 - Aug 2022

Shanghai Jiao Tong University

Bachelor's, Computer Science
Sept 2013 - July 2017