Junwen Bai

Mountain View, CA 94043     ·     junwen AT cs DOT cornell DOT edu

I'm a Senior Research Scientist at Google DeepMind (GDM). I received my PhD from the Department of Computer Science at Cornell University in 2022, advised by Prof. Carla P. Gomes. I received my Bachelor's degree in 2017 from Shanghai Jiao Tong University, where I spent four years in the ACM Class. I am interested in the general areas of machine learning and language technology, with a research focus on sequence representation learning and probabilistic modeling, often in low-supervision settings. I have developed scalable and general machine learning methods for real-world problems including automatic speech recognition, climate change, and scientific discovery. My full CV can be found here.


Publications

* denotes equal contribution.

Technical Reports:

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team Google: ..., Trevor Strohman, Junwen Bai, Slav Petrov, Yonghui Wu, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals
Google Blog: Tech Report, 2024.

Conferences:

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N Sainath, Philip C Woodland

The 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024.

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024.

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang

Advances in Neural Information Processing Systems (NeurIPS), 2023.

Efficient Domain Adaptation for Speech Foundation Models

Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.

Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction

Junwen Bai, Yuanqi Du, Yingheng Wang, Shufeng Kong, John Gregoire, Carla Gomes

NeurIPS Workshop on AI for Science, 2022.

Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification

Junwen Bai, Shufeng Kong, Carla Gomes

International Conference on Machine Learning (ICML), 2022.

A workshop version was presented at the NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2021.

Joint Unsupervised and Supervised Training for Multilingual ASR

Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.

A GNN-RNN Approach for Harnessing Geospatial and Temporal Information: Application to Crop Yield Prediction

Joshua Fan*, Junwen Bai*, Zhiyun Li*, Ariel Ortiz-Bobea, Carla Gomes

AAAI Conference on Artificial Intelligence (AAAI), 2022.

A workshop version won the Best ML Innovation Paper award at the NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2021.

Scaling End-to-End Models for Large-Scale Multilingual ASR

Bo Li, Ruoming Pang, Tara N. Sainath, Anmol Gulati, Yu Zhang, James Qin, Parisa Haghani, W. Ronny Huang, Min Ma, Junwen Bai

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021.

Contrastively Disentangled Sequential Variational Autoencoder

Junwen Bai, Weiran Wang, Carla Gomes

Advances in Neural Information Processing Systems (NeurIPS), 2021.

Representation Learning for Sequence Data with Deep Autoencoding Predictive Components

Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong

International Conference on Learning Representations (ICLR), 2021.

HOT-VAE: Learning High-Order Label Correlation for Multi-Label Classification via Attention-Based Variational Autoencoders

Wenting Zhao, Shufeng Kong, Junwen Bai, Daniel Fink, Carla Gomes

AAAI Conference on Artificial Intelligence (AAAI), 2021.

Disentangled Variational Autoencoder based Multi-Label Classification with Covariance-Aware Multivariate Probit Model

Junwen Bai, Shufeng Kong, Carla Gomes

International Joint Conference on Artificial Intelligence - Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI), 2020. (Acceptance rate: 12.6%)

Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation

Shufeng Kong, Junwen Bai, Jae Hee Lee, Di Chen, Andrew Allyn, Michelle Stuart, Malin Pinsky, Kathy Mills, Carla Gomes

International Joint Conference on Artificial Intelligence - Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI), 2020. (Acceptance rate: 12.6%)

SWALP: Stochastic Weight Averaging in Low-Precision Training

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Wilson, Chris De Sa
International Conference on Machine Learning (ICML), 2019.

Imitation Refinement For X-Ray Diffraction Signal Processing

Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, John Gregoire, Carla P. Gomes
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

An Efficient Relaxed Projection Method for Constrained Non-negative Matrix Factorization with Application to the Phase-Mapping Problem in Materials Science

Junwen Bai, Sebastian Ament, Guillaume Perez, John M. Gregoire, Carla P. Gomes
International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), 2018.

Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery

Junwen Bai, Johan Bjorck, Yexiang Xue, Santosh K. Suram, John M. Gregoire, Carla P. Gomes
International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), 2017.

Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery

Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert B. van Dover, John Gregoire, Carla P. Gomes
AAAI Conference on Artificial Intelligence (AAAI), 2017. (Innovative Application Award)

Journals:

CRYSTAL: a multi-agent AI system for automated mapping of materials' crystal structures

Carla P. Gomes, Junwen Bai, Yexiang Xue, Johan Björck, Brendan Rappazzo, Sebastian Ament, Richard Bernstein, Shufeng Kong, Santosh K Suram, R Bruce van Dover, John M Gregoire
In MRS Communications 9 (2) 600-608, 2019.

Phase Mapper: Accelerating Materials Discovery with AI

Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, R. Bruce van Dover, John M. Gregoire, Carla P. Gomes
In AI Magazine 39 (1), 15-26, 2018. (Cover story)

Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System

Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, Robert B. van Dover, Carla P. Gomes, John M. Gregoire
In ACS Combinatorial Science 19 (1), 37-46, 2017. (Editor's choice)

Professional Services

SPC: IJCAI '21

PC/reviewer: IJCAI '20, AAAI '21, IJCAI '21, ICML '21, NeurIPS '21 (Outstanding Reviewer), AAAI '22, SAS@AAAI '22, ICLR '22, ICASSP '22, IJCAI '22, AI4Good@IJCAI '22, ICML '22, NeurIPS '22, AAAI '23, ICLR '23, ICASSP '23, IJCAI '23, ICML '23, NeurIPS '23, ASRU '23, ICLR '24, ICASSP '24, Interspeech '24, ICML '24, NeurIPS '24, SLT '24

Journal reviewer: Journal of Chemometrics and Intelligent Laboratory Systems, Computational Materials Science, Transactions on Image Processing (TIP), Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Journal of Selected Topics in Signal Processing (JSTSP), Transactions on Machine Learning Research (TMLR), GeoInformatica, Transactions on Audio, Speech and Language Processing (TASL), IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), Neurocomputing, Knowledge-Based Systems, SN Computer Science, International Journal of Applied Earth Observation and Geoinformation (JAG), Applied Energy

Session Chair: IJCAI '20, ICML '22


Education

Cornell University

PhD, Computer Science
July 2017 - Aug 2022

Shanghai Jiao Tong University

Bachelor's, Computer Science
Sept 2013 - July 2017