Homepage of Zhen Dong
PhD & Postdoc at Berkeley AI Research
Research Interests
Large language model (LLM) compression.
Efficient deep learning for generative models (Vision & NLP).
Hardware-software co-design for efficient AI chips.
Education
University of California at Berkeley:
Visual Object and Activity Recognition (4.00)
RISC-V CPU on FPGA Lab (4.00)
Digital Circuits and Computer Architecture (4.00)
Applications of Parallel Computers (4.00)
Statistical Learning Theory (4.00)
Convex Optimization and Approximation (4.00)
Peking University: (Rank 1/327 in EECS)
Digital Logic (4.00)
Principles of Digital Integrated Circuits (4.00)
Analog Circuits (3.99)
Advanced Analog Integrated Circuits Design (3.99)
Micro-Nano Integrated System (4.00)
Fundamentals of Solid State Physics (3.98)
Fundamentals of Semiconductor Materials (3.97)
Physics of Semiconductor (3.98)
Semiconductor Device Physics (3.98)
Principle of Integrated Circuits Process (3.99)
Awards
- Winner of 2018-2020 Berkeley Fellowship.
- Winner of PhD Forum (Second Place) at DAC 2024.
- Doctoral Consortium at CVPR 2024.
- Best Paper Nomination at Practical DL Workshop at AAAI 2023.
- AWS Research Credits Award and Google Cloud Research Credits Award.
- Tang Lixin Scholarship for outstanding students in China. (top 0.5%)
- Tang Lixin 1st Prize Scholarship for graduate students studying abroad. (top 0.05%)
- SenseTime Scholarship, National Scholarship and Fang Zheng Scholarship. (top 1%)
- Pacemaker to Triple-A student and Triple-A student (twice) at Peking University.
- 1st Place in EMCC 2020 Competition on both Classification and Object Detection tracks.
- 2nd Place in Visual Wake Word Challenge at CVPR 2019.
- 1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.
- Princeton University Math Competition (PUMac): Top three among all participants in geometry group.
- Top Ten Undergraduate Research Award at PKU EECS.
- Outstanding Graduates at Peking University and Outstanding Graduates in Beijing.
Publications
- Zhikai Li, Xuewen Liu, Dongrong Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong✉. “K-Sort Arena: Efficient and reliable benchmarking for generative models via K-wise human preferences,” arXiv 2024.
- Chenyu Wang*, Zhen Dong*✉, Daquan Zhou*✉, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer. “EPIM: Efficient Processing-In-Memory Accelerators based on Epitome,” DAC 2024.
- Lin Xu, Zhiyuan Hu, Daquan Zhou✉, Hongyu Ren, Zhen Dong✉, Kurt Keutzer, See-Kiong Ng, Jiashi Feng. “MAgIC: Investigation of large language model powered multi-agent in cognition, adaptability, rationality and collaboration,” EMNLP 2024.
- Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou, Mike Zheng Shou. “Integrating View Conditions for Image Synthesis,” IJCAI 2024.
- Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney, Jovan Mitrevski, Nhan Tran. “End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs,” ACM Transactions on Reconfigurable Technology and Systems (TRETS) 2024.
- Junyi Yao, Yijiang Liu, Zhen Dong, Mingfei Guo, Jiashi Feng, Kurt Keutzer, Li Du, Daquan Zhou, Shanghang Zhang. “PromptCoT: Align prompt distribution via adapted chain of thought,” CVPR 2024.
- Yuzhang Shang, Zhihang Yuan, Qiang Wu, Zhen Dong✉. “PB-LLM: Partially Binarized Large Language Models,” ICLR 2024.
- Lutfi Erdogan, VAR Kanakagiri, Kurt Keutzer, Zhen Dong✉. “Stochastic Communication Avoidance for Recommendation Systems,” IEEE CAI 2024.
- Ze Ma, Daquan Zhou✉, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong✉, Kurt Keutzer, Jiashi Feng. “Magic-Me: Identity-Specific Video Customized Diffusion,” AI4VA Workshop at ECCV 2024.
- Rongyu Zhang, Yulin Luo, Huanrui Yang, Zhen Dong, … & Shanghang Zhang. “Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-wise Linear Modulation,” AAAI 2024.
- Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer. “SqueezeLLM: Dense-and-Sparse Quantization,” ICML 2024.
- Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Shanghang Zhang, Kurt Keutzer. “Split-Ensemble: Efficient OOD-aware ensemble via task and model splitting,” ICML 2024.
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer. “LLM Inference Unveiled: Survey and Roofline Model Insights,” arXiv 2024.
- Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang. “Fisher-aware Quantization for DETR Detectors with Critical-category Objectives,” WANT Workshop at ICML 2024.
- Yifan Zhang*, Zhen Dong*, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yandong Guo, Kurt Keutzer, Li Du, Shanghang Zhang. “QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection,” ICCV 2023.
- Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer. “Q-Diffusion: Quantizing Diffusion Models,” ICCV 2023.
- Venkat Srinivasan, Zhen Dong, Banghua Zhu, Brian Yu, Hanzi Mao, Damon Mosk-Aoyama, Kurt Keutzer, Jiantao Jiao, Jian Zhang. “NexusRaven: A Commercially-Permissive Language Model for Function Calling,” FMDM Workshop & Instruction Workshop at NeurIPS 2023.
- Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. “NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers,” CVPR 2023.
- Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. “CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification,” DAC 2023.
- Yifan Zhang*, Zhen Dong*, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yandong Guo, Kurt Keutzer, Li Du, Shanghang Zhang. “QD-BEV: Quantization-aware View-guided Distillation for 3D Object Detection,” Best Paper Nomination, Practical DL Workshop at AAAI 2023.
- Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael Mahoney, Jovan Mitrevski and Nhan Tran. “End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs,” OSCAR Workshop at ISCA 2023.
- Zhen Dong*, Kaicheng Zhou*, Guohao Li*, Qiang Zhou, Mingfei Guo, Bernard Ghanem, Kurt Keutzer, Shanghang Zhang. “UnrealNAS: Can We Search Neural Architectures with Unreal Data?” DAC 2023 Workshop.
- Tian Li, Xiang Chen, Zhen Dong, Weijiang Yu, Yijun Yan, Shanghang Zhang, Kurt Keutzer. “Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data,” Long Oral, IJCAI-ECAI 2022.
- Zhen Dong. “Hardware-aware Efficient Deep Learning,” PhD Thesis, 2022.
- Shixing Yu*, Zhewei Yao*, Amir Gholami*, Zhen Dong*, Michael W. Mahoney, and Kurt Keutzer. “Hessian-Aware Pruning and Optimal Neural Implant,” Oral, WACV 2022.
- Allison McCarn Deiana, Nhan Tran, … Zhen Dong, … Olivia Weng. “Applications and Techniques for Fast Machine Learning in Science,” Frontiers in Big Data 2022.
- Zhen Dong*, Yizhao Gao*, Qijing Huang, John Wawrzynek, Hayden K.H. So, Kurt Keutzer. “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference,” Oral, FCCM 2021.
- Zhen Dong*, Dequan Wang*, Qijing Huang*, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek. “CoDeNet: Algorithm-hardware Co-design for Deformable Convolution,” Oral, FPGA 2021.
- Zhewei Yao*, Zhen Dong*, Zhangcheng Zheng*, Amir Gholami*, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer. “HAWQV3: Dyadic Neural Network Quantization,” ICML 2021.
- Amir Gholami*, Sehoon Kim*, Zhen Dong*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer. “A Survey of Quantization Methods for Efficient Neural Network Inference,” BLPCV (Book of Low-Power Computer Vision) 2021.
- Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer. “Cross-Domain Sentiment Classification with In-Domain Contrastive Learning,” short version at NeurIPS 2020 SSL Workshop, long version at ICASSP 2021.
- Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks,” NeurIPS 2020.
- Yaohui Cai*, Zhewei Yao*, Zhen Dong*, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “ZeroQ: A Novel Zero Shot Quantization Framework,” CVPR 2020.
- Sheng Shen*, Zhen Dong*, Jiayu Ye*, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT,” Spotlight, AAAI 2020.
- Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Yaohui Cai, Michael Mahoney, Kurt Keutzer. “Trace Weighted Hessian-Aware Quantization,” Oral, Opt-Workshop, NeurIPS 2019.
- Q. Huang, D. Wang, Y. Gao, Y. Cai, Zhen Dong, B. Wu, K. Keutzer and J. Wawrzynek. “Algorithm-hardware Co-design for Deformable Convolution,” Oral, EMC2-Workshop, NeurIPS 2019.
- Zhen Dong, Yaohui Cai, Amir Gholami, Tianjun Zhang, Kurt Keutzer. “Ultra-low Bit Quantization for Visual Wake Word Challenge,” 2nd Place at VWW Competition, CVPR 2019.
- Zhen Dong*, Zhewei Yao*, Amir Gholami*, Michael W. Mahoney, Kurt Keutzer. “HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision,” ICCV 2019.
- Runze Han, Peng Huang, Yachen Xiang, Chen Liu, Zhen Dong, et al. “A Novel Convolution Computing Paradigm Based on NOR Flash Array with High Computing Speed and Energy Efficiency,” IEEE Transactions on Circuits and Systems (TCAS) 2019, p.1-12.
- Zhen Dong, Zheng Zhou, Zefan Li, Peng Huang, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “Convolutional Neural Networks for Image Recognition and Online Learning Based on RRAM Devices,” IEEE Transactions on Electron Devices (TED) 2018, p.793-801.
- Jinfeng Kang, Zhen Dong, Peng Huang, Runze Han, Lifeng Liu, Xiaoyan Liu. China patent about 3D RRAM.
- Huang, P., Li, Z., Zhen Dong, Han, R., Zhou, Z., Zhu, D., Liu, L., Liu, X. and Kang, J. “Binary Resistive Switching Device Based Electronic Synapse with Spike-Rate-Dependent-Plasticity for Online Learning,” ACS Applied Electronic Materials 2018, pp. 845-853.
- Xinxin Wang, Peng Huang, Zhen Dong, Zheng Zhou, Yuning Jiang, Runze Han, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “A Novel RRAM-based Adaptive-Threshold LIF Neuron Circuit for High Recognition Accuracy,” International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA) 2018, pp. 1-2.
- Zhen Dong, Z. Zhou, Z. F. Li, C. Liu, Y. N. Jiang, P. Huang, L. F. Liu, X. Y. Liu, and J. F. Kang. “RRAM based convolutional neural networks for high accuracy pattern recognition and online learning tasks,” Oral, VLSI-SNW 2017, pp. 145-146. IEEE, 2017.
- Zheng Zhou, Chen Liu, Wensheng Shen, Zhen Dong, Zhe Chen, Peng Huang, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “The Characteristics of Binary Spike-Time-Dependent Plasticity in HfO2-Based RRAM and Applications for Pattern Recognition,” Nanoscale Research Letters (NRL) 2017, 12(1), p.244.
- P. Huang, D. B. Zhu, C. Liu, Z. Zhou, Zhen Dong, H. Jiang, W. S. Shen, L. F. Liu, X. Y. Liu, and J. F. Kang. “RTN based Oxygen Vacancy Probing Method for Ox-RRAM Reliability Characterization and Its Application in Tail Bits,” International Electron Devices Meeting (IEDM) 2017, pp. 21-4.
Research Experience
Talks, Media & Events:
- Q-Diffusion is featured in the newest TensorRT post.
- I co-organized the 3rd Workshop on Practical Deep Learning: Towards Efficient and Reliable LLMs at IEEE Conference on Artificial Intelligence (IEEE CAI) 2024.
- NexusRaven-V2-13B is presented at NeurIPS 2023 EXPO.
- Media on NexusRaven and NexusRaven-V2: Nexusflow.AI Official Blog, Deci AI Top 10 Under-13B LLMs, Medium Article 1, Medium Article 2, Siliconangle Article, Yahoo Finance Article, Business Wire Article, Huggingface’s Post, Mark Tech Post, Analytics Vidhya Article, Together AI’s Post, Post on YC Hacker News, Meta Llama’s Newsletter, Ollama AI’s Post, etc.
- Invited Talk “Efficient Inference and Training of Large Neural Network Models” at Intel oneAPI DevSummit for AI and HPC, on Aug 21, 2023.
- Invited Talk “Hardware-Aware Efficient Deep Learning” at Peking University Institute of Artificial Intelligence (PKU-IAI), on June 11, 2023.
- I co-organized the LOVEU (LOng-form VidEo Understanding) workshop at CVPR 2023, Link to Zhihu.
- Invited to host the Practical DL Workshop at AAAI 2023 in Washington DC.
- Invited Talk “Efficient Deep Learning via Quantization and HW-SW Co-Design” at Hardware and Algorithms for Learning On-a-chip Workshop (HALO) in ICCAD 2022.
- Invited Talk “Efficient Inference and Training of Large Neural Network Models” at Intel oneAPI DevSummit for AI and HPC, on Dec 6, 2022.
- My dissertation on “Hardware-aware Efficient Deep Learning” was defended on June 29, 2022.
- “Efficient Neural Networks through Systematic Quantization and Co-Design”, virtually at Matchlab (Imperial College London), [slides].
- CoDeNet and HAO are presented at ML@B Seminar (Machine Learning at Berkeley).
- “Hessian-Aware Pruning and Optimal Neural Implant”, WACV 2022, Hawaii, US, [slides].
- Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2021, Berkeley, US.
- The book that I contributed to, “Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence”, is online for ordering.
- “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference”, FCCM 2021 (online).
- “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks”, NeurIPS 2020.
- HAWQ-V2 gets recommended by JiangMen (将门) AI media (in Chinese), Link to Zhihu.
- “Systematic Neural Network Quantization”, NVIDIA GTC 2021.
- “Efficient Neural Networks through Systematic Quantization”, BAIR/CPAR/BDD Seminar 2020, [slides].
- “HAWQ-V3: Dyadic Neural Network Quantization” is presented at TVM Conference 2020.
- “ZeroQ: A novel Zero-Shot Quantization Framework”, Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2020, Lake Tahoe (online), US, [slides].
- Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2020, Santa Rosa, US.
- “Q-BERT: Hessian Based Quantization of BERT”, AAAI 2020, New York, US, [slides].
- Q-BERT gets recommended by Synced (机器之心) AI media (in Chinese), Link to WeChat.
- Q-BERT gets recommended by AI.Science (Aggregate Intellect), Link to YouTube.
- “Hessian-Aware trace-Weighted Quantization”, Beyond First-Order Methods in ML Workshop at NeurIPS 2019, Vancouver, Canada.
- Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2019, Monterey, US.
- Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2019, Berkeley, US.
- Visual Wake Word Challenge, LPIRC Workshop at CVPR 2019, Long Beach, US, [slides], [link].
- “RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks”, VLSI-SNW 2017, Kyoto, Japan, [slides].
Industry Collaborations
- Intel, Amazon, Alibaba, NVIDIA, Panasonic, ByteDance, Google, Meta, Apple, AMD, Nexusflow.ai, Samsung, Tesla.
Opensource
- Magic-Me: [github][website][demo], voted best in Huggingface Daily Paper Recommendations.
- NexusRaven: [github][huggingface], NexusRaven-V2: [github].
- NexusRaven-V2-13B: [huggingface][demo][leaderboard], 378 likes, 61k+ downloads. Ranked in the top 5 on Huggingface Trending when released.
- SqueezeLLM: Dense-and-Sparse Quantization, [github][paper].
- Q-Diffusion: Quantizing Diffusion Models, [github][paper].
- Awesome Quantization Papers, [github].
- LOVEU-TGVE (Text-Guided Video Editing) dataset and benchmark, [github][homepage].
- HAWQV3: Dyadic Neural Network Quantization, [github][paper].
- ZeroQ: A novel Zero-Shot Quantization Framework, [github][paper].
- CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs, [github][paper].
- HAP: Hessian-Aware Pruning and Optimal Neural Implant, [github][paper].
- BitPack: Tool to efficiently save ultra-low precision/mixed-precision quantized models, [github].
Service
- Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), PR (Pattern Recognition), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology) and Fundamental Research (Elsevier).
- Reviewer for NeurIPS, ICML, CVPR, ICCV, AAAI, ECCV, IJCAI, ICLR, WACV, KDD, MLSys, TinyML, ECV, BLPCV.
- TA for Applications of Parallel Computers, Berkeley CS 267.
- TA for Online Course Applications of Parallel Computers on Moodle XSEDE.
- TA for Optimization Analytics, Berkeley INDENG 240.
- TA for Mathematical Programming, Berkeley INDENG 262A.
- BAIR Mentoring Program for Underrepresented Undergraduates.