Homepage of Zhen Dong
PhD at UC Berkeley
Efficient deep learning and hardware-software co-design.
Model compression for LLMs and AIGC models.
Computer architectures beyond Von Neumann, such as in-memory computing.
University of California at Berkeley:
Visual Object and Activity Recognition (4.00)
RISC-V CPU on FPGA Lab (4.00)
Digital Circuits and Computer Architecture (4.00)
Applications of Parallel Computers (4.00)
Statistical Learning Theory (4.00)
Convex Optimization and Approximation (4.00)
Peking University: (Rank 1/327 in EECS)
Digital Logic (4.00)
Principles of Digital Integrated Circuits (4.00)
Analog Circuits (3.99)
Advanced Analog Integrated Circuits Design (3.99)
Micro-Nano Integrated System (4.00)
Fundamentals of Solid State Physics (3.98)
Fundamentals of Semiconductor Materials (3.97)
Physics of Semiconductor (3.98)
Semiconductor Device Physics (3.98)
Principle of Integrated Circuits Process (3.99)
Winner of 2018-2020 Berkeley Fellowship.
Best Paper Nomination at Practical DL Workshop at AAAI 2023.
AWS Research Credits Award and Google Cloud Research Credits Award.
Tang Lixin Scholarship for outstanding students in China. (top 0.5%)
Tang Lixin 1st Prize Scholarship for graduate students studying abroad. (top 0.05%)
SenseTime Scholarship, National Scholarship and Fang Zheng Scholarship. (top 1%)
Pacemaker to Triple-A student and Triple-A student (twice) at Peking University.
1st Place in EMCC 2020 Competition on both Classification and Object Detection tracks.
2nd Place in Visual Wake Word Challenge at CVPR 2019.
1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.
Princeton University Math Competition (PUMac): Top three among all participants in geometry group.
Top Ten Undergraduate Research Award at PKU EECS.
Outstanding Graduate at Peking University and Outstanding Graduate in Beijing.
- Yifan Zhang*, Zhen Dong*, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yandong Guo, Kurt Keutzer, Li Du, Shanghang Zhang. “QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection,” ICCV 2023.
- Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer. “Q-Diffusion: Quantizing Diffusion Models,” ICCV 2023.
- Venkat Srinivasan, Zhen Dong, Banghua Zhu, Brian Yu, Hanzi Mao, Damon Mosk-Aoyama, Kurt Keutzer, Jiantao Jiao, Jian Zhang. “NexusRaven: A Commercially-Permissive Language Model for Function Calling,” FMDM Workshop & Instruction Workshop at NeurIPS 2023.
- Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer. “SqueezeLLM: Dense-and-Sparse Quantization,” arXiv 2023.
- Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. “NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers,” CVPR 2023.
- Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. “CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification,” DAC 2023.
- Yifan Zhang*, Zhen Dong*, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yandong Guo, Kurt Keutzer, Li Du, Shanghang Zhang. “QD-BEV: Quantization-aware View-guided Distillation for 3D Object Detection,” Best Paper Nomination, Practical DL Workshop at AAAI 2023.
- Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael Mahoney, Jovan Mitrevski and Nhan Tran. “End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs,” OSCAR Workshop at ISCA 2023.
- Tian Li, Xiang Chen, Zhen Dong, Weijiang Yu, Yijun Yan, Shanghang Zhang, Kurt Keutzer. “Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data,” Long Oral, IJCAI-ECAI 2022.
- Zhen Dong. “Hardware-aware Efficient Deep Learning,” PhD Thesis, 2022.
- Shixing Yu*, Zhewei Yao*, Amir Gholami*, Zhen Dong*, Michael W. Mahoney, and Kurt Keutzer. “Hessian-Aware Pruning and Optimal Neural Implant,” Oral, WACV 2022.
- Allison McCarn Deiana, Nhan Tran, … Zhen Dong, … Olivia Weng. “Applications and Techniques for Fast Machine Learning in Science,” Frontiers in Big Data 2022.
- Zhen Dong*, Yizhao Gao*, Qijing Huang, John Wawrzynek, Hayden K.H. So, Kurt Keutzer. “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference,” Oral, FCCM 2021.
- Zhen Dong*, Dequan Wang*, Qijing Huang*, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek. “CoDeNet: Algorithm-hardware Co-design for Deformable Convolution,” Oral, FPGA 2021.
- Zhen Dong*, Kaicheng Zhou*, Guohao Li*, Qiang Zhou, Mingfei Guo, Bernard Ghanem, Kurt Keutzer, Shanghang Zhang. “UnrealNAS: Can We Search Neural Architectures with Unreal Data?” under review.
- Zhewei Yao*, Zhen Dong*, Zhangcheng Zheng*, Amir Gholami*, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer. “HAWQV3: Dyadic Neural Network Quantization,” ICML 2021.
- Amir Gholami*, Sehoon Kim*, Zhen Dong*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer. “A Survey of Quantization Methods for Efficient Neural Network Inference,” BLPCV (Book of Low-Power Computer Vision) 2021.
- Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer. “Cross-Domain Sentiment Classification with In-Domain Contrastive Learning,” short version at NeurIPS 2020 SSL Workshop, long version at ICASSP 2021.
- Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks,” NeurIPS 2020.
- Yaohui Cai*, Zhewei Yao*, Zhen Dong*, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “ZeroQ: A Novel Zero Shot Quantization Framework,” CVPR 2020.
- Sheng Shen*, Zhen Dong*, Jiayu Ye*, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT,” Spotlight, AAAI 2020.
- Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Yaohui Cai, Michael Mahoney, Kurt Keutzer. “Trace Weighted Hessian-Aware Quantization,” Oral, Opt-Workshop, NeurIPS 2019.
- Q. Huang, D. Wang, Y. Gao, Y. Cai, Zhen Dong, B. Wu, K. Keutzer and J. Wawrzynek. “Algorithm-hardware Co-design for Deformable Convolution,” Oral, EMC2-Workshop, NeurIPS 2019.
- Zhen Dong, Yaohui Cai, Amir Gholami, Tianjun Zhang, Kurt Keutzer. “Ultra-low Bit Quantization for Visual Wake Word Challenge,” 2nd Place at VWW Competition, CVPR 2019.
- Zhen Dong*, Zhewei Yao*, Amir Gholami*, Michael W. Mahoney, Kurt Keutzer. “HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision,” ICCV 2019.
- Runze Han, Peng Huang, Yachen Xiang, Chen Liu, Zhen Dong, et al. “A Novel Convolution Computing Paradigm Based on NOR Flash Array with High Computing Speed and Energy Efficiency,” IEEE Transactions on Circuits and Systems 2019, p.1-12.
- Zhen Dong, Zheng Zhou, Zefan Li, Peng Huang, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “Convolutional Neural Networks for Image Recognition and Online Learning Based on RRAM Devices,” IEEE Transactions on Electron Devices 2018, p.793-801.
- Jinfeng Kang, Zhen Dong, Peng Huang, Runze Han, Lifeng Liu, Xiaoyan Liu. China patent about 3D RRAM.
- Huang, P., Li, Z., Zhen Dong, Han, R., Zhou, Z., Zhu, D., Liu, L., Liu, X. and Kang, J. “Binary Resistive Switching Device Based Electronic Synapse with Spike-Rate-Dependent-Plasticity for Online Learning,” ACS Applied Electronic Materials 2018, pp. 845-853.
- Xinxin Wang, Peng Huang, Zhen Dong, Zheng Zhou, Yuning Jiang, Runze Han, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “A Novel RRAM-based Adaptive-Threshold LIF Neuron Circuit for High Recognition Accuracy,” International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA) 2018, pp. 1-2.
- Zhen Dong, Z. Zhou, Z. F. Li, C. Liu, Y. N. Jiang, P. Huang, L. F. Liu, X. Y. Liu, and J. F. Kang. “RRAM based convolutional neural networks for high accuracy pattern recognition and online learning tasks,” Oral, VLSI-SNW 2017, pp. 145-146. IEEE, 2017.
- Zheng Zhou, Chen Liu, Wensheng Shen, Zhen Dong, Zhe Chen, Peng Huang, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “The Characteristics of Binary Spike-Time-Dependent Plasticity in HfO2-Based RRAM and Applications for Pattern Recognition,” Nanoscale Research Letters 2017, 12(1), p.244.
- P. Huang, D. B. Zhu, C. Liu, Z. Zhou, Zhen Dong, H. Jiang, W. S. Shen, L. F. Liu, X. Y. Liu, and J. F. Kang. “RTN based Oxygen Vacancy Probing Method for Ox-RRAM Reliability Characterization and Its Application in Tail Bits,” International Electron Devices Meeting (IEDM) 2017, pp. 21-4.
PhD at Berkeley AI Research (BAIR)
Advisor: Prof. Kurt Keutzer
Research on Hessian-AWare Quantization (HAWQ, HAWQ-V2, ZeroQ) Nov 2018 – Oct 2022
- Propose a second-order method to decide the mixed-precision configuration and the block-wise fine-tuning order.
- Prove a theorem that justifies using the trace of the Hessian as a sensitivity metric, and conduct fast Pareto-frontier optimization.
- Extend HAWQ to segmentation and object detection tasks and achieve state-of-the-art results.
- Conduct fast end-to-end quantization without fine-tuning and without using any training/test data.
Research on HW-SW Co-design and NAS (HAWQ-V3, CoDeNet, HAO) Jan 2019 – Oct 2022
- Propose efficient deformable operations for object detection on embedded FPGAs.
- Design a new FPGA core with ultra-low-precision arithmetic.
- Conduct HW-SW joint architecture search and efficient implementation of mixed-precision NNs on CPU/GPU/FPGAs.
Research on Efficient Natural Language Processing (Q-BERT, DASK) June 2019 – Oct 2022
- Propose a new method to reduce the model size of BERT-base for applications on edge devices.
- Use second-order information to help reduce communication during distributed training.
- Mixed-precision distributed training on the cloud or efficient fine-tuning on the edge.
Research Intern, NVIDIA AI Lab
Research on efficient neural architecture search methods. May 2021 – Aug 2021
Research Intern, Facebook AI
Research on efficient natural language processing (NLP) with limited resources. May 2020 – Aug 2020
Undergraduate Visiting Researcher Program (UGVR), Stanford University
Advisor: Prof. H.-S. Philip Wong
Research on utilizing RRAM array for large-scale networks and transfer learning.
Research on building tools based on statistical ML for analyzing energy consumption and delay in 3D RRAM array.
Research Intern, SenseTime AI Lab
Research on 4-bit model compression (both weight and activation) on RetinaNet for the SenseTime database.
Research Assistant, EECS School, Peking University
Advisor: Prof. Jinfeng Kang
Research on spike-time-dependent plasticity (STDP) characteristics in Oxide-RRAM for brain-inspired computing.
Research on NVM-based hardware implementation of convolutional neural networks.
Talks and Media
- Invited Talk “Efficient Inference and Training of Large Neural Network Models” at Intel oneAPI DevSummit for AI and HPC, on Aug 21, 2023.
- Invited Talk “Hardware-Aware Efficient Deep Learning” at Peking University Institute of Artificial Intelligence (PKU-IAI), on June 11, 2023.
- I co-organized the LOVEU (LOng-form VidEo Understanding) workshop at CVPR 2023, Link to Zhihu.
- Invited to host the Practical DL Workshop at AAAI 2023 in Washington DC.
- Invited Talk “Efficient Deep Learning via Quantization and HW-SW Co-Design” at Hardware and Algorithms for Learning On-a-chip Workshop (HALO) in ICCAD 2022.
- Invited Talk “Efficient Inference and Training of Large Neural Network Models” at Intel oneAPI DevSummit for AI and HPC, on Dec 6, 2022.
- My dissertation on “Hardware-aware Efficient Deep Learning” was defended on June 29, 2022.
- “Efficient Neural Networks through Systematic Quantization and Co-Design”, virtually at Matchlab (Imperial College London), [slides].
- CoDeNet and HAO are presented at ML@B Seminar (Machine Learning at Berkeley).
- “Hessian-Aware Pruning and Optimal Neural Implant”, WACV 2022, Hawaii, US, [slides].
- Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2021, Berkeley, US.
- The book that I contributed to, “Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence”, is online for ordering.
- “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference”, FCCM 2021 (online).
- “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks”, NeurIPS 2020.
- HAWQ-V2 gets recommended by JiangMen (将门) AI media (in Chinese), Link to Zhihu.
- “Systematic Neural Network Quantization”, NVIDIA GTC 2021.
- “Efficient Neural Networks through Systematic Quantization”, BAIR/CPAR/BDD Seminar 2020, [slides].
- “HAWQ-V3: Dyadic Neural Network Quantization” is presented at TVM Conference 2020.
- “ZeroQ: A novel Zero-Shot Quantization Framework”, Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2020, Lake Tahoe (online), US, [slides].
- Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2020, Santa Rosa, US.
- “Q-BERT: Hessian Based Quantization of BERT”, AAAI 2020, New York, US, [slides].
- Q-BERT gets recommended by Synced (机器之心) AI media (in Chinese), Link to WeChat.
- Q-BERT gets recommended by AI.Science (Aggregate Intellect), Link to YouTube.
- “Hessian-Aware trace-Weighted Quantization”, Beyond First-Order Methods in ML Workshop at NeurIPS 2019, Vancouver, Canada.
- Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2019, Monterey, US.
- Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2019, Berkeley, US.
- Visual Wake Word Challenge, LPIRC Workshop at CVPR 2019, Long Beach, US, [slides], [link].
- “RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks”, VLSI-SNW 2017, Kyoto, Japan, [slides].
- Intel, Amazon, Alibaba, NVIDIA, Panasonic, ByteDance, Google, Meta, Apple, Xilinx, Samsung, Tesla, Wave.
- SqueezeLLM: Dense-and-Sparse Quantization, [github][paper].
- Q-Diffusion: Quantizing Diffusion Models, [github][paper].
- Awesome Quantization Papers, [github].
- LOVEU-TGVE (Text-Guided Video Editing) dataset and benchmark, [github][homepage].
- HAWQV3: Dyadic Neural Network Quantization, [github][paper].
- ZeroQ: A novel Zero-Shot Quantization Framework, [github][paper].
- CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs, [github][paper].
- HAP: Hessian-Aware Pruning and Optimal Neural Implant, [github][paper].
- BitPack: Tool to efficiently save ultra-low precision/mixed-precision quantized models, [github].
- Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), PR (Pattern Recognition), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology) and Fundamental Research (Elsevier).
- Reviewer for NeurIPS, ICML, CVPR, ICCV, AAAI, ECCV, IJCAI, KDD, ICLR, MLSys, WACV, TinyML, ECV, BLPCV.
- TA for Applications of Parallel Computers, Berkeley CS 267.
- TA for the online course Applications of Parallel Computers on Moodle XSEDE.
- TA for Optimization Analytics, Berkeley INDENG 240.
- TA for Mathematical Programming, Berkeley INDENG 262A.
- BAIR Mentoring Program for Underrepresented Undergraduates.