Homepage of Zhen Dong

PhD Candidate at UC Berkeley

Research Interests

Hardware and software co-design for efficient deep learning.

Model compression for classification/object detection/NLP on embedded platforms.

AutoML and hardware-aware neural architecture search.

Computer architectures beyond von Neumann, such as in-memory computing.


Coursework

University of California, Berkeley:

Visual Object and Activity Recognition (4.00)

RISC-V CPU on FPGA Lab (4.00)

Digital Circuits and Computer Architecture (4.00)

Applications of Parallel Computers (4.00)

Statistical Learning Theory (4.00)

Convex Optimization and Approximation (4.00)


Peking University: (Rank 1/327 in EECS)

Digital Logic (4.00)

Principles of Digital Integrated Circuits (4.00)

Analog Circuits (3.99)

Advanced Analog Integrated Circuits Design (3.99)


Micro-Nano Integrated System (4.00)

Fundamentals of Solid State Physics (3.98)

Fundamentals of Semiconductor Materials (3.97)

Physics of Semiconductor (3.98)

Semiconductor Device Physics (3.98)

Principle of Integrated Circuits Process (3.99)


Awards and Honors

  • Winner of the 2018-2020 Berkeley Fellowship.

  • AWS Research Credits Award and Google Cloud Research Credits Award.

  • Tang Lixin Scholarship for outstanding students in China. (top 0.5%)

  • Tang Lixin 1st Prize Scholarship for graduate students studying abroad. (top 0.05%)

  • SenseTime Scholarship, National Scholarship and Fang Zheng Scholarship. (top 1%)

  • Pacemaker to Triple-A student and Triple-A student (twice) at Peking University.

  • 1st Place in EMCC 2020 Competition on both Classification and Object Detection tracks.

  • 2nd Place in Visual Wake Word Challenge at CVPR 2019.

  • 1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.

  • Princeton University Math Competition (PUMaC): Top three among all participants in the geometry group.

  • Top Ten Undergraduate Research Award at PKU EECS.

  • Outstanding Graduates at Peking University and Outstanding Graduates in Beijing.


Publications

  • Zhen Dong*, Yizhao Gao*, Qijing Huang, John Wawrzynek, Hayden K.H. So, Kurt Keutzer. “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference,” Oral, FCCM 2021.
  • Zhen Dong*, Kaicheng Zhou*, Guohao Li*, Qiang Zhou, Mingfei Guo, Bernard Ghanem, Kurt Keutzer, Shanghang Zhang. “UnrealNAS: Can We Search Neural Architectures with Unreal Data?” under review.
  • Zhen Dong*, Dequan Wang*, Qijing Huang*, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek. “CoDeNet: Algorithm-hardware Co-design for Deformable Convolution,” Oral, FPGA 2021.
  • Zhewei Yao*, Zhen Dong*, Zhangcheng Zheng*, Amir Gholami*, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer. “HAWQV3: Dyadic Neural Network Quantization,” ICML 2021.
  • Tian Li, Xiang Chen, Zhen Dong, Weijiang Yu, Yijun Yan, Shanghang Zhang, Kurt Keutzer. “Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data,” Long Oral, IJCAI-ECAI 2022.
  • Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks,” NeurIPS 2020.
  • Shixing Yu*, Zhewei Yao*, Amir Gholami*, Zhen Dong*, Michael W. Mahoney, and Kurt Keutzer. “Hessian-Aware Pruning and Optimal Neural Implant,” Oral, WACV 2022.
  • Amir Gholami*, Sehoon Kim*, Zhen Dong*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer. “A Survey of Quantization Methods for Efficient Neural Network Inference,” BLPCV (Book of Low-Power Computer Vision) 2021.
  • Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer. “Cross-Domain Sentiment Classification with In-Domain Contrastive Learning,” short version at NeurIPS 2020 SSL Workshop, long version at ICASSP 2021.
  • Yaohui Cai*, Zhewei Yao*, Zhen Dong*, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “ZeroQ: A Novel Zero Shot Quantization Framework,” CVPR 2020.
  • Sheng Shen*, Zhen Dong*, Jiayu Ye*, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. “Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT,” Spotlight, AAAI 2020.
  • Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Yaohui Cai, Michael Mahoney, Kurt Keutzer. “Trace Weighted Hessian-Aware Quantization,” Oral, Opt-Workshop, NeurIPS 2019.
  • Allison McCarn Deiana, Nhan Tran, … Zhen Dong, … Olivia Weng. “Applications and Techniques for Fast Machine Learning in Science,” Frontiers in Big Data 2022.
  • Q. Huang, D. Wang, Y. Gao, Y. Cai, Zhen Dong, B. Wu, K. Keutzer and J. Wawrzynek. “Algorithm-hardware Co-design for Deformable Convolution,” Oral, EMC2-Workshop, NeurIPS 2019.
  • Zhen Dong, Yaohui Cai, Amir Gholami, Tianjun Zhang, Kurt Keutzer. “Ultra-low Bit Quantization for Visual Wake Word Challenge,” 2nd Place at VWW Competition, CVPR 2019.
  • Zhen Dong*, Zhewei Yao*, Amir Gholami*, Michael W. Mahoney, Kurt Keutzer. “HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision,” ICCV 2019.
  • Zhen Dong, Zheng Zhou, Zefan Li, Peng Huang, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “Convolutional Neural Networks for Image Recognition and Online Learning Based on RRAM Devices,” IEEE Transactions on Electron Devices 2018, pp. 793-801.
  • Jinfeng Kang, Zhen Dong, Peng Huang, Renze Han, Lifeng Liu, Xiaoyan Liu. Chinese patent on 3D RRAM.
  • Peng Huang, Zefan Li, Zhen Dong, Runze Han, Zheng Zhou, D. Zhu, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “Binary Resistive Switching Device Based Electronic Synapse with Spike-Rate-Dependent-Plasticity for Online Learning,” ACS Applied Electronic Materials 2018, pp. 845-853.
  • Zhen Dong, Z. Zhou, Z. F. Li, C. Liu, Y. N. Jiang, P. Huang, L. F. Liu, X. Y. Liu, and J. F. Kang. “RRAM based convolutional neural networks for high accuracy pattern recognition and online learning tasks,” Oral, VLSI-SNW 2017, pp. 145-146. IEEE, 2017.
  • Runze Han, Peng Huang, Yachen Xiang, Chen Liu, Zhen Dong, et al. “A Novel Convolution Computing Paradigm Based on NOR Flash Array With High Computing Speed and Energy Efficiency,” IEEE Transactions on Circuits and Systems, pp. 1-12.
  • Xinxin Wang, Peng Huang, Zhen Dong, Zheng Zhou, Yuning Jiang, Runze Han, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “A Novel RRAM-based Adaptive-Threshold LIF Neuron Circuit for High Recognition Accuracy,” International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA), pp. 1-2.
  • Zheng Zhou, Chen Liu, Wensheng Shen, Zhen Dong, Zhe Chen, Peng Huang, Lifeng Liu, Xiaoyan Liu, Jinfeng Kang. “The Characteristics of Binary Spike-Time-Dependent Plasticity in HfO2-Based RRAM and Applications for Pattern Recognition,” Nanoscale Research Letters, 12(1), p. 244.
  • P. Huang, D. B. Zhu, C. Liu, Z. Zhou, Zhen Dong, H. Jiang, W. S. Shen, L. F. Liu, X. Y. Liu, and J. F. Kang. “RTN based Oxygen Vacancy Probing Method for Ox-RRAM Reliability Characterization and Its Application in Tail Bits,” International Electron Devices Meeting (IEDM) 2017, pp. 21-4.

Research Experience

PhD Student, Electrical Engineering and Computer Sciences, UC Berkeley

Advisor: Prof. Kurt Keutzer

Research on Hessian-AWare Quantization (HAWQ, HAWQ-V2, ZeroQ)                                             Nov 2018 – present

  • Propose a second-order method to determine the mixed-precision configuration and the block-wise fine-tuning order.
  • Prove a theorem justifying the Hessian trace as a sensitivity metric, and conduct fast Pareto-frontier optimization.
  • Extend HAWQ to segmentation and object detection tasks, achieving state-of-the-art results.
  • Conduct fast end-to-end quantization without fine-tuning and without using any training/test data.
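The Hessian-trace sensitivity metric above is normally estimated matrix-free. As a minimal sketch of the standard Hutchinson estimator (the function name and the small explicit Hessian are illustrative placeholders, not the HAWQ code, which computes Hessian-vector products via backpropagation):

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=512, seed=0):
    """Estimate tr(H) from Hessian-vector products only,
    using Rademacher probes: E[v^T H v] = tr(H)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        estimates.append(v @ hvp(v))
    return float(np.mean(estimates))

# Toy example: an explicit 3x3 Hessian stands in for autograd HVPs.
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.25]])
trace_est = hutchinson_trace(lambda v: H @ v, dim=3)
# exact trace is 5.25; the estimate converges toward it
```

In practice `hvp` would be implemented with two backward passes through the network, so the full Hessian is never materialized.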

Research on HW-SW Co-design and NAS (HAWQ-V3, CoDeNet, HAO)                                               Jan 2019 – present

  • Propose efficient deformable operations for object detection on embedded FPGAs.
  • Design a new FPGA core with ultra-low-precision arithmetic.
  • Conduct HW-SW joint architecture search and efficient implementation of mixed-precision NNs on CPUs/GPUs/FPGAs.
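HAWQ-V3's dyadic arithmetic makes inference integer-only by approximating each real-valued rescaling factor as a dyadic rational b / 2^c, so rescaling becomes an integer multiply followed by a right shift. A hedged sketch of the idea (illustrative only, not the HAWQ-V3 implementation):

```python
def to_dyadic(scale, max_bits=16):
    """Approximate a real-valued quantization scale by a dyadic
    rational b / 2^c, so rescaling at inference time is an integer
    multiply plus a right shift (no floating point needed)."""
    c = max_bits
    b = round(scale * (1 << c))
    # Normalize: strip trailing zero bits to keep b small.
    while b > 0 and b % 2 == 0 and c > 0:
        b //= 2
        c -= 1
    return b, c

b, c = to_dyadic(0.75)  # 0.75 = 3 / 2^2, so (b, c) = (3, 2)
```

With this representation, rescaling an integer accumulator `x` is simply `(x * b) >> c`.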

Research on Efficient Natural Language Processing (Q-BERT)                                                          June 2019 – present

  • Propose a new method to reduce the model size of BERT-base for edge-device applications.
  • Use second-order information to reduce communication during distributed training.
  • Conduct mixed-precision distributed training on the cloud and efficient fine-tuning on the edge.
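The quantization in these projects builds on standard symmetric uniform quantization; a generic sketch of that building block (illustrative, not Q-BERT's group-wise implementation):

```python
import numpy as np

def quantize_symmetric(w, num_bits):
    """Symmetric uniform quantization: map [-max|w|, max|w|]
    onto the signed integer grid for the given bit width."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

w = np.array([0.5, -1.0, 0.25, 0.9])
q, s = quantize_symmetric(w, num_bits=4)
w_hat = q * s  # dequantized approximation of w
```

Ultra-low-precision schemes then reduce `num_bits` per layer (or per group of weights) according to a sensitivity metric such as the Hessian trace.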

Research Intern, NVIDIA AI Lab

Research on efficient neural architecture search methods.                                                             May 2021 – August 2021

Research Intern, Facebook AI

Research on efficient natural language processing (NLP) with limited resources.                       May 2020 – August 2020

Undergraduate visiting researcher program (UGVR), Stanford University

Advisor: Prof. H.-S. Philip Wong

Research on utilizing RRAM arrays for large-scale networks and transfer learning.

Research on building statistical-ML-based tools for analyzing energy consumption and delay in 3D RRAM arrays.

Research Intern, SenseTime Corporation

Research on 4-bit model compression (both weight and activation) on RetinaNet for the SenseTime database.

Research Assistant, EECS School, Peking University

Advisor: Prof. Jinfeng Kang

Research on spike-time-dependent plasticity (STDP) characteristics in Oxide-RRAM for brain-inspired computing.    

Research on NVM-based hardware implementation of convolutional neural networks.           

Talks and Media

  • Invited Talk “Efficient Deep Learning via Quantization and HW-SW Co-Design” at Hardware and Algorithms for Learning On-a-chip Workshop (HALO) in ICCAD 2022.
  • Defended my dissertation, “Hardware-aware Efficient Deep Learning,” on June 29, 2022.
  • “Efficient Neural Networks through Systematic Quantization and Co-Design”, virtually at Matchlab (Imperial College London), [slides].
  • CoDeNet and HAO were presented at the ML@B Seminar (Machine Learning at Berkeley).
  • “Hessian-Aware Pruning and Optimal Neural Implant”, WACV 2022, Hawaii, US, [slides].
  • Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2021, Berkeley, US.
  • The book that I contributed to, “Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence,” is now available for order.
  • “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference”, FCCM 2021 (online).
  • “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks”, NeurIPS 2020.
  • HAWQ-V2 was recommended by the JiangMen (将门) AI media outlet (in Chinese), Link to ZhiHu.
  • “Systematic Neural Network Quantization”, NVIDIA GTC 2021.
  • “Efficient Neural Networks through Systematic Quantization”, BAIR/CPAR/BDD Seminar 2020, [slides].
  • “HAWQ-V3: Dyadic Neural Network Quantization” was presented at the TVM Conference 2020.
  • “ZeroQ: A novel Zero-Shot Quantization Framework”, Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2020, Lake Tahoe (online), US, [slides].
  • Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2020, Santa Rosa, US.
  • “Q-BERT: Hessian Based Quantization of BERT”, AAAI 2020, New York, US, [slides].
  • Q-BERT was recommended by the Synced (机器之心) AI media outlet (in Chinese), Link to WeChat.
  • Q-BERT was recommended by AI.Science (Aggregate Intellect), Link to YouTube.
  • “Hessian-Aware trace-Weighted Quantization”, Beyond First-Order Methods in ML Workshop at NeurIPS 2019, Vancouver, Canada.
  • Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2019, Monterey, US.
  • Berkeley AI Research (BAIR)/ Berkeley Deep Drive (BDD) Workshop 2019, Berkeley, US.
  • Visual Wake Word Challenge, LPIRC Workshop at CVPR 2019, Long Beach, US, [slides], [link].
  • “RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks”, VLSI-SNW 2017, Kyoto, Japan, [slides].

Industry Collaborations

  • Alibaba, Amazon, Intel, NVIDIA, Google, Facebook, Apple, Xilinx, Samsung, Tesla, Panasonic, Wave.


Open-Source Projects

  • HAWQV3: Dyadic Neural Network Quantization, [github][paper].
  • ZeroQ: A novel Zero-Shot Quantization Framework, [github][paper].
  • CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs, [github][paper].
  • HAP: Hessian-Aware Pruning and Optimal Neural Implant, [github][paper].
  • Awesome Quantization Papers, [github].
  • BitPack: Tool to efficiently save ultra-low precision/mixed-precision quantized models, [github].
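As a sketch of what a bit-packing utility like BitPack does (an illustrative re-implementation of the general idea, not BitPack's actual code), unsigned 4-bit values can be stored two per byte:

```python
import numpy as np

def pack4(values):
    """Pack unsigned 4-bit integers two per byte (low nibble first)."""
    v = np.asarray(values, dtype=np.uint8)
    assert (v < 16).all(), "values must fit in 4 bits"
    if len(v) % 2:
        v = np.append(v, 0)  # pad to an even count
    return (v[0::2] | (v[1::2] << 4)).astype(np.uint8)

def unpack4(packed, n):
    """Recover the first n 4-bit values from a packed byte array."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty(2 * len(p), dtype=np.uint8)
    out[0::2] = p & 0x0F
    out[1::2] = p >> 4
    return out[:n]

packed = pack4([1, 2, 3, 4, 5])  # 5 values fit in 3 bytes
restored = unpack4(packed, 5)    # round-trips back to [1, 2, 3, 4, 5]
```

This halves storage relative to one byte per value; mixed-precision checkpoints generalize the same trick to other sub-byte bit widths.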


Service and Teaching

  • Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology), and Fundamental Research (Elsevier).
  • TA for Applications of Parallel Computers, Berkeley CS 267.
  • TA for the online course Applications of Parallel Computers on XSEDE Moodle.
  • TA for Optimization Analytics, Berkeley INDENG 240.
  • TA for Mathematical Programming, Berkeley INDENG 262A.
  • BAIR Mentoring Program for Underrepresented Undergraduates.


UC Berkeley, CA, 94709