

Homepage of Zhen Dong

PhD at UC Berkeley

Research Interests

Efficient deep learning and hardware-software co-design.

Model compression for LLMs and AIGC models.

Computer architectures beyond von Neumann, such as in-memory computing.

Education

University of California at Berkeley:

Visual Object and Activity Recognition (4.00)

RISC-V CPU on FPGA Lab (4.00)

Digital Circuits and Computer Architecture (4.00)

Applications of Parallel Computers (4.00)

Statistical Learning Theory (4.00)

Convex Optimization and Approximation (4.00)


Peking University (ranked 1st of 327 in EECS):

Digital Logic (4.00)

Principles of Digital Integrated Circuits (4.00)

Analog Circuits (3.99)

Advanced Analog Integrated Circuits Design (3.99)


Micro-Nano Integrated System (4.00)

Fundamentals of Solid State Physics (3.98)

Fundamentals of Semiconductor Materials (3.97)

Physics of Semiconductors (3.98)

Semiconductor Device Physics (3.98)

Principles of Integrated Circuit Processing (3.99)

Honors and Awards

  • Winner of the 2018-2020 Berkeley Fellowship.

  • Best Paper Nomination at the Practical DL Workshop at AAAI 2023.

  • AWS Research Credits Award and Google Cloud Research Credits Award.

  • Tang Lixin Scholarship for outstanding students in China (top 0.5%).

  • Tang Lixin 1st Prize Scholarship for graduate students studying abroad (top 0.05%).

  • SenseTime Scholarship, National Scholarship, and Fang Zheng Scholarship (top 1%).

  • Pacemaker to Triple-A student and Triple-A student (twice) at Peking University.

  • 1st Place in both the Classification and Object Detection tracks of the EMCC 2020 Competition.

  • 2nd Place in the Visual Wake Word Challenge at CVPR 2019.

  • 1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.

  • Princeton University Math Competition (PUMaC): top three among all participants in the geometry group.

  • Top Ten Undergraduate Research Award at PKU EECS.

  • Outstanding Graduate of Peking University and Outstanding Graduate of Beijing.


Research Experience

   PhD at Berkeley AI Research (BAIR)

    Advisor: Prof. Kurt Keutzer

    Research on Hessian-AWare Quantization (HAWQ, HAWQ-V2, ZeroQ), Nov 2018 – Oct 2022

  • Propose a second-order method to determine the mixed-precision configuration and the block-wise fine-tuning order.
  • Prove a theorem supporting the use of the Hessian trace as the sensitivity metric, and perform fast Pareto-frontier optimization.
  • Extend HAWQ to segmentation and object detection tasks, achieving state-of-the-art results.
  • Perform fast end-to-end quantization without fine-tuning and without using any training or test data.

    Research on HW-SW Co-design and NAS (HAWQ-V3, CoDeNet, HAO), Jan 2019 – Oct 2022

  • Propose efficient deformable operations for object detection on embedded FPGAs.
  • Design a new FPGA core with ultra-low-precision arithmetic.
  • Perform joint HW-SW architecture search and efficient implementation of mixed-precision NNs on CPUs, GPUs, and FPGAs.

    Research on Efficient Natural Language Processing (Q-BERT, DASK), June 2019 – Oct 2022

  • Propose a new method to reduce the model size of BERT-base for deployment on edge devices.
  • Use second-order information to reduce communication during distributed training.
  • Study mixed-precision distributed training in the cloud and efficient fine-tuning on the edge.

   Research Intern, NVIDIA AI Lab

    Research on efficient neural architecture search methods, May 2021 – Aug 2021

   Research Intern, Facebook AI

    Research on efficient natural language processing (NLP) with limited resources, May 2020 – Aug 2020

  Undergraduate Visiting Researcher Program (UGVR), Stanford University

   Advisor: Prof. H.-S. Philip Wong

   Research on using RRAM arrays for large-scale networks and transfer learning.

   Research on building tools based on statistical ML for analyzing energy consumption and delay in 3D RRAM arrays.

  Research Intern, SenseTime AI Lab

   Research on 4-bit model compression (quantizing both weights and activations) of RetinaNet on the SenseTime database.

  Research Assistant, EECS School, Peking University

   Advisor: Prof. Jinfeng Kang

   Research on spike-timing-dependent plasticity (STDP) characteristics in Oxide-RRAM for brain-inspired computing.

   Research on NVM-based hardware implementation of convolutional neural networks.

Talks and Media

  • Invited Talk “Hardware-Aware Efficient Deep Learning” at Peking University Institute of Artificial Intelligence (PKU-IAI), on June 11, 2023.
  • I co-organized the LOVEU (LOng-form VidEo Understanding) workshop at CVPR 2023, Link to Zhihu.
  • Invited to host the Practical DL Workshop at AAAI 2023 in Washington, DC.
  • Invited Talk “Efficient Deep Learning via Quantization and HW-SW Co-Design” at Hardware and Algorithms for Learning On-a-chip Workshop (HALO) in ICCAD 2022.
  • My dissertation on “Hardware-aware Efficient Deep Learning” was defended on June 29, 2022.
  • “Efficient Neural Networks through Systematic Quantization and Co-Design”, virtually at Matchlab (Imperial College London), [slides].
  • CoDeNet and HAO were presented at the ML@B Seminar (Machine Learning at Berkeley).
  • “Hessian-Aware Pruning and Optimal Neural Implant”, WACV 2022, Hawaii, US, [slides].
  • Berkeley AI Research (BAIR)/Berkeley Deep Drive (BDD) Workshop 2021, Berkeley, US.
  • The book I contributed to, “Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence”, is available to order online.
  • “HAO: Hardware-aware neural Architecture Optimization for Efficient Inference”, FCCM 2021 (online).
  • “HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks”, NeurIPS 2020.
  • HAWQ-V2 was featured by JiangMen (将门) AI media (in Chinese), Link to Zhihu.
  • “Systematic Neural Network Quantization”, NVIDIA GTC 2021.
  • “Efficient Neural Networks through Systematic Quantization”, BAIR/CPAR/BDD Seminar 2020, [slides].
  • “HAWQ-V3: Dyadic Neural Network Quantization” was presented at TVM Conference 2020.
  • “ZeroQ: A novel Zero-Shot Quantization Framework”, Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2020, Lake Tahoe (online), US, [slides].
  • Berkeley AI Research (BAIR)/Berkeley Deep Drive (BDD) Workshop 2020, Santa Rosa, US.
  • “Q-BERT: Hessian Based Quantization of BERT”, AAAI 2020, New York, US, [slides].
  • Q-BERT was featured by Synced (机器之心) AI media (in Chinese), Link to WeChat.
  • Q-BERT was featured by AI.Science (Aggregate Intellect), Link to YouTube.
  • “Hessian-Aware trace-Weighted Quantization”, Beyond First-Order Methods in ML Workshop at NeurIPS 2019, Vancouver, Canada.
  • Real-Time Intelligent Secure Explainable Systems (RISELab) Retreat 2019, Monterey, US.
  • Berkeley AI Research (BAIR)/Berkeley Deep Drive (BDD) Workshop 2019, Berkeley, US.
  • Visual Wake Word Challenge, LPIRC Workshop at CVPR 2019, Long Beach, US, [slides], [link].
  • “RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks”, VLSI-SNW 2017, Kyoto, Japan, [slides].

Industry Collaborations

  • Intel, Amazon, Alibaba, NVIDIA, Panasonic, ByteDance, Google, Meta, Apple, Xilinx, Samsung, Tesla, Wave.

Open-Source Projects

  • SqueezeLLM: Dense-and-Sparse Quantization, [github][paper].
  • Q-Diffusion: Quantizing Diffusion Models, [github][paper].
  • Awesome Quantization Papers, [github].
  • LOVEU-TGVE (Text-Guided Video Editing) dataset and benchmark, [github][homepage].
  • HAWQV3: Dyadic Neural Network Quantization, [github][paper].
  • ZeroQ: A novel Zero-Shot Quantization Framework, [github][paper].
  • CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs, [github][paper].
  • HAP: Hessian-Aware Pruning and Optimal Neural Implant, [github][paper].
  • BitPack: A tool for efficiently saving ultra-low-precision and mixed-precision quantized models, [github].

Service and Teaching

  • Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), PR (Pattern Recognition), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology), and Fundamental Research (Elsevier).
  • TA for Applications of Parallel Computers, Berkeley CS 267.
  • TA for the online course Applications of Parallel Computers on Moodle XSEDE.
  • TA for Optimization Analytics, Berkeley INDENG 240.
  • TA for Mathematical Programming, Berkeley INDENG 262A.
  • Mentor in the BAIR Mentoring Program for Underrepresented Undergraduates.


UC Berkeley, CA, 94709