
Homepage of Zhen Dong

PhD & Postdoc at Berkeley AI Research

Research Interests

Large language model (LLM) compression.

Efficient deep learning for generative models (Vision & NLP).

Hardware-software co-design for efficient AI chips.


Coursework

University of California at Berkeley:

Visual Object and Activity Recognition (4.00)

RISC-V CPU on FPGA Lab (4.00)

Digital Circuits and Computer Architecture (4.00)

Applications of Parallel Computers (4.00)

Statistical Learning Theory (4.00)

Convex Optimization and Approximation (4.00)


Peking University: (Rank 1/327 in EECS)

Digital Logic (4.00)

Principles of Digital Integrated Circuits (4.00)

Analog Circuits (3.99)

Advanced Analog Integrated Circuits Design (3.99)


Micro-Nano Integrated System (4.00)

Fundamentals of Solid State Physics (3.98)

Fundamentals of Semiconductor Materials (3.97)

Physics of Semiconductor (3.98)

Semiconductor Device Physics (3.98)

Principle of Integrated Circuits Process (3.99)


  • Winner of the 2018–2020 Berkeley Fellowship.

  • Best Paper Nomination at Practical DL Workshop at AAAI 2023.

  • AWS Research Credits Award and Google Cloud Research Credits Award.

  • Tang Lixin Scholarship for outstanding students in China. (top 0.5%)

  • Tang Lixin 1st Prize Scholarship for graduate students studying abroad. (top 0.05%)

  • SenseTime Scholarship, National Scholarship and Fang Zheng Scholarship. (top 1%)

  • Pacemaker to Triple-A student and Triple-A student (twice) at Peking University.

  • 1st Place in EMCC 2020 Competition on both Classification and Object Detection tracks.

  • 2nd Place in the Visual Wake Words Challenge at CVPR 2019.

  • 1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.

  • Princeton University Math Competition (PUMaC): top three among all participants in the geometry group.

  • Top Ten Undergraduate Research Award at PKU EECS.

  • Outstanding Graduates at Peking University and Outstanding Graduates in Beijing.


Research Experience

   PhD at Berkeley AI Research (BAIR)

    Advisor: Prof. Kurt Keutzer

    Research on Hessian-Aware Quantization (HAWQ, HAWQ-V2, ZeroQ)                                                        Nov 2018 – Oct 2022

  • Propose a second-order method to decide mixed-precision configurations and the block-wise fine-tuning order.
  • Prove a theorem establishing the trace of the Hessian as a sensitivity metric, and conduct fast Pareto-frontier optimization.
  • Extend HAWQ to segmentation and object detection tasks, achieving state-of-the-art results.
  • Conduct fast end-to-end quantization without fine-tuning and without using any training or test data.
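
The Hessian-trace sensitivity metric above can be estimated without ever forming the Hessian, using only Hessian-vector products (Hutchinson's method). The sketch below is illustrative, not code from the HAWQ repositories; the function name and the toy matrix are my own, and in practice the `hvp` callable would come from a double-backward pass through the network loss.

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=2000, rng=None):
    """Estimate tr(H) from Hessian-vector products only (Hutchinson's method).

    hvp: callable v -> H @ v; dim: dimension of H. Uses Rademacher probe
    vectors v with E[v v^T] = I, so E[v^T H v] = tr(H).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += v @ hvp(v)                    # v^T H v: unbiased estimate of tr(H)
    return total / n_samples

# Toy check against a small symmetric "Hessian" with a known trace (tr = 9).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
est = hutchinson_trace(lambda v: A @ v, dim=3)
```

For an n-parameter network this costs a handful of backward passes instead of the O(n^2) Hessian, which is what makes a per-block trace usable as a quantization-sensitivity score.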

    Research on HW-SW Co-design and NAS (HAWQ-V3, CoDeNet, HAO)                                                    Jan 2019 – Oct 2022

  • Propose efficient deformable operations for object detection on embedded FPGAs.
  • Design a new FPGA core with ultra-low-precision arithmetic.
  • Conduct HW-SW joint architecture search and efficient implementation of mixed-precision NNs on CPUs/GPUs/FPGAs.

    Research on Efficient Natural Language Processing (Q-BERT, DASK)                                                        June 2019 – Oct 2022

  • Propose a new method to reduce the model size of BERT-base for applications on edge devices.
  • Use second-order information to reduce communication during distributed training.
  • Enable mixed-precision distributed training on the cloud and efficient fine-tuning on the edge.
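
Ultra-low-bit BERT compression of this kind typically relies on group-wise symmetric uniform quantization, where each group of weights gets its own scale. The sketch below shows the general technique only; the function name, the flattening scheme, and the group size are illustrative assumptions, not the Q-BERT implementation.

```python
import numpy as np

def groupwise_quantize(w, bits=4, group_size=32):
    """Symmetric uniform quantization with one scale per group of weights.

    Returns integer codes, per-group scales, and the dequantized tensor.
    """
    flat = w.reshape(-1)
    pad = (-flat.size) % group_size                 # zero-pad so groups divide evenly
    groups = np.pad(flat, (0, pad)).reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit symmetric
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                       # guard all-zero groups
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    deq = (q * scales).reshape(-1)[: w.size].reshape(w.shape)
    return q, scales, deq

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                        # stand-in for a weight matrix
q, s, W_hat = groupwise_quantize(W, bits=4, group_size=32)
```

Smaller groups track the local weight range more tightly, trading a little extra scale storage for lower reconstruction error; per-element error is bounded by half the group's scale.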

   Research Intern, Bytedance AI Lab

    Research on novel diffusion models for better personalized text-to-image generation.                     Jan 2023 – Apr 2023

   Research Intern, NVIDIA AI Lab

    Research on efficient neural architecture search methods.                                                                        May 2021 – Aug 2021

   Research Intern, Facebook AI Research

    Research on efficient natural language processing (NLP) with limited resources.                             May 2020 – Aug 2020

  Undergraduate Visiting Researcher Program (UGVR), Stanford University

   Advisor: Prof. H.-S. Philip Wong

   Research on utilizing RRAM array for large-scale networks and transfer learning.             

   Research on building tools based on statistical ML for analyzing energy consumption and delay in 3D RRAM array. 

  Research Intern, SenseTime AI Lab

   Research on 4-bit model compression (both weight and activation) on RetinaNet for the SenseTime database.

  Research Assistant, EECS School, Peking University

   Advisor: Prof. Jinfeng Kang

   Research on spike-time-dependent plasticity (STDP) characteristics in Oxide-RRAM for brain-inspired computing.

   Research on NVM-based hardware implementation of convolutional neural networks.           

Industry Collaborations

  • Intel, Amazon, Alibaba, NVIDIA, Panasonic, ByteDance, Google, Meta, Apple, AMD, Samsung, Tesla.



  • Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), PR (Pattern Recognition), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology) and Fundamental Research (Elsevier).
  • TA for Applications of Parallel Computers, Berkeley CS 267.
  • TA for Online Course Applications of Parallel Computers on Moodle XSEDE.
  • TA for Optimization Analytics, Berkeley INDENG 240.
  • TA for Mathematical Programming, Berkeley INDENG 262A.
  • BAIR Mentoring Program for Underrepresented Undergraduates.


UC Berkeley, CA, 94709