Homepage of Zhen Dong

PhD & Postdoc at Berkeley AI Research

Research Interests

Large language model (LLM) compression.

Efficient deep learning for generative models (Vision & NLP).

Hardware-software co-design for efficient AI chips.

Education

University of California, Berkeley:

Visual Object and Activity Recognition (4.00)

RISC-V CPU on FPGA Lab (4.00)

Digital Circuits and Computer Architecture (4.00)

Applications of Parallel Computers (4.00)

Statistical Learning Theory (4.00)

Convex Optimization and Approximation (4.00)

 

Peking University (Rank 1/327 in EECS):

Digital Logic (4.00)

Principles of Digital Integrated Circuits (4.00)

Analog Circuits (3.99)

Advanced Analog Integrated Circuits Design (3.99)

 

Micro-Nano Integrated System (4.00)

Fundamentals of Solid State Physics (3.98)

Fundamentals of Semiconductor Materials (3.97)

Physics of Semiconductors (3.98)

Semiconductor Device Physics (3.98)

Principles of Integrated Circuit Processing (3.99)

Awards

  • Winner of 2018-2020 Berkeley Fellowship.

  • Best Paper Nomination at the Practical DL Workshop at AAAI 2023.

  • AWS Research Credits Award and Google Cloud Research Credits Award.

  • Tang Lixin Scholarship for outstanding students in China. (top 0.5%)

  • Tang Lixin 1st Prize Scholarship for graduate students studying abroad. (top 0.05%)

  • SenseTime Scholarship, National Scholarship and Fang Zheng Scholarship. (top 1%)

  • Pacemaker to Triple-A student and Triple-A student (twice) at Peking University.

  • 1st Place in EMCC 2020 Competition on both Classification and Object Detection tracks.

  • 2nd Place in Visual Wake Word Challenge at CVPR 2019.

  • 1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.

  • Princeton University Math Competition (PUMaC): Top three among all participants in the geometry group.

  • Top Ten Undergraduate Research Award at PKU EECS.

  • Outstanding Graduates at Peking University and Outstanding Graduates in Beijing.

Publications

Research Experience

   PhD at Berkeley AI Research (BAIR)

    Advisor: Prof. Kurt Keutzer

    Research on Hessian-Aware Quantization (HAWQ, HAWQ-V2, ZeroQ), Nov 2018 – Oct 2022

  • Propose a second-order method to determine mixed-precision configurations and the block-wise fine-tuning order.
  • Prove a theorem justifying the trace of the Hessian as a sensitivity metric, and conduct fast Pareto-frontier optimization.
  • Extend HAWQ to segmentation and object detection tasks, achieving state-of-the-art results.
  • Conduct fast end-to-end quantization without fine-tuning and without any training/test data.
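
The Hessian-trace sensitivity metric above is typically estimated with Hutchinson's randomized trace estimator, tr(H) ≈ E[vᵀHv] over Rademacher vectors v, which needs only Hessian-vector products. A minimal NumPy sketch on a toy quadratic loss (illustrative names only, not the HAWQ codebase):

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=2000, seed=0):
    """Estimate tr(H) as E[v^T H v] over Rademacher vectors v,
    using only Hessian-vector products (no explicit Hessian)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        total += v @ hvp(v)
    return total / n_samples

# Toy quadratic loss with Hessian H; the true trace is 2 + 3 = 5.
H = np.array([[2.0, 1.0], [1.0, 3.0]])
est = hutchinson_trace(lambda v: H @ v, dim=2)
```

In practice the `hvp` callback would come from a second backward pass through the network rather than an explicit matrix.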

    Research on HW-SW Co-design and NAS (HAWQ-V3, CoDeNet, HAO), Jan 2019 – Oct 2022

  • Propose efficient deformable operations for object detection on embedded FPGAs.
  • Design a new FPGA core with ultra-low-precision arithmetic.
  • Conduct HW-SW joint architecture search and efficient implementation of mixed-precision NNs on CPUs, GPUs, and FPGAs.
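
Integer-only inference of the kind HAWQ-V3 targets rests on dyadic arithmetic: a real rescaling factor is approximated by b / 2^c with integer b, so requantization needs only an integer multiply and a bit shift. A minimal sketch of that idea (illustrative names, not the released code):

```python
def dyadic_approx(scale, shift=16):
    """Approximate a real rescaling factor by b / 2^shift with integer b,
    so rescaling becomes one integer multiply plus one right shift."""
    b = round(scale * (1 << shift))
    return b, shift

# Rescale an int32-style accumulator by ~0.017 with integer-only ops.
b, c = dyadic_approx(0.017)
acc = 12345
rescaled = (acc * b) >> c  # integer approximation of acc * 0.017
```

The approximation error shrinks as the shift grows, while all arithmetic stays within integer units.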

    Research on Efficient Natural Language Processing (Q-BERT, DASK), June 2019 – Oct 2022

  • Propose a new method to reduce the model size of BERT-base for edge-device applications.
  • Use second-order information to reduce communication overhead during distributed training.
  • Enable mixed-precision distributed training on the cloud and efficient fine-tuning on the edge.
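
One ingredient of compressing BERT-scale models is group-wise symmetric quantization, where each group of weights gets its own scale so an outlier group does not inflate the quantization error elsewhere. A simplified illustration of the idea (not the Q-BERT implementation; all names are assumptions):

```python
import numpy as np

def quantize_groupwise(w, n_bits=4, group_size=2):
    """Symmetric uniform quantization with one scale per group of rows,
    returning the dequantized weights and the per-group scales."""
    qmax = 2 ** (n_bits - 1) - 1
    w = np.asarray(w, dtype=np.float64)
    out = np.empty_like(w)
    scales = []
    for start in range(0, w.shape[0], group_size):
        g = w[start:start + group_size]
        scale = np.abs(g).max() / qmax       # per-group scale
        q = np.clip(np.round(g / scale), -qmax - 1, qmax)
        out[start:start + group_size] = q * scale
        scales.append(scale)
    return out, scales
```

Smaller groups give finer scales at the cost of more metadata to store alongside the quantized weights.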

   Research Intern, NVIDIA AI Lab

    Research on efficient neural architecture search methods, May 2021 – Aug 2021

   Research Intern, Facebook AI

    Research on efficient natural language processing (NLP) with limited resources, May 2020 – Aug 2020


  Undergraduate Visiting Researcher Program (UGVR), Stanford University

   Advisor: Prof. H.-S. Philip Wong

   Research on utilizing RRAM arrays for large-scale networks and transfer learning.

   Research on building tools based on statistical ML for analyzing energy consumption and delay in 3D RRAM array. 

  Research Intern, SenseTime AI Lab

   Research on 4-bit model compression (of both weights and activations) on RetinaNet for the SenseTime database.

  Research Assistant, EECS School, Peking University

   Advisor: Prof. Jinfeng Kang

   Research on spike-timing-dependent plasticity (STDP) characteristics in Oxide-RRAM for brain-inspired computing.

   Research on NVM-based hardware implementation of convolutional neural networks.           

Talks, Media & Events

Industry Collaborations

  • Intel, Amazon, Alibaba, NVIDIA, Panasonic, ByteDance, Google, Meta, Apple, Xilinx, Samsung, Tesla, Wave.

Open Source

  • NexusRaven [github][huggingface], NexusRaven-V2 [github].
  • NexusRaven-V2-13B [huggingface][demo][leaderboard], 350 likes, 10k+ downloads. Ranked Top-5 on Hugging Face Trending at release.
  • SqueezeLLM: Dense-and-Sparse Quantization, [github][paper].
  • Q-Diffusion: Quantizing Diffusion Models, [github][paper].
  • Awesome Quantization Papers, [github].
  • LOVEU-TGVE (Text-Guided Video Editing) dataset and benchmark, [github][homepage].
  • HAWQV3: Dyadic Neural Network Quantization, [github][paper].
  • ZeroQ: A novel Zero-Shot Quantization Framework, [github][paper].
  • CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs, [github][paper].
  • HAP: Hessian-Aware Pruning and Optimal Neural Implant, [github][paper].
  • BitPack: Tool to efficiently save ultra-low precision/mixed-precision quantized models, [github].
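
The core trick behind a packing tool like BitPack can be sketched as storing two 4-bit values per byte (a hypothetical minimal illustration, not BitPack's actual API):

```python
import numpy as np

def pack4(vals):
    """Pack unsigned 4-bit integers two-per-byte (low nibble first)."""
    vals = np.asarray(vals, dtype=np.uint8)
    if len(vals) % 2:
        vals = np.append(vals, 0)  # pad to an even count
    return (vals[0::2] | (vals[1::2] << 4)).astype(np.uint8)

def unpack4(packed, n):
    """Inverse of pack4: recover n 4-bit values from packed bytes."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out[:n]
```

Mixed-precision layouts follow the same principle with a per-tensor bit-width recorded as metadata.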

Service

  • Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), PR (Pattern Recognition), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology), and Fundamental Research (Elsevier).
  • Reviewer for NeurIPS, ICML, CVPR, ICCV, AAAI, ECCV, IJCAI, ICLR, WACV, KDD, MLSys, TinyML, ECV, BLPCV.
  • TA for Applications of Parallel Computers, Berkeley CS 267.
  • TA for Online Course Applications of Parallel Computers on Moodle XSEDE.
  • TA for Optimization Analytics, Berkeley INDENG 240.
  • TA for Mathematical Programming, Berkeley INDENG 262A.
  • BAIR Mentoring Program for Underrepresented Undergraduates.

Contact

UC Berkeley, Berkeley, CA 94709
zhendong@berkeley.edu