Homepage of Zhen Dong

PhD & Postdoc at Berkeley AI Research

Research Interests

Efficient AI: Efficient inference and training for generative models (Vision & NLP)

LLM compression, AI systems, serving and acceleration

Function-calling LLM agents and multi-agent systems

Hardware-software co-design and AI for science

Efficient evaluation and alignment of foundation models

Education

Ph.D./Postdoc at University of California, Berkeley

B.S. at Peking University: Rank 1/327 in EECS

 

Awards

  • Winner of the 2018-2020 Berkeley Fellowship.

  • Second Place in the PhD Forum at DAC 2024.

  • Doctoral Consortium at CVPR 2024.

  • Best Paper Nomination at Practical DL Workshop at AAAI 2023.

  • 1st Place in EMCC 2020 Competition on both Classification and Object Detection tracks.

  • 2nd Place in Visual Wake Word Challenge at CVPR 2019.

  • AWS Research Credits Award and Google Cloud Research Credits Award.

  • 1st Place Research Funding Proposal at Berkeley Deep Drive (BDD) 2019.

  • Winner of the SenseTime Scholarship in 2018.

  • Winner of Tang Lixin Scholarship for outstanding students in China (top 0.5%).

  • Winner of Tang Lixin 1st Prize Scholarship for graduate students studying abroad (top 0.05%).

  • Winner of Fang Zheng Scholarship (top 1%).

  • 1st Prize in the Chinese Olympiad in Physics and the Chinese Physics Competition for college students.

  • Princeton University Math Competition (PUMaC): Top three among all participants in the geometry group.

  • Top Ten Undergraduate Research Award at PKU EECS.

  • Outstanding Graduate of Peking University and Outstanding Graduate of Beijing.

Publications

Research Experience

Ph.D./Postdoc, Berkeley AI Research (BAIR), UC Berkeley (Aug 2018 - Jun 2023)

Advisor: Prof. Kurt Keutzer

Research on Hessian-AWare Quantization: HAWQ (ICCV’19), HAWQ-V2 (NeurIPS’20), ZeroQ (CVPR’20), HAP (WACV’22), Quantization Review (BLPCV’22), QD-BEV (ICCV’23), NoisyQuant (CVPR’23)

  • Propose a Hessian-based method to decide mixed-precision configuration and block-wise fine-tuning order.
  • Prove a theorem that justifies using the trace of the Hessian as a sensitivity metric, and conduct fast Pareto frontier optimization.
  • Generalize to segmentation and 2D/3D object detection tasks, achieving state-of-the-art results.
  • Conduct fast end-to-end quantization without fine-tuning and without using any training/test data.

Research on HW-SW Co-design: HAWQ-V3 (ICML’21), CoDeNet (FPGA’21), HAO (FCCM’21), CSQ (DAC’23), EPIM (DAC’24)

  • Achieve hardware-aware quantization and utilize 4-bit Tensor Cores for inference acceleration.
  • Implement 4-bit kernels and mixed-precision inference on TVM, achieving 7.4x compression and 5.4x speedup over fp32.
  • Propose an efficient deformable operation for embedded FPGAs and design a new FPGA core with ultra-low-precision arithmetic.
  • Conduct HW-SW joint architecture search and efficiently implement mixed-precision NNs on CPUs, GPUs, FPGAs, and PIM.

Research on Efficient LLMs and Diffusion Models: Q-BERT (AAAI’20), Q-Diffusion (ICCV’23), SqueezeLLM (ICML’24), PB-LLM (ICLR’24)

  • Propose sensitivity-based non-uniform quantization and dense-and-sparse decomposition to handle outliers.
  • Pioneer the use of Hessian information to guide LLM quantization in both PTQ and QAT.
  • Implement 3/4-bit CUDA kernels, achieving 4.6x compression compared to fp16 and 2.4x speedup on an A6000.
  • Propose timestep-aware calibration and split shortcut quantization to achieve 4-bit diffusion models for the first time.

Research on Multi-agent Systems: MAgIC (EMNLP’24)

  • Pioneer the integration of probabilistic graphical modeling (PGM) to enhance the cognitive abilities of LLMs.
  • Present a framework to evaluate LLM-powered multi-agent systems by employing social deduction games.

Research on Image & Video Generative Models: PromptCoT (CVPR’24), ViewControl (IJCAI’24), D-Edit (AAAI’25), Meissonic, Magic-Me, VEditBench, K-Sort Arena

  • Propose new methods to improve the controllability of generative diffusion models.
  • Develop novel, efficient Arena algorithms for human-in-the-loop evaluation and alignment.
  • Present Meissonic-1B, which elevates masked image modeling (MIM) text-to-image models to the SDXL level.

Research on AI for Science: FastML (Frontiers in Big Data’22), High-Momentum Particle Trigger (TRETS’24)

  • Review AI inference acceleration methods and how they help dark matter search, morphology characterization, etc.
  • Implement efficient AI on ASICs and FPGAs to reduce time cost and enable particle trigger decisions at CERN LHC.

Research Intern, ByteDance AI Lab (Jan 2023 - Apr 2023)

Research Intern, NVIDIA AI Lab (May 2021 - Aug 2021)

Research Intern, Facebook AI Research (May 2020 - Aug 2020)

Undergraduate Visiting Researcher Program (UGVR), Stanford University

Advisor: Prof. H.-S. Philip Wong

Research Intern, SenseTime AI Lab

Research Assistant, EECS School, Peking University

Advisor: Prof. Jinfeng Kang

Teaching

  • Head Graduate Student Instructor for Applications of Parallel Computers, Berkeley CS 267.
  • Course Coordinator for Online Course Applications of Parallel Computers on Moodle XSEDE.
  • Graduate Student Instructor for Optimization Analytics, Berkeley INDENG 240.
  • Graduate Student Instructor for Mathematical Programming, Berkeley INDENG 262A.
  • BAIR Mentoring Program for Underrepresented Undergraduates.
  • Representative Courses I took at UC Berkeley:
    Visual Object and Activity Recognition (4.00)
    RISC-V CPU on FPGA Lab (4.00)
    Digital Circuits and Computer Architecture (4.00)
    Applications of Parallel Computers (4.00)
    Statistical Learning Theory (4.00)
    Convex Optimization and Approximation (4.00)
  • Representative Courses I took at Peking University:
    Digital Logic (4.00)
    Principles of Digital Integrated Circuits (4.00)
    Analog Circuits (3.99)
    Advanced Analog Integrated Circuits Design (3.99)
    Micro-Nano Integrated System (4.00)
    Fundamentals of Solid State Physics (3.98)
    Fundamentals of Semiconductor Materials (3.97)
    Physics of Semiconductor (3.98)
    Semiconductor Device Physics (3.98)
    Principle of Integrated Circuits Process (3.99)

Opensource

Industry Collaborations

  • Intel, Amazon, Alibaba, NVIDIA, Panasonic, ByteDance, Google, Meta, Apple, AMD, Nexusflow.ai, Samsung, Tesla.

Talks, Media & Events

Service

  • Reviewer for TNNLS (IEEE Transactions on Neural Networks and Learning Systems), TMLR (Transactions on Machine Learning Research), TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence), JMLR (Journal of Machine Learning Research), IEEE Micro, TED (IEEE Transactions on Electron Devices), PR (Pattern Recognition), TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), OJCAS (IEEE Open Journal of Circuits and Systems), JCST (Journal of Computer Science and Technology), and Fundamental Research (Elsevier).
  • Reviewer for NeurIPS, ICML, CVPR, ICCV, AAAI, ECCV, IJCAI, ICLR, WACV, KDD, MLSys, TinyML, ECV, BLPCV.

Contact

UC Berkeley, CA, 94709
zhendong@berkeley.edu