CIE Seminar – Is Model Compression Always Harmful to the Performance of Neural Networks?

Time: 1/22/2022 15:00-16:00 (Sat)
Venue: Zoom Meeting
Registration: Click Here

Abstract:

The great success of deep learning has been accompanied by rapid growth in the size and computational cost of deep models. As an example, GPT-3 has 175 billion parameters. This makes model compression and acceleration critical for deploying deep neural networks on resource-constrained and speed-demanding platforms, especially those with limited energy and stringent memory budgets, such as DSPs or IoT devices. In this talk, we will cover modern techniques for compressing deep neural networks, including model quantization for image classification and network pruning for generative models. We will first review conventional methods of neural network quantization and their limitations, followed by more recent progress, including advanced methods for adaptive bit-widths and mixed-precision quantization. After that, we will discuss network pruning for generative models, which are typically much larger and more difficult to compress than models for image classification. Finally, we will introduce knowledge distillation, a powerful method for improving the performance of small models. We hope this talk will give a basic understanding of compression techniques for deep models.

Speaker Bio:

Qing Jin is currently a PhD student at Northeastern University working on deep learning acceleration and software-hardware co-design for innovative computing systems. Qing obtained his bachelor's and master's degrees from Nankai University, both in Microelectronics, and a master's degree in Computer Engineering from Texas A&M University. He worked on RF integrated circuit design and low-power analog circuit design in his early years, and more recently has worked on computer vision and deep learning algorithms. He has been a research intern at several companies, including ByteDance, SenseBrain, Kwai, and Snap, working on model compression, neural network quantization, and neural architecture search (NAS) for deep neural networks. His work has been published in top-tier conferences and journals across a broad range of areas, including computer vision, machine learning, and solid-state circuits, such as CVPR, ICML, NeurIPS, AAAI, BMVC, CICC, ISLPED, and JSSC. He is a co-recipient of a Best Student Paper Award Finalist recognition at the IEEE Custom Integrated Circuits Conference (CICC). He is interested in both practical design and theoretical analysis related to deep learning.

Yanzhi Wang is currently an assistant professor in the Department of Electrical and Computer Engineering, and Khoury College of Computer Science (Affiliated), at Northeastern University. He received his Ph.D. degree in Computer Engineering from the University of Southern California (USC) in 2014, under the supervision of Prof. Massoud Pedram. He received the Ming Hsieh Scholar Award (the highest honor in the EE Dept. of USC) for his Ph.D. study. He received his B.S. degree in Electronic Engineering from Tsinghua University in 2009, with distinction from both the university and Beijing city. Dr. Wang’s current research interests include real-time and energy-efficient deep learning and artificial intelligence systems, model compression and mobile acceleration of deep neural networks (DNNs), deep learning acceleration for autonomous driving, neuromorphic computing and non-von Neumann computing paradigms, as well as cyber-security in deep learning systems. His group works on both algorithms and actual implementations (mobile and embedded systems, FPGAs, circuit tape-outs, GPUs, emerging devices, and UAVs).
