Hello everyone, I am here for the presentation of my paper, "Towards Feature Distribution Alignment and Diversity Enhancement for Data-Free Quantization." I am Yangcheng Gao, a master's student supervised by Zhao Zhang at Hefei University of Technology, and the first author of the paper.

First of all, a brief introduction to model quantization. When deploying deep models on edge devices, we usually need to consider problems like inference latency, energy consumption, and memory usage. To tackle this, integer computation, especially low-bit integer computation, is employed to lower the inference cost of neural networks, because of its high computational efficiency.

On this slide we show some basic concepts of quantization; you can refer to the survey below for more details. The main equation at the top shows that, by scaling, rounding, and shifting, a floating-point value r can be mapped to a range representable by low-bit integers; the quantized value is given by the left-hand side of the equation. This process reduces the model size and accelerates inference for deployment on edge devices, but it also introduces quantization noise and damages the model performance. Hence, quantization-aware training is proposed to recover the performance by alternately training and quantizing the model weights.

However, in some cases the original data are prohibited, for example for privacy or secrecy reasons, so we cannot fine-tune or calibrate the quantized model on the original data. As a result, real-world applications of quantization-aware training and post-training quantization may be restricted. Therefore, a family of methods termed data-free quantization has been proposed to generate data for quantization-aware training or post-training quantization. Some methods, such as ZeroQ, constrain the global activation distribution to generate data for calibration, but they encounter severe performance degradation in low-bit quantization. Others, like GDFQ, employ a GAN mechanism to generate fake data; Cluster-Q, our proposed method, belongs to this family.

Although recent studies have devoted lots of effort to data-free quantization, the obtained improvements are still limited compared with quantization-aware training, due to the gap between synthetic data and real-world data. How to make the generated synthetic data closer to real-world data for fine-tuning is therefore a crucial issue to be solved. To close the gap, we explore the pre-trained model information at a fine-grained level, and we observe that current generative quantization methods lead to weak inter-class separability in their quantized models. Based on this phenomenon, we hypothesize that high inter-class separability will reduce the gap between synthetic data and real-world data.

In this paper, we improve synthetic data generation for quantization from the perspective of semantic feature distributions. We show t-SNE visualization results on the activations of ResNet-20 running on the CIFAR-10 dataset: deep features exhibit the property that inter-class separability is enhanced as we go deeper into the network. We therefore propose our Cluster-Q to exploit this property.

Our contributions are as follows. First, a new and effective data-free quantization scheme termed Cluster-Q is proposed, with a feature distribution alignment and diversity enhancement process. To the best of our knowledge, Cluster-Q is the first data-free quantization scheme to utilize feature distribution alignment with clusters. Second, we reveal that high inter-class separability of the semantic features is critical for synthetic data generation, and it directly impacts the quantized model performance. Finally, we conduct extensive experiments that show state-of-the-art results.
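Before moving on to the framework overview, here is a minimal sketch of the uniform quantization mapping described at the beginning of the talk: a floating-point tensor is scaled, rounded, and shifted into a low-bit integer range, and dequantizing it back reveals the quantization noise. The function names, the asymmetric-range choice, and the use of PyTorch are illustrative assumptions, not the exact scheme from the paper or the slides.

```python
import torch

def uniform_quantize(r, num_bits=8):
    # Map a floating-point tensor r to low-bit integers by scaling,
    # rounding and shifting (asymmetric uniform quantization, illustrative).
    qmin, qmax = 0, 2 ** num_bits - 1
    r_min, r_max = r.min(), r.max()
    scale = (r_max - r_min) / (qmax - qmin)          # step size
    zero_point = torch.round(qmin - r_min / scale)   # shift
    q = torch.clamp(torch.round(r / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of r; the difference is the quantization noise.
    return scale * (q - zero_point)

# Example: quantize random weights to 4 bits and measure the introduced noise.
w = torch.randn(1000)
q, s, z = uniform_quantize(w, num_bits=4)
w_hat = dequantize(q, s, z)
print("mean squared quantization error:", torch.mean((w - w_hat) ** 2).item())
```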
Now we show the overview of Cluster-Q. With a conditional GAN design, the framework can produce close-to-real images for quantization. We feed the generator with Gaussian noise and pseudo labels to produce synthetic data, which are then used to fine-tune the quantized model. In our framework, these data are also fed to the full-precision model: as shown on the right, we extract the feature distributions from its semantic features and align them by clustering to obtain high inter-class separability, and the resulting loss term updates the generator. Hence, the generator learns to produce data whose distribution approximates that of real images. The other loss terms shown on the left also update the generator or fine-tune the quantized model. In the whole process, data generation and quantized-model fine-tuning are performed alternately. You can refer to our paper for more information.

Next, we show the details of the feature distribution alignment. Note that feature distributions in shallow layers have no aggregation property, so we only apply the alignment to deep layers. In this way, the knowledge of the pre-trained model can be extracted to enhance the classification boundaries of the quantized model. By feeding the fake data to the full-precision model, the running statistics of each deep layer can be extracted and aligned to the centroid corresponding to the soft label of the data. The centroids are learned by exponential moving average, and the distance from the statistics to the centroids is added to the loss that updates the generator. In this way, we align the feature distribution and improve the quality of the generated data, which ultimately determines the quantized model performance.
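To make the alignment step more concrete, below is a minimal PyTorch-style sketch of the idea as described: per-class centroids of deep-layer statistics are maintained with an exponential moving average, and the distance between the current batch statistics and the centroid of the corresponding pseudo label is added to the loss that updates the generator. All names (FeatureAligner, the momentum value, the use of per-channel means as the statistic) are illustrative assumptions, not the paper's exact implementation.

```python
import torch

class FeatureAligner:
    # Sketch: align per-class statistics of deep-layer features to EMA centroids.

    def __init__(self, num_classes, num_channels, momentum=0.9):
        self.momentum = momentum
        # One centroid (a per-channel mean statistic) per class.
        self.centroids = torch.zeros(num_classes, num_channels)

    def __call__(self, features, pseudo_labels):
        # features: (batch, channels, H, W) activations from a deep layer of the
        # full-precision model; pseudo_labels: (batch,) labels fed to the generator.
        stats = features.mean(dim=(2, 3))                        # (batch, channels)
        loss = 0.0
        for c in pseudo_labels.unique():
            class_stats = stats[pseudo_labels == c].mean(dim=0)  # batch statistic for class c
            # Update the class centroid with an exponential moving average.
            with torch.no_grad():
                self.centroids[c] = (self.momentum * self.centroids[c]
                                     + (1 - self.momentum) * class_stats)
            # The distance from the batch statistic to its class centroid
            # is added to the loss that updates the generator.
            loss = loss + torch.norm(class_stats - self.centroids[c].detach(), p=2)
        return loss
```

In the alternating training loop, this loss term would simply be added to the other generator losses shown on the left of the framework figure before back-propagating into the generator.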