Specifically, we propose a loss function that adds the Euclidean distance between the running BN statistics and the statistic centroid of each class in the deep layers. As shown above, mu and sigma with hats denote the running mean and standard deviation for class c, while the two without hats represent the corresponding statistics of the centroids. The loss sums these distances over every layer. This process extracts class-wise distribution information and improves the quality of the synthetic data, so the learned classification boundary is further enhanced during fine-tuning.

In addition, during generator training, the Batch Normalization statistics obtained from mis-classified synthetic data do not participate in the computation. For these reasons, we need to keep updating the centroids during generator training to mitigate the negative effects. We directly use an exponential moving average to update the centroids, as shown above. In this equation, beta-FD denotes the decay rate of the exponential moving average, which trades off the importance of previous and current statistics. In this way, the distribution centroids can dynamically learn the class-wise feature distribution.

As for the diversity enhancement process: although our feature distribution alignment method can obtain high inter-class separability of semantic features, the alignment is also vulnerable to class-wise mode collapse, which degrades the generalization performance of the quantized model. To avoid class-wise mode collapse, we introduce Gaussian noise to enlarge the perturbation within clusters, which relieves the feature distribution homogenization of each class caused by feature distribution over-fitting. To this end, we define the diversity enhancement loss as shown above. Here N denotes the Gaussian noise, and lambda-mu and lambda-sigma denote the distortion levels of the diversity enhancement. In this way, the running statistics for each class c are allowed to shift within a dynamic range around the centroids, so the generator can enhance the diversity of generated data within the same class. A rough code sketch of these terms is given below, after the experimental comparison.

Compared with other methods, such as ZeroQ, GDFQ, Qimera, DSG, DDAQ and AutoReCon, our method obtains state-of-the-art performance under different bit settings. W denotes the weight bit-width and A denotes the activation bit-width. We only conduct experiments on 4-bit and 8-bit, because powers of two are widely used in practical applications. Note that our method outperforms Qimera by 1.708% for MobileNetV2 on the ImageNet dataset. On the CIFAR-10 dataset, similar conclusions can be drawn; that is, our method surpasses the current state-of-the-art methods in terms of accuracy loss in the investigated cases.

We also present visualization results to show the superiority of the proposed ClusterQ. As shown in the left part, without learning the classification boundaries, the data generated by ZeroQ show little class-wise discrepancy. For GDFQ, the generated data can be distinguished into different classes but contain few detailed textures. For Qimera, the small variance of the images within each class indicates that it encounters class-wise mode collapse. In comparison, the data generated by ClusterQ have rich textures, abundant colors and high diversity. What's more, in the right part we show synthetic data generated with the pre-trained ResNet-20 model on CIFAR-10, with "ship" chosen as an example; the samples show high diversity within the same class.
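The slide equations referred to above are not reproduced in this transcript. As a rough illustration only, here is a minimal PyTorch-style sketch, for a single deep layer and a single class c, of how the feature distribution alignment loss, the EMA centroid update, and the Gaussian diversity perturbation described above could fit together. The tensor shapes, exact loss form, and the names beta_fd, lambda_mu and lambda_sigma are assumptions reconstructed from the spoken description, not the authors' released implementation; in the full method the loss is summed over all deep BN layers and all classes.

```python
# Sketch only: reconstructed from the presentation, not the official ClusterQ code.
import torch


class ClassCentroids:
    """Per-class running centroids of (mean, std) for one deep layer."""

    def __init__(self, num_classes: int, num_channels: int, beta_fd: float = 0.2):
        self.mu = torch.zeros(num_classes, num_channels)    # centroid means
        self.sigma = torch.ones(num_classes, num_channels)  # centroid stds
        self.beta_fd = beta_fd                               # EMA decay rate

    @torch.no_grad()
    def ema_update(self, c: int, batch_mu: torch.Tensor, batch_sigma: torch.Tensor):
        """Exponential moving average: trades off previous vs. current statistics."""
        self.mu[c] = (1 - self.beta_fd) * self.mu[c] + self.beta_fd * batch_mu
        self.sigma[c] = (1 - self.beta_fd) * self.sigma[c] + self.beta_fd * batch_sigma


def fda_loss(features: torch.Tensor, centroids: ClassCentroids, c: int,
             lambda_mu: float = 0.0, lambda_sigma: float = 0.0) -> torch.Tensor:
    """Euclidean distance between the batch statistics of class-c synthetic
    features (shape N x C) and the class-c centroids.

    With lambda_mu = lambda_sigma = 0 this is plain distribution alignment;
    with positive values the centroid targets are jittered by Gaussian noise,
    which is the diversity-enhancement variant sketched here.
    """
    batch_mu = features.mean(dim=0)
    batch_sigma = features.std(dim=0)

    # Gaussian perturbation lets the running statistics shift within a
    # dynamic range around the centroids (diversity enhancement).
    target_mu = centroids.mu[c] + lambda_mu * torch.randn_like(centroids.mu[c])
    target_sigma = centroids.sigma[c] + lambda_sigma * torch.randn_like(centroids.sigma[c])

    return ((batch_mu - target_mu) ** 2).sum() + ((batch_sigma - target_sigma) ** 2).sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    cents = ClassCentroids(num_classes=10, num_channels=64, beta_fd=0.2)
    feats = torch.randn(32, 64)  # stand-in class-3 features from one layer
    loss = fda_loss(feats, cents, c=3, lambda_mu=0.1, lambda_sigma=0.1)
    cents.ema_update(3, feats.mean(dim=0), feats.std(dim=0))
    print(float(loss))
```

In generator training, such a term would typically be added to the usual data-free generation objective with weighting coefficients; the alpha-3 discussed in the ablation below is the weight that controls the importance of the diversity-enhancement part.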
Finally, we conduct ablation studies and a sensitivity analysis on ClusterQ. We first evaluate the effectiveness of each component, namely the diversity enhancement and the exponential moving average, by quantizing ResNet-18 to 4-bit on the ImageNet dataset. Without either the diversity enhancement or the exponential moving average, the performance improvement of the quantized model is limited; that is, both components are important for our method.

As for the hyper-parameter sensitivity analysis, the quantized model achieves the best result when beta-FD equals 0.2. The performance drops when the decay rate is lower than 0.2, since the centroids then cannot adapt to the changing distribution; if beta-FD is increased beyond 0.2, the centroids fluctuate, which also degrades performance. For the hyper-parameter alpha-3, which adjusts the importance of the diversity enhancement, the performance of the quantized model increases as alpha-3 goes up to 0.6. This demonstrates that the diversity enhancement improves the quality of the synthetic data and thus boosts performance. However, the performance falls when alpha-3 goes above 0.6, because the excess distortion disturbs the classification boundary.

That's all for my presentation, thank you.