
Distillation-Based Model Compression Framework for Swin Transformer

EasyChair Preprint 15433

7 pages · Date: November 16, 2024

Abstract

Vision Transformers (ViTs) have gained significant attention in computer vision due to their strong modeling capabilities. However, most ViT models are highly complex, with large parameter counts that demand considerable memory and inference time, limiting their applicability on resource-constrained devices. To address this issue, we propose a distillation-based framework for compressing large models for smaller datasets. The framework leverages fine-tuning and knowledge distillation to accelerate the training of compressed models. To evaluate its effectiveness, two compressed Swin Transformer models, Swin-N and Swin-M, were introduced and tested on the CIFAR-100 dataset. Experimental results demonstrate that, when trained with the proposed framework, both Swin-N and Swin-M achieve significantly higher accuracy than their counterparts trained from scratch, with Swin-N improving by 18.89% and Swin-M by 20.10%. Moreover, Swin-M closely approaches the accuracy of the Swin-T teacher model, further validating the effectiveness of the framework.
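
The abstract describes the approach without code. As a rough illustration of the distillation component only, the sketch below shows the standard soft-target distillation loss (softened teacher and student outputs combined with a cross-entropy term) in PyTorch. The temperature, the weighting factor alpha, and the train_step helper are illustrative assumptions rather than the authors' exact training setup; in the paper's setting, the teacher would be a fine-tuned Swin-T and the student one of the compressed models (Swin-N or Swin-M).

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    # Soft-target term: KL divergence between the softened student and
    # teacher output distributions, scaled by T^2 to keep gradient
    # magnitudes comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce

def train_step(student, teacher, images, targets, optimizer):
    # One distillation training step. "teacher" is assumed to be a
    # fine-tuned Swin-T; "student" stands for whichever compressed Swin
    # configuration (e.g. Swin-N or Swin-M) is being trained.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A pretrained Swin-T teacher could be obtained, for instance, with timm.create_model("swin_tiny_patch4_window7_224", pretrained=True, num_classes=100) before fine-tuning it on CIFAR-100; the hyperparameter values above are placeholders.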

Keyphrases: Knowledge Distillation, Model Compression, Swin, ViT, Fine-Tuning

BibTeX entry
@booklet{EasyChair:15433,
  author       = {Mazen Amria and Aziz M. Qaroush and Mohammad Jubran and Alaa Zuhd and Ahmad Khatib},
  title        = {Distillation-Based Model Compression Framework for Swin Transformer},
  howpublished = {EasyChair Preprint 15433},
  year         = {2024}}