
Distillation-Based Model Compression Framework for Swin Transformer

EasyChair Preprint 15433

7 pages · Date: November 16, 2024

Abstract

Vision Transformers (ViTs) have gained significant attention in computer vision due to their strong modeling capabilities. However, most ViT models are highly complex, with large parameter counts that demand considerable memory and inference time, limiting their applicability on resource-constrained devices. To address this issue, we propose a distillation-based framework for compressing large models for smaller datasets. The framework leverages fine-tuning and knowledge distillation to accelerate the training of compressed models. To evaluate its effectiveness, two compressed Swin Transformer models, Swin-N and Swin-M, were introduced and tested on the CIFAR-100 dataset. Experimental results demonstrate that, when trained with the proposed framework, both Swin-N and Swin-M achieve significantly higher accuracy than their counterparts trained from scratch, with Swin-N improving by 18.89% and Swin-M by 20.10%. Moreover, Swin-M closely approaches the accuracy of the Swin-T teacher model, further validating the effectiveness of the framework.
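
The abstract describes the approach without code. As a rough illustration of the distillation component only, the sketch below shows the standard soft-target distillation loss (softened teacher and student outputs combined with a cross-entropy term) in PyTorch. The temperature, the weighting factor alpha, and the train_step helper are illustrative assumptions rather than the authors' exact training setup; in the paper's setting, the teacher would be a fine-tuned Swin-T and the student one of the compressed models (Swin-N or Swin-M).

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.5):
    # Soft-target term: KL divergence between the softened student and
    # teacher output distributions, scaled by T^2 to keep gradient
    # magnitudes comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce

def train_step(student, teacher, images, targets, optimizer):
    # One distillation training step. "teacher" is assumed to be a
    # fine-tuned Swin-T; "student" stands for whichever compressed Swin
    # configuration (e.g. Swin-N or Swin-M) is being trained.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A pretrained Swin-T teacher could be obtained, for instance, with timm.create_model("swin_tiny_patch4_window7_224", pretrained=True, num_classes=100) before fine-tuning it on CIFAR-100; the hyperparameter values above are placeholders.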

Keyphrases: Knowledge Distillation, Model Compression, Swin, ViT, Fine-Tuning

BibTeX entry
@booklet{EasyChair:15433,
  author       = {Mazen Amria and Aziz M. Qaroush and Mohammad Jubran and Alaa Zuhd and Ahmad Khatib},
  title        = {Distillation-Based Model Compression Framework for Swin Transformer},
  howpublished = {EasyChair Preprint 15433},
  year         = {2024}}