Feature Map Retargeting to Classify Biomedical Journal Figures

In this work, we introduce a layer that retargets the feature maps of convolutional neural networks (CNNs). Our "Retarget" layer densely samples values from a feature map at locations inferred by our proposed spatial attention regressor. The layer extends an existing saliency-based distortion layer by replacing its convolutional components with depthwise convolutions and by adding a set of learnable parameters. These reformulations, together with the tuning of a few hyper-parameters, make the Retarget layer applicable at any depth of a feed-forward CNN. In keeping with the spirit of retargeting methods used in content-aware image resizing, we insert the layer at the bottlenecks of several pre-trained network architectures. We validate the layer on the ImageCLEF2013, ImageCLEF2015, and ImageCLEF2016 subfigure classification tasks. The redesigned DenseNet121 model with the Retarget layer achieves state-of-the-art results under the visual category when no data augmentation is performed. Spatial sampling at deeper layers sharply increases computational cost and memory requirements. To address this, we experiment with an approximation of nearest-neighbor interpolation and show consistent improvements over the baseline models and other state-of-the-art attention models, demonstrating the layer's broad applicability. The code will be made publicly available.
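To make the sampling idea concrete, the following is a minimal NumPy sketch of saliency-driven retargeting with nearest-neighbor lookup. It is an illustrative assumption, not the authors' implementation: the function name `retarget_nearest`, the use of axis-wise saliency marginals, and inverse-CDF placement of sample locations are all hypothetical simplifications of the attention-guided dense sampling described in the abstract (in practice the sampling locations would come from a learned spatial attention regressor, and sampling would be differentiable).

```python
import numpy as np

def retarget_nearest(feature_map, saliency, out_h, out_w):
    """Hypothetical sketch: resample a (H, W, C) feature map so that
    rows/columns with more saliency mass receive more samples, using
    nearest-neighbor lookup at the inferred locations."""
    H, W, _ = feature_map.shape
    # Marginal saliency along each axis, normalized to a distribution.
    row_p = saliency.sum(axis=1); row_p = row_p / row_p.sum()
    col_p = saliency.sum(axis=0); col_p = col_p / col_p.sum()
    # Inverse-CDF placement: uniform steps in CDF space map to dense
    # sampling wherever the saliency mass concentrates.
    row_cdf = np.cumsum(row_p); col_cdf = np.cumsum(col_p)
    ys = np.searchsorted(row_cdf, (np.arange(out_h) + 0.5) / out_h)
    xs = np.searchsorted(col_cdf, (np.arange(out_w) + 0.5) / out_w)
    ys = np.clip(ys, 0, H - 1); xs = np.clip(xs, 0, W - 1)
    # Nearest-neighbor gather: every output value is copied verbatim
    # from some input location, so no interpolation weights are needed.
    return feature_map[np.ix_(ys, xs)]
```

With a uniform saliency map this degenerates to plain strided subsampling; a peaked saliency map instead concentrates output rows and columns around the salient region, which is the retargeting effect the layer aims for.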