
Capacity Visual Attention Networks

9 pages. Published: September 29, 2016


Inspired by recent work in machine translation and object detection, we introduce an attention-based model that automatically learns to extract information from an image by adaptively assigning its capacity across different portions of the input data and processing only the selected regions, which vary in size, at high resolution. This is achieved by combining two modules: an attention sub-network, which uses a mechanism to model a human-like counting process, and a capacity sub-network. The capacity sub-network efficiently identifies the input regions to which the attention model's output is most sensitive, and to which more capacity should therefore be devoted, and dynamically adapts the size of each region. We focus our evaluation on the Cluttered MNIST, SVHN, and Cluttered GTSRB image datasets. Our findings indicate that the proposed model drastically reduces the number of computations compared with traditional convolutional neural networks while maintaining similar or better performance.
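
As a rough illustration of the idea of devoting capacity to the regions the output is most sensitive to, the following minimal sketch estimates per-pixel sensitivity and picks the highest-scoring patch. It is not the authors' implementation: the finite-difference estimate, the fixed non-overlapping patch grid, and the names `sensitivity_map`, `top_k_patches`, and `toy_model` are all assumptions made for this example, and the sketch does not adapt region sizes as the paper's capacity sub-network is described as doing.

```python
import math

def sensitivity_map(model, image, eps=1e-4):
    """Central finite-difference estimate of |d model / d pixel| (assumed stand-in for a gradient)."""
    h, w = len(image), len(image[0])
    sens = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            original = image[r][c]
            image[r][c] = original + eps
            up = model(image)
            image[r][c] = original - eps
            down = model(image)
            image[r][c] = original  # restore the pixel before moving on
            sens[r][c] = abs(up - down) / (2 * eps)
    return sens

def top_k_patches(sens, patch, k):
    """Sum sensitivity over non-overlapping patch x patch tiles and
    return the top-left (row, col) corners of the k highest-scoring tiles."""
    h, w = len(sens), len(sens[0])
    scored = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            total = sum(sens[i][j]
                        for i in range(r, r + patch)
                        for j in range(c, c + patch))
            scored.append((total, (r, c)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pos for _, pos in scored[:k]]

def toy_model(image):
    """Hypothetical coarse model: reacts only to the 2x2 block at rows 2-3, cols 4-5."""
    s = sum(image[i][j] for i in (2, 3) for j in (4, 5))
    return math.tanh(0.5 * s)

image = [[0.1] * 8 for _ in range(8)]
sens = sensitivity_map(toy_model, image)
selected = top_k_patches(sens, patch=2, k=1)  # tile that would be processed at high resolution
```

In this toy setting the only tile with non-zero sensitivity is the one the model actually depends on, so it is the one selected; a real system would presumably obtain sensitivities from backpropagation rather than from one model evaluation per pixel perturbation.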

Keyphrases: deep learning, image recognition, machine learning, neural networks, visual attention

In: Christoph Benzmüller, Geoff Sutcliffe and Raul Rojas (editors). GCAI 2016. 2nd Global Conference on Artificial Intelligence, vol 41, pages 72--80

BibTeX entry
  author    = {Marcus Edel and Joscha Lausch},
  title     = {Capacity Visual Attention Networks},
  booktitle = {GCAI 2016. 2nd Global Conference on Artificial Intelligence},
  editor    = {Christoph Benzm{\"u}ller and Geoff Sutcliffe and Raul Rojas},
  series    = {EPiC Series in Computing},
  volume    = {41},
  pages     = {72--80},
  year      = {2016},
  publisher = {EasyChair},
  bibsource = {EasyChair,},
  issn      = {2398-7340},
  url       = {},
  doi       = {10.29007/lcmk}}