Tags: Channel-first dataflow, Convolutional Neural Network, Hardware Accelerator, and Pixel-first dataflow
Abstract:
With the growth of research into hardware for Convolutional Neural Network (CNN) algorithms, it has become crucial to examine the different aspects of hardware design. CNNs are mainly used in computer vision applications, and translating these algorithms into hardware calls for adopting an appropriate dataflow to improve the utilisation of hardware resources, resulting in higher throughput. In particular, the inference task at each neuron position can be assigned to a compute unit in the hardware accelerator, and several such neuron positions can be processed in parallel. We observe that adopting a static dataflow for an architecture can result in the under-utilisation of resources because of the varying dimensions of the data across the network. This paper is motivated by the need for an adaptive dataflow that improves multiply-and-accumulate (MAC) utilisation in CNN accelerators. We propose ADaMaT, a method that adapts the dataflow at runtime by assigning tasks to the MAC units according to the dimensions of each layer, instead of using a pre-determined assignment. The adaptive assignment aims to maximise MAC utilisation and thereby improve throughput. We perform a comparative analysis of different static dataflows and our proposed ADaMaT dataflow.
ADaMaT: Towards an Adaptive Dataflow for Maximising Throughput in Neural Network Inference
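The core idea of the abstract — selecting, per layer, whichever dataflow keeps more MAC units busy — can be illustrated with a small sketch. This is not the paper's actual ADaMaT algorithm; the function names and the simple tiling-based utilisation model below are assumptions made for exposition only.

```python
# Illustrative sketch, assuming a simple tiling model of MAC utilisation.
# Not the paper's actual ADaMaT method.
import math


def utilisation(parallel_work: int, num_macs: int) -> float:
    """Fraction of MAC units kept busy when `parallel_work` independent
    tasks are tiled across `num_macs` units; the last tile may be only
    partially filled, which is where utilisation is lost."""
    tiles = math.ceil(parallel_work / num_macs)
    return parallel_work / (tiles * num_macs)


def pick_dataflow(channels: int, height: int, width: int, num_macs: int) -> str:
    """Hypothetical runtime selection between two static dataflows:
    channel-first parallelises over output channels, pixel-first over
    spatial (pixel) positions. Pick whichever keeps more MACs busy."""
    u_channel = utilisation(channels, num_macs)
    u_pixel = utilisation(height * width, num_macs)
    return "channel-first" if u_channel >= u_pixel else "pixel-first"
```

Under this toy model, a 256-MAC array maps a late layer (512 channels, 7x7 feature map) best channel-first, while an early layer (64 channels, 112x112 map) maps best pixel-first — exactly the kind of per-layer variation that motivates adapting the dataflow at runtime rather than fixing it at design time.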