Depthwise Separable Convolutions and Variational Dropout Within the Context of YOLOv3

Deep learning algorithms have demonstrated remarkable performance in many sectors and have become one of the main foundations of modern computer-vision solutions. However, these algorithms often impose prohibitive levels of memory and computational overhead, especially in resource-constrained environments. In this study, we combine the state-of-the-art object-detection model YOLOv3 with depthwise separable convolutions and variational dropout in an attempt to bridge the gap between the superior accuracy of convolutional neural networks and the limited access to computational resources. We propose three lightweight variants of YOLOv3 by replacing the original network's standard convolutions with depthwise separable convolutions at strategic locations within the network, and we evaluate their impact on YOLOv3's size, speed, and accuracy. We also explore variational dropout, a technique that learns an individual, unbounded dropout rate for each network weight. Experiments on the PASCAL VOC benchmark dataset show promising results: variational dropout combined with the most efficient YOLOv3 variant yields an extremely sparse solution that removes 95% of the baseline network's parameters at a relatively small cost of 3% in accuracy.
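To make the size reduction concrete, the per-layer parameter savings of a depthwise separable convolution can be sketched with simple counting: a standard k×k convolution costs k·k·C_in·C_out weights, while the depthwise-plus-pointwise factorization costs k·k·C_in + C_in·C_out. The layer shape below (3×3, 256→512 channels, typical of Darknet-53-style backbones) is an illustrative assumption, not a figure taken from the paper:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable replacement:
    a k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Illustrative layer shape: 3x3 kernel, 256 -> 512 channels.
std = conv_params(3, 256, 512)    # 1,179,648 weights
sep = dsconv_params(3, 256, 512)  # 133,376 weights
print(f"reduction: {1 - sep / std:.1%}")  # -> reduction: 88.7%
```

This roughly 9× per-layer saving explains why replacing standard convolutions at several locations can shrink the network substantially before any pruning from variational dropout is applied.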