Monocular Depth Estimation Using Deep Edge Intelligence

Authors:Irfan Ali Sadab, Md Arafat Islam, Rashik Iram Chowdhury and Md. Ishan Arefin Hossain

Conference:STI 2024

Tags:absolute error mae mean, accuracy and computational efficiency, advanced cnn architectures, backbones for monocular depth estimation, best performing model, complexity and improve, convolutional neural networks, custom decoder, Deep Learning, depth estimation, depth estimation models, depth feature extraction, depth map, depth map predictions, depth prediction accuracy, depthwise separable convolutions, edge devices, Edge Intelligence, efficient convolutional techniques, enhance depth prediction, error mse root, images and depth maps, lightweight and efficient, mean absolute error mae, mean squared, model's accuracy, model's performance, model size and computational, Monocular Depth Estimation, nyu depth dataset, performance metrics, practicality and effectiveness, predicted depth maps, Quantization, requiring low latency, robotics augmented reality, size and computational demands, size and computational requirements, squeeze and excitation blocks and U-Net architecture

Abstract:

Monocular depth estimation is a central challenge in computer vision, with applications in robotics, augmented reality, and autonomous systems. This study evaluates several Convolutional Neural Network (CNN) backbones as encoders within a U-Net architecture: DenseNet121, InceptionV3, EfficientNetV2, MobileNetV2, and ResNet50. Each encoder is paired with a custom decoder that incorporates squeeze-and-excitation blocks to improve depth prediction accuracy. We introduce U-ResNet50, a model that combines ResNet50 with an optimized custom decoder and is designed for high depth estimation accuracy. The models were trained on the NYU Depth V2 dataset with the Structural Similarity Index Measure (SSIM) as the primary loss function, and were evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and SSIM accuracy. U-ResNet50 outperformed the other models, achieving 94.62% SSIM accuracy and an RMSE of 0.0474, balancing efficiency and precision. To make the model more practical to deploy, we quantized U-ResNet50, reducing its size and computational requirements while maintaining performance. The quantized model was deployed via a Streamlit web interface, demonstrating its potential for edge applications in robotics and augmented reality.

Index Terms: Monocular Depth Estimation, Deep Learning, Quantization, Edge Intelligence
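As an illustration of the evaluation metrics named in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. The function names (`depth_metrics`, `global_ssim`, `ssim_loss`) are hypothetical, depth maps are assumed to be normalized to [0, 1], and the SSIM here is a simplified single-window version computed over the whole image, whereas standard SSIM averages the same statistic over local windows.

```python
import numpy as np


def depth_metrics(pred, target):
    """Return MAE, MSE and RMSE between two depth maps of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    err = pred - target
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return {"mae": mae, "mse": mse, "rmse": float(np.sqrt(mse))}


def global_ssim(pred, target, data_range=1.0):
    """Simplified SSIM using statistics of the whole image (one window).

    data_range is the assumed span of valid depth values (1.0 for maps
    normalized to [0, 1]).
    """
    x = np.asarray(pred, dtype=np.float64)
    y = np.asarray(target, dtype=np.float64)
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def ssim_loss(pred, target, data_range=1.0):
    """An SSIM-based loss: 0 for identical maps, larger as they diverge."""
    return 1.0 - global_ssim(pred, target, data_range)
```

Under these assumptions, an RMSE such as the reported 0.0474 would be read as an error on the normalized depth scale; the abstract itself does not state the scale or the SSIM window settings used.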