Autonomous Navigation in Rows of Trees and High Crops with Deep Semantic Segmentation
ABSTRACT. Segmentation-based autonomous navigation has recently been proposed as a promising methodology to guide robotic platforms through crop rows without requiring precise GPS localization. However, existing methods are limited to scenarios where the centre of the row can be identified thanks to the sharp distinction between the plants and the sky, whereas GPS signal obstruction mainly occurs in the presence of tall, dense vegetation, such as high tree rows and orchards. In this work, we extend segmentation-based robotic guidance to scenarios where canopies and branches occlude the sky and hinder the use of GPS and of previous methods, increasing the overall robustness and adaptability of the control algorithm. Extensive experimentation on several realistic simulated tree fields and vineyards demonstrates the competitive advantages of the proposed solution.
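As an illustration of the general idea behind segmentation-based row following (not the authors' implementation), the following Python sketch derives a normalized steering command from a binary vegetation mask; the mask layout, the lower-half image split and the gain are assumptions made for this example:

import numpy as np

def steering_from_mask(mask, gain=0.5):
    """Derive a normalized steering command from a binary vegetation mask.

    mask: HxW array, 1 = plant/canopy pixels, 0 = free space between the rows.
    Returns a value in [-1, 1]; negative steers left, positive steers right.
    """
    h, w = mask.shape
    # Count free-space pixels per column in the lower half of the image,
    # where the gap between the rows in front of the robot is most visible.
    free = (mask[h // 2:, :] == 0).sum(axis=0).astype(float)
    if free.sum() == 0:
        return 0.0  # no visible gap, keep the current heading
    columns = np.arange(w)
    row_centre = (columns * free).sum() / free.sum()  # centre of mass of the gap
    error = (row_centre - (w - 1) / 2.0) / (w / 2.0)  # normalized lateral error
    return float(np.clip(gain * error, -1.0, 1.0))

# Example: a synthetic 4x6 mask whose gap is shifted to the right of the image.
demo = np.array([[1, 1, 0, 0, 0, 0]] * 4)
print(steering_from_mask(demo))  # positive value -> steer towards the gap centre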
A Map-Free LiDAR-Based System for Autonomous Navigation in Vineyards
ABSTRACT. Agricultural robots have the potential to increase production yields and reduce costs by performing repetitive and time-consuming tasks. However, for robots to be effective, they must be able to navigate autonomously in fields or orchards without human intervention. In this paper, we introduce a navigation system that utilizes LiDAR and wheel encoder sensors for in-row, turn, and end-row navigation in row-structured agricultural environments, such as vineyards. Our approach exploits the simple and precise geometrical structure of plants organized in parallel rows. We tested our system in both simulated and real environments, and the results demonstrate the effectiveness of our approach in achieving accurate and robust navigation. Our navigation system achieves mean displacement errors from the center line of 0.049 m and 0.372 m for in-row navigation in the simulated and real environments, respectively. In addition, we developed an end-row point detection method that enables end-row navigation in vineyards, a task often ignored by most works, which assume this information to be given. Finally, we propose a framework for evaluating agricultural robot navigation, since no standard metrics exist.
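A minimal Python sketch of the kind of in-row geometry exploitation described above, assuming a 2D scan in the robot frame and rows roughly parallel to the direction of travel; the function and the use of medians are illustrative choices, not the paper's method:

import numpy as np

def centerline_offset(scan_xy):
    """Estimate where the row centre line lies relative to the robot.

    scan_xy: Nx2 array of 2D LiDAR points in the robot frame
             (x forward, y left), containing hits on both plant rows.
    Returns the lateral position of the centre line in metres; a positive
    value means the centre is to the robot's left, so the robot should steer left.
    """
    left = scan_xy[scan_xy[:, 1] > 0.0]
    right = scan_xy[scan_xy[:, 1] < 0.0]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    # With rows roughly parallel to the x axis, the median lateral distance of
    # the hits on each side approximates the position of that row.
    y_left = np.median(left[:, 1])
    y_right = np.median(right[:, 1])
    return float(0.5 * (y_left + y_right))

# Example: rows at y = +1.4 m and y = -1.0 m -> centre line 0.2 m to the robot's left.
pts = np.array([[x, 1.4] for x in np.linspace(0.0, 5.0, 20)] +
               [[x, -1.0] for x in np.linspace(0.0, 5.0, 20)])
print(round(centerline_offset(pts), 2))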
Surgical fine-tuning for Grape Bunch Segmentation under Visual Domain Shifts
ABSTRACT. Mobile robots will play a crucial role in the transition towards sustainable agriculture. To autonomously and effectively monitor the state of plants, robots ought to be equipped with visual perception capabilities that are robust to the rapid changes that characterise agricultural settings. In this paper, we focus on the challenging task of segmenting grape bunches from images collected by mobile robots in vineyards. In this context, we present the first study that applies surgical fine-tuning to instance segmentation tasks. We show how selectively tuning only specific model layers can support the adaptation of pre-trained Deep Learning models to newly-collected grape images that introduce visual domain shifts, while also substantially reducing the number of tuned parameters.
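Surgical fine-tuning amounts to updating only a chosen subset of layers of a pre-trained network. A minimal PyTorch/torchvision sketch is given below; the Mask R-CNN model and the specific layer prefixes are assumptions made for illustration, not the layers selected in the paper:

import torch
import torchvision

# Load a pre-trained instance segmentation model (Mask R-CNN used here as a
# stand-in; the paper's exact architecture is not specified in the abstract).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Surgical fine-tuning: freeze everything, then re-enable gradients only for
# the selected blocks (here, hypothetically, the last backbone stage and the
# mask head/predictor).
selected = ("backbone.body.layer4", "roi_heads.mask_head", "roi_heads.mask_predictor")
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(selected)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
print(f"tuning {sum(p.numel() for p in trainable):,} of "
      f"{sum(p.numel() for p in model.parameters()):,} parameters")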
Multi-camera GPS-free Nonlinear Model Predictive Control strategy to traverse orchards
ABSTRACT. This paper deals with autonomous navigation through orchards. It proposes a multi-camera GPS-free strategy relying on a nonlinear model predictive control scheme to follow a reference path. The reference path, based on a Voronoi diagram for row traversal or a spiral model for headland maneuvers, is computed as a NURBS curve, making it possible to deal with multiple orchard layouts. The method has been implemented on our robot and validated through experiments conducted in an orchard.
Learned Long-Term Stability Scan Filtering for Robust Robot Localisation in Continuously Changing Environments
ABSTRACT. In field robotics, particularly in the agricultural sector, precise localization presents a challenge due to the constantly changing nature of the environment. Simultaneous Localization and Mapping algorithms can provide an effective estimation of a robot's position, but their long-term performance may be impacted by false data associations. Additionally, alternative strategies such as the use of RTK-GPS can also have limitations, such as dependence on external infrastructure.
To address these challenges, this paper introduces a novel stability scan filter. This filter can learn and infer the motion status of objects in the environment, allowing it to identify the most stable objects and use them as landmarks for robust robot localization in a continuously changing environment. The proposed method involves an unsupervised point-wise labelling of LiDAR frames that exploits temporal observations of the environment, as well as a regression network, called the Long-Term Stability Network (LTS-NET), to learn and infer the long-term motion status of 3D LiDAR points.
Experiments demonstrate the ability of the stability scan filter to infer the motion stability of objects on a real long-term agricultural dataset. Results show that, by only utilizing points belonging to long-term stable objects, the localization system exhibits reliable and robust performance in long-term missions compared to using all points of the LiDAR frame.
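A toy sketch of how such per-point stability predictions could be used to filter a LiDAR frame before localization (the threshold and the interface to LTS-NET are assumptions made for the example):

import numpy as np

def filter_stable_points(points, stability_scores, threshold=0.8):
    """Keep only LiDAR points predicted to belong to long-term stable objects.

    points: Nx3 array holding a single LiDAR frame.
    stability_scores: length-N array in [0, 1], e.g. regressed per point by a
                      network such as the LTS-NET described above.
    """
    return points[stability_scores >= threshold]

# The filtered frame is then passed to the localization back-end (scan
# matching, particle filter, ...) instead of the full frame.
frame = np.random.rand(1000, 3) * 50.0
scores = np.random.rand(1000)
print(len(filter_stable_points(frame, scores)), "of", len(frame), "points kept")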
GAFAR: Graph-Attention Feature-Augmentation for Registration - A Fast and Light-weight Point Set Registration Algorithm
ABSTRACT. Rigid registration of point clouds is a fundamental problem in computer vision with many applications, from 3D scene reconstruction to geometry capture and robotics. If a suitable initial registration is available, conventional methods like ICP and its many variants can provide adequate solutions. In the absence of a suitable initialization, however, or in the presence of a high outlier rate or small overlap, rigid registration still presents great challenges. The advent of deep learning in computer vision has brought new drive to research on this topic, since it provides the possibility to learn expressive feature representations and produce one-shot estimates instead of depending on time-consuming iterations like conventional robust methods. Yet, the rotation- and permutation-invariant nature of point clouds poses its own challenges to deep learning, resulting in loss of performance and low generalization capability due to sensitivity to outliers and to characteristics of 3D scans not present during network training.
In this work, we present a novel fast and light-weight network architecture using the attention mechanism to augment point descriptors at inference time to optimally suit the registration task of the specific point clouds it is presented with. Employing a fully-connected graph both within and between point clouds lets the network reason about the importance and reliability of points for registration, making our approach robust to outliers, low overlap and unseen data. We test the performance of our registration algorithm on different registration and generalization tasks and provide information on runtime and resource consumption.
ABSTRACT. Normal Distributions Transform (NDT) registration is a fast, learning-free pointcloud registration algorithm that works well in diverse environments. It uses a compact and discrete representation of pointclouds called NDT maps. However, because of the discreteness of NDT maps, the global minimum of the registration cost function does not always correspond to the ground truth, particularly for rotational alignment. In this study, we examined the NDT registration cost function in depth and evaluated three modifications (a Student-t likelihood function, an inflated covariance/heavily broadened likelihood curve, and overlapped NDT cells) that aim to reduce the impact of discreteness. The first two modifications reduce the discreteness of the NDT representation by modifying the distribution to have broadened likelihood tails, while the last modification achieves continuity by creating overlap between the NDT cells' distributions without increasing the number of NDT cells. We used the Pomerleau Dataset evaluation protocol for our experiments and found that the heavily broadened likelihood NDT (HBL-NDT) registration cost function (34.7% success rate) and overlapped NDT cells (ONC-NDT, 33.5% success rate) resulted in significant improvements in registration results compared to the conventional NDT registration approach (27.7% success rate). However, no consistent improvement was observed for the Student-t likelihood-based registration cost function (22.2% success rate) over the NDT P2D registration cost function (23.7% success rate). We also present the results of several other state-of-the-art registration algorithms for broader comparison.
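For context, a common way to write the per-point NDT score and a Student-t alternative with broadened tails is sketched below; the symbols (cell mean, covariance, transformed point, degrees of freedom) are ours, and the exact cost functions evaluated in the paper may differ:

\[
s_{\mathrm{Gauss}}(\mathbf{x}) \propto \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\Big),
\qquad
s_{t}(\mathbf{x}) \propto \Big(1 + \tfrac{1}{\nu}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\Big)^{-\frac{\nu+3}{2}}
\]

Here \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) are the cell mean and covariance, \(\mathbf{x}\) the transformed scan point, and \(\nu\) the degrees of freedom. The polynomial tails of the Student-t score (and, analogously, an inflated covariance) keep the contribution of a point from vanishing far from the cell mean, which smooths the cost landscape between neighbouring cells.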
Enhancing Door-Status Detection for Autonomous Mobile Robots during Environment-Specific Operational Use
ABSTRACT. Door-status detection, namely recognising the presence of a door and its status (open or closed), can have a remarkable impact on a mobile robot's navigation performance, especially in dynamic settings where doors can enable or disable passages, changing the topology of the map. In this work, we address the problem of building a door-status detector module for a mobile robot operating in the same environment for a long time, thus observing the same set of doors from different points of view. First, we show how to improve the mainstream approach based on object detection by considering the constrained perception setup typical of a mobile robot. Hence, we devise a method to build a dataset of images taken from a robot's perspective and exploit it to obtain a door-status detector based on deep learning. We then leverage the typical working conditions of a robot to qualify the model for boosting its performance in the working environment via fine-tuning with additional data. Our experimental analysis shows the effectiveness of this method with results obtained both in simulation and in the real world, which also highlight a trade-off between the costs and benefits of the fine-tuning approach.
Self-supervised Learning for Fusion of IR and RGB Images in Visual Teach and Repeat Navigation
ABSTRACT. With increasing computation power, longer battery life and lower prices, mobile robots are becoming a viable option for many applications. When the application requires long-term autonomy in an uncontrolled environment, it is necessary to equip the robot with a navigation system robust to environmental changes. Visual Teach and Repeat (VT\&R) is one such navigation system that is lightweight and easy to use. As with other methods that rely on camera input, however, the performance of VT\&R can be highly influenced by changes in the scene's appearance. One way to address this problem is to use machine learning and/or to add redundancy to the sensory input. However, it is usually complicated to collect long-term datasets for a given sensory input that machine learning methods could exploit to extract knowledge about possible changes in the environment. In this paper, we show that we can use a dataset not containing the environmental changes to train a model processing infrared images and improve the robustness of the VT\&R framework by fusing it with the classic method based on RGB images. In particular, our experiments show that the proposed training scheme and fusion method can alleviate the problems arising from adverse illumination changes. Our approach can broaden the scope of possible VT\&R applications that require deployment in environments with significant illumination changes.
White-box and Black-box Adversarial Attacks to Obstacle Avoidance in Mobile Robots
ABSTRACT. Advances in artificial intelligence (AI) play a major role in the adoption of robots for an increasingly broader range of tasks. However, as recent research has shown, AI systems, such as deep-learning models, can be vulnerable to adversarial attacks where small but carefully crafted changes to a model’s input can severely compromise its performance. In this paper, we present two methods to find adversarial attacks against autonomous robots. We focus on external attacks against obstacle-avoidance behaviour where an attacker — a robot — actively perturbs the sensor readings of a goal-seeking victim robot. In the first method, we model the interaction between the victim and attacker as a dynamical system and generate a series of open-loop control signals for the attacker to alter the victim’s behaviour. In the second method, the assumption that the attacker has full knowledge of the system’s dynamics is relaxed, and closed-loop control for the attacker is learnt through reinforcement learning. We find that both methods are able to find successful attacks against the victim robot and thus constitute viable techniques to assess the robustness of autonomous robot behaviour.
Evaluating Techniques for Accurate 3D Object Model Extraction through Image-based Deep Learning Object Detection and Point Cloud Segmentation
ABSTRACT. Accurate 3D object model extraction is essential for a wide range of robotics applications, including grasping and object mapping, which require precise knowledge of objects' shape and location to perform optimally. However, high accuracy can be challenging to achieve, particularly when working with real-world data, where factors like occlusions, clutter and noise can greatly influence results. Several techniques for integrating 2D deep learning and point cloud segmentation can be found in the literature; nevertheless, comparative studies on these algorithms are very limited. To fill this gap, this paper evaluates methods for obtaining 3D object models using a combination of deep learning object detection and point cloud segmentation. We compare a number of existing techniques, some of which have been improved for performance, on real-world data. More specifically, the paper examines four methods for 3D object extraction: two based on bounding box object detection, one based on instance segmentation, and a fourth that estimates an object mask in the image inside the bounding box. We compare these techniques qualitatively and quantitatively using several criteria, providing insights into their strengths and limitations.
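As background, a typical first step shared by such pipelines is to crop the point cloud to the frustum of a 2D detection before segmenting it further. A self-contained Python sketch under assumed camera intrinsics follows (a generic illustration, not one of the four compared methods specifically):

import numpy as np

def points_in_bbox(points_cam, K, bbox):
    """Select the point-cloud segment that projects inside a 2D detection box.

    points_cam: Nx3 points already expressed in the camera frame (z forward).
    K:          3x3 camera intrinsic matrix.
    bbox:       (u_min, v_min, u_max, v_max) from a 2D object detector.
    """
    pts = points_cam[points_cam[:, 2] > 0.1]          # keep points in front of the camera
    uvw = (K @ pts.T).T
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    u_min, v_min, u_max, v_max = bbox
    inside = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    return pts[inside]

# Toy intrinsics and a centred detection box; in practice the frustum crop is
# followed by clustering or plane removal to reject background points, which is
# where the compared methods mainly differ.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cloud = np.random.uniform(-2.0, 2.0, size=(5000, 3)) + np.array([0.0, 0.0, 4.0])
print(points_in_bbox(cloud, K, (280, 200, 360, 280)).shape)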
TAICHI algorithm: Human-Like Arm Data Generation applied on Non-Anthropomorphic Robotic Manipulators for Demonstration
ABSTRACT. In household settings, Learning from Demonstration techniques can enable end-users to teach their robots new skills. Furthermore, it may be necessary for the demonstrations to be accessible through a straightforward setup, such as a single visual sensor. This study presents a pipeline that uses a single RGB-D sensor to demonstrate movements, taking into account all the keypoints of the human arm, in order to control a non-anthropomorphic arm. To perform this procedure, we present the TAICHI algorithm (Tracking Algorithm for Imitation of Complex Human Inputs). The method involves the detection of significant points of the human arm and their mapping to the robot, a Gaussian filtering process to smooth the movements and attenuate sensor noise, and an optimization algorithm that obtains the configuration closest to the human arm without generating collisions with the environment or with itself. The novelty of the method lies in using the keypoints of the human arm, taking into account both the end-effector and the elbow, to obtain the most similar configuration for a non-anthropomorphic arm. We have carried out tests with different movements performed at different speeds to validate our method, checking its performance at the robot's end-effector.
Multi-Task Learning for Industrial Mobile Robot Perception Using a Simulated Warehouse Dataset
ABSTRACT. Autonomous industrial mobile robots need advanced perception capabilities to operate safely and human-compliantly in shared working environments. To achieve this high-level understanding of the mobile robots' surroundings, this paper investigates Multi-Task Learning approaches to process multiple tasks simultaneously and potentially improve the generalization performance. Our work alleviates the scarcity of datasets that are relevant for industrial settings by introducing and making publicly available a simulated warehouse dataset covering semantic segmentation, depth estimation and surface normals estimation tasks. We collect and examine numerous MTL task-balancing techniques for industrial mobile robot perception. Our experiments show that the performance of those approaches is very dependent on the considered dataset, which further highlights the value of introducing new relevant datasets.
Artifacts Mapping: Multi-Modal Semantic Mapping Extension of Geometric Maps
ABSTRACT. Geometric navigation is nowadays a well-established field of robotics and the research focus is shifting towards higher-level scene understanding, such as Semantic Mapping.
When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings.
This work focuses on classifying and localising objects within a map, which is under construction (SLAM) or already built.
To further explore this direction, we propose a framework that can autonomously map predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar).
The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements).
The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing, while 85% and 80% of the objects were mapped using the single camera or lidar setup respectively. The comparison with single-sensor (camera or lidar) experiments is performed to show that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.
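A toy Python sketch of the artifact-management idea (filtering and stabilizing repeated detections into map objects); the merge radius, hit count and class label are assumptions for illustration, not the framework's actual parameters:

import numpy as np

class ArtifactTracker:
    """Fuse repeated 3D detections of the same class into stable map artifacts."""

    def __init__(self, merge_radius=0.5, min_hits=3):
        self.merge_radius = merge_radius
        self.min_hits = min_hits
        self.artifacts = []  # each entry: {"label": str, "pos": np.ndarray, "hits": int}

    def add_detection(self, label, position):
        position = np.asarray(position, dtype=float)
        for art in self.artifacts:
            if art["label"] == label and np.linalg.norm(art["pos"] - position) < self.merge_radius:
                # A running average stabilises the estimate against noisy detections.
                art["pos"] = (art["pos"] * art["hits"] + position) / (art["hits"] + 1)
                art["hits"] += 1
                return
        self.artifacts.append({"label": label, "pos": position, "hits": 1})

    def stable_artifacts(self):
        return [a for a in self.artifacts if a["hits"] >= self.min_hits]

tracker = ArtifactTracker()
for p in ([2.0, 1.0, 0.5], [2.1, 0.9, 0.5], [1.9, 1.1, 0.5]):
    tracker.add_detection("fire_extinguisher", p)
print(tracker.stable_artifacts())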
Dynamic Human-Aware Task Planner for Human-Robot Collaboration in Industrial Scenario
ABSTRACT. The collaboration between humans and robots in industrial scenarios is one of the key challenges for Industry 4.0. Industrial robots offer accuracy and efficiency, while humans contribute experience and an irreplaceable capability to manage complex situations. Combining these strengths can enhance the industrial process: the robot takes over heavy tasks, allowing the operator to dedicate their efforts to the tasks where quality and experience make the difference in the final product. However, the collaboration between humans and robots raises a number of new problems to be addressed, such as safety, task scheduling and operator ergonomics. For example, human presence in the robot workspace introduces various elements of complexity into robot planning due to its dynamism and unpredictability. Planning must take into account how to coordinate the tasks between the robot and the human and must re-plan quickly to respond reactively to the operator's triggers. For this purpose, this work proposes a hierarchical Human-Aware Task Planner framework capable of generating a suitable plan to complete the process and of managing user interrupts in order to keep the plan constantly updated. The method is evaluated in a real industrial scenario, on a specific complex assembly task: the draping of carbon fibre plies.
Decentralized Market-Based Task Allocation Algorithm for a Fleet of Industrial Mobile Robots
ABSTRACT. In this paper, we present an efficient, resilient, and flexible market-based task allocation algorithm with a distributed architecture for a dynamic factory environment. The proposed algorithm provides efficient and intelligent task allocation mechanisms that reduce the time and the total distance traveled by the agents. The algorithm is implemented in a simulation environment similar to a real-world environment, with various robots and tasks to allocate, in order to test its efficiency, resilience, and flexibility. It is compared quantitatively with baseline solutions such as an auction restricted to currently available robots and a queue system. The results show that the algorithm is more efficient than the other methods tested. It is also reliable, since it can handle unpredictable behaviours such as corrupted messages, loss of connection for an extended period, failures to complete tasks, and obstacles blocking the robots' paths and forcing them to take a different trajectory. Finally, it is flexible, since it can be used for several different purposes and is robust to communication failures. Its main drawback is that it is ill-suited to handling a large influx of task requests, since only a single task is auctioned and assigned at any given time.
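For illustration, one round of a minimal single-task auction of the kind referred to above can be sketched as follows in Python; the bid definition (travel distance) and the data layout are assumptions, and the distributed messaging, resilience and recovery mechanisms of the paper are not represented:

import math

def auction_single_task(task_xy, robots):
    """Run one round of a minimal single-task auction.

    task_xy: (x, y) position of the task being auctioned.
    robots:  dict robot_id -> {"pos": (x, y), "busy": bool}.
    Returns the id of the winning robot, or None if nobody bids.
    """
    bids = {}
    for rid, state in robots.items():
        if state["busy"]:
            continue                                  # busy robots do not bid
        bids[rid] = math.dist(state["pos"], task_xy)  # bid = estimated travel cost
    return min(bids, key=bids.get) if bids else None

fleet = {
    "amr_1": {"pos": (0.0, 0.0), "busy": False},
    "amr_2": {"pos": (5.0, 1.0), "busy": False},
    "amr_3": {"pos": (1.0, 1.0), "busy": True},
}
print(auction_single_task((4.0, 0.0), fleet))  # -> "amr_2"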
Distributed 3D-Map Matching and Merging on Resource-Limited Platforms using Tomographic Features
ABSTRACT. The collaborative 3D-map merging problem is studied using tomographic features extracted on low-resource computational platforms. Instead of depending on 3D features and descriptors, 2D features are extracted from 2D projections of horizontal sections of gravity-aligned local maps. Matching features from sections at the same height reduces the search space and improves efficiency and performance over state-of-the-art feature extraction and registration pipelines. Tomographic feature extraction is observed to provide order-of-magnitude improvements in memory and time efficiency, rendering it useful for near real-time map merging on resource-limited platforms (e.g. UAVs). Accordingly, this fast scheme enables collaborative 3D-map merging.
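A rough Python/OpenCV sketch of the tomographic idea, i.e. turning one horizontal section of a gravity-aligned map into a 2D image and extracting standard 2D features from it; the slab thickness, resolution and the use of ORB are assumptions made for the example, not the paper's pipeline:

import numpy as np
import cv2

def tomographic_features(points, z_level, thickness=0.2, resolution=0.05):
    """Extract 2D features from one horizontal section of a gravity-aligned map.

    points: Nx3 array of a local 3D map, with z along the gravity axis.
    Returns OpenCV keypoints and descriptors computed on the rasterised slice.
    """
    # 1. Keep only points inside the horizontal slab around z_level.
    slab = points[np.abs(points[:, 2] - z_level) < thickness / 2.0]
    # 2. Rasterise the slab into a 2D occupancy image.
    xy = ((slab[:, :2] - slab[:, :2].min(axis=0)) / resolution).astype(int)
    img = np.zeros(tuple(xy.max(axis=0) + 1), dtype=np.uint8)
    img[xy[:, 0], xy[:, 1]] = 255
    # 3. Run a standard 2D feature extractor on the slice image.
    orb = cv2.ORB_create()
    return orb.detectAndCompute(img, None)

# Matching descriptors between slices taken at the same height in two local maps
# yields 2D correspondences from which the relative map transform can be estimated.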
A Temporal Perspective n-Point Problem with Model Uncertainties for Cooperative Pose Estimation in a Heterogeneous Robot Team
ABSTRACT. Many solutions exist for estimating the pose of an object with respect to a camera, where perfect knowledge of the object is assumed. In this work we lift the assumption of a perfectly known model and introduce uncertainties for the 3D points, which are retrieved from a dynamically created model. The positions of the model points can be either uncorrelated or correlated; the latter is typically the case for mobile robots navigating based on the results of visual-inertial sensor fusion in unknown and GNSS-denied environments. In our approach, a selection of poses estimated by one robot is used as a dynamic 3D model and combined with 2D points obtained by tracking this robot with the camera of another robot. In addition, selection criteria for adding and deleting 3D model points in an optimal way are proposed. The measurement errors are projected into tangential planes of the unit sphere, and the resulting weighted residuals in the tangent space are used in a generalized least-squares problem to calculate the transformation between the tracking camera and the object. The proposed method allows the relative pose of members of a robotic team to be estimated with high accuracy. The benefits of our approach are shown in simulation and in real-world experiments using visual odometry measurements from a multicopter that is tracked by the camera of a rover.
Human-centered Benchmarking for Socially-compliant Robot Navigation
ABSTRACT. Social compatibility is one of the most important parameters for service robots. It characterizes the quality of interaction between a robot and a human. In this paper, a human-centered benchmarking framework is proposed for socially compliant robot navigation. Four open-source robot navigation methods, two of which are socially compliant, are benchmarked in an end-to-end manner. All aspects of the benchmarking are clarified to ensure the reproducibility and replicability of the experiments. The social compatibility of the robot navigation methods is measured with the Robotic Social Attributes Scale (RoSAS). After that, the correspondence between RoSAS and the robot-centered metrics is validated. Based on the experiments, the extra robot time ratio and the extra distance ratio are found to be the most suitable metrics for judging social compatibility.
Improved path planning algorithms for non-holonomic autonomous vehicles in industrial environments with narrow corridors: Roadmap Hybrid A* and Waypoints Hybrid A*
ABSTRACT. This paper proposes two novel path planning algorithms, Roadmap Hybrid A* and Waypoints Hybrid A*, for car-like autonomous vehicles in logistics and industrial contexts with obstacles (e.g., pallets or containers) and narrow corridors. Roadmap Hybrid A* combines Hybrid A* with a graph search algorithm applied to a static roadmap. The former enables obstacle avoidance and flexibility, whereas the latter provides greater robustness, repeatability, and computational speed. Waypoints Hybrid A*, on the other hand, generates waypoints using a topological map of the environment to guide Hybrid A* to the target pose, reducing complexity and search time. Thanks to the roadmap and/or the waypoints, both algorithms enable predetermined control over the shape of desired parts of the path, for example, to obtain precise docking maneuvers to service machines and to eliminate unnecessary steering changes produced by Hybrid A* in corridors. To evaluate the performance of these algorithms, we conducted a simulation study in an industrial plant where a robot must navigate narrow corridors to serve machines in different areas. Both algorithms outperformed the standard Hybrid A* in terms of computational time, total path length, reverse path length, and other metrics.
Assisted Localization of MAVs for Navigation in Indoor Environments Using Fiducial Markers
ABSTRACT. Micro aerial vehicles (MAVs) are often limited by weight or cost constraints, which results in low sensor variety and sometimes even low sensor quality. For example, many MAVs only offer a single RGB camera to capture the environment, apart from simple distance sensors. Maps of complex environments, on the other hand, are typically captured using depth sensors like Lidar, which are not found on such drones. For MAVs to still benefit from and use these maps, it is necessary to implement a connection layer that enables the localization of the MAV in these maps. In this paper, we propose to use fiducial markers that can be recorded by an assisting device, e.g., a mobile phone or tablet, responsible for map creation. These fiducial markers have a known pose in the map and can be detected by the drone's RGB camera, allowing the drone to localize itself. We show that the markers are localized with high precision during the map creation process and that the drone is able to determine its pose based on detected markers. Furthermore, we present a ROS 2 based drone controller for a Ryze Tello EDU MAV that uses an occupancy voxel map for navigation.
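The marker-based localization step reduces to a composition of known and detected transforms. A minimal numpy sketch under assumed frame conventions (map, marker, camera) is shown below; the function name and toy values are illustrative:

import numpy as np

def pose_in_map(T_map_marker, T_cam_marker):
    """Localize the camera (and hence the MAV) in the map from one marker detection.

    T_map_marker: 4x4 pose of the fiducial marker in the map frame, known from
                  the assisting device that built the map.
    T_cam_marker: 4x4 pose of the marker in the camera frame, as returned by a
                  marker detector running on the drone's RGB image.
    Returns T_map_cam, the camera pose expressed in the map frame.
    """
    return T_map_marker @ np.linalg.inv(T_cam_marker)

# Toy example with identity rotations: the marker is 3 m from the map origin
# along x and is detected 1 m in front of the camera along its optical axis.
T_map_marker = np.eye(4); T_map_marker[0, 3] = 3.0
T_cam_marker = np.eye(4); T_cam_marker[2, 3] = 1.0
print(pose_in_map(T_map_marker, T_cam_marker))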
Stable Yaw Estimation of Boats from the Viewpoint of UAVs and USVs
ABSTRACT. Yaw estimation of boats from the viewpoint of unmanned aerial vehicles (UAVs) and unmanned surface vehicles (USVs) or boats is a crucial task in various applications such as 3D scene rendering, trajectory prediction, and navigation. However, the lack of literature on yaw estimation of objects from the viewpoint of UAVs has motivated us to address this domain. In this paper, we propose a method based on HyperPosePDF for predicting the orientation of boats in the 6D space. For that, we use existing datasets, such as PASCAL3D+, and our own datasets, SeaDronesSee-3D and BOArienT, which we annotated manually. We extend HyperPosePDF to work in video-based scenarios, such that it yields robust orientation predictions across time. Naively applying HyperPosePDF on video data yields single-point predictions, resulting in far-off predictions and often incorrect symmetric orientations due to unseen or visually different data. To alleviate this issue, we propose aggregating the probability distributions of pose predictions, resulting in significantly improved performance, as shown in our experimental evaluation. Our proposed method could significantly benefit downstream tasks in marine robotics.
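A simplified, discretised stand-in for the temporal aggregation idea is sketched below in Python; the yaw binning, window length and averaging rule are assumptions and do not reproduce HyperPosePDF's actual output representation:

import numpy as np

def aggregate_yaw_distributions(frame_distributions, window=5):
    """Fuse per-frame yaw distributions over a short temporal window.

    frame_distributions: T x B array; each row is a probability distribution
    over B discretised yaw bins predicted for one video frame.
    Returns one fused yaw estimate (bin centre, in radians) per frame.
    """
    T, B = frame_distributions.shape
    bin_centres = np.linspace(0.0, 2.0 * np.pi, B, endpoint=False)
    fused = np.empty(T)
    for t in range(T):
        start = max(0, t - window + 1)
        agg = frame_distributions[start:t + 1].mean(axis=0)  # average the distributions
        fused[t] = bin_centres[np.argmax(agg)]
    return fused

# A single noisy frame (e.g. a flipped, symmetric prediction) is outvoted by its
# temporal neighbours instead of producing a far-off single-point estimate.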
Graph-based Simultaneous Localization and Mapping with incorporated dynamic object motion
ABSTRACT. Over the last years, Simultaneous Localization and Mapping (SLAM) in dynamic environments has received more attention. This paper presents a SLAM algorithm in which dynamic object information is included within the graph-based optimization approach. By exploiting knowledge about the object's motion within the scene, the constructed map is a more accurate representation of the environment. Using data from simulation, we show that the robot's trajectory and the dynamic object's trajectory align better with the ground truth. Real-world experiments, which include human motion within the optimization, show that the robot's trajectory, and thus the environment map, is improved. This is verified by comparing the maps constructed with and without the incorporation of the human motion. The validity of the map is assessed by evaluating three metrics from the literature and by comparison to the building plans of the environment.
Visual-LiDAR Odometry and Mapping with Monocular Scale Correction and Motion Compensation
ABSTRACT. This paper presents a novel visual-LiDAR odometry and mapping method with low-drift characteristics. The proposed method is based on two popular approaches, ORB-SLAM and A-LOAM, with monocular scale correction and visual-assisted LiDAR motion compensation modifications. The scale corrector calculates the ratio between the depth of image keypoints recovered by triangulation and that provided by LiDAR, using an outlier rejection process to improve accuracy. For LiDAR motion compensation, the visual odometry provides the initial guesses of the LiDAR motion for better performance. The methodology is applicable not only to high-resolution LiDAR but also adapts to low-resolution LiDAR. To evaluate the proposed SLAM system's robustness and accuracy, we conducted experiments on the KITTI Odometry and S3E datasets. Experimental results illustrate that our method significantly outperforms standalone ORB-SLAM2 and A-LOAM. Furthermore, regarding the accuracy of visual odometry with scale correction, our method performs similarly to the stereo-mode ORB-SLAM2.
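A minimal Python sketch of a monocular scale correction of this kind, assuming matched keypoint depths from triangulation and from projected LiDAR points; the median-based outlier rejection rule is an illustrative choice, not necessarily the paper's:

import numpy as np

def estimate_scale(depth_triangulated, depth_lidar, rel_tol=0.2):
    """Estimate the monocular scale factor from matched keypoint depths.

    depth_triangulated: up-to-scale depths of image keypoints recovered by
                        triangulation in the visual odometry.
    depth_lidar:        metric depths of the same keypoints taken from the
                        projected LiDAR points.
    Returns a single scale factor; ratios far from the median are rejected.
    """
    ratios = np.asarray(depth_lidar, dtype=float) / np.asarray(depth_triangulated, dtype=float)
    med = np.median(ratios)
    inliers = ratios[np.abs(ratios - med) < rel_tol * med]
    return float(np.mean(inliers)) if len(inliers) else float(med)

# Example: the true scale is about 2.0 and the gross outlier is discarded.
tri = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lid = np.array([2.0, 4.1, 5.9, 8.2, 25.0])
print(round(estimate_scale(tri, lid), 2))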