09:00-10:40 Session 7: Oral Session III
Difficulty-Aware Time-Bounded Planning under Uncertainty for Large-Scale Robot Missions

ABSTRACT. We consider planning problems where a robot must visit a large set of locations to complete a task at each one.

Our focus is problems where the difficulty of each task, and thus its duration, can be predicted, but not fully known in advance. We propose a general Markov decision process (MDP) model for difficulty-aware problems, and propose variants on this model which allow adaptation to different robotics domains.

Due to the intractability of the general problem, we propose simplifications to allow planning in large domains, the key being constraining navigation using a solution to the travelling salesperson problem (TSP).

We build a set of variant models for two domains with different characteristics: UV disinfection, and cleaning, evaluating them on maps generated from real-world environments. We evaluate the effect of model variants and simplifications on performance, and show that our models outperform a rule-based baseline.

Robust Multi-Agent Pickup and Delivery with Delays

ABSTRACT. Multi-Agent Pickup and Delivery (MAPD) is the problem of computing collision-free paths for a group of agents such that they can safely reach delivery locations from pickup ones. These locations are provided at runtime, making MAPD a combination of classical Multi-Agent Path Finding (MAPF) and online task assignment. Current algorithms for MAPD do not consider many of the practical issues encountered in real applications: real agents often do not follow the planned paths perfectly, and may be subject to delays and failures. In this paper, we study the problem of MAPD with delays, and we present two solution approaches that provide robustness guarantees by planning paths that limit the effects of imperfect execution. In particular, we introduce two algorithms, k-TP and p-TP, both based on a decentralized algorithm typically used to solve MAPD, Token Passing (TP), which offer deterministic and probabilistic guarantees, respectively. Experimentally, we compare our algorithms against a version of TP enriched with online replanning. k-TP and p-TP provide robust solutions, significantly reducing the number of replans caused by delays, with little or no increase in solution cost and running time.
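As an illustration of what a deterministic robustness guarantee can look like, the sketch below checks a set of planned paths for k-robustness in the sense commonly used in the MAPF literature: no agent may occupy a cell within k timesteps of another agent occupying it. This is not the k-TP algorithm from the abstract; the function name `is_k_robust`, the grid-cell path representation, and the assumption that agents wait at their final cell are all hypothetical choices made for the example.

```python
from itertools import combinations


def is_k_robust(paths, k):
    """Return True if no two agents occupy the same cell within k timesteps.

    paths: list of per-agent paths, each a list of cells indexed by timestep.
    Assumption (hypothetical): an agent waits at its last cell indefinitely.
    With k = 0 this only detects same-time vertex conflicts; swap conflicts
    are caught only when k >= 1.
    """
    horizon = max(len(p) for p in paths)

    def cell_at(path, t):
        # Clamp the index so finished agents stay on their final cell.
        return path[min(t, len(path) - 1)]

    for a, b in combinations(range(len(paths)), 2):
        for t in range(horizon):
            for dt in range(k + 1):
                # Agent b reaches a cell that agent a occupied dt steps earlier.
                if cell_at(paths[a], t) == cell_at(paths[b], t + dt):
                    return False
                # Agent a reaches a cell that agent b occupied dt steps earlier.
                if cell_at(paths[b], t) == cell_at(paths[a], t + dt):
                    return False
    return True


# Agent 1 enters cell (0, 1) one step after agent 0 has left it.
example_paths = [
    [(0, 0), (0, 1), (0, 2)],
    [(1, 1), (1, 1), (0, 1)],
]
```

With these example paths the plan is conflict-free when executed perfectly (k = 0), but a single one-step delay of agent 0 would cause a collision, so the check fails for k = 1.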

On Improvement Heuristic to Solutions of the Close Enough Traveling Salesman Problem in Environments with Obstacles

ABSTRACT. In this paper, we present a novel improvement heuristic to address the Close Enough Traveling Salesman Problem in environments with obstacles (CETSP_obs). The CETSP_obs is a variant of the Traveling Salesman Problem (TSP), where the goal is to find a sequence of visits to given disk-shaped regions together with the points of visits to the regions. We address challenging instances in the polygonal domain with polygonal obstacles, where the final path connecting the regions must be collision-free. We propose a novel Post-Optimization procedure using Mixed Integer Non-Linear Programming (MINLP) to improve existing heuristic solutions to the CETSP_obs. We deploy the method with existing heuristic solvers, and based on the presented evaluation results, the proposed Post-Optimization significantly improves the heuristic solutions of all examined solvers and makes them competitive regarding the solution quality. The statistical comparison yields the best-performing solver based on finding the sequence on relatively sparse sampling of the disk regions, showing the benefit of the optimal MINLP-based solution of continuous optimization.

Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model

ABSTRACT. In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.

Social Robot Navigation through Constrained Optimization: a Comparative Study of Uncertainty-based Objectives and Constraints

ABSTRACT. This work is dedicated to the study of how uncertainty estimation of the human motion prediction can be embedded into constrained optimization techniques, such as Model Predictive Control (MPC), for social robot navigation. We propose several cost objectives and constraint functions obtained from the uncertainty of predicting pedestrian positions and related to the probability of collision that can be applied to the MPC, and all the different variants are compared in challenging scenes with multiple agents. The main question this paper tries to answer is: what are the most important uncertainty-based criteria for social MPC? For that, we evaluate the proposed approaches with several social navigation metrics in an extensive set of scenarios of different complexity in reproducible synthetic environments. The main outcome of our study is a foundation for a practical guide on when and how to use uncertainty-aware approaches for social robot navigation in practice and what the most effective criteria are.

10:40-11:00 Coffee Break
11:00-12:40 Session 8: Oral Session IV
Delta filter – robust visual-inertial pose estimation in real-time: A multi-trajectory filter on a spherical mobile mapping system

ABSTRACT. Many state-of-the-art mobile mapping systems accomplish reliable and robust pose estimation utilizing combinations of inertial measurement units (IMUs), global navigation satellite systems (GNSS), visual-inertial- or LiDAR-inertial odometry (VIO/LIO). However, on a spherical mobile mapping system the underlying inherent rolling motion introduces high angular velocities, so the quality of pose estimates, images, and laser scans degrades. In this work we propose a pose filter design that is able to do real-time sensor fusion between two unreliable trajectories into one, more reliable trajectory. It is a simple yet effective filter design that does not require the user to estimate the uncertainty of the sensors. The approach is not limited to spherical robots and theoretically is also suitable for sensor fusion of an arbitrary number of estimators. This work compares this filter against two pose estimation methods on our spherical system: (1) An approach that is based solely on IMU measurements, and (2) stereo-VIO with an Intel® RealSense™ tracking camera. The proposed “Delta” filter takes as input (1), (2), and a motion model. Our implementation gets rid of the drift in (1) and (2), estimates the scale of the trajectory, and deals with slow and fast motion as well as driving curves. To quantify our results, we evaluate the trajectories against ground truth pose measurement using an OptiTrack™ motion capturing system. Furthermore, as our spherical system is equipped with a laserscanner, we evaluate the resulting point-clouds against ground truth maps available from a Riegl VZ-400 terrestrial laserscanner. Our source code is available on GitHub.

Dataset Generation for Deep Visual Navigation in Unstructured Environments

ABSTRACT. Visual navigation in unstructured environments is one of the essential functions of autonomous mobile robots. Most existing navigation methods are based on extracting traversable regions, and the visibility of the traversable region is crucial. However, in situations like plant-rich environments, such visibility is not guaranteed. A possible option is to directly infer the moving direction from images using deep learning techniques. Although this approach is promising because of a wide variety of application environments, obtaining a high-quality, large dataset will be an issue. This paper describes a data acquisition system that can easily collect various types of sensor data and a dataset generation method. We evaluate the generated datasets in several aspects. We also release the datasets, including image-path pairs and various raw sensor data.

Towards camera parameters invariant monocular depth estimation in autonomous driving

ABSTRACT. Monocular depth estimation is an effective approach to environment perception due to the simplicity of the sensor setup and the absence of multisensor calibration. Deep learning has enabled accurate depth estimation from a single image by exploiting semantic cues such as the sizes of known objects and their positions on the ground plane. However, learning-based methods frequently fail to generalize on images collected with different vehicle-camera setups due to the induced perspective geometry bias. In this work, we propose an approach for camera parameters invariant depth estimation in autonomous driving scenarios. We propose a novel joint parametrization of camera intrinsic and extrinsic parameters specifically designed for autonomous driving. In order to supplement the neural network with information about the camera parameters, we fuse the proposed parametrization and image features via a novel module based on a self-attention mechanism. After thorough experimentation on the effects of camera parameter variation, we show that our approach effectively provides the neural network with useful information, thus increasing accuracy and generalization performance.

Synthetic Data-based Detection of Zebras in Drone Imagery

ABSTRACT. Datasets that allow the training of common object or human detectors are widely available. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. Likewise, data for uncommon scenarios, like aerial views, animals, like wild zebras, or difficult-to-obtain information such as human shapes, are hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and advanced tasks like target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector cannot identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this to train a YOLO detector from scratch. Through extensive evaluations of our model with real-world data from i) limited datasets available on the internet and ii) a new one collected and manually labelled by us, we show that we can detect zebras by using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at

Navigating in 3D Uneven Environments through Supervoxels and Nonlinear MPC

ABSTRACT. Navigating uneven and rough terrains presents difficulties, including stability, traversability, sensing, and robustness, making autonomous navigation in these terrains a challenging task. This study introduces a new approach for mobile robots to navigate uneven terrains. The method uses a compact graph of traversable regions on point cloud maps, created from a supervoxel representation of the point clouds. Using this supervoxel graph, the method navigates the robot to any reachable goal pose by means of a navigation function and a Nonlinear Model Predictive Controller (NMPC). The NMPC ensures kinodynamically feasible and collision-free motion plans, while the supervoxel-based geometric planning generates near-optimal plans by exploiting the terrain information. We conducted extensive navigation experiments in real and simulated 3D uneven terrains and found that the approach performs reliably. Additionally, we compared the resulting motion plans to those of some state-of-the-art sampling-based motion planners, which our method outperformed in terms of execution time and resulting path lengths. The method can also be adapted to meet specific behavior, like the shortest route or the route with the least slope. The source code is available in a GitHub repository.

12:40-14:00 Lunch Break
16:30-16:50 Coffee Break
16:50-18:10 Session 10: Poster Spotlight Session II
Ant Colony Optimization for Retail based Capacitated Vehicle Routing Problem with Pickup and Delivery for Mobile Robots

ABSTRACT. Mobile robots have been key to automation in various applications, including picking and placing items in a retail store. The Capacitated Vehicle Routing Problem with Pickup and Delivery (CVRP-PD) is widely used in similar applications, like package delivery vehicles and mobile robots in retail, where mobile robots have a capacity limit such as weight and have to pick up and drop multiple items during their tour in an optimized manner. However, retail applications come with additional challenges: there can be multiple fixed deposit locations for particular pickup items, such as packing counters, and after delivering some items mobile robots regain capacity and are able to pick up more items during the same run. In this paper, we consider these constraints for retail applications and optimize retail orders for all mobile robots present in the environment, where the orders request pickup and delivery of products at various locations, mobile robots have different maximum load capacities, and robots regain capacity once they have dropped some items at their particular delivery locations. We propose a method to solve this retail-based CVRP-PD using Ant Colony Optimization (ACO). We take an industrial use-case and test the method with different order sizes and robot parameters. The results have been promising and have been used to solve the use-case under consideration. In addition, we evaluate the results and propose future prospects.
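The capacity-regain constraint described above can be illustrated with a small feasibility check for a single robot route. This is only a sketch of the constraint, not of the ACO method itself; the function name `route_is_feasible` and the `(action, weight)` route encoding are hypothetical and introduced purely for illustration.

```python
def route_is_feasible(route, capacity):
    """Check that the carried load never exceeds the robot's capacity.

    route: ordered list of (action, weight) tuples, where action is
           "pickup" or "drop" (hypothetical encoding for this sketch).
    capacity: maximum load the robot can carry at any time.
    """
    load = 0
    for action, weight in route:
        if action == "pickup":
            load += weight
            if load > capacity:
                return False  # over capacity at this stop
        elif action == "drop":
            load -= weight  # capacity is regained after a drop
            if load < 0:
                return False  # dropping more than is being carried
        else:
            raise ValueError(f"unknown action: {action!r}")
    return True


# Dropping the first item before the second pickup keeps the load within 8.
interleaved = [("pickup", 6), ("drop", 6), ("pickup", 5), ("drop", 5)]
# Picking up both items first would require carrying 11 at once.
batched = [("pickup", 6), ("pickup", 5), ("drop", 6), ("drop", 5)]
```

The same two orders are feasible or infeasible depending only on the visiting sequence, which is the kind of ordering decision a routing optimizer would have to make.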

Direct Object Reconstruction on RGB-D Images in Cluttered Environment

ABSTRACT. Robots have limited perception capabilities when observing the scene from a single viewpoint. Some objects in the scene might be partially occluded, so their 3D shape is not fully available to the robot. Existing methods either obtain object models through a series of observations using RGB-D sensors or train the robot to operate in the presence of occlusions. In this paper, we directly address object reconstruction in the presence of occlusions. We propose an image generation approach using a neural network architecture to remove occluding objects from RGB-D images and reconstruct the occluded object. The proposed method utilizes a cascade of neural networks trained to progressively remove occlusions and reconstruct the RGB-D images of the scene.

Robust Perception Skills for Autonomous Elevator Operation by Mobile Robots

ABSTRACT. Autonomous mobile service robots with transportation tasks are often restricted to work on a single floor, since remote access to elevators is expensive to integrate for reasons of safety certification. Therefore, the first robots were enabled to use the human interface for riding an elevator as early as ten years ago. This requires a variety of perception and manipulation capabilities as well as social skills when it comes to interaction with other people who want to use the elevator too. We summarize the progress in solving the specific tasks of robustly detecting and localizing the buttons that need to be pressed. A deep-learning approach for detecting buttons in images is combined with a verification based on predefined knowledge of button arrangements in the elevator’s control panels. Perception of the elevator’s state and our realization of the robot’s elevator riding capabilities are also discussed.

Scalable Evaluation Pipeline of CNN-based perception for Robotic Sensor Data under different Environment Conditions

ABSTRACT. Deep learning has impacted a wide variety of perception applications for autonomous mobile robots. In classic computer vision benchmark tests, new algorithms keep appearing that outperform each other. However, these benchmark results cannot be generalized, so the specific application must be considered for the selection of sensors and algorithms. Especially in the agricultural domain, environmental conditions like weather and vegetation significantly influence the reliability of sensor systems. Therefore, it is essential to test different sensor modalities and algorithms in the operational design domain. This motivates the need for an evaluation framework which has the flexibility to compare and validate various perception algorithms, sensor suites, and data samples with a focus on different conditions.

This paper proposes a pipeline combining a test environment (AI-TEST-FIELD), a semantic environment representation (SEEREP), and an inference server (Triton) for an automatic evaluation of different CNN-based perception algorithms under various environment conditions. Recurring and comparable recordings of sensor raw data with identical scenarios and objects can be performed on the test field, with the only difference being the environmental conditions. The inference results are computed once and stored alongside the sensor data in SEEREP. Thus, they can be queried efficiently based on the environment conditions to generate (partially overlapping) subsets of the whole dataset. It is demonstrated how this pipeline can be used to apply the CNN-inference just once on the data, and how the queried subsets can subsequently be used to evaluate the performance in different environment conditions.

Teach and Repeat and Wheel Calibration for LiDAR-equipped Omnidirectional Drive Robots

ABSTRACT. In this work, we present a novel teach-and-repeat (T&R) method for omnidirectional-drive autonomous ground vehicles, based on the tightly-coupled fusion of LiDAR and odometry readings in a relative representation. In addition to robot localization, we perform online estimation of the platform intrinsic parameters, which significantly enhances the robustness and localization accuracy of the system. We demonstrate the effectiveness of our approach, including the performance of the platform parameters and pose estimation, as well as the general T&R method, through simulation.

Adaptive Compliant Robot Control with Failure Recovery for Object Press-Fitting

ABSTRACT. Loading of shipping containers for dairy products often includes a press-fit task, which involves manually stacking milk cartons in a container without using pallets or packaging. Automating this task with a mobile manipulator can reduce worker strain, and also enhance the efficiency and safety of the container loading process. This paper proposes an approach called Adaptive Compliant Control with Integrated Failure Recovery (ACCIFR), which enables a mobile manipulator to reliably perform the press-fit task. We base the approach on a demonstration learning-based compliant control framework, into which we integrate a monitoring and failure recovery mechanism for successful task execution. Concretely, we detect collisions while the robot is performing the press-fit task and use wrench measurements to classify the direction of collision; this information informs the subsequent recovery process. We evaluate the method on a miniature container setup, considering variations in the (i) starting position of the end effector, (ii) goal configuration, and (iii) object grasping position. The results demonstrate that the proposed approach outperforms the baseline demonstration-based learning framework regarding adaptability to environmental variations and the ability to recover from collision failures, making it a promising solution for practical press-fit applications.

Graph-based LiDAR-Inertial SLAM Enhanced by Loosely-Coupled Visual Odometry

ABSTRACT. In this paper, we address robot localization using Simultaneous Localization and Mapping (SLAM) with Light Detection and Ranging (LiDAR) perception enhanced by visual odometry in scenarios where laser scan matching can be ambiguous because of a lack of sufficient features in the scan. We propose a Graph-based SLAM approach that benefits from fusing data from multiple types of sensors to overcome the disadvantages of using only LiDAR data for localization. The proposed method uses a failure detection model based on the quality of the LiDAR scan matching and inertial measurement unit (IMU) data. The failure model improves LiDAR-based localization by an additional localization source, including low-cost black-box visual odometers like the Intel RealSense T265. The proposed method is compared to the state-of-the-art localization system LIO-SAM in cluttered and open urban areas. Based on the performed experimental deployments, the proposed failure detection model with a black-box visual odometry sensor yields improved localization performance as measured by the absolute trajectory and relative pose error indicators.
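For readers unfamiliar with the two error indicators named above, the following is a minimal, translation-only sketch of absolute trajectory error (ATE) and relative pose error (RPE) as root-mean-square values. It assumes 2D positions that are already time-associated and expressed in a common frame; a full evaluation would typically also include rigid alignment and rotational error. The function names `ate_rmse` and `rpe_rmse` are hypothetical and do not refer to any particular evaluation toolkit.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of point-wise position differences (translation-only ATE)."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rpe_rmse(estimated, ground_truth, delta=1):
    """RMSE of differences between relative displacements over `delta` steps."""
    if len(estimated) != len(ground_truth) or len(estimated) <= delta:
        raise ValueError("need equal-length trajectories longer than delta")
    squared = []
    for i in range(len(estimated) - delta):
        # Displacement of the estimate and of the ground truth over the window.
        dex = estimated[i + delta][0] - estimated[i][0]
        dey = estimated[i + delta][1] - estimated[i][1]
        dgx = ground_truth[i + delta][0] - ground_truth[i][0]
        dgy = ground_truth[i + delta][1] - ground_truth[i][1]
        squared.append((dex - dgx) ** 2 + (dey - dgy) ** 2)
    return math.sqrt(sum(squared) / len(squared))


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]  # constant offset, no drift
```

A constant offset, as in `shifted`, produces a non-zero ATE but a zero RPE, which is why the two indicators are usually reported together: ATE reflects global consistency, while RPE reflects local drift.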

Symmetric Object Pose Estimation via Flexible Modular CNN

ABSTRACT. Object pose estimation is a crucial task in various applications, including human-robot interaction, mobile robotics, and augmented reality. It involves determining the position and orientation of an object relative to a reference frame. This is a challenging task due to the need for accurate object detection and recognition, as well as understanding its geometry and the surrounding environment. Depending on the application and available resources, this task can be performed using Lidars, as in autonomous driving, or smaller RGBD cameras, as in mobile robotics. This work proposes an innovative convolutional neural network (CNN) for object pose estimation from RGBD data. The model is designed to have two separate branches, one for estimating the object's position and one for estimating the orientation, to facilitate the training process without loss in performance. Moreover, our approach emphasizes the problem of symmetric object pose estimation, for which we designed a new loss function to better represent the rotation error. The proposed model, with the newly introduced loss function, outperforms state-of-the-art models on public datasets for object pose estimation, both for standard asymmetric objects and symmetric ones.
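To illustrate why symmetric objects need a dedicated rotation error, the sketch below computes a symmetry-aware angular error for the simplified planar case of an object with n-fold rotational symmetry about a single axis. It is not the loss function proposed in the abstract, whose form is not given here; `symmetric_yaw_error` and `wrap_angle` are hypothetical helper names.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def symmetric_yaw_error(predicted, target, n_fold):
    """Smallest absolute yaw error over all n_fold equivalent target poses.

    An object with n-fold symmetry looks identical after a rotation of
    2*pi/n_fold, so every such rotation of the target is equally correct.
    """
    if n_fold < 1:
        raise ValueError("n_fold must be at least 1")
    step = 2.0 * math.pi / n_fold
    return min(
        abs(wrap_angle(predicted - target - k * step)) for k in range(n_fold)
    )
```

For a square-like object (n_fold = 4), a prediction that is off by exactly 90 degrees is indistinguishable from the target and receives zero error, whereas the naive error (n_fold = 1) would penalize it by pi/2.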

Learning State-Space Models for Mapping Spatial Motion Patterns

ABSTRACT. Mapping the surrounding environment is essential for the successful operation of autonomous robots. While extensive research has focused on mapping geometric structures and static objects, the environment is also influenced by the movement of dynamic objects. Incorporating information about spatial motion patterns can allow mobile robots to navigate and operate successfully in populated areas. In this paper, we propose a deep state-space model that learns the map representations of spatial motion patterns and how they change over time at a certain place. To evaluate our methods, we use two different datasets: one generated dataset with specific motion patterns and another with real-world pedestrian data. We test the performance through evaluations of the learning ability, mapping quality, and applicability to downstream tasks. The results demonstrate that our model can effectively learn the corresponding motion pattern, and has the potential to be applied to robotic application tasks.

Monocular Person Localization and Lidar Fusion for Social Navigation

ABSTRACT. Smooth social navigation requires not only detection of the people around the robot, but also accurate localization of the people, a process difficult to achieve with a single sensing modality. Hence, the literature has focused on various fusion approaches, such as RGB-D, ROI-based lidar-vision fusion, or Artificial Neural Network-based lidar-vision fusion. However, monocular photogrammetry has largely been ignored in the literature. In this work, we propose a fusion approach based on monocular position estimation and lidar, and show the effectiveness of the approach on a public dataset.

Towards Data-Driven Discovery of Governing Swarm Robots Flocking Rules

ABSTRACT. Extracting local interaction rules that govern the dynamics of a swarm is a central challenge in many swarm robotics application domains. Reverse engineering such dynamics might be highly beneficial in preventing the serious handcrafted design errors that swarm robotics engineers may implicitly make. Advances in data-driven system identification techniques, such as SINDy, are currently enabling the tractable identification of the equations governing the dynamics of many systems. However, they have yet to be applied to swarm robotics systems. In this work, we aim to combine sparsity-promoting techniques with nonlinear swarm dynamical systems to develop a data-driven system identification model capable of discovering governing swarm flocking interaction rules from swarm measurement data. In particular, we build and compare two SINDy flocking models: Flock-SINDy-STLSQ and Flock-SINDy-SR3. Our findings suggest that Flock-SINDy-SR3 discovers the underlying flocking dynamics rules better than Flock-SINDy-STLSQ and is expected to be further used as a controller implemented on real drones.

Where to Place a Pile?

ABSTRACT. When planning missions for autonomous machines in real-world scenarios, such as open-pit mining, painting, or harvesting, it is important to consider how the machines will alter the working environment during their operations. Traditional planning methods treat such changes, like piles built during drilling, as constraints given to the planner that depend on the machine's trajectory. The goal is to find a trajectory that satisfies these constraints. In contrast, our approach formulates the planning problem as finding optimal positions for changes, such as piles, along the machine's trajectory. We propose a heuristic solver and provide extensive experimental evaluations.

An EKF-based Multi-Object Tracking Framework for a Mobile Robot in a Precision Agriculture Scenario

ABSTRACT. Many robotic applications require the ability to locate multiple objects in the environment, but the use of instant-by-instant identification techniques may be unreliable in variable and poorly structured contexts, such as the majority of precision agriculture settings. Inspired by the needs of the H2020 CANOPIES project, where robotic platforms are required to perform harvesting operations in table-grape vineyards, in this paper, we propose a framework for tracking objects of interest over time using a mobile robotic platform equipped with an RGB-D camera. Specifically, we design a multi-object tracking module based on an Extended Kalman Filter (EKF) which takes into account the motion of the robot to update the estimate of the localization of the objects. We validate the approach within a realistic Unity-based simulator where a mobile robot is required to track table-grape bunches within the vineyard.

Stereo Visual Localization Dataset Featuring Event Cameras

ABSTRACT. Visual odometry and SLAM methods are facing increasingly complex scenarios and novel solutions are needed to offer more accurate and reliable results in challenging environments. Standard cameras are challenged under low light conditions or very high-speed motion, as they suffer from motion blur and operate at a limited frame rate. These problems can be alleviated by using event cameras, asynchronous visual sensors that offer complementary advantages compared to standard cameras, as they do not suffer from motion blur and support high dynamic range. Although there are a number of existing datasets intended for visual odometry and SLAM that contain event data, most of them are collected using monocular sensors and are limited either in terms of camera resolution or ground truth availability. Our work aims to complement this by further supporting development of robust stereo visual odometry and SLAM algorithms that can exploit both event data and intensity images. We provide both indoor sequences with 6-DoF motion and outdoor vehicle driving sequences that additionally contain 3D lidar data. All sequences contain data from synchronized high-resolution stereo event and standard cameras, whereas ground truth trajectories are provided by either a motion capture system or a highly-accurate GNSS/INS and AHRS that combines the fibre optic gyro IMU with a dual antenna RTK GNSS receiver.

Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises

ABSTRACT. Physical rehabilitation focuses on the improvement of body functions, usually after injury or surgery. Patients undergoing rehabilitation often need to perform exercises at home without the presence of a physiotherapist. Computer-aided assessment of physical rehabilitation can improve patients' performance and help in completing prescribed rehabilitation exercises. In this work, we focus on human motion analysis in the context of physical rehabilitation for Low Back Pain (LBP). As 2D and 3D human pose estimation from RGB images has made impressive improvements, we aim to compare the assessment of physical rehabilitation exercises using movement data acquired from RGB videos and human pose estimation from those. We provide an analysis of two types of algorithms on Low Back Pain rehabilitation datasets. One is based on a Gaussian Mixture Model (GMM), with performance metrics based on the log-likelihood values from the GMM. Furthermore, with the recent development of Deep Learning and Graph Neural Networks, algorithms based on Spatio-Temporal Graph Convolutional Networks (STGCN) are taken as a novel approach. We compared the algorithms in terms of data efficiency and performance, with evaluation performed on two LBP rehabilitation datasets: KIMORE and Keraal. Our study confirms that Kinect, OpenPose and BlazePose data yield similar evaluation scores, and shows that STGCN outperforms GMM in most configurations.

20:00-22:30 Social Event: Gala Dinner (Machado de Castro National Museum)