SYNASC 2023: 25TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING
PROGRAM FOR THURSDAY, SEPTEMBER 14TH


09:00-09:50 Session 17: Invited talk: Random structures and patterns in spatio-temporal data: probabilistic modelling and statistical inference (Radu S. Stoica)

Abstract. The useful information carried by spatio-temporal data is often outlined by geometric structures and patterns. Filaments or clusters induced by galaxy positions in our Universe are one such example. Two situations are to be considered. First, the pattern of interest is hidden in the data set, hence the pattern should be detected. Second, the structure to be studied is observed, so it should be characterized in a relevant way. Probabilistic modelling is one approach that can furnish answers to these questions. This is done by developing unitary methodologies that simultaneously embrace three directions: modelling, simulation, and inference. This talk presents the use of marked point processes for the detection and characterization of such structures. Practical examples are also shown.

10:10-12:10 Session 18: Artificial Intelligence (III)
10:10
Scaling-up the Analysis of Neural Networks by Affine Forms: A Block-Wise Noising Approach
PRESENTER: Asma Soualah

ABSTRACT. The effectiveness of neural networks in handling visual perturbations is frequently assessed using abstract transforms, such as affine transformations. However, these transforms may forfeit precision and be computationally expensive (time and memory consuming). In this article we suggest a novel approach called block-wise noising to overcome these limitations. Block-wise noising simulates real-world situations in which particular portions of an image are disrupted, by inserting non-zero noise symbols only inside a given section of the image. Using this method, it is possible to assess neural networks' resilience to these disturbances while preserving scalability and accuracy. The experimental results demonstrate that the proposed block-wise noising achieves a 50% speed improvement compared to the usual affine forms on specific trained neural networks. Additionally, it can be especially helpful for applications like computer vision, where real-world images may be susceptible to different forms of disturbance.
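For illustration, a minimal sketch of the block-wise noising idea, under the assumption that each pixel is represented as an affine form (center value plus one noise coefficient); the function name and representation below are hypothetical and not the authors' implementation:

```python
def block_wise_noise(image, top, left, size, eps):
    """Attach a non-zero noise coefficient only to pixels inside a
    size x size block starting at (top, left).

    Each pixel becomes a pair (center, coeff), read as the affine form
    center + coeff * eps_i with eps_i in [-1, 1]; pixels outside the
    block keep a zero coefficient, which is the source of the speed-up.
    """
    h, w = len(image), len(image[0])
    forms = [[(image[r][c], 0.0) for c in range(w)] for r in range(h)]
    for r in range(top, min(top + size, h)):
        for c in range(left, min(left + size, w)):
            forms[r][c] = (image[r][c], eps)
    return forms

img = [[0.1, 0.2], [0.3, 0.4]]
forms = block_wise_noise(img, 0, 0, 1, 0.05)  # noise only on the top-left pixel
```

Propagating such forms through a network then only has to track noise symbols for the perturbed block rather than for every pixel.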

10:30
Using sequences of API Calls to identify and classify ransomware families

ABSTRACT. This paper presents a Machine Learning based approach to properly identify and classify ransomware families, based on API sub-sequence features. The increase in ransomware activity in recent years and its destructive capacity demand a proper response. Taking into account the variety of techniques used by ransomware gangs to evade detection, we propose a generic method to identify the specific traits of a ransomware family based on discriminant API call sequences. Another goal of this research is to identify code similarities between ransomware families, and hence the use of the Malware-as-a-Service concept in ransomware-based attacks. The database used for training and validating our approach consists of 19,563 ransomware samples, 52 families and 416 campaigns (activity months), collected starting in 2020. Our experimental results show that sub-sequences of API calls can be successfully used to identify ransomware traits. However, this method is strongly dependent on the quality of the implemented emulator.
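As a rough illustration of sub-sequence features over an ordered API-call trace, one common formulation counts contiguous n-grams; the helper below and the sample trace are assumptions for illustration only, since the abstract does not describe the paper's actual feature extraction:

```python
from collections import Counter

def api_ngrams(call_trace, n=3):
    # Count contiguous n-gram sub-sequences of an ordered API-call trace.
    # zip over n shifted copies of the trace yields every window of length n.
    grams = zip(*(call_trace[i:] for i in range(n)))
    return Counter(grams)

# Hypothetical trace of the kind an emulator might record for a sample.
trace = ["FindFirstFile", "CryptEncrypt", "WriteFile",
         "DeleteFile", "FindNextFile", "CryptEncrypt", "WriteFile"]
feats = api_ngrams(trace, n=2)
```

Counts like these can then serve as a feature vector for a per-family classifier, with discriminant sub-sequences carrying most of the signal.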

10:50
Machine Learning Predictive Models Applied on COVID-19 Datasets to Estimate Infection Risk in Specific Geographic Regions
PRESENTER: Diogen Babuc

ABSTRACT. This study investigates statistical indicators and evaluation metrics for five shallow machine-learning models on two COVID-19 datasets. The goal is to find the best-performing model for tracking the progress of COVID-19 statistics. The selected models are K-Nearest Neighbors, Decision Tree (DT), Support Vector Machine, Classification and Regression Trees (CART), and Extreme Gradient Boost (XGBoost). The dataset for the Oceania countries, selected from Kaggle, was augmented using the cumulative sum of the values from two columns, in order to have enough training data. The best-performing models for this dataset are CART and DT, whose measurement values are close to those from practical data. The DT values for the mean squared and absolute errors indicate a reliable accuracy for the predictive model. The coefficient of determination is, on average, 0.8 for both CART and DT. The correlation coefficient indicates a strong relationship between the selected variables. For the dataset with the last 100 countries on Worldometer (total cases up to September 2020), the CART and DT models show a trustworthy ratio of true negative predictions to the total number of actual negatives. Accuracy, sensitivity, precision, and F1-score are all significant, which indicates that a classification model is performing well across multiple important metrics. Through this, the models can detect countries in the risk zone. The graphs and charts obtained for the DT model show that the mean squared and absolute errors can produce a forecast close to the actual value. XGBoost is in third position. Some models can help discover geographic regions at risk of infection. CART and DT are the best-performing models for both selected datasets.

11:10
Statistical relevance of neural networks and decision trees in the forecasting of a popular beverage consumption

ABSTRACT. In this paper, we compare the relevance of neural network and decision tree methods used to forecast the consumption of a popular beverage, as part of collaborative work on developing a platform that can assist business decisions. After a training process, we compute and compare the mean absolute log error (MALE), the mean squared log error (MSLE), the root mean squared log error (RMSLE), the geometric mean relative error (GMRE) and the exponential root mean squared log error (ERMSLE) values obtained with both methods, using a significant sample of millions of observations spanning multiple years. The final goal of the experimental results is to propose the model best suited to predict the consumption of the product with high accuracy and to assist the business decision process reliably.
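The abstract lists the metrics without defining them, so the sketch below uses the common textbook forms as an assumption: log1p-based log errors, RMSLE as the square root of MSLE, and ERMSLE taken as its exponential.

```python
import math

def log_error_metrics(actual, predicted):
    """Compute the log-scaled error metrics named in the abstract,
    under assumed textbook definitions (the paper's exact formulas
    are not given). Assumes positive actual values and non-zero errors."""
    logs = [math.log1p(p) - math.log1p(a) for a, p in zip(actual, predicted)]
    n = len(logs)
    male = sum(abs(d) for d in logs) / n    # mean absolute log error
    msle = sum(d * d for d in logs) / n     # mean squared log error
    rmsle = math.sqrt(msle)                 # root mean squared log error
    ermsle = math.exp(rmsle)                # exponential RMSLE (assumed form)
    rel = [abs(p - a) / a for a, p in zip(actual, predicted)]
    gmre = math.exp(sum(math.log(r) for r in rel) / n)  # geometric mean relative error
    return {"MALE": male, "MSLE": msle, "RMSLE": rmsle,
            "GMRE": gmre, "ERMSLE": ermsle}

metrics = log_error_metrics([100, 200, 300], [110, 190, 330])
```

Log-scaled errors of this kind penalize relative rather than absolute deviations, which suits consumption series whose magnitude varies widely across outlets and seasons.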

11:30
Leveraging self-supervised label generation in ego-trajectory segmentation for self-driving vehicles
PRESENTER: Andrei Mihalea

ABSTRACT. Ego-trajectory segmentation can play a crucial role in autonomous driving, since it can offer valuable information for perception modules, representing a guideline about the environment and the possible allowed future trajectories of the vehicle at each point in time. Current approaches for trajectory estimation do not usually involve a segmentation task and are tied to prior knowledge such as maps or the use of expensive data acquisition sensors such as laser scanners. In our work, we present a novel approach that relies on a self-supervised mechanism for generating ego-trajectory segmentation labels and then training a semantic segmentation network on these labels in order to predict the future trajectory of the vehicle using only a monocular camera input. In every step of our pipeline, the only needed sensor is a monocular camera. The framework can be split into three representative steps: estimating the ego-motion of the vehicle using a self-supervised method, generating the segmentation labels from the predicted ego-motion, and training the segmentation network on the previously generated labels in order to predict trajectories for new data. To evaluate our approach, we use the KITTI odometry dataset and compare the predicted trajectories with the ground-truth ones, obtained from the dataset's available ego-motion ground truth.

11:50
Prediction of Cloud Service Failure using traditional Machine Learning
PRESENTER: Adrian Spataru

ABSTRACT. The ability to predict failures in complex systems is crucial for maintaining their optimal performance, opening the possibility of reducing downtime and minimizing costs. In the context of cloud computing, cloud failure represents one of the most relevant problems, which not only leads to substantial financial losses, but also negatively impacts the productivity of both industrial and end users. This paper presents a comprehensive study on the application of failure prediction techniques, exploring four machine learning algorithms, namely Decision Tree, Random Forest, Gradient Boosting and Logistic Regression.

The research focuses on analysing the workload of an industrial set of clusters, provided as the Google Borg cluster workload traces. The aim was to develop highly accurate predictive models for both job and task failures, a goal which was achieved. A job classifier with 83.97% accuracy (Gradient Boosting) and a task classifier with 98.79% accuracy (Decision Tree) were obtained.

12:10
Enhancing the performance of software effort estimation through boosting ensemble learning

ABSTRACT. The artificial intelligence domain is growing at a rapid pace, bringing more and more benefits to the day-to-day lives of humans, as well as to their professional lives, independent of the industry they work in. For software engineers and their work, search-based software engineering shows true potential and early results in facilitating tasks, as well as in providing useful information and perspective for making various decisions, such as choosing the architecture of a project, estimating resources, and so on. Software effort estimation is a major component of the software development cycle, playing a crucial role in the outcome of a project. Comprehending the volume and effort of a software project in its early stages is not a trivial problem, but it is a necessary step, since both over- and underestimates can lead to client dissatisfaction, a low-quality product, and in some cases even project failure. This paper investigates whether a boosting ensemble learning approach to the problem at hand contributes to enhancing the performance of software development effort estimation. The boosted model achieved an improvement of about 18% in the Mean Squared Error and 88% in the R-squared performance metrics compared to the classic model without boosting.