EPIA2023: 22ND EPIA CONFERENCE ON ARTIFICIAL INTELLIGENCE
PROGRAM FOR TUESDAY, SEPTEMBER 5TH

10:00-10:30 Coffee Break
10:30-12:30 Session 3A: KDBI - I
Location: Ballroom
10:30
A Comparison of Automated Machine Learning tools for Predicting Energy Building Consumption in Smart Cities

ABSTRACT. In this paper, we explore and compare three recently proposed Automated Machine Learning (AutoML) tools (AutoGluon, H2O, Oracle AutoMLx) to create a single regression model that is capable of predicting smart city energy building consumption values. Using a recently collected one-year hourly energy consumption dataset, related to 29 buildings from a Portuguese city, we perform several Machine Learning (ML) computational experiments, assuming two sets of input features (with and without lagged data) and a realistic rolling window evaluation. Furthermore, the obtained results are compared with a univariate Time Series Forecasting (TSF) approach, based on the automated FEDOT tool, which requires generating a predictive model for each building. Overall, competitive results, in terms of both predictive and computational effort performances, were obtained by the input lagged AutoGluon single regression modeling approach.
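The lagged inputs and rolling-window evaluation mentioned in the abstract are standard forecasting devices; as an illustrative from-scratch sketch (hypothetical helper names, not the authors' code):

```python
def make_lagged(series, n_lags):
    """Build (features, target) pairs where each target value is
    predicted from the n_lags values that precede it."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])
        y.append(series[i])
    return X, y

def rolling_windows(n_samples, train_size, test_size):
    """Yield (train_idx, test_idx) pairs that slide forward in time,
    so each model is always evaluated on data newer than its training set."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size

hourly = [float(h % 24) for h in range(72)]   # toy 3-day hourly series
X, y = make_lagged(hourly, n_lags=24)         # 24 hourly lags per example
splits = list(rolling_windows(len(y), train_size=24, test_size=12))
```

Each split trains on one day and tests on the following half-day, mimicking (in miniature) the realistic forward-in-time evaluation the paper describes.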

10:50
Pollution Emission Patterns of Transportation in Porto, Portugal through Network Analysis

ABSTRACT. Over the past few decades, road transportation emissions have increased. Vehicles are among the most significant sources of pollutants in urban areas. As such, several studies and public policies emerged to address the issue. Estimating greenhouse emissions and air quality over space and time is crucial for human health and mitigating climate change. In this study, we demonstrate that it is feasible to utilize raw GPS data to measure regional pollution levels. By applying feature engineering techniques and using a microscopic emissions model to calculate vehicle-specific power (VSP) and various specific pollutants, we identify areas with higher emission levels attributable to a fleet of taxis in Porto, Portugal. Additionally, we conduct network analysis to uncover correlations between emission levels and the structural characteristics of the transportation network. These findings can potentially identify emission clusters based on the network's connectivity and contribute to developing an emission inventory for an urban city like Porto.

11:10
Hybrid SkipAwareRec: A Streaming Music Recommendation System

ABSTRACT. In an automatic music playlist generator, such as an automated online radio channel, how should the system react when a user hits the skip button? Can we use this type of negative feedback to improve the list of songs we will playback for the user next? We propose SkipAwareRec, a next-item recommendation system based on reinforcement learning. SkipAwareRec recommends the best next music categories, considering positive feedback consisting of normal listening behaviour, and negative feedback in the form of song skips. Since SkipAwareRec recommends broad categories, it needs to be coupled with a model able to choose the best individual items. To do this, we propose Hybrid SkipAwareRec. This hybrid model combines the SkipAwareRec with an incremental matrix factorisation (MF) algorithm that selects specific songs within the recommended categories. Our experiments with Spotify’s Skip Prediction Challenge dataset show that Hybrid SkipAwareRec has the potential to improve recommendations by a considerable amount with respect to the skip-agnostic MF algorithm. This strongly suggests that reformulating the next recommendations based on skips improves the quality of automatic playlists.
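The incremental matrix factorisation component referenced above can be illustrated with a single SGD update on one observation; this is a toy sketch under assumed hyperparameters, not the authors' implementation:

```python
def mf_sgd_update(P, Q, user, item, rating, lr=0.05, reg=0.02):
    """One incremental matrix-factorisation step: nudge the user and
    item latent vectors to shrink the error on a single observation."""
    pred = sum(pu * qi for pu, qi in zip(P[user], Q[item]))
    err = rating - pred
    for f in range(len(P[user])):
        pu, qi = P[user][f], Q[item][f]
        P[user][f] += lr * (err * qi - reg * pu)
        Q[item][f] += lr * (err * pu - reg * qi)
    return err

k = 4                                 # latent dimensionality (illustrative)
P = {0: [0.1] * k}                    # one user's latent vector, toy init
Q = {0: [0.1] * k}                    # one song's latent vector, toy init
errs = [abs(mf_sgd_update(P, Q, 0, 0, rating=1.0)) for _ in range(300)]
```

Repeating the update on streamed (user, item, rating) events drives the prediction error down, which is what lets the hybrid model keep adapting as listening feedback arrives.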

11:30
Mining Causal Links Between TV Sports Content and Real-World Data

ABSTRACT. This paper analyses the causal relationship between external events and sports content TV audiences. To accomplish this, we explored external data related to sports TV audience behaviour within a specific time frame and applied a Granger causality analysis to evaluate the effect of external events on both TV clients’ volume and viewing times. Compared to regression studies, Granger causality analysis is essential in this research as it provides a more comprehensive and accurate understanding of the causal relationship between external events and sports TV viewership. The study results demonstrate a significant impact of external events on the TV clients' volume and viewing times. External events such as the type of tournament, match popularity, interest and the home team effect proved to be the most informative about the audiences. The findings of this study can assist TV distributors in making informed decisions about promoting sports broadcasts.
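In general, a Granger test asks whether adding the past of one series improves the forecast of another; a rough sketch of that comparison (illustrative lag depth and synthetic data, not the paper's pipeline) is:

```python
import numpy as np

def granger_f_stat(y, x, lag=1):
    """Compare a model of y from its own past (restricted) with one that
    also uses x's past (unrestricted); a large F favours 'x Granger-causes y'."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    Y = y[lag:]
    ones = np.ones(len(Y))
    Xr = np.column_stack([ones, y[:-lag]])             # restricted model
    Xu = np.column_stack([ones, y[:-lag], x[:-lag]])   # + x's past
    rss = lambda A: np.sum((Y - A @ np.linalg.lstsq(A, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Xr), rss(Xu)
    q = 1                              # number of extra lag terms tested
    dof = len(Y) - Xu.shape[1]
    return (rss_r - rss_u) / q / (rss_u / dof)

rng = np.random.default_rng(42)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):                # y is driven by x's previous value
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()
f_signal = granger_f_stat(y, x)        # x's past helps predict y: large F
f_noise = granger_f_stat(x, y)         # y's past does not help predict x
```

In practice one would use a tested implementation such as statsmodels' `grangercausalitytests` rather than hand-rolled OLS, but the restricted-vs-unrestricted comparison above is the core of the method.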

11:50
Imbalanced Regression Evaluation under Uncertain Domain Preferences

ABSTRACT. In natural phenomena, data distributions often deviate from normality. One can think of cataclysms as a self-explanatory example: rarely occurring events differ considerably from common outcomes. In real-world domains, such tail events are often the most relevant to anticipate, allowing us to take adequate measures to prevent or attenuate their impact on society. However, mapping target values to particular relevance judgements is challenging and existing methods do not consider the impact of bias in reaching such mappings -- relevance functions. In this paper, we tackle the issue of uncertainty in non-uniform domain preferences and its impact on imbalanced regression evaluation. Specifically, we develop two methods for assessing the volatility of model performance when dealing with uncertainty regarding the range of target values that are more important to the underlying problem. We demonstrate the importance of our proposed methods in capturing the impact of small changes in relevance assessments of target values and how they may impact experimental conclusions.
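A relevance function maps target values to importance judgements; one toy way to probe the sensitivity the abstract describes is to shift the assumed location of the "relevant" region and see how much the relevant subset changes (a hypothetical sigmoid relevance and sensitivity measure, not the authors' methods):

```python
import math

def relevance(y, center, scale):
    """Sigmoid relevance: values far above `center` approach 1 (rare,
    important); values far below approach 0 (common, less important)."""
    return 1.0 / (1.0 + math.exp(-(y - center) / scale))

def volatility(y_values, centers, scale=1.0, threshold=0.5):
    """Toy sensitivity measure: how much does the size of the 'relevant'
    subset change as the assumed relevance centre shifts?"""
    sizes = [sum(relevance(y, c, scale) >= threshold for y in y_values)
             for c in centers]
    return max(sizes) - min(sizes)

data = list(range(100))
spread = volatility(data, centers=[80, 85, 90])  # perturbed centre choices
```

Here a 10-unit uncertainty in the centre changes the relevant subset by 10 points, which is exactly the kind of shift that can flip experimental conclusions in imbalanced regression evaluation.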

12:10
Measuring latency-accuracy trade-offs in Convolutional Neural Networks

ABSTRACT. Several systems that employ machine learning models are subject to strict latency requirements. Fraud detection systems, transportation control systems and network traffic analysis are a few examples. These requirements are imposed at inference time, when the model is queried. However, it is not trivial to adjust model architecture and hyperparameters in order to obtain a good trade-off between predictive ability and inference time. This paper provides a contribution in this direction by presenting a study of how different architectural and hyperparameter choices affect the inference time of a Convolutional Neural Network for network traffic analysis. Our case study focuses on a model for traffic correlation attacks on the Tor network, which requires the correlation of a large volume of network flows in a short amount of time. Our findings suggest that hyperparameters related to convolution operations - such as stride and the number of filters - and the reduction of convolution and max-pooling layers can substantially reduce inference time, often with a relatively small cost in predictive performance.
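Measuring inference latency fairly requires warm-up runs and a robust statistic; a minimal harness of the kind such studies rely on (a generic sketch, not the paper's benchmarking code) is:

```python
import time
import statistics

def measure_latency(fn, inputs, warmup=10, runs=100):
    """Median wall-clock time of fn(inputs); warm-up iterations are
    discarded so one-off costs (allocation, caching) do not skew it."""
    for _ in range(warmup):
        fn(inputs)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(inputs)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# toy stand-in for a model's forward pass
latency = measure_latency(lambda x: sum(v * v for v in x), list(range(1000)))
```

The median (rather than the mean) keeps occasional scheduler hiccups from distorting the latency figure being traded off against accuracy.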

10:30-12:30 Session 3B: AMIA
Location: Library
10:30
Simulation-Based Adaptive Interface for Personalized Learning of AI Fundamentals in Secondary School

ABSTRACT. This paper presents the first results on the validation of a new Adaptive E-learning System, focused on providing personalised learning to secondary school students in the field of education about AI by means of an adaptive interface based on a 3D robotic simulator. The prototype tool presented here has been tested at schools in the USA, Spain, and Portugal, yielding very valuable insights regarding the high engagement level of students in programming tasks when dealing with the simulated interface. In addition, the system's reliability has been shown in terms of adjusting the students’ learning paths according to their skills and competences in an autonomous fashion.

10:50
Design and Development of Ontology for AI-based Software Systems to Manage the Food Intake and Energy Consumption of Obesity, Diabetes and Tube Feeding Patients

ABSTRACT. Poor and sedentary lifestyles combined with bad dietary habits have an impact on our health. Nowadays, diet-related diseases have become a major public health issue, threatening the sustainability of healthcare systems, and new strategies to promote better food intake are now being explored. In this context, the use of ontologies has gained importance over the past decade and become more prevalent. By incorporating ontologies in the healthcare domain, artificial intelligence (AI) can be enhanced to better support healthcare systems dealing with chronic diseases, such as obesity and diabetes, which require long-term follow-up and frequent monitoring. This is especially challenging with current resource inefficiency; however, recent research suggests that incorporating ontology into AI-based technological solutions can improve their accuracy and capabilities. Additionally, recommendation and expert systems benefit from incorporating ontologies for a better knowledge representation and processing to increase success rates. This study outlines the development of an ontology in the context of food intake to manage and monitor patients with obesity, diabetes, and those using tube feeding. A standardized vocabulary for describing food and nutritional information was specified to enable the integration with different healthcare systems and provide personalized dietary recommendations to each user.

11:10
A Gamified Distributed Infrastructure for Collectively Sharing People’s Eyes

ABSTRACT. This paper presents the design and evaluation of Gamified CollectiveEyes, a digital infrastructure to collectively share human eyes and ears. Gamified CollectiveEyes collects people’s viewpoints and hearings anywhere in the world at all times, and a user sees several collected viewpoints simultaneously in a 3D virtual space. For navigating the human viewpoints and hearings collected by Gamified CollectiveEyes, we propose a novel abstraction named topic channel, where a user can choose the viewpoints and hearings that he/she wants to see by switching among them like TV channels. After presenting an overview of Gamified CollectiveEyes, we show two user studies: the first investigates the mechanisms that motivate people to offer their viewpoints and hearings, and the second investigates the configuration for presenting multiple viewpoints.

11:30
A System for Animal Health Monitoring and Emotions Detection
PRESENTER: Peter Mikulecky

ABSTRACT. We are used to seeing the manifestations of various emotions in humans, but animals also show emotions. A better understanding of animal emotions is closely related to creating animal welfare. Research in this direction may impact other ways to improve the lives of domestic and farm animals or animals in captivity. In addition, better recognition of negative emotions in animals can help prevent unwanted behaviour and health problems caused by long-term increased levels of stress or other negative emotional states. Research projects focused on the emotional needs of animals can benefit animals and contribute to a more ethical and sustainable relationship between humans and animals. This article is focused on the one hand on the description of the system that was created in the previous related research for monitoring the vital functions of animals, and on the other hand, especially on the investigation of the possibilities of how the given system can be used to identify the emotional states of animals.

10:30-12:30 Session 3C: AIOTA
Location: Card Room
10:30
Segmentation as a pre-processing for automatic grape moths detection

ABSTRACT. Grape moths are a significant pest in vineyards, causing damage and losses in wine production. Pheromone traps are used to monitor grape moth populations and determine their developmental status to make informed decisions regarding pest control. Smart pest monitoring systems that employ sensors, cameras, and artificial intelligence algorithms are becoming increasingly popular due to their ability to streamline the monitoring process. In this study, we investigate the effectiveness of using segmentation as a pre-processing step to improve the detection of grape moths in trap images using deep learning models. We train two segmentation models based on the U-Net architecture, with ResNet18 and InceptionV3 backbones, and utilize the segmented and non-segmented images in the YOLOv5s and YOLOv8s detectors to evaluate the impact of segmentation on detection. Our results show that segmentation pre-processing can significantly improve detection, by 3% for YOLOv5 and 1.2% for YOLOv8. These findings highlight the potential of segmentation pre-processing for enhancing insect detection in smart pest monitoring systems, paving the way for further exploration of different training methods.

10:50
Evaluating the causal role of environmental data in shellfish biotoxin contamination on the Portuguese coast

ABSTRACT. Shellfish accumulation of marine biotoxins at levels unsafe for human consumption may severely impact their harvesting and farming, an activity that has grown worldwide in response to the growing demand for nutritious food and protein sources. In Southern European countries, DSP (diarrhetic shellfish poisoning) toxins are the most abundant and frequent toxins derived from algal blooms, affecting shellfish production yearly. Therefore, it is essential to understand the natural phenomenon of DSP toxins accumulation in shellfish and the meteorological and biological parameters that may regulate and influence its occurrence. In this work, we studied the relationship between the time series of several meteorological and biological variables and the time series of the concentration of DSP toxins in mussels on the Portuguese coast, using the Pearson’s correlation coefficient, time series regression modeling, Granger causality, and dynamic Bayesian networks using the MAESTRO tool. The results show that, for the models tested, the mean sea surface and air temperature time series with a one, two, or three-week lag can be valuable candidate predictors for forecasting the DSP concentration in mussels. Overall, this proof-of-concept study emphasizes the importance of statistical learning methodologies for analyzing time series environmental data and illustrates the importance of several variables in predicting DSP biotoxins concentration, which can help the shellfish production sector mitigate the negative impacts of DSP biotoxins accumulation in shellfish.

11:10
Can the segmentation improve the grape varieties' identification through images acquired on-field?

ABSTRACT. Grape varieties play an important role in the wine production chain, and their identification is crucial for controlling and regulating production. Nowadays, two techniques are widely used: ampelography and molecular analysis. However, there are problems with both of them. In this scenario, Deep Learning classifiers emerged as a tool to automatically classify grape varieties. A problem with the classification of on-field acquired images is that they contain a lot of information unrelated to the target classification. In this study, the use of segmentation before classification to remove such unrelated information was analyzed. We used two grape variety identification datasets to fine-tune a pre-trained EfficientNetV2S. Our results showed that segmentation can slightly improve classification performance if only unrelated information is removed.

11:30
Deep Learning-Based Tree Stem Segmentation for Robotic Eucalyptus Selective Thinning Operations

ABSTRACT. Selective thinning is a crucial operation to reduce forest ignitable material, to control the eucalyptus species and maximise its profitability. The selection and removal of less vigorous stems allows the remaining stems to grow healthier and without competition for water, sunlight and nutrients. This operation is traditionally performed by a human operator and is time-intensive. This work simplifies selective thinning by removing the stem selection part from the human operator's side using a computer vision algorithm. For this, two distinct datasets of eucalyptus stems (with and without foliage) were built and manually annotated, and three Deep Learning object detectors (YOLOv5, YOLOv7 and YOLOv8) were tested on real context images to perform instance segmentation. YOLOv8 was the best at this task, achieving an Average Precision of 74% and 66% on non-leafy and leafy test datasets, respectively. A computer vision algorithm for automatic stem selection was developed based on the YOLOv8 segmentation output. The algorithm managed to achieve a Precision above 97% and an 81% Recall. The findings of this work can have a positive impact on future developments for automating selective thinning in forested contexts.

11:50
Enhancing Pest Detection Models through Improved Annotations

ABSTRACT. AI-based pest detection is gaining popularity in data-centric scenarios, providing farmers with excellent performance and decision support for pest control. However, these approaches often face challenges that require complex architectures. Alternatively, data-centric approaches aim to enhance the quality of training data. In this study, we present an approach that is particularly relevant when dealing with limited data.

Our proposed approach improves annotation quality without requiring additional manpower. We trained a model with data of inferior annotation quality and utilized its predictions to generate new annotations of higher quality. Results from our study demonstrate that, using a small dataset of 200 images with low resolution and variable lighting conditions, our model can improve the mean average precision (mAP) score by 1.1 points.

12:10
Sound-based Anomalies Detection in Agricultural Robotics Application

ABSTRACT. Agricultural robots are exposed to adverse conditions that reduce the lifetime of their components. To reduce the number of inspection, repair and maintenance activities, we propose the use of audio-based systems to diagnose and detect anomalies in these robots. Audio-based systems are non-destructive/intrusive solutions. Besides, they provide a significant amount of data to diagnose problems and to support wiser scheduling of preventive activities. In this work, we installed two microphones in an agricultural robot with a mowing tool. Real audio data was collected with the robotic mowing tool operating in several different conditions and stages. In addition, a Sound-based Anomalies Detector (SAD) is proposed and tested with this dataset. The SAD considers a short-time Fourier transform (STFT) computation stage connected to a Support Vector Machine (SVM) classifier. Testing the proposed SAD with the collected dataset, an F1 score between 95% and 100% is reached in detecting anomalies in a real mowing robot operation.
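The STFT front end described above turns raw audio into spectral frames that a classifier can separate; a minimal sketch on synthetic motor-like signals (illustrative window sizes and frequencies, not the authors' configuration) is:

```python
import numpy as np

def stft_features(signal, win=256, hop=128):
    """Magnitude spectrogram: |FFT| of Hann-windowed, overlapping frames.
    Each row summarises the spectral content of one short time slice."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000
t = np.arange(sr) / sr
normal = np.sin(2 * np.pi * 440 * t)                    # healthy-motor tone
anomaly = normal + 0.5 * np.sin(2 * np.pi * 2500 * t)   # added high-pitch rattle
S_n, S_a = stft_features(normal), stft_features(anomaly)

# mean energy above a hypothetical 2 kHz cutoff separates the two recordings
band = int(2000 / (sr / 256))   # rfft bin index of the cutoff
score_n = S_n[:, band:].mean()
score_a = S_a[:, band:].mean()
```

In the paper's setup such frame-level spectral features feed an SVM; here a simple band-energy score already distinguishes the anomalous recording.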

12:30-14:00 Lunch Break
14:00-16:00 Session 4A: KDBI - II
Location: Ballroom
14:00
Time-series pattern verification in CNC machining data

ABSTRACT. Effective quality control is essential for efficient and successful manufacturing processes in the era of Industry 4.0. Artificial Intelligence solutions are increasingly employed to enhance the accuracy and efficiency of quality control methods. In the Computer Numerical Control machining area, challenges involve the identification and verification of specific patterns of interest or trends in a time-series dataset. However, this can be a challenge due to the extensive diversity of such patterns. Therefore, this work aims to develop a methodology capable of verifying the presence of a specific pattern of interest in a given collection of time-series. This study mainly focuses on evaluating One-Class Classification techniques using Linear Frequency Cepstral Coefficients to describe the patterns in the time-series. A real-world dataset produced by turning machines was used, where a time-series with a certain pattern needed to be verified to monitor the wear offset. The initial findings reveal that the classifiers can accurately distinguish between the time-series’ target pattern and the remaining data. Specifically, the One-Class Support Vector Machine achieves a classification accuracy of 95.6% ± 1.2 and an F1-score of 95.4% ± 1.3.

14:20
Analysis of Dam Natural Frequencies Using a Convolutional Neural Network

ABSTRACT. The accurate estimation of dam natural frequencies and their evolution over time can be very important for dynamic behaviour analysis and structural health monitoring. However, automatic modal parameter estimation from ambient vibration measurements on dams can be challenging, e.g., due to the influence of reservoir level variations, operational effects, or dynamic interaction with appurtenant structures. This paper proposes a methodology for improving automatic identification of natural frequencies of dams using a supervised Convolutional Neural Network (CNN), fed with real preprocessed sensor monitoring data, in the form of spectrograms, for training. The case study is the 132 m high Cabril arch dam, in operation since 1954 in Portugal; the dam was instrumented in 2008 with a continuous dynamic monitoring system. Modal analysis has been performed using an automatic modal identification program, based on the Frequency Domain Decomposition (FDD) method. The evolution of the experimental natural frequencies of Cabril dam over time is compared with the frequencies predicted using the parameterized CNN based on different sets of data. The results show the potential of the proposed neural network to complement the implemented modal identification methods and improve automatic frequency identification over time.

14:40
Studying the impact of sampling in highly frequent time series

ABSTRACT. Nowadays, much data is being generated by all kinds of sensors, and more metrics are being measured. These large quantities of data are stored in large data centers and used to create datasets to train Machine Learning algorithms for most different areas. However, with more data, more time is needed to process that data and to train the Machine Learning algorithms, and more space is required to store all the data. This creates a problem with Big Data. In this paper, we propose simple techniques for reducing these large datasets into smaller versions without compromising the forecasting capability of the generated model and, simultaneously, reducing the time needed to train these models and the space required to store the reduced sets. The proposed approach was tested in three public datasets and one private dataset. The results show that it is possible to use reduced sets to train the algorithms without affecting the forecasting capability of their models. This approach is more efficient when used with datasets with higher frequencies and larger seasonalities. With the reduced sets we obtain decreases in the training time between 40% and 94% and between 46% and 65% for the memory needed to store the reduced sets.
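One of the simplest reduction techniques of the kind the abstract alludes to is block averaging, which shrinks a high-frequency series while preserving its overall shape (an illustrative sketch; the paper's exact techniques are not reproduced here):

```python
def reduce_series(values, factor):
    """Shrink a high-frequency series by replacing each block of
    `factor` consecutive readings with their mean."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]

per_second = [float(i % 60) for i in range(3600)]   # one hour sampled at 1 Hz
per_minute = reduce_series(per_second, 60)          # reduced to one value/minute
```

The reduced set is 60 times smaller, which is where the reported savings in training time and storage come from; the question the paper studies is how much forecasting ability survives the reduction.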

15:00
Interpreting What Is Important: An Explainability Approach And Study On Feature Selection

ABSTRACT. Machine learning models are widely used in time series forecasting. One way to reduce their computational cost and increase their efficiency is to select only the relevant exogenous features to be fed into the model. With this intention, a study on the feature selection methods Pearson correlation coefficient, Boruta, Boruta-Shap, IMV-LSTM, and LIME is performed. A new method focused on interpretability, SHAP-LSTM, is proposed, using a deep learning model training process as part of a feature selection algorithm. The methods were compared on 2 different datasets, showing comparable results with lower computational cost when compared with the use of all features. In all datasets, SHAP-LSTM showed competitive results, performing comparatively better on data with a higher presence of scarcely occurring categorical features.

14:00-16:00 Session 4B: AITS
Location: Library
14:00
An Ethical Perspective on Intelligent Transport Systems

ABSTRACT. Intelligent Transport Systems (ITS) is a fast evolving domain with an increasingly important role in shaping the future of transport and a significant impact on a wide range of issues, many of which have ethical implications. On the other hand, Ethics is essential to ensure that ITS are safe, fair, accountable, trustworthy, and respectful of privacy. This study examines the ethical concerns around transport systems and their impact on economic, social and environmental dimensions, from the spirit of the foundational concepts of Ethics to the specific issues raised by intelligent transport, including those enhanced by Artificial Intelligence (AI) and Machine Learning (ML) systems. The primordial ethical concerns of transport have, to some extent, been mitigated with the introduction of the ITS paradigm, but others have arisen as a result of emerging technologies. Ethics is therefore critical in intelligent transport because of its potential to significantly impact individuals, communities, and society as a whole, and is an important tool to design more sustainable, equitable, and fair transport systems.

14:20
Improving Address Matching using Siamese Transformer Networks

ABSTRACT. Matching addresses is a critical task for companies and post offices involved in the processing and delivery of packages. The ramifications of incorrectly delivering a package to the wrong recipient are numerous, ranging from harm to the company’s reputation to economic and environmental costs. This research introduces a deep learning-based model designed to increase the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, which is fine-tuned to create meaningful embeddings of Portuguese postal addresses, then utilized to retrieve the top 10 likely matches of the un-normalized target address from a normalized database, and (ii) a cross-encoder, which is fine-tuned to accurately re-rank the 10 addresses obtained by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and exhibits a high degree of accuracy, exceeding 95% at the door level. When utilized with GPU computations, the inference speed is about 4.5 times quicker than other traditional approaches such as BM25. An implementation of this system in a real-world scenario would substantially increase the effectiveness of the distribution process. Such an implementation is currently under investigation.
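The two-stage retrieve-then-rerank pattern described above can be sketched with toy vectors standing in for bi-encoder embeddings and a placeholder scorer standing in for the cross-encoder (all names and vectors here are hypothetical, not the paper's model):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def retrieve_then_rerank(query_vec, db, top_k, rerank_score):
    """Stage 1: cheap vector similarity narrows the database to top_k
    candidates. Stage 2: a costlier pairwise scorer re-orders only those."""
    candidates = sorted(db, key=lambda a: cosine(query_vec, db[a]),
                        reverse=True)[:top_k]
    return sorted(candidates, key=rerank_score, reverse=True)

# toy 3-d "embeddings" standing in for bi-encoder outputs
db = {"Rua A 1, Porto": [0.9, 0.1, 0.0],
      "Rua A 10, Porto": [0.8, 0.2, 0.0],
      "Av. B 5, Lisboa": [0.0, 0.1, 0.9]}
ranked = retrieve_then_rerank([1.0, 0.1, 0.0], db, top_k=2,
                              rerank_score=lambda a: -len(a))  # stand-in scorer
```

The design point is cost: the expensive cross-encoder scores only the 10 bi-encoder candidates rather than every address in the normalized database.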

14:40
Safety, Stability, and Efficiency of Taxi Rides

ABSTRACT. We propose a novel approach for limiting passenger harassment during taxi rides, where penalizing harassing drivers and matching them to passengers play key roles. In this paper, we focus on the matching part. In particular, we propose a novel two-sided market model, with drivers on one side and passengers on the other, where drivers have profit preferences for passengers and passengers have the following preferences for drivers: (1) incomplete numeric driver-safety preferences; (2) complete numeric driver-delay preferences; (3) complete ordinal driver-type preferences. Safety can be based on past experiences. Delays can be based on departure times. Types can be based on gender, race, age, culture, etc. Given these three-layer preferences, we study increasing the safety and stability in matchings, thus possibly reducing harassment. In addition, we combine safety and stability with maximizing total profit or minimizing total delay. We design two novel algorithms (i.e., RunGS3Count2 and RunGS2Count3) and compare them in simulations in terms of efficiency, stability, and scalability.
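The stability notion underlying such two-sided matching work is usually that of Gale-Shapley deferred acceptance; the following is a plain textbook sketch of that core (not the paper's RunGS variants, which layer three kinds of preferences on top):

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Deferred acceptance: proposers work down their preference lists;
    acceptors hold the best offer seen so far. The result has no
    blocking pair (no two agents who both prefer each other)."""
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    free = list(proposer_prefs)
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                          # acceptor -> proposer
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:
            free.append(engaged[a])       # bump the weaker-held proposer
            engaged[a] = p
        else:
            free.append(p)                # rejected, tries next choice
    return {p: a for a, p in engaged.items()}

drivers = {"d1": ["p1", "p2"], "d2": ["p1", "p2"]}
passengers = {"p1": ["d2", "d1"], "p2": ["d2", "d1"]}  # both prefer d2
match = gale_shapley(drivers, passengers)
```

With both passengers preferring d2, the stable outcome pairs d2 with its first choice p1 and d1 with p2; safety- and delay-aware preference layers would change how these lists are constructed, not the deferred-acceptance loop itself.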

15:00
Using CDR Data to Understand Post-Pandemic Mobility Patterns

ABSTRACT. During the COVID-19 pandemic, the measures imposed to slow the spread of the virus had a profound impact on population dynamics around the world, producing unprecedented changes in mobility. Spatial data on human activity, including Call Detail Records (CDRs), have become a valuable source of information for understanding those changes. In this paper we study the population's mobility after the first wave of the pandemic within Portugal, using CDR data. We identify the movements and stops of the citizens, at an antenna level, and compare the results in the first months after the lifting of most of the contingency measures with the same period of the following year, highlighting the advantages of using CDRs to analyze mobility in pandemic contexts. Results based on two mobile phone datasets showed a significant difference in mobility in the two periods.

14:00-16:00 Session 4C: AISC
Location: Card Room
14:00
Using Artificial Intelligence for Trust Management Systems in Fog Computing

ABSTRACT. Fog computing has recently attracted great attention as an emerging computing paradigm, avoiding the latency concerns of the cloud. However, due to the decentralized distributed nature of the fog, several security and privacy challenges arise when fog nodes work together and exchange data in certain tasks. Since fog servers are very close to end users and may collect sensitive information, they must be trustworthy for delegation. Yet, traditional cryptographic solutions cannot be deployed to handle internal attacks, i.e., from a rogue fog node that has been authenticated to join the system, raising the concern of how to establish reliable and trusted communication between the fog nodes. Trust Management Systems (TMS) have been developed to calculate the level of assurance between fog nodes based on their communication behaviour to detect the malicious nodes in the network. Conventional authentication methods, i.e., password-based, certificate-based and biometric-based, do not fit the fog because of its unique architecture, as they consume significantly more computation power and introduce latency. Thus, several research issues remain open for TMS in the fog, including creating trusted execution environments, trust and security during fog orchestration, collusion attacks and access control. In this paper, we investigate using artificial intelligence techniques to tackle the main challenges of TMS in fog computing. We conduct a comparative study to evaluate the major TMS in the literature and identify their advantages and disadvantages. We then highlight the primary insights and recommendations to improve TMS using artificial intelligence to have more efficient TMS in fog computing.

14:20
Using Hyperspectral Images and Lidar Data to Create Models for the Classification and CAVE visualization of Tree Species

ABSTRACT. The article describes the use of artificial intelligence methods to create models for detecting and classifying tree species, using hyperspectral images and LiDAR data in the aerial photography of energy line structures. The most important output is a validated model with cloud infrastructure support for detecting and classifying objects of interest at the TRL 5 level, which is also exceptional on a global scale. The research also delivers a geodatabase of reference tree characteristics, a library of spectral curves, a database of tree growth simulations, and a cloud infrastructure to support the development of classification models and data storage. An important output will be the visualization of the simulation results in the "CAVE" environment.

14:40
Depression Detection for Twitter Users using Machine Learning Sentiment Analysis

ABSTRACT. Since depression often results in suicidal thoughts and leaves a person severely disabled daily, there is an elevated risk of premature mortality due to mental problems caused by depression. Therefore, it is crucial to identify the patient's mental illness as soon as possible. People are increasingly using social media platforms to express their opinions and share daily activities, which makes online platforms rich sources for early depression detection. The contribution of this paper is multifold. First, it presents five machine-learning models for English depression detection using Twitter text. For English text without negation, the best model achieved 92% for binary classification and 88% for multi-classification (depressed, indifferent, happy). For English text with negation, F1 scores of 87% and 85% were achieved for binary and multi-classification, respectively. In addition, it presents two automatically annotated English corpora: Eng_without_negation_60.000, with 60,172 English tweets, and Eng_with_negation_57.000, with 57,392 English tweets. Both cover a wide range of depressed and cheerful terms; however, negation was included only in the Eng_with_negation_57.000 corpus. Finally, this paper presents a depression-detection web application which implements our optimal models to detect tweets that contain depression symptoms and predict the depression trend for a person using the English language.

15:00
Source-Code Generation using Deep Learning: A Survey

ABSTRACT. In recent years, the need for writing effective, reusable, and high-quality source code has grown exponentially. Writing source code is an integral part of building any software system; the development phase of the software lifecycle comprises code implementation, refactoring, maintenance, and bug fixing. Software developers implement the desired solution by turning the system requirements into viable software products. For the most part, the implementation phase can be challenging, as it requires a certain level of problem-solving skill and the ability to produce high-quality outcomes without decreasing productivity or missing business plans and deadlines. Programmers' daily tasks might also include writing large amounts of repetitive boilerplate code, which can be tedious, not to mention the potential bugs that could arise from human error during development. The ability to automatically generate source code will save significant time and effort in the software development process by increasing the speed and efficiency of software development teams. In this survey, we review and summarize recent studies on deep learning approaches used to generate source code in different programming languages such as Java, Python, and SQL (Structured Query Language). We categorize the surveyed work into two groups: Natural Language-based solutions, which use natural text as input, and Computer Vision-based solutions, which generate code from images.

15:20
An IoT-Based Framework for Sustainable Supply Chain Management System

ABSTRACT. The "smart supply chain" is a new way of doing business made possible by smart, sustainable business and IT trends. Sustainable supply chains are a creative movement that uses information technology to improve the quality of operations at their sites so that activities can be adapted to meet social and environmental needs. The IoT is one of the most critical parts of the technological foundation of the smart supply chain. This paper shows how to set up a sustainable supply chain based on the IoT. Built on the IoT's four-stage architecture, the framework was derived by reviewing the literature, surveying the general public, and evaluating the opinions of practitioners in the field. It facilitates sound environmental decision-making throughout the supply chain and shows the direct link between data collection and its interaction with sectors affected by environmental sustainability. Supply chain experts have approved this framework, which can help technology-focused industrial organizations adopt the smart supply chain.

16:00-16:30Coffee Break
16:30-18:30 Session 5: Student Symposium
Location: Ballroom
Emotional state classification from brain signals using CNNs adapted for fMRI signal properties

ABSTRACT. Accurately classifying emotional states from functional magnetic resonance imaging (fMRI) data presents challenges in the field of brain state classification. Traditional machine learning methods struggle with high-dimensional fMRI data, leading to a growing interest in deep learning (DL) models. In this study, we propose a novel approach to classify emotional arousal levels using fMRI data by adapting the EEGNet architecture, originally designed for electroencephalography (EEG) classification. By leveraging two-dimensional representations of the fMRI time courses, EEGNet models achieved accuracies ranging from 70.49% to 72.54%. Furthermore, our newly developed network, fMRINet, outperforms previous models, reaching an accuracy of 73.76%. These results highlight the potential of DL models to capture complex patterns in fMRI data, even with limited available data. Our findings contribute to the field of brain state classification and provide insights into the classification of different emotional states using fMRI data.

Learning Postoperative Pain through Physiological Signals

ABSTRACT. Choosing the appropriate treatment to manage postoperative pain depends on accurately assessing its intensity. However, the current assessment methods are subjective, discontinuous, and inadequate for evaluating the pain of patients unable to communicate verbally. Therefore, there is a need to develop an objective and continuous method that does not require patient reports. This work proposes to develop data science strategies based on physiological signals to monitor and manage postoperative pain more effectively. To this end, relevant physiological features that exhibit strong correlations with self-reported pain will be identified, enabling the prediction of postoperative pain intensity and detection of pain relief after medication, with the support of machine learning approaches.

Electrocardiogram for Biometric Recognition: Collectability, Stability and Application Challenges

ABSTRACT. Innovative approaches to recognizing individuals based on physiological and behavioral characteristics have emerged since surrogate representations of identity no longer suffice. This trend is encouraged by the availability of low-cost computational power, allowing these methodologies to be deployed with an efficiency that is expected to have a great impact on technology. Biometric recognition through the Electrocardiogram (ECG) has recently seen progress, largely due to the uniqueness of the ECG signal. This thesis aims to develop a polymeric, off-the-person ECG sensor to acquire ECG signals and to propose new methods for biometric authentication and identification of individuals.

Clustering Massive, Noisy, and Unstructured Textual Streams

ABSTRACT. Many applications generate extensive short text as stream data. Traditional approaches often struggle to handle this high-dimensional, sparse data. Two main methods, similarity-based text stream clustering and the probabilistic topic model approach, have been used in Short Text Stream Clustering (STSC) studies. However, these methods have limitations when dealing with massive, noisy, user-generated data, including misspellings, ambiguous abbreviations, and nonstandard shortening of words. This research aims to evaluate the performance of these clustering methods against other state-of-the-art approaches and to develop methods that integrate probabilistic and similarity-based approaches. The focus is on identifying the most efficient similarity-based methods, exploring various word representation techniques, and creating an integrated approach that effectively handles the challenges posed by short text data streams.

Data Quality, Data Balance and Data Documentation: a framework

ABSTRACT. Data is one of the key elements of the artificial intelligence pipeline. For this reason, data quality is crucial for producing good results. However, ensuring high data quality alone is not sufficient to prevent all ethical concerns. Expanding data quality frameworks to also consider data balance and data documentation can help address critical ethical aspects of AI systems. The proposed framework introduces additional quality measures, such as data balance assessments and data documentation quality. The former helps identify the risk of disproportionate treatment of different groups based on their protected characteristics. The latter emphasizes the importance of documenting datasets, making them more transparent and accountable. By integrating these measures into the development pipeline through appropriate data labels, we aim to enable practitioners to build systems based on principles of trustworthiness, accountability, and fairness. We outline future research directions for automating the evaluation of these metrics.

Estimating the Density Ratio with a ReLU Induced Tessellation

ABSTRACT. Density ratio estimation, crucial in machine learning, is hindered by challenges such as high-dimensional data. Current methods struggle with complex data or incur high computational costs. In this work, we tackle some of these issues using the disjoint linear regions created during the learning process of neural networks with the ReLU activation function, providing an innovative approach to estimating density ratios, with applications to unsupervised domain adaptation problems.

A Semantic Search System for the Supremo Tribunal de Justiça

ABSTRACT. This work investigated and developed a prototype Semantic Search System to assist the Supremo Tribunal de Justiça (Portuguese Supreme Court of Justice) in its decision-making process. We built a Hybrid Search System that incorporates both lexical and semantic techniques by combining the capabilities of BM25 with the potential of Legal-BERTimbau. In this context, we obtained a 335% increase in the Discovery metric compared to BM25 for the first query result. This work also introduces a new technique, Metadata Knowledge Distillation.

Addressing Self-Sustainability in Multi-Agent Systems through Combating Terrorism Financing

ABSTRACT. The self-sustainability of Multiagent Systems (MAS) can be explored by advancing the idea of self-organization (SO) in MAS. We argue that when developing MAS, it must be ensured that they exhibit self-sustainability (SS) as an essential property. This preliminary work presents a consolidated definition of the concept and a set of conditions and properties for a perfectly self-sustainable MAS. Focusing on a specific use case, combating terrorism financing (TF) within a region, we propose a simple evaluation approach to determine various levels of SS in MAS. While the immediate goal is to train sustainable agents and to define best practices and conditions for achieving this, we anticipate that further down the road the proposed methodology for assessing SS within MAS can be used to analyze current practices across other domains and help build a sustainable future. It is a call to MAS architects to pay much-needed attention to building systems that are self-sustainable by definition, either at the application level or within an MAS ecosystem.

Data Leakage Detection and Data Denoising using Causal Mechanisms for Recommender Systems

ABSTRACT. Due to their capacity to filter content precisely for each user and offer a personalized experience, recommendation systems are becoming increasingly popular. Since they primarily rely on machine learning techniques, they have one significant drawback: their analysis relies on statistical associations, which might arise from a wide variety of processes. Recommender systems in the real world rely on user behaviour that can be interpreted as causal, and many outstanding issues in recommendation can be solved using causality. Two data pre-processing tasks are the subject of this study. First, I will extend the method introduced in my Master's Thesis (https://repositorio-aberto.up.pt/handle/10216/146453) to examine data leakage in recommender systems, using causal discovery specifically to find the factors that leak information. Second, I will create a framework for denoising data via causal inference. Finally, as a last use case, I will test these ideas in recommendation systems for medical decision support.

Can we hold AGI-enabled Robot Morally Responsible for their Actions?

ABSTRACT. This paper argues that if an Artificial General Intelligence (AGI)-enabled robot fulfils the functionalist conditions of moral agency, i.e., interactivity, independence, and adaptability, prescribed by Floridi and Sanders (2004), it could be morally responsible for its actions, at least in a lighter sense. For current AI systems, the moral responsibility question may be less significant, since current systems are domain-specific and are not autonomous in the way General AI would be. There is a debate in academia about whether DeepMind's AlphaGo is an apt subject for praise for "Move 37". I argue that even if AlphaGo directly fulfils the moral agency conditions prescribed by the functionalists, we still cannot say AlphaGo is an apt subject for praise for "Move 37". The reason is that AlphaGo is a game-playing algorithm with no moral element: moral standards do not govern its actions. However, some agency could be attributed to AlphaGo, because it can perform some actions or moves, including unintended ones. To become a moral agent, one has to perform a morally significant action, whether moral or immoral. The moral responsibility question for current AI may thus be less relevant than for future General AI. AI researchers and futurists have hypothesized that a developed AGI system would perform cognitive and behavioral tasks similarly to humans. The moral responsibility question may become substantial if an AGI-enabled robot kills an innocent person out of its own autonomy. The functionality argument holds that it does not matter whether the AGI system is developed out of programming or runs on a computer rather than in a brain.
If an AGI-enabled robot can functionally interact with its environment or with other moral agents, make decisions autonomously, adjust to changing circumstances, and perform morally relevant actions, it may be held morally responsible, at least in a lighter sense.

Symbolic music generation conditioned on continuous-valued emotions

ABSTRACT. In this paper we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach is the conditioning of a state-of-the-art transformer on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal.