ICAICTA2024: THE 11TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS
PROGRAM FOR SUNDAY, SEPTEMBER 29TH

09:00-10:40 Session D1: Vision & Image
Location: Room 1
09:00
Object Removal System for Urban Imagery Using Image Segmentation and Inpainting with A Deep Learning Approach [ONLINE]

ABSTRACT. Digital imagery is a two-dimensional visual representation that often holds emotional significance and crucial information. However, unwanted objects frequently appear in images, especially urban imagery. Addressing this requires a system that can automatically select regions containing unwanted objects, remove them, and reconstruct the removed regions. The object removal system is developed by implementing and integrating an image segmentation module, an image inpainting module, and a graphical user interface application. The pre-trained DeepLabv3+ model is used for the image segmentation module. For the image inpainting module, seven pre-trained models, namely DeepFillv2, EdgeConnect (Places), EdgeConnect (PSV), MADF (Places), MADF (PSV), MAT, and CoModGAN, are compared across several testing aspects. Based on the analysis of the test results, the DeepLabv3+ model performs accurate segmentation, with a mIoU of 0.936. CoModGAN is chosen as the pre-trained model for the image inpainting module due to its average PSNR of 26.59 dB, SSIM of 0.8908, FID of 39.99, and subjective evaluation score of 4.105. The graphical user interface application, integrated with the image segmentation and image inpainting modules, successfully provides flexibility to users and shows improved performance compared to previous studies.
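The PSNR figure quoted above measures reconstruction fidelity between the inpainted result and the original image; as a minimal illustrative sketch (not the authors' evaluation code), it can be computed as:

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio between two images, in dB (higher is better)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_value ** 2) / mse)
```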

09:20
3D Traffic Scenes Reconstruction for Autonomous Vehicles using Gaussian Process Latent Variable Model (GPLVM) [ONLINE]

ABSTRACT. Traffic scene understanding and visualization are important for autonomous vehicles, allowing them to navigate their surroundings and increasing the passengers’ sense of trust. This paper contributes to autonomous vehicle research a framework that can understand and reconstruct traffic scenes using only a single image from a monocular camera installed in a vehicle. The reconstruction process is applied between frames, utilizing the Simple Online and Realtime Tracking (SORT) framework to improve the smoothness of vehicle movement. Vehicle shape reconstruction is carried out using a Gaussian process latent variable model (GPLVM) to embed 3D model shapes into a latent variable space. A multisegmented Hough transform is used to detect lane markings, resulting in line equations that approximate the lane’s shape. The framework successfully combines the vehicle and road data to generate a 3D reconstruction of the surrounding traffic scene, although real-time performance has not yet been achieved.

09:40
Consistency-Preserving Text-Based One-Shot Video Tuning with Segmentation Mask Guidance

ABSTRACT. Recent text-to-video (T2V) techniques have achieved remarkable success using the text-to-image (T2I) diffusion-based generation paradigm. These methods have also been extended to tune video content using an existing T2I model in a one-shot manner. However, these models still struggle to preserve temporal consistency and tend to cause severe jitter, especially for moving objects. To address this issue, we propose incorporating segmentation guidance into the diffusion pipeline to promote temporal stability. In particular, we first extract the positions of user-specified objects in each frame using an object segmentation model and generate a sequence of mask images. Then, we utilize the features of the mask image sequence as the query for the cross-attention mechanism in the diffusion model, while the content features of the original video serve as the key and value to generate the edited image sequence. As such, the object position information in the mask guidance can effectively guide the video generation process and reduce jitter. Experiments demonstrate that our method improves video quality compared to prior video tuning methods in terms of temporal smoothness.
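The mask-as-query cross-attention described above follows the standard scaled dot-product formulation; a minimal sketch with toy dimensions (an illustration, not the paper's implementation):

```python
import numpy as np

def cross_attention(query, key, value):
    """Scaled dot-product cross-attention: `query` plays the role of the mask
    features, `key`/`value` the content features of the original video."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)           # (Nq, Nk) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # softmax numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value                        # (Nq, d_v) attended output
```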

10:00
Summarizing Video Content with a Single Image Using Large Language Models

ABSTRACT. Generating thumbnails for news videos plays an important role in efficiently understanding their content. Prior techniques mostly handle this task by selecting one keyframe as a representative image. However, this approach cannot effectively handle a video whose key content is distributed across different frames. In this paper, we propose summarizing a news video by composing its key contents into one image as a thumbnail. To achieve this, our method starts with text extraction from each scene in the video using OCR, speech recognition, and existing image captioning models. We then group these texts by similarity and leverage large language models to score each group's significance. Next, for each group, a keyframe is selected by jointly considering importance and content quality. Finally, we compose the objects in these keyframes into a single image in a non-overlapping manner and utilize diffusion-based generative models for further quality refinement. Experiments on real-world news videos demonstrate that our method can effectively extract key video contents and generate natural and informative video thumbnails.
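The similarity-based grouping step can be illustrated with a toy greedy procedure over word overlap; the Jaccard measure and threshold here are assumptions for illustration, not the authors' method:

```python
def jaccard(a, b):
    """Word-overlap similarity between two extracted texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def group_texts(texts, threshold=0.3):
    """Greedily place each text into the first group containing a
    sufficiently similar member; otherwise start a new group."""
    groups = []
    for text in texts:
        for group in groups:
            if any(jaccard(text, member) >= threshold for member in group):
                group.append(text)
                break
        else:
            groups.append([text])
    return groups
```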

10:20
Implementation of Traffic Congestion Classification Method from CCTV Video Based on Image Feature Analysis with YOLO Algorithm

ABSTRACT. Traffic congestion is one of the major problems in transportation. Traffic congestion classification can detect congestion so that it can be noticed and handled immediately. Traditional classification methods, such as those relying on ground sense coils and GPS, are expensive and require considerable effort to implement. With the help of artificial intelligence, classifying traffic congestion from traffic surveillance camera video becomes feasible. Information about the vehicles in traffic CCTV video can be obtained using the YOLO algorithm, a convolutional neural network-based object detection algorithm. Traffic features, such as flow, occupancy, density, and speed, can be extracted from the object detection results by utilizing image processing methods. An artificial neural network can then classify the traffic status based on these four features. Based on the experiment results, the accuracy, precision, recall, and F1-score of the classification model are 84.75%, 84.66%, 84.75%, and 84.69%, respectively.
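Two of the four traffic features named above can be sketched directly from bounding-box detections; the (x1, y1, x2, y2) box format and the overlap-ignoring area sum are simplifying assumptions, not the paper's exact procedure:

```python
def occupancy(boxes, frame_w, frame_h):
    """Fraction of the frame covered by detected-vehicle bounding boxes
    (ignoring box overlaps, for simplicity)."""
    covered = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
    return covered / (frame_w * frame_h)

def density(boxes, road_length_m):
    """Detected vehicles per metre of observed road."""
    return len(boxes) / road_length_m
```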

09:00-10:40 Session D2: Environment
Location: Room 2
09:00
Web Application Development of Secondary Education Carbon Footprint Monitoring System [ONLINE]

ABSTRACT. This research presents a comprehensive solution to the significant challenge of environmental impact caused by the widespread integration of technology in secondary education. It proposes a web-based application that optimizes procurement decisions, aligning them with environmental sustainability goals. The system uses a data-driven approach, leveraging existing school data to monitor and reduce carbon emissions effectively. The application not only aids in making informed purchasing decisions but also in managing inventory and usage patterns, thus allowing a significant reduction in carbon footprint. This framework serves as a model for schools worldwide to integrate environmental considerations into their operational strategies, promoting a more sustainable future in education.

09:20
Macroscopic and Energy-Based Greenhouse Gas Emissions Predictions: Current Techniques and Future Directions

ABSTRACT. Predicting greenhouse gas emissions is a crucial effort in mitigating climate change and reducing the harmful effects of these gases. Various machine learning models have been employed for intelligent prediction of greenhouse gas emissions, both at a macroscopic level and through energy demand forecasting. The most popular models include Long Short-Term Memory (LSTM), Back-Propagation Neural Network (BPNN), Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Random Forest (RF). To enhance the performance of these models, numerous optimization techniques have been utilized, with those from the swarm intelligence family being particularly prominent. Current research challenges involve selecting the appropriate machine learning model and optimization technique, addressing dependency on official data, overcoming model interpretability limitations, and dealing with training data constraints. Future research opportunities lie in discovering or modifying machine learning models and optimization techniques, utilizing transfer learning to mitigate limited training data, and leveraging quantum computing-based optimization techniques to refine existing models.

09:40
Heterogeneous Transfer Learning Optimization For Greenhouse Gas Emissions Prediction Using Quantum Annealing

ABSTRACT. Despite its urgency, the prediction of greenhouse gas (GHG) emissions at the city level is hampered by the limited quality and quantity of training data, so most GHG emission predictions are carried out at the country level, using different feature spaces. Heterogeneous Transfer Learning (HeTL) can work around this limitation because it facilitates the transfer of knowledge between domains with different feature spaces and distributions. However, HeTL implementations remain vulnerable to negative transfer during the knowledge transfer process. Current studies on mitigating negative transfer in HeTL still rely heavily on classical optimization techniques and focus solely on either feature-level or instance-level optimization. In this paper, a method is proposed to optimize the knowledge transfer process in HeTL using quantum annealing. The proposed optimization is carried out in three stages: (1) feature alignment, (2) common feature optimization, and (3) data instance optimization. The method optimizes the knowledge transfer process at both the feature and instance levels, and it combines classical and quantum computing, thereby drawing on the advantages of both approaches to obtain optimal results.

10:00
Quantum-Based Prediction Model for Carbon Neutrality

ABSTRACT. Carbon neutrality is a global target pursued by cities worldwide to achieve a balance between carbon emissions and removals, reaching a net-zero carbon state. Mitigation measures are being implemented to reduce emissions and enhance carbon sequestration, aiming to meet the targets set for 2050 or 2060. However, challenges posed by urban sprawl and increasing urbanization raise concerns about the feasibility of achieving carbon neutrality. Various studies have been conducted to project the attainment of this goal by developing prediction models. Machine learning (ML) prediction models use socio-economic, energy, and technological data to forecast carbon neutrality. These models consider factors like GDP per capita, urbanization rate, total energy consumption, and forest stock volume, formulating scenarios based on policy documents and historical data. Some models have incorporated optimization methods like the sparrow search algorithm, genetic neural network, and aquila optimizer to improve prediction accuracy. However, classical optimization methods have limitations, such as susceptibility to getting trapped in local optima, which can affect model performance. Quantum-based optimization methods, particularly quantum annealing (QA), are emerging as potential solutions to address these challenges by leveraging the principles of quantum mechanics to optimize complex problem spaces. QA enhances ML processes like feature selection, hyperparameter optimization, and regression model optimization. This study provides a review of pipeline processes from state-of-the-art methods, as well as their potential quantum-based enhancements, to achieve more precise predictive models.

10:20
Classification of Water Quality Index Using Machine Learning Algorithm for Well Assessment: A Case Study in Dili, Timor-Leste

ABSTRACT. This paper investigates the use of information technology, specifically machine learning algorithms, for water assessment in Timor-Leste. Assessing groundwater quality is essential to ensure the safety and availability of well water. The Water Quality Index (WQI) is the standard tool for assessing water quality and can be calculated from physicochemical and microbiological parameters. In developing countries, however, this is sometimes difficult due to machine malfunctions and limited human resources. In such cases, missing-value imputation and machine learning models are useful for classifying water samples as suitable or unsuitable with significant accuracy. Several imputation methods were tested, and four machine learning algorithms were explored: logistic regression, support vector machine, random forest, and Gaussian naïve Bayes. We obtained a dataset with 368 observations from 26 groundwater sampling points in Dili, the capital city of Timor-Leste. According to the experimental results, 64% of the water samples are suitable for human consumption. We also found that the k-NN imputation method and the random forest classifier were the clear winners, achieving 96% accuracy with three-fold cross-validation. The analysis revealed that some parameters significantly affected the classification results.
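The k-NN imputation step can be sketched as follows; this toy version (RMS distance over shared observed features, column mean over the nearest rows) is an illustration of the idea, not the authors' implementation:

```python
import numpy as np

def knn_impute(X, k=3):
    """Replace each NaN with the mean of that feature over the k nearest rows,
    where distance is computed on the features both rows have observed."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        candidates = []
        for j, other in enumerate(X):
            if j == i:
                continue
            shared = ~np.isnan(row) & ~np.isnan(other)
            if not shared.any():
                continue  # no common observed features to compare on
            dist = np.sqrt(np.mean((row[shared] - other[shared]) ** 2))
            candidates.append((dist, j))
        candidates.sort()
        neighbours = [j for _, j in candidates[:k]]
        for col in np.where(missing)[0]:
            values = [X[j, col] for j in neighbours if not np.isnan(X[j, col])]
            if values:
                filled[i, col] = np.mean(values)
    return filled
```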

10:40-11:10 Coffee Break
11:10-12:10 Session Keynote2
Location: Room 1
11:10
Empowering Under-resourced Languages with Cutting-edge Voice Cloning: a Focus on Indonesian

ABSTRACT. This keynote presents a comparative study of cutting-edge voice cloning technologies, focusing on under-resourced languages with Indonesian as the primary case study. By leveraging state-of-the-art models such as Vall-E, YourTTS, and StyleTTS2, we explore their performance in generating high-quality, natural-sounding voice outputs using limited data. The presentation highlights the technical challenges in training these models for Indonesian and demonstrates how each technology addresses the limitations of data availability. Through the analysis of data collected from real-world scenarios, we compare the results in terms of voice quality, naturalness, and linguistic accuracy. The findings reveal critical insights into the strengths and limitations of each model, showcasing how AI-driven voice cloning can enhance the accessibility and inclusivity of under-resourced languages in digital platforms.

12:10-13:30 Lunch Break
13:30-14:30 Session Keynote3
Location: Room 1
13:30
Automatic speech recognition - a view from large language models

ABSTRACT. Decoder-only LLMs such as ChatGPT were originally developed to accept only text as input. Recent advances have extended them to other modalities, such as audio, video, and images. Our focus in this talk is the integration of the speech modality into LLMs. For this task, the research community has proposed various innovative approaches, e.g., applying discrete representations, integrating pre-trained encoders into existing LLM decoder architectures (e.g., Qwen), multitask learning, and multimodal pretraining. In the talk, I will review recent approaches to the ASR task using LLMs and introduce two works from NTU's speech lab on this task:

  1. “Hyporadise”: applying an LLM to the N-best hypotheses generated by traditional ASR models to improve the top-1 ASR transcription. Our results show that LLMs not only exceed the performance of traditional LM rescoring but can also recover and generate correct words not found in the N-best hypotheses; we call this ability GER (Generative Error Correction).
  2. Leveraging LLMs for ASR and noise-robust ASR: in this work, we extend the Hyporadise approach to include hypothesis (language) noise information in the LLM. Our insight is that under low-SNR speech conditions, the N-best hypotheses will be more diverse due to higher decoding uncertainty. This diversity can be captured and represented as an embedding vector called the noisy language embedding, which can then be exploited as a prompt. With fine-tuning on a training set, the LLM shows improved performance on the GER task.
14:30-15:00 Coffee Break
15:00-16:40 Session E1: NLP2
Location: Room 1
15:00
An Interactive Question-Answering System using Large Language Model and Retrieval-Augmented Generation in An Intelligent Tutoring System on the Programming Domain [ONLINE]

ABSTRACT. Insufficient communication between mentors and students has been one of the main disadvantages of modern programming learning platforms. In this paper, we propose the development of a web-based intelligent tutoring system with a question-answering (QA) system to provide live interaction between students and a mentor figure. We propose an alternative QA system using a large language model (LLM) and a retrieval-augmented generation (RAG) mechanism. We utilized the LangChain library, integrating the RAG mechanism with a history-aware retriever directly into the web application. We performed internal and external evaluations in the form of qualitative evaluation via subjective scoring of answers from various quantized LLMs in both single-turn and multi-turn conversation scenarios. We conclude that the Llama 3 model displays consistent and promising results compared to other models and that documents with a higher character count may act as better knowledge bases for the RAG process.
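The core retrieve-then-augment loop of RAG can be sketched as below; the bag-of-words similarity is a toy stand-in for the neural retriever a system like the one described would actually use:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding' (a real system uses a neural encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, documents, top_k=2):
    """Return the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Augment the question with retrieved context before querying the LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}"
```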

15:20
Automatic Question-Answer Alignment for Japanese Diverse Local Assembly Minutes

ABSTRACT. In the question-and-answer sessions recorded in Japanese local assembly minutes, various topics about local administration are discussed, from which residents can learn about administrative policy. However, in some councils, a single council member asks several questions in a batch, and then the prefectural governor and the persons in charge answer the corresponding questions one by one. This argument structure is difficult for residents to read, since the text of a question and that of its answer are separated from each other. To make such assembly minutes easier to read, this work proposes transforming them into a "one question, one answer" format through a two-stage process: text segmentation and QA alignment. We employ several robust segmentation methods so that the proposed method can be applied to the various discussion styles of Japanese local assembly minutes. Our experimental evaluation showed that the proposed methods performed well not only on the minutes used to extract the training data but also on other minutes.
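The QA alignment stage can be illustrated with a toy lexical-overlap matcher; the real system relies on trained models, so this function is purely a hypothetical sketch of the pairing idea:

```python
def align(questions, answers):
    """Pair each question segment with the answer segment sharing the most
    words; a toy stand-in for a learned question-answer alignment."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return [(q, max(answers, key=lambda a: overlap(q, a))) for q in questions]
```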

15:40
Enhancing Quantum NLP Robustness – Analysis on Noisy Models for Quantum Sentiment Classification

ABSTRACT. This research aims to improve the robustness of Quantum Natural Language Processing (QNLP) models in the presence of noise. Using Lambeq, an open-source library for quantum NLP, we focus on sentiment classification tasks to evaluate the impact of noisy quantum models. Our study applies three optimizers: Simultaneous Perturbation Stochastic Approximation (SPSA), Nelder-Mead, and Adam. The performance and resilience of these optimizers are assessed under various noise conditions to determine their effectiveness in maintaining model accuracy and stability. The research shows that the Adam optimizer outperforms SPSA and Nelder-Mead in simple QNLP sentiment analysis tasks, in both noiseless and noisy environments. Adam achieved 96.67% accuracy in noiseless conditions, significantly higher than SPSA's 83.33% and Nelder-Mead's 53.33%, highlighting its superior performance in optimizing quantum circuits. Moreover, in noisy simulation, Adam outperforms the others with 70.00% accuracy. This work is useful for NLP implementations on quantum computers, and it is essential because the noisy nature of current quantum computers makes it difficult for QNLP models to perform accurately and reliably.

16:00
Identifying Plausibility Phrases in Instructional Texts Using BoostingBERT and AdaBoost.RT [ONLINE]

ABSTRACT. The coherence of each word or phrase in instructional text is crucial because incorrect word choice can lead to different outcomes. This research aimed to develop models that identify word or phrase coherence in instructional texts for classification and regression tasks. Word coherence, or phrase plausibility, is tested by evaluating how well a word fits when substituted into the text based on the surrounding context, similar to BERT's masked language modeling (MLM) training technique. Using the dataset from SemEval 2022 Task 7, the models were built on an advanced BERT variant, DeBERTaV3, with boosting to increase performance: BoostingBERT for the classification task and AdaBoost.RT for the regression task. An imbalanced-data issue was addressed using undersampling techniques. As a result, the model achieved an accuracy of 0.6424 in classification and a Spearman's rank correlation of 0.765 in regression; the boosting method increased the accuracy and the Spearman's rank correlation by 0.1-0.2.

16:20
Physics Assessment Generation Through Pattern Matching and Large Language Models

ABSTRACT. Question generation has been an active area of research in Natural Language Processing (NLP) for some time, particularly for educational applications. This need has become even more pressing in an evolving educational landscape where online assessments are increasingly common. Our research focuses on generating physics assessments because of the unique challenge of generating both textual and numerical content. This paper presents an approach to automated physics assessment generation that integrates pattern matching techniques with large language models (LLMs): Pegasus, T5, ChatGPT-3.5 Turbo, and Mistral 7B. The proposed method involves two main processes: generating variable values through pattern matching using regular expressions, and paraphrasing the generated assessment questions using LLMs to ensure syntactic and semantic diversity. The generated paraphrases are then evaluated using automatic metrics (BLEU, METEOR, ROUGE, and ParaScore) and human assessment. The results indicate that the larger-parameter LLMs used in this research, ChatGPT-3.5 Turbo and Mistral-7B, excel at generating high-quality paraphrases that are both syntactically correct and contextually meaningful. Both models achieved perfect human evaluation scores (3.000) compared to Pegasus (1.705) and T5 (1.529), and they received higher ParaScore scores, with ChatGPT-3.5 Turbo at 0.803 and Mistral-7B at 0.788, outperforming Pegasus (0.768) and T5 (0.760). The results also highlight the limitations of traditional n-gram-based evaluation metrics and the potential of ParaScore as a more representative measure. This research contributes to the development of more reliable and varied question banks, aiding educators in creating personalized and cheat-resistant assessments.
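The first process, generating variable values via regular-expression pattern matching, can be sketched as follows; the number pattern and the integer value range are assumptions for illustration, not the paper's actual templates:

```python
import random
import re

def vary_numbers(question, rng=random):
    """Find numeric values in a physics question with a regular expression and
    substitute fresh integer values, keeping the question text otherwise intact."""
    def fresh(match):
        return str(rng.randint(2, 99))  # assumed value range for the sketch
    return re.sub(r"\b\d+(\.\d+)?\b", fresh, question)
```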

15:00-16:40 Session E2: General informatics
Location: Room 2
15:00
Rollover Prevention System using Estimated Lateral Force Rejection for High Deck Vehicle Model [ONLINE]

ABSTRACT. The aim of this study is to investigate the impact of a roadside curb on a high-deck vehicle during driving. Such an impact generates an unwanted lateral force at the vehicle's Center of Gravity (CG) due to the external disturbance. This unwanted lateral force can cause the vehicle to roll over in dynamic driving, leading to road fatalities and passenger injuries. To minimize the rollover effect, an active safety system called the Estimated Lateral Impact Rejection system coupled with an Active Front Wheel Steering system (EsLIR-AFWS) is proposed in this study. The developed control strategy is evaluated using a verified high-deck vehicle model, namely NAVYA, in the IPG CarMaker simulation tool. The proposed control strategy is analyzed under various impact forces and longitudinal driving speeds to evaluate the robustness of the control design. The performance of the high-deck vehicle model is evaluated in terms of roll rate, yaw rate, lateral acceleration, lateral displacement, and rollover index.

15:20
A Novel Approach to Explainable AI: Leveraging Ripple Down Rules Algorithm for Knowledge-Based Explanations

ABSTRACT. The rapid advancement of AI technologies has led to their widespread application across various fields, raising critical social awareness regarding the need for trustworthy AI decisions. Ensuring AI transparency is essential for building trust, as it requires the system to explain its decision-making processes. This paper introduces a novel approach to developing an explainable AI (XAI) system using the Ripple Down Rule (RDR) knowledge acquisition technique. Our proposed method involves creating an imitation model with RDR using a proxy approach to an existing machine learning model. Information extracted from the imitation model's knowledge base is then processed into concise explanations. Findings indicate that the RDR-generated model closely mirrors the machine learning model it imitates, providing valid explanations as confirmed by expert validation. This study concludes that the RDR technique can effectively imitate classification machine learning models on structured tabular datasets and generate reliable explanations for their decisions.

15:40
Artificial Intelligence Adoption Framework for Business Architecture

ABSTRACT. This research develops a business architecture framework for adopting artificial intelligence (AI) technology based on the Business Architecture Body of Knowledge (BIZBOK). BIZBOK provides guidance on aligning business capabilities, information, value streams, and organizational structure with strategic objectives. It offers a best-practice guide for developing a strong and holistic business architecture, but it has not yet incorporated AI in its development. The main motivation is the push from Industry 4.0 and the need to increase organizational competitiveness through AI. This research aims to create a practical framework for adopting AI in enterprise architecture, especially business architecture. The framework is designed to guide organizations in adopting AI to improve business process efficiency, drive product and service innovation, and optimize resource allocation.

16:00
Generating synthetic data on agricultural crops with DCGAN [ONLINE]

ABSTRACT. The Convolutional Neural Network (CNN) is a deep learning architecture that is very effective for handling images. CNNs automatically extract important features from images, making them well suited to image processing tasks such as classification, object detection, and segmentation. However, despite their capabilities, CNNs need a considerable amount of data to work optimally and avoid overfitting. To handle this problem, synthetic data augmentation is performed using the Deep Convolutional Generative Adversarial Network (DCGAN) method. The generator network in the DCGAN model takes a latent-space input whose dimensionality can vary, and the size of the latent space is important to enabling image reconstruction during training. This study tested latent space dimensions of 64, 100, and 128 on a corn plant dataset of 9,159 images, along with batch sizes of 64 and 128. The model was evaluated using the Fréchet Inception Distance (FID) and the Inception Score (IS). The best scores obtained were an FID of 0.018001 and an IS of 1.239421.

16:20
Interaction Design of Indonesian Local Language Learning Application Using User-Centered Design

ABSTRACT. The endangerment of Indonesian local languages is an issue requiring urgent attention. In response, the Indonesian government has initiated several revitalization movements, including the development and implementation of local language curricula in schools. However, challenges remain, particularly the lack of motivation among students to learn these languages. One way to address this issue is by using a more engaging learning approach through the utilization of technology. This paper addresses the challenge of designing a mobile learning app for Indonesian local languages, with a focus on Javanese as the sample language, employing a user-centered design approach. The MDA (mechanics, dynamics, aesthetic) framework is utilized to create an engaging and enjoyable learning experience, aimed at increasing student motivation. The final product is a high-fidelity prototype evaluated using two usability testing methods: the Single Ease Question (SEQ) and the System Usability Scale (SUS). The final prototype achieved an average SEQ score of 6.93 out of 7 and an average SUS of 95 out of 100. Additionally, an evaluation of the gamification element was conducted to assess their impact on student motivation. The results demonstrate that the prototype meets the usability goals of being easy to learn, and also achieved the user experience goals which are fun, motivating, and cognitively stimulating.