Program for Saturday, September 28th

PROGRAM FOR SATURDAY, SEPTEMBER 28TH

Days:

next day

all days

View: session overview talk overview

09:00-09:10 Opening Session

Location: Room1

09:20-10:40 Session A1: Communication

Chair:

Yuya Hosoda

Location: Room1

09:20	Tamon Mikawa, Yasuhisa Fujii, Yukoh Wakabayashi, Kengo Ohta, Ryota Nishimura and Norihide Kitaoka Listening Head Motion Generation for Multimodal Dialog System ABSTRACT. This paper addresses the listening head generation (LHG), i.e., an avatar head motion in dialogue systems. In face-to-face conversations, head motion is a modality frequently used by both listeners and speakers. Listeners, in particular, tend to leverage head motion along with other backchanneling cues to react to the speaker and regulate the flow of the conversation. The type of head motion during dialogues varies between cultures and individuals, which implies that head motion generation for natural communication requires considering them. Additionally, existing works for head motion generation have primarily tackled speaker head generation, with limited work on listeners. In this study, we have created a multimodal dataset of casual Japanese conversation and a scalable, real-time LHG model that adapts to individual differences in head motion. We also developed the LHG that reflects individual tendencies via fine-tuning the model. The proposed models were evaluated through subjective experiments rated by four testers. The results showed that the proposed models successfully generated natural head motion and improved the appropriateness of head motion by focusing on individual tendencies. Further analysis was conducted to compare the differences between our method and actual human motion.
09:40	Takanori Kanai, Yukoh Wakabayashi, Ryota Nishimura and Norihide Kitaoka Predicting Utterance-final Timing Considering Linguistic Features Using Wav2vec 2.0 ABSTRACT. Accurate turn-taking prediction is essential in spoken dialog systems, in order to determine whether the system or the user should make the next utterance. Previous research has significantly improved the accuracy of turn-taking prediction, allowing dialog systems to avoid unnatural pauses before responding. However, in human-to-human dialogs, responses do not always occur immediately after a speaker's utterance ends; sometimes there are deliberate pauses or responses made with overlap. Therefore, this study proposes a method to estimate in advance when the interlocutor's utterances will end, allowing the system to respond with more natural timing, including occasional overlaps. We utilized wav2vec 2.0, fine-tuned for automatic speech recognition, to estimate utterance end times by considering linguistic features, and compared these methods with prediction methods that use only acoustic features. The results of our comparison showed that considering linguistic features allows more accurate prediction of utterance-final timing. Additionally, we observed that when using the proposed method, the estimated time until the end of the utterance decreases as the utterance approaches its end.
10:00	Fachry Dennis Heraldi and Fariska Zakhralativa Ruskanda Effective Intended Sarcasm Detection Using Fine-tuned Llama 2 Large Language Models ABSTRACT. Detecting sarcasm in English text is a significant challenge in sentiment analysis due to the discrepancy between implied and explicit meanings. Previous studies using Transformer-based models for intended sarcasm detection show room for improvement, and the development of large language models (LLMs) presents a substantial opportunity to enhance this area. This research leverages the open-source Llama 2 LLM, released by Meta, fine-tuned to develop an effective sarcasm detection model. Our proposed system design generalizes the use of Llama 2 for text classification but is specifically designed for sarcasm detection, sarcasm category classification, and pairwise sarcasm identification. Data from the iSarcasmEval dataset and additional sources, totaling 21,599 samples for sarcasm detection, 3,457 for sarcasm category classification, and 868 for pairwise sarcasm identification, were used. Methods include prompt development, fine-tuning using Parameter Efficient Fine-tuning (PEFT) with Quantized Low Rank Adaptation (QLoRA), and zero-shot approach. Our model demonstrates significant improvements, sarcasm detection model and pairwise sarcasm identification model are surpassing top models on previous study: F1-score of 0.6867 for sarcasm detection, Macro-F1 of 0.1388 for sarcasm category classification, and accuracy of 0.9 for pairwise sarcasm identification. Results demonstrate that Llama 2, combined with external datasets and effective prompt engineering, enhances intended sarcasm detection. The PEFT technique with QLoRA reduces memory requirements without compromising performance, enabling model development on devices with limited computational resources. This research underscores the importance of context and intention in intended sarcasm detection, with dataset labeling discrepancies remaining a significant challenge.
10:20	Rio Alexander Audino, Masayu Leylia Khodra and Aqsath Rasyid Naradhipa Direct Quotation Sentiment Analysis in Indonesian News Text using the IndoT5 Generative Models [ONLINE] ABSTRACT. Previous research on sentiment analysis of direct quotations in news articles utilized regex and Named Entity Recognition (NER) systems, but these methods often failed to capture the full context of the quotations. This paper introduces a generative approach that processes entire news documents to extract quotation sentences, speakers, targets, and sentiment polarity. By fine-tuning a generative model using a dataset annotated with GPT-4 and human annotators, and experimenting with regex for extraction, we demonstrate the effectiveness of this approach. The IndoT5-base-paraphrase model achieved impressive results, with F1 scores of 0.99 for quotation and speaker extraction, 0.74 for target extraction, and 0.81 for polarity analysis. These findings highlight the potential of combining generative models with regex for comprehensive sentiment analysis.

09:20-10:40 Session A2: Information hiding

Chair:

Rinaldi Munir

Location: Room2

09:20	Kota Kato, Takeshi Nakai and Koutarou Suzuki Card-based Secure Sorting Protocols based on the Sorting Networks [ONLINE] ABSTRACT. Card-based cryptography enables us to realize secure computation using physical cards with simple manual operations. The efficiency of a card-based protocol is evaluated by the number of shuffles and cards. The Haga et al. (IWSEC2022) is the only work to address card-based secure sorting. They proposed two protocols: a Las Vegas protocol and a finite-runtime protocol. The finite-runtime means that the number of shuffles is deterministic. In this paper, we propose a general converter that transforms an arbitrary sorting network into a finite-runtime card-based sorting protocol. The converter uses Haga et al.'s finite-runtime protocol for n=2 as its building block. Furthermore, we show card-based sorting protocol obtained by applying the converter to the AKS sorts. As a result, we improve the number of additional cards from O(n^2+m) to O(n+m) compared to Haga et al. On the other hand, the number of shuffles increases from O(nm) to O(nmlog n).
09:40	Masaharu Son, Takeshi Nakai and Koutarou Suzuki Search Result Verifiability in Multi-User Dynamic Searchable Symmetric Encryption [ONLINE] ABSTRACT. Dynamic Searchable Symmetric Encryption (DSSE) enables a single user to retrieve and update an encrypted database stored on an external server without decryption. Multi-User DSSE (MUDSSE) enables a data owner to give access rights to multiple users, and the users perform keyword searches on the encrypted database. This paper shows a concrete construction of verifiable MUDSSE, which allows users to verify the correctness of their search results, for the first time. Our construction is achieved by extending the method proposed by Bost et al. (ePrint 2016) for converting a (single-user) DSSE into a verifiable one to a method applicable to MUDSSE.
10:00	Sarah Azka Arief, Candy Olivia Mawalim and Dessi Puji Lestari Indonesian Speech Anti-Spoofing System: Data Creation and Convolutional Neural Network Models ABSTRACT. Biometric systems are prone to spoofing attacks. While research in speech anti-spoofing has been progressing, there is a limited availability of diverse language datasets. This study aims to bridge this gap by developing an Indonesian spoofed speech dataset, which includes replay attacks, text-to- speech, and voice conversion. This dataset forms the foundation for creating an Indonesian speech anti-spoofing system. Subsequently, light convolutional neural network (LCNN) and residual network (ResNet) models, based on convolutional neural networks (CNN), were developed to evaluate the dataset. The input features used are linear frequency cepstral coefficients (LFCC). Both models demonstrate remarkably low minDCF and EER scores approaching zero. The results also exhibit exceptional scores with 4-fold cross validation, showing strong initial performance with no signs of overfitting. However, models trained solely on Common Voice or Prosa.ai datasets performed poorly in cross-source tests, suggesting generalization issues due to a lack of diversity in the dataset. This highlights the need for further improvement and continued research in Indonesian speech spoof detection.
10:20	Priyanshu Dhingra, Satyam Agrawal, Chandra Sekar Veerappan, Eng Siong Chng and Rong Tong Enhancing Speech De-identification with LLM-Based Data Augmentation [ONLINE] ABSTRACT. This paper addresses the challenge of data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method leveraging large language models. Our approach overcomes the limitations of human annotation,enabling the creation of extensive training datasets. To enhance de-identification performance, we compare pipeline and end-to-end models. While the pipeline approach sequentially applies speech recognition and named entity recognition, the end-to-end model jointly learns these tasks. Experimental results demonstrate the effectiveness of our data augmentation strategy and the superiority of the end-to-end model in improving PII detection accuracy and robustness.

10:40-11:10Coffee Break

11:10-12:10 Session Keynote1

Chair:

Fariska Z. Ruskanda

Location: Room1

11:10

Kazumasa Uehara

AI-driven decoding of neural features underlying behavior

ABSTRACT. Thanks to advances in multivariate neural data recording and artificial intelligence (AI), wecan now decode neural features related to both behavior and cognition. The field ofneuroscience is expanding its scope and insights through integration with AI. In my talk, Iwill present recent studies on the convergence of neuroscience and AI, including our workson decoding signals from scalp electroencephalogram (EEG) data during human visualperception and dyadic interactions. Additionally, our laboratory has recently developedanalyses that ensure explainability and interpretability by visualizing the rationale behind AIdecisions. My talk will include an approach to explainable AI. Toward the end, I will alsotouch on my recent work on the “Virtual Brain Project”, which seeks to apply thesetechnologies in the medical field.

12:10-13:30Lunch Break

13:30-14:50 Session B1: Speech

Chair:

Norihide Kitaoka

Location: Room1

13:30	Kaito Takahashi, Yukoh Wakabayashi, Kengo Ohta, Akio Kobayashi and Norihide Kitaoka Improving Speech Recognition for Japanese Deaf and Hard-of-Hearing People by Replacing Encoder Layers ABSTRACT. Communication between hearing individuals and those with hearing impairments generally involves sign language, written communication, and speech. It has been reported that more than half of Japanese people with hearing impairments communicate using speech. Therefore, speech recognition systems available for individuals with hearing impairments are demanded. However, speech recognition systems trained on speech from hearing individuals do not achieve high recognition accuracy for speech from individuals with hearing impairments. In this study, we propose a method to replace the encoder layer of the speech recognition model based on SSL to achieve high-accuracy speech recognition for speech from individuals with hearing impairments. By this method, we improved the recognition performance for significantly speech from individuals with hearing impairments.
13:50	Ayu Mawadda Warohma, Hilwadi Hindersah and Dessi Puji Lestari Speaker Recognition using MobileNetV3 for Voice-based Robot Navigation [ONLINE] ABSTRACT. Several studies have implemented speaker recognition as part of Human-Robot Interaction (HRI) to enhance the perception capabilities of voice-based robots. In this context, our research is implemented on a delivery robot that requires constraints on control and interaction. Implementing a speaker recognition system for a delivery robot is designed to ensure that the robot only executes commands from authorized speakers while avoiding receiving commands from unauthorized speakers. We propose text-independent speaker identification based on speaker embedding d-vector with MobileNetV3, then compared with a Fast ResNet-34. In addition, the compatibility of different feature extraction representations, MFCC, and Mel-scaled spectrogram is evaluated for the proposed architecture. The proposed system has been evaluated on a dataset in Bahasa Indonesia with various acoustic environments. The results of the proposed approach have a better computing efficiency of 98.27%, a smaller model size with an 87.47% reduction, and faster inference time of about 7ms compared to Fast ResNet-34.
14:10	Lingga Pangestu Rachman and Fadhil Hidayat Audio Segmentation using Short-Time Energy (STE) for Makhraj Recognition in Qur’an Recitation with XGBoost and Convolutional Neural Network (CNN1D) Approaches [ONLINE] ABSTRACT. This study proposes and evaluates a new system for recognizing the articulation points (makhraj) in the recitation of the Qur'an by adding an audio segmentation step using the Short-Time Energy (STE) method. The study focuses on the case of recognizing the letter Lam, specifically lam tarqiq and lam tafkhim. Audio segmentation with STE has proven effective in improving the accuracy of makhraj recognition, as STE successfully identifies audio segments with the highest accumulated energy values, facilitating the voice recognition process. The 1-Dimensional Convolutional Neural Network (CNN1D) model with 39 Mel Frequency Cepstral Coefficients (MFCC) features showed the most optimal results. Model evaluation indicates that the proposed approach can provide high performance in recognizing the makhraj of the letter Lam, with an average F1 Score of 97.95%. The implementation of this system is expected to significantly contribute to the teaching of Qur'anic recitation, particularly in aiding the learning of makhraj more accurately and efficiently.
14:30	Akeyla Pradia Naufal, Dessi Puji Lestari, Ayu Purwarianti, Kurniawati Azizah, Dipta Tanaya and Sakriani Sakti Machine Speech Chain with Emotion Recognition ABSTRACT. Developing natural speech recognition and speech synthesis systems requires speech data that authentically represents real emotions. However, this type of data is often challenging to obtain. Machine speech chain offers a solution to this challenge by using unpaired data to continue training models initially trained with paired data. Given the relative abundance of unpaired data compared to paired data, machine speech chain can be instrumental in recognizing emotions in speech where training data is limited. This study investigates the application of machine speech chain in speech emotion recognition and speech recognition of emotional speech. Our findings indicate that a model trained with 50% paired neutral emotion speech data and 22% paired non-neutral emotional speech data shows a reduction in Character Error Rate (CER) from 37.55% to 34.52% when further trained with unpaired neutral emotion speech data. The CER further decreases to 33.75% when additionally trained with combined unpaired speech data. The accuracy of recognizing non-neutral emotions ranged from 2.18% to 53.51%, though the F1 score fluctuated, increasing by up to 20.6% and decreasing by up to 23.4%. These results suggest that the model demonstrates a bias towards the majority class, as reflected by the values of the two metrics.

13:30-14:50 Session B2: Web application

Chair:

Windy Gambetta

Location: Room2

13:30	Rania Dwi Fadhilah, Ayu Purwarianti and Listiarso Wastuargo Knowledge Graph Based Recommender System Development in Intelligent Tutoring System on Programming Domain ABSTRACT. Facing‬‭ a‬‭ shortage‬‭ of‬‭ digital‬‭ talent‬‭ in‬‭ Indonesia,‬‭ the‬‭ web-based‬‭ intelligent‬‭ tutoring‬‭ system,‬‭ CodeBuddy.ai,‬‭ was‬‭ developed‬‭ to‬‭ teach‬‭ basic‬‭ C++‬‭ programming‬‭ for‬‭ beginners.‬‭ This‬‭ system‬‭ uses‬‭ a‬‭ Knowledge‬‭ Graph‬‭ based‬‭ recommendation‬‭ system‬‭ for‬‭ personalizing‬‭ learning‬‭ paths‬‭ according‬‭ to‬‭ student‬‭ capabilities.‬‭ The‬‭ study‬‭ compares‬‭ two‬‭ KG‬‭ models:‬‭ Semantic‬‭ Similarity‬‭ Calculation,‬‭ which‬‭ uses‬‭ semantic‬‭ weighting,‬‭ and‬‭ Random‬‭ Walk‬‭ with‬‭ KG‬‭ Embedding,‬‭ which‬‭ explores‬‭ relations‬‭ more‬‭ extensively.‬‭ These‬‭ models‬‭ were‬‭ chosen‬‭ for‬‭ their‬‭ potential‬‭ effectiveness‬‭ in‬‭ educational‬‭ recommender‬‭ systems‬‭ and‬‭ to‬‭ provide‬‭ a‬‭ focused‬‭ comparison‬‭ of‬‭ their‬‭ performance‬‭ in‬‭ this‬‭ specific‬‭ context.‬‭ Experimental‬‭ results‬‭ show‬‭ that‬‭ Random‬‭ Walk‬‭ with‬‭ KG‬‭ Embedding‬‭ outperforms‬‭ by‬‭ effectively‬‭ detecting‬‭ more‬‭ remote‬‭ relationships,‬‭ making‬‭ it‬‭ superior‬‭ for‬‭ delivering‬‭ precise‬‭ educational‬‭ recommendations.‬‭ The‬‭ Random‬‭ Walk‬‭ with‬‭ KG‬‭ Embedding‬‭ model‬‭ achieved‬‭ lower‬‭ MAE‬‭ (average‬‭ 0.4175595)‬‭ and‬‭ RMSE‬‭ (average‬‭ 0.6286225)‬‭ scores,‬‭ higher‬‭ trust‬‭ (average‬‭ 0.558315),‬‭ and‬‭ greater‬‭ diversity‬‭ (0.379259535820306),‬‭ indicating‬‭ its‬‭ effectiveness‬‭ in‬‭ providing‬‭ more‬‭ accurate‬‭ and‬‭ varied learning path recommendations.‬
13:50	Rita Rismala, Nur Ulfa Maulidevi and Kridanto Surendro Personalized Neural Network-based Aggregation Function with Partial Preference Handling in Multi-Criteria Collaborative Filtering [ONLINE] ABSTRACT. Recommendation quality is a main concern in multi-criteria collaborative filtering (MCCF). To enhance the quality, a personalized neural network-based aggregation function that is efficient, yet personal has been introduced, namely P-FFNN-MCCF. However, it is limited when dealing with partial preference problems that arises when users fail to thoroughly assess criteria. Combining multiple attributes, such as overall rating and reviews, using a multi-attribute BERT along with a rule-based approach, namely MA-BERT-RA, has proven to be highly effective in overcoming this problem. Therefore, to address the issue of partial preference and enhance the quality of recommendations, we propose integrating P-FFNN-MCCF with MA-BERT-RA, known as P-FFNN-MCCF x MA-BERT-RA. Based on the test results on two datasets, our method improved criteria ratings prediction by up to 50% and item recommendations by up to 7%. Our method is also superior to the baseline methods. These results clearly demonstrate the effectiveness of our method, showcasing its potential to improve the performance of the recommendation system.
14:10	Fajar Muslim, Isabela Handayani Sumantri and Ayu Purwarianti Evaluating Trustworthiness of Twitter Users: Addressing Data Drift and Domain Relevance Through Novel Dataset Creation and Feature Analysis ABSTRACT. This research explores the challenge of assessing Twitter user trustworthiness using machine learning, with a focus on the Indonesian social media context. Indonesia, with its vast social media user base, often faces rapid misinformation dis- semination, exemplified by COVID-19-related false information [1]. Initially, a Finland Twitter dataset [cited] achieved a 97% training accuracy but dropped to 65% when tested on Indonesian twitter user data, indicating significant data drift. A novel In- donesian dataset was introduced improved performance to 100% training accuracy and around 94% for validation and testing. The model performed well more than 80% accuracy in related domains but struggled (40-80%) in less related ones. Feature analysis revealed key factors influencing trustworthiness, such as content score, like ratio, and friends count. This study highlights the need for tailored datasets and domain-specific approaches to enhance model reliability and address misinformation risks in Indonesian social media
14:30	Setio Basuki, Zamah Sari, Rizky Indrabayu, Reza Fauzan, Aulia Arif Wardhana and Masatoshi Tsuchiya Predicting Citation Counts with Machine Learning: A Citation Function Approach [ONLINE] ABSTRACT. This paper develops a machine learning model to predict the citation counts obtained by research papers. The model uses citation functions, representing the intentions of the paper's author when making citations of previous works, to estimate the number of citations. These intentions can include introducing a research topic, making comparisons, criticizing previous works, etc. Three predictors have been developed based on citation functions: citing sentence, regular sentence, and reference. The prediction is treated as a regression and classification problem by pre-grouping the number of citations into three categories: high-count, medium-count, and low-count. The dataset was obtained from the International Conference on Learning Representations (ICLR) 2017-2020, containing 5,156 accepted and rejected papers. This paper uses only the accepted papers since the main task is to predict the number of citations of accepted/published papers. To obtain the number of citations one year after publication, this paper uses the API provided by Semantic Scholar. According to experiments, the best results in classification reach 98.33% accuracy, and in regression, the results reach 0.3 on both RMSE and MAE. The feature called ‘citing paper dominant,’ representing the superiority of the citing paper over the cited paper, has demonstrated its effectiveness in achieving the best prediction results despite its low distribution in the dataset. In conclusion, citation function-based predictors are effective in estimating the future impact of a paper.

14:50-15:20Coffee Break

15:20-16:40 Session C1: NLP1

Chair:

Tomoyosi Akiba

Location: Room1

15:20	Vito Christian Samudra and Dicky Prima Satya Application of Startup Success Prediction Models and Business Document Extraction Using Large Language Models to Enhance Due Diligence Efficiency [ONLINE] ABSTRACT. Startups face extreme uncertainty and high failure rates, posing challenges for investors in identifying promising ventures. This research, based on a case study and interviews at a prominent Indonesian corporate venture capital firm, explores the due diligence process, typically taking 4-6 weeks depending on data completeness. Using Large Language Model (LLM) and Machine Learning (ML) technologies developed with the Team Data Science Process (TDSP) methodology, the research aims to enhance due diligence efficiency. Key development steps include data integration, ML model creation for startup success classification, and the integration of OpenAI's GPT-4 and Google Search APIs for comprehensive business analysis. The system's dashboard offers features such as pitch deck, financial, market trends, competitor, and founding team analyses, along with startup success prediction using the XGBoost model. This model, deployed via Flask, demonstrated consistent results through cross-validation. Customer acceptance testing, conducted with eight experienced startup investors, yielded a high satisfaction rate of 4.50 out of 5.00, indicating strong approval of the system's effectiveness.
15:40	Moses Ananta, Rahayu Utari, Amany Akhyar and Gusti Ayu Putri Saptawati Integrating BERTopic and Large Language Models for Thematic Identification of Indonesian Legal Documents [ONLINE] ABSTRACT. The increasing complexity and volume of legal documents pose significant challenges for information retrieval and text analysis. Traditional text analysis methods are often inadequate, resulting in time-consuming and labor-intensive processes. This study applies advanced natural language processing (NLP) techniques, specifically BERTopic and large language models (LLMs), to cluster and identify themes within Indonesian legal paragraphs. The methodology includes data collection, preprocessing, BERTopic topic modeling, and LLM-based topic refinement. Results show that the "intfloat/multilingual-e5-large-instruct" embedding model, with a minimum cluster size of 40, achieves optimal performance with a Silhouette Score of 0.723 and a Davies-Bouldin Index of 0.340. Subsequent LLM refinement using Meta’s LLaMA-3-8B-Instruct language model enhances the readability and relevance of the extracted topics. The approach enhances the organization and analysis of complex legal documents, with practical implications for improving legal information retrieval and management.
16:00	Masato Kikuchi, Masatsugu Ono, Toshioki Soga, Tetsu Tanabe and Tadachika Ozono Coarse-Grained Sense Inventories Based on Semantic Matching between English Dictionaries ABSTRACT. WordNet is one of the largest handcrafted concept dictionaries visualizing word connections through semantic relationships. It is widely used as a word sense inventory in natural language processing tasks. However, WordNet's fine-grained senses have been criticized for limiting its usability. In this paper, we semantically match sense definitions from Cambridge dictionaries and WordNet and develop new coarse-grained sense inventories. We verify the effectiveness of our inventories by comparing their semantic coherences with that of Coarse Sense Inventory. The advantages of the proposed inventories include their low dependency on large-scale resources, better aggregation of closely related senses, CEFR-level assignments, and ease of expansion and improvement. Our inventories are publicly available for free use\footnote{Our inventories are publicly available at \url{https://drive.google.com/file/d/1txm5b2OpHOHGIsieYFycEFrqiUGwxSd-/view?usp=sharing}.}.
16:20	Amany Akhyar and Masayu Leylia Khodra Automatic Single Document Summarization For Indonesian News Article Using Abstract Meaning Representation [ONLINE] ABSTRACT. With the increasing number of online news sources, effective summarization becomes essential to provide readers with concise and informative content. This study focuses on developing an automatic summarization system for single Indonesian news articles using Abstract Meaning Representation (AMR). Leveraging a machine learning-based AMR parser, the system constructs sentence representations, selects subgraphs to build summary graphs, and generates summary texts. The baseline uses retrained Word2Vec and selects the top three most similar sentences via cosine similarity for ROUGE evaluation against IndoSum's abstractive summary. Despite not surpassing baseline performance, the proposed system achieves an average ROUGE-1 of 0.62833, ROUGE-2 of 0.54449, and ROUGE-L of 0.58889. The findings indicate that while the proposed system effectively summarizes, it tends to prioritize initial sentences during subgraph selection, which is crucial for constructing accurate summary graphs. This tendency highlights areas for further improvement. Future research can build upon these findings by employing advanced graph construction algorithms for summary graphs and alternative text generation techniques. This study contributes to ongoing efforts to enhance text summarization systems and provides valuable lessons for future research in this field.

15:20-16:40 Session C2: Biomedical informatics

Chair:

Dessi Puji Lestari

Location: Room2

15:20	Ansh Mishra, Jia Qi Yip and Eng-Siong Chng Time-domain Heart Sound Classification Using Neural Audio Codecs ABSTRACT. Heart auscultation remains a valuable approach for early detection of valvular heart disease, but its efficacy depends on physician skill. The rise of telehealth services necessitates consideration of heart sound transmission and classification through compressed formats. While neural audio codecs offer impressive compression rates, they may introduce distortions that could impact classification accuracy. We propose a novel approach to perform heart sound classification directly on the continuous representations of the codec, rather than on reconstructed audio. Our method demonstrates good performance on the heart sound classification task while leveraging the encoder of the audio codec, potentially improving robustness and efficiency in telehealth applications. Using the Descript Audio Codec (DAC) encoder combined with a modified M5 convolutional neural network, our approach achieves a Classification Error Rate (CER) of 0.78 on the Yaseen Dataset, outperforming models trained on transmitted audio and comparing favorably with state-of-the-art methods. This performance, coupled with the method's inherent robustness to transmission artifacts, highlights its potential for improving remote cardiac diagnostics in telehealth settings.
15:40	Made Arbi Parameswara and Rila Mandala BIOGAN-BERT: BioGPT-2 Fine Tuned and GAN-BERT for Extracting Drug Interaction Based on Biomedical Texts [ONLINE] ABSTRACT. Drug-drug interactions (DDIs) occur when two or more drugs are used together, leading to unexpected and potentially harmful effects. Identifying DDIs requires manual annotations, but the increasing volume of research publications and the slow data annotation process make this challenging. Machine learning, especially deep learning, can efficiently extract and identify DDIs from biomedical literature. However, class imbalance in datasets reduces model performance. This study introduces BIOGAN-BERT, which combines data augmentation using the Pretrained Language Model (PLM) BioGPT-2 and Generative Adversarial Network (GAN) to address class imbalance in DDI extraction tasks. It identifies gaps in existing imbalance handling studies and proposes enhancements through PLM-based data augmentation and semi-supervised learning with GAN. BioGPT-2 generates additional data from labeled and unlabeled sources, enriching the training dataset. This data is then processed using GAN-BERT, allowing the model to learn from more complex data distributions, thereby improving data quality and model generalization. Traditional methods like sampling only increase the number of data instances, and loss functions merely assign greater representation to the loss values. While these methods expand the learning space for models, they do not enhance data representation. In contrast, this novel approach uses data augmentation to increase both the quantity and the diversity of data. Evaluation results show that BIOGANBERT outperforms several baselines, significantly increasing the micro F1-Score for minor classes to 0.85 compared to 0.83 for the best baseline model, demonstrating its effectiveness in handling class imbalance and contextual variations in biomedical data.
16:00	Matthew Martianus Henry, Christian Kenneth and Bens Pardamean Implementation of Fine-Tuned BERT for Enzyme Classification Based on Gene Ontology [ONLINE] ABSTRACT. Enzymes are biocatalysts with vital roles in biological functions and many industrial applications. Diverse enzymes are classified using Enzyme Commission (EC) nomenclature, making differentiation challenging. On the other hand, another biological information, gene ontology (GO), can describe the biological aspects of enzymes, covering related biological processes (BP), molecular functions (MF), and their locations within cells (CC). This study proposes a novel EC class and subclass classification of enzymes within the ontology subclass based on their GO semantics using a Bidirectional Encoder Representation of Transformer (BERT). The BERT model is first fine-tuned using the preprocessed GO term name and definition, with the enzymes in each ontology class (BP, MF, or CC) are also divided based on how the GO assigned, either through manual annotation (NONIEA) or electronically inferred (IEA). BERT successfully obtained 0.93, 0.60, 0.99, 0.90, 0.40, and 0.35 F1 scores during fine-tuning for BP IEA, BP NONIEA, MF IEA, MF NONIEA, CC IEA, and CC NONIEA, respectively. On the test set, the fine-tuned BERT significantly outperformed GOntoSim, a framework to calculate semantic similarity based on classical information theory, in EC class classification across all metrics with less inference time in all ontology subclass. Expanded further to the EC subclass, BERT can classify the enzyme on the EC subclass level in BP IEA and MF IEA ontology subclass. However, longer epochs are needed in fine-tuning. This result shows that the names and definitions of GO terms are distinguishable features in classifying enzymes as an alternative to the information content approach.
16:20	Aria Rahman and Esmeralda Contessa Djamal Sleep Apnea Identification Based on Multi-Frequency Band of EEG Signal Using Slim UNETR [ONLINE] ABSTRACT. Sleep Apnea is a sleep disorder characterized by repeated interruptions in breathing lasting more than 10 seconds, often detected using Electroencephalogram (EEG) signals. This study proposed a novel approach using Wavelet and U-Net architecture to classify Sleep Apnea: No Apnea, Obstructive Sleep Apnea (OSA), or Central Sleep Apnea (CSA). The slim UNETR method, adapted from an image analysis application, processes EEG signals. The representation of EEG signals, which are multi-channel recorded time sequences, can be approximated into spatial images. Therefore, a connection between channels and time sequences allows the attention mechanism, an advantage of transformer-based methods. Multi-attention allows EEG signals to be processed in parallel in several frequency bands. This adaptation minimizes noise and enhances the precision of EEG signal analysis by focusing on relevant data rather than the entire signal field. Experimental results demonstrate that the choice of frequency band and EEG channel significantly impact classification accuracy, with the C3-M2 channel using beta waves achieving the highest accuracy of 97.91% and the C4-M1 channel with delta waves recording the lowest accuracy of 91.33%. These findings highlight the importance of optimizing frequency bands and channels in EEG-based Sleep Apnea detection. Wavelet and Slim UNETR can improve accuracy and efficiency in online Sleep Apnea monitoring for a certain duration of time. EEG signals classified through the Wavelet and Slim UNETR methods can detect apnea episodes and anticipate further treatment.

17:00-19:00 Welcome Reception