ICADL 2024: THE 26TH INTERNATIONAL CONFERENCE ON ASIA-PACIFIC DIGITAL LIBRARIES
PROGRAM FOR WEDNESDAY, DECEMBER 4TH

09:00-09:30 Session 1: Welcome

Welcome to Monash University Malaysia:  Professor Nafis Alam, Head, School of Business

Welcome to ICADL:  Professor Gillian Oliver, Conference Chair

 

09:30-10:30 Session 2: Keynote: Professor Carsten Rudolph

 Faculty of IT, Monash University Australia

Title: Security-aware data provenance: Can I trust the data I see?

Abstract: While integrity and appraisal of records have always been considered important, existing standards and security discussions are missing some essential perspectives, in particular with respect to cybersecurity. Digital technology, large-scale aggregation and processing, and complex steps such as machine learning-based systems have fundamentally increased the complexity of understanding the provenance of data. Provenance graphs have been suggested as a mechanism to document the origins of data records. While these can provide valuable information, these graphs do not convey information on the risk of data being maliciously falsified and manipulated. This talk explores how security-aware provenance might add some of the missing cybersecurity factors to recordkeeping and discusses some of the challenges in this area.

Online Participants:  Zoom link

10:30-11:00 Refreshments
11:00-12:30 Session 3

Online participants: Zoom link

11:00
Freedom of Information and Information Policy in Southeast Asia: the cases of Thailand and Indonesia

ABSTRACT. Government administration in Southeast Asia has experienced significant changes in recent years, notably in transparency and accountability. Governments have launched initiatives to provide public access to information through the implementation of Freedom of Information (FoI) acts. This study analyses the enforcement of FoI law in Thailand and Indonesia and aims to identify the information policy adopted in compliance with FoI laws in both countries regarding records and information management in the public sector. This study is limited to the context of FoI, recordkeeping and information services in Thailand and Indonesia. The results reveal that, firstly, FoI laws as an information policy to support good governance do not always guarantee transparency, accountability, and good governance, because information disclosure has not been regularly conducted, owing to circumstances in which government agencies were not ready to use, re-use, and share their information and public records. Secondly, FoI laws as an information policy did not succeed in establishing good recordkeeping and information management because government officials lacked adequate knowledge and skills in recordkeeping and information management, leading to failure in performing information services and in complying with FoI. Lastly, FoI laws, used as an information policy, did not considerably raise awareness of the significance of setting up effective public records, archives and information management systems in the public sector, because people have low awareness of FoI laws and of their human right to access public records and information held by the public sector. Also, citizens and government officials lack understanding of the relationship between the right of information access and recordkeeping and information management; thus they do not acknowledge the necessity of effective information and recordkeeping systems in the public sector.

11:20
A research study investigating the impact of COVID-19 on the information needs of international students in the UK

ABSTRACT. Globally, COVID-19 has affected university students. Throughout the pandemic, international students have been particularly hard hit. Loneliness and difficulties are the main factors affecting the mental health of overseas students. Transitioning to online learning has presented challenges in terms of living conditions and internet access reliability. With the overall aim of investigating the impact of COVID-19 on international students, this study used a questionnaire survey with 545 participants. NVivo and SPSS software were used to analyse the data. The outcome demonstrates that international university students in the UK encountered several issues during the COVID-19 pandemic. They mostly used social media and their respective colleges as information sources.

11:40
The Power of Warning: Unpacking the Impact of Fact-Checking Flag on News Sharing and Verification

ABSTRACT. Grounded in information gap theory (IGT), the objective of this paper is to develop and empirically validate a conceptual model comprising perceived believability, information curiosity, fact-checking flag, intention to share and intention to verify. The proposed model is tested through a between-participants experiment using a simulated scenario where users browse Facebook news posts. A total of 177 participants were randomly assigned to one of two experimental conditions: one where a fact-check flag was present in a Facebook news post, and the other where the fact-check flag was absent. Four hypotheses were tested using one-way ANOVA and PROCESS MODEL 4. Results show that the presence of fact-checking flag has both a direct negative impact on the intention to share and an indirect effect through perceived believability. However, it influences the intention to verify only indirectly by increasing information curiosity. This study enriches the literature by not only shedding light on the underlying process by which fact-checking flag affects users’ behavioral intentions but also extending the boundary conditions of IGT.

12:00
A framework to facilitate older people in leveraging online financial services
PRESENTER: Dain Thomas

ABSTRACT. Older people are encountering digital exclusion due to the evolving technological realm. The use of digital financial services among older people aged 65 and over is low in comparison to other age groups. A wide range of challenges are associated with older people’s low usage of online financial services. Hence, interventions have to be developed to reduce the exclusion. In order to create interventions, factors which contribute to their challenges have to be identified. This paper elucidates a framework which was developed from qualitative data that could be leveraged to develop potential solutions by focusing on the main factors which could prevent them from fully utilizing digital financial services. Not only would this framework be beneficial for older people but also for intermediaries who assist older people in accessing digital financial information as this tool could aid them in choosing the appropriate solution required to help the individual to use online financial services.

12:30-13:30 Lunch Break
13:30-15:00 Session 4A

Online participants: Zoom link

Location: 6405
13:30
Factors Influencing University Students’ AI Use and Knowledge Acquisition

ABSTRACT. Despite the growing emphasis on artificial intelligence (AI) education, there is relatively little research on the motivational factors that influence students' intention regarding AI knowledge acquisition and the utilization of AI applications. Understanding these factors not only enhances our knowledge of AI education but also helps educators and researchers to develop appropriate interventions to promote AI learning that align with students’ needs and expectations. Guided by expectancy-value theory and theory of planned behavior, we investigated the role of expectancy-value beliefs in fostering university students’ intentions to learn and use AI. 141 university students participated in this study. Our findings revealed that intrinsic and utility value beliefs played a mediating role in promoting students’ behavioral intentions in AI learning. We also found that while effort cost negatively affected these intentions, opportunity cost positively influenced intentions to acquire AI knowledge and use AI applications. Additionally, we identified gender differences in students’ expectancy-value beliefs, which can inform educators in designing gender-specific interventions to enhance female students’ motivation in AI learning.

13:50
Can pre-trained language models generate titles for research papers?

ABSTRACT. The title of a research paper communicates in a succinct style the main theme and, sometimes, the findings of the paper. Coming up with the right title is often an arduous task, and therefore, it would be beneficial to authors if title generation can be automated. In this paper, we fine-tune pre-trained language models to generate titles of papers from their abstracts. Additionally, we use GPT-3.5-turbo in a zero-shot setting to generate paper titles. The performance of the models is measured with ROUGE, METEOR, MoverScore, BERTScore and SciBERTScore metrics. We find that fine-tuned PEGASUS-large outperforms the other models, including fine-tuned LLaMA-3-8B and GPT-3.5-turbo, across most metrics. We also demonstrate that ChatGPT can generate creative titles for papers. Our observations suggest that AI-generated paper titles are generally accurate and appropriate.

14:10
Generative Agents Navigating Digital Libraries
PRESENTER: Saber Zerhoudi

ABSTRACT. In the rapidly evolving field of digital libraries, the development of large language models (LLMs) has opened up new possibilities for simulating user behavior. This innovation addresses the longstanding challenge in digital library research: the scarcity of publicly available datasets on user search patterns due to privacy concerns. In this context, we introduce Agent4DL, a user search behavior simulator specifically designed for digital library environments. Agent4DL generates realistic user profiles and dynamic search sessions that closely mimic actual search strategies, including querying, clicking, and stopping behaviors tailored to specific user profiles. Our simulator's accuracy in replicating real user interactions has been validated through comparisons with real user data. Notably, Agent4DL demonstrates competitive performance compared to existing user search simulators such as SimIIR and SimIIR 2.0, particularly in its ability to generate more diverse and context-aware user behaviors. To further support the digital library research community, we also present Agent4DLData, a concise yet comprehensive collection of simulated user search sessions generated by Agent4DL. This dataset provides researchers with a valuable resource for studying user interactions in digital library systems.

14:30
Evaluation of the Quality of AI-Generated Scientific Text Under Different Types of Cognitive Complexity Tasks

ABSTRACT. As Artificial Intelligence Generated Content (AIGC) continues to deepen its application in the field of scientific research, this study aims to explore the current quality of AIGC in completing research tasks, providing insights for improving AIGC in the scientific research domain. This study first reviews and summarizes existing information quality evaluation frameworks and AIGC-related research to propose quality evaluation criteria for AIGC in the research context. Then, by setting research tasks with different cognitive complexities, user experiments were conducted on the ChatGPT and ERNIE Bot platforms to select appropriate AIGC quality evaluation criteria for these tasks. The quality of AIGC generated by ChatGPT and ERNIE Bot was evaluated based on the selected criteria, revealing the strengths and weaknesses of current AIGC in meeting users' research information needs. The results show that users generally value relevance, professionalism, and readability when evaluating AIGC for research tasks. However, attention to specific criteria such as accuracy, diversity, coherence, and creativity varies depending on the cognitive complexity of the research tasks. Additionally, AIGC performs well in understanding, evaluating, and creating tasks but has significant shortcomings in remembering and analyzing tasks, particularly in terms of accuracy and professionalism.

13:30-15:00 Session 4B

Online participants: Zoom link

Location: 6404
13:30
Using Annotator Labels Instead of Golden Labels for Fine Emotion Detection

ABSTRACT. Textual fine-grained emotion detection is a challenging task that has yet to achieve powerful performance in both pretrained language models (PLMs) and large language models (LLMs). In this paper, we analyze a fine-emotion dataset and current approaches to provide insight into existing issues. We propose the idea of treating fine-emotion detection as having multiple appropriate answers, and of considering annotator-level labels instead of the golden label. Annotator labels are labels provided by individual annotators before being aggregated into the golden (reference) label. These labels highlight the subjectivity of individual annotators before the labels were aggregated. We then evaluate treating the neutral label separately and using LLMs as an aid for mistake filtering and augmentation. We show that using annotator labels instead of golden labels allows a BERT model to predict different interpretations without being penalized, despite the weaker performance. Large potential has yet to be explored in annotator-level label fine-emotion detection, and we provide several ideas through evaluating approaches and analyzing the results. We hope to encourage a change in how fine emotion is detected, toward accepting multiple accurate annotator labels, even within a multi-label scenario.

13:50
Investigating Industry–Academia Collaboration in Artificial Intelligence: PDF-Based Bibliometric Analysis from Leading Conferences

ABSTRACT. This study presents a bibliometric analysis of industry–academia collaboration in artificial intelligence (AI) research, focusing on papers from two major international conferences, AAAI and IJCAI, from 2010 to 2023. Most previous studies have relied on publishers and other databases to analyze bibliographic information. However, these databases have problems, such as missing articles and omitted metadata. Therefore, we adopted a novel approach to extract bibliographic information directly from the article PDFs: we examined 20,549 articles and identified the collaborative papers through a classification process of author affiliation. The analysis explores the temporal evolution of collaboration in AI, highlighting significant changes in collaboration patterns over the past decade. In particular, this study examines the role of key academic and industrial institutions in facilitating these collaborations, focusing on emerging global trends. Additionally, a content analysis using document classification was conducted to examine the type of first author in collaborative research articles and explore the potential differences between collaborative and noncollaborative research articles. The results showed that, in terms of publication, collaborations are mainly led by academia, but their content is not significantly different from that of others. The affiliation metadata are available at https://xxx.

14:10
Clarifying Questions Generation for Conversational Search based on “People Also Ask” Feature

ABSTRACT. Improving conversational search is an important area of research in Natural Language Processing and AI today. Extensive research is being done to make the conversational search user experience more effective. The main aim is to help the user get an answer to the query as quickly and correctly as possible by enabling the system to act as an assistant that asks the minimum number of relevant questions, rather than just randomly displaying results for what is asked in the search box. Recent research has shown that asking clarifying questions significantly improves the quality and efficiency of conversational search by reducing the gap between the user’s query and information need. In this work, we approach the problem of finding relevant clarifying questions by exploiting the "People Also Ask" (PAA) feature of a popular search engine. We perform a qualitative assessment to verify the quality of the extracted questions and their potential applicability to clarification in search. Next, we convert the PAA questions into clarifying questions using various transformer-based Large Language Models (LLMs), such as T5, BART, and GPT2, and use established natural language generation metrics to evaluate the performance of different LLMs for paraphrasing the questions. Finally, we discuss the results and the relation between PAA questions and clarifying questions to draw some useful conclusions and directions for future work.

14:30
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents

ABSTRACT. This paper investigates the presence of OCR-sensitive neurons within the Transformer architecture and their influence on named entity recognition (NER) performance on historical documents. By analysing neuron activation patterns in response to clean and noisy text inputs, we identify and then neutralise OCR-sensitive neurons to improve model performance. Based on two open access large language models (Llama2 and Mistral), experiments demonstrate the existence of OCR-sensitive regions and show improvements in NER performance on historical newspapers and classical commentaries, highlighting the potential of targeted neuron modulation to improve models' performance on noisy text.

15:00-15:30 Refreshments
15:30-17:00 Session 5

Online participants: Zoom link

15:30
Comparative Analysis of Public Library Service Feedback on Social Media and Google Maps Reviews

ABSTRACT. In an era of heightened consumer awareness and intense competition, public libraries must prioritize service quality, user satisfaction, and loyalty. Traditional methods of gauging user feedback, such as surveys and interviews, are time-consuming and resource-intensive. With the advent of big data and the proliferation of mobile internet, users now express their opinions through social media platforms like Instagram and Facebook and review platforms like Google Maps. This study compares public library service feedback from social media and Google Maps reviews in Taiwan's six municipalities. The Keypo big data analysis system and web scraping techniques categorized feedback using the LibQUAL+ model: Library as Place, Affect of Service, and Information Control. The comparative analysis revealed significant differences in the nature and focus of feedback across platforms. Social media feedback was more detailed and service-oriented, while Google Maps reviews focused on physical amenities and resource availability. These findings suggest that public libraries can benefit from a dual approach to feedback collection, leveraging both social media and review platforms to gain comprehensive insights into user opinions and enhance service quality.

15:50
Estimating Citizen Personality Traits Using Social Media Posts

ABSTRACT. Personality traits are crucial factors that influence individual behavior and responses. In this study, we introduced both personality traits and experience traits in an analysis of the characteristics of citizens using social media posts from different cities. The personality traits referenced some facets of the "Big Five Model" of personality. The experience traits were introduced considering the close relationship between personality traits and behavior. Specifically, we labeled social media posts from each city with two labels, representing a personality trait and an experience trait. We then examined how these traits reflect the tendencies in citizens' behavior. The personality traits defined in this study were "Altruism," "Artistic Interest," "Adventurousness," "Gregariousness," and "Activity." We assigned the labels manually and fine-tuned large language models to assign the personality trait labels to cities automatically. Finally, we analyzed the differences in personality trait trends between cities. Our experimental results showed that the F1 scores of the prediction models for both personality traits and experience traits exceeded 0.8. The analysis of social media posts using the trained models demonstrated that the citizens of a certain city had significantly higher scores for the personality traits "Artistic Interest" and "Gregariousness" than those from other cities, which was consistent with the results of previous questionnaire-based studies.

16:10
The Impact of Preprints on the Citations of Journal Articles Related to COVID-19

ABSTRACT. To investigate the impact of preprints on the citation counts of COVID-19-related papers, this study compares the number of citations received by drafts initially distributed as preprints and later published in journals with those received by papers directly submitted to journals. The difference in the median number of citations between COVID-19 preprint-distributed papers and COVID-19 directly submitted papers published in 184 journals was tested using the Mann-Whitney U test. The results showed that 129 journals had a statistically significant higher median citation count for COVID-19 preprint-distributed papers compared to directly submitted papers, with a p-value of less than 0.05. In contrast, no journals had a statistically significant higher median citation count for COVID-19 directly submitted papers. This indicates that 70.11% of the journals that published preprint-distributed papers experienced a significant increase in citations. We also identified that among the 184 journals, 13 journals garnered a substantial number of citations. Among the 74,037 COVID-19 papers, preprint-distributed papers (9,028) accounted for only 12.19%. However, among the 2,015,997 citations received by COVID-19 papers, preprint-distributed papers garnered 542,715 citations, representing a substantial 26.92%. These results suggest that distributing preprints prior to formal publication may help COVID-19 research reach a wider audience, potentially leading to increased readership and citations.
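
The percentages quoted in this abstract follow directly from the reported counts. As a small arithmetic check (not part of the authors' analysis), the sketch below recomputes them; the helper name `share` and the variable names are illustrative assumptions.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# Counts as reported in the abstract.
journal_share = share(129, 184)              # journals where preprint papers had the higher median
paper_share = share(9_028, 74_037)           # preprint-distributed papers among all papers
citation_share = share(542_715, 2_015_997)   # citations going to preprint-distributed papers

print(journal_share, paper_share, citation_share)  # prints: 70.11 12.19 26.92
```

The three values match the 70.11%, 12.19%, and 26.92% given in the abstract.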

16:30
Evaluating Large Language Models for Healthcare: Insights from MCQ Evaluation

ABSTRACT. This study investigates the performance of general and medical-specific Large Language Models (LLMs) in obstetrics and gynecology, focusing on their ability to accurately handle medical multiple-choice questions (MCQs). We evaluated models like Llama2, Mistral, PMC_LLaMA, and BioMistral, to assess and enhance their reliability and accuracy. Despite the expectations, general-purpose models occasionally outperformed specialized medical models. Our methods, including Structural Influence Testing and Contextual Enhancement Testing, demonstrated significant potential in improving model accuracy and reducing misinformation. Specifically, Structural Influence Testing increased Mistral's accuracy from 40% to 46% and Llama2's from 28% to 43% with five shots. Contextual Enhancement Testing yielded a 4% accuracy gain for Mistral and 6% for Llama2 using search terms. This research highlights the importance of optimizing LLMs to empower healthcare professionals with precise and reliable medical information, ultimately improving patient outcomes and supporting informed clinical decisions.

16:45
Digital Nudge Alerts: Fact-checking Generative AI Responses

ABSTRACT. Generative artificial intelligence (GenAI) chatbots have reshaped human-AI interaction behavior and transformed industries and educational sectors. Despite its advantages, GenAI presents several limitations and concerns. This study addresses fact-checking responses from GenAI to minimize the negative impacts of artificial hallucination. Artificial hallucination is a response generated by GenAI that contains false or misleading information. Addressing this problem necessitates appropriate interventions to remind users to ensure the accuracy of the GenAI-generated information, thus cultivating responsible and accountable usage. To address this concern, this study employed a behavioral economics approach using a digital nudge-based intervention to subtly remind users to fact-check the AI-generated output. To ensure cost efficiency, the proposed digital nudge intervention is delivered through a browser extension that automatically triggers popover alerts within the GenAI environment.

16:00-17:00 Session 6: AP-iNext Business Meeting

Chair: Di Wang (Renmin University of China, China)

This is an invite-only session.

Location: 6403
17:00-18:30 Session 7
Location: Exhibition Area
The Power of URLs in Scholarly Papers: Their Role as Metadata Sources for Data Repositories
PRESENTER: Kazuhiro Wada

ABSTRACT. Open science is increasingly important in academia. This paper investigates the citations using URLs, used to reference research resources in scholarly papers, and their potential as metadata sources for data repositories. A dataset of URL citations extracted from scholarly papers in natural language processing is analyzed. We first examine trends in locations and frequency of URL citations over time, then classify the types of research resources and their citation purposes. Furthermore, we assess the usability of URL citations for automatic metadata generation in data repositories. Specifically, we explore how many metadata records can be added and how much the metadata fields can be enriched. Our findings suggest that URL citations offer a promising approach for enhancing metadata quality and availability in data repositories.

Do “Altmetrics” Precede the Bibliometrics?
PRESENTER: Bon-Jin Koo

ABSTRACT. This poster aims to assess the usefulness of altmetrics as a leading indicator for predicting bibliometrics. The study extracted common trends of bibliometrics and altmetrics using PCA and analyzed potential time gaps between them using cross-correlation analysis. To this end, it focused on the field of social science, employing bibliometric and altmetric data from 495,652 papers across 42 topics published over the past 30 years (1993-2022). The analysis showed that, among the altmetrics indicators, Captures preceded the number of papers and the number of authors by 2 years and the Usage counts by 3 years, suggesting that Captures can be used as a leading indicator of the number of articles, authors, and usages.
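
To illustrate what a time gap found by cross-correlation looks like, here is a minimal sketch, assuming two yearly series and a simple lagged Pearson correlation. The series are invented toy values, not the study's data; the PCA step is omitted; and the function names `pearson` and `lagged_correlations` are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def lagged_correlations(leading, lagging, max_lag):
    """Correlate leading[t] with lagging[t + k] for each lag k = 0..max_lag."""
    scores = {}
    for k in range(max_lag + 1):
        scores[k] = pearson(leading[: len(leading) - k], lagging[k:])
    return scores

# Toy yearly values (made up): "papers" repeats "captures" two years later.
captures = [3, 7, 4, 9, 12, 8, 15, 11, 18, 14, 21, 17]
papers = [5, 6] + captures[:-2]

scores = lagged_correlations(captures, papers, max_lag=4)
best_lag = max(scores, key=scores.get)
print(best_lag)
```

On this toy input the strongest correlation occurs at a lag of 2 years by construction, which is the sense in which one indicator "precedes" another.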

Development of a Common Data Set for Smart City in Thailand: Research Concept Paper

ABSTRACT. Thailand has numerous data sources related to smart city development; however, these data are often fragmented, focusing on specific systems, aspects, and technologies. This fragmentation poses challenges for inter-organizational use and collaborative development efforts. This research aims to develop a common data set for smart city development in Thailand, covering the comprehensive characteristics of the seven key areas in smart city development: smart environment, smart mobility, smart energy, smart economy, smart people, smart living, and smart governance. The study employs a research and development methodology divided into three phases: (1) analyzing foundational data for smart city development from international and national organizations through a literature review, (2) investigating information and data management practices from key Thai organizations, and (3) synthesizing these findings to develop a common data set, designing a data dictionary for smart cities in Thailand, and evaluating it with expert input. The results indicate significant advancements in city management and sustainability through the use of a standardized dataset. The data dictionary details the data domains, elements, and their properties, providing a valuable resource for expanding knowledge on common data sets for smart cities. This tool will benefit Thailand’s administrators in managing smart city data and developing comprehensive databases. Additionally, it offers a foundation for scholars and researchers to further explore and develop fundamental data for smart city initiatives in other cities.

Analysis of Differences in User Engagement Behavior on Multiple Chinese Public Libraries’ New Media Platforms

ABSTRACT. With the development of an information-based society, public libraries and new media are increasingly converging. This study investigates and compares user engagement behaviors across multiple new media platforms of public libraries and different types of promotional content. The results show that public libraries can gain more followers on WeChat Official Account and Weibo, while achieving higher user engagement on TikTok. Additionally, there are significant differences in user interactions across different types of posts. This study can help guide libraries in formulating targeted promotional strategies on different new media platforms, thereby enhancing user engagement and improving the dissemination effectiveness of libraries' new media platforms.

Generating Surprising and Diverse Ideas Using ChatGPT

ABSTRACT. Generative AIs have become widespread and are being used for a variety of purposes. However, a phenomenon called hallucination, in which generative AI outputs plausible lies, is being viewed as a problem. While hallucination is a problem when it is important to be factual, in some cases, such as idea generation, it is not a problem. In fact, in some cases, it may be better to have hallucination occur to come up with surprising and diverse ideas. In this study, we compared several prompts with the aim of getting ChatGPT to output information that differs from the training data for GPT when generating ideas for a fictional historical novel. The results showed that directly instructing ChatGPT within the prompt to include content that differs from the facts was more effective, and led to more surprising ideas, than using adversarial prompts that cause hallucinations.

Capabilities and Challenges of LLMs in Metadata Extraction from Scholarly Papers
PRESENTER: Yu Watanabe

ABSTRACT. In scholarly papers, research data are cited and their construction and use are mentioned. The descriptions of research data in papers may be used as information for their metadata. In this paper, we focus on Large Language Models (LLMs), which have achieved high performance in various natural language processing tasks, and investigate LLMs' ability to extract metadata from papers. In the experiment, we used LLMs to extract metadata from papers and analyzed their metadata extraction capabilities quantitatively and qualitatively. The results indicated that while LLMs can extract metadata from papers extensively, the extraction accuracy is not necessarily high. We confirmed that there are challenges in identifying the names of research data and linking information related to research data.

Estimation of Relevance between Datasets for Enhancing Accessibility of Research Artifacts
PRESENTER: Koichiro Ito

ABSTRACT. Open science has recently been promoted globally, and it is encouraged to publish and share research artifacts such as datasets, tools, software, code, and scholarly papers. To accelerate open science, the accessibility of research artifacts is also important. Previous studies focused on the relevance between scholarly papers to improve their accessibility. However, the relevance between research artifacts other than scholarly papers has yet to be sufficiently explored. In this study, we focus on datasets, and verify the feasibility of estimating the relevance between datasets. Access using the relevance between datasets may be useful when it is difficult to verbalize the characteristics of the desired dataset. We implemented a method for estimating the relevance between datasets. The method utilizes datasets' metadata expressed as text, and the relevance is estimated by a BERT-based model. Experimental results demonstrated the feasibility of the estimation. Finally, we discuss the effect of using metadata for each metadata field.