GTM2024: 14TH ANNUAL GLOBAL TECHMINING CONFERENCE
PROGRAM FOR TUESDAY, SEPTEMBER 17TH

08:45-09:50 Session 3A: Power Talks

This session is a speed round of 5-8 minute Power Talks, showcasing research-in-progress

Location: Spektrum
08:45
Societal Impact of Emerging Technologies: A Novel Approach Combining Patent Data and Global News Media Analysis

ABSTRACT. This study proposes a novel approach to measuring the societal impact of emerging technologies by combining patent data with the Global Database of Events, Language, and Tone (GDELT). Using genetically engineered food as a case study, we applied topic modeling to patent data and extracted policy-related themes from GDELT. Cosine similarity was used to match the themes and topics. Preliminary results indicate a shift from negative to more balanced views on genetically engineered food, with certain technologies positively influencing government policies. This research aims to develop new metrics for assessing the societal impact of emerging technologies, contributing to tech mining and informed decision-making.
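
As an illustration of the matching step (a minimal sketch with made-up term-weight vectors, not the authors' code), cosine similarity between a patent topic and a GDELT theme over a shared vocabulary can be computed directly:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two term-weight vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical term weights over a shared vocabulary
# (e.g., topic-word weights from a patent topic model vs. GDELT theme keywords).
patent_topic = np.array([0.30, 0.25, 0.10, 0.00, 0.05])
gdelt_theme  = np.array([0.20, 0.30, 0.00, 0.15, 0.10])

score = cosine_similarity(patent_topic, gdelt_theme)
print(f"topic-theme similarity: {score:.3f}")  # pairs above a threshold are matched
```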

08:53
Identifying S&T Priorities Using Domain Knowledge and Pre-trained Model
PRESENTER: Xuezhao Wang

ABSTRACT. Accurately identifying S&T priorities of various countries is important for understanding their S&T planning and future arrangements. We propose a framework for identifying S&T Priorities. It employs a comprehensive approach, integrating domain knowledge, feature definition, automatic classification, and text refinement. This framework enables the stepwise filtering and accurate identification of S&T priorities within news texts related to S&T policies and strategic plans. We have preliminarily verified the effectiveness of this framework through manual checks of identification results by librarians. In addition, the processing procedures and techniques involved in each step of the framework exhibit strong universality, offering good applicability and generalization when used for other types of text analysis. However, it should be noted that there is room for improvement in terms of threshold determination and model performance.

09:01
Automating Reproducible Bibliometrics with OpenAlex
PRESENTER: John Culbert

ABSTRACT. Bibliometric studies and Science, Technology & Innovation (STI) research that rely on bibliometric data from the Web of Science (WoS) or Scopus suffer from a lack of openness, transparency, and reproducibility, as researchers are not permitted to freely share and publish the underlying data from their analyses. Workarounds such as “we searched the [query terms q] and exported n records from WoS version m” are difficult to utilise because the underlying dataset from the study is unavailable.

Reproducibility has long been a key open issue in bibliometrics/scientometrics (e.g. Waltman et al., 2018), and the current data sharing and publishing restrictions of the commercial providers are not likely to change in the short term. As a consequence, bibliometric research based on WoS and Scopus data is likely to remain unreproducible and to lack the transparency required for Open Science research.

In this presentation, we showcase a proven methodology which supports the automatic composition of open bibliometric datasets selected from OpenAlex (Priem et al., 2022) which can be freely shared and published. We will develop a tool based on this methodology which is capable of processing bulk datasets of DOIs and locating matching articles or other records from OpenAlex.

Open bibliometric datasets enable better bibliometrics and promote openness in scientometrics research. Through facilitating easy bulk access to OpenAlex with the proposed tool, we aim to assist the production of open and reproducible research.
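
For readers unfamiliar with OpenAlex, a single record can be retrieved by DOI from its public REST API roughly as follows (a minimal sketch, not the authors' tool; the example DOI is arbitrary):

```python
import requests

def fetch_openalex_work(doi: str) -> dict | None:
    """Look up one work in OpenAlex by DOI via the public REST API."""
    url = f"https://api.openalex.org/works/doi:{doi}"
    resp = requests.get(url, timeout=30)
    return resp.json() if resp.ok else None

work = fetch_openalex_work("10.7717/peerj.4375")  # arbitrary example DOI
if work:
    print(work["display_name"], work["publication_year"])
```

A bulk tool of the kind described would loop such lookups over a list of DOIs (respecting the API's rate limits) and collect the matched records into a shareable dataset.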

09:09
Identification of Key Core Technologies Based on Patent Data Analysis: A Case Study of Small Molecule Drugs in the Field of Lung Cancer
PRESENTER: Haiyun Xu

ABSTRACT. Exploring the core themes related to small molecule drugs in the field of lung cancer, and their evolution, through patent literature is of significant importance for drug development strategies, technological development trends, research collaboration, and clinical treatment guidance. This paper first identifies the core domains within the field based on importance and dominance strength. Text clustering is then performed on the documents within the core domains using BERTopic, and a new SMTO theme evaluation system is constructed to identify the core themes. Finally, a core-theme evolution Sankey diagram is constructed and the evolution patterns are explored.
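
A minimal sketch of the BERTopic clustering step (assuming the `bertopic` package; the placeholder abstracts below are ours, not the study's data):

```python
from bertopic import BERTopic

# Placeholder corpus standing in for patent texts in the core domains.
docs = [
    "EGFR tyrosine kinase inhibitor for non-small cell lung cancer",
    "ALK inhibitor compound and pharmaceutical composition thereof",
    "KRAS G12C covalent inhibitor for treating solid tumors",
] * 50  # BERTopic needs a reasonably sized corpus

topic_model = BERTopic(min_topic_size=10)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())  # one row per discovered theme
```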

09:17
Patent Technology Gap Technology Opportunity Forecasting Study
PRESENTER: Wenjun Sun

ABSTRACT. In the patent map, not all blank spots are technological opportunities, and accurately predicting the real technological opportunities at patent blank spots is crucial for enterprises to seize market opportunities, reduce the risk of R&D investment, and gain competitive advantages. In this study, we construct a technology function matrix based on patent counts and IPC classification numbers to identify patent gap areas. We then use a Matrix Factorization algorithm to reconstruct the technology function matrix, predict the probability of realizing the technology opportunities in the patent gap areas from the reconstructed matrix, and further compare the prediction results by integrating technology similarity and function similarity. Finally, the organization and time dimensions are added to the technology function matrix and tensor reconstruction is carried out, making more accurate predictions by mining the connections and changes among the elements. The experimental results show that the proposed method can effectively predict the realization probability of technical opportunities in patent gap areas. By dividing the data into a prediction set and a validation set, the gap-area opportunities predicted with a higher realization probability in the prediction set were all realized in the validation set, further demonstrating the reliability of this study.
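
The matrix-reconstruction step can be illustrated with a generic non-negative matrix factorization (a sketch under assumed data, not the authors' implementation): zero cells of the technology-function matrix that receive a high reconstructed value are candidate opportunities.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical technology (rows) x function (columns) patent-count matrix;
# zeros are "blank spots" on the patent map.
M = np.array([
    [12, 0, 3, 7],
    [ 0, 9, 4, 0],
    [ 5, 2, 0, 8],
    [ 1, 0, 6, 0],
], dtype=float)

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(M)   # latent technology factors
H = nmf.components_        # latent function factors
M_hat = W @ H              # reconstructed matrix

# Blank spots with high reconstructed values are predicted opportunities.
for i, j in np.argwhere(M == 0):
    print(f"tech {i}, function {j}: predicted score {M_hat[i, j]:.2f}")
```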

09:25
Research on the Layout of the Industry Chain for Typical EDA Chips Software
PRESENTER: Ping Zhao

ABSTRACT. This research presents an exhaustive analysis of the Electronic Design Automation (EDA) industry, a cornerstone of the integrated circuit industry. The methodology employed integrates various approaches including patent analysis, expert consultation, similarity calculation, and machine learning techniques. A strategic approach, combining keyword and technology classification searches, is utilized within the Incopat database. This strategy leads to the identification and analysis of 15,537 EDA-related patents, providing a substantial pool of data for comprehensive study. The TF-IDF algorithm is employed to calculate the similarity between EDA patents across multiple defined dimensions. This complex process involves stages such as unified word segmentation, stop-word removal, vector calculation, and similarity calculation. The outcome is a multi-class classification of patents, a valuable tool for exploring the technological layout of leading R&D entities in the EDA market. A deeper investigation into the industry chain layout of the three EDA giants: Synopsys, Cadence, and Siemens EDA, is conducted through a meticulous survey of primary R&D entities. Our study reveals frequent patent transfers as a critical strategy for consolidating technological advantages. In conclusion, the paper underscores potential expansion areas for EDA technology, with a particular emphasis on the biotech industry, new energy vehicle industry, and artificial intelligence sector. We advocate for increased collaboration and synergies among enterprises, universities, and research institutions to cultivate a more robust and dynamic EDA ecosystem. This collaborative approach, we argue, will be essential for driving innovation and sustaining growth in the EDA industry.
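
The similarity-calculation stages (tokenization, stop-word removal, vectorization, cosine similarity) map onto a few scikit-learn calls (a sketch on placeholder English texts; the unified word segmentation described above applies to the study's own corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder patent texts; the study works on 15,537 Incopat records.
patents = [
    "logic synthesis method for integrated circuit design automation",
    "placement and routing optimization in chip physical design",
    "circuit simulation engine for analog design verification",
]

vectorizer = TfidfVectorizer(stop_words="english")  # tokenize + drop stop words
tfidf = vectorizer.fit_transform(patents)           # TF-IDF weighting

sim = cosine_similarity(tfidf)                      # pairwise patent similarity
print(sim.round(2))
```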

09:33
GitHub as a Tech Monitoring Tool: Case with Quantum Technologies

ABSTRACT. This study investigates the use of GitHub as a technology monitoring tool to understand trends in newly emerging and disruptive technologies that have a significant impact on developing cybersecurity capabilities, such as quantum technologies. We collected GitHub projects dedicated to two branches of quantum technology closely related to cyber defense aspects and analyzed their content using NLP techniques. Our analysis, based on star indicators that capture the support of the GitHub community, reveals that quantum technologies are in an early stage of adoption, with top repositories primarily focused on quantum learning materials. There is also notable interest in quantum algorithms and circuit development to unlock the technology's full potential. Keyword-based analysis on README files attached to each GitHub project highlights the need for transparent guidelines given the complexity of quantum technology. Python plays a central role in quantum development. Time-series-based trend analysis indicates growing projects addressing various aspects, such as integrating classical and quantum computing, post-quantum cryptography solutions, and improving quantum simulation capabilities.
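
Collecting star-ranked repositories for a topic can be sketched with GitHub's public search API (unauthenticated and rate-limited; the query term is illustrative, not the study's exact search strategy):

```python
import requests

def top_repos(query: str, n: int = 5) -> list[dict]:
    """Fetch the most-starred repositories matching a search query."""
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": query, "sort": "stars", "order": "desc", "per_page": n},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

for repo in top_repos("post-quantum cryptography"):
    print(repo["stargazers_count"], repo["full_name"])
```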

08:45-09:50 Session 3B: Power Talks

This session is a speed round of 5-8 minute Power Talks, showcasing research-in-progress

Location: Lux
08:45
Identifying Key-Core Technologies in Biosensor Development Based on Semantic Associations of Technology Structure-Function
PRESENTER: Man Jiang

ABSTRACT. Identifying and mining deep semantic information from patent texts is essential for accurately capturing technological development opportunities and guiding the strategic orientation of Key-Core technologies. This paper presents a novel method for identifying Key-Core technologies by analyzing semantic associations between technology structure and function. By integrating sentence-level structural information with word-level semantic data, the proposed approach enhances the accuracy and depth of Key-Core technology identification. The method begins by extracting technology structure-function elements based on their positions within patent texts, followed by using an improved SAO (Subject-Action-Object) model to capture the internal semantic structure of these elements. Next, a multidimensional set of indicators is developed to assess the KC-Score of technologies, focusing on their criticality, centrality, and cohesiveness. Additionally, a knowledge graph is constructed to visually represent both internal and external semantic associations of technology structure and function, providing actionable insights for researchers and technology managers. The method’s effectiveness is validated through its application to biosensor technology, where it successfully identifies key structural elements, such as multi-layer functional materials and nucleic acid engineering, and their associated functions. This study offers a comprehensive tool for advancing biosensor technologies and provides a framework applicable to other technological domains, supporting innovation and strategic planning.
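
A bare-bones Subject-Action-Object extraction over dependency parses (a generic sketch using spaCy's small English model, not the improved SAO model described above) looks like this:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def extract_sao(text: str) -> list[tuple[str, str, str]]:
    """Collect (subject, verb, object) triples from dependency parses."""
    triples = []
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.lefts if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.rights if c.dep_ in ("dobj", "attr")]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    return triples

print(extract_sao("The multi-layer functional material amplifies the sensor signal."))
# e.g. [('material', 'amplify', 'signal')]
```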

08:53
Digital Tools in the Circular Economy: A Scientometric Analysis Using AI and Direct Citation Clustering
PRESENTER: Vinicius Muraro

ABSTRACT. In the field of sustainability, the circular economy (CE) has emerged as a transformative approach to resource utilization and waste minimization. More recently, digital tools have shown potential to accelerate the transition to more sustainable and circular practices. This paper adopts a scientometric methodology that uses the Leiden algorithm to cluster publications based on direct citations, exploring the integration of digitalization and artificial intelligence (AI) in the circular economy. We refine this approach by focusing on a narrower dataset that captures the interplay between technological advancements and circular economic practices. By employing a large language model, we generate concise, informative labels for each cluster, thereby enhancing the understanding of cognitive connections within this interdisciplinary field. The study analyzes the evolution of topics over time, assessing how digital tools and AI have influenced the productivity and thematic development of circular economy research. Preliminary results suggest a focus on manufacturing processes and data-driven strategies in resource management, highlighting the pivotal role of AI and digital technologies in advancing CE principles. This analysis not only underscores the dynamic nature of the circular economy but also provides a methodological blueprint for future research in integrating advanced technologies with environmental sustainability efforts.
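
The direct-citation clustering step resembles the following sketch with the `igraph` and `leidenalg` packages (toy edges stand in for the citation links):

```python
import igraph as ig
import leidenalg as la

# Toy direct-citation links between six papers (ids 0-5).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
g = ig.Graph(n=6, edges=edges)

partition = la.find_partition(g, la.ModularityVertexPartition, seed=0)
for cluster_id, members in enumerate(partition):
    print(f"cluster {cluster_id}: papers {members}")
```

Each resulting cluster of papers would then be passed to a large language model to generate a concise label, as described above.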

09:01
The Emerging Family Model Based on Knowledge Gene Theory: A New Tool for Discovering Technology Opportunities
PRESENTER: Zhinan Wang

ABSTRACT. The identification of emerging technologies based on emerging scores may miss many technology opportunities. The significant impact of emerging technologies on society cannot be achieved by a single emerging technology alone. The inherent properties of technology reveal that it has attributes of heredity and variation. Many disappearing or unstable emerging technologies actually achieve diffusion effects through new technologies after undergoing heredity and variation. Therefore, this paper introduces knowledge gene theory on top of the emerging score indicator. First, it classifies emerging technologies based on changes in their emerging scores. Then, by extracting the knowledge genes of the emerging technologies, it identifies emerging families and constructs influence forecasting indicators for these families using deep learning methods. The emerging families that are expected to have high impact in the future are considered technological opportunities. This paper uses the smart manufacturing field to verify the effectiveness of the approach for discovering technological opportunities.

09:09
Knowledge Transfer from Interdisciplinary Research on Circular Economy to Academic Disciplines - a bibliometrics study
PRESENTER: Vinicius Muraro

ABSTRACT. This study investigates the knowledge transfer from interdisciplinary research on Circular Economy (CE) to classical academic disciplines (ADs) such as Engineering, Economics, Geography, and Management. Despite the prolific publication of CE research in interdisciplinary journals, there is a notable discrepancy in its integration into traditional AD journals. Utilizing a bibliometric analysis of 22,400 articles indexed in Web of Science, this paper explores the dissemination and evolution of CE publications across both disciplinary and interdisciplinary journals. The study employs Journal Citation Reports for journal categorization and utilizes tools like Excel and VantagePoint to map the flow of knowledge within the field. Our findings reveal that a significant majority of CE research is published in interdisciplinary platforms, particularly in Engineering, highlighting a robust intersection of knowledge from various disciplines. The study not only identifies the disciplines incorporating CE into their evolution but also offers insights into those that are lagging, proposing a methodological protocol for further academic inquiry.

09:17
Research on Patent Portfolio Identification Based on Matrix Similarity
PRESENTER: Chongjun Xi

ABSTRACT. Patent portfolio identification is of great significance for optimizing the screening mechanism of patented technologies and for the cultivation and transformation of high-value patents. Starting from the concept of the patent portfolio, this study selects patent similarity and patent complementarity indicators to characterize the relatedness of patents, and constructs a patent relationship matrix from indicators chosen according to the characteristics and functions of patent portfolios. Using the degree matrix of the patent relationship matrix together with the patent relevance matrix, potential patent portfolios are identified through clustering. The results show that the proposed method is scientific, reasonable, and interpretable, and that the identified patent portfolios better satisfy the defining features of patent portfolios.
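
The abstract does not name the clustering algorithm, but one natural reading of the degree-matrix-plus-clustering step (an assumption on our part) is spectral clustering over the patent relationship matrix treated as an affinity matrix:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical symmetric patent relationship matrix combining
# similarity and complementarity scores for six patents.
R = np.array([
    [1.0, 0.8, 0.7, 0.1, 0.0, 0.1],
    [0.8, 1.0, 0.6, 0.0, 0.1, 0.0],
    [0.7, 0.6, 1.0, 0.2, 0.1, 0.1],
    [0.1, 0.0, 0.2, 1.0, 0.9, 0.7],
    [0.0, 0.1, 0.1, 0.9, 1.0, 0.8],
    [0.1, 0.0, 0.1, 0.7, 0.8, 1.0],
])

model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(R)   # internally uses the graph Laplacian (degree matrix)
print(labels)                   # e.g. [0 0 0 1 1 1] -> two candidate portfolios
```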

09:25
Investigating Past, Present, and Future Trends on the Interface Between Marine and Medical Research & Development: A Quantitative Strategic Foresight Analysis
PRESENTER: Mehdi Zamani

ABSTRACT. The convergence of marine sciences and medical studies has the potential for substantial advances in healthcare. This study uses bibliometric and topic modeling methods to map the progression of research themes from 2000 to 2023, with an emphasis on the interdisciplinary interface of marine and medical sciences. Using the Hierarchical Dirichlet Process, we identified dominant research topics in three periods, emphasizing shifts in focus and developing trends. Our data show a significant rise in publication output, indicating a growing interest in using marine bioresources for medical applications. The paper identifies two main areas of active research, "natural product biochemistry" and "trace substance and genetics", each of which has great therapeutic potential. We used social network analysis to map the collaborative networks and identify the prominent scholars and institutions driving these breakthroughs. Our study indicates important paths for research policy and R&D management. It underscores the significance of quantitative foresight methods and interdisciplinary teams in identifying and interpreting future scientific convergences and breakthroughs. It also highlights critical insights for policymakers and research and development managers operating at the crossroads of healthcare innovation and environmental sustainability.

10:00-11:15 Session 4A: AI Readiness, Scientific Contributions, and Frugal Innovations
Location: Spektrum
10:00
An assessment of AI readiness for the adoption of AI technologies in organisations
PRESENTER: Nigel Walton

ABSTRACT. As artificial intelligence grows exponentially, its adoption and successful implementation will be critical to the success and survival of modern companies. However, research on AI readiness and AI adoption is still in its infancy, resulting in a lack of guidance for researchers and practitioners alike.

It is the purpose of this paper to advance the understanding of AI-readiness by undertaking a systematic literature review of published work on the topic and to propose a new conceptual AI Readiness Levels (AI-RL) framework.

Although some early foundational research has been carried out in the field, it is fragmented and lacks any form of seminal model. The complexity of AI as a technology and its manifold organisational applications currently make a one-size-fits-all solution highly elusive.

The weakness of current research on 'readiness' is that it is mostly conceptualised around psychological factors, such as readiness for change, and general contextual factors, rather than the digital and technological factors relating to data structures, use cases, and AI. Readiness models also require context-specific consideration and need to be tailored to the relevant domain, such as the industry, technology, or organisational characteristics.

Assessing the current state of AI readiness within organisations therefore adds little value without acknowledging the aspired future AI adoption purpose and how the two are aligned.

10:25
Exploring Scientific Contributions through Citation Context and Division of Labor
PRESENTER: Liyue Chen

ABSTRACT. Scientific contributions are a direct reflection of a research paper's value, illustrating its impact on existing theories or practices. Existing measurement methods assess contributions based on the authors' perceived or self-identified contributions, while the actual contributions made by the papers are rarely investigated. This study measures the actual contributions of papers published in Nature and Science using 1.53 million citation contexts from citing literature and explores the impact pattern of division of labor (DOL) inputs on the actual contributions of papers from an input-output perspective. Based on the automatic identification results, it analyzes the relationship between the distribution of actual contribution types and DOL input types, as well as the internal relationships among different types of scientific contributions. Results show that experimental contributions are predominant, contrasting with the theoretical and methodological contributions self-identified by authors. This highlights a notable discrepancy between actual contributions and authors' self-perceptions, indicating an "ideal bias." There is no significant correlation between the overall labor input pattern and the actual contribution pattern of papers, but there is a positive correlation between input and output for specific types of scientific contributions, reflecting a "more effort, more gain" effect. Different types of labor division input in research papers exhibit a notable co-occurrence trend. However, once the paper reaches the dissemination stage, the co-occurrence of different types of actual scientific contributions becomes weaker, indicating that a paper’s contributions are often focused on a single type.

10:50
Assessing Frugal Inventions by means of Generative Artificial Intelligence
PRESENTER: Andre Herzberg

ABSTRACT. The use of frugal innovations has emerged as a crucial strategy for addressing the challenges of sustainable development and fostering inclusive growth. Starting in developing countries, their application has also migrated to more developed countries to address new customer groups. Before frugal innovations reach the market, they start as frugal inventions, which are often protected by intellectual property in the form of patents. In this line of research, the identification and evaluation of frugal patents is carried out in a semi-automatic procedure. Previous research showed that evaluation via a machine learning approach in the form of topic modeling is of limited use because precision suffers. Our basic research question addresses this shortcoming: can a Large Language Model outpace traditional approaches to assessing the frugality of patents? To answer this question, we test the use of a Large Language Model, more precisely GPT-4, for the evaluation of frugal patents. For this purpose, we assess the frugality of 172 patents in the technology field of white goods. We test four basic propositions, which mainly deal with the precision of the large language models used, the explainability of their evaluation results, the relevance of prompt engineering, and the comparison to previous approaches. As managerial implications, we identify efficient ways of assessing the frugality of a patent. This is of particular importance for practitioners, as it remarkably reduces the workload and costs of identifying such innovations.
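
A minimal sketch of prompting a chat model to rate a patent's frugality (using the `openai` client; the prompt wording, rating rubric, and patent text are illustrative, not the authors' protocol):

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = (
    "You assess patents for frugality (e.g. cost reduction, concentration on "
    "core functionality). Rate the patent from 0 (not frugal) to 10 (highly "
    "frugal) and justify briefly.\n\nPatent abstract:\n{abstract}"
)

abstract = "A washing machine drum drive using fewer components ..."  # placeholder
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": PROMPT.format(abstract=abstract)}],
    temperature=0,
)
print(response.choices[0].message.content)
```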

10:00-11:15 Session 4B: Leveraging Large Language Models (LLMs)
Location: Lux
10:00
Extracting Semantic Entity Triplets by Leveraging LLMs
PRESENTER: Andrei Kucharavy

ABSTRACT. As Large Language Models (LLMs) become increasingly powerful and accessible, concerns are rising about the automatic generation of academic papers. Several instances of undeniable LLM usage in reputable journals have been reported. Probably many more articles have been partially or entirely written by LLMs but not yet detected, posing a threat to the veracity of academic journals.

The current consensus among researchers is that detecting LLM-generated text is ineffective or easy to evade in a general setting. Therefore, we explore an alternative approach, targeting the stochastic nature of LLMs. As LLMs are stochastic text generators, hallucinations in long texts are a persistent problem, and the generated output regularly contains counterfactual components. Semantic entity triplets can be used to assess a text's factual accuracy and filter the publication corpus accordingly.

Previous work has built a classical triplet-extraction pipeline based on spaCy. However, the limitation of this method is the retrieval of relatively few triplets that tend to be overly generic, to the point of being domain-agnostic. We overcome these limitations by applying few-shot prompting on the recently released Meta-Llama-3-8B-Instruct. The results show we can extract more triplets per paragraph than the classical extraction method. Moreover, we show that the triplets are more specific and find no evidence of hallucination when comparing the extracted subjects and objects to the original reference text.
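
A few-shot prompt along these lines can be run with a recent `transformers` text-generation pipeline (a sketch; the model is gated on Hugging Face, and the example sentences are ours, not the paper's):

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated; requires HF access token
    device_map="auto",
)

few_shot = [
    {"role": "system", "content": "Extract (subject, relation, object) triplets."},
    {"role": "user", "content": "Aspirin inhibits platelet aggregation."},
    {"role": "assistant", "content": "(aspirin, inhibits, platelet aggregation)"},
    {"role": "user", "content": "BERT improves performance on NER benchmarks."},
]

out = generator(few_shot, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"][-1]["content"])  # the model's extracted triplets
```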

10:25
Exploring Citation Impact in Circular Economy Research: An Analysis of Expected Citations Based on LLM-Generated Lexical Similarities Between Papers

ABSTRACT. We present a novel approach for estimating citation impact in circular economy research using Large Language Models to create lexical similarity relationships between papers. By applying cosine similarity, we weigh the estimated citations for each paper based on the citations of their most similar papers. This approach builds on the concept of related records and employs a bottom-up clustering methodology for citation-based assessments, enhancing the granularity and accuracy of bibliometric analysis. Our dataset consists of publications from 2001 to 2022 sourced from the Web of Science Core Collection and processed by ECOOM. Using this comprehensive dataset, we identify thematic clusters and apply normalization by publication year, cluster, and document type to achieve the most accurate citation estimations. This combined normalization strategy yielded improved results, providing a more nuanced understanding of citation impact in Circular Economy research.
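
The weighting step can be written down directly: the expected citations of a focal paper are the similarity-weighted average of the citation counts of its most similar papers (a sketch with made-up numbers, before the normalization by year, cluster, and document type described above):

```python
import numpy as np

def expected_citations(similarities: np.ndarray, citations: np.ndarray, k: int = 3) -> float:
    """Similarity-weighted mean of the citation counts of the k most similar papers."""
    top = np.argsort(similarities)[::-1][:k]
    weights = similarities[top]
    return float(weights @ citations[top] / weights.sum())

# Hypothetical cosine similarities of five candidate papers to a focal paper,
# together with their observed citation counts.
sims = np.array([0.91, 0.85, 0.80, 0.40, 0.10])
cites = np.array([120, 40, 65, 300, 8])

print(f"expected citations: {expected_citations(sims, cites):.1f}")
```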

10:50
Study on patent function words auto extraction based on a small LLM
PRESENTER: Chunjiang Liu

ABSTRACT. When tackling extensive patent data analysis, technology function matrix analysis stands out as a crucial method for delving deep into patent technology content and portfolios. However, conventional approaches to patent technology function matrix analysis rely heavily on manual technical decomposition and indexing. This process often leads to extended analysis durations, cumbersome indexing procedures, and challenges in keeping the matrices updated. Consequently, widespread adoption and promotion of this analysis method is constrained. In recent years, the emergence of large language models (LLMs), exemplified by ChatGPT developed by OpenAI, has presented novel solutions to these challenges. LLMs exhibit the capacity to handle intricate and vast datasets, engage in representation learning, automatically discern the underlying structures within data, and facilitate the organization and utilization of knowledge (Wang et al., 2023). However, mindful of the computational resources and costs associated with model training and deployment, it is necessary to distill the insights of complex large-scale models into more compact forms, akin to the process whereby teachers impart knowledge to students (Tian et al., 2024). This approach, known as knowledge distillation, aligns with the Teacher-Student model proposed by Hinton et al. (2015). In light of these considerations, this paper introduces a Teacher-Student technology distillation model and conducts an empirical analysis utilizing a small LLM.
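
The Teacher-Student objective of Hinton et al. (2015) in PyTorch form (a generic sketch of the distillation loss, not the paper's model):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton et al. (2015): soft-target KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 examples, 3 classes (e.g. hypothetical function-word categories).
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student, teacher, labels))
```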

11:15-11:45 Coffee Break
11:45-13:00 Session 5A: Innovative Text Mining and Technological Insights
Location: Spektrum
11:45
Deep Learning-based Text Mining for Technology Monitoring in the Automotive Domain

ABSTRACT. The "Text2Tech" research project aims to automate the extraction of technology-related data and its associations from unstructured text sources, such as patents, research papers, and industry news, using Natural Language Processing (NLP). Specifically, the project focuses on Named Entity Recognition (NER) and Relation Extraction (RE), fundamental tasks in NLP, to identify and understand relations between extracted entities. This study assesses the efficacy of Large Language Models (LLMs) for these tasks within the automotive manufacturing context, which is typically characterized by limited training data.

We explored two main approaches: a prompt-based method using zero-shot and few-shot prompting strategies with models like GPT-3.5 and BART, and a fine-tuning approach that iteratively adjusts models based on semi-automatically labeled data. Our initial results indicate that while prompt-based methods are useful, fine-tuning significantly enhances model performance by tailoring it to specific domain needs.

Preliminary results demonstrate the challenges of adapting LLMs to domain-specific NER and RE, highlighting differences in performance between models and the nuanced nature of relationship extraction. For instance, BART-large trained with the REBEL approach shows promising early results compared to other models. Despite these advancements, the complexity of accurately capturing and categorizing relationships in automotive manufacturing documents remains high, as reflected by low inter-annotator agreement in manual labeling. Future work will include refining models and expanding the dataset for improved entity linking and relationship extraction accuracy.

12:10
Conceptualizing and Identifying Technological Leapfrogging: Insights from Punctuated Equilibria Theory
PRESENTER: Dongmei Ye

ABSTRACT. Technological development possesses an elusive nature. Current studies focus on exploring evolutionary patterns such as technological convergence and diffusion but lack exploration of major changes in technical state and their underlying processes. This is not conducive to genuinely understanding historical technological change. Therefore, this manuscript employs punctuated equilibrium theory to understand and dissect the technological advancement process. Our focus differs from prior efforts by operationalizing the technical status, which is critical for examining the technical level. Specifically, we first provide a phenomenological description and conceptualization of technological leapfrogging based on the theory of punctuated equilibrium and in conjunction with forms of technological evolution. On this basis, we quantitatively measure technological leapfrogging and apply the measure to stereolithography (SLA), the earliest 3D printing technique, as a case study.

12:35
Computational clues for finding information in text: application for UNESCO Geoparks’ documents
PRESENTER: Alysson Mazoni

ABSTRACT. Information retrieval has been expanding over the last few decades, mostly through techniques based on natural language processing. However, a more general approach that can be turned into a methodology for analysing large bodies of text is still not a reality. We study a sequence of articles describing UNESCO Geoparks (Patzak 1998; Henriques 2017) as our case study, where we aim to detect one particular piece of information: how is the local community treated in the sequence of documents? We aim to discover whether the community is included in planning and management, and in what form. We assembled a training dataset in which sentences citing the local community were manually selected and labelled by how the community is treated: as spectators, beneficiaries, stakeholders, or actors. Our procedure evaluates a few properties of the relevant sentences in the original texts:
- Position relative to the length of the text.
- Topic modelling as a representation of the sentences within the whole texts.
- Document embeddings for the sentences.
We obtained a success rate of 92% in detecting relevant sentences and 74% in automatically classifying how the local population is treated. Such an approach is therefore useful for preliminary analyses of large document sets when applied to specific sets of texts with correct training.

11:45-13:00 Session 5B: Advancing Knowledge Discovery and Analytics
Location: Lux
11:45
A knowledge graph of hematopoietic stem cell oriented to scientific knowledge discovery
PRESENTER: Zhengyin Hu

ABSTRACT. With the overwhelming size and rapid growth of the biomedical literature, it is a major challenge to accurately extract, mine, and identify new, useful, and potential knowledge from the literature through understandable patterns in a credible way. Knowledge Discovery in Biomedical Literature (KDiBL) aims to alleviate these issues by combining text mining, semantic techniques, and scientometric methods, and has become an important research area in TechMining. A Knowledge Graph (KG) can effectively link, integrate, and fuse heterogeneous information from multi-source data into a semantic network based on a graph structure. This study constructs a hematopoietic stem cell (HSC) KG to support KDiBL. The paper is organized as follows: Section 2 briefly introduces the HSC KG; Section 3 presents in detail a KDiBL case that discovers latent and unknown disease relations in HSC research based on the HSC KG; finally, a conclusion is given.

12:10
Enhancing Trust in Online Product Review Analytics: Designing Reviewer Behavioral Visibility for New Product Development

ABSTRACT. New product development leverages online product reviews for cost-effective customer feedback, but anonymity on digital platforms raises concerns about input data quality for analytics. As a remedy, this study explores how behavioral visibility in customer profiles can increase perceived reviewer trustworthiness to ease data quality assessment and promote the intention to use OPR analytics for managerial decision-making. Drawing on design science principles, qualitative expert interview data is analyzed to develop a solution approach that is demonstrated based on social set analysis and large language model-aided topic analysis. Structural equation modeling (PLS-SEM) finally evaluates the impact of reviewer behavioral visibility on managerial intent to use OPR data. The findings show that (1) reviewer behavioral visibility functions as an antecedent of perceived reviewer trustworthiness for new product developers; (2) the proposed novel solution approach effectively groups reviewers based on historical reviewing behavior, leveraging public reviewer profile data; (3) the underlying research model confirms that perceived trustworthiness fully mediates managerial usage intent and reveals different behavioral visibility factor hierarchies compared to customer decision-making. When successfully implemented, the approach helps assess the input data quality of online product review analytics, offering a simple and effective way to facilitate customer feedback integration in new product development and similar decision-making contexts.

12:35
Trade Marks Text Mining
PRESENTER: Gareth Jones

ABSTRACT. PURPOSE: Research was conducted to apply text mining techniques to the Trade Mark specification data field to gain insight into the modern Trade Mark landscape, including what goods and services customers seek protection for, how customers use the Trade Marks system, and trends for particular topics or words.

METHODOLOGY: Specification text strings from UK Trade Mark applications 2014-2023 are extracted and processed by removing symbols and numbers. The highest-performing sentence-BERT model encodes them into 768-dimensional vectors. UMAP dimension reduction transforms the data to 10 dimensions, and parameter optimization for HDBSCAN finds a density-based clustering. Cluster topics are identified by viewing cluster centroids.

RESULTS:

· 250 topics identified across Trade Marks specifications from 2014-2023 inclusive.

· Performance metric DBCV scores of 0.4 indicate an appropriate clustering for analysis.

· Two-thirds of the data was clustered; the remaining data contains keywords from several topics.

· The biggest clusters are ‘Clothing, Footwear, Sportswear’, ‘alcoholic beverages’, and ‘financial services’.

· Newest clusters are for AI and Blockchain software.

· Activity for CBD products peaked in 2019, after legalization the prior year.

· Generic terms such as ‘software’, ‘beer’, and ‘clothing’ show a decline.

· The impact of Covid is shown by activity for disinfectant products and medical masks in 2020.

CONCLUSION: The BERTopic methodology was successfully applied to identify 250 topics for the modern Trade Marks landscape. It explains how customers use Trade Marks, with ~20% of specification data containing keywords from 5 or more topics. Further analysis of technologies such as blockchain, AI, and gaming could explain how these fields have changed over time.
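
The methodology maps onto a few library calls (a sketch assuming the `sentence-transformers`, `umap-learn`, and `hdbscan` packages; the model name, parameters, and specification strings are illustrative, not the exact configuration used):

```python
from sentence_transformers import SentenceTransformer
import umap
import hdbscan

# Placeholder specification strings standing in for the UK application data.
specs = ["clothing, footwear, headgear", "beers; non-alcoholic beverages",
         "financial services; banking", "downloadable blockchain software"] * 25

# 1. Encode specification strings into 768-dimensional sentence embeddings.
model = SentenceTransformer("all-mpnet-base-v2")  # a 768-dim sentence-BERT model
embeddings = model.encode(specs)

# 2. Reduce to 10 dimensions with UMAP.
reduced = umap.UMAP(n_components=10, random_state=42).fit_transform(embeddings)

# 3. Density-based clustering with HDBSCAN; label -1 marks unclustered points.
clusterer = hdbscan.HDBSCAN(min_cluster_size=10)
labels = clusterer.fit_predict(reduced)
print(sorted(set(labels)))
```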

13:00-14:00 Lunch
14:00-15:15 Session 6A: Advancing Knowledge Discovery and Analytics
Location: Spektrum
14:00
News Sentiment Analysis Leveraging Large Language Model for International Relation Studies

ABSTRACT. In this study, we report our investigation of applying large language models (LLMs) to international relations studies. The goal is to apply LLM-based sentiment analysis to understand the chronological evolution of perceptions of a foreign country by analyzing news articles. This study uses the evolving Polish media view of China as an example. Sentiment analysis, the computational identification and categorization of subjective information expressed in text, has been applied in international relations studies in the past. However, sentiment analysis has witnessed significant advancements with the emergence of LLMs. LLMs, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), have revolutionized natural language processing tasks by exhibiting exceptional capabilities in understanding and generating human-like text. The benefits of using pre-trained LLMs are twofold: one is to avoid manually labeling sentiment tags, which can be time-consuming and expensive; two is to allow researchers to investigate not only the headlines but also the news article content. We gathered news articles from the Polish daily “Gazeta Wyborcza” with the keyword “Chin”. The time span of the articles collected is 34 years, from 1989 to 2023. In total, 18,409 articles were collected. Two hypotheses were tested: i) sentiment analysis based on news article content may provide more information than analysis based on the headline; ii) sentiment analysis with LLMs can provide results similar to those of human annotators. The experimental results support the acceptance of both hypotheses. This study opens up new ways of approaching international relations studies.

14:25
Innovation research analysis: A topic modeling of publications across countries
PRESENTER: Denilton Darold

ABSTRACT. Innovation research as a discipline has only existed for about 50-60 years. Early publications, for example by Schumpeter (1934), Bush (1945), or Arrow (1962), formed the basis of innovation research and innovation economics. However, an evolutionary process led to what innovation research is today. The systematic use of innovation theory and innovation research emerged mainly in the 1990s and continues to be an important scientific input, from both economics and management, into the policy making of industrialized countries. This paper intends to explore this evolution and analyze the current status of the discipline and the differences among countries. Using novel sentence-transformer models, we employ unsupervised topic modeling to analyze the evolution of innovation research in industrialized countries empirically, drawing on bibliometric data from Elsevier Scopus: about 220,000 publications, with more than 21,000 in the year 2022 alone. The chosen implementation, BERTopic, leverages sentence transformers and BERT, techniques that allow for a contextualized and nuanced text analysis.

14:50
International collaboration and high citation: Global impact or home country effect?
PRESENTER: Jue Wang

ABSTRACT. Scientific research has become more collaborative, which brings a number of advantages including higher citation rates. This study examines the factors contributing to higher citations by distinguishing between the quality of work and the home-country effect. Using international co-authorship as a key variable, we analyze citation patterns across a diverse range of fields over a 10-year period, and differentiate between citations accrued in the authors’ countries and citations received in other countries. The results demonstrate the presence of both a global impact and a home-country effect. Specifically, publications with international co-authorship receive significantly more citations from abroad, which strongly implies that international collaboration fosters high-quality research and positively impacts citation rates, especially when considering the relatively smaller foreign community size once the authors’ home countries are excluded. On the other hand, it is also observed that domestic citations from authors’ countries increase faster than foreign citations and the effect is more pronounced over a longer period of time, which suggests that home country effect plays an important role in accumulating citations through the increased visibility in the domestic research community. The study confirms the pivotal role of international collaboration in research impact and highlights the significance of network building.

14:00-15:15 Session 6B: Analyzing Technological Structures and Ecosystems
Location: Lux
14:00
A Motif Based Method for Technology Structural Similarity Analysis

ABSTRACT. This study aims to explore a novel approach for assessing the structural similarity of technology networks across different enterprises or organizations. Rather than focusing on macroscopic technological allocation, we delve into micro-level network topology by analyzing network motifs—local building blocks of network structure. In our empirical investigation within the biomedical field, we utilize patent citation data for network construction. The results reveal that by examining the distribution and similarity of motifs, we gain deeper insights into the micro-level correlations within enterprise technology networks. Furthermore, this analysis allows us to identify essential technological combinations. Overall, our method and findings provide a fresh perspective on understanding innovation activities and expanding enterprises’ technological horizons.
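
For three-node motifs in a directed citation network, the counting step can be sketched with NetworkX's triadic census (toy edges, not the study's biomedical patent data):

```python
import networkx as nx

# Toy directed patent-citation network.
G = nx.DiGraph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 3)])

census = nx.triadic_census(G)   # counts of all 16 directed 3-node motif classes
for motif, count in census.items():
    if count:
        print(motif, count)     # e.g. '030T' is the transitive (feed-forward) triad
```

Comparing such motif distributions across the networks of different enterprises yields the structural-similarity measure described above.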

14:25
The Evolution of the Innovation Ecosystem in the Semiconductor Industry
PRESENTER: Hung-Chi Chang

ABSTRACT. How does the semiconductor industry innovation ecosystem develop? How do institutional settings shape the evolution of the innovation ecosystem in the process of technological development? This study examines the development of the semiconductor industry innovation ecosystem and how institutional factors influence the evolution of the innovation ecosystem and technological advancement. We collect patent counts, citation counts, and industrial transactions in the Taiwan semiconductor industry as data sources to explore whether the changes between the two periods of the "Enforcement Rules of the Statute for Upgrading Industries" and the "Statute for Industrial Innovation" impacted the semiconductor industry. The results show that industrial policy had a significant positive impact on the semiconductor industry in the early stage. In the later stage, institutional factors did not affect innovation activities in the semiconductor industry. Policies and regulations have varying degrees of impact on industrial development and corporate performance. The innovation ability of the semiconductor industry shows its independence and maturity. These findings have important implications for understanding the role of policy in industrial development and its long-term trends.

14:50
The Illusion of Decline: Unpacking Intrinsic Network Effects in Scientific Disruption
PRESENTER: Vincent Holst

ABSTRACT. Recently, a new research field has emerged called Science of Science. As the name suggests, its main goal is to detect structural patterns in science itself. A common approach is given by evaluating bibliometric measures on citation networks. However, it is well known that even random networks exhibit non-trivial behaviors on certain measures. Therefore, it is crucial to distinguish true changes in scientific practices from intrinsic properties of large-scale networks. Here, we argue that the results of the widely recognized study by Park et al. (2023) can largely be attributed to such structural properties of citation networks. Based on the CD index (Funk & Owen-Smith, 2017), which is a measure of disruption in citation networks, the authors reported a decline in the disruptiveness of scientific and technological knowledge over time. However, when randomly rearranging the citations between papers in the SciSciNet data source (Lin et al., 2023) through Monte Carlo simulations, we observe that even the randomized citation networks yield a nearly identical decline in disruption compared to the true, underlying network. In other words, even in a counterfactual world where scientists cite randomly instead of referring to the papers and research articles they build upon, one would retrieve the results from Park et al. (2023).
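
The randomization step can be sketched as an out-degree-preserving shuffle of citation targets: every paper keeps its number of references, but the cited papers are reassigned at random (a simplified stand-in for the study's Monte Carlo procedure, ignoring multi-edge and self-citation handling):

```python
import random

def shuffle_citations(edges: list[tuple[int, int]], seed: int = 0) -> list[tuple[int, int]]:
    """Permute citation targets so each paper keeps its reference count."""
    rng = random.Random(seed)
    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    rng.shuffle(targets)          # preserves each paper's in- and out-degree totals
    return list(zip(sources, targets))

# Toy citation list (citing paper -> cited paper).
citations = [(10, 1), (10, 2), (11, 2), (12, 3), (12, 1)]
print(shuffle_citations(citations))
# The CD index would then be recomputed on each randomized network and
# compared with its value on the observed network.
```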

15:15-15:45 Coffee Break
15:45-16:45 Session 7: AI: Technical Implications and Ethical Issues for Bibliometrics

Joint Panel Discussion debriefing findings from the week's concurrent events: GTM2024 and Workshop on Bibliometric Measures of Epistemic Change

Moderators: Jochen Gläser and Alan Porter

Panelists: Wolfgang Glänzel, Tommaso Ciarli, Cassidy Sugimoto, and Yi Zhang

Location: Spektrum