GTM2024: 14TH ANNUAL GLOBAL TECHMINING CONFERENCE
PROGRAM FOR MONDAY, SEPTEMBER 16TH

13:30-14:45 Session 1: Welcome and Keynote

Welcome

General Conference Co-Chairs - Alan Porter and Denise Chiavetta 

Program Chair - Rainer Frietsch

Keynote

Soeren Auer: "Leveraging NeuroSymbolic AI for Tech-Mining in the Open Research Knowledge Graph"

Location: Spektrum
14:45-15:15 Coffee Break
15:15-16:30 Session 2A: Innovative Approaches in Machine Learning
Location: Spektrum
15:15
Explainable Prediction of Knowledge Recombination: A Synergised Method with Large Language Models and Graph Learning
PRESENTER: Yi Zhang

ABSTRACT. Assuming that innovation is the recombination of existing knowledge, this study develops a synergised method that combines large language models (LLMs) and knowledge graphs to predict knowledge recombination in two phases: LLM-augmented graph embedding and graph-enhanced LLM inference. Technically, this work is a pioneering study in the GTM community, bringing fresh insights into AI's potential to revolutionise the paradigms of tech mining. Empirically, the study visualises the global topical landscape with potential recombinations in a science map, providing explainable evidence for understanding global trends in inter-, cross-, and transdisciplinary innovation.

15:40
Early identification of potential disruptive technologies using outlier detection method and deep learning from patent data
PRESENTER: Xin Li

ABSTRACT. Identifying potential disruptive technologies at an early stage of their formation is crucial to the strategic layout of enterprise R&D and to strategic decision-making on technological innovation. Patent data record a large amount of information on technology development activities and are considered an essential data carrier for disruptive technologies. Numerous scholars take patents as their primary data source and use various patent analysis methods to identify disruptive technologies. However, there is a lack of deep patent semantic mining approaches that extract the technical information embedded within patents at a fine-grained level. In this paper, we therefore propose a framework for the early identification of potential disruptive technologies. First, we introduce the SAOX structural semantic analysis method to mine technical information within patents in depth. Second, we integrate patent indicator features and patent text features to obtain more comprehensive technological information. Third, we use global and local outlier detection algorithms simultaneously to identify outlier patents. Finally, we use deep learning to predict the number of forward citations of the outlier patents containing potential disruptive technologies, in order to measure the impact of these technologies and identify them early. We take artificial intelligence technology as a case study to verify the feasibility and effectiveness of the framework.
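
The difference between global and local outlier detection mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the algorithms used, so the two scoring functions below are simplified stand-ins chosen for illustration: a z-score of the distance to the centroid (global) and a k-nearest-neighbour distance ratio loosely inspired by local outlier factor (local). The function names, the toy points, and `k=2` are all assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def global_scores(points: List[Point]) -> List[float]:
    """Z-score of each point's distance to the centroid of all points."""
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    dists = [math.dist(p, centroid) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [(d - mean) / std if std else 0.0 for d in dists]


def local_scores(points: List[Point], k: int = 2) -> List[float]:
    """Ratio of a point's mean k-NN distance to that of its neighbours.

    Values near 1 mean the point is as dense as its neighbourhood;
    values well above 1 mean it is isolated relative to its neighbours.
    Requires len(points) > k.
    """
    def knn(i: int):
        others = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(len(points)) if j != i
        )
        return others[:k]

    neighbours = [knn(i) for i in range(len(points))]
    mean_knn = [sum(d for d, _ in nb) / len(nb) for nb in neighbours]
    scores = []
    for i, nb in enumerate(neighbours):
        peer = sum(mean_knn[j] for _, j in nb) / len(nb)
        scores.append(mean_knn[i] / peer if peer else 0.0)
    return scores


# Hypothetical feature vectors: a dense cluster, one point just outside it
# (index 4), and a sparse cluster far away.
toy_points = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
    (0.6, 0.6),
    (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0),
]
toy_global = global_scores(toy_points)
toy_local = local_scores(toy_points, k=2)
```

In this toy data the point at index 4 sits close to the overall centroid, so a global score does not flag it, while the local score singles it out because it is far less dense than its neighbours. This is one reason the two families are often combined.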

16:05
A Data Pipeline for Identifying and Classifying Fiber Composite Companies via Web Scraping and Machine Learning
PRESENTER: Anne Kreuter

ABSTRACT. To enhance collaboration and innovation within the fiber composite materials sector, which is critical for advancing the circular economy in regions like Elbe Valley Saxony, this study introduces a data pipeline that integrates web scraping, machine learning, and synthetic data generation with large language models. Addressing the technological, organizational, and social challenges prevalent in the sector, the pipeline employs rank-based zero-shot learning and two transformer-based classifiers to filter and identify relevant web content, achieving an identification F1 score of 0.8. Subsequent categorization into ten distinct activity classes is a multi-label classification task, resulting in weighted and micro F1 scores of 0.56 and 0.59, respectively. The approach distinguishes relevant companies with precision and supports the sector's sustainability and digital transformation efforts, offering a tool for stakeholders in the circular economy.
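
The abstract reports both weighted and micro F1 for the multi-label step. As a minimal sketch of how the two averages differ, the following computes both from sets of true and predicted labels. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]]
) -> Tuple[float, float]:
    """Return (micro F1, support-weighted F1) for multi-label predictions."""
    labels = sorted(set().union(*y_true, *y_pred))
    counts: Dict[str, Tuple[int, int, int]] = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        counts[label] = (tp, fp, fn)

    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(
        sum(c[0] for c in counts.values()),
        sum(c[1] for c in counts.values()),
        sum(c[2] for c in counts.values()),
    )

    # Weighted: per-label F1 averaged by support (number of true instances).
    support = {label: c[0] + c[2] for label, c in counts.items()}
    total = sum(support.values())
    weighted = (
        sum(f1(*counts[label]) * support[label] for label in labels) / total
        if total else 0.0
    )
    return micro, weighted


# Hypothetical activity labels for four company pages.
toy_true = [{"a", "b"}, {"a"}, {"b"}, {"a"}]
toy_pred = [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}]
toy_micro, toy_weighted = multilabel_f1(toy_true, toy_pred)
```

Micro F1 treats every label decision equally, while weighted F1 averages per-class scores by class frequency, so the two can diverge when classes differ in size and difficulty.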

15:15-16:30 Session 2B: Advanced Analytical Techniques in Emerging Topics and Industry Sustainability
Location: Lux
15:15
Research on the Measurement Method of Hysteresis Effect in the Associated Evolution of Fund and its Funded Paper Topics
PRESENTER: Haiyun Xu

ABSTRACT. This study investigates synergy and hysteresis in the associated evolution of funding topics and the topics of funded papers, with the aim of improving the prediction of emerging trends. The Department of Social and Economic Sciences of the National Science Foundation of the United States and its associated paper data serve as the case study. First, an LDA model identifies topics in both fund and paper texts, and cosine similarity establishes the associations between them. Next, topic evolution time series are constructed. The hysteresis effect and its mechanism in topic evolution are then analyzed in terms of topic popularity and content, to support the prediction of emerging trends. The findings deepen understanding of knowledge generation, dissemination, and transformation processes in scientific research and offer insights for evaluating fund assistance and related efforts.
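
The association step described in the abstract, linking fund topics to paper topics by cosine similarity, can be sketched as follows. This is an illustration under stated assumptions: topics are represented as sparse term-weight dictionaries (such as the top terms of an LDA topic), and the topic identifiers, weights, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Topic = Dict[str, float]  # term -> weight


def cosine_similarity(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(
    fund_topics: Dict[str, Topic],
    paper_topics: Dict[str, Topic],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Dict[str, Tuple[str, float]]:
    """Map each fund topic to its most similar paper topic above a threshold."""
    matches: Dict[str, Tuple[str, float]] = {}
    for fund_id, fund_vec in fund_topics.items():
        best_id: Optional[str] = None
        best_sim = 0.0
        for paper_id, paper_vec in paper_topics.items():
            sim = cosine_similarity(fund_vec, paper_vec)
            if sim > best_sim:
                best_id, best_sim = paper_id, sim
        if best_id is not None and best_sim >= threshold:
            matches[fund_id] = (best_id, best_sim)
    return matches


# Hypothetical topics for illustration only.
toy_fund = {"F1": {"risk": 0.6, "market": 0.4}}
toy_papers = {
    "P1": {"risk": 0.5, "market": 0.5},
    "P2": {"gene": 1.0},
}
toy_matches = best_matches(toy_fund, toy_papers)
```

Once fund and paper topics are paired this way, their yearly popularity series can be compared to look for time lags between funding and publication.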

15:40
Early Detection Method for Weak Signal in the Evolution Process of Emerging Topic Networks
PRESENTER: Haiyun Xu

ABSTRACT. By analyzing the multiscale evolutionary sequences of knowledge types in early thematic knowledge networks, this study explores the characteristics and driving mechanisms of the early emergence and development of transformative innovation topics. By acquiring data on papers, patents, and product information, thematic co-occurrence networks are constructed to obtain scientific, technological, and industrial knowledge networks. Subsequently, temporal knowledge sub-networks of the three types are obtained by year, and from various micro-scale perspectives (nodes, edges, and graphlets) and meso-scale perspectives (topic communities), the number of multiscale knowledge types contained in the knowledge networks is extracted to form multiscale knowledge type evolutionary sequences. Dynamic time warping is used to measure the similarity between knowledge type evolutionary sequences, and the Isolation Forest and COPOD methods are employed to detect mutations in sequence evolution. Finally, the transfer entropy method is used to explore the predictive causal relationships between different knowledge type evolutionary sequences. This paper selects the fields of regenerative medicine (stem cells) and leukemia for comparative analysis, investigating the characteristics of knowledge type evolution in disruptive technology fields and incremental technology fields from a micro-meso linkage perspective, and analyzing the relationships and mechanisms of action among multiscale knowledge type evolution.
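
Dynamic time warping, which the abstract uses to compare evolutionary sequences, can be sketched with the textbook dynamic-programming recurrence. This is a minimal version assuming one-dimensional numeric sequences and absolute difference as the local cost. The abstract does not specify the variant or any windowing constraints used in the study.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the minimal cumulative cost of aligning the first i
    items of `a` with the first j items of `b`.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1] (repeat b)
                cost[i][j - 1],      # b[j-1] also aligned with a[i-1] (repeat a)
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


# Hypothetical yearly counts of a knowledge type in two networks.
science_series = [1, 2, 3]
technology_series = [1, 2, 2, 3]
toy_dtw = dtw_distance(science_series, technology_series)
```

Because the alignment may stretch or compress time, two sequences with the same shape but different pacing get a small distance, which suits comparisons between networks that evolve at different speeds.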

16:05
A Study on Sustainability of Emerging Industries: A Comparative Text-Mining and Thematic Analysis
PRESENTER: Hung-Chi Chang

ABSTRACT. How do emerging sectors implement social responsibility? How do emerging sectors achieve sustainability? This study explores how firms implement social responsibility to achieve sustainability in emerging sectors by analyzing the sustainability reports of 34 firms in the semiconductor industry and 34 firms in the finance and insurance industries, all publicly listed companies in Taiwan. Combining text-mining techniques and thematic analysis, the study attempts to delineate the definitions of sustainability for these industries by comparing and contrasting the preliminary text-mining results with the findings from analyzing the content of the firms' annual reports. The results show that the main keywords for the semiconductor industry were "Employees" and "Environment," while for the finance and insurance industry the main keyword was "Risk." This suggests that the semiconductor industry should care about employee well-being, training, and development and implement environmentally friendly practices, while the financial industry should establish an effective risk management mechanism to ensure stable operations and profitability. In conclusion, sustainability issues originate from enhancing business development to achieve stable operation and profitability in these industries. Integrating text-mining and thematic analysis strengthens the findings: the approach explores sustainability reporting in depth, uncovers subtle differences and hidden patterns, provides empirical evidence on corporate sustainability, and helps guide policy formulation and corporate practices.