| 10:10 | Density-adaptive Stream Text Clustering for Large Scale Dynamic Topic Modeling PRESENTER: Ming Liu ABSTRACT. The dynamic topic model extends traditional topic models to analyze temporal data, enabling the detection of topic evolution over time. However, most of these models require simultaneous access to all the data, making them unsuitable for processing continuously streaming data. This paper introduces a new dynamic topic modeling method, Density-Adaptive Stream Clustering (DASteamTopic), which is specifically designed for stream data. DASteamTopic combines pre-trained language models with a new stream clustering method that uses micro-clusters to improve memory efficiency and adaptability to streaming data. It also introduces a unique density-adaptive distance function to measure micro-cluster distances, enabling automatic determination of the number of micro-clusters and the detection of clusters with arbitrary shapes. The method has been verified on three datasets: DBLP, Tweet, and New York Times News. Compared with state-of-the-art dynamic topic models, DASteamTopic generates higher-quality topics, with experimental results showing an improvement in topic quality of up to 32%. Additionally, DASteamTopic exhibits less topic drift, ensuring that topics remain more consistent over time. |
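The micro-cluster bookkeeping this abstract describes can be sketched in a few lines. The `MicroCluster` class and the shrinking admission radius below are illustrative assumptions for exposition, not the paper's actual DASteamTopic algorithm:

```python
import math

class MicroCluster:
    """Summary statistics for a group of nearby points (illustrative)."""
    def __init__(self, point):
        self.n = 1                      # number of absorbed points
        self.linear_sum = list(point)   # per-dimension running sum

    @property
    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stream_cluster(points, base_radius=1.0):
    """Assign each incoming point to the nearest micro-cluster, or open a
    new one; no second pass over the stream is needed.  The admission
    radius shrinks as a cluster grows denser, a toy stand-in for the
    paper's density-adaptive distance."""
    clusters = []
    for p in points:
        best, best_d = None, float("inf")
        for c in clusters:
            d = euclidean(p, c.centroid)
            if d < best_d:
                best, best_d = c, d
        # density-adaptive threshold: denser clusters admit only closer points
        if best is not None and best_d <= base_radius / math.sqrt(best.n):
            best.absorb(p)
        else:
            clusters.append(MicroCluster(p))
    return clusters
```

Because each micro-cluster keeps only a count and a running sum, memory stays bounded by the number of micro-clusters rather than the stream length.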
| 10:30 | An Optimized BERTopic Modeling Framework for Emerging Technology Identification PRESENTER: Man Jiang ABSTRACT. Identifying emerging technologies is inherently challenging due to their weak signals, small scale, fragmented distribution, and cross-domain characteristics. In patent-based analyses, conventional BERTopic models face limitations in domain-specific semantic representation, outlier detection, and automated topic labeling. This study proposes an optimized BERTopic modeling framework for emerging technology identification. The framework integrates patent-domain embeddings, a main–sub hierarchical modeling strategy for reclustering outlier documents, multi-source keyword generation with semantic ranking, and large language model–based topic labeling. Empirical validation using digital health patents from 2015 to 2024 demonstrates that the proposed framework significantly improves semantic coherence, topic coverage, and sensitivity to marginal and cross-domain emerging topics compared with standard BERTopic configurations. The results indicate that the optimized framework enhances the applicability of BERTopic in patent-based emerging technology analysis and provides a reusable methodological reference for technology evolution analysis and forward-looking assessment. |
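The reclustering of outlier documents mentioned above can be illustrated with a minimal sketch, assuming document embeddings are already available; `recluster_outliers` and its `min_sim` cutoff are hypothetical names for illustration, not part of BERTopic or the proposed framework:

```python
import numpy as np

def recluster_outliers(embeddings, labels, min_sim=0.5):
    """Reassign outlier documents (label == -1) to the nearest existing
    topic centroid by cosine similarity; documents that stay below
    ``min_sim`` remain outliers and could feed a sub-level model."""
    labels = np.asarray(labels).copy()
    topics = sorted(set(labels[labels != -1]))
    # unit-normalise rows so dot products are cosine similarities
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids = np.stack([norm[labels == t].mean(axis=0) for t in topics])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    for i in np.where(labels == -1)[0]:
        sims = centroids @ norm[i]
        best = int(np.argmax(sims))
        if sims[best] >= min_sim:
            labels[i] = topics[best]
    return labels
```

The `min_sim` floor is what keeps genuinely novel documents out of existing topics, so they can be reclustered at the sub-level instead of being forced into a poor fit.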
| 10:50 | Beyond Flat Keywords: A Hierarchical and Functional Framework for Fine-Grained Scientific Intelligence Mining PRESENTER: Xiao Zhou ABSTRACT. Current keyword extraction methodologies typically treat documents as flat sequences, failing to capture global hierarchical structures and often suffering from “functional blindness” regarding the specific semantic roles of keywords. To address these limitations, we propose HC-SEKE, a framework that augments a DeBERTa-v3-large backbone with a parallel Mixture-of-Experts (MoE) module and a hierarchical context scoring mechanism. Furthermore, to efficiently categorize extracted keyphrases into five functional dimensions (i.e., Task, Method, Field, Dataset, Metric), we implement a knowledge distillation and Supervised Fine-Tuning (SFT) strategy that transfers the reasoning capabilities of GLM-4.5 into a lightweight Qwen3-0.6B model, thereby ensuring low-latency inference. Empirical results across benchmark datasets (including Inspec and Krapivin) demonstrate that HC-SEKE significantly outperforms state-of-the-art supervised baselines, achieving an F1@10 score of 57.9% on Inspec (a 1.4% improvement over the SEKE baseline). Additionally, qualitative evaluations and case studies validate the functional analyzer's effectiveness, demonstrating that it can precisely categorize keywords into five functional dimensions (e.g., Method, Task) and successfully recall semantically critical long-tail terms that are often missed by statistical methods. |
| 11:10 | Hybrid Metric-Guided Multi-Agent Debate for Keyphrase Extraction from Scientific Literature PRESENTER: Dianyuan Zhang ABSTRACT. Keyphrase Extraction (KPE) is a foundational task for navigating the exponential growth of scientific literature, facilitating essential applications such as information retrieval, document indexing, and text summarization. While Large Language Models (LLMs) have revolutionized zero-shot information extraction, existing unsupervised methods face significant challenges: (1) Hallucination, where models generate linguistically fluent but factually deviant phrases; and (2) Reasoning Stagnation, where a lack of objective self-correction mechanisms causes models to lock into erroneous initial stances, preventing the generation of new insights. To address these limitations, this paper proposes MetricMAD, a Hybrid Metric-Guided Multi-Agent Debate framework specifically designed for keyphrase extraction. We engineer an adversarial "Extractor-Critic" environment that leverages dialectical interaction to refine candidate phrases. To prevent the debate from devolving into shifting consensus without quality improvement, we introduce a novel destructiveness-based hybrid metric as a hard arbitration mechanism. This metric objectively evaluates the contribution of each phrase to the document's semantic integrity, guiding the multi-agent system toward convergence on semantically precise and factually reliable keyphrases. Extensive experiments across six standard benchmarks (Inspec, Krapivin, NUS, SemEval-2010, SemEval-2017, and DUC2001) demonstrate that MetricMAD significantly outperforms strong baselines and standard LLM prompting strategies without requiring annotated training data. These results establish a new state-of-the-art for zero-shot keyphrase extraction and offer a robust methodology for high-fidelity knowledge acquisition from scientific texts. |
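The Extractor-Critic-Arbiter round can be rendered as a toy sketch, assuming a simple bag-of-words proxy for the paper's destructiveness metric (the actual hybrid metric is not specified in the abstract, and all function names here are invented for illustration):

```python
from collections import Counter
import re

def destructiveness(doc_tokens, phrase):
    """Toy proxy for the hybrid metric: the share of the document's token
    mass that disappears when the phrase's tokens are removed."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    removed = sum(counts[t] for t in phrase.split())
    return removed / total

def debate_round(document, proposals, top_k=3):
    """One Extractor -> Critic -> Arbiter round.  The critic vetoes
    phrases absent from the document (a hallucination filter); the
    metric then ranks the survivors as a hard arbitration step."""
    tokens = re.findall(r"[a-z]+", document.lower())
    grounded = [p for p in proposals if p.lower() in document.lower()]  # critic
    ranked = sorted(grounded,
                    key=lambda p: destructiveness(tokens, p.lower()),
                    reverse=True)                                       # arbiter
    return ranked[:top_k]
```

The point of the hard arbitration step is that ranking is decided by an objective score rather than by whichever agent argued last, which is what prevents consensus drift.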
| 10:10 | Mapping Technology Landscapes and Optimization Pathways for Age-Friendly Mobile Applications ABSTRACT. This study aims to systematically identify the core technical features of current age-friendly apps and to quantify differences in aging-friendly technologies across app categories through technical mining methods, thereby providing scientific data support for app optimization. Using app store reviews, product descriptions, and interview transcripts of elderly users as research materials, we adopt LDA topic modeling to build a research framework centered on “text data mining-feature extraction”. Results indicate that technical mining successfully identified “large font display,” “voice interaction,” and “simplified navigation” as high-frequency technical features in age-friendly apps. Financial payment apps performed worst in “simplified operation processes,” while news and information apps showed significant deficiencies in “information density control.” Positive mentions of “voice assistance features” and “real-time feedback mechanisms” in elderly user interviews showed a significant positive correlation with the intensity of technical application. This study expands the application scope of technical mining in evaluating age-friendly products, providing crucial decision-making support for enterprise app iterations and for regulatory bodies establishing technical standards. |
| 10:30 | Identification of Technological Innovation Types Based on Multi-source Heterogeneous Information Network Under Policy Orientation ABSTRACT. Against the backdrop of strengthening the strategic layout of science and technology and promoting industrial transformation towards innovation-driven development, policies have become the core basis for defining key areas of technological innovation and allocating innovation resources. However, current research on technological innovation generally suffers from insufficient integration of the policy dimension, lack of multi-source heterogeneous data fusion, and weak forward-looking prediction capabilities. In response to these issues, this study integrates policy documents, patent literature, and scientific papers to construct a policy-oriented multi-source heterogeneous information network (HIN). Furthermore, the HAN-LSTM model is designed to achieve collaborative fusion of multi-dimensional node features and mine potential policy-technology associations through link prediction. A three-dimensional indicator system is then established to identify types of technological innovation. Subsequently, an empirical study is conducted focusing on the field of solar cells. Experimental results demonstrate that the HAN-LSTM model significantly outperforms baseline models in link prediction performance, effectively identifying incremental technological innovations and radical technological innovations under policy orientation, and providing differentiated guidance on innovation directions for different types of enterprises. This paper not only improves the identification method of technological innovation types under policy orientation but also expands the application scenarios of heterogeneous information networks in the field of technological innovation, thereby providing support for promoting the efficient allocation of policy-led science and technology resources and enterprise innovation decision-making. |
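The link-prediction step over a policy-patent-paper network can be illustrated with the classic common-neighbour baseline on a toy graph; this is a stand-in for exposition only, not a sketch of the HAN-LSTM model itself, and the node names are invented:

```python
from collections import defaultdict

def common_neighbor_scores(edges, candidate_pairs):
    """Score candidate links in a network by counting shared neighbours,
    a standard baseline for the link-prediction step: two nodes that
    already touch many of the same nodes are likely to become linked."""
    neigh = defaultdict(set)
    for u, v in edges:
        neigh[u].add(v)
        neigh[v].add(u)
    return {(a, b): len(neigh[a] & neigh[b]) for a, b in candidate_pairs}
```

In a heterogeneous network the neighbours counted here would be typed (policy, patent, paper), which is exactly the structure attention-based models such as HAN exploit instead of treating all edges alike.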
| 10:50 | Empirical Orthogonal Function Based Analysis of Domain Knowledge Collaborative Evolution: Revealing Knowledge Teleconnection Phenomena PRESENTER: Kaiwen Shi ABSTRACT. Against the backdrop of scientific and technological development increasingly relying on knowledge linkage and restructuring, traditional science and technology management paradigms based on proximity assumptions struggle to reveal the cross-disciplinary, non-adjacent knowledge interaction mechanisms underlying major original innovations. They particularly overlook the implicit, long-range, and time-lagged co-evolutionary patterns within knowledge ecosystems. To address this, this study introduces the concept of knowledge teleconnection, aiming to develop a domain-based co-evolutionary analysis method grounded in empirical orthogonal function (EOF) techniques. This approach seeks to detect and interpret synergistic relationships among knowledge units that are semantically distant within disciplinary spaces yet exhibit statistically correlated developmental trajectories. Using artificial intelligence as a case study, this research analyzes literature data from Web of Science spanning 2000–2024. By extracting key spatio-temporal modes through empirical orthogonal function analysis, it identifies distant correlation pairs within the field—such as between deep learning and traditional heuristic algorithms, or large language models and conventional machine learning—and constructs a distant correlation network to reveal AI's knowledge co-evolutionary structure. The findings not only theoretically expand analytical paradigms for knowledge co-evolution, offering new perspectives on understanding complex dynamic behaviors in innovation systems, but also provide actionable data-driven support for forward-looking science and technology management planning, interdisciplinary cultivation, and disruptive innovation anticipation. This advances the paradigm shift from static knowledge structure mapping to dynamic knowledge system comprehension. |
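EOF analysis of a years-by-keywords frequency matrix reduces to an SVD of the anomaly matrix; a minimal sketch, with all variable names assumed for illustration (a teleconnection then shows up as two keywords with large same-sign loadings on the same mode despite being semantically distant):

```python
import numpy as np

def eof_modes(data, n_modes=2):
    """Empirical Orthogonal Function analysis of a (years x keywords)
    frequency matrix via SVD.  Returns the leading spatial patterns
    (loadings per keyword), their time series, and explained variance."""
    anomalies = data - data.mean(axis=0)          # remove each keyword's mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    patterns = vt[:n_modes]                       # EOFs: keyword loadings
    pcs = u[:, :n_modes] * s[:n_modes]            # principal-component series
    explained = (s ** 2 / np.sum(s ** 2))[:n_modes]
    return patterns, pcs, explained
```

This is the same decomposition climatology uses to find teleconnections such as El Nino patterns, which is presumably where the study's terminology comes from.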
| 11:10 | A BERT-Driven Approach for Identifying Patent Risks in Export Control Contexts PRESENTER: Xiaoliang Zhang ABSTRACT. Amidst intensifying global technological competition, traditional manual methods for identifying patent-related export control risks are increasingly inadequate. This study proposes an intelligent screening approach based on Natural Language Processing (NLP). Focusing on quantum technology, we constructed a unified semantic representation model by fine-tuning Sentence-BERT with a weakly-supervised contrastive learning strategy, aligning patent texts with U.S. Export Control Classification Number (ECCN) regulations in a shared embedding space. Evaluated on a test set of 200 patents (81 controlled), the model calculates cosine similarity between patent and ECCN embeddings. Through systematic threshold optimization, it achieved a peak F1-score of 0.633 (Recall: 85.2%, Precision: 50.4%) at the optimal threshold. The high recall rate confirms its effectiveness as a preliminary filter to minimize missed controls. This work validates the feasibility of automated patent screening via semantic similarity. While precision requires further improvement, the proposed method offers an efficient, weakly-supervised solution for matching technical and regulatory texts, with significant practical application potential. |
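The threshold-optimization step described here (sweep candidate thresholds over cosine similarities and keep the F1-maximising one) is straightforward to sketch; the helper names are illustrative, and the embeddings would come from the fine-tuned Sentence-BERT model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def best_threshold(scores, labels):
    """Sweep every observed similarity score as a candidate threshold and
    return the threshold maximising F1 over the labelled test set, as in
    the abstract's threshold-optimization step."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

For a screening filter one could equally sweep for maximum recall subject to a precision floor, which matches the abstract's emphasis on minimizing missed controls.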