GTM2019: 9TH GLOBAL TECHMINING CONFERENCE
PROGRAM FOR THURSDAY, OCTOBER 17TH


09:00-09:30 Session 1: Welcome and Keynote

Welcome - Denise Chiavetta and Alan Porter

Keynote "What are we "mining" when Tech Mining?"  - Caroline Wagner

Location: Rm 222
09:35-10:35 Session 2A: Network Analysis
Location: Rm 222
09:35
A Network-Based Automated Approach for Identifying Technological Spillovers with an Application in Solar Photovoltaics
PRESENTER: Bixuan Sun

ABSTRACT. Innovation spillovers are a central concept in theories of technological change, but detailed empirical studies have been limited. Utilizing advances in multiple academic disciplines, this paper develops a new methodology to identify the role of spillovers in the historic advancement of a technology domain. Our methodology consists of three unsupervised steps. First, we utilize patent citation networks to build a network of patents within a field, and identify the patents that represent key positions in that sector’s technological trajectory. Second, we utilize natural language processing to quantitatively measure the “technological distance” between patents. Specifically, we apply the Latent Dirichlet Allocation algorithm to categorize patent abstracts based on word co-occurrence within documents, and calculate the technological distance between two patent abstracts. Finally, we use econometric techniques to identify patterns in the technological distance between cited and citing patents at points in the citation network that suggest particular significance for the sector. These patterns quantify the dependence of high network importance on citation relationships with more distant prior art. We demonstrate our method by studying the development of solar photovoltaic (PV) technology over the period 1901-2018 using the PATSTAT database. The results show the contribution of other technology sectors to the development of technology critical for addressing energy and environmental challenges. The results can also help identify possible new roles and types of activities for public policy and research organization design, complementing other insights on the significance of public and private R&D funding and deployment support.
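
The LDA-based distance step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the toy corpus, 50-topic model, and choice of Hellinger distance are assumptions.

```python
# Illustrative sketch: LDA-based "technological distance" between two patent abstracts.
# Assumptions: pre-tokenized abstracts, 50 topics, Hellinger distance on topic mixtures.
import numpy as np
from gensim import corpora, models

abstracts = [
    "photovoltaic cell with improved silicon wafer texturing".split(),
    "thin film solar module using cadmium telluride deposition".split(),
    # ... thousands of tokenized patent abstracts in practice
]

dictionary = corpora.Dictionary(abstracts)
corpus = [dictionary.doc2bow(doc) for doc in abstracts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=50, random_state=1)

def topic_vector(bow):
    """Dense topic-probability vector for one document."""
    dense = np.zeros(lda.num_topics)
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        dense[topic_id] = prob
    return dense

def technological_distance(bow_a, bow_b):
    """Hellinger distance between two documents' topic mixtures (0 = identical)."""
    p, q = topic_vector(bow_a), topic_vector(bow_b)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

print(technological_distance(corpus[0], corpus[1]))
```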

09:55
Emerging Terms and Co-authorship Dynamics: The Case of Microneedle Technology
PRESENTER: Stephen Carley

ABSTRACT. This study aims to enhance our understanding of how research actors contribute to the variation and selection of topics within the domain of an emerging technology space. Our case study is microneedles. A network perspective on how research actors contribute to the generation and selection of topics associated with an emerging technology is needed to provide insight into how uncertainty and ambiguity are reduced throughout the emergence process by increasing topic coherence, and into how certain pathways are selected out during this process.

10:15
Frugal invention candidates as antecedents of frugal patents – the moderating role of frugal attributes analysed in the medical engineering technology
PRESENTER: Martin G Moehrle

ABSTRACT. Frugal innovations offer companies access to new market segments, based on some specific characteristics. Typically, they take an existing solution (be it physical goods or services) as a blueprint, change parts of its functionality, reduce costs (often dramatically), and focus on the performance of core features. For many companies, be it in developed or emerging countries, the question arises of how to find ideas for such frugal innovations. For this purpose, we introduce a patent-based method for the identification of frugal invention candidates and their qualification as frugal patents. We sketch a general four-step process, then apply this process to medical engineering technology. In particular, using semantic analysis, we shed light on the moderating role of frugal attributes in qualifying a frugal invention candidate as a frugal patent. The application results in a comprehensive set of frugal patents. Our approach deepens the understanding of frugality by providing an appropriate assessment based on a newly developed frugal thesaurus, as well as underpinning the role of engineering achievements (instead of simple downsizing). Managers and experts from other industries may adapt our method to their field of experience or search.

09:35-10:35 Session 2B: Novel Data
Location: Rm 225
09:35
Research on subject profile of stem cell based on knowledge graph
PRESENTER: Zhengyin Hu

ABSTRACT. With the expansion of research fields and the rapid increase of scientific literature, the knowledge structures of subjects are becoming more and more complex. A subject profile can reveal the multi-level and multidimensional knowledge structures of a subject, much as a user profile characterizes users. As a new knowledge organization technology, knowledge graphs can be used to construct subject profiles. Stem cells and regenerative medicine are a frontier of life science and are expected to become the third way to treat diseases, after drug therapy and surgery. Based on multi-source scientific literature and information, a domain knowledge graph of stem cells was constructed. Based on this knowledge graph, we are designing and constructing a subject profile of stem cells, which will help decision-makers and researchers grasp the overall state of stem cell research at different levels and from different perspectives.

09:55
Identifying the potential emerging technologies: A machine learning approach using academic papers and Twitter data
PRESENTER: Yang Wen

ABSTRACT. With the deep integration of science and technology development, emerging technologies such as nanotechnology, biotechnology and artificial intelligence are appearing rapidly. Identifying potential emerging technologies as early as possible is crucial for government and enterprise R&D strategic planning and innovation policies, in order to gain first-mover advantages in a competitive market environment and improve competitiveness. Researchers mostly use paper or patent data and apply quantitative methods (such as bibliometrics, patent analysis, and trend extrapolation) to identify emerging technologies, but they rarely use social media data related to emerging technologies (e.g., Twitter data), and these methods struggle to process large-scale data. Therefore, in order to avoid the one-sidedness of using paper or patent data alone, and to solve the problem of batch processing of large-scale data, this paper proposes a framework for identifying potential emerging technologies using machine learning methods with academic papers and Twitter data. In this framework, we first use machine learning to identify the seeds of emerging technologies using multiple science-breakthrough indicators. Second, we use Twitter data mining to analyze Twitter users' awareness of, response to, and expectations for these seeds of emerging technologies. Finally, we compare the results of machine learning and Twitter data mining to identify potential emerging technologies. Solar cell technology is selected as a case study. This paper contributes to identifying potential emerging technologies as well as understanding their emergence and development, and will be of interest to solar cell technology R&D experts.

10:15
The adoption patterns of advanced technologies in Canada

ABSTRACT. Technology adoption has multiple benefits, including productivity increases and higher-quality products, which in turn can lead to increased economic performance. The Industry 4.0 revolution is made possible by advances in ICT that allow the integration of technologies such as cloud computing and IoT, leading to smart manufacturing (SM). This paper aims at understanding the adoption patterns of advanced technologies by Canadian firms. We use the apriori algorithm, which looks for patterns in technology adoption, and focus on a market basket analysis approach to understand what bundles of technologies are being adopted by Canadian firms. Our data come from the 2014 edition of the Survey of Advanced Technology (SAT) provided by Statistics Canada. Results show that only 5% of firms had adopted 3D printing in 2014 or before. If 3D printing for metals was adopted, there is a 75% chance that 3D printing for plastics was also adopted. This particular set of technologies is isolated from the rest, suggesting that only early adopters have been experimenting with it. Furthermore, when we look at what firms are planning to adopt, 3D printing is the most popular set of technologies, suggesting that 2014 was in fact the birth of Industry 4.0. The study confirmed the low uptake of key SM technologies. It also showed that adopting advanced technologies can be a complex process, as firms usually must adopt not one technology but a bundle of technologies.
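
The apriori / association-rule step could look like the sketch below, which uses the mlxtend implementation on a toy one-hot adoption matrix. The technology names, support and confidence thresholds are assumptions, not the authors' Statistics Canada setup.

```python
# Illustrative sketch: apriori on a firm-by-technology adoption matrix.
# Assumptions: a one-hot pandas DataFrame with hypothetical technology names and thresholds.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

adoption = pd.DataFrame(
    [[1, 1, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 1]],
    columns=["3d_print_metal", "3d_print_plastic", "iot_sensors", "cloud_computing"],
).astype(bool)

frequent = apriori(adoption, min_support=0.05, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)

# A rule {3d_print_metal} -> {3d_print_plastic} with confidence 0.75 would mirror
# the pattern reported in the abstract.
print(rules[["antecedents", "consequents", "support", "confidence"]])
```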

10:35-11:00 Coffee Break
11:00-12:20 Session 3A: Topic Identification
Location: Rm 222
11:00
Identifying the latent technology opportunities based on a perspective of coupling publications with patents

ABSTRACT. Prior studies on TOA (technology opportunity analysis) have preferred approaches based on patent data. In the traditional methodological framework, patents are the main data source for technology forecasting, technology evaluation and innovation opportunity analysis. Besides patent data, however, publication data can also provide supplemental information to enhance TOA, especially for specific topics in science or technology; for example, in the early stage, relevant publications on some emerging technologies or emerging materials can far outnumber patents. Therefore, an integrated framework for TOA based on a novel perspective of coupling publications with patents is proposed in this paper. Compared with prior studies, two research gaps are described and addressed through this new perspective and an integrated computing framework. To verify the proposed framework, a case study on artificial intelligence is conducted, including over 230 thousand publications retrieved from WoS (Web of Science) and over 26 thousand patents from DII (Derwent Innovations Index).

11:20
Evaluating the effect of time and journal quality to topics: Structural topic models of scientific publications
PRESENTER: Arho Suominen

ABSTRACT. Recent literature has extensively used topic models to analyze scientific and patent documents. Topic models such as Latent Dirichlet Allocation are challenged by the fact that the model does not incorporate any metadata, which could potentially have an impact on how latent themes should be interpreted. Recent works have developed multiple targeted topic models, such as the author topic model or the dynamic topic model, that allow for controlling one metadata attribute. The Structural Topic Model adds to these more targeted models by discovering topics while also estimating their relationship to multiple metadata variables. Taking advantage of this possibility, this study focuses on the impact of time and journal quality on topics, evaluating 1) whether topics are impacted by the emergence of new research themes or 2) by different types of research being presented in higher-quality journals. The study is conducted using 75,479 ISI WoS records on fuel cell technology ranging from 1990 to 2014. Using Python for pre-processing and the R package "stm" for analysis, we conclude that time had a weak, and for many topics no statistically significant, effect on topics across time. However, for many topics journal quality had a statistically significant effect.

11:40
Term-based topic extraction incorporating word embedding techniques: A comparative study
PRESENTER: Yi Zhang

ABSTRACT. Topic extraction, understood as the way of modelling data, extracting features, detecting regularities, and identifying topics to represent the data, has been a key concern of the bibliometric community for decades. This study aims to develop a term-based method for topic extraction that incorporates word embedding techniques. In this paper, we mainly focus on the incorporation of word embeddings for term representation, the design of validation measurements for term-based topic extraction on bibliometric data, and the comparison of popular clustering algorithms on given topic extraction tasks. This study provides experimental results for the development of novel clustering methods incorporating word embedding techniques, and offers solutions for accurate topic extraction for the bibliometric community.
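
A minimal sketch of the "embed terms, then compare clustering algorithms" idea is shown below; the toy corpus, embedding parameters, number of clusters, and use of the silhouette score as the validation measure are all assumptions.

```python
# Illustrative sketch: embed terms with Word2Vec, then compare clustering algorithms
# on the resulting term vectors.
from gensim.models import Word2Vec
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

tokenized_records = [
    "topic extraction with word embedding for bibliometrics".split(),
    "clustering algorithms for term based topic extraction".split(),
    # ... tokenized titles/abstracts in practice
]

w2v = Word2Vec(tokenized_records, vector_size=100, window=5, min_count=1, seed=1)  # gensim >= 4.0
terms = list(w2v.wv.index_to_key)
X = w2v.wv[terms]                       # one vector per term

for name, algo in [("kmeans", KMeans(n_clusters=5, n_init=10, random_state=1)),
                   ("agglomerative", AgglomerativeClustering(n_clusters=5))]:
    labels = algo.fit_predict(X)
    print(name, silhouette_score(X, labels))
```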

12:00
Evolution of Topics and Novelty in Science
PRESENTER: Omar Ballester

ABSTRACT. Topic models can provide insight into the semantic structures of texts. These techniques, which originated in computer science, have emerged as a possible solution to topic extraction from scientific publications. In this paper, we implement Doc2Vec, one of the most recent developments in topic modelling based on neural networks, on the publication output of researchers. We overcome the major limitations described in the literature on applying topic models to knowledge domains, and explore how our model helps discover novelty and interdisciplinary research.
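
A Doc2Vec model of this kind can be sketched with gensim as below; the toy documents, parameters, and the nearest-neighbour reading of novelty are assumptions rather than the authors' exact setup.

```python
# Illustrative sketch: Doc2Vec embeddings of publications.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

papers = {
    "pub_001": "neural topic models for scientific publications".split(),
    "pub_002": "interdisciplinary research and novelty detection".split(),
    # ... paper id -> tokenized title + abstract
}
docs = [TaggedDocument(words=toks, tags=[pid]) for pid, toks in papers.items()]

model = Doc2Vec(docs, vector_size=100, window=5, min_count=1, epochs=40, seed=1)

# Nearest neighbours in the document space hint at (inter)disciplinary proximity;
# a paper far from its own field's centroid is one possible novelty signal.
print(model.dv.most_similar("pub_001", topn=1))
```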

11:00-12:20 Session 3B: TechMining Methods
Location: Rm 225
11:00
Revealing Distinct Association Patterns in Disease-Gene-Drug Based on Coupling Network and Subject-Action-Object-Triples
PRESENTER: Yali Qiao

ABSTRACT. A huge number of associations among different biological entities (e.g., diseases, drugs, and genes) are scattered across millions of biomedical articles. Systematic analysis of such heterogeneous data can infer novel associations among different biological entities, propose novel therapeutic targets, or decipher disease mechanisms. However, little research has been devoted to investigating associations among drugs, diseases, and genes in an integrative manner. In this paper, we use MEDLINE/PubMed data and extract biological entities and their associations by applying the SAO (Subject-Action-Object) method. Further, we construct a three-layered coupling network to describe the connections between them and use a community-detection algorithm to cluster the entities on each layer. We investigate whether the community-detection algorithm can help prioritize disease genes based on the associations between diseases and their surrounding genes, which will help researchers generate testable hypotheses about the possible roles of genes in specific disease research. In addition, diseases in the same community are more likely to be associated with each other than with other diseases on the basis of the "guilt by association" rule, which can also help researchers infer new disease relationships. The paper also suggests an association pattern between diseases and drugs, in which two diseases associated with each other are targets of the same drug; this supports the hypothesis that similar diseases can be treated by the same drugs and opens opportunities for drug repositioning.
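
The community-detection step on a graph built from extracted triples could look like the sketch below. The sample triples and the modularity-based method are assumptions; they are not the authors' pipeline.

```python
# Illustrative sketch: community detection on a disease-gene-drug association graph
# assembled from SAO triples mined from MEDLINE/PubMed.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

triples = [
    ("alzheimer_disease", "associated_with", "APOE"),
    ("APOE", "targeted_by", "drug_x"),
    ("parkinson_disease", "associated_with", "SNCA"),
    ("alzheimer_disease", "co_occurs_with", "parkinson_disease"),
    # ... (subject, action, object) triples extracted from the literature
]

G = nx.Graph()
for subj, action, obj in triples:
    G.add_edge(subj, obj, action=action)

communities = greedy_modularity_communities(G)
for i, comm in enumerate(communities):
    print(f"community {i}: {sorted(comm)}")
```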

11:20
An enhanced text-based approach to identify and trace technological topics: Illustrated for additive manufacturing
PRESENTER: Ying Huang

ABSTRACT. Technological innovation is a continuous and dynamic process that spans the entire life cycle of an idea, from scientific research to production. The accelerating competition in technological innovation and the increasingly complex process of technological change pose a greater challenge to tracing the evolution of technology, especially for emerging technologies. How to systematically and effectively analyze the patent literature, so as to understand the process and trends of technological evolution more comprehensively, deeply and accurately, is of great significance for improving technological innovation and competitiveness. This paper proposes an approach to extract a technological thesaurus from patent titles, build the corresponding relationships between patent documents and technological topics, and then trace the emergence, derivation, fusion and extinction of these technological topics from the perspective of technology similarity. Additive manufacturing, an emerging technology with great representativeness and development potential, is selected as a case study to explore the technology development path in different stages of its technology life cycle. Such a systematic approach analyzes the distribution of technological topics in each stage and tracks the evolution of these topics comprehensively and meticulously.

11:40
Research on Potential Drug Side Effects Recognition Based on SAO Semantic Network
PRESENTER: Jiayun Wang

ABSTRACT. New drugs need to undergo clinical testing before they are put into use, which identifies their efficacy and side effects. However, large numbers of clinical tests bring dramatic costs. Therefore, predicting and identifying potential drug side effects can save time and cut costs. This paper aims to construct an effective method for identifying potential side effects of drugs by using Subject-Action-Object (SAO) semantic analysis. We use PubMed to retrieve articles related to drug treatments and their side effects. First, SAO structures are extracted from the biomedical papers we obtained. Second, we clean the SAO structures on the basis of the UMLS corpus and the side-effect vocabulary SIDER. We then obtain three types of associations between entities: "drug-drug", "drug-side effect", and "side effect-side effect". Link prediction is used to predict new connections between drugs and side effects. Our results can be used to identify drug side effects that have not yet appeared in the literature, and help researchers propose reasonable hypotheses that can save time and cut costs for biomedical research.
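
One simple way to sketch the link-prediction step is a neighbourhood-based index on the association graph. The toy edges and the choice of the Jaccard coefficient are assumptions; the authors' predictor may differ.

```python
# Illustrative sketch: link prediction on a graph containing drug-drug,
# drug-side effect, and side effect-side effect associations.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("drug_a", "nausea"), ("drug_a", "headache"),      # drug - side effect
    ("drug_b", "nausea"), ("drug_b", "dizziness"),
    ("drug_a", "drug_b"),                               # drug - drug
    ("headache", "dizziness"),                          # side effect - side effect
])

# Score currently unlinked (drug, side effect) pairs; higher scores suggest
# candidate side effects worth closer inspection.
candidates = [("drug_a", "dizziness"), ("drug_b", "headache")]
for u, v, score in nx.jaccard_coefficient(G, candidates):
    print(u, v, round(score, 3))
```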

12:00
Research on Evolutionary Process Recognition of Therapeutic Technology for Complicated Diseases
PRESENTER: Mengge Xia

ABSTRACT. Topic evolution models can help researchers quickly sort out the development of technical topics. Previous studies have divided thematic relations into five types. On this basis, analyzing the relationship between the maturity or attention of a topic and its evolution patterns can help researchers discover the underlying laws of technology development and make more accurate predictions. Considering possible relationships between topics at the semantic level, this paper proposes a topic evolution pattern analysis method based on an improved similarity calculation model, and takes Alzheimer's disease as the object of an empirical study to verify the effectiveness of the evolutionary model. First, we extract the MeSH terms of the literature in the field from the PubMed database and obtain a document-topic frequency matrix. Then, the time period is divided and a multidimensional evaluation is carried out for each time stage; density and heat indices are used and mapped to two-dimensional coordinates. Finally, we identify the topic association relations of adjacent time periods and construct a topic evolution diagram. In this paper, the cosine similarity formula is improved by using semantic similarity values between terms, so as to obtain more accurate topic correlations between adjacent sub-periods. On this basis, we combine the results of the topic evolutionary capability evaluation to identify and analyze the regular patterns of the four topic evolution modes (emergence, division, integration and inheritance), so as to discover the possible relationship between topic maturity or attention and evolution patterns.
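
A semantically weighted cosine of the kind described above can be sketched as follows (a soft-cosine-style formula; the MeSH-term vectors and the similarity matrix S are hypothetical placeholders, not the authors' values).

```python
# Illustrative sketch: cosine similarity "improved" with term-term semantic similarities.
import numpy as np

# Topic profiles over three MeSH terms in two adjacent sub-periods.
x = np.array([5.0, 0.0, 2.0])
y = np.array([0.0, 4.0, 1.0])

# S[i, j]: semantic similarity between MeSH term i and term j (1.0 on the diagonal).
S = np.array([[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

def semantic_cosine(x, y, S):
    num = x @ S @ y
    den = np.sqrt(x @ S @ x) * np.sqrt(y @ S @ y)
    return num / den

print(round(semantic_cosine(x, y, S), 3))   # higher than plain cosine when related terms overlap
```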

12:20-13:50 Lunch Break
13:15-13:45 Session 4: Power Talks
Location: Rm 222
13:15
A Multidimensional Bibliometric Indicator to Identify Top Players
PRESENTER: Rebecca Jansen

ABSTRACT. Publication counts have traditionally been used to identify top organizations and researchers who are subsequently labelled experts/leaders. However, the word ‘top’ is subjective and hinges on what is being measured. For our purpose, ‘top’ refers to research excellence in order to find potential collaborative partners. In this scenario, additional bibliometrics add value to the process, particularly for assessing quality vs. quantity.

We have investigated whether a multidimensional bibliometric indicator that takes several metrics into account can help identify ideal candidates for collaborative ventures. We employed five dimensions of science excellence to calculate one indicator: volume, research impact, usage, prestige and collaboration. Initial results suggest that our methodology provides a more level playing field from which to rank research excellence as opposed to publication counts alone, which only identify large organizations or established researchers with high output. As a result, organizations or researchers that rank highest in our multidimensional indicator are not necessarily those with the highest publication rates.

A combination of metrics provides a more meaningful and holistic assessment of top players. We propose that the strength of our indicator is the potential identification of ‘hidden gems’ or ‘rising stars’ who have not amassed the highest publication counts but have produced a very respectable amount of high quality collaborative research. More specifically, when looking for partners in potential collaborative ventures this multidimensional indicator is more apt to identify reputable researchers or smaller organizations who produce high quality work with a tendency to work collaboratively on an international scale.
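
One generic way to build such a composite score is to normalize each dimension and combine them with weights, as in the sketch below. The metric names, toy values, equal weights, and min-max scaling are assumptions, not the authors' formula.

```python
# Illustrative sketch: combining five normalized dimensions into one composite indicator.
import pandas as pd

metrics = pd.DataFrame({
    "volume":        [120, 15, 40],
    "impact":        [2.1, 3.8, 1.5],      # e.g. field-normalized citation impact
    "usage":         [900, 400, 150],
    "prestige":      [0.30, 0.55, 0.20],   # e.g. share of output in top journals
    "collaboration": [0.25, 0.70, 0.40],   # e.g. share of international co-publications
}, index=["org_big", "org_hidden_gem", "org_small"])

normalized = (metrics - metrics.min()) / (metrics.max() - metrics.min())
weights = {"volume": 0.2, "impact": 0.2, "usage": 0.2, "prestige": 0.2, "collaboration": 0.2}
composite = sum(normalized[col] * w for col, w in weights.items())

# A smaller player with strong impact and collaboration can outrank a high-output one.
print(composite.sort_values(ascending=False))
```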

13:20
Research on Technology Opportunity Analysis of Treating AIDS Based on Link Prediction in Dynamic Networks
PRESENTER: Xinglu Wang

ABSTRACT. The biomedical literature stores a large amount of medical information. With the rapid growth of biomedical literature, Knowledge Discovery in Text (KDT) has become an effective means of discovering and acquiring information in the literature. The SAO structure is a form of knowledge representation, in the form of "subject-predicate-object", that represents the knowledge units and their semantic relations in the literature and overcomes the limitations of traditional text mining methods. This paper uses AIDS treatment technology as an example. Based on retrieval with the "AIDS/HIV" keyword in the PubMed database, the SAO semantic mining method is used to identify "key problem-solution" pairs and obtain the disease treatment technologies. Then, based on the SAO semantic structure and keyword co-occurrence, the evolutionary relationships between disease treatment techniques are identified, enabling the analysis and identification of their evolution paths. On this basis, the SAO network is constructed, the relationship strength between nodes is calculated, and SAO network analysis is carried out. The development trend of the emerging technology is analyzed with indicators including in- and out-degree, key actions, the evolution of the node degree distribution, and network centre deviation.

13:25
Monitoring and Forecasting the Development Trends of Emerging Technologies Using Text Mining and Citation Analysis: The Case of Nanogenerator Technology
PRESENTER: Mingjie Fan

ABSTRACT. Researchers usually use academic paper or patent data and apply citation analysis to identify technology evolution paths and development trends. The time lag of citation analysis makes its results incomplete and unable to fully explain the evolution path of a technology. With the development of computing, text mining plays an increasingly important role in exploring technology development trends. Therefore, this paper proposes a framework that uses academic papers and patents as data sources and integrates text mining and citation analysis to monitor the evolutionary paths of emerging technologies and forecast their changing trends. Nanogenerator technology is selected as a case study. First, we apply citation analysis to trace the basic evolutionary path of nanogenerator technology. Second, we apply the Hierarchical Dirichlet Process (HDP) model, a text mining method, to extract the technical topics appearing in academic papers, and compare these with the citation analysis results to fill in the recent portion of the evolution path that is missing due to the time lag of citation analysis. Finally, considering the gaps between academic papers and patents, we also apply the HDP model to extract the technical topics appearing in patents in recent years and, based on the technical evolution path, carry out a targeted comparison with the technical topics in academic papers in the corresponding years, so as to forecast the development trends of nanogenerator patents along each evolution path.
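
HDP topic extraction of the kind mentioned above is available in gensim; the sketch below shows the basic call on a toy corpus (the corpus and the number of printed topics are assumptions).

```python
# Illustrative sketch: extracting technical topics with a Hierarchical Dirichlet Process model.
from gensim import corpora
from gensim.models import HdpModel

docs = [
    "triboelectric nanogenerator for wearable energy harvesting".split(),
    "piezoelectric nanogenerator based self powered sensor".split(),
    # ... tokenized paper or patent abstracts
]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

hdp = HdpModel(corpus, id2word=dictionary, random_state=1)   # number of topics inferred, not fixed
for topic in hdp.print_topics(num_topics=10, num_words=6):
    print(topic)
```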

13:30
A scientometric and text mining approach to identify the research hotspots and emerging topics on Himalaya using the Web of Science records

ABSTRACT. This study analyzes research publications dealing with the Himalaya and indexed in the Web of Science database for the period 1901-2018. Besides the traditional bibliometric/scientometric approach (e.g., determining the shares of countries, research institutes and publication outlets), it focuses on prolific fields, such as geosciences, and high-impact (top 10% by citation) papers in order to identify hotspots. Using this highly collaborative and interdisciplinary dataset, an attempt will be made to perform text mining (e.g., noun phrase extraction and identification of emerging topics), especially to shed light on the various themes, including traditional ones (tectonics, biodiversity, monsoon) as well as more recent ones (earthquake, deformation, glacier, precipitation, climate change) that are of utmost societal relevance.

13:35
A Big Data Knowledge Computing Platform for Information Research
PRESENTER: Ning Yang

ABSTRACT. Information research is a method of using modern information technology and soft science research methods to form valuable information products by collecting, selecting, evaluating and synthesizing information resources. With the advent of the big data era, the core work of information analysis faces enormous opportunities and challenges. How to make good use of big data, how to optimize and improve traditional information research methods and tools, and how to innovate research based on big data are key issues that need to be studied and solved in current information research work. We therefore design and implement a universal knowledge computing platform for information research, which enables intelligence analysts to easily use all kinds of big data analysis algorithms without writing programs (http://www.zhiyun.ac.cn).

13:50-15:00 Session 5: Measuring Tech Emergence Contest Special Session

Noting rich synergies among analyses treating different facets of tech emergence, the VPInstitute sponsored a Measuring Tech Emergence “contest” this past April 2019 to generate novel and viable indicators. In this special session, we present contest winners as well as showcase the variety of approaches and challenges.

  • Welcome & Overview - Denise Chiavetta
  • Empirical Scoring - Alan Porter
  • Judges’ Perspectives - Nils Newman, Dewey Murdick, Phil Shapira
  • Short presentation of contest submissions:
  • Panel Discussion (contest participants, 3 judges)
  • Open Discussion

The Panel will consist of a selection of the 13 contest submissions, as represented below:

Location: Rm 222
13:50
Identifying Emerging Technology: A Neural Network Based Solution
PRESENTER: Jin Mao

ABSTRACT. We formalize the problem of identifying emerging technology as the prediction of an emergence score for terms, using a neural network based solution. Details of our method are as follows. 1. Data enhancement: references of the articles in the dataset are completed. 2. Term identification: an open source tool, The Termolator (Meyers et al., 2018), is applied to identify meaningful terms in the dataset. These terms have several statistical characteristics when compared with a background multidisciplinary dataset consisting of 8,683 articles from Nature during 2003 to 2012. Acronyms are regularized as well. 3. Defining the emergence score: the document frequency at year y for a term t_i is regulated by a smoothing factor, δ

13:50
A Machine Learning Framework for Predicting Emerging Technologies
PRESENTER: Shuo Xu

ABSTRACT. Emerging technologies are closely related to emerging topics in terms of several well-documented attributes: radical novelty, relatively fast growth and prominent impact. Therefore, our previous work on detecting emerging topics (Xu et al., 2019a) is adapted to measure technology emergence in the Tech Emergence Contest, but the DIM (dynamic influence model) is replaced by the TNG (topical n-grams) model, and three indicators are designed to reflect the above attributes. In more detail, the relatively-fast-growth indicator is calculated from the results of the TNG model, and the radical-novelty indicator comes from the CIM (Citation Influence Model). For the prominent-impact indicator, the authors involved are used. The following fields are utilized to develop the models: title, abstract, publication year, author and cited references. Because cited references are missing in the provided dataset, they are retrieved from Web of Science according to DOIs and further cleaned with a method from another of our works (Xu et al., 2019b).

13:50
A study on emerging topic discovery by text mining
PRESENTER: Ning Yang

ABSTRACT. Our method mainly used three fields of the WoS data (publication year, title and abstract) to measure emerging topics in three steps. First, an LDA model was used to extract topics from the titles and abstracts. Second, topic time-interval data (changes in topic intensity over time) were constructed by integrating the publication year. Finally, emerging topics were identified by an emerging-topic measurement algorithm.

13:50
Measuring technology emergence: Different approaches, different outputs
PRESENTER: Chao Min

ABSTRACT. Numerical evaluations play an important role when judging the outputs of technology emergence measurements. In this paper, we show that different measurement methods perform quite differently, demonstrating the issue on three datasets with different approaches. First, the choice of calculator involves how to count a term (document or term frequency) and whether to remove the overall trend. Second, various models are available for selection, such as the delay index, boost value, ARIMA, and exponential smoothing. Third, emerging topics can be built up with either bottom-up or top-down strategies. Results show that the outputs of these approaches are quite different and that the delay index is most consistent with the gold standard we set. In addition, document frequency (on the Dssc and Neuro datasets) and term frequency with trend removed (on the Smarthome dataset) are the best calculators, respectively. However, the construction of gold standards is still up in the air. Our results indicate that evaluating technology emergence measurements can be difficult unless proper baselines are available for comparison.

13:50
Measuring Tech Emergence: Selected bibliometric approaches for the identification of emerging and promising technologies
PRESENTER: Edgar Schiebel

ABSTRACT. It is of great interest to identify emerging topics and technologies. For decades, the bibliometric community has put much effort into identifying these emerging technologies in a quantitative way, drawing on scientific publication data (Small et al., 2014). Most bibliometric studies start out with a given small topic that has been identified as emerging and analyze its characteristics. In contrast, a smaller number of publications apply data mining, text mining, citation analysis and bibliographic coupling networks to structure a given technology field with the help of clustering and mapping algorithms and subsequently assess the level of emergence in the resulting clusters (e.g. Small et al. 2014 and Shibata et al. 2011). As a contribution to the Tech Emergence Contest at the 9th Global Tech Mining Conference, and to this second strand of publications identifying emerging technologies, we analyze the given technology field "synthetic biology". The contest searches for methods that provide a reproducible procedure to identify emerging R&D topics within a technology domain. The data are gathered from the Web of Science (WoS) database. The challenge is to best predict topics that are notably active in the last two years of research, addressing (1) scale, i.e. the focus within the technology domain, (2) the final data set, which has to be created from some basic keywords, (3) the analytical approaches used, and (4) the output: a ranked list of ten to thirteen precise terms that describe emerging topics in the respective field.

13:50
Co-word Network Embedding: A Way to Detect Emerging Technology
PRESENTER: Zihong Wang

ABSTRACT. Emerging technologies are often perceived as having large undeveloped economic potential and as capable of changing the societal status quo. With technological development, practical applications, or both largely unrealized, how can we capture new terminology that may represent an emerging technology, and what does that terminology mean? To address this question, we propose a new algorithm to identify emerging technologies and their relevant concepts. The method combines a co-word network and a neural network model to obtain dense representations of emergent terms and to conduct further indicator analysis. As a result, we obtain emerging technologies together with their relevant technologies and concepts. This can help policy-makers and entrepreneurs figure out which technologies are more important, along with related items such as other technologies, equipment and methods.

13:50
Predicting Emerging Technologies: A Text Mining Approach Based on The Temporal Exponential Random Graph Model (TERGM) Methodology.
PRESENTER: Yang Guancan

ABSTRACT. Emerging technologies are of great interest to a wide range of stakeholders in both industry and government who aim to set up investment-related strategies. Technology forecasting offers a relevant opportunity in this direction and is currently a hot upcoming area of research. However, accurate forecasting of emerging technologies is still problematic, mainly due to over-dependence on external measurement indicators with an ex post evaluation perspective. In this study, we propose a novel text mining approach based on the temporal exponential random graph model (TERGM) methodology to forecast emerging technology terms using WoS and PubMed publication data. The approach focuses on term pairs rather than term frequency, which makes it possible to use network inference to estimate the odds of tie (term-pair) formation in a time-series setting. The approach also allows representing the embedding relationships between MeSH term pairs and emerging terms. The case of biotechnology shows that our approach can facilitate responsive technology forecasting and planning.

13:50
Forecasting technology emergence: Scenario exploration and prediction accuracy
PRESENTER: Li Tang

ABSTRACT. With a growing number of technology forecasting methods, evaluating and improving prediction accuracy has gained importance yet remains under-investigated. Building upon previous studies, we develop new measures to evaluate technology emergence. The method is applied to a case study of synthetic biology, in which we examine tech emergence using publications extracted from the Web of Science for the period 2007-2016. Multiple sources and weighting schemes are adopted for evaluation and prediction under various forecasting scenarios. We calculate and compare prediction accuracy rates against the gold standards: the actual top tech terms that emerged in a later period (2017-2018) in Web of Science publications and Derwent Innovation Index patents. Our evaluation scheme has the potential to improve technology forecasting methods and contributes to wiser public funding and smarter private investment in emerging technologies.
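
Comparing a forecast against such a gold standard can be as simple as a precision-at-k calculation, sketched below with hypothetical term lists (not the paper's actual results).

```python
# Illustrative sketch: precision of forecast emerging terms against a "gold standard"
# list of terms that actually emerged in a later period.
def precision_at_k(predicted_terms, gold_terms, k=10):
    top_k = [t.lower() for t in predicted_terms[:k]]
    gold = {t.lower() for t in gold_terms}
    return sum(t in gold for t in top_k) / k

predicted = ["crispr delivery", "cell-free systems", "genetic circuits", "biosensors"]
gold_2017_2018 = ["cell-free systems", "base editing", "biosensors"]
print(precision_at_k(predicted, gold_2017_2018, k=4))   # 0.5
```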

15:00-15:25 Coffee Break
15:25-16:25 Session 6A: Using Web Data
Location: Rm 222
15:25
Interdisciplinarity beyond bibliometrics - (in)validation of website information as an indication of interdisciplinarity
PRESENTER: Rainer Frietsch

ABSTRACT. Interdisciplinarity is the integration and use of different aspects from different disciplines. The main task of the project underlying this presentation is to measure and assess the level of interdisciplinarity of research units beyond bibliometrics. The concrete research agenda of this presentation is the extraction and analysis of indications of interdisciplinarity from the websites of all Fraunhofer institutes. Tests of usability and reliability, as well as an (in)validation of the newly extracted indicators, will be conducted against existing indicators of interdisciplinarity. We use text data to identify interdisciplinarity and assess its intensity for each institute. For this purpose, we scraped the websites of all Fraunhofer institutes. This unstructured data will be analysed using text mining approaches, and we expect to extract four different kinds of indicators from the websites. We expect our analyses mainly to be of conceptual/methodological value. If we are able to show that text-based approaches (websites or annual reports are more easily accessible) can substitute for or supplement some of the existing indicators, we would be able to generalize our analytical approach and assess the level of interdisciplinarity of any research unit or entity worldwide.
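
A first pass at scraping and scoring such websites could look like the sketch below; the URL, the discipline keyword lists, and the simple count-based profile are hypothetical, not the project's actual indicators.

```python
# Illustrative sketch: scrape an institute page and count discipline-related keywords
# as a crude interdisciplinarity signal.
import requests
from bs4 import BeautifulSoup

DISCIPLINE_TERMS = {
    "materials": ["polymer", "coating", "alloy"],
    "ict": ["software", "machine learning", "sensor network"],
    "biology": ["cell", "protein", "bioprocess"],
}

def keyword_profile(url):
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True).lower()
    return {field: sum(text.count(term) for term in terms)
            for field, terms in DISCIPLINE_TERMS.items()}

# profile = keyword_profile("https://www.example-institute.example/research")  # hypothetical URL
# A spread of counts across several fields would be read as one (weak) indication of
# interdisciplinarity, to be validated against existing indicators.
```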

15:45
Could the organisation’s websites be a valid data source for research? - An analysis of the complementary nature between web-based indicators and traditional indicators in innovation studies

ABSTRACT. In this exploratory study, we use a web mining technique to source data for analysing the innovation of technological firms. 1,570 tech companies were processed, from which a total of 965 websites were extracted and analysed based on keywords related to four core concepts (R&D, intellectual property, collaboration and external financing) that are especially important for the innovation of SME tech firms. We built regression models with both questionnaire-based and web-based indicators in order to test the contribution of our web-based indicators. Our results show significant models in which web-based indicators contribute significantly to the explanation of innovation performance. This means that firms with a higher occurrence of keywords related to R&D, IP and external financing on their website display characteristics that have not been captured with our questionnaires. We conclude that some of the data extracted via our web mining technique can complement data derived from classical methods.

16:05
Comparing website measures on R&D with patenting indicators

ABSTRACT. New data sources, such as websites and social media, are readily available at modest to no cost, but little is known about how to operationalize valid and reliable measures for research in science and innovation policy. The purpose of this power talk is twofold: (1) to operationalize R&D keyword-based measures derived from firm websites, and (2) to correlate those measures with common indicators of patenting activity. This work builds on past research in the sense that it starts with a frame list of patenting firms, rather than firms identified through business databases. The findings therefore should be generalizable to the population of patenting firms. The sample consists of 1,146 patenting firms whose websites were successfully crawled. Preliminary results show a weak correlation between R&D mentions on websites and patenting intensity: Kendall’s tau is 0.16 and is significant at α = .01. As part of the power talk, we will review a limited set of other measures drawing on keyword-derived search terms and derive simple correlations, and possibly multivariate regression outputs, to tease out how patenting firms use their websites to convey R&D and innovation-related themes. We will then assess how valid and reliable these measures are.
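
The Kendall's tau calculation reported above is a one-liner with SciPy; the two toy vectors below are hypothetical firm-level values, not the study's data.

```python
# Illustrative sketch: rank correlation between website R&D mentions and patenting intensity.
from scipy.stats import kendalltau

rd_mentions_on_site = [0, 3, 12, 1, 7, 25, 4, 0, 9, 2]   # hypothetical counts per firm
patents_per_firm    = [1, 2,  8, 1, 3, 14, 5, 2, 4, 1]

tau, p_value = kendalltau(rd_mentions_on_site, patents_per_firm)
print(f"Kendall's tau = {tau:.2f}, p = {p_value:.3f}")
```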

15:25-16:25 Session 6B: Network Analysis II
Location: Rm 225
15:25
A dynamic network analysis method for identifying emerging technology
PRESENTER: Lu Huang

ABSTRACT. Identifying emerging technologies has become an important issue that affects the development strategies of countries, companies and research institutions. However, due to the uncertainty, ambiguity and complexity of emerging technologies, the identification process is difficult and often inaccurate. A dynamic complex network analytics methodology is proposed in this paper, which includes three steps. First, a dynamic weighted co-word network is constructed from literature data in the Web of Science (WoS). Second, a link prediction model based on a back-propagation neural network is trained to predict the evolution of the future network by learning the features of historical networks. Third, an indicator system based on the predicted dynamic network is established to identify emerging technologies, covering multi-dimensional factors: novelty, growth, coherence and impact. An empirical study on information science is conducted to demonstrate the reliability of the methodology.

15:45
LDA Meets Word2Vec: A Novel Model for Measuring Technology Similarity
PRESENTER: Xiaowen Xi

ABSTRACT. Technology similarity is an important basis for identifying potential technology competition and partners among subjects. Technical similarity needs to be identified through the similarity of R&D outputs, and patent information is the most important embodiment of global R&D achievements. Therefore, how to use patent information to accurately calculate the technical similarity between subjects is an important question. In previous studies, scholars have applied text mining methods, mainly including keyword analysis, co-word analysis and LDA topic models. However, these cannot represent relevant frontier areas and technical topics in fine detail, so the resulting similarity measures are too coarse and lack rigour. Word2Vec is a word-embedding model that predicts a target word from its surrounding context words, and it makes up for the shortcomings of the LDA topic model. In this paper, we propose a hybrid approach that uses both Word2Vec and LDA to extract the technology topics of patentees in a semantic space. Our hybrid method not only generates the relationships between patentees and topics, but also integrates the contextual relationships among words. Experimental results indicate that the patentee features generated by our hybrid method help improve the accuracy of similarity calculation. In order to further analyze the technical layout of the research field and identify potential competitors and cooperation partners, we construct and analyze a multi-layer complex network of patentees and technical topics.
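
One possible way to combine LDA and Word2Vec in this spirit is sketched below: each LDA topic is embedded as the probability-weighted average of its top words' vectors, and topic (or patentee) profiles are then compared by cosine similarity. The toy corpus, parameters, and this particular combination scheme are assumptions, not necessarily the authors' model.

```python
# Illustrative sketch: embed LDA topics with Word2Vec word vectors, then compare them.
import numpy as np
from gensim import corpora, models
from gensim.models import Word2Vec

patents = [
    "lithium battery electrode coating method".split(),
    "solid state battery electrolyte material".split(),
    # ... tokenized patent texts, grouped by patentee in practice
]
dictionary = corpora.Dictionary(patents)
corpus = [dictionary.doc2bow(p) for p in patents]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=5, random_state=1)
w2v = Word2Vec(patents, vector_size=50, min_count=1, seed=1)

def topic_embedding(topic_id, topn=10):
    """Probability-weighted mean of the word vectors of a topic's top words."""
    words = lda.show_topic(topic_id, topn=topn)            # [(word, prob), ...]
    vecs = np.array([w2v.wv[w] * p for w, p in words if w in w2v.wv])
    return vecs.sum(axis=0) / vecs.shape[0]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(topic_embedding(0), topic_embedding(1)))
```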

16:05
Identifying Key Technology Convergence Based on Complex Network and Sentiment Analysis
PRESENTER: Jin Wang

ABSTRACT. Technology convergence is one of the important sources of innovation. Identifying key technology convergence can not only enable researchers in related fields to quickly grasp the technology, but also help enterprises develop technology strategies. This paper combines network analysis and semantic analysis, using patent information from related fields as data. First, the keywords in patent titles and abstracts are extracted and selected to form a technical dictionary. Machine learning is used to perform sentiment analysis on the abstract sentences containing the topics, after which we obtain the convergence relationships between themes, i.e. convergence recognition at the technical-topic level. Then, through co-occurrence analysis of the IPC codes of the same patent, we recognize convergence at the technical-field level. Finally, based on the convergence relationships obtained from the first two steps, we establish convergence networks of technical topics and technical fields respectively, and through the keyword-patent-IPC relationships the two networks are connected to form a two-layer "technical field - technical theme" network. We evaluate the convergence between technologies through topological indicators from complex network theory, and the key technology convergence is identified in the end.

16:30-17:00 Session 7: Conference Conclusion

TechMining for Global Good 2019 Awardee Presentation - Thomas Woodson

Closing Observations - Cassidy Sugimoto

Wrap-up - Denise Chiavetta and Alan Porter

Location: Rm 222
17:30-19:30 Session 8: GTM/TPAC Poster Contest with Cocktail Reception

This year’s Global TechMining Conference is pleased to welcome members of GaTech's Technology Policy and Assessment Center (TPAC) as judges for this year's poster session.

Location: Hodges Room
17:30
A study on evaluation methodology for preliminary (ex-ante) feasibility analysis of government R&D program in Korea

ABSTRACT. In this study, a systematic methodology for preliminary feasibility analysis of the R&D sector is introduced and discussed. The preliminary feasibility analysis refers to the preliminary validation and evaluation conducted by the Minister of Strategy and Finance in order to establish a budgeting and fund management plan for large-scale new programs. It is a unique system for pre-verifying the feasibility of government-led investment in science and technology. The preliminary feasibility analysis aims to prevent the waste of budget and contribute to the efficiency of financial management by making transparent and fair decisions, based on the priority of new investments, through an objective and neutral investigation of the feasibility of large-scale government-funded programs. The Republic of Korea has recognized the importance of science and technology as a driving force of national growth and has continuously increased the proportion of government fiscal spending on it. As of 2019, government spending on R&D exceeded about 19 billion US dollars. Over the past few decades, Korea's R&D has invested heavily in areas with relatively clear economic and industrial effectiveness. In recent years, however, it has begun to be used as a means of exploring knowledge and addressing social problems, in order to transform into a new R&D paradigm, the so-called 'first mover strategy'. Many large R&D programs are being pursued to meet the government's policy objectives in Korea. For such large-scale investments, it is necessary to examine the program's necessity, purpose, detailed activities, and implementation strategy.

17:30
Public R&D funds for influential publications
PRESENTER: Minki Kim

ABSTRACT. There is debate over whether quantitative-indicator-based programs for evaluating research performance hinder researchers from producing influential outputs. Although it is necessary to assess the quality of R&D projects, there is still a lack of social understanding about which elements of research may increase qualitative performance. Therefore, building on prior studies showing a positive relationship between researcher autonomy and research quality, this study examines the factors that enhance the productivity of influential papers in public R&D projects. To this end, we assume that the amount of the research grant and the research period are related to the autonomy of the research. The quality of the research is measured by highly cited papers. Through logistic and negative binomial regression analysis, this study shows that publishing highly cited papers is positively related to the amount of the research grant and the research period. A substantially lengthy research period, however, is negatively related to research quality. We further show how age, gender, conducting multiple projects, and prior experience of publishing highly cited papers are related to research quality, in order to derive policy implications for effective public R&D based on study quality.

17:30
Proximity and Cluster Effects: The Case of Emerging Biotechnology Innovation Networks
PRESENTER: Duenkai Chen

ABSTRACT. The literature maintains that proximity has strong impacts on enhancing interactive learning and innovation in clusters. However, what is less clear is how interactions occur in networks to develop linkages between actors in emerging high-tech sectors. To explore how regional clusters and technological distance enhance the formation of R&D networks in an emerging high-tech sector, this paper examines the R&D collaboration network of the biotechnology industry in Taiwan. It explores the R&D collaboration networks between actors in the innovation system to understand whether cluster effects and technological proximity enhance R&D collaborations in high-tech, science-based sectors. The findings suggest that, while the nascent sector remains small, geographical proximity is not the most important factor in determining the establishment of networks between actors in the innovation system. In contrast, technological distance, the fit of specialties and the mutual complementarity of businesses are the key factors driving the formation of collaboration networks and alliances in the biotechnology sector, a science-based sector. To enhance the collaboration network in a nascent science-based sector, cluster effects through policy intervention aimed at stimulating collaboration networks between actors may not be the most efficient approach. Instead, strengthening the local knowledge base, shortening technological distance, and enhancing mutual complementarity between actors would be the most important measures to strengthen local collaboration networks and the knowledge transfer within them.

17:30
Identification and Analysis of Innovation Value Stages in Research Fronts and follow-up studies
PRESENTER: Di Zhang

ABSTRACT. This study identifies and analyzes the innovation value stages of research fronts and follow-up studies using ESI Research Fronts' core papers, citing papers and core authors' follow-up research papers. The steps include indexing the innovation value stage, proposing innovation value stage indexes, judging the innovation value stage, analyzing the innovation value stages of research groups, and identifying the themes of the innovation value stages. The themes and words used for indexing are determined by literature research and expert consultation. The innovation value stage is indexed from three structural fields (title, abstract and keywords) using automatic word-indexing. Scores are assigned according to the position of the indexed words (keyword: 4 points, title: 2 points, abstract: 1 point), the score of each value stage is calculated separately for each paper, and the stage with the maximum score is taken as the value stage of the paper. The themes of the innovation value stages are identified by the following steps: extracting the sentences in the abstract where the indexed words are located, natural language processing, phrase cleaning, and expert interpretation. A corresponding empirical study is carried out on the research front of "CH3NH3PbI3 perovskite solar cells and inorganic hole transport materials". Finally, the effectiveness of the research method and the reliability of the empirical results were verified by expert consultation.
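
The position-weighted scoring rule described above (keyword 4, title 2, abstract 1) can be sketched directly; the stage vocabularies and the sample paper below are hypothetical.

```python
# Illustrative sketch of the position-weighted innovation-value-stage scoring rule.
STAGE_TERMS = {
    "basic_research": ["mechanism", "synthesis"],
    "application":    ["device", "efficiency"],
    "commercial":     ["cost", "manufacturing"],
}
WEIGHTS = {"keywords": 4, "title": 2, "abstract": 1}

def score_stages(paper):
    """Return (assigned stage, per-stage scores) for one paper."""
    scores = {stage: 0 for stage in STAGE_TERMS}
    for field, weight in WEIGHTS.items():
        text = paper.get(field, "").lower()
        for stage, terms in STAGE_TERMS.items():
            scores[stage] += weight * sum(term in text for term in terms)
    return max(scores, key=scores.get), scores

paper = {
    "title": "Low-cost manufacturing of perovskite solar cells",
    "abstract": "We study device efficiency and a scalable synthesis route ...",
    "keywords": "perovskite; manufacturing; cost",
}
print(score_stages(paper))
```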

17:30
Identifying Key Technologies Based on Scientific Papers Using Clustering and Complex Network Theory
PRESENTER: Yanpeng Wang

ABSTRACT. Key technologies play an important role in economic and industrial development. We define key technologies in three categories in this study: hotspot technologies, generic technologies and emerging technologies. Our study uses a mixed methodology to identify these three kinds of key technologies. First, we vectorize the papers with TF-IDF, and the K-means++ algorithm is used to cluster them; each cluster is interpreted as one hotspot technology. Second, we use the clusters as nodes, and the cosine similarity between the centroid vectors of each pair of clusters as edge weights, to construct a network of hotspot technologies. According to complex network theory, the more structural holes a node spans, the more control benefits it occupies; thus, we use structural hole indicators to identify generic technologies. Finally, we use a link prediction algorithm to predict the missing edges in the network of hotspot technologies, aiming to discover fusion between different research areas; the missing edges and their node pairs represent emerging technologies. Artificial intelligence (AI) was selected as a case study. The results show that there are 14 categories and 132 hotspot technologies in AI, mainly distributed across machine learning, natural language processing, computer vision and robotics. Generic technologies are mainly distributed in machine learning; basic algorithms and optimization methods such as classification and regression lay a solid foundation for research in other AI fields. Emerging technologies are mainly distributed in deep learning, such as GANs and attention mechanisms.
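
The TF-IDF plus k-means++ clustering step could be sketched as below; the toy texts, number of clusters, and use of top centroid terms as cluster labels are assumptions.

```python
# Illustrative sketch: TF-IDF vectors and k-means++ clustering of paper texts,
# with the top centroid terms printed as rough cluster labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

texts = [
    "convolutional neural networks for image classification",
    "reinforcement learning for robotic control",
    "generative adversarial networks for image synthesis",
    "transformer models for natural language processing",
    # ... titles/abstracts of AI papers
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(texts)

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=1).fit(X)
terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"cluster {c}:", [terms[i] for i in top])
```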

17:30
Detecting Fake News via Machine Learning - A Literature Review and Comparison of Existing Algorithms
PRESENTER: Duenkai Chen

ABSTRACT. The prevalence of the Internet has amplified the propagation of fake news, and its detection is an emerging research area with a lot of unfilled potential. The problem is complicated and needs more sophisticated approaches to eradicate it completely. In this paper, we explore the possibility of detecting falsity in political news articles based on linguistic patterns using various machine learning models. For our dataset, we collected around 25,000 pre-classified articles, provided and publicly available online. A "bag-of-words" approach is used to extract features, and the data are then fed into multiple classic text classification algorithms. Among the algorithms, SVC outperformed the other classifiers, with an accuracy rate as high as 91.52%, considering precision, recall, F1-score, and support. To really solve the fake news problem, unsupervised learning and deep learning are the way to go in the long run, because unbiased, labeled data is hard to find and requires an enormous amount of time and human resources, which is neither efficient nor practical. Unlabeled data does result in lower accuracy and uncertainty in classification, but it is available all over the Internet and is much easier to acquire than labeled data. The work we have done provides some useful insights into fake news and the text classification problem.
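
A bag-of-words pipeline with a linear SVM, in the spirit of the approach described above, is sketched below; the tiny labeled sample and all parameter choices are hypothetical, not the paper's dataset or tuned model.

```python
# Illustrative sketch: bag-of-words features with a linear SVM for real/fake labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

articles = [
    "senator announces new infrastructure bill after committee vote",
    "scientists confirm the moon is hollow, officials hide the truth",
    "president signs trade agreement with neighbouring country",
    "miracle cure banned by doctors revealed in leaked memo",
] * 25                                  # repeated only so the toy train/test split works
labels = ["real", "fake", "real", "fake"] * 25

X_train, X_test, y_train, y_test = train_test_split(
    articles, labels, test_size=0.3, random_state=1, stratify=labels)

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2), stop_words="english"), LinearSVC())
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```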

17:30
A Comparative Analysis on R&D Collaborations in Emerging Sectors: The Emergence of AIOT
PRESENTER: Shihhsin Chen

ABSTRACT. This paper aims to conduct an in-depth study of the emergence of emerging sectors. By analyzing the emergence of artificial intelligence (AI) and the internet of things (IoT), it explores the subject categories and IPC fields from which AI and IoT emerged over the past 10 years (2009-2018). The main research question this paper tries to answer is what role innovation policy plays in enhancing the convergence of technologies from various emerging sectors. Given the essential differences between various kinds of emerging technologies, we assume the policy model has to be tailor-made to stimulate the development of emerging sectors. This study therefore proposes a conceptual framework that integrates the concepts of innovation ecosystems, innovation networks, and sectoral innovation systems to study the dynamic systemic features involved in building emerging innovation ecosystems. The ultimate contribution of this paper is to identify these systemic features and propose a policy framework for better stimulating the development of emerging sectors in the era of global technological change.

17:30
A Study on Global M&A and Innovation Networks in Life Science Sector
PRESENTER: Shihhsin Chen

ABSTRACT. Existing literature maintains that the process of innovation follows non-linear patterns across the domains of science, technology, and the economy, and contemporary scientometric mapping techniques can be used to investigate and visualize the innovation process and the research collaboration networks across these perspectives. The perspectives can be represented as continents of data related to varying extents over time. This research considers the impacts of S&T policy on the establishment of R&D collaborations by analyzing joint research data from Web of Science (WOS), co-patenting (development) data gathered from the Webpat integrated patent database, and commercialization data from the Medtrack and TEJ databases in the life science sector. The findings show that R&D collaboration and M&A (merger and acquisition) activities have gradually become more globalized. In the most recent decade, US firms have dominated global M&A activity. This finding sheds light on the possibility of developing an innovation opportunities explorer between the US and East Asian countries. Potential policy implications will be addressed.

17:30
The Emergence of Global 3D Printing Innovation Ecosystem
PRESENTER: Shihhsin Chen

ABSTRACT. This research considers 3D printing R&D innovation networks by analyzing joint research data from Web of Science (WOS) and co-patenting (development) data gathered from the Webpat global patent database. Specific attention is paid to the dynamics of 3D printing R&D innovation networks over the past three decades (1989-2018). While existing literature maintains that the process of innovation follows nonlinear patterns across the domains of science and technology, the multiple-perspective approach enables us to reconstruct artifacts of the dynamics of innovation, shaping networking trajectories and resulting in more globalized regimes. The findings show that the USA has always been a leading country in 3D printing and that eastern European countries also play an important role. China started to become a crucial player in 3D printing in 2009. Taiwan has barely reached an outstanding position compared to these earlier actors; however, the Taiwanese government started to increase its support for the 3D printing industry in 2014 because of the country's strong ODM and OEM background. This preliminary analysis also indicates that medical and veterinary science, electrography, soldering, working of metallic powder, and micro-organic compounds are the main application areas of current 3D printing technology in the real world. This study contributes to the literature on innovation ecosystems and systems of innovation, and has implications for scientists, industrialists, R&D researchers, and policymakers.

17:30
Exploring a Method of Detecting Cross-topics of Science and Technology

ABSTRACT. It is very important to explore the science-technology relationship, especially the relevance of the theme level which deeply into the connotation of the innovation. On the one hand, the differentiation degree of the topics extracted by the LDA model used in the existing measure indexes (such as STL) is not high enough, and the topic analyzing is mistily and equivocally. On the other hand, VOSviewer supports text mining and topic clusting and overlay visualization, so it is worth exploring the feasibility of this application in detecting the science-technology correlative topics. In theory, we regard science and technology as two specific organizations and create their overlay maps, which are used to detect overlapping areas on the two overlay maps as related topic visualization, and then use the item’s score to calculate each cluster’s or topic’s skewness values to science or technology, which are used to determine the correlative topics from the perspective of quantitative data. Case study demonstrates the feasibility and comparative advantages of this approach.