GTM2017: 7TH GLOBAL TECHMINING CONFERENCE
PROGRAM FOR MONDAY, OCTOBER 9TH


09:00-09:35 Session: Welcome and Keynote Presentation

Welcome: Alan Porter and Denise Chiavetta, GTM2017 Program Co-Chairs

Keynote Speaker: Stuart Graham, JD, PhD, 2010-2013 Chief Economist of the United States Patent & Trademark Office (USPTO)

Patent Offices and Patent Information – Advancing the Frontier

Dr. Graham will comment on the evolving role of patent offices as providers of, and partners in, information and analytics. He will offer insights on how public-private relationships can be improved, and how complementary activities can be promoted to increase benefits to all information stakeholders.

 

Location: Rm 222
09:40-10:40 Session 1A
Location: Rm 225
09:40
Large-scale topic networks: can we improve efficiency and obtain similar results using LSH?
SPEAKER: Diane Gal

ABSTRACT. This study attempts to determine the most appropriate parameters for applying locality-sensitive hashing (LSH) to the creation of large document networks based on lexical similarities. LSH is an approximate nearest-neighbor search algorithm. Using a document set from the Web of Science relevant to cardiovascular science, 27 different networks will be built after noun phrases are extracted using natural language processing. We expect the accuracy of the network to increase with the number of random hashing functions applied and with the number of permutations of these random functions, but that this increase has a quadratic relation with the required processing time and storage. The goal of this study is to find the optimal balance between these parameters and computational constraints.
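As a rough illustration of the technique (not the study's actual implementation or parameters), a MinHash-style LSH signature compares documents by their noun-phrase sets; accuracy grows with the number of random hash functions, at a corresponding cost in time and storage:

```python
import random

def minhash_signature(tokens, num_hashes=64, seed=42):
    """Summarize a set of noun phrases as a MinHash signature: the more
    random hash functions, the closer the estimated Jaccard similarity
    is to the true one."""
    rng = random.Random(seed)
    p = 2_147_483_647  # large prime modulus for the random hash family
    params = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_hashes)]
    # h_i(x) = (a_i * x + b_i) mod p; keep the minimum over the token set
    return [min((a * hash(t) + b) % p for t in tokens) for a, b in params]

def estimated_jaccard(sig1, sig2):
    """Fraction of signature positions that agree estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

Documents with identical noun-phrase sets agree on every signature position; dissimilar documents agree on few, so near-duplicates can be found without all-pairs comparison.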

10:00
Measuring Patent Similarity Based on SAO Semantic Analysis
SPEAKER: Yun Chen

ABSTRACT. Nowadays, the patent, as the most important form of intellectual property, is the principal way to protect technological achievements. The number of patents is increasing rapidly, which makes it more difficult for examiners to find similar patents quickly and more challenging for applicants to evaluate the risk of patent infringement. How to measure patent similarity accurately and quickly has therefore become a research hotspot. There are three main methods for measuring patent similarity: IPC code analysis, citation analysis, and keyword-based analysis. However, they cannot express the semantic relationships between the technologies described in patents. SAO (subject-action-object) structure analysis not only emphasizes keywords but also expresses the semantic relations among the components of a patent. Some researchers have measured patent similarity based on SAO semantic analysis, but previous studies have treated every SAO structure as equally important. In practice, the same SAO structure may appear in almost all patents when those patents address the same technology topic, so it is appropriate to distinguish SAO structures appearing in many patents from those appearing in few. This paper presents a method, called DW (distinguishing weight), for weighting each SAO structure extracted from a patent. Moreover, we propose a framework to find similar patents based on SAO semantic analysis that takes the different weight of each SAO structure into account. A case study measuring patent similarity in robot technology demonstrates the reliability of our method, and the results indicate its practical value in producing more accurate results.
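A minimal sketch of a distinguishing weight of the kind the abstract describes; the paper's actual DW formula is not given here, so an IDF-style weight is assumed for illustration:

```python
import math

def distinguishing_weights(patent_saos):
    """IDF-style 'distinguishing weight' (an assumption, not the paper's
    exact formula): SAO triples appearing in many patents get low weight,
    rare ones high weight.
    patent_saos: dict mapping patent id -> set of (subject, action, object)."""
    n = len(patent_saos)
    df = {}  # document frequency of each SAO triple
    for saos in patent_saos.values():
        for sao in saos:
            df[sao] = df.get(sao, 0) + 1
    return {sao: math.log(n / count) for sao, count in df.items()}
```

A triple present in every patent of a topic (e.g. a generic "robot grips object") then contributes nothing to similarity, while rare triples dominate.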

10:20
Drafting Alignment of Business Method Patent Documents: A Text-mining Approach

ABSTRACT. In this work, we define a measure of drafting alignment that quantifies the similarity between the claims and the specification of a patent document. Given that both the claims and the specification are texts that are usually large by any standard measure of data, we use methods from text mining to accomplish our objective. To measure the alignment, we first remove from each of the two texts (i.e., from the set of claims and from the specification) the punctuation, numbers, and "stop words", and, to avoid case sensitivity, convert the text to lowercase. Subsequently, we "stem" each word in each constituent document, then convert each resulting document to a vector, in line with the vector-space model. Further, we employ the SMART framework (Salton 1971) to prepare the document-term matrix corresponding to the two pre-processed documents. The drafting alignment is then calculated from the obtained document-term matrix. After defining the metric, we build a statistical (regression) model to investigate the relationship between various patent-specific factors and the defined drafting alignment. The objective of this investigation is to determine factors of which entrepreneurs or inventors should be cognizant as potential "drivers" of drafting alignment.
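The preprocessing and vector-space comparison can be sketched as follows; the tiny stop-word list, the crude suffix stemmer, and plain term-frequency weighting are illustrative stand-ins for the SMART-weighted pipeline the abstract describes:

```python
import math
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}  # illustrative

def preprocess(text):
    """Lowercase, keep only alphabetic words, drop stop words, and apply
    a crude suffix 'stemmer' standing in for a real stemming algorithm."""
    stems = []
    for w in re.findall(r"[a-z]+", text.lower()):
        if w in STOP_WORDS:
            continue
        for suffix in ("ing", "ed", "s"):
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
                break
        stems.append(w)
    return stems

def cosine_alignment(claims, specification):
    """Cosine similarity of raw term-frequency vectors: 1.0 means the two
    texts use the same stems in the same proportions, 0.0 means no overlap."""
    v1, v2 = {}, {}
    for t in preprocess(claims):
        v1[t] = v1.get(t, 0) + 1
    for t in preprocess(specification):
        v2[t] = v2.get(t, 0) + 1
    dot = sum(c * v2.get(t, 0) for t, c in v1.items())
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0
```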

09:40-10:40 Session 1B
Location: Rm 222
09:40
Evaluating research portfolios through ontology-based text annotation

ABSTRACT. Evaluating whether a portfolio of funded research projects (of a research council) or a portfolio of research papers (the output of a university) is relevant for science and for society requires two-dimensional mapping of the portfolio: (i) projecting it on a science map showing how the portfolio fits into, and possibly shapes, the research fronts, and (ii) projecting it on a map of societal challenges, showing where the portfolio links to societal problem solving or innovation. This requires evaluating in two different ‘languages’: a technical language relating projects to the research front, and a societal language relating projects to societal challenges. We developed an approach that annotates text (a paper or project abstract, for example) using entity recognition and existing knowledge taxonomies (ontologies). The advantages are that the method is much less dependent on subjective classifications by single experts or a small group of experts, and that it is rather user-friendly.

10:00
A Multi-Field Approach to the Author Uncertainty Problem

ABSTRACT. The ability to identify individual scholars is fundamental to evaluating productivity, mobility, collaboration and performance. Despite the advent of author identifiers and name-disambiguation algorithms, bibliometricians continue to be frustrated in attempts to disambiguate large-N common names. The rise of scientific powers like China contributes to this: ‘Wang’ is the most common surname in mainland China, accounting for some 7.25% of all last names in the most populous country on the planet. As such, it provides a unique case study. This study considers highly cited authors with this name and points to matching fielded data as a means of identifying the scholarship belonging to each. Results are obtained by a VantagePoint script that proceeds via multiple rounds of reduction. Starting with a Web of Science download for a common surname followed by a forename first initial, the script iteratively reduces the initial dataset by considering commonality among fielded data shared by two similar names. If the names being compared meet the script’s commonality threshold in Field A, their records advance to a Field B comparison (but are otherwise dropped from the dataset), and so forth. The order of field-based comparisons is nontrivial, as commonalities in certain fields are more telling than in others. After all reductions are made, the result is a body of scholarship that can be evaluated in terms of how closely it resembles the true list of publications for a given scholar. The results also allow for consideration of how the procedure advanced here compares against alternative techniques.
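The round-by-round reduction can be sketched as follows; the field names, records, and the commonality measure are illustrative assumptions, not the actual VantagePoint script:

```python
def commonality(rec_a, rec_b, field):
    """Overlap of a fielded value set, as a fraction of the smaller set."""
    a, b = set(rec_a.get(field, ())), set(rec_b.get(field, ()))
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def multi_field_reduce(seed, candidates, fields, threshold=0.5):
    """Round-by-round reduction: a candidate record advances to the next
    field comparison only if it clears the threshold in the current one."""
    surviving = list(candidates)
    for field in fields:
        surviving = [c for c in surviving if commonality(seed, c, field) >= threshold]
    return surviving
```

Ordering the fields from most to least telling prunes the dataset early, which is why the abstract notes the comparison order is nontrivial.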

10:20
Tracking researchers and their outputs: New insights from ORCID IDs
SPEAKER: Jan Youtie

ABSTRACT. The ability to identify scholarly authors is central to bibliometric analysis. Efforts to disambiguate author names using algorithms or national or societal registries become less effective as the number of publications from China and other nations with anglicized names increases, resulting in much name similarity. This work analyzes the adoption and integration of an open-source, cross-national identification system, the Open Researcher and Contributor ID system (ORCID), in Web of Science metadata. Results at the article level show greater adoption of the iD in European countries and organizations as compared with those in Asia and the US. Focusing the analysis on individual highly cited researchers with the anglicized Chinese surname “Wang,” results indicate wide scope for greater adoption of ORCID. The mechanisms for integrating ORCID iDs into articles also come into question in an analysis of the co-authors of one particular highly cited researcher, who have varying percentages of articles with ORCID iDs attached. These results suggest that systematic variations in the adoption and integration of ORCID into WoS metadata should be considered in any bibliometric analysis based on it.

10:40-11:05 Coffee Break
11:05-12:25 Session 2A
Location: Rm 225
11:05
Analysing the science-technology link using interlinked databases
SPEAKER: Jos Winnink

ABSTRACT. We implemented a data infrastructure to analyse relations between science and technology. This infrastructure is composed of three components: (1) a licensed in-house version of Clarivate Analytics’ Web of Science database (WoS) containing information on scientific papers published from 1980 onwards; (2) the PATSTAT database provided by the European Patent Office (EPO) containing bibliographic information on more than 90 million patent documents; and (3) an in-house developed database that links scientific publications cited in patents to corresponding records in the WoS.

The computerised algorithms developed and implemented to identify breakthrough publications are applied to the data in the WoS to generate a dataset that consists exclusively of potential breakthrough discoveries in science. Furthermore, we applied the OECD definition of ‘breakthrough inventions’ to construct a dataset of ‘breakthrough inventions’. We use bibliographic information from the WoS as a proxy for developments in science, and patent information from PATSTAT as a proxy for technological developments. Using these information sources, we are able to analyse not only the relation between scientific discoveries and technological developments in general, but also how breakthrough discoveries and breakthrough inventions interact.

11:25
Identifying Research Fronts Based on Scientific Papers and Patents using Topic Model: Case Study on Regenerative Medicine
SPEAKER: Zhengyin Hu

ABSTRACT. Accurately identifying the research fronts of a given area has become increasingly significant for scientific and technological development. Scientific papers and patents are an available and valuable data foundation for research-front identification, since scientific papers reflect basic research achievements and patents reflect applied research achievements. However, most studies identifying research fronts depend on a single kind of data source and rarely consider scientific papers and patents concurrently, which affects the accuracy of the results. Therefore, this paper constructs a new method to identify research fronts by combining scientific papers and patents. First, it uses the LDA topic model to generate research topics from scientific papers and patents. Then, common research topics of scientific papers and patents are mined based on topic similarity. After that, research fronts are identified according to the research topic age (RTA) and number of research topic authors (NRTA) indexes. Finally, it takes Regenerative Medicine (RM) as a case study. The result indicates that this method can identify research fronts from the perspectives of science and technology simultaneously, which makes the results more accurate. Further, the method can also be applied to track the evolution trends of research fronts.

11:45
Identifying potentially commercially disruptive technologies
SPEAKER: Peter Evans

ABSTRACT. There have been many attempts to analyse new or emerging technologies. In 2010 the IPO developed an algorithm to identify science intensive disruptive technologies using selected indicators based on patent data. During the course of that work it was observed that some traditional indicators calculated using academic literature data were nonsensical when applied to patent data. This is because patents are filed for multiple strategic reasons i.e. to allow direct commercialisation of an invention or series of inventions by an applicant or inventor, to allow the licensing of intellectual property, or to place the invention into the public domain to allow further advancement in the field or prevent others from patenting the same invention. Thus we surmise that simply analysing patent data in the absence of other external influences will not necessarily nor reliably indicate new, emerging or commercially disruptive technologies. To advance this field, the IPO’s recent research has focused on a different starting point, namely taking data related to known commercially disruptive products or companies. Detailed research was carried out to explore 60 products which are considered to be commercially disruptive. Data from the successful products or technologies were compiled into a matrix and, using multi-criteria decision analysis, were clustered into related groups. We have shown that products may be clustered into one of several areas. Our next step includes automated clustering using the existing numerical indicators and the application of machine learning techniques to the descriptive text contained in the patents.

12:05
Technology Cycles and the Evolution of Topics in Scientific Publications and Patents – Analyzing Keywords and Textual Data

ABSTRACT. In this paper, we aim to provide an automated, empirical implementation of the model of technology cycles. The development and evolution of scientific topics as well as technologies most often follows a similar pattern (e.g. Kuhn, 1970; Abernathy & Utterback, 1978; Utterback & Abernathy, 1975; Dosi, 1982; Meyer-Krahmer & Dreher, 2004; Schmoch 2007). Based on these (mostly) theoretical models, we propose a method that automatically determines the focus of textual data extracted from patents and publications representing a technology cycle. We start from the hypothesis that the two cycles have different characteristics that can be separated empirically. Our method consists of two steps. First, the titles and abstracts of patents and publications are clustered into groups of thematically related documents according to their similarity, based on tf-idf scores for each term in each document. By applying a clustering algorithm such as k-means, similar documents are grouped together based on a similarity measure, e.g. cosine similarity (Manning et al., 2008). In a second step, keywords describing the focus of a cluster are determined. After determining the characterizing terms ct_i(c) of each cluster by extracting P keywords, we assign a focus F(c) to each cluster c. We then analyze the focus of the cycles, e.g. by assessing the degree of theoretical or practical focus of the characterizing terms. In a final step, the method is evaluated by calculating precision, recall and f-score values on a test dataset.
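The first step (tf-idf vectors plus similarity-based grouping) might look like this minimal sketch; a single assignment step to hand-picked seed documents stands in for full k-means:

```python
import math

def tfidf_vectors(docs):
    """tf-idf weight for each term of each tokenized document."""
    n = len(docs)
    df = {}  # document frequency per term
    for doc in docs:
        for t in set(doc):
            df[t] = df.get(t, 0) + 1
    vecs = []
    for doc in docs:
        tf = {}
        for t in doc:
            tf[t] = tf.get(t, 0) + 1
        vecs.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vecs

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_clusters(vecs, seed_ids):
    """One k-means-style assignment step: each document joins the most
    similar seed document by cosine similarity."""
    return [max(seed_ids, key=lambda s: cosine(vecs[i], vecs[s]))
            for i in range(len(vecs))]
```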

11:05-12:25 Session 2B
Location: Rm 222
11:05
The Web of Innovation: Using Website Data to Understand How Firms Innovate
SPEAKER: Sanjay Arora

ABSTRACT. How firms approach the innovation journey is a function of a diverse array of factors, including technology specialization and know-how, market receptivity, ability to course-correct on strategic and tactical considerations, and access to alliances and networks. Most extant work examines these determinants through traditional data sources that allow only some of these constructs to be analyzed at a given time. In contrast, websites offer new ways to examine aspects of innovation simultaneously: websites contain information on human capital, product portfolios, investments, and alliances and networks. We focus on three industrial sectors – nanotechnology, green goods manufacturing, and synthetic biology. In the sampling design, we apply three already-published keyword search strategies (Arora et al., 2013; Shapira et al., 2015; and Shapira et al., 2017) to USPTO PatentsView; subsequently, we use a web-scraping implementation built on Apache Nutch to capture relevant data from firm websites. Information retrieval, natural language processing, and named entity recognition methods are applied to operationalize key variables. Then, we employ two types of statistical models: (1) clustering and classification models, and (2) random walks and Markov chains. The results show that high-tech small firms are likely to maintain distinct innovation strategies and orientations as revealed through their websites. For example, highly innovative small firms that patent frequently are more likely than non-highly patenting firms to stress certain concepts on their sites, e.g., university linkages, which signify access to new knowledge and research and development capabilities.

11:25
Identifying research labs of MNEs and analyzing Global Innovations based on patent data

ABSTRACT. ‘Global Innovations’ is a term used to describe new products and new technologies that are not only made for global markets, but whose ideas and inventions stem from global teams. This presentation suggests a method for identifying and analyzing the trends and characteristics of Global Innovations based on patent data, as well as a method for identifying the foreign research labs of (German) multinational companies. We define Global Innovations as patent applications with inventors from at least two continents – Europe, the Americas, Asia/Pacific, and Africa. Patents are assigned to continents based on inventors’ addresses, so that global innovations within a single firm can also be counted. The labs are identified by a rule-based approach using individual thresholds (critical mass). The method is validated against a small set of companies whose labs are known (a gold standard). Company data from BvD's Orbis database is taken into account, especially to identify dependent firms and companies. For this purpose, Orbis was matched with PATSTAT based on a similarity measure of company/applicant names (Neuhäusler et al. 2015). We also briefly present the matching procedure and the quality – in terms of recall and precision – of this approach.

11:45
Identifying European Cross-Industry inventions in the timeframe from 1980 to 2013 – A Combination of PATSTAT and Amadeus search

ABSTRACT. In many cases, the borders between different industries are blurring. In such cases, companies need knowledge from another industry to create inventions and, as a follow-up, innovations; often they need to cooperate with companies from other industries. This so-called cross-industry innovation is a phenomenon that has already been well investigated across different industries. Cross-industry innovation is observable in various “smart” products, such as smart home solutions or smart televisions, in which companies combine solutions from formerly separate industries.

The concept of cross-industry innovation is investigated in various research publications (e.g. Enkel, Gassmann 2010; Enkel, Heil 2014; Gassmann et al. 2010; Heil 2015; Levén et al. 2014; Lew, Sinkovics 2013). It is based on two strategic approaches: (1) the knowledge-based view and (2) the open innovation theory introduced by Chesbrough (2003). The majority of these studies concentrates on cases in which two (and not more) companies cooperate. This leaves open the question of whether cross-industry innovation takes place in cases with more than two companies, and if so, what relevance this – now sharpened – phenomenon has.

To answer these questions, we operationalize cross-industry innovation with the proxy of European patents. We describe a method to identify European cross-industry patents. To get the relevant data we use a patent search on the patent database PATSTAT and a company search on Amadeus database for the industry assignment. We apply time-series analysis on the whole data set and group the received patents by means of cluster analysis.

12:05
Translating patentometrics into useful intelligence: the case of 3D bioprinting

ABSTRACT. 3D bioprinting is expected to revolutionize the health industry. This research aims to extend the original scope of a hybrid data model for competitive technology intelligence applied to 3D bioprinting, which was described in an earlier work. The original model is based on the assessment of scientific publications and patent production, and is further supported by experts’ feedback. It describes technology pathways, specifically who, where and what “hot topics” are being developed. This research aims to strengthen the original approach by tracking the innovation pathway of 3D bioprinting technology from patents into specific products and applications. A scanning process was applied to business and news databases (EMIS, Google News, Factiva, LexisNexis and ProQuest News) and the results were integrated into a roadmap. Among the most notable findings are that vascular grafts and bone regeneration are the top 3D bioprinting technologies under investigation. Totipotent (embryonic) cells are being used to study future organ production and for cancer research. Moreover, skin printing appears to be a promising alternative to face a diversity of predominant health challenges. Specific types of biomaterials such as renal proximal tubule epithelial cells, or heart and liver tissue, are being printed for pharmaceutical research. Furthermore, human ears have been successfully printed and surgically placed. This research is intended for stakeholders along the 3D bioprinting supply and value chain. As 3D bioprinting is growing rapidly, annual updates are required to track this innovative technology.

12:25-13:45 Lunch in GaTech Hotel Dining Room
13:20-13:40 Session 3: Power Talks
Location: Rm 225
13:20
Technology Roadmapping of Emerging Technologies: Scientometrics, Time Series Analysis and SAO-based Approach

ABSTRACT. This work proposes an approach which combines a set of quantitative methods to generate technological roadmaps, which draw on Science, Technology & Innovation data. The approach is designed to be applied to emerging technologies and its outcomes can be considered as inputs for competitive technical intelligence activities. It comprises five integrated methods within the tech mining field, namely: scientometrics, for the retrieving and structuring of scientific publications and patents; text mining, in terms of term-clumping and subject-action-object analysis, for topical analysis; hierarchical clustering, to identify the structure of the technology; time series analysis, to get a quantitative measure of the evolution and forecast of the technology; and technological roadmapping, to integrate all the information in a single picture. The approach was applied to the combined field of additive manufacturing in aeronautics. The data was retrieved from Patseer, Web of Science and Scopus databases and the application of the overall approach allowed us to understand both what ideas have dominated the evolution of technology, and which can do so in the near future. Findings were placed in a technological roadmap. In the initial years, it shows embryonic developments of the technology, such as prototypes manufactured by stereolithography. On the other hand, in the short-term future it reveals new products to be available in the market, such as 3D printed blisk blades and 3D printed acoustic liners. Future lines of research should consider the integration of webscraping to identify subject-action-object structures in specialized webpages.

13:25
Measuring the convergence-divergence activities of technology based on International Patent Classification (IPC): examples from ICT
SPEAKER: Zhinan Wang

ABSTRACT. As pressing problems in science and engineering need solutions beyond the scope of a single field, the enhancement of creativity and innovation through convergence–divergence evolutionary processes plays an important role in society. Converging essential knowledge and experience from beyond one's own factory gate is necessary and key to successful innovation management (Curran and Leker 2011). The value of convergence has been highlighted by the NSF as a process for catalyzing new research directions and advancing scientific discovery and innovation. Convergence is actually part of a dynamic and cyclical convergence–divergence process that originates organically in brain functions and other domains of the global human activity system. The convergence phase consists of analysis, making creative connections among disparate ideas, and integration. The divergence phase consists of taking these new convergences and applying them to the conceptual formation of new systems; the application of innovation to new areas; new discoveries based on these processes; and multidimensional new outcomes in competencies, technologies, and products (Roco et al. 2013). The convergence process focuses on the technical side, to see “what” interacts; the divergence process focuses on the functional side, to see “how” things diverge. Patent information is a core element for technology management and strategic planning. This paper aims at measuring technological convergence-divergence activities using International Patent Classification (IPC) information. Thus two research questions remain: How to evaluate the convergence value of an IPC code on the technical side? How to measure the divergence speed or movement of an IPC code on the functional side?

13:30
Measuring the Interdisciplinarity of Technology based on Knowledge Flows in Patents: a Case Study in Synthetic Biology
SPEAKER: Dong Wan

ABSTRACT. Despite much research on the interdisciplinarity of publications across subject categories, few studies analyze the interdisciplinarity of technology from a knowledge-flow perspective. Yet the methods for measuring interdisciplinarity in publications, especially citation-based approaches, can be flexibly applied to patent data. This paper therefore presents a procedural method to analyze citation-based interdisciplinarity by measuring the technology knowledge flow of patents. The method constructs a technology knowledge flow map that shows knowledge flows among IPC codes, and can also represent it as a technology-field knowledge flow map by exploiting the concordance between IPC codes and technology fields. In order to measure the degree of technology interdisciplinarity in a particular research system, we adopt the indicators of integration (I), diffusion (D) and specialization (S) used for publications and apply them to patent data. Moreover, at the end of our method we output a visual technology interdisciplinarity map to interpret the interdisciplinary impact and interdisciplinary causality of a technology. The presented method is illustrated using patents related to synthetic biology. The result shows that synthetic biology is an active interdisciplinary technology: it both originates from a number of technology categories and feeds back into them, and the assignees play different roles in the process of knowledge flow. Furthermore, we expect the method to become a basis for systematic systems for assessing the interdisciplinarity of technology and for supporting technology experts in conducting knowledge-intensive technology planning activities.

13:35
Intelligence Gathering to Support Strategic Decision-Making in a Private organization
SPEAKER: Ajit Kumar

ABSTRACT. As the saying goes, “Opportunity dances with those already on the dance floor”: there lies an opportunity in enhancing strategic decision-making capability through data-driven decisions, which are more objective in nature and rational for future businesses and technology portfolios. We are in a data-rich environment, and the primary challenge is to gather intelligence, and put it to use, in a way that guides the technology strategy of organizations. The challenge is made complex by the rapid pace of technology change and convergence that drives firms' competitiveness and transformation. A four-step process is generally used for such decision-making: defining the problem; constructing a model describing the real-world problem; identifying possible solutions to the modeled problem; and evaluating solutions and recommending the appropriate one. We relate such decision-making based on technological intelligence to the sense function of dynamic capabilities theory, requiring data to be captured from different sources (patents, citations, market reports, social media, government reports) and in different forms and formats (structured, semi-structured and unstructured; text, images, video, simulations). The evolution of big-data capabilities and techniques such as natural language processing, network analysis, and visualization gives a new lease of life and a new approach to technology management. This supports strategic decision makers in their decisions and foresight activities towards managing innovations and emerging technologies. We present interesting findings for an organization based on a mix of such techniques to support decision making for seizing opportunities towards value creation.

13:45-14:45 Session 4A
Location: Rm 225
13:45
Expert knowledge similarity measurement using network graph edit distance
SPEAKER: Qingyun Liao

ABSTRACT. In the era of the knowledge economy, finding and identifying suitable experts, and locating competitors or partners, are important for both companies and governments; expert knowledge measurement is a key solution to this challenge. This paper therefore provides a new way to measure expert knowledge similarity. We extract expertise features – authors' keywords – for each expert from the text of the papers they have published, and construct an expert knowledge network map based on the author-keyword matrix. To measure expert knowledge similarity, the graph edit distance method is then applied, creatively, to the expert knowledge network maps. From a case study, we obtain an expert knowledge similarity matrix in the big data domain. For every expert, we can find several similar experts who share knowledge. We also obtain a “big data” domain expert knowledge distribution map that vividly shows the knowledge in the domain, such as “parallel computing”. Graph edit distance thus proves to be an efficient method for measuring expert knowledge similarity. The study is meaningful for expert finding, expert identification, and the locating of competitors or partners, and applying graph edit distance to expert knowledge network maps is a relatively new and creative approach.
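For keyword networks whose nodes are identified by their labels, graph edit distance reduces to counting node and edge insertions/deletions; a minimal sketch of that special case (illustrative, not the authors' implementation):

```python
def keyword_graph(papers):
    """Build an expert's keyword co-occurrence network: nodes are author
    keywords, edges link keywords used on the same paper."""
    nodes, edges = set(), set()
    for kws in papers:
        nodes.update(kws)
        ordered = sorted(kws)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                edges.add((ordered[i], ordered[j]))
    return nodes, edges

def labeled_graph_edit_distance(g1, g2):
    """For label-identified graphs, the edit distance is the number of
    node and edge insertions/deletions: the symmetric differences."""
    n1, e1 = g1
    n2, e2 = g2
    return len(n1 ^ n2) + len(e1 ^ e2)
```

A distance of 0 means two experts' keyword networks coincide; larger distances indicate less shared knowledge.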

14:05
Bibliometric Network Densification Patterns for Three Renewable Energy Technologies
SPEAKER: Elisa Boelman

ABSTRACT. The aim of this paper is to explore and illustrate a possible use of densification, a metric derived from network theory, to shed light on the evolution of three renewable energy technologies. The combination of statistical analysis of publications (bibliometrics) and network analysis allows the monitoring of technological developments and can be used to identify emerging topics. Renewable energy has been addressed both by the European Commission and in bibliometric studies.

We used the JRC-developed Tools for Innovation Monitoring (TIM) software to perform bibliometric analysis of three renewable energy technologies and exported the results to calculate network densification metrics as defined by Bettencourt. TIM counts activity levels and uses network analysis to identify and visualise relationships between entities publishing scientific content. We used TIM to retrieve information from the SCOPUS database about scientific publications and entities in wind energy, photovoltaics and geothermal energy, designing Boolean search strings in the TIM tool to retrieve documents containing specific keywords in the title, abstract or keywords of publications in a given period of time.

For the technologies examined, our results from network theory and bibliometric analysis provide potentially relevant metrics for mapping these technologies according to their developmental stage. As expected, higher densification exponents (>1) correspond to technologies with more established collaboration networks, while lower densification exponents (<=1) are obtained for more emerging technologies. The level of granularity of the technologies appears adequate, and the approach could be applied to a wider range of upcoming renewable energy technologies.
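For readers unfamiliar with the densification metric, the exponent can be estimated as the slope of a log-log regression of edge counts against node counts over time. The counts below are invented for illustration, not taken from the study.

```python
import numpy as np

# Hypothetical node and edge counts of a collaboration network
# observed over five successive time windows.
nodes = np.array([100, 250, 600, 1400, 3200])
edges = np.array([150, 520, 1800, 6100, 20500])

# Densification law (Bettencourt et al.): E ~ N**alpha.
# alpha > 1 suggests an increasingly dense, established network;
# alpha <= 1 suggests a sparser, more emerging one.
alpha, log_c = np.polyfit(np.log(nodes), np.log(edges), 1)
print(round(alpha, 2))
```

Here the fitted exponent comes out above 1, so under this toy data the network would be classified as having an established collaboration structure.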

14:25
Knowledge without borders? A re-investigation from the spatial and temporal perspective
SPEAKER: Jue Wang

ABSTRACT. This study investigates the spatial and temporal effects of knowledge diffusion. Knowledge is said to know no borders, especially codified knowledge that can be easily broadcast in the form of publications. This paper explores the impact of geographic proximity on the diffusion of codified knowledge, and argues that codified knowledge also transmits faster in close proximity and is subject to similar geographic constraints, though to a lesser extent. The advantage of geographic proximity should be particularly relevant in the early stage of dissemination. We collected three sets of research articles published in 1990, 2000 and 2010 and compared the citations they received domestically and from abroad. We found that domestic citations accumulate faster than foreign citations, reaching their peak in 3-4 years, while foreign citations did not reach their highest point until 11 years after publication. The results show that geographic proximity does play a role in the transmission of knowledge: those located closer to the knowledge origin are exposed and react to publications faster, but the advantage of geographic proximity fades over time.

13:45-14:45 Session 4B
Location: Rm 222
13:45
Interactions between data science and policy analysis: Evidence from the perspective of bibliometrics

ABSTRACT. With the rapid development and broad application of information technologies, interactions between data science and policy analysis have been widely observed. This paper aims to answer two research questions: 1) given the increasingly in-depth interactions between data science and policy analysis, how can we detect and track such interactions and visualize their dynamics over the past decades? 2) Since these interactions appear to be a trend, is there a direct correlation between them and the quality of the related research? A bibliometric study is conducted, consisting mainly of topic analysis and network analysis. Topic analysis addresses the first question: topic models are used to identify core topics, and a model of scientific evolutionary pathways discovers the evolutionary relationships between topics (e.g., predecessors and descendants) within a learning process. For the second question, a co-authorship network is constructed and the degree of active interaction of its nodes is identified via centrality measures. Citation statistics and journal impact factors then act as reference indicators for evaluating article quality, and correlation analysis is used to examine whether article quality correlates with active cross-disciplinary interaction. Finally, the quantitative results are integrated with expert knowledge, and an in-depth discussion explores insights into the interactions between data science and policy analysis.
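The correlation step described above could be sketched as follows; the co-authorship network, citation counts and the choice of degree centrality are illustrative assumptions, not the paper's actual data or measures.

```python
import networkx as nx
from scipy.stats import spearmanr

# Hypothetical co-authorship network and per-author citation counts.
g = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")])
citations = {"A": 120, "B": 45, "C": 50, "D": 60, "E": 10}

# Degree centrality as one possible proxy for how actively an
# author interacts across the network.
centrality = nx.degree_centrality(g)

# Rank-correlate centrality with citation counts across authors.
authors = sorted(g.nodes)
rho, p_value = spearmanr([centrality[a] for a in authors],
                         [citations[a] for a in authors])
print(round(rho, 2))
```

A high positive rank correlation under this setup would be (weak, descriptive) evidence that more actively interacting authors also produce more-cited work; on five toy points the p-value is of course meaningless.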

14:05
Tracking the Emergence of Synthetic Biology

ABSTRACT. Synthetic biology is an emerging domain that combines biological and engineering concepts and has seen rapid growth in research, innovation, and policy interest in recent years. This presentation will discuss an effort to delineate the emerging domain using a newly constructed bibliometric definition of synthetic biology. The approach starts from a core set of papers in synthetic biology, using procedures to obtain benchmark synthetic biology publication records, extract keywords from these benchmark records, and refine the keywords, supplemented with articles published in journals dedicated to synthetic biology. The search strategy is compared with other recent bibliometric approaches to defining synthetic biology, using a common source of publication data for the period 2000 to 2015. The research details the rapid growth and international spread of synthetic biology research in recent years, demonstrates that diverse research disciplines are contributing to its multidisciplinary development, and visualizes this by profiling synthetic biology research on the map of science. Also shown is the role of a relatively concentrated set of research sponsors in funding the growth and trajectories of synthetic biology. In addition to discussing these analyses, the presentation notes limitations and explores lines for further work, including refining the approach through further applications of machine learning and adapting it for analyses of patent landscapes in synthetic biology.

14:25
Tech mining tools for technology roadmapping: their usage in trends monitoring and bibliometric analysis

ABSTRACT. This paper presents an approach to integrating trends monitoring and bibliometric analysis into technology roadmapping in the area of UAVs: it provides an overview of the trends influencing the development of UAVs (using STEEPV categories), performs bibliometric analysis of publications from 2005-2015 to identify hot topics, and shows how this information can be used in the technology roadmapping process. The results can serve as a guide for researchers, business representatives and policy makers involved in foresight activities.

The methodological approach combines qualitative methods (literature review, expert procedures) and quantitative methods (bibliometric analysis, technology mining). It includes the following stages:

1. Semi-quantitative monitoring of STEEPV trends (using different information sources: international reports, news, foresight studies, strategic documents, conference materials, etc.).
2. Quantitative analysis of Scopus publications in 2005-2015 (bibliometric analysis and technology mining with Vantage Point and VOSviewer).
3. Combination of the results of the quantitative and qualitative procedures and their integration into a technology roadmap.

As a result of this study, a relative assessment of the STEEPV trends influencing the UAV area was conducted using quantitative data processing and expert procedures; the main research fronts and hot topics were detected (e.g., green UAVs, flying cars, UAV sensor technologies, drone swarms); and the possibilities for integrating trends monitoring and bibliometric analysis at different stages of technology roadmapping were explored and analyzed. Studying ways to use these tools in developing, validating and updating technology roadmaps remains an open avenue for further research.

14:45-15:10 Coffee Break
15:10-16:10 Session 5A
Location: Rm 225
15:10
HUMAN ASSIGNED VS. MACHINE CREATED: LINKS BETWEEN PATENTS AND SCHOLARLY PUBLICATIONS
SPEAKER: Arho Suominen

ABSTRACT. To measure knowledge flows between scholarly literature and patents, studies have used several approaches, such as keyword-search-based counts of science publications and patenting in a technology (with an expectation of linearity of innovation), patents citing publications or vice versa, or author-inventor co-occurrences. Patent citations, specifically non-patent references, have often been seen as a proxy for the "science dependence" or "science base" of a technology, although this has been critiqued as an oversimplification. Another avenue for classifying patents and publications is machine learning, specifically unsupervised learning. In this study, we analyse the relationship between publications and patents by looking at the intersection of human-assigned and machine-learned linkages between science and patents. We take a macro-level approach, focusing on all science publications and patents from one country.
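As a minimal sketch of what unsupervised learning over documents can look like, the snippet below clusters a few toy text fragments by their TF-IDF vectors with no human-assigned labels; the documents and the choice of k-means are illustrative assumptions, not the authors' method.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical document snippets: two on patent analysis, two on topic modeling.
docs = [
    "patent citation network analysis",
    "citation network analysis of patents",
    "deep learning topic models",
    "topic models with deep learning",
]

# Unsupervised learning: vectorize with TF-IDF, then cluster with k-means.
# No labels are supplied; the grouping emerges from the text alone.
X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

On real corpora one would compare such machine-learned groupings against human-assigned classifications (e.g. patent classes) to study where the two linkage schemes agree and diverge.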

15:30
Exploration of China's technological innovation capacity from knowledge flow
SPEAKER: Jiao Zhang

ABSTRACT. Innovation through knowledge flow, use and creation has become a major driver of economic growth, and the measurement of technological innovation capacity has gained increasing attention from researchers. The role of knowledge exchange is especially important in a knowledge- and technology-driven economy because it allows better penetration and diffusion of innovation and stimulates cooperation in R&D. In this paper, we propose a method to explore China's technological innovation capacity by measuring the technological knowledge flow of patents at two levels: macroscopic and microscopic. The macroscopic analysis focuses on the main regions and countries of technology inflow/outflow, for which two indicators, the self-citation ratio and the other-citation ratio, are proposed. The microscopic analysis emphasizes technology integration and technology diffusion among the IPC codes of patents and their citations; we propose technology-integration and technology-diffusion indicators to reveal the absorption, integration and re-innovation of knowledge as well as its diffusion and influence. We compare the changes at both levels across three time periods to characterize changes in China's technological innovation capability in different technical fields. The results show that more and more technical fields are inclined to cite Chinese patents and that China has strong capabilities in technology absorption and diffusion. In general, China's technological innovation capability has improved significantly.
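A minimal sketch of the macroscopic indicators, assuming backward citations tagged with the country of the cited patent (toy data, not the paper's):

```python
# Hypothetical backward citations: for each citing patent, the country
# codes of the patents it cites.
citations = {
    "CN-101": ["CN", "CN", "US"],
    "CN-102": ["CN", "JP"],
    "CN-103": ["US", "US", "CN"],
}

home = "CN"
cited = [c for refs in citations.values() for c in refs]

# Self-citation ratio: share of backward citations going to domestic
# patents; other-citation ratio: share going to foreign patents.
self_ratio = cited.count(home) / len(cited)
other_ratio = 1 - self_ratio
print(self_ratio, other_ratio)
```

Tracking how these ratios shift across time periods and technical fields gives the kind of inflow/outflow picture the abstract describes.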

15:50
An Analysis of Bogie Technology Development: Based on Patent Generational Citation Tree
SPEAKER: Bingxiu Gui

ABSTRACT. This paper uses a patent generational citation tree to explore the development of bogie technology, one of the core technologies of high-speed rail. First, patent citation information is used to build the generational development tree of bogie technology. We then examine three aspects of information closely related to technology development: patent titles, patent publication dates and patent-granting regions. On this basis, bogie technology development is analyzed from three perspectives: the technology development path, the technology life cycle and the regional diffusion of the technology. Our results show that bogie technology has developed along a clear main path; the technology life cycle is still in its growth stage, while the diffusion of the technology is somewhat lacking and lagging in terms of scope and speed. Finally, we use a sampling method to select a complete citation chain to verify the results.
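One way to sketch the citation-tree idea is with a small directed graph, where an edge points from a cited patent to the patent citing it; the patent IDs and the use of the longest chain as a stand-in for the main path are illustrative assumptions, not the paper's algorithm.

```python
import networkx as nx

# Toy citation DAG: edge u -> v means patent v cites patent u,
# so paths run from the root patent toward later generations.
g = nx.DiGraph([("P1", "P2"), ("P1", "P3"), ("P2", "P4"),
                ("P3", "P4"), ("P4", "P5")])

# Generation of each patent: its distance from the root of the tree.
generations = nx.single_source_shortest_path_length(g, "P1")

# A simple stand-in for a main development path: the longest chain in the DAG.
main_path = nx.dag_longest_path(g)
print(generations["P5"], len(main_path))
```

Attaching titles, publication dates and granting regions to the nodes of such a graph then supports the path, life-cycle and diffusion analyses the abstract describes.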

15:10-16:10 Session 5B
Location: Rm 222
15:10
TechMining LexisNexis: Challenges in Using Legal Journal Articles in Cocitation Analysis and Text Mining
SPEAKER: Lexi White

ABSTRACT. The study of citation patterns in scientific research has been a fruitful area of study in recent years. Many scientometric researchers have investigated networks of research publications and indicators in a variety of databases. Little research, however, has focused on citation patterns among legal publications (see Shapiro & Pearse, 2006), and almost none has looked at citation patterns between legal and scientific publications (see Pasadeos et al., 2006). Since legal publications are housed in different databases than scientific publications, they are excluded from large citation studies such as those by Leydesdorff and colleagues (2015) that probe databases such as SCOPUS and Web of Science. While the two primary legal publication databases, LexisNexis and Westlaw, are owned by the publishing giants Elsevier and Thomson Reuters respectively, they operate very differently and are not optimized to allow exploration of network patterns among articles. This research explores citation patterns on a specific, bounded topic (sugar-sweetened beverages) across not only scientific research but also legal research.

15:30
SMS: a platform for linking and enriching data for science and innovation studies

ABSTRACT. Funded by the European Commission, the RISIS project develops a data infrastructure for science, technology and innovation studies. A core component of the infrastructure is the SMS platform, developed to support the integration and enrichment of heterogeneous data: private data (such as WoS), confidential data (e.g., surveys and other privacy-sensitive data), and open data (as available on the Web). The SMS platform, based on semantic web technologies and Linked Open Data principles, supports the integration of these data sets and enriches the data. It helps the user retrieve a tailored data set that includes the required variables, and the data can be exported in various formats for advanced analysis and visualization. At the conference, the platform will be briefly described, and we will demonstrate its use with a few research examples: (i) analyzing the geography of innovation and (ii) explaining the quality of higher education institutions through characteristics of their geographic location.

15:50
Mapping the computer science research landscape in sub-Saharan Africa: Bibliometric and altmetric analyses
SPEAKER: Matthew Harsh

ABSTRACT. Standard metrics that focus on journal publications, citations, and patents do not present a complete picture of research productivity in Africa. A large amount of research in Africa is published in gray literature or only as MSc or PhD theses (Harsh & Zachary 2013). In addition, much African research is published in local journals that remain invisible to the global scientific community (Tijssen et al. 2006). The invisibility of this research raises important concerns about the criteria and methodology for measuring scientific productivity and research capacity, evaluating research performance and understanding the impact of research in Africa. The aim of our research is to reduce this invisibility and map the contours of computer science research activity in sub-Saharan Africa. Our study uses a mixed-methods approach based on bibliometric, altmetric and survey data, built on a database of sub-Saharan computer science researchers constructed by our team. To map African computer science research productivity and visualize collaboration patterns, we use bibliometric and altmetric data derived from specialized databases such as ACM, IEEE Xplore, and INSPEC and from academic networking sites such as Academia.edu. We use co-authorship as the measure of scientific research collaboration, and analysis of authors' institutional affiliations to determine international collaboration. Lastly, we analyzed responses to an internet survey of computer science researchers in the region to understand the wider impacts of research and provide additional insights on research productivity and collaboration.
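The co-authorship measure described above can be sketched in a few lines; the author names and paper lists are hypothetical, not drawn from the team's database.

```python
from itertools import combinations
import networkx as nx

# Hypothetical author lists of four papers.
papers = [["Adeyemi", "Banda"],
          ["Adeyemi", "Chirwa", "Banda"],
          ["Diallo"],
          ["Chirwa", "Diallo"]]

# Co-authorship network: every pair of co-authors on a paper gets an edge.
g = nx.Graph()
for authors in papers:
    g.add_edges_from(combinations(authors, 2))
print(g.number_of_nodes(), g.number_of_edges())
```

Mapping each author to an institutional affiliation and flagging edges whose endpoints sit in different countries would then yield the international-collaboration measure the abstract mentions.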

16:15-17:15 Session : IP Panel Discussion and TechMining for Global Good Award Ceremony

Panel Discussion: IP Data in TechMining

Dr. Alan Marco, 2014-2017 Chief Economist of the United States Patent & Trademark Office (USPTO)
Rich Corken, Head of Data Science and IP Analytics in the UK Intellectual Property Office's Informatics Team
Prof. Dr. Martin G. Moehrle, Institute of Project Management and Innovation, Universität Bremen

Presentation of 2017 TechMining for Global Good Award

 

Location: Rm 222