GTM2015: 5TH GLOBAL TECHMINING CONFERENCE
PROGRAM FOR WEDNESDAY, SEPTEMBER 16TH

09:00-09:30 Session 3: Welcome and Opening Keynote

Welcome: Alan Porter and Denise Chiavetta, 2015 Program Co-Chairs

Keynote: "Some considerations and challenges for technology data mining" - Dr. David J Holland Smith, Dstl Fellow

Location: Room 222
09:35-10:45 Session 4A: Data Science and TechMining
Location: Room 222
09:35
Innovation and business growth in a strategic emerging technology: New methods for real-time intelligence on graphene enterprise development and commercialization

ABSTRACT. This paper presents the results of research to develop new data sources and methods for real-time intelligence to understand and map enterprise development and commercialization in a rapidly emerging and growing new technology. The paper draws on research that is developing novel and scalable methods to mine and combine information from unstructured online sources, including enterprise webpages; established structured databases, including data on patenting; and qualitative information. The promise of this strategy is that it combines up-to-date online data sources, including fast-breaking streams, with available structured data and interview insights, so as to allow the development of real-time and ongoing monitoring, mapping and analysis. The research focuses on enterprise development and commercialization strategies in graphene. In the paper, we systematically analyze the development and commercialization strategies of 74 firms through web content mining, structured data analysis, and qualitative analysis. We show how graphene commercial activity is moving from the production of graphene materials to intermediate and final products, examine shifts towards more specialized applications, and investigate implications for business and policy. Although the paper is focused on graphene, the approaches and methods developed are applicable to other emerging strategic new technologies.
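
To make the web content mining step concrete, here is a minimal sketch, with an invented URL and illustrative keyword lists, of how enterprise webpages can be fetched and profiled against stages of the graphene value chain; it is a toy stand-in under stated assumptions, not the authors' actual pipeline.

```python
# Profile a firm's webpage against hypothetical value-chain keyword lists;
# real lists would be expert-curated.
import requests
from bs4 import BeautifulSoup

STAGE_KEYWORDS = {
    "materials":    ["graphene powder", "graphene flakes", "wafer"],
    "intermediate": ["graphene ink", "composite", "coating"],
    "final":        ["sensor", "battery", "touchscreen"],
}

def profile_firm(url: str) -> dict:
    """Count value-chain keyword hits in the text of a firm's webpage."""
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text().lower()
    return {stage: sum(text.count(kw) for kw in kws)
            for stage, kws in STAGE_KEYWORDS.items()}

# Example call (hypothetical URL):
# print(profile_firm("https://example-graphene-firm.com"))
```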

09:55
Assessment of entrepreneurial activity in an innovation system: Towards measurement models and indicators

ABSTRACT. Innovation as a socioeconomic phenomenon has been recognized as playing a crucial role in strengthening the competitive advantages of economies and their long-term growth. Effective innovation policies provide opportunities for narrowing the innovation gap between emerging economies and innovation leaders. Governments, as the major funders and supporters, are also obliged to provide a platform where innovations can flourish. However, fostering innovation faces challenges such as misaligned regulation and policies, which slow down the development process. This misalignment leads to misinterpretation of the situation, with governments ending up squandering resources on similar programs (Wessner 2005). The importance of government interaction with other entities has been addressed earlier in the literature: the concept of the Triple Helix of university-industry-government relationships was initiated in the 1990s by Etzkowitz (2003). Looking at the situation now, with the vast development of IT infrastructure, the amount of data in our world has been exploding, and multimedia and individuals with smartphones on social network sites will continue to fuel exponential growth. The "Triple Helix" concept has therefore recently evolved into the "Quadruple Helix" (Carayannis and Campbell 2009), which emphasizes the importance of integrating the perspective of the media-based and culture-based public. Given this surge of data from various sectors, there is a need to broaden the view and engage with novel unstructured data. Within the innovation context, studies have been conducted on countries' innovation indexes. Our approach is to use new sources of public data to explore societal and cultural capacities, which have been considered an important innovation-potential factor of economies. Since social network services may be the best representation of social activity, our focus will be on extracting meaningful information from this valuable source.

10:15
Under-reporting research relevant to local needs in the global South: Database biases in the representation of knowledge on rice
SPEAKER: Ismael Rafols

ABSTRACT. Bibliometrics can provide very helpful tools for developing knowledge representations that can help in addressing grand challenges or societal problems, such as tackling obesity, climate change or pandemics. However, these representations are highly dependent on the data and methods used. The aim of this paper is to investigate potential biases introduced by available databases in the representation of research topics.

In a previous study on rice research, we showed that the bibliographic database CAB Abstracts (CABI) – which is focussed on agriculture and global health – has a larger coverage of rice research for most low income countries than Web of Science (WoS) or Scopus.

In this study, we present evidence that this unequal coverage significantly distorts the knowledge representation of rice research, globally and for individual countries. We find that the journal coverage of the bibliometric databases WoS and Scopus under-represents some of the more application-oriented topics, namely: i) production, productivity and plant nutrition; ii) plant characteristics; and iii) diseases, pests and plant protection.

Given that these are issues relevant to small farmers, who produce for the local market and have no access to seeds developed with molecular biology techniques (GM), we pose the question of whether the inadvertent effect of the biases in the dominant databases is to under-represent the type of research that is relevant for improving their wellbeing without introducing the use of the highly contested GM seeds.

09:35-10:45 Session 4B: Patents and TechMining
Location: Room 225
09:35
Ownership transfer of patents at the State Intellectual Property Office of China

ABSTRACT. Ownership transfer of patent rights is a special form of technology transfer. Acquiring a patent implies that a firm sees a market for a technology where others do not. This technology acquisition, however, involves costs for adopting the technology, which might include structural changes within the firm, increased R&D expenditures or hiring qualified R&D personnel (Serrano 2006). Patent acquisition can thus be seen as an investment, which is especially true when only little is known about the acquired technology. Empirical evidence shows that ownership transfer of patents from foreign to Chinese owners (filings originally filed by a foreign applicant) has increased enormously in recent years; the phenomenon now accounts for more than 6,000 patents per year. At least at first sight, this implies not only that Chinese firms see a market for the commercialization of foreign technologies, but also that they try to learn from their foreign counterparts. We intend first of all to analyze basic trends in the transfer of patent ownership in China, i.e. who the sellers and buyers of SIPO filings are, but also to examine the structures involved. This includes the question of whether firms acquire technologies they are already familiar with or vice versa, which we address by comparing the profile of the transferred patents with the existing patent profile of the new owners at the technology-field level. In addition, we will test the assumption that especially (technologically) valuable patents are acquired, as these can be assumed to form the basis of new technological lines within firms.
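
One way to operationalize the familiarity question is sketched below: cosine similarity between the technology-class profile of the transferred patents and the acquirer's existing portfolio. All counts are invented, and the authors' actual measure may differ.

```python
# Compare the IPC-class profile of acquired patents with the acquirer's
# existing portfolio; a cosine near 1 suggests a familiar technology.
import numpy as np

ipc_classes = ["H01L", "G06F", "C07D"]                  # illustrative classes
acquired_profile  = np.array([10, 2, 0], dtype=float)   # transferred patents
portfolio_profile = np.array([50, 30, 1], dtype=float)  # acquirer's stock

cos = (acquired_profile @ portfolio_profile /
       (np.linalg.norm(acquired_profile) * np.linalg.norm(portfolio_profile)))
print(f"familiarity (cosine): {cos:.2f}")
```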

09:55
Methodology for Identifying Pharmaceutical Key Molecules Using Technology Foresight of Patent Documents

ABSTRACT. Pharmaceutical companies use patents to protect the majority of their potential innovations. The content of patent documents constitutes highly significant information, since the protected technical knowledge may be used for the research and development of new inventions or improvements. The different forms of patent protection in the pharmaceutical sector include the processes for producing a given active pharmaceutical ingredient (API). The first process patent applications are normally filed in the early stages of research, and in some cases are filed together with the actual patent for the molecule. When a product is successful on the market, a technology race begins and other companies/researchers start to investigate potential routes for producing the molecule, either to find new ways of synthesizing it or to improve aspects of its manufacture, such as using simpler reagents or reducing the number of stages in the synthesis process. These new routes are also protected by patents, as they may replace the route in the original patent. Technology foresight based on patent documents for production processes is a way of identifying key molecules for the production of a given API, since such documents contain descriptions of the reagents and intermediates involved in the process, and of the physicochemical reaction conditions. Identifying the molecules is strategic, because it is through them that the number of synthesis stages can be reduced, and new production routes and/or analogous drugs can be developed. This study sets forth a methodology for identifying the key molecules for an API. It also presents the findings of a case study that used this methodology to search for key molecules (structurally similar reagents or intermediates) for the production of zidovudine, an API for antiretrovirals widely used in HIV/AIDS treatment around the world.
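
A minimal sketch of the structural-similarity screening idea, using RDKit Morgan fingerprints and Tanimoto similarity; the SMILES strings are simple placeholders rather than actual zidovudine intermediates, and the authors' methodology may use different descriptors.

```python
# Flag candidate key molecules by fingerprint similarity to a target.
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

target    = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # placeholder API
candidate = Chem.MolFromSmiles("OC(=O)c1ccccc1O")        # placeholder reagent

fp_t = AllChem.GetMorganFingerprintAsBitVect(target, 2, nBits=2048)
fp_c = AllChem.GetMorganFingerprintAsBitVect(candidate, 2, nBits=2048)

sim = DataStructs.TanimotoSimilarity(fp_t, fp_c)
print(f"Tanimoto similarity: {sim:.2f}")  # high values flag key molecules
```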

10:15
An in-depth study of patent thickets: A case study of lithium-ion accumulators

ABSTRACT. We propose an extensive study of patent thickets within lithium-ion accumulator technologies. Using a combination of IPC codes and keywords, we identified 38,399 patents worldwide between 2000 and 2015, of which 10,973 are linked by either a category X or a category Y citation. This dataset allowed us to identify 1,849 patent thickets among 285 firms over the 15-year period. The network interconnecting these thickets is shown in figure 1. The results show that the market leaders (Panasonic, Toyota, Nissan, Samsung, Sony) are involved in a large number of thickets (320 for Panasonic, 282 for Toyota and 371 for Samsung). We wish to extend the standard analysis of patent thickets in three ways. Thickets are usually identified between three firms; our first step will be to extend thickets to higher levels and assess the coherence of the observed thickets. Second, we will study the strength of thickets by counting how often we identify the same firms in a thicket (for any given level). Finally, we follow the evolution of thickets over time to better understand how they are formed between firms and how they might be reinforced.
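
The triadic notion of a thicket can be made concrete as triangle detection in a firm-level graph of blocking (category X/Y) citations; the sketch below uses invented firm pairs and is not the authors' implementation.

```python
# Identify candidate three-firm patent thickets as triangles in a
# firm-level network of blocking (category X or Y) citations.
import networkx as nx
from itertools import combinations

# Each edge: two firms whose patents block each other via X/Y citations.
blocking_pairs = [
    ("Panasonic", "Toyota"),
    ("Panasonic", "Samsung"),
    ("Toyota", "Samsung"),
    ("Sony", "Samsung"),
]

G = nx.Graph()
G.add_edges_from(blocking_pairs)

# A triadic thicket = three firms that mutually block one another,
# i.e. a triangle in the blocking-citation graph.
triangles = [
    tri for tri in combinations(G.nodes, 3)
    if all(G.has_edge(a, b) for a, b in combinations(tri, 2))
]
print(triangles)  # [('Panasonic', 'Toyota', 'Samsung')]
```

Extending thickets "to higher levels", as the abstract proposes, would amount to searching for larger cliques in the same graph.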

10:45-11:05 Coffee Break
11:05-12:35 Session 5A: SPECIAL PANEL: Emergence of Community for Technical Emergence?

Evgeny Klochikhin, Kevin Boyack, and Alan Porter

This panel will consist of short presentations on the current state of “technical emergence” research, followed by discussion of the main issues associated with developing indicators of technical emergence. The goal of the session is to gauge interest in technical emergence and to assess the feasibility of launching a stand-alone technical emergence conference.

Location: Room 222
11:05-12:35 Session 5B: TechMining Applied
Location: Room 225
11:05
A Scientometric Analysis of Additive Manufacturing in Latin America

ABSTRACT. This research presents a competitive technical intelligence analysis of additive manufacturing technology. The presence of this technology across global industry is discussed, and challenges for Latin America are identified.

Additive Manufacturing is the process of joining materials to produce three-dimensional objects from digital models. The term 3D printing is much more popular, and is commonly used as a synonym for additive manufacturing.

This technology emerged in the 1980s and grew slowly during its first two decades; however, the 3D printing market has expanded dramatically since 2012. Research carried out throughout these years has resulted in a continuous expansion in the fields of design and manufacturing, and its impact is clearly growing.

Additive manufacturing continues to broaden across different technologies, markets, industries and regions. Latin America experiences the technology's first effects at the market level rather than in technology development, and its participation in patent applications is still low in comparison to developed regions. According to ECLAC (2014), the lack of vision in Latin America is mainly due to an excessively nationalist perspective in its development programs.

The potential of this technology is widely recognized and future expectations are promising; Latin America should face these challenges and participate in this knowledge. Its organizations (academic and industrial) should anticipate future changes, such as the implementation of additive manufacturing across different industries, and be competitive in the face of global changes that may be imminent.

11:25
Tech mining for monitoring technology trends: related methods, sources and software tools

ABSTRACT. Quantitative methods are increasingly being used in studies devoted to monitoring technology trends. This is driven by the need to validate expert assessments with empirical data by searching for implicit signs of technological change in large amounts of information. Tech mining, as a special form of "big data" analytics, is becoming especially popular in FTA. In the context of information overload and limited resources, the question is how to use tech mining in combination with other related methods at different stages of technology monitoring, which sources of information to select, and how to automate this process in order to increase its efficiency. This paper performs a quantitative analysis of tech mining approaches that can be used in technology monitoring: it provides an overview of existing and advanced tech mining tools and related techniques, their dynamics and potential; identifies the main groups and combinations of methods; studies the possibilities of using them at different stages of technology monitoring; and discusses the factors that could influence the choice of suitable tech mining techniques for technology monitoring purposes. The results are intended as a guide for researchers, practitioners and policy makers involved in FTA activities.

For the purpose of this study, a collection of conference proceedings was created from the abstracts of GTM participants' presentations over the past five years (2011-2015). The collection combined structured and unstructured data in the following fields: title, year, abstract, keywords, country, organization. The data were processed (cleaned and grouped), analyzed (based on keyword co-occurrence) and mapped with VantagePoint software. The analysis was conducted in three iterations through discussions with experts. Thus the tech mining proceedings were themselves processed with quantitative methods: "tech mining for analyzing tech mining".

As a result of this study, the evolution of tech mining approaches to monitoring technology trends (in terms of methods, sources and tools) used by different authors in 2011-2015 was studied using quantitative data processing (bibliometric analysis, natural language processing, statistical analysis, PCA). Tech mining and related methods were divided into two groups: main methods (bibliometrics for structured data, text mining for unstructured data) and auxiliary methods (for example, network analysis, cluster analysis, trend analysis), and the most frequently used combinations of them were studied. Key trends and weak signals concerning the use of existing and emerging methods in technology monitoring (web scraping, ontology modelling, advanced bibliometrics, semantic TRIZ, sentiment analysis, and others) were detected. The possibilities of employing tech mining and related methods at different stages of technology monitoring (scanning and monitoring; data analysis and integration; discussion, validation and prioritization; updating the database of trends) were explored and analysed. It is concluded that the following factors could influence the choice of tech mining methods for technology monitoring: the task of the study (to find technology trends, patenting patterns, invisible colleges or others), the type of trends (e.g., emerging technologies, research fronts, disruptive technologies), the sort of information sources (publications, patents, web content), the search strategy (broad query, topic category or specific keywords), the units of analysis (documents, structured data, unstructured data) and others. In the future, it will be possible to analyse in detail the evolution of tech mining approaches to technology monitoring in terms of subject areas, countries, and centres of excellence.
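
The keyword co-occurrence step is illustrated below with a generic sketch; the authors used VantagePoint, and the keyword records here are invented.

```python
# Count keyword co-occurrences across abstract records; frequent pairs
# form candidate method clusters on the map.
from collections import Counter
from itertools import combinations

records = [  # hypothetical keyword fields from GTM abstracts
    ["tech mining", "bibliometrics", "patents"],
    ["tech mining", "text mining", "web scraping"],
    ["bibliometrics", "patents", "network analysis"],
]

cooc = Counter()
for keywords in records:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooc[(a, b)] += 1

for pair, n in cooc.most_common(3):
    print(pair, n)
```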

11:45
Combining scientometrics with patent-metrics analysis for CTI services in R&D decision-making: Practices of the National Science Library of CAS
SPEAKER: Xiwen Liu

ABSTRACT. Scientometric analysis has been applied to tracing scientific and technological trends and to performance evaluations in China in recent years. Since 2012, NSL-CAS has attempted to provide CTI services for industrial sectors at different stages of decision-making. According to the aims of the CTI service and the requirements of users, NSL has provided services for the technological innovations of enterprises, such as novelty searches for technology development, selection of technology innovation paths, supporting decisions on technology development projects, evaluating the technological advancement of products, monitoring technology competitors, searching for technology development partners, and supporting industrial technology strategies. There are many scientometric indicators for technical analysis: single, mixed or composite indexes. For CTI services, we can choose or construct different indicator schemas according to the analysis purposes. For industrial technology strategy, in scanning or selecting technology innovation paths, scientometric indicators can characterize the state of technology development on site. For meso-level technology analysis services, bibliometric and patent analysis indicators should be combined or mixed for the different issues or stages of the emerging technology; the mixed indicators can profile the characteristics of the technology. For micro-level technology analysis services, patent technology trend analysis can be applied mainly to new product development, and some bibliometric indicators should be added for monitoring technology trends. For CTI services for technology-based industry sectors, we should construct a composite indicator schema to discover technology intelligence. Moreover, there are some inherent limitations to the bibliometric and patent indicators in CTI services.

12:35-14:05 Lunch Break
13:00-13:30 Session 6: Lunchtime Power Talks
Location: Room 222
13:00
Testing technology-industry concordances using linked micro-level data on patents and firms
SPEAKER: Qifeng Weng

ABSTRACT. Empirical economic analysis of technological change often turns to patent data for measuring inventive activity as an input into economic production. In linking patent and economic data, researchers often face a tradeoff between coverage and detail: country-level data often provide broad coverage but are not suitable for learning about the industry-level dynamics of knowledge spillovers, a level at which the tension between cooperative and competitive R&D efforts is likely to be most salient. Meanwhile, firm-level 'micro-data' provide rich detail on how firm structure relates to inventive activity as revealed through patent databases. But such micro-data often suffer from limited coverage in time and space and thus threaten selection bias when employed for large-scale policy analysis. A number of attempts have therefore been made at 'meso-scale' linking of patent and economic data. These efforts usually take the form of 'technology-industry concordances,' where official industry classifications in economic data are matched to technology classifications in patent data. This facilitates using patent statistics aggregated to the industry level and then associating them with specific industrial sectors. While a number of such concordances have been developed, we are not aware of any of them having been tested and validated with micro-level data. This paper provides such a test, using a large, global sample of firm-patent data.

One of the most well-known technology-industry concordances is the Yale Technology Concordance (YTC) (Kortum & Putnam 1997). The YTC system probabilistically matches technologies to industries using a sample of Canadian patents issued between 1978 and 1993, which patent examiners manually classified to industry (using the Canadian Industrial Classification, or cSIC). However, a significant drawback of the YTC is that it is based on a sample of patents limited in time and space, and hence is unlikely to effectively represent the dynamically evolving technological landscapes of contemporary industries (much has changed in many industries since 1993). To address these drawbacks, Lybbert and Zolas, in a 2014 paper in Research Policy, develop an Algorithmic Links with Probabilities (ALP) approach to matching patent technologies and economic industries (using the SITC system). Their ALP approach uses text-mining techniques combined with Bayesian weighting to infer the likelihood that a patent belonging to a given technology also belongs to a given industry (and vice versa). Lybbert and Zolas conclude that the ALP technique produces matches which are qualitatively similar to the YTC in distribution, with the ALP matching more closely approaching the YTC as more patents are processed in the text-mining exercise. However, Lybbert and Zolas also find that their ALP technique is somewhat more prone to producing Type I errors, i.e. suggesting that a patent belongs to a particular industry when it in fact does not.

We are able to test the performance of the ALP technique using independent data linking patents to industries (i.e. without text-mining). We use patent-firm matches from the OECD's Microdatalab, which combines PATSTAT with the Orbis firm-level database.
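
The validation logic can be sketched as follows: with patents linked to firms, and hence to industries, one can estimate empirical technology-to-industry probabilities and compare them with the weights a concordance such as the ALP predicts. The data and field names below are invented.

```python
# Estimate P(industry | IPC class) from a micro-level patent-firm match.
import pandas as pd

matched = pd.DataFrame({
    "ipc":      ["H01M", "H01M", "H01M", "C07D", "C07D"],
    "industry": ["batteries", "batteries", "autos", "pharma", "pharma"],
})

empirical = (
    matched.groupby("ipc")["industry"]
           .value_counts(normalize=True)
           .rename("p_empirical")
)
print(empirical)
# Comparing these with a concordance's predicted weights quantifies, e.g.,
# Type I errors: industries the concordance predicts but never observed.
```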

13:05
Identifying hotspots: Assessing when the use of heat maps is preferable to more conventional cartography

ABSTRACT. VantagePoint datasets lend themselves to visualization using a number of geographical information programs (e.g. Google Earth, Google Maps, ArcGIS) and tools within these programs. The current presentation considers a handful of visualization tools provided by Google Earth, i.e. placemarks, linestrings and heat maps, with an emphasis on the latter. In particular, a key objective of the present undertaking is to identify situations in which heat maps prove useful in turning information into knowledge, along with situations in which more conventional map tools are preferable.
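
As a flavor of the kind of output discussed, the sketch below writes georeferenced record counts to a KML file of placemarks using the simplekml package (an assumption; the presenter's workflow may rely on other tools), which Google Earth can then render.

```python
# Export city-level record counts as Google Earth placemarks;
# the coordinates and counts are examples only.
import simplekml

record_counts = {            # city -> (lon, lat, number of records)
    "Atlanta":  (-84.39, 33.75, 120),
    "Beijing":  (116.40, 39.90, 95),
    "Valencia": (-0.38, 39.47, 40),
}

kml = simplekml.Kml()
for city, (lon, lat, n) in record_counts.items():
    pnt = kml.newpoint(name=f"{city} ({n} records)", coords=[(lon, lat)])
    pnt.description = f"{n} records in the dataset"
kml.save("techmining_placemarks.kml")  # open in Google Earth
```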

13:10
Text Mining non-technical terms for technology mining

ABSTRACT. In Technology Mining, analysts tend to focus on the technical jargon, often removing other terms from the term bank. However, it is the non-technical terms that can give us insight into the life-cycle status of a technology or the influences acting on its development. In this brief talk, I will discuss the opportunities that the analysis of non-technical terms provides in Technology Mining.

13:15
Nanotechnology landscape in Brazil: a value chain framework analysis

ABSTRACT. This research presents an overview of the nanotechnology landscape in Brazil based on the dynamics of patenting over the period 2000 to 2013. A modular keyword search strategy is applied to identify a wide range of nanotechnology patents in the Derwent Innovations Index. By combining quantitative and qualitative approaches, the unique patents are grouped into four categories according to a nanotechnology value chain framework: nanomaterials, nanointermediates, nano-enabled products and nanotools. Using VantagePoint software for data and text mining, a quantitative approach is performed based on content analysis of the "Use" subfield. In order to validate the results and give more accuracy to the research, the same records are analyzed in a qualitative approach consisting of a manual categorization based on reading the "Title" and "Abstract" fields of all the collected patents. The study aims to identify Brazil's profile considering the concentration of patents in the nanotechnology value chain framework, and can offer significant insights on the development of this emerging technology in Brazil.

13:20
Building a patent search strategy of an emerging technology using citation information
SPEAKER: Seokbeom Kwon

ABSTRACT. Building a Patent Search Strategy of an Emerging Technology Using Citation Information

14:05-15:15 Session 7A: Methods in Unsupervised Learning
Location: Room 222
14:05
A Patent Search Strategy based on Machine Learning for the Emerging Field of Service Robotics

ABSTRACT. High technologies are in the core focus of supra-national innovation policies. To be effective and efficient, these policies strongly rely on credible databases that display entire value creation chains, from research and development up to production and sales. For emerging technologies, where the latter are still marginal, tracking early development efforts becomes important. However, since these are not yet part of any official industry, patent or trademark classification system, delineating boundaries to measure this early stage is a nontrivial task. Service robotics (SR) is such a technology. Its applications spread through a multiplicity of services including medical assistance, fully automated construction, delivery, inspection, maintenance, cleaning of public places and even home entertainment.

This paper presents a methodology to automatically classify patents as concerning service or industrial robotics (IR). We combine a traditional technology identification process, keyword extraction with verification by an expert community, with a machine learning algorithm. The result is a novel way to allocate patents that avoids an erratic lexical-query approach and reduces the dependency on iterative input from third parties, which is usually costly and time-consuming.
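
A minimal sketch of the classification step, assuming a small expert-labelled seed set of patent abstracts; the texts, labels and model choice (TF-IDF plus logistic regression) are illustrative, not the authors' actual algorithm.

```python
# Train a text classifier on expert-verified seed abstracts, then label
# the remaining corpus as service (SR) or industrial (IR) robotics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

seed_texts = [
    "autonomous vacuum cleaning robot for home use",
    "robotic arm for welding in automotive assembly lines",
    "surgical assistance robot for minimally invasive procedures",
    "industrial pick-and-place manipulator for packaging",
]
seed_labels = ["SR", "IR", "SR", "IR"]  # invented expert labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(seed_texts, seed_labels)

print(clf.predict(["mobile robot delivering parcels to households"]))
```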

14:25
Overlay of science and technology patterns with unsupervised learning: Case of thermal management system
SPEAKER: Samira Ranaei

ABSTRACT. The analysis of citation networks of patents or papers has been used extensively to define the knowledge structure or linkage between science and technology (S&T). However, the citation approach is limited due to time lags and database coverage of cited or citing documents, and may under-represent the possible knowledge flow between S&T data sources. In this paper, it is assumed that the linguistic patterns of patents and publications illustrate their topical overlaps and can spot potential growing fields in research or practice. The novelty of our approach is the utilization of topic modeling and expert opinion in order to cluster patents and articles based on their content rather than citations. The applicability and accuracy of our method is tested on a corpus of documents in the field of thermal management systems.

14:45
Map of Technology: Topic modeling full-text patent data
SPEAKER: Arho Suominen

ABSTRACT. A central challenge for mapping patents is the creation of valid and accurate coordinates. Our study discusses the choice of the origin of coordinates in order to make a map of technology and, in particular, demonstrates the advantages of coordinates assigned by unsupervised learning over those created by human reasoning. We use Latent Dirichlet Allocation to classify all full-text patents published by the USPTO in 2014 (N=374,704) and create a representation of technology in 2014. Our results suggest that unsupervised learning is able to create a coherent classification of technology with a practical computational effort. Further work is needed to structure the classification in a format useful for the end-user.
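
The unsupervised step can be illustrated with scikit-learn's LDA implementation (an assumption; the authors' tooling is not specified here), with toy documents standing in for the 374,704 full-text patents.

```python
# Fit LDA to patent texts and use the document-topic mixture as
# coordinates for the technology map.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "battery electrode lithium anode cathode",
    "neural network training data classification",
    "lithium cell separator electrolyte",
    "image recognition deep learning model",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
coords = lda.fit_transform(X)   # document-topic proportions
print(coords)                   # usable as map coordinates
```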

14:05-15:15 Session 7B: TechMining for Technology Planning
Location: Room 225
14:05
Technology mining for emerging S&T trends and developments: dynamic term clustering and semantic analysis
SPEAKER: Pavel Bakhtin

ABSTRACT. In a world of rapidly developing Science and Technology (S&T), with increasing volumes of S&T-related data and greater interdisciplinary and collaborative research, technology mining (TM) helps to acquire intelligence about emerging trends and future S&T developments. The task is becoming crucial not only for high-tech startups and large organizations, but also for venture capitalists and other companies that make decisions about S&T investments. Governments and public research institutions are also among the main stakeholders and potential users of TM for setting R&D priorities, plans and programs according to the current and future state of S&T development. Term clusters built by TM and bibliometric tools, based on co-occurrence of authors' keywords or terms processed from titles and abstracts of scientific documents, combine totally different types of objects: research fields, major problems and challenges, methods, inventions, products, technologies, etc. Specific expertise in the field may allow a researcher to identify the key objects of study. However, objects themselves and their frequency dynamics over a time period alone do not fully indicate S&T developments and emerging trends in the area. In order to improve the identification of emerging S&T trends and developments, the paper focuses on dynamic term clustering and suggests a systemic approach combining TM, bibliometrics, NLP and semantic analysis as part of a unified analytical framework. The proposed approach utilizes existing clustering methods and tools along with the analysis of term linguistic dependencies in order to study changes of objects over time along with their semantic meanings.

14:25
Tech Mining to Validate and Refine a Technology Roadmap
SPEAKER: Geet Lahoti

ABSTRACT. In this study, we use 'tech mining' to validate and refine the content of a particular section, on nanocomposite coatings, of the Engineered Materials and Structures Roadmap (excerpted from the Nanotechnology Roadmap, Technology Area 10, National Aeronautics and Space Administration, April 2012). We explore R&D and innovation activities in the area of nanocomposite coatings by mining publication and patent records. We analyze the developmental status of the related technologies and seek quantitative information to validate the predictions made by the experts. Moreover, we generate topical intelligence using keywords obtained from publications and patents, which we believe could help in refining the roadmap section.

14:45
Leveraging tech mining for revealing and forecasting innovation pathways: what now and what next?

ABSTRACT. Since 2006, a group of scholars in Europe, China and the US has grappled with the mapping of emerging fields of technology with a view to revealing and prospecting potential innovation pathways. Topics have included nanobiosensors, photovoltaics, telehealth, lab-on-a-chip technology, electric vehicles and neurotechnologies. Currently a number of projects are underway which look at fields such as Big Data analytics, additive manufacturing, bioprinting and nanotechnology in healthcare (particularly drug delivery). As part of this initiative, in my own projects in Europe and in fruitful collaboration with colleagues at Georgia Tech and the Beijing Institute of Technology, I have participated in a number of projects to reveal and forecast innovation pathways, some leveraging tech mining to augment this process. Other projects have focused heavily on qualitative studies, informed by theories of technical change and innovation studies. These "other projects" reveal openings for further leveraging tech mining and other big (and small) data analytics. This presentation will give an overview from my perspective of the "Revealing and Forecasting Innovation Pathways" initiative and, through a number of examples, describe potentially useful directions that are being explored today and options for further next steps.

15:15-15:35 Coffee Break
15:35-16:45 Session 8A: Interdisciplinary Research and Citation Analysis
Location: Room 222
15:35
Interdisciplinary research based on scientific papers' citation relationships
SPEAKER: Zongying Tan

ABSTRACT. Tracking and analyzing the integration trends of different disciplines, measuring interdisciplinary degrees and revealing the dynamic structures of disciplines can provide a basis for making discipline development strategies and relevant policies. Citation analysis is commonly used in interdisciplinary research. Papers and their references are related in subject and inherited in time. Through citation analysis of papers that belong to different disciplines, we can explore the academic and professional similarity of different disciplines. We systematically design an interdisciplinary research methodology that includes discipline citation matrices and citation networks, measurement of disciplines' knowledge sources, data map visualization, and an improved integration index. We use SCIE papers from nine years (1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2009) as empirical research data, containing 7.34 million (7,336,274) papers and nearly 200 million (197,275,384) references. The disciplines analyzed in this paper are SCIE subjects: 22 first-level disciplines (which we call disciplines) and 175 branches. Interdisciplinary degrees and changes for the world, the USA, China and India are analyzed at the discipline and branch levels using our interdisciplinary research methodology.
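
The core data structure, a discipline-by-discipline citation matrix, can be sketched as follows with invented disciplines; the outside-discipline citation share shown is only one simple ingredient of an integration index.

```python
# Each input row is one (citing paper's discipline, cited reference's
# discipline) pair; the crosstab aggregates them into the citation matrix.
import pandas as pd

pairs = pd.DataFrame({
    "citing": ["Physics", "Physics", "Chemistry", "Biology", "Biology"],
    "cited":  ["Physics", "Chemistry", "Chemistry", "Chemistry", "Biology"],
})

matrix = pd.crosstab(pairs["citing"], pairs["cited"])
print(matrix)

# Share of citations going outside a paper's own discipline (row and
# column labels coincide here, so the diagonal is within-discipline).
outside = 1 - (matrix.values.diagonal().sum() / matrix.values.sum())
print(f"outside-discipline citation share: {outside:.2f}")
```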

15:55
Machine learning based classification of research grant award records

ABSTRACT. Policy makers frequently ask agencies to report how much money they are spending on research and development activities in specific fields or topics; however, records are rarely classified in ways that will inform policy and budget decisions. This work explores how topic co-clustering, an approach to text analysis based on machine learning, might be used to tag NSF grant awards automatically with terms referring to scientific disciplines or to socioeconomic objectives. We use metadata in the grant records to validate the results, and do not access the metadata as part of the automated tagging process. The results show that in the case of scientific disciplines, where our language models were well-formed and we had a reliable comparison set for manual classification, the machine-assigned tags were a reasonable and reliable means of describing the research conducted under each grant. In assigning socioeconomic objectives to grants, we saw relatively poor precision and recall in classification, due to the poorly formed and sparse language models available for those terms. Our analysis suggests that this approach can be used to classify large corpora of scientific awards into desired categories.
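
The validation against metadata can be sketched in a few lines, with invented labels standing in for the NSF metadata and the machine-assigned tags.

```python
# Compare machine-assigned discipline tags with metadata-derived labels.
from sklearn.metrics import classification_report

metadata_labels = ["physics", "biology", "physics", "chemistry", "biology"]
machine_tags    = ["physics", "biology", "chemistry", "chemistry", "biology"]

print(classification_report(metadata_labels, machine_tags))
```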

16:15
Tech Mining Cited References to Understand the Influence of Journal Articles on reports of the US National Research Council
SPEAKER: Jan Youtie

ABSTRACT. Few studies have empirically examined the influence of journal articles on policy reports. In this work, we present a method that measures this influence based on an analysis of the cited references in reports of the US National Research Council (NRC), which is the report-writing body of the National Academies; the National Academies serves as a science advisor to the US Congress (among other functions). Obtaining this information was challenging because references did not appear in a standard form, and about 20% of the NRC reports (primarily in the defense area) used footnotes in lieu of a cited reference list. We used the following process to address these issues: (1) separate reports according to whether or not they had a cited reference list; (2) in "footnote-only reports," separate footnotes containing multiple references into individual references; (3) write a macro to classify each of the resulting 120,000 references as journal articles, other NRC reports, or another type of reference, using lists from common academic literature indexes as thesauri (e.g., Web of Science, Scopus, Sage, EBSCOhost, Engineering Village, Medline); and (4) manually code all references that could not be matched by the macro, using two separate coders to enhance coding accuracy. The results indicate that journal articles have been increasingly used in NRC reports in recent years.
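
A simplified stand-in for the classification macro: match each reference string against journal-title and report-title lists (tiny illustrative thesauri below, rather than the Web of Science or Scopus lists), and leave the remainder for manual coding.

```python
# Classify reference strings as journal article, NRC report, or other.
import re

JOURNALS = {"nature", "science", "research policy"}   # toy thesaurus
NRC_HINTS = re.compile(r"national research council|national academies", re.I)

def classify_reference(ref: str) -> str:
    if NRC_HINTS.search(ref):
        return "nrc_report"
    if any(j in ref.lower() for j in JOURNALS):
        return "journal_article"
    return "other"  # left for manual coding by two coders

refs = [
    "Smith J. (2010) Nanotube growth. Nature 464, 1-5.",
    "National Research Council (2008) Defense modeling report.",
    "Department of Energy (2009) Annual energy outlook.",
]
print([classify_reference(r) for r in refs])
```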

15:35-16:45 Session 8B: Mapping for TechMining
Location: Room 225
15:35
Funding Proposal Overlap Mapping: A Tool for Science and Technology Management
SPEAKER: Ying Huang

ABSTRACT. Overlay maps provide a concise way to contextualise previously existing information on an organization, topic, or specific technological field in a cognitive space. In the past few years, we have proposed systematic methods for drawing science overlay maps (Rafols et al. 2010) and patent overlay maps (Kay et al. 2014), based on publications and patents respectively. Research proposals, to agencies such as the U.S. National Science Foundation (NSF), reflect new ideas, concepts, tools, and data that play a vital role in the development of science and technology (Nichols 2014). Proposal analyses can provide valuable research intelligence "upstream" of analyses of research outputs, such as publications and patents. In this context, we present a new approach to visualize proposal content by mapping NSF awards based on compilations of Program Element Codes (PECs). As a base for mapping, we extract the PECs from NSF awards for 1976 through 2014. We categorize the PECs into disciplines, locating them based on the extent of co-occurrence in individual awards. By overlaying sets of awards on the base map, we can see distributions across disciplines. This is effective in showing changes over time, such as the distribution of proposals on a given subject matter or by a research unit. It can also be used to contrast the emphases of different research units. In this paper, we illustrate the approach with Big Data as the target field for a case study. This exercise shows the potential of analyzing funding awards (reflecting proposals) to aid in research assessment, R&D opportunity analysis, and portfolio management.
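
The base-map construction can be sketched as follows: PECs co-occurring on the same award are linked, and the weighted edges feed a layout algorithm. The PEC values below are placeholders.

```python
# Build a PEC co-occurrence network from award records and lay it out.
from collections import Counter
from itertools import combinations
import networkx as nx

awards = [                       # hypothetical PEC lists per NSF award
    {"pecs": ["1269", "7484"]},
    {"pecs": ["1269", "7484", "8083"]},
    {"pecs": ["8083", "7484"]},
]

edge_weights = Counter()
for award in awards:
    for a, b in combinations(sorted(set(award["pecs"])), 2):
        edge_weights[(a, b)] += 1

G = nx.Graph()
for (a, b), w in edge_weights.items():
    G.add_edge(a, b, weight=w)

# Co-occurrence-weighted coordinates form the base map on which sets of
# awards (e.g. all Big Data awards) can then be overlaid.
pos = nx.spring_layout(G, weight="weight", seed=42)
print(pos)
```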

15:55
(Map of Science)²: Fields of S&T in Scientometric and Tech Mining Papers

ABSTRACT. Maps of science are well known nowadays as useful and captivating tools and are widely used both by researchers and policy makers. In this paper, we present text mining tools for building maps of science of a special kind, which we call the Map of Science Squared: maps of Science and Technology through the lenses of Scientometrics and Tech Mining. Specific "nodes" are created, weighted and coupled whenever possible based on mentions in full texts or abstracts. The questions we intend to answer are: (1) Which R&D fields are in the focus of Scientometrics and Tech Mining today, and which fields are old-timers and newcomers? (2) Do we link R&D fields in non-traditional ways through our studies? (3) Which fields are locally bound? Two corpora were created, one for Scientometrics and Bibliometrics, and one for Tech Mining. Names of R&D fields and locations were extracted using a hybrid approach involving statistical and linguistic analysis, a publication classification scheme and several thesauri. The results of processing the text collections allow us to get an overall picture, as well as to draw interesting conclusions on specific cases. It is an object of future research to analyze the drivers of studies in Scientometrics and Tech Mining. This should help us understand why some fields are studied more often than others, and whether this is caused by the availability of input data, a "market pull" effect, researchers' backgrounds or other factors.

16:15
Classifying Biomedical Text for Mining Keyword Correlations and Technology Opportunities Analysis
SPEAKER: Alan Porter

ABSTRACT. Seeking opportunities for future research and innovation makes great sense for emerging technologies. Technology Opportunities Analysis (TOA) attempts to capture technological dynamics using tech mining to develop Competitive Technical Intelligence (CTI) and grasp rising hotspots in a specific domain. Biomedical research requires strict procedures from formulation development to clinical trials, and only a few studies end up as market products. It is necessary and beneficial to carry out TOA with respect to such developmental stages in the biomedical field to obtain more detailed insights. In this study, we demonstrate how to classify biomedical text automatically, based on characteristic descriptions of the research stages of reported publications, using multiple algorithms, so as to identify potential opportunities and future research directions. We illustrate this research framework on how gold nanoparticles (GNPs) have been applied to the biomedical domain in different aspects. We further illuminate how such correlational analysis can generate leading indicators for the future.

17:30-19:00 Session 10: Poster Session
Location: Georgia Tech Hotel Ballroom
17:30
Meta-analysis and its application in the research on the development status of atomic clocks
SPEAKER: unknown

ABSTRACT. Meta-analysis is a quantitative synthetic research method that statistically integrates results from individual studies to find common trends and differences. An atomic clock is a precise type of clock that stabilizes its oscillator frequency by referencing and locking it to an atomic resonance. Atomic clocks form the basis of sciences such as precision spectroscopy and the determination of fundamental constants. A better understanding of the global development status of atomic clocks is necessary to provide support for emerging research areas and to highlight areas where further research is needed. There has been widespread discussion of the current state of atomic clock research within academic circles; however, little synthetic research based on the rich resources of the available literature has been performed. This paper reviews the general methodology of meta-analysis, assesses its advantages and disadvantages, synthesizes its use in assessing the global development status of atomic clocks, and discusses future directions. Literature searches were conducted using the ISI Web of Science database and other similar search engines for the relevant keywords: atomic clock, cold atom clock, optical clock, ionic clock, space atomic clock, etc. We also checked the published studies of the world's major atomic clock research institutions, including the National Institute of Standards and Technology (NIST), the Physikalisch-Technische Bundesanstalt (PTB) and others. Qualifying data constituted any reported numerical findings applicable to a quantitative meta-analysis, i.e., the key characteristics of various types of atomic clocks. Studies were collected for analysis until February 2013. Data were manually retrieved from publications due to the differing information formats in the scientific publications. To explore the main characteristics of the wide variety of atomic clocks, we selected transition frequency, medium/long-term frequency stability, frequency accuracy, natural/actual linewidth, quality factor, etc. Other response variables were not available in sufficient quantity to include in meaningful quantitative analyses. The results indicate that investigations of fountain clocks, optical clocks, optical lattice clocks, and nuclear clocks have become the active fields in atomic clock research. The accuracy of primary frequency standards contributing data to the International Time Bureau (BIH) has greatly increased since the 1950s and has reached the 10^-16 level. Femtosecond optical frequency combs associated with laser cooling and trapping have also reached maturity, providing various isolated and nearly motionless quantum references for optical clocks. The best of these new clocks now surpass the accuracy of even the best Cs fountain clocks. NIST has succeeded in demonstrating two optical clocks based on single trapped Hg and Al ions with stabilities of 1.9×10^-17 and 8.6×10^-18, respectively. For a well-chosen transition, the systematic uncertainties of optical clocks are expected to be as low as 10^-18. A "solid-state nuclear clock" based on 229-Thorium nuclei implanted into VUV-transparent calcium fluoride crystals will open the new possibility of reaching a fractional instability at the 10^-21 level.

17:30
Mining Big Data in health: an overview of data mining in the tropical disease malaria using Web 2.0 tools
SPEAKER: unknown

ABSTRACT. Given that it is not a trivial task to identify the core information contained in the 2.5 quintillion bytes/day added to the Web, and that 47% of those data are related to health, new approaches are needed to help developing countries deal with Big Data. The monitoring of this strategic information will contribute significantly to the scientific and technological development of a country in the field of public health, helping policy makers and the business community take notice of scenarios, validate the effectiveness of actions, anticipate the future and provide better management of Science, Technology & Innovation activities. According to the WHO, neglected tropical diseases (NTDs) threaten more than 1 billion people worldwide, and malaria is one of them, as are dengue, tuberculosis and Chagas disease. Cases have been reported in Europe, the USA and elsewhere. Thus, it is urgent to understand this scenario, promote the transfer of basic knowledge to other instances, and validate actions in addressing these ills. This work identified several Web 2.0 tools for health, which we tested with data mining. The results point to a better multidisciplinary view of the dissemination of existing knowledge, generating an active understanding and putting knowledge into a dynamic and interactive process of synthesis, dissemination and exchange; consequently, it generates core information for decision makers and managers, contributing to improving global health through better delivery of health services, products that strengthen the health system, reduced public spending and increased access of the population to health care.

17:30
An inter-organisational network perspective on the emergence of biotechnology innovation networks in Taiwan (1998-2013)
SPEAKER: unknown

ABSTRACT. This paper applies overlay mapping (Rotolo et al. 2015) to map out the research collaboration and innovation networks involving domestic and international actors in the Taiwanese biotechnology sector from 1998 to 2013. The theoretical framework is based on sectoral innovation systems and network theory. Combining social network analysis (using UCINET bundled with NetDraw), more than 65 elite group interviews and data collected from the financial reports of 140 biotech firms, this paper empirically examines how actors' interactions in the Taiwanese biotechnology sector have evolved in recent years. First, the results illustrate how knowledge transfer networks in the Taiwanese biotechnology sector have developed rapidly in the last decade. Secondly, social network analysis illuminates the central role of intermediaries and the leading domestic laboratories in the business network. In summary, firms may not be the main vectors of innovation in a nascent science-based sector such as the biotechnology sector in Taiwan. Instead, innovation networks appear to evolve jointly alongside firms and other actors in innovation activities. Ultimately, this paper aims to further explore the relationships between networking strategies, network roles (Gould and Fernandez 1989), the structural position of the key actors, and the innovation performance of firms. The additional data collection is largely complete and preliminary data analysis has been done, which will be useful for mapping the sectoral knowledge network, its evolution between 1998 and 2013, and the implications of the changes for innovation performance and relevant policies and institutions. Next steps include performing detailed analyses and further developing both the conceptual framework and the insights for biotechnology innovation policy.
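
As a minimal proxy for the "central role of intermediaries" finding, betweenness centrality on a toy collaboration network is sketched below (actor names invented; the Gould-Fernandez brokerage roles the authors invoke would refine this by actor type).

```python
# Intermediaries sit on many shortest paths, so they score highest on
# betweenness centrality in the collaboration network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("firm_A", "intermediary"), ("firm_B", "intermediary"),
    ("intermediary", "lab_1"), ("lab_1", "firm_C"),
])

print(nx.betweenness_centrality(G))  # intermediary scores highest
```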

17:30
Hybrid of Agent-Based Modelling Simulation with Social Network Analysis for Innovation Study
SPEAKER: unknown

ABSTRACT. This study explores how to apply Agent-based Modelling and Simulation (ABMS) and Social Network Analysis (SNA) to innovation studies. We took knowledge brokering in the biopharmaceutical industry as an example to investigate how the proposed method can be applied. ABMS is needed to help us understand the effectiveness of different policy interventions and how these policies will affect the future development of the industry sector via "virtual experiments". Innovation studies have shifted toward analyses of social and economic activities embedded in networks. This is particularly true in the case of biotechnology, a high-tech sector that thrives on networking among relevant actors, which is why we drew insights from the social networks literature. In the context of social networks, for example, Gould and Fernandez described brokerage as the role played by a social actor who mediates contact between two other actors. The goal of this paper is to build an agent-based model that formalizes and simulates knowledge dynamics through computational models. We further our knowledge of how to simulate knowledge dynamics in innovation networks using ABMS, and discuss ABMS-related issues such as the types of agents that should be included and the identification of agent interaction rules. The contributions of this study lie both in the methodological use of ABMS for studying the impact of policy instruments and in policy suggestions for knowledge dynamics in an emerging sector.
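
A stripped-down sketch of the ABMS idea, with all parameters invented: agents exchange knowledge along network links, and a broker mediates contact between two otherwise unconnected firms.

```python
# Minimal knowledge-diffusion ABM: knowledge flows along links from the
# better-endowed agent; firm_B can learn only via the broker.
import random

random.seed(0)

knowledge = {"firm_A": 1.0, "firm_B": 0.0, "broker": 0.2}
links = [("firm_A", "broker"), ("broker", "firm_B")]  # broker mediates
TRANSFER = 0.5      # fraction of the knowledge gap closed per interaction
P_INTERACT = 0.8    # probability that a link is active in a given step

for step in range(10):
    for a, b in links:
        if random.random() < P_INTERACT:
            gap = knowledge[a] - knowledge[b]
            if gap > 0:
                knowledge[b] += TRANSFER * gap
            else:
                knowledge[a] -= TRANSFER * gap

print(knowledge)  # firm_B acquires knowledge only via the broker
```

Policy "virtual experiments" would then vary parameters such as P_INTERACT or the link structure and compare diffusion outcomes.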

17:30
Peptide fusion inhibitors targeting the HIV-1 gp41: a patent review (2009 - 2014)
SPEAKER: unknown

ABSTRACT. As the first peptide HIV fusion inhibitor targeting gp41, enfuvirtide (T20) was approved by the US FDA in 2003 as a salvage therapy for HIV/AIDS patients who failed to respond to the then-existing antiretroviral therapeutics. However, its clinical application is limited by its relatively low potency, low genetic barrier to drug resistance and short half-life. Therefore, it is essential to develop new peptide HIV fusion inhibitors with improved antiviral efficacy, drug-resistance profiles and pharmaceutical properties. In this paper, we review the patents, patent applications and related research articles on the development of new peptide fusion inhibitors targeting the HIV-1 gp41 published between 2009 and 2014. To improve enfuvirtide's anti-HIV efficacy, drug-resistance profile, half-life and pharmaceutical properties, the best approaches include the addition of the pocket-binding domain (PBD) to the N-terminus of T20 and the linking of the M-T hook to the N-terminus of the PBD, as well as the conjugation of cholesterol, a serum albumin-binding motif or a gp120-binding fragment with a PBD-containing C-terminal heptad repeat peptide. Therefore, sifuvirtide from Tianjin FusoGen Pharmaceuticals, Inc., albuvirtide from Frontier Biotechnologies Co., Ltd., the cholesterol-conjugated HIV fusion inhibitor from the Institute of Pathogen Biology, Chinese Academy of Medical Sciences, 2DLT, a bivalent HIV fusion inhibitor/inactivator, and an enfuvirtide/sifuvirtide combination regimen from the New York Blood Center may all have potential as next-generation HIV fusion inhibitors targeting gp41 for clinical use.