Welcome: Alan Porter and Denise Chiavetta, 2015 Program Co-Chairs
Keynote: "Some considerations and challenges for technology data mining" - Dr. David J Holland Smith, Dstl Fellow
09:35 | Innovation and business growth in a strategic emerging technology: New methods for real-time intelligence on graphene enterprise development and commercialization SPEAKER: Philip Shapira ABSTRACT. This paper presents the results of research to develop new data sources and methods for real-time intelligence to understand and map enterprise development and commercialization in a rapidly emerging and growing new technology. The paper draws on research that is developing novel and scalable methods to mine and combine information from unstructured online sources including enterprise webpages, established structured databases including data on patenting, and qualitative information. The promise of this strategy is that it combines up-to-date online data sources, including fast-breaking streams, with available structured data and interview insights so as to allow the development of real-time and ongoing monitoring, mapping and analysis. The research focuses on enterprise development and commercialization strategies in graphene. In the paper, we systematically analyze the development and commercialization strategies of 74 firms through web content mining, structured data analysis, and qualitative analysis. We show how graphene commercial activity is moving from the production of graphene materials to intermediate and final products, examine shifts towards more specialized applications, and investigate implications for business and policy. Although the paper is focused on graphene, the approaches and methods developed are applicable to other emerging strategic new technologies. |
09:55 | Assessment of entrepreneurial activity in innovative system: Towards measurement models and indicators SPEAKER: Arash Hajikhani ABSTRACT. Innovation as a socioeconomic phenomenon has been recognized as playing a crucial role in strengthening the competitive advantages of economies and in long-term growth. Effective innovation policies provide opportunities for diminishing the innovation gap between emerging economies and innovation leaders. Governments, as major funders and supporters, are also obligated to provide a platform where innovations can flourish. However, fostering innovation faces challenges such as aligning regulation and policies, and misalignment can slow down the development process. Such misalignment leads to misinterpretation of the situation, causing governments to squander resources on similar programs (Wessner 2005). The importance of government interaction with other entities has been addressed earlier in the literature. The concept of the Triple Helix of university-industry-government relationships was initiated in the 1990s by Etzkowitz (2003). Looking at the situation now, with the vast development of IT infrastructure, the amount of data in our world has been exploding. Multimedia and individuals with smartphones on social network sites will continue to fuel exponential growth. The “Triple Helix” concept has therefore recently evolved into the “Quadruple Helix” (Carayannis and Campbell 2009), which emphasizes the importance of integrating the perspective of the media-based and culture-based public. Given this surge of data from various sectors, there is a need to broaden the view and engage with novel unstructured data. Within the innovation context, studies have been conducted on countries' innovation indexes. Our approach is to use these new sources of public data to explore societal and cultural capacities, which have been considered an important innovation potential factor of economies. 
Since social network services are arguably the best representation of social activity, our focus will be on extracting meaningful information from this valuable source. |
10:15 | Under-reporting research relevant to local needs in the global south: Database biases in the representation of knowledge on rice SPEAKER: Ismael Rafols ABSTRACT. Bibliometrics can provide very helpful tools for developing knowledge representations that can help in addressing grand challenges or societal problems, such as tackling obesity, climate change or pandemics. However, these representations are highly dependent on the data and methods used. The aim of this paper is to investigate potential biases introduced by available databases in the representation of research topics. In a previous study on rice research, we showed that the bibliographic database CAB Abstracts (CABI), which is focussed on agriculture and global health, has larger coverage of rice research for most low-income countries than Web of Science (WoS) or Scopus. In this study, we present evidence that this unequal coverage significantly distorts the knowledge representation of rice research, globally and for different countries. We find that the journal coverage of WoS and Scopus under-represents some of the more application-oriented topics, namely: i) production, productivity and plant nutrition; ii) plant characteristics; and iii) diseases, pests and plant protection. Given that these are issues relevant to small farmers producing for the local market, with no access to the seeds developed with molecular biology techniques (GM), we pose the question of whether the inadvertent effect of the biases in the dominant databases is to under-represent the type of research that is relevant for improving their wellbeing without introducing the use of the highly contested GM seeds. |
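The coverage comparison at the heart of this abstract reduces to a simple calculation: what share of a topic's records each database draws from each country. A minimal sketch with invented counts (the databases' real record totals are not reproduced here):

```python
# Sketch: compare a country's share of rice-research records across databases.
# All counts below are illustrative placeholders, not the study's actual data.
counts = {
    "CABI":   {"India": 5200, "USA": 1800, "Philippines": 900, "total": 21000},
    "WoS":    {"India": 2100, "USA": 1700, "Philippines": 300, "total": 12000},
    "Scopus": {"India": 2400, "USA": 1750, "Philippines": 350, "total": 13500},
}

def coverage_share(db: str, country: str) -> float:
    """Fraction of a database's rice-research records coming from one country."""
    return counts[db][country] / counts[db]["total"]

# Divergent shares across databases signal the coverage bias discussed above.
for db in counts:
    print(db, round(coverage_share(db, "India"), 3))
```

Large gaps between a country's shares in CABI versus WoS/Scopus would indicate exactly the kind of representational distortion the paper documents.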
09:35 | Ownership transfer of patents at the State Intellectual Property Office of China SPEAKER: Peter Neuhäusler ABSTRACT. Ownership transfer of patent rights is a special form of technology transfer. Acquiring a patent implies that a firm sees a market for a technology where others do not. This technology acquisition, however, involves costs for adopting the technology, which might include structural changes within the firm, increased R&D expenditures or hiring qualified R&D personnel (Serrano 2006). Patent acquisition can thus be seen as an investment, which is especially true when only little is known about the acquired technology. Empirical evidence shows that ownership transfer of patents from foreign to Chinese owners (filings originally filed by a foreign applicant) has increased enormously in recent years. This phenomenon accounts for more than 6,000 patents per year. At least at first sight, this not only implies that Chinese firms see a market for the commercialization of foreign technologies, but also that they try to learn from their foreign counterparts. We intend first to analyze basic trends in the transfer of patent ownership in China, i.e. who the sellers and buyers of SIPO filings are, and then to examine the structures of these transfers. This includes the question of whether firms acquire technologies they are already familiar with or vice versa, which is answered by comparing the profile of the transferred patents with the existing patent profile of the new owners at the technology-field level. In addition, we will test the assumption that especially (technologically) valuable patents are acquired, as these can be assumed to form the basis of new technological lines within firms. |
09:55 | Methodology for Identifying Pharmaceutical Key Molecules Using Technology Foresight of Patent Documents SPEAKER: Flavia Maria Lins Mendes ABSTRACT. Pharmaceutical companies use patents to protect the majority of their potential innovations. The content of patent documents can be converted into highly significant information, since the protected technical knowledge may be used for the research and development of new inventions or improvements. The different forms of patent protection in the pharmaceutical sector include the processes for producing a given active pharmaceutical ingredient (API). The first process patent applications are normally filed in the early stages of research, and in some cases are filed together with the actual patent for the molecule. When a product is successful on the market there is a technology race, and other companies/researchers start to investigate potential routes for producing the molecule, to find new ways of synthesizing it or to improve aspects of its manufacture, such as using simpler reagents or reducing the number of stages in the synthesis process. These new routes are also protected by patents, as they may replace the route in the original patent. Using technology foresight based on patent documents for production processes is a way of identifying key molecules for the production of a given API, since they contain descriptions of the reagents and intermediates involved in the process, and the physicochemical reaction conditions. Identifying the molecules is strategic, because it is through them that the number of synthesis stages can be reduced, and new production routes and/or analogous drugs can be developed. This study sets forth a methodology for identifying the key molecules for an API. 
It also presents the findings of a case study that used this methodology to search for key molecules (structurally similar reagents or intermediates) for the production of zidovudine, an API for antiretrovirals widely used in HIV/AIDS treatment around the world. |
10:15 | An in-depth study of patent thickets: A case study of lithium ion accumulators SPEAKER: Jean-Paul Rameshkoumar ABSTRACT. We aim to propose an extensive study of patent thickets within lithium ion accumulator technologies. Using a combination of IPC codes and keywords, we identified 38,399 patents worldwide between 2000 and 2015, of which 10,973 are linked by either a category X or a category Y citation. This dataset allowed us to identify 1,849 patent thickets among 285 firms over the 15-year period. The network interconnecting these thickets is shown in figure 1. Results show that the leaders of the market (Panasonic, Toyota, Nissan, Samsung, Sony) are involved in a large number of thickets (320 for Panasonic, 282 for Toyota and 371 for Samsung). We wish to extend the standard analysis of patent thickets in three ways. Thickets are usually identified among three firms; our first step will be to extend thickets to higher orders and assess the coherence of the observed thickets. Second, we will study the strength of thickets by looking at the number of times we identify the same firms in a thicket (for any given order). Finally, we follow the evolution of the thickets over time to better understand how thickets are formed between firms and how they might be reinforced. |
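The thicket identification this abstract describes, triples of firms mutually linked by blocking (category X/Y) citations, amounts to finding triangles in a firm-level graph. A minimal sketch with made-up firm pairs rather than the study's data:

```python
from itertools import combinations

# Sketch: identify three-firm patent thickets as triangles in a "blocking" graph.
# An undirected edge links two firms whose patents cite each other with
# category X or Y citations. The edge list below is invented for illustration.
blocking_edges = {
    frozenset(p) for p in [
        ("Panasonic", "Samsung"), ("Panasonic", "Sony"),
        ("Samsung", "Sony"), ("Toyota", "Nissan"),
        ("Toyota", "Panasonic"),
    ]
}
firms = sorted({f for e in blocking_edges for f in e})

def find_thickets(order: int = 3):
    """Return all fully connected groups of `order` firms (cliques)."""
    return [
        group for group in combinations(firms, order)
        if all(frozenset(pair) in blocking_edges for pair in combinations(group, 2))
    ]

print(find_thickets())  # [('Panasonic', 'Samsung', 'Sony')]
```

Raising `order` above 3 gives exactly the higher-order extension the abstract proposes: cliques of four or more firms whose patents mutually block one another.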
Evgeny Klochikhin, Kevin Boyack, and Alan Porter
This panel will consist of short presentations on the current state of “technical emergence” research, followed by discussion of the main issues associated with developing indicators of technical emergence. The goal of the session is to gauge interest in technical emergence and to assess the feasibility of launching a stand-alone technical emergence conference.
11:05 | A Scientometric Analysis of Additive Manufacturing in Latin America SPEAKER: Ana Marcela Hernández de Menéndez ABSTRACT. This research presents a competitive technical intelligence analysis of additive manufacturing technology. The presence of this technology in global industry is discussed, and challenges for Latin America are identified. Additive manufacturing is the process of joining materials to produce three-dimensional objects from digital models. The term 3D printing is much more popular and is commonly used as a synonym for additive manufacturing. This technology emerged in the 1980s and grew slowly during its first two decades; however, the 3D printing market has expanded dramatically since 2012. Research carried out throughout these years has resulted in a continuous expansion in the fields of design and manufacturing, and its impact is clearly growing. Additive manufacturing's development continues to expand across different technologies, markets, industries and regions. Latin America is experiencing the technology's first effects at the market level rather than in technology development, and its participation in patent applications is still low in comparison with developed regions. According to ECLAC (2014), the lack of vision in Latin America is mainly due to an excessively nationalist perspective in its development programs. The potential of this technology is widely recognized and future expectations are promising; Latin America should face these challenges and participate in this knowledge. Its organizations (academic and industrial) should anticipate future changes, such as the implementation of additive manufacturing across different industries, and be competitive in the face of global changes that may be imminent. |
11:25 | Tech mining for monitoring technology trends: related methods, sources and software tools SPEAKER: Nadezhda Mikova ABSTRACT. Quantitative methods are increasingly being used in studies devoted to monitoring technology trends. This is caused by the need to validate expert assessments with empirical data by searching for implicit signs of technological change in large amounts of information. Tech mining as a special form of “big data” analytics is becoming especially popular in FTA. In the context of information overload and limited resources, the question is how to use tech mining in combination with other related methods at different stages of technology monitoring, what sources of information to select, and how to automate this process in order to increase its efficiency. This paper performs a quantitative analysis of tech mining approaches that can be used in technology monitoring: it provides an overview, the dynamics and the potential of existing and advanced tech mining tools and related techniques; identifies the main groups and combinations of methods; studies the possibilities of using them at different stages of technology monitoring; and provides a discussion of the factors that could influence the choice of suitable tech mining techniques for technology monitoring purposes. The results can serve as a guide for researchers, practitioners or policy makers involved in FTA activity. For the purpose of this study, a collection of conference proceedings was created using the abstracts of GTM participants’ presentations from the five most recent years (2011-2015). This collection was created using structured and unstructured data to populate the following fields: title, year, abstract, keywords, country, organization. Data was processed (cleaned and grouped), analyzed (based on keyword co-occurrence) and mapped with VantagePoint software. The analysis was conducted in 3 iterations through discussions with experts. 
Thus, the tech mining proceedings were processed with quantitative methods, providing a “tech mining for analyzing tech mining” approach. As a result of this study, the evolution of tech mining approaches to monitoring technology trends (in terms of methods, sources and tools) used by different authors in 2011-2015 was studied using quantitative data processing (bibliometric analysis, natural language processing, statistical analysis, PCA). Tech mining and related methods were divided into two groups: main methods (bibliometrics for structured data, text mining for unstructured data) and auxiliary methods (for example, network analysis, cluster analysis, trend analysis), and the most frequently used combinations of them were studied. Key trends and weak signals concerning the use of existing and emerging methods in technology monitoring (web scraping, ontology modelling, advanced bibliometrics, semantic TRIZ, sentiment analysis, and others) were detected. The possibilities of employing tech mining and related methods at different stages of technology monitoring (scanning and monitoring; data analysis and integration; discussion, validation and prioritization; updating the database of trends) were explored and analysed. It is concluded that the following factors could influence the choice of tech mining methods for technology monitoring: the task of the study (to find technology trends, patenting patterns, invisible colleges or others), the type of trends (e.g., emerging technologies, research fronts, disruptive technologies), the sort of information sources (publications, patents, web content), the search strategy (broad query, topic category or specific keywords), the units of analysis (documents, structured data, unstructured data) and others. In the future it will be possible to analyse in detail the evolution of tech mining approaches to technology monitoring in terms of subject areas, countries, and centres of excellence. |
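The keyword co-occurrence analysis described above, performed with VantagePoint in the study, can be sketched in a few lines; the toy records below are invented examples, not the GTM collection:

```python
from collections import Counter
from itertools import combinations

# Sketch: keyword co-occurrence counting of the kind VantagePoint performs,
# over a toy collection of per-abstract keyword lists (illustrative only).
records = [
    ["tech mining", "bibliometrics", "patents"],
    ["tech mining", "text mining", "patents"],
    ["bibliometrics", "network analysis"],
    ["tech mining", "bibliometrics"],
]

cooccurrence = Counter()
for keywords in records:
    # Count each unordered keyword pair once per record.
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

# The most frequent pairs indicate candidate method combinations for mapping.
print(cooccurrence.most_common(2))
```

Feeding the resulting pair counts into a clustering or mapping step yields the kind of method-combination map the abstract describes.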
11:45 | Combining scientometrics with patent-metrics analysis for CTI services in R&D decision-making: Practices of the National Science Library of CAS SPEAKER: Xiwen Liu ABSTRACT. Scientometric analysis has been applied to tracing scientific and technological trends and to performance evaluation in China in recent years. Since 2012, NSL-CAS has attempted to provide CTI services for industrial sectors at different stages of decision-making. According to the aims of the CTI service and the requirements of users, NSL has provided services for the technological innovation of enterprises, such as novelty searches for technology development, selection of technology innovation paths, support for decisions on technology development projects, evaluation of the technological advancement of products, monitoring of technology competitors, searches for technology development partners, and support for industrial technology strategies. There are many scientometric indicators for technical analysis, whether single, mixed or composite. For CTI services, we can choose or construct different indicator schemas for different analysis purposes. For industrial technology strategy, in scanning or selecting technology innovation paths, scientometric indicators can properly situate technology development. For meso-level technology analysis services, bibliometric and patent analysis indicators should be combined or mixed for the different issues or stages of an emerging technology; the mixed indicators can profile the characteristics of the technology. For micro-level technology analysis services, patent technology trend analysis can be applied mainly to new product development, and some bibliometric indicators should be added for monitoring technology trends. For CTI services for technology-based industry sectors, we should construct a composite indicator schema to discover technology intelligence. 
Moreover, there are some inherent limitations to the bibliometric and patent indicators in CTI services. |
13:00 | Testing technology-industry concordances using linked micro-level data on patents and firms SPEAKER: Qifeng Weng ABSTRACT. Empirical economic analysis of technological change often turns to patent data for measuring inventive activity as an input into economic production. In linking patent and economic data, researchers often face a tradeoff between coverage and detail: country-level data often provide broad coverage but are not suitable for studying the industry-level dynamics of knowledge spillovers, the level at which the tension between cooperative and competitive R&D efforts is likely to be most salient. Meanwhile, firm-level ‘micro-data’ provide rich detail on how firm structure relates to inventive activity as revealed through patent databases. But such micro-data often suffer from limited coverage in time and space and thus threaten selection bias when employed for large-scale policy analysis. A number of attempts have therefore been made at ‘meso-scale’ linking of patent and economic data. These efforts usually take the form of ‘technology-industry concordances,’ where official industry classifications in economic data are matched to technology classifications in patent data. This facilitates aggregating patent statistics to the industry level and then associating them with specific industrial sectors. While a number of such concordances have been developed, we are not aware of any of them having been tested and validated with micro-level data. This paper provides such a test, using a large, global sample of firm-patent data. One of the most well-known technology-industry concordances is the Yale Technology Concordance (YTC) (Kortum & Putnam 1997). The YTC system probabilistically matches technologies to industries using a sample of Canadian patents issued between 1978 and 1993, which patent examiners manually classified by industry (using the Canadian Industrial Classification, or cSIC). 
However, a significant drawback of the YTC is that it is based on a sample of patents limited in time and space, and hence is unlikely to effectively represent the dynamically evolving technological landscapes of contemporary industries (much has changed in many industries since 1993). To address these drawbacks, Lybbert and Zolas, in a 2014 paper in Research Policy, develop an Algorithmic Links with Probabilities (ALP) approach to matching patent technologies and economic industries (using the SITC system). Their ALP approach uses text-mining techniques combined with Bayesian weighting to infer the likelihood that a patent belonging to a given technology also belongs to a given industry (and vice versa). Lybbert and Zolas conclude that the ALP technique produces matches which are qualitatively similar to the YTC in distribution, with the ALP matching approaching the YTC more closely as more patents are processed in the text-mining exercise. However, Lybbert and Zolas also find that their ALP technique is somewhat more prone to producing Type I errors, i.e. suggesting that a patent belongs to a particular industry when it in fact does not. We are able to test the performance of the ALP technique using independent data linking patents to industries (i.e. without text-mining). We use patent-firm matches from the OECD’s Microdatalab, which combines PATSTAT with the Orbis firm-level database. |
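The general idea behind a probabilistic concordance of this kind can be illustrated by normalizing co-assignment counts into conditional probabilities. The classes, industries and counts below are invented; the real ALP method derives its weights from text mining with Bayesian updating rather than from a simple tally:

```python
# Sketch of the idea behind a probabilistic technology-industry concordance:
# estimate P(industry | technology class) from counts of documents assigned
# to both. All counts below are invented for illustration.
co_counts = {
    # (IPC-like tech class, industry label): number of matched documents
    ("H01M", "electrical machinery"): 80,
    ("H01M", "road vehicles"): 20,
    ("C07D", "pharmaceuticals"): 95,
    ("C07D", "agrochemicals"): 5,
}

def industry_probabilities(tech_class: str) -> dict:
    """Normalize co-assignment counts into P(industry | tech class)."""
    matches = {ind: n for (tc, ind), n in co_counts.items() if tc == tech_class}
    total = sum(matches.values())
    return {ind: n / total for ind, n in matches.items()}

print(industry_probabilities("H01M"))  # {'electrical machinery': 0.8, 'road vehicles': 0.2}
```

Validation against independent firm-patent links, as the paper proposes, then amounts to checking these probabilities against the industries of the firms that actually hold the patents.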
13:05 | Identifying hotspots: Assessing when the use of heat maps is preferable to more conventional cartography SPEAKER: Stephen Carley ABSTRACT. VantagePoint datasets lend themselves to visualization using a number of geographical information programs (e.g. Google Earth, Google Maps, ArcGIS) and tools within these programs. The current presentation considers a handful of visualization tools provided by Google Earth, i.e. placemarks, linestrings and heat maps, with an emphasis on the latter. In particular, a key objective of the present undertaking is to identify situations in which heat maps prove useful in turning information into knowledge, along with situations in which more conventional map tools are preferable. |
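As a sketch of the placemark route, the snippet below emits a minimal KML file of the kind Google Earth ingests; the site names, coordinates and record counts are hypothetical:

```python
# Sketch: emit a minimal KML placemark file for Google Earth.
# Site names, coordinates and record counts are hypothetical examples.
sites = [("Atlanta", -84.39, 33.75, 120), ("Eindhoven", 5.47, 51.44, 85)]

placemarks = "\n".join(
    f"""  <Placemark>
    <name>{name} ({count} records)</name>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>"""
    for name, lon, lat, count in sites
)

kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
{placemarks}
</Document>
</kml>"""

with open("sites.kml", "w") as f:
    f.write(kml)
print(kml.count("<Placemark>"))  # 2
```

A heat map, by contrast, would encode the record counts as a continuous intensity surface rather than discrete points, which is the tradeoff the talk examines.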
13:10 | Text Mining non-technical terms for technology mining SPEAKER: Cherie Trumbach ABSTRACT. In Technology Mining, analysts tend to focus on the technical jargon often removing other terms from the term bank. However, it is the non-technical terms that can give us insight into the life cycle status of technology or the influences acting on the development of the technology. In this brief talk, I will discuss the opportunities that analysis of non-technical terms provide in Technology Mining. |
13:15 | Nanotechnology landscape in Brazil: a value chain framework analysis SPEAKER: Daniel Giacometti Amaral ABSTRACT. This research presents an overview of the nanotechnology landscape in Brazil based on the dynamics of patenting over the period 2000 to 2013. A modular keyword search strategy is applied to identify a wide range of nanotechnology patents in the Derwent Innovations Index. By combining quantitative and qualitative approaches, the unique patents are grouped into four categories according to a nanotechnology value chain framework: nanomaterials, nanointermediates, nano-enabled products and nanotools. Using VantagePoint software for data and text mining, a quantitative approach is performed based on content analysis of the “Use” subfield. In order to validate the results and improve the accuracy of the research, the same records are analyzed in a qualitative approach consisting of a manual categorization based on reading the “Title” and “Abstract” fields of all the collected patents. The study aims to identify Brazil's profile in terms of the concentration of patents across the nanotechnology value chain framework and can offer significant insights on the development of this emerging technology in Brazil. |
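A modular keyword search of the kind described can be sketched as a set of per-category patterns; the keyword lists below are illustrative stand-ins, not the study's actual search modules:

```python
import re

# Sketch: modular keyword rules for binning patent records into value-chain
# categories. Keyword patterns are illustrative, not the study's actual modules.
CATEGORIES = {
    "nanomaterials": [r"graphene", r"nanotube", r"nanoparticle"],
    "nanointermediates": [r"nanocomposite", r"nanostructured coating"],
    "nano-enabled products": [r"nanosensor device", r"nanomedicine"],
    "nanotools": [r"atomic force microscop", r"nanolithograph"],
}

def classify(text: str) -> list:
    """Return every value-chain category whose keywords match the record."""
    text = text.lower()
    return [cat for cat, pats in CATEGORIES.items()
            if any(re.search(p, text) for p in pats)]

print(classify("A nanocomposite coating reinforced with carbon nanotubes"))
# ['nanomaterials', 'nanointermediates']
```

Records matching several modules, as in the example, are exactly the cases the study resolves with the qualitative reading of titles and abstracts.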
13:20 | Building a patent search strategy of an emerging technology using citation information SPEAKER: Seokbeom Kwon ABSTRACT. Building a Patent Search Strategy of an Emerging Technology Using Citation Information |
14:05 | A Patent Search Strategy based on Machine Learning for the Emerging Field of Service Robotics SPEAKER: Vladimir Korzinov ABSTRACT. High technologies are in the core focus of supra-national innovation policies. To be effective and efficient, these policies strongly rely on credible databases that display entire value creation chains, from research and development up to production and sales. For emerging technologies, where production and sales are still marginal, tracking early development efforts becomes important. However, since these technologies are not yet part of any official industry, patent or trademark classification system, delineating boundaries to measure this early stage is a nontrivial task. Service robotics (SR) is such a technology. Its applications spread through a multiplicity of services including medical assistance, fully automated construction, delivery, inspection, maintenance, as well as cleaning of public places and even home entertainment. This paper aims to present a methodology for automatically classifying patents as concerning service or industrial robotics (IR). We combine a traditional technology identification process, keyword extraction with verification by an expert community, with a machine learning algorithm. The result is a novel way to allocate patents that avoids an erratic lexical query approach and reduces the dependency on iterative input from third parties, which is usually costly and time consuming. |
14:25 | Overlay of science and technology patterns with unsupervised learning: Case of thermal management system SPEAKER: Samira Ranaei ABSTRACT. The analysis of citation networks of patents or papers has been extensively used to define the knowledge structure of, or linkage between, science and technology (S&T). However, the citation approach is limited due to time lags and incomplete coverage of cited or citing documents, and may under-represent the possible knowledge flow between S&T data sources. In this paper, it is assumed that the linguistic patterns of patents and publications illustrate their topical overlaps and can spot potential growing fields in research or practice. The novelty of our approach is the use of topic modeling and expert opinion to cluster patents and articles based on their content rather than citations. The applicability and accuracy of our method are tested on a corpus of documents in the field of thermal management systems. |
14:45 | Map of Technology: Topic modeling full-text patent data SPEAKER: Arho Suominen ABSTRACT. A central challenge for mapping patents is the creation of valid and accurate coordinates. Our study discusses the choice of the origin of coordinates in order to make a map of technology and, in particular, demonstrates the advantages of coordinates assigned by unsupervised learning over those created by human reasoning. We use Latent Dirichlet Allocation to classify all full-text patents published by the USPTO in 2014 (N=374,704) and create a representation of technology in 2014. Our results suggest that unsupervised learning is able to create a coherent classification of technology with a practical computational effort. Further work is needed to structure the classification in a format useful for the end user. |
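As a toy illustration of the approach (the study itself fits LDA to all 374,704 full-text USPTO patents from 2014), the sketch below fits a two-topic model to four invented "patent" snippets, assuming scikit-learn as the implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Sketch: LDA over a toy "patent" corpus. The four documents are invented;
# the real study processes full-text USPTO patents.
docs = [
    "battery cell electrode lithium charge battery",
    "lithium electrode anode cathode cell",
    "image sensor pixel camera lens",
    "camera lens optical image pixel sensor",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                 # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)           # document-topic weights

# Each row is a document's mixture over the learned topics (rows sum to ~1),
# usable as coordinates for a map of technology.
print(doc_topics.shape)  # (4, 2)
```

Projecting those topic-mixture rows into two dimensions is one way to obtain the machine-assigned map coordinates the abstract argues for.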
14:05 | Technology mining for emerging S&T trends and developments: dynamic term clustering and semantic analysis SPEAKER: Pavel Bakhtin ABSTRACT. In a world of rapidly developing science and technology (S&T), with increasing volumes of S&T-related data and increasingly interdisciplinary and collaborative research, technology mining (TM) helps to acquire intelligence about emerging trends and future S&T developments. The task is becoming crucial not only for high-tech startups and large organizations, but also for venture capitalists and other companies that make decisions about S&T investments. Governments and public research institutions are also among the main stakeholders and potential users of TM, using it to set R&D priorities, plans and programs according to the current and future state of S&T development. Term clusters built by TM and bibliometric tools based on the co-occurrence of authors’ keywords, or of terms processed from the titles and abstracts of scientific documents, combine quite different types of objects: research fields, major problems and challenges, methods, inventions, products, technologies, etc. Specific expertise in the field may allow a researcher to identify the key objects of a study. However, objects themselves and their frequency dynamics over a time period do not alone fully indicate S&T developments and emerging trends in an area. In order to improve the identification of emerging S&T trends and developments, the paper focuses on dynamic term clustering and suggests a systemic approach that combines TM, bibliometrics, NLP and semantic analysis in a unified analytical framework. The proposed approach utilizes existing clustering methods and tools along with the analysis of term linguistic dependencies in order to study changes of objects over time along with their semantic meanings. |
14:25 | Tech Mining to Validate and Refine a Technology Roadmap SPEAKER: Geet Lahoti ABSTRACT. In this study, we use 'tech mining' to validate and refine the content of a particular section, namely nanocomposite coatings, of the Engineered Materials and Structures Roadmap (excerpted from Nanotechnology Roadmap, Technology Area 10, National Aeronautics and Space Administration, April 2012). We explore R&D and innovation activities in the area of nanocomposite coatings by mining publication and patent records. We analyze the developmental status of the related technologies and seek quantitative information to validate the predictions made by the experts. Moreover, we generate topical intelligence using keywords obtained from publications and patents, which we believe could help in refining the roadmap section. |
14:45 | Leveraging tech mining for revealing and forecasting innovation pathways: what now and what next? SPEAKER: Douglas Robinson ABSTRACT. Since 2006 a group of scholars in Europe, China and the US have grappled with the mapping of emerging fields of technology with a view to revealing and prospecting potential innovation pathways. Topics have included nanobiosensors, photovoltaics, telehealth, lab-on-a-chip technology, electric vehicles and neurotechnologies. Currently a number of projects are underway that look at fields such as Big Data Analytics, Additive Manufacturing, Bioprinting and Nanotechnology in Healthcare (particularly drug delivery). As part of this initiative, in my own projects in Europe and in fruitful collaboration with colleagues at Georgia Tech and the Beijing Institute of Technology, I have participated in a number of projects to reveal and forecast innovation pathways, some leveraging tech mining to augment this process. Other projects have focused heavily on qualitative studies, informed by theories of technical change and innovation studies. These “other projects” reveal openings for further leveraging tech mining and other big (and small) data analytics. This presentation will give an overview from my perspective of the “Revealing and Forecasting Innovation Pathways” initiative and, through a number of examples, describe potentially useful directions that are being explored today and options that could form a further next step. |
15:35 | Interdisciplinary Research based on scientific papers citation relationship SPEAKER: Zongying Tan ABSTRACT. Tracking and analyzing the integration trends of different disciplines, measuring interdisciplinary degrees, and revealing the dynamic structures of disciplines can provide a basis for making discipline development strategies and relevant policies. Citation analysis is commonly used in interdisciplinary research. Papers and their references are related in subject and sequential in time. Through citation analysis of papers that belong to different disciplines, we can explore the academic and professional similarity of different disciplines. We systematically design an interdisciplinary research methodology that includes discipline citation matrices and citation networks, measurement of disciplines’ knowledge sources, data map visualization, and an improved integration index. We use SCIE papers from nine years (1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2009) as empirical research data, comprising 7.34 million (7,336,274) papers and about 197 million (197,275,384) references. The disciplines analyzed in this paper are SCIE subjects: 22 first-level disciplines, which we call disciplines, and 175 branches. Interdisciplinary degrees and their changes for the world, the USA, China and India are analyzed at the discipline and branch levels using our interdisciplinary research methodology. |
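The discipline citation matrix described in this abstract can be sketched as follows; the function names, toy data, and the simple "outside-citation share" measure are illustrative assumptions (the authors' improved integration index is more elaborate):

```python
from collections import defaultdict

def citation_matrix(citations):
    """Build a discipline-by-discipline citation count matrix from
    (citing_discipline, cited_discipline) pairs."""
    m = defaultdict(lambda: defaultdict(int))
    for citing, cited in citations:
        m[citing][cited] += 1
    return m

def outside_share(matrix, discipline):
    """Toy interdisciplinarity measure: share of a discipline's
    references that point to *other* disciplines."""
    row = matrix[discipline]
    total = sum(row.values())
    if total == 0:
        return 0.0
    return 1.0 - row[discipline] / total

# Invented citing->cited discipline pairs, standing in for paper-reference data
pairs = [("Physics", "Physics"), ("Physics", "Chemistry"),
         ("Physics", "Mathematics"), ("Chemistry", "Chemistry")]
m = citation_matrix(pairs)
print(outside_share(m, "Physics"))  # two of three Physics references are external
```

At scale, the same counting over the 197 million references yields the discipline citation matrices; a citation network is simply this matrix read as a weighted directed graph.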
15:55 | Machine learning based classification of research grant award records SPEAKER: Jeffrey Alexander ABSTRACT. Policy makers frequently ask agencies to report how much money they are spending on research and development activities in specific fields or topics; however, records are rarely classified in ways that will inform policy and budget decisions. This work explores how topic co-clustering, an approach to text analysis based on machine learning, might be used to tag NSF grant awards automatically with terms referring to scientific disciplines or to socioeconomic objectives. We use metadata in the grant records to validate the results, and do not access the metadata as part of the automated tagging process. The results show that in the case of scientific disciplines, where our language models were well formed and we had a reliable comparison set for manual classification, the machine-assigned tags were a reasonable and reliable means of describing the research conducted under each grant. In assigning socioeconomic objectives to grants, we saw relatively poor precision and recall in classification, due to the poorly formed and sparse language models available for those terms. Our analysis suggests that this approach can be used to classify large corpora of scientific awards into desired categories. |
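As a toy illustration of the tag-then-validate workflow (not the authors' actual co-clustering models), one can tag an award from seed term lists and then score the tags against metadata labels that were held out of the tagging step; all names and term lists here are invented:

```python
def tag_award(abstract, term_lists):
    """Assign every tag whose seed terms appear in the abstract --
    a crude stand-in for topic co-clustering."""
    text = abstract.lower()
    return {tag for tag, terms in term_lists.items()
            if any(t in text for t in terms)}

def precision_recall(predicted, gold):
    """Validate machine tags against held-out metadata labels."""
    tp = len(predicted & gold)
    prec = tp / len(predicted) if predicted else 0.0
    rec = tp / len(gold) if gold else 0.0
    return prec, rec

terms = {"biology": ["genome", "cell"], "computing": ["algorithm", "software"]}
pred = tag_award("A new algorithm for genome assembly", terms)
print(precision_recall(pred, {"computing"}))  # (0.5, 1.0): one spurious tag
```

The sparse-language-model problem the abstract reports corresponds here to a tag whose term list is too thin or too generic to separate awards reliably.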
16:15 | Tech Mining Cited References to Understand the Influence of Journal Articles on Reports of the US National Research Council SPEAKER: Jan Youtie ABSTRACT. Few studies have empirically examined the influence of journal articles on policy reports. In this work, we present a method that measures this influence based on an analysis of the cited references in reports of the US National Research Council (NRC), the report-writing body of the National Academies; the National Academies serves as a science advisor to the US Congress (among other functions). Obtaining this information was challenging because references did not appear in a standard form, and about 20% of the NRC reports (primarily in the defense area) used footnotes in lieu of a cited reference list. We used the following process to address these issues: (1) separate reports according to whether or not they had a cited reference list; (2) in "footnote-only" reports, separate footnotes with multiple references into individual references; (3) write a macro to classify each of the resulting 120,000 references as a journal article, another NRC report, or another type of reference, using lists from common academic literature indexes as thesauri (e.g., Web of Science, Scopus, Sage, EBSCOhost, Engineering Village, MEDLINE); and (4) manually code all references not matched by the macro, using two separate coders to enhance coding accuracy. The results indicate that journal articles have been increasingly used in NRC reports in recent years. |
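The matching logic of step (3) might look like the sketch below; the tiny title list and the rule order are invented stand-ins for the index-derived thesauri the macro actually uses:

```python
# Invented stand-in for thesauri built from Web of Science, Scopus, etc.
JOURNAL_TITLES = {"science", "nature", "journal of applied physics"}

def classify_reference(ref):
    """Rough sketch of the classification macro: NRC report, journal
    article, or flagged for the manual coding of step (4)."""
    low = ref.lower()
    if "national research council" in low:
        return "nrc_report"
    if any(title in low for title in JOURNAL_TITLES):
        return "journal_article"
    return "manual_review"  # step (4): route to the two human coders

print(classify_reference("Smith, J. (2010). Nature 465, 123-126."))  # journal_article
```

A production version would need exact-title matching (plain substring tests over-match short titles such as "Science"), which is why unmatched and ambiguous references go to manual coding.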
15:35 | Funding Proposal Overlap Mapping: A Tool for Science and Technology Management SPEAKER: Ying Huang ABSTRACT. Overlay maps provide a concise way to contextualise previously existing information about an organization, topic, or specific technological field in a cognitive space. In the past few years, we have proposed systematic methods for drawing science overlay maps (Rafols et al. 2010) and patent overlay maps (Kay et al. 2014), based on publications and patents respectively. Research proposals to agencies such as the U.S. National Science Foundation (NSF) reflect new ideas, concepts, tools, and data that play a vital role in the development of science and technology (Nichols 2014). Proposal analyses can provide valuable research intelligence "upstream" of analyses of research outputs, such as publications and patents. In this context, we present a new approach to visualizing proposal content by mapping NSF awards based on compilations of Program Element Codes (PECs). As a base for mapping, we extract the PECs from NSF awards for 1976 through 2014. We categorize the PECs into disciplines, locating them based on the extent of their co-occurrence in individual awards. By overlaying sets of awards on the base map, we can see distributions across disciplines. This is effective in showing changes over time, such as the distribution of proposals on a given subject matter or by a research unit. It can also be used to contrast the emphases of different research units. In this paper, we illustrate the approach with Big Data as the target field for a case study. This exercise shows the potential of analyzing funding awards (reflecting proposals) to aid in research assessment, R&D opportunity analysis, and portfolio management. |
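The two counting steps behind the base map and the overlay can be sketched as follows; the PEC codes and function names are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def pec_cooccurrence(awards):
    """Count co-occurrences of Program Element Codes within individual
    awards; these counts would position PEC nodes on the base map."""
    counts = Counter()
    for pecs in awards:
        for pair in combinations(sorted(set(pecs)), 2):
            counts[pair] += 1
    return counts

def overlay_distribution(award_subset):
    """Distribution of a selected award set (e.g. Big Data awards)
    across PECs, to be overlaid on the base map."""
    return Counter(p for pecs in award_subset for p in set(pecs))

# Invented PEC codes purely for illustration
awards = [("1640", "7484"), ("1640", "7484", "6855"), ("6855",)]
print(pec_cooccurrence(awards))
print(overlay_distribution(awards))
```

The co-occurrence counts feed the layout of the base map (PECs that often share awards sit close together); the overlay counts then size the nodes for a chosen subject or research unit.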
15:55 | (Map of Science)2: Fields of S&T in Scientometric and Tech Mining Papers SPEAKER: Irina Efimenko ABSTRACT. Maps of science are well known nowadays as a useful and captivating tool and are widely used both by researchers and policy makers. In this paper, we present text mining tools for building maps of science of a special kind, which we call Maps of Science Squared: maps of Science and Technology through the lenses of Scientometrics and Tech Mining. Specific “nodes” are created, weighted and, whenever possible, coupled based on mentions in full texts or abstracts. The questions we intend to answer are: (1) Which R&D fields are in the focus of Scientometrics and Tech Mining today, and which fields are old-timers and newcomers? (2) Do we link R&D fields in non-traditional ways through our studies? (3) Which fields are locally bound? Two corpora were created, one for Scientometrics and Bibliometrics, and one for Tech Mining. Names of R&D fields and locations were extracted using a hybrid approach that combines statistical and linguistic analysis, a publication classification scheme, and several thesauri. The results of processing the text collections allow us to get an overall picture, as well as to draw interesting conclusions on specific cases. It is an object of future research to analyze the drivers of studies in Scientometrics and Tech Mining. This should help us understand why some fields are studied more often than others: whether this is caused by the availability of input data, a “market pull” effect, researchers’ backgrounds, or other factors. |
16:15 | Classifying Biomedical Text for Mining Keyword Correlations and Technology Opportunities Analysis SPEAKER: Alan Porter ABSTRACT. Seeking opportunities for future research and innovation makes great sense for emerging technologies. Technology Opportunities Analysis (TOA) attempts to capture technological dynamics using tech mining to develop Competitive Technical Intelligence (CTI) and grasp rising hotspots in a specific domain. Biomedical research requires strict procedures from formulation development to clinical trials, and only a few studies end up as market products. It is necessary and beneficial to carry out TOA with regard to such developmental stages in the biomedical field to gain more detailed insights. In this study, we demonstrate how to classify biomedical text automatically, based on characteristic descriptions of the research stages of reported publications, using multiple algorithms, so as to identify potential opportunities and future research directions. We illustrate this research framework with how gold nanoparticles (GNPs) have been applied to different aspects of the biomedical domain. We further illuminate how such correlational analysis can generate leading indicators for the future. |
Poster Session