Keyword Extraction and Technology Entity Extraction for Disruptive Technology Policy Texts

EasyChair Preprint no. 6883

5 pages. Date: October 19, 2021


The rapid development of disruptive technologies has attracted the attention of major countries in recent years. Mining and analyzing the texts of these countries' disruptive technology policies can reveal each country's key layout, focus areas, and development pattern for disruptive technology.

This article first crawls disruptive technology policy texts from the science and technology policy websites of major countries.

Then, the text is segmented with spaCy, the segmentation result is filtered by a word list to construct an applicable TF*IDF matrix, and finally the matrix weights are optimized using manually collected domain core words and important words. After that, technology entities are extracted and counted according to a specified word list. A comprehensive analysis shows that the keyword hotspots of the experimental texts center on artificial intelligence, information security, new energy, etc.
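The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: a regular-expression tokenizer stands in for spaCy segmentation so that the example needs only the standard library, the smoothed IDF formula is one common variant, and the names `tfidf_matrix`, `count_entities`, `allowed`, `core_words`, and the multiplicative `boost` factor are hypothetical choices made for illustration. The abstract does not state how the weights are actually optimized.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a simple stand-in for spaCy segmentation (lowercase word tokens).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs, allowed, core_words, boost=2.0):
    """Return one {term: weight} dict per document.

    Tokens are kept only if they appear in the `allowed` word list, and
    terms in `core_words` have their weight multiplied by `boost`
    (a hypothetical weighting scheme).
    """
    tokenized = [[t for t in tokenize(d) if t in allowed] for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = sum(counts.values()) or 1
        row = {}
        for term, c in counts.items():
            tf = c / total
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            weight = tf * idf
            if term in core_words:
                weight *= boost  # emphasize manually collected core words
            row[term] = weight
        rows.append(row)
    return rows


def count_entities(docs, entity_list):
    """Count whole-phrase, case-insensitive occurrences of each listed entity."""
    counts = Counter()
    for d in docs:
        lowered = d.lower()
        for ent in entity_list:
            pattern = r"\b" + re.escape(ent.lower()) + r"\b"
            counts[ent] += len(re.findall(pattern, lowered))
    return counts
```

For example, calling `tfidf_matrix(docs, allowed, core_words)` on a list of policy texts yields per-document term weights, and `count_entities(docs, ["artificial intelligence"])` tallies how often each listed technology entity appears across the corpus. The tokenizer here handles single words only, so multi-word keywords would need phrase-level segmentation in the first step.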

The key areas of specific disruptive technologies are artificial intelligence, air and space, and new-generation communication technologies. These results reflect the current situation and policy focus of disruptive technology development in these countries.

Keyphrases: disruptive technology, keywords extraction, S&T policy, technology entity extraction

