ICSSI2024: INTERNATIONAL CONFERENCE ON THE SCIENCE OF SCIENCE & INNOVATION
PROGRAM FOR TUESDAY, JULY 2ND


08:45-09:45 Session LT2: Lightning Talk Round 1
Chair:
Daniel Larremore (University of Colorado Boulder, United States)
Location: Main Auditorium
08:45
Daniel Gross (Duke University, United States)
Who pays for scientific training in the United States?

ABSTRACT. Science is the cornerstone of modern innovation. All scientists must be trained, and doctoral training is expensive. In the United States, it is also heavily subsidized by government, industrial, and philanthropic funding. We collect and validate comprehensive, systematic data from 75 years of U.S. doctoral graduates' dissertations to determine who funded this training, in what fields, and how this support changed over time. We document large shifts in funding sources over this period, especially within the government and private sectors. We show that variation in the scale and subject matter priorities of different funding organizations has led to significant changes in the share of U.S. PhD graduates in specific subjects over time. Government funding has large effects on the annual number of scientists the U.S. higher education system produces: funding an additional 100 dissertations on average increases domestic PhD production by roughly 80 graduates. We discuss a range of potential additional uses of these data, which we intend to make more widely available when complete.

08:55
Yifan Qian (Kellogg School of Management, Northwestern University, United States)
Jian Gao (Kellogg School of Management, Northwestern University, United States)
Binglu Wang (Kellogg School of Management, Northwestern University, United States)
Zihang Lin (Kellogg School of Management, Northwestern University, United States)
Benjamin F. Jones (Northwestern University, United States)
Dashun Wang (Northwestern University, United States)
Untapped Potential and Gender Gap in US Faculty Patenting
PRESENTER: Yifan Qian

ABSTRACT. Universities are sources of key scientific and technological breakthroughs that support rising standards of living and improved health. However, scientific activity is vast, with over 2M articles now published each year. Science’s increasing specialization exacerbates the “burden of knowledge,” making it challenging to identify individuals with significant potential for commercial and social impact. To address this, our research aims to pinpoint key faculty members and uncover untapped innovation potential using high-scale data and innovative AI tools. Our analysis, conducted at a census level covering faculty across various universities and fields in the US, reveals a significant reservoir of untapped patenting potential. Moreover, our findings highlight a consistent gender gap across universities and academic fields, with female faculty members exhibiting larger untapped patenting potential. These findings offer valuable insights for research institutions and policymakers, emphasizing the importance of fostering faculty engagement in patenting activities to maximize the potential for innovation and commercialization.

09:05
Katie Spoon (University of Colorado Boulder, United States)
Clara Boothby (National Center for Science and Engineering Statistics, United States)
Kevin Welner (University of Colorado Boulder, United States)
Quantifying inequitable education pathways to scientific and technical jobs
PRESENTER: Katie Spoon

ABSTRACT. Many scientific and technical occupations have been slow to diversify by race, socioeconomic background and gender. Despite broad interest in measuring access to STEM careers, progress has been limited in part due to the practical complexity of collecting longitudinal data. Most studies of this topic focus on one particular transition, such as the transition from high school to college. We systematically investigate the education pathways students from various social groups take to the STEM workforce using comprehensive nationally-representative data from the National Center for Education Statistics and the National Center for Science and Engineering Statistics from 1970 to 2020. The scale and detail of this dataset enable us to quantify the magnitude and variation of inequality in access to STEM careers, contextualize its relationship with institutional prestige, and highlight the importance of intersectionality in studies of STEM workforce development.

09:15
Sadamori Kojaku (Binghamton University, United States)
Attila Varga (Indiana University, United States)
Filipi Nascimento Silva (Indiana University, United States)
Clara Boothby (Indiana University, United States)
Xiaoran Yan (Zhejiang Lab, China)
Stasa Milojevic (Indiana University Bloomington, United States)
Alessandro Flammini (Indiana University Bloomington, United States)
Filippo Menczer (Indiana University Bloomington, United States)
Yong-Yeol Ahn (Indiana University Bloomington, United States)
GRaM: A Gravity-Informed Model for Tracking Scientist Mobility in Knowledge Spaces

ABSTRACT. Representing scientific knowledge is critical for navigating the complex dynamics of scientific progress and engaging various stakeholders, including academia, businesses, and governments, all of whom have an interest in strategically monitoring and forecasting the evolution of emerging scientific fields and technological innovations. The emergence of initiatives such as the Foresight and Understanding from Scientific Exposition (FUSE) program and the Emerging Research Areas and their Coverage (ERACEP) initiative highlights a growing demand for innovative tools capable of identifying and characterizing the progression of new scientific domains and technologies. Traditional methodologies have primarily employed discrete representations of scientific entities, including keywords, classification trees, and citations, to map the scientific landscape. While these discrete models simplify the complex landscape of science, they have difficulty grasping the fine details found in areas like particle physics, which encompasses a broad array of interrelated topics from the Standard Model to the Large Hadron Collider. Such discretization often overlooks the fine-grained relationships between topics.

Recent advancements have enabled the exploration of continuous representations as an alternative, leveraging artificial neural networks and related technologies. Yet, these methods frequently act as black-box algorithms, obscuring whether the distances between data points in the representation correspond to meaningful physical quantities. In response, we introduce the Gravity Informed Transformer Model (GRaM), which advances beyond traditional science mapping by adopting a continuous representation of scientific literature based on an interpretable and quantifiable physical phenomenon: the flux of scientists moving between research topics. GRaM leverages the fine-tuning capabilities of the Sentence BERT model to capture the semantic characteristics and thematic trajectories of the scientific landscape, offering a more fine-grained and smooth perspective on scientific knowledge.

Fig. 1A illustrates the GRaM methodology, starting with the process of sampling sequences of publications per author from the training dataset. Each sequence consists of titles from randomly selected publications of an author, one per year. These sequences are then used to fine-tune a Sentence BERT model through noise contrastive estimation (NCE), aligning the similarity between two publications a and b with the probability of an author coauthoring these two papers in sequence. Finally, the model is used to infer the embeddings for all dataset publications based on titles to generate a continuous knowledge map. In addition, we incorporate a gravitational potential analogy derived from topic mobility.
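The alignment objective described above can be sketched as a standard binary NCE loss. The toy vectors, the function name, and the dot-product similarity below are illustrative assumptions for exposition, not GRaM's actual implementation (which fine-tunes Sentence BERT):

```python
import numpy as np

def nce_loss(anchor, positive, negatives):
    """Binary NCE: treat a pair of consecutive publications by the same
    author as a positive example and randomly drawn titles as noise."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = sigmoid(anchor @ positive)          # probability of the true pair
    neg = sigmoid(-(negatives @ anchor))      # probabilities of noise pairs
    return -np.log(pos) - np.log(neg).sum()

# Toy title embeddings (in GRaM these would come from Sentence BERT).
a = np.array([1.0, 0.0])
same_topic = np.array([0.9, 0.1])
noise = np.array([[-0.8, 0.2], [0.1, -0.9]])

# A coherent author trajectory should yield a lower loss than a noise pair.
low = nce_loss(a, same_topic, noise)
high = nce_loss(a, noise[0], np.array([same_topic]))
```

Minimizing this loss over many author sequences pulls thematically consecutive papers together in the embedding space, which is what makes distances in the resulting map interpretable as scientist flux.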

Our analyses employ the SciSciNet bibliographic dataset, with nearly 134 million publications and authors, integrated with Web of Science (WoS) Core Collection subject categories. We evaluate GRaM across three primary tasks: mapping scientific works and authors, predicting collaboration ties, and forecasting new journal emergence. A 2D representation of the embedding, visualized through UMAP projection (Fig. 1B), reveals distinct regions corresponding to similar subject publications, including a detailed mapping of the physics field (Fig. 1C). Case studies of Nobel laureates Giorgio Parisi and Donna Strickland further showcase GRaM's capability in tracing scientific trajectories and the thematic evolution of researchers (insets 1 and 2, respectively). Fig. 1D shows that GRaM performance in recovering the local topical structure is similar to state-of-the-art methods.

We also evaluate GRaM's predictive accuracy in forecasting new research collaborations (Fig. 1E) and identifying the birth of new journals (Fig. 1F). By comparing distances within the derived space against a baseline of random researcher pairs, GRaM displays a remarkable superiority in predicting future collaborations, outperforming traditional and other embedding methods. Additionally, the analysis of gravity potential growth rates and their correlation with new journal paper densities illustrates GRaM's strength in predicting journal emergence, surpassing other models.

09:25
Miura Chiaki (The University of Tokyo, Japan)
Kimitaka Asatani (The University of Tokyo, Japan)
Ichiro Sakata (The University of Tokyo, Japan)
Scientists Have an Inherent Prioritized Queue in Selecting Collaborations
PRESENTER: Miura Chiaki

ABSTRACT. The temporal dynamics of co-authorship are of significant interest in understanding researchers' topic migration and its relation to impactful research. While longitudinal analyses of persistent collaboration have been conducted, there is ongoing debate about the existence and nature of consistent macroscopic patterns among groups of researchers. We focus on the time intervals of scientists' recurring collaboration, as it reflects the dynamism of researchers’ interests, assuming that individual dyads consistently engage with the same topic. We argue that empirical data suggest a fat-tailed interval distribution in collaboration, which can be explained by the priority selection model.

09:35
Alexander Furnas (Northwestern University, United States)
Timothy LaPira (James Madison University, United States)
Dashun Wang (Northwestern University, United States)
Partisan Disparities in the Use of Science in Policy
PRESENTER: Alexander Furnas

ABSTRACT. Science, long considered a cornerstone in shaping policy decisions, is increasingly vital in addressing contemporary societal challenges. However, it remains unclear whether science is used differently by policymakers with different partisan commitments. Here we combine large-scale datasets capturing science, policy, and their interactions, to systematically examine the partisan differences in the use of science in policy across both the federal government and ideological think tanks in the United States. We find that the use of science in policy documents has featured a roughly six-fold increase over the last 25 years, highlighting science’s growing relevance in policymaking. However, the pronounced increase masks stark and systematic partisan differences in the amount, content, and character of science used in policy. Democratic-controlled congressional committees and left-leaning think tanks cite substantially more science, and more impactful science, compared to their Republican and right-leaning counterparts. Moreover, the two factions cite substantively different science, with only about 5% of scientific papers being cited by both parties, highlighting a strikingly low degree of bipartisan engagement with scientific literature. We find that the uncovered large partisan disparities are rather universal across time, scientific fields, policy institutions, and issue areas, and are not simply driven by differing policy agendas. Probing potential mechanisms, we field an original survey of over 3,000 political elites and policymakers, finding substantial partisan differences in trust toward scientists and scientific institutions, potentially contributing to the observed disparities in science use.
Overall, amidst rising political polarization and science’s increasingly critical role in informing policy, this paper uncovers systematic partisan disparities in the use and trust of science, which may have wide-ranging implications for science and society at large.

11:00-11:30 Coffee Break
11:30-12:00 Session IN: Plenary: Anne Hultgren
Chair:
Bhaven Sampat (Arizona State University, United States)
Location: Main Auditorium
12:00-12:30 Session IN: Plenary: Toby Smith
Chair:
Daniel Larremore (University of Colorado Boulder, United States)
Location: Main Auditorium
12:30-13:30 Lunch Break
13:30-15:45 Session D: Technology and Social Sciences
Location: Room 120
13:30
Elle O'Brien (University of Michigan School of Information, United States)
Jordan Mick (University of Michigan School of Information, United States)
How is generative AI changing scientific software?

ABSTRACT. Scientific software is a cornerstone of research across disciplines. Scientists write software to facilitate data collection, generate data through computational simulations, investigate patterns in data through descriptive statistics and visualizations, and run statistical analyses to connect data to hypotheses. In recent years, the central role of software in scientific reproducibility has been highlighted by Open Science initiatives, which have encouraged research groups and publishers to adopt code-sharing practices (National Academies of Sciences, Engineering, and Medicine, 2018).

13:45
Lulin Yang (University of Pittsburgh, United States)
Lingfei Wu (University of Pittsburgh, United States)
Can Large Language Model Automate Academic Writing?
PRESENTER: Lingfei Wu

ABSTRACT. Large Language Models (LLMs) have demonstrated remarkable writing abilities comparable to those of human experts. While the impact of LLMs on professions requiring basic writing skills has been extensively studied (Eloundou et al. 2023), their potential implications for advanced writing tasks, particularly in academia, remain uncertain. This inquiry holds significance due to the long-term perception of scholars as the most highly educated and the least susceptible to automation (Frey and Osborne 2017; Autor 2015). Against this backdrop, this study addresses the critical question: Can LLMs automate academic writing?

14:00
Linzhuo Li (Department of Sociology, China)
Yiling Lin (School of Computing and Information, The University of Pittsburgh, United States)
Lingfei Wu (School of Computing and Information, The University of Pittsburgh, United States)
Displacing Science
PRESENTER: Lingfei Wu

ABSTRACT. Recent research on the decline in the paper disruption index (D-index) has sparked heated debates among scholars and garnered significant attention from policymakers and research institution leaders globally. To bridge the gap between policymakers’ interest and scholars’ skepticism about the D-index, we present this article summarizing key insights from our eight-year investigation, including interviews with scientists across nine disciplines and an analysis of 41 million papers over six decades. Our work confirms the decline in disruptive papers, addresses relevant technical concerns, and makes several original contributions: we clarify that the D-index measures how new ideas render old ones obsolete, suggesting “Displacing” as an alternative interpretation for “D”; we show that federal funding agencies like the NIH and NSF are less likely to support disruptive research; and we introduce the “principle of functional equivalence” to explain the origins of recombining and displacing mechanisms in science, stressing that not all innovation problems are combinatorial, and challenging the belief that AI is the mature solution to scientific innovation. This article aims to promote broader and more accurate use of the D-index in research evaluation and to inspire new funding mechanisms for scientific breakthroughs.
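For readers unfamiliar with the measure, the D-index discussed here is conventionally computed, following the CD index of Funk and Owen-Smith, from the forward citations of a focal paper and of its references. The sketch below, with made-up citation sets, illustrates the +1/−1 logic by which a displacing paper is cited instead of its predecessors:

```python
def d_index(cites_focal, cites_refs):
    """D-index of a focal paper. Each forward-citing paper contributes
    +1 if it cites the focal paper but none of its references (the focal
    paper displaces its predecessors), -1 if it cites both (it merely
    consolidates them), and 0 if it cites only the references."""
    forward = cites_focal | cites_refs
    total = 0
    for p in forward:
        f = 1 if p in cites_focal else 0
        b = 1 if p in cites_refs else 0
        total += f - 2 * f * b
    return total / len(forward) if forward else 0.0

# Hypothetical citation sets for illustration.
displacing = d_index({"p1", "p2", "p3"}, set())   # all citers skip the refs
mixed = d_index({"p1", "p2"}, {"p2", "p3"})       # (+1 - 1 + 0) / 3
```

A maximally displacing paper scores 1.0 (every citer ignores its references), while a purely consolidating paper scores −1.0.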

14:15
Shahan Ali Memon (Information School, University of Washington, United States)
Jevin West (Information School, University of Washington, United States)
Science of AI-mediated Science
PRESENTER: Shahan Ali Memon

ABSTRACT. In "The Structure of Scientific Revolutions" [1], Kuhn describes scientific progress as a non-linear accumulation of knowledge guided by periods of normal science often followed by paradigm shifts. These paradigm shifts often alter the foundational practices of scientific fields. There have been many technological revolutions that have led to paradigm shifts in science. The microscope, for example, led to the discovery of cell theory. The invention of the telescope revolutionized our understanding of the universe, leading to the foundation of modern astronomy. More recently, the accessibility of vast digital scholarly data, coupled with advanced computing capabilities, has given rise to the Science of Science (SciSci). The wide adoption of artificial intelligence (or AI) could have a similar effect on science. The difference is that AI has not just changed a single field of science but has transcended its impact across disciplinary boundaries. The ability of AI to process vast amounts of data, learn from patterns in the data, and perform complex tasks has made it a disruptive force in many fields and industries. Examples include AlphaFold, an AI system that predicts a protein's 3D structure from its amino acid sequence, or Large Language Models that are currently being used in various industries such as healthcare, finance, education, and so on.

AI is changing not only how scientists communicate but also, potentially, how they ask questions and how they do science. Increasingly, the various processes of science are being mediated by AI. New tools are being developed to aid scholars with hypothesis generation, literature review, data collection, experimentation, writing, and so on. Consequently, with the adoption of these tools, the science we produce is changing. We call the study of this new AI-infused science the Science of AI-mediated Science. The Science of AI-mediated Science revolves around five dimensions: (i) development of AI tools and systems for science, such as Consensus, an AI-driven search engine for literature review; (ii) evaluation and auditing of AI tools and systems, involving rigorous assessment against benchmarks and the establishment of best practices; (iii) analysis of the impact of AI on scientific practices, leveraging large-scale scholarly data to conduct an empirical analysis of how AI is changing science; (iv) exploration of the epistemological, philosophical, and ethical implications of AI-mediated science; and (v) surveying and monitoring the developments within the field. Table 1 provides a snapshot of the field's current state across these various dimensions, cross-tabulated with processes of science. The evident sparsity within the cells attests to the early developmental stage of the field, denoting opportunities for exploration.

Our contribution is three-fold: first, we provide a living survey of the field by summarizing the published and ongoing research along various dimensions and processes of science. Second, we offer a conceptual framework geared towards analyzing the role and impact of AI on science. Third, we provide a research agenda with pertinent opportunities and open research questions for the new scientific field. Conceptually, this emerging field is an offspring of the Science of Science paradigm with an emphasis on AI's role as an intermediary or guide in the scientific process. Our aim is to help facilitate a meaningful and structured form of conversations and collaborations among an interdisciplinary cohort of scholars in the SciSci community, especially to set an agenda for the co-creation of the Science of AI-mediated Science.

References
1. Kuhn, T. S. (1997). The structure of scientific revolutions (Vol. 962). Chicago: University of Chicago Press.

14:30
Eamon Duede (Harvard University, United States)
Ian Foster (University of Chicago, United States)
William Dolan (University of Chicago, United States)
Karim Lakhani (Harvard Business School, United States)
Oil & Water? Diffusion of AI Within and Across Scientific Fields

ABSTRACT. Prepared in LaTeX. See attached file.

14:45
Sai Koneru (Pennsylvania State University, United States)
Jian Wu (Old Dominion University, United States)
Sarah Rajtmajer (Pennsylvania State University, United States)
Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

ABSTRACT. Hypothesis formulation and testing are central to empirical research. However, with the exponential increase in the number of scientific articles published annually, manual aggregation and synthesis of evidence related to a given hypothesis is a challenge. Scholarly databases fail to aggregate, compare, contrast, and contextualize existing studies in a way that allows comprehensive review of the relevant literature. Work in the areas of natural language processing (NLP) and natural language understanding (NLU) has emerged to address various challenges related to synthesizing scientific findings. Automated approaches for fact-checking, for example, have received significant attention in the context of misinformation to assess the accuracy of a factual claim based on the literature. What remains a gap, however, are methods to determine whether a research question is addressed within a paper based on its abstract, and if so, whether the corresponding hypothesis is supported or refuted by the work. Automatically identifying published work in support or refutation of particular hypotheses will facilitate building connections between publications beyond citations and aggregating scientific contributions to automatically and dynamically evaluate hypotheses with strong and weak evidence.

In this work, accepted at LREC-COLING 2024, our contributions are as follows. First, we propose the scientific hypothesis evidencing (SHE) task, defined as the identification of the association between a given declarative hypothesis and a relevant abstract. This association can be labeled as entailment, contradiction, or inconclusive.

Second, we curate a novel Collaborative Reviews (CoRe) dataset for the task using community-driven annotations of studies in the social sciences. Our CoRe dataset is built from 12 different open-source collaborative literature reviews actively curated and maintained by domain experts and focused on specific questions in the social and behavioral sciences. The dataset contains 69 unique hypotheses tested across 602 different scientific articles. The findings are aligned to 3 labels, leading to a total of 638 triplets containing abstract, hypothesis, and label. We split the dataset into training (70%), development (15%), and held-out test (15%) sets.

Finally, we evaluate state-of-the-art NLU models on the SHE task. Specifically, we evaluated two families of NLP methods using our dataset: transfer learning models and LLMs. In the case of transfer learning models, we evaluate sentence pair classifiers based on pre-trained embeddings as well as Natural Language Inference (NLI) models. For the sentence pair classification, concatenated hypothesis and abstract embeddings are used as input to the model, which contains three successive fully-connected layers followed by a three-way softmax layer. We evaluate the performance of two pre-trained embedding models: Longformer and OpenAI's text-embedding-ada-002. For the NLI models, we treat the task as Natural Language Inference, using an abstract as the premise and determining whether it entails a given hypothesis. Among models proposed for the NLI task, we evaluate the Enhanced Sequential Inference Model (ESIM) and the Multi-Task Deep Neural Network (MT-DNN).
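A minimal forward pass for such a sentence-pair classifier might look as follows. The layer sizes, random weights, and embedding dimension are placeholders, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

def classify_pair(hyp_emb, abs_emb, params):
    """Concatenate hypothesis and abstract embeddings, apply three
    fully-connected layers, and return a 3-way softmax over
    {entailment, contradiction, inconclusive}."""
    x = np.concatenate([hyp_emb, abs_emb])
    for w, b in params[:-1]:
        x = np.maximum(dense(x, w, b), 0.0)   # ReLU hidden layers
    logits = dense(x, *params[-1])
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

# Placeholder dimensions: 8-d embeddings, so a 16-d concatenated input,
# two 16-unit hidden layers, and a 3-way output layer.
dims = [16, 16, 16, 3]
params = [(rng.normal(size=(i, o)) * 0.1, np.zeros(o))
          for i, o in zip(dims[:-1], dims[1:])]

probs = classify_pair(rng.normal(size=8), rng.normal(size=8), params)
```

In practice the input embeddings would come from Longformer or text-embedding-ada-002 and the weights would be trained with cross-entropy on the CoRe training split.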

We tested two LLMs, namely OpenAI's ChatGPT and Google's PaLM 2, and experimented with five prompts used in prior work. All are prefix prompts, i.e., the prompt text comes entirely before the model-generated text. Depending on the prompt template, we asked the LLMs to return one of three sets of labels: (true, false, neutral); (yes, no, maybe); (entail, contradict, neutral). We tested the models in a zero-shot setting, a retrieval-augmented few-shot setting, and with prompt ensembling, taking a majority vote over the outputs of our five individual prompts. Within the PDF version of the paper we summarize model performance on the test set. Reported metrics are averaged across experimental settings. The sentence pair classification model using text-embedding-ada-002 embeddings yielded the best performance, achieving a macro-F1 score of 0.615, followed by the pre-trained gpt-3.5-turbo model with prompt ensembling in the few-shot setting.
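Prompt ensembling with majority voting reduces to mapping each prompt's label vocabulary onto a shared set and taking the mode. The mapping below mirrors the three label sets mentioned above, while the function name is our own:

```python
from collections import Counter

# Map each prompt template's label set onto one canonical vocabulary.
CANON = {
    "true": "entail", "yes": "entail", "entail": "entail",
    "false": "contradict", "no": "contradict", "contradict": "contradict",
    "neutral": "neutral", "maybe": "neutral",
}

def ensemble_vote(raw_labels):
    """Majority vote over the normalized outputs of several prompts."""
    votes = Counter(CANON[label.strip().lower()] for label in raw_labels)
    return votes.most_common(1)[0][0]

# Five prompts, three different label vocabularies, one verdict.
verdict = ensemble_vote(["yes", "true", "entail", "maybe", "no"])
```

Here three of the five normalized votes agree, so the ensemble returns "entail"; with an odd number of prompts and three labels, a plurality always exists.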

The observation that all models achieve macro-F1 scores below 0.65 demonstrates that SHE is a challenging task for current NLU and that LLMs do not seem to perform better than traditional language models and transfer learning models. Our study quantitatively showcases the limited reasoning capability of state-of-the-art LLMs and suggests there is still a way to go before LLMs are readily usable for discerning evidence for scientific hypotheses, at least in the social sciences. Our dataset has been shared with the research community.

15:00
Nur Ahmed (Sloan School of Management and CSAIL, MIT, United States)
Amit Das (Lundquist College of Business, University of Oregon, United States)
Kirsten Martin (Mendoza College of Business, University of Notre Dame, United States)
Kawshik Banerjee (Dept. of CSE, RUET, Bangladesh)
The Narrow Depth and Breadth of Corporate Responsible AI Research

ABSTRACT. The rapid development and deployment of Artificial Intelligence (AI) technologies have brought both benefits and risks to society. Responsible AI (RAI) research, which critically examines the ethical and societal implications of AI, is crucial for ensuring that this powerful technology is developed and deployed in a socially beneficial manner (Acemoglu, 2021; Brynjolfsson, 2022). However, the extent of industry engagement in this critical subfield of AI remains largely unknown. This study addresses two main research questions: (1) To what extent do AI firms engage in the critical examination of their technologies through RAI research? (2) To what extent do AI firms rely on RAI research in their commercial inventions? To answer these questions, we systematically analyzed over 6 million peer-reviewed articles published between 2010 and 2022, as well as 22 million patent citations (Marx & Fuegi, 2020). We employed supervised machine learning techniques to classify articles as RAI research and measured industry engagement based on author affiliations and patent citations. Our analysis was conducted on three distinct datasets to assess the extent of industry's engagement in responsible AI research.

Fig. 1: Degree of engagement in responsible AI research by AI firms.

The results indicate that the majority of AI firms show limited or no engagement in RAI research. Among over 1,700 firms holding AI patents, 89.9% did not publish any RAI research from 2010 to 2022, and only 3.3% published 5 or more responsible AI papers during this period (Fig. 1). Even among firms publishing AI research, only 11.2% demonstrated meaningful engagement in RAI research. Our results also show that leading AI firms produce far fewer RAI papers compared to their conventional AI (CAI) research engagement and trail top-ranked universities in RAI research (Fig. 2a). Moreover, industry's participation in RAI conferences has remained relatively constant (20-23%) from 2010 to 2022, in stark contrast to their increasing presence (29-37%) in CAI conferences (Fig. 2b). Our analysis, using multiple NLP techniques, incorporating both quantitative and qualitative analysis, reveals that the industry places less emphasis on critical issues such as moral considerations, environmental concerns, and the ethical consequences of AI, compared to academia. Furthermore, a large-scale generic patent citation analysis indicates that industry rarely incorporates RAI research into their commercial products (Fig. 3). Only 19 RAI papers were cited in patents from 2010 to 2022, in contrast to over 8,000 CAI papers. A separate analysis of AI patents up to 2018 shows that none cite any responsible AI papers. These findings have important implications for the future development and governance of AI technologies. The industry's limited engagement in RAI research could inhibit their capacity to absorb academic knowledge, identify and mitigate potential harms, and foster public trust in AI. Our study highlights the need for policymakers to encourage industry participation in RAI research and for the research community to make their work more accessible and actionable for practitioners.

Fig. 2: (a) Engagement disparity between industry and academia. (b) The proportion of papers with industry-affiliated authors in conventional vs responsible AI conferences.

Fig. 3: Number of papers cited in patents over the years.

Overall, this large-scale study provides compelling evidence that, despite having substantial resources, industry's involvement in RAI research remains limited in depth and breadth, contrasting starkly with its dominance in CAI research (Ahmed, Wahed, & Thompson, 2023). A significantly smaller number of firms engage in RAI research, and they show a lack of enthusiasm towards critical issues in this domain. Furthermore, we observe industry is not integrating RAI research into its product commercialization. This lack of engagement could inhibit industry's capacity to anticipate and mitigate AI-induced societal harms. Consequently, policymakers may need to incentivize greater industry participation in RAI research to align AI development with societal values and ensure public trust in this increasingly pervasive technology.

15:15
Juan Mateos-Garcia (Google DeepMind, UK)
Tracking equity in adoption and impact from AI in Science: The case of AlphaFold

ABSTRACT. Artificial Intelligence (AI) stands to transform scientific discovery by enabling faster and better search in vast combinatorial spaces of hypotheses (Wang et al. 2023), but the implications for equity are ambiguous: AI could lower barriers to entry for historically underrepresented communities or push further ahead established actors who have the relevant resources and skills. We try to shed light on this process through an analysis of AlphaFold, arguably the most consequential application of AI in science so far (Jumper et al. 2021). Our context is structural biology. A protein’s structure determines its function and is critical for understanding biological processes and disease. Structural biologists have historically spent years determining the structure of individual proteins using experimental methods. More recently, computational biologists started using AI tools to predict protein structures from amino acid sequences. In 2020, AlphaFold 2 (AF), an AI system developed by DeepMind, effectively “solved” this grand challenge for single domain proteins (Pereira et al. 2021). DeepMind published AF’s methodology, open source code and, together with EMBL-EBI (the European Bioinformatics Institute), an open AF Protein Structure Database (AFDB) with over 200 million predicted protein structures (Varadi et al. 2022). AF’s arrival is widely perceived to have revolutionised structural biology, opening up new avenues for research and applications (Callaway 2022). Its methods paper has received more than 20,000 citations and AFDB has 1.7 million users (Kovalevskiy, Mateos-Garcia, and Tunyasuvunakool 2024).

15:30
Junsol Kim (University of Chicago, United States)
Zhao Wang (Northwestern University, United States)
Haohan Shi (Northwestern University, United States)
Ling (University of Michigan, United States)
James Evans (University of Chicago, United States)
Individual Misinformation Tagging Reinforces Echo Chambers; Collective Tagging Does Not

ABSTRACT. Fears about the destabilizing impact of misinformation online, such as the spread of vaccine hesitancy and the Capitol attack, have motivated individuals and platforms to respond. Individuals have become empowered to challenge others’ online claims with misinformation tags (or fact-checks) as “vigilantes” in pursuit of a healthy information ecosystem and to break down ideological echo chambers. Platforms have experimented with collective misinformation tagging systems that verify the accuracy of content through collective inputs from a wider distribution of users. Notably, on Twitter’s new feature, Community Notes, misinformation tags undergo a peer-review process by diverse users before being revealed to the original posters. Community Notes selectively exposes misinformation tags that receive votes from heterogeneous user groups, ensuring they are verified across a broad spectrum of perspectives to activate the “wisdom of crowds.”

13:30-15:45 Session E: Science and Innovation
Chair:
Jeff Tsao (Sandia, United States)
13:30
Yanbo Wang (Faculty of Business and Economics at the University of Hong Kong, Hong Kong)
The Contribution of Chinese Science to US Technological Advancement: Evidence from Patent Citation of Academic Papers

ABSTRACT. Science has played a crucial role in propelling technological advancements (Fleming & Sorenson, 2004; Marx & Fuegi, 2020, 2022; Nelson, 1982). Although China's scientific advancements have been well-documented (Gaulé & Piacentini, 2013; Qiu et al., 2022; Xie et al., 2014), the extent to which the world, particularly innovation leaders such as the US, has leveraged China-produced scientific knowledge (“China Science”) for technological advancement remains unclear. A common belief is that China's contribution to the US, compared to the benefits it receives from bilateral collaborations, is minimal. Therefore, we know little about the evolving dynamics in bilateral knowledge spillovers, particularly regarding the extent to which Chinese science has been leveraged by US innovators for technological advancement. We examine the pattern of academic knowledge flow from foreign science to US technology during the period of 2000 - 2020. Our data are from multiple sources. We primarily utilize the Reliance on Science database, created by Marx and Fuegi (2020, 2022), to obtain information on academic paper citations in US patents. We further use PatentsView to obtain information on US patents. For publications cited by US patents, we use Microsoft Academic Graph (MAG) to identify each author’s country information through their primary institutional affiliation’s geographic location. For papers with multiple authors, we concentrate on the last author in identifying the country origin of knowledge production. In conclusion, we study the contribution of Chinese science to US technology, as evidenced by patent citation of scientific publications. Our analysis documents that foreign science is prevalent in US patents, with China emerging as a leading foreign producer of science cited in US patents. In 2020, nearly one-third of US science-reliant patents referenced articles published by Chinese scholars, representing a six-fold increase compared to 2000. 
While Chinese science has predominantly played a supplementary role in US technological advancement, we observe that in nearly 15% of US patents citing Chinese science, China serves as the primary source of referenced scientific knowledge.
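As a minimal sketch, the last-author attribution rule described above might look as follows in Python; the record layout and names below are illustrative assumptions, not the actual Reliance on Science or MAG schema.

```python
# Illustrative sketch (not the authors' code) of the last-author rule:
# for multi-author papers, the country of the last author's primary
# affiliation is taken as the country origin of the science.

def paper_origin_country(authors):
    """authors: ordered list of (name, affiliation_country) tuples."""
    if not authors:
        return None
    # Last-author convention: the final listed author is treated as the
    # senior author whose affiliation determines the country of origin.
    return authors[-1][1]

papers = [
    [("A. Smith", "US"), ("B. Chen", "CN")],   # attributed to CN
    [("C. Zhang", "CN"), ("D. Jones", "US")],  # attributed to US
]
origins = [paper_origin_country(p) for p in papers]
```

Attributing each paper to a single country this way is a simplification of multinational authorship, which is presumably why the abstract singles out the last author.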

13:45
Aj Alvero (University of Florida, United States)
Ruohong Dong (University of Florida, United States)
Klint Kanopka (New York University, United States)
David Lang (Stanford University, United States)
Algorithmic Tradeoffs, Applied NLP, and the State-of-the-Art Fallacy

ABSTRACT. Computational sociology is a growing area of research, but the computational tools used for analysis vary greatly in their power, transparency, and usage. It is also the case that methods developed in computational science become popularized once they surpass shared benchmarks of classification and prediction accuracy. In this short paper, we consider how these dynamics might influence computational method selection in sociology, leading sociologists to select methods that do not best answer or fit their research questions. To demonstrate our point, we compare results from a previous natural language processing (NLP) driven analysis of college admissions essays (n = 240,000) using more recent and older methods. The methods are also distinct in their relative statistical flexibility and opacity. We find that the newer, more sophisticated methods, which surpassed the older, more transparent methods in past computational benchmarks and popularity, did not exceed the performance reported in the original study. These results suggest the latest computational tools might not always be appropriate by default. We also consider the social meaning and implications of our findings for future research on text analysis and authorship characteristics. Our findings inform not only similar studies of text but also the sociological adoption of NLP methods.

14:00
Larisa Cioaca (Duke University, United States)
Ashish Arora (Duke University, United States)
Sharon Belenzon (Duke University, United States)
Ferracuti (Duke University, United States)
The Private Value of Innovating for the Government

ABSTRACT. Government spending on innovation can be classified as “push” and “pull” policies. Whereas push spending pays for R&D inputs (e.g., through research grants), pull spending pays for outputs (innovative products) to pull in private R&D investments. Sometimes push and pull policies are used in conjunction, such as when the government “guarantees demand” by rewarding firms that complete government-funded R&D contracts with noncompetitive production contracts for the resulting products. Anticipating future public demand, firms co-invest at the R&D stage using their private funds. This paper studies the hybrid use of these push and pull R&D policies. We develop a conceptual framework to explain when it is optimal for the government to bundle R&D contracts with production contracts and the implications of this bundling for the private value of R&D contracts for firms.

14:15
Nandini Banerjee (University of Notre Dame, United States)
Diego Gomez-Zara (University of Notre Dame, United States)
All-Female Teams Produce More Disruptive Work: Evidence from Scientific Papers
PRESENTER: Nandini Banerjee

ABSTRACT. Disruption quantifies the extent to which a scientific work consolidates its predecessors’ ideas and disrupts the subsequent use of the components on which it was built [2, 6]. Previous research has analyzed the characteristics of scientific teams that produce disruptive papers and patents. Smaller team sizes [6], new collaborative relationships amongst members [8], less hierarchy [7], and teams that do not work remotely [3] are shown to be associated with producing disruptive work. Despite these findings, the influence of team gender composition on disruption remains uninvestigated. In this study, we evaluate the effects of gender composition on scientific disruption. We analyzed 20 million papers produced by teams of different gender compositions between 1950-2010 using the SciSciNet [4] dataset. This analysis included the papers’ metadata, including their disruption index (DI), year of publication, authors’ genders, and field of study. A DI of -1 denotes papers that consolidate their predecessors' ideas, while a DI of 1 denotes papers that upend their predecessors’ ideas [2, 6]. We considered papers disruptive if their DI lay in the 95th percentile for the papers produced that year. To infer the authors’ genders, SciSciNet offers a probability metric indicating the likelihood of a name belonging to a female individual [5]. We classified researchers as female if their probability score was higher than 50%. A team’s gender composition was calculated as the percentage of female authors identified, yielding all-male teams (0% female authors), 50/50 teams (50%), and all-female teams (100%). To analyze the effect of gender composition on disruption, we compared the ratio of disruptive papers to total publications produced annually among teams of three gender compositions (all-male, 50/50, and all-female) from 1950 to 2010 (Fig. 1a). For all-female teams, the proportion of disruptive work produced increased as the decades progressed.
In contrast, all-male and 50/50 teams produced a lower and stagnant proportion of disruptive work. A t-test confirms that the differences between the proportions of disruptive work produced by teams of different gender compositions were significant (t=9.7, p<0.1). Thus, our results show that all-female teams have produced a higher proportion of disruptive papers than teams of other gender compositions, with a notable increase in contribution over the past two decades. We also analyzed the effect of team size on disruption, as established by previous research [6], for all-male, 50/50, and all-female teams with sizes between 2 and 15 (Fig. 1b). As expected, DI decreased as team size increased, but at any given size, all-female teams appeared more disruptive than all-male or 50/50 teams. Thus, even though small teams are more disruptive than large teams, amongst teams of the same size, gender composition affects which teams are more disruptive. Together, these results suggest a significant relationship between team gender composition and disruption. As a potential explanation for these findings, previous research shows that women participate more in team activities when teams have a female majority than a minority [1]. Thus, an all-female team’s environment seems to support the members’ ability to fully participate in creating disruptive research. Our results will help inform diversity, equity, and inclusion (DEI) initiatives to allow for greater equity and opportunities for female researchers. We plan to investigate whether these trends also appear in patent teams and present those results in the future.
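As a rough sketch, the coding rules in this abstract (an author is coded female when the name-gender probability exceeds 50%; a paper is disruptive when its DI reaches the yearly 95th percentile) could be implemented as below. The nearest-rank percentile and all names are assumptions, not the authors' actual pipeline.

```python
import math

FEMALE_PROB_THRESHOLD = 0.5  # abstract's rule: probability > 50% -> female

def percent_female(name_probs):
    """Team gender composition as the share of authors coded female."""
    female = sum(1 for p in name_probs if p > FEMALE_PROB_THRESHOLD)
    return 100.0 * female / len(name_probs)

def disruption_cutoff(di_scores, p=95):
    """Nearest-rank percentile of a year's DI scores (an assumed method;
    the abstract does not specify how the percentile is computed)."""
    s = sorted(di_scores)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]

# A toy yearly cohort of 20 DI scores in [-1, 1]:
cohort = [i / 10 for i in range(-10, 10)]
cutoff = disruption_cutoff(cohort)              # 95th-percentile DI
disruptive = [d for d in cohort if d >= cutoff]  # papers flagged disruptive
```

With this rule, `percent_female` maps a team to the three buckets compared in the abstract: 0% (all-male), 50% (50/50), and 100% (all-female).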

14:30
Haochuan Cui (Nanjing University, China)
Yiling Lin (University of Pittsburgh, United States)
Lingfei Wu (University of Pittsburgh, United States)
James Evans (The University of Chicago, United States)
Age, Aging Scientists, and Slow Scientific Advance
PRESENTER: Haochuan Cui

ABSTRACT. With rising life expectancies around the world and an older scientific workforce than ever before, what does aging mean for individual scientists, and what do aging scientists mean for scientific progress? In this study, we first document an anchoring effect in the citation behavior of individual scientists over their careers, and characterize how scientists in all fields and at all levels of productivity age relative to the collective frontier of scientific knowledge. We contribute to prior literature on aging in science by linking scientific advance and obsolescence with individual aging, and demonstrate that with age comes not only a preference for aging ideas but also an active defense against new ones. We further show how this individual proclivity toward defense accumulates within fields and forecasts a reduction in the churn of new ideas conceived and published. By linking the demography of the scientific workforce to the use and production of new ideas, we offer novel insight into concerns about diminishing returns to past scientific investments and efforts.

14:45
Huaxia Zhou (Northwestern University, United States)
Mengyi Sun (Northwestern University, United States)
Aliakbar Akbaritabar (Max Planck Institute for Demographic Research, Germany)
Emilio Zagheni (Max Planck Institute for Demographic Research, Germany)
Luís A. N. Amaral (Northwestern University, United States)
Partition and reunification of science under external shocks in Germany

ABSTRACT. The scientific enterprise has transformed society and the economy in significant ways, yet it remains susceptible to external social factors that could change its trajectory. To better understand external perturbations of the scientific enterprise, we take advantage of the natural experiment that occurred in Germany. The German science system underwent a process of first separation and then reunification, providing a unique opportunity to evaluate the impact of external shocks on the science system (Archambault, Mongeon, & Larivière, 2017). Despite the significance of this historical event, limited quantitative research has been conducted on the path from partition to reunification of the German science system.

15:00
Jingyuan Zeng (London School of Economics and Political Science, UK)
The making of high-tech clusters: Evidence from early-mover corporate labs

ABSTRACT. Do firms that patented early on a breakthrough innovation exclude rivals from key intellectual assets and block the diffusion of the breakthrough? This paper compiles novel historical microdata on corporate labs' longitudinal R&D performance in each technology class, and proposes a unique natural experiment that treats corporate labs' head start in the late 1950s' microchip breakthrough as exogenous. The paper then sequentially unveils whether the head start of corporate labs facilitated internal R&D outcomes, rivals' R&D performance, and cluster development. First, evidence shows that a quasi-exogenous head start of corporate labs in late-1950s microchip research triggered persistent R&D and product development, rather than a one-off acquisition of monopoly patent rights. Second, corporate labs' head start in microchip research raised innovation and product development to a larger extent when they were exposed to more intense competition from other early-mover rivals. Third, rather than hampering the diffusion of the breakthrough, early-mover labs strategically shared their knowledge at a significantly higher rate, which consolidated their dominant designs and directed rivals' innovation toward exploitative process innovation. Lastly, early-mover labs were pivotal catalysts of local agglomeration forces, spurring "Silicon Valleys" beyond California.

15:15
Huaxia Zhou (Northwestern University, United States)
Mengyi Sun (Northwestern University, United States)
Anomaly detection on career independence of biomedical researchers using bibliometric database

ABSTRACT. Achieving career independence, marked by securing a tenure-track position, is a significant milestone for scholars. In biomedical research, this milestone has traditionally been inferred from large-scale bibliometric databases based on last authorship, due to the field's specific publication norms. However, the accuracy of this method has not been systematically assessed. Here, we analyze the career outcomes of 2.7 million biomedical researchers using OpenAlex, the largest open-source bibliometric database. Unexpectedly, we found an anomaly where over 70% of authors securing last authorship did so in their first career year, which seems improbable and suggests potential estimation errors (see Fig. 1). This anomaly is prevalent across years, countries, and disciplines.

15:30
Giorgio Tripodi (Kellogg School of Management, Northwestern University, Evanston, IL, USA, United States)
Yifan Qian (Kellogg School of Management, Northwestern University, Evanston, IL, USA, United States)
Benjamin F. Jones (Kellogg School of Management, Northwestern University, Evanston, IL, USA, United States)
Dashun Wang (Kellogg School of Management, Northwestern University, Evanston, IL, USA, United States)
Tenure and Scientific Production
PRESENTER: Giorgio Tripodi

ABSTRACT. Tenure is the most important contractual right in the US academic system. It grants faculty members a permanent position and, as recorded in the 1940 Statement of Principles on Academic Freedom and Tenure, guarantees the academic freedom and job security necessary to attract and retain “men and women of ability” (1). Against this backdrop, tenure marks a pivotal point in academic careers. On the one hand, pursuing tenure creates a strong incentive to produce a substantial number of high-quality articles. On the other hand, the job security associated with a tenured position drastically modifies the system of academic incentives, and faculty productivity may diminish after the promotion is achieved. At the same time, job security also grants faculty members the freedom to focus on what they value most, potentially resulting in greater exploration or risk-taking behavior. To shed light on the multifaceted effects of tenure on scientific production, we reconstruct the careers of ~13K US faculty members who experienced a transition from a tenure-track position to a tenured one between 2012 and 2015 across the life, physical, and social sciences. To do so, we rely on a unique longitudinal dataset maintained by the Academic Analytics Research Center (AARC) (2). We primarily focus on four interconnected aspects of scientific careers: (i) research productivity, as measured by the number of publications; (ii) scientific impact and novelty, as a proxy for risk-taking behavior; (iii) heterogeneity and variance in scientific production; and (iv) topic exploration. Our results show that tenure shapes the average research productivity trajectory, irrespective of career age. However, norms of collaboration in different fields matter considerably. Indeed, we can distinguish two clear patterns.
Contrary to previous empirical evidence based on specific fields, we find that the so-called “canonical” trajectory (i.e., rapid increase and decline) only characterizes faculty working in fields dominated by small teams (3). Conversely, in fields generally organized through labs, faculty do not experience a decline in productivity after achieving tenure but rather a stabilization (see Figure 1). We also find that not all faculty members respond to tenure in the same way, and the share of faculty members who increase or decrease their productivity level after tenure also depends on collaboration norms (4). Further, after tenure, faculty produce relatively fewer hit papers (i.e., papers in the top 5% of most cited papers in a given year and subfield) but tend to publish a larger share of novel articles (i.e., papers with atypicality score lower than zero), confirming the hypothesis that academic freedom incentivizes scientists to explore unconventional ideas (5). Tenure also affects the variance of scientific production across faculty members. The relative level of inequality within fields, measured via the Gini coefficient, decreases up to the year before tenure and then sharply increases when faculty do not face binding productivity constraints. Regarding topic exploration, we apply a community detection algorithm to the co-citing network of each faculty member’s publication list to classify the topic of each paper according to the community it belongs to (6). Faculty members re-organize their research portfolio after tenure: around 50% start a new agenda and abandon some topics explored early in their careers. However, the large majority keep working on some of the topics they focused on before tenure. 
Finally, a fair share of faculty further diversify their research portfolios after promotion (particularly in fields with lab-based collaboration norms); productivity increases or remains constant for those who do, while it decreases for those who do not update their research agendas. Our results on the role of tenure in shaping scientists’ productivity, impact, and research agendas carry explicit and direct policy implications that can enhance the ability of the institutions governing academic research (including university committees and funding agencies) to encourage greater innovation and novel scientific directions.
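For concreteness, the within-field inequality measure mentioned in this abstract, the Gini coefficient over faculty productivity, can be computed from the textbook mean-absolute-difference formula; this is a generic sketch, not the authors' code, and the example counts are hypothetical.

```python
def gini(xs):
    """Gini coefficient G = sum_ij |x_i - x_j| / (2 * n^2 * mean(x)).
    xs: non-negative productivity counts (e.g., publications per faculty)."""
    n = len(xs)
    mean = sum(xs) / n
    if mean == 0:
        return 0.0  # no output at all: treat as perfect equality
    mad = sum(abs(a - b) for a in xs for b in xs)
    return mad / (2 * n * n * mean)

# Perfectly equal output vs. one faculty member producing everything:
equal = gini([5, 5, 5, 5])     # -> 0.0
unequal = gini([0, 0, 0, 20])  # -> 0.75
```

A Gini of 0 means every faculty member publishes equally; values approaching 1 mean output is concentrated in a few people, which is the pattern the abstract reports emerging after tenure.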

13:30-16:00 Session F: Publishing and Research Dissemination
Chair:
Yifang Wang (Northwestern University, United States)
Location: Main Auditorium
13:30
Yotam Sofer (Department of Strategy and Innovation, Copenhagen Business School, Denmark)
Exploring the Role of Social Media in the Diffusion of Economic Research

ABSTRACT. For more than a decade, social media have been a key channel for knowledge dissemination used by scientists in general, and economists in particular. However, their role in the diffusion of knowledge is understudied. This article investigates the effect of the social media visibility of working papers on diffusion outcomes. While previous studies focused on the diffusion of STEM research, this article explores the diffusion of economic research. To do so, a data set of all NBER working papers published between 2015-2018, covering their social media mentions as well as bibliometric and altmetric indicators, is used. To estimate the causal effect of social media visibility on diffusion, an instrumental variable approach is employed, leveraging quasi-random variation in the social media posting policy of the NBER’s communication office. The results indicate heterogeneity in the role social media play in the diffusion of economic research. Increased social media visibility of working papers positively affects the likelihood and the extent to which research diffuses into public discourse (measured by blog and news mentions) within the first year after publication, as well as within the scientific community (measured by academic citations) four years post-publication. No effect on citations in policy documents was found. Lastly, the likelihood of publishing a working paper in a peer-reviewed journal is found to be unrelated to the working paper's social media visibility. The results of this article provide evidence for the role social media play in the diffusion of economic knowledge.

13:45
Salsabil Arabi (University of Wisconsin-Madison, United States)
Chaoqun Ni (University of Wisconsin-Madison, United States)
B. Ian Hutchins (University of Wisconsin-Madison, United States)
You do not receive enough recognition for your influential science

ABSTRACT. During career advancement and funding decisions in biomedicine, reviewers have traditionally relied on journal-level measures of scientific influence, like the impact factor, in profile evaluation. Prestigious journals are believed to pursue a reputation of exclusivity by rejecting a substantial number of submissions, many of which may be of high quality. This practice may inadvertently create a system where some influential articles are prospectively published in prestigious journals, while most impactful works are overlooked. We therefore measure the degree to which journal prestige hierarchies capture or overlook influential science. We quantify the fraction of scientists’ articles that would receive recognition because they are published in journals above a chosen impact factor threshold or are at least as well-cited as articles appearing in such journals. Our analysis suggests that the number of papers cited at least as well as those appearing in high-impact-factor journals vastly exceeds the number of papers published in such venues, and this trend persists across the gender, race, and career stage of individual scientists. We also find that approximately half of researchers never publish in a venue with an impact factor above 15, potentially excluding them from consideration for opportunities when evaluations heavily prioritize journal prestige.
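The comparison this abstract describes, papers published above an impact-factor threshold versus papers cited at least as well as articles in such venues, can be sketched as follows. The flat citation benchmark is a stand-in parameter of mine; the study's actual matching of citations to journal-level citation distributions is not described here.

```python
def recognition_counts(papers, if_threshold=15, citation_benchmark=100):
    """papers: list of (journal_impact_factor, citation_count) pairs.
    Returns (published_in_prestige, cited_at_least_as_well), the two
    quantities the abstract compares. citation_benchmark approximates the
    typical citation level of articles in journals above the threshold
    (an assumed simplification of the actual matching procedure)."""
    in_prestige = sum(1 for jif, _ in papers if jif >= if_threshold)
    cited_as_well = sum(1 for _, cites in papers if cites >= citation_benchmark)
    return in_prestige, cited_as_well

# A hypothetical researcher profile: two prestige-venue papers, but three
# papers cited at least as well as prestige-venue articles.
profile = [(30.0, 500), (3.2, 450), (5.1, 20), (16.0, 120)]
counts = recognition_counts(profile)  # -> (2, 3)
```

The abstract's claim is that, aggregated over many researchers, the second count vastly exceeds the first.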

14:00
Miftahul Jannat Mokarrama (Northern Illinois University, United States)
Hamed Alhoori (Northern Illinois University, United States)
Are Research Articles Cited in Relevant Policy Documents? A Quantitative Analysis Using Large Language Models (LLMs)

ABSTRACT. In recent years, there has been growing concern about, and emphasis on, conducting research that reaches beyond academic or scientific research communities and benefits society at large. A well-known approach to measuring the societal impact of research is enumerating its policy citations. Policies generally direct decision-makers to implement rules and regulations or adopt laws that affect people. Therefore, investigating how research evidence is used in policy documents is critical for understanding the real-world implications and applications of academic works. Despite the importance of research in informing policy, there is no concrete evidence to suggest that all research cited in policy documents is relevant. This is concerning because irrelevant research citations in policy documents can undermine the trustworthiness of policymakers and erode their public acceptance. Therefore, it is crucial to identify the degree of relevance between research articles and the policy documents that cite them. In this paper, we examined the textual relevance of youth-focused research to the US policy documents that reference it, using natural language processing techniques, state-of-the-art pre-trained Large Language Models (LLMs), and statistical analysis. From our experiments and analysis, we conclude that research articles that receive US policy citations are likely to be relevant to the citing policy documents.

14:15
Fengyuan Liu (New York University Abu Dhabi, UAE)
Bedoor Alshebli (New York University Abu Dhabi, UAE)
Talal Rahwan (New York University Abu Dhabi, UAE)
Editors handle the submissions of their collaborators and colleagues despite explicit policies
PRESENTER: Fengyuan Liu

ABSTRACT. Editors are crucial to the integrity of the scientific publishing process, yet they themselves may face conflicts of interest (COIs). According to policies adopted by various publishers, editors have a potential COI when they handle submissions from their colleagues or recent collaborators. When handling such submissions, editors could jeopardize the integrity of editorial decisions if they treat the submissions favourably, consciously or otherwise. Naturally, a number of policies have been put in place to govern such COIs, but their effectiveness remains unknown. We fill this gap by analyzing half a million papers handled by 60,000 different editors and published in 500 journals by six publishers, namely Frontiers, Hindawi, IEEE, MDPI, PLOS, and PNAS. We find numerous papers handled by editors who collaborated recently with the authors; this happens despite policies explicitly prohibiting such behavior. Overall, nearly 6% of journals have a COI rate >10%, and over half of them have a COI rate >2%. Moreover, leveraging three quasi-experiments, we find that COI policies have a limited effect, if any, on regulating this phenomenon. Finally, we reveal a suitability-integrity trade-off in managing editorial COIs by showing that 30% of papers with a COI would have been handled by a less-suitable editor were COI policies to be enforced. These findings highlight the need for policy reform to assure the scientific community that all submissions are treated equally.
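A COI screen of the kind this abstract describes, flagging editors who recently co-authored with a submitting author, might be sketched as follows. The five-year window and the data layout are illustrative assumptions, since publishers define "recent collaborator" differently.

```python
from datetime import date

COI_WINDOW_YEARS = 5  # assumed "recent collaboration" window; varies by publisher

def has_coi(editor, authors, last_coauthorship, submitted):
    """last_coauthorship: dict mapping frozenset({x, y}) to the date of the
    pair's most recent co-authored paper. Flags a COI when the handling
    editor co-authored with any listed author within the window."""
    for author in authors:
        last = last_coauthorship.get(frozenset({editor, author}))
        if last is not None and (submitted - last).days <= COI_WINDOW_YEARS * 365:
            return True
    return False

# Hypothetical record: editor "ed1" last co-authored with author "au1" in 2020.
collabs = {frozenset({"ed1", "au1"}): date(2020, 1, 1)}
```

For example, a 2022 submission co-authored by "au1" would be flagged, while the same pairing in 2026 would fall outside the assumed window.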

14:30
Emily Escamilla (Old Dominion University, United States)
Vicky Rampin (New York University, United States)
Jian Wu (Old Dominion University, United States)
Martin Klein (Los Alamos National Laboratory, United States)
Michele Weigle (Old Dominion University, United States)
Michael Nelson (Old Dominion University, United States)
Toward Long-term Computational Reproducibility: Assessing the Archival Rate of URLs to Git Hosting Platforms in Scholarly Publications

ABSTRACT. Reproducibility is a foundational principle of scientific research, and it is contingent on the availability of the original methodology, including software products and data. Recent studies indicate that a significant fraction of Uniform Resource Locators (URLs) linking to open-access datasets or software that claimed to be available are no longer accessible. This project studies current archival efforts to preserve software products referenced in scholarly publications, and reveals the technical and policy gaps that must be closed to better preserve software products with archival services, toward long-term computational reproducibility.

14:45
Yixuan Liu (Northeastern University, United States)
Rodrigo Dorantes-Gilardi (Northeastern university, United States)
Albert-László Barabási (Northeastern University, United States)
Publishing in Nature is a Catalyst for Scholarly Impact

ABSTRACT. Established in 1869, Nature is one of the flagship journals for outstanding scientific breakthroughs, often publishing highly cited articles whose impact can reach outside of academia. In order to publish in Nature, however, authors face a rigorous review characterized by a notably low acceptance rate, along with a publishing fee that may pose a financial barrier for many researchers globally. While previous research has explored the role of factors like research institutions, journals, and co-authors in shaping scientific careers, there is a lack of comprehensive analysis of the specific impact of publishing in prestigious venues like Nature. This study aims to bridge this gap by investigating the effects of Nature publications on researchers’ performance metrics, such as citation counts and productivity, while accounting for variations across career stages and gender. By leveraging a large-scale dataset and employing robust analytical methodologies, we seek to illuminate the intricate pathways through which academic influence and recognition are cultivated, offering insights into the evolving prestige of high-impact journals and their broader implications for the scientific ecosystem. We use Dimensions, a comprehensive dataset encompassing over 140 million linked publications from authors and research entities, to identify a cohort of 4,000 authors who have published physics papers in Nature and a matched control group of peers with equivalent pre-publication performance metrics. The analysis utilizes temporal data on publication and citation patterns for this cohort. Additionally, we determine author gender using methodologies validated in prior research. To assess the impact of Nature publications on researchers’ trajectories, we employ a heterogeneous difference-in-differences methodology, which accounts for group and temporal effects in analyzing academic outcomes post-publication.
This approach allows us to robustly evaluate the causal effect of publishing in Nature while ensuring the validity of the parallel trends assumption. The results reveal a notable upward trend in post-publication performance, with a 5-year Average Treatment Effect on the Treated (ATET) of 393 and a 10-year ATET of 1116. Remarkably, despite women comprising less than 14% of the sample, no significant gender-based disparities in citation or publication growth were detected (p-value = 0.938 with the Kolmogorov-Smirnov test). Notably, from 1990 onwards, we observe a marked increase in citation counts for Nature publications, exceeding 1000 citations by 2010, suggesting an ascending trajectory of the journal's prestige. Furthermore, distinct responses emerge across career stages. Mid-career researchers (11-25 years after first publication) exhibit the highest growth in scholarly output, while late-career researchers (25 years or more after first publication) benefit from the most significant uplift in citation metrics. This pattern suggests that while mid-career authors are incentivized toward increased productivity following a Nature publication, late-career researchers experience a more substantial impact on their citation counts, likely due to the cumulative effect of their prior work. Our study underscores the complex dynamics of scientific prestige mediated through high-impact journals like Nature, offering insights into the multifaceted factors shaping career development within the scientific ecosystem.

15:00
Haohan Shi (Northwestern University, United States)
Matthew Vaneseltine (University of Michigan, United States)
Expertise Mismatch and Academic Retraction: An Embedding-Based Analysis

ABSTRACT. This study investigates the impact of team members' prior knowledge on the incidence of paper retractions. It examines two hypotheses: 1) whether familiarity with the paper's topic is lower among members of teams producing retracted papers compared to those with unretracted papers, and 2) whether the level of topic familiarity varies based on the reason for retraction. Analyzing 9,778 retracted records and using a propensity-matched sample of 29,334 unretracted papers, the study utilizes author knowledge banks—computed as the average word embedding of each author's previous publication titles—and measures topic familiarity as the cosine similarity between this knowledge bank and the focal paper's topic. Findings show a significant disparity in topic familiarity between authors of retracted and unretracted papers, indicating that subject knowledge may reduce retraction risk. The study also differentiates familiarity levels in retractions due to fraud versus error, emphasizing how retraction reasons may reflect specific deficiencies in team expertise.
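The knowledge-bank construction described above can be sketched in a few lines; a minimal toy example, in which the three-dimensional embedding vectors are hypothetical placeholders rather than embeddings learned from real publication titles:

```python
import numpy as np

# Hypothetical toy embeddings: in the study these would be word embeddings
# of an author's previous publication titles.
prior_title_embeddings = np.array([
    [0.9, 0.1, 0.0],   # title 1
    [0.8, 0.2, 0.1],   # title 2
    [0.7, 0.0, 0.2],   # title 3
])

# Author "knowledge bank": the average embedding of the prior titles.
knowledge_bank = prior_title_embeddings.mean(axis=0)

# Embedding of the focal paper's topic (hypothetical values).
focal_topic = np.array([0.85, 0.1, 0.1])

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Topic familiarity: similarity between the knowledge bank and the focal topic.
familiarity = cosine_similarity(knowledge_bank, focal_topic)
```

Here a familiarity near 1 would indicate an author working close to their prior expertise; the study compares this quantity between retracted and unretracted teams.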

15:15
Er-Te Zheng (Renmin University of China; University of Sheffield, China)
Hui-Zhen Fu (Zhejiang University, China)
Zhichao Fang (Renmin University of China; Leiden University, China)
Can social media attention promote academic purification for retracted articles?
PRESENTER: Er-Te Zheng

ABSTRACT. Theoretical contribution Currently, research integrity stands as a paramount concern within the scientific community. This study reveals the potential of social media in upholding research integrity and facilitating the academic purification of retracted articles. Academic purification refers to the process of mitigating the negative impact of misinformation inherent in retracted articles on both academia and society (Zheng et al., 2024). In this study, it specifically refers to the influence on the retraction speed and the subsequent decline in citations after the retraction of articles. There are significant variations in the effectiveness of academic purification across different social media platforms. Platforms such as Twitter (currently known as X) and blogs emerge as instrumental tools in identifying flawed articles, thereby increasing the retraction speed; whereas the influence of news reports can be used to enhance public awareness of the retraction status of articles, subsequently diminishing post-retraction citations. Overall, this study indicates that social media not only extend the reach of scholarly articles but also serve as a surveillance mechanism to expose problems in research, curtail the spread of flawed articles, and promote the formation of a more robust academic ecosystem.

Data and method The study collected data from the Web of Science (WoS) and Retraction Watch databases, obtaining data on 6,123 retracted articles published between 2012 and 2021 along with their corresponding retraction notices. Data on the frequency of mentions in news outlets, blogs, and Twitter before and after retraction for all retracted articles were collected through Altmetric.com. Specifically, the dataset comprised 4,329 news mentions, 1,449 blog mentions, and 101,896 Twitter mentions before retraction, and 2,677 news mentions, 2,370 blog mentions, and 45,791 Twitter mentions after retraction. This study explores two aspects of academic purification. The first is the retraction speed of flawed articles, quantified by the retraction time lag (i.e., the duration between publication and retraction of the article). A shorter retraction time lag indicates enhanced effectiveness in academic purification. The second aspect is the change in citations subsequent to the retraction of the articles, measured by the change in average annual citations (i.e., the difference between the average annual citations post-retraction and those pre-retraction). A greater decline in citations indicates better effectiveness in academic purification. The study employs matching and Difference-in-Differences (DiD) methods from causal inference. As for retraction speed, coarsened exact matching (CEM) was conducted between two groups of retracted articles with high and low social media attention, in order to determine the influence of social media attention on retraction time lag. In the study, retracted articles in the top 20% of social media mentions are classified as high attention, while those in the bottom 20% or receiving no mentions are classified as low attention.
Control variables encompass disciplinary field, open access status, publication year, journal impact factor, and the number of authors, facilitating one-to-one matching between articles with high and low social media attention. Likewise, for the study on changes in citations after retraction, CEM was conducted between groups with high and low attention, integrated with the DiD method. This approach endeavors to explore whether social media attention affects the changes in citation after retraction.
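In its simplest two-group, two-period form, the DiD logic described above reduces to comparing pre/post changes across matched groups; a minimal sketch with made-up citation numbers (illustrative, not the study's data):

```python
# Average annual citations before and after retraction for two matched
# groups of retracted articles (hypothetical values).
high_attention = {"pre": 10.0, "post": 3.0}   # treated: high social media attention
low_attention  = {"pre": 10.0, "post": 6.0}   # control: low social media attention

change_treated = high_attention["post"] - high_attention["pre"]  # -7.0
change_control = low_attention["post"] - low_attention["pre"]    # -4.0

# DiD estimate: extra citation decline attributable to high attention.
did = change_treated - change_control  # -3.0
```

A negative estimate here would correspond to the paper's finding that high-attention articles lose more citations after retraction than their matched low-attention counterparts.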

Findings Figure 1a shows the comparative findings regarding differences in retraction time lag between treatment and control groups. Articles with high attention across blogs, news, and Twitter exhibit shorter average retraction time lags compared to those receiving low attention. Specifically, articles with high blog attention have a retraction time lag 121.44 days shorter than their low-attention counterparts (p=0.113); articles with high news attention have a retraction time lag 181.73 days shorter than those with low attention (p=0.126); articles with high Twitter attention have a retraction time lag 89.80 days shorter than those with low attention (p=0.053). The study thus finds that articles accruing greater social media attention prior to retraction tend to have shorter retraction time lags. Additionally, observations suggest that comments on articles with higher attention are more likely to engage with controversial research results rather than merely summarizing results. Therefore, articles with higher social media attention are more likely to reveal potential issues within the articles, thereby promoting the retraction of flawed articles. Figure 1b illustrates that only articles receiving high attention in news show a significantly greater decrease in citations after retraction compared to those with low attention. Specifically, there is an additional average annual citation decrease of 4.82 (p=0.004). Articles with high attention in blogs (p=0.269) and Twitter (p=0.372), however, do not show a significant difference in the decrease of average annual citations. Therefore, high news attention plays the most significant role in diminishing the citations of retracted articles, underscoring its superior effectiveness in academic purification.

References Zheng, E.-T., Fang, Z., & Fu, H.-Z. (2024). Is gold open access helpful for academic purification? A causal inference analysis based on retracted articles in biochemistry. Information Processing & Management, 61(3), 103640.

15:30
Jina Lee (University of Illinois Urbana-Champaign, United States)
Are Women’s Works and Claims Received with More Uncertainty?

ABSTRACT. Understanding how scientists view new discoveries as facts or mere possibilities is critical, especially during crises like the recent pandemic, as it shapes our actions and policies. Yet, our perception of scientific ‘facts’ is often colored by social contestations, suggesting that social biases might influence our understanding of new discoveries. Using computational text analysis and data from the Microsoft Academic Graph (MAG), I measure uncertainty cues in in-text citations of research relevant to the SARS-CoV-2 virus. The study finds that research papers with female last authors are often viewed with more uncertainty than those by male last authors. Interestingly, papers that claim that they are novel face less uncertainty, but this pattern holds only for male-authored papers. My findings highlight the need for equitable evaluation in scientific research, given its profound implications for policy and intervention strategies during crises.

15:45
Mathias Nielsen (University of Copenhagen, Denmark)
Claudia Acciai (University of Copenhagen, Denmark)
Natalie Schroyens (University of Copenhagen, Denmark)
Merhout (University of Copenhagen, Denmark)
The Role of Scientists in Correcting Misinformation: Political Framing and Source Characteristics

ABSTRACT. The spread of misinformation on topics like climate change underscores the need for scientific experts to correct public misconceptions [1]. Trust in scientists and their messages is vital for science's role in shaping public opinion and tackling societal challenges. However, the growing politicization of science poses barriers to effective science communication [2-4], highlighting the need for evidence on how scientists can effectively counter misinformation [5].

15:45-16:00 Coffee Break
16:00-16:30 Session D2: Technology and Social Sciences
Location: Room 120
16:00
Youwei He (Seoul National University, South Korea)
Jeong-Dong Lee (Seoul National University, South Korea)
Innovation Beyond Intention: The Role of Exaptation in Technological Advancements
PRESENTER: Youwei He

ABSTRACT. The frameworks that explore scientific and technological evolution suggest that discoveries and inventions are intrinsic processes, while the wealth of knowledge accumulated over time enables researchers to make further advancements, echoing Newton’s sentiment of ‘standing on the shoulders of giants’. However, Park et al. (2023) challenge this notion, revealing that despite the exponential growth in new scientific and technical knowledge, there is a concerning decline in the disruptiveness of papers and patents, as measured by the CD index proposed by Funk et al. (2017). Exaptation, borrowed from biological evolution, is now recognized as a pivotal yet often neglected mechanism in technological evolution. Significant technologies often do not emerge out of thin air, but rather result from applying existing technologies in other domains. For instance, bird feathers initially served for waterproofing and insulation before enabling flight, and microwave ovens originated from radar magnetrons. Exaptation signifies a cross-field evolutionary process, driven by the functional shift of pre-existing knowledge, technology, or artifacts. We found that exaptation helps increase the disruptiveness of innovations. Over time, average disruptiveness decreases, but efforts at exaptation increase.

16:15
Kyle Demes (OurResearch, United States)
Jason Priem (OurResearch, United States)
Jason Portenoy (OurResearch, United States)
The Past, Present, and Future of OpenAlex: The World’s First Completely Open SKG

ABSTRACT. Scientific Knowledge Graphs (SKGs) have become critical infrastructure in the global research ecosystem—they help researchers find the right research, facilitate large-scale analyses on the science of science, and are used by institutions, funders, and governments to plan, design, and evaluate their research strategies. Historically, access to these SKGs has been available only through costly subscriptions, and even paid subscribers are unable to access the full SKG or share the data underlying their analyses. OpenAlex (1), a comprehensive and completely open SKG, is disrupting this model by democratizing access to information on the global research ecosystem.

16:00-16:30 Session E2: Science and Innovation
16:00
Thomas Gebhart (University of Minnesota, United States)
Russell Funk (University of Minnesota, United States)
Innovative Science Eludes Recommendation
PRESENTER: Thomas Gebhart

ABSTRACT. The recent explosion in text processing capabilities of modern machine learning models has completely upended the manner in which information in text corpora is extracted and re-encoded for use in downstream analyses. From early approaches like Doc2Vec [5] to more recent large language models like BERT [4] or the GPT variants [2], all of these text processing models may be characterized as functions that map text documents into embeddings: vector-valued representations that reflect the statistical properties of the documents with respect to an underlying corpus and learning task.

16:15
Jonathan Coopersmith (Texas A&M University, United States)
Rebecca Slayton (Cornell University, United States)
Arthur Daemmrich (Arizona State University, United States)
Using the past to improve the future: An applied history framework for fighting failure

ABSTRACT. Historical analysis can improve the prospects of successful innovation, but the methods of applied history are not as simple as many assume. Because historical analysis is ineluctable—assumptions about the past always guide decisions about the future—the next generation of innovators needs a systematic means of evaluating historical reasoning. This paper provides a framework to guide innovation decision-makers and policymakers in historical analysis. We use applied history to develop a typology of failure that applies widely across science and technology engaged in translation to markets. We then provide a framework for using historical methods to improve understanding of these types of failure, and thereby reduce the likelihood of their recurrence.

16:00-16:30 Session F2: Publishing and Research Dissemination
Location: Main Auditorium
16:00
Saqib Mumtaz (Haas School of Business, University of California, Berkeley, United States)
Lost in Translation? Science Communication & the Commercial Diffusion of Ideas

ABSTRACT. The use and exploitation of scientific knowledge is a key driver of firm performance and innovation. Even though the scientific knowledge presented in scientific publications is accessible to inventors, it is not necessarily digestible as scientific papers use specialized language and often fail to explain the broader implications and potential applications of the research, limiting the knowledge's use. Science communication through mass media can help science break out of its academic silos by bringing attention to new scientific discoveries and providing simpler descriptions of key findings in a non-technical language. The existing literature has largely overlooked the role of science communication and mass media in the dissemination of scientific knowledge. In this paper, I study the impact of media coverage on the diffusion of scientific knowledge to industry and find that media coverage leads to a broader diffusion of science. Notably, papers that use more complex technical language derive greater benefit from media exposure, underscoring the important role of science communication. Additionally, the results show that the effect is higher for firms that hire scientists and in sectors where the firm's expertise is limited.

16:15
Kyle Siler (University of Toronto, Canada)
The Diverging Fates of Breakaway and Zombie Journals
16:30-18:00 Session P2: Poster Session
Location: Great Hall
Sergio Pelaez (Georgia Tech, United States)
Diana Hicks (Georgia Tech, United States)
Value Expressions in Patent Documents: A New Method and Its Applications to Patent Valuation and Direction
PRESENTER: Sergio Pelaez

ABSTRACT. This research examines the emergence of "value expressions" in patent documents and their relationships with patent value and direction. Value expressions are statements that describe the societal or commercial value of an invention. They can take the form of public value expressions (PVEs), which articulate societal benefits, or private value expressions (PRIVEs), which express commercial advantages. The increasing presence of value expressions in patents may reflect shifting norms in science, technology, and innovation (STI) towards demonstrating broader societal impacts (e.g., Gibbons et al., 1994; Von Schomberg, 2013; Stilgoe et al., 2013; Hicks, 2016; Ribeiro et al., 2018). Patent documents provide a novel setting to study the written articulation of these societal implications through PVEs. Although prior qualitative research has substantiated their presence (Ribeiro and Shapira, 2020), these studies have been constrained by their scale. Addressing this limitation has been identified as a priority by the STI research community (Castaldi et al., 2024). In a recent paper, we pioneered a large-scale computational analysis of PVEs in patents using recent advances in artificial intelligence (AI) and machine learning (ML). A new method was employed that uses generative language models (GLMs) to efficiently produce labeled datasets, which were then used to train classifiers that can identify and extract value expressions across millions of patent sentences. Variables capturing the occurrence and density of PVEs were constructed at the patent level (Pelaez et al., 2024). However, there are unanswered questions regarding the function, purpose, and influence of PVEs in patents that we seek to untangle in the present study. 
Our preliminary findings show that patents with at least one PVE, as well as those with a greater proportion of text allocated to PVEs, were associated with higher patent value, measured by the number of claims, citations, technology classes, and family size. An explanation is proposed that PVEs serve as a dual-purpose mechanism: they enable inventors to align with current STI norms emphasizing societal impact, and they act as strategic tools for patent attorneys. PVEs are used to expand the scope of patents and keep the language vague, protecting applicants’ rights by avoiding admissions of prior art or a narrowing of the patent's scope. Similar textual strategies in patents have been discussed in prior works (Myers, 1995; Arinas and Sancho, 2010; Arinas, 2012; Burk and Reyman, 2014), but this is the first time PVEs are shown to support such strategies. Semi-structured interviews with inventors, attorneys, and examiners are being conducted to further explore the functions and implications of PVEs. The study also seeks to examine whether PVEs are associated with patents aimed at solving societal challenges, contrasting with the relationship between PRIVEs and commercially oriented patents. This analysis sheds light on the longstanding debate in STI studies over whether normative statements translate into actual changes in the direction of innovation (e.g., Fisher, 2005; Holbrook, 2005; Holbrook and Frodeman, 2011; Bernstein et al., 2021; Schiff et al., 2020). Overall, this research contributes a pioneering computational study of value expressions in patents, provides new insights into underexplored dimensions of patent value indicators, proposes underlying mechanisms linking value expressions to patent drafting strategies, and advances debates on norms versus the realization of societal impacts in innovation.

Jin Ai (Indiana University, United States)
Richard Steinberg (Indiana University, United States)
Chao Guo (University of Pennsylvania, United States)
Filipi N. Silva (Indiana University, United States)
Network Approach to Research Novelty: Typology and Metrics

ABSTRACT. Novel research inherently faces skepticism from scholars and institutions seeking to maintain established conventions [1]. Such works are at further risk under current evaluation metrics. These include the frequency approach of measuring atypical/new combinations [2], the categorized approach of measuring the types of novel strategies [3], and the impact approach of measuring disruptiveness [4]. While promising, these efforts are still limited as they focus either on citation structure (i.e., the combinatorial measures) or on the impact of changing the citation network (i.e., the disruptive measures). It remains unknown how the citation structure present when novelty occurs links to the different impacts this structure might have on shaping subsequent citation networks. These issues become particularly challenging in emerging interdisciplinary fields, where the backbone knowledge structure is not yet clearly defined, and the legitimacy of the field is still being established.

Rod Abhari (Northwestern University, United States)
Emőke-Ágnes Horvát (Northwestern University, United States)
When Retractions Fail: Conspiracy and Denial in Online Attention to Retractions

ABSTRACT. 2023 was a record-breaking year for scientific retractions. Over 10,000 academic articles were retracted, nearly tripling the number of retractions issued in 2022 (McKie, 2024). A retraction, according to the Committee on Publication Ethics, is a formal statement that completely repudiates the scientific credibility of a research article (Barbour et al., 2009). While the prevalence of retractions may seem high, it likely represents only a fraction of the total number of fraudulent, manipulated, or otherwise illegitimate articles that are deserving of retraction (Marcus & Oransky, 2017). Thus, some scientists have argued that an increase in retractions actually indicates that the system of science is performing as it should (Hilgard & Jamieson, 2017). But while retractions may not indicate a breakdown of the scientific method, they are a threat to the public image of science in the context of politicized scientific distrust. Ongoing attacks on academic institutions by conservative interests have raised questions about both the competence of key scientists and the incentive structures through which science is produced (Oreskes & Conway, 2010). In the context of politicization, individual cases of retractions are increasingly likely to be interpreted as an institutional crisis within science rather than the isolated failings of individual scientists or journals (Hilgard & Jamieson, 2017). Indeed, science skeptics, including scientists whose articles were retracted, have claimed that the retraction ‘crisis’ is actually a series of attempts from corrupt science publishers to censor politically dangerous opinions (Savolainen, 2023; McCullough, 2023). While previous research has quantified the prevalence of retraction shares on social media (e.g., Peng et al., 2022; Serghiou et al., 2021), these works did not analyze the rhetorical context in which the retracted articles were mentioned.
As a result, it remains unclear why retracted articles receive social media attention. This, we argue, is an essential matter for determining the public impact of a retraction. If a retraction-sharing post merely repeats the findings without mentioning the retraction, or if it frames the retraction as proof of a conspiratorial plot, then the corrective function of a retraction is directly undermined. To this end, the proposed talk will present our research evaluating the extent to which retractions are effective at drawing public attention to the faults of retracted articles (Abhari & Horvat, 2024). Materials and Methods With data sharing agreements from Altmetric and Retraction Watch, we collected a database of roughly 3 million social media posts that link to 13,376 retracted articles. Using the text found in the dataset, we performed a content analysis of the social media mentions in the corpus. To do so, our research team trained a GPT-4 classifier to determine whether a given mention of a retracted article either 1) avoided the retraction (retraction avoidance), or 2) ascribed cynical motivations for the retraction (retraction cynicism). The model achieved a high performance for measuring both types of mentions, with an F1 classification score of 0.86 for retraction avoidant mentions and 0.81 for retraction cynical mentions. Our research questions and hypotheses are as follows: RQ1: Do social media platforms significantly differ in how their users discuss retractions? H1a: Twitter/X contains significantly more retraction avoidant mentions than other platforms. H1b: Twitter/X contains significantly more retraction cynical mentions than other platforms. RQ2: Do articles significantly differ by subject in how their retractions are discussed? H2a: Politicized research receives significantly more retraction avoidant mentions. H2b: Politicized research receives significantly more retraction cynical mentions.
Significance By analyzing the prevalence of problematic retraction mentions, the proposed talk will elaborate on the extent to which retractions are effective at drawing public attention to the faults of retracted articles. It will also discuss how public discussion of retractions contributes to science misinformation, including conspiracy theories, that directly undermine the core practices of science.
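The F1 scores reported for the GPT-4 classifier combine precision and recall; a minimal sketch of the metric with hypothetical confusion counts (the counts are illustrative, not the study's actual validation data):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of true positives that are found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for a binary "retraction avoidant" classifier:
# 86 true positives, 14 false positives, 14 false negatives -> F1 = 0.86.
f1 = f1_score(tp=86, fp=14, fn=14)
```

An F1 in this range means the classifier is simultaneously missing few avoidant mentions and mislabeling few non-avoidant ones.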

Caroline Wagner (The Ohio State University John Glenn College of Public Affairs, United States)
Travis Whetsell (Georgia Tech, United States)
Can We Develop a Policy-Relevant Measure of National Scientific Capacity?

ABSTRACT. No standard measure exists for assessing a nation’s capacity to conduct scientific research. Many proxies attempt to capture parts of it through indicators such as “research and development spending as a percent of gross domestic product,” but these are inadequate for policy analysis purposes. Organizations report indicators to help compare nations, but few nations have trusted data. A theory of measurement of scientific capacity and a standard set of measures would help to compare nations, and we take initial steps in this direction. We propose measures within a framework, index the measures, and create comparative measures for 172 countries. We place the index into the hierarchy of a complex system to better explicate measures for policy feedback.

Xiang Zheng (University of Wisconsin-Madison, United States)
Qiyao Yang (University of Wisconsin-Madison, United States)
Jai Potnuri (University of Wisconsin-Madison, United States)
Chaoqun Ni (University of Wisconsin-Madison, United States)
Ian Hutchins (University of Wisconsin-Madison, United States)
Analysis of the comparative strengths of intramural and extramural grant funding mechanisms

ABSTRACT. Science funders utilize a variety of funding mechanisms to advance scientific discovery, and the comparative strengths of these approaches are frequently debated. One prominent example is the contrast between extramurally funded research, where grants are awarded to external institutions, and intramurally funded research, where scientists are directly hired by funding agencies. Each mechanism is backed by theoretical justifications. In this context, we quantify the comparative strengths of the National Institutes of Health’s extramural and intramural mechanisms. When adjusted for investment, extramural research excels at producing scholarly outputs such as publications and citations, which are standard metrics in academic assessment. In contrast, intramural research, whether basic or applied, stands out for producing research that influences subsequent clinical studies, aligning with its agency mission. These findings provide evidence that the institutional incentives associated with different funding mechanisms drive their comparative strengths.

Maalvika Bhat (PhD Student, Northwestern University, United States)
Agnes Horvat (Northwestern University, United States)
Daniel Romero (University of Michigan, United States)
Understanding Clickbait in Multiplatform Science Dissemination

ABSTRACT. This study aims to understand how sensationalist clickbait titles affect the sharing of and engagement with scientific articles across digital platforms. Scholars’ increased use of digital platforms for disseminating and promoting their work points to the need to better understand the role of sensationalist science titles (Ransohoff, 2018). Clickbait refers to a sensationalized or misleading headline designed to attract clicks on a piece of content. Publications that use this strategy encourage people to widely share headlines and images that are often misleading or decontextualized, fanning the flames of sensationalism and misinformation. This study specifically aims to quantify the impact of sensationalist clickbait on the dissemination and public engagement of scientific research across multiple digital platforms, addressing a critical gap in existing literature on science communication. Literature on this topic discusses two key challenges: First, the social networks that underpin today’s hyper-connected society enable extremely fast and wide information diffusion, increasing the scale of the problem (Doerr et al., 2011). Second, the ways in which information is altered, interpreted, or framed along its diffusion significantly impact what message is conveyed to the public (Entman, 1993). Despite the surge of interest in the broad area of information distortion, to the best of our knowledge, only one recent experimental study tackles multiple ways in which scientific content may be distorted as it propagates over the news, blogs, and social media (Ribeiro et al., 2019). Despite existing descriptions of how scientific content changes as it is shared online (Hwang et al., 2023), the specific dynamics of science clickbait have not been extensively explored within the realm of online science communication.
This area presents a unique challenge, as clickbait headlines have the potential to significantly distort scientific messages across various platforms. Our results will provide fundamental information and baseline models for future studies about scientific misinformation. Our study targets approximately 8 million authors who had a paper featured in our Altmetric dataset, focusing on papers that have been mentioned at least once within a 7-year timeframe. This study begins by identifying clickbait from a wide range of disciplines. Existing clickbait detection approaches typically engineer features based on linguistic markers in a supervised machine-learning framework. However, algorithms have not been developed for science clickbait and their performance suffers from a lack of sufficient relevant training data (Chakraborty et al., 2016). Our first attempts to label mentions of scientific articles as clickbait or not indicate that the task of creating training data is not easily crowdsourced. Furthermore, in addition to labeling clickbait, our study also aims to understand attitudes towards clickbait in one's own work versus in the broader science dissemination ecosystem, highlighting the nuanced perspectives and ethical considerations that AI, such as ChatGPT, might not fully grasp or evaluate. Hence, rather than crowdsourcing this task, we are using a labeling protocol that is based on expert assessments by scholars who label potential clickbait in their own area of expertise, ensuring a depth of understanding and accuracy in identifying clickbait within the academic landscape. Thus far, we have sent out almost 70,000 emails, receiving 750 responses, with over 500 being complete. Our survey findings indicate a significant concern over inaccurate coverage of scientific research, with ~93% of respondents affirming it as a problem. 
Furthermore, our analysis of clickbait mentions reveals that 38% are considered "sensationalist," while ~44% exploit the "curiosity gap" to attract attention. Despite the potentially misleading nature of these headlines, ~84% were deemed "probably not harmful" or "definitely not harmful." Additionally, ~79% of respondents believe that clickbait titles are an effective method for drawing attention to scientific research. The strong consensus among survey participants about the problem of inaccurate scientific coverage suggests a community prioritizing precision. Yet, a closer look reveals a different story regarding clickbait: while identified as "sensationalist" by many, it's largely seen as harmless, suggesting a possible appreciation for its value in communication. This implies that researchers might be strategically using clickbait to draw attention to their work, balancing public engagement with scientific accuracy. This reflects an active engagement with digital media, recognizing the dual role of clickbait in attracting viewers and posing challenges to scientific integrity. It indicates a wider discussion within the scientific community on managing the trade-offs between reach and reliability in the dissemination of their research. Following our examination of clickbait's categorization, we delve into its impact by measuring its spread and comparing it to the distribution of less sensationalist titles across platforms for the same studies. This enables us to understand clickbait's influence on information sharing. Furthermore, our analysis extends to user interactions with clickbait, focusing on specific reactions across various platforms. Finally, we examine whether scientific field and topic distribution vary between clickbait and non-clickbait content. For this comparison, we use information about research areas gathered from OpenAlex to understand whether some domains and topics are more susceptible to clickbait. 
Answers to these research questions will provide new knowledge about the extent of clickbait usage in science communication.

Hong Chen (School of Information, University of Michigan, Ann Arbor, United States)
Yi Bu (Department of Information Management, Peking University, China)
Lu Zhong (Department of Computer Science, Rensselaer Polytechnic Institute, United States)
Caifan Du (School of Information, University of Texas at Austin, United States)
Eric Meyer (School of Information, University of Texas at Austin, United States)
Ying Ding (School of Information, University of Texas at Austin, United States)
Jianxi Gao (Department of Computer Science, Rensselaer Polytechnic Institute, United States)
Science resilience hinges on persistent collaboration sacrificing equity
PRESENTER: Hong Chen

ABSTRACT. The widespread flows of knowledge specify a global, complex, and dynamic interconnectedness of science activities, marked by intensified collaboration and interaction across disciplinary and geographical boundaries. However, this interconnected system has proven vulnerable to substantial disruptions in recent years, such as conservative policies on international collaboration, changes in funding priorities, and constraints imposed by pandemics, which can all challenge the continuity of collaboration efforts among scientists. These disruptions highlight the importance of resilience in global scientific collaboration. Resilience is typically described as a networked system’s ability to withstand shocks and maintain functionality (Liu et al. 2022). In the context of science, it involves adaptability, the sustainability of research outcomes, and the capacity to continue collaboration despite stressful conditions.

Although there is extensive research on the resilience of natural and artificial complex systems, little of it focuses on resilience within the realm of human activities, such as the collaboration behavior of scientists, owing to their dynamic, complex and multifaceted nature. To understand resilience in science, we utilize Microsoft Academic Graph, a large-scale bibliographic dataset, to model the network dynamics of scientific collaboration. We construct a network for each discipline, with individual scientists as nodes and coauthored publications as edges. We then adopt a novel measurement approach that quantifies the multi-dimensional characteristics of resilience in a standardized way (Gao et al. 2016). It combines two components, density (s) and heterogeneity (H), to estimate network resilience succinctly as eff = s + H. Density refers to the network’s average weighted degree: a denser collaboration network indicates higher average productivity and more collaborative connections among scientists. Heterogeneity is measured by the normalized standard deviation of the edge weights and evaluates the degree to which collaborative output is concentrated among a limited number of nodes within the network. The topological structure of heterogeneous collaboration networks suggests a pattern of intense collaboration among a small set of influential scientists, along with more tenuous connections across the broader scientific community, which is also related to significant disparities in social capital among scientists.
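As a rough illustration (not taken from the abstract), the two components can be computed directly from a weighted edge list. The toy network below is invented, and heterogeneity is taken here as the edge-weight standard deviation normalized by the mean weight, one plausible reading of "normalized standard deviation":

```python
from statistics import mean, pstdev

# Toy weighted coauthorship network: edge (i, j) -> number of coauthored papers.
edges = {("A", "B"): 5, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 1}

# Density s: average weighted degree (total edge weight incident on each node,
# averaged over all nodes).
nodes = {n for e in edges for n in e}
wdeg = {n: sum(w for e, w in edges.items() if n in e) for n in nodes}
s = mean(wdeg.values())

# Heterogeneity H: standard deviation of edge weights normalized by the mean,
# so networks dominated by a few intense collaborations score higher.
weights = list(edges.values())
H = pstdev(weights) / mean(weights)

# Resilience estimate per the abstract's formulation.
eff = s + H
print(round(s, 2), round(H, 2), round(eff, 2))  # -> 4.0 0.87 4.87
```

Here the single heavy A-B tie dominates, so H is large relative to a network where all edges carry equal weight.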

As Figure 1 illustrates, “Hard” science disciplines, such as Chemistry and Computer Science, tend to exhibit greater heterogeneity and thus greater resilience, with most intensive collaborations centered around hub scientists. “Soft” science disciplines, such as History and Philosophy, largely focus on community-based collaboration and have relatively lower heterogeneity and resilience.

Our analysis also reveals a strong correlation between density and heterogeneity across almost all 19 major scientific disciplines, with both showing a consistent upward trend from 1985 to 2014. This three-decade rise in density and heterogeneity coincides with the expansion of most disciplines, with more active scientists and larger team sizes in publication. The growing collaborative activity strengthens connections within the scholarly community while also widening the disparities between elite scientists and less established ones.

Overall, we investigate the resilience of scientific collaboration by applying a new metric with macroscopic network parameters to coauthorship networks and examining the topological characteristics. This study offers a framework for comprehending the resilience of scientific collaboration with profound implications for guiding policies in response to natural disasters and geopolitical turmoil.

Ronen Tamari (Astera Institute, Common SenseMakers, United States)
Pepo Ospina (Astera Institute, Common SenseMakers, Spain)
Shahar Oriel (Astera Institute, Common SenseMakers, United States)
Wesley Finck (Astera Institute, Common SenseMakers, Canada)
Sensemaking Networks: Transforming Social Media into a Sensemaking Layer for Science
PRESENTER: Ronen Tamari

ABSTRACT. Our project, Sensemaking Networks, aims to address three interrelated limitations of current science publishing and communication processes: (1) poor reach and feedback, (2) knowledge fragmentation, and (3) rigid formats. To address these issues, Sensemaking Networks combines two promising trends in science publishing and communication: science social media and semantic nanopublishing.

Er-Te Zheng (Renmin University of China; University of Sheffield, China)
Hui-Zhen Fu (Zhejiang University, China)
Zhichao Fang (Renmin University of China; Leiden University, China)
Can ChatGPT predict article retraction based on Twitter mentions?

ABSTRACT. Theoretical contribution Detecting problematic research articles in a timely manner is a vital task. Conventional approaches for identifying problematic articles have focused primarily on text-based plagiarism and image manipulation. However, these methods have limitations in detecting other forms of misconduct, such as data falsification and authorship issues. This study explores whether ChatGPT can signal potential problems with retracted articles prior to retraction based on Twitter (currently known as X) mentions, thereby playing a role in predicting the future retraction of problematic articles. The results indicate that there are indeed retracted articles whose Twitter mentions contain recognizable evidence signaling problems before retraction, although they represent only a limited share of all retracted articles with Twitter mention data (approximately 16%). Using the manual labelling results as the baseline, ChatGPT demonstrates superior performance compared to other methods, implying its potential for assisting human judgment in predicting article retraction. This study uncovers both the potential and the limitations of social media events as an early warning system for article retraction, shedding light on a potential application of generative artificial intelligence in promoting research integrity.

Data and method The study collected data from the Web of Science (WoS) and Retraction Watch databases, obtaining 6,123 retracted articles published between 2012 and 2021. Twitter mentions of retracted articles were collected through Altmetric.com, yielding 3,628 retracted articles with at least one tweet (59.3%). There are 749,480 non-retracted articles that were published in the same issues of the same journals as the 3,628 retracted articles in our dataset; of these, 436,223 have accumulated at least one tweet. Figure 1 illustrates our research workflow. To examine whether ChatGPT can predict retraction based on Twitter mentions, this study considers tweets about both retracted and non-retracted articles. We adopt the Coarsened Exact Matching (CEM) method to match non-retracted articles with the selected retracted articles. After matching, a total of 7,010 articles were successfully matched, comprising 3,505 retracted articles and 3,505 non-retracted articles. We then filter the tweets and divide them into a training set and a test set to evaluate the effectiveness of Twitter mentions in predicting article retraction using four prediction methods: manual labelling, keyword identification, machine learning models, and ChatGPT.
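As a sketch of the matching step, a minimal version of CEM coarsens covariates into bins and keeps only strata containing both retracted (treated) and non-retracted (control) articles. The covariates, bins, and example articles below are invented for illustration and do not reflect the study's actual matching variables:

```python
from collections import defaultdict

def cem_match(units, coarsen):
    """Minimal coarsened exact matching: bin covariates with `coarsen`, then
    keep only strata that contain both treated and control units."""
    strata = defaultdict(list)
    for u in units:
        strata[coarsen(u)].append(u)
    matched = []
    for group in strata.values():
        treated = any(u["retracted"] for u in group)
        control = any(not u["retracted"] for u in group)
        if treated and control:
            matched.extend(group)
    return matched

# Invented example: match on publication year and a coarsened tweet count.
articles = [
    {"id": 1, "retracted": True,  "year": 2015, "tweets": 12},
    {"id": 2, "retracted": False, "year": 2015, "tweets": 15},
    {"id": 3, "retracted": True,  "year": 2018, "tweets": 210},
    {"id": 4, "retracted": False, "year": 2013, "tweets": 3},
]
coarsen = lambda u: (u["year"], u["tweets"] // 10)  # bin tweets by tens
print([u["id"] for u in cem_match(articles, coarsen)])  # -> [1, 2]
```

Articles 3 and 4 fall into strata with no counterpart and are discarded, which is how CEM trades sample size for covariate balance.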

Findings Manual labelling was initially conducted to address a fundamental question of the study: whether tweets per se can provide recognizable evidence indicating potential problems in articles, which may ultimately lead to article retraction. Based on the tweet texts of the 337 articles in the test set, 28 articles were labelled positive (i.e., retracted) because we identified critical messages in the tweets that may imply a risk of retraction. The remaining 309 articles were labelled negative (i.e., non-retracted) because there were no evident words expressing a critical attitude towards the tweeted articles. Since 158 articles in the test set had actually been retracted, the accuracy, precision, recall, and F1-Score of human prediction are 60.24%, 92.86%, 16.46%, and 0.28, respectively. These results suggest that, overall, only a limited portion of retracted articles had Twitter mentions conveying detectable concerns about potential problems prior to retraction (recall = 16.46%); the Twitter mentions of most retracted articles did not identify the problems that led to retraction. However, among the articles whose Twitter mentions were labelled as an indication of future retraction, the majority were indeed retracted after being highlighted on Twitter (precision = 92.86%). This suggests the potential of scholarly Twitter mentions in predicting article retraction. We compared the performance of different methods with human prediction as the reference, to determine the extent to which each method’s prediction aligns with human judgment. Table 1 presents the performance of the three main methods (i.e., keyword identification, machine learning, and ChatGPT). We calculated the indicators accuracy’, precision’, recall’, and F1-Score’ treating the human prediction results as the ground truth.
ChatGPT, particularly GPT-4, exhibits the strongest consistency with human prediction, as reflected by higher values of accuracy’ (=94.96%), precision’, and F1-Score’ compared to other methods.
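The reported human-prediction scores are mutually consistent: the stated precision implies that 26 of the 28 positively labelled articles were in fact retracted (26/28 ≈ 92.86%), and the other indicators follow. A few lines make the arithmetic explicit (the count of 26 true positives is inferred from the reported precision, not stated in the abstract):

```python
# Confusion matrix implied by the reported figures: 337 test-set articles,
# 158 actually retracted, 28 labelled positive by the human annotators.
# tp = 26 is inferred from the reported precision (26/28 ~ 92.86%).
total, actual_pos, labelled_pos, tp = 337, 158, 28, 26

fp = labelled_pos - tp      # labelled positive but not retracted
fn = actual_pos - tp        # retracted but labelled negative
tn = total - tp - fp - fn   # correctly labelled non-retracted

accuracy = (tp + tn) / total
precision = tp / labelled_pos
recall = tp / actual_pos
f1 = 2 * precision * recall / (precision + recall)

print(f"{accuracy:.2%} {precision:.2%} {recall:.2%} {f1:.2f}")
# -> 60.24% 92.86% 16.46% 0.28
```

The high precision alongside the low recall captures the abstract's point: tweets rarely flag a future retraction, but when they do, the flag is usually right.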

Table 1. Performance of different methods with human prediction as the reference

Method                   Accuracy’ (%)   Precision’ (%)   Recall’ (%)   F1-Score’
Keyword identification   59.64           14.94            82.14         0.25
Machine learning: LR     56.97           11.76            64.29         0.20
Machine learning: RF     63.50           12.00            53.57         0.20
Machine learning: NB     47.77           13.00            92.86         0.23
Machine learning: SVM    84.57           22.73            35.71         0.28
ChatGPT: GPT-3.5         87.83           37.25            67.86         0.48
ChatGPT: GPT-4           94.96           68.97            71.43         0.70

Overall, this study first (1) validated the feasibility of identifying problematic articles and predicting retraction from tweet texts through human labelling, and then (2) demonstrated the potential of ChatGPT to make such predictions using the human prediction results as the baseline. While ChatGPT may pose a threat to research integrity, this study reveals that it can also help to identify flawed articles and predict their retraction. Therefore, the key lies in how individuals utilize such AI tools.

Christopher Graziul (University of Chicago, United States)
Alia Smith (University of Chicago, United States)
Camille Cypher (University of Chicago, United States)
Emma Shippert (University of Chicago, United States)
Grace Hogue (University of Chicago, United States)
Kristene Loaiza (University of Chicago, United States)
Leticia Mercado (University of Chicago, United States)
Saba Johnson (University of Chicago, United States)
Margaret Beale Spencer (University of Chicago, United States)
The Praxis of Data Annotation and Ideational Search
PRESENTER: Grace Hogue

ABSTRACT. Realizing the potential for AI-driven science requires high-quality data annotation for model development, and this potential depends on realizing data annotation as a synthesis of theory and practice aligned with this goal. The same is true for model development and the related task of ideational search, where researchers explicitly reflect on ways to achieve a desired technical outcome based on metatheories of the task involved. We use data annotation tasks, and related ideational search for appropriate models, to study policing from a developmental psychology perspective as a case study on the benefits of integrating social theory into AI/ML-based research practice. We report that such praxis is (a) non-trivial given the domain, (b) necessarily collaborative, and (c) either adaptive or maladaptive for productive research. We encourage focused attention on the role of reflexivity for the co-constitutive meaning-making processes linked to the use of annotated data for AI/ML in computational social science.

Lili Miao (Indiana University Bloomington, United States)
Byungkyu Lee (New York University, United States)
Vincent Larivière (University of Montreal, Canada)
Cassidy Sugimoto (Georgia Institute of Technology, United States)
Yong-Yeol Ahn (Indiana University Bloomington, United States)
Persistent Colonialism in Contemporary International Collaboration

ABSTRACT. International scientific collaboration is vital for advancing knowledge, yet remnants of colonialism persist within its dynamics. This study investigates power imbalances in international collaborations through labor division and research priority. Using authorship order as a proxy for labor division, our analysis reveals a hierarchical structure in international collaboration: researchers from advanced countries tend to assume the role of last author, researchers from scientifically proficient and developing countries are more likely to be listed as first authors, and researchers from lagging countries often occupy supportive roles as middle authors. Regression models controlling for demographic factors confirm these disparities, showing that researchers from non-advanced countries are less likely to hold last-author positions, even when they contribute funding to the research. Furthermore, internationally coauthored papers from less advanced countries exhibit greater topical disparities compared to those countries' papers without such collaboration, suggesting that researchers from less advanced countries have limited influence in setting research agendas. These findings underscore ongoing colonial legacies within current international scientific partnerships, necessitating efforts to address inherent inequities.

Rodrigo Dorantes Gilardi (Northeastern University, United States)
Yixuan Liu (Northeastern University, United States)
Albert-László Barabási (Northeastern University, United States)
Assessing the Impact of Migration on the Career Success of Creative Individuals: A Comprehensive Study of Artists' Mobility

ABSTRACT. The migration patterns of creative individuals, particularly artists, have long been speculated to play a pivotal role in their career trajectories and overall success. Despite anecdotal evidence supporting the positive correlation between migration and career advancement, empirical studies have been hampered by the lack of comprehensive datasets tracking the work-based migration of individual innovators and its subsequent impact on career success. Addressing this critical gap, our study utilizes an extensive dataset comprising 34,000 artists, analyzing both temporary and permanent migration patterns derived from biographical data including 1 million art exhibitions and the historical artist database maintained by the Getty Research Institute.

Mark Whiting (University of Pennsylvania, United States)
Amirhossein Nakhaei (University of Aachen, Germany)
Duncan Watts (University of Pennsylvania, United States)
Atlas: building a map of commensurable knowledge from existing research

ABSTRACT. We introduce Atlas, an open source platform designed to bridge the gap between vast amounts of research findings and the need for actionable, commensurable insights. Atlas aims to provide a tool chain for research cartography, by blending human expertise and advanced computational techniques, including Natural Language Processing (NLP) and Large Language Model (LLM) coding, to create a standardized, commensurable database of scientific findings. Atlas provides views of the aggregated data in an online environment, as well as developer tools to design and contribute new research concepts and articles to include in the analysis. We evaluate the accuracy of these models, compare them to previously rated dimensions from a manual research cartography effort, and are in the process of developing mechanisms to automatically parameter-tune new research concepts.

Robert Ward (Georgia Institute of Technology, United States)
Marc Santolini (Learning Planet Institute, Université Paris Cité, France)
Interventions on Network Formation

ABSTRACT. New discoveries in science, true and false, are increasingly produced through collaboration [10]. Science funders have responded by creating programs to influence the types of collaborations that form, network interventions with the goal of accelerating breakthroughs and synthesis across fields [7]. In the United States, for example, the National Institutes of Health uses RM1 grants to fund multidisciplinary teams of three to six investigators with diverse backgrounds and expertise, preferably containing members early in their careers. Similarly, universities like Ohio State, Stanford, and Georgia Tech have created internal seed grants to stimulate collaboration within and across departments. Despite the popularity of such programs, we know little about how they ought to be designed [6]. Previous research has focused on understanding the structure and function of endogenous collaboration networks, with research on network interventions aimed at understanding how physical proximity, shared tasks, and financial incentives can be used to promote interactions [3, 1, 9, 2, 8, 5]. The question of which collaborations should be induced is easy to take for granted. For example, networks with low clustering seem to increase the rate at which findings replicate [4]. If this is true, then creating collaborations across cliques, between scientists with no collaborators in common, may help to address the replication crisis. The main complication is that scientists who take up inducements to form new collaborations do not forget their old ones, nor necessarily abandon them. We cannot design a network, en masse, from scratch. This means that it may not be sufficient to understand what the local networks of the highest performing nodes look like.
We need to understand which networks it is possible to create by making a relatively small number of changes to the existing one, and more fundamentally, how the properties of an initial network constrain opportunities for change.

In this study we use formal and computational methods to improve the design of interventions on network formation. We begin by formalizing the problem of choosing an optimal set of τ edges to add in order to maximize a linear outcome function for an arbitrary initial network, and prove an NP-hardness theorem for it. We introduce an agent-based model and use a series of Monte Carlo experiments to understand the effectiveness of traditional intervention designs and identify the structural mechanisms that cause them to fail. First, interventions that use random assignment, which are particularly common in practice, have consistently weak or negative effects because they undersample the most beneficial edges to add. Second, interventions to connect researchers to expertise in different clusters while maintaining low clustering coefficients are often only able to add a few edges before geodesic distances shrink and it becomes impossible to add edges without increasing clustering. This is particularly relevant for science, where low clustering is associated with higher creativity, more efficient problem solving, and higher replication rates [4]. Taken together, our results suggest that regardless of the opportunities or incentives used to motivate new collaborations, the intervention designs used today are often bound to fail. The structure and composition of a network constrain possibilities for change, so that certain types of relationships are undersampled or impossible to create, which leads intuitively effective interventions to have weak effects. Motivated by these results, we introduce an algorithm based on simulated annealing to design interventions that, for most combinations of networks and outcomes, are orders of magnitude more effective than the interventions used today.
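To illustrate the flavor of such an approach (a generic sketch under invented assumptions, not the authors' actual algorithm or objective), simulated annealing can search over τ-subsets of candidate edges, here with a toy objective that rewards added edges but penalizes closed triangles, echoing the low-clustering outcomes discussed above:

```python
import itertools, math, random

def outcome(edges, n):
    """Toy objective: reward edge count, penalize closed triangles
    (a stand-in for whatever linear outcome an intervention targets)."""
    tri = sum(1 for a, b, c in itertools.combinations(range(n), 3)
              if (a, b) in edges and (b, c) in edges and (a, c) in edges)
    return len(edges) - 3 * tri

def anneal(base, candidates, tau, n, steps=2000, t0=1.0):
    """Pick tau candidate edges to add so the objective is (near-)maximal."""
    random.seed(0)                             # reproducible sketch
    current = set(random.sample(candidates, tau))
    best = set(current)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9     # linear cooling schedule
        # Neighbouring solution: swap one chosen edge for an unchosen one.
        drop = random.choice(sorted(current))
        add = random.choice([e for e in candidates if e not in current])
        proposal = (current - {drop}) | {add}
        delta = outcome(base | proposal, n) - outcome(base | current, n)
        if delta >= 0 or random.random() < math.exp(delta / t):
            current = proposal
        if outcome(base | current, n) > outcome(base | best, n):
            best = set(current)
    return best

# Initial network: two triangles (cliques) with no ties between them.
base = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)}
nodes = 6
candidates = [(i, j) for i, j in itertools.combinations(range(nodes), 2)
              if (i, j) not in base]
chosen = anneal(base, candidates, tau=2, n=nodes)
```

On this toy instance the search favors cross-clique edges that do not close new triangles, the kind of edge the abstract argues random-assignment designs undersample.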

Yicong Yan (Renmin University of China, China)
Kai Li (University of Tennessee, Knoxville, United States)
Guancan Yang (Renmin University of China, China)
Xiaobin Lu (Renmin University of China, China)
How is machine learning research cited in library and information science literature? A citation analysis based on the Papers with Code (PwC) dataset and entitymetrics

ABSTRACT. Machine learning (ML) has emerged as a transformative force reshaping numerous scientific domains. As an interdisciplinary field intrinsically linked to information technologies, library and information science (LIS) is inevitably influenced by the rapid advancements in ML. However, there is a notable lack of systematic empirical investigations into the nature and extent of this influence. This study pioneers the examination of how ML literature is cited in LIS publications by leveraging the novel Papers with Code (PwC) dataset, which comprehensively covers core ML publications and annotates granular ML-related entities such as tasks, methods, and datasets.

Hanzhang Zhou (Harvard University, United States)
Richard Freeman (Harvard University, United States)
Danxia Xie (Tsinghua University, China)
Hanzhe Zhang (Michigan State University, United States)
The Decentralization of Science: Evidence from Award-Winning Researchers
PRESENTER: Hanzhang Zhou

ABSTRACT. Scientific production is becoming more decentralized: International research collaborations are across broader and more diverse regions [2], and research institutions in emerging economies are progressively expanding their scientific capacities to world class standards [3]. This broad distribution of knowledge production enhances the accessibility and dissemination of scientific insights. However, it is less clear if this pattern is mirrored in the distribution of top researchers. Our research aims to address two primary questions: (i) how has the distribution of top talents across institutions evolved within various academic disciplines over time, and (ii) what are the underlying causes and implications of these distribution patterns, including their variations across fields? Investigating the concentration of academic talent reveals the mechanisms that underpin intellectual progress and its impact on society. By examining where these leading researchers work, we gain insight into how ideas spread, the pace at which they are embraced, and the overall trajectory of scientific and scholarly advancements.

Siqi Xie (Hong Kong University of Science and Technology, Hong Kong)
Xuan Zeng (Hong Kong University of Science and Technology, Hong Kong)
Masaru Yarime (Hong Kong University of Science and Technology, Hong Kong)
Bless or Curse: The Effect of US Sanctions on China’s Technology Development in the Case of AI

ABSTRACT. The U.S.-China trade friction is one of the most important geopolitical events in the 21st Century. On 3 April 2018, the U.S. government released a list of goods subject to tariff increases, which imposed a 25% tariff on 1,333 items of $50 billion of goods exported from China to the U.S. (Timmons, 2020). The trade war between China and the U.S. officially started. China is ambitious to become a pivotal player in the global value chain (GVC). Chinese giants, backed by huge state funding, are catching up in emerging technologies such as 5G, Artificial Intelligence (AI), and semiconductors, challenging the U.S. technological supremacy (SCMP, 2021). Therefore, unlike other trade frictions in the past, the main conflict between the U.S. and China evolved from trade imbalance to technology superpower competition.

Boris Veytsman (George Mason University, United States)
Andrew Nesbitt (Ecosyste.ms, United States)
Daniel Mietchen (Ronin Institute for Independent Scholarship, United States)
Eva Maxfield Brown (University of Washington, United States)
Howison (The University of Texas at Austin, United States)
Pimentel (Universidade Federal Fluminense, Brazil)
Hébert-Dufresne (University of Vermont, United States)
Druskat (German Aerospace Center, Germany)
Biomedical Open Source Software: Crucial Packages and Hidden Heroes

ABSTRACT. Despite the importance of scientific software for research, it is often not formally recognized and rewarded [1–4, 6, 7]. This is especially true for foundational libraries, which are used by the software packages visible to users while remaining “hidden” themselves. Funders and other organizations need to understand the complex network of computer programs that modern research relies upon. In this work we used the CZ Software Mentions Dataset [5] to map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality (Figure 1).
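As a toy illustration of why foundational packages surface only through dependency analysis (the package names are invented, and counting transitive dependents is a simple stand-in for the centrality metrics the paper proposes):

```python
# Toy dependency graph: package -> packages it depends on (invented names).
deps = {
    "app_a": ["plotlib", "statpkg"],
    "app_b": ["statpkg"],
    "plotlib": ["numcore"],
    "statpkg": ["numcore"],
    "numcore": [],
}

def transitive_dependents(deps):
    """Count, for each package, how many others depend on it directly or
    transitively -- foundational 'hidden' libraries score highest."""
    counts = {p: 0 for p in deps}
    for pkg, direct in deps.items():
        seen, stack = set(), list(direct)
        while stack:                     # depth-first walk of pkg's deps
            d = stack.pop()
            if d not in seen:
                seen.add(d)
                stack.extend(deps.get(d, []))
        for d in seen:
            counts[d] += 1
    return counts

print(transitive_dependents(deps))
# numcore is reached by all four other packages even though no user-facing
# "app" imports it directly.
```

A citation-style count of direct mentions would credit plotlib and statpkg; only the transitive view exposes numcore as the critical node.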

Jeff Tsao (Sandia National Labs, United States)
Venkatesh Narayanamurti (Harvard University, United States)
Bell’s Dodecants: An Architecture for How Technoscience Actually Advances

ABSTRACT. Much attention has been paid to a scholarly understanding of science and technology (what together we will call technoscience) and how it advances. This understanding, however, has not translated to an improved operational practice of advancing technoscience—the formal societal processes we call research and development (R&D). Indeed, one might argue that the operational practice of advancing human technoscience, particularly its exploratory research half, has become less effective in recent decades (Arora et al., 2018), with calls for new ways of thinking becoming more frequent (Georgescu, 2022; Nielsen & Qiu, 2022).

In this talk, we discuss just such a new way of thinking: an architecture for what technoscience is and how it advances (Narayanamurti & Tsao, 2024). We call our architecture, illustrated in Figure 1, “Bell’s Dodecants.” The name follows from its six mechanisms and two flavors for how technoscience advances, with Alexander Graham Bell an exemplar, embodied in Bell Labs, of the balanced nurturing of all six mechanisms and two flavors. The name is also a deliberate parallel to that of the Pasteur’s Quadrant framework introduced by Donald Stokes (Stokes, 1997), which we go beyond but wish to acknowledge. Bell’s Dodecants is first and foremost informed by our desire for an architecture that can be translated into operational practice (Narayanamurti & Odumosu, 2016). But it also builds on and unifies scholarly work of the past century, for example: from the history and philosophy of science and technology; from the evolutionary biological, complexity, physical, and economic sciences; and from the world of research leadership and policy.

Pradyumn Sampurnanand Pathak (DePaul University, United States)
Alexander Belikov (Unknown, France)
Jamshid Sourati (DePaul University, United States)
Towards Abductive Hypotheses Generation in Science and Technology Using Active Learning

ABSTRACT. Introduction–A scientific theory is a set of explanations for natural phenomena from which certain rules can be derived. These rules can be used to predict the outcome of future events; the more events a theory correctly predicts, the better an approximation of the natural phenomena it is. From this point of view, one can dichotomize scientific discoveries into conforming and non-conforming findings. The latter group is crucial for initiating theory modification processes that give birth to stronger theories. At the core of science is abductive reasoning, activated when a non-conforming observation occurs—an observation at odds with the currently predominant theories [1]. In line with the aphorism “what does not kill me makes me stronger,” modifications inspired by anomalous observations are almost always towards higher generality, as they introduce articulations to theories that enable them to explain the new, non-conforming findings as well as the previous ones.

Nobuyuki Shirakawa (Niigata University, Japan)
Reshaping Research in Japan: Interdisciplinary Innovation in Response to Governmental Pressures on Humanities and Social Sciences

ABSTRACT. In 2016, Japan initiated the integration of considerations for Ethical, Legal, and Social Issues (ELSI) within its 5th Science and Technology Basic Plan, marking a pivotal move towards fostering Responsible Innovation within the country's Science, Technology, and Innovation Policy framework. This commitment was reinforced by the 2021 iteration of the Science and Technology Innovation Basic Plan. The purpose of this presentation is to delve into the intricacies of Japan’s policy on Science Technology and Innovation, explore the emergence of ELSI as a critical element of policymaking in the context of the worldwide push for Responsible Research and Innovation (RRI), and highlight particular ELSI endeavors undertaken by Japanese academic institutions. These initiatives emphasize the contribution of the humanities and social sciences. Furthermore, this discussion aims to underscore the importance of understanding vulnerability within the framework of Responsible Innovation, distinguishing between "technological developments that transcend existing barriers" and "technological developments that acknowledge and incorporate these barriers."

Dean Shamess (University of Saskatchewan, Canada)
Bruce Cater (Trent University, Canada)
Byron Lew (Trent University, Canada)
Smith (York University, Canada)
The Role of Universities in Sustaining Innovation

ABSTRACT. This paper offers an explanation of why firms outsource the earliest stages of their R&D to universities. The technological innovations that drove the mid-20th century’s golden age of U.S. productivity growth resulted, in large part, from firms’ in-house pursuits not only of product and process development, but also of basic research that yielded critical advances in foundational science and engineering. By the 1980s, however, firms began downsizing their in-house basic research programs, outsourcing that early-stage work to universities, and concentrating their in-house efforts on the developmental stages of R&D.

Kyle Schirmann (Harvard University, United States)
Entry and Capabilities: Manufacturing in the East African Generic Drug Market

ABSTRACT. The Global South has long experienced difficulties in developing drugs for infectious diseases, in importing high-quality medications given limited foreign exchange resources, and in initiating and scaling local pharmaceutical manufacturing. Public health emergencies in recent decades have highlighted the subsequent regional and global risks: difficulties in transferring manufacturing processes for mRNA vaccines intensified the COVID crisis, while antibiotic monotherapy heightens the risk that bacteria develop resistance. Such vulnerabilities are present at all points in the pharmaceutical value chain, including basic science and translational research in drug development, manufacturing of active pharmaceutical ingredients (APIs), drug compounding, and distribution.

Ross Potter (Clarivate London, UK)
Ann Beynon (Clarivate US, United States)
The Societal Impact of Science: Using Topic Modeling to Investigate the Broader Impact of Academic Research
PRESENTER: Ross Potter

ABSTRACT. The societal impact of academic research is becoming increasingly significant to all stakeholders as science shifts towards a more open access model (Plan S 2019) and research funding becomes ever more competitive. Universities, governments and other bodies are striving to quantitatively or qualitatively assess academic research in terms of societal impact. This impact can span the economy, politics and health. Though it can be defined as measuring returns from publicly funded research (Bornmann 2012), what this means in practice is open to interpretation. Here, topic modeling - a statistical model based around text-mining used to discover ‘topics’ occurring in a set of documents (e.g., Zheng et al. 2021) - is used to demonstrate how societal impact topics can be isolated from other, more academic, topics and therefore used to illustrate what areas of society research is reaching and influencing. Coronaviruses were chosen as the research topic to model due to their global public health effects, links to several UN Sustainable Development Goals and economic impact. Such societal themes (as well as others) should emerge from topic modeling.

Ning Luo (The Hong Kong University of Science and Technology, Hong Kong)
Ohchan Kwon (The Hong Kong University of Science and Technology, Hong Kong)
Masaru Yarime (The Hong Kong University of Science and Technology, Hong Kong)
Peacocks Fly to the Southeast: How Talent Policies Foster Entrepreneurship in Emerging Economies

ABSTRACT. Entrepreneurship plays a vital role in driving technological innovation and economic growth (Aghion & Howitt, 1992). In particular, entrepreneurial firms founded by highly skilled and innovation-driven individuals tend to achieve superior growth and make significant economic contributions (Decker et al., 2014). However, emerging economies often find it hard to attract and nurture start-ups with high growth potential (Chatterji et al., 2014).

In this study, we investigate whether talent policies attracting high-skilled talents from abroad can boost entrepreneurship in emerging economies. Our empirical context is a particular talent policy in Shenzhen, China, named the Peacock Plan. Shenzhen was traditionally a manufacturing base, but the Chinese government aimed to transform the city into a hub for high-value, innovation-driven activities. Intending to close the skills gap for this transformation, the government launched the Peacock Plan in 2011 to attract high-skilled individuals from abroad to bolster the city’s technology sectors. These overseas talents included foreign professionals and Chinese returnees who had received higher education or worked overseas. Although not explicitly aimed at fostering entrepreneurship, anecdotal evidence suggests the Peacock Plan managed to attract entrepreneurs.

J. Peter Nilsson (IIES Stockholm University, Sweden)
Lena Hensvik (Uppsala University, Sweden)
What Happens when Discrimination in Academia Becomes Salient?

ABSTRACT. We document individual, organizational, and field-wide impacts following a public disclosure of evidence (Wennerås and Wold, 1995, 1997) suggesting substantial gender bias in the competence assessments of newly qualified PhDs (within 5 years of PhD) in biomedicine applying for a prestigious individual grant from the Swedish Medical Research Council (MFR) - the Swedish equivalent of the NIH. We show that the revelation triggered several changes within the MFR: within two years, the share of all-male review committees decreased from 55 percent to zero; within five years, female-authored reviews increased from 10 to 45 percent, while maintaining the scientific competence of the review committees.

Kai Li (University of Tennessee, Knoxville, United States)
Chenyue Jiao (University of Illinois Urbana-Champaign, United States)
Zhichao Fang (Renmin University of China, China)
Indexing of data papers in four research databases

ABSTRACT. Research data has become one of the most important objects in the research system during the past decade. Researchers across knowledge domains are relying on larger quantities of data to investigate their research topics, which has brought significant changes to how the research system works and how research is done [1]. Such changes have given rise to the activities of publishing data produced during research activities [4]. Among the different data publication models, one approach that is receiving growing attention is to publish the data as a peer-reviewed paper, or "data paper" [5]. According to Chavan and Penev, a data paper is a “scholarly publication of a searchable metadata document describing a particular online accessible dataset, or a group of datasets, published in accordance to the standard academic practices” [3]. Serving as a descriptor and citable proxy of data objects in the bibliographic universe, this new academic genre can make research data more findable, citable, and reusable under the current research infrastructure [6]. Even though this genre is said to be embraced by researchers from many disciplines, very little science of science research has analyzed data papers, partly due to a lack of understanding of the infrastructure that supports their discovery, especially how they are included and indexed in research databases.

Kai Li (University of Tennessee, Knoxville, United States)
Tianji Jiang (University of California, Los Angeles, United States)
Accuracy of Datasets in OpenAlex

ABSTRACT. Research data is increasingly recognized as a valid if not "first-class" research output [4]. During the past few years, various infrastructures have been set up to facilitate the long-term preservation, sharing, and (re-)use of research data, such as open data repositories. These infrastructures allow researchers to track citations to the datasets, which creates new possibilities for the science of science community to examine the new topic of research data along with classic topics in the field [3] and have more meaningful conversations with the research policy community in the open science movement [5].

Ira Kuhn (National Institutes of Health, United States)
Marina Volkov (National Institutes of Health, United States)
Proposed NIH Pilot Metascience Scholars Program

ABSTRACT. As the largest public funder of biomedical research in the world, the National Institutes of Health (NIH) has supported decades of advances that have expanded fundamental scientific knowledge and improved health. NIH has long engaged with the metascience community to build on an evidence-base to strengthen its stewardship of public funds with the goal of enhancing health, lengthening life and reducing illness. For example, one NIH Institute, the National Institute of General Medical Sciences, has an ongoing program in collaboration with the National Science Foundation that supports a portfolio of research to provide scientific analysis of important aspects of the biomedical research enterprise and efforts to foster a diverse, innovative, productive and efficient scientific workforce from which future research discoveries and scientific leaders will emerge (1). To fully capitalize on contributions from the growing metascience field, NIH is exploring avenues for working more closely with metascience scholars to identify and pursue study questions of mutual interest that may lead to tangible strategies for change at NIH.

Kara Kedrick (Carnegie Mellon University, United States)
Kevin Zollman (Carnegie Mellon University, United States)
Simon DeDeo (Carnegie Mellon University, Santa Fe Institute, United States)
Cascades, Leaps, and Strawmen: How Explanations Evolve
PRESENTER: Kara Kedrick

ABSTRACT. Scientific breakthroughs often take the form of novel explanations. How these explanations are developed, however, has undergone significant changes. The burden of extensive accumulated knowledge has shifted the paradigm of discovery, from the work of solitary geniuses to collaborative teams that build upon the ideas of others [4, 9]. This shift underscores the significance of viewing explanations as social objects [8, 6], where science progresses by a cooperative or competitive exchange, discussion, and revision of explanations with others. Very little is understood about this process, and, in particular, how our explanations are affected by the presence of prior explanations that, through their appeal or incompetence, draw our attention to different features of the matter at hand, and serve as raw material for our own thoughts.

Hiroyuki Miyamoto (Tokyo Institute of Technology, Japan)
Noriyuki Higashide (The University of Tokyo, Japan)
Cristian Mejia (The University of Tokyo, Japan)
Ichiro Sakata (The University of Tokyo, Japan)
Yuya Kajikawa (The University of Tokyo, Japan)
Understanding Venture Capital Investment Trends through Social and Academic Recognition

ABSTRACT. Venture capital firms (VCs) are important catalysts in the innovation ecosystem [1]. They provide risk money to startups with significant science and technology potential. However, little is understood about the dynamics of VC investments, especially the pattern of their rise and peak-out. This gap exists even within studies on the Diffusion of Innovation, where VC investment data has been scarcely utilized as a source for analysis [2, 3]. Considering VCs are key to transferring scientific achievements into innovations in industry, their investment dynamics on a topic might relate to attention from business and science. This study proposes methods to assess large-scale VC investment trends by examining the relationship between the VC trends and the emergence and accumulation of news articles and academic papers. We find that VCs invest with foresight in specific science and technology topics before they gain social attention, and that academic accumulation positively correlates with sustained investment growth.

Gabriel Falcini (University of Campinas, Brazil)
The scientific impact of scholarly outputs generated under collaboration in Brazil: a bibliometric approach

ABSTRACT. In the past, studies have suggested that innovation could offer solutions to development challenges in the Global South, with some arguing that being late in the development and adoption of technological innovations could potentially provide advantages when emulating international experiences (Fagerberg & Godinho, 2006; Mobaraki, 2017). However, mere technology replication has proved inefficient, evidencing the need for indigenous innovation that takes into account local specificities (Fu et al., 2011; Smith & Leydesdorff, 2014; Mohamed et al., 2022). Today, the challenges persist: economic inequality, social and gender equity, and sustainability are contemporary issues that add another layer of complexity to the scenario (Chataway et al., 2014; Mazzucato, 2019).

Frances Carterjohnson (National Science Foundation, United States)
Lisa Gajary (Ohio State University, United States)
Anand Desai (Clarivate, UK)
Phillips (Decision Catalyst LLC, United States)
If the next big thing in R&D evaluation is AI, who benefits?

ABSTRACT. Artificial intelligence (AI) applications, particularly those involving machine learning and natural language processing, have emerged as the “next big thing” in evaluation (Mason and Montrosse-Moorhead, 2023). However, before we grasp yet another shiny object, it is essential that we first consider the potential consequences of adoption of the new technology in terms of the ethics and duties of evaluators of investments in the research development and innovation enterprise. We are responsible for ensuring that the voices of our funders, their awardees, and the diverse communities served are included as we incorporate innovative technological methods into practice. In this paper, we apply Everett Rogers’ (1962) diffusion of innovation theory and its more recent expositions, to explore the inclusivity of our AI modeling practices. According to Rogers, an innovation must have five characteristics to spread: (1) high relative advantage, (2) testability, (3) observability, (4) compatibility, and (5) low complexity. We then show how we can use these characteristics as heuristics that aid in identifying how to include diverse voices in the products and processes of AI modeling (Kapoor, Bommasani, et al. 2024).

Likun Cao (University of Chicago, United States)
James Evans (University of Chicago, United States)
Depth versus Breadth in the Hierarchical Recombination of Technology

ABSTRACT. The human knowledge system, just like many other complex systems, is in a process of continuous evolution. Among approaches to study the driving forces of knowledge emergence, recombination theory represents a large and growing scholarly focus in social science (Xiao et al., 2021). The recombination theory assumes a resemblance between biological and knowledge evolution: Just like offspring in the biological world are born out of genetic reshuffling through reproduction, new ideas and concepts in the knowledge system are also generated by shuffling the components of prior ones. This notion has been echoed by many empirical works. To date, two branches of studies—mostly based in sociology and management—investigate the conditions for combinatorial innovation in knowledge. The first focuses on the features of knowledge components, which are the basis for combinations. These features include component newness, particularity, and familiarity. The second stream of work examines component architecture, focusing on structural concepts such as “depth” vs. “breadth”, hierarchy and modularity, often measured on a level above components.

Adriana Bin (University of Campinas, Brazil)
Evandro Coggo Cristofoletti (University of Campinas, Brazil)
Ana Carolina Spatti (University of Campinas, Brazil)
Daniela Maciel Pinto (University of Campinas, Brazil)
Prevato Lopes (University of Campinas, Brazil)
Campgnolli (University of Campinas, Brazil)
Dematte (University of Campinas, Brazil)
Colugnati (Federal University of Juiz de Fora, Brazil)
Deviations in Peer Review Processes: A Study of Funding Selection Practices

ABSTRACT. Recent discussions regarding more transparent practices of research prioritization, selection, and funding involve, among other aspects, a deeper understanding of traditional procedures for scientific evaluation, known as peer review. In this context, discussions frequently highlight potential limitations of the classic peer review model, which is seen as vulnerable to various forms of interference and inaccuracies that may impact both the selection of research for funding and the determination of publication outcomes (Recio-Saucedo et al., 2022). Despite this acknowledgment, research on peer review is scarce, both when considering this practice in the context of scientific journals (Squazzoni et al., 2020) and funding agencies. One of the main reasons for this gap is the absence of data.

Shea Andrews (University of California, San Francisco, United States)
Stephanie Ponte (New Vision Research, United States)
Joseph Helpern (New Vision Research, United States)
The Effect of Research Funding on Early-Career Researchers: Insights from the New Vision Investigator Award

ABSTRACT. Research grants provide critical career resources and opportunities that substantially influence a researcher’s career advancement, mobility, and overall trajectory (Melkers et al. 2022). For early career researchers, securing a competitive research award is a critical step in establishing oneself as an independent researcher by creating protected time to conduct research based on their own research questions and designs. Recipients of career development awards from federal agencies have been shown to be more likely to increase their publication output and secure additional funding (Conte and Omary 2018; Millsap et al. 2001; Nikaj and Lund 2018). Similarly, privately funded early-career awards result in increased funding and higher-impact publications (Dorismond et al. 2021). Nevertheless, there is a need for further investigation to evaluate the impact of receiving early career awards from private entities on researchers' productivity and influence.

Vicente Amado Olivo (Michigan State University, United States)
Nutan Chen (Volkswagen Group ML Research Lab, United States)
Wolfgang Kerzendorf (Michigan State University, United States)
Neural Author Name Disambiguator: Developing a Global Researcher Registry for an Interconnected Research System

ABSTRACT. The exponentially growing global research system has resulted in a complex, disconnected network that hinders effective research [2]. Specifically, the exponential growth is underscored by the global number of researchers growing by 15% from 2014 to 2018 [7]. Additionally, traditional research structures were not inherently designed to accommodate such scale. For example, the surge in the number of researchers has created challenges in identifying suitable experts for various research processes [see e.g., 6]. The need for appropriate experts extends to research processes, such as: identifying appropriate peer reviewers, discovering isolated researchers for conference talks, fostering collaborations across disciplinary networks, and driving large-scale projects.

José Córdova (Northwestern University, United States)
Toma Hirose (Northwestern University, United States)
Haoshan Shi (Northwestern University, United States)
Romero (Northwestern University, United States)
Hórvat (Northwestern University, United States)
Early Indicators of Academic Articles’ Online Popularity

ABSTRACT. Science communication has undergone dramatic changes over the past decades [1]. Although the description and prediction of content popularity on online platforms have been widely studied [2], virality of content in the scientific communication domain still needs investigation. To fill this gap in prior research, we provide insights into what features are associated with the popularity of research articles online. Identifying early indicators of online popularity can help stakeholders to quickly detect and respond to potentially viral scientific content, including misinformation. Furthermore, these indicators can inform strategies for effectively promoting accurate scientific literature. With this in mind, we formulate our guiding research question: What features serve as effective early indicators of the online popularity of scientific articles?

Randi Vogt (Geisinger Health System, United States)
Jake Hofman (Geisinger Health System, United States)
Patrick Turley (Geisinger Health System, United States)
Zhang (Geisinger Health System, United States)
Mestechkin (Geisinger Health System, United States)
Heck (Geisinger Health System, United States)
Goldstein (Geisinger Health System, United States)
Chabris (Geisinger Health System, United States)
Meyer (Geisinger Health System, United States)
Scatter versus decile bar plots: What do different ways of visualizing the same polygenic index-phenotype relationship convey to lay audiences?

ABSTRACT. On social media (e.g., X, Reddit), blogs, and in the popular press, laypeople are exposed to the results of research originally published in scientific journals. In particular, the visual representations included in scientific papers are often screenshotted and shared to communicate—or miscommunicate—scientific results. The choices that authors make about how to visually present results may therefore impact the conclusions that laypeople take away. For instance, some have suggested that visualizing the relationship between a phenotype (e.g., intelligence) and a polygenic index (PGI, also known as a polygenic score) by presenting a single bar for each PGI decile suggests that genes are more determinative than when the same relationship is presented by scatter plots, which show the full range of individual phenotypes for each decile (Harden, 2023; Harden & Belsky, 2018). If this is true, laypeople who encounter these plots may reach incorrect conclusions about the genetics of sensitive traits, even compared to the published results themselves.

Jing Zhang (SOAS University of London, Renmin University of China, UK)
Sensitive Intervention Points of Climate Policies in China from Analysis of Patent Data: A Link Prediction Model

ABSTRACT. The 2007-2008 Global Financial Crisis (GFC) was a stern warning that traditional economic models and tools fall far short of addressing real-world problems. One main reason may be that the policy-making process is top-down: although regulators keep an eye on societal shifts and technological development, incomplete and low-value information is a persistent constraint on effective strategy, and may even be a catalyst for disasters. Climate change, the so-called “Green Swan”, acts as an enormous externality and has been bringing more uncertainties and challenges to the economic system. Against this background, this paper builds a link-prediction model covering all patent data on new energy in China (181,781 patents from 2010-2023) to anticipate the likelihood of connections between unconnected patent nodes, and then uses these predictions as early warnings for future important climate policies that may act as triggers or sensitive intervention points of phase transitions. This paper promotes a quantitative, bottom-up method of policy-making and suggests paying more attention to edge innovation.
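The paper's exact link-prediction model is not described in this abstract. As a hedged illustration of the general idea, a simple neighborhood-based score such as the Jaccard coefficient can rank unconnected node pairs in a patent network; the toy graph of technology classes below is invented for illustration:

```python
# Minimal sketch of link prediction on a patent-derived network,
# assuming nodes are technology classes and edges record co-occurrence
# in patents. The graph and the scoring method are illustrative only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("solar", "storage"), ("storage", "grid"),
    ("wind", "grid"), ("solar", "grid"),
])

# Score currently unconnected node pairs; by default networkx evaluates
# all non-edges. High scores flag likely future links, i.e., candidate
# early warnings for intervention points.
scores = sorted(
    nx.jaccard_coefficient(G),
    key=lambda t: t[2], reverse=True,
)
for u, v, p in scores:
    print(f"{u} -- {v}: {p:.2f}")
```

On 181,781 patents one would use a scalable predictor and validate against held-out later-period links, but the ranking logic is the same.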

Stefano Verzillo (European Commission, Joint Research Centre, Spain)
Enkelejda Havari (IESEG, France)
Elena Meroni (European Commission, Joint Research Centre, Spain)
Corinna Ghirelli (Bank of Spain, Spain)
The Long-Term Causal Effects of Winning an ERC Grant

ABSTRACT. Governments allocate a considerable portion of their budget to supporting basic research in a variety of disciplines (Jacob and Lefgren, 2011). Several European countries are now distributing research funds through competitive grants, assessing not only the profiles of principal investigators but also the potential impact and quality of the proposed projects. Despite a myriad of programs and grant-assignment mechanisms, there is a prevailing scarcity of evidence regarding their impact on scientific output and related dimensions (Ganguli, 2017; Adda and Ottaviani, 2023).

Ziyu Chen (University of Hong Kong, Hong Kong)
Xiaohu Zhang (University of Hong Kong, Hong Kong)
The territorial dynamics of technological jump and differentiation in China: Insights from topological data analysis

ABSTRACT. Cities do not exist as isolated spatial entities unaffected by technological advancements occurring beyond their borders; rather, they engage in competition within the urban system for both capital and high-skilled labor to generate new inventions. Analogous to how an organism carves out an ecological niche by interacting with and adapting to the external environment, cities also continuously construct their technological niches as they adopt new technologies and obsolete old ones.

Carolina Biliotti (IMT School for Advanced Studies, Italy)
Jeffrey Lockhart (University of Chicago, United States)
Too Doomed to Surprise: Gender Bias in Emerging and Surprising Science

ABSTRACT. Despite the increased participation of women in science (Huang et al., 2019; Lerchenmueller, Sorenson, and Jena, 2019; Murić, Lerman, and Ferrara, 2020; Hengel, 2022; Bendels et al., 2017) and there being no evidence of higher quality in male scientific production (Hengel, 2022), women scientists remain under-represented in academic and scientific leadership. In what has been described as the discovery paradox, minority groups, and women in particular, are likely to innovate by originally merging previously ignored links between scientific concepts, but receive less uptake in terms of future career progression (Hofstra et al., 2020). The lower return to scientific innovation for women could be related to the specific nature of women’s innovation. Scientific innovation comes in multiple forms, which might be incentivised and rewarded differently. Alternatively, men and women may be differentially rewarded for the same kinds of innovations.

Theresa Lant (Pace University, United States)
Susan Day (University of Colorado, Boulder, United States)
Enhancing DEIA and Innovation with Social Science and STEM Collaborations

ABSTRACT. Investments in interdisciplinary collaborations in STEM fields have increased [1]. However, there is still a lack of integration between STEM fields and social sciences [2]. Developing convergent and translational research should challenge STEM investigators to pay attention to non-technical components of collaboration, and engage with investigators from the social sciences [3]. Unfortunately, a focus on the technical aspects of collaboration often encourages complacency about the potential contributions of social sciences and the importance of social impact [4]. Diverse research teams outperform homogeneous ones on a wide variety of measures, including innovation [5]. By engaging social scientists, convergent science, medicine, and engineering teams are also more likely to achieve societal gains, including the well-being of individuals and workforce competitiveness.

Cristian Mejia (The University of Tokyo, Japan)
Identifying links and gaps between scientific research and policies in the public discourse

ABSTRACT. The social acceptance of novel science and technology varies across different sections of the population. This study proposes a method to measure social agreement with scientific research and related policies on controversial topics like climate change and sustainable food systems. By combining network analysis of academic citations with sentiment analysis of news articles, the research develops a semantic similarity heatmap to visualize links and gaps between scientific findings and public discourse. Using palm oil research as a case study, the project demonstrates how this approach can inform policymaking by identifying areas of consensus and controversy.
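The specific embedding behind the study's semantic similarity heatmap is not given in the abstract. One minimal stand-in is TF-IDF vectors with cosine similarity between research-topic descriptions and news snippets; all texts below are hypothetical:

```python
# Hedged sketch: a similarity matrix between research topics and news
# snippets, a simple proxy for the semantic-similarity heatmap the
# study describes. TF-IDF + cosine similarity stands in for whatever
# representation the study actually employs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

topics = ["palm oil deforestation biodiversity loss",
          "sustainable palm oil certification supply chain"]
news = ["consumers boycott brands over deforestation",
        "certified sustainable palm oil gains market share"]

vec = TfidfVectorizer()
M = vec.fit_transform(topics + news)

# Rows: research topics; columns: news snippets. High cells mark links
# between science and public discourse; low rows mark gaps.
sim = cosine_similarity(M[:len(topics)], M[len(topics):])
print(sim)
```

On real data one would plot this matrix as a heatmap (e.g., with matplotlib) and pair it with sentiment scores on the news side to separate agreement from mere topical overlap.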