
11:45-12:00 Opening Ceremony

Xiaofan Wang & Yamir Moreno

12:00-13:10 Session 1: Invited Talks (A. Horvat & J. Park)
Networks in online scholarly communication
Creativity and Networks
13:10-14:05 Session 2: Lightning Session I
Mobility network reveals the impact of spatial vaccination heterogeneity on COVID-19

ABSTRACT. The COVID-19 pandemic is not only a public health challenge but also an immense societal problem, because social processes such as political polarization and social contagion significantly impact the course of epidemics. Although mass vaccination presents an effective route to herd immunity, it is still challenging to predict the course of the pandemic, as many sociopolitical factors come into play and multiple variants have emerged. These factors include highly unequal vaccine allocation across locations, heterogeneous vaccine acceptance across social groups, and their mixing patterns in social and mobility networks. Such heterogeneity raises important questions: What are the implications of heterogeneous spatial vaccine uptake across society? How can COVID-19 data inform improvements to vaccination campaign strategies for both the current and future pandemics? Our study addresses these questions through large-scale epidemic simulations on the U.S. mobility network, using observed and hypothetical vaccination distributions. Departing from highly aggregated models of vaccine performance [1], we employ a data-driven approach to study the impact of spatial vaccination heterogeneity [2]. Specifically, we leverage fine-grained human mobility data, vaccination data, and census data in the U.S. These rich datasets, along with fine-grained data-driven models, enable us to study the outcomes of hypothetical vaccination distributions and vaccination campaigns in unprecedented detail [3]. The goal of our study is twofold: to develop a general network-based framework for analyzing the impact of spatial vaccination heterogeneity on transmission, and to provide policy recommendations for location-based vaccination campaigns.
While our study examines the state of vaccination at a specific point in time, we show that our main conclusions are robust under different transmission dynamics and vaccination distributions, and that our methodology can be used for future epidemics. We begin by investigating the impact of spatial vaccination heterogeneity on COVID-19, focusing on two major network effects. The first network effect is homophily, the tendency of similar people to cluster, whether due to sorting, social contagion, or local regulations. In our context, homophily captures the fact that vaccination rates are similar among geographically close or socially connected locations. A high level of homophily in vaccination leads to clusters of the unvaccinated, which may trigger localized outbreaks and produce more cases than expected from the overall vaccination rate. The second network effect is the hub effect, whereby the vaccination rate of central and highly mobile places can have a disproportionate impact on the case count. Due to various reasons, such as the urban-rural divide, hubs in the U.S. generally have a higher vaccination rate, which may reduce the severity of outbreaks. We visualize these two patterns at the county level in Fig. 1 for illustration. To quantify the impact of these two effects on case counts, we examine both synthetic and fine-grained U.S. mobility networks. We design synthetic networks that exhibit either the hub effect or homophily, to study how these two effects operate in isolation. By comparing the original vaccination distribution with hypothetical distributions in which we remove or flip the direction of homophily or the hub effect, we find that homophily exacerbates the size of an outbreak, while the hub effect attenuates it. Next, we repeat the same procedure on the empirical mobility networks and the COVID-19 vaccination distribution in the U.S.
Because vaccination data is only available at the county level, we leverage additional fine-grained census features and Bayesian deep learning to infer vaccination rates at the level of census block groups (CBGs). We show that, given the vaccination distribution of COVID-19 in January 2022, the observed homophily accounts for at least a 9.3% increase in new COVID-19 infections within 30 days compared with hypothetical scenarios without homophily, while the hub effect caused by the urban-rural divide reduces the cases. In the second part of our study, we address the question: where do increased vaccination rates provide the greatest benefit in controlling the pandemic? Inspired by our findings on the network effects, we propose an efficient algorithm to find the optimal locations that can maximally reduce case numbers given a fixed increase in the overall population vaccination rate. While it is computationally challenging to search over all possible vaccination strategies with transmission simulations for 200,000 CBGs, our algorithm overcomes this by using gradient-based optimization on a differentiable surrogate objective. We predict that our proposed vaccination strategy can reduce the number of cases by 9.5% with only a 1% increase in the overall vaccination rate, a 2.5-fold improvement over vaccinating random locations. We also compare other reasonable vaccination strategies. We find that vaccinating hubs is 1.5 times as effective as vaccinating random locations, whereas, surprisingly, vaccinating the least vaccinated locations (a strategy attempted by several states) is only 0.3 times as effective. These results suggest that accurate location-based targeting can be a highly potent strategy to substantially reduce case counts, although societal and political challenges would need to be resolved.
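The homophily mechanism described in this abstract can be illustrated with a toy simulation (this is not the authors' data-driven model: the network, coverage level, and transmission parameters below are invented for illustration). Holding overall coverage fixed at 50%, clustering the unvaccinated into one community yields larger outbreaks than spreading the same doses uniformly:

```python
import random

def simulate_sir(adj, vaccinated, seeds, p_inf=0.3, steps=30, rng=None):
    """Discrete-time SIR on an undirected network; vaccinated nodes are immune."""
    rng = rng or random.Random(0)
    infected = {s for s in seeds if s not in vaccinated}
    recovered = set()
    for _ in range(steps):
        new = set()
        for u in infected:
            for v in adj[u]:
                if (v not in vaccinated and v not in infected
                        and v not in recovered and v not in new):
                    if rng.random() < p_inf:
                        new.add(v)
        recovered |= infected
        infected = new
    return len(infected | recovered)

# Stylized mobility network: two communities, dense within, sparse across.
rng = random.Random(42)
n = 200
adj = {i: set() for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        same = (i < n // 2) == (j < n // 2)
        if rng.random() < (0.08 if same else 0.005):
            adj[i].add(j)
            adj[j].add(i)

# Identical overall coverage (50%), distributed uniformly at random
# versus clustered entirely in one community (extreme homophily).
uniform = set(rng.sample(range(n), n // 2))
clustered = set(range(n // 2))

def mean_outbreak(vaccinated, trials=20):
    total = 0
    for t in range(trials):
        r = random.Random(1000 + t)
        total += simulate_sir(adj, vaccinated, r.sample(range(n), 5), rng=r)
    return total / trials

mean_u = mean_outbreak(uniform)
mean_h = mean_outbreak(clustered)
print(f"uniform: {mean_u:.1f} infected on average; homophilous: {mean_h:.1f}")
```

The clustered allocation leaves one community with its full internal contact density unprotected, so the same number of doses prevents fewer infections, which is the qualitative effect the abstract quantifies on empirical data.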

Modelling Network Evolution using Information Theory

ABSTRACT. In information theory, the Asymptotic Equipartition Property (AEP) and the idea of typical sequences form the basis of data compression. The most general version of the AEP, known as the Shannon-McMillan-Breiman (SMB) theorem, can be applied to finite-valued stationary ergodic processes. As our research focus is on the compression of networks, we are interested in forming a stochastic process of networks to which the general AEP can be applied. To this end, we have defined a process called a 'Network Evolution Chain' to simulate the dynamics of evolving networks and analyse it by means of information theory.

Universal citation dynamics in science and law

ABSTRACT. Human culture is built on our ability to record and accumulate knowledge. Perhaps one of the most sophisticated examples is the scientific system. Science accumulates knowledge over time by building on existing work through citations, which allow scientific communities to compress and use existing knowledge. Examining how scientists cite existing work has revealed many insights into the ways scientists combine existing knowledge to produce new knowledge. For instance, the law of preferential attachment (or rich-get-richer principle) reveals that the historic attention a scientific publication receives is indicative of its future regard. Recency bias, the observation that recent publications tend to receive more citations, underscores the scientific community’s limited “attention span.” However, most of these results are confined to the science of science, making it difficult to determine the primary driver of these patterns: are they the result of fundamental limitations and characteristics of human beings, or a consequence of the way the scientific enterprise operates, which is largely a product of historical accident? How much can we generalize the citation and knowledge generation patterns discovered in science to other knowledge systems that build on the past? Here, we address these questions by focusing on another sophisticated knowledge system: the common-law legal system in the U.S.

In common law jurisdictions, judges and lawyers base their legal reasoning on citations to prior judicial decisions. As a result, the common law builds on the past and is constantly in flux as judges issue new decisions. For these reasons, citations are a critical part of the common law system. Both scientific and legal systems build heavily on past knowledge: the former discovers rules of nature and society, while the latter formalizes rules for society. While both systems rely on citations and are carried out by humans, they are distinct in many ways. Scientists choose their own research problems, while judges are assigned to cases; science is self-organizing, while the legal system is strictly governed by law; although both professions require substantial training, entry into science is much more open (anyone can publish) and the number of scientists has been growing rapidly, while entry into the judiciary is limited and the number of judges is not growing exponentially; science aims to be egalitarian, whereas the legal system has a codified hierarchy. Such contrasts in how the two systems are organized and operate provide us with an ideal opportunity to test whether the “laws” of citation generalize beyond science.

We show that, despite the differences between the two systems, the fundamental citation dynamics are remarkably universal, suggesting that citation dynamics are primarily shaped by intrinsic human constraints and robust against the numerous factors that distinguish the two systems. We demonstrate that the two systems share similar characteristics across exponential growth (Fig. 1-Left A-F), heterogeneous citation distributions (Fig. 1-Left G-H), preferential attachment (Fig. 1-Left I-J), citation recency (Fig. 1-Left K-L), publications with intense but late recognition (Fig. 1-Left M-P), and individual author fitness (Fig. 1-Right A-F). Our results build a strong bridge between two disparate systems, suggesting that theories and tools describing human-based reference mechanisms in one system (e.g., science, common law) can be translated to the other.
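The two mechanisms this abstract tests in both systems, preferential attachment and recency bias, can be sketched in a toy generative model (illustrative parameters only, not fitted to the paper's data):

```python
import math
import random

def citation_model(n_papers=2000, cites_per_paper=5, tau=300.0, seed=1):
    """Each new paper cites earlier ones with probability proportional to
    (citations + 1) * exp(-age / tau): rich-get-richer plus a recency bias."""
    rng = random.Random(seed)
    cites = [0] * n_papers
    for t in range(1, n_papers):
        weights = [(cites[j] + 1) * math.exp(-(t - j) / tau) for j in range(t)]
        targets = rng.choices(range(t), weights=weights, k=min(cites_per_paper, t))
        for j in set(targets):   # no duplicate citations within one paper
            cites[j] += 1
    return cites

cites = citation_model()
mean_c = sum(cites) / len(cites)
print(f"mean citations {mean_c:.1f}, max citations {max(cites)}")
```

Even with the recency discount suppressing old papers, the rich-get-richer term produces a heavy-tailed citation distribution in which the most-cited item far exceeds the mean, mirroring the heterogeneity the abstract reports in both science and law.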

Motif transition intensity: A novel network-based early warning indicator for financial crises

ABSTRACT. A financial crisis, rooted in a lack of system resilience and robustness, is a particular type of critical transition that may cause grievous economic and social losses and should be warned of as early as possible. Regarding the financial system as a time-varying network, researchers have identified early warning signals from the changing dynamics of network motifs. Network motifs take many different morphologies that unveil the higher-order correlation patterns of a financial system; their synchronous change represents a dramatic shift in the system’s functionality and may indicate a financial crisis, yet this aspect remains understudied. This paper proposes motif transition intensity as a novel method that quantifies the synchronous change of network motifs in detail. Applying this method to stock networks, we develop three early warning indicators. Empirically, we conduct a horse race to predict ten global crises during 1991-2020. The results show that the proposed indicators are more efficient than the VIX and 39 other network-based indicators. In a detailed analysis, the proposed indicators send sensitive and comprehensible warning signals, especially for the U.S. subprime mortgage crisis and the European sovereign debt crisis. Furthermore, the proposed method provides a new perspective for detecting critical signals and may be extended to predict other crisis events in natural and social systems.

(CANCELLED) Resilience and cooperation of social-ecological systems

ABSTRACT. Material flows, such as food trade, allow human societies to rely on agricultural natural resources available both locally and in other regions of the planet. Thus, in a globalized world, multiple pools of the same resource are often harvested by multiple users through a network of interactions. It is not clear to what extent the interconnectedness, structure, and modularity (i.e., when subsystems of nodes exhibit stronger internal connectivity) of such a network may affect the resilience of the system. Here we investigate the impact of globalization on the sustainable use of natural resources (e.g., fisheries, forests, or croplands) and apply this framework to global food trade. We find that resilience may either increase or decrease with the network’s interconnectedness and modularity, depending on whether its structure is random or heterogeneous, respectively. Global food trade exhibits a heterogeneous structure, and its resilience has decreased with the increase in connectivity over the last few decades.

Is Multi-Modal Necessarily Better? Robustness Evaluation of Multi-modal Fake News Detection

ABSTRACT. The proliferation of fake news and its serious negative social influence have made fake news detection methods necessary tools for web managers. Furthermore, the multi-media nature of social media has made multi-modal fake news detection popular, owing to its ability to extract features from multiple modalities of news content. However, the current literature on multi-modal detection tends to pursue model accuracy while ignoring robustness. To address this problem, we first propose a comprehensive robustness evaluation of multi-modal fake news detection models under adversarial circumstances. In this work, we simulate the attack methods of malicious users, i.e., those who post fake news on social networks for specific purposes, and of developers, i.e., those who are involved in detector training and can add backdoor triggers to the model. Specifically, we evaluate five multi-modal detectors using adversarial fake news in the testing stage and backdoor-triggered fake news in the training stage. In particular, we use six adversarial attacks to fool detectors and two backdoor attacks to inject backdoors into feature extractors for the image and text modalities. Finally, we combine different adversarial and backdoor methods to attack both modalities, evaluating the robustness of the detectors in the multi-modal attack scenario.
Comprehensive experimental results yield several novel insights: (1) the detection performance of state-of-the-art (SOTA) detectors degrades significantly under adversarial attacks, falling below even that of poorly performing detectors under normal conditions; (2) most multi-modal detectors are more vulnerable to adversarial and backdoor attacks on image features than on text features; (3) images corresponding to popular events cause significant degradation to the model when subjected to backdoor attacks; (4) the detection performance of these detectors is worse under multi-modal attacks than under uni-modal attacks; (5) defensive methods, e.g., adversarial training, improve the robustness of multi-modal detectors under these attacks.

Alternating quarantine for sustainable epidemic mitigation

ABSTRACT. Battling the spread of SARS-CoV-2, most countries have resorted to social distancing policies, imposing restrictions from complete lockdowns to severe mobility constraints and gravely impacting socioeconomic stability and growth. Current observations indicate that such policies must be put in place for extended periods (typically months) to avoid the reemergence of the epidemic once lifted. This, however, may be unsustainable, as individual social and economic needs will, at some point, surpass the perceived risk of the pandemic.

To bypass this gridlock we present the alternating quarantine (AQ): first, households are partitioned into two cohorts, then these cohorts undergo weekly successions of quarantine and routine activity. Hence, while Cohort 1 remains active, Cohort 2 stays at home and vice versa, ensuring little interaction between the cohorts. This provides highly efficient mitigation, alongside continuous socioeconomic productivity, in which half of the workforce remains active at each point in time.

The AQ strategy limits social mixing while providing an outlet for people to sustain their economic and social routines. Its efficiency is rooted in two independent mitigating effects. Dual partition of population and time. Splitting the population into two isolated cohorts reduces the number of infectious encounters, as, indeed, classrooms, offices, and public places operate at half their usual density. On top of that, each cohort is only active for half of the time, one week out of two, further attenuating infections within each cohort by, roughly, an additional factor of one-half.

Synchronization with the disease cycle. AQ addresses one of the main obstacles to COVID-19 mitigation: the ∼1 week incubation period of SARS-CoV-2. During this incubation, exposed individuals behave as invisible spreaders, unaware of their potential infectiousness. To illustrate AQ’s remedy, consider an individual in Cohort 1 who was active during week 1 and therefore might have been infected. This individual will soon enter their presymptomatic stage, precisely the stage in which they are invisible and hence contribute most to the spread. However, according to the AQ routine, they will be confined to their home during week 2, and consequently they will be isolated precisely during their suspected invisible spreading phase. If, by the end of week 2, they continue to show no symptoms, the chances are that they are, in fact, healthy, and can therefore resume activity in week 3 according to the planned routine. Conversely, if they do develop symptoms during their quarantine, they (and their cohabitants) must remain in isolation, like all other symptomatic individuals.

Hence, the weekly succession is in resonance with the natural SARS-CoV-2 disease cycle, and in practice, leads to isolation of the majority of invisible spreaders. If implemented fully, it guarantees, in each bi-weekly cycle, to prune out the infectious individuals and sustain an active workforce comprising a predominantly uninfected population.
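The mitigation logic above can be reproduced in a stylized two-cohort SEIR model (a sketch with illustrative parameters, not the authors' simulation; household transmission during quarantine weeks is neglected):

```python
def run(alternating, days=250, beta=0.4, sigma=1 / 5, gamma=1 / 7):
    """Two equal cohorts; compartments are fractions of the total population.
    Transmission is density-dependent: only active individuals meet, so the
    force of infection scales with the active infectious fraction."""
    S = [0.4999, 0.4999]
    E = [0.0001, 0.0001]
    I = [0.0, 0.0]
    R = [0.0, 0.0]
    for day in range(days):
        # With AQ, the active cohort alternates weekly; otherwise both are active.
        active = [(day // 7) % 2] if alternating else [0, 1]
        i_active = sum(I[c] for c in active)
        for c in range(2):
            lam = beta * i_active if c in active else 0.0
            new_e = lam * S[c]          # S -> E (exposure among active people)
            new_i = sigma * E[c]        # E -> I (mean ~5-day incubation)
            new_r = gamma * I[c]        # I -> R (mean ~7-day infectious period)
            S[c] -= new_e
            E[c] += new_e - new_i
            I[c] += new_i - new_r
            R[c] += new_r
    return sum(R) + sum(I) + sum(E)     # attack rate: everyone ever infected

baseline = run(alternating=False)
aq = run(alternating=True)
print(f"attack rate without intervention: {baseline:.2f}; "
      f"with alternating quarantine: {aq:.3f}")
```

The weekly alternation combines the half-density and half-time effects with the incubation-period resonance described above: people exposed during their active week tend to become infectious during their home week, so the epidemic is sharply suppressed relative to the no-intervention baseline.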

Reviving a failed network via microscopic interventions

ABSTRACT. Complex systems, biological, social or technological, often experience perturbations and disturbances, from overload failures in power systems to species extinction in ecological networks. The impact of such perturbations is often subtle: the system exhibits a minor response but continues to sustain its global functionality. In extreme cases, however, a large enough perturbation may lead to a large-scale collapse, with the system abruptly transitioning from a functional to a dysfunctional dynamic state. For instance, in cellular dynamics, genetic knockouts beyond a certain threshold lead to cell death; in ecological systems, changes in environmental conditions may, in extreme cases, cause mass extinction; and in infrastructure networks, a cascading failure at times results in a major blackout.

When such a collapse occurs, the naïve instinct is to reverse the damage, retrieve the failed nodes, and reconstruct the lost links. Such a response, however, is seldom efficient, as (i) we rarely have access to all system components, limiting our ability to reconstruct the perturbed network; and (ii) even if we could reverse the damage, due to hysteresis, in many cases the system will not spontaneously regain its lost functionality.

To address this challenge, we consider here a two-step recovery process:
• Step I. Restructuring: retrieving the network topology and weights to a point where the system can potentially regain its functionality.
• Step II. Reigniting: introducing dynamic interventions to steer the system back to its functional state.

The challenge in Step II is that in most practical scenarios we lack direct control over the dynamic activity of the majority of the nodes. Hence, we seek to reignite the system via micro-interventions, i.e., controlling just a small number of components, typically a single node or, at most, a few. Therefore, in this talk, we will characterize the conditions under which such single-node reigniting can be achieved. Along the way, we will expose a new, currently unexplored dynamic phase of complex systems: the Recoverable phase, capturing a state in which the system can be driven towards functionality by controlling just a microscopic set of nodes.

Network medicine framework reveals herb-symptom relation of Traditional Chinese Medicine

ABSTRACT. Traditional Chinese medicine (TCM) relies on combinations of natural medical products to treat symptoms and diseases. While clinical data have demonstrated the effectiveness of certain TCM-based treatments, the mechanistic nature of how TCM herbs treat diseases remains largely unknown. In addition, existing studies focus on the effects of single herbs or prescriptions, overlooking the high-level principles of TCM. To uncover the mechanistic nature of TCM at a system level, in this work we establish a network medicine framework for TCM from the human protein interactome for the systematic study of herbs, symptoms (diseases), and their relations. Leveraging this platform, we observe that genes associated with a symptom cluster into localized modules (top left panel of the figure). Furthermore, the network distance between symptom modules is indicative of the symptoms’ co-occurrence and similarity (bottom left panel). Next, we show that the network proximity of a herb’s targets to a symptom module (top right panel) is indicative of the herb’s efficacy in treating the symptom. Specifically, the network proximity z-score, representing the relative network distance compared to random expectation, captures known herb-symptom indications (bottom right panel). Finally, we apply our framework to real-world hospital data to show that the relative risk of symptoms in patients increases as network distance decreases, and that patients’ recovery after herbal treatment matches the predictions of herb-symptom proximity. Moreover, we identified novel herb-symptom pairs that are predicted to be effective by network proximity and proven effective in hospital data, but were previously unknown to the TCM community, highlighting the predictive power of our framework for herb discovery and repurposing. Together, network medicine offers a novel platform to understand the mechanisms of traditional medicine and to predict herbal treatments against diseases.
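The network-proximity machinery described in this abstract can be sketched as follows (a toy interactome and a size-preserving null model for brevity; the module and targets are hypothetical, the actual framework runs on the human protein interactome and typically uses a degree-preserving null):

```python
import random
from collections import deque

def nearest_dists(adj, module):
    """Multi-source BFS: hop distance from every node to the nearest module node."""
    dist = {m: 0 for m in module}
    q = deque(module)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def d_closest(adj, targets, module):
    """Closest-measure distance: mean over targets of distance to the module."""
    dist = nearest_dists(adj, module)
    return sum(dist.get(t, len(adj)) for t in targets) / len(targets)

def proximity_z(adj, targets, module, n_rand=200, seed=0):
    """z-score of the observed distance against random target sets of equal size;
    negative z means the targets are closer to the module than expected."""
    rng = random.Random(seed)
    d_obs = d_closest(adj, targets, module)
    nodes = list(adj)
    rand = [d_closest(adj, rng.sample(nodes, len(targets)), module)
            for _ in range(n_rand)]
    mu = sum(rand) / n_rand
    sd = (sum((x - mu) ** 2 for x in rand) / n_rand) ** 0.5
    return (d_obs - mu) / sd if sd > 0 else 0.0

# Toy interactome: a ring of 100 proteins plus random shortcut interactions.
rng = random.Random(5)
n = 100
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
for _ in range(30):
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        adj[u].add(v)
        adj[v].add(u)

module = {0, 1, 2, 3, 4}    # hypothetical symptom module
targets = {2, 5, 98}        # hypothetical herb targets, placed near the module
z = proximity_z(adj, targets, module)
print(f"proximity z-score: {z:.2f}")
```

Because the hypothetical targets sit next to the module, the z-score comes out negative, which is the signature of a predicted herb-symptom indication in this framework.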

Semantic speech networks linked to formal thought disorder in early psychosis

ABSTRACT. Background and Hypothesis. Mapping a patient's speech as a network has proved to be a useful way of understanding formal thought disorder in psychosis (Mota et al., 2012; Mota et al., 2017). However, to date, graph theory tools have not incorporated the semantic content of speech, which is altered in psychosis (Covington et al., 2005; Kuperberg, 2010; Ditman & Kuperberg, 2010).

Study Design. We developed an algorithm, “netts”, to map the semantic content of speech as a network, then applied netts to construct semantic speech networks for a general population sample, and a clinical sample comprising patients with first episode psychosis (FEP), people at clinical high risk of psychosis (CHR-P), and healthy controls.

Study Results. Semantic speech networks from the general population were more connected than size-matched randomised networks, with fewer and larger connected components, reflecting the non-random nature of speech. Networks from FEP patients were smaller than those from healthy participants for a picture description task, but not for a story recall task. For the former task, FEP networks were also more fragmented than those from controls, showing more connected components that included fewer nodes. CHR-P networks showed fragmentation values in between those of FEP patients and controls. A clustering analysis suggested that semantic speech networks captured novel signal not already described by existing NLP measures. Network features were also related to negative symptom scores and scores on the Thought and Language Index, although these relationships did not survive correction for multiple comparisons.

Conclusions. Overall, these data suggest that semantic networks can enable deeper phenotyping of formal thought disorder in psychosis. Whilst here we focus on network fragmentation, the semantic speech networks created by netts also contain other rich information which could be extracted to shed further light on formal thought disorder. We are releasing netts as an open Python package. Ultimately, speech markers derived from this tool could lead to significant advances in disease prediction and clinical practice.
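The fragmentation measures this abstract relies on (number and size of connected components) can be sketched on a crude co-occurrence proxy; note that netts itself builds networks from semantic relations extracted by an NLP pipeline, which this toy does not attempt:

```python
from collections import defaultdict

STOP = {"the", "a", "an", "on", "in", "of", "had", "has", "and"}

def speech_network(sentences):
    """Toy stand-in for a semantic speech network: nodes are content words,
    with an edge whenever two words occur in the same sentence."""
    adj = defaultdict(set)
    for s in sentences:
        words = [w.lower().strip(".,") for w in s.split()]
        words = [w for w in words if w not in STOP]
        for i, u in enumerate(words):
            for v in words[i + 1:]:
                if u != v:
                    adj[u].add(v)
                    adj[v].add(u)
    return adj

def components(adj):
    """Connected-component count and largest component size (DFS)."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        sizes.append(len(comp))
    return len(sizes), max(sizes)

coherent = ["The cat sat on the mat near the window",
            "The window faced the garden",
            "The garden had a cat"]
fragmented = ["The cat sat quietly",
              "Money grows slowly",
              "Planets orbit distant stars"]

n_coh, big_coh = components(speech_network(coherent))
n_frag, big_frag = components(speech_network(fragmented))
print(f"coherent: {n_coh} component(s); fragmented: {n_frag} components")
```

Connected speech reuses words across sentences, merging the network into one large component, whereas topic-jumping speech splinters into many small components, which is the qualitative pattern reported for FEP networks above.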

Natalia B. Mota, Mauro Copelli, and Sidarta Ribeiro. Thought disorder measured as random speech structure classifies negative symptoms and schizophrenia diagnosis 6 months in advance. npj Schizophr., 3(1):1–10, 2017. ISSN 2334265X. doi: 10.1038/s41537-017-0019-3.

Natalia B. Mota, Nivaldo A.P. Vasconcelos, Nathalia Lemos, Ana C. Pieretti, Osame Kinouchi, Guillermo A. Cecchi, Mauro Copelli, and Sidarta Ribeiro. Speech graphs provide a quantitative measure of thought disorder in psychosis. PLoS One, 7(4):1–9, 2012. ISSN 19326203. doi: 10.1371/journal.pone.0034928.

Michael A. Covington, Congzhou He, Cati Brown, Lorina Na i, Jonathan T. McClain, Bess Sirmon Fjordbak, James Semple, and John Brown. Schizophrenia and the structure of language: The linguist’s view. Schizophr. Res., 77(1):85–98, 2005. ISSN 09209964. doi: 10.1016/j.schres.2005.01.016.

Gina R. Kuperberg. Language in Schizophrenia Part 1: An Introduction. Linguist. Lang. Compass, 4(8):576–589, 2010. ISSN 1749818X. doi: 10.1111/j.1749-818X.2010.00216.x.

Tali Ditman and Gina R. Kuperberg. Building coherence: A framework for exploring the breakdown of links across clause boundaries in schizophrenia. J. Neurolinguistics, 23(3): 254–269, 2010. ISSN 09116044. doi: 10.1016/j.jneuroling.2009.03.003.

Competition network structure and firm performance: The mediating role of product market entry

ABSTRACT. Some recent studies combine competitive dynamics theory and social network theory to construct competition networks of competitive interdependence among firms. Although some research has found that the relational structure of the competition network influences firms' product market entry behavior, empirical evidence for that idea is still limited, and it remains unanswered whether product market entry behavior can explain the influence of competition network structure on firm performance. I aim to fill those gaps with evidence from a dataset of 28,700 cashew-kernel export deals between 387 Vietnamese companies and 876 companies in 77 countries in 2016 and 2017. Using negative binomial regression analysis, this study shows that while a firm's node degree has a negative direct effect on the level of change in revenue, it has a positive indirect effect on that outcome through the product market entry strategy. In contrast, the negative effects of ego network density on product market entry and on the level of change in revenue were not statistically significant. This research provides empirical evidence that competition focused on a firm is harmful to its performance, but can become beneficial if the firm increases its number of product market entries. It also calls into question the effects of ego network density on outcomes reported in the literature.

14:50-16:05 Session 3A: Theory I
Functional observability and target state estimation in large-scale networks

ABSTRACT. Observing the internal states of a network system via measurement and/or estimation is fundamental for the prediction, control, and identification of large-scale complex systems, such as power grids, neuronal networks, and food webs. High dimensionality, however, poses physical and cost constraints on sensor placement, limiting our ability to make a network observable. Noting that often only a relatively small number of state variables are essential for control, intervention, and monitoring purposes in large-scale networks, we propose a graph-based theory of functional observability. A system is functionally observable when a targeted subset of state variables can be reconstructed from the available measurements, and our results establish conditions under which this is possible for large-scale networks. Figure 1A provides an illustrative example of a network system which, although not completely observable, is functionally observable with respect to the considered target node. Based on the developed theory, we further design two highly scalable algorithms to: (i) place a minimal set of sensors to ensure the network's functional observability; and (ii) design the corresponding functional observer (estimator) with minimum computational cost. Figure 1B shows that the number of sensor nodes required to make a system functionally observable decreases substantially for a smaller number of target nodes in different complex network models and real-world datasets. Our methods are applied to cyberattack detection in power grids and the monitoring of the COVID-19 pandemic, demonstrating that the proposed functional observability approach can achieve accurate estimation with substantially fewer resources.

On the Spectra of Weighted Subgraph Generated Models

ABSTRACT. Random graph models have long been a primitive tool for understanding complex physical, social, and biological systems. Much of the theoretical analysis of random graph models has relied on the assumption that edges are formed independently of each other. Such an assumption, however, fails to capture the higher-order connectivity patterns observed in large complex networks [1]. For example, triadic connections are more likely to be established among individuals who share common friendships [4], and a four-node motif (the semi-clique) can be found in co-authorship networks [1].

Building upon the recent development from [3], we develop a class of random-graph models, weighted subgraph generated models (wSUGMs), that explicitly include higher-order structures in the formation process. In these models, various subgraphs (e.g., links, triangles, semi-cliques) are generated with different probabilities, and the resulting network is the direct sum of all such subgraph-generated networks, normalized to keep the edge weights within [0,1].

We use a Chernoff inequality for matrices to show that the maximum eigenvalue of the adjacency matrix of such a random graph can be approximated by the maximum eigenvalue of the expected graph with high probability. The corresponding error bound depends only on the graph size n and the maximum expected degree of the network, and converges to zero as n grows to infinity. As a corollary of this result, we show that critical nodes in the realized network can be predicted using only the generating model. Specifically, we derive probability bounds for the convergence of centrality measures (degree, eigenvector, and Katz centrality) of nodes in networks sampled from the wSUGM to the centrality measures of the corresponding nodes in the expected network.

This convergence result justifies the use of the expected network instead of the realized one for drawing conclusions about node importance. This is of practical interest, as in many cases information about the exact realized network may not be available or may be changing over time (e.g., nodes may join or leave, and new links or triangles may be formed). Our result guarantees that information about the generating model (the wSUGM), which is typically easier and cheaper to collect [2], is sufficient for the analysis of fundamental network properties such as norms and centrality measures.
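The concentration phenomenon described above can be checked numerically in a minimal subgraph-generated model with links and triangles (illustrative probabilities; this unweighted sketch reduces the edge weights to 0/1 and is not the paper's general weighted construction):

```python
import random
from itertools import combinations

def leading_eigenvalue(M, iters=100):
    """Largest eigenvalue of a non-negative symmetric matrix by power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                      # max-norm growth rate -> Perron eigenvalue
        v = [x / lam for x in w]
    return lam

rng = random.Random(7)
n, p_link, p_tri = 120, 0.05, 0.0005

# Sample one network: independent links, then independent triangles overlaid.
A = [[0.0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    if rng.random() < p_link:
        A[i][j] = A[j][i] = 1.0
for i, j, k in combinations(range(n), 3):
    if rng.random() < p_tri:
        for a, b in ((i, j), (i, k), (j, k)):
            A[a][b] = A[b][a] = 1.0

# Expected adjacency under the same model: an edge is absent only if the link
# and every triangle that could carry it all fail to form.
p_edge = 1 - (1 - p_link) * (1 - p_tri) ** (n - 2)
P = [[p_edge if i != j else 0.0 for j in range(n)] for i in range(n)]

lam_A = leading_eigenvalue(A)
lam_P = leading_eigenvalue(P)
print(f"realized lambda_max {lam_A:.2f} vs expected lambda_max {lam_P:.2f}")
```

Even at this modest size the realized and expected leading eigenvalues are close, consistent with the error bound shrinking as the graph grows.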

[1] Benson, Austin R., David F. Gleich, and Jure Leskovec. "Higher-order organization of complex networks." Science 353.6295 (2016): 163-166.
[2] Breza, Emily, et al. "Using aggregated relational data to feasibly identify network structure without network data." American Economic Review 110.8 (2020): 2454-84.
[3] Chandrasekhar, Arun G., and Matthew O. Jackson. "A network formation model based on subgraphs." Available at SSRN 2660381 (2016).
[4] Granovetter, Mark S. "The strength of weak ties." American Journal of Sociology 78.6 (1973): 1360-1380.

Applying power-law models without assuming parameter values

ABSTRACT. The power-law distribution is one of the most important statistical models in network science. The usual technique for applying a power-law model is to first infer the scale-exponent parameter from an observed data set and then use the estimated exponent to generate synthetic data sets. This approach has important limitations: it does not apply a general power-law model but the single specific distribution that happens to provide the best statistical fit to the observations, and it destroys correlations and other constraints typically present in complex network data. Here we propose a constrained surrogate method that overcomes these limitations by choosing uniformly at random from the set of sequences exactly as likely to be observed under a discrete power law as the original sequence, regardless of scale exponent, and by showing how additional constraints can be imposed on the sequence. This non-parametric approach redistributes the observed prime factors to randomize values in accordance with a power-law model, without restricting ourselves to a single scale exponent or destroying correlations. We test our results on simulated and real data sets.

Tuning (global, spectral) eigenvector localization using a (local, structural) centrality measure

ABSTRACT. This work studies the relationships between X-degree and eigenvector localization of the non-backtracking matrix of a graph. The former is a local, structural quantity formally derived from a perturbation analysis of the non-backtracking matrix. The latter is a global, spectral quantity known to indicate the breakdown of certain mean-field approximations when applied to real-world networks. Our main contribution is to show how a local, structural quantity predicts and controls a global, spectral quantity; we do so via a new growth model that optimizes X-degree in order to tune the amount of eigenvector localization observed.

Hypergraph Generative Models and Associated Laplacians

ABSTRACT. Many complex systems involve interactions between more than two agents. Hypergraphs capture these higher-order interactions through hyperedges that may link more than two nodes. We consider the problem of mapping nodes in a hypergraph into one or higher dimensional locations such that most interactions are short-range. This embedding is relevant to many follow up tasks, such as node reordering, node clustering, and visualization. We show that two spectral hypergraph embedding algorithms, which reveal linear and periodic structures respectively, are associated with a new class of hypergraph generative model. The model assigns a probability to each hyperedge which decays with the sum of the squared pairwise distances between nodes in the hyperedge, therefore encouraging short-range connections. This random graph model allows us to quantify the relative presence of periodic and linear structures in the data through maximum likelihood. We demonstrate this approach on synthetic and real-world hypergraphs.
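A minimal sketch of such a generative model (our toy illustration with 1-D node locations and 3-node hyperedges; the decay rate beta and the sizes are assumptions, not the paper's parameters):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, beta = 30, 5.0
x = np.sort(rng.random(n))  # 1-D node locations (the embedding)

def hyperedge_prob(nodes):
    # Probability decays with the sum of squared pairwise distances.
    d2 = sum((x[i] - x[j]) ** 2 for i, j in itertools.combinations(nodes, 2))
    return np.exp(-beta * d2)

# Sample every candidate 3-node hyperedge independently.
hyperedges = [e for e in itertools.combinations(range(n), 3)
              if rng.random() < hyperedge_prob(e)]

# Short-range hyperedges dominate: mean span of sampled vs. all triples.
span = lambda e: x[e[2]] - x[e[0]]
print(np.mean([span(e) for e in hyperedges]) <
      np.mean([span(e) for e in itertools.combinations(range(n), 3)]))  # True
```

The same likelihood can then be evaluated for a periodic (circular) embedding, and the maximum-likelihood comparison quantifies which structure better describes the data.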

14:50-16:05 Session 3B: Structure I
Community detection and reciprocity in networks by jointly modeling pairs of edges

ABSTRACT. To unravel the driving patterns of networks, the most popular models rely on community detection algorithms. However, these approaches are generally unable to reproduce the structural features of the network. Therefore, attempts are continually made to develop models that incorporate these network properties besides the community structure. Here, we present a probabilistic generative model and an efficient algorithm, JointCRep, that both performs community detection and captures reciprocity in networks. Our approach differs from previous studies in that JointCRep jointly models pairs of edges with exact 2-edge joint distributions, without relying on pseudo-likelihood approximations or conditional independence assumptions. In addition, it provides closed-form analytical expressions for both marginal and conditional distributions. Specifically, JointCRep is a probabilistic generative model that estimates the likelihood of network ties with a bivariate Bernoulli distribution whose log-odds are linked to community memberships and pair-interaction variables. The numerical implementation uses an Expectation-Maximization algorithm that is efficient, as it exploits the sparsity of the network, and we provide open-source code online.

We validate our model on synthetic data in recovering communities, in edge prediction tasks, and in generating synthetic networks that replicate the reciprocity values observed in real networks. We also highlight these findings on two real datasets that are relevant for social scientists and behavioral ecologists. We observe that JointCRep captures reciprocity well, with a substantial improvement in community detection tasks and improved performance in edge prediction. That is, by considering 2-point joint distributions, and thus relaxing the common conditional independence assumption, our model overcomes the limitations both of standard algorithms and of the model that incorporates reciprocity but relies on a pseudo-likelihood approximation. To the best of our knowledge, JointCRep is the first method for jointly modeling pairs of edges with exact 2-edge joint distributions. In addition to providing standard analysis tools, it allows practitioners to answer more accurately questions that were not fully captured by standard models, for instance predicting the joint existence of mutual ties between pairs of nodes.

The preprint is available online at

Community structure in hypergraphs and the emergence of polarization

ABSTRACT. See attached.

Asymptotic Properties of the ABCD Graph Benchmark with Community Structure

ABSTRACT. The Artificial Benchmark for Community Detection (ABCD) is a random graph model with community structure and power-law distributions of both degrees and community sizes. The model generates graphs with properties similar to those of the well-known LFR benchmark, but with better scalability. Another advantage of ABCD is its simplicity, which, unlike other models, allows for theoretical analysis. In this work, we investigate various theoretical asymptotic properties of the ABCD model, including (i) the degree distribution, (ii) the distribution of community sizes, (iii) the assignment of nodes to communities (a.k.a. the ground-truth partition), and (iv) the modularity of the ground-truth partition.

Systematic assessment of the quality of fit of the stochastic block model for empirical network data

ABSTRACT. We perform a systematic analysis of the quality of fit of the stochastic block model (SBM) for 275 empirical networks spanning a wide range of domains and orders of magnitude in size. We employ posterior predictive model checking as a criterion to assess the quality of fit, which involves comparing networks generated by the inferred model with the empirical network, according to a set of network descriptors. We observe that the SBM is capable of providing an accurate description for the majority of networks considered, but falls short of saturating all modeling requirements. In particular, networks possessing a large diameter and slow-mixing random walks tend to be badly described by the SBM. However, contrary to what is often assumed, networks with a high abundance of triangles can be well described by the SBM in many cases. We demonstrate that simple network descriptors can be used to evaluate whether or not the SBM can provide a sufficiently accurate representation, potentially pointing to possible model extensions that can systematically improve the expressiveness of this class of models.
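The posterior predictive logic can be sketched as follows (a toy illustration, not the authors' pipeline: block assignments are taken as known, and the global clustering coefficient stands in for the full set of descriptors):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_sbm(sizes, P, rng):
    # Sample an undirected SBM adjacency matrix.
    g = np.repeat(np.arange(len(sizes)), sizes)   # block labels
    probs = P[np.ix_(g, g)]
    A = np.triu((rng.random(probs.shape) < probs).astype(float), 1)
    return A + A.T

def global_clustering(A):
    # Transitivity: trace(A^3) / (number of ordered paths of length 2).
    A2 = A @ A
    triangles = np.trace(A2 @ A)
    triples = A2.sum() - np.trace(A2)
    return triangles / triples if triples else 0.0

sizes = [50, 50]
P = np.array([[0.15, 0.02], [0.02, 0.15]])  # assumed inferred block matrix

A_obs = sample_sbm(sizes, P, rng)           # stand-in for the empirical network
c_obs = global_clustering(A_obs)
c_rep = [global_clustering(sample_sbm(sizes, P, rng)) for _ in range(50)]
score = np.mean([c >= c_obs for c in c_rep])  # posterior-predictive p-value
print(round(score, 2))
```

A score near 0 or 1 flags a descriptor the fitted model fails to reproduce; intermediate values indicate an adequate fit for that descriptor.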

Identification of critical cliques in complex networks

ABSTRACT. Traditionally, the interactions among nodes are described pairwise by the links in a network. However, in many real systems, such as human social networks, biological systems, and technological networks, interactions go beyond the pairwise. Therefore, identifying critical higher-order units in complex networks is of theoretical and practical significance. First, we propose several higher-order centralities, such as the higher-order circle ratio, generalized degree, higher-order h-index, and higher-order PageRank, to quantify the importance of higher-order structures. Simulation results show that the higher-order PageRank index and the higher-order circle ratio index are more competitive in identifying the importance of higher-order structures due to the richer information they contain. Meanwhile, we compare the proposed indexes with traditional lower-order ones and find that the effects of the two types of centralities are related to the number of cliques in the network. Then, we study the impact of the structures identified by the various indexes on network robustness, node-system synchronization, and propagation dynamics (see Fig. 1). The consistency of the results verifies our conclusions. Finally, the effectiveness of our proposed metrics is further studied in higher-order networks with a specific distribution, which confirms their generality. The higher-dimensional structure of a network helps in understanding its properties, and the findings of this work provide a new perspective on the control, prediction, and protection of various complex networks.

14:50-16:05 Session 3C: Social Media I
Detecting and modelling real percolation and phase transitions of information on social media

ABSTRACT. It is widely believed that information spread on social media is a percolation process, with parallels to phase transitions in theoretical physics. However, evidence for this hypothesis is limited, as phase transitions have not been directly observed in any social media. Here, through an analysis of 100 million Weibo and 40 million Twitter users, we identify percolation-like spread and find that it happens more readily than current theoretical models would predict. The lower percolation threshold can be explained by the existence of positive feedback in the coevolution between network structure and user activity level, such that more-active users gain more followers. Moreover, this coevolution induces an extreme imbalance in users’ influence. Our findings indicate that the ability of information to spread across social networks is higher than expected, with implications for many information-spread problems.

Shifting Polarization and Twitter News Influencers between two U.S. Presidential Elections

ABSTRACT. Social media are decentralized and interactive, empowering users to produce and disseminate information on an increasingly massive scale. These platforms have thus transformed the dynamics of political communication and extended the power to politically influence others beyond political elites and news organizations to the common voter. Here, we analyze the dynamics of political polarization among Twitter users using nearly a billion tweets that we collected over the 2016 and 2020 U.S. presidential elections. From these data, we recreate retweet networks of political news diffusion categorized by political orientation and factual credibility. We then identify the top influencers for each political orientation in terms of their ability to spread information, and analyze their affiliations (or lack thereof) with political and media organizations. Discovering the top influencers for both elections enables a unique comparison, showing how influence has shifted and how the types of influencers have changed. We find that 75% of the top 100 influencers of all political orientations in 2020 were not among the top influencers in 2016, indicating a high susceptibility to turnover. Additionally, the share of influencers affiliated with media organizations shrank by 10% from 2016 to 2020. The replacement came mostly from influencers affiliated with political organizations with center- or right-leaning political orientations; however, unaffiliated influencers also advanced, taking over about one third of that drop. Interestingly, for influencers with extreme-right political orientations, and those spreading fake news, we observe that the proportion of new media-affiliated influencers is larger in 2020 than in 2016. We complement this work by analyzing the levels of ideological polarization induced by the influencers and by those who propagate their content, and compare them across the two elections.
First, we project influencers onto a similarity graph, where the edge between any two influencers represents the weighted overlap of the Twitter users that retweet their content. Running community detection, we find two persistent communities, one containing left-leaning influencers and the other right-leaning influencers. Using modularity and the normalized graph cut on this network, we quantify community strength and separation. We find that the overlap between users retweeting left-oriented content and those retweeting right-oriented content decreased in 2020 compared to 2016, as evidenced by a decrease in inter-community edge weight. Meanwhile, the strength of the communities themselves has increased. These results point to increasing polarization. We reinforce this conclusion by separately inferring the ideology of Twitter users based on the influencers they retweet, projecting the bipartite graph between influencers and the Twitter users that retweeted them onto a latent ideology scale using correspondence analysis. The resulting ideology distribution (see Figure 1) reveals a noticeable increase in the polarization of the influencers and of their retweeters from 2016 to 2020, shown through increasing bimodality. Ultimately, the observed increase in polarization, coupled with the significant turnover rate of influencers between elections, suggests the onset of new, highly polarized influencers on Twitter.
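The latent-ideology step can be sketched with a plain SVD-based correspondence analysis on a synthetic user-by-influencer retweet matrix (an illustration on invented data, not the paper's pipeline):

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic retweet counts: two user blocs, each favoring one influencer pair.
N = np.vstack([rng.poisson([5, 5, 0.2, 0.2], size=(40, 4)),
               rng.poisson([0.2, 0.2, 5, 5], size=(40, 4))]).astype(float)

P = N / N.sum()
r, c = P.sum(1), P.sum(0)                           # row / column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
U, s, Vt = np.linalg.svd(S, full_matrices=False)

user_ideology = U[:, 0] * s[0] / np.sqrt(r)  # principal row coordinates
# The two blocs separate on the first (latent ideology) axis.
print(np.sign(user_ideology[:40].mean()) != np.sign(user_ideology[40:].mean()))  # True
```

With real data, the bimodality of this first-axis distribution is what signals polarization, and comparing its shape across elections quantifies the change.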

Detecting and Forecasting Local Collective Sentiment Using Emojis

ABSTRACT. The analysis of collective social sentiment using large-scale data obtained from the Internet, such as social media data, has been actively conducted in recent years, but few studies have considered the geographical distribution of sentiments or their spatial dynamics. In this study, we analyzed tweets associated with location information to detect the local collective sentiment of each prefecture in Japan, especially in response to societal events. To extract positive and negative sentiments, we used emojis as language-independent universal indicators of positive/negative sentiment. We found that negative sentiment increased nationwide on the day a major typhoon hit and after the onset of the COVID-19 pandemic in Japan, while positive sentiment increased around Christmas and the announcement of university and high school admission decisions, with some geographical variation. We also built a linear regression model to forecast the local positive sentiment of a prefecture from other prefectures' past values, which achieved reasonable predictability with R2 = 0.5–0.6. Based on the coefficient matrix of this sentiment forecast model, we constructed a causal network of prefecture sentiments in Japan. Interestingly, the relationships among prefectures and their centralities changed significantly before and after the COVID-19 pandemic.
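The forecasting-plus-network step can be sketched as follows (synthetic data; the lag structure, threshold, and dimensions are our assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(9)
T, R = 300, 5                       # days, regions
S = rng.normal(size=(T, R))
S[1:] += 0.5 * S[:-1, [0]]          # region 0 leads all others (by construction)

X, Y = S[:-1], S[1:]                # lagged predictors and next-day targets
B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # B[i, j]: influence of i on j

# Edges of the inferred directed network: coefficients above a threshold.
edges = [(i, j) for i in range(R) for j in range(R) if i != j and B[i, j] > 0.25]
print(edges)  # only region 0 should have outgoing edges
```

With real data, each coefficient matrix entry becomes a weighted directed edge between prefectures, and standard centrality analysis can then be applied to the resulting network.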

Spiral of silence in online political deliberation: YouTube during the 2020 US presidential elections

ABSTRACT. Understanding how individuals respond to social influence is of paramount importance to studying political collective behavior (Huckfeldt et al. 2013). An often overlooked aspect in opinion dynamics models is the possibility that opinion feedback does not imply reconsideration but may instead lead to self-censorship (Gaisbauer, Olbrich and Banisch 2020). We explore online political deliberation within the framework of the Spiral of Silence theory, which states that in a public setting where individuals believe they hold a minority position, fear of isolation will lead them to think that not sharing their own opinion is in their best interest (Hayes and Matthes 2017). We use this framework to determine whether, when an online conversation about politics turns contentious, individuals are less likely to share their opinions publicly, i.e., engage in self-censorship. Studying conversations in YouTube videos posted by 10 major US news outlets before and after the 2020 U.S. Presidential election, we find that users often influence each other's tone. Having inferred comments' negativity and toxicity, we find that the toxicity of a response depends on the toxicity of the parent comment; the same holds for negativity. Furthermore, when looking at an individual's transition from an uncensored state to one of self-censorship, we find that once an individual has decided that not sharing their own opinion is in their best interest, they are unlikely to revert this decision.

Inferring the ideological space of multipolar social systems

ABSTRACT. Social polarization is a pervasive phenomenon that strains social relations, erodes trust in institutions and, as exemplified by the recent push against vaccines in highly polarized societies such as the United States, may even jeopardize public health. Traditionally, polarization has been framed as a dichotomous conflict, usually defined as the division of a population into two contrasting groups. However, today’s political conflicts often involve not two, but multiple potentially dissenting factions. We call these contexts multipolar systems. The most representative examples are multi-party democracies, where the multilateral tensions between the different parties often lead to gridlock and uncertainty. Therefore, measuring and characterizing polarization in multipolar systems is a critical endeavour for modern societies.

We present a method to infer opinions in multidimensional contexts through networks of interactions based on the DeGroot learning process. To characterize and measure the polarization of the inferred opinion distribution we propose different metrics based on the covariance matrix, which is the multidimensional generalization of the variance, a quantity often adopted as a one-dimensional measure of polarization. The main characteristic of our multidimensional framework is that, instead of assuming the underlying ideological structure, such as conservative vs progressive, or liberal vs authoritarian, etc., it can reveal the natural space that best describes the social landscape, which does not necessarily correspond to traditional categories. We do so by modeling the ideological space as a multidimensional simplex, with the opinion poles placed at the vertices of the simplex.
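The DeGroot learning process at the core of the method can be sketched as follows (an illustrative dense trust matrix, not the inferred interaction network):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 50, 3                       # nodes, opinion poles (simplex vertices)
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)  # row-stochastic influence weights

X = rng.dirichlet(np.ones(k), n)   # initial opinions inside the simplex
for _ in range(200):
    X = W @ X                      # DeGroot update: average neighbors' opinions

# A strongly connected, aperiodic W drives all rows to a common consensus;
# rows stay on the simplex because W is row-stochastic.
print(np.allclose(X, X[0], atol=1e-6))  # True
```

In the paper's multipolar setting, the interesting regime is before consensus (or with stubborn poles fixed at the simplex vertices), where the stationary opinion distribution is what gets analyzed through its covariance matrix.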

By applying this methodology to Twitter data from multi-party elections, we find that the main axis of polarization is the left-wing / right-wing split. The spontaneous emergence of this axis from the computation not only supports the robustness and explanatory power of our method but also quantitatively validates the traditional left / right classification scheme of political parties. However, our most striking finding comes from the secondary axes of polarization, which reveal non-trivial tensions specific to each system. These tensions can be understood in terms of the underlying socio-political context, but their importance could not have been anticipated from classical political theory alone; the adaptability of our approach overcomes this limitation.

14:50-16:05 Session 3D: Science of Innovation I
First-mover advantage drives gender disparities in Physics citations

ABSTRACT. Mounting evidence suggests that the publications and citations of scholars in STEM fields suffer from gender biases. Such biases often cause an invisibility syndrome in women and other minorities, resulting in a higher dropout rate among women, a phenomenon known as the leaky pipeline. It is thus of societal importance to accurately identify these biases and devise bottom-up approaches to tackle them.

In this work, we focus on analysing publication and citation patterns in the physics community. We infer the gender of each paper's primary author by combining their name with a picture-based method. Two papers that cover similar topics in a comparable way are assumed to include a similar set of outgoing citations. Thus, to detect pairs of similar papers, we apply a statistical test based on the hypergeometric distribution to check whether the size of the common set of citations is so large that it cannot be explained by randomness. Then, we compare the incoming citations to each element of the pair; if the two publications are respectively led by a man and a woman, this comparison allows us to detect potential inequalities in the citation patterns.
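The hypergeometric overlap test can be sketched with scipy (the counts below are invented for illustration):

```python
from scipy.stats import hypergeom

M = 50_000        # papers available to cite (population size)
n1, n2 = 40, 35   # reference-list sizes of the two papers
k = 12            # shared references observed

# P(overlap >= k) if the two reference lists were drawn independently
# at random: survival function of the hypergeometric distribution.
p_value = hypergeom.sf(k - 1, M, n1, n2)
print(p_value < 1e-6)  # True: far too much overlap to be random
```

Pairs with a p-value below a chosen significance threshold are declared topically similar, and only those pairs enter the incoming-citation comparison.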

We have found that the average number of citations received by publications with male primary authors is higher than that received by publications with female primary authors. However, our results suggest that the temporal difference between two papers (the first-mover advantage) is what drives this citation disparity. The data present a decreasing trend, confirming that papers published first tend to obtain a higher number of citations, regardless of gender.

These results combined suggest that the overall disparity in the citation network is driven by the cumulative advantages and the first-mover effect that men have in Physics. This cumulative advantage, however, could create implicit biases that should be tackled by appropriate policies that foster the participation of women and other minorities.

Become a better you: correlation between the change of research direction and the change of scientific performance

ABSTRACT. It is important to explore how scientists decide their research agendas and the corresponding consequences, as their decisions collectively shape contemporary science. Previous studies have focused on the overall performance of individuals with different problem-choosing strategies. Here we ask a slightly different and relatively unexplored question: how is a scientist's change of research agenda associated with her change in scientific performance? Using the publication records of over 14,000 authors in physics, we quantitatively measure the extent of research direction change and the performance change of individuals (Fig. 1a-b). We identify a strong positive correlation between direction change and impact change (Fig. 1c): scientists with a larger change of research topics are more likely to produce works with increased scientific impact compared to their past ones. On the other hand, after excluding the influence of the mediating variable of overall output n, direction change is not associated with productivity change (Fig. 1d): those who stay with familiar topics do not publish faster than those who venture out and establish themselves in a new field. In general, the statistics provide an encouraging prediction for scientists who venture into a new field: once they are established there, they are likely to become better scientists. Supplementary analyses of the choice of the parameter m and of the relative position between the two sequences of m papers demonstrate the robustness of our conclusion. Moreover, the gauge of direction change in this work is uncorrelated with the diversity change of the research agenda (Fig. 1e) and the switching probability among topics (Fig. 1f), capturing the evolution of individual careers from a new point of view. Though the finding is inevitably affected by survivorship bias, it sheds light on a range of problems in the career development of individual scientists.

(CANCELLED) Firm network community and Collaborative innovation: Does connectedness matter?

ABSTRACT. Drawing on a social network perspective, we explore the effect of firm network communities on corporate collaborative innovation. We use managerial interlocks to construct the firm network. Our results show that the connectedness of a firm network community positively affects firms' collaborative innovation. We document that a dense community helps build social trust, conveys more information, and further promotes collaborative innovation. We also find that a core position in the network strengthens this relationship. Further analysis shows that the positive effect of the community on firm collaborative innovation is primarily found in firms with a strong technological innovation level and in firms located in regions with high social trust, strong intellectual property protection, and good network infrastructure. Our analysis enriches the understanding of the effect of informal network communities and provides new insights into promoting firms' collaborative innovation.

Modeling minority arrival in scientific citation networks

ABSTRACT. We begin by studying the growth dynamics of publications by men and women in the American Physical Society (APS) corpus, which comprises 564,517 papers published in the APS journals from 1893 to 2015 and a total of 6,715,562 citations. We identify the genders of the first authors of 216,263 papers in the dataset and group the papers into two groups: female-led papers and male-led papers. Observing the growth dynamics of papers in the two groups, we find: (1) the average in-degree growth is time-invariant and group-independent, i.e., the average number of citations received by a paper after τ years is independent of the paper's publication time and of its group; (2) despite different initial conditions, the sizes (numbers of nodes) of both groups have grown exponentially with the same exponent over the last few decades; (3) the ratio of the sizes of the two groups starts by fluctuating in a high range but quickly converges to a low range as the system stabilizes.

To explain the empirical growth dynamics, we propose a network growth model with two mechanisms: preferential attachment and homophily. We adopt the preferential attachment network model with time-invariant degree growth introduced by Sun et al., and generalize the homophily model introduced by Karimi et al. The preferential attachment mechanism considers three main contributing factors: node degree, a classical amplifier that can arise from various mechanisms such as the reference-copying process; node fitness, a reflection of intrinsic differences between the nodes; and aging, a mechanism that reflects the natural preference for new nodes and, at the same time, limits the strong bias towards old ones. In the homophily mechanism, the homophily values determine the tendency of nodes to connect to the same group as the network grows. Through these mechanisms, a newly arriving node (new paper) creates connections (citations) to existing ones (existing papers) in the network. Our model analytically explains the general properties of publication growth, such as the time-invariant and group-independent average degree growth, the exponential growth of the sizes of both groups, and the convergence of the ratio of the two group sizes. Simulation results using our model with the actual homophily values found in the empirical data recover our observations well. Moreover, our model allows us to investigate how different interventions, such as altering homophily values, can influence the growth dynamics and the ratio of the two group sizes.
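The combined preferential attachment and homophily mechanism can be sketched as follows (a toy version: fitness and aging are omitted, and all parameter values are ours, not fitted to the APS data):

```python
import numpy as np

rng = np.random.default_rng(6)
h = 0.8             # homophily: weight for citing one's own group
m = 3               # citations made by each new paper
minority_frac = 0.3 # probability a new paper belongs to group 1

node_group = [0, 1]            # seed nodes, one per group
degree = np.array([1.0, 1.0])
for _ in range(500):
    g_new = int(rng.random() < minority_frac)
    same = (np.array(node_group) == g_new)
    w = degree * np.where(same, h, 1 - h)   # PA kernel x homophily kernel
    w = w / w.sum()
    cited = rng.choice(len(node_group), size=min(m, len(node_group)),
                       replace=False, p=w)
    degree[cited] += 1                      # citations received
    node_group.append(g_new)
    degree = np.append(degree, 1.0)

node_group = np.array(node_group)
print(round(degree[node_group == 0].mean(), 2),
      round(degree[node_group == 1].mean(), 2))
```

Varying h then acts as the intervention mentioned above: h > 0.5 amplifies within-group citation, while h < 0.5 favors cross-group citation.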

Mapping scientific foraging

ABSTRACT. Science is foraging: collective exploration and exploitation of the space of knowledge by scientists. This process underpins the growth and decline of research fields, the utility of knowledge for future discoveries, and the training of scientists. Conceptualizing the "location" of scientists in the knowledge space is instrumental to understanding scientific foraging. Many approaches are built on bag-of-topics representations, which represent one's interest as a distribution over topics identified by keywords, subject categories, and references. One can then operationalize how scientists change their interests by measuring the changes in this distribution.

A major limitation of the bag-of-topics approach is that the topic distribution does not take into account the relationships between topics. For example, physics spans a spectrum from theoretical to experimental studies. When a scientist moves from one topic to another on this spectrum, the degree of change can be radical or moderate depending on the "distance" between the topics, which is crucial but neglected in bag-of-topics representations. How can we reflect these nuanced relationships between topics in the representation of a scientist's interests?

Here, we abandon the bag-of-topics representation altogether in favor of a geometric representation: a knowledge space. We construct the knowledge space by embedding papers into a high-dimensional vector space such that two papers are close to each other if they are similar in terms of title semantics and close in a citation network (Figs. A and B). We then represent a scientist's research interest in year t as the centroid of the papers that the scientist cited in year t. This representation of research interest captures the relationships between topics. For example, a scientist who publishes papers about particle physics is mapped to a point that is close to other papers on the same topic as well as papers about nuclear physics, because the two topics are related. Through validations with papers published in Physical Review journals, we demonstrate that the knowledge space captures more nuanced changes in research interests than the bag-of-topics representations (Fig. C). In addition, an author's location is not only indicative of topics but also strongly predictive of new collaborations, with accuracy substantially greater than simple predictors based on collaboration networks and the bag-of-topics representation (Fig. D). Research trajectories are coherent with time and career age, providing richer insights into the evolution of scientists' research interests (Fig. E), and we show their correlations with productivity and impact.
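The centroid representation itself is simple to sketch (random vectors standing in for the learned paper embeddings; years and citation counts are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
papers = rng.normal(size=(1000, 64))   # paper embeddings (the knowledge space)
# Papers a scientist cited in each year.
cited = {2019: rng.choice(1000, 30), 2020: rng.choice(1000, 30)}

# Interest in year t = centroid of that year's cited papers.
centroid = {t: papers[idx].mean(axis=0) for t, idx in cited.items()}
# Interest change = distance between consecutive centroids.
interest_shift = np.linalg.norm(centroid[2020] - centroid[2019])
print(round(float(interest_shift), 3))
```

Unlike a bag-of-topics divergence, this distance shrinks when the newly cited papers lie near the previously cited ones in the embedding, even if their topic labels differ.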

14:50-16:05 Session 3E: Economic Networks I
Macroscopic properties of buyer-seller networks in online marketplaces

ABSTRACT. Online marketplaces are the main engines of legal and illegal e-commerce, yet their empirical properties are poorly understood due to the absence of large-scale data. We analyze two comprehensive datasets containing 245M transactions (16B USD) that took place on online marketplaces between 2010 and 2021, covering 28 dark web marketplaces, i.e., unregulated markets whose main currency is Bitcoin, and 144 product markets of one popular regulated e-commerce platform. We show that transactions in online marketplaces exhibit strikingly similar patterns despite significant differences in language, lifetimes, products, regulation, and technology. Specifically, we find remarkable regularities in the distributions of transaction amounts, number of transactions, inter-event times and time between first and last transactions. We show that buyer behavior is affected by the memory of past interactions and use this insight to propose a model of network formation reproducing our main empirical observations. Our findings have implications for understanding market power on online marketplaces as well as inter-marketplace competition, and provide an empirical foundation for theoretical economic models of online marketplaces.

Industrial process networks: a trophic analysis

ABSTRACT. Production processes are central to organized societies and have thus been extensively studied in the complex systems and complex networks literature from various viewpoints: interfirm supply chains, trade, and material flow analysis.

Recent works use block models to reconstruct an interfirm network from partial knowledge. While they give deep insight into the structure of production processes, those works leave aside the technical details of production: inputs and outputs are usually aggregated, and industrial processes are not considered. Furthermore, they neglect the ordering intrinsic to production processes, and tend to focus on sectoral layers or communities.

To address the first issue, we turn to large and detailed databases of industrial processes, which have a two-mode structure (processes on the one hand and products on the other). In previous work, short mean path lengths and power-law degree distributions were observed, due to the existence of "hubs" such as utility sectors.

The second point is addressed by leveraging recent theoretical work extending the notion of trophic level, which is closely related to PageRank. For each node a numeric level is computed, while the variance among levels quantifies network coherence.

In this article we first show how these tools can be extended to a bipartite setting; then we argue that industrial processes are well structured into successive discrete levels. We compare with networks from various other fields and discuss possible explanations.
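As a minimal illustration of the trophic-level machinery mentioned above, here is the standard linear-system formulation for a plain directed network (the article's bipartite extension is not reproduced; function names are our own):

```python
import numpy as np

def trophic_levels(A):
    """Improved trophic levels for a (possibly weighted) adjacency matrix A,
    A[i, j] = weight of edge i -> j. Solves Lambda h = v with
    Lambda = diag(u) - A - A.T, u = in-degree + out-degree,
    v = in-degree - out-degree; h is defined up to an additive constant."""
    in_deg = A.sum(axis=0)
    out_deg = A.sum(axis=1)
    Lam = np.diag(in_deg + out_deg) - A - A.T
    # Lambda is singular (constant vectors lie in its null space),
    # so use a least-squares solve and fix the gauge afterwards.
    h, *_ = np.linalg.lstsq(Lam, in_deg - out_deg, rcond=None)
    return h - h.min()  # shift so the lowest level is zero

def incoherence(A, h):
    """Trophic incoherence: mean squared deviation of edge level gaps from 1.
    Zero for a perfectly layered (maximally coherent) network."""
    i, j = np.nonzero(A)
    return float(np.mean((h[j] - h[i] - 1.0) ** 2))
```

On a simple production chain (raw material, intermediate, final product) the levels come out as 0, 1, 2 with zero incoherence; a low variance of edge gaps around 1 is what signals the discrete layering the abstract argues for.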

The time-varying characteristic of similarity and information redundancy among dimensions of stock price transmission: Evidence from China’s lithium battery stocks

ABSTRACT. When studying stock price transmission, many scholars use only the closing price of stocks as a representative. However, considering price transmission from a single dimension is incomplete and may lead to a lack of information and efficiency. Exploring the relationship of stock price transmission among different dimensions can help us understand the impact among different stocks more fully. To explore this relationship, this paper selects China's lithium battery stocks as a case, takes into account the high, low, opening, and closing prices of these stocks, and explores the similarity and information redundancy among dimensions of stock price transmission from a time-varying perspective. First, a multiplex network of Granger causality between stock subjects with four price dimensions is constructed. Then, the similarity between two different dimensions of stock price transmission is explored by calculating the Jensen–Shannon distance between the different layers of the multiplex network. We also use a method for identifying inter-layer information redundancy in multiplex networks to identify the information redundancy among the four dimensions of stock price transmission. In the results, we comprehensively compare the similarity and information redundancy relationships among the four dimensions of lithium battery stock price transmission. We find that the closing price transmission has a high reference value for the low price transmission, while the transmission characteristics of the opening and closing prices need to be considered separately.
In addition, combining the different characteristics of the lithium battery market environment, we analyze the information redundancy relationship between different dimensions of stock price transmission, which provides a reference for reducing the risk of price transmission in the same type of lithium battery market environment.
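The inter-layer comparison can be sketched as follows. This toy version compares two layers only through their degree sequences, which is one simple choice; the Jensen–Shannon distance between layers actually used in the study may be defined on a different node-level distribution:

```python
import numpy as np

def js_distance(p, q, base=2):
    """Jensen-Shannon distance (square root of the JS divergence)
    between two discrete distributions p and q."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # 0 * log(0) is taken as 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def layer_distance(A1, A2):
    """Distance between two layers of a multiplex network on a shared
    node set, via the JS distance of their normalized degree sequences."""
    return js_distance(A1.sum(axis=1), A2.sum(axis=1))
```

With base 2 the distance lies in [0, 1]: identical layers give 0, layers whose degree mass sits on disjoint nodes give 1.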

Structure of international trade hypergraphs

ABSTRACT. We study the structure of the international trade hypergraph consisting of triangular hyperedges representing the exporter-importer-product relationship. Measuring the mean hyperdegree of adjacent vertices, we first find that its behavior differs from that in pairwise networks and explain the origin by tracing the relation between the hyperdegree and the pairwise degree. To interpret the observed hyperdegree correlation properties in the context of trade strategies, we decompose the correlation into two components, identifying one with the background correlation that remains even in exponential random hypergraphs preserving the given empirical hyperdegree sequence. The other component characterizes the net correlation and reveals a bias of low-hyperdegree exporters towards high-hyperdegree importers and low-hyperdegree products, information that is not readily accessible in pairwise networks. Our study demonstrates the power of the hypergraph approach in the study of real-world complex systems and offers a theoretical framework.
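The two basic quantities the abstract builds on, the hyperdegree and the mean hyperdegree of adjacent vertices, can be computed directly from a list of triangular hyperedges. A minimal sketch (the correlation decomposition against exponential random hypergraphs is not reproduced here):

```python
from collections import defaultdict

def hyperdegrees(hyperedges):
    """Hyperdegree = number of hyperedges a vertex belongs to.
    hyperedges: iterable of tuples, e.g. (exporter, importer, product)."""
    deg = defaultdict(int)
    for e in hyperedges:
        for v in e:
            deg[v] += 1
    return dict(deg)

def mean_neighbor_hyperdegree(hyperedges):
    """Mean hyperdegree of the vertices adjacent to each vertex, where
    adjacency means co-membership in a hyperedge (counted per hyperedge)."""
    deg = hyperdegrees(hyperedges)
    neigh = defaultdict(list)
    for e in hyperedges:
        for v in e:
            neigh[v].extend(w for w in e if w != v)
    return {v: sum(deg[w] for w in ws) / len(ws) for v, ws in neigh.items()}
```

For example, with hyperedges ("E1","I1","P1") and ("E1","I2","P1"), exporter E1 and product P1 have hyperdegree 2, each importer 1, and E1's neighbors have mean hyperdegree 1.5.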

Inequality in economic shock exposures across the global firm-level supply network

ABSTRACT. For centuries, national economies have created wealth by engaging in international trade and production. The resulting international supply networks not only increase wealth for countries but also create systemic risk: economic shocks triggered by company failures in one country may propagate to other countries. When working with aggregate data, the effect of these shocks is typically dramatically underestimated [1]. Using global firm-level supply network data [2], we present a method to estimate a country's exposure to direct and indirect economic losses caused by the failure of a company in another country. We show the network of systemic risk flows across the world. We find that rich countries expose poor countries to systemic risk much more than the other way round. We demonstrate that higher systemic risk levels are not compensated by a risk premium in GDP, nor do they correlate with economic growth. Systemic risk around the globe appears to be distributed more unequally than wealth. These findings put the often-praised benefits of globalized production for developing countries in a new light, since they relate them to the risks involved in the production processes. Exposure risks present a new dimension of global inequality, one that most affects the poor in supply shock crises. It becomes fully quantifiable with the proposed method.
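The indirect losses mentioned above arise from shocks propagating along supply links. A deliberately crude sketch of that propagation follows; unlike the talk's method, which estimates partial, weighted losses, this toy version assumes every input is critical and non-substitutable, so failure spreads to all downstream customers:

```python
from collections import deque

def affected_firms(customers, failed):
    """Toy shock propagation: starting from one failed firm, propagate
    failure downstream along supplier -> customer links via BFS.
    customers: dict mapping each firm to a list of its customer firms."""
    hit = {failed}
    queue = deque([failed])
    while queue:
        firm = queue.popleft()
        for c in customers.get(firm, ()):
            if c not in hit:
                hit.add(c)
                queue.append(c)
    return hit
```

Aggregating the hit set by the country of each firm gives a first, upper-bound notion of one country's exposure to a failure in another.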

14:50-16:05 Session 3F: Biological Networks I
Structure-based approach to identifying small sets of driver nodes in biological networks

ABSTRACT. In network control theory, driving all the nodes in the Feedback Vertex Set (FVS) forces the network into one of its attractors (long-term dynamic behaviors) [1,2], but the FVS is often composed of more nodes than can be realistically manipulated in a system; for example, only up to three nodes can be controlled in intracellular networks, while their FVS may contain more than 10 nodes. Previous studies have shown that smaller control sets exist [1-3], but they cannot be identified without knowledge of the network's dynamical model. We developed an approach to rank subsets of the FVS in Boolean models of intracellular networks using topological, dynamics-independent measures. We investigated the use of topological prediction measures: the centrality measures distance and out-degree, the propagation measures CheiRank and PRINCE Propagation, and two cycle-based measures. Using each measure, every FVS subset was ranked and then evaluated against two dynamics-based metrics that quantify the ability of interventions to drive the system towards or away from its attractors: To Control and Away Control. After examining an array of biological networks, we found that the FVS subsets ranked highest by the propagation measures can most effectively control the network. This result was independently corroborated on a second array of different Boolean models of biological networks. Notably, the FVS subsets that we identified had greater than 85%/95% accuracy for achieving high To/Away Control when using the most stringent cutoff. Consequently, overriding the entire FVS is not required to drive a biological network to one of its attractors, and this method provides a way to reliably identify effective FVS subsets without requiring knowledge of the network's dynamics [4].

[1] A. Mochizuki, B. Fiedler, G. Kurosawa, and D. Saito, Dynamics and control at feedback vertex sets. II: A faithful monitor to determine the diversity of molecular activities in regulatory networks, Journal of Theoretical Biology 335, 130 (2013).
[2] J. G. T. Zañudo, G. Yang, and R. Albert, Structure-based control of complex networks with nonlinear dynamics, Proceedings of the National Academy of Sciences 114, 7234 (2017).
[3] A. J. Gates, R. Brattig Correia, X. Wang, and L. M. Rocha, The effective graph reveals redundancy, canalization, and control pathways in biochemical regulation and signaling, Proceedings of the National Academy of Sciences 118, 10.1073/pnas.2022598118 (2021).
[4] E. Newby, J. G. T. Zañudo, and R. Albert, Structure-based approach to identifying small sets of driver nodes in biological networks,
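Finding a feedback vertex set is itself NP-hard; as background to the abstract above, here is a toy greedy heuristic that produces a (generally non-minimal) FVS of a directed graph. It is not the method of the talk, which ranks subsets of a given FVS rather than constructing one:

```python
def has_cycle(adj):
    """DFS cycle check on a digraph given as dict node -> iterable of successors."""
    state = {}  # absent = unvisited, 1 = on stack, 2 = finished
    def dfs(u):
        state[u] = 1
        for v in adj.get(u, ()):
            s = state.get(v, 0)
            if s == 1 or (s == 0 and dfs(v)):
                return True
        state[u] = 2
        return False
    return any(state.get(u, 0) == 0 and dfs(u) for u in list(adj))

def greedy_fvs(adj):
    """Heuristic feedback vertex set: repeatedly delete the node of highest
    total degree until the remaining digraph is acyclic."""
    adj = {u: set(vs) for u, vs in adj.items()}
    fvs = []
    while has_cycle(adj):
        deg = {u: len(vs) for u, vs in adj.items()}  # out-degree
        for vs in adj.values():
            for v in vs:
                deg[v] = deg.get(v, 0) + 1  # add in-degree
        u = max(deg, key=deg.get)
        fvs.append(u)
        adj.pop(u, None)
        for vs in adj.values():
            vs.discard(u)
    return fvs
```

On a three-node cycle this removes a single node, which already suffices to make the graph acyclic; on a DAG it returns an empty set.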

Role of mitochondrial genetic interactions in determining adaptation to high altitude human population

ABSTRACT. Physiological and haplogroup studies performed to understand high-altitude adaptation in humans are limited to individual genes and polymorphic sites. Due to stochastic evolutionary forces, the frequency of a polymorphism is affected by changes in the frequency of a nearby polymorphism on the same DNA sample, making them connected in terms of evolution. Here, we first provide a method to model these mitochondrial polymorphisms as "co-mutation networks" for three high-altitude populations: Tibetan, Ethiopian, and Andean. Then, by transforming these co-mutation networks into weighted, undirected gene–gene interaction (GGI) networks, we identify functionally enriched genetic interactions of the CYB and CO3 genes in the Tibetan and Andean populations, while NADH dehydrogenase genes play a significant role in high-altitude adaptation in the Ethiopian population. These co-mutation-based genetic networks provide insights into the role of different sets of genes in high-altitude adaptation in human sub-populations.

Integrating gene-perturbation networks with diverse disease phenotypes and cheminformatic data for cell type-guided drug discovery

ABSTRACT. Large-scale pharmacogenomic databases such as the Connectivity Map (CMap) have greatly assisted computational drug discovery. Despite their utility, CMap studies have mostly been agnostic to gene-perturbation interactions in multiple disease contexts. We present a computational framework that uses the recent large-scale CMap to build over 50 cell type-specific gene-perturbation networks and integrates these networks with an intermediary network of diverse disease phenotypes and cheminformatic data for a nested prioritization of cell lines and perturbations. The prediction performance of our method surpasses that of solely cheminformatic measures, as well as state-of-the-art methods that use CMap data to generate gene-perturbation networks and rank perturbations in a cell type-specific manner. Top-ranked drug perturbations identified using our framework have high chemical structural diversity, suggesting its potential for building compound libraries. Finally, proof-of-concept applications of our framework demonstrate the effectiveness of the intermediary disease phenotypes in providing additional non-redundant information on drug mechanisms related to diseases that are not directly evident from the input disease signatures. Overall, our analytical framework outperforms currently available methods in terms of predictive power and offers the potential to be a feasible blueprint for a cell type-specific drug discovery and repositioning platform that accounts for multiple disease phenotypes.

(CANCELLED) Extracting information from gene coexpression networks of Rhizobium leguminosarum

ABSTRACT. Nitrogen uptake in legumes is facilitated by bacteria such as Rhizobium leguminosarum. For this bacterium, gene expression data are available, but functional gene annotation is less well developed. More annotations could lead to a better understanding of the pathways for growth, plant colonisation and nitrogen fixation in R. leguminosarum. We present and evaluate a pipeline that combines novel scores from gene coexpression network analysis in a principled way in order to identify genes that are associated with certain growth conditions or highly coexpressed with a predefined set of genes of interest. We apply this pipeline to R. leguminosarum gene coexpression networks to obtain putative functional annotations and a prioritised list of genes for further study.

Unraveling cradle-to-grave disease trajectories from multilayer comorbidity networks

ABSTRACT. Multimorbidities, the presence of multiple diseases or conditions in a patient, are strongly dependent on age, and they change as patients grow older. In order to observe disease progress, we need to understand disease trajectories and the directions in which they change over age. We use a unique dataset containing 44 million records of almost all inpatient stays from 2003 to 2014 in Austria to investigate these disease trajectories. We developed a new multilayer disease network approach to quantitatively analyze complex connections between two or more conditions and how they evolve over the life course of patients. Nodes represent diagnoses in specific age groups in intervals of ten years. Each layer is then a comorbidity network for one age group. Intra-layer links encode a significant correlation between diagnoses within an age group (p < 0.001, relative risk > 1.5), while inter-layer links represent correlations between diagnoses across different age groups. We used an unsupervised clustering algorithm for detecting overlapping clusters in the multilayer comorbidity network. The resulting clusters reveal the most common disease trajectories and their time-dependent characteristics. We identify 1260 distinct disease trajectories (618 for females, 642 for males) that contain on average 9 (IQR 2-6) different diagnoses and range over up to 8 age groups (mean: 2.3 age groups). These trajectories may partially overlap, which allows us to find bifurcation points in typical trajectories. We find 74 pairs of trajectories that share some diagnoses at younger ages but develop into markedly different patient phenotypes at older ages, whereas other groups of diagnoses, distinct at young ages, converge on the same diseases in older age (52 pairs). For instance, we identified four clusters of patients diagnosed with G45 (transient cerebral ischemic attacks and related syndromes) in the age group between 40 and 49 years.
In one trajectory, patients are diagnosed only with G diagnoses (diseases of the nervous system) throughout life. In comparison, another trajectory carries a high risk of being diagnosed with respiratory system diseases (J40, bronchitis not specified as acute or chronic; J43, emphysema), while a further trajectory shows a high risk for cardiovascular diseases (I21, acute myocardial infarction) together with diseases of the musculoskeletal system (M41, scoliosis) later in life. These results indicate that such cluster analyses can help identify critical events that put patients at high risk for different diagnoses decades later. They are of clinical relevance as they could support clinical decision-making and allow a more personalized medicine approach that could be integrated into daily clinical practice.
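The relative-risk filter used for the comorbidity links above can be sketched as follows. This toy version applies only the relative-risk threshold; the significance test (p < 0.001) mentioned in the abstract is omitted, and the function and parameter names are our own:

```python
from itertools import combinations
from collections import Counter

def comorbidity_links(patients, min_rr=1.5):
    """Connect pairs of diagnoses whose co-occurrence across patients
    exceeds a relative-risk threshold: RR = P(A and B) / (P(A) * P(B)).
    patients: list of sets of diagnosis codes, one set per patient."""
    n = len(patients)
    dx = Counter()    # per-diagnosis prevalence counts
    pairs = Counter() # co-occurrence counts
    for p in patients:
        dx.update(p)
        pairs.update(combinations(sorted(p), 2))
    links = {}
    for (a, b), c in pairs.items():
        rr = (c / n) / ((dx[a] / n) * (dx[b] / n))
        if rr > min_rr:
            links[(a, b)] = rr
    return links
```

Running this per age group yields one comorbidity layer per decade of life; linking significantly correlated diagnoses across consecutive layers then assembles the multilayer network on which the trajectory clusters are detected.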