COMPLEX NETWORKS 2022: ELEVENTH INTERNATIONAL CONFERENCE ON COMPLEX NETWORKS & THEIR APPLICATIONS
PROGRAM FOR WEDNESDAY, NOVEMBER 9TH

09:00-09:40 Session Speaker S3: Shlomo HAVLIN Bar-Ilan University, Israel
09:00
Network Science and Applications

ABSTRACT. Network science has been applied in many worldwide systems and processes in different disciplines. These include social systems, physiology, traffic, climate, epidemics, and very recently physics. I will show some examples of how network tools can mitigate pandemics [1], how switching between topics of scientists affects their scientific impact [2], and how fresh teams can be associated with original and interdisciplinary research [3]. I will also show how network tools can help to improve urban traffic [4], identify novel features in interdependent networks [5], and identify novel physical processes [6].

References: [1] Cohen et al, PRL 85, 4626 (2000); PRL 86, 3682 (2001); Y. Liu et al, National Science Review 8 (1) nwaa229 (2021) [2] An Zeng et al, Nature Communications, 10, 3439 (2019) [3] An Zeng et al, Nature Human Behaviour, 5 (10), 1314-1322 (2021) [4] Daqing Li et al, PNAS 112, 669 (2015); Limiao Zhang et al, PNAS 116, 8673 (2019); G. Zeng et al, PNAS 116, 23 (2019) [5] A Bashan et al, Nature Physics 9, 667 (2012); Y Berezin et al, Scientific Reports 5 (1), 1-5 (2015) [6] I Bonamassa et al, To be published (2022)

09:40-10:40 Session Lightning L2: Diffusion & Epidemics - Dynamics on/of Networks
09:40
Paths for emergence of superspreaders in dengue fever spreading network
PRESENTER: Allbens Atman

ABSTRACT. The identification of superspreaders is essential to contain an epidemic, especially when there is not enough information about the disease to develop precautionary measures. Unlike infections transmitted directly between individuals of the same species, epidemics caused by vectors have well-explored peculiarities. In this direction, we study the networks obtained from the dissemination of dengue to verify, from the results of agent-based model simulations, whether the transmission of this disease follows the 20/80 rule for the proportion of spreaders and infected. We built different transmission networks considering the spread between vectors and humans up to the second generation, and we observed that although the human-to-human transmission network follows the 20/80 rule, the other networks (human-mosquito, mosquito-mosquito and mosquito-human) do not. Varying the density of agents, we show that the phenomenon of superspreading is accentuated at high mosquito density. These characteristics of vector-borne disease networks need to be further explored, as these vectors are highly vulnerable to climate change, and a better understanding of disease spread can help better target dengue epidemic control strategies.
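A minimal sketch of how a 20/80 check of this kind could be computed from a transmission network, assuming one already has the number of secondary infections caused by each spreader. The function name `top_share` and the toy counts are hypothetical and not taken from the paper.

```python
def top_share(secondary_cases, top_fraction=0.2):
    """Fraction of all infections caused by the most infectious `top_fraction` of spreaders.

    secondary_cases: list with the number of infections caused by each individual
    (the out-degree of each node in a transmission network).
    """
    ranked = sorted(secondary_cases, reverse=True)
    k = max(1, round(top_fraction * len(ranked)))  # size of the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Hypothetical out-degrees for ten individuals (illustrative only).
toy_counts = [12, 8, 1, 1, 1, 1, 0, 0, 1, 0]
share = top_share(toy_counts)  # share of infections due to the top 20% of spreaders
```

A share close to 0.8 would be consistent with the 20/80 rule; a much lower value would indicate more homogeneous spreading.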

09:45
Methods Evaluation - Missing Data in Age-stratified Contacts Predictions

ABSTRACT. The spreading of diseases is driven by the topology of the underlying network, which is formed by interactions. A current example is the ongoing pandemic of SARS-CoV-2 in the world-wide social network, consisting of humans and their contacts with each other. The number and diversity of a person's contacts directly influence the spread of viruses. Mathematical models can describe the spread to increase understanding and allow the prediction of outbreaks or even epidemics. Informative data about parameters is required for model specification. A common obstacle in most work is missing data in age and/or location groups. In most cases, surveys were only approved for adults, and social contact data for children were only available if a parent participated in the survey and provided the information. One possible reason for missing data in locations (e.g. provinces or other administrative divisions) is the use of online surveys without a sampling design, which yield a convenience sample. The aim of the proposed paper is to investigate applied approaches to handling missing data in age groups and their effect on the estimated age-stratified contact matrices. A special focus lies on how enforcing reciprocity constraints on the network's topology affects those results. I conduct a simulation study to compare the predictions to pandemic and non-pandemic contact data and evaluate them with respect to the predictive accuracy of the mean number of contacts within and between different age groups. I compare the weighted, sampling-weighted, and unweighted reciprocal topology cases.

09:50
Can one hear the position of nodes?

ABSTRACT. Wave propagation through nodes and links of a network forms the basis of spectral graph theory. Nevertheless, the sound emitted by nodes within the resonating chamber formed by a network is not well-studied. The sound emitted by vibrations of individual nodes reflects the structure of the overall network topology but also the location of the node within the network. In this article, a sound recognition neural network is trained to infer centrality measures from the nodes' waveforms. In addition to advancing network representation learning, sounds emitted by nodes are plausible in most cases. Auralization of the network topology may open new directions in arts, competing with network visualization.

09:55
Disentangling the Growth of Web3 Blockchain-based Networks by Graph Evolution Rules
PRESENTER: Alessia Galdeman

ABSTRACT. In recent years, novel paradigms that contrast the over-centralization of the current Web 2.0 are emerging. In this context, Web3 is a trending idea, based on blockchain technologies. From a researcher's point of view, Web3 services are resourceful because they offer publicly available, validated, temporal data that can be accessed through a blockchain interface. Blockchain Online Social Networks (BOSNs) are an example of platforms belonging to the Web3 ecosystem; they represent complex systems that include both social and financial dimensions. Non-fungible tokens (NFTs) are another example of Web3 service; they are data units that guarantee a unique certificate of ownership for a digital object together with a digital asset's uniqueness and non-transferability.

Given the complexity of such techno-social systems, it is essential to study how they evolve over time, to get deeper insights into their internal growth mechanisms. In the literature, there exist many models and measures that describe network growth by observing the link formation process, such as preferential attachment, homophily, and triadic closure. However, network evolution, especially in the Web3 context, cannot be explained by a single a-priori mechanism. More realistic models might adopt a mesoscopic approach, observing how small frequent subgraphs evolve.

10:00
In temporal networks correlation lags are informative of approaching bifurcations

ABSTRACT. Using the network approach to describe the evolution of a spatio-temporal dynamical system with a time-varying parameter that approaches a bifurcation point, we show that the lags that maximize the correlations between time-series in neighboring nodes are informative of the approaching bifurcation. In particular, we have found that the variance of the distribution of lags can exhibit an extreme value (maximum or minimum) before the bifurcation point.
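As an illustration of the quantity discussed in this abstract, the sketch below finds the lag that maximizes the Pearson correlation between two time series. The helper names (`pearson`, `best_lag`) and the sinusoidal toy signals are assumptions made for the example, not the authors' code.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] maximizing corr(x[t], y[t + lag]).

    A positive result means that y trails x by that many samples.
    """
    best, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        r = pearson(a, b)
        if r > best_r:
            best, best_r = lag, r
    return best

# Toy signals: node_b repeats node_a with a delay of 3 samples.
node_a = [math.sin(0.4 * t) for t in range(40)]
node_b = [math.sin(0.4 * (t - 3)) for t in range(40)]
lag_ab = best_lag(node_a, node_b, max_lag=5)
```

Repeating this for every pair of neighboring nodes gives a distribution of lags; the abstract reports that the variance of that distribution can reach an extreme value before the bifurcation point.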

10:05
Topological-temporal properties of evolving high order networks
PRESENTER: Alberto Ceria

ABSTRACT. Human social interactions are usually collected in the form of dyadic interactions, and have been effectively studied in the framework of evolving networks, where links connecting pairs of nodes can be activated and deactivated over time. However, individuals can interact in groups composed of more than two people. Such group interactions can be represented as high order events (or, equivalently, hyperlink activations). In this paper, we characterize time evolving networks via the topological-temporal properties of such high order events. In particular, we characterize the topological and temporal properties of high order events from three perspectives: 1) the interrelation between the topological and temporal distance of events with the same or different orders, 2) the correlation or overlap in topological location between events with different orders and 3) the temporal correlation of events with different orders that occur close in topology. In order to compare real-world networks with different numbers of nodes, these properties are compared with null models that we design to systematically and gradually destroy the topological and temporal properties of events of an arbitrary order. We applied our methods to 8 real-world physical contact networks. We discover that events close in time tend to be also close in topology. Moreover, we observe that nodes involved in the interaction of a large group tend to interact together in groups of a small size too. This tendency is more evident in contact networks measured in high and primary schools, workplaces, and hospitals, whereas it is less evident in conferences and museums, where interactions are less routine. We also observe that nodes that interact with many different groups (events) of size $d$ tend also to be involved in many different groups (events) with another order.
We further find that the number of distinct groups of size $d$ a node interacts with is approximately proportional to the total number of order $d$ events the node participates in. These observations suggest that an individual's large number of interactions of one order would not reduce his or her number of events of another order. Individuals tend to be consistently active or inactive in events across orders. Finally, we show that events of a large order are usually part of long trains composed of neighboring lower order events. The topological correlation and temporal correlation of neighboring events discovered in the second and third analysis explain the temporal and topological correlation found in the first analysis. Our characterization methods provide new tools to compare real-world high order networks such as contact networks and collaboration networks and to investigate how properties of high order events affect dynamic processes unfolding on them. They may also inspire the development of more refined models of time-varying high order networks that reproduce key high order event properties.

10:10
Attributed Stream-Hypernetwork analysis: Homophilic Behaviors in Pairwise and Group Political Discussions on Reddit

ABSTRACT. Complex networks are solid models to describe human behavior. However, most analyses employing them are limited to observations made on dyadic connectivity, whereas complex human dynamics involve higher-order relations as well. In the last few years, hypergraph models have been rising as promising tools to better understand the behavior of social groups. Yet even such higher-order representations ignore the importance of the rich attributes carried by the nodes. In this work we introduce ASH, an Attributed Stream-Hypernetwork framework to model higher-order temporal networks with attributes on nodes. We leverage ASH to study pairwise and group political discussions on the well-known Reddit platform. Our analysis unveils different patterns when looking at either a pairwise or a higher-order structure for the same phenomena. In particular, we find that Reddit users tend to surround themselves with like-minded peers with respect to their political leaning when online discussions are proxied by pairwise interactions; conversely, this tendency significantly decreases when considering nodes embedded in higher-order contexts, which often describe heterophilic discussions.

10:15
Dynamic transition graph for estimating the predictability of financial and economical processes.
PRESENTER: Anton Kovantsev

ABSTRACT. The problem of estimating time series predictability arises whenever we deal with a forecasting task, especially when the process is not sustainable and undergoes critical transitions or significantly changes its character. In these cases it is important to notice the moment when changes begin and to distinguish their direction as soon as possible in order to adjust the forecasting algorithm or, at least, properly evaluate the forecast veracity. We propose here a dynamic-graph-based method for tracing changes in time-series predictability in real time. This approach helps to filter out some 'noise' information and emphasize the significant aspects of the behavior of the complex dynamic system. In addition, some graph characteristics, such as degree centrality, number and size of loops, connectivity and entropy, can be useful for evaluating the predictability of the system that produces the time series. A graph neural network classifier trained on an artificial data set proved able to verify time series predictability at every step of incremental real-time tracing along the series.

10:20
A community-aware ranking scheme to identify influential nodes in complex networks
PRESENTER: Stephany Rajeh

ABSTRACT. Centrality measures are popularly used across biological, financial, and social networks to identify influential nodes. Nevertheless, these measures are prone to locating critical nodes in the densest network area while neglecting other dense regions that make up an essential part of the network. Consequently, diffusion processes starting from these nodes die out within their locally shared region. To tackle this issue, we present a new ranking scheme targeting influential nodes across the network by exploiting its community structure. The proposed community-aware ranking scheme is independent of the network type and centrality measures. Using the Susceptible-Infected-Recovered (SIR) diffusion model, we investigate its effectiveness using a set of popular classical centralities such as degree, closeness, and betweenness centrality. Results demonstrate the effectiveness of the proposed ranking scheme compared to the classical ordering of the centrality measures. Indeed, it leads to a more significant outbreak in synthetic and real-world networks.

10:25
Blending Machine Learning and mechanistic models to learn ecological time series

ABSTRACT. To anticipate the response of ecosystems to anthropogenic pressure and climate change, reliable ecosystem models are required. In contrast to pure machine learning (ML) models, mechanistic models can generalize in out-of-distribution scenarios even when the process under study has time-dependent non-linear dynamics. However, their adoption has been limited because fitting the model parameters is notably difficult due to three major challenges: (a) ecosystems follow nonlinear, potentially chaotic dynamics, (b) available ecological time series are heterogeneous and noisy, and (c) only partial knowledge about ecosystems' dynamics is available. In this work we merge ML techniques with mechanistic ecological models and present a framework to circumvent those issues. We start by presenting a Bayesian framework to fit general dynamical systems models, and prove that in systems with chaotic or limit cycle dynamics, naive gradient descent methods will fail. We then prove that splitting the time series into short chunks can circumvent the problem, and implicitly allow the model to integrate heterogeneous datasets. We generate the observation data by sampling the simulated dynamics and contaminating the samples with noise. This model has been chosen because (a) it generates fluctuations that resemble the behaviour of observed ecological time series, (b) it produces chaotic dynamics that are notoriously challenging to forecast for a wide range of realistic parameters, and (c) it has been used as a benchmark in studies concerned with ecosystem forecast. We show that our framework can:
• Learn from noisy and independent time series, inferring the metabolic and ingestion rate of the species
• Provide reliable forecasts in regimes far from the ones observed during training
• Provide support for the correct model among several candidates

10:40-11:15 Coffee Break
10:40-11:15 Session Poster P3A: [1-8] Biological Networks
Structural analysis of SARS-CoV-2 Spike Protein variants through Graph Embedding.

ABSTRACT. Since December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has affected almost all countries. The unprecedented spread of this virus has led to the emergence of many variants that affect protein sequence and structure. Protein Contact Networks (PCNs) have recently been proposed as a modelling framework for protein structures. In this framework, the protein structure is represented as an unweighted graph whose nodes are the central atoms of the backbone (C-α) and whose edges connect two atoms at a spatial distance between 4 and 7 angstroms. In this work we explore the possibility of embedding PCNs using Graph Neural Networks and then analysing each residue in the embedded space, in order to distinguish mutated residues from non-mutated ones. In this way we aim to predict possible future mutations.
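The PCN construction described in this abstract can be sketched as follows, assuming the C-α coordinates are already available as a list of 3D points. The function name, the inclusive treatment of the 4 and 7 angstrom bounds, and the toy coordinates are assumptions for illustration only.

```python
import math
from itertools import combinations

def protein_contact_network(ca_coords, d_min=4.0, d_max=7.0):
    """Edge set of an unweighted contact graph over residues.

    ca_coords: list of (x, y, z) C-alpha positions in angstroms, one per residue.
    Two residues are linked when their distance lies in [d_min, d_max]
    (inclusive bounds are an assumption of this sketch).
    """
    edges = set()
    for i, j in combinations(range(len(ca_coords)), 2):
        if d_min <= math.dist(ca_coords[i], ca_coords[j]) <= d_max:
            edges.add((i, j))
    return edges

# Toy coordinates (hypothetical, not taken from a real structure).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
pcn_edges = protein_contact_network(toy)
```

The lower bound discards pairs closer than 4 angstroms and the upper bound discards pairs too far apart to be in contact. The graph embedding step with Graph Neural Networks is not covered by this sketch.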

A Network-based Approach for Inferring Thresholds in Co-expression Networks

ABSTRACT. Gene co-expression networks (GCNs) specify binary relationships between genes and are of biological interest because significant network relationships suggest that two co-expressed genes rise and fall together across different cellular conditions. GCNs are built by (i) calculating a co-expression measure between each pair of genes and (ii) selecting a significance threshold to remove spurious relationships among genes. This paper introduces a threshold criterion based on the underlying topology of the network. More specifically, the criterion considers both the rate at which isolated nodes are added to the network and the density of its components when the threshold varies. In addition to Pearson's correlation measure, the biweight midcorrelation, the distance correlation, and the maximal information coefficient are used to build different GCNs from the same data and showcase the advantages of the proposed approach. Finally, a case study presents a comparison of the predictive performance of the different networks when trying to predict gene functional annotations using hierarchical multi-label classification.
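Steps (i) and (ii) of the GCN construction can be sketched as below, using Pearson's correlation and a fixed threshold on its absolute value. The helper names, the use of the absolute value, and the toy expression profiles are assumptions; the paper's topology-based threshold criterion itself is not reproduced here.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold):
    """profiles: dict mapping gene -> expression values across conditions.

    Step (i): co-expression measure for each gene pair.
    Step (ii): keep only pairs whose |correlation| reaches the threshold.
    """
    return {
        (g, h)
        for g, h in combinations(sorted(profiles), 2)
        if abs(pearson(profiles[g], profiles[h])) >= threshold
    }

def isolated_genes(profiles, edges):
    """Genes left without any edge at the chosen threshold."""
    linked = {g for edge in edges for g in edge}
    return set(profiles) - linked

# Hypothetical profiles over four conditions (illustrative only).
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises and falls with geneA
    "geneC": [4.0, 1.0, 3.0, 2.0],  # only weakly related to the others
}
gcn_edges = coexpression_edges(toy_profiles, threshold=0.9)
gcn_isolated = isolated_genes(toy_profiles, gcn_edges)
```

Tracking how `isolated_genes` grows as the threshold is raised is one ingredient that the abstract mentions for choosing the threshold.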

Building Differential Co-expression Networks with Variable Selection and Regularization
PRESENTER: Camila Riccio

ABSTRACT. This work introduces a technique for the inference of differential co-expression networks. The approach takes as input a matrix of differential expression profiles, where each entry corresponds to the Log Fold Change of a gene's expression between control and stress conditions for a specific sample. It outputs a matrix of coefficients, where each non-zero entry represents a pairwise connection between genes. The proposed approach builds on Lasso and is applied to differential expression profiles of rice between control and salt-stress conditions. A total of 25 genes were identified as differentially expressed and responsive to salt stress. About half of these genes (11) were reported with a statistically significant number of different GO annotations relevant to salt stress response.

Using the Duplication-Divergence Network Model to Predict Protein-Protein Interactions

ABSTRACT. Interactions between proteins are key to most biological processes, but thorough testing can be costly in terms of money and time. Computational approaches for predicting such interactions are an important alternative. This study presents a novel approach to this prediction using calibrated synthetic networks as input for training a decision tree ensemble model with relevant topological information. This trained model is later used for predicting interactions on the human interactome. Results show that deterministic metrics perform better than their stochastic counterparts, although a random forest model shows a feature combination case with comparable precision results.

Dynamics of Drosophila melanogaster social interaction networks
PRESENTER: Milan Petrovic

ABSTRACT. There are differences in dynamics between the COC and CTRL networks. COC networks have more links than CTRL networks because the psychostimulant treatment increased the activity of the flies. Consequently, the time needed to form open and closed triads is much shorter in COC networks.

Persistent homology to analyse disruptions of functional and effective brain connectivity
PRESENTER: Jaroslav Hlinka

ABSTRACT. We extend topological data analysis by proposing directed persistent homology (DPH) and testing its ability to describe causal brain networks. We explain its potential advantages and pitfalls and illustrate them through its discriminatory power in two enigmatic examples of disease-related brain network alterations: epilepsy and schizophrenia. We estimate networks from fMRI, EEG and iEEG data, apply DPH, and use machine learning to test the separability of the DPH signatures of healthy and diseased brain states.

Aberrant change in brain network flexibility during the performance of Theory of Mind task in schizophrenia patients

ABSTRACT. A comparison of dynamic brain network reconfiguration captured by fMRI is performed between two groups of 64 schizophrenia patients and 64 healthy controls. The results show differences at three scales: single nodes, groups of nodes, and the whole network. Measures at these three scales are then used to train a generalized linear model (GLM) and predict the diagnosis based on the reconfiguration/flexibility of the brain during a Theory of Mind task. The results show a classification accuracy significantly higher than chance.

Role of mitochondrial genetic interactions in determining adaptation to high altitude human population
PRESENTER: Rahul Verma

ABSTRACT. Physiological and haplogroup studies performed to understand high-altitude adaptation in humans are limited to individual genes and polymorphic sites. Due to stochastic evolutionary forces, the frequency of a polymorphism is affected by changes in the frequency of a nearby polymorphism on the same DNA sample, making them connected in terms of evolution. Here, first, we provide a method to model these mitochondrial polymorphisms as "co-mutation networks" for three high-altitude populations: Tibetan, Ethiopian and Andean. Then, by transforming these co-mutation networks into weighted and undirected gene–gene interaction (GGI) networks, we were able to identify functionally enriched genetic interactions of CYB and CO3 genes in the Tibetan and Andean populations, while NADH dehydrogenase genes play a significant role in high-altitude adaptation in the Ethiopian population. These co-mutation based genetic networks provide insights into the role of different sets of genes in high-altitude adaptation in human sub-populations.

10:40-11:15 Session Poster P3B: [9-11] Machine Learning & Network
Geometric Deep Learning graph pruning to speed-up the run-time of Maximum Clique Enumeration algorithms
PRESENTER: Marco Grassia

ABSTRACT. In this paper we propose a method to reduce the running time needed to solve the Maximum Clique Enumeration (MCE) problem. Specifically, given a network, we employ geometric deep learning to find a simpler network on which to run the MCE algorithm. Our approach is based on finding a strategy to remove from the network the nodes that are not functional to the solution. The resulting network has a reduced size and, as a result, the search time of the MCE is reduced. We show that our approach is able to obtain a solver speed-up of up to 42 times, while keeping all the maximum cliques.

Fire Together, Wire Together?

ABSTRACT. The current research identifies setups that show paradoxical relations between the tendency of computational units to fire together and information gain. We consider the most fundamental network structure, comprising only two computational units. We model it by the prior distribution over the output values and a row-stochastic matrix that maps each output value into a distribution over the input values. Combinations of the model parameters were considered in a full factorial design experiment to produce a variety of setups. Then, for each setup, we produced a label that represents the relationship between the tendency of the computational units to fire together and the information gained from connecting them. We use a machine learning algorithm (decision tree) to identify paradoxical patterns. For example, there is sometimes a pattern of high efficiency of a connection between computational units although these units rarely fire together. We provide a suggestion for extending the model in the current research to more general structures. We conclude that there is potential to enhance learning in artificial neural networks by using information measures (specifically, 'information gain') to improve the adjustment of connection weights.

Reaction impurity prediction using a data mining approach
PRESENTER: Adarsh Arun

ABSTRACT. Automated prediction of reaction impurities can be useful in facilitating rapid early-stage reaction development, synthesis planning and optimization. Existing reaction predictors are catered towards main product prediction, and are often black-box, making it difficult to troubleshoot erroneous outcomes. This work aims to present an automated impurity prediction workflow that is interpretable and transparent, as it is based on data mining large chemical reaction databases. A 14-step workflow was implemented in Python and RDKit using Reaxys data. Evaluation of potential chemical reactions between functional groups present in the same reaction environment in the user-supplied query species can be accurately performed by directly mining the Reaxys database for similar or ‘analogue’ reactions involving these functional groups. Reaction templates can then be extracted from analogue reactions and applied to the relevant species in the original query to return impurities and transformations of interest. Three proof-of-concept case studies based on active pharmaceutical ingredients (paracetamol, agomelatine and lersivirine) were conducted, with the workflow able to suggest the correct impurities within the top two outcomes. At all stages, suggested impurities can be traced back to the originating template and analogue reaction in the literature, allowing for closer inspection and user validation. Ultimately, this work could be useful as a benchmark for more sophisticated algorithms or models since it is interpretable, as opposed to purely black-box solutions, and illustrates the potential of chemical data in impurity prediction. In the long run, applying this workflow to the entire Reaxys database will allow for the development of enriched chemical reaction networks with impurity information that can facilitate rapid decision making and synthesis route planning.

11:15-13:00 Session Oral O4A: Community Structure
11:15
Outliers in the ABCD Random Graph Model with Community Structure (ABCD+o)
PRESENTER: Pawel Pralat

ABSTRACT. The Artificial Benchmark for Community Detection graph (ABCD) is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with properties similar to those of the well-known LFR model, and its main parameter $\xi$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $\mu$.

In this paper, we extend the ABCD model to include potential outliers. We perform some exploratory experiments on both the new ABCD+o model and a real-world network to show that outliers possess some desired, distinguishable properties.

11:30
The Unconstrained LFR Benchmark
PRESENTER: Bojan Evkoski

ABSTRACT. The most common approach for evaluating and comparing community detection algorithms is to use networks with a priori known community structure. In the absence of real-world networks with known community structure, artificially generated networks are used, and the Lancichinetti-Fortunato-Radicchi (LFR) benchmark is the most widely accepted algorithm for generating artificial networks that resemble real-world networks. A common comparison setting is to vary only the LFR mixing parameter $\mu$, which corresponds to partition difficulty (a higher $\mu$ means a higher percentage of edges going out of communities, making it harder to find the true communities). The community detection algorithms are then compared on one or a few different network sizes (LFR parameter $n$). In this setting, the diversity of LFR networks is minimized. We argue that the performance of community detection algorithms may vary depending on other network properties, i.e., some algorithms perform better on one set of LFR parameters and other algorithms perform better on others. Consequently, conclusions based on only one set of LFR parameters may be misleading.

To perform a more comprehensive benchmarking of community detection algorithms while avoiding the shortcomings of the common LFR benchmarking, we propose the Unconstrained LFR benchmark. It consists of two steps: generating diverse LFR networks and then benchmarking by applying the Friedman test and the post-hoc Nemenyi test. In this way, the full diversity of the LFR network space can be explored and the potential bias from a single set of LFR parameters is avoided.
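To make the benchmarking step concrete, the sketch below computes the Friedman chi-square statistic and the mean rank of each algorithm from a table of scores, with one row per generated network. The function name and the toy scores are hypothetical, ties within a row are assumed not to occur, and the post-hoc Nemenyi test and the p-value are left out.

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic and mean ranks.

    scores: list of rows, one per network; each row holds one quality score
    per algorithm (higher is better). Assumes no ties within a row.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0] * k
    for row in scores:
        # Rank 1 goes to the best-scoring algorithm on this network.
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    mean_ranks = [r / n for r in rank_sums]
    return chi2, mean_ranks

# Hypothetical scores of three algorithms on four generated networks.
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.60, 0.70],
    [0.95, 0.90, 0.50],
    [0.70, 0.65, 0.60],
]
chi2, mean_ranks = friedman_statistic(toy_scores)
```

In practice a library routine such as `scipy.stats.friedmanchisquare` would typically be used, since it also handles ties and returns a p-value. The mean ranks are the quantities that a post-hoc test such as Nemenyi compares pairwise.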

11:45
Influence-based Community Deception
PRESENTER: Giuseppe Pirrò

ABSTRACT. This paper studies the novel problem of influence-based community deception. Tackling this problem amounts to devising tools to protect the users of a community from being discovered by community detection algorithms. The novel setting considers networks with both edge directions and models the influence of nodes as edge weights. We present a deception strategy based on modularity. We conducted an experimental evaluation that shows the feasibility of our proposal.

12:00
AutoGF: Runtime Graph Filter Tuning for Community Node Ranking

ABSTRACT. A recurring graph analysis task is to rank nodes based on their relevance to communities of shared metadata attributes (e.g. the interests of social network users). To achieve this, approaches often start with a few example community members and employ graph filters that rank nodes based on their structural proximity to the examples. Choosing between well-known filters typically involves experiments on existing graphs, but their efficacy is known to depend on the structural relations between community members. Therefore, we argue that employed filters should be determined not during algorithm design but at runtime, upon receiving specific graphs to process. To do this, we split example nodes into training and validation sets and either perform supervised selection between well-known filters, or account for granular graph dynamics by tuning parameters of the generalized graph filter form with a novel optimization algorithm. Experiments on 27 community node ranking tasks across three real-world networks of various sizes reveal that runtime algorithm selection selects near-best AUC and NDCG among a list of 8 popular alternatives, and that parameter tuning yields similar or improved results in all cases.

12:15
Hierarchical communities in complex networks
PRESENTER: Leto Peel

ABSTRACT. Modular and hierarchical structures are pervasive in real-world complex systems. A great deal of effort has gone into trying to detect and study these structures. Important theoretical advances in the detection of modular, or "community", structures have included identifying fundamental limits of detectability by formally defining community structure using probabilistic generative models. Here we present a theoretical study on hierarchical community structure in networks, which has thus far not received the same rigorous attention. We address the following questions: 1) How should we define a valid hierarchy of communities? 2) How should we determine if a hierarchical structure exists in a network? and 3) How can we detect hierarchical structure efficiently? We approach these questions by introducing a definition of hierarchy based on the concept of stochastic externally equitable partitions and their relation to probabilistic models, such as the popular stochastic block model. We enumerate the challenges involved in detecting hierarchies and, by studying the spectral properties of hierarchical structure, present an efficient and principled method for detecting them.

12:30
Robustness and sensitivity of network-based topic detection
PRESENTER: Carla Galluccio

ABSTRACT. In the context of textual analysis, network-based procedures for topic detection are gaining attention as an alternative to classical topic models. Network-based procedures are based on the idea that documents can be represented as word co-occurrence networks, where topics are defined as groups of strongly connected words. Although many works have used network-based procedures for topic detection, there is a lack of systematic analysis of how different design choices, such as the building of the word co-occurrence matrix and the selection of the community detection algorithm, affect the final results in terms of detected topics. In this work, we present the results obtained by analysing a widely used corpus of news articles, showing how and to what extent the choices made during the design phase affect the results.

12:45
The metric backbone preserves community structure and is a primary transmission subgraph in contact networks
PRESENTER: Luis M. Rocha

ABSTRACT. The structure of social networks strongly affects how different phenomena spread in human society, from the transmission of information to the propagation of contagious diseases. It is well-known that heterogeneous connectivity strongly favors spread, but a precise characterization of the redundancy present in social networks and its effect on the robustness of transmission is still lacking. This gap is addressed by the recently introduced metric backbone, a subgraph that is sufficient to compute all shortest paths of weighted graphs. We show that the metric backbones of nine contact networks obtained from proximity sensors in a variety of social contexts are generally very small: ranging from about 6% to 20% of the original graph, except for the case of a network with minimal or random social interaction where the backbone is 49%. The small relative size of the metric backbone reveals that shortest paths on these networks are very robust to attacks and failure, likely a feature that derives from surprisingly vast amounts of redundancy.

While many edges involved in local structure are removed to reveal the backbone, the latter preserves all shortest paths whether these characterize local, short-range, or long-range distances. Therefore, the metric backbone preserves the complete distribution of multi-scale distances and the natural hierarchy of complex networks. Indeed, using various measures of community structure, we show that the metric backbone preserves the community structure of all the original contact networks studied. Additionally, using Susceptible-Infected (SI) epidemic spread models, we show that the metric backbone is a primary subgraph in epidemic transmission, almost preserving the transmission times of the original network, especially in comparison to random and thresholded graphs of the same size.

Importantly, other backbone methods end up removing edges that are not redundant for shortest paths. For instance, the disparity filter backbone, which has been proposed to preserve the multiscale structure of complex networks, alters the distribution of shortest paths and the overall connectivity, and even removes nodes. This is also the case for the effective resistance sparsification backbone, an edge sampling method that achieves high fidelity in epidemic simulations vis-à-vis the original graphs, but is nonetheless parameter-dependent and stochastic, and re-weights the original edges. In contrast, the metric backbone is a parameter-free, principled method to obtain a (typically very small) subgraph that fully preserves all shortest paths on the original graph, unlike edge thresholding or backbones based on statistical null models or sampling. This is important for studying epidemic spread on social networks, which depends on the maintenance of the multiscale hierarchical structure of complex networks. Indeed, preserving the multiscale distribution of distances and community structure is important for any communication dynamics on complex networks that depend on shortest paths. A preprint of the work, currently under review, is available.
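The backbone criterion described in this abstract (keep exactly those edges that are not undercut by a shorter indirect path) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes edge weights are already positive distances (how contact proximity is converted to distance is not specified here), and the function names `shortest_distances` and `metric_backbone` are hypothetical.

```python
import heapq

def shortest_distances(adj, source):
    """Dijkstra distances from source; adj maps node -> {neighbor: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def metric_backbone(edges, tol=1e-12):
    """Keep edges whose own distance equals the shortest-path distance
    between their endpoints (assumption: positive distance weights,
    undirected graph, node labels of one comparable type)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    dist = {u: shortest_distances(adj, u) for u in adj}
    return [(u, v, w) for u, v, w in edges if w <= dist[u][v] + tol]

# Toy example: the direct edge a-c (distance 3) is longer than the
# indirect path a-b-c (distance 2), so it is redundant for shortest paths.
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 3.0)]
print(metric_backbone(toy_edges))
```

Running all-pairs Dijkstra is the simplest way to state the criterion; it is a design choice for clarity rather than efficiency on large contact networks.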

11:15-13:00 Session Oral O4B: Machine Learning & Networks
11:15
Learning Attribute Distributions Through Random Walks
PRESENTER: Nelson Antunes

ABSTRACT. We investigate the statistical learning of nodal attribute distributions in homophily networks using random walks. Attributes can be discrete or continuous. A generalization of various existing canonical models based on preferential attachment is studied, where new nodes form connections dependent on both their attribute values and popularity as measured by degree. We consider several canonical attribute-agnostic sampling schemes, such as the Metropolis-Hastings random walk and versions of node2vec (Grover and Leskovec, 2016) that incorporate both classical random walk and non-backtracking propensities, and propose new variants which use attribute information in addition to topological information to explore the network. The performance of such algorithms is studied on both synthetic networks and real-world systems, and its dependence on the degree of homophily, or absence thereof, is assessed.

11:30
Quantifying Biobank Impact

ABSTRACT. Biobanks, biological repositories of sample data set aside for research, vary widely in terms of purpose, scope, governance, and type of data. They vary widely in terms of recognition as well, with only a handful of well-known biobanks. The characteristics of impactful biological cohorts and the mechanisms of recognition given to their creators remain elusive, as quantitative measures applied to the universe of biobanks are hard to find. Here, we use data mining to identify a list of more than one thousand biobanks, together with the papers introducing them to the academic community, spanning several decades. We find that a large share of the citations biobanks receive come from collaboration with third-party research groups, with over 20% of all papers citing a biobank having its lead scientist as a co-author. Stringent data-access policies resulting in a collaboration seem justified, as we confirm for a major biobank, where half of the papers using its data do not cite the biobank's main paper. Finally, we use machine learning to predict the success of a biobank measured by its $h$-index after one year of publication, and capture the major role of the leading scientist's popularity and the inclusion of genetic data as drivers of success. These results bring evidence of previously discussed problems of lack of recognition and help unveil the underlying mechanisms of biobank success.

11:45
What deep learning can bring to two decades of correlations, hierarchies, networks and clustering in financial markets

ABSTRACT. Complex networks in finance have been explored over the past two decades. Besides the minimum spanning tree (MST) algorithm, several algorithms have been developed and used to analyze the correlation of stock returns, e.g. the planar maximally filtered graph (PMFG), the directed bubble hierarchical tree (DBHT), and the triangulated maximally filtered graph (TMFG). Despite a variety of algorithms and dependence measures (from linear correlation to information-theoretic and copula-based measures of dependence), the hierarchical clustering structure of stock correlations (and of credit default swaps, sovereign bonds, commodities) has been confirmed across all studies since the seminal paper. However, many important and practical problems remain unsolved: (i) What is an optimal or relevant rolling window length for performing correlation-based network or cluster analysis? (ii) What is the 'optimal' number of clusters? How many relevant layers are there in the hierarchy of correlations? (iii) How can we validate and reproduce network-related research when studies are performed on proprietary data? Recent advances in deep learning can help approach these questions. We will detail the research avenues that we see as open to being tackled with deep learning techniques.

12:00
Sparsification of deep neural networks before training using complex network analysis

ABSTRACT. In many areas of applied deep learning, it can be challenging to design neural network architectures to extract meaning from data. Despite neural network architectures being at the core of the deep learning approach, little work has been done to understand the relationship between architectural design and learning performance through the lens of network science. This is because, in their traditional form, artificial neural network topologies are associated with dense connectivity between successive layers which doesn't change throughout training. However, recent works on neural network pruning have shown that edges can be removed from trained neural networks while maintaining the same performance. By training these sparse networks, we can gain a deeper understanding of the interaction between structure (a.k.a. architecture or topology) and function in neural networks. To achieve these goals, we leverage tools from the study of adaptive networks in complex systems. In this extended abstract, we outline several key questions in this area and present the methodology and results of our initial experiments.

12:15
Does modular function result in modular structure in sparse deep neural networks?

ABSTRACT. Modularity is the property of complex networks whereby they can be structurally decomposed into separate sub-networks that learn re-usable sub-functions. In nature, modularity in networks emerges as a result of learning complex functions that have modular representations. That is, those functions can be broken down into smaller sub-functions that are independent and re-used to compose functions of higher complexity.

The development of artificial intelligence over the years has taken inspiration from biological networks. More recently, artificial neural networks (ANNs) designed explicitly to be modular have been shown to achieve superior generalization performance. In this work we propose to investigate whether modularity in ANNs, like in nature, can be an emergent phenomenon as a result of modularity of the learning functions.

12:30
Outlier mining in high-dimensional datasets based on Jensen-Shannon distances and graph structure analysis

ABSTRACT. We propose a new method for mining outliers in high-dimensional datasets. The method is based on the analysis of the fully connected graph, where the links between the elements of the dataset are defined by using Jensen-Shannon (JS) distances. We demonstrate the performance of the method using a publicly available dataset of credit card transactions, a few of which are labeled as fraud.
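The JS distance used as the link weight can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each data row is treated as a non-negative vector that is normalized into a probability distribution, and the mean distance to all other nodes is used as a simple outlier score. Neither choice is taken from the abstract, which does not specify the graph analysis, and the names `js_distance`, `normalize` and `mean_distances` are hypothetical.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, log base 2)
    between two discrete probability distributions of equal length."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))

def normalize(row):
    """Assumption: rows are non-negative and are rescaled to sum to one."""
    total = sum(row)
    if total <= 0:
        raise ValueError("row must have a positive sum")
    return [x / total for x in row]

def mean_distances(rows):
    """Mean JS distance from each element to all others in the
    fully connected graph (illustrative outlier score only)."""
    dists = [normalize(r) for r in rows]
    n = len(dists)
    totals = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = js_distance(dists[i], dists[j])
            totals[i] += d
            totals[j] += d
    return [t / (n - 1) for t in totals]

scores = mean_distances([[1, 1], [2, 2], [5, 0]])
print(scores.index(max(scores)))  # index of the element farthest from the rest
```

With base-2 logarithms the distance lies between 0 (identical distributions) and 1 (disjoint supports), which makes link weights comparable across pairs.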

12:45
Graph Mining and Machine Learning for Shader Codes Analysis to Accelerate GPU Tuning
PRESENTER: Lin Zhao

ABSTRACT. The graphics processing unit (GPU) has become one of the most important computing technologies. Disassembly shader codes, which are machine-level codes, are important for GPU designers (e.g., AMD, Intel, NVIDIA) to tune the hardware, including customization of clock speeds and voltages. Due to many use-cases of modern GPUs, engineers generally find it difficult to manually inspect a large number of shader codes emerging from these applications. To this end, we develop a framework that converts shader codes into graphs and employs sophisticated graph mining and machine learning techniques over a number of applications to simplify shader graphs analysis in an effective and explainable manner, aiming at accelerating the whole debugging process and improving the overall hardware performance. We study shader codes’ evolution via temporal graph analysis and structure mining with frequent subgraphs. Using them as the underlying tools, we conduct a frame’s scene detection and representative frames selection. We group the scenes (applications) to identify the representative scenes and predict a new application’s inefficient shaders. We empirically demonstrate the effectiveness of our solution and discuss future directions.

11:15-13:00 Session Oral O4C: Network Models
11:15
Percolation in heterogeneous spatial networks with long-range interactions
PRESENTER: Guy Amit

ABSTRACT. Spatial networks are a class of networks where the nodes are embedded in a metric space, meaning that the nodes have associated coordinates that allow distances between them to be defined. Typically, the edges between the nodes are constructed such that a pair of nodes with a small distance between them has a larger probability of being connected by an edge than nodes which are far apart. In this work, we model a spatial network using a random walk. A central question in network science is whether a giant component (GC) exists or not, i.e., whether there is a cluster of connected nodes with a size proportional to the number of nodes, N. We find that the existence of a GC in this problem is closely related to the problem of percolation in a one-dimensional lattice with long-range interactions. Surprisingly, a GC exists for a large range of values of the random walk stability parameter $\alpha$, even when a mean step size cannot be defined.

11:30
Robustness of Noisy Quantum Networks
PRESENTER: Bruno Coutinho

ABSTRACT. Quantum networks allow us to harness networked quantum technologies and to develop a quantum internet. But how robust is a quantum network when its links and nodes start failing? We show that quantum complex networks based on typical noisy quantum-repeater nodes are prone to discontinuous phase transitions with respect to the random loss of operating links and nodes, abruptly compromising the connectivity of the network, and thus significantly limiting the reach of its operation. Furthermore, we determine the critical quantum-repeater efficiency necessary to avoid this catastrophic loss of connectivity as a function of the network topology, the network size, and the distribution of entanglement in the network. From all the network topologies tested, a scale-free network topology shows the best promise for a robust large-scale quantum internet.

11:45
The Frechet distribution in drone networks
PRESENTER: Piet Van Mieghem

ABSTRACT. In this paper, we focus on the link density in Random Geometric Graphs (RGGs) with a distance-based connection function. After deriving the link density in D dimensions, we focus on the two-dimensional (2D) and three-dimensional (3D) space and show that the link density is accurately approximated by the Frechet distribution, for any rectangular space. We derive expressions, in terms of the link density, for the minimum number of nodes needed in the 2D and 3D spaces to ensure network connectivity. These results provide first-order estimates for e.g. a swarm of drones to provide coverage in a disaster or crowded area.

12:00
The distribution of cover times of random walks on random regular graphs
PRESENTER: Ofer Biham

ABSTRACT. We present analytical results for the distribution of cover (C) times of random walks (RWs) on random regular graphs (RRGs) consisting of $N$ nodes of degree $c \ge 3$. Starting from a random initial node $i$ at time $t=0$, at each time step $t \ge 1$ an RW hops into a random neighbor of its previous node. In some of the time steps it visits new nodes that have not been visited before, while in other time steps it revisits nodes that have already been visited. The cover time $T_{\rm C}$ is the number of time steps required for the RW to visit every single node in the network at least once. We derive a master equation for the distribution $P_t(S = s)$ of the number of distinct nodes $s$ visited by an RW up to time $t$ and solve it analytically. Inserting $s = N$ we obtain the cumulative distribution of cover times, namely the probability $P(T_{\rm C} \le t) = P_t(S = N)$ that up to time $t$ an RW will visit all the $N$ nodes in the network. Taking the large network limit, we show that $P(T_{\rm C} \le t)$ converges to a Gumbel distribution, whose mean is $\langle T_{\rm C} \rangle \simeq N \ln N$.
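The cover time defined in this abstract can also be estimated numerically. The Python sketch below is an illustrative Monte Carlo simulation, not the analytical method of the talk: it assumes a pairing-model generator with rejection of self-loops, multi-edges and disconnected graphs, and the function names `random_regular_graph`, `is_connected` and `cover_time` are hypothetical.

```python
import random

def is_connected(adj):
    """Depth-first search connectivity check on an adjacency dict."""
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def random_regular_graph(n, c, rng):
    """Sample a connected simple c-regular graph on n nodes by pairing
    stubs at random and rejecting invalid outcomes (assumed scheme)."""
    if (n * c) % 2 != 0 or not 0 < c < n:
        raise ValueError("need n * c even and 0 < c < n")
    while True:
        stubs = [v for v in range(n) for _ in range(c)]
        rng.shuffle(stubs)
        neighbors = {v: set() for v in range(n)}
        simple = True
        for a, b in zip(stubs[0::2], stubs[1::2]):
            if a == b or b in neighbors[a]:
                simple = False  # self-loop or repeated edge: reject
                break
            neighbors[a].add(b)
            neighbors[b].add(a)
        if simple:
            adj = {v: sorted(nb) for v, nb in neighbors.items()}
            if is_connected(adj):
                return adj

def cover_time(adj, rng):
    """Number of steps until the walk has visited every node at least once;
    the random initial node counts as visited at time t = 0."""
    current = rng.choice(sorted(adj))
    visited = {current}
    steps = 0
    while len(visited) < len(adj):
        current = rng.choice(adj[current])
        visited.add(current)
        steps += 1
    return steps

rng = random.Random(0)
graph = random_regular_graph(50, 3, rng)
samples = [cover_time(graph, rng) for _ in range(200)]
print(sum(samples) / len(samples))  # empirical mean cover time
```

A histogram of `samples` over many graphs and walks gives an empirical counterpart to the distribution $P(T_{\rm C} \le t)$ discussed above.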

12:15
Universal growth of social groups
PRESENTER: Ana Vranić

ABSTRACT. In this work, we analyzed the distribution of group sizes and how members join groups on online social platforms. We showed that group size distributions for Meetup and Reddit follow a log-normal distribution, indicating the universal growth patterns in both systems. The proposed model can reproduce the observed log-normal distributions. The model considers the interplay between random and social diffusion between groups, resulting in different log-normal distributions. When social connections are more critical, such as in Reddit, group size distribution becomes broader. This work shows that online social groups follow universal growth mechanisms that must be considered in modeling the evolution of social systems.

12:30
Partial synchronization in neural networks: chimeras and beyond

ABSTRACT. Synchronization of neurons is believed to play a crucial role in the brain under normal conditions, for instance, in the context of cognition and learning, and under pathological conditions such as Parkinson's disease or epileptic seizures. In the latter case, when synchronization represents an undesired state, understanding the mechanisms of desynchronization is of particular importance. In other words, the possible transitions from synchronized to desynchronized regimes and vice versa should be investigated. It is known that such dynamical transitions involve the formation of partial synchronization patterns, where only one part of the network is synchronized. The most prominent example is given by chimera states [1]. In the present talk, we discuss the occurrence of chimera states in complex networks of coupled neural systems. Moreover, we investigate another peculiar pattern called solitary states that has recently received a lot of attention [2]. We show how chimera states and solitary states are formed in dynamical networks of different kinds including single- and multilayer networks.

[1] A. Zakharova, Chimera Patterns in Networks: Interplay between Dynamics, Structure, Noise, and Delay, Understanding Complex Systems (Springer, Cham, 2020) doi: 10.1007/978-3-030-21714-3

[2] A. Zakharova, Investigating partial synchronisation in complex dynamical networks, Research Features 2022 doi: 10.26904/RF-141-2648415675

12:45
A More Powerful Heuristic for Balancing An Unbalanced Graph
PRESENTER: Amit A. Nanavati

ABSTRACT. We present a more powerful heuristic algorithm for the NP-complete problem of finding a minimum size subset of edges in an unbalanced signed graph G whose ’+’/’−’ labels can be flipped to balance G. Our algorithm finds a minimal flipping edge-set, starting with a given spanning tree T of G, by considering both the edges not in T and those in T because flipping a tree-edge can sometimes balance multiple fundamental unbalanced cycles at the same time. This can give a much smaller minimal flipping edge-set than the current algorithm where only the edges not in T are considered for flipping.
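For orientation, the baseline mentioned at the end of this abstract (flip only edges outside the spanning tree) can be sketched in Python as follows. This is not the heuristic presented in the talk: it assumes a BFS spanning tree is acceptable as T, encodes edge signs as +1 and -1, and uses the hypothetical function name `baseline_flip_set`.

```python
from collections import deque

def baseline_flip_set(nodes, signed_edges):
    """Return the non-tree edges whose fundamental cycle, with respect to a
    BFS spanning tree, is unbalanced. Flipping exactly these edges balances
    the graph. signed_edges holds (u, v, sign) with sign in {+1, -1}."""
    adj = {v: [] for v in nodes}
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    side = {}  # +1 / -1 label of each node, propagated along tree edges
    for root in nodes:
        if root in side:
            continue
        side[root] = 1
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                if v not in side:
                    side[v] = side[u] * s  # this edge becomes a tree edge
                    queue.append(v)
    # Tree edges satisfy side[u] * side[v] == sign by construction, so only
    # non-tree edges closing an unbalanced cycle can violate the condition.
    return [(u, v, s) for u, v, s in signed_edges if side[u] * side[v] != s]

# Triangle with one negative edge: exactly one fundamental cycle is unbalanced.
nodes = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", -1)]
print(baseline_flip_set(nodes, edges))
```

The size of the returned set depends on the tree chosen, which is the degree of freedom the abstract exploits by also allowing tree edges to be flipped.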

13:00-14:30 Lunch Break
14:30-16:00 Session Oral O5A: Resilience, Synchronization & Control
14:30
Investments in Robustness of Complex Systems: Algorithm Design
PRESENTER: Richard La

ABSTRACT. We study the problem of determining suitable investments in improving the robustness of complex systems comprising many component systems with an aim of minimizing the (time) average costs to system operators. The problem is formulated as an optimization problem that is nonconvex and challenging to solve for large systems. We propose two approaches to finding a good solution to the optimization problem: the first approach is based on a gradient method and finds a local optimizer. The second approach makes use of a convex relaxation of the original problem and provides both a lower bound on the optimal value and a feasible point. The lower bound can be used to bound the optimality gap of the solutions obtained by our methods. We provide numerical results to demonstrate the effectiveness of the proposed approaches.

14:45
Analysis on the effects of graph perturbations on centrality metrics
PRESENTER: Lucia Cavallaro

ABSTRACT. The literature on graph robustness under node failures is vast. However, very little is known about the effects of graph perturbations on the ranking of centrality metrics. To fill this gap, our aim is to quantify how much small perturbations (evaluated through ψ) in a graph affect the centrality metrics (evaluated through ζ). Thus, we considered two types of probabilistic failure models (i.e., Uniform and Best Connected), a fraction of nodes under attack τ with 0 < τ ≤ 1, and three popular centrality metrics (i.e., the Degree, the Eigenvector and the Katz centrality). We discovered that in the Uniform model ψ is not significantly affected when τ is small, even with a quite high failure probability (i.e., p ≤ 85%), and that the Eigenvector centrality is the metric most susceptible to deformation with respect to the others analysed here. Conversely, in the Best Connected model, the amount of perturbation ψ is proportional to τ.

15:00
Robustness of Preferential-Attachment Graphs: Shifting the Baseline

ABSTRACT. The widely used characterization of scale-free networks as "robust-yet-fragile" originates primarily from experiments on instances generated by preferential attachment. According to this characterization, scale-free networks are more robust against random failures but more fragile against targeted attacks when compared to random networks of the same size. Here, we consider a more appropriate baseline by requiring that the random networks match not only the size but also the inherent minimum degree of preferential-attachment networks they are compared with. Under this more equitable condition, we can (1) prove that random networks are almost surely robust against any vertex removal strategy and (2) show through extensive experiments that scale-free networks generated by preferential attachment are not particularly robust against random failures.

15:15
Validity of a one-dimensional reduction of dynamical systems on networks
PRESENTER: Prosenjit Kundu

ABSTRACT. Resilience is a system’s ability to alter its activity in order to sustain its functionality when it is disrupted. To study resilience of dynamics on networks, Gao et al. proposed a theoretical framework to reduce dynamical systems on networks, which are high dimensional in general, to one-dimensional dynamical systems. We refer to this method as the GBB reduction. In addition to the absence of degree correlation, the correctness of this GBB reduction is based on some assumptions. We investigate the precision of the one-dimensional reduction when networks are assumed to be devoid of degree correlation to obtain the following main results. First, when the dispersion of the node’s state is modest, the one-dimensional reduction tends to be accurate. Second, the correlation between the node’s state and the node’s degree, which is common for various dynamical systems on networks, is unrelated to the accuracy of the one-dimensional reduction.

15:30
Resilience of Coupled Social and Technical Networks in Open Source Software

ABSTRACT. Open source software (OSS) permeates our digital economy and society. OSS ecosystems evolve in a decentralized manner, integrating the contributions of dispersed individuals and drawing a complex network of dependencies between themselves. We rely on these OSS ecosystems in many ways, from browsing the internet to driving our cars or even using some kitchen appliances.

Our approach to studying the resilience of OSS ecosystems builds on three observations or stylized facts. First, in OSS ecosystems dependencies between libraries form a directed network, in theory a directed acyclic graph, in which nodes are libraries and a directed edge indicates a dependency of one library on another. It has been observed that these networks are growing for many ecosystems, especially their transitive dependencies. Second, we also know that errors, bugs, and vulnerabilities spread through these networks, indicating how issues in key libraries can impact the whole system. Finally, libraries within OSS ecosystems, even widely used ones, often have few or solo maintainers. This suggests that a few people are critical to the system's survival when issues emerge.

We outline a method to quantify the resilience of an OSS ecosystem integrating both social and technical factors. Adapting the tools of network and complexity science to the case of OSS ecosystems promises to deliver an improved picture of overall systemic risk. Specifically, we simulate how the removal of developers from the system and the corresponding introduction of faults spreads and impacts the system as a whole. We model the functionality of a given library using a production function approach (inspired by existing work on supply chain networks), assuming that library functionality is a function of both active developers and functional upstream dependencies.

We apply this method to a comprehensive dataset of the Rust ecosystem, covering tens of thousands of libraries maintained by tens of thousands of individuals.

15:45
Propagation of disruptions in supply networks of essential goods: A population-centered perspective of systemic risk

ABSTRACT. The Covid-19 pandemic drastically emphasized the fragility of national and international supply networks (SNs), leading to significant supply shortages of essential goods for people, such as food and medical equipment. Severe disruptions that propagate along complex SNs can expose the population of entire regions or even countries to these risks. A lack of both data and quantitative methodology has hitherto hindered us from empirically quantifying the vulnerability of the population to disruptions. Here we develop a data-driven simulation methodology to locally quantify actual supply losses for the population that result from the cascading of supply disruptions. We demonstrate the method on a large food SN of a European country including 22,938 business premises, 44,355 supply links and 116 local administrative districts. We rank the business premises with respect to their criticality for the districts' population with the proposed systemic risk index, SRIcrit, to identify around 30 premises that -- in case of their failure -- are expected to cause critical supply shortages in sizable fractions of the population. The new methodology is immediately policy relevant as a fact-driven and generalizable crisis management tool. This work represents a starting point for quantitatively studying SN disruptions focused on the well-being of the population.

14:30-16:00 Session Oral O5B: Human Behavior
14:30
Which sparrows are making a summer? The role of conformism and network structure in shaping cooperative behavior.

ABSTRACT. The relationship between cooperative and individualistic behavior has been the subject of research since the advent of economics. The question becomes especially important in light of the depletion of natural resources. We design a model of iterative multiplayer PD games where players can be individualistic, which resembles the standard payoff-maximizing approach in PD games, or conformist, when players choose the strategy that the majority of their peers play. These attitudes are updated according to replicator dynamics and players are placed in an exogenous (random or scale-free) network structure determining social connections. We also introduce financial incentives in favor of cooperative behavior. The main conclusion is that cooperation can be advanced through monetary incentives better in scale-free network structures, but scale-free setups seem to underperform when the incentive is withdrawn, as they produce lower levels of and larger drops in cooperation shares. These results appreciate the design and operation of incentive schedules that build on individualistic behavior.

14:45
Quantifying complexity and similarity of chess openings using online chess communities data

ABSTRACT. Among all board games, chess is without doubt one of the most fascinating. Opening Theory is one of the pillars of this game and requires years of study to be mastered. Here we exploit the "wisdom of the crowd" in online chess platforms to answer questions that, traditionally, only chess experts can manage. We first define the relatedness network of chess openings, which quantifies how similar two openings are to play. In this network, we spot communities of nodes corresponding to the most common opening choices and their mutual relationships. We use this network to forecast the openings players will start to play in the future, and we back-test these predictions, with performances considerably higher than those of a random predictor. Finally, we use the Economic Fitness and Complexity algorithm to measure how difficult openings are to play and how skilled players are in openings.

15:00
A Spinglass Model of Video Recorded Street Violence
PRESENTER: Jeroen Bruggeman

ABSTRACT. When groups encounter challenges in uncertain situations, collective action (in our case, attack of or defense against opponents) can be explained by an Ising spinglass model with asymmetric spin values. This model makes a novel prediction about the temporal unfolding of collective action. When a proportion of group members does not contribute, and the remaining conditional cooperators may cooperate if enough others do, a mean-field analysis predicts that collective action breaks out in a burst if the proportion of defectors is below a critical value. Above the critical value, there is no burst but a fizzle of a few group members who start cooperating asynchronously. Furthermore, small groups and small clusters in large groups are more easily agitated to cooperate than large groups even though they have a smaller chance of winning. The predicted critical value, the two temporal patterns, and the size effect are strongly supported by video data of street fights.

15:15
An information-theoretic approach to hypergraph psychometrics

ABSTRACT. Psychological network approaches propose to see symptoms or questionnaire items as interconnected nodes, with links between them reflecting pairwise statistical dependencies evaluated on cross-sectional, time-series, or panel data. These networks constitute an established methodology to assess the interactions and relative importance of nodes/indicators, providing an important complement to other approaches such as factor analysis. However, focusing the modelling solely on pairwise relationships can neglect potentially critical information shared by groups of three or more variables in the form of higher-order interdependencies. To overcome this important limitation, here we propose an information-theoretic framework based on hypergraphs as psychometric models. As edges in hypergraphs are capable of encompassing several nodes together, this extension can provide a richer representation of the interactions that may exist among sets of psychological variables. Our results show how psychometric hypergraphs can highlight meaningful redundant and synergistic interactions on either simulated or state-of-the-art, re-analyzed psychometric datasets. Overall, our framework extends current network approaches while leading to new ways of assessing the data that differ at their core from other methods, extending the psychometric toolbox and opening promising avenues for future investigation.

15:30
Lexical networks constructed to correspond students’ short written responses: A quantum semantic approach
PRESENTER: Ismo Koponen

ABSTRACT. We introduce a simple method to construct lexical networks (lexicons) of how students use scientific terms in written texts. The method is a generalization of a contingency-based word-pair co-occurrence analysis, based on recently introduced ideas of quantum semantics. Quantum semantics allows estimating the effect of subjective bias on weighting the importance of co-occurrence. Using the generalized word-pair co-occurrence counting, we construct students’ lexicons of the scientific (life-science) terms they use in their responses to a question concerning food chains in life-science contexts. In addition, the method allows constructing ensembles of lexicons that probabilistically simulate the variability of individual lexicons. The re-analysis of the written reports shows that while sets of top-ranking terms contain nearly the same terms irrespective of the details of the method of counting co-occurrences, the relative rankings of some key terms may be different in the quantum semantic analysis.

15:45
Gender and the influence of the department network in topic selection on early career faculty
PRESENTER: Lluis Danus

ABSTRACT. The topics of scientific research are constantly evolving: new topics gain attention while others languish. This process has sped up considerably in recent decades due to the explosion in the number of research articles published every year, putting extra pressure on scientists who have to keep up to date with recent developments in order to advance their careers. Therefore, as they move with science, careers of individual scientists are also intrinsically linked to change. Arguably, one of the milestones in the career of an early-career scientist is obtaining a faculty position at a research institution once their postdoctoral training is completed. The research environment provided by the department is bound to have an influence in the early, more ductile stages of a scientific career. And yet, studies looking at the evolution of scientific careers rarely consider the host institution or department as one of the factors playing an important role in the development of early-career faculty (ECF), or whether this factor affects female and male researchers equally.

Our assumption is that departments and research institutions act as invisible knowledge vectors by exposing scientists to the same ideas, and become potential incubators for new ideas through collaboration among faculty members. To test this, we examine two cohorts of ECF in departments of Chemical Engineering using a network approach to topic modeling. The first dataset contains data on ECF who were offered a position in a department, together with their response (acceptance or refusal). Our results show that, years later, those who joined the faculty have drifted towards the topics the department had before their acceptance, while those who declined the offer have drifted away from the department topics. Our second dataset comprises ECF in the top 50 Chemical Engineering departments in Europe and North America. This second analysis reveals striking differences between male and female ECF, with the former converging towards the department topics and the latter mostly moving away from them, even though both joined the department and have a similar number of collaborators. Finally, we examine departmental collaboration networks of ECF and observe a clear distinction in how they select their collaborators. Although we do not observe differences when collaborating with other researchers of the same academic age, female ECF tend to collaborate preferentially with senior male over senior female faculty, while their male colleagues collaborate equally with both groups.

Our findings show that while departments exert an attractive force on early-career faculty, this force affects male and female faculty differently. While much more work still needs to be done to elucidate the reasons behind such disparities, our work highlights the importance of studying research environments to fully understand gender differences in science.

14:30-16:00 Session Oral O5C: Biological Networks
14:30
Modeling of Hardy-Weinberg Equilibrium using dynamic random networks in an ABM framework
PRESENTER: Valentino Romano

ABSTRACT. Hardy-Weinberg equilibrium is the fundamental principle of population genetics. In this article, we present a new NetLogo model called “Hardy-Weinberg Basic model v 2.0”, characterized by a strict adherence to the original assumptions made by Hardy and Weinberg in 1908. A particularly significant feature of this model is that the algorithm does not make use of the binomial expansion formula. Instead, we show that using a procedure based on dynamic random networks, diploid equilibrium can be achieved spontaneously by a population of agents reproducing sexually in a Mendelian fashion. The model can be used to conduct simulations with a wide range of initial population sizes and genotype distributions for a single biallelic autosomal locus. Moreover, we also show that without any mathematical formalism the algorithm is also able to confirm the prediction of Kimura’s diffusion equations on the time required to fix a new neutral allele in a population, due to genetic drift alone.
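
As a rough illustration of the idea that equilibrium can arise from random mating and Mendelian segregation alone, the following Python sketch propagates genotype frequencies through one generation by enumerating parent pairs, without calling on the binomial expansion. It is not the NetLogo model described in the abstract: it assumes an infinite, deterministic population, and the names `GAMETE_A`, `next_generation` and `start` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Probability that a parent of each genotype transmits allele "A" (Mendelian segregation).
GAMETE_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}


def next_generation(freqs):
    """Offspring genotype frequencies after one round of random mating."""
    out = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for mother, father in product(freqs, repeat=2):
        pair = freqs[mother] * freqs[father]  # chance of this parent pair meeting
        a_m, a_f = GAMETE_A[mother], GAMETE_A[father]
        out["AA"] += pair * a_m * a_f
        out["aa"] += pair * (1 - a_m) * (1 - a_f)
        out["Aa"] += pair * (a_m * (1 - a_f) + (1 - a_m) * a_f)
    return out


# Assumed starting population: half AA, half aa, no heterozygotes.
start = {"AA": Fraction(1, 2), "Aa": Fraction(0), "aa": Fraction(1, 2)}
gen1 = next_generation(start)
gen2 = next_generation(gen1)
print(gen1, gen1 == gen2)
```

Under these assumptions the frequencies stop changing after the first generation, which is the equilibrium property the agent-based model is reported to reach through simulated matings.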

14:45
Redundancy in the Structure and Dynamics of Complex Systems

ABSTRACT. While most advances in complex systems have come from the study of patterns of connectivity (network structure), which provides many insights into the organization of complex systems, a critical gap remains in understanding how the structure of networks affects the dynamics of complex systems. For instance, in brain networks we do not know how synaptic connectivity leads to the dynamical patterns of functional connectivity that are responsible for human behavior. Likewise, while we know much about the connectivity of gene and protein regulation from existing systems biology models, the structure of interactions from these models is not sufficient to predict regulatory dynamics or derive control strategies that allow us, for instance, to revert a diseased cell to a healthy state. Our lab has been working to address this critical gap with an original insight: in addition to patterns of connectivity and patterns of dynamics, there are important patterns of redundancy which dictate how structure affects dynamics in networks. We summarize our work on the distance backbone and effective graph, referring to papers published in the last year as well as some currently under review.

15:00
From the connectome to action: emergent dynamics in a robotic model based on the connectome of the nematode C. elegans
PRESENTER: Pablo M. Gleiser

ABSTRACT. We analyze the neural dynamics and its relation with the emergent actions of a robotic vehicle that is controlled by a numerical simulation of a neural network based on the nervous system of the nematode Caenorhabditis elegans. The robot interacts with the environment through a sensor that transmits the information to sensory neurons, while motor neuron outputs are connected to wheels. This is enough to allow emergent robot actions in complex environments, such as avoiding collisions with obstacles. Working with robotic models makes it possible to keep track simultaneously of the detailed microscopic dynamics of all the neurons and also register the actions of the robot in the environment in real time, avoiding the complex technicalities of simulating a real environment. This allowed us to identify several relevant features of the microscopic dynamics associated with the emergent macroscopic behavior, some of which have already been observed in biological worms. These results suggest that some basic complex macroscopic behaviors observed in living beings can be almost completely determined by the underlying structure of the associated neural network, being relatively independent of the detailed neuronal dynamics.

15:15
Inferring probabilistic Boolean networks from steady-state gene data samples

ABSTRACT. Probabilistic Boolean Networks (PBNs) have been proposed for estimating the behaviour of dynamical systems as they combine rule-based modelling with uncertainty principles. Inferring PBNs directly from gene data is challenging, however, especially when data is costly to collect and/or noisy, e.g., in the case of gene expression profile data. In this paper, we present a reproducible method for inferring PBNs directly from real gene expression data measurements taken when the system was at a steady state. The steady-state dynamics of PBNs is of special interest in the analysis of biological machinery. The proposed approach does not rely on reconstructing the state evolution of the network, which is computationally intractable for larger networks. We demonstrate the method on samples of real gene expression profiling data from a well-known study on metastatic melanoma. The pipeline is implemented using Python and we make it publicly available as an OpenAI Gym environment.

15:30
Quantifying High-Order Interactions in Complex Physiological Networks: a frequency-specific approach
PRESENTER: Laura Sparacino

ABSTRACT. Recent advances in information theory have provided several tools to characterize high-order interactions (HOIs) in complex systems. Among them, the so-called O-information is emerging as particularly useful in practical analysis thanks to its ability to capture the overall balance between redundant and synergistic HOIs. While the O-information is computed for random variables, its extension to random processes studied in the frequency domain is very important to widen the applicability of this tool to networks whose nodes exhibit rich oscillatory content, such as brain and physiological networks. This work presents the O-information rate (OIR), a measure based on the vector autoregressive and state space modelling of multivariate time series devised to assess the synergistic and redundant HOIs among groups of series in specific bands of biological interest. The new measure is illustrated in two paradigmatic examples of physiological networks characterized by coupled oscillations across a wide range of temporal scales, i.e. the network of cardiovascular and cerebrovascular interactions where redundant synchronized activity emerges around the frequencies of vasomotor and respiratory rhythms, and the network of scalp electroencephalographic signals where synergistic HOIs are detected among the alpha and beta waves recorded over the primary sensorimotor cortex.
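
To make the redundancy/synergy balance concrete, here is a minimal Python sketch of the static O-information for discrete random variables, using the standard definition Ω = (n − 2)·H(X) + Σ_j [H(X_j) − H(X without X_j)]. It does not implement the frequency-specific OIR of the abstract, which relies on vector autoregressive and state space models. The function names and the two toy systems are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


def o_information(rows):
    """O-information of a system given as a list of equally likely joint outcomes."""
    n = len(rows[0])
    total = (n - 2) * entropy(rows)
    for j in range(n):
        marginal = [r[j] for r in rows]           # variable j alone
        rest = [r[:j] + r[j + 1:] for r in rows]  # every variable except j
        total += entropy(marginal) - entropy(rest)
    return total


# Toy systems: the third variable is the XOR of two fair bits (synergy),
# or all three variables are copies of one fair bit (redundancy).
xor_rows = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
copy_rows = [(x, x, x) for x in (0, 1)]
print(o_information(xor_rows), o_information(copy_rows))
```

With the usual sign convention, a negative value indicates a synergy-dominated system and a positive value a redundancy-dominated one.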

15:45
A Novel Reverse Engineering Approach for Gene Regulatory Networks
PRESENTER: Francesco Zito

ABSTRACT. Capturing the rules that govern a particular system can be useful in any field where the causes of its effects are unknown. Indeed, discovering the causes that produced a particular effect is extremely useful in fields such as biology. In this paper, a reverse engineering method based on machine learning is proposed. The method is used to replicate real-world behaviour and then uses this knowledge to generate the corresponding Gene Regulatory Network. The datasets from the DREAM4 competition were used to validate this method.

16:00-16:35 Coffee Break
16:00-16:35 Session Poster P4A: [1-6] Networks in Finance & Economics
On the Empirical Association between Trade Network Complexity and Global Gross Domestic Product
PRESENTER: Mayank Kejriwal

ABSTRACT. In recent decades, trade between nations has constituted an important component of global Gross Domestic Product (GDP), with official estimates showing that it likely accounted for a quarter of total global production. While evidence of association already exists in macro-economic data between trade volume and GDP growth, there is considerably less work on whether, at the level of individual granular sectors (such as vehicles or minerals), associations exist between the complexity of trading networks and global GDP. In this paper, we explore this question by using publicly available data from the Atlas of Economic Complexity project to rigorously construct global trade networks between nations across multiple sectors, and studying the correlation between network-theoretic measures computed on these networks (such as average clustering coefficient and density) and global GDP. We find that there is indeed significant association between trade networks’ complexity and global GDP across almost every sector, and that network metrics also correlate with business cycle phenomena such as the Great Recession of 2007-2008. Our results show that trade volume alone cannot explain global GDP growth, and that network science may prove to be a valuable empirical avenue for studying complexity in macro-economic phenomena such as trade.

Measuring the Stability of Technical Cooperation Network Based on the Nested Structure Theory
PRESENTER: Wenhui Liu

ABSTRACT. Nested structure is a structural feature, formed by co-evolution, that is conducive to system stability. In our view, enterprises that collaborate on patent applications in a technical cooperation network behave like biological species in a mutualistic ecosystem, shifting from one dynamic equilibrium to another. In this paper, a nestedness-based analytical framework is built to reflect the topological stability of the technical cooperation network of Zhongguancun Science Park (Z-Park). We study why this technically mutualistic ecosystem can reach a stable equilibrium over time, and we propose an index, the Nestedness Disturbance Index (NDI), to study the role that park areas and technical fields play in the steady states.

Optimizing robustness of modular networks based on reinforced nodes
PRESENTER: Yael Kfir-Cohen

ABSTRACT. Many real-world infrastructure systems include various resources, for example, the electric generators in the electricity network. Generally, the number of resources might be limited, so the issue of centralization versus decentralization is a crucial consideration. In addition, it is common for infrastructure systems to exhibit a modular structure in which there are many links within the cities and few links between them. Here, we analyze a modular network that contains reinforced nodes, which serve as power resources that contribute to the functionality of the network. We use tools from percolation theory to analytically derive novel fundamental equations for studying the resilience of our network model. The results obtained from these equations were also confirmed by numerical simulations. We show how to distribute the reinforced nodes to achieve optimal robustness. Specifically, we distinguish between reinforced nodes that have some links connecting them to other modules (“inter-nodes”) and nodes whose links are entirely contained within their modules (“intra-nodes”). We find that the functionality of the network strongly depends on the numbers of reinforced nodes and inter-nodes. Furthermore, changes in the distribution of the same number of reinforced nodes can have a significant impact on the functionality of the network. We find the optimal way to distribute the reinforced nodes between inter-nodes and intra-nodes so that the system is maximally robust. Finally, we observe that above a certain average degree, the optimal distribution of the reinforced nodes does not depend on the average degree of the network.

The way to webify the financial system in China--from individual credit reporting

ABSTRACT. Network science has a plethora of applications, and the financial system is among the most important ones. How to build networks of banks depends not only on network science but also on the particularities of financial systems. This paper argues that individual credit reporting (ICR) is a good entry point for webifying financial systems in China. ICR is widely promoted in China and covers basic personal information including birth dates, birthplaces, incomes, consumption, work, business, bank loans, and investment in stocks, bonds and foreign exchange, that is, almost all important information. Individuals are the dots in financial networks, who weave a web by making contact with banks and other financial institutions. Most papers on this topic are limited by their methodologies and are far from digging into the core of building the financial system. This paper is revolutionary in both its methodology and its angle on realizing the goal of building financial networks.

Exploring the energy complexity nexus in the production space with network theory

ABSTRACT. The intuitive notion of complexity is fairly clear in the sense that given different complex systems, there is generally a consensus as to which ones are more complex and which ones are more simple. However, a formal definition and quantification is lacking despite several attempts. In the context of a socio-energetic transition that the modern world must traverse in order to become sustainable, a better understanding of complexity is important because complexity has been tied with power density [1]. This implies that the possibilities of achieving a certain socio-economic complexity are limited by the level of sustainable power density available to our socio-economic system.

In this work, we present several forms to rank complexity according to symmetries of the composing motifs - elemental building blocks - of weighted directed networks [2]. We apply the rankings to networks constructed to represent the sequence of transformations goods undergo from their initial extraction phase to their final state as consumer products to be used and discarded.

Measuring the Importance of Industrial Sectors based on Biased Random Walk Algorithm: Industrially Economic Impact of Trump Administration's Trade Policy toward China
PRESENTER: Dawei Wang

ABSTRACT. During the Trump administration, trade frictions between China and the U.S. escalated and finally turned into a trade war, deeply changing the structure of the international division of labour and of merchandise trade. How to measure the impact on the Global Value Chain (GVC) and on these two countries during this period from the perspective of complexity science is an important issue that deserves study. This paper uses the Multi-Region Input-Output (MRIO) data compiled by the Asian Development Bank (ADB) to construct the Global Industrial Value Chain Network (GIVCN) model, which is in effect a reduction of the spreading of intermediate goods on the GVC. It then designs dynamic network indices based on the Markov process to measure the industrial influence and demand dependence of industrial sectors at the global scale, as well as the relationship between these indices and the level of national economic development.

16:00-16:35 Session Poster P4B: [7-11] Structural Network Measures
Statistical network similarity

ABSTRACT. Graph isomorphism is a problem for which there is no known polynomial-time solution. Nevertheless, assessing (dis)similarity between two or more networks is a key task in many areas, such as image recognition, biology, chemistry, computer and social networks. Moreover, questions of similarity are typically more general and their answers more widely applicable than the more restrictive isomorphism question. In this article, we offer a statistical answer to the following questions: a) "Are networks G1 and G2 similar?", b) "How different are the networks G1 and G2?" and c) "Is G3 more similar to G1 or G2?". Our comparisons begin with the transformation of each graph into an all-pairs distance matrix. Our node-node distance, Jaccard distance, has been shown to offer a good reflection of the graph's connectivity structure. We then model these distances as probability distributions. Finally, we use well-established statistical tools to gauge the (dis)similarities in terms of probability distribution (dis)similarity. This comparison procedure aims to detect (dis)similarities in connectivity structure, not in easily observable graph characteristics, such as degrees, edge counts or density. We validate our hypothesis that graphs can be meaningfully summarized and compared via their node-node distance distributions, using several synthetic and real-world graphs. Empirical results demonstrate its validity and the accuracy of our comparison technique.
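
The pipeline described above (node-node Jaccard distances, then a comparison of the resulting distributions) might be sketched as follows in Python. Two choices here are assumptions rather than details taken from the abstract: the Jaccard distance is computed on open neighbour sets, and the two-sample Kolmogorov-Smirnov statistic stands in for the unspecified "well-established statistical tools". All names and the two toy graphs are hypothetical.

```python
from itertools import combinations


def jaccard_distances(adj):
    """Sorted Jaccard distances between the neighbour sets of all node pairs."""
    dists = []
    for u, v in combinations(sorted(adj), 2):
        union = adj[u] | adj[v]
        inter = adj[u] & adj[v]
        # Two isolated nodes have identical (empty) neighbourhoods: distance 0.
        dists.append(1 - len(inter) / len(union) if union else 0.0)
    return sorted(dists)


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    def cdf(xs, t):
        return sum(x <= t for x in xs) / len(xs)

    points = sorted(set(a) | set(b))
    return max(abs(cdf(a, t) - cdf(b, t)) for t in points)


# Toy graphs as adjacency sets: a triangle and a three-node path.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}

d_triangle = jaccard_distances(triangle)
d_path = jaccard_distances(path)
print(ks_statistic(d_triangle, d_triangle), ks_statistic(d_triangle, d_path))
```

A statistic of zero means the two distance distributions coincide; larger values indicate that the graphs differ more in this summary of their connectivity structure.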

Intersection of random spanning trees in small-world networks
PRESENTER: András London

ABSTRACT. Alon et al. investigated the following 2-player zero-sum game on a connected graph G: the tree player chooses a spanning tree T of G, while the edge player chooses an edge e of G. The payoff to the edge player is defined by a function cost(T, e). A natural continuation of their work is to consider the case when both players are tree players. This also leads us to the problem of the intersection of random spanning trees, that is, the number of common edges of two spanning trees of G chosen uniformly at random. In this paper we derive a lower bound for the minimum expected intersection and, using bootstrap simulations, we determine the empirical mean value for synthetic and real networks. Experiments show that for some real networks the observed empirical mean intersection differs greatly from the minimum expected. Our findings may provide a new perspective for investigating real small-world networks and give new insights into their structure.

Node Classification Based on Non-symmetric Dependencies and Graph Neural Networks
PRESENTER: Emanuel Dopater

ABSTRACT. One of the interesting tasks in social network analysis is detecting network nodes’ roles in their interactions. The first problem is discovering such roles, and the second is detecting the discovered roles in the network. Role detection, i.e., assigning a role to a node, is a classification task. Our paper addresses the second problem and uses three roles (classes) for classification. These roles are based only on the structural properties of the neighborhood of a given node and use the previously published non-symmetric relationship between pairs of nodes for their definition. This paper presents transductive learning experiments using graph neural networks (GNN) to show that excellent results can be obtained even with a relatively small sample size used for training the network.

Delta density: comparison of different sized networks irrespective of their size
PRESENTER: Jakub Plesnik

ABSTRACT. This paper describes the complex area of measuring network density, with a focus on the ability to compare density between multiple different-sized networks. We point out problems of the classical approach that come from comparing the density of networks of different sizes, and in response, we introduce a new measure called ∆-density. The theoretical background for ∆-density is accompanied by a practical example of application. Experiments use five real-world networks with time information to analyze how ∆-density changes over time.

Detecting network hyper-motifs in real networks and their emergent properties
PRESENTER: Miri Adler

ABSTRACT. Networks are fundamental for our understanding of complex systems. The study of networks has uncovered common principles that underlie the behavior of vastly different fields of study, including physics, biology, sociology, and engineering. One of these common principles is the existence of network motifs, small recurrent patterns that can provide certain features that are important for the specific network. However, it remains unclear how network motifs are joined in real networks to make larger circuits and what properties emerge from interactions between network motifs. Here, we develop a framework to explore the mesoscale-level behavior of complex networks. Considering network motifs as hypernodes, we define the rules for their interaction at the network’s next level of organization. We develop a method to infer the favorable arrangements of interactions between network motifs into hyper-motifs from real evolved and designed network data. We mathematically explore the emergent properties of these higher-order circuits and their relations to the properties of the individual minimal circuit components they combine. We apply this framework to biological, neuronal, social, linguistic, and electronic networks and find that network motifs are not randomly distributed in real networks but are combined in a way that both maintains autonomy and generates emergent properties. This framework provides a basis for exploring the mesoscale structure and behavior of complex systems where it can be used to reveal intermediate patterns in complex networks and to identify specific nodes and links in the network that are the key drivers of the network’s emergent properties.

16:35-17:15 Session Speaker S4: Melanie MITCHELL Santa Fe Institute, USA
16:35
Why AI is Harder Than We Think

ABSTRACT. Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between periods of optimistic predictions and massive investment (“AI Spring”) and periods of disappointment, loss of confidence, and reduced funding (“AI Winter”). Even with today’s seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected.

One reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself. In this talk I will discuss some fallacies in common assumptions made by AI researchers, which can lead to overconfident predictions about the field. I will also speculate on what kinds of new ideas and new science will be needed for the grand challenge of making AI systems more robust, general, and adaptable—in short, more intelligent.

17:15-19:15 Session Oral O6A: Information Spreading in Social Media
17:15
Change my Mind: Data Driven Estimate of Open-Mindedness from Political Discussions

ABSTRACT. One of the main dimensions characterizing the unfolding of opinion formation processes in social debates is the degree of open-mindedness of the involved population. Opinion dynamic modeling studies have tried to capture such a peculiar expression of individuals' personalities and relate it to emerging phenomena like polarization, radicalization, and ideology fragmentation. However, one of their major limitations lies in the strong assumptions they make on the initial distribution of such characteristics, often fixed so as to satisfy a normality hypothesis. Here we propose a data-driven methodology to estimate users' open-mindedness from online discussion data. Our analysis - focused on the political discussion taking place on Reddit during the first two years of the Trump presidency - unveils the existence of statistically diverse distributions of open-mindedness in annotated sub-populations (i.e., Republicans, Democrats, and Moderates/Neutrals). Moreover, such distributions appear to be stable across time and generated by individual users' behaviors that remain consistent and underdispersed.

17:30
Testing the dynamics of online polarization
PRESENTER: Matteo Cinelli

ABSTRACT. Several studies have pointed out that users seek the information they like the most, filter out dissenting information, and join groups of like-minded users around shared narratives. Feed algorithms may push such a configuration toward polarization, thus influencing how information (and misinformation) spreads online. However, despite the extensive evidence and data about polarized opinion spaces and echo chambers, the interplay between human and algorithmic factors in shaping these phenomena remains unclear. In this work, we propose an opinion dynamics model mimicking human attitudes and algorithmic features. We quantitatively assess the adherence of the model's predictions to empirical data and compare the model's performance with that of other state-of-the-art models. We finally provide a synthetic description of social media platforms in terms of the model's parameter space, which may be used to fine-tune feed algorithms and eventually smooth extreme polarization.

17:45
Opinion polarization as a natural consequence of beliefs being interrelated

ABSTRACT. The emergence of opinion polarization within human communities -- the phenomenon that individuals within a society tend to develop conflicting attitudes related to the greatest diversity of topics -- has been a focus of interest for decades, both from theoretical and modelling points of view. Regarding modelling attempts, an entire scientific field -- opinion dynamics -- has emerged in order to study this and related phenomena. Within this framework, agents' opinions are usually represented by a scalar value which undergoes modification due to interaction with other agents. Under certain conditions, these models are able to reproduce polarization -- a state increasingly familiar to our everyday experience. In the present paper we suggest an alternative explanation along with its corresponding model. More specifically, we demonstrate that polarization emerges immediately under exposure to news and information once two well-known human characteristics are incorporated into the representation of agents: (1) in the human brain beliefs are interconnected, and (2) people strive to maintain a coherent belief system. Furthermore, the model accounts for the proliferation of fake news, and shows how opinion polarization is related to various cognitive biases.

18:00
The Russian invasion of Ukraine through the lens of ex-Yugoslavian Twitter
PRESENTER: Bojan Evkoski

ABSTRACT. The war between Ukraine and Russia brings about dramatic changes in the world. Analysing the structure and content of the communications on social media, such as Twitter, can help in understanding the causes, developments and consequences of this conflict. The geographical area of interest in our research is the part of ex-Yugoslavia where the BCMS (Bosnian, Croatian, Montenegrin, Serbian) languages are spoken, official varieties of the pluricentric Serbo-Croatian macro-language. This area is strongly politically divided by diverging influences of NATO (Croatia, Montenegro, North Macedonia, Bosniak and Croatian entity in Bosnia and Herzegovina) and Russia (Serbia, Serbian entity in Bosnia and Herzegovina). Croatia has been a full EU member since 2013, whereas Montenegro, North Macedonia and Serbia are EU candidate members and Bosnia and Herzegovina is a potential candidate. Regarding military alliances, NATO members are Croatia (since 2009), Montenegro (since 2017) and North Macedonia (since 2020), while Serbia does not aspire to join NATO, primarily due to a complex Serbia-NATO relationship caused by the NATO intervention in Yugoslavia in 1999.

To shed light on the impact of the Russian invasion on this brittle and complex geographical and political area, we use social network analysis over available Twitter data from six weeks before and six weeks during the invasion. We discover a complex landscape of ideology-specific and country-specific communities, and analyse the transition into evident pro-Ukraine and pro-Russia leanings.

As the communities show very divergent properties, we echo concerns of the heavy polarization and possible destabilization of this area of the Balkans.

18:15
The Evolution of the Covid-19 Vaccine Debate on Twitter

ABSTRACT. The COVID-19 pandemic has been a time of great uncertainty on a global scale. Since the outbreak of the coronavirus, we have witnessed a surge of announcements and contrasting information on both online and traditional media. In this paper, we analyze how online (mis)information about COVID-19 vaccines evolved on Twitter in the period from January 2020 to April 2021 across five countries: France, Germany, Great Britain, Italy, and USA. In particular, our study explores how information linked to reliable and questionable sources circulated on Twitter and whether its diffusion changed in correspondence with two major events, i.e., the announcement of the first COVID-19 vaccine from Pfizer, and the European Medicines Agency (EMA)'s suspension of the AstraZeneca vaccine. Our results show that news from questionable sources is present in the debate of all five countries. Moreover, the network analysis reveals that users cluster together in communities consuming opposite kinds of information sources, suggesting the presence of an echo chamber effect.

18:30
Assessing the Role of Influencers in Tweet Propagation
PRESENTER: Joel Nilsson

ABSTRACT. Understanding the mechanisms by which information spreads on social media is of importance in many different contexts, from viral marketing campaigns to news diffusion. In particular, a widely investigated aspect of information spreading has to do with the role of so-called "influencers", here intended as users with a large base of followers on an online platform. The aim of this paper is to contribute to understanding the role of influencers on the Twitter microblogging platform. The perspective we take is completely model-free and does not rely on any network of interactions among Twitter users. It is based on comparing the engagement (i.e., overall amount of retweets, quotes and replies) on a specific Twitter conversation thread before and after an influencer happens to contribute to it. What we find in our analysis is that users that have a large following (i.e., influencers) tend to appear in the conversation before or in conjunction with the engagement peaks, and that the overall engagement seems to increase after an influencer has contributed to the conversation.

18:45
Using knowledge graphs to detect partisanship in online political discourse
PRESENTER: Ari Decter-Frain

ABSTRACT. Existing methods for detecting partisanship and polarization on social media focus on either linguistic or network aspects of online communication, and tend to study a single platform. We explore the possibility of quantifying online polarization using knowledge graph embeddings, which can potentially combine linguistic and network information across multiple platforms to enable more accurate discovery of a political dimension in online space. We train embeddings on graphs that combine different types of text and network information, and include either Twitter data or data from multiple social media platforms. We then compare an unsupervised approach (PCA) and a semi-supervised, seed-based approach for uncovering a political dimension in the embeddings. We find that both methods provide minimal meaningful information about the partisanship of Twitter users. We suggest possible reasons why the methods struggle, which stem from the underlying data, and some unresolved challenges of working with knowledge graph embeddings.

19:00
Cognitive Cascades within Media Ecosystems: Simulating Fragmentation, Selective Exposure and Media Tactics to Investigate Polarization
PRESENTER: Nicholas Rabb

ABSTRACT. This work introduces a simple extension to the recent Cognitive Cascades model of Rabb et al. with modeling of multiple media agents, to begin to investigate how the media ecosystem might influence the spread of beliefs (such as beliefs around COVID-19 vaccination). We perform some initial simulations to see how parameters modeling audience fragmentation, selective exposure, and responsiveness of media agents to the beliefs of their subscribers influence polarization.

17:15-19:15 Session Oral O6B: Diffusion & Epidemics
17:15
Exact solution of heterogeneous Markovian SI epidemics on networks

ABSTRACT. We exactly solve the SI process on heterogeneous networks. Of primary importance are the eigenvalues of the infinitesimal generator Q, which lie on its diagonal and are equal to the sum of all weighted links in the cut set between all susceptible and all infected nodes. Furthermore, we show that small eigenvalues correspond to bottleneck configurations in the SI spreading process. Finally, we compute the exact solution for all weighted networks and all times under some light assumptions, using an iterative construction of the eigenvectors of Q. We demonstrate the validity of our method by simulations.
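
As a rough illustration of the cut-set quantity described in this abstract, the following Python sketch enumerates every non-empty set of infected nodes on a small weighted network and sums the link weights between infected and susceptible nodes. The names `cut_weight` and `si_exit_rates`, the symmetric matrix convention, and the three-node path are assumptions made for this example; the sketch does not reproduce the authors' eigenvector construction.

```python
from itertools import combinations

def cut_weight(weights, infected):
    """Sum of link weights between infected and susceptible nodes."""
    n = len(weights)
    return sum(weights[i][j]
               for i in infected
               for j in range(n) if j not in infected)

def si_exit_rates(weights):
    """Map every non-empty infected set to its cut weight.

    In a Markovian SI process the infected set can only grow, so the
    total rate of leaving a state equals this cut weight.
    """
    n = len(weights)
    rates = {}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            rates[subset] = cut_weight(weights, set(subset))
    return rates

# Hypothetical path network 0 - 1 - 2 with link weights 1.0 and 2.0.
weights = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]
rates = si_exit_rates(weights)
```

In this toy network the state with only node 0 infected has the smallest non-zero cut weight, which is the kind of bottleneck configuration the abstract associates with small eigenvalues.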

17:30
Birth-Death Processes Reproduce the Epidemic Footprint
PRESENTER: Gerrit Großmann

ABSTRACT. *This is an extended abstract submission, the introduction is provided below*

The stochastic dynamical behavior of epidemics is usually highly dependent on the topology of possible agent interactions. Often the possible interactions are constrained by a graph structure, i.e., the contact network. The standard paradigm in this category is the Susceptible-Infected-Susceptible (SIS) model. Nodes are either infected (I) or susceptible (S). Infected nodes (randomly) propagate their infection to their susceptible neighbors and can spontaneously recover (i.e., become susceptible again). Here, we consider a continuous-time model where the waiting times between events follow an exponential distribution.

A crucial value for any SIS model is the effective infection rate $\beta$, defined as the ratio between infection and recovery rate (thus, for simplicity, we set the recovery rate to one). It determines the epidemic's characteristics, such as its long-time behavior. To this end, we define the infection footprint $\tau(\beta)$ to be the expected fraction of infected nodes in equilibrium as our value of interest. For technical reasons (i.e., to get rid of the extinction), we set the recovery rate to zero if only a single node is infected. Thereby, we artificially remove the trap state where all nodes are susceptible and get a meaningful and well-defined equilibrium. Our goal is to estimate $\tau(\cdot)$ efficiently. To this end, we propose a model reduction (or lumping) technique that provides an abstraction of the network topology and leads to a simple birth-death process. Both the (approximate) construction and (exact) analysis of the reduced model are computationally fast and can be performed for contact networks with millions of nodes with ease.
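
To make the birth-death idea concrete, here is a minimal Python sketch for one special case only: a complete contact graph, where the number of infected nodes is itself a birth-death process. The function name `footprint_complete_graph` and the choice of topology are assumptions for illustration; the abstract's lumping technique for general networks is not reproduced. With recovery rate one, state k (k infected nodes) has birth rate beta * k * (n - k) and death rate k, except that the death rate of state 1 is set to zero as described above.

```python
def footprint_complete_graph(n, beta):
    """Expected equilibrium fraction of infected nodes on a complete graph.

    Uses detailed balance of the birth-death chain on states 1..n:
    pi[k + 1] * death(k + 1) = pi[k] * birth(k).
    """
    weights = [1.0]  # unnormalized stationary weight of state k = 1
    for k in range(1, n):
        birth = beta * k * (n - k)  # infection events out of state k
        death = k + 1               # recovery events out of state k + 1
        weights.append(weights[-1] * birth / death)
    total = sum(weights)
    mean_infected = sum((k + 1) * w for k, w in enumerate(weights)) / total
    return mean_infected / n
```

For example, `footprint_complete_graph(2, 1.0)` gives stationary weights 1 and 0.5 for one and two infected nodes, so the expected infected fraction is 2/3.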

17:45
Influence of heterogeneous age-group contact patterns on critical vaccination rates for herd immunity to SARS-CoV-2
PRESENTER: Caterina Scoglio

ABSTRACT. In this presentation, we address the challenges of creating herd immunity to SARS-CoV-2 infection by means of preventive vaccination strategies with waning immunity that take into account the contact rates among age segments. In particular, short-lived immunity implies that continuous vaccination campaigns are needed to preserve herd immunity. Therefore, we adopt the assumption of reaching a disease-free equilibrium (DFE) where only susceptible and vaccinated individuals are present. Then, using an age-structured Susceptible-Infected-Recovered-Vaccinated model, we first derive the expression for the vaccination rates that lead to the maximum vaccination coverage at this equilibrium for a supply of vaccines per unit time given by a fixed mean per capita vaccination rate w. Next, if R*0 denotes the basic reproduction number at the DFE with vaccinated individuals, we compute two different sets of per age-group vaccination rates: 1) the set that minimizes R*0 with the constraint that w is the same as the critical per capita rate wc under uniform vaccination (R*0(wc) = 1), and 2) the set at which the minimum R*0 equals 0.996 when a suitable (and lower) total vaccination rate is assumed.

For the limited supply of vaccine given by wc, we found that the value of R*0 obtained by maximizing the vaccination coverage is always larger than the minimum of R*0 attainable under the same constraint on the mean per capita vaccination rate. The latter then defines the optimal allocation of vaccines among age groups under the given supply. On the other hand, since this minimum R*0 will clearly be less than 1, the vaccination rates of the second set (R*0 = 0.996) will be smaller than those of the first set, thus achieving herd immunity at a lower supply of vaccine.

18:00
Ensemble of Opinion Dynamics Models to Understand the Role of the Undecided about Vaccines
PRESENTER: Jacopo Lenti

ABSTRACT. In recent years, the vaccine debate has attracted attention across social media, with an outstanding increase during COVID-19 vaccination campaigns. The topic has created at least two opposing factions, pro- and anti-vaccine, that have conflicting and incompatible narratives. However, a non-negligible fraction of the population has an unclear position, as many citizens feel confused by the vast amount of information coming from both sides in the online social network. The engagement of the undecided population by the two parties plays a key role in the success of the vaccination campaigns. In this paper, we present three models used to describe the recruitment of the undecided population by pro-vax and anti-vax factions in a three-state context. Starting from real-world data of Facebook pages previously labelled as pro-vaccine, anti-vaccine or neutral, we describe and compare three opinion dynamics models that capture different behaviours of the undecided population. The first one is a variation of the SIS model, where the undecided position is considered an indifferent position, including users not interested in the discussion. Neutrals can be “infected” by one of the two extreme factions, joining their side, and they “recover” when they lose interest in the debate and go back to neutrality. The second model studied is a voter model with three parties, in which neutral pages represent a centrist position: they hold their own original ideas, which differ from those of both other parties. The last is the Bilingual model adapted to the vaccination debate: it describes a context where neutral individuals are in agreement with both pro- and anti-vax factions, with a position of compromise between the extremes (“bilingualism”). If they have a one-sided neighbourhood, the necessity (or the convenience) to agree with both parties disappears, and bilinguals can become monolinguals.
Our results show agreement among the three models: anti-vax opinion propagates more than pro-vax, thanks to an initial strategic position in the online social network (even if the anti-vax faction starts with a smaller population). While most of the pro-vaccine nodes are segregated in their own communities, anti-vaccine nodes are entangled at the core of the network, where the majority of the undecided population is located. In the last section, we propose and compare some policies that could be applied on the network to prevent the anti-vax side from prevailing: they lead us to conclude that neither censoring strategies nor segregating scenarios based on unfollowing decisions are effective, while the addition of links in the network favours the containment of the pro-vax domain, reducing the distance between pro-vaxxers and the undecided population.

18:15
Epidemic evolution in a dynamic multilayer network to study the spread of antibiotic resistance
PRESENTER: Paola Stolfi

ABSTRACT. Antimicrobial resistance (AMR) is a major public health problem of the 21st century. The present work has been conducted within the JPIAMR project MAGIcIAN, whose aim is to support the sustainable introduction of antimicrobial drugs while minimising the emergence of AMR. In particular, we developed a multi-level model to describe the spread of gonorrhoea, which is caused by a multidrug-resistant bacterium, at the individual level (within-host) and at the population level (between-host). Here we focus on the latter level, which includes a dynamic multilayer sexual contact network. Each layer represents a sexual community, and we provide a realistic description of the formation and break-up processes of sexual partnerships in each layer, allowing for the possibility of mixing among layers. We show the behaviour of the network through numerical simulations and we calibrate the model against real data.

18:30
Overcoming vaccine hesitancy by multiplex social network targeting

ABSTRACT. Understanding the impact of social factors on disease prevention and control is one of the key questions in behavioral epidemiology. The interactions of disease spreading and human health behavior such as vaccine uptake give rise to rich dueling dynamics of biological and social contagions. In light of this, how to target networks optimally in order to harness the power of social contagion for behavior and attitude changes remains largely an open problem. Here we address this question explicitly in a multiplex network setting. Individuals are situated on two layers of networks. On the disease transmission network layer, they are exposed to infection risks. In the meantime, their opinions and vaccine uptake behavior are driven by the social discourse of their peer influence network layer. While the disease transmits through direct close contacts, vaccine views and uptake behaviors spread interpersonally within a long-range, potentially virtual network. Our comprehensive simulation results demonstrate that network-based targeting with initial seeds of pro-vaccine supporters significantly influences the ultimate adoption rates of vaccination and thus the extent of the epidemic outbreak.

18:45
Inferring the transmission dynamics of Avian Influenza from news and environmental data
PRESENTER: Nejat Arınık

ABSTRACT. Avian Influenza (AI) is a highly contagious animal disease. Although a few human infection cases occurred in the past, it mainly infects wild and domestic bird species. Transmission between birds can be direct due to close contact, or indirect through contaminated materials such as feed and water. Particularly, migratory wild birds play a key role in this transmission and make the viruses spread over long distances. The emergence and spread of AI has serious consequences for animal health and a substantial socio-economic impact for worldwide poultry producers. Due to its highly contagious nature, it is critical to monitor the evolution and spread of this disease. There are mainly two types of surveillance systems established for this purpose: Indicator-Based Surveillance (IBS) and Event-Based Surveillance (EBS). IBS usually relies on official notification procedures submitted by a country, whereas EBS extracts epidemiological events mostly from unofficial sources, such as news articles, through various Natural Language Processing tasks. Recently, several EBS platforms have shown their effectiveness by detecting the first signals of emerging infectious disease outbreaks in a timely manner and providing alerts within previously unaffected areas.

Even though it is a well-studied and constantly monitored disease, real transmission routes of the AI viruses are hard to discover, i.e., where an outbreak that appears in some location came from. Nonetheless, acquiring such disease transmission routes may make it possible to discover transmission patterns and to predict subsequent outbreaks in a timely manner. In this work, we focus on the problem of inferring how the AI disease spreads in the absence of knowledge about real disease transmission routes. We address it as a network inference problem, leveraging spatio-temporal event datasets provided by EBS platforms and environmental data. The inferred network is then used to study the transmission patterns of the AI disease. Although network inference problems have been widely studied in the domain of information diffusion, our work differs from others in that 1) unlike contact network analyses, which operate at the individual farm level, it is at the metapopulation level (e.g., country, province, city), and 2) it reflects a more realistic scenario by taking advantage of the migration routes of wild birds, as well as poultry transportation data and environmental data related to the outbreak locations.

19:00
Detecting Global Community Structure in a COVID-19 Activity Correlation Network

ABSTRACT. The global pandemic of COVID-19 over the last 2.5 years has produced an enormous amount of epidemic/public health datasets, which may also be useful for studying the underlying structure of our globally connected world. Here we used the Johns Hopkins University COVID-19 dataset to construct a correlation network of countries/regions and studied its global community structure. Specifically, we selected countries/regions that had at least 100,000 cumulative positive cases from the dataset and generated a 7-day moving average time series of new positive cases reported for each country/region. We then calculated a time series of daily change exponents by taking the day-to-day difference in the log of the number of new positive cases. We constructed a correlation network by connecting countries/regions that had positive correlations in their daily change exponent time series, using their Pearson correlation coefficient as the edge weight. Applying the modularity maximization method revealed that there were three major communities: (1) mainly Europe + North America + Southeast Asia that showed similar six-peak patterns during the pandemic, (2) mainly Near/Middle East + Central/South Asia + Central/South America that loosely followed Community 1 but had a notable increase of activity because of the Delta variant and was later impacted significantly by the Omicron variant, and (3) mainly Africa + Central/East Canada + Australia that did not have much activity until a huge spike was caused by the Omicron variant. These three communities were robustly detected under varied settings. Constructing a 3D "phase space" by using the median curves in those three communities for x-y-z coordinates generated an effective summary trajectory of how the global pandemic progressed.
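
The network construction described in this abstract can be sketched in plain Python as follows. The function names, the toy series labelled A, B and C, and the requirement that all smoothed counts are strictly positive (so that the logarithm is defined) are assumptions made for this example; the real analysis used the Johns Hopkins University dataset and a modularity maximization step that is not shown here.

```python
import math

def moving_average(series, window=7):
    """Trailing moving average; the first window - 1 days are dropped."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def change_exponents(smoothed):
    """Day-to-day difference in the log of the smoothed case counts."""
    return [math.log(b) - math.log(a) for a, b in zip(smoothed, smoothed[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_network(cases):
    """Weighted edges between regions with positively correlated exponents."""
    exps = {name: change_exponents(moving_average(s))
            for name, s in cases.items()}
    names = sorted(exps)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(exps[a], exps[b])
            if r > 0:  # only positive correlations become edges
                edges[(a, b)] = r
    return edges

# Hypothetical daily case counts for three made-up regions.
base = [100 + 10 * (t % 5) + t for t in range(30)]
cases = {
    "A": base,
    "B": [2 * v for v in base],     # same dynamics, larger population
    "C": [1000 - v for v in base],  # rises whenever A falls
}
edges = correlation_network(cases)
```

In this toy input only the pair A and B is connected, because the change exponents of C move opposite to those of A and B.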

17:15-19:15 Session Oral O6C: Multilayer Networks
17:15
Community recovery from temporal and real-valued network data

ABSTRACT. This is an extended abstract for the arXiv preprint "Community recovery in non-binary and temporal stochastic block models" (arXiv:2008.04790).

This article studies the estimation of latent community memberships from pairwise interactions in a network of N nodes, where the observed interactions can be of arbitrary type, including binary, categorical, and vector-valued, and not excluding even more general objects such as time series or spatial point patterns. As a generative model for such data, we introduce a stochastic block model with a general measurable interaction space S, for which we derive information-theoretic bounds for the minimum achievable error rate. These bounds yield sharp criteria for the existence of consistent and strongly consistent estimators in terms of data sparsity, the statistical similarity between intra- and inter-block interaction distributions, and the shape and size of the interaction space. Moreover, this general framework makes it possible to study temporal networks with T snapshots, in settings where both N and T go to infinity, and the temporal interaction patterns are correlated over time. We finally present an online algorithm for clustering temporal networks with Markovian interactions, which fully utilises the non-binary nature of the observed data. We illustrate the performance of this algorithm with numerical experiments on synthetic and real data sets.

17:30
A comparative analysis of multiplex phonological and orthographic networks

ABSTRACT. The study of natural language using a network approach has made it possible to characterize novel properties ranging from the level of individual words to phrases or sentences. A natural way to quantitatively evaluate similarities and differences between spoken and written language is by means of a multiplex network defined in terms of a similarity distance between words. Here, we use a multiplex representation of words based on orthographic or phonological similarity to evaluate their structure.

In order to identify possible emergent structures as well as compare the characteristic properties of each language family when modeling morphological similarity in natural language, we use various global and local metrics, such as link density, or the number of connected components, as well as the degree, the clustering coefficient and the average degree of the neighboring nodes.

17:45
Shedding a light on the ESG risk factor: a multiplex approach
PRESENTER: Davide Stocco

ABSTRACT. One of the main challenges of today's financial system concerns the transition to sustainability. Several rating agencies have proposed a scoring framework to reflect firms’ compliance with Environmental, Social, and Governance (ESG) responsibilities and to give an instrument to drive investors’ decisions. Employing a multiplex network approach, it is possible to allow for a multi-attribute description of the financial market network based on different market variables. In our study, we employ multiplex networks to assess the dependency structure between the market and sustainable attributes of the listed firms. We show the presence of a significant interdependence between the ESG and the risk exposure profile of the listed firms. We find that this relationship has strengthened in recent years, especially for the best-performing firms.

18:00
Centrality in the Macroeconomic Multi-Network Explains the Spatio-Temporal Evolution of Country Per-Capita Income

ABSTRACT. The investigation of the determinants of the spatio-temporal evolution of country per-capita income has a long tradition in applied and theoretical economics. Within this vast body of research, a large stream of literature has recently explored the role that technology diffusion, occurring via cross-country spillovers, may play in shaping the observed heterogeneity in the patterns of country income, both cross-sectionally and over time. Empirical testing of the income-enhancing effect of technology diffusion has been traditionally carried out using aggregate measures of country openness as a proxy of the extent to which a country is exposed to foreign markets and migration flows. However, these measures are essentially local, as they only account for direct interactions with neighboring countries. Indeed, it may be the case that such openness proxies are not perfectly correlated with indicators accounting for the global embeddedness of a country in the networks of international relations. In this work, we present a simple theoretical country-growth model predicting that, net of country-specific spatio-temporal characteristics (including traditional openness proxies), country per-capita income should positively depend on the country's global importance (i.e., as measured by its Bonacich or Katz centrality) in the macroeconomic networks in which it is embedded. Next, we take the implications of the theoretical model to the data, using data on the international networks of merchandise trade, finance and migration. We build a 3-layer international multi-layer network and we employ different measures of country centrality in such a network as a covariate in panel regressions explaining country per-capita income. The empirical exercises strongly support the predictions of the model, robustly across a number of alternative specifications of the empirical model and controlling for possible endogeneity issues and spatial effects.
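
For readers unfamiliar with Katz centrality, the following Python sketch computes it on a single adjacency matrix by fixed-point iteration of x = beta + alpha * A^T x. The function name, the default parameters, and the star-shaped toy network are assumptions made for this example; how the abstract's three layers (trade, finance, migration) are combined into a centrality score is not specified here and is not modelled.

```python
def katz_centrality(adj, alpha=0.1, beta=1.0, tol=1e-10, max_iter=1000):
    """Katz centrality by fixed-point iteration.

    adj[j][i] is the weight of the link from node j to node i.
    The iteration converges when alpha is below 1 / (largest eigenvalue).
    """
    n = len(adj)
    x = [beta] * n
    for _ in range(max_iter):
        new = [beta + alpha * sum(adj[j][i] * x[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence; alpha may be too large")

# Hypothetical undirected star: node 0 is linked to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = katz_centrality(star)
```

On the star, the hub receives the highest score because it gathers contributions from all three leaves, while each leaf receives a contribution only from the hub.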

18:15
Estimation of flow trajectories in a multi-lines network
PRESENTER: Romain Loup

ABSTRACT. Automatic passenger counters measure the number of passengers entering and leaving buses at each stop. Given this information, can we estimate the complete trajectories of passengers within the entire multi-line network? This communication proposes an estimate of all passenger trajectories in the multi-line network, obtained with an algorithm based on iterative proportional fitting (IPF).
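
As a minimal sketch of iterative proportional fitting itself, the Python code below rescales a seed origin-destination matrix until its row sums match boardings and its column sums match alightings. The single three-stop line, the seed matrix, and the counts are hypothetical; the authors' multi-line trajectory algorithm is not reproduced here.

```python
def ipf(seed, row_totals, col_totals, iterations=100):
    """Alternately rescale rows and columns of seed to match the totals.

    Zero entries of the seed stay zero, which is how impossible trips
    (such as travelling back to an earlier stop) can be excluded.
    """
    m = [row[:] for row in seed]
    for _ in range(iterations):
        for i, target in enumerate(row_totals):
            s = sum(m[i])
            if s > 0:
                m[i] = [v * target / s for v in m[i]]
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in m)
            if s > 0:
                for row in m:
                    row[j] *= target / s
    return m

# Hypothetical line with three stops; entry [i][j] is a trip from i to j.
seed = [
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
boardings = [10.0, 5.0, 0.0]   # passengers entering at each stop
alightings = [0.0, 4.0, 11.0]  # passengers leaving at each stop
flows = ipf(seed, boardings, alightings)
```

In this toy case the counts determine the flows uniquely: 4 passengers travel from stop 0 to stop 1, 6 from stop 0 to stop 2, and 5 from stop 1 to stop 2. In larger networks the result generally depends on the seed matrix.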

18:30
Multilayer Block Models for Exploratory Analysis of Computer Event Logs

ABSTRACT. We investigate a graph-based approach to exploratory data analysis in the context of network security monitoring. Given a possibly large batch of event logs describing ongoing activity, we first represent these events as a bipartite multiplex graph. We then apply a model-based biclustering algorithm to extract relevant clusters of entities and interactions between these clusters, thereby providing a simplified situational picture. We illustrate this methodology through two case studies addressing network flow records and authentication logs, respectively. In both cases, the inferred clusters reveal the functional roles of entities as well as relevant behavioral patterns. Displaying interactions between these clusters also helps uncover malicious activity. Our code is available at https://github.com/cl-anssi/MultilayerBlockModels.

18:45
Interspecific competition affects the structural stability of mutualistic networks

ABSTRACT. We developed a new model taking into account heterogeneous interspecific competition. Our investigations show that this leads to drastically reduced regions of feasible and stable coexistence. In the talk we further elaborate on the differences between mean field theories and the multilayer model with varying interspecific competition. We derive conditions for which a feasible and stable population of coexisting plants and pollinators is possible.

19:00
Excess closure in a multilayer population-scale social network
PRESENTER: Eszter Bokanyi

ABSTRACT. Recent studies on large-scale social networks successfully utilise the growing abundance of digital data sources such as online social networks or mobile communication datasets to uncover fundamental insights on human interaction.

However, in most of these social network data sources, the sample of people represented by the nodes is biased, and the lack of demographic data makes it hard to assess representativeness. Moreover, it is often not clear what exact social relations these online or communication ties represent, so it is difficult to interpret findings when the goal is to derive meaningful conclusions about people's social ties.

We overcome a number of these drawbacks by presenting a thorough analysis of the complete structure of a 17M node population-scale social network of the Netherlands containing roughly 1.6B edges. This network is derived from highly curated official data sources of the country’s national statistics institute and includes every registered resident in 2018. The edges cover several social relationships: family, household, work, school, and neighbor. We model each of these edge types as a layer of a node-aligned multilayer network.

In addition, we have rich individual-level demographic and socio-economic attributes on the nodes (people) available. We consider the network to be a representation of the social opportunity structure in the Netherlands.

Here, we present first results that show how this population-scale social network is markedly different from many of the large-scale social networks we typically study, and we reflect on the consequences for computational social science. In particular, we do so below by revisiting the well-known concept of closure.

Closure is important because individuals have very different resource structures encoded into their social relationships throughout their lives or across demographic groups, which affects their access to opportunities and information. However, if we choose to measure closure in a complete population-scale social network through the traditional local clustering coefficient on the separate layers, we would get values close to 1. Taking the union of edges from all layers lowers the average local clustering coefficient to 0.40, but this measure is still unable to resolve potential overlaps or bridges between edges from different layers in people's egonetworks.

To overcome this problem, we propose a normalized clustering coefficient, which we call excess closure, that fully exploits the multilayer structure of the network and captures the fraction of triangles in people’s social circles that span multiple types of relationships.
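
The exact normalization behind excess closure is not given in this abstract, so the following Python sketch does not implement the authors' measure. Under that loud caveat, it only illustrates the underlying idea of counting, among the triangles around one person, those whose three edges do not all lie in a single common layer. The function name `cross_layer_triangle_fraction`, the layer names, and the toy edges are assumptions made for this example.

```python
from itertools import combinations

def cross_layer_triangle_fraction(layers, ego):
    """Fraction of triangles around ego that span more than one layer.

    layers maps a layer name to a list of undirected edges (u, v).
    A triangle counts as spanning multiple layers when no single layer
    contains all three of its edges. Illustrative only; this is not the
    normalized excess closure measure from the abstract.
    """
    edge_layers = {}
    for name, edges in layers.items():
        for u, v in edges:
            edge_layers.setdefault(frozenset((u, v)), set()).add(name)
    neighbours = {w for e in edge_layers if ego in e for w in e if w != ego}
    total = spanning = 0
    for v, w in combinations(sorted(neighbours), 2):
        closing = frozenset((v, w))
        if closing not in edge_layers:
            continue  # v and w are not connected, so there is no triangle
        total += 1
        shared = (edge_layers[frozenset((ego, v))]
                  & edge_layers[frozenset((ego, w))]
                  & edge_layers[closing])
        if not shared:
            spanning += 1
    return spanning / total if total else 0.0

# Hypothetical four-person network with three relationship layers.
layers = {
    "family": [("a", "b"), ("a", "c"), ("b", "c")],
    "work": [("a", "d"), ("a", "b"), ("b", "d")],
    "school": [("c", "d")],
}
fraction = cross_layer_triangle_fraction(layers, "a")
```

Person "a" sits in three triangles here: one purely within the family layer, one purely within the work layer, and one (with "c" and "d") that needs edges from three different layers, so one triangle in three spans multiple layers.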

Figure 1 shows how degree and excess closure change with age (a demographic attribute) in the population. Young children have low degrees and very high excess closure since they are only part of family, neighborhood, and household structures. Subsequent levels of education paired with working opportunities come with both an increasing median degree, and decreasing excess closure, reaching its minimum around the university age. Working years are characterized by a slight increase in closure, and gradually decreasing degree, giving place to low degrees and increased closure in retirement years.

Our new normalized multilayer clustering coefficient measure, excess closure, helps to analyse complete large-scale social networks. The measure captures overlap and bridging between edges of different types in the egonetwork of an individual. We find that excess closure varies across demographic groups as well as throughout people's lives, and it gives a more fine-grained understanding of closure in multilayer population-scale social network data. Our results show a sharp transition from closed to open network structures as young adults engage in higher levels of education, and a reverse process as people retire. These measurements are first steps in building both methods and universal insights on the rich network structure of highly curated population-level network datasets.