NETSCI 2022: INTERNATIONAL SCHOOL AND CONFERENCE ON NETWORK SCIENCE
PROGRAM FOR THURSDAY, JULY 28TH

11:00-12:00 Session 13: Poster V
RESPONSE OF GENE REGULATORY NETWORKS AFTER INFECTION BY THE H3N2 VIRUS

ABSTRACT. Viral infection is a complicated dynamical process in which viruses intrude into cells to duplicate themselves and trigger subsequent biological processes regulated by genes. It can cause serious harm to human health. A scheme is proposed to monitor the response of cells after they are infected by viruses. Co-expression levels of genes measured at successive time points form a gene expression profile sequence, which is mapped to a temporal gene regulatory network. The fission and fusion of the communities of the networks are used to find the active parts. We investigated an experiment in which flu viruses were injected into a total of 17 healthy volunteers, who split into an infected group and a survival group. The survival group is much more chaotic, i.e., complicated fissions and fusions of communities occur over the whole network. For the infected group, the most active part of the regulatory network forms a single community, whereas in the survival group it is contained within one large community and is completely conserved. In total, 6 and 7 genes in the active structure take part in the Parkinson's disease and ribosome pathways, respectively. Overall, 30 of the 48 genes (62.5%) in the active structure participate in neurodegeneration and its related pathways. This scheme can be extended straightforwardly to extract characteristics of trajectories of complex systems.

Multi-scale Transition Matrix Approach To Time Series

ABSTRACT. Statistical and structural characteristics provide us with a multi-dimensional picture of time series. Rooting these properties in a unified scheme is the preliminary step toward developing a model that can reproduce most of, or even the whole of, this picture. In this paper, we propose a concept called the multi-scale transition matrix: a series of transition matrices, each describing the probabilities of jumps between states after a specified number of steps. The eigenvectors corresponding to the unit eigenvalue are identical to the probability distribution function. The second largest eigenvalues depict the upper-boundary curve of the persistence. The rate of change of the eigenvector corresponding to the second largest eigenvalue as a function of time scale displays the relaxation behavior of the time series. The multi-scale matrix preserves the autocorrelation structure of the original time series and its evolution, which are merged away by the averaging procedure behind purely statistical properties. These predictions are confirmed using series generated with the Auto-Regressive Conditional Heteroskedasticity model, the Auto-Regressive model and fractional Brownian motion, together with empirical records of the Shenzhen Component Index in Mainland China and the word-length series of the novel "Remembrance of Things Past" by Marcel Proust. Hence, the concept is a good candidate for bridging the multi-dimensional picture and dynamical models of time series.
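
For illustration, a minimal sketch of how such matrices can be estimated, assuming the series has already been symbolized into a finite set of states (the abstract does not specify the discretization, so the quantile binning below is an illustrative choice):

```python
import numpy as np

def multiscale_transition_matrices(states, n_states, scales):
    """Estimate T_s[i, j] = P(x_{t+s} = j | x_t = i) for each scale s.

    `states` is a 1-D integer array of symbols in {0, ..., n_states-1},
    e.g. obtained by binning a continuous series into quantiles.
    """
    matrices = {}
    for s in scales:
        counts = np.zeros((n_states, n_states))
        for a, b in zip(states[:-s], states[s:]):
            counts[a, b] += 1
        row_sums = counts.sum(axis=1, keepdims=True)
        matrices[s] = np.divide(counts, row_sums,
                                out=np.zeros_like(counts),
                                where=row_sums > 0)
    return matrices

# Example: symbolize a Gaussian random walk into 4 quantile bins.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).cumsum()
bins = np.quantile(x, [0.25, 0.5, 0.75])
symbols = np.digitize(x, bins)
T = multiscale_transition_matrices(symbols, 4, scales=[1, 2, 5, 10])
# The leading eigenvector (eigenvalue 1) of each T_s recovers the
# stationary distribution; the second eigenvalue tracks persistence.
print(np.sort(np.abs(np.linalg.eigvals(T[1])))[::-1][:2])
```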

Family's Influence on Romantic Relationship and Its Reconstruction

ABSTRACT. A family's influence on a romantic relationship is investigated from theoretical and practical perspectives. Theoretically, an ordinary-differential-equation-based model is proposed to describe the romantic relationship between two partners, where the influence each individual receives from their family is taken into account. The introduction of the family's opinion leads to rich and interesting structure in the dynamical process. With decreasing response of a partner to their own family, two bifurcations occur, separating the dynamical behavior into three types: damped oscillation toward one of four stable states, damped oscillation toward one of two stable states, and limit cycles. These findings are explained through stability analysis of the equilibrium points. Practically, for each individual the opinion of the partner's family is an interesting but hidden variable. Reservoir computing is adopted to discover the hidden variable from the activities of the individual, their family, and their partner. The model and the discovery method can be extended easily to investigate the relationship between two social groups, such as bilateral negotiation, where two representatives play a game under guidance from their respective groups.

A stochastic SEIHR model for COVID-19 data fluctuations

ABSTRACT. Although deterministic compartmental models are useful for predicting the general trend of a disease's spread, they are unable to describe the random daily fluctuations in the number of new infections and hospitalizations, which is crucial in determining the necessary healthcare capacity for a specified level of risk. In this paper, we propose a stochastic SEIHR (sSEIHR) model to describe such random fluctuations and provide sufficient conditions for stochastic stability of the disease-free equilibrium, based on the basic reproduction number that we estimated. Our extensive numerical results demonstrate strong threshold behavior near the estimated basic reproduction number, suggesting that the necessary conditions for stochastic stability are close to the sufficient conditions derived. Furthermore, we found that increasing the noise level slightly reduces the final proportion of infected individuals. In addition, we analyze COVID-19 data from various regions worldwide and demonstrate that by changing only a few parameter values, our sSEIHR model can accurately describe both the general trend and the random fluctuations in the number of daily new cases in each region, allowing governments and hospitals to make more accurate caseload predictions using fewer compartments and parameters than other comparable stochastic compartmental models.
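
The abstract does not give the model equations; the sketch below is a minimal Euler-Maruyama integration of one plausible SEIHR formulation with multiplicative noise on the transmission term. All parameter names and values are illustrative assumptions, not those of the paper.

```python
import numpy as np

def simulate_sseihr(beta=0.3, sigma=0.2, eta=0.1, gamma_h=0.07, gamma_i=0.1,
                    noise=0.05, days=300, dt=0.1, N=1.0, seed=1):
    """Euler-Maruyama integration of a stochastic SEIHR model.

    Compartments: S, E, I, H, R (fractions of population N). The noise
    term perturbs the transmission rate; all rates here are assumed
    for illustration only.
    """
    rng = np.random.default_rng(seed)
    S, E, I, H, R = 0.99, 0.0, 0.01, 0.0, 0.0
    steps = int(days / dt)
    trajectory = np.empty((steps, 5))
    for t in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt))            # Brownian increment
        infection = beta * S * I / N * dt + noise * S * I / N * dW
        dS = -infection
        dE = infection - sigma * E * dt              # E -> I at rate sigma
        dI = sigma * E * dt - (eta + gamma_i) * I * dt
        dH = eta * I * dt - gamma_h * H * dt         # hospitalization
        dR = gamma_i * I * dt + gamma_h * H * dt
        S, E, I, H, R = S + dS, E + dE, I + dI, H + dH, R + dR
        trajectory[t] = (S, E, I, H, R)
    return trajectory

traj = simulate_sseihr()
print("final infected fraction:", 1 - traj[-1, 0])   # 1 - S at end
```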

Reinforcement Learning based Whale Optimization Algorithm for Solving the Calibration Problem in Car Following Models

ABSTRACT. Simulating driving behavior with high accuracy allows for short-term prediction of traffic parameters, such as speeds and travel times. Car-following models are a method to describe the connectivity structure of vehicular networks. Their calibration has received much attention in the simulation and traffic control fields. It is widely accepted that the model parameters vary along multiple dimensions: across individual drivers, but also spatially across the network and temporally. The calibration of parameters can be transformed into an optimization problem. However, traditional algorithms for solving this optimization problem are not efficient enough to adapt to different models in support of fast decision making. This research proposes a Q-learning-based Whale Optimization Algorithm (QWOA) to improve accuracy and efficiency. Reinforcement learning is integrated into the updating of solution agents in the algorithm, leading to adaptive alternation between update strategies. The proposed algorithm is applied to calibrating three popular models: the Gipps car-following model, the Intelligent Driver Model (IDM) and the Gazis-Herman-Rothery (GHR) model. Four datasets covering different road conditions and car types are employed for validation. The simulation results show that QWOA locates the optimal solution with higher accuracy than state-of-the-art algorithms in the literature. Meanwhile, it shows superior performance in calibrating the three different models, demonstrating QWOA's high adaptive capacity in coping with various models. This result implies that the proposed approach is beneficial for large-scale traffic network simulation analysis. The reinforcement learning method can also be extended to other optimization and control problems in networks.
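
A minimal sketch of such a reinforcement-learning layer: a tabular Q-learner that selects among candidate update strategies and is rewarded by the improvement of the calibration objective. The state and reward definitions, and the stand-in "WOA move", are illustrative assumptions rather than the paper's design.

```python
import random

class StrategySelector:
    """Tabular Q-learning over a small set of update strategies,
    e.g. the encircling, spiral, and random-search moves of a whale
    optimization algorithm (single-state formulation for brevity)."""
    def __init__(self, n_strategies, alpha=0.1, gamma=0.9, eps=0.2):
        self.q = [0.0] * n_strategies
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def choose(self):
        if random.random() < self.eps:                 # explore
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action, reward):
        # One-state Q update: Q <- Q + a * (r + g * max Q - Q)
        target = reward + self.gamma * max(self.q)
        self.q[action] += self.alpha * (target - self.q[action])

# Usage inside an optimization loop (fitness terms are placeholders):
selector = StrategySelector(n_strategies=3)
best = float("inf")
for step in range(100):
    a = selector.choose()
    candidate = random.uniform(0, 1) * (a + 1)   # stand-in for a WOA move
    reward = max(0.0, best - candidate)          # reward = improvement
    best = min(best, candidate)
    selector.update(a, reward)
print("learned Q-values:", selector.q)
```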

Novel Graph Neural Network Based Efficient Subgraph Embedding Method for Link Prediction

ABSTRACT. Link prediction in complex networks aims to recover missing links or to predict the new links a network will generate, based on the current network structure and other vital information. Link prediction is very important for mining and analyzing the evolution of networks, and it has been a hot research topic in complex networks in recent years. Most traditional link prediction algorithms make predictions based on node similarity, and such algorithms require a similarity function to be defined in advance, with considerable uncertainty. A predefined similarity function carries strong assumptions, suits only some very specific network structures, is not universal and has serious defects. To address these shortcomings of traditional link prediction algorithms, this paper analyzes the subgraph structure around target nodes and proposes a graph-neural-network-based subgraph link prediction algorithm, LPGATS (Link Prediction by Graph Attention Networks for Subgraph), built on an enhanced graph attention mechanism. In order to automatically learn the structural characteristics of the subgraph around the target nodes, LPGATS first extracts the h-hop subgraph of the target node pair, and then predicts whether the target node pair generates a link according to that subgraph. Within this procedure, LPGATS uses the graph attention mechanism to weight the differing importance of different nodes in the prediction, achieving better accuracy. We have carried out experiments on seven classical real-world datasets with solid results. The experiments show that our link prediction algorithm LPGATS is suitable for various network structures and is superior to other classical link prediction algorithms such as CN (Common Neighbors), JC (Jaccard), AA (Adamic-Adar), RA (Resource Allocation), Katz and so on.

Temporal Network Sociomarkers for detecting early-warning signals of critical transition in social systems

ABSTRACT. Critical transition phenomena occur in almost all evolving complex systems in the real world. When these systems evolve to a tipping point, their states often undergo characteristic changes. As a particular kind of complex system, the social system is no exception. When the outbreak of a major social event is approaching, some state parameters of the social system may also produce traceable variations. Therefore, early-warning research on social emergencies based on the critical transition theory of complex systems is of significant theoretical and practical value to social system security management and control, crisis intervention, and public opinion guidance.

In this paper, a Temporal Network Sociomarker (TNS) method is proposed by drawing on the idea of the Dynamical Network Biomarker (DNB) method. Instead of traditional rigorous dynamical models, this study uses the statistical characteristics and trends of systematic observation data to establish a model-free method for predicting critical transitions in social systems. A temporal network is used to sample the social system, describing its evolution from the perspective of behavioural interaction structure. A hierarchical inference model is applied to capture social groups at different scales and to shift the research object from the individual to the group, which to some extent overcomes the data fluctuations caused by the subjective behaviour of individuals in the social system. Moreover, the method supports backtracking to sensitive groups at critical points of system evolution and has good interpretability.

This study conducted experiments on the Enron Email Dataset and the John Jay & ARTIS Transnational Terrorism (JJATT) Dataset. In the analysis of the Enron email dataset, the method captures two tipping points in the evolution of the Enron system: one was the crisis caused by the conflict among Enron's top management, and the other was the formal investigation and the plummeting stock price. In the analysis of the JJATT dataset, the method captures tipping points in the evolution of the terrorist relationship network before the occurrence of six terrorist attacks, most of which fall six months to one year before the attack. These cases show that the method can capture early-warning signals of critical transitions in social systems and provides relatively accurate prediction for the associated social emergencies.

Temporal Analysis of Transaction Ego Networks with Different Labels on Ethereum

ABSTRACT. Due to the widespread use of smart contracts, Ethereum has become the second-largest blockchain platform after Bitcoin. Many different types of Ethereum accounts (ICO, Mining, Gambling, etc.) also trade very actively on Ethereum. Studying the transaction records of these specific accounts is very important for understanding their particular transaction characteristics and, further, for labeling pseudonymous accounts. However, traditional methods are generally based on static, global transaction networks, ignoring useful information about dynamic changes. Our work chooses six important account labels and builds ego networks for each kind of Ethereum account. We focus on the interaction between the target node and its neighbor nodes through temporal analysis. Experiments show that there are significant differences between the various types of accounts in terms of several network features, helping us better understand their transaction patterns. To the best of our knowledge, this is the first work to analyze the dynamic characteristics of Ethereum labeled accounts from the perspective of transaction ego networks.

The interaction behaviors among nonferrous metal prices based on recurrence and network analysis

ABSTRACT. As basic materials of the national economy, non-ferrous metals often show extreme volatility due to increasingly frequent unexpected events. These price fluctuations often propagate to other metals and other markets, which exacerbates the complex nonlinear interactions among prices. This paper explores the complex interactions among eight non-ferrous metal prices in the futures and spot markets from the perspective of nonlinear dynamics. The relationships among prices are measured using recurrence quantification analysis, and several nonlinear interaction networks are constructed from these measurement indicators, from different angles, to investigate the nonlinear interactions and dependence structures. We then measure the evolution of the price interaction network structures, especially before and after extreme events. The results reveal that the interactions between different non-ferrous metal prices have gradually increased, and the risk transmission of non-ferrous metal prices has become more complicated. In addition, the interactions among metal prices strengthened after extreme events. Enterprises should therefore broaden their perspective and pay attention to a wider range of non-ferrous metals.

Time-Series Evolution Analysis of the Activity Level of Product Buyers Based on Amazon Data

ABSTRACT. With the popularization of the Internet and electronic payment methods, online consumption has become an important part of today's consumption structure; therefore, grasping the dynamic transformation and evolution characteristics of user groups over the product life cycle has become key for merchants to expand their sales scope and achieve steady sales growth. However, existing research mostly focuses on differentiated analysis of individual users, such as user portraits, recommendation algorithms and user stickiness, and there is little time-series analysis of the average activity level of product buyers. Taking Amazon users as an example, a "co-product network" is constructed for users who have purchased the same product. We find that the number of purchases of each product obeys a power-law distribution. Products purchased more than 10 times are selected as the research subject, with the top 5% of products regarded as high-sales and the bottom 10%-20% as low-sales; the consumer groups of the two show different evolutionary characteristics. Overall, the average user degree rises first and then falls, reaching a peak around the first year; taken separately, the average user degree of high-sales products shows a downward trend, while that of low-sales products shows an upward trend. This indicates that if merchants focus on users with large degree in the early stage of a product release, there is a higher possibility of obtaining better sales results.

Fast Approximation of Network Efficiency

ABSTRACT. Complex networks are encountered frequently in our day-to-day lives. Exact computation of efficiency values is infeasible in large networks due to the need to solve the all-pairs shortest path problem. A sampling-based method of network efficiency estimation is proposed to address the high complexity and long computation time of efficiency calculations in large-scale networks. Experimental results show that the proposed method accurately and effectively estimates the global efficiency of both synthetic and real-world networks, reducing computation time by at least 90% compared with the exact method.
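
For context, global efficiency is the average of inverse shortest-path distances over all ordered node pairs. The sketch below estimates it from a uniform sample of source nodes; the abstract does not specify the paper's exact sampling scheme, so this is one plausible reading.

```python
import random
import networkx as nx

def sampled_global_efficiency(G, n_sources=50, seed=0):
    """Estimate global efficiency by averaging inverse shortest-path
    lengths from a random sample of source nodes, instead of solving
    the full all-pairs problem."""
    rng = random.Random(seed)
    nodes = list(G)
    n = len(nodes)
    sources = rng.sample(nodes, min(n_sources, n))
    total = 0.0
    for s in sources:
        lengths = nx.single_source_shortest_path_length(G, s)
        total += sum(1.0 / d for t, d in lengths.items() if t != s)
    # Each sampled source contributes (n - 1) ordered pairs.
    return total / (len(sources) * (n - 1))

G = nx.barabasi_albert_graph(2000, 3, seed=0)
print("sampled:", round(sampled_global_efficiency(G), 4))
# For small graphs, compare with the exact nx.global_efficiency(G).
```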

Higher-Order Network Inference via Hypergraph Neural Networks

ABSTRACT. Complex networks that go beyond pairwise interactions can greatly enhance our ability to model and predict real-world complex systems. However, the higher-order structures of the whole system typically act as implicit variables and cannot be obtained directly. Inferring plausible higher-order networks from node-wise time series is therefore crucial for further research. In this work, we propose a hypergraph neural network (HGNN)-based inference method, which can be used to reconstruct higher-order networks from time series data. We apply the proposed method to a diverse range of higher-order dynamics, including epidemic spreading, phase oscillators, and social dynamics. The results demonstrate the ability of our method to recover reasonable higher-order networks without any prior knowledge of the system dynamics.

Resampling community detection to maximize propagation in complex networks

ABSTRACT. Identifying important nodes in complex networks is essential in theoretical and applied fields. A small number of such nodes have a decisive influence on information spreading, so it is important to find a set of nodes that maximizes network propagation. Various improvements on baseline ranking methods have been proposed, but no single enhanced method covers all the base methods. In this paper, we propose a penalized method called RCD-Map, short for resampling community detection to maximize propagation, applied to five baseline ranking methods (degree centrality, closeness centrality, betweenness centrality, k-shell and PageRank) using nodes' local community information. We perturb the original graph by resampling to reduce the bias and randomness introduced by community detection methods - both overlapping and non-overlapping - without increasing computational complexity. To assess the performance of our identification method, the SIR (susceptible-infected-recovered) model is applied to simulate the information propagation process. The results show that the penalized methods all perform better, achieving a wider propagation range.

A Convolutional Neural Network Approach to Predicting Network Connectivity Robustness

ABSTRACT. In the past decades, complex networks have become a research hotspot attracting more and more attention. The study of complex networks currently pervades many sciences, such as statistical physics, combinatorial mathematics, computer science and systems engineering. Nowadays, all networked systems face internal cascading failures and external malicious attacks, so constructing more robust networks is becoming increasingly essential. Optimizing network robustness based on connectivity is a common approach. Network connectivity, measured by the largest connected component (LCC), plays a significant role in maintaining a network's basic functions and properties, and the capacity of a network to maintain its connectivity under attack is referred to as connectivity robustness. Here we use the sequence of values recording the remaining connectivity of the network after a sequence of node- or edge-removal attacks to quantitatively measure connectivity robustness. However, the current method for calculating the connectivity robustness of a network is attack simulation, which is extremely time-consuming. The adjacency matrix of a network can be processed as a one-channel image, and the many mathematical models proposed for generating synthetic networks provide ample training samples. These two factors make it possible to apply a CNN (convolutional neural network) to predicting the connectivity robustness of a network. A lighter CNN structure based on VGG is designed to extract features of an input network, and a downstream regressor then performs the prediction. The neural network structure is shown in Fig. 1: the input is an image converted from the adjacency matrix; the output is the predicted LCC curve. We find that illogical values sometimes appear in the CNN output, so we design a filter with boundary constraints and linear interpolation that corrects values that are too large or too small, as well as locally increasing segments of the LCC curve. The proposed method is nearly one hundred times faster than the original method and its prediction error is below the standard deviation in most cases. In summary, the proposed method has high prediction precision and very fast processing speed, which suggests the great potential of deep learning methods for dealing with complex networks. Fig. 1: CNN structure details and partial experimental results.
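
The training targets for such a CNN are the attack curves themselves. A minimal sketch of how a ground-truth LCC curve can be produced by attack simulation (degree-targeted node removal is used here; the paper also considers random and edge attacks, which this sketch omits):

```python
import networkx as nx

def lcc_attack_curve(G, order=None):
    """Connectivity-robustness curve: the fraction of nodes in the
    largest connected component (LCC) after each successive removal.
    `order` defaults to a degree-descending (targeted) attack."""
    H = G.copy()
    n = H.number_of_nodes()
    if order is None:
        order = sorted(H, key=H.degree, reverse=True)
    curve = []
    for v in order[:-1]:                     # keep at least one node
        H.remove_node(v)
        largest = max(nx.connected_components(H), key=len)
        curve.append(len(largest) / n)
    return curve

G = nx.erdos_renyi_graph(200, 0.05, seed=0)
curve = lcc_attack_curve(G)
print(curve[:5])   # remaining LCC fraction after the first 5 removals
```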

Identification Methods of Important Nodes Based on Multi-Attribute in Hypernetworks

ABSTRACT. To overcome the incomplete evaluation of node importance by a single attribute and the subjectivity of manual indicator weighting, this paper builds on the k-shell method in hypernetworks, introduces the influence of neighbor nodes on a node while comprehensively considering node attributes, combines this with the betweenness centrality index, and uses the entropy method to determine the weight each index contributes to node importance. A multi-attribute identification method for important nodes in hypernetworks (HMAC) is thus proposed from both local and global perspectives. The strengths and weaknesses of different identification methods are compared through the natural connectivity of the network and the relative size of the largest connected subgraph, and further verified with empirical data from the Xining city bus hypernetwork. The results show that the method can identify the important nodes in a hypernetwork comprehensively and effectively.

High dimensional logistic regression model with error function penalty

ABSTRACT. The traditional logistic regression model is widely used for solving various problems on low-dimensional data. When applied to high-dimensional data, however, the logistic regression model is not sparse and lacks consistent variable selection, resulting in poor interpretability of the maximum likelihood estimate. To address variable selection in high-dimensional settings, we propose a new regularized logistic regression model, called error-function logistic (ERF-logistic), and demonstrate its oracle properties. A simulation study investigates the performance of the proposed ERF-logistic model. Furthermore, we compare ERF-logistic regression with LASSO-logistic (LASSO for short, and similarly below), adaptive LASSO-logistic (ALASSO) and elastic-net logistic (ELN) regression. Using synthetic data, we generate sparse models with true coefficients at positions v1, v2, and v5, and ERF-logistic correctly recovers both the number and the positions of the coefficients. We also test ERF-logistic on the public leukemia dataset, where it again outperforms the existing methods.
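
The abstract does not state the penalty's exact form; one natural reading of an "error function penalty", shown here purely as a hedged sketch, applies erf to the magnitude of each coefficient in the penalized negative log-likelihood (lambda > 0 and gamma > 0 are tuning parameters in this assumed parameterization):

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i \log \pi_i(\beta)
 + (1-y_i)\log\big(1-\pi_i(\beta)\big)\Big]
 + \lambda \sum_{j=1}^{p}\operatorname{erf}\!\Big(\frac{|\beta_j|}{\gamma}\Big),
\qquad
\pi_i(\beta) \;=\; \frac{1}{1+e^{-x_i^{\top}\beta}}.
```

Because erf is bounded and concave on the nonnegative half-line, such a penalty behaves like the folded-concave penalties (e.g., SCAD or MCP), which is consistent with the oracle property claimed in the abstract.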

The Kuramoto model on complex networks with community structures

ABSTRACT. The Kuramoto model is the classical oscillator model for studying synchronization phenomena. Various lines of evidence show that modular structures play an essential role in the organisation of collective behaviours of oscillators. In this project, we investigate the synchronisation of the Kuramoto model on complex networks with community structure. We introduce different coupling strengths for inter-community and intra-community links and examine how the characteristics of the transition are affected by tuning these two coupling strengths. The study reveals the competing effects of inside and outside couplings on synchronisation.
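
A standard way to write this two-coupling variant (the notation is assumed here; the abstract does not fix a normalization) is

```latex
\dot{\theta}_i \;=\; \omega_i
\;+\; \sigma_{\mathrm{intra}} \sum_{j} A_{ij}\,\delta_{c_i c_j}\,\sin(\theta_j - \theta_i)
\;+\; \sigma_{\mathrm{inter}} \sum_{j} A_{ij}\,\big(1-\delta_{c_i c_j}\big)\,\sin(\theta_j - \theta_i),
```

where c_i is the community of oscillator i, delta is the Kronecker delta, omega_i is the natural frequency, and the degree of synchrony is tracked by the usual order parameter r = |(1/N) sum_j exp(i theta_j)|.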

Iterative structural coarse graining for propagation dynamics in complex networks

ABSTRACT. Complex networks describe well the connection patterns between units in a complex system and are widely used as the underlying structure for studying propagation dynamics. However, understanding the behavior of propagation dynamics in large-scale networks remains a challenge due to the time-consuming cost of numerical simulation. A crucial problem is therefore how to reduce the complexity of large-scale networks while preserving the properties of propagation dynamics. Here, we develop a novel framework that coarse-grains a large-scale network by repeatedly merging cliques into single nodes, in descending order of clique size. We then analytically derive the minimum infection rate required to preserve the dynamical characteristics. Numerous numerical experiments on empirical networks confirm that our method not only significantly reduces the size of networks but also efficiently preserves dynamical characteristics such as the outbreak size and the critical threshold. In addition, building on the framework, we design new methods to identify influential spreaders, key edges for immunization, and sentinels for outbreak detection. The results show that these methods significantly outperform existing adaptive centralities, further highlighting that the framework offers a new perspective and solution for real problems in propagation dynamics.

Refinement for community structures of bipartite networks

ABSTRACT. Bipartite networks composed of dichotomous node sets are ubiquitous in nature and society. Partly for simplicity's sake, many studies have focused on their projections onto unipartite versions, where one only needs to care about a single type of node. When it comes to mesoscale structures such as communities, however, properly incorporating a priori structural restrictions such as bipartivity becomes ever more important. In this paper, as a case study, we take the community structure of bipartite networks at various scales to examine how much information about bipartivity is encoded in the community detection procedure. In particular, we report the robustness and reliability of detected communities, based on consistency, by comparing detection algorithms with and without consideration of bipartivity. From analyses of model networks with prescribed communities and of real networks, we find that community detection tailored to take bipartivity into account clearly yields more robust community structures than detection without such structural information. This demonstrates the necessity of customizing community detection algorithms to encode whatever information is known about the networks of interest and, at the same time, raises an interesting question about the possibility of quantitatively estimating the amount of information gained from such customization.

What Factors Can Affect Taking Over Members' Inter-organizational Social Networks in a Business Firm?

ABSTRACT. Many researchers have studied the social capital of business firms. Inter-organizational social networks benefit firms in various forms. However, previous research does not consider that an inter-organizational social network is ultimately an inter-individual network; the inter-organizational network is therefore always at risk of being lost when individuals leave their company. This raises a question: what factors affect the takeover of departing members' social capital connecting to external firms? We focus on the relationship between intra-organizational social networks and the inheritance of inter-organizational social networks from departing members. This study uses datasets of business cards in a business firm, which represent the firm's external social networks. To analyze the factors behind the inheritance of social capital, we create network variables for members who have left the company. First, we define the "legacy rate", the percentage of a departing member's business cards held by no one else in the firm. Second, we define the "recovery rate", the percentage of a member's legacy business cards that have since been reacquired by other members. Finally, we compute intra-organizational network measures on a binary network connecting members who captured the same business cards, which can be interpreted as a network of information sharing and collaboration. We find two distinctive features: first, a higher legacy rate is associated with a lower recovery rate; second, higher network closure is associated with a higher recovery rate. It is quite interesting that the clustering coefficient and the recovery rate correlate positively. This result can be interpreted through theories of social networks; for example, it is widely known that network closure leads to high motivation to share knowledge. This study suggests that closed triads and dense teams are key to maintaining the social capital of firms.

A Model of Multidimensional Opinion Formation in Online Social Networks

ABSTRACT. Today's online social networks (OSNs) face the social problem of polarization, which divides users into groups with opposing opinions. Although many opinion formation models have been proposed to elucidate the mechanism of polarization, they do not consider the "curse of dimensionality", which can be a problem in multidimensional opinion formation models. The characteristics of users and their posts can be categorized using distributed representations of words, in which each word is represented by a several-hundred-dimensional vector. However, since the distance between higher-dimensional vectors tends to be large - the curse of dimensionality - an opinion interaction model in higher-dimensional space is difficult to construct. This paper proposes a model that constructs, from the words involved, a low-dimensional subspace corresponding to the current topic, and lets opinion vectors interact by projecting the higher-dimensional opinion vectors into this subspace. As the interaction rule for the low-dimensional opinion vectors, we extend our previously proposed one-dimensional opinion formation model. This rule can incorporate empathic and repulsive reactions and is extendable to multiple dimensions by using a subspace with periodic multi-dimensional torus boundaries. We conducted simulations using randomly generated opinion vectors and word vectors, with subspaces of different dimensions. Comparing the time evolution of opinion values for a particular word vector shows that opinion formation proceeds quickly when the number of dimensions is small, whereas as the number of dimensions increases, opinions hardly fluctuate at all. This suggests that active discussion focusing on a particular topic contributes to opinion formation.

12:00-13:15 Session 14A: Dynamics II
12:00
Emerging Complexity in Collective Dynamic Responses of Networked Systems

ABSTRACT. With a growing share of renewable energy supply in modern power grids, safe and reliable grid operation is increasingly challenged by the network-wide impact of strong, distributed and stochastic fluctuations in power supply [1]. Yet the complex spatio-temporal response patterns emerging from the interplay between fluctuating inputs, grid dynamics, and network structure are far from understood to date. Here we show how a single periodic signal already induces complex dynamic response patterns across the network [2], and how the power spectrum of a single stochastic signal determines the heterogeneity and stationarity of the distributed network responses [3].

[1] Witthaut et al., Reviews of Modern Physics (2022). [2] Zhang et al., Science Advances (2019). [3] Zhang et al., in preparation.

12:15
The spatio-temporal propagation of signals in complex networks

ABSTRACT. A major achievement in the study of complex networks is the observation that diverse systems, from sub-cellular biology to social networks, exhibit universal topological characteristics. Yet this universality does not naturally translate to the dynamics of these systems, hindering our progress towards a general theoretical framework of network dynamics. The source of this theoretical gap is the fact that the behavior of a complex system cannot be uniquely predicted from its topology, but rather depends also on the dynamic mechanisms of interaction between the nodes, hence systems with similar structure may exhibit profoundly different dynamic behavior. To bridge this gap, we derive the patterns of network information transmission, indeed, the essence of a network's behavior, by offering a systematic translation of topology into the actual spatio-temporal propagation of perturbative signals.

In our formalism the system is captured by two layers of description: the underlying network Aij, which is weighted, random and often exhibits extreme levels of heterogeneity (e.g., scale-free); and the nonlinear dynamic interaction mechanisms, capturing, e.g., epidemic spreading, gene regulation or neuronal activation. Our main finding is that this extremely broad range of nonlinear dynamic models exhibits propagation rules that condense around three highly distinctive dynamic universality classes, characterized by the interplay between network paths, degree distribution and the interaction dynamics. Along the way we uncover several interesting mappings of structure to dynamics - for example, we derive the conditions under which hubs expedite information spread, yielding ultra-efficient signal propagation, vs. those where hubs become bottlenecks that, despite the extremely short paths, delay the propagation and prevent signals from efficiently penetrating the network.

Ultimately, we show that we can predict the precise patterns of propagation - i.e., when and where a signal will be observed in the network - for a vast range of combined networks and dynamics. This prediction helps us leverage the major advances in the mapping of real-world networks into predictions of actual dynamic propagation, from the spread of viruses in social networks to the diffusion of genetic information in cellular systems.

12:30
Rethinking the Micro-Foundation of Opinion Dynamics: Rich Consequences of an Inconspicuous Change

ABSTRACT. A key aspect in the study of group behavior is to understand how individuals form collective opinions via peer influence. To identify the main mechanisms underlying complex opinion formation processes, researchers have long been exploring simple mechanistic mathematical models. Most opinion dynamics models are built on the common assumption that individuals integrate others' opinions by taking weighted averages; however, researchers may need to rethink this micro-foundation. The universally-adopted weighted-averaging mechanism features a non-negligible unrealistic implication, and to remedy this unrealistic feature, additional assumptions and parameters have to be introduced at the expense of losing model simplicity and mathematical tractability. In this paper, we propose a new micro-foundation of opinion dynamics, i.e., the weighted-median mechanism, which fundamentally resolves the above problem without any additional assumption. As indicated by a complete set of studies, such an inconspicuous change from averaging to median leads to rich consequences. The weighted-median mechanism, derived from the cognitive dissonance theory in psychology, is well supported by online experiment data. It also broadens the applicability of opinion dynamics models to multiple-choice issues with ordered discrete options. Theoretical analysis reveals that, despite its simplicity in form, the weighted-median mechanism exhibits rich dynamical behavior dependent on some delicate social network structures. Moreover, comparative numerical studies show that the weighted-median mechanism predicts various non-trivial real-world patterns of opinion evolution, while some widely-studied averaging-based models fail to. For example, among all the models in comparison, only the weighted-median model captures the empirically validated feature that consensus is less likely to be achieved in larger or more clustered groups. In addition, regarding how extreme opinions are located in social networks, only the weighted-median model reveals a pattern that is consistent with a real Twitter dataset: namely, extremists tend to reside in peripheral areas of social networks and form small local clusters. All the above arguments and evidence support the weighted-median mechanism as a well-founded and expressive micro-foundation of opinion dynamics.
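
The mechanism itself is simple to state: each node updates to the weighted median, rather than the weighted average, of the opinions in its neighborhood. A minimal sketch, in which the influence weights and the inclusion of each node's own opinion in its neighborhood are assumptions of the illustration rather than details taken from the paper:

```python
def weighted_median(opinions, weights):
    """Smallest opinion x such that the cumulative weight of opinions
    <= x reaches half of the total weight."""
    pairs = sorted(zip(opinions, weights))
    total = sum(weights)
    acc = 0.0
    for x, w in pairs:
        acc += w
        if acc >= total / 2.0:
            return x

def update(opinions, neighbors, weights, rule="median"):
    """One synchronous step; `neighbors[i]` is assumed to contain i
    itself, so a node's own opinion enters its update."""
    new = {}
    for i in opinions:
        ops = [opinions[j] for j in neighbors[i]]
        ws = [weights[i][j] for j in neighbors[i]]
        if rule == "median":                   # weighted-median mechanism
            new[i] = weighted_median(ops, ws)
        else:                                  # classical weighted averaging
            new[i] = sum(o * w for o, w in zip(ops, ws)) / sum(ws)
    return new

# Tiny example: three agents on a line, uniform influence weights.
ops = {0: -1.0, 1: 0.0, 2: 1.0}
nbrs = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
w = {i: {j: 1.0 for j in nbrs[i]} for i in ops}
print(update(ops, nbrs, w))   # medians always land on existing opinions
```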

12:45
Multistability, intermittency and hybrid transitions in social contagion models on hypergraphs

ABSTRACT. Although ubiquitous, interactions within groups of individuals (e.g., modern messaging applications, group meetings, or even a parliamentary discussion) are not yet thoroughly studied. Frequently, single groups are modeled as critical-mass dynamics, a widespread concept used not only by academics but also by politicians and the media. Less explored questions, however, are how a collection of groups will behave and how the intersections between these groups might change the global dynamics. Here, we follow our initial formulation in terms of binary-state dynamics on hypergraphs, which generalizes both the SIS epidemic spreading process and simplicial-complex contagion. We show that our model has rich and unexpected behavior beyond the discontinuous transitions initially expected for higher-order networks. In particular, we may have multistability and intermittency due to bimodal state distributions, as illustrated in Fig. 1 (see the attached PDF). Furthermore, using artificial random models, we demonstrate that this phenomenology can be associated with community structure. Specifically, we can obtain multistability or intermittency by controlling the number of bridges between two communities of different densities. The introduction of bridges (hyperedges connecting different communities) destroys multistability but creates intermittent behavior. Furthermore, we provide an analytical formulation showing that the observed patterns of the order parameter and susceptibility are compatible with hybrid phase transitions. Our findings open new paths for research, ranging from physics, in the formal calculation of quantities of interest, to the social sciences, where new experiments can be designed. This abstract summarizes the findings presented in our preprint [https://arxiv.org/abs/2112.04273].

13:00
Balanced Hodge Laplacians Optimize Consensus Dynamics over Simplicial Complexes

ABSTRACT. Despite the vast literature on network dynamics, we still lack basic insights into dynamics on higher-order structures (e.g., edges, triangles, and more generally, k-dimensional “simplices”) and how they are influenced through higher-order interactions. A prime example lies in neuroscience where groups of neurons (not individual ones) may provide the building blocks for neurocomputation. Here, we study consensus dynamics on edges in simplicial complexes using a type of Laplacian matrix called a Hodge Laplacian, which we generalize to allow higher- and lower-order interactions to have different strengths. Using techniques from algebraic topology, we study how collective dynamics converge to a low-dimensional subspace that corresponds to the homology space of the simplicial complex. We use the Hodge decomposition to show that higher- and lower-order interactions can be optimally balanced to maximally accelerate convergence, and that this optimum coincides with a balancing of dynamics on the curl and gradient subspaces. We additionally explore the effects of network topology, finding that consensus over edges is accelerated when 2-simplices are well dispersed, as opposed to clustered together.
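
For reference, one common way to parameterize such a generalized edge-space Laplacian (the exact weighting used in the talk is an assumption here) is via the incidence matrices B1 (nodes-edges) and B2 (edges-triangles):

```latex
L_1(\alpha,\beta) \;=\; \alpha\, B_1^{\top} B_1 \;+\; \beta\, B_2 B_2^{\top},
\qquad
\dot{x} \;=\; -\,L_1(\alpha,\beta)\, x,
```

so that edge flows x(t) relax onto the kernel of L_1, the harmonic subspace representing the homology of the simplicial complex. Here alpha and beta set the relative strengths of lower-order (through shared nodes) and higher-order (through shared triangles) interactions, and balancing them is what the optimization described in the abstract targets.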

12:00-13:15 Session 14B: Ecological Networks
12:00
Practical challenges in Network Ecology in the Big Data era

ABSTRACT. Ecology is the study of the relationships between the biotic and abiotic compartments of nature. Its complexity can be pictured by adopting a network approach, i.e., modelling each actor as a node and each relationship as an edge. Although ecological networks represent one of the best ways to study and compare ecosystems, building a large network of this kind is often difficult and extremely time-consuming (especially if we aim to capture the entire web of interactions and not just a subset of them, such as specialised food webs, pollination networks or host-parasite networks). In our humble opinion, the challenges posed by the hot topic of the green transition (how does ecological complexity vary across habitats and locations? To what extent are ecological networks resilient? Can we infer the health condition of a given ecosystem? If so, how?) require a huge amount of open data on biodiversity and interaction networks to be properly tackled - at the moment, in fact, the amount of adequate open data is simply not on par with more common, already established techniques (e.g., species distribution models). To this aim, our first proposal is to standardise node aggregation on taxonomy. Standardisation would solve the problems related to the definition of an ecological network and to the comparison between different ones; besides, by taking advantage of existing open datasets it would also speed up data gathering. The topic of node aggregation is not new; here, however, we address it without resorting to global network metrics, focusing instead on the behaviour of each node before and after the simplification process - so that every ecologist can inspect the performance of the simplification in an entity-wise fashion for all taxa. Second, we introduce the concept of the Open Potential Ecological Network (OPEN), each node of which represents a species in a pre-determined location together with all its interactions: we built our OPEN completely in silico, by exploiting the open databases contained in GBIF and GLOBI. Our choice was driven by the consideration that consensus on algorithm-induced ecological networks has not yet been reached; hence, we decided not to simulate a network but to build a potential one, every node of which is an actual species present in that location and every interaction of which is potentially real (since recorded at some point in space and time). We believe our two proposals have the potential to help policymakers make better-informed decisions on the future of the green transition and the role of urban biodiversity.

12:15
(CANCELLED) Ecology and space in the interaction of real estate agencies.

ABSTRACT. The real estate market is a socio-technical system reflecting the complexity of human interactions. The most relevant variables of study are land use, property types and values, and their spatial distribution. Besides buyers and sellers, this market is characterized by an overwhelming presence of intermediary agencies that can strongly influence transaction prices. In this work, we develop a network science study of the real estate market in three main regions of interest in Spain (Madrid, Barcelona, the Balearic Islands). Our hypothesis is that each region encloses a limited number of properties, so that increasing population demand leads to an imbalance in the market, i.e., agencies struggling over limited resources (properties to advertise) within enclosed geographies. Hence we aim to unveil the spatio-commercial patterns that agencies shape in each region of interest. Inspired by network models from ecology, we built a bipartite weighted network to capture the market dynamics that real estate agencies exert on specific subdivisions of each region. We encoded real estate agencies as one set of nodes, and the land subdivisions (1 km2 cells) where these agencies have advertisements as another set of vertices. Each agency is associated with a cell through the number of advertisements it announces there. This leads to an early categorization of real estate actors into generalist agencies, operating all over the region, and localist ones working in specific places. We construct a model of agency competition by projecting the bipartite graph onto the space of agencies, where a link's weight represents a statistic of the level of competition between two agencies: the higher the weight, the more the two agencies compete within a cell. In this way, we account both for the presence of agencies in cells and for the influence (proportion of advertisements) they exert over these subdivisions. We capture the community structure of the agency networks (Louvain, Infomap, OSLOM) by iteratively removing percentages of generalist agencies, obtaining stable clusters of localist agencies with common spatio-commercial patterns. Our results yield stable communities with marked spatial affinities: (class 1) local agencies that compete only with generalist ones, and (class 2) local agencies actively interacting among themselves in the market. Notably, the former class of agencies operates in the urban periphery, while the latter spatially organize their activity inside the metropolitan areas, and generalist agencies spread all over the region (see Fig. 1). These findings reveal new spatio-commercial patterns that emerge from the interaction of agencies in specific regions of interest in Spain, suggesting geographical regions where the active market is not completely driven by the generalists. The full results are currently being written up.

12:30
Heterogeneity increases the fraction of coexisting species in mutualistic networks

ABSTRACT. In recent decades, mutualistic networks have attracted much attention because of their unique structural properties (e.g., high nestedness) that favor species existence. The feasibility and stability of species equilibria in mutualistic networks have been widely studied, while the fraction of species that can coexist is rarely investigated. To investigate the relationship between structure and coexistence, we monitor how species coexist as link weights are modified while the linking structure is kept fixed. A few studies have proposed mechanisms to adjust link weights, but most are grounded in structure-based approaches that do not consider the dynamics of the system. Here, we propose a mechanism of co-adaptation between structure and dynamics in mutualistic networks: the structure adapts reciprocally in response to species persistence, and species persistence in turn changes with the adaptive structure. We find that our method can dramatically increase the fraction of coexisting species. The reason is that, compared with structure-based adaptation, our method increases the heterogeneity of the structure. Owing to the broad applications of mutualistic networks, our findings offer new ways to design mechanisms that enhance the resilience of many other systems, such as smart infrastructures and socio-economic systems.

12:45
Identifying sensor species that anticipate critical transitions in multispecies ecological systems

ABSTRACT. Ecological systems can undergo sudden, catastrophic changes known as critical transitions, and the statistical methods known as early-warning signals have been used to anticipate them. However, the presence and intensity of the early warnings in the different species depend on the system's dynamics, which is usually only partially known or even totally unknown. Therefore, detecting early-warning signals in ecosystems remains very challenging. This is especially true for systems with many species, because the early-warning signals can be weakly present or even absent in some of them, and there is no general methodology to identify the species that display early-warning signals. In this work we show that, in mutualistic ecological systems, it is possible to identify species that anticipate critical transitions early by knowing only the system structure - that is, the network topology of plant-animal interactions. To do so, first, we leverage the mathematical theory of structural observability of dynamical systems to identify minimum sets of "sensor species" whose measurement guarantees that we can infer changes in the abundance of all other species. Importantly, the identification of these minimum sets of sensor species does not depend on the system's specific dynamics. Second, through an algorithm that samples many maximum matchings of the network, we assign a score to every species depending on how likely it is to be a sensor; we call this the "sensor score". To evaluate the usefulness of the sensor score for detecting early warnings, we used a dataset containing 51 empirical plant-pollinator and seed-dispersal networks and numerically induced a critical transition on them. We then analyzed the performance of every species in detecting early warnings and assigned an "early-warning score" to each of them (a higher score means earlier detection). We observed that about 14% of the species may not show any warning (score zero), and an additional 65% of species may show "late" warnings of the critical transition (low score). Finally, we compared the sensor score of every species, in every network, to its early-warning score. We found that species with a higher sensor score, i.e., species more likely to be sensors, tend to anticipate critical transitions earlier than other species. Our results underscore how knowing the network structure of multispecies systems can improve our ability to anticipate critical transitions.
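
A minimal sketch of the matching step, under stated assumptions: the interaction network is represented bipartitely with an "out" and an "in" copy of every species, and "in" copies left unmatched by a maximum matching are candidate sensors (by the standard controllability-observability duality for structural systems; the exact dual used in the paper may differ, and the paper scores species by sampling many matchings, whereas this sketch computes a single one). Mutualistic links act in both directions, which conveniently makes the network equal to its transpose.

```python
import networkx as nx

def sensor_candidates(edges):
    """Candidate sensor species from one maximum matching.

    `edges` lists directed influences i -> j (species i enters the
    dynamics of species j). Species whose 'in' copy is unmatched
    cannot be inferred through the matching and are candidate sensors.
    """
    B = nx.Graph()
    out_nodes = {f"out:{i}" for i, _ in edges}
    in_nodes = {f"in:{j}" for _, j in edges}
    B.add_nodes_from(out_nodes, bipartite=0)
    B.add_nodes_from(in_nodes, bipartite=1)
    B.add_edges_from((f"out:{i}", f"in:{j}") for i, j in edges)
    matching = nx.bipartite.maximum_matching(B, top_nodes=out_nodes)
    matched_in = {v for v in matching if v.startswith("in:")}
    return {v.split(":", 1)[1] for v in in_nodes - matched_in}

# Toy plant-pollinator structure; mutualistic links are bidirectional.
links = [("p1", "a1"), ("a1", "p1"), ("p2", "a1"), ("a1", "p2")]
print(sensor_candidates(links))   # one plant must be measured directly
```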

13:00
Generalized positive feedback loops determine the magnitude of the extinction effect in plant-pollinator communities

ABSTRACT. The extinction of a species in a plant-pollinator mutualistic community can cause a cascading effect and lead to major biodiversity loss. The ecologically important task of predicting the severity of the cascading effects is made challenging by the complex network of interactions among the species. Bipartite networks are widely used to represent plant-pollinator communities; in such networks nodes denote the plant and pollinator species and edges denote the mutualistic interactions between them. Here we used the Boolean threshold model of Campbell et al. [1] and our previously proposed methods [2] to show that identifying generalized positive feedback loops (stable motifs) can help pinpoint the species whose extinction leads to catastrophic damage to the community. A stable motif is a set of nodes and their corresponding states such that the nodes form a minimal strongly connected component and their states form a trap space of the Boolean model, meaning that the state of any node inside the stable motif does not change regardless of the state of the nodes outside the stable motif [3]. Each stable motif can be controlled via its driver set - a minimal set of node states that, if stabilized, results in the stabilization of all the node states in the stable motif [4]. Using stable motif analysis, we identified and stabilized the driver sets of the stable motifs that correspond to the extinction of groups of species. We then measured the damage percentage with respect to a stable community with the maximum number of species. We compared these results with the loss of species selected from the top 10% according to previously studied structural measures - betweenness centrality, node contribution to nestedness [5], and node MusRank score [6]. Our results in Figure 1 show that the driver sets of stable motifs can identify certain crucial species that the other measures fail to find. The correct identification of such species has important implications for conservation efforts and developing community management strategies; stable motif analysis shows promising potential to be used as a complementary tool in such endeavors.

[1] Campbell, C., Yang, S., Albert, R. and Shea, K., 2011. A network model for plant-pollinator community assembly. Proceedings of the National Academy of Sciences, 108(1), pp.197-202. [2] Nasrollahi, F.S.F., Zañudo, J.G.T., Campbell, C. and Albert, R., 2021. Relationships among generalized positive feedback loops determine possible community outcomes in plant-pollinator interaction networks. Physical Review E, 104(5), p.054304. [3] Zañudo, J.G.T. and Albert, R., 2015. Cell fate reprogramming by control of intracellular network dynamics. PLoS Computational Biology, 11(4). [4] Yang, G., Gómez Tejeda Zañudo, J. and Albert, R., 2018. Target control in logical models using the domain of influence of nodes. Frontiers in Physiology, p.454. [5] Johnson, S., Domínguez-García, V. and Muñoz, M.A., 2013. Factors determining nestedness in complex networks. PLoS ONE, 8(9), p.e74025. [6] Domínguez-García, V. and Muñoz, M.A., 2015. Ranking species in mutualistic networks. Scientific Reports, 5(1), pp.1-7.

12:00-13:15 Session 14C: Theory III
12:00
Homophily and preferential attachment generate core-peripheries

ABSTRACT. Core-periphery (CP) structures are, along with communities, one of the most common mesoscale arrangements found in empirical networks. In core-peripheries, a group of nodes called the core dominates connections, whereas a second group known as the periphery remains largely unconnected to itself and connects mostly to the core. Despite their ubiquity, most research has focused only on the detection of core-peripheries, whereas the mechanisms that produce them are largely unknown. We show that core-peripheries emerge in networks that evolve through combinations of preferential attachment and homophily, while neither of the two mechanisms produces core-peripheries in isolation for fixed groups. This result applies to both growing and rewiring network evolution dynamics. We first develop and implement evolution algorithms for networks with two groups A and B, whose connections are governed by two homophily parameters, as well as a parameter that balances preferential attachment against random rewiring. The combined results from simulations and mean-field analyses reveal that preferential attachment and homophily can interact in non-trivial ways that yield core-periphery structures. We find that growing and rewiring models differ in the qualitative type of CP structures they generate: in rewiring models we find the strict scenario where most connections are located in the core while the periphery remains largely disconnected; in growing models, CP structures retain a large portion of inter-group links, and the periphery is not entirely disconnected from itself. We also find that network evolution processes can be directly impacted by initial conditions, leading to scenarios where one group becomes and remains the core despite both groups being similarly homophilous and of the same size. We validate our results by analyzing four empirical networks, where we fit our model parameters via a maximum-likelihood method from temporal network statistics and predict the core-peripheriness of the networks at different points in time. In addition, we showcase the potential of our model by displaying the effect of parameter interventions on the networks' core-peripheriness. Our research provides an explanation for the emergence of core-periphery structures in networks via the combination of two well-known network evolution mechanisms. These insights have applications in social networks, transportation networks, and others. For example, the existence of privileged and elite social groups can be understood as an emergent phenomenon caused by seemingly unrelated mechanisms.
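
A minimal sketch of the growing variant, under stated assumptions: new nodes attach with probability proportional to degree times a homophily factor. The multiplicative kernel and single homophily parameter are simplifications of the paper's two-parameter model with a PA/random-rewiring balance.

```python
import random

def grow_homophilous_pa(n, m=2, h=0.8, frac_a=0.5, seed=0):
    """Grow a two-group network with degree-and-homophily attachment:
    same-group targets are weighted by h, cross-group ones by 1 - h."""
    rng = random.Random(seed)
    group = {0: "a", 1: "b"}
    degree = {0: 1, 1: 1}
    edges = [(0, 1)]
    for v in range(2, n):
        group[v] = "a" if rng.random() < frac_a else "b"
        degree[v] = 0
        targets = set()
        while len(targets) < min(m, v):
            weights = [degree[u] * (h if group[u] == group[v] else 1 - h)
                       for u in range(v)]
            u = rng.choices(range(v), weights=weights)[0]
            targets.add(u)
        for u in targets:
            edges.append((v, u))
            degree[u] += 1
            degree[v] += 1
    return edges, group

edges, group = grow_homophilous_pa(500, h=0.9)
deg = {}
for x, y in edges:
    deg[x] = deg.get(x, 0) + 1
    deg[y] = deg.get(y, 0) + 1
hubs = sorted(deg, key=deg.get, reverse=True)[:10]
print([group[v] for v in hubs])   # group labels of the emergent core
```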

12:15
Identifying influential nodes by leveraging redundant ties
PRESENTER: Bitao Dai

ABSTRACT. Structure-based identification of influential nodes is a long-standing challenge in the study of complex networks. While global centrality-based approaches are generally considered more accurate and reliable, their requirements of complete network information and low computational complexity are hard to meet, limiting their applications in many practical scenarios. In addition, recent studies have highlighted that cyclic structures introduce redundant paths in network connectivity and exaggerate the importance assigned by traditional centrality measures. In this work, we develop a new centrality metric, called Multi-Spanning Tree-based Degree Centrality (MSTDC), to quantify node importance with linear complexity by leveraging redundant ties. MSTDC is calculated by aggregating the degrees of a small number of spanning trees constructed from a few randomly selected root nodes. Our experiments on six empirical networks reveal that MSTDC outperforms other benchmark network indices in identifying influential nodes, both for maintaining network connectivity and for maximizing spreading capacity. In addition, we find that MSTDC is extraordinarily effective in networks with high clustering coefficients. Our study provides novel insights into the role of redundant connections in network structural and functional analyses.
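A minimal sketch of the spanning-tree aggregation idea, assuming breadth-first spanning trees grown from randomly chosen roots on a connected network; the paper's exact tree construction may differ.

```python
import random

import networkx as nx


def mstdc(g, n_trees=5, seed=0):
    rng = random.Random(seed)
    roots = rng.sample(list(g.nodes()), n_trees)
    score = {v: 0 for v in g}
    for r in roots:
        tree = nx.bfs_tree(g, r)      # one spanning tree per random root
        for v, d in tree.degree():    # degree in the tree, not in g
            score[v] += d             # aggregate degrees across trees
    return score


g = nx.karate_club_graph()
scores = mstdc(g)
top5 = sorted(scores, key=scores.get, reverse=True)[:5]
```

Because each spanning tree keeps only one of the redundant paths between any two nodes, aggregating tree degrees discounts exactly the ties that cycles make redundant.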

12:30
Competing spreading dynamics in simplicial complex

ABSTRACT. Interactions in biology and social systems are not restricted to pairwise contacts but can take arbitrary sizes. Extensive studies have revealed that arbitrary-sized interactions significantly affect the spreading dynamics on networked systems. Competing spreading dynamics, i.e., several epidemics spreading simultaneously and competing with each other, have been widely observed in the real world. Yet, how arbitrary-sized interactions affect competing spreading dynamics still lacks systematic study. This study presents a model of two competing simplicial susceptible-infected-susceptible epidemics on a higher-order system represented by a simplicial complex and analyzes the model's critical phenomena. In the proposed model, a susceptible node can only be infected by one of the two epidemics, and the transmission of infection to neighbors can occur through pairwise (i.e., an edge) and higher-order (e.g., 2-simplex) interactions simultaneously. A mean-field (MF) analysis and numerical simulations show that the model displays rich dynamical behavior depending on the 2-simplex infection strength. When the 2-simplex infection strength is weak, the model's phase diagram is consistent with that on a simple graph, consisting of three regions: an absolute dominance region for each epidemic and the epidemic-free region. As the 2-simplex infection strength increases, a new phase region called the alternative dominant region emerges. In this region, the survival of one epidemic depends on the initial conditions. Our theoretical analysis can reasonably predict the time evolution and steady-state outbreak size in each region. In addition, we further explore the model's phase diagram both when the 2-simplex infection strengths are symmetric and when they are asymmetric. The results show that the 2-simplex infection strength significantly impacts the system's phase diagram.
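A hedged mean-field sketch of such a model: each epidemic gains infections through a pairwise term (rate beta_i times mean degree k) and a 2-simplex term (rate beta_i_d times mean number of 2-simplices per node k_d, quadratic in its own prevalence). These equations are one plausible form, and all parameter values are illustrative, not taken from the paper.

```python
def simulate(beta1, beta2, beta1_d, beta2_d, mu=1.0, k=10, k_d=5,
             rho1=0.05, rho2=0.05, dt=0.01, steps=20000):
    """Euler integration of two competing simplicial SIS epidemics."""
    for _ in range(steps):
        s = 1.0 - rho1 - rho2   # susceptibles are shared by both epidemics
        d1 = -mu * rho1 + beta1 * k * rho1 * s + beta1_d * k_d * rho1**2 * s
        d2 = -mu * rho2 + beta2 * k * rho2 * s + beta2_d * k_d * rho2**2 * s
        rho1 += dt * d1
        rho2 += dt * d2
    return rho1, rho2


# weak 2-simplex coupling: the epidemic with the higher pairwise rate wins
print(simulate(0.12, 0.11, 0.0, 0.0))
# strong 2-simplex coupling: the outcome now depends on initial conditions
print(simulate(0.12, 0.11, 0.8, 0.8, rho1=0.01, rho2=0.30))
```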

12:45
Epidemic Dynamics in scale-free hypernetwork

ABSTRACT. The hypergraph offers a platform to study structural properties emerging from interactions that are more complicated and higher-order than pairwise ones, as well as dynamical behaviors such as the spread of information or disease. To account for the impact of community structure on the spread of disease, mathematical modeling of epidemic propagation on networks is extended to hypergraphs. Taking the uniform scale-free hypernetwork as the topological structure of the individual contact network in the population, an SIS propagation model was constructed with the response process strategy and contact process strategy as the propagation modes. The transmission threshold of infectious diseases on the uniform scale-free hypernetwork was theoretically analyzed, and the steady-state equation was given. The effects of the hypergraph structure and the model parameters are investigated via individual-based simulation. The results show that, compared with complex networks based on ordinary graphs, infectious diseases spread more easily on hypernetworks based on hypergraphs. Based on the COVID-19 epidemic data published online in 2020, an SIS propagation model on a real contact hypernetwork was constructed and the epidemic transmission law was analyzed. It was found that the transmission of infectious diseases on the network constructed from real data was consistent with the theoretical analysis and numerical simulation results. The conclusions of this study provide a theoretical basis for the prevention and control of the epidemic in China and the prevention of infectious diseases in the future.

13:00
Identifying key players in complex networks through network entanglement

ABSTRACT. Finding an optimal set of key nodes whose removal would dismantle a network is one of the fundamental research problems of Network Science. In this paper, we introduce an entanglement-based dismantling framework, which captures the network's transport properties and enables new insights into the intrinsic topological features of the complex system. This framework is founded on work at the intersection of quantum information and complex systems. A new vertex entanglement measure is then presented to quantify the importance of vertices in maintaining functional diversity, based on the proposed entanglement framework. We analytically show that the vertex entanglement tends to vanish in a transient diffusion time, while coinciding with the number of connected components on extremely long time scales. As an application, the proposed vertex entanglement provides a novel approach to the dismantling task; note that optimal network dismantling remains an intractable and challenging problem. We compare the results with other cutting-edge algorithms, pinpointing the superior performance of vertex entanglement in network dismantling tasks.
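The long-time behaviour described above is consistent with the von Neumann entropy of the free-diffusion density matrix rho = e^{-tau L} / Tr e^{-tau L}, which approaches the logarithm of the number of connected components as tau grows. A hedged sketch of a vertex score based on this quantity follows; the paper's exact vertex entanglement measure may be defined differently.

```python
import numpy as np
import networkx as nx
from scipy.linalg import expm


def von_neumann_entropy(g, tau):
    lap = nx.laplacian_matrix(g).toarray().astype(float)
    rho = expm(-tau * lap)
    rho /= np.trace(rho)          # density matrix of diffusion at time tau
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return -float(np.sum(w * np.log(w)))


g = nx.karate_club_graph()
base = von_neumann_entropy(g, 1.0)
# score each vertex by the entropy change its removal causes
scores = {v: von_neumann_entropy(nx.restricted_view(g, [v], []), 1.0) - base
          for v in g}
```

Removing vertices in decreasing order of such a score gives one entropy-guided dismantling heuristic.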

12:00-13:15 Session 14D: ML for Networks I
12:00
Mastering percolation-like games with deep learning

ABSTRACT. Percolation theory offers a set of highly developed tools for understanding the phase transition between a connected and a disconnected network. However, applying the theory requires strong assumptions on the order of node removal. In recent years, there has been increasing interest not just in how a network responds to node removals driven by various heuristics, but rather in trying to discover the optimal attack.

Machine learning suggests a different approach for optimizing percolation strategy. Instead of defining analytically tractable heuristics, machine learning methods treat the problem as a ``black box,'' and use the expressive power of multilayer neural networks to optimize the objective. Deep reinforcement learning is particularly promising in this regard. AlphaGo in particular is of interest because a winning strategy requires creating lattice connectivity more effectively than one's opponent.

Here we revisit lattice percolation and develop a playable game, the objective of which is to disable all nodes in as few moves as possible. We consider several definitions of disabling nodes, based on largest component size, flow barriers, or surface-area-to-volume ratio. We train reinforcement-learning agents that can successfully play each version of the game with performance comparable to a human expert. The trained agents expose the intrinsic differences between superficially similar games. The fact that minor differences in the game definition induce substantial differences in strategy implies that there may be no single strategy for attacking or defending a network.
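A minimal sketch of one variant of such a game, assuming the largest-component definition of disabling (a node counts as disabled once it is removed or isolated from the largest connected component); the environment API and the greedy baseline below are illustrative, not the authors' implementation.

```python
import networkx as nx


class PercolationGame:
    def __init__(self, side=5):
        self.g = nx.grid_2d_graph(side, side)

    def done(self):
        if len(self.g) == 0:
            return True
        giant = max(nx.connected_components(self.g), key=len)
        return len(giant) <= 1        # every remaining node is isolated

    def step(self, node):
        self.g.remove_node(node)
        return -1.0, self.done()      # -1 per move rewards short games


game = PercolationGame(side=5)
finished, moves = False, 0
while not finished:
    # greedy baseline an RL agent should beat: attack the max-degree node
    node = max(game.g.degree(), key=lambda x: x[1])[0]
    _, finished = game.step(node)
    moves += 1
print("moves needed:", moves)
```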

12:15
Prediction and mitigation of cascading failures using a graph-neural-network approach

ABSTRACT. A local disturbance can often trigger cascading failures of other elements in complex systems, for instance, the blackout of power grids. Naturally, the prediction and control of cascading failures in complex networks has become a central research topic in network science. Here we study the avalanche dynamics and mitigation strategies of cascading failures on two synthetic models, the Motter-Lai model and the power-grid model proposed by P. Schultz et al., and on empirical power grids of France, Spain, and other countries. We show that reinforcing the nodes that fail subsequently, without a systematic strategy, can instead increase the avalanche size. We therefore propose an avalanche centrality for each node $i$, related to the global avalanche size triggered by node $i$ and the probability that node $i$ is failed by other nodes, which can be utilized for effective containment of avalanches. However, this quantity requires a high computation time of $O(N^3 \log N)$ with respect to the system size $N$, so it may not be useful for large systems. To overcome this barrier, we develop a graph neural network (GNN) algorithm whose computational complexity scales as $O(N)$. Moreover, the algorithm transfers reasonably well to other, untrained networks. Accordingly, the GNN trained on small synthetic networks is applicable to much larger synthetic networks and empirical power grids. The avalanche centrality predicted by the GNN for large networks can be applied to effective avalanche mitigation. The GNN framework can also be implemented in other complex processes that require high computational costs for simulations.
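A minimal sketch of the Motter-Lai cascade underlying the avalanche size: node capacities are (1 + alpha) times the initial betweenness loads, and removing a node overloads others, which fail in turn. The tolerance alpha and the test graph are illustrative.

```python
import networkx as nx


def avalanche_size(g, trigger, alpha=0.2):
    g = g.copy()
    cap = {v: (1 + alpha) * load
           for v, load in nx.betweenness_centrality(g).items()}
    failed = {trigger}
    g.remove_node(trigger)
    while True:
        load = nx.betweenness_centrality(g)   # loads after redistribution
        over = [v for v in g if load[v] > cap[v]]
        if not over:
            return len(failed)
        failed.update(over)
        g.remove_nodes_from(over)


g = nx.barabasi_albert_graph(200, 2, seed=1)
sizes = {v: avalanche_size(g, v) for v in list(g)[:5]}
```

Exact avalanche centralities require many such simulations, which is what makes a GNN surrogate with O(N) inference attractive.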

12:30
Geometric graphs from data to aid classification tasks with Graph Convolutional Networks
PRESENTER: Yifan Qian

ABSTRACT. Traditional classification tasks learn to assign samples to given classes based solely on sample features. This paradigm is evolving to include other sources of information, such as known relations between samples. Here, we show that, even if additional relational information is not available in the dataset, one can improve classification by constructing geometric graphs from the features themselves, and using them within a Graph Convolutional Network. The improvement in classification accuracy is maximized by graphs that capture sample similarity with relatively low edge density. We show that such feature-derived graphs increase the alignment of the data to the ground truth while improving class separation. We also demonstrate that the graphs can be made more efficient using spectral sparsification, which reduces the number of edges while still improving classification performance. We illustrate our findings using synthetic and real-world datasets from various scientific domains.
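A minimal sketch of the graph-construction step, assuming a symmetrised k-nearest-neighbour graph in feature space with a small k to keep edge density low; the normalised adjacency at the end is what a standard GCN layer would consume alongside the features.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import kneighbors_graph

x, y = load_digits(return_X_y=True)

adj = kneighbors_graph(x, n_neighbors=8, mode="connectivity")
adj = adj.maximum(adj.T)                    # symmetrise the kNN graph

# renormalisation trick: add self-loops, then D^(-1/2) (A + I) D^(-1/2)
a_hat = adj.toarray() + np.eye(adj.shape[0])
d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```

The choice k = 8 is an assumption; the abstract's finding is precisely that classification accuracy peaks at relatively low edge densities, so k should be tuned.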

12:45
Hierarchical Multiagent Reinforcement Learning for Bus Bunching Control

ABSTRACT. The bus network plays a crucial role in multi-mode transportation systems. Experience reveals that buses on high-frequency transit lines mostly arrive irregularly at stops, generally in bunches. Previous work explained that the variability of traffic conditions and the stochastic arrivals of passengers at bus stops induce this inherent instability. This phenomenon is known as bus bunching, and it has several negative impacts, such as lengthening waiting times, reducing public transport ridership, and increasing operating costs. To curb bus bunching, researchers have proposed multiple control strategies, including holding, stop skipping, operating speed control, and boarding limits. Although holding control is a proven strategy to improve service reliability, it has the secondary effect of reducing the operational speed. Therefore, to eliminate the slowdown effects of holding control, this paper constructs a hybrid strategy integrating operating speed control and holding. Recent developments in model-free reinforcement learning (RL) have provided a new perspective on sequential control problems, enabling us to develop efficient and effective control strategies for an extensive system while eliminating complex model analysis. Wang and Sun proposed a multi-agent deep reinforcement learning (MDRL) framework to develop an efficient and reliable holding control policy considering the whole transit system. However, little work has succeeded in applying RL to bus systems with multiple control strategies supported. Our work fills this gap by proposing a multi-agent hybrid actor-critic architecture based on proximal policy optimization (PPO), consisting of a two-layer policy model. We follow a centralized-training, decentralized-execution paradigm: communication between different agents facilitates the training process, while each agent executes its policy independently based on local observations. This study differs from previous work in the following aspects: (i) We propose a multi-agent hybrid actor-critic architecture offering the two methods of operating speed control and holding, improving transit service quality while also reducing the total travel time of all passengers. (ii) Considering the delay in receiving rewards for various actions, we design a large discount factor for actions with immediate reward (holding) and a smaller one for actions that take longer to pay off (accelerating). (iii) We leverage a self-attention module to enable the critic network to take as input the observations of a varying number of activated agents. Monte Carlo simulation experiments with dynamic demand and traffic disturbances are conducted to compare the performance of both traditional headway-based control methods and existing MARL methods. Results show that our method outperforms the other baselines, not only stabilizing a strongly unstable bus line but also shortening the traveling times of passengers. Furthermore, the proposed architecture is not restricted to a hybrid strategy of operating speed control and holding and can also support more control methods, such as traffic signal priority.

13:00
Ethereum Account Classification based on Graph Convolutional Network

ABSTRACT. Accounts in Ethereum are found to be involved in various services and businesses. Account classification can help us detect illegal behavior, track transactions, and de-anonymize the Ethereum transaction system. In this paper, we make use of a Graph Convolutional Network (GCN) to solve the account classification problem in Ethereum. We model the Ethereum transaction records as a large-scale transaction network and find that the network exhibits high heterophily, in which accounts with different features and different labels are connected. To address this issue, we propose a GCN-based model called EH-GCN. The experimental results on a realistic Ethereum dataset show that the proposed method achieves state-of-the-art classification performance, and results on benchmarks show that it remains competitive under homophily.

12:00-13:15 Session 14E: Social Networks II
12:00
Quantifying Political Polarization on a Network Using Generalized Euclidean Distance

ABSTRACT. It is commonly thought that political polarization on social media is on the rise. This belief can only be supported if we have a measure of polarization. Such a measure should take into account how extreme the opinions of the users are, how much they organize in echo chambers, and how isolated these echo chambers are from each other. The most popular ways of estimating polarization are insensitive to at least one of these factors, and thus cannot support the opening statement. In this paper, we propose a measure of political polarization which captures all three of these factors. The measure is based on the Generalized Euclidean distance, which estimates the distance between two vectors on a network, e.g. vectors representing user opinions. This measure fills the methodological gap left open by the current state of the art, and leads to useful insights when applied to real-world debates happening on social media and to data from the US Congress.
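A hedged sketch of the Generalized Euclidean distance between opinion vectors on a graph, using the Moore-Penrose pseudoinverse of the Laplacian; this follows one common formulation and may differ in details from the paper's measure.

```python
import numpy as np
import networkx as nx


def ge_distance(g, o1, o2):
    lap = nx.laplacian_matrix(g).toarray().astype(float)
    lap_pinv = np.linalg.pinv(lap)
    diff = np.asarray(o1, float) - np.asarray(o2, float)
    return float(np.sqrt(diff @ lap_pinv @ diff))


# two dense cliques joined by a short path: an idealised echo-chamber graph
g = nx.barbell_graph(10, 2)
polarized = [1.0] * 10 + [0.0] * 2 + [-1.0] * 10
mixed = np.random.default_rng(0).permutation(polarized)
neutral = [0.0] * len(g)
print(ge_distance(g, polarized, neutral))   # extreme, aligned with structure
print(ge_distance(g, mixed, neutral))       # same opinions, no echo chambers
```

The polarized assignment scores higher than the shuffled one even though both contain the same opinion values, which is exactly the structural sensitivity the measure is designed to have.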

12:15
Optimal social bubbles in social networks by a fast-decycling framework

ABSTRACT. The process of globalization creates many opportunities, but it sometimes also brings side effects that may damage our societies. One recent example is the rapid global contagion of COVID-19, which has now killed more than six million people worldwide. Measures such as keeping physical distance and creating social bubbles have been implemented to prevent infection and slow the transmission of COVID-19. Such measures are intended to reduce the risk of infection by decreasing interactions in social networks. This process theoretically corresponds to optimal bond percolation in complex networks. Optimal bond percolation (OBP) is the problem of finding the minimal set of edges whose removal or deactivation dismantles a network into isolated subcomponents, i.e., social bubbles, of size at most C. Solutions to the OBP problem are also a theoretical foundation for strategies to cope with epidemics, such as social distancing and population immunization. To solve the OBP problem, we propose a fast-decycling framework composed of three stages: (1) recursively removing the edges of highest importance from the 2-core structure of the network, (2) recursively breaking the trees in the remaining network, and (3) reinserting the unnecessarily removed edges through an explosive percolation process. We introduce two categories of algorithms based on this 2-core-based framework: localized (decentralized) algorithms and globalized algorithms. These 2-core-based approaches perform better than existing simple OBP algorithms and are as good as state-of-the-art algorithms when applied to real-world networks. Our results shed light on the faster design of more practical social distancing and social bubble policies.

12:30
Friendship Modulates Hierarchical Relations in Public Elementary Schools

ABSTRACT. Hierarchical relationships are pivotal to social structures in human beings. Yet, little is known about the mechanisms connecting social hierarchies and friendship in elementary school children.

Here, we present the results of a large-scale experiment (856 children aged 9 to 12 years) in 14 different elementary schools in Santiago de Chile, designed to measure social status through aggregated cooperative patterns and then explore the connection between friendship and dyadic cooperation among students with different social status.

We map each classroom's cooperative network using a modified Prisoner's Dilemma in a lab-in-the-field setting. In a networked setup of tablet computers, each student played the game in pairs with each classmate. They had to decide simultaneously how many tokens (between 0 and 10) to send to their peers in each round. Thus, we proxy social status using PageRank, which considers both the total number of received tokens and the sender's social position. To measure friendship, we ran a peer nomination questionnaire, in which students nominated up to five friends. Then, we investigated how dyadic cooperation varies according to social status and friendship.

In general, we find that the larger the difference in social status, the greater the dyadic cooperation gap, indicating acts of deference from lower-status individuals to higher-status individuals. However, when we separately analyze relationships involving mutual declarations of friendship, the association between social status and cooperation disappears. Among friends, we do not observe acts of deference from the lower-status to the higher-status member of the dyad. These results suggest that friendship implies a fundamental equality, which is not affected by social status differences in elementary school students.

12:45
Temporal Analysis of Moral Interactions and Polarization in the US Congress

ABSTRACT. The attack on the US Congress on January 6th, 2021, made it drastically clear how divided the United States is along political lines. But the origins of this division are still debated. In this study, we investigate the role of moral interactions between Democrats and Republicans in the emergence of political polarization in the US Congress. Following Jonathan Haidt's Moral Foundations Theory (MFT; Haidt 2012), we hypothesize that, when addressing their peers, Members of Congress (MCs) will appeal to moral foundations congruent with their ideology. This means that Democrats focus primarily on the Fairness and Care moral foundations, whereas Republicans appeal to Authority, Loyalty, and Sanctity. We apply computational text analysis to the protocols of the US Congress, covering the 144-year period between 1873 and 2017, or 72 legislative periods. Using an extended version of the Moral Foundations Dictionary (Graham et al., 2009), we quantify the level of moral rhetoric for 7958 MCs, on each of the six moral foundations, and separately for each of the 72 legislative periods. We couple this data with roll-call-based estimates of MCs' political positions on the liberal-conservative spectrum (see Nokken and Poole, 2004). We then compute an index of Moral Divergence, which quantifies the difference in moral rhetoric between Republican and Democratic MCs, and compare the trajectory of Moral Divergence with a roll-call-based metric of political polarization. We find that there is indeed a strong correlation between our moral divergence metric and ideological party polarization (Figure 1). During epochs in American politics characterized by relative political harmony, such as between 1940 and 1975, moral divergence is low; legislators of both parties use very similar moral language. In periods of extreme party polarization, such as in the ‘Gilded Age’ around 1900 and again in the current era, Democrats and Republicans appeal to very different moral foundations. These results differ markedly from previous attempts at proxying political polarization by analyzing congressional speeches (Gentzkow et al., 2019; Lauderdale and Herzog, 2016). While those studies did not find any close alignment between party polarization and speech-based metrics, our moral divergence metric parallels polarization over the whole 144-year period of available data. However, we also find that the correlation between the two metrics becomes even stronger when we shift their timelines, so that party polarization predicts moral divergence one congressional period (2 years) into the future. These results may indicate that changes in polarization actually drive moral divergence, rather than the other way round, as Moral Foundations Theory would suggest. In addition, we find that at the level of individual legislators, the interrelation between moral interactions and political ideology is more complex than predicted by MFT.

13:00
A community detection method to track opinion dynamics in online debate networks

ABSTRACT. In this work, we present a novel algorithm to detect communities in online debate networks, such as Reddit. The challenge lies in the fact that such networks are both signed (positive/agreement, negative/disagreement, and neutral interactions) and temporal. Our method continuously updates communities as interactions (edges) enter the system, and automatically determines the number of communities without human intervention. It therefore allows us to track the communities' evolution and to identify events such as community birth, death, splits, or merges. We apply our method to synthetic and real-world social media data, and draw social interpretations for the community dynamics and events inferred. To our knowledge, this is the first community detection method for temporal signed networks.

12:00-13:15 Session 14F: Economic Networks III
12:00
Co-evolution patterns of global technology and investment: Dynamic multilayer network approach

ABSTRACT. Innovation relies strongly on continuous technological advances and on the capital investment that supports research & development activities. How innovation diffuses across firms, industries, and countries, and what drives the emergence of new technology, is vital for understanding the dynamics of economic growth. We propose a multilayer network approach to map the evolutionary characteristics of worldwide capital-technology networks. Based on a comprehensive five-decade (1969-2018) dataset, we build a temporal multilayer network in which each node represents a firm/country, and two types of links represent technology and capital bonds, respectively. The technology linkages between nodes represent the similarity of the technology that companies implement, while capital bonds indicate common equity investment relations among them. We develop several measures to assess the level of inter-similarity between the technology network and the capital network: (i) the joint probability that a technology link connects two nodes having common investors in the capital network; (ii) the conditional probability that a technology connection forms after a capital linkage occurs, or vice versa; (iii) an inter-community overlap that measures the extent to which communities in the technology network overlap with communities in the capital network; and (iv) the centrality correlation of nodes in the technology and capital networks. Moreover, we analyze the time evolution of the above inter-similarity features to shape the landscape of worldwide technology innovation.

12:15
A study on the reliability of trade dependence network in tungsten industry chain based on percolation

ABSTRACT. The issue of mineral resource supply security has been around for a long time. As the application value of tungsten in strategic emerging industries continues to be explored, major developed countries have successively added tungsten to their lists of critical raw materials. This raises the importance of inter-country trade dependency relationships at the different stages of the tungsten industry chain, and the question of the extent to which changes in trade dependency relationships affect the entire trade system; both are related to national security and are a matter of concern for all countries. This study takes as its object the commodities with the keyword 'tungsten' published by UN Comtrade from 2009 to 2018 (tungsten ores, ferro-tungsten, and tungsten final products) and calculates the total trade value of tungsten-related products. Using a modified Pointwise Mutual Information (PMI) method to calculate the trade dependence relationships between countries, a multi-layered trade dependence network between countries (regions) was constructed, with countries as nodes and trade dependence intensity as edge weights, for each stage of the tungsten industry chain. Then, different scenarios are used to filter out edges in the network to simulate a percolation process. Through this study we found that, first, the spatial distribution of the critical link scores of trade closely matches that of trade volume; specifically, the countries surrounded by high critical link scores are those with large total trade imports or exports, such as China, the United States, Germany, and Russia. Second, in terms of the network as a whole, the downstream network is more stable, and extreme trade concentration can greatly affect the stability of the network. For example, the trade between Poland and Russia in midstream ferro-tungsten in 2014 accounted for 46% of the overall system; such a concentrated single import makes the whole network less stable and is extremely detrimental to the importing country's ability to cope with risks, so import source paths should be diversified to increase resilience. In this paper, by filtering edges through percolation to simulate risks, we identify important national trade dependencies, quantify the degree of influence of national trade dependencies, and provide a basis for countries to formulate relevant policies at all stages of the tungsten industry chain.
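A hedged sketch of a pointwise-mutual-information-style dependence score computed from a trade-flow matrix; the paper uses a modified PMI whose exact form is not given here, so the standard definition is shown.

```python
import numpy as np


def pmi_matrix(flows):
    """flows[i, j]: trade value exported from country i to country j."""
    p_ij = flows / flows.sum()
    p_exp = p_ij.sum(axis=1, keepdims=True)    # exporter marginals
    p_imp = p_ij.sum(axis=0, keepdims=True)    # importer marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_exp * p_imp))
    pmi[~np.isfinite(pmi)] = 0.0               # zero-flow pairs carry no edge
    return pmi


flows = np.array([[0, 40, 5],
                  [30, 0, 1],
                  [2, 8, 0]], dtype=float)
print(pmi_matrix(flows))
```

Thresholding or scenario-filtering the resulting weighted edges, and tracking when the network fragments, reproduces the percolation experiment in miniature.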

12:30
Maximum-Entropy models for Network Regression Analysis on Trade Data.
PRESENTER: Marzio Di Vece

ABSTRACT. Econometrics and Network Science have separately studied the dependence between the extensive margin, the intensive margin, and the propagation of shocks in international trade data over the last fifteen years.

Econometrics uses novel methods to encode the decision process of a country, treating its tendency to trade and the amount traded either separately or jointly, or opportunely increasing the variance of the estimates. Network Science has tried to solve the inference problem using notions grounded in Statistical Physics, i.e., the Maximum Entropy formalism: maximizing the Shannon entropy leads to the maximally unbiased parametric distribution compatible with the constraints.

Progress has recently been made in reconciling the two branches of literature by using the Maximum Entropy formalism while allowing economic factors to enter the specification of discrete-valued weights. These models provide an information gain with respect to Network Science methods and a better inference of both the extensive and intensive margins than Econometric models.

We proceed along this line of work, introducing regression models that can infer topology and weights in an integrated or conditional fashion for continuous-valued data. Depending on the constraints of choice, different distributions become available; we inspect their pros and cons and compare them with the state of the art.

Our contribution improves on the Econometric and Network Science literature, potentially shedding light on the interplay between the extensive and intensive margins in trade data, and both Econometricians and Network Scientists can use it to recover the effects of economic factors on trade volumes once topological measures are taken as fixed.

12:45
Re-organisation of socioeconomic networks in Sierra Leone due to external shocks

ABSTRACT. Individual socioeconomic status is a crucial driver of macroscopic phenomena in social networks. Status homophily is one of the key elements that influence network evolution and potentially lead to observable social segregation patterns. Moreover, homophilic mechanisms adapt to the external environment, leading to changes in the social structure. However, the observation of these phenomena in large-scale systems is still problematic due to the poor availability of suitable digital data, especially in developing countries. Here we combine large-scale mobile phone communication data from a major provider in Sierra Leone with a fine-grained socioeconomic map to build a large socioeconomic network and to analyze how segregation in the network changes in the short run and at different scales in response to the exogenous shock of the COVID-19 restriction policies. On a global scale, we observe a significant level of segregation, compared to that reproduced by simple reference models accounting for different confounding factors (physical distance, income distribution, and degree distribution). Interestingly, following the segregation index of the socioeconomic stratification matrix, global segregation decreases markedly during the three-day lockdown introduced by the government in response to the identified cases of COVID-19. We find that this is induced by a relative increase in communication between the richest area, around the capital city Freetown, and the rest of the country. At the mesoscale, we find that the richest classes tend to interact less with the poorest classes than vice versa, giving rise to an asymmetric social mixing structure as compared to normal times. From an individual point of view, we observe that network positions in the new configuration depend strongly on socioeconomic status, leading to a new equilibrium between classes. Indeed, we find that during lockdown rich people occupy more segregated positions in the network than before, while poor people occupy more integrated ones. Our study highlights that emergency policies can have a strong impact on the socioeconomic structure of social networks, and that such rich phenomenology can emerge from multiscale analysis.

13:00
Technology network focusing on “small” firms

ABSTRACT. The “technology space”, or “technology network”, is a network in which each technological field is a node and the weights of the edges connecting the nodes reflect the proximity of the fields. The technology network has been recognized as useful for understanding the processes of technological development and convergence, and for predicting future developments. The proximity between any given pair of technological fields is measured by the co-occurrence of IPC codes within firms' patent portfolios. By mapping a firm's patent activities onto the technology network, it is possible to predict the fields into which the firm should newly advance according to its technology portfolio [1]. Ref. [2] compared different methodologies for predicting a firm's future submission of patents in new sectors. While these studies provide very useful insights into technology development, they do not provide sufficient analysis or strategic planning suggestions at the firm level. They focused only on major firms, but the technology portfolio strategies of small firms are not necessarily the same as those of major firms. (Note that here, “small” means that a firm has a small technology portfolio, i.e., only a small number of patents.) Nowadays, small firms such as start-ups with severely limited resources are key to major technological innovation. In this study, we compare two technology networks: one constructed by capturing the proximity between technological fields in the usual way (hereafter, the RCA-based network), and one in which the technology portfolios of small firms are also reflected in the proximity (hereafter, the non-RCA-based network). The proximity between a given pair of technology fields in the conventional measure takes into account RCA (Revealed Comparative Advantage), which means that firms with small or evenly diversified portfolios are neglected. Therefore, we used another measure to construct the non-RCA-based network, namely one that simply does not use RCA. The data we used are on patents filed between 2000 and 2020, extracted from Orbis IP, one of the largest patent databases. Panel A in the figure shows the annual change in the total number of firms whose portfolio information is included in the proximity calculations. It indicates that the portfolios of the majority of firms are in fact not reflected in the conventional RCA-based network. That is appropriate when one wants to look at technology developments, but not when one wants to see where in the network new technologies may emerge (often driven by small firms) and how small firms should expand their portfolios. By comparing the two networks, we can understand which combinations of technological fields are underrepresented in the conventional RCA-based network (see Panel B). Furthermore, we classified patterns of temporal changes in the proximities between technological fields. Our analysis captured multiple patterns, such as small firms achieving a combination of rare technological fields before major firms begin to focus on that combination, or small firms expanding their portfolios into fields that major firms never even looked at in the first place.
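A minimal sketch contrasting the two constructions, assuming the min-of-conditional-probabilities proximity that is standard for such spaces: the RCA-based version binarises portfolios with RCA >= 1 before counting co-occurrences, while the non-RCA version keeps every firm that holds any patent in a field.

```python
import numpy as np


def proximity(m):
    """m[f, t] = 1 if firm f is active in technology field t."""
    co = m.T @ m                                  # field co-occurrence counts
    ubiquity = np.maximum(m.sum(axis=0), 1.0)     # active firms per field
    cond = co / ubiquity[None, :]                 # P(field i | field j)
    return np.minimum(cond, cond.T)               # symmetric proximity


def rca_binarise(counts):
    """Keep only fields in which a firm's RCA is at least 1."""
    share = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    world = np.maximum(counts.sum(axis=0) / counts.sum(), 1e-12)
    return (share / world >= 1.0).astype(float)


counts = np.random.default_rng(0).poisson(0.5, size=(100, 8)).astype(float)
prox_rca = proximity(rca_binarise(counts))        # conventional network
prox_all = proximity((counts > 0).astype(float))  # small firms included
```

Field pairs where prox_all clearly exceeds prox_rca are exactly the combinations that the conventional network underrepresents.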

13:30-14:40 Session 15: Invited Talks (G. Bianconi & M. Timme)
13:30
The dynamics of higher-order networks: the effect of topology and triadic interactions
14:05
Nonequilibrium Network Dynamics
14:55-16:10 Session 16A: Network Inference
14:55
Network inference via process motifs for lagged correlation in linear stochastic processes

ABSTRACT. A major challenge for causal inference from time-series data is the trade-off between computational feasibility and accuracy. Motivated by process motifs for lagged covariance in an autoregressive model with slow mean-reversion, we propose to infer networks of causal relations via pairwise edge measures (PEMs) that one can easily compute from lagged correlation matrices. We introduce a PEM with a correction for confounding factors and a PEM with a correction for reverse causation. To demonstrate the performance of our PEMs, we consider linear stochastic processes on random networks and show that our proposed PEMs can infer networks accurately and efficiently. Specifically, our approach achieves accuracy higher than or similar to results from Granger causality, transfer entropy, and convergent cross-mapping, but with much shorter computation time than any of these methods. Our fast, accurate PEMs are thus easy-to-implement methods for network inference with a clear theoretical underpinning. They provide promising alternatives to current paradigms for the inference of linear models from time-series data, including Granger causality, vector autoregression, and sparse inverse covariance estimation.
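A minimal sketch of the idea, assuming the raw PEM is the lag-1 cross-correlation and using a simple antisymmetrisation as a stand-in for the reverse-causation correction; the paper's actual corrections may differ.

```python
import numpy as np


def lagged_corr(x, lag=1):
    """x: (T, N) time series; returns C[i, j] = corr(x_i(t), x_j(t + lag))."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    t = len(z) - lag
    return z[:t].T @ z[lag:] / t


def pem_scores(x, lag=1):
    c = lagged_corr(x, lag)
    return c - c.T          # score i -> j against the reverse direction


# toy example: x0 drives x1 with one step of delay
rng = np.random.default_rng(0)
x0 = rng.normal(size=1001)
x1 = 0.8 * np.roll(x0, 1) + 0.2 * rng.normal(size=1001)
scores = pem_scores(np.column_stack([x0, x1])[1:])  # drop the wrapped sample
print(scores[0, 1] > scores[1, 0])                  # expect True: edge 0 -> 1
```

The whole computation is a handful of matrix products, which is the source of the speed advantage over Granger causality or transfer entropy.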

15:10
Autonomous inference of complex network dynamics from incomplete and noisy data

ABSTRACT. The availability of empirical data that capture the structure and behavior of complex networked systems has greatly increased in recent years; however, a versatile computational toolbox for unveiling a complex system's nodal and interaction dynamics from data remains elusive. Here we develop a two-phase approach for the autonomous inference of complex network dynamics, and we demonstrate its effectiveness by inferring neuronal, genetic, social, and coupled-oscillator dynamics on various synthetic and real networks. Importantly, the approach is robust to incompleteness and noise, including low resolution, observational and dynamical noise, missing and spurious links, and dynamical heterogeneity. We apply the two-phase approach to inferring the early spreading dynamics of H1N1 flu on the worldwide airline network, and the inferred dynamical equation can also capture the spread of SARS and COVID-19. These findings together offer an avenue to discover the hidden microscopic mechanisms of a broad array of real networked systems.

15:25
On the importance of correlation in graph reconstruction

ABSTRACT. The structure of empirical networks is often unknown: we generally observe measurements of pairwise interactions and not the network itself. Some form of post-processing is needed to convert these data into networks. Recent work shows that a Bayesian framework can instead be used to generate a distribution of graphs compatible with all the available information. Crucially, to keep solutions tractable, these works assume that the edges are conditionally independent, an assumption that can be violated in practice. For example, triadic closure and clustering are two well-known phenomena that induce correlations among edges in empirical networks.

Here, we introduce a minimal Bayesian network reconstruction framework that can account for such correlations. In the model, correlation is introduced through a hypergraph comprised of 2-edges and 3-edges, which are assumed to exist independently a priori with probabilities q and p, respectively. To obtain a pairwise description of the interactions, these hypergraphs are projected onto a graph with two edge labels: 2-edges become "regular edges" and 3-edges become a triangle of "correlated edges". The likelihood of the observations (e.g., the number of interactions between two individuals) is then a Poisson mixture model in which the projection labels determine the type of measurement made (no interaction, regular edge, or correlated edge).

As an uncorrelated baseline, we also study a network model in which weak and strong ties are a priori independently distributed: each weak edge exists independently with probability q1 and each strong edge exists independently with probability q2 among the remaining unconnected pairs. Again, the strength of the interactions determines the type of measurements made.

We develop sampling algorithms for these two models and show how to fit them to empirical data. As an example, we use Zachary's karate club hypergraph obtained with the method of Young et al. In a regime where the Poisson distributions are well separated, both models identify all effective edge types correctly. However, when the distributions start overlapping, the model with correlations reconstructs the structure better by using the information contained in the hypergraph neighbourhood of the pairwise interactions.
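A minimal generative sketch of the correlated model as described above: 3-edges and 2-edges are drawn independently a priori, the hypergraph is projected onto labelled edges, and observed counts follow a Poisson mixture with one rate per label. The rates, and the assumption that correlated labels override regular ones when both are drawn, are illustrative.

```python
import itertools

import numpy as np


def sample_observations(n, p, q, rates=(0.05, 2.0, 4.0), seed=0):
    rng = np.random.default_rng(seed)
    # label 0: no edge, label 1: regular 2-edge, label 2: correlated edge
    label = {e: 0 for e in itertools.combinations(range(n), 2)}
    for tri in itertools.combinations(range(n), 3):
        if rng.random() < p:                  # independent 3-edges, prob p
            for e in itertools.combinations(tri, 2):
                label[e] = 2                  # project to correlated edges
    for e in label:
        if label[e] == 0 and rng.random() < q:
            label[e] = 1                      # independent 2-edges, prob q
    counts = {e: int(rng.poisson(rates[lab])) for e, lab in label.items()}
    return counts, label


obs, truth = sample_observations(30, p=0.01, q=0.05)
```

Fitting then amounts to inverting this process: given only obs, infer the posterior over labels, with the triangle structure of label-2 edges providing the extra signal once the Poisson components overlap.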

15:40
Unveiling the higher-order organization of multivariate time series

ABSTRACT. Time series analysis has proven to be a powerful method to characterize different phenomena in biology, neuroscience, and economics, and to understand some of their underlying dynamical features. However, to date, it remains unclear whether the information encoded in multivariate time series, such as the evolution of financial assets traded in major financial markets, or the neuronal activity in the brain, stems from independent, pairwise, or group interactions (i.e., higher-order structures [1]).

In this work, we propose a novel framework to characterize the instantaneous co-fluctuation patterns of signals at all orders of interaction (pairs, triangles, etc.), and to investigate the global topology of such co-fluctuations [2]. In particular, after (i) z-scoring the N original time series, (ii) we calculate the element-wise product of the z-scored time series for all $\binom{N}{k}$ k-order patterns (i.e., edges, triplets, etc.). Here, the generic element represents the instantaneous co-fluctuation magnitude of a (k+1)-node group interaction. To distinguish concordant group interactions from discordant ones in a k-order product, concordant signs are always positively mapped, while discordant signs are negatively mapped (Fig. 1a-b). (iii) The resulting new set of time series encoding the k-order co-fluctuations is then further z-scored across time, to make products comparable across k-orders. (iv) Finally, for each time frame t, we construct a weight filtration by sorting all the k-order co-fluctuations by their weights. The weight filtration proceeds from the top down, in the spirit of persistent homology, so that as k-order patterns are gradually included, weighted holes and cliques start to appear (i.e., descending from more coherent patterns to less coherent ones). To maintain a well-defined weight filtration, only k-order patterns respecting the simplicial closure condition are included, while the remaining ones are considered simplicial violations, or "hyper-coherent" states, and analysed separately (Fig. 1c-d).
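A minimal sketch of steps (i)-(iii), using the sign convention above (all-concordant signs map to a positive co-fluctuation); the weight filtration of step (iv) is omitted.

```python
import itertools

import numpy as np


def zscore(a):
    return (a - a.mean(axis=0, keepdims=True)) / a.std(axis=0, keepdims=True)


def cofluctuations(x, k=3):
    """x: (T, N) multivariate time series -> k-order co-fluctuation series."""
    z = zscore(x)
    out = {}
    for idx in itertools.combinations(range(x.shape[1]), k):
        magnitude = np.abs(np.prod(z[:, idx], axis=1))
        concordant = (np.all(z[:, idx] >= 0, axis=1) |
                      np.all(z[:, idx] <= 0, axis=1))
        out[idx] = np.where(concordant, magnitude, -magnitude)
    # z-score across time so products are comparable across orders k
    return {idx: zscore(v) for idx, v in out.items()}


rng = np.random.default_rng(0)
triplets = cofluctuations(rng.normal(size=(500, 5)), k=3)
```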

We show that the instantaneous persistent topological properties and the number of violations uncovered by our framework distinguish different regimes of coupled chaotic maps [3]. This includes the transition between different dynamical phases and various types of synchronization (Fig. 1e). Armed with these interpretational benchmarks, we also apply our method to resting-state fMRI signals and to financial time series. We find that, during rest, the human brain mainly oscillates between chaotic and partially intermittent states, with higher-order structures mainly reflecting somatosensory areas. In financial time series, by contrast, higher-order structures discriminate crises from periods of financial stability (Fig. 1f). Overall, our approach suggests that investigating the higher-order structure of multivariate time series might provide new insights compared to standard methods, and might allow us to better characterise the group dependencies inherent to real-world data.

[1] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri, Physics Reports 874, 1 (2020). [2] A. Santoro, F. Battiston, G. Petri, and E. Amico (2022), arXiv 2203.10702. [3] K. Kaneko, Chaos: An Interdisciplinary Journal of Nonlinear Science 2, 279 (1992). [4] J. Faskowitz, F. Z. Esfahlani, Y. Jo, O. Sporns, and R. F. Betzel, Nature neuroscience 23, 1644 (2020).

15:55
Inferring the transmission dynamics of Avian Influenza from news and environmental data

ABSTRACT. Avian Influenza (AI) is a highly contagious animal disease, which infects many wild and domestic bird species. Transmission between birds can be direct due to close contact between birds, or indirect through contaminated materials such as feed and water. Particularly, migratory wild birds play a key role in this transmission and make the viruses spread over long distances. Although AI is a well-studied disease in the literature and there exist several surveillance platforms for monitoring the evolution of AI outbreaks, in practice we hardly know about the real transmission routes of the AI viruses, i.e. the provenance information when a new outbreak appears in some location. This makes the early detection of AI outbreaks very challenging.

In this work, we tackle the problem of how the AI disease spreads in the absence of real disease transmission routes. Towards this end, we first reconstruct these transmission routes, based on how AI outbreaks evolve in the news articles collected by surveillance platforms accompanied by environmental data related to the outbreak locations, through an attributed location-aware dynamic network. Then, we study the underlying network dynamics to unveil the AI transmission patterns. In our network construction, the nodes correspond to outbreak locations extracted from news data and provided by surveillance platforms, and the edges between them represent the disease transmission routes that are more likely to occur. Our hypothesis is that the probability of disease transmission between locations is mainly related to two aspects: 1) the temporal difference between outbreaks, and 2) the distance between outbreak locations.
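A hedged sketch of the two-factor edge weighting this hypothesis suggests: the likelihood of a transmission route decays with the time lag between outbreaks and with the distance between locations. The exponential forms and scale parameters are illustrative assumptions.

```python
import math


def route_weight(dt_days, dist_km, tau=14.0, scale_km=500.0):
    if dt_days <= 0:
        return 0.0    # a candidate source outbreak must precede the target
    return math.exp(-dt_days / tau) * math.exp(-dist_km / scale_km)


# outbreak B observed ten days after, and 200 km away from, outbreak A
print(route_weight(10, 200))
```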

We analyze the resulting dynamic network with two groups of spatio-temporal measures. The first group characterizes single nodes through macroscopic measures (e.g., an adapted dynamic PageRank), which allow us to identify key locations in disease transmission, whereas the second group summarizes the whole network structure (e.g., a time-space spreading index based on hotspot analysis). We show the interest of our approach by applying it to the AI outbreak datasets collected by PADI-Web and ProMED, two well-known surveillance platforms in Epidemic Intelligence, for the period 2019-2021. Our preliminary results confirm the existence of super-spreaders, i.e., locations that are particularly effective in transmitting the AI disease, in our datasets.

14:55-16:10 Session 16B: Spatial Analysis II
14:55
Regionalization through optimal information compression on spatial networks

ABSTRACT. The process of aggregating areal units into contiguous clusters, known as regionalization, is central to the analysis of spatial data. Regionalization provides a means to reduce the effect of noise or outliers in sampled data, identify socioeconomically homogeneous areas for policy development, and simplify the visualization of data in maps, among many other applications. Most existing regionalization methods require a substantial amount of manual input, such as the number of desired regions or a similarity measure among regional populations, which may be desirable for some applications but does not allow us to extract the natural regions defined solely by the data itself. Here we view the problem of regionalization as one of data compression on the network representing adjacency between spatial units. We define the optimal partition of this spatial network as the one that minimizes the description length required to transmit the distributions of data found on the nodes, and we develop an efficient, parameter-free greedy optimization algorithm to identify this partition. We demonstrate that our method is capable of recovering planted spatial clusters in noisy synthetic data, and that it can meaningfully coarse-grain real demographic data. Using our description length formulation, we find that the information contained in spatial ethnoracial data in metropolitan areas across the U.S. has become more difficult to compress over the period from 1980 to 2010, which reflects the rising complexity of urban segregation patterns in these metros. We identify the increasing overall diversity of these metros as a major contributor to this lower data compressibility, while the characteristic length scale of ethnoracial clustering does not appear to be a significant factor.

15:10
SARS-CoV-2 Sequences Misclassification Network Spatial Analysis

ABSTRACT. Understanding the spatial-temporal correlation of SARS-CoV-2 genome sequences is crucial for discovering the effects of human mobility, contacts (and hence social community structures), COVID-19 policies, and geo-locations on the spread, variation, and evolution of SARS-CoV-2. However, traditional alignment-based comparative methods such as BLAST and CLUSTAL take into account only local similarity or consistent ordering across sequences, and are often computationally infeasible for comparing large ensembles of long genome sequences.

In this work, we propose a novel network-generating framework based on misclassification results from state-of-the-art deep learning techniques for correlation-based network analysis. Specifically, we propose to measure the class-wise association between genomes, denoted indistinguishability(class_i, class_j), based on the empirical likelihood of their being misclassified as one another. That is to say, given a sufficiently well-trained classifier, an underlying correlation is suggested if a certain class of genome sequences is frequently misclassified into another class. Indeed, when a deep learning neural network is trained to classify a dataset, the filters embedded in the deep learning architecture are essentially trained to detect the class-specific distinctive features. For instance, in the case of a convolutional neural network, the filters closer to the input layer are expected to detect finer distinctive details, and conversely those closer to the output layer more general features. Hence, when the trained classifier is confused by a data point to the extent of classifying it incorrectly, similar features between the target and the predicted class can be expected.

As a result, a misclassification network can be constructed where node i represents an ensemble of genomes from region i, and the weighted edge w_{ij} between node i and node j is their indistinguishability score, computed from the empirical misclassification likelihood between the genome ensembles from regions i and j. Generating complex networks based on the misclassifications of deep learners allows us to relate various regions (i.e., data clusters) based on their similar latent features, and subsequently to conduct network analyses such as community detection and centrality ranking. Further, we observe that valuable information is embedded in the often-neglected incorrectly classified cases. As illustrated in Figure 1, misclassification of sequences from England mostly happened when they were mistaken for sequences from other European regions, especially those near England, such as Scotland, Denmark, and France. This observed spatial dependency of misclassification aligns with our expectation that social contacts, and consequently traveling, have a significant impact on the indistinguishability between genome sequences. This study opens up new prospects in the field of network science by introducing a new genre of networks to be studied: misclassification networks. More generally, in scenarios where a routine procedure for measuring node associations is absent, the misclassification network could come into play.
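A minimal sketch of turning a confusion matrix into such a network: the indistinguishability of two regions is taken as the symmetrised empirical misclassification rate between them. Normalising by row sums is an assumption.

```python
import numpy as np
import networkx as nx


def misclassification_network(confusion, labels):
    rates = confusion / confusion.sum(axis=1, keepdims=True)
    g = nx.Graph()
    g.add_nodes_from(labels)
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            w = rates[i, j] + rates[j, i]     # indistinguishability(i, j)
            if w > 0:
                g.add_edge(labels[i], labels[j], weight=w)
    return g


confusion = np.array([[90, 8, 2],
                      [10, 85, 5],
                      [1, 4, 95]], dtype=float)
g = misclassification_network(confusion, ["England", "Scotland", "France"])
```

Community detection and centrality ranking then run on g exactly as they would on any weighted network.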

15:25
(CANCELLED) Multiscale Causal Structure in Armed Conflict

ABSTRACT. Armed conflict is a major and ongoing problem around the world today. Its consequences extend far beyond direct casualties, in terms of political, economic, and environmental repercussions that spread even to regions not directly involved. Here, we introduce a first-principles approach to conflict by clustering sequences of conflict events into causal conflict avalanches and analyzing the statistics of the resulting avalanches. In this work, we develop a systematic approach to investigate different spatial and temporal scales of armed conflict. We rely on data from the Armed Conflict Location & Event Data Project (ACLED) and focus on conflict events occurring from 1996 to 2021 in Africa. A conflict event can have innumerable nuances and microscopic social, economic, and political factors. These microscopic features introduce ambiguity into how individual conflict events on the smallest scale belong to large-scale outbreaks such as riots, wars, or revolutions. We investigate different spatial and temporal scales with a systematic coarse-graining procedure. For space, we tile the region with semi-regular bins that constitute our level of resolution (Figure 1A), and for time we group days into discrete intervals. The sizes of these spatial and temporal bins determine the scale at which we perform our analysis and are analogous to sociopolitical definitions of battles or wars. This formalism bridges the gap between microscopic and macroscopic descriptions of armed conflict.

15:40
Network constraints on worker mobility

ABSTRACT. Career mobility requires desirable workplace skills and access to relevant labor markets. So how do detailed skill requirements and the similarities among occupations shape the present and future of workers' careers? Here, we model career transitions as a network of occupations connected by the similarity of the occupations' skill requirements. Using a nationally representative survey and two resume datasets, each representing 100 million individual workers, we show that skill similarity predicts transition rates between occupations and that predictions improve with increasingly granular skill data. A new measure of skill specialization based on workers' embeddedness in their economy's occupation network may thus predict future career dynamics. Job changes and/or relocations that decrease embeddedness correspond to increased wages, and workers tend to decrease their embeddedness over their careers. While low-embeddedness workers may leverage their locally rare skills in wage negotiations, employers might also offer higher wages as an incentive for skilled workers to relocate. We find evidence for the latter, since the combined embeddedness of city pairs corresponds to increased Census migration and increased flows of enplaned passengers according to the US Bureau of Transportation Statistics. This study directly connects workplace skills to workers' career mobility and spatial mobility, thus offering new insights into skill specialization and urbanization.

15:55
Link prediction for mesoscopic COVID-19 transmission networks in Republic of Korea

ABSTRACT. We analyze the dataset of $N=279,930$ confirmed COVID-19 cases, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), over the last two years in the Republic of Korea, provided by the Korea Disease Control and Prevention Agency. This dataset contains transmission information on who infected whom, along with each infected individual's location at the district level and age, as well as the infection date $t$ (in days) regarding when the infection possibly occurred. We construct two kinds of mesoscopic transmission networks, namely location and age networks. In a location network, each node denotes one of the 250 districts in Korea, while a link between two nodes is created if infection occurred between two individuals associated with those nodes. The weight of a link is defined as the number of infections that occurred on the link. One can similarly define an age network with 100 nodes, in which each node corresponds to an age. By using the temporal information on infection events, we obtain the time series of location and age networks on a date $t$, respectively denoted by $G^{\rm L}_t$ and $G^{\rm A}_t$, by confining the observation period to $[t-13,t]$ (two weeks).

Then we examine how the structure of these networks changes over time in terms of clustering behavior and link predictability. We calculate the average clustering coefficient (CC) for both binary and weighted versions of the networks, as shown in panels (a, b) of the figure. For both location and age networks, we find a number of triangles containing at least one weak link. Together with the negligible values of the average CC for the weighted versions of the networks, this may imply that the networks without such weak links can be seen as having a tree structure.

To forecast the spreading pattern and hopefully mitigate it, it is important to predict which nodes will be connected with each other in the future based on the past network structure. We apply link prediction methods to the location and age networks on a date $t$. Each pair of nodes that were not connected in $G^{\rm L}_t$ and $G^{\rm A}_t$ will either be connected in the next period $[t+1,t+7]$ or remain unconnected; the set of former (latter) pairs of nodes is denoted by $E^{\rm L}_{t+}$ ($U^{\rm L}_{t+}$). We define a novel predictability measure as the ratio of the average similarity index over node pairs in $E^{\rm L}_{t+}$ to that over node pairs in $U^{\rm L}_{t+}$. Figure~\ref{fig}(c,~d) shows the temporal behavior of the predictability measure for the similarity index called common neighbors (CN), for the cases where the matrix representing human mobility (OD) or the contact matrix (CM) was used as the ``similarity index'', and for an extended version of CN that incorporates the human mobility and contact patterns. For both location and age networks, we find that link predictability using topological information can be improved when combined with additional information on human mobility and contact patterns.
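
A hedged sketch of the predictability measure described above, using the common-neighbors index on toy data (the OD/CM variants would simply swap in a different pair score):

```python
# Ratio of the mean CN score over pairs that do link in [t+1, t+7]
# to the mean over pairs that do not.
import itertools
import networkx as nx

def cn_predictability(G_t, future_edges):
    future = {frozenset(e) for e in future_edges}
    linked, unlinked = [], []
    for u, v in itertools.combinations(G_t.nodes, 2):
        if G_t.has_edge(u, v):
            continue  # score only pairs unconnected at time t
        score = len(list(nx.common_neighbors(G_t, u, v)))
        (linked if frozenset((u, v)) in future else unlinked).append(score)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(linked) / mean(unlinked) if mean(unlinked) > 0 else float("inf")

G = nx.path_graph(5)                           # network over [t-13, t]
print(cn_predictability(G, [(0, 2), (1, 4)]))  # links appearing in [t+1, t+7]
```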

14:55-16:10 Session 16C: Theory IV
14:55
Cycle analysis of Directed Acyclic Graphs

ABSTRACT. Many real-world networks express an important constraint leading to a characteristic order: publication dates of papers in a citation network, dependencies of packages in computer software, and predator-prey relationships in a food web. If edges represent relations that respect this order, they can exist only if they link a higher-value node to a lower-value node, and are therefore naturally directed. Such a network is called a Directed Acyclic Graph (DAG), as no paths start and finish at the same node.

In this paper, we consider a directed network as an undirected graph plus associated node meta-data. Using this decomposition, we can find a Minimum Cycle Basis (MCB) of the undirected graph and characterise it with the directionality information (see Fig. 1a). We first show that only four classes of directed cycles exist, and that they can be fully characterised by the organisation and number of source and sink node pairs and their antichain structure (see Fig. 1b). Furthermore, we introduce a series of metrics to characterise cycles and their organisation in the network.
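
An illustrative sketch of this decomposition on a toy DAG, assuming networkx's minimum_cycle_basis for the undirected MCB and recovering each basis cycle's sources and sinks from the original edge directions:

```python
import networkx as nx

D = nx.DiGraph([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])  # toy DAG
U = D.to_undirected()                                     # graph + node order

for cycle_nodes in nx.minimum_cycle_basis(U):
    S = set(cycle_nodes)
    # sources/sinks within the cycle, read off from the directed edges
    sources = [n for n in S if not any(p in S for p in D.predecessors(n))]
    sinks = [n for n in S if not any(s in S for s in D.successors(n))]
    print(sorted(S), "sources:", sources, "sinks:", sinks)
```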

We then turn to a special class of networks: transitively reduced DAGs. Transitive reduction has two effects: i) it stabilises the properties of the MCB, making the metrics a consistent characterisation of the systems represented by DAGs, and ii) it reduces the number of cycle classes present to two, diamonds and mixers.

We measure the characteristics of the Minimum Cycle Bases of four models of transitively reduced DAGs: Lattice, Russian Doll, Erdos-Renyi DAG, and the Price model. We show that the metrics we introduce not only distinguish the models but also reveal their generating mechanisms.

We believe the generalised directed cycles classes we have identified and the metrics capturing their organisation will prove useful in understanding processes on real-life directed systems.

15:10
Characterizing cycle structure in complex networks

ABSTRACT. A cycle is the simplest structure that brings redundant paths in network connectivity and feedback effects in network dynamics. An in-depth understanding of which cycles are important and what role they play in network structure and dynamics, however, is still lacking. Here we define the cycle number matrix (Fig. b), a matrix enclosing the information about cycles in a network, and the cycle ratio (Fig. b), an index that quantifies node importance. Experiments on real networks suggest that the cycle ratio contains rich information in addition to well-known benchmark indices. For example, node rankings by cycle ratio are largely different from rankings by degree, H-index, and coreness, which are themselves very similar indices (Figs. c & d). Numerical experiments on identifying vital nodes for network connectivity (Fig. e) and synchronization (Fig. f), and on maximizing the early reach of spreading (Fig. g), show that the cycle ratio performs overall better than other benchmarks. In addition, we highlight a significant difference between the distribution of shorter cycles in real and model networks. We believe our in-depth analyses of cycle structure may yield insights, metrics, models, and algorithms for network science [1].

15:25
Entropy of labeled versus unlabeled networks

ABSTRACT. The structure of a network is an unlabeled graph, yet graphs in most models of complex networks are labeled by meaningless random integers. Is the associated labeling noise always negligible, or can it overpower the network-structural signal? To address this question, we introduce and consider the sparse unlabeled versions of popular network models, defining them as maximum-entropy ensembles of graphs with constraints analogous to their labeled formulations, but as probability distributions over the set of unlabeled graphs rather than labeled graphs (see, e.g., the labeled and unlabeled versions of the Erdos-Renyi model, shown in the first and third row of Figure \ref{fig1}, respectively). We distinguish these unlabeled network models from the ensembles produced by sampling graphs from labeled network models and then removing their labels (see the delabeled version of the Erdos-Renyi model in the second row of Figure \ref{fig1}). We compare the entropy of the labeled and unlabeled versions of the different models, as a way of examining the amount of noise injected by node-labeling. We show that labeled and unlabeled versions of Erdos-Renyi graphs are entropically equivalent, but that their degree distributions are very different. The labeled and unlabeled versions of the configuration model may have different prefactors in their leading entropy terms, although this remains conjectural. Our main results are upper and lower bounds for the entropy of labeled and unlabeled one-dimensional random geometric graphs; the unlabeled version has entropy scaling no faster than linearly with $n$, which is negligible in comparison to the entropy of the labeled version, which scales as $n\log n$. These results imply that in sparse networks the entropy of meaningless labeling may dominate the entropy of the network structure, suggesting a need for a thorough reexamination of the statistical foundations of network modeling -- and an opportunity for the development of a new science of unlabeled networks. For further information, see arXiv:2204.08508.
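
A back-of-envelope counting relation, added here for orientation rather than taken from the abstract: each unlabeled graph $G$ on $n$ nodes corresponds to $n!/|\mathrm{Aut}(G)|$ labeled graphs, so for a delabeled ensemble with uniformly random labels,

```latex
S_{\mathrm{lab}} \;=\; S_{\mathrm{unlab}} \;+\; \ln n! \;-\; \bigl\langle \ln |\mathrm{Aut}(G)| \bigr\rangle ,
```

and since $\ln n! \sim n\ln n$, a structural entropy that grows only linearly in $n$ is indeed negligible next to the labeling term.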

15:40
Unifying information propagation models on networks and influence maximisation

ABSTRACT. Information propagation is a central theme in social, behavioural, and economic sciences, with important theoretical and practical implications, such as the influence maximisation problem for viral marketing. Two widely adopted models are the independent cascade model, where nodes adopt their behaviour from each neighbour independently, and the linear threshold model, where collective effort from the whole neighbourhood is needed to influence a node. However, both models suffer from certain drawbacks, including a binary state space, where nodes are either active or not, and the absence of feedback, as nodes cannot be influenced after having been activated. To address these issues, we consider a model with continuous variables that has the additional advantage of unifying the two classic models. For the associated influence maximisation problem, the objective function is no longer submodular, a feature that most approximation algorithms rely on but that is arguably too restrictive in practice. Hence, we develop a framework where we formulate influence maximisation as a mixed-integer nonlinear program and adopt derivative-free methods. Furthermore, we show that the problem can be solved exactly in the special case of linear dynamics, and propose a customised direct search method with local convergence. We demonstrate the rich behaviour of the newly proposed information propagation model and the close-to-optimal performance of the customised direct search numerically on both synthetic and real networks.

15:55
(CANCELLED) Mean-field solution for critical behavior of signed networks in competitive balance theory

ABSTRACT. The competitive balance model has been proposed as an extension of the balance model to address conflicts of interest in signed networks. In this model, two different paradigms or interests compete with each other to dominate the network's relations and impose their own values. In this paper, using the mean-field method, we examine the thermal behavior of the competitive balance model. Our results show that below a certain temperature, the symmetry between the two competing interests spontaneously breaks, leading to a discrete phase transition. So, starting with a heterogeneous signed network, if agents aim to decrease the tension stemming from competitive balance theory, evolution ultimately selects only one of the existing interests, and stability arises where one paradigm dominates the system. The critical temperature depends linearly on the number of nodes, a linear dependence also found in thermal balance theory. Finally, the results obtained through the mean-field method are verified by a series of simulations.

14:55-16:10 Session 16D: Economic Networks II
14:55
Quantifying the exposure of banking systems to the propagation of supply chain shocks in large scale firm-level production networks

ABSTRACT. Traditionally, banks manage the credit risk of loans they extend to corporate customers with statistical models using node-level information about their customers. However, through their supply chain linkages these customers are connected to a large, complex supply network that can expose them to potentially large production shocks [1]. Banks do not have access to the information needed to quantify the exposure of their customers to the propagation of shocks in supply networks, leaving the financial system exposed to the systemic risks inherent to supply networks. We present a new framework that allows us to quantify, for each firm in the supply network, the systemic risk it entails for the financial system: we first simulate the shock propagation cascade it triggers in the supply network with the model of [1], then convert the resulting production losses of firms into financial losses, determine which firms default and thus cannot repay their loans, and finally calculate the losses of equity for each bank. Consequently, we can compare the size of banks' exposure to direct customers versus their higher-order exposures to the overall supply network. We apply our new methodology to the supply network of 245,000 Hungarian firms with about a million edges and link them to the banking system via a comprehensive bank-firm loan data set. Fig. 1 shows the rank-sorted distribution of the financial systemic risk index (FSRI) of each firm, measuring the fraction of overall bank equity in the country that is lost in response to a firm's failure. A small fraction of firms pose sizeable risks to the financial system, affecting up to 6% of overall bank equity. This magnitude of risk is explained mostly by the propagation of shocks in the supply network and not by firms' direct impacts on banks. Therefore, our findings show that it is crucial for regulators' financial systemic risk assessment to monitor supply network shock propagation in order to have a more complete picture of threats to financial stability.
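
A schematic of the FSRI pipeline in code, with all numbers toy and the cascade model of [1] replaced by a one-step downstream pass-through for brevity (the real model iterates losses through the whole network):

```python
def fsri(initial_failed, downstream_loss, loans, equity):
    """Fraction of total bank equity lost after the failure of given firms."""
    # 1) production losses from the (stylised) shock cascade
    losses = {f: 1.0 for f in initial_failed}
    for f in initial_failed:
        for g, share in downstream_loss.get(f, {}).items():
            losses[g] = max(losses.get(g, 0.0), share)
    # 2) firms whose production loss exceeds a default threshold
    defaulted = {f for f, x in losses.items() if x >= 0.5}
    # 3) banks write off their loans to defaulted firms
    lost = sum(v for (firm, _), v in loans.items() if firm in defaulted)
    return lost / sum(equity.values())

downstream = {"firm_a": {"firm_b": 0.8, "firm_c": 0.2}}   # loss shares
loans = {("firm_a", "bank1"): 5.0, ("firm_b", "bank1"): 3.0,
         ("firm_c", "bank2"): 2.0}
equity = {"bank1": 50.0, "bank2": 30.0}
print(fsri({"firm_a"}, downstream, loans, equity))        # FSRI of firm_a
```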

15:10
Spreading of Shocks in Production Networks and the heterogeneity of firms within industry sectors

ABSTRACT. Models for the spreading of shocks in production networks are one of the key tools to assess the economic and, consequently, societal effects of crises like natural disasters and climate change [1], or the COVID-19 pandemic [2]. A crucial element of these models is the highly aggregated industry-sector-level production network (based on input-output tables), on which the spreading dynamics of the direct initial shocks – stemming from crisis scenarios – to other parts of the economy are simulated. Based on the firm-level production network of Hungary, containing 240,000 firms, we show that firms within a given industry exhibit vastly heterogeneous interlinkages in the production network. Using a firm-level shock propagation model [3], we show that crisis scenarios with the same initial shock size for each sector, but affecting different companies within sectors, lead to substantially different shock cascades. These would be indistinguishable when considering aggregated sector-level production networks. The histogram in Fig. 1a shows the network-wide production losses due to the propagation of shocks triggered by 1,000 synthetic COVID-19 shock scenarios in March 2020. Fig. 1b shows how single sectors are affected by the corresponding shocks. The large interquartile ranges show the sizeable differences in cascades triggered by different initial shocks of the same size. This large level of uncertainty in production losses cannot be captured when using aggregated sector-level data and can lead to fundamentally wrong assessments of systemic risk when firm-level details are ignored.

References
[1] S. Hallegatte, C. Green, R.J. Nicholls, J. Corfee-Morlot, Nat. Clim. Change 3, 802 (2013).
[2] A. Pichler, M. Pangallo, M. del Rio-Chanona, F. Lafond, D. Farmer, SSRN 3788494 (2021).
[3] C. Diem, A. Borsos, T. Reisch, J. Kertész, S. Thurner, arXiv:2104.07260 (2021).

15:25
Navigating the green transition: Systemic importance vs. CO2 emissions of firms in production networks

ABSTRACT. One of the biggest challenges of the green transition is the reorganization of economic production such that the least amount of greenhouse gases is emitted while the production of economic goods and services is kept at decent levels. We show how modeling the economy as a production network of firms can aid our thinking about decarbonization. For this purpose, we compare a centrality measure of the systemic importance of individual firms with their respective annual CO2 emissions, as seen in Fig. 1. High emissions and low systemic importance are characteristic of companies that produce more greenhouse gas emissions than their socio-economic relevance would justify. By targeting these companies with policies, a maximum of saved emissions and a minimum of disruption to overall production can be expected. This idea is demonstrated for the 30,000 companies in the Austrian pork supply network, which we reconstruct from various data sources and whose emissions are calculated with the help of a recent life cycle analysis [1]. The nodes in this network are farms, slaughterhouses, meat processors, distribution centers and supermarkets. The edges are supply and buy relations between these companies and represent the number of transferred pigs or kilogrammes of pork. In order to quantify the systemic importance of firms, we use the economic systemic risk index (ESRI) developed by Diem et al. [2]. A firm's ESRI is the fraction of overall production of the total network that is lost if the firm stops its production, thereby constraining what other firms are able to produce. Companies with high emissions and a low ESRI are potential leverage points for decarbonization. As a proof of concept, we show that by closing the firms in the red zone of Fig. 1, total emissions of the Austrian pork supply network are reduced by 22.9% while total production is reduced by just 17.6%. We are working on extending this methodology and applying it to more complex production networks in the future.
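
A toy illustration of the leverage-point selection described above; the thresholds, firm names and numbers are invented for the example:

```python
# Flag firms with high CO2 emissions but low systemic importance (ESRI).
firms = {"farm_1": (120.0, 0.001), "slaughter_1": (80.0, 0.200),
         "farm_2": (95.0, 0.002), "retail_1": (10.0, 0.050)}  # (tCO2, ESRI)

def leverage_points(firms, emission_min, esri_max):
    return [name for name, (co2, esri) in firms.items()
            if co2 >= emission_min and esri <= esri_max]

print(leverage_points(firms, emission_min=90.0, esri_max=0.01))
```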

15:40
Pattern Analysis of Money Flows in the Bitcoin Blockchain

ABSTRACT. Bitcoin is a cryptocurrency that stores transaction records in a public distributed ledger called the blockchain. All transactions that have occurred since the beginning of Bitcoin in 2009 can therefore be consulted by anyone. This unique dataset allows us to study financial transaction networks among pseudonymous participants. Several works have analyzed static transaction networks but did not consider the flow of money over time. In this work, we focus on the analysis of flows, a challenging task given the scale of the data (hundreds of millions of transactions).

We propose a method based on taint analysis to track Bitcoin money flow from initial starting points to the dissolution of the taint. The algorithm derives the dynamic subgraphs passing through known entities in the transaction network. We study the pattern of money flowing from different starting points: we taint coins minted by different mining pools in a one-day period between 2013 and 2016, and use graph embeddings from three representations of the data: (1) the static network, (2) the dynamic network, and (3) the money flow pattern tree. Both qualitative and quantitative analyses show that mining pools have different diffusion patterns and that those patterns evolve over time. Based on this initial result, we are developing a method to select critical entities and expanding our unsupervised approach to characterize other money flow patterns, in particular those related to illegal and cybercrime activities.
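
A toy sketch of one common taint policy ("poison" tainting, assumed here; the abstract does not spell out the exact rules): starting from a minting transaction, follow spending transactions breadth-first until the taint dissolves or a hop budget is exhausted:

```python
from collections import deque

# Hypothetical ledger: tx -> transactions that spend its outputs
SPENDS = {"coinbase1": ["tx_a"], "tx_a": ["tx_b", "tx_c"],
          "tx_b": [], "tx_c": ["tx_d"], "tx_d": []}

def taint_flow(start_tx, spends, max_hops=10):
    """Breadth-first traversal of the tainted subgraph from start_tx."""
    seen, frontier = {start_tx}, deque([(start_tx, 0)])
    while frontier:
        tx, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for child in spends.get(tx, []):
            if child not in seen:
                seen.add(child)
                frontier.append((child, hops + 1))
    return seen

print(taint_flow("coinbase1", SPENDS))  # tainted subgraph of the toy ledger
```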

15:55
Topological Similarity of Prefectures in Japan Based on Firm-level Supply Chain Analysis

ABSTRACT. One of the most important networks in the economy is the supply-chain network, the directed network formed by firms and their trade relationships. It has recently been shown that the shape of the firm-level Japanese supply-chain network differs from that of the World Wide Web, which is well known to be a bow-tie. However, the relation between this topological feature and the domestic economy remains unclear. In addition, although the domestic supply chains of Japanese firms have started to diversify, firms still agglomerate substantially in urban areas such as Tokyo. In this work, we investigate the interconnectivity and similarity between prefectures in Japan in terms of the firm-level supply-chain network. We use data collected by Tokyo Shoko Research Inc. in 2018, which includes one million firms and several million supplier-customer links. We construct a multilayer network having 47 layers defined by the prefectures of Japan and analyze the topological features of the supply-chain network for each layer. Bow-tie decomposition of each layer enables us to visualize the economic circulation at the prefecture level. We discuss the effect of the COVID-19 pandemic on the economy in terms of the similarity between prefecture layers.
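
A minimal bow-tie decomposition sketch (illustrative, not the authors' pipeline): the largest strongly connected component is the core, IN holds the nodes reaching it, OUT the nodes reachable from it:

```python
import networkx as nx

def bow_tie(D):
    core = max(nx.strongly_connected_components(D), key=len)
    seed = next(iter(core))
    out_part = (nx.descendants(D, seed) | core) - core  # reachable from core
    in_part = (nx.ancestors(D, seed) | core) - core     # reaches the core
    rest = set(D) - core - in_part - out_part           # tendrils, tubes, etc.
    return core, in_part, out_part, rest

D = nx.DiGraph([(1, 2), (2, 3), (3, 2), (3, 4), (5, 5)])  # toy layer
print(bow_tie(D))  # core {2, 3}, IN {1}, OUT {4}, rest {5}
```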

14:55-16:10 Session 16E: ML for Networks II
14:55
Nonlinear machine learning pattern recognition and bacteria-metabolite multilayer network analysis of perturbed gastric microbiome

ABSTRACT. The stomach is inhabited by diverse microbial communities co-existing in a dynamic balance. Long-term use of drugs such as proton pump inhibitors (PPIs), or bacterial infection such as Helicobacter pylori, cause significant microbial alterations. Yet studies revealing how the commensal bacteria re-organize due to these perturbations of the gastric environment are at an early stage and rely principally on linear techniques for multivariate analysis. Here we disclose the importance of complementing linear dimensionality reduction techniques with nonlinear ones to unveil hidden patterns that remain unseen by linear embedding. Then, we demonstrate the advantages of completing multivariate pattern analysis with differential network analysis, to reveal mechanisms of bacterial network re-organization which emerge from perturbations induced by a medical treatment (PPIs) or an infectious state (H. pylori). Finally, we show how to build bacteria-metabolite multilayer networks that can deepen our understanding of the metabolite pathways significantly associated with the perturbed microbial communities.

15:10
Graph Motif: Towards Understanding the Interaction Patterns of Samples in Intermediate Layers

ABSTRACT. Graphs, as an effective tool to reveal the underlying relationships between entities, play an important role in various deep learning and data analysis tasks. Inspired by this, we construct an edge-dynamic graph to explore the complex interaction patterns among samples in the intermediate layers of deep neural networks. We then introduce the triangle motif, a high-order local structure pattern, to discover the properties of the edge-dynamic graph, which provides insights into the dynamics of deep neural networks. Extensive experiments show that samples in the same category are more likely to form triangle motifs than samples in different categories as the layer goes deeper.

15:25
Ranking aggregation algorithm of group behavior for spammer detection

ABSTRACT. Online review networks have become a popular platform for information dissemination, attracting large numbers of spammers who seek to influence the reputation of online products by writing false reviews. Spammer detection is thus crucial for controlling fraudulent reviews, which is valuable for practical applications. Existing ranking-based methods usually focus on the rating values and textual features of reviews while ignoring features of group behavior, which results in a sharp degradation of detection performance under large-scale spammer group attacks. In this paper, we propose a novel ranking aggregation method based on features of collusive attacks by spammer groups to optimize the spammer ranking algorithm by reassigning a spamicity score to each reviewer. Extensive experiments on various real-world datasets with spammer injection demonstrate that our proposed optimization method can significantly improve the accuracy and robustness of spammer detection. On average, optimization improves ranking performance by 10.73%, 42.86% and 4.32% on the Recall, Precision and AUC metrics, respectively.

15:40
Graph regression for pressure peak prediction in fracturing processes

ABSTRACT. In the oil and gas industry, fracturing construction technology is commonly used to increase oil-gas production. A major concern in fracturing construction is whether the pressure at the wellhead will exceed a safety threshold when a large amount of sand-containing fluid is periodically injected into a well. To avoid accidents like blowouts caused by extreme wellhead pressure, it is critically important to monitor pressure changes in real time. In this work we propose to predict the pressure peaks during each fracturing period, for better designing future fracturing strategies. Towards this end, we present a novel non-parametric graph regression method, which models the correlation of historical pressure peaks and learns the features of fracturing cycles via Laplacian-smoothness-based graph learning, whereby the periodic fracturing signals, namely the fracking-fluid concentration sequences together with the peaks, are encoded in a latent Euclidean space. We then introduce non-parametric linear regression for peak prediction based on the nodes (i.e., peaks) most similar to the node of the period being predicted. Meanwhile, the current graph is updated using the predicted peak value. In the graph update, we introduce a node-forgetting mechanism to control the graph scale and to reduce the computational complexity, so as to achieve rapid prediction during construction operations. We conduct extensive experiments on real-world datasets from a well. The experimental results demonstrate that the proposed method can effectively predict the oil pressure peak and significantly outperform state-of-the-art models.

15:55
Using graph embedding deformation over time to detect disinformation on Twitter

ABSTRACT. Social networks, and in particular Twitter, allow collecting a set of digital traces in real time, such as mentions, comments or retweets. From these interactions, a graph can be constructed with the users as nodes and the predefined interactions as links. Social evolution such as polarization can be understood through the structural evolution of networks over time. To measure and understand these transformations, we have developed a geometry-based approach. Graphs are represented with node embeddings using spring layouts or Graph Neural Networks, and graph alignment is used to keep track of nodes at different time steps. Changes in the latent space between timesteps are represented as a speed vector field, and properties of the latent space can be identified and linked to social evolution.
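
A sketch of the alignment step, assuming orthogonal Procrustes as the graph-alignment method (the abstract does not name one): embeddings at $t+1$ are rotated onto the frame of time $t$, so per-node displacements become a comparable speed field:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
X_t = rng.normal(size=(100, 2))                # node embeddings at time t
drift = np.array([[0.0, -1.0], [1.0, 0.0]])    # spurious rotation of layout
X_t1 = X_t @ drift + 0.05 * rng.normal(size=(100, 2))

R, _ = orthogonal_procrustes(X_t1, X_t)        # best rotation onto frame t
speed = X_t1 @ R - X_t                         # per-node displacement field
print(np.linalg.norm(speed, axis=1).mean())    # small: nodes barely moved
```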

Communities are at the heart of the propagation of information and therefore of misinformation. They channel the propagation of information online. The appearance and disappearance of communities allow us to understand the games of influence between narratives in a debate. Focusing on the borders that lie between communities thus enables us to study the evolution of narratives over time.

Using a Twitter dataset of more than 300 million tweets on the French political landscape over six years and a dataset of more than 100 million tweets on the climate change debate over three years, a long-term reconstruction of latent spaces has been computed. It allowed us to understand the construction of opinions, such as climate skepticism or vaccine hesitancy, in a unique way.

These informational landscapes change organically with the natural evolution of the network, but also under the impact of exogenous shocks. Modeling the stability of online communities allows us to understand more fundamental issues such as the emergence of social movements or the polarization of a political landscape, but also to detect sources of these evolutions, such as disinformation. A few cases of localized disinformation campaigns have been found using this method.

14:55-16:10 Session 16F: Temporal Networks II
14:55
Submodularity of influence maximization on temporal networks

ABSTRACT. In this work, we analyze the properties of the influence function under the discrete Susceptible-Infected-Recovered (SIR) model on temporal networks, in the context of the influence maximization problem. We frame the problem for a spreading process following the rules of the SIR model, with a temporal scale equal to the one characterizing the evolution of the network topology. On static networks with the SIR model, the influence function has the submodularity property, guaranteeing an optimality gap for the greedy algorithm used to find influential spreaders. On temporal networks the influence function does not have the submodularity property, so there is no performance guarantee for the greedy algorithm. We first analyze why the submodularity property is violated on temporal networks. We observe that the main reason for violations is recovered nodes blocking paths of further infection. When a new node is added to the set of initially infected nodes, it might be infected or infect other nodes prematurely, which might decrease the final outbreak size rather than increase it as intended. The premature infection and recovery happen because of temporality: the neighbors of a node change, new paths for infection emerge, and these cannot be used if the node has already been infected and recovered. Further, we analyze how commonly the submodularity condition is violated on temporal random and real-world networks. We observe that the frequencies of violating the submodularity condition and of observing a decreasing outbreak size when adding a node to a set of spreaders follow the same behavior. We carry out our analysis separately for nodes selected randomly and nodes selected using the greedy algorithm. We observe that violations appear more frequently for randomly selected nodes on random temporal networks. The frequency is also higher when the recovery probability is higher. Using the nodes selected by the greedy algorithm, the frequency of violations in random temporal networks drops to zero for small sets of nodes, and takes very low values for larger sets. When lower recovery probabilities are used, violations are negligible. In real-world temporal networks, even though we observe violations for randomly selected nodes, when the nodes are selected by the greedy algorithm the violations are non-existent. These results suggest that in practice the nodes selected by the greedy algorithm on real-world networks do not violate the submodularity condition. In order to further analyze the accuracy of the greedy algorithm, we compare its solutions to the optimal solutions found with brute-force search. The results suggest that the solutions of the greedy algorithm are on average around 98% of the optimal solutions. This shows that even though there is no theoretical guarantee for the greedy algorithm, in practice it finds close-to-optimal solutions.
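
For reference, the submodularity condition at issue is $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$ for all $S \subseteq T$ and $v \notin T$. A brute-force checker over a toy set function (our illustration, not the paper's code) makes the violation counting concrete:

```python
import itertools

def violations(f, universe):
    """Enumerate (S, T, v) triples violating the submodularity condition."""
    out, nodes = [], list(universe)
    for r in range(len(nodes) + 1):
        for T in itertools.combinations(nodes, r):
            for s in range(len(T) + 1):
                for S in itertools.combinations(T, s):
                    for v in set(nodes) - set(T):
                        gain_S = f(set(S) | {v}) - f(set(S))
                        gain_T = f(set(T) | {v}) - f(set(T))
                        if gain_S < gain_T:
                            out.append((set(S), set(T), v))
    return out

f = lambda A: len(A) ** 2            # deliberately non-submodular toy function
print(len(violations(f, range(3))))  # > 0: the condition is violated
```

In practice $f$ would be a Monte Carlo estimate of the expected outbreak size of a temporal SIR simulation seeded at the given set.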

15:10
Link weight variation offers superiority in controlling temporal networks

ABSTRACT. The control of temporal networks is of paramount importance to complex systems in diverse fields. Recent studies showed that temporal networks are more controllable than their static counterparts in terms of control time, cost and trajectory length. However, the underlying mechanism of this intriguing phenomenon remains elusive, partly because multiple properties of a temporal network change simultaneously over time. Here we explore a general model of temporal networks, and prove through rigorous analysis that weight variation of a link is equivalent to attaching a virtual driver node to that link. Consequently, the variation of link weights can significantly increase the dimension of the controllable space and remarkably reduce control cost, which unveils the fundamental mechanism behind the advantages of temporal networks in controllability. This mechanism leads to a graphic criterion that allows us to further discover that degree-heterogeneous networks are more amenable to enhancing controllability by link weight variation, and that the favourable positions for weight variation are the incoming links of nodes with high out-degree and low in-degree. Our results are validated on both synthetic and empirical network data, together deepening the understanding of network temporality and shedding new light on the long-standing problem of establishing graphic criteria for the controllability of general dynamic systems.

15:25
Topological-temporal properties of evolving networks

ABSTRACT. Many real-world complex systems, including human interactions, can be represented by temporal (or evolving) networks, where links activate or deactivate over time. Characterizing temporal networks is crucial to compare such systems and to study the dynamical processes unfolding on them. A systematic method that can simultaneously characterize the temporal and topological relations of active links (also called contacts or events), in order to compare different real-world networks and to detect their common patterns or differences, is still missing. In this paper, we propose a method to characterize to what extent contacts that happen close in time also occur close in topology. Specifically, we study the interrelation between temporal and topological properties of contacts from three perspectives: (1) the autocorrelation of the activity time series, which records the total number of contacts in the network at each time step; (2) the interplay between the topological distance and the interevent time of two contacts; (3) the temporal correlation of contacts within local neighborhoods beyond a node pair. By applying our method to 13 real-world temporal networks, we find that the temporal-topological correlation of contacts is more evident in virtual contact networks than in physical contact networks. This could be due to the lower cost and easier access of online communications compared with physical interactions, allowing and possibly facilitating social contagion, i.e., interactions of one individual may influence the activity of its neighbors. We also identify different patterns between virtual and physical networks, and among physical contact networks at, e.g., school and workplace, in the formation of correlation in local neighborhoods. Patterns and differences detected via our method may further inspire the development of more realistic temporal network models that jointly reproduce the temporal and topological properties of contacts.
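
A sketch of perspective (1), the activity-series autocorrelation, on a toy contact list:

```python
import numpy as np

contacts = [(1, 2, 0), (2, 3, 0), (1, 2, 1), (3, 4, 3), (1, 4, 3), (2, 4, 3)]
T = max(t for *_, t in contacts) + 1
activity = np.bincount([t for *_, t in contacts], minlength=T).astype(float)

def autocorr(x, lag):
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x)) if lag else 1.0

print([round(autocorr(activity, k), 3) for k in range(1, 3)])
```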

15:40
Criticality-driven graph summarization time series of a long-haul internet backbone network

ABSTRACT. The realisation of a fast and resilient communication network is a must in society today, with a large amount of infrastructure and large numbers of businesses and end-users connected to the internet. This work proposes a definition and evaluation of internet infrastructure criticality at the long-haul, physical level of the backbone network, from a time-dependent perspective. In particular, this proposal investigates different network key performance indicators (KPIs) for assessing network efficiency at local and global levels through a graph summarization of the network.

The novelty of this proposal lies in the analysis of KPIs evolving over time (Fig. 1). Real observations of data traffic and multiple runs of network simulations inform these KPIs under a range of plausible scenarios. To facilitate the temporal network analysis, the overall algorithmic pipeline is as follows: (1) computing KPIs and (weighted) centrality measures with real and simulated data; (2) assessing criticality at each network node; (3) node-grouping, as a graph-reduction process; (4) network embedding, both for local subgraphs and for the whole network; (5) time series mining in the multiple network embedding spaces; (6) decision-making.

The main advantage of this proposal is a better understanding of the time evolution of network criticality, for optimal decision-making support in maintenance operations and management. In addition, graph summarization (node-grouping and network embedding) provides an efficient temporal network representation, on which Euclidean matrix-valued time-series methods can then be applied. The work presents a case study of the long-haul backbone network of one of the biggest internet providers in the UK. The graph summarization process shows its ability to generate knowledge for decision-making support to improve network performance and resilience for the next generation of a national digital infrastructure.

15:55
Mitigate SIR Epidemic Spreading via Contact Blocking in Temporal Networks

ABSTRACT. Progress has been made on how to suppress epidemic spreading on temporal networks by blocking all contacts of targeted nodes or node pairs. In this work, we develop contact blocking strategies that remove a fraction of contacts from a temporal (time-evolving) human contact network to mitigate the spread of a Susceptible-Infected-Recovered (SIR) epidemic. We define the probability that a contact $c(i, j, t)$ is removed as a function of a given centrality metric of the corresponding link $l(i,j)$ in the aggregated network and the time $t$ of the contact. The aggregated network captures the number of contacts between each node pair. A set of 12 link centrality metrics is proposed, and each centrality metric leads to a unique contact removal strategy. These strategies, together with a baseline strategy (random removal), are evaluated on empirical contact networks via the average prevalence, the peak prevalence and the time to reach the peak prevalence. We find that epidemic spreading is mitigated best when contacts between node pairs that have few contacts, and early contacts, are the most likely to be removed. A strategy tends to perform better when the average number of contacts removed from each node pair varies less. The aggregated pruned network resulting from the best contact removal strategy tends to have a large largest eigenvalue, a large modularity and probably a small largest connected component size.
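
A hedged sketch of one such removal strategy; the scoring below favours node pairs with few contacts and early contact times, in line with the finding above, but its exact functional form is our assumption:

```python
import random
from collections import Counter

contacts = [(1, 2, 0), (1, 2, 3), (2, 3, 1), (3, 4, 5), (1, 4, 2)]
pair = lambda i, j: (min(i, j), max(i, j))
n_contacts = Counter(pair(i, j) for i, j, _ in contacts)  # aggregated weights
t_max = max(t for *_, t in contacts)

# high score: few contacts on the link, early contact time
score = lambda i, j, t: (1.0 / n_contacts[pair(i, j)]) * (1.0 - t / (t_max + 1))
z = max(score(*c) for c in contacts)

budget = 0.4  # scales the removal probability of the top-scored contact
random.seed(0)
pruned = [c for c in contacts if random.random() > budget * score(*c) / z]
print(pruned)
```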

16:10-17:10 Session 17: Poster VI
Weighted Graph Convolutional Networks for Twitter users’ geolocation

ABSTRACT. Predicting the geographical location of users of social media like Twitter has several applications in health surveillance, emergency monitoring, content personalization, and social studies in general. Recent works have thus explored the usage of deep learning techniques such as transformers and embeddings for the user geolocation prediction task. In this work we process a large collection of 900M tweets collected in Argentina in 2019, from which we prepare and make available a labelled dataset composed of 140k geolocated users, 9M geolocated tweets and 124M tweets in total. The dataset is available for hydration at [1]. We contribute to the research in this area by designing and evaluating new methods based on weighted multigraphs combined with state-of-the-art deep learning techniques. The structure of these graphs is the combination, in different layers, of “extended” mention and follower networks, which we define in a special way to take into account the connections (mentioning or following) that users have through paths that go across external users. The features associated with each user come from a logistic regressor that combines embeddings of the user's tweet content and the usage of local indicative words (LIW). We train the graphs with different information processing strategies, e.g., information diffusion through transductive and inductive algorithms (RGCNs and GraphSAGE, respectively) and node embeddings with Node2vec+ (see example in Figure 1). We assess the performance of each method in terms of AUC and execution time, comparing them to baseline models both on the public Twitter-US dataset and on our dataset from Argentina.

[1] Twitter location data for datasets Twitter-ARG-Exact and Twitter-ARG-Bbox, https://github.com/fedefunes96/twitter-location-data.

The von Neumann entropy for the Pearson correlation matrix: A test of the entropic brain hypothesis for psychedelics

ABSTRACT. The entropic brain hypothesis states that key functional parameters should exhibit increased entropy during psychedelic-induced altered brain states. This hypothesis has gained significant support over the years, particularly via thresholding Pearson correlation matrices of functional connectivity networks. However, the thresholding procedure is known to have drawbacks, mainly the arbitrariness of the threshold value selection. In this work, we propose an entirely objective, threshold-independent method of entropy estimation. Let R be a generic N × N Pearson correlation matrix. We define ρ = R/N and prove that ρ satisfies the necessary conditions for a density operator. Therefore, the von Neumann entropy S = −tr(ρ log ρ) can be calculated directly from the Pearson matrix. To demonstrate the generality and power of the method, we calculate the entropy of functional correlations of the brains of volunteers given the psychedelic beverage ayahuasca. We find that the entropy increases in the ayahuasca-induced altered brain state, as predicted by the entropic brain hypothesis for psychedelic action.
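
The construction lends itself to a direct numerical check; a minimal sketch with synthetic data in place of fMRI correlations:

```python
import numpy as np

def von_neumann_entropy(R):
    rho = R / R.shape[0]              # unit trace, since diag(R) = 1
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]      # convention: 0 log 0 = 0
    return float(-np.sum(evals * np.log(evals)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # 200 samples of N = 10 signals
R = np.corrcoef(X, rowvar=False)
print(von_neumann_entropy(R))         # near log(10) for ~uncorrelated data
```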

Timescale determines the entropic importance of edges in complex networks

ABSTRACT. Measuring the importance of edges (and subsequently nodes) is a central task in network science, yet this pursuit has not been sufficiently explored from the perspective of information theory. To this end, we utilize the framework of von Neumann entropy (VNE) for networks and quantify the importance of edges by studying how their removal changes the network's VNE. As a practical consideration, these VNE-based rankings are too computationally expensive to apply directly to very large networks, and we therefore introduce approximate rankings that utilize spectral perturbation theory to efficiently approximate how edge removals affect VNE. We focus on a formulation of VNE that is based on the eigenspectrum of a Laplacian matrix, which allows us to interpret VNE (and the rankings obtained therefrom) through the perspective of diffusion dynamics. We study VNE-based rankings for synthetic and empirical networks representing the U.S. Senate, the London metro system, and the human brain, exploring how the rankings change as we vary a timescale parameter β > 0.
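
A short sketch of a Laplacian-based VNE with its timescale parameter, assuming the common density-matrix form ρ(β) = e^{−βL} / tr(e^{−βL}) (consistent with the diffusion interpretation above, though the paper's exact formulation may differ), scoring one edge by the entropy change its removal causes:

```python
import numpy as np
import networkx as nx
from scipy.linalg import expm

def vne(G, beta):
    L = nx.laplacian_matrix(G).toarray().astype(float)
    rho = expm(-beta * L)
    rho /= np.trace(rho)
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

G = nx.karate_club_graph()
u, v = next(iter(G.edges))
H = G.copy()
H.remove_edge(u, v)
print(vne(H, beta=1.0) - vne(G, beta=1.0))  # entropic importance of (u, v)
```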

Communities in gene co-expression networks across different organs

ABSTRACT. Communities, or modules, in gene co-expression networks reveal sets of genes that are similarly expressed across individuals and are therefore potentially involved in related biological processes. An increasing amount and variety of gene expression data is now being obtained across various organs. Gene expression and co-expression both depend, in general, on the organ. A challenge in deciphering such data is how to integrate and distinguish between communities found in different organs and determine their biological relevance.

Here, we construct a multilayer gene co-expression network with two layers corresponding to two different organs in the digestive system: the pancreas and the stomach. First, for each organ we construct a gene co-expression network from tissue-specific transcripts-per-million (TPM) data obtained from the GTEx portal and apply the graphical lasso algorithm to the Pearson correlation matrix of gene co-expression. Then, we use the multilayer Infomap algorithm to detect communities within the multilayer network. The results shown in the left panel of Figure 1 indicate that there are some generalist communities that contain mostly genes present in both organs, and some communities that are specific to just one organ. We perform gene set enrichment analysis on each community of genes using g:Profiler. The community with the most overlapping genes between the two organs (i.e., module 1 in Figure 1) has many significant biological terms, such as structural constituent of ribosome, which are considered to represent basic biological functions and are expressed in most organs of the human body. In contrast, communities with no genes shared with the other organ have unique significant biological terms. For example, module 15 has significant terms such as metal ion binding, and is specific to the pancreas. Module 10 has significant terms such as oxidative phosphorylation and respiratory chain complex, and is specific to the stomach. A few example significant biological terms and their corresponding p-values are listed in the table in the right panel of Figure 1. We plan to develop and apply methods to quantify the significance of each community detected in a multilayer network, as well as to simultaneously analyze more organs.
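
A minimal sketch of the per-organ step, with toy data standing in for the GTEx TPM matrices:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(0)
expr = rng.normal(size=(300, 8))          # 300 samples x 8 genes (toy)
R = np.corrcoef(expr, rowvar=False)       # Pearson correlation matrix

_, P = graphical_lasso(R, alpha=0.2)      # sparse precision matrix
edges = [(i, j) for i in range(P.shape[0]) for j in range(i + 1, P.shape[0])
         if abs(P[i, j]) > 1e-8]
print(edges)  # one layer of the multilayer co-expression network
```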

Talents Network: How Sources Define the Sinks?

ABSTRACT. Education can be considered one of the crucial factors contributing to economic growth and social progress. This proposal presents a data-driven approach to investigate how college education functions as a mechanism for workforce development. While labor economics has evolved to study workers' skills to explain labor trends, a similar lens has yet to be applied to workforce development at scale. To do so, we employ several novel large datasets which offer the best opportunity for connecting labor dynamics and higher education. The problem area is of great significance, as higher education stakeholders are seeking new guidelines to adapt to the future of work. Very few studies use these datasets, which provides a unique opportunity to gain new insights into the problem.

On the limits of contact tracing: a cascade epidemic process with quarantine

ABSTRACT. We obtain analytical and simulation results for the effect of contact tracing on an epidemic cascade model. We study generation-by-generation, discrete-time and continuous-time models in a mean-field approach. We show that contact tracing is quite limited in reducing the natural $R_0$ of an epidemic, especially if the health system runs way behind the epidemic front. We compare the results with synthetic data and with data from Cuba's first COVID-19 wave.

Detailed Wage Gap Decompositions: Controlling for Unobserved Worker Heterogeneity Using Network Theory

ABSTRACT. Recent advances in the literature on decomposition methods in economics have allowed for the identification and estimation of detailed wage gap decompositions. Differences in wages are decomposed into a component explained by differences in skills and a residual component that may reflect factors such as discrimination. In the context of such detailed decompositions, building reliable counterfactuals requires using tighter controls to ensure that similar workers are correctly identified, by making sure that important unobserved variables such as skills are controlled for, as well as by comparing only workers with similar observable characteristics. This paper contributes to the wage decomposition literature in two main ways: (i) developing an economically principled, network-based approach to control for unobserved worker skill and job task heterogeneity; and (ii) extending existing generic decomposition tools to accommodate a potential lack of overlapping support in covariates between the groups being compared, which is likely to be the norm in more detailed decompositions. We illustrate the methodology by decomposing the gender wage gap in Brazil. We find that better controlling for unobserved worker and job heterogeneity reduces the portion of the gender wage gap that cannot be explained by covariates and thus plausibly reflects discrimination. However, even with detailed controls, male workers still outearn female workers by 14%.

Are We Fascinated by Eccentric Ideas?

ABSTRACT. Analyzing extreme behavior in our routine choices and discussions, which we label eccentric behavior, can be of utmost importance for understanding the increasing polarization in society. In our work, we compare the popularity of ideas against their eccentricity to understand individuals' fascination with eccentricity. We have collected and analyzed data from two sources of completely different natures. First, we collected data from online experiments at a mid-size US university, where students from different study domains were recruited to participate in the experiment. Second, we collected posts and connection information for approximately three thousand users of GAB, a far-right-oriented social media site, from August 2016 to January 2021, gathering a total of 147,000 gabs (posts).

Each idea in both datasets was converted into a numerical vector using Doc2Vec and PCA. For each user, we reconstructed their “knowledge base” at every time point, which is the set of recent idea-vectors posted by the user or by their neighbors. Using this knowledge base as a frame of reference, we computed the eccentricity of each new idea, defined as the distance of the new idea-vector from the “center of gravity” of the user's knowledge base. This captures how far off-center the new idea was relative to the general discussion going on around the user. In the following steps, ideas were categorized into different popularity levels based on the number of likes. For each popularity level, the probability distribution of eccentricity was constructed using the kernel density estimation method. Results are summarized in Figure 1. In the plots for high popularity levels, the tail of the probability density function becomes broader in all three data sources. The average eccentricity also increases as the popularity level goes up. The average eccentricities and eccentricity distributions for different popularity levels are significantly different from each other. We can conclude that eccentric ideas attract more attention from the audience; in other words, individuals are more captivated by eccentric opinions.
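
A compact sketch of the eccentricity computation, with small vectors standing in for the Doc2Vec+PCA embeddings:

```python
import numpy as np

def eccentricity(new_idea, knowledge_base):
    """Distance of a new idea-vector from the knowledge base's centroid."""
    centroid = np.mean(knowledge_base, axis=0)
    return float(np.linalg.norm(np.asarray(new_idea) - centroid))

kb = np.array([[0.1, 0.2], [0.0, 0.1], [0.2, 0.0]])  # recent idea-vectors
print(eccentricity([0.1, 0.1], kb))   # near-central idea -> low eccentricity
print(eccentricity([2.0, -1.5], kb))  # off-center idea -> high eccentricity
```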

Floquet Theory for Spreading Dynamics over Periodically Switching Networks

ABSTRACT. For many social, physical, and biological networks, the structure evolves over time, and structural patterns switch with daily, weekly and/or annual cycles. For example, bus schedules vary between weekdays and weekends, since most people have different schedules at these times. Thus motivated, we formulate and analyze metapopulation susceptible-infected-susceptible (SIS) epidemic models over temporal networks having an adjacency matrix ${\bf A}(t)$ that periodically switches. Letting $x_i(t)$ be the fraction of infected individuals in metapopulation region $i\in\{1,\dots,N\}$, we study a time-dependent linearized ODE for the expected evolution of $\mathbf{x}(t)=[x_1(t),\dots,x_N(t)]^T$, which is given by $\frac{d}{dt}\mathbf{x}(t)={\bf M}(t)\mathbf{x}(t)$ with ${\bf M}(t)=\beta {\bf A}(t)-\mu {\bf I}$. Here, $\beta$ and $\mu$ are the infection and recovery rates, respectively. Using Floquet theory---a framework that extends the theory of linear systems to the setting of time-varying periodic systems---we characterize the epidemic threshold and growth/decay rates in terms of a Floquet multiplier of the system's monodromy matrix. We apply our theoretical techniques to explore curfew strategies that balance human mobility and the risk of infection. We also investigate Parrondo's paradox in this context, whereby we find that supercritical epidemics could mistakenly be predicted to be subcritical if one neglects that the system is inherently temporal and periodic.
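
A numerical sketch of the Floquet machinery for the simplest case of an adjacency matrix switching between two static snapshots within each period (all matrices and rates are toy values): the monodromy matrix is the product of the matrix exponentials of ${\bf M}(t)$ over one period, and its leading eigenvalue, the Floquet multiplier, decides growth or decay:

```python
import numpy as np
from scipy.linalg import expm

A1 = np.array([[0.0, 1.0], [1.0, 0.0]])  # "weekday" coupling (toy)
A2 = np.zeros((2, 2))                     # "weekend": no coupling
beta, mu, T1, T2 = 0.4, 0.3, 5.0, 2.0     # infection/recovery rates, durations

M = lambda A: beta * A - mu * np.eye(2)   # M(t) = beta*A(t) - mu*I
monodromy = expm(M(A2) * T2) @ expm(M(A1) * T1)
multiplier = max(abs(np.linalg.eigvals(monodromy)))
print("supercritical" if multiplier > 1 else "subcritical", multiplier)
```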

Category Integration and Evaluation in Science

ABSTRACT. Categories help audiences navigate markets and interpret the offerings before them. How does this process work and what are its consequences? The dominant view casts offerings—from financial securities to scientific papers—as sets of concepts, recombined from categories whose reception depends on their perceived ambiguity, novelty or surprisal. But this deconstructed view seems at odds with how we actually experience the world. Humans receive information not in unstructured sets but in sequences that traverse the network of human knowledge, moving within and between market categories, reinforcing or violating audience expectations with each step. A scientific paper is not just a combination of concepts from various disciplines, but a complex pattern of disciplinary integration—yet we know little about how these patterns affect evaluation in science or elsewhere. This paper advances a theory of category integration, drawing insights from human graph learning to extend our understanding of categorization and evaluation. It departs from existing literature by considering how the organization of categorical information inside an offering affects inferences about its quality and applications. I claim that integration functions as a moderator that amplifies or attenuates the perceived ambiguity of market offerings. Offerings that densely connect concepts from different categories are likely to be seen as more ambiguous, broader in application but lower in quality than category-spanning offerings that reinforce category boundaries through their structure. To test this claim, I study how scientists evaluate interdisciplinary research using a large full-text corpus of scientific journal articles. I construct a representation for each article by first parsing the sequence of cited references from its text. I encode the relationships between neighboring instances of references as a first-order Markov chain and then, having labeled each reference with its disciplinary provenance, I project these relationships onto those disciplines. I describe the path a paper takes through the discipline space in the language of information theory by estimating the surprisal of each step, given audience expectations. Weighted surprisal can be manipulated to measure four quantities of interest: category spanning, category integration, the aggregate ambiguity (or interdisciplinarity) of an offering and the weighted KL divergence between different models of the stimulus that audiences see. Applying these measures across my corpus reveals that scientific papers vary widely in the extent to which they recombine and integrate knowledge from multiple disciplines. These dimensions are non-redundant: the correlation between category spanning and integration is weak, while the KL divergence between the combinatorial view and my structural view of market offerings is high. The interaction effects of category spanning and integration on citations, a proxy for positive evaluations, are consistent with the claim that integration functions as a moderator. When category spanning and integration are combined into a single measure—aggregate ambiguity—it has an inverted-U relationship with citations that is conserved across nearly all research areas. Contrary to prevailing wisdom, my results suggest that authors need to be disciplined in the communication of multidisciplinary content in order to maximize their appeal.
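
A toy sketch of the surprisal scoring described above, with made-up discipline labels and corpus-level transition frequencies standing in for the audience model:

```python
import math
from collections import Counter

corpus_steps = [("BIO", "BIO"), ("BIO", "BIO"), ("BIO", "CS"),
                ("CS", "CS"), ("CS", "BIO"), ("CS", "CS")]
pair_counts = Counter(corpus_steps)
from_counts = Counter(a for a, _ in corpus_steps)
p = lambda a, b: pair_counts[(a, b)] / from_counts[a]   # first-order Markov

paper = ["BIO", "BIO", "CS", "CS", "BIO"]               # a paper's path
surprisal = [-math.log2(p(a, b)) for a, b in zip(paper, paper[1:])]
print([round(s, 3) for s in surprisal], "total:", round(sum(surprisal), 3))
```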

The Philippine Cyber-political Divide: Online Polarization Amid The 2022 Philippine Elections

ABSTRACT. In this article, we study how Twitter users (>100,000) interact with topics (~100) centered around the upcoming 2022 Philippine Presidential Elections. Here, “interaction” looks beyond the usage of hashtags, as it also covers how users quote or respond to tweets containing the hashtags of interest. Specifically, we want to uncover levels of political polarization between supporters of the two leading presidential candidates, Ferdinand Marcos, Jr. and Leni Robredo, noting that political polarization has been found to affect the susceptibility of groups to fake news. We analyze the results of rule-based annotation to distinguish the behavior of supporters of the two groups with regard to negativity, content, and interactions with one another. Groups that tweet primarily about Leni Robredo almost exclusively quote one another’s tweets, acting like an echo chamber. Those tweeting largely about Ferdinand Marcos, Jr. tend to interact with a mix of those tweeting about Leni Robredo and others that tweet about Marcos, Jr.

Upon comparing with the results of community detection, accounts tweeting about Leni Robredo that fall into communities primarily made up of accounts that talk largely about Marcos, Jr. speak negatively of Leni Robredo. We find at least two classifications of Twitter users that support Marcos, Jr.: those that primarily tweet negatively about Leni Robredo and those that tweet positively about Marcos, Jr. Comparing with the community of largely Leni Robredo accounts, we find no such distinction. These phenomena highlight the strategies citizens adopt when campaigning online.

Sentiment and structure in word co-occurrence networks on Twitter

ABSTRACT. We explore the relationship between context and happiness scores in political tweets using word co-occurrence networks, where nodes in the network are the words and the weight of an edge is the number of tweets in the corpus in which the two connected words co-occur. In particular, we consider tweets with the hashtags #imwithher and #crookedhillary, both relating to Hillary Clinton's presidential bid in 2016. We use these unique original tweets (no retweets or quote tweets) to form three corpora: the ``favor'' corpus, composed of tweets with #imwithher; the ``against'' corpus, for tweets with #crookedhillary; and the ``all'' corpus, which combines the previous two. The tweets were sampled from the periods when the respective hashtags reached their peak usage in 2016. Neutral words are found to be dominant, and most words, regardless of polarity, tend to co-occur with neutral words. We also do not observe any score homophily among positive and negative words. Given that several words in the corpus do not occur frequently, we extracted a more robust network structure using network backboning. We implemented it in two passes: first by removing frequently occurring words on Twitter, and second by using the disparity filter to remove edges between words that co-occur less frequently. Running the Louvain algorithm on the resulting backbone yields well-defined communities that correspond to themes in favor of or against the target (Hillary Clinton). Thus, although we observe no clear relationship between happiness scores and co-occurrence at the node or edge level, a community-centric approach can isolate themes of competing sentiments in a corpus.
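
A compact sketch of the disparity filter used in the second backboning pass (the weights and significance level alpha are toy values):

```python
import networkx as nx

def disparity_backbone(G, alpha=0.05):
    """Keep edges significant under the disparity filter's null model."""
    B = nx.Graph()
    for i in G:
        k = G.degree(i)
        if k < 2:
            continue  # the filter is undefined for degree-1 nodes
        s = sum(d["weight"] for _, _, d in G.edges(i, data=True))
        for _, j, d in G.edges(i, data=True):
            p = d["weight"] / s
            if (1 - p) ** (k - 1) < alpha:   # significant from i's side
                B.add_edge(i, j, weight=d["weight"])
    return B

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 50), ("a", "c", 1), ("a", "d", 1),
                           ("b", "c", 2)])
print(disparity_backbone(G, alpha=0.3).edges(data=True))  # keeps only (a, b)
```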

Integrated Twitter Analysis to Distinguish Systems Thinkers at Various Levels

ABSTRACT. Although the application of Systems Thinking (ST) has become essential for practitioners and experts when dealing with turbulent and complex environments, few studies in the extant literature investigate how experts' systems thinking skills can be revealed through Twitter analysis. To address this gap, this study uses a social network analysis approach to explore the relationship between experts' different levels of systems thinking skills, Twitter clusters, and the followers' network. COVID-19 emerged as a relevant case study for investigating the relationship between COVID-19 experts' Twitter networks and their systems thinking capabilities. A sample of 55 trusted expert Twitter accounts related to COVID-19 was selected for the current study based on lists from Forbes, Fortune, and Bustle. These experts' Twitter network was constructed from features extracted from their Twitter accounts, including organic tweet metrics, other tweet measures, sentiment analysis, the source of tweets, their textual tweets, and other extracted features. After clustering the network based on tweets and Twitter features, we found that three distinct groups of experts emerged. To validate the result, the followers' network of the experts was constructed. Then, the systems thinking dimensions were mapped to followers' network characteristics, including betweenness centrality, closeness centrality, degree centrality, eigenvector centrality, and node-level metrics. Comparing the followers' network characteristics of the 55 experts, this study found that the three identified clusters had meaningful differences in centrality scores and node-level metrics. The clusters with higher, medium, and lower scores can be classified as the Twitter accounts of holistic thinkers, middle thinkers, and reductionist thinkers, respectively. In conclusion, this research showed that the capabilities of individuals as systems thinkers reveal unique network patterns and distinct communities associated with the level of systems thinking of COVID-19 experts.

Intermunicipal Travel Networks of Mexico

ABSTRACT. In this work, we present networks that describe the travel patterns between municipalities in Mexico. Using anonymized mobile device geo-location data, we construct directed, weighted networks that capture the flow of people between municipalities on a daily basis; the weights in these networks represent a (normalized) measure of the volume of travel between municipalities. We have generated a dataset of 731 networks of daily intermunicipal travel between 2020-01-01 and 2021-01-12.
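A minimal sketch of how one such daily network could be assembled from origin-destination trip counts (the data frame, column names, and day-total normalization below are assumptions for illustration; pandas and networkx are assumed):

```python
# Minimal sketch: one daily directed, weighted travel network built from
# origin-destination trip counts (all values invented for illustration).
import pandas as pd
import networkx as nx

trips = pd.DataFrame({
    "date":   ["2020-01-01"] * 4,
    "origin": ["CDMX", "CDMX", "Puebla", "Toluca"],
    "dest":   ["Puebla", "Toluca", "CDMX", "CDMX"],
    "trips":  [120, 80, 95, 60],
})

day = trips[trips["date"] == "2020-01-01"]
total = day["trips"].sum()

G = nx.DiGraph(date="2020-01-01")
for _, row in day.iterrows():
    # Normalize by the day's total volume so weights are comparable
    # across days (one plausible reading of the normalized measure).
    G.add_edge(row["origin"], row["dest"], weight=row["trips"] / total)

print(list(G.edges(data=True)))
```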

We show that the weight distribution of these networks is heavy-tailed, with a small number of routes being the most heavily travelled. We also show that the sum of weights in the network (a measure of the overall volume of intermunicipal travel within the country) fluctuates over the analysed period, with downward trends associated with phenomena such as the national Covid-19 lockdown and later Covid-19 peaks, and upward trends associated with vacation periods.
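A standard way to inspect such heavy tails, sketched here on synthetic weights (the Pareto sample stands in for real edge weights purely for illustration), is to plot the empirical complementary CDF on log-log axes:

```python
# Minimal sketch: check for a heavy-tailed edge-weight distribution by
# plotting the empirical complementary CDF on log-log axes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
weights = rng.pareto(1.5, size=5000) + 1   # synthetic stand-in for edge weights

w = np.sort(weights)
ccdf = 1.0 - np.arange(len(w)) / len(w)    # P(W >= w); avoids zeros on the log axis

plt.loglog(w, ccdf, ".", markersize=2)
plt.xlabel("edge weight")
plt.ylabel("P(W >= w)")
plt.show()   # an approximately straight line suggests a heavy tail
```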

We describe changes in centrality measures (degree, strength, betweenness centrality) over the time period. We observe that municipalities with larger populations exhibit higher, less time-variant centrality values than municipalities with smaller populations. We show that changes in centrality measures also correspond weakly to Covid-19 phenomena.
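A minimal sketch of tracking these measures across the sequence of daily networks (the two toy daily graphs are invented; betweenness is computed on hop counts here, since travel volumes are similarities rather than distances):

```python
# Minimal sketch: per-day degree, strength, and betweenness for each
# municipality across a sequence of daily networks (toy data).
import networkx as nx

def make_day(date, edges):
    G = nx.DiGraph(date=date)
    G.add_weighted_edges_from(edges)
    return G

daily_networks = [
    make_day("2020-03-01", [("CDMX", "Toluca", 0.5), ("Toluca", "CDMX", 0.3),
                            ("CDMX", "Puebla", 0.2)]),
    make_day("2020-04-01", [("CDMX", "Toluca", 0.7), ("Toluca", "CDMX", 0.3)]),
]

series = {}
for G in daily_networks:
    degree = dict(G.out_degree())
    strength = dict(G.out_degree(weight="weight"))   # weighted out-degree
    betweenness = nx.betweenness_centrality(G)       # unweighted shortest paths
    series[G.graph["date"]] = {
        n: {"deg": degree[n], "str": strength[n], "btw": round(betweenness[n], 3)}
        for n in G
    }

for date, row in series.items():
    print(date, row)
```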

Finally, we explore the community structure of these networks (using label propagation) and its changes over time. We show that these communities are composed mostly of geographically adjacent municipalities. We show that communities break down into smaller modules during low-mobility periods such as lockdowns, and regain size when restrictions are lifted.
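A minimal sketch of the community step, run here with networkx's asynchronous label propagation on an undirected projection of a toy daily network (municipality names and weights are invented):

```python
# Minimal sketch: label propagation communities on a toy daily network.
import networkx as nx

G = nx.DiGraph(date="2020-03-01")
G.add_weighted_edges_from([
    ("CDMX", "Ecatepec", 0.40), ("Ecatepec", "CDMX", 0.35),
    ("CDMX", "Toluca", 0.10), ("Toluca", "CDMX", 0.05),
    ("Monterrey", "Guadalupe", 0.06), ("Guadalupe", "Monterrey", 0.04),
])

# networkx's asynchronous LPA expects an undirected graph; reciprocal
# directed edges collapse to a single undirected edge here.
U = G.to_undirected()
communities = list(nx.community.asyn_lpa_communities(U, weight="weight", seed=7))
print(communities)  # expect geographically coherent groups, e.g. the CDMX area
```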

We believe that the generated dataset may be of interest to network scientists working on social and epidemiological applications.

The Development of Network Theorizing

ABSTRACT. Without being rooted in network theory, the results generated by even sophisticated mathematical models or metrics that utilize network analytics risk limited explanatory capacity, and therefore lose their power to illuminate practical social implications. Revitalizing the central role of network theories is especially imperative given the computational revolution in the computational social science domain. Algorithm design is crucial for generating effective and beneficial results, whereas biased algorithms further entrench the evaluation systems that in turn influence the orientation, quality, and value of people's work (Scott, 1988; O'Neil, 2016). In order to fully unpack the potential of network analysis techniques enabled by the revolution in big data and technologies, this study reviews and reflects on the development of network theories, identifies holes where network theories are missing, and suggests ways in which network theories could be better integrated. Specifically, the study begins with a review of Simmel's ideas, which set the theoretical foundation for network thinking and formalize the fundamental significance of social networks. Subsequently, the study reviews classic network studies, mostly from the 1980s and 1990s, with the overall aim of identifying the prominent ways in which networks were studied and understood, particularly with respect to their significance and usefulness. After this general review, the study turns the spotlight on the theoretical importance of structural holes, one of the key domains within the network field, to further extend the explanations of why social networks matter and, more importantly, to synthesize the ways in which these theories are misused or overlooked in current network research.

(Post-)Modern Network Thinking. Georg Simmel was one of the earliest scholars to engage in a structural approach to exploring human interaction in modern urban society, which set the classic theoretical foundation for the emergence and evolution of network thinking. His ideas of association, individual and social relationship, social geometry, and connection and disconnection collectively provide social explanations for psychological experiences in modern society, which precondition individual agentic capacity and preferences (e.g., to build a bridge or a door), as well as social interaction patterns that result from the “spoke structure” of social web affiliations (Simmel, 1950, 1964, 1994; Simmel & Levine, 1961).

Contemporary Network Theories. By the end of the twentieth century, the exploding technical ability to do network analysis started to overshadow the explanatory power of network theories. Nevertheless, several influential theoretical arguments emerged within the last two decades. Scott (1988) provided a key summary of how social networks offer a powerful model of social structure and why it matters to always start with theory; this helps explain where theoretical problems arise in computational sciences that lack a grounding in social science, in which visualization has become the key element of big data analysis. Granovetter's (1983) theory of weak ties argued that although strong ties tend to be better for emotional support, such as feelings of closeness and stability, weak ties are better on occasions that are crucial for social mobility; for example, weak ties are more useful for finding a job. This beneficial return on ties is closely related to social capital. Lin (1999) proposed a network perspective on social capital, considering social capital as assets embedded in networks. The network approach provides opportunities to resolve controversies in the notion of social capital: for example, while many have focused on the network closure embedded in the theories of Bourdieu, Coleman, and Putnam, not enough attention has been paid to the role of brokers, who traverse network holes.

The Network Theory of Structural Holes. Bearing these general network ideas in mind and building upon Burt's work, this section centers on a particular approach to the study of social networks that focuses on the theoretical importance of structural holes. Social capital is a metaphor for social resources and advantages. Structural holes therefore become a form of social capital because of their ability to increase information and control benefits. Gaining access to and extracting resources capture the nature of social capital as a metaphor. However, conflicting arguments have been proposed regarding the benefits of structural holes and brokerage. Although Burt (2001a, 2001b, 2002, 2004) has largely argued that structural holes confer advantages, counterpoints have been raised concerning the dynamics of other, non-broker actors (Buskens & van de Rijt, 2008) and the effectiveness of being a broker (Gargiulo & Benassi, 2009).

To sum up, these theoretical pieces provide a concrete conceptual map of network studies. However, as influential and instructive as they are, some of their theoretical aspects are minimized or lost in the current computational social science field. By synthesizing these aspects and reflecting upon current network studies, this study aims to revitalize the leading role of network theories in shaping the field of social network analysis.