Infectious disease dynamics in homophily-driven dynamic small-world networks: A model study.
ABSTRACT. Dear selection committee,
in the absence of a personal comment section, I hope it is okay to use the abstract section. I would like to inform you that the extended abstract I am submitting has been accepted for the Applied Network Science special issue on "Epidemics Dynamics & Control on Networks". As ANS is one of the journals to cover the conference I believe it would be a great addition to COMPLEX NETWORKS 2020. Further, my selection for "post-conference publications" is only with regards to selections of extended abstracts, not for the manuscript itself.
Testing framework for proxy-based Influence maximization algorithms
ABSTRACT. We devise a testing framework to rank proxy-based influence maximisation algorithms. Earlier works compare these algorithms by calculating their top choices on a given network, then checking which set fares better in a diffusion simulation. In real-life applications, however, the top choices of these algorithms might be unaccessible for various reasons. Consequently, we have to choose our spreaders from a different set that might not contain any highly ranked agents at all. There is no guarantee that a proxy that is better at predicting the performance of the most popular agents will be equally successful for an arbitrary group of individuals. This calls for a systematic test, and in this paper, we provide one with the help of a novel statistical method, the Sum of Ranking Differences. For demonstration, we use the real-life social network, iWiW and a classical diffusion model, the Linear Threshold. The final ranking of the proxies is remarkably different from what we obtain by examining the performance of their top choices. The results highlight that the standard test alone is an inadequate predictor of performance and SRD should be a necessary, if not the primary tool for ranking proxies.
Analyzing the Impact of Geo-Spatial Organization of Real-World Communities On Epidemic Spreading Dynamics
ABSTRACT. Models for complex epidemic spreading are an essential tool for predicting both local and global effects of epidemic outbreaks. The ongoing development of the COVID-19 pandemic has shown that many classic compartmental models, like SIR, SIS, SEIR considering homogeneous mixing of the population may lead to over-simplified estimations of outbreak duration, amplitude and dynamics (e.g., waves). The issue addressed in this paper focuses on the importance of considering the social organization into geo-spatially organized communities (i.e., the size, position, and density of cities, towns, settlements) which have a profound impact on shaping the dynamics of epidemics. We introduce a novel geo-spatial population model (GPM) which can be tailored to reproduce a similar heterogeneous individuals' organization to that of real-world communities in chosen countries. We highlight the important differences between a homogeneous model and GPM in their capability to estimate epidemic outbreak dynamics (e.g., waves), duration and overall coverage using a dataset of the world's nations. Results show that community size and density play an important role in the predictability and controllability of epidemics.
Specifically, small and dense community systems can either remain completely isolated, or show rapid bursts of epidemic dynamics; larger systems lengthen the epidemic size and duration proportionally with their number of communities.
Testing strategies for epidemic detection in livestock trade network
ABSTRACT. Animal trades between farms and other livestock holdings form a complex network
which is known as an ‘animal trade network’. The movement of animals play a key role
in the spread of infectious diseases among farms. The existence of large disease outbreak among animal farms in different countries in the past leads to irreparable impacts on the global economy and public health [1]. Examples of these outbreaks are that of foot and mouth disease in the UK in 2001 [2], the swine fever epidemics in Netherlands and Germany in the 1990s and in 2003 [3] and the recent African swine fever (ASF) outbreak in Eastern Europe [4] and China in 2018 and 2019 [5]. Therefore studies of spreading dynamics are highly necessary for both health and economic reasons.
We propose a temporal network model for the generation of synthetic animal trade
data with realistic structure and dynamics which can be used for Monte Carlo simulations in the field of the livestock trade system. Our random animal trade transmission network model considers a set N of nodes representing hypothetical farms and traders with different characteristics. It then generates a hypothetical sequence of animal transmissions (t; i; j;x) that may happen between the nodes within some time interval [0;T].
Here, x is the number of animals transported from node i to node j at time point t.
Figure 1 shows the logic of our model for the application case of the German pig trade network.
Our experimental results show that the model can well reproduce several relevant
properties of real trade networks such as the distributions of two novel centrality-like metrics that we specifically designed to indicate nodes that appear promising to test for epidemic outbreaks.
Stimulation Index of Cascading Transmission in Information Diffusion over Social Networks
ABSTRACT. Analyzing and modeling of the information diffusion on social networks is essential because the SNSs have become established as an crucial information infrastructure.
In particular, "Influence Maximization," extracting information source nodes that deliver information to as many users as possible on the network, has been widely worked on.
However, the actual information diffusion is caused not only the propagation according to the network structure but also locally rise the ”trend” topics.
Then, we focused on edges that cause a chain of information transmission, regardless of the number of people who can reach the information.
Based on the information cascade where information is propagated in chains between nodes on a network, we propose Stimulation Index to quantify how much edges affect the subsequent transmission of information.
We also evaluated the proposed index using an artificial network and verified that it works effectively.
Diffusion dynamics prediction based on subgraph samples and motifs
ABSTRACT. Motifs are believed to represent structural and dynamical properties in networks. Nevertheless, small motifs are not always representative, while large motifs are hard to evaluate, which results in problem of recognition of optimal motif size, effective algorithms development, and motif significance estimation.
We explore, in which extent diffusion dynamics on a graph can be estimated on the base of its subgraphs and motifs in particular. For this purpose we explore and compare motifs distributions for initial graph and its samples extracted by different techniques, and analyse how subgraph sizes affect prediction accuracy of diffusion time on the base on motifs.
This allows to understand which subgraph sizes are appropriate for such kind of prediction, and how can we represent subgraph structural patterns to use smaller samples for dynamics approximations on large graphs. Several sampling techniques are compared for VK dataset with interest attribute markup. 4-5 node motifs are taken for graphs representation and for prediction evaluation.
Comparing the efficacy of embeddings in Hyperbolic and Euclidean geometries with respect to the task of community detection.
ABSTRACT. Network representation learning has emerged as a efficacious tool in graph mining. Recent work has suggested that representations embedded in hyperbolic geometry often surpass representations learned in Euclidean geometry on downstream tasks by notable margins. Hitherto, most work has examined tasks such as link prediction, network alignment, and node classification; however, community detection remains a notable omission. In this extended abstract, we seek to address this gap.
Calibrating Network Models for Real Networks Using graph2vec Embedding
ABSTRACT. The emergence of machine learning and data-driven solutions has opened up new research areas in network science. A related research question is how to fine-tune the parameters of network models to mimic real-world network as closely as possible. The traditional approach is to manually select a few graph metrics and calibrate the parameters of the network models in order to produce synthetic graphs that accurately match the properties of the original networks.
In this work, we study an alternative framework where the model calibration is carried out in the embedding space learned by the grpah2vec method. Here we focus on four frequently used network models that we calibrate for 412 real-world networks from various domains. We study which models describe real-world networks the most accurately in each domain. Moreover, we compare the embedding based network calibration method to the feature-based method in terms of accuracy, time complexity, and interpretability. We can conclude that while embedding based network calibration is promising but, it can be cumbersome for large networks due to the running time.
Detecting Geographical Competitive Structure for POI Visit Dynamics
ABSTRACT. We provide a framework for analyzing geographical influence networks that have impacts on visit event sequences for a set of point-of-interests (POIs) in a city. Since mutually-exciting Hawkes processes can naturally model temporal event data and capture interactions between those events, previous work presented a probabilistic model based on Hawkes processes, called CHP model, for finding cooperative structure among online items from their share event sequences. In this paper, based on Hawkes processes, we propose a novel probabilistic model, called RH model, for detecting geographical competitive structure in the set of POIs, and present a method of inferring it from the POI visit event history. We mathematically derive an analytical approximation formula for predicting the popularity of each of the POIs for the RH model, and also extend the CHP model so as to extract geographical cooperative structure. Using synthetic data, we first confirm the effectiveness of the inference method and the validity of the approximation formula. Using real data of Location-Based Social Networks (LBSNs), we demonstrate the significance of the RH model in terms of predicting the future events, and uncover the latent geographical influence networks from the perspective of geographical competitive and cooperative structures.
Consensus Embeddings for Networks with Multiple Versions
ABSTRACT. Machine learning applications on large-scale network-structured data commonly encode network information in the form of node embeddings. Network embedding algorithms map the nodes into a low-dimensional space such that the nodes that are "similar" with respect to network topology are also close to each other in the embedding space. Many real-world networks that are used in machine learning have multiple versions that come from different sources, are stored in different databases, or belong to different parties. Due to efficiency or privacy concerns, it may be desirable to compute consensus embeddings for the integrated network directly from the node embeddings of individual versions, without explicitly constructing the integrated network. Here, we systematically assess the potential of consensus embeddings in the context of processing link prediction queries on user-chosen combinations of different versions of a network. For the computation of consensus embeddings, we use linear (singular value decomposition) and non-linear (variational auto-encoder) dimensionality reduction methods. Our results on a large selection of protein-protein interaction (PPI) networks (eight versions with 255 potential combinations) show that consensus embeddings enable real-time processing of link prediction queries on user-defined combinations of networks, without requiring explicit construction of the integrated network. We observe that linear dimensionality reduction delivers better accuracy and higher efficiency than non-linear dimensionality reduction. We also observe that the performance of consensus embeddings is amplified with increasing number of networks in the database, demonstrating the scalability of consensus embeddings to growing numbers of network versions.
Graph Convolutional Network with Time-based Mini-batch for Information Diffusion Prediction
ABSTRACT. Information diffusion prediction is a fundamental task for understanding and controlling information spreading phenomenon. Many of the previous works use static social graph or cascade data for diffusion prediction. A recently proposed model DyHGCN [22] newly uses dynamic graph and achieve better performance. However, training phase of DyHGCN is computationally expensive due to the multiple graph convolution computation. To reduce the training time of handling dynamic graph, we propose a deep learning model which uses a novel mini-batch input and a dynamic graph. We conduct experiments on three real-world datasets. The experimental results show that our model performs comparable results against baseline models. Moreover, our model learns about 5.97 times faster than the state-of-the-art baseline, which shows the effectiveness and efficiency of the model.
A Sentiment Enhanced Deep Collaborative Filtering Recommender System
ABSTRACT. Recommender systems use advanced analytic and learning techniques to select relevant information from massive data and inform users’ smart decision-making on their daily needs. Numerous works exploiting user’s sentiments on products to enhance recommendations have been introduced. However, there has been relatively less work exploring higher-order user-item features interactions for sentiment enhanced recommender system. In this paper, a novel Sentiment Enhanced Deep Collaborative filtering Recommender System (SE-DCF) is developed. The architecture is based on a Neural Attention network component aggregated with the output predictions of a Convolution Neural Network (CNN). Specifically, the developed neural attention component puts more emphasis on user and item interactions when constructing the latent spaces (user-item) by adding the mutual influence between the two spaces. Additionally, the CNN learns the specific review of users and his sentiments aspects. Hence, it models accurately the item latent factors and creates a profile model for each user. The proposed framework allows users to find suitable items through the comprehensive aggregation of user’s preferences, item attributes, and sentiments per user-item pair. Experiments on real-world data prove that the proposed approach significantly outperforms the state-of-the-art methods in terms of recommendation performances
Avres: visualising multilayer networks for analysing human-trafficking networks
ABSTRACT. Human-trafficking relies on complex social networks activity, which usually involves various kinds of interactions between the actors. Investigation of such criminal networks using traditional statistical tools can quickly become overwhelming. Visualization tools can help to reduce this complexity (increased with the different layers of interactions) by assisting users in different tasks such as the creation of graphs and their exploration. We propose Avres, a network based approach to enhance network construction and analysis. Using a flexible multilayer model approach and a network visual representation based on multiple views, the expert is able to keep track of new incentives as she goes through new interesting parts of the network. The tool, freely available, is prototyped on web technologies and packed with graph algorithms together with visual representations, supported with graph databases, to allow data insertion and analysis of criminal networks. The tool has been successfully used in the investigation of a Nigerian network active in human trafficking.
SaraBotTagger - a Light Tool to Identify Bots in Twitter
ABSTRACT. In this work we present SaraBotTagger, a tool for identifying
bots in Twitter based only in the metadata of the accounts based on a
Random Forest Classifier, able to efficiently identify bots in Twitter. We
also propose a validation methodology that verify if the classifier is able
to identify bots that are actually suspended by Twitter. We apply our
methodology to the context of two real-world discussion, regarding STF
(the Supreme Court of Brazil) and anxiety and investigate the action of
bots and humans from the perspective of their description, the tweets in
their timelines and the their topological organization in retweet networks.
Connecting the Dots: Integrating Point Location Data into Spatial Network Analyses
ABSTRACT. Transportation networks allow us to model flows of people and resources across geographic space, but the people and resources we wish to model are not natively tied to our networks. Instead, they can occur as point data (such as store, train station, and domicile locations) and/or grid data (such as socio-economic and aggregate area data). Here we present a set of methods to integrate point data into an augmented transportation network. This method facilitates analyses of temporo-spatial measures (such as accessibility scores) using only efficient breadth first search algorithms. We demonstrate the approach by calculating walkability scores for the train stations within the central Tokyo area.
Topological Analysis of Synthetic Models for Air Transportation Multilayer Networks
ABSTRACT. Airline transportation systems can naturally be modeled as multilayer networks, each layer capturing a different airline company. Originally conceived for mimicking real-world airline transportation systems, synthetic models for airline network generation can be helpful in a variety of tasks, such as simulation and optimization of the growth of the network system, analysis of its vulnerability or strategic placement of airports.
In this paper, we thoroughly investigate the behavior of existing generative models for airline multilayer networks, namely BINBALL, STARGEN, and ANGEL. To conduct our study, we used the European Air Transportation Network (EATN) and the domestic United States Airline Transportation Network (USATN) as references. Our extensive analysis of structural characteristics has revealed that ANGEL excels the two previously introduced generative models in terms of replication of the layers of the reference networks. To the best of our knowledge, this is the first study that provides a systematic comparison of generative models for airline transportation multilayer networks.
Order Estimation of Markov-Chain Processes in Complex Mobility Network Embedded in Vehicle Traces
ABSTRACT. Vehicle mobility in urban traffic systems is complex, partly because it depends on it reflects mobility of a human who drives a vehicle, and partly because it depends on many roles which the vehicle plays. Previous studies on human mobility revealed that it includes Levy-flights-like motions and memoryless deterministic walks as well as random walks, but the mobility of vehicles may be more biased due to their
functions. Focusing our research target on a sightseeing vehicle with sufficiently
limited functions, we show a method to measure regularity of visitation patterns, quantified by order(s) of Markov chains in their mobility. Graphs of higher-order Markov chains, which are representatives of mobility in a network style, possess statistical properties; in our observation dataset, they include degree distributions similar to scale-free networks. The detection of mobility in real social experiments, which is also assumed on these graphs, yields the order of Markov chains inside it with its comparison with the results of agent-based simulations. Centrality indices of the mobility networks well coincide with prediction of these analytical and numerical results.
Economic Integration Index Evaluated from Loop Flow Component in Global Value-Added Network
ABSTRACT. We studied the structure of value-added flow in GVAN from 2000 to 2014 constructed from the Input-Output model. We found GVAN had two regional communities: Europe and the Pacific Rim for 2004--2011 using the map equation method and decomposed the value-added flow into the circular flow and gradient flow in these communities. We also found that the value-added is circulating within the Strongly Connected Component. Moreover, we calculated Economic Integration Index of the regions around economic crisis, which indicated Europe was much unstabler than the Pacific Rim.
Measuring the Nestedness of Global Production System based on Bipartite Network
ABSTRACT. Nested structure was originally found in the species mutualistic network, and later found in the socioeconomic system. In order to explore whether there is a nested structure and the level of nestedness in the global production network on the global value chain, here we construct a bipartite network based on the Inter-Country Input-Output table, in which each node assume two roles, namely upstream industry and downstream industry. Then we use SBD, BIN algorithm and FCA algorithm to reorder the adjacency matrix to maximize its degree of nestedness, and we further measure the NODF of each resulted matrix by reordering so as to ensure which ranking algorithm can access to the best result. We can explain the industrial niches of the upstream and downstream industry sectors as well as the competition and cooperation between the various industries by analyzing the best result. This research is of great significance for understanding the ecological relationship between industrial sectors in the global value chain.
Extracting the Backbone of Global Value Chain from High-Dimensional Inter-Country Input-Output Network
ABSTRACT. As networks are widely used to represent the abstraction of the real-world systems, the extraction of the truly important edges linking nodes of a large-scale network, namely, network backbone, which can help form reduced but meaningful representation of a dense complex network and understand the topological characteristics and grasp the key points of analysis, has been attracting increasing attentions in recent years. Among numerous existing methods, we believe there is no appropriate one for the inter-country input-output (ICIO) network, which is almost a both directed and weighted complete multigraph even with all the self-loops that could exist. In this paper, we redefined the inter-country and inter-sector propagating process of intermediate goods, coming up with the concept of Strongest Relevance Path Length (SRPL) based on Revised Floyd-Warshall Algorithm (RFWA). Then the Global Value Chain (GVC) backbone can be effectively abstracted according to the comparison between real ICIO relations and SRPL paths and network analysis was then carried out to testify the global economic integration trends. The results show that the backbone of GVC has a more complete topological structure in the economic sense with many undiscovered trends in the former international trade network analyses showing up.
Analysis of tainted transactions in the Bitcoin Blockchain transaction network
ABSTRACT. Blockchain technology, with its decentralised peer-to-peer network and cryptographic protocols, has led to a proliferation of cryptocurrencies, with Bitcoin at the forefront. The blockchain publicly records all Bitcoin transactions which can be used to build a dynamic and complex network to give a representation of the transactions in the underlying monetary system. Despite the cryptographic guarantees there exist inconsistencies and suspicious behavior in the chain. We reported on two such anomalies related to block mining in previous work.
In this paper, we build a network using bitcoin transactions and apply techniques from network science to analyse its complex structure. We focus our analysis on sub-networks induced by the two sets of anomalies, and investigate how inequality in terms of wealth and anomaly fraction evolves from the blockchain's origin. Thereby we present a novel way of using network science to detect and investigate cryptographic anomalies.
Forming diverse teams based on members' social networks: a genetic algorithm approach
ABSTRACT. Previous research shows that diverse teams in background and skills can outperform homogeneous teams. However, people often prefer to work with others who are similar and familiar and fail to assemble teams with high diversity levels. We propose a team formation algorithm that suggests diverse based on individuals' social networks, allowing them to keep high familiarity levels. Our novel algorithm is based on the NSGA-II genetic optimization that splits students into well-connected and diverse teams within an organizational network. It optimizes measures of team communication cost and diversity in O(n^2) time. The optimization finds Pareto optimal solutions that optimize both metrics, returning teams that have both diversity in member attributes and previous connections between members. We tested the algorithm on real team formation data collected from the MyDreamTeam platform. The solutions provided by the algorithm are superior to the teams assembled by the students, in both diversity and communication cost measures.
Nowcasting country-wide headache symptoms from social media traces and air quality
ABSTRACT. On the World Health Organization’s ranking of causes of disability, headache disorders are among the ten most disabling conditions for both genders combined and among the five most disabling for women. Headaches can be a sign of stress or emotional dis- tress, or it can result from a medical disorder, such as migraine or high blood pressure, anxiety, or depression. It can also be caused by environmental factors such as weather and air pollution. In the United States annual healthcare costs attributable to migraines have been estimated to approximate $17 billion. In the EU, the total annual cost of headache was calculated at 173 billion euros.
Despite its importance, there are not official data streams to measure headache problems in nearly real time, just emergency visits. In this study we leverage the wealth of user-generated data, available through social media and air pollution data to estimate the possibility to measure and track headaches in near real time. Our work builds on recent literature on how social media can be used to alert and nowcasting public health problems such as epidemics or health emergencies.
Trust somebody but choose carefully : an empirical analysis of social relationships on an exchange market
ABSTRACT. This article analyses the influence of trust on the functioning
of a market for perishable goods, where there exists no quality signal
and quantities can be scarce. On this market, agents can choose between
bidding or exchanging through bilateral transactions. Starting from the
empirical analysis of a market with a peculiar organization we propose a
measurement of trust, based on the dynamics of agents’ encounters. We
then analyze the differences in the social network structures and estimate
how they affects the market outcomes.We bring into the light that, when
the transaction links on the auction market reflects the economic constraints
of the partners, the relationships on the bilateral market depends
on something more. Clearly, the prices of the bilateral transactions are
the consequences of economics and non economics determinants. At first
glance, the stable co-existence of two market structures looks like a paradox.
Our results help to understand the distinctive characteristics and
functioning of each sub-market.
Extending DeGroot Opinion Formation for Signed Graphs and Minimising Polarization
ABSTRACT. Signed graphs offer a more rich representation of social networks than unsigned graphs. Most opinion formation models are developed for unsigned graphs. In this paper, we extend DeGrootian opinion dynamics to accommodate signed graphs. Furthermore, we also define the task of minimizing polarization on a budget through the lens of this DeGrootian model as an optimization problem and provide numerical results to demonstrate a decrease in polarization.
Graph comparison and artificial models for simulating real criminal networks
ABSTRACT. Network science is an active research field due to its numerous applications in areas like computer science, economics, or sociology.
Criminal networks, in particular, possess specific topologies which allow them to exhibit strong resilience to disruption. Starting from a dataset related to meetings between members of a Mafia organization which operated in Sicily during 2000s, we here aim to create artificial models with similar properties. To this end, we use specific tools of Social Network Analysis, including network models (Barabàsi-Albert being identified as the most promising) and metrics which allow us to quantify the similarity between two networks. To the best of our knowledge, DeltaCon and spectral distances have never been applied in this context. The construction of artificial, but realistic models can be a very useful tool for Law Enforcement Agencies, who could reconstruct the structure and simulate the evolution of criminal networks based on the information available.
Community Detection in a Multi-layer Network Over Social Media
ABSTRACT. Detection of Communities over social network, also known as network clustering or community extraction, has been widely studied in the past few years. The objective of community detection is to identify strongly connected components in a complex network. It reveals how people connect and interact with each other. In real world, however, a person is engaged in several traits of connections, these connection or social ties carries other different challenges in community extraction. More than one traits of connections can be exhibited as a multiplex network that contained itself a collection of multiple interdependent networks, where each network represents a trait of the connections. In this literature, this study provide readers with a brief understanding of multi-layer networks and community detection methods also describe the problem of community detection in Facebook page in terms of multi-layer and proposed approach to detect community and its structure using a modularity method. The study also investigate how strong the ties between users and their polarity towards the page over the span of time. The results successfully removes the isolates from network and built a well define structure of community.
Maximal Labeled-Cliques for Structural-Functional Communities
ABSTRACT. Cliques are important building blocks for community structure in networks representing structural association between entities. Bicliques play a similar role for bipartite networks representing functional attributes (aka. labels) of entities. We recently proposed a combination of these structures known as labeled-cliques and designed an algorithm to identify them. In this work we show how to use these structures to identify structural-functional communities in networks. We also designed a few metrics to analyse those communities.
Composite Modularity and parameter tuning in the weight-based fusion model for community detection in node-attributed social networks
ABSTRACT. The weight-based fusion model (WBFM) is one of the most simple and efficient for community detection (CD) in node-attributed social networks (ASNs) which contain both links between social actors (aka structure) and actors' features (aka attributes). Although WBFM is widely used, it has a logical gap as we show here. Namely, the gap stems from the discrepancy between Composite Modularity that is usually optimized within WBFM and corresponding CD quality measures. The discrepancy may cause misinterpretation of CD results and difficulties with parameter tuning within WBFM. To fulfil the gap, we theoretically study how Composite Modularity is related to CD quality measures. This study further yields a pioneering non-manual parameter tuning scheme that provides the equal impact of structure and attributes on CD results. Experiments with synthetic and real-world ASNs show that our conclusions help to reasonably interpret CD results and that our tuning scheme is very accurate.
ABSTRACT. Community detection is a central topic in network science, where the community structure observed in many real networks is sought through the principled clustering of nodes. Spectral methods give well-established approaches to the problem in the undirected setting; however, they generally do not account for edge directionality. We consider a directed spectral method that utilizes a graph Laplacian defined for non-symmetric adjacency matrices. We give the theoretical motivation behind this directed graph Laplacian, and demonstrate its connection to an objective function that reflects a notion of how communities of nodes in directed networks should behave. Applying the method to directed networks, we compare the results to undirected spectral clustering on symmetrized versions of the adjacency matrices. A simulation study using a directed stochastic block model shows that directed spectral clustering can succeed where the undirected approach fails. And we find interesting and informative differences between the two approaches in the application to Congressional cosponsorship networks.
Efficient Community Detection by Exploiting Structural Properties of Real-World User-Item Graphs
ABSTRACT. In this paper, we study the problem of detecting communities in real-world user-item graphs, i.e., bipartite graphs that represent interactions between a user and an item. Instead of developing a generic clustering algorithm for arbitrary graphs, we tailor our algorithm for user-item graphs by taking advantage of the inherent structural properties that exist in real-world networks. Assuming the existence of the core-periphery structure that has been experimentally and theoretically studied, our algorithm is able to extract the vast majority of the communities existing in the network by performing dramatically less computational work compared to conventional graph-clustering algorithms. The proposed algorithm achieves a subquadratic runtime (with respect to the number of vertices) for processing the entire graph, which makes it highly practical for processing large-scale graphs which typically arise in real-world applications. The performance of the proposed algorithm, in terms of both community-detection accuracy and efficiency, is experimentally evaluated with real-world datasets.
Graph Auto-Encoders for Learning Edge Representations
ABSTRACT. Graphs evolved as very effective representations of different types of data including social networks, biological data or textual documents. In the past years, significant efforts have been devoted to methods that learn vector representations of nodes or of entire graphs. But edges, representing interactions between nodes, have been vastly ignored. Surprisingly, there are only a few studies that focus on generating edge representations or deal with edge-related tasks such as the problem of edge classification. In this paper, we propose a new model (in the form of an auto-encoder) to learn edge embeddings in (un)directed graphs. The encoder corresponds to a graph neural network followed by an aggregation function, while a multi-layer perceptron serves as our decoder. We empirically evaluate our approach in two different tasks, namely edge classification and link prediction. In the first task, the proposed model outperforms the baselines, while in the second task, it achieves results that are comparable to the state-of-the-art.
Enriching Graph Representations of Text: Application to Medical Text Classification
ABSTRACT. The graph has been utilized to achieve state-of-the-art performance in text classification tasks. The same basic structure underlies knowledge graphs, large knowledge bases that contain rich information about the world. This paper proposes a novel synthesis of the graph of words structure with knowledge graphs, that produces informative hybrid representations of a corpus. We focus on the domain of medical text classification and medical ontologies in order to test our proposed methods and analyze different alternatives in terms of text representation models and knowledge injection techniques. The method we present produces text representations that are both explainable and effective in improving the accuracy on the OHSUMED classification task, surpassing neural network architectures such as GraphStar and Text GCN.
Experimental Evaluation of Train and Test Split Strategies in Link Prediction
ABSTRACT. In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of these models in a supervised machine learning model, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other. Therefore, it is far from trivial to separate candidate links into these disjoint sets.
Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits result in too optimistic results, whereas a temporal split gives a more fair and realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers that employ link prediction or other machine learning tasks in networks.
Multi-scale Anomaly Detection on Attributed Networks
ABSTRACT. Many social and economic systems can be represented as attributed networks encoding the relations between entities who are themselves described by different node attributes. Finding anomalies in these systems is crucial for detecting abuses such as credit card frauds, web spams or network intrusions. Intuitively, anomalous nodes are defined as nodes whose attributes differ starkly from the attributes of a certain set of nodes of reference, called the context of the anomaly.
While some methods have proposed to spot anomalies locally, globally or within a community context, the problem remain challenging due to the multi-scale composition of real networks and the heterogeneity of node metadata.
Here, we propose a principled way to uncover outlier nodes simultaneously with the context with respect to which they are anomalous, at all relevant scales of the network.
We characterize anomalous nodes in terms of the concentration retained for each node after smoothing specific signals localized on the vertices of the graph. Besides, we introduce a graph signal processing formulation of the Markov stability framework used in community detection, in order to find the context of anomalies. The performance of our method is assessed on synthetic and real-world attributed networks and shows superior results concerning state of the art algorithms. Finally, we show the scalability of our approach in large networks employing Chebychev polynomial approximations.
On the Impact of Communities on Semi-supervised Classification Using Graph Neural Networks
ABSTRACT. Graph Neural Networks (GNNs) are effective in many applications. Still, there is a limited understanding of the effect of common graph structures on the learning process of GNNs. In this work, we systematically study the impact of community structure on the performance of GNNs in semi-supervised node classification on graphs. Following an ablation study on six datasets, we measure the performance of GNNs on the original graphs, and the change in performance in the presence and the absence of community structure. Our results suggest that communities typically have a major impact on the learning process and classification performance. For example, in cases where the majority of nodes from one community share a single classification label, breaking up community structure results in a significant performance drop. On the other hand, for cases where labels show low correlation with communities, we find that the graph structure is rather irrelevant to the learning process, and a feature-only baseline becomes hard to beat. With our work, we provide deeper insights in the abilities and limitations of GNNs, including a set of general guidelines for model selection based on the graph structure.
Evolution of Similar Configurations in Graph Dynamical Systems
ABSTRACT. We investigate questions related to the time evolution of discrete
graph dynamical systems where each node has a state from {0,1}.
The configuration of a system at any time instant is a Boolean
vector that specifies the state of each node at that instant. We
say that two configurations are similar if the Hamming distance
between them is small. Also, a predecessor of a configuration B is
a configuration A such that B can be reached in one step from A.
We study problems related to the similarity of predecessor
configurations from which two similar configurations can be reached
in one time step. We address these problems both analytically and
experimentally. Our analytical results point out that the level
of similarity between predecessors of two similar configurations
depends on the local functions of the dynamical system. Our
experimental results, which consider random graphs as well as small
world networks, rely on the fact that the problem of finding
predecessors can be reduced to the Boolean Satisfiability problem (SAT).
Computing Temporal Twins in Time Logarithmic in History Length
ABSTRACT. A temporal graph G is a sequence of static graphs indexed by integers representing time instants. For ∆ an integer, a pair of ∆-twins is a pair of different vertices (u,v) which, starting at some time instant, have exactly the same neighbourhood outside {u, v} for ∆ consecutive instants. We address the enumeration problem of all pairs of ∆-twins in G, such that the overall runtime depends the least on the history length, namely max{t : Gt ∈ G not empty } − min{t : Gt ∈ G not empty }. We give logarithmic solutions, using red-black tree data structure. Numerical analysis of our implementation on graphs collected from real world data scales up to 10^8 history length.
.
ABSTRACT. We present an algorithm to compute path homology for simple digraphs, and use it to topologically analyze various small digraphs en route to an analysis of complex temporal networks which exhibit such digraphs as underlying motifs.
%Though similar algorithms already exist, there is little existing literature documenting them or their results. The algorithm we present features optimizations to increase the size of the largest workable digraphs, as well as some capabilities for symbolic output.
The digraphs analyzed include all digraphs, directed acyclic graphs, and undirected graphs up to certain numbers of vertices, as well as some specially constructed cases. Using information from this analysis, we identify small digraphs contributing to path homology in dimension $2$ for three temporal networks, and relate these digraphs to network behavior. We conclude that path homology can provide insight into temporal network structure and vice versa.
Revealing transmission of healthcare-associated infections using network-constrained patient trajectories
ABSTRACT. Contact tracing is used in hospital epidemiology as a standard tool to reveal outbreaks of healthcare-associated infections. Contact tracing typically operates by identifying close temporal overlaps between patients movement history when passing through the same locations. However, missing data and indirect contacts can pose significant limits to this methodology and result in potentially misleading conclusions. Here, we propose a methodology that examines the overall similarity between temporal network-constrained patient trajectories to reveal instances of disease transmission that would otherwise be missed using standard contact tracing based on overlaps. Specifically, we define a kernel measuring pairwise similarity between points of trajectories that forms the basis of a similarity graph between patient movement. To showcase the utility of our methodology, we apply it to two different data sets collected across a group of four major hospitals in West London: (1) trajectories of patients colonised with drug-resistant bacteria throughout 2018 to 2020; and (2) trajectories of patients whom acquired covid-19 over their hospital visit during March and April of 2020. Using graph-based semi-supervised learning, we show that the patient similarity graph reveals missing patient contacts that improve the characterisation and understanding of the number and extent of outbreaks.
Convergence towards an Erdos-Renyi graph structure in network contraction processes
ABSTRACT. Complex networks encountered in biology, ecology, sociology and technology often contract due to node failures, infections or attacks. The ultimate failure, taking place when the network fragments into disconnected components was studied extensively using percolation theory. We show that long before reaching fragmentation, contracting networks lose their distinctive features. In particular, we identify a major universality class of network structures, which under a broad class of node deletion processes exhibit a stable flow towards a universal fixed point, representing a maximum-entropy ensemble, called the Erdos-Renyi ensemble. This is in sharp contrast to network expansion processes, which lead to diverse families of complex networks, whose structure is highly sensitive to details of the growth mechanism. These results imply that contracting networks in the late stages of node failure cascades, attacks and epidemics reach a common structure, providing a unifying framework for their analysis.
Reconstructed potentials to characterize complex networks
ABSTRACT. We introduce a characterization of complex networks based on associating the potential of a Schroedinger equation to the graph spectrum. The reconstructed potential provides a compact representation of network properties. In particular, this technique is able to detect the percolation phase transition in Erdos-Renyi random networks in terms of the fractality of the ensemble median potential. The application to a randomly subsampled real-world network is finally performed to test the method.
Probabilistic Network Sparsification by Betweenness
ABSTRACT. In this paper, we study the problem of probabilistic network sparsification. A probabilistic network is a network in which edges are associated with a probability of existence. Sparsification is the problem of removing a predefined percentage of edges from a network while a specific structural property is preserved [1,2]. Parchas el al. in [3] for the first time proposed a method to sparsify probabilistic networks while the expected degree of nodes have the least change. Since the number of edges in sparsified graphs is lower than the original one, the entropy of the graph is normally lower. This is an advantageous result, as we can estimate measures with a smaller number of samples compared to what is needed in the original probabilistic networks. Networks sparsified by the method proposed in [3] can be used to estimate a number of measures like average shortest path length with low error.
Data-Driven Analysis of Complex Networks and Their Model-Generated Counterparts
ABSTRACT. Data-driven analysis of complex networks has been in the focus of research for decades. An important research question is to study how well real networks can be described with a small selection of metrics, furthermore how well network models can capture the relations of graph metrics observed in real networks. In this paper, we apply machine learning techniques to investigate the aforementioned problems.
We study 482 real-world networks from different domains such as brain, food, social, protein interaction, web and infrastructural networks, along with 6x482 synthetic networks generated by six frequently used network models with previously calibrated parameters to make the generated graphs as similar to the real networks as possible.
An Algorithmic Information Distortion in Multidimensional Networks
ABSTRACT. Network complexity, network information content analysis, and lossless compressibility of graph representations have been played an important role in network analysis and network modeling. As multidimensional networks, such as time-varying, multilayer, or dynamic multilayer networks, gain more relevancy in network science, it becomes crucial to investigate in which situations universal algorithmic methods based on algorithmic information theory applied to graphs cannot be straightforwardly imported into the multidimensional case. In this direction, as a worst-case scenario of lossless compressibility distortion that increases linearly with the number of distinct dimensions, this article presents a counter-intuitive phenomenon that occurs when dealing with networks within non-uniform and sufficiently large multidimensional spaces. In particular, we demonstrate that the algorithmic information necessary to encode multidimensional networks that are isomorphic to logarithmically compressible monoplex networks may display exponentially larger distortions in the general case.
ABSTRACT. Given a graph $G_t$ which is a result of a $t$ time, evolutionary process, the goal of \emph{graph de-evolution} of $G_t$ is to infer
what was the structure of the graph $G_t'$ for $t' < t$.
This general inference problem is very important to help understand the mechanisms behind complex systems like social networks and their asymptotic behavior.
In this work we take a step toward this direction and consider undirected, unlabeled trees that are the result of the well known random \emph{preferential attachment} process.
We compute the most likely \emph{root set} (possible isomorphic patient zero candidates) of the tree as well as the most likely \emph{previous graph} $G_{t-1}$ structure.
While the one step forward reasoning in preferential attachment is very simple, the backward (past) reasoning is more complex and includes the use of automorphism and isomorphism of graphs, which we clarify here.
Synchronization in complex networks with long-range interactions
ABSTRACT. Variants of collective behavior can materialize in large ensembles of coupled dynamical systems, and synchronization is one of the most significant among them due to its enormous applicability from neuronal networks to finance. At the same time, current study of long-range interactions is attracting researchers’ attention mainly because interactions among dynamical units in a network may not be present only in the form of short-range direct communications, but also through the long-range connections arising along the long-distant paths among the nodes. Despite a few recent works on synchronization in long-range interacting systems, there are still a lot of areas regarding the influences of long-range communications on top of non-regular complex networks that remain unexplored. We have derived the local and global asymptotic stability conditions for complete synchronization manifold with k-path Laplacian matrices. Importantly, we showed that the analytical findings are in excellent agreement with the numerical results. For the numerical illustrations, we have contemplated the Erdös–Rényi random network by means of a long-range connection governed by the power-law and demonstrate the emergence of complete synchronization. Particularly we have examined the synergy between the coupling strength and the power-law exponent.
Synchronizability and information transmission in complete dynamical networks of discontinuous maps
ABSTRACT. This work is dedicated to the study of information measures, synchronization and a topological order in complete dynamical networks of discontinuous piecewise linear maps with different slopes. It stands out that the networks topologies are characterized by circulant matrices and the conditional Lyapunov exponents are explicitly determined. Some monotony properties related to the amplitude of the network synchronization interval are established, depending on the network order and on the local dynamics. It is proved that to stabilize the synchronized states, it suffices to require that the transversal Lyapunov exponent has to be negative. Properties of the mutual information rate and the Kolmogorov-Sinai entropy, depending on the synchronization interval, are discussed.
Furthermore, various types of computer simulations show the experimental applications of these results and techniques.
ABSTRACT. We demonstrate how to deduce admissible synchronization patterns of systems with higher order interactions from the hypergraph structure and the type of interaction dynamics, how this structure manifests itself in stability analysis, and how it can be used to reduce the dimensionality of stability calculations. This extends both the studies of full synchronization on higher order systems and the analysis of cluster synchronization on systems with dyadic interactions. Our next steps will include using this framework to show how higher order interactions stabilize/destabilize cluster states in specific systems with various coupling topologies.
Effect of nonisochronicity on the chimera states in coupled nonlinear oscillators
ABSTRACT. We investigate the conditions which enable one to show the phenomenon of swing of synchronized states via amplitude chimera states in non-locally coupled systems with symmetry breaking with coupled Stuart-Landau oscillators as an example. Chimera states represent the spatio-temporal patterns coexisting with synchronized and desynchronized behaviour in coupled identical oscillators. We identify that the radius of non-local interaction and the non-isochronicity in the system also play an important role in the observation of such states. The system shows such notable property neither for smaller values of nonisochronicity nor for higher values. We also carefully study the occurrence of different transition routes to recently observed dynamical state called chimera death while varying the strength of nonisochronicity parameter.
Prioritizing investments in critical facility access during and following natural hazard events using geospatial data and network perturbation models
ABSTRACT. Critical infrastructure failure from natural hazard events affects the economic and social well-being of communities. This is particularly acute in lower income countries, where infrastructure may be less resilient and disaster recovery may be more limited by available resources. Ongoing work in this domain focuses on engineering vulnerability and risk management, resilience planning and community preparedness. This study builds on these works by using geospatial data and network topological models to identify three key metrics to enhance cross-sectoral decision making. Network segment criticality is calculated to identify the most important network segments to overall functionality, as defined by origin-destination travel cost. The impact to critical facilities and supply chain metrics can be assessed by the increases in travel cost due to perturbed networks and segment failure. Our algorithm is designed for flexible use on transportation, power, water, supply chain and other networks. A case study in the Commonwealth of Dominica is shown to illustrate flooding impact on road networks and the resulting access of households to hospital facilities. The algorithm has also been applied to power infrastructure in the Dominican Republic and supply chains for fuel transport between ports and power plants. A key focus of this study is helping to identify the highest priority investments, including potential tradeoffs between sectors. In the example shown, resources could be allocated to upgrading critical road segments, or towards hospitals expected to receive a higher influx of households if road networks are perturbed. This is part of a larger study from a forthcoming white paper background note for the World Bank on resilient infrastructure.
ABSTRACT. Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for the lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. The future of AI lies in enabling people to collaborate with machines to solve complex problems. Like any efficient collaboration, this requires good communication, trust, clarity and understanding. Explainable AI addresses such challenges and for years different AI communities have studied such topic, leading to different definitions, evaluation protocols, motivations, and results. This lecture provides a reasoned introduction to the work of Explainable AI (XAI) to date, and surveys the literature with a focus on machine learning and symbolic AI related approaches. We motivate the needs of XAI in real-world and large-scale application, while presenting state-of-the-art techniques and best practices, as well as discussing the many open challenges.
Using Distributed Risk Maps by Consensus as a Complement to Contact Tracing Apps
ABSTRACT. The rapid spread of COVID-19 has demonstrated the need for accurate information to contain its diffusion. Technological solutions are a complement that can help citizens to be informed about the risk in their environment. Although measures such as contact traceability have been successful in some countries, their use raises resistance in society.
This paper proposes a variation of the consensus processes in directed networks to create a risk map of a determined area. The process shares information with trusted contacts: people we would notify in the case of being infected. When the process converges, each participant would have obtained the risk map for the selected zone.
The results are compared with the impact of the pilot project testing the spanish contact tracing app (RadarCOVID). The paper also shows the combination of both strategies: contact tracing to detect potential infections, and risk maps to avoid movements into conflictive areas. Figures show that contact tracing apps could work with less that the 60% of participants. On the other hand, the elaboration of risk maps could work with only a 20% of active installations, but the effect is to delay the propagation instead of reducing the contagion. With both strategies actives, we achieve a significant reduction of infected people with a lower amount of participants.
Blocking the Propagation of Two Simultaneous Contagions over Networks
ABSTRACT. We consider the simultaneous propagation of two contagions over a social network. We assume a threshold model for the propagation of the two contagions and use the formal framework of discrete dynamical systems. In particular, we study an optimization problem where the goal is to minimize the total number of infected nodes subject to a budget constraint on the total number of nodes that can be vaccinated. While this problem has been considered in the literature for a single contagion, our work considers the simultaneous propagation of two contagions. Since the optimization problem is NP-hard, we develop a heuristic based on a generalization of the set cover problem. Using experiments on three real-world networks, we compare the performance of the heuristic with some baseline methods.
Time-critical Crowdsourced Responses and EmergencyMitigation to Global Biosecurity Threats
ABSTRACT. We live in an interconnected world, mediated by complex networks of information, human mobility and trade. As the COVID-19 pandemic showed, interconnectedness can have negative consequences, accelerating the spread of disease. Although the current pandemic was not intentional, it is conceivable that the interconnectedness which led to it could have been exploited by adversarial agents such as terrorists. For this reason, it is important to improve the design and hence the robustness and resilience of the interdependent networks in question. In particular, we ask whether there is a way societies can use interconnectedness to their advantage in the case of a biological emergency. Current emergency response strategies mainly rely on top-down management,e.g. governments and policy makers trying to enforce behavioral policies on the population. As clearly demonstrated by the current pandemic, which saw so many groups of citizens spontaneously emerging to help local authorities in providing relief,bottom-up emergency responses may offer a valuable contribution to effective centralized emergency management.Contributing to the aforementioned efforts, we model different emergency dynamics and study the conditions under which such distributed bottom-up approaches may be more effective. In particular, we focus on one specific aspect of emergency responses,namely the need to form an accurate estimation of true risk in a specific geographical area. A close match between these two assessments is essential for good emergency management, so that both individuals and policy makers can adopt an appropriate behavioral policy for a given level of risk.To that end, we employ an agent-based simulation where agents have a limited capacity to collect samples of the true level of risk. A social network in which agents share individual estimates of risk and influence each other is embedded into a geographical network of nodes, accounting for the agents’ spatial distribution. From the agents’ actions and interactions we reconstruct the collective perception of risk and compare this to the true risk distribution in the model.
Immigration in the Italian Public Debate: Dynamics of Interactions in a Segregated Network
ABSTRACT. We explore a network of Twitter interactions around the topic of immigration in Italy, in order to understand if it is possible to retrieve clusters with opposite stance, how the interactions between opposite factions unfold, how the presence of clusters affects the diffusion of contents, ideas and information. We find a strongly segregated structure, that fits almost perfectly the Italian political and social landscape, and we see that this structure clearly affects the diffusion of content: the groups related to the Government Opposition, even though they are on radically different stances towards migrants, share the same behaviour of closure towards the outside. By measuring the entropy of the sharing patterns of the URLs shared by the users of the different clusters, we see that most of the content that circulates within those communities never gets out, generating a sort of echo chamber effect.
Epidemic-driven conflict and conflict-driven epidemics
ABSTRACT. 1. Introduction
While lockdowns and social distancing have become the dominant Non-Pharmaceutical Interventions (NPIs) to contain the spread of the COVID-19 during the pandemic period, effectively saving hundreds of thousands of lives, they have created massive economic downturn and tremendous psychological distress in the US and around the world. Accordingly, these restrictions by NPIs may provide momentum for protests by libertarians, economic liberals and people with great psychological stress. An emerging wave of peaceful and violent protests is populating the post-pandemic period, with NPIs subsequently losing uptake, modulating the probability, timing, and amplitude of a second pandemic wave, which could further produce a second lockdown and enforce enhanced NPIs. Despite the substantive risks of protest, the complexity of protests driven by mobilization makes it difficult to predict the propagation of protests and how it could become the leading driver of a second pandemic wave through the suppression of NPIs. Here, we present a system of differential equations that describe the protest-driven epidemics.
2. Result
To explain the interplay of protests and epidemics, namely, epidemic-driven conflict and conflict-driven epidemics, we develop a simulation model that integrates the models of protest evolution and epidemic spreading using a novel dataset of the US county-wise friendship network from Facebook. In this model, a county-wise SEIR epidemic model with commuting and airline transportation mobility is used to model the epidemic processes with states in lockdown when active cases exceeds a threshold.
The lockdown initiates mobilization for protests by switching on social unrest in the closed counties, while decreasing the transmission rate and mobility flows at the same time. Mobilization spreads over the friendship network following a epidemic-like process. Accordingly, protesters gather at a protest site in each state, get infected, return to their home counties, and transmit the disease (see Fig. 1(a)).
We first characterize the dynamics of the protest evolution. We simulate the COVID-19 spreading seeded from New York City and Seattle. Fig. 1(b) shows the growing number of total protesters as states are increasingly closed as the disease spreads to all states through mobility flows. After most of states are in lockdown around the 50th day, the size of protests grows logistically until growth balances the intrinsic decay.
Growing protests refuel the pandemic in our simulation. In Fig. 1(b), the infected cases exponentially increases until most of states are closed around the 50th day, and the cases start decreasing by the decreased transmission rate and reduced mobility flows. However, growing protests boost the disease spreading, and reverse the curve to an increasing trend again. On the contrary, the infected cases in the simulation without protests monotonically decreases to zero after lockdowns. Thus, our finding highlights protests as an unexpected side effect of lockdowns.
This competing effect of lockdowns and protests leads us to a question about the cost of lockdown measures in terms of increased infection. Does a protest-driven spreading overwhelm decreased transmission by lockdowns? Fig. 1(c) shows the epidemic spreading when the closure of states is late due to a high lockdown threshold. Although the decreasing trend under protests is encouraging, more than doubled cases by the late lockdown predict more severe damage in the period of a year.
3. Discussion
These findings give a warning message that large protests can be an immediate cause of a conflict-driven second-wave epidemic. Recent evidence of spreading at protests exemplifies the substantive risks of protests occurring over the world. We provide a theoretical framework of the complex feedback loop between NPIs, social unrest, and epidemic evolution. Further integration of our conflict-driven epidemic model and political polarization can measure the precise impact of protests on the spreading of COVID-19. These findings can lead to organizational insights for local and federal authorities on improved preparedness to increase the epidemiological safety of protests, and monitor potential community transmission across states.
ABSTRACT. We address the problem of reducing the spread of an epidemic
over a contact network by vaccinating a limited number of nodes that represent individuals or agents.
We propose a Simulation-baed vaccine allocation method (Simba), a combination of (i) numerous repetitions of an efficient Monte-Carlo simulation, (ii) a PageRank-type influence analysis on an empirical transmission graph which is learned from the simulations, and (iii) discrete stochastic optimization.
Our method scales very well with the size of the network and is suitable for networks with millions of nodes.
Moreover, in contrast to most approaches that are model-agnostic approaches and solely perform graph-analysis on the contact graph, the stochastic simulations explicitly take the exact diffusion dynamics of the epidemic into account.
Thereby, we make our vaccination strategy sensitive to the specific clinical and transmission parameters of the epidemic.
Non-backtracking Eigenvalues: X-Centrality and Node Immunization
ABSTRACT. The non-backtracking matrix and its eigenvalues have many applications in network science and graph mining, such as node and edge centrality, community detection, length spectrum theory, graph distance, and epidemic and percolation thresholds. Moreover, in network epidemiology, the reciprocal of the largest eigenvalue of the non-backtracking matrix is a good approximation for the epidemic threshold of certain network dynamics. In this work, we develop techniques that identify which nodes have the largest impact on the leading non-backtracking eigenvalue. We do so by studying the behavior of the spectrum of the non-backtracking matrix after a node is removed from the graph. From this analysis we derive two new centrality measures: X-degree and X-non-backtracking centrality. We perform extensive experimentation with targeted immunization strategies derived from these two centrality measures. Our spectral analysis and centrality measures can be broadly applied, and will be of interest to both theorists and practitioners alike.
Using Link Clustering to Detect Influential Spreaders
ABSTRACT. Spreading processes are increasingly analysed in the context of complex networks, for example in epidemics research, information dissemination or rumors. In these contexts, the effect of structural properties that facilitate or decelerate spreading processes are of high interest, since this allows in-sights into the extent to which those processes are controllable and predict-able. In social networks, actors usually participate in different densely con-nected social groups that emerge from various social contexts, such as workplace, interests, etc. In this paper, it is examined if the number of groups an actor connects to can be used as a predictor for its capability to spread information effectively. The social contexts (i.e. groups) a node par-ticipates in are determined by the Link Communities approach by Ahn et al. (2010). The results are contrasted to previous findings of structural node properties based on the k-shell index of nodes (Kitsak et al. 2010) by ap-plying both methods on artificially generated and real-world networks. They show that the approach is comparable to existing ones using structural node properties as a predictor, yet no clear evidence is found indicating that one or the other approach leads to better predictions in all investigated networks.
ABSTRACT. Fake news diffusion represents one of the most pressing issues of our online society. In recent years, fake news has been analyzed from several points of views, primarily to improve our ability to separate them from the legit ones as well as identify their sources. Among such vast literature, a rarely discussed theme is likely to play uttermost importance into our understanding of such a controversial phenomenon: the analysis of fake news’ perception. In this work, we approach such a problem by proposing a family of opinion dynamic models tailored to study how specific social interaction patterns concur to the acceptance, or refusal, of fake news by a population of interacting individuals. To discuss the peculiarities of the proposed models, we tested them on several synthetic network topologies, thus underlying when/how they affect the stable states reached by the performed simulations.
Not all interventions are equal for the height of the second peak
ABSTRACT. In the extended abstract we summarize the results of our simulation paper [3] of the spread of an epidemic like COVID-19 with temporary immunity on finite spatial and non-spatial network models. In particular, we assume that an epidemic spreads stochastically on a scale-free network and that each infected individual in the network gains a temporary immunity after its infectious period is over. After the temporary immunity period is over, the individual becomes susceptible to the virus again. When the underlying contact network is embedded in Euclidean geometry, we model three different intervention strategies that aim to control the spread of the epidemic: social distancing, restrictions on travel, and restrictions on maximal number of social contacts per node. Our first finding is that on a finite network, a long enough average immunity period leads to extinction of the pandemic after the first peak, analogous to the concept of “herd immunity”. For each model, there is a critical average immunity duration L_c above which this happens. Our second finding is that all three interventions manage to flatten the first peak (the travel restrictions most efficiently), as well as decrease the critical immunity duration Lc, but elongate the epidemic. However, when the average immunity duration L is shorter than L_c, the price for the flattened first peak is often a high second peak: for limiting the maximal number of contacts, the second peak can be as high as 1/3 of the first peak, and twice as high as it would be without intervention. Thirdly, interventions introduce oscillations into the system and the time to reach equilibrium is, for almost all scenarios, much longer. We conclude that network-based epidemic models can show a variety of behaviors that are not captured by the continuous compartmental models.
Identifying Biomarkers for Important Nodes in an Epidemic
ABSTRACT. This paper uses network science techniques to evaluate the SATHCAP dataset concerning HIV and drug use. A referral network is generated via respondent-driven sampling, which is used to identify important bridge nodes that are responsible for maintaining the structure of large connected components of sexual and drug-using activity. These nodes are scrutinized to determine biomarkers and social factors that distinguish them from the underlying population. It is found that attributes such as homelessness and sexual abuse are more prevalent in these bridge nodes. These nodes are ill-served by public health efforts, because they are hard to reach and difficult to identify. Intervention campaigns targeted at groups displaying these attributes could meaningfully lower the spread of HIV.
Possibilities and Limitations of using mobile phone data in exploring human behavior
ABSTRACT. Big Data as provided by modern communication systems provide unprecedented opportunities for research. Mobile phones have become almost like a new organ in additional to our biological ones and we practically never get rid of them, hence the analysis of CDR-s (Call Detail Records) are particularly important in gaining information about the whereabouts, contacts and activity patterns of people. We will review some of the results from such analyses, including large scale structure of the society, mobility patterns, gender and age dependence of interactions, bursty character of the activity. We will show that sometimes extremely precise information can be obtained and applied to support theories of social anthropology e.g., about family relationships. However, CDR data should be used with care, as bias could occur since information from one communication channel is considered only. We analyze this aspect and suggest a general description of such biases.
StreamFaSE: an online algorithm for subgraph counting in dynamic networks
ABSTRACT. Counting subgraph occurrences in complex networks is an important analytical task with applicability in a multitude of domains such as sociology, biology and medicine. This task is a fundamental primitive for concepts such as motifs and graphlet degree distributions. However, there is a lack of online algorithms for computing and updating subgraph counts in dynamic networks. Some of these networks exist as a streaming of edge additions and deletions that are registered as they occur in the real world. In this paper we introduce StreamFaSE, an efficient online algorithm for keeping track of exact subgraph counts in dynamic networks, and we explain in detail our approach, showcasing its general applicability in different network scenarios. We tested our method on a set of diverse real directed and undirected network streams, showing that we are always faster than the current existing methods for this task, achieving several orders of magnitude speedup when compared to a state-of-art baseline.
Generalized optimal paths revealed through the large deviations of random walks on networks
ABSTRACT. Numerous problems of both theoretical and practical interest involve finding shortest paths in networks in the presence of some obstacles or constraints. A somewhat related class of problems focus on finding the optimal distributions of weights which, for a given connection topology, maximize some flow or minimize a given cost function. We show that both issues can be approached through an analysis of the large-deviation functions of random walks. The idea is quite versatile, as paths and weights can in fact be taylored to given mean values and/or fluctuations of one or several time-integrated observables. Some ideas that may be useful in technological applications and in the control of stochastic dynamics will be discussed.
Comparing Box-Covering Algorithms for Fractal Dimension of Complex Networks
ABSTRACT. The fractal nature of complex networks has received a lot of research interest recently since fractality has been verified in several real-world networks (WWW, actor collaboration networks, protein interaction networks), moreover, fractal nature has been associated with many important features of networks such as robustness, modularity, and information contagion.
The method to identify fractal nature in complex networks is similar to that of regular fractal objects: using the box-covering method. For a given network, we partition its vertices into boxes of given size. A box is a subgraph of the network, which diameter is less than the given box size. The goal of the box-covering algorithm is to find the minimum number of fixed-sized boxes needed to cover the entire network.
The box-covering problem belongs to a family of NP-hard problems. On the other hand, in practice, various algorithms are adopted to obtain an approximate solution.
In this work, we provide a systematic review of the various box-covering algorithms proposed throughout the years. Furthermore, we compare the performance of the algorithms in terms of running time and approximation ability both using real-world networks and mathematical network models. We also make our implementations of the algorithms publicly available as a Python module.
ABSTRACT. We propose a new centrality measure, called the Random Walk Decay centrality. While most centralities in the literature are based on the notion of shortest paths, this new centrality measure stems from the random walk on the network. We provide an axiomatic characterization and show that the new centrality is closely related to PageRank. More in detail, we show that replacing only one axiom, called Lack of SelfImpact, with another one, called Edge Swap, results in the new axiomatization of PageRank. Finally, we argue that Lack of Self-Impact is desirable in various settings and explain why violating Edge Swap may be beneficial and may contribute to promoting diversity in the centrality measure.
Towards Mesoscopic Structural Analysis of the Fediverse of Decentralized Social Networks
ABSTRACT. Open-source, distributed decentralized online social networks (DOSNs) are emerging as alternatives to the popular though centralized platforms -- i.e., hosted and controlled by a single company -- like Facebook or Twitter. In DOSNs, everyone is allowed to get the code and set up their own server. A particularly relevant model is the federated one, whereby each server can communicate with the others using the same protocol, meaning that if a user is signed up for a certain server, s/he is still able to interact with users on another server -- in a similar fashion as for the email service. In addition, any platform that implements the common protocol, such as ActivityPub, becomes part of a massive social network, called Fediverse, thus enabling individuals to use their accounts on a platform to follow users on other platforms without needing an account there.
The Fediverse currently provides several services, such as Mastodon and Friendica for microblogging, PeerTube and Funkwhale for video hosting, PixelFed for image hosting. To our knowledge, Mastodon is the only platform in the Fediverse that has received relative attention from the research community. However, several open questions still remain to address on Mastodon, which motivated us to focus on this platform in this work.
Our major contribution is to fill a gap in the understanding of the mesoscopic structure of Mastodon: in fact, unlike existing works, our study builds upon a network model over the instances and exploits it to provide insights into connections among instances, and hence their users, over the detected modularity-based community structure as well as core-decomposition. Given the versatility of our analysis methodology, we believe our study can serve as a starting point to analyze the entire Fediverse of DOSNs.
The effect of commuting on the structure and assortativity of online social ties
ABSTRACT. Commuting across cities creates opportunities to meet and establish relationships to a wide variety of people. Contrary, living and work around the same location limits the possibilities to develop social ties towards diverse communities. In this paper we test the influence of urban mobility on the structure and assortativity of social connections.
We follow the mobility of individuals through their geolocated Twitter messages in the top 50 metropolitan areas of the US. After identifying the home and work location of users and constructing their online social network, we include socio-economic information from census data to their main locations. Our findings show that commuting to distant locations increases the number of connections people can develop and acts against closed, highly clustered social ties. However, commuting to work at a distant location does not change the pattern that people most likely develop connections towards others from similar socio-ecconomic background. It means that despite physical meeting possibilities the city offers, tie formation is still predominantly driven by homophilic relationships.
The role of geography in the complex diffusion of innovations
ABSTRACT. The urban-rural divide is increasing in modern societies calling for geographical extensions of social influence modelling.Improved understanding of innovation diffusion across locations and through social connections can provide us with new insights into the spread of information, technological progress and economic development. In this work, we analyze the spatial adoption dynamics of iWiW, an Online Social Network (OSN) in Hungary and uncover empirical features about the spatial adoption in social networks. During its entire life cycle from 2002 to 2012, iWiW reached up to 300 million friendship ties of3 million users. We find that the number of adopters as a function of town population follows a scaling law that reveals a strongly concentrated early adoption in large towns and a less concentrated late adoption. We also discover a strengthening distance decay of spread over the life-cycle indicating high fraction of distant diffusion in early stages but the dominance of local diffusion in late stages. The spreading process is modeled within the Bass diffusion framework that enables us to compare the differential equation version with an agent-based version of the model run on the empirical network. Although both model versions can capture the macro trend of adoption, they have limited capacity to describe the observed trends of urban scaling and distance decay. We find, however that incorporating adoption thresholds, defined by the fraction of social connections that adopt a technology before the individual adopts, improves the network model fit to the urban scaling of early adopters.Controlling for the threshold distribution enables us to eliminate the bias induced by local network structure on predicting local adoption peaks. Finally, we show that geographical features such as distance from the innovation origin and town size influence prediction of adoption peak at local scales in all model specifications.
Ecological networks reveal species associations and communities in Urban Forests of South Delhi Ridge, India
ABSTRACT. Understanding species associations in urban ecosystems is of considerable interest to ecologists and conservation biologists.There is increasing evidence that urban nature plays a significant role in modulating the microclimate, and enhancing the quality of life in the cities. Urban trees help towns to cope with climate warming by cooling both air and surface, and yet the challenges imposed by the urban environment on plants are seldom considered. In this context, one of the most overlooked features is the manner and extent to which native, introduced and invasive species co-exist with each other in an urban forest. In this work, we have investigated urban forests in the National Capital region of India to explore the vegetation in various localities and its impacts on constituent species, an aspect that remains largely unexplored. Using a network approach, we identify distinct communities of plants in each of the six urban localities investigated, despite huge overlaps in overall species composition. We also find cliques preferred by invasive species Lantana camara and Prosopis juliflora in distinct urban pockets of New Delhi. We hope this work will pave the way for more detailed investigations into the ecological impact of biological invasions in Urban landscapes.
The macro-, meso- and micro-structure of individual-based community-wide plant-pollinator networks reflects pollen flow dynamics and plant reproductive success
ABSTRACT. We introduce a framework rooted in multilayer network modeling designed to depict the conspecific and heterospecific pollen flows mediated by floral visitors. By applying this analysis to 9 well-resolved individual plant-pollinator networks, we show that individual-based networks are highly modular, with modules often integrating individuals from different plant species, linked by their animal visitors. Consequently, the individual node position in the network with respect to its conspecifics or to the overall network have contrasting effects on individual plant reproduction. However, considering the mesoscale enhances the description of plant reproductive success, as it integrates all heterospecific and conspecific interactions of a given individual. We provide a simple but robust set of metrics to scale down network ecology from multitrophic communities to the individual level, where most ecological processes take place, hence moving forward the description and interpretation of ecological dynamics across scales.
Joint Modeling of Chromatin Marker Effects in 3D Genome Through Hi-C Interaction Graph
ABSTRACT. Hi-C experiments provide a window into the spatial packing of a genome in
3D within the cell. Even though Hi-C does not directly depend on the binding of specific
antibodies, previous analysis has revealed denser interactions around many chromatin
markers. However, the joint effects of these markers in Hi-C interaction graph have not
been globally characterized yet, limiting our understanding of genome shape. Here, we
propose ProbHiC to decompose Hi-C interaction graph in terms of known chromatin
markers. The method is based on convex likelihood optimization which can directly take
into account both interaction existence and nonexistence. We find 4 histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K27ac to be highly predictive of most Hi-C
interactions across species and cell types. ProbHiC is quite effective in predicting Hi-C interactions and genome shape in one species, given they are trained on another species or cell types.
The effective graph: a weighted graph that captures nonlinear logical redundancy in biochemical systems
ABSTRACT. The ability to map causal interactions underlying genetic control and cellular signalling has led to increasingly accurate models of the complex biochemical networksthat regulate cellular function [1,2,3]. However, the traditional representation of bio-chemical networks as static and binary interaction graphs fails to accurately representan important dynamical feature of these multivariate systems: some pathways propagate control signals much more effectively than others [4] (see Fig. 1A & B). Suchheterogeneity of dynamical interactions reflectscanalization, as the system is robust tointerventions in redundant pathways, but responsive to interventions in effective path-ways. The simplest way to model such causal, interdependent nonlinear dynamics iswith multivariate, discrete dynamical systems; for instance, Boolean Networks (BN)are canonical models of complex systems which exhibit a wide range of dynamicalbehaviors [5]. BN provide a convenient modelling framework to explore general properties of complex systems, such as self-organization, criticality, causality, canalization,robustness and evolvability [1,6,7,8].
To capture the nonlinear logical redundancy present in biochemical network regulation, signalling, and control, we present theeffective graph. The effective graph is aweighted, directed graph that statistically integrates all dynamical redundancy presentin the BN dynamics, thus revealing the most important interactions in determining state-transitions, as well as very redundant pathways. In this talk we present a summary ofkey results derived from more than 40 systems biology models analyzed, including that:i) redundant pathways are prevalent in biological models of biochemical regulation (seeFig. 1D & E); ii) the effective graph provides a statistical but precise characterizationof multivariate dynamics in a causal graph form (see Fig. 1B & C); and iii) the effectivegraph provides an accurate explanation of how perturbation and control signals propagate in biochemical regulation, such as those induced by drug therapies on Cancer. SeeFig. 1C, and note how cancer drugs (purple nodes) lose their pathway toApoptosis(celldeath; green nodes), a desired control outcome in thisER+breast cancer model. Overall, our results indicate that the effective graph provides an enriched description of thestructure and dynamics of networked multivariate causal interactions. We demonstratethat it improves explainability, prediction, and control of complex dynamical systemsin general, and biochemical regulation in particular.
All simulations and code to support the findings are freely available in the CANApython package [9].
Analysis of psychostimulant-induced group behaviours using network based framework in Drospophila melanogaster
ABSTRACT. In this paper we describe experiments of group behaviours of adult Drosophila melanogaster males. We performed a quantitative analysis of social interaction networks structure on the global and local network level. We construct two classes of weighted social interaction networks: (i) CTR networks based on the social interactions of the group of flies raised in the group on the regular fly food and (ii) COC networks based on the social interaction of flies that was raised in the group and administrated orally trough food to 0.5 mg/mL of cocaine for 24 hours before tracking. Nodes are related to flies, links refer to the social interactions of two flies and weights denote the number of interactions of each two flies during one experiment.
NETME: On-the-fly knowledge network construction from biomedical literature
ABSTRACT. The huge amount of biological literature, which daily increases, represents a strategic resource to automatically extract and gain knowledge concerning relations among biological elements. Knowledge Networks are helpful tools in the context of biological knowledge discovery and modeling. Here we introduce a novel system called \netme, which, starting from a set of fulltext obtained from \pubmed, through an easy-to-use web interface, interactively extracts a group of biological elements stored into a selected list of ontological databases and then synthesizes a network with inferred relations among such elements.
The results clearly show that our tool is capable to efficiently and efficaciously infer reliable functional biological networks.
Clustering Active Users in Twitter Based on Top-k Trending Topics
ABSTRACT. Discovering meaningful topical clusters in online social networks (OSNs) has recently occupied an overwhelming research interest owing to its diverse applications. Most existing approaches focus on the contents generated by the social users and link structure of the underlying social network [1,2]. However, the degree of users' temporal topical activeness has not been thoroughly studied to identify its effect on the formation of topical clusters. As a result, the resulting clusters may contain mix of high and low active users as well as may contain users who have no inclination towards the query attributes. This research investigates on how the users’ behaviors and topical activeness vary with time and how these parameters can be employed in order to improve the quality of the detected topical clusters for top-k trending topics at different time intervals.
Our observation is that users have different degrees of topical activeness which vary widely over time. The proposed approach is commenced on measuring the degree of activeness for each candidate cluster member with respect to the given query attributes to enhance the quality of the detected topical clusters. Instead of giving the query topics manually, our system identify the top- k trending topics by taking into account the number of mentions on that topics and the coverage of that topics in OSN.
Inferring political opinion and its relationship with use of language in a Twitter conversation around a territorial conflict
ABSTRACT. Political polarization generates strong effects on society, driving controversial debates and influencing the institutions. Territorial disputes are one of the most important polarized scenarios and have been consistently related to the use of language. In this work, we analyzed the opinion and language distributions of a particular territorial dispute around the independence of Catalonia through Twitter data. We infer a continuous opinion distribution by applying a model based on retweet interactions, previously detecting elite users with fixed and antagonist opinions. The resulting distribution presents a mainly bimodal behavior with an intermediate third pole that shows a less polarized society with the presence of not only antagonist opinions. We find that the more active, engaged and influential users hold more extreme positions. Also we prove that there is a clear relationship between political positions and the use of language, showing that against independence users speak mainly Spanish while pro-independence users speak Catalan and Spanish almost indistinctly. However, the third pole, closer in political opinion to the pro-independence pole, behaves similarly to the against-independence one concerning the use of language.
Media Polarization on Twitter during 2019 Indonesian Election
ABSTRACT. In this study, we investigate the phenomenon of political polarization on news consumption patterns of Twitter users during 2019 Indonesian elections. By modeling news consumption as a bipartite network of news outlets-Twitter user, and then projecting to a network of news outlets, we observed the emergence of a number of media communites based on audience similarity. By measuring the political alignments of each news outlet, we shows the politically fragmented Indonesian news media landscape, where each media community becomes an political echo chamber for its audience. Our finding highlight the important role of mainstream media as a bridge of information between political echo chamber in social media environment.
A Framework for Interaction-based Propagation Analysis in Online Social Networks
ABSTRACT. Online social networks create a digital footprint of human interaction naturally by theway they function. Thus, they allow a large scale analysis of human behavior whichwas previously infeasible for social scientists. Consequently, social networks have been studied intensely in the last decade. The core of most social networks is the relationship between users which can be described as a graph. The graph can be either undirected, as is the case for the friendship relation of Facebook, or directed, which is the case of the follower relation on Twitter. The relationship is readily visible, e.g. on the user interfacethe social networks themselves. However, these edges are unweighted expressions of interest and reflect how individuals have chosen to relate to each other rather than how they actually interact with each other. For studying information propagation, comparing interaction properties is crucial and, therefore, using models based on connections that reflect different dimensions and strengths of acquaintance seems appropriate. Thus, there is a need for obtaining weighted edges from the communication that occurs on the social network. In this paper, we present a novel method to calculate an acquaintance score between pairs of Twitter users and use the resulting networks to enable the analysis of interaction based information propagation.
Better weighted clustering coefficient: now continuous
ABSTRACT. Nodes in real-world networks tend to cluster into densely connected groups, a property captured by the clustering coefficient.
It was however initially defined for unweighted and undirected networks, which left out crucial information about dynamics on weighted graphs.
Several generalizations have therefore been proposed for weighted networks.
However, none of them fulfills the continuity condition: edges with vanishing weights have an impact that differs from the absence of an edge.
We propose here a new definition of the clustering coefficient that tackles this issue while satisfying previously formulated requirements.
This new definition is less sensitive to weak spurious connections that can be generated by noise or statistical biases.
Compared to previous methods, it is thus able to detect topological features more precisely and behaves in a more sensible manner for inferred networks.
Finally we discuss the differences between clustering methods on several real-world graphs, notably weighted and directed brain networks.
A Dynamic Algorithm for Linear Algebraically Computing Nonbacktracking Walk Centrality
ABSTRACT. Dynamic graph data is used to represent the changing relationships in society, biology, web traffic, and more. When computing analytics on such evolving data, it is important to have algorithms that can update analytics quickly as data changes, without needing to recompute the analytics from scratch. A common analytic performed on graph data is that of centrality: identifying the most important (highly ranked) vertices in the graph. In this work we examine centrality scores based on nonbacktracking walks in graphs and propose methods to update such scores in dynamic graphs. We propose two dynamic methods and demonstrate that both are faster than statically recomputing the scores at each graph change. We additionally show that one of these methods is far superior than the other with regard to the quality of the scores obtained, and is able to produce good quality approximations with respect to a static recomputation. Our methods use properties of iterative methods to update local portions of the centrality vector as the graph changes (in this paper, we focus exclusively on edge additions). Experiments are performed on real-world networks with millions of vertices and edges.
TemporalRI: a subgraph isomorphism algorithm for temporal networks
ABSTRACT. Temporal networks are graphs where each edge is associated with a timestamp or an interval indicating when an interaction between two nodes happens. The Temporal Subgraph Isomorphism (TSI) problem aims at identifying all the subgraphs of a temporal network (called target) matching a smaller temporal network (called query), such that matched target edges satisfy the same chronological order of query edges. Few existing algorithms deal with the TSI problem and most of them can be applied only to small or specific queries. In this paper we propose TemporalRI, a subgraph isomorphism algorithm for temporal networks, inspired by RI algorithm. We show that our algorithm can handle queries of any size and any topology. Experiments on real networks show that TemporalRI is very efficient, especially for large queries and targets.
Digital Sousveillance: A Network Analysis Of US Surveillance Organizations
ABSTRACT. This study introduces a new methodological approach to the study of surveillance that I call digital sousveillance – the co-optation of digital data and the use of computational methods and techniques to resituate technologies of control and surveillance on individuals to instead observe the organizational observer. To illustrate the potential of this method, I employ quantitative network analytic methods to trace the changes and development of the vast network of public and private organizations involved in surveillance operations in the United States – what I term the “US surveillant assemblage” – from the 1970s to the 2000s. The results of the network analyses suggest that the US surveillant assemblage is becoming increasingly privatized and that the line between “public” and “private” is becoming blurred as private organizations are, at an increasing rate, partnering with the US government to engage in mass surveillance.
Searching for linguistic collocations in specialty languages by using complex networks
ABSTRACT. Special words and two-word combinations with their own specific and characteristic meaning within an area of knowledge that are specific references within a specialized language (called linguistic collocations), collectively constitute the specific terminology of that specialized language and characterize and define that language. In this work we analyze the specialized mathematical language produced by the complex network scientific community from the perspective of complex network theory and by using its tools to extract new properties and characteristics of this language. Specifically, this work is focused on the description of a methodology based on a variant of the PageRank algorithm to locate the collocations, defining a new structure (enriched line graph) that can be interpreted as a certain type of "interpolation" between the original graph and its associated line graph, showing new results, properties and characteristics of this specialty language.
An Analysis of Four Academic Department Collaboration Networks with Respect to Gender
ABSTRACT. Understanding how research is conducted is important for understanding how we get the information that influences the decisions we make. One method used for analyzing the patterns behind research is building collaboration networks. In a collaboration network, vertices represent researchers and an edge exists between two vertices if the corresponding researchers have collaborated.
We construct a collaboration network for the faculty in the Biology, Computer Science, Electrical Engineering, and Mathematics departments at a large undergraduate university in California. For each of these departments, we analyze collaboration patterns with respect to gender within this network. We find interesting collaborative behavior in the Mathematics department that differs from previously established claims about gender with respect to collaboration.
Movie Script Similarity using Multilayer Network Portrait Divergence
ABSTRACT. This paper addresses the question of movie similarity through
multilayer graph similarity measures. Recent work has shown how to
construct multilayer networks using movie scripts, and how they capture
different aspects of the stories. Based on this modeling, we propose to
rely on the multilayer structure and compute different similarities, so
we may compare movies, not from their visual content, summary, or
actors, but actually from their own storyboard. We propose to do so
using “portrait divergence”, which is has been recently introduced to
compute graph distances from summarizing graph characteristics. We
illustrate our approach on the series of six Star Wars movies.
Finding High-Degree Vertices with Inclusive Random Sampling
ABSTRACT. The friendship paradox (FP) is the famous phenomenon that one's friends typically have more friends than they themselves do. The FP has inspired novel approaches for sampling vertices at random from a network when the goal of the sampling is to find vertices of higher degree. The most famous of these methods involves selecting a vertex at random and then selecting one of its neighbors at random. Another possible method would be to select a random edge from the network and then select one of its endpoints at random, again predicated on the fact that high degree vertices will be overrepresented in the collection of edge endpoints. In this paper we propose a simple tweak to these methods where we consider the degrees of the two vertices involved in the selection process and choose the one with higher degree. We explore the different sampling methods theoretically, establishing significant bounds on their performances. We also apply the methods experimentally to both synthetic graphs and real-world networks to determine the improvement inclusive sampling offers over exclusive sampling, which version of inclusive sampling is stronger, and what graph characteristics affect these results.
Relationship graph for organisations using communication tools
ABSTRACT. As the future of work becomes more collaborative and organisations tend to be more digitalised, understanding how individuals build and lose their relationships has become critical to businesses.
By examining how people communicate across their organisation, business leaders can leverage organisational network analysis to support data-driven decision-making and improve their business processes, measure organisational health and achieve a higher-performing organisation.
We propose a method for organisations to create a relationship score based on communication logs and we explain how they can deduce important insights by adding information found in other systems and by extracting partial subgraphs which give a set of nodes and links verifying a certain property on the source graph.
Inferring Eukaryote-Prokaryote interactions in microbial communities: a multi-layer network approach
ABSTRACT. We study the co-occurrence of marine eukaryote and Prokaryote taxa based on abundances from the Malaspina project. We infer multilayer interactions between operational taxonomic units (OTU) based on their abundance profiles. Each layer corresponds to eukaryote and prokaryote taxa, respectively, and the interactions occur in the same layer (eukaryote-eukaryote interactions, or Prokaryote-Prokaryote interaction) or between layers (Eukaryote-Prokaryote interactions). We describe the most likely interactions which reflect similar abundance patterns as a proxy for symbiotic interactions.
Multiomics-based inference of cell type-specific regulatory networks in early human embryos
ABSTRACT. Cells respond to environmental changes by activating gene expression programmes that allow them to maintain cellular homeostasis. In multicellular organisms, these programmes differ from cell type to cell type and are thus an important component of a cell's identity. Therefore, mapping the molecular networks responsible for gene expression regulation is key to understand homeostatic maintenance and failure, as well as cell type specification. Although sequencing technologies have allowed for the development of high-throughput assays to measure gene expression, chromatin accessibility and methylation levels, the determination of regulatory interactions is restricted to a few transcription factors (TFs) with good quality antibodies. This has prompted the development of computational methods to infer regulatory relationships between TFs and their target genes. Unfortunately, even the best computational approaches predict a considerable number of false positive interactions, which originate from indirect associations: a path a->b->c can result in the prediction of a->c even if there is no direct link between those nodes. Here, we assessed whether the integration of other types of omics datasets with transcriptomic-based predictions could help with the removal of indirect TF-gene relationships and produce more reliable regulatory networks. We did this using data from blastocyst stage human embryos. This has the advantage of being a biological context with only three well-defined cell types, good quality omics data and sufficient domain knowledge to evaluate the plausibility of our predictions. On the other hand, this biological context also has many open questions that can be addressed with insights from regulatory network models.
Dynamical patterns of user activity during electoral campaigns and debates in Twitter
ABSTRACT. In the last years a lot of studies appeared in relation to the users behavior in Twitter on different political topics. In particular, special attention has been payed to electoral campaigns using different approaches.
For example, counting the number of retweets and mentions to the accounts of political parties and candidates, inferring collective opinion through the interaction network, etc. We focus in the analysis of the dynamical patterns of the time series of users activity during several electoral elections at different temporal scales.
Over the last five years numerous political events have taken place in Spain. Such as several regional and four general elections. This fact has provided us with a unique opportunity to analyze behavioral trends and identify activity patterns in several consecutive political elections. In particular, we have analyzed user behavior in the electoral campaigns of 2015, 2016, April 2019 and November 2019.
A Second-Order Adaptive Network Model for Learner-Controlled Mental Model Learning Processes
ABSTRACT. Learning new knowledge or a new skill usually requires the development of an ade-quate internal mental model in the form of a mental network. The learning process for such an internal model involves (first-order) mental network adaptation. Such a learning process often integrates different elements, such as learning by observation and learning by instruction. For an effective learning process, a main issue is to get an appropriate timing of the different elements. To control the timing of these elements of a learning process, the mental network adaptation process has to be adaptive itself: second-order mental network adaptation. The second-order adaptive mental network model proposed here addresses this, where the first-order adaptation process models the learning pro-cess of mental network models and the second-order adaptation process controls the timing of the elements of the learning process. It is illustrated for learner-controlled mental model learning in the context of driving a car where the learner is in control of the integration of learning by observation and learning by instruction.
ABSTRACT. Residential segregation is analyzed via an extended Schelling segregation model including in it economic terms. For different parameters values a wide range of phenomena with social interpretations appear, i.e. gentrification. This social reality occurs in urban zones where a economical gap between people exists and the financially handicapped ones are forced to leave. Another interesting scenario happens when the environment is economically deprived and only small clusters remain on it. This case higlights the importance of cooperation to overcome monetary issues. Both situations evolves in the system trough avalanches whose complementary cumulative distribution functions are depicted and fitted to power-law curves. The estimated exponents are in the range of other works related to self-organized criticality in different systems.
Scientific collaboration of researchers and organizations: A two-level blockmodeling approach
ABSTRACT. Understanding scientific collaboration (SC) patterns among researchers on different social levels is fundamental for the development and successful implementation of the R&D policies. Therefore, the study aims to provide a simultaneous insight into the SC patterns among individual researchers (individual level) and institutions (institutional level). Considering different social levels is important because SC on different levels are interdependent. In this study, SC on an individual level is operationalized by co-authorship of scientific paper while two institutions are said to collaborate if they collaborated on a joint research project. The data were retrieved by the Slovene information systems that contain information about researches and organizations that are registered at the Slovene research agency. Based on these data, the two-level SC networks were constructed for the researchers from the field of Social sciences and their institutions. The data for the period between 2005 and 2015 are analyzed by using the k-means-based blockmodeling approach for linked networks that enable to simultaneously blockmodel different levels of multi-level networks. The results for different levels are interpreted. The analysis of the two-mode network reveals that their institution membership in a large part determines the SC on an individual level.
Resident’s Alzheimer disease and social networks within a nursing home
ABSTRACT. This study investigated social networks of residents of a nursing home including residents with Alzheimer disease. We focus on friendship networks between residents to investigate the mutual relationship between social communication and friendship companies and how they affect the Alzheimer disease process and vice versa.We used methods of SNA including interviews, observations and social network analyses. The setting was a nursing home in Tehran with 42 residents that 10 of them were diagnosed with Alzheimer disease. The cross-sectional data were collected within this nursing home. Using SNA as a useful tool in Alzheimer researches has the potential to create new ideas and plans for a better caring and attributing special policies and strategies to residents with Alzheimer in nursing homes in order to prevent, control and treat for Alzheimer disease. This study showed that resident with Alzheimer or dementia have dramatically less relationships with others in comparison to the normal residents.
Using pandemic periods to improve now-casting modelsbased on search engine data
ABSTRACT. Online searches have been used to study different health-related behaviours including identifying disease outbreaks. However, as several reasons can motivate individuals to seek online information, particularly during a pandemic, current models, blind to whether such activity is related to an actual disease or not, are of limited interest. Here we propose a methodology to disentangle search behaviours linked to general information seeking (media driven) reflecting, for example, curiosity or fear, from searchers looking for treatment information (disease driven). The difficulty in making this separation leads more often than not to the disregard of pandemic periods for disease surveillance. However, we argue that pandemics offer a unique opportunity to study the weight of the several drivers of online behaviour. As a case study, we apply our methodology to two respiratory infectious diseases able to cause a pandemic, 2009-H1N1 and COVID-19.
Detecting People Interested in Non-Suicidal Self-Injury on Social Media
ABSTRACT. We propose a supervised learning approach to detect people interested in Non-Suicidal Self-Injury (NSSI). We treat the task as a binary classification problem, and build classifiers based upon features extracted from people self-declared interests. Experimental evaluation on a real-world dataset, the LiveJournal social blogging networking platform, demonstrates the effectiveness of our proposed model.
Quarantined world through SoundCloud hashtags network
ABSTRACT. In March and April all the world was involved in a pandemic lockdown due to the
Covid-19 emergency. During that period, people used social networks not only as entertainment channels, but above all as a place for expressing moods, collecting news
and sharing passions through ‘hashtags’. The research is aimed to verify, by examining
the hashtags network of the online music platform ‘SoundCloud’, whether the ‘hash-
tag’ is not only an index of the track, but a real form of communication between users.
Temporal Bibliometric Networks on Coronavirus reveal Politics of the Pandemic
ABSTRACT. A world crisis always brings forth new and unexpected responses that are fascinating from both scientific and social standpoints. The current global pandemic arising from the novel coronavirus outbreak shook the scientific world like few other events. In less than 12 months, the sheer numbers of publications suggested that scientists from all walks of life, irrespective of their respective fields of interests, shifted to COVID19 research and this led to many discoveries and also new directions of research for many. In contrast, the pandemic also resulted in shocking factoids based on incomplete interpretations of scientific data, which have continued to be foisted on the public at an alarming rate during the past nine months of COVID, the most colossal of these being the Lancet HCQ story. In this work, we use a Bibliometric approach to analyse the 2020 COVID19 publications, and compare these to the past twenty years of Coronavirus research, spanning two major epidemics of SARS and MERS. The patterns provide insights into the politics of this pandemic, opening up text and data mining as an important tool for understanding larger patterns across diverse fields.