NETSCI19: NETSCI 2019
PROGRAM FOR THURSDAY, MAY 30TH

11:15-12:45 Session 3A: Econ (money)
11:15
How production networks amplify economic growth

ABSTRACT. It is widely understood that the most important cause of long-term GDP growth is technological progress, though the factors that drive this progress are not fully understood. GDP is generated in an economy by a complex web of industries that buy goods from one another, convert these to new goods, and sell their production to households or other industries. Here, we argue that the network of production amplifies the effects of technological improvements as they propagate along chains of production, using new predictions for how this amplification depends on the shape of the network. A key property is an industry's output multiplier, which can be interpreted as the average number of production steps required to make a good. Examples for industries in the U.S. and China are shown in panel (a) of the figure.

Taking a network perspective, we derive two new predictions about output multipliers that are latent within standard economic models. First, the expected price reduction in an industry will be linearly correlated with its output multiplier. This occurs because the benefits of industries' productivity improvements accumulate downstream along the production chains ending at the industry, to an extent captured by the average length of these chains (figure, panel b). Second, changes in a country's GDP will be correlated with the average of its industries' output multipliers. We also derive related predictions for price co-movement and the within-industry variation of price changes. We test these predictions against future price changes and growth using data from the World Input Output Database, finding results in excellent agreement. (Panel c of the figure shows an example.) The results show how properties of the network of production exert an important influence on growth. We also consider how the network structure may evolve over long periods, suggesting insight into the process of structural change whereby output and employment realize large shifts with time, such as from agriculture to manufacturing to services.
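
For readers unfamiliar with the quantity, the output multiplier of an industry is, in standard input-output analysis, a column sum of the Leontief inverse. The sketch below illustrates the computation on a made-up three-industry table; the numbers are purely illustrative, not the data used in the talk.

```python
import numpy as np

# Hypothetical 3-industry input-output table: Z[i, j] is the value of goods
# industry i sells to industry j; x[j] is the gross output of industry j.
Z = np.array([[10.,  4.,  6.],
              [ 3., 20.,  5.],
              [ 8.,  2., 15.]])
x = np.array([50., 60., 70.])

A = Z / x                               # technical coefficients A[i, j] = Z[i, j] / x[j]
L = np.linalg.inv(np.eye(3) - A)        # Leontief inverse (I - A)^(-1) = I + A + A^2 + ...
output_multipliers = L.sum(axis=0)      # column sums: total output needed per unit of final demand

print(output_multipliers)
```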

11:30
Detecting Bot Activity in the Ethereum Blockchain Network
SPEAKER: Morit Zwang

ABSTRACT. The Ethereum blockchain network is a decentralized platform enabling smart contract execution and transactions of Ether (ETH) [1], its designated cryptocurrency. Ethereum is the second most popular cryptocurrency with a market cap of more than 100 billion USD, with hundreds of thousands of transactions executed daily by hundreds of thousands of unique wallets. Tens of thousands of those wallets are newly generated each day. The Ethereum platform enables anyone to open multiple new wallets [2] free of charge, resulting in a large number of wallets that are controlled by the same entities. This attribute makes the Ethereum network a breeding ground for activity by software robots (bots). The existence of bots is widespread in different digital technologies and there are various approaches to detect their activity, such as rule-based methods, clustering, machine learning, and more [3,4]. In this work, we demonstrate how bot detection can be implemented using network theory. Being a platform used for human interactions, the Ethereum network can be described and modeled using a network-theoretic approach. The degree distribution of such networks, for example, often displays a power law [5]. This phenomenon can also be observed in the Ethereum network when constructing a graph that represents Ethereum transactions between wallets—where each wallet is a vertex and a transaction between two wallets is an edge. Previous research has demonstrated that time differences between consecutive events in many human activities display a power law distribution as well. The time difference between consecutive transactions in this work refers to the number of minutes between every transaction and its prior transaction. The time difference was calculated for the transactions of each wallet separately, and we created a histogram from the time differences of all the transactions of all wallets in the Ethereum network. The histogram of the time difference distribution of an arbitrary one-week sample shows that, indeed, the time difference between consecutive transactions follows a power law distribution (Fig. 1). It can be observed that the distribution of time differences between the consecutive transactions of Ethereum wallets does not perfectly fit the power law model and is characterized by multiple spikes. Each spike represents a collection of highly correlated wallets which deviate from the expected power law distribution rather than resembling spontaneous human activity. Anomalies from the power law model in human interaction networks might represent the occurrence of potentially interesting events [6]. In this case, we assume that transactions which are anomalous to the power law model represent non-human behavior executed by bots. This assumption is based on the nature of the anomalies (spikes occurring at a very specific time difference) and on the observation of other properties common to the anomalous transactions, such as having the same transaction value.
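
As a rough illustration of the measurement described above, the sketch below computes per-wallet inter-transaction times and pools them into a log-binned histogram; the transaction list and wallet identifiers are hypothetical placeholders for real Ethereum transaction data.

```python
from collections import defaultdict
import numpy as np

# Hypothetical (timestamp_in_minutes, sender_wallet) tuples, standing in for
# one week of Ethereum transfers.
transactions = [(0, "0xA"), (5, "0xA"), (6, "0xB"), (65, "0xA"), (66, "0xB")]

times_by_wallet = defaultdict(list)
for t, wallet in sorted(transactions):
    times_by_wallet[wallet].append(t)

# Inter-transaction times (minutes) per wallet, pooled over all wallets.
diffs = []
for times in times_by_wallet.values():
    diffs.extend(np.diff(times))

# Histogram on log-spaced bins; a clean power law is a straight line on log-log
# axes, while bot-like activity shows up as spikes at specific time differences.
counts, bins = np.histogram(diffs, bins=np.logspace(0, 4, 50))
```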

11:45
Structural changes in an interbank market across the financial crisis: A multiple core-periphery analysis

ABSTRACT. Interbank networks have been shown to have core-periphery structure (Fig. 1(a)), where core banks trade with various banks in the network, while peripheral banks trade mostly with core banks but not with other peripheral banks. However, this long-standing observation recently came into question: a recent study showed that interbank networks were closer to a bipartite structure than to a core-periphery structure on short time scales [Barucca and Lillo, 2016]. Here we revisited this question [Kojaku et al., 2018] by applying a novel core-periphery detection algorithm [Kojaku and Masuda, 2017] to a temporal eMID interbank network between 1999 and 2012. Unlike many previous algorithms, this algorithm can identify multiple groups with core-periphery structure in a network. On short aggregation time scales (i.e., days), we found a structure close to bipartite, supporting the results of that recent study [Barucca and Lillo, 2016]. On larger time scales, we found core-periphery structure until January 2008 and bipartite structure afterwards (Fig. 1(b)), implying that the core banks refrained from trading with other core banks after the global financial crisis. Additionally, the algorithm revealed segregation between the Italian and non-Italian banks and how banks changed their roles in the networks over time.

12:00
Source-destination betweenness determines bargaining power and costs of trading in networks.

ABSTRACT. Globalization has become a prominent feature of the modern economy. In this sense, supply, service and trading chains play an essential role in different contexts such as agriculture, transport and communication networks, international trade and finance (1). Both the earnings of the different agents involved and the efficiency of the system depend on the prices proposed by the intermediaries. The purpose of this work is to understand the basic principles of intermediation and the role of network topology on prices and efficiency.

Game theory constitutes a useful framework to study hypothetical situations among competing trading agents. In this context, the Nash Bargaining Game (2) studies how two agents share a surplus that they can jointly generate. As a generalization of bargaining games to n players, Choi et al. proposed and tested in the laboratory a model of intermediation pricing (3). In that model, the intermediaries are located at the nodes of a network, and they may post a price for the passage of a good. If the total cost of the cheapest path does not exceed the price of the good, the nodes located on it get their price as a payoff. They found that intermediaries' rents depend on their criticality; in this context, a node is said to be critical if it lies on all possible paths between the source and destination. On the one hand, they proved that criticality is sufficient for the existence of intermediation rents and, on the other hand, their experiments showed that criticality is also necessary for large surplus extraction by intermediaries. It should be noted that those experiments were done with small groups, on relatively simple networks of up to 10 subjects. However, it is not clear that this need for criticality can be extrapolated to larger and more complex networks which, despite not presenting critical nodes, exhibit a wide range of node centralities.

In this work, we deal with this issue by incorporating larger and more complex networks to gain empirical insights into the role of network topology in commerce, and into how intermediaries behave in a realistic trading network. To this end, we conduct experiments with subjects embedded in networks of up to 50 nodes. It should be noted that in these networks there are no critical nodes, so a direct extrapolation of previous results would indicate that the intermediation rents are null. Our first finding is that, even when there are no critical traders, network topology affects intermediation: the intermediation rents vary significantly with the network topology, with random graphs showing much smaller intermediation rents than small-world-like networks. Our second finding is that neither criticality nor centrality affects pricing. Nevertheless, although node centrality is uncorrelated with pricing, it is positively correlated with profits, which is a consequence of the fact that central nodes display a higher probability of being on the successful trade path. In addition, we have found that traders raise prices when they are on the cheapest path, and lower prices otherwise. Based on this finding, we have built a model that qualitatively reproduces the experimental results and allows us to study the implications of the observed behavior in networks of any size and topology. The results of the model show that trade cost is jointly determined by the maximum number of independent paths and by the average centrality of nodes. These findings have applications in transportation and communication networks, supply chains, intermediation, funding, and financial markets.

References:

1: Anderson JE, Van Wincoop E (2004) Trade costs. Journal of Economic Literature 42(3): 691–751.

2: Nash Jr JF (1950) The bargaining problem. Econometrica: Journal of the Econometric Society. 155-162.

3: Choi S, Galeotti A, Goyal S (2017) Trading in networks: Theory and experiments. Journal of the European Economic Association 15(4): 784–817.
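
A minimal toy version of the intermediation pricing game described in the abstract above (not the authors' experimental platform) can be simulated as follows; the node names, posted prices, and value of the good are illustrative.

```python
import networkx as nx

# Intermediaries sit on nodes between source S and destination D and post prices;
# if the cheapest chain of intermediaries costs no more than the value of the
# good, every intermediary on that chain earns its posted price.
G = nx.Graph([("S", "a"), ("a", "D"), ("S", "b"), ("b", "c"), ("c", "D")])
prices = {"a": 6.0, "b": 2.0, "c": 3.0}   # posted prices of intermediaries
value_of_good = 10.0

def path_cost(path):
    return sum(prices[n] for n in path[1:-1])   # source/destination post no price

cheapest = min(nx.all_simple_paths(G, "S", "D"), key=path_cost)
if path_cost(cheapest) <= value_of_good:
    payoffs = {n: prices[n] for n in cheapest[1:-1]}   # intermediation rents
    surplus = value_of_good - path_cost(cheapest)      # kept by source and destination
```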

12:15
Network Analysis of US State GDP Fluctuations: Great Recession and Recovery

ABSTRACT. From late 2007 to the middle of 2009, the US economy went through the "Great Recession." The pace of recovery has been slow, and in 2012, the US went into a "near recession." While the Great Recession has been officially over for ten years now, economic experts are still arguing about the exact nature and causes of the recession, and correct policies to alleviate it. Our previous investigation of the fluctuations of the US state GDPs showed that the correlations increased markedly during the recession – the states went into a more synchronized state. We performed a network analysis and a principal component analysis, and found the same clusters of states with both methods.

We have extended our analysis to look at the fluctuations of the various industry sectors for the states, and repeated our analysis. The goal is to clarify which sectors might have caused the recession, and if there are any early signals of the oncoming recession in the data. We find a very strong synchronization in the Financial and Insurance Sector of the state GDPs slightly before and during the Great Recession, see figure. We also note a synchronization signal in retail during the near recession of 2012.

We will present results for various industry sectors up until the latest data available at the time of the conference, and investigate the patterns of synchronization, the networks formed between states and industries, and their correlation to recessions and recovery.
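
A minimal sketch of the kind of correlation-network and principal-component analysis described above, using synthetic data in place of the actual state GDP series:

```python
import numpy as np
import networkx as nx

# gdp_growth: hypothetical array of shape (n_quarters, n_states) holding
# quarterly GDP growth fluctuations for each US state (stand-in for real data).
rng = np.random.default_rng(0)
gdp_growth = rng.normal(size=(60, 50))

C = np.corrcoef(gdp_growth, rowvar=False)        # state-by-state correlation matrix

# Threshold the correlations to build a network of strongly co-moving states.
threshold = 0.5
A = (np.abs(C) > threshold) & ~np.eye(50, dtype=bool)
G = nx.from_numpy_array(A.astype(int))

# Principal components of the same fluctuations: the share of variance carried
# by the leading component grows during synchronized episodes such as recessions.
eigvals, eigvecs = np.linalg.eigh(C)
leading_share = eigvals[-1] / eigvals.sum()
```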

12:30
Follow the money: making sense of financial transaction networks

ABSTRACT. Payment processing systems passively record everyday economic activity at its most basic unit—individual transactions. Financial transaction data from these systems describe microeconomic actions at macroeconomic scales in an inherently networked manner. Here, we extend temporal network analysis to financial transaction data from payment systems by noting that transactions also describe the literal movement of money, which must obey basic accounting constraints. We introduce the concept of balance-respecting trajectories to explicitly represent observed flows of money through a real, large-scale, mobile money payment system. A dynamic programming algorithm “follows” transacted money concurrently through all accounts. We find that observed trajectory motifs intuitively summarize mobile money data, isolating known use-cases. We also quantify circulation to find that mobile money is primarily single-use, even though it is digital and could be re-transacted indefinitely at little cost. We go on to aggregate trajectories, and discover that different use-cases create dramatically different structures of money flow at the system scale: payments result in a hub-and-spoke structure, money transfers form a largely amorphous structure, while money storage creates richer meso-scale structure based primarily on geography. Within the money storage sub-network we also uncover systematic fee evasion, a mildly fraudulent behavior, among scores of small, isolated subgroups of mobile money agents. This novel network approach promises to facilitate descriptive work on financial transaction data from any payment system, and encourage further study of the structure of economic activity at its most elemental.
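
The idea of balance-respecting trajectories can be illustrated with a toy first-in-first-out allocation of incoming funds to outgoing transactions. The authors' dynamic-programming algorithm is more sophisticated (handling fees, initial balances, and large-scale data), so the following is only a sketch of the accounting constraint, with made-up transaction identifiers.

```python
from collections import deque, defaultdict

incoming = defaultdict(deque)   # account -> queue of (amount, trajectory so far)
trajectories = []               # finished trajectories: lists of transaction ids

def process(txn_id, src, dst, amount, is_cash_out=False):
    """Allocate an outgoing transaction against the sender's queued incoming funds (FIFO)."""
    remaining = amount
    while remaining > 1e-9 and incoming[src]:
        amt, path = incoming[src][0]
        used = min(amt, remaining)
        new_path = path + [txn_id]
        if amt - used > 1e-9:
            incoming[src][0] = (amt - used, path)   # partially consume the oldest parcel
        else:
            incoming[src].popleft()
        if is_cash_out:
            trajectories.append(new_path)           # money leaves the system
        else:
            incoming[dst].append((used, new_path))
        remaining -= used
    if remaining > 1e-9 and not is_cash_out:        # untracked funds (e.g., a cash-in)
        incoming[dst].append((remaining, [txn_id]))

process("t1", "bank", "alice", 100.0)                  # cash-in: alice receives 100
process("t2", "alice", "bob", 60.0)                    # alice forwards 60 to bob
process("t3", "bob", "atm", 60.0, is_cash_out=True)    # bob cashes out
# trajectories -> [['t1', 't2', 't3']]
```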

11:15-12:45 Session 3B: Epidemics (data)
11:15
Reconstructing geographic transmission pathways of ZIKV epidemic in the Americas

ABSTRACT. The 2015-2016 ZIKV epidemic in the Americas posed unprecedented challenges to public health communities due to a lack of reliable and timely surveillance resources and, consequently, limited knowledge of international transmission risks. Genomic epidemiology has helped identify the origin and introduction time of the virus in ZIKV-affected countries in the Americas, based on sequencing data of the virus's genome and phylogenetic inference models; however, the specific path of the outbreak is still unknown.

In this work, we reconstruct the geographic transmission history and pathways of the Zika epidemic by inferring temporal invasion networks simulated from a data-driven global stochastic vector-borne disease epidemic model (GLEAM). GLEAM tracks infectious individuals traveling around the world on a daily basis, allowing us to build probabilistic international invasion networks from an ensemble of stochastic simulations. We then identify the maximum-likelihood invasion tree for the ZIKV epidemic in the Americas. Our estimates of the introduction time and the most likely origin of the first cases in the United States agree with published phylogenetic studies: the most likely source is Puerto Rico, with an introduction in January 2016.

By incorporating data-driven disease dynamics, our model also provides accurate estimates of the epidemic peak period for countries such as Haiti, Venezuela, and Mexico. Furthermore, our estimate of the peak period of microcephaly cases in Brazil aligns with what is found in the surveillance data.

Our approach represents a modeling and computational effort to understand how ZIKV was introduced in the Americas and the subsequent spatio-temporal transmission pattern of the ZIKV epidemic.
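
As a toy illustration of turning an ensemble of stochastic simulations into a most-likely invasion tree, the sketch below counts, for each country, which source seeded it most often across runs. The run data and country codes are hypothetical, and the actual GLEAM-based inference is considerably richer.

```python
from collections import Counter, defaultdict

# Each simulated run reports, for every newly invaded country, which country seeded it.
runs = [{"US": "PR", "HT": "DO"},
        {"US": "PR", "HT": "VE"},
        {"US": "CO", "HT": "DO"}]

counts = defaultdict(Counter)
for run in runs:
    for country, source in run.items():
        counts[country][source] += 1

# Taking the per-country argmax of the empirical seeding frequencies yields a
# maximum-likelihood invasion tree when each country's source is treated independently.
ml_parent = {c: srcs.most_common(1)[0][0] for c, srcs in counts.items()}
print(ml_parent)   # e.g. {'US': 'PR', 'HT': 'DO'}
```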

11:30
Detecting pathogen type interactions from coinfection patterns: a comparison of multivariate statistical models in reconstructing eco-epidemiological networks
SPEAKER: Irene Man

ABSTRACT. Network analysis is entering fields where network structures are unknown, and statistical methods for inferring networks from data have burgeoned over the past decade. A crucial requirement for the valid application of such methods is the justification, based on principles grounded in subject matter, of the particular network model adopted. In validation studies, the ‘true’ network is often simulated from known multivariate probability distributions, with the risk of oversimplifying the underlying causal mechanisms in real systems. However, if we are to have models of a system on different levels of complexity, we should understand how they relate and when they are consistent.

Here, we consider the application to a challenging topic in epidemiology, namely the reconstruction of eco-epidemiological networks, consisting of competitive and synergistic interactions during infection with various (sub)types of a pathogen. In prior work [1], we demonstrated that, in the absence of long-lasting cross-immunity, the interaction parameters between two types can be estimated as the odds ratio of coinfection, provided that individual hosts are sampled from an endemic equilibrium and possible confounders are properly adjusted for. In the present work, we extend the setting to an arbitrary number of (potentially interacting) types and show how the conditional independence structure in coinfection patterns can be employed to reconstruct the network of pathogen type interactions.

Specifically, we evaluated a suite of statistical network models for describing the joint distribution of multivariate binary data, denoting endemic infection states of pathogens with up to 10 circulating (sub)types. The ‘true’ networks were composed of type-specific interactions in acquisition and clearance, built into SIS (Susceptible-Infected-Susceptible) dynamic models for the ensemble of pathogen types. The recovered networks were compared to the ‘true’ networks for varying sample size, prevalence, network configurations, and interaction structures. The associations inferred by the statistical network models were shown to correspond to the underlying interaction parameters, with an exact interpretation in the case of multiplicative symmetric interaction structures. Network topologies were reconstructed with high accuracy for datasets with large sample sizes, with near optimal sensitivity and specificity for edge recovery in models relying on sparse regression techniques (e.g. Ising models using ℓ1-regularization) or log-linear analysis with BIC-based model selection. Both methods retained good specificity at reduced sample sizes, albeit at the expense of loss in sensitivity.

In summary, our work shows how widely used mechanistic transmission models can be used to study patterns of co-occurrence when considering multiple pathogens or pathogen (sub)types. Vice versa, we demonstrate how these patterns may be employed for detecting pathogen (type) interactions. This has direct relevance for public health, as it helps to anticipate and understand the impact of interventions against a subset of pathogenic types. More generally, a better understanding of the relation between probabilistic and mechanistic models will improve the interpretation of recovered network structures in specific applications, and broaden the scope of statistical network inference.

[1] Man I, Wallinga J, Bogaards JA. Inferring pathogen type interactions using cross-sectional prevalence data: opportunities and pitfalls for predicting type replacement. Epidemiology 2018; 29(5): 666-674.
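
One of the sparse-regression approaches mentioned above can be sketched as neighborhood selection with ℓ1-penalized logistic regression (in the spirit of Ising-model estimation for binary data). The binary infection matrix below is randomly generated for illustration rather than simulated from the SIS dynamics used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: n_hosts x n_types binary matrix of endemic infection states
# (1 = host infected with that pathogen type).
rng = np.random.default_rng(1)
X = (rng.random((2000, 10)) < 0.2).astype(int)

n_types = X.shape[1]
coef = np.zeros((n_types, n_types))

# Neighborhood selection: regress each type on all others with an l1 penalty.
for j in range(n_types):
    others = np.delete(np.arange(n_types), j)
    model = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
    model.fit(X[:, others], X[:, j])
    coef[j, others] = model.coef_[0]

# Symmetrize and keep nonzero entries as inferred interaction edges.
interaction = (coef + coef.T) / 2
edges = [(i, j) for i in range(n_types) for j in range(i + 1, n_types)
         if abs(interaction[i, j]) > 1e-8]
```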

11:45
Human mobility shapes the HIV epidemic in Namibia

ABSTRACT. Namibia has a severe HIV epidemic. The infection is endemic in the general population, and there is significant geographic variation in prevalence (6%-40% for women, 0%-24% for men). Along with other countries in sub-Saharan Africa (SSA), Namibia has implemented extensive interventions aimed at preventing infections. However, all of them are based on the implicit assumption that all risk is localized: people get infected by members of their own communities. We argue that, in order to eliminate HIV in SSA (a goal set by the World Health Organization), a key ingredient is still missing: human mobility. The Namibian population is highly mobile. Consequently, individuals may be at risk of acquiring HIV both in their home community (from other residents or from visitors from other communities) or when visiting other communities (from residents of these communities). We quantified the importance of these three types of risk in the different geographic areas, and computed the various flows of risk importation and exportation among them. We used the network of mobility among the 106 Namibian constituencies (administrative areas), inferred from mobile phone records, and coupled it to localized prevalence values. With that, we built the three networks of risk flows. Our results show that mobility does not substantially change, in any area of the country, the number of new infections that would occur in the absence of mobility, but it substantially “redistributes” risk (Fig. 1). Hence, mobility changes which individuals become infected and where they acquire infection. Our results demonstrate that it is essential to design specialized interventions that, informed by mobility patterns, reach the groups who are at highest risk of getting infected, and increase adherence to treatment of the infected individuals.
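
The decomposition of acquisition risk into "at home from residents", "at home from visitors", and "while traveling" can be illustrated with a toy two-area mobility matrix; this is only a schematic of the idea, not the study's fitted model, and all numbers are made up.

```python
import numpy as np

# M[i, j]: fraction of time residents of area i spend in area j (rows sum to 1).
# prev[i]: HIV prevalence among residents of area i; pop[i]: population of area i.
M = np.array([[0.8, 0.2],
              [0.1, 0.9]])
prev = np.array([0.30, 0.05])
pop = np.array([1.0e5, 2.0e5])

person_time = M * pop[:, None]                 # person-time of i-residents spent in j
local_pt = np.diag(np.diag(person_time))       # residents present in their own area
visitor_pt = person_time - local_pt            # visitors present in each area

# Prevalence encountered in area j, split into local and visitor contributions.
prev_local = (local_pt * prev[:, None]).sum(0) / person_time.sum(0)
prev_visitor = (visitor_pt * prev[:, None]).sum(0) / person_time.sum(0)
eff_prev = prev_local + prev_visitor

# Risk components experienced by residents of each area (up to a common factor).
home_from_residents = np.diag(M) * prev_local
home_from_visitors = np.diag(M) * prev_visitor
away = (M * eff_prev).sum(1) - np.diag(M) * eff_prev
```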

12:00
Multi-level physician networks provide novel descriptors of cardiologists related to patient outcomes
SPEAKER: Erika Moen

ABSTRACT. Background. An inverse association between patient mortality and surgeon procedure volume is one of the strongest empirical regularities in health services research. These findings suggest that patients may improve their chances of survival by seeking treatment from high volume surgeons. If patients are channeled to high volume surgeons for selected surgical procedures, it follows that these surgeons would become central within the patient-sharing patterns of a health system. This raises the question of whether established patient-sharing patterns may be related to surgeon procedure volume and patient outcomes. Methods. Using national Medicare claims, physician networks can be inclusive of the entire nation or bounded using previously developed algorithms to assign physicians to “groups” (e.g., hospitals or hospital referral regions). Degree centrality is a measure of the number of ties a physician has to other physicians within the network boundary. A mathematically appealing feature of degree centrality is that it perfectly partitions into the sum of the measure evaluated for a sub-network and on the complement of the sub-network (Figure). Recognizing that a physician may be central within the hospital network but have few ties across hospitals, or vice versa, we evaluate both “local” (within hospital) and “external” (across hospital) degree centrality measures to determine their respective impacts on patient outcomes. The primary outcome of interest was risk-adjusted 2-year case fatality. Hierarchical logistic regression estimated the effects of physician local and external degree centrality on case fatality. Results and Discussion. We observed that patients have improved outcomes when treated by physicians who have high local degree centrality within the hospital patient-sharing network. This suggests that selective referrals within hospitals to high volume and central physicians would lead to improved patient outcomes. We also observed an increased likelihood of 2-year case fatality among patients treated by physicians with a high external degree, which raises several questions about physicians whose networks predict worse outcomes. We discuss potential clinical interpretations of these results and outline future work to explore these relationships further.
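
The degree partition mentioned above is easy to state in code: for each physician, count ties to colleagues in the same hospital (local) and ties to everyone else (external); by construction the two parts sum to the ordinary degree. The toy network and hospital labels below are hypothetical.

```python
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)])       # toy patient-sharing ties
hospital = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}           # hypothetical group labels

local_degree, external_degree = {}, {}
for v in G:
    nbrs = list(G.neighbors(v))
    local_degree[v] = sum(hospital[u] == hospital[v] for u in nbrs)   # within-hospital ties
    external_degree[v] = len(nbrs) - local_degree[v]                  # across-hospital ties

# The two parts always add back up to the ordinary degree.
assert all(local_degree[v] + external_degree[v] == G.degree(v) for v in G)
```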

12:15
Inferring high-resolution disease-specific human mixing patterns
SPEAKER: Dina Mistry

ABSTRACT. The spreading dynamics of infectious diseases is highly dependent on the structure of social contact networks within populations. Modeling disease dynamics thus requires high-resolution data describing the social contact networks and the mixing patterns within them alongside knowledge of the mechanisms of disease transmission. Previous work has used a variety of methodologies to study human mixing patterns often at the national scale. However, this level of resolution can obscure the heterogeneity of contact patterns within large and diverse nations and thus the diverse range of epidemic outcomes that may be observed in different populations.

Here we propose a computational framework to develop synthetic contact networks via a combination of publicly available census and survey data on key sociodemographic features and demonstrate how they can be used to provide insight on the dynamics of disease transmission and make inference on important epidemiological parameters. In particular, we focus on the social settings key to the spreading of airborne infections: the household, school, workplace, and the general community, to characterize who individuals are likely to interact with or be in close proximity to during the day, and thus who they may infect or be infected by. We develop synthetic contact networks for several nations with large and diverse populations and focus on inferring the contact networks at the subnational scale to highlight the heterogeneity present within these populations. Specifically, we develop the synthetic contact networks at the subnational scale for Australia, Canada, China, India, Israel, Japan, Russia, South Africa, and the United States of America.

The estimated contact networks are then summarized in terms of age-mixing patterns of the population. Age has been shown to be a key determinant in shaping the spread of airborne diseases and thus provides a relevant dimension of transmission to project the mixing patterns on. The contact patterns from the different settings are then combined to generate a matrix of effective contacts by age with weights derived from an analysis of empirical contact pattern data from diary studies. We employ the generated age-specific contact matrices describing the number of contacts between an infectious individual of age j and susceptible individuals of age i in age-structured models of infectious disease spreading. We examine the scenario of the same infectious disease spreading in all populations in our study and show how the heterogeneous contact patterns and age distributions result in diverse epidemic outcomes, even within the same country. This divergence highlights the importance of characterizing contact patterns at a higher resolution than the national level, at least in large and highly culturally diverse countries as those included in this analysis.
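
To illustrate how such age-specific contact matrices enter age-structured transmission models, here is a minimal SIR sketch with two age groups; the matrix, populations, and rates are made-up placeholders rather than the study's estimates.

```python
import numpy as np

# C[i, j] = average daily contacts an individual of age group i has with group j;
# N[i] = population of age group i.
C = np.array([[9.0, 3.0], [2.0, 5.0]])
N = np.array([2.0e6, 3.0e6])
beta, gamma = 0.03, 0.2            # per-contact transmission rate, recovery rate
dt, steps = 0.1, 2000

S, I, R = N - 10.0, np.full(2, 10.0), np.zeros(2)
for _ in range(steps):
    # Force of infection on group i: contacts with each group j, weighted by the
    # probability that the contacted person is infectious.
    lam = beta * (C * (I / N)).sum(axis=1)
    new_inf = lam * S * dt
    new_rec = gamma * I * dt
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
```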

12:30
Seed node similarity and the diffusion of deworming treatments in Uganda

ABSTRACT. The network location of the first informed individuals, i.e. the network seeds, affects how well information spreads as well as influences participant uptake of public health interventions. Yet, little is known about how seed node similarity affects overall diffusion rates or the division of labour between seed nodes. In rural Uganda, we tracked 6,148 people in 1,118 households from 28 villages during blanket deworming treatment programmes. Within each village, two community medicine distributors (CMDs)—seed nodes—were tasked with moving from door-to-door to treat all eligible individuals. We examined the influence of both structural equivalence in friendship networks (shared direct and indirect ties) and attribute similarity (13 indicators of demographic and socioeconomic homophily) of seeds against outcomes of village treatment rates and the ratio of treatment rates between the two seeds. Here we show that seed node similarity does not affect overall treatment rates. However, a more equal division of labour between seeds was associated with increased overall treatment rates, resulting in an absolute difference of up to 39.7% in treatment rates between villages. The only determinant of the division of labour was the presence of a direct edge between seeds. This result was surprising in that the negative effects on diffusion of network redundancy, i.e. having seeds located within the same cluster, were trumped by positive effects of direct communication between seeds. Diverse attributes of seed nodes did not translate into the treatment of a wider population. Peer effects between seed nodes should be considered to facilitate the more equal division of labour/effort amongst seed nodes and, in turn, to increase the diffusion of public health interventions.

11:15-12:45 Session 3C: Theory (math/theory 2)
11:15
Koopman operator and its approximations for networks with symmetries

ABSTRACT. Networks often exhibit symmetries in their adjacency structure and individual node dynamics. These symmetries can simplify the analysis of these systems and provide insights into their behavior, such as cluster synchronization and attractor basin structure. Yet a particular challenge is that many networks of current interest, such as biological networks and power grids, are increasingly high-dimensional and nonlinear, making an accurate representation of their behavior inaccessible to approaches based on traditional dynamics of states. Thus, operator based approaches to dynamical systems, which consider the evolution of functions that operate on the states of the system, have gained popularity in recent years. The approximations of these operators provide ways to apply these methods directly to data and accurately represent nonlinear dynamical systems in a linear setting. Such approximations are valid in the entire state space as opposed to local linearizations.

We focus on the Koopman operator, an infinite dimensional linear operator acting on functions of states. It can be well approximated via the recently proposed extended dynamic mode decomposition (EDMD) method that requires a dictionary of observables. Finding this approximation in its matrix form to obtain Koopman eigenvalues, eigenfunctions, and modes can be computationally expensive for a given high-dimensional system. Using tools from representation theory, we study the structure of the eigendecomposition of this operator, which we show is especially revealing of the symmetries through an appropriate isotypic component basis. We apply that knowledge to produce dictionaries of observables that ensure that the matrix corresponding to the Koopman operator approximation is block diagonal. That can offer computational advantages, illuminate the attractor structure of the underlying system, and potentially lead to novel data-driven methods of detecting symmetries in complex nonlinear networks.
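
For readers unfamiliar with EDMD, a bare-bones version is sketched below on a toy two-dimensional map with a monomial dictionary; the symmetry-adapted, block-diagonalizing dictionaries discussed in the talk go beyond this minimal example.

```python
import numpy as np

def dictionary(x):                      # monomials up to degree 2; x has shape (n_samples, 2)
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

def step(x):                            # one step of a toy nonlinear map
    return np.column_stack([0.9 * x[:, 0], 0.8 * x[:, 1] + 0.1 * x[:, 0]**2])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))   # snapshots
Y = step(X)                             # their images under the dynamics

PsiX, PsiY = dictionary(X), dictionary(Y)
K = np.linalg.lstsq(PsiX, PsiY, rcond=None)[0]   # finite-dimensional Koopman approximation
eigvals, eigvecs = np.linalg.eig(K)              # approximate Koopman eigenvalues/eigenfunctions
```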

11:30
The statistical physics of real-world networks

ABSTRACT. In the past 15 years, statistical physics has been successful as a framework for modelling complex networks. On the theoretical side, this approach has unveiled a variety of physical phenomena, such as the emergence of mixed distributions and ensemble non-equivalence, that are observed in heterogeneous networks but not in homogeneous systems. At the same time, thanks to the deep connection between the principle of maximum entropy and information theory, statistical physics has led to the definition of null models for networks that reproduce features of real-world systems but that are otherwise as random as possible. We review here the statistical physics approach and the null models for complex networks, focusing in particular on analytical frameworks that reproduce local network features. We show how these models have been used to detect statistically significant structural patterns in real-world networks and to reconstruct the network structure in cases of incomplete information. We further survey the statistical physics models that reproduce more complex, semilocal network features using Markov chain Monte Carlo sampling, as well as models of generalized network structures, such as multiplex networks, interacting networks, and simplicial complexes.

11:45
Analytical Formulation of the Block-Constrained Configuration Model

ABSTRACT. We provide a novel family of generative block-models for random graphs that naturally incorporates degree distributions: the block-constrained configuration model. Block-constrained configuration models build on the generalised hypergeometric ensemble of random graphs and extend the well-known configuration model by enforcing block-constraints on the edge generation process. The resulting models are analytically tractable and practical to fit even to large networks. These models provide a new, flexible tool for the study of community structure and for network science in general, where modelling networks with heterogeneous degree distributions is of central importance.

12:00
The Longest Path in Price Model

ABSTRACT. The Price Model (Price, 1965) is one of the oldest network models motivated by the pattern of citations in academic papers. In a citation network, each node represents a document while every entry s in the bibliography of document t is represented by a directed edge from s to t. A key feature of a citation network, one inherent in the Price model, is that there is a fundamental arrow of time in the network; bibliographies can only refer to older documents and so our edges point forward in time in our convention. This means that there are no cycles in the network, you can never find a path from a node that returns to that node. Thus a citation network is an example of a Directed Acyclic Graph, or DAG for short.

Mathematically, DAGs have some distinctive properties and one of them is that for any pair of nodes which are connected there is a well defined and meaningful longest path length. Unlike other types of network, the shortest path in DAGs is of lesser importance. For instance, most of our knowledge of the Price Model did not come directly from the original 1965 paper we cite, but via longer routes through more recent work such as the 2009 book by Newman and references therein.

In this work, we explore the scaling of the longest path in the Price model using analytical and numerical methods. We assume that in the Price model, the connection of edges from an existing node s to a new node t is made in one of two ways: with probability p we choose s uniformly at random from the set of existing vertices, otherwise s is chosen in proportion to its current in-degree (Price’s ‘cumulative advantage’). We add m new edges in this way to each new node, t + 1, before repeating the process with the next node. We measure the longest path from the first node (t = 1) to each node t in the network. This is bounded from below by l(t), the length of the greedy path leading to t. This is defined such that in the greedy path, starting from t, each edge (u, v) leading to a given node v is chosen such that the difference (v − u) is minimised. We have shown that the greedy path length in the mean-field approximation is l(t) = mp ln(t) − mp ψ(mp + 1) − (mp)/(2t) + O(t^−2), where ψ is the digamma function. We confirm the leading order mp ln(t) behaviour using numerical simulations as illustrated in the figure. We also show numerically that to leading order, the ratio of longest to greedy path length appears to be constant, independent of the m and p parameters. One insight is that these long path scales are dominated by edges added to nodes added randomly, on average mp per node, which are typically shorter than edges added with cumulative attachment.

Our results add to the existing body of known analytical properties of network models, here the Price model and its undirected variant, the Barabási-Albert model (1999).
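
A small simulation in the spirit of the numerical checks described above: grow a Price-model DAG, walk the greedy path back from the newest node, and compare its length with the leading-order mp ln(t) prediction. Parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def price_model(n, m, p):
    """Grow a Price-model DAG; parents[t] lists the existing nodes that the new
    node t attaches to (edges point from old to new; distinct targets only)."""
    parents = {1: []}
    pool = []                                            # one entry per received edge
    for t in range(2, n + 1):
        targets = set()
        while len(targets) < min(m, t - 1):
            if rng.random() < p or not pool:
                targets.add(int(rng.integers(1, t)))     # uniform over existing nodes
            else:
                targets.add(int(rng.choice(pool)))       # proportional to in-degree
        parents[t] = sorted(targets)
        pool.extend(targets)
    return parents

def greedy_path_length(parents, t):
    """Walk backwards from t, always stepping to the most recent parent
    (minimising v - u), and count the steps until node 1 is reached."""
    length = 0
    while t != 1:
        t = max(parents[t])
        length += 1
    return length

n, m, p = 20000, 3, 0.4
parents = price_model(n, m, p)
print(greedy_path_length(parents, n), m * p * np.log(n))   # vs leading-order mp ln(t)
```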

12:15
Hidden complexity in signed networks
SPEAKER: Sungmin Lee

ABSTRACT. In many complex systems, not only are there positive interactions promoting coordinated actions and activating collective behaviors, but negative interactions between agents, suppressing and antagonizing them, are also prevalent. For instance, in neuronal networks a neuron can be excitatory or inhibitory. Such systems can best be modeled as ‘signed’ networks. There has been considerable effort devoted to such signed interactions; however, a quantitative understanding of their generic impact on network dynamics is still lacking. Towards this goal, here we investigate a simple dynamical process on signed networks. Specifically, we study a threshold cascade model based on the Watts model [PNAS 99, 5766 (2002)], generalized to signed networks. The model incorporates the effect of negative links and allows various network-theoretical concepts and tools to be applied to quantitatively understand signed network dynamics. On such a signed network, the threshold cascade rule is formulated as follows. Activation: an inactive node becomes active if i) the fraction of active nodes among its neighbors connected by positive links exceeds the prescribed threshold r, and, at the same time, ii) it has no active neighbor connected by a negative link. Deactivation: an active non-seed node is deactivated if either the fraction of active nodes among its neighbors connected by positive links drops to or below the threshold r, or it encounters an active neighbor connected by a negative link. The primary results of numerical simulations are displayed in Fig. 1. First, the presence of negative interactions (encoded by the negative-link mean degree z−) is found to be very effective in suppressing global cascades. As negative links are introduced into the network, the cascade size ρ decreases promptly. While the cascade covers a smaller fraction of the network, it takes a longer time to proceed, up to the point at which the depletion effect saturates at around z− ≈ z+. We will discuss in more detail other nontrivial effects of negative interactions that reveal additional facets of complexity, such as degenerate stationary states. As shown in Fig. 1, for a given z− and seed configuration, multiple stationary states can occur, and, interestingly, this degeneracy of stationary states persists as z− increases. We anticipate this work could prompt the community to study dynamic processes on signed networks at large, which we believe will prove a fertile ground for yet another layer of complexity in networked complex systems.
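
A synchronous toy implementation of the activation/deactivation rules above might look as follows; the graph sizes, mean degrees, threshold, and seed count are arbitrary choices, not the parameters of the study.

```python
import numpy as np
import networkx as nx

n, z_pos, z_neg, r = 1000, 4.0, 1.0, 0.18
G_pos = nx.gnm_random_graph(n, int(n * z_pos / 2), seed=1)   # positive links
G_neg = nx.gnm_random_graph(n, int(n * z_neg / 2), seed=2)   # negative links

rng = np.random.default_rng(0)
seeds = set(rng.choice(n, size=10, replace=False).tolist())
active = set(seeds)

for _ in range(200):                           # iterate until a stationary state
    new_active = set()
    for v in range(n):
        pos_nbrs = list(G_pos.neighbors(v))
        frac = sum(u in active for u in pos_nbrs) / len(pos_nbrs) if pos_nbrs else 0.0
        blocked = any(u in active for u in G_neg.neighbors(v))
        if v in seeds:
            new_active.add(v)                  # seeds remain active
        elif frac > r and not blocked:         # activation conditions (i) and (ii)
            new_active.add(v)
        # otherwise the node stays, or becomes, inactive (deactivation rule)
    if new_active == active:
        break
    active = new_active

rho = len(active) / n                          # cascade size
```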

12:30
Mean-field theory of graph neural networks in graph partitioning

ABSTRACT. Nowadays, significant attention has been paid to deep neural network approaches on graphs, referred to as graph convolutions or graph neural networks (GNNs). An important question is whether such an approach is also promising for typical tasks in network science, such as community detection. While many experimental studies have been done, the inner workings of GNNs are not well understood theoretically.

Here we present a theoretical performance analysis of a GNN for the graph partitioning problem [1]. We consider the ensemble of graphs generated by the (symmetric) stochastic block model and evaluate the fraction of vertices for which an algorithm correctly infers the planted assignment. Our mean-field analysis accurately estimates the limit of the model parameter beyond which the untrained GNN cannot infer the planted assignment better than chance (solid green line in the figure below); such a limit is termed the algorithmic detectability limit.

Our analysis reveals that, even without any training, the GNN exhibits a nontrivial performance owing to the nonlinear architecture of the model. Moreover, our numerical experiment implies that the training of the GNN does not break the algorithmic limit that we derived for the untrained GNN.

We also show the relationships between the GNN that we consider here and other algorithms, such as the EM algorithm with belief propagation and the spectral method.
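
A rough, untrained-GNN-style experiment in the spirit of the setup described above (not the authors' exact architecture): random-weight message passing on a symmetric stochastic block model, followed by k-means on the node embeddings and a comparison against the planted labels. All parameter values are arbitrary.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

n, cin, cout, d, layers = 2000, 8.0, 2.0, 16, 10
sizes = [n // 2, n // 2]
p = [[cin / n, cout / n], [cout / n, cin / n]]
G = nx.stochastic_block_model(sizes, p, seed=0)
A = nx.adjacency_matrix(G).astype(float)

rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))                     # random initial node features
for _ in range(layers):
    W = rng.normal(size=(d, d)) / np.sqrt(d)    # untrained (random) layer weights
    H = np.tanh(A @ H @ W)                      # one message-passing layer
    H /= np.linalg.norm(H, axis=1, keepdims=True) + 1e-12

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(H)
truth = np.array([0] * sizes[0] + [1] * sizes[1])
accuracy = max((labels == truth).mean(), ((1 - labels) == truth).mean())
```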

11:15-12:45 Session 3D: Success 1
11:15
Quantifying gender disparity in faculty flows across a hiring and promotion network

ABSTRACT. Previous work establishes that scientific progress is enhanced by faculty diversity (e.g., ethnic, racial, or gender diversity), because such diversity correlates with cognitive diversity that enhances critical and creative thinking and enriches the learning environments for mentoring young scientists. In particular, the representation of female tenure-track faculty in STEM has been tied to increasing percentages of female undergraduates in these fields. Despite these positive effects, however, quantifying the trends and impact of diversity in the scientific workforce remains difficult, due in large part to a lack of comprehensive data about the flow of scientists across hiring and promotion networks.

Here we apply a topical web crawler to collect longitudinal employment information for over 5,000 tenure-track faculty across 205 PhD-granting Computer Science departments in the U.S. and Canada, using the Internet Archive's Wayback Machine to conduct a complete census of these departments in years past, between 2011 and 2017. Past work analyzing the attrition, retention, and promotion of faculty between 2011 and 2017 alone highlights the growth of the computer science field (Fig. 1 (a)), but does not provide sufficient statistical power to determine the differential rates at which male and female faculty are promoted, change institutions, or leave the field. The new longitudinal census data presented here provides more detailed estimates of the precise transition rates, by gender, between faculty ranks, over time (Fig. 1 (b)), and allows additional analyses of how these rates covary with department attributes like prestige and geographic location. Our longitudinal analysis results shed new and more quantitative light on the "leaky pipeline" metaphor of gender differences in faculty retention, promotion, and movements across the hiring network, illustrating that the flow of faculty within the scientific workforce is substantially more variable than previously recognized.

11:30
The Midas Touch: Quantifying the Impact of Awards on Scientific Developments
SPEAKER: Ching Jin

ABSTRACT. Scientific prizes create strong incentives to resist conservative tendencies, encourage exploration of high-risk, high-impact science, and promote community-building celebrations. Despite a considerable number of studies on awarding systems, the lack of large-scale, high-resolution datasets limits our understanding to the relationship between awards and prizewinners; little is known about the influence of prizes on the development and proliferation of scientific fields. For a long time, whether prizes can boost scientific development has been controversial: while a positive influence of prizes on scientific topics is expected based on the Matthew effect (Merton 1968), prizes are often considered late signals of topic legitimation, which instead suggests a saturation of growth (Zuckerman 1992). By combining two large-scale datasets, the scientific prize dataset (12,960 winners and 3,236 prizes, see Fig. 1A) and the Microsoft Academic Graph dataset (209M scientists on 228K research topics), we studied the effect of prizewinning on topic boosts (Fig. 1B) in terms of four different measures: number of active researchers, number of new researchers, number of publications, and number of citations. We adopted the coarsened exact matching (CEM) method to eliminate endogenous factors, by identifying, for each prizewinning topic, topics with indistinguishable trajectories prior to the prizewinning year. Comparing the future dynamical trajectories of the topics in the prizewinning group with those in the matched group, we found strong boosting effects of prizes on the future development of scientific topics (Fig. 1C), which are further influenced by three important factors: (i) the raw materials contained in the topic, (ii) the interaction time between the prize winner and the field (Fig. 1D), and most importantly (iii) the specificity of the prize, which is largely correlated with the prize’s position in the network (Fig. 1E&F). These results not only elucidate the puzzling relationship between prizes and scientific topics, but also provide new insights on the social function of prizes, with strong policy implications.

11:45
Productivity, prominence, and the effects of academic environment
SPEAKER: Samuel Way

ABSTRACT. Faculty at prestigious institutions produce more scientific papers, receive more citations and scholarly awards, and are typically trained at more prestigious institutions than faculty with less prestigious appointments. These imbalances are often attributed to a meritocratic system that sorts individuals into more prestigious positions according to their past achievements and potential for future scholarly impact. Here, we investigate the determinants of scholarly productivity and measure their dependence on the structure of the academic faculty hiring network and its influence on researchers' past training and current work environments. To distinguish the effects of these environments, we apply a matched-pairs experimental design to career and productivity trajectories of 2453 early-career faculty at all 205 Ph.D-granting computer science departments in the U.S. and Canada, who together account for over 200,000 publications and 7.4 million citations. Our results show that the prestige of faculty's current work environment, not their training environment, drives their future scientific productivity and prominence. Furthermore, the characteristics of a work environment are more predictive of faculty productivity and impact than mechanisms representing preferential selection or retention of more productive scholars by more prestigious departments. These results identify a network-based mechanism for cumulative advantage, in which an individual's past successes are "locked in" via placement into a more prestigious environment, which directly facilitates future success. The scientific productivity of early-career faculty is thus driven by where they work, rather than where they trained for their doctorate, indicating a limited role for doctoral prestige in predicting scientific contributions.

12:00
Quantifying and modeling gender bias in scientific careers

ABSTRACT. Understanding the mechanisms behind knowledge production and scientific careers is of central importance for society. In this project, we analyze how community biases affect, drive, inhibit or distort the evolution of scientific productivity and impact in careers across genders. We do this in a mechanistic fashion, adopting the framework of matched samples, which allows us to obtain unbiased comparisons between male and female authors, and providing mathematical principles that explain and correct for the differences in careers. Specifically, our approach is based on balancing, across the different groups of scholars, the distribution of covariates that can affect the outputs we investigate. As an example, it is well known that the impact of an author is correlated with his/her productivity; therefore, when looking at impact, only authors with similar levels of productivity should be compared. To obtain this balance we use the propensity score, which has been proven to be an efficient balancing score in the presence of several covariates. Specifically, after computing the propensity score, the idea is to make its distribution equivalent across the investigated sets by bootstrapping the smaller ones to the size of the larger ones. The advantage of our approach over other techniques, such as linear regressions or statistical hypothesis tests, is that it sharply increases the statistical quality of the comparison without dramatically cutting the size of the sets of scholars. This implies being able to perform highly accurate and unbiased comparisons over a dataset of unprecedented size, both with respect to the number of scholars and the variety of disciplines. In fact, our main source of data is the Web of Science dataset, maintained by Clarivate Analytics, which contains data on about 54 million papers and more than 7 million authors, across 150 different disciplines organized in a thematic hierarchical structure.

First, we are able to estimate the difference in productivity between male and female authors, with the latter publishing on average about 20% fewer papers. Surprisingly, we find a strong, negative correlation between the gender gap and the percentage of women in a discipline, both for productivity and impact. This means that men tend to outperform women more in the disciplines in which the presence of women is higher. Interestingly, gender gaps in productivity and impact are strictly related for all disciplines, with a well-defined one-to-one matching, reinforcing the idea that the gender gap in impact is strongly linked to that in productivity.

To understand the roots of the relation between gender gap and women presence, we look at the collaboration network of co-authorship between authors. After extracting one network per discipline, we compute the nominal assortativity - a well established estimator for segregation within networks - with respect to gender. Across disciplines, a clear relation emerges between gender assortativity and the percentage of women: the disciplines with more women show higher segregation.

We demonstrate that this segregation pattern penalizes women by reducing their pool of possible coauthors. Indeed, after correcting for the number of distinct coauthors, we find that the gender gap in productivity is significantly reduced. We believe that our results help clarify the mechanisms of the gender gap in academia, offering valuable knowledge to shape research policies that can effectively counteract it.
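
The segregation measure mentioned above, nominal assortativity with respect to gender, is available directly in networkx; a toy example with made-up authors and labels:

```python
import networkx as nx

# Toy co-authorship network with a hypothetical "gender" node attribute.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c")])
nx.set_node_attributes(G, {"a": "F", "b": "F", "c": "F", "d": "M", "e": "M"}, "gender")

r = nx.attribute_assortativity_coefficient(G, "gender")
# r > 0: authors collaborate preferentially within their own gender (segregation);
# r close to 0: gender plays no role in tie formation.
```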

12:15
Universal links between Mentor’s Tacit Knowledge Transfer and Protégés’ Propensity for leading Scientific Thinking
SPEAKER: Yifang Ma

ABSTRACT. In his Nobel banquet speech, Paul A. Samuelson acknowledged the key role his mentor played in developing his discoveries, “The first thing you must do is to have great teachers,” a sentiment echoed throughout science. The 4th epoch of science is collaboration, and a typical aspect of collaboration in science is mentorship. Mentorship purportedly shapes success by developing a mentee’s approach to thinking, conveying tacit knowledge about good practice, tools of the trade, and methods for presenting novel ideas that casts a long shadow over protégés’ careers. Yet, few quantitative studies have identified a predictive link between mentorship and mentee achievement across scientific disciplines and varied outcomes of success. We examine if and how mentorship adds value to protégés’ existing talents using worldwide data on 50,000 mentors and protégés in 4 diverse fields (chemistry, biomedicine, mathematics and physics). We use a quasi-experimental method in which whether a mentor is a future prizewinner serves as the treatment event, and we select the control-group mentors using coarsened exact matching. In detail, we find pairs of mentors with similar publication records, citation impact, collaboration networks, topic expertise, university ranks, etc., so that the two mentors are measurably indistinguishable and the only difference is that the treated mentor wins a scientific prize in the future while the matched counterpart does not. Scientific prizewinning success often depends on unique know-how about conducting and publishing novel work; thus, protégés of the treated mentors have the unique opportunity to acquire this tacit know-how from their mentors. We report novel findings. First, a mentor’s tacit knowledge is strongly associated with protégés’ future success on three achievement measures, universally across disciplines and time. Protégés of winner mentors are 2.5 times likelier to be science prizewinners, 4 times likelier to be elected to the National Academy of Sciences, and 2 times likelier to write high-impact articles than control-group protégés. Second, we used fixed-effects models to confirm the results, controlling for endogenous factors, status, and social-network effects. Our work has potential implications: it highlights the critical role of mentorship in mentees’ scientific performance, emphasizes the transfer of tacit knowledge from mentor to mentee, and suggests new ways of understanding mentorship and career success.

12:30
Gender composition and communication pattern predict women's leadership success
SPEAKER: Yang Yang

ABSTRACT. Many leaders today do not rise through the ranks but are recruited directly out of graduate programs into leadership positions. We use a quasi-experiment and instrumental-variable regression to understand the causal link between students’ graduate school social networks and placement into leadership positions of varying levels of authority. Our data measure students’ personal characteristics and academic performance, as well as their social network information drawn from 4.5 million email correspondences among hundreds of students who were placed directly into leadership positions. After controlling for students’ personal characteristics, work experience, and academic performance, we find that students’ social networks strongly predict placement into leadership positions (Fig. A). For males, the higher a male student’s centrality in the school-wide network, the higher his leadership job placement will be. While centrality also predicts women’s placement, high-placing women students have one thing more: an inner circle of predominantly female contacts who are connected to many non-overlapping third-party contacts (Fig. B&C&F). Women with a network centrality in the top quartile and a female-dominated inner circle have an expected job placement level that is 2.5 times greater than women with low centrality and a male-dominated inner circle. This suggests that high-placing women require, on the one hand, access to both same-gender job information and social support relevant to success in male-dominated leadership positions, and, on the other hand, exposure to diverse job market information that is scattered widely among students. Women who have networks that resemble those of high-placing men are low-placing, despite having leadership qualifications comparable to high-placing women (Fig. D&E).

11:15-12:45 Session 3E: Structure (misc)
11:15
Detecting Self-Propagating Attacks in Cyber Networks

ABSTRACT. In May 2017, the WannaCry ransomware spread to hundreds of thousands of computers, infecting dozens of corporate and government networks across the world. Estimates of the malware's impact range from hundreds of millions to billions of US dollars [1]. WannaCry is an example of self-propagating malware: it attempts to spread rapidly by generating a set of pseudo-random IP addresses on the Internet, scanning them to check for the Windows vulnerability it exploits, and finally delivering the malicious payload to vulnerable machines. This work aims to detect such behavior based on network characteristics.

We represent the cyber network, obtained from connection logs, as a temporal, directed, bipartite graph. Connections occur between "internal" IP addresses (i.e. those belonging to a subnet of interest) and "external" addresses outside this subnet. Only connections from one set to the other can be observed (Figure 1A).

To detect potentially malicious behavior based on traffic between internal and external nodes, we define a node feature inspired by term frequency-inverse document frequency (tfidf) [2]. Each internal IP address i is treated as a "document" which contains "terms" that represent the external IPs j that have communicated with it. The score for an edge (i,j) in the network will be incremented each time i communicates with external node j, but will be lowered if j also communicates with many internal nodes. Therefore, connections to very popular external IP addresses will have a relatively low score, but frequent connections to uncommon external IPs will result in a high tfidf score, which suggests anomalous behavior. Since each internal node i may connect to many external nodes, it will have a distribution of tfidf scores, which can also vary over time. To capture each node's distribution in a single statistic that identifies nodes with several high-tfidf edges (i.e., a heavy-tailed distribution), we calculate the kurtosis of this distribution (the fourth central moment, normalized by the square of the variance).

Using each internal node's kurtosis score at every time window, we label nodes with particularly high values as anomalous (Figure 1D). Our data was obtained by merging a simulated WannaCry attack with actual connection data from [3]. Our preliminary results show that the malicious IP addresses (those that mimic WannaCry behavior) can be easily discerned when using the kurtosis score nodal feature.

11:30
Model selection for coupled growth of multi-layer networks

ABSTRACT. Generative growth models, such as the family of preferential attachment models, have been studied in great variety for both simple and multi-layer networks. However, the goodness-of-fit of the network growth models for empirical data has been mostly judged based on some macroscopic properties of a static time-aggregated network, e.g., by comparing the empirical degree distribution with the one predicted by the model. Instead, we present a more principled microscopic statistical model selection approach that fits and evaluates generative growth models with respect to the temporal history of a multilayer network.

Then, we apply the methodology to empirical multilayer networks of publications and authors that have citation and authorship relations, in order to understand the interplay between the accumulation of citations to a publication and the centrality of its authors in the respective scientific community. Underlying our methodology is a general formalism for generative models of coupled growth of multiple layers in a network. In our edge-centric approach, a generative model provides the probability of adding a certain edge to the network at a certain time. Edges in and between different layers are generally explained by different growth mechanisms, which are encoded in the corresponding model components. The coupling between the dynamics of these different types of edges is encoded in the multivariate nature of the model components. With this model formalism, we show how to perform a time-respecting maximum likelihood estimation of the parameters of a specific model, along with the errors on the parameters---something generally neglected in network science. Last but not least, we describe a sampling procedure on the evolving network that allows us to make our methodology highly scalable for large networks.
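
An illustrative sketch of the time-respecting maximum-likelihood idea described above, for one deliberately simple model component: the probability that a new citation attaches to paper p is taken proportional to (k_p + 1)^alpha, where k_p is the paper's citation count at that moment. The functional form and names are assumptions for illustration, not the specification used by the authors.

```python
# Sketch of time-respecting MLE for a single, assumed model component.
import numpy as np
from scipy.optimize import minimize_scalar

def fit_alpha(citation_events):
    """citation_events: time-ordered list of (cited_paper, set_of_papers_existing_then)."""
    def neg_log_likelihood(alpha):
        counts = {}
        ll = 0.0
        for cited, existing in citation_events:
            weights = np.array([(counts.get(p, 0) + 1.0) ** alpha for p in existing])
            ll += alpha * np.log(counts.get(cited, 0) + 1.0) - np.log(weights.sum())
            counts[cited] = counts.get(cited, 0) + 1   # update the state after each event
        return -ll

    # Maximum-likelihood estimate of alpha over the observed edge-arrival history.
    return minimize_scalar(neg_log_likelihood, bounds=(0.0, 3.0), method="bounded").x
```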

Using the presented methodology, we compare a range of hypothesis-driven and evidence-informed growth model definitions for empirical multi-layer networks of citations and authorships. In particular, we focus on the model component for citation edges, in which the probability of adding a new citation depends on previously observed citations and on various centralities of its authors.

For most of the studied networks, we find that author properties play a significant role in the citation dynamics. As the best out of a set of candidate models, the method selects a model with co-authorship centrality of authors for some networks, and a model that accounts for the number of previously written articles by the authors for other networks---highlighting differences in practices and dynamics in different scientific communities.

11:45
Finding Over- and Under-represented Pathways in Higher Order Networks

ABSTRACT. Dynamical processes on networks often generate data in the form of sequences of nodes visited over time. For example, a person traveling on a public transportation system moves through a sequence of stations, and a parcel of mail moves through a sequence of post offices. In network science, such sequences are known as pathways. Pathway data is often analyzed by breaking down sequences of nodes into time-stamped edges, then aggregating the result into a weighted, directed network. While this approach allows one to apply traditional network analysis techniques, it simultaneously destroys sequential information inherent to pathway data, potentially leading to inaccurate models of systems where memory serves an important role. For example, a typical ride on public transportation does not cycle back through stations already visited in the same trip. Memory constraints, which arise from properties of networked systems both physical and not, motivate the study of Higher Order Networks (HONs), which incorporate sequential, temporal, or memory information into models of pathway data beyond the first-order (traditional) network.
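
A small sketch of the distinction described above, assuming pathways are given as lists of node sequences: the first-order aggregation keeps only edge counts, while a second-order representation keeps one step of memory by using (previous, current) pairs as nodes. Function names are illustrative.

```python
# Illustrative sketch: first-order vs. second-order aggregation of pathway data.
from collections import Counter

def first_order(paths):
    return Counter((u, v) for p in paths for u, v in zip(p, p[1:]))

def second_order(paths):
    # Nodes are pairs of consecutive stops; edges record where a walker came from.
    return Counter(((u, v), (v, w)) for p in paths for u, v, w in zip(p, p[1:], p[2:]))

paths = [["A", "B", "C"], ["D", "B", "A"]]
print(first_order(paths))   # edges into and out of B are kept, but which follows which is lost
print(second_order(paths))  # ('A','B')->('B','C') and ('D','B')->('B','A') stay distinct
```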

Given a networked system and a set of observed pathways through its nodes, our goal is to identify pathways that are observed significantly more or less often than expected ``at random'', where ``at random'' refers to paths generated by a memoryless random walk in the weighted network topology. In the case of undirected, unweighted networks, the Molloy-Reed configuration model can be used as a null model to understand whether the (first-order) network topology of a system is consistent with a random model where links are generated based on observed node degrees. To understand which paths are over- and under-represented, we extend the idea of such a random configuration model to higher-order networks, where the standard configuration model is not directly applicable since a HON is a directed, weighted network. We address this by adopting the Generalized Hypergeometric Ensemble of random Graphs (gHypEG), which we use to calculate the probability of observing a given directed, weighted edge. We show that the generalization of gHypEG to HONs allows us to understand which pathways occur significantly more or less often than expected at random. Specifically, gHypEG analytically calculates the probability that an observed weight on a higher-order transition -- equivalently the number of observations of a certain pathway -- would be smaller or larger if the pathways were generated by a random walk. We show the utility of our methodology in analyzing pathways representing people moving through the London Underground.
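
A minimal sketch of the kind of test described above, under simplifying assumptions: for a higher-order edge i -> j with observed weight w, we ask how likely a weight of at least w would be if the m observed transitions were placed at random, with the number of possible slots for (i, j) proportional to out-strength(i) times in-strength(j). This is the plain hypergeometric ensemble; the gHypEG generalization adds edge propensities beyond what is shown here.

```python
# Sketch of a hypergeometric over-representation test for weighted, directed edges.
from scipy.stats import hypergeom

def edge_pvalues(weights, out_strength, in_strength):
    """weights: dict {(i, j): observed weight}; strengths: dicts of node totals."""
    m = sum(weights.values())                             # total observed transitions
    M = sum(out_strength[i] * in_strength[j]              # total possible edge slots
            for i in out_strength for j in in_strength)
    pvals = {}
    for (i, j), w in weights.items():
        n = out_strength[i] * in_strength[j]              # slots available to edge (i, j)
        pvals[(i, j)] = hypergeom.sf(w - 1, M, n, m)      # P(weight >= w): over-representation
    return pvals
```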

12:00
Force-Directed Layout of Causal Topologies in Time Series Data on Complex Networks

ABSTRACT. An attractive aspect of network science is the ability to use graph-based visualisations to explore the structure and function of complex systems. While standard network analysis and visualisation techniques use data that capture interactions between pairs of nodes, real data on complex systems are becoming increasingly fine-grained, capturing more than just pair-wise interactions. Consider, e.g., user clickstreams in the Web, data on information propagation in social networks, or sequences of interactions or reactions in systems biology. Such time-stamped and sequential data capture complex pathways through networked systems, which contain considerably more information on a system's function than pair-wise interactions alone. Recent works have shown that correlations in the chronological order of interactions on such pathways can invalidate standard methods for network-based data analysis, calling instead for higher-order modeling and analysis techniques. Moreover, these correlations are not considered by state-of-the-art network visualisation methods such as force-directed or spring-based layout algorithms. Despite the importance of network visualisations, a method that takes such correlations in time-stamped and sequential data into account for static visualisations of networked systems is not yet available.

Addressing this gap, we introduce a force-directed graph layout algorithm for time-stamped and sequential data on complex networks. Building on multi-order network models of causal paths, it enables us to generate static visualisations that capture the causal topology generated by the complex interplay between temporal and topological features in time series network data. Our approach is based on an intuitive generalisation of force-directed layout algorithms to higher- and multi-order models. Compared to state-of-the-art algorithms, it adds attractive forces between nodes, calculated based on a higher-order model of order k that captures indirect interactions between nodes via pathways of length k. Thanks to this simple approach, nodes that are indirectly connected via causal paths are visualised in closer proximity, while nodes that cannot indirectly influence each other are rendered at a greater distance. We thus argue that our algorithm considerably enhances the correspondence between the position of nodes and their role in a system's causal topology.
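
A minimal sketch of the idea described above, under the simplifying assumption that the higher-order model can be summarized by extra edges between nodes connected through observed causal paths of length k. These are added as additional attractive springs before a standard spring layout; the actual algorithm generalizes the forces themselves rather than pre-processing the graph.

```python
# Sketch only: augment the graph with length-k causal-path edges, then run a spring layout.
import networkx as nx

def higher_order_layout(G, causal_paths, k=2, weight_hi=0.5):
    """G: first-order directed graph; causal_paths: iterable of node sequences."""
    H = nx.Graph()
    H.add_edges_from(G.to_undirected().edges(), weight=1.0)
    for path in causal_paths:
        for start in range(len(path) - k):
            u, v = path[start], path[start + k]            # endpoints of a length-k causal path
            w = H.get_edge_data(u, v, default={}).get("weight", 0.0)
            H.add_edge(u, v, weight=w + weight_hi)         # extra attraction for causally connected nodes
    return nx.spring_layout(H, weight="weight")
```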

As demonstrated in Figure 1, our method also highlights cluster structures that result from the interplay between topological and temporal features of time series data. Providing a novel and intuitive approach to incorporate temporal correlations into standard force-directed layout algorithms, our work advances our ability to visualise structures and patterns that remain invisible in state-of-the-art network visualizations. Given the wide usage of network visualizations and the growing importance of time series data on complex networks, our contribution should be of broad interest for the NetSci community. We would thus be happy to present our work in an oral contribution.

12:15
Anomaly detection in complex networks as an indicator of model over-simplification

ABSTRACT. The composer Roger Sessions wrote in a New York Times essay: ``I also remember a remark of Albert Einstein, which certainly applies to music. He said, in effect, that everything should be as simple as it can be, but not simpler!'' The attempt to make every theory and model as simple as possible has been driving physics research since before Einstein. No friction, the spherical cow, and the point-like charge or mass are common approximations. A crucial question, however, is how to determine when a model is simpler than it must be.

In the context of complex networks and graph theory, the initial focus was on undirected, unweighted networks. More recently, weighted, directed, and multiplexed networks have been the focus of much research attention. However, most researchers still prefer to represent a system's network of interactions as if it were undirected and unweighted. The question, again, is how to determine when such a model is good enough, especially in the absence of data for testing predictions. Here, we propose that the presence of anomalies in the structure of the undirected, unweighted projection of the network is a diagnostic tool that can be used to identify situations where the model is simpler than it must be.

Our starting observation is the report of betweenness centrality anomalies in transportation networks. Guimerà et al. reported that in air transportation networks nodes with large degree do not necessarily have the highest betweenness centrality, whereas some low-degree nodes can have large betweenness centralities. The emergence of these anomalies has been attributed to the multi-community structure of the network and to spatial constraints such as geopolitical boundaries. Nevertheless, the general mechanisms governing the emergence of such anomalies remain unknown. To tackle these questions, we investigate the structure of the inter-city bus transportation network and the worldwide air transportation network using four different data sets to explore the origins of the centrality anomalies in transportation networks (Fig. 1). Our analysis reveals that unweighted transportation networks exhibit centrality anomalies for a significant fraction of the nodes compared with an appropriate null model with the same degree distribution. However, these anomalies disappear when we consider weighted representations of the network. Our findings support the hypothesis that such centrality anomalies are a symptom of a model that is simpler than it must be. They have direct implications for the modeling of human dynamics, the spread of diseases, crime spreading, and spatial networks. Moreover, they also hint at some of the challenges of modeling in systems biology, economics, and other social sciences, where data incompleteness may be pervasive.
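
A rough sketch of the kind of comparison described above: flag nodes whose betweenness centrality deviates strongly from a degree-preserving null model (edge-swap randomization). The number of randomizations and the z-score threshold are illustrative choices, not the authors' exact procedure.

```python
# Sketch: betweenness anomalies relative to a degree-preserving (edge-swap) null model.
import numpy as np
import networkx as nx

def betweenness_anomalies(G, n_null=100, z_threshold=3.0, seed=0):
    """G: undirected network; returns nodes whose betweenness is anomalous vs. the null."""
    observed = nx.betweenness_centrality(G)
    null_samples = {v: [] for v in G}
    for r in range(n_null):
        R = G.copy()
        nx.double_edge_swap(R, nswap=5 * R.number_of_edges(),
                            max_tries=100 * R.number_of_edges(), seed=seed + r)
        for v, val in nx.betweenness_centrality(R).items():
            null_samples[v].append(val)
    anomalies = {}
    for v in G:
        mu, sigma = np.mean(null_samples[v]), np.std(null_samples[v])
        if sigma > 0 and abs(observed[v] - mu) / sigma > z_threshold:
            anomalies[v] = (observed[v] - mu) / sigma      # signed z-score of the anomaly
    return anomalies
```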

12:30
A Polynomial-Time Deterministic Approach to the Traveling Salesperson Problem
SPEAKER: Hiroki Sayama

ABSTRACT. We propose a new polynomial-time deterministic algorithm that produces an approximated solution for the traveling salesperson problem. The proposed algorithm ranks cities by priorities calculated using a power function of the means and standard deviations of their distances from other cities, and then connects the cities to their neighbors in the order of their priorities. When connecting a city, a neighbor is selected based on the neighbors’ priorities, calculated as another power function that additionally includes the distance from the focal city being connected. This repeats until all the cities are connected into a single loop. The time complexity of the proposed algorithm is O(n^2), where n is the number of cities. Numerical evaluation shows that, despite its simplicity, the proposed algorithm produces shorter tours with lower time complexity than other conventional tour construction heuristics. The proposed algorithm can be used by itself or as an initial tour generator for other, more complex heuristic optimization algorithms.

(Preprint available from https://arxiv.org/abs/1608.01716v4)
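
A simplified sketch of a priority-based construction in the spirit of the abstract. The exact power functions, exponents, and bookkeeping are given in the preprint; the forms below (and the union-find used to avoid closing sub-loops early) are illustrative assumptions, not the published algorithm.

```python
# Sketch only: priority-ranked greedy construction of a single tour, with assumed score forms.
import numpy as np

def construct_tour(dist, a=1.0, b=1.0):
    """dist: symmetric n x n distance matrix; returns a list of edges forming a tour."""
    dist = np.asarray(dist, dtype=float)
    n = len(dist)
    off_diag = dist + np.diag([np.nan] * n)                # ignore self-distances
    priority = np.nanmean(off_diag, axis=1) ** a * np.nanstd(off_diag, axis=1) ** b

    parent = list(range(n))                                # union-find over path fragments
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree, edges = [0] * n, []
    for i in sorted(range(n), key=lambda c: priority[c], reverse=True):
        while degree[i] < 2 and len(edges) < n - 1:
            candidates = [j for j in range(n)
                          if j != i and degree[j] < 2 and find(i) != find(j)]
            # neighbor score: another power-function-style score that also involves distance to i
            j = min(candidates, key=lambda c: dist[i][c] / priority[c])
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
            parent[find(i)] = find(j)
    ends = [v for v in range(n) if degree[v] < 2]          # close the remaining path into a loop
    if len(ends) == 2:
        edges.append((ends[0], ends[1]))
    return edges
```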