NETSCI19: NETSCI 2019
PROGRAM FOR FRIDAY, MAY 31ST

11:15-12:45 Session 4A: Bio (theory)
11:15
Integration of probabilistic control theory with metabolic flux analysis

ABSTRACT. Metabolic pathways are essential chemical processes that produce complex molecules indispensable for living cells. Understanding how metabolic networks are controlled, however, remains a challenging question. In this work, we present an integration of probabilistic control theory with metabolic flux analysis, enabling us to analyse the control of flux in metabolic networks. Metabolic networks are represented by a stoichiometric matrix, which embeds constraints between reaction fluxes arising from network topology and mass conservation. The stoichiometric matrix enables the computation of reaction correlation coefficients, which represent Pearson correlations between reaction fluxes over all feasible steady states, and can in turn be interpreted as failure probabilities in the relation between two reactions. We applied this technique to genome-scale metabolic networks corresponding to four healthy and cancer tissue types, namely breast, lung, kidney and urothelial cancer. We then used a Probabilistic Minimum Dominating Set (PMDS) model to find a minimum set of nodes that act as driver nodes to control the entire network (Nacher & Akutsu, 2015). We observed that correlations are stronger in cancer than in healthy states for three out of four cancer types. The minimum dominating set is always smaller in cancer than in the healthy state, which suggests that fluxes tend to be more strongly controlled in cancer states, even though topological analysis shows a similar local structure in all networks. These results are supported by biological observations showing that cancer cells exhibit a more streamlined form of metabolism with increased activity in several core metabolic pathways. More widely, the integration of metabolic flux analysis with probabilistic control theory opens up new possibilities to understand the control of metabolism in natural organisms and diseases.
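For illustration, a minimal sketch (not the authors' pipeline) of how reaction correlation coefficients could be computed once a set of feasible steady-state flux samples is available; the sample matrix below is a random placeholder, and reading 1 - |r| as a failure probability simply follows the interpretation given in the abstract.

```python
import numpy as np

# Hypothetical input: rows are feasible steady-state flux samples (e.g. obtained by
# sampling the polytope S v = 0 with flux bounds), columns are reactions.
flux_samples = np.random.rand(5000, 20)  # placeholder data for illustration only

# Pearson correlation between every pair of reaction fluxes over all samples
R = np.corrcoef(flux_samples, rowvar=False)

# One possible reading (as in the abstract): a weak correlation between two reactions
# means controlling one poorly constrains the other, so 1 - |r_ij| can be treated as
# a "failure probability" for that relation.
failure_prob = 1.0 - np.abs(R)
print(failure_prob.shape)  # (20, 20)
```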

11:30
The cell cycle as an interaction of stable and oscillating motifs

ABSTRACT. Modeling the system level interaction patterns and dynamical landscape of biological systems is essential to understand the principles that govern every living organism. Mammalian cell division is triggered by environmental signals that engage a dense network of interacting genes, which work together to execute the ordered progression of cell cycle events required for correct duplication and separation of a cell’s genetic material. To model this interacting system we developed a synchronous Boolean model of the cell cycle of mammalian cells and verified that it has a cyclic attractor that reproduces the main phenotypic features of dividing cells [1]. We have found that the cell cycle network is dynamically modular: it can be decoupled into functional modules. When isolated, each module acts as an autonomous decision-maker (phenotypic switch) with a few distinct attractors. Wired together, the modules toggle each other’s attractors, forming a cycle. We thoroughly examined both modules (named Restriction Switch and Phase Switch) and the coupled system separately. At the level of the isolated modules, a handful of stable motifs (self-sufficient subnetworks, as defined in [2]) drive fixed-point attractors. When using general asynchronous update, the coupled model has a complex attractor whose main cycle matches the cyclic attractor resulting from synchronous update. The complex attractor toggles between the stable motifs of the modules, while also keeping the order and combination observed in biology, further emphasizing the integrative nature of the coupling. We argue that the cell cycle emerges as an interplay between stable and oscillating motifs. We present our emerging understanding of the highly non-trivial rules and conditions that must be fulfilled for the coupling of decision-making Boolean switches to yield a complex attractor that adheres to the rules of dynamical modularity. A profound understanding of the motif and module interactions could greatly contribute to more efficient ways of control as well as to the unraveling of complex diseases affecting the cell. [1] Deritei, D et al. "Principles of dynamical modularity in biological regulatory networks." Scientific reports 6 (2016): 21957. [2] Zañudo, JGT, R Albert. "An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks." Chaos 23.2 (2013): 025111.
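For readers unfamiliar with synchronous Boolean dynamics, a minimal sketch of synchronous updating and cyclic-attractor detection on a toy three-node negative feedback loop; the rules below are invented for illustration and are not the authors' cell-cycle model.

```python
# Toy 3-node Boolean network (NOT the authors' cell-cycle model):
# A is inhibited by C, B is activated by A, C is activated by B.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}

def step(state):
    """Synchronous update: every node reads the same current global state."""
    return {node: rule(state) for node, rule in rules.items()}

def attractor(state, max_steps=100):
    """Iterate until a previously visited state recurs; the recurring loop is the attractor."""
    seen = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]  # cyclic (or fixed-point) attractor
        seen.append(state)
        state = step(state)
    return None

start = {"A": False, "B": False, "C": False}
print(attractor(start))   # a 6-state cyclic attractor for this toy feedback loop
```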

11:45
The effective graph captures canalizing dynamics and control in Boolean network models of biochemical regulation

ABSTRACT. Boolean Networks (BN) are useful models of biochemical regulation in systems biology, capturing the qualitative dynamics of genetic control, cell signalling, metabolism, etc. An important aspect of these models is the ability to model both the structure and dynamics of multivariate interactions. However, the underlying interaction graph (the structure of the network) fails to accurately represent the variable effectiveness of pathways in propagating control signals, a consequence of logical redundancy in the dynamical transition rules of BN. This way, the dynamics is said to be canalized, as it is robust to interventions in redundant pathways, but responsive to interventions in effective pathways. Here we introduce the effective graph, a weighted subgraph of the original BN interaction graph, where edge strength reflects a statistical measure of nonlinear redundancy based on schema redescription and the Quine–McCluskey algorithm. The effective graph is thus a more parsimonious representation of how variables truly interact in a given BN, integrating both structure and dynamics. Using simple network motifs, random ensembles, and systems biology models, we show that the effective graph captures the canalizing dynamics of BN better than the original interaction graph. Specifically, we show that the effective graph: 1) is better at capturing the spread of perturbations after flipping a node’s state, such as those induced by drug therapies or gene knockouts; 2) helps understand how control operates and propagates in BN, including revealing how structure-only methods under- or over-estimate the set of required control variables. Overall, our results indicate that the effective graph provides a valuable enriched description of multivariate interactions in BN that can improve our understanding, prediction, and control of complex systems in general and biochemical regulation in particular.

12:00
Oscillating nodes can drive a Boolean network into an attractor

ABSTRACT. Cell types and behaviors can be understood by network representation and dynamic modeling of the interaction among biomolecules. In Boolean networks, the simplest dynamic models, the future state of each node depends on a Boolean logic function of the current state of its regulators (parent nodes). The state transition graph obtained from this model gives great insight into the possible system trajectories and attractors [1]. An alternative method to identify the attractors is based on incorporation of the regulatory logic into the network, and then finding generalized positive feedback loops called stable motifs and determining the key driver nodes of the stable motifs [2,3]. These driver nodes are the nodes which, when fixed to their state corresponding to a particular attractor, can drive the entire network to that attractor. Here, we present the use of causal logic [3] to identify nodes that can act as drivers even when their state is oscillating instead of being fixed. We find that when there is an exclusive sufficient pathway from the oscillating node state to a motif’s driver node, and the average ON period of the oscillation is larger than or equal to the maximum time required by the driver node to stabilize the motif, then the oscillation definitely stabilizes the motif. We use the number of nodes and various cycle lengths in a motif to estimate the maximum time required to stabilize the motif (see Fig. 1A for an example). Fig. 1B shows an example from plant cell biology where we observe this kind of network structure [4].

12:15
Target controllability in genetic networks of macrophage activation

ABSTRACT. Macrophage cells play an important role in Multiple Sclerosis (MS). They are known to participate both in the degenerative process (myelin destruction) and in the regenerative one (coordinating remyelination). We focus on two activation states of the macrophages: ‘alert’, which senses the environment, and ‘pro-inflammatory’, which is responsible for the cell's defense against external agents. The correct genetic activation of macrophage phenotypes permits a correct remyelinating response [1], thus the possibility to steer it towards a healthy state while acting on a limited number of genes (drivers) would be greatly advantageous.

We model macrophage activation as a network (Figure 1.a), where N nodes correspond to genes involved in inflammation and directed links correspond to significant influences (inhibition or activation) as retrieved from the macrophages.com [2] activation pathways. To enhance interpretation, genes are assigned to four different categories, according to their position inside a cell. We assume linear time-invariant dynamics and model our problem in a target controllability framework [3]. Each gene is tested as a driver, and the target nodes are the 19 genes for which the difference in gene expression between the two states of the macrophage cells is most significant (p < 0.05) for patients and controls (Figure 1.b). Since computing the rank of the controllability matrix is ill-conditioned for a large network, it is not possible for us to test all target nodes at the same time. We compute the target control centrality as the number of target nodes that can be controlled from a driver node, when the targets are chosen as follows.
Step 0: the target set contains the first target node; the target control centrality is zero.
Step 1: build the subgraph on which to apply the Kalman criterion, i.e. the nodes accessible from the driver that can reach the target set.
Step 2: check target controllability: if the configuration is not controllable, discard the last target added to the target set and test the next one; otherwise, include the next target in the target set and increase the target control centrality by one.
Repeat steps 1 and 2 until the last target node is tested.
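A rough sketch of the greedy target-selection idea described above, using Kalman's rank condition on the output (target) controllability matrix of a linear time-invariant system; the adjacency matrix, driver, and target list are placeholders, and the subgraph-construction step (Step 1) is omitted for brevity.

```python
import numpy as np

def controllability_matrix(A, B):
    """Kalman controllability matrix [B, AB, A^2 B, ...]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def target_control_centrality(A, driver, targets):
    """Greedy variant of the abstract's procedure: add targets one at a time and
    keep only those that leave the target-projected system controllable."""
    n = A.shape[0]
    B = np.zeros((n, 1))
    B[driver, 0] = 1.0
    kept = []
    for t in targets:
        trial = kept + [t]
        C = np.eye(n)[trial, :]               # output matrix selecting the targets
        K = C @ controllability_matrix(A, B)  # target (output) controllability matrix
        if np.linalg.matrix_rank(K) == len(trial):
            kept = trial                      # still controllable: keep this target
    return len(kept), kept

# Placeholder network: a small random directed adjacency matrix
rng = np.random.default_rng(0)
A = (rng.random((10, 10)) < 0.2).astype(float)
print(target_control_centrality(A, driver=0, targets=list(range(1, 10))))
```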

Results show that the driver nodes IRF7, PRKCD and STAT1, known to be involved in MS [4,5], can control up to 7 target nodes.

Our work is a preliminary step towards the identification of the genes influencing the inflammatory process of macrophages, which is a crucial mechanism in MS.

12:30
The clock of chemical evolution
SPEAKER: Gabor Vattay

ABSTRACT. Chemical evolution is essential in understanding the origins of life. We present a theory for the evolution of molecule masses and show that small molecules grow by random diffusion and large molecules by a preferential attachment process, leading eventually to life's molecules. It correctly reproduces the distribution of molecules found via mass spectroscopy for the Murchison meteorite and dates the start of chemical evolution to 12.8 billion years ago, following the birth of stars and supernovae. From the frontier mass between the random and preferential attachment dynamics, the birth time of molecule families can be estimated. Amino acids emerge about 165 million years after the start of evolution. Using the scaling of reaction rates with the distance of the molecules in space, we correctly recover the few-days emergence time of amino acids in the Miller-Urey experiment. The distributions of interstellar and extragalactic molecules are both consistent with the evolutionary mass distribution, and their ages are estimated at 108 and 65 million years after the start of evolution. From the model, we can determine the number of different molecule compositions at the time of the creation of Earth to be 1.6 million, and the number of molecule compositions in interstellar space to be a mere 719.

11:15-12:45 Session 4B: Spreading (social)
11:15
Rumor Spreading in Structured Populations
SPEAKER: Jessica Davis

ABSTRACT. Information spreading models attempt to explain how rumors, values, and ideas propagate through a population. Prior research has informed our understanding of how information spreading affects financial systems, product promotion, and data dissemination. A significant milestone in this area is the use of complex networks as the underlying substrate over which these rumors can spread. While the use of networks has revealed how different structural properties can either hinder or encourage information propagation, in order to move forward these models need to incorporate more realistic and complex properties that can capture the key features of large social systems. Individual interaction and exposure to information are limited by a variety of factors (geography, ideology, field of study, etc.). Incorporating a structure that is able to consider these constraining elements can reveal significant properties affecting the dynamics of information spreading.

In this work we analyze the Daley-Kendall (DK) rumor model, which describes the spread of a rumor through a population based on competing interactions between agents, on structured networks (Fig. A and B). While this model is similar to general epidemic models, the loss-of-interest (recovery) rate is instead defined as an interaction between individuals, which effectively removes the rumor threshold. This means that for a single, homogeneously mixed population, a rumor will always be known by a macroscopic fraction of the population. However, in a structured population individuals may belong to different communities or subpopulations that are coupled together by a tunable rate of interaction (e.g. diffusion of individuals, modularity). We use a Levin framework to model the rumor spreading process at the subpopulation level and find that critical points (global invasion thresholds) emerge that depend on both the level of interaction between subpopulations and the global network structure. These threshold values are verified by Monte Carlo simulations over a number of parameters and network topologies (Fig. C). The results reveal that while there is no local (subpopulation) rumor threshold, the global network structure plays a crucial role in how a piece of information propagates through a population. We further analyze the DK model on two real-world networks, the European commuting and airline network and the DBLP collaboration network, and recover the threshold behavior observed in the synthetic networks (Fig. D and E). Our analysis is an initial step in understanding how the global structure of a population affects the spreading of information, innovation, and collective behavior.
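As background, a minimal Monte Carlo sketch of Daley-Kendall-type rumor dynamics in a single well-mixed population (a Maki-Thompson-style simplification, not the authors' metapopulation framework); population size and number of steps are placeholders.

```python
import random

def dk_rumor(N=10000, steps=200000, seed=1):
    """Minimal well-mixed rumor dynamics: ignorant (I), spreader (S), stifler (R).
    A spreader contacting an ignorant recruits it; a spreader contacting another
    spreader or a stifler loses interest and becomes a stifler (Maki-Thompson-style
    simplification of the Daley-Kendall interactions)."""
    random.seed(seed)
    state = ["I"] * N
    state[0] = "S"
    for _ in range(steps):
        i = random.randrange(N)
        if state[i] != "S":
            continue
        j = random.randrange(N)
        if j == i:
            continue
        if state[j] == "I":
            state[j] = "S"          # ignorant hears the rumor
        else:
            state[i] = "R"          # spreader meets spreader/stifler and stops
    # Fraction of stiflers at the end: roughly the fraction that heard the rumor
    # once the spreaders have died out (always macroscopic in a single population).
    return state.count("R") / N

print(dk_rumor())
```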

11:30
Beyond exposure: the complex pattern of social influence
SPEAKER: Yan Leng

ABSTRACT. Social influence manifests in and drives decisions in a wide range of behaviors, such as political voting, financial decisions, and commercial behaviors. The independent cascade model and the complex contagion model have been widely adopted to explain the diffusion of social influence as an outcome of single or multiple exposures to the existence of the decision to be made. In this study, we show that exposure itself does not suffice to explain social influence, which instead is a joint effect of exposure and preference elicitation by the spreader. We demonstrate this in three offline settings: visiting a grocery store, adopting microfinance, and enrolling in weather insurance. Our study provides empirical evidence to enrich diffusion models and informs network-based interventions to take advantage of preference elicitation.

11:45
Collective Information Processing in Human Rumor Spreading Networks

ABSTRACT. Rumors arise during ambiguous and potentially dangerous situations, as a way to help people understand potential risks. These unverified pieces of information can have powerful negative impacts, and their study has recently drawn widespread attention, especially as they pertain to online social media. Previous work on the fidelity and accuracy of rumor propagation in the real world is at odds with laboratory experiments, however, which find that rumors often become wildly distorted as they pass between individuals. It remains unknown how complex social interactions, such as the ways in which groups can combine noisy information to reduce overall uncertainty, may account for this difference. Here we design an experiment to examine whether redundant rumor pathways influence the fidelity with which rumors are spread across networks (Fig. 1(a)). In a pilot study with 40 participants, we observed that rumor recall accuracy decays over time, consistent with prior work. We do not observe a difference in the accuracy of rumors between the transmission-chain and network scenarios (Fig. 1(b)), yet achieving statistical significance is difficult given the experiment's small scale. We will expand on this research question by recruiting more participants using MTurk. Consistent with previous findings, we observe that information leveling, or forgetting, is the dominant process, but we also observe addition of information and information fusion between different sources. The results of our analysis shed light on the ways in which collective wisdom may help or hinder the propagation of rumors.

12:00
Linear locally Bayes-optimal confidence-weighting can explain information cascades in group decision-making

ABSTRACT. Understanding the dynamics of group decision-making is a challenging endeavour as it must combine models of the individual's behavioural rules with analysis of the collective dynamics. We proved that a naive Bayes-optimal update rule that weights the neighbours' opinions by their confidence is equivalent to linear summation. This result is in agreement with two customary assumptions from behavioural ecology, psychology, and neuroscience, namely that individuals follow optimal probabilistic rules and that computation is based on integration of evidence over time. We showed analytically that locally-optimal Bayesian integration of weighted votes, in order to reach a group decision, is described by an unstable linear dynamical system. We proposed that this instability will result, in some circumstances, in reduced time to group consensus, at the expense of group accuracy. We numerically compared the naive locally Bayes-optimal algorithm, Weighted Bayes Consensus, against an existing linear consensus algorithm with guaranteed convergence, Belief Consensus. In most cases tested Weighted Bayes Consensus converged, but not in all. On some tested graphs, Weighted Bayes Consensus achieved almost the same group accuracy as Belief Consensus in significantly fewer timesteps. In other tested graphs, however, group accuracy suffered much more, albeit with a much greater improvement in decision speed. In general, our results show how individually statistically optimal decision rules can lead to linearly unstable group decision performance, which trades group decision speed against group decision accuracy. This may be of interest given the prevalence of 'information cascades' in decision-making groups of humans and other animals, in which early erroneous information comes to dominate. From the perspective of evolutionary biology, since natural selection acts at the level of the individual rather than the group, our results may help provide a normative explanation for such apparently non-adaptive behavioural outcomes.
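A minimal sketch of the kind of confidence-weighted Bayesian vote combination referred to above: with independent neighbours, the posterior log-odds is a linear sum of the individual log-odds, which is the equivalence to linear summation mentioned in the abstract. This is an illustrative reconstruction, not the authors' Weighted Bayes Consensus implementation.

```python
import numpy as np

def bayes_weighted_vote(confidences, votes, prior=0.5):
    """Naive Bayes combination of binary votes weighted by confidence.
    confidences[i] = P(vote_i is correct); votes[i] in {+1, -1}.
    Assuming independent voters, each vote contributes v_i * log(c_i / (1 - c_i))
    to the posterior log-odds, so the optimal update is a linear summation."""
    log_odds = np.log(prior / (1 - prior))
    for c, v in zip(confidences, votes):
        log_odds += v * np.log(c / (1 - c))
    return 1 / (1 + np.exp(-log_odds))   # posterior probability that '+1' is correct

print(bayes_weighted_vote([0.9, 0.6, 0.7], [+1, -1, +1]))
```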

12:15
Spreading and influence of fake and traditional fact-based news in Twitter

ABSTRACT. While misinformation and propaganda have existed since ancient times, their importance and influence in the age of social media is still not clear. Here, we characterize and compare the spread of information from websites containing fake news with the spread of information from traditional news websites on the social media platform Twitter using a dataset of more than 170 million tweets covering the five months preceding election day and concerning the two main candidates of the 2016 US presidential election.

We identify 30 million tweets containing a URL directing to a news outlet, sent by 2.2 million users. We classify news outlets among the top 250 shared domain names as spreading fake news or traditional news, from right to left, based on a list of news outlets curated by independent fact-checking and bias-ranking organizations. We find that 25% of the tweets linking to news outlets point to websites containing fake or extremely biased news, and that automated accounts diffusing fake news are much more active than the automated accounts diffusing other types of news.

We analyze the information diffusion networks by reconstructing the retweet networks corresponding to each type of news. We find that the networks of users diffusing fake news have a higher average degree and a less heterogeneous degree distribution than traditional news diffusion networks. Influencers of each news network are identified using the collective influence algorithm (Morone & Makse, 2015). While influencers of traditional news outlets are journalists and public figures with verified Twitter accounts, most influencers of fake news and extremely biased websites are unknown users or users with deleted Twitter accounts.

We use causal modeling to understand the influence of the top news spreaders' activity on the activity of supporters, identified with a machine learning approach. We reveal that influencers of traditional news drive the activity of most of Twitter, while the top spreaders of fake news are, in fact, mostly following the activity of Trump supporters.

Our investigation provides new insights into the dynamics of news diffusion on Twitter; namely, our results suggest that fake and extremely biased news are governed by a different diffusion mechanism than traditional center and left-leaning news. Center and left-leaning news diffusion is driven by a small number of influential users, mainly journalists, and follows a diffusion cascade in a network with a heterogeneous degree distribution, which is typical of diffusion in social networks, while the diffusion of fake and extremely biased news seems not to be controlled by a small set of users but rather to take place in a tightly connected cluster of users that do not influence the rest of Twitter activity. Our results reveal that Twitter activity is mainly driven by traditional center and left-leaning news outlets.

12:30
How Communities Mediate the Diffusion of New Ideas: The Spread of Intersectionality

ABSTRACT. This research contributes to closing the gap between computational and qualitative studies on the diffusion of scientific ideas. While computational studies have shown how network structures facilitate or obstruct diffusion, qualitative studies have highlighted that diffusion entails continuous translation and transformation of the spreading idea. We build on both these types of studies to examine how communities mediate the diffusion of ideas. As a case study, we analyze the spread of the ideas of intersectionality, first conceptualized by Kimberle Crenshaw in 1989. Using Scopus' data, we construct a network of scholars that have used the ideas of intersectionality in one of their publications, and we draw directed edges to reflect the pathways of diffusion of intersectionality among these scholars. We find that the diffusion network features communities of scholars, each assembled around one or a few highly cited scholars central within the community, who perform pivotal roles in introducing the ideas of intersectionality to their peers. By combining network analysis with automated text analysis and close reading, we test and develop the hypothesis that these scholarly communities adopt and adapt intersectionality in unique ways, in line with their research interests and familiar narratives. Particular attention is paid to the role of the hub-scholars, evaluating their influence in their community and their role in the translation of the ideas of intersectionality. We argue that our approach has potential beyond the scientific domain, particularly in the study of the diffusion of opinions, symbols, and ideas.

11:15-12:45 Session 4C: Theory (inference/models, theory 1)
11:15
Specialization in Networks: Interplay of Growth and Dynamics
SPEAKER: Ben Webb

ABSTRACT. One of the characteristics observed in real networks is that, as a network's topology evolves, so does the network's ability to perform various complex tasks. To explain this, it has also been observed that as a network grows, certain subnetworks begin to specialize the function(s) they perform. We introduce a model of network growth based on this notion of specialization and show that as a network is specialized, via this model, its topology becomes increasingly modular, hierarchical, and sparse, all of which are important properties observed in real networks. This model is also highly flexible in that a network can be specialized over any subset of its components. By selecting these components (vertices) in various ways we find that a network's topology acquires some of the most well-known properties of real networks, including the small-world property, disassortativity, power-law-like degree distributions and clustering coefficients. This growth model also maintains the basic spectral properties of a network, i.e. the eigenvalues and eigenvectors associated with the network's adjacency matrix. This in turn allows us to show that a network maintains certain dynamic properties, specifically stability under mild conditions, as the network's topology becomes increasingly complex due to specialization.

11:30
Learning dynamical processes on complex networks from time series

ABSTRACT. Dynamical processes on real complex networks are typically accessed through time series of the state of every node [1]. A popular strategy to further our understanding of these underlying complex mechanisms is to develop mechanistic representations of these phenomena: simple dynamical models. While they offer useful, often mathematical, insights [2], they usually lack effective predictive power. The reason is that these real phenomena can hardly be represented by such simplistic representations, and more complex representations are needed to provide better predictions. In this vein, we consider the problem of learning dynamical processes on complex networks from actual time series data using deep learning. We show that this multivariate time series forecasting (MTSF) task can scarcely be learned successfully by standard machine learning approaches [3]. This is due in part to the lack of structural information encompassed by these models, in which the state of each node depends directly on the state of every other node. Inspired by recent advances in deep convolutional neural networks, we develop a conditional variational auto-encoder (CVAE) framework that takes such structural information into account explicitly. Instead of connecting the present state of each node to the past states of every node in the network within the CVAE, as in standard MTSF approaches, we consider a different CVAE for each individual node. Each model then conditions the present state of its node only on its own past states and those of its neighbors, with shared parameters. Using non-Markovian variations of the Susceptible-Infected-Susceptible dynamics, we show that our approach allows the generative model to learn the dynamical rules between nodes more effectively than standard MTSF approaches. By doing so, we develop a proof of concept, in line with Ref. [4] but in the realm of complex networks, which advocates the use of machine learning for predicting real complex network dynamics.

[1] A. Barrat, M. Barthélemy, A. Vespignani, Dynamical Processes on Complex Networks, Cambridge University Press (2008). [2] G. St-Onge, J-G. Young, E. Laurence, C. Murphy, L. J. Dubé, Phys. Rev. E 97, p. 022305 (2018). [3] M. Längkvist, L. Karlsson, A. Loutfi, Pattern Recognit. Lett. 42, pp. 11-24 (2014). [4] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, Phys. Rev. Lett. 120, p. 024102 (2018).

11:45
Emergence of Scaling in Complex Substitutive Systems
SPEAKER: Ching Jin

ABSTRACT. A considerable number of ideas, products and behaviors spread by substitution: to adopt a new item, one needs to give up the old one. Consider the rare coexistence of creationism and evolutionism, the development of new healthy habits, or the fact, highlighted by Kuhn's term "paradigm shifts", that scientists adopt a better description of reality (e.g. Conservation of Energy) and abandon the old framework (e.g. Phlogiston). Despite its ubiquity, our understanding of substitutive systems remains at the level of simple transitions, ignoring the complex networked relationships among products, partly due to the lack of empirical datasets tracing their dynamics. With five years' endeavor, we collected four high-resolution datasets on substitutions: 1) handsets (8,928 different types adopted by 3.6 million users in a decade), 2) automobiles (all models sold in North America in 5 years), 3) scientific fields (4,320 scientific fields studied by 246,630 scientists in 20 years) and 4) mobile apps (2,672 most popular apps in the Apple Store). By studying their impact dynamics and mapping substitutive patterns to evolving weighted networks (Fig. 1AB), we discovered three generic principles governing substitution, enabling us to build a minimal substitution model, which not only accurately captures the microscopic dynamical patterns (Fig. 1C) and demonstrates the structural relationships between products, but also elucidates an unexpected temporal scaling of substitutive systems, allowing us to collapse diverse impact trajectories from different systems onto one universal curve (Fig. 1DE). These results highlight a remarkable degree of regularity and universality in substitutive systems, providing reliable measures of impact in such systems that may have potential policy implications.

12:00
A Heuristic Approach for Determining Driver Nodes in Complex Networks
SPEAKER: Babak Ravandi

ABSTRACT. Large complex dynamical systems behave in ways that reflect their structure. There are many applications where we need to control these systems by bringing them from their current state to a desired state. Affecting the state of these systems is done by communicating with their key elements, called driver nodes, in reference to their representation as a network of nodes. Over the past decades, much attention has been paid to analytical approaches that derive optimal control configurations based on the concept of the Minimum Driver node Set (MDS) for directed complex networks. However, the time and computational cost required to reach control using the MDS is not always efficient. In many applications, quickly locating a small set of nodes that efficiently controls a significant percentage of the network is a better alternative. This is the approach we take in this paper. We seek to understand and employ the statistical characteristics of MDSs to design efficient heuristic approaches; more precisely, to identify and use a small set of drivers that is effective in controlling most of the network. We test the heuristic approach and discuss its effectiveness for a variety of networks. Compared to the analytical approaches, the proposed heuristic runs in linear computational complexity and controls a significant percentage of the network. Also, in terms of the minimum time required to achieve controllability, the heuristic approach is orders of magnitude faster than using the minimum driver nodes.

Fig. 1 illustrates the performance of our heuristic on a few networks. The fraction of minimum driver nodes required to control the networks is denoted by n_d, and the fraction of driver nodes used by the heuristic approach is denoted by n_d'. The minimum time needed to reach full control using the MDS approach is denoted by \tau, and the minimum time needed to reach maximum partial control using the heuristic approach is denoted by \tau'. The maximum percentage of the network controlled by the heuristic approach is denoted by c_c. The minimum time, i.e. the controllability index (\tau and \tau'), and the control coverage (c_c) are defined based on Kalman’s rank condition. Lastly, the cost of the heuristic approach (the percentage of the network sampled to create a driver node set) is denoted by S in Fig. 1 (a-b). The results show that, in exchange for using extra driver nodes, the heuristic outperforms the MDS approach in terms of the minimum time required to reach control. Moreover, our empirical results show that by sampling 70% of the network, the heuristic approach achieves 85% control coverage (due to space limits, not all results are presented in this abstract).

12:15
Inference and Model Selection for Mechanistic Network Models
SPEAKER: Jp Onnela

ABSTRACT. There are (at least) two prominent paradigms to the modeling of network structure. In the statistical (probabilistic) approach, one describes a model that specifies the likelihood of observing a given network, i.e., these are probabilistic models of data in the shape of a network. In the mechanistic approach, one specifies a set of domain-specific mechanistic rules that are used to grow or evolve the network over time. Both modeling approaches provide distinct angles and advantages to our understanding of complex systems. One of the main strengths of the statistical approach is the ability to perform parameter inference and model selection in a statistically principled manner, but a downside is that these models usually do not scale well, and incorporation of data generating mechanisms in the likelihood function can be difficult. Mechanistic models, on the other hand, make it easy to incorporate aspects of network generation and these models are generally scalable, but there are few principled tools available for them for parameter inference and model selection. I will present a framework that may be used to carry out both parameter inference and model selection for mechanistic network models, and I will demonstrate its application to both simulated and empirical networks.

12:30
Choosing to grow a graph: Modeling network formation as discrete choice
SPEAKER: Jan Overgoor

ABSTRACT. We provide a framework for modeling social network formation through conditional multinomial logit models from discrete choice and random utility theory, in which each new edge is viewed as a "choice" made by a node to connect to another node, based on (generic) features of the other nodes available to make a connection. This perspective on network formation unifies existing models such as preferential attachment, triadic closure, and node fitness, which are all special cases, and thereby provides a flexible means for conceptualizing, estimating, and comparing models. The lens of discrete choice theory also provides several new tools for analyzing social network formation; for example, mixtures of existing models can be estimated by adapting known expectation-maximization algorithms, and the significance of node features can be evaluated in a statistically rigorous manner. We demonstrate the flexibility of our framework through examples that analyze a number of synthetic and real-world datasets. For example, we provide rigorous methods for estimating preferential attachment models and show how to separate the effects of preferential attachment and triadic closure. Non-parametric estimates of the importance of degree show a highly linear trend, and we expose the importance of looking at nodes with degree zero. Examining the formation of a large citation graph, we find evidence for an increased role of degree when accounting for age.
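A minimal sketch of the discrete-choice view of edge formation: each new edge is a multinomial-logit choice over candidate nodes, and a utility proportional to the log of (degree + 1) yields a smoothed preferential attachment. The feature and coefficient below are placeholders for illustration, not the authors' estimated model.

```python
import numpy as np

def choose_target(degrees, theta=1.0, rng=np.random.default_rng()):
    """Each new edge is a discrete choice over existing nodes.
    Utility u_j = theta * log(degree_j + 1); with theta = 1 the choice
    probabilities are proportional to degree_j + 1, a smoothed preferential
    attachment (the +1 keeps degree-zero nodes reachable)."""
    u = theta * np.log(np.asarray(degrees) + 1.0)
    p = np.exp(u - u.max())
    p /= p.sum()
    return rng.choice(len(degrees), p=p)

# Grow a small graph: every new node picks one existing node to connect to.
degrees = [1, 1]            # start from a single edge between nodes 0 and 1
for _ in range(100):
    j = choose_target(degrees)
    degrees[j] += 1
    degrees.append(1)
print(sorted(degrees, reverse=True)[:5])   # a few heavily chosen hubs emerge
```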

11:15-12:45 Session 4D: Success 2
11:15
The disciplinary structure of nation’s scientific production
SPEAKER: Lili Miao

ABSTRACT. We investigated the disciplinary structure of nations' scientific production and its temporal evolution using an annotated bibliometric dataset of publication counts from the Web of Science for over 200 countries between the years 1973 and 2016, organized into 143 disciplinary classifications. We used the revealed comparative advantage (RCA) as an indicator of a country's strength in a certain discipline, and applied it to examine how production strength in certain disciplines is distributed across countries. The RCA-based discipline proximity network revealed three clusters of disciplines and corresponding countries: one on engineering and physics, another on medical sciences, social sciences, and the humanities, and a third on naturalist disciplines. We also found that scientifically advanced countries tended to have more homogeneous disciplinary production, and that the disciplinary structures of neighboring countries tended to be similar. We further found that the Gini index of the disciplinary distribution of a country's scientific production was correlated with its economic complexity index. Our study reveals common patterns of scientific production, which may lay a foundation for further analysis of the development of the scientific enterprise in each nation as well as the relationships between scientific production and economic development.
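The abstract does not spell out the RCA formula; for reference, the conventional Balassa definition, which is presumably the one used here, is:

```latex
% Revealed comparative advantage (Balassa index) of country c in discipline d,
% where x_{c,d} is the publication count of country c in discipline d:
\mathrm{RCA}_{c,d} \;=\;
\frac{x_{c,d} \,/\, \sum_{d'} x_{c,d'}}
     {\sum_{c'} x_{c',d} \,/\, \sum_{c',d'} x_{c',d'}},
\qquad
\mathrm{RCA}_{c,d} > 1 \;\Rightarrow\; \text{country } c \text{ is relatively strong in discipline } d.
```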

11:30
Scientific Prize Network Predicts Who Pushes the Boundaries of Science
SPEAKER: Yifang Ma

ABSTRACT. Scientific prizes create strong incentives to resist conservative tendencies and encourage exploration of high-risk, high-impact science. For a scientist, winning a scientific prize can be considered an ultimate recognition of her/his prior scientific achievements and an incentive for further breakthroughs in the future. For these reasons, the prestige attached to prizes ("a signal of the best of the best work") and the number of prizes have grown. For a long time, following Zuckerman's landmark work, the Nobel prize and Nobel laureates have been well studied. Yet little is known about the intricate network of prizes when considering all scientific prizes together, their interdisciplinary linkages, and how locations in social ties predict future prizewinning.

By curating a unique dataset covering more than 3,000 different prizes in biomedicine, chemistry, mathematics and physics for 10,455 winners, more than 50 countries and over 100 years, we summarized the evolution of a century's scientific reward system: the exponential growth of winners, gender balance, geographical locations and hidden Matthew effects. We construct a prize network in which nodes are prizes and link weights count the number of cases in which two prizes are won by the same person within a short period of her/his career. We find several key links between prizes and scientific advances. First, despite an explosive proliferation of prizes over time and across the globe, prizes are more concentrated within a relatively small group of scientific elites, and ties among elites are highly clustered, suggesting that a relatively constrained number of ideas and scholars push the boundaries of science. For example, 64.1% of prizewinners have won two prizes and 13.7% have won five or more prizes. Second, certain prizes strongly interlock disciplines and sub-disciplines, creating key pathways by which knowledge spreads and is recognized across science. Third, when we link the winners using their collaboration and genealogical ties, a giant component emerges which contains half of the winners. We further built a model to confirm that these two kinds of ties are predictive of multiple prizewinning. Our work describes the narrative of science from a new perspective and provides a basis for predicting prizewinners, celebrating topics, and revealing the interconnectedness among celebrated scientists and their path-breaking ideas.
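A minimal sketch of how such a prize co-winning network could be assembled from (winner, prize, year) records; the records and the 10-year window are placeholders, not the authors' data or their exact definition of a "short period".

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (winner, prize, year). The 10-year window is a placeholder.
records = [
    ("alice", "Prize A", 1990), ("alice", "Prize B", 1995),
    ("bob",   "Prize A", 2001), ("bob",   "Prize C", 2012),
]
WINDOW = 10

by_winner = defaultdict(list)
for winner, prize, year in records:
    by_winner[winner].append((prize, year))

edge_weight = defaultdict(int)   # link weight between two prizes
for wins in by_winner.values():
    for (p1, y1), (p2, y2) in combinations(sorted(wins, key=lambda w: w[1]), 2):
        if p1 != p2 and abs(y1 - y2) <= WINDOW:
            edge_weight[frozenset((p1, p2))] += 1   # same person, within the window

print(dict(edge_weight))   # e.g. {frozenset({'Prize A', 'Prize B'}): 1}
```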

11:45
Taking census of physics

ABSTRACT. Despite being one of the oldest academic disciplines, defining physics is not an easy task. In this era of interdisciplinary science, of biological physics, network science and econophysics, it is becoming increasingly clear that referring to it as the science of the properties of matter and energy is nowadays outdated and inaccurate. Physicists work on very different topics, from particle physics to condensed matter and interdisciplinary physics, and their scientific production varies widely depending on the research area(s) to which they belong.

A thorough analysis of Web of Science data spanning more than 100 years has recently shed light on how physics has taken shape and changed over time. While it is true that physics has always been in constant dialogue with other disciplines, be it mathematics, chemistry or biology, Ref. [1] argues that much of this dialogue has been largely driven by the application of a common methodology, and the idea that in many cases complex phenomena can be understood in terms of a small number of universal laws. Following in the footsteps of the late Cambridge physicist Sam Edwards, who once remarked that 'Physics is what physicists do', Ref. [1] collected and analysed all available publications linked to the physics literature and provided a data-driven answer to what physics is. The research reveals that, although a consistent number of such manuscripts appear in a core set of physics journals, an even larger fraction of the physics literature is, surprisingly, not published in traditional physics journals. These interdisciplinary physics publications are growing rapidly; they are published in journals including Nature and Science, up to the extreme case of the founding paper of chaos theory, which appeared in the Journal of the Atmospheric Sciences.

Building on this result, in this work [2] we focus on the inner structure of physics by looking at the evolution of the many subfields of which it is composed. We first propose a data-driven method to estimate how many physicists work in each field based on the full records of the Web of Science: (i) we first select the PACS (subfield) code of each paper that appeared in the journals of the American Physical Society, (ii) we then propagate this information to other physics papers that appeared outside the APS consortium but are significantly referenced and cited by the initial core of papers, and (iii) we finally filter the resulting author-field bipartite network suitably to assign scientists to the research topics in which they worked for a significant share of their career.

We first show the evolution of the different physics domains, and the relative decrease in the number of authors working on traditional topics such as nuclear physics, in favour of other areas such as interdisciplinary physics, electronic physics and astrophysics. We then take advantage of the newly generated dataset to understand and quantify differences across the physics spectrum in terms of average productivity, career length, field transitions and the role of mentorship. Our analysis unveils very different patterns of scientific production across physics disciplines. For instance, while physicists working in nuclear physics appear to be highly productive in terms of the number of published papers per author, they are at the bottom of the list when information about the average team size per field is included to compute fractional productivity. Moreover, whereas some disciplines such as condensed matter require engagement from the beginning of the career, some others can still be explored successfully later in a scientific life. Strong gate-keeping effects appear to hold particularly for experimental topics and those fields where mentorship plays a particularly strong role in the successful development of young scientists. We believe that, by shedding light on how to better assess and compare the scientific production of physicists across different research topics, our work will be a useful reference for academic editors, hiring committees and funding agencies that must deal with the growing and increasingly rich structure of modern physics.

12:00
Probabilistic Reference Networks and Information-Theoretic Measure of Novelty and Influence in Creative Works
SPEAKER: Juyong Park

ABSTRACT. Amid the surging interest in the scientific understanding of human behavior and intelligence, advanced methods and high-quality data on intellectual and creative achievements are enabling deep investigations into how creativity, an essential human faculty, crystallizes into novel knowledge and culture. To understand how these achievements form and evolve, we characterize the interconnected nature of creativity using the concepts of novelty and influence, which differentiate and connect creative achievements. Our general framework starts from the probabilistic reference network, which compares the content of two creative works to estimate the likelihood that one referenced the other. This overcomes the following problems of classical citation networks (Figure 1A): first, with the exception of research publications, references are seldom given explicitly, especially in cultural creations; second, since citations are self-reported, they are subject to incompleteness due to human bias or misinformation. Our content-based reference/influence network between creative works (Figure 1B) considers all known data on older works and also accounts for the possibility that each element of a work could have been invented or originated from unknown sources by introducing a general prior. We define information-theoretic measures of novelty and influence based on these probabilities, and apply them to a comprehensive MIDI dataset of western classical piano music, which has high scientific as well as cultural value. We find that continual innovation against oneself as well as against the entire field helps one become an influential figure, and show that the developmental history of a creative enterprise can be visualized as a series of paradigmatic shifts in creative styles.

12:15
On learning from repeated failures
SPEAKER: Yian Yin

ABSTRACT. Human achievements are often preceded by repeated attempts that initially fail, yet little is known about the mechanisms governing the dynamics underlying repeated failures. Here we develop a simple one-parameter model that mimics how successful future attempts build on past ones. Analytically solving this model reveals a first-order phase transition that separates patterns of repeated failures into regions of stagnation or progression, predicting that agents with the same characteristics and learning strategies could experience fundamentally different outcomes: they either explore distinct opportunities without a pattern of improvement, or exploit incremental refinements to systematically advance toward ultimate success. The model makes several empirically testable predictions, demonstrating that those who eventually succeed and those who do not are characterized by fundamentally distinct efficiency and quality trajectories in their repeated attempts to attain success. We collected large-scale data from three disparate domains, tracing repeated attempts by (i) NIH investigators to fund their research, (ii) innovators to successfully exit their startup ventures, and (iii) terrorist organizations to claim casualties in violent attacks, finding broadly consistent empirical support that systematically verifies each of the four predictions of our model. Together, our findings unveil identifiable yet previously unknown early signals that allow us to anticipate the ultimate victory or defeat to which repeated failures may lead. Given the ubiquitous nature of failures and the paucity of quantitative approaches to understanding them, these results represent a crucial step toward a deeper understanding of the complex dynamics beneath failures, the essential prerequisites for success.

12:30
Quantifying the past and present of artificial intelligence research
SPEAKER: Morgan Frank

ABSTRACT. As artificial intelligence (AI) applications see wider deployment, it becomes increasingly important to study the social and societal implications of AI adoption. On one hand, AI adoption has the positive potential to reduce human error and human bias with algorithmic rigidity. As examples, AI systems have balanced judges towards more equitable bail decisions [Kleinberg et al (2017)], AI systems can assess the safety of neighborhoods from images [Naik et al (2017)], and AI systems can improve hiring decisions for board directors while reducing gender bias [Erel (2018)]. On the other hand, recent examples suggest that AI technologies can be deployed without understanding the social biases they possess or the social questions they raise. Consider the recent reports of racial bias in facial recognition software [Buolamwini & Gebru (2018)], the ethical dilemmas of autonomous vehicles [Bonnefon et al (2016)], and income inequality from computer-driven automation [Acemoglu & Autor (2011)]. So, are social scientists keeping pace with the exponential advancement of AI research?

Here, we use the Microsoft Academic Graph to study the evolution of AI research and its related academic fields from 1950 to 2017. While we often use "artificial intelligence" today in reference to machine learning, the meaning of AI has fluctuated in the last 60 years to variably emphasize vision, language, speech, and pattern recognition (see Fig. 1A). Similarly, the referencing behavior of AI research, and of other academic fields towards AI research, has changed over time. Although early AI researchers exhibited strong referencing behavior towards Philosophy, Geography, and Art, modern AI research references Mathematics and Computer Science most strongly (see Fig. 1B). Conversely, other fields, including the social sciences, do not reference AI research in proportion to its growing paper production (see Fig. 1C).

Meanwhile, the diversity of scientific impact among AI research institutions is decreasing (see Fig. 1D), thus enabling a transition from academic institutions to industry institutions as the "center" of the AI research community (see Fig. 1E). This transition, which started around the first Conference and Workshop on Neural Information Processing Systems in 1986, may be aided by the increasing prominence of AI-specific conferences. Industry's growing role is alarming for the study of the social and societal dynamics of AI technologies because social science research is less likely to reference AI publications whose authors have industry-based affiliations. As AI research decreases its share of references to social science, the fields that study social bias, ethical concerns, and regulatory challenges may be left unaware of new AI technology.

11:15-12:45 Session 4E: Structure (data)
11:15
Mischaracterization of Core-Periphery Structure in Complex Networks

ABSTRACT. Core-periphery structure in complex networks is often characterized in one of two ways. The first, the Borgatti-Everett formulation, specifies a block model in which core nodes connect to one another, peripheral nodes connect to the core, but peripheral nodes do not connect to other peripheral nodes. The second, the k-core decomposition, proposes a layered structure, in which a nested hierarchy of “k-shells” is extracted by recursively removing all nodes with degree less than k. Both characterizations have been applied across a number of substantive domains, allowing researchers to address a variety of problems ranging from the modeling of transnational markets to the design of effective vaccination strategies.
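To make the contrast concrete, a small illustration (not from the abstract) of the two characterizations on a toy core-periphery graph: correlation with an idealized Borgatti-Everett pattern matrix versus the k-core decomposition.

```python
import networkx as nx
import numpy as np

# Toy graph: a densely connected core (nodes 0-4) with one pendant per core node (5-9)
G = nx.Graph()
G.add_edges_from((i, j) for i in range(5) for j in range(i + 1, 5))  # core clique
G.add_edges_from((i, 5 + i) for i in range(5))                       # peripheral spokes

# Borgatti-Everett view: compare the adjacency matrix with an ideal pattern
# (core-core ties present, periphery-periphery ties absent; one common variant
# also treats core-periphery ties as present).
core = set(range(5))
A = nx.to_numpy_array(G)
ideal = np.array([[0 if (i not in core and j not in core) else 1
                   for j in G.nodes()] for i in G.nodes()])
np.fill_diagonal(ideal, 0)
print("correlation with BE ideal pattern:", np.corrcoef(A.ravel(), ideal.ravel())[0, 1])

# k-core view: nested shells obtained by recursively removing low-degree nodes
print("core numbers:", nx.core_number(G))   # clique nodes land in the 4-core, pendants in the 1-shell
```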

However, the Borgatti-Everett and k-cores characterizations suggest two conflicting conceptualizations of what constitutes core-periphery structure. Specifically, the imagery of “nested layers” evoked by the k-cores decomposition aligns with a shell-like block model, rather than a Borgatti-Everett block model. This is problematic for fields outside of network science that frequently use “core-periphery” and “k-cores” interchangeably to describe the same mesoscale pattern because it masks the conflicting explicit and implicit conceptualizations made by different core-periphery detection algorithms. Consequently, this can result in unexpected and diverging characterizations of core-periphery structure if the assumed structure of an algorithm diverges from the existing structure within a network.

We assess the substantive consequences of such structural mischaracterizations. First, we align various core-periphery algorithms with different core-periphery assumptions by comparing the structures extracted by the algorithms on synthetic Borgatti-Everett and shell networks with the existing structures inferred by respective stochastic block models. Next, we evaluate how prominently networks exhibit the two structures in practice by using the KONECT network database to evaluate the model fit of Borgatti-Everett and shell stochastic block models across many real-world networks. Finally, having characterized both the algorithms and real-world networks in terms of the competing structures, we assess the downstream repercussions of mischaracterizing core-periphery structure in domain-specific contexts. This work provides conceptual clarity to the notion of “core-periphery structure” and suggests practical guidelines for carefully extracting that structure.

11:30
A nonasymptotic measure for characterizing heavy-tailed networks

ABSTRACT. Heavy-tailed networks are often characterized by their degree distribution's similarity to a power law. However, many such real-life networks do not have power-law degree distributions, and in most applications the scale-free nature of the network is irrelevant so long as the network possesses a heavy tail. In this presentation we introduce the Cooke-Nieboer Index, a non-asymptotic statistical measure of the heavy-tailedness of a network's degree distribution which does not assume a power-law form. We apply the measure to several synthetic and real-life networks, compare it to the standard tail index method, and discuss its ability to distinguish between heavy-tailed, random, and planar networks.

11:45
Network Statistics Extrapolation for Approximate Bayesian Computation
SPEAKER: Sixing Chen

ABSTRACT. Network models (both statistical and mechanistic) continue to increase in sophistication, but parameter estimation remains challenging for both classes of models due to unwieldy or intractable likelihoods, a scenario suited to likelihood-free methods. One recent likelihood-free method is approximate Bayesian computation (ABC), a simulator-based Bayesian approach. While seemingly suitable, the application of ABC to parameter estimation in network models requires the simulation of network realizations from candidate models in order to compute summary statistics of those realizations. This can be limiting or prohibitive when the observed network is large (e.g. social networks of millions of nodes), since generating large network realizations, as well as computing statistics of such realizations, can be time consuming. To enable the use of ABC for parameter estimation with large observed networks, we propose a procedure to extrapolate (in terms of the number of nodes) the values of the summary statistics in order to significantly reduce the time required to generate network realizations, together with sub-sampling to further reduce the time required to compute network statistics. The extrapolation procedure takes advantage of the predictable growth patterns (in terms of the number of nodes) that many network statistics exhibit, and can dramatically reduce the time required for the whole ABC procedure. We show that the resulting ABC posterior from extrapolation is very similar to the gold-standard ABC posterior from fully generated (no extrapolation) network realizations. The proposed procedure can extrapolate over more than one order of magnitude in size, and enables previously infeasible ABC analysis of large empirical networks.
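A minimal sketch of ABC rejection sampling with an extrapolation hook; the extrapolation function is a placeholder (identity here), since the authors' extrapolation procedure is not specified in the abstract, and the toy simulator and prior are invented for illustration.

```python
import numpy as np

def abc_rejection(observed_stats, simulate_small, extrapolate, prior_sample,
                  n_draws=1000, eps=0.1):
    """Plain ABC rejection sampling. `simulate_small` generates a small network
    realization for a candidate parameter; `extrapolate` predicts its summary
    statistics at the observed network's size (placeholder for the authors' procedure)."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        stats_small = simulate_small(theta)
        stats_big = extrapolate(stats_small)
        if np.linalg.norm(stats_big - observed_stats) < eps:
            accepted.append(theta)
    return accepted   # samples from the approximate posterior

# Toy usage with made-up pieces (all placeholders):
rng = np.random.default_rng(0)
obs = np.array([0.3])
post = abc_rejection(obs,
                     simulate_small=lambda th: np.array([th + 0.05 * rng.standard_normal()]),
                     extrapolate=lambda s: s,            # identity: no extrapolation
                     prior_sample=lambda: rng.uniform(0, 1))
print(len(post), np.mean(post) if post else None)
```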

12:00
Mapping undersampled networks

ABSTRACT. The map equation is a flow-based community detection method, which has proven effective also for capturing communities in the network structure itself. It takes advantage of the duality in information theory between finding regularities in data and compressing them, with the modular description length measured by weighted entropy terms. However, because the map equation builds on naive entropy estimation, it assumes complete data, such that analyzing undersampled networks can lead to overfitting. For example, the description length varies with the fraction of observed links, and the map equation favors more communities if undersampling breaks the giant component into smaller components below the detection threshold. As a consequence, it is challenging to evaluate results with cross-validation with anything but wasteful equal splits, and searching for the partition with the shortest code length can detect partitions with more and smaller communities than actually exist. To overcome these problems, we present two methods based on entropy estimation for undersampled discrete data. First, replacing the entropy terms in the map equation with the Grassberger estimator makes the modular description length nearly independent of the amount of data and therefore enables more effective cross-validation. Second, incorporating a Bayesian approach with a Dirichlet prior distribution into the map equation to estimate network flows avoids overfitting in undersampled networks and enables detection of statistically significant communities. We have integrated both approaches in the Infomap algorithm for detecting communities with the map equation framework and will demonstrate the benefits over the standard approach.
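To illustrate the undersampling problem, a small example (not the authors' estimator) comparing the naive plug-in entropy of undersampled visit counts with a simple Dirichlet-prior (additive-smoothing) alternative; with few observations the plug-in estimate falls well below the true entropy.

```python
import numpy as np

def plugin_entropy(counts):
    """Naive (maximum-likelihood) entropy of visit counts, in nats."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def dirichlet_entropy(counts, alpha=1.0):
    """Entropy of the posterior-mean probabilities under a symmetric Dirichlet(alpha)
    prior (simple additive smoothing): one way to make entropy terms less sensitive
    to undersampling, not the authors' exact estimator."""
    p = (counts + alpha) / (counts.sum() + alpha * len(counts))
    return -np.sum(p * np.log(p))

# Undersampled observation of a uniform 8-state flow (true entropy = log 8 ~ 2.08 nats)
counts = np.array([3, 2, 1, 1, 0, 0, 0, 0], dtype=float)
print(plugin_entropy(counts), dirichlet_entropy(counts))  # plug-in is biased low
```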

12:15
Sampling Networks by Nodal Attributes

ABSTRACT. Mapping out the underlying network is an essential part of studying complex systems. Various kinds of networks have been constructed from empirical data sets; however, this procedure is subject to noise and biases for various reasons. The consequences of such sampling can be nontrivial, depending on how networks are sampled, which in turn prevents us from naively generalizing the properties of the observed networks to the whole system. Indeed, there have been a number of theoretical and numerical studies on network sampling, which are roughly classified into random node/link sampling and path-based sampling (e.g., snowball sampling).

In this work, we study another class of network sampling, where links are sampled with a probability depending on the attributes of the nodes. More specifically, we study the case where each node $i$ has an attribute $h_i$ drawn from the distribution $\rho(h)$, and the link between nodes $i$ and $j$ is sampled with probability $r(h_i,h_j)$. A motivating example of this class is the model proposed in (Torok et al., PRE, 2016), which was introduced to explain a commonly observed degree distribution in data sets of social networks. While people communicate using various communication channels, i.e., devices and online services, the sources of the data sets are often limited to a single communication channel due to technical and privacy reasons. Thus, extracting a single communication channel is regarded as a sampling process, through which non-trivial biases are inevitably induced. Because each person has a different tendency to use a given communication channel, the sampling process is more plausibly modelled by introducing attributes for each person than by random or path-based sampling methods.
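A minimal numerical sketch of this sampling class; the network, the attribute distribution $\rho(h)$, and the sampling rule $r(h_i,h_j)$ below are illustrative choices, not those of the study.

```python
# Attribute-dependent link sampling: keep link (i, j) with probability r(h_i, h_j).
# The choices of rho(h) and r below are illustrative, not those of the study.
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)

def sample_by_attributes(g, rho, r):
    """Return a sampled copy of g, keeping each link (i, j) with prob. r(h_i, h_j)."""
    h = {v: rho() for v in g.nodes()}               # node attributes ~ rho(h)
    kept = [(i, j) for i, j in g.edges() if rng.random() < r(h[i], h[j])]
    sampled = nx.Graph()
    sampled.add_nodes_from(g.nodes())
    sampled.add_edges_from(kept)
    return sampled, h

# Original network and an example rule: a link survives only if both endpoints
# "use the channel", each with probability equal to its attribute.
g_original = nx.barabasi_albert_graph(5000, 3, seed=2)
rho = lambda: rng.uniform(0.0, 1.0)                 # attribute distribution rho(h)
r = lambda hi, hj: hi * hj                          # link-sampling probability r(h_i, h_j)

g_sampled, h = sample_by_attributes(g_original, rho, r)

deg_orig = np.array([d for _, d in g_original.degree()])
deg_samp = np.array([d for _, d in g_sampled.degree()])
print("mean degree: original", deg_orig.mean(), "sampled", deg_samp.mean())
```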

The main results are summarized as follows. First, we obtained rigorous analytical expressions for the sampled network properties $P(k)$, $\bar{k}_{nn}(k)$, and $\bar{c}(k)$ for general functional forms of $\rho(h)$ and $r(h_i,h_j)$ when the attributes of the nodes are independent. The relation between the original network properties and the sampled network properties is also studied. It tells us that sampling bias may play a dominant role in determining the properties of the sampled networks, indicating the potential risk of naively generalizing the data to the whole network. Furthermore, we also obtained a theory for the case where neighboring $h$ are correlated, which is important since the attributes are often correlated in reality.

The significance of this study is not limited to social network analysis, as it can potentially be applied to a wide range of complex networks.

12:30
Scale-free Networks Well Done
SPEAKER: Ivan Voitalov

ABSTRACT. In this work, we bring rigor to the activity of detecting power laws in empirical degree distributions of real-world networks. First, we provide a rigorous definition of power-law distributions, equivalent to the definition of \textit{regularly varying} distributions in statistics, i.e., distributions with CCDF of the form $\overline{F}(k) = \ell(k) k^{-\alpha}$, where $\alpha > 0$ and $\ell(k)$ is a \textit{slowly varying function}: $\lim\limits_{x \to \infty} \ell(tx)/\ell(x) = 1$ for any $t>0$. This definition allows the distribution to deviate from a pure power law arbitrarily but without affecting the power-law tail exponent. It also highlights the point that there \textit{cannot be any hypothesis testing} to define how likely it is that the measured degree sequence comes from a power law distribution with a given tail exponent $\gamma = \alpha + 1$. Intuitively, this is due to the fact that there is an infinite number of ``degrees of freedom'' contained in the space of slowly varying functions $\ell(k)$ that make the space of regularly varying distributions nonparametric. In view of this fact, the best strategy is to consult as many consistent $\gamma$-estimators as possible to see whether they agree on a given sequence. Based on the outcomes produced by the estimators, we \textit{define} what it means for a network to have power-law degree distribution as follows.

We identify three estimators --- Hill, moments and kernel-type --- that are based on \textit{Extreme Value Theory} (EVT), and that estimate the extreme value index $\xi$, related, in the case of regularly varying distributions, to the CCDF tail exponent by $\alpha = 1/ \xi$. They are proven to be \textit{statistically consistent}, that is, converging to the true exponent value for any regularly varying distribution. Based on the values of $\xi$ that these estimators return for a given network degree sequence, we propose the following definition of power-law networks: (1) a network is {\bf not power-law (NPL)} if at least one estimator returns a negative or zero value of $\xi$; (2) a network is {\bf hardly power-law (HPL)} if all the estimators return positive values of $\xi$, and if at least one estimator returns a value of $\xi\leq\frac{1}{4}$; (3) a network is {\bf power-law (PL)} if all the estimators return values of $\xi>\frac{1}{4}$. Power-law networks with divergent second moments of their degree distributions, i.e., $\gamma<3$ and $\xi>1/2$, are particularly interesting for a variety of reasons; therefore, we also define a subclass of power-law networks with {\bf divergent second moments (DSM)}, in which all the estimators return values of $\xi>1/2$.
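For concreteness, a minimal sketch of the first of these consistent estimators (the Hill estimator) applied to a synthetic degree sequence; the crude choice of the number of upper order statistics $k$ is an assumption for illustration, whereas the authors' procedure selects it far more carefully.

```python
# Minimal Hill estimator of the extreme value index xi from a degree sequence.
# Simplified sketch: k (number of upper order statistics) is fixed by a rule of
# thumb; the authors' software package chooses k much more carefully.
import numpy as np

def hill_estimator(degrees, k):
    """Hill estimate of xi from the k largest values of the sequence."""
    x = np.sort(np.asarray(degrees, dtype=float))[::-1]   # descending order
    x = x[x > 0]
    return np.mean(np.log(x[:k])) - np.log(x[k])

# Synthetic degrees with a pure Pareto tail of exponent alpha = 1.5, i.e.
# gamma = alpha + 1 = 2.5 and true extreme value index xi = 1/alpha = 2/3.
rng = np.random.default_rng(3)
alpha = 1.5
degrees = np.ceil(rng.pareto(alpha, size=100_000) + 1.0)

k = int(len(degrees) ** 0.5)                               # crude choice of k
xi_hat = hill_estimator(degrees, k)
print("xi_hat =", xi_hat, " gamma_hat =", 1.0 / xi_hat + 1.0)
```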

Finally, we implement all the considered estimators in a software package available to the public and apply them to a representative collection of real-world networks obtained from the KONECT database. According to their estimates, real-world scale-free networks are definitely not as rare as one would conclude based on the popular but unrealistic assumption that real-world data comes from power laws of pristine purity, void of noise and deviations.

14:30-16:00 Session 5A: Econ (social)
14:30
Machine intelligence for network science?

ABSTRACT. Does machine learning truly matter in network science or other sectors of statistical physics? Many network scientists, in particular those with a strong background in statistical physics, remain sceptical. This may be because computer science, traditionally, is more focused on performance than on gaining insight, offering transparency, being general, or finding a minimal model. In [Timme & Nagler, News and Views: Pattern of Propagation, Nature Physics, in print] we argue that if fundamental principles underlying network dynamics are identified prior to the employment of opaque black-box feature extraction, not only can hard tasks be solved but valuable insights may also be provided. But this requires framing our mathematical predictions according to the conditions under which the natural and artificial networks around us reveal themselves, and thus bridging different disciplines through collaborations with researchers of complementary expertise. This talk aims to spread this message. We will exemplify this for seemingly universally optimal strategies (Generous Zero Determinant Strategies) and seemingly unresolvable conflicts (Prisoner's Dilemma) of networked actors in complex noisy environments.

14:45
Innovative individuals in temporal networks: Identification, socio-economic characterization, and predictive applications

ABSTRACT. Whether we have to choose where to have dinner, which book to buy online, or which movie to watch, not all of us have the same taste for novelty. Robust evidence indicates that, in a social system, most individuals tend to choose what is already popular. This fact is represented by cumulative advantage models which assume that popularity begets popularity. Such a bias toward popularity amplifies popularity inequalities and reduces the agreement between individuals' or units' success and inherent talent or quality.

On the other hand, a central hypothesis in Rogers' popular innovation diffusion theory is that there exist "innovative" individuals who tend to be relatively earlier in adopting new ideas than other members of a social system. As innovative individuals are more subject to behavioral change, their accurate identification and well-timed targeting have the potential to increase the odds of success of marketing and behavioral-change campaigns. Yet, most studies to date have quantified individual innovativeness based on surveys on purchased items and self-reported scales, which makes it hard to replicate their findings at a large scale.

Here, we leverage credit-card transaction data to infer individual innovativeness, unveil the socio-economic traits of innovative individuals, and use the identified innovative individuals to identify successful shops early. More specifically, we analyze a unique, nationwide credit-card transaction dataset from a developing country over three years (from June 2015 to May 2018), which we partially matched with a Call Data Record dataset from the same country over 2016. We build the temporal bipartite network that connects the individuals with the shops that they visit, and the individuals' social network. We quantify individuals' innovativeness in terms of their ability to collect discoveries -- i.e., to be repeatedly among the earliest visitors of physical stores that eventually become popular. Our results reveal the existence of a niche of highly-innovative individuals, termed "discoverers", whose number of discovered shops cannot be explained by chance.

We characterize the discoverers in terms of demographic and socio-economic traits (age, total expenditures, and mobility, among others). Besides, we formulate the predictive problem: Can we predict the presence of a shop among the top-shops by popularity, based on the shop's earliest visitors? To address this question, we build simple binary classifiers based on four groups of individuals: discoverers, social hubs, structural influencers, and active individuals (i.e., individuals who visited many shops). We find that the presence of these special individuals among the earliest visitors of a shop can result in an increased likelihood that the shop will be popular, and we observe the strongest predictive signal for the discoverers (see Fig. 1).

15:00
Network signatures to predict job change: Evidence from a large business social network

ABSTRACT. Understanding network-level behavior of job change has been of interest to both academics and practitioners. However, few studies have examined the effect of dynamic changes in network structure on job change using a large social network dataset. Japan has a unique culture where people exchange business cards in nearly every formal and informal setting. Capturing data from the business card exchange process allows us to reconstruct the structure of a large social network and the dynamics of job change within it (over 1.5M individuals over 3 years). Our estimation showed two major results: (1) an increase in clustering coefficient was a contributing factor for job change; and (2) an increase in the number of new bridging ties was negatively correlated with job change.

The underlying mechanisms might be twofold. First, individuals in tightly knit networks tend to have more strong ties, which deliver time-critical information on desirable jobs. Second, knowing people from different communities gives us a fresh look at our own job, which encourages a positive reevaluation of our work environment and discourages the will to change jobs.

15:15
An agent-based framework of water governance in the Lake Champlain Basin: interactions between policy alternatives and network structure

ABSTRACT. The water governance system in the Lake Champlain Basin (LCB) is a multi-scale, multiplex social-ecological network that links governance actors, public sentiment, and environmental features and processes. The efficacy of the governance network to identify, plan for, and manage issues of public concern is a function of network topology. For example, water quality in Lake Champlain is degraded by harmful cyanobacteria blooms, which have adversely affected public health, reduced property values, and led to economic losses. Efforts to mitigate impacts and reduce anthropogenic drivers (e.g., agricultural runoff, aging stormwater infrastructure) are the focus of multiple overlapping policy arenas. Within these arenas, actors at federal, state, and municipal levels interact to address drivers and impacts of social-ecological change. Intra-arena dynamics are governed by rules (e.g., eligibility criteria, defection penalties). Some actors (e.g., state agencies) have the authority to change these rules, effectively changing how other actors may prioritize actions and allocate resources. These changes may affect network structure and actor behavior, ultimately altering system trajectory. To simulate the possible impacts of alternative policies across multiple scales and governance arenas, we constructed a coupled agent-based model (ABM) of water governance in the LCB that connects top-down governance to decisions made at regional, local, and individual scales. The model captures the network of actors and interlinked policy arenas embedded in the system's complex "ecology of games," thereby providing the capacity to experiment with policy design for collective action. We show that while policies that generate cooperation at broader scales may lead to better environmental outcomes, benefits are unequally distributed. Further, we find that non-governmental organizations serve an important role in bridging gaps between water governance arenas in the LCB. We also find that the degree of municipal collaboration is non-linear with respect to financial incentives and network structure. Our findings suggest that while regionalization of policies may reduce runoff, policymakers must consider political dynamics in policy design to ensure equity and continued collaboration.

15:30
Global labor flow network reveals the hierarchical organization and dynamics of geo-industrial clusters in the world economy
SPEAKER: Jaehyuk Park

ABSTRACT. Firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters, such as Hollywood or Silicon Valley, have been frequently studied and are a key conceptual framework for policymakers and business economists from global organizations to regional development agencies of national governments, systematic approaches to identify and analyze geo-industrial clusters at the global scale are rare. In this work, we use LinkedIn’s data documenting the professional employment histories of more than 500 million users over 25 years to construct a labor flow network of over 4 million firms across the world and apply a recursive network community detection algorithm to reveal the hierarchical structure of geo-industrial clusters. We show that geo-industrial clusters, as a unit of aggregation, exhibit a stronger association between the influx of educated workers and financial performance. We argue that geo-industrial clusters provide better insights into the growth and the decline of the economy than other common economic units such as regions or industries. Our study may provide a foundation for further cluster-based investigations of business strategy, urban economics, regional economics, and international development.

15:45
ScamCoins, S*** Posters, and the Search for the Next Bitcoin™: Collective Sensemaking in Cryptocurrency Discussions
SPEAKER: Eaman Jahani

ABSTRACT. Participants in cryptocurrency markets are in constant communication with each other about the latest coins and news releases. Do these conversations build hype through the contagiousness of excitement, help the community process information, or play some other role? Using a novel dataset from a major cryptocurrency forum, we conduct an exploratory study of the characteristics of online discussion around cryptocurrencies. Through a regression analysis, we find that coins with more information available and higher levels of technical innovation are associated with higher quality discussion. People who talk about “serious” coins tend to participate in discussions displaying signatures of collective intelligence and information processing, while people who talk about “less serious” coins tend to display signatures of hype and naïvety. Interviews with experienced forum members also confirm these quantitative findings. Our claims are based on measuring the technical innovations and the price volatility of more than 140 digital currencies over three separate periods of 100 days each in 2016, and relating these variables to the characteristics of the online discussion around the coins during an overlapping period of 200 days. Relying on insights from the collective intelligence literature, we measure the quality of online discussion with three variables: equal participation by all community members, community experience in terms of seniority, and information flow from other sub-communities in terms of degree in the thread discussion network. These results highlight the varied roles of discussion in the cryptocurrency ecosystem and suggest that discussion of serious coins may be oriented towards earnest, perhaps more accurate, attempts at discovering which coins are likely to succeed. In other words, as there is less uncertainty about the coin’s technical merits, the discussion tends to become more truth-seeking.

14:30-16:00 Session 5B: Ecology
14:30
Stability and Sustainability of Resource Consumption Networks
SPEAKER: Sebastian Ruf

ABSTRACT. Sustainability of complex socio-ecological systems is a pressing societal issue. Often, in the context of ecological systems, sustainability and stability to an equilibrium point are used interchangeably. Here we contribute to a dynamical systems theory of sustainability by introducing a novel notion of sustainability that is guided by the existing sustainable development literature. We argue that sustainability is a separate concept that should be considered over a finite time horizon, should hold pointwise in time, and should depend on the underlying social network which drives the socio-ecological system.

The difference between sustainability and stability is considered in the context of a recently introduced model of resource consumption \cite{manzoor2016game}, in which the consumption behavior of the agents in a social network is influenced by both the state of the resource being consumed and the consumption of their network neighbors. We derive sufficient conditions for stability of this model, which are shown to hold for arbitrary network structures but depend on the sensitivity the agents place on social information \cite{ruf2018stability}. We extend these stability conditions, showing that under certain conditions it is the ratio of social information and ecological information that determines stability. Further, we derive sufficient conditions for sustainability of this model and find that the ability to make a system sustainable is affected by the agents' sensitivity to both social and ecological information, as well as by the underlying social network structure \cite{halesus}.

14:45
Generic Mechanism for the Evolution of Mutualistic Networks
SPEAKER: Weiran Cai

ABSTRACT. Mutualism is a vital element of natural and socio-economic systems. As evidenced in plant-pollinator and designer-contractor relations, the participating agents typically exhibit an ordered pattern of collective interactions. The interspecific relation can be represented by a bipartite network of multiple participants (plant and animal guilds, for example). Such networks consistently express highly modular and nested structures, compared with their randomized counterparts. Whether and how the two structural properties of mutualistic networks can emerge out of a unified mechanism, however, remains unclear. In the absence of such a mechanism, little consensus has been reached on the dynamical properties, ranging from network stability to external impacts. Here, we elucidate a unified principle that explains how high-level mutualistic network patterns can emerge from localized interactions. A rich ensemble of key dynamical behaviors is unveiled in this dynamical framework. We demonstrate that mutualism can exhibit either a stabilizing or destabilizing effect on the evolved network, which undergoes a drastic transition with the overall competition level. Most strikingly, the adaptive network may exhibit a profound history-dependence in response to environmental changes, allowing it to be found in alternative stable structures. The adaptive nature of niche interactions, as captured in our framework, can underlie a broad class of ecological relations and also socio-economic networks that engage in bipartite cooperation.

15:00
Non-normal hierarchical organization drives cooperation and survivability in ecosystems

ABSTRACT. Biological systems show an amazing level of organization, from cells up to higher ecological levels such as the biosphere [1]. The hierarchical organization of ecosystems has long been a focus of study in different disciplines, and over the past 20 years it has particularly inspired the development of network theory [2]. From this point of view, a network materializes the complex interactions between the constituent entities of large ecosystems; it thus defines the natural and structural backbone for describing complex behaviors, where dynamics are unavoidably bound to the topological properties. Recently, it has been discovered that most families of empirical networks arising from a wide spectrum of research fields manifest a high level of directed hierarchical organization, yielding a strong non-normality of the adjacency matrix [3]. In particular, non-normality appears as a universal property in ecological networks, ranging from intra-species relations, such as colonies of ants or bees and families of wolves, to large inter-species ones, such as trophic networks, e.g., food webs [3,4]. In this context, we have studied the role that non-normality plays in the cooperation and survivability of individuals or species in ecosystems. At odds with the common belief that a large number of interacting individuals/species contributes to their mutual extinction through competition and exhaustion of resources [1], we found that the non-normality naturally embedded in their hierarchical organization is crucial for driving cooperation between individuals/species and consequently their survivability. This mechanism is based on the fact that dynamical processes evolving on non-normal networks exhibit a peculiar behavior: small initial disturbances can undergo a transient phase and be strongly amplified even though the system is linearly stable. Because of the non-normality of the network, the dynamical properties cannot be captured by classical linear spectral methods, and more advanced techniques, such as the pseudo-spectrum, are needed to understand such behavior [3]. Our results are also supported by a dataset of empirical ecological networks of different levels of interaction.

References [1] J. D. Murray, Mathematical biology II: Spatial models and biomedical applications (Springer-Verlag, 2001). [2] M. E. J. Newman, Networks: An Introduction, Oxford University Press (2010) [3] M. Asllani, R. Lambiotte and T. Carletti, Sci. Adv. 4 eaau9403 (2018). [4] M. Asllani and T. Carletti, Phys. Rev. E 97, 042302 (2018).
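A toy numerical illustration of the transient amplification mechanism described above, assuming a linearly stable system dx/dt = Jx with a feed-forward (hierarchical) and hence strongly non-normal Jacobian; all parameter values are illustrative.

```python
# Transient amplification in a linearly stable but non-normal system dx/dt = J x.
# Toy Jacobian: self-regulation on the diagonal plus strong one-directional
# (hierarchical) couplings; parameters are illustrative only.
import numpy as np

n = 50
rng = np.random.default_rng(4)
J = -np.eye(n)                                   # stable self-dynamics
for i in range(n - 1):
    J[i + 1, i] = 3.0 * rng.random()             # strong downstream (feed-forward) links

spectral_abscissa = np.max(np.linalg.eigvals(J).real)          # < 0: asymptotically stable
numerical_abscissa = np.max(np.linalg.eigvalsh((J + J.T) / 2)) # > 0: transient growth possible
print("spectral abscissa :", spectral_abscissa)
print("numerical abscissa:", numerical_abscissa)

# Integrate the linear dynamics from a small random perturbation (Euler scheme).
dt, steps = 0.01, 8000
x = 0.01 * rng.standard_normal(n)
norms = [np.linalg.norm(x)]
for _ in range(steps):
    x = x + dt * (J @ x)
    norms.append(np.linalg.norm(x))

print("initial norm:", norms[0], "peak norm:", max(norms), "final norm:", norms[-1])
```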

15:15
Social Interactions of Single- and Mixed-Species of Freshwater Fish: A Clustering Analysis
SPEAKER: Sara Vanovac

ABSTRACT. Proximity-based social networks (PBSN) define a link between two individuals by spatial proximity at a given time. PBSN have been used to study animal interactions, and particularly the social behavior of fish [1, 2]. In this paper, we perform the first network analysis of the largest dataset assembled to date on freshwater fish: five years of data tracking 228 fish from five species every few seconds. We focus on the clustering coefficient for single-species and mixed-species. A large value of this coefficient would suggest that the network of interactions is not random, while a sustained large value over time would indicate a prolonged, purposeful social behavior.

Two high-quality regions of the data allowed us to investigate both single-species and mixed-species interactions (Figure 1; left). Within each region and for each species of interest, we built a temporal network at regular time intervals. As we averaged 1.08 data points per minute per species, we used 1 minute as the window size to build the network. An ANOVA concluded that results obtained with similar window sizes did not statistically differ in terms of average clustering. When the signal of a fish was lost in a time window, its position was imputed as the average of the preceding and following time windows. As there is no consensus on what distance suffices to create a tie [1, 2], we computed the clustering for distances between 0 and 7 meters in regular increments of 0.3 m (Figure 2; right). We found a high, sustained clustering for carps, as well as for carps interacting with other species. Results were not merely a function of species density: there were more pikes (n=158) than carps (n=114), yet pikes did not cluster. Results may be affected by the percentage of fish tracked, as carps had the highest coverage of individuals tracked. Overall, our study shows evidence for the sustained social behavior of carps in a high-resolution dataset.

[1] Canon Jones et al. Journal of the World Aquaculture Society 48(1):35–45, 2017. [2] James et al. Oecologia 143(2):211–219, 2005. [3] Monk & Arlinghaus. PLoS One 12(3):e0173989, 2017.
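A minimal sketch of the per-window pipeline described above, with hypothetical random-walk positions standing in for the telemetry data: a proximity graph is built for each 1-minute window and its average clustering recorded.

```python
# Per-window proximity-based social network (PBSN) and its average clustering.
# positions[t] is assumed to be an (n_fish, 2) array of coordinates for window t;
# the random-walk data below is a hypothetical stand-in for the telemetry data.
import numpy as np
import networkx as nx
from itertools import combinations

rng = np.random.default_rng(5)
n_fish, n_windows = 30, 60
positions = np.cumsum(rng.normal(scale=1.0, size=(n_windows, n_fish, 2)), axis=0)

def proximity_graph(coords, threshold):
    """Link two individuals if they are within `threshold` meters of each other."""
    g = nx.Graph()
    g.add_nodes_from(range(len(coords)))
    for i, j in combinations(range(len(coords)), 2):
        if np.linalg.norm(coords[i] - coords[j]) <= threshold:
            g.add_edge(i, j)
    return g

# One clustering value per 1-minute window, for a 3.0 m proximity threshold.
threshold = 3.0
clustering_per_window = [
    nx.average_clustering(proximity_graph(positions[t], threshold))
    for t in range(n_windows)
]
print("mean clustering over windows:", np.mean(clustering_per_window))
```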

15:30
Resistance and Robustness of the Global Coral-Symbiont Network
SPEAKER: Sara Williams

ABSTRACT. Increasing ocean temperatures have widespread consequences for coral reefs, one of which is coral bleaching. We analyzed the global network of interactions between coral species and their symbiotic algae, dinoflagellates in the family Symbiodiniaceae, for resistance to temperature stress and robustness to network perturbations. Networks were assembled for the global, Caribbean, Pacific, and Indian oceans, as well as multiple subregions for each ocean. Null networks were created by changing either the physiological parameters of the nodes or the structures of the networks. A network bleaching model was developed in which each link is given a weight based on temperature thresholds for specific host-symbiont pairs. Resistance to temperature stress was determined from the response of the networks to the bleaching model. We define the network’s resistance as the rise in temperature (in °C) that increases the percentage of hosts bleached from 10% to 90%, normalized by the maximum possible temperature excursion. Warming caused network breakdown in ways that differed among spatial scales, and from the null networks, providing evidence that network structure and the natural distribution of thermal tolerances affect the coral-symbiont network’s resistance to temperature stress. Ecological robustness, defined by how much perturbation (through the removal of links or nodes) is needed to decrease the number of nodes by 50%, was determined for multiple removal models on both the natural networks and their null counterparts. Networks were more robust to link removal models than to node removals. Of the link removal models, removing links according to the network bleaching model resulted in the lowest robustness. Natural networks will maintain a higher level of stability under environmental stressors because of their interactions, unless those interactions are susceptible to a certain stressor, as is the case for coral reefs. The global network of interactions between corals and Symbiodiniaceae and associated thermal thresholds is non-random, and the evolution of this architecture has led to higher sensitivity to environmental perturbations compared to null cases.

15:45
Quantization of ecological interaction networks yields insights into the fundamental processes underlying community assembly
SPEAKER: Justin Yeakel

ABSTRACT. The dynamics of community assembly has a rich history in ecological theory. Recently, there has been much interest in assessing how changes in diversity within assembling communities impact the structure of interactions and vice versa. Here we examine a theoretical framework that seeks to generate communities by distilling many types of ecological interactions into a small number of unique pairwise directed links between species, including, but not limited to, 'eat' interactions (e.g. resource dependencies) and 'need' interactions (e.g. reproductive or habitat dependencies). Different pairwise combinations of directed link types between species give rise to the larger diversity of species interactions observed in nature, such as consumer-resource, and both service-resource and service-service mutualisms. Moreover, our framework permits the explicit inclusion of interactions that create or modify abiotic elements, which other species may depend on for survival, such as nutrients or habitat alterations. Inclusion of both biotic and abiotic agents permits more complex and indirect interdependencies between species, effectively incorporating the concept of ecosystem engineering into interaction networks, where the environment can be altered by a species or groups of species, thereby facilitating or inhibiting the colonization of others. Our framework shows that despite the complexity of real communities, some of the most remarkable processes and patterns, such as realistic trophic level and degree distributions, the role of specialists over the course of assembly, and increasing nestedness with higher frequencies of mutualistic interactions, can be generated by simple interaction rules. Moreover, our findings indicate that ecosystem engineering is likely an important component of interaction networks that is generally overlooked.

14:30-16:00 Session 5C: Brain (model)
14:30
Assessing the foundations of connectome organization across species and scales
SPEAKER: Aditya Nanda

ABSTRACT. Introduction: Discovery in network neuroscience generally proceeds through specification of empirically supported models of accurately reconstructed brain wiring diagrams, known as connectomes. Within this framework, principled statistical confirmation of connectome features becomes important, especially when such features otherwise have uncertain empirical functionality or unclear developmental origin [1]. Here we sought to establish such principled statistical confirmation for connectome hubs, neural cells or regions that putatively facilitate integration of information between distinct functional units.

Methods: We sought the statistical confirmation of hubs by building models of connectomes constrained on existing knowledge of brain-network evolution, architecture and function. One of the more important methods for building such models in network science is based on maximum-likelihood estimation of maximum-entropy probability distributions, and on the subsequent sampling of networks directly from these distributions [2]. For a weighted directed network $W^* = \{w^*_{ij}\}$ and the set of empirical constraints $C^* = \{c_u(W^*)\}$, this method seeks to maximize the likelihood $P(W \mid \theta) = \exp[-H(W,\theta)] / Z(\theta)$, where $\theta = \{\theta_u\}$ are Lagrange multipliers, $H(W,\theta) = \sum_u \theta_u c_u(W)$ is the Hamiltonian, and $Z(\theta) = \sum_W \exp[-H(W,\theta)]$ is the partition function. Importantly, when $C^*$ comprises only constraints of the form $c_u(W^*) = \sum_{w^*_{ij} \in E_u} w^*_{ij}$ (where $E_u$ is some pre-specified edge subset), it is possible, under an approximating assumption of edge independence, to write the log-likelihood of $P(W \mid \theta)$ in closed form as $\ln L = \sum_u c_u(W^*) \ln x_u - \sum_{i,j} \ln\bigl(1 - \prod_u x_u^{\delta(w_{ij} \in E_u)}\bigr)$, where the $x_u$ are latent variables and $\delta(\cdot)$ is the delta function. Here we applied this formalism to build empirically supported connectome models constrained by module, hierarchy, and gradient features. We fit these models to two interareal (mouse [3,4] and fruitfly [5]) and two interneuronal (roundworm [6] and sea squirt [7]) connectomes, all of which represent the current most highly resolved phylum- and scale-specific connectome reconstructions.
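To make the closed-form expression concrete, a small numerical sketch that simply evaluates $\ln L$ for a toy weighted directed network, a toy partition of edge positions into constraint subsets $E_u$, and fixed latent variables $x_u$ (in the actual models the $x_u$ are fitted by maximum likelihood).

```python
# Evaluate the closed-form log-likelihood
#   ln L = sum_u c_u ln(x_u) - sum_{ij} ln(1 - prod_u x_u^{delta(w_ij in E_u)})
# for a toy weighted directed network and toy constraint subsets E_u.
# The x_u are fixed here for illustration; in practice they are fitted.
import numpy as np

rng = np.random.default_rng(6)
n = 20
W = rng.poisson(1.0, size=(n, n))          # toy weighted directed network
np.fill_diagonal(W, 0)

# Toy constraints: E_0 = edge positions within the first half (a "module"),
#                  E_1 = all remaining edge positions.
membership = np.zeros((n, n), dtype=int)   # membership[i, j] = u such that (i, j) in E_u
half = n // 2
membership[:half, :half] = 0
membership[half:, :] = 1
membership[:, half:] = 1

c = np.array([W[membership == u].sum() for u in (0, 1)])   # empirical constraints c_u
x = np.array([0.4, 0.6])                                    # latent variables x_u (given)

def log_likelihood(W, membership, c, x):
    """Closed-form ln L under the edge-independence approximation."""
    # For these single-membership constraints, prod_u x_u^delta(...) reduces to
    # x[membership[i, j]] for every ordered pair (i, j) with i != j.
    off_diag = ~np.eye(W.shape[0], dtype=bool)
    per_pair = x[membership[off_diag]]
    return np.sum(c * np.log(x)) - np.sum(np.log(1.0 - per_pair))

print("ln L =", log_likelihood(W, membership, c, x))
```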

Results and discussion: We comprehensively sampled model realizations across connectomes and model-types to show that module and gradient constraints do a reasonably good job in explaining the strength of most, although not all, network nodes and — except in the roundworm — of most hub nodes (the Figure shows normalized strength of models and empirical data, with bars denoting the standard deviation of constraints from the sampled network ensemble). Our results emphasize the need for stronger confirmation of connectome features, especially when such features have uncertain neurobiological support.

References: 1. Rubinov M. Nat Commun 7, 13812 (2016); 2. Squartini T, Garlaschelli D. New J Phys 13, 83001 (2011); 3. Oh SW et al. Nature 508, 207 (2014); 4. Rubinov M et al. T. PNAS 112, 10032–10037 (2015); 5. Shih, C.-T. et al. Curr Biol 25, 1249–1258 (2015); 6. Varshney et al. PLOS Comput Biol 7, e1001066 (2011); 7. Ryan, K et al. Elife 5, e16962 (2016).

14:45
Predicting synchronization regimes with dimension reduction on modular graphs

ABSTRACT. We study the synchronization of oscillator networks with community structures, such as two-star graphs and networks generated by the stochastic block model (SBM). We map high-dimensional (complete) dynamics onto low-dimensional (reduced) dynamics that captures the mesoscale features. This is achieved by introducing a spectral method for dimension reduction that generalizes and improves the method recently proposed in [1]. For the Kuramoto model on realizations of the SBM, the reduced dynamics describes the synchronized states correctly and, as a bonus, reveals the detectability limit for the SBM [2]. For the Sakaguchi-Kuramoto (SK) model on the mean adjacency matrix of the SBM, we prove that the spectral method leads to the same reduced model as the one obtained with the Ott-Antonsen Ansatz [3], highlighting the consistency of the approach with previous works. Moreover, we find new regions in the structural parameter space admitting chimeras, dynamical states characterized by the coexistence of full synchronization in one community and partial synchronization in others. For the SBM, the size of chimera regions in the parameter space reaches a maximum when the communities are slightly asymmetric in size. However, for the two-star graphs, the size of the chimera regions has a more complicated behavior and exhibits multiple step transitions with respect to the ratio of the periphery sizes.

[1] E. Laurence, N. Doyon, L.J. Dubé and P. Desrosiers, arXiv:1809.08285 (2018). [2] J.-G. Young, P. Desrosiers, L. Hébert-Dufresne, E. Laurence and L.J. Dubé, Phys.Rev.E, 95, p.062304 (2017). [3] T. Kotwal, X. Jiang, and D.M. Abrams, Phys.Rev.Lett., 119, p.264101 (2017).
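For reference, a minimal simulation sketch of the kind of complete (high-dimensional) dynamics that the reduction targets: the Kuramoto model on a two-block SBM realization, with one order parameter tracked per community; all parameter values are illustrative.

```python
# Kuramoto dynamics on a two-community SBM realization, with one complex order
# parameter per community. Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(7)
n_per_block, p_in, p_out = 100, 0.30, 0.02
n = 2 * n_per_block
block = np.repeat([0, 1], n_per_block)

# Sample a symmetric SBM adjacency matrix.
p = np.where(block[:, None] == block[None, :], p_in, p_out)
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)

# Kuramoto model: d(theta_i)/dt = omega_i + (K / n) * sum_j A_ij sin(theta_j - theta_i)
omega = rng.normal(0.0, 0.1, size=n)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
K, dt, steps = 4.0, 0.05, 4000

for _ in range(steps):
    diff = np.sin(theta[None, :] - theta[:, None])      # sin(theta_j - theta_i)
    theta = theta + dt * (omega + (K / n) * np.sum(A * diff, axis=1))

for b in (0, 1):
    r = np.abs(np.mean(np.exp(1j * theta[block == b])))
    print(f"order parameter of community {b}: {r:.3f}")
```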

15:00
Investigating cognitive ability using action-based models of structural brain networks
SPEAKER: Viplove Arora

ABSTRACT. Recent developments in network neuroscience have highlighted the importance of techniques for the analysis and modeling of brain networks. A particularly powerful approach for studying complex neural systems is to formulate generative models that use wiring rules to synthesize networks resembling the topology of a given connectome. Successful models can highlight the principles by which a network is organized (identify structural features that arise from wiring rules versus those that emerge) and potentially uncover the mechanisms by which it grows and develops. Previous research has shown that such models can validate the effectiveness of spatial embedding and other (non-spatial) wiring rules in shaping the network topology of the human connectome. In this research, we propose variants of the action-based model that combine a variety of generative factors capable of explaining the topology of the human connectome. We test the descriptive validity of our models by evaluating their ability to explain between-subject variability. Our analysis provides evidence that geometric constraints are vital for connectivity between brain regions, and that an action-based model relying on both topological and geometric properties can account for between-subject variability in structural network properties. Further, we test correlations between parameters of subject-optimized models and various measures of cognitive ability and find that higher cognitive ability is associated with an individual's tendency to form long-range or non-local connections.

15:15
Homological connectivity dynamics

ABSTRACT. Brain activity is the exploration of a repertoire of different dynamical states. Functional connectivity dynamics (FCD) describes this in terms of transitions between brain network states, obtained on the basis of correlations between functional signals. In this context, topological observables of brain function have emerged as versatile and robust candidates to capture the shape and dynamics of brain information processing. To date, however, there has been little focus on the reliability, reproducibility and requirements necessary for the stability of homological features extracted from fMRI data, and on how they relate to conventional observations in temporal brain networks. Here, using a benchmark test-retest resting state fMRI dataset, we take a first step in this direction by showing i) that homological features are stable and reproducible as compared to full brain networks across multiple temporal scales, and ii) that individual topological fingerprints provide discriminating power comparable to full networks despite requiring logarithmically less information. Further, iii) we find that the sequence of jumps in terms of homological distances confirms previous network-based observations that the human brain at rest performs a Lévy-like exploration of brain activation space. Localizing the topological structure by means of homological scaffolds, iv) we show that both the homological connectivity dynamics (HCD) and that of the scaffolds (SCD) reproduce well (Pearson corr $\sim 0.8$) the standard functional connectivity dynamics (FCD), despite their extreme sparsity. Finally, we compare the temporal community structure in FCD and SCD and find that topological fluctuations underlie FCD fluctuations of resting state brain networks to a large degree.
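For comparison, a minimal sketch of the standard FCD pipeline that the homological variants (HCD, SCD) are benchmarked against, with synthetic signals standing in for fMRI time series: sliding-window functional connectivity matrices are computed and then correlated across windows.

```python
# Standard functional connectivity dynamics (FCD): correlate sliding-window
# functional connectivity (FC) matrices across windows. Synthetic signals stand
# in for fMRI time series here.
import numpy as np

rng = np.random.default_rng(8)
n_regions, n_timepoints = 30, 600
signals = rng.standard_normal((n_timepoints, n_regions))      # toy "BOLD" data

window, step = 60, 10
iu = np.triu_indices(n_regions, k=1)                          # upper-triangle index

# One vectorized FC matrix per sliding window.
fc_vectors = []
for start in range(0, n_timepoints - window + 1, step):
    fc = np.corrcoef(signals[start:start + window].T)         # region-by-region correlation
    fc_vectors.append(fc[iu])
fc_vectors = np.array(fc_vectors)

# FCD matrix: correlation between the FC patterns of every pair of windows.
fcd = np.corrcoef(fc_vectors)
print("FCD matrix shape:", fcd.shape)
```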

15:30
High-resolution data-driven model of the mouse connectome

ABSTRACT. Knowledge of mesoscopic brain connectivity is important for understanding inter- and intra-region information processing. Models of structural connectivity are typically constructed and analyzed with the assumption that regions are homogeneous. We instead use the Allen Mouse Brain Connectivity Atlas to construct a model of whole brain connectivity at the scale of 100 μm voxels. The data consist of 428 anterograde tracing experiments in wild type C57BL/6J mice, mapping fluorescently-labeled neuronal projections brain-wide. Inferring spatial connectivity with this dataset is underdetermined, since the approximately 2 × 10^5 source voxels vastly outnumber the experiments. To address this, we assume that connection patterns and strengths vary smoothly across major brain divisions. We model the connectivity at each voxel as a radial basis kernel-weighted average of the projection patterns of nearby injections. The voxel model outperforms a previous regional model in predicting held-out experiments and in comparison to a human-curated dataset. This voxel-scale model of the mouse connectome permits researchers to extend their previous analyses of structural connectivity to much higher levels of resolution, and allows for comparison with functional imaging and other datasets.
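A minimal sketch of the kernel idea, with synthetic injection data and a hypothetical bandwidth: each source voxel's projection pattern is estimated as a radial-basis-kernel-weighted average of the patterns of nearby injection experiments.

```python
# Nadaraya-Watson-style estimate of voxel connectivity: the projection pattern
# of a source voxel is an RBF-kernel-weighted average of the projection patterns
# of nearby injection experiments. Data and bandwidth below are synthetic.
import numpy as np

rng = np.random.default_rng(9)
n_experiments, n_targets = 40, 200
injection_sites = rng.uniform(0.0, 10.0, size=(n_experiments, 3))   # injection coordinates (mm)
projections = rng.random((n_experiments, n_targets))                 # measured projection patterns

def predict_projection(voxel_xyz, sites, patterns, bandwidth=1.0):
    """RBF-kernel-weighted average of nearby injections' projection patterns."""
    d2 = np.sum((sites - voxel_xyz) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w = w / w.sum()
    return w @ patterns

voxel = np.array([5.0, 5.0, 5.0])
pattern = predict_projection(voxel, injection_sites, projections, bandwidth=1.5)
print("predicted pattern for one voxel, first 5 targets:", pattern[:5])
```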

14:30-16:00 Session 5D: Social systems (social 3)
14:30
Measuring the Impact of Bot Accounts on Political Network Polarization

ABSTRACT. Social media platforms have been facing public criticism about their handling (and monetizing) of the online public sphere. Recent literature provides indubitable evidence of software-controlled accounts (a.k.a. bots) being involved in political propaganda, trolling, voter disenfranchisement, and augmentation of the spread of negative, inflammatory, or misinforming content. In this work, we study their effect through the lens of political network polarization and quantify the network polarization change induced by bots. We study one month of Twitter activity data relating to the gun debate and the tragic Stoneman Douglas High School shooting incident. The retweet network under study contains 3,282,592 nodes and 16,086,168 edges. We set up experiments and randomized controls to measure the change in network polarization due to bot account activity.

For quantifying network polarization, we modify an existing random-walk-based controversy score for cases where the political orientation of users is available a priori. To obtain political orientation, we use a simple left-right scale and place social media users on this scale based on the news domains they share. We crawl left-right political ratings for news domains from a third-party crowd-sourcing platform. For users who do not share any content from the curated list of 1,241 news domains, we adopt a simple label propagation algorithm, achieving above 97% accuracy and 95% macro-F1 with 10-fold cross-validation. For bot detection, we use the Botometer API provided by Indiana University. Out of 260,813 queried accounts, we observe 24,936 (10 percent) bot accounts.
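A minimal sketch of the label propagation step, on a toy graph with made-up seed scores standing in for the news-domain-based left-right placements: seeded users keep their scores and every other user iteratively takes the mean score of its neighbors.

```python
# Simple label propagation of left-right scores on a retweet-like graph: seeds
# keep their news-domain-based scores, every other user takes the mean of its
# neighbors. The graph and seed values below are toy stand-ins.
import numpy as np
import networkx as nx

g = nx.karate_club_graph()                      # toy stand-in for the retweet network
seeds = {0: -1.0, 33: 1.0}                      # users with known left (-1) / right (+1) scores

scores = {v: 0.0 for v in g.nodes()}
scores.update(seeds)

for _ in range(200):                            # iterate to (approximate) convergence
    new_scores = dict(scores)
    for v in g.nodes():
        if v in seeds:
            continue
        neighbors = list(g.neighbors(v))
        if neighbors:
            new_scores[v] = np.mean([scores[u] for u in neighbors])
    scores = new_scores

left = sum(1 for v, s in scores.items() if s < 0)
right = sum(1 for v, s in scores.items() if s > 0)
print("left-leaning:", left, "right-leaning:", right)
```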

First, we stress-test our experimental design with synthetically generated networks. As a generative model of polarized political retweet networks, we modify an existing directed scale-free graph model with political preference variables over nodes. We show that the polarization metric is robust to network size and presents a strong correlation with the political preference strength parameter ($\delta$) of our model. Then, we show that random removals from these synthetic networks do not affect their polarization scores.

Building on the aforementioned robustness of our metric, we analyze the effect of removing bot accounts from the retweet (endorsement) network on Twitter. We show that removing bot accounts from the retweet network significantly reduces polarization scores (p = 3.11e-6). As a sanity check, we also randomly remove an equal number of users from the network and see no significant change in the distribution of polarization scores (p = 0.2575). Furthermore, we analyze hashtag-level network polarization and how it changes when bot accounts are removed at that granularity. We build the retweet network of the 100 most popular hashtags in our dataset. 27 of them experience increased polarization when bot accounts are removed, and 10 of them show no change in the distribution. We compare and contrast the content generated (e.g., sentiment cues, news domains) by bot accounts in these three cases. Our results provide strong evidence of the role of bot accounts in creating more polarized endorsement networks.

14:45
Quantifying Data Bias in the U.S. Justice System with Affinity Networks
SPEAKER: Xindi Wang

ABSTRACT. Machine learning (ML) algorithms are increasingly being used to make decisions about humans in criminal justice (e.g., predicting recidivism), in social services (e.g., predicting whether a child is in danger), in financial services (e.g., predicting risk of default on a loan), in human resources (e.g., predicting whether an applicant will be a good employee), etc. Since ML algorithms rely on data, biases in data can produce undesirable outcomes [1]. ProPublica’s 2016 investigation into the COMPAS ML software for assessing risk of recidivism showed “Machine Bias” in Broward County, FL. We address the problem of quantifying data bias in the COMPAS criminal records data at the individual level and use affinity networks to measure an individual’s node characteristics in terms of dyadicity (D) and heterophilicity (H) [2]. D and H, respectively, measure the extent to which a node’s neighbors have the same or different property as the node, compared to what is expected at random. We build a ML model (based on pairwise comparison and ranking) to predict the time interval of an individual’s next crime (T_c) based on features of crime history and judges’ discretion (e.g., length of crime career, prior felony count, prior prison length, etc.), age and gender. Individuals are separated based on racial groups – namely, black and white individuals – in order to measure racial bias in the data. For each individual, we predict T_c using two models – called normal and cross – trained on individuals having the same and different races, respectively. We measure individual predictive errors of the two models to study whether the individual is treated differently when compared to individuals having the same or different race. In the case of unbiased data, predictive errors of the models should be the same (regardless of race). We observe that predictions for many individuals are far away from the “fair line” (the red line in Fig. 1(a) and 1(d)). We also observe that the error distributions of the normal and cross experiments are significantly different from each other (see Fig. 1(b) and 1(d)). We further examine the biases around race by constructing an affinity network of individuals (based on the same features given to the ML model) to quantify the interplay between node properties (race or prediction errors) and the structure of the underlying affinity network as measured by D and H. We observe the following in Fig. 1(c) and 1(f). First, white individuals under-predicted by the model trained on black individuals (Fig. 1(c)) are more heterophilic, meaning that white individuals, when compared to their black counterparts with similar criminal histories, are estimated to stay away from crime longer. Second, the D and H values based on race are different from the random model (D = 1, H = 1). Black individuals have D > 1 and H ≈ 1, indicating that black individuals are more likely to be similar to other black individuals. For the white individuals, D and H are both greater than 1, indicating that the neighborhood of white individuals (based on feature similarity) is more diverse in terms of race. This suggests that the features of white individuals, which include decisions by the judge such as prison time, might be more case-based/subjective than for black individuals. Combining tools from ML and network science is a promising area for quantifying data bias.
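A minimal sketch of the dyadicity and heterophilicity computation for a binary node property on an undirected graph, following the usual definition of D and H as observed same-/mixed-property edge counts divided by their expectation under uniformly random edge placement (toy graph and property below).

```python
# Dyadicity (D) and heterophilicity (H) of a binary node property: observed
# counts of same-property and mixed-property edges divided by the counts
# expected if the M edges were placed uniformly at random among the n nodes.
import networkx as nx

def dyadicity_heterophilicity(g, has_property):
    """D and H for a node->bool property on an undirected graph g."""
    n, m = g.number_of_nodes(), g.number_of_edges()
    n1 = sum(1 for v in g.nodes() if has_property[v])
    n0 = n - n1
    p = 2.0 * m / (n * (n - 1))                      # average connection probability

    m11 = sum(1 for u, v in g.edges() if has_property[u] and has_property[v])
    m10 = sum(1 for u, v in g.edges() if has_property[u] != has_property[v])

    expected_m11 = p * n1 * (n1 - 1) / 2.0
    expected_m10 = p * n1 * n0
    return m11 / expected_m11, m10 / expected_m10

# Toy example: property concentrated in one faction of the karate club graph.
g = nx.karate_club_graph()
prop = {v: g.nodes[v]["club"] == "Mr. Hi" for v in g.nodes()}
D, H = dyadicity_heterophilicity(g, prop)
print("D =", D, "H =", H)   # D > 1: property holders link to each other more than chance
```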

15:00
Exploring and estimating causal attribution networks
SPEAKER: James Bagrow

ABSTRACT. Cause-and-effect reasoning, the attribution of effects to causes, is one of the most powerful and unique skills humans possess [1, 2]. Multiple surveys are being conducted to map out causal attributions as networks. These surveys ask individual participants to propose and rate various cause-effect relationships, but it is unclear how efficient these procedures are at exploring a large network and how well these survey results can be combined. Further, the total size of the collective causal attribution network held by humans is currently unknown, making it challenging to assess survey progress.

First, here we introduce and validate a new, theoretically-principled survey algorithm called Iterative Pathway Refinement (IPR). IPR is designed to crowdsource the construction of large networks efficiently [3]. This method is flexible and can be adapted to most networks where individual participants can help build the network topology, but our focus is on causal attribution networks, an application ideally suited to human contributions. Next, as multiple surveys of causal attribution are being conducted, it is fruitful to ask if these different efforts can be combined. We study three causal attribution networks (excerpts of which are illustrated in Fig. 1) to determine how well they can be combined into a single network. Unfortunately, combining these networks requires dealing with ambiguous nodes, as nodes represent written descriptions of causes and effects and different descriptions may exist for the same concept. We therefore introduce NetFUSES, a method for combining networks with such ambiguous nodes [4].

Finally, treating the different causal attribution networks as independent samples (either before or after applying NetFUSES) allows us to use their overlap to statistically estimate [5] the total size of the collective causal attribution network held by humans. We find that existing surveys capture 5.77% ± 0.781% of the ≈293 000 causes and effects estimated to be known, and 0.198% ± 0.174% of the ≈10 200 000 attributed cause-effect relationships [4].
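The size estimation can be illustrated with the simplest capture-recapture (Lincoln-Petersen) estimator, treating two surveys as independent samples of the same underlying set of cause-effect relationships; the counts below are made up, and the estimator actually used in [5] may be more refined.

```python
# Lincoln-Petersen capture-recapture estimate of the total number of
# cause-effect relationships, treating two surveys as independent samples.
# The counts below are made-up illustrative numbers, not the paper's data.
def lincoln_petersen(n1, n2, overlap):
    """Estimate total population size from two sample sizes and their overlap."""
    return n1 * n2 / overlap

links_survey_a = 12_000        # links observed in survey A
links_survey_b = 9_000         # links observed in survey B
links_in_both = 650            # links appearing in both (after node matching)

estimated_total = lincoln_petersen(links_survey_a, links_survey_b, links_in_both)
print(f"estimated total cause-effect relationships: {estimated_total:,.0f}")
```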

In general, the construction of causal attribution networks generates important knowledge networks that may inform causal inference research and even help future AI systems to perform causal reasoning, but these networks are time-consuming and costly to generate, may be afflicted by human cognitive biases that must be accounted for, and to date no efforts have been made to combine different networks. Our work introduces new ways to construct these networks efficiently, studies the potential for combining different networks together, and provides statistical means to reason about the size and density of the large, mostly unobserved causal attribution network.

15:15
Cumulative effects of triadic closure and homophily in social networks

ABSTRACT. Much of the structure in social networks has been explained by two seemingly independent network evolution mechanisms [1]: triadic closure (formation of short cycles) and homophily (tendency to connect within attribute groups). It is common to consider these mechanisms separately or in the frame of a static model. Empirical studies, however, suggest that their dynamic interplay is responsible for the homophily seen in on- and offline social networks [2]. By combining these two mechanisms in a minimal, solvable dynamic model, we confirm theoretically the long-held and empirically established hypothesis that homophily can be amplified by the triadic closure mechanism [3].

By performing a mean-field bifurcation analysis, we estimate how much of the observed homophily in various friendship and communication network data sets is due to amplification for a given amount of triadic closure. We find that the cumulative advantage-like process leading to homophily amplification can, under certain circumstances, also lead to the widely documented core-periphery structure of social networks [4], as well as to the emergence of memory of previous homophilic constraints (equivalent to hysteresis phenomena in physics). The theoretical understanding provided by our results highlights the importance of early intervention in managing at the societal level the most adverse effects of homophilic decision-making, such as inequality, segregation, and online echo chambers.

[1] Toivonen, R. et al., Soc. Net. 31, 240–254 (2009). [2] Kossinets, G. & Watts, D. J., Am. J. Sociol. 115, 405–450 (2009). [3] Asikainen, A., Iñiguez, G., Kaski, K. & Kivelä, M., arXiv:1809.06057 (2018). [4] Rombach, P., Porter, M. A., Fowler, J. H. & Mucha, P. J., SIAM Rev. 59, 619–646 (2017).
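A toy simulation sketch of the interplay described above (not the solvable model of the paper): links are proposed either by triadic closure or at random and accepted with a homophilous bias, and the resulting edge homophily is typically higher when triadic closure dominates; all parameters are illustrative.

```python
# Toy simulation of homophily amplification by triadic closure: new links are
# proposed either by closing triangles (prob. p_triadic) or uniformly at random,
# and accepted with a homophilous choice bias s. Parameters are illustrative.
import random
import networkx as nx

def grow(n=1000, seed_edges=500, new_edges=5000, p_triadic=0.9, s=0.7, rng_seed=10):
    random.seed(rng_seed)
    attr = {v: v % 2 for v in range(n)}                  # two equally sized groups
    g = nx.gnm_random_graph(n, seed_edges, seed=rng_seed)
    added = 0
    while added < new_edges:
        i = random.randrange(n)
        neighbors = list(g.neighbors(i))
        if neighbors and random.random() < p_triadic:
            j = random.choice(neighbors)                 # triadic closure: friend of a friend
            second = [k for k in g.neighbors(j) if k != i and not g.has_edge(i, k)]
            if not second:
                continue
            k = random.choice(second)
        else:
            k = random.randrange(n)                      # random (focal-closure-like) candidate
            if k == i or g.has_edge(i, k):
                continue
        accept = s if attr[i] == attr[k] else 1.0 - s    # homophilous choice bias
        if random.random() < accept:
            g.add_edge(i, k)
            added += 1
    same = sum(1 for u, v in g.edges() if attr[u] == attr[v])
    return same / g.number_of_edges()

print("edge homophily, mostly random attachment:", grow(p_triadic=0.1))
print("edge homophily, mostly triadic closure  :", grow(p_triadic=0.9))
```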

15:30
Local Majority: A Limited-Concern Strategy for Networked Social Learning
SPEAKER: Edward Platt

ABSTRACT. Collective problem-solving is constrained by the communication patterns between individual problem-solvers, which can be modeled as a network. The success of networked problem-solvers has been found to depend on network structure, learning strategy, and problem complexity. We present the novel local majority learning strategy, motivated by practical constraints in real-world problem-solving, and use numerical simulations to show faster convergence and higher average performance compared to standard social learning strategies. Problem-solvers using a local majority strategy have a limited concern; they may consider some sub-problems and ignore others. We also assume that problem-solvers communicate only with others concerned with the same sub-problems. Improvements to problem solutions are made incrementally to an agreed-upon working solution, mimicking the inability of real-world problem-solvers to evaluate the entirety of complex solutions. These constraints result in a recombination of sub-problem solutions, leading to novel solutions which could not have been found by gradient ascent or small mutations, a form of emergent creativity. Using NK-model simulations, we compare the local majority strategy to conformity and best-neighbor strategies from prior work, finding that for complex problems, local majority reaches a higher average solution value and converges to that solution more quickly. These results show, perhaps counterintuitively, that constraints on individual problem-solvers can improve group performance, and offer insight into how creative solutions can arise from constraints on problem-solving ability.

15:45
Social Success of Perfumes

ABSTRACT. Perfumery is a form of art, practised since ancient Egyptian and Mesopotamian times. The work of composing perfumes has ever since been left as a job for the nose - an expert with knowledge of pairwise complementary notes (scents, such as rose, vanilla, musk), volatilities of ingredients and other aspects of perfume making. This experience is typically acquired over many years of training and trials of many different combinations of ingredients.

Although information on precise formulation of perfumes is confidential, we obtain insights on perfume making by analysing the freely available information on perfume notes. From this perspective, perfume could be looked at as a type of a dish: mixing the correct amounts of the right ingredients leads to a harmonious whole. Indeed, similar network studies of food recipes exist.

A bipartite network is the most natural representation of our data on perfumes and their odour descriptors, or notes. We study the structure of this network, an example of which is shown in Fig. 1, to understand how note compositions, called accords, influence successful fragrance formulas. We obtain accords which tend to be present in perfumes that receive significantly more customer ratings. Our findings show that the most popular notes and the most over-represented accords are different from those that have the strongest effect on perfume ratings. We also used network centrality to understand which notes have the highest potential to enhance note compositions. We find that large-degree notes, such as musk and vanilla, as well as generically named notes, e.g. floral notes, are amongst the notes that enhance accords the most. This work presents a framework which would be a timely tool for perfumers to explore a multidimensional space of scent compositions.

14:30-16:00 Session 5E: Structure (theory)
14:30
Recoverability of network dynamics

ABSTRACT. It has been found that nonlinear dynamical systems experience state transitions, determined by the interplay between structure and dynamics. Once a transition to a failed state occurs, a fundamental challenge is to recover the system back to functionality. This recovery is achieved through topological interventions and/or dynamic perturbations. However, the possible interventions are highly constrained. For example, typically we only control a microscopic set of nodes. Also, it may be very costly to build too many links between nodes. Therefore, we wish to characterize the system's recoverability: whether recovery is possible and under what conditions. Some dynamical systems have a bistable regime. If such a system is damaged, e.g. by link removal, it loses its functionality at some point. Introducing links back does not bring it to a functional state. In order to reverse this failure, we need to reignite the system. We aim to assess: can we reignite the system with only a few nodes, or perhaps even just a single node? If not, what is the required fraction? We find that for some dynamical systems it is possible to recover via even a single node. Furthermore, we discover an additional phase where the system is failed but recoverable. Nonetheless, for some dynamics it is impossible to reignite the system with a small set of nodes, and instead we find the fraction of nodes required for reignition. These results have the potential to influence our understanding of recovery in numerous dynamical network systems such as infrastructure, biological systems, and others.

14:45
Compression of treelike complex networks using layered configuration models

ABSTRACT. Random graph models, when constrained to reproduce features of real networks, can compress network data and their mathematical descriptions. More specifically, given a generative model, the probability ascribed to a network can be written as a transformation of the amount of information required to describe the parametrization of the model (prior) and the network instance (likelihood). The minimum description length (MDL) criterion states that shorter description lengths correspond to better statistical representations of a network than longer ones [1]; this provides a straightforward mechanism to select models from a large family of possible descriptions.
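Schematically, a description length combines the information needed to state the parameters with the information needed to state the network given those parameters, and the shorter total is preferred. The sketch below compares two hypothetical candidate models this way; the numbers and model names are placeholders, not results from the study.

import numpy as np

def description_length(neg_log_prior, neg_log_likelihood):
    # Total description length in bits: parameters plus data given parameters.
    return (neg_log_prior + neg_log_likelihood) / np.log(2)

# Hypothetical costs (in nats) for one network under two candidate models.
candidates = {
    "degree-corrected SBM": description_length(1.2e4, 3.1e5),
    "layered configuration model": description_length(0.9e4, 3.3e5),
}
print("preferred model:", min(candidates, key=candidates.get))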

We calculate the approximate description length of graph ensembles generated by the classic configuration model and a number of its generalizations: the correlated configuration model, the degree-corrected stochastic block model [2], and the recently introduced layered configuration models [3]. We then apply MDL as a model selection criterion [1,2]. We find that, for most real complex networks, the shortest description length is obtained by one of two models: stochastic block models for dense networks, such as social and information networks, and layered configuration models for treelike networks, such as power grids and phylogenetic trees. Our results formalize the intuition that configuration models offer a compact description of sparse networks, and highlight the importance of centrality layers in treelike structures.

[1] P. D. Grünwald, The Minimum Description Length Principle (MIT Press, 2007). [2] T. Peixoto, Phys. Rev. Lett. 110, 148701 (2013). [3] A. Allard & L. Hébert-Dufresne, arXiv:1804.09633 (in press at Phys. Rev. X).

15:00
Spectra of random networks in the weak clustering regime
SPEAKER: Thomas Peron

ABSTRACT. The asymptotic behavior of dynamical processes in networks can be expressed as a function of spectral properties of the corresponding adjacency and Laplacian matrices. Although many theoretical results are known for the spectra of traditional configuration models, networks generated through these models fail to describe many topological features of real-world networks, in particular non-null values of the clustering coefficient. Here we study the effects of cycles of order three (triangles) on network spectra. Using recent advances in random matrix theory, we determine the spectral density of the network adjacency matrix as a function of the average number of triangles attached to each node, for networks constructed according to the model proposed independently by Newman and Miller, which generates clustered networks with vanishing assortativity and no modular structure. Our theoretical results accurately reproduce the positively skewed distribution of eigenvalues of the adjacency matrix (see Fig. 1). Interestingly, we further reveal that the largest eigenvalue of this matrix deviates from the value predicted for the traditional configuration model with the same connectivity by a factor inversely proportional to the average number of triangles. However, we show that this difference vanishes as the average degree increases, explaining why critical dynamical properties of clustered and locally treelike networks tend to converge in that limit.
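For a numerical counterpart to this calculation, one can sample a clustered graph from a joint edge/triangle degree construction in the spirit of the Newman-Miller model (networkx's random_clustered_graph) and inspect the adjacency spectrum directly; the degree distributions and sizes below are arbitrary demo choices, not the parameters used in the study.

import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n = 1000
# (independent-edge degree, triangle degree) pair for every node
joint = [(rng.poisson(4), rng.poisson(1)) for _ in range(n)]
if sum(s for s, _ in joint) % 2:                # edge stubs must sum to an even number
    joint[0] = (joint[0][0] + 1, joint[0][1])
rem = sum(t for _, t in joint) % 3
if rem:                                         # triangle stubs must be divisible by 3
    joint[0] = (joint[0][0], joint[0][1] + 3 - rem)

G = nx.random_clustered_graph(joint, seed=1)
G = nx.Graph(G)                                 # drop multi-edges
G.remove_edges_from(nx.selfloop_edges(G))

eigs = np.linalg.eigvalsh(nx.to_numpy_array(G))
print("largest eigenvalue:", eigs[-1])
print("third central moment (skew indicator):", ((eigs - eigs.mean()) ** 3).mean())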

15:15
Natural emergence of a core structure in networks via clique percolation

ABSTRACT. Networks are often presented as containing a "core" and a "periphery." The existence of a core suggests that some vertices are central and form the skeleton of the network, to which all other vertices are connected. An alternative view of graphs is through communities. Multiple measures have been proposed for dense communities in graphs, the most classical being k-cliques, k-cores, and k-plexes, all describing groups of tightly connected vertices. Here we show that the edge-number thresholds for such communities to emerge and for their percolation into a single dense connectivity component are very close in all networks studied. These percolating cliques produce a natural core-periphery structure. This result is generic and is tested in configuration models and in real-world networks; it also holds for k-cores and k-plexes. Thus, the emergence of this connectedness among communities leading to a core is not dependent on some specific mechanism but rather a direct result of the natural percolation of dense communities.
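A minimal sketch of how one might track the two thresholds numerically is given below: as edges are added to a random graph, it records the size of the largest 3-clique community (clique percolation) and of the 3-core. The graph model and parameters are illustrative only, not the authors' code.

import networkx as nx

def largest_clique_community(G, k=3):
    # Size of the largest k-clique community (clique percolation).
    comms = list(nx.algorithms.community.k_clique_communities(G, k))
    return max((len(c) for c in comms), default=0)

n = 300
for m in range(200, 1400, 200):
    G = nx.gnm_random_graph(n, m, seed=7)
    core = nx.k_core(G, k=3)
    print(f"m={m:5d}  largest 3-clique community={largest_clique_community(G):4d}"
          f"  3-core size={core.number_of_nodes():4d}")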

15:30
Cascading detectability limits revealed by preprocessing temporal networks with hierarchical communities
SPEAKER: Dane Taylor

ABSTRACT. It is commonplace to preprocess multilayer and temporal networks by aggregating layers that are similar (Fig. 1A). In previous research [1,2], we analyzed how layer aggregation can enhance community detection by lowering the fundamental limits on detectability. We extend this work by allowing layers to be drawn from hierarchical stochastic block models (SBMs). We develop random matrix theory to analyze the eigenvalues and eigenvectors of modularity matrices for networks obtained from the aggregation of L layers. In Fig. 1B, we show an example with N nodes in which each layer is sampled from an SBM consisting of four communities organized in a two-level hierarchy (see block edge probabilities in the top-right inset). Each dominant eigenvalue that is well separated (i.e., isolated) from the bulk eigenvalues (shaded region in the lower-left corner) has an associated eigenvector that correlates with the hierarchical community structure (see insets). We illustrate a novel phenomenon in Fig. 1C: as one increases the number of layers L, there is a cascade of detectability phase transitions in which first the coarse-scale structure becomes detectable (i.e., the two large communities of size N/2), and then, in subsequent transitions, each of these breaks up into two communities to yield the four communities of size N/4. These detectability phase transitions are measured through order parameters quantifying the correlation between a dominant eigenvector and a vector that encodes the true community structure.
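A hedged sketch of the aggregation-then-spectral step is given below, using a flat (non-hierarchical) two-block SBM for brevity; all sizes and probabilities are illustrative assumptions rather than the parameters of the study.

import numpy as np
import networkx as nx

N, L = 200, 10
sizes = [N // 2, N // 2]
probs = [[0.08, 0.02], [0.02, 0.08]]       # two planted communities

A_sum = np.zeros((N, N))
for layer in range(L):
    G = nx.stochastic_block_model(sizes, probs, seed=layer)
    A_sum += nx.to_numpy_array(G)

k = A_sum.sum(axis=1)
B = A_sum - np.outer(k, k) / k.sum()       # modularity matrix of the aggregate
vals, vecs = np.linalg.eigh(B)
lead = vecs[:, np.argmax(vals)]            # dominant eigenvector
# The sign pattern of the dominant eigenvector recovers the planted split
# when the structure is detectable; its correlation with the true split is
# the kind of order parameter described above.
recovered = np.sign(lead)
agreement = abs((recovered[: N // 2] > 0).mean() - (recovered[N // 2 :] > 0).mean())
print("agreement with planted split:", agreement)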

15:45
Graph Comparison via the Non-backtracking Spectrum
SPEAKER: Andrew Mellor

ABSTRACT. The comparison of graphs is a vitally important yet difficult task that arises across a number of diverse research areas, including biological and social networks. There have been a number of approaches to defining a graph distance; however, these are often not metrics (rendering standard data-mining techniques infeasible) or are computationally infeasible for large graphs. In this work we define a new metric based on the spectrum of the non-backtracking graph operator and show that it can be used not only to compare graphs generated through different mechanisms, but also to reliably compare graphs of varying size. We observe that the family of Watts-Strogatz graphs lies on a manifold in the non-backtracking spectral embedding, and we show how this metric can be used in a standard classification problem on empirical graphs.
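A rough sketch of the ingredients is given below: it builds the 2N x 2N Ihara-Bass linearization of the non-backtracking operator, keeps the largest eigenvalue magnitudes, and compares two graphs by the distance between their truncated spectra. The specific truncation and distance used here are simple placeholders, not necessarily the metric defined in the talk.

import numpy as np
import networkx as nx

def nb_spectrum(G, top=20):
    # Ihara-Bass companion matrix [[A, I - D], [I, 0]] shares the nontrivial
    # eigenvalues of the non-backtracking (Hashimoto) matrix.
    A = nx.to_numpy_array(G)
    n = len(A)
    D = np.diag(A.sum(axis=1))
    M = np.block([[A, np.eye(n) - D], [np.eye(n), np.zeros((n, n))]])
    eigs = np.linalg.eigvals(M)
    return np.sort(np.abs(eigs))[::-1][:top]

def nb_distance(G1, G2, top=20):
    return float(np.linalg.norm(nb_spectrum(G1, top) - nb_spectrum(G2, top)))

G_ws = nx.watts_strogatz_graph(500, 6, 0.1, seed=0)
G_er = nx.gnp_random_graph(500, 6 / 499, seed=0)
print("distance:", nb_distance(G_ws, G_er))

Because the spectrum is truncated to a fixed number of leading magnitudes, graphs of different sizes can be compared on the same footing, which is the property highlighted in the abstract.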