Renaud Lambiotte (University of Oxford, UK): Dynamics and Balance on Signed Graphs
This keynote session is supported by Elsevier.
Kathleen Carley (Carnegie Mellon University, USA): Social Cognition and Inauthentic Actors in High Dimensional Networks
Dynamic stability of complex networks ABSTRACT. See attached file |
Local dominance unveils clusters in networks PRESENTER: Ruiqi Li ABSTRACT. Clusters or communities can provide a coarse-grained description of complex systems at multiple scales, but their detection remains challenging in practice. Community detection methods often define communities as dense subgraphs, or subgraphs with few connections in-between, via concepts such as the cut, conductance, or modularity. Here we consider another perspective built on the notion of local dominance, where low-degree nodes are assigned to the basin of influence of high-degree nodes, and design an efficient algorithm based on local information. Local dominance gives rise to community centers, and uncovers local hierarchies in the network. Community centers have a larger degree than their neighbors and are sufficiently distant from other centers. The strength of our framework is demonstrated on synthesized and empirical networks with ground-truth community labels. The notion of local dominance and the associated asymmetric relations between nodes are not restricted to community detection, and can be utilised in clustering problems, as we illustrate on networks derived from vector data. |
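To make the idea concrete, below is a minimal Python sketch of a local-dominance assignment (an illustrative simplification, not the authors' exact algorithm): each node points to its highest-degree neighbor if that neighbor dominates it, and following these pointers uphill yields community centers and their basins.

```python
import networkx as nx

def local_dominance_clusters(G):
    """Toy local-dominance clustering: follow pointers to dominant neighbors."""
    deg = dict(G.degree())
    parent = {}
    for v in G:
        best = max(G[v], key=deg.get, default=v)
        # a node that no neighbor dominates becomes its own center
        parent[v] = best if deg[best] > deg[v] else v

    def center(v):
        while parent[v] != v:   # pointers go strictly uphill in degree, so no cycles
            v = parent[v]
        return v

    return {v: center(v) for v in G}

G = nx.karate_club_graph()
labels = local_dominance_clusters(G)
print(sorted(set(labels.values())))  # the discovered community centers
```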
Hypergraphs or simplicial complexes: Untangling the effect of higher-order representations on collective dynamics PRESENTER: Maxime Lucas ABSTRACT. Higher-order networks have emerged as a powerful framework to model complex systems and their collective behavior. Going beyond pairwise interactions, they encode structured relations among arbitrary numbers of units through representations such as simplicial complexes and hypergraphs. So far, the choice of which representation to use has often been motivated by technical convenience. In this talk, using synchronization of coupled oscillators, I will demonstrate that the effects of higher-order interactions are highly representation-dependent. In particular, hyperedges typically enhance synchronization in hypergraphs but have the opposite effect in simplicial complexes. I will explain this result by linking higher-order representation to (generalized) degree heterogeneity, which influences a wide range of dynamic processes from contagion to diffusion. |
Collective dynamics of causal network interactions PRESENTER: Samuel Unicomb ABSTRACT. Real-world interacting systems have complex topological and temporal correlations. For example, such correlations arise in human communication patterns, where interactions are temporally correlated and biased by topological structures such as local clustering and homophily. Correlations such as these have highlighted that pairwise interactions are insufficient to model observed complexity, and it is thus important to study higher-order network representations of interacting systems. Higher-order topology has been modelled by hypergraphs and simplicial complexes, while higher-order interaction sequences have been modelled predominantly using higher-order Markov models. In such models, one can conceptualise higher-order temporal sequences as correlated walks on a network. Although this is suitable as a model of conservative, pairwise processes, it is not a natural model of certain temporal motifs, despite allowing some generalisations to non-conservative behaviour. We introduce an alternative model of causal interactions that amounts to a continuous-time branching process on a network. Consider an underlying network whose edges mediate directed temporal interactions. Making use of directedness, we construct sequences of interactions on adjacent edges in which each interaction may produce subsequent interactions. These interactions may be biased by both the underlying network topology and the history of the process. Akin to a branching process, we encode the process history on network edges. To do so, we use a compartmental model of edge states in which edges are either resting or in an active directed state that indicates a temporal interaction (see Figure 1). A strength of this approach is that it allows us to adapt analyses of node-based compartmental models to our framework. To do this, we extend a master-equation analysis that was originally formulated to model temporal (but non-causal) networks. Using our framework, we study how temporal correlations that are defined on a local scale determine the emergence of global avalanches of interactions. In a series of numerical experiments on synthetic and empirical networks, we show that a branching-process model of causality undergoes a bifurcation. Below a critical point, sequences are finite and die out. Above a critical point, interactions form a chain and sequences continue indefinitely. Our master-equation analysis suggests that a closed-form stability condition does not exist. In light of this, we formulate system stability in terms of a self-consistency equation that we solve using an efficient iterative procedure. We believe that our branching-process framework is general enough to encourage further modelling of causality and to help shed light on the role of topological and temporal correlations in complex systems. |
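The stability analysis above hinges on solving a self-consistency equation iteratively. As a hedged illustration of that general technique, here is a damped fixed-point solver applied to a toy branching-style equation (the equation x = 1 - exp(-R x) is a stand-in of our choosing, not the authors' actual self-consistency condition):

```python
import numpy as np

def fixed_point(f, x0, tol=1e-10, max_iter=10_000):
    """Solve x = f(x) by damped fixed-point iteration."""
    x = float(x0)
    for _ in range(max_iter):
        x_new = 0.5 * x + 0.5 * f(x)  # damping aids convergence
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

# toy analogue: x = 1 - exp(-R x) is the survival probability of a
# branching process with mean offspring R; supercritical for R > 1
for R in (0.8, 1.2, 2.0):
    print(R, round(fixed_point(lambda x: 1 - np.exp(-R * x), 0.9), 4))
```

Below the critical point (R < 1) the iteration collapses to the trivial root x = 0 (all cascades die out); above it, a nontrivial solution appears, mirroring the bifurcation described in the abstract.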
Double Exchangeability in Knowledge Graphs Endows Neural Networks with Logical Knowledge Transfer PRESENTER: Beatrice Bevilacqua ABSTRACT. In 2012, Google revealed that knowledge graphs (KGs) were an essential ingredient in its successful search engine, declaring Web search to be about "things not strings". Since then, Google's mastery of graphs, both the Web graph (used by Google's PageRank) and Google's Knowledge Graph, became nearly synonymous with Google's dominance in Web search. Today, the development of large language models (LLMs) trained on massive text datasets (strings as sequences of words) threatens to upend this graph-centric view of Web search. By learning associations between words, sentences, and context, LLMs now promise to revolutionize Web search. Under this paradigm shift, could graphs still be useful? More specifically, can graphs allow neural networks to reason beyond what LLMs can learn on massive text datasets? Graphs are actually strings (sequences) endowed with a permutation symmetry, according to graph machine learning theory (Murphy et al., 2019). In other words, what distinguishes a graph from a sequence of edges is the assumption that node ids are arbitrary, and should therefore be exchangeable without altering the underlying data they represent. In graph machine learning, all learnable graph properties emerge from this permutation symmetry (Srinivasan & Ribeiro, 2020). Are KGs also graphs? That is, are KGs just a sequence of relations (v --- lives_in ---> u) between entities that were given arbitrary ids? Our work posits that KGs have a different symmetry not present in standard graphs (Zhou, Bevilacqua, Ribeiro 2023) (Gao, Zhou, Ribeiro, 2023): KGs are better described through what we define as double exchangeability, with symmetries defined with respect to permutations of both node ids and relation ids. By enforcing equivariance to this double symmetry, double-equivariant neural networks are forced to learn higher-order logical relations in the knowledge graph beyond what LLMs can learn from data alone (Dai et al., 2019). Indeed, despite being extremely powerful at learning associations within the training data, LLMs have the intrinsic limitation of not being able to transfer existing knowledge to new concepts, e.g., new vocabulary or new associations, as shown next. For instance, Figure 1 (in the PDF) depicts a task that, despite its simplicity, can be solved only through the understanding that logical rules learned for one relation should be directly applicable to new relations and entities, unless contradicted. Our model can tackle this scenario, as the notion of double equivariance allows us to disregard actual relation identifiers in favor of learning relational dependencies among them. This showcases how our approach opens up a promising direction towards a new type of neural-network reasoning by enabling knowledge transfer of logical rules. Our empirical results on real-world datasets demonstrate that our approach can effectively generalize out-of-distribution (OOD) to entirely new relation types at test time, without access to additional information. |
A multi-model analysis of galaxy evolution: A Network Science approach PRESENTER: Michail Papadopoulos ABSTRACT. The use of Network Science in Astrophysics has been very limited compared to other fields. Specifically, regarding galaxy evolution the only major work is [1], where identification of evolutionary states was showcased. The study of galaxy evolution requires good estimates of the main physical quantities of large samples of galaxies (e.g., stellar mass, star formation rate, active galactic nucleus (AGN) fraction, etc.) throughout the history of the Universe. The presence of interstellar dust in galaxies complicates the analysis of the underlying phenomena, due to absorption. The optical/ultraviolet light emitted by stars and accreting supermassive black holes is absorbed and re-emitted in the infrared. To estimate the key physical quantities of galaxies, multiwavelength observations are required, all the way from the ultraviolet to the millimetre. Modelling the emission of the components of a galaxy (i.e., star-forming regions, AGN tori and the general host galaxy) is essential, especially for galaxies at high redshift where the spectrum is usually the only observable information. Equally important is the fitting of these models to real data. There are two main ways to fit the spectral energy distribution (SED) of a galaxy: radiative transfer models and energy-balance based models. An abundance of multivariate models with different assumptions about galaxy evolution and galaxy structure (e.g., the geometry of the AGN torus) can in general produce different SEDs, even for the same galaxy. Since no single model is currently favored over the rest, a quantitative comparison amongst them is necessary. In this work, whose framework we showcase in Fig. 1, we use Network Science approaches to construct networks of galaxies and infer information on how the different fitting models compare to each other. We use the Wasserstein distance (earth mover's distance) to pairwise compare all galaxy SEDs in our dataset and create a similarity matrix S. Using the k-nearest neighbor method, we sparsify S, keeping high values of similarity. This sparsified version is used as the weighted adjacency matrix of a network that quantifies similarity between galaxies. For the fitting of our sample we use 4 models that assume different geometry for the AGN torus, as in [2]. Because the fitted SED of each galaxy differs between models, the resulting networks will also differ (while referring to the same set of nodes). There is a one-to-one correspondence between networks and models. This correspondence is utilized to unveil significant information regarding galaxy evolution. Arguably, a comparison between the aforementioned networks can serve as an indicator of similarity between fitting models. To achieve this, we use standard graph comparison methods (e.g., Deltacon [3], matrix norms, etc.) and a clustering comparison method [4] coming from Information Theory. (For the purposes of clustering, we use the Louvain algorithm [5].) |
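A minimal sketch of the pipeline described above (Wasserstein similarity, k-nearest-neighbour sparsification, Louvain clustering), using randomly generated stand-in "SEDs"; all data and parameter values are illustrative, and `louvain_communities` assumes networkx >= 2.8:

```python
import numpy as np
import networkx as nx
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
# stand-in for fitted SEDs: one flux sample distribution per galaxy
seds = [rng.gamma(shape=rng.uniform(1, 5), size=200) for _ in range(60)]

n = len(seds)
S = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = wasserstein_distance(seds[i], seds[j])
        S[i, j] = S[j, i] = 1.0 / (1.0 + d)      # similarity from EMD

# k-nearest-neighbour sparsification: keep the k strongest links per node
k = 5
G = nx.Graph()
G.add_nodes_from(range(n))
for i in range(n):
    neigh = [j for j in np.argsort(S[i])[::-1] if j != i][:k]
    for j in neigh:
        G.add_edge(i, int(j), weight=S[i, j])

communities = nx.community.louvain_communities(G, weight="weight", seed=1)
print([len(c) for c in communities])
```

Repeating this construction once per fitting model yields one network per model on the same node set, which can then be compared with the graph-comparison methods mentioned in the abstract.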
Nodes within the cerebellum are integrators in the human brain networks PRESENTER: Kanika Bansal ABSTRACT. The cerebellum is a complex neural system with a unique regular structure coupled with high connectivity to the cerebral cortex. Previous research has implicated the cerebellum's role in internal models for fine motor execution; however, more recently it has been recognized that the cerebellum plays a similar role in a variety of cognitive functions, including spatial processing, working memory and executive function, likely suggesting its more generalized contribution to human intelligence. Despite this recent evidence, the nature by which computations in the cerebellum are distributed through cerebral cortical networks is not yet fully understood. Here, we deploy a networks approach to evaluate dynamic network reconfigurations in cerebral cortex connectivity following inhibitory repetitive transcranial magnetic stimulation (rTMS) targeting the right cerebellum, using the 'perturb and measure' approach to uncover potential communication links through which the cerebellar stimulation spreads within the brain. We use dynamic community detection algorithms to compare how modular network structures emerge and evolve prior to and post cerebellar stimulation. Our results indicate that: (1) flexibility, or the likelihood of network nodes to change module allegiances, increases post stimulation for multiple nodes across the entire cerebral cortex; (2) dynamic patterns in which module allegiances emerge and evolve are highly individualistic and do not follow a single functional prototype; and (3) cerebellar nodes play the role of integrators for distinct network modules within the brain. These findings suggest that cerebellar nodes modulate distributed cortical activity in an adaptive, orderly fashion, efficiently integrating and segregating information beyond the confines of the motor system. Crucially, this could provide a putative pathway for cerebellar contributions to a variety of high-level cognitive functions. |
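Flexibility, as commonly defined in dynamic network neuroscience, is the fraction of consecutive time windows in which a node changes its module allegiance; a minimal sketch with an invented allegiance matrix:

```python
import numpy as np

def flexibility(allegiance):
    """Fraction of consecutive windows in which each node changes module.

    allegiance: (n_nodes, n_windows) integer array of module labels from
    dynamic community detection.
    """
    changes = allegiance[:, 1:] != allegiance[:, :-1]
    return changes.mean(axis=1)

# toy allegiance matrix: node 0 is stable, node 1 switches every window
A = np.array([[1, 1, 1, 1, 1],
              [1, 2, 1, 2, 1]])
print(flexibility(A))  # -> [0. 1.]
```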
Modelling brain connectomes networks: Solv is a worthy competitor to hyperbolic geometry! PRESENTER: Dorota Celińska-Kopczyńska ABSTRACT. Connectomes are comprehensive maps of neural connections in the brain. Understanding the interactions shaped by them is a key to understanding, e.g., human cognition. As connectomes are spatially embedded complex networks whose structure is shaped by physical constraints and communication needs, they seem to exhibit traits inherent to non-Euclidean geometries. That is why a vast amount of research interest has recently been devoted to finding suitable embeddings for connectome networks. Recent studies (e.g., WHKL22, AAS20) have found two-dimensional hyperbolic embeddings superior to Euclidean embeddings in modelling connectomes across species, especially human connectomes. However, those studies had some limitations: geometries other than Euclidean, hyperbolic or spherical were not taken into account. Our study broadens the perspective on suitable embeddings. We have analyzed the goodness-of-fit (measured with log-likelihood) of the embeddings for 21 connectome networks (8 species) to 26 unique tessellations (Euclidean, spherical, hyperbolic, Solv, Nil, and also product geometries). We included both two-dimensional and three-dimensional manifolds. Following William Thurston's suggestion that the networks of neurons in the brain could be successfully represented in Solv geometry (one of the eight so-called Thurston geometries), we hypothesized that using this geometry would outperform using hyperbolic geometry. Our findings suggest that in many cases (especially when it comes to human connectomes), Solv embeddings indeed allow us to obtain a better fit as measured by log-likelihood. |
Unraveling cradle-to-grave disease trajectories from multilayer comorbidity networks PRESENTER: Elma Dervic ABSTRACT. Background: The trajectories along which multimorbidity, the presence of multiple diseases or conditions in a patient, develops over a life-time are yet to be fully understood. We aim to comprehensively identify typical life-spanning trajectories and critical events that impact patients' hospital utilization and mortality. Methods: We use a unique dataset containing 44 million records of almost all inpatient stays from 2003 to 2014 in Austria to investigate disease trajectories. We develop a new, multilayer disease network approach to quantitatively analyse how co-occurrences of two or more diagnoses form and evolve over the life course of patients. Nodes represent diagnoses in age groups of ten years; each age group makes up a layer of the comorbidity multilayer network. Intra-layer links encode a significant correlation between diagnoses (p < 0.001, relative risk > 1.5), while inter-layer links encode correlations between diagnoses across different age groups. We use an unsupervised clustering algorithm for detecting typical disease trajectories as overlapping clusters in the multilayer comorbidity network. We identify critical events in a patient's career as points where initially overlapping trajectories start to diverge towards different states. Results: We identified 1,260 distinct disease trajectories (618 for females, 642 for males) that on average contain 9 (IQR 2-6) different diagnoses and cover up to 70 years (mean 23y). We found 70 pairs of diverging trajectories that share some diagnoses at younger ages but develop into markedly different groups of diagnoses at older ages. Conversely, we found 159 pairs of converging trajectories in which different sets of diagnoses at younger ages lead to similar diagnoses in older age. Conclusions: The disease trajectory framework can help us to identify critical events as specific combinations of risk factors that put patients at high risk for different diagnoses decades later. Our findings enable a data-driven integration of personalized life-course perspectives into clinical decision-making. |
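A sketch of the link criterion with toy counts (the thresholds p < 0.001 and relative risk > 1.5 are from the abstract; the counts are invented for illustration):

```python
def relative_risk(n_ab, n_a, n_b, n_total):
    """Relative risk of diagnosis B given diagnosis A versus given not-A."""
    p_b_given_a = n_ab / n_a
    p_b_given_not_a = (n_b - n_ab) / (n_total - n_a)
    return p_b_given_a / p_b_given_not_a

# toy counts: 500 of 2,000 patients with A also have B;
# 1,500 of the remaining 98,000 patients have B
rr = relative_risk(n_ab=500, n_a=2_000, n_b=2_000, n_total=100_000)
print(round(rr, 1))  # a link A -- B is kept if rr > 1.5 (and p < 0.001)
```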
Non-selective distribution of infectious disease prevention may outperform risk-based targeting PRESENTER: Eugenio Valdano ABSTRACT. Epidemic control often requires optimal distribution of available vaccines and prophylactic tools, to protect susceptible individuals from infection. Well-established theory recommends prioritizing those at the highest risk of exposure. But the risk is hard to estimate, especially for diseases involving stigma and marginalization. We address this conundrum by proving that one should target those at high risk only if the infection-averting efficacy of prevention is above a critical value, which we derive analytically. We apply this to the distribution of pre-exposure prophylaxis (PrEP) of the Human Immunodeficiency Virus (HIV) among men-having-sex-with-men (MSM), a population particularly vulnerable to HIV. PrEP is effective in averting HIV infection among MSM. Global scale-up of PrEP has, however, been slow, showing the need to revisit targets and distribution strategies, based currently on individual risk of HIV acquisition. In this study, we analysed under which conditions a more equitable distribution, that is not exclusively aimed at high-risk individuals, can actually outperform the latter. We used the framework of complex networks to describe the heterogeneity in the number of sexual contacts in MSM populations that contribute to the spread of HIV. Our analytical results indicate that for high HIV prevalence and low PrEP efficacy, there exists a degree $k^*$ which maximizes the reduction in prevalence when immunized. The phenomenology does therefore fundamentally differ from the case of perfect immunization, i.e. 100% efficacy. We show that this optimum emerges due to a trade-off between the probability of breakthrough infections occurring and the ability to prevent superspreading events when immunizing highly active individuals. More specifically, we found that the individual infection risk is maximally reduced when immunizing a node in degree class $k^*_{dir} < k^*$. At the same time, the reduction in the infection risk of the rest of the population increases monotonically with the degree of the immunized individuals. The trade-off between the two contributions, which we summarize in the functions $F_{indir}(k)$ and $F_{dir}(k)$, leads to $k^*$. Fig. 1 illustrates the functions $F_{indir}(k)$ and $F_{dir}(k)$, and how they lead to an overall optimum in prevalence reduction ($f(k) = F_{indir}(k) + F_{dir}(k)$) at $k^*$. Nevertheless, from a practical point of view, uncertainties in the epidemic dynamics hinder a precise evaluation of $k^*$. Maximizing the impact of PrEP by distributing it to individuals in degree class $k^*$ thus seems not feasible. Hence, we considered random PrEP distribution and analyzed in which countries and regions it may outperform a risk-based approach. We used HIV prevalence and treatment coverage estimates from 63 MSM communities in 41 countries (source: UNAIDS). We found that in high-prevalence settings, offering PrEP indiscriminately can outperform targeting high-risk individuals. In particular, in many African communities, this equitable distribution among MSM may achieve the largest number of averted infections. Beyond the application to PrEP, our results are general and may challenge immunization strategies in other public health contexts. |
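The trade-off between breakthrough infections and superspreading can be explored numerically with a toy leaky-prophylaxis SIS simulation on a heavy-tailed network. This is a hedged illustration of the phenomenon, not the paper's analytical framework; the network, the rates beta and mu, the budget, and the efficacy values are all invented:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(42)

def sis_prevalence(G, protected, efficacy, beta=0.05, mu=0.1, steps=1500):
    """Discrete-time SIS; protected nodes have per-contact infection
    probability reduced by `efficacy` (a leaky, imperfect prophylaxis)."""
    n = G.number_of_nodes()
    A = nx.to_scipy_sparse_array(G, format="csr")
    susc = np.ones(n)
    susc[np.asarray(protected)] = 1.0 - efficacy
    inf = rng.random(n) < 0.05                       # initial seeds
    prev = []
    for t in range(steps):
        pressure = A @ inf.astype(float)             # infected neighbours
        p_inf = 1.0 - (1.0 - beta * susc) ** pressure
        inf = (inf & (rng.random(n) >= mu)) | (~inf & (rng.random(n) < p_inf))
        if t >= steps - 300:
            prev.append(inf.mean())
    return float(np.mean(prev))

# heavy-tailed contact network via the configuration model
deg = np.clip(rng.pareto(2.0, 2000).astype(int) + 1, 1, 100)
if deg.sum() % 2:
    deg[0] += 1
G = nx.Graph(nx.configuration_model(deg, seed=1))
G.remove_edges_from(nx.selfloop_edges(G))

budget = 200
top_k = sorted(G, key=G.degree, reverse=True)[:budget]  # risk-based targeting
random_k = rng.choice(G.number_of_nodes(), budget, replace=False)
for eff in (0.95, 0.40):                                 # high vs low efficacy
    print(eff, sis_prevalence(G, top_k, eff), sis_prevalence(G, random_k, eff))
```

With high efficacy, targeting hubs should win; with low efficacy and high prevalence, the gap narrows or reverses, which is the regime the abstract identifies analytically.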
Modeling adaptive forward-looking behavior in epidemics on networks PRESENTER: Michele Starnini ABSTRACT. The course of an epidemic can be drastically altered by changes in human behavior. Incorporating the dynamics of individual decision-making during an outbreak represents a key challenge of epidemiology, faced by several modeling approaches siloed by different disciplines. Here, we propose an epi-economic model including adaptive, forward-looking behavioral response on a heterogeneous networked substrate, where individuals tune their social activity based on future health expectations. Under basic assumptions, we show that it is possible to derive an analytical expression for the optimal value of the social activity that matches the traditional assumptions of classic epidemic models. Through numerical simulations, we contrast the settings of global awareness -- individuals only know the prevalence of the disease in the population -- with local awareness, where individuals explicitly know which of their contacts are infected. We show that behavior change can flatten the epidemic curve by lowering the peak prevalence, but local awareness is much more effective than global awareness in curbing the disease early. Our work bridges classical epidemic modeling with the epi-economic approach, and sheds light on the effects of heterogeneous behavioral responses in curbing the epidemic spread. |
Triadic approximation for complex contagion on networks PRESENTER: Giulio Burgio ABSTRACT. Based on mechanisms of social reinforcement, complex contagion processes, where transmission depends on the (simultaneous) exposure to multiple sources, are a peculiar feature of human social systems. Such processes essentially rely on the group structure of social interactions, and hypergraphs provide a suitable representation for them. Focusing on two-way and three-way interactions (i.e., rank-3 hypergraphs), we propose a mean-field approximation of the social structure that preserves the dynamical correlations emerging within groups of two and three nodes. Considering a binary-state dynamics (SIS-like), the approximation allows us to analytically derive an explicit formula describing, for the first time, how the critical value λ_cr of the two-way transmission rate, marking the onset of extensive contagions, depends on the three-way rate λ_△. This is in accordance with Monte Carlo simulations and with previously found implicit formulas for λ_cr [1]. Any uncorrelated-states model, on the contrary, predicts that λ_cr does not depend on λ_△ [2], no matter how the structure is approximated [1]. [1] Giulio Burgio, Alex Arenas, Sergio Gómez, and Joan T. Matamalas. Network clique cover approximation to analyze complex contagions through group interactions. Communications Physics, 4(1):111, 2021. [2] Alain Barrat, Guilherme Ferraz de Arruda, Iacopo Iacopini, and Yamir Moreno. Social contagion on higher-order structures. In Higher-Order Systems, pages 329–346. Springer, 2022. |
Impact of the COVID-19 outbreak on the Italian Twitter vaccination debate: A network-based analysis PRESENTER: Veronica Lachi ABSTRACT. Vaccine hesitancy, or the reluctance to be vaccinated, is a phenomenon that has recently become particularly significant, in conjunction with the vaccination campaign against COVID-19. During the lockdown period, necessary to control the spread of the virus, social networks played an important role in the Italian debate on vaccination, generally representing the easiest and safest way to exchange opinions and maintain some form of sociability. Among social network platforms, Twitter has assumed a strategic role in driving public opinion, creating compact groups of users sharing similar views towards the utility, uselessness or even dangerousness of vaccines. In this paper, we present a new, publicly available dataset of Italian tweets, TwitterVax, collected in the period January 2019 – May 2022. Considering monthly data, gathered into forty-one retweet networks (where nodes identify users and edges are present between users who have retweeted each other), we performed community detection within the networks, analyzing their evolution and polarization with respect to NoVax and ProVax users through time. This allowed us to clearly discover debate trends as well as identify potential key moments and actors in opinion flows, characterizing the main features and tweeting behavior of the two communities. |
Nonnormative School Transitions and Adolescents' Friendly and Conflictual Social Networks PRESENTER: Haoyang Zhang ABSTRACT. School transitions, especially those that occur during early adolescence, can be important turning points that affect an individual's life trajectory (Elder, 1998) and their social networks (Benner, 2011). Adolescents' social networks not only provide young people with an opportunity for learning and refining socioemotional skills (Crosnoe, 2000), but they can also affect a range of developmental outcomes, such as academic performance (Vaquera & Kao, 2008) and risky behaviors (McMillan et al., 2018; Steglich et al., 2010). Blending in with peers and making friends after navigating a school transition could be the most critical component of adjusting to a new institution (Langenkamp, 2016). Given the vital role of peer relationships in young people's everyday lives, in this project we focus on the impact of certain school transitions on adolescents' social networks, including both their friendly and conflictual network ties. School transitions differ as to whether they are normative or nonnormative (Crockett et al., 1989; Sutton, Muller, and Langenkamp, 2013). A normative school transition refers to the typical school promotion in which students advance to the next level of schooling, which often comes at a serious cost for adolescents (e.g., Benner, 2011; Felmlee et al., 2018). A nonnormative transition experienced by young people is not a result of a normal grade promotion and involves either a school transfer or geographic, residential mobility. The literature on nonnormative school transitions, especially school transfers, is underdeveloped (Sutton et al., 2013). The purpose of this project, therefore, is to investigate the possible consequences of these under-studied transitions for adolescents' social lives. Based on a network perspective and life course theory, we anticipate that there will be costs to the social networks of youth who undergo nonnormative school transitions and that these school changes will be a source of homophily in their social connections. With data from 4 public middle schools (grades 5-8) in New Jersey (Paluck et al., 2016), we address the following question: how do school transfers and recent geographic moves affect adolescents' social networks? We examine two outcomes: friendly, or "friendship" networks (defined by those who they "spend time with") and conflictual networks (defined by those who they "have conflict with"). Following previous research (Swanson & Schneider, 1999), we categorize students into four groups based on their experience with both school transfers and residential moves: 1) "movers": students who move to a new residence without changing schools, 2) "changers": people who change schools but do not move, 3) "leavers": students who move and change schools at the same time, and 4) "stayers": those who neither move nor change schools. Preliminary findings show that both friendship and conflict indegree are often negatively related to nonnormative transitions during school. For example, "movers" and "leavers" receive significantly fewer friendship nominations than those who stay in all four schools, with one exception (School 3) (p<.01); "changers" receive fewer friendship nominations than "stayers" in two schools (Schools 2 and 4) (p<.01). We also find that "leavers" received significantly fewer conflict nominations than "stayers" in Schools 1 and 4 (p<.05).
As shown in Figure 1, nodes with the same color/shade (indicating one of the four transition status categories) tend to cluster together in the networks, especially for students who are "changers" and "stayers". In other words, there appear to be strong tendencies towards homophily based on transition status in the networks. The results of our Exponential Random Graph Models (ERGMs) confirm this homophily trend, showing that students are significantly more likely to hang out with peers of the same transition status (especially "stayers" and "changers"), gender, ethnicity, and grade. Our findings also indicate school-level variations for students with different transition types. Our preliminary results suggest that, in several schools, there can be notable costs to friendship nominations for adolescents who make nonnormative school transitions, especially when compared with those who remain in the same school. On the other hand, in some cases, students who change schools are less likely to be linked to others by conflictual ties, suggesting that making a nonnormative transition may have some protective effects. Finally, friendly connections tend to form between students of the same transition category, particularly "changers" and "stayers," providing evidence of significant homophily trends. These findings highlight the importance of investigating the influence of a variety of types of school transitions on adolescents, with attention to both their positive and negative social ties. |
Friendship Modulates Hierarchical Relations in Public Elementary Schools PRESENTER: Melanie Oyarzun ABSTRACT. Despite social status and hierarchy being prevalent aspects of social life, little is known about the connection between social hierarchies and friendship, particularly among young children in a public elementary school setting. We conducted a large-scale experiment with 856 children aged 9-12 from 14 Chilean elementary schools, designed to measure social status through aggregated cooperative patterns and then explore the connection between friendship and dyadic cooperation among students with different social status, using a combination of experimental game theory and social network analysis. The participants played a social dilemma game on tablet computers, which elicited cooperative patterns among known peers. In each round, students chose how many of their ten tokens to keep and how many to send to a peer. Each student played rounds until they had interacted with all of their classmates. To create a social dilemma, received tokens doubled in value, creating a trade-off between social and individual incentives. We constructed a network of token transfers and calculated normalized PageRank as a proxy for individual status, which considers both the total number of received tokens and the sender's social position. To account for randomness and noise in the observed behavior, we applied a degree-preserving randomization method to filter the network. In non-friend relationships, we found a negative correlation between received tokens and social status, indicating that lower-status students tended to send more tokens to higher-status individuals and receive fewer tokens in return. However, this dynamic disappeared in mutual friendship pairs, where the social status difference did not impact cooperation. Unilateral friendship nominations also led to different outcomes, with higher-status nominations resulting in mutual friendship-like behavior, while lower-status nominations resulted in non-friend-like behavior. Our results indicate that friendship implies fundamental equality, regardless of social status differences. This study provides a framework to explore social relationships and inform interventions in primary education. |
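A minimal sketch of the status measure on invented toy transfers (the degree-preserving filtering step described above is omitted): PageRank on the weighted token-transfer network rewards both the volume of received tokens and the senders' positions.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
G = nx.DiGraph()
for _ in range(300):                       # toy token transfers among 30 students
    i, j = (int(x) for x in rng.integers(0, 30, size=2))
    if i == j:
        continue
    w = int(rng.integers(1, 11))           # tokens sent in one round
    if G.has_edge(i, j):
        G[i][j]["weight"] += w
    else:
        G.add_edge(i, j, weight=w)

status = nx.pagerank(G, weight="weight")   # sender position and volume both count
lo, hi = min(status.values()), max(status.values())
status = {v: (s - lo) / (hi - lo) for v, s in status.items()}  # normalize to [0, 1]
print("highest-status student:", max(status, key=status.get))
```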
Early detection of students-at-risk of failing using a dynamic network approach PRESENTER: Nidia Guadalupe López Flores ABSTRACT. Identifying students at risk of failing a course or dropping out of a program is a significant problem in the field of Learning Analytics. Improving their early detection is important for enabling higher education institutions to design and provide resources to better support students. In addition, learning is a dynamic process in which social interactions are crucial, as learning is not an individual achievement. The identification of students at risk of failing or dropping out of a course is generally an imbalanced classification problem, as the grade distribution is affected by several factors and failing students are not always fairly represented. In this study, we compare the state-of-the-art graph oversampling method GraphSMOTE with the baseline minority oversampling method to evaluate the extent to which the synthetic samples generated with GraphSMOTE improve students-at-risk predictions based on social interactions in an online discussion forum. |
Understanding the evolution of successful careers in professional tennis PRESENTER: Chiara Zappalà ABSTRACT. The Science of Success has recently triggered the interest of researchers, who have examined various systems so far, from paper citations in science to reputation in the arts. Despite the growing literature on the topic, little attention has been given to success in sports, especially individual sports. Furthermore, the impact of early career stages on players' future achievements has rarely been taken into account. Usually, we imagine top players as predestined champions, who need to be extraordinarily talented and hard-working to get to the top. Nevertheless, evidence suggests that a combination of talent and effort does not guarantee success. Top players may not be born to be successful; rather, some fortuitous event might have played a role in shaping their careers. Those arguments motivate us to investigate the evolution of players' careers, focusing on tennis. Although a few studies have considered top players in tennis, there is no consensus about the predictors of future success. In this work, we analyse players' early career trajectories with the aim of understanding the mechanisms that make top players emerge. We collect data from the official rankings of the Association of Tennis Professionals (ATP), together with match results from different tournaments, from 2000 to 2019. We identify the top tennis athletes based on the maximum number of points they reached in the ATP ranking. We observe that those athletes have peculiar features compared to the other players, such as a longer career and a typical evolution of ranking points that is on average higher from the beginning of their career. To explain those effects, we study how the prestige of the first tournaments players can attend influences the progression of their careers. We propose a new measure of the level of a given ATP competition, which not only captures its historical relevance but also considers players' attendance. In detail, we build a network of co-attendance of tennis tournaments, where nodes are tourneys and links are determined by players' trajectories. This allows us to relate competitions that might be far apart in space and time. From the co-attendance network, we extract a measure of tourney prestige, based on eigenvector centrality in the network. We refer to the most prestigious events as high-level tourneys. In terms of attended competitions, we can disentangle the impact of the first attendances at different ATP tourneys on the evolution of athletes' careers. Moreover, we are able to differentiate between the influence of players' participation and the importance of their performance in a given tournament. We consider the first ten competitions of each player, and we look at the level of the tourneys players attend when they enter the ATP circuit. We find that participation alone is not a good proxy for athletes' success. Instead, the level of the tourney where players win their first match allows us to characterise top players. We conclude that the prestige of the tournament where they first win a match is a revealing factor of the future success of male tennis players. Our findings highlight the key role of athletes' early career stages, and they pave the way for a systematic study of the economic implications (higher prize money, more sponsors and advertising, etc.) that follow relevant sports results and shape players' professional development. |
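A sketch of the co-attendance construction and eigenvector-centrality prestige on invented toy data (tourney names and attendances are placeholders, not ATP data):

```python
import networkx as nx

# toy co-attendance data: each tourney with the players who attended it
attendance = {
    "Grand_Slam_X": {"p1", "p2", "p3"},
    "Masters_Y":    {"p1", "p2", "p4"},
    "Challenger_A": {"p3", "p4"},
    "Futures_B":    {"p4", "p5"},
}

G = nx.Graph()
tourneys = list(attendance)
for a in range(len(tourneys)):
    for b in range(a + 1, len(tourneys)):
        shared = attendance[tourneys[a]] & attendance[tourneys[b]]
        if shared:
            G.add_edge(tourneys[a], tourneys[b], weight=len(shared))

# prestige as eigenvector centrality in the weighted co-attendance network
prestige = nx.eigenvector_centrality_numpy(G, weight="weight")
for t, s in sorted(prestige.items(), key=lambda kv: -kv[1]):
    print(f"{t:14s} {s:.3f}")
```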
The Role of Immigrants, Emigrants, and Locals in the Historical Formation of European Knowledge Agglomerations PRESENTER: Philipp Koch ABSTRACT. Did migrants help make Paris a Mecca for the arts and Vienna a beacon of classical music? Or was their rise a pure consequence of local actors? Here, we use data on the biographies of more than 22,000 famous historical individuals born between the years 1000 and 2000 to estimate the contribution of famous immigrants, emigrants, and locals to the knowledge specializations of European regions. We find that the probability that a region develops a specialization in a new activity (physics, philosophy, painting, music, etc.) in the next century based on births of famous individuals grows with the presence of immigrants with knowledge on that activity and related activities. Similarly, we find that the probability that a region loses one of its existing areas of specialization decreases with the presence of immigrants specialized in that activity and in related activities. In contrast, we do not find robust evidence that locals with related knowledge play a statistically significant role in entries or exits. We address endogeneity concerns using highly restrictive fixed effects models considering any location-period-activity specific factors (e.g. the presence of a new university attracting scientists). These findings advance our understanding of the role of immigrants, emigrants, and locals in the historical formation of European knowledge agglomerations. |
Ecosystem-level datasets for Open Source Software PRESENTER: William Schueller ABSTRACT. Open Source Software (OSS) is omnipresent in our daily lives: any application on our smartphones, but also the software running in our cars, relies on it through a network of dependencies. Development of OSS is often supported by a few individuals, but can also be the result of thousands of people collaborating to build reliable software. The collaborative work behind OSS is typically recorded at a fine-grained level through Version Control Systems, the most famous being git. Projects versioned with git are hosted on platforms such as GitHub or GitLab, which also contain additional information characteristic of social platforms: followers, stars, sponsors, comments, etc. Mining GitHub has led to numerous studies, especially using the database GHTorrent or directly the list of events happening on GitHub available on GHArchive (gharchive.org). But those databases lack the raw data, and are specific to the platform GitHub. Another database fills this gap, the Software Heritage Project, an archive of all open source code -- but it does not cover the social elements of the git platforms. Not only are all those databases gigantic -- preventing large-scale analysis due to computational needs, as well as self-updating of the data -- but all of them lack information. Moreover, studying a platform as a whole leads to the inclusion of many irrelevant projects and other pitfalls. What is more, the network of dependencies and actual usage of software is not recorded on those platforms, but can typically be found in the package managers of the given languages, e.g. Cargo (crates.io) for the language Rust or CRAN for R. In summary, despite the apparent importance of OSS ecosystems, their evolution is still poorly understood at large scale, likely owing to the difficulty of gathering and cleaning the data. Our work aims to fill the gap between all those datasets, and to provide software to build à la carte datasets corresponding to the specific needs of researchers in a reproducible way, with reasonable time and resources, benefitting from several sanitizing methods in an automatized pipeline as well as data anonymization procedures for potential dataset release. We showcase our highly detailed datasets of 3 ecosystems: Rust (published in Nature Scientific Data), R and Julia; each including millions of timestamped interactions between tens of thousands of developers in tens of thousands of projects, over up to 10 years. Beyond questions specific to Software Engineering itself, the analysis of these multiplex temporal networks can draw interest from Economics and Computational Social Science. |
Mapping Global Value Chains at the Product Level PRESENTER: Lea Karbevska ABSTRACT. Value chain data is crucial to navigating economic disruptions, such as those caused by the COVID-19 pandemic and the war in Ukraine. Yet, despite its importance, publicly available value chain datasets, such as the “World Input-Output Database” or the “Inter-Country Input-Output Tables”, lack detailed information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and rely instead on more aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method to infer product-level value chain relationships from fine-grained international trade data. Our method exploits the idea that geographies that specialize in the export of a certain product will tend to specialize in the procurement of its inputs. This tendency should be observed twice in trade data: upstream and downstream. Here we combine both upstream and downstream specialization patterns in a model that we optimize to identify input-output relationships. We apply our method to data summarizing the exports and imports of 300+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) to create a product-level dataset of value chain relationships for 1,200+ products. This provides an approximate method to map value chain data at the product level that should be of interest to people working in logistics, trade, and sustainable development. Our method, however, is not without its limitations. While it is designed to operate at the product level, it is not perfectly accurate, meaning that it provides some false-positive value chain relationships. Also, it does not provide a full input-output network, but a set of the most likely value chain links for each product. Despite these limitations, our results show promise by identifying 100s of value chain links at the product level. This validates the possibility of using international trade data at the regional level to identify value chain relationships. |
Marta Sales-Pardo (Rovira i Virgili University, Spain): Using inference to learn from data: Is data always good enough?
0. Mapping Artificial Intelligence in Education Research: a Network-based Keyword Analysis ABSTRACT. Artificial Intelligence in Education (AIED) is a highly dynamic and interdisciplinary research field, so formulating a general understanding of the research scope and knowledge development of the field is both valuable and challenging. This study aims to provide a systematic, hierarchical assessment of the knowledge structure within AIED by analyzing keyword co-occurrence networks at the micro-, meso-, and macro-scales to uncover the trending topics, emerging knowledge clusters, and interdisciplinary synergies within the field. A static keyword co-occurrence network (KCN) was constructed from 1,830 articles related to AIED, and decomposed into five temporal KCNs representing two-year time windows to analyze the knowledge evolution of the field over the last decade. A multiscale approach was proposed to derive meaningful insights from the constructed KCNs at three distinct levels. This study provides a holistic picture of the research themes and evolution of the AIED field, and offers education researchers an important overview of the knowledge structure within the field. |
1. Modeling Interconnected Social and Technical Risks in Open Source Software Ecosystems PRESENTER: Johannes Wachs ABSTRACT. Open source software ecosystems consist of thousands of interdependent libraries, which users can combine to great effect. Recent work has pointed out two kinds of risks in these systems: that technical problems like bugs and vulnerabilities can spread through dependency links, and that relatively few developers are responsible for maintaining even the most widely used libraries. However, a more holistic diagnosis of systemic risk in software ecosystems should consider how these social and technical sources of risk interact and amplify one another. Motivated by the observation that the same individuals maintain several libraries within dependency networks (see Figure), we present a methodological framework to measure risk in software ecosystems as a function of both dependencies and developers. In our models, a library's chance of failure increases as its developers leave and as its upstream dependencies fail. Probabilities of library failure propagate through the dependency network. We apply our method to data from the Rust ecosystem, a growing ecosystem of tens of thousands of libraries and maintainers. Our method highlights several systemically important libraries that are overlooked when only considering technical dependencies. We compare potential interventions, seeking better ways to deploy limited developer resources with a view to improving overall ecosystem health and software supply chain resilience. |
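A hedged sketch of the propagation idea: on a toy dependency graph, a library's failure probability combines its own developer-abandonment risk with the failure of its upstream dependencies. Treating those failures as independent is a strong simplification of the paper's model (correlated failures through shared upstream libraries are exactly what it studies), and all names and probabilities here are invented:

```python
import networkx as nx

# toy dependency DAG: edge u -> v means v depends on u (u is upstream of v)
D = nx.DiGraph([("libA", "libB"), ("libA", "libC"),
                ("libB", "libD"), ("libC", "libD")])
p_dev = {"libA": 0.30, "libB": 0.05, "libC": 0.05, "libD": 0.02}  # abandonment risk

# a library fails if its own developers leave OR any upstream dependency fails
p_fail = {}
for lib in nx.topological_sort(D):
    p_ok = 1 - p_dev[lib]
    for dep in D.predecessors(lib):
        p_ok *= 1 - p_fail[dep]      # independence assumption (simplification)
    p_fail[lib] = 1 - p_ok

print(p_fail)  # libD inherits substantial risk from libA via both paths
```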
2. Resource and location sharing to improve geometric networks PRESENTER: Lotte Weedage ABSTRACT. The demand for mobile services is growing enormously, in particular due to the introduction of technologies for 5G and beyond. Infrastructure sharing is a tool in wireless communications to enhance internet coverage and capacity by increasing redundancy and over-provisioning resources. In this scenario, resources of different operators are shared and their cell towers can be collocated. In earlier work, we investigated the benefit of infrastructure sharing based on simulations, and the results showed that infrastructure sharing benefits the network in terms of throughput and coverage. However, the underlying mathematical theory is underdeveloped. In this research, we show the underlying mathematics of national roaming in terms of collocation of base stations and why this performs better than single-operator use. We consider a model for resource and location sharing where the locations of the nodes are not yet fixed. We have N operators, and every provider k ∈ {1, …, N} has user nodes ik ∈ Uk and base station nodes jk ∈ Bk, which are independently distributed according to Poisson point processes with intensities λU and λBS, respectively. Then, each operator k > 1 places a fraction p ∈ [0, 1] of its BSs on existing nodes of operator 1. When two nodes share a location, we assume the resources available at that node are doubled. For the links between users and BSs we use the disk model, where every user connects to all BSs within distance Rmax (Figure 1). We show analytically that for an increasing number of operators, the expected link strength increases when the collocation factor p is small. For a large collocation factor p, sharing is not always beneficial. We derive an optimal radius Rmax depending on the number of users and the available resources in the network, and we show how failures and noise in the distance affect the network. |
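Under the Poisson point process assumption, the number of base stations within Rmax of a user is Poisson-distributed with mean λBS·π·Rmax²; a quick numerical check (the intensity and radius values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
lam_bs = 2e-6          # base stations per m^2 (assumed value)
r_max = 500.0          # connection radius in metres (assumed value)

# closed form: expected number of BSs within r_max of a user
mean_links = lam_bs * np.pi * r_max**2
print("expected links per user:", round(mean_links, 3))

# Monte Carlo check on a 10 km x 10 km window
side = 10_000.0
n_bs = rng.poisson(lam_bs * side**2)
bs = rng.uniform(0, side, size=(n_bs, 2))
user = np.array([side / 2, side / 2])
in_range = int(np.sum(np.linalg.norm(bs - user, axis=1) <= r_max))
print("one realization:", in_range)
```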
3. Higher-Arity Algebra for Higher-Order Networks PRESENTER: Carlos Zapata Carratala ABSTRACT. In recent years Network Science has adopted extended methodological paradigms such as hypergraphs and simplicial complexes to capture higher-order interactions in networked systems. While such approaches capture hitherto neglected aspects of higher-order networks, the underlying mathematical models are still largely based on binary adjacency. This has implications in terms of data representation, coarse-graining constraints and computational efficiency when attempting to capture higher-order interactions that cannot be broken down into binary links, as exemplified by the Borromean link of three interlocking rings. In this contribution we propose a mathematical framework via hypergraph rewriting and hypermatrix algebras that allows us to formalize such forms of higher-order connectivity in a parsimonious way. Our key observation is that higher-order adjacency leads naturally to higher-arity algebras, whose general theory is poorly understood and is the subject of ongoing efforts within the mathematical community. The use of general algebras of adjacency hypermatrices associated with a hypergraph, particularly non-associative ones, induces significant modifications to network-theoretic notions such as centrality measures, path distances, spectra and compositionality. Our findings on higher-arity hypermatrix algebras and their connection to hypergraph topology thus open up the possibility of novel extensions and generalizations of standard techniques in higher-order systems science. |
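A minimal sketch of an adjacency hypermatrix for a 3-uniform hypergraph, with one natural degree-like contraction (the construction and notation are ours, for illustration; the talk's algebraic framework is far more general):

```python
import numpy as np
from itertools import permutations

# adjacency hypermatrix of a 3-uniform hypergraph on 4 nodes:
# T[i, j, k] = 1 iff {i, j, k} is a hyperedge (stored symmetrically)
n = 4
T = np.zeros((n, n, n))
for edge in [(0, 1, 2), (1, 2, 3)]:
    for p in permutations(edge):
        T[p] = 1

# one natural generalization of the degree vector A @ 1 of ordinary graphs:
deg = np.einsum("ijk,j,k->i", T, np.ones(n), np.ones(n))
print(deg)  # counts ordered pairs completing a hyperedge with each node
```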
4. To be existent is to be stable: automatic determination of the number of communities ABSTRACT. We have recently proposed community detection by modular decomposition of Markov chain (MDMC), which includes a hyperparameter that controls the resolution of detection of multi-scale communities [Okamoto, H., Qiu, X. Sci Rep 12, 20211 (2022)]. MDMC thus automatically determines the number of communities for a scale defined by the value of this parameter. The present study seeks to establish a method to assess whether a layer of communities obtained for a given scale is physically existent or false. |
5. The Fitness-Corrected Block-Model with Applications to Online Social Network Analysis PRESENTER: Stefano Guarino ABSTRACT. We recently introduced the Fitness-Corrected Block Model (FCBM), an adjustable-density variation of the well-known Degree-Corrected Block Model (DCBM), and we contextually released as open-source software a parallel version of the solver for the maximum-entropy DCBM and FCBM. In this talk, we will describe the FCBM, show that it can be nicely approximated by a simpler model when the network is sparse, and report on its use to define data-driven social networks of geo-referenced and age-stratified individuals. Further, we will present the generalization of the FCBM to directed networks, and propose a FCBM-based approach to generate synthetic networks that mimic followerships and/or interactions in online social networks. We will show how the model can be used to study diffusion processes on online social networks, through both analytical estimates and simulations, thus representing a powerful instrument to study information diffusion and propaganda. |
6. Entropy of labeled versus unlabeled networks PRESENTER: Harrison Hartle ABSTRACT. The structure of a network is an unlabeled graph, yet graphs in most models of complex networks are labeled by meaningless random integers. Is the associated labeling noise always negligible, or can it overpower the network-structural signal? To address this question, we introduce and consider the sparse unlabeled versions of popular network models, defining them as maximum-entropy ensembles of graphs with constraints analogous to their labeled formulations, but as probability distributions over the set of unlabeled graphs rather than labeled graphs (see, e.g., the labeled and unlabeled versions of the Erdős-Rényi model, shown in the first and third row of the accompanying figure, respectively). We distinguish these unlabeled network models from the ensembles produced by sampling graphs from labeled network models and then removing their labels (see the delabeled version of the Erdős-Rényi model in the second row of the accompanying figure). We compare the entropy of the labeled and unlabeled versions of the different models, as a way of examining the amount of noise injected by node labeling. We show that labeled and unlabeled versions of Erdős-Rényi graphs are entropically equivalent, but that their degree distributions are very different. The labeled and unlabeled versions of the configuration model may have different prefactors in their leading entropy terms, although this remains conjectural. Our main results are upper and lower bounds for the entropy of labeled and unlabeled one-dimensional random geometric graphs; the unlabeled version has entropy scaling no faster than linearly with n, which is negligible in comparison to the entropy of the labeled version, which scales as n*log(n). These results imply that in sparse networks the entropy of meaningless labeling may dominate the entropy of the network structure, suggesting a need for a thorough reexamination of the statistical foundations of network modeling -- and an opportunity for the development of a new science of unlabeled networks. For further information, see doi:10.1103/PhysRevE.106.054308. |
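The scale of the labeling noise is easy to see numerically: each unlabeled graph on n nodes has at most n! labeled versions, so removing labels can reduce the entropy by at most log n! ≈ n log n nats. A back-of-the-envelope check (our illustration, not the paper's bounds):

```python
from math import comb, log, factorial

n, m = 40, 60  # a sparse regime: m = O(n) edges

S_labeled = log(comb(comb(n, 2), m))            # entropy of uniform G(n, m), nats
S_unlabeled_lb = S_labeled - log(factorial(n))  # each unlabeled graph has <= n! labelings

print(f"labeled entropy         ~ {S_labeled:7.1f} nats")
print(f"unlabeled lower bound   ~ {S_unlabeled_lb:7.1f} nats")
print(f"log n! (labeling noise) ~ {log(factorial(n)):7.1f} nats")
```

When the structural entropy grows only linearly with n, as for the unlabeled random geometric graphs in the abstract, the log n! labeling term dominates.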
7. The voices of rape: Cognitive networks reconstruct emotional perspectives of rape survivors' passive and active narratives on Reddit PRESENTER: Katherine Abramski ABSTRACT. Survivors of sexual assault often feel guilty, ashamed, and even somewhat responsible for what “happened to them”, despite not being at fault. The advent of social media has provided unprecedented access to narratives of victims of violence, providing textual data that associates ideas with emotions. In this way, social media posts act as online diaries where individuals can report their traumatic experiences, but also as communication channels, where online users can engage with other people's narratives. This study uses nearly 18,000 posts from nearly 3,000 users on Reddit’s r/sexualassault board to investigate how emotional perspectives of rape survivors are structured in terms of connections between concepts within the sentences of their posts. We apply cognitive network science, a research area that constructs quantitative models of cognitive and emotional knowledge, and in particular, we adopt the recent framework of textual forma mentis networks (Stella 2020) in which nodes represent concepts in sentences while links indicate either syntactic relationships (e.g. “I was raped” yields the syntactic link “I” - “rape”) or semantic overlap (e.g. linking “violence” and “aggression” as synonyms). Following previous investigations within clinical psychology (Henley et al. 1995), we investigate the differences in emotional structure of two different networks constructed from rape narratives written in the passive voice and active voice, respectively. We find that concepts of guilt, shame, blame, fault, responsibility, and victim are all more central to narratives written in the passive voice, while concepts related to other people (e.g. family, friends, and partners) are more central to narratives written in the active voice. These results suggest that survivors who use the passive voice may view themselves more as protagonists in their stories, while survivors who use the active voice may have a stronger tendency to frame their assailants as antagonists (see Figure 1). Our findings indicate how cognitive networks can provide quantitative support to pre-existing conjectures in clinical psychology, while giving voice to rape victims and enabling a better understanding of the mechanisms and repercussions of rape across thousands of different experiences. |
8. Interpretable Graph Embeddings based on Polar Opposites PRESENTER: Franziska Heeg ABSTRACT. Graph learning has become an important tool to analyse data on complex systems that can be represented as networks. It is used for tasks like link prediction, community detection, or node classification. Those tasks often require learning graph embeddings that can be used, e.g., to define feature vectors of nodes. While popular embeddings like Laplacian eigenmaps or node2vec are useful for machine learning tasks, they are often difficult for humans to interpret. This issue was addressed for pre-trained word embeddings by the POLAR framework, which uses semantic polar opposites to define polar dimensions for the embedding and to project word vectors to a new polar subspace. Apart from facilitating interpretability, the resulting polar embedding has been shown to be competitive with the original embedding in terms of performance in various machine learning tasks. Adopting the idea of polar opposites, we propose to apply the POLAR framework to obtain interpretable vector space embeddings of graphs. Our approach allows us to choose pairs of nodes that should be considered "opposites" based on the underlying network topology, where each pair of opposite nodes defines one dimension of the embedding. Possible heuristics to choose such node pairs can, e.g., be based on node distances or community memberships. We tested our framework on (i) a synthetic grid network and (ii) a character co-occurrence network of Tolkien's Legendarium consisting of The Lord of the Rings, The Hobbit and The Silmarillion. In both cases we chose polar opposites based on (i) topological distance between node pairs and (ii) semantic differences between characters in the underlying novels, respectively. The results confirm that the application of POLAR to graph embeddings yields a vector space embedding with interpretable dimensions. |
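Our illustrative reading of the projection step, on a synthetic grid like case (i): build any base embedding (Laplacian eigenmaps here, for self-containment), form polar directions from topologically opposite corner nodes, and express every node in that polar basis by least squares. The node choices and dimensions are assumptions for the sketch, not the authors' exact setup:

```python
import numpy as np
import networkx as nx

G = nx.grid_2d_graph(6, 6)                       # synthetic grid network
nodes = list(G)
# base embedding: Laplacian eigenmaps (skip the trivial first eigenvector)
L = nx.normalized_laplacian_matrix(G).toarray()
vals, vecs = np.linalg.eigh(L)
X = vecs[:, 1:9]                                 # 8-dimensional embedding

idx = {v: i for i, v in enumerate(nodes)}
# polar opposites chosen by topology: two pairs of far-apart corner nodes
opposites = [((0, 0), (5, 5)), ((0, 5), (5, 0))]
P = np.stack([X[idx[a]] - X[idx[b]] for a, b in opposites])  # polar directions

# project every node onto the polar subspace via least squares
coords, *_ = np.linalg.lstsq(P.T, X.T, rcond=None)
for v in [(0, 0), (5, 5), (2, 3)]:
    print(v, np.round(coords[:, idx[v]], 2))
```

Each coordinate of a node now answers an interpretable question: where does this node sit between one chosen "pole" and its opposite?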
9. A biased contrastive learning debiases graph neural networks PRESENTER: Sadamori Kojaku ABSTRACT. Societal biases in networks make their way into AI systems. For example, algorithms trained on social networks may make biased recommendations based on gender and ethnic homophily, which can bias consequential decisions such as promotions and hiring. Previous debiasing methods include the manipulation of input data, manipulation of output embeddings, and constrained learning such as adversarial learning. However, manipulating input data or output embeddings may degrade the quality of the embedding and introduce new biases, while adversarial learning is known to be highly unstable and sensitive to hyperparameters. Here, we propose a simple framework that utilizes an organic capacity of neural networks for debiasing. Our approach is based on \textit{contrastive learning}, which trains a neural network to discriminate between actual data (e.g., a given network) and random data (e.g., a random network). We introduce a specific bias into the random data such that the protected attributes are not informative in the discrimination, preventing biases from entering the model. Our framework can be applied to any graph neural network without changing the model architecture or training objectives, and serves as a simple yet powerful alternative to existing debiasing approaches. We demonstrate the effectiveness of our training framework using a small network of co-purchased political books. A widely used graph embedding---DeepWalk---results in a clear segregation of political groups (Fig.~A). CrossWalk---a debiasing method that manipulates the input network---was not able to remove this segregation (Fig.~B). However, when we train DeepWalk using our proposed framework, the resulting embedding no longer shows a strong segregation of political ideologies (Fig.~C). We also tested our framework on a link prediction task, demonstrating that it predicts edges that are more invariant with respect to protected attributes, while still improving or maintaining link prediction accuracy (Figs.~D--K). |
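The key trick, making the protected attribute uninformative for the real-versus-random discrimination, can be sketched as attribute-conditioned negative sampling. A minimal sketch, assuming integer node ids and a per-node attribute array (the paper's exact sampling scheme may differ):

```python
import numpy as np

def biased_negative_targets(pos_targets, attr, rng=None):
    """For each positive target node, draw a random negative node from the
    SAME protected-attribute group, so that group membership carries no
    signal for the real-vs-random discrimination task.

    pos_targets: array of target node ids from observed (positive) pairs.
    attr: array mapping node id -> protected attribute value.
    """
    rng = rng or np.random.default_rng()
    groups = {g: np.flatnonzero(attr == g) for g in np.unique(attr)}
    return np.array([rng.choice(groups[attr[v]]) for v in pos_targets])

# These negatives would replace uniform negative sampling in, e.g., the
# skip-gram objective used by DeepWalk or node2vec.
```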
10. Idiosyncratic and systematic experienced isolation in urban networks ABSTRACT. Human mobility defines urban systems, which require the transfer of goods and ideas to grow and evolve their economies. Here we develop an understanding of the way urban spatial structure biases mobility in cities. A growing body of work identifies regularities in the way people move around cities, with interesting implications for the structure of the city, which maps onto our demands for different kinds of mobility. Many studies also point out that we sort ourselves into groups and that these divisions manifest in the places we visit. A current research frontier involves understanding how much of this “homophily” is systematic—the product of urban spatial structure—and how much is idiosyncratic, down to whims and preferences. To advance this frontier, we construct spatial interaction networks in London from mobile phone data and measure experienced isolation, the degree to which individuals of one group will interact with individuals of another during daily life. We then use a constrained permutation technique to observe how isolation changes under different scenarios, finding that existing mobility is much more homophilous than would be expected under a gravity model. Both observed and permuted networks are far from perfect integration, which would require redistributing amenities or people. We first use kernel methods to identify functionally coherent units of the city (corridors of contiguous activity), selecting local maxima in the kernel density function representing activity across London (Fig. 1A). We test various bandwidths (sigma), finding that the number of activity centres stabilises near 300 at bandwidths of 200 metres and below (Fig. 1B), suggesting that this approximates the true number of mass gathering places in London. This bandwidth also represents the point at which the flow of people between regions is best described by the gravity equation (R2=0.52). We then use machine learning to estimate the magnitude of flows between areas. The predicted value is the degree to which mobility is systematic—expected given amenity, distance and population—and the error is the part that is idiosyncratic. We resample the errors (idiosyncratic) and combine them with the predictions (systematic) (Fig. 1C, 1D). We measure experienced isolation as the entropy, per home area, of the demographic composition of visited areas, showing that experienced isolation is 2 standard deviations greater than the mean obtained from permuting the network. Our results show that residents in London could reallocate existing travel in a manner that holds distance constant while visiting more diverse areas. |
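The entropy measure just described admits a direct implementation. A minimal sketch, assuming a trip-count matrix between home and destination areas and a matrix of demographic shares per destination (both hypothetical inputs):

```python
import numpy as np

def experienced_isolation(trips, composition):
    """trips: (n_home, n_dest) visit counts; composition: (n_dest, n_groups)
    demographic shares of each destination area. Returns, per home area, the
    entropy of the demographic mix encountered on trips; lower entropy means
    a less diverse experienced environment, i.e. higher isolation."""
    visit_share = trips / trips.sum(axis=1, keepdims=True)
    exposure = visit_share @ composition      # mix of groups encountered
    exposure /= exposure.sum(axis=1, keepdims=True)
    p = np.clip(exposure, 1e-12, None)        # guard against log(0)
    return -(p * np.log(p)).sum(axis=1)
```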
11. Labor Space : A spatial representation of the labor market PRESENTER: Seongwoon Kim ABSTRACT. The labor market can be considered a hierarchical ecosystem of various economic units --- skills, jobs and industries. Firms headhunt, and workers gather profitable skills and knowledge to survive in a capitalist world. Hence, a true understanding of the labor market requires an ecological, holistic perspective that considers the interrelationships between industry, occupation, and skill. However, existing studies have focused either on a single unit (skill, occupation, or industry) or on the relationship between two units, which prevents us from understanding the complex evolution of the labor market system driven by multi-unit interactions. Here, we create a high-dimensional embedding space of heterogeneous units in the labor market, called Labor Space. We fine-tuned a popular pre-trained large language model, Bidirectional Encoder Representations from Transformers (BERT), using the descriptions from O*NET and the European Skills/Competences, Qualifications and Occupations (ESCO) taxonomy to generate the embedding space. Labor Space provides an ecological landscape that connects components of the labor market through semantic relations and reveals multiple consistent spatial patterns. As shown in Fig.1, the whole space is clustered into three sub-spaces: industry, occupation, and skill. Moreover, each unit is consistently aligned along an axis, possibly representing the spectrum from production to services sectors. Furthermore, it enables us to identify the entities strongly connected to other levels of units, for instance, industry-friendly or skill-friendly occupations in each industrial structure. Through further analyses, we expect Labor Space to allow us to trace and predict the evolutionary processes of industries, occupations, and skills all together. |
12. Quantifying Systemic Gender Inequality in Visual Art PRESENTER: Alexander Gates ABSTRACT. From disparities in the number of exhibiting artists to auction opportunities, there is overwhelming evidence of women’s under-representation in art, rooted in a complex and largely unknown interplay between gender, artistic performance and institutional practices. Here we explore the exhibitions and auction sales of 65,768 contemporary artists in 20,389 institutions, confirming systemic gender differences in the artist population, exhibitions and auctions. We introduce a statistical test for the gender preference of an institution based on its exhibition history, allowing us to categorize art institutions into one of three categories: man-preferred or woman-preferred institutions, whose proportion of exhibitions featuring women artists is significantly smaller or larger, respectively, than expected under the null hypothesis; and gender-neutral institutions, whose proportions of women and men artists are statistically indistinguishable from the null hypothesis. Under the gender-neutral criterion, which hypothesizes that institutions' exhibitions mirror the population gender composition, 27.81% of institutions are man-preferred and 18.60% are woman-preferred (Fig. 1a). In contrast, under the gender-balanced criterion, which assumes gender parity in representation, 67.04% of institutions are man-preferred and only 2.63% are woman-preferred (Fig. 1b). Under both criteria, the fraction of man-preferred institutions increases with institutional prestige, while the fraction of woman-preferred institutions decreases (Fig. 1a,b). To offer an overview of the distribution of institutional inequality across the whole institutional space, in Fig. 1d we show the art institution network, whose nodes are museums and galleries, connected to each other if they exhibit, in statistically significant numbers, the same artists. The emergence of clusters dense in blue or red nodes is evidence of inequality-based assortativity, indicating that institutions with comparable degrees of inequality tend to be connected to each other, often exhibiting the same artists. Finally, we build a logistic regression model to predict an artist's access to the auction market, finding that co-exhibition gender, capturing the gender inequality of the institutions at which an artist exhibits, has a higher influence on auction access than the artist's own gender. These results help unveil and quantify the institutional forces that contribute to the persistent gender imbalance in the art world. |
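One plausible form of such a test (an assumption; the abstract does not spell out the exact statistic) is a two-sided binomial test of an institution's share of woman-featuring exhibitions against a null proportion:

```python
from scipy.stats import binomtest

def institution_preference(n_woman_exhibits, n_exhibits, p_null, alpha=0.05):
    """Classify an institution from its exhibition history. p_null is the
    expected share of woman-featuring exhibitions under the chosen null:
    the population share (gender-neutral criterion) or 0.5 (gender-balanced)."""
    test = binomtest(n_woman_exhibits, n_exhibits, p_null)
    if test.pvalue >= alpha:
        return "gender-neutral"
    observed = n_woman_exhibits / n_exhibits
    return "woman-preferred" if observed > p_null else "man-preferred"
```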
13. Cross-national analysis of loss of adherence due to stop-and-go application of restrictions against COVID-19 PRESENTER: Albano Rikani ABSTRACT. In response to the spread of COVID-19, national authorities implemented non-pharmaceutical interventions (NPIs) limiting the mobility of the population, with the aim of reducing mixing and therefore viral circulation. Often following a stop-and-go strategy, the same restrictions were applied several times. Previous studies, focusing on a few countries, found evidence of a loss of adherence to restrictions over time and in relation to the level of stringency. Here, we extend the analysis to six countries, covering Europe, Africa, and South and North America, and we study whether the repeated application of the same NPIs produces an additional reduction in adherence to restrictions. We implement various specifications of a Linear Mixed Model (LMM), using daily mobility and NPI data spanning different spatial resolutions (from municipalities to the national scale). We find robust results suggesting that adherence loss increases with the number of occurrences of the tiered system's application in Italy, Chile and South Africa (Figure). In Italy, this effect increases with the level of stringency (Figure). Within the same occurrence, we find a strong loss of adherence over time for all countries but Chile. Finally, for many countries we observe a similar but smaller loss of adherence over time across the entire period of analysis. These findings represent a first quantitative measure of the behavioral response to the repeated application of restrictions and extend the geographical coverage of loss-of-adherence analyses. They suggest an additional waning effect in certain contexts due to the mere repetition of the measures. Next to the overall duration and stringency of the measures, pandemic preparedness plans should also consider the impact of the repetition of measures on the expected response of the population. |
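As an illustration of what one LMM specification might look like (the column names and the synthetic panel below are invented for the sketch; the talk's specifications are richer):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "region": rng.integers(0, 10, n),        # grouping factor
    "tier": rng.integers(1, 4, n),           # stringency level
    "occurrence": rng.integers(1, 5, n),     # how many times the tier recurred
    "days_in_tier": rng.integers(0, 60, n),  # time since (re)application
})
# Synthetic response: adherence decays with occurrence and time in tier.
df["mobility_reduction"] = (10 * df["tier"] - 0.8 * df["occurrence"]
                            - 0.05 * df["days_in_tier"] + rng.normal(0, 1, n))

# Fixed effects for tier, occurrence and time; random intercepts per region.
model = smf.mixedlm("mobility_reduction ~ tier + occurrence + days_in_tier",
                    df, groups=df["region"])
print(model.fit().summary())
```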
14. Discovering Semantic Relationships Using a Temporal Multiplex Network For a Context-Aware Filipino Wordnet PRESENTER: Briane Paul Samson ABSTRACT. The low-resource and highly morphological setting of Philippine languages is a challenge in developing word representations or a language model. Current linguistic resources, such as the FilWordNet [1], lack the rich semantic data that are crucial for most NLP tasks. Considering the continuously evolving nature of a language, we aim to create word representations of the Filipino language that are temporal- and context-aware, and to store them in the expanded Filipino WordNet. As a language continues to evolve and show complexity, it may be represented as a language network that captures the lexical, semantic, and syntactic features of a word [2]. We developed a data processing pipeline that maintains context-aware entries of Filipino and Philippine English words, their senses, and their semantic relationships, represented as semantic sets, or semsets. Using a word co-occurrence network allows for the application of community detection algorithms to determine groups of word senses, forming sense representations called semsets, with each set representing one or more semantic relationships among the senses. Three community detection algorithms were explored to create sense communities within each ego network: the Leiden, Louvain, and CW algorithms. Two quality functions were used for the Leiden algorithm: the Constant Potts Model (CPM) with a resolution parameter, and modularity. Communities that contained less than 10% of the total number of alters in the network were removed (Figure 1). Each resulting community then represents a word sense for the ego. With 4,557 seed words, we ran the Leiden community detection algorithm with modularity as its quality function and generated 11,548 word senses. New senses that only appeared recently, especially on social media, were also discovered, such as “awit” and “dilawan”. |
15. Misinformation spreading on networks: a quantitative analysis PRESENTER: Lucila Gisele Alvarez-Zuzek ABSTRACT. Why do people share inaccurate news headlines? What are the possible consequences of such behavior? Disinformation is a growing, though not new, phenomenon, and we as scientists must develop a clear understanding of why people fall for it and how to prevent it. What, then, are the cognitive factors driving belief or rejection? In this research, we study the spread of misinformation in a population, driven by individual decision-making mechanisms, using the Drift Diffusion Model (DDM) to discern between intuitive and rational reasoning. For the mathematical modeling, we consider an agent-based model in which individuals can be susceptible to misleading information and become activated (non-activated) if they decide to share (or not share) the news, with a probability driven by the DDM. In more detail, the DDM uses a one-dimensional random walk to represent the accumulation in our brain of noisy evidence in favor of two alternatives—share or not share. As free parameters we have: i) the threshold $a$, which quantifies response caution; ii) the drift $\nu$, which measures the ability to gather evidence; and iii) the bias $z$, which represents the a priori inclination towards one of the alternatives. To explore a real-life scenario, we analyze age-disaggregated data from 800 participants and use hierarchical Bayesian parameter estimation to obtain empirical values of the free parameters. We find that with age, people tend to have less bias and are more cautious in answering, but have less ability to gather evidence. As the structure of social relationships plays a fundamental role in these types of dynamics, we take a Twitter social network and explore how it shapes the spread of misinformation in this population. Finally, we construct a theoretical phase diagram with the critical values of the free parameters, above which misinformation spreads to the whole system and below which it fades out. We hope our findings are a step forward toward understanding this phenomenon. |
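For readers unfamiliar with the DDM, a single share/not-share decision can be simulated directly from the three parameters above. A minimal first-passage sketch (the noise scale sigma and the time step are assumed conventions):

```python
import numpy as np

def ddm_share_decision(a, nu, z, dt=1e-3, sigma=1.0, rng=None):
    """One drift-diffusion trial. Evidence starts at z * a (bias), drifts at
    rate nu with Gaussian noise of scale sigma, and accumulates until it hits
    the upper threshold a ('share') or 0 ('do not share').
    Returns (shared: bool, reaction_time: float)."""
    rng = rng or np.random.default_rng()
    x, t = z * a, 0.0
    while 0.0 < x < a:
        x += nu * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return x >= a, t
```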
16. Multiscale semiwalk-based measure of structural balance in signed networks PRESENTER: Szymon Talaga ABSTRACT. Many networked systems have both positive and negative ties. Examples include relations of friendship and animosity between people, or correlations between financial time series. Such signed networks are often analyzed from the perspective of Structural Balance Theory (SBT), which posits that configurations of relations may be balanced or unbalanced depending on the pattern of positive and negative weights. Namely, each cycle is considered balanced if the product of the signs of all its edges is positive, and unbalanced otherwise. This distinction is crucial because of the fundamental result of SBT: networks with no unbalanced cycles can be decomposed into two groups with negative edges going only between them (strong structure theorem). A relaxed version (weak structure theorem) posits that if a network has no cycles with exactly one negative edge, then it can be decomposed into k groups. The central technical problem in SBT is how to measure the overall degree of balance in a network, that is, how close it is to being perfectly clusterable. A natural solution is to measure this in terms of the fraction of balanced cycles. The problem is that it is not feasible to enumerate all cycles, especially in large graphs. Another problem is how to treat cycles of different lengths, as longer cycles are obviously much more numerous but intuitively should be less important. A clever solution, proposed by Estrada and Benzi, is to approximate cycles with closed walks and assign them weights decreasing with length. This formulation is analytically tractable, as the weighted sum can be expressed through the trace of the exponential of the adjacency matrix, which is always defined and easy to compute from the eigenvalues of the adjacency matrix. However, SBT for directed networks is formulated in terms of semicycles, which should be approximated not with ordinary walks but with semiwalks. Moreover, as Estrada and Benzi themselves noted, walk-based approaches often yield results suggesting much lower levels of structural balance than typically expected, particularly for large networks. Here we extend their approach to semiwalks and introduce a tunable inverse temperature parameter β controlling the decay rate of the cycle weights (it was briefly considered by Estrada and Benzi, but they did not study its properties in detail). We use it to show that the default walk-balance approach (β = 1) often puts too much weight on very long cycles, explaining why large networks were often found to be surprisingly unbalanced. We present methods for finding the optimal value of β and discuss physical interpretations of the walk-balance formalism, which in turn allows us to find a principled way to handle arbitrary non-uniform edge weights. We also generalize the walk-balance formalism to measure weak structural balance and define pairwise measures allowing for SBT-motivated clustering of signed networks. We illustrate our methods on a classical dataset representing sympathies and antipathies within a group of monks, and show how rising tensions eventually led to a decomposition of the monastery. We also reconstruct networks of co-sponsorship in the US Congress and measure the increasing polarization within the US legislature. |
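For the undirected case, the Estrada-Benzi walk-balance index with the β parameter reduces to a ratio of two matrix-exponential traces, which is easy to compute from eigenvalues. A sketch (symmetric, undirected case only; the semiwalk extension presented in the talk is more involved):

```python
import numpy as np

def walk_balance(A, beta=1.0):
    """Walk-based balance of a signed, symmetric adjacency matrix A:
    K(beta) = Tr exp(beta * A) / Tr exp(beta * |A|), computed via the
    eigenvalues (Tr exp(M) = sum of exp of M's eigenvalues for symmetric M).
    K = 1 for a perfectly balanced network; smaller beta damps long cycles."""
    ev_signed = np.linalg.eigvalsh(A)
    ev_unsigned = np.linalg.eigvalsh(np.abs(A))
    return np.exp(beta * ev_signed).sum() / np.exp(beta * ev_unsigned).sum()
```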
17. The role of collective trust in sustainability of Stack Exchange communities PRESENTER: Ana Vranić ABSTRACT. Knowledge-sharing communities are fundamental for the development of any knowledge-based society. The emergence and evolution of trust among members may be necessary for a community's sustainability and for the efficiency of the knowledge-sharing process. In this work, we explore the sustainability of question-and-answer Stack Exchange sites, analyzing four pairs of active and closed communities on the topics of astronomy, physics, economics and literature. We map Stack Exchange data onto networks and, using a Stochastic Block Model, infer their core-periphery structure. With a dynamical reputation model, we quantify the change in social trust in these communities. Our main findings imply that developing a trustworthy core is important for the sustainability of a community. |
18. Competition and spreading of low and high quality information in online social networks ABSTRACT. The advent of online social networks as major communication platforms for the exchange of information and opinions is having a significant impact on our lives by facilitating the sharing of ideas. Through networks such as Twitter and Facebook, users are exposed daily to a large number of transmissible pieces of information that compete to spread widely. Such information flows have increasingly consequential implications for politics and policy, making the questions of discrimination and diversity more important in today's online information networks than ever before. However, while one would expect the best ideas to prevail, empirical evidence suggests that high-quality information has no competitive advantage. We investigate this puzzling lack of discriminative power through an agent-based model that incorporates behavioral limitations in managing a heavy flow of information [1] and measures the relationship between the quality of an idea and its likelihood of becoming prevalent at the system level. We show that both information overload and limited attention contribute to a degradation of the system's discriminative power. According to the model, a good tradeoff between discriminative power and diversity of information is possible. However, calibration with empirical data characterizing information load and finite attention in real social media reveals a weak correlation between the quality and popularity of information. Under these realistic conditions, the model provides an interpretation for the high volume of viral misinformation we observe online [2]. While popularity and quality might not be strongly correlated, the presence of highly connected nodes in the network raises the question of whether their content is highly influential and facilitates the spreading mechanism. Even memes with low quality will be transmitted more often if they originate from an influential source, indicating a strong correlation between the success of a meme and the exposure it initially received in the early stages of spreading [3]. [1] D. F. M. Oliveira, K. S. Chan, E. D. Leonel, Physics Letters A, 382, 3376:3380, 2018. [2] X. Qiu, D. F. M. Oliveira, A. Sahami Shirazi, A. Flammini, F. Menczer, Nat. Hum. Behav., 1:1, 2017. [3] D. F. M. Oliveira, K. S. Chan, Physica A, 525, 657:663, 2019. |
19. Complex Networks of Human-Book Interactions PRESENTER: Jae Woo Lee ABSTRACT. People readily borrow popular books from libraries. We created a human-book interaction network using book-loan data from a university library. At universities, students have a strong tendency to borrow books related to their majors, forming a skewed lending network [1]. We generate two projection networks (a human network and a book network) from the weighted bipartite human-book interaction network. The interaction between people and books is asymmetric. A hierarchical backbone network was generated using conditional probabilities of degree connectivity between people and books. The genre appearing in the upper module changes dramatically over time, from fantasy novels to a variety of materials. During this period, webtoons (comics serviced on websites) appeared in Korea, and some were published as books. Reading patterns vary greatly depending on an individual's membership status. Complex networks can thus discriminate between changes in reading patterns. |
20. Extracting Belief Networks from Tweets PRESENTER: Rachith Aiyappa ABSTRACT. Beliefs are assumptions about states of the world (e.g., ‘vaccination is safe’), views on moral and political issues (e.g., ‘abortion is not a crime’), and attitudes (e.g., ‘Obama is trustworthy’). In other words, beliefs are the stances an individual holds towards entities (e.g., Obama), concepts (e.g., vaccination, abortion), and so on. Many of our communications, especially persuasions, are about communicating beliefs to each other. In this sense, social contagions are belief contagions, and they underpin many challenges our society faces, such as political polarisation and vaccine hesitancy. Recently, models have been proposed to understand the dynamic interplay between social network structure and the underlying cognitive processes of individuals (belief systems). These models are able to capture empirical phenomena, such as the emergence of both simple and complex contagion dynamics in a single social system, by tying an individual's response to social influence to the coherence of their existing belief systems. However, the empirical validation of these models, where it exists, is limited in scope to surveys. In this work, we propose a pipeline to extract belief systems from social media text (tweets) using Large Language Models (LLMs). Contrary to conventional machine learning pipelines, which require training data, we first show that LLMs deliver state-of-the-art performance in extracting beliefs in a zero-shot (no training data) setting. Next, we use this validated pipeline to extract the belief systems of a pro-vaccine community involved in the US Covid-19 discourse during two different time periods. We find that the belief systems during the two periods differ from one another but are, by themselves, coherent. This indicates the presence of an event or a series of events that may have caused this change, motivating future work in this direction. In sum, we demonstrate that LLMs in a zero-shot setting show impressive performance on stance detection and can be used to extract the belief networks of individuals or communities on social media for large-scale validation of recent social contagion models. |
21. Opinions spread faster when social networks change slower PRESENTER: Yara Khaluf ABSTRACT. In today’s fast-paced and interconnected world, opinion dynamics have a significant impact on human life. By understanding how individuals and groups form beliefs and make decisions, we can fundamentally contribute to social challenges such as social stability, conflict resolution, and urban growth. A critical part of opinion dynamics is the ability of a society to adapt to new information (i.e., opinions and choices). If individuals and groups are not able to adapt, they become entrenched in their existing beliefs and perspectives. This can lead to group polarization, failure to recognize new opportunities, and a lack of innovation and progress. The spread of new opinions occurs mostly through social influence, where individuals are influenced by the opinions of those around them. Several studies have confirmed that the more connected individuals are, the more easily new opinions spread throughout the population. This helps the formation of consensus and the adaptation to new opinions of superior quality to the old population belief. Nevertheless, most studies did not account for the impact of individual “digestion time”, that is, the amount of time individuals take to process and adopt new information or opinions. Recent studies have shown that when digestion time is long, the population is only able to adapt to new, better opinions when individuals are sparsely connected. In this study, we show that fast network rewiring can also have a negative impact on the ability of the population to adapt to new, better opinions. Through network rewiring, individuals change their connections with other individuals in the network, including forming new connections, strengthening existing connections, or removing connections. In our study, we consider different network topologies and investigate different rewiring mechanisms. While slow network rewiring helps the spreading of opinions, fast rewiring prevents the population from collectively processing new information and adapting to better opinions. Understanding opinion dynamics in populations composed of individuals that change their social connections quickly (or slowly) can play an important role in understanding why collective behavior differs between real-world and online social networks. |
22. Mapping the structure of an online political community ABSTRACT. The study of political participation and deliberation on online platforms has become a growing field of interdisciplinary research in recent years, as the web in general and social media in particular have become increasingly important spaces for political communication, organization, and activism. In addition to general-purpose social media platforms like Facebook and Twitter, platforms specifically designed for political participation are of particular interest. These platforms aim to create an online community where users can engage in political discussions, share information and news, mobilize around political causes, or participate in deliberative processes. Among this class of platforms, Decidim is one of the most broadly used by all kinds of communities, organizations, and institutions, since it allows citizens to participate in decision-making processes and shape public policies. It is open-source software specifically designed to maximize engagement and create more inclusive, transparent, and accountable governance. Decidim provides a variety of tools and features to support online consultations, voting, deliberation, and co-creation, including discussion forums, proposal creation, and real-time voting. We introduce a network science framework to map the political landscape by analysing the actions of participants on the platform. By leveraging network analysis techniques, we aim to gain insight into the structure of an online political community that emerged around a platform and a process. We focus on mapping the similarities between users, in particular in the commenting on and voting of proposals. For this, we use two-mode networks: one for comments and one for endorsements. Moreover, by projecting these networks onto the users, we can create a one-mode network of users where the edges represent relationships between users based on their interactions with political content. Here, we follow the method introduced to study audience overlap and use the phi correlation coefficient to weight our projected networks. A key point in evaluating the deliberative quality of a process is the level of disparity in participation: we propose to evaluate participation disparity by measuring the Gini coefficient of both the actors and the proposals in both the comment and the endorsement layers. We identify communities of users with similar interests to understand the ways in which they influence each other, and to track the activity of influence groups on the platform. The proposed method is applied to study the Conference on the Future of Europe, the first large-scale participatory process launched by the European Union and hosted on Decidim. The results provide insight into how a participatory process can work at a transnational scale. |
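The phi-weighted projection can be computed directly from the binary user-proposal incidence matrix. A minimal sketch (numpy only; any thresholding of non-significant weights that the method may apply is omitted):

```python
import numpy as np

def phi_projection(B):
    """Phi correlation between users from a binary user x proposal matrix B
    (B[u, p] = 1 if user u commented on / endorsed proposal p)."""
    B = np.asarray(B, dtype=float)
    n = B.shape[1]                      # number of proposals
    n11 = B @ B.T                       # both users engaged
    row = B.sum(axis=1)
    n10 = row[:, None] - n11            # only the first user engaged
    n01 = row[None, :] - n11            # only the second user engaged
    n00 = n - n11 - n10 - n01           # neither engaged
    denom = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, (n11 * n00 - n10 * n01) / denom, 0.0)
```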
23. Understanding Vulnerability to Misinformation: Personal Characteristics Associated with Misinformation Consumption and Spreading in Hungary on Online Social Networks PRESENTER: Julia Szamely ABSTRACT. In recent years, fake news has become a widespread phenomenon of increasing concern, affecting a wide range of domains, including politics, the economy, healthcare, and environmental action. Understanding its causes is of fundamental importance for tackling the problem effectively. While considerable efforts have been made in this regard, the scant availability of appropriate data has so far prevented a deep exploration of the personal characteristics associated with misinformation engagement. To contribute to filling this gap, in this study we set out to identify the demographic and socioeconomic characteristics associated with online misinformation consumption and spreading in Hungary today, and to explore how these characteristics affect the dynamics of misinformation spreading. To this end, we collected a paired dataset of digital footprint (Facebook) and survey data, allowing direct insight into the relationship between sociodemographic characteristics and online behaviour patterns. We characterize the misinformation engagement of individuals in the Facebook dataset by considering all posts or comments containing a link pointing to a misinformation website. Misinformation websites are collected from a variety of sources—various Hungarian misinformation website lists compiled by journalists and academics—to ensure the inclusion of most misinformation pieces in our dataset. The personal characteristics (obtained from both the survey and the digital footprint data) of individuals engaging more with misinformation are explored by means of statistical analysis. The survey provides demographic and socioeconomic characteristics directly, while we obtain further personal characteristics from online behaviour patterns in the digital footprint data. Among other measures, we obtain the size, breadth, and depth of users’ networks, and we identify communities in the Facebook dataset based on Facebook page engagement: specifically, we obtain projections of the bipartite network of page engagements and detect both user and page communities to be used as personal characteristics. Finally, these variables are used in regression analyses to gain insight into the relationships between personal characteristics and misinformation vulnerability. Preliminary calculations were carried out on a pilot dataset, with a sample collected from 150 college students in Hungary. A larger dataset, representative of the Hungarian Internet user population, is under collection. The preliminary results show that age, willingness to vote, and right-wing political views have significant positive effects on misinformation consumption; however, the effect of right-wing views on misinformation sharing is moderated by willingness to vote, i.e. the effect is smaller for respondents who are right-wing but also likely to vote. Building on these results, future work will include modelling misinformation spreading dynamics on a simulated network mimicking empirical online social networks, with the identified personal characteristics included as model parameters, to gain insight into their effects on misinformation spreading. Finally, the results obtained from modelling will be compared to the data to validate our findings. |
24. Measuring the Complexity of an Exchangeable Graph PRESENTER: Anda Skeja ABSTRACT. Model complexity remains a key feature of any proposed data generating mechanism. Measures of complexity can be extended to complex patterns such as signals in time and graphs. In this paper, we are concerned with the well-studied class of exchangeable graphs. Exchangeability for graphs implies a distributional invariance under node permutation and is a suitable default model that can be widely used for network data. For this well-studied class of graphs, we choose to quantify model complexity via the (Shannon) entropy, resulting in the graphon entropy. We estimate the entropy of the generating mechanism of a given graph, instead of choosing a specific graph descriptor suitable for only one graph generating mechanism. In this manner, we naturally consider the global properties of a graph and capture its important graph-theoretic and topological properties. Under an increasingly complex set of generating mechanisms, we propose a set of estimators of graphon entropy as measures of complexity for real-world graphs. We determine the large-sample properties of such estimators and discuss their usage for characterizing evolving real-world graphs. |
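For concreteness, one standard formalization of graphon entropy (stated here as an assumption about the intended quantity, since the abstract does not give the formula) is the average Bernoulli entropy of the graphon's edge probabilities:

```latex
% h is the binary (Bernoulli) entropy of an edge probability p:
%   h(p) = -p \log p - (1 - p) \log(1 - p)
\varepsilon(W) = \int_0^1 \int_0^1 h\bigl(W(x, y)\bigr)\, \mathrm{d}x\, \mathrm{d}y
```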
25. Analyzing Twitter Trends in the 2022 French Election PRESENTER: Boleslaw Szymanski ABSTRACT. Regressions trained to predict the future activity of social media users need rich features for accurate predictions. Many advanced models exist to generate such features; however, the time complexity of their computation is often prohibitive when scaling to enormous datasets. Some studies have shown that simple semantic network features can be rich enough for regressions without requiring complex computations. In terms of social media semantic networks, previous work used semantic networks generated from sentences as features for a time-series regression to capture the volatility of the stock market. In general, these approaches involve creating a semantic network for each person or object in the study. The alternative approach is to create a large-scale, singular semantic network that can be used to describe all users. Such an approach might be better for analyzing users, since it can take advantage of the nuanced relationships between different social media communities, which cannot be done with an approach that only generates an individual semantic network for each user. We create a weighted semantic network between Twitter hashtags from a corpus of 3.7 million tweets related to the 2022 French presidential election. Hashtags are nodes, and weighted edges connect pairs of hashtags, reflecting the number of Twitter users who interacted with both. The graph is then transformed into a maximum spanning tree, with the most popular hashtag as its root node, in order to construct a hierarchy amongst the hashtags. We then provide a feature vector for each user, where the columns represent each of the 1037 hashtags in the filtered dataset and the value of each column is the normalized count of the user's interactions with that hashtag and any children of that hashtag in the tree. In order to validate the usefulness of our semantic feature, we performed a regression experiment to predict the response rate of each user for six emotions, such as anger, enjoyment, and disgust. The emotion data were manually annotated by a team from DARPA as part of the INCAS program. We provide a baseline feature for comparison, which simply counts the number of times a user interacts with each of the 1037 hashtags. Both the baseline and our semantic feature perform well in the regression, with most emotions having $R^2$ above 0.5. The semantic feature outperforms the baseline feature on 5 out of 6 emotions at a p-value of 0.05. These results suggest that our semantic feature could be considered for use in further experiments predicting social media response on big datasets. |
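The tree construction and the per-user aggregation can be sketched in a few lines with networkx (the weighted hashtag graph and the root hashtag are assumed to have been built from the corpus already):

```python
import networkx as nx

def hashtag_hierarchy(G, root):
    """G: weighted hashtag graph (edge weight = number of users who
    interacted with both hashtags); root: the most popular hashtag.
    Returns the maximum spanning tree oriented away from the root."""
    tree = nx.maximum_spanning_tree(G, weight="weight")
    return nx.bfs_tree(tree, root)

# A user's value for hashtag h then aggregates normalized interaction
# counts over h and its subtree, e.g. (counts is a hypothetical dict):
# value[h] = sum(counts.get(d, 0) for d in nx.descendants(T, h) | {h})
```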
26. Measuring meaningful user influence on temporal networks PRESENTER: Pablo Soto ABSTRACT. On social media platforms, users establish communities to share their opinions, make an effort to influence others, and over time build a reputation for themselves. Users’ influence is a valuable asset: they can determine trends, political opinions, and misinformation. Prior work has provided multiple metrics to measure user influence; however, most of them focus mainly on the most recent interactions and do not consider the role of priors in users' influence. Some consider only user information (e.g., geolocation, profession, etc.) or the sentiment content of posts. Taking past interactions into consideration is crucial nowadays, since it is very common for users to post something that becomes viral but does not contribute to building influence. In our work, we study how meaningful influence can be measured by considering prior interactions and how they affect the present network. To demonstrate this, we evaluate node centrality on temporal networks. For this, we implemented a weighted PageRank [1] that measures the centrality value based on users’ prior interactions. Edge weights represent relationship strengths with others over the topic timeline and are measured by a geometrically weighted degree [2] with a decay factor to count the number of interactions between users. Figure 1 shows how the green node and the blue node have the same centrality at Tn, but their temporal centralities differ when their prior interactions (T1, …, Tn-1) are considered. Our experimental analyses using random graphs and real data sets show that our method can identify, from past interactions, influential actors that prior methods could not distinguish from merely viral or momentary actors. |
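A minimal sketch of the idea, using an exponential decay as a stand-in for the geometrically weighted degree of [2] (node ids, timestamps, and the decay rate are assumptions):

```python
import numpy as np
import networkx as nx

def temporal_influence(interactions, t_now, decay=0.1):
    """interactions: iterable of (source, target, timestamp). Each past
    interaction contributes an edge weight that decays with age, so older
    ties still count but fade; influence is weighted PageRank on the
    resulting directed graph."""
    G = nx.DiGraph()
    for u, v, t in interactions:
        w = np.exp(-decay * (t_now - t))
        prev = G.get_edge_data(u, v, {"weight": 0.0})["weight"]
        G.add_edge(u, v, weight=prev + w)
    return nx.pagerank(G, weight="weight")
```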
27. Biases in News Reporting and Consumption PRESENTER: Alessandro Galeazzi ABSTRACT. One of the most pressing challenges in the digital media landscape is understanding the impact of biases on the news sources that people rely on for information. Biased news can have significant and far-reaching consequences, influencing our perspectives and shaping the decisions we make, potentially endangering public and individual well-being. To combat mis- and dis-information, many have begun to evaluate the reliability of news sources, but these assessments often examine only the validity of the news (narrative bias) and neglect other types of biases, such as the deliberate selection of events to favor certain perspectives (selection bias). Using machine learning to classify content, we build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases. Our results show that the source classification provided by third-party organizations closely follows the narrative bias dimension, while it is much less accurate in identifying the selection bias. Moreover, we find a nonlinear relationship between biases and engagement, with higher engagement for extreme positions. Lastly, analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions. |
28. Sensitivity of Boolean attractors to small network changes and its implications for the inference of microbial interaction networks PRESENTER: Jyoti ABSTRACT. With the advent of next-generation sequencing techniques, the study of microbial dynamics has become ever more feasible. Numerous network inference techniques have been developed to determine the interactions among microbial species by using their abundance patterns as stable states, or attractors, of the system. Because the true network is often not known, we must rely on the comparison of attractors (abundance patterns vs. the attractors produced by the inferred network) to assess the quality of the inferred network. Current approaches that try to infer interaction networks from attractors often fail to provide networks that can reproduce the complete set of initial attractors. Even with a fairly well-predicted network, the overlap of attractors can still be very low, as small changes in the network can lead to large changes in the attractor set. We use this observation to design an evolutionary algorithm that uses the differences in attractor sets to drive a partially inferred network towards a network that can generate the initial attractors and, in most cases, allows inference of the original network. The algorithm was developed by studying the change of attractors under Boolean threshold dynamics with respect to structural changes in the network. Sensitivity experiments, under single and multiple edge rewiring (extended to edge addition/removal), were conducted on simulated random undirected networks with positive and negative interactions (mimicking mutualistic and competitive interactions in microbial systems). We show that these topological changes can be predicted from the attractors of the network before and after the modifications. The algorithm was able to predict a network that can produce the initial attractors with 95% and 79% success rates for graphs with 8 and 10 nodes, respectively. For real microbiome abundance data, we see an almost 50% increase in the concordance of attractor sets compared to that of ESABO. Our algorithm can be used in combination with any network inference method working under Boolean dynamics. It is not restricted to microbiome analysis and can be used with any category of abundance data. |
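The attractor computation at the heart of such comparisons is simple to sketch. A minimal version (synchronous updates; the tie-breaking rule for a zero input field is an assumed convention):

```python
import numpy as np

def attractor(J, state, max_steps=1000):
    """Iterate synchronous Boolean threshold dynamics from an initial state
    until a fixed point or cycle is reached. J holds +1/-1/0 interactions;
    a node switches on if its input field is positive, off if negative, and
    keeps its state at zero field. Returns the attractor as a state tuple."""
    seen, trajectory = {}, []
    x = tuple(state)
    for step in range(max_steps):
        if x in seen:
            return tuple(trajectory[seen[x]:])   # cycle (or fixed point)
        seen[x] = step
        trajectory.append(x)
        field = J @ np.array(x)
        x = tuple(np.where(field > 0, 1, np.where(field < 0, 0, x)))
    return ()
```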
29. Measures for causal network inference PRESENTER: Filipe Barroso ABSTRACT. We propose a method to infer the causal network of a complex system from data by evaluating a measure of association between events. Two events are said to be causally linked if the occurrence of one increases the probability of occurrence of the other. Causal networks thus encode heterogeneous causal or influence relationships between objects in a complex system, and identifying the underlying causal network allows us to study the propagation of the influence of events. Such networks find applications in several areas, from production lines to telecommunication networks and medicine. However, the causal relationships in large complex systems are often not known, and relying on expert input for manual identification is heavily time-consuming. Thus, automatic ways of inferring causation are of prime importance. We propose a normalized measure, similar to a certainty factor, defined as $W(n \mid m) = P(n \mid m) - P(n \mid \bar{m})$. The larger the absolute value of W, the more likely the causal relation. To validate this method, we computed weights from synthetic data. To generate the data, we created a random acyclic graph with the condition that it has a single sink node; we then generated conditional random variables compatible with the graph, and finally discrete values were generated according to those random variables. We can then compare the most significant connections obtained with those in the original graph. A connection is deemed significant if its weight exceeds a threshold, determined as the value immediately before the identified network ceases to be connected; this threshold can be found quickly through binary search. The network can be further refined by identifying triangles and computing weights in the presence of a third variable: if, for all combinations, the weights are close to zero, the connection can be deemed the product of a common cause and thus removed. We find excellent agreement between the inferred network and the original synthetic network, as illustrated in Figure 1. |
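The measure translates directly into code. A minimal sketch for two aligned binary event series (assuming both the event and its complement occur at least once in the data):

```python
import numpy as np

def causal_weight(m_events, n_events):
    """W(n|m) = P(n|m) - P(n|not m), estimated from aligned binary series."""
    m = np.asarray(m_events, dtype=bool)
    n = np.asarray(n_events, dtype=bool)
    return n[m].mean() - n[~m].mean()
```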
30. Noise-cleaning the precision matrix of fMRI time series PRESENTER: Francesca Santucci ABSTRACT. Rapidly changing brain activity patterns are constrained by an underlying Structural Connectivity (SC) network that evolves on much larger time scales. Studying the relationship between brain structure and emergent cognitive functions is an open issue and a critical challenge in neuroscience. Functional Connectivity (FC) is usually estimated through the correlation matrix C of functional time series of activity (usually BOLD fMRI signals at rest) corresponding to different brain areas. An alternative estimator of FC is the precision matrix J, the inverse of the correlation matrix. Indeed, unlike C, and within a Gaussian approximation, J accounts for direct causal relations only, and it may be understood as a model of the more direct anatomical connections between brain areas. In fact, estimating FC from J has been shown to present several advantages, such as being a more accurate estimator of SC, providing better prediction scores for some diseases, and better capturing intra-subject FC differences. However, the use of J is severely limited by the low accuracy of its statistical estimation, due to the short length (relative to the number of brain areas) of typical fMRI time series. Several techniques for the statistical inference of J have been proposed, yet a systematic benchmarking of these noise-cleaning techniques is still missing. In this work, we compare several standard noise-cleaning algorithms for the inference of J from short time series of the typical size and length of fMRI data. We add to the comparison recent methods grounded in Random Matrix Theory (RMT), such as the Optimal Rotationally Invariant Estimator (RIE), known in statistical physics and theoretical finance but not yet employed in neuroscience. The algorithms' performances are evaluated on time series that are both synthetic (sampled from a multivariate Gaussian model, with parameters determining the correlation between time series and the amount of undersampling) and empirical (fMRI brain activity of human subjects at rest). We propose a variant of RIE that, in the severe undersampling regime typical of fMRI series, outperforms all the other estimators, along with two further simple methods: a variant of PCA that systematically improves the inferred J with respect to standard PCA, and early-stopping Gradient Ascent algorithms, which are the best-performing methods for high undersampling and weak correlation. |
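As a point of reference for what noise-cleaning means operationally, here is one standard shrinkage baseline of the kind such benchmarks typically include (Ledoit-Wolf; this is not the RIE variant proposed in the talk):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# T timepoints x N brain areas; a synthetic stand-in for a BOLD series.
X = np.random.default_rng(0).normal(size=(150, 100))

# Shrinkage-regularized covariance, inverted to estimate J; naive inversion
# of the raw sample covariance is unstable when T is comparable to N.
J = LedoitWolf().fit(X).precision_
```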
31. Accounting for network dependencies when assessing covariate effects via Graphon-structured residuals PRESENTER: Nurzhan Sapargali ABSTRACT. The primary interest in analyzing network data often lies in assessing the effects of exogenous covariates on edge formation, rather than in understanding structural aspects of the observed network pattern. Yet most models for network data focus on the latter issue and, most importantly, impose specific structural assumptions. To formulate a generalized linear model framework that, on the one hand, allows for straightforwardly incorporating and interpreting covariate effects and, on the other hand, accounts for the complex dependency structure without imposing overly restrictive assumptions, we introduce graphon-structured residuals. More specifically, we develop an extension of the Graphon model in which covariate information can be used. We achieve this by extending the linear predictor with a Graphon and using a suitable link function. The Graphon model is, in this context, an appropriate modeling framework since, according to the Aldous-Hoover theorem, the family of Graphon models comprises the probability distribution of any infinite vertex-exchangeable network. This characteristic makes it a flexible modeling tool without restrictive structural assumptions. Moreover, our approach heeds recent calls that network dependence can lead to spurious associations if not accounted for, and that one should test a substantive theory with models of adequate predictive power. In three applications to binary, count-valued, and weighted networks, we showcase the need to account for dependencies and the link-prediction abilities of our model. Finally, we supply a user-friendly R package to facilitate the adoption of Graphon models in applied network data analysis. |
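In symbols, the model described above plausibly takes the following form (our reconstruction from the abstract, not the authors' notation):

```latex
% Edge Y_ij with covariates x_ij, latent uniform positions U_i, U_j,
% link function g, and a graphon W absorbing the network dependence:
g\bigl(\mathbb{E}[Y_{ij} \mid x_{ij}, U_i, U_j]\bigr)
  = x_{ij}^{\top}\beta + W(U_i, U_j),
\qquad U_i \overset{iid}{\sim} \mathrm{Uniform}(0, 1)
```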
32. Self-consistent inference framework for the gravity model: overcoming limitations of proxy-based analyses PRESENTER: Daekyung Lee ABSTRACT. The gravity model is a commonly employed framework for analyzing macroscopic flow patterns in geographically correlated systems. In this model, the flow between nodes $i$ and $j$ is represented as $f_{ij} = m_{i}^{out}m_{j}^{in}Q(r_{ij})$, where $m_{i}^{out}$ and $m_{i}^{in}$ denote the outward and inward \emph{mass} of the $i$th node, respectively, and $Q(r)$ is the deterrence function that reflects the distance dependency of the flow. Traditionally, the mass of each node has been considered unobtainable, which has led to the use of proxies such as population or GDP in previous research. Although proxy-based analyses are commonly used in various fields, their use poses a significant limitation on the results' validity. We propose a new framework that allows for the computation of both the mass and distance-dependent function iteratively, using flow data without the need for proxies. Our method is validated using synthetic flow data, and we apply it to the international trade network to observe each country's economic environment and regional characteristics of international trade. Our self-consistent framework can be extended to various fields, such as urban planning, traffic engineering, and infrastructure design. |
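Given a fixed deterrence kernel, the masses can be obtained by simple fixed-point iteration. A minimal sketch (the talk's full method also re-estimates $Q(r)$ from the data within the same loop, and masses are identified only up to an overall scale):

```python
import numpy as np

def fit_gravity_masses(F, Q, n_iter=100):
    """Fixed-point iteration for f_ij = m_i^out * m_j^in * Q(r_ij).
    F: observed flow matrix; Q: precomputed kernel matrix Q[i, j] = Q(r_ij).
    Each sweep matches the model's row and column sums to the data."""
    n = F.shape[0]
    m_out, m_in = np.ones(n), np.ones(n)
    for _ in range(n_iter):
        m_out = F.sum(axis=1) / (Q @ m_in)
        m_in = F.sum(axis=0) / (Q.T @ m_out)
    return m_out, m_in
```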
33. Economic Complexity algorithms in non-bipartite networks PRESENTER: Emanuele Calò ABSTRACT. Economic Complexity (EC) algorithms are tools that help assess countries’ industrial levels and growth potential. In this view, international trade among countries can be described as a bipartite network composed of countries on one side and thousands of their exported goods on the other. Countries and products are mutually linked, but no links are considered between countries or between products. The network structure can thus be expressed in terms of the country-product binary matrix M, whose entries Mcp are 1 if country c exports product p and 0 otherwise. EC algorithms assign one quantity to each type of node—one for countries and one for products—assessing, respectively, the quality of national industrial systems and the complexity of the commodities they export. Moreover, these two quantities shed light on the hidden capabilities of the system, that is, all the intangible assets driving development and wealth, such as infrastructure and educational systems. The two main EC algorithms are the Economic Complexity Index (ECI) and Economic Fitness Complexity (EFC). Both algorithms cease to work when the network is not bipartite, even if there is only one “weak” link between two nodes of the same class (e.g., country-country or product-product). Our task has been to generalize them to non-bipartite (unipartite) networks, specifically by recasting the formalism in terms of the network’s adjacency matrix. Indeed, in the bipartite case, the adjacency matrix is a block matrix whose diagonal blocks are made of zeros, while the off-diagonal blocks are M and its transpose. Once the algorithms are expressed in terms of the adjacency matrix, we can generalize them to the non-bipartite case simply by considering adjacency matrices whose diagonal blocks are non-zero. By doing so, we notice that each node acquires two attributes; e.g., in the EFC algorithm, every node has both a fitness and a complexity value. Furthermore, the formalism we introduce also allows us to deal with directed graphs, since it suffices to consider non-symmetric adjacency matrices. Finally, the generalized version we developed allows us to study systems represented by unipartite networks, for which EC algorithms could not previously be used. Although EC algorithms were born in an economic context, they have also been applied to study ecological bipartite networks, specifically mutualistic systems such as plant-pollinator, seed-disperser, and anemone-fish networks. Going in this direction, we applied the generalized version of EFC to study the complexity of the prey-predator ecosystem in Florida Bay (a directed unipartite network) and to disclose information on the hidden capabilities of the organisms in the system. In particular, considering the equations defining the algorithm, predators with high fitness eat a large variety of prey, from very simple to very complex. On the other hand, predators with low fitness eat only the organisms that most predators eat, and if a predator with low fitness eats a prey, then the complexity of that prey is low. Our interpretation in terms of the capabilities of the system is the following. Fitness is determined by the number of organisms eaten, each weighted by its complexity.
On the one hand, this means that if a predator has high fitness, then it has more capabilities than others (i.e., it is more clever, more versatile, or faster), as in the case of the Barracuda. On the other hand, organisms with high complexity weigh more in determining the fitness of a predator. Thus, we associate the complexity of an organism with the difficulty of being eaten. Indeed, the Green Turtle has low fitness because it is mainly herbivorous, but it has high complexity because it is a difficult animal to prey on. In the fitness-complexity plane, we can therefore identify four regions: top right, carnivores; top left, herbivores; bottom left, plants and animals that are only eaten; and bottom right, decomposers (see Fig. 1). Decomposers are not present in the Florida Bay dataset, though they appear in other systems we have analyzed. |
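For reference, the classic bipartite EFC iteration that the talk generalizes runs as follows. A minimal sketch on the country-product matrix M (assuming no empty rows or columns; in the generalized version M is replaced by blocks of the full adjacency matrix, so every node receives both scores):

```python
import numpy as np

def fitness_complexity(M, n_iter=200):
    """Standard Economic Fitness Complexity iteration on a binary
    country x product matrix M, with the usual per-sweep mean
    normalization to keep the scores on a fixed scale."""
    F = np.ones(M.shape[0])   # country fitness
    Q = np.ones(M.shape[1])   # product complexity
    for _ in range(n_iter):
        F_new = M @ Q
        Q_new = 1.0 / (M.T @ (1.0 / F))
        F = F_new / F_new.mean()
        Q = Q_new / Q_new.mean()
    return F, Q
```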
34. Quantifying structure and diversity in film festival networks PRESENTER: Andres Karjus ABSTRACT. The film festival circuit is an integral component of the global film industry, yet it has remained relatively understudied from a big-picture quantitative perspective. This is partly due to the lack of available data and the hitherto dominant qualitative focus of the related academic disciplines. In the existing literature, however, the film festival circuit is often discussed as a network, even if not quantified as such (De Valck, 2007; Elsaesser, 2005; Loist, 2016). Here, we employ data from the widely used industry platform Cinando of the Marché du Film - Festival de Cannes (2009–2021; 618 festivals, 32,700 films; Fig. 1A). We model festival events as a network connected by shared films, revealing a (bi)ennial rhythm of connections characteristic of the circuit (Fig. 1B). We propose to quantify festival profiles using the metadata of showcased films, but argue against using count distributions of discrete metadata categories (e.g., genre, production country), as they tend not to be equidistant. Instead, we embed them in continuous latent vector spaces. In some cases, these can be externally sourced: e.g., geographical coordinates can be used to constitute the space of production countries. In others, such as “genre”, the latent space can be inferred directly from co-occurrence statistics, analogously to word embeddings in NLP. We demonstrate how these “festival embeddings”, where festivals are the mean vectors of their films, provide insight into the non-random degree distribution of the festival network. We also utilize festival embeddings to measure diversity and changes in festival programming. Our operationalization of film festival data holds promise for researchers seeking to better understand industry and cultural dynamics, as well as for festival programmers, industry actors and policymakers making more informed decisions. While our methodological contribution focuses on the film industry, the approach described here could be applied to any ecosystem of related or connected events, festivals, or fairs. |
35. A new metric to measure similarity between nodes in bipartite networks: the Sapling Similarity PRESENTER: Giambattista Albora ABSTRACT. Many complex systems can be reduced to the interaction between two classes of different objects. An effective way to represent these systems is through bipartite networks, in which links can connect only nodes belonging to the two different classes. For instance, the bipartite network representing which country exports which product is the basis of the economic complexity framework. An important quantity in bipartite networks is the similarity between nodes of the same layer. It is central, for instance, in economic complexity when one wants to measure relatedness, a value that quantifies the affinity between a country and the export of a product. Indeed, fixing a target country, the traditional way to measure its relatedness with products is collaborative filtering, in which one looks either at which products are similar to the products the target country already exports, or at which products are exported by countries similar to the target country. The usual approaches to measuring the similarity between nodes in bipartite networks are based on co-occurrences, that is, on how many times two nodes are linked to the same node of the other layer. Well-known examples are the Jaccard Similarity, the Cosine Similarity, or, in the economic complexity framework, the Product Space. These methods allow only positive similarity values; however, this leads to a loss of information: knowing that a country c1 exports a product p can be a hint that another country c2 is not connected to p, in which case we would expect a negative value for the similarity between c1 and c2. This is the case of Zambia and Japan. We introduce a new metric that also allows negative values: the Sapling Similarity. We show that the Sapling Similarity outperforms other similarity metrics when it is used to measure the relatedness between countries and products, to recommend movies and products to people, and to forecast people's future visits to geographical locations in a city. Finally, in the economic complexity framework, we show that machine learning algorithms like XGBoost outperform collaborative filtering in the task of measuring relatedness. The Sapling Similarity can be used for feature selection, reducing the number of features and improving both the computational time needed to train the machine learning algorithm and the quality of the relatedness measure. |
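For reference, the co-occurrence-based baselines named in the abstract can be computed directly from the binary matrix M; this sketch shows Jaccard and cosine similarity between rows (the Sapling Similarity formula itself is not reproduced here).

```python
import numpy as np

def cooccurrence_similarities(M):
    # M: binary country-product (or user-item) matrix; rows are compared.
    co = M @ M.T                                   # co-occurrence counts
    k = M.sum(axis=1)                              # row degrees
    union = np.maximum(k[:, None] + k[None, :] - co, 1)
    jaccard = co / union
    cosine = co / np.sqrt(np.maximum(np.outer(k, k), 1))
    return jaccard, cosine
```

Both matrices are non-negative by construction, which is precisely the limitation that the Sapling Similarity addresses by allowing negative values.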
36. Propagation of disruptions in supply networks of essential goods: A population-centered perspective of systemic risk PRESENTER: Christian Diem ABSTRACT. The Covid-19 pandemic and the Russian invasion of Ukraine drastically emphasized the fragility of national and international supply networks (SNs) [1], leading to significant supply shortages of essential goods for people, such as food and medical equipment. Severe disruptions that propagate along complex SNs can expose the population of entire regions or even countries to these risks. Models for shock propagation and systemic risk assessment in supply and production networks usually focus on the effects on firms [2] or the overall economy [3], but neglect the systemic risks threatening the basic supply of the population. The main goal of this work is to develop an economically and mathematically founded shock propagation methodology that allows us to simulate the propagation of supply chain disruptions, starting from any given point in the network, downstream until reaching the population. Further, we develop a systemic risk index, SRIcrit, to quantify the criticality of each actor in the SN. The method is able to identify those actors that carry high systemic risk for the supply of the population—modelled at the level of local administrative districts—with essential goods, and to estimate how strongly a given disruption will affect the population locally; in particular, which parts of the population will be affected and how severely. The design of our simulation study consists of five main steps: 1. observing the undisrupted supply network; 2. assuming a (temporary) disruption of a specific node; 3. simulating how this node’s disruption propagates downstream; 4. calculating for each population node (district) the remaining local product supply; 5. calculating the fraction of the population below a critical supply threshold. We demonstrate this method on a large food SN of a European country including 22,938 business premises—e.g., the distribution centers and stores of the major supermarket chains—44,355 supply links, and 116 local administrative districts. Fig. 1 visualizes how the failure of one node affects the supply level of the 116 districts. The fraction of the population falling below the critical supply threshold of 66% (red dashed line) is the systemic risk index, SRIcrit, of the failing node. We rank all business premises with respect to their criticality for the districts’ population with the proposed systemic risk index, SRIcrit, and identify around 30 premises that—in case of their failure—are expected to cause critical supply shortages in sizable fractions of the population. The new methodology is immediately policy relevant as a fact-driven and generalizable crisis management tool. This work represents a starting point for quantitatively studying SN disruptions focused on the supply security of the population. |
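Steps 3-5 can be sketched under simplifying assumptions (supply levels propagate downstream as input-weighted averages of supplier levels; the talk's economically founded mechanism is more refined).

```python
import numpy as np

def downstream_shock(W, failed, n_steps=50):
    # W[i, j] = flow of goods from node i to node j; `failed` is the disrupted node.
    n = W.shape[0]
    s = np.ones(n)                     # supply level of every node, 1 = undisrupted
    s[failed] = 0.0
    in_w = W.sum(axis=0)
    for _ in range(n_steps):
        upstream = (W.T @ s) / np.maximum(in_w, 1e-12)   # weighted supplier level
        s = np.where(in_w > 0, upstream, s)              # sources keep their level
        s[failed] = 0.0
    return s

def sri_crit(district_supply, population, threshold=0.66):
    # Fraction of the population whose district falls below the critical supply level.
    below = district_supply < threshold
    return population[below].sum() / population.sum()
```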
37. Greening Austria’s transport system: A computational approach for estimating the required rail network capacities under new waste transport regulations PRESENTER: Johannes Stangl ABSTRACT. Austria's transport sector is the second-largest contributor to the country's annual greenhouse gas emissions [1]. A significant share of these emissions can be attributed to the transport of waste between municipalities, firms, recycling facilities, and landfills. To address this, a new regulation has been implemented in Austria requiring waste transports weighing over 10 tons and exceeding a certain distance threshold to be transported via rail [2]. The distance thresholds will become increasingly stringent, starting with 300 km in 2023, then 200 km in 2024, and 100 km in 2026. This study evaluates the impact of the new regulation on the Austrian rail network, utilizing the complete network of waste transactions collected by the Environment Agency Austria for the year 2020. We calculate the road distances for 500,000 waste transports and assign the nearest source and target train stations to those that fall under the new regulation. The accumulated waste weights for the different distance thresholds are then aggregated across 200 train stations in Austria. Figure 1 shows the estimated accumulated waste weights for the 20 most impacted train stations. The findings reveal that the capacities of certain train stations to handle waste transports will have to be increased significantly. In some cases, the accumulated waste is estimated to be an order of magnitude higher under the 100 km threshold than under the 300 km threshold. The study highlights the need for rail transport companies to invest in Austria's rail infrastructure to manage the growing volume of waste and to reduce transport emissions effectively. It also provides a compelling case for employing computational approaches and data analysis tools to support evidence-based policymaking. References: [1] M. Anderl et al., “Austria’s Annual Greenhouse Gas Inventory 1990 – 2021”, Environment Agency Austria, 2022. https://www.umweltbundesamt.at/fileadmin/site/publikationen/rep0841.pdf (Accessed: Feb. 14, 2023) [2] Austrian Government, “AWG-Novelle Kreislaufwirtschaftspaket“, 2021. https://www.ris.bka.gv.at/Dokumente/BgblAuth/BGBLA_2021_I_200/BGBLA_2021_I_200.pdfsig (Accessed: Feb 14, 2023) |
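The assignment and aggregation logic can be sketched as follows; transports and nearest_station are hypothetical stand-ins for the Environment Agency Austria records and the station lookup.

```python
def station_loads(transports, nearest_station, threshold_km, min_tons=10):
    # transports: iterable of (origin, destination, tons, road_km) tuples.
    # A transport falls under the regulation if it exceeds both the weight and
    # the distance threshold; its weight is then booked to the stations
    # nearest to its origin and destination.
    loads = {}
    for origin, dest, tons, road_km in transports:
        if tons > min_tons and road_km > threshold_km:
            for station in (nearest_station[origin], nearest_station[dest]):
                loads[station] = loads.get(station, 0.0) + tons
    return loads

# Comparing regulatory regimes, e.g. the 2023 (300 km) vs. 2026 (100 km) thresholds:
# loads_300 = station_loads(transports, nearest_station, 300)
# loads_100 = station_loads(transports, nearest_station, 100)
```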
38. Ultimatum Game: Evolving network with status and asymmetric role allocation PRESENTER: Hana Krakovská ABSTRACT. Researchers have been examining the emergence of fairness by using the Ultimatum Game~\cite{guth1982} as a framework for decades. In the game, there is a reward and two players with asymmetric roles. The first player, called the proposer, suggests how to split the reward, and the second player, called the responder, then decides whether to accept or reject the split. If the responder accepts, the reward is divided as suggested, while rejection results in neither player getting anything. A rational, payoff-maximizing responder should always accept any non-zero split, while a proposer should offer the smallest possible amount. In contrast to these theoretical assumptions, experimental research on the Ultimatum Game has revealed different results. Responders in industrialized societies often reject offers of 10-20\% of the reward, and proposers commonly offer 40-50\% of the reward \cite{camerer2011}. The results in small-scale societies and tribes are more diverse, with modal offers ranging from 25\% to 58\% \cite{henrich2005}, yet still showing behaviour diverging from the theoretical predictions. In the field of evolutionary game theory, various mechanisms that lead to offer thresholds approaching fair values have been proposed \cite{debove2016}. For example, role-alternating models suggest that the possibility of switching roles can lead to fair offers. Other models, such as those based on reputation, noise, empathy, spite, and spatial structure, have also been explored. Many game-theoretic models involve players with dual roles, playing each role with equal probability. However, such models may neglect the dynamics of societies and situations with hierarchical structures. The aspect of asymmetric role allocation has been partially investigated by Han et al.~\cite{han2018}. Their results showed that experiments where players played both roles with equal probability promoted the evolution of lower offers, while unequal (single-role) role allocation led to the emergence of fairer behaviour. In our work, we explore the effects of asymmetric role distribution on the evolution of players' strategies in the Ultimatum Game. To do so, we consider an evolving network where players play the Ultimatum Game with their neighbours. If dissatisfied with their neighbours' strategies, players can cancel social ties and form new ones. Each player is assigned a proposer and a responder strategy, as well as a status variable that determines how frequently they take each role. The existence of status has interesting implications. Consider a simple scenario in which all players offer the same amount $O$, and this offer is accepted by all players. In the case where players have equal role distributions, the expected payoff per game is always half of the reward, no matter the value of $O$. However, in the case of unequal role distribution, players with different status values have different expected payoffs (see the expected payoffs in Figure~\ref{fig1} for a uniformly distributed status in the population). In this work, we investigate various scenarios of how the status variable is coupled with the payoffs from the Ultimatum Game. We consider both a static status that does not change throughout the simulation and a status given by a running average of the player's payoffs.
We examine the co-evolution of strategies and interaction networks, and the impact of evolutionary update rules, payoff-status feedback, and initial strategy distributions on offer and rejection thresholds. We report on the effect of different network topologies, such as scale-free, Erd\H{o}s-R\'{e}nyi, and small-world networks, and analyse the value of the status variable for both high-degree and low-degree nodes. We also look at how these properties influence the status distribution, social balance, and the distribution of accumulated wealth between agents under different scenarios. |
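The simple scenario above can be written down directly; in this sketch, status is each player's probability of acting as proposer (an assumed parameterization).

```python
import numpy as np

def expected_payoff(status, offer, reward=1.0):
    # Every player offers `offer` and all offers are accepted. As proposer a
    # player keeps reward - offer; as responder it receives offer. With
    # status = 0.5 this is reward/2 regardless of the offer.
    return status * (reward - offer) + (1 - status) * offer

status = np.random.uniform(0, 1, 1000)       # uniformly distributed status
print(expected_payoff(status, offer=0.3).mean())
```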
39. Graph neural network mitigates nonlocal cascading failures PRESENTER: Hoyun Choi ABSTRACT. Often, cascading failures (CFs) in a complex network propagate nonlocally: after an initial local disturbance, successive failures occur far from the source. Since such avalanches can spread unexpectedly, their prediction and mitigation must be handled carefully. Here, we propose a strategy for mitigating such nonlocal CFs. First, we introduce Avalanche Centrality (AC), a novel centrality measure representing a node's impact on CFs. The nodes are then prioritized in order of their ACs and reinforced for mitigation under limited resources. Compared to other heuristic measures, AC is efficient in reducing avalanche size; however, due to the nonlocality, it requires heavy computation. To resolve this problem, we develop a graph neural network (GNN). We begin by training the GNN to predict the AC of each node using a large number of small networks. Once trained, the GNN can predict ACs in large networks and real-world topological power grids within a manageable computation time. Thus, mitigation of nonlocal CFs in large networks is achieved by employing AC and a GNN. The framework developed in this study can also be applied to other complex processes that are infeasible to simulate in large networks. |
40. Clustering in the d'Alembertian Eigenspace of Temporal Networks PRESENTER: Chester Tan ABSTRACT. Causal set theory is a proposal for quantum gravity, which models spacetime as a discrete partially ordered set of spacetime atoms \cite{glaser2014closed}. A common type of temporal network can be represented by a set of temporal edges, or \emph{events}, of the form $(i, j, t)$ representing interactions between nodes $i$ and $j$ at time $t$ \cite{holme2012temporal}. This study investigates modelling temporal networks and other directed acyclic graphs with a partial order structure \cite{badie2022directed,clough2017embedding} using the causal set d'Alembertian and its spectral geometry. The framework presents solutions to two notable challenges of temporal networks: infinite valence and non-local interactions between events. Since it is often unclear exactly how past events influence future events, the relations in the causal set represent possible causal connections between events; in the limit of infinite time, the number of possible causal connections an event has can be infinite. Events in a temporal network can also connect directly to events very far in their past, which presents challenges for modelling locality in the temporal network. The d'Alembertian is the Laplacian operator for continuous spacetime, and the discrete d'Alembertian is the discrete Laplacian operator for causal sets \cite{glaser2014closed}. The construction of the discrete d'Alembertian considers adjacencies by \emph{layers}, as shown in Figure \ref{fig:temporal-networks}, proposed as a solution to non-locality. The number of layers required for an embedding in a finite number of dimensions is finite and scales linearly with the number of embedding dimensions, which solves the problem of infinite degrees. The eigenvectors of the discrete d'Alembertian are studied here as a discrete momentum space operator, which transforms the discrete spacetime graph signals on nodes into their momentum space counterparts. Spectral clustering in this momentum eigenspace is investigated as a community detection method for temporal networks and directed acyclic graphs, and its performance is evaluated on both synthetic and real-world datasets. This study bridges causal set theory, temporal networks, and graph signal processing, and should be of interest to a broad range of communities including network science, physics, and graph learning. |
41. Predicting critical behaviors of complex dynamic systems via learning the governing mechanism PRESENTER: Xiangrong Wang ABSTRACT. Critical behaviors lie at the transition points between different states of a complex system and are therefore essential to its stable operation. However, critical behaviors typically result from a narrow range of parameters, which makes accurate, long-term reliable prediction a major challenge. Here, we propose a framework to learn the rules that govern the dynamical process of a system. The learned governing rules then refine and guide the representation learning of neural networks from a series of dynamic graphs. The combination enables a knowledge-based prediction of the critical behaviors of dynamical network systems. We evaluate our framework on the prediction of two typical critical behaviors of spreading dynamics on various synthetic and real-world networks. We show that governing rules can be learned effectively and can significantly improve prediction accuracy. Our framework demonstrates how learning the underlying mechanism can facilitate the representational power of deep neural networks, and aims to steer applications in predicting, for example, pattern formation and system instability, whose behavior is complex yet driven by learnable physical rules. |
42. Percolated tree photon-cluster states generation for efficient photonic quantum error correction PRESENTER: Ashlesha Patil ABSTRACT. Cluster states are entangled states that can be represented as graphs with vertices (qubits) and edges (entanglement). Tree cluster states are highly desirable for error correction against photon loss in all-photonic architectures. The current method to prepare these states results in huge resource overheads, as the operations used to generate them are probabilistic in nature. In this paper, we propose a percolation-based method to deterministically generate infinite tree states. This method maps to a mixed three-way percolation on the Bethe lattice, as there are three probabilistic processes influencing the state generation. We analytically calculate the 3D percolation region where the state generation becomes deterministic. This method greatly reduces the resource overhead by eliminating the need for multiplexing, unlike the current method. Smaller tree states used for applications such as all-photonic quantum repeaters can be constructed from the large percolated tree by strategically deleting some vertices (qubits) of the percolated Bethe lattice. |
43. Does metastability predict sensitivity to perturbation? PRESENTER: Eleanna Kritikaki ABSTRACT. Metastability describes the lability of a dynamical system, expressed in the tendency of that system to temporarily dwell near ‘remnants’ of stable attractors. In simulations of network dynamics that utilise the Kuramoto model of synchronisation, a commonly used measure of metastability quantifies it as the variance in global coherence over time. It has been suggested that this measure peaks at or near the critical point, which marks the transition from incoherence to synchrony. Metrics derived from information theory, such as coalition entropy and integrated information, have also been found to peak at the same point as metastability. In this investigation, we therefore pose the question of whether metastability could be a predictor of criticality in the system. One hallmark of critical behaviour is heightened sensitivity to perturbations. With this in mind, we assessed the extent to which the metastability metric correlates with sensitivity to perturbation. We systematically applied random, short-lived perturbations to the nodes of a human brain connectome obtained from probabilistic tractography. We quantified the sensitivity to a perturbation by the deviation of the phases of the perturbed oscillators from their unperturbed instances. We found that the sensitivity to perturbation, and its relation to the metastability metric, depend heavily on which node is being perturbed. Specifically, the post-perturbation deviation of high-degree nodes peaks significantly higher than that of low-degree nodes. In contrast, the post-perturbation deviation of low-degree nodes exhibits a flatter profile with more diffuse maxima. As such, the sensitivity of high-degree nodes to perturbation shows a closer relationship to the metastability index, which is likely due to their role in the formation of the first coherent clusters preceding the onset of global network synchronisation in a structured network. However, we note that neither the sensitivity of high-degree nodes nor that of low-degree nodes peaks at precisely the same location as the metastability index. Instead, the different peaks seem to define a region of maximal sensitivity, centred around the critical point and reflecting the gradual recruitment of synchronised clusters at the onset of global synchronisation. Supporting evidence for this hypothesis could come from using brain connectomes produced by deterministic tractography, as those are typically much sparser and their hubs better defined. |
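A minimal sketch of the metastability measure used here (variance of the global coherence over time), together with a plain Euler integration of the Kuramoto model; the coupling matrix and parameters below are placeholders, not the connectome used in the study.

```python
import numpy as np

def kuramoto(A, omega, K=1.0, dt=0.01, steps=5000):
    # Euler integration of d(theta_i)/dt = omega_i + K * sum_j A_ij sin(theta_j - theta_i)
    n = len(omega)
    theta = np.random.uniform(0, 2 * np.pi, n)
    out = np.empty((steps, n))
    for t in range(steps):
        coupling = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
        theta = theta + dt * (omega + K * coupling)
        out[t] = theta
    return out

def metastability(theta_t):
    # Variance over time of the global coherence R(t) = |mean_j exp(i theta_j(t))|
    R = np.abs(np.exp(1j * theta_t).mean(axis=1))
    return R.var()

A = np.random.rand(90, 90) < 0.1               # placeholder coupling matrix
A = np.triu(A, 1); A = (A + A.T).astype(float)
print(metastability(kuramoto(A, omega=np.random.normal(0.0, 1.0, 90))))
```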
44. Structural Control of Community-Structured Oscillator Networks PRESENTER: Enrico Caprioglio ABSTRACT. Network oscillations are thought to play a crucial role in the function of many natural and artificial real-world systems. This has motivated extensive research into the mechanisms underlying the emergence of synchronization patterns in complex networks and strategies for controlling them. An important structural characteristic of many biological networks is modularity. In the human brain, for example, synchronization between modules has been associated with various cognitive functions. In this work, we use community-structured networks of Kuramoto oscillators to ask whether it is possible to achieve specific (de)synchronization patterns between modules. Specifically, we consider networks with strong intra-community coupling and weak inter-community coupling so that phase-synchronization within each community is reached before the global phase-locked regime. First, we show that, at the onset of the global phase-locked regime, the communities of oscillators can be grouped into macro-oscillators. Then, we extend the procedure recently introduced by Menara et al. (2022) to drive the network of macro-oscillators to display multiple functional patterns. Specifically, whereas Menara et al. prescribe configurations through tuning of the oscillators’ intrinsic frequency and interconnection weights, here, we purely rely on structural changes, namely, altering the number of inter-community connections whilst maintaining the same connectivity between macro-oscillators. Our approach highlights the value of considering coarse-grained descriptions of oscillator networks. For example, our method could be used to infer the number of connections between communities showing a degree of phase-synchronization when the precise connectivity is unknown. Additionally, our study prompts further research to identify control strategies for community-structured networks that exhibit intricate and unstable partial synchronization patterns, such as metastability, by relaxing the global phase-locked regime constraint. |
45. Structural insulators and promoters in networks under generic problem-solving dynamics PRESENTER: Johannes Falk ABSTRACT. Dynamics on graphs is an important concept for analyzing distributed decision-making and task coordination. A simple but challenging model of distributed problem-solving is the graph or network coloring problem. In this problem, the collective goal is for each agent to select a color for their vertex that differs from the colors of all of their neighbors. The coloring process proceeds in a decentralized way and is completed as soon as all links connect different colors. For small-world networks, it has been shown that the number of shortcuts has a drastic effect on the convergence speed toward a fully solved system. The high variability of the runtimes for a given number of shortcuts, however, suggested that there are easy-to-solve and difficult-to-solve graphs. To obtain some stylized facts about distributed decisions, we investigate the impact of the length of links within small-world networks on the performance of the graph coloring problem. For this purpose, we first use a genetic algorithm to demonstrate that, for a given number of shortcuts, there are indeed networks that are easy to solve and networks that are difficult to solve. We then introduce a new heuristic whose behavior can be continuously tuned by a parameter: for large parameter values, the agents always select their color by reasoned consideration (conflict minimization); the smaller the parameter, the more often the color choice happens at random. We show that, depending on how reasoned the agents are, shortcuts can act as topological insulators or topological enhancers, i.e., either delaying or accelerating the regional reorganization efforts towards a trans-regionally compatible solution. The graph coloring problem is the simplest model for analyzing the conflicts within, and the solvability of, distributed (polycontextural) logical systems, and thus provides a platform to study questions in the theory of polycontexturality. Additionally, our findings have implications for the understanding of the emergence of technological standards (here represented by globally compatible solutions), as well as for the development of more robust scheduling schemes in manufacturing and resource allocation. |
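A sketch of the tunable heuristic, assuming asynchronous single-agent updates (the talk's exact update schedule may differ): with probability given by the rationality parameter, an agent minimizes conflicts; otherwise it picks a random color.

```python
import random
import networkx as nx

def decentralized_coloring(G, n_colors, rationality, max_rounds=100_000):
    color = {v: random.randrange(n_colors) for v in G}
    nodes = list(G)
    for rounds in range(max_rounds):
        v = random.choice(nodes)
        if random.random() < rationality:
            # Reasoned choice: color with the fewest conflicts among neighbors.
            conflicts = [sum(color[u] == c for u in G[v]) for c in range(n_colors)]
            color[v] = conflicts.index(min(conflicts))
        else:
            color[v] = random.randrange(n_colors)    # random choice
        if all(color[u] != color[w] for u, w in G.edges):
            return rounds                            # fully solved
    return max_rounds

G = nx.watts_strogatz_graph(100, 4, 0.1)             # small-world with shortcuts
print(decentralized_coloring(G, n_colors=4, rationality=0.9))
```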
46. Just in time vs. all in sync: An analysis of two types of synchronization in a minimal model of machine activity in industrial production based on excitable dynamics on graphs PRESENTER: Sanghita Bose ABSTRACT. The notion of synchronization in logistics is distinct from that encountered in the natural sciences and, in particular, in physics. In logistics, synchronization is often associated with a 'just in time' paradigm in supply and production systems, whereas synchronization in physics describes the simultaneous and repetitive behavior of a system. Perfect synchronization in logistics means that an arrival (of goods at a warehouse or parts at a machine) is a cue for a subsequent transportation or manufacturing event. Globally, this type of synchronization can be envisioned as a wave of activity running through a distribution or production network. Here we employ a minimal model of propagating excitations (representing machine activity) on a graph (representing a production network). The model has two main parameters: the machine setup time (given via the busy-to-idle machine recovery probability p) and conflicts with other production tasks or alternative products running on the same production line, represented by the rate of spontaneous activation f. The nodes (machines) in the network can be susceptible (idle), excited (active), or refractory (resetting) and are updated synchronously over time with defined update rules governing their states. The relevant dynamical information is then used to calculate the logistics synchronization index, which is a function of the correlations that exist between pairs of connected and unconnected machines in the network. Physics synchronization is calculated as a function of how often a large number of machines are simultaneously active with respect to an internal clock. Our investigation reveals that, though the two kinds of synchronization display a fluctuating relationship at the base level, they are highly negatively correlated at long machine setup times (i.e., low recovery probability), implying that activity organizes either into more synchronous activity or into more sequential activity, but not both. We further observe the formation of two distinct regions of parameter space dominated by the two types of synchronization: the region of low recovery probability coupled with a low rate of spontaneous excitations exhibits a mostly negative correlation between the two synchronizations, whereas a positive correlation appears for higher recovery probabilities coupled with higher rates of spontaneous excitation. Next, we study how these synchronizations depend on network architecture. We further use mean-field approaches to analytically predict the dynamics and to better understand the interplay of the two types of synchronization. |
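One synchronous update of the three-state excitable dynamics can be sketched as follows, under assumed update rules (excitation by any active neighbour or spontaneously at rate f; recovery with probability p); the talk's exact rules may differ.

```python
import numpy as np

def step(state, A, p, f):
    # state: array with 0 = susceptible/idle, 1 = excited/active, 2 = refractory/resetting.
    active_input = (A @ (state == 1)) > 0                 # any active neighbour?
    rnd = np.random.random(len(state))
    new = state.copy()
    new[(state == 0) & (active_input | (rnd < f))] = 1    # excitation
    new[state == 1] = 2                                   # active -> refractory
    new[(state == 2) & (rnd < p)] = 0                     # recovery with probability p
    return new
```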
47. Community can form the synchronization stability of a network ABSTRACT. A power grid is one of the world's largest networked systems and is necessary to sustain modern society. Because its connection structure can be understood as a network topology, network science has been used to investigate power-grid features such as cascading failures and synchronization stability. So far, the network approach has mainly focused on the microscopic characteristics of power-grid nodes and their stability. In this study, we expand the understanding of synchronization stability to the mesoscale of a power grid: the community. Because the dense connectivity within a community enhances node interaction, it naturally impacts the nodes' reaction to external perturbations that could disrupt their synchronization. We show that node attributes related to consistent community membership have an impact on successful synchronization recovery. We use the Chilean power grid as a case study to show how community inconsistency influences the synchronization stability of power-grid nodes. In addition, using the German power grid as a second case study, we find that the effective function of a community—a consumer or a producer group—determines the overall distribution of partial synchronization stability. Our research reveals the importance of communities for the theory of synchronous interaction and for power-grid dynamics in practice. |
48. Structure, resilience and evolution of the European Air Route Network from 2015 to 2018 PRESENTER: Pau Esteve ABSTRACT. In spite of the dynamic nature of air transport, air route networks, i.e. the backbone used to organise aircraft flows, are expected to be mostly static, with small changes occasionally introduced to improve the efficiency and resilience of the system. By leveraging a large data set of European flights comprising the years 2015 to 2018, we analyse the network's structure and evolution from the perspective of complex networks, with the aim of firstly describing it and secondly testing its supposedly static nature. Results depict a highly dynamic system, with major topological changes happening at the end of 2017. Peripheral links, which lack effective reroutings, and central regions are usually the most vulnerable; additionally, the overall resilience of the network is almost constant throughout time, in spite of an increase in traffic. Beyond specific operational insights, these results highlight the importance of taking into account the evolution of this network in the study of traffic flows. |
49. Subgraphs, motifs, and scaling in a dynamic airline network PRESENTER: Steve Lawford ABSTRACT. Understanding the small-scale topological structure of evolving airline networks is crucial for developing effective transportation policies and understanding firm behaviour. In this paper, we investigate the dynamic and spatial properties of small undirected subgraphs in Southwest Airlines' direct route service on the U.S. domestic market over a 15-year period. Using exact enumeration formulae, we identify over- and under-represented subgraphs, known as motifs and anti-motifs, with up to five nodes. Our analysis reveals significant topology transitions in Southwest's network, indicating the need for a dynamic modelling approach. We further provide evidence for time-varying power-law scaling between subgraph counts and the number of edges in the network. We compare these scaling properties with those of an Erd{\H{o}}s-R{\'e}nyi random graph and several deterministic graphs, and develop a toy regime-switching model to provide insights into changes in scaling behaviour. Our findings suggest that dynamic local topology plays a critical role in shaping economic decision-making at different levels. Our results extend the toolkit of subgraph-based methods and provide new insights into the complex interplay between transportation networks and strategic behaviour. This research has practical implications for policymakers, airlines, and other stakeholders seeking to understand and navigate the ever-evolving landscape of transportation networks. |
50. PageRank centrality with non-local random walk-based teleportation PRESENTER: David Bowater ABSTRACT. PageRank centrality was originally introduced to rank pages of the world wide web based on the movement of a random web surfer around the web network, but it has grown in popularity due to its simplicity and generality, which has led to its use in a wide range of applications. However, when the random web surfer idea is generalised to a random walk on a network (where we can think of people, goods, information, or anything else moving around a network), special attention must be given to the notion of teleportation, because it implies that a random walker will jump or ‘teleport’ directly from one node to any other without considering how far apart the nodes are. While this interpretation may be plausible in some scenarios, it is less so in others (e.g., transportation networks and mobility networks). Thus, our goal here is to introduce a general measure of PageRank centrality whereby the teleportation probabilities depend on the distance separating the nodes, which we accomplish by exploiting recent advances in non-local random walks (i.e., random walks with long-range jumps). In this talk, we will describe the formulation of the proposed centrality model and present experimental results on real-world networks to illustrate the differences between the two approaches. |
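One plausible implementation of distance-dependent teleportation is a power-law kernel over a distance matrix, sketched below; the talk's exact non-local formulation may differ, and gamma and D are assumed inputs.

```python
import numpy as np

def nonlocal_pagerank(A, D, alpha=0.85, gamma=1.0, n_iter=200):
    # A: adjacency matrix; D: pairwise distances (e.g., shortest-path or geographic).
    # Teleportation probability from i to j decays as D[i, j]**(-gamma);
    # gamma -> 0 recovers the classical uniform teleportation.
    n = A.shape[0]
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # random-walk matrix
    T = 1.0 / np.maximum(D, 1.0) ** gamma
    np.fill_diagonal(T, 0.0)
    T /= T.sum(axis=1, keepdims=True)                         # teleportation kernel
    x = np.ones(n) / n
    for _ in range(n_iter):
        x = alpha * (P.T @ x) + (1 - alpha) * (T.T @ x)
    return x / x.sum()
```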
51. Dynamical processes in heterogeneous mobile agent networks PRESENTER: Ricardo Tetti Camacho ABSTRACT. Some dynamical processes, such as the SIR model for epidemics and the Maki-Thompson model for rumors, exhibit patterns that depend on mobility and behavior. In the theoretical literature, these models are often studied under homogeneous mixing, where everyone has the same chance of contracting the disease. In this work, we use computational modelling tools to explore heterogeneous mixing, using a rough surface to give each person in the simulation a different probability of contracting the disease. Specifically, we use Brownian motion to generate the surface and examine how this pattern of mobility affects propagation. Our results suggest that the presence of a rough surface reduces the spread of the dynamics, because the roughness resembles isolation bubbles, in the case of epidemic processes, and reduces the contact network that is formed at each time step. |
52. Characterizing the role of human mobility behind COVID-19 spatial invasion in the US PRESENTER: Giulia Pullano ABSTRACT. Since late March 2020, after detecting COVID-19 cases in many counties of the US [1], US states, like most other countries worldwide, started to issue stay-at-home orders, mandating all residents to stay at home except to go to an essential job or shop for essential needs. Although the associated mobility reductions mitigated local epidemic activity, they did not contain national transmission [2]. The role of human mobility in driving the national spread of COVID-19, and the reasons why mobility restrictions were not sufficient to contain the invasion process, are still unclear. Here we address this question by i) analyzing the human mobility network over time from mobile phone traces; and ii) integrating the time-varying mobility network into a spatially explicit stochastic transmission model to reproduce the spatial invasion of the first wave of COVID-19 in the US [3]. Although the median degree of the mobility network is reduced in April 2020, during the lockdown, and in November 2020, during voluntary social distancing (Fig. 1a), the links that persist in the coupling network were largely unaffected by mobility restrictions (Fig. 1b), favoring the spatial invasion. Indeed, when simulating the invasion dynamics, the predictive power of the model is higher when it is informed by the coupling network than by a random network (Fig. 1c). Most interestingly, long-range links played a crucial role in the spatial invasion, suggesting that the epidemic dynamics in the US were shaped mainly by domestic air traffic importations. This characterization underlines the key aspects of human mobility that shaped COVID-19 spatial invasion and the importance of access to real-time mobility data for preparedness and outbreak response. |
53. Interplay between mobility, multi-seeding and lockdowns shapes COVID-19 local impact PRESENTER: Mattia Mazzoli ABSTRACT. Assessing the impact of mobility on epidemic spreading is of crucial importance for understanding the effect of policies like mass quarantines and selective re-openings. While many factors affect disease incidence at a local level, making it more or less homogeneous with respect to other areas, the importance of multi-seeding has often been overlooked. Multi-seeding occurs when several independent, non-clustered infected individuals arrive at a susceptible population. This can lead to independent outbreaks that spark from distinct areas of the local social network. Such a mechanism has the potential to boost incidence, making control efforts and contact tracing less effective. In a recent work [1], by means of a metapopulation model informed with daily mobile phone records from the first COVID-19 wave, and by testing different network topologies for the sub-populations' internal mixing, we show that the number of initial infections has a non-linear effect on the incidence peak and peak time, independently of the local social network topology assumed for the populations. When case importations are carried by mobility from an already infected area, this effect is further enhanced by the local demography and underlying mixing patterns: the impact of every seed is larger in smaller populations. Finally, both in the model simulations and in the analysis of epidemiological records, we show that a multi-seeding effect combined with mobility restrictions can explain the observed spatial heterogeneities in the first wave of COVID-19 incidence and mortality in five European countries. Our results allow us to identify what we have called the epidemic epicenter: an area that shapes the incidence and mortality peaks in the entire country (Fig. 1). The present work further clarifies the non-linear effects that mobility can have on the evolution of an epidemic and highlights their relevance for epidemic control. From a public health point of view, surveillance of case importations in a region is fundamental to anticipate the intensity of local outbreaks and minimize their consequences. |
54. From networks to flows: Using flow maps to understand mobility patterns in cattle trade PRESENTER: Sima Farokhnejad ABSTRACT. The study of mobility patterns is important in a variety of research areas [8, 6], and it has a direct impact on policymaking in urban environments, analyses of disease transmission, and epidemic control [5, 3]. There are many network-science approaches to modelling epidemics in various regions [4, 9]. While the power of network approaches to model disease spread is undeniable, they suffer from shortcomings. Very few approaches have used large cattle mobility datasets such as the ones available in Brazil or the USA [7], where the level of uncertainty about the trade is quite high [2]. For example, Brazil is the second-largest producer of beef in the world, with an annual production of 10.3 million metric tons [1]. Yet, Brazil is known to have a less-than-perfect tracking system, leading to uncertainty in the trade network [2]. Network-based approaches often ignore relevant spatial information: when cattle herds are transported between locations, the areas in between are affected. These areas may have high spatial centrality because ground transportation is used in most cattle movements, but this information is lost in networks, which only capture origin-destination pairs. Hence, we propose to convert the network of movements into a vector-field representation, as it complements the benefits of network analyses by capturing dynamic mechanisms that would be hard to capture using networks alone. It is a general approach that could be applied to any type of mobility dataset. Our approach is also computationally cheaper and hence appropriate for large-scale datasets. Our objective is to convert the origin-destination representation (a network) into a vector-field representation of the region being studied. The first step is to set divisions of the region being investigated—the granularity of the study—and project the network onto this space. We can then look at all the outgoing edges of each cell, take them as vectors with a distance and a value, and combine them into a resulting vector for each cell using a relevant method; this resulting vector represents the general flow direction of that cell. Note that only cells containing network nodes with outgoing edges have vectors, which means the vector field is incomplete; we use an interpolation method to complete the field. To examine the proposed methodology in a real scenario, we use the cattle mobility dataset from the state of Minas Gerais in Brazil. Vector fields are generated using the monthly trades for a spatial subdivision (micro-regions). A cosine similarity analysis was conducted for micro-regions during the year 2013. The cosine similarity values are shown in Fig. 1A-1 for all micro-regions. The similarity values are used to cluster the cities or regions as having similar dynamics within the year; the different colours in Fig. 1A-2 indicate different clusters based on cosine similarity. Looking at Fig. 1A-1, we can see that several regions have values close to 1 (dark blue). This happens because the vector direction of those micro-regions remains nearly unchanged over a year, indicating a predictable direction of cattle trading from those regions. The cosine similarity measure thus shows how a micro-region’s vector behaviour changes over time, ranging from steady to dynamic.
It also allows comparing the behaviour of one spatial area with others, which can lead to the division of larger spatial areas into predictable and unpredictable regions. The next step is to determine whether an area corresponds to a critical point of the vector field. Fig. 1B shows the resulting points of interest, in this case sinks and sources, for four separate months. One can clearly see that the sinks change across periods, which indicates that cattle trade in Minas Gerais is highly dynamic. When comparing Fig. 1A-2 with Fig. 1B-1, most of the areas of the largest cluster of micro-regions (those with bluish cyan colour in Fig. 1A-2) are not critical points of the vector field. Comparatively, the areas with a high density of sinks (red areas in Fig. 1B-1) and sources (dark blue areas in Fig. 1B-1) correspond to those belonging to the clusters whose behaviour is not steady. In other words, there are some risky parts of the system that act as sources (or sinks), and we cannot estimate where an epidemic would spread based on their direction of flow. Even though we cannot predict the exact direction in which diseases spread, we can restrict trade by knowing where the source points of flow are for each period of time. The next step in future work is to analyse the pattern of changes in sinks and sources over time. |
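The conversion from origin-destination edges to a per-cell vector field can be sketched as follows, using weighted vector summation as one possible combination rule (the interpolation to empty cells is omitted).

```python
import numpy as np

def cell_vectors(edges, cell_xy):
    # edges: iterable of (origin_cell, destination_cell, flow); cell_xy maps a
    # cell to its (x, y) coordinates. Each outgoing edge contributes a vector
    # from origin to destination, weighted by its flow.
    field = {}
    for origin, dest, flow in edges:
        v = (np.asarray(cell_xy[dest]) - np.asarray(cell_xy[origin])) * flow
        field[origin] = field.get(origin, np.zeros(2)) + v
    return field

def cosine_similarity(u, v):
    # Month-to-month steadiness of a cell's flow direction.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
```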
55. Changing Hands: Heterogeneity in network structure switches the dominant transmission mode of infectious disease PRESENTER: Pratyush Kumar Kollepara ABSTRACT. Several recent emerging diseases have exhibited both sexual and non-sexual transmission modes (Ebola, Zika and mpox). In the recent mpox outbreaks, transmission through sexual contacts appears to be the dominant mode. Motivated by this, we use an SIR-like model to argue that an initially dominant sexual transmission mode can be overtaken by casual transmission at later stages, even if the basic reproduction number of casual transmission is less than one. The disproportionate contribution of casual transmission to the epidemic size highlights the risk of intervention designs that are informed only by the early dynamics of the disease. |
56. Causal inference in sparse partially observed large-scale networks ABSTRACT. Causal inference traditionally assumes "no interference" between treated units, such that the treatment of one unit cannot affect the outcome of another. However, many real-world systems have interacting units, such as individuals connected in a large social network, which allow treatments to spill over to connected units and render this assumption untenable. In this setting, inference of direct and indirect "spillover" effects suffers from assuming either that the network is fully observed -- even though such data are very unlikely to be available in practice -- or that the effects are marginally additive -- which neglects the non-linearity of actual diffusion processes on networks. Our work provides analytical progress and an inference framework to address this open challenge of estimating causal effects in complex networks when connections and outcomes are partially observed, as in a sub-nationally aggregated social network. We assume that a randomised treatment of the network's nodes seeds a parameterised diffusion of the treatment over the network, grounded in a realistic spreading process. This process yields probabilities of nodes being "exposed" to the treatment, which we show can be determined analytically using a self-consistent equation when the network is sparse. We can conveniently describe both direct and indirect causal effects in terms of the diffusion parameters, allowing us to infer average causal effects for an individual in a privacy-preserving manner, at varying levels of treatment saturation, using a single experiment design and the consequently generated data on sub-nationally aggregated outcomes. |
57. Role of higher-order components in higher-order contagion dynamics on hypergraphs PRESENTER: Jung-Ho Kim ABSTRACT. The presence of a giant component is a necessary condition for the emergence of collective behavior in complex networked systems. Unlike networks, hypergraphs have an important native feature: components can be of higher order. Here, we introduce the m-th-order connectivity in hypergraphs as the connectivity between two hyperedges sharing m common nodes; the m-th-order component is the component connected only through m-th- or higher-order connectivities [1]. We call components with m ≥ 2 higher-order components (HOCs). Investigating seventeen real-world hypergraphs, we revealed that extensive largest HOCs are ubiquitous in the real world [1]. Nevertheless, the role of giant HOCs in collective dynamics on hypergraphs has yet to be elucidated. To this end, we examined how a giant HOC affects typical higher-order susceptible-infected-recovered (SIR) and susceptible-infected-susceptible (SIS) dynamics, in which the infection rate β_i is determined by the number of infected nodes i in a hyperedge, and infected nodes become recovered/susceptible at rate μ [Fig. 1(a)]. We define the number of infected nodes i in a hyperedge as the order of the infection, and the rescaled infection rate as λ_i = β_i/μ. In order to observe the role of the largest m-th-order component in a specific i-th-order infection dynamics, we allow only the i-th-order infection of interest to occur. As a result, in the real-world hypergraph where the largest second-order component (2OC) is extensive [Fig. 1(b, e)], both the first-order and the second-order infection have relatively small critical points, whether starting from a single infected hyperedge seed or from ρ_0 = 1 [Fig. 1(c, f)]. However, in the randomized hypergraph where the largest 2OC is very small, the second-order infection starting from a single infected hyperedge seed has a much larger critical point than when the extensive largest 2OC is present [Fig. 1(d, g)]. On the other hand, with an initial condition where the infection is already prevalent (ρ_0 = 1), the second-order infection has a relatively small critical point. This can be understood as follows. When the infection spreads from a single seed on a hypergraph with only a giant first-order component (1OC), two infection routes are required for the second-order infection to spread, which makes the conditions difficult [Fig. 1(h)]. However, if a giant 2OC exists, even one infection route is sufficient. On the other hand, if the infection is already prevalent, it can be maintained by the second-order infection alone, since two infection routes are easily secured even with only the giant 1OC. Extending this understanding to any infection order i: starting from a single seed, at least a giant i-th-order component is required for the i-th-order infection to spread alone, whereas, starting from ρ_0 = 1, the presence of a giant 1OC is sufficient to sustain the i-th-order infection alone. Finally, we propose a novel random higher-order-connected hypergraph model to systematically isolate the pure effect of HOCs, in which the effects of various structures (e.g., short loops, clustering, assortativity) present in real-world hypergraphs are removed [1].
We verified how the contagion dynamics change depending on the presence or absence of a giant 2OC using this model, and the results were qualitatively similar to those on the real-world hypergraphs. Figure 1: (a) A schematic illustration of the contagion dynamics. (b) The relative size of the m-th-order component and (c-d) the relative outbreak size of SIR dynamics on the Coauth-DBLP hypergraph. (e) The relative size of the m-th-order component and (f-g) the relative outbreak size of SIS dynamics on the Contact-high-school hypergraph. (h) Second-order infection dynamics on the 1OC and the 2OC. Reference: [1] J.-H. Kim and K.-I. Goh, Higher-order components in hypergraphs, arXiv:2208.05718 (2022). |
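The m-th-order connectivity of Ref. [1] can be computed by brute force for small hypergraphs, as in this sketch.

```python
from itertools import combinations
import networkx as nx

def mth_order_components(hyperedges, m):
    # Two hyperedges are m-th-order connected iff they share at least m nodes;
    # components of the resulting graph are the m-th-order components.
    H = nx.Graph()
    H.add_nodes_from(range(len(hyperedges)))
    sets = [set(e) for e in hyperedges]
    for i, j in combinations(range(len(sets)), 2):
        if len(sets[i] & sets[j]) >= m:
            H.add_edge(i, j)
    return list(nx.connected_components(H))

edges = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {5, 6, 7}]
print(mth_order_components(edges, m=2))   # {0, 1} forms a second-order component
```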
58. Restoration of the isolated core's influence on k-shell structures PRESENTER: Gyuho Bae ABSTRACT. The k-shell decomposition is an easily computed and intuitive tool for interpreting network dynamics and would seem to have a natural role in identifying influential nodes. However, its utility is disputed in the literature. For example, Kitsak et al. (2010) support the use of the k-shell number (in short, ks), whereas later studies (e.g., Liu et al., 2015) conjecture that nodes with high ks form a closed community and that those nodes decrease the predictability of influential nodes based on ks. In this study, we investigate whether the isolated core's lack of influence can be addressed by adding links from it to other core-periphery structures generated by a bespoke random block model. Our results show that the addition of only a limited number of links can sharply increase the core's influence. Furthermore, by investigating a number of real-world networks, we conjecture that an extremely isolated core is a very rare phenomenon in practice, and so the k-shell algorithm and similar decomposition-based algorithms can still be useful in many cases. To support the conjecture, we modify the number of links between the core and other nodes in well-known networks and measure how this affects the core's influence and the predictive power of the ks number. |
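A sketch of the setup: k-shell indices via networkx, identification of the innermost core, and the link-addition experiment (the spreading simulation used to measure the core's actual influence is omitted).

```python
import random
import networkx as nx

def core_and_links(G, n_links=0):
    ks = nx.core_number(G)                     # k-shell index of every node
    kmax = max(ks.values())
    core = [v for v, k in ks.items() if k == kmax]
    periphery = [v for v in G if ks[v] < kmax]
    # Experiment: add a few links from the core to the rest of the network,
    # then re-measure the core's influence with a spreading model.
    for _ in range(n_links):
        if periphery:
            G.add_edge(random.choice(core), random.choice(periphery))
    return ks, core
```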
59. A Branch-and-Cut Method for Globally Maximizing Modularity and Discovering Network Communities PRESENTER: Samin Aref ABSTRACT. Community detection is a classic problem in network science with extensive applications in various fields. The most commonly used methods are algorithms designed to maximize modularity over the different ways a network can be partitioned into communities. Most heuristic algorithms for modularity maximization tend to scale well to large networks, but this scalability comes at a cost: their partitions carry no guarantee of proximity to an optimal solution, and these sub-optimal partitions tend to be disproportionately dissimilar to an optimal partition. Given these results, we propose a community detection algorithm based on exact or approximate maximization of modularity, acknowledging that there will be a limit to the size of the largest network such an algorithm can handle. We propose the Bayan algorithm which, unlike existing methods, returns network partitions with a guarantee of either optimality or proximity to an optimal solution. At the core of the Bayan algorithm is a branch-and-cut scheme that solves a sparse integer programming formulation of the modularity maximization problem to optimality, or approximates it within a factor. It takes advantage of structural properties of the input network and leverages state-of-the-art optimization approaches (e.g., valid inequalities, variable fixing, pre-solve graph reduction, and lower-bounding heuristics) to push the limits of solving a fundamental NP-hard problem exactly and efficiently. Through extensive experiments, we demonstrate Bayan's distinctive capabilities not only in maximizing modularity, but more importantly in the accurate retrieval of ground-truth communities. We compare Bayan to 22 community detection algorithms, including modularity-based heuristics, optimization-based heuristics, and other community detection methods that do not rely on modularity or optimization. We use Lancichinetti-Fortunato-Radicchi (LFR) benchmark graphs as test instances with ground-truth partitions and measure the performance of each method by the AMI of its partition with the planted ground-truth partition from the LFR graph generation process. Bayan's comparative level of performance remains stable over variations in the amount of noise in the data (graph) generation process. The performance of Bayan as an exact global modularity maximization algorithm also reveals the theoretical capability limits of maximum-modularity partitions in the accurate retrieval of communities. Overall, our analysis points to Bayan as a suitable choice for a methodologically grounded detection of communities through exact (approximate) maximization of modularity in networks with up to 10k edges (and in larger networks, respectively). Prospective advances in graph optimization and integer programming can push these limits further. |
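For intuition on what exact maximization means, here is a brute-force sketch on a tiny graph; Bayan replaces this enumeration with a branch-and-cut search over an integer programming formulation that scales to networks with ~10k edges.

```python
from itertools import product
import networkx as nx
from networkx.algorithms.community import modularity

def exact_modularity(G, max_communities=3):
    # Exhaustive search over all assignments of nodes to at most
    # `max_communities` groups; feasible only for very small graphs.
    nodes = list(G)
    best_q, best_part = -1.0, None
    for labels in product(range(max_communities), repeat=len(nodes)):
        groups = {}
        for v, l in zip(nodes, labels):
            groups.setdefault(l, set()).add(v)
        part = list(groups.values())
        q = modularity(G, part)
        if q > best_q:
            best_q, best_part = q, part
    return best_q, best_part

print(exact_modularity(nx.barbell_graph(4, 0)))   # recovers the two cliques
```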
60. Augmented Degree Correction for Bipartite Networks with Applications to Recommender Systems PRESENTER: Benjamin Leinwand ABSTRACT. Inputs to recommender systems can be represented as a matrix where each row represents a user, each column represents a product (e.g. a movie or song), and each entry represents a user’s rating of a given product. For services with large corpora and subscriber bases, the vast majority of ratings are unknown, resulting in very sparse matrices with mostly empty entries. Matrices of this type can be treated as partially observed, dense, weighted, bipartite networks. Similarities across users and across products can be leveraged to recommend products to users, even when no other information is observed. We adapt a model for the creation and estimation of dense weighted networks that incorporates community structure and flexible connectivity patterns, known as “augmented degree correction,” to this new setting. Additionally, we introduce an algorithm that accounts for missing values and utilizes the bipartite structure of the network, performing community detection on the user nodes and item nodes simultaneously. We apply our proposed methodology to a subset of the Jester dataset, and to simulated data. In both cases, we examine the results at various levels of sparsity. Our proposed approach outperforms Graph Convolutional Networks, Item Based Collaborative Filtering, User Based Collaborative Filtering, SVD with column-mean imputation, and Funk SVD on simulated data based on Normalized Mean Absolute Error as long as the data exceeds a modest sparsity threshold. On the subset of the Jester dataset, our model is the best performer when few ratings are observed, while existing approaches achieve better results when more data is available. These results indicate that the proposed approach may not be well-suited to detecting subtler signals restricted to a smaller subset of users and items, leaving room for enhancement. Nonetheless, as this model can robustly recover large scale, coarse patterns in the data, there appears to be a sparsity regime where the proposed approach provides better than state-of-the-art performance on ratings prediction. |
61. Simple crowd dynamics to generate complex temporal contact networks PRESENTER: Razieh Masoumi ABSTRACT. The underlying mechanisms behind the peculiar properties of empirical contact networks are not yet well known. However, we may be able to reproduce some statistical properties of these contact networks with relatively simple contact models, which do not necessarily have the full complexity of empirical contact networks. In this work, we simulate a range of 2D models of particle dynamics and generate the contact network of each model. In our 2D models, each particle carries a detection area that determines when particles are in contact: we define a contact as two particles being within each other's detection area (see Fig. 1a). We focus specifically on the temporal properties of these contact networks and investigate which of these properties can be explained by these simple crowd models, comparing the temporal quantities of the 2D particle models with empirical data sets. The data were collected during four events organized by GESIS, the Leibniz Institute for the Social Sciences, in Cologne, Germany. In Fig. 1b we depict the contact and inter-contact distributions for the Brownian and Active Brownian particle models. |
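A toy version of such a contact-generating pipeline, with illustrative parameter values rather than those used by the authors, might look like this:

```python
# Brownian particles in a periodic 2D box; a contact is recorded whenever two
# particles fall within each other's detection radius r. Sketch only.
import numpy as np

rng = np.random.default_rng(0)
N, steps, box, r, sigma = 50, 1000, 10.0, 0.5, 0.05   # assumed values
pos = rng.uniform(0, box, size=(N, 2))
contacts = []                                         # (t, i, j) events

for t in range(steps):
    pos = (pos + rng.normal(0.0, sigma, size=(N, 2))) % box  # Brownian step
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box * np.round(diff / box)                       # minimum image
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.where(np.triu(dist < r, k=1))
    contacts.extend((t, int(a), int(b)) for a, b in zip(i, j))
# grouping `contacts` by pair yields the contact and inter-contact duration
# distributions of the kind compared against the empirical data in Fig. 1b
```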
62. The lock-down communities management: Optimal spatio-temporal clustering of the inter-provincial Italian network during the Covid-19 crisis PRESENTER: Jules Morand ABSTRACT. The COVID-19 pandemic has highlighted the need to better understand the dynamics and evolution of epidemics. Moreover, the unprecedented amount of information collected about it, at different levels of resolution, motivates a data-based modelling of pandemics. In addition, companies like Meta have made available co-location and movement data based on cell-phone tracking of their users. These data are completely anonymous and can be used in aggregated form to estimate the probability of movement of people between different areas, at different scales. Numerous studies in statistical physics have shown that it is possible to map epidemic dynamics onto a network whose nodes correspond to persons or groups, and whose links represent the contacts between them. Using movement data from Meta's Data for Good program and publicly available data on the Italian population from ISTAT, we model the circulation of people in Italy before and during the COVID pandemic. Grouping the data at the level of provinces, we reconstruct the daily transition matrix for the whole network. Using these matrices, we perform a spatial and a temporal clustering of the Italian movement network. Interestingly, the temporal clustering successfully identifies the first two lockdowns, without any further information about them. The spatial clustering results in 11 to 23 clusters depending on the period (normal travel vs. lockdown). The spatial clusters coincide well with the proposed Italian macro-regions and can be explained by the available infrastructure and economic exchanges between different Italian regions. |
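The temporal clustering step can be sketched in a few lines; the distance and clustering choices below (flattened daily matrices, Ward hierarchical clustering) are illustrative stand-ins, since the abstract does not specify the algorithm:

```python
# Cluster days by the similarity of their province-to-province transition
# matrices; placeholder random matrices stand in for the Meta/ISTAT data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

days, n = 365, 107                       # e.g., one year, 107 Italian provinces
T = np.random.dirichlet(np.ones(n), size=(days, n))  # daily row-stochastic matrices
X = T.reshape(days, -1)                  # one flattened matrix per day
Z = linkage(pdist(X), method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")      # e.g., pre/lockdown/post
```

Days falling in the same cluster share similar mobility structure, which is how lockdown periods can emerge from the data without being labelled in advance.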
63. Temporal Behaviour of a Large Scale Microservice Network PRESENTER: Giles Winchester ABSTRACT. The microservice architecture, which is becoming the dominant paradigm for developing large-scale systems, enables the composition of applications using smaller services that interact with each other. Interactions evolve and new microservices are added as the system grows in size and functionality. Multiple instances of each microservice run in the system, and their number is dynamically adjusted to match the demand. As a result, dependencies within the architecture can become too complex to keep track of, which in turn can lead to unforeseen consequences for system performance and reliability, including catastrophic failures. To characterise the dependency behaviour of a large-scale microservice system, we use tools from the framework of temporal networks. Specifically, we use the method proposed by Masuda and Holme (2019) to detect sequences of system states from network snapshots, based on the distance between the eigenspectra of the Laplacians of the network snapshots. The temporal evolution of similarity of structural connectivity, constructed from calls made between microservices in the Alibaba Cluster Trace Program, shows both prolonged periods of steady behaviour and distinct fluctuations over the half-day record. Clustering analysis reveals that these fluctuations can also be characterised as persistent states with their own structural properties. Importantly, we find that some of these states are associated with changes in the functional performance of the system, specifically its response time. Since recording runtime calls between microservices can incur significant costs and is not always possible, we then ask whether a functional characterisation of the system based on CPU usage of individual microservices could predict the identified states. We find that the aforementioned system states are associated with specific patterns of functional connectivity. However, the coarse granularity of CPU recordings makes the determination of faster-timescale transient states more challenging. Further research on longer and finer-grained datasets is required to establish whether structural connectivity can indeed be predicted from inferred functional connectivity, and to characterise transitions between system states over longer timescales. |
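The core of the state-detection step is a distance between snapshot spectra; a simplified stand-in for the Masuda-Holme measure (whose exact definition differs in detail) is:

```python
# Distance between two snapshots as the Euclidean distance between their
# sorted Laplacian eigenvalue spectra (zero-padded to equal length).
import numpy as np
import networkx as nx

def spectral_distance(G1, G2):
    s1 = np.sort(nx.laplacian_spectrum(G1))
    s2 = np.sort(nx.laplacian_spectrum(G2))
    n = max(len(s1), len(s2))
    s1 = np.pad(s1, (0, n - len(s1)))
    s2 = np.pad(s2, (0, n - len(s2)))
    return float(np.linalg.norm(s1 - s2))

# the matrix D[t1, t2] = spectral_distance(snapshots[t1], snapshots[t2]) can
# then be clustered to extract persistent system states
```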
64. Evolution of Wikipedia knowledge graphs in response to planned events: A temporal networks approach. PRESENTER: Nicola Pedreschi ABSTRACT. The evolution of knowledge networks, such as networks of inter-referenced Wikipedia pages, is driven by the temporal patterns of human editing activity. Many existing works show how the volume of such activity (page visits, YouTube views, book sales, etc.) is in turn susceptible to exogenous perturbations such as news and events, and how it undergoes characteristic changes in response to, or in anticipation of, an event. In this work we not only consider the changes in volume/density of the network in relation to an event, but also quantify the impact of an event on the time-varying fabric of page-to-page networks. We analyze the time-evolution of temporal networks where nodes represent Wikipedia pages, and directed links between pairs of nodes represent the hyperlinks redirecting from one Wikipedia page to the other. In order to examine the effect of a planned event on the structure of the network and on its evolution, we extract the one-hop neighbourhood of the Wikipedia page directly related to the event (e.g., the Beijing 2008 Summer Olympics Wikipedia page, in Figure). We then track the addition/deletion of edges and nodes in time and measure how this process affects the structure of the network at multiple spatial scales. We capture the evolution of node-wise features, such as the time series of in- and out-degree of individual nodes (panel a) in Figure), to investigate how the growth of the knowledge network unfolds, and is affected by events, at the micro-scale. Sharp, non-trivial increases in the in- and out-degree of nodes (pages) can be observed in response to the 2008 Olympic Games in panel a). Similarly, we resort to the analysis of meso-scale structures in temporal networks, such as temporal communities and temporal rich-club patterns (panel b) in Figure), thus capturing the evolution of the network at a more coarse-grained spatial scale. While temporal rich-club patterns of increasing cohesion are observed during the growth of the network prior to the start of the 2008 Olympic Games (panel b)), we find evidence of a different, more cohesive pattern triggered by the event. Finally, we analyze the evolution of the overall structure of the network through measurements of the self-similarity of the whole network at different times, e.g., the Jaccard index computed for the network's edge-sets at times t and t*. This analysis allows us to detect and quantify the impact of periods of global, i.e., macro-scale, reorganization of the network in the proximity of the event of interest. In panel c) we display the sharp, wide-spread transition that the network undergoes due to the event, captured by our macro-scale analysis. Altogether, the micro-, meso-, and macro-scale temporally-resolved features we extract allow us to capture the full, multi-scale picture (or rather, movie) of the reorganization of the structure of the network prior to and in response to the event. We therefore use our analytical toolkit to characterise the effects of different planned events, such as other sporting and political events, on their respective temporal knowledge graphs. Altogether, we devise a method to characterise the temporal patterns that arise from event-driven perturbations of the evolution of a Wikipedia knowledge network, ultimately delivering an event-type-specific taxonomy of network-response patterns. |
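The macro-scale self-similarity measure reduces to a short helper; an illustrative version:

```python
# Jaccard index between the edge sets of two network snapshots:
# 1 means identical wiring, 0 means no shared hyperlinks.
def edge_jaccard(edges_t, edges_tstar):
    a, b = set(edges_t), set(edges_tstar)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```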
65. Frequent Pattern Mining in Simplicial Complexes PRESENTER: Giulia Preti ABSTRACT. We introduce simplicial patterns (simplets) and generalize the task of frequent pattern mining from the realm of graphs to that of simplicial complexes. Our task is challenging due to the enormous search space and the need for higher-order isomorphism. We show that finding the occurrences of simplets in a complex can be reduced to a bipartite graph isomorphism problem, in linear time and at most quadratic space. We introduce an anti-monotonic frequency measure that allows us to start the exploration from small simplets and stop expanding a simplet as soon as its frequency falls below the selected threshold. We propose a memory-conscious algorithm, FreSCo, that, by carefully exploiting the relationships among the simplices and the simplets, achieves efficiency and scalability. FreSCo can compute the exact simplet frequencies, as well as quickly determine whether a simplet is frequent. Experimental results prove the significance of the simplets with respect to the traditional graph patterns. |
66. Large-scale dynamic network and information theoretic analyses of coherent cortical activity during visual working memory in primates. PRESENTER: Vinicius Lima ABSTRACT. V. Lima Cordeiro (1), S. J. Hoffman (2), N. Dotson (3), C. M. Gray (2), A. Brovelli (4), D. Battaglia (1,5). 1. Aix-Marseille University, INS, Marseille, France; 2. Montana State Univ. Bozeman, Bozeman, MT; 3. SNL-R, Salk Inst. for Biol. Studies, La Jolla, CA; 4. Aix-Marseille University, INT, Marseille, France; 5. Univ. of Strasbourg Inst. for Advanced Studies, Strasbourg, France. Working memory requires large-scale coordination among widespread cortical and subcortical brain regions. Long-range coherence between the oscillatory activity of cortical regions, arising in multiple frequency bands, may be a mechanism supporting information routing and integration across distributed multi-regional networks. In line with this hypothesis, the strength of coherence links between frontal and parietal regions is consistently modulated by the specific identity of a visual stimulus held in working memory. Is content-specific synchronization limited to the frontoparietal network, or do large-scale networks of coherent oscillations support working memory? Here, we perform a time-resolved network coherence analysis of local field potentials recorded simultaneously from dozens of cortical areas in non-human primates performing a visual delayed match-to-sample task. Preliminary results indicate that networks in multiple frequency bands are structured not only in space but also in time. Indeed, the bursting dynamics of different links are not independent but often coordinated with the fluctuations of other links, as we reveal through a brain-wide, edge-centric functional connectivity analysis. The consequence of these coordinated coherence fluctuations is that "network bursts" emerge spontaneously during the delay period, superimposed on a sustained network scaffold that is also visible through static network analyses. We then compute the mutual information between stimulus identity and a variety of node- and link-level features for different frequency bands (node power, node strength within static functional connectivity, link coherence, and link "entanglement", i.e., dynamic coupling to other links), comparing their relative encoding. We finally discuss the results in the context of dynamic mechanisms for stimulus-biased sampling of different classes of network bursts. |
67. Statistical and Machine Learning Link Selection Methods for Brain Functional Networks: Review and Comparison PRESENTER: Ilinka Ivanoska ABSTRACT. Network-based representations have introduced a revolution in neuroscience, expanding the understanding of the brain from the activity of individual regions to the interactions between them. In spite of the important successes of this augmented network view, the use of brain network representations has also introduced conceptual and computational problems, one of them being the complexity of comparing two groups of subjects. This hinders both our capacity to decipher the main mechanisms behind pathologies, and the significance of any statistical and/or machine learning task used to process these data. When group comparison is performed through automated tools, e.g., through machine learning models, network representations lead to the common problem of high dimensionality. In short, the number of features (here, links or connections) can become larger than the number of instances, i.e., of subjects available in the analysis; this, in turn, can lead to over-fitting of the classification models, and hence to unreliable, non-generalisable results. The presence of a large number of potentially irrelevant links can further decrease the accuracy of the learning algorithm, and increase memory and computational requirements. A link selection method, allowing irrelevant connections to be removed in a given scenario, is an obvious solution that improves the utilization of network representations. Selecting features in brain network analysis can help detect significant biomarkers and brain connections that are relevant for a particular brain disease. This not only results in a conceptually simpler problem: it can also drastically reduce its computational complexity, thus allowing the use of more complex numerical techniques in subsequent steps. The following question remains: given the large number of methods proposed in the literature to perform such selection, how do they compare to each other when applied to brain networks? To answer this question, this contribution presents a large-scale review and evaluation of a large set of commonly used link selection strategies, grouped into two families, statistical methods (univariate and multivariate) and machine learning models; this constitutes the first instance of such a large-scale comparison of feature selection methods for brain networks. The link selection methods are evaluated on several datasets according to three main criteria: their performance, in terms of the quantity of retained information; their stability, i.e., how consistently they yield the same set of links; and their computational cost. These results are compared across the datasets, to assess their generalisability and to highlight the best performing methods. Results in this contribution show that most methods perform in a qualitatively similar and efficient way, in the sense that subsequent brain network classification tasks are not negatively affected. Still, the sets of retained links vary strongly across methods (see Fig. 1); while the same quantity of information is preserved, each method localises it in a different way, and the high heterogeneity in the sets of links retained by each method thus shows that they offer complementary views of the brain structure. While machine learning methods are conceptually more complex than statistical ones, they do not yield a clear advantage. The implications of these results for neuroscience tasks will be further discussed from several perspectives. |
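As a concrete instance of the univariate statistical family reviewed here, a pipeline that keeps the k most discriminative links before classification might look as follows; data shapes and parameter values are illustrative, not taken from the studied datasets:

```python
# Univariate link selection (ANOVA F-test) followed by a linear classifier;
# X holds one flattened connectivity upper triangle per subject.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4950))      # e.g., 100-node networks -> 4950 links
y = rng.integers(0, 2, size=60)      # two subject groups

clf = make_pipeline(SelectKBest(f_classif, k=100), LinearSVC(dual=False))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

Stability, in the abstract's sense, would then be assessed by checking how much the selected link sets overlap across cross-validation folds.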
68. Dynamic fMRI unveils a robust functional network architecture in the brain: a hierarchy of strong positive and weak bimodal correlations PRESENTER: Maria Ercsey Ravasz ABSTRACT. Functional MRI can provide fundamental insights into cognitive processes. However, the dynamic nature of the signal results in widely varying networks not only between individuals, or repeated sessions with a single individual, but also between non-overlapping time windows in the same experimental session (Fig. 1a). Analyzing an averaged network can thus obscure information carried by the dynamical properties of this system. In this study, we demonstrate that networks derived from fMRI correlation matrices are extremely reliable if we focus on statistical distributions of network properties, such as edge weight distributions or node distance distributions, and we uncover a robust pattern of previously unnoticed brain interactions. Using animal and human data, we demonstrate that the pattern consists of a small number of strong positive correlation-driven links, forming a sparse and highly distributed backbone of the functional network. This is complemented by a large number of bimodal interactions with weaker correlations that dynamically switch between positive and negative values (Fig. 1b). Both group-level comparisons between subjects (groups G1 and G2 in Fig. 1b) and test-retest analyses within subjects demonstrate remarkably stable hierarchies of functional interactions. Node distance distributions also show remarkably stable patterns, allowing a ranking of areas (nodes) by their network centrality (Fig. 1c). Based on fMRI data from previous alcohol use disorder studies, we show that, although the overall hierarchy of the functional network is maintained, this pathological state is characterized by mild but robust changes in the probability distribution of network properties, as well as significantly smaller communication efficiencies. Our methodology offers new opportunities to identify sensitive disease biomarkers. |
69. Emergence of High-Order Functional Hubs in the Human Brain ABSTRACT. Network theory is often based on pairwise relationships between nodes, which is not necessarily realistic for modelling complex systems. Importantly, it does not accurately capture non-pairwise interactions in the human brain, often considered one of the most complex systems. In this work, we develop a multivariate signal processing pipeline to build high-order networks from time series and apply it to resting-state functional magnetic resonance imaging (fMRI) signals to characterize high-order communication between brain regions. We also propose connectivity and signal processing rules for building uniform hypergraphs and argue that each multivariate interdependence metric could define weights in a hypergraph. As a proof of concept, we investigate the most relevant three-point interactions in the human brain by searching for high-order “hubs” in a cohort of 100 individuals from the Human Connectome Project. We find that, for each choice of multivariate interdependence, the high-order hubs are compatible with distinct systems in the brain. Additionally, the high-order functional brain networks exhibit simultaneous integration and segregation patterns qualitatively observable from their high-order hubs. Our work hereby introduces a promising heuristic route for hypergraph representation of brain activity and opens up exciting avenues for further research in high-order network neuroscience and complex systems. |
70. Mapping the space of research interests with embedding methods PRESENTER: Filipi Silva ABSTRACT. Scientists are explorers of the vast expanse of knowledge space, driven by a multitude of factors that go beyond their intrinsic curiosity, including funding opportunities, third-party interests, and the popularity of certain topics or problems. This journey of scientific exploration is far from straightforward, and understanding how scientists navigate this space is crucial to unraveling the complexities of scientific progress. In this pursuit, the ability to conceptualize the "location" of scientists in the knowledge space is essential. However, existing approaches to represent scientists' interests, for instance using bag-of-topics models, fall short of capturing the true essence of scientific exploration, as they are unable to capture the relationships between topics, which can be crucial in understanding how topics change over time. To address this limitation, we propose a geometric representation of scientists' interests, which we consider a representation of the knowledge space. Our method constructs an embedding of papers into high-dimensional vectors based on citation data (see Fig. a-b). We then represent a scientist's research interest in year t as the centroid of the papers that the scientist cited in that year (see Fig. c). This approach captures the relationships between topics in the representation of a scientist's interests. We demonstrate the effectiveness of our representation through visualization and through evaluation of its potential in predicting the journals of papers and collaboration ties among authors. Using UMAP, we create a visualization that appears to capture many global patterns of how the topics of science are organized. For instance, Physics is connected to Chemistry, which connects to the Biological sciences, while Mathematics bridges Physics and the Computer sciences. Similarly, the Agricultural sciences bridge the Biological and Earth & Ocean sciences, and Astronomy forms its own cluster, separated from Physics (see Fig. e). Our knowledge space representation predicts journals (see Fig. f) and new collaborations with higher accuracy than existing predictors based on collaboration networks and bag-of-topics models (see Fig. g). We also explore the associations between research trajectories and productivity and impact, demonstrating that our knowledge space can provide insights into the underlying mechanisms of scientific impact. In conclusion, our proposed knowledge space representation provides a more precise and detailed way to understand scientists' research interests and how they change over time. It captures the intricate relationships between topics and is a powerful tool for analyzing scientific exploration, allowing for better predictions of new collaborations and insights into the mechanisms of scientific impact. |
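The interest representation itself is straightforward; a sketch with hypothetical data structures:

```python
# A scientist's interest vector in year t: the centroid of the embeddings of
# the papers they cited that year. `embeddings` maps paper id -> vector.
import numpy as np

def interest_vector(cited_ids, embeddings):
    vecs = np.array([embeddings[p] for p in cited_ids])
    return vecs.mean(axis=0)
```

Trajectories through the knowledge space are then just sequences of such centroids across years.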
71. Hypergraph patterns and collaboration structure PRESENTER: Jonas L. Juul ABSTRACT. When a new team forms, who are likely to be members of this team? Who are unlikely to join forces? How do external circumstances such as tight deadlines or empty schedules affect how and which teams form? Questions like these arise in all of the different settings where team formation and performance are important. Networks with higher-order interactions provide a natural framework to formalize team collaboration structures. Here, we introduce a new family of structural hypergraph patterns, m-patterns, which formalize existing relationships between collaborators. We provide new theory and data analysis to study higher-order collaboration structures in emerging human and non-human collaborations. For a new team of m nodes, the past relationship between these m nodes can be formalized as an m-node hypergraph. In this hypergraph, every subset of nodes that have collaborated in the past is connected by a hyperedge. Because every subset of the m nodes can have had any number of collaborations in the past, these hypergraphs representing past relationships can be arbitrarily complex. To avoid this complexity, we limit our scope to summarizing the largest subsets of nodes that have collaborated in the past. We call the resulting hypergraph representation of the past relationships of m nodes an m-pattern. Recall that a hypergraph is simple if none of its hyperedges is entirely contained in another. DEFINITION (m-pattern) A simple hypergraph with m vertices is an m-pattern. A natural question is whether some m-patterns occur more frequently in hypergraphs than others. To understand how hypergraph density influences m-pattern prevalence, we study a G(N,p)-type hypergraph model. For this model, we present several theoretical results: 1) exact solutions for the expected prevalence of all m-patterns for any choices of m, N, and p; 2) proofs that no p exists such that certain (named) m-patterns are the most widespread in the limit of N going to infinity; 3) proofs that there always exists a p such that certain (named) m-patterns are the most widespread in the limit of N going to infinity. We then turn to studying past relationships among coauthors of new scientific collaborations. Fig. 1 shows the collaboration structure of 2-author preprints as a function of time (top). We investigate team structures of early COVID-19 papers. Presumably these teams had to form quickly when the pandemic hit. We find that new collaborations between two experienced scientists were consistently less prevalent among early COVID-19 papers than among other papers from the same time period (bottom). Finally, we examine the relation between collaboration structure and the future citations of academic papers. Using the Open Academic Graph, we study publications from 4 scientific fields: Computer Science, Geology, Mathematics and Sociology. Controlling for paper age, author citations, author age and author publication numbers, we find that, in some fields, 2-author repeat collaborations receive more citations, and 3-author repeat collaborations fewer citations, than expected. Understanding the formation and performance of teams is important in many different settings. Yet a formal way of quantifying past relationships has been lacking. We argue that m-patterns fill this gap.
We present theoretical results for their prevalence in the simplest possible random-graph model, and several empirical observations demonstrating the connection between team structures, collaboration circumstances and performance. Many questions about team formation remain unanswered; m-patterns are one new way to approach them. |
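The defining property of an m-pattern, simplicity of the hypergraph, is easy to test; an illustrative helper:

```python
# A hypergraph is simple if no hyperedge is properly contained in another.
def is_simple(hyperedges):
    sets = [frozenset(e) for e in hyperedges]
    return not any(a < b for a in sets for b in sets)

assert is_simple([{1, 2}, {2, 3}, {1, 3}])       # triangle of pairwise ties
assert not is_simple([{1, 2, 3}, {1, 2}])        # {1,2} is inside {1,2,3}
```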
72. Time series clustering method based on similarity measurement and community detection with intuitive visualization PRESENTER: Mengsi Cai ABSTRACT. With the growth of big data and computing power, fast clustering of high-dimensional data is no longer a technical bottleneck. However, the complexity of social systems has increased dramatically, which makes it more challenging to summarize the rules of disordered and irregular time series and to understand the characteristics behind the clustering results intuitively and effectively. In order to group time series and intuitively display the relations among them, we propose in this paper a novel time series clustering technique based on three stages. In the first stage, the similarity between every two time series objects is calculated by a similarity measure. In the second stage, a similarity threshold is determined, and the pairs of time series whose similarity is larger than the threshold are selected to construct a similarity network. In this network, the nodes represent time series objects, and the weights of the edges are the similarities between the pairs of time series. In the third and final stage, the similarity network is divided into communities by a community detection algorithm, and each community represents a time series cluster. The results obtained on several standard datasets and empirical datasets have been compared against traditional similarity-based clustering algorithms, showing that this methodology not only achieves promising performance but also provides an intuitive visualization of the clustering results. The proposed method innovatively expands the application of complex network theory and methods to the data clustering field, which helps discover the hidden structures and relationships within the data, and intuitively displays the clustering results through network communities. |
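A compact sketch of the three-stage pipeline, with assumed method choices (Pearson correlation as the similarity measure and Louvain as the community detection algorithm; the paper leaves both pluggable):

```python
# Stage 1: pairwise similarity; Stage 2: thresholded similarity network;
# Stage 3: communities = time series clusters.
import numpy as np
import networkx as nx

def cluster_series(X, threshold=0.7):
    """X: array of shape (n_series, length)."""
    S = np.corrcoef(X)
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if S[i, j] > threshold:
                G.add_edge(i, j, weight=S[i, j])
    return nx.community.louvain_communities(G, weight="weight")
```

The network itself doubles as the visualization: plotting it with nodes colored by community gives the intuitive display the title refers to.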
73. Inferring missing position in a spatial network PRESENTER: Jaiyong Lee ABSTRACT. Understanding the structure of spatial networks with positional information is essential in various fields. For example, in spatial networks such as transportation networks or infrastructure networks, including power grids, the positions of the nodes determine their characteristics and influence the function of the network. However, obtaining complete and accurate positional data is sometimes arduous, particularly when hindered by legal restrictions. In this paper, we propose a method to infer missing node positions using the edge information of the network. Our approach involves selecting adequate edge information and designing an appropriate objective function to exploit the chosen edge information. We maximize the use of the edge attributes to better predict the positions of the nodes. The performance of the proposed method is demonstrated through a case study on the Korean electric power grid. The results indicate that it can effectively infer the missing positions, even when a significant portion (approximately 80%) is absent, provided that the edge information is appropriate. Our method can also be applied to other spatial networks, such as transportation and mobility networks, social and contact networks, or biological neural networks, which are examples where the position of a node is strongly related to the function of the network. |
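The abstract does not spell out its objective function, but a simple iterative baseline in the same spirit (each unknown node moves to the mean of its neighbors' current position estimates) conveys the idea:

```python
# Naive position inference from edges alone: initialize unknown nodes at the
# centroid of the known ones, then relax to neighborhood means. A baseline
# sketch, not the authors' method.
import numpy as np
import networkx as nx

def infer_positions(G, known_pos, iters=100):
    pos = {v: np.asarray(p, float) for v, p in known_pos.items()}
    centroid = np.mean(list(pos.values()), axis=0)
    unknown = [v for v in G if v not in pos]
    for v in unknown:
        pos[v] = centroid
    for _ in range(iters):
        for v in unknown:
            nbrs = [pos[u] for u in G[v]]
            if nbrs:
                pos[v] = np.mean(nbrs, axis=0)
    return pos
```

Edge attributes such as line length or impedance, which the abstract emphasizes, would enter as weights in the averaging step.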
Network Economics
- Alexandra Brintrup (University of Cambridge, UK): Understanding complex supply networks: an interdisciplinary journey
- Christian Diem (Complexity Science Hub, Austria): Systemic risk and shock propagation in firm-level supply networks
- Cesar A. Hidalgo (University of Toulouse, France): What's new in economic complexity research?
- Guido Caldarelli (Ca' Foscari University of Venice, Italy): The physics of financial networks
- Giulio Cimini (University of Rome Tor Vergata, Italy): Reconstruction and Validation of Economic and Financial Networks
Huawei Focus Session on Networks and Computation
- Claudio Gallicchio (University of Pisa, Italy): Reservoir Computing and Beyond
- Wei Lin (Fudan University, China): Prediction and modulation of complex systems: From a viewpoint of machine learning
- Silvia Ortín (Universitat de les Illes Balears, Spain): Physical Reservoir Computing: Advancements and Applications in Photonic Reservoir Computing
- Jie Sun (Huawei, Hong Kong): Bifurcation behaviors shape how continuous physical dynamics solves discrete Ising optimization
- Zoltan Toroczkai (University of Notre Dame, USA): Towards understanding the fundamental limits of analog, continuous-time computing
The Networks & Computation focus session is supported by Huawei.
Network Biology
- Réka Albert (Pennsylvania State University, USA): Computer-aided workflow for Boolean model optimization allows comprehensive integration of biological knowledge
- Madalena Chaves (Centre de recherche INRIA Sophia-Antipolis Méditerranée, France): Cycle dynamics and synchronization analysis for biological oscillators
- Pascal Falter-Braun (Helmholtz Institute of Network Biology, Germany): Interactome networks to understand genetic and infectious diseases and facilitate drug discovery
- Erzsébet Ravasz Regan (The College of Wooster, USA): Single-cell Network Modeling of Mechanosensitive Routes through the Epithelial to Mesenchymal Transition and their Inhibition by Mitochondrial Dysfunction
- Ulrich Stelzl (University of Graz, Austria): Coordination of post-translational protein modifications within networks
14:15 | Laplacian Renormalization group for heterogeneous networks PRESENTER: Guido Caldarelli ABSTRACT. The renormalization group is the cornerstone of the modern theory of universality and phase transitions, and it is a powerful tool to scrutinize symmetries and organizational scales in dynamical systems. However, its application to complex networks has proven particularly challenging, owing to correlations between intertwined scales. To date, existing approaches have been based on hidden-geometry hypotheses, which rely on the embedding of complex networks into underlying hidden metric spaces. Here we propose a Laplacian renormalization group, diffusion-based picture for complex networks, which is able to identify proper spatiotemporal scales in heterogeneous networks. In analogy with real-space renormalization group procedures, we first introduce the concept of Kadanoff supernodes as block nodes across multiple scales, which helps to overcome the detrimental small-world effects that are responsible for cross-scale correlations. We then rigorously define the momentum-space procedure to progressively integrate out fast diffusion modes and generate coarse-grained graphs. We validate the method through application to several real-world networks, demonstrating its ability to perform network reduction while keeping crucial properties of the systems intact. |
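The diffusion picture can be made concrete in a few lines: the heat kernel exp(-tau L) at diffusion time tau defines the scale at which nodes are grouped into Kadanoff supernodes (a sketch of the ingredients, not the authors' full procedure):

```python
# Heat kernel of a graph Laplacian; strongly diffusively coupled node pairs
# (large K[i, j]) are candidates for merging at observation scale tau.
import numpy as np
import networkx as nx
from scipy.linalg import expm

G = nx.barabasi_albert_graph(200, 2, seed=0)
L = nx.laplacian_matrix(G).toarray().astype(float)
tau = 1.0                          # diffusion time = observation scale
K = expm(-tau * L)                 # heat kernel
rho = K / np.trace(K)              # normalized diffusion operator
```

Sweeping tau and repeating the coarse-graining is what produces the flow across scales described in the abstract.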
14:30 | Filtering and modeling information in spatial networks ABSTRACT. Network filtering is crucial to characterize and visualize interconnected systems. In spatial networks, the physical distance between connected nodes constrains the resulting structural organization. Inspired by purely ecological concepts, we introduce a novel criterion for filtering spatial networks that optimizes the benefit of having many links against the cost associated with their covered Euclidean distance d. We demonstrate analytically, and confirm with extensive simulations, the existence of an optimal connection density ρ which only depends on the spatial distribution of the links. Denser networks naturally emerge when links tend to connect spatially closer nodes, while sparser networks result when links connect more distant nodes (Fig. 1a). This effect is modulated by a parameter α and exhibits a transition at the critical point α=1 (Fig. 1b). We further derive a growing network model where the parameter α determines the local preferential attachment rule. By tuning α, several realistic configurations can be obtained, from short-range regular patterns (α<1) to long-range scale-free networks (α>1). Notably, in the case of a real brain network (C. elegans) (Fig. 1c), the best fit occurs around α=0.5, suggesting a tendency to favor economic links over connectivity hubs (Fig. 1d). These findings offer new insights into the role of space in complex systems and provide a flexible tool to visualize, model and design spatial networks. SEE ATTACHED FILE |
14:45 | On the reconstructability of complex networks PRESENTER: François Thibault ABSTRACT. Network reconstruction is a foundational tool in Network Science for bridging network models and real data. Given multivariate time series or observations of noisy networks, these models provide a (potentially probabilistic) description of the underlying network of interactions between the nodes. In recent years, there has been rising interest in revisiting Bayesian reconstruction frameworks, built from first principles, where the inference assumptions are explicitly defined through a graph model G and an observation model X conditioned on G. These frameworks were designed partly in response to heuristics-based techniques, such as the correlation matrix thresholding method, which are commonly used in neuroscience and whose assumptions are harder to extract. Beyond the gain in performance provided by these Bayesian approaches, they unlock a variety of useful tools, such as information theory, to further our understanding of the challenges of network reconstructability. We present a general framework exploring the notion of network reconstructability from an information-theoretic point of view. Given the network and data generating models, respectively given by G and X, we measure network reconstructability using the uncertainty coefficient U(G|X) = I(G;X)/H(G) = 1 − H(G|X)/H(G) ∈ [0, 1], where I(G;X) = H(G) − H(G|X) is the mutual information between G and X, and H(G) and H(G|X) are the network prior and posterior entropies, respectively. In practice, U(G|X) measures the fraction of recoverable network information hidden in the observations X, which equals 0 when no information is recoverable and 1 when perfect reconstruction is possible. In principle, U(G|X) represents a theoretical reconstruction limit. We demonstrate this using the probability of reconstruction error, for which we show a correspondence between its lower bound and U(G|X). We illustrate this correspondence by comparing U(G|X) with performance metrics of different reconstruction algorithms as a proxy for their probability of error. Our framework is general and can be used for quantifying the reconstructability of network ensembles as well as single instances of networks and observations, using different modeling assumptions. To this effect, we investigate the influence of the network prior on the reconstructability of specific real networks. In doing so, our framework allows us to perform model selection from first principles in the context of network reconstruction. |
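For a finite, discrete toy ensemble, the uncertainty coefficient can be computed directly from the joint distribution of networks and observations; an illustrative helper:

```python
# U(G|X) = I(G;X) / H(G) for a discrete joint distribution p_gx[g, x]
# (rows: networks, columns: observations). Natural-log entropies.
import numpy as np

def uncertainty_coefficient(p_gx):
    p_gx = np.asarray(p_gx, float)
    p_gx = p_gx / p_gx.sum()
    p_g, p_x = p_gx.sum(axis=1), p_gx.sum(axis=0)
    H_g = -np.sum(p_g[p_g > 0] * np.log(p_g[p_g > 0]))
    mask = p_gx > 0
    I_gx = np.sum(p_gx[mask] * np.log(p_gx[mask] / np.outer(p_g, p_x)[mask]))
    return I_gx / H_g   # 0: nothing recoverable, 1: perfect reconstruction
```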
15:00 | Effects of ensemble non-equivalence on network model selection PRESENTER: Francesca Giuffrida ABSTRACT. Canonical and microcanonical ensembles of networks can be either asymptotically equivalent or non-equivalent, depending on the enforced constraints. Network models with an extensive number of local constraints, such as the Configuration Model, have been shown to lead to ensemble non-equivalence. The present contribution investigates the implications of ensemble non-equivalence on model selection, within the framework defined by the Minimum Description Length (MDL) principle. MDL methods try to find the model that allows for the shortest description of the data. The most natural approach, which allows for the smallest description lengths in a precisely defined minimax sense (i.e., shortest description in the worst-case scenario), is the Normalized Maximum Likelihood (NML) one. The NML description length is defined as "complexity minus log-likelihood", where the likelihood term quantifies the goodness of fit and the complexity term prevents overfitting (by disfavoring overly complex models). The effects of ensemble non-equivalence on the likelihood term are well understood: non-equivalence implies a non-vanishing relative entropy density between canonical and microcanonical ensembles, which is equivalent to a non-vanishing difference between the rescaled log-likelihoods of the two models. Conversely, the effects of ensemble non-equivalence on the complexity term have not been studied systematically yet. The present study addresses this problem by calculating and comparing the full MDL of canonical and microcanonical network models. In particular, we compare the NML description lengths of canonical and microcanonical ensembles of bipartite networks defined by global and local constraints, namely the bipartite versions of the Erdős-Rényi model (for which equivalence holds) and of the Partial Configuration model (for which non-equivalence holds). We find that, while the microcanonical likelihood is always larger than the canonical one, microcanonical models are always more complex than their canonical counterparts, regardless of the enforced constraints. However, the differences between log-likelihood and complexity do not cancel out completely, and the resulting "trade-off" is strongly influenced by the degree of ensemble (non-)equivalence. Indeed, we find that the difference between canonical and microcanonical description lengths diverges in the limit of infinite network size for the Partial Configuration Model, while it remains at most finite for the Erdős-Rényi model. Overall, these results indicate that the canonical and microcanonical ensembles defined by equivalence-breaking constraints should be considered different from a model selection perspective. Finally, we investigate the relationship between NML and the Bayesian approach to model selection. While we confirm that these two frameworks can be considered equivalent (up to finite corrections and regardless of the chosen prior) when equivalence holds, we also find that this correspondence is no longer guaranteed when ensemble equivalence breaks down, because the Bayesian approach becomes much more sensitive to the choice of the prior. |
15:15 | Rank Distribution in Random Recursive Trees PRESENTER: Huck Stepanyants ABSTRACT. Random Recursive Trees (RRTs) are one of the simplest models of growing networks, with no parameters, and yet they are not well understood. An RRT is constructed via the following process: starting from a single node at time t=1, for each t>1 a new node is attached to an existing node chosen uniformly at random. The rank of a node, denoted k, is the number of edges in the shortest path from the node to a leaf. The rank is of great interest because it is labeling-independent, i.e., it does not change when the node labels are permuted. The rank distribution of a tree therefore provides insight into its natural unlabeled structure, emphasizing the peripheral structure or fringe. The rank distribution in the limit of large tree size has been extensively studied for many random tree models. These include RRTs, random search trees, and random increasing plane trees. However, until now, the full distribution of ranks in any of these tree models has been unknown. We derived a closed-form expression for the rank distribution c_k for RRTs in the limit of large tree size. Our method involves splitting RRTs into smaller trees to establish recursive equations for relevant RRT properties. These recursions are then solved by using generating functions. We found c_0=1/2 and c_1=1/e, in agreement with previous results [1]. Additionally, we found that c_2=0.124, c_3=0.00804, and c_4=0.000169. In the accompanying figure we confirm these analytical results with numerical simulations. [1] H. M. Mahmoud and M. D. Ward, Asymptotic properties of protected nodes in random recursive trees, Journal of Applied Probability 52, 290 (2015). |
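The analytical values are easy to probe numerically; a simulation sketch (tree size and sample count are arbitrary choices):

```python
# Grow random recursive trees and estimate the rank distribution, where the
# rank of a node is its shortest-path distance to the nearest leaf.
import random
from collections import Counter
import networkx as nx

def rrt(n, seed=None):
    rng = random.Random(seed)
    G = nx.Graph()
    G.add_node(0)
    for t in range(1, n):
        G.add_edge(t, rng.randrange(t))   # attach to a uniform earlier node
    return G

counts = Counter()
for s in range(20):
    G = rrt(10_000, seed=s)
    leaves = [v for v in G if G.degree(v) == 1]
    dist = nx.multi_source_dijkstra_path_length(G, leaves)
    counts.update(dist.values())

total = sum(counts.values())
print({k: round(counts[k] / total, 5) for k in sorted(counts)})
# expected: c_0 ~ 0.5, c_1 ~ 1/e ~ 0.36788, c_2 ~ 0.124, ...
```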
15:30 | Modularity for probabilistic networks PRESENTER: Xin Shen ABSTRACT. Xin Shen, Christian Rohner, Matteo Magnani (InfoLab/CNDS, Dept. of Information Technology, Uppsala University). Many real-life networks are uncertain and can thus be better modeled using probabilistic networks, where each edge is associated with a probability representing the likelihood of its existence. Probabilistic networks have been studied in many applications, such as protein-protein interaction networks, social networks, and co-authorship networks [1]. Modularity is a fundamental measure to study networks, for example, to study communities and core-periphery structures. However, while the problem of finding modules in probabilistic networks has received some attention in the literature [2, 3], no existing work has studied how to extend modularity to this context. A natural way to use modularity with probabilistic networks is to replace it with its expected value. This is a common approach to dealing with probabilistic edges, often used for other measures such as the expected degree [4]. However, even assuming that edge probabilities are independent, there is no known simple way of computing the expected value of modularity other than enumerating all possible worlds, computing modularity in each of them, and calculating the weighted average. The set of all possible worlds is the set of all graphs obtained by including or excluding each edge. This approach scales exponentially with the number of edges. To avoid computing an exponential number of modularities, one for each possible world, we instead propose to partition the set of possible worlds, so that for each partition we can (1) compute the expected value of modularity for that partition, and (2) compute the probability of the partition. This can reduce the computational complexity, making it proportional to the number of partitions and to the time needed to compute the probability of the partitions and the average modularity. In particular, we show that for some partitioning schemes, the probabilities of the partitions can be approximated in O(|E|^2), where |E| is the number of edges. In Figure 1, we compare the relative error d_r between the modularity computed via an exact method that enumerates all possible worlds, and the modularity computed using our method, which calculates an approximate value by partitioning the set of possible worlds. The experiments reported in Figure 1 cover smaller networks, where an exact computation of modularity is not prohibitive in terms of runtime. The current number of partitions is exponential with respect to the maximum degree (instead of the network size), and we are currently working on further reducing the number of partitions. From this figure, we can see that the relative error between the exact and approximated modularities is very small, while the runtime is several orders of magnitude smaller with our method. [1] Amin Kaveh. Modelling and Analysis of Probabilistic Networks. PhD thesis, Acta Universitatis Upsaliensis, 2022. [2] George Kollios, Michalis Potamias, and Evimaria Terzi. Clustering large probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 25(2):325–336, 2011. [3] Chi Zhang and Osmar R. Zaïane. Detecting local communities in networks with edge uncertainty. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 9–16. IEEE, 2018. [4] Amin Kaveh, Matteo Magnani, and Christian Rohner. Comparing node degrees in probabilistic networks. Journal of Complex Networks, 7(5):749–763, 2019. |
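For intuition about why enumeration is avoided: a Monte-Carlo estimate of expected modularity over sampled possible worlds (a simple baseline for comparison, not the partitioning method proposed in the talk) looks like this:

```python
# Expected modularity of a probabilistic network by sampling possible worlds;
# each edge of G_prob carries a 'p' attribute with its existence probability.
import numpy as np
import networkx as nx

def expected_modularity(G_prob, communities, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        H = nx.Graph()
        H.add_nodes_from(G_prob)
        H.add_edges_from(e for e in G_prob.edges
                         if rng.random() < G_prob.edges[e]["p"])
        if H.number_of_edges():
            vals.append(nx.community.modularity(H, communities))
    return float(np.mean(vals))
```

Exact enumeration would replace the sampling loop with all 2^|E| subgraphs, which is precisely the cost the partitioning scheme is designed to avoid.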
15:45 | Ensembles of reconstructed networks from dynamics PRESENTER: Brennan Klein ABSTRACT. When studying complex systems where the network structure is not obviously or immediately accessible, an adjacency matrix is often inferred by estimating likely connections, co-occurrence, or influence between the entities of a system. This is commonly done from time series data of a system's activity, leveraging differences between the activity at one time step and the activity at another as the basis for inferring interactions between nodes. Many techniques to reconstruct networks from time series data have been developed over the last several years (see [1,2] for two recent approaches). We define a (G,D,R) ensemble of random graphs as a framework for systematizing the study of networks reconstructed from time series of dynamics. This ensemble offers a number of conceptual and mathematical benefits, largely because networks sampled from a particular (G,D,R) ensemble explicitly encode information about the data generating process. Under this framework, graphs are created via a network reconstruction method, R, applied to time series data of dynamics, D, that were observed on the nodes of a "ground truth" network, which itself was sampled from an ensemble of graphs, G. This approach not only gives us a way to sample random graphs, but also a framework for developing and testing theories of complex systems, which invariably require models of the data generating process. We discuss this and related properties of graphs sampled from (G,D,R) ensembles in different settings, introducing a new perspective on the fundamental challenge of modeling complex systems, and potential ways to connect this framework to recent advances in our statistical understanding of network modeling [3]. |
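A toy instance of a (G,D,R) ensemble, with all three ingredients chosen for illustration (Erdős-Rényi prior, stable linear dynamics, and correlation thresholding as the reconstruction heuristic mentioned above):

```python
# Sample G, run dynamics D on it, reconstruct with R, then compare A_hat to A.
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
G = nx.erdos_renyi_graph(50, 0.1, seed=1)            # G: graph ensemble sample
A = nx.to_numpy_array(G)

x = rng.normal(size=50)
series = []
for _ in range(2000):                                # D: linear stochastic dynamics
    x = 0.5 * x + 0.02 * A @ x + rng.normal(scale=0.1, size=50)
    series.append(x.copy())

C = np.corrcoef(np.array(series).T)                  # R: correlation thresholding
A_hat = (np.abs(C) > 0.3).astype(int)
np.fill_diagonal(A_hat, 0)
print("edge recovery rate:", (A_hat * A).sum() / A.sum())
```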
14:15 | Causality from Correlation ABSTRACT. Interpreting correlation as causality is a well-known fallacy, but in many applications we need to understand causality while observing only correlations. In his Nobel-prize-winning work, Granger addressed this challenge by defining causality as correlation with a time delay. However, Granger causality provides only an ad-hoc answer to a problem that calls for a fundamental solution. In a recent paper we proposed a different approach by which causality in a system can be computed from instantaneous correlations if it is known that some elements of the system do not have a direct causal connection. In this talk I will present this approach and show that the necessary prerequisites are commonly met in networks. I illustrate this with simulated data from an ecological meta-foodweb with multiplex structure, where the proposed method successfully recovers the ground-truth causality. In summary, these results show that in networks it is often possible to compute causality from correlations. |
14:30 | Pattern reconstruction through generalised eigenvectors PRESENTER: Marie Dorchain ABSTRACT. One of the most widespread theories to explain the emergence of patterns is due to Alan Turing. His theory considers a reaction-diffusion system of two species, whose stable spatially homogeneous equilibrium is disturbed in order to obtain a new inhomogeneous state, called a Turing pattern [1]. We consider its extension to a network, where the two species interact on the nodes and diffuse through the links [2]. In the latter paper, the authors proved that the emergence of the instability can be related to the spectrum of the Laplace matrix associated with the underlying network; in particular, there should exist an eigenvalue that returns a positive Lyapunov exponent, named for simplicity an "unstable" eigenvalue. One can then show that in the short-time regime, the pattern can be described by a linear combination of the eigenvectors associated with the unstable eigenvalues of the Laplacian. Interestingly enough, the same result holds true for the asymptotic regime. The above claim relies on the existence of a basis of eigenvectors. We hereby introduce and study the case of defective networks, i.e., networks whose Laplacian matrix is not diagonalisable and thus does not have a linearly independent set of eigenvectors. Our aim is to show the effect of generalised eigenvectors on the Turing pattern. First, we prove that the theory developed so far [2] can be extended to this new setting, providing sufficient conditions for the emergence of Turing patterns; then, using the Brusselator model as a generic nonlinear system defined on a non-normal network [3], we illustrate the impact of generalised eigenvectors on pattern reconstruction. As shown in Fig. 1, when only the eigenvectors are used (panel a)), the pattern is not fully recovered, whereas the pattern is reconstructed almost perfectly when the generalised eigenvectors are also considered (panel b)). Our results therefore reveal the importance of the generalised eigenvectors: conditions for the Turing instability can be found, and the pattern can now be reconstructed. |
14:45 | Self-organized waves in excitable dynamics on graphs PRESENTER: Marc Huett ABSTRACT. How classical spatiotemporal patterns can be detected and studied in networks is an open question of relevance to brain dynamics, epidemic diseases, and information spreading. In this talk we summarize our research on self-organized waves in a stylized (three-state) model of excitable dynamics on graphs and outline new directions. Our methodology allows studying nonlocal activity patterns in graphs hosting excitable dynamics. It allows the identification of those self-organized nonlocal patterns by the small sets of nodes acting as sources and the large set of links relaying activity in a preferential direction. Our results pave the way for advances in network controllability, performance tuning, and failure prediction. In this talk we furthermore outline how such wave phenomena also emerge in more realistic models of excitable dynamics, e.g., networks of coupled FitzHugh-Nagumo oscillators. |
15:00 | Emergence of geometric Turing patterns in complex networks PRESENTER: Jasper van der Kolk ABSTRACT. Turing patterns, arising from the interplay between competing species of diffusive particles, have long been an important concept for describing non-equilibrium self-organization in nature, and have been extensively investigated in many chemical and biological systems. Historically, these patterns have been studied in extended systems and lattices. Recently, the Turing instability was found to produce topological patterns in networks with scale-free degree distributions and the small-world property, although with an apparent absence of geometric organization. While hints of explicitly geometric patterns have been found in simple network models (e.g., Watts-Strogatz), the question of the exact nature and morphology of geometric Turing patterns in heterogeneous complex networks remains unresolved. In this work, we study the Turing instability in the framework of geometric random graph models, where the network topology is explained by an underlying geometric space. We demonstrate not only that geometric patterns can be observed, but also that their wavelength can be estimated by studying the eigenvectors of the annealed graph Laplacian. Finally, we show that Turing patterns can be found in geometric embeddings of real networks. These results indicate that there is a profound connection between the function of a network and its hidden geometry, even when the associated dynamical processes are exclusively determined by the network topology. |
15:15 | Information networks analysis of time series with applications to climate tipping points PRESENTER: Béatrice Désy ABSTRACT. Seemingly random and abrupt climate transitions might be a fundamental characteristic of the global climate system. Yet, the properties of such transitions are seldom captured by traditional climate models, which often smooth out or average over most small-scale fluctuations. Mapping time series to complex networks has been investigated over the last two decades and has proven successful in capturing dynamical transitions and offering insight into imperfect yet realistic temporal data. Here we introduce a new mapping from time series to complex networks and use it to investigate the timescales of climate systems, with applications to climate tipping points. [See extended abstract.] |
15:30 | Perturbed off scale -- Genuinely Nonlinear Network Responses and Tipping Points PRESENTER: Marc Timme ABSTRACT. External fluctuation signals and internal nonlinearities determine the emergent dynamics of complex networked systems and may cause a system to cross tipping points towards undesired, unpredictable or failure states. Due to the nonlinearities, perturbative techniques such as linear response theory dominate their analyses, relying on the assumption that, to lowest-order approximation, observables respond linearly with increasing signal strength. Here we report that, in contrast, averages of generic fluctuation-induced system responses are genuinely nonlinear, such that linear response theory is intrinsically incapable of describing them. We demonstrate why standard perturbation theory of any order is moreover incapable of capturing tipping points, and we propose an alternative nonlinear perturbation approach that helps to predict tipping points. These nonlinearities, to the best of our knowledge, have remained obscure in the literature so far. We illustrate examples in physical, biological and engineered systems, thereby bridging from a motivating example in a power grid model to a diverse set of open questions. [1] M. Timme and M. Schröder, Disentangling scaling arguments to empower complex systems analysis, Nature Phys. 16:1086 (2020) https://www.nature.com/articles/s41567-020-01063-5 [2] M. Thümler, M. Schröder and M. Timme, IFAC papers online 55:254 (2022). https://doi.org/10.1016/j.ifacol.2022.11.061 |
15:45 | Predicting Influential Higher-Order Patterns in Temporal Network Data PRESENTER: Christoph Gote ABSTRACT. Networks are frequently used to model complex systems comprised of interacting elements. The edges of a network capture the topology of direct interactions between elements. However, we argue that the true complexity of many systems originates from higher-order patterns in the paths by which nodes can indirectly influence each other. In this talk, we show that by capturing only direct interactions, network models underfit higher-order patterns present in data. Consequently, they misidentify influential nodes in complex networks. Instead, models based on path data, i.e., ordered sequences of consecutive direct interactions, can capture these patterns. However, to avoid overfitting, such models should only consider those higher-order patterns for which the data provide sufficient statistical evidence. We propose five centrality measures based on MOGen, a multi-order generative model that accounts for all indirect influences up to a maximum distance but disregards influences at higher distances. We compare MOGen-based centralities to equivalent measures for both network and path models in a prediction experiment, where we aim to identify influential nodes in out-of-sample data. We find that MOGen consistently outperforms both models, especially for social networks, where the range of possible interactions is extensive and data is limited. We further show that the performance difference between MOGen and the path model disappears if we have sufficient observations. This means that network models indeed underfit, and path models overfit, indirect interaction patterns in sequence data. Our MOGen-based centrality measures instead provide significantly more accurate predictions of the truly influential nodes and node sequences. Our full paper on this topic was recently published at the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2022), where it received the best paper award. |
14:15 | Spatial immunization to abate disease spreading in transportation hubs ABSTRACT. Close-proximity social interactions are crucial for the transmission of infectious diseases, and crowded agglomerations pose a serious risk of triggering superspreading events. Locations like transportation hubs (airports and stations) are designed to optimize logistic efficiency, not to reduce crowding, and are characterized by a constant in- and outflow of people. Here, we analyze the paradigmatic example of London Heathrow, one of the busiest European airports, at the scale of 10x10-meter cells. Thanks to a dataset of anonymized individuals' GPS traces and a trajectory reconstruction process, we build a temporal co-presence network encoding the movements and co-presences of a set of agents, corresponding to passengers and workers, within the whole airport perimeter on a typical day. On this network, we model the spreading of diseases employing compartmental models with parameters adapted to several real infections such as SARS, influenza or COVID-19. With these simulations, we find the areas of the airport with the largest number of contagions (hotspots; see the right-hand panels of the Figure). These hotspots are not simply the cells with the largest agent presence, but the product of a combination of presence and time spent in the area. Inspired by recent results on the use of ultraviolet technologies for disinfection [1], we study the effect of reducing the infectivity in these hotspots in terms of the reduction in the number of contagions and the impact on the potential destinations of the passengers (see Figure panels a-c). By analyzing the average destinations of each terminal, we also detect the destinations most vulnerable to outbreaks developed at Heathrow airport and quantify the benefits of the spatial immunization technique in preventing regional and global disease diffusion. This method is immediately generalizable to train, metro and bus stations and to other facilities such as commercial or convention centers. The results of this work will appear soon in a manuscript already accepted for publication in Nature Communications [2]. |
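A deliberately simplified sketch of the simulation idea follows: an SIR process over synthetic co-presence events in which transmissibility is reduced in designated hotspot cells, mimicking UV disinfection. All agents, events and parameters are invented for illustration.

```python
# Illustrative sketch only: SIR dynamics on synthetic co-presence events,
# with transmissibility reduced in designated "hotspot" cells.
import random

random.seed(1)
n_agents, n_cells = 200, 50
hotspots = set(range(10))        # cells selected for spatial immunization
reduction = 0.7                  # 70% infectivity reduction in hotspots
beta, gamma = 0.05, 0.1
state = ["S"] * n_agents
state[0] = "I"

for t in range(100):
    # random co-presence events: (agent_i, agent_j, cell)
    events = [(random.randrange(n_agents), random.randrange(n_agents),
               random.randrange(n_cells)) for _ in range(300)]
    for i, j, cell in events:
        rate = beta * (1 - reduction) if cell in hotspots else beta
        for src, dst in ((i, j), (j, i)):
            if state[src] == "I" and state[dst] == "S" and random.random() < rate:
                state[dst] = "I"
    for a in range(n_agents):
        if state[a] == "I" and random.random() < gamma:
            state[a] = "R"

print(sum(s != "S" for s in state), "agents ever infected")
```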
14:30 | The limits of human mobility traces to predict the spread of COVID-19 PRESENTER: Laetitia Gauvin ABSTRACT. Mobile phone data have been widely used to model the spread of COVID-19; however, quantifying and comparing their predictive value across different settings is challenging. Their quality is affected by various factors and their relationship with epidemiological indicators varies over time. Here we adopt a model-free approach based on transfer entropy to quantify the relationship between mobile phone-derived mobility metrics and COVID-19 cases and deaths in more than 200 European subnational regions. We found that past knowledge of mobility does not provide statistically significant information on COVID-19 cases or deaths in most of the regions. In the remaining ones, measures of contact rates were often more informative than movements in predicting the spread of the disease, while whether mid-range or short-range movements were the most predictive metric depended on the region considered. We finally identify geographic and demographic factors, such as users' coverage and commuting patterns, that can help determine the best metric for predicting disease incidence in a particular location. Our approach provides epidemiologists and public health officials with a general framework to evaluate the usefulness of human mobility data in responding to epidemics. |
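Below is a minimal plug-in (binned) transfer-entropy estimator in the spirit of the model-free approach described above; the real analysis is far more careful about binning, lags and significance testing, and the two series here are synthetic.

```python
# Minimal binned transfer-entropy estimator (illustrative, synthetic data).
import numpy as np

def transfer_entropy(x, y, bins=4):
    """TE from x to y with lag 1: I(y_t ; x_{t-1} | y_{t-1}), in bits."""
    edges = np.linspace(0, 1, bins + 1)[1:-1]
    xq = np.digitize(x, np.quantile(x, edges))
    yq = np.digitize(y, np.quantile(y, edges))
    t = np.array(list(zip(yq[1:], yq[:-1], xq[:-1])))  # (y_t, y_{t-1}, x_{t-1})
    def H(cols):
        _, counts = np.unique(cols, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()
    # TE = H(y_t, y_{t-1}) + H(y_{t-1}, x_{t-1}) - H(y_{t-1}) - H(all three)
    return H(t[:, :2]) + H(t[:, 1:]) - H(t[:, [1]]) - H(t)

rng = np.random.default_rng(0)
mobility = rng.normal(size=300)
cases = np.roll(mobility, 1) + 0.5 * rng.normal(size=300)  # mobility drives cases
print(transfer_entropy(mobility, cases))   # noticeably larger...
print(transfer_entropy(cases, mobility))   # ...than the reverse direction
```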
14:45 | The Effects of Seasonal Human Mobility Networks and Aedes aegypti Habitat Suitability on Zika Virus Transmission in Colombia PRESENTER: Brandon Lieberthal ABSTRACT. The Zika virus pandemic of 2015-16, which caused over 1 million confirmed or suspected human cases in the Caribbean and Latin America, was driven by a combination of movement of infected humans and availability of suitable habitat for key mosquito disease vector species. Both human mobility and mosquito vector densities vary seasonally, and the goal of our research was to analyze the interacting effects of disease vector presence and human movement across metapopulations on disease transmission intensity and the probability of super-spreader events. Specifically, we tested the hypotheses that 1) regions with the highest probability of mosquito presence will have more super-spreader events during dry months, when mosquitoes are predicted to be more abundant, and 2) regions reliant on tourism industries will have more super-spreader events during wet months, as they are more likely to contribute to network-level pathogen spread due to increased travel. We used the case study of Colombia, a nation with a population of about 50 million people, whose annual calendar can be partitioned into overlapping cycles of wet and dry seasons and tourism and off-tourism seasons that drive distinct cyclical patterns of mosquito occurrence and human movement. Our approach combined spatial modeling of mosquito presence with network modeling of human mobility. We used a Maximum Entropy model to estimate the spatially and temporally varying habitat suitability for Aedes aegypti, the primary mosquito vector of Zika virus in Colombia, in terms of environmental factors such as temperature, precipitation, and vegetation coverage and human factors indicating land use and development, socioeconomic status, and prevalence of standing water that provides key habitat for juvenile mosquitoes. We also used airline travel data to develop metapopulation patch network models of the 1,123 municipalities of Colombia for each month of 2015 and 2016. We combined our vector habitat suitability and human mobility network models to simulate the Zika virus epidemic as if it had begun in each month of 2015. Our results show that whether the first human case was introduced to the network during the wet vs. dry season and during the tourism vs. off-tourism season profoundly affects the severity and trajectory of the epidemic. For example, Zika virus was first detected in Colombia in October of 2015. Had it originated in January, a dry season month with high rates of tourism, it could likely have infected up to 30% more individuals, and up to 40% more super-spreader events may have occurred. In addition, popular vacation spots such as Barranquilla and Cartagena have the highest risk of super-spreader events during the winter, whereas densely populated areas such as Medellín and Bogotá are at higher risk of sustained transmission during dry months in the summer. Our research demonstrates that public health planning and response to vector-borne disease outbreaks requires a thorough understanding of how vector and host patterns vary due to seasonality in environmental conditions and human mobility dynamics. |
15:00 | Local vs. nationwide restrictions against COVID-19 pandemic in France between September 2020 and June 2021 PRESENTER: Chiara Elisa Sabbatini ABSTRACT. The end of 2020 saw the emergence of the first variant of concern (B.1.1.7 or Alpha variant), characterized by an advantage in transmissibility and severity. As mass vaccination campaigns were just starting to roll out, countries adopted different approaches to large-scale restrictions to manage the rise of cases through spring 2021. France applied national interventions, with a second lockdown in fall 2020, followed by a long-lasting curfew in the winter and a third lockdown in spring 2021. Other countries, for example Italy, adopted regionally-based restrictions with a tiered system. We developed a regionally-based, spatially-explicit epidemic metapopulation model informed by mobility fluxes estimated from mobile phone data and fitted the model to the regional trajectories of hospital admissions. The model integrated data on vaccination and the spread of the Alpha variant. Counterfactual scenarios were designed to compare the effectiveness and social cost of the interventions applied in France with alternative regionally-based restrictions. We find that regional interventions may lead to lower hospitalizations and shorter periods of restrictions. This outcome, however, depends non-trivially on the thresholds chosen to trigger and release the lockdowns. This range is also altered by the specific context, such as vaccination pace, spatial scale and heterogeneity. Our findings highlight the complexity of designing effective targeted strategies. A portfolio of different options needs to be considered in preparation for the next pandemic, depending on the specific profile of the emerging pathogen, the resulting epidemic context, and our understanding of these aspects in real time. |
15:15 | Epidemic spatial invasion of the first wave COVID-19 epidemic in Senegal PRESENTER: Giulia Pullano ABSTRACT. On March 2, 2020, the first COVID-19 cases were detected in Dakar, the capital of Senegal [1]. A few weeks later, a range of social distancing measures, such as a curfew and school closures, were enforced to contain the epidemic activity. Although these control policies flattened local epidemic activity, they did not contain national transmission. By late summer 2020, all Senegalese departments were affected by the epidemic, and the resulting invasion process was highly heterogeneous (Fig. 1a). The factors that drove this process remain unclear. Here we address this question by characterizing the role of human mobility and other structural aspects in shaping the epidemic in Senegal. We selected five phenomena-driven Levin models based on distance; origin population; destination population; distance and origin population; and distance, origin and destination population (the gravity model) [2]. To assess the impact of mobility restrictions, we also extracted three daily origin-destination mobility indicators from mobile phone data and integrated them into a spatially-explicit transmission model [3]. The three indicators couple locations according to the number of displacements between them (D), the time spent in each location by census block area (L), and the time spent in the most visited area (i.e., work/school places) by census block area (C), respectively. We found that the distance-based model and D do not model the initial phase of the invasion well, because they fail to capture long-range contagions (Fig. 1b), whereas the gravity model and C best reproduce the observed arrival times (Fig. 1b, c), suggesting that commuting fluxes drove the spatial invasion and were not strongly impacted by mobility restrictions. This characterization may help public health agencies to target control policies in high-risk areas based on commuting fluxes and to optimize resource allocation. |
15:30 | The path to dominance: Mobility networks and geographical heterogeneity in the establishment of the alpha variant in the US PRESENTER: Jessica Davis ABSTRACT. Since the beginning of the COVID-19 pandemic, an unprecedented effort has been devoted to monitoring the genetic evolution of the virus, which includes identifying potential variants of concern. The first highly publicized variant, the alpha variant (B.1.1.7 lineage), was identified in December 2020 by health authorities in the United Kingdom (UK). The speed of the spread of the alpha variant across the UK, during strict non-pharmaceutical interventions (NPIs), suggested a significant transmissibility advantage. Despite these unprecedented and critical efforts to monitor both the emergence and evolution of SARS-CoV-2 variants, genomic surveillance and phylogenetic analysis are subject to geographical biases in sampling and thus are only able to provide a partial view and understanding. Here, we complement these efforts by implementing a “multi-scale” modeling approach combining the global epidemic and mobility model (GLEAM) with a higher resolution metapopulation model within the United States (LEAM-US: Local epidemic and mobility model). GLEAM is integrated with international travel data that is used to model the timing and spatial distribution of imported infections within the US. The LEAM-US model then uses the international seeding pattern generated by GLEAM to simulate the local spread within the ~3,200 counties in the US. With this finer geographic resolution, the highly detailed subnational mobility networks within the LEAM-US model allow us to better study the path to dominance of the alpha variant. To model the variant arrival time and spread we use a two-strain, age-structured, compartmental model that accounts for the timeline of policy interventions (e.g., stay-at-home orders, school closures, etc.) and the current rate of vaccination observed in different states. The model is calibrated using an Approximate Bayesian Computation approach on the weekly incidence of COVID-19 deaths within each contiguous US state. The results suggest that the variant began circulating towards the end of 2020 and by May 2021 was the dominant variant in most states (Fig. 1A). However, the time to dominance is geographically heterogeneous. This heterogeneity increases significantly at the scale of the combined statistical areas (CSAs) within states (Fig. 1B). CSAs that contain international travel hubs (e.g., large airports) and large populations experienced earlier introductions and circulation of infections caused by the new strain compared to more isolated CSAs with smaller populations. Within a given state, the time it takes for the variant to become dominant across each CSA could span a couple of months. Lastly, we validate our results against genomic surveillance by comparing the daily fraction of infections due to the variant from our model (median and 90% CI) with the data from Helix's surveillance dashboard (Fig. 1C). The methodology proposed in this work can inform future studies aimed at charting the spreading patterns and the time to reach dominance of new variants. |
15:45 | [CANCELLED] Integrating dynamical modeling and phylogeographic inference to characterize global influenza circulation PRESENTER: Francesco Parino ABSTRACT. The mechanisms underlying global seasonal influenza circulation involve a complex interplay between local (seasonality, demography, host immunity) and global factors (international mobility) shaping recurrent epidemic patterns. No study so far has reconciled the two spatial levels and compared the coupling between national epidemics. The uneven distribution of the human population across countries and seasonal areas, and the complex network of human travel occurring at both short- and long-range distances (i.e. from city-to-city commuting to international air travel), are key to this challenge. In addition, the heterogeneous coverage of epidemiological and virological data limits traditional approaches and calls for phylodynamic models integrating different data sources. Here, we combine a generalized linear model (GLM) of phylogeographic diffusion with a dynamical model of global influenza spread that integrates high-resolution population distribution, the global flight network, and intra-country commuting. We use the model to simulate the seasonal influenza circulation network across macro-regions of the world. We compare different parametrizations, i.e. different values of R_max (winter reproductive ratio), R_min (summer reproductive ratio) and D (average immunity duration). We then use model outputs as predictors in the GLM to provide model validation and parameter selection based on genetic data. The simulated influenza circulation network outperforms predictors based on raw air travel, so far considered to be the best predictor of global influenza migration. The inclusion probabilities for the different parameterizations showed a preference for a winter reproductive ratio as high as 1.5 and an average immunity duration of 6 years. The combined approach thus enables parameter selection, yielding a novel computational framework for describing global influenza dynamics at different scales. Our findings have important implications for improving preparedness against seasonal influenza epidemics. The approach can be generalized to other epidemic contexts, such as emerging disease outbreaks, to improve the flexibility and predictive power of modelling. |
14:15 | Vaccination homophily during the Covid-19 pandemic PRESENTER: Julia Koltai ABSTRACT. Homophily, a well-observed attribute of social networks, refers to the tendency of social contacts to be similar to one another [1]. Research on vaccine hesitancy has suggested that the predictors of vaccine refusal widely correlate with factors shaping interaction patterns in society, while they also cluster geographically [2, 3]. As attitudes and behaviors tend to spread within social networks, the vaccination decision of an individual is strongly influenced by social contacts, eventually leading to vaccination homophily and vaccination clustering [4]. The consequences can be significant, particularly in terms of vaccine hesitancy, which contributes to outbreaks of diseases [5]. This paper aims to survey and understand vaccination homophily in egocentric networks during the Covid-19 pandemic. We use a merged dataset of two nationally representative surveys conducted in Hungary in November 2021 and June 2022 (N=2000). Using a network diary approach [6], we measured the egocentric network by asking respondents to list all their personal interactions in the last 24 hours. The diary log contained information about the Covid-19 vaccination status of each alter. This allowed us to calculate vaccination rates in the network of each individual. The results suggest strong clustering depending on vaccination status. The mean vaccination rate in vaccinated individuals' networks is 91% compared to 49% in unvaccinated individuals' networks (MED = 100% vs. 50%). When focusing on household member contacts, this difference increases to 92% vs. 31% (MED = 100% vs. 0%), but decreases when considering other family members, friends, or colleagues. The network’s vaccination rate is indeed the strongest predictor of vaccination status (AME=0.329, SE=0.016), and vaccination homophily is widespread. Our models predicting homophily show that both a large network size and a high level of education decrease the probability of homophily among unvaccinated individuals, whereas being trapped in a filter bubble in social media increases it. In turn, a high level of education signals stronger homophily among vaccinated individuals. Our findings corroborate previous reports by showing that vaccination homophily is common and close contacts in egocentric networks may have played a crucial role in vaccination decisions during the Covid-19 pandemic. 1. M. McPherson, L. Smith-Lovin, J. M. Cook, Annual Review of Sociology. 27, 415–444 (2001). 2. G. Burgio, B. Steinegger, A. Arenas, Commun Phys. 5, 1–7 (2022). 3. J. Lee, Y. Huang, Vaccines. 10, 352 (2022). 4. P. Konstantinou et al., Vaccines. 9, 607 (2021). 5. T. Hiraoka, A. K. Rizi, M. Kivelä, J. Saramäki, Phys. Rev. E. 105, L052301 (2022). 6. Y. Fu, Social Networks. 27, 169–186 (2005). |
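A minimal sketch of the ego-level quantity described above, computed from an invented diary log: each ego's network vaccination rate is simply the share of vaccinated alters among their reported contacts.

```python
# Toy illustration (invented data): network vaccination rates from a
# 24-hour contact-diary log, as described in the abstract above.
diaries = {
    "ego1": {"vaccinated": True,
             "alters": [("household", True), ("friend", True), ("colleague", False)]},
    "ego2": {"vaccinated": False,
             "alters": [("household", False), ("friend", True)]},
}

for ego, d in diaries.items():
    statuses = [vax for _, vax in d["alters"]]
    rate = sum(statuses) / len(statuses)
    print(ego, "vaccinated" if d["vaccinated"] else "unvaccinated",
          f"network vaccination rate: {rate:.0%}")
```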
14:30 | The Association Between Homophily on Illicit Drug Use and PrEP Conversations Among Latino Men Who Have Sex with Men in Miami-Dade, Florida, US: A Social Network and Spatially Explicit Study PRESENTER: Mariano Kanamori ABSTRACT. Latino men who have sex with men (LMSM) experience a disproportionate burden of human immunodeficiency virus (HIV). While pre-exposure prophylaxis (PrEP) is a highly effective biomedical intervention to prevent HIV, LMSM are not initiating PrEP. Understanding and addressing barriers to PrEP initiation is important in Miami-Dade County, Florida, an HIV hotspot, and home to a large LMSM community. We will present an innovative approach that combines social network analysis (SNA) with geospatial analysis to inform PrEP programs for LMSM who misuse drugs. To gain an understanding of how social networks can promote and encourage PrEP use, this study 1) identified associations of drug use homophily and PrEP promotion conversations within LMSM friendship networks, and 2) described the physical overlap between geographic drug risk areas with areas where these friendship networks held PrEP conversations. Respondent-driven sampling was used to recruit 10 sociocentric networks of 13 LMSM friends (N=130). Quadratic assignment procedure (QAP) correlations and multiple regression QAPs were used to identify influences of drug use homophily. Geocoding and visualizations were used to describe the syndemic of HIV and drug use by identifying overlapping geospatial clusters of HIV and drug use. Friendship relationships in which both friends used cocaine or marijuana were more likely to report PrEP-related conversations in the past six months than non-drug user dyads. The likelihood of talking about PrEP in the next six months was higher among dyads with cocaine use homophily and ecstasy use homophily, while lower among dyads with marijuana use homophily. Figure 1 includes ten network visualizations with connections representing previous PrEP conversations between a pair of friends and future potential for PrEP information dissemination in the next six months. More than half of the dyads within five of the networks had previously discussed PrEP (networks 2.B, 2.C, 2.E, 2.G, and 2.I). In four of the networks, about a quarter or more of existing ties that had not previously discussed PrEP reported future potential for sharing information about PrEP (networks 2.D, 2.F, 2.H, 2.I). The two networks with the greatest potential for sharing PrEP information in the future contained at least one ecstasy user and at least four cocaine users (networks 2.D and 2.H). We will also present maps portraying how marijuana and cocaine risk areas were located throughout Miami-Dade County while ecstasy risk areas were mostly in urban areas. Most drug risk areas associated with PrEP conversations were located in north and central Miami. Overlaps between geographic drug risk areas and PrEP conversation areas were different for each type of drug (marijuana, cocaine and ecstasy). Current HIV prevention efforts should be strengthened, as evidenced by the increasing HIV infection rates among priority populations such as LMSM and those who use drugs. Instead of solely relying on social media or other mass communication efforts to increase PrEP-related information, future interventions can harness the power of friendship and drug use networks. 
This can include enrolling entire sociocentric friendship groups, configuring friendship networks to connect those with PrEP information to those without information, and incorporating peer leaders. |
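For readers unfamiliar with the QAP method named above, the sketch below implements a basic QAP correlation test on invented dyadic matrices: the observed matrix correlation is compared against a null distribution obtained by jointly permuting the rows and columns of one matrix (for simplicity the toy matrices are not symmetrized).

```python
# Sketch of a quadratic assignment procedure (QAP) correlation test.
import numpy as np

rng = np.random.default_rng(0)
n = 13                                            # one friendship network of 13 friends
drug_homophily = rng.integers(0, 2, (n, n))       # invented dyadic covariate
noise = rng.integers(0, 2, (n, n))
prep_talk = np.where(rng.random((n, n)) < 0.7,    # invented, correlated outcome
                     drug_homophily, noise)

mask = ~np.eye(n, dtype=bool)                     # ignore the diagonal (self-ties)
def corr(a, b):
    return np.corrcoef(a[mask], b[mask])[0, 1]

observed = corr(drug_homophily, prep_talk)
null = []
for _ in range(2000):
    p = rng.permutation(n)                        # permute node labels jointly
    null.append(corr(drug_homophily[np.ix_(p, p)], prep_talk))
pval = np.mean(np.abs(null) >= abs(observed))
print(f"QAP r = {observed:.2f}, p = {pval:.3f}")
```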
14:45 | Homophily-based social group formation in a spin glass self-assembly framework PRESENTER: Jan Korbel ABSTRACT. Homophily, the tendency of humans to attract each other when sharing similar features, traits, or opinions, has been identified as one of the main driving forces behind the formation of structured societies. Here we ask to what extent homophily can explain the formation of social groups, particularly their size distribution. We propose a spin-glass-inspired framework of self-assembly, where opinions are represented as multidimensional spins that dynamically self-assemble into groups; individuals within a group tend to share similar opinions (intra-group homophily), and opinions between individuals belonging to different groups tend to be different (inter-group heterophily). We compute the associated non-trivial phase diagram by solving a self-consistency equation for ‘magnetization’ (combined average opinion). Below a critical temperature, there exist two stable phases: one ordered with non-zero magnetization and large clusters, the other disordered with zero magnetization and no clusters. The system exhibits a first-order transition to the disordered phase. We analytically derive the group-size distribution that successfully matches empirical group-size distributions from online communities. https://journals.aps.org/prl/accepted/da079Y55He913c8068064f73e16ceb1e82c0b9365 |
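The self-consistency machinery can be illustrated with the textbook mean-field equation m = tanh(βm), solved by fixed-point iteration. This is a generic stand-in, not the authors' equations (their model exhibits a first-order transition, whereas this toy one is continuous); it only shows how an ordered, non-zero-magnetization solution appears below a critical temperature.

```python
# Generic illustration: solving m = tanh(beta * m) by fixed-point iteration.
import math

def magnetization(beta, m0=0.5, tol=1e-10):
    m = m0
    for _ in range(10_000):
        m_new = math.tanh(beta * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# beta = 1/T; the critical point of this toy equation is beta = 1
# (convergence is slow right at the critical point).
for beta in (0.5, 1.0, 1.5, 2.0):
    print(f"beta = {beta}: m = {magnetization(beta):.4f}")
```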
15:00 | The Centrality of Minorities under Triadic Closure and Homophily PRESENTER: Lisette Espín-Noboa ABSTRACT. Link formation in social networks is governed by a variety of well-known social mechanisms such as preferential attachment and homophily. The former describes the process in which individuals prefer to connect to well-connected or popular people, and the latter explains the tendency to connect to others based on similarity. Triadic closure, on the other hand, describes the formation of triangles when people connect to friends-of-friends, potentially amplifying induced homophily. In a system composed of two social groups of different size, these mechanisms are known to cause disparities in visibility and can force one group to the network's periphery. For a comprehensive understanding of how such network inequalities emerge, one needs to incorporate all of these fundamental link formation mechanisms in a single model. Previous modeling attempts usually lack either some critical link formation mechanism or an analysis of structural disparities emerging from the presence of a minority group. For instance, one line of work proposes a model that incorporates homophily and triadic closure for static networks, but lacks an explicit preferential attachment dynamic. Holme & Kim extend the Barabási-Albert scale-free growth model with a triadic closure mechanism, but do not consider homophily and groups. In contrast, Karimi et al. introduce the BA-Homophily model by adding homophily to the Barabási-Albert model, and vary the size of the minority group to show under which conditions minorities are over- or under-represented in top degree-ranks. However, they do not consider tie formation via triadic closure. |
15:15 | Impact of multidimensional homophily on intersectional minorities PRESENTER: Samuel Martin-Gutierrez ABSTRACT. Homophily, our tendency to connect with similar others, is known to dramatically affect minorities in social systems composed of two groups (minority and majority), relegating them to disadvantaged positions in the network. However, real-world systems can rarely be reduced to the interaction between two social groups. As complex social beings, we belong to multiple groups at the same time, so individuals can experience marginalization in several dimensions simultaneously (ethnicity, gender, socioeconomic status, etc.). These disadvantages accumulate in a nonlinear way, causing intersectional inequalities that further harm people at the intersection of several minority groups. In this work, we develop a network model of homophilic/heterophilic interactions with multidimensional attribute vectors. We use the model to comprehensively explore inequalities of social capital, not only for one-dimensional minority groups, but also for intersectional minorities. The relative size of intersectional groups (whether they are majority or minority) is controlled by the correlation between the attributes (also called consolidation). We find a complex interaction between attribute correlation and homophily. Global inequality is higher for highly heterophilic or highly homophilic networks, but higher homophily also leads to higher between-group inequality. Correlation does not affect global inequality. However, it strongly conditions the source of inequality, transferring it between the within-group and between-group components. Finally, in agreement with previous works, inequality decreases when the group sizes are balanced. |
15:30 | Statistical inference of homophily with network ensembles PRESENTER: Sina Sajjadi ABSTRACT. Motivation: Homophily, or the preference to connect with similar individuals, is a fundamental factor influencing network structures. It has been observed that people tend to link with those of the same race, gender, and religion, which can lead to inter-group inequalities at the network level. Measuring direct homophilic tendencies, or link creation preferences, is essential for understanding the underlying micro-level behaviors of network agents and for studying the network. Previous works: Measuring homophily in social networks is challenging due to its interplay with other social tendencies such as triadic closure and preferential attachment, which can obscure actual individual preferences or choice homophily. To account for these factors, previous approaches have used generative mechanistic network models to identify tie formation probabilities related to choice homophily. However, these methods do not account for disparities in group size, node activity, and other network properties that can bias homophily estimations. Additionally, these methods often lack a measure of uncertainty or error bars, a necessity for scientific measurements. Methodology: We leverage statistical models of network ensembles to infer the degree of homophily within a network. To do so, we first devise a generative random model with a hypothesized value of homophily $h$, where $h=1$ represents full homophily, $h=0$, full heterophily, and $h=0.5$, neutral preference (or indifference). In order to isolate the homophilic effect from other connection preferences, such as those dictated by group size and node activity/degree, we generate these random networks while preserving the degree sequence $K$ and then rewire the links based on $h$. By combinatorially counting the number of possible network configurations for a given number of inter-group links $m$, we obtain a closed analytical expression for the probability distribution $P(m|K;h)$ that describes the expected number of inter-links for a given value of homophily $h$ and degree sequence $K$. To infer the homophily parameter from an empirical network with a given $m$, we maximize a likelihood measure $L(h|K;m)$ based on $P(m|K;h)$, which represents the likelihood of a certain $h$ value given the observed number of inter-links $m$ and degree sequence $K$. Results: We test the inference framework using synthetic network models with known values of homophily preferences. The inference method accurately recovers the input values for the homophily preferences. The results for the Barabási–Albert (BA) model with homophily \cite{karimi2018homophily} are very robust with respect to group sizes and to the existence of preferential attachment (Fig. \ref{fig:homophily-inference}a), and moderately robust when we include triadic closure in the generative model (Fig. \ref{fig:homophily-inference}b), even though our inference model is agnostic about the underlying generative model and does not directly take preferential attachment and triadic closure into account. We applied our model to study gender homophily in scientific collaborations using the American Physical Society (APS) dataset, which contains data between 1940 and 2010. We use the gender labels previously inferred \cite{kong2022influence} and group the links formed over intervals of 10 years. Then, we used our framework to obtain the homophily of the network within each interval. Results (Fig. \ref{fig:homophily-inference}c) show a small but significant homophilic preference, indicating scientists' tendency to collaborate with individuals of the same gender. This bias remains consistent over time, despite varying degrees of female participation and activity. |
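A toy stand-in for the final inference step (not the paper's closed-form $P(m|K;h)$): if each of $E$ links were independently inter-group with a probability interpolating around the neutral baseline $q$ set by group sizes, $h$ could be recovered by grid-search maximum likelihood, as sketched below with invented numbers.

```python
# Toy maximum-likelihood recovery of a homophily parameter h (stand-in
# model: binomial inter-group link counts; not the paper's P(m|K;h)).
import math

def log_likelihood(h, m, E, q):
    # p interpolates: h=0 -> p=1 (heterophily), h=0.5 -> p=q, h=1 -> p=0
    p = ((1 - h) * q) / ((1 - h) * q + h * (1 - q))
    return m * math.log(p) + (E - m) * math.log(1 - p)

E, q = 1000, 0.5          # total links; neutral inter-group baseline
m_obs = 320               # observed inter-group links (synthetic)
grid = [i / 1000 for i in range(1, 1000)]
h_hat = max(grid, key=lambda h: log_likelihood(h, m_obs, E, q))
print(f"inferred homophily h ≈ {h_hat:.3f}")   # > 0.5, i.e. homophilic
```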
15:45 | Heterogeneity- and homophily-induced vulnerability of a p2p network formation model: the IOTA autopeering protocol PRESENTER: Carlo Campajola ABSTRACT. The IOTA protocol is a distributed ledger that uses a peer-to-peer overlay network to relay information across the system and achieve consensus. In an upcoming upgrade it has been proposed to introduce “mana”, a scarce quantity associated with a peer that would tokenize its reputation, as a sybil protection mechanism. Mana works within a pseudo-random peer-to-peer network formation model, the IOTA autopeering module, which favours homophilic connections among nodes in a close mana range, leading to strongly homophilic networks. In this work we present two malicious strategies to deploy eclipse attacks on the peer-to-peer network, taking advantage of the peculiarities of the model and its dependency on the mana distribution. |
Network Economics
- Alexandra Brintrup (University of Cambridge, UK): Understanding complex supply networks: an interdisciplinary journey
- Christian Diem (Complexity Science Hub, Austria): Systemic risk and shock propagation in firm-level supply networks
- Cesar A. Hidalgo (University of Toulouse, France): What's new in economic complexity research?
- Guido Caldarelli (Ca' Foscari University of Venice, Italy): The physics of financial networks
- Giulio Cimini (University of Rome Tor Vergata, Italy): Reconstruction and Validation of Economic and Financial Networks
Huawei Focus Session on Networks and Computation
- Claudio Gallicchio (University of Pisa): Reservoir Computing and Beyond
- Wei Lin (Fudan University, China): Prediction and modulation of complex systems: From a viewpoint of machine learning
- Silvia Ortín (Universitat de les Illes Balears): Physical Reservoir Computing: Advancements and Applications in Photonic Reservoir Computing
- Jie Sun (Huawei, Hong Kong): Bifurcation behaviors shape how continuous physical dynamics solves discrete Ising optimization
- Zoltan Toroczkai (University of Notre Dame): Towards understanding the fundamental limits of analog, continuous-time computing
The Networks & Computation focus session is supported by Huawei.
Network Biology
- Réka Albert (Pennsylvania State University, USA): Computer-aided workflow for Boolean model optimization allows comprehensive integration of biological knowledge
- Madalena Chaves (Centre de recherche INRIA Sophia-Antipolis Méditerranée, France): Cycle dynamics and synchronization analysis for biological oscillators
- Pascal Falter-Braun (Helmholtz Institute of Network Biology, Germany): Interactome networks to understand genetic and infectious diseases and facilitate drug discovery
- Erzsébet Ravasz Regan (The College of Wooster, USA): Single-cell Network Modeling of Mechanosensitive Routes through the Epithelial to Mesenchymal Transition and their Inhibition by Mitochondrial Dysfunction
- Ulrich Stelzl (University of Graz, Austria): Coordination of post-translational protein modifications within networks
16:30 | A More Descriptive Null Model for Assessing Data Mining Results PRESENTER: Giulia Preti ABSTRACT. We introduce a novel null model for assessing the results obtained from analyzing an observed transactional dataset using statistical hypothesis testing. Our null model maintains more properties of the observed dataset than existing models. Specifically, we preserve the Bipartite Joint Degree Matrix of the bipartite graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to the item supports and the transaction lengths, which are the properties considered by previous works. We develop Alice, a suite of two Markov chain Monte Carlo algorithms for sampling datasets from our null model, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than those previously considered in the literature. |
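For readers unfamiliar with this family of methods, the sketch below shows the generic swap-based MCMC recipe on a toy transactional dataset. The plain swaps shown preserve only item supports and transaction lengths; Alice's state space and moves additionally preserve the Bipartite Joint Degree Matrix, which this sketch does not attempt.

```python
# Generic swap-randomization MCMC for transactional data (not Alice).
import random

random.seed(0)
transactions = [{"a", "b", "c"}, {"b", "d"}, {"a", "d"}, {"c", "d"}]

def swap_step(ts):
    i, j = random.sample(range(len(ts)), 2)
    # Items exclusive to each transaction can be traded without changing
    # any item's support or any transaction's length.
    only_i, only_j = list(ts[i] - ts[j]), list(ts[j] - ts[i])
    if only_i and only_j:
        x, y = random.choice(only_i), random.choice(only_j)
        ts[i].remove(x); ts[i].add(y)
        ts[j].remove(y); ts[j].add(x)

for _ in range(10_000):   # burn-in / mixing
    swap_step(transactions)
print(transactions)       # a sample from the (simpler) null model
```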
16:45 | Proper network randomization is key to assessing social balance PRESENTER: Bingjie Hao ABSTRACT. Repeating patterns, or motifs, have been widely studied to understand the balance in signed social networks. An important step is to compare the observed motif frequency to that of a randomized network (null model) to identify significant motifs. Existing randomization methods either disrupt the topology of the network, leading to fewer motifs in general, or treat the positive and negative links equivalently, ignoring the fact that some nodes have a tendency towards mostly negative or positive links. In fact, large-scale real-life networks show a poor correlation between the positive and negative degree of each node. To incorporate such biases, node-level signed degrees need to be considered as constraints in network randomization. Here, we propose “STP randomization” that preserves both the signed degrees and topology of the network based on the maximum-entropy framework [1]. We show that with proper randomization, the motif results change qualitatively and most social networks satisfy strong structural balance rather than a weaker notion as found previously [2]. This finding holds both at the level of triangles and larger motifs. We discuss the potential mechanisms behind this finding and further applications of the proposed network randomization benchmark. References: [1] Kovács, I. A., Barabási, D. L. & Barabási, A.-L. Uncovering the genetic blueprint of the C. elegans nervous system. Proc Natl Acad Sci USA 117, 33570–33577 (2020). [2] Leskovec, J., Huttenlocher, D. & Kleinberg, J. Signed networks in social media. In Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10, 1361 (ACM Press, 2010). [3] Rao, A. R., Jana, R. & Bandyopadhyay, S. A Markov Chain Monte Carlo Method for Generating Random (0, 1)-Matrices with Given Marginals. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 58, 225–242 (1996). [4] Szell, M., Lambiotte, R. & Thurner, S. Multirelational organization of large-scale social networks in an online world. Proceedings of the National Academy of Sciences 107, 13636–13641 (2010). |
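As a minimal illustration of randomization that preserves signed degrees (but, unlike the STP method above, not the underlying topology), one can run standard double-edge swaps separately within the positive and the negative edge sets, so each node keeps both its positive and its negative degree. The toy edge lists below are invented.

```python
# Signed-degree-preserving randomization via per-sign double-edge swaps.
import random

random.seed(0)

def double_edge_swap(edges, tries=1000):
    # edges: list of frozensets {u, v} for an undirected simple graph
    for _ in range(tries):
        e1, e2 = random.sample(edges, 2)
        a, b = tuple(e1); c, d = tuple(e2)
        new1, new2 = frozenset({a, d}), frozenset({c, b})
        # require 4 distinct endpoints and no multi-edges
        if len({a, b, c, d}) == 4 and new1 not in edges and new2 not in edges:
            edges.remove(e1); edges.remove(e2)
            edges.extend([new1, new2])

pos = [frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]]
neg = [frozenset(e) for e in [(0, 2), (1, 3), (0, 3), (2, 4)]]
double_edge_swap(pos)
double_edge_swap(neg)
print([tuple(e) for e in pos], [tuple(e) for e in neg])
```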
17:00 | Generative models for two-ground-truth partitions in networks PRESENTER: Lena Mangold ABSTRACT. A myriad of approaches have been proposed to characterise the mesoscale structure of networks – most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network’s mesoscale structure. Yet, even multiple runs of a given method can sometimes return diverse and conflicting results, yielding entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect coexisting bi-community and core-periphery structures of different strengths. We find that the ability to detect the two partitions individually varies considerably by SBM variant and that coexistence of both partitions is recovered only for a small subset of our planted structures, i.e. for a surprisingly small region of the space of underlying parameters. Our findings suggest that in many cases only one, in some way dominating, structure can be detected, even in the presence of other structures. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance structure coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks. |
17:15 | Non Preferential Node Ranking Dynamics PRESENTER: Shahar Somin ABSTRACT. During the last decades, significant research efforts were directed towards the understanding of complex systems and the dynamics they undergo. To date, most studies concentrated on analyzing the degree-based absolute popularity of nodes, demonstrating evident preferential attachment and detachment patterns. However, numerous domains assign greater importance to node relative popularity as exhibited by their ranked degrees. Indeed, from science to pop music, through economy, industry and online search engines, ranking items is at the heart of all social activities. While node ranking has attracted some attention from the scientific community, a modeling of the entire system's ranking evolution is still lacking. In particular, the degree-based preferential hypotheses raise some fundamental questions that have yet to be addressed. First, do preferential attachment dynamics apply also to node ranking? Namely, is the rate at which nodes gain or lose their rank a monotonic function of their prior ranks? Second, if preferential patterns do not apply to ranking evolution, what is its functional form and which forces govern its behavior? In our study we attend to the above questions by performing both theoretical and empirical analyses. We show for the first time that preferential principles do not apply to relative popularity evolution and that ranking dynamics follows a non-monotonic, inverse U-shaped, curve. In order to better comprehend these dynamical patterns, we propose an agent-based model which depends on the number of nodes (agents, N) and edges (agents' competitions, H) within the system. Employing this model, we derive the average change in rank $ R_{\updownarrow} $ as a function of the current rank r. Panel A in Fig. 1 demonstrates the comparison between the dynamics established in the empirical data and our agent-based model. The encouragingly strong fit substantiates our hypothesis regarding the non-monotonic dynamics of relative node popularity. Panel B in Fig. 1 presents the evaluation of $ R_{\updownarrow} $ under various parameter settings. While the exact numerical characteristics of each plot vary with the chosen parameters, they all present an inverse U-shaped curve. Apart from revealing an inherent difference between the evolution of the absolute and relative popularity measures, the exhibited non-monotonic evolution of node ranking may shed new light on observed yet hitherto unexplained phenomena. Specifically, it might elucidate the enabling conditions for the formation of dynamic or stationary systems, manifested by the extent of ranking fortification by the top-ranked nodes and the extent of ranking volatility in the slightly lower-ranked nodes. These results not only deepen our comprehension of complex networks’ popularity dynamics, but might also provide policy-makers and regulators with intervention means for tuning systems’ properties in order to control the extent of their internal mobility. |
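A rough sketch of how such an agent-based model can be simulated, and how the average rank change can be estimated as a function of prior rank, follows; the competition rule and all parameters are invented stand-ins, not the authors' model, and serve only to show how an estimate of $ R_{\updownarrow}(r) $ is assembled.

```python
# Toy agent-based competitions: estimate average rank change vs prior rank.
import random
import numpy as np

random.seed(0)
N, H = 200, 2000            # agents and competitions per epoch (invented)
scores = np.zeros(N)

def ranks(s):
    return np.argsort(np.argsort(-s))   # rank 0 = most popular

delta = np.zeros(N)          # accumulated |rank change| per prior rank
counts = np.zeros(N)
for epoch in range(50):
    r_before = ranks(scores)
    for _ in range(H):
        i, j = random.sample(range(N), 2)
        scores[random.choice((i, j))] += 1   # random winner of the contest
    r_after = ranks(scores)
    change = np.abs(r_after - r_before)
    for a in range(N):
        delta[r_before[a]] += change[a]
        counts[r_before[a]] += 1

avg_change = delta / counts   # empirical R_updown as a function of rank r
print(avg_change[:5], avg_change[N//2 - 2:N//2 + 3], avg_change[-5:])
```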
17:30 | Dynamic hidden-variable network models PRESENTER: Harrison Hartle ABSTRACT. Models of complex networks often incorporate node-intrinsic properties abstracted as hidden variables. The probability of connections in the network is then a function of these variables. Real-world networks evolve over time and many exhibit dynamics of node characteristics as well as of linking structure. Here we introduce and study natural temporal extensions of static hidden-variable network models with stochastic dynamics of hidden variables and links. The dynamics is controlled by two parameters: one that tunes the rate of change of hidden variables and another that tunes the rate at which node pairs reevaluate their connections given the current values of hidden variables. Snapshots of networks in the dynamic models are equivalent to networks generated by the static models only if the link reevaluation rate is sufficiently larger than the rate of hidden-variable dynamics or if an additional mechanism is added whereby links actively respond to changes in hidden variables. Otherwise, links are out of equilibrium with respect to hidden variables and network snapshots exhibit structural deviations from the static models. We examine the level of structural persistence in the considered models and quantify deviations from staticlike behavior. We explore temporal versions of popular static models with community structure, latent geometry, and degree heterogeneity. While we do not attempt to directly model real networks, we comment on interesting qualitative resemblances to real systems. In particular, we speculate that links in some real networks are out of equilibrium with respect to hidden variables, partially explaining the presence of long-ranged links in geometrically embedded systems and intergroup connectivity in modular systems. We also discuss possible extensions, generalizations, and applications of the introduced class of dynamic network models. For further information, see doi:10.1103/PhysRevE.103.052307. |
16:30 | Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection PRESENTER: Tiago Peixoto ABSTRACT. The task of community detection, which aims to partition a network into clusters of nodes to summarize its large-scale structure, has spawned the development of many competing algorithms with varying objectives. Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model, while other methods are descriptive, dividing a network according to an objective motivated by a particular application, making it challenging to compare these methods on the same scale. In this work, we present a solution to this problem that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model. This allows us to compute the description length of a network and its partition under arbitrary objectives, providing a principled measure to compare the performance of different algorithms without the need for "ground truth" labels. Our approach also gives access to instances of the community detection problem that are optimal to any given algorithm, and in this way reveals intrinsic biases in popular descriptive methods, explaining their tendency to overfit. Using our framework, we compare a number of community detection methods on artificial networks, and on a corpus of over 500 structurally diverse empirical networks. We find that more expressive community detection methods exhibit consistently superior compression performance on structured data instances, without having degraded performance on a minority of situations where more specialized algorithms perform optimally. Our results undermine the implications of the "no free lunch" theorem for community detection, both conceptually and in practice, since it is confined to unstructured data instances, unlike relevant community detection problems which are structured by requirement. |
16:45 | Compressing network populations with modal networks reveals structural diversity PRESENTER: Alec Kirkley ABSTRACT. Analyzing relational data consisting of multiple samples or layers involves critical challenges: How many networks are required to capture the variety of structures present in the data? And what are the structures of these representative networks? Here we describe efficient nonparametric methods derived from the minimum description length principle to construct the network representations automatically. The methods input a population of networks or a multilayer network measured on a fixed set of nodes and output a small set of representative networks together with an assignment of each measurement or layer to one of the representative networks. We show that these methods recover planted heterogeneity in synthetic network populations and effectively identify important structural heterogeneities in global trade and fossil record networks. Our methods are principled, scalable, completely parameter-free, and adapted to a wide range of data---temporally ordered, unordered, directed, bipartite---providing a unified lens for exploratory analyses and pre-processing large sets of network measurements for downstream applications. |
17:00 | Exact and rapid linear clustering of networks with dynamic programming PRESENTER: Alice Patania ABSTRACT. We study the problem of clustering networks whose nodes have imputed or physical positions in a single dimension, such as prestige hierarchies [1] and the similarity dimension of hyperbolic embeddings [2]. Existing algorithms, such as the critical gap method [3] and other greedy strategies, only offer approximate solutions. In this talk, we introduce a dynamic programming approach that returns provably optimal solutions in polynomial time---$O(n^2)$ steps---for a broad class of clustering objectives [4]. We demonstrate the algorithm through applications to synthetic and empirical networks, and show that it outperforms existing heuristics by a significant margin, with a similar execution time. [1] E. E. Bruch and M. E. J. Newman, Sci. Adv. 4, eaap9815 (2018) [2] M. Boguñá, et al. Nat. Rev. Phys. 3, 114–135 (2021) [3] M. Bruno, et al.. arXiv:1906.09082 [4] A. Patania, A. Allard, J.-G. Young, arXiv:2301.10403 |
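To make the idea concrete, here is a minimal version of such a dynamic program for points on a line, using the sum of squared deviations plus a constant per-cluster penalty as an example objective (the paper treats a broader class of objectives; this is one simple instance).

```python
# O(n^2) dynamic program: optimal contiguous clustering of points on a line.
def linear_clustering(xs, penalty=1.0):
    xs = sorted(xs)
    n = len(xs)
    pre, pre2 = [0.0], [0.0]            # prefix sums of x and x^2
    for x in xs:
        pre.append(pre[-1] + x); pre2.append(pre2[-1] + x * x)

    def cost(i, j):                     # sum of squared deviations of xs[i:j]
        m, s = j - i, pre[j] - pre[i]
        return (pre2[j] - pre2[i]) - s * s / m

    best = [0.0] + [float("inf")] * n   # best[j]: optimal cost of xs[:j]
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j], cut[j] = c, i

    clusters, j = [], n                 # backtrack the optimal cut points
    while j > 0:
        clusters.append(xs[cut[j]:j]); j = cut[j]
    return clusters[::-1]

print(linear_clustering([0.1, 0.2, 0.25, 5.0, 5.1, 9.8]))
# -> [[0.1, 0.2, 0.25], [5.0, 5.1], [9.8]]
```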
17:15 | Compression-based inference of network motif sets PRESENTER: Alexis Benichou ABSTRACT. Physical and functional constraints on empirical networks lead to topological patterns across multiple scales. A particular type of higher-order network feature that has received considerable interest is network motifs---statistically regular subgraphs. These may implement fundamental logical or functional circuits and have thus been referred to as “building blocks of complex networks”. Their well-defined structures and small sizes furthermore mean that their function is amenable to testing in synthetic and natural experiments. The statistical inference of network motifs is however fraught with challenges, from defining the appropriate null model and sampling it to accounting for the large number of candidate motifs and their potential correlations in statistical testing. We develop a framework for motif set inference based on lossless network compression using subgraph contractions. The minimum description length principle lets us select the most significant set of motifs as well as other prominent network features in terms of their combined compression of the network. Our approach overcomes the common problems in mainstream frequency-based motif analysis while avoiding repeated subgraph census on a sequence of randomly sampled graphs. The approach inherently accounts for multiple testing and correlations between subgraphs and does not rely on a priori specification of a null model. This provides a novel definition of motif significance which guarantees robust statistical inference. We apply our methodology to perform comparative connectomics, by evaluating the compressibility of diverse biological neural networks, including the connectomes of C. elegans and Drosophila melanogaster at different developmental stages, and by comparing topological properties of their motifs. |
17:30 | Sampling Networks from Modular Compression of Network Flows PRESENTER: Christopher Blöcker ABSTRACT. Traditional benchmark models generate networks with given structural characteristics such as degree distribution, degree correlations, or community structure. However, they do not consider dynamic processes on networks describing, for example, higher-order dependencies. The structure of networks and the dynamics on networks are interdependent and, as network structure constrains dynamics, we can learn about dynamics from structure and vice versa. Here, we propose a generative model based on the modular compression of network flows induced by the map equation, and explore emerging structures. The map equation is an information-theoretic objective function for community detection that uses random walks to model network flows. Exploiting the duality between compression and finding patterns, the map equation turns community detection into a compression problem where the goal is to minimise the random walk's average per-step description length. Recently, we proposed map equation similarity, MapSim, an information-theoretic node similarity measure based on the map equation's coding principles: MapSim interprets the modular structure of a network as an implicit node embedding in a non-metric latent space and relates the similarity between nodes u and v to the number of bits required to describe a random-walker step from u to v. Whereas a network's topology constrains a random walker's steps, a coding scheme for describing transitions allows us to calculate similarities between any node pair, whether they are connected or not. Applied to an unsupervised link-prediction task, MapSim outperforms popular embedding methods. To generate networks, we calculate MapSim values s(u,v) for all node pairs, turning them into probabilities on a per-node basis using the softmax function, p(u,v) = exp(β·s(u,v)) / ∑_{w≠u} exp(β·s(u,w)), where β controls how peaked the resulting probability distribution is. For illustration, we use the karate club network, which has two communities. We sample 100 networks for each β ∈ [0,2] with step size 0.1, and show the average resulting AMI between the sampled networks' and the original network's community structure, as well as the average number of sampled links. To compare the emerging community structure with the original one, we use Infomap to detect communities in the generated networks. We find that, as we increase β, the sampled networks become sparser, and fewer inter-community links are manifested. At the same time, AMI increases, making a sharp transition around β = 0.4. We show a concrete sampled network, and the original network's degree distribution as well as the average degree distribution over 100 samples for β = 0.7. By turning MapSim into a generative model, we show that mesoscopic network structure can be recovered from modular compression of a dynamic process on the network. Our approach connects generative and descriptive notions of community detection. Exploiting this connection, we can, for example, generate first-order benchmark networks based on the compression of higher-order network flows. |
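The sampling step itself can be sketched in a few lines: given a matrix of MapSim-style similarities s(u,v) (the values below are random stand-ins, not actual MapSim output), each node's outgoing probabilities are obtained with a softmax at inverse temperature β and links are drawn from them.

```python
# Softmax link sampling from a similarity matrix (stand-in for MapSim).
import numpy as np

rng = np.random.default_rng(0)
n, beta, links_per_node = 5, 0.7, 2
s = rng.normal(size=(n, n))           # invented similarities s(u, v)

edges = set()
for u in range(n):
    others = [v for v in range(n) if v != u]
    logits = beta * s[u, others]
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax over candidate targets
    for v in rng.choice(others, size=links_per_node, replace=False, p=p):
        edges.add((u, int(v)))
print(sorted(edges))
```

Larger β concentrates each node's probability mass on its most similar targets, which is why the sampled networks become sparser in inter-community links as β grows.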
16:30 | Network science: Applications for sustainable agroecosystems and food security ABSTRACT. The global challenge of feeding two billion more people by 2050, using more sustainable agricultural practices whilst dealing with uncertainties associated with environmental change, requires a transformation of food systems. As part of a Royal Society Challenge-led Grant, which established a consortium of researchers in the UK and South America, I present our perspective for how advances in network science can provide novel ways to better understand, harness, and restore multiple ecological processes in agricultural environments. I describe: (i) a network-focused framework for managing agro-ecosystems that accounts for the multiple interactions between biodiversity and associated ecosystem services; (ii) guidance for incorporating socio-economic factors into ecological networks; and (iii) the potential to upscale network methods to inform efforts to build resilience, including global food-supply chains. In doing so I aim to facilitate the application of network science as a systems-based way to tackle the challenges of securing an equitable distribution of food. |
16:45 | Variation in host microbe-sharing networks structure across land use types and host species PRESENTER: Matan Markfeld ABSTRACT. Anthropogenic land use change degrades natural habitats and alters wild microbial communities, including the host microbiome. The host microbiome plays a major role in the host’s health and disease states. Hence, the compositional and structural changes of individual hosts’ microbiomes induced by land use change can affect disease dynamics at the host population level. Here, we ask how land use and host species shape the host microbe-sharing structure. We focused on the gut core and non-core microbiomes, sampled from two small-mammal species (one native, one non-native) in rural Madagascar. Sampling was conducted in multiple land uses, including primary and secondary forests and different agricultural crops. We constructed a microbe-sharing network (Fig. 1A,B) in which individual hosts are connected based on their microbiome similarity. Then, we used flow-based network modularity (Infomap) to partition the network into modules, which consist of host individuals with more similar microbiomes (Fig. 1C). To measure the extent to which the microbiome similarity patterns align with land use, we incorporated node metadata into the network modularity analysis using an extension of Infomap. Using the parameter η, we controlled for the importance of metadata relative to edge connectivity in determining the network partitioning (Fig. 1D). When η = 0 (only considering network topology), we found a single module in the network, as most individuals clustered together regardless of land use. However, by increasing η, we found that the value at which the network modularity significantly changed differs between the core and non-core microbiomes of the native host species, but not for the non-native host species. Our method and results shed more light on the relative effect of land use and host species on microbiome networks and enable a better understanding of the factors that shape the host microbial community. |
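As an illustration of the first step only (the metadata-aware Infomap partitioning with η is not reproduced here), hosts can be linked by the similarity of their microbiome abundance profiles, for instance via one minus the Bray-Curtis dissimilarity; the abundances and the threshold below are invented.

```python
# Sketch: building a microbe-sharing network from abundance profiles.
import numpy as np

rng = np.random.default_rng(0)
abundances = rng.random((6, 20))      # 6 hosts x 20 microbial taxa (invented)

def bray_curtis_similarity(a, b):
    return 1 - np.abs(a - b).sum() / (a + b).sum()

edges = []
for i in range(len(abundances)):
    for j in range(i + 1, len(abundances)):
        w = bray_curtis_similarity(abundances[i], abundances[j])
        if w > 0.65:                  # keep only sufficiently similar hosts
            edges.append((i, j, round(w, 2)))
print(edges)                          # weighted host-host similarity edges
```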
17:00 | The structure of the global network of Lichen Symbionts PRESENTER: Salva Duran-Nebreda ABSTRACT. Symbiosis is a major engine of evolutionary innovation underlying many extant complex organisms. Lichens are a paradigmatic example that offers a unique perspective on the role of symbiosis in ecological success and evolutionary diversification. Lichen studies have produced a wealth of information regarding the importance of symbiosis, but they frequently focus on a few species, limiting our understanding of large-scale phenomena such as guilds. Guilds are groupings of lichens that assist each other's proliferation and are intimately linked by a shared set of photobionts, constituting an extensive network of relationships. To characterize the network of lichen symbionts, we used a large dataset (n=206 publications) of natural photobiont-mycobiont associations (Duran-Nebreda and Valverde, Composition, structure and robustness of lichen guilds, Scientific Reports, in press). The entire lichen network was found to be modular, but this organization does not directly match the genetic identity of the partnering species, prompting a reconsideration of lichen guild structure and composition. The multiscale nature of this network reveals that the major lichen guilds are better represented as clusters with several substructures rather than as monolithic communities. This heterogeneous guild structure fosters robustness, with keystone species functioning as bridges between guilds whose extinction would endanger global stability. |
17:15 | Competitive hierarchies reduce network instability by keeping feedback weak PRESENTER: Franziska Koch ABSTRACT. Competitive hierarchies in ecological communities and beyond have been argued to lead to instability. However, theoretical work on competition has mostly focused on pairwise interactions and has neglected the role of complex interactions between more than two species. The relation between hierarchy and instability has therefore never been tested in competition networks parametrised with empirical data. We used observations of benthic bryozoan species directly competing for space on the seabed to construct models of real competitive networks. For these bryozoan assemblages, we estimated the energy loss rates resulting from interference competition and used these estimates to parametrise both inter- and intraspecific interaction strengths. In order to investigate the relationship between structure and stability, we analysed how interaction strengths are distributed over feedback loops. All bryozoan assemblages were unstable. However, we found that the hierarchical structure actually mitigated instability by causing asymmetries in the interaction strengths. This asymmetric organization caused all feedback loops in the system to be weak, and thus reduced the destabilising effects of both short positive loops and long negative loops. Our results support the idea that competition leads to exclusion, but demonstrate that this is not because of, but despite, competitive hierarchy. Furthermore, our findings may help to understand how hierarchies and competitive power are sustained, not only in nature but also in economic or social systems. |
17:30 | The architecture of multifunctional ecological networks PRESENTER: Mar Cuevas-Blanco ABSTRACT. The concept of keystone species was originally defined at the ecosystem level: keystone species are disproportionately important for ecosystem functioning, with an importance that greatly surpasses that of other, more abundant species in the community. Yet, their detection has so far been restricted to examining one or only a few interaction types in a community (e.g. parasitism, pollination, seed dispersal, herbivory). However, all species are involved in a myriad of interactions with other coexisting species, playing therefore multiple ecological roles that together define the multiple dimensions of their Eltonian niche [1, 2]. Given the multidimensional nature of ecosystem interactions, any attempt to fully capture their richness requires a representation that meaningfully incorporates such complexity. Thus, in order to gain a more comprehensive understanding of a species' functional importance, it is necessary to move beyond considering only its single role to considering its multiple ecological roles, i.e. to shift from unifunctionality to multifunctionality [3]. To our knowledge, only one study has empirically estimated the weight of edges between layers by quantifying the role of the same individual in two ecological processes [4]. Our approach follows the consumer-resource paradigm, where plants are seen as "resources" and "consumers" encapsulate different types of animals or fungi. In this work, interactions between resources and consumers develop along six different observed functions: pollination, herbivory, seed dispersal, and pathogenic-, saprotrophic-, and symbiotic-fungal interactions. The precise span of dimensions is here driven by the extent of our data, based on observations recently collected on the islet of Na Redona, which include direct observations of 16 plant species and 675 animal/fungus species interacting across six fundamentally different ecological functions. Incorporating the functional dimension, the complete relational dataset is thus formalized as a rank-3 tensor that we call the Resource-Consumer-Function tensor (RCF). We interpret the architecture of this tensor as a weighted, multipartite, multilayer network and effectively visualise it as a multipartite edge-colored weighted network (Figure 1) [5]. The network displays two types of nodes: resources (plant species) and consumers (animals and fungi), with interactions (links) taking place between groups but no direct intragroup links. Each layer of the network represents a specific function, and the strength of each interaction is represented by a link weight. Consumers (animals, fungi) are often centered around a single plant species and thus form clusters (see, however, the cluster formed by Lavatera maritima and Geranium molle). Interestingly, cross-cluster links are also present, so that the ecosystem is entangled. A first question worth addressing from the RCF is to quantify the relationship between the resources of the ecosystem and the functions the system embodies. This is achieved by integrating out the consumer index and thereby building a Resource-Function Matrix (RFM), which is a multifunctional ecological network. For that purpose, we define the participation strength of a plant species in an ecological function. This matrix shows a stylized nested structure commonly found in e.g. mutualistic interaction networks, world trade, inter-organizational relations, and others [6].
The complex nested pattern observed suggests that certain ecological functions and plant species can be classified as "generalists" or "specialists", thereby highlighting the hierarchical nature of multifunctional keystoneness. We formulate the novel concept of function keystoneness, which focuses on the robustness of ecosystems with respect to perturbations of functions (instead of species, as commonly assumed). To further understand multifunctional species keystoneness and the role of plant species as ecosystem assemblers, we project the RFM onto the function class and extract a Function-Function Interaction Network (FFIN). We quantify the ecosystem robustness against perturbations (extinctions) of plant species by sequentially pruning edges in the FFIN. Additionally, we rank plant species based on their multifunctional keystoneness by conditioning the FFIN on each single plant species, thus obtaining a set of 16 six-node networks. By properly normalizing sets of node and edge weights, we rank resources (plant species) according to their specific role as brokers of functions. The dual concept of function keystoneness can be addressed by a similar mathematical manipulation: starting again from the RFM, we now project onto the plant class and thus construct a Resource-Resource (i.e. plant-plant) Interaction Network (PPIN). We address two complementary questions: (i) how robust is the ecosystem against perturbations of functions? And (ii) how can we quantify the heterogeneous roles and impacts of each different function in the ecosystem? The first question is answered by performing a robustness analysis of the PPIN, sequentially pruning edges (i.e. functions) and analysing the response. We subsequently disaggregate the PPIN by conditioning on each function and extract 6 different plant-plant networks. The resulting ranking certifies the heterogeneity of roles and impacts of the different functions. [1] Anna Eklöf et al. "The dimensionality of ecological networks". Ecology Letters 16.5 (2013), pp. 577-583. [2] D. Matthias Dehling and Daniel B. Stouffer. "Bringing the Eltonian niche into functional diversity". Oikos 127.12 (2018), pp. 1711-1723. [3] Peter Manning et al. "Redefining ecosystem multifunctionality". Nature Ecology & Evolution 2.3 (2018), pp. 427-436. [4] Sandra Hervías-Parejo et al. "Species functional traits and abundance as drivers of multiplex ecological networks: first empirical quantification of inter-layer edge weights". Proceedings of the Royal Society B 287.1939 (2020), p. 20202127. [5] Manlio De Domenico et al. "Mathematical formulation of multilayer networks". Physical Review X 3.4 (2013), p. 041022. [6] Manuel Sebastian Mariani et al. "Nestedness in complex networks: observation, emergence, and implications". Physics Reports 813 (2019), pp. 1-90. Acknowledgement: We acknowledge support from the Spanish Agency of Research (AEI) through the Research and Development Projects Program (Grant PID2020-114324GB-C22 funded by MCIN/AEI/10.13039/501100011033). In particular, MCB acknowledges financial support from Grant PID2020-114324GB-C22 funded by MCIN/AEI/10.13039/501100011033 and by "ESF Investing in your future". |
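To make the tensor manipulations above concrete, here is a minimal numpy sketch of the projections: integrating out the consumer index to obtain the RFM, then projecting onto functions (FFIN) or plants (PPIN). The toy tensor values and the use of plain sums and inner products as the projection are assumptions; the paper's weighting and normalization choices may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy Resource-Consumer-Function tensor: 16 plants x 30 consumers x 6 functions
# (the real data have 675 animal/fungus consumers)
rcf = rng.random((16, 30, 6))

# Resource-Function Matrix: integrate out the consumer index
rfm = rcf.sum(axis=1)          # shape (16, 6): plant participation per function

# Function-Function Interaction Network: project the RFM onto the function class
ffin = rfm.T @ rfm             # shape (6, 6)

# Plant-Plant Interaction Network: project the RFM onto the plant class
ppin = rfm @ rfm.T             # shape (16, 16)

print(rfm.shape, ffin.shape, ppin.shape)
```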
16:30 | Network-Based Temporal Novelty Measure and the Innovation Glass Ceiling PRESENTER: Tara Sowrirajan ABSTRACT. We study innovation networks across different domains, from science and technology to entertainment and cuisine creation. We adapt a temporally evolving network measure of the degree of conventionality or atypicality of an innovation and find systematic gender differences in innovators' success that hold across multiple innovation contexts. We utilize five datasets: a diverse sample of about 7 million international patent application records from the US, the UK, and Canada between 2001 and 2018, over 15K items of content from Netflix, and more than 500K recipes from Food.com. We focus primarily on the patenting context and find that the phenomenon of an innovation glass ceiling for boundary-pushing, creative work done by women inventors generalizes from technology networks to the other domains of entertainment creation and cuisine innovation. We leverage the tendency to categorize inventions into multiple categories (cooperative patent classification codes for patents, content tags for Netflix entertainment, and recipe tags for the recipes) to produce a category co-classification network that measures how typical or atypical a combination of categories is, an example of which is depicted in Figure 1 (left). To explore the relationships between gender and success in boundary-pushing inventions, we use the co-classification network to compute a metric for each innovation that represents the degree to which it features common combinations of categories or spans technical boundaries by combining areas rarely seen together. These scores are normalized within category as z-scores representing how typical or atypical the combined categories are. The face validity of our measure is bolstered in several ways. Consistent with prior research, we find that inventions that made unusual combinations in our data do have higher future impact (as measured through citations and maintenance fees). Women inventors' boundary-pushing inventions are rejected more often than men's, whereas women's conventional inventions show no difference in patenting rates from men's (Figure 1 (right)). Innovation success rates for men increase when they make novel connections between technological domains, while women face an increased chance of rejection. These differences persist after controlling for technology, team, and negotiating strategies. When performing innovative work, men are more likely to be rewarded as boundary spanners, while women are more likely to be rejected as boundary pushers. These findings from the patenting domain generalize to the contexts of entertainment and recipe creation, where female creators also have consistently lower success rates than men when they engage in boundary-spanning work. |
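A sketch of a z-scored category co-classification measure in the spirit of the one described above, under simplifying assumptions: observed pairwise co-occurrence counts are compared with a label-shuffling null model, and atypicality is the resulting z-score. The data, the null model, and all parameters are illustrative, not the study's.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

def cooccurrence(items):
    """Count how often each pair of categories appears on the same item."""
    counts = {}
    for cats in items:
        for a, b in combinations(sorted(set(cats)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

# toy data: each innovation carries a small set of category labels
items = [rng.choice(20, size=rng.integers(2, 5), replace=False) for _ in range(500)]
obs = cooccurrence(items)

# null model: shuffle category labels across items, preserving set sizes
flat = np.concatenate(items)
null = []
for _ in range(100):
    perm, pos, shuffled = rng.permutation(flat), 0, []
    for cats in items:
        shuffled.append(perm[pos:pos + len(cats)])
        pos += len(cats)
    null.append(cooccurrence(shuffled))

# z-score: how atypical is each observed pairing relative to the null?
for pair in list(obs)[:5]:
    vals = np.array([n.get(pair, 0) for n in null])
    z = (obs[pair] - vals.mean()) / (vals.std() + 1e-9)
    print(pair, round(z, 2))
```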
16:45 | Idea engines: Unifying innovation & obsolescence from markets & genetic evolution to science PRESENTER: Edward D. Lee ABSTRACT. Innovation and obsolescence describe the dynamics of ever-churning and adapting social and biological systems, concepts that encompass field-specific formulations. We formalize the connection with a reduced model of the dynamics of the "space of the possible" (e.g. technologies, mutations, theories) to which agents (e.g. firms, organisms, scientists) couple as they grow, die, and replicate. We predict three regimes: the space is finite, ever growing, or a Schumpeterian dystopia in which obsolescence drives the system to collapse. We reveal a critical boundary at which the space of the possible fluctuates dramatically in size, displaying recurrent periods of minimal and of substantial diversity. When the space is finite, corresponding to physically realizable systems, we find surprising structure. This structure predicts a taxonomy for the density of agents near and away from the innovative frontier that we compare with distributions of firm productivity, COVID diversity, and citation rates for scientific publications. Remarkably, our minimal model derived from first principles aligns with empirical examples, implying a follow-the-leader dynamic in firm cost efficiency and biological evolution, whereas scientific progress reflects consensus that waits on old ideas to go obsolete. Our theory introduces a fresh and empirically testable framework for unifying innovation and obsolescence across fields. |
17:00 | ATYPICAL COMBINATION OF TECHNOLOGIES IN REGIONAL CO-INVENTOR NETWORKS PRESENTER: Balazs Lengyel ABSTRACT. Innovation in regions occurs in collaborative work that requires interpersonal relations to transfer and combine knowledge. Co-inventor networks have been extensively used to proxy the knowledge-transfer potential of social relations within and across spatial units. In this process, inventors are linked together according to whether they have previously worked on a patent together. The structure of these networks qualifies the innovation capacity of regions because it can capture the potential of knowledge transfer and combination. Previous work has therefore investigated whether regional innovation benefits from small-world networks of co-inventor collaboration and from other structural measures such as density and assortativity, among others. Yet, a crucial element is still missing from this discussion: the role of technological and knowledge domains in the mesoscopic (community-level) properties of regional collaboration networks, along which co-inventor collaborations take shape. We still know little about whether the technological specialisation of closely-knit co-inventor communities and the inter-connections among technologically distant communities favour regional innovation. In this paper, we propose a community detection approach to address this problem at the mesoscopic level of regional co-inventor networks and investigate the creation of atypical patents requiring the combination of distinct knowledge to capture radical novelty in regions (Figure 1). This approach has major advantages over previous research: 1. small-world networks can be decomposed into network communities such that inter-community ties form network bridges; 2. since the detection of network communities relies on the structure of networks only, one can investigate the technological specializations of detected communities and characterise bridges by the similarity or dissimilarity of the specialized domains they link. We measure radical innovation by identifying atypical combinations of technologies in patents using the EPO PATSTAT database over three decades (1980-2014). The co-inventor network is constructed for NUTS2 regions in a cumulative fashion that enables us to estimate the correlation between new link formation and regional-level outcomes in a fixed-effect regression framework. We apply the 'network of places' method to remedy biases caused by automatic triadic closure when projecting bipartite collaboration networks, and measure the degree of small-worldness in co-inventor networks. Finally, we detect co-inventor network communities over five-year time-windows of the networks of places. The technological specialization of communities, their interlinking, and the technological similarity of inter-linked communities are quantified at the regional level. This enables us to estimate the share of atypical patents in each region in the next five-year time-window from these regional-level explanatory variables. Our results suggest that the share of atypical patents is growing in those regions where co-inventor collaboration resembles small-world networks. Furthermore, we find that the technological specialisation of co-inventor communities correlates positively with the share of atypical patents. Also, a growing share of bridging collaborations across communities with dissimilar technological portfolios further supports atypical patenting. This new evidence implies that it is not the diversity of various technologies per se, but rather the presence of multiple, diversely specialised, and inter-linked inventor communities that favours radical innovation outcomes in regional economies. |
17:15 | Quantifying disruptiveness using neural embedding method PRESENTER: Munjung Kim ABSTRACT. Development and disruption are a common dichotomy for science and technology; scientific papers or products may enhance existing works (development) or introduce new fields (disruption). Understanding the mechanisms and conditions that drive disruptive innovations has been attracting much attention, and the disruption index (denoted as D) by Funk and Owen-Smith played a pivotal role in these studies [1-2]. However, the disruption index also has some limitations, such as being restricted to a paper's immediate neighborhood and overlooking topological structures that the larger context may exhibit [3]. These limitations raise a question: can we define a better alternative to the disruption index? Here, we introduce a neural embedding approach to quantify disruptiveness and demonstrate that our measure outperforms the original disruption index in multiple tasks. Our model estimates the probability that paper i is cited by paper j as P(j|i) = exp(u_j · v_i)/Z_i, where v_i is the target vector of paper i and u_j is the context vector of paper j. Unlike standard node2vec, our model is trained in one direction only (Fig. 1a); therefore the target vector of a paper has high cosine similarity with the context vectors of its citations, and the context vector of a paper has high cosine similarity with the target vectors of its references. This makes the cosine distance between the target vector and the context vector of less disruptive papers smaller than that of disruptive papers (Fig. 1b). Therefore, we define the new disruption index (denoted as CD) of paper i as the cosine distance between the target vector and the context vector of the same paper. We applied our measure to the APS dataset consisting of 644,022 papers published from 1900 to 2019. Our measure (CD), being more continuous, shows a higher resolution around zero than D (Fig. 1c). We also compared the two measures in identifying disruptive papers, which are expected to have a high percentile of the disruption measure. Our proposed measure placed the Nobel prize papers near the top 20%, while the disruption index placed them in both the top and bottom 10% (Fig. 1d). The same pattern is observed for the APS's "milestone papers" (Fig. 1e). When we randomize the citation network while preserving the in- and out-degrees, CD shows a completely different distribution from the real network, indicating that the original CD values are not solely due to the number of references or citations. However, D values from a randomized network show a concentration in the top 10%, indicating that the disruption index may be confounded by the raw number of citations. Next, we examined the disruptive papers selected through a scholar survey. The CD values of these papers were in the top 10% and 21%, while D was in the bottom 0.2% and top 42%. Lastly, the CD of papers that mention the last name of a cited author in the title or abstract, which are expected to be less disruptive [2], was significantly smaller than in random networks, while D values in the real network were significantly larger than in random networks (Fig. 1f). |
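The proposed CD measure reduces to a cosine distance between two learned vectors of the same paper. A minimal sketch, assuming the target and context embedding matrices have already been trained (here they are random placeholders):

```python
import numpy as np

def cd_index(target_vecs, context_vecs, paper):
    """Disruption as the cosine distance between a paper's target vector
    (how it is cited) and its own context vector (how it cites)."""
    v, u = target_vecs[paper], context_vecs[paper]
    cos = np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))
    return 1.0 - cos

# toy embeddings for 1000 papers in 64 dimensions (placeholders for the
# one-directional embedding model described in the abstract)
rng = np.random.default_rng(3)
target = rng.normal(size=(1000, 64))
context = rng.normal(size=(1000, 64))
print(cd_index(target, context, paper=0))
```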
17:30 | Novelties as new combinations: higher-order Heaps' laws PRESENTER: Gabriele Di Bona ABSTRACT. There is empirical evidence of common patterns in the way humans explore the world in search of novelties, such as Heaps' law, which characterizes how discoveries are distributed in time. This has triggered new mathematical models of exploration processes based on extraction from urns with an expanding set of items. However, an often overlooked aspect is that novelties can also arise as new combinations of existing elements. In this work, we extend the notion of novelty to groups of n consecutive elements that appear for the first time in a sequence, hence introducing the n-th order Heaps' law. Through the extensive analysis of empirical and synthetic sequences, we find that processes displaying the same rate of discovery of novelties as single items can instead differ at higher orders. We also show that existing urn-based models are unable to capture the higher-order Heaps' laws found in real data, which leaves space for the development of a new generation of models. |
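The n-th order Heaps' curve can be computed directly from a sequence by counting distinct n-grams over time. A minimal sketch (the toy sequence is illustrative):

```python
def heaps_curve(sequence, n=2):
    """Number of distinct n-grams (consecutive n-tuples) seen up to each step:
    the n-th order Heaps' curve; n=1 recovers the classical Heaps' law."""
    seen, curve = set(), []
    for t in range(len(sequence) - n + 1):
        seen.add(tuple(sequence[t:t + n]))
        curve.append(len(seen))
    return curve

seq = [1, 2, 1, 3, 2, 1, 2, 3, 4, 1]
print(heaps_curve(seq, n=1))   # discoveries of single items
print(heaps_curve(seq, n=2))   # discoveries of pairs (new combinations)
```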
17:45 | Reconciling econometrics with maximum-entropy network models PRESENTER: Marzio Di Vece ABSTRACT. In the study of the World Trade Web (WTW), econometric approaches interpret the regression specification as the expected value of a probability distribution whose functional form can be chosen arbitrarily; still, as pointed out in [1], they are not capable of reproducing aggregate measures unless the WTW topology is fixed. Moreover, econometric models for discrete-valued data, such as those based on the Poisson and negative binomial distributions, are employed even when empirical weights are not discrete, leading to the paradoxical situation of attributing a null likelihood to the empirical network structure. With the present contribution, we propose a solution to the aforementioned problems by adopting an approach rooted in statistical physics. In particular, we construct a set of maximum-entropy models, for both discrete- and continuous-valued weights, constrained to satisfy (both local and global) structural properties while allowing for a flexible, econometric reparametrization. Specifically, we consider two broad classes of models, namely integrated and conditional ones, depending on the probabilistic rules for placing links and loading them with weights. For integrated models, both rules follow from a single, constrained optimization; for conditional models, instead, the two rules are disentangled and the functional form of the weight distribution follows from a conditional optimization procedure. As characterizing both the link weights (i.e. the empirical trade volumes) and the large-scale topology of the WTW via a single model is still an open issue, here we focus on two different datasets, i.e. 11 years of the Gleditsch dataset and 11 years of the BACI dataset. In both cases, we find that maximum-entropy models for discrete-valued weights outperform traditional econometric models both in the task of reproducing the relevant network statistics and in purely information-theoretic terms, as confirmed by the Akaike Information Criterion [2]. For continuous-valued weights, we consider different choices for the weighted constraints and explore the properties of the probability distributions induced by them; moreover, we compare their effectiveness (in predictive as well as information-theoretic terms) and stability (via the Fisher-Shannon plane technique) [3]. 1. Duenas and Fagiolo, J. Econ. Interact. Coord. 8, 155-178 (2013). 2. Di Vece et al., Phys. Rev. Research 4, 033105 (2022). 3. Di Vece et al., Chaos Solit. Fractals 166, 112958 (2023). |
18:00 | Firm-level production networks: what do we (really) know? PRESENTER: Andrea Bacilieri ABSTRACT. Firm-level datasets on production networks are often confidential and arise from different data collection methods, making it difficult to determine stylized facts. Are standard network properties similar across all available datasets, and if not, why? We provide benchmark results from two administrative datasets (Ecuador and Hungary) which are exceptional in that there is no reporting threshold. We compare these networks to a leading commercial dataset (FactSet), and provide a systematic synthesis of published results on national firm-level production networks. We show that administrative datasets with no reporting thresholds have remarkably similar properties. We explain why some of these properties fail to be observed in datasets with missing data and reporting thresholds, while others appear robust. |
18:15 | Revealing the link between corporate structure and tax avoidance using higher order Markov models PRESENTER: Valeria Secchini ABSTRACT. Multinational corporations shift $1 trillion of profits to tax havens every year, costing governments around the world over $200 billion in lost tax revenue. To efficiently funnel profits out of the countries where they are generated and into tax havens, corporations use highly complex corporate structures of parents and subsidiaries spanning across countries. Previous work on corporate ownership chains has modeled corporate structures considering predominantly first-order dependencies. In this approach, nodes are countries, and an edge represents the number of companies in one node that own a company in the node it points to, generally weighted by revenue. Approaches based on first-order dependencies are, however, limited. Firms headquartered in specific countries can use different countries to organize their corporate structures. These higher-order dependencies are important because, with the help of tax lawyers, multinational corporations can use combinations of countries to reduce their tax bill, isolate the headquarters from bankruptcy risks of a subsidiary, or protect their managers from personal responsibility. Here we construct a model to uncover the most prevalent higher-order dependencies in corporate ownership structures and link them to corporate tax avoidance. We find around 70 higher-order dependencies, 75% of which contain tax havens. Consistent with previous findings, we observe that "conduit" tax havens play a key role in the structure of all corporations, as do small island tax havens. Interestingly, we find that subsidiaries located in unreported or unknown countries are also important in the network. We assess the effect of higher-order nodes on the effective tax rate (ETR) paid by corporations using LASSO regression. We find that some motifs are correlated with a low ETR. In this paper, we develop a data-driven method to understand which combinations of countries are used by multinational corporations. Our method makes no assumptions about the countries involved and their position in the global economy, and has applications in policy making and the sustainability of the welfare state. |
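One common way to surface higher-order dependencies of the kind described above is a likelihood-ratio comparison between first- and second-order Markov models of ownership chains; the sketch below illustrates that idea on toy country sequences. The test, the data, and the evaluation choices are illustrative assumptions, not the authors' exact method.

```python
from collections import Counter
import math

def log_likelihood(chains, order, start=2):
    """Log-likelihood of the symbols at positions t >= start under a
    k-th order Markov model with maximum-likelihood transition counts.
    Using the same start for both orders keeps the comparison fair."""
    trans, ctx = Counter(), Counter()
    for chain in chains:
        for t in range(start, len(chain)):
            key = tuple(chain[t - order:t])
            trans[(key, chain[t])] += 1
            ctx[key] += 1
    return sum(c * math.log(c / ctx[key]) for (key, _), c in trans.items())

# toy ownership chains of country codes (headquarters -> ... -> subsidiary)
chains = [
    ["US", "NL", "BM"], ["US", "NL", "BM"], ["DE", "NL", "LU"],
    ["US", "IE", "BM"], ["DE", "NL", "LU"], ["US", "IE", "BM"],
]
ll1, ll2 = log_likelihood(chains, 1), log_likelihood(chains, 2)
print(2 * (ll2 - ll1))  # likelihood-ratio statistic: evidence for 2nd order
```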
18:30 | Modelling and countering Mexican cartels through complex systems PRESENTER: Rafael Prieto-Curiel ABSTRACT. Latin America is home to only 8% of the world's population, but roughly one in three homicides worldwide occur in the region. Mexico is one of the most violent countries in the world, reporting 34,000 victims of intentional homicide, nearly 27 victims per 100,000 inhabitants, in 2021. Part of this violence is explained by large conflicting cartels fighting to control portions of territory or profitable markets. Despite large losses of members over the last decade, due both to internal conflict with each other and to incapacitation efforts by the state, Mexican drug cartels manage to compensate for their losses. We show that recruitment largely drives the ability of cartels to recover despite workforce losses. Here we formalise a model that considers the sources of cartel size variation over time. Leveraging data compiled from local media and narco blogs, we detect the existence of 150 active cartels in Mexico in 2020. These data enable us to characterise the system of alliances and rivalries among drug cartels through a weighted network representation. Within this network representation, we consider four mechanisms explaining why cartel size varies: recruitment, incapacitation, saturation and conflict. Combining these four mechanisms, we construct a system of 150 coupled differential equations, one for each active cartel. We leverage the increasing number of homicides in Mexico and the incapacitation effect (i.e., imprisonment rates) over the past 15 years to estimate the size of the 150 active cartels and their recruitment. Our results show that all cartels combined comprise between 160,000 and 185,000 members, and that unless they recruited at least 350 people per week, they would eventually have perished, reducing violence as well. With our system, we can compare a preventive strategy against organised crime (aimed at reducing cartel recruitment) with a reactive strategy (aimed at increasing incapacitation). Given the current cartel size and conflict, cartels will keep increasing their size and power, and we could observe 40% more casualties and 26% more cartel members by 2027. Even doubling arrests will still translate into a rise in violence compared to 2022 levels. Conversely, halving the cartels' ability to recruit will reduce weekly casualties by 2027 by 25% and cartel size by 11%. However, the cartel population is so large at this point that, even in the hypothetical scenario where recruitment drops to zero, it would take years to return to relatively low levels of violence. |
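A heavily simplified sketch of a coupled-ODE system in the spirit of the model described above, with recruitment, incapacitation, saturation, and conflict terms. All functional forms, parameter values, and the 5-cartel toy rivalry network are assumptions for illustration; the paper's 150-equation model is calibrated to homicide and imprisonment data.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(4)
n = 5                                 # 5 cartels here; the paper models 150
A = rng.random((n, n)) * 0.01         # rivalry network weights (toy values)
np.fill_diagonal(A, 0.0)

rho, eta, kappa = 8.0, 0.002, 1e5     # recruitment, incapacitation, saturation

def dNdt(t, N):
    recruitment = rho * (1 - N / kappa)      # recruitment saturates with size
    incapacitation = eta * N                 # arrests proportional to size
    conflict = (A @ N) * N / kappa           # losses from fights with rivals
    return recruitment - incapacitation - conflict

sol = solve_ivp(dNdt, (0, 52 * 15), np.full(n, 1000.0))  # 15 years in weeks
print(sol.y[:, -1].round(0))                             # final cartel sizes
```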
17:45 | Higher-order organization of resting-state fMRI signals PRESENTER: Andrea Santoro ABSTRACT. Time series analysis has proven to be a powerful method to characterize different phenomena in biology, neuroscience, and economics, and to understand some of their underlying dynamical features. However, to date it remains unclear whether the information encoded in multivariate time series, such as the neuronal activity in the brain, stems from independent, pairwise, or group interactions (i.e., higher-order structures [1]). In this work, we propose a novel framework to characterize the instantaneous co-fluctuation patterns of signals at all orders of interactions (pairs, triangles, etc.), and to investigate the global topology of such co-fluctuations [2]. In particular, we (i) z-score the $N$ original time series; (ii) calculate the element-wise product of the z-scored time series for all the $\binom{N}{k}$ $k$-order patterns (i.e., edges, triplets, etc.), whose generic element represents the instantaneous co-fluctuation magnitude of a (k+1)-group interaction. To distinguish concordant group interactions from discordant ones in a k-order product, concordant signs are always mapped positively, while discordant signs are mapped negatively (Fig. 1a-b). (iii) The resulting new set of time series encoding the k-order co-fluctuations is then further z-scored across time, to make products comparable across k-orders. (iv) Finally, for each time frame t, we construct a weight filtration by sorting all the k-order co-fluctuations by their weights. The weight filtration proceeds from the top down, in the spirit of persistent homology, so that when k-order patterns are gradually included, weighted holes and cliques start to appear (i.e., descending from more coherent patterns to less coherent ones). Yet, to maintain a well-defined weight filtration, only k-order patterns respecting the simplicial closure condition are included, while the remaining ones are considered simplicial violations, or "hypercoherent" states, and analysed separately (Fig. 1c-d). We show that the instantaneous persistent topological properties and the number of violations uncovered by our framework distinguish different regimes of coupled chaotic maps [3]. This includes the transition between different dynamical phases and various types of synchronization (Fig. 1e). Armed with these interpretational benchmarks, we apply our method to resting-state fMRI signals from the 100 unrelated subjects of the Human Connectome Project. We find that, during rest, the human brain mainly oscillates between chaotic and partially intermittent states, with higher-order structures mainly reflecting somatosensory areas (Fig. 1f). Moreover, time frames corresponding to the highest number of violations provide a higher level of subject identification compared to classical approaches [4]. Overall, our approach suggests that investigating the higher-order structure of multivariate time series might provide new insights compared to standard methods, and might allow us to better characterise group dependencies inherent to fMRI brain data. |
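A minimal numpy sketch of steps (i)-(iii) above: z-scoring the signals, forming signed k-order co-fluctuation products (concordant signs mapped positively, discordant negatively), and re-z-scoring across time. The toy data and the choice of triplets (3-node group interactions) are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.stats import zscore

rng = np.random.default_rng(5)
X = zscore(rng.normal(size=(8, 500)), axis=1)   # 8 z-scored signals, 500 frames

def cofluctuation(X, idx):
    """Signed co-fluctuation of a group of signals: the magnitude is the
    product of |z| values; the sign is + if all signs agree, - otherwise."""
    Z = X[list(idx)]
    mag = np.prod(np.abs(Z), axis=0)
    concordant = np.all(Z > 0, axis=0) | np.all(Z < 0, axis=0)
    return np.where(concordant, mag, -mag)

# all triplet co-fluctuation time series, re-z-scored across time
triplets = {idx: zscore(cofluctuation(X, idx)) for idx in combinations(range(8), 3)}
print(len(triplets))   # C(8,3) = 56 triplet time series
```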
18:00 | Topological scaffolds outperform functional connectivity in brain fingerprinting PRESENTER: Simone Poetto ABSTRACT. Network neuroscience is one of the dominant paradigms to describe, understand, and --possibly-- control brain function. In particular, Functional Connectivity (FC) --obtained as correlation matrices of neuroimaging signals-- is the de facto canonical approach in the field to capture the structure of coactivations of brain regions. Recently, various proposals have been put forth to improve the FC representation and overcome some of its implicit limitations, e.g. the linearity of correlations, or the fact that it only describes pairwise relations, disregarding potential higher-order terms. In this work, we take a different route: we explore the space of correlations described by FC as a genuine topological space. We do this by leveraging the potential of topological data analysis to identify brain states and discriminate between individuals. By analyzing the test-retest dataset from the Human Connectome Project, we show that features extracted from the algebraic-topological structure of brain activity are effective for brain fingerprinting and can complement established approaches such as network neuroscience and standard neuroimaging activation analyses. Specifically, starting from functional brain networks based on Pearson correlation between pairs of regions, we compute the persistent homology of the corresponding correlation metric space. This results in persistence diagrams that summarize the birth and death of topological features of dimension 0 and 1. We then compute a localized version of these topological summaries in the form of weighted scaffolds. To measure distances between connectivity matrices, for both functional connectivity and scaffolds, we use the correlation distance. To compare persistence diagrams we use two different distances: the Reininghaus kernel and the Earth Mover Distance. We find that topological scaffolds are significantly more effective in discriminating between subjects in the resting-state condition than both the persistence diagrams and --crucially-- the full FC. The results demonstrate that topological features are an effective instrument for brain fingerprinting, and that the topological structure (both for the persistent homology and the scaffolds) is overwhelmingly more similar between time windows of the same subject than between different subjects. This result also holds when the comparison is done across different recording sessions. We quantified the effectiveness of these methods in terms of effect size, showing that the discriminative power of homological scaffolds largely outperforms that of full functional connectivity matrices. This is particularly important because scaffolds contain far fewer bits of information than full FC matrices (scaffold densities are typically very low, <=10%, while FC matrices are dense). From this perspective, scaffolds can be seen as highly compressed topological fingerprints of individuals. A natural open question is what aspect of the brain's dynamics they capture. Preliminary results link the scaffolds to synergistic interactions across edges, which aligns with recent results obtained through a completely different lens, multivariate information theory, suggesting a synergistic nature of individual brain fingerprints. Overall, the study highlights the promising potential of topology as a tool for identifying brain states and characterizing connectivity dynamics, and it provides valuable insights into the benefits and limitations of using topological features for brain-state investigation. |
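A sketch of the persistent-homology step: from a correlation matrix to a correlation-based metric and its dimension-0/1 persistence diagrams. It assumes the ripser.py package; the distance definition sqrt(2(1-r)) and the toy data are illustrative choices, not necessarily the authors'.

```python
import numpy as np
from ripser import ripser  # ripser.py package, assumed available

rng = np.random.default_rng(6)
ts = rng.normal(size=(90, 200))          # 90 regions x 200 time points (toy)

C = np.corrcoef(ts)                      # functional connectivity matrix
D = np.sqrt(2.0 * (1.0 - np.clip(C, -1, 1)))  # a correlation-based metric

out = ripser(D, distance_matrix=True, maxdim=1)
h0, h1 = out["dgms"]                     # persistence diagrams, dims 0 and 1
print(h0.shape, h1.shape)                # birth-death pairs per dimension
```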
18:15 | Inference of task-related higher-order interactions from neural signals PRESENTER: Andrea Brovelli ABSTRACT. A central hypothesis in neuroscience posits that cognitive functions arise from the coordinated activity of neural populations distributed over large-scale brain networks. Although this hypothesis is central, progress towards testing it has been limited by the lack of approaches for studying the role of polyadic, higher-order interactions (HOIs), i.e. interactions among more than two brain regions, in supporting cognitive functions. A standard approach in neuroscience involves the characterisation of inter-areal interactions that participate in cognitive processes by means of functional connectivity analysis of neural time series [1]. Here, we exploit recent advances in information theory to test metrics for the inference of task-related higher-order interactions between neural signals. The O-information [2] allows the characterisation of statistical interdependencies within multiplets of three or more variables in terms of both local constraints and shared informational content, distinguishing synergy- and redundancy-dominated systems displaying complex emergent behaviours. We used this quantity and its gradient-based generalisation [3] to quantify the cerebral HOIs that are exclusively associated with task-related behavioural data. More precisely, we quantified task-related HOIs by taking the difference (i.e., the gradient) between the O-information computed on both the brain and behavioural variables and the one exclusively associated with cerebral variables. In addition, we tested generalised measures of redundancy and synergy for higher-order interactions proposed within the Partial Information Decomposition (PID) framework [4] and based on the minimum mutual information approach [5]. We performed a simulation based on a multivariate Gaussian model [6] to benchmark the proposed HOI metrics. The model included four modules, each composed of three variables, differentiated in terms of internal and task-related redundancy and synergy. We showed that the task-related O-information correctly captures whether a module is redundancy- or synergy-dominated, similarly to PID measures that separately extract redundant and synergistic information about task variables (Figure 1). We finally applied these tools to high-gamma activity (HGA) measured using magnetoencephalography (MEG) from human participants performing a reinforcement learning task. Preliminary results show that learning-related variables such as reward-related signals (Bayesian surprise) are encoded in multiplets over a distributed network including the visual, lateral prefrontal and orbitofrontal cortices. Overall, our project sets the basis for a new research framework focused on HOIs in cognitive networks and provides theoretical and computational tools for the systems, cognitive and clinical neuroscience communities. |
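For Gaussian variables the O-information has a closed form in terms of covariance determinants, which makes the benchmark setting above easy to reproduce. A minimal sketch, assuming Gaussianity; the toy redundant system (one shared driver) is illustrative:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian with covariance cov."""
    k = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** k * np.linalg.det(cov))

def o_information(X):
    """O-information of an n-variate Gaussian estimated from data X (n x T):
    Omega = (n-2) H(X) + sum_i [H(X_i) - H(X_{-i})].
    Positive: redundancy-dominated; negative: synergy-dominated."""
    n = X.shape[0]
    cov = np.cov(X)
    omega = (n - 2) * gaussian_entropy(cov)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        omega += gaussian_entropy(cov[[i]][:, [i]])
        omega -= gaussian_entropy(cov[np.ix_(rest, rest)])
    return omega

rng = np.random.default_rng(7)
z = rng.normal(size=1000)                                   # shared driver
X = np.vstack([z + 0.3 * rng.normal(size=1000) for _ in range(4)])
print(o_information(X) > 0)   # True: the shared driver makes X redundant
```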
17:45 | Modularity in random graphs from fluctuations or automatically from average degree? PRESENTER: Vilhelm Agdur ABSTRACT. Our results show that the high modularity exhibited by Erdős-Rényi random graphs G_{n,p} is asymptotically matched by any graph with that degree sequence. Hence we can attribute the high modularity of G_{n,p} to its degree sequence; the fluctuations in the random graph are not required, as was previously thought. |
18:00 | Using Voronoi Diagrams to Detect Optimal Community Structure of Weighted Directed Networks PRESENTER: Botond Molnár ABSTRACT. Community detection, i.e. partitioning network nodes into well-connected groups, is one of the classic problems of network science. While many algorithms have been developed for this purpose, most are designed for undirected graphs, and not all of them can take link weights into account. However, the accurate representation of many real-world systems requires considering the directionality and weight of connections. Here we propose a novel approach to community detection through a Voronoi partitioning of network nodes based on a shortest-path metric, and generalize it to weighted and directed networks. To do so, each link is assigned a length based on the link weights existing in the data as well as on the local network topology. The Voronoi method accepts a single interpretable scalar parameter, the radius, which allows tuning the scale (or resolution) of the obtained communities. The radius can be set directly, based on knowledge about the expected size of communities, or it can be adjusted so as to maximize any problem-appropriate quality measure (e.g. modularity). This method is particularly advantageous when studying networks in which a natural connection-length measure exists. Most other methods require transforming the lengths into a strength measure first, which is often done in a non-principled, ad-hoc manner. In contrast, the Voronoi algorithm can make use of length data directly. The method is also capable of revealing the hierarchical structure of a network if there is one, as in the case of brain connectivity. The computational efficiency of the Voronoi algorithm is comparable to other state-of-the-art methods, with near-linear time complexity. We demonstrate the effectiveness of the Voronoi method by applying it to several empirical datasets representing brain connectivity and airline routes. Additionally, we performed extensive testing on a large artificial benchmark dataset created using the popular LFR algorithm, and compared the performance of the Voronoi method to that of Infomap, which is presently the most popular community detection algorithm used on directed networks. |
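The core assignment step of a shortest-path Voronoi partition can be sketched in a few lines of networkx; seed selection, the link-length definition, and the radius parameter (the paper's actual contributions) are omitted or replaced by toy choices here.

```python
from collections import Counter
import networkx as nx

def voronoi_communities(G, seeds, weight="length"):
    """Assign each node to the seed with the smallest shortest-path distance,
    yielding a Voronoi partition of the graph."""
    dist = {s: nx.single_source_dijkstra_path_length(G, s, weight=weight)
            for s in seeds}
    return {v: min(seeds, key=lambda s: dist[s].get(v, float("inf")))
            for v in G}

G = nx.karate_club_graph()
nx.set_edge_attributes(G, 1.0, "length")       # unit edge lengths for the demo
part = voronoi_communities(G, seeds=[0, 33])   # two generator nodes (toy choice)
print(Counter(part.values()))                  # community sizes
```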
18:15 | Clustering graph embedding in sparse networks PRESENTER: Sadamori Kojaku ABSTRACT. Graph embedding maps a network into a compact vector space, enabling powerful machine learning models for visualization, clustering, and prediction. Given that community structure is ubiquitous, understanding how communities are embedded is critical. Two popular graph embedding approaches are based on graph spectra (e.g., Laplacian EigenMap) and neural networks (e.g., DeepWalk and node2vec). While some spectral embeddings are known to be optimal for community detection provided that networks are sufficiently dense, little is known about the capacity of neural embeddings to detect communities, particularly in sparse networks. Here, we show that a neural embedding method---node2vec---performs substantially better than spectral embedding in capturing communities in networks, as evidenced by two standard benchmarks for community detection, i.e., the SBM and LFR benchmarks (Figs. B-G). These benchmarks generate networks with a varying strength of community structure, ranging from nearly disconnected communities ($\mu = 0.05$) to random networks with no community structure ($\mu = 1.0$). We identify two factors that give rise to the exceptional effectiveness of node2vec. First, we show that node2vec is an optimal method for the SBM benchmark, with the algorithmic limit of community detectability coinciding with the information-theoretic limit. Second, node2vec is highly robust against sparsity and heterogeneity in network structure, thanks to an implicit regularization by its training algorithm (Smith et al., ICLR 2021), which helps mitigate the susceptibility to local noise that affects spectral embedding in sparse networks. We demonstrate the effectiveness of the implicit regularization by implementing the modularity embedding and Laplacian EigenMap using a neural network trained with the same training algorithm as node2vec, showing a systematic improvement over spectral embedding. |
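A hedged sketch of the node2vec-plus-clustering pipeline evaluated above, using the community `node2vec` pip package (an assumption; the authors' implementation and hyperparameters may differ) and k-means on the resulting embedding:

```python
import networkx as nx
import numpy as np
from node2vec import Node2Vec          # pip package `node2vec`, assumed
from sklearn.cluster import KMeans

# toy two-block network with planted community structure
G = nx.planted_partition_graph(2, 50, p_in=0.2, p_out=0.02, seed=0)

n2v = Node2Vec(G, dimensions=32, walk_length=20, num_walks=50, workers=1)
model = n2v.fit(window=5, min_count=1)  # trains a gensim Word2Vec internally

emb = np.array([model.wv[str(v)] for v in G])      # node embedding matrix
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels[:50].mean(), labels[50:].mean())      # the two blocks separate
```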
18:30 | Fast label propagation algorithm for community detection PRESENTER: Lovro Šubelj ABSTRACT. There exist various techniques that try to improve our understanding of networks. One such technique is to cluster the nodes of a network into communities, such that nodes within a community are relatively densely connected while being relatively sparsely connected to other communities. There exists a wide variety of approaches to detect communities in networks, each offering different interpretations and associated algorithms. For large networks, there is the additional requirement of speed. A technique that takes a heuristic approach is the label propagation algorithm (LPA), which runs in near-linear time. Simply put, LPA works by iteratively updating the community label of each node to the label that is most common among its neighbors. We propose a fast variant of LPA (FLPA), which is based on processing a queue of nodes whose neighborhood recently changed. For partitions found by either LPA or FLPA, we prove that each node is guaranteed to have the most links to its assigned community. We first analyse LPA and FLPA on theoretical graphs such as a star, a cycle, and a complete graph. We prove that both algorithms find the same partitions, but that FLPA has lower asymptotic complexity. Next, we thoroughly test FLPA on benchmark graphs and empirical networks. We find that the partitions found by LPA and FLPA are largely comparable, while FLPA can run up to 700 times faster than LPA. Our results show that FLPA is generally preferable to LPA. When using label propagation, we believe our fast variant will bring benefits at no additional cost. We consider FLPA to be useful as a quick initial look at a network, although other, slower methods are arguably more robust and preferable. The suggested speedup might also be relevant in the context of other applications of label propagation. |
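A minimal sketch of the queue-based idea behind FLPA: only nodes whose neighbourhood recently changed are re-examined. Tie-breaking and update order are illustrative choices; this is not the authors' reference implementation.

```python
from collections import deque, Counter
import networkx as nx
import random

def fast_label_propagation(G, seed=0):
    """Fast label propagation: only nodes whose neighbourhood recently
    changed are queued for re-evaluation, giving near-linear running time."""
    rng = random.Random(seed)
    labels = {v: v for v in G}
    nodes = list(G)
    rng.shuffle(nodes)
    queue, in_queue = deque(nodes), set(nodes)
    while queue:
        v = queue.popleft()
        in_queue.discard(v)
        if not G[v]:
            continue
        counts = Counter(labels[u] for u in G[v])
        best = max(counts.values())
        new = rng.choice([l for l, c in counts.items() if c == best])
        if new != labels[v]:
            labels[v] = new
            for u in G[v]:            # re-queue neighbours with other labels
                if labels[u] != new and u not in in_queue:
                    queue.append(u)
                    in_queue.add(u)
    return labels

G = nx.karate_club_graph()
print(len(set(fast_label_propagation(G).values())), "communities")
```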
17:45 | Unraveling the mesoscale organization induced by network-driven processes PRESENTER: Giacomo Barzon ABSTRACT. Complex systems are characterized by emergent patterns created by the non-trivial interplay between dynamical processes and the networks of interactions on which these processes unfold. Topological or dynamical descriptors alone are not enough to fully embrace this interplay in all its complexity, and often one has to resort to dynamics-specific approaches that limit the comprehension of general principles. To fill this gap, we introduce a metric -- which we name Jacobian distance -- inherently related to the spatiotemporal spreading of a perturbation, allowing us to exploit the latent geometry underlying network-driven processes. We compute the Jacobian distance for a broad set of nonlinear dynamics on synthetic and real-world networks of high interest for applications, from biology to ecological and social dynamics. We show, analytically and computationally, that the process-driven latent geometry of a complex network is sensitive to both the specific features of the dynamics and the topological properties of the network. As an emblematic case study, we analyze how the effective mesoscale organization of a network is affected, and show how the spectrum of the Jacobian matrix determines under which conditions functional modules are not trivially mapped to topological ones. |
18:00 | Rethinking community detection in the light of the Laplacian Renormalization Group PRESENTER: Anna Poggialini ABSTRACT. Heterogeneous and complex networks represent the intertwined interactions between real-world elements or agents. A fundamental problem of complex network theory involves finding inherent partitions, clusters, or communities. By taking advantage of the recent Laplacian Renormalization Group approach, we scrutinize information diffusion pathways throughout networks to shed further light on this issue. Based on inter-node communicability, our definition provides a unifying framework for multiple partitioning measures. Moreover, we account for the elusive behavior of components bridging different communities within the same network. |
18:15 | Laplacian eigenmodes allow the categorization of edges in complex networks PRESENTER: Francesca Santucci ABSTRACT. It has been shown that information pathways on a complex network shed much light on the graph structure. The graph Laplacian L governs the information flow in the network through the propagator K, allowing the definition of the analogue of the canonical density operator ρ(τ), which accounts for the ensemble of accessible information diffusion states at time τ. In turn, ρ(τ) allows the definition of a graph Shannon entropy S[ρ(τ)] that measures the degree of diffusion of the information initially contained in the individual nodes. It has been proven that S[ρ(τ)] can be considered an order parameter for second-order phase transitions in the information diffusion process, separating ranges of time when information integrates structures of different hierarchical levels. The extent of information diffusion over the nodes at a time τ can be visualized through the information integration matrix ζ[ρ(τ)], a discretized version of ρ(τ): ζ[ρ(τ)] is the adjacency matrix of a binary graph, connecting nodes that at time τ have already merged their information. In this sense, τ can be interpreted as a resolution parameter that allows the identification of graph structures at different scales. In this work, we apply the aforementioned graph-theoretical tools to address the issue of whether it is possible to characterize graph edges according to the way they contribute to information diffusion, and whether this characterization induces edge communities that disclose their hierarchical role in the transmission process. In particular, we show that the integration time, the time at which an edge is formed in ζ[ρ(τ)], contains much information on the edge's hierarchical role and is in direct correspondence with the diffusion time phases revealed by S[ρ(τ)]. Secondly, using the spectral decomposition of ρ(τ) as a function of time, we analyze the separate contributions of each diffusion mode to the diffusion process. We show that the modes' contributions exhibit a characteristic profile, the fingerprint of the hierarchical role of each edge in the graph. Finally, we feed these diffusion properties into an unsupervised learning algorithm to classify graph edges into categories that reveal the possibly nested hierarchical diffusion structures of graph edges. We show that meaningful communities are found when this method is applied to synthetic graphs. |
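A compact sketch of the quantities defined above: ρ(τ) = e^{-τL}/Tr e^{-τL} and the entropy S[ρ(τ)], computed from the Laplacian spectrum. The toy graph, chosen to have both a clique scale and a global scale, is illustrative.

```python
import numpy as np
import networkx as nx

def diffusion_entropy(G, taus):
    """Von Neumann-like entropy S[rho(tau)] of the diffusion density operator
    rho(tau) = exp(-tau L) / Tr exp(-tau L), from the Laplacian spectrum."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    lam = np.linalg.eigvalsh(L)
    out = []
    for tau in taus:
        w = np.exp(-tau * lam)
        p = w / w.sum()                   # eigenvalues of rho(tau)
        out.append(-np.sum(p * np.log(p + 1e-300)))
    return np.array(out)

G = nx.connected_caveman_graph(4, 8)      # 4 cliques of 8 nodes
print(diffusion_entropy(G, taus=[0.1, 1.0, 10.0, 100.0]).round(3))
```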
18:30 | Efficient network exploration by means of resetting self-avoiding random walks PRESENTER: Oriol Artime ABSTRACT. Many empirical networks are formed by a huge number of elements, e.g., the Internet or the brain, and hence a complete map of the topological interactions is not known and possibly out of reach. Similarly, there are scenarios in which, even if a system is of moderate size, we cannot have full access to its complete set of nodes and links due to experimental limitations, e.g., in interactomes. Efficiently exploring networks, and searching for targets inside them when the topological information is incomplete, thus becomes a problem of utmost importance in practical applications of network theory. Random walks are a central object of study in network science, and they play a pivotal role in the topic of search strategies. Indeed, they are particularly useful when it comes to agnostically navigating a network, that is, moving across a network for which we do not have a priori information. Frequently, we want to optimize the time to perform certain tasks, e.g., to visit all the nodes in the network (the so-called coverage time) or to find nodes with certain properties (topological or not). For such an endeavor, the random walk model, and especially its modifications, have been largely explored in the literature. For instance, it has been shown that not allowing the random walk to step back to the node it comes from can be beneficial for the search. Following this strand of research, in this work we propose a search strategy based on resetting self-avoiding random walks (SARW), showing that over a wide range of parameters they become a very efficient strategy for which the network coverage time is minimized. Complementing this result, we develop novel analytical techniques to characterize in depth the stochastic processes on which this strategy is based. Firstly, we tackle the problem of characterizing the time-dependent position distribution of a purely self-avoiding random walker on a network. The SARW is a non-Markovian model of infinite memory, which has traditionally been difficult to handle analytically. Remarkably, we are able to reduce this infinite non-Markovianity to quasi-Markovian equations, which offer an excellent agreement with simulations. Moreover, we provide an analysis of first-passage statistics, that is, the distribution of times at which the walker arrives for the first time at a node of degree k. Thus, this part deals with a well-known model in statistical physics for which, however, there are few known mathematical results on complex networks. Secondly, we allow the self-avoiding walker to reset its memory, and we frame this as a stochastic resetting process. This is necessary to avoid topological traps, in which the non-resetting SARW gets blocked and is unable to continue its trajectory. By incorporating the resetting mechanism, we get rid of these absorbing states and the walk keeps evolving, eventually finding a predefined target. By using renewal equations, we accurately compute the time-dependent position and the first-passage distributions (see top figure for the latter), finding again a good agreement with simulations. Finally, we use the aforementioned concepts to show that the coverage time has a global minimum that depends on the resetting distribution (see bottom figure).
Moreover, using the random walk as a benchmark (mean waiting time equal to 0, leftmost points in the bottom figure), we can quantify in which parameter region resetting is beneficial or detrimental for exploring the whole network. In this work, we also analyze the role played by topological parameters (the mean degree in Erdos-Renyi networks, the exponent of the degree distribution in scale-free networks) in the position of the walker and in the first-passage statistics. Beyond synthetic topologies, we verify that the analytical predictions work well, and that the minimization of coverage times occurs, in two empirical networks. The work is currently under review. |
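A toy simulation in the spirit of the resetting SARW described above: the walker never revisits remembered nodes, and its memory is reset when it is trapped or after an exponentially distributed waiting time. The graph choice, the reset law, and all parameter values are assumptions for illustration.

```python
import networkx as nx
import random

def resetting_sarw_coverage(G, mean_wait=50.0, seed=0):
    """Steps needed for a self-avoiding walk with memory resets to visit
    every node of a connected graph."""
    rng = random.Random(seed)
    node = rng.choice(list(G))
    memory, covered, steps = {node}, {node}, 0
    next_reset = rng.expovariate(1.0 / mean_wait)
    while len(covered) < G.number_of_nodes():
        allowed = [u for u in G[node] if u not in memory]
        if not allowed or steps >= next_reset:   # trapped, or timed reset
            memory = {node}                      # forget the visited set
            next_reset = steps + rng.expovariate(1.0 / mean_wait)
            continue
        node = rng.choice(allowed)
        memory.add(node)
        covered.add(node)
        steps += 1
    return steps

G = nx.barabasi_albert_graph(200, 3, seed=1)     # connected toy network
print(resetting_sarw_coverage(G), "steps to full coverage")
```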
17:45 | Network medicine identifies the core module of human immune dysregulation and immunodeficiency PRESENTER: Julia Guthrie ABSTRACT. Although research on rare autoimmune and autoinflammatory diseases has enabled the definition of non-redundant regulators of homeostasis in human immunity, due to the single gene-single disease nature of many of these diseases, contributing factors were mostly unveiled in sequential and non-coordinated individual studies. Here, we used a network-based approach to integrate a set of 186 inborn errors of immunity with predominant autoimmunity/autoinflammation into a comprehensive map of human immune dysregulation, which we termed “AutoCore”. The AutoCore is located centrally within the interactome of all protein-protein interactions, connecting a range of common, polygenic autoimmune/autoinflammatory diseases. The AutoCore can be subdivided into 19 endotypes that correspond to molecularly and phenotypically cohesive disease subgroups, providing a pathway-based disease classification and a potential rationale for systematic therapeutic targeting. Our study provides a blueprint for using network-based methods to systematically investigate the molecular relationships between individual rare diseases and to address a range of conceptual, diagnostic, and therapeutic challenges. |
18:00 | To be or not to be: Uncovering the Interplay between Phenotypic Robustness and Plasticity in Gene Regulatory Networks PRESENTER: Anantha Samrajya Shri Kishore Hari ABSTRACT. Cancer is a complex disease characterized by uncontrolled cell proliferation and the ability of cells to colonize multiple organs in the body through a process called metastasis. While genetic mutations are thought to be the primary drivers of cancer, these changes are not sufficient to explain many aspects of the disease, particularly metastasis. Metastatic cells must adapt dynamically to different biochemical and biomechanical changes in their environment as they migrate through tissue barriers, travel through the bloodstream, and colonize distant sites in the body. This dynamic adaptation is facilitated by two important properties of metastatic cells: phenotypic plasticity and phenotypic robustness. Phenotypic plasticity is the ability of cells to adapt dynamically to their environment by acquiring appropriate phenotypes, such as immune evasion, adhesion to the surroundings during colonization, and a lack thereof during migration. Phenotypic robustness, on the other hand, refers to the ability of cells to maintain phenotypes conducive to their survival despite environmental fluctuations. Although these two properties are of a conflicting nature, a balance between them is necessary for successful metastasis. We analyze gene regulatory networks underlying metastasis to better understand the emergence of phenotypic plasticity and robustness. Specifically, we focus on the complex gene regulatory networks underlying Epithelial-Mesenchymal Plasticity (EMP), a critical process in metastasis. Our research reveals that the topological traits of these networks hold key information for explaining these emergent properties. While plasticity and robustness have an antagonistic relationship, common topological features such as positive feedback loops support both properties. These features are uniquely enriched in biological networks, indicating their evolutionary importance. Our findings offer new avenues for developing therapeutic targets to control plasticity and reduce the robustness of the hybrid E/M phenotypes of cancer cells, which contribute greatly to their metastatic potential. By targeting the positive feedback loops that support both plasticity and robustness, it may be possible to reduce the ability of cancer cells to adapt to their environment and colonize distant organs. These findings could have significant implications for the development of more effective and targeted cancer treatments. |
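Since the argument turns on positive feedback loops, a short sketch of how one might enumerate them in a signed regulatory network may be useful: a directed cycle is positive when it carries an even number of inhibitory edges. The `sign` attribute and the length cutoff are our conventions, not the authors' code; the ZEB1/miR-200 mutual inhibition used as the toy example is a standard EMP motif.

```python
import networkx as nx

def positive_feedback_loops(G, max_len=6):
    """Simple directed cycles with an even number of inhibitory edges.
    Edges carry sign=+1 (activation) or sign=-1 (inhibition)."""
    loops = []
    for cycle in nx.simple_cycles(G):
        if len(cycle) > max_len:
            continue
        sign = 1
        for u, v in zip(cycle, cycle[1:] + cycle[:1]):  # wrap around the cycle
            sign *= G[u][v]["sign"]
        if sign == 1:          # even number of negative edges -> positive loop
            loops.append(cycle)
    return loops

# Toy motif: two mutually inhibiting regulators form a positive feedback loop.
G = nx.DiGraph()
G.add_edge("ZEB1", "miR200", sign=-1)
G.add_edge("miR200", "ZEB1", sign=-1)
print(positive_feedback_loops(G))   # one length-2 positive loop
```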
18:15 | Gene regulatory network remodeling through embryonic development trajectories PRESENTER: Celine Sin ABSTRACT. During embryo development, gene regulatory networks orchestrate dynamically changing gene expression programs to enable the complex process of coordinated cell differentiation juxtaposed with the stable maintenance of diverse cell types. While a handful of key transcription factors have been identified that promote certain differentiation events, it is not clear how gene regulatory networks can produce such a dynamic range of gene expression programs. Network theory has demonstrated numerous relationships between the structure of networks and the dynamic processes arising from them, but these mathematically rigorous principles have not been linked to gene regulatory networks. We harmonized 21 scRNA-seq datasets, together containing over 5 million cells and following human embryo development from conception to organogenesis, to build a series of gene regulatory networks resolved by cell type and developmental time. With the harmonized data, we were able to trace pseudotime trajectories and reconstruct the transcriptomic manifold to evaluate expression-program stability, attraction points, and bifurcation points. These local manifold properties are tied to the dynamic processes controlling cell fates; by linking them to the topological structure of the underlying gene regulatory networks, we will better understand gene regulatory network remodeling and how it contributes to the enormous dynamic range of gene expression programs. Our findings guide future research in cell differentiation and are particularly relevant for the development of early embryo organoids. |
17:45 | [CANCELLED] Time Dynamics of the Dutch Municipality Network PRESENTER: Ivan Jokić ABSTRACT. Based on data sets provided by Statistics Netherlands and the International Institute of Social History, we investigate the Dutch municipality merging process and the survivability of municipalities over the period 1830–2019. We examine the dynamics of the population and area per municipality and how their distributions evolved over the researched period. We apply a network science approach in which each node represents a municipality and the links represent the geographical interconnections between adjacent municipalities via the roads, railways, bridges, or tunnels available in each yearly network instance. Over the researched period, we find that the distributions of the logarithm of the population and of the area size closely follow a normal and a logistic distribution, respectively. The tails of the population distributions follow a power law, a phenomenon observed in the community structures of many real-world networks. The dynamics of the area distribution are mainly determined by the merging process, while the population distribution is also driven by natural population growth and by migration across the municipality network. Finally, we propose a model of the Dutch Municipality Network that captures population increase, population migration between municipalities, and the process of municipality merging. Our model allows for predictions of the population and area distributions over time. |
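The distributional claims above are easy to probe on any list of municipality sizes. A minimal sketch, assuming a vector of population (or area) values for one yearly snapshot, fits normal and logistic distributions to the log-sizes and compares their maximized log-likelihoods; this is illustrative, not the authors' estimation procedure.

```python
import numpy as np
from scipy import stats

def compare_log_fits(sizes):
    """Fit normal and logistic distributions to log-transformed sizes
    and return the maximized log-likelihood of each fit."""
    x = np.log(np.asarray(sizes, dtype=float))
    return {
        name: dist.logpdf(x, *dist.fit(x)).sum()
        for name, dist in [("normal", stats.norm), ("logistic", stats.logistic)]
    }

# e.g. compare_log_fits(populations_1900) -> {'normal': ..., 'logistic': ...}
```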
18:00 | Diffusion-based classification of urban spatial patterns PRESENTER: Silvia Rognone ABSTRACT. The emergence of spatial patterns is one of the most interesting characteristics of complex spatial systems, and a particularly challenging problem is posed by the quantification of spatial heterogeneity in urban systems, where a large number of discrete components are involved. As a consequence, finding suitable ways of fingerprinting and classifying cities based on their street patterns is still an open problem. In this work, we propose a method to classify urban spatial patterns by looking at the overall organisation of the sizes of the polygons cut out by the street network of a city. We analyse the adjacency networks of polygons associated with the street patterns of 87 cities around the world, obtained by associating each polygon with a node and representing adjacency relations as edges. We assign each node one of three colours, and we compare the heterogeneity of these colour patterns with that measured on the associated Voronoi mapping obtained from the real network of each city. We quantify the heterogeneity of the spatial patterns exhibited by each city by measuring the Class Mean First Passage Times (MFPT) of random walks on the adjacency graphs and comparing this measure with the same quantity on the Voronoi mapping. We will show that this measure can highlight interesting new relationships between cities and is useful for classifying and grouping cities according to their fundamental geometrical structure. These preliminary results suggest that comparing urban systems through the relative arrangement of the shapes cut out by their street networks is a promising approach toward a more robust classification of cities based on their salient structural properties. |
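Mean first-passage times of an unbiased random walk on a connected undirected graph have a closed form via the fundamental matrix, which makes the class-level measure straightforward to prototype. The aggregation below (averaging pairwise MFPTs from one colour class to another) is one plausible reading of "Class MFPT", not necessarily the authors' exact definition.

```python
import numpy as np
import networkx as nx

def mfpt(G):
    """Pairwise mean first-passage times M[i, j] = E[time to reach j from i]
    for an unbiased random walk on a connected undirected graph,
    computed via the fundamental matrix Z."""
    A = nx.to_numpy_array(G)
    P = A / A.sum(axis=1, keepdims=True)          # row-stochastic transitions
    pi = A.sum(axis=1) / A.sum()                  # stationary distribution
    Z = np.linalg.inv(np.eye(len(P)) - P + np.outer(np.ones(len(P)), pi))
    return (np.diag(Z)[None, :] - Z) / pi[None, :]

def class_mfpt(G, colour, c_from, c_to):
    """Average MFPT from nodes coloured c_from to nodes coloured c_to."""
    M = mfpt(G)
    idx = {u: k for k, u in enumerate(G.nodes)}
    src = [idx[u] for u in G if colour[u] == c_from]
    dst = [idx[u] for u in G if colour[u] == c_to]
    return M[np.ix_(src, dst)].mean()
```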
18:15 | Mobility Rhapsody: Unleashing the Power of Deep Learning and XAI to Decode Urban Flows at City Scale PRESENTER: Fátima Velásquez-Rojas ABSTRACT. In this work, we leverage deep neural networks (NN) and explainable AI tools (XAI) to investigate how human mobility is driven in different temporal settings (e.g., differences between working days and weekends). We carried out the study in 20 different cities around the world. The model we use is inspired by the Deep Gravity model [1] and consists of 4 fully connected networks. A comparison, in terms of accuracy, of the flows generated by our model (NN) and by a gravity model (G) can be seen in Table 1. Regardless of the temporal setting (weekends, WE; weekdays, WD), our model outperforms the baseline. We then use SHAP [2] to explain the importance of the features behind the mobility flows generated by the models and to spot differences in the attractiveness of different geographical features during weekends and weekdays. Using SHAP allows us, for example, to understand the role of restaurants in different temporal settings. Figure 1 shows preliminary SHAP values for the city of Barcelona: the importance of the geographical features collected from OpenStreetMap, population, and distances. The target of the model is to generate flows that match the ones we collected from Twitter over 2014–2021 (2020 excluded). In Figure 1, the left panel shows the importance of features during weekdays, while the right panel analyzes the weekends. Each point denotes an origin-destination pair; blue points represent pairs where the feature has a low value and red points pairs with a high value. For each variable, these points are randomly jittered along the vertical axis to make overlapping ones visible. A point's position on the horizontal axis represents the feature's Shapley value for that origin-destination pair, that is, whether the feature contributes to increasing or decreasing the generated flow for that pair. The order of the features corresponds to their overall importance. While there are few differences in overall importance, the impact on the flow of, for example, the number of shops at the origin is larger on weekdays than on weekends. This methodology gives us another way to characterize the generation of mobility flows in cities. [1] Simini et al. Nat Commun 12, 6576 (2021). [2] Lundberg and Lee. Advances in Neural Information Processing Systems 30, 4765–4774 (2017). |
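A bare-bones version of the explanation step might look as follows. The feature list, layer widths, and data are placeholders (the real model is trained on observed origin-destination flows), and the shape of SHAP's return value varies across library versions, hence the guard.

```python
import numpy as np
import torch
import torch.nn as nn
import shap

# Hypothetical per-OD-pair features: POI counts at origin/destination,
# populations, and distance.
FEATURES = ["shops_o", "shops_d", "restaurants_o", "restaurants_d",
            "population_o", "population_d", "distance"]

model = nn.Sequential(                  # small fully connected flow model
    nn.Linear(len(FEATURES), 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),                   # predicted flow for the OD pair
)
# ... assume `model` has been trained on observed flows ...

X = torch.randn(1000, len(FEATURES))    # stand-in for real OD feature matrix
explainer = shap.DeepExplainer(model, X[:100])          # background sample
sv = explainer.shap_values(X[:500])
sv = np.squeeze(sv[0] if isinstance(sv, list) else sv)  # version-dependent
shap.summary_plot(sv, X[:500].numpy(), feature_names=FEATURES)
```

Running this once on weekday flows and once on weekend flows yields the two beeswarm panels the abstract compares.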
18:30 | Historical Dynamics of the Kingdom of Chosŏn’s Governance: Patterns of Meritocratic Bureaucracy and Consequences of Systemic Corruption PRESENTER: Donghyeok Choi ABSTRACT. The study of history is a fascinating field of inquiry that seeks to uncover and understand the patterns of human actions under varying political, economic, or natural conditions, as well as the consequences that flow from those actions. While the complexity of the continually evolving social system makes it unlikely that a set of clean and concise “laws of history” will ever be found, the increasing availability of data on past human actions and social conditions has opened the avenue for “quantitative history”, the history-side analog of computational social science. In recent times, patterns of accomplishment by individuals in science, culture, and athletics have garnered much attention. This is likely driven by the fact that these feats are often widely admired by connoisseurs and aficionados for their excellence or perceived value to society, and that they produce concrete objects in the form of publications, artworks, figures (numbers), etc. that can be systematically collected, organized, and analyzed. However, an area of human activity that has an even more significant impact on society is politics. Politics can be broadly defined as ‘the activities associated with the governance of a country’ or ‘an authoritative allocation of values’. The importance of politics and governance in the functioning and evolution of a highly complex system such as a society cannot be overstated. A large-scale study of individuals’ actions and accomplishments in this field, spanning the entire duration of a historical regime, is all the more valuable because it offers a glimpse into how the society functioned and, when governance is in crisis, what kinds of dynamics evolve that threaten its existence. In this presentation, we introduce and analyze the official government records of the Chosŏn dynasty (1392–1910 CE) that ruled the Korean peninsula and its inhabitants. These records are widely recognized as an exemplary specimen of meticulous, comprehensive, and reliable historic data.
Figure: The interaction network of three royal figures (King Tanjong and his uncles Suyang and Anp'yŏng) and bureaucrats during Suyang's revolt of 1453. Suyang successfully overthrew Tanjong, while Anp'yŏng failed in his own revolt at the same time and was eventually executed. The edges in the network represent co-appearance in the same article of the Annals of Chosŏn during a two-month period surrounding the revolt, and the node colors represent the post-revolt status of the bureaucrats: blue means purged and red means decorated by Suyang. The figure shows how political events and governance are correlated with the social network of major figures: a simple visual inspection suggests that the purged tend to be close to Anp'yŏng, and the decorated to Suyang. The inset in the lower left shows this in more quantitative terms: nodes with large eigenvector centrality (an indication of close association with the central nodes, and possibly of partisanship and alliance) tend to be either decorated (Suyang's allies) or purged (Anp'yŏng's allies), whereas those with high betweenness centrality (an indication of acting as intermediaries between central nodes rather than being partisan) show no such tendency.
Utilizing modern quantitative methods on detailed information about Chosŏn's bureaucracy and its people, we aim to find clues to the following questions: What kinds of talent did Chosŏn's vaunted bureaucracy recruit into itself? How did the bureaucrats’ careers unfold over their lifetimes, and how did they correlate with major historical events? And what signals can we extract from these data about Chosŏn’s fate? The practical importance of governance cannot be overstated: a functioning government, the opposite of anarchy, is considered necessary for an orderly society and for the active resolution of discord and conflict between the various social constituents. Characterizing the patterns of governance and bureaucracy during the Chosŏn dynasty can help us understand how the society functioned and, when governance was in crisis, what kinds of dynamics evolved that threatened its existence. Ultimately, this research can help us obtain a better understanding of how social systems evolve and decay, and thus help us prepare for a better future. |
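The centrality contrast described in the figure is easy to reproduce on any co-appearance network. The sketch below uses networkx's built-in Les Misérables co-appearance graph as a stand-in for the Annals data, and the post-revolt status labels are placeholders for what the historical records would supply.

```python
import networkx as nx

G = nx.les_miserables_graph()            # stand-in co-appearance network

eig = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
btw = nx.betweenness_centrality(G)

# With real data, `status` would map each bureaucrat to "decorated",
# "purged", or "neither", read off the post-revolt records.
status = {u: "neither" for u in G}       # placeholder labels

for group in sorted(set(status.values())):
    nodes = [u for u in G if status[u] == group]
    print(group,
          sum(eig[u] for u in nodes) / len(nodes),   # partisanship proxy
          sum(btw[u] for u in nodes) / len(nodes))   # intermediary proxy
```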