
12:00 | Robustness on modular interacting networks ABSTRACT. With the spread of IoT applications, the acceleration of the smart-city process, and the comprehensive popularization of the mobile Internet, the amount of data related to people's livelihoods is increasing with each passing day. These massive data exhibit characteristics such as multi-source heterogeneity, wide distribution, and dynamic growth, and indicate from different perspectives that each system is embedded in a large complex network system of interconnection, dynamic evolution, and coupling dependence. This lecture proposes a class of mathematical framework models to study the destructibility of complex networks with association structure from the perspective of structural robustness. It is found that changing the ratio of interdependent edges and interdependent nodes can make the continuous phase transition that occurs in a single network disappear, stabilizing the system. Furthermore, it is shown that the interdependent edges act like the external field in a ferromagnetic system. With the help of this external-field effect, two types of critical exponents are defined at the critical phase transition point and found to obey the universal Widom identity. However, current theoretical models tend to assume homogeneous coupling among sub-networks, that is, all sub-networks interact with one another, while real-world systems often exhibit a variety of coupling patterns. We propose two types of frameworks to explore structural robustness in such coupled giant networks, covering specific deterministic coupling patterns and coupling patterns in which specific sub-networks are connected at random. It is found analytically and numerically that when the total number of edges in the network is kept constant, the location of the critical point varies non-monotonically with the proportion of interconnected nodes.
There exists an optimal ratio of interconnected nodes at which the system is most robust and resilient to external shocks. The results provide a deeper understanding of network resilience and show how networks can be optimized according to their specific coupling patterns. The results are published in PNAS (2021, 118(22), e1922831118; 2018, 115(27), 6911-6915). |
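The non-monotonic dependence described above can be explored with a stand-alone percolation experiment. Everything below (module sizes, ER connectivity, the way interlinks are added) is an illustrative assumption, not the talk's model, which additionally keeps the total edge budget constant:

```python
# Sketch (not the talk's model): random-removal robustness of two ER
# modules coupled through a fraction q of interconnected nodes.
import random
from collections import deque

def er_edges(nodes, p, rng):
    """Edges of an Erdos-Renyi graph on the given nodes."""
    nodes = list(nodes)
    return [(u, v) for i, u in enumerate(nodes)
            for v in nodes[i + 1:] if rng.random() < p]

def giant_component(n, edges, removed):
    """Size of the largest connected component after node removal."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].append(v); adj[v].append(u)
    seen, best = set(), 0
    for s in range(n):
        if s in seen or s in removed:
            continue
        size, queue = 0, deque([s]); seen.add(s)
        while queue:
            u = queue.popleft(); size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w); queue.append(w)
        best = max(best, size)
    return best

def coupled_system(m, p, q, rng):
    """Two ER modules of m nodes; a fraction q of nodes in each module
    gets one random inter-module edge (edge budget not conserved here,
    unlike in the talk's constant-edge setting)."""
    edges = er_edges(range(m), p, rng) + er_edges(range(m, 2 * m), p, rng)
    k = int(q * m)
    pairs = zip(rng.sample(range(m), k), rng.sample(range(m, 2 * m), k))
    return edges + list(pairs)

rng = random.Random(7)
m, p = 200, 0.02
for q in (0.0, 0.5, 1.0):   # sweep the interconnection fraction
    edges = coupled_system(m, p, q, rng)
    removed = set(rng.sample(range(2 * m), m))   # remove 50% of nodes
    print(q, giant_component(2 * m, edges, removed) / (2 * m))
```

Sweeping q on a finer grid, at fixed total edge count, is what would expose the optimal interconnection ratio the talk reports.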

12:15 | Research on Cascading Failure Model and Robustness of Weighted Interdependent Networks ABSTRACT. Cascading failures often occur in real systems, and weighted multilayer network models can represent such physical systems more faithfully. However, cascading failures have rarely been studied in weighted interdependent networks. On the one hand, the influence of edge weights on node capacity and load-sharing strategy is seldom considered in the design of cascading failure models. On the other hand, the cascading failure process in weighted interdependent networks is seldom modeled. To resist cascading failures, a new cascading failure model is proposed on top of constructed weighted interdependent networks, in which the initial load incorporates node strength, and the load-sharing rules combine the remaining capacity of nodes with node importance, as measured by weight and node degree. Second, considering the time-varying characteristics of load and the inter-layer dependence during the cascading failure process, a cascading failure algorithm for weighted interdependent networks is proposed. Finally, the effect of the model parameters on network robustness and the performance of the model in different interdependent networks are obtained through the analysis of network cascading failures, which verifies the effectiveness of the proposed method. |
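A single-layer version of such a load model can be sketched as follows. The functional forms are assumptions common in this literature, not the authors' exact formulation: initial load L_i = s_i^theta for node strength s_i, capacity C_i = (1 + alpha) L_i, and a failed node's load shared among surviving neighbours in proportion to their remaining capacity:

```python
# Minimal cascading-failure sketch (assumed functional forms, single layer;
# the talk's model additionally handles inter-layer dependence).
def cascade(adj, w, theta=1.0, alpha=0.5, seed_fail=0):
    """adj: node -> neighbour list; w: (u, v) with u < v -> edge weight.
    Returns the set of failed nodes after the cascade triggered by
    seed_fail has stopped."""
    strength = {u: sum(w[(min(u, v), max(u, v))] for v in adj[u]) for u in adj}
    load = {u: strength[u] ** theta for u in adj}            # initial load
    cap = {u: (1 + alpha) * load[u] for u in adj}            # capacity
    failed, frontier = set(), [seed_fail]
    while frontier:
        nxt = []
        for u in frontier:
            if u in failed:
                continue
            failed.add(u)
            alive = [v for v in adj[u] if v not in failed]
            # share in proportion to remaining capacity of alive neighbours
            room = {v: max(cap[v] - load[v], 0.0) for v in alive}
            total = sum(room.values())
            for v in alive:
                share = room[v] / total if total > 0 else 1.0 / len(alive)
                load[v] += load[u] * share
                if load[v] > cap[v]:                         # overload
                    nxt.append(v)
        frontier = nxt
    return failed
```

The tolerance parameter alpha controls robustness: on a weighted triangle, alpha = 0.5 absorbs the seed failure, while alpha = 0 lets it propagate to every node.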

12:30 | Cost-effective Network Disintegration through Targeted Enumeration ABSTRACT. We live in a hyperconnected world---connectivity that can sometimes be detrimental. Finding an optimal subset of nodes or links to disintegrate harmful networks is a fundamental problem in network science, with potential applications to anti-terrorism, epidemic control, and many other fields of study. The challenge of the network disintegration problem is to balance the effectiveness and efficiency of strategies. In this paper, we propose a cost-effective targeted enumeration method for network disintegration. The proposed approach includes two stages: searching candidate objects and identifying an optimal solution. In the first stage, we use rank aggregation to generate a comprehensive node importance ranking, upon which we identify a small-scale candidate set of nodes to remove. In the second stage, we use an enumeration method to find an optimal combination among the candidate nodes. Extensive experimental results on synthetic and real-world networks demonstrate that the proposed method achieves a satisfactory trade-off between effectiveness and efficiency. The introduced two-stage targeted enumeration framework can also be applied to other computationally intractable combinatorial optimization problems, from team assembly, via portfolio investment, to drug design. |
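The two-stage idea can be sketched as follows; the specific rankings aggregated here (degree and a two-hop degree sum) and the Borda rule are stand-ins for the rank aggregation actually used in the paper:

```python
# Two-stage sketch: rank aggregation to shortlist candidates, then
# exhaustive enumeration of k-subsets minimising the giant component.
from itertools import combinations

def giant(adj, removed):
    """Largest connected component size after removing the given nodes."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop(); size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v); stack.append(v)
        best = max(best, size)
    return best

def borda(rankings):
    """Aggregate several rankings into one by Borda count."""
    score = {}
    for r in rankings:
        for pos, u in enumerate(r):
            score[u] = score.get(u, 0) + len(r) - pos
    return sorted(score, key=score.get, reverse=True)

def disintegrate(adj, k=2, m=5):
    """Stage 1: aggregate two importance rankings, keep top-m candidates.
    Stage 2: enumerate all k-subsets of candidates, pick the best."""
    by_degree = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    by_sum2 = sorted(adj, key=lambda u: sum(len(adj[v]) for v in adj[u]),
                     reverse=True)                       # 2-hop proxy
    candidates = borda([by_degree, by_sum2])[:m]
    return min(combinations(candidates, k),
               key=lambda S: giant(adj, S))
```

Enumeration over the top-m shortlist costs C(m, k) evaluations instead of C(n, k), which is the effectiveness/efficiency trade-off the abstract describes.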

12:45 | Non-Markovian random walks characterize network robustness to non-local cascades ABSTRACT. Understanding the interplay between structure and dynamics is still one of the major challenges in network science. A central question concerns the robustness of a system against perturbations, since it can advance the development of powerful analytical techniques to explain and unravel rich phenomenology, and provide a solid ground for informed interventions. A main assumption behind the analysis of robustness is that for a system to be functional, it needs to be connected. Hence, concepts and techniques from percolation theory become useful and are frequently employed. This is a completely static approach, where a fraction of nodes (or links), either selected uniformly at random or based on topological or non-topological descriptors, is removed from the network. From a dynamical point of view, small failures placed in the network may evolve ---according to some rules that depend on the phenomenon one is trying to model--- causing system-wide catastrophic cascades. For the sake of mathematical tractability, cascades are usually assumed to spread via direct contacts. However, be it because the physical mechanisms behind the failure propagation permit far-off malfunctions, or because the knowledge of the observed network topology is incomplete and the failure propagates through hidden or unobserved edges, real-world cascades display non-local features. From a modeling standpoint, some mechanisms like flow redistribution can lead to non-local spreading of failures, but their mathematical treatment has hitherto remained underdeveloped owing to its difficulty, and there is no direct way to control the underlying properties of the non-local events, seriously limiting our understanding of the phenomenon. To better reconcile theory and observations, we propose a dynamical model of non-local failure spreading that combines local and non-local effects.
We assume that the cascade unfolds on a timescale much faster than the recovery of nodes, and that a disrupted unit cannot be visited more than once by the failure. This makes the failure non-Markovian and, for modeling purposes, a natural choice is to consider a Self-Avoiding Random Walk-like (SARW) dynamics on the network. To cope with the non-locality, we introduce a teleporting probability: at each step $t$ the failure proceeds as in a SARW ---uniformly choosing an operational neighbor and transitioning there--- with probability $1 - \alpha$, where $\alpha \in [0, 1]$; otherwise, with probability $\alpha$, it teleports to any operational node according to a time- and degree-dependent teleporting rule $T_t(k)$. We name this the self-avoiding teleporting random walk (SATRW), which interpolates between percolation (purely non-local process, $\alpha = 1$) and the growing SARW (purely local process, $\alpha = 0$). We have characterized the rich critical behavior of our model by providing analytical expressions for several quantities employed to assess the system’s robustness, such as the time-dependent degree distribution $p_t(k)$, the size $s_t$ of the giant component in the residual network as the process evolves, the cascade first-stop time distribution, and the mean value of $S^{\text{(STOP)}}$, the giant component at the cascade stop. These robustness descriptors display excellent agreement with simulations in synthetic systems characterized by different types of complexity in terms of the heterogeneity of their structural connectivity. We find remarkable differences between homogeneous and heterogeneous systems, e.g., their dependence, or lack thereof, on the particular network parameters. However, we also report some hidden similarities between them, such as a dynamical version of the popular \textit{robust-yet-fragile} feature known from static attacks.
It is worth noting that, although our framework is expected to work only for locally tree-like networks lacking topological correlations, such as degree-degree correlations, it still works in empirical settings, as we have shown for a biomolecular system, namely the interactome of the nematode \textit{C. elegans}, and an infrastructural system, namely a national air traffic network, shown in Fig.~1. Our findings provide a solid ground for the analytical study of network robustness, in particular, and of non-local non-Markovian processes, in general. The article is currently under review. |
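A minimal simulation of the SATRW dynamics might look as follows; the uniform teleporting rule is an assumption (the model allows time- and degree-dependent rules $T_t(k)$), and the walker simply records the failed nodes:

```python
# Toy SATRW sketch: the walker removes each node it visits; with
# probability 1 - alpha it steps to a uniform operational neighbour,
# otherwise it teleports to a uniform operational node. alpha = 1 mimics
# percolation-style removal; alpha = 0 is a growing SARW, which stops
# when the walker is trapped.
import random

def satrw(adj, alpha, start, rng):
    """adj: node -> neighbour list (sortable node labels assumed).
    Returns the list of failed nodes in order of failure."""
    alive = set(adj) - {start}
    pos, visited = start, [start]
    while alive:
        neigh = [v for v in adj[pos] if v in alive]
        if neigh and (alpha == 0 or rng.random() > alpha):
            pos = rng.choice(neigh)            # local SARW step
        elif alpha > 0:
            pos = rng.choice(sorted(alive))    # uniform teleport
        else:
            break                              # pure SARW is trapped
        alive.discard(pos)
        visited.append(pos)
    return visited
```

Averaging the giant component of the residual network over many runs, as a function of alpha, reproduces the kind of robustness descriptors the abstract analyses in closed form.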

13:00 | Resilience-informed infrastructure network dismantling ABSTRACT. Large-scale networked infrastructure systems contribute significantly to modern society. Highly intra- and inter-connected systems enable communities to be more productive, at the expense of becoming more vulnerable to extreme events, cascading failures, and operational demands, whether random or deliberate. The resilience of infrastructure systems against common but random failures and rare but intentional attacks is critical for safe communities, as it also covers the other types of contingencies in between. Network dismantling is the process of making a network dysfunctional by removing a fraction of its components, which provides insights for robustness and resilience design under many events. In particular, to protect networks from uncertain dismantling, we need to understand how to optimally fragment networks into small clusters by removing a fraction of their assets at minimal cost. Approximation methods are desirable because finding the optimal dismantling strategy is NP-hard and thus impractical on infrastructure networks. Existing methods rely on the iterative removal of the nodes with the highest adaptive importance, computed either from basic centralities, such as degree and betweenness, or from more advanced metrics like collective influence. However, the additive nature of such methods fails to capture the synergistic nature of the dismantling problem, such as the phase transition in percolation processes. Algorithms connecting network dismantling with network decycling tend to identify better dismantling sets. Other recent strategies add realism by adopting nonuniform node removal costs and iteratively applying a bisecting algorithm based on weighted spectral approximations to approximate the optimal solution under nonuniform costs.
Despite these efforts, the combinatorial optimization nature of the network dismantling problem still calls for better global solutions, even if approximate. Additionally, the cost to remove components is the only factor considered in most previous methods. Network resilience, which can inform what to protect from dismantling so as to facilitate recovery, is seldom included as part of the cost. In this work, we propose a method employing Karger’s contraction algorithm and a node-transferring heuristic optimization to approximate the optimal dismantling set, considering both component removal cost and network resilience after dismantling. By taking the cost associated with the resilience of the dismantled network into account, the proposed method not only dismantles the network as desired, but also makes it hard to recover. The proposed method, resilDism, obtains good performance compared to state-of-the-art network dismantling methods and provides valuable insights to guide network design and resilience enhancement in practice. |
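The contraction stage can be sketched as below; this is plain Karger-style edge contraction down to k super-nodes, without the removal costs, node-transferring heuristic, or resilience terms that resilDism adds:

```python
# Karger-style contraction sketch: repeatedly contract a random edge
# until k super-nodes remain; each super-node is a candidate cluster and
# the edges crossing clusters form a candidate cut.
import random

def karger_contract(edges, n_nodes, k, rng):
    """edges: list of (u, v) over nodes 0..n_nodes-1. Returns the list
    of clusters (node groups) and the list of cut edges between them."""
    parent = list(range(n_nodes))            # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]    # path halving
            u = parent[u]
        return u

    groups, pool = n_nodes, list(edges)
    while groups > k and pool:
        u, v = pool.pop(rng.randrange(len(pool)))   # random edge
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                  # contract the edge
            groups -= 1
    cut = [(u, v) for u, v in edges if find(u) != find(v)]
    clusters = {}
    for u in range(n_nodes):
        clusters.setdefault(find(u), []).append(u)
    return list(clusters.values()), cut
```

Running the contraction many times and keeping the cheapest cut is the usual way to use Karger's algorithm; a dismantling method can then score candidate cuts by removal cost plus a post-dismantling resilience penalty.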

13:15 | Defense Against Shortest Path Attacks PRESENTER: Benjamin Miller ABSTRACT. Identifying shortest paths in a graph is a common problem in applications involving routing of resources, such as packets in a computer network or vehicle traffic on a road. We consider the case where a malicious actor wants a particular path p* to be the shortest between two nodes and can remove edges to achieve this goal. This work aims to raise the attacker's cost by obscuring the true graph while retaining its utility for legitimate users. In prior work, we showed that finding the optimal attack is NP-hard, but there is an approximation that can be efficiently computed using an algorithm called PATHATTACK. In our present work, the goal is to increase the cost of an attack by concealing the true weights and providing only approximate weights. Our approach is inspired by a technique for privacy-preserving approximations of shortest paths. We release a graph in which noise from a Laplace distribution has been added to each edge weight. We consider two attackers: (1) an informed attacker, who is given an uncertainty interval to consider around each observed weight, and (2) an oracle-enhanced attacker, who is able to check whether candidate attacks are successful on the true graph. If an attacker assumes an uncertainty interval of width 2α around the observed weight, we say that attacker is α-informed. Each attacker uses a variant of PATHATTACK modified for the context. PATHATTACK uses an approximation algorithm to iteratively optimize the cost of edge removal to make a target path the shortest. The attackers in this case optimize the expected cost given the observed graph. The informed attacker continues until all remaining paths are longer than p*, for any edge weights within the assumed uncertainty intervals. In each scenario, we compare the attacker's required budget to the budget using PATHATTACK on the original (noiseless) graph.
For the informed attacker, we also consider whether or not the attack succeeds. We evaluate the increase in the adversary's budget on a variety of graph models, including Erdős–Rényi, Barabási–Albert, and Kronecker graphs, as well as real networks of interest, such as roads and computer networks. In unweighted graphs, we give the true graph random weights. Our principal evaluation metric is the cost incurred by the attacker using one of the given strategies. Even when the adversary has access to an oracle, the cost incurred in the optimization is appreciably increased by the uncertainty in the data. The informed attackers become more reliably successful as the uncertainty intervals are widened, but this comes with a significant increase in cost. In the presentation, we will discuss the tradeoffs between increasing the adversary's budget and enabling users to compute distances between nodes. |
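The weight-obscuring release can be sketched as follows; the noise scale b and the positive floor are illustrative parameters, not values from the paper:

```python
# Sketch of the obscured-graph release: each published weight is the true
# weight plus Laplace(0, b) noise, clipped at a small positive floor; an
# alpha-informed attacker then reasons over [w' - alpha, w' + alpha].
import math
import random

def release_weights(weights, b, rng, floor=1e-6):
    """weights: dict edge -> true weight. Returns noisy public weights."""
    noisy = {}
    for e, w in weights.items():
        # inverse-CDF sampling of a Laplace(0, b) variate
        u = rng.random() - 0.5
        noise = -b * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
        noisy[e] = max(w + noise, floor)     # keep weights positive
    return noisy

def uncertainty_interval(w_obs, alpha_width):
    """Interval an alpha-informed attacker considers for one weight."""
    return (max(w_obs - alpha_width, 0.0), w_obs + alpha_width)
```

Larger b hides the true weights better (raising the attacker's cost) but degrades distance estimates for legitimate users, which is exactly the trade-off the presentation discusses.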

12:00 | Understanding European Integration with Bipartite Networks of Comparative Advantage ABSTRACT. Labor division and specialization across nations in the global value chain is a natural process and a potential area of policy intervention to foster convergence. Models of European integration suggest such diverging structures of specialization [1]. However, specialization into identical products signals competition across countries, and less developed countries can benefit less from integration if they co-specialize with more developed countries. We study the evolution of the co-specialization motif [2] directly on the bipartite network between 21 European countries and 31 industry sectors, using Revealed Comparative Advantage (RCA) [3,4] to capture whether two countries are specialized in the same industry during the period 2000-2014. Moreover, we aim to disentangle the relation of co-specialization within and across the groups of EU15 and CEE member states. By measuring the statistical significance of this motif, comparing the occurrence of observed motifs to a null model based on the Bipartite Configuration Model [5], we can assess how production structures are changing (overlapping or diverging) across the EU states. Our measurement reveals no significant overlaps of comparative advantages across EU15 countries after 2000, which is probably due to their gradual integration. However, we find that CEE countries tend to be specialized in similar, if not identical, sectors. The number of co-specialization network motifs including both EU15 and CEE countries decreases after enlargement, signalling [6] a deeper division of production between EU15 and CEE. We do, however, find that productivity increases in those CEE industries that had no significant overlap in specialization before EU accession but experience increasing co-specialization with other CEE countries after entering the EU.
Meanwhile, co-specialization across EU15 and CEE countries contributes only slightly positively, if at all significantly, to productivity growth after accession. The findings point to a pattern of labor division between EU15 and CEE that fosters co-specialization of CEE industries and facilitates their integration and convergence in the common market. Our new approach provides novel insights for the current EU policy on Smart Specialization. These results highlight the need to apply these policy tools differently in new member states than in old member states, to reflect a labor division that can support convergence. |
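The RCA-based specialisation measure can be sketched directly from its standard definition, RCA_ci = (x_ci / Σ_i x_ci) / (Σ_c x_ci / Σ_ci x_ci), with co-specialisation read off the resulting bipartite network at the usual RCA ≥ 1 threshold; the motif significance testing against the Bipartite Configuration Model is not reproduced here:

```python
# RCA and co-specialisation sketch on a toy country-industry export table.
def rca(x):
    """x: dict (country, industry) -> export value. Returns RCA values."""
    tot = sum(x.values())
    by_c, by_i = {}, {}
    for (c, i), v in x.items():
        by_c[c] = by_c.get(c, 0) + v     # country totals
        by_i[i] = by_i.get(i, 0) + v     # industry totals
    return {(c, i): (v / by_c[c]) / (by_i[i] / tot)
            for (c, i), v in x.items()}

def cospecialization(x):
    """Industries in which at least two countries have RCA >= 1,
    i.e. the country sets forming co-specialisation motifs."""
    spec = {}
    for (c, i), v in rca(x).items():
        if v >= 1:
            spec.setdefault(i, set()).add(c)
    return {i: cs for i, cs in spec.items() if len(cs) >= 2}
```

On real data, the observed motif counts per industry would then be compared against the Bipartite Configuration Model null to decide which overlaps are statistically significant.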

12:15 | Investigating and Modeling the Dynamics of Long Ties PRESENTER: Ding Lyu ABSTRACT. Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds. |

12:30 | Characterising the role of human behaviour in the effectiveness of contact-tracing applications ABSTRACT. Despite the widespread use of contact-tracing (CT) applications against the Covid-19 pandemic, the debate around their effectiveness is, as of today, still open. Most studies indicate that very high levels of adoption are required for epidemic control [1], which has focused policymakers' attention on promoting app adherence. However, other aspects of human behaviour, such as delays in adoption or heterogeneous compliance, could also have a relevant impact on app effectiveness and are often ignored by policymakers. To characterise their importance for CT apps' effectiveness, we propose a dynamic agent-based model describing the interplay between disease progression and app adoption in a single population. The model relies on a two-layer multiplex network to reflect the connectivity patterns on which each dynamic evolves (in-person contacts for the epidemic and Bluetooth interactions for the CT app) (figure 1a). Epidemic progression is described through a SEPIR model [2] with preventive quarantines triggered by the CT app, initialised to reflect the first wave of the Covid-19 pandemic (figure 1c). App adoption is incorporated through threshold dynamics dependent on the intrinsic reluctance level of each individual and the evolution of the pandemic. The model also considers heterogeneous compliance in terms of CT app reporting (figure 1b). Using this model, we quantified the impact of three features of human behaviour (time of adherence, level of compliance, and maximal level of adoption) on the effectiveness of CT apps.
This was achieved by exploring the parameter space in three hypothetical scenarios: the voluntary adoption scenario (where complete compliance is assumed), the imposed adoption scenario (with zero reluctance towards app adoption), and the “adherence & compliance” scenario (only constraining the maximal level of adoption). The results agree with prior literature, indicating the relevance of high percentages of adoption for the performance of CT apps. However, they also show the importance of early adoption and moderate levels of compliance (above 20%) for obtaining effective strategies. The insight obtained was also used to identify a bottleneck in the implementation of the Spanish CT app, where we hypothesise that a simplification of the reporting system could increase effectiveness through a rise in the levels of compliance. |
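The threshold adoption rule can be sketched as follows; the functional form (a cumulative epidemic signal compared against a personal reluctance threshold, with a hard cap on total adoption) is an assumption for illustration, not the paper's exact dynamics:

```python
# Threshold-adoption sketch: an individual adopts the app once the
# perceived epidemic pressure exceeds their personal reluctance
# threshold; total adoption is capped at a maximal level.
def adopted(reluctance, pressure, max_adoption):
    """reluctance: dict person -> threshold in [0, 1];
    pressure: current epidemic signal in [0, 1];
    max_adoption: fraction of the population that can ever adopt."""
    cap = int(max_adoption * len(reluctance))
    willing = sorted((r, p) for p, r in reluctance.items() if r <= pressure)
    return {p for _, p in willing[:cap]}     # least reluctant adopt first
```

Sweeping `pressure` over an epidemic trajectory makes adoption time-dependent, which is how delayed adoption enters the model's scenarios.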

12:45 | Self-induced emergence of consensus in social networks: Reddit and the GameStop short squeeze ABSTRACT. The short squeeze of GameStop (GME) shares in mid-January 2021, primarily orchestrated by retail investors of the Reddit r/wallstreetbets community, caused major losses for short-selling hedge funds and a drastic surge of the stock price. Such an unprecedented event in finance is a paramount example of a collective coordination action on online social media resulting in large-scale consensus formation. Here we characterize the structure and time evolution of Reddit conversation data, showing that the occurrence of GME-related comments and their sentiment grew well before the short squeeze actually took place. These early signs of collective action can be associated with a self-reinforcing, increasing level of commitment and social identity of individual users. To understand such dynamics, we develop a model of opinion formation that combines peer interaction with a self-induced feedback from the global state of the community. Analytic mean-field solutions and simulations on extracted social networks of Reddit users display a phase transition from a disordered state to full consensus depending on the level of social identity, with the presence of hubs favouring the emergence of consensus. Our results shed light on the increasingly important phenomenon of self-organized collective actions on social networks. |

13:00 | The Steadiness of Transient Relationships ABSTRACT. Humans are social animals and having strong and supportive relationships with others has large effects on both physical and mental health. Because such strong relationships are frequently those with long duration, research has traditionally focused on long-term relationships. However, human communication networks have a considerable number of short-term transient relationships, and far less is known about their temporal evolution. Among the few things known about such relationships, previous literature suggests that ratings of relationship emotional intensity decay gradually until the relationship ends. Using mobile phone data from three countries (US, UK, and Italy), we show that the volume of communication between ego and its contacts is stable for most of the duration of a transient relationship. Contacts with longer durations receive more calls, with the duration of the relationship being predictable from call volume within the first few weeks of first contact. Relationships typically start with an early elevated period of communication, settling into the longer steady regime until eventually terminating abruptly. This is observed across all three countries, which include samples of egos at different life stages. These results are consistent with the suggestion that individuals initially engage frequently with a new alter so as to evaluate their potential as a tie, and then settle back into a contact frequency commensurate with the alter's match to ego on the so-called Seven Pillars of Friendship hypothesis. Our results also indicate that, while objective data such as mobile phone contact provides one important dimension about people's communication patterns, subjective measures that may not always parallel objective ones are still needed to develop a complete picture of a person's ego network. |

12:00 | CANE: Causality Aware Network Embedding for Time Series Data ABSTRACT. Learning vector representations of nodes is a fundamental step for using networks in machine learning pipelines. Embedding methods address this by finding parsimonious vector representations of the topological patterns in network structures. However, structure is often not enough to capture the patterns occurring on the network due to dynamics: while the network structure constrains the dynamics, it does not completely determine them. This distinction is due to patterns that, while present in temporal networks, cannot be expressed by their static counterparts. The most straightforward example is that of systems whose patterns differ at different points in time; i.e., the central nodes, communities, and other structures at time $t_1$ are different from those at time $t_2$, making them unpredictable using only the network structure. However, the temporal dynamics of networks can also be characterized by so-called higher-order patterns, i.e., patterns that do not depend on when interactions happen but on their order. Higher-order patterns lead to edge sequences occurring with higher or lower frequency than would be expected based on the network structure. The significance of these patterns for network analysis was recently highlighted by works demonstrating that correlations in the chronological order of interactions can invalidate standard methods for network-based data analysis, calling for higher-order modeling and analysis techniques. Despite their importance, most existing embedding methods do not account for higher-order patterns. They either learn representations for static networks or create representations that change with the variation in time of the network dynamics. This work addresses this gap by proposing a method to incorporate higher-order patterns in vector representations of nodes.
The method builds on the multi-order network model presented in a previous paper and exploits a simple yet consequential feature of the model: multi-order networks are themselves networks. This is very convenient, as it allows the application of existing network methods to capture patterns in the higher-order network topology. We apply the community detection algorithm Infomap to the multi-order network, thus capturing higher-order community patterns that are potentially invisible to network analysis methods applied to the standard topology. Then, we use the communities identified in the multi-order network to define vector representations that are expressive of the so-identified higher-order patterns. |
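The key observation, that the multi-order model is itself a network, can be illustrated with its second-order slice; the construction below is a generic second-order network built from path data, not the paper's full multi-order model:

```python
# Second-order network sketch: nodes of the higher-order network are
# observed edges, and an edge (a,b) -> (b,c) is added whenever the path
# a -> b -> c occurs in the input sequences. The result is itself a
# network, so standard tools (e.g. community detection) apply directly.
from collections import Counter

def second_order_network(paths):
    """paths: list of node sequences. Returns a weighted edge Counter
    over pairs of consecutive first-order edges."""
    edges = Counter()
    for path in paths:
        steps = list(zip(path, path[1:]))          # first-order edges
        for e1, e2 in zip(steps, steps[1:]):       # consecutive pairs
            edges[(e1, e2)] += 1
    return edges
```

Because the output is an ordinary weighted edge list, a community detection algorithm such as Infomap can be run on it unchanged, which is exactly the convenience the abstract exploits.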

12:15 | Impact of Heterogeneity on Network Embedding PRESENTER: Bo Liang ABSTRACT. In recent years, network embedding has attracted much attention from researchers and achieved excellent performance. But few works investigate the adaptability of network embedding, especially its performance across different network structures. Heterogeneity, as a universal topological characteristic, plays a prominent role in network behaviors. In this study, we investigate the effect of heterogeneity on the effectiveness of existing network embedding approaches. We conduct experiments in scale-free networks with varying power-law exponents from both macro and micro perspectives, addressing link prediction and node similarity tasks, respectively. The results indicate that network embedding approaches can be divided into two classes according to their performance in the link prediction task. The first class includes GF, SDNE, and Line1, and the second class includes Line2 and GraRep. As the network heterogeneity decreases, the performance of approaches in the first class declines, while the performance of approaches in the second class initially improves and then declines. Moreover, our simulations show that, based on the node similarity metric, these approaches partition nodes into two clusters, corresponding to large-degree nodes and small-degree nodes, respectively. Furthermore, approaches in the same class present similar characteristics across large-degree and small-degree nodes, which lends the embeddings some degree of interpretability. Specifically, approaches in the first class assume that large-degree nodes are similar to both large-degree and small-degree nodes, whereas approaches in the second class assume that large-degree nodes are similar only to large-degree nodes. Performance variations in the link prediction task can be explained by the characteristics of the approaches, and similar characteristics are confirmed in experiments on real networks.
Based on the findings for link prediction, we offer a brief guide for choosing an appropriate method based on the extent of heterogeneity. The investigation provides insight into network embedding and offers some interpretation of embedding, which could further establish the connection between network science and machine learning. |

12:30 | Influence of clustering coefficient on network embedding in link prediction ABSTRACT. Multiple network embedding algorithms have been proposed to perform the prediction of missing or future links in complex networks. However, we lack an understanding of how network topology affects their performance, or of which algorithms are more likely to perform better given the topological properties of the network. In this paper, we investigate how the clustering coefficient of a network, i.e., the probability that the neighbours of a node are also connected, affects the performance of network embedding algorithms in link prediction, in terms of the AUC (area under the ROC curve). We evaluate classic embedding algorithms, i.e., Matrix Factorisation, Laplacian Eigenmaps [1], and node2vec [2], in both synthetic networks and (rewired) real-world networks with variable clustering coefficient ($C_L$). Specifically, a rewiring algorithm is applied to each real-world network to change the clustering coefficient while keeping key network properties. We find that a higher clustering coefficient tends to lead to a higher AUC in link prediction, except for Matrix Factorisation, which is not sensitive to changes in the clustering coefficient. To understand this influence of the clustering coefficient, we (1) explore the relation between the link rating (probability that a node pair is the missing link) derived from the aforementioned algorithms and the number of common neighbours of the node pair, and (2) evaluate these embedding algorithms' ability to reconstruct the original training (sub)network. All the network embedding algorithms that we tested tend to assign a higher likelihood of connection to node pairs that share an intermediate or high number of common neighbours, independently of the clustering coefficient of the training network. Hence, the predicted networks have more triangles and thus a higher clustering coefficient.
As the clustering coefficient increases, all the algorithms but Matrix Factorisation could also better reconstruct the training network. These two observations may partially explain why increasing the clustering coefficient improves the prediction performance. [1] Belkin, M., Niyogi, P.: Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14, pp. 585–591. MIT Press (2002). [2] Grover, A., Leskovec, J.: Node2Vec: Scalable Feature Learning for Networks. In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16, pp. 855–864. ACM, New York, NY, USA (2016). |
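A toy numpy sketch of the effect described above (this is not the paper's pipeline: the rewiring below is a plain random scheme rather than the property-preserving algorithm, and the predictor is a bare common-neighbours score rather than a learned embedding; network sizes and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def ring_lattice(n, k):
    """Ring lattice: each node linked to its k nearest neighbours (high clustering)."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            A[i, (i + d) % n] = A[(i + d) % n, i] = 1
    return A

def rewire(A, p):
    """Rewire each edge with probability p (destroys triangles, keeps edge density)."""
    A = A.copy()
    for i, j in np.transpose(np.triu(A, 1).nonzero()):
        if rng.random() < p:
            choices = np.flatnonzero(A[i] == 0)
            choices = choices[choices != i]
            if choices.size:
                m = rng.choice(choices)
                A[i, j] = A[j, i] = 0
                A[i, m] = A[m, i] = 1
    return A

def clustering_coefficient(A):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    deg = A.sum(axis=1)
    triangles = np.trace(A @ A @ A) / 6.0
    triples = (deg * (deg - 1)).sum() / 2.0
    return 3.0 * triangles / triples if triples else 0.0

def auc_common_neighbours(A, test_frac=0.1):
    """AUC of the common-neighbours score on a random held-out edge set."""
    n = A.shape[0]
    edges = np.transpose(np.triu(A, 1).nonzero())
    test = edges[rng.choice(len(edges), max(1, int(test_frac * len(edges))), replace=False)]
    A_train = A.copy()
    for i, j in test:
        A_train[i, j] = A_train[j, i] = 0
    S = A_train @ A_train                       # common-neighbour counts
    pos = np.array([S[i, j] for i, j in test])  # scores of held-out edges
    neg = []                                    # scores of sampled non-edges
    while len(neg) < len(pos):
        i, j = rng.integers(n, size=2)
        if i != j and A[i, j] == 0:
            neg.append(S[i, j])
    neg = np.array(neg)
    # AUC = probability that a true edge outscores a random non-edge
    return (pos[:, None] > neg[None, :]).mean() + 0.5 * (pos[:, None] == neg[None, :]).mean()

A_hi = ring_lattice(200, 6)  # clustering coefficient 0.6
A_lo = rewire(A_hi, 1.0)     # near-random graph with the same density
print(clustering_coefficient(A_hi), auc_common_neighbours(A_hi))
print(clustering_coefficient(A_lo), auc_common_neighbours(A_lo))
```

With the triangles destroyed, the common-neighbours signal collapses and the AUC drops towards 0.5, mirroring the trend the abstract reports for the embedding-based predictors.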

12:45 | Explainable Node Embeddings ABSTRACT. Node embedding methods take a network and embed its nodes in a low dimensional latent space. The user specifies the size of the low dimensional space. We address the following questions: (1) Can we explain the meaning of each dimension in the low dimensional latent space? If so, how? (2) Can we change the loss function of any node embedding method to produce embeddings that are more explainable? If so, how? We answer yes to both of these questions and provide methods to achieve them. We define a set of sense features that capture global and local node features in a network. Given a matrix of sense features, F ∈ R^n×f, where n is the number of nodes and f is the number of sense features, and an embedding matrix V ∈ R^n×d, where d is the number of embedding dimensions, we use Non-negative Matrix Factorization to learn an Explain matrix E ∈ R^d×f such that V E = F. Henderson et al. called this procedure sense-making. Each row of the Explain matrix E describes how much each dimension contributes to explaining a given sense feature. Figure 1(a) shows the results of our sense-making (as a heat map) on the EU Email Network when it is embedded into 32 dimensions with Structural Deep Network Embedding (SDNE). By inspecting the heat map in Figure 1(a), we can explain each dimension by its sense features. For example, Dimension 15 is explained by degree, personalized PageRank, number of egonet edges, and node betweenness. While the sense-making procedure mentioned above allows us to explain each dimension in terms of the user-defined sense features, it can still be difficult to explain each dimension when many sense features contribute to it. To improve explainability, we impose two constraints on the Explain matrix E during the training phase. First, we require the rows of the Explain matrix to be orthogonal to one another. That is, we want dimensions that are explained by different sense features.
Second, we require the columns of the Explain matrix to be sparse. That is, we want each sense feature to contribute to the explanation of as few dimensions as possible. We impose these constraints by training an augmented version of SDNE. We first run the standard SDNE implementation and use the learned weights to initialize the model for the version with our augmented loss function, which we call SDNE+. Figure 1(b) shows the heat map for the Explain matrix produced by SDNE+ for the same EU Email Network. We observe that the heat map is sparser and fewer dimensions strongly correspond to many sense features. For example, Dimension 5 only corresponds strongly to node betweenness. For brevity, we have omitted the results showing that SDNE+ performs as well as SDNE on downstream tasks (such as link prediction). |
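The sense-making step above amounts to a non-negative least-squares fit of E given fixed V and F. A minimal numpy sketch, using projected gradient descent as a stand-in for the NMF solver (matrix names V, F, E follow the abstract; the data here is synthetic, generated from a known non-negative E so recovery can be checked):

```python
import numpy as np

rng = np.random.default_rng(1)

def explain_matrix(V, F, n_iter=2000):
    """Learn a non-negative Explain matrix E (d x f) with V @ E ~= F.
    Projected gradient descent on 0.5*||V E - F||_F^2 with E >= 0."""
    d, f = V.shape[1], F.shape[1]
    E = rng.random((d, f))
    VtV, VtF = V.T @ V, V.T @ F
    step = 1.0 / np.linalg.norm(VtV, 2)     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        E = np.maximum(0.0, E - step * (VtV @ E - VtF))  # gradient step + projection
    return E

# toy check: embed matrix V and sense features F built from a known E_true
n, d, f = 200, 8, 5
V = rng.normal(size=(n, d))        # stand-in for a learned embedding matrix
E_true = rng.random((d, f))        # ground-truth non-negative Explain matrix
F = V @ E_true                     # sense-feature matrix consistent with E_true
E_hat = explain_matrix(V, F)
print(np.abs(E_hat - E_true).max())
```

Each row of the recovered `E_hat` can then be read exactly as in the abstract: how much each embedding dimension contributes to explaining each sense feature.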

13:00 | Topological data analysis of truncated contagion maps ABSTRACT. Contagion processes have been one focus of the investigation of dynamical processes on networks. It has been demonstrated that contagions can be used to obtain information about the embedding of nodes in a Euclidean space. Specifically, one can use the activation times of threshold contagions to construct contagion maps as a manifold-learning approach. One drawback of contagion maps is their high computational cost. Here, we demonstrate that a truncation of the threshold contagions may considerably speed up the construction of contagion maps. For synthetic networks, we find that a carefully chosen truncation may also improve the recovery of hidden geometric structures. Finally, we show that contagion maps may be used to find an insightful low-dimensional embedding for single-cell RNA-sequencing data in the form of cell-similarity networks and so reveal biological manifolds. Overall, our work makes contagion maps a more viable manifold-learning approach for empirical network data. |
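To make the construction concrete, here is a minimal numpy sketch of a (truncated) threshold contagion map on a ring lattice, a toy network with hidden geometric structure. The threshold value, the seeding rule, and the cap used for never-activated nodes are our illustrative choices, not the paper's exact setup:

```python
import numpy as np

def ring_lattice(n, k):
    """Ring lattice with each node tied to its k nearest neighbours."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            A[i, (i + d) % n] = A[(i + d) % n, i] = 1
    return A

def activation_times(A, seed, theta=0.3, t_max=None):
    """Watts threshold contagion seeded at `seed` plus its neighbours.
    Returns per-node activation times; stops (truncates) after t_max steps."""
    n = A.shape[0]
    deg = np.maximum(A.sum(axis=1), 1)
    active = np.zeros(n, dtype=bool)
    times = np.full(n, np.inf)
    cluster = np.append(np.flatnonzero(A[seed]), seed)
    active[cluster], times[cluster] = True, 0.0
    t = 0
    while t_max is None or t < t_max:
        t += 1
        # a node activates once the active fraction of its neighbours reaches theta
        newly = (~active) & ((A @ active) / deg >= theta)
        if not newly.any():
            break
        active |= newly
        times[newly] = t
    return times

def contagion_map(A, theta=0.3, t_max=None):
    """Map node i to its vector of activation times across all n seedings.
    A small t_max truncates each contagion, which is the claimed speed-up."""
    n = A.shape[0]
    M = np.stack([activation_times(A, s, theta, t_max) for s in range(n)], axis=1)
    M[np.isinf(M)] = t_max if t_max is not None else n  # cap never-activated nodes
    return M

A = ring_lattice(100, 6)
M_full = contagion_map(A)            # full contagion map
M_trunc = contagion_map(A, t_max=5)  # truncated: each contagion stops after 5 steps
print(M_full.shape, M_trunc.max())
```

On the ring, activation times grow with geodesic distance from the seed, so the rows of M recover the ring's geometry; the truncated map preserves this local structure while each contagion runs for at most `t_max` steps.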

13:15 | Increasing the Stability of Embeddings of the Degenerate Core ABSTRACT. Given that real-world networks are noisy and dynamic, how stable are graph embeddings to perturbations in the graph periphery? We show that existing graph embedding algorithms produce embeddings that are unstable when removing periphery nodes leads to spikes in edge density. We introduce a method for identifying instability. Our method measures changes to the pairwise embedding distances among nodes in the graph's degenerate core (i.e., its k-core with maximum k) as outer k-shells (i.e., the "periphery") are iteratively shaved off. After removing the outermost remaining k-shell, we re-embed the graph. We quantify instability as the amount of change to the degenerate core's pairwise distance distribution as each periphery shell is removed. We apply our method to embeddings of real-world and synthetic graphs produced by a variety of matrix-factorization, skip-gram, and deep-learning embedding algorithms. We find three patterns: 1) degenerate-core embeddings are sensitive to the removal of specific k-shells; 2) degenerate-core embeddings for Erdős–Rényi (ER) and Barabási–Albert (BA) random graphs are stable; 3) as the periphery is removed, the degenerate core's pairwise distance distribution becomes smoother, suggesting a loss of community structure. To explain these patterns, we show that points of instability are correlated with increases in edge density. For dense subgraphs, we find that graph embedding algorithms fail to distinguish core and periphery nodes. Patterns 1 and 3 are consistent with this result because we find that edge density increases as particular k-shells are removed, causing inseparability between core and periphery embeddings. On the other hand, the degenerate cores for ER and BA graphs span a large percentage of the entire graph, so removing the periphery has little impact on the degenerate-core embeddings.
To mitigate embedding instability, we introduce a generic graph embedding algorithm. Our algorithm augments an existing algorithm's loss function by adding an instability penalty. We augment Laplacian Eigenmaps and LINE and show that the embeddings are indeed stable while preserving link-prediction results. |
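A minimal numpy sketch of the shaving procedure: k-core numbers by min-degree peeling, a Laplacian-eigenvector embedding standing in for the embedding algorithm under test, and a Frobenius-norm change of the degenerate core's pairwise distance matrix as the instability score. The score definition and the toy graph are our illustrative choices, not the paper's exact measure:

```python
import numpy as np

def core_numbers(A):
    """k-core numbers via min-degree peeling."""
    n = A.shape[0]
    deg = A.sum(axis=1).astype(int)
    alive = np.ones(n, dtype=bool)
    core = np.zeros(n, dtype=int)
    k = 0
    for _ in range(n):
        cand = np.flatnonzero(alive)
        v = cand[np.argmin(deg[cand])]
        k = max(k, deg[v])            # core number = running max degree at removal
        core[v], alive[v] = k, False
        deg = deg - A[v] * alive      # surviving neighbours lose one degree
    return core

def laplacian_embedding(A, d=2):
    """Embed nodes with the d smallest nontrivial Laplacian eigenvectors."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:d + 1]

def pairwise(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def shell_instability(A, d=2):
    """Shave k-shells from the outside in, re-embed, and score the change in
    the degenerate core's pairwise distance matrix (rotation-invariant)."""
    core = core_numbers(A)
    dc = core == core.max()           # the degenerate core
    scores, D_prev = [], None
    for k in sorted(set(core)):
        keep = core >= k              # shells below k have been shaved off
        X = laplacian_embedding(A[np.ix_(keep, keep)], d)
        D = pairwise(X[dc[keep]])
        if D_prev is not None:
            scores.append(np.linalg.norm(D - D_prev) / max(np.linalg.norm(D_prev), 1e-12))
        D_prev = D
    return scores

# toy core-periphery graph: a 6-clique (degenerate core) with four pendant nodes
n = 10
A = np.zeros((n, n), dtype=int)
A[:6, :6] = 1 - np.eye(6, dtype=int)
for p in range(6, 10):
    A[p, p - 6] = A[p - 6, p] = 1
print(core_numbers(A), shell_instability(A))
```

Comparing distance matrices rather than raw coordinates sidesteps the rotation and sign ambiguities of re-running an embedding, which is why the score above is computed on pairwise distances.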

13:40 | Recurrence view of temporal network data: System-state dynamics, recurrence plot, and embedding |

14:15 | WuDao: Pretrain the World |

14:50 | Artificial intelligence in breast cancer diagnostics |