TDA at SDM22 Workshop: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Westin Alexandria Old Town Alexandria, VA, United States, April 28-30, 2022 |
Conference website | https://sites.google.com/view/tdaworkshopatsdm22/home |
Submission link | https://easychair.org/conferences/?conf=tdaatsdm22workshop |
Submission deadline | March 8, 2022 |
Important Dates
Submissions due: March 8, 2022 (extended from March 1, 2022; updated 24 February).
Notifications due: March 22, 2022
Camera-ready presentations/papers due: April 15, 2022
Workshop: April 28, 2022
Background
In the information age, data has become critical to virtually every aspect of the human experience. Data-driven decision making has afforded humans increasing levels of utility, which in turn has driven a heavier reliance on data. Additionally, artificial intelligence and machine learning (AI/ML) are being applied to critical tasks (e.g., self-driving cars), despite being trained on only a subcollection of all the data that could potentially be found in the wild. As the volume and complexity of this data grow, it has become increasingly necessary for both the human and the AI to have frameworks capable of making sense of large, high-dimensional, incomplete, and noisy data sets.
Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the “shape” of such complex high-dimensional data. Research in this area has grown significantly over the last several years, bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques.
The main premise of TDA is that the shape of the data is critical to understanding it. There are numerous approaches in the literature that are oriented toward this goal. Most operate by building a simplicial or cell complex, and possibly a sheaf of functions on the faces of a complex. A high-level TDA pipeline is as follows:
- Sample data: Data living on some submanifold of ℝ^n and endowed with a metric are sampled from a population. A distinction must be drawn between the sampled data, which can be observed, and the population, from which the sample is drawn. The goal is to draw conclusions about the latter from the former.
- Build topological spaces: Because the data is incomplete, the shape of the population can only be approximated. Thus, a family of topologies or geometries (usually a filtration of simplicial complexes), reflecting different levels of granularity, is derived from the data.
- Compute features: The main challenge is to identify features of this family that are characteristic of the population. This usually involves forming definitions of features that are stable across the filtration.
One common approach in TDA, namely persistent homology, “fills in” or “thickens” the point cloud by constructing a filtration of simplicial complexes, connecting neighbors at increasing distance thresholds, and then computes the homology of each of the resulting complexes. In this framework, the features being derived are algebraic representations of holes and higher-dimensional voids, from which counts can be derived. The features ascribed to the population are those that persist over a wide interval of threshold values. Conversely, features that are born and die over a short interval are attributed to noise in the data due to sampling.
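As a rough illustration of the idea, the following sketch computes only the dimension-0 part of persistent homology (connected components) for a small point cloud. It is a minimal example under stated assumptions, not a reference implementation: the function name `h0_persistence` is hypothetical, the Euclidean metric is assumed, and an edge is taken to enter the filtration at a threshold equal to the distance between its endpoints (some conventions use half that distance). Loops and voids require boundary-matrix reduction, which is omitted here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) intervals for connected components (H0).

    Assumption: a Vietoris-Rips-style filtration in which the edge {i, j}
    appears at threshold dist(points[i], points[j]).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    intervals = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at threshold d, so one of them dies here.
            parent[ri] = rj
            intervals.append((0.0, d))

    # One component survives at every threshold.
    intervals.append((0.0, math.inf))
    return intervals
```

For a point cloud made of two well-separated clusters, most intervals returned by this sketch are short (merges within a cluster), while the interval that ends when the two clusters finally merge is long. In the language of the paragraph above, the long intervals are the candidates for features of the population and the short ones are attributed to sampling noise.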
Other common approaches in TDA include mapper, which builds a skeleton graph of a data sample from ℝ^n (often referred to as a point cloud) with respect to a filtration function on the data, and sheaf modeling, which considers agreement of data across open covers of an underlying topological space. Additionally, although data commonly come from point clouds, these same TDA techniques can be applied to images, functions, text, sound waves, and much more.
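The mapper construction can be sketched in a few lines. The example below is a simplified illustration under several assumptions that are not part of this announcement: the filter function is real-valued, the cover consists of equal-length overlapping intervals, and clustering within each cover element is single linkage at a fixed distance `eps`. The names `components` and `mapper_graph` and all parameters are hypothetical; practical implementations offer many other cover and clustering choices.

```python
import math
from itertools import combinations


def components(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbor graph."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            # Unvisited points within eps of point i join the current cluster.
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_fn, n_intervals, overlap, eps):
    """Return (nodes, edges) of a simplified mapper graph.

    nodes: list of frozensets of point indices (one per cluster).
    edges: set of index pairs (u, v) whose clusters share at least one point.
    """
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    pad = overlap * length / 2  # widen each interval so neighbors overlap

    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - pad
        b = lo + (k + 1) * length + pad
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(members, points, eps))

    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges
```

Applied to points sampled evenly from a circle, with the x-coordinate as the filter function and suitably chosen parameters, this sketch yields a graph that is itself a cycle, which is the sense in which mapper produces a "skeleton" of the data.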
Workshop Description
We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in the community but do not represent a majority. This speaks to the primary aim of this workshop, which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across government, academia, and industry, we hope to create new synergies that can only come through building a mutual, comprehensive awareness of the problem and solution spaces. Through our invited speakers and direct outreach to all communities across the three sectors, we aim to attain adequate representation of all groups at the workshop and achieve the goals we have set.
We plan for a full-day program that includes invited presentations by major contributors in the field, submitted talks, a poster session, and panel discussions, all designed to promote cross-community dialogue and new collaboration opportunities.
The organizers will build a program whose content is highly technical and broadly appealing to the attendees of the main conference. The invited speakers will include those actively contributing to the state of the art of TDA and who can provide a unique perspective on the challenges, successes, and potential of TDA to solve problems in complex domains. The organizers will seek talk proposals from the wider SIAM community on related topics, to ensure that the workshop is broadly appealing. The program will include two panel discussions (see below for potential topics) and two poster sessions to bring additional perspectives and facilitate conversations.
Submission Guidelines
The organizers invite participants to submit short (max 5 pgs. excluding references and supplementary material) papers to the workshop. These may include traditional research papers, position papers, and those of a more visionary character. Papers will be evaluated primarily on their relevance to the workshop topics and ability to stimulate intellectual conversations and/or healthy debate. As appropriate for the kind of submission, papers will also be evaluated on their originality, technical quality, level of insight, clarity, and potential impact.
The organizers encourage submissions which cover or are related to the workshop topics below but will consider papers of broad interest to the anticipated audience.
Depending on the number of submissions, the organizers may accept papers either for oral presentation or as part of a virtual poster session. All accepted papers will appear in an arXiv proceedings for the workshop.
Papers must be submitted by 11:59 PM AoE on March 8 (extended from March 1) using the SIAM Template (https://www.siam.org/publications/journals/about-siam-journals/information-for-authors). Notifications will be sent by March 22. Camera-ready presentations and papers are due by April 15.
List of Topics
Topics of interest for talks and posters include but are not limited to the following:
- Application of TDA to study complex problem domains, e.g., cybersecurity, biology, natural language processing.
- Use of TDA to aid in making AI/ML algorithms, such as deep neural networks, which are notoriously opaque and fragile, more robust and explainable.
- Practical interpretations of TDA. For example, persistent homology can provide information on the connected components and higher-dimensional “holes”. While the number of connected components can be interpreted as categorical labels for your data, the practical implications of loops and voids are less obvious. What can we infer from the structure of loops and voids in the data?
- Computational methods and other advances for computational efficiency of TDA tools.
- Applications of other areas of mathematics to enhance TDA, e.g., dynamics, category theory, combinatorics.
Committees
Program Committee
- Tegan Emerson, Pacific Northwest National Laboratory
- John Healy, Tutte Institute for Mathematics and Computing
- Henry Kvinge, Pacific Northwest National Laboratory
- Justin Mauger, Naval Information Warfare Center Pacific
- Leland McInnes, Tutte Institute for Mathematics and Computing
- Washington Mio, Florida State University
- Tom Needham, Florida State University
- Michael Robinson, American University
- Robert Ghrist, University of Pennsylvania
- Rick Jardine, University of Western Ontario
- Cliff Joslyn, Pacific Northwest National Laboratory
- Brett Jefferson, Pacific Northwest National Laboratory
- Bei Wang, University of Utah
- Dmitriy Morozov, Lawrence Berkeley National Lab
Workshop Chairs
- R.W.R. Darling
- John A. Emanuello
- Emilie Purvine
- Ahmad Ridley
Venue
The workshop will be held during the SIAM International Conference on Data Mining (SDM22), which is scheduled to take place virtually and in person at the Westin Alexandria Old Town in Alexandria, VA.