WSOM+ 2019: 13TH INTERNATIONAL WORKSHOP ON SELF-ORGANIZING MAPS AND LEARNING VECTOR QUANTIZATION, CLUSTERING AND DATA VISUALIZATION
PROGRAM FOR FRIDAY, JUNE 28TH

09:30-10:30

Invited Talk (Tobias Schreck, Austria)

10:30-11:00

Coffee Break

11:00-13:10 Session 7

Theory and Methods

11:00
Solving a tool-based interaction task using deep reinforcement learning with visual attention

ABSTRACT. We propose a reinforcement learning approach that combines an "asynchronous actor-critic model" with a "recurrent model of visual attention". Instead of using the full visual information of the scene, the resulting model accumulates the foveal information of controlled glimpses and is thus able to reduce the complexity of the network. Using the designed model, an artificial agent is able to solve a challenging "mediated interaction" task. In such tasks, the desired effects cannot be created through direct interaction; instead, the learner must discover how to exert suitable effects on the target object by involving a "tool". To learn the given mediated interaction task, the agent "actively" searches for salient points within the environment by taking a limited number of fovea-like glimpses. It then uses the accumulated information to decide which action to take next.
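The glimpse mechanism central to this line of work can be illustrated with a minimal sketch: a retina-like sensor that extracts patches of increasing size around an attended location and downsamples them to a common resolution, so that the fixation point is seen sharply and the periphery coarsely. This is a generic construction from the recurrent-visual-attention literature, not the authors' implementation; the function name and parameters below are illustrative assumptions.

```python
import numpy as np

def foveal_glimpse(image, center, patch_size=8, num_scales=3):
    """Extract concentric patches around `center`, each twice as large as the
    previous one, and downsample them to patch_size x patch_size.
    The stacked result approximates a foveated view: high resolution at the
    fixation point, coarse resolution in the periphery."""
    cy, cx = center
    glimpses = []
    for s in range(num_scales):
        size = patch_size * (2 ** s)
        half = size // 2
        # Pad so that patches near the image border stay well defined.
        padded = np.pad(image, half, mode="constant")
        patch = padded[cy:cy + size, cx:cx + size]
        # Downsample by block averaging back to the base resolution.
        factor = 2 ** s
        patch = patch.reshape(patch_size, factor, patch_size, factor).mean(axis=(1, 3))
        glimpses.append(patch)
    return np.stack(glimpses)  # shape: (num_scales, patch_size, patch_size)

# Example: a random 64x64 "scene" attended at pixel (32, 20).
scene = np.random.rand(64, 64)
print(foveal_glimpse(scene, center=(32, 20)).shape)  # (3, 8, 8)
```

In a full agent, the stacked glimpse would be fed, together with the glimpse location, into a recurrent network whose state accumulates information across fixations and drives both the next fixation and the action policy.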

11:25
Approximate Linear Dependence as a Design Method for Kernel Prototype-based Classifiers

ABSTRACT. The approximate linear dependence (ALD) method is a sparsification procedure used to build a dictionary of samples extracted from a dataset. The extracted samples are approximately linearly independent in a high-dimensional reproducing kernel Hilbert space. In this paper, we argue that the ALD method itself can be used to select relevant prototypes from a training dataset and to classify new samples with kernelized distances to those prototypes. The results obtained from intensive experimentation with several datasets indicate that the proposed approach is viable as a standalone classifier.
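For context, the sketch below shows the standard ALD admission test with a Gaussian kernel: a sample enters the dictionary only if its squared residual after projection onto the span of the current dictionary (in the feature space induced by the kernel) exceeds a threshold nu. The kernel choice, threshold value, and variable names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def ald_dictionary(X, nu=0.1, gamma=1.0):
    """Build a dictionary of approximately linearly independent samples
    (in the RKHS induced by the kernel) using the ALD test."""
    dictionary = [X[0]]
    K_inv = np.array([[1.0 / gaussian_kernel(X[0], X[0], gamma)]])
    for x in X[1:]:
        k = np.array([gaussian_kernel(d, x, gamma) for d in dictionary])
        a = K_inv @ k                                  # best approximation coefficients
        delta = gaussian_kernel(x, x, gamma) - k @ a   # ALD residual
        if delta > nu:                                 # x is "new enough": admit it
            dictionary.append(x)
            # Recompute and invert the dictionary kernel matrix
            # (a rank-one update is cheaper in practice).
            K = np.array([[gaussian_kernel(di, dj, gamma)
                           for dj in dictionary] for di in dictionary])
            K_inv = np.linalg.inv(K + 1e-10 * np.eye(len(dictionary)))
    return np.array(dictionary)

# Example: extract prototypes from 200 random 2-D points.
X = np.random.rand(200, 2)
print(len(ald_dictionary(X, nu=0.3)), "prototypes selected")
```

A new sample can then be assigned to the class of its nearest prototype under the kernelized distance d^2(x, p) = k(x, x) - 2 k(x, p) + k(p, p).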

11:50
Subspace Quantization on the Grassmannian

ABSTRACT. We propose two algorithms for clustering points on a Grassmann manifold, i.e., subspace quantization. In this framework, the points are subspaces spanned by vector data samples from the same class. The implementation replaces the Euclidean distance employed in standard clustering algorithms with a distance on the Grassmannian based on angles between subspaces. Centroids are computed either with a flag mean algorithm for averaging points on the Grassmannian, or by updating the centroid along a geodesic curve on the manifold. The resulting unsupervised algorithms are applied to the MNIST digit data set and the AVIRIS Indian Pines hyperspectral data set.
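A minimal sketch of the core ingredient, a distance on the Grassmannian computed from principal angles between subspaces, is given below. The chordal distance shown is one common choice; the paper's flag-mean and geodesic centroid updates are not reproduced here, and the helper names are illustrative.

```python
import numpy as np

def orthonormal_basis(A):
    """Orthonormal basis of the column space of A (one point on the Grassmannian)."""
    Q, _ = np.linalg.qr(A)
    return Q

def principal_angles(U, V):
    """Principal angles between the subspaces spanned by orthonormal bases U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def chordal_distance(U, V):
    """Chordal distance on the Grassmannian: the norm of sin(principal angles)."""
    return np.linalg.norm(np.sin(principal_angles(U, V)))

# Example: two 3-dimensional subspaces of R^10 built from random samples.
U = orthonormal_basis(np.random.randn(10, 3))
V = orthonormal_basis(np.random.randn(10, 3))
print(chordal_distance(U, V))
```

A k-means-style subspace quantizer then simply swaps the Euclidean distance and mean for this distance and a subspace average such as the flag mean.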

12:15
Variants of Fuzzy Neural Gas

ABSTRACT. Neural Gas is a prototype-based clustering technique that takes into account the ranking of the prototypes with respect to their distance to the data samples. Previously, we proposed a fuzzy version of this approach, yet restricted our method to probabilistic cluster assignments. In this paper, we extend this method by combining possibilistic and probabilistic assignments. Further, we provide modifications to handle non-vectorial data.
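As a reminder of the crisp baseline being fuzzified, a minimal Neural Gas sketch is shown below: every prototype is moved toward each presented sample with a strength that decays exponentially with the prototype's rank in the distance ordering. The fuzzy, possibilistic, and non-vectorial extensions discussed in the paper are not reproduced; all parameters are illustrative.

```python
import numpy as np

def neural_gas(X, n_prototypes=5, n_epochs=20, lam=2.0, eta=0.1, seed=0):
    """Standard (crisp) Neural Gas: every prototype is updated with a weight
    that decays exponentially with its rank in the distance ordering."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_prototypes, replace=False)].astype(float)
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            d = np.linalg.norm(W - x, axis=1)
            ranks = np.argsort(np.argsort(d))   # rank 0 = closest prototype
            h = np.exp(-ranks / lam)            # rank-based neighborhood weights
            W += eta * h[:, None] * (x - W)
        lam *= 0.95                             # anneal the neighborhood range
        eta *= 0.95                             # anneal the learning rate
    return W

# Example: quantize 300 random 2-D points with 5 prototypes.
X = np.random.rand(300, 2)
print(neural_gas(X))
```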

12:40
Autoencoders Covering Space As A Life-Long Classifier

ABSTRACT. A life-long classifier that learns iteratively faces challenges such as concept drift, when the class changes over time, and catastrophic forgetting, when earlier learned knowledge is lost. Many successful connectionist solutions are based on the idea that new data are learned only in the part of a network that is relevant to the new data. We leverage this idea and propose a novel method for learning an ensemble of specialized autoencoders. We interpret autoencoders as manifolds that can be trained to contain or exclude given points of the input space. This manifold manipulation allows us to implement a classifier that can suppress catastrophic forgetting and adapt to concept drift. The proposed algorithm is evaluated on an iterative version of the XOR problem and on an iterative version of the MNIST classification task, where we achieve an accuracy of 0.9, a significant improvement over previously published results.
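The classify-by-reconstruction idea behind such ensembles can be sketched as follows: one autoencoder per class, and a new sample is assigned to the class whose autoencoder reconstructs it with the smallest error. The sketch uses a linear autoencoder (truncated PCA) as a stand-in for a trained neural autoencoder and does not reproduce the authors' manifold containment/exclusion training; all names and data are illustrative.

```python
import numpy as np

class LinearAutoencoder:
    """PCA used as a linear autoencoder: encode = project onto the top
    principal components, decode = map back to the input space."""
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self

    def reconstruction_error(self, x):
        z = self.components_ @ (x - self.mean_)       # encode
        x_hat = self.mean_ + self.components_.T @ z   # decode
        return np.linalg.norm(x - x_hat)

def predict(ensemble, x):
    """Assign x to the class whose autoencoder reconstructs it best."""
    return min(ensemble, key=lambda c: ensemble[c].reconstruction_error(x))

# Example: two toy classes living near different linear manifolds.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 5)) * [3, 3, .1, .1, .1]
X1 = rng.normal(size=(100, 5)) * [.1, .1, 3, 3, .1]
ensemble = {0: LinearAutoencoder().fit(X0), 1: LinearAutoencoder().fit(X1)}
print(predict(ensemble, X0[0]), predict(ensemble, X1[0]))  # expected: 0 1
```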

13:05
Soft Subspace Topological Clustering over Evolving Data Stream

ABSTRACT. The aim of subspace clustering is to find groups of similar data points in all possible subspaces of a dataset. A data stream is a massive sequence of data arriving continuously; clustering this type of data imposes restrictions on time and memory. In this paper, we propose S2G-Stream, a new topological clustering method for data streams based on the Growing Neural Gas algorithm and soft subspace clustering, introducing two types of entropy weighting, for features and for subspaces. Experiments on public datasets illustrate the ability of S2G-Stream to detect relevant features and subspaces and to provide the best partitioning of the data.

14:45-16:25 Session 8

SOM: Practical Applications, part II

14:45
When clustering the multiscalar fingerprint of the city reveals its segregation patterns

ABSTRACT. The complexity of urban segregation challenges researchers to develop powerful and complex mathematical tools for assessing it. Individual-based models have become practical thanks to the increasingly fine-grained and massive data that have become available in recent years. Very recently, a mathematical object called the multiscalar fingerprint, containing all possible individual trajectories in a city at all scales, was introduced. In this manuscript, we use clustering, combined with specific measures for assessing feature contributions to clusters, to explore this complex object and to single out hotspots of segregation. We illustrate how clustering allows us to see where, how, and to what extent segregation occurs.

15:10
Self-Organizing Maps in Earth Observation Data Cubes Analysis

ABSTRACT. Earth Observation (EO) Data Cubes infrastructures model analysis-ready data generated from remote sensing images as multidimensional cubes (space, time and properties), especially for satellite image time series analysis. These infrastructures take advantage of big data technologies and methods to store, process and analyze the large amount of Earth observation satellite images freely available nowadays. Recently, EO Data Cubes infrastructures and satellite image time series analysis have brought new opportunities and challenges for monitoring Land Use and Cover Change (LUCC) over large areas. LUCC has had a great impact on tropical ecosystems, increasing global greenhouse gas emissions and reducing the planet's biodiversity. This paper presents the utility of the Self-Organizing Map (SOM) neural network method for extracting LUCC information from EO Data Cubes infrastructures using image time series analysis. Most classification techniques that create LUCC maps from satellite image time series are based on supervised learning methods. In this context, SOM is used as a method to assess land use and cover samples and to evaluate which spectral bands and vegetation indexes are most suitable for separating land use and cover classes. A case study described in this work shows the potential of SOM in this application.
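For readers unfamiliar with the method, a minimal SOM training loop over feature vectors (e.g. per-pixel statistics of band and index time series) might look like the sketch below. This is a generic numpy implementation, not the software used in the paper; the grid size, learning rate, and neighborhood schedule are illustrative assumptions.

```python
import numpy as np

def train_som(X, rows=10, cols=10, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal Self-Organizing Map: each grid node holds a codebook vector that
    is pulled toward the presented sample, with a strength decaying with the
    node's grid distance to the best-matching unit (BMU)."""
    rng = np.random.default_rng(seed)
    W = rng.random((rows, cols, X.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        lr = lr0 * np.exp(-t / n_iter)          # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)    # shrinking neighborhood
        # Best-matching unit: the node whose codebook vector is closest to x.
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=2)), (rows, cols))
        # Gaussian neighborhood on the grid around the BMU.
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        W += lr * np.exp(-d2 / (2 * sigma ** 2))[..., None] * (x - W)
    return W

# Example: map 500 samples with 4 features onto a 10x10 grid.
X = np.random.rand(500, 4)
print(train_som(X).shape)  # (10, 10, 4)
```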

15:35
Competencies in higher education: a feature analysis with self-organizing maps

ABSTRACT. Students are expected to have acquired a set of generic competencies when they finish their studies. One of the major challenges for universities is to detect shortcomings in students in order to strengthen them, so that students can attain the competencies required for a professional career. In this paper, unsupervised machine learning techniques such as Self-Organizing Maps are used to analyze students' features from the bachelor's degree in Psychology. This first approach consists of clustering students' profiles in their first year of college. The dataset contains 16 features from 53 individuals. Results show that the differences between clusters mostly depend on the organizational and social competencies on one side, and neuroticism and amiability on the other.

16:00
Using SOM-based Visualization to Analyze the Financial Performance of Consumer Discretionary Firms

ABSTRACT. This paper analyzes financial ratios of 27 consumer discretionary firms listed on the S&P 500 over an eleven-year period from 2006 to 2016. It adopts a two-step approach: first, a confirmatory factor analysis (CFA) is conducted on the financial time series, and the resulting construct scores are then used to perform a cluster analysis using self-organizing maps (SOMs). The consumer discretionary sector is considered an economic and stock market predictor. It consists of non-essential goods and services, which are more likely to be forgone in an economic slump. The suggested approach is expected to be a useful reference guide for understanding the past performance of inter- and intra-sector companies. It also enriches the body of literature on the application of machine learning techniques to the analysis of firm- and sectoral-level performance.

16:25-17:20 Session 9

Life Science Applications, part II

16:25
Progressive clustering and characterization of increasingly higher dimensional datasets with Living Self-Organizing Maps

ABSTRACT. Long-lived consortia in genomics generate massive high-dimensional datasets over the course of many months or years, with substantial blocks of data added over time. Algorithms that characterize and cluster such data are designed to run once on a dataset in its entirety, so any analysis of these collections must be redone from scratch every time a new block of data is added. We describe a novel progressive clustering approach using a variation of the self-organizing map (SOM) algorithm, which we call the Living SOM. Our software package is capable of clustering high-dimensional data with all of the power of regular SOMs, with the added benefit of incorporating additional datasets as they become available while maintaining the initial structure as much as possible. This allows us to evaluate the impact of new datasets on previous analyses, with the potential to keep classifications intact if appropriate. We demonstrate the power of this technique on a collection of gene expression experiments from an embryonic time course of mouse development from the ENCODE consortium.
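The progressive idea, continuing training from an existing map when a new data block arrives instead of retraining from scratch, can be sketched with a SOM trainer that accepts a warm-start codebook and is re-run with a small learning rate and a narrow neighborhood, so that the previously learned structure is largely preserved. This is a generic illustration of warm-starting, not the Living SOM algorithm itself; all parameters are assumptions.

```python
import numpy as np

def train_som(X, W=None, rows=10, cols=10, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """SOM trainer that can start from an existing codebook W (warm start)."""
    rng = np.random.default_rng(seed)
    if W is None:
        W = rng.random((rows, cols, X.shape[1]))
    rows, cols = W.shape[:2]
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=2)), (rows, cols))
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        W += lr * np.exp(-d2 / (2 * sigma ** 2))[..., None] * (x - W)
    return W

# Block 1: full training. Block 2: warm start with a small learning rate and a
# narrow neighborhood, so the existing map structure is largely preserved.
block1 = np.random.rand(500, 4)
block2 = np.random.rand(200, 4)
W = train_som(block1)
W = train_som(block2, W=W, lr0=0.05, sigma0=1.0)
print(W.shape)  # (10, 10, 4)
```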

16:50
Network Community Cluster-Based Analysis for the Identification of Potential Leukemia Drug Targets

ABSTRACT. Leukemia is a hematologic cancer that develops in blood tissue and causes rapid generation of immature, abnormally shaped white blood cells. It is one of the most prominent causes of death in both men and women, and there is currently no effective treatment. For this reason, several therapeutic strategies to determine potentially relevant genetic factors are under development, as targeted therapies promise to be both more effective and less toxic than current chemotherapy. In this paper, we present a network community cluster-based analysis for the identification of potential gene drug targets for acute lymphoblastic leukemia and acute myeloid leukemia.

17:15
Simultaneous display of front and back sides of spherical SOM for health data analysis

ABSTRACT. We propose to simultaneously display the front and back sides of the spherical SOM so that cluster locations can be expressed in terms of phase relations even if they lie on opposite sides of the sphere. The technique is showcased on simple animal data and on medical health care data. Furthermore, for the medical data case, the component map was converted numerically (DIM, Dimensional Interaction Map, mode) and the result compared with that obtained from the front and back sides of the map.

17:20-17:30

Conference Closing