
09:30-10:00 Coffee Break
10:00-11:30 Session 8: User Engagement

User Engagement

A Quick Survey of Volumetric Content Streaming Approaches

ABSTRACT. Volumetric content is an important enabler for a wide range of applications such as immersive real-time 3D communications and virtual reality content viewing with interactive parallax. While more and more hardware now captures and presents 3D representations of the world, streaming these representations, known as volumetric content, remains a key problem to be addressed. Major challenges relate to the transfer of large amounts of unstructured 3D data over bandwidth-limited networks, instant response to users’ behavior, i.e. latency compensation, and computational complexity at both the server and client devices. To provide an overview of studies conducted in the field of volumetric content streaming, we review the relevant literature and summarize the different streaming schemes. This paper provides a discussion of the challenges of volumetric content streaming and an overview of the representative volumetric content streaming approaches proposed in the literature to date. Future directions and areas requiring further research are also discussed.

Integrated Crowd Sourcing Framework Using Deep Learning for Digitalization of Indian Heritage Infrastructure

ABSTRACT. Every culture in the world reflects its magnificence and significance through the heritage infrastructure it conceives in the course of its civilization. India is composed of diverse cultures, which reflect grandeur and meritocracy in the architectural heritage across its territory. Digitizing and storing information about our country's heritage is challenging due to the sheer scale of cultural data collection and the question of how reliable such a collection is. These challenges can be overcome by harnessing present state-of-the-art technologies. The advancement of technology has impacted every area of our social life, and the need to document and preserve ancestral wisdom and pass it down the generations is of prime importance. In this paper we present our work on building a fully-fledged web framework using emerging technologies to aid the preservation of cultural heritage sites and their related data, and on building a Deep Neural Network which classifies heritage sites using the crowd-sourced data. We propose an automated crowd-sourcing web application for data management and storage of the collected images, and a custom deep learning system for transfer learning with an extension for incremental learning. The framework also facilitates transfer learning to retrain the pre-trained Deep Neural Network architecture on the crowd-sourced data for continuous improvement of the model, and initiates back-end jobs for transfer learning. We also demonstrate the crowd-sourcing operations, designed with an academic hierarchy as reference, and show the framework's efficient data storage structure. We also show the extension of the framework from a web application to edge devices to accelerate Indian heritage in the digital space. Finally, we present the workflow for achieving 98.75% accuracy for a transfer-learned model in the proposed framework with the crowd-sourced dataset.

A Realistic Polar Influence Propagation Model for Location based Social Networks

ABSTRACT. Location-based social networks (LBSNs) have gained significant popularity in recent years. LBSNs are a special type of social network bridging the gap between online social networks and the offline physical world. Because of this special property, influence propagation in LBSNs has become an interesting research topic in the past few years. Significant research on the dynamics of influence propagation in LBSNs has been done so far, but current influence propagation models still lack important parameters that would align them more closely with real-world scenarios. Building on existing work, this paper proposes an improved, realistic influence propagation model using mobile crowdsourced data obtained from a renowned LBSN. The proposed influence model incorporates the polarity of influence, associating it with a positive or negative state. A new interest-match coefficient is also proposed, based on the real-world similarity between interests. The experimental results indicate that the proposed influence propagation model is meaningful and better aligned with reality.

Hashtags are (not) judgemental: The untold story of Lok Sabha elections 2019

ABSTRACT. Hashtags in online social media have become a way for users to build communities around topics, promote opinions, and categorize messages. In the political context, users employ hashtags on Twitter to campaign for their parties, spread news, gain followers, or get a general idea of a topic by following the discussion built around a hashtag. In the past, researchers have studied certain types and specific properties of hashtags by utilizing a lot of data collected around hashtags. In this paper, we perform a large-scale empirical analysis of elections using only the hashtags shared on Twitter during the 2019 Lok Sabha elections in India. We study the trends and events that unfolded on the ground, the latent topics to uncover representative hashtags, and semantic similarity to relate hashtags to the election outcomes. We collect over 24 million hashtags to perform extensive experiments. First, we find the trending hashtags and cross-reference them with the tweets in our dataset to list notable events. Second, we use Latent Dirichlet Allocation to find topic patterns in the dataset. Finally, we use a skip-gram word embedding model to find semantically similar hashtags. We propose a popularity metric and an influence metric to predict election outcomes using just the hashtags. Empirical results show that influence is a good measure for predicting the election outcome.

11:30-13:00 Session 9: Content Processing and Understanding II

Content Processing and Understanding II

PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability

ABSTRACT. Fashion is an important part of human experience. Events such as interviews, meetings, marriages, etc. are often based on clothing styles. The rise of the fashion industry and its effect on social influencing have made outfit compatibility a need. This calls for an outfit compatibility model to aid people in clothing recommendation. However, due to the highly subjective nature of compatibility, it is necessary to account for personalization. Our paper devises an attribute-wise compatibility scheme with personal preference modelling which captures user-item interaction along with general item-item interaction. Our work also addresses the problem of interpretability in clothing matching by locating the discordant and harmonious attributes between fashion items. Extensive experimental results on IQON3000, a publicly available real-world dataset, verify the effectiveness of the proposed model.

Multimedia Document Mining using Sequential Multimedia Feature Patterns

ABSTRACT. Recent years have witnessed expeditious progress in multimedia technology and rapid growth in the number of multimedia documents. The enormous amount of multimedia documents requires sophisticated multimedia mining methods to analyze and utilize the multimodal information. The multimodal objects of a multimedia document are described by patterns of features. The feature pattern sequences are used to identify the contextual information of the documents. In this paper, we propose an approach for the discovery of sequential feature patterns from multimedia documents. The sequential multimedia feature pattern mining generates multimedia class sequential rules that are used to classify the multimedia documents. The efficiency of the proposed sequential multimedia feature pattern mining is evaluated by experimenting with four datasets of multimedia documents. Experimental results demonstrate that the proposed sequential feature pattern mining can be used efficiently for knowledge mining from multimedia documents.

Quantifying (Hyper) Parameter Leakage in Machine Learning

ABSTRACT. Machine Learning models, extensively used for various multimedia applications, are offered to users as a blackbox service on the Cloud on a pay-per-query basis. Such blackbox models are commercially valuable to adversaries, making them vulnerable to extraction attacks that reverse engineer the proprietary model, thereby violating the model privacy and Intellectual Property. Here, the adversary first extracts the model architecture or hyperparameters through side channel leakage, and then steals the functionality of the target model by training the reconstructed architecture on a synthetic dataset. While the attacks proposed in the literature are empirical, there is a need for a theoretical framework to measure the information leaked under such extraction attacks. To this end, we propose a novel probabilistic framework, Airavata, to estimate the information leakage in such model extraction attacks. This framework captures the fact that extracting the exact target model is difficult due to experimental uncertainty in inferring model hyperparameters and the stochastic nature of the training used to steal the target model functionality. Specifically, we use Bayesian Networks to capture uncertainty in estimating the target model under various extraction attacks, based on the subjective notion of probability. We validate the proposed framework under different adversary assumptions commonly adopted in the literature to reason about the attack efficacy. This provides a practical tool to infer actionable details about extracting blackbox models and helps identify the best attack combination, which maximises the knowledge extracted (or information leaked) from the target model.

Efficient Compression Algorithm for Multimedia Data

ABSTRACT. In this work, we consider the problem of Cosine Similarity preserving dimensionality reduction (compression) for sparse binary datasets. Pratap suggested a compression algorithm for high dimensional, sparse, binary data for preserving Inner product and Hamming distance. In this work, we show that their proposed algorithm also works well for Cosine Similarity. We present a theoretical analysis of the dimension reduction bound and complement it with rigorous experimentation on real-world datasets. We compare our results with the state-of-the-art for the considered problem (SimHash, MinHash, Circulant Binary Embedding, and Densified One Permutation Hashing) and show that our result offers a significant saving in the compression time and the number of random bits required for the compression, while providing comparable performance.
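As a reminder of the quantities this abstract says are preserved, the following minimal sketch computes inner product, Hamming distance, and cosine similarity directly on sparse binary vectors. It does not implement the compression algorithm itself; representing each vector as a set of its nonzero indices, and all function names, are assumptions made for illustration.

```python
import math

def inner_product_binary(a: set, b: set) -> int:
    """Inner product of two 0/1 vectors stored as sets of nonzero indices."""
    return len(a & b)

def hamming_distance_binary(a: set, b: set) -> int:
    """Number of positions where the two 0/1 vectors differ."""
    return len(a ^ b)

def cosine_similarity_binary(a: set, b: set) -> float:
    """Cosine similarity; for 0/1 vectors each norm is sqrt(number of ones)."""
    if not a or not b:
        return 0.0  # convention chosen here for all-zero vectors
    return inner_product_binary(a, b) / math.sqrt(len(a) * len(b))

# Two toy sparse vectors (illustrative data only).
u = {1, 4, 7, 9}
v = {4, 7, 8}
print(inner_product_binary(u, v), hamming_distance_binary(u, v),
      cosine_similarity_binary(u, v))
```

For binary data, cosine similarity is fully determined by the inner product and the number of ones in each vector, which is one intuition for why a method that preserves inner products might also do well on cosine similarity.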

13:00-14:00 Lunch Break
14:00-15:30 Session 10A: Short Paper - Content Analysis

Short Paper - Content Analysis

Addressing the Cold-Start Problem in Outfit Recommendation using Visual Preference Modelling

ABSTRACT. With the global transformation of the fashion industry and a rise in the demand for fashion items worldwide, the need for effective fashion recommendation has never been greater. Despite various cutting-edge solutions proposed in the past for personalizing fashion recommendation, the technology is still limited by its poor performance on new entities, i.e. the cold-start problem. In this work, we attempt to address the cold-start problem for new users by leveraging a novel visual preference modelling approach on a small set of input images. We demonstrate the use of our approach with feature-weighted clustering to perform occasion-oriented outfit recommendation. Quantitatively, our results show that the proposed visual preference modelling approach outperforms the state of the art in terms of clothing attribute prediction. Qualitatively, through a pilot study, we demonstrate the efficacy of our system in providing diverse and personalized recommendations in cold-start scenarios.

Steady Flow Approximation using Capsule Neural Networks

ABSTRACT. CFD (Computational Fluid Dynamics) solvers have been very popular for fluid flow simulation, which has proved to be imperative for solving modern problems relating to analysis, design, and optimization in the field of aerodynamics. Nevertheless, CFD simulations are usually memory intensive and computationally demanding, iterative, time-consuming processes. Such drawbacks often affect productivity, limit design space exploration, and forbid interactive design. Real-time prediction of fluid flow helps us to overcome these drawbacks. There have been many successful applications of Deep Neural Networks to real-time fluid flow prediction. Until now, CNNs (Convolutional Neural Networks) in particular have proven to be the cutting-edge solution for such approximations. However, CNNs pose some challenges that make them unsuitable, especially for fluid flow approximations. We propose a new fluid flow approximation model, based on Capsule Neural Networks, for real-time prediction of non-uniform steady laminar flow, i.e. the velocity field for a 2D domain.

Assessing risk of attacks in large networked system with Context Sensitive Probabilistic Modelling

ABSTRACT. Recent trends in security breaches show that monetary or computational constraints no longer limit attackers, and the intent of the attacks is no longer confined to personal gain. Detecting cyber-attacks in a large networked system has become a challenge due to the system's complex and distributed nature. In this paper, cyber-attacks are identified by introducing the notion of the risk of an attack. The risk is defined as the possibility that the current state of the system can lead to a breach if preventive measures are not taken. The model used to achieve this involves a Long Short-Term Memory (LSTM) network to handle the context-sensitivity of the dataset and a reward-based Markov Decision Process (MDP) to identify the risk associated with the current state. We demonstrate the effectiveness of using the MDP and LSTM to detect attacks using the CSE-CIC-IDS2018 dataset.

14:00-15:30 Session 10B: Short Paper - Novel Applications

Short Paper - Novel Applications

Fusion of Deep and Non-Deep Methods for Fast Super-Resolution of Satellite Images

ABSTRACT. In the emerging commercial space industry there is a drastic increase in access to low cost satellite imagery. The price of satellite images depends on the sensor quality and revisit rate. This work proposes to bridge the gap between image quality and price by improving the image quality via super-resolution (SR). Recently, a number of deep SR techniques have been proposed to enhance satellite images. However, none of these methods utilize region-level context information; they give equal importance to each region in the image. Because of this, and because most state-of-the-art SR methods are complex and cumbersome deep models, the time taken to process very large satellite images can be impractically high. We propose to handle this challenge by designing an SR framework that analyzes the regional information content of each patch of the low-resolution image and judiciously chooses to use more computationally complex deep models to super-resolve more structure-rich regions of the image, while using less resource-intensive non-deep methods on non-salient regions. Through extensive experiments on a large satellite image, we show a substantial decrease in inference time while achieving performance similar to that of existing deep SR methods on several evaluation measures such as PSNR, MSE and SSIM.
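The patch-level routing idea described in this abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: pixel variance is used as a stand-in for "regional information content" (the abstract does not name the measure), the threshold is arbitrary, and the labels "deep" and "cheap" stand for whichever deep and non-deep super-resolution methods would actually be applied.

```python
from statistics import pvariance

def split_into_patches(image, size):
    """Yield (row, col, patch) for non-overlapping size x size patches of a 2D list."""
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]

def route_patches(image, size, threshold):
    """Label each patch 'deep' or 'cheap' by pixel variance.

    Variance is an assumed proxy for structure richness, not the paper's measure.
    """
    plan = {}
    for r, c, patch in split_into_patches(image, size):
        pixels = [p for row in patch for p in row]
        plan[(r, c)] = "deep" if pvariance(pixels) > threshold else "cheap"
    return plan

# Toy 4x4 grayscale image: only the top-right patch has strong structure.
image = [
    [10, 10, 0, 255],
    [10, 10, 255, 0],
    [50, 52, 90, 90],
    [51, 50, 90, 90],
]
plan = route_patches(image, size=2, threshold=100.0)
print(plan)
```

In such a scheme the savings would come from sending most patches down the cheap path, so the choice of measure and threshold governs the trade-off between inference time and reconstruction quality.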

LayART: Generating indoor layout using ARCore Transformations

ABSTRACT. Reconstructing an indoor scene and generating a layout/floor plan in 3D or 2D is a widely known problem. Quite a few algorithms have been proposed in the literature recently. However, most of the existing methods either use RGB-D images, thus requiring a depth camera, or depend on panoramic photos with the assumption that there is little to no occlusion in the rooms. In this work, we propose generating the layout from RGB images captured using a simple mobile phone camera. We take advantage of Simultaneous Localization and Mapping (SLAM) to assess the 3D transformations required for layout generation. SLAM technology is built into recent mobile libraries such as ARCore by Google. Hence, the proposed method is fast and efficient, while giving the user the freedom to generate a layout by simply taking a few conventional photos, rather than relying on specialized depth hardware or occlusion-free panoramic photos.

Procedural Generation of Roads with Conditional Generative Adversarial Networks

ABSTRACT. Procedural terrain generation refers to the generation of terrain features, such as landscaping, rivers or road networks, through the use of algorithms, with minimal input required from the user. Generating terrain is often an important part of the game development process. Traditional generation methods are often too time consuming, especially with larger terrain maps. On the other hand, procedural methods that generate terrain automatically often give the user little control over the output. We explore the usage of conditional generative adversarial networks in the creation of road maps, as well as the application of such road maps in the creation of game levels in game development engines such as Unreal Engine 4.

Text-to-Clipart using AttnGAN

ABSTRACT. The Attentional Generative Adversarial Network (AttnGAN) is a state-of-the-art text-to-image generation model. One of the factors in AttnGAN’s success is its ability to evaluate the similarity between an input sentence and the generated image in the same feature space (Deep Attentional Multimodal Similarity Model, DAMSM). However, the network architecture of AttnGAN is complicated and vast, which incurs considerable computational costs in the training process. When AttnGAN is applied to a text-to-image generation task in a different image domain such as clipart, the output images are simpler than the high-resolution photos that AttnGAN originally assumes. Therefore, we propose a lightweight AttnGAN that aims to reduce the computational cost of training without compromising the quality of the generated images. In particular, we focus on the image encoder; replacing Inception-v3 with VGG-16 reduces the DAMSM training time to approximately half that of the original implementation.

15:30-16:00 Coffee Break
16:00-17:30 Session 11: Undergraduate Student Consortium and Demo

Undergraduate Student Consortium and Demo

BOOKiiIT - Designing a Venue Booking System

ABSTRACT. Every academic institution has its own system for managing the booking of spaces. In an academic institution, rooms in different buildings may be handled by separate departments for booking purposes, and most bookings are carried out by mail. People using this system face numerous challenges, such as lack of knowledge of available spaces, tediousness, improper management of bookings, clashes in bookings, communication failures, etc. These indicate a need for a well-defined, efficient, visualizable, and user-friendly space management system. BOOKiiIT, a Flutter app, was proposed and designed using design thinking and the PACT framework to make it user friendly and efficient in the different contexts of use. This app was implemented for the Indraprastha Institute of Information Technology (IIITD), but it can be easily replicated for other institutions.

Development and evaluation of an AI System for early detection of Covid-19 pneumonia using X-ray

ABSTRACT. This paper aims to integrate AI (Artificial Intelligence) with medical science to develop a classification tool that recognizes Covid-19 infection and other lung ailments. The four conditions evaluated were Covid-19 pneumonia, non-Covid-19 pneumonia, pneumonia and normal lungs. The proposed AI system is divided into two stages. Stage 1 classifies chest X-Ray volumes into pneumonia and non-pneumonia. Stage 2 receives input from stage 1 if the X-ray belongs to the pneumonia class and further classifies it as Covid-19 positive or Covid-19 negative.

Towards a Safer Conversation Space: Detection of Toxic Content in Social Media

ABSTRACT. As content on social media turns increasingly toxic, the detection of aggression, hate, profanity, insult, cyberbullying and other personal attacks has attracted intensive research in the Natural Language Processing domain. Unlike most work in toxic content detection, where the nature of the toxicity is determined, we treat the detection of toxic content as a binary classification task, that is, deciding whether the content is toxic or nontoxic. We have explored Support Vector Machines, Boosting and deep neural networks for classification. With the goal of better predictive performance, our approach uses a majority voting ensemble to aggregate the predictions of the individual classifiers. We have trained the models on Twitter datasets and have achieved an F1 score of 94.5.
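A majority voting ensemble of the kind mentioned in this abstract can be sketched in a few lines. The three prediction lists below are made-up placeholders for the outputs of an SVM, a boosting model and a neural network; the function names and the 0/1 label encoding (1 = toxic) are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label chosen by the most classifiers for one sample."""
    return Counter(votes).most_common(1)[0][0]

def ensemble_predict(per_classifier_preds):
    """Combine aligned prediction lists (one list per classifier) sample by sample."""
    return [majority_vote(sample_votes) for sample_votes in zip(*per_classifier_preds)]

# Hypothetical binary predictions (1 = toxic, 0 = nontoxic) for four posts.
svm_preds = [1, 0, 1, 0]
boosting_preds = [1, 1, 0, 0]
dnn_preds = [0, 1, 1, 0]

final = ensemble_predict([svm_preds, boosting_preds, dnn_preds])
print(final)
```

Using an odd number of classifiers on a binary task avoids ties, so no tie-breaking rule is needed in this sketch.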

Arten-Net: An Emotion Classification System for Art

ABSTRACT. Art has been part of human culture since time immemorial. It is one of the earliest forms of communicating emotions and stories. Today, people invest a lot of money in buying art, and there is a need to help classify art not only in terms of age or style but also in terms of the emotion evoked, to make it easier to locate art displaying similar or identical emotions. To the best of our knowledge, no systems exist that utilize multiple modalities for emotion classification of art pieces. This work proposes a classification system called Arten-Net that uses multimodal data from art pieces, title and image, to predict the emotion that the art may evoke. Through quantitative and qualitative results we show that an ensemble of multimodal and unimodal classifiers provides better results than the multimodal and the unimodal classifiers individually.

Multimodal Emotion Recognition in Polish

ABSTRACT. Multimodal emotion recognition is a challenging task because emotions can be expressed through various forms and modalities. It can be applied in various fields, for example, human-computer interaction, crime, healthcare, multimedia retrieval, etc. In recent times, neural networks have achieved overwhelming success in determining emotional states. Motivated by these advancements, we present a multimodal emotion recognition system based on body language, facial expression and speech. This paper presents the techniques used in the Multimodal Emotion Recognition in Polish challenge. To detect the emotional state in various videos, data preprocessing operations are performed and robust features are extracted. For this purpose, we have used facial landmark detection for facial expressions and MFCC for speech. The data, in the form of videos, had variable length, as a result of which traditional classification algorithms couldn’t be used. Hence, we implemented a long short-term memory (LSTM) network. Each modality is modelled with its own LSTM, which returns the emotion with the highest probability. The models are then combined using a weighted average approach, where the emotion with the highest probability is the desired emotion.
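The weighted-average combination described at the end of this abstract might look like the sketch below. The per-modality probabilities, the weights, and the emotion labels are invented for illustration; the abstract does not state how the weights were chosen, so treat every number and name here as an assumption.

```python
def fuse_weighted(probs_by_modality, weights):
    """Weighted average of per-modality emotion probabilities.

    Returns the winning emotion and the fused probability table.
    """
    total_weight = sum(weights[m] for m in probs_by_modality)
    emotions = next(iter(probs_by_modality.values())).keys()
    fused = {
        e: sum(weights[m] * probs[e] for m, probs in probs_by_modality.items()) / total_weight
        for e in emotions
    }
    return max(fused, key=fused.get), fused

# Hypothetical outputs of three per-modality models for one video.
probs_by_modality = {
    "face":   {"happy": 0.6, "sad": 0.3, "angry": 0.1},
    "speech": {"happy": 0.2, "sad": 0.5, "angry": 0.3},
    "body":   {"happy": 0.3, "sad": 0.4, "angry": 0.3},
}
weights = {"face": 0.5, "speech": 0.3, "body": 0.2}  # assumed, not from the paper

label, fused = fuse_weighted(probs_by_modality, weights)
print(label, fused)
```

In this toy case two of the three modalities individually favour "sad", yet the fused result is "happy" because the face model carries the largest weight, which shows how the weighting can change the outcome relative to a simple vote.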

Crowd Flow Collisions Simulation

ABSTRACT. The study of crowd scenes has become an interesting research area. Crowd flow collision is a critical problem since it may cause injuries or death. Current research methods on crowd flow collisions are confined to 1D and 2D straight-line directions. In this paper, we propose a 3D simulation of crowd flow collisions in different directions within an entry and an exit at a gate. The simulation efficiency decreases as the number of agents increases: the frame rate drops below 35 frames per second when there are more than 100 characters.