BIGMM2019: THE 5TH IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA
PROGRAM FOR FRIDAY, SEPTEMBER 13TH
09:30-10:00 Coffee Break
10:00-11:30 Session 12: Regular: New theory & models
Chair:
Santanu Chaudhury (Indian Institute of Technology Jodhpur, India)
10:00
Renjie Chu (School of Information and Computer, Taiyuan University of Technology, China)
Baoning Niu (School of Information and Computer, Taiyuan University of Technology, China)
Shanshan Yao (Institute of Big Data Science and Industry, Shanxi University, China)
Jianquan Liu (Biometrics Research Laboratories, NEC Corporation, China)
Peak-Based Philips Fingerprint Robust to Pitch-Shift for Massive Audio Retrieval
PRESENTER: Renjie Chu

ABSTRACT. An ideal audio retrieval system identifies a short query snippet from a massive audio database with both robustness and efficiency. Unfortunately, no existing system can robustly handle all distortions while remaining efficient. An efficient audio retrieval method must also be matched to the features of the fingerprint it operates on. The Enhanced Sampling and Counting method (eSC), the state-of-the-art audio retrieval method proposed for Philips-like fingerprints, achieves both high efficiency and strong robustness, including resistance to time-stretch. We argue that the Philips fingerprint, which is robust to many types of distortion except speed-change (comprising time-stretch and pitch-shift), combined with eSC is a promising path towards an ideal audio retrieval system, provided it can be made robust to pitch-shift. To this end, this paper proposes a peak-point based energy bands computation method (PPEB) to enhance the Philips fingerprint (PF) with resistance to pitch-shift; the resulting fingerprint is called the Peak-point based Philips fingerprint (PPF). Experimental results show that PPF can resist pitch-shift ranging from 70% to 130%, while retaining the robustness of PF to various noise distortions.
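The classic Philips fingerprint this abstract builds on derives one bit per pair of adjacent spectral energy bands, from the sign of the energy-difference change across consecutive frames. A minimal sketch of that bit rule (this illustrates the baseline PF, not the paper's PPEB extension; function and variable names are our own):

```python
def philips_bits(energies_prev, energies_cur):
    """One frame of Philips-style fingerprint bits.

    energies_prev, energies_cur: band energies of the previous and
    current frame (33 bands yield 32 bits). Bit m is 1 when the
    inter-band energy difference increases between frames, else 0.
    """
    bits = []
    for m in range(len(energies_cur) - 1):
        d_cur = energies_cur[m] - energies_cur[m + 1]
        d_prev = energies_prev[m] - energies_prev[m + 1]
        bits.append(1 if d_cur - d_prev > 0 else 0)
    return bits
```

Pitch-shift moves energy between fixed bands, which is why the paper recomputes the bands around spectral peaks before applying this rule.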

10:22
Himanshu Aggarwal (Indraprastha Institute of Information Technology Delhi, India)
Rajiv Shah (Indraprastha Institute of Information Technology Delhi, India)
Suhua Tang (The University of Electro-Communications, Japan)
Feida Zhu (Singapore Management University, Singapore)
Supervised Generative Adversarial Cross-Modal Hashing by Transferring Pairwise Similarities for Venue Discovery
PRESENTER: Rajiv Shah

ABSTRACT. Venue discovery using real-world multimedia data has not been investigated thoroughly. In this study we refer to business and travel locations as venues, and aim to improve the efficiency of venue discovery through hashing. Most existing supervised cross-modal hashing methods map data in different modalities to a Hamming space, where semantic information is exploited to supervise data of different modalities during the training stage. However, previous works neglect the pairwise similarity between data in different modalities, which leads to degraded performance of hash function learning. To address this issue, we propose a supervised Generative Adversarial Cross-modal Hashing method by Transferring Pairwise Similarities (SGACH-TPS). This work makes three significant contributions: i) we propose a model for efficient venue discovery; ii) the supervised generative adversarial network constructs a hash function that maps multimodal data to a common Hamming space; iii) a simple transfer training strategy for the adversarial network supervises data in different modalities, where the pairwise similarity is transferred to the fine-tuning stage of training. Evaluation on the new WikiVenue dataset confirms the superiority of the proposed method.
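The efficiency gain from hashing comes from comparing short binary codes in Hamming space at retrieval time. A minimal sketch of that retrieval step, common to cross-modal hashing methods like the one above (gallery layout and names are illustrative, not from the paper):

```python
def hamming_distance(code_a, code_b):
    """Hamming distance between two equal-length binary hash codes."""
    return sum(a != b for a, b in zip(code_a, code_b))

def retrieve(query_code, gallery):
    """Rank gallery items (label, code) by Hamming distance to the query."""
    return sorted(gallery, key=lambda item: hamming_distance(query_code, item[1]))
```

In a cross-modal setting the query code may come from text and the gallery codes from images; the learned hash function is what makes those codes comparable.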

10:44
Minhua Zhang (Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China)
Youtian Du (Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China)
Guangxun Zhang (Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China)
Yujie Xie (Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China)
Fuyuan Cao (Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, China)
Online Social Information Propagation Analysis Based on Time-Delay Mixture Diffusion Model
PRESENTER: Minhua Zhang

ABSTRACT. Information propagation modeling is an important problem in social media analysis. Many existing methods assume that the information propagation process follows a uniform pattern. Different from previous studies, we analyze the factors affecting the information transmission probabilities between users in terms of users' interests and community-level structural features, and propose a time-delay mixture diffusion model to describe the dynamics of the information propagation process in online social networks (OSNs). In this work, we model information propagation as a mixture of patterns by considering both content and two levels of structural features. We conduct experiments on a dataset from Sina Weibo, one of the largest microblogging websites in China. The experimental results show that our proposed model outperforms the baseline models in information diffusion prediction.

11:06
Harsh Shrivastava (MIDAS Lab, IIIT, Delhi, India)
Rama Krishna P V N S (MIDAS Lab, IIIT, Delhi, India)
Karmanya Aggarwal (MIDAS Lab, IIIT, Delhi, India)
Meghna P Ayyar (MIDAS Lab, IIIT, Delhi, India)
Yifang Yin (NUS, Singapore)
Rajiv Ratn Shah (MIDAS Lab, IIIT, Delhi, India)
Roger Zimmermann (National University of Singapore, Singapore)
Robust and Scalable Face Retrieval Framework for Large-Scale Databases

ABSTRACT. Built upon the work of Yugo Sato et al., this paper introduces a robust and scalable face retrieval system that can retrieve the face envisioned by a user from a large-scale database. Our system is designed for the situation in which a user wishes to look for a person but does not exactly remember their face. The system utilises the user's visual memory in the form of the common facial attributes that they think the envisioned image has. Furthermore, instead of asking for information specific to the target, our face retrieval system asks the user to select several images that are similar to their impression of the target image. On the basis of this selection, the system automatically reduces the semantic gap between the human-based and computer-based representations of the target image by optimizing a deep convolutional network with the user in the loop. This work addresses the critical challenge of image retrieval across different users' visual inputs from a database far larger than that used in the work of Yugo Sato et al. Similar to previous works, we ran user studies with 10 subjects on a public large-scale database and confirmed that our framework beats state-of-the-art results on this task and proves very effective for retrieving the envisioned image quickly, without much burden on the user.

11:30-13:00 Session 13: BigMM / TCMC Meeting
Chair:
Roger Zimmermann (National University of Singapore, Singapore)
13:00-14:00 Lunch Break
14:00-16:00 Session 14A: Invited: Security & Privacy
Chair:
Yongqing Sun (NTT, Japan)
14:00
Vikram Patil (SUNY Albany, United States)
Shivam B. Parikh (SUNY Albany, United States)
Pradeep K. Atrey (SUNY Albany, United States)
GeoSecure-O: a Method for Secure Distance Calculation for Travel Mode Detection Using Outsourced GPS Trajectory Data

ABSTRACT. Detecting the mode of transportation using multimedia sensory data, such as GPS trajectory data, is an important research problem arising in many application domains that use location-based services (LBS). Due to its large volume, such data is commonly outsourced to third-party cloud service providers (CSPs) for storage and usage by the LBS providers. The LBS providers typically deploy mode detection algorithms at the CSPs, which leads to continuous tracking of the user's location and raises severe security and privacy concerns among users. Although many travel mode detection approaches exist in the literature, they do not address these security and privacy concerns. GPS-based mode detection algorithms use distance and derived features such as velocity and acceleration. In this paper, we present a method, called GeoSecure-O, to securely calculate distance without revealing users' locations to the LBS provider, and utilize it to determine the travel mode. We evaluate the proposed method on the Microsoft GeoLife dataset. Experimental results show that the proposed method not only protects users' locations but also achieves accuracy similar to that achieved using data containing users' locations.
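The distance and velocity features the abstract mentions can be sketched, in the plaintext (non-secure) setting, with the haversine formula plus a toy speed-threshold rule; GeoSecure-O's contribution is computing such distances without exposing the coordinates, which this illustration deliberately does not attempt. The thresholds below are invented for illustration, not taken from the paper:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def guess_mode(speed_mps):
    """Toy threshold classifier on speed (illustrative only)."""
    if speed_mps < 2:
        return "walk"
    if speed_mps < 7:
        return "bike"
    return "vehicle"
```

Dividing consecutive-fix distances by their time gaps yields the velocity (and, by differencing, acceleration) features that mode detectors typically consume.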

14:30
Chao Wu (Zhejiang University, China)
Fengda Zhang (Zhejiang University, China)
Fei Wu (Zhejiang University, China)
Distributed Modelling Approaches for Data Privacy Preserving

ABSTRACT. Machine learning has been developing rapidly, and data undoubtedly plays an important role in it. However, it is hard to make full use of the data of different nodes to collaboratively train a good model while preserving data privacy. In this paper, we study and analyze several decentralized machine learning algorithms that provide strong privacy protection. First, we propose smart contract-based decentralized federated learning by combining the federated averaging algorithm with smart contracts. Second, we study a machine learning algorithm based on a decentralized network topology to solve the problems caused by star-topology networks. We then present a novel method of model aggregation based on distillation, removing the limitation that the models of different nodes must share the same structure. To further preserve privacy, we improve a secure aggregation protocol and apply it to federated learning over the decentralized topology. We also use several methods that generate new datasets from the raw dataset for model training to protect data privacy. Finally, we analyze and compare the different distributed machine learning algorithms through experiments.
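The federated averaging algorithm the authors combine with smart contracts aggregates client models as a dataset-size-weighted mean of their parameters. A minimal sketch over flat parameter vectors (the flat-list representation is an assumption for illustration; real models carry structured tensors):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation step: weighted average of per-client
    parameter vectors, weights proportional to local dataset sizes."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            avg[i] += (n / total) * w[i]
    return avg
```

In the paper's decentralized variant, a smart contract rather than a central server would coordinate this aggregation round.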

15:00
Jiali Liang (Shanghai University, China)
Dan Zeng (Shanghai University, China)
Shuaijun Chen (Huawei Noah’s Ark Lab, China)
Qi Tian (Huawei Noah’s Ark Lab, China)
Related Attention Network for Person Re-Identification

ABSTRACT. Person re-identification (ReID) is a critical technology in intelligent video surveillance. In practice, person ReID remains challenging due to pedestrian misalignment and background clutter. In most existing datasets, pedestrian images are generated by manual cropping or by pedestrian detection algorithms, which causes two main drawbacks. On the one hand, detection errors may lead to pedestrian misalignment and cluttered backgrounds. On the other hand, hand-drawn bounding boxes are highly accurate but have inconsistent scales. To solve these problems, we make two contributions. First, we design a simple and effective data pre-processing algorithm that aligns pedestrian images to a standard template based on keypoints. Second, we propose the Related Attention Network (RAN), which focuses on human body regions via pixel-level correlation and improves ReID performance significantly. Experimental results on the Market-1501 and DukeMTMC-reID datasets demonstrate the effectiveness of our method.

15:30
Akshay Agarwal (IIIT-Delhi, India)
Akarsha Sehwag (IIIT-Delhi, India)
Richa Singh (IIIT-Delhi, India)
Mayank Vatsa (IIIT-Delhi, India)
Deceiving Face Presentation Attack Detection via Image Transforms

ABSTRACT. Presentation attacks can provide unauthorized access to users and fool face recognition systems in both small-scale and large-scale applications. Among all presentation attacks, 2D print and replay attacks are very popular due to their ease and cost-effectiveness in attacking face recognition systems. Over the years, however, several successful presentation attack detection algorithms have been developed to detect 2D print and replay attacks. Generally, 2D presentation attacks are detected using the presence or absence of micro-patterns that distinguish a real input from an attacked input. However, if a smart attacker digitally pre-processes the image using intensity transforms and then performs a 2D presentation attack, the micro-pattern differences between real and attacked samples are minimized. In this paper, for the first time, we show that simple intensity transforms such as gamma correction, log transform, and brightness control can help an attacker deceive face presentation attack detection algorithms. Experimental results show that such an attacker can increase the error rate of hand-crafted as well as deep learning-based presentation attack detectors.
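The intensity transforms named in the abstract are standard point operations on pixel values. A minimal sketch of gamma correction and the log transform for 8-bit intensities (illustrative helpers, not the paper's code):

```python
import math

def gamma_correct(pixels, gamma):
    """Gamma correction on intensities in [0, 255]."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]

def log_transform(pixels):
    """Log intensity transform, rescaled back to [0, 255]."""
    c = 255 / math.log(1 + 255)
    return [round(c * math.log(1 + p)) for p in pixels]
```

Both transforms are invertible and perceptually mild, which is what lets them perturb the micro-pattern statistics a detector relies on without visibly degrading the attack image.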

14:00-16:00 Session 14B: Workshop: BD-CDH'19
Chair:
Naimul Khan (Ryerson University, Canada)
14:00
Chiranjoy Chattopadhyay (Indian Institute of Technology Jodhpur, India)
Welcome to the BD:CDH 2019
14:15
Santanu Chaudhury (Indian Institute of Technology Jodhpur, India)
Keynote Address
15:45
Naimul Khan (Ryerson University, Canada)
Introducing the presenters & their research titles
14:00-16:00 Session 14C: Invited: BigMM techniques for novel applications
Chair:
Keiji Yanai (The University of Electro-Communications, Japan)
14:00
Yifeng Wang (Shanghai University, China)
Dan Zeng (Shanghai University, China)
Deep Domain Adaptation on Vehicle Re-Identification

ABSTRACT. Vehicle re-identification is the task of searching all pictures in a gallery and finding all vehicle images with the same ID as a given query image. Using deep learning, excellent results have been achieved in vehicle re-identification. However, a huge challenge remains: a trained model works well only on a particular dataset, and when transferred to other datasets its performance is not satisfactory. Domain adaptation is commonly used to solve this problem. To the best of our knowledge, this paper is the first to use domain adaptation for vehicle re-identification, with the goal of improving cross-dataset performance. We use ResNet as the backbone network, add Maximum Mean Discrepancy (MMD) to the optimization objective of the network, and extend it to multiple kernels. Experiments show that the performance of vehicle re-identification improves over direct transfer.
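Maximum Mean Discrepancy, which the paper adds to the training objective, measures the gap between source- and target-domain feature distributions via kernel mean embeddings. A minimal sketch of the biased squared-MMD estimator with a single RBF kernel (the paper extends this to multiple kernels; names and the default bandwidth are our own):

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / (m * m)
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / (n * n)
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy
```

Minimizing this quantity over batches of source and target features pushes the network towards domain-invariant representations.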

14:30
Shinji Oyama (The University of Tokyo, Japan)
Toshihiko Yamasaki (The University of Tokyo, Japan)
Visual Clarity Analysis and Improvement Support for Presentation Slides

ABSTRACT. Presentation slides offer an effective way to deliver information in various fields. It has become easier to create slides owing to advanced presentation software such as PowerPoint. However, novices still face difficulty in designing slides that are easily comprehensible, as few slide evaluation methods exist that can objectively judge slide quality. In this paper, we analyze features extracted from slides and tackle a simple classification problem: whether an input single-page slide is easy to understand. For evaluation, we created a new dataset of 1,000 PowerPoint slides with visual clarity labels using a crowdsourcing service. Using the 30% of slides with high/low clarity ratings, we achieved a classification accuracy of 90.3%. We further propose a feedback system that supports the improvement of slide designs. A user study demonstrates that our system, which provides feedback on visual clarity scores and areas that should be modified, effectively supports slide improvement.

15:00
Zeeshan Ahmad (Ryerson University, Canada)
Naimul Mefraz Khan (Ryerson University, Canada)
Multidomain Multimodal Fusion for Human Action Recognition Using Inertial Sensors

ABSTRACT. One of the major reasons for the misclassification of complex actions during action recognition is the unavailability of complementary features that provide semantic information about the actions. In different domains, these features are present with different scales and intensities. In the existing literature, features are extracted independently in different domains, but the benefits of fusing these multidomain features are not realized. To address this challenge and extract a complete set of complementary information, in this paper we propose a novel multidomain multimodal fusion framework that extracts complementary and distinct features from different domains of the input modality. We transform input inertial data into signal images, and then make the input modality multidomain and multimodal by transforming the spatial-domain information into the frequency and time-spectrum domains using the Discrete Fourier Transform (DFT) and Gabor wavelet transform (GWT), respectively. Features in different domains are extracted by convolutional neural networks (CNNs) and then fused by Canonical Correlation based Fusion (CCF) to improve the accuracy of human action recognition. Experimental results on three inertial datasets show the superiority of the proposed method compared to the state of the art.
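The spatial-to-frequency step can be illustrated with a naive DFT magnitude spectrum of a 1D inertial signal (the paper applies the transforms to 2D signal images; this 1D sketch only shows what "frequency-domain features" means):

```python
import math

def dft_magnitudes(signal):
    """Magnitude spectrum of a real-valued signal via the naive O(n^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n):
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags
```

In practice an FFT would be used; the point is that the frequency-domain view exposes periodic structure (e.g. gait cadence) that the raw time-domain samples do not.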

15:30
Chengcheng Wei (University of Science and Technology of China, China)
Wengang Zhou (University of Science and Technology of China, China)
Junfu Pu (University of Science and Technology of China, China)
Houqiang Li (University of Science and Technology of China, China)
Deep Grammatical Multi-Classifier for Continuous Sign Language Recognition

ABSTRACT. In this paper, we propose a novel deep architecture with multiple classifiers for continuous sign language recognition. Representing the sign video with a 3D convolutional residual network and a bidirectional LSTM, we formulate continuous sign language recognition as a grammatical-rule-based classification problem. We first split a text sentence of sign language into isolated words and n-grams, where an n-gram is a sequence of n adjacent words in a sentence. Then, we propose a word-independent classifier (WIC) module and an n-gram classifier (NGC) module to identify the words and n-grams in a sentence, respectively. A greedy decoding algorithm is employed to assemble words and n-grams into the sentence based on the confidence scores provided by both modules. Our method is evaluated on a Chinese continuous sign language recognition benchmark and achieves state-of-the-art performance, demonstrating its effectiveness and superiority.
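The word/n-gram decomposition the method builds on can be sketched directly (illustrative helper, not the authors' code):

```python
def ngrams(words, n):
    """All contiguous n-grams of a word sequence, as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
```

For a sentence of isolated sign words, `ngrams(sentence, 1)` gives the word targets for the WIC module and `ngrams(sentence, n)` with n > 1 gives the targets the NGC module scores.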

16:00-16:30 Coffee Break
16:30-18:00 Session 15A: Workshop: BD-CDH'19
Chair:
Naimul Khan (Ryerson University, Canada)
16:30
Rui Oliveira Lopes (Faculty of Arts and Social Sciences, Brunei Darussalam)
Digitizing Architectural Heritage in Brunei Darussalam Towards Cultural Safeguarding, Tourism Development and Education

ABSTRACT. This paper seeks to understand the cultural, stylistic, and historical significance of the architectural heritage of Brunei Darussalam in order to ensure its safeguarding and sustainability. The paper focuses on the use of digital technologies to support the surveying and archival analysis of this architectural heritage. Using the methods of the digital humanities, the research focuses on documentation (geometric, architectural, historical) through 2D and 3D drawings, creating digital and interactive maps for geo-spatial, contextual, and phenomenological navigation to locate architectural heritage. This research demonstrates the potential of digital technologies for the study and safeguarding of architectural heritage, and the use of these data to create an interactive, open-access platform designed for education, conservation, cultural management, safeguarding awareness, social responsibility, and tourism development.

16:48
Uday Kulkarni (KLE Technological University, India)
Meena S.M (KLE Technological University, India)
Sunil Gurlahosur (KLE Technological University, India)
Uma Mudengudi (KLE Technological University, India)
Pratiksha Benagi (KLE Technological University, India)
Shashidhara Vyakaranal (KLE Technological University, India)
Classification of Cultural Heritage Sites Using Transfer Learning
PRESENTER: Uday Kulkarni

ABSTRACT. India is a country endowed with rich cultural heritage, especially renowned architectural sites, which exist in a variety of forms such as monuments, sculpture, manuscripts, and paintings. Cultural heritage connects generations over time, and we need to preserve it. Architects, historians, travellers, and others across the globe visit India, known for its cultural heritage gems. They visit many historical sites, where it often becomes difficult for them to identify and obtain historical details about the monument they are interested in. Accurately labelling an image with its correct heritage site allows more proficient searches through specific terms, thus helping in studying and understanding heritage assets. Classifying image data is complex and time-consuming, but the current state of the art in machine learning can be harnessed to automate image classification. In this paper, we propose a crowdsourcing platform to collect Indian Digital Heritage (IDH) monument data, perform image classification, and support query-based retrieval of image labels. We further design a transfer learning-based image classifier, retrained on the IDH dataset from a MobileNet V2 architecture pre-trained on the ImageNet dataset. We demonstrate our crowdsourcing framework with a web application, which is part of the Indian Digital Heritage Space (IHDS) project funded by DST, Government of India, New Delhi.

17:06
Arjun Ghosh (Indian Institute of Technology Delhi, India)
Using N-Grams to Identify Edit Wars on Wikipedia

ABSTRACT. This paper presents a method for identifying Wikipedia edit wars using n-gram analysis. The analysis is conducted on a corpus of past versions of Wikipedia pages concerning historical figures who are glorified and idolised by the Hindu Right. The analysis shows that Wikipedia's open structure and article policies enable a conversation between academic and popular histories, a feat which has been difficult in India in the past.

17:24
Mahak Jain (Microsoft, India)
Anurag Sanyal (Simon Fraser University, Canada)
Shreya Goyal (Indian Institute of Technology Jodhpur, India)
Chiranjoy Chattopadhyay (Indian Institute of Technology Jodhpur, India)
Gaurav Bhatnagar (Indian Institute of Technology Jodhpur, India)
A Framework for the Conversion of Textual BigData into 2D Architectural Floor Plan

ABSTRACT. In the recent past, owing to the boost in digitization, a huge volume of data (big data) related to buildings and architecture has been accumulated. The sources of such data include interviews, blogs, and websites. These descriptions contain details about the interior of a building; for humans it is easy to interpret them and imagine the structure and arrangement of furniture. Automatic synthesis of real-world images from text descriptions has been explored in the computer vision community, but there has been no such attempt for document images like floor plans. Although floor plan synthesis from sketches, as well as data-driven models, has been proposed earlier, this is the first attempt to automatically render building floor plan images from textual descriptions. Here, the input is a natural language description of the internal structure and furniture arrangement within a house, and the output is the corresponding 2D floor plan image. We experimented on publicly available benchmark floor plan datasets and were able to render realistic synthesized floor plan images from descriptions written in English.

17:42
Julian True (Ryerson University, Canada)
Naimul Khan (Ryerson University, Canada)
Novel Segmentation Metric for Use in Augmented Reality Advertisement Integration
PRESENTER: Julian True

ABSTRACT. A major application area for Augmented Reality (AR) is advertisement. To achieve easy advertisement integration and management for stakeholders, a quantitative semantic understanding of the world that adheres to human perception is necessary. Current deep learning-based segmentation algorithms such as Mask R-CNN provide a mask of the real world (e.g. building facades) in a greedy manner which, although quantitatively accurate, does not meet the needs of AR advertisers for appropriate ad placement, such as ad size and mask continuity. In this paper, we propose three intuitive metrics for evaluating building facade segmentation specifically for advertisement integration: average discontinuity, normalized continuous area, and resolution ratio. Each metric is inspired by the way advertisers may want to view segmentation results within an AR world editor, or the way users may want to experience AR advertisements. Experimental results on a test segmentation show the importance of such intuitive metrics: we show how an accurate placement area for a sample 2D advertisement can easily be isolated from a segmented facade using our metrics.
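Of the three proposed metrics, normalized continuous area can plausibly be read as the largest connected foreground region divided by the total foreground area of the segmentation mask; that exact definition is our assumption, not stated in the abstract. A sketch over a binary mask:

```python
from collections import deque

def normalized_continuous_area(mask):
    """Largest 4-connected foreground region divided by total foreground
    area of a binary mask (list of rows of 0/1). Assumed definition."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    total, largest = 0, 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # breadth-first flood fill of one connected region
                size, q = 0, deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    size += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                total += size
                largest = max(largest, size)
    return largest / total if total else 0.0
```

A value near 1.0 indicates one large uninterrupted surface, the kind of facade region a 2D advertisement can be placed on without crossing mask gaps.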

16:30-18:00 Session 15B: Workshop: MLIAU'19
Chairs:
Xian-Hua Han (Yamaguchi University, Japan)
Yongqing Sun (NTT, Japan)
16:30
Weiyao Lin (Shanghai Jiao Tong University, China)
Keynote: Semantic information extraction and compression in large-scale multimedia systems
17:15
Xian-Hua Han (Yamaguchi University, Japan)
Yen-Wei Chen (Ritsumeikan University, Japan)
Residual Component Estimating CNN for Image Super-Resolution
PRESENTER: Xian-Hua Han

ABSTRACT. With the success of convolutional neural networks (CNNs) in various computer vision applications, CNNs have been widely applied to single image super-resolution (SR). The recent research line for CNN-based image SR mainly concentrates on exploring pioneering network architectures, such as very deep CNNs, ResNet, and GANs, to enhance the quality of the learned high-resolution (HR) image. Although recent CNN-based SR work has achieved impressive performance, unrecovered high-frequency (residual) components unavoidably remain with current network architectures. This study explores a unified CNN architecture that learns not only the HR image but also, simultaneously, the residual components that the first network fails to recover. With an existing CNN architecture for image super-resolution, the HR image can be learned while some high-frequency content of the ground-truth image may not be perfectly recovered. To estimate this unrecovered high-frequency content, this study stacks another CNN on the output of the baseline CNN, constructing an end-to-end residual component learning framework for more accurate image SR. Experimental results on benchmark datasets validate that the proposed residual component estimating CNN outperforms the non-stacked CNN architecture and demonstrates state-of-the-art restoration quality.

17:30
Hitkul Jangid (Indraprastha Institute of Information Technology, Delhi, India)
Rajiv Ratn Shah (Indraprastha Institute of Information Technology, Delhi, India)
Ponnurangam Kumaraguru (Indraprastha Institute of Information Technology, Delhi, India)
Shin’ichi Satoh (National Institute of Informatics, Tokyo, Japan)
Maybe Look Closer? Detecting Trolling Prone Images on Instagram
PRESENTER: Hitkul Jangid

ABSTRACT. The widespread availability of better network infrastructure and smartphones has made image-based social media platforms like Instagram and Flickr popular. The visual medium of communication has also led to an alarming increase in trolling incidents on social media. While it is crucial to automatically detect trolling incidents on social media, in this paper we look at the problem from the perspective of prevention rather than detection. A system that can recognize trolling-prone images can issue a warning to users before the content is posted online and prevent potential trolling incidents. We attempt to build a supervised classifier to detect trolling-prone images and discuss why conventional state-of-the-art image classification methods do not work well for this task. We also provide an extensive analysis of trolling patterns in images from Instagram, and discuss challenges and possible future paths in detail.