10:00 | Peak-Based Philips Fingerprint Robust to Pitch-Shift for Massive Audio Retrieval PRESENTER: Renjie Chu ABSTRACT. An ideal audio retrieval system identifies a short query snippet from a massive audio database with both robustness and efficiency. Unfortunately, none of the existing systems can robustly handle all distortions while remaining efficient, because an efficient retrieval method must be matched to the features of its fingerprint. The Enhanced Sampling and Counting method (eSC), the state-of-the-art retrieval method proposed for Philips-like fingerprints, achieves both high efficiency and strong robustness, including resistance to time-stretch. We argue that the Philips fingerprint, which is robust to many types of distortion except speed-change (comprising time-stretch and pitch-shift), combined with eSC is a promising route towards an ideal audio retrieval system, provided it can be made robust to pitch-shift. To this end, this paper proposes a peak-point based energy bands computation method (PPEB) that enhances the Philips fingerprint (PF) with resistance to pitch-shift; the resulting fingerprint is called the Peak-point based Philips fingerprint (PPF). Experimental results show that PPF can resist pitch-shift ranging from 70% to 130% while retaining the robustness of PF to various noise distortions. |
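As background for the abstract above: the classic Philips fingerprint derives one 32-bit sub-fingerprint per audio frame from sign changes in the energy differences of 33 logarithmically spaced bands. The following minimal sketch is our own illustration of that baseline scheme, not the paper's PPEB variant; the band range, sample rate, and frame parameters are assumptions.

```python
import numpy as np

def philips_subfingerprints(signal, sr=5000, frame_len=2048, hop=64, n_bands=33):
    """Sketch of classic Philips fingerprint extraction.

    Each frame yields a 32-bit sub-fingerprint from the sign of changes in
    energy differences between 33 log-spaced bands (band range assumed)."""
    edges = np.geomspace(300.0, 2000.0, n_bands + 1)  # log-spaced band edges
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
            energies[i, b] = spectrum[mask].sum()
    # Bit(n, m) = sign of the band-difference change between consecutive frames
    d = energies[:, :-1] - energies[:, 1:]            # (n_frames, 32)
    bits = (d[1:] - d[:-1] > 0).astype(np.uint8)      # (n_frames - 1, 32)
    return bits
```

The weakness the abstract targets is visible here: pitch-shifting moves spectral energy across the fixed band edges, flipping bits, which is why the paper recomputes energy bands relative to spectral peaks instead.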
10:22 | Supervised Generative Adversarial Cross-Modal Hashing by Transferring Pairwise Similarities for Venue Discovery PRESENTER: Rajiv Shah ABSTRACT. Venue discovery using real-world multimedia data has not been investigated thoroughly. We refer to business and travel locations as venues in this study and aim to improve the efficiency of venue discovery by hashing. Most existing supervised cross-modal hashing methods map data in different modalities to a Hamming space, where semantic information is exploited to supervise data of different modalities during the training stage. However, previous works neglect the pairwise similarity between data in different modalities, which leads to degraded performance of hashing function learning. To address this issue, we propose a supervised Generative Adversarial Cross-modal Hashing method by Transferring Pairwise Similarities (SGACH-TPS). This work makes three significant contributions: i) we propose a model for efficient venue discovery; ii) the supervised generative adversarial network can construct a hash function that maps multimodal data to a common Hamming space; iii) a simple transfer training strategy for the adversarial network is suggested to supervise data in different modalities, where the pairwise similarity is transferred to the fine-tuning stage of training. Evaluation on the new WikiVenue dataset confirms the superiority of the proposed method. |
10:44 | Online Social Information Propagation Analysis Based on Time-Delay Mixture Diffusion Model PRESENTER: Minhua Zhang ABSTRACT. Information propagation modeling is an important problem in social media analysis. Many existing methods assume that the information propagation process follows a uniform pattern. Different from previous studies, we analyze the factors affecting the information transmission probabilities between users in terms of users' interests and community-level structural features, and propose a time-delay mixture diffusion model to describe the dynamics of the information propagation process in online social networks (OSNs). In this work, we model information propagation as a mixture of patterns by considering both content and two levels of structural features. We conduct experiments on a dataset from Sina Weibo, one of the largest microblogging platforms in China. The experimental results show that our proposed model outperforms the baseline models in information diffusion prediction. |
11:06 | Robust and Scalable Face Retrieval Framework for Large-Scale Databases PRESENTER: Harsh Shrivastava ABSTRACT. Building upon the work of Yugo Sato et al., this paper introduces a robust and scalable face retrieval system that can retrieve the face envisioned by a user from a large-scale database. Our system is designed for the situation in which a user wishes to look for a person but does not exactly remember their face. We aim to utilise the user's visual memory in the form of the common facial attributes that they believe their envisioned image has. Furthermore, instead of information specific to the target, our face retrieval system asks the user to select several images that are similar to their impression of the target image they want to search for. On the basis of this selection, our system automatically reduces the semantic gap between the human-based and computer-based representations of the target image by optimizing a deep convolutional network with the user in the loop. This work addresses the critical challenge of image retrieval across different users' visual inputs from a database far larger than that used in the work of Yugo Sato et al. Following the previous works, we ran user studies with 10 subjects on a public large-scale database and confirmed that our framework beats the state-of-the-art results on this task and is highly effective for retrieving the envisioned image quickly and without much burden on the user. |
14:00 | Welcome to the BD:CDH 2019 |
14:15 | Keynote Address |
15:45 | Introducing the presenters & their research titles |
16:30 | Digitizing Architectural Heritage in Brunei Darussalam Towards Cultural Safeguarding, Tourism Development and Education ABSTRACT. This paper aims to understand the cultural, stylistic and historical significance of architectural heritage in Brunei Darussalam in order to ensure its safeguarding and sustainability. The paper focuses on the use of digital technologies to support the surveying and archival analysis of architectural heritage in Brunei Darussalam. Through the methods of digital humanities, the research focuses on documentation (geometric, architectural, historical) through 2D and 3D drawings, and on creating digital and interactive maps for geo-spatial, contextual, and phenomenological navigation to locate architectural heritage. This research demonstrates the potential of digital technologies for the study and safeguarding of architectural heritage, and the instrumentalisation of these data to create an interactive, open-access platform designed for education, conservation, cultural management, safeguarding awareness, social responsibility, and tourism development. |
16:48 | Classification of Cultural Heritage Sites Using Transfer Learning PRESENTER: Uday Kulkarni ABSTRACT. India is a country endowed with rich cultural heritage, especially renowned architectural sites, which exist in a variety of forms such as monuments, sculpture, manuscripts and paintings. Cultural heritage connects generations over time, and we need to preserve it. Architects, historians, travellers and others across the globe visit India, as it is known for its cultural heritage gems. They visit many historical sites, where it often becomes difficult for them to identify and obtain historical details about the monument they are interested in. Accurately predicting the correct label (heritage site) for an image allows more proficient searches through specific terms, thus helping in the study and understanding of heritage assets. Classification of image data is complex and time consuming. The present state of the art in machine learning can be harnessed to automate the classification of images. In this paper, we propose a crowdsourcing platform to collect Indian Digital Heritage (IDH) monument data, perform image classification, and support query-based retrieval of image labels. Further, we designed a transfer-learning-based image classifier, which retrains a MobileNet V2 architecture, pre-trained on the ImageNet dataset, using the IDH dataset. We demonstrate our crowdsourcing framework using a web application, which is part of the Indian Digital Heritage Space (IHDS) project funded by DST, Government of India, New Delhi. |
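The retrain-a-pretrained-backbone pattern this abstract describes can be shown in miniature: freeze the feature extractor and fit only a new classification head on the target dataset. The sketch below is a toy stand-in, not the authors' pipeline: a fixed random projection plays the role of MobileNet V2's ImageNet-learned features, and the labeled data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the pretrained backbone: a fixed (frozen) random
# projection plays the role of MobileNet V2's ImageNet-learned features.
W_frozen = rng.standard_normal((64, 128))

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen feature extractor, never updated

# Synthetic stand-in for the labeled heritage-site images (2 classes)
X = rng.standard_normal((200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Transfer learning: extract features once, then train only the new head.
feats = backbone(X)
head, *_ = np.linalg.lstsq(feats, 2.0 * y - 1.0, rcond=None)  # linear head
acc = float(((feats @ head > 0).astype(int) == y).mean())
```

The design point carries over to the real setting: because the backbone stays fixed, only the small head is trained, so a modest crowdsourced dataset like IDH can suffice.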
17:06 | Using N-Grams to Identify Edit Wars on Wikipedia ABSTRACT. This paper presents a method for identifying Wikipedia edit wars using N-gram analysis. The analysis is conducted on a corpus of past versions of Wikipedia pages concerning historical figures who are glorified and idolised by the Hindu Right. The analysis shows that Wikipedia's open structure and Article Policies enable a conversation between academic and popular histories, a feat which has been difficult in India in the past. |
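A minimal sketch of the kind of N-gram comparison such an analysis rests on: compare the word-level n-grams of two revisions of a page and measure how much content was added or removed. This churn measure is our illustration of one possible signal; the paper's actual method is not specified in the abstract.

```python
from collections import Counter

def ngrams(text, n=3):
    """Word-level n-grams of a revision's plain text, as a multiset."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def revision_churn(old, new, n=3):
    """Fraction of n-grams added or removed between two revisions.

    Repeated high churn between the same pair of states across many
    back-and-forth revisions is one plausible edit-war signal."""
    a, b = ngrams(old, n), ngrams(new, n)
    added = sum((b - a).values())      # n-grams introduced by the new revision
    removed = sum((a - b).values())    # n-grams deleted from the old revision
    total = sum(a.values()) + sum(b.values())
    return (added + removed) / total if total else 0.0
```

Identical revisions score 0.0 and entirely rewritten ones score 1.0, so a revision history oscillating near 1.0 between two alternating versions is a candidate edit war.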
17:24 | A Framework for the Conversion of Textual BigData into 2D Architectural Floor Plan PRESENTER: Chiranjoy Chattopadhyay ABSTRACT. In the recent past, owing to the boost in digitization, a huge volume of data (BigData) related to buildings and architecture has been accumulated. The sources of such data are interviews, blogs, websites, etc. These descriptions contain details about the interior of a building; for humans, it is easy to interpret them and imagine the structure and arrangement of furniture. Automatic synthesis of real-world images from text descriptions has been explored in the computer vision community. However, there has been no such attempt for document images, like floor plans. Although floor plan synthesis from sketches, as well as from data-driven models, has been proposed earlier, this is the first attempt to automatically render building floor plan images from textual descriptions. Here, the input is a natural language description of the internal structure and furniture arrangement within a house, and the output is the corresponding 2D floor plan image. We have experimented on publicly available benchmark floor plan datasets and were able to render realistic synthesized floor plan images from descriptions written in English. |
17:42 | Novel Segmentation Metric for Use in Augmented Reality Advertisement Integration PRESENTER: Julian True ABSTRACT. A major application area for Augmented Reality (AR) is advertisement. To achieve easy advertisement integration and management for stakeholders, a quantitative semantic understanding of the world that adheres to human perception is necessary. Current deep-learning-based segmentation algorithms such as Mask R-CNN provide a mask of the real world (e.g. building facades) in a greedy manner which, although quantitatively accurate, does not meet the needs of AR advertisers for appropriate ad placement, such as ad size, mask continuity, etc. In this paper, we propose three intuitive metrics for evaluating building facade segmentation specifically for advertisement integration, namely average discontinuity, normalized continuous area, and resolution ratio. Each of these metrics is inspired by the way advertisers may want to view segmentation results within an AR world editor, or the way users may want to experience AR advertisements. Experimental results on a test segmentation show the importance of such intuitive metrics: we show how an accurate placement area for a sample 2D advertisement can easily be isolated from a segmented facade using our metrics. |
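The abstract names its metrics without defining them. As one hedged reading, a "normalized continuous area" style quantity could be the largest connected region of a segmentation mask divided by the total mask area, which rewards masks a single ad can occupy. This is only our interpretation for illustration; the paper defines its metrics precisely.

```python
import numpy as np
from collections import deque

def normalized_continuous_area(mask):
    """Largest 4-connected region of a binary mask over total mask area.

    A hedged guess at a 'normalized continuous area' style metric: a value
    near 1.0 means the mask is one contiguous placement surface."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    best = 0
    for i, j in zip(*np.nonzero(mask)):
        if seen[i, j]:
            continue
        size, queue = 0, deque([(i, j)])   # breadth-first flood fill
        seen[i, j] = True
        while queue:
            r, c = queue.popleft()
            size += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not seen[rr, cc]):
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        best = max(best, size)
    total = int(mask.sum())
    return best / total if total else 0.0
```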
16:30 | Keynote: Semantic information extraction and compression in large-scale multimedia systems |
17:15 | Residual Component Estimating CNN for Image Super-Resolution PRESENTER: Xian-Hua Han ABSTRACT. With the success of convolutional neural networks (CNNs) in different computer vision applications, CNNs have been widely applied to single image super-resolution (SR). The recent research line for CNN-based image SR mainly concentrates on exploring pioneering network architectures, such as very deep CNNs, ResNet, and GANs, for enhancing the quality of the learned high-resolution (HR) image. Although recent CNN-based SR work has achieved impressive performance, unrecovered high-frequency (residual) components unavoidably remain with the current network architectures. This study explores a unified CNN architecture for learning not only the HR image but also, simultaneously, the residual components that the first network fails to recover. With an existing CNN architecture for image super-resolution, the HR image can be learned while some high-frequency content of the ground-truth image may not be perfectly recovered. To estimate this unrecovered high-frequency content, this study stacks another CNN on the output of the baseline CNN, constructing an end-to-end residual component learning framework for more accurate image SR. Experimental results on benchmark datasets validate that the proposed residual component estimating CNN outperforms the non-stacked CNN architecture and demonstrates state-of-the-art restoration quality. |
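The two-stage stacking idea in this abstract — a baseline model recovers the smooth content and a second model estimates what the first missed — can be seen with simple 1-D polynomial fits standing in for the two CNNs. This is a toy illustration of residual stacking, not the paper's architecture.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * x) + 0.3 * np.sin(10 * np.pi * x)  # low + high frequency

# Stage 1: a deliberately limited "baseline network" (cubic fit) recovers
# the smooth, low-frequency content but misses the high-frequency detail.
base = np.polynomial.Polynomial.fit(x, target, 3)(x)
residual = target - base

# Stage 2: a second model stacked on the first estimates the residual
# (a higher-degree fit stands in for the second CNN).
res_est = np.polynomial.Polynomial.fit(x, residual, 15)(x)
combined = base + res_est

err_base = float(np.mean((target - base) ** 2))
err_combined = float(np.mean((target - combined) ** 2))
```

Because the second stage is fitted to exactly what the first stage left over, the combined output can only match or reduce the baseline's error, which is the rationale for the stacked end-to-end framework.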
17:30 | Maybe Look Closer? Detecting Trolling Prone Images on Instagram PRESENTER: Hitkul Jangid ABSTRACT. The widespread availability of better network infrastructure and smartphones has made image-based social media platforms like Instagram and Flickr popular. The visual medium of communication has also led to an alarming increase in trolling incidents on social media. While it is crucial to automatically detect trolling incidents on social media, in this paper we look at the problem from the perspective of prevention rather than detection. A system that can recognize trolling-prone images can issue a warning to users before the content is posted online and prevent potential trolling incidents. We attempt to build a supervised classifier to detect trolling-prone images and discuss why conventional state-of-the-art image classification methods do not work well for this task. We also provide an extensive analysis of trolling patterns in images from Instagram, and discuss challenges and possible future paths in detail. |