MKEAH: Multimodal Knowledge Extraction and Accumulation Based on Hyperplane Embedding for Knowledge-Based Visual Question Answering

EasyChair Preprint 10761, 9 pages. Date: August 22, 2023

Abstract

External knowledge representations play an essential role in knowledge-based visual question answering, helping models better understand complex scenarios in the open world. Recent entity-relation embedding approaches fall short in representing some complex relations, resulting in a lack of topic-related knowledge and redundant topic-irrelevant information. To this end, we propose MKEAH, which performs Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of feature vectors projected onto the hyperplane are comparable, and to filter out sufficient topic-irrelevant information, we propose two losses that learn triplet representations from complementary views: a range loss and an orthogonal loss. To assess the model's capability of extracting topic-related knowledge, we present Topic Similarity (TS), which measures the relevance between a topic and an entity-relation pair. Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperforms state-of-the-art methods by 2.12% and 3.24%, respectively, on two challenging knowledge-required datasets: OK-VQA and KRVQA. Our model's clear advantage on TS shows that using hyperplane embeddings to represent multimodal knowledge improves the model's ability to extract topic-related knowledge.

Keyphrases: Hyperplane, Knowledge-based Visual Question Answering, topic-related
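The abstract does not spell out the loss formulations, so the following is only a minimal PyTorch sketch of the kind of hyperplane (TransH-style) triplet embedding it describes. The function names and the two loss definitions (`range_loss`, `orthogonal_loss`) are illustrative assumptions, not the authors' actual method.

```python
import torch
import torch.nn.functional as F

def project_to_hyperplane(v, w):
    """Project embedding v onto the hyperplane with normal w
    (TransH-style: v_perp = v - (w . v) w, with w normalized)."""
    w = F.normalize(w, dim=-1)                      # keep the normal a unit vector
    return v - (v * w).sum(-1, keepdim=True) * w

def triplet_score(h, t, w_r, d_r):
    """Translation-based plausibility of (h, r, t): distance between the
    projected head shifted by the relation translation and the projected tail."""
    h_p = project_to_hyperplane(h, w_r)
    t_p = project_to_hyperplane(t, w_r)
    return (h_p + d_r - t_p).norm(dim=-1)

def range_loss(h, t, w_r):
    """Hypothetical range loss: encourage projected head/tail vectors to have
    comparable lengths so their distances on the hyperplane compare equally."""
    h_p = project_to_hyperplane(h, w_r)
    t_p = project_to_hyperplane(t, w_r)
    return (h_p.norm(dim=-1) - t_p.norm(dim=-1)).abs().mean()

def orthogonal_loss(w_r, d_r):
    """Hypothetical orthogonal loss: keep the relation translation d_r inside
    the hyperplane (orthogonal to the normal), suppressing the off-plane,
    topic-irrelevant component."""
    w = F.normalize(w_r, dim=-1)
    d_norm_sq = d_r.norm(dim=-1).clamp(min=1e-9) ** 2
    return ((w * d_r).sum(-1) ** 2 / d_norm_sq).mean()
```

Under these assumptions, `range_loss` equalizes the lengths of the projected head and tail vectors so their translation distances are directly comparable, while `orthogonal_loss` constrains the relation translation to lie within the hyperplane, which is one plausible way to filter out the off-plane, topic-irrelevant information the abstract mentions.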