Human Object Interaction Detection in Paintings using Multi-Task Learning

Abstract

Human Object Interaction (HOI) detection provides valuable insights into the meaning and interpretation of a painting, as the interactions between humans and objects reveal information about the scene, characters, and story depicted in the artwork. Automatically detecting HOIs in paintings is challenging because paintings often contain complex scenes with intricate details and wide variation in artistic style. Moreover, unlike real-world images, paintings need not obey physical rules, which further complicates detection. The proposed system addresses these difficulties with a model that captures discriminative information by extracting visual features from detected humans, objects, and the Region of Interest. The model analyzes spatial arrangements to understand the relationships and interactions between elements. It also integrates contextual knowledge and semantic relationships using a knowledge graph based on a Graph Convolutional Network to capture the underlying meaning and story depicted in the artwork. However, appearance and context alone may not be enough to infer HOIs in paintings accurately. To overcome this limitation, multi-task learning is employed by introducing four supplementary classification tasks. These tasks provide complementary information that enhances HOI detection by leveraging representations shared across tasks. This work also introduces the SemArt-HOI benchmark dataset, which augments the SemArt dataset with instance detection annotations and interaction classes. Experimental results demonstrate that the proposed model outperforms the state-of-the-art one-stage transformer-based HOI detection model by 1.19% in the single-task setting and by 1.51% in the multi-task setting. Furthermore, the system is more efficient, training four times faster and requiring fewer resources, which makes it suitable for practical, large-scale HOI detection in paintings.
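
The multi-task setup described in the abstract, one main HOI classification task plus four supplementary classification tasks, can be illustrated with a minimal sketch of a combined training objective. This is not the authors' implementation: the function names, the single-label softmax cross-entropy, and the uniform auxiliary weight `aux_weight` are all assumptions made for illustration, and the abstract does not specify how the task losses are combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(hoi_logits, hoi_target, aux_logits, aux_targets, aux_weight=0.25):
    """Main HOI loss plus a weighted sum of auxiliary classification losses.

    aux_logits / aux_targets hold one entry per supplementary task
    (four in the abstract). aux_weight is a hypothetical hyperparameter.
    """
    main = cross_entropy(hoi_logits, hoi_target)
    aux = sum(cross_entropy(l, t) for l, t in zip(aux_logits, aux_targets))
    return main + aux_weight * aux


# Toy example: 3 interaction classes and four auxiliary tasks with 2 classes each.
loss = multitask_loss(
    hoi_logits=[2.0, 0.5, -1.0],
    hoi_target=0,
    aux_logits=[[0.0, 0.0], [1.0, -1.0], [0.3, 0.1], [-0.5, 0.5]],
    aux_targets=[0, 0, 1, 1],
)
print(f"combined loss: {loss:.4f}")
```

In a setup like this, the gradients of every auxiliary term flow back into the shared feature extractor, which is how the supplementary tasks can shape the representation used by the main HOI classifier. Setting `aux_weight=0` recovers the single-task objective.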

Keywords

Human Object Interaction Detection, Computer Vision, Deep learning, Multi-Task Learning
