Abstract:
Human-Object Interaction (HOI) detection provides valuable insights into the meaning
and interpretation of a painting, as the interactions between humans and objects
reveal information about the scene, characters, and story depicted in the artwork.
Automatically detecting HOIs in paintings is challenging because paintings
often contain complex scenes with intricate details and wide variations in artistic style.
Additionally, unlike real-world photographs, paintings may depict scenes whose context
and physics do not follow physical rules, which further complicates detection.
The proposed system is designed to handle these complexities. It incorporates
a model that captures discriminative information by extracting visual features from
the detected humans, the detected objects, and the Region of Interest. The model also
analyzes the spatial arrangement of these elements to understand the relationships
and interactions between them.
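The abstract does not specify how spatial arrangements are encoded. As a rough illustration only, the sketch below assumes a common pairwise encoding for a human box and an object box: center offsets normalized by the human box size, log-scale size ratios, and their intersection-over-union. It also computes the union box of the pair, one frequently used choice of context region; whether the paper's Region of Interest is defined this way is an assumption. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def union_box(h: Box, o: Box) -> Box:
    """Smallest box enclosing both the human and the object (assumed context region)."""
    return (min(h[0], o[0]), min(h[1], o[1]), max(h[2], o[2]), max(h[3], o[3]))


def area(b: Box) -> float:
    """Box area, clamped to zero for empty or inverted boxes."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def spatial_features(h: Box, o: Box) -> List[float]:
    """Hypothetical 5-d pairwise spatial encoding of a human-object pair."""
    hw, hh = h[2] - h[0], h[3] - h[1]
    ow, oh = o[2] - o[0], o[3] - o[1]
    hcx, hcy = h[0] + hw / 2, h[1] + hh / 2
    ocx, ocy = o[0] + ow / 2, o[1] + oh / 2
    return [
        (ocx - hcx) / hw,   # horizontal offset, normalized by human width
        (ocy - hcy) / hh,   # vertical offset, normalized by human height
        math.log(ow / hw),  # relative width on a log scale
        math.log(oh / hh),  # relative height on a log scale
        iou(h, o),          # overlap between the two boxes
    ]


human = (100.0, 50.0, 200.0, 350.0)
obj = (180.0, 200.0, 260.0, 260.0)
print(union_box(human, obj))
print(spatial_features(human, obj))
```

Normalizing by the human box makes the encoding roughly invariant to image resolution, which matters for artworks digitized at very different sizes.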
Moreover, the model integrates contextual knowledge and semantic relationships
through a knowledge graph processed by a Graph Convolutional Network (GCN) to capture
the underlying meaning and story depicted in the artwork.
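The exact GCN formulation is not given in the abstract. The sketch below shows the widely used propagation rule of Kipf and Welling, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), in plain Python, assuming a small undirected knowledge graph whose nodes (here hypothetical concepts "person", "horse", "ride") carry embedding vectors. It illustrates the technique in general, not the paper's specific graph or weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, adding self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # node degrees including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical 3-node graph: person -- ride -- horse.
adj = [[0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0],
       [1.0, 1.0, 0.0]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
a_norm = normalize_adjacency(adj)
# One-hot node features and identity weights, so the output equals a_norm itself.
out = gcn_layer(a_norm, identity, identity)
print(out)
```

Each layer mixes a node's embedding with those of its neighbors, so after one step the "person" node already carries information about the "ride" concept it connects to.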
However, appearance and context alone may not be enough to infer HOIs in paintings
accurately. To overcome this, multitask learning is employed by introducing four
supplementary classification tasks. These tasks provide complementary information
that enhances HOI detection by leveraging representations shared across tasks.
This work also introduces the SemArt-HOI benchmark dataset, which augments the
SemArt dataset with instance detection annotations and interaction classes.
Experimental results demonstrate that
the proposed model outperforms the state-of-the-art one-stage transformer-based
HOI detection model by 1.19% in the single-task setting and by 1.51% in the
multi-task setting. Furthermore, the system is more efficient, training four times
faster and requiring fewer resources, which makes it suitable for practical,
large-scale HOI detection in paintings.
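The abstract does not name the four supplementary classification tasks or say how their losses are combined. A common approach, assumed here, is a weighted sum of a main HOI loss and the auxiliary losses computed from shared features. In the sketch below, HOI classification is treated as multi-label (sigmoid binary cross-entropy), each auxiliary task as single-label (softmax cross-entropy), and the task names ("aux_task_1" to "aux_task_4") and weights are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target] for one auxiliary task."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean sigmoid BCE over interaction classes (multi-label HOI assumption)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of BCE with logits: max(z, 0) - z*y + log(1 + e^{-|z|}).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def multitask_loss(hoi_logits: Sequence[float], hoi_targets: Sequence[int],
                   aux_logits: Dict[str, List[float]], aux_targets: Dict[str, int],
                   aux_weights: Dict[str, float]) -> float:
    """Main HOI loss plus a weighted sum of the auxiliary classification losses."""
    loss = binary_cross_entropy(hoi_logits, hoi_targets)
    for name, logits in aux_logits.items():
        loss += aux_weights[name] * softmax_cross_entropy(logits, aux_targets[name])
    return loss


# Hypothetical example: two interaction classes and four 4-way auxiliary tasks.
names = [f"aux_task_{i}" for i in range(1, 5)]
total = multitask_loss(
    hoi_logits=[0.0, 0.0],
    hoi_targets=[1, 0],
    aux_logits={n: [0.0, 0.0, 0.0, 0.0] for n in names},
    aux_targets={n: 0 for n in names},
    aux_weights={n: 0.5 for n in names},
)
print(total)
```

Because all five losses backpropagate into the same shared features, the auxiliary labels act as extra supervision for the HOI head; the weights control how strongly each auxiliary task influences that shared representation.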