Geometric Features Enhanced Human-Object Interaction Detection

Manli Zhu, Edmond S. L. Ho, Shuang Chen, Longzhi Yang and Hubert P. H. Shum
IEEE Transactions on Instrumentation and Measurement (TIM), 2024

Impact Factor: 5.6; Top 25% Journal in Engineering, Electrical & Electronic

Abstract

Cameras are essential vision instruments for capturing images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection owing to their advanced network architectures and promising results. However, most of them follow the one-stage design of the vanilla Transformer, leaving rich geometric priors under-exploited and compromising performance, especially when occlusion occurs. Given that geometric features tend to outperform visual ones in occluded scenarios and offer information that complements visual cues, we propose a novel end-to-end Transformer-style HOI detection model, the geometric features enhanced HOI detector (GeoHOI). One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet, which bridges the gap in consistent keypoint representation across diverse object categories, including humans. GeoHOI upgrades a Transformer-based HOI detector in two ways: keypoint similarities measure the likelihood of human-object interactions, and local keypoint patches enhance the interaction query representation, which together boost HOI predictions. Extensive experiments show that the proposed method outperforms the state-of-the-art models on V-COCO and achieves competitive performance on HICO-DET. Case study results on post-disaster rescue with vision-based instruments showcase the applicability of GeoHOI in real-world applications.
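
As a rough illustration of the idea that keypoint similarities can indicate how likely a human and an object are to interact, the sketch below scores a human-object pair from per-keypoint descriptor vectors. It is not the GeoHOI or UniPointNet implementation: the function names, the use of cosine similarity, the best-match averaging, and the rescaling to [0, 1] are all assumptions made here for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def interaction_score(human_kpts, object_kpts):
    """Hypothetical pair score in [0, 1] (an assumption, not the paper's formula).

    Each argument is a list of keypoint descriptor vectors. For every human
    keypoint we take its best cosine match among the object keypoints,
    average those best matches, and rescale from [-1, 1] to [0, 1].
    """
    if not human_kpts or not object_kpts:
        return 0.0
    best = [max(cosine(h, o) for o in object_kpts) for h in human_kpts]
    mean_best = sum(best) / len(best)
    return 0.5 * (mean_best + 1.0)


if __name__ == "__main__":
    # Toy descriptors, invented for this example.
    human = [[1.0, 0.0], [0.0, 1.0]]
    cup = [[1.0, 0.0], [0.0, 1.0]]
    print(interaction_score(human, cup))
```

In a detector, a score of this kind would presumably be one cue among several and would be combined with visual features rather than used alone.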

Citations

BibTeX

@article{zhu24geometric,
 author={Zhu, Manli and Ho, Edmond S. L. and Chen, Shuang and Yang, Longzhi and Shum, Hubert P. H.},
 journal={IEEE Transactions on Instrumentation and Measurement},
 title={Geometric Features Enhanced Human-Object Interaction Detection},
 year={2024},
 publisher={IEEE},
}

RIS

TY  - JOUR
AU  - Zhu, Manli
AU  - Ho, Edmond S. L.
AU  - Chen, Shuang
AU  - Yang, Longzhi
AU  - Shum, Hubert P. H.
T2  - IEEE Transactions on Instrumentation and Measurement
TI  - Geometric Features Enhanced Human-Object Interaction Detection
PY  - 2024
PB  - IEEE
ER  - 

Plain Text

Manli Zhu, Edmond S. L. Ho, Shuang Chen, Longzhi Yang and Hubert P. H. Shum, "Geometric Features Enhanced Human-Object Interaction Detection," IEEE Transactions on Instrumentation and Measurement, IEEE, 2024.

Similar Research

Manli Zhu, Edmond S. L. Ho and Hubert P. H. Shum, "A Skeleton-Aware Graph Convolutional Network for Human-Object Interaction Detection", Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2022
Tanqiu Qiao, Qianhui Men, Frederick W. B. Li, Yoshiki Kubotani, Shigeo Morishima and Hubert P. H. Shum, "Geometric Features Informed Multi-Person Human-Object Interaction Recognition in Videos", Proceedings of the 2022 European Conference on Computer Vision (ECCV), 2022
Tanqiu Qiao, Ruochen Li, Frederick W. B. Li and Hubert P. H. Shum, "From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos", Proceedings of the 2024 International Conference on Pattern Recognition (ICPR), 2024
Qianhui Men, Howard Leung, Edmond S. L. Ho and Hubert P. H. Shum, "A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition", Proceedings of the 2020 International Conference on Pattern Recognition (ICPR), 2020

Last updated on 17 July 2024