Focalized Contrastive View-Invariant Learning for Self-Supervised Skeleton-Based Action Recognition

Qianhui Men, Edmond S. L. Ho, Hubert P. H. Shum and Howard Leung
Neurocomputing, 2023

Impact Factor: 6.0

Abstract

Learning view-invariant representations is key to improving the discriminative power of features for skeleton-based action recognition. Existing approaches cannot effectively remove the impact of viewpoint because their representations remain implicitly view-dependent. In this work, we propose a self-supervised framework called Focalized Contrastive View-invariant Learning (FoCoViL), which significantly suppresses view-specific information in a representation space where the viewpoints are coarsely aligned. By maximizing mutual information with an effective contrastive loss between multi-view sample pairs, FoCoViL associates actions that share common view-invariant properties and simultaneously separates dissimilar ones. We further propose an adaptive focalization method based on pairwise similarity to enhance contrastive learning and produce clearer cluster boundaries in the learned space. Unlike many existing self-supervised representation learning methods that rely heavily on supervised classifiers, FoCoViL performs well with both unsupervised and supervised classifiers, achieving superior recognition performance. Extensive experiments also show that the proposed contrastive-based focalization generates a more discriminative latent representation.
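
To make the idea of a contrastive loss over multi-view pairs with similarity-based focalization more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes an InfoNCE-style loss in which row i of `view_a` and row i of `view_b` embed the same action seen from two viewpoints, and it borrows a focal-loss-style weight `(1 - p) ** gamma` to emphasize pairs that are not yet well matched. The function name `focal_contrastive_loss` and the parameters `temperature` and `gamma` are hypothetical choices for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def focal_contrastive_loss(view_a, view_b, temperature=0.1, gamma=2.0):
    """Illustrative InfoNCE-style loss with a focal-style weight.

    view_a[i] and view_b[i] are assumed to be embeddings of the same
    action from two viewpoints (the positive pair); every other entry
    of view_b acts as a negative for view_a[i]. This is a sketch under
    those assumptions, not the loss defined in the paper.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity to every sample of the other view, scaled.
        logits = [
            sum(x * y for x, y in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-softmax for the positive pair.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        log_p = logits[i] - log_denom
        p = math.exp(log_p)
        # Focal-style weight: well-matched pairs (p near 1) contribute less.
        total += -((1.0 - p) ** gamma) * log_p
    return total / len(a)


# Toy usage: two actions, each embedded from two viewpoints.
aligned_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[1.0, 0.0], [0.0, 1.0]]
swapped_b = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = focal_contrastive_loss(aligned_a, aligned_b)
loss_swapped = focal_contrastive_loss(aligned_a, swapped_b)
```

Setting `gamma=0.0` recovers the unweighted InfoNCE-style loss in this sketch, which makes it easy to compare the effect of the weighting on a fixed set of embeddings.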

Citations

BibTeX

@article{men23focalized,
 author={Men, Qianhui and Ho, Edmond S. L. and Shum, Hubert P. H. and Leung, Howard},
 journal={Neurocomputing},
 title={Focalized Contrastive View-Invariant Learning for Self-Supervised Skeleton-Based Action Recognition},
 year={2023},
 volume={537},
 pages={198--209},
 numpages={12},
 doi={10.1016/j.neucom.2023.03.070},
 issn={0925-2312},
 publisher={Elsevier},
}

RIS

TY  - JOUR
AU  - Men, Qianhui
AU  - Ho, Edmond S. L.
AU  - Shum, Hubert P. H.
AU  - Leung, Howard
T2  - Neurocomputing
TI  - Focalized Contrastive View-Invariant Learning for Self-Supervised Skeleton-Based Action Recognition
PY  - 2023
VL  - 537
SP  - 198
EP  - 209
DO  - 10.1016/j.neucom.2023.03.070
SN  - 0925-2312
PB  - Elsevier
ER  - 

Plain Text

Qianhui Men, Edmond S. L. Ho, Hubert P. H. Shum and Howard Leung, "Focalized Contrastive View-Invariant Learning for Self-Supervised Skeleton-Based Action Recognition," Neurocomputing, vol. 537, pp. 198-209, Elsevier, 2023.

Similar Research

Zhengzhi Lu, He Wang, Ziyi Chang, Guoan Yang and Hubert P. H. Shum, "Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient", Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Meng Li, Howard Leung and Hubert P. H. Shum, "Human Action Recognition via Skeletal and Depth Based Feature Fusion", Proceedings of the 2016 ACM International Conference on Motion in Games (MIG), 2016
Jingtian Zhang, Hubert P. H. Shum, Jungong Han and Ling Shao, "Action Recognition from Arbitrary Views Using Transferable Dictionary Learning", IEEE Transactions on Image Processing (TIP), 2018
Qianhui Men, Howard Leung, Edmond S. L. Ho and Hubert P. H. Shum, "A Two-Stream Recurrent Network for Skeleton-Based Human Interaction Recognition", Proceedings of the 2020 International Conference on Pattern Recognition (ICPR), 2020
Jingtian Zhang, Lining Zhang, Hubert P. H. Shum and Ling Shao, "Arbitrary View Action Recognition via Transfer Dictionary Learning on Synthetic Training Data", Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016
Ying Huang, Hubert P. H. Shum, Edmond S. L. Ho and Nauman Aslam, "High-Speed Multi-Person Pose Estimation with Deep Feature Transfer", Computer Vision and Image Understanding (CVIU), 2020

Last updated on 14 April 2024