Hybrid Relation Guided Set Matching for Few-shot Action Recognition

Xiang Wang1,   Shiwei Zhang2*,   Zhiwu Qing1,   Mingqian Tang2,   Zhengrong Zuo1
Changxin Gao1,   Rong Jin2,   Nong Sang1*

1 Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
2 DAMO Academy, Alibaba Group, China
Accepted to CVPR 2022   Paper link
A PyTorch implementation is released as HyRSM under the EssentialMC2 project from DAMO Academy.

Schematic illustration of the proposed Hybrid Relation guided Set Matching (HyRSM) approach on a 3-way 1-shot problem. Given an episode of video data, a feature embedding network is first employed to extract feature vectors. A hybrid relation module then integrates rich information within each video and across videos via intra-relation and inter-relation functions. Finally, the task-specific features are fed into a set matching metric for matching score prediction.
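The pipeline in the figure can be summarized with a short PyTorch-style sketch. This is only illustrative: the class name HyRSMSketch, the backbone argument, and the use of multi-head attention for the intra- and inter-relation functions are assumptions for exposition, not identifiers from the released EssentialMC2 code.

import torch
import torch.nn as nn

class HyRSMSketch(nn.Module):
    def __init__(self, backbone, dim=2048, heads=8):
        super().__init__()
        # backbone is assumed to map a batch of frames [V*T, C, H, W] to features [V*T, D]
        self.backbone = backbone
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)  # intra-relation (within a video)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)  # inter-relation (across videos)

    def forward(self, support, query):
        # support: [N*K, T, C, H, W] videos, query: [Q, T, C, H, W] videos
        videos = torch.cat([support, query], dim=0)                 # all videos in the episode
        v, t = videos.shape[:2]
        feats = self.backbone(videos.flatten(0, 1)).view(v, t, -1)  # [V, T, D] frame features
        feats, _ = self.intra(feats, feats, feats)                  # temporal context within each video
        clip = feats.mean(dim=1).unsqueeze(0)                       # [1, V, D] one token per video
        ctx, _ = self.inter(clip, clip, clip)                       # exchange information across the episode
        feats = feats + ctx.squeeze(0).unsqueeze(1)                 # task-specific frame features
        return feats[: support.shape[0]], feats[support.shape[0]:]  # adapted support / query features

In a 3-way 1-shot episode, support holds three videos (one per class), and each adapted query is compared against each adapted support video by the set matching metric.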
Abstract
Current few-shot action recognition methods reach impressive performance by learning discriminative features for each video via episodic training and designing various temporal alignment strategies. Nevertheless, they are limited in that (a) learning individual features without considering the entire task may lose the most relevant information in the current episode, and (b) these alignment strategies may fail on misaligned instances. To overcome these two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: a hybrid relation module and a set matching metric. The purpose of the hybrid relation module is to learn task-specific embeddings by fully exploiting associated relations within and across videos in an episode. Built upon the task-specific features, we reformulate the distance measure between query and support videos as a set matching problem and further design a bidirectional Mean Hausdorff Metric to improve resilience to misaligned instances. In this way, the proposed HyRSM is highly informative and flexible for predicting query categories under few-shot settings. We evaluate HyRSM on six challenging benchmarks, and the experimental results show its superiority over state-of-the-art methods by a convincing margin.
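As a rough illustration of the set matching step, a bidirectional Mean Hausdorff Metric between the frame feature sets of a query video and a support video can be sketched as below. This is a hedged approximation based on the description above; the exact formulation in the paper and released code may differ.

import torch

def bidirectional_mean_hausdorff(query_feats, support_feats):
    # query_feats: [Tq, D] frame features of a query video
    # support_feats: [Ts, D] frame features of a support video
    dist = torch.cdist(query_feats, support_feats)  # [Tq, Ts] pairwise L2 distances
    q_to_s = dist.min(dim=1).values.mean()          # each query frame -> closest support frame
    s_to_q = dist.min(dim=0).values.mean()          # each support frame -> closest query frame
    return q_to_s + s_to_q                          # lower value = better set match

At inference, a query is then assigned the class of the support video (or class prototype) that yields the smallest metric value.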

Experiments
Comparison with recent few-shot action recognition methods on the meta-testing sets of SSv2-Full, Kinetics, Epic-kitchens, and HMDB51. The experiments are conducted under the 5-way setting, and results are reported as the number of shots increases from 1 to 5. "-" means the result is not available in published works, and underlining indicates the second-best result.
Results for 1-shot, 3-shot, and 5-shot few-shot classification on the UCF101 and SSv2-Small datasets. "-" means the result is not available in published works, and underlining indicates the second-best result.
Last updated: March 2022

Citation

@inproceedings{wang2022hyrsm,
  title={Hybrid Relation Guided Set Matching for Few-shot Action Recognition},
  author={Wang, Xiang and Zhang, Shiwei and Qing, Zhiwu and Tang, Mingqian and Zuo, Zhengrong and Gao, Changxin and Jin, Rong and Sang, Nong},
  booktitle={{CVPR}},
  year={2022}
}

Contact

If you have any questions, please contact Xiang Wang at wxiang@hust.edu.cn.