Abstract: To address the limitations of gait recognition models in feature representation granularity and spatio-temporal dependency modeling, a novel model fusing multi-scale feature representation and attention mechanisms is proposed. The model consists of two key modules: a multi-scale feature fusion network (MFFN) and a gait attention fusion module (GAFM). MFFN enriches feature representation and strengthens its discriminative power through multi-scale, multi-granularity feature fusion. GAFM models long-term spatio-temporal dependencies by adaptively focusing on key frames and salient regions in gait sequences. Experimental results on the CASIA-B, CASIA-B*, and OUMVLP datasets show that the model outperforms existing models under various complex conditions, improving the average recognition rate over the baseline by 0.9%, 0.3%, and 0.6%, respectively.
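The two ideas named in the abstract can be illustrated with a minimal NumPy sketch: horizontal multi-scale strip pooling for multi-granularity feature fusion, and softmax attention pooling over frames. The strip scales, the mean+max pooling choice, and the per-frame saliency score below are illustrative assumptions for exposition, not the paper's actual MFFN or GAFM design.

```python
import numpy as np

def multiscale_part_features(fmap, scales=(1, 2, 4)):
    """Multi-scale, multi-granularity pooling (illustrative stand-in for MFFN).

    fmap: (C, H, W) feature map. For each scale s, split the height into
    s horizontal strips, pool each strip (mean + max), and stack all
    strip descriptors -> (sum(scales), C).
    """
    C, H, W = fmap.shape
    parts = []
    for s in scales:
        step = H // s
        for i in range(s):
            strip = fmap[:, i * step:(i + 1) * step, :]
            parts.append(strip.mean(axis=(1, 2)) + strip.max(axis=(1, 2)))
    return np.stack(parts)

def temporal_attention(seq_feats):
    """Softmax attention over frames (illustrative stand-in for GAFM).

    seq_feats: (T, D) per-frame descriptors. A toy saliency score (the
    frame mean) is turned into softmax weights, and frames are combined
    into one attention-pooled descriptor of shape (D,).
    """
    scores = seq_feats.mean(axis=1)          # assumed per-frame saliency
    w = np.exp(scores - scores.max())        # numerically stable softmax
    w /= w.sum()
    return (w[:, None] * seq_feats).sum(axis=0)
```

For a (C=8, H=16, W=11) map with scales (1, 2, 4), `multiscale_part_features` yields a (7, 8) part-feature matrix; `temporal_attention` on a (T, D) sequence returns a (D,) vector whose frame weights sum to 1.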