Abstract: To address the challenges of dynamic-gesture human-computer interaction in the operating-room environment, a deep-learning-based gesture recognition algorithm named YOLO-RexNet was proposed. The dynamic gesture detection component built on YOLOv8n: the DualConv convolution module was incorporated into the network, the C2f module was reconstructed with GhostNetV2, and model pruning was applied to reduce parameters and computation while maintaining high detection accuracy. Experimental results demonstrate a 70.9% reduction in parameters, a 69.7% reduction in computation, a 66.2% reduction in model size, and a 41.3% decrease in inference latency, with dynamic gesture-recognition accuracy reaching 99%. In the hand keypoint detection component, the Huber loss function was employed to improve the precision of keypoint predictions. The proposed algorithm achieves a lightweight design while preserving the accuracy of dynamic gesture recognition, facilitating gesture-controlled medical operations. When deployed on a Jetson Orin Nano embedded edge device, the system reaches 65 frames per second, demonstrating practical potential.
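As context for the keypoint regression objective mentioned above, the following is a minimal NumPy sketch of the standard Huber loss (quadratic for small errors, linear for large ones). The threshold `delta=1.0` is an assumed default, not a value taken from the paper:

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    # delta is an assumed hyperparameter, not from the paper.
    err = pred - target
    abs_err = np.abs(err)
    quadratic = 0.5 * err**2                     # used when |err| <= delta
    linear = delta * (abs_err - 0.5 * delta)     # used when |err| > delta
    return np.where(abs_err <= delta, quadratic, linear).mean()
```

Compared with a pure L2 loss, the linear tail makes the objective less sensitive to occasional large keypoint errors (e.g. occluded fingers), which is the usual motivation for choosing Huber loss in keypoint regression.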