

An interaction behavior recognition model based on an improved PoseC3D is proposed for smart-city settings, in particular urban surveillance and public safety monitoring. Such systems struggle to distinguish highly similar behavior categories and incur high computational cost. First, the lightweight high-resolution network Lite-HRNet extracts skeleton keypoints from the UT-Interaction human interaction dataset and estimates the joint coordinates of the subjects in each video, which reduces the model’s computational complexity and makes it better suited to real-time urban applications. Second, PoseC3D serves as the base model for extracting skeleton-modality features, and multiple sampling reduces the impact of keypoint noise on the model. Finally, the lightweight CBAM attention mechanism is introduced into PoseC3D; by applying both channel attention and spatial attention to the input feature layer, it improves the model’s ability to tell similar samples apart. Experiments on the UT-Interaction dataset show that the improved PoseC3D reaches a recognition accuracy of 91.6%, higher than that of the original PoseC3D, which supports the effectiveness of the improved method for smart city applications.
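To make the channel-then-spatial attention step concrete, the following is a minimal NumPy sketch of a CBAM-style block. It is an illustration under stated assumptions, not the implementation used in this work: it operates on a single 2D feature map of shape (C, H, W), whereas PoseC3D uses 3D features; the function names, the reduction ratio `r`, the 7x7 kernel size, and the random weights (which would be learned in practice) are all hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Per-channel weights of shape (C, 1, 1) for feat of shape (C, H, W)."""
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer MLP with ReLU; w1: (C//r, C), w2: (C, C//r).
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def spatial_attention(feat, kernel):
    """Per-location weights of shape (1, H, W); kernel has shape (2, k, k), k odd."""
    # Pool across channels, then stack the two maps -> (2, H, W).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    height, width = feat.shape[1:]
    out = np.empty((height, width))
    # Naive "same" convolution, written as loops for readability.
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat, kernel)


# Example with random (untrained) weights; all sizes are assumptions.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
```

Because both attention maps pass through a sigmoid, every weight lies strictly between 0 and 1, so the block rescales features without changing their shape. This lets it be placed between existing layers of a backbone without altering the surrounding architecture.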