

Despite advances in autonomous driving technology, accurately tracking pedestrians in complex environments remains challenging because of occlusion, dense crowds, and lighting variations. This study aims to design a highly adaptive pedestrian tracking system that improves the reliability of the autonomous driving perception system. This paper proposes a solution that combines YOLOv8-pose with an improved BoT-SORT algorithm: a multimodal feature fusion strategy integrates detection bounding boxes, appearance features, and 17-point pose information, and the matching cost matrix is redesigned to enhance tracking performance. Experimental results show that the proposed pose feature enhancement strategy significantly improves the system’s ability to distinguish pedestrians of similar appearance and to maintain trajectory continuity, and that it adapts well to occlusion, dense crowds, and lighting changes. The system also maintains real-time processing performance, providing reliable support for the autonomous driving perception system and demonstrating the potential of multimodal feature fusion for pedestrian tracking in complex environments.
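To make the idea of a fused matching cost concrete, the following is a minimal sketch, not the formulation used in this work. It assumes the cost between a track and a detection is a weighted sum of three terms: an IoU cost on the bounding boxes, a cosine distance on the appearance embeddings, and a box-normalized distance over the 17 keypoints. The function names, the dictionary layout, the weights (0.4, 0.3, 0.3), the confidence threshold, and the neutral fallback value are all hypothetical choices made for illustration.

```python
import math

def iou_cost(a, b):
    """Return 1 - IoU for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return 1.0 - (inter / union if union > 0 else 0.0)

def appearance_cost(u, v):
    """Cosine distance between two embeddings, clipped to [0, 1]."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dot / (nu * nv)))

def _normalize(kpts, box):
    """Express keypoints relative to the box so the cost reflects posture, not position."""
    w = max(box[2] - box[0], 1e-6)
    h = max(box[3] - box[1], 1e-6)
    return [((x - box[0]) / w, (y - box[1]) / h, c) for x, y, c in kpts]

def pose_cost(kp_a, box_a, kp_b, box_b, min_conf=0.5, neutral=0.5):
    """Mean distance over keypoints visible in both poses, clipped to [0, 1].

    min_conf and neutral are assumed values: when no keypoint is confidently
    visible in both poses (e.g. heavy occlusion), a neutral cost is returned.
    """
    na, nb = _normalize(kp_a, box_a), _normalize(kp_b, box_b)
    dists = [math.hypot(xa - xb, ya - yb)
             for (xa, ya, ca), (xb, yb, cb) in zip(na, nb)
             if ca >= min_conf and cb >= min_conf]
    if not dists:
        return neutral
    return min(1.0, sum(dists) / len(dists))

def fused_cost(track, det, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of box, appearance, and pose costs (weights are illustrative)."""
    w_box, w_app, w_pose = weights
    return (w_box * iou_cost(track["box"], det["box"])
            + w_app * appearance_cost(track["feat"], det["feat"])
            + w_pose * pose_cost(track["kpts"], track["box"],
                                 det["kpts"], det["box"]))

def cost_matrix(tracks, dets, weights=(0.4, 0.3, 0.3)):
    """Rows index tracks, columns index detections."""
    return [[fused_cost(t, d, weights) for d in dets] for t in tracks]
```

Each track and detection is assumed here to be a dictionary with a `box` tuple, a `feat` embedding list, and a `kpts` list of 17 `(x, y, confidence)` triples. The resulting matrix would then be passed to an assignment step such as the Hungarian algorithm. Normalizing keypoints by the bounding box is one possible design choice: it lets the pose term separate pedestrians with similar appearance but different posture, while leaving positional evidence to the IoU term.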