Most group activity recognition models focus mainly on spatio-temporal features of the players in sports games. They often pay too little attention to the game object, which heavily affects not only individual actions but also the group activity. We propose a new group activity recognition model for sports games that incorporates both players’ motion information and the positional information of the game object. The proposed method uses a transformer encoder for temporal feature extraction and a ‘simple’ conventional convolutional neural network to extract spatial features and fuse them with relative ball position-embedded features. Experimental results show that our model achieves results comparable to state-of-the-art methods on the Volleyball dataset while using only one transformer encoder block and the ball position.
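
To make the idea of relative ball position features more concrete, the following is a minimal sketch and not the model described above. The abstract does not specify how the relative position is computed or embedded, so everything here is an assumption for illustration: the offset is taken from each player's bounding-box center to the ball, scaled by the frame size, and "fusion" is reduced to plain concatenation. The names `relative_ball_positions` and `fuse` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def relative_ball_positions(
    player_boxes: List[Box],
    ball_xy: Tuple[float, float],
    frame_size: Tuple[float, float],
) -> List[Tuple[float, float]]:
    """Return the ball offset from each player's box center, scaled by frame size.

    Assumption: the paper's actual definition of "relative ball position"
    may differ; this is only one plausible choice.
    """
    width, height = frame_size
    ball_x, ball_y = ball_xy
    offsets = []
    for x_min, y_min, x_max, y_max in player_boxes:
        center_x = (x_min + x_max) / 2.0
        center_y = (y_min + y_max) / 2.0
        offsets.append(((ball_x - center_x) / width, (ball_y - center_y) / height))
    return offsets


def fuse(player_feature: List[float], offset: Tuple[float, float]) -> List[float]:
    """Concatenate a per-player feature vector with its ball offset.

    Assumption: stands in for the learned fusion; a real model would
    embed the offset and combine it inside the network.
    """
    return list(player_feature) + [offset[0], offset[1]]


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 100.0, 200.0), (300.0, 100.0, 500.0, 300.0)]
    offsets = relative_ball_positions(boxes, ball_xy=(200.0, 100.0), frame_size=(1000.0, 500.0))
    for feature, offset in zip([[1.0, 2.0], [3.0, 4.0]], offsets):
        print(fuse(feature, offset))
```

In this sketch a positive horizontal offset means the ball is to the right of the player, and a negative vertical offset means it is above the player in image coordinates. In a full model these per-player offsets would be computed per frame and passed, together with the spatial features, to the temporal encoder.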