Spatio-Temporal Action Verification of Basketball Travelling Dribble Violation Using Mediapipe-YOLO-LSTM Framework

Wicaksono, Haryo Bimo and Wahyono, Wahyono (2023) Spatio-Temporal Action Verification of Basketball Travelling Dribble Violation Using Mediapipe-YOLO-LSTM Framework. In: 3rd International Workshop on Intelligent Systems, IWIS 2023, 9 August 2023 through 11 August 2023, Ulsan.

Abstract

Basketball is a highly dynamic game in which player movements can be fast, subtle, and fine-grained. The difference between an event occurring and not occurring can therefore be small and occasionally unnoticeable, and the sequence of frames over which a dribble is performed matters greatly in determining whether a violation takes place. A spatio-temporal action verification model is thus needed to assist referees in making decisions that are almost instantaneous in nature. The challenge is to balance speed and accuracy, since violation detection has to be fast enough for real-time use. MediaPipe Pose Estimation is chosen for its lightweight nature and its capability for human joint feature extraction, and YOLO is trained to detect the basketball and the player and to extract their coordinates, in order to pinpoint a region of interest for MediaPipe Pose Estimation when it fails to detect a player. The raw dataset consists of videos of basketball players either dribbling properly or committing a travelling violation. Long Short-Term Memory (LSTM) is used to model the temporal dynamics of the players' movements, framed as a binary classification of whether a violation occurs. Various hyperparameters of the LSTM model are tuned, such as the batch size, number of layers, number of units, optimizer, dropout rate, and loss function. A stacked LSTM architecture is found to be the most suitable model, with an accuracy of 0.8690. After various optimizations, the resulting pipeline runs at a minimum of 10 FPS, with enough headroom to theoretically achieve a higher frame rate.
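
The two data-flow steps described in the abstract, falling back to a detector-supplied region of interest when pose estimation fails and grouping per-frame features into sequences for a temporal model, can be illustrated with a minimal sketch. This is not the authors' implementation. The callables `estimate_pose` and `detect_person` are hypothetical stand-ins for the pose estimator and the trained detector, the landmark format (normalized x, y pairs relative to the region processed) is an assumption, and the window length and stride are illustrative values that the abstract does not specify.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Landmark = Tuple[float, float]      # assumed: (x, y) normalized to [0, 1]
Box = Tuple[int, int, int, int]     # assumed: pixel box (x0, y0, x1, y1)


def remap_to_frame(landmarks: Sequence[Landmark], box: Box,
                   frame_w: int, frame_h: int) -> List[Landmark]:
    """Convert landmarks normalized to a crop into full-frame normalized coordinates."""
    x0, y0, x1, y1 = box
    return [((x0 + lx * (x1 - x0)) / frame_w,
             (y0 + ly * (y1 - y0)) / frame_h)
            for lx, ly in landmarks]


def pose_with_fallback(
    frame: object,
    frame_w: int,
    frame_h: int,
    estimate_pose: Callable[[object, Optional[Box]], Optional[List[Landmark]]],
    detect_person: Callable[[object], Optional[Box]],
) -> Optional[List[Landmark]]:
    """Try pose estimation on the full frame; on failure, retry inside the person box.

    Both callables are hypothetical placeholders supplied by the caller.
    """
    landmarks = estimate_pose(frame, None)      # full-frame attempt
    if landmarks is not None:
        return landmarks
    box = detect_person(frame)                  # detector proposes a region of interest
    if box is None:
        return None
    cropped = estimate_pose(frame, box)         # second attempt, restricted to the box
    if cropped is None:
        return None
    return remap_to_frame(cropped, box, frame_w, frame_h)


def build_windows(features: Sequence[Sequence[float]],
                  window: int, stride: int = 1) -> List[List[Sequence[float]]]:
    """Slice per-frame feature vectors into fixed-length sequences for a temporal model."""
    return [list(features[i:i + window])
            for i in range(0, len(features) - window + 1, stride)]
```

Each window returned by `build_windows` would then be one input sequence to a binary classifier such as the stacked LSTM mentioned above. Clips shorter than the window length yield no sequences in this sketch, so padding or discarding them is a design choice left to the caller.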

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Basketball Violation Detection; LSTM; MediaPipe; Spatio-Temporal Action Verification
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Mathematics and Natural Sciences > Computer Science & Electronics Department
Depositing User: Ismu WIDARTO
Date Deposited: 04 Sep 2024 04:35
Last Modified: 04 Sep 2024 04:35
URI: https://ir.lib.ugm.ac.id/id/eprint/6350
