ABSTRACT
Interest in sports performance analysis is rising, and tracking data holds high potential for game analysis in team sports because of its accuracy and information content. Combined with machine learning approaches, it can provide deeper and more objective insights into the structure of performance. In soccer, the analysis of defense has been neglected in comparison to offense. Therefore, the aim of this study is to predict ball gains in defense using tracking data in order to identify tactical variables that drive defensive success. We evaluated tracking data from 153 games of the German Bundesliga season 2020/21. From these data, we derived player metrics (defensive pressure, distance to the ball, and velocity) and team metrics (inter-line distances, numerical superiority, surface area, and spread), each representing a tactical idea. We then trained supervised machine learning classifiers (logistic regression, XGBoost, and Random Forest Classifier) to predict successful (ball gain) vs. unsuccessful (no ball gain) defensive plays. The expert-reduction model (Random Forest Classifier with 16 features) showed the best, and a satisfactory, prediction performance (F1-score (test) = 0.57). By analyzing the most important input features of this model, we identified tactical principles of defensive play that appear to be related to gaining the ball: pressing the ball-leading player, creating numerical superiority in areas close to the ball (pressing short pass options), and a compact organization of the defending team. These principles give practitioners valuable insights into the tactical behavior of soccer players that may be related to the success of defensive play.
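To illustrate the kind of team metrics named above, the following minimal sketch computes two of them for a single frame of tracking data: numerical superiority near the ball and surface area. It is not the authors' implementation. The function names, the 10 m radius, and the definitions used here (defenders minus attackers within a radius of the ball; area of the convex hull of the outfield players) are assumptions made for illustration only.

```python
from math import hypot


def numerical_superiority(defenders, attackers, ball, radius=10.0):
    """Defenders minus attackers within `radius` metres of the ball.

    Assumed definition for illustration; the radius value is hypothetical.
    """
    def near(p):
        return hypot(p[0] - ball[0], p[1] - ball[1]) <= radius

    return sum(1 for p in defenders if near(p)) - sum(1 for p in attackers if near(p))


def convex_hull(points):
    """Convex hull via Andrew's monotone chain, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half because it repeats the other's first
    return lower[:-1] + upper[:-1]


def surface_area(points):
    """Area of the convex hull of the given positions (shoelace formula)."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical positions in metres (x, y) for one frame
defenders = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0)]
attackers = [(4.0, 4.0), (30.0, 30.0)]
ball = (5.0, 5.0)

superiority = numerical_superiority(defenders, attackers, ball)
area = surface_area(defenders)
```

Computed per frame and aggregated over a defensive play, values like these could serve as input features for a classifier. The choice of radius and aggregation would need to follow the definitions in the full study.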
Acknowledgements
The authors thank the German Football League (Deutsche Fußball Liga, DFL) for providing the match data used in this study.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The data used are the property of the German Football League (Deutsche Fußball Liga, DFL) and are not publicly available. The authors do not have permission to share the data publicly. This work can be reproduced using similar data from professional soccer (e.g., tracking data from other soccer leagues or data providers). To process the tracking data in this project, the floodlight package, a high-level data-driven sports analytics framework, was used (Raabe et al. 2022). This Python package can be used to process similar match data so that the code developed for this investigation can be applied to them; the code is publicly available on GitHub: https://github.com/LForcher/kit_d-fine_sports-analytics.
Open scholarship
This article has earned the Center for Open Science badge for Open Materials. The materials are openly accessible at https://github.com/LForcher/kit_d-fine_sports-analytics