Research Article

Prediction of defensive success in elite soccer using machine learning - Tactical analysis of defensive play using tracking data and explainable AI

Accepted 18 Jul 2023, Published online: 04 Aug 2023
 

ABSTRACT

Interest in sports performance analysis is rising, and tracking data holds high potential for game analysis in team sports because of its accuracy and informative content. Combined with machine learning approaches, it can provide deeper and more objective insights into the structure of performance. In soccer, the analysis of defense has been neglected in comparison to offense. Therefore, the aim of this study is to predict ball gains in defense using tracking data in order to identify tactical variables that drive defensive success. We evaluated tracking data from 153 games of the German Bundesliga season 2020/21. From these data, we derived player metrics (defensive pressure, distance to the ball, and velocity) and team metrics (inter-line distances, numerical superiority, surface area, and spread), each capturing a tactical idea. We then trained supervised machine learning classifiers (logistic regression, XGBoost, and Random Forest Classifier) to predict successful (ball gain) vs. unsuccessful (no ball gain) defensive plays. The expert-reduction model (a Random Forest Classifier with 16 features) showed the best prediction performance, which we consider satisfactory (F1-score on the test set = 0.57). By analyzing the most important input features of this model, we identified tactical principles of defensive play that appear to be related to gaining the ball: pressing the ball-leading player, creating numerical superiority in areas close to the ball (pressing short pass options), and compact organization of the defending team. These principles offer practitioners valuable insights into the tactical behavior of soccer players that may be related to the success of defensive play.
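To make the team metrics and the reported score more concrete, the following is a minimal Python sketch of how three of the metrics named above (surface area, spread, numerical superiority) and the F1-score might be computed from player coordinates. The definitions are assumptions for illustration only: surface area is taken as the convex hull area of the players, spread as the mean distance to the team centroid, and numerical superiority as the difference in player counts within a hypothetical 10 m radius around the ball. The function names and the radius are not taken from the study, whose exact definitions are given in the full article and the linked repository.

```python
from math import hypot

Point = tuple[float, float]  # (x, y) pitch coordinates in metres (assumed unit)


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def surface_area(points: list[Point]) -> float:
    """Area of the convex hull spanned by the players (shoelace formula)."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def spread(points: list[Point]) -> float:
    """Mean distance of the players to their centroid (assumed definition)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(hypot(x - cx, y - cy) for x, y in points) / len(points)


def numerical_superiority(defenders: list[Point], attackers: list[Point],
                          ball: Point, radius: float = 10.0) -> int:
    """Defenders minus attackers within `radius` of the ball (radius is hypothetical)."""
    def near(players: list[Point]) -> int:
        return sum(1 for x, y in players if hypot(x - ball[0], y - ball[1]) <= radius)
    return near(defenders) - near(attackers)


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (1 = ball gain): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: five defenders on a 10 m x 10 m square with one player in the centre.
defenders = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0)]
print(surface_area(defenders))  # 100.0
```

In a pipeline such as the one described, values like these would be computed per defensive play and passed as input features to the classifiers; the F1-score is then evaluated on held-out plays.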

Acknowledgements

The authors thank the German Football League (Deutsche Fußball Liga, DFL) for providing the match data used in this study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data used are the property of the German Football League (Deutsche Fußball Liga, DFL) and are not publicly available. The authors do not have permission to share the data publicly. This work can be reproduced using similar data from professional soccer (e.g., tracking data from other soccer leagues or data providers). The tracking data in this project were processed with the floodlight package, a high-level, data-driven sports analytics framework (Raabe et al. 2022). This Python package can be used to process similar match data for use with the code developed in this investigation, which is publicly available on GitHub: https://github.com/LForcher/kit_d-fine_sports-analytics.

Open scholarship

This article has earned the Center for Open Science badge for Open Materials. The materials are openly accessible at https://github.com/LForcher/kit_d-fine_sports-analytics

Additional information

Funding

The author(s) reported that there is no funding associated with the work featured in this article.
