Abstract
This paper investigates, across time, the relationship between the outcome of a football match (win, loss or draw) and a set of variables describing the game actions, by analyzing data from four consecutive yearly championships. The aim of the study is to discover the factors that lead to winning a match. More precisely, the goal is threefold: to select, from hundreds of covariates, those that most strongly affect the probability of winning a match; to recognize regularities across time by identifying the variables whose importance is confirmed in different analyses; and to construct a small number of composite indicators that can be interpreted as drivers of match outcome. These tasks are carried out using the Random Forest machine learning algorithm to select the most important variables, and Principal Component Analysis to summarize them into a small number of drivers. Variable selection is performed using the novel approach developed by Sandri and Zuccolotto [33–34].
Additional information
Notes on contributors
Maurizio Carpita
Maurizio Carpita is Full Professor of Statistics at the Department of Economics and Management, Scientific Director of the DMS StatLab (Data Methods and System Statistical Laboratory) at the University of Brescia, Italy, and Scientific Coordinator of the Statistical Area at EURICSE (European Research Institute of Cooperatives and Social Enterprises), Italy. His main research interests concern the collection and organization of databases; statistical methods, models, and algorithms for the measurement of perceptions; and applications of data analysis in economics and the social sciences. His studies address the measurement of subjective work quality and the socio-economic impact of cooperatives, nonprofit organizations and social enterprises.
Marco Sandri
Marco Sandri is Statistical Consultant and member of the DMS StatLab (Data Methods and Systems Statistical Laboratory) at the University of Brescia. He has authored/co-authored over 70 academic papers and articles in international journals and international conferences in the field of statistics and applications of statistics to life sciences. His main research areas are data mining, computational statistics and biostatistics.
Anna Simonetto
Anna Simonetto is Research Fellow of Statistics at the Department of Economics and Management and member of the DMS StatLab (Data Methods and System Statistical Laboratory) at the University of Brescia, Italy. Her main research interests focus on multivariate data analysis, structural equation modeling, statistics for social science data analysis and big data analysis.
Paola Zuccolotto
Paola Zuccolotto is Associate Professor of Statistics at the Department of Economics and Management and member of the Scientific Committee of the DMS StatLab (Data Methods and System Statistical Laboratory) at the University of Brescia, Italy. Her research topics cover data analysis, data mining and statistical modelling, with a specific interest in prediction, feature selection, classification, dimensionality reduction, latent variable measurement, and applications in several different contexts (marketing, finance, social sciences, sensory analysis, sport, medicine, genetics).