
A knowledge discovery framework for the assessment of tactical behaviour in soccer based on spatiotemporal data

Pages 384-398 | Received 25 Feb 2016, Accepted 27 May 2017, Published online: 07 Jun 2017

ABSTRACT

This paper addresses the problem of designing an explanatory computational model for the assessment of individual tactical skills in team sports. The modelling approach tackles the complexity and difficulty of this problem by fusing fuzzy human-like knowledge related to tactical behaviour with time-continuous position data from a tracking system. For this purpose, a hierarchical architecture is proposed. The bottom layer is represented by physically meaningful variables derived from time-continuous position data at specific time instances. Based thereupon, we introduce a temporal segmentation layer that relates the physical variables to game-situation-specific temporal phases. We show how the vague and imprecisely defined linguistic description of the task at hand can be transferred to fuzzy rules in order to get a meaningful temporal segmentation of the time-continuous position data. Finally, the resulting clusters are interpreted in terms of performance indicators in the top layer in order to provide a meaningful explanatory model for the assessment. We show the usefulness of our approach for the task of player evaluation. We not only provide the coach with a single number to describe the players’ performance but also relate this number to the measurement variables, presenting a more holistic and sophisticated view of the players’ performance.

1. Introduction and problem statement

Mackenzie and Cushion [Citation1] conclude in their critical review about current performance analysis research in sports that variables are often investigated as a result of availability rather than developing a deeper understanding for performance. Therefore, performance analysis studies often fail to provide useful information for the coaching practice due to the multifaceted, complex and largely unpredictable behaviour in team sports like soccer. In the authors’ opinion, holistic approaches that include more naturalistic and qualitative methods, such as case studies, interviews and mixed methods, are needed for this kind of analysis [Citation2]. Additionally, the analysis results have to be translated into terms of the sports’ practice in order that coaches can apply the investigations’ conclusions.

Due to the increased availability of position data in sports in recent years, motion analysis [Citation3] has become an important technique for analysing competitions, in particular in team sports [Citation4]. Most research in this field focuses on time–motion analysis, whereas investigations analysing individual or collective actions from a tactical point of view are rather scarce [Citation5]. This may be due to the high complexity of player behaviour in team sports [Citation6], which does not allow straightforward conclusions to be drawn from kinematic data about tactical behaviour. Nevertheless, our working assumption is that time-continuous position data of the players contain most of the crucial information characterizing tactical behaviour in team sports. This assumption is supported by the fact that tactical assessments by experts are most often based on game/training observations, which in turn consider mainly the spatio-temporal appearances and changes over time of soccer players on the field.

We therefore pursue a modelling approach, which (a) is capable of extracting the essential information to perform expert-oriented assessments of tactical behaviour in soccer (meaning that actions are rated by the standard of expert knowledge), (b) adequately handles the high complexity of player behaviour in team sports and its corresponding data, (c) allows for a linguistic interpretation of the analysis results in terms of the sport expert’s terminology and (d) provides an assessment of the individual player performance based on the discovered hidden tactical patterns. This data-based approach uses position and tracking data provided by a local positioning measurement system [Citation7] and allows the qualification of assessments based on measurements.

The paper is organized as follows: Section 2 gives an overview about different directions of research dealing with the problems stated earlier. In Section 3, we introduce our hierarchical modelling approach linking time-continuous position variables with performance indicators in team sports. We describe the fuzzy variables and explain how behavioural patterns are identified. In Section 4.1, we introduce the one-versus-one (1 vs. 1) test case in soccer (situations, where one defender competes against one attacker) to illustrate the application of our framework on real world data. Finally, we discuss our approach and the results of the test case and give an outlook on ongoing work (Section 5).

2. Related work

The increasing availability of spatio-temporal data nowadays allows for an in-depth analysis of tactical behaviours in team sports. Two main directions of research recently emerged. Predictive approaches such as the works by Goldsberry, Yue et al. and Cervone et al. [Citation8–Citation10] use the raw spatio-temporal data to simulate player behaviour on a fine-grained level. In the work by Cervone et al. [Citation10], the authors use a two-layered stochastic process model to take full advantage of the high resolution position data during a basketball possession in order to reveal decision-making tendencies of players based on their spatial strategy. Such fine-grained modelling of behaviours in team sports needs a vast amount of tracking data for model calibration, which is available only for very few team sports right now.

The other direction is to use the spatio-temporal data to constrain the statistical analysis of behavioural patterns to the context in which they were performed. For instance, in the work by Vilar [Citation11], the phase relations between players during dyadic situations (situations with two main components; i.e. defender vs. attacker) have been explored as a function of the location of the ball and the goal. Because only coarse aggregate statistics are used to describe the context, such models can also be applied to relatively small data sets. However, results constrained by a rich context are often difficult to interpret if no qualitative description of the results is available. This line of research often lacks a systematic, meaningful description and thus is not easily accessible to the coach.

Another way of modelling a complex dynamic scenario is to imitate qualitative human linguistic descriptions by means of linguistic fuzzy variables and to connect them by means of fuzzy logical concepts in terms of IF-THEN rules and fuzzy relations [Citation12–Citation14]. This approach was motivated by the observation that human operators can handle processes which are difficult to handle robustly with conventional control methods. In particular, fuzzy logic has been widely applied in managing non-linear, multivariable control problems in the process industry [Citation15]. It is therefore natural to use fuzzy IF-THEN rules for characterizing events and, in particular, the beginning and end of temporal phases of events. However, such a system of rule bases is likely to become unmanageably complex as the number of IF-THEN rules for various events increases. A way to face this complexity is to structure the set of events by a Petri Net, which represents a dynamic process by means of a directed graph of less complex state-transition process units [Citation16,Citation17]. While classical Petri Nets rely on crisp and well-decidable states, in the context of team sport scenarios we need to take up fuzzy extensions of Petri Nets. The extension of Petri Nets with fuzzy reasoning has a long tradition [Citation18–Citation22]. Ribaric et al. [Citation22] propose a high-level Petri Net that uses fuzzy spatio-temporal relations for modelling and controlling purposes in the context of robot soccer.

In this paper, we take up this idea of a high-level Petri Net approach as an inspiration to introduce a temporal segmentation into phases and transitions between them. In order to get a temporal segmentation that is meaningful from the point of view of the coach, we employ fuzzy IF-THEN rules for the characterization and temporal localization of the transition states. This temporal segmentation is the layer preceding the unsupervised clustering, in which the spatial relations between the players are extracted in order to discover hidden tactical patterns and to interpret them by fuzzy linguistic expressions.

3. Hierarchical modelling approach linking time-continuous position variables with performance indicators

The overall goal is to develop a data-based approach, based on position and tracking data provided by local positioning measurement systems (see e.g. [Citation1,Citation5,Citation7]), that allows assessments to be qualified on the basis of measurements. Tactical assessments are typically semi-structured and, above all, expressed in qualitative, vague and imprecisely defined linguistic concepts. Moreover, such assessment statements often rely on implicit common sense and intuitive reasoning assumptions. Therefore, we postulate that the knowledge representation has to satisfy the following properties:

  • the vagueness and imprecision related to temporal, spatial and spatio-temporal relationships have to be expressed in a form that is appropriate for human experts and users;

  • the knowledge base design should allow a hierarchical representation of the scenarios at different abstraction levels;

  • the design has to be based on a well-defined formalism and semantics that allow a formal analysis of different spatial, temporal and spatio-temporal relationships among the objects and conclusions based thereupon.

Our approach relies on including the knowledge of a sport expert into the design process of the computational analysis problem. Our goal is to define this analysis model in a way that it can provide a linguistic interpretation of the analysis results in terms of the sport expert’s terminology.

Figure 1 shows a block diagram of our conceptional model. The bottom layer of our model corresponds to the physical measurements. To enrich the expressiveness of the available measurement variables, physically meaningful variables like velocity and acceleration of the attacker (defender, resp.) are derived from the position data. Based thereupon, rules to define the transition moments (time instances, where a game switches from one tactical phase to the next; see, for an example, Section 4.1) in the game are extracted from the sport expert’s view in order to decompose the temporal domain into meaningful phases (second layer). This results in a directed graph of temporal phases (states) and their transition moments. The second layer can be considered as an interface between these physical variables, on the one hand, and game-situation-specific higher-level concepts on the other. With unsupervised clustering techniques in the third layer, different behavioural patterns are extracted and related to performance indicators. Finally, in the last layer the clusters are interpreted in terms of meaningful tactical patterns as conditionals of an explanatory model for the assessment.

Figure 1. Sketch of the conceptional model. Based on the coach’s view on the exercise, we structure the temporal domain into meaningful phases. Phase transitions are the crucial moments of the exercise, and are therefore described by measurements based on the position data. Using unsupervised clustering techniques, different behavioural patterns are extracted and related to performance indicators.


3.1. First layer: feature construction

The first crucial step in our modelling approach is the design of the input features for the next layers. In order to get a meaningful clustering result later on, the input features have to be constructed in such a way that the crucial differences between the player behaviours are captured to a high extent. Based on the experts’ view on the exercise, in a first step, basic measurement variables like acceleration, speed or distances between players are selected. In a second step, the measurement window is defined in order to reduce the noise in the position data and to take care of the temporal characteristic of the signal. For very complex game situations, the number of constructed features can grow quickly due to player permutations (for example, the pairwise distances between players). Therefore, it is important to include the experts’ view on the exercise to remove less promising features from the beginning.
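The two construction steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the centred moving average used as measurement window, the window length and the finite-difference scheme are all assumptions chosen for illustration.

```python
import math

def moving_average(values, window):
    """Centred moving average; the window shrinks at the borders,
    so the first and last few samples are less reliable."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def derive_features(xs, ys, dt, window=5):
    """Speed and acceleration from sampled positions.
    xs, ys: position samples in metres; dt: sampling interval in seconds.
    The window length is a hypothetical choice, not taken from the paper."""
    xs, ys = moving_average(xs, window), moving_average(ys, window)
    # Forward differences of the smoothed positions give the speed.
    speed = [math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / dt
             for i in range(len(xs) - 1)]
    # Forward differences of the speed give the (tangential) acceleration.
    accel = [(speed[i + 1] - speed[i]) / dt for i in range(len(speed) - 1)]
    return speed, accel

def distance(p, q):
    """Euclidean distance between two players given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])
```

Relative features such as the attacker-defender distance can then be computed per sample with `distance`, and averaged over whichever interval the expert considers relevant.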

3.2. Second layer: temporal segmentation

The key problem when working with spatio-temporal data is how to reduce the high complexity represented in the data to a manageable level. In our modelling approach, we follow the way of thinking as proposed by Carl Adam Petri [Citation16] in order to decompose complex processes into a scheme of less complex state-transition process units. However, such a discrete event-dynamic system formalism can only be applied in a reasonable way when the behaviour of the system can be decomposed into separate states with well-defined conditions for state transitions. This means that all states, transitions and conditions need to be known in advance and be well defined. This brings us to the central question: What are adequate states and transitions in the context of team sports?

The answer to this question depends on the abstraction level to be chosen. In soccer, for instance, on a high level, the game can be segmented into the phases of possession play, counter attack, free ball or ball out of play. On an intermediate level, a pass to a teammate usually leads to an immediate reaction (e.g. a shift of the position) of all other players and thus can be seen as a transition moment. At the level of ball possession, the decomposition of the scenario depends on the task at hand. For instance, in a situation like a duel between two players, the process of possessing, losing or winning the ball can be described by three states, e.g. the attacker is approaching the defender, the confrontation phase between the two players, and the final duel between them.

Obviously, such evaluations become more informative when circumstances and spatio-temporal characteristics of the game scenario are taken into account, e.g. whether there is much pressure on the player or not; duration of the duel; speed of the players; location on the field, etc. Pressure is an example of an attribute whose occurrence is not crisply defined but fuzzy [Citation12,Citation13]. Moreover, although spatio-temporal variables like location, duration, speed, etc. are crisp in nature, for statistical evaluation and interpretation purposes, clusters like short versus long duel are more informative because they provide discretely distinguishable descriptions of situations. This motivates us to introduce fuzzy spatio-temporal relations.

Current tracking systems provide position data with a high temporal resolution. In order to reduce the overwhelming amount of such tracking data, we look for a coarse-to-fine approximation of the measured variables $v_i$ as functions of time. As a result of this approximation step, we obtain fuzzy variables as a basis for fuzzy IF-THEN rules for characterizing game-specific events. For this, we outline an approach which was recently examined in the context of event-based signal processing [Citation23] and is based on the so-called discrepancy measure.

This approach allows us to approximate a time-dependent variable $v$ by fuzzy rules in the following way:

  • Let $P_1,\dots,P_n$ be normalized fuzzy subsets of the time axis, based on crisp intervals, with (trapezoidal) membership functions $\mu_{P_1},\dots,\mu_{P_n}$.

  • Let $\bar{v}_i$ be the average of the variable $v$ over the time interval $P_i$.

  • Define the rule base ($k = 1,\dots,n$): If $t \in P_k$, then $v = \bar{v}_k$.

This approach provides the approximation

$$v(t) = \frac{\sum_i \mu_{P_i}(t)\,\bar{v}_i}{\sum_i \mu_{P_i}(t)} \tag{1}$$
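As a minimal sketch of the approximation in Equation (1), the following Python snippet evaluates the membership-weighted average of the interval means. The trapezoid parametrization `(a, b, c, d)`, the choice to average over the crisp core `[b, c]` of each fuzzy set, and the toy signal are assumptions made for illustration only.

```python
def trapezoid(t, a, b, c, d):
    """Trapezoidal membership: 1 on the core [b, c], 0 outside (a, d),
    linear on the two flanks. a == b or c == d gives a shoulder."""
    if b <= t <= c:
        return 1.0
    if t <= a or t >= d:
        return 0.0
    if t < b:
        return (t - a) / (b - a)
    return (d - t) / (d - c)

def interval_means(times, values, supports):
    """Average of v over the crisp core of each fuzzy set (assumption:
    the 'time interval P_i' is taken to be the core [b, c])."""
    means = []
    for (_, b, c, _) in supports:
        inside = [v for t, v in zip(times, values) if b <= t <= c]
        means.append(sum(inside) / len(inside))
    return means

def approximate(t, supports, means):
    """Equation (1): weighted average of the interval means.
    Requires that t lies in the support of at least one fuzzy set."""
    weights = [trapezoid(t, *s) for s in supports]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Toy signal: a variable that switches from 2.0 to 6.0 after 0.4 s.
times = [i / 10 for i in range(11)]
values = [2.0 if t <= 0.4 else 6.0 for t in times]
supports = [(0.0, 0.0, 0.4, 0.6), (0.4, 0.6, 1.0, 1.0)]
means = interval_means(times, values, supports)
```

Inside a core the approximation returns the corresponding interval mean; in the overlap of two neighbouring sets it interpolates between the two means.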

Starting with the variables $v_i$ and their representations as fuzzy variables as atomic expressions of our representation language, we are able to express composed statements by employing fuzzy logical connectives and their semantics, e.g. Łukasiewicz logic [Citation24]. The transition moment $t_i$ is then the first point in time at which the truth value of the corresponding fuzzy variable $v_i$ is greater than or equal to the firing threshold $\theta_i$. The firing thresholds ($\theta_1$, $\theta_2$ and $\theta_3$) were optimized with a particle swarm optimization algorithm by minimizing the mean squared error between the obtained and the true transition moments. Examples of such fuzzy rule bases are outlined in Section 4.1.
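The firing rule can be sketched in a few lines of Python. The Łukasiewicz connectives below are the standard ones; the two statements, their truth values and the threshold are hypothetical and serve only to show how a composed statement is thresholded.

```python
def luk_and(a, b):
    """Lukasiewicz t-norm (strong conjunction)."""
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    """Lukasiewicz t-conorm (strong disjunction)."""
    return min(1.0, a + b)

def luk_not(a):
    """Lukasiewicz negation."""
    return 1.0 - a

def transition_moment(times, truth_values, theta):
    """First time stamp at which the truth value reaches the firing
    threshold theta; None if the rule never fires."""
    for t, v in zip(times, truth_values):
        if v >= theta:
            return t
    return None

# Hypothetical truth values of two statements sampled at four time stamps.
times = [0.0, 0.1, 0.2, 0.3]
s_a = [0.1, 0.4, 0.7, 0.9]   # e.g. "players are close" (illustrative)
s_b = [0.9, 0.8, 0.2, 0.1]   # e.g. "defender decelerates" (illustrative)
composed = [luk_and(a, luk_not(b)) for a, b in zip(s_a, s_b)]
t_fire = transition_moment(times, composed, theta=0.6)
```

With these toy values the composed statement "s_a and not s_b" only reaches the threshold at the last sample, so `t_fire` is the last time stamp.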

3.3. Third layer: behavioural pattern identification

One main focus of this paper is the identification of different behavioural patterns of the players and their qualitative description. We do this with a three-stage approach. First, we extract a set of predefined features at and between transition moments. The idea here is that the extracted features describe the dynamics of the scene as completely as possible. Since the features rely on the position data, they are either based on the speed and acceleration of the players or are composed of relative properties between the players.

Second, we use unsupervised dimensionality reduction methods to reduce the number of features to a small set and use clustering methods to identify different behavioural patterns. Since even simple situations lead to a considerable number of input features (as we will show later in this paper), we need to reduce the high number of dimensions of the input space in order to be able to provide a linguistic description of the clusters which can be interpreted by a coach. Classical dimensionality reduction methods like Principal Component Analysis cannot be used, because they only reduce the dimensionality of the feature space but not the number of variables: each projected dimension is still a combination of all original features. He et al. [Citation25] have shown that the Laplacian Score can be used to effectively reduce the number of features in an unsupervised way. We use the Laplacian Score to rank the features by their importance. Only the most important features are then used for the clustering method. For clustering, we use the spectral clustering method, as it does not make strong assumptions on the shape of the clusters [Citation26]. It basically performs an additional dimensionality reduction of the input data, keeping the relative similarity of each pair of points intact.
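To make the feature ranking step concrete, here is a small pure-Python sketch of the Laplacian Score. It simplifies the original formulation by using a fully connected heat-kernel graph instead of a nearest-neighbour graph; this simplification, the kernel width `sigma` and the toy data are assumptions for illustration. Lower scores indicate features that better preserve the local structure of the data.

```python
import math

def laplacian_scores(X, sigma=1.0):
    """Laplacian Score per feature; X is a list of samples (rows).
    Assumption: fully connected graph with heat-kernel weights."""
    n, d = len(X), len(X[0])
    # Pairwise similarities S_ij = exp(-||x_i - x_j||^2 / sigma), S_ii = 0.
    S = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / sigma)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in S]          # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        f = [fi - mean for fi in f]
        # f^T L f with L = D - S equals half the weighted squared differences.
        num = 0.5 * sum(S[i][j] * (f[i] - f[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(fi * fi * di for fi, di in zip(f, deg))   # f^T D f
        scores.append(num / den if den > 0 else float("inf"))
    return scores

# Toy data: feature 0 separates two groups, feature 1 only alternates in sign.
X = [[0.0, 0.3], [0.1, -0.3], [0.2, 0.3],
     [5.0, -0.3], [5.1, 0.3], [5.2, -0.3]]
scores = laplacian_scores(X)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])  # best first
```

Keeping only the first entries of `ranking` yields the reduced feature subset that would then be passed to the clustering step.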

Finally, we provide a conceptional characterization for each cluster based on fuzzy predicates. We define a set of fuzzy predicates for each feature using natural language expressions like small, middle and high. To define the membership functions for each predicate, we adopt the approach outlined in [Citation27]. The centre of each membership function is set to the corresponding cluster centre. The overlap between neighbouring predicates is set to half of the distance between the cluster centres.
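One possible reading of this construction is sketched below in Python; it is an interpretation, not the method of [Citation27]. The sketch assumes trapezoidal predicates whose plateau contains the cluster centre, and an overlap zone of width half the centre distance placed symmetrically between neighbouring centres. The function name and the example centres are hypothetical.

```python
def predicates_from_centres(centres, labels=("small", "middle", "high")):
    """Build one membership function per cluster centre (one feature).
    Assumes distinct centres and as many labels as centres."""
    centres = sorted(centres)

    def make(k):
        def mu(x):
            c = centres[k]
            if x < c and k > 0:
                d = c - centres[k - 1]
                # Overlap zone of width d/2 centred between the two centres.
                lo, hi = centres[k - 1] + d / 4, c - d / 4
                return min(1.0, max(0.0, (x - lo) / (hi - lo)))
            if x > c and k < len(centres) - 1:
                d = centres[k + 1] - c
                lo, hi = c + d / 4, centres[k + 1] - d / 4
                return min(1.0, max(0.0, (hi - x) / (hi - lo)))
            return 1.0   # at the centre, or beyond the outermost centre
        return mu

    return {label: make(k) for k, label in enumerate(labels)}

# Hypothetical cluster centres of one feature (e.g. a speed in m/s).
mu = predicates_from_centres([1.0, 3.0, 7.0])
```

With this construction neighbouring predicates sum to one inside each overlap zone, so a feature value halfway between two centres is rated 0.5 for both predicates.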

3.4. Fourth layer: player evaluation based on performance indicators

After several behavioural patterns have been identified, the next step is to interpret these patterns in terms of performance indicators in order to provide a meaningful explanatory model for the player assessment. The basic idea here is to use the cluster assignments of the individual attempts to, first, calculate an average value for each cluster and, second, to use the average values per cluster to generate a score for each player individually based on his cluster assignments. The final value for each player indicates then the expected average performance of a player based on the choices he made. The comparison of the estimated value with the real performance of that player enables then the coach to judge the tactical skills of the player (e.g. which pattern has been chosen) on the one hand, and to identify under/over performers based on the individual decisions made on the other.

Concretely, for the assessment of the one-versus-one situation later on, we define a goal scoring index (GSI), which quantifies the individual performance of a player with respect to the average player in terms of goals scored. The GSI is defined as follows:

$$\mathrm{GSI} = \frac{E[G \mid \text{player}]}{E[G]} = \frac{\sum_{i=1}^{N} E[G \mid C_i]\, P(C_i \mid \text{player})}{E[G]} \tag{2}$$

where $E[G]$ is the number of expected goals per game (i.e. total number of goals divided by the number of games), $E[G \mid C_i]$ is the number of expected goals for cluster $C_i$, $P(C_i \mid \text{player})$ is the probability that the player follows the behavioural pattern described by cluster $C_i$, and $N$ is the number of clusters.
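A short Python sketch of Equation (2) may help to see how the index is assembled. The function name, the cluster goal rates and the attempt counts are hypothetical illustration values, not results from the study; the pattern probabilities are estimated here as relative frequencies of a player's attempts per cluster.

```python
def goal_scoring_index(cluster_goal_rates, player_cluster_counts, overall_goal_rate):
    """Return (expected goal rate of the player, GSI) following Equation (2).
    cluster_goal_rates: E[G | C_i] per cluster label.
    player_cluster_counts: number of the player's attempts assigned to each cluster.
    overall_goal_rate: E[G], goals per game over all players."""
    attempts = sum(player_cluster_counts.values())
    expected = sum(cluster_goal_rates[c] * n / attempts
                   for c, n in player_cluster_counts.items())
    return expected, expected / overall_goal_rate

# Hypothetical numbers for illustration only.
rates = {"A": 0.10, "B": 0.30, "C": 0.05}
counts = {"A": 2, "B": 4, "C": 1}
exp_gr, gsi = goal_scoring_index(rates, counts, overall_goal_rate=0.15)
```

A GSI above 1 means that the patterns chosen by the player are, on average, more promising than those of the average player, independently of whether the player actually scored.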

Next, we use the one-versus-one test case to illustrate in detail (a) the approximation of time-dependent variables like speed and acceleration in terms of the meaningful temporal phases; (b) the aggregation of the individual rules to a common model (cluster analysis); and (c) the evaluation of the individual performance based on the clustering result.

4. Application of the approach on real data

4.1. One-versus-one test case

The study of the tactical behaviour of two players in a one-versus-one situation in soccer will serve as an introductory example to motivate and explain our approach. Tactical skills refer to the ability of an individual player to perform the right action at the right moment at the right place. For this purpose, one-versus-one soccer games with defined game situations are considered in order to generate a representative quantity of data sets. The exercise is outlined in depth in Table 1. A detailed description of the data-gathering process and the study design can be found in [Citation28].

Table 1. Summary of the description of the one-versus-one exercise. See [Citation28] for the full study design.

The spatio-temporal characteristics of two players during such a one-versus-one situation are captured in their position data. Very accurate position data is required for tactical analysis, because of the high sensitivity of tactical decisions to spatial information at the critical moments. Despite the increasing availability of positional data in recent years, its analysis with respect to tactical behaviour has only been exploited to a limited extent so far. The dynamic nature as well as the spatial complexity of such data demands a data reduction approach, which on the one hand is able to capture the key features of the tracking data and on the other hand is still interpretable by a non-technical expert.

4.2. Modelling of the one-versus-one scenario

Let us consider our one-versus-one scenario as described in Table 1. A sport expert tells us that the one-versus-one scenario can be split into the following meaningful temporal phases: ‘approaching’, ‘confrontation’ and ‘duel’. We formalize this linguistic model in terms of fuzzy logical concepts by introducing the following fuzzy variables:

Let $v(s)$ denote the (fuzzy) truth value of the statement $s$. Then we are able to model the expert’s conception of temporal segmentation by means of the structure of the following fuzzy rule base:

where the time instant $t_i$ defines the transition time point from phase $i$ to phase $i+1$. The statement ‘defender stops decelerating’ is represented by the negation of $s_2$, which, by applying Łukasiewicz logic, leads to the truth value $v(\neg s_2(t)) = 1 - v(s_2(t))$. Since this linguistic description is only a rough model, we need a parametrization which allows model fitting with respect to the observed measurement data. Our approach for this parametrization is based on introducing the rule activation thresholds $\theta_i$.

The approach of representing and processing the analysis problem consists of two steps: the first step (second layer) consists of the derivation of a parametrized computational model that reflects the linguistic description provided by the expert, in our case R1, R2 and R3; by applying an optimization technique (particle swarm algorithm), the model parameters, in our case θ1, θ2 and θ3, are learned from the data by minimizing the mean squared error between the calculated and the true transition moments. We consider only t2 and t3 for optimization since the beginning of the game t1 has no relevance from a tactical point of view. The true transition moments t2 and t3 have been provided by the sport expert.
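The fitting criterion of the first step can be sketched as follows. Note that this sketch swaps the particle swarm optimizer for a plain grid search over candidate thresholds, purely to keep the example short and dependency-free; the function names, the fallback when a rule never fires, and the toy trials are assumptions.

```python
def detect(times, truth_values, theta):
    """First time stamp at which the truth value reaches theta.
    Assumption: if the rule never fires, fall back to the last time stamp."""
    for t, v in zip(times, truth_values):
        if v >= theta:
            return t
    return times[-1]

def fit_threshold(trials, candidates):
    """Pick the candidate threshold with the smallest mean squared error
    between detected and expert-labelled transition moments.
    trials: list of (times, truth_values, expert_time) tuples."""
    def mse(theta):
        return sum((detect(t, v, theta) - e) ** 2
                   for t, v, e in trials) / len(trials)
    return min(candidates, key=mse)

# Two hypothetical trials with expert-labelled transition at t = 2.
trials = [
    ([0, 1, 2, 3], [0.1, 0.4, 0.7, 0.9], 2),
    ([0, 1, 2, 3], [0.2, 0.5, 0.8, 1.0], 2),
]
best_theta = fit_threshold(trials, candidates=[0.3, 0.6, 0.95])
```

The same objective can be handed to any global optimizer, including a particle swarm, when several thresholds have to be tuned jointly.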

The second step (third layer) consists of the extraction of meaningful features based on the second layer model, and their use for the qualitative description of the players’ behaviours found by the cluster analysis. In order to identify different behavioural patterns hidden in the data, we used the spectral clustering method (see [Citation26] for details) on an automatically reduced subset of the input features. The input features for clustering were constructed from the position data at and between the two crucial transition moments t2 and t3. They include features like average speed and acceleration of the players as well as some relative properties like the distance between the players. The automatic reduction of the feature space to a predefined number was based on the Laplacian Score of each feature (backward feature selection). We used the Laplacian Score method because of its power of preserving locality in the data space [Citation25].

4.3. Evaluation of the one-versus-one scenario

The one-versus-one scenario as described in [Citation28] was performed by a high-level youth team in several training sessions under similar weather conditions. The true transition moments t1, t2 and t3 were independently provided by two sport experts by means of marking them on recorded videos of the training exercises. The experts were two scientists with long-term experience in soccer, who were involved in the investigations and familiar with the coaches’ judgement criteria gathered by interviews. In total, 121 games have been recorded and analysed by the experts. We used a subset of M=87 games to optimize the firing thresholds as described in Section 3.2, and the remaining 34 games for the calculation of the disagreement with the two sport experts (validation error). The inter-rater-reliability as a measure for the objectivity of the experts’ input was acceptable as outlined in [Citation28].

4.3.1. Temporal segmentation by transition moment detection

Figure 2 shows the disagreement between our approach and the first expert (E1–NN), the disagreement with the second expert (E2–NN) and, for comparison, the disagreement between the experts (E1–E2). For time point t2, the median disagreement of our model with the experts is lower than (E1–NN) or almost equal to (E2–NN) the disagreement between the two experts (E1–E2). Since t2 is well defined in terms of the provided position data (e.g. it is the time point where the defender has to reverse the direction of movement), the good agreement of our method with the experts shows that the fuzzy modelling approach is capable of capturing the experts’ view.

Figure 2. The disagreement in the transition moment prediction between the two sports experts (E1–E2) and the disagreement between our detection algorithm with the first expert (E1–NN) and second expert (E2–NN). Left plot shows results for t2, right plot for t3, respectively.


However, the definition of transition moment t3 is vaguer, which is also reflected in a higher disagreement of our model with the experts than between the experts themselves. Interestingly, the disagreement between the experts at t3 is smaller than at t2, whereas for our model it is the opposite. This shows that the concept of passing is well defined and the experts can easily infer it from the video. The transition moment at t2, on the other hand, is easier to identify in the tracking data than from the video, which explains the slightly higher disagreement between the experts.

4.3.2. Tactical pattern discovery

In the first step, we automatically reduced the data set from 12 to 7 features based on the Laplacian Score of each feature (see Section 4.2 for details). Thereafter, the remaining 7 features were used to perform the spectral clustering. The clusters found were then described by four of the seven features, which were manually selected based on how well they separate the clusters. The selected features were:

  • avg velocity before confrontation phase: average velocity of the attacker during the interval $[t_2 - 0.4\,\mathrm{s},\ t_2]$;

  • avg acceleration in confrontation phase: average acceleration of the attacker during the interval $[t_2,\ t_3]$;

  • max velocity difference: maximum absolute difference between attacker’s and defender’s speed during the interval $[t_2,\ t_3]$;

  • acceleration at passing moment: average acceleration of the attacker during the interval $[t_3 - 0.04\,\mathrm{s},\ t_3 + 0.04\,\mathrm{s}]$.

We defined a set of fuzzy predicates for each feature using natural language expressions like small, middle and high. These expressions are modelled automatically taking the clustering result into account.

Figure 3 shows the clustering result (left) and the derived linguistic description (right). The patterns found show clear differences in all three performance indicators. Cluster B (diamond), for example, has the highest goal rate, whereas an attacking attempt assigned to cluster C (square) is less promising. The distinct differences between the clusters can also be seen in the shot ratio, but they are not as prominent as for the goal ratio. From the coach’s perspective, it is important to know what distinguishes the different patterns. The linguistic description translates the patterns into fuzzy expressions for each measurement variable. Looking at the linguistic description of the clusters, we can see that the successful pattern B is the one with high velocity before the confrontation phase and the highest difference in the maximal velocity between attacker and defender. In this case, the attacker is able to gain more space between himself and the defender, and therefore receives less pressure during his attempt to score a goal.

Figure 3. Spectral clustering result with performance indicators for three identified behavioural patterns (left) and the derived linguistic description based on four manually selected features (right).


4.4. Assessment of individual player performance

As described in Section 3.4, the discovered tactical patterns are used for the assessment of individual performances. Table 2 shows the performance values of 14 selected players (all players with data for at least 6 games available), belonging to one team squad. The average goal ratio based on all 20 players is 0.153. The column Goal ratio gives the goal ratio of each player, calculated from the player’s number of games and goals scored. The last three columns give the proportions with which each player used each identified tactical pattern (see previous section). Based on these proportions and the known goal ratios for each pattern, column 5 (Exp-GR) shows the expected goal rate. Thus, this value provides an objective measure of the one-versus-one performance with respect to a player’s tactical behaviour. GSI (see Section 3.4) normalizes the Exp-GR value against the whole group: values >1 indicate better-than-average skills, values <1 below-average skills.

Table 2. Summary of the player assessment.

This approach addresses a main problem of game-related testing in team sports. Coaches often have only a few test attempts available to judge a player’s performance in a specific skill. For offensive actions like our one-versus-one example, the outcome (i.e. whether a goal is scored or not) is conventionally the most important criterion for rating a player’s performance. For single events (shots), however, it is often pure coincidence whether a goal is scored (e.g. the ball narrowly misses the goal even though the player apparently performed very well). Thus, an assessment that focuses mainly on the outcome can be distorted when only few data are available, whereas our approach considers the whole performance of the player including his interaction with the opponent.

In Table 2, the first 5 players are over-performers; all other players are under-performers compared to the average. Even though player 4 is ranked third (GSI = 1.30, expressing that his performance is 30% better than that of the average player), his goal ratio is 0.0, because he did not score in any of his 7 games. Usually such a result would lead to a very negative view of a player’s performance. Based on our approach, the coach would learn that the player in fact performs very well. His bad scoring record could result from chance and misfortune, but possibly also from poor skills in the crucial scoring moment. In either case, the results would prompt the coach to observe the player’s performance in more detail and to avoid prejudging it. On the other hand, player 2 scored 4 goals in 7 games, which leads to an unreasonably high goal ratio (0.57). In contrast, the expected goal ratio of this player based on the patterns he has chosen is 0.22. This value is in a more reasonable range and therefore facilitates the comparison of performance between players. Players 9–19 (Table 2) did not score more than one goal, although they have six to nine games. This is in line with their bad GSI values. Conversely, a player could have a low GSI value but a high actual goal ratio, which would be particularly remarkable if confirmed by more data. This would mean that the player is very efficient although he chooses tactical patterns of low efficiency compared to the average. Such a result would also advise the coach to reanalyse this player’s performance in more detail, because the player is apparently capable of extraordinary skills. In our context this could hint at creativity, which is hard to diagnose with any testing procedure.
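As a rough plausibility check of the chance explanation, assume purely for illustration that each of the 7 games is an independent trial in which a player scores with the group-average goal ratio of 0.153. Neither the independence nor the use of the group average for an individual player is stated in the paper. The probability of not scoring in any of the 7 games is then

$$P(\text{no goal in 7 games}) = (1 - 0.153)^{7} \approx 0.31,$$

so under these assumptions roughly one in three average players would end up with a goal ratio of 0.0 over seven games.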

5. Discussion and outlook

The paper at hand discusses an approach for modelling the tactical behaviour of players in game-like situations of soccer from an expert’s perspective. The goal was to design a four-layered hierarchical model with a temporal segmentation based on fuzzy variables and a data analysis layer for the interpretation of the individual player behaviours. The modelled transition states were designed to be as similar as possible to the real game events as perceived by the sports expert. The features extracted at the transition moments take the full spatio-temporal position data into account, instead of looking only at the outcome (e.g. ball lost, goal shot) as done by others (see [Citation1] for a critical review). The underlying assumption was that time-continuous position data of the players contain most of the crucial information that characterizes tactical behaviour in soccer. The spatio-temporal configuration of the players and its change over time are among the main points experts consider during game and training observations. However, there are additional elements, like the viewing angle of players, which are not directly reflected in the data. For the specific situation analysed in this paper, they are of minor relevance. If we look at the measurements in isolation, significant differences between successful and unsuccessful passing attempts can be detected for the one-versus-one situation, as we have shown in [Citation28]. However, that kind of analysis did not take interactions between the measurement variables into account. With the multifactorial statistical analysis presented in this paper, we are now able to present the complex interaction between the four selected measurement variables in a qualitative way that is suitable for the coach.

In our experimental setting, we only considered two competing players for a less complex tactical situation in game sports. The number of possible interaction patterns between the players is therefore rather low and the assumption of three clusters is reasonable. However, for more complex tactical situations, we expect the number of interaction patterns to increase strongly. Choosing the right number of clusters (k) in an automatic way can then be problematic, because the correct choice of k is often ambiguous. Furthermore, with an increasing number of clusters, the generated qualitative descriptions get harder to interpret. To avoid this problem altogether, a hierarchical clustering approach as described in [Citation29] should be adopted. This would allow for a top-down approach, where initially some general behaviours are generated and then refined to more distinct moving patterns as we move down to lower levels of the hierarchy.

Our general observation is that the role of randomness and its implications for performance analysis in soccer are frequently underestimated by practitioners. Our approach shows that there are statistical alternatives (e.g. the GSI concept based on conditional expectations) for assessing performance that take statistical dependencies between tactical patterns and technical skills into account. It remains future work to further explore the potential of using contextual information such as behavioural patterns for performance analysis.

Acknowledgements

We would like to thank Manfred Uhlig for helpful discussions, the InmotioTec GmbH for their support during the measurements and the youth academy of Austria Vienna for their participation in the data-gathering process.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The research reported in this article has been partly supported by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research, and Economy and the Austrian Research Promotion Agency (10.13039/501100004955) in the frame of the COMET center SCCH.

References

  • R. Mackenzie and C. Cushion, Performance analysis in football: A critical review and implications for future research, J. Sports Sci. 31 (2013), pp. 639–676. doi:10.1080/02640414.2012.746720
  • G. Tenenbaum and M. Driscoll, Methods of Research in Sport Sciences: Quantitative and Qualitative Approaches, Meyer and Meyer, Oxford, 2005.
  • C. Carling, J. Bloomfield, L. Nelsen, and T. Reilly, The role of motion analysis in elite soccer: Contemporary performance measurement techniques and work rate, Sports Med. 10 (2008), pp. 839–862.
  • R. Leser and K. Roemer, Motion tracking and analysis systems, in Computer Science in Sport: Research and Practice, A. Baca, Ed., Routledge, London, 2014, pp. 82–109.
  • R. Leser, A. Baca, and G. Ogris, Local Positioning Systems in (Game) Sports, Sensors 11 (2011), pp. 9778–9797. doi:10.3390/s111009778
  • F. Lebed and M. Bar-Eli, Complexity and Control in Team Sports: Dialectics in Contesting Human Systems, Vol. 6, Routledge, London, 2013.
  • G. Ogris, R. Leser, B. Horsak, P. Kornfeind, M. Heller, and A. Baca, Accuracy of the LPM tracking system considering dynamic position changes, J. Sports Sci. 30 (2012), pp. 1503–1511. doi:10.1080/02640414.2012.712712
  • K. Goldsberry, Courtvision: New visual and spatial analytics for the NBA, in MIT Sloan Sports Analytics Conference, 2012.
  • Y. Yue, P. Lucey, P. Carr, A. Bialkowski, and I. Matthews, Learning fine-grained spatial models for dynamic sports play prediction, in IEEE International Conference on Data Mining (ICDM), IEEE, 2014, pp. 670–679.
  • D. Cervone, A. D’Amour, L. Bornn, and K. Goldsberry, A multiresolution stochastic process model for predicting basketball possession outcomes, arXiv preprint arXiv:1408.0777, 2014.
  • L. Vilar, D. Araújo, B. Travassos, and K. Davids, Coordination tendencies are shaped by attacker and defender interactions with the goal and the ball in futsal, Hum. Mov. Sci. 31 (2014), pp. 1639–1651.
  • L.A. Zadeh, Fuzzy sets, Inf. Control 8 (3) (1965), pp. 338–353. doi:10.1016/S0019-9958(65)90241-X
  • L.A. Zadeh, Fuzzy logic and approximate reasoning, Synthese 30 (3–4) (1975), pp. 407–428. doi:10.1007/BF00485052
  • V. Novak and S. Lehmke, Logical structure of fuzzy IF-THEN rules, Fuzzy Sets Syst. 157 (2006), pp. 2003–2029. doi:10.1016/j.fss.2006.02.011
  • E.K. Juuso, Fuzzy control in process industry: The linguistic equation approach, in Fuzzy Algorithms for Control III, 1999, pp. 243–300.
  • C.A. Petri, Kommunikation mit Automaten, Institut für instrumentelle Mathematik der Universität Bonn, 1962.
  • J.L. Peterson, Petri Net Theory and the Modeling of Systems, Prentice-Hall, NJ, 1981.
  • J. Knybel and V. Pavilska, Representation of Fuzzy IF-THEN rules by Petri Nets, ASIS 2005, MARQ, Research Report No. 84, 2005, pp. 121–125.
  • S.-M. Chen, J.-S. Ke, and J.-F. Chang, Knowledge representation using fuzzy Petri nets, IEEE Trans. Knowl. Data Eng. 2 (3) (1990), pp. 311–319. doi:10.1109/69.60794
  • K. Jensen, Coloured Petri nets: A high-level language for system design and analysis, in Advances in Petri Nets 1990, Lecture Notes in Computer Science, G. Rozenberg, Ed., Vol. 483, Springer-Verlag, 1991, pp. 342–416.
  • J. Cardoso, R. Valette, and D. Dubois, Fuzzy Petri net: An overview, Proc. 13th IFAC World Congr., 1996, pp. 443–448.
  • S. Ribaric and T. Hrkac, A model of fuzzy spatio-temporal knowledge representation and reasoning based on high-level Petri nets, Inf. Syst. 37 (3) (2012), pp. 238–256. doi:10.1016/j.is.2011.09.010
  • B.A. Moser, Matching event-sequences approach based on Weyl’s discrepancy norm, Event-based Control, Communication and Signal Processing, 2015, pp. 1–5.
  • R. Kruse, J.E. Gebhardt, and F. Klawonn, Foundations of Fuzzy Systems, John Wiley and Sons, Inc., 1994.
  • X. He, D. Cai, and P. Niyogi, Laplacian score for feature selection, Adv. Neural Inf. Process Syst. 18 (2005), pp. 507–514.
  • A.Y. Ng, M.I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Adv. Neural Inf. Process Syst. 2 (2002), pp. 849–856.
  • M. Drobics and J. Himmelbauer, Creating comprehensible regression models, Soft Comput. 11 (2007), pp. 421–438.
  • R. Leser, B.A. Moser, T. Hoch, J. Stoegerer, G. Kellermayr, S. Reinsch, and A. Baca, Expert-oriented modelling of a 1vs1-situation in football, Int. J. Perform. Anal. Sport. 15 (2015), pp. 949–966.
  • R. Campello, D. Moulavi, and J. Sander, Density-based clustering based on hierarchical density estimates, in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2013, pp. 160–172.
  • H. Weyl, Über die Gleichverteilung von Zahlen mod. Eins, Mathematische Annalen 77 (1916), pp. 313–352. doi:10.1007/BF01475864
  • B.A. Moser, A similarity measure for image and volumetric data based on Hermann Weyl’s discrepancy, IEEE Trans. Pattern Anal. Mach. Intell. 33 (2011), pp. 2321–2329. doi:10.1109/TPAMI.2009.50

Appendix.

Approximation of time-dependent variables by fuzzy variables

In order to reduce complexity, we look for a coarse-to-fine approximation of the measured variables $V_i$ as functions of time.

For this, we outline an approach which was recently examined in the context of event-based signal processing [Citation23] and is based on the so-called discrepancy measure. The discrepancy measure goes back to Hermann Weyl [Citation30], who introduced it in the context of measuring the irregularity of pseudo-random numbers. Given a time series $f$, this measure can be expressed as the range of its partial sums, i.e.

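$$\|f\|_D = \max_{1 \le i \le n} \sum_{k=0}^{i} f(k) - \min_{1 \le i \le n} \sum_{k=0}^{i} f(k) \tag{A1}$$

or equivalently as

$$\|f\|_D = \max_{1 \le i, j \le n} \left| \sum_{k=i}^{j} f(k) \right| \tag{A2}$$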
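The equivalence of the two expressions can be checked numerically with a short sketch. It is an illustration under one stated assumption: the empty partial sum 0 is included in the range, which is what makes the partial-sum form agree with the brute-force maximum for series whose partial sums never change sign. The function names are hypothetical.

```python
import numpy as np

def discrepancy_partial_sums(f):
    """Range of the partial sums, with the empty sum 0 included (cf. (A1))."""
    s = np.concatenate(([0.0], np.cumsum(f)))
    return s.max() - s.min()

def discrepancy_brute_force(f):
    """Largest absolute sum over all contiguous index ranges i..j (cf. (A2))."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    return max(abs(f[i:j + 1].sum()) for i in range(n) for j in range(i, n))

f = [1, -2, 3, 1, -4, 2]
print(discrepancy_partial_sums(f), discrepancy_brute_force(f))
```

For this series both functions return 4. The first needs a single pass over the data, while the second examines every pair of indices and is shown only as a reference.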

For details regarding properties of the discrepancy measure as geometric norm, see [Citation23,Citation31]. It is interesting to observe that (A2) is based on an extremal principle. Those intervals of maximal discrepancy $[d_i, d_{i+1}]_i$, on which such maximal partial sums are assumed, can be determined in linear time in the following way:

D1: Determine $D = \|f\|_D$ by means of (A1);

D2: Set $\tilde{d}_1 := 0$, $i := 1$; determine the minimal $y > \tilde{d}_{2i-1}$ with $D = \left| \sum_{k=\tilde{d}_{2i-1}}^{y} f(k) \right|$; set $d_{2i} := y$;

D3: Determine the maximal $x < d_{2i}$ with $D = \left| \sum_{k=x}^{d_{2i}} f(k) \right|$; set $d_{2i-1} := x$;

D4: Repeat [D2] and [D3] for the remaining interval $t > d_{2i}$ by updating $i := i+1$, $\tilde{d}_{2i-1} := d_{2i}$.

The construction of $[d_i, d_{i+1}]$ implies that these intervals are of minimal length satisfying the maximal partial sum property. We refer to these intervals as minimal intervals of maximal discrepancy (MMD). Let $I_0$ denote the universe of discourse of the signals. Observe that the MMD intervals induce a partition $P_1$ of $I_0$. For convenience, we set $P_0 = \{I_0\}$.

First, we outline a method for approximating $f$ as step functions $\tau^{(k)} = \sum_{l=0}^{k} \tilde{\tau}^{(l)}$, by setting $\eta^{(0)} = \eta$ and

$$\eta^{(k)} = \eta^{(k-1)} - \tau^{(k-1)}, \qquad \tilde{\tau}^{(k)}(t) = \sum_{I \in P_k} 1_I(t)\, \alpha_I^{(k)}, \qquad \alpha_I^{(k)} = \frac{1}{\#I} \sum_{t_j \in I} \eta^{(k)}(t_j), \tag{5}$$

where $1_I$ denotes the indicator function of the interval $I$, $\#I$ denotes the number of non-zero elements of $\eta_i$ in the interval $I$ and $P_k$ represents the refined partition of $P_{k-1}$ that results from applying the MMD decomposition D1-D3 to each element of $P_{k-1}$ w.r.t. $\eta^{(k-1)}$.

This approach allows the time-dependent variables $V_i$ to be approximated by fuzzy rules in the following way:

• Let $P_1, \dots, P_n$ be normalized fuzzy subsets with (trapezoidal) membership functions based on crisp intervals given by $\mu_{P_1}, \dots, \mu_{P_n}$.

• Let $\bar{v}_i$ be the average of the variable $v$ over the time interval $P_i$;

• Define rule base ($k = 1, \dots, n$): If $t \in P_k$, then $v = \bar{v}_k$.

This approach provides the approximation

$$v(t) = \frac{\sum_i \mu_{P_i}(t)\, \bar{v}_i}{\sum_i \mu_{P_i}(t)} \tag{6}$$

See Figure A1 for an illustration.
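The rule base and approximation (6) can also be illustrated with a small numerical sketch. It is not the authors' code: the crisp interval boundaries `edges`, the transition half-width `w`, the test signal and the function names are all hypothetical, and the sketch takes the crisp intervals as given instead of deriving them from the MMD decomposition.

```python
import numpy as np

def trapezoid(t, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    t = np.asarray(t, dtype=float)
    return np.clip(np.minimum((t - a) / (b - a), (d - t) / (d - c)), 0.0, 1.0)

def fuzzy_approximation(t_eval, t, v, edges, w):
    """Evaluate (6) at t_eval for crisp intervals [edges[i], edges[i+1])."""
    lefts, rights = edges[:-1], edges[1:]
    # v_bar[i]: average of v over the i-th crisp interval
    v_bar = np.array([v[(t >= l) & (t < r)].mean() for l, r in zip(lefts, rights)])
    # mu[i]: membership of t_eval in the i-th interval, softened by half-width w
    mu = np.array([trapezoid(t_eval, l - w, l + w, r - w, r + w)
                   for l, r in zip(lefts, rights)])
    # Membership-weighted average of the rule consequents
    return (mu * v_bar[:, None]).sum(axis=0) / mu.sum(axis=0)

# Hypothetical signal with three phases and transition moments at t = 1 and t = 2
t = np.linspace(0.0, 3.0, 300, endpoint=False)
v = np.where(t < 1.0, 1.0, np.where(t < 2.0, 3.0, 2.0))
edges = np.array([0.0, 1.0, 2.0, 3.0])

approx = fuzzy_approximation(np.array([0.5, 1.0, 1.5, 2.5]), t, v, edges, w=0.2)
print(approx)
```

Inside an interval only one rule fires and the approximation equals that interval's average. At the transition moment t = 1 the two neighbouring rules fire with equal membership, so the result is the mean of the two averages.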

Figure A1. Illustration of the signal decomposition based on the discrepancy norm, which yields a coarse-to-fine approximation depending on the chosen level of the generated tree of nested intervals. The three vertical solid lines indicate the transition moments t1, t2 and t3 (from left to right).
