Editorial

Machine learning in sports science: challenges and opportunities

Pages 961-967 | Received 10 Nov 2020, Accepted 20 Mar 2021, Published online: 20 Apr 2021

Introduction

Sport scientists could benefit from knowing about the processes underpinning machine learning (ML) and its practical applications, as it has substantial implications for our field. ML has already had an impact on:

  • the devices used to acquire data (optimisation of inertial sensor readings (Jamil et al., 2020)),

  • the information extracted from data acquired by devices (3D kinematics and vertical ground reaction forces might soon be predicted from 2D videos (Morris et al., 2020; Weir et al., 2019), and barbell movement trajectories and mass have been used to predict joint moments (Kipp et al., 2018)),

  • the processing of data acquired by devices (classification models are able to split data into meaningful packages that would previously have placed large time demands on sport scientists (Ahmadi et al., 2014)) and

  • the utility of processed data to enhance our understanding of sports performance and injury risk prediction (classification models can facilitate objective decision making with respect to rehabilitation practices and injury prevention (Richter et al., 2019; Rommers et al., 2020)).

In this editorial we illustrate how ML has been used successfully to characterise aberrant movement profiles in injured athletes and how these data can subsequently inform rehabilitation practices (Franklyn-Miller et al., 2016; Richter et al., 2019; Rommers et al., 2020). Furthermore, we outline the advantages of ML, as well as the challenges that sport scientists are likely to encounter when reading about or endeavouring to implement ML models. Let’s start with the most important question: ‘What is machine learning?’

ML is the creation and use of models that have learned from data (Grus, 2015) and involves programming a computer to perform a classification or magnitude estimation without being ‘told’ how to do it (Figure 1; the most basic and commonly used example is fitting a regression model). During this classification or magnitude estimation, the computer (i.e., the machine) ‘experiments’ with different model parameters and prediction features (independent variables) and identifies (learns) which combination of parameters and prediction features best replicates (i.e., predicts) the target (dependent variable). The advantage conferred over traditional statistical methodologies is that this process allows interdependency of data points and facilitates the detection of previously hidden clusters (e.g., athletes that respond differently to a given stimulus) without any subjective magnitude/dependency settings, whilst also providing a clear error estimation. However, as with traditional statistical methodologies, it is important to exert caution when interpreting ML outcomes because explanatory and predictive modelling are not the same (Shmueli, 2010): if a ML model is trained with erroneous data, the model will not generalise and will generate false conclusions, hence the phrase ‘errors in, errors out’. While it is not necessary for sport scientists to comprehend the mathematical and statistical principles underpinning ML, it is important that they understand certain key concepts (data capture/processing, feature selection, modelling and evaluation) before using it as a component of their research toolkit.
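To make this learning process concrete, the sketch below fits the most basic example mentioned above, a regression model, and reports a clear error estimate on data the model has not seen; the synthetic data and variable names are ours, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic example: predict a target (e.g., jump height) from two
# prediction features (e.g., peak force, rate of force development).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                                         # prediction features
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)  # target

# The 'learning' step: the model identifies the combination of
# parameters that best replicates the target in the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# A clear error estimate, computed on data the model has not seen.
print(f"MAE on unseen data: {mean_absolute_error(y_test, model.predict(X_test)):.3f}")
```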

Figure 1. Illustration of the processes of feature-based machine learning and the potential challenges associated with each of these processes.

Data capture/processing

When utilising ML, it is critically important to capture a sufficiently large dataset. Unfortunately, power calculations are not applicable to ML models, and it cannot be known a priori how many data samples are ‘sufficient’. In most instances, the more data the better, as this provides the model with more opportunities to learn. We would therefore suggest that sharing of datasets amongst sport scientists be considered to facilitate the development of robust ML models; previous injury prediction studies have, for example, trained ML models on >400 or >700 data samples (Richter et al., 2019; Rommers et al., 2020). If datasets are shared amongst sport scientists, it should be noted that data captured in different laboratories are likely to have different noise characteristics or to have been processed using different frameworks. Such dataset artefacts might result in a ML model that has learned an erroneous pattern (predictions made on measures that describe noise rather than movement patterns), and hence efforts will be required to harmonise the data.
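What such harmonisation looks like will depend on the data at hand; one minimal approach (our illustration, with hypothetical column names) is to standardise each measure within its source laboratory before pooling, so that between-laboratory offsets are not mistaken for signal:

```python
import pandas as pd

# Hypothetical pooled dataset: one row per trial, with the source
# laboratory recorded alongside each measure.
df = pd.DataFrame({
    "lab": ["A", "A", "B", "B", "B"],
    "peak_knee_moment": [2.1, 2.4, 3.0, 3.3, 2.9],
})

# Standardise each measure within its laboratory so that between-lab
# offsets (e.g., different filter settings) are not learned as 'signal'.
df["peak_knee_moment_z"] = (
    df.groupby("lab")["peak_knee_moment"]
      .transform(lambda x: (x - x.mean()) / x.std())
)
print(df)
```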

Feature selection

Regardless of how much data are collected, the features extracted from a dataset are possibly more important than its sample size, because if the prediction features inputted to a ML model hold no meaningful information, the model will never perform well. While a ML model could be trained on all the available data, models that seek to guide rehabilitation programmes, injury prevention initiatives, movement quality assessments and training load optimisation should be built using only a small selection of features, to aid interpretability. Traditionally, studies in sports science have used a subjective selection of features (e.g., a signal’s maximal value), which can result in important information being discarded. More recently, data-driven approaches (e.g., principal component analysis [PCA]) have been introduced and are being used more frequently. For an optimal and robust collection of features, sport scientists should combine data-driven and domain-knowledge-based features, because even a data-driven feature extraction does not guarantee that all the information contained within the data is used. For example, Richter et al. (2019) sought to develop a ML model that could differentiate between the operated and non-operated limbs of athletes undergoing rehabilitation following anterior cruciate ligament reconstruction and the limbs of non-injured healthy control athletes, using kinematic and kinetic measures (time-normalised waveforms) recorded during a variety of exercises. To extract prediction features, a PCA was used, as it captures the variability within the dataset. However, this approach may have missed information of potential benefit to the model, as variability within the dataset does not necessarily correspond to differentiation/classification ability.
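To make the data-driven route concrete, the sketch below (with assumed array shapes, purely for illustration) extracts principal component scores from time-normalised waveforms, much as in the Richter et al. (2019) example, and retains only the components explaining most of the variance. It also carries the caveat from the text: components are ranked by variance, not by classification ability.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumed data: 400 trials of a time-normalised waveform (101 points, 0-100%).
rng = np.random.default_rng(0)
waveforms = rng.normal(size=(400, 101))

# Data-driven feature extraction: retain enough principal components
# to explain 95% of the variance within the waveforms.
pca = PCA(n_components=0.95)
features = pca.fit_transform(waveforms)

print(features.shape)                     # trials x retained components
print(pca.explained_variance_ratio_[:5])  # variance captured per component
# Caveat: components are ranked by variance, not by their ability to
# separate groups, so discriminative information may still be missed.
```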

Modelling

Following feature selection, a ML approach needs to be selected, which can be tricky, as many techniques exist that differ in their ability to learn (regression, decision trees, neural networks, etc.; see Hastie & Tibshirani, 2002). It is important to note that feature normalisation (transforming a feature’s magnitude into, for example, z-scores) may be required (depending on the ML technique) and that many ML models perform better when the training data are well balanced across the different predictive outcomes of importance (data can be balanced using synthetic minority over-sampling; Chawla et al., 2002). When building a model, it is important to consider the danger of overfitting, which can be caused by an inappropriate ratio between input features and observations within the dataset. To avoid this, feature selection can be undertaken to identify the most useful features (O’Reilly et al., 2017), although this could result in the loss of information. To interpret and refine the output of a model, different visualisation approaches can be used to understand the importance of a feature and the predictions made (Liu et al., 2017). When seeking to develop a ML model, it is common practice to evaluate multiple ML techniques and feature combinations to find the most appropriate model.
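As a minimal sketch of how these steps might be wired together, the code below (using scikit-learn and the imbalanced-learn implementation of SMOTE (Chawla et al., 2002), with placeholder data and candidate models chosen purely for illustration) normalises the features, balances the training folds and compares two ML techniques under cross-validation:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 300 observations, 10 features, ~10% 'injured' labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = (rng.random(300) < 0.1).astype(int)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=3),
}

for name, clf in candidates.items():
    # Normalise features, balance the data with SMOTE, then fit the model.
    # imblearn's Pipeline applies SMOTE to the training folds only, so no
    # synthetic samples leak into the evaluation folds.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("balance", SMOTE(random_state=0)),
        ("model", clf),
    ])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```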

Evaluation

To estimate the performance of a ML model, it is important to use a criterion that aligns with the ML task. An often-used measure to evaluate the performance of a ML model is accuracy, which is appropriate when there is a balance across the possible outcomes but should never be used when the outcomes are imbalanced. Using accuracy in an injury prediction study is likely to yield an ‘excellent’ result even if the model has learned nothing (e.g., 98% accuracy can be achieved by always predicting ‘no injury’ if only 2% of the observations get injured). For classification problems, many performance measures can be computed from the confusion matrix (Table 1; a receiver operating characteristic curve can give information about different decision thresholds), while magnitude-predicting ML models should use measures such as mean absolute error, root mean square error, Bland-Altman plots or regression analyses (R²).

Table 1. Illustration of a binary classification confusion matrix and measures that can be directly calculated from it. Criteria that can be computed from the listed measures include the positive and negative likelihood ratios, the diagnostic odds ratio and the F1 score.
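As an illustration of the measures in Table 1 and of the accuracy pitfall described above, the sketch below uses made-up outcomes in which 2% of athletes are injured; a model that always predicts ‘no injury’ achieves 98% accuracy yet detects no injuries at all:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up outcomes: 2 of 100 athletes sustain an injury (label 1).
y_true = np.array([1] * 2 + [0] * 98)
y_naive = np.zeros(100, dtype=int)      # always predicts 'no injury'

print(accuracy_score(y_true, y_naive))  # 0.98, looks 'excellent'
print(recall_score(y_true, y_naive))    # 0.0, no injury is ever detected

# Confusion matrix (rows = actual, columns = predicted); from it,
# the Table 1 measures can be computed directly.
tn, fp, fn, tp = confusion_matrix(y_true, y_naive).ravel()
sensitivity = tp / (tp + fn)            # 0.0 (a.k.a. recall)
specificity = tn / (tn + fp)            # 1.0
print(sensitivity, specificity)
```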

To ensure the generalisability of a model, its robustness should be estimated. Best practice is to set aside a portion of the data that is used only for the evaluation of the model (testing data) and to apply cross-validation during the fitting process (Table 2 lists possible options). This process can highlight whether a model is under-fitted (too rigid to learn the patterns in the training data) or over-fitted (too flexible, having learned the noise in the training data; Halilaj et al., 2018; Hastie & Tibshirani, 2002). Data from the same athlete should never be used simultaneously in the training and testing data, as this can greatly overestimate a model’s performance (so-called data leakage). Unfortunately, such evaluation is not yet common practice in sports science studies.

Table 2. List of common cross-validation methods.
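One safeguard against the athlete-level data leakage described above is a group-aware cross-validation scheme, which keeps all of an athlete’s trials on the same side of each split; a minimal sketch using scikit-learn’s GroupKFold with placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Placeholder data: 50 athletes, 4 trials each (200 observations).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = rng.integers(0, 2, size=200)
athlete_id = np.repeat(np.arange(50), 4)  # group label per observation

# GroupKFold guarantees that no athlete appears in both the training
# and the testing folds, preventing athlete-level data leakage.
scores = cross_val_score(
    RandomForestClassifier(random_state=0), X, y,
    groups=athlete_id, cv=GroupKFold(n_splits=5),
)
print(scores.mean())
```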

Discussion and implications

Over the next decade, ML is likely to greatly enhance the objectivity of decision making in sports science (Franklyn-Miller et al., 2016; Richter et al., 2019; Rommers et al., 2020). Studies have demonstrated that ML can: predict future injuries (85% precision, 85% recall) based on pre-season measures (Rommers et al., 2020); identify movement strategies within a cohort, enabling the identification of movement-specific injury risk factors (Franklyn-Miller et al., 2016); and identify uninjured individuals who demonstrate movement patterns similar to those of injured individuals (Richter et al., 2019). However, substantial challenges need to be overcome (Figure 1), and many pitfalls must be avoided when developing a ML model. Successful ML model creation and assessment require comprehensive interdisciplinary expertise, including domain experts with knowledge of statistics and ML. To enable ML techniques to have a widespread positive impact in our field, we recommend that coding and ML modules are integrated into sports science educational courses and that widespread international collaboration is adopted in the aggregation and harmonisation of ‘open access’ datasets.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Ahmadi, A., Mitchell, E., Richter, C., Destelle, F., Gowing, M., O’Connor, N. E., & Moran, K. (2014). Toward automatic activity classification and movement assessment during a sports training session. IEEE Internet of Things Journal, 2(1), 98–103. https://doi.org/10.1109/BSN.2014.29
  • Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, W. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
  • Franklyn-Miller, A., Richter, C., King, E., Gore, S., Moran, K., Strike, S., & Falvey, E. C. (2016). Athletic groin pain (part 2): A prospective cohort study on the biomechanical evaluation of change of direction identifies three clusters of movement patterns. British Journal of Sports Medicine, 51(5), 460–468. https://doi.org/10.1136/bjsports-2016-096050
  • Grus, J. (2015). Data science from scratch: First principles with Python. O’Reilly Media.
  • Halilaj, E., Rajagopal, A., Fiterau, M., Hicks, J. L., Hastie, T. J., & Delp, S. L. (2018). Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities. Journal of Biomechanics, 81, 1–11. https://doi.org/10.1016/j.jbiomech.2018.09.009
  • Hastie, T., & Tibshirani, R. (2002). The elements of statistical learning. Springer.
  • Jamil, F., Iqbal, N., Ahmad, S., & Kim, D. H. (2020). Toward accurate position estimation using learning to prediction algorithm in indoor navigation. Sensors, 20(16), 1–27. https://doi.org/10.3390/s20164410
  • Kipp, K., Giordanelli, M., & Geiser, C. (2018). Predicting net joint moments during a weightlifting exercise with a neural network model. Journal of Biomechanics, 74, 225–229. https://doi.org/10.1016/j.jbiomech.2018.04.021
  • Liu, S., Wang, X., Liu, M., & Zhu, J. (2017). Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics, 1(1), 48–56. https://doi.org/10.1016/j.visinf.2017.01.006
  • Morris, C., Mundt, M., Goldacre, M., & Alderson, J. (2020). Predict ground reaction forces from 2D video [Tweet]. https://twitter.com/JacquelineUWA/status/1327971555029073924
  • O’Reilly, M. A., Johnston, W., Buckley, C., Whelan, D., & Caulfield, B. C. (2017). The influence of feature selection methods on exercise classification with inertial measurement units. In IEEE 14th International Conference on Wearable and Implantable Body Sensor Networks (BSN) (Vol. 14, pp. 193–196).
  • Richter, C., King, E., Strike, S., & Franklyn-Miller, A. (2019). Objective classification and scoring of movement deficiencies in patients with anterior cruciate ligament reconstruction. PLoS ONE, 14(7), e0206024. https://doi.org/10.1371/journal.pone.0206024
  • Rommers, N., Rössler, R., Verhagen, E., Vandecasteele, F., Verstockt, S., Vaeyens, R., Lenoir, M., D’hondt, E., & Witvrouw, E. (2020). A machine learning approach to assess injury risk in elite youth football players. Medicine & Science in Sports & Exercise, 52(12), 1745–1751. https://doi.org/10.1249/mss.0000000000002305
  • Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310. https://doi.org/10.1214/10-STS330
  • Weir, G., Alderson, J., Smailes, N., Elliott, B., & Donnelly, C. (2019). A reliable video-based ACL injury screening tool for female team sport athletes. International Journal of Sports Medicine, 40(3), 191–199. https://doi.org/10.1055/a-0756-9659
