Abstract
The aim of our research was to analyse the reliability and validity of judging on all women's apparatuses and in all sessions (qualification, all-around finals and apparatus finals) at the 2009 World University Games (Universiade) in Belgrade. To assess validity, we calculated the mean absolute deviations and mean rank deviations of the judges' execution scores. To assess consistency and reliability, we calculated Cronbach's alpha, intra-class correlation coefficients, Armor's theta and Kendall's coefficient of concordance (W). The vault and floor exercise finals were the sessions with the highest scores and the lowest score dispersion. The highest average absolute deviation of any individual judge was 0.34 points, and the largest mean rank deviation was 0.88; most values were well below these maxima. A matrix of between-judge correlations identified three judges (out of 20) in the apparatus finals whose correlations with the other judges were markedly lower. Except in the vault and floor finals, consistency (Cronbach's alpha mostly above 0.95) and reliability (Armor's theta mostly above 0.94; intra-class correlations for single and average measures above 0.87 and 0.94, respectively) were satisfactory. In conclusion, the reliability and consistency indices were high overall. Sessions with low variability between competitors (such as the vault and floor finals in this competition) should be inspected with particular care in future analyses of judging.
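The indices named above can all be computed from a matrix of execution scores with one row per competitor and one column per judge. The Python sketch below is an illustration under stated assumptions, not the procedure used in this study: the reference score for the deviations is assumed to be the plain mean of the panel (the abstract does not say which reference was used), the intra-class correlations are assumed to be the two-way consistency forms ICC(3,1) and ICC(3,k), Kendall's W is computed without the correction for tied ranks, and all function names and demo scores are hypothetical. Armor's theta uses the standard formula p/(p-1) * (1 - 1/lambda_1), where lambda_1 is the largest eigenvalue of the between-judge correlation matrix.

```python
# Hypothetical sketch: judging-agreement indices for a score matrix in which
# scores[i][j] is the execution score given to competitor i by judge j.
import math


def columns(scores):
    return [list(col) for col in zip(*scores)]


def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance


def average_ranks(values):
    """Rank 1 = highest score; tied scores share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def judge_deviations(scores):
    """Per judge: mean |score - panel mean| and mean |rank - panel rank|.
    Assumption: the reference score is the plain mean of all judges."""
    panel = [sum(row) / len(row) for row in scores]
    panel_ranks = average_ranks(panel)
    out = []
    for col in columns(scores):
        ranks = average_ranks(col)
        abs_dev = sum(abs(x - p) for x, p in zip(col, panel)) / len(col)
        rank_dev = sum(abs(r - q) for r, q in zip(ranks, panel_ranks)) / len(col)
        out.append((abs_dev, rank_dev))
    return out


def cronbach_alpha(scores):
    k = len(scores[0])  # judges play the role of test items
    item_var = sum(var(col) for col in columns(scores))
    return k / (k - 1) * (1 - item_var / var([sum(row) for row in scores]))


def icc_consistency(scores):
    """Assumed form: two-way model, consistency, i.e. ICC(3,1) and ICC(3,k)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in scores)
    ss_cols = n * sum((sum(c) / n - grand) ** 2 for c in columns(scores))
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return single, (ms_rows - ms_err) / ms_rows


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx, syy = sum((a - mx) ** 2 for a in x), sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def armor_theta(scores, iterations=500):
    """theta = p/(p-1) * (1 - 1/lambda_1); lambda_1 is the largest eigenvalue
    of the between-judge correlation matrix, found here by power iteration."""
    cols = columns(scores)
    p = len(cols)
    corr = [[pearson(a, b) for b in cols] for a in cols]
    v, lam = [1.0] * p, 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return p / (p - 1) * (1 - 1 / lam)


def kendall_w(scores):
    """Kendall's W over the judges' rankings of competitors (no tie correction)."""
    n, m = len(scores), len(scores[0])
    judge_ranks = [average_ranks(col) for col in columns(scores)]
    totals = [sum(r[i] for r in judge_ranks) for i in range(n)]
    s = sum((t - m * (n + 1) / 2) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Made-up scores (4 competitors x 3 judges), for illustration only.
demo = [[8.9, 8.8, 9.0], [8.1, 8.3, 8.0], [7.6, 7.4, 7.7], [8.5, 8.6, 8.4]]
print(cronbach_alpha(demo), icc_consistency(demo), armor_theta(demo), kendall_w(demo))
print(judge_deviations(demo))
```

Under this two-way consistency model, Cronbach's alpha and ICC(3,k) are algebraically identical, and a judge who is systematically more lenient or strict by a constant amount does not lower the consistency indices at all; such a bias shows up only in the absolute deviations. This is one reason validity (deviations) and consistency (alpha, ICC, theta, W) are reported separately.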