Abstract
Background/aims: The Tanner-Whitehouse (TW) method for bone age determination has been the basis for many population studies and is used in many clinics. However, TW bone age raters can differ systematically from each other. The aim of the study was to present a new standard version of TW bone age rating, implemented by the automated BoneXpert method and calibrated on the manual TW stage ratings of the First Zurich Longitudinal Study.
Subjects: Hand radiographs of 231 children born in 1954–1956 were recorded annually, on average from age 5 to age 20 years. For validation, 76 X-rays of eight boys from Tanner's original Gold Series were used.
Results: The root mean square (rms) deviation between manual and automated TW ratings in the Zurich data was 0.67 years for boys in the TW bone age range 5–15 years and 0.63 years for girls in the range 5–14 years. The new standard TW rating differs systematically from two previous TW versions of the automated method, which were based on different raters.
Conclusion: The new automated TW ratings show good accuracy relative to the manual ratings of the Zurich data and the Gold Series. There are significant differences between manual TW raters, an effect which is eliminated with the automated method.
Acknowledgements
Dedicated to the memory of James Tanner (1920–2010).
Declaration of interest: The first author is the owner of the company that develops and markets the BoneXpert medical device for automated determination of bone age studied in this paper.
Notes
1 The automated method's intended BA range extends down to 2.5 years for boys and 2 years for girls (defined on the GP BA scale). The Zurich data become scarcer towards this lower limit, so the lower thresholds, typically those below BA* = 2.6 years, are not derived from the Zurich data. This is illustrated in , where it can be seen that the threshold between stages C and D for Met3 is difficult to derive from these data, as there are only four examples of stage C bones. Therefore, the C-D thresholds are defined by the Zurich data for only five bones: ulna, Met1, DP3, DP5 and MP5. For the remaining bones, the C-D threshold is defined as the BA* value at which the epiphysis is half as wide as the metaphysis (this relation is derived from another set of images extending to lower maturity than the Zurich data); for Met3, this corresponds to BA* = 1.6 years. Likewise, the A-B and B-C thresholds are defined by the Zurich data for only four bones: ulna, Met1, DP5 and MP5. For the remaining bones, the B-C threshold is defined as the BA* value at which the epiphysis width is 42% of the metaphysis width. This rule is not part of the TW method, but it was found to be a good approximation; in any case, most of the B-C thresholds occur at bone ages well below the intended bone age range of the system.
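The fallback rule above can be read as locating the BA* at which a bone's epiphysis-to-metaphysis width ratio reaches a target value: 0.5 for the C-D threshold and 0.42 for the B-C threshold. The note does not say how the ratio curve is estimated from the additional images, so the following is only a minimal Python sketch, assuming the ratio is available as samples that increase with BA* and that the crossing is found by linear interpolation. The function name `ba_at_width_ratio` and the sample values are hypothetical and are not data from the study.

```python
from typing import Sequence


def ba_at_width_ratio(
    ba_values: Sequence[float],
    width_ratios: Sequence[float],
    target_ratio: float,
) -> float:
    """Return the BA* at which the epiphysis/metaphysis width ratio first
    reaches target_ratio, interpolating linearly between samples.

    Assumes (hypothetically) that ba_values is strictly increasing and that
    width_ratios does not decrease with maturity.
    """
    if len(ba_values) != len(width_ratios) or len(ba_values) < 2:
        raise ValueError("need at least two paired samples")
    for i in range(1, len(ba_values)):
        r0, r1 = width_ratios[i - 1], width_ratios[i]
        if r0 <= target_ratio <= r1:
            if r1 == r0:
                return ba_values[i - 1]
            frac = (target_ratio - r0) / (r1 - r0)
            return ba_values[i - 1] + frac * (ba_values[i] - ba_values[i - 1])
    raise ValueError("target ratio is not bracketed by the samples")


# Made-up ratio curve for a single bone (illustrative only, not study data).
ba = [0.5, 1.0, 1.5, 2.0, 2.5]
ratio = [0.30, 0.38, 0.47, 0.55, 0.62]

cd_threshold = ba_at_width_ratio(ba, ratio, 0.50)  # epiphysis half as wide
bc_threshold = ba_at_width_ratio(ba, ratio, 0.42)  # epiphysis at 42 % width
```

With any monotone ratio curve, the 42% rule necessarily places the B-C threshold below the C-D threshold, consistent with the stage order.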
2 This excludes the D-E stage transition and the transitions below it, because the D-E transition lies on average at BA* = 5.3 years in the short bones, and in this range the Zurich data are scarcer, as is evident in . The radius and ulna are also excluded because their stages are defined differently: for example, capping is associated with stage H in the radius, and the ulna has no stage I at all.
3 A leave-one-subject-out cross-validation was performed (as in Martin et al. 2010b and Thodberg et al. 2009b). The rms errors increase by less than 0.3%, which we consider negligible.
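Leave-one-subject-out cross-validation can be sketched as follows: all images of one child are held out together, the model is refitted on the remaining children, and the held-out residuals are pooled into a single rms error, which is then compared with the rms error of the model fitted to all data. The note does not describe the model being validated, so a one-variable least-squares calibration stands in for it here; `fit_linear`, `in_sample_rms`, `loso_rms` and the toy records are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# (subject_id, automated score, manual bone age): a hypothetical record layout
Record = Tuple[str, float, float]


def fit_linear(records: Sequence[Record]) -> Callable[[float], float]:
    """Least-squares fit of manual BA on the automated score (stand-in model)."""
    n = len(records)
    mean_x = sum(r[1] for r in records) / n
    mean_y = sum(r[2] for r in records) / n
    sxx = sum((r[1] - mean_x) ** 2 for r in records)
    sxy = sum((r[1] - mean_x) * (r[2] - mean_y) for r in records)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def rms(residuals: Sequence[float]) -> float:
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def in_sample_rms(records: Sequence[Record]) -> float:
    """Fit on all records and evaluate on the same records."""
    model = fit_linear(records)
    return rms([model(x) - y for _, x, y in records])


def loso_rms(records: Sequence[Record]) -> float:
    """Hold out every image of one subject, refit, and pool held-out residuals."""
    by_subject: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_subject[rec[0]].append(rec)
    residuals: List[float] = []
    for subject, held_out in by_subject.items():
        training = [r for r in records if r[0] != subject]
        model = fit_linear(training)
        residuals.extend(model(x) - y for _, x, y in held_out)
    return rms(residuals)


# Made-up toy data: three subjects with three annual images each.
data: List[Record] = [
    ("s1", 6.0, 6.2), ("s1", 7.0, 7.1), ("s1", 8.0, 8.3),
    ("s2", 6.5, 6.4), ("s2", 7.5, 7.9), ("s2", 8.5, 8.4),
    ("s3", 5.5, 5.9), ("s3", 6.5, 6.6), ("s3", 7.5, 7.4),
]
relative_increase = loso_rms(data) / in_sample_rms(data) - 1.0
```

Holding out whole subjects rather than single images matters for longitudinal data: annual radiographs of the same child are strongly correlated, so leaving out one image at a time would let the model train on near-duplicates of the held-out case and understate the error.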
4 The Japanese TW version is normally used in conjunction with the TW2-Japan table for converting SMS into bone age (Martin et al. 2010b). Here we have instead used the TW3 conversion table, so that we can compare how the various versions rate the bones.
5 Another reason is that TW3 spans the maturity range in fewer years (15 instead of 16 in the case of girls), which alone reduces the errors by about 6%. TW3 and GP represent a similar tempo of maturation, so the rms errors of the two systems are comparable.
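The 6% figure follows from a simple proportionality, assuming that rms errors expressed in years scale linearly with the number of years over which a system spans the same maturity range:

```latex
\[
\frac{\mathrm{rms}_{15\text{-year scale}}}{\mathrm{rms}_{16\text{-year scale}}}
\approx \frac{15}{16} = 0.9375,
\qquad
1 - \frac{15}{16} = 0.0625 \approx 6\%
\]
```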