
Repeatability imprecision from analysis of duplicates of patient samples and control materials

Pages 210-214 | Received 22 Sep 2019, Accepted 26 Dec 2019, Published online: 03 Jan 2020

Abstract

Measurement imprecision is usually calculated from measurement results of the same stabilized control material(s) obtained over time and is therefore, in principle, only valid at the concentration(s) of the selected control material(s). The resulting uncertainty has been obtained under reproducibility conditions and corresponds to the conventional analytical goals. Furthermore, the commutability of the control materials used determines whether the imprecision calculated from the control materials reflects the imprecision of measuring patient samples. Imprecision estimated from measurements of patient samples uses fully commutable samples, freely available in the laboratories. It is commonly obtained by measuring routine patient samples twice each and calculating the imprecision from the differences between the paired results. Since the duplicates are usually analysed throughout the entire concentration interval of the patient samples processed in the laboratory, the result will be a weighted average of the repeatability imprecision in the chosen measurement intervals or throughout the entire interval of concentrations encountered in patient care. In contrast, the uncertainty derived from many measurements of control materials over periods of weeks is usually obtained under reproducibility conditions. Consequently, the repeatability and reproducibility imprecision play different roles in the inference of results in clinical medicine. The purpose of the present review is to detail the properties of the imprecision calculated from duplicates of natural samples, to explain how it differs from imprecision calculated from single concentrations of control materials, and to elucidate what precautions need to be taken in case of bias, e.g. due to carry-over effects.

Variance components

Results of repeated measurements of the same sample are assumed to vary randomly and thus be normally distributed, making the average and the standard deviation optimal measures of the central tendency and variation, respectively. However, in the analytical laboratory the measurement methods need to be characterized in more detail, in particular the variance of results obtained under repeatability conditions, i.e. no changes in the experimental conditions between measurements (within series or 'runs'), in contrast to reproducibility conditions, where one or several experimental conditions are changed between measurements (between series or 'runs'). In laboratory medicine practice the changes in conditions are typically limited to measuring systems, time and possibly, but not necessarily, reagents and calibrators. The actual changes between days or runs are not always known in detail, and reproducibility conditions thus stand in contrast to repeatability conditions. These conditions are also addressed by the concept 'intermediate measurement precision'.

The one-way analysis of variance (ANOVA) is designed to compare the averages of several series of measurements, investigating whether there is a difference between the averages of the series. Calculation tools are readily available in statistical and spreadsheet programs, and the results are commonly reported in a standardized format comprising the within-, between- and total sums of squares, the degrees of freedom for each category and the 'mean squares' (MS) of the between- and within-group variation.

The information provided by the ANOVA can also be used to analyse the variance components, i.e. the repeatability and the reproducibility, since the mean squares ($MS_b$ and $MS_w$) are measures of the corresponding variances. However, the calculation of the between-series variance necessarily includes the within-series variance and therefore needs to be corrected to obtain a 'pure' between-series variance (reproducibility), $s_b^2$, before calculating the total variance. The correction is
(1) $s_b^2 = \dfrac{MS_b - MS_w}{n_0}$
where $n_0$ is the average number of observations in the groups, which is acceptable even in a slightly unbalanced design, i.e. with different numbers of observations in each series (group).

The total variance is the sum of $MS_w$ and $s_b^2$. If $MS_b < MS_w$, (1) would become negative and is therefore given the value zero, and the total imprecision becomes equal to the repeatability. If this occurs even though the conditions are varied, it signifies stability against changing external conditions. A typical consequence would be allowing extended periods between calibrations.
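As an illustration of how the variance components can be obtained in practice, the following is a minimal Python sketch, not taken from the original paper: it computes $MS_w$, the 'pure' between-run variance of (1), truncated at zero, and the total variance from results of the same control sample grouped by run. The function name and the data are invented for illustration.

```python
import numpy as np

def variance_components(runs):
    """One-way ANOVA variance components for repeated measurements of the
    same (control) sample, grouped by run/series.

    runs: list of 1-D arrays, one array of results per run.
    Returns the repeatability variance (MSw), the 'pure' between-run
    variance sb2 of Equation (1), truncated at zero, and the total variance.
    """
    k = len(runs)                                   # number of runs (groups)
    n_i = np.array([len(r) for r in runs])          # observations per run
    n_total = n_i.sum()
    grand_mean = np.concatenate(runs).mean()

    ss_w = sum(((r - r.mean()) ** 2).sum() for r in runs)          # within-run SS
    ss_b = sum(n * (r.mean() - grand_mean) ** 2
               for n, r in zip(n_i, runs))                         # between-run SS

    ms_w = ss_w / (n_total - k)                     # repeatability variance
    ms_b = ss_b / (k - 1)

    n0 = n_i.mean()                                 # average group size, acceptable
                                                    # in slightly unbalanced designs
    sb2 = max((ms_b - ms_w) / n0, 0.0)              # Equation (1), set to zero if negative
    return ms_w, sb2, ms_w + sb2

# Example: one control material measured in duplicate in five runs (invented data)
runs = [np.array(x) for x in ([5.1, 5.3], [5.0, 5.2], [5.4, 5.3], [5.2, 5.1], [5.0, 5.1])]
print(variance_components(runs))
```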

The calculated variances are strictly true only for the concentration – or assigned concentration – of the sample used in the ANOVA design.

The Dahlberg formula

In the days of Laplace [Citation1] and Gauss [Citation2] (beginning of the nineteenth century) and Pearson (end of the nineteenth century) there was a struggle regarding optimal methods for describing the variation of measured values. This was beautifully resolved by the Gauss normal frequency distribution function, as for instance discussed in the seminal publication by the astronomer Charlier in 1910 [Citation3]. It is challenging to follow the train of thought of the authors of the time in their attempts to determine the 'dispersion', Streuung (in German), or 'standard deviation' of the random results of measurements. Moreover, the archaic terminology in the literature varies and the conclusions are often difficult to follow for the non-statistician.

Before the advent of analogue and digital computers the numerical treatment of databases was a major hurdle, and special tricks and means were tried to estimate the dispersion and the average. Dahlberg eventually discussed and formulated [Citation4] the calculation of the dispersion of results from duplicate measurements, and it became recognized as the standard estimate derived from duplicates:
(2) $s = \sqrt{\dfrac{\sum_{i=1}^{N} (x_{i,1} - x_{i,2})^2}{2 \times N}} = \sqrt{\dfrac{\sum_{i=1}^{N} d_i^2}{2 \times N}}$
where $d$ is the difference between duplicate results and $N$ the number of duplicate pairs. It may be difficult to immediately see how this expression relates to the definition of the standard deviation and to understand its inference, and it is therefore justified to derive the Dahlberg formula from the definition of the standard deviation ($s$) and variance ($s^2$) of a sample.

The standard deviation ($s$), which in a graph of the normal distribution represents the distance between the maximum (average) of the bell-shaped curve and the first inflexion point and corresponds to about 1/3 of the area under the curve, is calculated as
(3) $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
for a sample of the population, where $x_i$ is the individual observation, $\bar{x}$ the average of the observations and $n$ the number of observations in the sample.

The Dahlberg formula can be understood from the formula for pooling variances. If the variance ($s^2$) has been calculated for the same quantity using the same measurement procedure in a commutable matrix but on various occasions, it is reasonable to assume that the calculated standard deviations belong to identical distributions. The uncertainty of the overall estimate of the variance can be reduced by pooling the estimates from many experiments. This is the same principle as reducing the uncertainty of an average by pooling the averages calculated for many similar or identical experiments. Unless the number of observations in each experiment is the same, the individually calculated results must be weighted by the number of observations in each experiment. This is true when pooling averages as well as variances. Therefore, the general expression of a pooled variance is a 'best estimate' of the standard deviation of several samples [Citation5] within a measurement interval:
(4) $s_{pool}^2 = \dfrac{(n_1-1) \times s_1^2 + (n_2-1) \times s_2^2 + \cdots + (n_N-1) \times s_N^2}{n_1 + n_2 + \cdots + n_N - N} = \dfrac{\sum_{i=1}^{N} (n_i - 1) \times s_i^2}{\sum_{i=1}^{N} n_i - N}$
where the variance $s_i^2$ has been calculated for each of the $N$ studies (runs), each comprising $n_i$ observations.

If the same quantity is measured the same number of times ($n$) and by the same procedure in $N$ runs, then expression (4) is simplified to
(5) $s_{pool}^2 = \dfrac{(n-1) \times (s_1^2 + s_2^2 + \cdots + s_N^2)}{N \times n - N} = \dfrac{s_1^2 + s_2^2 + \cdots + s_N^2}{N} = \dfrac{\sum_{i=1}^{N} s_i^2}{N}$
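The pooling can be made explicit in a short Python sketch, not part of the original paper; the function name and data are invented, and with equal group sizes the result coincides with the simple mean of the group variances, as in (5).

```python
import numpy as np

def pooled_variance(groups):
    """Pooled within-group variance, Equation (4): the group variances are
    weighted by their degrees of freedom (n_i - 1)."""
    n = np.array([len(g) for g in groups])
    s2 = np.array([np.var(g, ddof=1) for g in groups])
    return ((n - 1) * s2).sum() / (n.sum() - len(groups))

# With equal group sizes the pooled variance reduces to the simple mean of the
# group variances, Equation (5); the data are generated only for illustration.
rng = np.random.default_rng(1)
groups = [rng.normal(10, 0.5, size=4) for _ in range(6)]
print(pooled_variance(groups))
print(np.mean([np.var(g, ddof=1) for g in groups]))   # identical for equal n
```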

In duplicate measurements there are only two observations in each run and the average can be calculated by the expression $\dfrac{x_{i,1} + x_{i,2}}{2}$. When this expression of the average is inserted into the formula, and with algebraic simplifications, the pooled variance based on two observations (duplicates) will be
(6) $s_{pool}^2 = \dfrac{\sum_{i=1}^{N}\left[\left(x_{i,1} - \dfrac{x_{i,1}+x_{i,2}}{2}\right)^2 + \left(x_{i,2} - \dfrac{x_{i,1}+x_{i,2}}{2}\right)^2\right]}{N} = \dfrac{\sum_{i=1}^{N} 2\left(\dfrac{x_{i,1}-x_{i,2}}{2}\right)^2}{N} = \dfrac{\sum_{i=1}^{N} (x_{i,1}-x_{i,2})^2}{2 \times N} = \dfrac{\sum_{i=1}^{N} d_i^2}{2 \times N}$
where $N$ is the number of pairs (runs) and $d$ the difference between the duplicates in each run.

The Dahlberg formula calculates the repeatability imprecision of measurements, i.e. those performed under 'repeatability conditions' according to the VIM definition [Citation6]. Notably, the repeatability imprecision (within series) is different from the reproducibility imprecision (between series) obtained by repeated measurements of the same sample over days and weeks. Reproducibility is not addressed by the Dahlberg formula. In contrast to the variance calculated from single control samples, the Dahlberg formula calculates the weighted average of the estimated repeatability imprecision over the entire studied concentration interval. It is also different from the total imprecision or method imprecision, which is the sum of the variances of the repeatability and the 'pure' reproducibility (1).
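As an illustration of (2), a minimal Python sketch follows; the function name and the duplicate results are invented for illustration only.

```python
import numpy as np

def dahlberg_sd(x1, x2):
    """Dahlberg repeatability standard deviation, Equation (2):
    s = sqrt( sum(d_i^2) / (2 * N) ), where d_i = x1_i - x2_i."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.sqrt((d ** 2).sum() / (2 * len(d)))

# Example: first and repeat results of the same patient samples (invented data)
first  = [4.2, 5.6, 7.1, 3.9, 6.4]
repeat = [4.3, 5.5, 7.0, 4.1, 6.4]
print(dahlberg_sd(first, repeat))
```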

By partitioning the results from a patient cohort and calculating the Dahlberg repeatability for each partition, an imprecision profile can be estimated [Citation7].

If the standard deviation can be assumed constant (homoscedastic) in the measuring interval, the Dahlberg uncertainty can be calculated from formula (2). If, however, it is more likely that the standard deviation is proportional to the concentration (heteroscedastic), then the relative difference of each pair is usually preferred to calculate a relative Dahlberg standard deviation:
(7) $s_{rel} = \sqrt{\dfrac{\sum_{i=1}^{N} \left( \dfrac{2 \times (x_{i,1} - x_{i,2})}{x_{i,1} + x_{i,2}} \right)^2}{2 \times N}}$
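A corresponding Python sketch for the relative (heteroscedastic) case, again not from the original paper and with invented names and data; it also indicates how an imprecision profile, as described above, could be obtained by grouping the pairs into concentration bins by their means. The bin limits are arbitrary.

```python
import numpy as np

def dahlberg_cv(x1, x2):
    """Relative Dahlberg repeatability, Equation (7), returned as a %CV."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    rel_d = 2 * (x1 - x2) / (x1 + x2)          # relative difference of each pair
    return 100 * np.sqrt((rel_d ** 2).sum() / (2 * len(rel_d)))

# Imprecision profile: group the pairs by their means into concentration bins
# and report one %CV per bin (invented data, two arbitrary partitions).
first  = np.array([4.2, 5.6, 7.1, 3.9, 6.4, 12.8, 15.1, 19.6])
repeat = np.array([4.3, 5.5, 7.0, 4.1, 6.4, 12.5, 15.4, 19.9])
means = (first + repeat) / 2
edges = np.quantile(means, [0.0, 0.5, 1.0])
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (means >= lo) & (means <= hi)
    print(f"{lo:.1f}-{hi:.1f}: %CV = {dahlberg_cv(first[mask], repeat[mask]):.2f}")
```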

The result of the calculation of the relative Dahlberg uncertainty is commonly expressed as the coefficient of variation, CVD, or as the percentage %CVD. The CVD is appropriately viewed as the best estimate of a relative variation within a given measuring interval. The differences are relative to different quantities, i.e. the averages of the individual pairs of duplicates in the measuring interval, and are therefore not directly comparable to a conventional relative standard deviation, which is relative to the average of the measurement results of the control sample. The Dahlberg %CV is thus based on the averages of all measured samples, which are not always explicitly known.

Numerous authors [Citation8,Citation9] have described the use of duplicate measurements for the estimation of imprecision. They have been especially popularized in the field of dentistry/orthodontics [Citation10]. A recent collection of papers in the journal Accreditation and Quality Assurance [Citation11–14] has also elucidated aspects of the matter.

Bias and the expanded Dahlberg formula

The Dahlberg formula is defined for measurements under true repeatability conditions, i.e. no systematic changes between the first and repeat measurements. However, there may be non-random differences (bias) caused by, for instance, sample degradation, 'carry-over', reagent decay etc., even if the measurements are made in immediate succession. This bias may be constant for all measurements or vary between the pairs of measurements. In the latter case the bias can be expressed as the average of the non-random differences. To retain a primary repeatability, i.e. to minimize the influence of the bias or change of measurement conditions, a correction can be applied.

To intuitively understand this correction, one has to consider the nature of an average. If a series of repeated observations is related to the first by a randomly distributed difference, which may be reasonable to assume, the average of the first and repeat series of observations will be the same. A consequence is that the average of the differences between the duplicates is zero.

For example, if we have the results of two observations of a series of samples (x1, x2…xn) and the pairs of observations differ by a random value ±d, then the results will be (x1), (x1+d); (x2), (x2-d)…. and so on.

This can be summarized in a formula:
(8) $\text{Average} = \bar{x}_{rep} = \dfrac{\sum_{i=1}^{n} (x_i \pm d_i)}{n}$, where $d = f(N, x_i, s)$

If an average of the repeat series of observations is calculated from a large enough number of observations, then the random differences will cancel out and the averages of the series will be the same, i.e. $\bar{x}_{orig} = \bar{x}_{rep} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.

The average of the differences will be zero because the sum of the positive deviations will be the same as the sum of the negative deviations:
(9) $\sum_{i=1}^{n} (+d) = \sum_{i=1}^{n} (-d)$

If the variance of the original series were $Var_1$ and the variance of the random variation $Var_d$, then the variance of the second series will be $Var_1 + Var_d$. As a seemingly paradoxical consequence the uncertainty of the repeat series of measurements will be larger than that of the first.

If there is a non-random component in the difference between the series, then the average would, written in the same format as above and assuming that the average of the error term is $e$, be
(10) $\bar{x}_{rep} = \dfrac{\sum_{i=1}^{n} (x_i \pm d + e)}{n}$

Since, in the average calculated from a large number of observations, the random deviations cancel out, the average in (10) is reduced to $\bar{x}_{rep} = \dfrac{\sum_{i=1}^{n} (x_i + e)}{n}$.

We apply this reasoning to the Dahlberg formula (2) and assume that the repeat observation is $x_2 + e$. Then the average of duplicate results will be $\dfrac{x_1 + (x_2 + e)}{2}$.

Thus,
(11) $s^2 = \dfrac{\sum_{i=1}^{N} \left[ \left( x_1 - (x_2 + e) \right) - \overline{\left( x_1 - (x_2 + e) \right)} \right]^2}{2 \times (N-1)} = \dfrac{\sum_{i=1}^{N} \left( d_i - \bar{d} \right)^2}{2 \times (N-1)}$
which will accommodate a bias, i.e. the average of the non-random differences, the magnitude of which is the difference between the averages of the original and duplicate results [Citation14]. Since we lose one degree of freedom in the calculation of the average of the differences, the Bessel correction, $N - 1$, is required for small samples. This correction was described by Ingervall in 1964 [Citation15] and recognized as the 'method of moments error' (MME) by Springate [Citation8] and discussed by Hyslop [Citation9].
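For illustration, a minimal Python sketch of the expanded Dahlberg formula (11), not taken from the original paper; the function name and data are invented, and a constant carry-over-like bias is added to the repeat series to show that the estimate is insensitive to it.

```python
import numpy as np

def expanded_dahlberg_sd(x1, x2):
    """Expanded Dahlberg / 'method of moments error', Equation (11):
    the mean difference (bias) between first and repeat results is subtracted
    from each difference, and one degree of freedom is lost (N - 1)."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.sqrt(((d - d.mean()) ** 2).sum() / (2 * (len(d) - 1)))

# Invented example: a constant carry-over-like bias of 0.2 units is added to the
# repeat results; the expanded Dahlberg estimate is unaffected by this constant bias.
first  = np.array([4.2, 5.6, 7.1, 3.9, 6.4])
repeat = np.array([4.3, 5.5, 7.0, 4.1, 6.4]) + 0.2
print(expanded_dahlberg_sd(first, repeat))
```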

The Dahlberg formula and the expanded Dahlberg formula (ExpD, 11), which is derived above, can be compared to the MSE, 'mean square error', and its root, the RMS, 'root mean square', respectively:
$MSE = \dfrac{\sum_{i=1}^{N} (x_i - x_0)^2}{N - 1}$ and $RMS = \sqrt{\dfrac{\sum_{i=1}^{N} x_i^2}{N}}$
where $x_0$ is a predetermined value, e.g. the assumed target or average, and the RMS describes the width of a distribution centred at zero.

Even a small non-random deviation may influence the calculations. Since a zero non-random bias cannot always be assumed or justified, the expanded Dahlberg formula is a safe option to estimate the repeatability. If the non-random error is absent or small (as indicated by the average of the differences), the two formulas will yield identical results. Otherwise the original Dahlberg formula (2) runs the risk of overestimating the variance. The inconvenient necessity of calculating the average of the differences may not be a major hurdle with access to ample computer capacity.

The clinical interpretation of repeatability and reproducibility

The total imprecision represents the sum of two variances, the repeatability and the pure reproducibility variances, but disregards any bias. Some ambiguity exists in expressing the total imprecision, since the imprecision calculated as the variance across all observations of a control sample is also commonly called the total variance. That is only statistically appropriate if the pure reproducibility is negligible. Although this method risks underestimating the true total variance, the effect is generally small in practical work.

For the interpretation of results of clinical investigations at least three situations can be identified, with different demands on the analytical sensitivity and therefore on the measurement uncertainty. Generally, the smaller the uncertainty is, the higher the analytical sensitivity will be, as expressed in the 'minimal difference', MD, i.e. the smallest significant difference between two results:
(12) $MD = k \times \sqrt{u_1^2 + u_2^2}$

If $u_1 = u_2$, as would be expected under repeatability conditions, (12) is simplified to
(13) $MD_i = u_i \times k \times \sqrt{2}$
where $u_i$ is the measurement uncertainty and $k$ the coverage factor (usually 2 for a confidence level of 95%).
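A small worked example in Python, not part of the original paper, with an invented function name and illustrative numbers, showing that (12) and (13) give the same result when the two uncertainties are equal.

```python
import math

def minimal_difference(u1, u2, k=2.0):
    """Minimal (significant) difference between two results, Equation (12):
    MD = k * sqrt(u1**2 + u2**2)."""
    return k * math.sqrt(u1 ** 2 + u2 ** 2)

# Illustrative numbers only: a repeatability uncertainty of 0.15 units for
# both results and a coverage factor k = 2 (about 95% confidence).
print(minimal_difference(0.15, 0.15))      # Equation (12)
print(0.15 * 2 * math.sqrt(2))             # Equation (13) gives the same result
```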

Consequently, in the monitoring of short-term and fast events, e.g. changes of P/S-Troponin in myocardial infarctions, the repeatability would be the most appropriate information and suggest a higher analytical sensitivity, i.e. indicate a smaller significant change than if the inference were based on reproducibility. On the other hand, in the monitoring of chronic diseases, e.g. diabetes, or when comparing a result with a reference interval, reproducibility may be the adequate choice, i.e. a larger difference would be necessary for a clinically significant difference. In screening of patient populations, imprecision estimates based on further assumptions may be necessary, e.g. including uncertainty caused by preanalytical conditions or special effects of the prevalence of the condition.

Comparison between the analysis of variance components and Dahlberg repeatability

The repeatability in the ANOVA is calculated from the sum of squares of the differences between the observations and their average for each group (series). The repeatability variance, expressed as the mean square ($MS_w$), is obtained by dividing the sum of squares by the degrees of freedom, i.e. the total number of observations (n) minus the number of groups (k), i.e. (n − k). With this method the repeatability can be calculated from any number of observations in each group. If the groups contain only two observations, the number of groups will always be n/2 and the df is also n/2, i.e. the number of pairs, or samples. In that case $MS_w$ equals the Dahlberg variance.

In the Dahlberg calculation the sum of squares of the differences between the observations and the average of each pair is also the starting point. A further simplification (6) of the formula uses the information that the average is the sum of the two observations divided by 2 and thus, the pooled variances become the sum of the squared difference between observations divided by the number of observations, equal to twice the number of samples determined. The Dahlberg expression is only applicable to duplicates. The degrees of freedom will be the number of pairs, or samples, i.e. the same as for repeatability variance in the analysis of variance components [Citation16,Citation17].

The similarity of the procedures emphasizes their nature of estimating repeatability. A difference is that the comprehensive analysis of variance components is only applicable to one sample concentration but to any number of observations in the groups, whereas the Dahlberg approach can only be applied to duplicate samples within an extended measuring interval and thus represents a best estimate of the repeatability of measurements in that interval.
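The equality of $MS_w$ and the Dahlberg variance for duplicates can be checked numerically; the following Python sketch, not from the original paper and with invented data, illustrates it.

```python
import numpy as np

# For duplicate measurements, the within-group mean square (MSw) of a one-way
# ANOVA equals the Dahlberg variance (the square of Equation (2)).
rng = np.random.default_rng(7)
first = rng.normal(10, 1, size=20)
repeat = first + rng.normal(0, 0.3, size=20)    # duplicates with repeatability noise

d = first - repeat
dahlberg_var = (d ** 2).sum() / (2 * len(d))    # Equation (2), squared

pair_means = (first + repeat) / 2
ss_w = ((first - pair_means) ** 2 + (repeat - pair_means) ** 2).sum()
ms_w = ss_w / len(d)                            # df = n - k = 2N - N = N pairs

print(np.isclose(dahlberg_var, ms_w))           # True
```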

Conclusions

The repeatability imprecision calculated by the Dahlberg formula is a 'best estimate' of an average repeatability imprecision in a chosen measurement interval, using commutable samples. This imprecision may vary substantially within the interval of concentrations encountered in medical laboratories. In case a non-random variation between the first and the repeat results cannot be excluded, a correction should be applied, which results in a different formula, the expanded Dahlberg (method of moments error). It is important to carry out the repeat measurements as closely as possible to the first, to be vigilant to systematic influences and to observe the measuring interval. A relative repeatability estimated by the Dahlberg formula is based on the averages of the pairs of results and is thus not directly comparable to a variance relative to a defined value, e.g. the average of a set of results.

It is argued that the repeatability calculated by the analysis of variance components technique only represents the variance at one concentration but can be based on many repeated observations in each group.

In an ideal situation, the user should match the analytical goal with the clinical need; thus, repeatability should be the goal of choice when short-term developments and small changes of a biomarker are evaluated, whereas the reproducibility variation is more appropriate when monitoring changes over extended periods of time.

Disclosure statement

No potential conflict of interest was reported by the authors.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

References

  • Laplace PS. Théorie analytique des probabilités. Courcier, Paris; 1812.
  • Gauss CF. Theoria motus corporum coelestium. Göttingen: Gesellschaft Wissenschaft; 1809. (Werke 7 K).
  • Charlier C. Grunddragen av den matematiska statistiken. Lund: Statsvetenskaplig Tidskrifts Expedition; 1910.
  • Dahlberg G. Statistical methods for medical and biological students. London: G. Allen & Unwin Ltd.; 1940.
  • Pearson K. On lines and planes of closest fit to systems of points in space. Phil Mag. 1901;2(11):559–572.
  • JCGM. International vocabulary of metrology — Basic and general concepts and associated terms (VIM 3): Bureau International des Poids et Mesures; 2012 [2019 Apr 8]. 3 ed. Available from: https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf.
  • Kallner A, Petersmann A, Nauck M, et al. Measurement repeatability profiles of eight frequently requested measurands in clinical chemistry determined by duplicate measurements of patient samples. Scand J Clin Lab Invest. 2020.
  • Springate SD. The effect of sample size and bias on the reliability of estimates of error: a comparative study of Dahlberg's formula. Eur J Orthod. 2012;34(2):158–163.
  • Hyslop NP, White WH. Estimating precision using duplicate measurements. J Air Waste Manag Assoc. 2009;59(9):1032–1039.
  • Houston W. The analysis of errors in orthodontic measurements. Am J Orthod. 1983;83(5):382–390.
  • Roesslein M, Wolf M, Wampfler B, et al. A forgotten fact about the standard deviation. Accred Qual Assur. 2007;12(9):495–496.
  • Rosslein M, Rezzonico S, Hedinger R, et al. Repeatability: some aspects concerning the evaluation of the measurement uncertainty. Accred Qual Assur. 2007;12:425–434.
  • Hall BD, Willink R. A comment on: A forgotten fact about the standard deviation. Accred Qual Assur. 2008;13(1):57–58.
  • Synek V. Evaluation of the standard deviation from duplicate results. Accred Qual Assur. 2008;13(6):335–337.
  • Ingervall B. Retruded contact position of mandible. A comparison between children and adults. Odontologisk Revy. 1964;15:130–149.
  • Kallner A, Theodorsson E. An experimental study of methods for the analysis of variance components in the inference of laboratory information. Scand J Clin Lab Invest. 2019:1–8.
  • IUPAC. Compendium of analytical nomenclature - Definitive Rules 2000 ("The Orange Book"), 3rd ed. [Updated 2002]. IUPAC; 1998. Available from: http://old.iupac.org/publications/analytical_compendium.