Methodology

Random error units, extension of a novel method to express random error in epidemiological studies

Pages 127-132 | Published online: 23 Jan 2019

Abstract

Currently used methods to express random error are often misinterpreted and consequently misused by biomedical researchers. Previously, we proposed a simple approach to quantify the amount of random error in epidemiological studies using the OR for binary exposures. Expressing random error with the number of random error units (REU) does not require a solid background in statistics for a proper interpretation and cannot be misused for making oversimplified interpretations that rely on statistical significance. We now expand the use of REU to the most common measures of association in epidemiology and to continuous variables, and we have developed a Stata program, which greatly facilitates the calculation of REU.

Several authors have highlighted that the concept of statistical null hypothesis testing is misleading and sometimes directly harmful for research.Citation1,Citation2 Great efforts have been made to inform researchers about misconceptions related to the use of P-values; for example, that the P-value is the probability that the result can be explained by chance or that the P-value is a direct measure of uncertainty.Citation3–Citation5 Nevertheless, statistical null hypothesis testing continues to be a primary approach in many scientific disciplines, and serious misconceptions about P-values seem to prevail.

CIs allow us to evaluate the strength of an association and its precision separately. Therefore, CIs were promoted to replace P-values and to force users to turn away from null hypothesis testing. The use of CIs has certainly increased over timeCitation6,Citation7 but it should be recognized that CIs are not immune to misinterpretations and are also widely misused.Citation4,Citation7 For example, the typical practice of checking whether the value corresponding to the null hypothesis is within the CI is just as wrong as using a P-value for statistical null hypothesis testing.Citation6,Citation7 There are numerous examples that illustrate how such misuse of CIs could lead to potentially damaging effects on clinical practice.Citation8

To quote Greenland et al: “A key problem [with P-value and confidence interval] is that there are no interpretations of these concepts that are simple, intuitive, correct and foolproof”.Citation4 Unfortunately, alternative solutions, such as Bayesian methodology, the P-value function, likelihood intervals, or the likelihood function, are just as complex as P-values or CIs. Although a deeper insight into statistics is desirable, it may be unrealistic to expect that most clinicians or other health care professionals will achieve the necessary understanding of these concepts. The continued dominance of statistical null hypothesis testing has recently triggered a drastic solution by one journal. Basic and Applied Social Psychology has banned not only references to statistical significance and the P-value but – given their widespread misuse – also the use of CIs.Citation9

However, we think that random error or uncertainty may be expressed in a way that does not require a strong background in statistics. Also, we suggest an approach that cannot be used or misused for statistical null hypothesis testing. In a previous paper, exemplified by ORs, we introduced the concept of random error units (REU) as an aid for expressing random error. REU have an easy interpretation and can help avoid hypothesis testing.Citation10 In the present paper, we extend the concept of REU to measures of association other than the OR and to continuous variables. In addition, we present a postestimation command in Stata that provides an easy method to calculate REU for various regression models.

The number of REU in a given study shows how many times more individuals are needed in an actual study to achieve the precision of a hypothetical gold standard study. Our proposed gold standard study using the OR as the measure of effect is a case–control study where the aim is to assess the effect of a binary exposure on the risk of a certain disease. This hypothetical gold standard has 500,000 cases and 500,000 controls where half the controls and half the cases are defined as exposed.
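
A compact way to state this, consistent with the derivation and the worked example in the Supplementary material, is that the standard error of the log OR in this gold standard study is 0.004, and the number of REU in an actual study is the squared ratio of its standard error to this reference value:

\[
\mathrm{REU}_{\mathrm{OR}}=\left(\frac{\mathrm{SE}_{\mathrm{study}}(\log\mathrm{OR})}{\mathrm{SE}_{\mathrm{gold}}(\log\mathrm{OR})}\right)^{2}=\left(\frac{\mathrm{SE}_{\mathrm{study}}(\log\mathrm{OR})}{0.004}\right)^{2}
\]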

To the reader, REU might provide some kind of anchoring reference for how much random error there is in a given study. Just consider the meter, a standard measure of distance that was arbitrarily constructed as one ten-millionth of the distance from the equator to the North Pole. We have previously exemplified how the use of REU may lead to a better appreciation of the amount of random error that may be present in a study and how common pitfalls of handling random error can be avoided.

The idea of REU can easily be extended to measures of association other than the OR. In Table 1, we show our choice of gold standard study and how REU can be calculated for various measures of effect, including the OR, the incidence rate ratio, the HR, the risk ratio, and the risk difference, in settings where the exposure is a binary variable.

Table 1 Random error units for the most frequent measures of effect in epidemiology

The choice of gold standard was arbitrary. Any hypothetical study could serve this purpose as long as the same one is used as the standard reference when comparing random error across real-life studies. We have chosen studies with an extremely small amount of random error as hypothetical gold standard studies. The purpose was to ensure that the number of REU in real-life studies would be unlikely to fall below one, and in practice decimal values will not be needed either. Otherwise, the interpretation of the number of REU might be awkward, as it could imply that a noninteger number of individuals would be needed to achieve the same precision as the standard hypothetical study. We can consider the value of 1 for REU as the “atom” – ie, a “nondividable” unit – of random error.

In Table 2, we use these measures of effect and present the number of REU in several hypothetical studies in which the number of participants and the proportions and distributions of the exposure and the outcome differ.

Table 2 Number of random error units in some hypothetical studies with dichotomous exposures and outcomes using incidence rate ratios, risk ratios, and risk differences

In statistical models, categorical variables with more than two categories are typically entered as binary dummy variables. Thus, calculation of the REU for these dummy variables is the same as for any binary variable. However, a continuous variable needs to be transformed to a binary variable for the calculation of REU. To dichotomize continuous variables, we propose to use a cutoff that minimizes the random error, in other words, a cutoff that produces the binary variable with the lowest number of REU. In the Supplementary material, we present a postestimation command in Stata, which automatically detects both the measure of association and the type of variables and then calculates the number of REU. In the case of continuous variables, this command also identifies the cutoff value that minimizes the amount of random error.
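
To illustrate the idea behind this cutoff search (which the reu command performs automatically), a minimal Stata sketch is given below. It assumes a binary variable outcome and a continuous variable exposure (placeholder names chosen to match the examples in Part I of the Supplementary material), loops over the observed exposure values as candidate cutoffs, and keeps the cutoff yielding the smallest standard error of the log OR; since the gold standard standard error is fixed, minimizing the standard error also minimizes the number of REU. The actual search strategy implemented in the reu command may differ.

// Illustrative sketch only - not the reu command itself
// Assumes a binary variable "outcome" and a continuous variable "exposure"
tempvar expbin
quietly gen byte `expbin' = .
local best_se = .
local best_cut = .
levelsof exposure, local(cuts)
foreach c of local cuts {
    // dichotomize the exposure at the candidate cutoff
    quietly replace `expbin' = (exposure >= `c') if !missing(exposure)
    // fit a logistic model and skip cutoffs for which it fails
    capture quietly logit outcome `expbin'
    if _rc continue
    // keep the cutoff with the smallest (nonzero) SE of the log OR
    if _se[`expbin'] > 0 & (missing(`best_se') | _se[`expbin'] < `best_se') {
        local best_se = _se[`expbin']
        local best_cut = `c'
    }
}
display "Cutoff minimizing the SE of the log OR (and hence the REU): " `best_cut'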

In the Supplementary material, we also present the derivation of the method to calculate the number of REU for the different measures of association using the gold standard studies and demonstrate that the interpretation of the number of REU (ie, how many times more individuals an actual study would need to achieve the precision of the gold standard study) is correct.

In summary, we believe that the use of REU provides an explicit quantification of random error with an easy, intuitive interpretation, and it might help avoid some common mistakes concerning random error. However, it does not offer more than that, and it cannot per se replace existing methods. For example, it can be used as a helping tool together with the point estimate and its CIs. Those who prefer to present data without any tool that can be misused for statistical null hypothesis testing may consider presenting the effect size with accompanying information on the number of REU. We now expand the use of REU to the most common measures of association in epidemiology and to continuous variables, and we have developed a Stata program (freely available on the Boston College Archive), which greatly facilitates the calculation of REU.

Acknowledgments

The authors wish to thank Dr Kenneth J. Rothman (Research Triangle Institute and Boston University School of Public Health) for the important discussions and advice that initiated this work.

Supplementary materials

Part I. Calculation of random error units for different measures of effect using our postestimation command in Stata

The reu command is available for download from the Boston College Archive. To install it, type the following at the Stata command line: ssc install reu.

  1. OR

    • . // Example for calculating the number of random error units for an OR

    • . glm outcome exposure covar_1…covar_n, fam(bin) link(logit)

    • . reu exposure

    • //exposure is either binary or continuous; for dummies, each dummy needs to be mentioned after the reu command

    • // works equally well with logistic or logit procedures

  2. Incidence rate ratio

    • . // Example for calculating the number of random error units for an incidence rate ratio

    • . glm outcome exposure covar_1…covar_n, fam(poisson) link(log)

    • . reu exposure

    • //exposure is either binary or continuous; for dummies, each dummy needs to be mentioned after the reu command

    • // works equally well with Poisson procedure

  3. HR

    • . // Example for calculating the number of random error units for HR

    • . stset time, failure(outcome)

    • . stcox exposure covar_1...covar_n

    • //exposure is either binary or continuous; for dummies, each dummy needs to be mentioned after the reu command

    • . reu exposure

  4. Risk ratio

    • . // Example for calculating the number of random error units for risk ratio

    • . binreg outcome exposure covar_1...covar_n, rr

    • //exposure is either binary or continuous; for dummies, each dummy needs to be mentioned after the reu command

    • . reu exposure

  5. Risk difference

    • . // Example for calculating the number of random error units for risk difference

    • . binreg outcome exposure covar_1...covar_n, rd

    • //exposure is either binary or continuous; for dummies, each dummy needs to be mentioned after the reu command

    • . reu exposure

    • // works equally well with linear regression

Part II. Derivation of the method to calculate REU as presented in Table 1

  1. OR:

    SE of log OR = √(1/a+1/b+1/c+1/d)

    where a, b, c, and d refer to those having both the outcome and the exposure, those not having the outcome but being exposed, those having the outcome but not being exposed, and those having neither the outcome nor the exposure, respectively.

    Since in the gold standard a=b=c=d=250,000, it follows that SE in the gold standard is 0.004.

  2. Incidence rate ratio/HR

    SE of log incidence rate ratio=√(1/a+1/b)

    where a and b refer to exposed and unexposed cases, respectively.

    Since a=b=250,000, it follows that SE in the gold standard is 0.0028284.

  3. Risk ratio

    SE of log risk ratio = √(1/a + 1/b − 1/c − 1/d)

    where a, b, c, and d refer to the number of exposed cases, the number of unexposed cases, the total number of exposed individuals, and the total number of unexposed individuals, respectively.

    Since a=b=250,000 and c=d=500,000, it follows that SE in the gold standard is 0.002.

  4. Risk difference

    SE of risk difference = √(a(c − a)/c³ + b(d − b)/d³)

    where a, b, c, and d refer to the number of exposed cases, the number of unexposed cases, the total number of exposed individuals, and the total number of unexposed individuals, respectively.

    Since a=b=50 and c=d=500,000, it follows that SE in the gold standard is 0.00002.
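
For readers who wish to check the numbers, the arithmetic behind the four gold-standard standard errors given above is:

\[
\begin{aligned}
\text{OR:}\quad & \sqrt{4/250{,}000}=\sqrt{0.000016}=0.004\\
\text{Incidence rate ratio/HR:}\quad & \sqrt{2/250{,}000}=\sqrt{0.000008}\approx 0.0028284\\
\text{Risk ratio:}\quad & \sqrt{2/250{,}000-2/500{,}000}=\sqrt{0.000004}=0.002\\
\text{Risk difference:}\quad & \sqrt{2\times 50\times 499{,}950/500{,}000^{3}}=\sqrt{3.99996\times 10^{-10}}\approx 0.00002
\end{aligned}
\]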

Part III. Demonstration of the interpretation of the REU

The number of random error units shows how many times more individuals an actual study would need to achieve the precision of the gold standard study. We first demonstrate this interpretation with an example for the OR. We consider a study of 100 individuals, half of them exposed to a dichotomous exposure that has no effect on the likewise dichotomous outcome, which is also present in half of the individuals. The standard error of the log OR in this study is 0.4, and consequently the number of random error units is 10,000. If we scale this study up by a factor of 10,000 (keeping the proportions of exposed individuals and of those with the outcome constant), we obtain exactly the proposed gold standard study (ie, a study of one million individuals, half of them exposed to a dichotomous exposure that has no effect on the outcome, which is also present in half of the individuals). More generally, decreasing the standard error of a study by a factor of n requires n² times as many observations (provided that the distribution of the exposure and outcome is constant).

SE/n = (1/n)√(1/a + 1/b + 1/c + 1/d) = √(1/(n²a) + 1/(n²b) + 1/(n²c) + 1/(n²d))

The same can be shown for the SE of the other measures of association.
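
For readers who wish to verify the numbers in the example above: in the 100-person study, a = b = c = d = 25, so

\[
\mathrm{SE}(\log\mathrm{OR})=\sqrt{\tfrac{1}{25}+\tfrac{1}{25}+\tfrac{1}{25}+\tfrac{1}{25}}=\sqrt{0.16}=0.4,\qquad
\mathrm{REU}=\left(\frac{0.4}{0.004}\right)^{2}=10{,}000,
\]

consistent with the gold-standard standard error of 0.004 derived in Part II.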

Disclosure

The authors report no conflicts of interest in this work.

References