Confidence Intervals for the Mean of a Log-Normal Distribution

Abstract

Methods for calculating confidence intervals for the mean are reviewed for the case where the data come from a log-normal distribution. In a simulation study it is found that a variation of the method suggested by Cox works well in practice. An approach based on generalized confidence intervals also works well. A comparison of our results with those of Zhou and Gao (1997) reveals that it may be preferable to base the interval on t values, rather than on z values.

Keywords

Generalized confidence interval

1. The problem

In applied statistics classes we sometimes come across data that need to be transformed prior to analysis. For example, income data can often be considered to be log-normal. One way of analyzing such data is to log-transform the original variable X and to base the inference on the transformed variable Y = log(X). This means that we assume that the distribution from which our data emerge can be approximated with a log-normal distribution. In this paper we will discuss interval estimation of the arithmetic mean value of X in a log-normal distribution. It is true that the median is often used to describe the average of skewed distributions like income distributions. However, there are situations when the arithmetic mean is a parameter of interest. For example, in a sample survey, a confidence interval for the average income can be used to calculate a confidence interval for the total income in the population.

Note that if X is log-normal, then the median of Y is equal to the log of the median of X. In this paper we will assume that it is the arithmetic mean of X, and not the median of X, that we want to make inference about.

It is a rather straightforward task to use the log-transformed data Y to calculate a confidence interval for the expected value (mean value) of Y. We will discuss how this result can be used to calculate a confidence interval for the expected value of X.

2. Theory and notation

Let X denote the original variable that follows a log-normal distribution.

X has expected value E(X) = θ and variance Var(X). We let Y denote the log-transformed, normally distributed variable Y = log(X), which has mean value E(Y) = μ and variance Var(Y) = σ². Denote the sample mean of Y by ȳ and the sample variance of Y by s².

It holds (see e.g. Zhou and Gao, 1997) that

$$\log \theta = \mu + \frac{\sigma^2}{2} \qquad (1)$$

This means that the mean value of X is not equal to the antilog of the mean value of Y. An estimator of log θ can be calculated from sample data as

$$\widehat{\log \theta} = \bar{y} + \frac{s^2}{2} \qquad (2)$$

An estimator of the variance of this estimator is given by

$$\widehat{\mathrm{Var}}\left(\widehat{\log \theta}\right) = \frac{s^2}{n} + \frac{s^4}{2(n-1)} \qquad (3)$$

see e.g. Zhou and Gao (1997).
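As a quick numerical check of the standard relation E(X) = exp(E(Y) + Var(Y)/2), the following sketch compares the log-normal mean with the antilog of E(Y) and evaluates the estimator of log E(X) together with its estimated variance. The function names are hypothetical, and the parameter values 5 and 1 are assumptions, chosen because they reproduce the population mean of 244.69 quoted in the numerical example.

```python
import math


def lognormal_mean(mu, sigma2):
    """Mean of X when log(X) is normal with mean mu and variance sigma2."""
    return math.exp(mu + sigma2 / 2.0)


def log_mean_estimate(ybar, s2, n):
    """Estimate of log E(X) and its estimated variance from log-scale summaries."""
    estimate = ybar + s2 / 2.0
    variance = s2 / n + s2 ** 2 / (2.0 * (n - 1))
    return estimate, variance


# Assumed parameter values (not taken verbatim from the article).
mu, sigma2 = 5.0, 1.0
mean_x = lognormal_mean(mu, sigma2)   # arithmetic mean of X
median_x = math.exp(mu)               # antilog of E(Y), i.e. the median of X

# Log-scale summaries of the example sample; ybar = 5.127 is an assumption
# taken as the midpoint of the interval [4.806, 5.448] reported in the text.
est, var = log_mean_estimate(ybar=5.127, s2=1.010, n=40)

print(f"mean of X: {mean_x:.2f}, median of X: {median_x:.2f}")
print(f"estimate of log E(X): {est:.3f}, estimated variance: {var:.4f}")
```

The gap between the two printed population values is the reason why back-transforming an interval for E(Y) does not give an interval for E(X).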

3. Confidence intervals for E(X)

3.1 A numerical example

We will illustrate a number of methods for computing a confidence interval for E(X) using a small numerical example. The methods include a naïve method based on transformation and back-transformation; a method proposed by Cox; a modified version of the Cox method; a method motivated by large-sample theory; and a method based on generalized confidence intervals (Weerahandi, 1993; Krishnamoorthy and Mathew, 2003). Other methods that have been suggested for the same purpose are reviewed in Zhou and Gao (1997), but according to their simulation results the Cox method works well in large samples, and reasonably well even in small samples.

One sample of n = 40 observations was generated, using SAS (1997) software, from a log-normal distribution with parameters E(Y) = 5 and Var(Y) = 1. The population mean of X is exp(5 + 1/2) = 244.69. The observations were transformed as Y = log(X). The raw sample data are given in Table 1. The sample data are summarized in Table 2.

Table 1. A sample of data from a log-normal distribution.

Table 2. Summary statistics for the sample data.

3.2 Naïve method

It would seem natural to use the following “naïve” approach for calculating a confidence interval for E(X). A confidence interval for E(Y) is calculated using standard methods. The limits of the confidence interval are back-transformed to give the limits in a confidence interval for E(X).

For our example data, the naïve approach would produce the point estimate exp(ȳ) = exp(5.127) ≈ 168.5, where ȳ is the sample mean of Y. A standard 95% confidence interval for E(Y) is calculated as ȳ ± t·s/√n, with limits [4.806, 5.448]. This would give limits for E(X) as e^4.806 = 122.24 and e^5.448 = 232.29. Note that this confidence interval does not cover the population mean value, which is 244.69. Of course, this can occur because of chance; after all, we have only studied one single sample so far. However, it is noteworthy that the interval does not even cover the sample mean, which is 275.0. This illustrates the fact that the naïve method gives a biased estimator of E(X): the back-transformed limits form an interval for exp(E(Y)), which is the median of X rather than its mean.
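The naïve calculation can be sketched in a few lines of Python. The function name is hypothetical; the value ȳ = 5.127 is an assumption taken as the midpoint of the quoted limits [4.806, 5.448], and 2.0227 is the tabulated 97.5% point of the t distribution with 39 degrees of freedom.

```python
import math


def naive_interval(ybar, s2, n, t_mult):
    """Back-transform a t interval for E(Y).

    The result is an interval for exp(E(Y)), the median of X, not for E(X).
    """
    half_width = t_mult * math.sqrt(s2 / n)
    return math.exp(ybar - half_width), math.exp(ybar + half_width)


# ybar is assumed (midpoint of the reported limits); s2 and n are from the text.
naive_lo, naive_hi = naive_interval(ybar=5.127, s2=1.010, n=40, t_mult=2.0227)
print(f"naive limits: {naive_lo:.1f}, {naive_hi:.1f}")
```

Because the inputs are rounded, the printed limits agree with the values in the text only to within rounding error.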

3.3 Cox method

Cox (quoted as “personal communication” in Land, 1971) has suggested that a confidence interval for E(X) can be calculated in the following way:

Calculate a confidence interval for log E(X) as

$$\bar{y} + \frac{s^2}{2} \pm z \sqrt{\frac{s^2}{n} + \frac{s^4}{2(n-1)}} \qquad (4)$$

where ȳ and s² are the sample mean and sample variance of Y, and z is the appropriate percentage point of the standard normal distribution. The limits in this confidence interval are back-transformed to give a confidence interval for E(X). The method is valid for large samples. A similar approach has been suggested by Zhou, Gao, and Hui (1997) for the two-sample case.

For the sample data, ȳ = 5.127 and s² = 1.010. The 95% confidence interval for log E(X), obtained from (4) with z = 1.96, has confidence limits [5.248, 6.016]. Taking anti-logs we obtain the limits in the 95% confidence interval for E(X) as e^5.248 = 190.24 and e^6.016 = 409.82, respectively. A point estimate of E(X) is exp(ȳ + s²/2) = exp(5.632) ≈ 279.2.
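The Cox calculation for the example can be reproduced with the sketch below. It assumes the interval has the form ȳ + s²/2 ± z·sqrt(s²/n + s⁴/(2(n − 1))), which is consistent with the limits quoted above; the function name is hypothetical and ȳ = 5.127 is an assumed value inferred from the reported limits.

```python
import math
from statistics import NormalDist


def cox_interval(ybar, s2, n, level=0.95):
    """Cox interval for E(X) from the sample mean and variance of Y = log(X)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)         # e.g. 1.96 for 95%
    centre = ybar + s2 / 2.0                            # estimate of log E(X)
    se = math.sqrt(s2 / n + s2 ** 2 / (2.0 * (n - 1)))  # its standard error
    return math.exp(centre - z * se), math.exp(centre + z * se)


# ybar is assumed; s2 = 1.010 and n = 40 are the values given in the text.
cox_lo, cox_hi = cox_interval(ybar=5.127, s2=1.010, n=40)
print(f"Cox limits: {cox_lo:.1f}, {cox_hi:.1f}")
```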

3.4 Cox method: a modified version

In the version of (4) that was given in Zhou and Gao (1997), the standard normal variate z was used. We propose to use t, with degrees of freedom based on the d.f. for the estimate of Var(Y), that is, n − 1. There are several reasons for this suggestion. One reason is that a confidence interval for E(Y) would base the interval on t. A second reason is simply that this will produce confidence intervals with coverage closer to the nominal level. The use of z instead of t might explain the rather poor performance of the Cox interval, for small n, in the simulations in Zhou and Gao (1997); our results presented below are considerably better.

For the sample data, ȳ = 5.127 and s² = 1.010. The 95% confidence interval for log E(X), obtained from (4) with z replaced by the t value with 39 degrees of freedom, has confidence limits [5.237, 6.027]. Taking anti-logs we obtain the limits in the 95% confidence interval for E(X) as e^5.237 = 188.0 and e^6.027 = 414.7, respectively. For this sample size, the difference compared to the standard Cox method is small.
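To see how little the multiplier matters at n = 40, the sketch below computes the interval once with the z value and once with the t value. The function name is hypothetical, ȳ = 5.127 is an assumed value inferred from the reported limits, and 2.0227 is the tabulated 97.5% point of t with 39 degrees of freedom (the Python standard library has no t quantile function, so it is passed in by hand).

```python
import math
from statistics import NormalDist


def cox_type_interval(ybar, s2, n, multiplier):
    """Interval for E(X): exp of (ybar + s2/2) -/+ multiplier * standard error."""
    centre = ybar + s2 / 2.0
    se = math.sqrt(s2 / n + s2 ** 2 / (2.0 * (n - 1)))
    return math.exp(centre - multiplier * se), math.exp(centre + multiplier * se)


z_mult = NormalDist().inv_cdf(0.975)  # standard normal 97.5% point
t_mult = 2.0227                       # t 97.5% point, 39 d.f. (tabulated)

z_lo, z_hi = cox_type_interval(5.127, 1.010, 40, z_mult)
t_lo, t_hi = cox_type_interval(5.127, 1.010, 40, t_mult)
print(f"z-based: {z_lo:.1f}, {z_hi:.1f}")
print(f"t-based: {t_lo:.1f}, {t_hi:.1f}")
```

The t-based interval always contains the z-based one, since both share the same centre and standard error; the gap widens as n decreases.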

3.5 Generalized confidence intervals

Generalized confidence intervals (Weerahandi, 1993) can be used for inference about parameters where the sampling distribution is complicated. As noted in equation (2), the estimate of the lognormal mean is a function of Ȳ, which can be assumed to be normally distributed, and S², which is a function of a χ² variate. Krishnamoorthy and Mathew (2003, p. 108) suggested the following procedure for computing a confidence interval for the lognormal mean:

Calculate ȳ and s² from the data.

For i = 1 to m (where m is large, for example m = 10000)

Generate Z ∼ N(0, 1) and U² ∼ χ²(n − 1).

For each i, calculate

$$T_2 = \bar{y} - \frac{Z}{U/\sqrt{n-1}} \, \frac{s}{\sqrt{n}} + \frac{1}{2} \, \frac{s^2}{U^2/(n-1)}$$

(end i loop)

For a 95% confidence interval, the 2.5% and 97.5% percentiles for T₂ are calculated from the 10000 simulated values. These are the lower and upper limits in a confidence interval for log E(X). This means that a 95% confidence interval for the lognormal mean is obtained as [exp(T_{2;0.025}), exp(T_{2;0.975})].
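The simulation steps above can be sketched as follows. This is a minimal illustration rather than the authors' SAS code: the function name, the seed, and the simple order-statistic percentile rule are assumptions, the pivotal quantity is written in the form T₂ = ȳ − Z/(U/√(n − 1))·s/√n + ½·s²/(U²/(n − 1)), and ȳ = 5.127 is an assumed value inferred from the limits reported for the example.

```python
import math
import random


def generalized_ci(ybar, s2, n, m=10000, level=0.95, seed=1):
    """Simulation-based generalized confidence interval for the lognormal mean."""
    rng = random.Random(seed)
    s = math.sqrt(s2)
    t2_values = []
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        # Gamma(shape=(n-1)/2, scale=2) is the chi-square law with n-1 d.f.
        u2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        u = math.sqrt(u2)
        t2 = (ybar
              - z / (u / math.sqrt(n - 1)) * s / math.sqrt(n)
              + 0.5 * s2 / (u2 / (n - 1)))
        t2_values.append(t2)
    t2_values.sort()
    alpha = 1.0 - level
    lo_idx = math.floor(alpha / 2.0 * (m - 1))
    hi_idx = math.ceil((1.0 - alpha / 2.0) * (m - 1))
    # Back-transform the percentiles of T2 to the original scale.
    return math.exp(t2_values[lo_idx]), math.exp(t2_values[hi_idx])


gci_lo, gci_hi = generalized_ci(ybar=5.127, s2=1.010, n=40)
print(f"generalized CI: {gci_lo:.1f}, {gci_hi:.1f}")
```

The limits vary slightly with the seed and with m; a larger m reduces this Monte Carlo noise.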

3.6 An approach based on large-sample theory

Instead of basing the calculations on transformed data, the confidence interval may be calculated from the sample mean and sample variance of X directly, without using any transformations. According to the central limit theorem, the distribution of a sample mean can be approximated with a normal distribution if n is reasonably large, for a large class of distributions. Thus, for large samples we can calculate the confidence interval as

(5)

In our example, the 95% confidence interval calculated in this way has the limits [178.84, 371.16].
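A direct calculation on the untransformed data might look like the sketch below. Several things here are assumptions: the function name, the use of a standard normal multiplier (a t value could be substituted), and the data, which are synthetic log-normal draws because the raw sample is not reproduced here.

```python
import math
import random
import statistics
from statistics import NormalDist


def large_sample_interval(x, level=0.95):
    """Interval for E(X): sample mean -/+ normal multiplier * standard error."""
    n = len(x)
    mean = statistics.mean(x)
    se = statistics.stdev(x) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # assumed multiplier
    return mean - z * se, mean + z * se


# Synthetic data for illustration only (not the sample in Table 1).
rng = random.Random(2024)
sample = [rng.lognormvariate(5.0, 1.0) for _ in range(40)]
sample_mean = statistics.mean(sample)
ls_lo, ls_hi = large_sample_interval(sample)
print(f"mean {sample_mean:.1f}, limits {ls_lo:.1f}, {ls_hi:.1f}")
```

Unlike the intervals built on the log scale, this interval is symmetric around the sample mean, which is one reason it copes poorly with strongly skewed data in small samples.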

4. An application

The data in Table 3 are nine measurements of carbon monoxide levels in the air. The measurements were made close to a California oil refinery in 1990–1993. We will use these data to obtain confidence intervals for the mean carbon monoxide level. Initial investigations of these data, and of other similar datasets, indicate that a log-normal model may be appropriate. The data are posted at lib.stat.cmu.edu/DASL/.

Table 3. Carbon monoxide levels at an oil refinery in California.

The 95% confidence intervals for the example data, using the different methods we have discussed, are given in Table 4. It may be noted that our modified Cox method gives a somewhat wider interval than the Cox method, as expected. The generalized confidence interval has an upper limit that is well above the others, for these data.

Table 4. Lower and upper limits in confidence intervals using the different methods.

5. A simulation study

Samples of sizes 5 to 500 were generated from a log-normal distribution with parameters and replications were used. The SAS (1997) software was used for simulation and analysis. Confidence intervals for the mean value were calculated according to the methods discussed above, in each sample.

The confidence intervals included are:

the naïve approach.
the Cox approach (equation (4)), using z as multiplier.
the modified Cox method with t instead of z as multiplier.
the generalized confidence intervals. The simulation of the sampling distribution was based on 10000 replications.
the large-sample approach, i.e. equation (5).

Each interval was compared to the population mean value E(X), and the number of intervals below, covering, or above E(X) was calculated. The results that are summarized in Table 5 give the percentage of the samples that cover E(X), and the percentage of the samples that produce intervals above or below E(X).
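A scaled-down version of such a coverage study can be sketched in Python. This is an illustration under stated assumptions, not the original SAS program: the function name, the parameter values (mean 5 and standard deviation 1 on the log scale), the seed, and the 2000 replications are all assumed, and only the naïve, Cox, and modified Cox intervals are included. The value 2.2622 is the tabulated 97.5% point of t with 9 degrees of freedom.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage(n, t_mult, reps=2000, mu=5.0, sigma=1.0, seed=7):
    """Share of simulated samples whose interval covers the true E(X)."""
    rng = random.Random(seed)
    true_log_mean = mu + sigma ** 2 / 2.0  # log E(X)
    z = NormalDist().inv_cdf(0.975)
    hits = {"naive": 0, "cox_z": 0, "cox_t": 0}
    for _ in range(reps):
        # Y = log(X) is normal, so Y is simulated directly.
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.mean(y)
        s2 = statistics.variance(y)
        centre = ybar + s2 / 2.0
        se = math.sqrt(s2 / n + s2 ** 2 / (2.0 * (n - 1)))
        # exp() is monotone, so coverage can be checked on the log scale.
        hits["naive"] += abs(ybar - true_log_mean) <= t_mult * math.sqrt(s2 / n)
        hits["cox_z"] += abs(centre - true_log_mean) <= z * se
        hits["cox_t"] += abs(centre - true_log_mean) <= t_mult * se
    return {name: count / reps for name, count in hits.items()}


rates = coverage(n=10, t_mult=2.2622)
for name, rate in rates.items():
    print(f"{name}: {100 * rate:.1f}% coverage")
```

Because the three intervals are evaluated on the same simulated samples, differences in coverage reflect the methods rather than sampling noise between runs.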

Table 5. Results of the simulation study: percent of all intervals that cover the true parameter value.

6. Discussion

The results for the Cox intervals are similar to the simulations in Zhou and Gao (1997). However, the coverage percentage is improved, especially in small samples, if the intervals are based on t rather than on z. For the modified Cox approach, the percentage of intervals which cover E(X) is close to the nominal level, 95%, for all sample sizes. This also holds for the generalized confidence interval approach. Note that the modified Cox intervals are slightly asymmetric with a higher percentage to the left. The generalized confidence intervals are also slightly asymmetric, but to the right.

The large-sample method, which is based on central limit theorem arguments, gives a consistently lower coverage than 95%. Sample sizes of more than 200 seem to be needed to obtain a confidence level close to the nominal one. As expected, the intervals based on the naïve approach fail, since these intervals are intervals for some other parameter. The simulations were also run with standard deviations 0.5 and 2. All methods performed somewhat worse when the standard deviation increased, but the relationships between methods remained unchanged.

It seems that the confidence intervals based on the modified Cox method work well for practical purposes. The calculations are simple and may be performed by hand, if desired. The generalized confidence interval approach also works well; a small disadvantage is that it requires a computer to simulate the sampling distribution.

References

Krishnamoorthy, K. and Mathew, T. (2003), “Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals,” Journal of Statistical Planning and Inference, 115, 103–121.
Land, C. E. (1971), “Confidence intervals for linear functions of the normal mean and variance,” Annals of Mathematical Statistics, 42, 1187–1205.
SAS Institute Inc. (1997), SAS/STAT software: Changes and enhancements through Release 6.12, Cary, NC: SAS Institute Inc.
Weerahandi, S. (1993), “Generalized confidence intervals,” Journal of the American Statistical Association, 88, 899–905.
Zhou, X-H., and Gao, S. (1997), “Confidence intervals for the log-normal mean,” Statistics in Medicine, 16, 783–790.
Zhou, X-H., Gao, S., and Hui, S. L. (1997), “Methods for comparing the means of two independent log-normal samples,” Biometrics, 53, 1129–1135.
