
Ordered Double Ranked Set Samples and Applications to Inference


SYNOPTIC ABSTRACT

Balakrishnan and Li (2005) introduced the use of ordered ranked set sampling (ORSS) and derived the best linear unbiased estimators (BLUEs) under ORSS (BLUEs-ORSS). In this study, we extend the work to ordered double ranked set sampling (ODRSS) scheme by using the idea of order statistics from independent and nonidentically distributed random variables. The BLUEs of the location and the scale parameters of a location-scale family of distributions are derived using ODRSS (BLUEs-ODRSS). It is shown that the BLUEs-ODRSS are uniformly better than the BLUEs-ORSS for the two-parameter exponential, normal, and generalized geometric distributions. Furthermore, we also study the properties of the distribution-free confidence intervals for quantiles and tolerance intervals based on ODRSS. We show that the confidence and tolerance intervals under the ODRSS scheme are more precise than their counterparts based on the ORSS scheme.

1. Introduction

McIntyre (1952) proposed an efficient sampling method for estimating pasture and forage yields, which later became known as ranked set sampling (RSS). The RSS scheme is an efficient and cost-effective alternative to simple random sampling (SRS) when the sampling units can be ranked visually or by any inexpensive method. The mathematical setup of the RSS scheme was derived by Takahasi and Wakimoto (1968). They showed that, under perfect ranking, the RSS-based mean estimator is unbiased and more precise than the mean estimator based on SRS. Later on, Dell and Clutter (1972) relaxed the assumption of perfect ranking and proved that, even in the presence of ranking errors, the RSS-based mean estimator is not only unbiased but also at least as efficient as the mean estimator based on SRS. For a detailed review, a bibliography, and a monograph on RSS, see Patil (1995); Patil, Sinha, and Taillie (1999); and Chen, Bai, and Sinha (2004), respectively.

Recently, there has been considerable research on inference procedures based on the RSS protocol. Bhoj and Ahsanullah (1996) derived the best linear unbiased estimators (BLUEs) for the unknown parameters of the generalized geometric distribution (GGD) based on RSS (BLUEs-RSS), and showed that the BLUEs-RSS are more efficient than the BLUEs based on simple order statistics (BLUEs-OS). Al-Saleh and Al-Kadiri (2000) introduced the double RSS (DRSS) scheme. DRSS also provides an unbiased estimator of the population mean, and it is uniformly better than the mean estimator based on RSS. Hossain and Muttlak (2000) obtained the BLUEs-RSS of the unknown parameters of several location-scale distributions and showed their superiority over the BLUEs-OS. Balakrishnan and Li (2005) extended the work and suggested the BLUEs based on ordered ranked set sampling (ORSS): BLUEs-ORSS. They showed that the BLUEs-ORSS are uniformly better than the BLUEs-RSS when estimating the unknown parameters of the GGD. The distribution-free confidence intervals for quantiles and tolerance intervals based on ORSS were considered by Balakrishnan and Li (2006). In a more recent work, Balakrishnan and Li (2008) obtained the BLUEs-ORSS for the two-parameter exponential, normal, and logistic distributions. For these distributions, the BLUEs-ORSS are observed to be uniformly better than the BLUEs-OS and BLUEs-RSS.

In this study, we extend the work of Balakrishnan and Li (2005, 2008) and introduce ordered double ranked set sampling (ODRSS). The BLUEs of the unknown parameters of several location-scale families of distributions are derived under ODRSS (BLUEs-ODRSS). It is shown that, in terms of relative efficiency (RE), the BLUEs-ODRSS are uniformly better than the BLUEs-ORSS. We also study the behavior of confidence intervals for quantiles and tolerance intervals based on ODRSS. Compared with the confidence and tolerance intervals constructed under ORSS, the proposed intervals under ODRSS attain higher exact confidence and tolerance levels.

The rest of the article is organized as follows. In Section 2, we briefly explain the traditional RSS, DRSS, and ODRSS schemes. In Section 3, the BLUEs-ODRSS of the unknown (location and scale) parameters of a location-scale family of distributions are derived and compared with the BLUEs-ORSS for several location-scale families of distributions in terms of REs. Section 4 provides confidence and tolerance intervals based on ODRSS. Section 5 concludes the paper.

2. Ranked Set Sampling Schemes

In this section, we explain the RSS, DRSS, and ODRSS schemes. Along the lines of Balakrishnan and Li (2005), the BLUEs of the unknown parameters of a location-scale family of distributions under the ODRSS scheme are derived.

The RSS scheme is as follows: identify $m^2$ units from the target population. Randomly allocate these units into $m$ sets, each of size $m$ units. Now rank the units within each set visually or by any inexpensive method. Select the smallest ranked unit from the first set of $m$ units. The second smallest ranked unit is selected from the second set. The procedure continues until the largest ranked unit is selected from the last set. This completes one cycle of a ranked set sample of size $m$.
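As an illustration only, one RSS cycle under perfect ranking can be sketched in Python as follows. The function name `ranked_set_sample` and the `draw` callback (a stand-in for measuring one unit) are assumptions made for this sketch; in practice the ranking would be done visually or by an inexpensive method, not by sorting measured values.

```python
import random


def ranked_set_sample(draw, m, rng):
    """One RSS cycle under perfect ranking (hypothetical helper).

    draw(rng) returns the value of one freshly selected unit.
    From the i-th set of m units, the i-th smallest unit is kept.
    """
    sample = []
    for i in range(m):
        units = sorted(draw(rng) for _ in range(m))  # rank one set of m units
        sample.append(units[i])                      # keep the (i + 1)-th smallest
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    # m = 4 uses 4**2 = 16 units and returns a ranked set sample of size 4.
    print(ranked_set_sample(lambda r: r.random(), m=4, rng=rng))
```

Only the $m$ selected units would be measured in an actual study; the sketch generates all $m^2$ values simply because that is the easiest way to imitate perfect ranking.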

Al-Saleh and Al-Kadiri (2000) introduced the DRSS scheme for estimating the population mean. DRSS is an extension of RSS. The DRSS scheme is as follows: identify $m^3$ units from the target population. Partition these units into $m$ sets, each of size $m^2$ units. Apply the RSS scheme to each set to get $m$ ranked set samples, each of size $m$. Again, apply the RSS scheme to the $m$ ranked set samples to obtain a double ranked set sample of size $m$.

The ODRSS procedure is as follows: Let $S_1, S_2, \ldots, S_m$ be $m$ sets, each of size $m^2$ units. Partition the $m^2$ units in the $i$th set $S_i$ into $m$ subsets $s_{ij}$, each of size $m$ units, i.e., $S_i = \{s_{ij}\} = \{s_{i1}, s_{i2}, \ldots, s_{im}\}$ for $j = 1, 2, \ldots, m$. The units of the $j$th subset $s_{ij}$ of the $i$th set $S_i$ are denoted by $s_{ij} = \{Y^{(i)}_{j1}, Y^{(i)}_{j2}, \ldots, Y^{(i)}_{jm}\}$. Apply the RSS procedure to these $m$ sets to get $m$ ranked set samples, each of size $m$. Suppose the $i$th set $S_i^*$ contains the $i$th ranked set sample, i.e., $S_i^* = \{Y^{(i)}_{(1)}, Y^{(i)}_{(2)}, \ldots, Y^{(i)}_{(m)}\}$ for $i = 1, 2, \ldots, m$. Again, apply the RSS procedure to these $m$ ranked sets $(S_i^*)$ to obtain a double ranked set sample of size $m$. Let $Z_i^{\mathrm{DRSS}} = i\text{th} \min\{S_i^*\}$ for $i = 1, 2, \ldots, m$; then $(Z_1^{\mathrm{DRSS}}, Z_2^{\mathrm{DRSS}}, \ldots, Z_m^{\mathrm{DRSS}})$ represents a double ranked set sample of size $m$. Let $Z_{(1)}^{\mathrm{ODRSS}} \le Z_{(2)}^{\mathrm{ODRSS}} \le \cdots \le Z_{(m)}^{\mathrm{ODRSS}}$ denote an ordered double ranked set sample obtained by arranging $Z_1^{\mathrm{DRSS}}, Z_2^{\mathrm{DRSS}}, \ldots, Z_m^{\mathrm{DRSS}}$ in increasing order of magnitude.
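The two-stage procedure can be imitated by a short simulation. The following Python sketch assumes perfect ranking at both stages; the helper names `rss_from_sets` and `ordered_double_ranked_set_sample` and the `draw` callback are hypothetical and are introduced only for this illustration.

```python
import random


def rss_from_sets(sets):
    """Given m sets of m values each, keep the i-th smallest value of the i-th set."""
    return [sorted(s)[i] for i, s in enumerate(sets)]


def ordered_double_ranked_set_sample(draw, m, rng):
    """Simulate one DRSS and the corresponding ODRSS of size m (uses m**3 units)."""
    # Stage 1: each set S_i of m**2 units is split into m subsets of size m,
    # which are reduced to one ranked set sample S_i^* of size m.
    stage1 = []
    for _ in range(m):
        subsets = [[draw(rng) for _ in range(m)] for _ in range(m)]
        stage1.append(rss_from_sets(subsets))
    # Stage 2: Z_i^DRSS is the i-th smallest value of S_i^*.
    drss = rss_from_sets(stage1)
    # Ordering step: arrange the DRSS values in increasing order.
    return drss, sorted(drss)


if __name__ == "__main__":
    rng = random.Random(2024)
    drss, odrss = ordered_double_ranked_set_sample(lambda r: r.gauss(0.0, 1.0), 3, rng)
    print(drss)
    print(odrss)
```

The function returns both the unordered DRSS values and their ordered version, which makes it easy to see that the ordering step is only a rearrangement of the same $m$ units.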

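Let $Y_1, Y_2, \ldots, Y_m$ be $m$ independent and identically distributed (IID) random variables from an absolutely continuous cumulative distribution function (CDF) $F(y)$ with probability density function (PDF) $f(y)$. Let $Y_{(1)}, Y_{(2)}, \ldots, Y_{(m)}$ be the order statistics of the sample $Y_1, Y_2, \ldots, Y_m$. Then, the CDF and PDF of $Y_{(r)}$ $(1 \le r \le m)$ are, respectively, given by

$$F_{(r)}(y) = \sum_{i=r}^{m} \binom{m}{i} \{F(y)\}^{i} \{1 - F(y)\}^{m-i}, \qquad -\infty < y < \infty,$$

$$f_{(r)}(y) = \frac{m!}{(r-1)!\,(m-r)!} \{F(y)\}^{r-1} \{1 - F(y)\}^{m-r} f(y), \qquad -\infty < y < \infty.$$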

For more details, see Arnold, Balakrishnan, and Nagaraja (1992) and David and Nagaraja (2003).

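Note that $Y_{(r)}$, $r = 1, 2, \ldots, m$, are independent and not identically distributed (INID) random variables. Following David and Nagaraja (2003), the CDF of $Z_r^{\mathrm{DRSS}}$ $(1 \le r \le m)$ is given by

$$F_r^{\mathrm{DRSS}}(z) = \sum_{i=r}^{m} \sum_{P_i} \prod_{l=1}^{i} F_{(j_l)}(z) \prod_{l=i+1}^{m} \{1 - F_{(j_l)}(z)\},$$

where $F_{(j)}(z)$ is the CDF of $Y_{(j)}$ and $\sum_{P_i}$ denotes the sum over all permutations $(j_1, j_2, \ldots, j_m)$ of $(1, 2, \ldots, m)$ for which $j_1 < j_2 < \cdots < j_i$ and $j_{i+1} < j_{i+2} < \cdots < j_m$. For further results related to the order statistics from INID random variables, see Balakrishnan (1989a,b, 2007).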

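As mentioned by Al-Saleh and Al-Kadiri (2000), the $Z_r^{\mathrm{DRSS}}$'s are INID random variables. Therefore, the CDF of $Z_{(r)}^{\mathrm{ODRSS}}$ $(1 \le r \le m)$ is given by

$$F_{(r)}^{\mathrm{ODRSS}}(z) = \sum_{i=r}^{m} \sum_{P_i^*} \prod_{l=1}^{i} F_{j_l^*}^{\mathrm{DRSS}}(z) \prod_{l=i+1}^{m} \{1 - F_{j_l^*}^{\mathrm{DRSS}}(z)\},$$

where $F_j^{\mathrm{DRSS}}(z)$ is the CDF of $Z_j^{\mathrm{DRSS}}$ and $\sum_{P_i^*}$ denotes the sum over all permutations $(j_1^*, j_2^*, \ldots, j_m^*)$ of $(1, 2, \ldots, m)$ for which $j_1^* < j_2^* < \cdots < j_i^*$ and $j_{i+1}^* < j_{i+2}^* < \cdots < j_m^*$.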

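Following Vaughan and Venables (1972) and Bapat and Beg (1989), another equivalent expression for the CDF of $Z_{(r)}^{\mathrm{ODRSS}}$ $(1 \le r \le m)$ is given by

$$F_{(r)}^{\mathrm{ODRSS}}(z) = \sum_{i=r}^{m} \frac{1}{i!\,(m-i)!} \operatorname{Per}(A_1), \tag{1}$$

where

$$A_1 = \begin{pmatrix} F_1^{\mathrm{DRSS}}(z) & F_2^{\mathrm{DRSS}}(z) & \cdots & F_m^{\mathrm{DRSS}}(z) \\ 1 - F_1^{\mathrm{DRSS}}(z) & 1 - F_2^{\mathrm{DRSS}}(z) & \cdots & 1 - F_m^{\mathrm{DRSS}}(z) \end{pmatrix} \begin{matrix} \}\; i \\ \}\; m - i \end{matrix}$$

$F_j^{\mathrm{DRSS}}(z)$ is the CDF of $Z_j^{\mathrm{DRSS}}$, and $\operatorname{Per}(A)$ represents the permanent of the matrix $A$. Here "$\}\; i$" and "$\}\; m - i$" show that the first and second rows are repeated $i$ and $m - i$ times, respectively.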
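The equivalence between the permutation-sum form and the permanent form of the CDF of an order statistic from INID variables can be checked numerically for small $m$. The Python sketch below is an illustration under stated assumptions: `probs` holds hypothetical values of the $m$ component CDFs at a fixed point $z$, the permanent is computed by brute force, and none of the function names come from the cited works.

```python
from itertools import combinations, permutations
from math import factorial, prod


def cdf_by_subsets(probs, r):
    """P(at least r of m independent variables are <= z); probs[j] = F_j(z)."""
    m = len(probs)
    total = 0.0
    for i in range(r, m + 1):
        # Each choice of which i variables fall below z is one term of the sum.
        for chosen in combinations(range(m), i):
            total += prod(probs[j] if j in chosen else 1.0 - probs[j] for j in range(m))
    return total


def permanent(matrix):
    """Brute-force permanent (sum over all permutations); adequate for small m only."""
    n = len(matrix)
    return sum(prod(matrix[k][sigma[k]] for k in range(n)) for sigma in permutations(range(n)))


def cdf_by_permanent(probs, r):
    """Same probability written with permanents: row F repeated i times, row 1 - F repeated m - i times."""
    m = len(probs)
    total = 0.0
    for i in range(r, m + 1):
        rows = [list(probs)] * i + [[1.0 - p for p in probs]] * (m - i)
        total += permanent(rows) / (factorial(i) * factorial(m - i))
    return total


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.9]  # hypothetical values of F_1(z), F_2(z), F_3(z)
    for r in (1, 2, 3):
        print(r, cdf_by_subsets(probs, r), cdf_by_permanent(probs, r))
```

The factor $1/\{i!\,(m-i)!\}$ removes the repeated counting that arises because permuting identical rows of the matrix does not change the product.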

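Similarly, the PDF of $Z_{(r)}^{\mathrm{ODRSS}}$ is given by

$$f_{(r)}^{\mathrm{ODRSS}}(z) = \frac{1}{(r-1)!\,(m-r)!} \operatorname{Per}(A_2), \tag{2}$$

where

$$A_2 = \begin{pmatrix} F_1^{\mathrm{DRSS}}(z) & \cdots & F_m^{\mathrm{DRSS}}(z) \\ f_1^{\mathrm{DRSS}}(z) & \cdots & f_m^{\mathrm{DRSS}}(z) \\ 1 - F_1^{\mathrm{DRSS}}(z) & \cdots & 1 - F_m^{\mathrm{DRSS}}(z) \end{pmatrix} \begin{matrix} \}\; r - 1 \\ \}\; 1 \\ \}\; m - r \end{matrix}$$

and $F_j^{\mathrm{DRSS}}(z)$ and $f_j^{\mathrm{DRSS}}(z)$ are the CDF and PDF of $Z_j^{\mathrm{DRSS}}$, respectively.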

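The joint density function of $Z_{(r)}^{\mathrm{ODRSS}}$ and $Z_{(s)}^{\mathrm{ODRSS}}$ $(1 \le r < s \le m)$ is given by

$$f_{(r,s)}^{\mathrm{ODRSS}}(z_1, z_2) = \frac{1}{(r-1)!\,(s-r-1)!\,(m-s)!} \operatorname{Per}(A_3), \qquad z_1 < z_2, \tag{3}$$

where

$$A_3 = \begin{pmatrix} F_1^{\mathrm{DRSS}}(z_1) & \cdots & F_m^{\mathrm{DRSS}}(z_1) \\ f_1^{\mathrm{DRSS}}(z_1) & \cdots & f_m^{\mathrm{DRSS}}(z_1) \\ F_1^{\mathrm{DRSS}}(z_2) - F_1^{\mathrm{DRSS}}(z_1) & \cdots & F_m^{\mathrm{DRSS}}(z_2) - F_m^{\mathrm{DRSS}}(z_1) \\ f_1^{\mathrm{DRSS}}(z_2) & \cdots & f_m^{\mathrm{DRSS}}(z_2) \\ 1 - F_1^{\mathrm{DRSS}}(z_2) & \cdots & 1 - F_m^{\mathrm{DRSS}}(z_2) \end{pmatrix} \begin{matrix} \}\; r - 1 \\ \}\; 1 \\ \}\; s - r - 1 \\ \}\; 1 \\ \}\; m - s \end{matrix}$$

and $F_j^{\mathrm{DRSS}}$ and $f_j^{\mathrm{DRSS}}$ are the CDF and PDF of $Z_j^{\mathrm{DRSS}}$, respectively.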

2.1. Moments of ODRSS

Let $\mu_{(r)}^{\mathrm{ODRSS}}$ $(1 \le r \le m)$ and $\sigma_{(r,s)}^{\mathrm{ODRSS}}$ $(1 \le r \le s \le m)$ denote the means and the variances and covariances based on ODRSS, respectively. The numerical values of $\mu_{(r)}^{\mathrm{ODRSS}}$ and $\sigma_{(r,s)}^{\mathrm{ODRSS}}$ can be calculated by using (2) and (3). In Tables 1 and 2, for different values of $m$, the numerical values of $\mu_{(r)}^{\mathrm{ODRSS}}$ and $\sigma_{(r,s)}^{\mathrm{ODRSS}}$ are given when sampling from four different location-scale families of distributions.

Table 1 Means under ODRSS for different distributions, $\mu_{(r)}^{\mathrm{ODRSS}}$ $(1 \le r \le m)$

Table 2 Variances and covariances under ODRSS for different distributions, $\sigma_{(r,s)}^{\mathrm{ODRSS}}$ $(1 \le r \le s \le m)$

The considered location-scale distributions are given below.

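  1. Let Y be an exponential random variable with PDF

$$f(y; \mu, \sigma) = \frac{1}{\sigma} \exp\left\{-\frac{y - \mu}{\sigma}\right\}, \qquad y \ge \mu, \; \sigma > 0.$$

  2. Let Y be a normal random variable with PDF

$$f(y; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\}, \qquad -\infty < y < \infty, \; \sigma > 0.$$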

  3. Let Y be a generalized geometric random variable having PDF

    where Note that we consider two particular cases of GGD, i.e., if v = 1, GGD reduces to a symmetric rectangular distribution, and if v = 2, then GGD becomes the right triangular distribution.

In addition to the results presented in Tables 1 and 2, the ordered double ranked set samples possess some interesting distributional properties, which are similar to those of the usual order statistics when the underlying PDF $f(z)$ is symmetric about zero. Based on the facts that $f(-z) = f(z)$, $F(-z) = 1 - F(z)$, $f_{(r)}^{\mathrm{ODRSS}}(z) = f_{(m-r+1)}^{\mathrm{ODRSS}}(-z)$, and $f_{(r,s)}^{\mathrm{ODRSS}}(z_1, z_2) = f_{(m-s+1,\,m-r+1)}^{\mathrm{ODRSS}}(-z_2, -z_1)$, the properties in Lemma 2.1 can be readily established (cf. Balakrishnan and Li, 2008).

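Lemma 2.1 Let $Z^{\mathrm{ODRSS}} = (Z_{(1)}^{\mathrm{ODRSS}} \le Z_{(2)}^{\mathrm{ODRSS}} \le \cdots \le Z_{(m)}^{\mathrm{ODRSS}})$ be an ordered double ranked set sample of size $m$ from a distribution that is symmetric, say about zero. Then, we have

$$Z_{(r)}^{\mathrm{ODRSS}} \overset{d}{=} -Z_{(m-r+1)}^{\mathrm{ODRSS}}, \quad 1 \le r \le m, \qquad \left(Z_{(r)}^{\mathrm{ODRSS}}, Z_{(s)}^{\mathrm{ODRSS}}\right) \overset{d}{=} \left(-Z_{(m-r+1)}^{\mathrm{ODRSS}}, -Z_{(m-s+1)}^{\mathrm{ODRSS}}\right), \quad 1 \le r < s \le m.$$

Based on these relations under ODRSS, we have

$$\mu_{(r)}^{\mathrm{ODRSS}} = -\mu_{(m-r+1)}^{\mathrm{ODRSS}}, \quad 1 \le r \le m, \qquad \sigma_{(r,s)}^{\mathrm{ODRSS}} = \sigma_{(m-s+1,\,m-r+1)}^{\mathrm{ODRSS}}, \quad 1 \le r \le s \le m.$$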

The values not reported in the tables can be derived from these relations.

3. BLUEs Based on ODRSS

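Let $\mathbf{Z}^{\mathrm{ODRSS}} = (Z_{(1)}^{\mathrm{ODRSS}}, Z_{(2)}^{\mathrm{ODRSS}}, \ldots, Z_{(m)}^{\mathrm{ODRSS}})'$ be an ordered double ranked set sample of size $m$ obtained from a general location-scale distribution with location parameter $\mu$ and scale parameter $\sigma$ $(> 0)$. Let $H_{(r)}^{\mathrm{ODRSS}} = (Z_{(r)}^{\mathrm{ODRSS}} - \mu)/\sigma$ be the standardized variate under ODRSS. Moreover, let $E(H_{(r)}^{\mathrm{ODRSS}}) = \mu_{(r)}^{\mathrm{ODRSS}}$, $1 \le r \le m$, and $\operatorname{Cov}(H_{(r)}^{\mathrm{ODRSS}}, H_{(s)}^{\mathrm{ODRSS}}) = \sigma_{(r,s)}^{\mathrm{ODRSS}}$, $1 \le r \le s \le m$. Then, $E(Z_{(r)}^{\mathrm{ODRSS}}) = \mu + \sigma \mu_{(r)}^{\mathrm{ODRSS}}$ and $\operatorname{Cov}(Z_{(r)}^{\mathrm{ODRSS}}, Z_{(s)}^{\mathrm{ODRSS}}) = \sigma^2 \sigma_{(r,s)}^{\mathrm{ODRSS}}$.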

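Along the lines of Balakrishnan and Li (2008), the BLUE-ODRSS, $\hat{\boldsymbol{\theta}}^{\mathrm{ODRSS}} = (\hat{\mu}^{\mathrm{ODRSS}}, \hat{\sigma}^{\mathrm{ODRSS}})'$, of $\boldsymbol{\theta} = (\mu, \sigma)'$ is given by

$$\hat{\boldsymbol{\theta}}^{\mathrm{ODRSS}} = (\mathbf{X}' \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}' \boldsymbol{\Sigma}^{-1} \mathbf{Z}^{\mathrm{ODRSS}}, \tag{4}$$

where $\mathbf{X} = (\mathbf{1}, \boldsymbol{\mu})$ is the $m \times 2$ matrix formed by the vector of ones $\mathbf{1}$ and the mean vector $\boldsymbol{\mu} = (\mu_{(1)}^{\mathrm{ODRSS}}, \ldots, \mu_{(m)}^{\mathrm{ODRSS}})'$, $\boldsymbol{\Sigma} = ((\sigma_{(r,s)}^{\mathrm{ODRSS}}))$ is the variance–covariance matrix of the standardized variates, and $\mathbf{Z}^{\mathrm{ODRSS}} = (Z_{(1)}^{\mathrm{ODRSS}}, \ldots, Z_{(m)}^{\mathrm{ODRSS}})'$. This shows that the BLUEs-ODRSS can be written as linear combinations of the elements of $\mathbf{Z}^{\mathrm{ODRSS}}$, i.e.,

$$\hat{\mu}^{\mathrm{ODRSS}} = \sum_{r=1}^{m} \gamma_r Z_{(r)}^{\mathrm{ODRSS}}, \qquad \hat{\sigma}^{\mathrm{ODRSS}} = \sum_{r=1}^{m} \delta_r Z_{(r)}^{\mathrm{ODRSS}}. \tag{5}$$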

From (5), the values of the coefficients ($\gamma_r$ and $\delta_r$) are given in Table 3 for different distributions and several choices of $m$.
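For illustration, the coefficients of a BLUE of location and scale can be computed from a vector of standardized means and a variance–covariance matrix by generalized least squares. The Python sketch below is an assumption-laden example, not the computation used for Table 3: the helper names `solve` and `blue_coefficients` are hypothetical, and the inputs in the usage example are the well-known means, variances, and covariance of the order statistics of an IID standard normal sample of size 2, used only as stand-in values in place of ODRSS moments.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (a: n x n, b: n x k)."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def blue_coefficients(alpha, sigma):
    """Return (gamma, delta): coefficients of the BLUEs of location and scale.

    alpha: means of the standardized ordered sample (length m).
    sigma: their symmetric variance-covariance matrix (m x m).
    """
    m = len(alpha)
    x = [[1.0, alpha[r]] for r in range(m)]   # design matrix (1, alpha), m x 2
    sinv_x = solve(sigma, x)                  # Sigma^{-1} X, m x 2
    # X' Sigma^{-1} X, a 2 x 2 matrix
    xtsx = [[sum(x[r][i] * sinv_x[r][j] for r in range(m)) for j in range(2)]
            for i in range(2)]
    # X' Sigma^{-1} equals (Sigma^{-1} X)' because Sigma is symmetric
    rhs = [[sinv_x[r][i] for r in range(m)] for i in range(2)]
    gamma, delta = solve(xtsx, rhs)           # rows of (X' S^-1 X)^-1 X' S^-1
    return gamma, delta


if __name__ == "__main__":
    a = 1.0 / math.sqrt(math.pi)              # E[Y_(2)] for an IID N(0,1) sample of size 2
    alpha = [-a, a]
    sigma = [[1 - 1 / math.pi, 1 / math.pi], [1 / math.pi, 1 - 1 / math.pi]]
    print(blue_coefficients(alpha, sigma))
```

Unbiasedness corresponds to the constraints $\sum_r \gamma_r = 1$, $\sum_r \gamma_r \alpha_r = 0$, $\sum_r \delta_r = 0$, and $\sum_r \delta_r \alpha_r = 1$, where $\alpha_r$ denotes the $r$th standardized mean; these provide a quick check on any computed set of coefficients.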

Table 3 Values of the coefficients of the BLUEs-ODRSS for different distributions

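The variance–covariance matrix of $\hat{\boldsymbol{\theta}}^{\mathrm{ODRSS}} = (\hat{\mu}^{\mathrm{ODRSS}}, \hat{\sigma}^{\mathrm{ODRSS}})'$ is given by

$$\operatorname{Var}(\hat{\boldsymbol{\theta}}^{\mathrm{ODRSS}}) = \sigma^2 (\mathbf{X}' \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} = \frac{\sigma^2}{\Delta} \begin{pmatrix} \boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} & -\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \mathbf{1} \\ -\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \mathbf{1} & \mathbf{1}' \boldsymbol{\Sigma}^{-1} \mathbf{1} \end{pmatrix}, \tag{6}$$

where

$$\Delta = (\mathbf{1}' \boldsymbol{\Sigma}^{-1} \mathbf{1})(\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}) - (\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \mathbf{1})^2,$$

$\mathbf{1}$ is the $m \times 1$ vector of ones, $\boldsymbol{\mu} = (\mu_{(1)}^{\mathrm{ODRSS}}, \ldots, \mu_{(m)}^{\mathrm{ODRSS}})'$, $\boldsymbol{\Sigma} = ((\sigma_{(r,s)}^{\mathrm{ODRSS}}))$, and $\mathbf{X} = (\mathbf{1}, \boldsymbol{\mu})$.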

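If the underlying population is symmetric, then $\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \mathbf{1} = 0$, where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the mean vector and variance–covariance matrix of the standardized ordered double ranked set sample and $\mathbf{1}$ is the vector of ones, and (4) and (6) can be simplified as follows:

$$\hat{\mu}^{\mathrm{ODRSS}} = \frac{\mathbf{1}' \boldsymbol{\Sigma}^{-1} \mathbf{Z}^{\mathrm{ODRSS}}}{\mathbf{1}' \boldsymbol{\Sigma}^{-1} \mathbf{1}}, \qquad \hat{\sigma}^{\mathrm{ODRSS}} = \frac{\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \mathbf{Z}^{\mathrm{ODRSS}}}{\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}},$$

$$\operatorname{Var}(\hat{\mu}^{\mathrm{ODRSS}}) = \frac{\sigma^2}{\mathbf{1}' \boldsymbol{\Sigma}^{-1} \mathbf{1}}, \qquad \operatorname{Var}(\hat{\sigma}^{\mathrm{ODRSS}}) = \frac{\sigma^2}{\boldsymbol{\mu}' \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}}, \qquad \operatorname{Cov}(\hat{\mu}^{\mathrm{ODRSS}}, \hat{\sigma}^{\mathrm{ODRSS}}) = 0,$$

which shows that, for a symmetric population, the BLUEs-ODRSS are uncorrelated. In Table 4, we report the variances and covariances of the BLUEs under both the ORSS and ODRSS schemes for different location-scale families of distributions.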

Table 4 Variances–covariances of ORSS and ODRSS for different distributions

3.1. Comparison Between the BLUEs-ORSS and BLUEs-ODRSS

Balakrishnan and Li (2005, 2008) obtained the BLUEs-ORSS of the unknown parameters of the GGD, and normal, exponential, and logistic distributions, respectively. Let $\hat{\mu}^{\mathrm{ORSS}}$ and $\hat{\sigma}^{\mathrm{ORSS}}$ be the BLUEs-ORSS of $\mu$ and $\sigma$, respectively. In order to compare the variances of the BLUEs-ORSS and BLUEs-ODRSS, we define the following REs:

Based on the results given in Table 5, the BLUEs-ODRSS are uniformly better than the BLUEs-ORSS for all of the distributions considered here.

Table 5 REs of the BLUEs-ODRSS with respect to the BLUEs-ORSS

Here,

4. Confidence and Tolerance Intervals Based on ODRSS

In this section, we derive the confidence and tolerance intervals based on ODRSS. These intervals are also compared with their counterparts based on the ORSS scheme.

4.1. Distribution-Free Confidence Intervals for Quantiles

Let $Y$ be a continuous random variable with CDF $F(y)$. Then the $p$th quantile is given by

$$\xi_p = F^{-1}(p), \qquad 0 < p < 1.$$

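The probability that $\xi_p$ lies between the $r$th and $s$th order statistics based on ODRSS is defined as

$$P\left(Z_{(r)}^{\mathrm{ODRSS}} \le \xi_p \le Z_{(s)}^{\mathrm{ODRSS}}\right), \qquad 1 \le r < s \le m.$$

Using the fact that $F(Z_{(i)}^{\mathrm{ODRSS}}) = U_{(i)}^{\mathrm{ODRSS}}$, $1 \le i \le m$, we have

$$P\left(Z_{(r)}^{\mathrm{ODRSS}} \le \xi_p \le Z_{(s)}^{\mathrm{ODRSS}}\right) = P\left(U_{(r)}^{\mathrm{ODRSS}} \le p \le U_{(s)}^{\mathrm{ODRSS}}\right) = F_{(r)}^{U}(p) - F_{(s)}^{U}(p),$$

where $F_{(i)}^{U}$ denotes the CDF of $U_{(i)}^{\mathrm{ODRSS}}$.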

The above probabilities can be obtained easily by using (1) when the underlying distribution is uniform, i.e., $Y_i \sim U(0, 1)$. Note that the above probability depends only on $r$, $s$, $m$, and $p$. This shows that the random interval for $\xi_p$, i.e., $[Z_{(r)}^{\mathrm{ODRSS}}, Z_{(s)}^{\mathrm{ODRSS}}]$, is purely nonparametric (distribution-free), because it is obtained without making any assumptions about the underlying parent distribution. As pointed out by Balakrishnan and Li (2006), it is difficult to construct a confidence interval with exact confidence coefficient $1 - \alpha$. Therefore, in Table 6, for different choices of $r$ and $s$, two-sided 90% and 95% confidence intervals are given with the corresponding confidence limits and exact confidence levels (ECLs).
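Because the coverage probability depends only on $r$, $s$, $m$, and $p$, it can be evaluated numerically from a uniform parent. The Python sketch below is one possible way to do this under perfect ranking and is not the authors' computation: instead of permanents it uses a dynamic-programming recursion for the probability that at least $r$ of $m$ independent events occur, which gives the same CDF for an order statistic of INID variables. All function names are hypothetical.

```python
from math import comb


def at_least(probs, r):
    """P(at least r of the independent events occur); probs[k] is the k-th event probability."""
    dist = [1.0]  # dist[k] = P(exactly k of the events processed so far occurred)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            nxt[k] += w * (1.0 - p)
            nxt[k + 1] += w * p
        dist = nxt
    return sum(dist[r:])


def uniform_cdfs(p, m):
    """CDFs at p of the ORSS and ODRSS order statistics for a U(0,1) parent."""
    # Usual order statistics Y_(1), ..., Y_(m) of an IID sample of size m.
    os_cdf = [sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(r, m + 1))
              for r in range(1, m + 1)]
    # r-th smallest of one ranked set sample: this is both the r-th ORSS order
    # statistic and, marginally, the DRSS unit Z_r^DRSS.
    orss = [at_least(os_cdf, r) for r in range(1, m + 1)]
    # r-th smallest of the independent DRSS units Z_1^DRSS, ..., Z_m^DRSS.
    odrss = [at_least(orss, r) for r in range(1, m + 1)]
    return orss, odrss


def exact_confidence_level(cdfs, r, s):
    """P(Z_(r) <= xi_p <= Z_(s)) = F_(r)(xi_p) - F_(s)(xi_p), for r < s."""
    return cdfs[r - 1] - cdfs[s - 1]


if __name__ == "__main__":
    orss, odrss = uniform_cdfs(0.7, 4)
    print("ORSS :", exact_confidence_level(orss, 2, 4))
    print("ODRSS:", exact_confidence_level(odrss, 2, 4))
```

The same routine can be used to scan all pairs $(r, s)$ for a given $m$ and $p$ and retain those whose exact level is closest to the nominal one.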

Table 6 ODRSS- and ORSS-based 90% (bold 95%) confidence intervals for the pth quantile with ECL

Similarly, the upper confidence limit and lower confidence limit for $\xi_p$ are defined through the probabilities $P(\xi_p \le Z_{(r)}^{\mathrm{ODRSS}})$ and $P(Z_{(r)}^{\mathrm{ODRSS}} \le \xi_p)$, respectively. The upper and lower $100(1 - \alpha)\%$ confidence limits for $\xi_p$ are given in Tables 7 and 8, respectively. For a fair comparison of one- and two-sided confidence intervals for $\xi_p$, we consider both ORSS and ODRSS.

Table 7 ODRSS- and ORSS-based 90% and 95% upper confidence intervals for the pth quantile with ECL

Table 8 ODRSS- and ORSS-based 90% and 95% lower confidence intervals for the pth quantile with ECL

Lemma 4.1

For a given value of $p$, such that $\xi_p = F^{-1}(p)$, we have

  1. The two-sided confidence interval $[Z_{(r)}^{\mathrm{ODRSS}}, Z_{(s)}^{\mathrm{ODRSS}}]$ contains $\xi_p$ with confidence coefficient $\ge (1 - \alpha)$ if and only if the two-sided confidence interval $[Z_{(m-s+1)}^{\mathrm{ODRSS}}, Z_{(m-r+1)}^{\mathrm{ODRSS}}]$ contains $\xi_{1-p}$ with confidence coefficient $\ge (1 - \alpha)$.

  2. $Z_{(r)}^{\mathrm{ODRSS}}$ is an upper confidence limit for $\xi_p$ with confidence coefficient $\ge (1 - \alpha)$ if and only if $Z_{(m-r+1)}^{\mathrm{ODRSS}}$ is a lower confidence limit for $\xi_{1-p}$ with confidence coefficient $\ge (1 - \alpha)$.

The proof follows that of Balakrishnan and Li (2006).

From the results given in Tables 6–8, it is clear that the ECLs of the proposed confidence intervals for quantiles are uniformly greater than the ECLs of the confidence intervals based on ORSS. This confirms the superiority of ODRSS over ORSS. The intervals are consistent with the symmetric relations presented in Lemma 4.1. Under ODRSS, there exist some confidence intervals for quantiles that are not possible under ORSS. For example, a 90% confidence interval $[Z_{(2)}^{\mathrm{ODRSS}}, Z_{(4)}^{\mathrm{ODRSS}}]$ exists for $\xi_{0.7}$ when $m = 4$, but it is difficult to obtain under ORSS.

4.2. Distribution-Free Tolerance Intervals

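In order to construct a tolerance interval that covers at least a fixed proportion, say $\gamma$, of the population with tolerance level $\beta$, we require $Z_{(r)}^{\mathrm{ODRSS}}$ and $Z_{(s)}^{\mathrm{ODRSS}}$ $(1 \le r < s \le m)$, such that

$$P\left\{F(Z_{(s)}^{\mathrm{ODRSS}}) - F(Z_{(r)}^{\mathrm{ODRSS}}) \ge \gamma\right\} = \beta. \tag{7}$$

One-sided tolerance intervals can be constructed by setting $Z_{(r)}^{\mathrm{ODRSS}} = -\infty$ or $Z_{(s)}^{\mathrm{ODRSS}} = \infty$. This probability can be simplified as follows:

$$P\left\{F(Z_{(s)}^{\mathrm{ODRSS}}) - F(Z_{(r)}^{\mathrm{ODRSS}}) \ge \gamma\right\} = P\left(U_{(s)}^{\mathrm{ODRSS}} - U_{(r)}^{\mathrm{ODRSS}} \ge \gamma\right) = P\left(Q_{r,s}^{\mathrm{ODRSS}} \ge \gamma\right), \tag{8}$$

where $Q_{r,s}^{\mathrm{ODRSS}} = U_{(s)}^{\mathrm{ODRSS}} - U_{(r)}^{\mathrm{ODRSS}}$.

It is difficult to satisfy (7) exactly, but we can select values of $r$ and $s$ such that the quantity $s - r + 1$ is as small as possible while satisfying

$$P\left(Q_{r,s}^{\mathrm{ODRSS}} \ge \gamma\right) \ge \beta. \tag{9}$$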

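Following (3), the joint density function of $U_{(r)}^{\mathrm{ODRSS}}$ and $U_{(s)}^{\mathrm{ODRSS}}$ $(1 \le r < s \le m)$ is given by

$$f_{(r,s)}^{U}(u_1, u_2) = \frac{1}{(r-1)!\,(s-r-1)!\,(m-s)!} \operatorname{Per}(A_4), \qquad 0 < u_1 < u_2 < 1, \tag{10}$$

where

$$A_4 = \begin{pmatrix} G_1(u_1) & \cdots & G_m(u_1) \\ g_1(u_1) & \cdots & g_m(u_1) \\ G_1(u_2) - G_1(u_1) & \cdots & G_m(u_2) - G_m(u_1) \\ g_1(u_2) & \cdots & g_m(u_2) \\ 1 - G_1(u_2) & \cdots & 1 - G_m(u_2) \end{pmatrix} \begin{matrix} \}\; r - 1 \\ \}\; 1 \\ \}\; s - r - 1 \\ \}\; 1 \\ \}\; m - s \end{matrix}$$

and $G_j$ and $g_j$ denote the CDF and PDF of the $j$th double ranked set sample unit when the parent distribution is $U(0, 1)$.

Consider the transformations $Q_{r,s}^{\mathrm{ODRSS}} = U_{(s)}^{\mathrm{ODRSS}} - U_{(r)}^{\mathrm{ODRSS}}$ and $Q_{r,s}^{*\mathrm{ODRSS}} = U_{(r)}^{\mathrm{ODRSS}}$. Then, using the Jacobian method, the marginal density of $Q_{r,s}^{\mathrm{ODRSS}}$ is obtained as

$$f_{Q_{r,s}}(q) = \int_{0}^{1-q} f_{(r,s)}^{U}(u, u + q)\, du, \qquad 0 < q < 1. \tag{11}$$

Similarly, the CDF of $Q_{r,s}^{\mathrm{ODRSS}}$, $F_{Q_{r,s}}(q)$, is given by

$$F_{Q_{r,s}}(q) = \int_{0}^{q} f_{Q_{r,s}}(t)\, dt, \qquad 0 < q < 1.$$

Note that this CDF can be used to find the probabilities in (8). In Table 9, we provide several tolerance intervals with exact tolerance levels (ETLs) for different values of $m$ and $\gamma$.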
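As a rough cross-check on exact tolerance levels, the probability that the difference of two ordered ODRSS uniforms is at least $\gamma$ can also be approximated by simulation. The Python sketch below is a Monte Carlo approximation under perfect ranking, not the exact calculation based on (11); the function names, the number of replications, and the seed are arbitrary choices made for this illustration.

```python
import random


def odrss_uniform(m, rng):
    """One ordered double ranked set sample of size m from U(0,1), perfect ranking."""
    def rss():
        # i-th smallest of the i-th fresh set of m uniforms, i = 1, ..., m
        return [sorted(rng.random() for _ in range(m))[i] for i in range(m)]
    # i-th smallest of the i-th (independent) ranked set sample, then order the result
    drss = [sorted(rss())[i] for i in range(m)]
    return sorted(drss)


def tolerance_level_mc(r, s, m, gamma, reps=20000, seed=1):
    """Monte Carlo estimate of P(U_(s) - U_(r) >= gamma) under ODRSS."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = odrss_uniform(m, rng)
        hits += (u[s - 1] - u[r - 1]) >= gamma
    return hits / reps


if __name__ == "__main__":
    # Estimated tolerance level of [Z_(1), Z_(4)] for m = 4 and gamma = 0.5.
    print(tolerance_level_mc(1, 4, 4, 0.5))
```

The estimate carries simulation error of order $1/\sqrt{\text{reps}}$, so it is suitable only for checking exact values to two or three decimal places.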

Table 9 Two-sided tolerance intervals 90% (bold 95%) that cover γ proportion of the population with ETL

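Lemma 4.2

For $0 < \gamma, \beta < 1$, the tolerance interval $[Z_{(r)}^{\mathrm{ODRSS}}, Z_{(s)}^{\mathrm{ODRSS}}]$ with confidence coefficient $\beta$ covers a proportion $\gamma$ of the population if and only if the tolerance interval $[Z_{(m-s+1)}^{\mathrm{ODRSS}}, Z_{(m-r+1)}^{\mathrm{ODRSS}}]$ with confidence coefficient $\beta$ covers a proportion $\gamma$ of the population, i.e.,

$$P\left(U_{(s)}^{\mathrm{ODRSS}} - U_{(r)}^{\mathrm{ODRSS}} \ge \gamma\right) = P\left(U_{(m-r+1)}^{\mathrm{ODRSS}} - U_{(m-s+1)}^{\mathrm{ODRSS}} \ge \gamma\right).$$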

From Table 9, in terms of ETLs, it is noteworthy that the tolerance intervals based on ODRSS are uniformly better than the tolerance intervals based on ORSS. Furthermore, the tolerance intervals under ODRSS exist for more values of $\gamma$ than under ORSS.

5. Conclusion

In this article, we used the idea of INID random variables to propose an ODRSS scheme and then developed the BLUEs of the unknown parameters of several location-scale families of distributions. It is worth mentioning that the BLUEs-ODRSS are more precise than the BLUEs-ORSS. The distribution-free confidence intervals for quantiles and tolerance intervals were also derived under the ODRSS scheme. In terms of exact confidence and tolerance levels, the suggested intervals are uniformly better than their counterparts. Therefore, we recommend the use of ODRSS for precise estimation of the population parameters.

Acknowledgments

The authors are thankful to the Editor-in-Chief, Associate Editor, and anonymous referee(s) for their valuable comments and suggestions that led to an improved version of the article.

References

  • Al-Saleh, M.F., & Al-Kadiri, M. (2000). Double ranked set sampling. Statistics & Probability Letters, 48, 205–212.
  • Arnold, B.C., Balakrishnan, N., & Nagaraja, H.N. (1992). A first course in order statistics. New York, NY: John Wiley & Sons.
  • Bapat, R.B., & Beg, M.I. (1989). Order statistics from nonidentically distributed variables and permanents. Sankhya: The Indian Journal of Statistics, 51, 79–93.
  • Balakrishnan, N. (1989a). A relation for the covariances of order statistics from n independent and non-identically distributed random variables. Statistical Papers, 30, 141–146.
  • Balakrishnan, N. (1989b). Recurrence relations among moments of order statistics from two related sets of independent and non-identically distributed random variables. Annals of the Institute of Statistical Mathematics, 41, 323–329.
  • Balakrishnan, N. (2007). Permanents, order statistics, outliers, and robustness. Revista Matemática Complutense, 20, 7–107.
  • Balakrishnan, N., & Li, T. (2005). BLUEs of parameters of generalized geometric distribution using ordered ranked set sampling. Communications in Statistics-Simulation and Computation, 34, 253–266.
  • Balakrishnan, N., & Li, T. (2006). Confidence intervals for quantiles and tolerance intervals based on ordered ranked set samples. Annals of the Institute of Statistical Mathematics, 58, 757–777.
  • Balakrishnan, N., & Li, T. (2008). Ordered ranked set samples and applications to inference. Journal of Statistical Planning and Inference, 138, 3512–3524.
  • Bhoj, D.S., & Ahsanullah, M. (1996). Estimation of parameters of the generalized geometric distribution using ranked set sampling. Biometrics, 52, 685–694.
  • Chen, Z., Bai, Z., & Sinha, B.K. (2004). Ranked set sampling-theory and applications. Lecture Notes in Statistics, 176, New York, NY: Springer.
  • David, H.A., & Nagaraja, H.N. (2003). Order statistics (3rd ed.). New York, NY: John Wiley & Sons.
  • Dell, T.R., & Clutter, J.L. (1972). Ranked set sampling with order statistics background. Biometrics, 28, 545–555.
  • Hossain, S.S., & Muttlak, H.A. (2000). MVLUE of population parameters based on ranked set sampling. Applied Mathematics and Computation, 108, 167–176.
  • McIntyre, G.A. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3, 385–390.
  • Patil, G.P. (1995). Editorial: ranked set sampling. Environmental and Ecological Statistics, 2, 271–285.
  • Patil, G.P., Sinha, A.K., & Taillie, C. (1999). Ranked set sampling: A bibliography. Environmental and Ecological Statistics, 6, 91–98.
  • Takahasi, K., & Wakimoto, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20, 1–31.
  • Vaughan, R.J., & Venables, W.N. (1972). Permanent expressions for order statistic densities. Journal of the Royal Statistical Society, Series B, 34, 308–310.
