STATISTICS

Estimation of Population Mean by Using a Generalized Family of Estimators Under Classical Ranked Set Sampling

Article: 1948184 | Received 13 Feb 2020, Accepted 23 Jun 2021, Published online: 26 Jul 2021

ABSTRACT

Estimation of the population mean of a study variable Y loses precision when the data set is highly variable. Incorporating auxiliary information into the construction of an estimator under the ranked set sampling scheme results in efficient estimation of the population mean. In this paper, we propose an efficient generalized family of estimators of the finite population mean of the study variable under ranked set sampling, utilizing information on an auxiliary variable. The bias and mean square error (MSE) of the proposed generalized family of estimators are derived. The conditions under which the proposed family is more efficient than competitor estimators are also derived. The estimators are compared for efficiency using a simulation study and real-life data sets. It is concluded that when the correlation between the study and auxiliary variables increases, the proposed generalized family of estimators proves to be the more efficient estimator of the population mean of the study variable.

1. Introduction

In many situations of practical interest, mainly in environmental and ecological studies, the variable of interest, say Y, is not easily observable, in the sense that measurement may be expensive, time consuming, invasive or even destructive. Although data collection may be complex, ranking the potential sampled units with respect to an available auxiliary variable can often be done simply, at little or no additional cost. In situations where the variation in the study variable is high and the study variable is strongly correlated with the auxiliary variable, Ranked Set Sampling (RSS), proposed by McIntyre (1952), is more efficient than Simple Random Sampling (SRS) (Patil et al., 1993, 1994; Stokes, 1977).

The literature on RSS has grown rapidly, and several estimators originally conceived for SRS have been re-proposed to estimate the mean of the study variable by transferring their sampling design to the RSS framework (Ali et al., 2021; Iqbal et al., 2020; Kadilar et al., 2009; Khan & Shabbir, 2016a, 2016b, 2016c; Mandowara & Mehta, 2013, 2016; Mehta & Mandowara, 2016; Pelle & Perri, 2018; Samawi & Muttlak, 1996; Singh et al., 2014; Vishwakarma et al., 2017). Motivated by these studies, and in line with many other contributions in the literature, we propose an efficient generalized family of estimators by changing the sampling design of the estimator proposed by Shahzad et al. (2019) from SRS to RSS.

Notations under Ranked Set Sampling Design

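Let $\Omega = \{1, 2, \ldots, N\}$ be a finite population of $N$ units, $Y$ the variable under study and $X$ an auxiliary variable which is highly correlated with $Y$. Let $\mu_y$ and $\mu_x$ denote the population means of $Y$ and $X$, respectively, $S_y^2$ and $S_x^2$ the variances, $C_y$ and $C_x$ the coefficients of variation, $\rho_{xy}$ the correlation coefficient between $X$ and $Y$, $\beta_{2x}$ and $\beta_{1x}$ the kurtosis and skewness of $X$, and $C_{xy} = \rho_{xy} C_x C_y$. Let $(X_{j(i)}, Y_{j[i]})$ denote the pair formed by the $i$th order statistic of $X$ and the associated element of $Y$ in the $j$th cycle, for $i = 1, \ldots, m$ and $j = 1, \ldots, r$. Then the ranked set sample is

$$(X_{1(1)}, Y_{1[1]}), \ldots, (X_{1(m)}, Y_{1[m]}),\; (X_{2(1)}, Y_{2[1]}), \ldots, (X_{2(m)}, Y_{2[m]}),\; \ldots,\; (X_{r(1)}, Y_{r[1]}), \ldots, (X_{r(m)}, Y_{r[m]}).$$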
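The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `ranked_set_sample`, the toy population and the seed are assumptions made here, and each set of $m$ units is drawn independently, so a unit may appear in more than one set.

```python
import random

def ranked_set_sample(population, m, r, rng):
    """Draw a ranked set sample of size n = m * r from (x, y) pairs.

    For every cycle j = 1..r and every rank i = 1..m, a fresh set of m
    units is drawn without replacement, ordered by the auxiliary value x,
    and only the unit holding rank i is kept (with its associated y).
    """
    sample = []  # entries are (cycle j, rank i, x, y)
    for j in range(1, r + 1):
        for i in range(1, m + 1):
            candidates = rng.sample(population, m)
            candidates.sort(key=lambda unit: unit[0])  # rank on x only
            x, y = candidates[i - 1]
            sample.append((j, i, x, y))
    return sample

rng = random.Random(1)
# Hypothetical toy population: y depends linearly on x plus noise.
population = []
for _ in range(200):
    x = rng.gauss(50, 10)
    population.append((x, 2 * x + rng.gauss(0, 5)))

rss = ranked_set_sample(population, m=3, r=2, rng=rng)
x_bar_rss = sum(unit[2] for unit in rss) / len(rss)
y_bar_rss = sum(unit[3] for unit in rss) / len(rss)
```

Only the $n = mr$ retained units need to be measured on $Y$; the remaining units in each set are used for ranking on $X$ alone.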

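To obtain the biases and mean square errors, we use the following notation under RSS:

$$e_0 = \frac{\bar{y}_{rss} - \mu_y}{\mu_y}, \quad e_1 = \frac{\bar{x}_{rss} - \mu_x}{\mu_x}, \quad \bar{y}_{rss} = \mu_y(1 + e_0), \quad \bar{x}_{rss} = \mu_x(1 + e_1),$$

$$\bar{x}_{(i)} = \frac{1}{r}\sum_{j=1}^{r} X_{j(i)}, \quad \bar{y}_{[i]} = \frac{1}{r}\sum_{j=1}^{r} Y_{j[i]}, \quad i = 1, 2, \ldots, m.$$

$\tau_{x(i)} = \bar{x}_{(i)} - \mu_x$: deviation of the mean of the $i$th ranked units from the population mean $\mu_x$.

$\tau_{y[i]} = \bar{y}_{[i]} - \mu_y$: deviation of the mean of the $i$th ranked units from the population mean $\mu_y$.

$\tau_{xy(i)} = (\bar{x}_{(i)} - \mu_x)(\bar{y}_{[i]} - \mu_y)$: cross product of the deviations.

$$\gamma = \frac{1}{n} - \frac{1}{N} = \frac{1}{mr} - \frac{1}{N} \approx \frac{1}{mr}, \quad C_y^2 = \frac{S_y^2}{\mu_y^2}, \quad C_x^2 = \frac{S_x^2}{\mu_x^2}, \quad C_{xy} = \frac{S_{xy}}{\mu_x\mu_y} = \rho_{xy} C_y C_x,$$

$$W_{x(i)}^2 = \frac{1}{m^2 r \mu_x^2}\sum_{i=1}^{m}\tau_{x(i)}^2, \quad W_{y[i]}^2 = \frac{1}{m^2 r \mu_y^2}\sum_{i=1}^{m}\tau_{y[i]}^2, \quad W_{xy(i)} = \frac{1}{m^2 r \mu_x\mu_y}\sum_{i=1}^{m}\tau_{xy(i)}.$$

To obtain the biases and mean square errors, we use the following notation under SRS:

$$\vartheta_0 = \frac{\bar{y}_{srs} - \mu_y}{\mu_y}, \quad \vartheta_1 = \frac{\bar{x}_{srs} - \mu_x}{\mu_x}, \quad \bar{y}_{srs} = \mu_y(1 + \vartheta_0), \quad \bar{x}_{srs} = \mu_x(1 + \vartheta_1).$$

The expectations of these error terms are

$$\begin{aligned}
&E(e_0) = E(e_1) = 0, \quad E(e_0^2) = \gamma C_y^2 - W_{y[i]}^2 = V_{yy}, \quad E(e_1^2) = \gamma C_x^2 - W_{x(i)}^2 = V_{xx}, \quad E(e_0 e_1) = \gamma C_{xy} - W_{xy(i)} = V_{xy},\\
&E(\vartheta_0) = E(\vartheta_1) = 0, \quad E(\vartheta_0^2) = \gamma C_y^2 = V_{yy}, \quad E(\vartheta_1^2) = \gamma C_x^2 = V_{xx}, \quad E(\vartheta_0\vartheta_1) = \gamma C_{xy} = V_{xy}.
\end{aligned} \tag{1.1}$$

The symbols $V_{yy}$, $V_{xx}$ and $V_{xy}$ therefore take their RSS values in the expressions for RSS estimators and their SRS values in the expressions for SRS estimators.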
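As a rough illustration of how the RSS quantities in (1.1) could be evaluated, the sketch below computes $V_{xx}$, $V_{yy}$ and $V_{xy}$ from a ranked set sample. It assumes the reading $E(e_0^2) = \gamma C_y^2 - W_{y[i]}^2$ (and analogously for the other two terms), scales all three $W$ terms by $m^2 r$, and uses a hypothetical function name, sample and parameter values that do not come from the paper.

```python
from collections import defaultdict

def rss_v_terms(sample, m, r, N, mu_x, mu_y, c_x, c_y, rho_xy):
    """Return (V_xx, V_yy, V_xy) under RSS, following the reading of (1.1)
    stated in the lead-in. `sample` holds (cycle, rank, x, y) tuples."""
    # Collect the r observations that share each rank i.
    xs, ys = defaultdict(list), defaultdict(list)
    for _, i, x, y in sample:
        xs[i].append(x)
        ys[i].append(y)
    # Deviations of the rank means from the population means.
    tau_x = [sum(xs[i]) / r - mu_x for i in range(1, m + 1)]
    tau_y = [sum(ys[i]) / r - mu_y for i in range(1, m + 1)]
    w_x2 = sum(t * t for t in tau_x) / (m**2 * r * mu_x**2)
    w_y2 = sum(t * t for t in tau_y) / (m**2 * r * mu_y**2)
    w_xy = sum(a * b for a, b in zip(tau_x, tau_y)) / (m**2 * r * mu_x * mu_y)
    gamma = 1 / (m * r) - 1 / N
    c_xy = rho_xy * c_x * c_y
    return gamma * c_x**2 - w_x2, gamma * c_y**2 - w_y2, gamma * c_xy - w_xy

# Hypothetical sample with m = 2 ranks and r = 2 cycles.
sample = [(1, 1, 8.0, 14.0), (1, 2, 12.0, 25.0),
          (2, 1, 9.0, 18.0), (2, 2, 13.0, 23.0)]
v_xx, v_yy, v_xy = rss_v_terms(sample, m=2, r=2, N=100, mu_x=10.0,
                               mu_y=20.0, c_x=0.3, c_y=0.25, rho_xy=0.8)
```

Dropping the three $W$ terms from the return value gives the corresponding SRS quantities $\gamma C_x^2$, $\gamma C_y^2$ and $\gamma C_{xy}$.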

2. Some Existing Estimators under SRS and RSS

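Some well-known estimators are given below, along with their biases and mean square errors.

Mandowara and Mehta (2013) proposed the following estimators:

$$\Delta_{(RSS)mm1} = \bar{y}_{rss}\left[\frac{\mu_x C_x + \beta_{2x}}{\bar{x}_{rss} C_x + \beta_{2x}}\right]^{\delta}, \tag{2.1}$$

$$\Delta_{(RSS)mm2} = \bar{y}_{rss}\left[\frac{\mu_x \beta_{2x} + C_x}{\bar{x}_{rss} \beta_{2x} + C_x}\right]^{\alpha}, \tag{2.2}$$

$$\Delta_{(RSS)mm4} = \bar{y}_{rss}\left[\frac{\mu_x + \beta_{2x}}{\bar{x}_{rss} + \beta_{2x}}\right]. \tag{2.3}$$

The biases and MSEs of $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$ and $\Delta_{(RSS)mm4}$ are

$$B(\Delta_{(RSS)mm1}) = \frac{\mu_y}{2}\,\delta\phi_2\left[(\delta + 1)\phi_2 V_{xx} - 2V_{xy}\right],$$

$$\mathrm{MSE}(\Delta_{(RSS)mm1}) = \mu_y^2\left[V_{yy} + \delta^2\phi_2^2 V_{xx} - 2\delta\phi_2 V_{xy}\right], \tag{2.4}$$

$$B(\Delta_{(RSS)mm2}) = \frac{\mu_y}{2}\,\alpha\phi_1\left[(\alpha + 1)\phi_1 V_{xx} - 2V_{xy}\right],$$

$$\mathrm{MSE}(\Delta_{(RSS)mm2}) = \mu_y^2\left[V_{yy} + \alpha^2\phi_1^2 V_{xx} - 2\alpha\phi_1 V_{xy}\right], \tag{2.5}$$

$$B(\Delta_{(RSS)mm4}) = \mu_y\left[\lambda_2^2 V_{xx} - \lambda_2 V_{xy}\right],$$

$$\mathrm{MSE}(\Delta_{(RSS)mm4}) = \mu_y^2\left[V_{yy} + \lambda_2^2 V_{xx} - 2\lambda_2 V_{xy}\right]. \tag{2.6}$$

The MSEs in (2.4) and (2.5) are minimized for

$$\delta_{opt} = \frac{\rho_{xy} C_y}{\phi_2 C_x}, \quad \alpha_{opt} = \frac{\rho_{xy} C_y}{\phi_1 C_x},$$

where

$$\phi_2 = \frac{\mu_x C_x}{\mu_x C_x + \beta_{2x}}, \quad \phi_1 = \frac{\mu_x \beta_{2x}}{\mu_x \beta_{2x} + C_x}, \quad \lambda_2 = \frac{\mu_x}{\mu_x + \beta_{2x}}.$$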
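The role of the optimum exponent can be checked numerically. The sketch below evaluates the first-order MSE (2.4), read here as $\mu_y^2[V_{yy} + \delta^2\phi_2^2 V_{xx} - 2\delta\phi_2 V_{xy}]$, and its minimizer, obtained by setting the derivative with respect to $\delta$ to zero. The function names and all numerical values are hypothetical, and for simplicity the $V$ terms are taken in their SRS form $\gamma C_x^2$, $\gamma C_y^2$, $\gamma C_{xy}$, in which case the minimizer reduces to $\rho_{xy} C_y / (\phi_2 C_x)$.

```python
def phi2(mu_x, c_x, beta2_x):
    """phi_2 = mu_x C_x / (mu_x C_x + beta_2x)."""
    return mu_x * c_x / (mu_x * c_x + beta2_x)

def mse_mm1(delta, mu_y, phi, v_yy, v_xx, v_xy):
    """First-order MSE (2.4) of the estimator (2.1)."""
    return mu_y**2 * (v_yy + delta**2 * phi**2 * v_xx - 2 * delta * phi * v_xy)

def delta_opt(phi, v_xx, v_xy):
    """Minimizer of (2.4) in delta, from d(MSE)/d(delta) = 0."""
    return v_xy / (phi * v_xx)

# Hypothetical population summaries (not taken from the paper).
mu_x, mu_y, c_x, c_y, rho, beta2_x = 100.0, 50.0, 0.4, 0.3, 0.8, 3.0
n, N = 20, 1000
gamma = 1 / n - 1 / N
v_xx, v_yy, v_xy = gamma * c_x**2, gamma * c_y**2, gamma * rho * c_x * c_y

phi = phi2(mu_x, c_x, beta2_x)
d_opt = delta_opt(phi, v_xx, v_xy)
mse_at_opt = mse_mm1(d_opt, mu_y, phi, v_yy, v_xx, v_xy)
```

At the optimum the MSE equals $\mu_y^2 (V_{yy} - V_{xy}^2 / V_{xx})$, the familiar regression-type bound; the same argument applies to (2.5) with $\alpha$ and $\phi_1$.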

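Vishwakarma, Zeeshan and Bouza (Vishwakarma et al., 2017) developed the following exponential estimator:

$$\Delta_{(RSS)vz} = \bar{y}_{rss}\exp\left[\frac{\mu_x - \bar{x}_{rss}}{\mu_x + \bar{x}_{rss}}\right]. \tag{2.7}$$

The MSE of $\Delta_{(RSS)vz}$ is

$$\mathrm{MSE}(\Delta_{(RSS)vz}) = \mu_y^2\left[\frac{V_{xx}}{4} + V_{yy} - V_{xy}\right]. \tag{2.8}$$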
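For readers who wish to verify (2.8), the following worked expansion sketches the usual first-order argument. It is a reconstruction under the $e_0$, $e_1$ notation of Section 1, not a derivation quoted from Vishwakarma et al.

```latex
\begin{aligned}
\frac{\mu_x - \bar{x}_{rss}}{\mu_x + \bar{x}_{rss}}
  &= \frac{-e_1}{2 + e_1}
   = -\frac{e_1}{2}\left(1 + \frac{e_1}{2}\right)^{-1}
   \approx -\frac{e_1}{2} + \frac{e_1^2}{4}, \\
\exp\left[\frac{\mu_x - \bar{x}_{rss}}{\mu_x + \bar{x}_{rss}}\right]
  &\approx 1 - \frac{e_1}{2} + \frac{3}{8}e_1^2, \\
\Delta_{(RSS)vz} - \mu_y
  &\approx \mu_y\left[e_0 - \frac{e_1}{2} + \frac{3}{8}e_1^2 - \frac{e_0 e_1}{2}\right], \\
\mathrm{MSE}(\Delta_{(RSS)vz})
  &\approx \mu_y^2\, E\left[\left(e_0 - \frac{e_1}{2}\right)^2\right]
   = \mu_y^2\left[V_{yy} + \frac{V_{xx}}{4} - V_{xy}\right].
\end{aligned}
```

Taking expectations in the third line also gives the first-order bias $\mu_y\left[\tfrac{3}{8}V_{xx} - \tfrac{1}{2}V_{xy}\right]$, and comparing (2.8) with $\mu_y^2 V_{yy}$ shows that the exponential estimator improves on $\bar{y}_{rss}$ whenever $V_{xy} > V_{xx}/4$.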

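Shahzad et al. (2019) introduced the following generalized form of estimators under simple random sampling:

$$\Delta_{(SRS)sh} = \left[w_{sh1}\,\bar{y}\left\{\frac{a\mu_x + b}{\alpha(a\bar{x} + b) + (1 - \alpha)(a\mu_x + b)}\right\}^{g} + w_{sh2}(\mu_x - \bar{x})\right]\exp\left[\frac{\eta_{sh}(\mu_x - \bar{x})}{\eta_{sh}\{2\xi\mu_x - \phi\mu_x + \bar{x}\} + 2\lambda}\right], \tag{2.9}$$

where $w_{sh1}$, $w_{sh2}$, $\alpha$, $a$, $b$, $g$, $\eta_{sh}$, $\xi$, $\phi$ and $\lambda$ are all constants. The bias and MSE of $\Delta_{(SRS)sh}$ are

$$B(\Delta_{(SRS)sh}) = w_{sh1}\mu_y\left[1 - k_1 V_{xy} + \left\{\frac{g(g+1)}{2}\alpha^2\Gamma^2 + \alpha\gamma_{sh}\Gamma g + \frac{3}{2}\gamma_{sh}^2\right\}V_{xx}\right] + w_{sh2}\mu_x\gamma_{sh}V_{xx} - \mu_y,$$

$$\mathrm{MSE}(\Delta_{(SRS)sh}) = \mu_y^2 + w_{sh1}^2\Theta_{Ash} + w_{sh2}^2\Theta_{Bsh} + w_{sh1}w_{sh2}\Theta_{Csh} + w_{sh1}\Theta_{Dsh} + w_{sh2}\Theta_{Esh},$$

where

$$\Gamma = \frac{a\mu_x}{a\mu_x + b}, \quad k_1 = \alpha g\Gamma + \gamma_{sh}, \quad \gamma_{sh} = \frac{\eta_{sh}\mu_x}{(b_k + 1)\eta_{sh}\mu_x + 2\lambda}, \quad b_k = 2\xi - \phi, \quad k_2 = \frac{g(g+1)}{2}\alpha^2\Gamma^2 + \alpha\gamma_{sh}\Gamma g + \frac{3}{2}\gamma_{sh}^2,$$

$$\Theta_{Ash} = \mu_y^2\left[1 + V_{yy} + (k_1^2 + 2k_2)V_{xx} - 4k_1 V_{xy}\right], \quad \Theta_{Bsh} = \mu_x^2 V_{xx}, \quad \Theta_{Csh} = 2\mu_y\mu_x\left[(k_1 + \gamma_{sh})V_{xx} - V_{xy}\right],$$

$$\Theta_{Dsh} = 2\mu_y^2\left[k_1 V_{xy} - k_2 V_{xx} - 1\right], \quad \Theta_{Esh} = -2\mu_y\mu_x\gamma_{sh}V_{xx}.$$

The MSE is minimized for

$$w_{sh1(opt)} = \frac{-2\Theta_{Bsh}\Theta_{Dsh} + \Theta_{Csh}\Theta_{Esh}}{4\Theta_{Ash}\Theta_{Bsh} - \Theta_{Csh}^2}, \quad w_{sh2(opt)} = \frac{\Theta_{Csh}\Theta_{Dsh} - 2\Theta_{Ash}\Theta_{Esh}}{4\Theta_{Ash}\Theta_{Bsh} - \Theta_{Csh}^2}.$$

The minimum MSE of $\Delta_{(SRS)sh}$ is given by

$$\mathrm{MSE}(\Delta_{(SRS)sh})_{\min} = \mu_y^2 - \frac{\Theta_{Bsh}\Theta_{Dsh}^2 + \Theta_{Ash}\Theta_{Esh}^2 - \Theta_{Csh}\Theta_{Dsh}\Theta_{Esh}}{4\Theta_{Ash}\Theta_{Bsh} - \Theta_{Csh}^2}. \tag{2.10}$$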
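The optimum weights and (2.10) follow from minimizing a quadratic form in the two weights. The worked steps below are a sketch written with generic symbols $w_1$, $w_2$ and $\Theta_A, \ldots, \Theta_E$ (subscripts dropped for readability), and they assume $4\Theta_A\Theta_B - \Theta_C^2 > 0$ so that the stationary point is a minimum.

```latex
\begin{aligned}
Q(w_1, w_2) &= \mu_y^2 + w_1^2\Theta_A + w_2^2\Theta_B + w_1 w_2\Theta_C + w_1\Theta_D + w_2\Theta_E, \\
\frac{\partial Q}{\partial w_1} &= 2w_1\Theta_A + w_2\Theta_C + \Theta_D = 0, \qquad
\frac{\partial Q}{\partial w_2} = w_1\Theta_C + 2w_2\Theta_B + \Theta_E = 0, \\
w_{1(opt)} &= \frac{-2\Theta_B\Theta_D + \Theta_C\Theta_E}{4\Theta_A\Theta_B - \Theta_C^2}, \qquad
w_{2(opt)} = \frac{\Theta_C\Theta_D - 2\Theta_A\Theta_E}{4\Theta_A\Theta_B - \Theta_C^2}, \\
Q_{\min} &= \mu_y^2 + \tfrac{1}{2}\left(w_{1(opt)}\Theta_D + w_{2(opt)}\Theta_E\right)
  = \mu_y^2 - \frac{\Theta_B\Theta_D^2 + \Theta_A\Theta_E^2 - \Theta_C\Theta_D\Theta_E}{4\Theta_A\Theta_B - \Theta_C^2}.
\end{aligned}
```

The third line solves the two normal equations by Cramer's rule, and the last line uses the fact that, at a stationary point of a quadratic, the quadratic part equals minus one half of the linear part.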

3. Proposed Generalized Family of Estimators under RSS

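Motivated by Shahzad et al. (2019), we propose the following generalized family of estimators under ranked set sampling:

$$\Delta_{(RSS)G} = \left[w_1\,\bar{y}_{rss}\left\{\frac{a\mu_x + b}{\alpha(a\bar{x}_{rss} + b) + (1 - \alpha)(a\mu_x + b)}\right\}^{g} + w_2(\mu_x - \bar{x}_{rss})\right]\exp\left[\frac{\eta(\mu_x - \bar{x}_{rss})}{\eta\{2\xi\mu_x - \phi\mu_x + \bar{x}_{rss}\} + 2\lambda}\right], \tag{3.1}$$

where $w_1$ and $w_2$ are unknown constants and $\alpha$, $a$, $b$, $g$, $\eta$, $\xi$, $\phi$ and $\lambda$ are suitably chosen known constants.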
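To make the structure of the family concrete, the sketch below evaluates (3.1) for given RSS sample means, reading the estimator as a weighted ratio-type term plus a difference term, both multiplied by the exponential factor with denominator $\eta\{2\xi\mu_x - \phi\mu_x + \bar{x}_{rss}\} + 2\lambda$. The function name `delta_rss_g`, the argument names and the numerical inputs are assumptions made for illustration.

```python
import math

def delta_rss_g(y_bar, x_bar, mu_x, w1, w2, alpha, a, b, g, eta, xi, phi, lam):
    """Evaluate the generalized estimator (3.1) for given RSS sample means."""
    ratio = (a * mu_x + b) / (alpha * (a * x_bar + b) + (1 - alpha) * (a * mu_x + b))
    exponent = eta * (mu_x - x_bar) / (
        eta * (2 * xi * mu_x - phi * mu_x + x_bar) + 2 * lam
    )
    return (w1 * y_bar * ratio**g + w2 * (mu_x - x_bar)) * math.exp(exponent)

# Hypothetical sample means and known population mean of X.
y_bar, x_bar, mu_x = 21.0, 9.5, 10.0

# eta = 0 switches the exponential factor off; with g = 1, a = 1, b = 0 and
# alpha = 1 the family member is the classical ratio estimator.
ratio_member = delta_rss_g(y_bar, x_bar, mu_x, w1=1, w2=0, alpha=1, a=1, b=0,
                           g=1, eta=0, xi=0, phi=0, lam=1)

# g = 0 switches the ratio factor off; eta = xi = phi = 1 and lam = 0 give
# the exponential estimator (2.7).
exp_member = delta_rss_g(y_bar, x_bar, mu_x, w1=1, w2=0, alpha=1, a=1, b=0,
                         g=0, eta=1, xi=1, phi=1, lam=0)
```

Both special cases set $w_1 = 1$ and $w_2 = 0$; in practice the weights would be replaced by their optimum values derived below.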

Derivation of Bias and Mean Square Error

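Rewriting the above estimator in terms of $e_0$ and $e_1$, to the first order of approximation, we get

$$\Delta_{(RSS)G} = \left[w_1\mu_y\left\{1 + e_0 - \alpha g\Gamma e_1 + \frac{g(g+1)}{2}\alpha^2\Gamma^2 e_1^2 - \alpha g\Gamma e_0 e_1\right\} - w_2\mu_x e_1\right]\left[1 - \nu e_1 + \frac{3}{2}\nu^2 e_1^2\right],$$

where

$$\Gamma = \frac{a\mu_x}{a\mu_x + b}, \quad k_1 = \alpha g\Gamma + \nu, \quad k_2 = \frac{g(g+1)}{2}\alpha^2\Gamma^2 + \nu\alpha g\Gamma + \frac{3}{2}\nu^2, \quad b_k = 2\xi - \phi, \quad \nu = \frac{\eta\mu_x}{\eta\mu_x(b_k + 1) + 2\lambda}.$$

Subtracting $\mu_y$ from both sides,

$$\Delta_{(RSS)G} - \mu_y = w_1\mu_y\left[1 + e_0 - k_1 e_1 + k_2 e_1^2 - k_1 e_0 e_1\right] - w_2\mu_x\left[e_1 - \nu e_1^2\right] - \mu_y. \tag{3.2}$$

Taking expectations on both sides of (3.2), the bias of $\Delta_{(RSS)G}$ is

$$\mathrm{Bias}(\Delta_{(RSS)G}) = w_1\mu_y\left[1 - k_1 V_{xy} + k_2 V_{xx}\right] + w_2\mu_x\nu V_{xx} - \mu_y.$$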

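Squaring both sides of (3.2) and taking expectations, the MSE of $\Delta_{(RSS)G}$ is

$$\mathrm{MSE}(\Delta_{(RSS)G}) = \mu_y^2 + w_1^2\Theta_{Aam} + w_2^2\Theta_{Bam} + w_1 w_2\Theta_{Cam} + w_1\Theta_{Dam} + w_2\Theta_{Eam},$$

where

$$\Theta_{Aam} = \mu_y^2\left[1 + V_{yy} + (k_1^2 + 2k_2)V_{xx} - 4k_1 V_{xy}\right], \quad \Theta_{Bam} = \mu_x^2 V_{xx}, \quad \Theta_{Cam} = 2\mu_y\mu_x\left[(k_1 + \nu)V_{xx} - V_{xy}\right],$$

$$\Theta_{Dam} = 2\mu_y^2\left[k_1 V_{xy} - k_2 V_{xx} - 1\right], \quad \Theta_{Eam} = -2\mu_y\mu_x\nu V_{xx}.$$

Minimizing the MSE with respect to $w_1$ and $w_2$ gives the optimum values

$$w_{1(opt)} = \frac{-2\Theta_{Bam}\Theta_{Dam} + \Theta_{Cam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2}$$

and

$$w_{2(opt)} = \frac{\Theta_{Cam}\Theta_{Dam} - 2\Theta_{Aam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2}.$$

Hence, the bias at the optimum weights and the minimum MSE are given by

$$\mathrm{Bias}(\Delta_{(RSS)G})_{\min} = w_{1(opt)}\mu_y\left[1 - k_1 V_{xy} + k_2 V_{xx}\right] + w_{2(opt)}\mu_x\nu V_{xx} - \mu_y,$$

$$\mathrm{MSE}(\Delta_{(RSS)G})_{\min} = \mu_y^2 - \frac{\Theta_{Bam}\Theta_{Dam}^2 + \Theta_{Aam}\Theta_{Eam}^2 - \Theta_{Cam}\Theta_{Dam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2}. \tag{3.3}$$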
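The closed forms above can be cross-checked numerically. The sketch below builds the five $\Theta$ terms, the optimum weights and the minimum MSE (3.3), and confirms that evaluating the MSE expression at the optimum weights reproduces (3.3). It assumes $\Theta_{Eam} = -2\mu_y\mu_x\nu V_{xx}$ (the sign that follows from squaring (3.2)); the function names and every numerical input are hypothetical.

```python
def theta_terms(mu_x, mu_y, k1, k2, nu, v_xx, v_yy, v_xy):
    """Theta_A ... Theta_E of the MSE expression for the proposed family."""
    th_a = mu_y**2 * (1 + v_yy + (k1**2 + 2 * k2) * v_xx - 4 * k1 * v_xy)
    th_b = mu_x**2 * v_xx
    th_c = 2 * mu_y * mu_x * ((k1 + nu) * v_xx - v_xy)
    th_d = 2 * mu_y**2 * (k1 * v_xy - k2 * v_xx - 1)
    th_e = -2 * mu_y * mu_x * nu * v_xx
    return th_a, th_b, th_c, th_d, th_e

def mse(w1, w2, mu_y, theta):
    """MSE of the family as a quadratic function of the weights."""
    th_a, th_b, th_c, th_d, th_e = theta
    return (mu_y**2 + w1**2 * th_a + w2**2 * th_b
            + w1 * w2 * th_c + w1 * th_d + w2 * th_e)

def optimum(mu_y, theta):
    """Optimum weights and the minimum MSE (3.3)."""
    th_a, th_b, th_c, th_d, th_e = theta
    den = 4 * th_a * th_b - th_c**2
    w1 = (-2 * th_b * th_d + th_c * th_e) / den
    w2 = (th_c * th_d - 2 * th_a * th_e) / den
    mse_min = mu_y**2 - (th_b * th_d**2 + th_a * th_e**2 - th_c * th_d * th_e) / den
    return w1, w2, mse_min

# Hypothetical inputs: alpha = g = 1, a = 1, b = 0 (Gamma = 1) and nu = 0.5.
alpha, g, big_gamma, nu = 1.0, 1.0, 1.0, 0.5
k1 = alpha * g * big_gamma + nu
k2 = g * (g + 1) / 2 * alpha**2 * big_gamma**2 + nu * alpha * g * big_gamma + 1.5 * nu**2
mu_x, mu_y = 10.0, 20.0
v_xx, v_yy, v_xy = 0.010975, 0.005, 0.0044

theta = theta_terms(mu_x, mu_y, k1, k2, nu, v_xx, v_yy, v_xy)
w1_opt, w2_opt, mse_min = optimum(mu_y, theta)
```

Because the MSE is quadratic in $(w_1, w_2)$, any move away from `(w1_opt, w2_opt)` increases `mse(...)` whenever $4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2 > 0$, which holds for the inputs used here.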

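The estimators of $\mathrm{Bias}(\Delta_{(RSS)G})_{\min}$ and $\mathrm{MSE}(\Delta_{(RSS)G})_{\min}$ based on sample measurements are obtained by replacing the unknown population quantities with their sample counterparts, denoted by hats:

$$\widehat{\mathrm{Bias}}(\Delta_{(RSS)G})_{\min} = \hat{w}_{1(opt)}\hat{\mu}_y\left[1 - k_1\hat{V}_{xy} + k_2\hat{V}_{xx}\right] + \hat{w}_{2(opt)}\mu_x\nu\hat{V}_{xx} - \hat{\mu}_y,$$

$$\widehat{\mathrm{MSE}}(\Delta_{(RSS)G})_{\min} = \hat{\mu}_y^2 - \frac{\hat{\Theta}_{Bam}\hat{\Theta}_{Dam}^2 + \hat{\Theta}_{Aam}\hat{\Theta}_{Eam}^2 - \hat{\Theta}_{Cam}\hat{\Theta}_{Dam}\hat{\Theta}_{Eam}}{4\hat{\Theta}_{Aam}\hat{\Theta}_{Bam} - \hat{\Theta}_{Cam}^2}, \tag{3.4}$$

where

$$\hat{\Theta}_{Aam} = \hat{\mu}_y^2\left[1 + \hat{V}_{yy} + (k_1^2 + 2k_2)\hat{V}_{xx} - 4k_1\hat{V}_{xy}\right], \quad \hat{\Theta}_{Bam} = \mu_x^2\hat{V}_{xx}, \quad \hat{\Theta}_{Cam} = 2\hat{\mu}_y\mu_x\left[(k_1 + \nu)\hat{V}_{xx} - \hat{V}_{xy}\right],$$

$$\hat{\Theta}_{Dam} = 2\hat{\mu}_y^2\left[k_1\hat{V}_{xy} - k_2\hat{V}_{xx} - 1\right], \quad \hat{\Theta}_{Eam} = -2\hat{\mu}_y\mu_x\nu\hat{V}_{xx},$$

$$\hat{w}_{1(opt)} = \frac{-2\hat{\Theta}_{Bam}\hat{\Theta}_{Dam} + \hat{\Theta}_{Cam}\hat{\Theta}_{Eam}}{4\hat{\Theta}_{Aam}\hat{\Theta}_{Bam} - \hat{\Theta}_{Cam}^2}, \quad \hat{w}_{2(opt)} = \frac{\hat{\Theta}_{Cam}\hat{\Theta}_{Dam} - 2\hat{\Theta}_{Aam}\hat{\Theta}_{Eam}}{4\hat{\Theta}_{Aam}\hat{\Theta}_{Bam} - \hat{\Theta}_{Cam}^2}.$$

All of these quantities are computed from the sample observations, so the bias and MSE of the estimator can be quantified for any given sample.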

4. Efficiency Comparison

We derive the theoretical conditions to compare the efficiency of our proposed generalized family of estimators to their competitor estimators.

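  1. By (3.3) and (2.4),

$$\mathrm{MSE}(\Delta_{(RSS)G})_{\min} < \mathrm{MSE}(\Delta_{(RSS)mm1})$$

if

$$\mu_y^2 - \frac{\Theta_{Bam}\Theta_{Dam}^2 + \Theta_{Aam}\Theta_{Eam}^2 - \Theta_{Cam}\Theta_{Dam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2} - \mu_y^2\left[V_{yy} + \delta^2\phi_2^2 V_{xx} - 2\delta\phi_2 V_{xy}\right] < 0.$$

  2. By (3.3) and (2.5),

$$\mathrm{MSE}(\Delta_{(RSS)G})_{\min} < \mathrm{MSE}(\Delta_{(RSS)mm2})$$

if

$$\mu_y^2 - \frac{\Theta_{Bam}\Theta_{Dam}^2 + \Theta_{Aam}\Theta_{Eam}^2 - \Theta_{Cam}\Theta_{Dam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2} - \mu_y^2\left[V_{yy} + \alpha^2\phi_1^2 V_{xx} - 2\alpha\phi_1 V_{xy}\right] < 0.$$

  3. By (3.3) and (2.6),

$$\mathrm{MSE}(\Delta_{(RSS)G})_{\min} < \mathrm{MSE}(\Delta_{(RSS)mm4})$$

if

$$\mu_y^2 - \frac{\Theta_{Bam}\Theta_{Dam}^2 + \Theta_{Aam}\Theta_{Eam}^2 - \Theta_{Cam}\Theta_{Dam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2} - \mu_y^2\left[V_{yy} + \lambda_2^2 V_{xx} - 2\lambda_2 V_{xy}\right] < 0.$$

  4. By (3.3) and (2.8),

$$\mathrm{MSE}(\Delta_{(RSS)G})_{\min} < \mathrm{MSE}(\Delta_{(RSS)vz})$$

if

$$\mu_y^2 - \frac{\Theta_{Bam}\Theta_{Dam}^2 + \Theta_{Aam}\Theta_{Eam}^2 - \Theta_{Cam}\Theta_{Dam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2} - \mu_y^2\left[\frac{V_{xx}}{4} + V_{yy} - V_{xy}\right] < 0.$$

  5. By (3.3) and (2.10),

$$\mathrm{MSE}(\Delta_{(RSS)G})_{\min} < \mathrm{MSE}(\Delta_{(SRS)sh})_{\min}$$

if

$$\left[\mu_y^2 - \frac{\Theta_{Bam}\Theta_{Dam}^2 + \Theta_{Aam}\Theta_{Eam}^2 - \Theta_{Cam}\Theta_{Dam}\Theta_{Eam}}{4\Theta_{Aam}\Theta_{Bam} - \Theta_{Cam}^2}\right] - \left[\mu_y^2 - \frac{\Theta_{Bsh}\Theta_{Dsh}^2 + \Theta_{Ash}\Theta_{Esh}^2 - \Theta_{Csh}\Theta_{Dsh}\Theta_{Esh}}{4\Theta_{Ash}\Theta_{Bsh} - \Theta_{Csh}^2}\right] < 0.$$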

Note: When these conditions are satisfied, the proposed estimators will perform more efficiently as compared to their competitor estimators.

5. Simulation Study

Hypothetical data for the study variable (Y) and the auxiliary variable (X) are generated from a bivariate normal distribution with the following parameters, where $n = mr$ in each setting:

$$N = 1500, \quad n = 20, 30, \quad m = 5, 10, \quad r = 4, 3, \quad \mu_x = 850, \quad \mu_y = 550, \quad C_y = 1.25, \quad C_x = 1.5, \quad \rho_{xy} = 0.4, 0.7, 0.8, 0.9, 0.99.$$

For each value of n, samples have been simulated 50,000 times to calculate the average mean square errors and percent relative efficiencies. The Percent Relative Efficiencies (PREs) of the proposed generalized family of estimators and of the competitor estimators from the literature are presented in Tables 1–5 for different values of n and $\rho_{xy}$.
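A scaled-down version of such a simulation might look like the sketch below. It is not the authors' simulation code: it compares only the plain sample means under RSS and SRS rather than the estimators of Sections 2 and 3, uses 2,000 replications instead of 50,000, takes the PRE to be $100 \times \mathrm{MSE}_{SRS} / \mathrm{MSE}_{RSS}$ (an assumed definition), and all function names and the seed are hypothetical.

```python
import math
import random

def make_population(N, mu_x, mu_y, c_x, c_y, rho, rng):
    """Bivariate normal (x, y) pairs with the requested means, CVs and correlation."""
    s_x, s_y = c_x * mu_x, c_y * mu_y
    pop = []
    for _ in range(N):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu_x + s_x * z1
        y = mu_y + s_y * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        pop.append((x, y))
    return pop

def rss_mean_y(pop, m, r, rng):
    """Mean of y over a ranked set sample with ranking done on x."""
    total = 0.0
    for _ in range(r):
        for i in range(m):
            ranked = sorted(rng.sample(pop, m), key=lambda u: u[0])
            total += ranked[i][1]
    return total / (m * r)

def srs_mean_y(pop, n, rng):
    """Mean of y over a simple random sample without replacement."""
    return sum(u[1] for u in rng.sample(pop, n)) / n

def simulate(rho, m=5, r=4, reps=2000, seed=2021):
    rng = random.Random(seed)
    pop = make_population(1500, 850.0, 550.0, 1.5, 1.25, rho, rng)
    mu_y = sum(u[1] for u in pop) / len(pop)
    mse_rss = sum((rss_mean_y(pop, m, r, rng) - mu_y) ** 2 for _ in range(reps)) / reps
    mse_srs = sum((srs_mean_y(pop, m * r, rng) - mu_y) ** 2 for _ in range(reps)) / reps
    return mse_srs, mse_rss

mse_srs, mse_rss = simulate(rho=0.9)
pre = 100 * mse_srs / mse_rss  # assumed PRE definition, SRS mean as reference
```

Replacing the two sample means by the estimators under comparison, and looping over the values of $n$ and $\rho_{xy}$ listed above, would give tables with the same layout as Tables 1–5.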

Table 1. PRE of Estimators by Simulation Study with ρxy = 0.4

Table 2. PRE of Estimators by Simulation Study with ρxy = 0.7

Table 3. PRE of Estimators by Simulation Study with ρxy = 0.8

Table 4. PRE of Estimators by Simulation Study with ρxy = 0.9

Table 5. PRE of Estimators by Simulation Study with ρxy = 0.99

Table 1 shows that, when the correlation coefficient of x and y equals 0.4 and n = 20, the proposed estimator is 257.31% more efficient than $\Delta_{(SRS)sh}$, and 204.05%, 201.47%, 223.89% and 224.67% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively. At the same correlation coefficient, when the sample size is increased to n = 30, the proposed estimator is 262.10% more efficient than $\Delta_{(SRS)sh}$, and 201.59%, 198.80%, 237.06% and 232.31% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 2 shows that, when the correlation coefficient of x and y equals 0.7 and n = 20, the proposed estimator is 274.65% more efficient than $\Delta_{(SRS)sh}$, and 204.64%, 201.69%, 233.34% and 231.27% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively. When the sample size is increased to n = 30, the proposed estimator is 319.29% more efficient than $\Delta_{(SRS)sh}$, and 209.63%, 202.55%, 252.28% and 241.51% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 3 shows that, when the correlation coefficient of x and y equals 0.8 and n = 20, the proposed estimator is 366.52% more efficient than $\Delta_{(SRS)sh}$, and 218.39%, 209.52%, 249.15% and 241.89% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively. When the sample size is increased to n = 30, the proposed estimator is 394.62% more efficient than $\Delta_{(SRS)sh}$, and 190.97%, 186.10%, 248.46% and 219.85% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 4 shows that, when the correlation coefficient of x and y equals 0.9 and n = 20, the proposed estimator is 437.43% more efficient than $\Delta_{(SRS)sh}$, and 210.49%, 193.42%, 292.25% and 236.98% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively. When the sample size is increased to n = 30, the proposed estimator is 516.48% more efficient than $\Delta_{(SRS)sh}$, and 185.20%, 178.34%, 318.10% and 258.32% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 5 shows that, when the correlation coefficient of x and y equals 0.99 and n = 20, the proposed estimator is 614.42% more efficient than $\Delta_{(SRS)sh}$, and 278.19%, 269.29%, 367.86% and 331.27% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively. When the sample size is increased to n = 30, the proposed estimator is 680.86% more efficient than $\Delta_{(SRS)sh}$, and 227.26%, 216.05%, 351.55% and 305.36% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

The simulated results in Tables 1–5 show the trend that, as the sample size increases, the efficiency of the proposed estimator under the RSS design also increases compared with the estimator under the SRS design.

The results also reveal that, as $\rho_{xy}$ increases, the proposed estimator under RSS performs more efficiently compared with its competitor estimator under SRS (i.e. $\Delta_{(SRS)sh}$). Therefore, we may say that as the correlation coefficient of x and y increases, the use of RSS becomes more appropriate than SRS.

6. Real-Life Applications

To observe performances of the estimators, we use the following real-life data sets. The descriptions of these populations are given below.

Population I [source: James et al. (2013)]

The summary statistics are given below.

Y: Acceleration of automobiles

X: Engine horsepower of automobiles

Objective: To estimate the population mean of the acceleration of automobiles.

$$N = 392,\ n = 30,\ m = 10,\ r = 3,\ \mu_x = 104.4694,\ \mu_y = 15.5413,\ S_y = 2.7589,\ S_x = 38.4912,\ C_x = 0.3684,\ C_y = 0.1775,\ C_{xy} = 0.0451,\ \beta_{2x} = 0.6541,\ \beta_{1x} = 1.079,\ \rho_{xy} = 0.9091.$$
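The derived constants that enter the competitor estimators follow directly from these summary statistics. The short sketch below recomputes the coefficients of variation for Population I and evaluates $\phi_1$, $\phi_2$ and $\lambda_2$ as defined in Section 2; the variable names are illustrative only.

```python
# Summary statistics reported for Population I.
mu_x, mu_y = 104.4694, 15.5413
s_x, s_y = 38.4912, 2.7589
beta2_x = 0.6541

# Coefficients of variation, C = S / mu.
c_x = s_x / mu_x
c_y = s_y / mu_y

# Constants used by the Mandowara and Mehta estimators (2.1)-(2.3).
phi_1 = mu_x * beta2_x / (mu_x * beta2_x + c_x)
phi_2 = mu_x * c_x / (mu_x * c_x + beta2_x)
lambda_2 = mu_x / (mu_x + beta2_x)
```

Rounded to four decimals, `c_x` and `c_y` agree with the reported $C_x = 0.3684$ and $C_y = 0.1775$.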

Population II [source: Multiple Indicator Cluster Survey (MICS), 2018–19]

The summary statistics are given below.

Y: Body Mass Index (BMI)

X: Weight

Objective: To estimate the population mean of Body Mass Index (BMI).

$$N = 39118,\ n = 30,\ m = 10,\ r = 3,\ \mu_x = 12.1883,\ \mu_y = 16.8151,\ S_y = 10.8438,\ S_x = 10.7911,\ C_x = 0.8854,\ C_y = 0.6449,\ C_{xy} = 0.4877,\ \beta_{2x} = 54.4802,\ \beta_{1x} = 7.0622,\ \rho_{xy} = 0.5542.$$

Population III [source: Multiple Indicator Cluster Survey (MICS), 2018–19]

The summary statistics are given below.

Y: Weight

X: Height

Objective: To estimate the population mean of Weight.

$$N = 39118,\ n = 30,\ m = 10,\ r = 3,\ \mu_x = 94.6221,\ \mu_y = 12.1883,\ S_y = 10.7911,\ S_x = 101.0391,\ C_x = 1.0678,\ C_y = 0.8853,\ C_{xy} = 0.7016,\ \beta_{2x} = 74.6241,\ \beta_{1x} = 8.658,\ \rho_{xy} = 0.7421.$$

Population IV [source: Daly et al. (2001)]

The summary statistics are given below.

Y: Body Mass Index (BMI) of Crohn’s disease patients

X: Weight of Crohn’s disease patients

Objective: To estimate the population mean of Body Mass Index (BMI) of Crohn’s disease patients.

$$N = 117,\ n = 20,\ m = 5,\ r = 4,\ \mu_x = 69.0256,\ \mu_y = 26.0624,\ S_y = 4.9888,\ S_x = 14.2438,\ C_x = 0.2063,\ C_y = 0.1914,\ C_{xy} = 0.0325,\ \beta_{2x} = 0.7746,\ \beta_{1x} = 0.6571,\ \rho_{xy} = 0.8222.$$

Population V [source: Husby et al. (2005)]

The summary statistics are given below.

Y: Body Mass Index (BMI)

X: Thigh Circumference

Objective: To estimate the population mean of Body Mass Index (BMI).

$$N = 36,\ n = 8,\ m = 4,\ r = 2,\ \mu_x = 49.3806,\ \mu_y = 25.678,\ S_y = 3.8198,\ S_x = 3.7599,\ C_x = 0.0761,\ C_y = 0.1488,\ C_{xy} = 0.0066,\ \beta_{2x} = 0.6159,\ \beta_{1x} = 0.0607,\ \rho_{xy} = 0.9848.$$

The Percent Relative Efficiencies (PREs) of the proposed generalized family of estimators and of the competitor estimators from the literature are presented in Tables 6–10 for the different real-life populations.

Table 6 shows that, for Population I, the proposed estimator is 363.74% more efficient than $\Delta_{(SRS)sh}$, and 150.18%, 150.18%, 155.15% and 153.49% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 7 shows that, for Population II, the proposed estimator is 418.12% more efficient than $\Delta_{(SRS)sh}$, and 140.35%, 140.35%, 177.79% and 151.89% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 8 shows that, for Population III, the proposed estimator is 149.93% more efficient than $\Delta_{(SRS)sh}$, and 118.81%, 118.81%, 127.66% and 121.52% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 9 shows that, for Population IV, the proposed estimator is 221.91% more efficient than $\Delta_{(SRS)sh}$, and 107.11%, 107.11%, 100.32% and 120.72% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 10 shows that, for Population V, the proposed estimator is 410.47% more efficient than $\Delta_{(SRS)sh}$, and 104.32%, 104.32%, 124.13% and 186.79% more efficient than $\Delta_{(RSS)mm1}$, $\Delta_{(RSS)mm2}$, $\Delta_{(RSS)mm4}$ and $\Delta_{(RSS)vz}$, respectively.

Table 6. PRE of Estimators for Population I

Table 7. PRE of Estimators for Population II

Table 8. PRE of Estimators for Population III

Table 9. PRE of Estimators for Population IV

Table 10. PRE of Estimators for Population V

7. Conclusion

In this study, we proposed a generalized family of estimators under RSS to estimate the finite population mean, motivated by Shahzad et al. (2019). The biases and MSEs of the proposed estimators were derived up to the first order of approximation. The efficiency conditions for the proposed generalized estimator were also derived. The MSEs of all estimators were computed from a simulation study and from real-life data sets, and the proposed generalized family of estimators was shown to be more efficient than the competitor estimators under SRS and RSS. It may be concluded that, with an increase in sample size and $\rho_{xy}$, the proposed estimator under RSS performs more efficiently compared with its competitor estimator under SRS (i.e. $\Delta_{(SRS)sh}$).

Public interest statement

Estimating population parameters with minimum mean square error is an important issue in survey sampling. Researchers have proposed different sampling designs and estimators to deal with this issue. In this study, we proposed a generalized family of estimators for estimating the population mean under classical ranked set sampling. A mathematical comparison, a simulation study and real-life applications were used to compare efficiency.

Acknowledgements

The authors wish to thank the DG, Bureau of Statistics Punjab, for providing the Multiple Indicator Cluster Survey (MICS) data for the year 2018–19.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The authors received no direct funding for this research.

References