STATISTICS

Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling

Umer Daraz & Mursala Khan
Article: 1899402 | Received 15 Jul 2020, Accepted 22 Feb 2021, Published online: 11 Apr 2021

ABSTRACT

In this article, we suggest a class of estimators for the population variance of the variable of interest. The proposed estimators use certain known information about the auxiliary variable, such as its kurtosis, coefficient of variation, and minimum and maximum values. The bias and mean squared error (MSE) of the suggested class of estimators are obtained up to the first order of approximation. To check the performance of the estimators and verify the theoretical results, we conducted a simulation study, which shows that the proposed class of estimators has a lower MSE than the existing estimators in all simulation scenarios. In the application part, data from the Bureau of Statistics of Pakistan and from Cochran's textbook also confirm that the suggested class of estimators is more efficient than the usual unbiased variance estimator, the ratio estimator, the traditional regression estimator, and other existing estimators in the survey literature.

1. Introduction

The purpose of survey sampling is to obtain accurate information about the characteristics of a population while improving the efficiency of the estimators under study at the lowest cost and with the least time and human effort (for more details, see Yang et al. (2020)). Many populations contain a few extreme values, and estimates of unknown population parameters that ignore this information are very sensitive to them, so the parameters may be underestimated or overestimated. To address this issue, it is important to use this information when estimating population parameters. Isaki (1983), Bahl and Tuteja (1991), Upadhyaya and Singh (1999), Kadilar and Cingi (2006), Dubey and Sharma (2008), H. Singh and Chandra (2008), Shabbir and Gupta (2010), H. P. Singh and Solanki (2013), and Yadav et al. (2015) have all suggested wider classes of estimators for estimating the finite population variance.

Consider a finite population $U = \{U_1, U_2, \ldots, U_N\}$ of $N$ units. Let $y_i$ and $x_i$ be the values of the study variable $Y$ and the auxiliary variable $X$ for the $i$th unit, respectively. Let $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ and $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$ be the population means of the study and auxiliary variables, and let $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$ and $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2$ be the corresponding population variances.

To estimate the unknown population variance $S_y^2$, we select a random sample of $n$ units from the population by simple random sampling without replacement (SRSWOR). Let $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the sample means of the study and auxiliary variables, respectively, and let $s_y^2 = \hat{S}_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ and $s_x^2 = \hat{S}_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ be the corresponding sample variances.
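As a minimal sketch, assuming Python with NumPy, the population and sample quantities defined above can be computed as follows. The function names and the toy population (uniform $X$, $Y = 0.8X + e$) are hypothetical and only illustrate the definitions; `ddof=1` gives the $N-1$ and $n-1$ divisors.

```python
import numpy as np


def population_stats(Y, X):
    """Return (Ybar, Xbar, S_y^2, S_x^2) with divisor N - 1."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    return Y.mean(), X.mean(), Y.var(ddof=1), X.var(ddof=1)


def srswor_sample_stats(Y, X, n, rng):
    """Draw n units by SRSWOR and return (ybar, xbar, s_y^2, s_x^2)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    idx = rng.choice(len(Y), size=n, replace=False)  # sampling without replacement
    y, x = Y[idx], X[idx]
    return y.mean(), x.mean(), y.var(ddof=1), x.var(ddof=1)


# Hypothetical toy population, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(3, 5, size=100)
Y = 0.8 * X + rng.normal(size=100)

print(population_stats(Y, X))
print(srswor_sample_stats(Y, X, n=15, rng=rng))
```

Drawing all $N$ units reproduces the population values, which is a convenient check of the implementation.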

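To obtain the bias and MSE of the different estimators, we define the error terms $e_0 = \frac{s_y^2 - S_y^2}{S_y^2}$, $e_1 = \frac{s_x^2 - S_x^2}{S_x^2}$ and $e_2 = \frac{\bar{x} - \bar{X}}{\bar{X}}$, such that $E(e_i) = 0$ for $i = 0, 1, 2$, and

$$E(e_0^2) = \theta\lambda_{40}^*,\quad E(e_1^2) = \theta\lambda_{04}^*,\quad E(e_2^2) = \theta C_x^2,\quad E(e_0e_1) = \theta\lambda_{22}^*,\quad E(e_0e_2) = \theta C_x\lambda_{21},\quad E(e_1e_2) = \theta C_x\lambda_{03},$$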

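where $\lambda_{40}^* = \lambda_{40} - 1$, $\lambda_{04}^* = \lambda_{04} - 1$, $\lambda_{22}^* = \lambda_{22} - 1$ and $\theta = \frac{1}{n} - \frac{1}{N}$. Also $\lambda_{rs} = \frac{\mu_{rs}}{\mu_{20}^{r/2}\mu_{02}^{s/2}}$, where $\mu_{rs} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^r(X_i - \bar{X})^s$. Here $\lambda_{40} = \beta_{2(y)}$ and $\lambda_{04} = \beta_{2(x)}$ are the population coefficients of kurtosis.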
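The moments $\mu_{rs}$, the standardized moments $\lambda_{rs}$ and the factor $\theta$ can be sketched in Python (NumPy assumed) as below. The function names and the toy population are hypothetical. A quick sanity check is that $\lambda_{20} = 1$ and that $\lambda_{11}$ is the ordinary correlation coefficient, since the $N-1$ divisors cancel.

```python
import numpy as np


def mu(Y, X, r, s):
    """Central product moment mu_rs with divisor N - 1."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    return np.sum((Y - Y.mean()) ** r * (X - X.mean()) ** s) / (len(Y) - 1)


def lam(Y, X, r, s):
    """Standardized moment lambda_rs = mu_rs / (mu_20^(r/2) * mu_02^(s/2))."""
    return mu(Y, X, r, s) / (mu(Y, X, 2, 0) ** (r / 2) * mu(Y, X, 0, 2) ** (s / 2))


def theta(n, N):
    """Finite-population factor theta = 1/n - 1/N."""
    return 1.0 / n - 1.0 / N


# Hypothetical toy population, for illustration only.
rng = np.random.default_rng(2)
X = rng.exponential(scale=1 / 3, size=1000)
Y = 0.8 * X + rng.normal(size=1000)

lam40_star = lam(Y, X, 4, 0) - 1  # beta_2(y) - 1
lam04_star = lam(Y, X, 0, 4) - 1  # beta_2(x) - 1
lam22_star = lam(Y, X, 2, 2) - 1
print(lam40_star, lam04_star, lam22_star, theta(15, 1000))
```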

The variance of the usual unbiased estimator $\hat{S}_y^2 = s_y^2$ [1] of the population variance is given by

$$\mathrm{Var}(\hat{S}_y^2) = \theta S_y^4\lambda_{40}^*. \tag{1.1}$$

Isaki (1983) suggested a ratio-type estimator for the variance of the study variable $Y$, denoted by $\hat{S}_R^2$ [2] and given by

$$\hat{S}_R^2 = s_y^2\frac{S_x^2}{s_x^2}. \tag{1.2}$$

The bias and MSE of $\hat{S}_R^2$ in simple random sampling (SRS) are given by

$$\mathrm{Bias}(\hat{S}_R^2) \cong \theta S_y^2(\lambda_{04}^* - \lambda_{22}^*), \tag{1.3}$$

and

$$\mathrm{MSE}(\hat{S}_R^2) \cong \theta S_y^4(\lambda_{40}^* + \lambda_{04}^* - 2\lambda_{22}^*). \tag{1.4}$$

The classical regression estimator $\hat{S}_{lr}^2$ [3] in SRS is given by

$$\hat{S}_{lr}^2 = s_y^2 + b_{(s_y^2, s_x^2)}(S_x^2 - s_x^2), \tag{1.5}$$

where $b_{(s_y^2, s_x^2)} = \frac{s_y^2\hat{\lambda}_{22}^*}{s_x^2\hat{\lambda}_{04}^*}$ is the sample regression coefficient. The MSE of $\hat{S}_{lr}^2$ is given by

$$\mathrm{MSE}(\hat{S}_{lr}^2) \cong \theta S_y^4\lambda_{40}^*(1 - \rho^2), \tag{1.6}$$

where

$$\rho = \frac{\lambda_{22}^*}{\sqrt{\lambda_{40}^*\lambda_{04}^*}}. \tag{1.7}$$
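To make (1.1), (1.4) and (1.6) concrete, here is a small Python sketch that evaluates them from $\theta$, $S_y^2$ and the starred moments. The function names are hypothetical; the example plugs in the Data 3 summary statistics from Section 5 ($N = 33$, $n = 5$, $S_y = 10.13$, $\lambda_{40} = 5.55$, $\lambda_{04} = 2.08$, $\lambda_{22} = 2.22$), treating the reported $\lambda$'s as the unstarred moments.

```python
def mse_usual(th, Sy2, l40s):
    """(1.1): theta * S_y^4 * lambda*_40."""
    return th * Sy2**2 * l40s


def mse_ratio(th, Sy2, l40s, l04s, l22s):
    """(1.4): theta * S_y^4 * (lambda*_40 + lambda*_04 - 2 lambda*_22)."""
    return th * Sy2**2 * (l40s + l04s - 2 * l22s)


def mse_regression(th, Sy2, l40s, l04s, l22s):
    """(1.6)-(1.7): theta * S_y^4 * lambda*_40 * (1 - rho^2)."""
    rho = l22s / (l40s * l04s) ** 0.5
    return th * Sy2**2 * l40s * (1 - rho**2)


# Data 3 summary statistics (Section 5); starred moments are lambda - 1.
th = 1 / 5 - 1 / 33
Sy2 = 10.13**2
l40s, l04s, l22s = 5.55 - 1, 2.08 - 1, 2.22 - 1

print("usual     :", mse_usual(th, Sy2, l40s))
print("ratio     :", mse_ratio(th, Sy2, l40s, l04s, l22s))
print("regression:", mse_regression(th, Sy2, l40s, l04s, l22s))
```

Because the regression estimator's first-order MSE equals $\theta S_y^4(\lambda_{40}^* - \lambda_{22}^{*2}/\lambda_{04}^*)$, it cannot exceed the first-order MSE of the usual or the ratio estimator when $\lambda_{04}^* > 0$.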

Bahl and Tuteja (1991) suggested an exponential ratio-type estimator for the population variance of the study variable $Y$, denoted by $\hat{S}_{BT}^2$ [4] and given by

$$\hat{S}_{BT}^2 = s_y^2\exp\left(\frac{S_x^2 - s_x^2}{S_x^2 + s_x^2}\right). \tag{1.8}$$

The bias and MSE of $\hat{S}_{BT}^2$ are, respectively,

$$\mathrm{Bias}(\hat{S}_{BT}^2) \cong \frac{1}{2}\theta S_y^2\left(\frac{3}{4}\lambda_{04}^* - \lambda_{22}^*\right), \tag{1.9}$$

and

$$\mathrm{MSE}(\hat{S}_{BT}^2) \cong \theta S_y^4\left(\lambda_{40}^* + \frac{\lambda_{04}^*}{4} - \lambda_{22}^*\right). \tag{1.10}$$
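A Python sketch of the Bahl-Tuteja estimator and its first-order bias and MSE follows. The function names are hypothetical. The last lines check numerically that, for a small relative error $e_1$, the exponential factor is close to $1 - e_1/2 + 3e_1^2/8$, the second-order expansion behind (1.9) and (1.10).

```python
import math


def s2_bt(sy2, sx2, Sx2):
    """Bahl-Tuteja exponential ratio-type estimator (1.8)."""
    return sy2 * math.exp((Sx2 - sx2) / (Sx2 + sx2))


def bias_bt(th, Sy2, l04s, l22s):
    """(1.9): (1/2) theta S_y^2 (3/4 lambda*_04 - lambda*_22)."""
    return 0.5 * th * Sy2 * (0.75 * l04s - l22s)


def mse_bt(th, Sy2, l40s, l04s, l22s):
    """(1.10): theta S_y^4 (lambda*_40 + lambda*_04 / 4 - lambda*_22)."""
    return th * Sy2**2 * (l40s + l04s / 4 - l22s)


# With s_y^2 = 1 the estimator returns the exponential factor itself;
# s_x^2 = S_x^2 (1 + e1) by definition of e1.
Sx2 = 1.0
e1 = 0.01
factor = s2_bt(1.0, Sx2 * (1 + e1), Sx2)
print(factor, 1 - e1 / 2 + 3 * e1**2 / 8)
```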

Upadhyaya and Singh (1999) proposed a ratio-type estimator $\hat{S}_{US}^2$ [5] that uses the kurtosis of the auxiliary variable in SRS, given by

$$\hat{S}_{US}^2 = s_y^2\frac{S_x^2 + \lambda_{04}}{s_x^2 + \lambda_{04}}. \tag{1.11}$$

The bias and MSE of $\hat{S}_{US}^2$ are, respectively,

$$\mathrm{Bias}(\hat{S}_{US}^2) \cong \theta S_y^2 g_0(g_0\lambda_{04}^* - \lambda_{22}^*), \tag{1.12}$$

and

$$\mathrm{MSE}(\hat{S}_{US}^2) \cong \theta S_y^4(\lambda_{40}^* + g_0^2\lambda_{04}^* - 2g_0\lambda_{22}^*), \tag{1.13}$$

where

$$g_0 = \frac{S_x^2}{S_x^2 + \lambda_{04}}. \tag{1.14}$$

Kadilar and Cingi (2006) suggested a class of ratio estimators $\hat{S}_{KC_i}^2$ [6–8], given by

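$$\hat{S}_{KC_1}^2 = s_y^2\frac{S_x^2 + C_x}{s_x^2 + C_x}, \tag{1.15}$$
$$\hat{S}_{KC_2}^2 = s_y^2\frac{\lambda_{04}S_x^2 + C_x}{\lambda_{04}s_x^2 + C_x}, \tag{1.16}$$
$$\hat{S}_{KC_3}^2 = s_y^2\frac{C_xS_x^2 + \lambda_{04}}{C_xs_x^2 + \lambda_{04}}, \tag{1.17}$$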

where $C_x = S_x/\bar{X}$ is the population coefficient of variation of $X$.

The bias and MSE of $\hat{S}_{KC_i}^2$ ($i = 1, 2, 3$) in SRS are, respectively,

$$\mathrm{Bias}(\hat{S}_{KC_i}^2) \cong \theta S_y^2 g_i(g_i\lambda_{04}^* - \lambda_{22}^*), \tag{1.18}$$

and

$$\mathrm{MSE}(\hat{S}_{KC_i}^2) \cong \theta S_y^4(\lambda_{40}^* + g_i^2\lambda_{04}^* - 2g_i\lambda_{22}^*), \tag{1.19}$$

where

$$g_1 = \frac{S_x^2}{S_x^2 + C_x},\quad g_2 = \frac{\lambda_{04}S_x^2}{\lambda_{04}S_x^2 + C_x},\quad g_3 = \frac{C_xS_x^2}{C_xS_x^2 + \lambda_{04}}. \tag{1.20}$$
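The Upadhyaya-Singh estimator (1.11) and the three Kadilar-Cingi estimators (1.15)-(1.17) all have the form $s_y^2(aS_x^2 + b)/(as_x^2 + b)$ with $g = aS_x^2/(aS_x^2 + b)$, so a single Python sketch covers (1.12)-(1.14) and (1.18)-(1.20). The helper names are hypothetical; the example uses the Data 3 statistics from Section 5.

```python
def s2_family(sy2, sx2, Sx2, a, b):
    """s_y^2 (a S_x^2 + b) / (a s_x^2 + b): covers (1.11) and (1.15)-(1.17)."""
    return sy2 * (a * Sx2 + b) / (a * sx2 + b)


def g_factor(Sx2, a, b):
    """g = a S_x^2 / (a S_x^2 + b): g_0 in (1.14) and g_i in (1.20)."""
    return a * Sx2 / (a * Sx2 + b)


def bias_family(th, Sy2, g, l04s, l22s):
    """(1.12)/(1.18): theta S_y^2 g (g lambda*_04 - lambda*_22)."""
    return th * Sy2 * g * (g * l04s - l22s)


def mse_family(th, Sy2, g, l40s, l04s, l22s):
    """(1.13)/(1.19): theta S_y^4 (lambda*_40 + g^2 lambda*_04 - 2 g lambda*_22)."""
    return th * Sy2**2 * (l40s + g**2 * l04s - 2 * g * l22s)


# Data 3 (Section 5)
th, Sy2, Sx2, Cx = 1 / 5 - 1 / 33, 10.13**2, 10.58**2, 0.15
l40, l04, l22 = 5.55, 2.08, 2.22
settings = {
    "US": (1.0, l04),  # (1.11): a = 1, b = lambda_04
    "KC1": (1.0, Cx),  # (1.15)
    "KC2": (l04, Cx),  # (1.16)
    "KC3": (Cx, l04),  # (1.17)
}
for name, (a, b) in settings.items():
    g = g_factor(Sx2, a, b)
    print(name, round(g, 5), mse_family(th, Sy2, g, l40 - 1, l04 - 1, l22 - 1))
```

Setting $b = 0$ gives $g = 1$, and the MSE then reduces to the ratio-estimator MSE (1.4).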

2. Proposed estimators

Motivated by Daraz et al. (2018), we propose an improved class of estimators for the finite population variance $S_y^2$ that uses certain known population parameters under the simple random sampling scheme. The proposed estimator is given by

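$$\hat{S}_D^2 = \left[k_1 s_y^2\left(\frac{S_x^2}{s_x^2}\right)^{\alpha_1} + k_2(\bar{X} - \bar{x})\left(\frac{S_x^2}{s_x^2}\right)^{\alpha_2}\right]\exp\left[-\frac{a_1(s_x^2 - S_x^2)}{a_1(s_x^2 + S_x^2) + 2b_1}\right], \tag{2.1}$$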

where $k_1$ and $k_2$ are unknown constants whose values are determined so that the MSE is minimized, and $a_1$ and $b_1$ are parameters of the auxiliary variable. Also, $\alpha_1$ and $\alpha_2$ are scalars taking the values $0$, $-1$, or $1$. From (2.1) we can generate the different classes of the proposed estimator, which are given in Table 1.

Table 1. Some classes of the proposed estimator

where

$$L = \exp\left[-\frac{a_1(s_x^2 - S_x^2)}{a_1(s_x^2 + S_x^2) + 2b_1}\right]. \tag{2.2}$$
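A minimal Python sketch of (2.1), with the factor $L$ of (2.2), is given below. The function name and the example inputs (including the choice $a_1 = 1$, $b_1 = C_x$) are hypothetical and are not taken from Table 1; in practice $k_1$ and $k_2$ would be set to the optimum values derived in this section.

```python
import math


def s2_proposed(sy2, sx2, xbar, Sx2, Xbar, k1, k2, alpha1, alpha2, a1, b1):
    """Point value of the proposed estimator (2.1)."""
    ratio = Sx2 / sx2
    # Exponential factor L of (2.2)
    L = math.exp(-a1 * (sx2 - Sx2) / (a1 * (sx2 + Sx2) + 2 * b1))
    return (k1 * sy2 * ratio**alpha1 + k2 * (Xbar - xbar) * ratio**alpha2) * L


# Hypothetical sample and population values, for illustration only.
est = s2_proposed(sy2=110.0, sx2=95.0, xbar=71.0, Sx2=111.9, Xbar=72.55,
                  k1=0.9, k2=0.5, alpha1=1, alpha2=0, a1=1.0, b1=0.15)
print(est)
```

Two special cases make useful checks: $k_1 = 1$, $k_2 = 0$, $\alpha_1 = 0$ with $s_x^2 = S_x^2$ returns $s_y^2$, and $a_1 = 0$, $\alpha_1 = 1$, $k_2 = 0$ returns $k_1$ times Isaki's ratio estimator (1.2).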

Properties of the proposed estimator

Rewriting (2.1) in terms of the error terms, we have

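$$\hat{S}_D^2 = \left[k_1S_y^2(1 + e_0)(1 + e_1)^{-\alpha_1} - k_2\bar{X}e_2(1 + e_1)^{-\alpha_2}\right]\exp\left[-\frac{g_4e_1}{2}\left(1 + \frac{g_4e_1}{2}\right)^{-1}\right],$$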

where

$$g_4 = \frac{a_1S_x^2}{a_1S_x^2 + b_1}.$$

By using Taylor series up to the first order of approximation, we have

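$$\begin{aligned}\hat{S}_D^2 - S_y^2 \cong{}& -S_y^2 + k_1S_y^2\left[1 + e_0 - e_1\left(\alpha_1 + \frac{g_4}{2}\right) + e_1^2\left(\frac{\alpha_1g_4}{2} + \frac{3g_4^2}{8} + \frac{\alpha_1(\alpha_1 + 1)}{2}\right) - e_0e_1\left(\alpha_1 + \frac{g_4}{2}\right)\right]\\ &- k_2\bar{X}\left[e_2 - e_1e_2\left(\alpha_2 + \frac{g_4}{2}\right)\right].\end{aligned} \tag{2.3}$$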
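The second-order bracket multiplying $k_1$ in (2.3) can be checked numerically. The Python sketch below (hypothetical helper names) compares the exact factor $(1+e_0)(1+e_1)^{-\alpha_1}\exp[-(g_4e_1/2)(1+g_4e_1/2)^{-1}]$ with its expansion; the gap should shrink roughly like the cube of the size of the errors.

```python
import math


def exact_k1_factor(e0, e1, alpha1, g4):
    """(1 + e0)(1 + e1)^(-alpha1) * exp(-(g4 e1 / 2) / (1 + g4 e1 / 2))."""
    z = -(g4 * e1 / 2) / (1 + g4 * e1 / 2)
    return (1 + e0) * (1 + e1) ** (-alpha1) * math.exp(z)


def approx_k1_factor(e0, e1, alpha1, g4):
    """Second-order expansion used inside the k1 bracket of (2.3)."""
    c = alpha1 + g4 / 2
    d = alpha1 * g4 / 2 + 3 * g4**2 / 8 + alpha1 * (alpha1 + 1) / 2
    return 1 + e0 - c * e1 + d * e1**2 - c * e0 * e1


for h in (1e-2, 1e-3):
    gap = abs(exact_k1_factor(0.5 * h, h, 1, 0.5) - approx_k1_factor(0.5 * h, h, 1, 0.5))
    print(f"h = {h:g}: gap = {gap:.2e}")
```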

Using (2.3), the bias of $\hat{S}_D^2$ is given by

$$\mathrm{Bias}(\hat{S}_D^2) \cong -S_y^2 + k_1S_y^2D + k_2G, \tag{2.4}$$

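where $D = 1 + \theta\left[\frac{\lambda_{04}^*}{8}\left\{3g_4^2 + 4\alpha_1g_4 + 4\alpha_1(\alpha_1 + 1)\right\} - \lambda_{22}^*\left(\alpha_1 + \frac{g_4}{2}\right)\right]$ and $G = \theta S_x\lambda_{03}\left(\alpha_2 + \frac{g_4}{2}\right)$. Squaring both sides of (2.3) and taking expectations, we obtain the MSE of $\hat{S}_D^2$ to the first order of approximation: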

$$\mathrm{MSE}(\hat{S}_D^2) \cong S_y^4 + k_1^2S_y^4A + k_2^2B - 2k_1S_y^4D - 2k_2S_y^2G + 2k_1k_2S_y^2F, \tag{2.5}$$

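where

$$A = 1 + \theta\left[\lambda_{40}^* + \lambda_{04}^*\left\{\left(\alpha_1 + \frac{g_4}{2}\right)^2 + \alpha_1g_4 + \frac{3g_4^2}{4} + \alpha_1(\alpha_1 + 1)\right\} - 2\lambda_{22}^*(2\alpha_1 + g_4)\right],$$

$B = \theta S_x^2$, and $F = \theta S_x\left\{\lambda_{03}(\alpha_1 + \alpha_2 + g_4) - \lambda_{21}\right\}$.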
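The constants $A$, $B$, $D$, $F$ and $G$ are straightforward to evaluate once $\theta$, $S_x$, the moments and $g_4$ are known. A Python sketch (hypothetical names) follows; the example uses the Data 3 statistics from Section 5 with the illustrative, assumed choice $a_1 = 1$, $b_1 = C_x$ for $g_4$. A useful identity for checking the algebra is $A - 2D + 1 = \theta(\lambda_{40}^* + c^2\lambda_{04}^* - 2c\lambda_{22}^*)$ with $c = \alpha_1 + g_4/2$, which is the first-order value of $E[(e_0 - ce_1)^2]$.

```python
def proposed_constants(th, Sx, l40s, l04s, l22s, l03, l21, alpha1, alpha2, g4):
    """Return (A, B, D, F, G) as defined after (2.4) and (2.5)."""
    c = alpha1 + g4 / 2
    D = 1 + th * (l04s * (3 * g4**2 + 4 * alpha1 * g4 + 4 * alpha1 * (alpha1 + 1)) / 8
                  - l22s * c)
    G = th * Sx * l03 * (alpha2 + g4 / 2)
    A = 1 + th * (l40s
                  + l04s * (c**2 + alpha1 * g4 + 3 * g4**2 / 4 + alpha1 * (alpha1 + 1))
                  - 2 * l22s * (2 * alpha1 + g4))
    B = th * Sx**2
    F = th * Sx * (l03 * (alpha1 + alpha2 + g4) - l21)
    return A, B, D, F, G


# Data 3 (Section 5); a1 = 1 and b1 = C_x are illustrative choices only.
th, Sx, Cx = 1 / 5 - 1 / 33, 10.58, 0.15
l40s, l04s, l22s, l03, l21 = 5.55 - 1, 2.08 - 1, 2.22 - 1, 0.51, 0.54
a1, b1 = 1.0, Cx
g4 = a1 * Sx**2 / (a1 * Sx**2 + b1)
A, B, D, F, G = proposed_constants(th, Sx, l40s, l04s, l22s, l03, l21, 1, 0, g4)
print(A, B, D, F, G)
```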

The optimum values of $k_1$ and $k_2$, obtained by minimizing (2.5), are $k_{1(opt)} = \frac{BD - FG}{AB - F^2}$ and $k_{2(opt)} = S_y^2\frac{AG - DF}{AB - F^2}$. Substituting these optimum values into (2.5) gives the minimum MSE of $\hat{S}_D^2$:

$$\mathrm{MSE}(\hat{S}_D^2)_{\min} \cong S_y^4\left[1 - \frac{AG^2 + BD^2 - 2DFG}{AB - F^2}\right]. \tag{2.6}$$
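Setting the partial derivatives of (2.5) with respect to $k_1$ and $k_2$ to zero gives the linear system $S_y^4Ak_1 + S_y^2Fk_2 = S_y^4D$ and $S_y^2Fk_1 + Bk_2 = S_y^2G$, whose solution is the pair of optimum values above. The Python sketch below (NumPy assumed; the names and numeric inputs are hypothetical) solves that system directly and checks it against the closed forms and (2.6).

```python
import numpy as np


def mse_proposed(k1, k2, Sy2, A, B, D, F, G):
    """First-order MSE (2.5)."""
    Sy4 = Sy2**2
    return (Sy4 + k1**2 * Sy4 * A + k2**2 * B
            - 2 * k1 * Sy4 * D - 2 * k2 * Sy2 * G + 2 * k1 * k2 * Sy2 * F)


def optimum_closed_form(Sy2, A, B, D, F, G):
    """k1(opt), k2(opt) and the minimum MSE (2.6)."""
    den = A * B - F**2
    k1 = (B * D - F * G) / den
    k2 = Sy2 * (A * G - D * F) / den
    mse_min = Sy2**2 * (1 - (A * G**2 + B * D**2 - 2 * D * F * G) / den)
    return k1, k2, mse_min


# Hypothetical constants, chosen only so that A*B - F^2 > 0.
Sy2, A, B, D, F, G = 4.0, 1.2, 0.5, 1.01, 0.1, 0.05
M = np.array([[Sy2**2 * A, Sy2 * F], [Sy2 * F, B]])  # normal equations
rhs = np.array([Sy2**2 * D, Sy2 * G])
k_solved = np.linalg.solve(M, rhs)

k1, k2, mse_min = optimum_closed_form(Sy2, A, B, D, F, G)
print(k_solved, (k1, k2), mse_min, mse_proposed(k1, k2, Sy2, A, B, D, F, G))
```

Because the quadratic form in (2.5) is convex when $A > 0$ and $AB - F^2 > 0$, any move away from the optimum increases the MSE.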

3. Mathematical comparison

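In this section, we compare the suggested class of estimators $\hat{S}_D^2$ with the existing estimators $\hat{S}_y^2$, $\hat{S}_R^2$, $\hat{S}_{lr}^2$, $\hat{S}_{BT}^2$, $\hat{S}_{US}^2$, and $\hat{S}_{KC_i}^2$. Each condition follows by dividing both MSE expressions by $S_y^4$ and rearranging.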

Condition (i): By (1.1) and (2.6), $\mathrm{Var}(\hat{S}_y^2) > \mathrm{MSE}(\hat{S}_D^2)_{\min}$ if

$$\theta\lambda_{40}^* + \frac{AG^2 + BD^2 - 2DFG}{AB - F^2} > 1.$$

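Condition (ii): By (1.4) and (2.6), $\mathrm{MSE}(\hat{S}_R^2) > \mathrm{MSE}(\hat{S}_D^2)_{\min}$ if

$$\theta(\lambda_{40}^* + \lambda_{04}^* - 2\lambda_{22}^*) + \frac{AG^2 + BD^2 - 2DFG}{AB - F^2} > 1.$$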

Condition (iii): By (1.6) and (2.6), $\mathrm{MSE}(\hat{S}_{lr}^2) > \mathrm{MSE}(\hat{S}_D^2)_{\min}$ if

$$\theta\lambda_{40}^*(1 - \rho^2) + \frac{AG^2 + BD^2 - 2DFG}{AB - F^2} > 1.$$

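Condition (iv): By (1.10) and (2.6), $\mathrm{MSE}(\hat{S}_{BT}^2) > \mathrm{MSE}(\hat{S}_D^2)_{\min}$ if

$$\theta\left(\lambda_{40}^* + \frac{\lambda_{04}^*}{4} - \lambda_{22}^*\right) + \frac{AG^2 + BD^2 - 2DFG}{AB - F^2} > 1.$$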

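Condition (v): By (1.13) and (2.6), $\mathrm{MSE}(\hat{S}_{US}^2) > \mathrm{MSE}(\hat{S}_D^2)_{\min}$ if

$$\theta(\lambda_{40}^* + g_0^2\lambda_{04}^* - 2g_0\lambda_{22}^*) + \frac{AG^2 + BD^2 - 2DFG}{AB - F^2} > 1.$$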

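Condition (vi): By (1.19) and (2.6), $\mathrm{MSE}(\hat{S}_{KC_i}^2) > \mathrm{MSE}(\hat{S}_D^2)_{\min}$ ($i = 1, 2, 3$) if

$$\theta(\lambda_{40}^* + g_i^2\lambda_{04}^* - 2g_i\lambda_{22}^*) + \frac{AG^2 + BD^2 - 2DFG}{AB - F^2} > 1.$$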
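All six conditions share one pattern: an existing estimator's first-order MSE can be written as $\theta S_y^4K$ for a bracket $K$, and $\theta S_y^4K > S_y^4(1 - Q)$ with $Q = (AG^2 + BD^2 - 2DFG)/(AB - F^2)$ is the same as $\theta K + Q > 1$. The Python sketch below (hypothetical names) collects the brackets $K$ from (1.1), (1.4), (1.6), (1.10), (1.13) and (1.19) and applies that test; the value of $Q$ in the example is a hypothetical placeholder.

```python
def mse_brackets(l40s, l04s, l22s, g_values):
    """Brackets K such that MSE = theta * S_y^4 * K for the existing estimators."""
    rho2 = l22s**2 / (l40s * l04s)  # rho^2 from (1.7)
    K = {
        "usual": l40s,                    # (1.1)
        "ratio": l40s + l04s - 2 * l22s,  # (1.4)
        "regression": l40s * (1 - rho2),  # (1.6)
        "BT": l40s + l04s / 4 - l22s,     # (1.10)
    }
    for name, g in g_values.items():      # (1.13) and (1.19)
        K[name] = l40s + g**2 * l04s - 2 * g * l22s
    return K


def proposed_is_better(th, K, Q):
    """Conditions (i)-(vi): theta * K + Q > 1."""
    return th * K + Q > 1


# Data 3 (Section 5); g_0 and g_1 computed from (1.14) and (1.20).
th = 1 / 5 - 1 / 33
Sx2, l04, Cx = 10.58**2, 2.08, 0.15
g = {"US": Sx2 / (Sx2 + l04), "KC1": Sx2 / (Sx2 + Cx)}
K = mse_brackets(5.55 - 1, l04 - 1, 2.22 - 1, g)
Q = 0.9  # hypothetical value of (A G^2 + B D^2 - 2 D F G) / (A B - F^2)
for name, k in K.items():
    print(name, proposed_is_better(th, k, Q))
```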

4. Simulation study

To verify the theoretical results of Section 3, we conducted a simulation study following the approach of Agarwal et al. (2012). We generated six artificial populations of the auxiliary variable $X$ from the following probability distributions:

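• $X \sim \mathrm{Exponential}(\lambda = 3)$ and $X \sim \mathrm{Exponential}(\lambda = 7)$,
• $X \sim \mathrm{Uniform}(b_3 = 0, b_4 = 1)$ and $X \sim \mathrm{Uniform}(b_3 = 3, b_4 = 5)$,
• $X \sim \mathrm{Gamma}(\alpha_3 = 4, \alpha_4 = 6)$ and $X \sim \mathrm{Gamma}(\alpha_3 = 8, \alpha_4 = 10)$.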

The study variable $Y$ is then computed as $Y = r_{yx} \times X + e$ with $r_{yx} = 0.80$, where $r_{yx}$ is the correlation coefficient between the study and auxiliary variables and $e \sim N(0, 1)$ is the error term.
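The population-generation step can be sketched in Python with NumPy's random generator, as below; this is an illustrative re-implementation, not the authors' R code. Two points are assumptions, since the text does not pin them down: $\lambda$ is treated as the exponential rate (NumPy's `scale` is $1/\lambda$), and the gamma parameters $(\alpha_3, \alpha_4)$ are treated as (shape, scale). The function name and seed are hypothetical.

```python
import numpy as np

# The six scenarios listed above.
SCENARIOS = [
    ("exponential", (3,)), ("exponential", (7,)),
    ("uniform", (0, 1)), ("uniform", (3, 5)),
    ("gamma", (4, 6)), ("gamma", (8, 10)),
]


def make_population(dist, params, N=1000, r_yx=0.80, seed=0):
    """Generate (Y, X) with Y = r_yx * X + e and e ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    if dist == "exponential":
        X = rng.exponential(scale=1 / params[0], size=N)  # assumption: lambda is a rate
    elif dist == "uniform":
        X = rng.uniform(params[0], params[1], size=N)
    elif dist == "gamma":
        X = rng.gamma(shape=params[0], scale=params[1], size=N)  # assumption: (shape, scale)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    e = rng.normal(0.0, 1.0, size=N)
    return r_yx * X + e, X


for dist, params in SCENARIOS:
    Y, X = make_population(dist, params)
    print(dist, params, round(X.mean(), 3), round(np.corrcoef(Y, X)[0, 1], 3))
```

The printed sample correlation shows how close the realized correlation is to $r_{yx}$ in each scenario, since under this construction it also depends on the spread of $X$.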

We used the following steps in R to obtain the MSEs of the proposed class of estimators:

Step 1: We generated a population of size 1000 from one of the probability distributions listed above.

Step 2: From the population in Step 1, we obtained the population total and the minimum and maximum values of the auxiliary variable, and computed the optimum values of the unknown constants of the proposed estimators.

Step 3: For each population, we drew samples of several sizes using SRSWOR.

Step 4: For each sample size, we computed the biases and MSEs of all the estimators considered in this paper.

Step 5: Steps 3 and 4 were repeated 50,000 times. The results for the artificial populations are reported in Table 2, and the results for the real data sets are summarized in Table 3.

Table 2. Mean squared error (MSE) of the estimators using the artificial populations

Table 3. Mean squared error (MSE) of the estimators using empirical data sets

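Finally, the MSE of each estimator over all replications is obtained as

$$\mathrm{MSE}(\hat{S}_k^2) = \frac{1}{50000}\sum_{g=1}^{50000}\left(\hat{S}_{k,g}^2 - S_y^2\right)^2,$$

for $k = R, lr, BT, US, KC_i, D_1, D_2, \ldots, D_8$, where $\hat{S}_{k,g}^2$ is the value of estimator $k$ in the $g$th replication.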
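Steps 3 to 5 and the MSE formula above can be sketched in Python for three of the existing estimators (usual, Isaki ratio, Bahl-Tuteja). This is an illustrative re-implementation, not the authors' R code; the function name, the toy population and the reduced number of replications ($R = 2000$ instead of 50,000) are assumptions made to keep it fast.

```python
import numpy as np


def simulated_mse(Y, X, n, R=2000, seed=0):
    """Monte Carlo MSE: average of (estimate - S_y^2)^2 over R SRSWOR samples."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Sy2, Sx2 = Y.var(ddof=1), X.var(ddof=1)
    sq_err = {"usual": 0.0, "ratio": 0.0, "BT": 0.0}
    for _ in range(R):
        idx = rng.choice(len(Y), size=n, replace=False)
        sy2, sx2 = Y[idx].var(ddof=1), X[idx].var(ddof=1)
        estimates = {
            "usual": sy2,
            "ratio": sy2 * Sx2 / sx2,                       # (1.2)
            "BT": sy2 * np.exp((Sx2 - sx2) / (Sx2 + sx2)),  # (1.8)
        }
        for name, value in estimates.items():
            sq_err[name] += (value - Sy2) ** 2
    return {name: total / R for name, total in sq_err.items()}


# Hypothetical toy population, for illustration only.
rng = np.random.default_rng(3)
X = rng.gamma(shape=4, scale=6, size=1000)
Y = 0.8 * X + rng.normal(size=1000)
print(simulated_mse(Y, X, n=50))
```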

Figure 1. Graphical display of the MSE results of the estimators using the artificial data

Note: The vertical axis of each figure shows the MSEs of the estimators, and the horizontal axis indicates the corresponding estimators. For convenience, the estimators are numbered from 1 to 16. For more details, see . Source: Own computations.

Figure 2. Graphical display of the MSE results of the estimators using the artificial data

Note: The vertical axis of each figure shows the MSEs of the estimators, and the horizontal axis indicates the corresponding estimators. For convenience, the estimators are numbered from 1 to 16. For more details, see . Source: Own computations.

5. Numerical examples

To check the performance of the suggested class of estimators, we used three real data sets to compare the MSEs of the different estimators. Their descriptions and summary statistics are given below, where $X_M$ and $X_m$ denote the maximum and minimum values of the auxiliary variable.

Data 1. (Bureau of Statistics (2013), p. 135)

Y: Total number of students enrolled in 2012,

X: Total number of government primary and secondary schools for boys and girls in 2012.

The summary statistics are given below:

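$N = 36$, $n = 15$, $\bar{Y} = 148718.70$, $\bar{X} = 1054.39$, $S_y = 182315.10$, $S_x = 402.61$, $X_M = 2370$, $X_m = 388$, $C_x = 0.38$, $C_y = 1.23$, $\lambda_{40} = 2365$, $\lambda_{04} = 4698$, $\lambda_{03} = 4697$, $\lambda_{21} = 4698$, $\lambda_{22} = 8975$, $\rho_{yx} = 0.18$.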

Data 2. (Bureau of Statistics (2013), p. 226)

Y: Employment level in 2012 by divisions,

X: Number of registered factories in 2012 by divisions.

The summary statistics are given below:

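$N = 36$, $n = 15$, $\bar{Y} = 52432.86$, $\bar{X} = 335.78$, $S_y = 178201.10$, $S_x = 451.14$, $X_M = 2055$, $X_m = 24$, $C_x = 1.34$, $C_y = 3.3986$, $\rho_{yx} = 0.39$, $\lambda_{40} = 2365$, $\lambda_{04} = 4698$, $\lambda_{03} = 4697$, $\lambda_{21} = 4698$, $\lambda_{22} = 8975$.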


Data 3. (Cochran (1963), p. 24)

Y: Food cost of families employment,

X: Weekly income of families.

The summary statistics are given below:

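$N = 33$, $n = 5$, $\bar{Y} = 27.49$, $\bar{X} = 72.55$, $S_y = 10.13$, $S_x = 10.58$, $X_M = 95$, $X_m = 58$, $C_x = 0.15$, $C_y = 0.37$, $\rho_{yx} = 0.25$, $\lambda_{40} = 5.55$, $\lambda_{04} = 2.08$, $\lambda_{03} = 0.51$, $\lambda_{21} = 0.54$, $\lambda_{22} = 2.22$.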
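As a quick consistency check on the reported summary statistics, $C_x = S_x/\bar{X}$ and $C_y = S_y/\bar{Y}$ can be recomputed from the means and standard deviations listed for Data 1 to 3, along with $\theta = 1/n - 1/N$. A Python sketch follows; the dictionary layout is a hypothetical choice.

```python
# Summary statistics as reported for Data 1-3 above.
DATA = {
    "Data 1": dict(N=36, n=15, Ybar=148718.70, Xbar=1054.39,
                   Sy=182315.10, Sx=402.61, Cx=0.38, Cy=1.23),
    "Data 2": dict(N=36, n=15, Ybar=52432.86, Xbar=335.78,
                   Sy=178201.10, Sx=451.14, Cx=1.34, Cy=3.3986),
    "Data 3": dict(N=33, n=5, Ybar=27.49, Xbar=72.55,
                   Sy=10.13, Sx=10.58, Cx=0.15, Cy=0.37),
}

for name, d in DATA.items():
    cx = d["Sx"] / d["Xbar"]  # C_x = S_x / Xbar
    cy = d["Sy"] / d["Ybar"]  # C_y = S_y / Ybar
    theta = 1 / d["n"] - 1 / d["N"]
    print(f"{name}: Cx = {cx:.4f} (reported {d['Cx']}), "
          f"Cy = {cy:.4f} (reported {d['Cy']}), theta = {theta:.4f}")
```

The recomputed ratios agree with the reported $C_x$ and $C_y$ to within rounding.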

Figure 3. Graphical display of the MSE results of the estimators using the empirical data

Note: The vertical axis of each figure shows the MSEs of the estimators, and the horizontal axis indicates the corresponding estimators. For convenience, the estimators are numbered from 1 to 16. For more details, see . Source: Own computations.

6. Conclusion

In this paper, we proposed a class of estimators for the population variance of the study variable that uses known information about the auxiliary variable, and compared it with existing estimators. For this purpose, we derived theoretical conditions in Section 3 under which the proposed estimators are more efficient than the existing ones, and verified them through a simulation study and several empirical data sets. The MSEs of the estimators over the simulation setup are reported in Table 2; comparing them, the proposed class of estimators performs best among all the estimators considered. The MSEs in Table 2 are plotted in Figures 1 and 2, which show that the MSEs of the proposed class of estimators are substantially smaller than those of the other estimators. Similar results were obtained from the empirical data, which also confirm the theoretical results of Section 3; the empirical results are displayed in Table 3 and shown graphically in Figure 3. Hence, based on both the simulation and the empirical results, the proposed estimators $\hat{S}_{D_i}^2$ ($i = 1, 2, \ldots, 8$), numbered 9 to 16 in the figures, are more efficient than the other estimators considered. Among the suggested estimators, $\hat{S}_{D_8}^2$ is preferable because it has the smallest MSE.

PUBLIC INTEREST STATEMENT

In this article, we suggest a class of estimators for the population variance that uses the maximum and minimum values of the auxiliary variable. To check the performance of the estimators and verify the theoretical results, we conducted a simulation study with data from different distributions, applied the estimators to real data sets, and displayed the results graphically. The results confirm that the suggested class of estimators is more efficient than the existing estimators because it has the smallest mean squared errors.

Acknowledgment

This work was supported by the NSFC of China under grant [12071329]. We are very thankful to the two anonymous referees and the editor for their insightful comments and suggestions, which greatly improved this paper.

Disclosure statement

The authors declare that there is no conflict of interest regarding the publication of this article.

Additional information

Notes on contributors

Umer Daraz

Umer Daraz received his M.Phil. degree in Survey Sampling from Quaid-i-Azam University, Islamabad, Pakistan, in 2016. He is currently pursuing his PhD under the supervision of Prof. Tang Yu at Soochow University, Suzhou, China. His research interests include survey sampling, design of experiments, and combinatorial design.

Mursala Khan

Mursala Khan received his doctorate from Free University Berlin, Germany. His field of specialization is survey sampling. He is currently an assistant professor in the Department of Mathematics and Statistics, Riphah International University, Islamabad, Pakistan.

References