Abstract
Using the information in x, we consider a new estimator that uses an exponential function to estimate the unknown population mean of y in the presence of non-responding units. Non-response is divided into two categories, Case I and Case II. In Case I, non-response occurs only on y, whereas in Case II, it occurs on both x and y. The proposed estimators are derived for both scenarios. Theoretical comparisons are made and a numerical study on education data is carried out. We conclude that, under both non-response schemes, the proposed estimators are preferable in theory and in applications such as the education data.
1. Introduction
In sample surveys, efficient estimators can be used to obtain the unknown population parameters, such as variance, percentage, total and mean. The use of auxiliary variable information is a basic and common method when a new estimator is proposed. The study variable (y) can refer to successful students, while the auxiliary variable (x) can be the number of students per teacher, the ability of the teachers, the number of teachers, teaching methods, and so on. Different forms of estimators, such as ratio, regression, logarithmic, product and exponential, can be seen in the sampling theory when estimating the unknown population parameters using the information from x. The exponential type estimators, on the other hand, become prominent among others [Citation1].
Complete information on every variable is not always available. To address this, Hansen and Hurwitz [Citation2] proposed an approach based on sub-sampling the non-respondents. The technique accounts for the non-response units when estimating population parameters, which reduces the effect of non-response, and it remains popular in the sampling theory literature. In this technique, the population of size N consists of two groups: response units and non-response units. A sample of n units is drawn from the population using simple random sampling without replacement (SRSWOR). In this sample, only $n_1$ units are available as response units, whereas the remaining $n_2 = n - n_1$ units are non-response units. The Hansen–Hurwitz method then obtains $r = n_2/z$, $z>1$, units from the $n_2$ non-response units with extra effort. Different z values give different values of r, which allows the appropriateness of the proposed estimator to be shown for all combinations. In the final part of the technique, the $n_1 + r$ units are used to estimate the unknown parameters. The sample means of y based on the $n_1$ and r units are denoted as $\bar{y}_1$ and $\bar{y}_{2(r)}$, respectively.
Hansen and Hurwitz [Citation2] proposed the following unbiased estimator for the population mean: $\bar{y}^{*} = w_1 \bar{y}_1 + w_2 \bar{y}_{2(r)}$ (1)
The variance of this unbiased estimator is: $V(\bar{y}^{*}) = \lambda S_y^2 + \theta S_{y(2)}^2$ (2)
In the estimator, $w_2 = n_2/n$ is the weight of the non-response units in the sample, while $w_1 = n_1/n$ is the weight of the response units. In Equation (2), $\lambda = (1-f)/n$ with $f = n/N$, $\theta = W_2(z-1)/n$ with $W_2 = N_2/N$, where $N_2$ is the number of non-response units in the population, $S_y^2$ and $S_{y(2)}^2$ are the population variances of y for the whole population and for the non-response group, and the population mean of y is symbolized as $\bar{Y}$.
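As a hedged illustration of the Hansen–Hurwitz technique described above, the following Python sketch computes the sub-sampling estimate and its variance. It assumes the standard form of the method: the estimate weights the respondent mean by n1/n and the mean of the re-contacted non-respondents by n2/n, and the variance is (1 - n/N)/n times the population variance of y plus (N2/N)(z - 1)/n times the variance of y in the non-response group. The function and argument names are hypothetical and are not taken from the paper.

```python
def hansen_hurwitz_mean(y_respondents, y_subsample, n2):
    """Hansen-Hurwitz estimate of the population mean.

    y_respondents: y values of the n1 units that responded.
    y_subsample:   y values of the r units re-contacted among the n2 non-respondents.
    n2:            number of non-respondents in the original sample.
    """
    n1 = len(y_respondents)
    n = n1 + n2
    w1, w2 = n1 / n, n2 / n  # sample weights of respondents and non-respondents
    mean_resp = sum(y_respondents) / n1
    mean_sub = sum(y_subsample) / len(y_subsample)
    return w1 * mean_resp + w2 * mean_sub


def hansen_hurwitz_variance(S2_y, S2_y2, n, N, N2, z):
    """Variance of the Hansen-Hurwitz estimator (assumed standard form).

    S2_y:  population variance of y.
    S2_y2: population variance of y within the non-response group.
    z:     inverse sub-sampling rate (r = n2 / z, z > 1).
    """
    lam = (1 - n / N) / n           # finite-population sampling term
    theta = (N2 / N) * (z - 1) / n  # extra term caused by sub-sampling
    return lam * S2_y + theta * S2_y2


# Toy usage with made-up numbers (not data from the paper).
estimate = hansen_hurwitz_mean([10, 12, 14], [20, 22], n2=4)
variance = hansen_hurwitz_variance(S2_y=4.0, S2_y2=9.0, n=50, N=1000, N2=250, z=2)
```

Setting z = 1 (every non-respondent is re-contacted) makes the second term vanish, so the variance reduces to the usual SRSWOR variance of the sample mean.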
The case of non-response is divided into two categories as Case I and Case II. The non-response units are only available on y in Case I, whereas they are available on both x and y in Case II. For both approaches, the population mean of x is known.
In sampling theory, one of the most important aims is to estimate the unknown population parameter with an efficient estimator [Citation3]. This study contributes to the literature by using an exponential function in a new estimator proposed for the unknown population mean under non-response. The appropriateness of the proposed estimator is examined in detail through theoretical, numerical and simulation studies for both non-response cases. In Section 2, the estimators in the literature for the non-response approach are given. In Section 3, the proposed estimator is analysed for Case I and Case II. In Sections 4 and 5, the theoretical comparisons and the numerical study are presented, respectively. The simulation study is conducted in Section 6. In the final section, the results are discussed.
2. Existing estimators in literature
Many estimators of the population mean based on the sub-sampling method have been proposed in the literature. Tables and present the ratio, regression and exponential estimators, as well as the MSE equations for these estimators, up to the first order of approximation, for Cases I and II, respectively. In , represents the sample mean of y under the non-response approach. Here, and represents the coefficient of the population correlation between x and y. Furthermore, and are the sample mean and population mean of x, respectively, and is the population mean of y.
In , represents the sample mean of x under the non-response approach. Besides, , and represent the coefficient of the population correlation between x and y for the non-response group.
The estimators in the literature and their MSEs involve several symbols. The values of , , , k, and represent the unknown constants whose optimum values are used to obtain the minimum MSE, and s can take only the values −1, 0 and 1. Besides, , , and (a,b) are either real numbers or functions of the known parameters of X.
In addition, the estimators proposed by Oncel Cekim and Cingi [Citation20], Javed and Irfan [Citation3], Kadilar and Oncel Cekim [Citation21] and Salehi and Seber [Citation22] are also important in the literature.
3. Proposed estimators
This section introduces a new estimator of the population mean under non-response. In Subsections 3.1 and 3.2, this estimator is examined for Cases I and II, respectively.
3.1. Case I
For Case I, we propose the estimator as: (3)
Here, can only take the value 0 or 1, which makes the estimator a ratio or a product estimator. If takes the value of 1, it is a ratio type estimator; if takes the value of 0, it is a product type estimator. and represent the unknown constants whose optimum values are used later for the minimum MSE.
To obtain the bias, the MSE and the minimum MSE of the proposed estimator, the following notation is used under Case I:
Using this notation, we rewrite the estimator in Equation (3) as: (4)
Expanding the right-hand side of Equation (4) and ignoring second and higher powers of the and terms, we obtain: (5)
Taking the expectation of Equation (5), we derive the bias of the proposed estimator as: (6)
For the MSE, we square both sides of Equation (5) and then take the expectation: (7)
To simplify the notation, we rewrite the MSE for Case I as: (8) where
The optimum values of and , and , are obtained by differentiation as: (9)
Substituting the and values into Equation (8), we obtain the minimum MSE for Case I as: (10)
3.2. Case II
For Case II, we propose the estimator as: (11) where and represent the unknown constants whose optimum values are used later for the minimum MSE.
To obtain the bias, the MSE and the minimum MSE of the proposed estimator, the following notation is used under Case II:
Using this notation, we rewrite the estimator in Equation (11) as: (12)
As in Case I, the bias is obtained by following similar steps for Case II: (13) (14) (15)
For the MSE, we square both sides of Equation (13) and then take the expectation: (16)
To simplify the notation, we rewrite the MSE in Equation (16) for Case II as: and (17)
The optimal values of and , and , are obtained, respectively, as follows:
Substituting the and values into Equation (17), we obtain the minimum MSE for Case II as: (18)
4. Efficiency comparisons
In this section, the proposed estimators, and , are compared with several estimators in the literature to demonstrate their theoretical appropriateness for Case I and Case II in Subsections 4.1 and 4.2, respectively.
4.1. Efficiency comparisons for the first case
We compare the with the MSEs of the estimators listed in and obtain the following efficiency conditions for Case I:
(19)
(20)
(21)
(22)
(23)
(24)
(25)
(26)
Here, the MSE of the estimator is equal to the MSEs of the and estimators. For this reason, the efficiency conditions for these estimators are similar to the condition in (22).
Based on these conditions, we conclude that the estimator is more efficient than the other estimators in the literature under conditions (19)–(26) for Case I.
4.2. Efficiency comparisons for the second case
We compare the with the MSEs of the estimators listed in and obtain the following efficiency conditions for Case II:
(27)
(28)
(29)
(30)
(31)
(32)
(33)
(34)
(35)
(36)
Here, the MSE of the estimator is equal to the MSEs of the and estimators. For this reason, the efficiency conditions for these estimators are similar to the condition in (33).
Based on these conditions, we conclude that the estimator is more efficient than the other estimators under conditions (27)–(36) for Case II.
5. Empirical study
After the theoretical comparisons, we carry out a numerical study on education data to assess the appropriateness of the proposed estimators in the cases of non-response. The required data set information is given as follows (Source: Satici and Kadilar [Citation23]):
Table
The numbers of teachers and successful students in Turkey’s 261 homogeneous districts in 2006 are considered in this population (Satici and Kadilar [Citation23]). For each district, the number of elementary school teachers is used as the auxiliary variable (x) and the number of successful students in the transition to the secondary education exam is taken as the study variable (y). In this population, the last 25% of units (, 65 units) are treated as the non-response group (missing data). Note that in this data set, the correlation coefficient between the study variable and the auxiliary variable is positive. For this reason, the value of is taken as one.
The MSE values of the existing estimators in the literature, listed in Tables and , as well as the MSE values of the and estimators, are obtained using the data set. In addition, the Percent Relative Efficiencies (PREs) of the proposed estimators (, ) and the existing estimators in the literature with respect to the Hansen–Hurwitz estimator ($\bar{y}^{*}$) are computed using the PRE formula: $PRE(\cdot) = \dfrac{MSE(\bar{y}^{*})}{MSE(\cdot)} \times 100$
The MSE and PRE values for Cases I and II are given in Tables and . According to these results, the proposed estimators have the smallest MSE and the highest PRE values among all compared estimators in both Case I and Case II. We therefore conclude that the proposed and estimators can be used to estimate the population mean in both cases of non-response.
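The PRE computation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition of the PRE as 100 times the MSE of the reference estimator divided by the MSE of the estimator under comparison. The function name and the MSE values in the dictionary are hypothetical placeholders and are not results from the paper.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator with respect to a reference estimator, in percent."""
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_reference / mse_estimator


# Hypothetical MSE values, for illustration only (not taken from the paper).
mse_values = {"hansen_hurwitz": 4.0, "ratio": 2.5, "proposed": 2.0}

# PRE of every estimator with respect to the Hansen-Hurwitz estimator.
pre_values = {
    name: percent_relative_efficiency(mse_values["hansen_hurwitz"], mse)
    for name, mse in mse_values.items()
}
```

The reference estimator always has a PRE of 100, and a PRE above 100 means a smaller MSE than the reference. Passing the MSE of the classical ratio estimator as `mse_reference` gives the variant used in the simulation study.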
6. Simulation study
In this section, we evaluate the performance of the proposed and estimators through a simulation study in R. The simulation uses a population of size N = 1000 with 25% non-response and different z values for both cases. The values of the auxiliary and study variables are generated from the multivariate normal distribution. The MSE and PRE values of the proposed estimators, and , and of various existing estimators are given in Tables and for Case I and Case II. Here, the PRE values are computed with respect to the classical ratio estimators, and , for Cases I and II, respectively.
In the first case, we assume that the data set follows the bivariate normal distribution with means (7, 1), standard deviations (0.1, 10) and correlation coefficient 0.95. Here, is taken as one because the correlation coefficient is positive.
According to , the proposed estimator is more efficient than the compared estimators under Case I: it has the minimum MSE and the highest PRE values.
In the second case, we assume that the data set follows the bivariate normal distribution with means (1, 0.1), standard deviations (1, 10) and correlation coefficient 0.95. As in Case I, the value of is taken as 1 because the correlation coefficient is positive.
As observed in , for z = 2 and z = 3, the proposed estimator is again the best estimator for the population mean of the study variable for Case II.
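The simulation design described in this section can be sketched as follows. This is a rough Python illustration only (the paper's study was run in R), and it evaluates just the Hansen–Hurwitz estimator, since the proposed estimators are not reproduced here. Several choices are assumptions rather than details from the paper: the sample size n = 200, the number of replications, the seeds, the assignment of the first mean and standard deviation to y, the use of the last 25% of units as the non-response group, and all function names.

```python
import random
import statistics


def make_population(N=1000, mean_y=7.0, sd_y=0.1, mean_x=1.0, sd_x=10.0,
                    rho=0.95, seed=1):
    """Generate N (x, y) pairs from a bivariate normal distribution."""
    rng = random.Random(seed)
    pop = []
    for _ in range(N):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean_x + sd_x * u
        # y shares the standard normal draw u with x, giving correlation rho
        y = mean_y + sd_y * (rho * u + (1.0 - rho ** 2) ** 0.5 * v)
        pop.append((x, y))
    return pop


def hh_variance(pop, n, z, n_nonresp):
    """Theoretical Hansen-Hurwitz variance (assumed standard form)."""
    N = len(pop)
    y_all = [y for _, y in pop]
    y_non = y_all[N - n_nonresp:]  # assumption: last units never respond
    lam = (1.0 - n / N) / n
    theta = (n_nonresp / N) * (z - 1.0) / n
    return lam * statistics.variance(y_all) + theta * statistics.variance(y_non)


def hh_estimate(pop, sample_idx, z, n_nonresp, rng):
    """Hansen-Hurwitz estimate from one SRSWOR sample of unit indices."""
    first_nonresp = len(pop) - n_nonresp
    resp = [i for i in sample_idx if i < first_nonresp]
    non = [i for i in sample_idx if i >= first_nonresp]
    total = sum(pop[i][1] for i in resp)  # equals n1 * mean of respondents
    if non:
        r = max(1, round(len(non) / z))   # sub-sample size r = n2 / z
        sub = rng.sample(non, r)
        total += len(non) * statistics.fmean([pop[i][1] for i in sub])
    return total / len(sample_idx)


def empirical_mse(pop, n=200, z=2, reps=2000, nonresp_rate=0.25, seed=2):
    """Monte Carlo MSE of the Hansen-Hurwitz estimator over repeated samples."""
    rng = random.Random(seed)
    N = len(pop)
    n_nonresp = int(nonresp_rate * N)
    true_mean = statistics.fmean([y for _, y in pop])
    errors = []
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # simple random sample without replacement
        errors.append((hh_estimate(pop, idx, z, n_nonresp, rng) - true_mean) ** 2)
    return statistics.fmean(errors)


if __name__ == "__main__":
    population = make_population()
    for z in (2, 3):
        print(z, empirical_mse(population, z=z),
              hh_variance(population, n=200, z=z, n_nonresp=250))
```

Comparing the Monte Carlo MSE with the theoretical variance for each z is a simple sanity check on the design. The same loop could be extended with other estimators that also use the x values, which are generated here but not used by the Hansen–Hurwitz estimator itself.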
7. Conclusion
Several estimators of the population mean that use auxiliary variable information are available in the literature for the case in which all information is available. Hansen and Hurwitz [Citation2] developed a technique for the case in which it is not. This study uses the Hansen–Hurwitz method and proposes a new exponential estimator for the unknown population mean of y using the information in x. The proposed estimators are examined separately for Case I and Case II, and their statistical properties, namely the bias, the MSE and the minimum MSE, are derived. First, the proposed estimators are compared theoretically with various estimators in the literature for the related cases. These comparisons show that the proposed estimators can be used instead of the estimators in the literature under the conditions given in Equations (19)–(26) for Case I and Equations (27)–(36) for Case II. Next, an education data set is used in the numerical comparison. The numerical study confirms that the proposed estimators have the minimum MSE and the maximum PRE values among the compared estimators under both non-response approaches. In addition, a simulation study is conducted to show the performance of the proposed estimators. Based on all results, we recommend the proposed estimators for the non-response case.
Acknowledgements
This publication is part of the PhD thesis of the first author.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- Solanki RS, Singh HP, Rathour A. An alternative estimator for estimating the finite population mean using auxiliary information in sample surveys. Int Sch Res Notices. 2012:657682. doi:10.5402/2012/657682
- Hansen MH, Hurwitz WN. The problem of non-response in sample surveys. J Am Stat Assoc. 1946;41(236):517–529.
- Javed M, Irfan M. A simulation study: new optimal estimators for population mean by using dual auxiliary information in stratified random sampling. J Taibah Univ Sci. 2020;14(1):557–568. doi:10.1080/16583655.2020.1752004
- Rao PSRS. Ratio estimation with sub sampling the non-respondents. Surv Methodol. 1986;12:217–230.
- Singh R, Kumar M, Chaudhary MK, et al. Estimation of mean in presence of non-response using exponential estimator. In finite study. arXiv preprint arXiv:0906.2462. 2009.
- Olufadi Y, Kumar S. Ratio-cum-product estimator using exponential estimator in the presence of non-response. J Adv Comput. 2014;3(1):1–11.
- Yadav SK, Subramani J, Misra S, et al. Improved estimation of population mean in presence of non-response using exponential estimator. Int J Agric Stat Sci. 2016;12(1):271–276.
- Pal SK, Singh HP. A class of ratio-cum-ratio-type exponential estimators for population mean with sub sampling the non-respondents. Jordan J Math Stat. 2017;10(1):73–94.
- Pal SK, Singh HP. Estimation of finite population mean using auxiliary information in presence of non-response. Commun Stat Simul Comput. 2018;47(1):143–165.
- Dansawad N. A class of exponential estimator to estimate the population mean in the presence of non-response. Naresuan Univ J Sci Techn. 2019;27(4):20–26.
- Singh GN, Usman M. Ratio-to-product exponential-type estimators under non-response. Jordan J Math Stat. 2019;12(4):593–616.
- Sinha RR, Kumar V. Regression cum exponential estimators for finite population mean under incomplete information. J Stat Manag Syst. 2017;20(3):355–368.
- Unal C, Kadilar C. Improved family of estimators using exponential function for the population mean in the presence of non-response. Commun Stat Theory Methods. 2021;50(1):237–248.
- Unal C, Kadilar C. A new family of exponential type estimators in the presence of non-response. J Math Fund Sci. 2021;53(1):1–16.
- Cochran WG. Sampling techniques. New York (NY): John Wiley and Sons; 1997.
- Kumar S, Bhougal S. Estimation of the population mean in presence of non-response. Commun Stat Appl Methods. 2011;18(4):537–548.
- Kumar S. Improved exponential estimator for estimating the population mean in the presence of non-response. Commun Stat Appl Methods. 2013;20(5):357–366.
- Riaz S, Nazeer A, Abbasi J, et al. On the generalized class of estimators for estimation of finite population mean in the presence of non-response problem. J Prime Res Math. 2020;16(1):52–63.
- Unal C, Kadilar C. Exponential type estimator for the population mean in the presence of non-response. J Stat Manag Syst. 2020;23(3):603–615.
- Cekim H O, Cingi H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacettepe J Math Stat. 2017;46(4):685–694.
- Kadilar C, Oncel Cekim H. Hartley–Ross type estimators in simple random sampling. AIP Conf Proc. 2015;1648:610007. doi:10.1063/1.4912849
- Salehi MM, Seber GAF. A new estimator and approach for estimating the subpopulation parameters. J Taibah Univ Sci. 2021;15(1):288–294.
- Satici E, Kadilar C. Ratio estimator for the population mean at the current occasion in the presence of non-response in successive sampling. Hacettepe J Math Stat. 2011;40(1):115–124.