Abstract
A new technique is devised to mitigate the errors-in-variables bias in linear regression. The procedure mimics two-stage least squares: an auxiliary regression generates a better-behaved predictor variable, which is then substituted for the error-prone variable in the first-stage model. The performance of the algorithm is tested by simulation and regression analyses. Simulations suggest that the algorithm efficiently captures the additive error term used to contaminate the artificial variables. The regressions lend further support to the simulations, clearly showing that the compact genetic algorithm-based estimate of the true but unobserved regressor yields considerably better results. These conclusions are robust across different sample sizes and different variance structures imposed on both the measurement error and the regression disturbances.
Notes
1. More generally, the CGA is also a member of the Estimation of Distribution Algorithms (EDAs). EDAs can be viewed as GAs whose search space is described by a single probability vector instead of an entire population.
2. In particular, Frisch [Citation12] makes use of the analysis framework he develops to discuss, among others, the EIVs problem.
3. See, for example, Greene [Citation1].
4. Reviewing the downward bias in estimating the coefficient on the return to schooling, Card [Citation28] reports that the attenuation in the coefficient estimates ranges from about 25% to 33%.
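The attenuation bias discussed in this note can be illustrated with a short simulation. The sketch below (in Python, for illustration only; the paper's own simulations use R) contaminates a regressor with additive measurement error and shows that the OLS slope shrinks toward zero by the classical factor var(X)/(var(X)+var(u)); the variance values chosen here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta1 = 100_000, 1.0

# True regressor, additive measurement error, and the error-prone observed regressor
x_true = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, 0.7, n)          # measurement error (hypothetical variance)
x_obs = x_true + u
y = beta1 * x_true + rng.normal(0.0, 1.0, n)

def ols_slope(x, y):
    """Simple-regression OLS slope via the covariance formula."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Classical attenuation factor: var(X) / (var(X) + var(u)) = 1 / 1.49 ~= 0.671
lam = 1.0 / (1.0 + 0.7**2)
print(round(ols_slope(x_obs, y), 2))   # close to lam * beta1, i.e. biased toward zero
print(round(ols_slope(x_true, y), 2))  # close to the true beta1 = 1.0
```

Replacing `x_obs` with a cleaner proxy, as the CGA-based procedure aims to do, moves the estimated slope back toward the true value.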
5. Furthermore, the IV itself may turn out to be correlated not only with the endogenous regressor but also with the dependent variable, in which case the instrument can no longer be viewed as exogenous.
6. In passing, note that the term ‘probability vector’ used therein does not refer to a conventional probability vector, whose individual elements must sum to 1. The vector used here is instead a vector of probabilities in which each element gives the probability that the corresponding gene of the chromosome takes the value 1.
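The role of this vector of per-gene probabilities can be made concrete with a minimal compact GA sketch. This is a generic textbook-style cGA on a toy one-max objective, not the paper's own implementation (which is available in the `eive` R package); the population size and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cga(fitness, n_genes, pop_size=50, iters=2000):
    """Compact GA: a single vector of per-gene probabilities replaces a population."""
    p = np.full(n_genes, 0.5)  # p[i] = probability that gene i equals 1
    for _ in range(iters):
        # Sample two candidate chromosomes from the probability vector
        a = (rng.random(n_genes) < p).astype(int)
        b = (rng.random(n_genes) < p).astype(int)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Nudge probabilities toward the winner wherever the two candidates disagree
        p = np.clip(p + (winner - loser) / pop_size, 0.0, 1.0)
    return (p > 0.5).astype(int)

# Toy objective (one-max): the cGA should drive every probability toward 1
best = cga(lambda c: c.sum(), n_genes=20)
print(best.sum())
```

Because only the probability vector is stored and updated, the memory footprint is independent of the notional population size.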
7. As noted previously, the CGA is but one member of a broader family of Evolutionary Algorithms, the Estimation of Distribution Algorithms (EDAs) [Citation23]. We take this point into account in subsequent analyses and verify our results with further checks.
8. Henceforth, N(μ, σ) refers to the normal distribution, where μ denotes the mean and σ the standard deviation.
9. Simulations are performed on a PC with an Intel Core i5 2.27 GHz CPU and 4 GB of RAM. Average CPU times are 0.32, 0.39 and 0.54 s for n=30, 50 and 100, respectively. We used the R software to run the simulations. The package eive (version 2.1) implements our algorithm and can be downloaded from the CRAN repository.
10. We thank the referee for his suggestion to report the relative bias and the relative MSE.
11. The relative bias and relative MSE also help in interpreting the accuracy of the results obtained from the CGA-based technique. Indeed, if either the rBias or the rMSE is less than unity, this translates into a weaker performance of the estimator with respect to OLS, and vice versa.
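The relative metrics in this note can be sketched as simple ratios over simulation replications. The orientation assumed below (OLS in the numerator, so that values below 1 indicate a weaker CGA-based estimator, consistent with the note's reading) and the replication values are illustrative assumptions, not the paper's reported numbers.

```python
import numpy as np

def relative_metrics(est_cga, est_ols, beta_true):
    """Relative bias and relative MSE of the CGA-based estimator vs. OLS.

    Assumed orientation: OLS in the numerator, so a ratio below 1 means the
    CGA-based estimator performs worse than OLS, and above 1 means better.
    """
    est_cga, est_ols = np.asarray(est_cga, float), np.asarray(est_ols, float)
    bias = lambda e: abs(e.mean() - beta_true)
    mse = lambda e: np.mean((e - beta_true) ** 2)
    return bias(est_ols) / bias(est_cga), mse(est_ols) / mse(est_cga)

# Hypothetical replications with true beta = 1: CGA estimates centered near 1,
# OLS estimates attenuated toward zero
rb, rm = relative_metrics([0.97, 1.02, 0.99], [0.70, 0.74, 0.68], 1.0)
print(rb > 1, rm > 1)  # True True
```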
12. Moreover, even if the regressions have weaker explanatory power in some cases, the CGA still provides estimates with smaller variances than the MC method. We have not reported the variance values in our tables to keep them readable.
13. We thank the referee for pointing this issue out.
14. We have also performed these additional simulations using other configurations of the two error terms, as provided in Tables –. The results did not change considerably.
15. This conjecture draws mainly on the case where one exploits ‘distributional knowledge’ about the error-prone variable. In particular, if it is known that X is not normal, this knowledge can be used as if an ‘IV’ were available and, consequently, a linear estimator for β1 can be derived (see, for instance, [Citation6, pp. 72–73]).
16. For simplicity, we assume that X1 and X2 are independent.
17. In particular, the correlation between the CGA-generated variable and the clean X is as high as 97%.
18. The p-value of the F-test is 0.0016.
19. Note that we do not know the real value of β2, and we assume that the obtained estimate is an unbiased estimator of β2.
20. We acknowledge the referee's proposition to use a different nomenclature for the R2.
21. These values correspond, obviously, to those reported in Panel A of Table .
22. Concerning IV regressions, for instance, finding good instruments remains a difficult task.