Original Articles

Nonparametric conditional density estimation of labour force participation

Pages 835-841 | Published online: 02 Feb 2007
 

Abstract

The labour force participation decision has been studied primarily in a parametric framework. The sensitivity of parametric estimators to misspecification of the error distribution and of the functional form is well known. This paper compares the predictive performance of widely used parametric and semiparametric estimators with that of nonparametric kernel conditional density estimation using likelihood cross-validated bandwidth selection and mixed data types. The results are striking. The predictive performance of the nonparametric estimator is 95%, against 71% to 77% for the parametric and semiparametric estimators. The nonparametric estimator correctly predicts the outcome for 83% of non-participants in the labour force, against 15% for the probit and logit models. This underscores the need to use nonparametric estimators in studying labour market behaviour.
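The estimator described in the abstract can be illustrated with a minimal Python sketch. It is not the author's N© implementation: the Gaussian kernel for continuous regressors, the match/λ weight for discrete regressors, the two-category Aitchison-Aitken kernel on the binary outcome, the single common bandwidth scale, the grid search (in place of numerical optimisation) and the function names `kernel_weight`, `cond_prob`, `cv_bandwidths` and `predict` are all assumptions for illustration. Predictions use the conditional mode, that is, the outcome with the larger estimated conditional probability.

```python
import math
from itertools import product


def kernel_weight(xc, xd, Xc, Xd, h, lam):
    # Gaussian kernel on each continuous regressor; on each discrete regressor
    # weight 1 for a match and lam otherwise. Normalising constants are dropped
    # because they cancel in the ratio f(y, x) / f(x).
    w = 1.0
    for a, b, hs in zip(xc, Xc, h):
        u = (a - b) / hs
        w *= math.exp(-0.5 * u * u)
    for a, b in zip(xd, Xd):
        w *= 1.0 if a == b else lam
    return w


def cond_prob(y, xc, xd, data, h, lam, lam0, skip=None):
    # Kernel estimate of P(Y = y | x) for binary Y. The kernel on Y is
    # Aitchison-Aitken with two categories (1 - lam0 on a match, lam0
    # otherwise), so P(0 | x) + P(1 | x) = 1. Returns None if all weights are 0.
    num = den = 0.0
    for i, (Xc, Xd, Y) in enumerate(data):
        if i == skip:  # leave observation i out (used for cross-validation)
            continue
        w = kernel_weight(xc, xd, Xc, Xd, h, lam)
        num += w * ((1.0 - lam0) if Y == y else lam0)
        den += w
    return num / den if den > 0 else None


def std(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def cv_bandwidths(data, scales=(0.25, 0.5, 1.0, 2.0),
                  lams=(0.1, 0.5, 1.0), lam0s=(0.05, 0.15, 0.3, 0.45)):
    # Likelihood cross-validation by grid search: maximise
    # sum_i log P_{-i}(Y_i | X_i). Continuous bandwidths are
    # scale * sd_s * n^(-1/5) with one common scale (a simplification).
    n = len(data)
    q = len(data[0][0])
    sds = [std([obs[0][s] for obs in data]) for s in range(q)]
    best, best_ll = None, -math.inf
    for c, lam, lam0 in product(scales, lams, lam0s):
        h = [c * s * n ** (-0.2) for s in sds]
        ll = 0.0
        for i, (Xc, Xd, Y) in enumerate(data):
            p = cond_prob(Y, Xc, Xd, data, h, lam, lam0, skip=i)
            if p is None or p <= 0.0:
                ll = -math.inf
                break
            ll += math.log(p)
        if ll > best_ll:
            best, best_ll = (h, lam, lam0), ll
    return best, best_ll


def predict(xc, xd, data, bw):
    # Conditional-mode rule: predict 1 (participate) if P(1 | x) > 1/2.
    h, lam, lam0 = bw
    p1 = cond_prob(1, xc, xd, data, h, lam, lam0)
    return 1 if p1 is not None and p1 > 0.5 else 0


if __name__ == "__main__":
    # Hypothetical toy data: (continuous regressors, discrete regressors, y).
    xs = [-3, -2.5, -2, -1.5, -1, 1, 1.5, 2, 2.5, 3]
    data = [((x,), (0,), int(x > 0)) for x in xs]
    bw, ll = cv_bandwidths(data)
    print(predict((-2.0,), (0,), data, bw), predict((2.0,), (0,), data, bw))
```

On the hypothetical toy data the sketch prints `0 1`. The grid search only keeps the example short; with seven regressors, as in the model described in note 11, a numerical optimiser over separate bandwidths for each regressor would be the more realistic choice.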

Acknowledgements

First of all, I am grateful to Gary Engelhardt for his constant encouragement and generous support during this research. This paper has benefited tremendously from the outstanding teaching of Jeff Racine and from his invaluable comments. I am also thankful to Jeff Racine for generously sharing his N© software for nonparametric estimation. I am solely responsible for any remaining errors in this paper. The views expressed here are those of the author and do not necessarily reflect those of the Federal Reserve Bank of Dallas or the Federal Reserve System.

Notes

1 In recent years many government programmes have been specially designed or expanded to provide economic incentives for labour force participation, e.g. the Earned Income Tax Credit (EITC) in the US.

2 Nonparametric kernel techniques have also been applied to other questions, e.g. the decision to attend college (Tobias, 2003) and interest rate models (Niizeki, 1998).

3 For Monte Carlo comparisons of parametric and nonparametric quantile regression estimators, see Min and Kim (2004).

4 Almost all of these papers estimate single index models and perform specification tests of a parametric null against a semiparametric alternative. In a literature so thoroughly dominated by parametric specifications, these papers represent significant econometric refinements.

5 I am not aware of any major attempt to estimate the labour force participation decision in a completely nonparametric framework, in which neither the distribution of the unobservables nor the functional form of the regressors is specified.

6 For example, policymakers could be interested in learning about labour force non-participants if they are the target group of a programme designed to attract them into the labour force. Because a majority of women participate in the labour force, parametric models are likely to do a poor job of predicting the participation decision of non-participants.

7 Racine (2002) applied the nonparametric conditional density estimation method to predict the decision to purchase directly marketed consumer goods. The results were remarkable. The parametric models grossly underestimated purchases by those who actually made a purchase, a group of consumers of primary importance to direct marketers. The logit model correctly predicted a purchase for only 8% of the consumers who actually made one; the nonparametric method correctly predicted 39%.

8 He made these comparisons for two different datasets, from Switzerland and Germany, and performed two types of specification tests. He conducted the information matrix test of the parametric null against the alternative of a semiparametric single index specification.

9 A woman was classified as working if her hours of work were positive. The evaluation sample consists of 1870 women from the 1987 wave of the PSID. Besides the outcome variable, labour force participation, the variables used in the analysis are education, experience, the number of children under and over six years of age, marital status, and race. provides summary statistics of both the sample under consideration and the evaluation sample.

10 In the interest of space, I do not describe the implementation of the parametric and semiparametric estimators in this paper. For estimation details of the probit, logit and maximum score models, see Greene (2003).

11 The labour force participation model that I estimate has three continuous regressors and four discrete regressors. The continuous regressors are education, experience and non-labour income. The discrete regressors are a marital status dummy, a race dummy, the number of children under six years of age and the number of children over six years of age.
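One common way to combine these regressor types, and an assumption here since the note does not name the kernels, is a generalized product kernel in the style of Li and Racine. Write $x^{c}$ for the continuous regressors (education, experience, non-labour income), $x^{u}$ for the two dummies (marital status, race) and $x^{o}$ for the two counts of children; treating the counts as ordered is itself an assumption. Each continuous regressor gets a second-order kernel $w(\cdot)$ with bandwidth $h_s$, each dummy an unordered kernel, each count an ordered geometric kernel, and the binary outcome a two-category Aitchison-Aitken kernel $l$:

```latex
K_\gamma(x, X_i) = \prod_{s=1}^{3} \frac{1}{h_s}\, w\!\left(\frac{x^{c}_{s} - X^{c}_{is}}{h_s}\right)
  \prod_{s=1}^{2} \lambda_s^{\mathbf{1}(x^{u}_{s} \neq X^{u}_{is})}
  \prod_{s=1}^{2} \lambda_{s+2}^{\,|x^{o}_{s} - X^{o}_{is}|},
\qquad
\hat f(y \mid x) = \frac{\sum_{i=1}^{n} l(y, Y_i, \lambda_0)\, K_\gamma(x, X_i)}{\sum_{i=1}^{n} K_\gamma(x, X_i)},

l(y, Y_i, \lambda_0) =
\begin{cases} 1 - \lambda_0 & \text{if } y = Y_i,\\ \lambda_0 & \text{otherwise,} \end{cases}
\qquad \lambda_0 \in [0, 1/2], \quad \lambda_1, \dots, \lambda_4 \in [0, 1],

(\hat h_1, \dots, \hat h_3, \hat\lambda_0, \dots, \hat\lambda_4)
  = \arg\max \sum_{i=1}^{n} \ln \hat f_{-i}(Y_i \mid X_i),
\qquad
\hat y(x) = \arg\max_{y \in \{0, 1\}} \hat f(y \mid x).
```

Here $\hat f_{-i}$ is the leave-one-out estimate computed without observation $i$, so the bandwidths are chosen by likelihood cross-validation and each person is assigned the outcome with the larger estimated conditional probability.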

12 In the unconditional model, the percentages of participants and non-participants in the labour force are taken directly from the data, by counting the participants and non-participants in the sample without conditioning on any other factors.

13 A confusion matrix is a cross-tabulation of actual and predicted outcomes. The diagonal elements are the observations predicted correctly, while the off-diagonal elements are those the model predicts incorrectly. Here the matrix has two rows and two columns, and hence four cells, which together sum to the total number of observations in the sample. Individuals who actually participate in the labour force and are predicted to participate are placed in the top left cell; those who participate but are predicted not to participate are placed in the top right cell; those who do not participate but are predicted to participate are placed in the bottom left cell. Finally, those who do not participate and are predicted to be non-participants are placed in the bottom right cell.
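The cell layout described in this note can be expressed as a short Python sketch. The function names and the data are hypothetical, and outcomes are assumed to be coded 1 for participation and 0 for non-participation. The row-wise rates correspond to figures such as the share of non-participants predicted correctly, which the abstract reports for each estimator.

```python
def confusion_matrix(actual, predicted):
    # Rows are actual outcomes, columns predicted outcomes, with
    # participants (1) first and non-participants (0) second.
    m = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        m[1 - a][1 - p] += 1
    return m


def classification_rates(m):
    # Overall share correct, then share correct among actual participants
    # (top row) and among actual non-participants (bottom row).
    total = sum(sum(row) for row in m)
    overall = (m[0][0] + m[1][1]) / total
    participants = m[0][0] / (m[0][0] + m[0][1])
    non_participants = m[1][1] / (m[1][0] + m[1][1])
    return overall, participants, non_participants


if __name__ == "__main__":
    actual = [1, 1, 1, 1, 0, 0]     # hypothetical outcomes (1 = participates)
    predicted = [1, 1, 1, 0, 1, 0]  # hypothetical model predictions
    m = confusion_matrix(actual, predicted)
    print(m)                        # [[3, 1], [1, 1]]
    print(classification_rates(m))  # (0.6666666666666666, 0.75, 0.5)
```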

14 The McFadden et al. (1977) statistic is calculated as $p_{11} + p_{22} - p_{12}^2 - p_{21}^2$, where $p_{ij}$ is the $ij$th entry of the confusion matrix expressed as a fraction of the total number of observations. This statistic adds up the proportions of correct predictions and subtracts a penalty for incorrect predictions by the model.
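As a hedged sketch, assuming the statistic takes the form $p_{11} + p_{22} - p_{12}^2 - p_{21}^2$ with $p_{ij}$ the share of observations in cell $(i, j)$ of a 2×2 confusion matrix (actual outcomes in rows, predicted outcomes in columns), it can be computed from raw counts as follows. The function name `mcfadden_measure` and the counts are hypothetical.

```python
def mcfadden_measure(m):
    # m is a 2x2 list of counts: rows are actual outcomes, columns predicted.
    # Assumed form: p11 + p22 - p12**2 - p21**2, with p_ij = m[i][j] / total.
    total = sum(sum(row) for row in m)
    p = [[count / total for count in row] for row in m]
    # Correct-prediction shares on the diagonal; squared off-diagonal shares
    # are subtracted as the penalty for incorrect predictions.
    return p[0][0] + p[1][1] - p[0][1] ** 2 - p[1][0] ** 2


if __name__ == "__main__":
    counts = [[3, 1], [1, 1]]  # hypothetical counts
    print(round(mcfadden_measure(counts), 3))  # 0.611
```

Under this form a model with no misclassifications scores 1, and each misclassified share lowers the score by its square.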

15 The estimated coefficients from the probit, logit and maximum score estimators are presented in Appendix 2.
