Research Articles

The variances of non-parametric estimates of the cross-sectional distribution of durations

Pages 1243-1264 | Published online: 09 Sep 2022
 

Abstract

This paper focuses on the link between non-parametric survival analysis and three distributions. The delta method is applied to derive the variances of the non-parametric estimators of three distributions: the distribution of durations (DD), the cross-sectional distribution of ages (CSA) and the cross-sectional distribution of (completed) durations (CSD). The non-parametric estimator of the CSD was defined and derived by Dixon (2012) and used in the generalized Taylor economy (GTE) by Dixon and Le Bihan (2012). The Monte Carlo method is applied to evaluate the variances of the DD and CSD estimators and how their performance varies with sample size and the censoring of data. We apply these estimators to two data sets: the UK CPI micro-price data and waiting-time data from UK hospitals. Both the estimates of the distributions and their variances are calculated. Based on the empirical results, the estimated variances indicate that the DD and CSD estimates are statistically significant.

JEL Codes:

Acknowledgments

We are grateful for very helpful comments from Patrick Minford, Kul Luintel, Walter Distaso, seminar participants at Cardiff University and also from participants at the 2018 China Meeting of the Econometric Society. We would also like to thank the editor and referee for their comments and advice.

Notes

1 We use the term distribution as shorthand for discrete probability density function.

2 This is also known as the unconditional hazard function.

3 Durations are censored if we do not observe their beginning (left-censored) or their end (right-censored). It is common practice in survival analysis not to use left-censored data, which is why we focus on right-censored data.

4 We summarise this derivation in the online appendix.

5 DD is NA (not available) when $i = 0$, because $\hat{a}^d_0 = \hat{S}_{-1}\hat{h}_0$ requires $\hat{S}_{-1}$, which is not defined.
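A minimal Python sketch of the DD estimator $\hat{a}^d_i = \hat{S}_{i-1}\hat{h}_i$, assuming the estimated survival and hazard are stored in lists indexed from $i = 0$. The function name `dd_estimates` and the toy inputs are hypothetical. Entry $i = 0$ is returned as `None` because $\hat{S}_{-1}$ does not exist.

```python
from typing import Optional, Sequence


def dd_estimates(S: Sequence[float], h: Sequence[float]) -> list[Optional[float]]:
    """Estimated DD, a_i^d = S_{i-1} * h_i, for periods indexed from i = 0.

    S[i] and h[i] are the estimated survival and hazard in period i.
    At i = 0 the estimator would need S_{-1}, so that entry is None ("NA").
    """
    if len(S) != len(h):
        raise ValueError("S and h must have the same length")
    out: list[Optional[float]] = [None]  # i = 0: S_{-1} is undefined
    for i in range(1, len(h)):
        out.append(S[i - 1] * h[i])  # survive to i - 1, then end in period i
    return out


# Hypothetical survival/hazard path with S_i = S_{i-1} * (1 - h_i)
print(dd_estimates([1.0, 0.8, 0.6, 0.0], [0.0, 0.2, 0.25, 1.0]))
```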

6 The cross-section is length biased, so the probability of observing a spell is proportional to its length. The CSA has an interruption bias, since the spells are incomplete. With a constant hazard, the two biases exactly cancel out. This happens when spells end according to a Bernoulli process with a constant hazard rate, so that the DD is geometric (in macroeconomics this is the discrete-time Calvo model of pricing).
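The cancellation can be checked numerically. The sketch below assumes durations $i = 1, \dots, F$ with $S_0 = 1$, takes the DD as $a^d_i = S_{i-1}h_i$, the CSA as proportional to $S_{i-1}$, and the CSD as $i\,a^d_i$ divided by the mean duration. These timing conventions and the name `distributions_from_hazard` are illustrative assumptions, not the paper's code. With a constant hazard the CSA reproduces the DD, while fixed four-period (Taylor-style) contracts give a uniform CSA.

```python
def distributions_from_hazard(h: list[float]):
    """Return (DD, CSA, CSD) for durations 1..F given hazards h[i-1] = h_i.

    Assumed conventions: S_0 = 1, DD a_i = S_{i-1} h_i, CSA proportional to
    S_{i-1}, CSD = i * a_i / mean duration. h[-1] should be 1 so every spell
    ends by period F.
    """
    s_prev = 1.0
    surv_before, dd = [], []
    for h_i in h:
        surv_before.append(s_prev)  # S_{i-1}
        dd.append(s_prev * h_i)     # a_i^d = S_{i-1} h_i
        s_prev *= 1.0 - h_i         # S_i = S_{i-1} (1 - h_i)
    total = sum(surv_before)        # sum_{k=0}^{F-1} S_k (= mean duration when S_F = 0)
    csa = [s / total for s in surv_before]
    mean = sum(i * a for i, a in enumerate(dd, start=1))
    csd = [i * a / mean for i, a in enumerate(dd, start=1)]  # length-biased DD
    return dd, csa, csd


# Constant hazard 0.25, truncated at F = 200 (the tail mass is negligible)
dd, csa, csd = distributions_from_hazard([0.25] * 199 + [1.0])
print(max(abs(a - b) for a, b in zip(dd[:-1], csa[:-1])))

# Fixed four-period contracts: all spells last exactly 4 periods
print(distributions_from_hazard([0.0, 0.0, 0.0, 1.0]))
```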

7 Slutsky’s theorem states that if two sequences of random variables (or vectors) $X_i$ and $Y_i$ satisfy $X_i \xrightarrow{d} X$ and $Y_i \xrightarrow{p} c$, then, for a continuous function $f$,

$$f(X_i, Y_i) \xrightarrow{d} f(X, c),$$

where $X_i \xrightarrow{d} X$ means that $X_i$ converges in distribution to the random variable $X$, and $Y_i \xrightarrow{p} c$ means that $Y_i$ converges in probability to the constant $c$.
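As an illustration (not taken from the paper), the familiar studentized mean is an application of the theorem: $X_n = \sqrt{n}(\bar{x}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$ by the central limit theorem, the sample standard deviation $Y_n = s_n \xrightarrow{p} \sigma$, and $f(x, y) = x/y$ is continuous for $y > 0$, so the ratio converges to $N(0, 1)$ even for skewed data. The simulation below uses exponential draws with an arbitrarily chosen mean of 2; the function name is hypothetical.

```python
import numpy as np


def studentized_means(n: int, reps: int, mean: float = 2.0, seed: int = 0) -> np.ndarray:
    """Simulate sqrt(n) * (xbar - mu) / s for `reps` samples of exponential data."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=mean, size=(reps, n))  # skewed data with the given mean
    xbar = x.mean(axis=1)                            # feeds X_n = sqrt(n)(xbar - mu)
    s = x.std(axis=1, ddof=1)                        # Y_n = s_n ->p sigma
    return np.sqrt(n) * (xbar - mean) / s            # f(X_n, Y_n) = X_n / Y_n


t = studentized_means(n=500, reps=5000)
print(round(t.mean(), 3), round(t.std(), 3))
print(np.mean(np.abs(t) < 1.96))  # near 0.95 if the N(0, 1) limit is accurate
```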

8 In large samples, the maximum likelihood estimator $\hat{S}_i$ is close to the true value $S_i$, so $S_i$ can be replaced by $\hat{S}_i$ in Greenwood’s formula. Similarly, we replace $x_i$ by $\hat{x}_i$ and $y$ by $\hat{y}$.
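For reference, Greenwood’s formula for the discrete-time Kaplan-Meier estimator is $\widehat{\mathrm{Var}}(\hat{S}_i) = \hat{S}_i^2 \sum_{j \le i} \frac{D_j}{N_j (N_j - D_j)}$, where $N_j$ is the number at risk and $D_j$ the number of completed (uncensored) spells in period $j$. A minimal sketch follows; the function name `km_greenwood` and the toy counts are hypothetical.

```python
def km_greenwood(N: list[int], D: list[int]) -> tuple[list[float], list[float]]:
    """Kaplan-Meier survival S_i and Greenwood variance from counts N_i, D_i."""
    S, V = [], []
    s, acc = 1.0, 0.0  # running survival and running Greenwood sum
    for n_i, d_i in zip(N, D):
        if n_i <= 0:
            raise ValueError("the number at risk must be positive")
        s *= 1.0 - d_i / n_i  # S_i = S_{i-1} (1 - D_i / N_i)
        if d_i < n_i:
            acc += d_i / (n_i * (n_i - d_i))
            V.append(s * s * acc)
        else:
            V.append(0.0)  # every spell at risk ended: S_i = 0, variance taken as 0
        S.append(s)
    return S, V


print(km_greenwood([10, 8], [2, 1]))
```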

9 This method could also be extended to include left-censored data or other data imperfections.

10 The interval $(0, r_1]$ can be defined as the “first” period, and $(r_{i-1}, r_i]$ as the “$i$”-th period. The formulae then differ slightly from the earlier results. For example, the estimator of the CSD becomes $a_i = \frac{i\, S_{u_{i-1}}\, h_{u_i}}{\sum_{k=0}^{u_F} S_k}$.
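The period definition in this note amounts to assigning each raw duration $t$ to the period $i$ with $r_{i-1} < t \le r_i$ (taking $r_0 = 0$). A minimal sketch using Python’s standard `bisect` module; the cut points and the helper names `period_index` and `counts_by_period` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def period_index(t: float, cuts: list[float]) -> int:
    """Return i with r_{i-1} < t <= r_i (periods numbered from 1, r_0 = 0).

    `cuts` holds hypothetical cut points [r_1, ..., r_F], strictly increasing.
    """
    if not 0 < t <= cuts[-1]:
        raise ValueError("duration must lie in (0, r_F]")
    # bisect_left returns the first pos with cuts[pos] >= t, so r_pos < t <= r_{pos+1}
    return bisect_left(cuts, t) + 1


def counts_by_period(durations: list[float], cuts: list[float]) -> list[int]:
    """Number of spells whose duration falls in each period 1..F."""
    c = Counter(period_index(t, cuts) for t in durations)
    return [c.get(i, 0) for i in range(1, len(cuts) + 1)]


print(counts_by_period([0.5, 1.0, 1.5, 3.0, 4.2, 6.0], [1.0, 3.0, 6.0]))
```

The right-closed intervals match the note: a duration exactly equal to a cut point $r_i$ belongs to period $i$, not $i + 1$.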

11 The censoring times and the observed times follow exponential distributions with parameters 0.5 and 2, respectively, so the right-censored proportion of the total sample is $0.8 = 1 - \frac{0.5}{2 + 0.5}$. The algebra is shown by Efron (1981).
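This note does not state whether 0.5 and 2 are rates or means. Assuming they are means (scale parameters), the censoring rate is 2 and the event rate is 0.5, and a spell is right-censored when the censoring time comes first, which happens with probability $2/(2 + 0.5) = 0.8$. A quick Monte Carlo check of that assumed parameterization (function name hypothetical):

```python
import numpy as np


def censored_share(n: int = 200_000, seed: int = 1) -> float:
    """Share of spells right-censored when censoring and event times are exponential.

    ASSUMPTION: 0.5 and 2 are treated as means (numpy's `scale`), not rates.
    """
    rng = np.random.default_rng(seed)
    c = rng.exponential(scale=0.5, size=n)  # censoring times
    t = rng.exponential(scale=2.0, size=n)  # event (observed) times
    return float(np.mean(c < t))            # censored if censoring occurs first


print(censored_share())
```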

12 That is, we include the right-censored data in $N_i$ but only the uncensored data in $D_i$.
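A minimal sketch of this counting rule for integer durations, assuming a spell observed for $d$ periods is at risk in every period $i \le d$ and counts as an event only if it is uncensored and $d = i$. The hazard estimate is then $\hat{h}_i = D_i / N_i$. The function name and toy data are hypothetical.

```python
from collections import Counter


def km_hazards(durations: list[int], censored: list[bool]) -> list[tuple[int, int, int, float]]:
    """Rows (i, N_i, D_i, h_i) with h_i = D_i / N_i for i = 1..max(durations)."""
    events = Counter(d for d, c in zip(durations, censored) if not c)
    rows = []
    for i in range(1, max(durations) + 1):
        n_i = sum(1 for d in durations if d >= i)  # right-censored spells stay in N_i
        d_i = events.get(i, 0)                     # only uncensored spells enter D_i
        rows.append((i, n_i, d_i, d_i / n_i))
    return rows


print(km_hazards([1, 2, 2, 3, 3], [False, False, True, False, True]))
```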

13 The benchmark value is calculated from the Monte Carlo simulation and is very close to the true value.

14 The CSA is a special case of the CSD, so we only provide the empirical results for the CSD.

15 In the simulations, the coefficient $i$ in equation (24) is ignored, because $i$ is a constant for each $a_i$.

16 As Cox (1990) and Franz (2007) have shown, the delta method is a robust way to calculate the confidence interval for a ratio of random variables if the coefficient of variation of the denominator, $CV = \sigma/\mu$, is small. In the CPI micro-data, CV = 0.0061795. Since we use the delta method, we can interpret the ratio of the estimate to its standard deviation as a t-statistic and use it to show that the estimate is significantly different from zero. In Tian and Dixon (2022), we evaluate the empirical size of the delta approximation for the CSD estimator. The results indicate that the delta method is valid for testing the null hypothesis and constructing confidence intervals for the CSD estimator using critical values from Student’s t-distribution.
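For a ratio $X/Y$, the first-order delta method gives $\mathrm{Var}(X/Y) \approx \frac{\sigma_X^2}{\mu_Y^2} - \frac{2\mu_X \sigma_{XY}}{\mu_Y^3} + \frac{\mu_X^2 \sigma_Y^2}{\mu_Y^4}$. The sketch below compares this with a simulated variance when the denominator’s CV is small. The distributions, parameters and function names are hypothetical, chosen only for illustration.

```python
import numpy as np


def ratio_delta_var(mx: float, my: float, vx: float, vy: float, cxy: float = 0.0) -> float:
    """First-order (delta-method) variance of X / Y evaluated at the means (my != 0)."""
    return vx / my**2 - 2.0 * mx * cxy / my**3 + mx**2 * vy / my**4


def cv(mu: float, var: float) -> float:
    """Coefficient of variation sigma / |mu|."""
    return float(np.sqrt(var) / abs(mu))


# Hypothetical example: independent X ~ N(1, 0.1^2) and Y ~ N(4, 0.04^2), so CV(Y) = 0.01
rng = np.random.default_rng(42)
x = rng.normal(1.0, 0.1, 200_000)
y = rng.normal(4.0, 0.04, 200_000)
print(cv(4.0, 0.04**2))
print(ratio_delta_var(1.0, 4.0, 0.1**2, 0.04**2), np.var(x / y))
```

With the denominator’s CV this small, the delta approximation and the simulated variance agree closely; as the CV grows, the ratio’s distribution becomes heavy-tailed and the approximation deteriorates.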

17 As with the CPI data, the CV (0.01071) is small.

18 As shown in the online appendix, the Kaplan-Meier (KM) estimator can be derived as a maximum likelihood estimator, so $\hat{S}_{i-1}$ converges to the true value $S_{i-1}$ and the marginal hazard estimate $\hat{h}_i$ converges to the true value $h_i$. Here we show that these results can be derived from the delta method.
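One plausible first-order calculation for the DD estimator, offered as a sketch and not necessarily the derivation in the paper or its appendix: treat $\hat{S}_{i-1}$ and $\hat{h}_i$ as uncorrelated (under the usual discrete-time KM model, $\hat{h}_i$ has conditional mean $h_i$ given the history), use $\mathrm{Var}(\hat{h}_i) \approx \hat{h}_i(1-\hat{h}_i)/N_i$ and Greenwood’s formula for $\hat{S}_{i-1}$. The delta method then gives $\mathrm{Var}(\hat{a}^d_i) \approx \hat{h}_i^2\,\mathrm{Var}(\hat{S}_{i-1}) + \hat{S}_{i-1}^2\,\mathrm{Var}(\hat{h}_i)$. The function name and counts are hypothetical.

```python
def dd_delta_variance(N: list[int], D: list[int]) -> list[tuple[float, float]]:
    """(a_i^d, approx. Var) for a_i^d = S_{i-1} h_i, i = 1..len(N); assumes N_i > 0."""
    out = []
    s_prev, g = 1.0, 0.0  # S_0 = 1 (no sampling error); g = running Greenwood sum
    for n_i, d_i in zip(N, D):
        h = d_i / n_i
        var_h = h * (1.0 - h) / n_i       # binomial variance of the hazard estimate
        var_s_prev = s_prev**2 * g        # Greenwood variance of S_{i-1}
        a = s_prev * h
        var_a = h**2 * var_s_prev + s_prev**2 * var_h  # delta method, zero covariance
        out.append((a, var_a))
        if d_i < n_i:                     # update Greenwood sum and survival for i + 1
            g += d_i / (n_i * (n_i - d_i))
        s_prev *= 1.0 - h
    return out


print(dd_delta_variance([10, 8], [2, 1]))
```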
