Abstract
This paper focuses on the link between non-parametric survival analysis and three distributions. The delta method is applied to derive the variances of the non-parametric estimators of three distributions: the distribution of durations (DD), the cross-sectional distribution of ages (CSA) and the cross-sectional distribution of (completed) durations (CSD). The non-parametric estimator of the cross-sectional distribution of durations (CSD) was defined and derived by Dixon (2012) and used in the generalized Taylor price model (GTE) by Dixon and Le Bihan (2012). The Monte Carlo method is applied to evaluate the variances of the estimators of DD and CSD and how their performance varies with sample size and the censoring of data. We apply these estimators to two data sets: the UK CPI micro-price data and waiting-time data from UK hospitals. Both the estimates of the distributions and their variances are calculated. Based on the empirical results, the estimated variances indicate that the DD and CSD estimates are all statistically significant.
Acknowledgments
We are grateful for very helpful comments from Patrick Minford, Kul Luintel, Walter Distaso, seminar participants at Cardiff University and also from participants at the 2018 China Meeting of the Econometric Society. We would also like to thank the editor and referee for their comments and advice.
Notes
1 We use the term distribution as shorthand for discrete probability density function.
2 This is also known as the unconditional hazard function.
3 Durations are censored if we do not observe their beginning (left-censored) or their end (right-censored). It is common practice in survival analysis not to use left-censored data, which is why we focus on right-censored data.
4 We summarise this derivation in the online appendix.
5 DD is NA (not available) when i = 0. The reason is that
6 The cross-section is length biased, so that the probability of observing a spell is proportional to length. The CSA has an interruption bias, since the spells are incomplete. With a constant hazard, the two biases exactly cancel out. This happens when DD follows a Bernoulli distribution with a hazard rate that is constant (in macroeconomics this is used in the discrete-time Calvo model of pricing).
7 Slutsky’s theorem states that if two sequences of random variables or vectors Xi and Yi satisfy $X_i \xrightarrow{d} X$ and $Y_i \xrightarrow{p} c$, then
$X_i + Y_i \xrightarrow{d} X + c$, $X_i Y_i \xrightarrow{d} cX$ and $X_i / Y_i \xrightarrow{d} X / c$ (for $c \neq 0$),
where $X_i \xrightarrow{d} X$ means that Xi converges to X in distribution and
$Y_i \xrightarrow{p} c$ means that Yi converges to the constant c in probability.
8 In large samples, the maximum likelihood estimator is close to the mean value of Si. Si can be replaced by
in Greenwood's formula. At this point, we replace xi by
and y by
9 This method could also be extended to include left-censored data or other data imperfections.
10 The interval can be defined as the “first” period, and
is the “i”-th period. At this point, all the formulae differ slightly from the previous results. For example, the estimator of CSD is
11 Since the parameters of the exponential distributions of the censoring time and the observed time are 0.5 and 2, respectively, the right-censored proportion of the total sample is 0.8 = 2/(0.5 + 2). The algebra is shown by Efron (1981).
12 That is, we include the right-censored data in Ni, but have only uncensored data in Di.
13 The benchmark value is calculated from the Monte Carlo simulation. It is very close to the true value.
14 The CSA is a special case of the CSD, so we only provide the empirical results for the CSD.
15 The coefficient i of equation (24) is ignored in the simulation process, because i is a constant parameter for each ai.
16 As Cox (1990) and Franz (2007) have shown, the delta method is a robust method for calculating the confidence interval of a ratio variable if the coefficient of variation (CV) of the denominator of the ratio is small, where the CV is the standard deviation divided by the mean. In the CPI micro-data, CV = 0.0061795. Since we use the delta method, we can interpret the ratio of the estimator to its standard deviation as a t-statistic, demonstrating that it is significantly different from zero. In Tian and Dixon (2022), we evaluate the empirical size of the delta approximation for the CSD estimator. The empirical results indicate that the delta method is valid for testing the null hypothesis and constructing the confidence interval for the CSD estimator using critical values from Student's t-distribution.
17 As with the CPI data, the CV = 0.01071 is small.
18 As shown in the online appendix, the KM estimator is a maximum likelihood estimator so that converges to the true value
and the marginal hazard function
converges to the true value hi. At this point, we show that these results can be derived from the delta method. In the online appendix, we show how the KM estimator can be derived as an MLE.
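As a minimal illustration of notes 8, 12 and 18, the following Python sketch computes the Kaplan-Meier (KM) survival estimate and its Greenwood variance from right-censored spells. It uses the textbook form of both formulae, which may differ in detail from the expressions in the paper. The function name `km_greenwood`, its arguments and the toy data are hypothetical, and spells censored at a given time are assumed to be still at risk at that time.

```python
import numpy as np


def km_greenwood(durations, observed):
    """Kaplan-Meier survival estimate with Greenwood variance (textbook form).

    durations: spell lengths (time of exit or of right-censoring)
    observed:  True if the spell ended, False if it was right-censored
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Distinct times at which at least one uncensored spell ends.
    times = np.unique(durations[observed])

    # N_i: spells still at risk at t_i (right-censored spells are included).
    at_risk = np.array([(durations >= t).sum() for t in times])
    # D_i: spells ending at t_i (uncensored spells only).
    exits = np.array([((durations == t) & observed).sum() for t in times])

    hazard = exits / at_risk              # h_i = D_i / N_i
    survival = np.cumprod(1.0 - hazard)   # S_i = prod_{j <= i} (1 - h_j)

    # Greenwood: Var(S_i) ~ S_i^2 * sum_{j <= i} D_j / (N_j * (N_j - D_j)).
    # The term is undefined when every spell at risk exits (N_j == D_j).
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = exits / (at_risk * (at_risk - exits))
    variance = survival**2 * np.cumsum(terms)

    return times, hazard, survival, variance


if __name__ == "__main__":
    # Toy data (hypothetical): five spells, two of them right-censored.
    t, h, s, v = km_greenwood([1, 2, 2, 3, 4],
                              [True, True, False, True, False])
    print("times   ", t)
    print("hazard  ", np.round(h, 4))
    print("survival", np.round(s, 4))
    print("variance", np.round(v, 4))
```

For the toy data the hazards are 1/5, 1/4 and 1/2, so the survival estimates are 0.8, 0.6 and 0.3. Dividing an estimate by the square root of its variance gives the kind of t-ratio described in note 16.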