Abstract
Overdispersion is common in count or frequency responses modelled with Poisson regression; one example is the number of car accidents on a highway over a one-year period. A similar phenomenon is observed in electric power systems, where cascading failures often follow a distribution with an inflated number of zeros. When the response contains an excess of zeros, the zero-inflated Poisson (ZIP) model is often the preferred choice. However, in many disciplines some covariates cannot be observed directly or are measured with error during data collection. To the best of our knowledge, little existing work addresses population heterogeneity in the count response when some of the covariates are measured with error. Given the increasing prevalence of such outcomes in modern studies, it is timely to study zero-inflated Poisson models in which some covariates are subject to measurement error and others are not. We propose a flexible partial linear single index model for the log Poisson mean to correct the bias that may arise from covariate measurement error or population heterogeneity. We derive consistent and locally efficient semiparametric estimators and study their large-sample properties. We further assess their finite-sample performance through simulation studies. Finally, we apply the proposed method to a real data set and compare it with existing methods that handle measurement error in covariates.
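To make the zero-inflation idea concrete, the following is a minimal sketch of the standard ZIP probability mass function and its first two moments. The names `zip_pmf`, `zip_mean_var`, `lam` and `pi` are illustrative assumptions, not the notation of the paper, and the sketch uses a constant mean rather than the covariate-dependent log mean proposed above.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson (illustrative notation).

    With probability pi the outcome is a structural zero; otherwise it is
    drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * poisson


def zip_mean_var(lam, pi):
    """Mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


# Hypothetical parameter values, chosen only for illustration.
mean, var = zip_mean_var(lam=2.0, pi=0.3)
p_zero = zip_pmf(0, lam=2.0, pi=0.3)
```

Whenever `pi > 0` the variance exceeds the mean, which is the overdispersion relative to a plain Poisson model that motivates the ZIP specification.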
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
Notes: ‘Correct (Pois)’ refers to the results in Liu and Ma [7], where Y is modelled correctly according to the data-generating process, while ‘Misspecified (ZIP)’ models Y as zero-inflated Poisson, which does not match the data-generating process. ‘Local 1’ uses a posited ; ‘Local 2’ uses a posited
. RC Normal is regression calibration where
is calculated under a normal distribution. RC Uniform is regression calibration where
is calculated under a uniform distribution. The truth is
. The dimension of Z is 3. For each method we report the mean, sample standard deviation (emp.sd), the average of the estimated standard deviation (est.sd) and the coverage of the estimated
confidence interval (
CI).
Notes: ‘Local 1’ uses a posited ; ‘Local 2’ uses a posited
. RC Normal is regression calibration where
is calculated under a normal distribution. RC Uniform is regression calibration where
is calculated under a uniform distribution. The truth is
. The dimension of Z is 10. For each method we report the mean, sample standard deviation (emp.sd), the average of the estimated standard deviation (est.sd) and the coverage of the estimated
confidence interval (
CI).
Notes: Y is generated from zero-inflated negative binomial distribution. ‘Local 1’ uses a posited ; ‘Local 2’ uses a posited
. RC Normal is regression calibration where
is calculated under a normal distribution. RC Uniform is regression calibration where
is calculated under a uniform distribution. The truth is
. The dimension of Z is 10. For each method we report the mean, sample standard deviation (emp.sd), the average of the estimated standard deviation (est.sd) and the coverage of the estimated
confidence interval (
CI).
Notes: Sample size is 100, dimension of Z is 20. The truth is
. ‘Local 1’ uses a posited
; ‘Local 2’ uses a posited
. RC Normal is regression calibration where
is calculated under a normal distribution. RC Uniform is regression calibration where
is calculated under a uniform distribution. For each method we report the mean, sample standard deviation (emp.sd), the average of the estimated standard deviation (est.sd) and the coverage of the estimated
confidence interval (
CI).
Notes: LL and UL denote the lower limit and upper limit of the confidence interval. ‘ZIP’ indicates fitting via a zero-inflated Poisson model. ‘Pois’ indicates fitting via a Poisson model. ‘Naive’ ignores the measurement error. ‘RC Normal’ is regression calibration where
is calculated under a normal distribution for X. ‘RC Uniform’ is regression calibration where
is calculated under a uniform distribution for X. ‘Local 1’ and ‘Local 2’ adopt working models
and
for
, respectively.
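For readers unfamiliar with regression calibration, the sketch below illustrates the general idea behind the ‘RC Normal’ entries under simplifying assumptions that are ours, not the paper's: a scalar X with a normal distribution, classical additive error W = X + U with U normal and independent of X, and no dependence on Z. The function name `rc_normal` and its parameters are hypothetical.

```python
def rc_normal(w, mu_x, var_x, var_u):
    """E[X | W = w] when X ~ N(mu_x, var_x) and W = X + U, U ~ N(0, var_u).

    The error-prone measurement w is shrunk toward the mean of X by the
    reliability ratio var_x / (var_x + var_u). This is an assumed,
    simplified setting used only for illustration.
    """
    reliability = var_x / (var_x + var_u)
    return mu_x + reliability * (w - mu_x)


# Hypothetical values: equal signal and error variance give a ratio of 0.5.
x_hat = rc_normal(w=3.0, mu_x=1.0, var_x=1.0, var_u=1.0)
```

Regression calibration then fits the outcome model with the calibrated value in place of the unobserved X. With no measurement error (`var_u = 0`) the function returns `w` unchanged.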