Functional, Graph, and Tree-Based Approaches

A Tree-Based Semi-Varying Coefficient Model for the COM-Poisson Distribution

Pages 827-846 | Received 23 Jan 2019, Accepted 30 Mar 2020, Published online: 15 May 2020
 

Abstract

We propose a tree-based semi-varying coefficient model for the Conway–Maxwell–Poisson (CMP or COM-Poisson) distribution, a two-parameter generalization of the Poisson distribution that is flexible enough to capture both under-dispersion and over-dispersion in count data. The advantage of tree-based methods is their scalability to high-dimensional data. We develop CMPMOB, an estimation procedure for a semi-varying coefficient model, using model-based recursive partitioning (MOB). The proposed framework is broader than the existing MOB framework, as it allows node-invariant effects to be included in the model. To reduce the computational burden of the exhaustive search employed in the original MOB algorithm, a new split point estimation procedure is proposed by borrowing tools from change point estimation methodology. The proposed method uses only the estimated score functions, without fitting models at each candidate split point, and is therefore computationally simpler. Since tree-based methods provide only a piecewise constant approximation to the underlying smooth function, we further propose the CMPBoost semi-varying coefficient model, which uses a gradient boosting procedure for estimation. The usefulness of the proposed methods is illustrated using simulation studies and a real example from a bike sharing system in Washington, DC. Supplementary files for this article are available online.
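The abstract builds on the CMP probability mass function without stating it; for reference, its standard form (a known result from the CMP literature, not restated from this article) is

```latex
P(Y = y \mid \lambda, \nu)
  = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)},
\qquad y = 0, 1, 2, \ldots,
\qquad
Z(\lambda, \nu) = \sum_{s=0}^{\infty} \frac{\lambda^{s}}{(s!)^{\nu}},
```

where $\nu = 1$ recovers the Poisson distribution, $\nu > 1$ yields under-dispersion, and $\nu < 1$ yields over-dispersion.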

Acknowledgments

The authors thank the associate editor and the three anonymous reviewers for their valuable suggestions which led to significant improvements in the article.

Notes

1 Care should be taken when setting the value of ξ, as large values can sometimes cause the algorithm to fail to converge. It is therefore advisable to experiment with a few values first and to observe the convergence path (the sequence of 2ℓ values) and the number of iterations.

2 To avoid overfitting associated with the number of boosting iterations, for each M we run the model fitted to the training data until there is little change in 2ℓ, and we then use the same number of iterations for the model evaluated on the test data for that specific M. For example, if the model with M = 15 takes 170 iterations on the training data, the same 170 iterations are used to evaluate the prediction error on the test data.
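The stopping rule in Note 2 can be sketched as below. This is a minimal illustration, not the authors' implementation: `update_step` is a hypothetical callable standing in for one CMPBoost iteration that returns the current 2ℓ value, and the tolerance `tol` is an assumed choice.

```python
def run_until_stable(update_step, tol=1e-4, max_iter=1000):
    """Run boosting updates until 2*log-likelihood (2l) stabilizes.

    `update_step` is a hypothetical callable performing one boosting
    iteration and returning the current 2l value.  Returns the number
    of iterations actually performed; per Note 2, that same count is
    then reused verbatim when scoring the test data for the same M.
    """
    prev = float("-inf")
    for t in range(1, max_iter + 1):
        curr = update_step()
        if abs(curr - prev) < tol:  # "not much change in 2l"
            return t
        prev = curr
    return max_iter
```

For each candidate M, one would call `run_until_stable` on the training fit, then run exactly that many boosting iterations on the test data before computing the prediction error.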

Additional information

Funding

This research was partially supported by the Ministry of Science and Technology, Taiwan, through grant 105-2410-H-007-034-MY3 (both authors) and grant 107-2811-M-007-1047 (first author).

