Abstract
We propose a kernel function for ordered categorical data that overcomes limitations of existing ordered kernel functions used in the estimation of probability mass functions for multinomial ordered data. Some of these limitations arise from assumptions made about the support of the underlying random variable. Furthermore, many existing ordered kernel functions lack a particularly appealing property, namely the ability to deliver discrete uniform probability estimates for some value of the smoothing parameter. We propose an asymmetric, empirical-support kernel function that adapts to the data at hand and possesses certain desirable features. It suffers no difficulties arising from zero counts caused by gaps in the data, and it delivers the empirical proportions and the discrete uniform probabilities at the lower and upper boundaries of the smoothing parameter, respectively. We propose likelihood and least-squares cross-validation for smoothing parameter selection and study their asymptotic and finite-sample behaviour.
Keywords:
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 Unordered kernel functions place a binary counting weight on each observation, for example, a weight 1(X_i = x) that equals 1 when X_i = x and 0 when X_i ≠ x, but cannot assess distance (see Aitchison and Aitken Citation1976, p. 419).
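To make the point concrete, the sketch below contrasts the binary counting weight with the Aitchison and Aitken (1976) unordered kernel. The function name and example values are illustrative; note that every mismatching category receives the same weight, so ordering and distance play no role.

```python
import numpy as np

def aitchison_aitken(Xi, x, lam, c):
    """Aitchison-Aitken (1976) unordered kernel for a variable with c categories.

    Weight 1 - lam on a match and lam / (c - 1) on any mismatch: all
    mismatching categories get the same weight, so distance is not assessed.
    """
    Xi = np.asarray(Xi)
    return np.where(Xi == x, 1.0 - lam, lam / (c - 1))

# With lam = 0 this reduces to the binary counting weight 1(X_i = x).
X = np.array([0, 1, 2, 2, 1])
print(aitchison_aitken(X, 2, 0.0, 3))   # [0. 0. 1. 1. 0.]
print(aitchison_aitken(X, 2, 0.3, 3))   # every mismatch weighted 0.15
```

Observations at categories 0 and 1 receive identical weight even though category 1 is "closer" to x = 2, which is precisely the limitation an ordered kernel addresses.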
2 We are grateful to an anonymous referee who pointed out that Kokonendji and Zocchi (Citation2010) proposed a family of symmetric and asymmetric counting kernels that encompass the properties discussed herein: they write ‘the empirical proportions and the discrete uniform distribution at the extreme of the admissible smoothing parameter’.
3 To see this, note that when λ = 0 the kernel places weight 1 on X_i = x and 0 otherwise, delivering the empirical proportions, and when λ = 1 the kernel places equal weight on every point of the empirical support. So, for any x and X_i, the kernel weight is the same whether X_i equals x or any other value of the empirical support, hence the estimator delivers the discrete uniform probabilities when λ = 1.
4 One can show that the cross-validation function contains two leading terms that are related to λ. The first term is a positive deterministic quantity which is minimised at λ = 0, and the second term is a zero-mean random variable which is minimised at the upper extreme value of λ with some constant positive probability δ. Therefore, the cross-validation function is minimised at the upper extreme value of λ with a positive probability δ. In general, it is difficult to determine the exact value of δ because the exact value of δ depends on the unknown probability function p(·). Simulation in Ouyang et al. (Citation2006) shows that the cross-validated λ takes the upper extreme value one with non-trivial probability and takes values between zero and one otherwise. Ouyang et al. (Citation2009) showed similar theoretical and simulation results for the nonparametric regression model with categorical regressors.
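A minimal sketch of least-squares cross-validation for a discrete probability estimator is given below. It uses the illustrative λ-mixture kernel from above (not the paper's kernel) and the standard criterion CV(λ) = Σ_x p̂(x)² − (2/n) Σ_i p̂₋ᵢ(X_i), where p̂₋ᵢ is the leave-one-out estimate; minimising CV over a grid yields the data-driven λ.

```python
import numpy as np

def pmf_hat(X, x, lam, c):
    # lam-mixture kernel estimator (illustrative stand-in kernel).
    return ((1 - lam) * (X == x).astype(float) + lam / c).mean()

def lscv(X, lam, support):
    """Least-squares cross-validation criterion:
    CV(lam) = sum_x phat(x)^2 - (2/n) * sum_i phat_{-i}(X_i)."""
    n, c = len(X), len(support)
    term1 = sum(pmf_hat(X, x, lam, c) ** 2 for x in support)
    loo = 0.0
    for i in range(n):
        X_minus_i = np.delete(X, i)        # leave observation i out
        loo += pmf_hat(X_minus_i, X[i], lam, c)
    return term1 - 2.0 * loo / n

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=50)
grid = np.linspace(0.0, 1.0, 21)
lam_star = min(grid, key=lambda lam: lscv(X, lam, range(4)))
print(lam_star)
```

Repeating this over simulated samples is one way to observe the behaviour discussed in this note, i.e. the minimiser occasionally landing on the upper extreme value.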
5 The Wang and van Ryzin kernel function is l(X_i, x, λ) = 1 − λ when X_i = x and l(X_i, x, λ) = ((1 − λ)/2) λ^|X_i − x| when X_i ≠ x, where λ ∈ [0, 1]. Note, however, that this kernel, like all existing ordered kernel functions of which we are aware, lacks the ability to deliver discrete uniform probabilities for some value of the smoothing parameter. For the normalised version of this kernel, we divide the probability estimate at each support point by the sum of the estimates over the support to ensure that the probabilities indeed sum to one as per Glad, Hjort, and Ushakov (Citation2003).
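The Wang and van Ryzin kernel and the renormalisation step can be sketched as follows (function names are illustrative; the renormalisation simply rescales the estimates to sum to one, in the spirit of Glad, Hjort, and Ushakov):

```python
import numpy as np

def wang_van_ryzin(Xi, x, lam):
    """Wang and van Ryzin ordered kernel: 1 - lam on a match,
    (1 - lam)/2 * lam**|Xi - x| otherwise."""
    Xi = np.asarray(Xi)
    d = np.abs(Xi - x)
    return np.where(d == 0, 1.0 - lam,
                    0.5 * (1.0 - lam) * lam ** d.astype(float))

def pmf_wvr(X, support, lam, normalise=False):
    p = np.array([wang_van_ryzin(X, x, lam).mean() for x in support])
    if normalise:
        p = p / p.sum()   # rescale so the estimates sum to one
    return p

X = np.array([1, 1, 2, 4, 4, 4])
support = np.arange(1, 6)
raw = pmf_wvr(X, support, 0.4)
print(raw.sum())                             # below 1 on bounded support
print(pmf_wvr(X, support, 0.4, True).sum())  # exactly 1 after rescaling
```

On a bounded support the raw kernel weights leak mass beyond the boundaries, which is why the unnormalised estimates sum to less than one and the rescaling is needed.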
6 We are grateful to an anonymous referee who suggested incorporating these methods as comparators and who directed us to the R package Ake (Wansouwé et al. Citation2015) for their implementation. Note that the authors of this package consider and implement only least-squares cross-validation for smoothing parameter selection, so these comparators appear alongside the least-squares cross-validation results but not the likelihood cross-validation results.
7 Typically, a P-value of 0 is simply recorded as, say, '< 2.2e-16' when using the R statistical platform. However, a P-value of 1 would be unusual in a two-sided test setting (the two means would have to be identical) but is commonplace in a one-sided setting when the two means differ in the direction of the null, i.e. one is much larger than the other.
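The contrast described in this note can be reproduced with a hand-rolled two-sample z-test (a sketch; the note does not specify which test was used). When one sample's mean is far above the other's, the two-sided P-value is numerically zero while the one-sided P-value in the direction of the null is numerically one.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)
a = rng.normal(10.0, 1.0, 200)   # mean far above that of b
b = rng.normal(0.0, 1.0, 200)

# Two-sample z statistic for H0: mean(a) = mean(b).
z = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a)
                                    + b.var(ddof=1) / len(b))

p_less = norm_cdf(z)                  # one-sided H1: mean(a) < mean(b)
p_two = 2 * (1 - norm_cdf(abs(z)))    # two-sided alternative
print(p_less, p_two)
```

Because the data differ in the direction of the one-sided null, p_less saturates at one, while p_two underflows to zero and would be reported as a tiny bound.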
8 The median SSE (× 100) is 0.009 for the normalised Wang and van Ryzin kernel versus 0 for the proposed kernel, and the median SSEs are identical for likelihood and least-squares cross-validation.
9 The negative binomial provided the best fit overall, but its tail is too thin, i.e. as the number of successful patent applications increases, the probability estimates approach zero too quickly, yet a substantial amount of empirical probability mass remains in the tail.
10 Using a Taylor expansion, …, where …. Note that … because ….