
Kernel smoothed probability mass functions for ordered datatypes

Pages 563-586 | Received 25 Oct 2018, Accepted 12 Apr 2020, Published online: 12 May 2020
 

Abstract

We propose a kernel function for ordered categorical data that overcomes limitations present in ordered kernel functions appearing in the literature on the estimation of probability mass functions for multinomial ordered data. Some limitations arise from assumptions made about the support of the underlying random variable. Furthermore, many existing ordered kernel functions lack a particularly appealing property, namely the ability to deliver discrete uniform probability estimates for some value of the smoothing parameter. We propose an asymmetric empirical support kernel function that adapts to the data at hand and possesses certain desirable features. Zero counts caused by gaps in the data pose no difficulty, and the kernel encompasses both the empirical proportions and the discrete uniform probabilities at the lower and upper boundaries of the smoothing parameter. We propose likelihood and least-squares cross-validation for smoothing parameter selection and study their asymptotic and finite-sample behaviour.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 Unordered kernel functions place a binary counting weight on each observation, for example, $1-\lambda$ when $X_i=x$ and $\lambda/(c-1)$ when $X_i\neq x$, $\lambda\in[0,(c-1)/c]$, but cannot assess distance (see Aitchison and Aitken Citation1976, p. 419).

2 We are grateful to an anonymous referee who pointed out that Kokonendji and Zocchi (Citation2010) proposed a family of symmetric and asymmetric counting kernels that encompass the properties discussed herein: they write ‘the empirical proportions and the discrete uniform distribution at the extreme of the admissible smoothing parameter’.

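3 To see this, note that $d_{zi}=0$ when $z=X_i$, and when $\lambda=0$, $l(X_i,z,0)=\lambda^{d_{iz}}/\Lambda_i=0^0/\bigl(0^0+\sum_{y\in D,\,y\neq X_i}^{c}0^{|X_i-y|}\bigr)=1$ when $z=X_i$ and 0 otherwise. So, for any $X_i$ and $z\le x$, $l(X_i,z,0)$ will equal 1 if $X_i$ equals any value $z\le x$ and zero otherwise, hence $L(X_i,x,\lambda)=\mathbf{1}(X_i\le x)=\sum_{z\in D,\,z\le x}\mathbf{1}(X_i=z)$ when $\lambda=0$.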

4 One can show that the cross-validation function contains two leading terms that are related to $\lambda$. The first term is a positive deterministic quantity which is minimised at $\hat\lambda=1$, and the second term is a zero-mean $O_p(1)$ random variable which is minimised at $\hat\lambda=1$ with some constant positive probability $\delta_1\in(0,1)$. Therefore, the cross-validation function is minimised at $\hat\lambda=1$ with a positive probability $\delta>\delta_1\in(0,1)$. In general, it is difficult to determine the exact value of $\delta$ because it depends on the unknown function $p(\cdot)$. Simulation in Ouyang et al. (Citation2006) shows that $\hat\lambda$ takes the upper extreme value of one with roughly a 60% chance and takes values between zero and one with roughly a 40% chance. Ouyang et al. (Citation2009) showed similar theoretical and simulation results for the nonparametric regression model with categorical regressors.

5 The Wang and van Ryzin kernel function is $1-\lambda$ when $X_i=x$ and $0.5(1-\lambda)\lambda^{|X_i-x|}$ when $X_i\neq x$, where $0\le\lambda<1$. Note, however, that this kernel, like all existing ordered kernel functions of which we are aware, lacks the ability to deliver discrete uniform probabilities for some value of the smoothing parameter. For the normalised version of this kernel, we divide $\hat p(x)$ by $\sum_{x\in D}\hat p(x)$ to ensure that the probabilities indeed sum to one as per Glad, Hjort, and Ushakov (Citation2003).

6 We are grateful to an anonymous referee who suggested incorporating these methods as comparators and who directed us to the R package Ake (Wansouwé et al. Citation2015) for their implementation. Note that the authors of this package only consider and implement least-squares cross-validation for smoothing parameter selection. Therefore, we beg the reader's forgiveness for including these comparators alongside those for least-squares cross-validation but not for likelihood cross-validation.

7 Typically, a P-value of 0 is simply recorded as, say, $<2.2\times10^{-16}$ when using the R statistical platform. However, a P-value of 1 would be unusual in a two-sided test setting (the two means would have to be identical) but is commonplace in a one-sided setting when the two means differ in the direction of the null, i.e. one is much larger than the other.

8 The median SSE (× 100) is 0.009 versus 0 for the normalised Wang and van Ryzin versus the proposed kernel, respectively, and the median SSE are identical for likelihood and least-squares cross-validation.

9 The negative binomial provided the best fit overall, but its tail is too thin, i.e. as the number of successful patent applications increases, the probability estimates approach zero too quickly yet there is still a substantial amount of empirical probability mass remaining in the tail.

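10 Using a Taylor expansion, $f(\Delta_i)\equiv\log(1+\Delta_i)=\Delta_i-\frac{1}{2(1+\bar\Delta_i)^2}\Delta_i^2$, where $\bar\Delta_i\in(0,\Delta_i)$. Note that $1/(1+\bar\Delta_i)^2=1-2\bar\Delta_i+O(\bar\Delta_i^2)=1+o_p(1)$ because $|\bar\Delta_i|\le\max_{1\le j\le n}|\bar\Delta_j|=o_p(1)$.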
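
The kernel weights described in notes 1 and 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation (the notes refer to R), and it does not implement the proposed empirical support kernel. The function names, the toy sample, and the choice to pass the support $D$ explicitly as `support` are all assumptions made for illustration.

```python
import numpy as np


def aitchison_aitken(X, x, lam, c):
    """Unordered weights of note 1: 1 - lam if X_i == x, else lam / (c - 1)."""
    X = np.asarray(X)
    return np.where(X == x, 1.0 - lam, lam / (c - 1))


def wang_van_ryzin(X, x, lam):
    """Ordered weights of note 5: 1 - lam if X_i == x, else 0.5 (1 - lam) lam^|X_i - x|."""
    X = np.asarray(X)
    d = np.abs(X - x)
    return np.where(d == 0, 1.0 - lam, 0.5 * (1.0 - lam) * lam ** d)


def pmf_estimate(X, support, lam, normalise=True):
    """Average the Wang-van Ryzin weights at each support point.

    With normalise=True the estimates are divided by their sum over the
    support, as described in note 5, so that they sum to one.
    """
    p = np.array([wang_van_ryzin(X, x, lam).mean() for x in support])
    return p / p.sum() if normalise else p


# Hypothetical toy sample with a gap at 2 (illustrative data only).
X = [0, 0, 1, 3]
support = [0, 1, 2, 3]
p_raw = pmf_estimate(X, support, lam=0.2, normalise=False)
p_norm = pmf_estimate(X, support, lam=0.2)
```

At `lam=0` the estimate reduces to the empirical proportions. On a finite support the unnormalised Wang and van Ryzin estimates need not sum to one, which is why note 5 rescales them. For the unordered kernel, setting `lam` to its upper bound $(c-1)/c$ gives every category the weight $1/c$.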
