Approximation by finite mixtures of continuous density functions that vanish at infinity

Abstract

It is often claimed that, given sufficiently many components, finite mixture models can approximate any other probability density function (pdf) to an arbitrary degree of accuracy. Unfortunately, the nature of this approximation result is often left unclear. We prove that finite mixture models constructed from pdfs in $C_{0}$ can be used to conduct approximation of various classes of approximands in a number of different modes. That is, we prove that approximands in $C_{0}$ can be uniformly approximated, approximands in $C_{b}$ can be uniformly approximated on compact sets, and approximands in $L_{p}$ can be approximated with respect to the $L_{p}$ norm, for $p \in [1, \infty)$ . Furthermore, we also prove that measurable functions can be approximated almost everywhere.

PUBLIC INTEREST STATEMENT

Finite mixture models are an expansive and expressive class of probability models that have been successfully applied in many situations where data follow a complex generative process that may be highly heterogeneous. It has long been known that finite mixture models, under sufficient regularity conditions, can approximate any probability density function to an arbitrary degree of accuracy, and such results have been established under varying restrictive assumptions. Our work seeks to provide the weakest set of assumptions under which approximation-theoretic results can be established over the widest possible class of probability density problems. The result provides further evidence for the success of mixture models in applications, and provides mathematical guarantees to practitioners who apply mixture models in their analytic problems.

1. Introduction

Let $x$ be an element in the Euclidean space, defined by $R^{n}$ and the norm ${∥\cdot∥}_{2}$ , for some $n \in N$ . Let $f : R^{n} \to R$ be a function, such that $f \geq 0$ , everywhere, and $\int f d λ = 1$ , where $λ$ is the Lebesgue measure. We say that $f$ is a probability density function (pdf) on the domain $R^{n}$ (an expression that we will drop, from hereon in). Let $g : R^{n} \to R$ be another pdf, and for each $m \in N$ , define the functional class:

M_{m}^{g} = \{h : h (x) = \sum_{i = 1}^{m} c_{i} \frac{1}{σ_{i}^{n}} g (\frac{x - μ_{i}}{σ_{i}}), μ_{i} \in R^{n}, σ_{i} \in R_{+}, c \in S^{m - 1}, i \in [m]\},

where $c^{T} = (c_{1}, \dots, c_{m})$ , $R_{+} = (0, \infty)$ ,

S^{m - 1} = \{c \in R^{m} : \sum_{i = 1}^{m} c_{i} = 1 and c_{i} \geq 0, \forall i \in [m]\},

$[m] = \{1, \dots, m\}$ , and ${(\cdot)}^{T}$ is the matrix transposition operator. We say that any $h \in M_{m}^{g}$ is an $m$-component location-scale finite mixture of the pdf $g$ .
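To make the definition of $M_{m}^{g}$ concrete, the following minimal Python sketch evaluates a location-scale mixture in the case $n = 1$ with the standard normal pdf as $g$ . The function names (`std_normal_pdf`, `mixture_pdf`, `riemann_integral`) and the example parameter values are illustrative assumptions and do not come from the article.

```python
import math


def std_normal_pdf(x):
    """Standard normal pdf on R (n = 1), used here as the component pdf g."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mixture_pdf(x, weights, locations, scales, g=std_normal_pdf):
    """Evaluate h(x) = sum_i c_i * (1 / sigma_i) * g((x - mu_i) / sigma_i) for n = 1."""
    if any(c < 0 for c in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must lie in the probability simplex")
    if any(s <= 0 for s in scales):
        raise ValueError("scales must be strictly positive")
    return sum(c / s * g((x - mu) / s)
               for c, mu, s in zip(weights, locations, scales))


def riemann_integral(func, lower, upper, steps=20000):
    """Midpoint Riemann sum of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (j + 0.5) * width) for j in range(steps))


# Hypothetical two-component example: the mixture should integrate to about 1.
total_mass = riemann_integral(
    lambda x: mixture_pdf(x, [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]),
    -20.0, 20.0,
)
```

Because the weights lie in $S^{m - 1}$ and each rescaled component integrates to one, any such $h$ is again a pdf, which is what the numerical check of `total_mass` reflects.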

The study of pdfs in the class $M_{m}^{g}$ is an evergreen area of applied and technical research in statistics. We point the interested reader to the many comprehensive books on the topic, such as Everitt & Hand (1981), Titterington et al. (1985), McLachlan & Basford (1988), Lindsay (1995), McLachlan & Peel (2000), Frühwirth-Schnatter (2006), Schlattmann (2009), Mengersen et al. (2011), and Frühwirth-Schnatter et al. (2019).

Much of the popularity of finite mixture models stems from the folk theorem, which states that for any density $f$ , there exists an $h \in M_{m}^{g}$ , for some sufficiently large number of components $m \in N$ , such that $h$ approximates $f$ arbitrarily closely, in some sense. Examples of this folk theorem come in statements such as “provided the number of component densities is not bounded above, certain forms of mixture can be used to provide arbitrarily close approximation to a given probability distribution” (Titterington et al., 1985, p. 50), “the [mixture] model forms can fit any distribution and significantly increase model fit” (Walker and Ben-Akiva, 2011, p. 173), and “a mixture model can approximate almost any distribution” (Yona, 2011, p. 500). Other statements conveying the same sentiment are reported in Nguyen & McLachlan (2019). There is a sense of vagueness in the reported statements, and little is ever made clear regarding the technical nature of the folk theorem.

In order to proceed, we require the following definitions. We say that $f$ is compactly supported on $K \subset R^{n}$ , if $K$ is compact and if $1_{K^{C}} f = 0$ , where $1_{X}$ is the indicator function that takes value 1 when $x \in X$ and $0$ , elsewhere, and ${(\cdot)}^{C}$ is the set complement operator (i.e., $X^{C} = R^{n} ∖ X$ ). Here, $X$ is a generic subset of $R^{n}$ . Furthermore, we say that $f \in L_{p} (X)$ for any $1 \leq p < \infty$ , if

{∥f∥}_{L_{p} (X)} = {(\int {|1_{X} f|}^{p} d λ)}^{1 / p} < \infty,

and for $p = \infty$ , if

{∥f∥}_{L_{\infty} (X)} = \inf \{a \geq 0 : λ (\{x \in X : |f (x)| > a\}) = 0\} < \infty,

where we call ${∥\cdot∥}_{L_{p} (X)}$ the $L_{p}$-norm on $X$ . When $X = R^{n}$ , we shall write ${∥\cdot∥}_{L_{p} (R^{n})} = {∥\cdot∥}_{L_{p}}$ . In addition, we define the so-called Kullback-Leibler divergence (see Kullback & Leibler, 1951) between any two pdfs $f$ and $g$ on $X$ as

K L_{X} (f, g) = \int 1_{X} f log (\frac{f}{g}) d λ .

In Nguyen & McLachlan (2019), the approximation of pdfs $f$ by the class $M_{m}^{g}$ was explored in a restrictive setting. Let $\{h_{m}^{g}\}$ be a sequence of functions that draw elements from the nested sequence of sets $\{M_{m}^{g}\}$ (i.e., $h_{1}^{g} \in M_{1}^{g}, h_{2}^{g} \in M_{2}^{g}, \dots$ ). The following result of Zeevi & Meir (1997) was presented in Nguyen & McLachlan (2019), along with a collection of its implications, such as the results of Li & Barron (1999) and Rakhlin et al. (2005).

Theorem 1 (Zeevi and Meir, 1997). If

f \in \{f : 1_{K} f \geq β, β > 0\} \cap L_{2} (K)

and $g$ are pdfs and $K$ is compact, then there exists a sequence $\{h_{m}^{g}\}$ such that

lim_{m \to \infty} {∥f - h_{m}^{g}∥}_{L_{2} (K)} = 0 \text{ and } lim_{m \to \infty} K L_{K} (f, h_{m}^{g}) = 0.

Although powerful, this result is restrictive in the sense that it only permits approximation in the $L_{2}$ norm on compact sets $K$ , and only allows for approximation of functions $f$ that are strictly positive on $K$ . In general, other modes of approximation are desirable; in particular, approximation in the $L_{p}$-norm for $p = 1$ or $p = \infty$ is of interest, where the latter case is generally referred to as uniform approximation. Furthermore, the strict-positivity assumption and the restriction to compact sets limit the scope of applicability of Theorem 1. An example of an interesting application of extensions beyond Theorem 1 is within the $L_{1}$-norm approximation framework of Devroye & Lugosi (2000).

Let $g : R^{n} \to R$ again be a pdf. Then, for each $m \in N$ , we define

N_{m}^{g} = \{h : h (x) = \sum_{i = 1}^{m} c_{i} \frac{1}{σ_{i}^{n}} g (\frac{x - μ_{i}}{σ_{i}}), μ_{i} \in R^{n}, σ_{i} \in R_{+}, c_{i} \in R, i \in [m]\},

which we call the set of $m$-component location-scale linear combinations of the pdf $g$ . In the past, results regarding approximations of pdfs $f$ via functions $η \in N_{m}^{g}$ have been more forthcoming. For example, consider the case of $g = ϕ$ , where

ϕ (x) = {(2 π)}^{- n / 2} exp (- {∥x∥}_{2}^{2} / 2), \tag{1}

is the standard normal pdf. Denote the class of continuous functions with support on $R^{n}$ by $C$ . We have the result that for every pdf $f$ , compact set $K \subset R^{n}$ , and $ϵ > 0$ , there exists an $m \in N$ and $h \in N_{m}^{ϕ}$ , such that ${∥f - h∥}_{L_{\infty} (K)} < ϵ$ (Sandberg, 2001, Lem. 1). Furthermore, upon defining the set of continuous functions that vanish at infinity by

C_{0} = \{f \in C : \forall ϵ > 0, \exists \text{ a compact } K \subset R^{n}, \text{ such that } {∥f∥}_{L_{\infty} (K^{C})} < ϵ\},

we also have the result: for every pdf $f \in C_{0}$ and $ϵ > 0$ , there exists an $m \in N$ and $h \in N_{m}^{ϕ}$ , such that ${∥f - h∥}_{L_{\infty}} < ϵ$ (Sandberg, 2001, Thm. 2). Both of the results from Sandberg (2001) are simple implications of the famous Stone-Weierstrass theorem (cf. Stone, 1948 and De Branges, 1959).

To the best of our knowledge, the strongest available claim that is made regarding the folk theorem, within a probabilistic or statistical context, is that of DasGupta (2008, Thm. 33.2). Let $\{η_{m}^{g}\}$ be a sequence of functions that draw elements from the nested sequence of sets $\{N_{m}^{g}\}$ , in the same manner as $\{h_{m}^{g}\}$ . We paraphrase the claim without loss of fidelity, as follows:

Claim 1. If $f, g \in C$ are pdfs and $K \subset R^{n}$ is compact, then there exists a sequence $\{η_{m}^{g}\}$ , such that

lim_{m \to \infty} {∥f - η_{m}^{g}∥}_{L_{\infty} (K)} = 0.

Unfortunately, the proof of Claim 1 is not provided within DasGupta (2008). The only reference for the result is to an undisclosed location in Cheney & Light (2000), which, upon investigation, can be inferred to be Theorem 5 of Cheney & Light (2000, Ch. 20). It is further notable that there is no proof provided for the theorem. Instead, it is stated that the proof is similar to that of Theorem 1 in Cheney & Light (2000, Ch. 24), which is a reproduction of the proof for Xu et al. (1993, Lem. 3.1).

There is a major problem in applying the proof technique of Xu et al. (1993, Lem. 3.1) in order to prove Claim 1. The proof of Xu et al. (1993, Lem. 3.1) critically depends upon the statement that “there is no loss of generality in assuming that $f (x) = 0$ for $x \in R^{n} ∖ 2 K$ ”. Here, for $a \in R_{+}$ , $a K = \{x \in R^{n} : x = a y, y \in K\}$ . The assumption is necessary in order to write any convolution of $f$ with an arbitrary continuous function as an integral over a compact domain, and then to use a Riemann sum to approximate such an integral. Consequently, such a proof technique does not work outside the class of continuous functions that are compactly supported on $a K$ . Thus, one cannot verify Claim 1 from the materials of Xu et al. (1993), Cheney & Light (2000), and DasGupta (2008), alone.

Some recent results in the spirit of Claim 1 have been obtained by Nestoridis & Stefanopoulos (2007) and Nestoridis et al. (2011), using methods from the study of universal series (see, for example, Nestoridis & Papadimitropoulos, 2005).

Let

W = \{f \in C_{0} : \sum_{y \in Z^{n}} sup_{x \in {[0, 1]}^{n}} |f (x + y)| < \infty\}

denote the so-called Wiener’s algebra (see, e.g., Feichtinger, 1977) and let

V = \{f \in C_{0} : \forall x \in R^{n}, |f (x)| \leq β {(1 + {∥x∥}_{2})}^{- n - θ}, β, θ \in R_{+}\}

be a class of functions with tails decaying faster than ${∥x∥}_{2}^{- n}$ . In Nestoridis et al. (2011), it is noted that $V \subset W$ . Further, let

C_{c} = \{f \in C : \exists \text{ a compact set } K, \text{ such that } 1_{K^{C}} f = 0\},

denote the set of compactly supported continuous functions. The following theorem was proved in Nestoridis & Stefanopoulos (2007).

Theorem 2 (Nestoridis and Stefanopoulos, 2007, Thm. 3.2). If $g \in V$ , then the following statements hold.

(a) For any $f \in C_{c}$ , there exists a sequence $\{η_{m}^{g}\}$ ( $η_{m}^{g} \in N_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - η_{m}^{g}∥}_{L_{1}} + {∥f - η_{m}^{g}∥}_{L_{\infty}} = 0.

(b) For any $f \in C_{0}$ , there exists a sequence $\{η_{m}^{g}\}$ ( $η_{m}^{g} \in N_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - η_{m}^{g}∥}_{L_{\infty}} = 0.

(c) For any $1 \leq p < \infty$ and $f \in L_{p}$ , there exists a sequence $\{η_{m}^{g}\}$ ( $η_{m}^{g} \in N_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - η_{m}^{g}∥}_{L_{p}} = 0.

(d) For any measurable $f$ , there exists a sequence $\{η_{m}^{g}\}$ ( $η_{m}^{g} \in N_{m}^{g}$ ), such that

lim_{m \to \infty} η_{m}^{g} = f, \text{ almost everywhere} .

(e) If $ν$ is a $σ$-finite Borel measure on $R^{n}$ , then for any $ν$-measurable $f$ , there exists a sequence $\{η_{m}^{g}\}$ ( $η_{m}^{g} \in N_{m}^{g}$ ), such that

lim_{m \to \infty} η_{m}^{g} = f,

almost everywhere, with respect to $ν$ .

The result was then improved upon in Nestoridis et al. (2011), whereupon the more general space $W$ was taken as a replacement for $V$ in Theorem 2. Denote the class of bounded continuous functions by $C_{b} = C \cap L_{\infty}$ . The following theorem was proved in Nestoridis et al. (2011).

Theorem 3 (Nestoridis et al., 2011, Thm. 3.2). If $g \in W$ , then the following statements are true.

(a) The conclusion of Theorem 2(a) holds, with $C_{c}$ replaced by $C_{0} \cap L_{1}$ .

(b) The conclusions of Theorem 2(b)–(e) hold.

(c) For any $f \in C_{b}$ and compact $K \subset R^{n}$ , there exists a sequence $\{η_{m}^{g}\}$ , such that

lim_{m \to \infty} {∥f - η_{m}^{g}∥}_{L_{\infty} (K)} = 0.

Utilizing the techniques from Nestoridis & Stefanopoulos (2007), Bacharoglou (2010) proved a similar set of results to Theorem 2, under the restriction that $f$ is a non-negative function with support $R$ , using $g = ϕ$ (i.e., $g$ has form (1), where $n = 1$ ) and taking $\{h_{m}^{ϕ}\}$ as the approximating sequence, instead of $\{η_{m}^{g}\}$ . That is, the following result is obtained.

Theorem 4 (Bacharoglou, 2010, Cor. 2.5). If $f : R \to R_{+} \cup \{0\}$ , then the following statements are true.

(a) For any pdf $f \in C_{c}$ , there exists a sequence $\{h_{m}^{ϕ}\}$ ( $h_{m}^{ϕ} \in M_{m}^{ϕ}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{ϕ}∥}_{L_{1}} + {∥f - h_{m}^{ϕ}∥}_{L_{\infty}} = 0.

(b) For any $f \in C_{0}$ , such that ${∥f∥}_{L_{1}} \leq 1$ , there exists a sequence $\{h_{m}^{ϕ}\}$ ( $h_{m}^{ϕ} \in M_{m}^{ϕ}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{ϕ}∥}_{L_{\infty}} = 0.

(c) For any $1 < p < \infty$ and $f \in C \cap L_{p}$ , such that ${∥f∥}_{L_{1}} \leq 1$ , there exists a sequence $\{h_{m}^{ϕ}\}$ ( $h_{m}^{ϕ} \in M_{m}^{ϕ}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{ϕ}∥}_{L_{p}} = 0.

(d) For any measurable $f$ , there exists a sequence $\{h_{m}^{ϕ}\}$ ( $h_{m}^{ϕ} \in M_{m}^{ϕ}$ ), such that

lim_{m \to \infty} h_{m}^{ϕ} = f, \text{ almost everywhere} .

(e) For any pdf $f \in C$ , there exists a sequence $\{h_{m}^{ϕ}\}$ ( $h_{m}^{ϕ} \in M_{m}^{ϕ}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{ϕ}∥}_{L_{1}} = 0.

To the best of our knowledge, Theorem 4 is the most complete characterization of the approximating capabilities of the mixture of normal distributions. However, it is restrictive in two ways. First, it does not permit the characterization of approximation via the class $M_{m}^{g}$ for any $g$ except the normal pdf $ϕ$ . Although $ϕ$ is traditionally the most common choice for $g$ in practice, the modern mixture model literature has seen the use of many more exotic component pdfs, such as the Student-t pdf and its skew and modified variants (see, e.g., Peel & McLachlan, 2000, Forbes & Wraith, 2013, and Lee & McLachlan, 2016). Thus, its use is somewhat limited in the modern context. Second, modern applications tend to call for $n > 1$ , further restricting the impact of the result as a theoretical bulwark for finite mixture modeling in practice. A remark in Bacharoglou (2010) states that the result can be generalized to the case where $g \in V$ instead of $g = ϕ$ . However, no suggestions were proposed regarding the generalization of Theorem 4 to the case of $n > 1$ .

In this article, we prove a novel set of results that largely generalize Theorem 4. Using techniques inspired by Donahue et al. (1997) and Cheney & Light (2000), we are able to obtain a set of results regarding the approximation capability of the class of $m$-component mixture models $M_{m}^{g}$ , when $g \in C_{0}$ or $g \in V$ , and for any $n \in N$ . By definition of $V$ , the majority of our results extend beyond the proposed possible generalizations of Theorem 4.

The article proceeds as follows: Our main theorem is stated and its separate parts are proved in Section 2. Comments and discussion are provided in Section 3. Necessary technical lemmas and results are also included, for reference, in the Appendix.

2. Main result

The remainder of the article is devoted to proving the following theorem.

Theorem 5 (Main result). If we assume that $f$ and $g$ are pdfs and that $g \in C_{0}$ , then the following statements are true.

(a) For any $f \in C_{0}$ , there exists a sequence $\{h_{m}^{g}\}$ ( $h_{m}^{g} \in M_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{g}∥}_{L_{\infty}} = 0.

(b) For any $f \in C_{b}$ and compact $K \subset R^{n}$ , there exists a sequence $\{h_{m}^{g}\}$ ( $h_{m}^{g} \in M_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{g}∥}_{L_{\infty} (K)} = 0.

(c) For any $1 < p < \infty$ and $f \in L_{p}$ , there exists a sequence $\{h_{m}^{g}\}$ ( $h_{m}^{g} \in M_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{g}∥}_{L_{p}} = 0.

(d) For any measurable $f$ , there exists a sequence $\{h_{m}^{g}\}$ ( $h_{m}^{g} \in M_{m}^{g}$ ), such that

lim_{m \to \infty} h_{m}^{g} = f, \text{ almost everywhere} .

(e) If $ν$ is a $σ$-finite Borel measure on $R^{n}$ , then for any $ν$-measurable $f$ , there exists a sequence $\{h_{m}^{g}\}$ ( $h_{m}^{g} \in M_{m}^{g}$ ), such that

lim_{m \to \infty} h_{m}^{g} = f,

almost everywhere, with respect to $ν$ .

If we assume instead that $g \in V$ , then the following statement is also true.

(f) For any $f \in C$ , there exists a sequence $\{h_{m}^{g}\}$ ( $h_{m}^{g} \in M_{m}^{g}$ ), such that

lim_{m \to \infty} {∥f - h_{m}^{g}∥}_{L_{1}} = 0.

2.1. Technical preliminaries

Before we begin to prove the main theorem, we establish some technical results regarding our class of component densities $C_{0}$ . Let $f, g \in L_{1}$ and denote the convolution of $f$ and $g$ by $f ⋆ g = g ⋆ f$ . Further, we denote the sequence of dilates of $g$ by $\{g_{k} : g_{k} (x) = k^{n} g (k x), k \in N\} .$ The following result is an alternative to Lemma 5 and Corollary 1. Here, we replace the boundedness assumption on the approximand in the aforementioned results by a vanishing-at-infinity assumption.

Lemma 1. Let $g$ be a pdf and $f \in C_{0}$ , such that ${∥f∥}_{L_{\infty}} > 0$ . Then,

lim_{k \to \infty} {∥g_{k} ⋆ f - f∥}_{L_{\infty}} = 0.

Proof. It suffices to show that for any $ϵ > 0$ , there exists a $k (ϵ) \in N$ , such that ${∥g_{k} ⋆ f - f∥}_{L_{\infty}} < ϵ$ , for all $k \geq k (ϵ)$ . By Lemma 6, $f \in C_{b}$ , and thus ${∥f∥}_{L_{\infty}} < \infty$ . By making the substitution $z = k x$ , we obtain for each $k$

\int g_{k} (x) d λ = \int k^{n} g (k x) d λ = \int g (z) d λ = 1.

By Corollary 1, we obtain ${lim}_{k \to \infty} \int 1_{\{x : {∥x∥}_{2} > δ\}} g_{k} d λ = 0$ and thus we can choose a $k (ϵ)$ , such that

\int 1_{\{x : {∥x∥}_{2} > δ\}} g_{k} d λ < \frac{ϵ}{4 {∥f∥}_{L_{\infty}}} .

Since $g$ is a pdf, we have

|(g_{k} ⋆ f) (x) - f (x)| = |\int g_{k} (y) [f (x - y) - f (x)] d λ (y)| \leq \int g_{k} (y) |f (x - y) - f (x)| d λ (y) .

By uniform continuity, for any $ϵ > 0$ , there exists a $δ (ϵ) > 0$ such that $|f (x - y) - f (x)| < ε / 2$ , for any $x, y \in R^{n}$ , such that ${∥y∥}_{2} < δ (ϵ)$ (Lemma 6). Thus, on the one hand, for any $δ (ϵ)$ , we can pick a $k (ϵ)$ such that

\int 1_{\{y : {∥y∥}_{2} > δ (ϵ)\}} g_{k} (y) |f (x - y) - f (x)| d λ (y)

\leq 2 {∥f∥}_{L_{\infty}} \int 1_{\{y : {∥y∥}_{2} > δ (ϵ)\}} g_{k} d λ

\leq 2 {∥f∥}_{L_{\infty}} \times \frac{ϵ}{4 {∥f∥}_{L_{\infty}}} = \frac{ϵ}{2}, \tag{2}

and on the other hand

\int 1_{\{y : {∥y∥}_{2} \leq δ (ϵ)\}} g_{k} (y) |f (x - y) - f (x)| d λ (y)

\leq \frac{ϵ}{2} \int 1_{\{y : {∥y∥}_{2} \leq δ (ϵ)\}} g_{k} d λ

\leq \frac{ϵ}{2} \times 1 = \frac{ϵ}{2} . \tag{3}

The proof is completed by summing (2) and (3).□
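The convergence asserted by Lemma 1 can be observed numerically. The sketch below is only an illustration under stated assumptions: it takes $n = 1$ , uses the standard normal pdf as $g$ and a triangular pdf on $[- 1, 1]$ as $f \in C_{0}$ , approximates the convolution by a truncated midpoint sum, and replaces the supremum over $R$ by a maximum over a finite grid. All function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal pdf, playing the role of g."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def triangular_pdf(x):
    """Continuous pdf supported on [-1, 1]; hence an element of C_0."""
    return max(0.0, 1.0 - abs(x))


def dilate_convolution(f, g, k, x, half_width=8.0, steps=400):
    """Midpoint approximation of (g_k * f)(x) = int k g(k y) f(x - y) dy, n = 1.

    The integral is truncated to |y| <= half_width / k, where g_k carries
    essentially all of its mass when g is the standard normal pdf.
    """
    lower = -half_width / k
    width = 2.0 * half_width / (k * steps)
    total = 0.0
    for j in range(steps):
        y = lower + (j + 0.5) * width
        total += k * g(k * y) * f(x - y)
    return total * width


def sup_error(f, g, k, grid):
    """Grid proxy for the L_infinity norm of g_k * f - f."""
    return max(abs(dilate_convolution(f, g, k, x) - f(x)) for x in grid)


grid = [-2.0 + 4.0 * j / 200 for j in range(201)]
errors = [sup_error(triangular_pdf, normal_pdf, k, grid) for k in (1, 4, 16)]
```

In this example the largest discrepancy sits at the kink of $f$ at the origin, and the values in `errors` shrink as $k$ grows, in line with the lemma.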

Lemma 2. If $f \in C_{0}$ is such that $f \geq 0$ , and $ϵ > 0$ , then there exists a $h \in C_{c}$ , such that $0 \leq h \leq f$ , and

{∥f - h∥}_{L_{\infty}} < ϵ

Proof. Since $f \in C_{0}$ , there exists a compact $K \subset R^{n}$ such that ${∥f∥}_{L_{\infty} (K^{C})} < ϵ / 2$ . By Lemma 7, there exists some $g \in C_{c}$ , such that $0 \leq g \leq 1$ and $1_{K} g = 1$ . Let $h = g f$ , which implies that $h \geq 0$ and $0 \leq h \leq f$ . Furthermore, notice that $1_{K} (f - h) = 0$ and ${∥h∥}_{L_{\infty}} \leq {∥f∥}_{L_{\infty}}$ , by construction. The proof is completed by observing that

{∥f - h∥}_{L_{\infty}} = {∥f - h∥}_{L_{\infty} (K^{C})}

\leq {∥f∥}_{L_{\infty} (K^{C})} + {∥h∥}_{L_{\infty} (K^{C})}

\leq 2 {∥f∥}_{L_{\infty} (K^{C})} < ϵ .

For any $δ > 0$ and uniformly continuous function $f$ , let

w (f, δ) = sup_{\{x, y \in R^{n} : {∥x - y∥}_{2} \leq δ\}} |f (x) - f (y)|

denote the modulus of continuity of $f$ . Furthermore, define the diameter of a set $X \subset R^{n}$ by $\mathrm{diam} (X) = {sup}_{x, y \in X} {∥x - y∥}_{2}$ and denote an open ball, centered at $x \in R^{n}$ with radius $r > 0$ by $B (x, r) = \{y \in R^{n} : {∥x - y∥}_{2} < r\}$ .

Notice that the class $M_{m}^{g}$ can be parameterized as

M_{m}^{g} = \{h : h (x) = \sum_{i = 1}^{m} c_{i} k_{i}^{n} g (k_{i} x - z_{i}), z_{i} \in R^{n}, k_{i} \in R_{+}, c \in S^{m - 1}, i \in [m]\},

where $k_{i} = 1 / σ_{i}$ and $z_{i} = μ_{i} / σ_{i}$ . The following result is the primary mechanism that permits us to construct finite mixture approximations for convolutions of form $g_{k} ⋆ f$ . The argument is motivated by the approaches taken in Theorem 1 in Cheney & Light (2000, Ch. 24), Nestoridis & Stefanopoulos (2007, Lem. 3.1), and Nestoridis et al. (2011, Thm. 3.1).

Lemma 3. Let $f \in C$ and $g \in C_{0}$ be pdfs. Furthermore, let $K \subset R^{n}$ be compact and $h \in C_{c}$ , where $1_{K^{c}} h = 0$ and $0 \leq h \leq f$ . Then for any $k \in N$ , there exists a sequence $\{h_{m}^{g}\}$ , such that

lim_{m \to \infty} {∥g_{k} ⋆ h - h_{m}^{g}∥}_{L_{\infty}} = 0.

Proof. It suffices to show that for any $k \in N$ and $ϵ > 0$ , there exists a sufficiently large $m (ϵ) \in N$ so that for all $m \geq m (ϵ)$ , there exists an $h_{m}^{g} \in M_{m}^{g}$ such that

{∥g_{k} ⋆ h - h_{m}^{g}∥}_{L_{\infty}} < ϵ . \tag{4}

For any $k \in N$ , we can write

(g_{k} ⋆ h) (x) = \int g_{k} (x - y) h (y) d λ (y)

= \int 1_{\{y : y \in K\}} g_{k} (x - y) h (y) d λ (y)

= \int 1_{\{y : y \in K\}} k^{n} g (k x - k y) h (y) d λ (y)

= \int 1_{\{z : z \in k K\}} g (k x - z) h (\frac{z}{k}) d λ (z) .

Here, $k K$ is the continuous image of a compact set, and hence is compact (cf. Rudin, 1976, Thm. 4.14). By Lemma 8, for any $δ > 0$ , there exist $κ_{i} \in R^{n}$ ( $i \in [m - 1]$ , $m \in N$ ), such that $k K \subset ⋃_{i = 1}^{m - 1} B (κ_{i}, δ / 2)$ . Further, if $B_{i}^{δ} = k K \cap B (κ_{i}, δ / 2)$ , then we have $k K = ⋃_{i = 1}^{m - 1} B_{i}^{δ}$ . We can obtain a disjoint covering of $k K$ by taking $A_{1}^{δ} = B_{1}^{δ}$ and $A_{i}^{δ} = B_{i}^{δ} ∖ ⋃_{j = 1}^{i - 1} B_{j}^{δ}$ ( $i \in [m - 1]$ ) and noting that $k K = ⋃_{i = 1}^{m - 1} A_{i}^{δ}$ , by construction (cf. Cheney & Light, 2000, Ch. 24). Furthermore, each $A_{i}^{δ}$ is a Borel set and $\mathrm{diam} (A_{i}^{δ}) \leq δ$ .

For convenience, let $Π_{m}^{δ} = \{A_{i}^{δ} : i \in [m - 1]\}$ denote the disjoint covering, or partition, of $k K$ . We seek to show that there exists an $m \in N$ and $Π_{m}^{δ}$ , such that

{∥g_{k} ⋆ h - \sum_{i = 1}^{m} c_{i} k_{i}^{n} g (k_{i} x - z_{i})∥}_{L_{\infty}} < ϵ,

where $k_{i} = k$ ,

c_{i} = k^{- n} \int 1_{\{z : z \in A_{i}^{δ}\}} h (z / k) d λ (z),

and $z_{i} \in A_{i}^{δ}$ , for $i \in [m - 1]$ .

Further, $z_{m} \in A_{m - 1}^{δ}$ and $c_{m} = 1 - \sum_{i = 1}^{m - 1} c_{i}$ , with $k_{m}$ chosen as follows: By Lemma 6, $g \leq C < \infty$ for some positive $C$ . Then, ${∥c_{m} k_{m}^{n} g (k_{m} x - z_{m})∥}_{L_{\infty}} \leq c_{m} k_{m}^{n} C$ . We may choose $k_{m}$ so that $k_{m}^{n} = ϵ / (2 c_{m} C)$ , so that

{∥c_{m} k_{m}^{n} g (k_{m} x - z_{m})∥}_{L_{\infty}} \leq \frac{ϵ}{2} .

Since $0 \leq h \leq f$ , the sum of $c_{i}$ ( $i \in [m - 1]$ ) satisfies the inequality

\sum_{i = 1}^{m - 1} c_{i} = k^{- n} \sum_{i = 1}^{m - 1} \int 1_{\{z : z \in A_{i}^{δ}\}} h (\frac{z}{k}) d λ

= k^{- n} \int 1_{\{z : z \in k K\}} h (\frac{z}{k}) d λ

= \int 1_{\{x : x \in K\}} h d λ \leq \int 1_{\{x : x \in K\}} f d λ \leq \int f d λ = 1.

Thus, $0 \leq c_{m} \leq 1$ , and our construction implies that $h_{m}^{g} \in M_{m}^{g},$ where

h_{m}^{g} (x) = \sum_{i = 1}^{m} c_{i} k_{i}^{n} g (k_{i} x - z_{i}) \forall x \in R^{n} .

We can bound the left-hand side of (4) as follows:

{∥g_{k} ⋆ h - h_{m}^{g}∥}_{L_{\infty}}

\leq {∥(g_{k} ⋆ h) (x) - \sum_{i = 1}^{m - 1} c_{i} k_{i}^{n} g (k_{i} x - z_{i})∥}_{L_{\infty}}

+ {∥c_{m} k_{m}^{n} g (k_{m} x - z_{m})∥}_{L_{\infty}}

\leq {∥(g_{k} ⋆ h) (x) - \sum_{i = 1}^{m - 1} c_{i} k_{i}^{n} g (k_{i} x - z_{i})∥}_{L_{\infty}} + \frac{ϵ}{2}

= ∥ \int 1_{\{z : z \in k K\}} g (k x - z) h (\frac{z}{k}) d λ (z)

- \sum_{i = 1}^{m - 1} \int 1_{\{z : z \in A_{i}^{δ}\}} g (k x - z_{i}) h (\frac{z}{k}) d λ (z) ∥_{L_{\infty}} + \frac{ϵ}{2}

\leq \sum_{i = 1}^{m - 1} \int 1_{\{z : z \in A_{i}^{δ}\}} {∥g (k x - z) - g (k x - z_{i})∥}_{L_{\infty}} h (\frac{z}{k}) d λ (z) + \frac{ϵ}{2} . \tag{5}

Since

{∥k x - z - (k x - z_{i})∥}_{2} = {∥z - z_{i}∥}_{2} \leq \mathrm{diam} (A_{i}^{δ}) \leq δ,

we have $|g (k x - z) - g (k x - z_{i})| \leq w (g, δ)$ , for each $i \in [m - 1]$ . Since ${lim}_{δ \to 0} w (g, δ) = 0$ (cf. Makarov & Podkorytov, 2013, Thm. 4.7.3), we may choose a $δ (ϵ) > 0$ so that $w (g, δ (ϵ)) < ϵ / (2 k^{n})$ . We may proceed from (5) as follows:

{∥g_{k} ⋆ h - h_{m}^{g}∥}_{L_{\infty}} \leq w (g, δ (ϵ)) \int 1_{\{z : z \in k K\}} h (\frac{z}{k}) d λ + \frac{ϵ}{2}

= w (g, δ (ϵ)) k^{n} \int h d λ + \frac{ϵ}{2}

\leq w (g, δ (ϵ)) k^{n} + \frac{ϵ}{2}

< \frac{ϵ}{2} + \frac{ϵ}{2} = ϵ . \tag{6}

To conclude the proof, it suffices to choose an appropriate sequence of partitions $Π_{m}^{δ (ϵ)}$ , $m \geq m (ϵ)$ , for some large but finite $m (ϵ)$ , so that (5) and (6) hold, which is possible by Lemma 8.□
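The construction in the proof of Lemma 3 can be traced in a small numerical sketch. The following Python code is a hedged illustration for $n = 1$ only: it assumes $g$ is the standard normal pdf (so that $C = g (0)$ ), takes $h$ to be $0.8$ times a triangular pdf on $K = [- 1, 1]$ , partitions $k K$ into equal cells with $z_{i}$ the cell midpoints, and measures the error on a finite grid rather than over all of $R$ . The names `build_mixture`, `mixture`, and `dilate_convolution` are hypothetical.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


G_SUP = normal_pdf(0.0)  # the bound C on g (assumption: g is standard normal)


def h(y):
    """h = 0.8 * triangular pdf: continuous, zero outside K = [-1, 1], 0 <= h <= f."""
    return 0.8 * max(0.0, 1.0 - abs(y))


def h_antiderivative(y):
    """Closed-form antiderivative of h on [-1, 1]."""
    return 0.8 * (y - 0.5 * y * abs(y))


def build_mixture(k, cells, eps):
    """Return (weights, dilations, shifts) for sum_i c_i k_i g(k_i x - z_i)."""
    weights, dilations, shifts = [], [], []
    for i in range(cells):
        a = -1.0 + 2.0 * i / cells
        b = -1.0 + 2.0 * (i + 1) / cells
        # c_i = k^{-1} * integral of h(z / k) over A_i = [k a, k b]
        weights.append(h_antiderivative(b) - h_antiderivative(a))
        dilations.append(float(k))
        shifts.append(k * 0.5 * (a + b))  # z_i: midpoint of A_i
    leftover = max(0.0, 1.0 - sum(weights))  # c_m
    weights.append(leftover)
    # k_m chosen so that c_m * k_m * C = eps / 2 (skipped when c_m = 0)
    dilations.append(eps / (2.0 * leftover * G_SUP) if leftover > 0 else float(k))
    shifts.append(shifts[-1])  # z_m taken in the last cell
    return weights, dilations, shifts


def mixture(x, weights, dilations, shifts):
    return sum(c * kk * normal_pdf(kk * x - z)
               for c, kk, z in zip(weights, dilations, shifts))


def dilate_convolution(k, x, steps=400):
    """Midpoint approximation of (g_k * h)(x) over K = [-1, 1]."""
    width = 2.0 / steps
    return width * sum(
        k * normal_pdf(k * (x - (-1.0 + (j + 0.5) * width)))
        * h(-1.0 + (j + 0.5) * width)
        for j in range(steps)
    )


k, eps = 4, 0.05
weights, dilations, shifts = build_mixture(k, cells=200, eps=eps)
grid = [-40.0 + 0.1 * j for j in range(801)]
sup_err = max(abs(dilate_convolution(k, x) - mixture(x, weights, dilations, shifts))
              for x in grid)
```

In this sketch the extra component with weight $c_{m}$ is deliberately flat, so its contribution stays near $ϵ / 2$ , while the remaining $m - 1$ components act as a Riemann sum for the convolution integral.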

For any $r \in N$ , let ${\overset{ˉ}{B}}_{r} = \{x \in R^{n} : {∥x∥}_{2} \leq r\}$ be a closed ball of radius $r$ , centered at the origin.

Lemma 4. If $f \in L_{1}$ , such that $f \geq 0$ , then

lim_{r \to \infty} {∥f - 1_{{\overset{ˉ}{B}}_{r}} f∥}_{L_{1}} = 0.

Proof. By construction, each element of the sequence $\{1_{{\overset{ˉ}{B}}_{r}} f\}$ ( $r \in N$ ) is measurable, $0 \leq 1_{{\overset{ˉ}{B}}_{r}} f \leq f$ , and

lim_{r \to \infty} 1_{{\overset{ˉ}{B}}_{r}} f = f,

point-wise. We obtain our conclusion via the Lebesgue dominated convergence theorem.□
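For a concrete feel of Lemma 4 in the case $n = 1$ , the truncation error ${∥f - 1_{{\overset{ˉ}{B}}_{r}} f∥}_{L_{1}}$ is simply the tail mass of $f$ outside $[- r, r]$ . The sketch below evaluates it in closed form for two example densities chosen here for illustration (standard normal and standard Cauchy); neither choice comes from the article.

```python
import math


def normal_tail_mass(r):
    """||f - 1_{B_r} f||_{L_1} for the standard normal pdf on R, i.e. P(|Z| > r)."""
    return math.erfc(r / math.sqrt(2.0))


def cauchy_tail_mass(r):
    """Same quantity for the standard Cauchy pdf f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 - 2.0 * math.atan(r) / math.pi


radii = [1, 2, 5, 10, 100]
normal_errors = [normal_tail_mass(r) for r in radii]
cauchy_errors = [cauchy_tail_mass(r) for r in radii]
```

Both sequences decrease to zero as $r$ grows, as the lemma guarantees, although the heavy-tailed Cauchy example does so far more slowly.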

2.2. Proof of Theorem 5(a)

We now proceed to prove each of the parts of Theorem 5. To prove Theorem 5(a) it suffices to show that for every $ϵ > 0$ , there exists a $h_{m}^{g} \in M_{m}^{g}$ , such that ${∥f - h_{m}^{g}∥}_{L_{\infty}} < ϵ .$

Start by applying Lemma 2 to obtain $h \in C_{c}$ , such that $0 \leq h \leq f$ and ${∥f - h∥}_{L_{\infty}} < ϵ / 2$ . Then, we have

{∥f - h_{m}^{g}∥}_{L_{\infty}} \leq {∥f - h∥}_{L_{\infty}} + {∥h - h_{m}^{g}∥}_{L_{\infty}}

< \frac{ϵ}{2} + {∥h - h_{m}^{g}∥}_{L_{\infty}} . \tag{7}

The goal is to find a $h_{m}^{g}$ , such that ${∥h - h_{m}^{g}∥}_{L_{\infty}} < ϵ / 2$ . Since $h \in C_{c}$ , we may find a compact $K \subset R^{n}$ such that ${∥h∥}_{L_{\infty} (K^{C})} = 0$ . Apply Lemma 1 to show the existence of a $k (ϵ)$ , such that

{∥h - g_{k} ⋆ h∥}_{L_{\infty}} < \frac{ϵ}{4},

for all $k \geq k (ϵ)$ . With a fixed $k = k (ϵ)$ , apply Lemma 3 to show that there exists a $h_{m}^{g} \in M_{m}^{g}$ , such that

{∥g_{k (ε)} ⋆ h - h_{m}^{g}∥}_{L_{\infty}} < \frac{ϵ}{4} .

By the triangle inequality, we have

{∥h - h_{m}^{g}∥}_{L_{\infty}} \leq {∥h - g_{k (ϵ)} ⋆ h∥}_{L_{\infty}} + {∥g_{k (ϵ)} ⋆ h - h_{m}^{g}∥}_{L_{\infty}}

< \frac{ϵ}{4} + \frac{ϵ}{4} = \frac{ϵ}{2} . \tag{8}

The proof is completed by substituting (8) into (7).

2.3. Proof of Theorem 5(b)

For any $ϵ > 0$ and compact $K \subset R^{n}$ , it suffices to show that there exists a sufficiently large $m (ϵ) \in N$ so that for all $m \geq m (ϵ)$ , there exists an $h_{m}^{g} \in M_{m}^{g}$ such that ${∥f - h_{m}^{g}∥}_{L_{\infty} (K)} < ϵ$ .

By Lemma 5, we can find a $k (ϵ, K) \in N$ , such that

{∥f - g_{k} ⋆ f∥}_{L_{\infty} (K)} < \frac{ϵ}{3}, \tag{9}

for every $k \geq k (ϵ, K)$ . Since $g \in C_{0}$ , ${∥g∥}_{L_{\infty}} \leq C < \infty$ for some positive $C$ , by Lemma 6. For any $k, r \in N$ , via Young’s convolution inequality:

{∥g_{k} ⋆ f - g_{k} ⋆ (1_{{\overset{ˉ}{B}}_{r}} f)∥}_{L_{\infty}} \leq k^{n} C \int (1_{{\overset{ˉ}{B}}_{r}^{C}} f) d λ = k^{n} C {∥f - 1_{{\overset{ˉ}{B}}_{r}} f∥}_{L_{1}} . \tag{10}

For fixed $k$ , we may choose $r (ϵ, K) \in N$ , using Lemma 4, so that ${∥f - 1_{{\overset{ˉ}{B}}_{r}} f∥}_{L_{1}} \leq ϵ / (3 k^{n} C)$ for all $r \geq r (ϵ, K)$ , and thus the final term of (10) is bounded from above by $ϵ / 3$ . Thus, for $k = k (ϵ, K)$ and $r \geq r (ϵ, K)$ ,

{∥g_{k (ϵ, K)} ⋆ f - g_{k (ϵ, K)} ⋆ (1_{{\overset{ˉ}{B}}_{r (ϵ, K)}} f)∥}_{L_{\infty}} \leq \frac{ϵ}{3} . \tag{11}

Using Lemma 3, with approximand $1_{{\overset{ˉ}{B}}_{r (ϵ, K)}} f$ , component density $g$ , compact set ${\overset{ˉ}{B}}_{r (ϵ, K)}$ , $h = 1_{{\overset{ˉ}{B}}_{r (ϵ, K)}} f$ , and with $k = k (ϵ, K)$ fixed, we have the existence of a density $h_{m}^{g} \in M_{m}^{g}$ , for $m \geq m (ϵ) \in N$ , such that

{∥g_{k (ϵ, K)} ⋆ (1_{{\overset{ˉ}{B}}_{r (ϵ, K)}} f) - h_{m}^{g}∥}_{L_{\infty}} \leq \frac{ϵ}{3} . \tag{12}

We obtain the desired result by combining (9), (11), and (12), via the triangle inequality.
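Written out, the combination of the three bounds reads as follows, where $k = k (ϵ, K)$ and $r = r (ϵ, K)$ are abbreviated, and where we use the fact that the $L_{\infty} (K)$ norm is bounded above by the $L_{\infty}$ norm. This is a worked restatement of the step above rather than an additional result.

```latex
\begin{aligned}
\| f - h_{m}^{g} \|_{L_{\infty}(K)}
  &\le \| f - g_{k} \star f \|_{L_{\infty}(K)}
     + \| g_{k} \star f - g_{k} \star (\mathbf{1}_{\bar{B}_{r}} f) \|_{L_{\infty}}
     + \| g_{k} \star (\mathbf{1}_{\bar{B}_{r}} f) - h_{m}^{g} \|_{L_{\infty}} \\
  &< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon .
\end{aligned}
```

The strict inequality is inherited from (9), since (11) and (12) are non-strict.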

2.4. Proof of Theorem 5(c)

The technique used to prove Theorem 5(c) is different from those used in the previous sections. Here, we use a result of Donahue et al. (1997) that generalizes the classic Barron-Jones Hilbert space approximation result (cf. Jones, 1992 and Barron, 1993) to Banach spaces.

To prove Theorem 5(c), it suffices to show that for every $ϵ > 0$ , there exists a sufficiently large $m (ϵ) \in N$ so that for all $m \geq m (ϵ)$ , there exists an $h_{m}^{g} \in M_{m}^{g}$ such that ${∥f - h_{m}^{g}∥}_{L_{p}} < ϵ$ . Begin by applying Corollary 1 to obtain a $k (ϵ)$ , such that

{∥f - g_{k} ⋆ f∥}_{L_{p}} < \frac{ϵ}{2} \tag{13}

for all $k \geq k (ϵ)$ .

For some pdf $g$ and fixed $k \in N$ , let us define the class

G_{g}^{k} = \{h : h (x) = k^{n} g (k x - k μ), μ \in R^{n}\},

write the $m$-point convex hull of $G_{g}^{k}$ as

\mathrm{Conv}_{m} (G_{g}^{k}) = \{h : h = \sum_{i = 1}^{m} c_{i} g_{i}, g_{i} \in G_{g}^{k}, c \in S^{m - 1}, i \in [m]\},

and call $\mathrm{Conv}_{\infty} (G_{g}^{k}) = \mathrm{Conv} (G_{g}^{k})$ the convex hull of $G_{g}^{k}$ . We further say that $\overline{\mathrm{Conv}} (G_{g}^{k})$ is the closure of $\mathrm{Conv} (G_{g}^{k})$ .

Because $g$ is a pdf, $g \in C_{0} \subset C_{b}$ , and $C_{b} \subset L_{\infty}$ , we observe that $g \in L_{1} \cap L_{\infty}$ . Thus, $g \in L_{p}$ , for any $1 < p < \infty$ , by Lemma 9. Since $g$ is a pdf and $f \in L_{p}$ , we have the existence of $g_{k} ⋆ f$ and the fact that ${∥g_{k} ⋆ f∥}_{L_{p}}$ is finite.

Furthermore, for any $ψ \in G_{g}^{k}$ , the change of variables $u = k x - k μ$ in the definition of $G_{g}^{k}$ gives ${∥ψ∥}_{L_{p}} = k^{n (1 - 1 / p)} {∥g∥}_{L_{p}}$ , which is finite since $g \in L_{p}$ . Thus, we have

{∥ψ - g_{k} ⋆ f∥}_{L_{p}} \leq {∥ψ∥}_{L_{p}} + {∥g_{k} ⋆ f∥}_{L_{p}} \leq K,

(14)

by choosing $K = k^{n (1 - 1 / p)} {∥g∥}_{L_{p}} + {∥g_{k} ⋆ f∥}_{L_{p}} > 0$ .

Following van de Geer (2003), we can write the closure of the convex hull of $G_{g}^{k}$ as

\overline{\mathrm{Conv}} (G_{g}^{k}) = \{h : h (x) = \int k^{n} g (k x - k μ) f (μ) d λ (μ), f \text{ is a pdf}\},

and thus we immediately have $g_{k} ⋆ f \in \overline{\mathrm{Conv}} (G_{g}^{k})$ . Combined with (14), we can apply Lemma 11 to obtain the conclusion that there exists a function $h_{m}^{g} \in \mathrm{Conv}_{m} (G_{g}^{k (ϵ)}) \subset M_{m}^{g}$ , such that

{∥h_{m}^{g} - g_{k (ϵ)} ⋆ f∥}_{L_{p}} \leq \frac{K C_{p}}{m^{1 - 1 / α}},

where $α = \min \{p, 2\}$ and $C_{p}$ is a finite constant. Since $p > 1$ , we have $α > 1$ , so $m^{1 - 1 / α}$ is strictly increasing and unbounded in $m$ , and hence we can choose an $m (ϵ) \in N$ , such that for all $m \geq m (ϵ)$ ,

{∥h_{m}^{g} - g_{k (ϵ)} ⋆ f∥}_{L_{p}} \leq \frac{ϵ}{2} .

(15)

The proof is then completed by combining (13) and (15) via the triangle inequality.
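As a rough illustration of how the rate from Lemma 11 translates into a number of components, the following Python sketch solves $K C_{p} / m^{1 - 1 / α} \leq ϵ / 2$ for $m$ . The function name `components_needed` and its arguments are hypothetical, the values of $K$ and $C_{p}$ are taken as inputs rather than computed, and the result is only as sharp as the bound itself.

```python
import math


def components_needed(eps: float, K: float, C_p: float, p: float) -> int:
    """Smallest m with K * C_p / m**(1 - 1/alpha) <= eps / 2, alpha = min(p, 2).

    K and C_p are assumed to be supplied by the caller (hypothetical inputs).
    """
    if p <= 1.0:
        raise ValueError("the rate is only informative for p > 1")
    alpha = min(p, 2.0)
    exponent = 1.0 - 1.0 / alpha  # strictly positive because alpha > 1
    # Closed-form solution of K * C_p / m**exponent = eps / 2, rounded up.
    m = max(1, math.ceil((2.0 * K * C_p / eps) ** (1.0 / exponent)))
    # Guard against floating-point rounding in the closed-form solution.
    while K * C_p / m ** exponent > eps / 2.0:
        m += 1
    return m


if __name__ == "__main__":
    for p in (1.25, 1.5, 2.0, 4.0):
        print(p, components_needed(eps=0.1, K=1.0, C_p=1.0, p=p))
```

For $p$ close to 1 the exponent $1 - 1 / α$ is close to zero, so the returned $m$ grows very quickly as $ϵ$ shrinks, while for $p \geq 2$ the exponent is $1 / 2$ .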

2.5. Proof of Theorem 5(d) and Theorem 5(e)

By Theorem 5(a), there exists a sequence $\{h_{m}^{g}\}$ that uniformly converges to $f$ , as $m \to \infty$ . Thus, by Lemma 12, $\{h_{m}^{g}\}$ almost uniformly converges to $f$ and also converges almost everywhere, to $f$ , with respect to any measure $ν$ . We prove Theorem 5(d) by setting $ν = λ,$ and we prove Theorem 5(e) by not specifying $ν$ .

2.6. Proof of Theorem 5(f)

It suffices to show that for any $ϵ > 0$ , there exists a sufficiently large $m (ϵ) \in N$ , such that for all $m \geq m (ϵ)$ , there is an $h_{m}^{g} \in M_{m}^{g}$ , where $g \in V$ , satisfying ${∥f - h_{m}^{g}∥}_{L_{1}} < ϵ$ . Begin by applying Lemma 4 in order to find an $r (ϵ) \in N$ , for any $ϵ > 0$ , such that for all $r \geq r (ϵ)$ ,

{∥f - 1_{{\overset{ˉ}{B}}_{r}} f∥}_{L_{1}} \leq \frac{ϵ}{24} < \frac{ϵ}{2},

(16)

where $0 \leq 1_{{\overset{ˉ}{B}}_{r}} f \leq f$ , and $1_{{\overset{ˉ}{B}}_{r}} f \in C_{c}$ with compact support ${\overset{ˉ}{B}}_{r}$ .

Let $K = {\overset{ˉ}{B}}_{r}$ and apply the triangle inequality to obtain

{∥f - h_{m}^{g}∥}_{L_{1}} \leq {∥f - 1_{K} f∥}_{L_{1}} + {∥1_{K} f - h_{m}^{g}∥}_{L_{1}}

\leq \frac{ϵ}{2} + {∥1_{K} f - h_{m}^{g}∥}_{L_{1}} .

Hence, we need to show that there exists a function $h_{m}^{g} \in M_{m}^{g}$ , such that

{∥1_{K} f - h_{m}^{g}∥}_{L_{1}} \leq \frac{ϵ}{2} .

Since $g \in V$ and $g_{k} (x) = k^{n} g (k x)$ , by substitution, we have

g_{k} (x) \leq \frac{β k^{- θ}}{{(k^{- 1} + {∥x∥}_{2})}^{n + θ}},

(17)

where $β, θ > 0$ are independent of $k$ . By Lemma 5 and Corollary 1, we can obtain a $k_{1} (ϵ)$ , such that for all $k \geq k_{1} (ϵ)$ ,

{∥1_{K} f - g_{k} ⋆ (1_{K} f)∥}_{L_{1}} \leq \frac{ϵ}{4} .

(18)
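The substitution behind (17) can be written out step by step. The sketch below assumes, consistently with (17), that membership in $V$ means $g (x) \leq β {(1 + {∥x∥}_{2})}^{- n - θ}$ for some $β, θ > 0$ ; the precise definition of $V$ is not restated in this section, so this form is an assumption.

```latex
\begin{align*}
g_{k}(x) = k^{n} g(kx)
  &\leq \frac{\beta k^{n}}{(1 + k \|x\|_{2})^{n + \theta}} \\
  &= \frac{\beta k^{n}}{k^{n + \theta} (k^{-1} + \|x\|_{2})^{n + \theta}} \\
  &= \frac{\beta k^{-\theta}}{(k^{-1} + \|x\|_{2})^{n + \theta}} .
\end{align*}
```

The second line only factors $k$ out of the bracket, which is why $β$ and $θ$ do not depend on $k$ .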

Suppose that $0 < γ < 1$ and let

K_{k} = \{x \in R^{n} : \mathrm{dist} (x, K) \leq k^{- γ}\},

where

\mathrm{dist} (x, X) = \inf \{{∥x - y∥}_{2} : y \in X\} .

By construction, $λ (K_{k}) = λ (K) + O (k^{- γ})$ and thus there exists a $k_{2}$ such that $λ (K_{k}) \leq λ (K) + 1$ , for any $k \geq k_{2}$ .

For any $k > k_{2}$ , we can show that

{∥g_{k} ⋆ (1_{K} f) - h_{m - 1}^{g}∥}_{L_{1} (K_{k})} < \frac{ϵ}{8} .

(19)

To do so, firstly, for any $x \in R^{n}$ ,

g_{k} ⋆ (1_{K} f) = \int 1_{K} g_{k} (x - y) f (y) d λ (y)

= \int 1_{k K} g (k x - z) f (\frac{z}{k}) d λ (z) .

To obtain a Riemann sum approximation of $g_{k} ⋆ (1_{K} f)$ , we use an argument analogous to that of Lemma 3. That is, we partition $k K$ into $m - 1$ disjoint Borel sets $Π_{m} = \{A_{1}, \dots, A_{m - 1}\}$ , and we approximate $g_{k} ⋆ (1_{K} f)$ by an $h_{m - 1}^{g} \in M_{m - 1}^{g}$ , where for each $i \in [m - 1]$ , $k_{i} = k$ , $z_{i} \in A_{i}$ , and

c_{i} = k^{- n} \int 1_{A_{i}} f (\frac{z}{k}) d λ (z) .

Define $k_{m} \in R_{+}$ , $z_{m} \in R^{n}$ , and $c_{m} = 1 - \sum_{i = 1}^{m - 1} c_{i}$ , where

c_{m} = \int f d λ - \int 1_{K} f d λ = {∥f - 1_{K} f∥}_{L_{1}} \leq \frac{ϵ}{24}

(20)

by (16). Then, by a similar argument to Lemma 3, $c_{i} \geq 0$ for all $i \in [m]$ and $\sum_{i = 1}^{m} c_{i} = 1$ . Thus, we may define an element $h_{m}^{g} \in M_{m}^{g}$ via the parameters above.
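The construction above can be sketched numerically in one dimension. The following Python example is an illustration under stated assumptions, not the argument of the proof: the approximand $f$ is taken to be the standard normal pdf, the component $g$ the standard Cauchy pdf, $K = [- r, r]$ , the cells $A_{i}$ are equal-width intervals with $z_{i}$ at their midpoints, and all function names are hypothetical.

```python
import math

import numpy as np


def g(x):
    """Assumed component density: the standard Cauchy pdf."""
    return 1.0 / (np.pi * (1.0 + x ** 2))


def f(y):
    """Assumed approximand: the standard normal pdf."""
    return np.exp(-0.5 * y ** 2) / math.sqrt(2.0 * math.pi)


_erf = np.vectorize(math.erf)


def normal_cdf(t):
    """Standard normal cdf, evaluated elementwise."""
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))


def mixture_parameters(k, r, n_cells):
    """Weights c_i, centres z_i and leftover weight c_m for K = [-r, r]."""
    edges = np.linspace(-k * r, k * r, n_cells + 1)  # cells A_i partition kK
    z = 0.5 * (edges[:-1] + edges[1:])               # one point z_i per cell
    # c_i = k^{-1} * int_{A_i} f(z / k) dz, which is a normal probability here.
    cdf = normal_cdf(edges / k)
    c = cdf[1:] - cdf[:-1]
    c_last = 1.0 - c.sum()                           # mass of f outside K
    return c, z, c_last


def h_partial(x, k, c, z):
    """h_{m-1}(x) = sum_i c_i * k * g(k x - z_i) for a 1-D array x."""
    return (c[None, :] * k * g(k * x[:, None] - z[None, :])).sum(axis=1)


def smoothed_target(x, k, r, n_quad=4001):
    """g_k * (1_K f)(x), computed by trapezoidal quadrature over y in K."""
    y = np.linspace(-r, r, n_quad)
    vals = k * g(k * (x[:, None] - y[None, :])) * f(y)[None, :]
    dy = y[1] - y[0]
    return (0.5 * (vals[:, 1:] + vals[:, :-1]) * dy).sum(axis=1)


def sup_error(k, r, n_cells):
    """Maximum gap between g_k * (1_K f) and h_{m-1} on a grid around K."""
    x = np.linspace(-r - 1.0, r + 1.0, 201)
    c, z, _ = mixture_parameters(k, r, n_cells)
    return float(np.max(np.abs(smoothed_target(x, k, r) - h_partial(x, k, c, z))))


if __name__ == "__main__":
    for n_cells in (10, 50, 400):
        print(n_cells, sup_error(k=4.0, r=3.0, n_cells=n_cells))
```

The weights are nonnegative and sum to one once the leftover weight is included, and refining the partition with $k$ held fixed shrinks the gap on the grid, which is the behaviour the Riemann sum argument relies on.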

For sufficiently large $k \geq k_{2}$ , we use Lemma 3 to show that

{∥g_{k} ⋆ (1_{K} f) - h_{m - 1}^{g}∥}_{L_{\infty} (K_{k})} < \frac{ϵ}{8 (λ (K) + 1)},

which implies

{∥g_{k} ⋆ (1_{K} f) - h_{m - 1}^{g}∥}_{L_{1} (K_{k})} < \int 1_{K_{k}} \frac{ϵ}{8 (λ (K) + 1)} d λ

= \frac{ϵ λ (K_{k})}{8 (λ (K) + 1)} \leq \frac{ϵ}{8},

(21)

and thus (19) is proved. Using (19), we write

{∥g_{k} ⋆ (1_{K} f) - h_{m}^{g}∥}_{L_{1}}

= {∥g_{k} ⋆ (1_{K} f) - h_{m - 1}^{g} - c_{m} k_{m}^{n} g (k_{m} x - z_{m})∥}_{L_{1}}

\leq {∥g_{k} ⋆ (1_{K} f) - h_{m - 1}^{g}∥}_{L_{1} (K_{k})}

+ {∥g_{k} ⋆ (1_{K} f) - h_{m - 1}^{g}∥}_{L_{1} (K_{k}^{c})}

+ {∥c_{m} k_{m}^{n} g (k_{m} x - z_{m})∥}_{L_{1}}

\leq \frac{ϵ}{8} + c_{m} + {∥g_{k} ⋆ (1_{K} f)∥}_{L_{1} (K_{k}^{c})} + {∥h_{m - 1}^{g}∥}_{L_{1} (K_{k}^{c})},

where ${∥c_{m} k_{m}^{n} g (k_{m} x - z_{m})∥}_{L_{1}} \leq c_{m}$ since $k_{m}^{n} g (k_{m} x - z_{m})$ is a pdf. The aim is now to prove that

{∥g_{k} ⋆ (1_{K} f)∥}_{L_{1} (K_{k}^{c})} < \frac{ϵ}{24} \text{ and } {∥h_{m - 1}^{g}∥}_{L_{1} (K_{k}^{c})} < \frac{ϵ}{24} .

Using polar coordinates and (17), we have

\int 1_{\{x : {∥x - y∥}_{2} > k^{- γ}\}} g_{k} (x - y) d λ (x)

\leq \int \frac{1_{\{x : {∥x - y∥}_{2} > k^{- γ}\}} β k^{- θ}}{{(k^{- 1} + {∥x - y∥}_{2})}^{n + θ}} d λ (x)

= β A_{n} k^{- θ} \int \frac{1_{(k^{- γ}, \infty)} r^{n - 1}}{{(k^{- 1} + r)}^{n + θ}} d λ (r)

\leq β A_{n} k^{- θ} \int 1_{(k^{- γ}, \infty)} r^{- θ - 1} d λ (r)

= β A_{n} k^{θ (γ - 1)} / θ,

where $A_{n}$ is the surface area of a unit sphere embedded in $R^{n}$ . We then have
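The tail bound just derived can be checked in a case where everything is available in closed form. The Python sketch below assumes $n = 1$ (so that $A_{1} = 2$ ), takes $g$ to be the standard Cauchy pdf, which satisfies $g (x) \leq β {(1 + |x|)}^{- 1 - θ}$ with $β = 2 / π$ and $θ = 1$ , and uses the illustrative value $γ = 1 / 2$ ; the function names are hypothetical.

```python
import math

BETA = 2.0 / math.pi  # assumed: Cauchy pdf <= BETA / (1 + |x|)**(1 + THETA)
THETA = 1.0
A_1 = 2.0             # "surface area" of the unit sphere {-1, +1} in R^1


def tail_mass(k: float, gamma: float) -> float:
    """Exact mass of g_k outside the ball of radius k**(-gamma), g Cauchy."""
    # After the substitution u = k x the radius becomes k**(1 - gamma).
    return 1.0 - (2.0 / math.pi) * math.atan(k ** (1.0 - gamma))


def tail_bound(k: float, gamma: float) -> float:
    """The bound beta * A_n * k**(theta * (gamma - 1)) / theta for n = 1."""
    return BETA * A_1 * k ** (THETA * (gamma - 1.0)) / THETA


if __name__ == "__main__":
    for k in (1.0, 10.0, 100.0, 1000.0):
        print(k, tail_mass(k, 0.5), tail_bound(k, 0.5))
```

In this example the exact tail mass stays below the bound for every $k$ tried, and the bound shrinks with $k$ only when the exponent $θ (γ - 1)$ is negative.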

{∥g_{k} ⋆ (1_{K} f)∥}_{L_{1} (K_{k}^{c})}

= \int \int 1_{\{y \in K\}} 1_{\{x \in K_{k}^{c}\}} f (y) g_{k} (x - y) d λ (x) d λ (y)

\leq {∥1_{K} f∥}_{L_{\infty}} \int \int 1_{\{y \in K\}} \frac{1_{\{x : {∥x - y∥}_{2} > k^{- γ}\}} β k^{- θ}}{{(k^{- 1} + {∥x - y∥}_{2})}^{n + θ}} d λ (x) d λ (y)

\leq {∥1_{K} f∥}_{L_{\infty}} λ (K) β A_{n} k^{θ (γ - 1)} / θ,

which implies that we can choose a $k_{3} \in N$ , such that for all $k \geq k_{3}$ ,

{∥g_{k} ⋆ (1_{K} f)∥}_{L_{1} (K_{k}^{c})} < \frac{ϵ}{24} .

(22)

Lastly, we write

{∥h_{m - 1}^{g}∥}_{L_{1} (K_{k}^{c})}

= \int 1_{K_{k}^{c}} \sum_{i = 1}^{m - 1} c_{i} k^{n} g (k x - z_{i}) d λ

= \sum_{i = 1}^{m - 1} \int 1_{K_{k}^{c}} [k^{- n} \int 1_{A_{i}} f (\frac{z}{k}) d λ (z)] k^{n} g (k x - z_{i}) d λ (x)

\leq {∥1_{K} f∥}_{L_{\infty}} \sum_{i = 1}^{m - 1} k^{- n} λ (A_{i}) \int 1_{K_{k}^{c}} g_{k} (x - \frac{z_{i}}{k}) d λ

\leq {∥1_{K} f∥}_{L_{\infty}} \sum_{i = 1}^{m - 1} k^{- n} λ (A_{i}) \frac{β A_{n} k^{θ (γ - 1)}}{θ}

\leq {∥1_{K} f∥}_{L_{\infty}} λ (K) β A_{n} k^{θ (γ - 1)} / θ,

which implies that we can choose the same $k_{3}$ as above to obtain the bound

{∥h_{m - 1}^{g}∥}_{L_{1} (K_{k}^{c})} < \frac{ϵ}{24} ,

(23)

for any $k \geq k_{3}$ .

Thus, we obtain the bound ${∥1_{K} f - h_{m}^{g}∥}_{L_{1}} < ϵ / 2$ , for all $k \geq \max \{k_{1}, k_{2}, k_{3}\}$ , by combining (18), (19), (20), (21), (22), and (23), via the triangle inequality. The result is proved by combining the bound above with (16), for an appropriately large $r (ϵ) \in N$ .
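The way the individual bounds add up to $ϵ / 2$ can be checked with exact arithmetic. The short Python sketch below records each contribution as a fraction of $ϵ$ ; the dictionary labels are only descriptive and the variable names are hypothetical.

```python
from fractions import Fraction

# Each entry is the share of epsilon contributed by the cited display.
budget = {
    "(18) smoothing error of g_k on 1_K f": Fraction(1, 4),
    "(19) Riemann sum error on K_k": Fraction(1, 8),
    "(20) leftover weight c_m": Fraction(1, 24),
    "(22) mass of g_k * (1_K f) outside K_k": Fraction(1, 24),
    "(23) mass of h_{m-1} outside K_k": Fraction(1, 24),
}

total = sum(budget.values(), Fraction(0))

if __name__ == "__main__":
    for label, share in budget.items():
        print(f"{label}: {share} * eps")
    print("total:", total, "* eps")
```

The three terms of size $ϵ / 24$ together match the single $ϵ / 8$ term, which explains the otherwise unusual constant 24 chosen in (16).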

3. Comments and discussion

3.1. Relationship to Theorem 1

In the proof of Theorem 1, the famous Hilbert space approximation result of Jones (1992) and Barron (1993) was used to bound the $L_{2}$ norm between any approximand $f \in L_{2}$ and a convex combination of bounded functions in $L_{2}$ . This approximation theorem is exactly the $p = 2$ case of the more general theorem of Donahue et al. (1997), as presented in Lemma 11. Thus, one can view Theorem 5(c) as the $p \in (1, \infty)$ generalization of Theorem 1.

3.2. The class $W$ is a proper subset of the class $C_{0}$

Here, we comment on the nature of class $W$ , which was investigated by Bacharoglou (2010) and Nestoridis et al. (2011). We recall that Bacharoglou (2010) conjectured that Theorem 4 generalizes from $g = ϕ$ to $g \in V$ . In Theorem 5(a)–(e), we assume that $g \in C_{0}$ . We can demonstrate that $g \in C_{0}$ is a strictly weaker condition than $g \in V$ or $g \in W$ .

For example, consider the function $g : R \to R$ such that $g (x) = 0$ if $x < 0$ and

g (x) = \sum_{i = 1}^{\infty} \frac{2^{2 i}}{i} [{(x - i + 1)}^{2 i} 1_{\{i - 1 \leq x < i - 1 / 2\}} + {(x - i)}^{2 i} 1_{\{i - 1 / 2 \leq x < i\}}] \text{ if } x \geq 0,

and note that

\int 1_{(- 1 / 2, 1 / 2)} \frac{{(2 x)}^{2 i}}{i} d λ = \frac{1}{2 i^{2} + i} < \frac{1}{i^{2}} .

Since $\sum_{i = 1}^{\infty} (1 / i^{2}) = π^{2} / 6$ , $g \in L_{1}$ . Furthermore, $g$ is continuous, since its polynomial pieces agree at each of the break points $x = i - 1 / 2$ and $x = i$ . In $R$ , $g \in C_{0}$ if

\lim_{x \to \pm \infty} g (x) = 0.

For $x \leq 0$ , we observe that $g = 0$ and thus the left limit is satisfied. On the right, for any $1 / ϵ > 0$ , we can choose $x (ϵ) \geq ⌈ϵ⌉ - 1 / 2$ , so that $g (x) < 1 / ϵ$ , for all $x > x (ϵ)$ , where $⌈\cdot⌉$ is the ceiling operator. Therefore, $g \in C_{0}$ .

Within each interval $i - 1 \leq x < i$ , we observe that $g$ is locally maximized at $x = i - 1 / 2$ . The local maximum corresponding to each of these points is $1 / i$ . Thus $g \notin W$ , since

\sum_{i = 1}^{\infty} \frac{1}{i} \leq \sum_{y \in Z} \sup_{x \in [0, 1]} |g (x + y)|,

where $\sum_{i = 1}^{\infty} (1 / i) = \infty$ . Furthermore, $g \notin V$ since $V \subset W$ .
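The properties of this example are easy to probe numerically. The Python sketch below implements $g$ directly from its definition and checks the peak heights $1 / i$ and the bump integrals $1 / (2 i^{2} + i)$ ; the helper names are hypothetical, and the factor $2^{2 i}$ is folded into the base of the power to avoid overflow for large $i$ .

```python
import math


def g(x: float) -> float:
    """The example function: a bump of height 1/i on each interval [i - 1, i)."""
    if x < 0:
        return 0.0
    i = math.floor(x) + 1      # index of the interval [i - 1, i) containing x
    if x < i - 0.5:
        base = x - i + 1       # rising branch on [i - 1, i - 1/2)
    else:
        base = x - i           # falling branch on [i - 1/2, i)
    # (2**(2i) / i) * base**(2i), written as (2 * base)**(2i) / i.
    return (2.0 * base) ** (2 * i) / i


def bump_integral(i: int) -> float:
    """Closed-form integral of the i-th bump, 1 / (2 i^2 + i)."""
    return 1.0 / (2 * i * i + i)


def numeric_bump_integral(i: int, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of g over [i - 1, i)."""
    return sum(g(i - 1 + (j + 0.5) / n) for j in range(n)) / n


if __name__ == "__main__":
    print([g(i - 0.5) for i in range(1, 6)])            # peak heights 1/i
    print(numeric_bump_integral(3), bump_integral(3))   # both near 1/21
    print(sum(bump_integral(i) for i in range(1, 10001)))
    print(sum(1.0 / i for i in range(1, 10001)))        # harmonic partial sum
```

The partial sums of the bump integrals settle quickly, whereas the partial sums of the peak heights keep growing like a logarithm, which is the contrast between $g \in L_{1}$ and $g \notin W$ drawn above.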

3.3. Convergence in measure

Along with the conclusions of Theorem 5(d) and (e), Lemma 12 also implies convergence in measure. That is, if $ν$ is a $σ$-finite Borel measure on $R^{n}$ , then for any $ν$-measurable $f$ , there exists a sequence $\{h_{m}^{g}\}$ , such that for any $ϵ > 0$ ,

\lim_{m \to \infty} ν (\{x \in R^{n} : |f (x) - h_{m}^{g} (x)| \geq ϵ\}) = 0.

A Technical results

Throughout the main text, we utilize a number of established technical results. For the convenience of the reader, we append these results within this Appendix. Sources from which we draw the unproved results are provided at the end of the section.

Lemma 5. Let $\{g_{k}\}$ be a sequence of pdfs in $L_{1}$ such that, for every $δ > 0$ ,

\lim_{k \to \infty} \int 1_{\{x : {∥x∥}_{2} > δ\}} g_{k} d λ = 0.

Then, for all $f \in L_{p}$ and $1 \leq p < \infty$ ,

\lim_{k \to \infty} {∥g_{k} ⋆ f - f∥}_{L_{p}} = 0.

Furthermore, for all $f \in C_{b}$ and any compact $K \subset R^{n}$ ,

\lim_{k \to \infty} {∥g_{k} ⋆ f - f∥}_{L_{\infty} (K)} = 0.

The sequences $\{g_{k}\}$ from Lemma 5 are often called approximate identities or approximations of the identity. A simple construction of approximate identities is by taking dilations $g_{k} (x) = k^{n} g (k x)$ , which yields the following corollary.

Corollary 1. Let $g$ be a pdf. Then the sequence of dilations $\{g_{k} : g_{k} (x) = k^{n} g (k x)\}$ , satisfies the hypothesis of Lemma 5 and hence permits its conclusion.
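A case in which the conclusion of Lemma 5 can be observed directly is the Gaussian one, since the convolution is then available in closed form. The Python sketch below assumes $n = 1$ and $f = g = ϕ$ , the standard normal pdf, so that $g_{k} ⋆ f$ is the normal pdf with variance $1 + k^{- 2}$ ; the $L_{1}$ distance is approximated by a Riemann sum on a truncated grid, and the function names are hypothetical.

```python
import numpy as np


def normal_pdf(x, sd):
    """Normal pdf with mean zero and standard deviation sd."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def l1_smoothing_error(k, half_width=12.0, n_grid=24001):
    """Approximate L1 distance between f and g_k * f for f = g = N(0, 1)."""
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]
    f = normal_pdf(x, 1.0)
    # The dilation g_k is N(0, 1/k^2), so g_k * f is N(0, 1 + 1/k^2).
    smoothed = normal_pdf(x, np.sqrt(1.0 + 1.0 / k ** 2))
    return float(np.sum(np.abs(smoothed - f)) * dx)


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(k, l1_smoothing_error(k))
```

The printed distances decrease as $k$ grows, in line with the $L_{p}$ statement of Lemma 5 for $p = 1$ ; other choices of $f$ and $g$ would require a numerical convolution instead of the closed form used here.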

Lemma 6. The class $C_{0}$ is a subset of $C_{b}$ . Furthermore, if $f \in C_{0}$ , then $f$ is uniformly continuous.

Lemma 7 (Urysohn’s Lemma). If $K \subset R^{n}$ is compact, then there exists some $g \in C_{c}$ , such that $0 \leq g \leq 1$ and $1_{K} g = 1_{K}$ .

Lemma 8. If $X \subset R^{n}$ is bounded, then for any $r > 0$ , $X$ can be covered by $⋃_{i = 1}^{m} B (x_{i}, r)$ for some finite $m \in N$ , where $x_{i} \in R^{n}$ and $i \in [m]$ .

Lemma 9. If $0 < p < q < r \leq \infty$ , then $L_{p} \cap L_{r} \subset L_{q}$ .

Let $Γ : R \to R$ be the usual gamma function, defined as $Γ (z) = \int 1_{(0, \infty)} x^{z - 1} exp (- x) d λ$ .

Lemma 10. If $f \in L_{p}$ and $g \in L_{1}$ , for $1 \leq p \leq \infty$ , then $f ⋆ g$ exists and we have ${∥f ⋆ g∥}_{L_{p}} \leq {∥g∥}_{L_{1}} {∥f∥}_{L_{p}}$ .

Lemma 11. Let $G \subset L_{p}$ , for some $1 \leq p < \infty$ , and let $f \in \overline{\mathrm{Conv}} (G)$ . For any $K > 0$ , such that ${∥f - ψ∥}_{L_{p}} < K$ , for all $ψ \in G$ , there exists an $h_{m} \in \mathrm{Conv}_{m} (G)$ , such that

{∥f - h_{m}∥}_{L_{p}} \leq \frac{C_{p} K}{m^{1 - 1 / α}},

where $α = \min \{p, 2\}$ , and

C_{p} = \begin{cases} 1 & \text{if } 1 \leq p \leq 2, \\ \sqrt{2} {[\sqrt{π} Γ (\frac{p + 1}{2})]}^{1 / p} & \text{if } p > 2. \end{cases}

Lemma 12. For any measure $ν$ , uniform convergence implies almost uniform convergence, and almost uniform convergence implies almost everywhere convergence and convergence in measure, with respect to $ν$ .

B Sources of results

Lemma 5 is reported as Theorem 9.3.3 in Makarov & Podkorytov (2013) (see also Theorem 2 of Rudin, 1976, Ch. 20). The proof of Corollary 1 can be taken from that of Theorem 4 of Cheney & Light (2000, Ch. 20). Lemma 6 appears in Conway (2012), as Proposition 1.4.5. Lemma 7 is taken from Corollary 1.2.9 of Conway (2012). Lemma 8 appears as Theorem 1.2.2 in Conway (2012). Lemma 9 can be found in Folland (1999), Prop. 6.10. Lemma 10 can be found in Makarov & Podkorytov (2013), Thm. 9.3.1. Lemma 11 appears as Corollary 2.6 in Donahue et al. (1997). Lemma 12 can be obtained from the definition of almost uniform convergence, Lemma 7.10, and Theorem 7.11 of Bartle (1995).

Acknowledgements

HDN is personally funded by Australian Research Council (ARC) grant DE170101134. HDN and GJM are supported by ARC grant DP180101192. FC is supported by Agence Nationale de la Recherche (ANR) grant SMILES ANR-18-CE40-0014 and by Région Normandie grant RIN.

Additional information

Notes on contributors

Geoffrey J. McLachlan

Mr. Nguyen, Dr. Nguyen, and Profs. Chamroukhi and McLachlan are each keenly interested in the study of finite mixture models and their modifications and extensions for various data analytic and machine learning applications. Their research spans the fields of computational statistics, mathematical statistics, machine learning and AI. Common threads of their research consist of the derivation of algorithms for the efficient estimation of mixture models and extensions, such as mixtures of experts, and mixtures of factor analysers; the derivation of theoretical results regarding the statistical and mathematical properties of such constructions and their estimators; and the application of mixture models to data analytic problems spanning the fields of biology, medical science, engineering, signal processing, among many others.

References

Bacharoglou, A. G. (2010). Approximation of probability distributions by convex mixtures of Gaussian measures. Proceedings of the American Mathematical Society, 138, 2619–18.
Barron, A. R. (1993). Universal approximation bound for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3), 930–945. doi: 10.1109/18.256500
Bartle, R. G. (1995). The Elements of Integration and Lebesgue Measure. Wiley.
Cheney, W. & Light, W. (2000). A Course in Approximation Theory. Brooks/Cole.
Conway, J. B. (2012). A Course in Abstract Analysis. American Mathematical Society.
DasGupta, A. (2008). Asymptotic Theory Of Statistics And Probability. Springer.
De Branges, L. (1959). The Stone-Weierstrass theorem. Proceedings of the American Mathematical Society, 10, 822–824.
Devroye, L. & Lugosi, G. (2000). Combinatorial Methods in Density Estimation. Springer.
Donahue, M. J., Gurvits, L., Darken, C. & Sontag, E. (1997). Rates of convex approximation in non-Hilbert spaces. Constructive Approximation, 13(2), 187–220. https://doi.org/10.1007/BF02678464
Everitt, B. S. & Hand, D. J. (1981). Finite Mixture Distributions. Chapman and Hall.
Feichtinger, H. G. (1977). A characterization of wiener’s algebra on locally compact groups. Archiv der Mathematik, 29(1), 136–140. https://doi.org/10.1007/BF01220386
Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications. Wiley.
Forbes, F. & Wraith, D. (2013). A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweights. Statistics and computing, 24(6), 971–984. https://doi.org/10.1007/s11222-013-9414-4
Fruwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer.
Fruwirth-Schnatter, S., Celeux, G., & Robert, C. P. (editors). (2019). Handbook of Mixture Analysis. CRC Press.
Jones, L. K. (1992). A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. Annals of statistics, 20(1), 608–613. https://doi.org/10.1214/aos/1176348546
Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22(1), 79–86. https://doi.org/10.1214/aoms/1177729694
Lee, S. X. & McLachlan, G. J. (2016). Finite mixtures of canonical fundamental skew t-distributions: the unification of the restricted and unrestricted skew t-mixture models. Statistics and computing, 26(3), 573–589. https://doi.org/10.1007/s11222-015-9545-x
Li, J. Q. & Barron, A. R. (1999). Mixture density estimation. In S. A. Solla, T. K. Leen, & K. R. Mueller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 279–285). MIT Press.
Lindsay, B. G. (1995). Mixture models: theory, geometry and applications. In NSF-CBMS Regional Conference Series in Probability and Statistics. Hayward.
Makarov, B. & Podkorytov, A. (2013). Real Analysis: Measures, Integrals and Applications. Springer.
McLachlan, G. J. & Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering. Marcel Dekker.
McLachlan, G. J. & Peel, D. (2000). Finite Mixture Models. Wiley.
Mengersen, K. L., Robert, C. & Titterington, M. (2011). Mixtures: Estimation and Applications. Wiley.
Nestoridis, V. & Papadimitropoulos, C. (2005). Abstract theory of universal series and an application to Dirichlet series. Comptes Rendus Academy of Science Paris Series I, 341(9), 530–543. doi: 10.1016/j.crma.2005.09.028
Nestoridis, V., Schmutzhard, S. & Stefanopoulos, V. (2011). Universal series induced by approximate identities and some relevant applications. Journal of Approximation Theory, 163(12), 1783–1797. https://doi.org/10.1016/j.jat.2011.06.001
Nestoridis, V. & Stefanopoulos, V. (2007). Universal series and approximate identities. Technical report.
Nguyen, H. D. & McLachlan, G. J. (2019). On approximations via convolution-defined mixture models. Communications in Statistics - Theory and Methods, 48(16), 3945–3955. In press. https://doi.org/10.1080/03610926.2018.1487069
Peel, D. & McLachlan, G. J. (2000). Robust mixture modelling using the t distribution. Statistics and computing, 10(4), 339–348. https://doi.org/10.1023/A:1008981510081
Rakhlin, A., Panchenko, D. & Mukherjee, S. (2005). Risk bounds for mixture density estimation. ESAIM: Probability and Statistics, 9, 220–229. https://doi.org/10.1051/ps:2005011
Rudin, W. (1976). Principles of Mathematical Analysis. McGraw-Hill.
Sandberg, I. W. (2001). Gaussian radial basis functions and inner product space. Circuits, Systems and Signal Processing, 20(6), 635–642. https://doi.org/10.1007/BF01270933
Schlattmann, P. (2009). Medical Applications of Finite Mixture Models. Springer.
Stone, M. H. (1948). The generalized Weierstrass approximation theorem. Mathematical Magazine, 21, 237–254.
Titterington, D. M., Smith, A. F. M. & Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. Wiley.
van de Geer, S. (2003). Asymptotic theory for maximum likelihood in nonparametric mixture models. Computational Statistics and Data Analysis, 41(3–4), 453–464. https://doi.org/10.1016/S0167-9473(02)00188-3
Walker, J. L., & Ben-Akiva, M. (2011). Advances in discrete choice: mixture models. In A. De Palma, R. Lindsey, E. Quinet, & R. Vickerman (Eds.), A Handbook of transport economics (pp. 160–187). Edward Edgar.
Xu, Y., Light, W. A., & Cheney, E. W. (1993). Constructive methods of approximation by ridge functions and radial functions. Numerical Algorithms, 4(2), 205–223. https://doi.org/10.1007/BF02144104
Yona, G. (2011). Introduction to Computational Proteomics. CRC Press.
Zeevi, A. J., & Meir, R. (1997). Density estimation through convex combinations of densities: approximation and estimation bounds. Neural computation, 10(1), 99–109. doi: 10.1016/S0893-6080(96)00037-8
