If the Epanechnikov kernel is theoretically optimal for kernel density estimation, why isn't it more commonly used?

I have read (for example, here) that the Epanechnikov kernel is optimal, at least in a theoretical sense, for kernel density estimation. If this is true, then why does the Gaussian show up so frequently as the default kernel, or in many cases the only kernel, in density estimation libraries?

nonparametric kernel-smoothing

— John Rauser

Two questions are mixed here: Why isn't it used more often? Why is the Gaussian often the default/only kernel? This may sound trivial, but the name Epanechnikov can seem hard to spell and pronounce correctly for people who aren't fluent in that language. (I'm not even sure E. was Russian; I haven't found any biographical details.) Also, if I show (e.g.) the biweight, comments on its bell shape, finite width and behaviour at the edges seem an easier sell. Epanechnikov is the default in Stata's kdensity.

— Nick Cox

I would add that this theoretical optimality matters little in practice, if at all.

— Xi'an

It's a familiar name. If it makes sense to use a kernel that doesn't have finite support, then you should prefer it. In my experience it doesn't make sense, so the choice seems to be social rather than technical.

— Nick Cox

@NickCox, yes, E. was Russian, it's not an abbreviation :) He was an enigmatic person; that's all one can find about him. I also remember a very useful book on programmable calculators written by someone with his name; yes, that was a big deal back then.

— Aksakal

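@amoeba He worked at the Kotelnikov Institute of Radio Engineering and Electronics of the Russian Academy of Sciences (Институт радиотехники и электроники Российской Академии Наук им. Котельникова); I bet he did classified research. His full name is Епанечников Виктор Александрович (Viktor Aleksandrovich Epanechnikov).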

— Aksakal

Answers:

The reason the Epanechnikov kernel isn't universally used for its theoretical optimality may very well be that the Epanechnikov kernel isn't actually theoretically optimal. Tsybakov explicitly criticizes the argument that the Epanechnikov kernel is "theoretically optimal" on pp. 16–19 of Introduction to Nonparametric Estimation (section 1.2.4).

To try to summarize: under some assumptions on the kernel $K$ and a fixed density $p$, the mean integrated squared error (MISE) is asymptotically of the form (with $S_K = \int u^2 K(u)\,du$)

$$\frac{1}{nh} \int K^2(u)\,du + \frac{h^4}{4} S_K^2 \int (p''(x))^2\,dx \,. \tag{1}$$
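To see the bias-variance trade-off in (1), here is a minimal sketch that evaluates its two terms separately. It assumes a Gaussian kernel ($\int K^2 = 1/(2\sqrt{\pi})$, $S_K = 1$) and, purely for illustration, a standard normal $p$, for which $\int (p'')^2 = 3/(8\sqrt{\pi})$; the function name `amise_terms` is hypothetical.

```python
import math

# Gaussian kernel constants: \int K^2 and S_K = \int u^2 K(u) du
R_GAUSS = 1.0 / (2.0 * math.sqrt(math.pi))
S_GAUSS = 1.0

# Illustrative assumption: p is the standard normal density,
# for which \int (p'')^2 = 3 / (8 sqrt(pi)).
R_P2_NORMAL = 3.0 / (8.0 * math.sqrt(math.pi))

def amise_terms(h, n, r_k, s_k, r_p2):
    """Hypothetical helper: (variance term, squared-bias term) of formula (1)."""
    variance = r_k / (n * h)
    bias_sq = h**4 / 4.0 * s_k**2 * r_p2
    return variance, bias_sq

for h in (0.1, 0.3, 0.5, 1.0):
    v, b = amise_terms(h, n=1000, r_k=R_GAUSS, s_k=S_GAUSS, r_p2=R_P2_NORMAL)
    print(f"h={h:.1f}  variance={v:.5f}  bias^2={b:.5f}  total={v + b:.5f}")
```

Increasing $h$ shrinks the variance term and inflates the bias term, which is why minimizing over $h$ gives a finite "optimal" bandwidth.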

Tsybakov's main criticism seems to be aimed at minimizing over non-negative kernels only, since it is often possible to obtain better-performing estimators, which are even non-negative, without restricting oneself to non-negative kernels.

The first step of the argument for the Epanechnikov kernel is to minimize $(1)$ over $h$ and over all non-negative kernels (rather than over all kernels of a wider class), obtaining an "optimal" bandwidth for $K$

$$h^{MISE}(K) = \left( \frac{\int K^2}{n S_K^2 \int (p'')^2} \right)^{1/5}$$
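As a sanity check on this closed form, the sketch below minimizes (1) over a fine grid of $h$ and compares the result with $h^{MISE}(K)$. Setting the derivative of (1) in $h$ to zero gives $h^5 = \int K^2 / (n S_K^2 \int (p'')^2)$. The Gaussian kernel and standard normal $p$ are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def amise(h, n, r_k, s_k, r_p2):
    # Formula (1): variance term + squared-bias term (works on arrays of h)
    return r_k / (n * h) + h**4 / 4.0 * s_k**2 * r_p2

def h_mise(n, r_k, s_k, r_p2):
    # Closed-form minimiser of (1) over h
    return (r_k / (n * s_k**2 * r_p2)) ** 0.2

n = 500
r_k, s_k = 1 / (2 * np.sqrt(np.pi)), 1.0   # Gaussian kernel constants
r_p2 = 3 / (8 * np.sqrt(np.pi))             # assumption: p is standard normal

hs = np.linspace(0.05, 2.0, 200_001)
h_grid = hs[np.argmin(amise(hs, n, r_k, s_k, r_p2))]
h_closed = h_mise(n, r_k, s_k, r_p2)
print(h_grid, h_closed)
```

For this particular pair the closed form reduces to $(4/(3n))^{1/5}$, which is where the familiar factor of about $1.06\,n^{-1/5}$ for Gaussian-kernel bandwidths comes from.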

and the "optimal" kernel (Epanechnikov)

$$K^*(u) = \frac{3}{4}(1-u^2)_+$$

whose corresponding "optimal" bandwidth is:

$$h^{MISE}(K^*) = \left( \frac{15}{n \int (p'')^2} \right)^{1/5} \,.$$
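The constant 15 comes from $\int (K^*)^2 = 3/5$ and $S_{K^*} = 1/5$, so that $\int (K^*)^2 / S_{K^*}^2 = 15$. A minimal numerical check with the trapezoidal rule (the helper `integrate` is hypothetical, written to avoid depending on a particular NumPy version):

```python
import numpy as np

def integrate(y, x):
    # Trapezoidal rule on a grid
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

u = np.linspace(-1.0, 1.0, 200_001)
k_star = 0.75 * np.clip(1.0 - u**2, 0.0, None)   # K*(u) = 3/4 (1 - u^2)_+

mass = integrate(k_star, u)          # should be 1
r_k = integrate(k_star**2, u)        # \int K*^2, exactly 3/5
s_k = integrate(u**2 * k_star, u)    # S_K, exactly 1/5
print(mass, r_k, s_k, r_k / s_k**2)  # last value should be close to 15
```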

These are not feasible choices, however, since they depend on knowledge (via $p''$) of the unknown density $p$; hence they are "oracle" quantities.
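For contrast, a common practical workaround (not part of Tsybakov's argument) is a "normal reference" plug-in: replace the unknown $\int (p'')^2$ by its value $3/(8\sqrt{\pi}\,\sigma^5)$ for a normal density whose $\sigma$ is the sample standard deviation. The sketch below assumes that rule; the function name is hypothetical and the simulated data are only for illustration.

```python
import numpy as np

def normal_reference_bandwidth(x, r_k, s_k):
    """Hypothetical helper: h^MISE(K) with \int (p'')^2 replaced by its value
    3 / (8 sqrt(pi) sigma^5) for a normal density with the sample std."""
    n = len(x)
    sigma = np.std(x, ddof=1)
    r_p2 = 3.0 / (8.0 * np.sqrt(np.pi) * sigma**5)
    return (r_k / (n * s_k**2 * r_p2)) ** 0.2

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=1000)

h_gauss = normal_reference_bandwidth(x, r_k=1 / (2 * np.sqrt(np.pi)), s_k=1.0)
h_epan = normal_reference_bandwidth(x, r_k=3 / 5, s_k=1 / 5)
print(h_gauss, h_epan)
```

The result is feasible but no longer the oracle: it is only as good as the normal approximation to $\int (p'')^2$. Note also that the two kernels need different bandwidths on the same data, since their $\int K^2$ and $S_K$ differ.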

It follows that the asymptotic MISE of the Epanechnikov oracle $p_n^E$ is:

$$\lim_{n \to \infty} n^{4/5}\, \mathbb{E}_p \int (p_n^E(x) - p(x))^2\,dx = \frac{3^{4/5}}{5^{1/5}\,4} \left( \int (p''(x))^2\,dx \right)^{1/5} \,. \tag{2}$$
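The constant in (2) is a special case of a general formula: plugging $h^{MISE}(K)$ back into (1) gives $\frac{5}{4}\left(\int K^2\right)^{4/5} S_K^{2/5} \left(\int (p'')^2\right)^{1/5} n^{-4/5}$. The sketch below (helper name hypothetical) checks that the Epanechnikov constants reproduce (2), and compares with the Gaussian kernel under the same oracle bandwidth choice.

```python
import math

def amise_constant(r_k, s_k):
    # Minimised (1) equals amise_constant * (\int (p'')^2)^{1/5} * n^{-4/5}
    return 1.25 * r_k**0.8 * s_k**0.4

c_epan = amise_constant(3 / 5, 1 / 5)
c_gauss = amise_constant(1 / (2 * math.sqrt(math.pi)), 1.0)

# The Epanechnikov value should match the constant in (2)
c_tsybakov = 3**0.8 / (5**0.2 * 4)
print(c_epan, c_tsybakov, c_gauss / c_epan)
```

The Gaussian's constant comes out only a few percent larger, so even within the non-negative-kernel framework the claimed advantage of the Epanechnikov kernel is small.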

Tsybakov notes that (2) is often claimed to be the best achievable MISE, but then shows that, for every $\varepsilon > 0$, one can use kernels of order 2 (for which $S_K = 0$) to construct kernel estimators $\hat{p}_n$ such that

$$\limsup_{n \to \infty} n^{4/5}\, \mathbb{E}_p \int (\hat{p}_n(x) - p(x))^2\,dx \le \varepsilon \,.$$

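Although $\hat{p}_n$ may take negative values, the same bound holds for its positive part $p_n^+ := \max(0, \hat{p}_n)$, which is a non-negative estimator even though the kernel $K$ takes negative values: since $p \ge 0$, we have $|p_n^+(x) - p(x)| \le |\hat{p}_n(x) - p(x)|$ pointwise, so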

$$\limsup_{n \to \infty} n^{4/5}\, \mathbb{E}_p \int (p_n^+(x) - p(x))^2\,dx \le \varepsilon \,.$$
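The positive-part trick can be illustrated numerically. This is not Tsybakov's exact construction: as a stand-in kernel with $S_K = 0$ it uses the standard fourth-order Gaussian kernel $K(u) = \tfrac{1}{2}(3 - u^2)\varphi(u)$, which integrates to 1 but is negative for $|u| > \sqrt{3}$. The bandwidth, sample size and standard normal data are arbitrary illustrative choices, and `k4`/`kde` are hypothetical helper names.

```python
import numpy as np

def k4(u):
    # Fourth-order Gaussian kernel: \int K = 1 and \int u^2 K(u) du = 0 (S_K = 0),
    # so it must take negative values (here for |u| > sqrt(3)).
    return 0.5 * (3.0 - u**2) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(grid, data, h, kernel):
    # \hat p_n(x) = (1 / (n h)) * sum_j K((x - X_j) / h)
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(1)
data = rng.normal(size=200)
grid = np.linspace(-5.0, 5.0, 2001)
p_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

p_hat = kde(grid, data, h=0.6, kernel=k4)
p_plus = np.maximum(0.0, p_hat)   # positive-part estimator p_n^+

print(p_hat.min())                # typically negative somewhere in the tails
print(np.sum((p_hat - p_true) ** 2), np.sum((p_plus - p_true) ** 2))
```

Clipping can only move the estimate toward the (non-negative) truth, so the squared error of `p_plus` never exceeds that of `p_hat` at any grid point.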

Therefore, for $\varepsilon$ small enough, there exist true estimators which have smaller asymptotic MISE than the Epanechnikov oracle, even using the same assumptions on the unknown density $p$ .

In particular, it follows that for a fixed $p$ the infimum of the asymptotic MISE over all kernel estimators (or positive parts of kernel estimators) is $0$. So the Epanechnikov oracle is not even close to optimal, even when compared with true estimators.

The reason people advanced the argument for the Epanechnikov oracle in the first place is that one often argues that the kernel should be non-negative because the density itself is non-negative. But as Tsybakov points out, one doesn't have to assume that the kernel is non-negative in order to get non-negative density estimators, and by allowing other kernels one can construct non-negative density estimators which (1) aren't oracles and (2) perform arbitrarily better than the Epanechnikov oracle for a fixed $p$. Tsybakov uses this discrepancy to argue that it doesn't make sense to claim optimality for a fixed $p$; only optimality properties that hold uniformly over a class of densities are meaningful. He also points out that the argument still works when using the MSE instead of the MISE.

EDIT: See also Corollary 1.1 on p. 25, where the Epanechnikov kernel is shown to be inadmissible based on another criterion. Tsybakov really seems not to like the Epanechnikov kernel.

— Chill2Macht

+1 for an interesting read, but this does not answer why the Gaussian kernel is used more often than the Epanechnikov kernel: they are both non-negative.

— amoeba says Reinstate Monica

@amoeba That is true. At the very least this answers the question in the title, which is only about the Epanechnikov kernel. (I.e. it addresses the premise for the question and shows that it is false.)

— Chill2Macht

(+1) One thing to beware of with Tsybakov's scheme of taking the positive part of a possibly negative kernel estimate (which is at least my memory of his suggestion) is that although the resulting density estimator might give better MSE convergence to the true density, the density estimate will in general not be a valid density: since you're cutting off the negative mass, it no longer integrates to 1 (its total mass exceeds 1). If you actually only care about MSE, it doesn't matter, but sometimes this will be a significant problem.
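This point is easy to check numerically. For a kernel that integrates to 1, $\int \hat{p}_n = 1$, so clipping gives $\int p_n^+ = 1 + \int \max(0, -\hat{p}_n) \ge 1$. The sketch below assumes the fourth-order Gaussian kernel $\tfrac{1}{2}(3 - u^2)\varphi(u)$ as an example of a kernel with negative lobes (not necessarily the kernel Tsybakov uses) and simulated standard normal data; helper names are hypothetical.

```python
import numpy as np

def k4(u):
    # Fourth-order Gaussian kernel (S_K = 0); negative for |u| > sqrt(3)
    return 0.5 * (3.0 - u**2) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def integrate(y, x):
    # Trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

rng = np.random.default_rng(1)
data = rng.normal(size=200)
grid = np.linspace(-12.0, 12.0, 4801)   # wide enough that the tails are negligible
h = 0.6

u = (grid[:, None] - data[None, :]) / h
p_hat = k4(u).sum(axis=1) / (len(data) * h)
p_plus = np.maximum(0.0, p_hat)

mass_hat = integrate(p_hat, grid)     # close to 1, because K integrates to 1
mass_plus = integrate(p_plus, grid)   # 1 + (size of the clipped negative part)
print(mass_hat, mass_plus)
```

Renormalizing `p_plus` by `mass_plus` would restore a valid density, but then the MSE guarantee for the positive part no longer applies directly.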

— Dougal

The Gaussian kernel is used, for example, when estimating derivatives of the density:

$$\frac{d^i f}{dx^i}(x) \approx \frac{1}{N h} \sum_{j=1}^N \frac{d^i}{dx^i} K\!\left(\frac{x - X_j}{h}\right) \,,$$

where $h$ is the bandwidth.

This is because the Epanechnikov kernel has only two non-vanishing derivatives (its third derivative is identically zero), unlike the Gaussian, which has infinitely many nonzero derivatives. See section 2.10 in your link for more examples.
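A minimal sketch of the first-derivative case ($i = 1$) with a Gaussian kernel, assuming the scaled form $\frac{1}{N h}\sum_j \frac{d}{dx} K\big((x - X_j)/h\big)$, which equals $\frac{1}{N h^2}\sum_j K'\big((x - X_j)/h\big)$ by the chain rule. The bandwidth and the simulated standard normal data (whose true derivative $p'(x) = -x\varphi(x)$ is known) are illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

def gaussian_kde_derivative(grid, data, h):
    """Hypothetical helper: (1 / (N h^2)) * sum_j K'((x - X_j) / h),
    with K the standard normal density, so K'(u) = -u K(u)."""
    u = (grid[:, None] - data[None, :]) / h
    k_prime = -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return k_prime.sum(axis=1) / (len(data) * h**2)

rng = np.random.default_rng(2)
data = rng.normal(size=5000)
grid = np.linspace(-3.0, 3.0, 601)
true_deriv = -grid * np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)   # p'(x) for N(0, 1)

est = gaussian_kde_derivative(grid, data, h=0.4)
print(np.max(np.abs(est - true_deriv)))
```

With the Gaussian, the same code pattern works for any $i$ (using Hermite-polynomial multiples of $\varphi$); with the Epanechnikov kernel, $K''$ is a constant on $[-1, 1]$ and every higher derivative vanishes inside the support.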

— Alex R.

The first derivative of the Epanechnikov (note the second n, by the way) kernel is not continuous where the function crosses the kernel's own bounds; that might be more of an issue.

— Glen_b -Reinstate Monica

@Glen_b: You're probably right, although having 0 derivatives after some $i$ would be silly too.

— Alex R.

@AlexR. While what you say is true, I don't understand how it explains why the Gaussian is so common in ordinary density estimation (as opposed to estimating the derivative of the density). And even when estimating derivatives, section 2.10 suggests that the Gaussian is never the preferred kernel.

— John Rauser

@JohnRauser: Keep in mind that you need to use higher order Epanechnikov kernels for optimality. Usually people use a Gaussian because it's just easier to work with and has nicer properties.

— Alex R.

@AlexR I'd quibble with "[u]sually people use a Gaussian"; do you have any systematic data on frequency of use, or is this just an impression based on work you see? I see biweights often, but I wouldn't claim more than that.

— Nick Cox