Is the median a type of mean, for some generalization of "mean"?

20

The concept of "mean" roams far more widely than the traditional arithmetic mean; does it stretch so far as to include the median? By analogy,

$$
\begin{array}{l}
\text{raw data} \overset{\text{id}}{\longrightarrow} \text{raw data} \overset{\text{mean}}{\longrightarrow} \text{raw mean} \overset{\text{id}^{-1}}{\longrightarrow} \text{arithmetic mean} \\
\text{raw data} \overset{\text{recip}}{\longrightarrow} \text{reciprocals} \overset{\text{mean}}{\longrightarrow} \text{mean reciprocal} \overset{\text{recip}^{-1}}{\longrightarrow} \text{harmonic mean} \\
\text{raw data} \overset{\text{log}}{\longrightarrow} \text{logs} \overset{\text{mean}}{\longrightarrow} \text{mean log} \overset{\text{log}^{-1}}{\longrightarrow} \text{geometric mean} \\
\text{raw data} \overset{\text{square}}{\longrightarrow} \text{squares} \overset{\text{mean}}{\longrightarrow} \text{mean square} \overset{\text{square}^{-1}}{\longrightarrow} \text{root mean square} \\
\text{raw data} \overset{\text{rank}}{\longrightarrow} \text{ranks} \overset{\text{mean}}{\longrightarrow} \text{mean rank} \overset{\text{rank}^{-1}}{\longrightarrow} \text{median}
\end{array}
$$

The analogy I am drawing is to the quasi-arithmetic mean, given by:

$$M_f(x_1,\dots,x_n)=f^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}f(x_i)\right)$$
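As a rough illustration of the formula, here is a minimal Python sketch. The function name `quasi_arithmetic_mean` and the sample data are assumptions made for this example and are not part of the question. Each pair of a function and its inverse reproduces one of the first four rows of the diagram.

```python
import math


def quasi_arithmetic_mean(data, f, f_inv):
    """Return f_inv(mean of f(x)); f is assumed invertible on the data's range."""
    return f_inv(sum(f(x) for x in data) / len(data))


data = [1.0, 2.0, 4.0, 8.0]  # hypothetical positive sample

arithmetic = quasi_arithmetic_mean(data, lambda x: x, lambda y: y)
harmonic = quasi_arithmetic_mean(data, lambda x: 1 / x, lambda y: 1 / y)
geometric = quasi_arithmetic_mean(data, math.log, math.exp)
rms = quasi_arithmetic_mean(data, lambda x: x * x, math.sqrt)

print(arithmetic, harmonic, geometric, rms)
```

In every case `f` is applied to one number at a time, which is exactly the property that ranking lacks.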

By comparison, when we say that the median of a data set of five items is equal to the third item, we can see that this is equivalent to ranking the data from one to five (which we might denote by a function $f$); taking the mean of the transformed data (which is three); and reading back the value of the data item that had rank three (a sort of $f^{-1}$).
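The rank, mean rank, inverse rank route can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, ranks are assigned by position after sorting, and linear interpolation between neighbouring ranks is assumed for the case where the mean rank is not a whole number.

```python
import statistics


def median_via_mean_rank(data):
    """Rank the data, average the ranks, then map the mean rank back to a data value."""
    ordered = sorted(data)              # position i (0-based) holds the item of rank i + 1
    n = len(ordered)
    mean_rank = sum(range(1, n + 1)) / n  # always (n + 1) / 2
    lower = int(mean_rank)              # largest whole rank not above the mean rank
    fraction = mean_rank - lower        # 0.0 for odd n, 0.5 for even n
    value = ordered[lower - 1]
    if fraction > 0:
        # "inverse rank" by linear interpolation between ranks lower and lower + 1
        value += fraction * (ordered[lower] - ordered[lower - 1])
    return value


print(median_via_mean_rank([7, 1, 9, 3, 5]))
print(statistics.median([7, 1, 9, 3, 5]))
```

Note that both the ranking step and its inverse need the whole sorted sample, unlike the fixed functions in a quasi-arithmetic mean.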

In the examples of the geometric mean, harmonic mean and RMS, $f$ was a fixed function that can be applied to any number in isolation. In contrast, assigning a rank, or working back from ranks to the original data (interpolating where necessary), requires knowledge of the entire data set. Moreover, in the definitions I have read of the quasi-arithmetic mean, $f$ is required to be continuous. Is the median ever considered as a special case of a quasi-arithmetic mean, and if so, how is $f$ defined? Or is the median ever described as an instance of some other broader notion of "mean"? The quasi-arithmetic mean is certainly not the only generalization available.

Part of the issue is terminological (what does "mean" mean anyway, especially in contrast to "central tendency" or "average"?). For instance, in the literature on fuzzy control systems, an aggregation function $F:[a,b] \times [a,b] \to [a,b]$ is an increasing function with $F(a,a)=a$ and $F(b,b)=b$; an aggregation function for which $\min(x,y) \leq F(x,y) \leq \max(x,y)$ for all $x,y \in [a,b]$ is called a "mean" (in a general sense). Such a definition is, needless to say, incredibly broad! And in this context the median is indeed referred to as a type of mean. $^{[1]}$ But I am curious whether less broad characterizations of the mean can still stretch far enough to encompass the median - the so-called generalized mean (which might better be described as the "power mean") and the Lehmer mean do not, but others may. For what it's worth, Wikipedia includes "median" in its list of "other means", but without further comment or citation.

$[1]$: Such a broad definition of mean, suitably extended to more than two inputs, seems standard in the field of fuzzy control and cropped up many times during internet searches for instances of the median being described as a mean; I will cite e.g. Fodor, J. C., and Rudas, I. J. (2009), "On some classes of aggregation functions that are migrative", IFSA/EUSFLAT Conf. (pp. 653–656). Incidentally, this paper notes that one of the earliest users of the term "mean" (moyenne) was Cauchy, in Cours d'analyse de l'École royale polytechnique, 1ère partie; Analyse algébrique (1821). Later contributions of Aczél, Chisini, Kolmogorov and de Finetti in developing more general concepts of "mean" than Cauchy's are acknowledged in Fodor, J., and Roubens, M. (1995), "On meaningfulness of means", Journal of Computational and Applied Mathematics, 64(1), 103–115.

mean average median

— Silverfish

I think the arithmetic mean, median and mode are often loosely called "averages", and the word is sometimes used in an ambiguous way. The book "How to Lie with Statistics" uses this as an example of "lying" with statistics. (I understand that your question is more general, so I am posting this as a comment.)

— Tim

@Tim My unscientific impression is that one rarely sees the "mode" referred to as a "mean". But there is certainly a huge nexus of confusion around the use of "average" (which is sometimes used as a synonym for "arithmetic mean", and at other times includes measures of central tendency that are not means at all) and "mean" (which in general usage, rather than in the technical sense, is most commonly but not exclusively used for "arithmetic mean"). Incidentally, this is also a tricky topic to search for on the internet, because of the other meanings of "mean"!

— Silverfish

3

Means (arithmetic, geometric, harmonic, power, exponential, combinatorial, etc.) are "analytical averages". The median and other quantiles are "positional averages". Ranking is quite different from log, square, etc., because it is a monotonic transformation of any variate to a uniform variate, and there is no path back from the transformation.

— ttnphns

By the way, the term "generalized mean" is already taken: en.wikipedia.org/wiki/Generalized_mean

— ttnphns

3

If you allow weights in calculating $\sum_i w_i x_i$, with $\sum_i w_i = 1$, then the median can easily be regarded as a kind of mean. Similarly, but not identically, the idea of trimmed means certainly includes medians as a limiting or courtesy special case. stata-journal.com/article.html?article=st0313 is one fairly recent review.

— Nick Cox

9

Here is one way you might regard the median as a "general sort of mean" - first, carefully define your ordinary arithmetic mean in terms of order statistics:

$$\bar{x} = \sum_i w_i x_{(i)},\qquad w_i=\frac{1}{n}\,.$$

Then, by replacing that ordinary average of order statistics with some other weight function, we get a notion of "generalized mean" that takes account of order.

In that case, a host of potential measures of center become "generalized sorts of means". In the case of the median, for odd $n$, $w_{(n+1)/2}=1$ and all the other weights are 0, and for even $n$, $w_{\frac{n}{2}}=w_{\frac{n}{2}+1}=\frac{1}{2}$.
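A small Python sketch of this weighted sum of order statistics, with the median weights given above, might look like the following. The function names are hypothetical and chosen only for this illustration.

```python
def order_weighted_mean(data, weights):
    """Weighted sum of the order statistics x_(1) <= ... <= x_(n)."""
    ordered = sorted(data)
    if len(weights) != len(ordered):
        raise ValueError("need exactly one weight per order statistic")
    return sum(w * x for w, x in zip(weights, ordered))


def mean_weights(n):
    """Equal weights 1/n: recovers the ordinary arithmetic mean."""
    return [1 / n] * n


def median_weights(n):
    """All weight on the middle order statistic(s): recovers the median."""
    weights = [0.0] * n
    if n % 2 == 1:
        weights[(n + 1) // 2 - 1] = 1.0               # w_{(n+1)/2} = 1 (1-based index)
    else:
        weights[n // 2 - 1] = weights[n // 2] = 0.5   # w_{n/2} = w_{n/2+1} = 1/2
    return weights


sample = [6, 1, 3, 2]
print(order_weighted_mean(sample, mean_weights(len(sample))))
print(order_weighted_mean(sample, median_weights(len(sample))))
```

Any other weight vector summing to one gives a further member of the same family.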

Similarly, if we look at M-estimation, location estimates might also be thought of as a generalization of the arithmetic mean (where for the mean, $\rho$ is quadratic, $\psi$ is linear, or the weight function is flat), and the median also falls into this class of generalizations. This is a somewhat different generalization from the previous one.

There are a variety of other ways the notion of "mean" might be extended that may include the median.

— Glen_b - Reinstate Monica

This is very nice. Closely related to this answer, and discussed in the papers cited in the question, is the ordered weighted average, or OWA.

— Silverfish

11

If you think of the mean as the point minimizing the quadratic loss function SSE, then the median is the point minimizing the linear loss function MAD, and the mode is the point minimizing a 0-1 loss function. No transformations are required.

So the median is an example of a Fréchet mean.
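To make the three loss functions concrete, here is a minimal Python sketch. The data, the candidate grid and the helper name `argmin_loss` are assumptions made for this illustration, and a brute-force grid search stands in for a proper optimizer.

```python
def argmin_loss(data, loss, candidates):
    """Return the candidate c with the smallest total loss sum(loss(x, c) for x in data)."""
    return min(candidates, key=lambda c: sum(loss(x, c) for x in data))


data = [1, 1, 2, 3, 20]                        # hypothetical sample, odd n, unique mode
candidates = [i / 10 for i in range(0, 201)]   # grid 0.0, 0.1, ..., 20.0

# Quadratic loss: minimized at the arithmetic mean.
squared_minimizer = argmin_loss(data, lambda x, c: (x - c) ** 2, candidates)
# Absolute (linear) loss: minimized at the median.
absolute_minimizer = argmin_loss(data, lambda x, c: abs(x - c), candidates)
# 0-1 loss: minimized at the mode.
zero_one_minimizer = argmin_loss(data, lambda x, c: 0 if x == c else 1, candidates)

print(squared_minimizer, absolute_minimizer, zero_one_minimizer)
```

The sample has an odd number of points so that the absolute-loss minimizer is unique; with an even count, every point between the two middle values ties.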

— Mike Anderson

3

@Mike Anderson: Well, this shows that the median is a Fréchet mean (see the Wikipedia article): en.wikipedia.org/wiki/Fr%C3%A9chet_mean

— kjetil b halvorsen

@Kjetil Excellent! The fact that the median is an example of a Fréchet mean is exactly an answer to my question "is the median ever described as an instance of some other broader notion of 'mean'?" And +1 to Mike Anderson. I hope this information gets edited into the answer.

— Silverfish

2

I've added @Kjetil's comment to the answer so that it will show up in a site search for "Frechet mean". Thanks to both of you.

— Silverfish

4

Consider weighted means, $\sum_{i=1}^n w_i x_i / \sum_{i=1}^n w_i,$ where $\sum_{i=1}^n w_i = 1$. Clearly the common or garden mean is the simplest special case, with equal weights $w_i = 1/n$.

Letting the weights depend on the order of values in magnitude, from smallest to largest, points to various other special cases, notably the idea of a trimmed mean, which is known by other names too.

To avoid an excess of notation where it is not needed or especially helpful, imagine for example ignoring the smallest and largest values and taking the (equally weighted) mean of the others. Or imagine ignoring the two smallest and the two largest and taking the mean of the others; and so on. The most vigorous trimming would ignore all but the one or two middle values in order, depending on whether the number of values was odd or even, which is naturally just the familiar median. Nothing in the idea of trimming commits you to ignoring equal numbers in each tail of a sample, but saying more about asymmetric trimming would take us away from the main idea in this thread.

In short, means (unqualified) and medians are extreme limiting cases of the family of (symmetric) trimmed means. The overall idea is to allow compromises between one ideal of using all the information in the data and another ideal of protecting oneself from extreme data points, which may be unreliable outliers.
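A minimal Python sketch of symmetric trimming, assuming trimming is specified by a count `k` of values dropped from each tail (the function name and sample data are hypothetical):

```python
import statistics


def trimmed_mean(data, k):
    """Drop the k smallest and k largest values, then average what remains."""
    ordered = sorted(data)
    n = len(ordered)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must leave at least one value")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [1, 2, 3, 6, 20, 4]                 # hypothetical sample with one large value
max_k = (len(data) - 1) // 2               # the most vigorous trimming possible

print(trimmed_mean(data, 0), statistics.mean(data))        # no trimming: ordinary mean
print(trimmed_mean(data, max_k), statistics.median(data))  # maximal trimming: median
```

Values of `k` between the two extremes give the compromises described above.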

See the reference here for one fairly recent review.

— Nick Cox

4

The question invites us to characterize the concept of "mean" in a sufficiently broad sense to encompass all the usual means--power means, $L^p$ means, medians, trimmed means--but not so broadly that it becomes almost useless for data analysis. This reply discusses some of the axiomatic properties that any reasonably useful definition of "mean" should have.

Basic Axioms

A usefully broad definition of "mean" for the purpose of data analysis would be any sequence of well-defined, deterministic functions $f_n:A^n\to A$ for $A\subset\mathbb{R}$ and $n=1, 2, \ldots$ such that

(1) $\newcommand{\x}{\mathrm{x}} \newcommand{\min}{\text{min}}\min (\x)\le f_n(\x)\le \max(\x)$ for all $\x = (x_1, x_2, \ldots, x_n)\in A^n$ (a mean lies between the extremes),

(2) $f_n$ is invariant under permutations of its arguments (means do not care about the order of the data), and

(3) each $f_n$ is nondecreasing in each of its arguments (as the numbers increase, their mean cannot decrease).

We must allow for $A$ to be a proper subset of real numbers (such as all positive numbers) because plenty of means, such as geometric means, are defined only on such subsets.

We might also want to add that

(1') there exists at least some $\x\in A^n$ for which $\min(\x)\ne f_n(\x)\ne \max(\x)$ (means are not extremes). (We cannot require that this always hold. For instance, the median of $(0,0,\ldots,0,1)$ equals $0$, which is the minimum.)

These properties seem to capture the idea behind a "mean" being some kind of "middle value" of a set of (unordered) data.
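Axioms (1) through (3) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the helper name, the integer test samples and the fixed seed are all hypothetical choices, and passing the check on finitely many samples does not prove that a function satisfies the axioms, although a failure does disprove it.

```python
import itertools
import random
import statistics


def satisfies_basic_axioms(f, samples):
    """Spot-check axioms (1)-(3) for f on every sample in samples."""
    for x in samples:
        value = f(x)
        # (1) the value lies between the extremes
        if not min(x) <= value <= max(x):
            return False
        # (2) invariance under every permutation of the arguments
        if any(f(list(p)) != value for p in itertools.permutations(x)):
            return False
        # (3) raising any single argument never lowers the value
        for i in range(len(x)):
            raised = list(x)
            raised[i] += 1
            if f(raised) < value:
                return False
    return True


rng = random.Random(0)  # arbitrary fixed seed for reproducibility
samples = [[rng.randint(0, 100) for _ in range(4)] for _ in range(50)]

median_ok = satisfies_basic_axioms(statistics.median, samples)
mean_ok = satisfies_basic_axioms(statistics.mean, samples)
first_ok = satisfies_basic_axioms(lambda x: x[0], samples)  # "first value" depends on order

print(median_ok, mean_ok, first_ok)
```

Integer data are used so that the permutation comparison is exact rather than subject to floating-point rounding.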

Consistency axioms

I am further tempted to stipulate the rather less obvious consistency criterion

(4.a) The range of $f_{n+1}(t, x_1, x_2, \ldots, x_n)$ as $t$ varies throughout the interval $[\min(\x), \max(\x)]$ includes $f_n(\x)$ . In other words, it is always possible to leave the mean unchanged by adjoining an appropriate value $t$ to a dataset. In conjunction with (3), it implies that adjoining extreme values to a dataset will pull the mean towards those extremes.

If we wish to apply the concept of mean to a distribution or "infinite population", then one way would be to obtain it in the limit of arbitrarily large random samples. Of course the limit might not always exist (it does not exist for the arithmetic mean when the distribution has no expectation, for instance). Therefore I do not want to impose any additional axioms to guarantee the existence of such limits, but the following seems natural and useful:

(4.b) Whenever $A$ is bounded and $\x_n$ is a sequence of samples from a distribution $F$ supported on $A$ , then the limit of $f_n(\x_n)$ almost surely exists. This prevents the mean from forever "bouncing around" within $A$ even as sample sizes get larger and larger.

Along the same lines, we could further narrow the idea of a mean to insist that it become a better estimator of "location" as sample sizes increase:

(4.c) Whenever $A$ is bounded, then the variance of the sampling distribution of $f_n(X^{(n)})$ for a random sample $X^{(n)} = (X_1, X_2, \ldots, X_n)$ of $F$ is nonincreasing in $n$.

Continuity axiom

We might consider asking means to vary "nicely" with the data:

(5) $f_n$ is separately continuous in each argument (a small change in the data values should not induce a sudden jump in their mean).

This requirement might eliminate some strange generalizations, but it does not rule out any well-known mean. It will rule out some aggregation functions.

An invariance axiom

We can conceive of means as applying to either interval or ratio data (in Stevens' well-known sense). We cannot demand they be invariant under shifts of location (the geometric mean is not), but we can require

(6) $f_n(\lambda \x) = \lambda f_n(\x)$ for all $\x \in A^n$ and all $\lambda \gt 0$ for which $\lambda \x \in A^n$ . This says only that we are free to compute $f_n$ using any units of measurement we like.

All the means mentioned in the question satisfy this axiom except for some aggregation functions.

Discussion

General aggregation functions $f_2$ , as described in the question, do not necessarily satisfy axioms (1'), (2), (3), (5), or (6). Whether they satisfy any consistency axioms may depend on how they are extended to $n\gt 2$ .

The usual sample median enjoys all these axiomatic properties.

We could augment the consistency axioms to include

(4.d) $f_{2n}(\x;\x) = f_n(\x)$ for all $\x \in A^n.$

This implies that when all elements of a dataset are repeated equally often, the mean does not change. This may be too strong, though: the Winsorized mean does not have this property (except asymptotically). The purpose of Winsorizing at the $100\alpha\%$ level is to provide resistance against changes in at least $100\alpha\%$ of the data at either extreme. For instance, the 10% Winsorized mean of $(1,2,3,6)$ is the arithmetic mean of $(2,2,3,3)$, equal to $2.5$, but the 10% Winsorized mean of $(1,1,2,2,3,3,6,6)$ is the arithmetic mean of the unchanged data, equal to $3$.
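The check of (4.d) for the Winsorized mean can be sketched in Python. One convention is assumed here and is not stated in the answer: the number of values replaced in each tail is $\lceil \alpha n \rceil$, which is the rule that turns $(1,2,3,6)$ into $(2,2,3,3)$ at the 10% level. The function name is hypothetical.

```python
import math


def winsorized_mean(data, alpha):
    """Replace the ceil(alpha * n) most extreme values in each tail by their nearest kept neighbour."""
    ordered = sorted(data)
    n = len(ordered)
    k = math.ceil(alpha * n)          # assumed convention for the count per tail
    if 2 * k >= n:
        raise ValueError("alpha is too large for this sample size")
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / n


x = [1, 2, 3, 6]
single = winsorized_mean(x, 0.10)
doubled = winsorized_mean(x + x, 0.10)   # f_{2n}(x; x) in the notation of (4.d)

print(single, doubled)
```

The two printed values differ, which is the failure of (4.d) that the paragraph describes.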

I do not know which of the consistency axioms (4.a), (4.b), or (4.c) would be most desirable or useful. They appear to be independent: I don't think any two of them imply the third.

— whuber

(+1) I think (1'), "means are not extremes", is an interesting point. Many otherwise natural definitions of mean happen to include the minimum and maximum as special or limiting cases: this is true of power means, Lehmer means, Fréchet mean, Chisini mean and Stolarsky mean. Though it does seem a bit odd to refer to them as "average"!

— Silverfish

Yes, limiting cases are unavoidable. But for finite datasets we might want to insist that neither the max nor the min qualify as "means."

— whuber

On the other hand, not only is it true that "the usual sample median enjoys all these axiomatic properties", but so do the usual sample quantiles (unless I've missed something). It also feels a bit odd to refer to e.g. the upper quartile as a "mean" (though I've seen it used as a measure of central tendency on very skewed data). If we accept all other quantiles, it no longer feels quite so perverse to admit minima and maxima. But I can certainly see it may be desirable to at least retain the right to exclude them.

— Silverfish

1

I am not perturbed by the admission of quantiles into the pantheon of means. After all, for given families of distributions, certain non-median quantiles will coincide with arithmetic means, so you could be in trouble if you tried to eliminate this possibility axiomatically. (Consider a family of lognormal distributions of constant geometric SD, for instance.) If the arithmetic mean cannot qualify as a mean, all is lost!

— whuber

1

I have considered that approach and rejected it, as explained in my answer: if you apply such a criterion for $n \gt 2$, you eliminate the median as a form of mean!

— whuber

2

I think the median can be considered a type of a generalization of the arithmetic mean. Specifically, the arithmetic mean and the median (among others) can be unified as special cases of the Chisini mean. If you are going to perform some operation over a set of values, the Chisini mean is a number that you can substitute for all of the original values in the set and still get the same result. For example, if you want to sum your values, replacing all the values with the arithmetic mean will yield the same sum. The idea is that a certain value is representative of the numbers in the set in the context of a certain operation over those numbers. (An interesting implication of this way of thinking is that a given value—the arithmetic mean—can only be considered representative under the assumption that you are doing certain things with those numbers.)

This is less obvious for the median (and I note that the median is not listed as one of the Chisini means on Wolfram or Wikipedia), but if you were to allow operations over ranks, the median could fit within the same idea.
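The substitution idea can be illustrated with a short Python sketch that solves $F(M, M, \ldots, M) = F(x_1, \ldots, x_n)$ for $M$ numerically. Bisection is my choice for the illustration, not something the answer prescribes; it assumes that $F(m, \ldots, m)$ is nondecreasing in $m$ between the smallest and largest data values. The function name and data are hypothetical.

```python
import math


def chisini_mean(operation, data, tol=1e-12):
    """Find M with operation([M] * n) == operation(data) by bisection on [min, max]."""
    target = operation(data)
    n = len(data)
    lo, hi = float(min(data)), float(max(data))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if operation([mid] * n) < target:
            lo = mid      # the constant dataset is still too small
        else:
            hi = mid
    return (lo + hi) / 2


data = [1, 2, 4, 8]  # hypothetical positive sample

sum_preserving = chisini_mean(sum, data)            # should match the arithmetic mean
product_preserving = chisini_mean(math.prod, data)  # should match the geometric mean

print(sum_preserving, product_preserving)
```

Finding an operation over ranks that would make the median come out of this scheme is the open point discussed in the comments below.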

— gung - Reinstate Monica

This is a very interesting suggestion. Could you suggest a suitable operation, so that for a median $M$ we would have $f(M,M,\ldots,M)=f(x_1,x_2,\ldots,x_n)$?

— Silverfish

That's a good question, @Silverfish, I've been thinking about that ;-). My thinking is more that, in your Q & the discussion in comments, the conceptual framework seems to be how to get the mean & how to get the data back from the mean; OTOH, my framing is what we use the mean for: viz as a compressed representation of the data w/ the minimum information loss.

— gung - Reinstate Monica

I've added some citations to the question which show a wider range of conceptual frameworks, including this one. At the moment I can't see a better $f$ than "take the median", which doesn't quite seem within the spirit of the piece!

— Silverfish

@Silverfish, I grant that does seem like a somewhat problematic hole in my position.

— gung - Reinstate Monica

While the insight from Chisini's set-up is that, for example, the arithmetic mean preserves the sum, while the geometric mean preserves the product, it's still true (just less interesting) that the arithmetic mean of $(\bar{x}, \bar{x}, \ldots, \bar{x})$ is also $\bar{x}$ and so on. So I'm not convinced it's a fatal blow.

— Silverfish

-1

The question is not well defined. If we agree on the common "street" definition of mean as the sum of n numbers divided by n, then we have a stake in the ground. Further, if we look at measures of central tendency, we could say both mean and median are generalizations, but not of each other. Part of my background is in nonparametrics, so I like the median and the robustness it provides, its invariance to monotonic transformation and more, but each measure has its place depending on the objective.

— Bob Clauss

2

Welcome to our site, Bob. I believe that if you read to the end of the question--especially the long penultimate paragraph--you will discover that it is precise and well-defined. (If not, it would be a good idea to explain what you mean by "not well defined.") Your comments don't really seem to address what is being asked.

— whuber

1

I actually sympathise with Bob's feeling that the question is not terribly well-defined, in the sense that the concept of "mean" does not have a single definition, but I have tried my best to make things as clear as possible. I hope my most recent edit helps clarify things.

— Silverfish

1

The reason I feel the question has some value other than mere terminology (what does mean mean anyway, and is there a definition we can stretch as far as to include the median?) is that it may be instructive to see the median as just one member of a family of generalizations of the mean; Nick Cox's example of the median as a limiting case of the trimmed mean is particularly nice - it ties in neatly with the "robustness" property you like. In the family of trimmed means, the "street" arithmetic mean and the median lie at opposite ends with a spectrum between them.

— Silverfish