Intuition (geometric or otherwise)


19

Consider the basic variance identity:

$$\operatorname{Var}(X) = E\left[(X - E[X])^2\right] = \dots = E[X^2] - (E[X])^2$$

It is a simple algebraic manipulation converting the definition in terms of a central moment into non-central moments.

It allows convenient manipulation of $\operatorname{Var}(X)$ in other contexts. It also makes it possible to compute the variance in a single pass over the data instead of two passes, first to compute the mean and then to compute the variance.
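The single-pass point can be sketched in a few lines of Python (the function name and the sample data are my own, illustrative choices): accumulate the count, the sum, and the sum of squares in one sweep, then apply the identity. (Numerically, a compensated algorithm such as Welford's is preferred when the mean is large relative to the spread, but this shows the identity at work.)

```python
def one_pass_variance(xs):
    """Variance via Var(X) = E[X^2] - (E[X])^2, in a single pass:
    accumulate the count, the sum, and the sum of squares together."""
    n = s = s2 = 0
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    mean = s / n
    return s2 / n - mean * mean  # population variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(one_pass_variance(data))  # 4.0
```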

But what does it mean? To me there is no immediate geometric intuition relating spread about the mean to spread about 0. Since $X$ is a set of points in one dimension, how do you view the spread about the mean as the difference between the spread about the origin and the square of the mean?

Are there any good linear-algebra interpretations, or physical ones, or others, that would give insight into this identity?


7
Hint: this is the Pythagorean theorem.
whuber

1
@Matthew I wonder what "$E$" is supposed to mean. I suspect it is not the expectation but merely shorthand for the arithmetic mean. Otherwise the equations would be incorrect (and nearly meaningless, since they would then equate random variables with numbers).
whuber

2
@whuber Since inner products introduce the idea of distances and angles, and the inner product on the vector space of real-valued random variables is defined as $E[XY]$ (?), I wonder whether some geometric intuition could be given via the triangle inequality. I have no idea how to proceed, but I was wondering whether it makes sense.
Antoni Parellada,

1
@Antoni The triangle inequality is too general. An inner product is a much more special object. Fortunately, the appropriate geometric intuition is exactly that of Euclidean geometry. Moreover, even for random variables $X$ and $Y$, the necessary geometry can be confined to the two-dimensional real vector space generated by $X$ and $Y$: that is, to the Euclidean plane itself. In the present case $X$ does not appear to be an RV: it is just an $n$-vector. Here the space spanned by $X$ and $(1, 1, \ldots, 1)$ is the Euclidean plane in which all the geometry takes place.
whuber

3
Setting $\hat\beta_1 = 0$ in the answer I linked, and then dividing all terms by $n$ (if you like), gives the full algebraic solution for the variance: there is no reason to copy it out anew. That is because $\hat\beta_0$ is the arithmetic mean of $y$, whence $\|y - \hat y\|^2/n$ is just $n$ths of the variance as you have defined it here, $\|\hat y\|^2/n$ is the square of the arithmetic mean, and $\|y\|^2/n$ is the arithmetic mean of the squared values.
whuber

Answers:


21

Expanding on @whuber's point in the comments, if $Y$ and $Z$ are orthogonal, you have the Pythagorean theorem:

$$\|Y\|^2 + \|Z\|^2 = \|Y + Z\|^2$$

Observe that $\langle Y, Z \rangle \equiv E[YZ]$ is a valid inner product and that $\|Y\| = \sqrt{E[Y^2]}$ is the norm induced by that inner product.

Let $X$ be some random variable. Let $Y = E[X]$, and let $Z = X - E[X]$. If $Y$ and $Z$ are orthogonal:

$$\begin{aligned} \|Y\|^2 + \|Z\|^2 &= \|Y + Z\|^2 \\ E\!\left[E[X]^2\right] + E\!\left[(X - E[X])^2\right] &= E[X^2] \\ E[X]^2 + \operatorname{Var}[X] &= E[X^2] \end{aligned}$$


And it's easy to show that $Y = E[X]$ and $Z = X - E[X]$ are orthogonal under this inner product:

$$\langle Y, Z \rangle = E\!\left[E[X]\,(X - E[X])\right] = E[X]^2 - E[X]^2 = 0$$

One of the legs of the triangle is $X - E[X]$, the other leg is $E[X]$, and the hypotenuse is $X$. And the Pythagorean theorem can be applied because a demeaned random variable is orthogonal to its mean.
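The argument above can be sanity-checked numerically for any finite discrete distribution (the values and probabilities below are made up for illustration, and `E` is my own helper for the expectation): the inner product $\langle Y, Z \rangle = E[YZ]$ vanishes, and the Pythagorean identity holds.

```python
# Values and probabilities of a small discrete RV (arbitrary example).
vals  = [1.0, 3.0, 7.0]
probs = [0.5, 0.3, 0.2]

def E(f):
    """Expectation of f(X) under the distribution above."""
    return sum(p * f(v) for v, p in zip(vals, probs))

mean = E(lambda x: x)                         # E[X]
inner_YZ = E(lambda x: mean * (x - mean))     # <Y, Z> with Y = E[X], Z = X - E[X]
var      = E(lambda x: (x - mean) ** 2)       # ||Z||^2 = Var(X)
second   = E(lambda x: x * x)                 # ||X||^2 = E[X^2]

print(abs(inner_YZ) < 1e-9)                 # True: Y and Z are orthogonal
print(abs(mean**2 + var - second) < 1e-9)   # True: the Pythagorean identity
```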


Technical remark:

$Y$ in this example really should be the vector $Y = E[X]\,\mathbf{1}$, that is, the scalar $E[X]$ times the constant vector $\mathbf{1}$ (e.g. $\mathbf{1} = [1, 1, 1, \ldots, 1]$ in the discrete, finite outcome case). $Y$ is the vector projection of $X$ onto the constant vector $\mathbf{1}$.

Simple Example

Consider the case where $X$ is a Bernoulli random variable with $p = .2$. We have:

$$X = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad P = \begin{bmatrix} .2 \\ .8 \end{bmatrix} \quad E[X] = \sum_i P_i X_i = .2$$

$$Y = E[X]\,\mathbf{1} = \begin{bmatrix} .2 \\ .2 \end{bmatrix} \quad Z = X - E[X] = \begin{bmatrix} .8 \\ -.2 \end{bmatrix}$$

And the picture is: [figure: the vectors $X$ (yellow), $Y$ (blue), and $Z$ (red) drawn in the plane]

The squared magnitude of the red vector is the variance of $X$, the squared magnitude of the blue vector is $E[X]^2$, and the squared magnitude of the yellow vector is $E[X^2]$.

REMEMBER though that these magnitudes, the orthogonality, etc. aren't with respect to the usual dot product $\sum_i Y_i Z_i$ but the inner product $\sum_i P_i Y_i Z_i$. The squared magnitude of the yellow vector isn't 1; it is .2.

The red vector $Z = X - E[X]$ and the blue vector $Y = E[X]\,\mathbf{1}$ are perpendicular under the inner product $\sum_i P_i Y_i Z_i$, but they aren't perpendicular in the intro, high-school geometry sense. Remember we're not using the usual dot product $\sum_i Y_i Z_i$ as the inner product!
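The numbers in this Bernoulli example can be checked directly; `ip` below is my own name for the probability-weighted inner product $\sum_i P_i u_i v_i$:

```python
# Bernoulli(p = .2): states X = [1, 0] with probabilities P = [.2, .8].
X = [1.0, 0.0]
P = [0.2, 0.8]

def ip(u, v):
    """Probability-weighted inner product sum_i P_i u_i v_i (i.e. E[UV])."""
    return sum(p * a * b for p, a, b in zip(P, u, v))

EX = ip(X, [1.0, 1.0])           # E[X] = .2
Y = [EX, EX]                     # E[X]·1, the blue vector
Z = [x - EX for x in X]          # X - E[X] = [.8, -.2], the red vector

print(abs(ip(Y, Z)) < 1e-12)     # True: perpendicular under this inner product
print(round(ip(Y, Y), 12))       # 0.04 = E[X]^2
print(round(ip(Z, Z), 12))       # 0.16 = Var(X)
print(round(ip(X, X), 12))       # 0.2  = E[X^2]
```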


That is really good!
Antoni Parellada

1
Good answer (+1), but it lacks a figure, and also might be a bit confusing for OP because your Z is their X...
amoeba says Reinstate Monica

@MatthewGunn, great answer. you can check my answer below for a representation where orthogonality is in the Euclidean sense.
YBE

I hate to be obtuse, but I'm having trouble keeping $Z$, $\operatorname{Var}(X)$, and the direction of the logic straight ('because' comes at places that don't make sense to me). It feels like a lot of (well substantiated) facts are stated randomly. What space is the inner product in? Why $\mathbf{1}$?
Mitch

@Mitch The logical order is: (1) Observe that a probability space defines a vector space; we can treat random variables as vectors. (2) Define the inner product of random variables $Y$ and $Z$ as $E[YZ]$. In an inner product space, vectors $Y$ and $Z$ are defined as orthogonal if their inner product is zero. (3a) Let $X$ be some random variable. (3b) Let $Y = E[X]$ and $Z = X - E[X]$. (4) Observe that $Y$ and $Z$ defined this way are orthogonal. (5) Since $Y$ and $Z$ are orthogonal, the Pythagorean theorem applies. (6) By simple algebra, the Pythagorean theorem is equivalent to the identity.
Matthew Gunn

8

I will go for a purely geometric approach for a very specific scenario. Let us consider a discrete-valued random variable $X$ taking values $\{x_1, x_2\}$ with probabilities $(p_1, p_2)$. We will further assume that this random variable can be represented in $\mathbb{R}^2$ as a vector, $\vec{X} = (x_1\sqrt{p_1},\, x_2\sqrt{p_2})$. [figure: the vector $\vec{X}$ in the plane]

Notice that the squared length of $\vec{X}$ is $x_1^2 p_1 + x_2^2 p_2$, which is equal to $E[X^2]$. Thus, $\|\vec{X}\| = \sqrt{E[X^2]}$.

Since $p_1 + p_2 = 1$, the tip of the vector $\vec{X}$ actually traces an ellipse. This becomes easier to see if one reparametrizes $p_1$ and $p_2$ as $\cos^2(\theta)$ and $\sin^2(\theta)$. Hence, we have $\sqrt{p_1} = \cos(\theta)$ and $\sqrt{p_2} = \sin(\theta)$.

One way of drawing ellipses is via a mechanism called Trammel of Archimedes. As described in wiki: It consists of two shuttles which are confined ("trammelled") to perpendicular channels or rails, and a rod which is attached to the shuttles by pivots at fixed positions along the rod. As the shuttles move back and forth, each along its channel, the end of the rod moves in an elliptical path. This principle is illustrated in the figure below.

Now let us geometrically analyze one instance of this trammel when the vertical shuttle is at $A$ and the horizontal shuttle is at $B$, forming an angle of $\theta$. By construction, $|BX| = x_2$ and $|AB| = x_1 - x_2$ (here $x_1 \geq x_2$ is assumed wlog).

[figure: the trammel with shuttles $A$ and $B$, foot of the perpendicular $C$, tip $X$, and angle $\theta$]

Let us draw a line from the origin, $OC$, that is perpendicular to the rod. One can show that $|OC| = (x_1 - x_2)\sin(\theta)\cos(\theta)$. For this specific random variable,

$$\begin{aligned} \operatorname{Var}(X) &= (x_1^2 p_1 + x_2^2 p_2) - (x_1 p_1 + x_2 p_2)^2 \\ &= x_1^2 p_1 + x_2^2 p_2 - x_1^2 p_1^2 - x_2^2 p_2^2 - 2 x_1 x_2 p_1 p_2 \\ &= x_1^2 (p_1 - p_1^2) + x_2^2 (p_2 - p_2^2) - 2 x_1 x_2 p_1 p_2 \\ &= p_1 p_2 (x_1^2 - 2 x_1 x_2 + x_2^2) \\ &= \left[(x_1 - x_2)\sqrt{p_1 p_2}\right]^2 = |OC|^2 \end{aligned}$$
Therefore, the perpendicular distance $|OC|$ from the origin to the rod is actually equal to the standard deviation, $\sigma$.

If we compute the length of the segment from $C$ to $X$:

$$\begin{aligned} |CX| &= x_2 + (x_1 - x_2)\cos^2(\theta) \\ &= x_1 \cos^2(\theta) + x_2 \sin^2(\theta) \\ &= x_1 p_1 + x_2 p_2 = E[X] \end{aligned}$$

Applying the Pythagorean theorem in the triangle $OCX$, we end up with

$$E[X^2] = \operatorname{Var}(X) + E[X]^2.$$

To summarize, for a trammel that describes all possible discrete-valued random variables taking values $\{x_1, x_2\}$, $\sqrt{E[X^2]}$ is the distance from the origin to the tip of the mechanism, and the standard deviation $\sigma$ is the perpendicular distance to the rod.

Note: Notice that when $\theta$ is $0$ or $\pi/2$, $X$ is completely deterministic. When $\theta$ is $\pi/4$ we end up with maximum variance.
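The two key lengths in this construction can be verified numerically; the values $x_1 = 5$, $x_2 = 2$, and $\theta = 0.6$ below are arbitrary illustrative choices:

```python
import math

# Arbitrary two-state RV: values x1 >= x2, with p1 = cos^2(theta), p2 = sin^2(theta).
x1, x2 = 5.0, 2.0
theta = 0.6
p1, p2 = math.cos(theta) ** 2, math.sin(theta) ** 2

mean = x1 * p1 + x2 * p2
var = (x1 ** 2 * p1 + x2 ** 2 * p2) - mean ** 2

OC = (x1 - x2) * math.sin(theta) * math.cos(theta)   # perpendicular distance origin -> rod
CX = x2 + (x1 - x2) * math.cos(theta) ** 2           # segment from C to the tip

print(abs(OC ** 2 - var) < 1e-9)   # True: |OC|^2 equals Var(X)
print(abs(CX - mean) < 1e-9)       # True: |CX| equals E[X]
```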


1
+1 Nice answer. And multiplying the vector components by the square roots of the probabilities is a cool/useful trick to make the usual probabilistic notion of orthogonality look orthogonal!
Matthew Gunn

Great graphics. The symbols all make sense (the trammel describing an ellipse and then the Pythagorean Thm applies) but somehow I'm not getting intuitively how it 'magically' relates the moments (the spread and the center).
Mitch

Consider the trammel as a process that defines all the possible $\{x_1, x_2\}$-valued random variables. When the rod is horizontal or vertical you have a deterministic RV. In between there is randomness, and it turns out that in my proposed geometric framework, how random an RV is (its std) is exactly measured by the distance of the rod to the origin. There might be a deeper relationship here, as elliptic curves connect various objects in math, but I am not a mathematician so I cannot really see that connection.
YBE

3

You can rearrange as follows:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2 \;\Longleftrightarrow\; E[X^2] = (E[X])^2 + \operatorname{Var}(X)$$

Then, interpret as follows: the expected square of a random variable is equal to the square of its mean plus the expected squared deviation from its mean.


Oh. Huh. Simple. But the squares still seem kinda uninterpreted. I mean it makes sense (sort of, extremely loosely) without the squares.
Mitch

3
I am not sold on this.
Michael R. Chernick

1
If the Pythagorean theorem applies, what is the triangle with what sides and how are the two legs perpendicular?
Mitch

1

Sorry for not having the skill to elaborate and provide a proper answer, but I think the answer lies in the classical-mechanics concept of moments, especially the conversion between "raw" moments centred at 0 and mean-centred central moments (cf. the parallel axis theorem). Bear in mind that variance is the second-order central moment of a random variable.


1

The general intuition is that you can relate these moments using the Pythagorean Theorem (PT) in a suitably defined vector space, by showing that two of the moments are perpendicular and the third is the hypotenuse. The only algebra needed is to show that the two legs are indeed orthogonal.

For the sake of the following I'll assume you meant sample means and variances for computation purposes rather than moments for full distributions. That is:

$$\begin{aligned} E[X] &= \tfrac{1}{n}\textstyle\sum x_i, &&\text{mean, first sample moment} \\ E[X^2] &= \tfrac{1}{n}\textstyle\sum x_i^2, &&\text{second sample moment (noncentral)} \\ \operatorname{Var}(X) &= \tfrac{1}{n}\textstyle\sum (x_i - E[X])^2, &&\text{variance, second central sample moment} \end{aligned}$$

(where all sums are over $n$ items).

For reference, the elementary proof of $\operatorname{Var}(X) = E[X^2] - E[X]^2$ is just symbol pushing:

$$\begin{aligned} \operatorname{Var}(X) &= \tfrac{1}{n}\textstyle\sum (x_i - E[X])^2 \\ &= \tfrac{1}{n}\textstyle\sum \left(x_i^2 - 2 E[X] x_i + E[X]^2\right) \\ &= \tfrac{1}{n}\textstyle\sum x_i^2 - \tfrac{2}{n} E[X] \textstyle\sum x_i + \tfrac{1}{n}\textstyle\sum E[X]^2 \\ &= E[X^2] - 2 E[X]^2 + \tfrac{1}{n}\, n\, E[X]^2 \\ &= E[X^2] - E[X]^2 \end{aligned}$$

There's little meaning here, just elementary manipulation of algebra. One might notice that $E[X]$ is a constant inside the summation, but that is about it.

Now in the vector space/geometrical interpretation/intuition, what we'll show is the slightly rearranged equation that corresponds to PT, that

$$\operatorname{Var}(X) + E[X]^2 = E[X^2]$$

So consider $X$, the sample of $n$ items, as a vector in $\mathbb{R}^n$. And let's create two vectors, $E[X]\mathbf{1}$ and $X - E[X]\mathbf{1}$.

The vector $E[X]\mathbf{1}$ has the mean of the sample as every one of its coordinates.

The vector $X - E[X]\mathbf{1}$ is $(x_1 - E[X], \ldots, x_n - E[X])$.

These two vectors are perpendicular because the dot product of the two vectors turns out to be 0:

$$\begin{aligned} E[X]\mathbf{1} \cdot (X - E[X]\mathbf{1}) &= \textstyle\sum E[X]\,(x_i - E[X]) \\ &= \textstyle\sum \left(E[X]\, x_i - E[X]^2\right) \\ &= E[X]\textstyle\sum x_i - \textstyle\sum E[X]^2 \\ &= n\, E[X]\, E[X] - n\, E[X]^2 \\ &= 0 \end{aligned}$$

So the two vectors are perpendicular, which means they are the two legs of a right triangle.

Then by PT (which holds in $\mathbb{R}^n$), the sum of the squares of the lengths of the two legs equals the square of the hypotenuse.

By the same algebra used in the boring algebraic proof at the top, we showed that $E[X^2]$ gives the square of the hypotenuse vector:

$\|X - E[X]\mathbf{1}\|^2 + \|E[X]\mathbf{1}\|^2 = \ldots = \|X\|^2$, where squaring is the dot product (and the mean leg is really $E[X]\mathbf{1}$); dividing through by $n$, $\tfrac{1}{n}\|X - E[X]\mathbf{1}\|^2$ is $\operatorname{Var}(X)$ and $\tfrac{1}{n}\|X\|^2$ is $E[X^2]$.
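This right triangle in $\mathbb{R}^n$ can be checked with the ordinary dot product and an arbitrary four-item sample (my own numbers):

```python
# Treat a sample of n numbers as a vector in R^n and check the right triangle:
# legs E[X]·1 and X - E[X]·1, hypotenuse X, under the ordinary dot product.
xs = [2.0, 3.0, 5.0, 10.0]
n = len(xs)
mean = sum(xs) / n

leg_mean = [mean] * n               # E[X]·1
leg_dev  = [x - mean for x in xs]   # X - E[X]·1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

print(dot(leg_mean, leg_dev))   # 0.0: the legs are perpendicular
print(dot(leg_dev, leg_dev) + dot(leg_mean, leg_mean) == dot(xs, xs))  # True
print(dot(leg_dev, leg_dev) / n)   # 9.5, the (population) variance
```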

The interesting part about this interpretation is the conversion from a sample of $n$ items from a univariate distribution to a vector space of $n$ dimensions. This is similar to $n$ bivariate samples being interpreted as really two samples in $n$ variables.

In one sense that is enough: the right triangle pops out of the vectors, with $E[X^2]$ as the hypotenuse. We gave an interpretation (vectors) for these values and showed they correspond. That's cool enough, but unenlightening either statistically or geometrically. It wouldn't really say why, and it would be a lot of extra conceptual machinery to, in the end, mostly reproduce the purely algebraic proof we already had at the beginning.

Another interesting part is that the mean and variance, though they intuitively measure center and spread in one dimension, are orthogonal in $n$ dimensions. What does it mean that they're orthogonal? I don't know! Are there other moments that are orthogonal? Is there a larger system of relations that includes this orthogonality? Central moments vs. non-central moments? I don't know!


I am also interested in an interpretation/intuition behind the superficially similar bias variance tradeoff equation. Does anybody have hints there?
Mitch

Let $p_i$ be the probability of state $i$ occurring. If $p_i = \frac{1}{n}$ then $\sum_i p_i X_i Y_i = \frac{1}{n}\sum_i X_i Y_i$; that is, $E[XY]$ is simply the dot product between $X$ and $Y$ divided by $n$. What I used as an inner product ($E[XY] = \sum_i p_i X_i Y_i$) is basically the dot product divided by $n$. This whole Pythagorean interpretation still needs you to use the particular inner product $E[XY]$ (though it's algebraically close to the classic dot product for a probability measure $P$ such that $p_i = \frac{1}{n}$).
Matthew Gunn

Btw, the trick @YBE did is to define new vectors $\hat{x}$ and $\hat{y}$ such that $\hat{x}_i = x_i\sqrt{p_i}$ and $\hat{y}_i = y_i\sqrt{p_i}$. Then the dot product $\hat{x}\cdot\hat{y} = \sum_i x_i\sqrt{p_i}\, y_i\sqrt{p_i} = \sum_i p_i x_i y_i = E[xy]$. The dot product of $\hat{x}$ and $\hat{y}$ corresponds to $E[xy]$ (which is what I used as an inner product).
Matthew Gunn
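The rescaling trick just described can be demoed in a few lines (the probabilities and values below are arbitrary illustrative choices): multiplying components by $\sqrt{p_i}$ makes the ordinary Euclidean dot product reproduce the probability-weighted inner product $E[XY]$.

```python
import math

# Rescaling: xhat_i = x_i * sqrt(p_i) turns the ordinary dot product
# into the probability-weighted inner product E[XY]. Arbitrary numbers:
p = [0.2, 0.5, 0.3]
x = [1.0, 4.0, 2.0]
y = [3.0, 0.0, 5.0]

xhat = [xi * math.sqrt(pi) for xi, pi in zip(x, p)]
yhat = [yi * math.sqrt(pi) for yi, pi in zip(y, p)]

E_xy   = sum(pi * xi * yi for pi, xi, yi in zip(p, x, y))   # E[XY]
dot_hh = sum(a * b for a, b in zip(xhat, yhat))             # xhat · yhat

print(abs(E_xy - dot_hh) < 1e-12)   # True: the two inner products agree
```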