Intuition (geometric or otherwise)


19

Consider the basic variance identity:

$$\operatorname{Var}(X) = E\left[(X - E[X])^2\right] = \dots = E[X^2] - (E[X])^2$$

It is a simple algebraic manipulation converting the definition in terms of a central moment into non-central moments.

It allows convenient manipulation of $\operatorname{Var}(X)$ in other contexts. It also makes it possible to compute the variance in a single pass over the data instead of two passes, first to compute the mean and then to compute the variance.
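The single-pass point can be sketched in a few lines of Python (the function name and the sample data are my own, illustrative choices): accumulate the count, the sum, and the sum of squares in one sweep, then apply the identity. (Numerically, a compensated algorithm such as Welford's is preferred when the mean is large relative to the spread, but this shows the identity at work.)

```python
def one_pass_variance(xs):
    """Variance via Var(X) = E[X^2] - (E[X])^2, in a single pass:
    accumulate the count, the sum, and the sum of squares together."""
    n = s = s2 = 0
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    mean = s / n
    return s2 / n - mean * mean  # population variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(one_pass_variance(data))  # 4.0
```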

But what does it mean? To me there is no immediate geometric intuition relating spread about the mean to spread about 0. Since $X$ is a set of points in one dimension, how do you view the spread about the mean as the difference between the spread about the origin and the square of the mean?

Are there any good linear-algebra interpretations, or physical ones, or others, that would give insight into this identity?


7
Hint: this is the Pythagorean theorem.
whuber

1
@Matthew I wonder what "$E$" is supposed to mean. I suspect it is not the expectation but merely shorthand for the arithmetic mean. Otherwise the equations would be incorrect (and nearly meaningless, since they would then equate random variables with numbers).
whuber

2
@whuber Since inner products introduce the idea of distances and angles, and the inner product on the vector space of real-valued random variables is defined as $E[XY]$ (?), I wonder whether some geometric intuition could be given via the triangle inequality. I have no idea how to proceed, but I was wondering whether it makes sense.
Antoni Parellada,

1
@Antoni The triangle inequality is too general. An inner product is a much more special object. Fortunately, the appropriate geometric intuition is exactly that of Euclidean geometry. Moreover, even for random variables $X$ and $Y$, the necessary geometry can be confined to the two-dimensional real vector space generated by $X$ and $Y$: that is, to the Euclidean plane itself. In the present case $X$ does not appear to be an RV: it is just an $n$-vector. Here the space spanned by $X$ and $(1, 1, \ldots, 1)$ is the Euclidean plane in which all the geometry takes place.
whuber

3
Setting $\hat\beta_1 = 0$ in the answer I linked, and then dividing all terms by $n$ (if you like), gives the full algebraic solution for the variance: there is no reason to copy it out anew. That is because $\hat\beta_0$ is the arithmetic mean of $y$, whence $\|y - \hat y\|^2/n$ is just $n$ths of the variance as you have defined it here, $\|\hat y\|^2/n$ is the square of the arithmetic mean, and $\|y\|^2/n$ is the arithmetic mean of the squared values.
whuber

Answers:


21

Expanding on @whuber's point in the comments, if $Y$ and $Z$ are orthogonal, you have the Pythagorean theorem:

$$\|Y\|^2 + \|Z\|^2 = \|Y + Z\|^2$$

Observe that $\langle Y, Z \rangle \equiv E[YZ]$ is a valid inner product and that $\|Y\| = \sqrt{E[Y^2]}$ is the norm induced by that inner product.

Let $X$ be some random variable. Let $Y = E[X]$, and let $Z = X - E[X]$. If $Y$ and $Z$ are orthogonal:

$$\begin{aligned} \|Y\|^2 + \|Z\|^2 &= \|Y + Z\|^2 \\ E\!\left[E[X]^2\right] + E\!\left[(X - E[X])^2\right] &= E[X^2] \\ E[X]^2 + \operatorname{Var}[X] &= E[X^2] \end{aligned}$$


And it's easy to show that $Y = E[X]$ and $Z = X - E[X]$ are orthogonal under this inner product:

$$\langle Y, Z \rangle = E\!\left[E[X]\,(X - E[X])\right] = E[X]^2 - E[X]^2 = 0$$

One of the legs of the triangle is $X - E[X]$, the other leg is $E[X]$, and the hypotenuse is $X$. And the Pythagorean theorem can be applied because a demeaned random variable is orthogonal to its mean.
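The argument above can be sanity-checked numerically for any finite discrete distribution (the values and probabilities below are made up for illustration, and `E` is my own helper for the expectation): the inner product $\langle Y, Z \rangle = E[YZ]$ vanishes, and the Pythagorean identity holds.

```python
# Values and probabilities of a small discrete RV (arbitrary example).
vals  = [1.0, 3.0, 7.0]
probs = [0.5, 0.3, 0.2]

def E(f):
    """Expectation of f(X) under the distribution above."""
    return sum(p * f(v) for v, p in zip(vals, probs))

mean = E(lambda x: x)                         # E[X]
inner_YZ = E(lambda x: mean * (x - mean))     # <Y, Z> with Y = E[X], Z = X - E[X]
var      = E(lambda x: (x - mean) ** 2)       # ||Z||^2 = Var(X)
second   = E(lambda x: x * x)                 # ||X||^2 = E[X^2]

print(abs(inner_YZ) < 1e-9)                 # True: Y and Z are orthogonal
print(abs(mean**2 + var - second) < 1e-9)   # True: the Pythagorean identity
```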


Technical remark:

$Y$ in this example really should be the vector $Y = E[X]\,\mathbf{1}$, that is, the scalar $E[X]$ times the constant vector $\mathbf{1}$ (e.g. $\mathbf{1} = [1, 1, 1, \ldots, 1]$ in the discrete, finite outcome case). $Y$ is the vector projection of $X$ onto the constant vector $\mathbf{1}$.

Simple Example

Consider the case where $X$ is a Bernoulli random variable with $p = .2$. We have:

$$X = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad P = \begin{bmatrix} .2 \\ .8 \end{bmatrix} \quad E[X] = \sum_i P_i X_i = .2$$

$$Y = E[X]\,\mathbf{1} = \begin{bmatrix} .2 \\ .2 \end{bmatrix} \quad Z = X - E[X] = \begin{bmatrix} .8 \\ -.2 \end{bmatrix}$$

And the picture is: [figure: the vectors $X$ (yellow), $Y$ (blue), and $Z$ (red) drawn in the plane]

The squared magnitude of the red vector is the variance of $X$, the squared magnitude of the blue vector is $E[X]^2$, and the squared magnitude of the yellow vector is $E[X^2]$.

REMEMBER though that these magnitudes, the orthogonality, etc. aren't with respect to the usual dot product $\sum_i Y_i Z_i$ but the inner product $\sum_i P_i Y_i Z_i$. The squared magnitude of the yellow vector isn't 1; it is .2.

The red vector $Z = X - E[X]$ and the blue vector $Y = E[X]\,\mathbf{1}$ are perpendicular under the inner product $\sum_i P_i Y_i Z_i$, but they aren't perpendicular in the intro, high-school geometry sense. Remember we're not using the usual dot product $\sum_i Y_i Z_i$ as the inner product!
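The numbers in this Bernoulli example can be checked directly; `ip` below is my own name for the probability-weighted inner product $\sum_i P_i u_i v_i$:

```python
# Bernoulli(p = .2): states X = [1, 0] with probabilities P = [.2, .8].
X = [1.0, 0.0]
P = [0.2, 0.8]

def ip(u, v):
    """Probability-weighted inner product sum_i P_i u_i v_i (i.e. E[UV])."""
    return sum(p * a * b for p, a, b in zip(P, u, v))

EX = ip(X, [1.0, 1.0])           # E[X] = .2
Y = [EX, EX]                     # E[X]·1, the blue vector
Z = [x - EX for x in X]          # X - E[X] = [.8, -.2], the red vector

print(abs(ip(Y, Z)) < 1e-12)     # True: perpendicular under this inner product
print(round(ip(Y, Y), 12))       # 0.04 = E[X]^2
print(round(ip(Z, Z), 12))       # 0.16 = Var(X)
print(round(ip(X, X), 12))       # 0.2  = E[X^2]
```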


That is really good!
Antoni Parellada

1
Good answer (+1), but it lacks a figure, and also might be a bit confusing for OP because your Z is their X...
amoeba says Reinstate Monica

@MatthewGunn, great answer. you can check my answer below for a representation where orthogonality is in the Euclidean sense.
YBE

I hate to be obtuse, but I'm having trouble keeping $Z$, $\operatorname{Var}(X)$, and the direction of the logic straight ('because' comes at places that don't make sense to me). It feels like a lot of (well substantiated) facts are stated randomly. What space is the inner product in? Why $\mathbf{1}$?
Mitch

@Mitch The logical order is: (1) Observe that a probability space defines a vector space; we can treat random variables as vectors. (2) Define the inner product of random variables $Y$ and $Z$ as $E[YZ]$. In an inner product space, vectors $Y$ and $Z$ are defined as orthogonal if their inner product is zero. (3a) Let $X$ be some random variable. (3b) Let $Y = E[X]$ and $Z = X - E[X]$. (4) Observe that $Y$ and $Z$ defined this way are orthogonal. (5) Since $Y$ and $Z$ are orthogonal, the Pythagorean theorem applies. (6) By simple algebra, the Pythagorean theorem is equivalent to the identity.
Matthew Gunn

8

I will go for a purely geometric approach for a very specific scenario. Let us consider a discrete-valued random variable $X$ taking values $\{x_1, x_2\}$ with probabilities $(p_1, p_2)$. We will further assume that this random variable can be represented in $\mathbb{R}^2$ as a vector, $\vec{X} = (x_1\sqrt{p_1},\, x_2\sqrt{p_2})$. [figure: the vector $\vec{X}$ in the plane]

Notice that the squared length of $\vec{X}$ is $x_1^2 p_1 + x_2^2 p_2$, which is equal to $E[X^2]$. Thus, $\|\vec{X}\| = \sqrt{E[X^2]}$.

Since $p_1 + p_2 = 1$, the tip of the vector $\vec{X}$ actually traces an ellipse. This becomes easier to see if one reparametrizes $p_1$ and $p_2$ as $\cos^2(\theta)$ and $\sin^2(\theta)$. Hence, we have $\sqrt{p_1} = \cos(\theta)$ and $\sqrt{p_2} = \sin(\theta)$.

One way of drawing ellipses is via a mechanism called Trammel of Archimedes. As described in wiki: It consists of two shuttles which are confined ("trammelled") to perpendicular channels or rails, and a rod which is attached to the shuttles by pivots at fixed positions along the rod. As the shuttles move back and forth, each along its channel, the end of the rod moves in an elliptical path. This principle is illustrated in the figure below.

Now let us geometrically analyze one instance of this trammel when the vertical shuttle is at $A$ and the horizontal shuttle is at $B$, forming an angle of $\theta$. By construction, $|BX| = x_2$ and $|AB| = x_1 - x_2$ (here $x_1 \geq x_2$ is assumed wlog).

[figure: the trammel with shuttles $A$ and $B$, foot of the perpendicular $C$, tip $X$, and angle $\theta$]

Let us draw a line from the origin, $OC$, that is perpendicular to the rod. One can show that $|OC| = (x_1 - x_2)\sin(\theta)\cos(\theta)$. For this specific random variable,

$$\begin{aligned} \operatorname{Var}(X) &= (x_1^2 p_1 + x_2^2 p_2) - (x_1 p_1 + x_2 p_2)^2 \\ &= x_1^2 p_1 + x_2^2 p_2 - x_1^2 p_1^2 - x_2^2 p_2^2 - 2 x_1 x_2 p_1 p_2 \\ &= x_1^2 (p_1 - p_1^2) + x_2^2 (p_2 - p_2^2) - 2 x_1 x_2 p_1 p_2 \\ &= p_1 p_2 (x_1^2 - 2 x_1 x_2 + x_2^2) \\ &= \left[(x_1 - x_2)\sqrt{p_1 p_2}\right]^2 = |OC|^2 \end{aligned}$$
Therefore, the perpendicular distance $|OC|$ from the origin to the rod is actually equal to the standard deviation, $\sigma$.

If we compute the length of the segment from $C$ to $X$:

$$\begin{aligned} |CX| &= x_2 + (x_1 - x_2)\cos^2(\theta) \\ &= x_1 \cos^2(\theta) + x_2 \sin^2(\theta) \\ &= x_1 p_1 + x_2 p_2 = E[X] \end{aligned}$$

Applying the Pythagorean theorem in the triangle $OCX$, we end up with

$$E[X^2] = \operatorname{Var}(X) + E[X]^2.$$

To summarize, for a trammel that describes all possible discrete-valued random variables taking values $\{x_1, x_2\}$, $\sqrt{E[X^2]}$ is the distance from the origin to the tip of the mechanism, and the standard deviation $\sigma$ is the perpendicular distance to the rod.

Note: Notice that when $\theta$ is $0$ or $\pi/2$, $X$ is completely deterministic. When $\theta$ is $\pi/4$ we end up with maximum variance.
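The two key lengths in this construction can be verified numerically; the values $x_1 = 5$, $x_2 = 2$, and $\theta = 0.6$ below are arbitrary illustrative choices:

```python
import math

# Arbitrary two-state RV: values x1 >= x2, with p1 = cos^2(theta), p2 = sin^2(theta).
x1, x2 = 5.0, 2.0
theta = 0.6
p1, p2 = math.cos(theta) ** 2, math.sin(theta) ** 2

mean = x1 * p1 + x2 * p2
var = (x1 ** 2 * p1 + x2 ** 2 * p2) - mean ** 2

OC = (x1 - x2) * math.sin(theta) * math.cos(theta)   # perpendicular distance origin -> rod
CX = x2 + (x1 - x2) * math.cos(theta) ** 2           # segment from C to the tip

print(abs(OC ** 2 - var) < 1e-9)   # True: |OC|^2 equals Var(X)
print(abs(CX - mean) < 1e-9)       # True: |CX| equals E[X]
```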


1
+1 Nice answer. And multiplying the vector components by the square roots of the probabilities is a cool/useful trick to make the usual probabilistic notion of orthogonality look orthogonal!
Matthew Gunn

Great graphics. The symbols all make sense (the trammel describing an ellipse and then the Pythagorean Thm applies) but somehow I'm not getting intuitively how it 'magically' relates the moments (the spread and the center).
Mitch

Consider the trammel as a process that defines all the possible $\{x_1, x_2\}$-valued random variables. When the rod is horizontal or vertical you have a deterministic RV. In between there is randomness, and it turns out that in my proposed geometric framework, how random an RV is (its std) is exactly measured by the distance of the rod to the origin. There might be a deeper relationship here, as elliptic curves connect various objects in math, but I am not a mathematician so I cannot really see that connection.
YBE

3

You can rearrange as follows:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2 \;\Longleftrightarrow\; E[X^2] = (E[X])^2 + \operatorname{Var}(X)$$

Then, interpret as follows: the expected square of a random variable is equal to the square of its mean plus the expected squared deviation from its mean.


Oh. Huh. Simple. But the squares still seem kinda uninterpreted. I mean it makes sense (sort of, extremely loosely) without the squares.
Mitch

3
I am not sold on this.
Michael R. Chernick

1
If the Pythagorean theorem applies, what is the triangle with what sides and how are the two legs perpendicular?
Mitch

1

Sorry for not having the skill to elaborate and provide a proper answer, but I think the answer lies in the classical-mechanics concept of moments, especially the conversion between "raw" moments centred at 0 and mean-centred central moments (cf. the parallel axis theorem). Bear in mind that variance is the second-order central moment of a random variable.


1

The general intuition is that you can relate these moments using the Pythagorean Theorem (PT) in a suitably defined vector space, by showing that two of the moments are perpendicular and the third is the hypotenuse. The only algebra needed is to show that the two legs are indeed orthogonal.

For the sake of the following I'll assume you meant sample means and variances for computation purposes rather than moments for full distributions. That is:

$$\begin{aligned} E[X] &= \tfrac{1}{n}\textstyle\sum x_i, &&\text{mean, first sample moment} \\ E[X^2] &= \tfrac{1}{n}\textstyle\sum x_i^2, &&\text{second sample moment (noncentral)} \\ \operatorname{Var}(X) &= \tfrac{1}{n}\textstyle\sum (x_i - E[X])^2, &&\text{variance, second central sample moment} \end{aligned}$$

(where all sums are over $n$ items).

For reference, the elementary proof of $\operatorname{Var}(X) = E[X^2] - E[X]^2$ is just symbol pushing:

$$\begin{aligned} \operatorname{Var}(X) &= \tfrac{1}{n}\textstyle\sum (x_i - E[X])^2 \\ &= \tfrac{1}{n}\textstyle\sum \left(x_i^2 - 2 E[X] x_i + E[X]^2\right) \\ &= \tfrac{1}{n}\textstyle\sum x_i^2 - \tfrac{2}{n} E[X] \textstyle\sum x_i + \tfrac{1}{n}\textstyle\sum E[X]^2 \\ &= E[X^2] - 2 E[X]^2 + \tfrac{1}{n}\, n\, E[X]^2 \\ &= E[X^2] - E[X]^2 \end{aligned}$$

There's little meaning here, just elementary manipulation of algebra. One might notice that $E[X]$ is a constant inside the summation, but that is about it.

Now in the vector space/geometrical interpretation/intuition, what we'll show is the slightly rearranged equation that corresponds to PT, that

$$\operatorname{Var}(X) + E[X]^2 = E[X^2]$$

So consider $X$, the sample of $n$ items, as a vector in $\mathbb{R}^n$. And let's create two vectors, $E[X]\mathbf{1}$ and $X - E[X]\mathbf{1}$.

The vector $E[X]\mathbf{1}$ has the mean of the sample as every one of its coordinates.

The vector $X - E[X]\mathbf{1}$ is $(x_1 - E[X], \ldots, x_n - E[X])$.

These two vectors are perpendicular because the dot product of the two vectors turns out to be 0:

$$\begin{aligned} E[X]\mathbf{1} \cdot (X - E[X]\mathbf{1}) &= \textstyle\sum E[X]\,(x_i - E[X]) \\ &= \textstyle\sum \left(E[X]\, x_i - E[X]^2\right) \\ &= E[X]\textstyle\sum x_i - \textstyle\sum E[X]^2 \\ &= n\, E[X]\, E[X] - n\, E[X]^2 \\ &= 0 \end{aligned}$$

So the two vectors are perpendicular, which means they are the two legs of a right triangle.

Then by PT (which holds in $\mathbb{R}^n$), the sum of the squares of the lengths of the two legs equals the square of the hypotenuse.

By the same algebra used in the boring algebraic proof at the top, we showed that $E[X^2]$ gives the square of the hypotenuse vector:

$\|X - E[X]\mathbf{1}\|^2 + \|E[X]\mathbf{1}\|^2 = \ldots = \|X\|^2$, where squaring is the dot product (and the mean leg is really $E[X]\mathbf{1}$); dividing through by $n$, $\tfrac{1}{n}\|X - E[X]\mathbf{1}\|^2$ is $\operatorname{Var}(X)$ and $\tfrac{1}{n}\|X\|^2$ is $E[X^2]$.
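This right triangle in $\mathbb{R}^n$ can be checked with the ordinary dot product and an arbitrary four-item sample (my own numbers):

```python
# Treat a sample of n numbers as a vector in R^n and check the right triangle:
# legs E[X]·1 and X - E[X]·1, hypotenuse X, under the ordinary dot product.
xs = [2.0, 3.0, 5.0, 10.0]
n = len(xs)
mean = sum(xs) / n

leg_mean = [mean] * n               # E[X]·1
leg_dev  = [x - mean for x in xs]   # X - E[X]·1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

print(dot(leg_mean, leg_dev))   # 0.0: the legs are perpendicular
print(dot(leg_dev, leg_dev) + dot(leg_mean, leg_mean) == dot(xs, xs))  # True
print(dot(leg_dev, leg_dev) / n)   # 9.5, the (population) variance
```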

The interesting part about this interpretation is the conversion from a sample of $n$ items from a univariate distribution to a vector space of $n$ dimensions. This is similar to $n$ bivariate samples being interpreted as really two samples in $n$ variables.

In one sense that is enough: the right triangle pops out of the vectors, with $E[X^2]$ as the hypotenuse. We gave an interpretation (vectors) for these values and showed they correspond. That's cool enough, but unenlightening either statistically or geometrically. It wouldn't really say why, and it would be a lot of extra conceptual machinery to, in the end, mostly reproduce the purely algebraic proof we already had at the beginning.

Another interesting part is that the mean and variance, though they intuitively measure center and spread in one dimension, are orthogonal in $n$ dimensions. What does it mean that they're orthogonal? I don't know! Are there other moments that are orthogonal? Is there a larger system of relations that includes this orthogonality? Central moments vs. non-central moments? I don't know!


I am also interested in an interpretation/intuition behind the superficially similar bias variance tradeoff equation. Does anybody have hints there?
Mitch

Let $p_i$ be the probability of state $i$ occurring. If $p_i = \frac{1}{n}$ then $\sum_i p_i X_i Y_i = \frac{1}{n}\sum_i X_i Y_i$; that is, $E[XY]$ is simply the dot product between $X$ and $Y$ divided by $n$. What I used as an inner product ($E[XY] = \sum_i p_i X_i Y_i$) is basically the dot product divided by $n$. This whole Pythagorean interpretation still needs you to use the particular inner product $E[XY]$ (though it's algebraically close to the classic dot product for a probability measure $P$ such that $p_i = \frac{1}{n}$).
Matthew Gunn

Btw, the trick @YBE did is to define new vectors $\hat{x}$ and $\hat{y}$ such that $\hat{x}_i = x_i\sqrt{p_i}$ and $\hat{y}_i = y_i\sqrt{p_i}$. Then the dot product $\hat{x}\cdot\hat{y} = \sum_i x_i\sqrt{p_i}\, y_i\sqrt{p_i} = \sum_i p_i x_i y_i = E[xy]$. The dot product of $\hat{x}$ and $\hat{y}$ corresponds to $E[xy]$ (which is what I used as an inner product).
Matthew Gunn
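The rescaling trick just described can be demoed in a few lines (the probabilities and values below are arbitrary illustrative choices): multiplying components by $\sqrt{p_i}$ makes the ordinary Euclidean dot product reproduce the probability-weighted inner product $E[XY]$.

```python
import math

# Rescaling: xhat_i = x_i * sqrt(p_i) turns the ordinary dot product
# into the probability-weighted inner product E[XY]. Arbitrary numbers:
p = [0.2, 0.5, 0.3]
x = [1.0, 4.0, 2.0]
y = [3.0, 0.0, 5.0]

xhat = [xi * math.sqrt(pi) for xi, pi in zip(x, p)]
yhat = [yi * math.sqrt(pi) for yi, pi in zip(y, p)]

E_xy   = sum(pi * xi * yi for pi, xi, yi in zip(p, x, y))   # E[XY]
dot_hh = sum(a * b for a, b in zip(xhat, yhat))             # xhat · yhat

print(abs(E_xy - dot_hh) < 1e-12)   # True: the two inner products agree
```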