Why isn't the denominator of the covariance estimator n-2 rather than n-1?

36

The denominator of the (unbiased) variance estimator is $n-1$ because there are $n$ observations and only one parameter is being estimated.

$\mathbb{V}\left(X\right)=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}{n-1}$

By the same reasoning, I wonder why the denominator of the covariance shouldn't be $n-2$, given that two parameters are being estimated.

$\mathbb{Cov}\left(X, Y\right)=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(Y_{i}-\overline{Y}\right)}{n-1}$

— MYaseen208

15

If you did that, you would have two conflicting definitions of the variance: one would be the first formula, and the other would be the second formula applied with $Y=X$.

— whuber

3

A bivariate/multivariate mean (expectation) is one parameter, not two.

— ttnphns

14

@ttnphns That's not right: the bivariate mean is obviously two parameters, because two real numbers are needed to express it. (Indeed, it is a single vector parameter, but saying so merely hides the fact that it has two components.) This shows up clearly in the degrees of freedom for pooled-variance t-tests, for instance, where $2$ is subtracted rather than $1$. What is interesting about this question is how it reveals how vague, unrigorous, and potentially misleading the common "explanation" is that we subtract $1$ from $n$ because one parameter has been estimated.

— whuber

@whuber, you are right. If only $n$ (the number of independent observations) mattered, we would not spend more df on multivariate tests than on univariate ones.

— ttnphns

3

@whuber: Perhaps I would say that it shows that what counts as a "parameter" depends on the situation. In this case the variance is computed over $n$ observations, so each observation (or the overall mean) can be seen as one parameter, even if it is a multivariate mean, as ttnphns said. In other cases, however, when e.g. a test considers linear combinations of the dimensions, each dimension of each observation becomes a "parameter". You are right that this is a tricky matter.

— amoeba says Reinstate Monica

31

Covariances are variances.

By the polarization identity

$\text{Cov}(X,Y) = \text{Var}\left(\frac{X+Y}{2}\right) - \text{Var}\left(\frac{X-Y}{2}\right),$

the denominators must be the same.
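A small numerical check of the polarization identity may make this concrete. The sketch below assumes NumPy; the data and the helper name `polarized_cov` are made up for illustration. It computes the two variances with a chosen `ddof` and compares their difference with the usual sample covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 0.5 * x + rng.normal(size=20)

def polarized_cov(x, y, ddof):
    """Var((x+y)/2) - Var((x-y)/2), both variances using denominator n - ddof."""
    return np.var((x + y) / 2, ddof=ddof) - np.var((x - y) / 2, ddof=ddof)

# Sample covariance with denominator n - 1 (NumPy's default for np.cov).
sample_cov = np.cov(x, y)[0, 1]

# The identity holds exactly when both variances use the same denominator.
via_n1 = polarized_cov(x, y, ddof=1)
# With denominator n, it reproduces the covariance with denominator n.
via_n0 = polarized_cov(x, y, ddof=0)

print(sample_cov, via_n1, via_n0)
```

Whatever denominator is chosen for the variance is therefore inherited by the covariance; choosing $n-1$ for one forces $n-1$ for the other.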

— whuber

20

A special case should give you some intuition; consider the following:

$\hat{\mathbb{Cov}}\left(X, X\right)= \hat{\mathbb{V}}\left(X\right)$

You are happy that the latter is $\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}{n-1}$ because of Bessel's correction.

But replacing $Y$ by $X$ in $\hat{\mathbb{Cov}}\left(X, Y\right)$ for the former gives $\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(X_{i}-\overline{X}\right)}{\text{mystery denominator}}$, so what do you think might best fill in the blank?
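To see the special case numerically, here is a minimal sketch assuming NumPy. The data are made up and `cov_with_denominator` is a hypothetical helper written for this example. It compares the Bessel-corrected variance with the covariance of `x` with itself under the two candidate denominators.

```python
import numpy as np

def cov_with_denominator(x, y, denom):
    """Sum of cross-products of deviations from the means, divided by `denom`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / denom

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
n = len(x)

bessel_var = np.var(x, ddof=1)               # variance with denominator n - 1
cov_n1 = cov_with_denominator(x, x, n - 1)   # Cov-hat(X, X) with n - 1
cov_n2 = cov_with_denominator(x, x, n - 2)   # Cov-hat(X, X) with n - 2

print(bessel_var, cov_n1, cov_n2)
```

Only the $n-1$ version agrees with the variance; an $n-2$ denominator would give a second, conflicting value for the same quantity.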

— Silverfish

1

OK. But the OP may ask: "Why consider cov(X, X) and cov(X, Y) as part of one line of reasoning? Why do you so casually replace Y by X in cov()? Maybe cov(X, Y) is a different situation?" You did not pre-empt that, while the answer (very highly upvoted) should have, in my opinion.

7

Quick and dirty answer... Let's first consider $\text{var}(X)$; if you had $n$ observations with known expected value $E(X) = 0$, you would use ${1\over n}\sum_{i=1}^n X_i^2$ to estimate the variance.

Since the expected value is unknown, you can transform your $n$ observations into $n-1$ observations with known expected value by taking $A_i = X_i - X_1$ for $i = 2, \dots,n$. You get a formula with $n-1$ in the denominator; however, the $A_i$ are not independent and you have to take this into account. In the end you will find the usual formula.

Now for the covariance you can use the same idea: if the expected value of $(X,Y)$ were $(0,0)$, you would have ${1\over n}$ in the formula. By subtracting $(X_1,Y_1)$ from all the other observed values, you get $n-1$ observations with known expected value... and ${1\over n-1}$ in the formula. Once again, this introduces some dependence that has to be taken into account.

P.S. One way to do this properly is to choose an orthonormal basis of $\big\langle (1, \dots, 1)' \big\rangle^{\perp}$, that is, $n-1$ vectors $c_1, \dots, c_{n-1} \in \mathbb R^n$ such that

$\sum_j c_{ij}^2 = 1$ for all $i$,
$\sum_j c_{ij} = 0$ for all $i$,
$\sum_j c_{i_1j} c_{i_2j} = 0$ for all $i_1 \ne i_2$.

You can then define $n-1$ variables $A_i = \sum_j c_{ij} X_j$ and $B_i = \sum_j c_{ij} Y_j$. The $(A_i,B_i)$ are uncorrelated across $i$ (and independent if the data are jointly normal), have expected value $(0,0)$, and have the same variance/covariance as the original variables.

The point is that if you want to get rid of the unknown expectation, you give up one (and only one) observation. This works the same way in both cases.
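The orthonormal-basis construction can be sketched numerically. The example below assumes NumPy and builds the vectors $c_i$ with a QR decomposition. That is one convenient choice made for this illustration, not something the answer prescribes, and the data are made up. It then checks that $\frac{1}{n-1}\sum_i A_i B_i$ reproduces the usual sample covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = rng.normal(loc=3.0, size=n)
y = 0.7 * x + rng.normal(loc=-1.0, size=n)

# QR of [1 | e_1 ... e_{n-1}]: the first column of Q is proportional to
# (1, ..., 1)', so the remaining n - 1 columns form an orthonormal basis
# of its orthogonal complement.
M = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
Q, _ = np.linalg.qr(M)
C = Q[:, 1:].T            # rows are c_1, ..., c_{n-1}

A = C @ x                 # A_i = sum_j c_ij X_j
B = C @ y                 # B_i = sum_j c_ij Y_j

# n - 1 zero-mean pairs, so the "known mean" formula divides by n - 1.
cov_from_basis = np.sum(A * B) / (n - 1)

print(cov_from_basis, np.cov(x, y)[0, 1])
```

The agreement is exact (up to rounding) because $C'C = I - \frac{1}{n}\mathbf{1}\mathbf{1}'$, the centering matrix, so $\sum_i A_i B_i = \sum_j (X_j - \bar X)(Y_j - \bar Y)$.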

— Elvis

6

Here is a proof that the p-variate sample covariance estimator with denominator $\frac{1}{n-1}$ is an unbiased estimator of the covariance matrix:

$x' = (x_1,...,x_p)$.

$\Sigma= E((x-\mu)(x-\mu)')$

$S = \frac{1}{n} \sum (x_i - \bar{x})(x_i - \bar{x})'$

To show: $E(S) = \frac{n-1}{n}\Sigma$

Proof: $S= \frac{1}{n}\sum x_ix_i' - \bar{x}\bar{x}'$

(1) $E(x_ix_i') = \Sigma+ \mu\mu'$

(2) $E(\bar{x}\bar{x}') = \frac{1}{n} \Sigma+ \mu\mu'$

Therefore: $E(S) = \Sigma + \mu\mu' - (\frac{1}{n} \Sigma+ \mu\mu') = \frac{n-1}{n} \Sigma$

And so $S_u = \frac{n}{n-1}S$ , with the final denominator $\frac{1}{n-1}$ , is unbiased. The off-diagonal elements of $S_u$ are your individual sample covariances.
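The result can also be checked by simulation. The following is a rough Monte Carlo sketch assuming NumPy; the values of `mu`, `Sigma`, `n`, and `reps` are arbitrary choices for this example, and normality is used only for convenience in drawing samples. It averages $S$ over many samples and compares the average with $\frac{n-1}{n}\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# draws[r, i, :] is the i-th observation of the r-th simulated sample.
draws = rng.multivariate_normal(mu, Sigma, size=(reps, n))
centered = draws - draws.mean(axis=1, keepdims=True)

# S with denominator n, one p x p matrix per simulated sample.
S = np.einsum("rni,rnj->rij", centered, centered) / n

mean_S = S.mean(axis=0)             # approximates E(S) = (n-1)/n * Sigma
mean_Su = n / (n - 1) * mean_S      # approximates E(S_u) = Sigma

print(mean_S)
print(mean_Su)
```

The off-diagonal entry of `mean_Su` should land close to 0.8, the true covariance, which is what unbiasedness of the $n-1$ version predicts.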

Additional remarks:

The n draws are independent. This is used in (2) to calculate the covariance of the sample mean.
Step (1) and (2) use the fact that $Cov(x)= E[xx']-\mu\mu'$
Step (2) uses the fact that $Cov(\bar{x})= \frac{1}{n}\Sigma$
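For readers who want step (2) spelled out, one possible derivation is sketched below. It writes $\operatorname{Cov}$ for the covariance matrix, lets $x_i$ denote the $i$-th draw, and assumes independent draws with common mean $\mu$ and covariance $\Sigma$, so that all cross-covariances between different draws vanish.

```latex
\begin{aligned}
\operatorname{Cov}(\bar{x})
  &= \operatorname{Cov}\Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big)
   = \frac{1}{n^{2}} \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(x_i, x_j) \\
  &= \frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Cov}(x_i, x_i)
   = \frac{1}{n^{2}} \cdot n\,\Sigma
   = \frac{1}{n}\,\Sigma, \\[4pt]
E(\bar{x}\bar{x}')
  &= \operatorname{Cov}(\bar{x}) + E(\bar{x})\,E(\bar{x})'
   = \frac{1}{n}\,\Sigma + \mu\mu'.
\end{aligned}
```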

— statchrist

The difficulty being in step 2 ! :)

— Elvis

@Elvis It's messy. One needs to apply the rule Cov(X+Y,Z)=Cov(X,Z) + Cov(Y,Z) and recognize that the different draws are independent. Then it's basically summing up the covariance n times and scaling it down by 1/n²

— statchrist

4

I guess one way to build intuition for using 'n-1' and not 'n-2' is that, to calculate the covariance, we do not need to de-mean both X and Y, only one of the two, i.e.

$\sum (X-\mu_x)(Y - \mu_y) = \sum (X-\mu_x)Y \quad \text{or} \quad \sum (Y-\mu_y)X$

(where $\mu_x$ and $\mu_y$ denote the sample means).
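A quick numerical check of this identity, assuming NumPy and made-up data: de-meaning both series, only `x`, or only `y` gives the same sum of cross-products, as long as the sample means are the ones subtracted.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=15)
y = x + rng.normal(size=15)

dx = x - x.mean()   # deviations of x from its sample mean
dy = y - y.mean()   # deviations of y from its sample mean

both = np.sum(dx * dy)     # de-mean both series
only_x = np.sum(dx * y)    # de-mean x only
only_y = np.sum(dy * x)    # de-mean y only

print(both, only_x, only_y)
```

The three sums agree because the deviations from a sample mean add up to zero, so the extra term $\bar{Y}\sum (X_i - \bar{X})$ vanishes.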

— Uditg_ucla

Could you elaborate on how this bears on the question of what denominator to use? The algebraic relation in evidence derives from the fact that the residuals relative to the mean sum to zero, but otherwise is silent about which denominator is relevant.

— whuber

5

I came here because I had the same question as the OP. I think this answer gets at the nub of the point @whuber pointed out above: that the rule of thumb is that df ~= n - (parameters estimated) can be "vague, unrigorous, and potentially misleading." This points out the fact that though it looks like you need to estimate two parameters (xbar and ybar), you really only estimate one (xbar or ybar). Since the df should be the same in both cases, it must be the lower of the two. I think that is the intent here.

— mpettis

1

1) Start $df=2n$ .

2) Sample covariance is proportional to $\Sigma_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})$ . Lose two $df$ ; one from $\bar{X}$ , one from $\bar{Y}$ resulting in $df=2(n-1)$ .

3) However, $\Sigma_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})$ only contains $n$ separate terms, one from each product. When two numbers are multiplied together the independent information from each separate number disappears.

As a trite example, consider that

$24=1*24=2*12=3*8=4*6=6*4=8*3=12*2=24*1$ ,

and that does not include irrationals and fractions, e.g. $24=2\sqrt{6}*2\sqrt{6}$ , so that when we multiply two number series together and examine their product, all we see are the $df=n-1$ from one number series, as we have lost half of the original information, that is, what those two numbers were before the pair-wise grouping into one number (i.e., multiplication) was performed.

In other words, without loss of generality we can write

$(X_i-\bar{X})(Y_i-\bar{Y})=z_i-\bar{z}$ for some $z_i$ and $\bar{z}$ ,

i.e., $z_i=X_iY_i-\bar{X}Y_i-X_i\bar{Y}$ and $\bar{z}=-\bar{X}\bar{Y}$, so that $z_i-\bar{z}$ matches the expanded product term by term. From the $z$'s, which then clearly have $df=n-1$, the covariance formula becomes

$\Sigma_{i=1}^n\frac{z_i-\bar{z}}{n-1}=$

$\Sigma_{i=1}^n\frac{[(X_i-\bar{X})(Y_i-\bar{Y})]}{n-1}=$

$\frac{1}{n-1}\Sigma_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})$ .

Thus, the answer to the question is that the $df$ are halved by grouping.

— Carl

@whuber How on earth did I get the same thing posted twice and deleted once? What gives? Can we get rid of one of them? For future reference, is there any way to permanently delete such duplicates? I have a few hanging around and it's annoying.

— Carl

As far as I can tell, you reposted your answer from the duplicate to here. (Nobody else has the power to post answers in your name.) The system strongly discourages posting identical answers in multiple threads, so when I saw that, it convinced me these two threads are perfect duplicates and I "merged" them. This is a procedure that moves all comments and answers from the source thread to the target thread. I then deleted your duplicate post here in the target thread. It will remain permanently deleted, but will be visible to you as well as to people of sufficiently high reputation.

— whuber

@whuber I didn't know what happens in a merge, that a merge was taking place or what many of the rules are, despite looking things up constantly. It takes time to learn, be patient, BTW, would you consider taking stats.stackexchange.com/questions/251700/… off of Hold?

— Carl