What is the distribution of $R^2$ in linear regression under the null hypothesis? Why is its mode not at zero when $k>3$?

What is the distribution of the coefficient of determination, or R squared, $R^2$, in linear univariate multiple regression under the null hypothesis $H_0:\beta=0$?

How does it depend on the number of predictors $k$ and the number of samples $n>k$? Is there a closed-form expression for the mode of this distribution?

In particular, I have a feeling that for simple regression (with one predictor $x$) this distribution has its mode at zero, but for multiple regression the mode is at a non-zero positive value. If this is indeed true, is there an intuitive explanation of this "phase transition"?

Update

As @Alecos showed below, the distribution indeed peaks at zero when $k=2$ and $k=3$, and not at zero when $k>3$. I feel that there should be a geometrical view of this phase transition. Consider the geometrical view of OLS: $\mathbf y$ is a vector in $\mathbb R^n$, and $\mathbf X$ defines a $k$-dimensional subspace there. OLS amounts to projecting $\mathbf y$ onto this subspace, and $R^2$ is the squared cosine of the angle between $\mathbf y$ and its projection $\hat{\mathbf y}$.

Now, from @Alecos's answer it follows that if all vectors are random, then the probability distribution of this angle will peak at $90^\circ$ for $k=2$ and $k=3$, but will have a mode at some other value $<90^\circ$ for $k>3$. Why?!
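The geometric statement is easy to check numerically. Here is a minimal R sketch, assuming the model contains an intercept, so that the angle is measured after centring both $\mathbf y$ and $\hat{\mathbf y}$ at the sample mean. The simulated data, the seed and all variable names are illustrative only.

```r
# Sketch: R^2 as the squared cosine of the angle between y and its projection.
# Assumption: the model has an intercept, so both vectors are centred first.
set.seed(1)                       # arbitrary seed, for reproducibility only
n  <- 20
x1 <- rnorm(n); x2 <- rnorm(n)    # two pure-noise predictors (the null is true)
y  <- rnorm(n)

fit  <- lm(y ~ x1 + x2)
yc   <- y - mean(y)               # centred response
yhat <- fitted(fit) - mean(y)     # centred projection (fitted values share the mean of y)

cos2 <- sum(yc * yhat)^2 / (sum(yc^2) * sum(yhat^2))   # squared cosine of the angle
r2   <- summary(fit)$r.squared
angle_deg <- acos(sqrt(cos2)) * 180 / pi               # the angle itself, in degrees
```

Here `cos2` and `r2` should agree up to floating-point error, and `angle_deg` is the angle whose distribution is being asked about.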

Update 2: I am accepting @Alecos's answer, but I still have a feeling that I am missing some important insight here. If anybody ever suggests any other (geometrical or not) view of this phenomenon that would make it "obvious", I will be happy to offer a bounty.

— amoeba says Reinstate Monica

Are you willing to assume normality of the errors?

— Dimitriy V. Masterov

Yes, I guess one has to assume that for this question to be answerable (?).

— amoeba says Reinstate Monica

Have you checked this davegiles.blogspot.jp/2013/05/good-old-r-squared.html ?

— Khashaa

@Khashaa: actually, I have to admit that I found that blogspot page before posting my question here. To be honest, I still wanted to have a discussion of this phenomenon on our forum, so I pretended that I had not seen it.

— amoeba says Reinstate Monica

Strongly related CV question stats.stackexchange.com/questions/123651/…

— Alecos Papadopoulos

Answers:

For the specific hypothesis (that all regressor coefficients are zero, not including the constant term, which is not examined in this test) and under normality, we know (see e.g. Maddala 2001, p. 155, but note that there $k$ counts the regressors without the constant term, so the expression looks a bit different) that the statistic

$F = \frac {n-k}{k-1}\frac {R^2}{1-R^2}$

is distributed as a central $F(k-1, n-k)$ random variable.

Note that although we do not test the constant term, $k$ counts it also.

Moving things around,

$(k-1)F - (k-1)FR^2 = (n-k)R^2 \Rightarrow (k-1)F = R^2\big[(n-k) + (k-1)F\big]$

$\Rightarrow R^2 = \frac {(k-1)F}{(n-k) + (k-1)F}$

But the right-hand side follows a Beta distribution, specifically

$R^2 \sim Beta\left (\frac {k-1}{2}, \frac {n-k}{2}\right)$
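As a quick sanity check of this change of variables, here is a minimal R sketch. Since $R^2$ is an increasing function of $F$, the Beta CDF at a given $R^2$ should equal the $F$ CDF at the corresponding value of the statistic. The values of $k$ and $n$ and the grid are arbitrary choices for illustration.

```r
# Sketch: the CDF of R^2 ~ Beta((k-1)/2, (n-k)/2) should match the CDF of
# F(k-1, n-k) evaluated at the corresponding value of the F statistic.
k <- 5; n <- 99                         # illustrative values; k counts the constant
r2 <- seq(0.01, 0.5, by = 0.01)         # grid of R^2 values
Fstat <- (n - k) / (k - 1) * r2 / (1 - r2)

p_beta <- pbeta(r2, (k - 1) / 2, (n - k) / 2)
p_F    <- pf(Fstat, k - 1, n - k)
max_abs_diff <- max(abs(p_beta - p_F))  # expected to be at floating-point level
```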

The mode of this distribution is

$\text{mode}R^2 = \frac {\frac {k-1}{2}-1}{\frac {k-1}{2}+ \frac {n-k}{2}-2} =\frac {k-3}{n-5}$
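The formula comes from setting the derivative of the log density, $(\alpha-1)\log x + (\beta-1)\log(1-x)$, to zero, which gives $x = \frac{\alpha-1}{\alpha+\beta-2}$ when both shape parameters exceed one. A minimal R sketch that checks this numerically is given below; the values $k=6$, $n=50$ are arbitrary illustrative choices.

```r
# Sketch: locate the mode of Beta((k-1)/2, (n-k)/2) numerically and compare it
# with the closed form (k-3)/(n-5). Only meaningful when both shapes exceed 1.
k <- 6; n <- 50                          # illustrative values; k counts the constant
a <- (k - 1) / 2; b <- (n - k) / 2

# Maximise the log density, which is numerically better behaved in the tails.
opt <- optimize(function(x) dbeta(x, a, b, log = TRUE),
                interval = c(0, 1), maximum = TRUE)
mode_numeric <- opt$maximum
mode_formula <- (k - 3) / (n - 5)
```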

FINITE & UNIQUE MODE
From the above relation we can infer that for the distribution to have a unique and finite mode we must have

$k\geq 3, n >5$

This is consistent with the general requirement for a Beta distribution, which is

$\{\alpha >1 , \beta \geq 1\},\;\; \text {OR}\;\; \{\alpha \geq1 , \beta > 1\}$

as one can infer from this CV thread or read here.
Note that if $\{\alpha =1 , \beta = 1\}$, we obtain the Uniform distribution, so all the density points are modes (finite but not unique). This raises the question: why, if $k=3, n=5$, is $R^2$ distributed as $U(0,1)$?

IMPLICATIONS
Assume that you have $k=5$ regressors (including the constant), and $n=99$ observations. Pretty nice regression, no overfitting. Then

$R^2\Big|_{\beta=0} \sim Beta\left (2, 47\right), \text{mode}R^2 = \frac 1{47} \approx 0.021$

and the density plot is

[Figure: density plot of the $Beta(2, 47)$ distribution]

Intuition, please: this is the distribution of $R^2$ under the hypothesis that no regressor actually belongs to the regression. So a) the distribution is independent of the regressors, b) as the sample size increases, the distribution concentrates towards zero, as the increased information swamps the small-sample variability that may produce some "fit", but also c) as the number of irrelevant regressors increases for a given sample size, the distribution concentrates towards $1$, and we have the "spurious fit" phenomenon.

But also, note how "easy" it is to reject the null hypothesis: in the particular example, at $R^2=0.13$ the cumulative probability has already reached $0.99$, so an obtained $R^2>0.13$ will reject the null of "insignificant regression" at the $1$% significance level.
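The numbers in this example can be reproduced with the Beta distribution functions in base R. The following is a minimal sketch; the variable names are illustrative, and the last two lines simply cross-check the cutoff against the equivalent $F$ test.

```r
# Sketch: the k = 5, n = 99 example, where R^2 ~ Beta(2, 47) under the null.
k <- 5; n <- 99
a <- (k - 1) / 2; b <- (n - k) / 2

p_below  <- pbeta(0.13, a, b)         # P(R^2 <= 0.13) under the null
crit_1pc <- qbeta(0.99, a, b)         # R^2 cutoff for a 1% level test

# Cross-check: the same cutoff obtained from the F(k-1, n-k) critical value.
crit_F      <- qf(0.99, k - 1, n - k)
crit_from_F <- (k - 1) * crit_F / ((n - k) + (k - 1) * crit_F)
```

`p_below` should come out close to 0.99, and `crit_1pc` close to 0.13, in line with the statement above.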

ADDENDUM
To respond to the new issue regarding the mode of the $R^2$ distribution, I can offer the following line of thought (not geometrical), which links it to the "spurious fit" phenomenon: when we run least squares on a data set, we essentially solve a system of $n$ linear equations with $k$ unknowns (the only difference from high-school math is that back then we called "known coefficients" what in linear regression we call "variables/regressors", "unknown x" what we now call "unknown coefficients", and "constant terms" what we now call the "dependent variable"). As long as $k<n$ the system is over-identified and there is no exact solution, only an approximate one, and the difference emerges as "unexplained variance of the dependent variable", which is captured by $1-R^2$. If $k=n$ the system has one exact solution (assuming linear independence). In between, as we increase $k$, we reduce the "degree of over-identification" of the system and we "move towards" the single exact solution. Under this view, it makes sense why $R^2$ increases spuriously with the addition of irrelevant regressors, and consequently why its mode moves gradually towards $1$ as $k$ increases for a given $n$.
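This "moving towards the exact solution" can be seen on a single simulated data set. The sketch below is only an illustration under assumed i.i.d. standard normal data: it regresses pure noise on a growing, nested set of noise regressors and records $R^2$ for each $k$. The helper name `r2_for_k` and the seed are hypothetical choices, not part of the argument above.

```r
# Sketch: R^2 from regressing pure noise on a growing set of noise regressors.
# Assumption: i.i.d. standard normal data; the seed is arbitrary.
set.seed(42)
n <- 10
y <- rnorm(n)
Z <- matrix(rnorm(n * (n - 1)), nrow = n)   # n - 1 irrelevant regressors

r2_for_k <- function(k) {
  # k counts the constant, so use the first k - 1 columns of Z
  X   <- cbind(1, Z[, seq_len(k - 1), drop = FALSE])
  res <- lm.fit(X, y)$residuals
  1 - sum(res^2) / sum((y - mean(y))^2)
}

k_values <- 2:n
r2 <- sapply(k_values, r2_for_k)   # never decreases in k and reaches 1 at k = n
```

Because the regressor sets are nested, `r2` cannot decrease as $k$ grows, and at $k=n$ the fit is exact up to rounding error.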

— Alecos Papadopoulos

It's mathematical. For $k=2$ the first parameter of the Beta distribution ("$\alpha$" in standard notation) becomes smaller than unity. In that case the Beta distribution has no finite mode; play around with keisan.casio.com/exec/system/1180573226 to see how the shapes change.

— Alecos Papadopoulos

@Alecos Excellent answer! (+1) Can I strongly suggest you add to your answer the requirement for the mode to exist? This is usually stated as $\alpha>1$ and $\beta>1$ but, more subtly, it's okay if equality holds in one of the two... I think for our purposes this becomes $k \geq 3$ and $n \geq k + 2$, with at least one of these inequalities strict.

— Silverfish

@Khashaa Except when theory demands it, I never exclude the intercept from the regression: it is the average level of the dependent variable, regressors or no regressors (and this level is usually positive, so omitting it would be a silly self-inflicted misspecification). But I always exclude it from the F-test of the regression, since what I care about is not whether the dependent variable has a non-zero unconditional mean, but whether the regressors have any explanatory power as regards deviations from this mean.

— Alecos Papadopoulos

+1! Are there results for the distribution of $R^2$ for nonzero $\beta_j$?

— Christoph Hanck

@ChristophHanck See also davegiles.blogspot.jp/2013/05/good-old-r-squared.html

— Alecos Papadopoulos

I won't rederive the $\mathrm{Beta}(\frac{k-1}{2}, \, \frac{n-k}{2})$ distribution from @Alecos's excellent answer (it's a standard result, see here for another nice discussion), but I want to fill in more details about the consequences! Firstly, what does the null distribution of $R^2$ look like for a range of values of $n$ and $k$? The graph in @Alecos's answer is quite representative of what occurs in practical multiple regressions, but sometimes insight is gleaned more easily from smaller cases. I've included the mean, mode (where it exists) and standard deviation. The graph/table deserves a good eyeball: it is best viewed at full size. I could have included fewer facets but the pattern would have been less clear; I have appended R code so that readers can experiment with different subsets of $n$ and $k$.

[Figure: distribution of $R^2$ for small sample sizes]

Values of shape parameters

The graph's colour scheme indicates whether each shape parameter is less than one (red), equal to one (blue), or more than one (green). The left-hand side shows the value of $\alpha$ while $\beta$ is on the right. Since $\alpha = \frac{k-1}{2}$, its value increases in arithmetic progression by a common difference of $\frac{1}{2}$ as we move right from column to column (add a regressor to our model) whereas, for fixed $n$, $\beta = \frac{n-k}{2}$ decreases by $\frac{1}{2}$. The total $\alpha + \beta = \frac{n-1}{2}$ is fixed for each row (for a given sample size). If instead we fix $k$ and move down the column (increase sample size by 1), then $\alpha$ stays constant and $\beta$ increases by $\frac{1}{2}$. In regression terms, $\alpha$ is half the number of regressors included in the model, and $\beta$ is half the residual degrees of freedom. To determine the shape of the distribution we are particularly interested in where $\alpha$ or $\beta$ equal one.

The algebra is straightforward for $\alpha$ : we have $\frac{k-1}{2}=1$ so $k=3$ . This is indeed the only column of the facet plot that's filled blue on the left. Similarly $\alpha < 1$ for $k<3$ (the $k=2$ column is red on the left) and $\alpha > 1$ for $k>3$ (from the $k=4$ column onwards, the left side is green).

For $\beta=1$ we have $\frac{n-k}{2}=1$ hence $k=n-2$. Note how these cases (marked with a blue right-hand side) cut a diagonal line across the facet plot. For $\beta > 1$ we obtain $k < n - 2$ (the graphs with a green right side lie to the left of the diagonal line). For $\beta < 1$ we need $k > n - 2$, which involves only the right-most cases on my graph: at $n=k$ we have $\beta=0$ and the distribution is degenerate, but $n=k+1$ where $\beta = \frac{1}{2}$ is plotted (right side in red).

Since the PDF is $f(x;\,\alpha,\,\beta) \propto x^{\alpha-1} (1-x)^{\beta-1}$ , it is clear that if (and only if) $\alpha<1$ then $f(x) \to \infty$ as $x \to 0$ . We can see this in the graph: when the left side is shaded red, observe the behaviour at 0. Similarly when $\beta<1$ then $f(x) \to \infty$ as $x \to 1$ . Look where the right side is red!
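The endpoint behaviour can be read off directly from `dbeta` in base R. This is a minimal sketch with an arbitrary value for the other shape parameter; the variable names are illustrative.

```r
# Sketch: value of the Beta density at x = 0 for alpha below, at and above 1.
b <- 3                                   # arbitrary second shape parameter > 1
at_zero <- c(dbeta(0, 0.5, b),           # alpha < 1: infinite
             dbeta(0, 1.0, b),           # alpha = 1: finite and positive
             dbeta(0, 2.0, b))           # alpha > 1: zero

# By reflection, the same pattern appears at x = 1 with alpha and beta swapped.
at_one <- c(dbeta(1, b, 0.5),            # beta < 1: infinite
            dbeta(1, b, 1.0),            # beta = 1: finite and positive
            dbeta(1, b, 2.0))            # beta > 1: zero
```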

Symmetries

One of the most eye-catching features of the graph is the level of symmetry, but when the Beta distribution is involved, this shouldn't be surprising!

The Beta distribution itself is symmetric if $\alpha = \beta$ . For us this occurs if $n = 2k-1$ which correctly identifies the panels $(k=2, n=3)$ , $(k=3, n=5)$ , $(k=4, n=7)$ and $(k=5, n=9)$ . The extent to which the distribution is symmetric across $R^2 = 0.5$ depends on how many regressor variables we include in the model for that sample size. If $k = \frac{n+1}{2}$ the distribution of $R^2$ is perfectly symmetric about 0.5; if we include fewer variables than that it becomes increasingly asymmetric and the bulk of the probability mass shifts closer to $R^2 = 0$ ; if we include more variables then it shifts closer to $R^2 = 1$ . Remember that $k$ includes the intercept in its count, and that we are working under the null, so the regressor variables should have coefficient zero in the correctly specified model.

There is also an obvious symmetry between distributions for any given $n$, i.e. any row in the facet grid. For example, compare $(k=3, n=9)$ with $(k=7, n=9)$. What's causing this? Recall that the distribution of $\mathrm{Beta}(\alpha, \beta)$ is the mirror image of $\mathrm{Beta}(\beta, \alpha)$ across $x=0.5$. Now we had $\alpha_{k,n} = \frac{k-1}{2}$ and $\beta_{k,n} = \frac{n-k}{2}$. Consider $k'=n-k+1$ and we find:

$\alpha_{k',n} = \frac{(n-k+1)-1}{2} = \frac{n-k}{2} = \beta_{k,n}$

$\beta_{k',n} = \frac{n-(n-k+1)}{2} = \frac{k-1}{2} = \alpha_{k,n}$

So this explains the symmetry as we vary the number of regressors in the model for a fixed sample size. It also explains the distributions that are themselves symmetric as a special case: for them, $k' = k$ so they are obliged to be symmetric with themselves!
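For readers who want to see the mirror-image property numerically, here is a minimal R sketch for the $(k=3, n=9)$ and $(k=7, n=9)$ pair mentioned above. The helper `null_density` is a hypothetical name introduced only for this illustration.

```r
# Sketch: for fixed n, the null density of R^2 with k parameters is the mirror
# image (about 0.5) of the density with k' = n - k + 1 parameters.
null_density <- function(x, k, n) dbeta(x, (k - 1) / 2, (n - k) / 2)

n <- 9; k <- 3
k_prime <- n - k + 1                     # 7 in this example
x <- seq(0.05, 0.95, by = 0.05)

lhs <- null_density(x, k, n)             # density at x for the smaller model
rhs <- null_density(1 - x, k_prime, n)   # density at 1 - x for the larger model
```

The vectors `lhs` and `rhs` should coincide up to floating-point error.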

This tells us something we might not have guessed about multiple regression: for a given sample size $n$ , and assuming no regressors have a genuine relationship with $Y$ , the $R^2$ for a model using $k-1$ regressors plus an intercept has the same distribution as $1 - R^2$ does for a model with $k-1$ residual degrees of freedom remaining.

Special distributions

When $k=n$ we have $\beta=0$ , which isn't a valid parameter. However, as $\beta \to 0$ the distribution becomes degenerate with a spike such that $\mathsf{P}(R^2 = 1)=1$ . This is consistent with what we know about a model with as many parameters as data points - it achieves perfect fit. I haven't drawn the degenerate distribution on my graph but did include the mean, mode and standard deviation.

When $k=2$ and $n=3$ we obtain $\mathrm{Beta}(\frac{1}{2}, \, \frac{1}{2})$ which is the arcsine distribution. This is symmetric (since $\alpha = \beta$ ) and bimodal (0 and 1). Since this is the only case where both $\alpha < 1$ and $\beta < 1$ (marked red on both sides), it is our only distribution which goes to infinity at both ends of the support.

The $\mathrm{Beta}(1, \, 1)$ distribution is the only Beta distribution to be rectangular (uniform). All values of $R^2$ from 0 to 1 are equally likely. The only combination of $k$ and $n$ for which $\alpha = \beta =1$ occurs is $k=3$ and $n=5$ (marked blue on both sides).

The previous special cases are of limited applicability but the case $\alpha > 1$ and $\beta=1$ (green on left, blue on right) is important. Now $f(x;\,\alpha,\,\beta) \propto x^{\alpha-1} (1-x)^{\beta-1} = x^{\alpha-1}$ so we have a power-law distribution on [0, 1]. Of course it's unlikely we'd perform a regression with $k=n-2$ and $k>3$ , which is when this situation occurs. But by the previous symmetry argument, or some trivial algebra on the PDF, when $k=3$ and $n > 5$ , which is the frequent procedure of multiple regression with two regressors and an intercept on a non-trivial sample size, $R^2$ will follow a reflected power law distribution on [0, 1] under $H_0$ . This corresponds to $\alpha=1$ and $\beta>1$ so is marked blue on left, green on right.

You may also have noticed the triangular distributions at $(k=5,n=7)$ and its reflection $(k=3,n=7)$ . We can recognise from their $\alpha$ and $\beta$ that these are just special cases of the power-law and reflected power-law distributions where the power is $2-1=1$ .
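These special cases can be confirmed against their closed-form densities. The following is a minimal R sketch on an interior grid (the endpoints are avoided because the arcsine density is unbounded there); `null_density` is again a hypothetical helper name.

```r
# Sketch: special cases of the null distribution Beta((k-1)/2, (n-k)/2).
null_density <- function(x, k, n) dbeta(x, (k - 1) / 2, (n - k) / 2)
x <- seq(0.1, 0.9, by = 0.1)

arcsine  <- null_density(x, 2, 3)   # Beta(1/2, 1/2): 1 / (pi * sqrt(x * (1 - x)))
uniform  <- null_density(x, 3, 5)   # Beta(1, 1):     constant 1
tri_up   <- null_density(x, 5, 7)   # Beta(2, 1):     2 * x
tri_down <- null_density(x, 3, 7)   # Beta(1, 2):     2 * (1 - x)
```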

Mode

If $\alpha>1$ and $\beta>1$, all green in the plot, $f(x; \, \alpha, \, \beta)$ is unimodal with $f(0)=f(1)=0$, and the Beta distribution has a unique mode $\frac{\alpha-1}{\alpha+\beta-2}$. Putting these in terms of $k$ and $n$, the condition becomes $k>3$ and $n>k+2$ while the mode is $\frac{k-3}{n-5}$.

All other cases have been dealt with above. If we relax the inequality to allow $\beta=1$ , then we include the (green-blue) power-law distributions with $k=n-2$ and $k>3$ (equivalently, $n>5$ ). These cases clearly have mode 1, which actually agrees with the previous formula since $\frac{(n-2)-3}{n-5}=1$ . If instead we allowed $\alpha=1$ but still demanded $\beta>1$ , we'd find the (blue-green) reflected power-law distributions with $k=3$ and $n>5$ . Their mode is 0, which agrees with $\frac{3-3}{n-5}=0$ . However, if we relaxed both inequalities simultaneously to allow $\alpha=\beta=1$ , we'd find the (all blue) uniform distribution with $k=3$ and $n=5$ , which does not have a unique mode. Moreover the previous formula can't be applied in this case, since it would return the indeterminate form $\frac{3-3}{5-5}=\frac{0}{0}$ .

When $n=k$ we get a degenerate distribution with mode 1. When $\beta < 1$ (in regression terms, $n=k+1$ so there is only one residual degree of freedom) then $f(x) \to \infty$ as $x \to 1$, and when $\alpha < 1$ (in regression terms, $k=2$ so a simple linear model with intercept and one regressor) then $f(x) \to \infty$ as $x \to 0$. These would be unique modes except in the unusual case where $k=2$ and $n=3$ (fitting a simple linear model to three points) which is bimodal at 0 and 1.
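The case analysis in this section can be collected into one small function. This is only a sketch of the convention used in this answer, in which a point where the density is unbounded is counted as a mode; `r2_null_mode` is a hypothetical helper name, not part of any package.

```r
# Sketch: mode of R^2 under the null, following the case analysis above.
# Convention assumed here: a point where the density is unbounded counts as a mode.
r2_null_mode <- function(k, n) {
  stopifnot(k >= 2, n >= k)
  a <- (k - 1) / 2; b <- (n - k) / 2
  if (n == k) return(1)                    # degenerate spike at 1
  if (a == 1 && b == 1) return(NA_real_)   # uniform: no unique mode
  if (a < 1 && b < 1) return(c(0, 1))      # arcsine: bimodal at 0 and 1
  if (a < 1) return(0)                     # density unbounded at 0
  if (b < 1) return(1)                     # density unbounded at 1
  (a - 1) / (a + b - 2)                    # equals (k - 3) / (n - 5)
}
```

For example, `r2_null_mode(5, 99)` gives $\frac{1}{47}$, `r2_null_mode(3, 9)` gives 0, and `r2_null_mode(3, 5)` returns `NA` because the uniform distribution has no unique mode.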

Mean

The question asked about the mode, but the mean of $R^2$ under the null is also interesting - it has the remarkably simple form $\frac{k-1}{n-1}$ . For a fixed sample size it increases in arithmetic progression as more regressors are added to the model, until the mean value is 1 when $k=n$ . The mean of a Beta distribution is $\frac{\alpha}{\alpha+\beta}$ so such an arithmetic progression was inevitable from our earlier observation that, for fixed $n$ , the sum $\alpha+\beta$ is constant but $\alpha$ increases by 0.5 for each regressor added to the model.

$\frac{\alpha}{\alpha+\beta} = \frac{(k-1)/2}{(k-1)/2 + (n-k)/2} = \frac{k-1}{n-1}$
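The arithmetic progression is easy to see for one row of the facet plot. Here is a minimal R sketch for $n=9$; the numerical integration at the end is just an optional cross-check for the interior case $k=5$, where the distribution is $\mathrm{Beta}(2, 2)$.

```r
# Sketch: mean of R^2 under the null for a fixed sample size n.
n <- 9
k <- 2:n                                  # k counts the intercept
a <- (k - 1) / 2; b <- (n - k) / 2
mean_r2 <- a / (a + b)                    # Beta mean, equal to (k - 1) / (n - 1)
steps   <- diff(mean_r2)                  # constant step of 1 / (n - 1)

# Cross-check by numerical integration for k = 5, n = 9, i.e. Beta(2, 2).
mean_numeric <- integrate(function(x) x * dbeta(x, 2, 2), 0, 1)$value
```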

Code for plots

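require(grid)     # unit(), used for the panel spacing
require(dplyr)    # mutate()
require(ggplot2)  # ggplot(), geom_*(), facet_grid(), theme()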

nlist <- 3:9 #change here which n to plot
klist <- 2:8 #change here which k to plot

totaln <- length(nlist)
totalk <- length(klist)

df <- data.frame(
    x = rep(seq(0, 1, length.out = 100), times = totaln * totalk),
    k = rep(klist, times = totaln, each = 100),
    n = rep(nlist, each = totalk * 100)
)

df <- mutate(df,
    kname = paste("k =", k),
    nname = paste("n =", n),
    a = (k-1)/2,
    b = (n-k)/2,
    density = dbeta(x, (k-1)/2, (n-k)/2),
    groupcol = ifelse(x < 0.5, 
        ifelse(a < 1, "below 1", ifelse(a ==1, "equals 1", "more than 1")),
        ifelse(b < 1, "below 1", ifelse(b ==1, "equals 1", "more than 1")))
)

g <- ggplot(df, aes(x, density)) +
    geom_line(size=0.8) + geom_area(aes(group=groupcol, fill=groupcol)) +
    scale_fill_brewer(palette="Set1") +
    facet_grid(nname ~ kname)  + 
    ylab("probability density") + theme_bw() + 
    labs(x = expression(R^{2}), fill = expression(alpha~(left)~beta~(right))) +
    theme(panel.spacing = unit(0.6, "lines"), # panel.margin is deprecated in newer ggplot2
        legend.title=element_text(size=20),
        legend.text=element_text(size=20), 
        legend.background = element_rect(colour = "black"),
        legend.position = c(1, 1), legend.justification = c(1, 1))


df2 <- data.frame(
    k = rep(klist, times = totaln),
    n = rep(nlist, each = totalk),
    x = 0.5,
    ymean = 7.5,
    ymode = 5,
    ysd = 2.5
)

df2 <- mutate(df2,
    kname = paste("k =", k),
    nname = paste("n =", n),
    a = (k-1)/2,
    b = (n-k)/2,
    meanR2 = ifelse(k > n, NaN, a/(a+b)),
    modeR2 = ifelse((a>1 & b>=1) | (a>=1 & b>1), (a-1)/(a+b-2), 
        ifelse(a<1 & b>=1 & n>=k, 0, ifelse(a>=1 & b<1 & n>=k, 1, NaN))),
    sdR2 = ifelse(k > n, NaN, sqrt(a*b/((a+b)^2 * (a+b+1)))),
    meantext = ifelse(is.nan(meanR2), "", paste("Mean =", round(meanR2,3))),
    modetext = ifelse(is.nan(modeR2), "", paste("Mode =", round(modeR2,3))),
    sdtext = ifelse(is.nan(sdR2), "", paste("SD =", round(sdR2,3)))
)

g <- g + geom_text(data=df2, aes(x, ymean, label=meantext)) +
    geom_text(data=df2, aes(x, ymode, label=modetext)) +
    geom_text(data=df2, aes(x, ysd, label=sdtext))
print(g)

— Silverfish

Really illuminating visualization. +1

— Khashaa

Great addition, +1, thanks. I noticed that you call $0$ a mode when the distribution goes to $+\infty$ when $x\to 0$ (and nowhere else) -- something @Alecos above (in the comments) did not want to do. I agree with you: it is convenient.

— amoeba says Reinstate Monica

@amoeba from the graphs we'd like to say "values around 0 are most likely" (or 1). But the answer of Alecos is also both self-consistent and consistent with many authorities (people differ on what to do about the 0 and 1 full stop, let alone whether they can count as a mode!). My approach to the mode differs from Alecos mostly because I use conditions on alpha and beta to determine where the formula is applicable, rather than taking my starting point as the formula and seeing which k and n give sensible answers.

— Silverfish

(+1), this is a very meaty answer. By keeping $k$ too close to $n$ and both small, the question studies in detail, and so decisively, the case of really small samples with relatively too many and irrelevant regressors.

— Alecos Papadopoulos

@amoeba You probably noticed that this answer furnishes an algebraic answer for why, for sufficiently large $n$, the mode of the distribution is 0 for $k=3$ but positive for $k>3$. Since $f(x) \propto x^{(k-3)/2}(1-x)^{(n-k-2)/2}$ then for $k=3$ we have $f(x) \propto (1-x)^{(n-5)/2}$ which will clearly have mode at 0 for $n>5$, whereas for $k=4$ we have $f(x) \propto x^{1/2}(1-x)^{(n-6)/2}$ whose maximum can be found by calculus to be the quoted mode formula. As $k$ increases, the power of $x$ rises by 0.5 each time. It's this $x^{\alpha-1}$ factor which makes $f(0)=0$ so kills the mode at 0.

— Silverfish