Does this discrete distribution have a name?

Does this discrete distribution have a name? For $i \in \{1, \ldots, N\}$,

$f(i) = \frac{1}{N} \sum_{j = i}^N \frac{1}{j}$

I came across this distribution in the following way: I have a list of $N$ items ranked by some utility function. I want to randomly select one of the items, biasing the choice toward the start of the list. So I first choose an index $j$ between 1 and $N$ uniformly. I then select an item between indices 1 and $j$. I believe this process results in the above distribution.
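
A short check of that belief, as a sketch: write $J$ for the uniformly chosen bound and $I$ for the selected item (both names are introduced here for illustration), and assume the second pick is also uniform on $\{1, \ldots, J\}$. Then $\Pr(I = i \mid J = j) = 1/j$ when $i \le j$ and $0$ otherwise, so by the law of total probability

$$\Pr(I = i) = \sum_{j=1}^{N} \Pr(J = j) \, \Pr(I = i \mid J = j) = \sum_{j=i}^{N} \frac{1}{N} \cdot \frac{1}{j} = f(i).$$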

— Tomek

This is not a distribution: it is not normalized.

— whuber

@whuber I thought so at first too (and commented, before realizing I had misread it and deleting the comment), but it turned out I had misunderstood the definition. Unless I have some further misunderstanding, it is a normalized probability mass function.

— Glen_b

It is normalized. $1/1$ appears in the sum exactly once (in $f(1)$). $1/2$ appears exactly twice (in $f(1)$ and $f(2)$), and so on. So the sum of all these sums is $N$, and the normalizing constant is given as $1/N$. It checks out.
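
The same counting argument can be written out by swapping the order of summation, since each term $1/j$ is counted once for every $i \le j$:

$$\sum_{i=1}^{N} f(i) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=i}^{N} \frac{1}{j} = \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{j} \frac{1}{j} = \frac{1}{N} \sum_{j=1}^{N} j \cdot \frac{1}{j} = \frac{1}{N} \cdot N = 1.$$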

— rcorty

Beyond that, I don't know what this distribution is called. I also don't know how the process described leads to this distribution. One thought I had was that it sounds like a discrete version of a stick-breaking process, which is very googleable.

— rcorty

@Glen_b Thanks. I was reading this on my phone, which did not render $f$ clearly enough.

— whuber

Answers:

You have a discrete version of the negative log distribution, that is, the distribution whose support is $[0, 1]$ and whose pdf is $f(t) = - \log t$.

To see this, I'm going to redefine your random variable to take values in the set $\{ 1/N, 2/N, \ldots, 1 \}$ instead of $\{1, 2, \ldots, N \}$, and call the resulting random variable $T$. Then my claim is that

$Pr\left( T = \frac{t}{N} \right) \rightarrow - \frac{1}{N} \log \left( \frac{t}{N} \right)$

as $N, t \rightarrow \infty$ while $\frac{t}{N}$ is held (approximately) constant.

First, a little simulation experiment demonstrating this convergence. Here's a small implementation of a sampler from your distribution:

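# Draw `size` samples from the two-stage process, rescaled to (0, 1].
t_sample <- function(N, size) {
  # Stage 1: a uniform upper bound in 1..N for each draw.
  bounds <- sample.int(N, size = size, replace = TRUE)
  # Stage 2: a uniform index in 1..bound for each draw.
  samples <- vapply(bounds, function(b) sample.int(b, 1), integer(1))
  samples / N
}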

Here's a histogram of a large sample taken from your distribution:

ss <- t_sample(100, 200000)
hist(ss, freq=FALSE, breaks=50)

and here's the logarithmic pdf overlaid:

linsp <- 1:100 / 100
lines(linsp, -log(linsp))

To see why this convergence occurs, start with your expression

$Pr \left( T = \frac{t}{N} \right) = \frac{1}{N} \sum_{j=t}^N \frac{1}{j}$

and multiply and divide by $N$

$Pr \left( T = \frac{t}{N} \right) = \frac{1}{N} \sum_{j=t}^N \frac{N}{j} \frac{1}{N}$

The summation is now a Riemann sum for the function $g(x) = \frac{1}{x}$, integrated from $\frac{t}{N}$ to $1$. That is, for large $N$,

$Pr \left( T = \frac{t}{N} \right) \approx \frac{1}{N} \int_{\frac{t}{N}}^1 \frac{1}{x} dx = - \frac{1}{N} \log \left( \frac{t}{N} \right)$

which is the expression I wanted to arrive at.
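
For a numerical check of this approximation without simulation noise, the exact probabilities can be computed directly and compared with $-\frac{1}{N} \log \left( \frac{t}{N} \right)$. This is a minimal sketch; the helper name `exact_pmf` and the choice `N <- 1000` are assumptions made for illustration, not part of the original answer.

```r
# Exact pmf from the question: f(i) = (1/N) * sum_{j=i}^{N} 1/j, for i = 1..N.
exact_pmf <- function(N) {
  # rev(cumsum(rev(x))) turns x = 1/j into the tail sums sum_{j=i}^{N} 1/j.
  rev(cumsum(rev(1 / seq_len(N)))) / N
}

N <- 1000                                  # illustrative choice
f <- exact_pmf(N)
log_approx <- -log(seq_len(N) / N) / N     # the limiting expression at t/N

total <- sum(f)                            # normalization check
max_abs_err <- max(abs(f - log_approx))    # largest gap over t = 1..N
```

Here `total` should equal 1 up to floating-point rounding, and `max_abs_err` should be small relative to the probabilities themselves. The gap is largest near $t = 1$, where the Riemann sum is coarsest relative to how quickly $1/x$ changes.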

— Matthew Drury

You're extremely welcome. This was a great question and I had a lot of fun working it out.

— Matthew Drury

This appears to be related to the Whitworth distribution. (I don't believe it is the Whitworth distribution, since if I remember right, that's the distribution of a set of ordered values, but it seems to be connected to it, and relies on the same summation-scheme.)

There's some discussion of the Whitworth (and numerous references) in

Anthony Lawrance and Robert Marks, (2008)
"Firm size distributions in an industry with constrained resources,"
Applied Economics, vol. 40, issue 12, pages 1595-1607

(There looks to be a working paper version here)

Also see

Nancy L Geller, (1979)
A test of significance for the Whitworth distribution,
Journal of the American Society for Information Science, Vol.30(4), pp.229-231

— Glen_b -Reinstate Monica

To make this answer self-contained, could you provide a definition of the Whitworth distribution and perhaps supply a few words of explanation concerning the connection you see?

— whuber

@whuber Yes, it should be a comment as it stands. I'll edit some details in but it's going to end up a good deal longer.

— Glen_b -Reinstate Monica

Just some kind of definition would be fine.

— whuber

Thanks, that was understood, but nevertheless that will be the outcome.

— Glen_b -Reinstate Monica