Throwing balls into bins, estimate a lower bound of its probability

14

This is not homework, although it looks like it. Any references are welcome. :-)

Scenario: There are $n$ different balls and $n$ different bins (labeled from 1 to $n$, from left to right). Each ball is thrown independently and uniformly into the bins. Let $f(i)$ be the number of balls in the $i$-th bin. Let $E_i$ denote the following event.

For each $j\le i$, $\sum_{k\le j}{f(k)} \le j-1$

That is, the first $j$ bins (the leftmost $j$ bins) contain fewer than $j$ balls, for each $j\le i$.

Question: Estimate $\sum_{i<n}{Pr(E_i)}$ in terms of $n$, as $n$ goes to infinity. A lower bound is preferred. I don't think an easily computed formula exists.

Example: $\lim\limits_{n\to\infty}{Pr(E_1)}=\lim\limits_{n\to\infty}{(\frac{n-1}{n})^n}=\frac{1}{e}$. Note that $Pr(E_n)=0$.

My guess: I guess that $\sum_{i<n}{Pr(E_i)}=\ln n$ as $n$ goes to infinity. I considered the first $\ln n$ terms of the summation.
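Because the events are nested ($E_i$ implies $E_{i-1}$), the sum $\sum_{i<n}{Pr(E_i)}$ equals the expected value of the largest $i<n$ for which $E_i$ holds. Under that observation, a guess can be sanity-checked with a small Monte Carlo sketch. The function names, trial count and seed below are illustrative choices, not part of the question.

```python
import random


def largest_good_prefix(counts):
    """Return the largest i < n such that E_i holds (0 if even E_1 fails)."""
    n = len(counts)
    total = 0
    for j in range(1, n):          # j = 1, ..., n-1
        total += counts[j - 1]     # balls in bins 1..j
        if total > j - 1:          # E_j fails, hence every later E_i fails too
            return j - 1
    return n - 1


def estimate_sum(n, trials=20000, seed=0):
    """Monte Carlo estimate of sum_{i<n} Pr(E_i) for n balls and n bins."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(n):                 # throw each ball uniformly
            counts[rng.randrange(n)] += 1
        acc += largest_good_prefix(counts)
    return acc / trials
```

Calling `estimate_sum(n)` for a range of `n` and plotting the result on a log-log scale is one way to compare a conjectured growth rate against simulation.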

reference-request co.combinatorics pr.probability

— Peng Zhang

1

Looks like a subcase of the birthday problem.

— Gopi

@Gopi I can't convince myself that my question is a restricted birthday problem. Can you explain it explicitly? Thank you very much. Note: the constraint is on the sum of the balls in the first $j$ bins, not on the number of balls in a particular bin.

— Peng Zhang

Indeed, my bad. After rereading the Wikipedia article on the birthday problem, I realized I was thinking of a different problem that had been adapted from the birthday problem.

— Gopi

2

Some incorrect ideas... Consider how to encode a state: read the bins from left to right. If the first bin has i balls, output a sequence of i ones followed by a 0. Do this for all bins from left to right. It seems you are interested in the largest i such that this binary string (which has n zeros and n ones) for the first time contains more ones than zeros. Now, let's take a leap of faith and generate the 0s and 1s with equal probability $1/2$. (This might be complete nonsense.) This problem is related to Catalan numbers and Dyck words. And...???

— Sariel Har-Peled
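The encoding suggested in the comment above can be sketched in a few lines. The function names are hypothetical, and the check `ones >= zeros` is one reading of the comment: it flags the first $j$ for which the first $j$ bins hold at least $j$ balls, which is when $E_j$ fails.

```python
def encode(counts):
    """Encode bin occupancies as a binary string: c ones, then one 0, per bin."""
    return "".join("1" * c + "0" for c in counts)


def first_failing_bin(counts):
    """Smallest j such that the first j bins hold at least j balls.

    Read off the encoded string; returns None if no such j exists.
    """
    ones = zeros = 0
    for ch in encode(counts):
        if ch == "1":
            ones += 1
        else:
            zeros += 1             # this 0 closes bin number `zeros`
            if ones >= zeros:      # bins 1..zeros hold at least `zeros` balls
                return zeros
    return None
```

For example, the occupancy vector `[0, 2, 1]` is encoded as `"011010"`.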

4

I don't understand why, in your definition, it matters that the balls are distinct. Also, the string interpretation takes into account the fact that the bins are distinct.

— Sariel Har-Peled

11

EDIT: (2014-08-08) As Douglas Zare points out in the comments, the argument below, specifically the 'bridge' between the two probabilities, is incorrect. I don't see a straightforward way to fix it. I'll leave the answer here as I believe it still provides some intuition, but know that

$\Pr(E_m) \le \prod_{l=1}^{m}\Pr(F_l)$

is not true in general.

This won't be a complete answer, but hopefully it will have enough content that you or someone more knowledgeable than me can finish it off.

Consider the probability that exactly $k$ balls fall into the first $l$ (of $n$) bins:

$\binom{n}{k} \left( \frac{l}{n} \right)^k \left(\frac{n-l}{n} \right)^{n-k}$

Let $F_l$ denote the event that fewer than $l$ balls fall into the first $l$ bins:

$\Pr(F_l) = \sum_{k=0}^{l-1} \binom{n}{k} \left( \frac{l}{n} \right)^k \left( \frac{n-l}{n} \right)^{n-k}$

The probability that the event $E_l$ above occurs is less than if we consider each of the $F_l$ events occurring independently and all at once. This gives us a bridge between the two:

$\begin{array}{lll} \Pr(E_m) & \le & \prod_{l=1}^m \Pr(F_l) \\ & = & \prod_{l=1}^m \left( \sum_{k=0}^{l-1} \binom{n}{k} \left( \frac{l}{n} \right)^k \left( \frac{n-l}{n} \right)^{n-k} \right) \\ & = & \prod_{l=1}^m F(l-1; n, \frac{l}{n} ) \end{array}$

where $F(l-1; n, \frac{l}{n})$ is the cumulative distribution function of the binomial distribution with $p = \frac{l}{n}$. Reading a few lines down the Wikipedia page, and noticing that $(l-1 \le p n)$, we can use Chernoff's inequality to get:

$\begin{array}{lll} \Pr(E_m) & \le & \prod_{l=1}^m \exp\left[ -\frac{1}{2l} \right] \\ & = & \exp\left[ - \frac{1}{2} \sum_{l=1}^m \frac{1}{l} \right] \\ & = & \exp\left[ - \frac{1}{2} H_m \right] \\ & \le & \exp\left[ -\frac{1}{2} \left( \frac{1}{2 m} + \ln(m) + \gamma \right) \right] \end{array}$

Where $H_m$ is the $m$ 'th Harmonic Number, $\gamma$ is the Euler-Mascheroni constant and the inequality for the $H_m$ is taken from Wolfram's MathWorld linked page.

Not worrying about the $e^{-1/(4m)}$ factor, this finally gives us:

$\Pr(E_m) \le \frac{ e^{ -\gamma/2}}{\sqrt{m}}$

Below is a log-log plot of an average of 100,000 instances for $n=2048$ as a function of $m$ with the function $\frac{e^{ -\gamma/2}}{\sqrt{m}}$ also plotted for reference:

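[Figure: log-log plot of the simulated $\Pr(E_m)$ against $m$ for $n=2048$, with $\frac{e^{ -\gamma/2}}{\sqrt{m}}$ for reference]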

While the constants are off, the form of the function appears to be correct.

Below is a log-log plot for varying $n$ with each point being the average of 100,000 instances as a function of $m$ :

[Figure: log-log plot of the simulated $\Pr(E_m)$ against $m$ for varying $n$]

Finally, getting to the original question you wanted answered, since we know that $\Pr(E_m) \propto \frac{1}{\sqrt{m}}$ we have:

$\sum_{i<n} \Pr(E_i) \propto \sqrt{n}$

And as numerical verification, below is a log-log plot of the sum, $S$ , versus instance size, $n$ . Each point represents the average of the sum of 100,000 instances. The function $x^{1/2}$ has been plotted for reference:

[Figure: log-log plot of the sum $S$ against the instance size $n$, with $x^{1/2}$ for reference]

While I see no direct connection between the two, the tricks and final form of this problem have a lot of commonalities with the Birthday Problem as initially guessed at in the comments.

— user834

4

How do you get $Pr(E_2) \le Pr(F_1)\times Pr(F_2)$? For example, for $n=100$, I calculate that $Pr(E_2) = 0.267946 \gt 0.14761 = Pr(F_1)Pr(F_2).$ If you are told that the first bin is empty, does this make it more or less likely that the first two bins hold at most $1$ ball? It's more likely, so $Pr(F_1)Pr(F_2)$ is an underestimate.

— Douglas Zare
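The counterexample in the comment above is easy to reproduce. This is a minimal sketch with hypothetical function names: `pr_F` evaluates the binomial sum for $\Pr(F_l)$ from the answer, and `pr_E2` uses the fact that $E_2$ means bin 1 is empty and bin 2 holds at most one ball.

```python
from math import comb


def pr_F(l, n):
    """Pr(F_l): fewer than l of the n balls land in the first l bins."""
    p = l / n
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(l))


def pr_E2(n):
    """Pr(E_2): bin 1 is empty and bin 2 holds at most one ball."""
    none_in_first_two = ((n - 2) / n) ** n
    one_in_bin_two = n * (1 / n) * ((n - 2) / n) ** (n - 1)
    return none_in_first_two + one_in_bin_two


n = 100
print(pr_E2(n), pr_F(1, n) * pr_F(2, n))
```

For $n=100$ the two printed values should agree with the figures quoted in the comment, with $\Pr(E_2)$ the larger of the two.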

@DouglasZare, I've verified your calculations, you're correct. Serves me right for not being more rigorous.

— user834

15

The answer is $\Theta(\sqrt{n})$ .

First, let's compute $\Pr(E_{n-1})$.

Let's suppose we throw $n$ balls into $n$ bins, and look at the probability that a bin has exactly $k$ balls in it. This probability comes from the Poisson distribution, and as $n$ goes to $\infty$ the probability that there are exactly $k$ balls in a given bin is $\frac{1}{e} \frac{1}{ k!}$ .

Now, let's look at a different way of distributing balls into bins. We throw a number of balls into each bin chosen from the Poisson distribution, and condition on the event that there are $n$ balls total. I claim that this gives exactly the same distribution as throwing $n$ balls into $n$ bins. Why? It is easy to see that the probability of having $k_j$ balls in the $j$-th bin, for every $j$, is proportional to $\prod_{j=1}^n \frac{1}{k_j!}$ in both distributions.
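The claimed equivalence can be checked numerically on a small case. The sketch below uses hypothetical helper names and assumes Poisson variables of mean 1 per bin; it compares the multinomial probability of every occupancy vector with the probability under independent Poisson counts conditioned on a total of $n$.

```python
from itertools import product
from math import exp, factorial, prod


def multinomial_pmf(counts):
    """Probability of the occupancy vector when n balls go into n bins uniformly."""
    n = len(counts)
    return factorial(n) / (prod(factorial(k) for k in counts) * n**n)


def conditioned_poisson_pmf(counts):
    """Same vector under independent Poisson(1) counts, conditioned on total n."""
    n = len(counts)
    joint = prod(exp(-1) / factorial(k) for k in counts)
    total_is_n = exp(-n) * n**n / factorial(n)   # Poisson(n) pmf evaluated at n
    return joint / total_is_n


def max_discrepancy(n):
    """Largest absolute gap between the two pmfs over all vectors summing to n."""
    return max(
        abs(multinomial_pmf(c) - conditioned_poisson_pmf(c))
        for c in product(range(n + 1), repeat=n)
        if sum(c) == n
    )
```

Up to floating-point rounding, `max_discrepancy(n)` should be zero for any small `n`, since both expressions reduce to $\frac{n!}{n^n \prod_j k_j!}$.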

So let's consider a random walk where at each step, you go from $t$ to $t+1-k$ with probability $\frac{1}{e}\frac{1}{k!}$ . I claim that if you condition on the event that this random walk returns to 0 after $n$ steps, the probability that this random walk always stays above $0$ is the probability that the OP wants to calculate. Why? The height of this random walk after $s$ steps is $s$ minus the number of balls in the first $s$ bins.
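The correspondence between the walk and the event $E_i$ can be made concrete. In this sketch (function names are illustrative), `walk_heights` builds the walk from an occupancy vector, and the two predicates express $E_i$ directly and via the walk staying positive during its first $i$ steps.

```python
from itertools import accumulate


def walk_heights(counts):
    """Heights after steps 1..n: step j moves by 1 - (balls in bin j)."""
    return list(accumulate(1 - k for k in counts))


def event_E(counts, i):
    """E_i from the question: the first j bins hold at most j-1 balls, for all j <= i."""
    return all(sum(counts[:j]) <= j - 1 for j in range(1, i + 1))


def stays_positive(counts, i):
    """The walk is strictly above 0 after each of its first i steps."""
    return all(h > 0 for h in walk_heights(counts)[:i])
```

For any occupancy vector the two predicates agree, because the height after $s$ steps is $s$ minus the number of balls in the first $s$ bins.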

If we had chosen a random walk with a probability of $\frac{1}{2}$ of going up or down $1$ on each step, this would be the classical ballot problem, for which the answer is $\frac{1}{2(n-1)}$ . This is a variant of the ballot problem which has been studied (see this paper), and the answer is still $\Theta\left(\frac{1}{n}\right)$ . I don't know whether there is an easy way to compute the constant for the $\Theta\left(\frac{1}{n}\right)$ for this case.

The same paper shows that when the random walk is conditioned to end at height $k$ , the probability of always staying positive is $\Theta(k/n)$ as long as $k = O(\sqrt{n})$ . This fact will let us estimate $\Pr(E_s)$ for any $s$ .

I'm going to be a little handwavy for the rest of my answer, but standard probability techniques can be used to make this rigorous.

We know that as $n$ goes to $\infty$ , this random walk converges to a Brownian bridge, i.e., Brownian motion conditioned to start and end at $0$ . From general probability theorems, for $\epsilon n < s< (1-\epsilon)n$ , the random walk is roughly $\Theta(\sqrt{n})$ away from the $x$ -axis. In the case it has height $t>0$ , the probability that it has stayed above $0$ for the entire time before $s$ is $\Theta(t/s)$ . Since $t$ is likely to be $\Theta(\sqrt{n})$ when $s = \Theta(n)$ , we have $\Pr(E_s) \approx \Theta(1/\sqrt{n})$ .

— Peter Shor

4

[Edit 2014-08-13: Thanks to a comment by Peter Shor, I have changed my estimate of the asymptotic growth rate of this series.]

My belief is that $\lim_{n\to\infty} \sum_{i<n} \Pr(E_i)$ grows as $\sqrt{n}$ . I do not have a proof but I think I have a convincing argument.

Let $B_i = f(i)$ be a random variable that gives the number of balls in bin $i$ . Let $B_{i,j} = \sum_{k=i}^j B_k$ be a random variable that gives the total number of balls in bins $i$ through $j$ inclusive.

You can now write $\Pr(E_i) = \sum_{b<j} \Pr(E_j \wedge B_{1,j} = b) \Pr(E_i \mid E_j \wedge B_{1,j} = b)$ for any $j < i$ . To that end, let's introduce the functions $\pi$ and $g_i$ .

$\pi(j, k, b) = \Pr(B_j = k \mid B_{1,j-1} = b) = \binom{n-b}{k}\left(\frac{1}{n-j+1}\right)^k\left(\frac{n-j}{n-j+1}\right)^{n-b-k}$

$\begin{aligned} g_i(j, k, b) \; &= \Pr(E_i \wedge B_{j,i} \le k \mid E_{j-1} \wedge B_{1,j-1} = b) \\ &= \begin{cases} 0 & k < 0 \\ 1 & k \ge 0 \wedge j > i \\ \sum_{l=0}^{j-b-1} \pi(j, l, b) g_i(j + 1, k - l, b + l) & \mathrm{otherwise} \end{cases}\end{aligned}$

We can write $\Pr(E_i)$ in terms of $g_i$ :

$\Pr(E_i) = g_i(1, i - 1, 0)$
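The recursion for $g_i$ translates almost line by line into code. The following is a sketch only, using exact rational arithmetic; the wrapper name `pr_E` and the use of memoization are my own choices rather than part of the derivation.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def pr_E(i, n):
    """Exact Pr(E_i) for n balls and n bins, via the recursion g_i(1, i-1, 0)."""

    def pi(j, k, b):
        # Pr(B_j = k | B_{1,j-1} = b): the remaining n-b balls are spread
        # uniformly over the remaining n-j+1 bins.
        p = Fraction(1, n - j + 1)
        return comb(n - b, k) * p**k * (1 - p) ** (n - b - k)

    @lru_cache(maxsize=None)
    def g(j, k, b):
        if k < 0:
            return Fraction(0)
        if j > i:
            return Fraction(1)
        # l ranges over 0..j-b-1, so that bins 1..j hold at most j-1 balls.
        return sum(
            (pi(j, l, b) * g(j + 1, k - l, b + l) for l in range(j - b)),
            Fraction(0),
        )

    return g(1, i - 1, 0)
```

For instance, `pr_E(1, n)` returns $\left(\frac{n-1}{n}\right)^n$ as an exact fraction, and evaluating `float(pr_E(i, n))` for fixed `i` and growing `n` gives a way to watch the limit emerge.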

Now, it's clear from the definition of $g_i$ that

$\Pr(E_i) = \frac{(n-i)^{n-i+1}}{n^n}h_i(n)$

where $h_i(n)$ is a polynomial in $n$ of degree $i - 1$ . This makes some intuitive sense too; at least $n - i + 1$ balls will have to be put in one of the $(i+1)$ th through $n$ th bins (of which there are $n-i$ ).

Since we're only talking about $Pr(E_i)$ when $n\to\infty$ , only the lead coefficient of $h_i(n)$ is relevant; let's call this coefficient $a_i$ . Then

$\lim_{n\to\infty} \Pr(E_i) = \frac{a_i}{e^i}$

How do we compute $a_i$ ? Well, this is where I'll do a little handwaving. If you work out the first few $E_i$ , you'll see that a pattern emerges in the computation of this coefficient. You can write it as

$a_i = \mu_i(1, i-1, 0)$ where

$\mu_i(j, k, b) = \begin{cases} 0 & k < 0 \\ 1 & k \ge 0 \wedge j > i \\ \sum_{l = 0}^{j-b-1} \frac{1}{l!} \mu_i(j + 1, k - l, b+ l) & \mathrm{otherwise} \end{cases}$

Now, I wasn't able to derive a closed-form equivalent directly, but I computed the first 20 values of $\lim_{n\to\infty} \Pr(E_i) = a_i/e^i$ :

i       a_i/e^i
1       0.367879
2       0.270671
3       0.224042
4       0.195367
5       0.175467
6       0.160623
7       0.149003
8       0.139587
9       0.131756
10      0.12511
11      0.119378
12      0.114368
13      0.10994
14      0.105989
15      0.102436
16      0.0992175
17      0.0962846
18      0.0935973
19      0.0911231
20      0.0888353

Now, it turns out that

$\DeclareMathOperator{\Pois}{Pois} \lim_{n\to\infty} \Pr(E_i) = \frac{i^i}{i! e^i} = \Pois(i; i)$

where $\Pois(i; \lambda)$ is the probability that a random variable $X$ has value $i$ when it's drawn from a Poisson distribution with mean $\lambda$ . Thus we can write our sum as

$\lim_{n\to\infty} \sum_{i=1}^n \Pr(E_i) = \sum_{x = 1}^{\infty} \frac{x^x}{x!e^x}$

Wolfram Alpha tells me this series diverges. Peter Shor points out in a comment that Stirling's approximation allows us to estimate $\Pr(E_i)$ :

$\lim_{n\to\infty} \Pr(E_x) = \frac{x^x}{x!e^x} \approx \frac{1}{\sqrt{2 \pi x}}$
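A quick numerical look at how close $\frac{x^x}{x!e^x}$ is to $\frac{1}{\sqrt{2\pi x}}$ can be had as follows. This is a sketch with illustrative function names; it works in log space via `math.lgamma` so that large $x$ does not overflow.

```python
from math import exp, lgamma, log, pi, sqrt


def limit_pr_E(x):
    """x^x / (x! e^x), evaluated in log space to avoid overflow."""
    return exp(x * log(x) - lgamma(x + 1) - x)


def stirling_estimate(x):
    """Leading-order Stirling approximation 1 / sqrt(2 pi x)."""
    return 1 / sqrt(2 * pi * x)


for x in (1, 5, 20, 1000):
    print(x, limit_pr_E(x), stirling_estimate(x))
```

The values for $x = 1$ and $x = 20$ can be compared against the first and last rows of the table of $a_i/e^i$.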

Let

$\phi(x) = \frac{1}{\sqrt{2 \pi x}}$

Since

$\lim_{x\to\infty}\frac{\phi(x)}{\phi(x+1)} = 1$
$\phi(x)$ is decreasing
$\int_1^n \phi(x)dx \to \infty$ as $n \to \infty$

our series grows as $\int_1^n \phi(x) dx$ (See e.g. Theorem 2). That is,

$\sum_{i=1}^n Pr(E_i) = \Theta\left(\sqrt{n}\right)$

— ruds

1

Wolfram Alpha is wrong. Use Stirling's formula. It says that $x^x/(x! e^x)\approx 1/\sqrt{2\pi x}$ .

— Peter Shor

@PeterShor Thanks! I've updated my conclusion thanks to your insight, and I now agree with the other two answers. It's interesting to me to see 3 quite different approaches to this problem.

— ruds

4

Exhaustively checking the first few terms (by examining all $n^n$ cases) and a bit of lookup shows that the answer is https://oeis.org/A036276 / $n^n$ . This implies that the answer is $\sim \sqrt{\frac{\pi}{8}}\, n^{\frac{1}{2}}$ .

More exactly, the answer is:

$\frac{n!}{2 n^n} \sum_{k=0}^{n-2}\frac{n^k}{k!}$ and there is no closed-form answer. The asymptotic form follows from $n! \sim \sqrt{2\pi n}\,(n/e)^n$ and $\sum_{k=0}^{n-2} \frac{n^k}{k!} \sim \frac{e^n}{2}$ .
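The exhaustive check described above can be reproduced for small $n$. This is a sketch of one way to do it, not necessarily the procedure used for the answer: `brute_force_sum` enumerates all $n^n$ assignments of balls to bins and `closed_form` evaluates the expression above, both as exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def brute_force_sum(n):
    """Exact sum_{i<n} Pr(E_i), by enumerating all n^n ball-to-bin assignments."""
    total = 0
    for assignment in product(range(n), repeat=n):   # assignment[ball] = bin
        counts = [0] * n
        for b in assignment:
            counts[b] += 1
        prefix = 0
        for j in range(1, n):
            prefix += counts[j - 1]
            if prefix > j - 1:      # E_j fails, and so do all later events
                break
            total += 1              # E_j holds for this assignment
    return Fraction(total, n**n)


def closed_form(n):
    """n! / (2 n^n) * sum_{k=0}^{n-2} n^k / k!"""
    partial = sum(Fraction(n**k, factorial(k)) for k in range(n - 1))
    return Fraction(factorial(n), 2 * n**n) * partial
```

Comparing `brute_force_sum(n)` with `closed_form(n)` for `n` up to 6 or so takes well under a second; beyond that the $n^n$ enumeration grows quickly.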

— Haran

Oeis is pretty awesome

— Thomas Ahle