Determine the missing number in a data stream

We receive a stream of $n-1$ pairwise distinct numbers from the set $\left\{1,\dots,n\right\}$.

How can I determine the missing number with an algorithm that reads the stream once and uses only $O(\log_2 n)$ bits of memory?

algorithms integers online-algorithms

— Kolejka

Answers:

You know that $\sum_{i=1}^n i = \frac{n(n+1)}{2}$, and since $S = \frac{n(n+1)}{2}$ can be encoded in $O(\log(n))$ bits, this can be done in $O(\log n)$ memory and in a single pass: just compute $S - \mathrm{currentSum}$, which is the missing number.
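A minimal sketch of the sum-based idea in C++. The name `missing_by_sum` and the use of a `std::vector` as a stand-in for the stream are assumptions made for illustration; only the running sum is kept between elements. Note that $S$ needs roughly $2\log_2 n$ bits, which is still $O(\log n)$.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the one number from {1, ..., n} that is
// absent from `stream`, which is assumed to hold the other n-1 numbers
// exactly once each. The vector only stands in for a one-pass stream.
std::uint64_t missing_by_sum(const std::vector<std::uint64_t>& stream,
                             std::uint64_t n)
{
    // S = n(n+1)/2; assumes n is small enough that n*(n+1) fits in 64 bits.
    std::uint64_t expected = n * (n + 1) / 2;

    std::uint64_t current_sum = 0;  // the only state kept across the pass
    for (std::uint64_t y : stream)
        current_sum += y;

    return expected - current_sum;  // S - currentSum
}
```

For instance, calling `missing_by_sum({1, 2, 4, 5}, 5)` compares the expected sum 15 with the observed sum 12.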

But this problem can be solved in the general case (for constant $k$): $k$ numbers are missing; find them all. In this case, instead of computing only the sum of the $y_i$, compute the sum of the $j$-th powers of the $x_i$ for all $1\le j \le k$ (I assume that the $x_i$ are the missing numbers and the $y_i$ are the input numbers):

$\qquad \displaystyle \begin{aligned} \sum_{i=1}^k x_i &= S_1,\\ \sum_{i=1}^k x_i^2 &= S_2,\\ &\vdots \\ \sum_{i=1}^k x_i^k &= S_k \end{aligned}$ $\qquad (1)$

Note that you can compute $S_1,...S_k$ easily, because $S_1 = S - \sum y_i$, $S_2 = \sum i^2 - \sum y_i^2$, ...

Now, to find the missing numbers, you have to solve $(1)$ for all the $x_i$.

You can compute:

$P_1 = \sum x_i$, $P_2 = \sum x_i\cdot x_j$, ..., $P_k = \prod x_i$. $(2)$

To do this, note that $P_1 = S_1$, $P_2 = \frac{S_1^2 - S_2}{2}$, ...
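The pattern behind these formulas is given by Newton's identities, which express each $P_j$ through $S_1,\dots,S_j$ and the earlier $P$s. Written out for the first three, reading $P_2$ as the sum over pairs $i<j$ and assuming that $P_3 = \sum_{a<b<c} x_a x_b x_c$ is the intended continuation of $(2)$:

$\qquad \displaystyle \begin{aligned} P_1 &= S_1,\\ 2P_2 &= P_1 S_1 - S_2,\\ 3P_3 &= P_2 S_1 - P_1 S_2 + S_3. \end{aligned}$

As a quick check with two missing numbers $x_1 = 2$ and $x_2 = 5$: $S_1 = 7$ and $S_2 = 29$, so $P_2 = \frac{49 - 29}{2} = 10 = 2 \cdot 5$.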

But the $P_i$ are (up to sign) the coefficients of $P=(x-x_1)\cdot (x-x_2) \cdots (x-x_k)$, and $P$ can be factored uniquely, so you can find the missing numbers.
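For $k = 2$ the whole procedure fits in a few lines. The following C++ is only a sketch under stated assumptions: the name `two_missing` is hypothetical, a `std::vector` stands in for the stream, and the roots of $P$ are found by evaluating the polynomial at each candidate rather than by a general factoring routine.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper for k = 2: returns the two numbers from {1, ..., n}
// absent from `stream` (assumed to hold the other n-2 numbers once each).
// Assumes n is small enough that the sum of squares fits in 64 signed bits.
std::pair<std::int64_t, std::int64_t>
two_missing(const std::vector<std::int64_t>& stream, std::int64_t n)
{
    // Power sums over {1, ..., n}, accumulated term by term.
    std::int64_t s1 = 0, s2 = 0;
    for (std::int64_t i = 1; i <= n; ++i) { s1 += i; s2 += i * i; }

    // Subtract what the stream contributes: S_j = sum i^j - sum y_i^j.
    for (std::int64_t y : stream) { s1 -= y; s2 -= y * y; }

    // Newton's identities for k = 2: P_1 = S_1, P_2 = (S_1^2 - S_2) / 2.
    std::int64_t p1 = s1;
    std::int64_t p2 = (s1 * s1 - s2) / 2;

    // The missing numbers are the roots of t^2 - P_1 t + P_2.
    // Find the smaller root by direct evaluation; the other is P_1 - root.
    for (std::int64_t t = 1; t <= n; ++t)
        if (t * t - p1 * t + p2 == 0)
            return {t, p1 - t};

    return {0, 0};  // not reached when the input satisfies the assumptions
}
```

The state carried through the pass is just `s1` and `s2`, each of which needs $O(\log n)$ bits.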

These are not my own ideas; read this.

— Raphael

I don't understand (2). Maybe if you added the details of the sums? Is $P_k$ missing a $\sum$?

— Raphael

@Raphael, the $P_i$ come from Newton's identities. I think if you look at the wiki page I referenced, you can get the idea of the calculation: each $P_i$ can be computed from the previous $P$s and the $S_j$. Remember the simple formula $2 \cdot x_1 \cdot x_2 = (x_1 + x_2)^2 - (x_1^2 + x_2^2)$; you can apply a similar approach to all powers. Also, as I wrote, $P_i$ is a sigma of something, but $P_k$ has no $\Sigma$, because there is just one $\Pi$.

Regardless, answers should be self-contained to a reasonable degree. You give some formulas, so why not complete them?

— Raphael

From the comment above:

Before processing the stream, allocate $\lceil \log_2 n \rceil$ bits, in which you write $x:= \bigoplus_{i=1}^n \mathrm{bin}(i)$ ( $\mathrm{bin}(i)$ is the binary representation of $i$ and $\oplus$ is pointwise exclusive-or). Naively, this takes $\mathcal{O}(n)$ time.

Upon processing the stream, whenever one reads a number $j$, compute $x := x \oplus \mathrm{bin}(j)$. Let $k$ be the single number from $\{1, \dots, n\}$ that is not included in the stream. After having read the whole stream, we have

$x = \left(\bigoplus_{i=1}^n \mathrm{bin}(i)\right) \oplus \left(\bigoplus_{i \neq k } \mathrm{bin}(i)\right) = \mathrm{bin}(k) \oplus \bigoplus_{i \neq k } (\mathrm{bin}(i) \oplus \mathrm{bin}(i)) = \mathrm{bin}(k),$ yielding the desired result.

Hence, we used $\mathcal{O}(\log n)$ space, and have an overall runtime of $\mathcal{O}(n)$ .
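The $\mathcal{O}(n)$ initialisation is called naive above; one way to avoid it, offered here as a sketch and not as part of the original answer, is the period-4 pattern of the prefix XOR $1 \oplus 2 \oplus \dots \oplus n$. The names `xor_upto` and `missing_by_xor` are hypothetical, and a `std::vector` stands in for the stream.

```cpp
#include <cstdint>
#include <vector>

// XOR of all integers 1..n in O(1): the prefix XOR takes the values
// n, 1, n+1, 0 for n mod 4 = 0, 1, 2, 3 respectively.
std::uint64_t xor_upto(std::uint64_t n)
{
    switch (n % 4) {
        case 0:  return n;
        case 1:  return 1;
        case 2:  return n + 1;
        default: return 0;
    }
}

// Returns the number from {1, ..., n} absent from `stream`, assuming the
// stream holds the other n-1 numbers exactly once each.
std::uint64_t missing_by_xor(const std::vector<std::uint64_t>& stream,
                             std::uint64_t n)
{
    std::uint64_t x = xor_upto(n);  // plays the role of the initial x
    for (std::uint64_t j : stream)
        x ^= j;                     // x := x xor bin(j)
    return x;
}
```

The streaming loop itself is unchanged; only the preprocessing step is replaced.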

— HdM

May I suggest an easy optimization that makes this a true streaming single-pass algorithm: at time step $i$, xor $x$ with $\mathrm{bin}(i)$ and with the input $\mathrm{bin}(j)$ that has arrived on the stream. This has the added benefit that you can make it work even if $n$ is not known ahead of time: just start with a single bit allocated for $x$ and "grow" the allocated space as necessary.
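A minimal C++ sketch of this single-pass variant. The class name `MissingTracker` and its members are hypothetical, and a fixed 64-bit word is used in place of storage that grows on demand. Since time steps only cover $1,\dots,n-1$, the last index $n$ is folded in when the result is requested.

```cpp
#include <cstdint>

// Hypothetical accumulator for the single-pass variant: no preprocessing,
// and n does not have to be known in advance.
class MissingTracker {
public:
    // Call once per arriving number j; the i-th call also folds in i.
    void feed(std::uint64_t j)
    {
        ++count_;
        x_ ^= count_ ^ j;
    }

    // After n-1 numbers have arrived, the index n = count_ + 1 has not
    // been folded in yet, so it is added here.
    std::uint64_t missing() const { return x_ ^ (count_ + 1); }

private:
    std::uint64_t count_ = 0;  // how many numbers have been seen so far
    std::uint64_t x_ = 0;      // running XOR
};
```

Feeding the values 5, 1, 4, 2 one at a time and then calling `missing()` treats the stream as coming from $\{1,\dots,5\}$.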

— Sasho Nikolov

HdM's solution works. I coded it in C++ to test it. I can't limit the value to $O(\log_2 n)$ bits, but I'm sure you can easily show how only that number of bits is actually set.

For those that want pseudo code, using a simple $\text{fold}$ operation with exclusive or ( $\oplus$ ):

$\text{Missing} = \text{fold}(\oplus, \{1,\ldots,N\} \cup \text{InputStream})$

Hand-wavy proof: $\oplus$ never requires more bits than its widest operand, so no intermediate result in the above requires more bits than the largest input value (so $O(\log_2 n)$ bits). $\oplus$ is commutative and associative, and $x \oplus x = 0$, so if you expand the above and pair off every value present in the stream, you are left with a single unmatched value: the missing number.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

using namespace std;

void find_missing( int const * stream, int len );

int main( int argc, char ** argv )
{
    if( argc < 2 )
    {
        cerr << "Syntax: " << argv[0] << " N" << endl;
        return 1;
    }
    int n = atoi( argv[1] );
    if( n < 1 )
    {
        cerr << "N must be a positive integer" << endl;
        return 1;
    }

    //construct sequence 1..n
    vector<int> seq;
    for( int i=1; i <= n; ++i )
        seq.push_back( i );

    //remove a random number and remember it
    mt19937 rng( random_device{}() );
    int remove = uniform_int_distribution<int>( 1, n )( rng );
    seq.erase( seq.begin() + (remove - 1) );
    cout << "Removed: " << remove << endl;

    //give the stream a random order
    //(std::random_shuffle was removed in C++17; std::shuffle replaces it)
    shuffle( seq.begin(), seq.end(), rng );

    find_missing( seq.data(), int(seq.size()) );
    return 0;
}

//HdM's solution
void find_missing( int const * stream, int len )
{
    //xor together the full sequence 1..n (n == len+1)
    int value = 0;
    for( int i=1; i <= len+1; ++i )
        value ^= i;

    //xor in every item of the stream; values that are present cancel out
    for( int i=0; i < len; ++i, ++stream )
        value ^= *stream;

    //what's left is the missing number
    cout << "Found: " << value << endl;
}
```

— edA-qa mort-ora-y

Please post readable (pseudo) code of only the algorithm instead (skip main). Also, a correctness proof/argument at some level should be included.

— Raphael

@edA-qamort-ora-y Your answer assumes that the reader knows C++. To someone who is not familiar with this language, there is nothing to see: both finding the relevant passage and understanding what it's doing are a challenge. Readable pseudocode would make this a better answer. The C++ is not really useful on a computer science site.

— Gilles 'SO- stop being evil'

If my answer proves not to be useful people don't need to vote for it.

— edA-qa mort-ora-y

+1 for actually taking the time to write C++ code and test it out. Unfortunately as others pointed out, it's not SO. Still you put effort into this !

— Julien Lebot

I don't get the point of this answer: you take someone else's solution, which is very simple and obviously very efficient, and "test" it. Why is testing necessary? This is like testing that your computer adds numbers correctly. And there is nothing nontrivial about your code either.

— Sasho Nikolov