Determine the missing number in a data stream


14

We receive a stream of $n-1$ pairwise distinct numbers from the set $\{1,\dots,n\}$.

How can I determine the missing number with an algorithm that reads the stream once and uses only $O(\log_2 n)$ bits of memory?

Answers:


7

You know $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, and because $S = \frac{n(n+1)}{2}$ can be encoded in $O(\log n)$ bits, this can be done in $O(\log n)$ memory and in one pass (just compute $S - \text{currentSum}$; this is the missing number).
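A minimal sketch of this sum-based idea, assuming $n(n+1)/2$ fits in 64 bits and that a `std::vector` stands in for the stream; the function name `missing_by_sum` is illustrative and not from the answer.

```cpp
#include <cstdint>
#include <vector>

// Returns the single value from {1,...,n} that is absent from `stream`.
// Assumption: `stream` holds n-1 pairwise distinct values from {1,...,n}.
std::uint64_t missing_by_sum(std::uint64_t n,
                             const std::vector<std::uint64_t>& stream) {
    // Start from S = n(n+1)/2 and subtract each element as it arrives.
    std::uint64_t remaining = n * (n + 1) / 2;
    for (std::uint64_t y : stream) {
        remaining -= y;  // one pass, one accumulator of O(log n) bits
    }
    return remaining;
}
```

Only the accumulator is kept between elements, so the vector could be swapped for any input source that is read once.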

But this problem can also be solved in the general case (for constant $k$): we have $k$ missing numbers and want to find all of them. In this case, instead of calculating just the sum of the $y_i$, calculate the sum of the $j$-th powers of the $x_i$ for all $1 \le j \le k$ (I assume the $x_i$ are the missing numbers and the $y_i$ are the input numbers):

$$\sum_{i=1}^{k} x_i = S_1, \quad \sum_{i=1}^{k} x_i^2 = S_2, \quad \dots, \quad \sum_{i=1}^{k} x_i^k = S_k \qquad (1)$$

Remember that you can calculate $S_1, \dots, S_k$ simply, because $S_1 = S - \sum y_i$, $S_2 = \sum_{i=1}^{n} i^2 - \sum y_i^2$, ...

Now for finding missing numbers you should solve (1) to find all xi.

You can compute:

$$P_1 = \sum_{i} x_i, \quad P_2 = \sum_{i<j} x_i x_j, \quad \dots, \quad P_k = \prod_{i} x_i \qquad (2)$$

For this, remember that $P_1 = S_1$, $P_2 = \frac{S_1^2 - S_2}{2}$, ...
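For reference, the recurrence behind these formulas (Newton's identities, written here in this answer's notation with the convention $P_0 = 1$) can be stated as follows. The explicit $P_3$ line is a worked instance added for illustration and is not part of the original answer.

```latex
\begin{align*}
j\,P_j &= \sum_{i=1}^{j} (-1)^{i-1} P_{j-i}\, S_i, \qquad 1 \le j \le k,\\
P_1 &= S_1,\\
P_2 &= \frac{P_1 S_1 - S_2}{2} = \frac{S_1^2 - S_2}{2},\\
P_3 &= \frac{P_2 S_1 - P_1 S_2 + S_3}{3} = \frac{S_1^3 - 3 S_1 S_2 + 2 S_3}{6}.
\end{align*}
```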

The $P_i$ are, up to sign, the coefficients of $P = (x - x_1)(x - x_2)\cdots(x - x_k)$, and $P$ factors uniquely, so you can find the missing numbers.
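A hedged sketch of the case $k = 2$, assuming all intermediate values fit in a signed 64-bit integer (roughly $n \le 10^6$) and that a `std::vector` stands in for the stream; `two_missing` is a hypothetical helper name. It accumulates $S_1$ and $S_2$ in one pass, derives $P_1$ and $P_2$, and then finds the roots of $x^2 - P_1 x + P_2$.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the two values of {1,...,n} absent from `stream`, smaller first.
// Assumption: `stream` holds n-2 pairwise distinct values from {1,...,n}.
std::pair<std::int64_t, std::int64_t>
two_missing(std::int64_t n, const std::vector<std::int64_t>& stream) {
    // Closed forms for the sum of i and the sum of i^2 over 1..n.
    std::int64_t s1 = n * (n + 1) / 2;
    std::int64_t s2 = n * (n + 1) * (2 * n + 1) / 6;
    for (std::int64_t y : stream) {  // single pass over the input
        s1 -= y;                     // ends as S1 = x1 + x2
        s2 -= y * y;                 // ends as S2 = x1^2 + x2^2
    }
    const std::int64_t p1 = s1;                  // P1 = x1 + x2
    const std::int64_t p2 = (s1 * s1 - s2) / 2;  // P2 = x1 * x2
    // Roots of x^2 - P1 x + P2; the discriminant equals (x1 - x2)^2.
    const std::int64_t disc = p1 * p1 - 4 * p2;
    std::int64_t root = static_cast<std::int64_t>(
        std::llround(std::sqrt(static_cast<double>(disc))));
    // Correct any floating-point rounding of the integer square root.
    while (root * root > disc) --root;
    while ((root + 1) * (root + 1) <= disc) ++root;
    return {(p1 - root) / 2, (p1 + root) / 2};
}
```

For larger $k$ the same pattern applies with $k$ accumulators, but finding the roots of a degree-$k$ polynomial needs more machinery than the quadratic formula used here.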

These are not my thoughts; read this.


1
I don't get (2). Maybe if you added in the sums' details? Does $P_k$ miss a $\sum$?
Raphael

@Raphael, the $P_i$ come from Newton's identities. I think if you take a look at my referenced wiki page you can get the idea of the calculation: each $P_i$ can be calculated from the previous $P$s and the $S_j$. Remember the simple formula $2x_1x_2 = (x_1+x_2)^2 - (x_1^2+x_2^2)$; you can apply a similar approach to all powers. Also, as I wrote, $P_i$ is a sigma of something, but $P_k$ doesn't have any $\Sigma$, because there is just one $\Pi$.

Be that as it may, answers should be self-contained to a reasonable degree. You give some formulae, so why not make them complete?
Raphael

11

From the comment above:

Before processing the stream, allocate $\lfloor \log_2 n \rfloor + 1$ bits, in which you write $x := \bigoplus_{i=1}^{n} \mathrm{bin}(i)$ ($\mathrm{bin}(i)$ is the binary representation of $i$ and $\oplus$ is pointwise exclusive-or). Naively, this takes $O(n)$ time.
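The $O(n)$ initialization can be avoided: the XOR of $1,\dots,n$ depends only on $n \bmod 4$. This is a standard fact rather than part of the original answer; a small sketch, with `xor_upto` as an illustrative name:

```cpp
#include <cstdint>

// XOR of all integers in {1,...,n}, computed in O(1) time.
// Each block 4m, 4m+1, 4m+2, 4m+3 XORs to zero, so only n mod 4 matters.
std::uint64_t xor_upto(std::uint64_t n) {
    switch (n % 4) {
        case 0:  return n;
        case 1:  return 1;
        case 2:  return n + 1;
        default: return 0;  // n % 4 == 3
    }
}
```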

Upon processing the stream, whenever one reads a number $j$, compute $x := x \oplus \mathrm{bin}(j)$. Let $k$ be the single number from $\{1,\dots,n\}$ that is not included in the stream. After having read the whole stream, we have

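$$x = \left(\bigoplus_{i=1}^{n} \mathrm{bin}(i)\right) \oplus \left(\bigoplus_{i \neq k} \mathrm{bin}(i)\right) = \mathrm{bin}(k) \oplus \bigoplus_{i \neq k} \bigl(\mathrm{bin}(i) \oplus \mathrm{bin}(i)\bigr) = \mathrm{bin}(k),$$
yielding the desired result.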

Hence, we used $O(\log n)$ space, and have an overall runtime of $O(n)$.
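A small worked instance of the argument, using $n = 4$ and the stream $1, 4, 2$ (values chosen here purely for illustration):

```latex
\begin{align*}
x_0 &= 001 \oplus 010 \oplus 011 \oplus 100 = 100 && \text{(initial value } 1 \oplus 2 \oplus 3 \oplus 4\text{)}\\
x_1 &= 100 \oplus 001 = 101 && \text{(read } 1\text{)}\\
x_2 &= 101 \oplus 100 = 001 && \text{(read } 4\text{)}\\
x_3 &= 001 \oplus 010 = 011 && \text{(read } 2\text{)}
\end{align*}
```

The final value $011 = \mathrm{bin}(3)$ is the missing number.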


3
May I suggest an easy optimization that makes this a true streaming single-pass algorithm: at time step $i$, xor $x$ with $\mathrm{bin}(i)$ and with the input $\mathrm{bin}(j)$ that has arrived on the stream. This has the added benefit that you can make it work even if $n$ is not known ahead of time: just start with a single bit allocated for $x$ and "grow" the allocated space as necessary.
Sasho Nikolov
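A sketch of how this single-pass variant might look, assuming a fixed 64-bit accumulator instead of the growing allocation the comment describes; the class name `MissingFinder` and its methods are hypothetical.

```cpp
#include <cstdint>

// Single-pass XOR accumulator that does not need n up front.
// Assumption: the stream delivers n-1 pairwise distinct values from {1,...,n}.
class MissingFinder {
public:
    // Called once per arriving value j; count_ plays the role of time step i.
    void consume(std::uint64_t j) {
        ++count_;
        acc_ ^= count_ ^ j;  // fold in bin(i) and bin(j) together
    }
    // After n-1 values, indices 1..n-1 are folded in; add the last index n.
    std::uint64_t missing() const {
        return acc_ ^ (count_ + 1);
    }
private:
    std::uint64_t count_ = 0;  // number of values seen so far
    std::uint64_t acc_ = 0;    // running XOR
};
```

The time-step counter itself also takes $O(\log n)$ bits, so the space bound is unchanged. Since the stream only supplies indices $1,\dots,n-1$ as time steps, the final index $n$ is folded in when the result is requested.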

0

HdM's solution works. I coded it in C++ to test it. I can't limit the value to $O(\log_2 n)$ bits, but I'm sure you can easily show how only that number of bits is actually set.

For those who want pseudocode, using a simple fold operation with exclusive or ($\oplus$):

$$\text{Missing} = \mathrm{fold}(\oplus, \{1,\dots,N\} \cup \text{InputStream})$$

Hand-wavy proof: $\oplus$ never requires more bits than its inputs, so it follows that no intermediate result in the above requires more than the maximum number of bits of any input (so $O(\log_2 n)$ bits). $\oplus$ is commutative and associative, and $x \oplus x = 0$; thus, if you expand the above and pair off all values present in the stream, you are left with a single unmatched value, the missing number.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

using namespace std;

void find_missing( int const * stream, int len );

int main( int argc, char ** argv )
{
    if( argc < 2 )
    {
        cerr << "Syntax: " << argv[0] << " N" << endl;
        return 1;
    }
    int n = atoi( argv[1] );
    if( n < 1 )
    {
        cerr << "N must be a positive integer" << endl;
        return 1;
    }

    //construct sequence 1..n
    vector<int> seq;
    for( int i=1; i <= n; ++i )
        seq.push_back( i );

    //remove a random number and remember it
    mt19937 rng( random_device{}() );
    int removed = uniform_int_distribution<int>( 1, n )( rng );
    seq.erase( seq.begin() + (removed - 1) );
    cout << "Removed: " << removed << endl;

    //give the stream a random order
    //(std::random_shuffle was removed in C++17; std::shuffle replaces it)
    shuffle( seq.begin(), seq.end(), rng );

    //data() is safe even when the vector is empty (n == 1)
    find_missing( seq.data(), int(seq.size()) );
}

//HdM's solution
void find_missing( int const * stream, int len )
{
    //create initial value of n sequence xor'ed (n == len+1)
    int value = 0;
    for( int i=0; i < (len+1); ++i )
        value = value ^ (i+1);

    //xor all items in stream
    for( int i=0; i < len; ++i, ++stream )
        value = value ^ *stream;

    //what's left is the missing number
    cout << "Found: " << value << endl;
}

3
Please post readable (pseudo) code of only the algorithm instead (skip main). Also, a correctness proof/argument at some level should be included.
Raphael

4
@edA-qamort-ora-y Your answer assumes that the reader knows C++. To someone who is not familiar with this language, there is nothing to see: both finding the relevant passage and understanding what it's doing are a challenge. Readable pseudocode would make this a better answer. The C++ is not really useful on a computer science site.
Gilles 'SO- stop being evil'

3
If my answer proves not to be useful people don't need to vote for it.
edA-qa mort-ora-y

2
+1 for actually taking the time to write C++ code and test it out. Unfortunately, as others pointed out, it's not SO. Still, you put effort into this!
Julien Lebot

9
I don't get the point of this answer: you take someone else's solution, which is very simple and obviously very efficient, and "test" it. Why is testing necessary? This is like testing that your computer adds numbers correctly. And there is nothing nontrivial about your code either.
Sasho Nikolov
Licensed under cc by-sa 3.0 with attribution required.