Heap -

This question has probably been asked before. It is problem 6.5-8 from CLRS (2nd ed.):

Give an $O(n \lg k)$-time algorithm to merge $k$ sorted lists into one sorted list, where $n$ is the total number of elements in all the input lists. (Hint: Use a min-heap for $k$-way merging.)

As there are $k$ sorted lists and a total of $n$ values, let us assume each list contains $\frac{n}{k}$ numbers. Moreover, each list is sorted in strictly ascending order, and the result will also be stored in ascending order.

My pseudo-code looks like this --

    list[k]   ; k sorted lists
    heap[k]   ; an auxiliary array to hold the min-heap
    result[n] ; array to store the sorted list
    for i := 1 to k                 ; O(k)
    do
        heap[i] := GET-MIN(list[i]) ; pick the first element 
                                    ; and keeps track of the current index - O(1)
    done
    BUILD-MIN-HEAP(heap) ; build the min-heap - O(k)
    for i := 1 to n
    do
        result[i] := EXTRACT-MIN(heap)  ; store the min - O(logk)
        nextMin := GET-MIN(list[1])     ; get the next element from the list 1 - O(1)
        ; find the minimum value from the top of k lists - O(k)
        for j := 2 to k                 
        do
            if GET-MIN(list[j]) < nextMin
                nextMin := GET-MIN(list[j]) 
        done
        ; insert the next minimum into the heap - O(logk)
        MIN-HEAP-INSERT(heap, nextMin)
    done

My overall complexity is $O(k) + O(k) + O(n(k + 2 \lg k)) \approx O(nk+n \lg k) \approx O(nk)$. I could not find any way to avoid the $O(k)$ loop inside the $O(n)$ loop that finds the next minimum element from the $k$ lists. Is there any other way around? How can I get an $O(n \lg k)$ algorithm?

— ramgorur

Answers:

The purpose of the heap is to give you the minimum, so I'm not sure what the purpose of this for-loop is - for j := 2 to k.

My take on the pseudo-code:

    lists[k][?]      // input lists; lists[j][i] is element i (0-based) of list j
    c = 0            // next free index in result
    result[n]        // output
    heap[k]          // min-heap of (i, j) pairs, compared by the value lists[j][i]
                     //   insert(i, j) adds element i of list j
                     //   deleteMin() removes and returns the pair (i, j) with the smallest value
                     // the reason we store the index and the list, rather than just the value,
                     //   is so that we can get the successor of any value

    // populate the initial heap with the first element of every list
    for j = 1:k                   // runs O(k) times
      heap.insert(0, j)           // O(log k)

    // keep doing this: delete the minimum, insert the next value from that list into the heap
    while !heap.empty()           // runs O(n) times
      i, j = heap.deleteMin()     // O(log k)
      result[c++] = lists[j][i]
      i++
      if (i < lists[j].length)    // insert only if not end-of-list
        heap.insert(i, j)         // O(log k)

The total time complexity is thus $O(k \log k + n \cdot 2 \log k) = O(n \log k)$, since $k \le n$ when no list is empty.
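The pseudo-code above can be sketched in Python with the standard `heapq` module. This is only one possible rendering, not the answerer's own code: the function name `merge_k_sorted`, the 0-based list indices, and the `(value, j, i)` tuple layout are assumptions made for illustration.

```python
import heapq

def merge_k_sorted(lists):
    """Merge already-sorted lists into one sorted list in O(n log k) time."""
    heap = []
    # Populate the heap with the first element of each non-empty list.
    # Each entry is (value, list index j, element index i), so ties on value
    # fall back to comparing j and never compare anything else.
    for j, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], j, 0))        # O(log k)

    result = []
    while heap:                                          # runs n times in total
        value, j, i = heapq.heappop(heap)                # O(log k)
        result.append(value)
        if i + 1 < len(lists[j]):                        # successor in the SAME list, if any
            heapq.heappush(heap, (lists[j][i + 1], j, i + 1))  # O(log k)
    return result

print(merge_k_sorted([[1, 10, 15], [4, 5, 6], [7, 8, 9]]))
# prints [1, 4, 5, 6, 7, 8, 9, 10, 15]
```

The heap never holds more than one entry per list, so its size stays at most $k$ and every push or pop costs $O(\log k)$.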

You can also, instead of deleteMin and insert, have a getMin ( $O(1)$ ) and an incrementIndex ( $O(\log k)$ ), which will reduce the constant factor, but not the complexity.
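In Python, the getMin / incrementIndex idea maps roughly onto peeking at `heap[0]` and calling `heapq.heapreplace`, which overwrites the root and sifts it down once instead of doing a separate pop and push. The sketch below is a hedged illustration of that variant. The name `merge_k_sorted_replace` and the tuple layout are assumptions, and the initial heap is built with `heapq.heapify` rather than repeated inserts.

```python
import heapq

def merge_k_sorted_replace(lists):
    """k-way merge that replaces the heap root in place where possible."""
    # Entries are (value, list index j, element index i); empty lists are skipped.
    heap = [(lst[0], j, 0) for j, lst in enumerate(lists) if lst]
    heapq.heapify(heap)                       # build the initial heap in one pass

    result = []
    while heap:
        value, j, i = heap[0]                 # getMin: peek at the root, O(1)
        result.append(value)
        if i + 1 < len(lists[j]):
            # incrementIndex: swap the root for its successor, one sift-down, O(log k)
            heapq.heapreplace(heap, (lists[j][i + 1], j, i + 1))
        else:
            heapq.heappop(heap)               # list j is exhausted, shrink the heap
    return result

print(merge_k_sorted_replace([[1, 10, 15], [4, 5, 6], [7, 8, 9]]))
# prints [1, 4, 5, 6, 7, 8, 9, 10, 15]
```

As the answer says, this only trims the constant factor; the asymptotic bound is unchanged.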

Example:
(using value rather than index and list index and heap represented as a sorted array for clarity)

Input: [1, 10, 15], [4, 5, 6], [7, 8, 9]

Initial heap: [1, 4, 7]

Delete 1, insert 10
Result: [1]
Heap: [4, 7, 10]

Delete 4, insert 5
Result: [1, 4]
Heap: [5, 7, 10]

Delete 5, insert 6
Result: [1, 4, 5]
Heap: [6, 7, 10]

Delete 6, insert nothing
Result: [1, 4, 5, 6]
Heap: [7, 10]

Delete 7, insert 8
Result: [1, 4, 5, 6, 7]
Heap: [8, 10]

Delete 8, insert 9
Result: [1, 4, 5, 6, 7, 8]
Heap: [9, 10]

Delete 9, insert nothing
Result: [1, 4, 5, 6, 7, 8, 9]
Heap: [10]

Delete 10, insert 15
Result: [1, 4, 5, 6, 7, 8, 9, 10]
Heap: [15]

Delete 15, insert nothing
Result: [1, 4, 5, 6, 7, 8, 9, 10, 15]
Heap: []

Done

— Dukeling

Let's say you have these lists to merge: list[1] = [1, 10, 15], list[2] = [4, 5, 6] and list[3] = [7, 8, 9]. At the first iteration, the value from the heap will be 1, and next your algorithm will insert 10 into the heap. But 10 is larger than every value in the other lists. How will you avoid that?

— ramgorur

@ramgorur It doesn't matter that 10 is in the heap. 4,5,6,7,8 and 9 will all get processed before it since we always get the smallest value from the heap and keep replacing deleted values with the next item from the same list. Edited answer with example.

— Dukeling

Well, if this is the case, we don't actually have to remember the same list for the next element push. We can pick a random list every time and push its next element into the heap, which will supposedly give the same result, am I right? Or is there any other special reason to follow the same-list argument?

— ramgorur

When deleting 4, if you pick a random list, you may end up inserting 8, thus the heap will be [7, 8, 10], from which you'll insert 7 instead of 5 into the result set, which will be wrong.

— Dukeling
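To make the failure concrete, here is a deliberately broken sketch that refills the heap from a different list than the one just popped. The "always take the next list" rule is a hypothetical stand-in for "pick a random list", chosen only so the run is reproducible. The function name and the `pos` bookkeeping are assumptions, and every input list is assumed to be non-empty.

```python
import heapq

def merge_with_wrong_refill(lists):
    """Deliberately WRONG: refills from the next list, not the one just popped."""
    k = len(lists)
    pos = [1] * k                                   # next unread index in each list
    heap = [(lst[0], j) for j, lst in enumerate(lists)]   # assumes non-empty lists
    heapq.heapify(heap)

    result = []
    while heap:
        value, j = heapq.heappop(heap)
        result.append(value)
        other = (j + 1) % k                         # hypothetical wrong choice of list
        if pos[other] < len(lists[other]):
            heapq.heappush(heap, (lists[other][pos[other]], other))
            pos[other] += 1
    return result

print(merge_with_wrong_refill([[1, 10, 15], [4, 5, 6], [7, 8, 9]]))
# prints [1, 4, 5, 7, 8, 9, 10, 6, 15]
```

After 4 is deleted, 8 is pushed while 6 is still unread in its list, so 7, 8 and 9 all leave the heap before 6 ever enters it. Refilling from the same list is what guarantees the heap always contains the smallest unread element of every list.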

@AshwaniGautam's comment on the other answer is apt: creating the heap initially can be done in time $O(k)$.

— Raphael

First of all, I think that your assumption of all lists having $n/k$ entries is not valid if the running time of the algorithm depends on the length of the longest list.

As for your problem, the following algorithm should do the trick:

Put the first elements of the lists in a min-heap $H$ of size $k$ . Remember for each element the list $l_m$ it belongs to. ( $O(k\lg k)$ )
For $i$ from $1$ to $n$ do:
- Extract the minimum $m$ from $H$ and store it in $result[i]$ ( $O(\lg k)$ )
- Insert the direct successor of $m$ in $l_m$ (if any) into $H$ ( $O(\lg k)$ )

The running time is obviously in $O(k\lg k + n \lg k)=O(n\lg k)$ and the algorithm correctly sorts $result$ .

Proof (or at least, an idea for a proof). Consider the following loop invariant: The $i$ -th element to insert into $result$ is always the minimum of the min-heap $H$ in step $i$ and therefore, $result[1..i]$ is correctly sorted after the $i$ -th iteration.

This is true before the first iteration: First, we show that the first element to insert into $result$ is in $H$ : Assume towards a contradiction that the first element to insert into $result$ (that is, the overall smallest element, call it $r_1$ ) were not a first element. Then, in the list $l$ that contains $r_1$ , the first element $l[1]$ must be distinct from $r_1$ (as by assumption, $r_1$ is not a first element). As our lists are all sorted, we have even $l[1] < r_1$ , but this is a contradiction, as we chose $r_1$ to be the overall smallest element. Obviously, the minimum of all first elements is the one to insert into $result$ .

The invariant holds after an iteration: We proceed in the same way. Assume the $i$ -th element to insert (call it $r_i$ ) were not in $H$ . By construction, $H$ holds at most one element from each list, and once it contains an element $m$ from a list $l$ , all of its predecessors in $l$ were already extracted from $H$ and (by hypothesis) correctly inserted into $result$ . Therefore, $r_i$ must be a successor of the element $m$ that $H$ currently holds from the same list $l$ . But this is, as above, a contradiction: $l$ is sorted, so $m < r_i$ , and $m$ has not yet been inserted into $result$ . Therefore, the invariant holds.

On termination, we thus have $result[1..n]$ correctly sorted.
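The invariant can also be checked mechanically on small inputs. The following sketch is not part of the proof: it runs the merge and asserts at every step that the heap minimum equals the smallest element not yet written to the result. The function name `merge_checking_invariant` and the use of a pre-sorted reference list are assumptions for illustration; the reference list is used only for the check, not by the algorithm itself.

```python
import heapq

def merge_checking_invariant(lists):
    """k-way merge that asserts the loop invariant at every step."""
    # Reference order, used ONLY to verify the invariant.
    expected = sorted(x for lst in lists for x in lst)

    heap = [(lst[0], j, 0) for j, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    result = []
    for step in range(len(expected)):
        # Invariant: the heap minimum is the smallest element not yet in result.
        assert heap[0][0] == expected[step], "invariant violated"
        value, j, i = heapq.heappop(heap)
        result.append(value)
        if i + 1 < len(lists[j]):              # direct successor in the same list
            heapq.heappush(heap, (lists[j][i + 1], j, i + 1))

    assert not heap                            # every element was extracted exactly once
    return result

print(merge_checking_invariant([[3], [1, 2, 4, 8], []]))
# prints [1, 2, 3, 4, 8]
```

The example input uses lists of different lengths on purpose, since the argument does not rely on every list having $n/k$ entries.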

— Cornelius Brand

Actually, the tighter time complexity would be $O(k + 2n \log k) = O(n \log k)$. $O(k)$ is a tighter bound than $O(k \log k)$ for building a heap. Refer to this for further clarification.

— Ashwani Gautam

@AshwaniGautam That's not tighter at all, since both $O(k)$ and $O(k \log k)$ are dominated. That said, your point is correct; the first initialization of the heap can indeed be done in linear time (in $k$ ). I guess (hope!) both answerers know that, but it's not crucial here.

— Raphael
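To spell out why the choice of heap construction does not affect the final bound, here is a short worked comparison. It assumes every list is non-empty, so that $k \le n$, and $k \ge 2$; neither assumption is stated explicitly in the thread.

```latex
\begin{align*}
\text{linear-time build:} \quad & O(k) + O(n \lg k) = O(n \lg k), \\
\text{$k$ successive inserts:} \quad & O(k \lg k) + O(n \lg k) = O(n \lg k), \\
\text{because} \quad & k \;\le\; k \lg k \;\le\; n \lg k \quad \text{whenever } 2 \le k \le n.
\end{align*}
```

Either way the $n$ extract-and-insert steps dominate, so the linear-time build improves only the lower-order term.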