Heap -


15

Possibly, this question has been asked before. It is from CLRS (2nd Ed), problem 6.5-8 -

Give an O(n lg k)-time algorithm to merge k sorted lists into one sorted list, where n is the total number of elements in all the input lists. (Hint: Use a min-heap for k-way merging.)

As there are k sorted lists and a total of n values, let us assume each list contains n/k numbers. Moreover, each list is sorted in strictly ascending order, and the result will also be stored in ascending order.

My pseudo-code looks like this --

    list[k]   ; k sorted lists
    heap[k]   ; an auxiliary array to hold the min-heap
    result[n] ; array to store the sorted list
    for i := 1 to k                 ; O(k)
    do
        heap[i] := GET-MIN(list[i]) ; pick the first element 
                                    ; and keeps track of the current index - O(1)
    done
    BUILD-MIN-HEAP(heap) ; build the min-heap - O(k)
    for i := 1 to n
    do
        result[i] := EXTRACT-MIN(heap)  ; store the min - O(logk)
        nextMin := GET-MIN(list[1])     ; get the next element from the list 1 - O(1)
        ; find the minimum value from the top of k lists - O(k)
        for j := 2 to k                 
        do
            if GET-MIN(list[j]) < nextMin
                nextMin := GET-MIN(list[j]) 
        done
        ; insert the next minimum into the heap - O(logk)
        MIN-HEAP-INSERT(heap, nextMin)
    done

My overall complexity looks like O(k) + O(k) + O(n(k + 2 lg k)) = O(nk + n lg k) = O(nk). I could not find any way to avoid the O(k) loop inside the O(n) loop to find the next minimum element from the k lists. Is there any other way around? How can I get an O(n lg k) algorithm?

Answers:


13

The purpose of the heap is to give you the minimum, so I'm not sure what the purpose of this for-loop is - for j := 2 to k.

My take on the pseudo-code:

    lists[k][?]      // input lists (0-indexed)
    c = 0            // index in result
    result[n]        // output
    heap[k]          // stores (index, list) pairs and compares them by list value
                     // if i is the index and j is the list
                     //   it has functions - insert(i, j) and deleteMin(), which returns i, j
                     // the reason we use the index and the list, rather than just the value,
                     //   is so that we can get the successor of any value

    // populate the initial heap with the first element of each list
    for j = 0:k-1                 // runs O(k) times
      heap.insert(0, j)           // O(log k)

    // keep doing this - delete the minimum, insert the next value from that list into the heap
    while !heap.empty()           // runs O(n) times
      i, j = heap.deleteMin()     // O(log k)
      result[c++] = lists[j][i]
      i++
      if (i < lists[j].length)    // insert only if not end-of-list
        heap.insert(i, j)         // O(log k)

The total time complexity is thus O(k log k + n · 2 log k) = O(n log k), since k ≤ n when every list is non-empty.
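The pseudo-code above can be sketched in Python roughly as follows. This is a minimal sketch rather than the answerer's own code: it assumes the standard-library `heapq` module as the min-heap, and the function name `merge_k_sorted` and the tuple layout `(value, list_index, element_index)` are illustrative choices.

```python
import heapq

def merge_k_sorted(lists):
    """Merge k sorted lists into one sorted list in O(n log k) time."""
    heap = []
    # Populate the heap with the first element of every non-empty list.
    for j, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], j, 0))       # O(log k)

    result = []
    while heap:                                        # runs n times
        value, j, i = heapq.heappop(heap)              # O(log k)
        result.append(value)
        i += 1
        if i < len(lists[j]):                          # successor from the SAME list
            heapq.heappush(heap, (lists[j][i], j, i))  # O(log k)
    return result

print(merge_k_sorted([[1, 10, 15], [4, 5, 6], [7, 8, 9]]))
# prints [1, 4, 5, 6, 7, 8, 9, 10, 15]
```

Storing the list index and element index next to the value is what lets the loop find the successor in O(1). For real use, Python's standard library already offers `heapq.merge`, which performs this kind of merge lazily.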

You can also, instead of deleteMin and insert, have a getMin (O(1)) and an incrementIndex (O(logk)), which will reduce the constant factor, but not the complexity.
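One way to read the getMin / incrementIndex idea, sketched in Python under the assumption that `heapq.heapreplace` plays the role of incrementIndex (it overwrites the root and sifts down once, instead of a separate pop and push). The function name `merge_with_replace` is hypothetical.

```python
import heapq

def merge_with_replace(lists):
    # Heap entries are (value, list_index, element_index).
    heap = []
    for j, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], j, 0))

    result = []
    while heap:
        value, j, i = heap[0]                  # getMin: peek at the root, O(1)
        result.append(value)
        if i + 1 < len(lists[j]):
            # "incrementIndex": replace the root with its successor, O(log k)
            heapq.heapreplace(heap, (lists[j][i + 1], j, i + 1))
        else:
            heapq.heappop(heap)                # this list is exhausted
    return result
```

Each output element now costs one sift-down instead of a delete followed by an insert, which is the constant-factor saving mentioned above; the asymptotic bound stays O(n log k).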

Example:
(using value rather than index and list index and heap represented as a sorted array for clarity)

Input: [1, 10, 15], [4, 5, 6], [7, 8, 9]

Initial heap: [1, 4, 7]

Delete 1, insert 10
Result: [1]
Heap: [4, 7, 10]

Delete 4, insert 5
Result: [1, 4]
Heap: [5, 7, 10]

Delete 5, insert 6
Result: [1, 4, 5]
Heap: [6, 7, 10]

Delete 6, insert nothing
Result: [1, 4, 5, 6]
Heap: [7, 10]

Delete 7, insert 8
Result: [1, 4, 5, 6, 7]
Heap: [8, 10]

Delete 8, insert 9
Result: [1, 4, 5, 6, 7, 8]
Heap: [9, 10]

Delete 9, insert nothing
Result: [1, 4, 5, 6, 7, 8, 9]
Heap: [10]

Delete 10, insert 15
Result: [1, 4, 5, 6, 7, 8, 9, 10]
Heap: [15]

Delete 15, insert nothing
Result: [1, 4, 5, 6, 7, 8, 9, 10, 15]
Heap: []

Done

Let's say you have these lists to merge: list[1] = [1, 10, 15], list[2] = [4, 5, 6] and list[3] = [7, 8, 9]. At the first iteration, the value from the heap will be 1, and next your algorithm will insert 10 into the heap, but 10 is larger than every value in the other two lists -- how will you avoid that?
ramgorur

@ramgorur It doesn't matter that 10 is in the heap. 4,5,6,7,8 and 9 will all get processed before it since we always get the smallest value from the heap and keep replacing deleted values with the next item from the same list. Edited answer with example.
Dukeling

well, if this is the case, we don't have to actually remember the same list for next element push. We can pick a random list every time and push the next element into heap -- which will also supposedly give the same result, am I right? Or is there any other special reason to follow the same list argument?
ramgorur

When deleting 4, if you pick a random list, you may end up inserting 8, thus the heap will be [7, 8, 10], from which you'll insert 7 instead of 5 into the result set, which will be wrong.
Dukeling

@AshwaniGautam's comment on the other answer is apt: creating the heap initially can be done in time O(k).
Raphael

13

First of all, I think that your assumption of all lists having n/k entries is not valid if the running time of the algorithm depends on the length of the longest list.

As for your problem, the following algorithm should do the trick:

  1. Put the first elements of the lists in a min-heap H of size k. Remember for each element m the list l_m it belongs to. (O(k lg k))
  2. For i from 1 to n do:
    • Extract the minimum m from H and store it in result[i]. (O(lg k))
    • Insert the direct successor of m in l_m (if any) into H. (O(lg k))

The running time is clearly in O(k lg k + n lg k) = O(n lg k) (as k ≤ n), and the algorithm correctly sorts result.

Proof (or at least, an idea for a proof). Consider the following loop invariant: The i-th element to insert into result is always the minimum of the min-heap H in step i and therefore, result[1..i] is correctly sorted after the i-th iteration.

This is true before the first iteration: First, we show that the first element to insert into result is in H. Assume towards a contradiction that the first element to insert into result (that is, the overall smallest element, call it r_1) were not a first element. Then, in the list l that contains r_1, the first element l[1] must be distinct from r_1 (as by assumption, r_1 is not a first element). As our lists are all sorted, we even have l[1] < r_1, but this is a contradiction, as we chose r_1 to be the overall smallest element. Hence the minimum of all first elements is the one to insert into result.

The invariant holds after an iteration: We proceed in the same way. Assume the i-th element to insert (call it r_i) were not in H. By construction, H holds at most one element from each list, and once it contains an element m from a list l, all of its predecessors in l were already extracted from H and (by hypothesis) correctly inserted into result. Therefore, r_i is assumed to be a successor of some element m in the list l. But this is, as above, a contradiction, as l is sorted, and therefore, the invariant holds.

On termination, we thus have result[1..n] correctly sorted.
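The invariant can also be checked mechanically. The following Python sketch, which assumes the standard-library `heapq` module and uses the hypothetical name `merge_checking_invariant`, runs the two steps of the algorithm and asserts in every iteration that the extracted heap minimum equals the smallest element not yet in result. The check itself costs O(k) per step, so it is for illustration only.

```python
import heapq

def merge_checking_invariant(lists):
    # Step 1: first element of each non-empty list, tagged with its list.
    heap = []
    for j, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], j, 0))

    pos = [0] * len(lists)   # pos[j]: first index of list j not yet in result
    result = []
    # Step 2: extract the minimum, then insert its successor from the same list.
    while heap:
        value, j, i = heapq.heappop(heap)
        # Because every list is sorted, the smallest remaining element
        # is the minimum over the current heads of the lists.
        heads = [lst[p] for lst, p in zip(lists, pos) if p < len(lst)]
        assert value == min(heads), "loop invariant violated"
        result.append(value)
        pos[j] = i + 1
        if i + 1 < len(lists[j]):
            heapq.heappush(heap, (lists[j][i + 1], j, i + 1))
    return result
```

If the successor were taken from a different list than the one the minimum came from, the assertion would eventually fail, because a list head smaller than everything in the heap could be left outside it.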


Actually, the tighter time complexity would be O(K + 2·N log K) = O(N log K). O(K) is a tighter bound than O(K log K) for building a heap. Refer to this for further clarification.
Ashwani Gautam

@AshwaniGautam That's not tighter at all, since both O(k) and O(klogk) are dominated. That said, your point is correct; the first initialization of the heap can indeed be done in linear time (in k). I guess (hope!) both answerers know that, but it's not crucial here.
Raphael
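For illustration, the linear-time initialization discussed in these comments could look like this in Python. It is a sketch that assumes the standard-library `heapq.heapify`, which is documented to build a heap in place in linear time; `build_initial_heap` is a hypothetical helper name.

```python
import heapq

def build_initial_heap(lists):
    # One (value, list_index, element_index) entry per non-empty list.
    heap = [(lst[0], j, 0) for j, lst in enumerate(lists) if lst]
    heapq.heapify(heap)   # bottom-up construction, O(k) instead of k pushes
    return heap

heap = build_initial_heap([[7, 8, 9], [4, 5, 6], [1, 10, 15]])
print(heap[0])   # prints (1, 2, 0): the smallest first element sits at the root
```

As noted above, this changes only the initialization term; the n extractions still dominate, so the overall bound remains O(n log k).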
Licensed under cc by-sa 3.0 with attribution required.