How to split an iterable into constant-size chunks


87

Possible duplicate:
How do you split a list into evenly sized chunks in Python?

I am surprised I could not find a "batch" function that would take an iterable as input and return an iterable of iterables.

For example:

for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]

or:

for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]

Now, I wrote what I thought was a pretty simple generator:

def batch(iterable, n = 1):
   current_batch = []
   for item in iterable:
       current_batch.append(item)
       if len(current_batch) == n:
           yield current_batch
           current_batch = []
   if current_batch:
       yield current_batch

But the above does not give me what I would have expected:

for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]

So, I have missed something, and this probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?

[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]


Good question, well written, but it already exists and will solve your problem.
Josh Smeaton

7
IMO this isn't really a duplicate. The other question focuses on lists instead of iterators, and most of those answers require len() which is undesirable for iterators. But eh, the currently accepted answer here also requires len(), so...
dequis

7
This is clearly not a duplicate. The other Q&A only works for lists, and this question is about generalizing to all iterables, which is exactly the question I had in mind when I came here.
Mark E. Haase

1
@JoshSmeaton @casperOne this is not a duplicate and the accepted answer is not correct. The linked duplicate question is for lists and this one is for iterables; a list provides a len() method but an iterable does not, so the answer is different once len() is off the table. This is the correct answer: batch = (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *[iter(iterable)] * n))
Trideep Rath

@TrideepRath yep, I've voted to reopen.
Josh Smeaton

Answers:


126

This is probably more efficient (faster)

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print x

Example using list

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data 

for x in batch(data, 3):
    print(x)

# Output

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]

It avoids building new lists.


4
For the record, this is the fastest solution I found: mine = 4.5s, yours=0.43s, Donkopotamus = 14.8s
mathieu

77
your batch in fact accepts a list (with len()), not iterable (without len())
tdihp

31
This is faster because it isn't a solution to the problem. The grouper recipe by Raymond Hettinger - currently below this - is what you are looking for for a general solution that doesn't require the input object to have a len method.
Robert E Mealey

7
Why do you use min()? Without min() the code is completely correct!
Pavel Patrin

21
Iterables don't have len(), sequences have len()
Kos
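
To make the distinction in these comments concrete, here is a tiny, purely illustrative check (the generator name is made up for the example):

def numbers():
    # a generator: it can be iterated, but it has no length
    yield from range(10)

print(hasattr(numbers(), "__len__"))    # False -- len(numbers()) would raise TypeError
print(hasattr([0, 1, 2], "__len__"))    # True  -- sequences do support len()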

64

FWIW, the recipes in the itertools module provide this example:

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

It works like this:

>>> list(grouper(3, range(10)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

13
This is not exactly what I needed since it pads the last element with a set of None. i.e., None is a valid value in the data I actually use with my function so what I need instead is something that does not pad the last entry.
mathieu

12
@mathieu Replace izip_longest with izip, which will not pad the last entries, but instead cut off entries when some of the elements start running out.
GoogieK

3
Should be zip_longest/zip in python 3
Peter Gerdes

5
@GoogieK for x, y in enumerate(grouper(3, xrange(10))): print(x,y) does indeed not fill values, it just drops the incomplete segment altogether.
kadrach

3
As a one liner that drops the last element if incomplete: list(zip(*[iter(iterable)] * n)). This has to be the neatest bit of python code I've ever seen.
Le Frite
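
Putting the padded recipe and the truncating one-liner from the comment above side by side (a quick sketch using the same grouper as the answer):

from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

print(list(grouper(3, range(10))))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]   <- padded with fillvalue

print(list(zip(*[iter(range(10))] * 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]                    <- incomplete tail dropped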

31

As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, you could see an example of the following recipe:

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([batchiter.next()], batchiter)

1
@abhilash No ... this code uses the call to next() to cause a StopIteration once sourceiter is exhausted, thus terminating the iterator. Without the call to next it would continue to return empty iterators indefinitely.
donkopotamus

7
I had to replace batchiter.next() with next(batchiter) to make the above code work in Python 3.
Martin Wiebusch

2
pointing out a comment from the linked article: "You should add a warning that a batch has to be entirely consumed before you can proceed to the next one." The output of this should be consumed with something like: map(list, batch(xrange(10), 3)). Doing list(batch(xrange(10), 3)) will produce unexpected results.
Nathan Buesgens

2
Does not work on py3. .next() must be changed to next(..), and list(batch(range(0,10),3)) throws RuntimeError: generator raised StopIteration
mathieu

1
@mathieu: Wrap the while loop in try:/except StopIteration: return to fix the latter issue.
ShadowRanger
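
Folding the Python 3 fixes from these comments into one sketch (this is an adaptation, not the answer's original code):

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # next() instead of .next(), and an explicit return instead of
            # letting StopIteration escape the generator (PEP 479)
            first = next(batchiter)
        except StopIteration:
            return
        yield chain([first], batchiter)

# each batch must be fully consumed before advancing to the next one
print([list(b) for b in batch(range(10), 3)])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]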

13

I just gave an answer above. However, now I feel the best solution might be not to write any new functions at all: more-itertools includes plenty of additional tools, and chunked is amongst them.
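
A minimal usage sketch, assuming more-itertools is installed (pip install more-itertools):

from more_itertools import chunked

print(list(chunked(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]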


This is indeed the most fitting answer (even though it requires installation of one more package), and there's also ichunked that yields iterables.
viddik13

10

Weird, seems to work fine for me in Python 2.x

>>> def batch(iterable, n = 1):
...    current_batch = []
...    for item in iterable:
...        current_batch.append(item)
...        if len(current_batch) == n:
...            yield current_batch
...            current_batch = []
...    if current_batch:
...        yield current_batch
...
>>> for x in batch(range(0, 10), 3):
...     print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

Great answer because it does not need to import anything and is intuitive to read.
ojunk

8

This is a very short code snippet I know of that does not use len() and works under both Python 2 and 3 (not my creation):

def chunks(iterable, size):
    from itertools import chain, islice
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))
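
For reference, a quick check of how the snippet above behaves (just an illustration):

print(list(chunks(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]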

7

Solution for Python 3.8, if you are working with iterables that don't define a len() function and get exhausted:

from itertools import islice

def batcher(iterable, batch_size):
    while batch := list(islice(iterable, batch_size)):
        yield batch

Example usage:

def my_gen():
    yield from range(10)
 
for batch in batcher(my_gen(), 3):
    print(batch)

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

Could of course be implemented without the walrus operator as well.


3
In the current version, batcher accepts an iterator, not an iterable. It would result in an infinite loop with a list, for example. There should probably be a line iterator = iter(iterable) before starting the while loop.
Daniel Perez
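
A sketch that folds in the fix suggested in this comment, so plain lists work as well as generators (an adaptation, not the original answer):

from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)  # make it safe for lists, not only iterators
    while batch := list(islice(iterator, batch_size)):
        yield batch

print(list(batcher([0, 1, 2, 3, 4], 2)))  # [[0, 1], [2, 3], [4]]
print(list(batcher(range(10), 3)))        # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]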

2

This is what I use in my project. It handles iterables or lists as efficiently as possible.

from itertools import islice

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return

    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]


def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:

        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more

        yield [k for k in chunk()]
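
Both code paths can be exercised like this (a small usage sketch; the generator expression is just example input):

print(list(chunker([0, 1, 2, 3, 4], 2)))        # sequence path:  [[0, 1], [2, 3], [4]]
print(list(chunker((x for x in range(5)), 2)))  # generator path: [[0, 1], [2, 3], [4]]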

2
def batch(iterable, n):
    iterable = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                # avoid yielding a trailing empty chunk when the length is a multiple of n
                if chunk:
                    yield chunk
                return
        yield chunk

list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Best answer so far, works with every data structure
Clément Prévost

1

This would work for any iterable.

from itertools import zip_longest, filterfalse

def batch_iterable(iterable, batch_size=2): 
    args = [iter(iterable)] * batch_size 
    return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))

It would work like this:

>>> list(batch_iterable(range(0, 5), 2))
[(0, 1), (2, 3), (4,)]

PS: It would not work if iterable has None values.


1

Here is an approach using the reduce function.

One-liner:

from functools import reduce
reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])

Or more readable version:

from functools import reduce
def batch(input_list, batch_size):
  def reducer(cumulator, item):
    if len(cumulator[-1]) < batch_size:
      cumulator[-1].append(item)
      return cumulator
    else:
      cumulator.append([item])
    return cumulator
  return reduce(reducer, input_list, [[]])

Test:

>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch([1,2,3,4,5,6,7], 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]

1

A workable version without the new features of Python 3.8, adapted from @Atra Azami's answer.

import itertools    

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)

    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)

Output:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

0

You can just group iterable items by their batch index.

import itertools
from typing import Any, Callable, Iterable

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
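
A quick usage sketch; as with anything built on groupby, each inner group has to be consumed before moving on to the next one:

for group in batch([1, 9, 3, 5, 2, 4, 2], 4):
    print(list(group))
# [1, 9, 3, 5]
# [2, 4, 2]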

It is often the case that you want to collect the inner iterables, so here is a more advanced version.

def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

Examples:

print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]

0

Related functionality you may need:

def batch(size, i):
    """ Get the i'th batch of the given size """
    return slice(size * i, size * i + size)

Usage:

>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)]
[4, 5, 6]

It gets the i'th batch from the sequence, and it works with other data structures as well, like pandas dataframes (df.iloc[batch(100,0)]) or numpy arrays (array[batch(100,0)]).
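
Looping over every batch with this helper is then just a matter of computing how many batches there are (a sketch, not part of the original answer):

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
size = 3
for i in range((len(data) + size - 1) // size):  # ceil(len(data) / size)
    print(data[batch(size, i)])
# [1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
# [10]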


0
from itertools import *

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(batch(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]

0

I use

import math

def batchify(arr, batch_size):
  num_batches = math.ceil(len(arr) / batch_size)
  return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
  

0

Keep taking (at most) n elements until it runs out.

def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk


def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return
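
Usage looks like this (a short illustrative check):

print(list(chop(3, range(10))))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]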