Live, Learn, and Lung

How to Yield Batches from an Iterable in Python

In a past project, I built a pipeline to ingest crawled job postings from Elasticsearch and standardize them into a unified schema. Since the postings couldn’t all fit in memory, I needed to process them in batches. The Elasticsearch client yields postings one by one, so I wrapped it with a batching generator that groups them into chunks.

To illustrate, suppose you have a generator that yields the numbers 0 to 9 (in reality, it might yield thousands or millions of items). To read them in batches (for example, with a batch size of 4), you can loop and use itertools.islice to take the next 4 items. Each call to itertools.islice consumes up to 4 elements from the generator, so the final batch may be shorter than the rest. Once the generator is exhausted, islice returns nothing, and the resulting empty chunk ends the loop.

from itertools import islice
from typing import Any, Generator, Iterable, Tuple

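def batched_yield(
    iterable: Iterable[Any], batch_size: int
) -> Generator[Tuple[Any, ...], None, None]:
    """Yield tuples of up to batch_size items from iterable."""
    # Use one shared iterator so each islice call resumes where the last stopped.
    it = iter(iterable)
    while True:
        # Take up to batch_size items; the last chunk may be shorter.
        chunk = tuple(islice(it, batch_size))
        if not chunk:
            # An empty chunk means the source is exhausted.
            return
        yield chunk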

numbers = (i for i in range(10))
for group in batched_yield(numbers, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9)
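
To see why the empty-chunk check ends the loop, it may help to call islice by hand on one shared iterator. The following is only an illustrative sketch; the variable names are made up for this example.

```python
from itertools import islice

# One shared iterator: each islice call picks up where the previous one stopped.
it = iter(range(10))

first = tuple(islice(it, 4))
second = tuple(islice(it, 4))
third = tuple(islice(it, 4))   # only two items are left
fourth = tuple(islice(it, 4))  # the iterator is exhausted

print(first, second, third, fourth)
# (0, 1, 2, 3) (4, 5, 6, 7) (8, 9) ()
```

The fourth call returns an empty tuple rather than raising an error, which is what lets batched_yield use `if not chunk` as its stopping condition.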

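The code above can be simplified with yield from. For a simple generator like this one, yield from g is equivalent to for v in g: yield v, so yield from delegates the loop and yields each chunk until the source iterator is exhausted. The iterator it delegates to comes from the two-argument form of iter: iter(callable, sentinel) calls callable repeatedly and stops as soon as the return value equals sentinel. Here the callable returns the next chunk and the sentinel is an empty tuple, which (for a positive batch size) islice produces only when the source has nothing left.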

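from itertools import islice
from typing import Any, Generator, Iterable, Tuple

def batched_yield_from(
    iterable: Iterable[Any], batch_size: int
) -> Generator[Tuple[Any, ...], None, None]:
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns ().
    yield from iter(lambda: tuple(islice(it, batch_size)), ())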

numbers = (i for i in range(10))
for group in batched_yield_from(numbers, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9)
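
The two-argument form of iter used above can also be tried in isolation. This is a small sketch, with next_chunk as a hypothetical helper name standing in for the lambda.

```python
from itertools import islice

it = iter(range(10))

def next_chunk():
    # Return the next (up to) 4 items as a tuple; () once the source is exhausted.
    return tuple(islice(it, 4))

# iter(callable, sentinel) calls next_chunk() until it returns the sentinel ().
chunks = list(iter(next_chunk, ()))
print(chunks)
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```

The sentinel itself is never yielded, so the empty tuple does not show up in the result.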

In Python 3.12 or later, you can simply use itertools.batched, which yields tuples of up to n items from any iterable. As with the versions above, the last batch may be shorter than the others.

# 3.12+
from itertools import batched

numbers = (i for i in range(10))
for group in batched(numbers, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9)
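
If the same code has to run on interpreters both older and newer than 3.12, one possible approach is to try the import and fall back to a hand-written equivalent. The sketch below is an assumption about how you might wire that up, not part of the standard library; the fallback imitates itertools.batched by rejecting a batch size below 1, although it raises only when iteration starts.

```python
try:
    # Available in Python 3.12+.
    from itertools import batched
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        """Fallback sketch: yield tuples of up to n items from iterable."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # Stop as soon as islice returns an empty chunk.
        while chunk := tuple(islice(it, n)):
            yield chunk

result = list(batched(range(10), 4))
print(result)
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```

Callers then import batched from wherever this shim lives and do not need to care which Python version is running.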