In a past project, I built a pipeline to ingest crawled job postings from Elasticsearch and standardize them into a unified schema. Since the postings couldn’t all fit in memory, I needed to process them in batches. The Elasticsearch client yields postings one by one, so I wrapped it with a batching generator that groups them into chunks.
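A minimal sketch of that shape, assuming the official `elasticsearch` Python client and its `helpers.scan` utility; the index name and query here are placeholders, not the project's actual configuration:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")  # assumed local endpoint

# helpers.scan paginates over the index internally but yields hits one by one;
# "job-postings" and the match-all query stand in for the real setup.
postings = scan(es, index="job-postings", query={"query": {"match_all": {}}})
```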
To illustrate, suppose you have a generator that yields the numbers 0 through 9 (in reality, it might yield thousands or millions of items). To read them in batches of, say, 4, you can loop and use itertools.islice to take up to 4 items at a time. Each call to itertools.islice consumes at most 4 elements from the generator; once the generator is exhausted, islice simply yields nothing, so checking for an empty batch is how the loop knows to stop.
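Here is a minimal sketch of that loop, with the batching logic wrapped in a generator. The `batched` name is my own choice; Python 3.12 later added an `itertools.batched` built-in that does essentially the same thing, yielding tuples instead of lists:

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:  # islice yields nothing once the source is exhausted
            return
        yield batch

# A generator of 0..9 stands in for a much larger stream of postings.
numbers = (i for i in range(10))

for batch in batched(numbers, 4):
    print(batch)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]
```

Note that the batch is materialized with `list()` before the emptiness check: islice itself never raises on an exhausted source, so an empty list is the only reliable termination signal.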