Live, Learn, and Lung

Tag: Python

How to Yield Batches from an Iterable in Python

In a past project, I built a pipeline to ingest crawled job postings from Elasticsearch and standardize them into a unified schema. Since the postings couldn’t all fit in memory, I needed to process them in batches. The Elasticsearch client yields postings one by one, so I wrapped it with a batching generator that groups them into chunks.

To illustrate, suppose you have a generator that yields numbers 0 to 9 (in reality, it might yield thousands or millions of items). To read them in batches (for example, batch size of 4), you can loop and use itertools.islice to take the next 4 items. Each call to itertools.islice consumes up to 4 elements from the generator, and the loop continues until the generator is exhausted.
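The loop described above can be sketched as follows. This is a minimal illustration rather than the original pipeline code: the function name `yield_batches` and the `batch_size` parameter are assumptions made for the example, and the number generator stands in for the Elasticsearch client.

```python
from itertools import islice


def yield_batches(iterable, batch_size):
    """Yield lists of up to batch_size items (hypothetical helper name)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Create the iterator once so each islice call resumes where the last one stopped.
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            # An empty batch means the underlying iterator is exhausted.
            return
        yield batch


# Stand-in for a large source: a generator that yields 0 to 9.
numbers = (n for n in range(10))
batches = list(yield_batches(numbers, 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The last batch may be shorter than the batch size, so downstream code should not assume every batch is full. On Python 3.12 and later, itertools.batched offers similar behavior out of the box, yielding tuples instead of lists.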

Using Docker to Run Your Python Tests

Have you ever come across the infamous ‘It works on my machine’ issue? I’m sure you have; it’s a common challenge in software development. Docker containers tackle this problem by packaging your code together with its dependencies so that it runs in the same environment everywhere. That consistency is one reason Docker is widely used in CI/CD pipelines. In this article, I will demonstrate how to use Docker to build a Python test environment and run your tests in it.

Managing Project Dependencies with Poetry

Most Python projects end up depending on third-party packages. The conventional way to manage these dependencies is a requirements.txt file. However, it’s easy to forget to regenerate this file with pip freeze > requirements.txt after installing new packages. Moreover, the file does not distinguish packages you installed directly from those pulled in indirectly as dependencies of other packages, so after removing a package it is unclear which of the remaining entries are still needed.

To address these issues, it’s recommended to adopt a modern package manager like Poetry, which records the dependencies you declare separately from the ones they pull in.

Using pytest to Test Your Code

Have you ever found yourself inheriting legacy code and questioned its functionality after refactoring? Or, have you made changes to your code and wondered if it still works correctly? If you’ve experienced either of these scenarios, it’s time to consider implementing tests for your code. In this article, we will cover the basics of pytest for conducting unit tests.
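As a rough idea of what such a test looks like, here is a minimal sketch. The `slugify` function and the file name `test_slugify.py` are hypothetical, made up for this example; the only pytest conventions relied on are that test files are named test_*.py, test functions start with test_, and checks are plain assert statements.

```python
# test_slugify.py (hypothetical file name; pytest collects files named test_*.py)


def slugify(title):
    """Hypothetical function under test: lowercase the title and join words with hyphens."""
    return "-".join(title.lower().split())


def test_slugify_joins_words_with_hyphens():
    # pytest treats a failing assert as a test failure and reports the compared values.
    assert slugify("Hello World") == "hello-world"


def test_slugify_collapses_extra_whitespace():
    assert slugify("  Using   pytest ") == "using-pytest"
```

Running the pytest command in the directory containing this file should discover and run both tests. In a real project, the function under test would normally live in its own module and be imported by the test file.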

Using pyenv to Manage Python Versions

During development, you may need different Python versions for different projects. For example, you might need Python 3.6 for one project and Python 3.10 for another. Instead of installing and juggling each version on your system by hand, you can use pyenv to install multiple Python versions side by side and switch between them.