Python Tips: the tempfile library

The other day I was writing a Python test, and I needed a temporary directory and file.

Normally, I set these up with the os and pathlib libraries. That works well enough, so I never had a reason to change it. This time, however, I asked Claude for a quick test that I was going to edit afterwards, and it showed me the tempfile library.

This module creates temporary files and directories. It works on all supported platforms. TemporaryFile, NamedTemporaryFile, TemporaryDirectory, and SpooledTemporaryFile are high-level interfaces which provide automatic cleanup and can be used as context managers. mkstemp() and mkdtemp() are lower-level functions which require manual cleanup.
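
To make the difference between the two levels concrete, here is a minimal sketch. The file name `report.txt` and the variable names are made up for illustration; only the `tempfile`, `os`, and `pathlib` calls come from the standard library.

```python
import os
import tempfile
from pathlib import Path

# High-level: the directory and everything in it is removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    report = Path(tmp_dir) / "report.txt"  # hypothetical file name
    report.write_text("hello")
    existed_inside = report.exists()
removed_after = not report.exists()

# Low-level: mkstemp() returns an open OS-level file descriptor and a path;
# closing the descriptor and deleting the file are up to the caller.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as handle:
        handle.write("hello")
    still_there = os.path.exists(path)
finally:
    os.remove(path)  # manual cleanup
```

For tests, the context-manager versions are usually the more convenient choice, since cleanup happens even when an assertion fails inside the block.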

You can do things like this:

import tempfile

# `data` can just be a string of your liking, like a fake CSV
data = "name,age\nAlice,32\n"

with tempfile.NamedTemporaryFile(mode='w+', delete=True) as temp_file:
    # You can write data to the temporary file
    temp_file.write(data)
    temp_file.flush()  # make sure the data is on disk before reading it via the path
    print(f"Created temporary file: {temp_file.name}")

    # You can then process the file via its path
    # or just process its content
    # ...

So practical! Quick tip of the day! Here’s a basic complete example:

import pandas as pd
import tempfile
import os
import unittest


def read_csv_to_dataframe(file_path):
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")

    return pd.read_csv(file_path)


class TestCSVReader(unittest.TestCase):
    def test_read_csv_to_dataframe(self):
        """Test that CSV data is correctly loaded into a DataFrame."""
        # Create sample CSV data
        csv_data = """name,age,city
Alice,32,Seattle
Bob,45,Portland
Charlie,28,San Francisco"""

        # Create a temporary CSV file
        with tempfile.NamedTemporaryFile(mode='w+', suffix='.csv') as temp_file:
            # Write the CSV data
            temp_file.write(csv_data)
            temp_file.flush()  # push buffered data to disk before reading via the path
            temp_path = temp_file.name

            # Test the function
            df = read_csv_to_dataframe(temp_path)

            # Verify the results
            self.assertEqual(len(df), 3)
            self.assertEqual(list(df.columns), ['name', 'age', 'city'])
            self.assertEqual(df.iloc[0]['name'], 'Alice')
            self.assertEqual(df.iloc[1]['age'], 45)
            self.assertEqual(df.iloc[2]['city'], 'San Francisco')

if __name__ == "__main__":
    unittest.main()
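
One caveat with the example above: it reopens the file by name while `temp_file` is still open, which works on Unix but may fail on Windows with the default settings. A possible workaround, sketched below, is to write the file into a `TemporaryDirectory` instead. To keep the sketch dependency-free it uses a hypothetical `read_csv_rows` helper built on the standard `csv` module rather than pandas; the file name `people.csv` is also made up.

```python
import csv
import tempfile
import unittest
from pathlib import Path


def read_csv_rows(file_path):
    """Hypothetical stand-in for read_csv_to_dataframe that avoids pandas."""
    with open(file_path, newline="") as handle:
        return list(csv.DictReader(handle))


class TestCSVReaderWithTempDir(unittest.TestCase):
    def test_read_csv_rows(self):
        csv_data = "name,age,city\nAlice,32,Seattle\nBob,45,Portland\n"

        with tempfile.TemporaryDirectory() as tmp_dir:
            csv_path = Path(tmp_dir) / "people.csv"
            # write_text opens, writes, and closes the file, so nothing
            # still holds it open when the reader opens it by path.
            csv_path.write_text(csv_data)

            rows = read_csv_rows(csv_path)

        self.assertEqual(len(rows), 2)
        self.assertEqual(rows[0]["name"], "Alice")
        self.assertEqual(rows[1]["age"], "45")  # the csv module yields strings
```

With pandas, the same pattern should work by passing `csv_path` to `read_csv_to_dataframe` inside the `with` block.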

Docker and DBT starter template

TL;DR: Docker and DBT starter template

I am currently revising some data modeling concepts, and the course I am taking uses Pentaho. However, the lecturer clearly states that Pentaho is just a tool, and that the course is generic enough to follow with any tool you want.

At Cable, we used DBT (data build tool) to build our data models and run them on BigQuery. I became a big fan of DBT, and really wanted to use it during my course. So, to make my life easier, I created a starter template for Docker and DBT.

You can check out the GitHub repository here. All files should be self-explanatory, but open an issue if you have any questions.

Motivation

  1. I did not want to set up a Python environment - I have been using Poetry and it feels clunky to me
  2. I already had Docker installed on my machine, so I thought that I could have a similar workflow to VSCode’s devcontainers
  3. I still wanted to use Zed as my editor - so I wanted to make sure that the changes on my host machine were reflected in the container
  4. I also wanted to be able to generate and access the docs that DBT creates
  5. And finally I knew I could just use docker-compose to run the dbt container and the Postgres container easily together

I hope this helps you get started with Docker and DBT.

I like virtual coffees

Virtual coffees are a divisive topic. The opinions I have seen around me range from “yuck, no” to “they’re okay”. Personally, I think they can be really great! I have had many fulfilling conversations, ranging from learning moments about other areas of the company to deep, meaningful life conversations. This post advocates for virtual coffees, to both managers and ICs, and shares a few tips that will hopefully help make your one-on-one moments better.


Setting up a new Mac

I wanted to write a blog post about how I set up my new Mac for work. However, I find myself annoyed at my own setup process, because it feels inefficient.

Let me walk you through how I set things up. Hopefully, along the way, I will figure out the questions I should be asking myself to improve it.
