What I wish I knew about Python when I started
Seven years ago, I quit my job working for a startup and joined a new company that did all their development in Python. I had never used the Python language to any serious degree up until that point, but quickly learned to love the language for its intuitive nature and broad community support. At the same time, it often felt a bit like the wild west as there seemed to exist at least ten different ways to accomplish any one task, and no obvious consensus on which one was right.
Since then, through a combination of learning best practices from peers and colleagues and gaining firsthand experience through trial and error, I’ve developed a set of choices I now make—ones I only wish I had known about back then. Some of these ideas didn’t even exist seven years ago (like PEP-621), so implementing them back then would have been impossible. But now, I’m sharing them here for anyone in a similar position. Keep in mind that the Python ecosystem evolves quickly, so it’s entirely possible much of the advice here may become obsolete within a year. And of course, you may not agree with all my recommendations—and that’s okay! Feel free to debate (or roast) me in the comments.
I’ll be breaking this down into seven sections, so let’s dive in.
Folder structure and basics
The first thing I want to bring up probably seems small and inconsequential, but it was definitely something that tripped me up a few times when I first started. When starting a new project, I would naturally create a folder to store all my files, something like myproject. This folder would invariably be the root of a Git repository. I recommend avoiding storing Python files directly in the root of this folder. Instead, create a subfolder within myproject, also named myproject, and store your Python code there. Alternatively, you might create a subfolder called src for the code, but if you plan on using any of your new Python files as packages, you’ll probably still need another subfolder inside src called myproject. I will still include other files at the root of the repository—just not Python files. Here are some examples of other files that might live at the root:
- Dockerfile
- Justfile/Makefile
- pyproject.toml (more on this later)
- Other config files
Now keep in mind that rules are made to be broken, so there are exceptions to this. My baseline rule is to always avoid placing Python files directly at the root of a new repository when setting it up. Why, you might ask? Because the folder name of your repository will always be effectively invisible to the Python importer—it needs a subfolder in order to import things by name. So, suppose I have two files in my repository: an executable script called main.py and a utility module called utils.py from which I want to import a function. If I threw everything in the root of the repository, the folder structure would look something like this:
.
├── main.py
├── utils.py
└── README.md
…and then my main function would look something like this:
#!/usr/bin/env python
from myproject.utils import my_function
if __name__ == "__main__":
    my_function()
Well, then it simply wouldn’t work, would it? Although the root myproject folder is itself on the import path when you run the script from inside it, the package name myproject still isn’t importable unless and until we add that extra nested folder, so it should look more like this structure:
.
├── README.md
└── myproject
    ├── main.py
    └── utils.py
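For a quick sanity check of this layout, here is a minimal sketch: main.py keeps the same import as before, and one way to run it is from the repository root with python -m, so that the myproject package is importable by name.

#!/usr/bin/env python
# myproject/main.py, runnable from the repository root with:
#   python -m myproject.main
from myproject.utils import my_function

if __name__ == "__main__":
    my_function()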
Again, I recognize this probably seems small and inconsequential, but coming from a non-Pythonic background, I remember it felt weird at first. Just accept that this is the convention. And while we’re talking about basics, there are a few other things worth mentioning. You may have noticed that the first line of my sample Python script above included a shebang. It’s standard practice to include this only in files meant to be executed directly. Furthermore, I used /usr/bin/env python in my shebang, but in the wild you might come across /usr/bin/python or /usr/local/bin/python. I recommend sticking with /usr/bin/env python, as it’s the preferred approach outlined in PEP-394.

Finally, whatever file launches your application (in this case, mine is main.py), it is conventional to include that if __name__ == "__main__": block at the bottom (shown above) to specify the entrypoint to your application. This isn’t strictly necessary in many use cases—for instance, if you’re writing a REST API using FastAPI and launching it via uvicorn, you don’t need this block. But it also doesn’t hurt to have it regardless. It’s a helpful reminder of where things begin.
One critical piece of learning I want to share: when creating files, do your best not to give them the same names as Python standard library modules. If you do, it will inevitably cause import issues. I remember burning more time than I would care to admit, scratching my head over why one of my scripts wasn’t running, all because I had created a file named copy.py or email.py (which conflict with the copy and email standard library modules, respectively). So, watch out for that!
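To illustrate the failure mode, imagine a file named email.py sitting in the same folder as the script below (a hypothetical sketch):

# because ./email.py sits on sys.path ahead of the standard library,
# this line imports your local file instead of the stdlib email package
import email

# the function you expect from the real package isn't defined in your
# local file, so this line fails with an AttributeError
message = email.message_from_string("Subject: hi\n\nhello")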
Additionally, I sometimes create repositories that contain more than just Python code, in which case I further segment the folder structure. This goes beyond simply including template files alongside my Python code—I mean entirely separate projects within the same repository. This approach is often referred to as building a monorepo, where a single repository houses multiple projects. The most common structure I use is really more of a pseudo-monorepo containing two different projects: a backend and a frontend application. In many cases, my frontend application doesn’t include any Python at all, as JavaScript frameworks often make more sense. In such cases, my folder structure might look more like this:
.
├── README.md
├── backend
│   ├── myproject
│   │   ├── main.py
│   │   └── util.py
│   └── pyproject.toml
└── frontend
    └── package.json
Sometimes when writing applications in Python (or any language, really), you may find yourself writing common utility functions that aren’t specific to any one application. These might include functions for database connections, common data transformations, or reusable subclasses for workflow orchestration. In such cases, I recommend utilizing the earlier folder structure, where I create a folder called myproject at the root of the repository. To keep things organized, I also recommend making use of the __all__ dunder variable to cleanly define what utilities are exported.

To demonstrate this, I created a sample repository called rds-utils, which I reference later when we discuss publishing packages. If you peruse that repository, you’ll see how I have things structured for this library. Note the package itself is called rds-utils with a hyphen, but the root-level project folder is named rds_utils with an underscore. This is intentional. When writing import statements in Python, you cannot use hyphens, though it is relatively common to see hyphens used in the package names themselves. So with this repo, you would run pip install rds-utils but then use something like from rds_utils import fetch_query in your scripts.
I also use the __all__ special variable in my __init__.py file. Generally, I prefer to leave most of my __init__.py files completely blank (or omit them entirely), but defining the __all__ variable within them is my one exception. This essentially allows you to create a cleaner interface when building shared libraries, so that consumers of your library don’t have to worry about internal package structures. The way I have it set up right now, it’s still possible to directly import things like from rds_utils.utils import fetch_query, but it’s much nicer that consumers don’t need to know about that extra .utils that I have defined there.
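For illustration, the __init__.py in a library like this can be as small as the sketch below; fetch_query is the only name taken from the text, and the real repository may export more.

# rds_utils/__init__.py
# re-export the public interface so consumers can simply write
#   from rds_utils import fetch_query
# without knowing about the internal rds_utils.utils module
from rds_utils.utils import fetch_query

__all__ = ["fetch_query"]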
Virtual environments and dependency management
In the seven years I’ve been working with Python, the landscape has changed dramatically. It feels especially true now, as the Python community reaps the benefits of what you might call the “Rust Renaissance.” So although it still seems very nascent, I want to focus on a tool called uv which I’ve recently adopted and has streamlined my workflow by replacing three other tools I previously relied on. While uv helps manage virtual environments, it does a whole lot more. It also helps manage Python versions as well as project dependencies and can even be used to create packages.
When I first started writing this post, I had planned on getting into the nuances of pyenv, pdm, pipx—all of which I had been using. Just last year, I thought this blog post comparing Python package managers was still an insightful guide, but now it already seems outdated! While tools like pyenv and pdm still have advantages over uv in certain areas, those differences are minor, and I’m increasingly confident that uv will iron out any of my grievances as they continue to move forward with development.
But let me back up and start from the perspective of a total Python beginner, as that is who this post is intended for. In Python, there are a lot of built-in libraries available to you via the Python Standard Library. This includes packages like datetime, which allows you to manipulate dates and times; smtplib, which allows you to send emails; argparse, which aids development of command-line utilities; and so on. In addition, Python also has third-party libraries available through PyPI, the Python Package Index. You don’t need to do anything special to utilize any of the Python Standard Library packages besides using an import statement in your Python script to reference them. But with PyPI packages, you typically use pip install to first add them to your system, then reference them with the import keyword.
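To make the distinction concrete, here is a minimal sketch (boto3 stands in for any third-party package and is assumed to be installed already):

# standard library: nothing to install, just import and use
import smtplib
from datetime import date

# third-party: install from PyPI first (e.g. pip install boto3),
# then the import statement looks exactly the same
import boto3

print(date.today())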
Pip is a command line package manager that comes with Python, but it
is definitely not the only package manager. Other tools like pipenv,
poetry, pdm, conda, and even some third-party applications like
please can all manage Python dependencies.
One common mistake beginners make is installing Python and then running pip install (or even worse, sudo pip install) before moving forward. While this works in the moment, it will almost certainly cause headaches for you later.
In Python, you cannot have multiple versions of the same package
installed at the same time. A problem arises when one piece of Python
code depends upon, say, version 1 of Pydantic, while another depends
on version 2. Or maybe one package depends on an older version of
scikit-learn, but another needs the newer version, and so on—these
are called dependency conflicts. To solve this problem, Python has a
concept called virtual environments. These are simply an isolated
context in which you can install a specific set of Python packages,
unique to one application. So, if you have code in a folder named myproject1 and some other code in a folder called myproject2, you can have separate virtual environments for each.
It’s for this reason, I recommend never installing any Python packages into your global environment at all. Only ever install into a virtual environment. Prior to recent updates to uv, I would have recommended installing pyenv first as a way of managing different versions of Python, and then using its built-in extension pyenv-virtualenv to manage your virtual environment. However, as I teased earlier, uv now handles all of this. With that foundation in place, let’s see why uv is my go-to tool.
To begin, I recommend installing uv first, even before installing Python, via this command:
curl -LsSf https://astral.sh/uv/install.sh | sh
Once uv is installed, you can use it to install the latest version of Python (3.13 as of this writing) via this command:
uv python install 3.13
That will download the specified version of Python to a subfolder under ~/.local/share/uv/python, but it won’t be immediately available on your PATH as python. Instead, you can type uv run python to launch this instance of Python, or uv run --python 3.13 python if you want to specify the version.
Lately, I’ve been adding the following aliases to my .bashrc or .zshrc file:
alias python="uv run python"
alias pip="uv pip"
This allows me to simply type python on the command line, and it effectively calls that same uv run python command to launch the Python REPL. Or I can type pip list, and it will call uv pip list instead.
So at this point, we have uv managing our different versions of Python
(because inevitably you will need to update both), but we are still
operating within the global environment for that version of Python.
As I mentioned before, we want to utilize virtual environments as a
means of segregating dependencies for different projects.
Even when using uv, there are multiple ways of going about this. You could type uv venv, which will create a folder named .venv, then run source .venv/bin/activate to activate it, and then run uv pip install (or just pip install with the alias I mentioned earlier) to install dependencies into it. You could do that, but that’s not what I recommend. If you’re brand new to virtual environments, you might not realize they are something you have to “activate” and “deactivate.” But before I suggest a better alternative, I need to introduce one more concept: Python packaging.
Virtual environments solve one problem: conflicting dependencies across different projects. But they don’t solve an accompanying problem, which is what to do when you’ve created a project and need to package it to share with others. Virtual environments live on your computer and provide an isolated space to install dependencies, but if you want to publish your project to the Python Package Index, or if you want to install the project inside a Docker container, a separate process is involved as virtual environments aren’t portable.
Different tools have sprung up over the years to solve this problem, some of which I’ve already mentioned. Pip uses a requirements.txt file that lists one dependency per line. The related pip-tools package uses a requirements.in file, which abstracts that a little further. Pipenv uses a TOML-formatted file named Pipfile to define much of the same, while Anaconda and Conda use an environment.yml file. The setuptools package defines a Python script named setup.py to handle packaging and define dependencies. And tools like poetry and pdm use a pyproject.toml file.
That last one—pyproject.toml—gained its standardized project metadata format in mid-2020 with PEP-621, and in my opinion it is the best way to handle Python packaging to date. All the others are antiquated or incomplete by comparison, and fortunately, pyproject.toml is uv’s native format. Here’s an example of a pyproject.toml file generated by uv:
[project]
name = "my-email-application"
version = "0.1.0"
description = "Sends emails using SES"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "boto3>=1.35.68",
    "click>=8.1.7",
    "tqdm>=4.67.1",
]
As you can see, it has a name, a version, a description, a reference to a README.md file, the specific Python version required, and a list of dependencies. I generated this file by first creating a project folder named my-email-application and then running the following commands from within that folder:
uv init
uv add boto3 click tqdm
In truth, I did have to manually update the pyproject.toml generated by that first uv init command, because it had placeholder text in the description. But by calling the second command (uv add), not only did it update the pyproject.toml file as shown above with the list of dependencies, it also created a .venv folder and installed those dependencies into it. So now if I type uv pip list, I immediately see the following output:
Package Version
--------------- -----------
boto3 1.35.68
botocore 1.35.68
click 8.1.7
jmespath 1.0.1
python-dateutil 2.9.0.post0
s3transfer 0.10.4
six 1.16.0
tqdm 4.67.1
urllib3 2.2.3
Because these are installed into a virtual environment, I can trust
they won’t conflict with any other projects I might be working on.
My recommendation is to always start a project by running uv init, which will create a pyproject.toml file (if one does not already exist). However, if you happen to be working on a project that already has this file, you’ll need to call the following two commands in order instead:
uv lock
uv sync
The first will use the existing pyproject.toml file to create another file called uv.lock, while the second will create a virtual environment and install all the dependencies into it.
Further, I recommend only installing dependencies using uv add, instead of something like uv pip install. Using uv add will not only install dependencies into your virtual environment, but will also update your pyproject.toml and uv.lock files. This means you don’t have to treat dependency installation and packaging as two separate things.
Running Python on Windows
If you are running Microsoft Windows, I want to advise one more prerequisite step that you need to take before getting started with Python or uv: install the Windows Subsystem for Linux, also known as WSL2. Do not, for the love of all that is good and holy, install Python tooling directly in Windows; rather, install WSL first. This guide outlines all the steps you need to take to get started, though I recommend downloading WSL from the Releases page on Github instead of from the Microsoft Store as advised in Step 3.
WSL transforms the Windows command prompt into a Linux terminal, along with its own Linux-based filesystem in a cordoned-off part of your hard drive. The files there are still accessible through programs like VS Code, via its WSL extension. Installing WSL does mean you will need to learn Linux syntax, but it will be worth it. So for Windows users: install WSL first, then install uv.
A note on tools
It’s also worth mentioning that there are some utility packages offering helpful tools that I like to make globally available. This approach should be used sparingly, since most of the time you will want to install project dependencies as I’ve described above using uv add. Still, there are certain tools that cover cross-cutting concerns and don’t properly belong to any one project you’re writing.
An example of a cross-cutting concern would be something like code formatting, or CLI utilities helpful during development but not properly part of the codebase itself. There are only four tools I use in particular, but I’ll list some of the more popular examples:
- black - a Python code formatter
- flake8 - a Python linter (style checker)
- lefthook - a Git hooks manager
- isort - a utility for sorting import statements
- mypy - a static typing utility for Python
- nbstripout - a utility that strips output from Jupyter notebooks
- pdm - another package manager
- poetry - another package manager
- pre-commit - a Git hooks manager
- pylint - a static code checker for Python
- ruff - a Python linter and code formatter
- rust-just - a modern command runner, similar to Make
- uv-sort - a utility for alphabetizing dependencies in pyproject.toml
These days, I only keep lefthook, mypy, ruff, and rust-just installed. I’ve stopped using pdm in favor of uv, and I’ve stopped using tools like black and isort in favor of ruff. In the past, I might have installed these utilities with a command like pip install black or pipx install pdm. But with uv, the equivalent command is uv tool install followed by the name of the tool. This makes the executable available globally, regardless of whether you’re in a virtual environment. You can also run uv tool upgrade periodically to make sure you’re using the latest version.
I use three criteria to determine whether to install a tool via uv tool install or the usual uv add:
- The tool must primarily provide a binary application (not a library)
- The tool must address a cross-cutting concern (such as code formatting)
- The tool must not be something you would otherwise bundle with an application itself
With that said, there is one tool that covers a cross-cutting concern yet that I recommend packaging the normal way, alongside the application: pytest. Pytest is used for running unit tests within your project, so it should be added as a development dependency. Add it with a command like this:
uv add --dev pytest
There are other development dependencies you might consider adding (like coverage), but in general I tend to install things as tools rather than dev dependencies. It really depends on the use case.
Publishing packages
uv makes it remarkably easy to publish shared libraries and other utilities you’ve written in Python as packages—either on the public PyPI repository or in private artifact registries (e.g. Gitlab Artifacts, AWS CodeArtifact, Google Artifact Registry, Artifactory, Sonatype Nexus, etc.). You only need to make minor modifications to your pyproject.toml file to support publishing, and then set some environment variables. I’ll reference the rds-utils package I highlighted earlier to illustrate how this works. As you can see from that project’s pyproject.toml file, I have added two blocks that aren’t there by default:
[project.urls]
[tool.uv]
The list of project.urls is not strictly required, but when publishing specifically to PyPI, it means the website that PyPI auto-generates for your package will link back to your repository. I went ahead and published this library to PyPI here. You can see the left navigation bar lists the three “Project Links.” The tool.uv section defines the publish URL, which in this case is PyPI’s upload endpoint. If you’re using a private registry such as AWS CodeArtifact, you can swap that out here. For instance, a CodeArtifact repository URL might look something like this:
[tool.uv]
publish-url = "https://registry-name-123456789012.d.codeartifact.us-east-1.amazonaws.com/pypi/artifacts/"
Note: Authentication for the upload endpoint is not defined here, nor should it be. Usernames, passwords, and/or tokens should never be hard-coded into a file.
In the case of a private registry, you may be given an actual username and password. In the case of the public PyPI registry, you need to generate an API token; that token effectively becomes the password, while the literal string __token__ is the username. To use uv for publishing, you need to set the values of these environment variables:
- UV_PUBLISH_USERNAME
- UV_PUBLISH_PASSWORD
With those values set, you then need to run the following two commands in order:
uv build
uv publish
And voila! Assuming you’ve specified the right credentials, you should now have a published Python package in your repository of choice. In the past, this might have involved defining a setup.py file at the root of your repository and using utility packages such as setuptools, wheel, and twine to build and upload everything, but now uv serves as a complete replacement for all of those.
Installing libraries from private repositories
If you built your shared library package and published it to the public PyPI repository (as I did with my rds-utils package), you don’t have to do anything special to utilize it in future projects—you can simply use pip install rds-utils or uv add rds-utils. But if you pushed your code to a private repository (common when developing commercial applications for private companies), you’ll have to do a little bit more to tell uv (or pip) where to pull your package from and how to authenticate.
I ran a quick experiment by creating a private CodeArtifact repository in my personal AWS account and publishing my same rds-utils package to it using the steps described above. Then I created a second project in which I wanted to install that library, again using uv init to create my pyproject.toml file. This time, I needed to manually add another section called [[tool.uv.index]] to that file. In my case, it looked like this:
[[tool.uv.index]]
name = "codeartifact"
url = "https://registry-name-123456789012.d.codeartifact.us-east-1.amazonaws.com/pypi/artifacts/simple/"
You can add as many of these sections to your pyproject.toml file as you want, but it’s unlikely you’ll need more than one at a time. This is because a single private registry can host as many different Python packages as you like. Even Gitlab’s project-based, built-in artifact registries still have a mechanism for pulling things at the group level, thereby allowing better consolidation.
Again, it’s critical to omit any authentication information from the URL. When publishing to CodeArtifact, this is easy, as the AWS credentials provided are already delineated into a separate URL, username, and password. But when fetching from CodeArtifact, AWS will present you with a message like this:
Use pip config to set the CodeArtifact registry URL and credentials. The following command will update the system-wide configuration file. To update the current environment configuration file only, replace global with site.

pip config set global.index-url https://aws:$CODEARTIFACT_AUTH_TOKEN@registry-name-123456789012.d.codeartifact.us-east-1.amazonaws.com/pypi/artifacts/simple/
Notice that I’ve omitted the aws:$CODEARTIFACT_AUTH_TOKEN@ portion of the URL in my pyproject.toml entry. This is important, because uv already has its own mechanisms for supplying credentials to package indexes, which you should utilize instead. In this example, I can forego setting the CODEARTIFACT_AUTH_TOKEN environment variable and instead set the following two environment variables:
- UV_INDEX_CODEARTIFACT_USERNAME
- UV_INDEX_CODEARTIFACT_PASSWORD
Note that the “CODEARTIFACT” portion of those environment variables is only that value because I happened to specify name = "codeartifact" in the index definition in my pyproject.toml file. If I had set name = "gitlab", then it would be UV_INDEX_GITLAB_USERNAME instead.

With CodeArtifact specifically, the username should be set to the literal string “aws” while the password should be the token value generated by the AWS CLI. With Gitlab, you would set the username to the literal string __token__ while the password would be an access token with the appropriate rights. Other registries will have different conventions, but hopefully you get the picture.
Publishing Cloud Functions
In addition to packaging shared libraries for PyPI and other registries, Python scripts are often deployed as Cloud Functions, such as AWS Lambda, Azure Functions, and Google Cloud Functions. After getting familiar with uv, I’ve learned it can be used to facilitate the packaging of cloud functions. The methods are a bit obscure, so I think it’s worth explaining how to do it.
In the case of Google Cloud Functions and Azure Functions, you need to deploy an actual requirements.txt file along with your Python script(s). You can utilize the following command to generate a requirements.txt file for this purpose:
uv export --no-dev --format requirements-txt
I recommend using the uv export command instead of uv pip freeze, because the latter does not exclude dev dependencies. uv export allows you to utilize the benefits of dev dependencies in your codebase while still ensuring what you package for deployment remains as slim as possible.
With AWS Lambda, things are a little bit different. Google and Azure take care of installing the dependencies for you on the server side, but with AWS, it is your responsibility to install the dependencies ahead of time. Moreover, there are two ways of packaging Python scripts for Lambda execution: you can either upload a ZIP file containing your scripts and dependencies, or you can build a Docker image with a very specific Python package included and push it to ECR.
In the former case, you can run commands like the following to build your deployable Lambda ZIP file:
uv pip install --target dist .
rm -rf $(basename $(pwd).egg-info) # cleanup
cp *.py dist # modify this if you have subfolders
cd dist
zip -r ../lambda.zip .
cd -
Taken together, these commands install the project’s non-dev dependencies into the dist directory (you can name this whatever you want), copy any Python files from the root directory of your repo into that same dist directory, and then ZIP both the Python scripts and the dependencies up into a deployable archive. Keep in mind that if you have created scripts inside subfolders, you might need to modify the line that copies .py files to the dist folder.
The other way AWS Lambda can package Python code is with a Docker container. Lambda requires that you include a special dependency called awslambdaric, so when creating a Lambda function package with uv you’ll need to add that as a dependency via uv add awslambdaric. A Dockerfile for an AWS Lambda image might look like this:
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim
WORKDIR /opt
ENV UV_CACHE_DIR=/tmp/.cache/uv
COPY pyproject.toml /opt
COPY uv.lock /opt
RUN uv sync --no-dev
COPY . /opt
ENTRYPOINT ["uv", "run", "--no-dev", "python", "-m", "awslambdaric"]
CMD ["lambda_function.handler"]
The critical factors about this Dockerfile:
- We have to set the UV_CACHE_DIR environment variable to something under /tmp, because that’s the only writable folder when Lambda is invoked
- We have to consistently use the --no-dev flag with both sync and run
- We copy the pyproject.toml and uv.lock files before calling uv sync
- We copy the rest of the files after calling uv sync
- We have to set the entrypoint to call the awslambdaric package
In addition, I make use of a .dockerignore file (not shown here) so that the COPY . /opt command only copies in Python files. My choice of /opt as the working directory is purely a matter of preference. With a Dockerfile like this, you can build and push the image into ECR and then deploy it as a Lambda function. But personally, I prefer to avoid going the Docker container route because it precludes you from being able to use Lambda’s code editor in the AWS console, which I find rather nice.
Logging
Coming into the Python world having done Java and C# development, I really didn’t know the best practices for logging. And logging things properly is so important. I was used to creating logger objects in Spring Boot applications, using dependency injection to insert them into various classes, or using a factory class to instantiate them as class members. That second approach is actually closer to what I now consider best practice for logging in Python—only instead of passing logger objects around between classes and files, it’s better to create a unique logger object at the top of each file and let functions within that file access it from the global namespace. In other words, each file can look like this:
from logging import getLogger

logger = getLogger(__name__)

...

def my_function(my_parameter):
    """
    Notice how this function doesn't accept the logger object as a parameter;
    it simply grabs it from the global namespace by convention.
    """
    ...
    logger.info("Log message", extra={"my_parameter": my_parameter})
    ...
Omitted from that sample code is the logger configuration—let’s talk about that. When I was starting out, I quickly realized the advantages of logging things in a JSON format rather than Python’s default format. When deploying web services to the cloud, it can be immensely helpful to have all your logs consistently formatted using JSON, because then monitoring platforms like Datadog can parse your log events into a helpful, collapsible tree structure, allowing more complex searches on your log events.
When I started out using Python, I assumed this meant I had to install some special package to handle the JSON logs for me, so I installed structlog, which is a very popular logging package. Little did I know, I didn’t actually need that at all! I spent a good amount of time implementing structlog in all of my team’s projects, only to then spend more time ripping it out again—after realizing that Python’s standard library was already more than capable of printing structured/JSON logs without the need for third-party packages.
This has been a perennial lesson for me in Python: yes, there probably is a package on GitHub that does the thing you want, but you should always check whether the thing you want can simply be done with just the standard library. In the case of JSON-formatted logs, I’ll share some simplified code examples. First, let’s create a file called loggers.py as follows:
import json
import traceback
from logging import Formatter, LogRecord, StreamHandler, getLevelName
from typing import Any

# skip natural LogRecord attributes
# https://docs.python.org/3/library/logging.html#logrecord-attributes
_RESERVED_ATTRS = frozenset(
    k.lower()
    for k in (
        "args",
        "asctime",
        "created",
        "exc_info",
        "exc_text",
        "filename",
        "funcName",
        "levelname",
        "levelno",
        "lineno",
        "module",
        "msecs",
        "message",
        "msg",
        "name",
        "pathname",
        "process",
        "processName",
        "relativeCreated",
        "stack_info",
        "taskname",
        "thread",
        "threadName",
    )
)


class SimpleJsonFormatter(Formatter):
    def format(self, record: LogRecord) -> str:
        super().format(record)
        record_data = {k.lower(): v for k, v in vars(record).items()}
        attributes = {k: v for k, v in record_data.items() if k not in _RESERVED_ATTRS}
        payload = {
            "Body": record_data.get("body") or record.getMessage(),
            "Timestamp": int(record.created),
            "SeverityText": getLevelName(record.levelno),
        }
        if record.exc_info:
            attributes["exception.stacktrace"] = "".join(traceback.format_exception(*record.exc_info))
            exc_val = record.exc_info[1]
            if exc_val is not None:
                attributes["exception.message"] = str(exc_val)
                attributes["exception.type"] = exc_val.__class__.__name__  # type: ignore
        payload["Attributes"] = attributes
        return json.dumps(payload, default=str)


def get_log_config() -> dict[str, Any]:
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "json": {"()": SimpleJsonFormatter},
        },
        "handlers": {
            "console": {
                "()": StreamHandler,
                "level": "DEBUG",
                "formatter": "json",
                "stream": "ext://sys.stdout",
            }
        },
        "root": {
            "level": "DEBUG",
            "handlers": ["console"],
        },
    }
As you can see, within this file we define a class called SimpleJsonFormatter, which inherits from logging.Formatter, as well as a function named get_log_config. Then, in the main.py file (or whatever your entrypoint Python file is), you can utilize this class and function like this:
from logging import getLogger
from logging.config import dictConfig

from loggers import get_log_config

logger = getLogger(__name__)
dictConfig(get_log_config())
Note that you only want to call the dictConfig function exactly once, ideally right at the beginning of launching your application. The get_log_config function will ensure that every log message is passed through our SimpleJsonFormatter class and all log messages will be printed as JSON objects. But best of all, you don’t need to rely on third-party packages to achieve this! This can all be done with the standard library logging package.
When I first tried out the structlog package, I thought a big advantage was that you could add in as many extra attributes as you wanted, like this:
structlog.info("Log message", key="value", id=123)
But that functionality isn’t unique to structlog at all. It can be supported just the same using the standard library, via the extra parameter. So that same code snippet turns into something like this:
logger.info("Log message", extra={"key": "value", "id": 123})
When this log statement gets passed to our SimpleJsonFormatter, it comes in as an instance of the logging.LogRecord class, and whatever values you include in the extra parameter get embedded as a part of that LogRecord and then printed when we JSON encode it.
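Putting those pieces together, here is a minimal end-to-end sketch (the timestamp and key ordering in the sample output are illustrative and will vary):

from logging import getLogger
from logging.config import dictConfig

from loggers import get_log_config

dictConfig(get_log_config())
logger = getLogger(__name__)

logger.info("Log message", extra={"key": "value", "id": 123})
# prints a single JSON object along the lines of:
# {"Body": "Log message", "Timestamp": 1735689600, "SeverityText": "INFO", "Attributes": {"key": "value", "id": 123}}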
I have given a very simplified example of a log formatter here, but there are many more powerful things you can do. For example, with additional log filters, you can utilize the OpenTelemetry standard to integrate traces and more.
You can also rework my get_log_config Python function into YAML syntax and use it directly with some ASGI servers. Here’s the equivalent YAML:
version: 1
disable_existing_loggers: false
formatters:
  json:
    (): loggers.SimpleJsonFormatter
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: json
    stream: ext://sys.stdout
root:
  level: DEBUG
  handlers: [console]
So if you build an application with something like FastAPI and uvicorn, you can utilize this custom logging when launching the application, like this:
uvicorn api:api --host 0.0.0.0 --log-config logconfig.yaml
When doing things this way, there’s no need to explicitly call the dictConfig function, as uvicorn (or whatever other ASGI server you’re using) will handle that for you.
Finding open-source software packages
As I’ve mentioned, the Python community can often leave you with the impression that it’s the wild, wild west out there. If you can think of some kind of application or library, chances are there are at least ten different versions of it already, all made by different people and all at various stages of development. The obvious example is the one I’ve already spoken about: package management. You’ve read why I think uv is the best tool for the job, but there’s also pip, pipx, poetry, pdm, pipenv, pip-tools, conda, anaconda, and more.
And if you take a look at web frameworks, there is FastAPI, Flask, Falcon, Django, and more. Or if you look at distributed computing and workflow orchestration, you might find packages like Airflow, Luigi, Dagster, or even pyspark. And the examples don’t end there! If you look at blog software, the official Python site showcases dozens upon dozens of different packages all relating to blogs, some of which haven’t seen updates in over a decade.
It can be incredibly confusing to a newcomer to Python to understand where to get started when it comes to identifying the right tooling to solve any given problem. I generally advocate for the same approach Google likes to see when they interview engineering candidates: namely, “has someone else already solved this problem?”
Unfortunately, there is no hard and fast rule to determine what the most appropriate software package or tool is for the problem you want to solve. The answer is almost always “it depends.” But I did discover a very powerful website that aids in the decision-making process: the Snyk Open Source Advisor. The Snyk Open Source Advisor lets you search for any public Python package published to PyPI and provides a package health score, which is a composite of four different metrics. Pictured below is an example package health score for the pelican library, which is the static site generator I use to create my blog.
The package health score is a number between 1 and 100, and the four metrics contributing to it are Security, Popularity, Maintenance, and Community. These are immensely helpful—you certainly don’t want to utilize a package that has known security holes. It’s generally better to home in on packages with a higher popularity score—though this becomes less true the more obscure your desired functionality is. The maintenance score is a quick way to determine if the project is still relevant and being updated, and the community score can help tell you how easy it would be to seek out support.
I use the Snyk Open Source Advisor all the time as a Python developer. It doesn’t completely solve the problem of determining which Python packages are the best tool for the job, but it sure helps make better informed decisions. There are some cases, of course, where you might deliberately choose a tool with a lower score than some other tool. Or you might find yourself comparing two frameworks that both serve the same function and have relatively high scores. And so experience is really the only critical factor there. But I have found 95% of the time I’ve been able to quickly determine the right package simply by looking at the Package Health score and nothing else.
Code formatting
Earlier I mentioned that I’ve been installing a few tools globally. Let’s focus specifically on three: ruff, lefthook, and mypy. ruff is a linting and code-formatting tool written by the same people who authored uv, and I absolutely love it. Previously, I had used a combination of black, pylint, isort, and others to handle code formatting and linting, but I’ve found ruff is a sufficient replacement for them all.
Wikipedia has a whole article on indentation style, which describes different code conventions for how you might format your code in different C-based languages. For instance, you might use the Allman style:
while (x == y)
{
    foo();
    bar();
}
Or you might use the K&R style:
while (x == y) {
    foo();
    bar();
}
I’ve witnessed one developer use one style, only to have another developer change it in a pull request that also contained more substantive changes. This muddies the waters during code reviews, sometimes dramatically, when legitimate code changes are hiding between hundreds of lines of whitespace changes. A similar argument can be had with regard to tabs vs. spaces. Of course, Python doesn’t have this same problem with indentation because it enforces a single, consistent style as part of the syntax. However, there are other areas of code style where a consistent syntax is not enforced, and this is where tools like black or ruff come in.
In Python, you might have debates over things like line length. For instance, the following two snippets of code are functionally the same, though cosmetically different:
def load_metadata(toml_path: Path) -> dict[str, Any]:
    metadata = toml.load(toml_path)
    return {
        "contact": {"name": metadata["project"]["authors"][0]["name"], "email": metadata["project"]["authors"][0]["email"]},
        "description": metadata["project"]["description"],
        "title": metadata["project"]["name"].title(),
        "version": metadata["project"]["version"]
    }

def load_metadata(toml_path: Path) -> dict[str, Any]:
    metadata = toml.load(toml_path)
    return {
        "contact": {
            "name": metadata["project"]["authors"][0]["name"],
            "email": metadata["project"]["authors"][0]["email"],
        },
        "description": metadata["project"]["description"],
        "title": metadata["project"]["name"].title(),
        "version": metadata["project"]["version"],
    }
Personally, I find the latter version easier to read, and in this case it’s the version you would get if you ran the code through a formatting tool like ruff. Sometimes the tool will format the code in ways I wouldn’t necessarily have chosen, but the point of code formatters is that they take away your choice, which is actually a very good thing. This way, we can avoid the scenario where one developer decides to change the style and you end up with long, muddied pull requests.
Python encourages other rules, like import order, as part of PEP-8. Most people coming into Python for the first time don’t realize there is a recommended best practice when it comes to sorting import statements. In fact, it’s recommended that you split your imports into a maximum of three distinct groups:
- Standard library imports
- Related third-party imports
- Local application-specific imports
Each group is meant to be separated by a single empty line, and within each group, package imports (lines starting with import) should come first while specific imports (lines starting with from) should come second. Furthermore, each of these groups and sub-groups should be sorted alphabetically, and wildcard imports (e.g. from package import *) should be avoided altogether.
Beyond what PEP-8 recommends, I advise never, ever using relative imports. The standard practice concedes there are some complex scenarios when relative imports are OK, but I say there is practically no scenario that justifies the use of relative imports in Python. Unfortunately, ruff doesn’t yet have preconfigured rules to ban the use of relative imports altogether, so it’s incumbent on you to simply never use them.
What’s a relative import? Python allows you to write import statements like these:

from .package import thing
from . import sibling
from ..pibling import thing
The problem with these is that they are so tightly coupled to your filesystem that they almost scream, “I’m going to break the moment you try to share me!” If you prefix a package with a dot (.) like the first statement above, it’s telling Python to find either a file named package.py or a folder named package in the current directory. But this isn’t necessary, because if you simply write from package import thing (without the dot), it will do the same thing! If you find yourself needing to use the dot, it’s likely your package name conflicts with a reserved keyword or something in Python’s standard library, in which case you should just rename your package or file. Similarly, there’s no need to use from . import sibling; import sibling works just fine.
Lastly, avoid using the double-dot to import something from outside your directory. If you’re working with a separate module intended as a shared library, it’s better to properly package it and install it into your environment rather than summoning the unholy mess of the double-dot import.
So with that out of the way, let’s see how I have ruff set up. As mentioned earlier, I first installed the ruff utility with uv tool install ruff, which makes it globally available. Then I created a configuration file at ~/.config/ruff/ruff.toml, which looks like this:
[lint]
extend-select = [
    "RUF100", # disallow unused # noqa comments
    "I001", # isort
]
This extends the default settings to include a couple of rules. RUF100 disallows unused # noqa comments, which was a helpful recommendation from a friend, and I001 ensures ruff automatically sorts imports per the PEP-8 standard.
In addition to keeping these rules in my global configuration, I’ve also been copying them into each pyproject.toml file any time I start a new project. This not only ensures that the rules always run, it also ensures that anyone else looking at my code for a given package follows the same rules. The syntax is only slightly different inside the pyproject.toml file:
[tool.ruff]
lint.extend-select = [
    "RUF100", # disallow unused # noqa comments
    "I001", # isort
]
Then you just need to run ruff format . and ruff check --fix . in sequence to apply these changes to your codebase. (The first is called formatting, while the latter is called linting.) You can also make these commands run automatically, either by hooking them into VS Code or by using a pre-commit hook. I’ve opted to go with the latter route, and I’ve specifically been using a library called lefthook to accomplish this.
Lefthook is basically a faster version of pre-commit, written in Go. It allows you to define a list of rules to run before checking in your code to Git. If any of the rules fail, the commit will also fail, so you’re forced to deal with the failure before pushing your code. This setup involves creation of a lefthook.yml file in your repository. Here’s an example lefthook.yml file that I’ve been using in one of my repositories:
pre-commit:
  commands:
    uv-sort:
      root: "backend/"
      glob: "pyproject.toml"
      run: uv-sort
    python-lint:
      root: "backend/"
      glob: "*.py"
      run: ruff check --fix {staged_files}
      stage_fixed: true
    python-format:
      root: "backend/"
      glob: "*.py"
      run: ruff format {staged_files}
      stage_fixed: true
    nbstripout:
      root: "backend/"
      glob: "*.ipynb"
      run: nbstripout {staged_files}
      stage_fixed: true
    js-lint:
      root: "frontend/"
      glob: "*.{js,mjs,cjs,ts,vue}"
      run: npx eslint --fix {staged_files}
      stage_fixed: true
    js-format:
      root: "frontend/"
      glob: "*.{js,mjs,cjs,ts,vue}"
      run: npx prettier --write {staged_files}
      stage_fixed: true
In this example, I have six pre-commit commands defined, and only four of them are Python-related. I’ve segmented my code into different folders for the backend API (written in Python) and the frontend application (written in React.js). This is a good illustration of how you can segment your lefthook rules to operate only on specific directories (and files) within your codebase.
As you can see, the first command makes use of the uv-sort utility previously installed via uv tool install. This command ensures the dependencies in my pyproject.toml file are sorted. The next two commands are the two ways of invoking ruff which I’ve already explained. And then I also run nbstripout to clean up any output from Jupyter notebook files before committing them. The last two commands run linting and formatting on a JavaScript codebase.
Notably, I’ve opted to make use of the stage_fixed directive too, which means that if any of these lefthook commands result in changes to the files I’ve staged, those changes will silently and automatically be included in my commit as well. This is a choice you may want to consider; I personally find it useful because I never have to think about formatting. Others may prefer to have their IDE automatically run formatting and linting commands on save, and still others prefer to do neither, instead forcing the developer to create a second commit with any fixes.
Noticeably absent from my lefthook.yml file is any invocation of mypy, a static type checker. Static type checking is similar to linting, but it goes even further, and there are instances where you may not care if static type checking fails. Python is a dynamic language, meaning that variables are not strictly typed. Python also has a robust system of type hints allowing you to more explicitly designate how types flow between function and method calls. But at the end of the day, type hints are still just hints, meaning nothing in the Python interpreter is going to actually enforce them at runtime. mypy helps verify that when type hints are specified, they are used consistently.
Hynek Schlawack, a well-known Pythonista and open source contributor, has recommended not running mypy as part of your pre-commit workflows. And Shantanu Jain, a CPython core developer and mypy maintainer, has an excellent write-up of some of the gotchas that come with running mypy as a pre-commit hook. By default, mypy runs in an isolated environment, meaning it won’t have access to your project’s virtual environment and therefore won’t be able to fully analyze type hints when your code makes use of dependencies. Additionally, pre-commit hooks only pass the changed/staged files by default, whereas mypy needs to see the entire repository to function correctly. I’ve also noticed that nesting your code under other folders (like my backend/ directory) can cause problems.
Instead, I recommend setting up a Justfile at the root of your repository and defining a rule for manually running static type checks. Here’s an example of a Justfile I’ve used in a past project:
PYTHON_EXECUTABLE := `cd backend && uv python find`

default:
    @just --list | grep -v "^ default$"

clean:
    @find . | grep -E "(/__pycache__$|\.pyc$|\.pyo$|\.mypy_cache$|\.ruff_cache$|\.pytest_cache$)" | xargs rm -rf

init-backend:
    @cd backend && uv venv
    @cd backend && uv sync

test-backend:
    @cd backend && PYTHONPATH=$(pwd) uv run pytest .

typecheck:
    @MYPYPATH=backend mypy backend --python-executable {{PYTHON_EXECUTABLE}}
The key takeaway here is the final typecheck rule which, as you can see, points mypy at the backend/ folder and supplies a couple of specific arguments. The --python-executable argument is essential when you have dependencies, as it allows mypy to properly utilize the project’s virtual environment.
I’ve included four other rules here, which I’ll briefly touch on, for those not familiar with Just.
- The default rule allows me to type just from the terminal to print all available rules, excluding the default rule itself.
- The clean rule quickly deletes hidden files and folders that may appear after running the code, the tests, or formatting tools. (These files are also excluded in .gitignore.)
- My init-backend rule is meant to be run after cloning the repo for the first time.
- My test-backend rule runs the unit tests, which I’ll touch on in the next section.
My examples here showcase a project where I’ve nested everything Python-related into a single backend folder. In many projects, though, you won’t have to do this, and keeping everything at the root of the repository is considerably easier to deal with. In that case, you can simply remove all the cd backend && commands from the Justfile and the root: directives from the lefthook.yml file.
Testing and debugging
Being able to debug your code is critical. For those of you who come from a data science background, this may be a new concept. If you’re using Jupyter notebooks, the whole idea of debugging is probably incorporated into your workflow already, at least conceptually: notebooks build in the concept of breakpoints by having you run your code line by line or block by block. But when we move beyond publishing Jupyter notebooks and into developing production-grade applications, developing good debugging practices is essential. Fortunately, the tooling in the ecosystem has improved dramatically in the last couple of years.
Hand in hand with debugging comes the ability to run unit and integration tests. Python has a couple of different frameworks for testing, including pytest and unittest, and there’s some overlap between the two. I’ll cover the basics of setting up unit tests and debugging in Python—without going into the moral question of using mocks, monkey patching, or other controversial techniques. If you’re interested in those discussions, I’ll include links to relevant articles at the end.
When I first began coding in Python, VS Code integration was still fairly immature, and it made me want to reach for JetBrains PyCharm instead. PyCharm has consistently maintained a very intuitive setup for debugging and testing Python, but it comes with a hefty price tag. However, in February 2024, I happened to catch an episode of the Test & Code podcast titled Python Testing in VS Code, wherein the host interviewed a Microsoft product manager and software engineer about the major improvements made to the Python extension for VS Code. And I agree—testing and debugging in VS Code is way better than it used to be. So much so that I no longer recommend purchasing PyCharm; VS Code (which is free) is more than sufficient.
To get started with testing, I do recommend installing pytest as a dev dependency, using uv add pytest --dev. I suggest creating a “test” folder in the root of your repository and prefixing the files within it with test_. I also use the same prefix when naming the test functions contained within those files. Here’s an example: I might have files named test_api.py, test_settings.py, or test_utils.py, and within those files the test functions would look something like this:
from mymodule.utils import business_logic

def test_functionality():
    result = business_logic()
    assert result, "Business Logic should return True"
This simple test exercises one key aspect of unit testing: using assertions. It runs the business logic function and asserts that the result returned is truthy. Moreover, the test will fail if the business logic function raises any kind of exception. If the test fails, we will see the string message in the test output. That alone is already a massively useful tool in the toolbox for remediating and even preventing bugs, but let’s touch on a few other useful aspects about testing code in Python:
- Capturing exceptions
- Fixtures
- Mocks
- Monkey patching
- Parameterization (sketched briefly below)
- Skip conditions (sketched briefly below)
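Two of those items, parameterization and skip conditions, can be shown very briefly. The inputs and version check below are purely illustrative, and business_logic is the same hypothetical function from the example above:

import sys

import pytest

from mymodule.utils import business_logic

# parameterization: run the same test once for each input value
@pytest.mark.parametrize("value", [1, "text", {"key": "value"}])
def test_business_logic_inputs(value):
    assert business_logic(value), "Business Logic should return True"

# skip condition: skip this test entirely when the environment doesn't apply
@pytest.mark.skipif(sys.version_info < (3, 13), reason="requires Python 3.13+")
def test_business_logic_on_new_python():
    assert business_logic(42), "Business Logic should return True"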
Beyond testing whether certain code returns specific values, it’s also useful to craft tests that actually expect failure. This doesn’t mean using exceptions as control flow (which is largely considered an anti-pattern), but rather testing that invalid inputs will reliably raise an exception. You can utilize the raises function from pytest, which, as you’ll see here, is actually a context manager:
from pytest import raises

from mymodule.utils import business_logic

def test_bad_input():
    with raises(ValueError):
        business_logic("Invalid input data")
As shown above, you specify an exception type (which can be a tuple of multiple types when applicable) to the raises function. Then something inside the context manager block must raise that specific exception, otherwise the whole test will fail. Useful!
However, sometimes the input for your business logic function is more complex than a simple scalar value. You might also want to re-use that same complex input across multiple test functions. While you could declare the input as a global variable, this can be problematic—particularly when you have functions that modify their own inputs. It’s better to use fixtures. Here’s an example of declaring a fixture that loads JSON from a file:
import json

import pytest

from mymodule.utils import business_logic

@pytest.fixture
def complex_input_data():
    with open("test_data.json") as fh:
        return json.loads(fh.read())

def test_functionality(complex_input_data):
    result = business_logic(complex_input_data)
    assert result, "Business Logic should return True"
As you can see, the @pytest.fixture decorator allows us to use the name of our fixture function as an argument to any test function, and pytest injects the fixture’s return value automatically.
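Fixtures can also handle teardown: if a fixture uses yield instead of return, everything after the yield runs once the test finishes, which is handy for cleaning up temporary resources. A minimal sketch, where the temporary-file setup is purely illustrative and not part of the project above:
import json
import os
import tempfile

import pytest


@pytest.fixture
def temp_json_file():
    # setup: write a small JSON payload to a temporary file
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as fh:
        json.dump({"example": True}, fh)
    yield path
    # teardown: runs after the test that used this fixture completes
    os.remove(path)


def test_reads_temp_file(temp_json_file):
    with open(temp_json_file) as fh:
        assert json.load(fh)["example"] is True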
A common challenge in Python applications and scripts is dealing with functions whose business logic creates side effects on, or dependencies on, external systems. For instance, maybe you have a function that calculates the geographical distance between two points, but it needs to query a database first. Or maybe you have a function that reformats an API response and also saves it to S3.
Now, I can already hear several of my colleagues screaming in my ear that you should simply write better code so that functions don’t serve multiple purposes, that you should use proper inversion of control, and so on. And while those are valid opinions, the purpose of this post isn’t to debate best practices for architecture and testing. Rather, my goal is to empower newcomers to everything Python has to offer. And with that, I need to talk about monkey patching.
Monkey patching is a feature in Python (as well as some other dynamically-typed languages) that allows you to modify the behavior of your application at runtime. Suppose you have a function like this:
import json

import boto3

from mymodule.utils import transform_data


def business_logic(input_data, bucket_name, filename):
    # manipulate the data
    result_data = transform_data(input_data)
    # save both forms to S3
    cached_data = json.dumps({"raw": input_data, "transformed": result_data})
    s3 = boto3.resource("s3")
    bucket = s3.Bucket(bucket_name)
    bucket.put_object(Key=filename, Body=cached_data)
    return result_data
In this case, when testing the business logic function, you want to be able to assert that it performs the transformation correctly, without your unit tests actually creating new objects in S3. You might be saying, “why test business_logic at all? Just test transform_data!” And in this overly simplified example, your instincts would be right. But let’s suspend disbelief for a minute.
If we want to test the business logic function and avoid writing anything to S3, we can use monkey patching to dynamically swap the call to boto3.resource() with something else. Sometimes that can be a mock object, like this:
from unittest.mock import MagicMock, patch

from mymodule.utils import business_logic

# illustrative values for the bucket and key used by the test
BUCKET_NAME = "my-test-bucket"
FILENAME = "cached.json"


@patch("mymodule.utils.boto3")
def test_functionality_without_side_effects(mocked_boto3, complex_input_data):
    mocked_s3 = MagicMock()
    mocked_boto3.resource.return_value = mocked_s3
    mocked_bucket = MagicMock()
    mocked_s3.Bucket.return_value = mocked_bucket
    result = business_logic(complex_input_data, BUCKET_NAME, FILENAME)
    assert result.get("transformations") == 1, "Result should record one transformation"
    mocked_bucket.put_object.assert_called_once()
Let’s dive into what’s happening here. I specified the patch decorator with the string mymodule.utils.boto3. Notice I didn’t directly import that path within the test script, but the mymodule.utils module does import boto3 at the top of its file. The patch decorator (which comes from unittest.mock rather than pytest) resolves that dotted path and temporarily replaces the boto3 name inside mymodule.utils with a mock object for the duration of the test. So instead of the real boto3 library, any reference to boto3 within that file returns a mock object. And that’s useful!
Now, based on my function definition above, I call s3 = boto3.resource("s3"). And here’s the beauty of mock objects in Python: any method or attribute you access on a mock object returns another mock object. You can also override certain behaviors, such as the return_value and side_effect of a mock. Back in my test function, I declare that the return value of the resource() function on boto3 should be another mock object, which I’ve named mocked_s3. Then I specify that the return value of its Bucket method should be another mock object named mocked_bucket. I don’t have to declare additional mock objects explicitly like this; I could have used something like:
mocked_boto3.resource.return_value.Bucket.return_value.put_object.assert_called_once()
But I’m not a fan of extra-long lines like this, so when chaining calls I find it cleaner and more readable to declare multiple mock objects as I’ve done above. If you’re going to use mocks, I recommend this approach. Lastly, setting the return_value property returns a fixed value, whereas setting the side_effect property to a function lets you return data conditional on the input.
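Here’s a minimal sketch of that last point, using a standalone MagicMock rather than the S3 example above; the fake_geo_db name and the coordinate data are purely illustrative:
from unittest.mock import MagicMock

fake_geo_db = MagicMock()

# return_value always hands back the same object
fake_geo_db.ping.return_value = True


# side_effect set to a function computes the result from the call's arguments
def lookup(city):
    coords = {"Austin": (30.27, -97.74), "Boston": (42.36, -71.06)}
    return coords[city]


fake_geo_db.get_coordinates.side_effect = lookup

assert fake_geo_db.ping() is True
assert fake_geo_db.get_coordinates("Boston") == (42.36, -71.06)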
Next, pytest also offers a parametrize decorator that lets you specify multiple inputs for a single test function. I grabbed this example straight from the pytest documentation to illustrate:
import pytest


@pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
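Note that the third case is intentionally wrong (6*9 is 54, not 42); the pytest documentation uses it to show what a failing parameter looks like. If you want to keep a known-failing case in the suite without it counting as a failure, the same documentation shows wrapping it in pytest.param with an xfail mark, roughly like this:
import pytest


@pytest.mark.parametrize(
    "test_input,expected",
    [("3+5", 8), ("2+4", 6), pytest.param("6*9", 42, marks=pytest.mark.xfail)],
)
def test_eval(test_input, expected):
    assert eval(test_input) == expected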
There may be cases where you want to skip running tests
unless certain conditions are met. I’ve used the skipif
decorator to
check whether a specific environment variable was present in the case of
integration tests, like this:
import os

import pytest

from mymodule.utils import business_logic


@pytest.mark.skipif(os.getenv("NEEDED_ENV_VAR") is None, reason="NEEDED_ENV_VAR was not set")
def test_integration():
    assert business_logic()
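If several integration tests depend on the same condition, you can assign the marker to a variable once and reuse it, keeping the condition and its reason in one place. A small sketch along those lines:
import os

import pytest

from mymodule.utils import business_logic

# markers are plain objects, so they can be stored and applied to multiple tests
needs_env_var = pytest.mark.skipif(
    os.getenv("NEEDED_ENV_VAR") is None, reason="NEEDED_ENV_VAR was not set"
)


@needs_env_var
def test_integration():
    assert business_logic()


@needs_env_var
def test_another_integration():
    # another test that requires the same environment variable
    assert business_logic()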
Finally, if you’re using VS Code, the built-in testing integrations are really nice to work with. Get started by clicking the Testing icon on the left menu.
Click the blue button to Configure Tests, and then choose the pytest framework. Next, you’ll need to choose the directory your project is in. In most cases this will be the root directory (.), unless you’re like me and you segment projects into a “backend” folder. If you are like me, you may also need to specify your Python interpreter by clicking on the version of Python at the bottom right of the screen, clicking “Enter interpreter path…”, and then typing out something like backend/.venv/bin/python. If you’re doing everything in the root of your repository, this will not be necessary, as VS Code likely auto-detected your virtual environment already.
Voila! The tests should appear in the left pane along with Run and Debug buttons to execute them. Another helpful feature is the ability to run your code directly inside VS Code while utilizing breakpoints. To get started, you’ll want to click the Run and Debug icon on the left menu:
The first time you visit this tab, it will display a message that says, “To customize Run and Debug create a launch.json file.” Simply click the link to create a launch.json file, which will bring up a command palette prompt. Choose “Python Debugger” as the first option, and then you’re presented with a list of different templates. I usually pick “Python File” or “Python File with Arguments” to start, but you can add as many templates as you like. It will create a file that looks something like this:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal"
        }
    ]
}
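If you picked the “Python File with Arguments” template instead, the configuration also carries an args key so you can pass command-line arguments to your script; the exact template contents vary by extension version, and the argument values below are just placeholders:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File with Arguments",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            // placeholder arguments for illustration only
            "args": ["--verbose", "--limit", "10"]
        }
    ]
}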
Back in the Run and Debug tab, you should now see a dropdown menu at the top (with your “Python Debugger” configuration selected by default) and a small green play button to its right. Click this button to launch Python in debug mode. Be aware that if you’re using the default “Current File” configuration, you need to have your main.py file open and in focus, not your newly created launch.json file. Set breakpoints by clicking in the gutter just to the left of the line numbers (a red dot will appear), and now you’re cooking with gas!
That last bit on debugging is fairly uncontroversial, but my earlier section on utilizing mocks can be divisive. A lot of seasoned developers actively discourage the use of mocks, or limit them to specific kinds of test doubles (e.g. stubs and fakes). So if you’re curious, here’s some recommended reading:
- Don’t Mock What You Don’t Own by Hynek Schlawack
- Mock Hell by Edwin Jung
- Mocks Aren’t Stubs by Martin Fowler
- Fast tests for slow services: why you should use verified fakes by Itamar Turner-Trauring
- Stop Mocking, Start Testing by Augie Fackler & Nathaniel Manista
Some of the takeaways from those articles: if you lean on too many mock objects, refactoring your codebase later becomes painful because every mock tied to an implementation detail has to be tracked down and updated. They also recommend writing your tests against an interface (i.e. the public API) rather than against specific implementation details. Ask yourself, “How much could the implementation change without having to change our tests?” And finally, several of the authors advise using fakes almost exclusively and treating mocks as a tool of last resort.
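To make the fakes idea concrete, here’s a minimal sketch of what a fake might look like for the S3 example earlier. It assumes business_logic were refactored to accept a bucket-like object as a parameter (dependency injection) instead of constructing one with boto3 internally, so it does not map one-to-one onto the code above:
from mymodule.utils import business_logic


class FakeBucket:
    # a tiny in-memory stand-in implementing only the method the code under test needs
    def __init__(self):
        self.objects = {}

    def put_object(self, Key, Body):
        self.objects[Key] = Body


def test_functionality_with_a_fake(complex_input_data):
    fake_bucket = FakeBucket()
    # hypothetical signature: business_logic(input_data, bucket, filename)
    result = business_logic(complex_input_data, fake_bucket, "cached.json")
    assert result.get("transformations") == 1
    assert "cached.json" in fake_bucket.objects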
The larger morals of mocking and testing are out of scope for this article, but if you have questions about any of those points, I encourage you to peruse through those five links for more understanding.
Conclusion
Python is a wildly powerful ecosystem that can seem overwhelming at first. It has become the lingua franca of data science, and although it has plenty of detractors, it isn’t going anywhere any time soon. So to briefly summarize all of my main points:
- Don’t use relative imports.
- Install uv and play around with its myriad features, both for managing dependencies and publishing packages.
- Windows users: install WSL2 before installing anything else.
- Configure loggers to output things in JSON format, fully utilizing the extra argument.
- Search the Snyk Open Source Advisor when looking for existing packages.
- Use ruff for formatting and lefthook for automation, but make sure to handle mypy carefully.
- Use the pytest integration in VS Code to better test and debug your code.
Hopefully this article illuminated some areas of development you were unsure about, equipped you with powerful new tools, and left you feeling empowered to get started. And let me know in the comments if you disagree with any of my recommendations or have better ideas!