20 August 2021
Many software projects require sensitive data which shouldn't be committed to version control. You don't want Bad Guys to read your usernames, passwords, API keys, etc.
To get around this, many projects require that each collaborator create a local .env file listing the secret environment variables that the code reads. The file is then gitignored so that the sensitive values never appear on GitHub for malicious users to steal.
But for well-intentioned new collaborators, the complete absence of a .env file means that the required secrets for a project are rarely obvious. It's easy for a project's code to fall out of sync with the instructions for creating a .env, and not knowing which secrets to provide can be a significant source of friction while trying to get started with a codebase.
I want to share my .env files with my collaborators, without sharing the secret bits.
In an attempt to improve this situation, I've built a Python package called dotenv-stripout.
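When dotenv-stripout is installed in a git repo, a filter cleans the sensitive values from your .env files as they're staged for commit, while keeping the names of the secrets intact. For example, a line such as API_KEY=abc123 in your local .env is committed as API_KEY=, leaving the value for each collaborator to fill in.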
The result for your collaborators is an always-up-to-date set of required secrets, in the actual .env file where they need to be filled out.
dotenv-stripout is loosely inspired by nbstripout, a package I use daily as part of my work with Jupyter notebooks and git. As notebooks are added to git, the output of each cell is removed, or 'stripped', from the file. The original file on my machine remains intact with all of its messy outputs, but the version pushed to GitHub looks as if it has never been run.
This almost always makes notebooks more readable, and it also limits how much large outputs (e.g. high-resolution graphs or images) can bloat the size of a file. For cells with stochastic or random outputs, stripping is especially important: simply re-running a notebook without changing any code could otherwise generate huge diffs that contain no meaningful changes.
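To make the idea of 'stripping' a notebook concrete, here is a minimal Python sketch that clears code-cell outputs from a notebook's JSON. It assumes the nbformat v4 layout (a top-level cells list whose code cells carry outputs and execution_count) and is a deliberate simplification, not nbstripout's own code, which deals with considerably more of a notebook's metadata.

```python
import json


def strip_notebook_outputs(notebook_text: str) -> str:
    """Clear the outputs and execution counts of every code cell.

    Assumes the nbformat v4 layout; markdown and raw cells are left untouched.
    """
    notebook = json.loads(notebook_text)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed text, plots and errors
            cell["execution_count"] = None  # make the cell look as if it never ran
    # One-space indents and sorted keys roughly match how Jupyter saves notebooks
    return json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    sample = {
        "cells": [
            {"cell_type": "code", "execution_count": 7, "metadata": {},
             "source": ["import random\n", "print(random.random())"],
             "outputs": [{"output_type": "stream", "name": "stdout",
                          "text": ["0.42\n"]}]},
            {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        ],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
    }
    print(strip_notebook_outputs(json.dumps(sample)))
```

Because the sample cell prints a random number, its stored output would differ on every run; after stripping, the committed JSON no longer depends on what a particular run happened to print.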
The parallels to the .env problem felt obvious; all I had to do was build the solution.
To achieve this magic, invisible cleanup without affecting the local copy of a file, both dotenv-stripout and nbstripout use git's "clean" and "smudge" filters, which specify commands to run when a file is staged (clean) and when it is checked out into the working tree (smudge). When a filter is installed in a repo, its commands are written to the repo's hidden .git/config file, and the file patterns it applies to are written to .git/info/attributes (or .gitattributes).
Those filters can also be set in the global git configuration, affecting files in every repo. If you're interested in reading more, the git documentation includes some great tips for tweaking and customising your git configuration.
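As a rough illustration of what installing such a filter involves, the sketch below builds the git config commands that register a clean/smudge filter at either repo or global scope, and appends the matching attributes line to a file. The filter name envstrip and the clean command python strip_env.py are hypothetical placeholders, not what dotenv-stripout actually writes; smudge is set to cat, which simply passes the file through unchanged on checkout.

```python
import shlex
from pathlib import Path


def filter_config_commands(name: str, clean_cmd: str, use_global: bool = False) -> list[list[str]]:
    """Build the `git config` invocations that register a clean/smudge filter.

    --local writes to the current repo's .git/config; --global writes to the
    user's global git config. `name` and `clean_cmd` are hypothetical here.
    """
    scope = "--global" if use_global else "--local"
    return [
        ["git", "config", scope, f"filter.{name}.clean", clean_cmd],
        # `cat` passes the committed text through unchanged on checkout
        ["git", "config", scope, f"filter.{name}.smudge", "cat"],
    ]


def add_attribute(attributes_file: Path, pattern: str, name: str) -> None:
    """Append `<pattern> filter=<name>` to an attributes file unless already present."""
    line = f"{pattern} filter={name}"
    text = attributes_file.read_text() if attributes_file.exists() else ""
    if line in text.splitlines():
        return  # already installed; keep the operation idempotent
    if text and not text.endswith("\n"):
        text += "\n"  # avoid gluing our line onto an unterminated last line
    attributes_file.parent.mkdir(parents=True, exist_ok=True)
    attributes_file.write_text(text + line + "\n")


if __name__ == "__main__":
    # Print the commands rather than running them; pass each list to
    # subprocess.run(cmd, check=True) inside a real repo to apply them.
    for cmd in filter_config_commands("envstrip", "python strip_env.py"):
        print(shlex.join(cmd))
```

For a repo-level install the attributes line would typically go in .git/info/attributes; for a global install, git reads attributes from the file named by its core.attributesFile setting (by default ~/.config/git/attributes).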
The actual stripping work done by dotenv-stripout is considerably simpler than nbstripout's. While nbstripout has to contend with the complex metadata and JSON structure of a Jupyter notebook to determine whether something should be removed, dotenv-stripout simply goes through each matching file line by line, removing any characters that appear after an =.
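The line-by-line rule can be sketched in a few lines of Python. This is an illustration, not dotenv-stripout's source: it keeps everything up to and including the first = on each line, preserves line endings, and passes lines without an = (blank lines, most comments) through unchanged. Note that under this rule a comment containing an = would be truncated too. The clean_filter helper reflects how git clean filters work in general: the staged content arrives on stdin and the cleaned version is written to stdout.

```python
from typing import TextIO


def strip_line(line: str) -> str:
    """Keep everything up to and including the first '=', plus the line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    key, sep, _secret = body.partition("=")  # sep is "" when the line has no '='
    return key + sep + ending


def strip_dotenv(text: str) -> str:
    """Apply strip_line to every line; lines without '=' pass through unchanged."""
    return "".join(strip_line(line) for line in text.splitlines(keepends=True))


def clean_filter(source: TextIO, sink: TextIO) -> None:
    """Read the staged file from `source` and write the stripped version to `sink`."""
    sink.write(strip_dotenv(source.read()))
```

A script registered as a git clean command would just call clean_filter(sys.stdin, sys.stdout). A value that itself contains =, such as a database URL with query parameters, is still removed in full because only the first = is kept.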
The dotenv-stripout package also includes a friendly CLI (implemented with typer) which lets you install, uninstall, or check the status of the filter in a repo (or globally). It also allows you to actually strip the values from .env files in a repo.
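dotenv-stripout's CLI is built with typer; to stay within the standard library, the sketch below swaps in argparse to show the general shape of such an interface: install, uninstall and status subcommands, each accepting a --global flag. The program name envstrip-sketch and the handler behaviour are hypothetical; the handler only reports which configuration it would target, and none of this is the package's actual code.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI shape: install, uninstall and status, each with --global."""
    parser = argparse.ArgumentParser(prog="envstrip-sketch")
    subcommands = parser.add_subparsers(dest="command", required=True)
    for name, help_text in [
        ("install", "register the filter"),
        ("uninstall", "remove the filter"),
        ("status", "report whether the filter is registered"),
    ]:
        sub = subcommands.add_parser(name, help=help_text)
        # `global` is a Python keyword, so the flag is stored as `use_global`
        sub.add_argument("--global", dest="use_global", action="store_true",
                         help="use the global git config instead of this repo's")
    return parser


def main(argv: Optional[list[str]] = None) -> str:
    """Parse arguments and report the target; a real tool would call `git config`."""
    args = build_parser().parse_args(argv)
    target = "the global git config" if args.use_global else "this repo's .git/config"
    message = f"{args.command}: {target}"
    print(message)
    return message


if __name__ == "__main__":
    # Demo call; a real entry point would call main() so argparse reads sys.argv
    main(["install", "--global"])
```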
Of course, this isn't the only solution to this problem, nor is it the best one.
As I mentioned earlier, the use of a blunt .gitignore file is a widely-accepted pattern which does enough to keep your secrets safe, but has some inherent problems.
Some projects instead use a third-party secret store (e.g. AWS Secrets Manager or HashiCorp Vault) to hold their secrets, which are then fetched when the code runs. Using a third-party store often brings considerable complexity and cost overhead.
In other projects, environment variables might only ever be needed during builds, so managing them directly within the build tools (e.g. Vercel, Netlify, or GitHub Actions) might be the best solution.
All of these options are sophisticated tools which cater to large projects with an emphasis on cloud computing and automation.
In my mind, dotenv-stripout is positioned as a small improvement on the standard .gitignore approach, applicable to small projects where contributors are likely to run the code themselves or manually create their .env files.
If you want to try dotenv-stripout in your own projects, just run:
pip install dotenv-stripout
dotenv-stripout install --global