Python Package and Environment Management

Introduction

Writing programs in Python, especially for astronomy, often requires Python packages that are not part of the standard library. In addition, certain Python packages and/or workflows require specific system packages to be installed. Python packages themselves depend on other packages/modules, often in specific versions (or ranges of versions). These dependencies, in turn, have their own requirements! Worse still, a single user may want access to several versions of a package for different projects, or even to different versions of Python!

Package manager

Software that handles installing packages and dependency resolution.

It may not be possible to meet the requirements of every application at once, especially when they conflict. If one application needs version 1.0 of a package and another needs version 2.0, installing either version leaves one application unable to run. The solution to this problem is to create a virtual environment: a self-contained directory tree that contains a Python installation with additional packages, potentially even non-Python packages.

Virtual environment

A cooperatively isolated runtime environment that allows Python users and applications to install and upgrade Python distribution packages without interfering with the behavior of other Python applications running on the same system.
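Python itself can tell whether a script is running inside a virtual environment. For venv-style environments (the layout standardized by PEP 405, which both venv and uv venv create), sys.prefix points at the environment directory, while sys.base_prefix still points at the base installation. Below is a minimal sketch. Note that conda/mamba environments are complete Python installations of their own, so this particular check does not flag them.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when running inside a venv-style (PEP 405) environment."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix is the installation the environment was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_environment())
    print("environment prefix:", sys.prefix)
    print("base installation: ", sys.base_prefix)
```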

Because of these problems, package managers and virtual environment managers are essential for modern-day workflows in astronomy. Many of them exist, and you are likely familiar with some, such as pip and venv. Some tools even handle both package and virtual environment management in a single tool! This is the case for conda, which you might also be familiar with.

In this article, we will review and describe several common resources, ranging from tools included in Python itself to relatively new software that is actively being developed.

Note

This article describes a few different resources and tools for managing virtual environments. They broadly fall into two categories with slightly different philosophies:

  1. Namespace-Based:

    In this paradigm, virtual environments are associated with a “name” and are designed to be activated from anywhere in the system. Typically the environments are all held in a single location. A good example of this paradigm is conda, where the user creates environments, which are all stored in a configured location, and can be activated at will from anywhere on the system.

  2. Project-Based / Project-Scoped:

    In this paradigm, virtual environments are associated with a specific directory tree on the filesystem and are designed to be activated within this tree. A good example of this paradigm is venv, where invoking the command creates a folder in the specified directory that contains scripts that the user can invoke to activate the environment. It should be noted that it is still possible to activate/access the environment from other locations on the filesystem, but it might be slightly more tedious.
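The difference between the two paradigms is easiest to see in where environments live on disk. The sketch below assumes default layouts. Conda-style tools (including micromamba) keep named environments in an envs/ folder under a single root prefix. Project-scoped tools such as uv place a .venv folder inside the project. The helper functions and example paths are hypothetical.

```python
from pathlib import Path


def namespace_env_path(root_prefix: Path, name: str) -> Path:
    """Namespace-based: all environments share one root and are looked up by name."""
    return root_prefix / "envs" / name


def project_env_path(project_dir: Path, folder: str = ".venv") -> Path:
    """Project-based: the environment lives inside the project's own directory tree."""
    return project_dir / folder


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    print(namespace_env_path(Path.home() / "micromamba", "myanalysis"))
    print(project_env_path(Path.home() / "projects" / "myanalysis"))
```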

Python good practice: virtual environment per project

Regardless of the paradigm you prefer, it is good practice to use a separate virtual environment for each project you work on. This prevents interference between the projects’ libraries and makes package management more transparent. It also makes projects easier to share and helps ensure reproducibility.

Python workspace: shared virtual environment for multiple projects

Workspaces help organize large codebases by splitting them into multiple packages with independent dependencies. Each package in a workspace has its own definition, but they are all locked together in a shared virtual environment.

Combining multiple projects in one environment is a common workflow in academic research (probably a habit inherited from the conda ecosystem). If you work this way, be aware that tools such as Poetry and uv can protect you from conflicts between packages and versions.

In this article, we review a few ways to create and manage virtual environments. This is only intended to give you a high-level understanding of the pros and cons of each tool, so that you can decide which tools you need. As always, consult each tool’s documentation directly; many have lengthy tutorials. YouTube also has demos for quite a few of them!

Shell completion

Quite a few of these tools support shell completion, e.g. pressing the Tab key to autocomplete parts of a command. Check each tool’s documentation for how to enable this feature.

pip

pip is the Python package manager. It’s hard to think about Python without thinking of pip! Nearly all installations of Python come with pip, and if yours does not, you can bootstrap it with a special command: python -m ensurepip

pip can install Python packages from the Python Package Index (PyPI), from GitHub repositories, or from the local filesystem. It handles dependency installation and resolution for you! Simply typing pip install <package name> (or pip3 install, depending on which is available on your PATH) installs Python packages into the current environment.
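The pip or pip3 found on your PATH may belong to a different Python interpreter than the one you intend to use. A common safeguard is to run pip as a module of a specific interpreter (python -m pip install <package name>). The sketch below applies the same idea from Python. The helper name is hypothetical, and by default it only returns the command instead of running it.

```python
import subprocess
import sys


def pip_install(*packages: str, dry_run: bool = True) -> list[str]:
    """Install packages into the running interpreter's environment (hypothetical helper).

    With dry_run=True (the default) the command is only returned, not executed.
    """
    # `python -m pip` uses the pip that belongs to sys.executable, so packages
    # land in that interpreter's environment, whatever `pip` on PATH points to.
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print("would run:", " ".join(pip_install("numpy")))
```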

Pros: It’s as straightforward as they come! If you need a purely Python package manager, pip is almost always there for you, and can get you set up quickly.

Cons: pip can only install Python packages, not the other kinds of packages those packages might depend on. pip is also relatively slow. Finally, it is not ideal for reproducibility or package development. It relies on pyproject.toml or requirements.txt files to specify dependencies, but it does not lock the full dependency tree for you (e.g., the dependencies of your dependencies are not pinned unless you list them yourself), so installations can end up inconsistent between systems.
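The usual workaround is to pin exact versions in a requirements.txt file, which is roughly what pip freeze prints. The sketch below builds a similar list with the standard-library importlib.metadata module, formatting every installed distribution as a name==version pin. It records what is installed, but not where each package came from or how the packages depend on each other. That is exactly the limitation described above.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for the distributions installed here."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        # Skip broken installs whose metadata lacks a project name.
        if meta is None or "Name" not in meta:
            continue
        pins.add(f"{meta['Name']}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Saved as requirements.txt, this list can be reinstalled elsewhere with
    # `pip install -r requirements.txt`.
    print("\n".join(pinned_requirements()))
```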

venv

venv is the standard-library module for creating and managing virtual environments, available since Python 3.3. It is project-based: it creates a directory that contains all of the relevant scripts to activate/deactivate the environment. As long as you have Python on your system, you can use venv!

To create a virtual environment, decide upon a directory where you want to place it, and run the venv module as a script with the directory path:

python -m venv <dirname>
source <dirname>/bin/activate  # bash/zsh; other shells use activate.fish or activate.csh, Windows uses <dirname>\Scripts\activate
pip install numpy  # installs only into this environment

The snippet above creates the directory <dirname> (if it does not already exist), which holds everything that belongs to your virtual environment.
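The same environment can also be created from Python with the venv module’s create() function. The sketch below builds one in a temporary directory, skipping pip to keep it fast and offline. It then reads pyvenv.cfg, the small configuration file venv writes at the top level of every environment. That file’s home entry points at the base interpreter’s directory, and its version entry records the Python version the environment was created with. The helper read_pyvenv_cfg is hypothetical.

```python
import sys
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir: Path) -> dict[str, str]:
    """Parse the simple 'key = value' lines of an environment's pyvenv.cfg."""
    config = {}
    for line in (env_dir / "pyvenv.cfg").read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"
        # Roughly equivalent to `python -m venv demo-env --without-pip`.
        venv.create(env_dir, with_pip=False)
        cfg = read_pyvenv_cfg(env_dir)
        print("base interpreter directory:", cfg["home"])
        print("environment created with Python", cfg["version"])
        print("this script runs Python", ".".join(map(str, sys.version_info[:3])))
```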

Some additional options to python -m venv include:

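  • The --system-site-packages option gives the environment access to the packages already installed in the base Python, so you avoid duplicating them (packages you install in the environment take precedence when you need specific versions). The similarly named --symlinks option only controls whether the Python interpreter is symlinked rather than copied into the environment.

  • The --prompt option overrides the default environment name shown in your shell prompt, so you can customize how it appears.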
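These command-line options correspond to keyword arguments of the standard-library venv.EnvBuilder class (system_site_packages, symlinks, prompt, with_pip). The minimal sketch below creates an environment that can see the base installation’s packages and uses a custom prompt. The helper name, directory name, and prompt text are placeholders.

```python
import tempfile
import venv
from pathlib import Path


def create_analysis_env(env_dir: Path, prompt: str) -> str:
    """Create a venv with custom options and return its pyvenv.cfg text (hypothetical helper)."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # same as --system-site-packages
        prompt=prompt,              # same as --prompt <prompt>
        with_pip=False,             # same as --without-pip; keeps the example offline
    )
    builder.create(env_dir)
    return (env_dir / "pyvenv.cfg").read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_analysis_env(Path(tmp) / "analysis-env", prompt="analysis"))
```

In the generated pyvenv.cfg, include-system-site-packages is set to true. This setting is what tells the environment to also look at the base installation’s site-packages.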

To deactivate a virtual environment, simply type:

deactivate

Pros: It ships with every version of Python since 3.3. It’s straightforward, simple, and works well with pip.

Cons: venv cannot install a Python version other than the one used to create the environment, and it is not designed to make environments easy to reproduce.

Attention

The venv module cannot install a different Python version than the one used to create the environment. If you need a different version of Python, consider one of the other options below.

mamba (replacement for conda)

We’re now going to talk about conda/mamba, one of the staples in Python configuration management. It can handle both package and virtual environment management.

(micro)mamba is a reimplementation of the conda package manager in C++. It is fully compatible with conda, but much faster. mamba is a drop-in replacement for conda: it uses the same configuration files and only changes the package-solving part. micromamba is the counterpart to miniconda, a small distribution that comes without any pre-installed packages, and it is the preferred option.

Warning

conda licensing update

The Anaconda Python Distribution has been the foundation for Python user applications, offering a large selection of important packages such as NumPy, SciPy, matplotlib, etc. in compatible versions and using optimized builds and libraries.

However, earlier this year Anaconda Inc. changed its software licensing model so that using the Anaconda distribution of Python (including the packages conda downloads from Anaconda’s default channels) may require licensing fees. The exact scope of the free tier for individuals remains unclear. For these reasons, MPCDF/MPIA is no longer allowed to install new versions of Anaconda Python.

As a drop-in replacement, mamba uses conda-forge, a community-driven initiative that develops conda packages which do not fall under Anaconda Inc.’s strict licensing and can therefore be used freely. Most packages are available on conda-forge, but some might be missing or slightly outdated.

One of the most powerful aspects of mamba (like conda), and one that sets it apart from all of the other solutions discussed so far, is that it can install non-Python dependencies in your environments. Because system packages can be specified as dependencies, mamba is, in essence, not just a Python virtual environment manager but an environment manager for any package that exists in the conda-forge ecosystem.

You can install micromamba, the preferred mamba implementation, with their provided installation script or with your system’s package manager (apt-get, brew, etc.):

"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

You can use mamba to create namespace-based virtual environments:

micromamba create --name <projectname>
micromamba activate <projectname>
micromamba install python=3.11 numpy cuda 

Just like that, we’ve installed Python, Python packages, and even non-Python packages in one line! All of these can be written out to a shareable file with micromamba env export --no-build.

You can also use pip to install packages to the environment if they are not available in conda-forge. However, it’s usually recommended you do this after installing packages with mamba.

# example of packages to install
pip install matplotlib numpy
# better: list the packages in a requirements file
pip install -r requirements.txt

Later on, all you need to do is activate the virtual environment when required, from anywhere on the filesystem:

micromamba activate <projectname>

To deactivate a virtual environment, type:

micromamba deactivate

Pros: mamba is currently the best namespace-based Python environment manager, largely because it can also install non-Python dependencies! It is relatively fast (though not the fastest tool covered in this article), relatively reproducible, and has a large list of packages available from conda-forge. You can, of course, pull packages from Anaconda, but as discussed above this may have licensing consequences.

Cons: The conda recipe format is not the most reproducible, and mamba is not the fastest code out there. In addition, mamba still has some legacy baggage that it ports over from being an attempt to replace conda, such as the base environment.

Conclusion: If you are used to conda, switch to mamba.

Poetry

So far we have focused on creating isolated environments and installing packages with the purpose of using them for our own work. However, we have neglected reproducibility as a core aspect of Python workflows.

Poetry is a tool for dependency management and packaging in Python. Poetry enforces the best practice of creating a virtual environment for each project and manages the dependencies (and versions) for you. Poetry generates and updates the pyproject.toml file of your project. The pyproject.toml file is a standardized configuration file used in Python projects to specify build system requirements, dependencies, and project metadata.

You can install Poetry like any other Python package, or with your system’s package manager (apt-get, brew, etc.):

pip install poetry

You may need the --user option if you don’t have admin rights.

To create a new project with poetry, use the following command:

poetry new <projectname>

To add a package to your project, use the following command:

poetry add <package>

You’ll notice that this automatically updates the pyproject.toml file with the dependencies. You can edit the file manually or use the Poetry CLI, as demonstrated above. If you’ve edited the file manually, run poetry install to install all packages into the environment (you may first need to run poetry lock so that the lock file reflects your changes).

To activate the virtual environment, use the following command:

poetry shell

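This starts a new shell with your project’s environment and dependencies activated. To deactivate the virtual environment and exit this new shell, type exit. Note that since Poetry 2.0, poetry shell is no longer built in and requires the poetry-plugin-shell plugin. Alternatively, poetry env activate prints the command that activates the environment in your current shell.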

Note

poetry run: to run a script, you can simply use poetry run python your_script.py. Likewise, if you have command-line tools such as pytest or black, you can run them with poetry run pytest.
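Activation is mostly a convenience. Running an environment’s own interpreter directly is already enough to use that environment, which is why run-style commands can work without an activated shell. The sketch below illustrates this general mechanism with a throwaway venv rather than with Poetry itself, so it does not show how Poetry is implemented. It calls the environment’s python without activating anything and prints the sys.prefix that interpreter reports. The helper names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a venv-style environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def prefix_seen_by(env_dir: Path) -> str:
    """Run the environment's interpreter directly (no activation) and report its sys.prefix."""
    result = subprocess.run(
        [str(env_python(env_dir)), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"
        # Symlink the interpreter on POSIX (the venv CLI default), copy it on Windows.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        print("this script runs in:     ", sys.prefix)
        print("the env's python runs in:", prefix_seen_by(env_dir))
```

The interpreter lives in bin/ on POSIX systems and in Scripts\ on Windows, which is why env_python branches on os.name.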

More information on Poetry can be found in the official documentation. It comes with a lot of great features, including the ability to publish your package directly to PyPI!

Pros: In terms of project-scoped environment management, it’s hard to think of what features would be better suited for reproducibility than what is provided by Poetry. If you are developing a Python package, Poetry is a great option.

Cons: Some parts of Poetry can be slightly slow and it also cannot specify non-Python dependencies (other than Python itself).

uv

uv is an extremely fast Python package installer and resolver written in Rust. It is designed to replace pip, pip-tools, pipx, poetry, pyenv, virtualenv, and their associated workflows, and it offers a pip-compatible interface.

You can install uv using pip, but it is recommended to avoid using the system’s Python and instead use the uv installer script or your system’s package manager (apt-get, brew, etc.):

curl -LsSf https://astral.sh/uv/install.sh | sh

You can create a virtual environment (similar to venv) with uv:

uv venv <directory name> --python 3.12.0

We can then activate the environment and add packages to it:

source <directory name>/bin/activate
uv pip install numpy

If you’ve been following along and trying this, you may notice that uv is orders of magnitude faster than pip. This is a major selling point of the software. The authors of uv have a track record of making fast, performant software for Python, such as the ruff linter.

To deactivate the virtual environment, use the following command:

deactivate

You can create a project/workspace (similar to Poetry) with uv, though this feature is still experimental.

uv init <projectname>
uv add <package>

Pros: If your workflow requires pip and venv and a different version of Python, look no further than uv. Not only is it ludicrously fast, but it is also feature-rich and supports nearly all of the pip and venv syntax. Just add uv before your regular commands and you’re good to go!

Cons: Like pip, uv can only install Python packages (although, unlike pip, it can also install different versions of Python itself). In addition, reproducible projects/workspaces with uv (i.e., the Poetry replacement) are still experimental.

pixi

pixi aims to combine the reproducibility of Poetry with mamba’s ability to add non-Python packages, and it can also serve as a global installation tool (more on that later). In fact, pixi doesn’t claim to be a Python manager at all! It works with Python, R, C/C++, Rust, etc. It is a project-scoped package and environment manager designed with reproducibility and workflows in mind.

You can install pixi with their provided installation script or with your system’s package manager (apt-get, brew, etc.):

curl -fsSL https://pixi.sh/install.sh | bash

To create a project, we can use the following:

pixi init pixi-hello-world
cd pixi-hello-world
pixi add python numpy astropy

We can add packages from conda-forge, PyPI, or any other configurable source. See the pixi documentation to learn more.

Similar to other tools we can activate the environment with pixi shell (exiting it with exit) or run commands with pixi run like pixi run python --version.

What’s great is that we can even define tasks, automated actions that pixi runs for us:

pixi task add checkversion "python --version"
pixi run checkversion

Pixi can even be used to install packages globally, which can be extremely useful on systems where you don’t have root access, such as MPCDF. You can also use this to install your favorite command-line tools.

pixi global install bat # Like cat, but with wings

It even promises to allow you to publish projects to conda-forge in the future, though this still needs to be implemented.

Pros: Want one tool that can do it all? pixi appears to promise this. It doesn’t matter what language you develop in or what platform you are on, pixi enables you to make highly reproducible packages and workflows with minimal effort. If you are a fan of project-scoped environments, this might be for you.

Cons: This package is still in development. It has high promise but still has much ground to cover.