Using Poetry for Python Dependency Management
Poetry is a tool for Python packaging and dependency management. Check out the poetry docs here: https://python-poetry.org. As a data scientist you can use poetry to create reproducible python environments for you and your team. The key features are:
- A command line interface for declaring dependencies.
- The ability to document development only dependencies in addition to production dependencies.
- A lock file to record all dependencies and sub-dependencies.
- Easily develop a package with built in package development features.
Usage
Installation
To install poetry run the following command:
curl -sSL https://install.python-poetry.org | python3 -
Follow the instructions from the terminal output to configure poetry. For example, if you are using bash you will need to add the following line to your ~/.bashrc file:
export PATH="$HOME/.poetry/bin:$PATH"
Restart your shell, and verify that poetry is working by checking the version:
poetry --version
Create a new project
Create a new empty directory for the project.
mkdir ~/my_app
cd ~/my_app
Use the poetry init
command to setup poetry:
poetry init \
--no-interaction \
--name my_app \
--author "YourName <yourname@gmail.com>" \
--description "A hello world poetry example"
If using Pip you would run the following commands:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip wheel setuptools
poetry-init
The poetry init
commands creates a pyproject.toml file. After running the above command your project structure will have only one file and look like this:
.
└── pyproject.toml
pyproject.toml
[tool.poetry]
name = "my_app"
version = "0.1.0"
description = "A hello world poetry example"
authors = ["YourName <yourname@gmail.com>"]
[tool.poetry.dependencies]
python = "^3.9"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
pyproject.toml is a special file that poetry uses to store project configuration data. It is not specific to poetry, other tools can also store information in pyproject.toml (read PEP 621 to learn more). The tool.poetry
section of the pyproject.toml file is where the poetry specific meta-data is stored (https://python-poetry.org/docs/pyproject/). As you will learn in the upcoming sections pyproject.toml will automatically be updated by poetry as we add and remove dependencies.
Manage dependencies
Poetry comes with a suite of commands that you can use to manage your dependencies without ever touching pyproject.toml by hand. The main commands include:
poetry add
: declare a new dependency.poetry remove
: remove a dependency.poetry run
: run a command inside the poetry virtual environment.poetry export
: export the project dependencies into another format.
Add a dependency
Package dependency
Add the requests package as a dependency.
To add a dependency you can use the poetry add
command.
poetry add requests
Running poetry add <PACKAGE_NAME>
will achieve several things:
- poetry will install requests into a virtual environment.
- poetry will make note of requests as a dependency in pyproject.toml.
- poetry will make note of the version of requests used, as well as of the requests dependencies in poetry.lock
You can think of poetry add <PACKAGE_NAME>
as being equivalent to pip install <PACKAGE_NAME>
. One of the benefits of using poetry add <PACKAGE_NAME>
is that the requirement will be documented in our pyproject.toml, where as with pip the requirement is not documented in any configuration file.
To document the packages we require we will use a requirements.txt file and install from there.
requirements.txt
requests
Then run:
pip install -r requirements.txt
Development dependency
A development dependency is a package that is only required by the package developers. For example we may want to use a code formatter, but there is no reason for the code formatter to be installed when the end user installs the package. Lets install black as our code formatter.
poetry add --group dev black
The output will look very similar, but note the use of the --group dev
option. This tells poetry that this is a “development only” dependency. This means that the app does not need black to work, but we want all of the developers who are working on the app to have black installed so that code is formatted consistently.
Pip has no official way of adding dev dependencies. A common convention would be to use a requirements-dev.txt file.
requirements-dev.txt
black
Then run:
pip install -r requirements-dev.txt
We have now installed two packages: requests and black. Lets take a look and see how our dependencies have been documented.
.
├── poetry.lock
└── pyproject.toml
pyproject.toml
Includes the “top-level” dependencies that we defined with poetry add
.
pyproject.toml
[tool.poetry]
name = "my-app"
version = "0.1.0"
description = "A hello world poetry example"
authors = ["YourName <yourname@gmail.com>"]
readme = "README.md"
packages = [{include = "my_app"}]
[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.28.1"
[tool.poetry.group.dev.dependencies]
black = "^22.8.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
poetry.lock
Include the “top-level” dependencies that we defined with poetry add
plus all of the dependencies of those packages. The example below is only a preview of the first several lines.
poetry.lock
[[package]]
name = "black"
version = "22.8.0"
description = "The uncompromising code formatter."
category = "dev"
optional = false
python-versions = ">=3.6.2"
[package.dependencies]
click = ">=8.0.0"
mypy-extensions = ">=0.4.3"
pathspec = ">=0.9.0"
platformdirs = ">=2"
tomli = {version = ">=1.1.0", markers = "python_full_version < \"3.11.0a7\""}
typing-extensions = {version = ">=3.10.0.0", markers = "python_version < \"3.10\""}
...
.
├── requirements-dev.txt
└── requirements.txt
requirements.txt
requirements.txt
requests
requirements-dev.txt
requirements-dev.txt
black
If we want to see the dependencies of our dependencies you can use the pip list
command.
pip list
Package Version
------------------ ---------
black 22.8.0
certifi 2022.9.14
charset-normalizer 2.1.1
click 8.1.3
idna 3.4
mypy-extensions 0.4.3
pathspec 0.10.1
pip 22.2.2
platformdirs 2.5.2
requests 2.28.1
setuptools 65.3.0
tomli 2.0.1
urllib3 1.26.12
wheel 0.37.1
Remove a dependency
This is where Poetry really shines!
After a few weeks of development the team has decided that they do not want to use the black code formatter anymore. Instead, everyone has agreed on autopep8.
First we need to remove black:
poetry remove --group dev black
This command will remove black, and it will also remove all of black’s dependencies that we no longer need. Then, lets add autopep8 as a dependency:
poetry add --group dev autopep8
pyproject.toml and poetry.lock will automatically be updated!
Use pip uninstall
to remove a dependency.
pip uninstall black
However… this command will only remove black. It will not remove black’s dependencies such as click
, tomli
, and others. My virtual environment now includes a bunch of dependencies that I am not using!
pip list
Package Version
------------------ ---------
certifi 2022.9.14
charset-normalizer 2.1.1
click 8.1.3
idna 3.4
mypy-extensions 0.4.3
pathspec 0.10.1
pip 22.2.2
platformdirs 2.5.2
requests 2.28.1
setuptools 65.3.0
tomli 2.0.1
urllib3 1.26.12
wheel 0.37.1
And then remember to update your requirements-dev.txt
:
requirements-dev.txt
autopep 8
And install autopep 8:
pip install -r requirements-dev.txt
If you want to have a virtual environment that actually reflects your current requirements you will need to recreate one.
deactivate
rm -r .venv
python -m venv .venv
python -m pip install --upgrade pip wheel setuptools
source .venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
Run your code
When using Poetry you run you code by prefixing all commands with poetry run
. This runs your code inside the virtual environment that poetry created. You do not need to remember or worry about activating and deactivating virtual environments.
poetry run python hello-world.py
If using VS code you can set the poetry virtual environment as your interpreter. VS Code will then automatically activate the virtual environment. You can then choose to not prefix your commands with poetry run
.
Additionally, if you want to avoid prefixing all of your commands with poetry run
you can use the poetry shell
command. Check out the poetry docs for more details: https://python-poetry.org/docs/cli/#shell.
poetry run
Under the hood poetry uses virtual environments to isolate your projects dependencies. Every time we call poetry add
or poetry remove
we are modifying that virtual environment. In order to run a command inside the virtual environment we use the command poetry run
.
If you invoke python
as you normally would it uses the default python interpreter for your system.
which python
/Users/samedwardes/.pyenv/shims/python
In order to use the virtual environment created by poetry you need to prefix your commands with poetry run
.
poetry run which python
/Users/samedwardes/Library/Caches/pypoetry/virtualenvs/my-app-SAojqYOg-py3.9/bin/python
Note the difference above. When we prefix our command with poetry run
we run our command inside the virtual environment. When we do not prefix the command with poetry run
the virtual environment is not used.
Pip has no special commands for running code. However, it is important to remember to make sure you have activated your virtual environment before running code.
python hello-world.py
FAQ
How do I publish to RStudio Connect?
Some tools like RStudio Connect require a requirements.txt file and do not know how to use files like pyproject.toml or poetry.lock. Luckily Poetry includes commands to programmatically create a requirements.txt that can be consumed by other tools.
poetry export --without-hashes --output requirements.txt
How do you collaborate with others when using poetry?
So far you have created a new project, and used poetry to document your dependencies. Your app is getting a lot of traction and you want to implement some new features. To help with the backlog you will need to on-board a new colleague.
How can you ensure that you and your colleague are using identical environments?
There are two key things your colleague will need:
- poetry.lock
- pyproject.toml
With these two files anyone will be able to reproduce your environment.
Both poetry.lock and pyrpoject.toml should be checked into version control (e.g. GitHub).
When your colleague is ready to start working on the project here is what they will need to do:
git clone <REPO>
cd <REPO>
poetry install
That is it 🎉! Your colleague will now be able to run the code using the poetry run
command. They can also make changes to the environment with poetry add
and poetry remove
!
How do I specify the source of my packages?
By default poetry is configured to use the PyPI repository (https://pypi.org). However, poetry does support the use of alternate repositories as well. Lets add RStudio Package Manager as an alternate. To do this you need to update pyproject.toml by hand:
[[tool.poetry.source]]
name = "rspm"
url = "https://colorado.rstudio.com/rspm/pypi/latest/simple"
Now when you run poetry add
or poetry install
poetry will check both Rstudio Package Manager and PyPi.
Any custom repository will have precedence over PyPI. If you still want PyPI to be your primary source for your packages you can declare custom repositories as secondary.
[[tool.poetry.source]]
name = "rspm"
url = "https://colorado.rstudio.com/rspm/pypi/latest/simple"
secondary = true
If you want to disable PyPi so that only RStudio Package Manager is used you can use the default
keyword.
[[tool.poetry.source]]
name = "rspm"
url = "https://colorado.rstudio.com/rspm/pypi/latest/simple"
default = true
How do I update dependencies?
Overtime you may want to update your dependencies. For example one day pandas version 2.0 may be released and you will want to update to the latest and greatest.
To update pandas only run:
poetry update pandas
pip install --upgrade pandas
To update all dependencies in your project run:
poetry update
Pip has not out of the box way of doing this 😢
How do I specify which version of a package I want to use?
You can specify a specific package version in poetry add
by using the ==
operator. For example:
poetry add urllib3==1.26.0
Read more about constraining package versions here: https://python-poetry.org/docs/cli/#add.
How do I switch the version of Python I want to use?
By default poetry will create a virtual environment using your current python environment. You can change which version of python poetry is using with the poetry env use
command.
Here is an example of how I would use Python version 3.10.1:
poetry env use ~/.pyenv/versions/3.10.1/bin/python
We can validate that this worked by checking the python version:
poetry run python --version
Python 3.10.1
I can always change my Python version later by running the command again. For example lets downgrade to Python 3.9.10:
poetry env use ~/.pyenv/versions/3.9.10/bin/python
poetry run python --version
Python 3.9.10
What should I check into version control?
There are two files you should check into version control:
- poetry.lock
- pyproject.toml
How do I create a requirements.txt file?
Use the poetry export
command:
poetry export --without-hashes --output requirements.txt