The Python Packaging Hell: Files Everywhere (4 / 7)
Python packaging can sometimes be a nightmare. To convince yourself of that, you just need to spend a few minutes drowning in the myriad of usable (and used!) files for building or installing a package.
💕💕💕
This article is part of a series of tearful articles about Python packaging:
- The Can of Worms
- The Roots of Evil
- Delusions of Formats
- Files Everywhere
- The Toolbox
- The Expression of Needs
- The Minimal Solution
Before starting, we would like to send a lot of love to the members of the PyPA team. We complain a lot in this series, but we have a lot of respect for the Sisyphean work already done.
That being said, let the whining begin (again) 😭.
💕💕💕
But Why?
We can’t say that there is no manual for creating Python packages. The problem isn’t scarcity, but abundance. You’ll find manuals everywhere, more or less old, more or less practical, more or less useful… The hardest part isn’t finding one, it’s reading all of them and picking up bits of information in each one, until you’ve made up your own mind.
Maybe you expected to find a good summary of what exists in this article, but here is the sad truth: it’s just one more source you can refer to, if you ever find something interesting in it.
That being said, it’s not that bad…
What is the connection between this introduction and files? Well, it’s quite simple. We can’t say that there is no configuration file for creating Python packages. The problem isn’t scarcity, but abundance. You’ll find files everywhere, more or less old, more or less practical, more or less useful… The hardest part isn’t finding one, it’s reading all of them and picking up bits of information in each one, until you’ve made up your own mind.
(It’s OK, you got the connection, right?)
We won’t repeat the good old "you have to understand the people who created Python: Python is old, you can’t change everything overnight". It’s partly true for this hell of files, but also partly false.
The official example proposed today by the PyPA contains 4 files used to create packages (setup.py, setup.cfg, MANIFEST.in and pyproject.toml). While we can understand the wish to cover as many cases as possible, we can also criticize the impression of utter chaos it gives to someone who’d like to learn.
(Reminder: a minimal Rust project contains a Cargo.toml file to store metadata and a src/main.rs file to store code. Moreover, these two files are automatically created for you by the cargo new command.)
Granted, it’s hard to anticipate every need of a configuration file from the start. But on the other hand, it’s debatable that we have to live with this sad legacy. Unlike in other areas, nothing would prevent us from defining a new standard configuration file, and nothing would prevent this standard from generating packages identical to existing ones. We could leave the past behind us, with its old files and its old tools, and use a single file in every case. Package creators would have to learn the new rules, of course, but nothing would change for end users, nor for the tools they use.
That would be nice, don’t you think? It’s time for the good news: it’s already happening right now. No joke.
Now that you’re dying to know the solution (yes, it’s totally sneaky and fully intentional), we can inflict on you the whole thought process that led to the current situation. The journey matters more than the destination, doesn’t it?
A Rather Short List
There is no need to grumble: as usual, we won’t talk about everything that has ever existed to create or install packages. Don’t expect an exhaustive list, just a few emblematic files allowing us to understand where we come from.
setup.py
This file was the first one introduced to handle package creation; it’s also the most famous and the most widely used today, despite its advanced age (at least 20 years, which doesn’t make us feel any younger).
The idea behind setup.py is quite simple: to set up the whole configuration needed to create and install packages, we use a Python script defining a set of metadata (the name of the package, the list of files to include, etc.) and various commands (create a source package, create a binary package, install, etc.). To do that, Python offers a module called distutils, containing everything needed to describe this metadata and these commands. Just import it in setup.py, call the right functions, and voilà.
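As an illustration, here is a minimal sketch of such a setup.py, kept in a string so it can be run under control. Instead of building anything, it is executed against a hypothetical stand-in for distutils.core whose setup() merely records its arguments, which shows that the metadata is nothing more than keyword arguments passed to a function. The package name, version and stub are made up for the example (and the real distutils was removed from the standard library in Python 3.12).

```python
import sys
import types

# A minimal, distutils-era setup.py (name and version are made up).
SETUP_PY = '''
from distutils.core import setup

setup(
    name="example-package",
    version="0.1.0",
    description="An example package",
    packages=["example_package"],
)
'''


def read_setup_metadata(source):
    """Run setup.py source against a hypothetical stand-in for distutils.core.

    The stand-in's setup() only records its keyword arguments, so nothing is
    built or installed. The real modules are restored afterwards.
    """
    captured = {}
    core = types.ModuleType("distutils.core")
    core.setup = lambda **kwargs: captured.update(kwargs)
    stubs = {"distutils": types.ModuleType("distutils"), "distutils.core": core}
    saved = {name: sys.modules.get(name) for name in stubs}
    sys.modules.update(stubs)
    try:
        exec(compile(source, "setup.py", "exec"), {"__name__": "__main__"})
    finally:
        # Put back whatever was there before (or nothing at all).
        for name, module in saved.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
    return captured


metadata = read_setup_metadata(SETUP_PY)
print(metadata["name"], metadata["version"])  # example-package 0.1.0
```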
But, and it’s not the first time we run into this problem with tools managing Python packages, distutils is rather limited and its features aren’t strictly defined. Painfully, the code became the de facto reference for what can be done, and the (legitimate) fear of breaking everything quickly prevented developers from adding features, or from fixing bugs that some users might have mistaken for features.
Limited by distutils, setup.py could have been replaced by another solution. But a quick fix was found instead: setuptools.
setuptools is a module that uses distutils internally but offers additional features, including more advanced handling of included files, the ability to create Windows programs, and above all… the ability to declare dependencies.
We’ll look at libraries and tools in detail in the next article, but it’s important to understand that setuptools was about to open a can of worms without realizing it.
As the module lives outside of Python itself, it is much less constrained by the extreme caution that paralyzed its predecessor. New features are added in response to users’ needs, in a joyful disorganization that at least had the merit of enabling a chaotic but widespread distribution of packages.
The library comes with an executable script, easy_install, that installs a package and its dependencies. It also comes with the "egg" package format that we discussed last time.
Because of the anarchic development of setuptools, it has been impossible to properly specify the options and best practices of package creation. setup.py has the defects of its qualities: being written in Python, it lets you use the full power of the language for something that was initially supposed to be a few lines of metadata and installation scripts. Everything that could simply be declarative may become dynamic when executed. Extensions appear, relying on setuptools or not, offering a galaxy of possibilities. Scripts grow bigger and get copied from project to project without being understood. Random snippets of code working around malfunctions in various versions of Python, distutils or setuptools end up in every setup.py on Earth.
And in the end, we get that. Of course, this project needs a lot of configuration and it would be hard to do all this with less code. Of course, with some time and hard work, the whole file, which is nicely written at that, is quite easy to understand.
The problem with setup.py isn’t its potential complexity, which can be useful in some cases. The real problem is that, for a long time, there was no simple alternative for creating simple, pure-Python packages. The only way was to write code, for things that could have been entirely declarative. And who hasn’t written code, a lot of horrible code, even to do simple things? With this pile of horrible code in so many projects, setuptools has had to include workarounds for workarounds written to bypass issues that have long since been fixed. setuptools has had to copy and bundle various functions from various Python versions (bugs included, of course) to stay perfectly backward compatible.
TL;DR: setuptools has become a hellish monster that has infected the setup.py files of a large majority of projects.
setup.cfg
Naturally, the idea of a declarative format for package creation eventually came along, and a solution was integrated into setuptools: setup.cfg.
This INI file is nothing more than another way of writing most of the options that setuptools offers in Python. So it comes with the same drawbacks: same bugs, same poorly documented options, same inconsistencies.
Moreover, this file doesn’t replace setup.py, it extends it. We have to keep the script, even when it’s almost empty! If some data are present in both files, those from setup.cfg are kept.
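Here is a sketch of the resulting pair: a declarative setup.cfg holding the metadata, next to a setup.py reduced to a bare setup() call. Since setup.cfg is plain INI, the standard configparser module can read it for illustration; setuptools uses its own logic to turn these strings into setup() arguments (find: is one of its specific directives). The package name and dependencies are made up.

```python
import configparser

# A declarative setup.cfg (package name and dependencies are made up).
SETUP_CFG = """
[metadata]
name = example-package
version = 0.1.0

[options]
packages = find:
install_requires =
    requests>=2.0
    click
"""

# The setup.py that must still sit next to it, even though it says nothing.
SETUP_PY = """
from setuptools import setup

setup()
"""

config = configparser.ConfigParser()
config.read_string(SETUP_CFG)

# Multi-line INI values come back as one string; split them into a list.
install_requires = [
    line.strip()
    for line in config["options"]["install_requires"].splitlines()
    if line.strip()
]
print(config["metadata"]["name"], install_requires)
```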
Why do we need to keep the setup.py file? Simply because setuptools doesn’t provide a standalone command to run the commands built into the script. To generate a source package, we use python setup.py sdist, executing the script directly.
It looks like a detail, but it’s actually a major issue. Who would want to use a static format when they can pile up spaghetti code in a script they have to keep anyway? How can we explain to people discovering Python that they have to write a Python file and an INI file, when technically we can do without the INI file? Yes, you got it: we can’t resist the call of the code.
That explains why setup.cfg is hardly used today. Tied to the two huge, ubiquitous burdens that are setuptools and setup.py, it only brings a small dose of simplicity through its declarative nature. As long as it carries around such a heavy, sclerotic history, it will remain a second choice, a clumsy attempt to fix a real problem.
requirements.txt
Here is a file you have certainly already come across and used. Clumsily praised by second-rate tutorials, acclaimed for its simplicity and its power, used by many famous projects, requirements.txt is the star of dependency installation.
But well, let’s just go ahead and say it: it has nothing to do with package creation.
requirements.txt is a simple list of packages to install, with the possibility to pin versions and to specify sources, branches and installation options.
It’s often used with pip, only for installation. We can see it as a convenient way to list dependencies, in a format that we could write directly on the command line, but that our laziness and our taste for line breaks push us to confine to a file.
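To show how close this format is to the command line, here is a sketch that turns a requirements.txt into the equivalent pip install arguments. It is not pip’s parser: it only handles comments, requirements and simple options, and ignores line continuations, -r includes and per-requirement options. Package names, versions and the example.com Git URL are hypothetical.

```python
import re
import shlex

# A small requirements.txt (names, versions and the Git URL are examples).
REQUIREMENTS_TXT = """\
# Pinned version
requests==2.31.0
# Version range
click>=8,<9
# A branch of a Git repository, as a PEP 508 URL requirement
mylib @ git+https://example.com/me/mylib.git@main
# An installation option
--index-url https://pypi.org/simple
"""

# Same rule as pip: '#' at line start, or preceded by whitespace, starts a comment.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def to_pip_argv(text):
    """Turn requirements.txt content into the equivalent 'pip install' arguments."""
    argv = ["pip", "install"]
    for raw_line in text.splitlines():
        line = COMMENT_RE.sub("", raw_line).strip()
        if not line:
            continue  # blank or comment-only line
        if line.startswith("-"):
            argv.extend(shlex.split(line))  # an option, possibly with a value
        else:
            argv.append(line)  # one requirement is one argument
    return argv


argv = to_pip_argv(REQUIREMENTS_TXT)
print(shlex.join(argv))
```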
That’s convenient, especially for everything we’d like to share in a form other than a package. Off the top of our heads: everything but libraries. A little unpretentious script? A requirements.txt file. A web application? A requirements.txt file. A library? Well, OK, let’s write some requirements.txt files for the documentation and the tests.
Yes, we can have a setup.py file, a setup.cfg file and a requirements.txt file in the same project, along with all their friends MANIFEST.in, tox.ini, pyproject.toml, pytest.ini, and so on. Each package maintainer packages in their own way, blithely copying things that seem to work from friends’ packages. We’ll always find a specific case that’s only handled by one of these files, and simplicity will be sacrificed on the altar of sacrosanct features.
MANIFEST.in
Want a really special feature? Including assets in a source package is a perfect example of a painful puzzle.
Binary packages are usually distributed so that users can easily use the code. Packages like wheels are ready-to-use archives, and installing them takes nothing more than unpacking an archive into the right folder. These packages can contain only the bare minimum: code. Everything else (documentation, tests, super-cute-little-nice-files describing changes…) doesn’t belong in them.
Source packages are different. They are useful for people who want to look at the code, create packages for Linux distributions, test patches, install libraries, run tests… So we try to include everything we can in the package, almost everything from the repository, except files needed for continuous integration, version control, and other little things cluttering our oh-so-cute project.
To include these files in the source package, especially when they sit at the root of the project and not in the same folder as the code, we use MANIFEST.in. This umpteenth file comes, of course, with its own syntax and its own commands.
And don’t worry: it lets you do both things that are already possible with the other files and things that aren’t possible with any of them.
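To give an idea of this syntax, here is a toy interpreter for four MANIFEST.in commands (include, recursive-include, prune, global-exclude), applied to a made-up list of project files. It is a simplified sketch, not the implementation setuptools uses: pattern matching is approximate, and the files setuptools adds to a source package on its own (the package code, for instance) are left out.

```python
import fnmatch

# A typical MANIFEST.in (file and directory names are examples).
MANIFEST_IN = """\
include README.rst CHANGES.rst LICENSE
recursive-include docs *.rst *.png
recursive-include tests *
prune docs/_build
global-exclude *.pyc
"""


def _basename(path):
    return path.rsplit("/", 1)[-1]


def apply_manifest(template, files):
    """Return the files selected by a *simplified* MANIFEST.in interpreter.

    Only four commands are handled, and 'include' patterns are matched with
    fnmatch, whose '*' also matches '/' (unlike the real implementation).
    """
    selected = set()
    for line in template.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        command, args = words[0], words[1:]
        if command == "include":
            selected |= {f for f in files if any(fnmatch.fnmatchcase(f, p) for p in args)}
        elif command == "recursive-include":
            prefix, patterns = args[0].rstrip("/") + "/", args[1:]
            selected |= {
                f for f in files
                if f.startswith(prefix)
                and any(fnmatch.fnmatchcase(_basename(f), p) for p in patterns)
            }
        elif command == "prune":
            prefix = args[0].rstrip("/") + "/"
            selected = {f for f in selected if not f.startswith(prefix)}
        elif command == "global-exclude":
            selected = {
                f for f in selected
                if not any(fnmatch.fnmatchcase(_basename(f), p) for p in args)
            }
        else:
            raise ValueError(f"command not supported by this sketch: {command}")
    return sorted(selected)


project_files = [
    "README.rst", "CHANGES.rst", "LICENSE", ".gitlab-ci.yml", ".gitignore",
    "docs/index.rst", "docs/logo.png", "docs/_build/index.html", "docs/_build/index.rst",
    "tests/test_core.py", "tests/__pycache__/test_core.pyc",
]
result = apply_manifest(MANIFEST_IN, project_files)
print(result)
```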
pyproject.toml
Here we are.
At first sight, pyproject.toml seems to be a direct clone of setup.cfg, with a slightly different format and a debatable name. Yet another file, yet another format: what a crazy idea!
In reality, things are a little more complex. PEP 518, which introduced this file, is titled "Specifying minimum build system requirements for Python projects". It’s not titled "And here’s one more stupid format to define package metadata", and there are surprisingly good reasons for that.
Among the issues caused by setuptools, there is one we haven’t talked about yet: setup.py contains the dependencies of a package, including the dependencies needed to build the package. How can we know the dependencies without executing the file? And how can we execute the file without knowing its dependencies? This chicken-and-egg problem affects setuptools too, but as everyone uses it to create packages, and as it has long been installed alongside pip, there’s a good chance it’s already installed with Python. However, as soon as we want to use another tool, like a setuptools extension, things get much harder.
The idea of pyproject.toml isn’t to propose a new metadata format. The idea is to list, in a simple text file, the dependencies needed to build a package. Think about this carefully. A little more.
There, now you understand. We’ll be able to get rid of setuptools and distutils, at least for building packages. For real.
Of course, in simple cases, we can still use them. pyproject.toml can store all the metadata we used to store before. It can also store more complex information, like dependencies and supported Python versions, a bit like in setup.cfg, a bit like before.
But nothing prevents packagers from using another tool, which can define its own configuration options, independently of setuptools.
Even better: as the file is specified and well organized, other tools (black, pylint, coverage…) can use it too, putting an end to the atrocious confetti of configuration files.
One last thing has to be settled: defining the entry point of the tool we’re going to use to create the package. That’s the job of PEP 517, which finally frees us from setuptools, setup.py and all their friends.
But… does it really work?
Yes. We just have to choose our tools. And we’ll choose them in the next article…