The Python Packaging Hell: Files Everywhere (4 / 7)
Python packaging can sometimes be a nightmare. To convince yourself of that, you just need to spend a few minutes drowning in the myriad of usable (and used!) files needed to build or install a package.
💕💕💕
This article is part of a series of tearful articles about Python packaging:
- The Can of Worms
- The Roots of Evil
- Delusions of Formats
- Files Everywhere
- The Toolbox
- The Expression of Needs
- The Minimal Solution
Before starting, we would like to send a lot of love to the members of the PyPA team. We complain a lot in this series, but we have a lot of respect for the Sisyphean work already done.
That being said, let’s start (again) the whining 😭.
💕💕💕
But Why?
We can’t say that there is no manual for creating Python packages. The issue isn’t scarcity, but abundance. You’ll find manuals everywhere, more or less old, more or less practical, more or less useful… The hardest part isn’t finding one, it’s reading all of them and picking up information from each one until you’ve formed your own opinion.
Maybe you thought you’d find a good summary of what exists in the lines of this article, but here is the sad truth: what we have here is only one more source you can refer to if you ever find something interesting in it.
That being said, it’s not that bad…
What is the connection between this introduction and files? Well, it’s quite simple. We can’t say that there is no configuration file for creating Python packages. The issue isn’t scarcity, but abundance. You’ll find files everywhere, more or less old, more or less practical, more or less useful… The hardest part isn’t finding one, it’s reading all of them and picking up information from each one until you’ve formed your own opinion.
(It’s OK, you got the connection, right?)
 
        
We won’t repeat the good old "you have to understand the people creating Python, because Python is old, you can’t change everything suddenly". It’s a bit true for this hell of files, but also a bit false. The official example proposed today by the PyPA contains 4 files used to create packages (setup.py, setup.cfg, MANIFEST.in and pyproject.toml). While we can understand the desire to cover as many solutions as possible, we can also condemn the impression of utter chaos given to someone who’d like to learn.
      
        (Reminder: a minimal Rust project contains a Cargo.toml file
        to store metadata and a src/main.rs file to store code.
        Moreover, these two files are automatically created for you with the
        cargo new command.)
      
For sure, it’s hard to think about all the needs of a configuration file at the beginning. But on the other hand, it’s questionable to say we have to live with this sad legacy. Unlike other subjects, nothing would prevent us from defining a new standard of configuration file. And nothing would prevent this standard from being able to generate packages identical to existing ones. We’d be able to leave the past behind us, with its old files and its old tools, to only use one file in any case. The package creator would have to learn how to use these new rules, of course, but nothing would change for the final user, nor for the tools they’d use.
That would be nice, don’t you think? It’s time for the good news: it’s already happening right now. No joke.
Now that you really want to know the solution (yes, it’s totally sneaky and totally unapologetic), we can inflict on you the full thought process that led to the current situation. The journey matters more than the destination, doesn’t it?
A Rather Short List
There is no need to grumble: as usual, we won’t talk about everything that has ever existed to create or install packages. Don’t expect an exhaustive list, just a few emblematic files allowing us to understand where we come from.
setup.py
This file was the first one introduced to handle package creation. It’s also the most famous and the most used today, despite its advanced age (at least 20 years, which doesn’t make us feel any younger).
        The idea behind setup.py is quite simple: to set up the
        whole configuration needed to create and install packages, we use a
        Python script defining a set of metadata (the name of the package, the
        list of files to include, etc.) and various commands (create a source
        package, create a binary package, install, etc.). To do that, Python
        offers a module called
        distutils,
        containing everything needed to describe these metadata and commands.
        It’s enough to import it in setup.py, to call the right
        functions, and voilà.
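As a rough illustration of the idea, here is a minimal sketch of such a script. The project name, the version and the package list are hypothetical. The sketch imports setup from setuptools rather than from distutils, because distutils was removed from the standard library in Python 3.12, and the check on sys.argv is only an assumption made here so that the file can be imported without triggering a build.

```python
# setup.py: a minimal sketch; every value below is hypothetical.
import sys

# Metadata describing the package: its name, its version and its contents.
METADATA = {
    "name": "example-package",
    "version": "0.1.0",
    "description": "A hypothetical pure Python package",
    "packages": ["example_package"],
}

# Only run when a command such as "sdist" is given on the command line,
# so that importing this sketch doesn't trigger anything.
if __name__ == "__main__" and len(sys.argv) > 1:
    # distutils.core.setup was the historical import; setuptools provides
    # a compatible setup() function.
    from setuptools import setup

    # setup() reads the command to execute (sdist, install...) from sys.argv.
    setup(**METADATA)
```

With such a file at the root of the project, python setup.py sdist would create a source package.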
      
But, and it’s not the first time we’ve met this problem with tools managing Python packages, distutils is rather limited and its features aren’t strictly specified. Painfully, the code became the reference for what we can do, and the (legitimate) fear of breaking everything quickly prevented developers from adding features, or from fixing bugs that some users would have mistaken for features.
      
 
        
Limited by distutils, setup.py could have been replaced by another solution. But we found a quick fix instead: setuptools.
      
        setuptools is a module using distutils internally,
        but offering additional features, including a more advanced management
        of included files, the possibility to create Windows programs, and most
        of all… the possibility to define dependencies.
      
We’ll see libraries and tools in detail in the next article, but it’s important to understand that setuptools opened, without realizing it, a can of worms. As the module is external to Python, it’s much less hampered by the caution that held back its predecessor. New features were added in response to the needs of users, in a happy disorganization that at least had the merit of allowing a chaotic but widespread distribution of packages. The library comes with an executable script, easy_install, that can install a package and its dependencies. It also comes with the "egg" package format that we discussed last time.
      
Because of the anarchic development of setuptools, it has been impossible to correctly specify the options and best practices of package creation. setup.py has the cons of its pros: being written in Python, it allows using the full power of the language for something that was initially supposed to be a few lines of metadata and installation scripts. Everything that could simply be descriptive potentially becomes dynamic when executed. Extensions are offered, dependent on setuptools or not, opening a galaxy of possibilities. Scripts get bigger and are copied from project to project without being understood. Random pieces of code working around malfunctions in different versions of Python, distutils or setuptools are included in every setup.py on Earth.
      
And in the end, we get that. Of course, this project needs a lot of configuration and it would be difficult to do these things with less code. Of course, with some time and some hard work, it’s quite easy to understand the whole file, which is nicely written besides.
The issue with setup.py isn’t its potential complexity, which can be useful in some cases. The real issue is that, for a long time, there has been no simple alternative for creating simple, pure Python packages. The only way was to write code, for things that could have been totally declarative. And who hasn’t tried to write code, a lot of horrible code, even to do simple things? With this pile of horrible code in many projects, setuptools has had to include workarounds that bypass workarounds set up to bypass issues that have since been fixed. setuptools has had to copy and include different functions from different Python versions (including their own bugs, of course) to be perfectly backward compatible. TL;DR: setuptools has become a hellish monster which has infected the setup.py of a significant majority of projects.
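To picture how something descriptive becomes dynamic once it lives in a script, here is a small sketch of the kind of logic that ends up in many setup.py files. The function name compute_requirements and the two example dependency names are hypothetical; importlib-metadata is the real backport of a module added to the standard library in Python 3.8.

```python
import sys


def compute_requirements(version_info=sys.version_info, platform=sys.platform):
    """Return the dependency list for the given interpreter and platform."""
    requirements = ["example-dependency>=1.0"]

    # Backport only needed on old interpreters: the list now depends on
    # who executes the script, not only on what the file says.
    if version_info < (3, 8):
        requirements.append("importlib-metadata")

    # Platform-specific extra, decided when the script is executed.
    if platform == "win32":
        requirements.append("example-windows-helper")

    return requirements
```

In a real script, the result would typically be passed to setup() through the install_requires argument of setuptools. Nothing can tell what the dependencies are without running the code, on the right interpreter and on the right platform.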
      
setup.cfg
      
Obviously, the idea of a declarative format for package creation finally came up, and a solution has been integrated into setuptools: setup.cfg.
      
        This INI file is nothing more than a different presentation of most of
        the options proposed in Python by setuptools. So we’ll find
        the same disadvantages: same bugs, same poorly documented options, same
        inconsistencies.
      
Moreover, this file isn’t a replacement for setup.py, but an extension of it. We have to keep the script, even when it’s almost empty! If some data is present in both files, the values from setup.cfg are kept.
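To give an idea of what this declarative presentation looks like, here is a sketch reading a hypothetical setup.cfg with the configparser module of the standard library. The section and key names (metadata, options, install_requires) are those used by the declarative configuration of setuptools; the project and dependency names are invented for the example.

```python
import configparser

# A hypothetical setup.cfg, using section and key names from the
# declarative configuration of setuptools.
SETUP_CFG = """
[metadata]
name = example-package
version = 0.1.0

[options]
packages = find:
install_requires =
    example-dependency>=1.0
    another-dependency
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

name = parser["metadata"]["name"]
# A multi-line value comes back as a single string: split it into a list.
install_requires = parser["options"]["install_requires"].split()
```

The important point is that the metadata is plain data here: any tool can read it without executing a single line of the project’s code.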
      
        Why do we need to keep the setup.py file? Just because
        setuptools doesn’t provide an external command to execute
        commands integrated in the script. To generate a source package, we use
        python setup.py sdist, directly executing the script.
      
It looks like a detail, but it’s actually a major issue. Who would want to use a static format when we can make a big pile of spaghetti code in a script that we still have to keep anyway? How can we explain to people discovering Python that they have to write a Python file and an INI file, when we can technically do without the INI file? Yes, you got it: we can’t fight the call of the code.
That explains why setup.cfg isn’t really used today. Attached to the two huge ubiquitous burdens that are setuptools and setup.py, it just brings a small dose of simplicity through its declarative side. As long as it carries around a heavy and paralyzing history, it’ll remain a second choice, a clumsy attempt to fix a real issue.
      
requirements.txt
      
Here is a file you have certainly already met and used. Praised without finesse by second-rate tutorials, acclaimed for its simplicity and its power, used by a lot of famous projects, requirements.txt is the star of dependency installation.
      
But well, let’s just go ahead and say it: it has nothing to do with package creation.
requirements.txt is a simple list of packages to install, with the possibility to specify versions, sources, branches and installation options.
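Here is a sketch of what such a file can contain, with a deliberately naive reader that separates options from requirements. It is not how pip parses the file, only an approximation for illustration; the package names and URLs are hypothetical, and the function name classify is ours.

```python
# A hypothetical requirements.txt.
REQUIREMENTS_TXT = """\
# Pinned version
example-lib==1.2.3
# Version range
another-lib>=2.0,<3.0
# Git branch used as a source
third-lib @ git+https://example.com/third-lib.git@main
# Installation option
--index-url https://example.com/simple
"""


def classify(text):
    """Split requirements-style text into options and requirements."""
    options, requirements = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Blank line or comment
        if line.startswith("-"):
            options.append(line)  # Lines starting with a dash are options
        else:
            requirements.append(line)
    return options, requirements


options, requirements = classify(REQUIREMENTS_TXT)
```

Each remaining line is something we could have typed after pip install, which is exactly why the file says nothing about how to build a package.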
      
It’s often used with pip, just for the installation. We can see it as a convenient way to list dependencies, in a format that we could directly write in the command line, but that our laziness and our taste for line breaks push us to confine to a file.
      
        That’s convenient, in particular for everything we’d like to share in a
        different format than a package. At random: everything but libraries.
        A little unpretentious script? A requirements.txt file.
        A web application? A requirements.txt file.
        A library? Well, OK, let’s
        write some requirements.txt files for the documentation and the
        tests.
      
Yes, we can have a setup.py file, a setup.cfg file and a requirements.txt file in the same project. Along with all their friends: MANIFEST.in, tox.ini, pyproject.toml, pytest.ini, and so on. Each packager packages their own way, blithely copying things that seem to work from friends’ packages. We’ll always find a specific case that’s only handled by one of these files, and simplicity will be sacrificed on the altar of sacrosanct features.
      
MANIFEST.in
Do you want a very special feature? Including assets in a source package is a good example of what a painful puzzle looks like.
Binary packages are usually distributed so that users can easily use the code. Packages like wheels are ready-to-use archives, and installing them requires nothing more than decompressing an archive into the right folder. These packages can contain only the bare minimum: code. Everything else (documentation, tests, super-cute-little-nice-files describing changes…) has no place in them.
Source packages are different. These packages are useful for those who want to look at the code, create packages for Linux distributions, test patches, install libraries, launch tests… We thus try to include everything we can in the package, almost everything from the repository, except the files needed for continuous integration, versioning, and other small stuff polluting our oh-so-cute project.
To include these files in the source package, in particular when they are at the root of the project and not in the same folder as the code, we use MANIFEST.in. This umpteenth file comes, naturally, with its own syntax and its own commands. And don’t worry: it allows you to do, at the same time, things that are already possible with the other files, and things that are not possible with the other files.
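To give a taste of this syntax, here is a sketch that emulates three real MANIFEST.in commands (include, recursive-include and prune) on a hypothetical list of files. It is a simplified approximation and not the actual implementation: the real commands are more numerous, and their pattern matching is stricter than the fnmatch function used here. The function name select_files and the file list are ours.

```python
from fnmatch import fnmatch

# A hypothetical MANIFEST.in.
MANIFEST_IN = """\
include README.rst CHANGELOG.rst
recursive-include docs *.rst
prune docs/_build
"""

# Hypothetical content of the repository.
FILES = [
    "README.rst",
    "CHANGELOG.rst",
    ".gitignore",
    "docs/index.rst",
    "docs/api/module.rst",
    "docs/_build/index.rst",
    "docs/logo.png",
]


def select_files(manifest, files):
    """Apply a tiny subset of MANIFEST.in commands, in order."""
    selected = []
    for line in manifest.splitlines():
        if not line.strip():
            continue
        command, *args = line.split()
        if command == "include":
            # Add files whose path matches one of the patterns.
            matches = [f for f in files if any(fnmatch(f, p) for p in args)]
        elif command == "recursive-include":
            # Add files under a folder whose name matches one of the patterns.
            directory, *patterns = args
            matches = [
                f for f in files
                if f.startswith(directory + "/")
                and any(fnmatch(f.rsplit("/", 1)[-1], p) for p in patterns)
            ]
        elif command == "prune":
            # Remove everything already selected under a folder.
            selected = [f for f in selected if not f.startswith(args[0] + "/")]
            continue
        else:
            raise ValueError(f"unsupported command: {command}")
        selected += [f for f in matches if f not in selected]
    return selected


selected = select_files(MANIFEST_IN, FILES)
```

Commands are applied in order, so the generated documentation in docs/_build is first included by recursive-include, then removed by prune.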
      
pyproject.toml
      Here we are.
At first sight, pyproject.toml seems to be a direct clone of setup.cfg, with a slightly different format and a debatable name. Yet another file, yet another format: what a crazy idea!
      
In reality, things are a little bit more complex. PEP 518, which introduced this file, is called "Specifying minimum build system requirements for Python projects". It’s not called "And here’s one more stupid format to define package metadata", and there are surprisingly good reasons for that.
In the list of issues caused by setuptools, there is one we haven’t talked about yet: setup.py contains the dependencies of a package, including the dependencies used to build the package. How can we know the dependencies without executing the file? And how can we execute the file without knowing its dependencies? This chicken-and-egg issue is problematic for setuptools, but as everyone uses it to create packages, and as it’s a dependency of pip, there’s a good chance it’s already installed with Python. However, if we want to use another tool, like a setuptools extension, things immediately become less easy.
      
        The idea of pyproject.toml isn’t to propose a new metadata
        format. The idea is to include, in a simple text file, the dependencies
        needed to build a package. Think about this carefully. A little more.
      
        Well, you understand now. We’ll be able to get rid of setuptools
        and distutils, at least to build packages. For real.
      
Of course, in simple cases, we can still use them. pyproject.toml allows us to store all the metadata we used to store before. It also allows us to store more complex information, like dependencies and supported Python versions, a bit like in setup.cfg, a bit like before.
      
But nothing prevents packagers from using another tool, which can define its own configuration options, independent of setuptools. Even better: as the file is specified and well organized, it gives other tools (black, pylint, coverage…) the possibility to use this file too, putting an end to the atrocious confetti of configuration files.
      
One last thing has to be fixed: defining the entry point of the tool we’re going to use to create the package. That’s the job of PEP 517, which allows us to definitively break free from setuptools, setup.py and all their friends.
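To make this entry point less abstract, here is a deliberately incomplete sketch of a build tool written from scratch. PEP 517 defines build_sdist as one of the functions ("hooks") such a tool has to provide, with the signature used below; the other mandatory hook, build_wheel, is left out to keep things short. The module name example_backend, the package name and the version are hypothetical, and a real tool would of course include the code of the project in the archive.

```python
# example_backend.py: a hypothetical, deliberately incomplete build backend.
import io
import os
import tarfile

NAME = "example_package"
VERSION = "0.1.0"


def build_sdist(sdist_directory, config_settings=None):
    """PEP 517 hook: create a .tar.gz source package, return its file name."""
    base = f"{NAME}-{VERSION}"
    filename = f"{base}.tar.gz"
    pkg_info = f"Metadata-Version: 2.1\nName: {NAME}\nVersion: {VERSION}\n"
    data = pkg_info.encode("utf-8")

    with tarfile.open(os.path.join(sdist_directory, filename), "w:gz") as archive:
        # Files of a source package live in a "{name}-{version}" folder.
        info = tarfile.TarInfo(f"{base}/PKG-INFO")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
        # Hooks are run from the source tree, so relative paths work.
        if os.path.exists("pyproject.toml"):
            archive.add("pyproject.toml", arcname=f"{base}/pyproject.toml")

    return filename
```

A project would point to such a module through the build-backend key of the build-system table in pyproject.toml, for example build-backend = "example_backend", along with backend-path = ["."] when the module lives in the project itself. No setup.py is involved at any point.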
      
But… Does it work for real?
Yes. We just have to choose the tools we can use. And we’ll choose them in the next article…