Packaging Python for scale (part three)

Recap

We ended the second part of this series with a finished product. The output was a program called inspyration. It prints out a clever phrase when you run it, like a message of the day (MOTD).

You can import it as a module from your own programs too. It comes with unit tests, which means we don't have to worry about bad grammar. The whole thing is wrapped up professionally in a package fit for PyPI.

The calm before the storm

And there we leave it for a while. From time to time we consult inspyration version 0.02. It's simple, but it makes us smile:

$ ./py3/bin/inspyration
Nothing succeeds like success.

That is, until one day we get an unexpected email from a developer in Silicon Valley.

His name is Anton, and he's the CTO of a brash startup called WebMirth. He says they've got $2 million of business angel money. They want to totally disrupt the MOTD space. Crowd-sourcing internet memes with Kalman filters. Mobile event streams. Social clickthrough.

They are super-excited about open source. They want to know if we can help them devops inspyration onto their cloud-based app clusters...

We hesitate for a second.

Don't Panic

We are at a decision point. Our little project has leapt from total obscurity to being the keystone of someone's business. We have a choice to make.

There are two sensible options, and there is one which is not so sensible.

Cat Herding

Of course you want to know what the stupid move would be. I'll tell you, then: it's GitHub.

The absolute worst thing in my opinion is to throw your work up on a public repository with no care for how the code evolves. Within moments, ten people will clone your project and you will spend the rest of your weekend ignoring pull requests.

People you have never heard of will leave their name in comments explaining why their patch is a stylistic mess but so very, very clever. Others will set themselves up as gatekeepers by becoming the Debian packager or the administrator of a Continuous Integration server.

The code will grow in ways you cannot even monitor, let alone control. It won't be long before the original purpose of your little inspyration will be lost forever.

Orderly Handover

An entirely sensible option, however, is to hand over control of the project to this WebMirth company. You've taken it as far as you wanted. Now it's they who have the money and the motivation to make it a success.

There's absolutely nothing wrong with this decision, only then it wouldn't make for a very interesting story.

This tutorial is about designing your Python project so that it scales. So for the purposes of the tutorial, we'll take the third option:

Benevolent Dictatorship

As the original author, we will continue to lead the inspyration project. We will make sensible architectural choices which give the WebMirth people what they need while maintaining our original vision. We will design for scale.

How can we achieve that in a Python package? There are two aspects we need to consider:

  • aggregation of functionality
  • discovery of features.

Design for Aggregation

We can't yet imagine what ideas WebMirth want implemented. But we suspect already we might not like all of them. How do we stop them taking over?

There needs to be a way of splitting up our project so that each piece is separate and optional. And yet when you put it all together, it works like a single Python package. Does that make sense?

Yes, it does make sense. Python has a feature called namespace packages for that very purpose.

So the inspyration project will get an extra layer; there will be an inspyration-this and an inspyration-that. Each package will deliver separate functionality. Each will define its dependencies, so that installing one of them can automatically pull in another if need be.
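
To see the aggregation idea at work without installing anything, here is a minimal sketch. It assumes Python 3.3 or later and relies on implicit namespace packages (PEP 420) rather than any setuptools machinery. The namespace name inspyration_demo and both helper functions are made up for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical namespace name, chosen so it cannot clash with a real install.
NAMESPACE = "inspyration_demo"


def make_portion(root, subpackage, version):
    """Write <root>/<NAMESPACE>/<subpackage>/__init__.py.

    No __init__.py is written at the namespace level itself.
    """
    pkg_dir = Path(root) / NAMESPACE / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text('__version__ = "{0}"\n'.format(version))


def load_versions():
    """Build two separate trees, put both on sys.path, import from each."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        make_portion(first, "common", "0.03")
        make_portion(second, "webmirth", "1.0a1")
        sys.path[:0] = [first, second]
        try:
            importlib.invalidate_caches()
            common = importlib.import_module(NAMESPACE + ".common")
            webmirth = importlib.import_module(NAMESPACE + ".webmirth")
            return common.__version__, webmirth.__version__
        finally:
            sys.path.remove(first)
            sys.path.remove(second)
```

The two subpackages live in unrelated directories, yet both are imported through the same top-level name. That is the behaviour we want from the separate inspyration distributions.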

The subpackage

We need to supply WebMirth with the seed for their own subpackage, which will integrate with ours.

Let's do it. Make a copy of your ppfsp2 project from last time, and call it inspyration-webmirth. Delete the files main.py and test_content.py.

Anton has sent us some code he wants to go into the package. It's a bit like the data we already provide, but in order for his marketing demo to work he wants it matched against his customer segments. Here's the listing of WebMirth's content.py:

#!/usr/bin/env python
# encoding: UTF-8

codingData = [
"Artificial Intelligence is no match for Natural Stupidity.",
"Namespaces are one honking great idea -- let's do more of those!",
"""Debugging is twice as hard as writing the code in the first place.
 Therefore, if you write the code as cleverly as possible, you are,
 by definition, not smart enough to debug it.""",
]

geekData = [
"""There are 10 types of people in the world; those who understand binary, and
those who don't.""",
"Just what do you think you're doing, Dave?",
"Always tell the truth, George; it's the easiest thing to remember.",
"I find your lack of faith disturbing.",
]

Look carefully at how the structure differs now. We are adding an extra directory below the inspyration level called webmirth:

inspyration-webmirth
├── inspyration
│   ├── __init__.py
│   └── webmirth
│       ├── __init__.py
│       └── content.py
├── MANIFEST.in
├── README.txt
└── setup.py

The top level __init__.py is necessary only for Python 3.2 and earlier; from Python 3.3 onwards a namespace package can be implicit (PEP 420). It defines inspyration as a namespace package, and contains this single line:

__import__("pkg_resources").declare_namespace(__name__)

The __init__.py at the next level down defines the WebMirth subpackage. Anton is under pressure from his investors and wants to tell them their app is nearly ready. So let's depart from the cautious versioning we apply to our own code and give them something they can talk about:

__version__ = "1.0a1"

There are some changes to setup.py too. We import the subpackage to access the version number:

import inspyration.webmirth

... and we re-name and re-version this distribution in the invocation of setup:

setup(
    name="inspyration-webmirth",
    version=inspyration.webmirth.__version__,

There are also new arguments which declare the namespace and specify the subpackage.

...
    namespace_packages=["inspyration"],
    packages=["inspyration", "inspyration.webmirth"],

Finally, we add an install dependency, referencing the package we intend ourselves to continue developing:

...
    install_requires=["inspyration-common>=0.03"],
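
Importing the package from setup.py, as we do above for the version number, assumes the package can be imported before it is installed. Should that ever stop being true, one alternative is to read the version string from the file without importing it. A minimal sketch follows; read_version is a hypothetical helper, not part of setuptools.

```python
import ast
from pathlib import Path


def read_version(init_path):
    """Return the string assigned to __version__ in a file, without importing it."""
    tree = ast.parse(Path(init_path).read_text())
    for node in tree.body:
        # Only look at top-level assignments such as: __version__ = "1.0a1"
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    raise ValueError("no __version__ assignment in {0}".format(init_path))
```

In setup.py this might be called as read_version("inspyration/webmirth/__init__.py"), assuming the directory layout shown earlier.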

The main package

There are similar changes to be made to our own half of the project. We change the name of the parent directory to inspyration-common. We add the top level __init__.py with the same cryptic one-liner we used above. The __init__.py at the next level contains our own (more modest) package version:

__version__ = "0.03"

In setup.py, we import our own package so that we can declare our new version.

import inspyration.common
setup(
    name="inspyration-common",
    version=inspyration.common.__version__,

And as before, we declare the namespace and specify the subpackage.

...
    namespace_packages=["inspyration"],
    packages=["inspyration", "inspyration.common"],

Design for Discovery

We have split the project in two. How does our main program detect the content supplied by the WebMirth package? By default we want the MOTD to be selected from this source:

  • inspyration.common.content.data

And if inspyration-webmirth is installed alongside, the data should come from all three:

  • inspyration.common.content.data
  • inspyration.webmirth.content.codingData
  • inspyration.webmirth.content.geekData

In other words, we want to check at run-time what is available.
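
The behaviour we are after can be sketched in a few lines. The lists below are stand-ins holding one phrase each, and choose is a hypothetical helper rather than anything in the real package.

```python
import random
from itertools import chain

# Stand-ins for the three sequences named above, one phrase apiece.
common_data = ["Nothing succeeds like success."]
coding_data = ["Artificial Intelligence is no match for Natural Stupidity."]
geek_data = ["I find your lack of faith disturbing."]


def choose(sources, rng=random):
    """Pick one phrase from whichever sequences happen to be available."""
    pool = list(chain.from_iterable(sources))
    return rng.choice(pool)


# With only the common package installed there is a single source...
default_motd = choose([common_data])
# ...and with inspyration-webmirth alongside, all three feed the pool.
full_motd = choose([common_data, coding_data, geek_data])
```

The open question is how the list of sources gets built without the main program naming them in advance.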

Entry points help us here. We have seen them before. They are the way setuptools thinks of console scripts. But more generally they are a way for one Python package to advertise its interfaces to another.

Naming and versioning of interfaces is important. If you use entry points, you get versioning for free thanks to the Python versioning of your distribution.

All that's left is to think of a sensible name for your interface. Most people use a dotted notation to define a vendor.application.name hierarchy. So we decide on "thuswise.inspyration.data" as the name for our inter-package data sharing interface.

The subpackage

A Python package can advertise any number of objects against a named interface. Each entry point must have its own unique name.

In the setup.py for inspyration-webmirth, we add this entry point declaration to the arguments of the setup call:

entry_points={
    "thuswise.inspyration.data": [
        "coding = inspyration.webmirth.content:codingData",
        "geek = inspyration.webmirth.content:geekData"
    ],
}
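
Each declaration above has the form "name = module:attribute". As a rough illustration of what that string means, here is a sketch that splits one apart and fetches the object it points to. The functions parse_entry_point and resolve are hypothetical and only approximate what setuptools does on our behalf.

```python
import importlib


def parse_entry_point(declaration):
    """Split 'name = package.module:attribute' into its three parts."""
    name, _, target = declaration.partition("=")
    module_name, _, attribute = target.strip().partition(":")
    return name.strip(), module_name.strip(), attribute.strip()


def resolve(declaration):
    """Import the module half, then walk to the attribute half."""
    name, module_name, attribute = parse_entry_point(declaration)
    obj = importlib.import_module(module_name)
    # An empty attribute means the entry point is the module itself.
    for part in filter(None, attribute.split(".")):
        obj = getattr(obj, part)
    return name, obj


# A standard library object stands in for our data, so this runs anywhere.
label, value = resolve("pi = math:pi")
```

Once inspyration-webmirth is installed, resolving "geek = inspyration.webmirth.content:geekData" in the same way would hand back Anton's list.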

The main package

The common package now conforms to two interfaces:

  • The setuptools console script interface for a callable 'binary'
  • Our own data-sharing interface used by inspyration itself

So here's what the entry point declaration looks like in the setup.py for inspyration-common:

entry_points={
    "console_scripts": [
        "inspyration = inspyration.common.main:run"
    ],
    "thuswise.inspyration.data": [
        "common = inspyration.common.content:data",
    ],
}

How do we perform the run-time discovery of entry points? The pkg_resources module does the hard work for us. Make a new file called discovery.py in the directory inspyration-common/inspyration/common. Paste into it the code below.

#!/usr/bin/env python3
# encoding: UTF-8

import pkg_resources


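def all_data():
    """Yield every object advertised on our data-sharing interface."""
    for i in pkg_resources.iter_entry_points("thuswise.inspyration.data"):
        try:
            # require=False skips resolving the advertiser's own requirements
            ep = i.load(require=False)
        except Exception:
            # A broken plugin must not stop the others from loading
            continue
        else:
            yield ep

# Flatten the discovered sequences into one list of phrases
data = [item for seq in all_data() for item in seq]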

When this module is first imported by our main program, it will discover all entry points registered under the interface thuswise.inspyration.data and create an aggregate referenced by its own module-level variable.
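
The same discovery could also be written against the standard library. What follows is a sketch, not a drop-in replacement: it assumes Python 3.8 or later, where importlib.metadata is available, and the helper entry_points_for and the function aggregate are names invented here.

```python
from importlib import metadata

GROUP = "thuswise.inspyration.data"


def entry_points_for(group):
    """Return the entry points registered under one interface name."""
    found = metadata.entry_points()
    if hasattr(found, "select"):
        # Python 3.10 and later offer a select() method
        return list(found.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group
    return list(found.get(group, []))


def all_data(group=GROUP):
    """Yield each advertised object, skipping any that fail to load."""
    for ep in entry_points_for(group):
        try:
            obj = ep.load()
        except Exception:
            continue
        else:
            yield obj


def aggregate(group=GROUP):
    """Flatten every discovered sequence into a single list."""
    return [item for seq in all_data(group) for item in seq]
```

When nothing is registered under the interface, aggregate simply returns an empty list, so the caller can decide what a sensible fallback looks like.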

I hope you know well enough to demand a test of me when I make extravagant claims like that. So here it is. This code belongs in the module test_discovery.py:

#!/usr/bin/env python3
# encoding: UTF-8

import unittest

import inspyration.common.content
from inspyration.common.discovery import all_data


class DiscoveryTest(unittest.TestCase):

    def test_discovery(self):
        # Materialise the generator so a failure message shows what was found
        self.assertIn(inspyration.common.content.data,
                      list(all_data()))

And in main.py, all we need to do is change how the old-style package used to import its data by name:

from inspyration.content import data

... to how the new namespace package employs dynamic discovery:

from inspyration.common.discovery import data

Well done for following along. Here's what our main project tree should look like now:

inspyration-common
├── inspyration
│   ├── __init__.py
│   └── common
│       ├── __init__.py
│       ├── content.py
│       ├── discovery.py
│       ├── main.py
│       ├── test_content.py
│       └── test_discovery.py
├── MANIFEST.in
├── README.txt
└── setup.py

Release

We avoided a potential disaster. Anton now has his own code tree which he can change as he likes. So long as his inspyration-webmirth package advertises on the interface we agreed, our packages will integrate successfully.

There are now two distributions which install into the inspyration namespace:

  • inspyration-common-0.03.tar.gz
  • inspyration-webmirth-1.0a1.tar.gz

We are committed to maintaining the first; this is how we enact our vision for the project as a whole. Collaborators like WebMirth may come along for the ride, and their innovations can be formalised when they offer benefit.

We covered a lot of ground this time. In the final part of the series, we will consolidate a little, and discuss some topics we conveniently skipped over today.