NB: User-Created Packages

Programming for Data Science

Now that we have seen what modules and packages are, let’s create our own.

Why Build a Package?

You package code in order to add it to your Python system for general use, and to share it with others (via GitHub or other means).

It is also a good way to organize your code.

This applies both to how your programs are organized internally and to how they are laid out externally as files and directories.

And it’s easy to do.

Projects

An important organizational concept when creating packages for installation by other users is the project.

We define a project as an encompassing directory within which to put a package and all the code necessary to actually use it.

So, a project directory contains a collection of packages and modules (Python files) along with:

  • documentation, e.g. README files.
  • tests, e.g. unit tests and results.
  • any top-level scripts, e.g. the Scenario notebooks for your final project.
  • any data files required.
  • instructions and scripts to build and install the code.

The project directory is often a git repo.

What does it mean to build your package?

To build your own package, you of course need some Python files you want to deploy.

Then you do the following:

  1. Create the basic package structure, such as EXAMPLE 3 below.
  2. Write a setup.py using a package tool (see below).

## EXAMPLE 3

a_package_dir
    __init__.py
    module_a.py
tests
    ...
setup.py # Or pyproject.toml 

This will be contained by a project directory.
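
As a rough illustration, the layout above can be created with a few lines of Python. This is only a sketch: the project name my_project is a hypothetical placeholder, the files are created empty, and the tests folder is left empty because its contents are not specified above.

```python
import tempfile
from pathlib import Path


def make_skeleton(project_dir):
    """Create the empty folders and files of the layout under project_dir."""
    project_dir = Path(project_dir)
    for folder in ("a_package_dir", "tests"):
        (project_dir / folder).mkdir(parents=True, exist_ok=True)
    for name in ("a_package_dir/__init__.py", "a_package_dir/module_a.py", "setup.py"):
        (project_dir / name).touch()
    # Return everything that now exists, relative to the project directory
    return sorted(p.relative_to(project_dir).as_posix() for p in project_dir.rglob("*"))


# "my_project" is a hypothetical project name; a temporary folder keeps the demo tidy
with tempfile.TemporaryDirectory() as tmp:
    created = make_skeleton(Path(tmp) / "my_project")
    print("\n".join(created))
```

In practice you would run this once against a real directory (or just create the files by hand) and then fill in the modules and setup.py.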

About setup.py

Your setup.py file describes your package, and tells the packaging tool how to package, build, and install it.

It is Python code, so you can add anything custom you need to it.

In the simple case, it is basically a configuration file with keys and values.

What does setup.py contain?

  • Version & package metadata
  • List of packages to include
  • List of other files to include
  • List of dependencies
  • List of extensions to be compiled
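
To make the list above concrete, here is a sketch of those five kinds of information written as a plain Python dictionary. All names and values are hypothetical placeholders. The keys are keyword arguments that setuptools accepts, but the sketch only prints them and does not call setup().

```python
# Each group corresponds to one bullet in the list above.
# All names and values are hypothetical placeholders.
config = {
    # Version and package metadata
    "name": "mypkg",
    "version": "0.1.0",
    "description": "A short description",
    # Packages to include (directories, not files)
    "packages": ["mypkg"],
    # Other (non-code) files to include, keyed by package name
    "package_data": {"mypkg": ["data/*.csv"]},
    # Dependencies
    "install_requires": ["numpy"],
    # Extensions to be compiled (none for a pure-Python package)
    "ext_modules": [],
}

# In a real setup.py the last line would be: setup(**config)
for key, value in config.items():
    print(f"{key:>16}: {value!r}")
```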

About pyproject.toml

For a lot of reasons that are beyond the scope of this document, setup.py is being superseded by pyproject.toml files, which store the setup configuration information.

However, for now we’re going to stick to the old school approach.

Example Setup Files

Example 1

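# distutils was removed in Python 3.12 and does not support install_requires,
# so import setup from setuptools instead
from setuptools import setup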

setup(name='mypkg',
      version='1.0',
      
      # list folders, not files
      packages=['mypkg', 'mypkg.subpkg'], # Include packages in the project
      install_requires=['click']          # Required libraries
)

Example 2

from setuptools import setup, find_packages

setup(
    name='MyPackageName',
    version='1.0.0',
    url='https://github.com/mypackage.git',
    author='Author Name',
    author_email='author@gmail.com',
    description='Description of my package',
    packages=find_packages(),    
    install_requires=['numpy >= 1.11.1', 'matplotlib >= 1.5.1']
)
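
Example 2 uses find_packages() instead of listing package folders by hand. The sketch below shows the basic idea: a folder counts as a package if it contains an __init__.py file. The function find_package_dirs is a simplified stand-in written for illustration, not the actual setuptools code, and the folder names are hypothetical.

```python
import tempfile
from pathlib import Path


def find_package_dirs(root):
    """Return dotted names of folders under root that contain __init__.py.

    A simplified stand-in for setuptools.find_packages(), not its real code.
    """
    found = []

    def walk(folder, prefix):
        for child in sorted(folder.iterdir()):
            if child.is_dir() and (child / "__init__.py").is_file():
                name = f"{prefix}{child.name}"
                found.append(name)
                walk(child, name + ".")  # look for subpackages

    walk(Path(root), "")
    return found


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one package, one subpackage, one plain folder
    for folder in ("mypkg", "mypkg/subpkg"):
        path = Path(tmp) / folder
        path.mkdir(parents=True)
        (path / "__init__.py").touch()
    (Path(tmp) / "docs").mkdir()  # no __init__.py, so not a package

    packages = find_package_dirs(tmp)
    print(packages)
```

The docs folder is skipped because it has no __init__.py file.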

Example 3

from setuptools import setup

setup(
    name = 'PackageName',
    version = '0.1.0',
    author = 'An Awesome Coder',
    author_email = 'aac@example.com',
    packages = ['package_name', 'package_name.test'],
    scripts = ['bin/script1','bin/script2'],
    url = 'http://pypi.python.org/pypi/PackageName/',
    license = 'LICENSE.txt',
    description = 'An awesome package that does something',
    long_description = open('README.txt').read(),
    install_requires = [
        "Django >= 1.1.1",
        "pytest",
    ]
)

A Summary of Keys

As mentioned above, the main content of basic setup files is configuration information. The keys that you should include in your projects are the following:

  • name: A string with the name of the package as a title, not a filename.
  • version: A string of the version number expression, typically using the MAJOR.MINOR.PATCH pattern. See Semantic Versioning for more information.
  • author: A string with the creator’s name.
  • author_email: A string with the creator’s email address.
  • packages: A list of strings of package directories in the project.
  • url: A string of the URL to the code repo.
  • license: A string naming the license, such as 'MIT', or the license file name.
  • description: A string with a short blurb of the project.
  • long_description: A string with a longer description. You can read it from a file with something like open('README.txt').read().
  • install_requires: A list of strings of external libraries that the project requires.
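
One reason the MAJOR.MINOR.PATCH pattern is useful is that versions can be compared part by part as numbers. The sketch below is a minimal illustration that assumes exactly three numeric parts. The helper parse_version is hypothetical and does not handle shorter versions such as '0.1' or suffixes such as '1.0.0rc1'.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


versions = ["1.9.0", "1.10.0", "1.2.11"]

by_text = sorted(versions)                       # compares character by character
by_number = sorted(versions, key=parse_version)  # compares MAJOR, then MINOR, then PATCH

print(by_text)    # ['1.10.0', '1.2.11', '1.9.0']
print(by_number)  # ['1.2.11', '1.9.0', '1.10.0']
```

Sorting the strings as text puts 1.10.0 before 1.9.0, which is why version numbers should be compared as numbers.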

Python packaging tools

In writing setup.py, you need to use a packaging tool.

Notice that we’ve imported the setuptools library.

  • The package tool distutils used to be included with Python, but it was deprecated and then removed in Python 3.12, so it is not recommended.
  • Instead, use setuptools, a third-party tool that extends distutils and is used in most modern Python installations.

Quick Demo

So, let’s look at a simple package.

Source: Minimal Structure (python-packaging)

Directory

Here is our directory structure:

!ls -lR demo_package3/
demo_package3/:
total 12
drwx--S--- 2 rca2t users 1024 May  8 15:36 funniest
-rw------- 1 rca2t users  301 May  8 15:36 setup.py
-rw------- 1 rca2t users    0 May  8 15:36 test.py

demo_package3/funniest:
total 8
-rw------- 1 rca2t users 197 May  8 15:36 funniest.py
-rw------- 1 rca2t users  64 May  8 15:36 __init__.py

Setup file

Here is what our setup.py file has inside:

print(open('demo_package3/setup.py', 'r').read())
from setuptools import setup

setup(name='funniest',
      version='0.1',
      description='The funniest joke in the world',
      url='http://github.com/storborg/funniest',
      author='Flying Circus',
      author_email='flyingcircus@example.com',
      license='MIT',
      packages=['funniest'])

__init__.py

print(open('demo_package3/funniest/__init__.py', 'r').read())
from . funniest import joke

print("Have I got a joke for you!")

funniest.py

print(open('demo_package3/funniest/funniest.py', 'r').read())
def joke():
    "This function just tells a joke. Or tries to."
    return (u'Wenn ist das Nunst\u00fcck git und Slotermeyer? Ja! ... '
            u'Beiherhund das Oder die Flipperwaldt gersput.')
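
The two files above work together: importing the package runs __init__.py, which prints the greeting and pulls joke up to the package level. The sketch below reproduces that behavior with a throwaway copy of the package. The name demo_joke and the shortened joke text are hypothetical, chosen so the demo cannot clash with an installed copy of funniest.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny package with the same shape as funniest
    pkg = Path(tmp) / "demo_joke"
    pkg.mkdir()
    (pkg / "funniest.py").write_text("def joke():\n    return 'A joke.'\n")
    (pkg / "__init__.py").write_text(
        "from .funniest import joke\n"
        "print('Have I got a joke for you!')\n"
    )

    sys.path.insert(0, tmp)  # make the temporary folder importable
    importlib.invalidate_caches()
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        package = importlib.import_module("demo_joke")  # runs __init__.py
    sys.path.remove(tmp)

greeting = captured.getvalue().strip()
print(greeting)
print(package.joke())               # available on the package itself
print(package.joke.__module__)      # but defined in the inner module
```

Without the import line in __init__.py, users would have to write the longer from funniest.funniest import joke.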

Install

!cd demo_package3/; pip install .
Defaulting to user installation because normal site-packages is not writeable
Processing /sfs/qumulo/qhome/rca2t/Documents/MSDS/DS5100/repo-book/notebooks/M09_PythonModules/demo_package3
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: funniest
  Building wheel for funniest (setup.py) ... done
  Created wheel for funniest: filename=funniest-0.1-py3-none-any.whl size=1569 sha256=c3829948b6683f573d78df49d933ae95e47f32d59e0d536bfd96ae9b3c7f746a
  Stored in directory: /tmp/pip-ephem-wheel-cache-ajl4pgec/wheels/7a/bf/b4/a80ff477ce7f61df01ebc8eacffd6f8c08035d70d16a6e68e4
Successfully built funniest
Installing collected packages: funniest
Successfully installed funniest-0.1

Try it out

This should be done from a directory other than the one that contains the source files.

from funniest import joke
joke()
'Wenn ist das Nunstück git und Slotermeyer? Ja! ... Beiherhund das Oder die Flipperwaldt gersput.'

Many Ways to Install

Running setup.py directly with python

python setup.py sdist   # Builds a source distribution as a tar archive
python setup.py build   # Builds from source
python setup.py install # Installs to Python
python setup.py develop # Installs in develop mode (changes are immediately reflected)

Using pip

pip install .    # Installs to Python
pip install -e . # Installs in editable (develop) mode, so changes to the source take effect without reinstalling
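
Whichever method you use, you can check from Python whether the install worked. The sketch below uses importlib.metadata from the standard library (Python 3.8 and later). The helper installed_version is a hypothetical convenience function, and what it prints for funniest depends on whether you have installed the demo package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("funniest"))                  # a version string if installed, else None
print(installed_version("no-such-distribution-xyz"))  # a made-up name, so None
```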

Testing Code

As you work, you will want to write tests and put them somewhere. A good idea is to put your tests in the root of the project directory. There are other options and approaches though, some of which are covered in the resource below.

See Where to Put Tests?.
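
As a starting point, here is a minimal sketch of a unit test written with the standard unittest module. To keep it runnable on its own, it defines a stand-in joke function. In a real test file you would instead import the function from your installed package, e.g. from funniest import joke. The test names are hypothetical.

```python
import unittest


def joke():
    """Stand-in for funniest.joke so this sketch runs without the package."""
    return "Wenn ist das Nunst\u00fcck git und Slotermeyer?"


class JokeTests(unittest.TestCase):
    def test_joke_returns_text(self):
        self.assertIsInstance(joke(), str)

    def test_joke_is_not_empty(self):
        self.assertTrue(joke())


# Run the tests programmatically (this also works inside a notebook)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(JokeTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Saved as a file whose name starts with test_, the same class can also be run from the command line with python -m unittest.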

A More Complex Project Structure

project_name/
    bin/
    CHANGES.txt
    docs/
    LICENSE.txt
    MANIFEST.in
    README.txt
    setup.py
    test_module1.py
    test_module2.py
    package_name/
        __init__.py
        module1.py
        module2.py

  • CHANGES.txt: log of changes with each release.
  • LICENSE.txt: text of the license you choose (do choose one!).
  • MANIFEST.in: description of what non-code files to include.
  • README.txt: description of the package; it should be written in reStructuredText (reST) or Markdown (for PyPI).
  • setup.py: the script for building and installing the package.
  • bin/: where you put top-level scripts (some folks name this directory scripts/).
  • docs/: the documentation.
  • package_name/: the main package; this is where the code goes.
  • test/: your unit tests. Options here: