NB: Modules and Packages

Programming for Data Science

Modules

In Python, a module is a file containing Python code: basically, a collection of definitions and statements.

It will usually contain functions, classes, and fixed variables (“constants”) such as the value of \(\pi\).

For instance, let’s say we have a file called fibo.py that contains the following code:

## Fibonacci numbers module

def fib(n):
    "Prints Fibonacci series up to n."
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

def fib2(n):
    "Returns a Fibonacci series up to n."
    a, b = 0, 1
    result = []
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result

We would say that these two functions belong to the module fibo, based on the name of the file that stores the code.

Importing

To use the functions in this module in another program, you would need to import it.

You can import a module into the script you are working in as follows:

import fibo

We can do this since fibo.py is sitting in the same directory as our notebook.

!ls | grep fibo.py
fibo.py

Once we have imported the module, we can use its attributes (as they are called) in our code.

fibo.fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 
fibo.fib2(100)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
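The import above works because the notebook's directory is on Python's module search path, sys.path. As a minimal sketch of the same mechanism, the code below writes a small module into a temporary directory, adds that directory to sys.path, and imports it. The module name fibo_sketch and the temporary location are assumptions made for illustration; they are not part of the course files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory and write a small module file into it.
# (The directory is left on disk for simplicity.)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "fibo_sketch.py").write_text(
    "def fib2(n):\n"
    "    a, b = 0, 1\n"
    "    result = []\n"
    "    while a < n:\n"
    "        result.append(a)\n"
    "        a, b = b, a + b\n"
    "    return result\n"
)

# Python can only import modules found in directories listed on sys.path.
sys.path.append(str(module_dir))
importlib.invalidate_caches()

import fibo_sketch  # hypothetical module name, taken from the file name

series = fibo_sketch.fib2(100)
module_name = fibo_sketch.__name__
print(series)
print(module_name)
```

If the directory were not on sys.path, the import statement would raise ModuleNotFoundError.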

Module Names

Note that the module’s name we used to import is just the file name without the .py suffix.

So, we say that the file fibo.py contains the module fibo.

__name__

Python provides a special variable called __name__ that you can use to get the name of a module.

We saw this in our discussion of unittest.

For example:

fibo.__name__
'fibo'

Note that when a file is run directly, rather than imported, its __name__ is set to __main__ instead of the file-based module name.

Let’s look at the name of this notebook.

__name__
'__main__'
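One common use of this behavior is to guard code that should run only when a file is executed directly, and not when it is imported as a module. The following is a minimal sketch of that pattern, reusing fib2 from above; the choice of what to run inside the guard is an assumption for illustration.

```python
def fib2(n):
    "Returns a Fibonacci series up to n."
    a, b = 0, 1
    result = []
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


if __name__ == "__main__":
    # This branch runs when the file is executed as a script.
    # When the file is imported, __name__ is the module name and it is skipped.
    print(fib2(100))
```

With this guard in place, importing the file makes fib2 available without triggering the print call.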

Packages

A package is just a directory that contains modules and, possibly, other packages.

For a directory to become a package, it should contain an __init__.py file.

Note that as of Python 3.3, this file is optional. However, it is still useful and commonly used.

The __init__.py can be totally empty or it can have some Python code in it.

We’ll see why you would put code in it below.

Here’s an example of a simple package:

a_package_dir/
    __init__.py   # Can be empty
    module_a.py   # Contains functions, classes, etc.

Here is an example directory structure of a package that contains another package:

a_package_dir/
    __init__.py
    module_a.py
    a_sub_package_dir/   # A subdirectory
        __init__.py
        module_b.py

A package is imported by its directory name. Supposing the top-level directory above were named a_package, you could import it within a Python file like this:

import a_package

This will run any code in a_package/__init__.py.

Any variable or function names defined in the __init__.py will be available like this:

a_package.a_name

However, no modules in the package will be imported unless you explicitly import them.

So, the following will not work:

a_package.module_a

This is because module_a has not been imported.

To access module_a, we need to explicitly import it:

import a_package.module_a
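The behavior described above can be checked with a throwaway package. This is a minimal sketch that builds a package in a temporary directory and imports it; the names sketch_pkg, a_name, and hello are hypothetical and chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a small package on disk (left in a temporary directory for simplicity).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "sketch_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("a_name = 'defined in __init__.py'\n")
(pkg_dir / "module_a.py").write_text(
    "def hello():\n    return 'hello from module_a'\n"
)

sys.path.insert(0, str(root))  # make the temporary directory importable
importlib.invalidate_caches()

import sketch_pkg  # runs sketch_pkg/__init__.py

init_name = sketch_pkg.a_name  # names defined in __init__.py are available
has_module_before = hasattr(sketch_pkg, "module_a")  # submodule not loaded yet

import sketch_pkg.module_a  # the explicit import loads and binds the submodule

has_module_after = hasattr(sketch_pkg, "module_a")
greeting = sketch_pkg.module_a.hello()
print(has_module_before, has_module_after, greeting)
```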

Example

Let’s look at an example with actual files.

!ls -l demo_package1/
total 8
-rw------- 1 rca2t users  0 May  8 15:36 __init__.py
-rw------- 1 rca2t users 49 May  8 15:36 module1.py
!more demo_package1/__init__.py
!more demo_package1/module1.py
def welcome1():
    print("Hi, I'm from Demo 1!")

We import the package …

import demo_package1

But we cannot access the module yet.

demo_package1.module1
AttributeError: module 'demo_package1' has no attribute 'module1'

To access the module in the package directory, we have to specify it in the import path:

import demo_package1.module1
demo_package1.module1
<module 'demo_package1.module1' from '/sfs/qumulo/qhome/rca2t/Documents/MSDS/DS5100/repo-book/notebooks/M09_PythonModules/demo_package1/module1.py'>

Now we have it in memory and can access its attributes.

demo_package1.module1.welcome1()
Hi, I'm from Demo 1!

from

We can use the from statement to provide a context for our imports.

This allows us to import the module, or one of its attributes, directly into our code.

from demo_package1 import module1
module1.welcome1()
Hi, I'm from Demo 1!
from demo_package1.module1 import welcome1
welcome1()
Hi, I'm from Demo 1!

Note how we avoid having to repeat the package name (and, in the second case, the module name) when calling the function.

Note that we can’t do this:

from demo_package1 import module1.welcome1
SyntaxError: invalid syntax (2584258834.py, line 1)

This is because you can only import what a module or package directly contains.

So, although demo_package1 contains module1,
and module1 contains welcome1,
demo_package1 does not contain welcome1.

Notice the grammar here.

The from keyword provides the context resource, and the import keyword specifies the attribute name directly contained by the resource.

In each case, what follows import is the name you will use in your code to access the resource.
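The same grammar can be seen with the standard library, used here as a stand-in for demo_package1 so that the sketch runs anywhere. The file names passed to join are arbitrary placeholders.

```python
from os import path       # os directly contains the module path
from os.path import join  # os.path directly contains the function join

joined_a = path.join("data", "raw.csv")
joined_b = join("data", "raw.csv")

# A dotted name after the import keyword is rejected when the code is compiled.
try:
    compile("from os import path.join", "<sketch>", "exec")
    dotted_import_ok = True
except SyntaxError:
    dotted_import_ok = False

print(joined_a == joined_b, dotted_import_ok)
```

Whatever follows import (path in the first line, join in the second) is the name bound in the current namespace.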

Preloading Modules and Functions

Remember that you can put any code you want in an __init__.py file.

It’s as if the package directory is a module, and the contents of the initialization file are the content of that module.

A common use case for putting code into the package initialization file is to preload modules when importing the package.

This can be useful if you want to make certain modules available to all other modules in your project.

You can also use it to import files to be shared by modules in your own project for convenience.

For example, let’s say you have the following package set up:

funny/
    __init__.py
    funniest.py # contains the function joke()

If you wanted to import the module funniest into a script and have access to the function joke(), you’d have to do this:

import funny.funniest

And then to use the function, do this:

funny.funniest.joke()

But, remember, you can’t do this:

import funny

funny.funniest.joke()

However, you could do this if you want to simplify how you access the function:

from funny import funniest

funniest.joke()

Or even this:

from funny.funniest import joke

joke()

Now, you can bypass having to import the module explicitly by doing all of this in the initialization file.

Basically, you can put the same import line into the initialization file, and it’s as if you did it in your program.

Here are some scenarios.

In __init__.py put:

import funny.funniest

Or:

from funny import funniest

Then in the program:

import funny

funny.funniest.joke()

It looks like we violated the principle that you can only access what is immediately contained by the resource, but we secretly imported the contained module in our package initializer.

Or, you can put this in the initialization file:

from funny.funniest import joke

Then in the program, you can do this:

import funny

funny.joke()

Or this:

from funny import joke

joke()

See how it simplifies the import statement?
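The last scenario can be tried end to end with a temporary package. This is a minimal sketch under stated assumptions: the package is named funny_sketch to avoid clashing with any real funny package, and the body of joke() is a placeholder, since the original function is not shown.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package in a temporary directory (left on disk for simplicity).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "funny_sketch"
pkg_dir.mkdir()
(pkg_dir / "funniest.py").write_text(
    "def joke():\n    return 'placeholder joke'\n"  # hypothetical body
)
# The initializer preloads the function from the submodule.
(pkg_dir / "__init__.py").write_text("from funny_sketch.funniest import joke\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import funny_sketch
from funny_sketch import joke

via_package = funny_sketch.joke()  # available directly on the package
direct = joke()                    # or imported by name from the package
# The import inside __init__.py also loaded the submodule itself.
submodule_loaded = hasattr(funny_sketch, "funniest")
print(via_package, direct, submodule_loaded)
```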

Let’s look at some examples with real files.

Example 1: Empty __init__.py

Let’s import a module, this time using an alias.

import demo_package1.module1 as d1m
d1m.welcome1()
Hi, I'm from Demo 1!

Here we use a from statement to provide context.

from demo_package1.module1 import welcome1
welcome1()
Hi, I'm from Demo 1!

Example 2: Edited __init__.py

Now, we can allow users to import a module's function directly from a package by simply adding the following to our package initializer:

from package.module import func # or class

For example, our Demo 2 __init__.py contains:

from demo_package2.module2 import welcome2

This allows us to do this in our calling script:

import demo_package2 as d2

d2.welcome2()
Hi, I'm from Demo 2!

Or this:

from demo_package2 import welcome2

welcome2()
Hi, I'm from Demo 2!

It turns out, this is a common practice.