NB: Python Timing

Programming for Data Science

Before we move on to NumPy, let’s pause to cover the topic of timing.

Timing in this context refers to measuring how long it takes your code to execute.

That duration is sometimes called the runtime of a program or code block.

Python provides some tools that let you measure how long a block of code takes to execute, so you can compare the speed of different approaches to the same problem.

For example, we might compare the speed of using a comprehension versus a for loop.

The time module

One way to measure how long it takes a block of code to run is to use the time module.

This module provides a number of functions to get and compute time.

The simplest function is time(), which returns the number of seconds elapsed since the Epoch.

The Epoch is 00:00:00 UTC on 1 January 1970, excluding leap seconds.

It corresponds roughly to when Unix was invented.
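
As a quick check of this definition, the sketch below converts timestamp 0 to a UTC date using gmtime() and strftime() from the same time module. The variable names are chosen only for illustration.

```python
from time import gmtime, strftime, time

# Timestamp 0 is the Epoch itself; gmtime() converts seconds to a UTC struct_time.
epoch_str = strftime('%Y-%m-%d %H:%M:%S', gmtime(0))
print(epoch_str)  # 1970-01-01 00:00:00

# time() returns a float: the number of seconds elapsed since the Epoch.
now = time()
print(type(now).__name__)  # float
```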

To time a block, we record the time just before the code runs \(t_0\) and subtract it from the time just after the code finishes \(t_1\). Let’s try this by comparing a simple loop and a comprehension.

from time import time
t0 = time() # START
for i in range(10):
    print(i, end=' ')
t1 = time() # END
0 1 2 3 4 5 6 7 8 9 
t3 = time()
_ = [print(i, end=' ') for i in range(10)]
t4 = time()
0 1 2 3 4 5 6 7 8 9 
delta_loop = t1 - t0
delta_comp = t4 - t3
print('loop:', delta_loop)
print('comp:', delta_comp)
print('loop/comp:', round(delta_loop/delta_comp, 2))
loop: 8.58306884765625e-05
comp: 9.298324584960938e-05
loop/comp: 0.92

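In this run, the for loop is slightly faster than the comprehension. The difference is only a few microseconds, though, and much of the measured time goes to print, so a single measurement like this is too noisy to draw conclusions from.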
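
For timing short intervals, the time module also offers perf_counter(), a clock intended for measuring durations. The sketch below repeats the comparison without the print calls, so that writing to the screen does not dominate the measurement. The list size n and the squaring task are arbitrary choices for illustration, not part of the example above.

```python
from time import perf_counter

n = 100_000  # arbitrary size, chosen only for illustration

t0 = perf_counter()  # START
squares_loop = []
for i in range(n):
    squares_loop.append(i * i)
t1 = perf_counter()  # END

t2 = perf_counter()  # START
squares_comp = [i * i for i in range(n)]
t3 = perf_counter()  # END

delta_loop = t1 - t0
delta_comp = t3 - t2
print('loop:', delta_loop)
print('comp:', delta_comp)
```

The exact numbers will vary from run to run and from machine to machine.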

Using timeit

To get a better measure of runtime, we can use the timeit module.

This module measures timing across many runs.

Since timeit() will return the runtime across all runs, we divide by the number of runs to get the mean runtime.

timeit() works by evaluating code blocks written as strings.

Let’s compare two code blocks using timeit:

from timeit import timeit
  
num_runs = 100

loop_code = ''' 
vals = []
for i in range(1, 100001):
    if i % 2 == 1:
        i *= -1
    vals.append(i)
'''

comp_code = ''' 
vals = [i * -1 if i % 2 == 1 else i for i in range(1, 100001)]
'''

loop_mean_time = timeit(stmt = loop_code, number = num_runs) / num_runs
comp_mean_time = timeit(stmt = comp_code, number = num_runs) / num_runs
t_diff = loop_mean_time / comp_mean_time
print('loop =', loop_mean_time)
print('comp =', comp_mean_time)
print('loop/comp =', t_diff)
print('comp/loop =', 1/t_diff)
loop = 0.005688848439604044
comp = 0.0046883809473365545
loop/comp = 1.2133929609188114
comp/loop = 0.8241353231872839
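
timeit() can also take a callable instead of a string, which avoids writing code inside quotes. The following is a sketch of the same comparison with the two snippets wrapped in functions. The names loop_version and comp_version are made up for this example.

```python
from timeit import timeit

num_runs = 100

def loop_version(n=100_000):
    """Build the list with an explicit for loop."""
    vals = []
    for i in range(1, n + 1):
        if i % 2 == 1:
            i *= -1
        vals.append(i)
    return vals

def comp_version(n=100_000):
    """Build the same list with a comprehension."""
    return [i * -1 if i % 2 == 1 else i for i in range(1, n + 1)]

# Both versions should produce the same list before we compare their speed.
assert loop_version(10) == comp_version(10)

# timeit() returns the total time for all runs, so divide to get the mean.
loop_mean_time = timeit(loop_version, number=num_runs) / num_runs
comp_mean_time = timeit(comp_version, number=num_runs) / num_runs
print('loop =', loop_mean_time)
print('comp =', comp_mean_time)
```

Calling a function adds a small overhead to each run, so these figures are not directly comparable to the string-based results.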

Using Magic

Instead of calling time and timeit directly, we can use the so-called magic commands.

Magic commands are % or %% prefixed commands that work in Jupyter notebooks and other IPython environments.

% commands apply to single lines; they go at the beginning of the line.

%% commands apply to cell blocks; they go at the top of the cell.

Placing %%timeit or %%time at the top of a cell will apply these functions to the cell block.

Placing %timeit or %time as the first item on a line of code will apply them to a single line.

Note that magic commands can take arguments.
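
For example, %timeit accepts -n (the number of loops per run) and -r (the number of runs). Since magics only work inside IPython, the sketch below shows the magic form in a comment and a rough plain-Python counterpart using repeat() from the timeit module. The values 100 and 5 are arbitrary.

```python
from timeit import repeat

# In a notebook cell, the magic form would be:
#   %timeit -n 100 -r 5 [i * -1 if i % 2 == 1 else i for i in range(1, 10001)]

stmt = '[i * -1 if i % 2 == 1 else i for i in range(1, 10001)]'

# repeat() returns one total time per run; each run executes stmt `number` times.
totals = repeat(stmt=stmt, number=100, repeat=5)

# Divide each total by the number of loops to get a per-loop time.
per_loop = [t / 100 for t in totals]
print('best per loop:', min(per_loop))
```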

For more on this topic, see Chapter 3 of Wes McKinney’s Python for Data Analysis and the official IPython documentation.

Let’s look at an example, similar to those above, comparing a loop to a comprehension.

time

imax = 10000
%%time
vals = []
for i in range(1, imax+1):
    if i % 2 == 1:
        i *= -1
    vals.append(i)
CPU times: user 1.17 ms, sys: 0 ns, total: 1.17 ms
Wall time: 1.18 ms
%time vals = [i*-1 if i % 2 == 1 else i for i in range(1,imax+1)] 
CPU times: user 528 µs, sys: 0 ns, total: 528 µs
Wall time: 538 µs

timeit

%%timeit

vals = []
imax = 10000
for i in range(1, imax+1):
    if i % 2 == 1:
        i *= -1
    vals.append(i)
507 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit vals = [i*-1 if i % 2 == 1 else i for i in range(1,imax+1)] 
469 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Types of Time

Note that the output of %time contains a detailed breakdown of the results, including three CPU time figures (user, sys, and their total) and the wall time.

Wall time measures how much time has passed, as if you were looking at the clock on your wall.

CPU time refers to how many seconds the CPU was actually busy.

In CPU time, user time is the amount of time a processor spends running application code.

System time is the amount of time it spends running operating system functions related to the application.
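
The difference between wall time and CPU time can be seen outside of IPython as well. The sketch below is a minimal illustration, assuming perf_counter() as the wall-clock timer and process_time() as the CPU timer (it reports user plus system CPU time for the current process). The sleep length and loop size are arbitrary.

```python
import time

wall_start = time.perf_counter()  # wall-clock style timer
cpu_start = time.process_time()   # user + system CPU time of this process

time.sleep(0.2)                             # waiting: wall time passes, CPU is mostly idle
total = sum(i * i for i in range(200_000))  # computing: CPU is busy

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print('wall:', wall_elapsed)
print('cpu :', cpu_elapsed)
```

Because the sleep counts toward wall time but not toward CPU time, the wall figure should come out noticeably larger than the CPU figure.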