NB: Introduction to Functions

Programming for Data Science

What is a Function?

A function is a piece of code, separate from the larger program, that performs a specific task.

This piece of code is given a name and can be called from the main program.

Functions are the verbs of a programming language. They signify action, and take subjects and objects (as it were).

How do they Work?

Functions take input data and produce output data.

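Function inputs are called parameters or arguments. Strictly speaking, parameters are the names listed in the function definition and arguments are the values passed in a call, but the two terms are often used interchangeably.
Outputs are called return values.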

Function calls are written with parentheses after the function name, e.g.

len(some_list)

time()

Internally, they contain a block of code to do their work.

Often they produce a transformation … e.g. from simple to complex.

When you use a function, we say you call a function. Programmers speak of “function calls” and “callbacks.”
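To make the vocabulary concrete, the sketch below shows a plain function call next to a callback, i.e. a function that is passed as an argument so that another function can call it. The names `shout`, `loud`, `words`, and `by_length` are made up for this illustration.

```python
# A hypothetical function, defined only for this illustration.
def shout(text):
    return text.upper() + "!"

# A function call: the name followed by parentheses holding the argument.
loud = shout("hello")

# A callback: the function len is passed WITHOUT parentheses to sorted(),
# which then calls it on each element to decide the ordering.
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=len)

print(loud)
print(by_length)
```

Note that `len` appears here without parentheses because it is being handed over, not called.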

Why Use Them?

Reduce complex tasks into simpler tasks.

Eliminate duplicate code — no need to re-write, reuse function as needed.

Make code reusable. Once a function is written, you can reuse it in any other program.

Distribute tasks to multiple programmers. For example, each function can be written by a different person.

Hide implementation details, i.e. abstraction.

Increase code readability.

Improve debugging by improving traceability. Things are easier to follow; you can jump from function to function.

Built-in Functions

Python provides many built-in functions. See Python built-in functions.

We’ve looked at many of these already.

These are functions that are available to use any time you are running Python.

To take one simple example, this is a built-in function: bool().

It takes an argument \(x\) and returns a boolean value, i.e. True or False.

bool(0), bool(500)

(False, True)
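As a further sketch, bool() follows Python's general truthiness rules: zero, empty containers, and None map to False, while most other values map to True. The dictionary name `results` is made up for this illustration.

```python
# Each entry applies the built-in bool() to a different kind of value.
results = {
    "zero": bool(0),
    "nonzero": bool(-3),
    "empty string": bool(""),
    "non-empty string": bool("False"),  # any non-empty string is truthy
    "empty list": bool([]),
    "None": bool(None),
}

for label, value in results.items():
    print(label, "->", value)
```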

Imported Functions

Python is meant to be a highly modular language.

It is not designed to have a lot of special purpose functions built into it.

This keeps it light and highly customizable.

Many functions can be imported into a program to add to the functions that you can call in a script.

import math

math.log(256, 2)

8.0
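Besides importing a whole module, Python offers a few other import forms. The sketch below uses only standard-library modules; the variable names `a`, `b`, and `c` are made up for illustration.

```python
import math                  # import the whole module
from math import sqrt        # import a single name directly
import statistics as stats   # import a module under a shorter alias

a = math.log(256, 2)          # qualified with the module name
b = sqrt(16)                  # no prefix needed
c = stats.mean([1, 2, 3, 4])  # called through the alias

print(a, b, c)
```

Which form to use is largely a matter of readability: the module prefix makes it obvious where a function comes from.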

User-Defined Functions

Python makes it easy for you to write your own functions. These are called user-defined functions.

Let’s write a function to compare each value in a list against a threshold.

def vals_greater_than_or_equal_to_threshold(vals, thresh):
    '''
    This is the "docstring" of a function. It is optional but expected. It describes it's 
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    
    PURPOSE: Given a list of values, compare each value against a threshold
    
    INPUTS
    vals    list of ints or floats
    thresh  int or float
    
    OUTPUT
    bools  list of booleans
    '''
    
    filtered_vals = [val >= thresh for val in vals]
    
    return filtered_vals

Let’s break down the components

The function definition starts with def, followed by the function name, zero or more parameters in parentheses, and then a colon.

Next comes a docstring to provide information to users about how and why to use the function.

The function body follows.

Last comes a return statement.

The function call is how the function is used.
It consists of the function name and the required arguments:

vals_greater_than_or_equal_to_threshold(arg1, arg2) where arg1, arg2 are arbitrary names.

About the docstring

A docstring occurs as the first statement in a module, function, class, or method definition.

Internally, it is saved in the __doc__ attribute of the function object.

It needs to be indented, i.e. part of the code block associated with the function.

It can be a single line or a multi-line string.
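For comparison with the multi-line docstring above, here is a minimal sketch of a single-line docstring. The function `double` and the variable `one_line_doc` are made up for this illustration.

```python
def double(x):
    '''Return x multiplied by 2.'''
    return 2 * x

# The docstring is stored on the function object itself.
one_line_doc = double.__doc__
print(one_line_doc)
```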

Users can print the docstring:

print(vals_greater_than_or_equal_to_threshold.__doc__)


    This is the "docstring" of a function. It is optional but expected. It describes it's 
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    
    PURPOSE: Given a list of values, compare each value against a threshold
    
    INPUTS
    vals    list of ints or floats
    thresh  int or float
    
    OUTPUT
    bools  list of booleans

Print the docstring using help():

help(vals_greater_than_or_equal_to_threshold)

Help on function vals_greater_than_or_equal_to_threshold in module __main__:

vals_greater_than_or_equal_to_threshold(vals, thresh)
    This is the "docstring" of a function. It is optional but expected. It describes it's 
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    
    PURPOSE: Given a list of values, compare each value against a threshold
    
    INPUTS
    vals    list of ints or floats
    thresh  int or float
    
    OUTPUT
    bools  list of booleans

Or, use the ? prefix in a Jupyter notebook:

?vals_greater_than_or_equal_to_threshold

Signature: vals_greater_than_or_equal_to_threshold(vals, thresh)
Docstring:
This is the "docstring" of a function. It is optional but expected. It describes it's 
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.
PURPOSE: Given a list of values, compare each value against a threshold
INPUTS
vals    list of ints or floats
thresh  int or float
OUTPUT
bools  list of booleans
File:      /tmp/ipykernel_133200/392258855.py
Type:      function

Or use the ? suffix:

vals_greater_than_or_equal_to_threshold?

Signature: vals_greater_than_or_equal_to_threshold(vals, thresh)
Docstring:
This is the "docstring" of a function. It is optional but expected. It describes it's 
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.
PURPOSE: Given a list of values, compare each value against a threshold
INPUTS
vals    list of ints or floats
thresh  int or float
OUTPUT
bools  list of booleans
File:      /tmp/ipykernel_133200/392258855.py
Type:      function

Calling Our Function

Let’s use, or “call,” our function.

The function body uses a list comprehension to compare each value against the threshold:

[val >= thresh for val in vals]

Validate that it works for integers:

x = [3, 4]
thr = 4

vals_greater_than_or_equal_to_threshold(x, thr)

[False, True]

Validate that it works for floats:

x = [3.0, 4.2]
thr = 4.2

vals_greater_than_or_equal_to_threshold(x, thr)

[False, True]

This gives correct results and does exactly what we want.
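The function returns a list of booleans rather than the values themselves. If the values that meet the threshold are wanted, one possible sketch pairs each value with its boolean using zip(). The function is repeated here without its docstring so that the example runs on its own; the names `mask` and `kept` are made up for illustration.

```python
def vals_greater_than_or_equal_to_threshold(vals, thresh):
    return [val >= thresh for val in vals]

x = [3, 4, 7, 1]
mask = vals_greater_than_or_equal_to_threshold(x, 4)

# Keep only the values whose corresponding boolean is True.
kept = [val for val, keep in zip(x, mask) if keep]

print(mask)
print(kept)
```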

Passing Parameters

Functions may take \(0\) or more parameters. The values supplied for them in a call are the arguments.

Functions need to be called with the correct number of arguments.

This function has two required parameters, but the function call supplies only one argument.

def func_with_args(x, y):
    return x + y

func_with_args(10)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 func_with_args(10)

TypeError: func_with_args() missing 1 required positional argument: 'y'

Parameter Order

When calling a function, parameter order matters.

def fcn_swapped_args(x, y):
    out = 5 * x + y
    return out

x = 1
y = 2

fcn_swapped_args(x, y)

fcn_swapped_args(y, x)
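To see the effect of the order, the two calls can be evaluated and compared. The sketch repeats the function so it runs on its own; the names `first` and `second` are made up for illustration.

```python
def fcn_swapped_args(x, y):
    out = 5 * x + y
    return out

x = 1
y = 2

first = fcn_swapped_args(x, y)   # 5 * 1 + 2
second = fcn_swapped_args(y, x)  # 5 * 2 + 1

print(first, second)  # 7 11
```

The same two values give different results because the first argument is always bound to the parameter x and the second to y.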

Named Parameters

Generally it’s best to keep parameters in order.

However, you can swap the order by using the parameter names in the function call.

fcn_swapped_args(y=y, x=x)

Weirdness Alert

Note that the same name can be used for the parameter names and the variables passed to them.

The names themselves have nothing to do with each other!

In other words, just because a function names a parameter foo,
the variables passed to it don’t have to be named foo or anything like it.
They can even be named the same thing; it does not matter.
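A small sketch of this point, reusing fcn_swapped_args from above. The variable names `a`, `b`, `r1`, `r2`, and `r3` are made up for illustration.

```python
def fcn_swapped_args(x, y):
    return 5 * x + y

# The variables are called a and b; the parameters are called x and y.
a = 1
b = 2
r1 = fcn_swapped_args(a, b)      # positional: x gets 1, y gets 2
r2 = fcn_swapped_args(y=b, x=a)  # named: the order in the call no longer matters

# Reusing the parameter names for the variables changes nothing.
x = 1
y = 2
r3 = fcn_swapped_args(y=y, x=x)

print(r1, r2, r3)  # 7 7 7
```

In `y=y`, the name on the left of the equals sign is the function's parameter and the name on the right is the caller's variable.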

Default Arguments

Use default arguments to give parameters a value that is used when the caller does not supply one.

This allows users to call functions with fewer (or no) arguments.

Defaults are often set for the most common cases.

def show_results(precision, printing=True):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

pr = 0.912
res = show_results(pr)

precision = 0.91

The function call didn’t specify printing, so it defaulted to True.

foo = show_results(pr, False)
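The call above passes False for printing by position, so nothing is printed and only the return value is kept. A minimal sketch of the same idea, with the function repeated so the example runs on its own (the names `quiet` and `also_quiet` are made up for illustration):

```python
def show_results(precision, printing=True):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

pr = 0.912

quiet = show_results(pr, False)                # default overridden by position
also_quiet = show_results(pr, printing=False)  # same call with the name spelled out

print(quiet, also_quiet)
```

Spelling out the parameter name is often easier to read than a bare False.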

NOTE: Default arguments must follow non-default arguments in a function definition.

This causes trouble:

def show_results(precision, printing=True, uhoh):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

  Cell In[30], line 1
    def show_results(precision, printing=True, uhoh):
                                               ^
SyntaxError: non-default argument follows default argument
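One way to repair the definition, sketched under the assumption that uhoh is meant to be a required parameter, is to move it ahead of the default argument. The value passed for uhoh below is a placeholder.

```python
# uhoh now comes before the parameter that has a default value.
def show_results(precision, uhoh, printing=True):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision

res = show_results(0.912, "placeholder value")
```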

Returning Values

Functions are not required to have a return statement, but it is a good idea to have one.

If there is no return statement, a function returns None.

Functions can return no value (None), one value, or many.

Many values are returned as a tuple.

Any Python object can be returned.

This returns None.

def fcn_nothing_to_return(x, y):
    out = 'nothing to see here!'
    print(out)

fcn_nothing_to_return(x, y)

nothing to see here!

r = fcn_nothing_to_return(1, 1)

nothing to see here!

print(r)

None

This returns three values.

def negate_coords(x, y, z):
    return -x, -y, -z

a, b, c = negate_coords(10, 20, 30)

a, b, c

(-10, -20, -30)

foo = negate_coords(10, 20, 30)

foo

(-10, -20, -30)

If you don’t need an output, use the dummy variable _.

d, e, _ = negate_coords(10,20,30)

d, e

(-10, -20)

Note: It’s generally a good idea to include return statements, even if not returning a value.

This shows that you did not forget to consider the return value.

You can use return or return None.
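A minimal sketch of both spellings. The functions `log_message` and `log_message_bare` are made up for this illustration.

```python
def log_message(msg):
    print(msg)
    return None  # explicit: this function is called only for its side effect

def log_message_bare(msg):
    print(msg)
    return       # a bare return also returns None

r1 = log_message("saved")
r2 = log_message_bare("saved")

print(r1, r2)
```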

Functions can contain multiple return statements.

These may be used under different logical conditions.

def absolute_value(num):
    if num >= 0:
        return num
    return -num

absolute_value(-4), absolute_value(4)

(4, 4)

Unpacking List-likes with `*args`

The * prefix can be used to avoid specifying arguments individually, both when defining a function and when calling one.

def show_arg_expansion1(models):
    print(models)

If we pass several separate values to this function, we get an error:

show_arg_expansion1("logreg", "naive_bayes", "gbm")

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[92], line 1
----> 1 show_arg_expansion1("logreg", "naive_bayes", "gbm")

TypeError: show_arg_expansion1() takes 1 positional argument but 3 were given

def show_arg_expansion2(*models):
    print(models)

show_arg_expansion2("logreg", "naive_bayes", "gbm")

('logreg', 'naive_bayes', 'gbm')

This also allows for an unspecified number of arguments, similar to how print() works.

You can also pass a list to the function.

If you want the elements unpacked, put * before the list.

models = ["logreg", "naive_bayes", "gbm"]
show_arg_expansion2(*models)

('logreg', 'naive_bayes', 'gbm')

This approach allows your function to accept an arbitrary number of arguments.
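As a sketch of why this is useful, the hypothetical function below computes a mean over however many numbers it is given. The name `mean_of` and the choice to return None when called with no arguments are assumptions made for this illustration.

```python
def mean_of(*values):
    # values arrives as a tuple holding every positional argument
    if not values:
        return None  # assumption: with no arguments there is no mean to report
    return sum(values) / len(values)

m1 = mean_of(2, 4)
m2 = mean_of(1, 2, 3, 4, 5)

scores = [0.5, 1.5]
m3 = mean_of(*scores)  # unpack an existing list into separate arguments

print(m1, m2, m3)
```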

Note that you can also put an asterisk * before a string, which unpacks it into its individual characters:

show_arg_expansion2(*'abcdefg')

('a', 'b', 'c', 'd', 'e', 'f', 'g')

Or a string operation that returns a list:

show_arg_expansion2(*'a b c d e f g'.split())

('a', 'b', 'c', 'd', 'e', 'f', 'g')

You can use the * prefix to pass list-like objects to a function with a defined number of arguments.

def arg_expansion_example(x, y):
    return x**y

my_args = [2, 8]
arg_expansion_example(*my_args)

But the passed object must be the right length.

my_args2 = [2, 8, 5]
arg_expansion_example(*my_args2)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[75], line 2
      1 my_args2 = [2, 8, 5]
----> 2 arg_expansion_example(*my_args2)

TypeError: arg_expansion_example() takes 2 positional arguments but 3 were given

Function Design

A function is not just a bag of code!

Design a function to do one thing.

Make functions as simple as possible. This makes them:

more comprehensible
easier to maintain
reusable

This helps avoid situations where a team has 20 variations of similar functions.

Give your function a good name.

It should reflect the action it performs.
Be consistent in your naming conventions.
A name like compute_variances_sort_save_print suggests the function is overworked!

If the function compute_variances also produces plots and updates variables, it will cause confusion.
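One way to avoid an overworked function is to split it into several small ones, each named for its single action. The sketch below is only an illustration: the function bodies, the sample data, and the use of the standard-library statistics module are assumptions, not part of the original notes.

```python
import statistics

def compute_variances(columns):
    '''Return the sample variance of each named list of numbers.'''
    return {name: statistics.variance(vals) for name, vals in columns.items()}

def sort_by_variance(variances):
    '''Return (name, variance) pairs, largest variance first.'''
    return sorted(variances.items(), key=lambda item: item[1], reverse=True)

def print_variances(pairs):
    '''Print one line per (name, variance) pair.'''
    for name, var in pairs:
        print(f"{name}: {var:.2f}")

# Made-up data for the illustration.
data = {"a": [1, 2, 3], "b": [10, 20, 30]}

variances = compute_variances(data)
ranked = sort_by_variance(variances)
print_variances(ranked)
```

Each piece can now be tested, reused, or replaced without touching the others.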

Always give your function a docstring.

This is particularly important since Python does not require you to indicate data types.
As a side note, you can include this information by using type annotations.
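As a sketch of what type annotations look like, here is the threshold function from earlier with annotated parameters and return value. It uses typing.List so that it also runs on older Python 3 versions; the shortened docstring is an assumption made for brevity.

```python
from typing import List

def vals_greater_than_or_equal_to_threshold(
    vals: List[float], thresh: float
) -> List[bool]:
    '''Compare each value in vals against thresh.'''
    return [val >= thresh for val in vals]

result = vals_greater_than_or_equal_to_threshold([3, 4], 4)
print(result)

# The annotations are stored on the function object.
print(vals_greater_than_or_equal_to_threshold.__annotations__)
```

Python does not enforce these annotations at run time; they serve as documentation and can be checked by external tools.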

You may be interested to learn some of the formatting languages that have been developed to write docstrings. See Lutz 2019 and this web page about Documenting Python Code for more info.