Programming for Data Science
A function is a piece of code, separate from the larger program, that performs a specific task.
This piece of code is given a name and can be called from the main program.
Functions are the verbs of a programming language. They signify action, and take subjects and objects (as it were).
Functions take input data and produce output data.
Function calls are written with parentheses after the function name, e.g.
len(some_list)
time()
Internally, they contain a block of code to do their work.
Often they produce a transformation … e.g. from simple to complex.
When you use a function, we say you call a function. Programmers speak of “function calls” and “callbacks.”
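As a small illustration of the difference between a function call and a callback, the sketch below calls the built-in len() directly and then hands it, uncalled, to sorted(). The list `words` is made up for the example.

```python
# Hypothetical data, for illustration only.
words = ["banana", "fig", "cherry"]

# A function call: the name followed by parentheses holding the argument.
n_words = len(words)

# A callback: len is passed without parentheses, and sorted() calls it
# once per element to decide the ordering.
by_length = sorted(words, key=len)

print(n_words)    # 3
print(by_length)  # ['fig', 'banana', 'cherry']
```

Because sorted() is stable, "banana" stays ahead of "cherry" even though both have six characters.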
Reduce complex tasks into simpler tasks.
Eliminate duplicate code — no need to re-write, reuse function as needed.
Make code reusable. Once a function is written, you can reuse it in any other program.
Distribute tasks to multiple programmers. For example, each function can be written by a different person.
Hide implementation details, i.e. abstraction.
Increase code readability.
Improve debugging by improving traceability. Things are easier to follow; you can jump from function to function.
Python provides many built-in functions. See Python built-in functions.
We’ve looked at many of these already.
These are functions that are available to use any time you are running Python.
To take one simple example, this is a built-in function: bool().
It takes an argument \(x\) and returns a boolean value, i.e. True or False.
bool(0), bool(500)
(False, True)
Python is meant to be a highly modular language.
It is not designed to have a lot of special purpose functions built into it.
This keeps it light and highly customizable.
Many functions can be imported into a program to add to the functions that you can call in a script.
import math
math.log(256, 2)
8.0
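As a further sketch of importing functions, the example below shows two common import styles from the standard library: importing a whole module and importing a single function by name. The variable names `root` and `avg` are chosen only for illustration.

```python
import math                  # import the whole module; call as math.<name>
from statistics import mean  # import one function by name; call it directly

root = math.sqrt(16)
avg = mean([1, 2, 3, 4])

print(root)  # 4.0
print(avg)   # 2.5
```

Importing the whole module keeps it clear where each function comes from, while importing by name makes calls shorter.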
Python makes it easy for you to write your own functions. These are called user-defined functions.
Let’s write a function to compare each value in a list against a threshold.
def vals_greater_than_or_equal_to_threshold(vals, thresh):
    '''
    This is the "docstring" of a function. It is optional but expected. It describes its
    purpose and the nature of the input and return values, as well as a sense of what it does.
    More elaborate information should appear in external documentation packages with the function.
    PURPOSE: Given a list of values, compare each value against a threshold
    INPUTS
    vals list of ints or floats
    thresh int or float
    OUTPUT
    bools list of booleans
    '''
    filtered_vals = [val >= thresh for val in vals]
    return filtered_vals
Let’s break down the components.
The function definition starts with def, followed by the function name, zero or more parameters in parentheses, and then a colon.
Next comes a docstring to provide information to users about how and why to use the function.
The function body follows.
Last is a return statement.
The function call allows the function to be used.
It consists of the function name and the required arguments:
vals_greater_than_or_equal_to_threshold(arg1, arg2)
where arg1, arg2 are arbitrary names.
A docstring occurs as the first statement in a module, function, class, or method definition.
Internally, it is saved in the __doc__ attribute of the function object.
It needs to be indented, i.e. part of the code block associated with the function.
It can be a single line or a multi-line string.
print(vals_greater_than_or_equal_to_threshold.__doc__)
This is the "docstring" of a function. It is optional but expected. It describes its
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.
PURPOSE: Given a list of values, compare each value against a threshold
INPUTS
vals list of ints or floats
thresh int or float
OUTPUT
bools list of booleans
Print the docstring using help():
help(vals_greater_than_or_equal_to_threshold)
Help on function vals_greater_than_or_equal_to_threshold in module __main__:
vals_greater_than_or_equal_to_threshold(vals, thresh)
This is the "docstring" of a function. It is optional but expected. It describes its
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.
PURPOSE: Given a list of values, compare each value against a threshold
INPUTS
vals list of ints or floats
thresh int or float
OUTPUT
bools list of booleans
Or, use the ? prefix in a Jupyter notebook:
?vals_greater_than_or_equal_to_threshold
Signature: vals_greater_than_or_equal_to_threshold(vals, thresh)
Docstring:
This is the "docstring" of a function. It is optional but expected. It describes its
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.
PURPOSE: Given a list of values, compare each value against a threshold
INPUTS
vals list of ints or floats
thresh int or float
OUTPUT
bools list of booleans
File: /tmp/ipykernel_133200/392258855.py
Type: function
Or use ? as a suffix:
vals_greater_than_or_equal_to_threshold?
Signature: vals_greater_than_or_equal_to_threshold(vals, thresh)
Docstring:
This is the "docstring" of a function. It is optional but expected. It describes its
purpose and the nature of the input and return values, as well as a sense of what it does.
More elaborate information should appear in external documentation packages with the function.
PURPOSE: Given a list of values, compare each value against a threshold
INPUTS
vals list of ints or floats
thresh int or float
OUTPUT
bools list of booleans
File: /tmp/ipykernel_133200/392258855.py
Type: function
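Since a docstring can also be a single line, here is a minimal sketch of that form. The function `square` is a hypothetical example, not part of the material above.

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x

# The docstring is stored on the function object itself.
print(square.__doc__)  # Return x multiplied by itself.
```

A one-line docstring is usually enough when the name and signature already make the purpose clear.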
Let’s use, or “call,” our function.
The function body uses a list comprehension to compare each value against the threshold, producing a list of booleans:
[val >= thresh for val in vals]
Validate that it works for integers:
x = [3, 4]
thr = 4
vals_greater_than_or_equal_to_threshold(x, thr)
[False, True]
Validate that it works for floats:
x = [3.0, 4.2]
thr = 4.2
vals_greater_than_or_equal_to_threshold(x, thr)
[False, True]
This gives correct results and does exactly what we want.
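One way to make such checks repeatable is to write them as assert statements. The sketch below redefines the function with a shortened docstring so that it runs on its own; the mixed-type and empty-list cases are additional assumptions about how the function should behave, not cases tested above.

```python
def vals_greater_than_or_equal_to_threshold(vals, thresh):
    """Compare each value in vals against thresh; return a list of booleans."""
    return [val >= thresh for val in vals]

# Each assert passes silently when the condition holds and raises
# AssertionError otherwise.
assert vals_greater_than_or_equal_to_threshold([3, 4], 4) == [False, True]
assert vals_greater_than_or_equal_to_threshold([3.0, 4.2], 4.2) == [False, True]

# Extra cases (assumed behavior): mixed ints and floats, and an empty list.
assert vals_greater_than_or_equal_to_threshold([1, 2.5], 2) == [False, True]
assert vals_greater_than_or_equal_to_threshold([], 0) == []

print("all checks passed")
```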
Functions may take \(0\) or more arguments, also called parameters.
Functions need to be called with the correct number of arguments.
The function below requires two arguments, but the function call supplies only one.
def func_with_args(x, y):
    return x + y
func_with_args(10)
TypeError: func_with_args() missing 1 required positional argument: 'y'
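To see the error without stopping the program, the failing call can be wrapped in try/except. This is only a sketch; the variable names `message` and `total` are chosen for illustration.

```python
def func_with_args(x, y):
    return x + y

try:
    func_with_args(10)  # one argument short
except TypeError as err:
    message = str(err)
    print("call failed:", message)

total = func_with_args(10, 5)  # correct number of arguments
print(total)  # 15
```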
When calling a function, parameter order matters.
def fcn_swapped_args(x, y):
    out = 5 * x + y
    return out
x = 1
y = 2
fcn_swapped_args(x, y)
7
fcn_swapped_args(y, x)
11
Generally it’s best to keep parameters in order.
However, you can swap the order by putting the parameter names in the function call.
fcn_swapped_args(y=y, x=x)
Note that the same name can be used for the parameter names and the variables passed to them.
The names themselves have nothing to do with each other!
In other words, just because a function names an argument foo, the variables passed to it don’t have to be named foo or anything like it.
They can even be named the same thing; it does not matter.
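The sketch below illustrates both points: positional order determines which value goes to which parameter, while keyword arguments bind by name regardless of order. The variable names `a` and `b` are hypothetical and deliberately unrelated to the parameter names.

```python
def fcn_swapped_args(x, y):
    return 5 * x + y

a = 1  # hypothetical names, unrelated to the parameters x and y
b = 2

r1 = fcn_swapped_args(a, b)      # positional: x=1, y=2
r2 = fcn_swapped_args(y=b, x=a)  # keywords: still x=1, y=2
r3 = fcn_swapped_args(b, a)      # positional, swapped: x=2, y=1

print(r1, r2, r3)  # 7 7 11
```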
Use default arguments to set the value of arguments.
This allows users to call functions with fewer (or no) arguments.
Defaults are often set for the most common cases.
def show_results(precision, printing=True):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision
pr = 0.912
res = show_results(pr)
precision = 0.91
The function call didn’t specify printing, so it defaulted to True.
foo = show_results(pr, False)
NOTE: Default arguments must follow non-default arguments in a function definition.
This causes trouble:
def show_results(precision, printing=True, uhoh):
    precision = round(precision, 2)
    if printing:
        print('precision =', precision)
    return precision
SyntaxError: non-default argument follows default argument (830346004.py, line 1)
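A definition that Python accepts places the non-default parameter first. The sketch below is one possible fix, assuming `uhoh` is just a placeholder parameter; how it is used in the print statement is invented for illustration.

```python
def show_results(precision, uhoh, printing=True):
    """uhoh is a placeholder parameter carried over from the failing example."""
    precision = round(precision, 2)
    if printing:
        print('precision =', precision, '| uhoh =', uhoh)
    return precision

res = show_results(0.912, 'extra')                    # printing defaults to True
quiet = show_results(0.912, 'extra', printing=False)  # prints nothing
```

Passing `printing=False` by keyword makes the call easier to read than a bare `False`.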
Functions are not required to have a return statement, but it is a good idea to have one.
If there is no return statement, a function returns None.
Functions can return no value (None), one value, or many.
Many values are returned as a tuple.
Any Python object can be returned.
This returns None.
def fcn_nothing_to_return(x, y):
    out = 'nothing to see here!'
    print(out)
fcn_nothing_to_return(x, y)
nothing to see here!
r = fcn_nothing_to_return(1, 1)
nothing to see here!
r
print(r)
None
This returns three values.
def negate_coords(x, y, z):
    return -x, -y, -z
a, b, c = negate_coords(10, 20, 30)
a, b, c
(-10, -20, -30)
foo = negate_coords(10, 20, 30)
foo
(-10, -20, -30)
If you don’t need an output, use the dummy variable _.
d, e, _ = negate_coords(10, 20, 30)
d, e
(-10, -20)
Note: It’s generally a good idea to include return statements, even if not returning a value.
This shows that you did not forget to consider the return value.
You can use return or return None.
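As a brief sketch of the two forms, the hypothetical functions below both print a message and return None, one explicitly and one with a bare return.

```python
def log_message(msg):
    print(msg)
    return None  # explicit: there is nothing meaningful to return

def log_message_bare(msg):
    print(msg)
    return       # a bare return also returns None

r1 = log_message("hello")
r2 = log_message_bare("hello")
print(r1 is None, r2 is None)  # True True
```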
Functions can contain multiple return statements.
These may be used under different logical conditions.
def absolute_value(num):
    if num >= 0:
        return num
    return -num
absolute_value(-4), absolute_value(4)
(4, 4)
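The same pattern extends to more than two branches. The function `sign_label` below is a hypothetical example with three return statements, each reached under a different condition.

```python
def sign_label(num):
    if num > 0:
        return 'positive'
    if num < 0:
        return 'negative'
    return 'zero'  # reached only when num == 0

labels = [sign_label(n) for n in (-4, 0, 4)]
print(labels)  # ['negative', 'zero', 'positive']
```

Execution stops at the first return statement reached, so the later checks never run once a condition matches.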
*args
The * prefix on a parameter name lets a function collect any number of positional arguments, so they do not have to be specified individually.
def show_arg_expansion1(models):
    print(models)
Without the prefix, passing several values to a function that expects a single argument fails:
show_arg_expansion1("logreg", "naive_bayes", "gbm")
TypeError: show_arg_expansion1() takes 1 positional argument but 3 were given
def show_arg_expansion2(*models):
    print(models)
show_arg_expansion2("logreg", "naive_bayes", "gbm")
('logreg', 'naive_bayes', 'gbm')
This also allows for an unspecified number of arguments, which is how print() works.
You can also pass a list to the function.
If you want the elements unpacked, put * before the list.
models = ["logreg", "naive_bayes", "gbm"]
show_arg_expansion2(*models)
('logreg', 'naive_bayes', 'gbm')
This approach allows your function to accept an arbitrary number of arguments.
Note you can prefix a string with an asterisk *:
show_arg_expansion2(*'abcdefg')
('a', 'b', 'c', 'd', 'e', 'f', 'g')
Or a string operation that returns a list:
show_arg_expansion2(*'a b c d e f g'.split())
('a', 'b', 'c', 'd', 'e', 'f', 'g')
You can use the * prefix to pass list-like objects to a function with a defined number of arguments.
def arg_expansion_example(x, y):
    return x**y
my_args = [2, 8]
arg_expansion_example(*my_args)
256
But the passed object must be the right length.
my_args2 = [2, 8, 5]
arg_expansion_example(*my_args2)
TypeError: arg_expansion_example() takes 2 positional arguments but 3 were given
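A * parameter can also follow ordinary parameters, in which case it collects only the extra positional arguments. The function `scale_all` below is a hypothetical sketch of that combination.

```python
def scale_all(factor, *values):
    """Multiply every extra positional argument by factor."""
    return [factor * v for v in values]

no_values = scale_all(2)         # values is an empty tuple
scaled = scale_all(2, 1, 2, 3)   # values is (1, 2, 3)

nums = [10, 20]
unpacked = scale_all(3, *nums)   # the list is unpacked into values

print(no_values)  # []
print(scaled)     # [2, 4, 6]
print(unpacked)   # [30, 60]
```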
A function is not just a bag of code!
Design a function to do one thing.
Make them as simple as possible. This makes them:
This helps avoid situations where a team has 20 variations of similar functions.
Give your function a good name.
compute_variances_sort_save_print suggests the function is overworked!
If the function compute_variances also produces plots and updates variables, it will cause confusion.
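One way to apply this advice is to keep the computation and the reporting in separate functions. The sketch below is only an illustration: `print_variances` and the sample data are hypothetical, and statistics.variance (the sample variance from the standard library) stands in for whatever calculation a real project would use.

```python
from statistics import variance

def compute_variances(columns):
    """Return the sample variance of each list in columns."""
    return [variance(col) for col in columns]

def print_variances(variances):
    """Print one line per variance; reporting only, no computation."""
    for i, v in enumerate(variances):
        print(f"column {i}: variance = {v}")

data = [[1, 2, 3], [2, 4, 6]]  # made-up data for illustration
variances = compute_variances(data)
print_variances(variances)
```

With this split, compute_variances does exactly what its name says, and the printing can be changed or skipped without touching the calculation.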
Always give your function a docstring.
You may be interested to learn some of the formatting languages that have been developed to write docstrings. See Lutz 2019 and this web page about Documenting Python Code for more info.