NB: Comprehensions
Programming for Data Science
List Comprehensions
Consider the following task.
Check if each integer in a list is odd and save the results (true or false) in a list.
With a standard for loop, you could do this:
vals = [1,5,6,8,12,15]
is_odd = []
for val in vals:
    if val % 2:
        is_odd.append(True)
    else:
        is_odd.append(False)
is_odd
[True, True, False, False, False, True]
Now let’s do the same thing with a list comprehension:
is_odd_comp = [val % 2 == 1 for val in vals]
is_odd_comp
[True, True, False, False, False, True]
This is much shorter and, once you understand the syntax, quicker to interpret.
Here’s how you might save all the odd numbers in a list:
odd_vals = [val for val in vals if val % 2 == 1]
odd_vals
[1, 5, 15]
This shows that a comprehension may include a boolean condition that filters what gets included in the result.
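A filtering condition is easy to confuse with a conditional expression, so here is a small sketch contrasting the two. The names odd_only and labels are illustrative and do not appear in the original notebook.

```python
vals = [1, 5, 6, 8, 12, 15]

# Filter: a trailing `if` drops the elements that fail the test.
odd_only = [val for val in vals if val % 2 == 1]

# Conditional expression: every element is kept, but each one is
# mapped to one of two values. Note the `if ... else` comes first.
labels = ['odd' if val % 2 == 1 else 'even' for val in vals]

print(odd_only)  # [1, 5, 15]
print(labels)    # ['odd', 'odd', 'even', 'even', 'even', 'odd']
```

The filter changes the length of the result; the conditional expression does not.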
Comprehensions in General
Comprehensions provide a concise method for iterating over any iterable object to produce a new iterable object.
There are comprehensions for three of the built-in collection types:
- List comprehensions
- Dictionary comprehensions
- Set comprehensions
Note there is no tuple comprehension.
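As a brief illustration of what happens instead: wrapping comprehension syntax in parentheses produces a generator expression, and a tuple can be built by passing one to tuple(). The variable names below are illustrative.

```python
vals = [1, 5, 6, 8, 12, 15]

# Parentheses create a generator expression, not a tuple.
squares_gen = (val ** 2 for val in vals)
print(type(squares_gen).__name__)  # generator

# To get a tuple, pass a generator expression to tuple().
squares_tuple = tuple(val ** 2 for val in vals)
print(squares_tuple)  # (1, 25, 36, 64, 144, 225)
```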
Comprehensions are essentially concise for loops that address the use case of transforming one iterable into another.
They are also often more efficient than the equivalent loops.
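One way to check the efficiency claim on your own machine is a rough timing comparison with the standard timeit module. This is only a sketch: the function names and the sizes are arbitrary choices, and the timings will differ across machines and Python versions.

```python
import timeit

def with_loop(n=10_000):
    # Build the list by repeated append calls.
    result = []
    for i in range(n):
        result.append(i * 2)
    return result

def with_comprehension(n=10_000):
    # Build the same list in a single comprehension.
    return [i * 2 for i in range(n)]

# Both approaches produce the same list.
same_result = with_loop() == with_comprehension()
print(same_result)

# Time 100 calls of each; no expected values are shown because they vary.
loop_time = timeit.timeit(with_loop, number=100)
comp_time = timeit.timeit(with_comprehension, number=100)
print(f"loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s")
```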
All comprehensions have the form:
listlike_result = [ expression + context + condition]
For example, in the comprehension above we can see these parts by breaking up the code into three lines:
odd_vals = [
    val
    for val in vals
    if val % 2 == 1
]
Note that this is syntactically valid, because Python allows line breaks inside brackets.
The type of comprehension is indicated by the use of enclosing pairs, just like anonymous constructors:
- List comprehensions
[expression + context + condition]
- Dictionary comprehensions
{expression + context + condition}
- Set comprehensions
{expression + context + condition}
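To see the three forms side by side, here is a short sketch that applies each to the same input. The word list and variable names are made up for illustration.

```python
words = ['apple', 'bob', 'apple', 'cat']

# Square brackets: a list, one entry per input element.
lengths_list = [len(w) for w in words]

# Braces with key: value: a dict, so repeated keys collapse.
lengths_dict = {w: len(w) for w in words}

# Braces with a single expression: a set, so repeated values collapse.
lengths_set = {len(w) for w in words}

print(lengths_list)         # [5, 3, 5, 3]
print(lengths_dict)         # {'apple': 5, 'bob': 3, 'cat': 3}
print(sorted(lengths_set))  # [3, 5]
```

The dictionary and set forms share the same braces; the presence of a key: value pair in the expression is what makes the result a dictionary.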
Parts:
Expression defines what to do with each element in the list.
This can be a complex expression, or it may not include the iterated value at all.
For dictionaries, the expression is actually complex; it must be a key/value pair.
Context defines which iterable elements to select.
Condition defines a boolean condition on the iterated value that determines if it gets included in the result.
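The points about the expression can be sketched as follows: it may ignore the iterated value, it may be more involved than a bare name, and for a dictionary it is a key: value pair. All variable names here are illustrative.

```python
vals = [1, 5, 6, 8, 12, 15]

# The expression need not use the iterated value at all.
zeros = [0 for _ in vals]

# The expression can be more complex than a bare variable.
described = [f"{val} squared is {val ** 2}" for val in vals if val < 6]

# For a dict comprehension the expression is a key: value pair.
parity = {val: val % 2 == 1 for val in vals}

print(zeros)      # [0, 0, 0, 0, 0, 0]
print(described)  # ['1 squared is 1', '5 squared is 25']
print(parity)     # {1: True, 5: True, 6: False, 8: False, 12: False, 15: True}
```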
Note that you can include comprehensions within comprehensions.
And you can include multiple context + condition statements.
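Both ideas can be sketched on a small nested list. The matrix and the variable names are invented for this example; the clauses in a multi-clause comprehension are read left to right, like nested for loops.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# A comprehension within a comprehension: transpose the matrix.
transposed = [[row[i] for row in matrix] for i in range(3)]

# Two context clauses and one condition: flatten, keeping even values.
flat_even = [x for row in matrix for x in row if x % 2 == 0]

# Two context + condition pairs in a single comprehension.
pairs = [(x, y)
         for x in range(4) if x % 2 == 0
         for y in range(3) if y > x]

print(transposed)  # [[1, 4], [2, 5], [3, 6]]
print(flat_even)   # [2, 4, 6]
print(pairs)       # [(0, 1), (0, 2)]
```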
Examples
Removing Stopwords
Define a sentence and a list of stop words.
Filter out the stop words (considered not important).
sentence = "I am not a fan of this film"
stop_words = ['a','am','an','i','the','of']
clean_words = [word for word in sentence.split() if word.lower() not in stop_words]
clean_words
['not', 'fan', 'this', 'film']
Here is the same list comprehension with its parts labeled:
[word                                  # expression
 for word in sentence.split()          # context
 if word.lower() not in stop_words]    # condition
Side note: This task can also be done with sets, if you are not concerned with multiple instances of the same word or with word order:
s1 = set(stop_words)
s2 = set(sentence.lower().split())
s3 = s2 - s1
s3
{'fan', 'film', 'not', 'this'}
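The trade-off is easiest to see with a sentence that repeats a word. The sentence below is made up for illustration; the list comprehension keeps duplicates and order, while the set difference keeps neither.

```python
sentence = "the film is not the film I am a fan of"
stop_words = ['a', 'am', 'an', 'i', 'the', 'of']

# List comprehension: preserves order and repeated words.
kept_list = [w for w in sentence.split() if w.lower() not in stop_words]

# Set difference: each word appears at most once, in no particular order.
kept_set = set(sentence.lower().split()) - set(stop_words)

print(kept_list)         # ['film', 'is', 'not', 'film', 'fan']
print(sorted(kept_set))  # ['fan', 'film', 'is', 'not']
```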
Selecting Tokens Containing Units
Given a list of measurements, retain the elements containing mmHg (millimeters of mercury).
units = 'mmHg'
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
measures_mmhg = [measure for measure in measures if units in measure]
measures_mmhg
['115mmHg', '120 mmHg']
Filtering on Two Conditions
units1 = 'mmHg'
units2 = 'dl'
meas_mmhg_dl = [meas for meas in measures if units1 in meas or units2 in meas]
meas_mmhg_dl
['115mmHg', '7.5dl', '120 mmHg']
This can be written differently for clarity:
[meas for meas in measures
if units1 in meas
or units2 in meas]
['115mmHg', '7.5dl', '120 mmHg']
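If the number of units grows, chaining or conditions becomes unwieldy. One possible generalization, sketched here with a hypothetical wanted_units list that is not part of the original notebook, is to use the built-in any() inside the condition.

```python
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
wanted_units = ['mmHg', 'dl']  # hypothetical list of units to keep

# Keep a measurement if at least one wanted unit appears in it.
selected = [meas for meas in measures
            if any(unit in meas for unit in wanted_units)]

print(selected)  # ['115mmHg', '7.5dl', '120 mmHg']
```

The argument to any() is itself a generator expression, so this is also an instance of one comprehension-style construct nested inside another.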
Dictionary Comprehensions
Dictionary comprehensions provide a concise method for iterating over a dictionary to create a new dictionary.
This is common when data is structured as key-value pairs, and we’d like to filter the dict.
Here we define various deep learning models and their depths (in layers).
model_arch = {'cnn_1': 15, 'cnn_2': 20, 'rnn': 10}
We use a comprehension to create a new dict containing only key-value pairs where the key contains the string cnn.
cnns = {key:model_arch[key] for key in model_arch.keys() if 'cnn' in key}
cnns
{'cnn_1': 15, 'cnn_2': 20}
We build the key-value pairs using key:model_arch[key], where the key indexes into the dict model_arch.
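An alternative that many Python programmers would consider more idiomatic is to iterate over the dict's items(), which yields each key and value together and avoids the extra lookup. This is a sketch of that variant, not the notebook's own code; the names name, depth, and deep_models are illustrative.

```python
model_arch = {'cnn_1': 15, 'cnn_2': 20, 'rnn': 10}

# Unpack each (key, value) pair directly from items().
cnns = {name: depth for name, depth in model_arch.items() if 'cnn' in name}

# The condition can test the value just as easily as the key.
deep_models = {name: depth for name, depth in model_arch.items() if depth >= 15}

print(cnns)         # {'cnn_1': 15, 'cnn_2': 20}
print(deep_models)  # {'cnn_1': 15, 'cnn_2': 20}
```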