NB: Comprehensions

Programming for Data Science

List Comprehensions

Consider the following task.

Check if each integer in a list is odd and save the results (True or False) in a list.

With a standard for loop, you could do this:

vals = [1,5,6,8,12,15]
is_odd = []
for val in vals:   
    if val % 2:
        is_odd.append(True)
    else:       
        is_odd.append(False)
is_odd
[True, True, False, False, False, True]

Now let’s do the same thing with a list comprehension:

is_odd_comp = [val % 2 == 1 for val in vals]
is_odd_comp
[True, True, False, False, False, True]

Much shorter, and once you understand the syntax, quicker to interpret.

Here’s how you might save all the odd numbers in a list:

odd_vals = [val for val in vals if val % 2 == 1]
odd_vals
[1, 5, 15]

This shows that a comprehension may end with a boolean condition that filters which elements are included in the result.

Comprehensions in General

Comprehensions provide a concise way to iterate over any iterable object and build a new collection from its elements.

Python provides comprehensions for three built-in collection types:

  • List comprehensions
  • Dictionary comprehensions
  • Set comprehensions

Note that there is no tuple comprehension: wrapping the same syntax in parentheses produces a generator expression instead.
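To illustrate: comprehension syntax inside parentheses yields a generator expression, and passing a generator expression to tuple() is the usual way to build a tuple. A minimal sketch, reusing vals from the first example:

```python
vals = [1, 5, 6, 8, 12, 15]

# Parentheses create a generator expression, not a tuple
gen = (val * 2 for val in vals)
print(type(gen).__name__)  # generator

# Passing a generator expression to tuple() builds a tuple
doubled = tuple(val * 2 for val in vals)
print(doubled)  # (2, 10, 12, 16, 24, 30)
```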

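Comprehensions are essentially concise for loops that address the use case of transforming one iterable into another.

They are also often faster than an equivalent loop that calls append, although the difference depends on the Python version and on how much work is done per element.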
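One rough way to check the speed claim on your own machine is the standard library's timeit module. This sketch times the loop and comprehension versions of the odd-number check on a larger, made-up list; the absolute numbers will vary by machine and Python version, so no specific timings are shown.

```python
import timeit

# A larger illustrative input so the timing difference is measurable
vals = list(range(1000))

def with_loop():
    # Build the result by calling append on each iteration
    result = []
    for val in vals:
        result.append(val % 2 == 1)
    return result

def with_comprehension():
    # Build the same result with a list comprehension
    return [val % 2 == 1 for val in vals]

# Both approaches produce identical lists
assert with_loop() == with_comprehension()

# Run each function 1000 times and report total seconds
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```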

All comprehensions have the form below, where the condition is optional:

listlike_result = [expression + context + condition]

For example, in the comprehension above we can see these parts by breaking up the code into three lines:

odd_vals = [
    val 
    for val in vals 
    if val % 2 == 1
]

Note that this is valid syntax: Python allows line breaks inside brackets.

The type of comprehension is indicated by the enclosing brackets, just as with literal constructors:

  • List comprehensions [expression + context + condition]
  • Dictionary comprehensions {key: value + context + condition}
  • Set comprehensions {expression + context + condition}
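Here is a small sketch of all three kinds side by side. It uses a hypothetical list nums that contains repeated values, so the difference between the list and set results is visible.

```python
# Hypothetical input with repeated values (5 and 1 appear twice)
nums = [1, 5, 6, 8, 12, 15, 5, 1]

# List comprehension: square brackets, keeps order and duplicates
squares_list = [n ** 2 for n in nums]

# Set comprehension: braces with a single expression, duplicates removed
squares_set = {n ** 2 for n in nums}

# Dictionary comprehension: braces with a key: value expression
squares_dict = {n: n ** 2 for n in nums}

print(squares_list)  # [1, 25, 36, 64, 144, 225, 25, 1]
print(squares_set == {1, 25, 36, 64, 144, 225})  # True
print(squares_dict)  # {1: 1, 5: 25, 6: 36, 8: 64, 12: 144, 15: 225}
```

Because sets and dicts share braces, an empty pair {} creates an empty dict; use set() for an empty set.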

Parts:

  • Expression defines what to do with each element of the iterable.

    • This can be a complex expression, or it may not include the iterated value at all.

    • For dictionaries, the expression is compound; it must be a key/value pair.

  • Context (the for ... in ... clause) defines the iterable to loop over and the variable each element is bound to.

  • Condition (optional) defines a boolean test on the iterated value that determines whether it is included in the result.
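A few sketches of how the expression part can vary, using vals from the opening example: a conditional expression that changes the output value, an expression that ignores the iterated value, and an expression combined with a filtering condition.

```python
vals = [1, 5, 6, 8, 12, 15]

# A conditional expression in the expression part: every element is kept,
# but the value produced depends on the element
labels = ['odd' if val % 2 == 1 else 'even' for val in vals]

# An expression that ignores the iterated value entirely
zeros = [0 for _ in vals]

# Expression plus condition: square only the odd values
odd_squares = [val ** 2 for val in vals if val % 2 == 1]

print(labels)       # ['odd', 'odd', 'even', 'even', 'even', 'odd']
print(zeros)        # [0, 0, 0, 0, 0, 0]
print(odd_squares)  # [1, 25, 225]
```

Note the placement: an if ... else inside the expression transforms every element, while a trailing if (the condition) drops elements.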

Note that you can include comprehensions within comprehensions.

You can also include multiple context and condition clauses.
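A short sketch of both ideas with illustrative data: a comprehension nested in the expression part, and comprehensions with several for and if clauses. The clauses are evaluated left to right, just like nested for loops written in the same order.

```python
# Nested comprehension: the expression is itself a comprehension,
# building a 3x3 multiplication table as a list of lists
table = [[row * col for col in range(1, 4)] for row in range(1, 4)]
print(table)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# Two context clauses: the first for is the outer loop,
# so this flattens the table row by row
flat = [x for row in table for x in row]
print(flat)  # [1, 2, 3, 2, 4, 6, 3, 6, 9]

# Context and condition clauses interleaved: even a from 0..4,
# then b from 0..4 with b greater than a
pairs = [(a, b) for a in range(5) if a % 2 == 0 for b in range(5) if b > a]
print(pairs)  # [(0, 1), (0, 2), (0, 3), (0, 4), (2, 3), (2, 4)]
```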

Examples

Removing Stopwords

Define a sentence and a list of stop words.

Filter out the stop words, which are common words considered unimportant.

sentence = "I am not a fan of this film"
stop_words = ['a','am','an','i','the','of']
clean_words = [word for word in sentence.split() if word.lower() not in stop_words]
clean_words
['not', 'fan', 'this', 'film']

Here is the same list comprehension with its three parts (expression, context, condition) on separate lines:

[word
 for word in sentence.split()
 if word.lower() not in stop_words]

Side note: This task can also be done with sets, if you are not concerned with multiple instances of the same word or with word order (sets are unordered):

s1 = set(stop_words)
s2 = set(sentence.lower().split())
s3 = s2 - s1
s3
{'fan', 'film', 'not', 'this'}
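A middle ground between the two approaches, sketched here: keep the list comprehension so that word order and repeated words survive, but test membership against a set built from stop_words. Set lookups take roughly constant time on average while a list is scanned element by element, so this mainly pays off for long stop-word lists; for six words the difference is negligible.

```python
sentence = "I am not a fan of this film"
stop_words = ['a', 'am', 'an', 'i', 'the', 'of']

# Convert the stop words to a set once, before the comprehension runs
stop_set = set(stop_words)

# The list comprehension still preserves word order and duplicates
clean_words = [word for word in sentence.split() if word.lower() not in stop_set]
print(clean_words)  # ['not', 'fan', 'this', 'film']
```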

Selecting Tokens Containing Units

Given a list of measurements, retain the elements containing mmHg (millimeters of mercury).

units = 'mmHg'
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
measures_mmhg = [measure for measure in measures if units in measure]
measures_mmhg   
['115mmHg', '120 mmHg']
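The expression part can also transform each kept element. This sketch filters on the unit and converts the matches to numbers, assuming each matching string is just a number followed by the unit (a string like "about 120mmHg" would make float() raise a ValueError).

```python
units = 'mmHg'
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']

# Filter on the unit, then transform each kept string into a number:
# remove the unit text, strip surrounding spaces, and convert to float
mmhg_values = [float(measure.replace(units, '').strip())
               for measure in measures if units in measure]
print(mmhg_values)  # [115.0, 120.0]
```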

Filtering on two conditions

units1 = 'mmHg'
units2 = 'dl'
meas_mmhg_dl = [meas for meas in measures if units1 in meas or units2 in meas]
meas_mmhg_dl
['115mmHg', '7.5dl', '120 mmHg']

The same comprehension can be split across lines for readability:

[meas 
 for meas in measures 
 if units1 in meas 
 or units2 in meas]
['115mmHg', '7.5dl', '120 mmHg']
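When the number of units grows, chaining or conditions gets unwieldy. One alternative, sketched here with a hypothetical units_list tuple, is to put the units in a collection and use the built-in any() with a generator expression inside the condition:

```python
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']

# Hypothetical tuple of units to match; add more entries as needed
units_list = ('mmHg', 'dl')

# any() returns True as soon as one unit is found in the string
meas_any = [meas for meas in measures if any(unit in meas for unit in units_list)]
print(meas_any)  # ['115mmHg', '7.5dl', '120 mmHg']
```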

Dictionary Comprehensions

Dictionary comprehensions provide a concise way to iterate over an iterable, often another dictionary, and build a new dictionary.

This is common when data is structured as key-value pairs, and we’d like to filter the dict.

Here we define various deep learning models and their depths (in layers).

model_arch = {'cnn_1': 15, 'cnn_2': 20, 'rnn': 10}

We use a comprehension to create a new dict containing only the key-value pairs whose key contains the string 'cnn'.

cnns = {key:model_arch[key] for key in model_arch.keys() if 'cnn' in key}
cnns
{'cnn_1': 15, 'cnn_2': 20}

We build the key-value pairs using key:model_arch[key], where the key indexes into the dict model_arch.
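An equivalent and common alternative is to iterate over model_arch.items(), which yields (key, value) pairs directly, so there is no need to index back into the dict. This also makes it easy to filter on the value instead of the key:

```python
model_arch = {'cnn_1': 15, 'cnn_2': 20, 'rnn': 10}

# Unpack each (key, value) pair instead of indexing with model_arch[key]
cnns = {key: value for key, value in model_arch.items() if 'cnn' in key}
print(cnns)  # {'cnn_1': 15, 'cnn_2': 20}

# The condition can test the value: keep models with fewer than 20 layers
shallow = {key: value for key, value in model_arch.items() if value < 20}
print(shallow)  # {'cnn_1': 15, 'rnn': 10}
```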