NB: On Comprehensions

Purpose

  • Explain the benefit of list comprehensions
  • Illustrate the use of list comprehensions
  • Explain the benefit of dict comprehensions
  • Illustrate the use of dict comprehensions

Concepts

  • list comprehension
  • dict comprehension
  • iterators

List Comprehensions

Consider this task: check if each integer in a list is odd.

Without list comprehensions, you might do this:

Check if Odd

vals = [1,5,6,8,12,15]
is_odd = []

for val in vals:   
    if val % 2: # if remainder is one, val is odd
        is_odd.append(True)
    else:       # else it's not odd
        is_odd.append(False)

is_odd
[True, True, False, False, False, True]

The code loops over each value in the list, checks the condition, and appends to a new list.

The code works, but it’s lengthy compared to a list comprehension.

The approach takes extra time to write and understand.

Let’s solve with a list comprehension:

is_odd = [val % 2 == 1 for val in vals]
is_odd
[True, True, False, False, False, True]

Much shorter and, once you understand the syntax, quicker to interpret.

Note that the test val % 2 == 1 is written directly inside the comprehension, so no if/else block or append call is needed.

Now let’s discuss the syntax.

Comprehensions in General

Comprehensions provide a concise way to iterate over any iterable object and build a new collection from it.

There is a comprehension for most of the built-in collection types:

  • List comprehensions
  • Dictionary comprehensions
  • Set comprehensions
  • Generator expressions (sometimes loosely called tuple comprehensions; they produce a lazy generator, not a tuple)

Comprehensions are essentially very concise for loops. They are visually compact, and they are often faster than an equivalent loop that calls append, although the gain depends on the workload.
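The efficiency claim can be checked informally with the standard-library timeit module. The sketch below is only illustrative: the helper names odd_with_loop and odd_with_comprehension and the input size are assumptions made for this example, and the timings you see will depend on your machine and Python version.

```python
import timeit

vals = list(range(1000))  # assumed input size, chosen only for illustration

def odd_with_loop(values):
    """Build the list with an explicit for loop and append."""
    result = []
    for val in values:
        result.append(val % 2 == 1)
    return result

def odd_with_comprehension(values):
    """Build the same list with a list comprehension."""
    return [val % 2 == 1 for val in values]

# Both approaches must produce identical results.
same_result = odd_with_loop(vals) == odd_with_comprehension(vals)

# Time each approach over the same number of repetitions.
loop_time = timeit.timeit(lambda: odd_with_loop(vals), number=200)
comp_time = timeit.timeit(lambda: odd_with_comprehension(vals), number=200)

print(f"same result: {same_result}")
print(f"loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s")
```

Treat the printed numbers as a rough indication rather than a benchmark; for small inputs the difference is usually negligible.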

All comprehensions have the form:

listlike_result = [expression + context]

The type of comprehension is indicated by the enclosing pair of delimiters, just like the literal syntax for each type:

  • List comprehensions [expression + context]
  • Dictionary comprehensions {key: value + context}
  • Generator expressions (expression + context); wrap the result in tuple(...) to get a tuple
  • Set comprehensions {expression + context}

Expression defines what to do with each selected element. Its form depends on the kind of comprehension: dictionary comprehension expressions take the form k:v, while list and set comprehensions use a single value v.

Context defines which elements to select. It always starts with a for clause, which may be followed by any number of additional for and if clauses.
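As a small sketch of the bracket styles and of a context with more than one clause, the example below uses made-up data (nums, squares_list, and the other names are illustrative only). Note that parentheses produce a generator rather than a tuple, so the example passes it to tuple().

```python
nums = [1, 2, 3, 4]

squares_list = [n * n for n in nums]     # list comprehension
squares_dict = {n: n * n for n in nums}  # dict comprehension uses key: value
parities_set = {n % 2 for n in nums}     # set comprehension, duplicates collapse
squares_gen = (n * n for n in nums)      # parentheses give a lazy generator
squares_tuple = tuple(squares_gen)       # consume the generator into a tuple

# A context with two for clauses and one if clause:
# pair every x with every y, keeping only pairs whose sum is even.
even_sum_pairs = [(x, y) for x in [1, 2] for y in [3, 4] if (x + y) % 2 == 0]

print(squares_list)    # [1, 4, 9, 16]
print(squares_dict)    # {1: 1, 2: 4, 3: 9, 4: 16}
print(parities_set)    # {0, 1}
print(squares_tuple)   # (1, 4, 9, 16)
print(even_sum_pairs)  # [(1, 3), (2, 4)]
```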

More examples

Stop Word Remover

Create a list of words and a list of stop words.
Filter out the stop words (words considered unimportant).

stop_words = ['a','am','an','i','the','of']
words      = ['i','am','not','a','fan','of','the','film']

clean_words = [wd for wd in words if wd not in stop_words]
clean_words
['not', 'fan', 'film']

Breaking down the list comprehension:

[ wd for wd in words if wd not in stop_words]

  • the expression is very simple: wd. It keeps the word unchanged.
  • the condition does the work: a word is kept only if it is not in the list of stop words

Side note: This task can also be done with sets, if you are not concerned with multiple instances of the same word or with word order:

s1 = set(stop_words)
s2 = set(words)
s3 = s2 - s1
s3
{'fan', 'film', 'not'}
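To make the trade-off concrete, the sketch below reuses the stop-word idea with a word list that contains repeated words (the list words_with_repeat is invented for this example). The comprehension keeps order and repeats, while the set difference keeps neither. Converting the stop words to a set, as in stop_set, is a common way to speed up the membership test without changing the result.

```python
stop_words = ['a', 'am', 'an', 'i', 'the', 'of']
words_with_repeat = ['the', 'film', 'of', 'the', 'film', 'fan']  # hypothetical input

# Set of stop words: same membership test, typically faster for long lists.
stop_set = set(stop_words)

# The comprehension preserves the original order and repeated words.
kept_in_order = [wd for wd in words_with_repeat if wd not in stop_set]

# Set difference collapses repeats and has no defined order.
kept_unique = set(words_with_repeat) - stop_set

print(kept_in_order)        # ['film', 'film', 'fan']
print(sorted(kept_unique))  # ['fan', 'film']
```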

Select Tokens Containing Units

Given a list of measurements, retain the elements containing mmHg (millimeters of mercury).

units = 'mmHg'
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
meas_mmhg = [meas for meas in measures if units in meas]
meas_mmhg   
['115mmHg', '120 mmHg']

Filtering on two conditions

units1 = 'mmHg'
units2 = 'dl'
meas_mmhg_dl = [meas for meas in measures if units1 in meas or units2 in meas]
meas_mmhg_dl
['115mmHg', '7.5dl', '120 mmHg']

This can be written differently for clarity:

[meas 
 for meas in measures 
 if units1 in meas 
 or units2 in meas]
['115mmHg', '7.5dl', '120 mmHg']
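When the number of units grows, chaining or conditions becomes unwieldy. One possible alternative, sketched below, is to keep the units in a list (here called wanted_units, a name introduced for this example) and test them with the built-in any().

```python
measures = ['20', '115mmHg', '5mg', '10 mg', '7.5dl', '120 mmHg']
wanted_units = ['mmHg', 'dl']  # hypothetical list of units to keep

# Keep a measurement if at least one wanted unit appears in it.
meas_selected = [
    meas
    for meas in measures
    if any(unit in meas for unit in wanted_units)
]

print(meas_selected)  # ['115mmHg', '7.5dl', '120 mmHg']
```

Adding another unit then only requires extending wanted_units; the comprehension itself stays the same.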

Dictionary Comprehensions

Dictionary comprehensions provide a concise way to build a new dictionary, often by iterating over an existing one.

This is common when data is structured as key-value pairs, and we’d like to filter the dict.

# various deep learning models and their depths

model_arch = {'cnn_1':'15 layers', 'cnn_2':'20 layers', 'rnn': '10 layers'}
# create a new dict containing only key-value pairs where the key contains 'cnn'

cnns = {key:model_arch[key] for key in model_arch.keys() if 'cnn' in key}
cnns
{'cnn_1': '15 layers', 'cnn_2': '20 layers'}

We build the key-value pairs using key:model_arch[key], where the key indexes into the dict model_arch to look up its value.
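An equivalent way to write this filter, shown here as a sketch, iterates over key-value pairs with .items() so that no second lookup is needed. The second comprehension also transforms the values; it assumes every value has the form '<number> layers', which holds for this example but is not guaranteed in general. The names cnns_items and layer_counts are introduced only for this illustration.

```python
model_arch = {'cnn_1': '15 layers', 'cnn_2': '20 layers', 'rnn': '10 layers'}

# Same filter as before, unpacking each (key, value) pair directly.
cnns_items = {key: val for key, val in model_arch.items() if 'cnn' in key}

# Transform values as well: parse the leading integer from '<number> layers'.
layer_counts = {key: int(val.split()[0]) for key, val in model_arch.items()}

print(cnns_items)    # {'cnn_1': '15 layers', 'cnn_2': '20 layers'}
print(layer_counts)  # {'cnn_1': 15, 'cnn_2': 20, 'rnn': 10}
```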