Programming for Data Science
We have seen that sequential data structures like lists and tuples have a natural affinity to loops.
Sequences imply loops and loops expect sequences.
In Python, this relationship is captured by the resonance between the words iteration and iterables.
Iterables are data structures that can be iterated over, meaning they can return their elements one at a time.
Examples of iterable objects include lists, tuples, sets, dictionaries, and strings.
Typically we iterate over iterables using for loops, as we saw when we reviewed control structures.
for
First, let’s review iteration by means of a for loop.
tokens = ['living room', 'was', 'quite', 'large']
for tok in tokens:
    print(tok)
living room
was
quite
large
Python provides a kind of object called an iterator, designed to make iteration (sequence processing) fast and efficient.
An iterator is a specific object that represents a stream of data drawn from an iterable.
It is used to iterate over iterable objects by returning one element at a time and remembering its position.
An iterator is consumed as it goes: once it has produced a value, it will not produce that value again.
This means that when you iterate to the end, the iterator is exhausted, although the underlying iterable (a list, for example) is left unchanged.
This is useful in situations where you want to save memory, because values can be produced one at a time instead of being held in memory all at once.
Many functions in Python return iterators, such as zip(), map(), and enumerate(), so it’s helpful to understand them even if you don’t create any yourself.
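A minimal sketch of how an iterator is consumed, using the same tokens list as the examples here. The names tokens_it, first_pass, and second_pass are hypothetical and chosen only for illustration.

```python
tokens = ['living room', 'was', 'quite', 'large']
tokens_it = iter(tokens)        # hypothetical name for the iterator object

first_pass = list(tokens_it)    # pulls every value out of the iterator
second_pass = list(tokens_it)   # the iterator is exhausted, so nothing is left

print(first_pass)
print(second_pass)
print(tokens)                   # the list the iterator was built from
```

The second pass comes back empty because the iterator has already been used up, while the list itself still holds all four items.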
iter() and next()
To use an iterator, you convert a sequence to an iterator object using iter().
Then you use next() to get the next item from the iterator.
tokens = ['living room','was','quite','large']
myit = iter(tokens)
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
living room
was
quite
large
Calling next() when the iterator has reached the end of the list raises an exception:
next(myit)
StopIteration:
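If you call next() by hand, there are two common ways to cope with the end of the data. The sketch below shows both: catching StopIteration, and passing a default value as the second argument to next(). The names exhausted and fallback are hypothetical.

```python
tokens = ['living room', 'was']
myit = iter(tokens)
next(myit)
next(myit)                      # the iterator is now at the end

# Option 1: catch the exception
try:
    next(myit)
    exhausted = False
except StopIteration:
    exhausted = True

# Option 2: give next() a default to return instead of raising
fallback = next(myit, None)

print(exhausted, fallback)
```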
Note that when used with an iterator created by iter(), for implicitly executes next() on each loop iteration and stops cleanly when StopIteration is raised.
myit = iter(tokens) # Reset the iterator
for next_it in myit:
    print(next_it)
living room
was
quite
large
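To make the implicit behavior of for concrete, here is a rough sketch of an equivalent loop written with iter(), next(), and StopIteration. It is an approximation for illustration, not the interpreter's actual implementation; the collected list is a hypothetical helper used only to record what was seen.

```python
tokens = ['living room', 'was', 'quite', 'large']

collected = []                  # hypothetical helper to record each item
myit = iter(tokens)             # what for does first: ask for an iterator
while True:
    try:
        tok = next(myit)        # what for does on each pass
    except StopIteration:
        break                   # what for does at the end: stop quietly
    collected.append(tok)
    print(tok)
```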
So far, we iterated over a list.
Now let’s look at sets, strings, tuples, dictionaries, and ranges.
Lists, tuples, and strings are sequences. Sequences are designed so that elements come out of them in the same order they were put in.
Sets and dictionaries are not sequences per se, since their elements are accessed by key or by membership rather than by position. They are called collections.
Note that prior to Python 3.7, the order of elements in dictionaries was not guaranteed. Now, dictionaries preserve the order in which they were populated. Sets are still unordered, so the order in which their elements come out should not be relied upon.
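A small sketch of what ordering you can and cannot count on, assuming Python 3.7 or later. Dictionary keys come back in insertion order; for a set, the safe approach when order matters is to sort explicitly. The names keys_in_order and sorted_princesses are hypothetical.

```python
courses = {}
courses['spring'] = ['capstone', 'pyspark', 'nlp']   # inserted first
courses['fall'] = ['regression', 'python']           # inserted second

keys_in_order = list(courses)     # dict keys follow insertion order
print(keys_in_order)

princesses = {'belle', 'cinderella', 'rapunzel'}
sorted_princesses = sorted(princesses)   # impose an order explicitly
print(sorted_princesses)
```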
Iterating over a set using for:
princesses = {'belle', 'cinderella', 'rapunzel'}
for princess in princesses:
    print(princess)
rapunzel
belle
cinderella
Iterating over a set using iter() and next():
princesses_i = iter(princesses)
print(next(princesses_i))
print(next(princesses_i))
print(next(princesses_i))
rapunzel
belle
cinderella
type(princesses_i)
set_iterator
Iterating over a string using for:
str1 = 'data'
for my_char in str1:
    print(my_char)
d
a
t
a
Iterating over a string using iter() and next():
str1_i = iter(str1)
print(next(str1_i))
print(next(str1_i))
print(next(str1_i))
print(next(str1_i))
d
a
t
a
type(str1_i)
str_ascii_iterator
Iterating over a tuple using for:
metrics = ('auc','recall','precision','support')
for met in metrics:
    print(met)
auc
recall
precision
support
Iterating over a tuple using iter() and next():
metrics = ('auc','recall','precision','support')
metrics_i = iter(metrics)
print(next(metrics_i))
print(next(metrics_i))
print(next(metrics_i))
print(next(metrics_i))
auc
recall
precision
support
type(metrics_i)
tuple_iterator
Iterating over a dictionary using for:
courses = {'fall': ['regression','python'], 'spring': ['capstone','pyspark','nlp']}
for k in courses:
    print(k)
fall
spring
for k in courses.keys():
    print(k)
fall
spring
for v in courses.values():
    print(v)
['regression', 'python']
['capstone', 'pyspark', 'nlp']
for k, v in courses.items():
    print(f"{k.upper()}:\t{', '.join(v)}")
FALL: regression, python
SPRING: capstone, pyspark, nlp
for k in courses.keys():
    print(f"{k.upper()}:\t{', '.join(courses[k])}") # index into the dict with the key
FALL: regression, python
SPRING: capstone, pyspark, nlp
Iterating over a range using for:
for i in range(10):
    print(str(i+1).zfill(2), (i+1)**2 * '|')
01 |
02 ||||
03 |||||||||
04 ||||||||||||||||
05 |||||||||||||||||||||||||
06 ||||||||||||||||||||||||||||||||||||
07 |||||||||||||||||||||||||||||||||||||||||||||||||
08 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
09 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
enumerate()
Very often you will want to know which iteration number you are on in a loop.
This can be used to name files or dict keys, for example.
enumerate() returns the index and the element for each iteration; for a dictionary, the element is the key.
courses
{'fall': ['regression', 'python'], 'spring': ['capstone', 'pyspark', 'nlp']}
for i, semester in enumerate(courses):
    course_name = f"{str(i).zfill(2)}_{semester}:\t{'-'.join(courses[semester])}"
    print(course_name)
00_fall: regression-python
01_spring: capstone-pyspark-nlp
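If you would rather count from 1 than from 0, enumerate() accepts a start argument, which avoids writing i+1 by hand. A brief sketch using the same courses dictionary; the labels list is a hypothetical name used only to collect the results.

```python
courses = {'fall': ['regression', 'python'],
           'spring': ['capstone', 'pyspark', 'nlp']}

labels = []                                   # hypothetical collector
for i, semester in enumerate(courses, start=1):
    labels.append(f"{str(i).zfill(2)}_{semester}")

print(labels)
```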
Iterations can be nested, which is very powerful.
This works well with nested data structures, like dictionaries within dictionaries.
This is basically how JSON files are handled, BTW.
Be careful, though: these can get deep and complicated.
for i, semester in enumerate(courses):
    print(f"{i+1}. {semester.upper()}:")
    for j, course in enumerate(courses[semester]):
        print(f"\t{i+1}.{j+1}. {course}")
1. FALL:
1.1. regression
1.2. python
2. SPRING:
2.1. capstone
2.2. pyspark
2.3. nlp
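To illustrate the JSON remark, here is a small sketch that parses a JSON string with the standard json module and walks the result with nested loops. The JSON text simply mirrors the courses dictionary; the names raw and pairs are hypothetical.

```python
import json

# JSON text mirroring the courses dictionary (hypothetical input)
raw = '{"fall": ["regression", "python"], "spring": ["capstone", "pyspark", "nlp"]}'
parsed = json.loads(raw)            # becomes a dict of lists

pairs = []
for semester, names in parsed.items():   # outer loop: dictionary
    for name in names:                   # inner loop: list inside it
        pairs.append((semester, name))

print(pairs)
```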
Use nested loops to get the Cartesian product.
die = range(1,7)
die_rolls = []
for face1 in die:
    for face2 in die:
        die_rolls.append((face1, face2))
print(die_rolls)
[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
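As an alternative to writing the two loops by hand, the standard library's itertools.product produces the same pairs in the same order. This is offered as an optional shortcut, not as a replacement for understanding the nested loops.

```python
from itertools import product

die = range(1, 7)
# repeat=2 pairs the die with itself, like two nested loops
die_rolls = list(product(die, repeat=2))

print(len(die_rolls))
print(die_rolls[:7])
```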
Now get the frequency of die roll sums.
die_roll_sums = {}
for my_die_roll in die_rolls:
    my_die_roll_sum = sum(my_die_roll)
    die_roll_sums[my_die_roll_sum] = die_roll_sums.get(my_die_roll_sum, 0) + 1
for k, v in die_roll_sums.items():
    print(str(k).zfill(2), v, '|' * v)
02 1 |
03 2 ||
04 3 |||
05 4 ||||
06 5 |||||
07 6 ||||||
08 5 |||||
09 4 ||||
10 3 |||
11 2 ||
12 1 |
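The same tally can be sketched with collections.Counter, which handles the "get the current count or start at zero" step for you. This version rebuilds the rolls with itertools.product so that it runs on its own.

```python
from collections import Counter
from itertools import product

die = range(1, 7)
# Count how often each sum occurs across all 36 rolls
die_roll_sums = Counter(sum(roll) for roll in product(die, repeat=2))

for k in sorted(die_roll_sums):
    v = die_roll_sums[k]
    print(str(k).zfill(2), v, '|' * v)
```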