NB: Structures

In contrast to primitive data types, data structures organize values into collections that have certain properties, such as order, mutability, and addressing scheme (e.g. by index).

Lists

A list is an ordered sequence of items.

Each element of a list is associated with an integer that represents the order in which the element appears.

Lists are indexed with brackets [].

List elements are accessed by providing their order number in the brackets.

Lists are mutable, meaning you can modify them after they have been created.

They can contain mixed types.
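As a small illustration of mutability and mixed types, here is a minimal sketch. The list name `items` and its contents are made up for this example and do not come from the notes.

```python
# Hypothetical example list mixing a string, an int, and a float
items = ['a', 2, 3.0]

items[0] = 'z'       # replace an element in place
items.append(True)   # add an element to the end
del items[1]         # remove the element at index 1

print(items)         # ['z', 3.0, True]
```

The same list object is changed each time; no new list is created.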

Constructing

They can be constructed in several ways:

list1 = []
list2 = list()
list3 = "some string".split()
numbers = [1,2,3,4] 
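To see what these constructions produce, the sketch below prints the results. The last two variables (`from_range`, `from_str`) are extra illustrations that are not in the notes; they assume only the built-in `list()` constructor applied to an iterable.

```python
list1 = []
list2 = list()
list3 = "some string".split()   # split on whitespace

# Additional, hypothetical examples: list() accepts any iterable
from_range = list(range(3))
from_str = list("abc")

print(list1 == list2)   # True (both are empty lists)
print(list3)            # ['some', 'string']
print(from_range)       # [0, 1, 2]
print(from_str)         # ['a', 'b', 'c']
```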

Indexing

Zero-based indexing

Python uses zero-based indexing, which means that for a collection mylist:

mylist[0] references the first element
mylist[1] references the second element, etc

For any sequence of length N and any n between 1 and N:
mylist[:n] returns the first n elements, from index 0 to n-1
mylist[-n:] returns the last n elements, from index N-n to N-1
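The two slicing rules can be checked directly. This is a minimal sketch using a made-up five-element list, so N = 5 and n = 2 are assumptions of the example only.

```python
mylist = ['a', 'b', 'c', 'd', 'e']   # hypothetical data
N = len(mylist)                       # 5
n = 2

first_n = mylist[:n]    # indices 0 .. n-1
last_n = mylist[-n:]    # indices N-n .. N-1

print(first_n)                      # ['a', 'b']
print(last_n)                       # ['d', 'e']
print(last_n == mylist[N - n:N])    # True
```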

numbers[0] # Access first element (output: 1)
1
numbers[-1]
4
numbers[0] + numbers[3] # doing arithmetic with the values (output: 5)
5
numbers[len(numbers)]
IndexError: list index out of range

Slicing

numbers[0:2] # Output: [1, 2]
[1, 2]
numbers[1:3] # Output: [2, 3]
[2, 3]
len(numbers) # use len() function to find the size. Output: 4
4
numbers[2:]  # Output: [3, 4]
[3, 4]

Multiply lists by a scalar

A scalar is a single numeric value. Multiplying a list by an integer repeats the list; it does not multiply each element.

numbers * 2
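The sketch below contrasts list repetition with element-wise multiplication. The list comprehension is an addition for comparison and is not part of the notes at this point.

```python
numbers = [1, 2, 3, 4]

repeated = numbers * 2               # repeats the whole list
doubled = [x * 2 for x in numbers]   # multiplies each element (list comprehension)

print(repeated)   # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled)    # [2, 4, 6, 8]
```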

Concatenate lists with +

numbers2 = [30, 40, 50]
numbers + numbers2 # concatenate two lists

Lists can mix types

myList = ['coconuts', 777, 7.25, 'Sir Robin', 80.0, True]
myList

What happens if we multiply a list with strings?

# myList * 2

Lists can be nested

names = ['Darrell', 'Clayton', ['Billie', 'Arthur'], 'Samantha']
names[2] # returns a *list*
names[0] # returns a *string*

Chained brackets work here because names[2] is itself a list. You cannot index into a float: myList[2][0] would raise a TypeError.

names[2][0]
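Chained indexing works whenever the intermediate value is itself indexable. A minimal sketch, where the variable `value` is a hypothetical float used only to trigger the error:

```python
names = ['Darrell', 'Clayton', ['Billie', 'Arthur'], 'Samantha']

first_inner = names[2][0]   # first element of the nested list
first_char = names[0][0]    # strings are sequences too

print(first_inner)   # Billie
print(first_char)    # D

value = 7.25         # hypothetical float, not indexable
try:
    value[0]
    float_indexable = True
except TypeError:
    float_indexable = False

print(float_indexable)   # False
```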

Lists can be concatenated with +

variables = ['x1', 'x2', 'x3']
response = ['y']
variables + response
['x1', 'x2', 'x3', 'y']

Dictionaries dict

Like a hash table.

Has key-value pairs.

Elements are indexed using brackets [] (like lists).

But they are constructed using braces {}.

Key names are unique. If you re-use a key, you overwrite its value.

Keys don’t have to be strings – they can be numbers or tuples or expressions that evaluate to one of these.
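The following sketch illustrates key overwriting and non-string keys. The dictionaries `prices` and `grid` are invented for the example. The final part, showing that a list is rejected as a key, goes slightly beyond the text above and relies on lists being unhashable.

```python
prices = {'apple': 1.0}
prices['apple'] = 1.5    # re-using a key overwrites its value
prices['pear'] = 2.5     # a new key adds a new pair
print(prices)            # {'apple': 1.5, 'pear': 2.5}

# Keys can be tuples, numbers, or expressions that evaluate to one
grid = {(0, 0): 'origin', 1 + 1: 'two'}
print(grid[(0, 0)])      # origin
print(grid[2])           # two

# A list cannot be a key, because it is mutable (unhashable)
try:
    bad = {[1, 2]: 'list as key'}
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)       # False
```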

Constructing

dict1 = {
    'a': 1,
    'b': 2,
    'c': 3
}
dict2 = dict(x=55, y=29, z=99) # Note the absence of quotes around keys
dict2
{'x': 55, 'y': 29, 'z': 99}
dict3 = {'A': 'foo', 99: 'bar', (1,2): 'baz'}
dict3
{'A': 'foo', 99: 'bar', (1, 2): 'baz'}
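Two further ways to build a dictionary, sketched here as an optional extra: from a list of key-value pairs, and from two parallel lists joined with `zip()`. The variable names are hypothetical.

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]   # hypothetical list of (key, value) tuples
from_pairs = dict(pairs)

keys = ['x', 'y', 'z']
values = [55, 29, 99]
from_zip = dict(zip(keys, values))       # pair up keys and values by position

print(from_pairs)                                 # {'a': 1, 'b': 2, 'c': 3}
print(from_zip == dict(x=55, y=29, z=99))         # True
```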

Retrieve a value

Just write a key as the index.

phonelist = {'Tom':123, 'Bob':456, 'Sam':897}
phonelist['Bob']
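Indexing with a key that does not exist raises a KeyError. The sketch below shows this alongside the `.get()` method, which returns a default instead; the name 'Ann' is a made-up missing key.

```python
phonelist = {'Tom': 123, 'Bob': 456, 'Sam': 897}

bob = phonelist['Bob']
print(bob)                       # 456

try:
    phonelist['Ann']             # 'Ann' is not a key
    found = True
except KeyError:
    found = False
print(found)                     # False

print(phonelist.get('Ann'))      # None
print(phonelist.get('Ann', 0))   # 0
```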

Tuples

A tuple is like a list but with one big difference: a tuple is an immutable object!

You can’t change a tuple once it’s created.

A tuple can contain any number of elements of any datatype.

Accessed with brackets [] but constructed with parentheses ().

numbers

Constructing

Created with comma-separated values, with or without parentheses.

letters = 'a', 'b', 'c', 'd'
letters
numbers = (1,2,3,4) # numbers 1,2,3,4 stored in a tuple

A single-valued tuple must include a trailing comma, e.g.

tuple0 = (29)
tuple0, type(tuple0)
tuple1 = (29,)
tuple1, type(tuple1)
len(numbers)
numbers[0] = 5 # Trying to assign a new value 5 to the first position raises a TypeError
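A minimal sketch of what immutability means in practice: item assignment fails, and the usual workaround is to build a new tuple. The names `as_list` and `changed` are introduced only for this example.

```python
numbers = (1, 2, 3, 4)

try:
    numbers[0] = 5               # tuples do not support item assignment
    assigned = True
except TypeError:
    assigned = False

print(assigned)                  # False
print(numbers)                   # (1, 2, 3, 4), unchanged

# Workaround: convert to a list, modify, convert back to a new tuple
as_list = list(numbers)
as_list[0] = 5
changed = tuple(as_list)
print(changed)                   # (5, 2, 3, 4)

# The trailing comma is what makes a one-element tuple
print(type((29)).__name__, type((29,)).__name__)   # int tuple
```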

Functions and operators common to all sequences

len()
in
+ 
*
[1, 3] * 8
(1, 3) * 8
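The same operations behave alike on lists, tuples, and strings. A short sketch with made-up values:

```python
results = []
for seq in ([1, 3], (1, 3), 'ab'):
    # len(), +, and * are available for every sequence type
    results.append((len(seq), seq + seq, seq * 2))
    print(len(seq), seq + seq, seq * 2)

# Printed lines:
# 2 [1, 3, 1, 3] [1, 3, 1, 3]
# 2 (1, 3, 1, 3) (1, 3, 1, 3)
# 2 abab abab
```

Note that + only joins sequences of the same type; adding a list to a tuple raises a TypeError.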

Membership with in

Returns a boolean.

'Sam' in phonelist
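For a dictionary, in tests the keys, not the values. The sketch below shows this next to membership tests on a list and a string.

```python
phonelist = {'Tom': 123, 'Bob': 456, 'Sam': 897}

print('Sam' in phonelist)            # True  (a key)
print(897 in phonelist)              # False (a value, not a key)
print(897 in phonelist.values())     # True

print(3 in [1, 2, 3])                # True
print('ell' in 'hello')              # True  (substring test for strings)
```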

Sets

A set is an unordered collection of unique objects.

They are subject to set operations.

peanuts = {'snoopy','snoopy','woodstock'}
peanuts

Note that the set is deduplicated.

Since sets are unordered, they don’t have an index. This will break:

peanuts[0]

But you can still iterate over a set:

for peanut in peanuts:
    print(peanut)

Check if a value is in the set using in

'snoopy' in peanuts
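A common use of these properties is deduplicating a list. In this sketch the list `raw` is made up, and `sorted()` is used to get back an ordered, indexable list.

```python
raw = ['snoopy', 'snoopy', 'woodstock']   # hypothetical list with a duplicate

peanuts = set(raw)          # duplicates are dropped
print(len(peanuts))         # 2

ordered = sorted(peanuts)   # a new list, in sorted order
print(ordered)              # ['snoopy', 'woodstock']
print(ordered[0])           # snoopy
```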

Combine two sets

set1 = {'python','R'}
set2 = {'R','SQL'}

This fails:

set1 + set2

This succeeds:

set1.union(set2)

Get the set intersection

set1.intersection(set2)
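Sets also support operator forms of these methods. The sketch below uses |, &, and - on the same two sets; `sorted()` is applied only to make the printed order predictable.

```python
set1 = {'python', 'R'}
set2 = {'R', 'SQL'}

union = set1 | set2          # same as set1.union(set2)
common = set1 & set2         # same as set1.intersection(set2)
only_in_1 = set1 - set2      # same as set1.difference(set2)

print(sorted(union))         # ['R', 'SQL', 'python']
print(common)                # {'R'}
print(only_in_1)             # {'python'}
```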

Ranges

A range is a sequence of integers, from start to stop by step.
- The start point is zero by default.
- The step is one by default.
- The stop point is NOT included.

Ranges can be assigned to a variable.

rng = range(5)

More often, ranges are used in iterations, which we will cover later.

for rn in rng:
    print(rn)

Another range, with an explicit start, stop, and step:

rangy = range(1, 11, 2)
for rn in rangy:
    print(rn)
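Converting a range to a list is a quick way to inspect its contents. A minimal sketch; the descending range at the end is an extra example that is not in the notes.

```python
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(1, 11, 2)))    # [1, 3, 5, 7, 9]

rangy = range(1, 11, 2)
print(len(rangy))               # 5
print(rangy[0], rangy[-1])      # 1 9
print(11 in rangy)              # False (the stop point is excluded)

# A negative step counts down
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]
```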

Collections and defaultdict

Very often you will want to build a dictionary from some data source, and add keys as they appear. The built-in dict type, however, raises a KeyError if you try to update the value of a key that does not exist yet. The defaultdict type in the collections module solves this problem. Here’s an example.

source_data = """
Lorem Ipsum is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
when an unknown printer took a galley of type and scrambled it to make a type 
specimen book. It has survived not only five centuries, but also the leap 
into electronic typesetting, remaining essentially unchanged. It was 
popularised in the 1960s with the release of Letraset sheets containing 
Lorem Ipsum passages, and more recently with desktop publishing software 
like Aldus PageMaker including versions of Lorem Ipsum.
"""[1:-1].split()
# source_data

Try with dict (this raises a KeyError, because no key exists yet)

words = {}
for word in source_data:
    words[word] += 1

Use try and except

for word in source_data:
    try:
        words[word] += 1
    except KeyError:
        words[word] = 1
words

Or use .get()

words = {} # start from an empty dict so the counts above are not doubled
for word in source_data:
    words[word] = words.get(word, 0) + 1

Use collections.defaultdict

from collections import defaultdict
words2 = defaultdict(int) # Note: the default factory must be given; int() returns 0 for missing keys
for word in source_data:
    words2[word] += 1
words2
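To check that the approaches agree, here is a small sketch on a made-up word list (`sample` is hypothetical, not the Lorem Ipsum text). It also shows `defaultdict(list)` for grouping, and one caveat: merely reading a missing key from a defaultdict inserts it.

```python
from collections import defaultdict

sample = "the cat and the hat and the bat".split()   # hypothetical data

# Plain dict with .get()
counts_get = {}
for w in sample:
    counts_get[w] = counts_get.get(w, 0) + 1

# defaultdict with int as the default factory
counts_dd = defaultdict(int)
for w in sample:
    counts_dd[w] += 1

print(dict(counts_dd))           # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
print(counts_get == counts_dd)   # True

# defaultdict(list) groups items: here, words by first letter
by_first = defaultdict(list)
for w in sample:
    by_first[w[0]].append(w)
print(by_first['t'])             # ['the', 'the', 'the']

# Caveat: looking up a missing key creates it with the default value
print('dog' in counts_dd)        # False
print(counts_dd['dog'])          # 0
print('dog' in counts_dd)        # True
```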