import pandas as pd
import numpy as np
M02 Notes
%%html
<style>
table {float: left; clear: right;}
td, th {text-align: right;}
</style>
Review Assignment
See M01 Notebook for results.
Activities
Activity 1 - Which Python do you have? View - Do you have Anaconda?
Activity 2 - Running Jupyter Lab View - Running VSCode
Concepts
Data / Code
Data vs algorithm (code). How are they related?
Data types and structures
Data types and data structures. What are the differences?
Data types are atomic; they don’t contain other data types.
A data structure contains data types organized in a certain way.
Strings
Strings are data types, but internally they are like data structures.
However, unlike the data structures considered here, strings can’t contain any of the data types specified by Python.
Internally, a string is a sequence of Unicode code points, which are not exposed as data types (as they are in some other languages).
- A code point is a numerical value that maps to a specific character.
- Unicode is an international standard of code points that map onto the alphabets of many languages.
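The relationship between characters and code points can be sketched with Python's built-in ord() and chr(). This is only a minimal illustration; the sample word is an arbitrary choice, not taken from the course materials.

```python
# Map each character to its Unicode code point and back again.
word = "A\u00f1o"  # "Año", written with an escape so the encoding is unambiguous

code_points = [ord(ch) for ch in word]
print(code_points)  # [65, 241, 111]

# chr() is the inverse of ord(): it turns a code point back into a character.
rebuilt = "".join(chr(cp) for cp in code_points)
print(rebuilt == word)  # True
```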
Each character is an element in an immutable list-like structure.
You can access its elements as if it were a tuple of characters:
my_string = "This is a string"
print(my_string[0])
print(my_string[-1])
print(my_string[1:-1])
print(my_string[1:4])
print(my_string[1:-4])
print(my_string[-4:1])
T
g
his is a strin
his
his is a st
But also like a tuple, you can’t change its values:
my_string[2] = 'a'
my_string[3] = 't'
TypeError: 'str' object does not support item assignment
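Since item assignment fails, the usual way to "change" a string is to build a new one. A minimal sketch, assuming we want to replace the character at index 2 as attempted above (the names patched and swapped are illustrative only):

```python
my_string = "This is a string"

# Strings are immutable, so build a new string from slices of the old one.
patched = my_string[:2] + "a" + my_string[3:]
print(patched)    # Thas is a string

# str.replace also returns a new string and leaves the original untouched.
swapped = my_string.replace("string", "tuple")
print(swapped)    # This is a tuple
print(my_string)  # This is a string
```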
Note that some languages, like Java, have a separate data type (char) for individual characters, e.g. 'A'.
String indexing
Strings can be accessed by index, since they are list-like sequences.
Every positive index has a corresponding negative index, and the two may be
substituted freely in both indexes and slices.
With the default step, a slice’s start must lie to the left of its stop; otherwise the result is empty.
The following example illustrates these points.
my_string2 = "I AM A STRING"
The above string can be represented in the following way (␣ marks a space):
I | ␣ | A | M | ␣ | A | ␣ | S | T | R | I | N | G |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
-13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
Note that the second and third rows of this table represent two functionally
equivalent ways of accessing elements of the string sequence.
Also note that positive and negative indexes are interchangeable.
print(my_string2[12], '==', my_string2[-1])
G == G
my_string2[2:6]
'AM A'
my_string2[2:-7]
'AM A'
my_string2[-11:-7]
'AM A'
my_string2[-11:6]
'AM A'
Note that you can go backwards, too, by giving a negative step parameter after the second colon.
The default step value is 1.
my_string2[10:1:-1]
'IRTS A MA'
my_string2[::2]
'IA  TIG'
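The indexing rules in this section can be checked directly. The sketch below reuses the same example string; the variable names s, n, and twins_match are illustrative.

```python
s = "I AM A STRING"
n = len(s)

# Each valid non-negative index i names the same element as i - n.
twins_match = all(s[i] == s[i - n] for i in range(n))
print(twins_match)      # True

# With the default step of 1, a start to the right of the stop gives "".
print(repr(s[6:2]))     # ''

# A negative step walks right to left, so the start must be right of the stop.
print(repr(s[6:2:-1]))  # ' A M'

# Omitting both bounds with a step of -1 reverses the whole string.
print(s[::-1])          # GNIRTS A MA I
```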
Mutability
A mutable object is a data structure whose internal values can be changed.
For example, tuples are immutable, whereas lists are mutable.
Demonstration
Here, we mutate a list by appending a value to it.
a = [1,2,3,4,5]
a.append(10)
print(a)
[1, 2, 3, 4, 5, 10]
a[0] = 5
print(a)
[5, 2, 3, 4, 5, 10]
If we try the same things with a tuple, we get an error.
b = (1,2,3,4,5)
b.append(10)
print(b)
AttributeError: 'tuple' object has no attribute 'append'
b[0] = 5
print(b)
TypeError: 'tuple' object does not support item assignment
This, on the other hand, is not mutation:
a = [1,2,3,4,5,10] # A list
b = (1,2,3,4,5,10) # A tuple
print(a)
print(b)
[1, 2, 3, 4, 5, 10]
(1, 2, 3, 4, 5, 10)
We are just re-assigning a new value to the variable.
The new value just replaces the old one.
In mutation, the same data structure remains in place but its contents are changed.
Note, however, that this works with tuples:
b += (11,)
print(b)
(1, 2, 3, 4, 5, 10, 11)
It looks like mutation, but it’s not.
This is because we are replacing b with a new tuple value.
Notice that we write a single valued tuple with a comma. Why?
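One way to explore the question is to inspect the types involved. The sketch below is an illustration, not part of the original notebook; the names not_a_tuple, one_tuple, also_tuple, and old are made up for the example.

```python
not_a_tuple = (11)  # parentheses only group an expression; this is the int 11
one_tuple = (11,)   # the trailing comma is what makes a one-element tuple
also_tuple = 11,    # the parentheses are optional; the comma is not

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
print(also_tuple == one_tuple)     # True

# += on a tuple rebinds the name to a new tuple; the old object is unchanged.
b = (1, 2, 3)
old = b
b += (11,)
print(b)         # (1, 2, 3, 11)
print(old)       # (1, 2, 3)
print(b is old)  # False
```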
Behavior
View the video for this topic on Canvas.
Relatedly, mutable and immutable objects behave differently.
For example, when you assign a variable that holds a mutable object to another variable,
any in-place changes to the data are reflected by both variables.
The new variable is just an alias for the same object.
Aliasing happens with immutable objects too, but it has no visible effect there, because an immutable object cannot be changed in place.
Let’s explore how the + and += operators behave differently for mutable and immutable objects.
First, let’s create a function that will allow us
to compare the objects as we modify them.
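def compare_objects(trial:int, obj1:str, obj2:str):
    # obj1 and obj2 are variable names passed as strings;
    # eval() looks up the objects those names currently refer to.
    o1 = eval(obj1)
    o2 = eval(obj2)
    # Print each name, its value, and the id of the object it refers to.
    print(f"t{trial} {obj1} {o1} {id(o1)}")
    print(f"t{trial} {obj2} {o2} {id(o2)}")
    print(f"{obj1} == {obj2}:", o1 == o2)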
List t1
We initialize a list and assign it to a second variable.
This does not copy the list: the two variables share the same id.
a0 = [1,2,3,4,5]
a1 = a0 # Alias the list (this does not copy it)
compare_objects(1, 'a0', 'a1')
t1 a0 [1, 2, 3, 4, 5] 140486786318976
t1 a1 [1, 2, 3, 4, 5] 140486786318976
a0 == a1: True
List t2
Now we extend the list through the second variable and note the effects on the original.
The original value is also changed.
This is because both variables point to the same object.
a1 += [12] # Extend in place through the alias
compare_objects(2, 'a0', 'a1')
t2 a0 [1, 2, 3, 4, 5, 12] 140486786318976
t2 a1 [1, 2, 3, 4, 5, 12] 140486786318976
a0 == a1: True
List t3
Note, however, that if we use plain + with reassignment instead of the augmented assignment operator +=,
then a1 becomes a different object!
Lutz goes into the difference between the += and the + in Ch 11 pages 360-363.
a1 = a1 + [12] # Build a new list and rebind a1 to it
compare_objects(3, 'a0', 'a1')
t3 a0 [1, 2, 3, 4, 5, 12] 140486786318976
t3 a1 [1, 2, 3, 4, 5, 12, 12] 140485791147584
a0 == a1: False
List t4
Try it with a new alias, to avoid any possible interference between t2 and t3.
a2 = a0
a2 = a2 + [12] # Build a new list and rebind a2 to it
compare_objects(4, 'a0', 'a2')
t4 a0 [1, 2, 3, 4, 5, 12] 140486786318976
t4 a2 [1, 2, 3, 4, 5, 12, 12] 140485791151296
a0 == a2: False
We get the same result.
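In the list trials above, plain assignment makes the second name refer to the same list. If an independent copy is actually wanted, one option is list.copy(). The following is a minimal sketch with illustrative names (alias, shallow); note that this is a shallow copy, so nested mutable elements would still be shared.

```python
a0 = [1, 2, 3, 4, 5]
alias = a0           # a second name for the same list object
shallow = a0.copy()  # a new list with the same elements (a0[:] or list(a0) also work)

alias += [12]        # in-place change, visible through a0
shallow += [99]      # changes only the copy

print(a0)                            # [1, 2, 3, 4, 5, 12]
print(shallow)                       # [1, 2, 3, 4, 5, 99]
print(alias is a0, shallow is a0)    # True False
```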
Tuple t1
Let’s try this with a tuple.
We see again that both variables have the same id.
b0 = (1,2,3,4,5)
b1 = b0 # Alias the tuple (no copy is made)
compare_objects(1, 'b0', 'b1')
t1 b0 (1, 2, 3, 4, 5) 140485791255184
t1 b1 (1, 2, 3, 4, 5) 140485791255184
b0 == b1: True
Tuple t2
However, if we extend the tuple with the augmented assignment operator +=,
b1 becomes a new object.
Note how this differs from the list behavior.
b1 += (12,) # Creates a new tuple and rebinds b1 to it
compare_objects(2, 'b0', 'b1')
t2 b0 (1, 2, 3, 4, 5) 140485791255184
t2 b1 (1, 2, 3, 4, 5, 12) 140485791104832
b0 == b1: False
Tuple t3
If we use plain + with reassignment instead of +=, the same thing happens again.
The value of b1 becomes a new object because the variable has been reassigned.
b1 = b1 + (12,) # Build a new tuple and rebind b1 to it
compare_objects(3, 'b0', 'b1')
t3 b0 (1, 2, 3, 4, 5) 140485791255184
t3 b1 (1, 2, 3, 4, 5, 12, 12) 140485791318080
b0 == b1: False
Let’s look at another example.
Here is a list:
foo = ['hi']
bar = foo
compare_objects(1, 'foo', 'bar')
t1 foo ['hi'] 140485791149952
t1 bar ['hi'] 140485791149952
foo == bar: True
bar += ['bye']
compare_objects(2, 'foo', 'bar')
t2 foo ['hi', 'bye'] 140485791149952
t2 bar ['hi', 'bye'] 140485791149952
foo == bar: True
bar = bar + ['bye']
compare_objects(2, 'foo', 'bar')
t2 foo ['hi', 'bye'] 140485791149952
t2 bar ['hi', 'bye', 'bye'] 140485791173504
foo == bar: False
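And here is the corresponding immutable case.
Note that ('hi') without a trailing comma is a plain string, not a tuple, as the output shows; strings are immutable too, so += again creates a new object:
foo1 = ('hi')
bar1 = foo1
compare_objects(1, 'foo1', 'bar1')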
t1 foo1 hi 140486835502896
t1 bar1 hi 140486835502896
foo1 == bar1: True
bar1 += ('bye')
compare_objects(2, 'foo1', 'bar1')
t2 foo1 hi 140486835502896
t2 bar1 hibye 140485792814832
foo1 == bar1: False
Comparing floats
Let’s do an experiment:
f1 = 0.1 + 0.2
f2 = 0.3
f1 == f2
False
In the above case, f1 and f2 don’t hold precisely the same value because of the limitations of representing base-10 fractions in base-2 (binary).
Inspecting their values, we find a minor difference in the least significant digits:
f1, f2
(0.30000000000000004, 0.3)
To get around this problem, try using math.isclose() instead of ==:
import math
math.isclose(f1, f2)
True
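math.isclose() compares using a relative tolerance by default, which matters when one of the values is zero or very close to it. The sketch below shows the optional rel_tol and abs_tol parameters; the specific tolerance values are arbitrary choices for illustration.

```python
import math

f1 = 0.1 + 0.2
f2 = 0.3
print(math.isclose(f1, f2))  # True with the default rel_tol of 1e-09

# A relative tolerance scales with the inputs, so it is of little use
# against 0.0; the default abs_tol is 0.0, so this comparison fails.
print(math.isclose(1e-10, 0.0))  # False

# Supplying an absolute tolerance handles comparisons near zero.
print(math.isclose(1e-10, 0.0, abs_tol=1e-9))  # True

# A looser relative tolerance accepts values within 1% of each other.
print(math.isclose(100.0, 100.1, rel_tol=0.01))  # True
```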
Note that sometimes floating point comparisons do work:
f3 = 4.0
f4 = 3.5 + .5
f3 == f4
True
See the Wikipedia article on floating point arithmetic to learn more about how this arises.
It will provide you with insight into how computers actually work as machines that process numbers.
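To see why some comparisons fail and others succeed, it can help to look at the exact values a float stores. The sketch below uses the standard library's decimal and fractions modules; it is one possible way to inspect the values, not something the original notes prescribe.

```python
from decimal import Decimal
from fractions import Fraction

# Converting a float to Decimal or Fraction exposes its exact binary value.
print(Decimal(0.1))  # slightly more than 0.1
print(Fraction(0.1) == Fraction(1, 10))  # False: 0.1 is not exactly representable

# 3.5, 0.5, and 4.0 are sums of powers of two, so they are stored exactly.
print(Fraction(3.5) + Fraction(0.5) == Fraction(4.0))  # True

# Decimal values built from strings do exact base-10 arithmetic.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```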
The word “scalar”
Sometimes you will see the word “scalar” in the literature to refer to certain kinds of values.
Scalars are single values as opposed to structures or collections of values.
Strings as data types sometimes behave as scalars and sometimes as sequential structures.
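The dual nature of strings shows up when code has to decide whether a value is "one thing" or "many things". The helper below, as_list, is hypothetical and written only for this illustration: it treats strings as scalars even though they are technically iterable.

```python
def as_list(value):
    """Hypothetical helper: return value as a list, treating strings as scalars."""
    if isinstance(value, (str, bytes)):
        return [value]      # a string is wrapped whole, not split up
    try:
        return list(value)  # tuples, sets, and other iterables are expanded
    except TypeError:
        return [value]      # non-iterable scalars such as int or float

print(as_list("abc"))   # ['abc']   (string treated as a scalar)
print(list("abc"))      # ['a', 'b', 'c']   (string treated as a sequence)
print(as_list((1, 2)))  # [1, 2]
print(as_list(5))       # [5]
```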
Summary
Types
name | type | literal |
---|---|---|
int | integer | 1 |
str | string | "1", '1' |
float | floating point (real) | 1. |
complex | complex | 1j (imaginary component) |
bool | boolean | True |
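The literals in the table above can be checked with type(). This is a quick sketch; the list name literals is illustrative.

```python
# One literal per row of the types table.
literals = [1, "1", 1., 1j, True]

type_names = [type(value).__name__ for value in literals]
print(type_names)  # ['int', 'str', 'float', 'complex', 'bool']

# bool is a subclass of int, so True also counts as an integer.
print(isinstance(True, int))  # True
```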
Structures
name | mutable | constructor |
---|---|---|
tuple | no | (), tuple() |
list | yes | [], list() |
dict | yes | {} with key/value pairs, dict() |
set | yes | {} with single values, set() |
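One detail worth checking against the structures table is that empty braces create a dict, not a set. The sketch below also shows one practical consequence of mutability: tuples can serve as dict keys, while lists cannot. All variable names are illustrative.

```python
empty_braces = {}    # an empty dict, not an empty set
empty_set = set()    # the only way to write an empty set
pairs = {"a": 1}     # braces with key/value pairs: dict
singles = {"a", 1}   # braces with single values: set

print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set
print(type(pairs).__name__, type(singles).__name__)           # dict set

# Immutable tuples of hashable items are hashable and can be dict keys.
lookup = {(1, 2): "ok"}

# Mutable lists are unhashable, so using one as a key raises TypeError.
try:
    lookup[[1, 2]] = "fails"
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)  # False
```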