Programming for Data Science
Strings are sequences of characters.
Characters are members of character sets. Python uses the Unicode character set.
Unlike some languages, Python does not have a separate data type for individual characters; a single character is simply a string of length 1.
A string is a sequence type that behaves in many ways like a list, which we will cover soon.
Python makes it easy to manipulate strings in complex ways.
This is very useful for data wrangling tasks such as web scraping and converting unstructured text files into structured data sets.
Recall that strings are signified by the use of quotes.
Single and double quotes are identical in function.
They must be “straight quotes”, though: text cut and pasted from a Word document with smart (curly) quotes won’t work.
'hello world!' == "hello world!"
True
print()
Python uses a print function to render output to screens or files.
This function was introduced in Python 3 and is one of the reasons Python 2 and 3 are incompatible.
Python 2 uses a print statement instead of a function. We’ll cover this topic later.
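The print function takes one or more objects as arguments and writes them as text to the screen (standard output). Escape characters and string prefixes are interpreted when Python reads the string literal, so print() shows the resulting characters rather than the quoted form you typed.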
print("This is a simple print statement")
This is a simple print statement
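Because print() writes text rather than returning it, its return value is None. The sketch below shows this. It also illustrates the point that output can go to files. Here the standard-library io.StringIO in-memory stream is passed as the file argument, an assumption chosen so the example runs without touching the disk:

```python
import io

# io.StringIO is an in-memory text stream, used here in place of a real file.
buffer = io.StringIO()

result = print("This is a simple print statement", file=buffer)

print(result)                   # None
print(repr(buffer.getvalue()))  # 'This is a simple print statement\n'
```

To write to a real file, you would pass a file object from open(path, "w") as the file argument instead.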
Python supports special “escape characters” within quoted strings that produce effects when printed.
These are characters prefixed with a backslash (\):
\\ Backslash (\)
\' Single quote (')
\" Double quote (")
\n Line break
\t Tab
Note that these escape characters are not unique to Python; many other languages, including C, Java, and JavaScript, use the same backslash sequences.
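An escape sequence takes two characters to type, but it produces a single character in the string. A short check with len() shows this:

```python
tab = "\t"
newline = "\n"
backslash = "\\"

# Each escape sequence is typed as two characters but stored as one.
print(len(tab), len(newline), len(backslash))  # 1 1 1

# \' lets a single quote appear inside a single-quoted string.
sentence = 'It\'s done'
print(sentence)  # It's done
```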
Here is a string with the tab character \t:
"Hello,\tWorld! (With a tab character)"
'Hello,\tWorld! (With a tab character)'
Here is the string as interpreted by print():
print("Hello,\tWorld! (With a tab character)")
Hello, World! (With a tab character)
Here we insert the newline character \n:
print("Line one\nLine two, with newline character")
Line one
Line two, with newline character
Remember that to concatenate strings, you can use the plus sign +:
print("Concatenation," + "\t" + "in strings with tab in middle")
Concatenation, in strings with tab in middle
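The plus sign only joins strings to other strings. The sketch below shows the usual fix, which is to convert a number with str() first. The variable epoch is just an illustrative value:

```python
epoch = 20

# + only joins str to str, so convert the number with str() first.
label = "Epoch " + str(epoch)
print(label)  # Epoch 20

# Without str(), the same concatenation raises a TypeError.
try:
    "Epoch " + epoch
except TypeError as err:
    print("TypeError:", err)
```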
If you want to print quotes inside a string, you can enclose the string in the other kind of quote:
print('Printing "quotes" within a string')
Printing "quotes" within a string
print("Printing 'quotes' within a string")
Printing 'quotes' within a string
Or you can escape the quote:
print("Printing \"quotes\" within a string")
Printing "quotes" within a string
By default, print() puts a space between its arguments and a newline at the end, but the sep and end parameters let you change that:
print("This", "is", "a", "sentence")
This is a sentence
print("This", "is", "a", "sentence", sep="|")
This|is|a|sentence
print("This", "is", "a", "sentence", end=" -- ")
print("This", "is", "a", "sentence")
This is a sentence -- This is a sentence
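To see exactly which characters sep and end produce, the following sketch captures print()'s output. It uses contextlib.redirect_stdout from the standard library and then prints the repr() of the result:

```python
import io
from contextlib import redirect_stdout

captured = io.StringIO()
with redirect_stdout(captured):   # send print() output into the string buffer
    print("This", "is", "a", "sentence", sep="|")
    print("No newline here", end="")
    print("!")

output = captured.getvalue()
print(repr(output))  # 'This|is|a|sentence\nNo newline here!\n'
```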
Python allows you to prefix a string literal with a letter to change how the string is interpreted.
f strings
Prefixing a string with f (for ‘formatted’) allows variable interpolation: in-place evaluation of variables (and other expressions) inside the string.
people = 'knights'
greeting = 'Ni'
print(f'We are the {people} who say {greeting}!')
We are the knights who say Ni!
The braces and the expressions inside them (called format fields) are replaced with the values of those expressions.
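The braces in an f string can hold any Python expression, not only a variable name. An optional format specification after a colon controls how the value is displayed. The brief sketch below reuses the variables above; count and loss are illustrative additions:

```python
people = 'knights'
greeting = 'Ni'
count = 3    # illustrative value
loss = 1.55  # illustrative value

# Any expression can go inside the braces: method calls, arithmetic, and so on.
chant = f'We are the {people.upper()} who say {greeting * count}!'
print(chant)  # We are the KNIGHTS who say NiNiNi!

# A format specification after the colon controls the display.
report = f'Loss: {loss:.3f}, count: {count:03d}'
print(report)  # Loss: 1.550, count: 003
```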
r strings
Prefixing a string with r (for ‘raw’) causes escape characters to be left uninterpreted.
print("Sentence one.\nSentence two.")
Sentence one.
Sentence two.
print(r"Sentence one.\nSentence two.")
Sentence one.\nSentence two.
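The difference is visible in the string lengths: in the raw string, \n stays as two separate characters. Raw strings are commonly used for regular-expression patterns, which are full of backslashes. The sketch below uses the standard re module as one illustration:

```python
import re

normal = "a\nb"   # three characters: 'a', newline, 'b'
raw = r"a\nb"     # four characters: 'a', backslash, 'n', 'b'
print(len(normal), len(raw))  # 3 4

# In a regex pattern, \d means "a digit"; the raw string passes the backslash through untouched.
numbers = re.findall(r"\d+", "Epoch 20, loss 1.55")
print(numbers)  # ['20', '1', '55']
```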
Python lets you put strings that take up more than one line into your code by using triple quotes, ''' or """:
foo = '''
This is an
example of
a multi-line
comment: single quotes
'''
print(foo)
This is an
example of
a multi-line
comment: single quotes
Note that the hard returns are represented as \n characters, including the one right after the opening ''':
foo
'\nThis is an\nexample of\na multi-line\ncomment: single quotes\n'
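If you do not want the leading and trailing \n, start and end the text on the same lines as the triple quotes. This sketch also uses splitlines() to break the result into a list of lines:

```python
bar = """This is an
example of
a multi-line
string"""

print(repr(bar))  # 'This is an\nexample of\na multi-line\nstring'

# splitlines() returns the lines without their line-break characters.
print(bar.splitlines())  # ['This is an', 'example of', 'a multi-line', 'string']
```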
The input() function allows users to enter data while the program is running:
answer = input("What is your name? ")
print("Hello, " + answer + "!")
What is your name? Rafael
Hello, Rafael!
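Whatever the user types, input() returns a string, so numeric answers need converting before arithmetic. So that the sketch runs without waiting for the keyboard, a literal stands in for what input() would return if the user typed 42:

```python
# Stand-in for: answer = input("How many coconuts? ") with the user typing 42
answer = "42"

print(type(answer))   # <class 'str'>
print(answer + "1")   # 421 (string concatenation, not addition)

count = int(answer)   # convert explicitly before doing arithmetic
print(count + 1)      # 43
```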
Python has many built-in string methods and functions.
See Common String Operations for more info.
.lower(), .upper()
These return a copy of the string converted to lower or upper case.
'BOB'.lower()
'bob'
'carlos'.upper()
'CARLOS'
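A common use is case-insensitive comparison. The sketch below also shows casefold(), a related built-in method meant for caseless matching, and that the original string is left unchanged:

```python
typed = 'Bob'
stored = 'BOB'

print(typed == stored)                  # False: comparison is case-sensitive
print(typed.lower() == stored.lower())  # True

# casefold() is intended for caseless matching; e.g. German 'ß' folds to 'ss'.
print('Straße'.casefold() == 'STRASSE'.casefold())  # True

print(stored)  # BOB: lower() returned a new string; the original is unchanged
```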
.split()
This splits a string at a delimiter, which defaults to whitespace, and returns the pieces as a list.
NOTE: The delimiter is a plain string, not a regular expression.
monty_python_quote = 'are.you.suggesting.coconuts.migrate'
monty_python_quote
'are.you.suggesting.coconuts.migrate'
monty_python_quote.split('.')
['are', 'you', 'suggesting', 'coconuts', 'migrate']
Note that string literals are objects too, so you can call methods on them directly:
'are.you.suggesting.coconuts.migrate'.split('.')
['are', 'you', 'suggesting', 'coconuts', 'migrate']
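The sketch below covers a few details of split() that often matter when cleaning text, plus its inverse, join():

```python
text = '  are   you  suggesting '

# No argument: runs of whitespace count as one separator, and the ends are ignored.
print(text.split())       # ['are', 'you', 'suggesting']

# An explicit ' ' separator splits at every single space, leaving empty strings.
print('a  b'.split(' '))  # ['a', '', 'b']

# maxsplit limits how many splits are made.
print('are.you.suggesting'.split('.', maxsplit=1))  # ['are', 'you.suggesting']

# join() reverses split(): it glues a list of strings with a separator.
words = 'are.you.suggesting.coconuts.migrate'.split('.')
print(' '.join(words))    # are you suggesting coconuts migrate
```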
.strip(), .rstrip(), .lstrip()
You can remove extra whitespace from strings using strip(), rstrip(), and lstrip().
Whitespace characters are characters that are used for spacing.
These include spaces, tabs, newlines, carriage returns, and form feeds.
.strip() removes whitespace from both ends of a string, but not from the middle.
.rstrip() only removes whitespace from the right-hand side of the string.
.lstrip() only removes whitespace from the left-hand side of the string.
str1 = ' hello, world!'  # white space at the beginning
str2 = ' hello, world! '  # white space at both ends
str3 = 'hello, world! '  # white space at the end
str1, str2, str3
(' hello, world!', ' hello, world! ', 'hello, world! ')
str1.lstrip(), str1.rstrip()
('hello, world!', ' hello, world!')
str2.strip(), str2.rstrip()
('hello, world!', ' hello, world!')
str2.lstrip(), str3.rstrip()
('hello, world! ', 'hello, world!')
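The sketch below shows two further points. These methods never touch whitespace in the middle of a string, and they accept an optional argument listing which characters to strip instead of whitespace:

```python
messy = '  hello,   world!  '

# Only the ends are trimmed; the spaces after the comma survive.
print(repr(messy.strip()))        # 'hello,   world!'

# An argument lists the characters to remove (in any order or combination).
print('...hello!!!'.strip('.!'))  # hello
print('0012300'.lstrip('0'))      # 12300
print('0012300'.rstrip('0'))      # 00123
```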
.startswith(), .endswith()
These let you check whether a string starts or ends with a given character or substring. They return True or False.
status = 'success'
status.startswith('a')
False
status.endswith('s')
True
status.endswith('ss')
True
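Both methods also accept a tuple to test several options at once, and matching is case-sensitive. The filename below is a made-up example:

```python
filename = 'results_2023.csv'  # made-up example value

# A tuple tests several suffixes (or prefixes) at once.
print(filename.endswith(('.csv', '.tsv')))   # True
print(filename.startswith(('data', 'raw')))  # False

# Matching is case-sensitive.
print(filename.startswith('Results'))        # False
print(filename.startswith('results'))        # True
```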
.replace()
This returns a copy of the string with every occurrence of one character or substring swapped for another.
"latina".replace("a", "x")
'lxtinx'
"good night".replace("night", "day")
'good day'
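replace() takes an optional third argument that limits how many occurrences are replaced. Because strings are immutable, it always returns a new string. The sketch shows both:

```python
word = 'banana'

# The optional count argument limits how many occurrences are replaced.
print(word.replace('a', 'o', 1))  # bonana

# Strings are immutable: replace() builds a new string and leaves word unchanged.
new_word = word.replace('a', 'o')
print(word, new_word)  # banana bonono
```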
.format()
Instead of using the f string prefix, you can use the format method to embed variables in strings.
Place {} in the string wherever a value should go, then call .format(var1, var2, ...); the arguments fill the {} from left to right.
epoch = 20
loss = 1.55
print('Epoch: {}, loss: {}'.format(epoch, loss))
Epoch: 20, loss: 1.55
This raises an error, because the string has three {} but only two arguments are passed:
print('Epoch: {}, loop: {}, loss: {}'.format(epoch, loss))
IndexError: Replacement index 2 out of range for positional args tuple
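Passing one argument per {} fixes the error. In this sketch, loop is a hypothetical value added only so the three placeholders have three arguments. The sketch also shows numbered and named fields and a format specification:

```python
epoch = 20
loop = 3     # hypothetical value so that three {} get three arguments
loss = 1.55

line = 'Epoch: {}, loop: {}, loss: {}'.format(epoch, loop, loss)
print(line)  # Epoch: 20, loop: 3, loss: 1.55

# Numbered fields can reuse an argument; named fields are filled by keyword.
print('{0} then {1} then {0}'.format('a', 'b'))  # a then b then a
named = 'Epoch: {epoch}, loss: {loss:.3f}'.format(epoch=epoch, loss=loss)
print(named)  # Epoch: 20, loss: 1.550
```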
.zfill()
Use this method to pad a string on the left with zeros up to a given width (a leading sign stays in front), which is useful for printing out data sets in raw text form.
print('12'.zfill(5))
print('3.14'.zfill(7))
print('-3.14'.zfill(7))
print('3.141592'.zfill(3)) # Will not truncate
00012
0003.14
-003.14
3.141592
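zfill() only pads with zeros. For other fill characters, rjust() works. For numbers, a format specification can zero-pad without converting to a string first. A short comparison:

```python
# rjust() pads on the left with any fill character (a space by default).
print('12'.rjust(5, '0'))       # 00012
print(repr('12'.rjust(5)))      # '   12'

# Unlike zfill(), rjust() does not keep a sign in front.
print('-3.14'.rjust(7, '0'))    # 00-3.14

# A format specification can zero-pad a number directly, no str() needed.
print(f'{12:05d}')              # 00012
print(f'{-3.14:07.2f}')         # -003.14
```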
There are many other functions and methods for strings.
Many of these are based on the fact that strings are list-like.
We will cover these when we review lists.
Comments
Comments are text in your code that the interpreter ignores. In Python, a comment starts with # and runs to the end of the line.
They are used to explain blocks of code, or to remove code from execution when debugging.
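The sketch below shows both uses. It also shows that a # inside quotes is part of the string, not a comment:

```python
# A full-line comment explains the code that follows.
total = 0
for n in [1, 2, 3]:
    total += n  # an inline comment runs from # to the end of the line

# Commenting out a line keeps it from running while you debug:
# total = total * 100

print(total)  # 6

tag = "#python"  # a # inside quotes is ordinary text, not a comment
print(tag)       # #python
```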