NB: Strings

Programming for Data Science

Strings

Strings are sequences of characters.

Characters are members of character sets. Python uses the Unicode character set.

Python does not have a data type for individual characters, though, as some languages do.

Instead, a string is an object type that is similar to a list, which we will cover soon.
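
As a small illustration of the point above, a single character in Python is simply a string of length one, and a string can be indexed much as a list can. The variable names here are only for illustration.

```python
word = "data"

# A single "character" is just a string of length 1
first = word[0]
print(first)          # d
print(type(first))    # <class 'str'>
print(len(first))     # 1

# ord() and chr() convert between a character and its Unicode code point
code_point = ord("d")
print(code_point)       # 100
print(chr(code_point))  # d
```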

Python makes it easy to manipulate strings in complex ways.

This is very useful for data wrangling tasks such as web scraping and converting unstructured text files into structured data sets.

Kinds of quotes

Recall that strings are signified by the use of quotes.

Single and double quotes are identical in function.

They must be “straight quotes” though — cutting and pasting from a Word document with smart quotes won’t work.

'hello world!' == "hello world!"
True

Printing with print()

Python uses a print function to render output to screens or files.

This function was introduced in Python 3 and is one of the reasons Python 2 and 3 are incompatible.

Python 2 uses a print statement instead of a function. We’ll cover this topic later.

The print function takes a string as an argument and writes it to the output; it does not return the string. How a string literal is interpreted depends on its prefix (or lack of one).

print("This is a simple print statement")
This is a simple print statement
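
To make the "screens or files" part concrete, here is a minimal sketch that sends output somewhere other than the screen. It assumes an in-memory buffer from the standard io module as a stand-in for a real file; the variable names are illustrative.

```python
import io

# An in-memory text buffer stands in for a real file here
buffer = io.StringIO()

# The file= keyword sends the output somewhere other than the screen
result = print("This goes to the buffer", file=buffer)

written = buffer.getvalue()
print(repr(written))   # 'This goes to the buffer\n'
print(result)          # None: print() writes output, it does not return the text
```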

Escape Characters

Python supports special “escape characters” within quoted strings that produce effects when printed.

These are characters prefixed with a backslash \.

\\     Backslash (\)
\'     Single quote (')
\"     Double quote (")
\n     Line break
\t     Tab

Note that these escape characters are not unique to Python. Most programming languages use the same or similar ones.

Here is a string with the tab character \t:

"Hello,\tWorld! (With a tab character)"
'Hello,\tWorld! (With a tab character)'

Here is the string interpreted by print():

print("Hello,\tWorld! (With a tab character)")
Hello,  World! (With a tab character)

Here we insert the new line character \n:

print("Line one\nLine two, with newline character")
Line one
Line two, with newline character
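
It may help to see that an escape sequence is two characters in the source code but a single character in the string itself. A short sketch with made-up values:

```python
tabbed = "a\tb"
print(len(tabbed))         # 3: 'a', a tab, and 'b'
print(tabbed[1] == "\t")   # True

# An escaped backslash followed by the letter t is not a tab
escaped = "a\\tb"
print(len(escaped))        # 4
print(escaped)             # a\tb
```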

Remember that to concatenate strings, you may use the plus sign +:

print("Concatenation," + "\t" + "in strings with tab in middle")
Concatenation,  in strings with tab in middle

Quotes in Quotes

If you want to print quotes in a string, you can alternate singles and doubles:

print('Printing "quotes" within a string')
Printing "quotes" within a string
print("Printing 'quotes' within a string")
Printing 'quotes' within a string

Or you can escape the quote:

print("Printing \"quotes\" within a string")
Printing "quotes" within a string

Spaces

By default, the print function puts spaces between strings and a newline at the end, but you can change that:

print("This", "is", "a", "sentence")
This is a sentence
print("This", "is", "a", "sentence", sep="|")
This|is|a|sentence
print("This", "is", "a", "sentence", end=" -- ")
print("This", "is", "a", "sentence")
This is a sentence -- This is a sentence

String Prefixes

Python allows you to prefix a string literal with a letter to change how the string is interpreted.

f strings

Prefixing a string with f (for ‘formatted’) allows variable interpolation: the in-place evaluation of variables in strings.

people = 'knights'
greeting = 'Ni'
print(f'We are the {people} who say {greeting}!')
We are the knights who say Ni!

The curly braces and the characters within them (called format fields) are replaced with the values of the named variables.
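
Format fields are not limited to bare variable names. The sketch below, using made-up values, shows an expression inside the braces, a format specifier after a colon, and doubled braces for literal braces.

```python
width = 4
height = 5
price = 3.14159
count = 3

# Any expression can appear inside the braces, not just a variable name
area_line = f"Area: {width * height}"
print(area_line)      # Area: 20

# A format specifier after a colon controls how the value is rendered
rounded_line = f"Price: {price:.2f}"
print(rounded_line)   # Price: 3.14

# Doubled braces produce literal braces
literal_line = f"{{not a field}} but {count} is"
print(literal_line)   # {not a field} but 3 is
```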

r strings

Prefixing a string with r (for ‘raw’) causes escape characters to be uninterpreted.

print("Sentence one.\nSentence two.")
Sentence one.
Sentence two.
print(r"Sentence one.\nSentence two.")
Sentence one.\nSentence two.
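
One place where raw strings tend to matter is Windows-style file paths, where backslashes are common. The path below is hypothetical and is never opened; the sketch only compares the two string literals.

```python
# A hypothetical Windows path: \n and \t are silently read as newline and tab
plain = "C:\new_data\table.csv"

# With the r prefix the backslashes are kept as ordinary characters
raw = r"C:\new_data\table.csv"

print(len(plain))   # 19
print(len(raw))     # 21

# The raw string equals the version with every backslash escaped by hand
print(raw == "C:\\new_data\\table.csv")   # True
```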

Comments

Comments are lines of code that are ignored by the interpreter. They start with a # character.

They are used to explain blocks of code, or to remove code from execution when debugging.

# This is a comment

Multi-line Strings

Python lets you put strings that take up more than one line into your code by using ''' or """.

foo = '''
This is an
example of
a multi-line
comment: single quotes
'''
print(foo)

This is an
example of
a multi-line
comment: single quotes

Note that the hard returns are represented as \ns.

foo
'\nThis is an\nexample of\na multi-line\ncomment: single quotes\n'
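
Because the line breaks are ordinary \n characters, a multi-line string can be taken apart with regular string methods. A minimal sketch, reusing the foo string from above:

```python
foo = '''
This is an
example of
a multi-line
comment: single quotes
'''

# strip() drops the leading and trailing newlines that come from the layout
trimmed = foo.strip()

# splitlines() breaks the text into a list with one item per line
lines = trimmed.splitlines()
print(lines)
print(len(lines))   # 4
```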

Run-time User Input

The input() function allows users to enter data while the program is running:

answer = input("What is your name? ")
print("Hello, " + answer + "!")
What is your name?  Rafael
Hello, Rafael!
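
Note that input() always returns a string, even if the user types a number, so numeric input has to be converted before doing arithmetic. The sketch below uses a hypothetical helper, years_until, and a hard-coded string in place of a live input() call so that it runs without a user present.

```python
def years_until(raw_age, target=100):
    """Convert text typed by a user into an int and return the years left until target."""
    # int() raises ValueError if the text is not a whole number
    age = int(raw_age.strip())
    return target - age

# In a live session this string would come from input("How old are you? ")
typed = " 42 "
print(years_until(typed))   # 58
```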

Some String Functions

Python has many built-in string methods and functions.

See Common String Operations for more info.

.lower(), .upper()

These will convert the case of a string.

'BOB'.lower()
'bob'
'carlos'.upper()
'CARLOS'

.split()

This will parse a string based on a delimiter, which defaults to whitespace.

NOTE: This does not use regular expressions.

This returns a list.

monty_python_quote = 'are.you.suggesting.coconuts.migrate'
monty_python_quote
'are.you.suggesting.coconuts.migrate'
monty_python_quote.split('.') 
['are', 'you', 'suggesting', 'coconuts', 'migrate']

Note that string literals are objects too, so methods can be called on them directly.

'are.you.suggesting.coconuts.migrate'.split('.')
['are', 'you', 'suggesting', 'coconuts', 'migrate']
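
The default and explicit delimiters behave a little differently, which is easy to miss. The following sketch, with made-up strings, contrasts them and also shows the optional maxsplit argument and the inverse operation, join().

```python
messy = "  are   you\tsuggesting\ncoconuts migrate "

# With no argument, any run of whitespace counts as one delimiter
words = messy.split()
print(words)       # ['are', 'you', 'suggesting', 'coconuts', 'migrate']

# With an explicit delimiter, every occurrence splits, so empty strings can appear
fields = "a,,b".split(",")
print(fields)      # ['a', '', 'b']

# maxsplit limits the number of splits, counting from the left
head_tail = "key=value=more".split("=", 1)
print(head_tail)   # ['key', 'value=more']

# join() puts a list of strings back together
rejoined = ".".join(words)
print(rejoined)    # are.you.suggesting.coconuts.migrate
```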

.strip(), .rstrip(), .lstrip()

You can remove extra whitespace from strings using strip(), rstrip() and lstrip().

Whitespace characters are characters that are used for spacing.

These include newlines, spaces, tabs, carriage returns, form feeds, etc.

.strip() removes whitespace from both ends of a string.
.rstrip() only removes whitespace from the right-hand side of the string.
.lstrip() only removes whitespace from the left-hand side of the string.

str1 = '  hello, world!'    # white space at the beginning
str2 = '  hello, world!  '  # white space at both ends
str3 = 'hello, world!  '    # white space at the end
str1, str2, str3
('  hello, world!', '  hello, world!  ', 'hello, world!  ')
str1.lstrip(), str1.rstrip()
('hello, world!', '  hello, world!')
str2.strip(), str2.rstrip()
('hello, world!', '  hello, world!')
str2.lstrip(), str3.rstrip()
('hello, world!  ', 'hello, world!')
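
The strip methods only act on the ends of a string, so whitespace in the middle is left alone. They also accept an optional argument listing the characters to strip instead of whitespace. A short sketch with made-up values:

```python
padded = "  hello,   world!  "

# Only the ends are trimmed; the interior spaces remain
inner = padded.strip()
print(repr(inner))       # 'hello,   world!'

# One way to collapse interior whitespace is to split and rejoin
collapsed = " ".join(padded.split())
print(repr(collapsed))   # 'hello, world!'

# An argument lists the characters to strip instead of whitespace
title = "***title***".strip("*")
print(title)             # title

trimmed_zeros = "3.1400".rstrip("0")
print(trimmed_zeros)     # 3.14
```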

.startswith(), .endswith()

These let you see if a string starts or ends with a character or a string.

status = 'success'
status.startswith('a')
False
status.endswith('s')
True
status.endswith('ss')
True
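
Both methods also accept a tuple of candidates, and both are case-sensitive. The sketch below illustrates this; the file name is hypothetical.

```python
status = 'success'

starts_with_prefix = status.startswith('suc')
print(starts_with_prefix)   # True

# A tuple of candidates matches if any one of them matches
starts_with_any = status.startswith(('a', 's'))
print(starts_with_any)      # True

# The comparison is case-sensitive; lowercasing first is one way around that
filename = 'REPORT.CSV'     # hypothetical file name
is_csv_strict = filename.endswith('.csv')
is_csv_relaxed = filename.lower().endswith('.csv')
print(is_csv_strict)        # False
print(is_csv_relaxed)       # True
```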

.replace()

This lets you swap out characters or strings.

"latina".replace("a", "x")
'lxtinx'
"good night".replace("night", "day")
'good day'
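
Two details are worth a quick sketch: replace() returns a new string and leaves the original unchanged, and an optional third argument limits how many occurrences are replaced.

```python
original = "latina"

changed = original.replace("a", "x")
print(changed)      # lxtinx
print(original)     # latina (strings are immutable, so the original is untouched)

# A count argument replaces only the first N occurrences
first_only = original.replace("a", "x", 1)
print(first_only)   # lxtina

# Calls can be chained because each one returns a string
chained = "good night".replace("good", "bad").replace("night", "day")
print(chained)      # bad day
```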

.format()

Instead of using the f string prefix, you can use the format method to embed variables in strings.

Place {} placeholders in the string; they are filled in order from left to right by the arguments passed to .format(var1, var2, ...).

epoch = 20
loss = 1.55
print('Epoch: {}, loss: {}'.format(epoch, loss))
Epoch: 20, loss: 1.55

This breaks because the string has three {} placeholders but only two variables are passed:

print('Epoch: {}, loop: {}, loss: {}'.format(epoch, loss))
IndexError: Replacement index 2 out of range for positional args tuple
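
Placeholders can also be filled out of order, by position or by name, and can carry a format specifier. The sketch below reuses epoch and loss from above; the keyword names e and l are arbitrary. It also shows one way to catch the error rather than let it stop the program.

```python
epoch = 20
loss = 1.55

# Numbers inside the braces pick arguments by position
by_position = '{1} at epoch {0}'.format(epoch, loss)
print(by_position)   # 1.55 at epoch 20

# Names inside the braces pick keyword arguments; :.3f fixes three decimal places
by_name = 'Epoch: {e}, loss: {l:.3f}'.format(e=epoch, l=loss)
print(by_name)       # Epoch: 20, loss: 1.550

# Too few arguments raises IndexError, which can be caught
try:
    'Epoch: {}, loop: {}, loss: {}'.format(epoch, loss)
    caught = False
except IndexError:
    caught = True
print(caught)        # True
```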

.zfill()

Use this method to pad strings with zeros, which is useful for printing out data sets in raw text form.

print('12'.zfill(5))       
print('3.14'.zfill(7))    
print('-3.14'.zfill(7))    
print('3.141592'.zfill(3)) # Will not truncate
00012
0003.14
-003.14
3.141592
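
For comparison, here is a sketch of two other ways to zero-pad, .rjust() and an f string format specifier, and how they differ from .zfill() on signed values. The file name at the end is made up.

```python
padded_zfill = '7'.zfill(3)
padded_rjust = '7'.rjust(3, '0')
padded_fstring = f'{7:03d}'
print(padded_zfill, padded_rjust, padded_fstring)   # 007 007 007

# The methods differ on signed values: zfill keeps the sign in front
negative_zfill = '-7'.zfill(4)
negative_rjust = '-7'.rjust(4, '0')
print(negative_zfill)   # -007
print(negative_rjust)   # 00-7

# Zero-padded numbers keep file names in order when sorted as text
file_name = f'batch_{12:04d}.txt'
print(file_name)        # batch_0012.txt
```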

Other Functions and Methods

There are many other functions and methods for strings.

Many of these are based on the fact that strings are list-like.

We will cover these when we review lists.