Help support the author by donating or purchasing a copy of the book (not available yet)



Chapter 6 - Strings

6.1 - Introduction to strings

In this chapter we're going to take a closer look at strings. We have met them various times throughout the book and we know they represent textual data.

Manipulating textual data is common in programming so becoming comfortable working with strings is essential!

In Python we represent strings as a sequence of characters enclosed by " " or ' '. For example:

my_string = hello, world!    # This is not a string

my_string = "hello, world!"  # This is a string!
my_string = 'hello, world!'  # This is also a string!

Python also offers a large set of operations, methods and functions for working with strings, some of which we'll meet later on in this chapter. Working with strings in Python is also quite easy compared to other languages.

There are a couple of things I'd like to point out before we jump onto string methods. Firstly, I want to take another quick look at the print function.

print() converts each of its arguments to strings, inserts a space character in the output between each of its arguments then appends a newline character and writes the result to standard output (in our case the terminal window).

When I say argument I mean the following:

print("This", "and", "that")

# OUTPUT
This and that

In the above code, each of the strings is an argument and you can have as many arguments as you like (we'll look at arguments in more detail later in the functions chapter).
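To make that description of print() a little more concrete, here is a small sketch. The variable name by_hand is just something I made up for illustration; it rebuilds by hand the line that print() produces for us. The + used here joins strings together, which we'll meet properly in section 6.7.

```python
# print() converts each argument to a string and puts a space between them.
print("Total:", 3, 4.5, True)

# Doing the same job by hand: convert each value with str() and add the spaces.
by_hand = str("Total:") + " " + str(3) + " " + str(4.5) + " " + str(True)
print(by_hand)
```

Both calls print the same line, which is why you rarely need to convert values yourself before printing them.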

When working with textual data (strings) we have some special characters. Some of these are:

Special Character    Meaning
\n                   Newline
\r                   Carriage return
\t                   Tab

And here is what they do:

>>> print("Hello\nWorld")
Hello
World

>>> print("Welcome to my\rhome")
homeome to my

>>> print("Apples\tOranges")
Apples  Oranges

The \t adds a tab, the \n moves to a new line, and the \r returns to the start of the current line, so whatever comes after it overwrites what was already printed there, up to the length of the new text. Don't worry about it too much; I've only encountered it a handful of times.

We probably won't encounter the tab or carriage return characters again throughout this book, but I want to warn you about the newline character. Just because you can't see it doesn't mean it's not there, so if you are ever getting unexpected output, the newline character might be causing it! We'll look at dealing with it a little later on.
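Here is a minimal sketch of how a hidden newline can be spotted. It uses the built-in repr() function, which we haven't covered; all you need to know for now is that it shows a string the way you would type it in code, special characters included.

```python
line = "hello\n"          # a string that ends with a newline character

print(line)               # prints hello, followed by an extra blank line
print(len(line))          # 6: five letters plus the newline
print(repr(line))         # 'hello\n' - the newline is now visible
```

If a comparison or a length check ever gives you a surprising answer, printing the repr() of the string is a quick way to check for characters you can't see.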

There is also a special string called the empty string. This is represented as "".

6.2 - String functions

As I've said previously, Python provides lots of functions for working with strings, so let's take a look at some of the most important ones now!

Function    What it does
len()       Returns the length of the string
str()       Returns the string representation of an object
chr()       Converts an integer to a character
ord()       Converts a character to an integer

Let's look at these in action:

>>> len("Hello")
5
>>> len("Hello\nWorld")     # Newline character counts!
11
>>> str(5 + 5)
'10'
>>> chr(65)
'A'
>>> ord("B")
66

len() and str() work as expected. We cast a value of another type to a string using the str() function.

But what are chr() and ord() doing?

At the most basic level, computers store information as numbers. To represent characters, a translation scheme has been devised which maps each character to its representative number.

The simplest of these schemes is called ASCII, which you may have heard of. It stands for American Standard Code for Information Interchange and it covers the common Latin characters you're probably used to working with. Python itself uses Unicode, a much larger scheme whose first 128 codes match ASCII.

chr() and ord() are used for translating between codes and characters.
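As a small illustration of translating in both directions, here is a sketch using a few variable names of my own (code, next_letter and round_trip):

```python
code = ord("a")              # the code for lowercase a
next_letter = chr(code + 1)  # one code further along is the next letter
print(code, next_letter)     # 97 b

# Converting a character to its code and back gives the same character.
round_trip = chr(ord("Z"))
print(round_trip)            # Z
```

Because letters are stored in alphabetical order, adding 1 to the code of a letter gives you the code of the letter after it.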

6.3 - String methods

We're now going to look at string methods. They're kind of like functions, except they belong to a particular type, in this case strings. A function such as len(), by contrast, works on many different types.

The syntax for a method is as follows:

object.method(<arguments>)

Let's look at some of the most commonly used string methods:

Method            Meaning
s.capitalize()    Returns a copy of s with the first character capitalized and the rest converted to lowercase
s.title()         Returns a copy of s with the first letter of each word capitalized
s.upper()         Returns a copy of s with all alphabetic characters converted to uppercase
s.lower()         Returns a copy of s with all alphabetic characters converted to lowercase
s.isalnum()       Returns True if s is nonempty and all its characters are alphanumeric (letters and numbers) and False otherwise
s.isalpha()       Returns True if s is nonempty and all its characters are alphabetic and False otherwise
s.strip()         Returns a copy of s with the whitespace on the left-hand side and right-hand side removed
s.lstrip()        Returns a copy of s with the whitespace on the left-hand side removed
s.rstrip()        Returns a copy of s with the whitespace on the right-hand side removed

Let's look at these in action:

>>> s = "tHiS is A StRIng"
>>> s.capitalize()
'This is a string'

>>> s = "tHis is A StRIng"
>>> s.title()
'This Is A String'

>>> s = "tHis is A StRIng"
>>> s.upper()
'THIS IS A STRING'

>>> s = "tHis is A StRIng"
>>> s.lower()
'this is a string'

>>> s = "937ThIsIsAStRing"           # no spaces: a space is not alphanumeric
>>> s.isalnum()
True

>>> s = "93 *** ThIs is A STRing"    # spaces and * are not alphanumeric
>>> s.isalnum()
False

>>> s = "937ThIsIsAStRing"           # digits are not alphabetic
>>> s.isalpha()
False

>>> s = "ThIsIsAStRing"
>>> s.isalpha()
True

>>> s = "      This is a string         "
>>> s.strip()
'This is a string'

>>> s = "    This is a string    "
>>> s.lstrip()
'This is a string    '

>>> s = "    This is a string    "
>>> s.rstrip()
'    This is a string'

For some of these methods, when we don't pass any arguments, they have a default behaviour. However, we can change this by passing arguments to the method. Let me show you how that works:

>>> s = "This is my stringggggg"
>>> s.rstrip("g")                # rstrip() now removes 'g' from the right-hand side instead of whitespace
'This is my strin'

Make sure you become familiar with these; you'll probably use them often (some more often than others).
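These methods are often used together to tidy up text. The sketch below uses a made-up messy string (raw) and calls one method on the result of another, which is allowed because each method returns a new string.

```python
raw = "   HeLLo WoRLD  \n"      # hypothetical messy input, including a newline

cleaned = raw.strip().lower()   # strip() runs first, then lower() runs on its result
print(cleaned)                  # hello world

# The methods returned new strings; raw itself is unchanged.
print(raw == "   HeLLo WoRLD  \n")   # True
```

Note that strip() removes the newline character as well, since a newline counts as whitespace.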

6.4 - String indexing

Important Note: This section is incredibly important so pay particular attention to it!

Remember we talked about memory and counting from 0 for indices? Well let's combine those two ideas here.

I mentioned earlier in the chapter that strings are a sequence of characters. In memory they are represented as:

-------------------------------
|  H  |  E  |  L  |  L  |  O  |     # Each memory location contains one of the characters
-------------------------------



-------------------------------
|  H  |  E  |  L  |  L  |  O  |     # Each location also has an address
-------------------------------
   |     |     |     |     |
   13    14    15    16    17        # These are hypothetical addresses
   

-------------------------------
|  H  |  E  |  L  |  L  |  O  |     # We can get the address of 'E' as 13 + 1
-------------------------------     # Similarly for 'H' as 13 + 0
   |     |     |     |     |
   13    14    15    16    17
   
                                    # The 0 above is an "index" so is the 1!

We can index strings in a similar way! The syntax for indexing into a string is as follows:

>>> s = "Hello"
>>> s[0]
'H'
>>> s[1]
'e'
>>> s[2]
'l'
>>> s[3]
'l'
>>> s[4]
'o'
>>> s[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

Notice that in the last line we try to index the string at position 5, but we get an IndexError telling us the index is out of range. This is because we start counting from 0, so 0 to 4 are the 5 valid indices of the string.

So, the final index of string s is: len(s) - 1 and we can index using this: s[len(s) - 1]

String indexes may also be negative numbers and will specify locations relative to the end of the string.

>>> s = "Hello"
>>> s[-1]
'o'
>>> s[-2]
'l'
>>> s[-3]
'l'
>>> s[-4]
'e'
>>> s[-5]
'H'

H   E   L   L   O       # String
0   1   2   3   4       # Positive indices
-5  -4  -3  -2  -1      # Negative indices
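To tie positive and negative indices together, here is a short sketch (the variable names are my own) that reads the last character both ways and then walks through the string with a while loop:

```python
s = "Hello"

last_positive = s[len(s) - 1]   # index 4
last_negative = s[-1]           # the same character, counted from the end
print(last_positive, last_negative)   # o o

# Visit every character using a while loop and an index.
i = 0
while i < len(s):
    print(i, i - len(s), s[i])  # positive index, matching negative index, character
    i += 1
```

For any valid positive index i, the matching negative index is i - len(s), which is exactly what the table above shows.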


6.5 - Immutability

In Python, every variable refers to an object, and objects come in two kinds: mutable and immutable. So far we have only met immutable types, namely ints, floats, bools and strings. An object being immutable means its value can't be changed once it has been created.

For example, doing something such as:

10 = 3
True = False
"Hello" = "World"

This wouldn't make any sense and it's probably quite obvious why. However, now that we know about string indexing, doing something such as:

s = "string"
s[0] = "b"

This might seem reasonable. However, it's not allowed, and there is a good reason for it. What if we tried something such as:

s = "string"
s[0] = "bl"

Now that might also seem reasonable, but remember how strings are stored in memory. When they are created they have a fixed length, and trying to change that length as above would cause issues on a lower level (not that it can't be done, it's just not worth it in 99% of cases). Making strings immutable increases performance and security, but don't worry about that for now. Just know you can't change a string's value once it's created.
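If you're curious what Python actually does when you try, here is a minimal sketch. It uses try and except, which we haven't covered yet; for now, read it as "try this, and if a TypeError happens, run the except part instead of crashing".

```python
s = "string"

try:
    s[0] = "b"                  # not allowed: strings are immutable
except TypeError as error:
    message = str(error)        # the explanation Python gives for the error
    print("TypeError:", message)

# String methods never change s; they return a new string instead.
t = s.upper()
print(s)   # string
print(t)   # STRING
```

The second half is the important habit to pick up: when you want a changed version of a string, you create a new string and store it in a variable.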

To get a better understanding of how strings are stored in memory and how they're referenced, take a look at the code below, then the diagram that follows.

a = "Hello"
b = a
a = "World"
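
[Diagram: arrows from the variables a and b to the strings they refer to, before and after a = "World" is executed]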

Look at the first part of the diagram. I said before that a contains "Hello"; however, that isn't quite the case. I said it for simplicity. What's actually happening is that Python maintains a namespace of mappings from variables to the objects they refer to. After a = "Hello" is executed, a mapping from a to the value "Hello" is added to the namespace. a now points to the memory location that holds the value "Hello". We say that a contains a reference to the value "Hello" stored in memory.

When the line b = a is executed, b now points to the same memory location that a points to. When a = "World" is executed, a new string instance is created and saved in some new memory location, and the location that a points to is updated. b still points to "Hello".

Notice that the string "Hello" itself was never modified; a was simply pointed at a different string. This is how immutable types behave. Try to gain an understanding of this and we'll come back to it and look at it in more detail later on.
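You can check this behaviour for yourself. The sketch below uses the is operator, which we haven't covered; it tells us whether two variables refer to the very same object in memory. The name same_object is just for illustration.

```python
a = "Hello"
b = a
same_object = a is b        # True: both names refer to the same string object
print(same_object)

a = "World"                 # a now refers to a different string object
print(a)                    # World
print(b)                    # Hello - b still refers to the original string
print(a is b)               # False
```

Assigning to a never touched the string that b refers to, which is why b still prints Hello.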

6.6 - String slicing

We have seen how to get characters at specific positions of a string using indexing e.g. s[0] gets the character at the first position of string s.

What if we wanted to get the characters from the first to the third position of the string "Hello", i.e. "Hel"?

Well we can! We use slicing to do so. Slicing works very similarly to indexing except, rather than just specifying the start index, we also give the end index we want.

Here's a couple of examples:

>>> s = "Hello"
>>> s[0:3]
'Hel'

>>> s[0:1]
'H'

The syntax is: string[ <start> : <end> ]

Notice how slicing works though. Using the first example, slicing returns a new string with everything from s at the start index, up to but not including the end index.

If we want everything after the first letter in s, we can omit the end index. We can also omit the start index and keep the end index, which gives us everything from index 0 up to (but not including) the specified end index.

>>> s = "Hello"
>>> s[1:]
'ello'
>>> s[:3]
'Hel'

We can also omit both indexes, in which case we'll get the entire string returned:

>>> s = "Hello"
>>> s[:]
'Hello'

Note that the colon must remain in every case.
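A few consequences of the "up to but not including" rule are worth seeing in code. This is only a sketch with variable names of my own, and it uses + to join two strings, which is covered in section 6.7.

```python
s = "Hello"

# The end index is not included, so the slice has end - start characters.
middle = s[1:4]
print(middle, len(middle))      # ell 3

# Splitting at any position k and joining the halves gives back the original.
k = 2
print(s[:k] + s[k:] == s)       # True

# Unlike indexing, slicing past the end does not raise an IndexError.
past_end = s[2:100]
print(past_end)                 # llo
```

That last point is easy to miss: s[100] is an error, but s[2:100] quietly stops at the end of the string.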

The indices may also be variables:

>>> first = 2
>>> last = 4
>>> s = "Hello"
>>> s[first:last]
'll'

The indices may also be negative numbers and will specify locations relative to the end of the string:

>>> s = "Hello"
>>> s[:-1]
'Hell'
>>> s[-4:]
'ello'

Python also offers extended slicing for strings. This adds a third parameter when slicing the string. This third parameter indicates the step size and the syntax is s[first:last:step].

This is done as follows:

>>> s = "This is a string to be sliced"
>>> s[::1]
'This is a string to be sliced'
>>> s[::2]
'Ti sasrn ob lcd'
>>> s[1:8:2]               # indices 1, 3, 5 and 7 (index 7 is a space)
'hsi '

The step size can also be negative, which is useful for reversing a string:

>>> s = "This is a string to be sliced"
>>> s[::-1]
'decils eb ot gnirts a si sihT'
>>> s[::-2]
'dcl bo nrsas iT'
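One common use of a reversed string is checking whether a word reads the same backwards (a palindrome). Here is a minimal sketch; the example word and the variable names are my own.

```python
word = "level"                       # hypothetical example word
is_palindrome = word == word[::-1]   # compare the word with its reverse
print(is_palindrome)                 # True

every_third = "abcdefghi"[::3]       # indices 0, 3 and 6
print(every_third)                   # adg
```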

6.7 - String operators

Strings also have operators: the + operator and the * operator. They don't behave the same way as they do for integers or floats.

The + operator is called the concatenation operator and is used to join strings together.

>>> s1 = "Hello"
>>> s2 = "World"
>>> s1 + s2
'HelloWorld'

The * operator is called the replication operator and is used to replicate a string.

>>> s = "Hello"
>>> s * 3
'HelloHelloHello'

These operators also have precedence associated with them. Just as with numbers, * is applied before + unless parentheses say otherwise:

>>> s = "This is a string"
>>> s + ' ' * 3
'This is a string   '
>>> (s + ' ') * 3
'This is a string This is a string This is a string '
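A handy use of the replication operator is drawing a line that is exactly as long as some text. This is just a sketch with made-up variable names:

```python
title = "Chapter 6 - Strings"
underline = "-" * len(title)     # one dash for every character in the title
print(title)
print(underline)

# * is applied before +, so only the "!" is repeated here.
shout = "Hello" + "!" * 3
print(shout)                     # Hello!!!
```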

There is also another special operator called the in operator. This will return a boolean based on whether or not a substring is contained within a string.

>>> s = "This is a string"
>>> "This" in s
True
>>> "That" in s
False
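A couple of details about the in operator tend to catch people out, so here is a short sketch of them. It also shows not in, which is simply the opposite test.

```python
s = "This is a string"

print("this" in s)           # False: the search is case-sensitive
print("this" in s.lower())   # True: compare in lowercase to ignore case
print("" in s)               # True: the empty string is found in every string
print("That" not in s)       # True: "not in" is the opposite of "in"
```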

6.8 - Printing & formatting

We looked at the print function a little while back. It is quite simple in how it works: it takes something as an argument, tries to convert it to a string, then prints it to standard output (we'll look at standard output a little later; for now, though, it's just the terminal window).

Sometimes we want our output to look nice and have better readability. Let's look at an example with bad readability.

i = 0
while i < 12:
    print(i, '* 15 = ', i * 15)
    i += 1

# OUTPUT
0 * 15 =  0
1 * 15 =  15
2 * 15 =  30
3 * 15 =  45
4 * 15 =  60
5 * 15 =  75
6 * 15 =  90
7 * 15 =  105
8 * 15 =  120
9 * 15 =  135
10 * 15 =  150
11 * 15 =  165

We can see that as the numbers become larger in the output, the spacing becomes off and lines start to jut out. We will look at how to make this look nice a little later in this section.

To achieve better formatting on output we use the format() method. This method prepares a string for printing.

The syntax is as follows: '{}'.format(x).

The string on which format operates ('{}') is called the format string. The {} is called a placeholder (or replacement field). Inside the placeholders we specify how data should be displayed, and the arguments to the format method are the items of data that will be inserted into each placeholder and formatted according to the placeholder's format commands.

The general syntax for a format command is {[: <align> <minimum width> <.precision> <type>]} where square brackets indicate optional parameters.

Let's look at this in action:

>>> pi = 3.14159265359
>>> print('{:.3f}'.format(pi))
3.142

Let's break down the format command here ({:.3f}).

The .3 is the precision. In this case we want pi to be formatted to 3 decimal places. The f is the type which stands for floating point type.

We can have multiple placeholders in the format string:

>>> pi = 3.14159265359
>>> e = 2.71828182845
>>> print('The number pi is: {:.3f} and the number e is {:.5f}'.format(pi, e))
The number pi is: 3.142 and the number e is 2.71828

The first argument to be formatted is matched with the first placeholder and the second argument to the second placeholder and so on.

We can also have nested placeholders:

e = 2.71828

i = 0
while i < 6:
    print('{:.{}f}'.format(e, i))
    i += 1

# OUTPUT
3
2.7
2.72
2.718
2.7183
2.71828

Nested placeholders are matched from left to right as well. One way of thinking about this is that each time we encounter an opening curly brace, we insert the next argument.

Another option we have to the format command is the minimum width.

We can use this minimum width to solve the problem we had with the output from the 15 times tables.

i = 0
while i < 12:
    print('{:2d} * {:2d} = {:3d}'.format(i, 15, i*15))
    i += 1

# OUTPUT
 0 * 15 =   0
 1 * 15 =  15
 2 * 15 =  30
 3 * 15 =  45
 4 * 15 =  60
 5 * 15 =  75
 6 * 15 =  90
 7 * 15 = 105
 8 * 15 = 120
 9 * 15 = 135
10 * 15 = 150
11 * 15 = 165

The minimum width will pad out the argument with whitespace to the specified minimum width.

Notice that we have d instead of f. The d type means an integer. We also do not specify a precision here; a precision is written as a . followed by the precision number.

We can also specify an alignment. Alignments are specified with ^ for centred, < for left justified and > for right justified.
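To see the three alignments side by side, here is a small sketch. The square brackets are only there so the padding is visible, and the variable names are my own.

```python
word = "hi"

left = '[{:<6s}]'.format(word)     # padded on the right
centre = '[{:^6s}]'.format(word)   # padded on both sides
right = '[{:>6s}]'.format(word)    # padded on the left

print(left)     # [hi    ]
print(centre)   # [  hi  ]
print(right)    # [    hi]
```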

Let's look at the star example. This is by no means a good solution; it's simply to show the use of the alignment format command:

i = 1
while i < 6: 
    print('{:^10s}'.format('* '*i))
    i += 1

i -= 2
while i > 0:
    print('{:^10s}'.format('* '*i))
    i -= 1
    
# OUTPUT
    *     
   * *    
  * * *   
 * * * *  
* * * * * 
 * * * *  
  * * *   
   * *    
    *

As you can see, in the format command we specify that the string should be centred with a minimum width of 10, and the type is s, which means a string.
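The alignment, minimum width, precision and type can all be combined in one placeholder. Here is a minimal sketch using some made-up data (item and price) to build one row of a neatly lined-up list:

```python
item = "Coffee"      # hypothetical data for illustration
price = 2.5

# <10s  : left-justified string, at least 10 characters wide
# >8.2f : right-justified float, at least 8 characters wide, 2 decimal places
row = '{:<10s}{:>8.2f}'.format(item, price)
print(row)           # Coffee        2.50
print(len(row))      # 18
```

Because every row comes out the same width, printing several rows like this produces tidy columns.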

6.9 - Exercises

IMPORTANT NOTE:

I can't show you everything there is to Python. The book would go on forever. A really important skill all good programmers have is knowing what to look for when they run into a problem they can't solve themselves. For example, I might want to do something with a string but not know how. Knowing where to look to find the answer is super important. You may have even come across a website called Stack Overflow by now. This is going to be a really good resource for you. Therefore, some of the questions in these exercises may require you to go and search for methods to help you arrive at your solution.

There is a lot to take in from this chapter. Strings are really important and used all the time, so becoming comfortable with them is important too.

Question 1

Write a program that takes as input, a single integer from the user which will specify how many decimal places the number e should be formatted to.

Take e to be 2.7182818284590452353602874713527

# EXAMPLE INPUT
4
# EXAMPLE OUTPUT
2.7183

Question 2

Write a program that will take as input a string and two integers. The two integers will represent indices. If the string can be sliced using the two indices, then print the sliced string. If either or both of the integers are outside the string's index range, then print that the string cannot be sliced at those integers.

Assume that the integers can also be negative.

# EXAMPLE INPUT
"This is my string"
2
9

# EXAMPLE OUTPUT
'is is m'

# EXAMPLE INPUT
"This is a string"
10
22

# EXAMPLE OUTPUT
'Cannot slice string using those indices'

Question 3

When you sign up for accounts on websites or apps, you may be told your password strength when entering it for the first time. In this exercise, you are to write a program that takes as input a string that will represent a password.

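Assume a password can contain characters from the following four categories: lowercase letters, uppercase letters, digits and special characters (such as @).

A password's strength should be graded on how many of the above categories are contained in the password. The password should be given a score of 1 to 4.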

If the password is greater than or equal to strength 3 (contains characters from 3 of the above categories) then you should print the strength and that the password is valid. Otherwise print the strength and that the password is not valid.

Hint: You may need to look up some methods that will be useful in determining the class of each character. Google will probably help, but Python's own documentation will also help.

Python comes with built-in documentation. To access this, at the command prompt type pydoc str. This will return the documentation for the str type. Here you will find all methods strings have to offer.

# EXAMPLE INPUTS
978
hjj
jKl
nmM2
r@num978LL
LLLL

# EXAMPLE OUTPUTS
1
1
2
3
4
1

Question 4

Write a program that takes 3 floating point numbers as input from the user: a starting radius, a radius increment and an ending radius. Based on these three numbers, your program should output a table with a sphere's corresponding surface area and volume for each radius.

The surface area and volume of a sphere are given by the following formulae:
$$ A = 4 \pi r^2 $$

$$ V = \frac{4}{3} \pi r^3 $$

Hint: Formatting is key here

# EXAMPLE INPUT
1
1
10
# EXAMPLE OUTPUT
    Radius            Area          Volume
----------      ----------    ------------
       1.0           12.57            4.19
       2.0           50.27           33.51
       3.0          113.10          113.10
       4.0          201.06          268.08
       5.0          314.16          523.60
       6.0          452.39          904.78
       7.0          615.75         1436.76
       8.0          804.25         2144.66
       9.0         1017.88         3053.63
      10.0         1256.64         4188.79

