Help support the author by donating or purchasing a copy of the book (not available yet)



Chapter 22 - Basic Data Structures

The backbone of every piece of software you will write will consist of two main things: data and algorithms. We know that algorithms are used to manipulate the data within our software (and that manipulation should be done as efficiently as possible). For this reason, it is important to structure our data so that our algorithms can manipulate the data efficiently.

We have met various data structures already, lists and dictionaries to name a few. You also know that using lists for example, makes it very easy for us to store and manipulate sequences of data (Trust me, without this abstraction it becomes a bit of a pain).

We have also seen that data structures allow us to model systems that we see in the real world. For example, we could design and build a class that allows us to model a library. We could easily use this to build a library management system.

In this chapter we're going to look at two fundamental data structures that are used everywhere in computing. We're also going to look at some examples of how to take advantage of them.

22.1 - The Stack

The stack is one of the most fundamental data structures in computing. Every time you run a program, that program takes advantage of a stack. I'll first explain how it works, then I'll explain how your computer takes advantage of it.

A stack is a linear data structure that follows a particular order in which the operations are performed. A stack is a LIFO (Last in First Out) data structure. That is, the last element entered into the stack is the first element that will leave the stack. You can think of a stack as a stack of dinner plates. Plates are stacked, one on top of the other. The last plate to be put on top of the stack will be the first plate to be removed as we cant remove the plate at the bottom (or the stack of plates would fall over and they'd all smash).

A stack (usually) has 4 main operations: push (add an element to the top), pop(remove an element from the top), top (show the element at the top) and isEmpty (is the stack empty). We can also add a length method and it is usually useful!

Here is an illustration of a stack

                             |------| <-- Pop                 |------| <-- Push new
|------| <-- Top                                              |------|     element (new
|------|                     |------| <-- New top             |------|     top)
|------|          If we pop: |------|         If we push      |------|
|------|                     |------|         a new element:  |------|
|------|                     |------|                         |------|
|------|                     |------|                         |------|
|------|                     |------|                         |------|

The stack is such a fundamental data structure that many computer instruction sets provide special instructions for manipulating stacks. For example, the Intel Pentium processor implements the x86 instruction set. This allows for machine language programmers to program computers at a very low level (Assembly language for example).

A very special type of stack called the "Call stack" or "Program stack", usually shortened to "The stack" is what some of the instructions in the x86 instruction set manipulate.

Let's consider what happens with the following program at a low level (For anyone with a solid understanding of computers this is a little high level but a good example).

   .
   .
   .
print(4)
   .
   .
   .

In the above Python snippet, we have some set of instructions executing in order (Like we have always seen). At some point we call the print function to output the number 4 to our terminal window. This is where the stack comes into play. Lets look at how our program might be stored in memory (At a high level)

Memory Address       Instruction
----------------------------------
0x0000001            .......       # The interal code for the print function might
0x0000002            .......       # be stored at 0x0...01 to 0x0...03. 
0x0000003            .......
    .
    .
    .
0x0000004            .......       # Some instruction before the print.
0x0000005            print(4)      # The call to print.
0x0000006            .......       # Some instruction after print (The next
    .                              # instruction to be executed).
    .
    .

(I'M AWARE THIS ISN'T WHAT ACTUALLY HAPPENS BUT LET'S GO WITH THIS FOR NOW).

When we run a program, it is loaded into memory as machine code (Code the computer knows how to execute). Instructions are stored sequentially. In this view, we can look at functions as "mini programs" inside our main program. That is, the code for them is stored elsewhere in memory. When we call print(4) we are basically saying, jump to where the code for print is and start sequentially executing instructions until the function has completed, then return to the original location and continue on executing where you left off.

Before I explain this, the CPU has many special pieces of memory within it, called registers (These are like tiny pieces of RAM, usually 32-bits or 64-bits in size). One of these registers is called the instruction pointer register often abbreviated to IP. It contains the memory address of the next piece of code to execute.

In this case we execute the instruction at 0x0000004 ( then increase the IP), in the next instruction we call print(4) which is at location 0x0000005 yet the code for the print function is stored at 0x0000001 to 0x0000003. This is where the stack comes into play. We know that when we print the number we want to continue where we left off and execute the instruction at 0x0000006. In this case we push the return address (0x0000006) onto the stack, set the IP to 0x0000001 and start executing the code for the print function. When we're finished, we pop the return address off the stack (0x0000006) and place this in the IP. Now our program continues to execute where we left off.

I'm sure you can imagine how recursion works now at this level (essentially a series of push, push, push ...... pop, pop, pop).

Things are a lot more complex than this but it isn't necessary for the explanation of how stacks are used at the most fundamental levels. Don't worry too much if you didn't understand all of that. It isn't necessary for this book but it shows just how important the stack is.

Stacks have many high level uses too, for example, undo mechanisms in text editors in which we keep track of all text changes in a stack. They may be used in browsers to implement a back/forward mechanism to go back and forward between web pages.

Let's look at the code for a stack in Python. We will implement a stack using a list

class Stack:
    def __init__(self):
        self.stack = []
    
    def push(self, elem):
        self.stack.append(elem)
    
    def pop(self):
        if len(self.stack) == 0:
            return None
        else:
            return self.stack.pop()
    
    def isEmpty(self):
        if len(self.stack) == 0:
            return True
        return False
    
    def top(self):
        return self.stack[-1]

This is a very simple version of a stack, but we can now use our stack as follows:

stack = Stack()

stack.push(1)
stack.push(4)
stack.push(2)

print(stack.top())

print(stack.pop())
print(stack.pop())
print(stack.pop())

We will get the following output if we run this program

2 # top
2
4
1

When we call the top() method, we don't actually remove the top element, we are just told what it is. When we use pop() method, we do remove the top element.

We will look at an example of where this can be used a bit later.

22.2 - The Queue

Queues in some sense similar to stacks. They are a linear data structure except rather than the operation order being Last in First Out, they are FIFO (First in First Out). They behave exactly like any queue you may think of in real life. A queue at a super market for example, you get in line at the back and wait until you get to the front to leave.

In computing, a queue may be useful in CPU scheduling. In computers, there are many things happening at what seems to be, the same time. For example you may have music playing as you read an article. There are many things happening here. Your music is being played, your monitor is being updated (refreshed) as you scroll through your article and you're using your mouse wheel to indicate when to scroll. In a simple sense, your CPU can only deal with one thing at a time. Your music application needs time on the CPU to output the next bit of music, as you scroll while this is going on, your mouse driver needs CPU time to tell the computer what to do, then your monitor drivers need CPU time to refresh the pixels on your screen. This all seems to be happening at the same time, except it isn't (computers are just so fast it appears it is). What's actually happening in this situation is each program is quickly using the CPU, then jumping back in the queue. This is happening over and over again. A CPU scheduling algorithm deals with which program gets to go next on the CPU and may implement some variation of a queue to handle this. (A priority queue for example).

They have similar methods for adding and removing except they are enqueue() and dequeue() respectively.

I think you get the gist of how you may visualize a queue so I'm just going to jump to the code for implementing a queue. Again, we will implement a queue using a list.

class Queue:
    def __init__(self):
        self.q = []
        
    def enqueue(self, elem):
        self.q.append(elem)
        
    def dequeue(self):
        if len(self.q) != 0:
            return self.q.pop(0)
    
    def isEmpty(self):
        if len(self.q) > 0:
            return False
        return True

And there you go. A very simple Queue class. Notice how we pop from index 0. Recall that the pop() method for lists takes an optional argument which is the index of the list you want to remove an element from. If we don't supply this optional argument it defaults to -1 (the end of the list, which is what stacks do). Here we remove the element at the start of the list but add elements to the end.

Running the following program, will produce the following output

queue = Queue()

queue.enqueue(1)
queue.enqueue(4)
queue.enqueue(2)

print(queue.isEmpty())

print(queue.dequeue())
print(queue.dequeue())
print(queue.dequeue())

print(queue.isEmpty())
False
1
4
2
True

22.3 - Examples

The first example I'm going to look at is for stacks. This question is known to have been asked during past Google coding interviews!

The problem is: Given an input string consisting of opening and closing parentheses, write a python program that outputs whether or not the string is balanced.

For example:

Input: ([{}])
Output: Balanced

Input: (([])
Output: Unbalanced

Input: {(){[]}}
Output: Balanced

Input: [()(){]
Output: Unbalanced

One approach we can take with this problem is to use a stack. Each time we encounter an opening bracket we append that to the stack. When we encounter a closing bracket we pop from the stack and check that the element we popped is the opening bracket for the closing bracket. For example, [ is the opening bracket for ], similarly, { is the opening bracket for } and so on.

Here is a possible solution. Assume our stack class from earlier exists:

opening = ["(", "[", "{"]
# Note that pairs is a dictionary that maps the closing bracket
# to the opening bracket
pairs = {")": "(", "]": "[", "}": "{"}

def balanced(input_string):
    stack = Stack()
    for bracket in input_string:
        if bracket in opening:
            stack.push(bracket)
        elif stack.isEmpty():
            return False # Can't pop anything as stack is empty so it's unbalanced
        elif pairs[bracket] != stack.pop():
            return False # The brackets didn't match up so it's unbalanced
    
    return not stack.isEmpty() # If stack is empty at the end,
                               # return True, else, False

We can make better use of the pairs map and make our code a bit neater but it won't be as readable so I've taken a slightly longer approach.

Now we can run our code as follows to get the desired output from the example input and output above:

# ABOVE CODE HERE

inputString = input()
if (balanced(inputString)):
    print("Balanced")
else:
    print("Unbalanced")

I don't have an example usage of queues as I can't think of anything useful to do with them with the knowledge we've got at the moment. Don't worry though, there will be a good question on queues coming up soon.

2.4 - Exercises

Question 1

You are given two strings S and T. Both strings either contain alphanumeric characters or a '#' character. A '#' means backspace. Your goal is to use your knowledge of stacks to design a function that will decide whether both strings are equal after the back space operation has been applied. For example:

Input: S = "xy#z", T = "xw#z"
Output: True
Explanation: When the backspace operation is applied, both strings become "xz"
    
Input: S = "abcd##", T = "acbd##"
Output: False
Explanation: S becomes 'ab' and T becomes 'ac' which are not the same
    
Input: S = "z##y", T="#d#y"
Output: True
Explanation: Both strings become 'y'
    
Input: S = "er##", T = "u#u#"
Output: true
Explanation: Both S and T become '' (empty string)

Question 2

It is possible to create a queue through the use of two stacks. Implement a queue class like above, except rather than using a list to implement the queue you use two stacks.

This might be a little tricky, you'll have to think about this.

Question 3

It is also possible to create a stack through the use of two queues. Implement a stack class like the one we have seen earlier, except rather than using a list to implement the stack, use two queues.

This might be a little tricky, you'll have to think about this.

Mini Project 1

You are to design an RPN calculator. RPN stands for Reverse Polish Notation. An RPN calculator is also known as a postfix calculator. We as humans do math as follows, for example, 4 + 7 or we must use BEMDAS rules: 8 + ((10 * 3) / 2). In RPN, we would represent these as: 4 7 + and the last expression we represent as 8 10 3 2 / * +. Whats happening here is the operator (+ is in the prefix position rather than infix position as we are used to). This might look a little confusing but its actually quite simple and we can use a stack to solve this problem.

Let' s look at how this works:

Expression: 4 7 +
1) Read the first character (4)
2) This is a number so we push it to the stack
3) Read the next charactr (7)
4) This is also a number so we push it to the stack
5) Read the next character (+)
6) This is an operator. It's a binary operator as it works on two operands, so we
   pop two elements from the stack
7) Add the two numbers together
8) Push the result back onto the stack

If we had a unary operator such as negation then we would pop one element from the stack, calculate the result and push it back.

You are to implement the following operators in your calculator:

Your calculator should read in RPN expressions, one line at a time and evaluate them. There is one additional operator to implement. This is p which stands for print. If you encounter p then print the result. You can assume p will always occur at the end of the input.

Example:

Input: 4 7 3 + + p
Output: 14
Infix: 4 + 7 + 3

Input: 5 8 * n p
Output: -40
Infix: -(5 * 8)

Input: 3 6 1 + + 2 e p
Output: 100
Infix: (3 + 6 + 1)^2

Think carefully about this problem. It's even a great beginner project to host on your Github account to showcase your work to the world!



Help support the author by donating or purchasing a copy of the book (not available yet)



Previous Chapter - Next Chapter