


Chapter 23 - More Advanced Data Structures

In the previous chapter, we learned about some simple yet fundamental data structures. In this chapter we're going to take a look at a couple of new data structures that are a little more advanced in their implementation.

23.1 - Linked Lists

The first data structure we're going to look at in this chapter is a Linked List.

A linked list is a linear data structure in which each element is a separate object. These objects, called nodes, are not stored at contiguous locations in memory. Instead, the nodes are linked together using pointers.

Throughout this book I may have referred to Python lists as arrays and vice versa. Python lists have a complicated implementation under the hood, which allows them to be of arbitrary size, grow and shrink, and store items of many different types.

In many other programming languages, an array is much like a fixed-size list that stores elements of a single type, laid out contiguously in memory. That isn't necessarily the case in Python.

The reference implementation of Python, called CPython, is written in a language called 'C', which is low-level compared to Python. In CPython, lists are implemented as dynamic arrays of pointers rather than linked lists, but many of the pointer-based concepts and techniques we'll learn here carry over to how data structures are built in C.

There are many variations of linked lists but in this chapter we're going to look at one variation called a Singly Linked List.

Below is a diagram of a Singly Linked List:

There are various elements to this so let me explain:

Linked lists have various advantages over arrays (not Python lists). These include dynamic size (they can grow and shrink as needed, much like Python lists, whereas fixed-size arrays can't) and ease of insertion and deletion.

When we delete something from a Python list, the list must be resized and various other bookkeeping must be done (this all happens under the hood, so we don't need to worry about it).

There are also some disadvantages to linked lists compared to arrays. We can't index into them directly, so if we wish to access an element we must iterate through the list, starting from the head, until we find the element we are searching for.

From the above diagram we can break the problem of implementing a linked list down into two things:

  1. A node class: This will contain the data the node will store and a pointer to the next node.
  2. Linked List class: This will contain the head node and methods we can perform on the linked list.

Let's look at the node class:

class Node:
    def __init__(self, elem):
        self.elem = elem
        self.next = None

This is our linked list node class. It's very simple and only contains two attributes: the data the node will store, and a next attribute that points to the following node. The next attribute is set to None by default as it won't point to anything initially. This concept of a node is very powerful when it comes to data structures. It gives us a lot of flexibility when developing other data structures, as we'll see later on.
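
To make the idea concrete, here's a tiny sketch (assuming the Node class above) that links two nodes together by hand:

first = Node(1)
second = Node(2)
first.next = second       # first now 'points to' second

print(first.elem)         # 1
print(first.next.elem)    # 2
print(second.next)        # None (the end of the chain)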

Let's now look at the first attempt at our linked list:

class LinkedList:
    def __init__(self):
        self.head = None

When we initialize our LinkedList it will be empty, that is, the head will point to nothing (None). That's all we need to start implementing methods.

To get started, we'll take a look at the add() method. First we need to think about how we're going to go about this. In our linked list, we'll be adding new elements to the end of the list. First we check if the head is empty. If it is, then we just point the linked list's head at a new node. Otherwise, we "follow" the trail of pointers from node to node until we reach a node whose next attribute points to None. When we find this node, we update its next attribute to point to a new node containing the element we're trying to add.

Let's look at how that's done:

class LinkedList:
    def __init__(self):
        self.head = None

    def add(self, elem):
        if self.head is None:
            self.head = Node(elem)
        else:
            curr = self.head
            while curr.next is not None:
                curr = curr.next
            curr.next = Node(elem)

Here we check whether the head of the linked list is empty. If it is, then we assign a new Node to the linked list's head. Otherwise, we create a new variable called curr, which keeps track of what node we're on. If we didn't do this, we would end up updating the linked list's head node by mistake. After we create this variable, we loop through the elements of the list, going from node to node through the next pointers until we find a node whose next pointer is empty. When we find that node, we update its next pointer to point to a new Node.

This may take a little time to wrap your head around or understand but the more you work with data structures like this or ones similar, the more sense it will make.
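
To see add() in action, here's a minimal sketch (assuming the Node and LinkedList classes above) that builds a small list and then walks the chain of next pointers to print each element:

ll = LinkedList()
for value in [1, 2, 3]:
    ll.add(value)

# Walk the chain of next pointers, starting from the head
curr = ll.head
while curr is not None:
    print(curr.elem)  # Prints 1, then 2, then 3
    curr = curr.next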

Let's look at our deletion method. We have many options when deleting elements from a linked list, just like we do when adding them. We could remove the first element, the last element, or the first occurrence of a given element. Since our linked list can hold many occurrences of the same data, we will remove the first occurrence of an element.

Let's look at how that may be done with the help of a diagram:

As you can see from this diagram, we want to delete the node that contains 2. To do this we find the first node whose next pointer points to a Node containing the value 2. When we find it, we just update its next pointer to point to the same node that the Node we want to delete points to. Essentially, we re-route the next pointer to bypass the element we want to delete. This is quite easy to do.

Let's look at how it's done.

class LinkedList:
    def __init__(self):
        self.head = None

    def delete(self, val):
        if self.head is None:
            return
        if self.head.elem == val:
            self.head = self.head.next
            return val

        curr = self.head
        while curr.next is not None and curr.next.elem != val:
            curr = curr.next

        if curr.next is None:
            return
        else:
            curr.next = curr.next.next
            return val

We have to cover a couple of cases here. The first is that the list is empty, in which case we return nothing. The second is that the head of the linked list is the element we want to delete. If it is, then we make the head of the list the second Node in the list (which may be None, but that's ok; it just means the list only had one item). Finally, if neither of those cases applies, we traverse the list using linear search until we find the Node just before the first occurrence of the element, then we 're-route' that Node's next pointer to point to the Node that the deleted Node's next pointer points to.

That last part might seem confusing but we're doing what you see in the diagram. It's probably best at this point if you're confused by that to draw this scenario out on paper to help clarify things.
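
As a quick check, here's a small sketch (assuming you've combined the add() and delete() methods above into one LinkedList class) that mirrors the diagram: we build a list holding 1, 2, 3 and delete the node containing 2:

ll = LinkedList()
for value in [1, 2, 3]:
    ll.add(value)

ll.delete(2)  # Re-routes 1's next pointer to bypass the node holding 2

curr = ll.head
while curr is not None:
    print(curr.elem)  # Prints 1, then 3
    curr = curr.next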

There are a couple of other useful methods we can add here but we'll leave them for later.

Just before I finish up on linked lists, I want to talk a little bit more about the complexities of their methods and why you might use them (or not).

The run-time complexity of the add method is O(n) in this case, as we must iterate over each entry in the list to find the last element. The run-time complexity of the deletion method is also O(n). Although we don't always iterate over every element in the list, Big-O deals with the worst case, which is going through the entire list. There are optimizations we could make, though, and we'll get to them later on too.

Linked lists in Python usually don't have a use case, because Python lists are already very well optimized and dynamic. However, in languages such as C or C++ (where dynamic arrays like Python lists may not exist out of the box), linked lists are essential. They also pop up in technical interviews, so you'd best know them. In fact, if I were hiring an engineer, I'd be very skeptical about hiring one who didn't know how to code a linked list.

23.2 - Binary Search Trees (BSTs)

Before I start this section, I want to give you a warning. We will be using recursion quite a lot here. If recursion is not your strong point (which it probably won't be at this point), perhaps brush up on the recursion section. However, this section might be a good place to help you understand recursion. BSTs are what helped me wrap my head around recursion (well, at least that's when recursion clicked for me).

A binary search tree (BST) is somewhat different from the data structures we have met so far. It is not a linear data structure; it is an ordered data structure that stores items. BSTs keep their items in sorted order so that searching for, adding, and removing items are all fast, unlike a linked list, whose add and remove operations are O(n). On a BST, the add, remove, and search operations are each O(log n) in the average case.

This is a similar situation to linear search vs. binary search which we looked at earlier.

In computer science, a 'tree' is a widely used abstract data type (a data type defined by its behaviour not implementation) that simulates a hierarchical tree structure (much like a family tree), with a root node and subtrees of children with a parent node represented as a set of linked nodes.

A binary tree is a type of tree in which each node has at most two child nodes. A binary search tree is a special type of binary tree in which the elements are ordered. A diagram might help you visualize this, so here's one:

There is a lot to take in here so let me explain. The root of this tree is the node that contains the element 10. The big thing labeled Sub Tree is the right sub tree of the root node.

10 is the parent node of 22 and 5, and similarly, 5 is a child node of 10.

A leaf node is a node that has no children.

As you can see from this diagram, it is a binary tree, as each node has at most two child nodes. It also satisfies an important property that makes it a binary search tree: if we take any node, 10 for example, everything to its right is greater than it and everything to its left is less than it. Similarly, if we look at the node 13, everything to its right is greater than 13 and everything to its left is less than 13.

What makes searching fast is this: if we wanted to check whether 2 was in the tree, we start at the root node and check if 2 is less than the value at the root. In this case 2 is less than 10, so we know we don't need to bother searching the root node's right subtree. Essentially, what we've done here is cut out everything in that blue circle, as the element 2 will not be in there.

Let's look at implementing a binary search tree. Again, the concept of a Node is important here. Our Node class for a binary search tree will be slightly different from the one we used for a linked list. Let's take a look:

class Node:
    def __init__(self, elem):
        self.elem = elem
        self.left = None
        self.right = None

As you can see here, each node will have an element, a left child and a right child (either or both of which may be None).

Let's make our first attempt at a BinarySearchTree class:

class BinarySearchTree:
    def __init__(self):
        self.root = None

Pretty simple so far. Basically, if we want to add, remove, or search for an element, we follow pointers into left or right subtrees until we find what we are looking for.

This time we'll look at the search operation first. It's a good demonstration of how to use recursion in a practical situation. We want to return the Node that contains what we're looking for if the element is found, or None otherwise.

class BinarySearchTree:
    def __init__(self):
        self.root = None
        
    def search(self, value):
        return self.recursive_search(self.root, value)
    
    def recursive_search(self, node, value):
        if node is None or node.elem == value:
            return node
        if value < node.elem:
            return self.recursive_search(node.left, value)
        else:
            return self.recursive_search(node.right, value)

This may seem a little odd as we have two search methods, but there is good reason for it. The recursive_search method needs to keep track of which node it is currently examining, so it takes a node as a parameter. The public search method simply kicks off the recursion at the root node, so the caller only has to supply the value they're looking for. Again, the recursive nature of this can be difficult to follow, so I'd try working through an example on paper after the following explanation.

Inside the recursive_search method we have a base case which checks if we've reached None (we didn't find what we're looking for) or we've reached the Node that contains what we're searching for.

If we don't hit our base case, then we check if the value we're searching for is less than the value at the node we're currently at; if it is, then we search the left subtree of that node by calling the recursive_search method and passing in the current node's left child. Otherwise, the value we're searching for is greater than the value at the current node, in which case we call recursive_search and pass in the current node's right child.

Remember, a node's left or right tree is simply a pointer to a node (we can look at that node as the root node of a subtree).

For example, if we're searching for the value 15 in the tree from the diagram above, then the path we take looks like this:

Just to clarify, a leaf node's left and right 'trees' are None.
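
If you'd like to try this out, here's a minimal sketch (assuming the Node and BinarySearchTree classes above). Since we haven't written an insert method yet, it wires up a small tree by hand, loosely modelled on the diagram, and then searches it:

bst = BinarySearchTree()
bst.root = Node(10)
bst.root.left = Node(5)
bst.root.right = Node(22)
bst.root.right.left = Node(13)
bst.root.right.left.right = Node(15)

found = bst.search(15)
print(found.elem)     # 15
print(bst.search(2))  # None, 2 is not in the tree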

Next we will look at inserting an element into the tree. What we're doing here follows pretty much the same path as search, except that when we reach the end of our path we set that node's left or right pointer to point to a new node (depending on whether the value is less than or greater than the value at that node).

Here's how that's done:

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, val):
        if self.root is None:
            self.root = Node(val)
        else:
            self.recursive_insert(self.root, val)

    def recursive_insert(self, node, val):
        if val == node.elem:
            return  # Value is already in the tree, nothing to do
        if val < node.elem:
            if node.left is None:
                node.left = Node(val)
            else:
                self.recursive_insert(node.left, val)
        else:
            if node.right is None:
                node.right = Node(val)
            else:
                self.recursive_insert(node.right, val)

That's it. I hope you can see why recursion is useful here. It makes our code read better and when we're planning this out in our heads, it naturally fits into our algorithm.
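
Here's a short sketch of insert() in use (assuming you've combined the search and insert methods above into one class):

bst = BinarySearchTree()
for value in [10, 5, 22, 13, 15]:
    bst.insert(value)

print(bst.search(13).elem)  # 13
print(bst.search(99))       # None, 99 was never inserted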

Next we're going to look at deleting an element. This is quite a bit more difficult. With insertion, we always attach the new element as a leaf node, but when we're deleting something, three possibilities arise. The first is the simplest case:

  1. The node we're removing has no child nodes (i.e. it's a leaf node). In this case we can just remove the node.
  2. The next case is when we're removing a node that has only one child. Again, this is simple enough to deal with: we just cut the node we're removing out of the tree and link its child to its parent.
  3. The final case is when we're removing a node that has two child nodes. This is the most complex case and requires some smart thinking. There is, however, one useful property of BSTs that we can use to solve this problem: the same set of values can be represented as different binary search trees. Let's consider the following set of values: {9, 23, 11, 30}. We can represent this in two different ways:

Both of these trees are different yet they are both valid. What did we do here though to transform the first tree into the second?

We can take this idea and apply it to removing elements with two child nodes. Here's how we'll do that: we find the minimum value in the right subtree of the node we're removing, copy that value into the node, and then remove the duplicate value from the right subtree.

When applying the removal to the right subtree to remove the duplicate, we can be certain that the duplicate node will fall under case 1 or 2. It can't fall under case 3: the minimum node can't have a left child (if it did, it wouldn't have been the minimum), so it has at most one child.

There are two methods we'll need here: the remove method itself, and one that finds the minimum node. We get a free feature out of this (our BST will have a min operation!).

Here's how to code it; I'll also comment the code as it's a bit complex:

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def min(self, node=None):
        # With no argument, find the minimum of the whole tree;
        # given a node, find the minimum of that node's subtree
        if node is None:
            node = self.root
        while node.left is not None:
            node = node.left
        return node

    def remove(self, value):
        return self.recursive_remove(self.root, value)

    def recursive_remove(self, node, value):

        # Base case (element doesn't exist in the tree)
        if node is None:
            return node

        # If the element to remove is less than the current node
        # then it is in the left subtree. We must set the current
        # node's left subtree equal to the result of the removal
        if value < node.elem:
            node.left = self.recursive_remove(node.left, value)

        # If the element to remove is greater than the current node
        # then it is in the right subtree. We must set the current
        # node's right subtree equal to the result of the removal
        elif value > node.elem:
            node.right = self.recursive_remove(node.right, value)

        # If the element to remove is equal to the value of the
        # current node, then this is the node to remove
        else:
            # Node has one child or no children
            if node.left is None:
                # If the right subtree is None then good, we return that.
                # If it's another node, fine, we can still return it
                # (bypassing the removed node, as in the second case)
                return node.right
            elif node.right is None:
                # Same idea, mirrored: return the left subtree
                return node.left

            # Node with two child nodes
            else:
                # Find the minimum element in the right subtree
                tmp = self.min(node.right)

                # Copy the min node's value into the current node position
                node.elem = tmp.elem

                # Delete the duplicate node from the right subtree
                node.right = self.recursive_remove(node.right, tmp.elem)

        return node

This is tough. Make sure you understand the remove operation, even if that means going through it over and over again. Use a pen and paper if you need to! It's a good question for an employer to ask, as it shows how someone approaches a tough problem (pointing out different scenarios, breaking the problem down, etc.).
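
To convince yourself the removal works, here's a small sketch (assuming you've combined the insert, search, min and remove methods above into one class):

bst = BinarySearchTree()
for value in [10, 5, 22, 13, 15]:
    bst.insert(value)

bst.remove(22)         # 22 has one child (13), so this is case 2
print(bst.search(22))  # None, 22 is gone
print(bst.min().elem)  # 5, the leftmost node in the tree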

That is all I want to cover on Binary Search Trees for now. There are a few more methods we can add, and I'll leave that up to you in the exercises!

23.3 - Exercises

Important: Some of these exercises are quite difficult! Make use of problem solving techniques to help you arrive at a solution. Sketching out what's happening on paper is a good way to help you out!

It's time to turn up the heat a bit on the difficulty of the questions. There are some really tough questions in here. Good luck!

Question 1

We talked earlier about the run-time complexities of the Linked List methods we implemented. I mentioned that we could improve their complexities.

Currently, the add() method has a run-time complexity of O(n).

Change the implementation of the add() method so that its run-time complexity is O(1).

Question 2

Change the implementation of the add() or remove() methods on the Linked List class in order to achieve a Linked List that behaves like:

Try to give optimized implementations of add() and/or remove() when doing this.

Question 3

Another useful method our Linked List class may have is a length() method. That is, it returns the number of elements in the linked list.

Implement the length() method.

Question 4

If you thought carefully about your implementation of the length() method from the previous exercise then you might be able to skip this question. However, I'm not expecting that you did.

If my guess is correct, your length() method iterates over each element in the list, adding 1 to your length each time until you reach the end of the list. This will give your length() method a run-time complexity of O(n).

We can do better than that though. How might you change the Linked List class so that the run-time complexity of your length() method is O(1)?

Question 5

You might have noticed when testing out the binary search tree that the deletion method doesn't always work.

If the element you want to remove is the tree's root node, and the root node has no children or only one child, then the element won't be removed.

Fix the remove method so that it works in all cases.

Question 6

The height (depth) of a tree is the number of edges on the longest path from the root node to a leaf node. (An edge is basically one of the arrows in the diagrams we've looked at before.)

For example, the height of the following tree is 3.

The height is determined by the number of edges in the longest path, like the following:

You can also view the height of a tree as such:

You should write a recursive method to find the height of any given binary search tree.

Question 7

This question is very difficult.

Consider what happens when we add elements to the tree in the following order: {2, 5, 7, 12, 13}. We end up with what is effectively a linked list: every node has only a right child. That means our search and insert operations are no longer O(log n); they're now O(n). This isn't really that great for us.

We can, however, make an optimization to our binary search tree so that search and insert are always O(log n), regardless of the order in which we insert the elements.

The optimization is to keep the difference between the heights of the left and right subtrees no greater than one for every node. When that holds, we say the tree is balanced. This means that if we insert or remove an element and the height difference between a node's left and right subtrees becomes greater than 1, we must rearrange the nodes.

This optimization means that our binary search tree will become self-balancing. Here is an animation of that self-balancing process.

This type of self-balancing binary search tree has a special name: it's called an AVL Tree.

Remember: This question is very difficult. I don't expect you to be able to answer it, but I encourage you to give it a go!

(Credit to Bruno Schalch for the AVL gif. All rights are owned by him).





