Chapter 9 - Basic Sorting Algorithms

9.1 - What is sorting?

We've learned a lot so far and it's now time to put that knowledge to good use. We're now going to look at your first important algorithms.

Sorting is the process of arranging items systematically. In computer science, sorting refers to arranging items (whatever they may be) in an ordered sequence.

Sorting is a very common operation in applications and developing efficient algorithms to perform sorting is becoming an ever more important task as the amount of data we gather grows at an exponential rate.

Sorting is usually used to make lookup or search efficient, to make processing data in a defined order possible, and to make merging sequences efficient.

In this chapter we're going to look at two basic sorting algorithms: Selection sort and Insertion sort.

9.2 - Selection sort

Selection sort is a general purpose sorting algorithm. For selection sort to work, we must assume there exists some sequence of elements that are orderable (integers, floats, etc.).

Selection sort is an in-place sorting algorithm, which means we don't build a new list; rather, we rearrange the elements of the existing list.

Before we begin we need to look at some terminology:

In selection sort, we break the list into two parts: the sorted subarray and the unsorted subarray. All of the elements in the sorted subarray are less than or equal to all of the elements in the unsorted subarray. We begin with the assumption that the entire list is unsorted.

Next, we search the entire unsorted subarray for the position of the smallest element. We must search it all to be certain that we have found the smallest element. If there are multiple occurrences of the smallest element, we take the position of the first one. We then swap that element into the next position of the sorted subarray. We repeat these steps on the remaining unsorted subarray.

This is illustrated below:

  6    3    9    7    2    8
|||========================|           # FIND THE POSITION OF THE SMALLEST
     UNSORTED PART                     # ELEMENT IN THE LIST AND SWAP WITH 6
       
----------------------------------------------------------------------------

  2    3    9    7    6    8           # 2 IS NOW IN THE CORRECT POSITION
  |===||===================|           # FIND THE POSITION OF THE SMALLEST
           UNSORTED PART               # ELEMENT IN THE LIST AND SWAP WITH 3
                                       # 3 IS THE SMALLEST (NO SWAP)
----------------------------------------------------------------------------

  2    3    9    7    6    8           # 3 IS NOW IN THE CORRECT POSITION
  |========||==============|           # FIND THE POSITION OF THE SMALLEST
    SORTED    UNSORTED PART            # ELEMENT IN THE LIST AND SWAP WITH 9
     PART      
  
----------------------------------------------------------------------------

  2    3    6    7    9    8           # 6 IS NOW IN THE CORRECT POSITION
  |=============||=========|           # FIND THE POSITION OF THE SMALLEST
    SORTED PART    UNSORTED            # ELEMENT IN THE LIST AND SWAP WITH 7
                     PART              # 7 IS THE SMALLEST (NO SWAP)
               
----------------------------------------------------------------------------

  2    3    6    7    9    8           # 7 IS NOW IN THE CORRECT POSITION
  |==================||=====|          # FIND THE POSITION OF THE SMALLEST
      SORTED PART      UNSORTED        # ELEMENT IN THE LIST AND SWAP WITH 9
      
----------------------------------------------------------------------------

  2    3    6    7    8    9           # 8 IS NOW IN THE CORRECT POSITION
  |========================|||         # WE HAVE REACHED THE END OF THE LIST
         SORTED PART                   # THE LIST IS NOW SORTED

Does this seem somewhat familiar? If you did the exercises from the previous chapter, exercise 5 was basically this process!
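
Before we write the whole algorithm, here's a minimal sketch of just the key step on its own: finding the position of the smallest element, using the example list from the diagram above.

a = [6, 3, 9, 7, 2, 8]

p = 0                # Assume the smallest element is at position 0 for now
j = 1
while j < len(a):
    if a[j] < a[p]:  # Found something smaller, remember where it is
        p = j
    j += 1

print(p)             # 4 (the position of the 2)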

It's time to code selection sort! Let's look at the code for it below:

i = 0
while i < len(a):       # a[0:i] is the sorted part, a[i:] is the unsorted part
    p = i               # Position of the smallest element found so far
    j = i + 1
    while j < len(a):   # Search the rest of the unsorted part
        if a[j] < a[p]:
            p = j       # Found a smaller element, remember its position
        j += 1

    tmp = a[p]          # Swap the smallest element into position i
    a[p] = a[i]
    a[i] = tmp

    i += 1              # The sorted part has grown by one element

That's it! There's a lot going on in those few lines, so it's worth taking the time to break it down.

A skill that all programmers have is being able to step through code and figure out how it's working. I encourage you to do the same with this algorithm (grab a pen and paper and walk through an example).

Another good way to figure out what's going on in the middle of some code is to stick in a print() statement to print out which variables hold which values at that point in time. Another option is to set breakpoints. I'm not going to show you how to do this, as it's generally a feature of the text editor you're using, so look up how to set breakpoints in your editor and how to use them! It's a skill you'd be expected to have if you worked in this field.
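
To show the print() idea concretely, here's the selection sort code again with a single print() dropped in (just a sketch, print whatever you find useful):

a = [6, 3, 9, 7, 2, 8]

i = 0
while i < len(a):
    p = i
    j = i + 1
    while j < len(a):
        if a[j] < a[p]:
            p = j
        j += 1

    tmp = a[p]
    a[p] = a[i]
    a[i] = tmp

    print("i =", i, "p =", p, "a =", a)   # Inspect the state after each swap

    i += 1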

9.3 - Insertion sort

Insertion sort is another general purpose, in-place sorting algorithm.

To contrast it with selection sort: instead of searching the unsorted part for its smallest element, insertion sort takes the next element of the unsorted part and moves it backwards through the sorted part until it sits in its correct position.

I'll illustrate this with a partially completed example:


2    4    5    3    9     6
|=============||==========|      # THE STATE OF THE LIST AT SOME POINT
    SORTED       UNSORTED        # DURING THE INSERTION SORT PROCESS
    
-----------------------------------------------------------------------
          
          This is the next element we want to sort
               |
               |
2    4    5    3    9     6
|=============||==========|      # WE WANT TO NOW PLACE THE 3 IN ITS
    SORTED       UNSORTED        # CORRECT POSITION

-----------------------------------------------------------------------
               3      
2    4    5    _    9     6
|=============||==========|      # TAKE 3 OUT OF THE LIST
    SORTED       UNSORTED   
    
-----------------------------------------------------------------------
          
               3 
2    4    5    _    9     6
|=============||==========|      # IS 3 < 5? YES, SO MOVE 5 UP
    SORTED       UNSORTED   
    
-----------------------------------------------------------------------

          3 
2    4    _    5    9     6
|=============||==========|      # IS 3 < 4? YES, SO MOVE 4 UP
    SORTED       UNSORTED   

-----------------------------------------------------------------------

     3 
2    _    4    5    9     6
|=============||==========|      # IS 3 < 2? NO, PLACE AFTER THE 2
    SORTED       UNSORTED  
    
-----------------------------------------------------------------------

      
2    3    4    5    9     6
|=============||==========|      # 3 IS NOW IN THE CORRECT POSITION
    SORTED       UNSORTED 
    
-----------------------------------------------------------------------

     
2    3    4    5    9     6
|==================||=====|      # MOVE ONTO THE NEXT ELEMENT
    SORTED           UNSORTED  

It's time to code insertion sort! Let's take a look at the code for it; I'll give the explanation as comments:

i = 1         # Assume the first element (a[0]) is sorted
while i < len(a):  # Same as with selection sort
    v = a[i]       # The value we want to insert into the correct position
    p = i          # The position the element should be inserted into
    while p > 0 and v < a[p-1]:   # While value is < element to left 
        a[p] = a[p-1]             # Move the element to left up
        p -= 1                    # Decrement p (move one position left)
    a[p] = v       # Found correct position so insert the value
    
    i += 1         # Move to next element
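
As a quick sanity check (just a sketch, using the example list from earlier), you can run the insertion sort code and compare the result against Python's built-in sorted():

a = [6, 3, 9, 7, 2, 8]

i = 1
while i < len(a):
    v = a[i]
    p = i
    while p > 0 and v < a[p-1]:
        a[p] = a[p-1]
        p -= 1
    a[p] = v
    i += 1

print(a)                                  # [2, 3, 6, 7, 8, 9]
print(a == sorted([6, 3, 9, 7, 2, 8]))    # True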

9.4 - Comparison of Selection sort and Insertion sort

In this section I'm going to talk about the algorithmic complexity of the two algorithms. This is a really important section. If you ever do a technical interview, in 99.9% of cases you'll be asked to write some code or an algorithm and give its algorithmic complexity. You need to know this stuff and it isn't just for Python; it applies to any language! I'm going to explain this in the simplest way possible while still giving you a good understanding of the topic.

Selection sort and Insertion sort are what we call quadratic sorting algorithms.

This means they both have a Big O time complexity of O(n^2).
Time complexity refers to the time it takes for the algorithm to complete. There are generally three cases we consider: the best case, the average case and the worst case.

Big O deals with the worst case and it is the case we are usually concerned with!

There's a lot of maths behind this and there's a whole topic of computer science related to it, so I'm not covering it here. Check chapter 24 for that material. Even there I won't be able to cover it all.

Anyway, when we say that both of these algorithms have a time complexity of O(n^2), we are essentially saying that the performance of the algorithms will grow proportionally to the square of the size of the input (lists in this case).
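
To put some rough numbers on that: sorting a list of 1,000 elements takes on the order of 1,000 × 1,000 = 1,000,000 comparisons, while a list of 2,000 elements takes on the order of 2,000 × 2,000 = 4,000,000. Doubling the size of the input roughly quadruples the work.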

I've taken the time to run some algorithms with different time complexities on lists of varying sizes, so we can look at how long each algorithm takes to complete as the size of the list grows. You can see this in the diagram below:

[Figure: Time complexity]

In this diagram, the y-axis represents how long it took the algorithm to finish in seconds and the x-axis shows the number of elements in the list.

The blue line is for an O(n^2) algorithm and the orange is for an O(n) algorithm (Linear time algorithm).

I have a fast computer so it doesn't appear that the orange line is changing (it is, just very slowly).

However, it is very clear that the O(n^2) algorithms become very slow when the size of the input becomes large.
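
If you'd like to reproduce something like this yourself, here's a rough sketch of how you might time the selection sort code on a list of random numbers (try different values of n and watch how the time grows):

import random
import time

n = 2000                       # Try 1000, 2000, 4000, ... and compare timings

a = []
i = 0
while i < n:                   # Build a list of n random integers
    a.append(random.randint(0, 1000000))
    i += 1

start = time.perf_counter()

i = 0                          # Selection sort, exactly as in section 9.2
while i < len(a):
    p = i
    j = i + 1
    while j < len(a):
        if a[j] < a[p]:
            p = j
        j += 1
    tmp = a[p]
    a[p] = a[i]
    a[i] = tmp
    i += 1

end = time.perf_counter()
print(n, "elements took", end - start, "seconds")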

In the real world, a list of size 1000 is small and selection sort or insertion sort wouldn't cut it. However, that doesn't mean they don't have their uses. In fact, in practice, selection sort and insertion sort outperform the faster algorithms on small lists, and some of the fast sorting algorithms will actually switch to these O(n^2) algorithms when they are nearing the end of the sorting process. It turns out the whole process finishes faster when this is done (in some cases).

We also have something called space complexity and this deals with how much memory is taken up by an algorithm. Both of these algorithms have O(1) space complexity. This means they use constant memory. They use constant memory as they are in-place sorting algorithms. They don't create additional lists to assist during the sorting process.
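
You can see the same in-place idea with Python's own built-ins: sorted() builds a brand new list (which costs extra memory), while list.sort() rearranges the existing list, just like our two algorithms do. A quick sketch:

a = [6, 3, 9, 7, 2, 8]

b = sorted(a)    # Returns a new, sorted list; a is left untouched
a.sort()         # Sorts the existing list itself (returns None)

print(a)         # [2, 3, 6, 7, 8, 9]
print(b)         # [2, 3, 6, 7, 8, 9]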

There is usually a trade-off between time and space complexity. As you can see here we have constant memory (This is good) but quadratic time complexity (This is bad). We could write some algorithm that is faster but takes more memory. It depends on the problem and the computing resources we have.

9.5 - Exercises

There won't be any coding exercises in this chapter, just theory questions. They are just as important to understand and get right though!

Question 1

What is the time complexity of:

  1. Insertion sort
  2. Selection sort

Give your answer in Big O notation.

Question 2

What is the space complexity of:

  1. Selection sort
  2. Insertion sort

Give your answer in Big O notation.

Question 3

Give two examples of when we might use selection sort or insertion sort in the real world.

Question 4

If we ended up in a situation in which the input was sorted but we didn't know and we try to sort the input, why might we prefer insertion sort over selection sort?

You may have to go and search for the answer to this.


