Help support the author by donating or purchasing a copy of the book (not available yet)



Chapter 13 - Binary Search

13.1 - Number guessing game

Let's assume we have a sorted array of length 100,000. Further assume somebody has chosen a number from that array, at random and we don't know what it is. We're given 20 attempts to guess the answer. This may seem impossible and that we only have luck on our side.

We can use the knowledge we have so far to write a function that guesses this and we have a couple of ways of doing it.

Let's look at two ways we might attempt this. The first is linear search.

from random import randint
arr = [x for x in range(0, 100000)]
secret = randint(0, len(arr))

i = 0
while i < 20 and i != secret:
    i += 1

if i < 20:
    print("The secret number is " + str(i))
else:
    print("Hard luck, you didn't guess the secret number")

The first line imports the randint() function from the random module which takes two integer arguments and returns a random integer between the arguments.

The second line builds up the array using something called a list comprehension. We'll get back to these soon.

The third line chooses that random number as the secret number and the rest of the code is standard linear search to try find the secret.

Obviously this is a terrible idea as the secret number must be between 0 and 20 which has a 20/100000 chance of happening.

Another approach we could take is, taking 10 random guesses.

from random import randint
arr = [x for x in range(0, 100000)]
secret = randint(0, len(arr))

guess = randint(0, len(arr))
i = 1
while i < 20 and guess != secret:
    guess = randint(0, len(arr))
    i += 1

if i < 20:
    print("The secret number is " + str(i))
else:
    print("Hard luck, you didn't guess the secret number")

Better approach? Maybe, but we can do much better. In fact, we can guess the number within 20 guesses every time without fail. We use a new algorithm called Binary Search to do that.

One of the assumptions made in the previous section was that the list was sorted. Knowing that the list is sorted allows us to take advantage of that fact. In this section we'll look at how we do that using a new searching algorithm called binary search.

Let me illustrate how binary search works:

2  4  9  12  34  35  77
|_____________________|  # The number we were trying to guess is in here

2  4  9  12  34  35  77
| low                 | high
|_____________________|        # We will call these two position low and high

2  4  9  12  34  35  77
| low                 | high
|_____________________|        # The number is somewhere between low and high 

2  4  9  12  34  35  77
| low                 | high   # We're also able to calculate the index for
|_____________________|        # the number in the middle of low and high

2  4  9  12  34  35  77
| low     | middle     | high   # We're also able to calculate the index for
|_________|____________|        # the number in the middle of low and high

#
# Assume the number we're searching for is 4
#

2  4  9  12  34  35  77
| low     | middle     | high   # We're also able to calculate the index for
|_________|____________|        # the number in the middle of low and high

2  4  9  12  34  35  77
| low     | middle     | high   # As the list is sorted we can check if the
|_________|____________|        # number at 'middle' is <= or > 4

2  4  9  12  34  35  77
| low     | middle     | high   # In this case 4 is < 12 so we don't need to 
|_________|____________|        # bother searching anything above middle index

2  4  9  12
| low     | high                # So we cut that portion of the list out
|_________|                     # and make middle the new high

2  4  9  12
| low     | high                # Repeat the process,
|_________|                     # continuously halving the list until the condition
                                # that low < high fails i.e low == high

So our approach here is:

We calculate the midpoint by getting the average of low and high i.e. (low + high) // 2. Notice we're doing integer division here so we'll round down if there is a decimal place.

In the next section we'll look at implementing binary search

In the previous section we looked at how binary search works. Now we need to code it.

Roughly, only 10% of developers are able to code binary search properly. Theres a little cop out in the middle of it so we need to be careful so that our solution works correctly in all cases.

Below is the commented implementation of binary search:

def binary_search(arr, elem):
    low = 0                      # Define initial low value.
    high = len(arr)              # Define initial high value.
    
    while low < high:            # The condition that must hold true.
        mid = (low + high) // 2  # Calculate the mid index.
        
        if arr[mid] < elem:         # Check if the value is in first or second half.
            low = mid + 1           # Update low value if mid val < elem (in second half).
        else:               
            high = mid              # Otherwise update high value (elem in first half).
    return low                      # Return the position of the element
                                    # we're searching for.

It is important that mid and high are never equal. This is one place where some developers mess up when coding this algorithm.

We must add 1 when updating the low value because if low == mid, we will end up in an infinite loop and that is bad news! This is another place where developers mess up.

Since binary search has it's cop outs like those above, I recommend you learn it off by heart.

13.4 - Improving our number guessing game

We can now take this binary search algorithm and modify it slightly to fit our game. We're going to add a check that we don't do more than 20 iterations of the binary search algorithm i.e. halve the list more than 20 times.

from random import randint
arr = [x for x in range(0, 100000)]
secret = randint(0, len(arr))


low = 0
high = len(arr)
i = 0
while low < high and i < 20:
    mid = (low + high) // 2
    
    if arr[mid] < secret:
        low = mid + 1
    else:               
        high = mid
    i += 1

if i < 20:
    print("The secret number is " + str(low))
else:
    print("Hard luck, you didn't guess the secret number")

Our improved version of the game will guess (find) the secret number within 20 attempts every time.

This is important because we get guess the answer correctly, in the worst case, in 20 attempts, whereas our previous approaches, the worst case would have been 100,000 attempts!

In this section I'm going to look at Binary Search's runtime complexity much like I did with Insertion Sort and Selection Sort.

Insertion Sort and Selection Sort had a runtime complexity of:

O(n2)

Binary Search has a runtime complexity of:

O(log(n))

More specifically:

O(log2n)

It is a little trickier to tell what the runtime of an algorithm like this is. As a general rule of thumb, if we halve the input upon each iteration then it will have a logarithmic runtime (or at least, a logarithmic component to its runtime complexity).

Recall the graph of the Insertion Sort and an O(n) algorithm. Here is the graph of an algorithm with binary search's runtime complexity.

You can clearly see that as the input get's much larger, the time it takes for the algorithm to complete begins to level off and increasing the input size begins to have a smaller and smaller effect on the algorithms performance.

Binary Search's space complexity is also O(1), which is constant memory, as we don't create any copies of the list when searching.

This is considered a very efficient algorithm as it's runtime complexity is pretty close to constant and the space complexity is constant.

13.6 - Exercises

Question 1

Answer the following questions:

Question 2

Using what you've learned in this chapter, write a function called insert()that uses an efficient algorithm (hint, hint...) and takes a list and an element as input and returns the index at which that element should be inserted into the list.

For example, calling the insert function as such should return the following:

insert([1, 3, 4, 5], 2)

# RETURNS
1              # i.e. 2 should be inserted at index 1

Question 3

Again, using what you've learned, write a function called contains() that takes two arguments, a list and an element and returns whether or not that element is contained within that list.

For example:

contains([1, 3, 5, 7], 3)

# RETURNS
True
------------------------------
contains([1, 3, 5, 7], 2)

#RETURNS
False


Help support the author by donating or purchasing a copy of the book (not available yet)



Previous Chapter - Next Chapter