Searching and Sorting

Table of Contents


1. Searching

We're going to keep this section short for now because we're mostly going to be focusing on sorting algorithms.

When we're searching for items in an unsorted linear structure
there's not much we can do to speed up the process. We can basically either start at the beginning and move forward, or start and the end and move backward, checking each item in the structure for what you're looking for.
template <typename T>
int LinearSearch( const vector<T>& arr, T findme )
{
    int size = arr.size();
    for ( int i = 0; i < size; i++ )
    {
        if ( arr[i] == findme )
        {
            return i;
        }
    }

    return -1; // not found
}

We begin at the first index 0 and iterate until we hit the last index. Within the loop, if the element at index i matches what we're looking for, we return this index.

If the loop completes and we haven't returned an index yet that means we've searched the entire structure and have not found the item. In this case, it is not in the structure and we can throw an exception to be dealt with elseware or return something like -1 to symbolize "no valid index".

This search algorithm's growth rate is \(O(n)\) – the more items in the structure, the time linearly increases to search through it. Not much we can do about that, which is why we have different types of data structures that sort data as it is inserted - more on those later on.

OK, but what if the structure is sorted?

We're going to be learning about sorting algorithms, so what if we happen to have a structure that is sorted? How can we more intelligently look for some value in the structure?

Let's say we have a simple array like this:

Value: "aardvark" "bat" "cat" "dog" "elephant" "fox"
Index: 0 1 2 3 4 5

And we want to see if "dog" is in the array. We could investigate what the first item is (Hm, starts with an "a") and the last item ("f"), and realize that "d" is about halfish way between both values. Maybe we should start in the middle and move left or right?

  • Index 0 is "aardvark". Index 5 is "fox". Middle value \(\frac{0+5}{2}\) is 2.5 (or 2, for integer division). What is at position 2? – "cat". If arr[2] \(<\) findme, move left (investigate arr[1] next) Or if arr[2] \(>\) findme, move right (investigate arr[3] next).
  • "d" is greater than "c" so we'll move right… Index 3 gives us "dog" - we've found the item! Return 3.

In this case, we basically have two iterations of a loop to find "dog" and return its index. If we were searching linearly, we would have to go from 0 to 1 to 2 to 3, so four iterations.

This still isn't the most efficient way to search this array - just starting at the midpoint and moving left or moving right each time. However, we can build a better search that imitates that first step: Checking the mid-way point each time.

Here is the Binary Search.

template <typename T>
int BinarySearch( vector<T> arr, T findme )
{
    int size = arr.size();
    int left = 0;
    int right = size - 1;

    while ( left <= right )
    {
        int mid = ( left + right ) / 2;

        if ( arr[mid] < findme )
        {
            left = mid + 1;
        }
        else if ( arr[mid] > findme )
        {
            right = mid - 1;
        }
        else if ( arr[mid] == findme )
        {
            return mid;
        }
    }

    return -1; // not found
}

With the binary search we look at the left-most index, right-most index, and mid-point. Each iteration of the loop, we look at our search value findme – is its value greater than the middle or less than the middle?

Example

Let's say we have this array, and we are searching for 'p'.

Value: 'a' 'c' 'e' 'h' 'i' 'k' 'm' 'o' 'p' 'r'
Index: 0 1 2 3 4 5 6 7 8 9

Step 1: left is at 0, right is at 9, mid is \(\frac{0+9}{2}\) = 4 (integer division).

Value: 'a' 'c' 'e' 'h' 'i' 'k' 'm' 'o' 'p' 'r'
Index: 0 1 2 3 4 5 6 7 8 9
  left       mid         right

Next we compare i to 'p'. 'p' comes later in the alphabet (so p > i), so next we're going to change the left value to look at mid+1 and keep right as it is.


Step 2: left is at 5, right is at 9, mid is \(\frac{5+9}{2} = \frac{14}{2}\) = 7.

Value: 'a' 'c' 'e' 'h' 'i' 'k' 'm' 'o' 'p' 'r'
Index: 0 1 2 3 4 5 6 7 8 9
            left   mid   right

Now we compare the item at arr[mid] 'o' to what we're searching for ('p'). p > o so we adjust our left point again to our current midpoint.


Step 3: left is at 7, right is at 9, mid is \(\frac{7+9}{2} = \frac{16}{2}\) = 8.

Value: 'a' 'c' 'e' 'h' 'i' 'k' 'm' 'o' 'p' 'r'
Index: 0 1 2 3 4 5 6 7 8 9
                left mid right

Now we compare the item at arr[mid] ('p') to what we're searching for ('p'). The values match! So the result is mid as the index where we found our item.

Each step through the process we cut out half the search area by investigating mid and deciding to ignore everything either before it (like our example) or after it. We do this every iteration, cutting out half the search region each time, effectively giving us an efficiency of \(O(log(n))\) - the inverse of an exponential increase.


2. Sorting

I'll update the text here later for next semester but I never liked sorting algorithms. I always found the approach to studying them really tedious in uni. I'm not going to make you have to figure out these algorithms yourself - the algorithms are online.

I'm just going to give you the code and we can visually step through how they work. It's possible you'll be asked to implement some sorting algorithms in a job interview if the company is really annoying, but for the most part you're going to be using sorting algorithms already implemented in your day-to-day life rather than implementing these yourself from scratch each time.

You'll find animations and stuff on the class webpage that hopefully illustrate it better than we could in a typewritten format.

Sorting algorithm efficiency (From https://www.bigocheatsheet.com/)

Algorithm Best time Average time Worst time
Bubble Sort \(\Omega(n)\) \(\Theta(n^{2})\) \(O(n^{2})\)
Insertion Sort \(\Omega(n)\) \(\Theta(n^{2})\) \(O(n^{2})\)
Selection Sort \(\Omega(n^{2})\) \(\Theta(n^{2})\) \(O(n^{2})\)
Merge Sort \(\Omega(n log(n))\) \(\Theta(n log(n))\) \(O(n log(n))\)
Quick Sort \(\Omega(n log(n))\) \(\Theta(n log(n))\) \(O(n^{2})\)

2.1. Bubble Sort

template <typename T>
void BubbleSort( vector<T>& arr )
{
    for ( int i = 0; i < arr.size() - 1; i++ )
    {
        for ( int j = 0; j < arr.size() - i - 1; j++ )
        {
            if ( arr[j] > arr[j+1] )
            {
                swap( arr[j], arr[j+1] );
            }
        }
    }
}

2.2. Insertion Sort

template <typename T>
void InsertionSort( vector<T>& arr )
{
    size_t arraySize = arr.size();
    size_t i = 1;

    while ( i < arraySize )
    {
        int j = i;
        while ( j > 0 && arr[j-1] > arr[j] )
        {
            swap( arr[j], arr[j-1] );
            j = j - 1;
        }

        i = i + 1;
    }
}

2.3. Selection Sort

template <typename T>
void SelectionSort( vector<T>& arr )
{
    int arraySize = arr.size();

    for ( size_t i = 0; i < arraySize - 1; i++ )
    {
        int minIndex = i;

        for ( size_t j = i + 1; j < arraySize; j++ )
        {
            if ( arr[j] < arr[minIndex] )
            {
                minIndex = j;
            }
        }

        if ( minIndex != i )
        {
            swap( arr[i], arr[minIndex] );
        }
    }
}

2.4. Merge Sort

// Declarations
template <typename T>
void MergeSort( vector<T>& arr );

template <typename T>
void MergeSort( vector<T>& arr, int left, int right );

template <typename T>
void Merge( vector<T>& arr, int left, int mid, int right );

// Definitions
template <typename T>
void MergeSort( vector<T>& arr )
{
    MergeSort( arr, 0, arr.size() - 1 );
}

template <typename T>
void MergeSort( vector<T>& arr, int left, int right )
{
    if ( left < right )
    {
        int mid = ( left + right ) / 2;

        MergeSort( arr, left, mid );
        MergeSort( arr, mid+1, right );
        Merge( arr, left, mid, right );
    }
}

template <typename T>
void Merge( vector<T>& arr, int left, int mid, int right )
{
    const int n1 = mid - left + 1;
    const int n2 = right - mid;

    vector<T> leftVec;
    vector<T> rightVec;

    for ( int i = 0; i < n1; i++ )
    {
        leftVec.push_back( arr[left + i] );
    }

    for ( int j = 0; j < n2; j++ )
    {
        rightVec.push_back( arr[mid + 1 + j] );
    }

    int i = 0;
    int j = 0;
    int k = left;

    while ( i < n1 && j < n2 )
    {
        if ( leftVec[i] <= rightVec[j] )
        {
            arr[k] = leftVec[i];
            i++;
        }
        else
        {
            arr[k] = rightVec[j];
            j++;
        }
        k++;
    }

    while ( i < n1 )
    {
        arr[k] = leftVec[i];
        i++;
        k++;
    }

    while ( j < n2 )
    {
        arr[k] = rightVec[j];
        j++;
        k++;
    }
}

2.5. Quick Sort

// Declarations
template <typename T>
void QuickSort( vector<T>& arr );

template <typename T>
void QuickSort( vector<T>& arr, int low, int high );

template <typename T>
int Partition( vector<T>& arr, int low, int high );

// Definitions
template <typename T>
void QuickSort( vector<T>& arr )
{
    QuickSort( arr, 0, arr.size() - 1 );
}

template <typename T>
void QuickSort( vector<T>& arr, int low, int high )
{
    if ( low < high )
    {
        int partIndex = Partition( arr, low, high );
        QuickSort( arr, low, partIndex - 1 );
        QuickSort( arr, partIndex + 1, high );
    }
}

template <typename T>
int Partition( vector<T>& arr, int low, int high )
{
    T pivotValue = arr[high];
    int i = low - 1;

    for ( int j = low; j <= high - 1; j++ )
    {
        if ( arr[j] <= pivotValue )
        {
            i++;
            swap( arr[i], arr[j] );
        }
    }

    swap( arr[i+1], arr[high] );
    return i + 1;
}

Author: Rachel Wil Sha Singh

Created: 2023-10-23 Mon 14:45

Validate