# Searching and Sorting

## Table of Contents

## 1. Searching

We're going to keep this section short for now because we're mostly going to be focusing on sorting algorithms.

- When we're searching for items in an
*unsorted*linear structure - there's not much we can do to speed up the process. We can basically either start at the beginning and move forward, or start and the end and move backward, checking each item in the structure for what you're looking for.

template <typename T> int LinearSearch( const vector<T>& arr, T findme ) { int size = arr.size(); for ( int i = 0; i < size; i++ ) { if ( arr[i] == findme ) { return i; } } return -1; // not found }

We begin at the first index 0 and iterate until we hit the last
index. Within the loop, if the element at index `i`

matches what we're looking for, we return this index.

If the loop completes and we haven't returned an index yet that means we've searched the entire structure and have not found the item. In this case, it is not in the structure and we can throw an exception to be dealt with elseware or return something like -1 to symbolize "no valid index".

This search algorithm's growth rate is \(O(n)\) – the more items in the structure, the time linearly increases to search through it. Not much we can do about that, which is why we have different types of data structures that sort data as it is inserted - more on those later on.

**OK, but what if the structure is sorted?**

We're going to be learning about sorting algorithms, so what if we happen
to have a structure that *is* sorted? How can we more intelligently
look for some *value* in the structure?

Let's say we have a simple array like this:

Value: | "aardvark" | "bat" | "cat" | "dog" | "elephant" | "fox" |

Index: | 0 | 1 | 2 | 3 | 4 | 5 |

And we want to see if `"dog"`

is in the array. We could investigate
what the first item is (Hm, starts with an `"a"`

) and the last item
(`"f"`

), and realize that `"d"`

is about halfish way between
both values. Maybe we should start in the middle and move left or right?

- Index 0 is
`"aardvark"`

. Index 5 is`"fox"`

. Middle value \(\frac{0+5}{2}\) is 2.5 (or 2, for integer division). What is at position 2? –`"cat"`

. If arr[2] \(<\) findme, move left (investigate arr[1] next) Or if arr[2] \(>\) findme, move right (investigate arr[3] next). `"d"`

is greater than`"c"`

so we'll move right… Index 3 gives us`"dog"`

- we've found the item! Return 3.

In this case, we basically have two iterations of a loop to find "dog" and return its index. If we were searching linearly, we would have to go from 0 to 1 to 2 to 3, so four iterations.

This still isn't the most efficient way to search this array - just starting at the midpoint and moving left or moving right each time. However, we can build a better search that imitates that first step: Checking the mid-way point each time.

Here is the **Binary Search**.

template <typename T> int BinarySearch( vector<T> arr, T findme ) { int size = arr.size(); int left = 0; int right = size - 1; while ( left <= right ) { int mid = ( left + right ) / 2; if ( arr[mid] < findme ) { left = mid + 1; } else if ( arr[mid] > findme ) { right = mid - 1; } else if ( arr[mid] == findme ) { return mid; } } return -1; // not found }

With the binary search we look at the left-most index,
right-most index, and mid-point. Each iteration of the loop,
we look at our search value `findme`

– is its value
greater than the middle or less than the middle?

- Example

Let's say we have this array, and we are searching for `'p'`

.

Value: | `'a'` |
`'c'` |
`'e'` |
`'h'` |
`'i'` |
`'k'` |
`'m'` |
`'o'` |
`'p'` |
`'r'` |

Index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

**Step 1:** `left`

is at 0, `right`

is at 9, `mid`

is \(\frac{0+9}{2}\) = 4 (integer division).

Value: | `'a'` |
`'c'` |
`'e'` |
`'h'` |
`'i'` |
`'k'` |
`'m'` |
`'o'` |
`'p'` |
`'r'` |

Index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

left | mid | right |

Next we compare `i`

to `'p'`

. `'p'`

comes later in the alphabet (so `p > i`

),
so next we're going
to change the `left`

value to look at `mid+1`

and keep `right`

as it is.

**Step 2:** `left`

is at 5, `right`

is at 9, `mid`

is \(\frac{5+9}{2} = \frac{14}{2}\) = 7.

Value: | `'a'` |
`'c'` |
`'e'` |
`'h'` |
`'i'` |
`'k'` |
`'m'` |
`'o'` |
`'p'` |
`'r'` |

Index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

left | mid | right |

Now we compare the item at `arr[mid]`

`'o'`

to what we're searching for (`'p'`

). `p > o`

so we adjust
our left point again to our current midpoint.

**Step 3:** `left`

is at 7, `right`

is at 9, `mid`

is \(\frac{7+9}{2} = \frac{16}{2}\) = 8.

Value: | `'a'` |
`'c'` |
`'e'` |
`'h'` |
`'i'` |
`'k'` |
`'m'` |
`'o'` |
`'p'` |
`'r'` |

Index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |

left | mid | right |

Now we compare the item at `arr[mid]`

(`'p'`

) to what we're searching for (`'p'`

).
The values match! So the result is `mid`

as the index where we found our item.

Each step through the process we **cut out half the search area** by investigating
mid and deciding to ignore everything either *before it* (like our example)
or *after it*. We do this every iteration, cutting out half the search region
each time, effectively giving us an efficiency of \(O(log(n))\) - the inverse
of an exponential increase.

## 2. Sorting

I'll update the text here later for next semester but I never liked sorting algorithms. I always found the approach to studying them really tedious in uni. I'm not going to make you have to figure out these algorithms yourself - the algorithms are online.

I'm just going to give you the code and we can visually step through how they work. It's possible you'll be asked to implement some sorting algorithms in a job interview if the company is really annoying, but for the most part you're going to be using sorting algorithms already implemented in your day-to-day life rather than implementing these yourself from scratch each time.

You'll find animations and stuff on the class webpage that hopefully illustrate it better than we could in a typewritten format.

**Sorting algorithm efficiency** (From https://www.bigocheatsheet.com/)

Algorithm | Best time | Average time | Worst time |
---|---|---|---|

Bubble Sort | \(\Omega(n)\) | \(\Theta(n^{2})\) | \(O(n^{2})\) |

Insertion Sort | \(\Omega(n)\) | \(\Theta(n^{2})\) | \(O(n^{2})\) |

Selection Sort | \(\Omega(n^{2})\) | \(\Theta(n^{2})\) | \(O(n^{2})\) |

Merge Sort | \(\Omega(n log(n))\) | \(\Theta(n log(n))\) | \(O(n log(n))\) |

Quick Sort | \(\Omega(n log(n))\) | \(\Theta(n log(n))\) | \(O(n^{2})\) |

### 2.1. Bubble Sort

template <typename T> void BubbleSort( vector<T>& arr ) { for ( int i = 0; i < arr.size() - 1; i++ ) { for ( int j = 0; j < arr.size() - i - 1; j++ ) { if ( arr[j] > arr[j+1] ) { swap( arr[j], arr[j+1] ); } } } }

### 2.2. Insertion Sort

template <typename T> void InsertionSort( vector<T>& arr ) { size_t arraySize = arr.size(); size_t i = 1; while ( i < arraySize ) { int j = i; while ( j > 0 && arr[j-1] > arr[j] ) { swap( arr[j], arr[j-1] ); j = j - 1; } i = i + 1; } }

### 2.3. Selection Sort

template <typename T> void SelectionSort( vector<T>& arr ) { int arraySize = arr.size(); for ( size_t i = 0; i < arraySize - 1; i++ ) { int minIndex = i; for ( size_t j = i + 1; j < arraySize; j++ ) { if ( arr[j] < arr[minIndex] ) { minIndex = j; } } if ( minIndex != i ) { swap( arr[i], arr[minIndex] ); } } }

### 2.4. Merge Sort

// Declarations template <typename T> void MergeSort( vector<T>& arr ); template <typename T> void MergeSort( vector<T>& arr, int left, int right ); template <typename T> void Merge( vector<T>& arr, int left, int mid, int right ); // Definitions template <typename T> void MergeSort( vector<T>& arr ) { MergeSort( arr, 0, arr.size() - 1 ); } template <typename T> void MergeSort( vector<T>& arr, int left, int right ) { if ( left < right ) { int mid = ( left + right ) / 2; MergeSort( arr, left, mid ); MergeSort( arr, mid+1, right ); Merge( arr, left, mid, right ); } } template <typename T> void Merge( vector<T>& arr, int left, int mid, int right ) { const int n1 = mid - left + 1; const int n2 = right - mid; vector<T> leftVec; vector<T> rightVec; for ( int i = 0; i < n1; i++ ) { leftVec.push_back( arr[left + i] ); } for ( int j = 0; j < n2; j++ ) { rightVec.push_back( arr[mid + 1 + j] ); } int i = 0; int j = 0; int k = left; while ( i < n1 && j < n2 ) { if ( leftVec[i] <= rightVec[j] ) { arr[k] = leftVec[i]; i++; } else { arr[k] = rightVec[j]; j++; } k++; } while ( i < n1 ) { arr[k] = leftVec[i]; i++; k++; } while ( j < n2 ) { arr[k] = rightVec[j]; j++; k++; } }

### 2.5. Quick Sort

// Declarations template <typename T> void QuickSort( vector<T>& arr ); template <typename T> void QuickSort( vector<T>& arr, int low, int high ); template <typename T> int Partition( vector<T>& arr, int low, int high ); // Definitions template <typename T> void QuickSort( vector<T>& arr ) { QuickSort( arr, 0, arr.size() - 1 ); } template <typename T> void QuickSort( vector<T>& arr, int low, int high ) { if ( low < high ) { int partIndex = Partition( arr, low, high ); QuickSort( arr, low, partIndex - 1 ); QuickSort( arr, partIndex + 1, high ); } } template <typename T> int Partition( vector<T>& arr, int low, int high ) { T pivotValue = arr[high]; int i = low - 1; for ( int j = low; j <= high - 1; j++ ) { if ( arr[j] <= pivotValue ) { i++; swap( arr[i], arr[j] ); } } swap( arr[i+1], arr[high] ); return i + 1; }