Binary Search Trees
Table of Contents
- 1. Introduction to Binary Search Trees
- 2. Architecture of a Binary Search Tree
- 3. BinarySearchTreeNode in C++:
- 4. BinarySearchTree in C++:
- 5. Efficiency of a Binary Search Tree
- 6. Types of Binary Trees
- 7. Binary Search Tree Traversal
- 8. Binary Search Tree Operations
- 9. Balancing a Binary Search Tree
- 10. AVL Tree
- 11. Red-Black Tree
- 12. Conclusion
WIP: ChatGPT converted this from my LaTeX document to orgmode so I still need to review this.
1. Introduction to Binary Search Trees
1.1. Binary search trees are a type of data structure that keep the data it contains ordered. The ordering process happens when a new piece of data is entered by finding a location for the data that adheres to the ordering rules. With a binary search tree, smaller values are stored to the left and larger values are stored to the right.
This means that when we're searching for data, when we land at a node we can figure out whether to traverse left or right by comparing the node to what we're searching for.
2. Architecture of a Binary Search Tree
With a Linked List, we need to implement a Node structure and a LinkedList class, where the Node stores the data and the LinkedList provides an interface for users to add, remove, and search for data and stores a pointer to the first and last elements.
\vspace{1cm}
Similarly for a Binary Search Tree, we need another type of Node to store the data, as well as the BinarySearchTree structure that acts as an interface and keeps a pointer to the root node.
We might also think of a Binary Search Tree as being ordered based on the data's key - some sort of unique identifier we assign to each node - and then containing additional data (a value) within the node.
For example, if we create our Node with two templated types like this:
our key could be a unique lookup (e.g., "employee ID"), and the *data*/value could be another structure that stores more employee data (name, department, etc.).
What's the significance of the hierarchy in this tree? Nothing, really - the point of a binary search tree is that we're assuming the keys we will be pushing into the tree will be in a somewhat random order, and using the BST structure will help keep things ordered and somewhat faster to search through.
We could use a Binary Search Tree in another application where the order of the keys does have some significance, but that's all a design decision.
3. BinarySearchTreeNode in C++:
```cpp
template <typename TK, typename TD> class Node { public: Node() { ptrLeft = nullptr; ptrRight = nullptr; } Node( TK newKey, TD newData ) { key = newKey; data = newData; ptrLeft = nullptr; ptrRight = nullptr; } ~Node() { if ( ptrLeft != nullptr ) { delete ptrLeft; } if ( ptrRight != nullptr ) { delete ptrRight; } } Node<TK, TD>* ptrLeft; Node<TK, TD>* ptrRight; TD data; TK key; };
The node I've written here contains a key, which nodes will be ordered by, and data, which can contain more information.
As with any structure utilizing pointers, the pointers should be initialized to `nullptr` in any constructors.
The destructor here will trigger the deletion of any child nodes, creating a chain reaction to clean up the entire tree if the root node is deleted.
4. BinarySearchTree in C++:
```cpp
template <typename TK, typename TD> class BinarySearchTree { public: BinarySearchTree(); ~BinarySearchTree(); // Basic functionality void Push( const TK& newKey, const TD& newData ); bool Contains( const TK& key ); TD& GetData( const TK& key ); void Delete( const TK& key ); // Traversal functions string GetInOrder(); string GetPreOrder(); string GetPostOrder(); // Additional functionality TK& GetMinKey(); TK& GetMaxKey(); int GetCount(); int GetHeight(); private: // (more here) private: Node<TK, TD>* m_ptrRoot; int m_nodeCount; };
A Binary Search Tree, just like other data structures, can store more functionality than this, or less if needed.
There are additional private methods that would be implemented. This declaration is just showing the public (interface) methods and the private member variables. I will talk about the private helper methods in depth later.
5. Efficiency of a Binary Search Tree
The Binary Search Tree ends up being a good compromise between choosing faster random access but slow search/insert/delete (like with a dynamic array) and faster inserts/deletes but slow search/access (like with a linked list).
The Binary Search Tree ends up being slower than \(O(1)\) (instant) but faster than \(O(N)\) (going through the entire list, where \(N\) is the number of nodes in the tree).
The efficiency of searching through the Binary Search Tree depends on how well balanced the tree is. The more balanced the tree, the fewer steps it will take to find a particular node/key. On average, a well-balanced Binary Search Tree will take \(O(log N)\) time for each search/insert/delete operation.
If the tree becomes more like a linked list (where each node only has one child), searching through the Binary Search Tree will degrade to \(O(N)\) time. So, it's essential to keep the tree relatively balanced to maintain its efficiency.
6. Types of Binary Trees
- Full Binary Tree: Every node in this tree has either 0 or 2 children. There are no nodes with only one child.
- Complete Binary Tree: In a complete binary tree, every level except possibly the last is completely filled, and all nodes are as left as possible.
- Perfect Binary Tree: In a perfect binary tree, all the internal nodes have two children, and all leaf nodes are at the same level.
- Balanced Binary Tree: In a balanced binary tree, the height of the two subtrees of every node never differs by more than one. It ensures that the tree is relatively balanced and provides \(O(log N)\) performance for search, insert, and delete operations.
- Degenerate or Skewed Binary Tree: In this tree, each parent node has only one associated child node. It essentially becomes a linked list.
7. Binary Search Tree Traversal
Traversal is the process of visiting all the nodes in a tree and performing some operations at each node. Binary Search Trees support three types of traversal:
- In-Order Traversal: In this traversal, you first visit the left subtree, then the root node, and finally the right subtree. It gives you the nodes in sorted order when applied to a Binary Search Tree.
- Pre-Order Traversal: In this traversal, you first visit the root node, then the left subtree, and finally the right subtree.
- Post-Order Traversal: In this traversal, you first visit the left subtree, then the right subtree, and finally the root node.
Here's an example of how these traversals look in code:
```cpp
// In-Order Traversal void InOrderTraversal(Node<TK, TD>* pNode) { if (pNode) { InOrderTraversal(pNode->ptrLeft); // Process pNode InOrderTraversal(pNode->ptrRight); } } // Pre-Order Traversal void PreOrderTraversal(Node<TK, TD>* pNode) { if (pNode) { // Process pNode PreOrderTraversal(pNode->ptrLeft); PreOrderTraversal(pNode->ptrRight); } } // Post-Order Traversal void PostOrderTraversal(Node<TK, TD>* pNode) { if (pNode) { PostOrderTraversal(pNode->ptrLeft); PostOrderTraversal(pNode->ptrRight); // Process pNode } }
These are recursive functions that start at the root node and traverse the tree, applying the appropriate action at each node.
8. Binary Search Tree Operations
- Insertion: To insert a new node with a specific key and data into a Binary Search Tree, you start at the root and recursively traverse the tree. When you reach a null pointer, you create a new node with the given key and data and attach it to the parent node.
- Deletion: Deleting a node from a Binary Search Tree can be a bit more complex because you need to maintain the binary search tree properties. There are three cases to consider:
- The node to be deleted has no children: In this case, you simply remove the node from its parent.
- The node to be deleted has one child: In this case, you replace the node with its child.
- The node to be deleted has two children: In this case, you find the inorder successor or predecessor, replace the node with it, and then delete the inorder successor or predecessor.
- Searching: To search for a specific key in a Binary Search Tree, you start at the root and recursively traverse the tree, comparing the key you're looking for with the keys in the nodes. If you find a node with the matching key, you return its data. If you reach a null pointer, you know the key is not in the tree.
- Minimum and Maximum Key: To find the minimum key in a Binary Search Tree, you traverse the tree to the leftmost node. To find the maximum key, you traverse the tree to the rightmost node.
- Count and Height: You can also implement functions to count the number of nodes in the tree and calculate its height.
9. Balancing a Binary Search Tree
Balancing a Binary Search Tree is essential to maintain efficient search, insert, and delete operations. When a Binary Search Tree becomes unbalanced, its height can approach \(N\), making these operations \(O(N)\) instead of \(O(log N)\).
There are several algorithms for balancing Binary Search Trees, with the most common one being the AVL tree and Red-Black tree. These algorithms ensure that the tree remains balanced after insertions and deletions, guaranteeing that search, insert, and delete operations have a time complexity of \(O(log N)\).
10. AVL Tree
An AVL tree (Adelson-Velsky and Landis tree) is a self-balancing Binary Search Tree. It maintains the balance factor of each node to ensure that the tree remains balanced. The balance factor of a node is the height of its left subtree minus the height of its right subtree.
The balance factor of every node in an AVL tree must be in the range [-1, 1]. If the balance factor of any node is outside this range, the tree is rebalanced using rotation operations.
The rotations include left rotations and right rotations to maintain the balance. When a new node is inserted, the balance factors of the affected nodes are updated, and rotations are performed if necessary to rebalance the tree.
Here's an example of a left rotation:
And here's an example of a right rotation:
These rotations ensure that the tree remains balanced, and the height remains \(O(log N)\). The AVL tree guarantees efficient search, insert, and delete operations.
11. Red-Black Tree
A Red-Black tree is another self-balancing Binary Search Tree. It maintains balance using a set of properties and color-coding of nodes. Each node in a Red-Black tree is assigned one of two colors: red or black. The tree's properties ensure that it remains balanced.
The key properties of a Red-Black tree are as follows:
- Every node is either red or black.
- The root node is always black.
- Every leaf (NIL) node is black.
- If a node is red, both its children are black (no two red nodes can be adjacent).
- Every simple path from a node to a descendant leaf node must have the same number of black nodes.
These properties ensure that the tree remains balanced. When you insert or delete a node, you may need to perform recoloring and rotations to maintain these properties.
Here's an example of a Red-Black tree:
The Red-Black tree also guarantees efficient search, insert, and delete operations, with a height of \(O(log N)\).
12. Conclusion
In summary, a Binary Search Tree is a data structure that allows for efficient searching, insertion, and deletion of elements. However, it's essential to keep the tree relatively balanced to maintain its efficiency. AVL trees and Red-Black trees are two common self-balancing Binary Search Tree implementations that ensure efficient operations.