At times we want a structure similar to a queue or stack, but instead of LIFO or FIFO, we want it to pop the maximum element each time.
One data structure that allows us to do this is the binary heap.
Binary Heap
A binary heap is a complete binary tree. This means that it is balanced except perhaps at the last level. One of the properties of a complete binary tree is that its height is the floor of \(\lg N\).
A binary heap is a binary tree that satisfies the following invariant: The item at a given node is not smaller than any of its children. As such, the root of the tree is the maximal element.
Operations
Swimming
If we find a child that is greater than its parent, we simply exchange it with the parent, and iterate if need be. This is called swimming up.
Insertion
To insert a new element, add it as a leaf node, and swim up appropriately. The cost is \(O(\lg N)\), although it has been shown that on average it is constant time. Intuitively one can realize this by noting that roughly half the elements are leaf nodes, a quarter are at the next level, etc.
Sinking
If a parent is smaller than one or both of its children, exchange it with the larger of the two, and recurse. This is sinking.
Pop Max
Exchange the maximal item with the last element. Then sink the new root down to its appropriate position. This is \(O(\lg N)\).
Note that the invariance of being a complete tree is maintained. When you swap the last leaf element with the root, and then delete that leaf, your tree is still balanced.
Sample (Optional)
To get a random sample from a binary heap, just pick a random index.
Delete Random Element (optional)
Delete in the same way as you would the max element. Swap the last element with the element you wish to delete, and sink down.
Building a Heap
A heap can be built with successive insertions. This will be \(O(N\lg N)\). However, if we already have all the elements, a scheme exists that will create the heap at \(O(N)\). See this page.
Implementation
The canonical way to implement a binary heap is via an array. For convenience, I’ll use 1-based indexing.
Parent of item in position \(k\) is at position \(k/2\). Its children are at \(2k\) and \(2k+1\). Due to the simplicity of representation, one does not need links! One does have to keep track of the size of the heap, though, and ensure it doesn’t exceed the size of the array.
Properties
- The priority queue is online - you do not need all the data at once. You can find the maximum, etc thus far.
- Insertion and deletion are \(\lg N\).
- It may not be effective in memory cache applications as one frequently has to access non-contiguous regions in memory (when going from child to parent).
Alternatives
- A d-ary heap is like a binary heap except each node has \(d\) children. Sinking is slower than binary as it has to compare with more branches each time, but swimming is faster (due to a smaller height). It is less likely to have cache mishits.
- A Fibonacci heap is another kind of heap. Insertion is \(O(1)\) and deletion is \(\lg N\) on average.
Applications
Top \(M\) Items in \(N\) Elements
You can find the top \(M\) items in a list of \(N\) elements without having to store all \(N\) of them (as many sorting algorithms will require).
N-Particle Collisions
If you have \(N\) particles in a box and want to simulate their movements along with their collisions, a brute force approach can take \(O(N^{2})\) calculations at each time step to check for a collision (check each particle with all others). This is too expensive.
Instead, knowing the trajectories, one can predict when a collision can occur. Put the times to collisions in a binary heap. One should update all collisions involving a particle after a collision occurs, as its trajectory has changed and may no longer collide with some particles.
Find Median
One can build a data structure that always pops the median. This is done by building two binary heaps. In one binary heap you will always have the elements less than the median (this is a max heap). In the other you will have elements greater than the median (this is a min heap) When a new element comes along, pick the appropriate heap to insert it into. Track the lengths of each heap - they should never differ by greater than one. If one of them (e.g. the “greater than” heap) does, then insert the current median into the “smaller than” heap, pop the minimal element in the “larger than” heap, and that becomes your new median.
Implementations
Python
Use the heapq module.
C++
Use priority_queue in the queue library:
#include <iostream>
#include <queue>
#include <algorithm>
int main(void) {
std::vector<int> v;
std::priority_queue<int> q;
int N = 10;
for(int i=0; i < N; ++i)
{
v.push_back(i);
}
std::random_shuffle(v.begin(), v.end());
for(std::vector<int>::iterator it=v.begin(); it != v.end(); ++it)
{
std::cout << *it << std::endl;
q.push(*it);
}
std::cout << std::endl;
std::cout << std::endl;
std::cout << q.top() << std::endl;
q.pop();
std::cout << q.top() << std::endl;
q.pop();
std::cout << q.top() << std::endl;
q.pop();
return 0;
}