Counting Sort
Suppose we have \(N\) elements, but we know the elements are drawn from a fixed, finite list (called the alphabet in this article) of size \(R\). We can sort this list in linear time.
Algorithm
Create an array of size \(R+1\). This array will contain the number of times each element appears in the list. We arbitrarily assign each element in the alphabet some index in this auxiliary array.
For illustration purposes, let the “alphabet” be the English alphabet. Let’s assign 0 to a, 1 to b, etc.
Let the list be:
Element |
---|
d |
a |
a |
b |
e |
d |
b |
d |
a |
b |
c |
e |
a |
d |
c |
In the first pass of the list, we count the number of times each element appears. So if “a” appears 3 times, we’ll put 3 in the index of 1 (not 0) in the auxiliary array. If “b” appears 5 times, we’ll put 5 in the index of 2, and so on.
So the auxiliary array looks like:
Counts |
---|
0 |
4 |
3 |
2 |
3 |
2 |
Next, we replace these values by their “cumulative” values:
Counts |
---|
0 |
4 |
7 |
9 |
12 |
14 |
These values represent the starting index for each letter of the alphabet.
Now we iterate through our original list again, and every time we encounter a letter, we place it in the index in this auxiliary array. Whenever we place a letter, we increment the value in that index (so the next time the letter shows up, it’ll go into the next slot).
Complexity
The complexity of this algorithm is roughly \(\Theta(2N+R)\).
Note that this algorithm is linear. This is because no comparisons were made. This is not a comparison based algorithm, so it is not bounded below by \(N\lg N\).
Properties
- It is a stable sort.
- If \(R>>N\), then this may not be the best sorting algorithm. Don’t be fooled by its linear nature.
Implementations
Python
from collections import Counter, OrderedDict
def identity(x):
return x
def counting_sort(lst, mapping, function=identity):
"""
Counting sort. The assumption is that the entries in the list come from a
finite alphabet. Performance is good if the alphabet is much smaller than
the size of the list.
Keyword Arguments:
lst -- List to sort
mapping -- A dictionary that maps the elements in the alphabet to numbers.
function -- lst may contain complex objects. An optional function can be
passed that will extract the 'key' of the element. The default
is simply the identity function.
"""
counter = [0] * (len(mapping)+1)
for letter in lst:
counter[mapping[function(letter)]+1] += 1
cumul = {}
running_sum = 0
for index in xrange(len(mapping)):
cumul[index] = running_sum
running_sum += counter[index+1]
new_lst = [None] * len(lst)
for letter in lst:
new_lst[cumul[mapping[function(letter)]]] = letter
cumul[mapping[function(letter)]] += 1
return new_lst
C++
#include <algorithm>
#include <vector>
#include <map>
std::vector<char> counting_sort(std::vector<char> list,
std::map<char, int> mapping);
std::vector<char> counting_sort(std::vector<char> list,
std::map<char, int> mapping)
{
std::vector<int> counter(mapping.size()+1);
for(std::vector<char>::iterator iter = list.begin(); iter != list.end();
++iter)
{
++counter[mapping[*iter]+1];
}
std::map<int, int> cumul;
int running_sum = 0;
for(int index=0; index < mapping.size(); ++index)
{
cumul[index] = running_sum;
running_sum += counter[index+1];
}
std::vector<char> new_list(list.size());
for(std::vector<char>::iterator iter = list.begin(); iter != list.end();
++iter)
{
new_list[cumul[mapping[*iter]]] = *iter;
++cumul[mapping[*iter]];
}
return new_list;
}