Quick Sort

Posted by Beetle B. on Wed 13 August 2014

A quick sort works recursively:

  1. Shuffle the list. This protects against a carefully crafted input that will make the algorithm \(\Theta(N^{2})\).

  2. Pick an element (called the pivot) and place the pivot into its final place in the list in a manner that partitions the list into left and right sublists. Everything in the left sublist is smaller than the pivot. Everything in the right sublist is greater than the pivot.

    How one chooses to partition is crucial for performance.

  3. Apply the algorithm to each sublist recursively.

One may choose to use insertion sort when the sublist is smaller than, say, 10 elements.

Partitioning

The partitioning scheme greatly affects the performance. One method (the 3-way quicksort) is to:

  1. Make the first element the pivot (if you made another element a pivot, simply swap it with the first).
  2. Let i be the index to the next element, and j the index to the last element.
  3. As long as the value at i is smaller than the pivot, keep incrementing it, as well as moving the pivot forward by one.
  4. Once the value at i is greater than the pivot, swap it with the value at j, and decrement j.
  5. If the value at i equals the pivot, increment i, without moving the pivot.
  6. Stop when j passes i.

Choice of Pivot

The best pivot is the median of the list. As we don’t know this in advance, it can be estimated.

One option is to take the median of the first 3 elements (after shuffling).

Another is to use Tukey’s ninther, which is the median of 3 numbers. One number is the median of the first 3 elements. Another is the median of the next 3 elements. And the third is the median of the next 3. This is a robust estimator.

Equal Keys

If the array has lots of elements with equal keys, many standard implementations become quadratic. In particular, do not try to put all elements of the same key in one sublist. The 3-way partition above is supposed to handle multiple duplicate keys.

Complexity

The complexity is \(\Theta(N\lg N)\) on average. The worst case is still quadratic, but the likelihood of getting a worst case input is really low for large data sets (assuming the algorithm appropriately shuffles the list).

If you have a constant number of distinct keys, then the algorithm given with 3-way partitioning is linear!

As noted, the ideal case is when you pick the median element each time as the pivot. This is a trivial application of the master theorem. However, do note that if you always make the pivot to be of the some fixed percentile, then the recursion tree method still gives you \(\Theta(N\log N)\). You don’t need to pick the median. You could always split into 5% and 95% and still get good performance. The algorithm is robust.

Derivation of Average Complexity

Let’s directly calculate the expected number of comparisons needed in the quicksort algorithm.

\begin{equation*} E[C]=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}P[X_{ij}=1] \end{equation*}

where \(X_{ij}=1\) means that \(z_{i},z_{j}\) get compared (\(i\) th and \(j\) th smallest terms).

Now:

\begin{equation*} \forall i<j,P[X_{ij}=1]=\frac{2}{j-i+1} \end{equation*}

To see this, look at the ascending sequence: \(z_{i},z_{i+1},\dots,z_{j}\). If any of the “middle” terms is chosen as a pivot, then \(X_{ij}=0\) as the two terms will get thrown into separate sublists.

Now note that in the algorithm, every element will eventually be chosen as a pivot. If \(z_{i}\) or \(z_{j}\) get chosen as pivots before the other terms, then they will be compared. Otherwise they will not. The probability of this happening is given by the uniform distribution \(\frac{2}{j-i+1}\) (we ignore all other terms).

Thus:

\begin{equation*} E[C]=2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{1}{j-i+1} \end{equation*}

Looking at the inner sum, we note that it is bounded above by \(\sum_{k=2}^{n}\frac{1}{k}\). If we use this inequality, the inner term is independent of \(i\), so we get:

\begin{equation*} E[C]\le 2n\sum_{k=2}^{n}\frac{1}{k}\le 2n\ln n \end{equation*}

Properties

  1. Quicksort is not stable. Variants exist that are but they often lose the benefits of quicksort.
  2. Quicksort makes more comparisons than mergesort, but is usually faster, as the inner loop is tight. It may have fewer exchanges.
  3. It is in-place, which is one of the main advantages it has over mergesort. Presumably this gives it an advantage in caches, etc. However, it does use \(\lg N\) space for the function stack in recursive calls.
  4. If the input is randomly arranged, then the left and right sublists will also be random.

Implementations

Python

from random import shuffle

def simple_compare(left, right):
    """
    Simple Comparison using < and >.

    Keyword Arguments:
    left  -- 
    right -- 
    """
    if left < right:
	return -1
    elif left > right:
	return 1
    else:
	return 0

def median_3(a, b, c):
    """
    Get median of 3 elements.

    Not a good way of doing this, but...
    """
    if (a <= b):
	if (b <= c):
	    return b
	else:
	    if (a <= c):
		return c
	    else:
		return a
    else:
	if a <= c:
	    return a
	else:
	    if (b <= c):
		return c
	    else:
		return b
	    

def guess_median(lst):
    """
    Estimate the median in the list. If more than 9 elements, it uses
    Tukey's ninther.
    """
    length_lst = len(lst)
    if length_lst < 3:
	return lst[0]
    if length_lst > 9:
	median1 = median_3(*lst[:3])
	median2 = median_3(*lst[3:6])
	median3 = median_3(*lst[6:9])
	return median_3(median1, median2, median3)
    else:
	return median_3(*lst[:3])

def partition(lst):
    """
    Partition a list based on pivot. This is the 3-way quicksort method

    Keyword Arguments:
    lst   -- 
    """
    pivot = lst[0]
    pivot_index = 0
    i = 1
    j = len(lst) - 1
    while (i <= j):
	if lst[i] < pivot:
	    lst[pivot_index], lst[i] = lst[i], lst[pivot_index]
	    pivot_index += 1
	    i += 1
	elif lst[i] > pivot:
	    lst[i], lst[j] = lst[j], lst[i]
	    j -= 1
	else:
	    i += 1
    return pivot_index, j
    
def quicksort(lst, comparator=simple_compare, is_shuffled=False):
    """
    Quick sort. It takes in a list of objects and sorts them based on
    the comparator. This algorithm uses the 3-way quicksort and shuffles
    the input before doing anything else.

    Note that this does /not/ sort in place! It creates temporary memory.
    
    Keyword Arguments:
    lst        -- List to sort
    comparator -- Comparator function. It should take in two arguments, and
                  return -1 if the left is smaller, 1 if the right is smaller,
		  and 0 if they are equal. If none is provided, then it will
		  assume a numerical sort.
    is_shuffled -- Flag on whether to shuffle list.
    """
    if len(lst) < 2:
	return lst
    if not is_shuffled:
	shuffle(lst)
    lower, upper = partition(lst)
    lst[:lower] = quicksort(lst[:lower], comparator, True)
    lst[upper+1:] = quicksort(lst[upper+1:], comparator, True)
    return lst

def quicksort2(lst, comparator=simple_compare, is_shuffled=False):
    """
    Quick sort using list comprehensions. A part of me suspects this
    algorithm may be faulty for some data sets (i.e. slow). This
    algorithm shuffles the input before doing anything else.

    Note that this does /not/ sort in place! It creates temporary memory.

    Also note that this method WILL fail if you have different objects
    with duplicate keys, as this simply clones the objects of the same
    key.
    

    Keyword Arguments:
    lst        -- List to sort
    comparator -- Comparator function. It should take in two arguments, and
                  return -1 if the left is smaller, 1 if the right is smaller,
		  and 0 if they are equal. If none is provided, then it will
		  assume a numerical sort.
    is_shuffled -- Flag on whether to shuffle list.
    """
    if len(lst) < 2:
	return lst
    if not is_shuffled:
	shuffle(lst)
    pivot = guess_median(lst)
    left = [x for x in lst if comparator(pivot, x)==1]
    right = [x for x in lst if comparator(pivot, x)==-1]
    left = quicksort2(left, comparator, True)
    right = quicksort2(right, comparator, True)
    num_dupes = len(lst) - (len(left) + len(right))
    return left + [pivot]*num_dupes + right

C++

Use qsort in cstdlib.