Why would we want to count the number of inversions? One application may be to see how close two different people’s rankings of the same products are. If one person ranks a set of movies, then we can use his ranking as the standard, and count the number of inversions in the second person’s rankings with respect to the first person’s. This way we can find which person out of a whole pool has very similar tastes to the “golden” person.
I have no idea if this is really a robust way of doing it, though.
Counting the Number of Inversions
The brute force approach for counting the number of inversions is \(\theta(N^{2})\). We can do better by piggy-backing on merge sort: Let’s say you have left and right sublists. The total number of inversions is the number within the left sublist, the number within the right sublist, and the number between the two sublists (i.e. the elements in the right sublist that are greater than the ones in the left sublist).
To calculate this inter-list number, we utilize the fact that each sublist is already sorted. Hence, if a number on the right sublist is smaller than an entry on the left, then it is smaller than all the other “future” entries in the left sublist (utilizing sortedness here). So counting during the merge operation is \(\Theta(N)\).
The down side to this approach is that you are modifying the list.
Implementation
Python
def simple_compare(left, right):
"""
Keyword Arguments:
left --
right --
"""
if left < right:
return -1
elif left > right:
return 1
else:
return 0
def merge(lsta, lstb, comparator):
"""
Merge two sorted lists together and return the merged list. This may be
inefficient as I take a lot of slices and produce temporary lists in several
places.
Also returns the number of inversions between the right and the left
sublists.
Keyword Arguments:
lsta -- Left list
lstb -- Right list
comparator -- Comparator function. It should take in two arguments, and
return -1 if the left is smaller, 1 if the right is smaller,
and 0 if they are equal. If none is provided, then it will
assume a numerical sort.
"""
new_list = []
index_a = 0
index_b = 0
completed = False
inversions = 0
len_a = len(lsta)
while not completed:
if comparator(lsta[index_a], lstb[index_b]) > 0:
start = index_b
while comparator(lsta[index_a], lstb[index_b]) > 0:
index_b += 1
inversions += len_a - index_a
if index_b == len(lstb):
completed = True
break
new_list.extend(lstb[start:index_b])
if completed:
new_list.extend(lsta[index_a:])
else:
start = index_a
while comparator(lstb[index_b], lsta[index_a]) > 0:
index_a += 1
if index_a == len(lsta):
completed = True
break
new_list.extend(lsta[start:index_a])
if completed:
new_list.extend(lstb[index_b:])
return new_list, inversions
def count_inversions_int(lst, merge_method=merge, comparator=simple_compare):
"""
Count the inversions in the list using a mergesort. So this *will* modify
the list.
Keyword Arguments:
lst -- List to sort
merge_method -- Function for merging.
comparator -- Comparator function. It should take in two arguments, and
return -1 if the left is smaller, 1 if the right is smaller,
and 0 if they are equal. If none is provided, then it will
assume a numerical sort.
"""
if len(lst) == 1:
return lst, 0
split = len(lst)/2
left_lst, linversions = count_inversions_int(lst[:split],
merge_method=merge_method,
comparator=comparator)
right_lst, rinversions = count_inversions_int(lst[split:],
merge_method=merge_method,
comparator=comparator)
new_list, minversions = merge_method(left_lst, right_lst,
comparator=comparator)
return new_list, linversions + rinversions + minversions
def count_inversions(lst):
"""
Count the inversions in the list using a mergesort. So this *will* modify
the list.
Keyword Arguments:
lst -- List to sort
"""
return count_inversions_int(lst)[1]