Sorting

Sorting: rearrange a collection of items into increasing or decreasing order.

  • Time complexity:

$$ T(n) = \begin{cases} O(n^2), & Naive/Brute-force~~algorithm, insertion sort\\ O(n log n), & heapsort, merge sort, quick sort \end{cases} $$

Insertion Sort

Insertion sort is both faster and well-arguably more simplistic than both bubble sort and selection sort. Funny enough, it’s how many people sort their cards when playing a card game!

On each loop iteration, insertion sort removes one element from the array. It then finds the location where that element belongs within another sorted array and inserts it there. It repeats this process until no input elements remain.

In [119]:
class Solution:
    
    """
    Basic idea: 
    - take out the number from left to right one by one
    - For the i-th number, find out the position in the previous (left) i-1 numbers 
      by shifting the larger one to right by one position recursively.
    
    Time complexity: O(n^2)
    """
    
    def __init__(self,array):
        self.array = array
        
    def insert_sort(self):
        array = self.array
        for i in range(len(array)):
            cursor   = array[i]                       # take out the i-th number
            position = i
            
            # looking for the position in which to insert the i-th number
            while position >0 and array[position-1]>cursor:
                array[position] = array[position-1]    # shift the number to right by 1
                position  = position -1                # shift to next-left one
        
            array[position] = cursor   # finally find out the position for the number cursor
        return array
In [120]:
import numpy as np
import time
A = np.random.randint(0,100,50)  # randomly take 50 integer numbers in between 0 and 100
ts = time.time()
print(Solution(A).insert_sort() ) 
print("Time elapse:",time.time()-ts)
[ 1  3  8  8 12 22 22 23 24 26 29 30 31 36 38 44 50 50 50 51 52 53 54 57
 57 57 58 61 62 63 65 65 66 66 69 76 76 77 78 79 87 87 88 88 89 96 96 96
 97 99]
Time elapse: 0.0010328292846679688

Merge Sort

Merge sort is a perfectly elegant example of a Divide and Conquer algorithm. It simple uses the 2 main steps of such an algorithm:

  • Continuously divide the unsorted list until you have N sublists, where each sublist has 1 element that is “unsorted” and N is the number of elements in the original array.

  • Repeatedly merge i.e conquer the sublists together 2 at a time to produce new sorted sublists until all elements have been fully merged into a single sorted array.

In [21]:
def merge_sort(arr):
    # The last array split
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    # Perform merge_sort recursively on both halves
    left, right = merge_sort(arr[:mid]), merge_sort(arr[mid:]) 
    # Merge each side together 
    return merge(left, right, arr.copy())


def merge(left, right, merged):
    left_cursor, right_cursor = 0, 0
    while left_cursor < len(left) and right_cursor < len(right):
      
    
        # Sort each one and place into the result
        if left[left_cursor] <= right[right_cursor]:
            merged[left_cursor+right_cursor]=left[left_cursor] 
            left_cursor += 1
        else:
            merged[left_cursor + right_cursor] = right[right_cursor] 
            right_cursor += 1
            
    for left_cursor in range(left_cursor, len(left)):
        merged[left_cursor + right_cursor] = left[left_cursor]
        
    for right_cursor in range(right_cursor, len(right)):
        merged[left_cursor + right_cursor] = right[right_cursor]

    return merged
In [22]:
import time
#A = np.random.randint(0,100,4)  # randomly take 50 integer numbers in between 0 and 100
A = [5,4,3,2,1,0]  
ts = time.time()
print(merge_sort(A))
print(time.time()-ts)
[0, 1, 2, 3, 4, 5]
0.00021696090698242188

Quick Sort

Quick sort is also a divide and conquer algorithm like merge sort. Although it’s a bit more complicated, in most standard implementations it performs significantly faster than merge sort and rarely reaches its worst case complexity of $O(n^2)$. It has 3 main steps:

  • We first select an element which we will call the pivot from the array.
  • Move all elements that are smaller than the pivot to the left of the pivot; move all elements that are larger than the pivot to the right of the pivot. This is called the partition operation.
  • Recursively apply the above 2 steps separately to each of the sub-arrays of elements with smaller and bigger values than the last pivot.
In [9]:
def partition(array, begin, end):
    pivot_idx = begin
    for i in range(begin+1, end+1):
        if array[i] <= array[begin]:
            pivot_idx += 1
            array[i], array[pivot_idx] = array[pivot_idx], array[i]
    array[pivot_idx], array[begin] = array[begin], array[pivot_idx]
    return pivot_idx

def quick_sort_recursion(array, begin, end):
    if begin >= end:
        return
    pivot_idx = partition(array, begin, end)
    quick_sort_recursion(array, begin, pivot_idx-1)
    quick_sort_recursion(array, pivot_idx+1, end)

def quick_sort(array, begin=0, end=None):
    if end is None:
        end = len(array) - 1
    
    return quick_sort_recursion(array, begin, end)
In [10]:
import time
A = np.random.randint(0,100,50)  # randomly take 50 integer numbers in between 0 and 100
ts = time.time()
print(quick_sort(A))
print(time.time()-ts)
None
0.0006389617919921875

counting sort

Strengths:

Linear time. Counting sort runs in O(n), making it asymptotically faster than comparison-based sorting algorithms like quicksort or merge sort.

Weaknesses:

Restricted inputs. Counting sort only works when the range of potential items in the input is known ahead of time. Space cost. If the range of potential values is big, then counting sort requires a lot of space.

In [15]:
def counting_sort(theList,max_value): 
    
    
    counts = [0]*(max_value+1)
    
    
    # put the numbers in theList on the coordinate
    # if the number exist, counting its frequency and put the frequency at the position
    
    # for example: theList=[2,4,3,4], the max value is 4
    # counts = [0,0,0,0,0]
    #           / / / / /
    #          0 1 2 3 4 
    
    for num in theList:
        counts[num] += 1
        
        
    # Now, counts becomes [0,0,1,1,2]
    # Overwrite counts to hold the next index an item with
    # a given value goes. 
    accumulate = 0
    for i, count in enumerate(counts):
        counts[i]  = accumulate
        accumulate += count
        
    # Now counts becomes [0,0,0,1,2] 
    
    
    sorted_list = [None]*len(theList)
    
    # theList[2,4,3,4]
    for num in theList:
        
        # Place the item in the sorted list
        
        # counts[2] = 0 -> sorted_list[0] = 2 -> counts[2]=1
        # counts[4] = 2 -> sorted_list[2] = 4 -> counts[4]=3
        # counts[3] = 1 -> sorted_list[1] = 3 -> counts[3]=2
        # counts[4] = 3 -> sorted_list[2] = 4 -> counts[4]=4
        sorted_list[counts[num]] = num
        
        # And, make sure the next item we see with the same value
        # goes after the one we just placed
        counts[num] += 1
    
    
    return sorted_list
        
In [12]:
A = [2,4,3,4]
In [14]:
print(counting_sort(A,4))
[0, 0, 0, 1, 2]
[2, 3, 4, 4]
In [8]:
import time
A = np.random.randint(0,100,50)  # randomly take 50 integer numbers in between 0 and 100
ts = time.time()
print(counting_sort(A,A.max()))
print(time.time()-ts)
[2, 4, 6, 7, 8, 8, 15, 18, 18, 18, 19, 19, 21, 22, 23, 24, 41, 44, 44, 47, 47, 49, 51, 51, 51, 52, 53, 59, 61, 62, 64, 66, 66, 72, 74, 76, 77, 77, 78, 78, 78, 79, 80, 90, 90, 92, 92, 93, 93, 96]
0.0005400180816650391

Using package: an example

  • To sort a list, using sort() method
  • To sort an iterable, using function sorted().

sort(key=None, reverse=False): in place sort, the calling list is updated.

In [42]:
import numpy as np
a=list(np.random.randint(1,100,10))
In [43]:
a
Out[43]:
[71, 73, 32, 31, 94, 63, 84, 67, 9, 14]
In [44]:
a.sort()
In [45]:
a
Out[45]:
[9, 14, 31, 32, 63, 67, 71, 73, 84, 94]

sorted(a, key=lambda x: str(x)): return a new list containing all the iterms in ascending order

In [2]:
class Student(object):
    def __init__(self,name,GPA):
        self.name= name
        self.GPA = GPA
        
    def __lt__(self,other):
        return self.name < other.name
In [11]:
students = [Student('John',4.0),Student('Alice',4.5),Student('Peter',5.0), Student('Nancy',3.0)]
In [14]:
students[0].name
Out[14]:
'John'

sort by name, students is unchanged

In [15]:
students_sorted_byname = sorted(students) 
In [16]:
students_sorted_byname[0].name,students[0].name
Out[16]:
('Alice', 'John')

sort in place by GPA using sorted(key=)

In [17]:
students.sort(key= lambda Student: Student.GPA)
In [18]:
students[0].name
Out[18]:
'Nancy'