Counting Sort Algorithm

In this post I will step through an example execution of the counting sort algorithm and explain aspects of the algorithm that may seem confusing at first glance.

Counting Sort Algorithm

There are 4 main phases of the counting sort algorithm. The first can be skipped if the radix is used, but in this example we will determine the max integer in the input ourselves.

The 4 Phases of the Counting Sort Algorithm
  1. Determine the max integer in the Input array.
  2. Count the frequency of each number of the Input array and store it in the Count array.
  3. Transform the frequencies in the Count array into indices.
  4. Distribute the numbers from the Input array into the Output array in sorted order.
Input Array

Below is the Input array we will be sorting with the counting sort algorithm.

input array

Phase 1: Determine the max integer in the input array

In this phase we will loop through the input array to find the maximum number. The maximum number is then used to determine the size of the Count array used in phase 2.  The first index in the array must be reserved for use in Phase 3 and 4, so the frequency count will start from index 1. That is, the number 0 will be associated with index 1, the number 1 with index 2, and so on.

Source: Phase 1
int max = 0;
// Find the max value in input array
for (int i = 0; i < input.length; i++) {
    if (input[i] > max) {
        max = input[i];
    }
}
// Increase the size to max plus 1 for the offset, and 1 for the zero digit
int[] count = new int[max + 2];

 

Phase 2: Count the frequency of each number in the Input array and store it in the Count array

In this phrase we will loop though the input array and keep a running count of the number of times we see a number in the input array. We leave the first element of the count array Count[0] empty for use in phase 3, so the index for number 0 is in Count[0+1], the number 1 in Count[1+1], and so on. The resulting counts are shown below.

Source: Phase 2
// Count frequencies
for (int i = 0; i < input.length; ++i) {
    count[input[i] + 1]++;
}

 

Phase 3: Transform the frequencies in the Count array into indices

In this phase we will transform the count values from frequencies to indices. This is done by summing the frequency of every number with the frequencies of the numbers less than it, as shown below.

Interpreting the Indices

Each value in the Count array now contains the sum of the frequencies of numbers less than or equal to its associated number.

The first element has no number associated with it. So it there are 0 numbers less than or equal to it.

The second element, which is associated with the number 0, has 3 numbers less than or equal to it. Since there are no numbers less than 0 in the Input all three must be 0’s.

The third element, which is associated with the number 1, has 5 numbers less than or equal to it. We know that 3 are 0’s, so the remaining 2 must be 1’s.

You might have already noticed that the value to the left of the associated number is actually the the first index where that number will be placed.

Shifting each elements associated number to the left results in the following new association.

Since there are no 0’s to the left of 0 we can use 0 as the index for where the first 1 will be placed. Likewise with the number 1, there are 3 numbers before it so we can use 3 as the index for where the first 1 will be placed. And for  the number 2, there are 5 before it, so we can use 5 as the index for where the first 2 will be placed. This change in the position of the associated number is reflected in code by changing the index we use to access the index from Count[Number + 1] to Count[Number], where Number is a value in the Input array.

Source: Phase 3
// Convert frequency counts to indices
for (int i = 1; i < count.length; ++i) {
    count[i] += count[i-1];
}

Phase 4: Distribute the numbers from the Input array into the Output array in sorted order

In this final phase we will copy values from the Input to the Output array using the Count array index values, as shown below.

Source: Phase 4
int[] output = new int[input.length];
// Distribute input values into output using the ordered positions in the count array
for (int i = 0; i < input.length; ++i) {
    output[count[input[i]]++] = input[i];
}

 

Complete Source

public class KeyIndexCountingSort {
    public static int[] sort(int[] input) {
        int max = 0;
        // Find the max value in input array
        for (int i = 0; i < input.length; i++) {
            if (input[i] > max) {
                max = input[i];
            }
        }
        // Increase the size to max plus 1 for the offset, and 1 for the zero digit
        int[] count = new int[max + 2];
        // Count frequencies
        for (int i = 0; i < input.length; ++i) {
            count[input[i] + 1]++;
        }

        // Convert frequency count to value index ranges
        for (int i = 1; i < count.length; ++i) {
            count[i] += count[i-1];
        }

        int[] output = new int[input.length];
        // Distribute input values into output using the ordered positions in the count array
        for (int i = 0; i < input.length; ++i) {
            output[count[input[i]]++] = input[i];
        }

        return output;
    }
}

 

Running Time Analysis

From the above example you can probably already see that there are two main contributors to the running time.

  1. The size R of the Count array, which is determined by the max value in the array (or the Radix).
  2. The size N of the Input array.

There are 5N + 3R total array accesses in Phase 1-4. The resulting running time of counting sort is O(N + R).  The space required is 2N+R, which is also O(N + R).

For proof of correctness and a deeper analysis of this algorithm, I highly recommend Robert Sedgewick and Keven Wayne book, “Algorithms”.

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *