Counting Sort Algorithm
In this post I will step through an example execution of the counting sort algorithm and explain aspects of the algorithm that may seem confusing at first glance.
Counting Sort Algorithm
There are 4 main phases of the counting sort algorithm. The first can be skipped if the radix is used, but in this example we will determine the max integer in the input ourselves.
The 4 Phases of the Counting Sort Algorithm
- Determine the max integer in the Input array.
- Count the frequency of each number of the Input array and store it in the Count array.
- Transform the frequencies in the Count array into indices.
- Distribute the numbers from the Input array into the Output array in sorted order.
Input Array
Below is the Input array we will be sorting with the counting sort algorithm.
Phase 1: Determine the max integer in the input array
In this phase we will loop through the input array to find the maximum number. The maximum number is then used to determine the size of the Count array used in phase 2. The first index in the array must be reserved for use in Phase 3 and 4, so the frequency count will start from index 1. That is, the number 0 will be associated with index 1, the number 1 with index 2, and so on.
Source: Phase 1
int max = 0; // Find the max value in input array for (int i = 0; i < input.length; i++) { if (input[i] > max) { max = input[i]; } } // Increase the size to max plus 1 for the offset, and 1 for the zero digit int[] count = new int[max + 2];
Phase 2: Count the frequency of each number in the Input array and store it in the Count array
In this phrase we will loop though the input array and keep a running count of the number of times we see a number in the input array. We leave the first element of the count array Count[0] empty for use in phase 3, so the index for number 0 is in Count[0+1], the number 1 in Count[1+1], and so on. The resulting counts are shown below.
Source: Phase 2
// Count frequencies for (int i = 0; i < input.length; ++i) { count[input[i] + 1]++; }
Phase 3: Transform the frequencies in the Count array into indices
In this phase we will transform the count values from frequencies to indices. This is done by summing the frequency of every number with the frequencies of the numbers less than it, as shown below.
Interpreting the Indices
Each value in the Count array now contains the sum of the frequencies of numbers less than or equal to its associated number.
The first element has no number associated with it. So it there are 0 numbers less than or equal to it.
The second element, which is associated with the number 0, has 3 numbers less than or equal to it. Since there are no numbers less than 0 in the Input all three must be 0’s.
The third element, which is associated with the number 1, has 5 numbers less than or equal to it. We know that 3 are 0’s, so the remaining 2 must be 1’s.
You might have already noticed that the value to the left of the associated number is actually the the first index where that number will be placed.
Shifting each elements associated number to the left results in the following new association.
Since there are no 0’s to the left of 0 we can use 0 as the index for where the first 1 will be placed. Likewise with the number 1, there are 3 numbers before it so we can use 3 as the index for where the first 1 will be placed. And for the number 2, there are 5 before it, so we can use 5 as the index for where the first 2 will be placed. This change in the position of the associated number is reflected in code by changing the index we use to access the index from Count[Number + 1] to Count[Number], where Number is a value in the Input array.
Source: Phase 3
// Convert frequency counts to indices for (int i = 1; i < count.length; ++i) { count[i] += count[i-1]; }
Phase 4: Distribute the numbers from the Input array into the Output array in sorted order
In this final phase we will copy values from the Input to the Output array using the Count array index values, as shown below.
Source: Phase 4
int[] output = new int[input.length]; // Distribute input values into output using the ordered positions in the count array for (int i = 0; i < input.length; ++i) { output[count[input[i]]++] = input[i]; }
Complete Source
public class KeyIndexCountingSort { public static int[] sort(int[] input) { int max = 0; // Find the max value in input array for (int i = 0; i < input.length; i++) { if (input[i] > max) { max = input[i]; } } // Increase the size to max plus 1 for the offset, and 1 for the zero digit int[] count = new int[max + 2]; // Count frequencies for (int i = 0; i < input.length; ++i) { count[input[i] + 1]++; } // Convert frequency count to value index ranges for (int i = 1; i < count.length; ++i) { count[i] += count[i-1]; } int[] output = new int[input.length]; // Distribute input values into output using the ordered positions in the count array for (int i = 0; i < input.length; ++i) { output[count[input[i]]++] = input[i]; } return output; } }
Running Time Analysis
From the above example you can probably already see that there are two main contributors to the running time.
- The size R of the Count array, which is determined by the max value in the array (or the Radix).
- The size N of the Input array.
There are 5N + 3R total array accesses in Phase 1-4. The resulting running time of counting sort is O(N + R). The space required is 2N+R, which is also O(N + R).
For proof of correctness and a deeper analysis of this algorithm, I highly recommend Robert Sedgewick and Keven Wayne book, “Algorithms”.