Sorting algorithms

From ScienceZero
Jump to: navigation, search

Finding the optimal sorting algorithm for a particular task is difficult. Most comparisons in literature is based on assumptions that makes the conclusions unreliable. The complexity of a modern CPU makes it almost impossible to predict performance with high accuracy without testing.

When working with sorting algorithms it is very important to add code to check that the results are correct. Sorting algorithms are notoriously difficult to get 100% correct because the symptoms of a mistake can be very subtle. You need to check that the items have been sorted correctly and that they have not been altered. Some common errors will fill the table with copies of the same item and will always pass the sorted test even if no actual sorting has taken place.

//Example of code to catch most human errors
unsigned int checksum = 0;
for (i=0; i<tabN; i+=1) checksum ^= tab1[i];

  Do sorting here...
 
for (i=0; i<tabN; i+=1) checksum ^= tab1[i];
if (checksum != 0) printf("Table is corrupted!!!\n");
 
for (i=0; i<(tabN-1); i+=1) {
  if (tab1[i] > tab1[i+1]) {
    printf("Table is not sorted correctly!!!\n");
  }
}


Comb sort 11

From Stephen Lacy & Richard Box "A Fast Easy Sort" BYTE April 1991, p315

Comb sort 11 uses no extra memory and is very simple to implement. The speed is comparable to quick sort in some cases (compared to the standard qsort() function it is the same speed on integers and 30% faster on unsigned integers). It is not cache friendly since it scans the whole table sequentially.

//Comb sort 11
unsigned int i, temp, gap, tabN;
bool f;

gap = tabN;
do {
  gap = (gap*10)/13;
  if (gap==0) gap=1;
  if ((gap==9) || (gap==10)) gap = 11;

  f = false;
  for (i=0;i<(tabN-gap);i=i+1) {

    if (tab[i] > tab[i+gap]) {
      temp = tab[i];
      tab[i] = tab[i+gap];
      tab[i+gap] = temp;

      f = true;
    }

  }
} while(f || (gap>1));

Radix sort

Radix sort requires N*4 bytes of extra memory but it is very fast. This version is 4-5 times faster than qsort() on 32 million unsigned integers on a Pentium 4. There is still room for improvement since the random writing to 256 different memory locations causes a lot of DTLB misses that slows it down.

//Radix sort
unsigned int t, i, s, sft, tabN, *temp, *histogram;
histogram = (unsigned int *)malloc(256*sizeof(unsigned int));
bool flag;

for (sft=0; sft<32; sft+=8) {

  for (i=0; i<256; i+=1) histogram[i] = 0;
  for (i=0; i<tabN; i+=1) histogram[(tab1[i]>>sft) & 255] += 1;

  s = 0;
  flag = false;
  for (i=0; i<256; i+=1) {
    t = histogram[i];
    if (t==tabN) flag = true;
    histogram[i] = s;
    s = s + t;
  }

  if (flag) continue;

  for (i=0; i<tabN; i+=1) {
    s = (tab1[i]>>sft) & 255;
    tab2[histogram[s]] = tab1[i];
    histogram[s] += 1;
  }

  temp = tab1;
  tab1 = tab2;
  tab2 = temp;
}