The Avalanche

This is a non-recursive quicksort based algorithm that has been written from the ground up as a stable alternative to the blindingly fast quicksort.

It is not quite as fast as the outright fastest non-stable quicksort, but is still very fast as it uses buffers and CopyMemory and is beaten by none of my other string sorting algorithms except my fastest non-stable quicksort.

A standard quicksort only moves items that need swapping, while this stable algorithm manipulates all items on every iteration to keep them all in relative positions to one another. This algorithm I have dubbed the Avalanche©.

The Avalanche algorithm has the following features:

- It can handle sorting arrays of millions of string items.
- It can handle sorting in ascending and descending order.
- It can handle case-sensitive and case-insensitive criteria.
- It can handle zero or higher based arrays.
- It can handle negative lb and positive ub.
- It can handle negative lb and zero or negative ub.
- It can sort sub-sets of the array data.

It is also important to note that this algorithm does not suffer at all from the traditional quicksort nemesis. It does not guard against pre-sorted data or many repeated items yet is very fast at sorting these data states!

Avalanche v2

In version two a runner section has been added to handle a very hard job for a stable sorter - reverse pretty sorting. Reverse pretty sorting is case-insensitive sorting of data that has been pre-sorted case-sensitively in reverse order.

Case-insensitive sorting is much more demanding than case-sensitive/binary compare sorting, particularly for a stable sorter that must keep all items in relative positions to one another.

It utilises a runner technique to boost this very demanding operation - down from 2.0 seconds to 1.5 seconds on 100,000 items on my 866 MHz P3.

Adding runners has also boosted same-direction pretty sorting operations.

Avalanche v2.1

Version 2.1 of the Avalanche algorithm has smarts added to the runners for maximum performance on all data states! If performing a refresh or pretty sorting operation the code can identify this state and the runners are turned on automatically, while they are off for operations that do not benifit from this extra processing.

Because all items are re-positioned based on the current value the executing code identifies when the avalanche process is producing a zero count buffer one way and so is moving all items the other way - shifting no items up/down in relation to the current item - indicating that the data is in a pre-sorted state.

On each iteration a test of the buffer counts can identify when it is re-sorting or reverse-sorting, as well as producing distinctive indicators on reverse-pretty and same-direction pretty sorting operations.

Note that stable reverse-sorting operations are quite different to a non-stable inversion style reverse operation.

Avalanche v2.2

This version identifies sub-sets of pre-sorted data and delegates it to a built-in insert/binary hybrid algorithm dubbed the Twister©.

This delegation is the sole reason for the speed boost on all operations over version 2.1, and also the reason for the incredibly fast refresh sorting performance - it can refresh-sort 3,248,230 pre-sorted strings in around 2 and a half seconds on my 866MHz P3.

This algorithm is one of my outright fastest at these operations, beaten only by my absolute fastest non-stable sorters, and is the outright fastest on same-direction pretty sorting operations!

Avalanche v2.25

This interim version added safe addition and subtraction of unsigned long integers. This guarantees valid arithmetic operations on memory address pointers which are used extensively by the runner sections of code.

This change imposed a slight performance degradation on all operations.

Avalanche v2.3

The latest version of this algorithm employs a SAFEARRAY substitution technique to trick VB into thinking the four-byte string pointers in the string array are just VB longs in a native VB long array.

The technique simply uses CopyMemory to point a VB long array (defined in the module) at the first of the string pointers in memory, and sets its lower-bound and item count to match (as if it had been redimmed).

This allows us to treat the string pointers as if they were simply four-byte long values in a long array and can be swapped around as needed without touching the actual strings that are pointed to.

Reading and assigning to a VB long array is lightning fast, and proves to be considerably faster when copying only one item than the previous method of copying the string pointers using CopyMemory.

Indexed Version

This version receives a dynamic long array that holds references to the string arrays indices. This is known as an indexed sort. No changes are made to the source string array.

The index array is automatically initialized if it is passed erased or uninitialized. The index array can be passed again for sorting without erasing it.

This allows the index array to be passed on to other sorting processes to be further manipulated, which is exploited in the included PrettySort routine.

After a sort procedure is run the long array is ready as a sorted index (lookup table) to the string array items, so  strA(idxA(lo))  returns the lo item in the string array whose index may be anywhere in the string array.

Usage Details: The index array can be redimmed to match the source string array boundaries or it can be erased or left uninitialized before sorting a string array for the first time. However, if you modify string items and re-sort you should not redim or erase the index array which will take advantage of the fast refresh sorting performance. This also allows the index array to be passed on to other sorting processes to be further manipulated.

Even when using redim with the preserve keyword and adding more items to the string array you can pass the index array unchanged and the new items will be sorted into the previously sorted array. The index array will automatically return with boundaries matching the string array boundaries.

Only when you reload the string array items with new array boundaries should you erase the index array for the first sorting operation. Also, if you redim the source string array to smaller boundaries using the preserve keyword you should erase the index array before sorting the new smaller data set for the first time.

Performance

These results are the fastest times, in seconds, produced sorting 99,996 string items for each operation, tested on my 866MHz P3.

Data State Avalanche v2
Stable
Avalanche v2
Stable Indexed
Quicksort
Non-Stable
Heapsort
Non-Stable
Shellsort
Non-Stable
Unsorted0.55920.66120.48760.78971.1281
Pre-sorted0.09390.10240.36950.61640.4471
Reverse-sorted0.43570.46030.36640.62330.5726
Pretty-sorted1.20811.25211.33262.09781.6709
Reverse-pretty1.44471.49241.30171.99001.6471

This stable algorithm is truely very fast at all sorting operations!

Stability

This very unique sorting algorithm is stable; which is the preserving of the original order of items that equate to equal during a comparison.

This issue usually applies to sorting objects or data types that contain multiple members where only one or a selected few of the members are compared during the sorting procedure. It is considered very inappropriate to shuffle the order of items that equate to equal for that particular sort criteria.

This issue in not completely irrelevant for strings. For sorting with case-insensitive comparisons it can make a difference.

For example, comparisons with case-insensitive criteria on items that are ordered lowercase before uppercase will remain in this state within their sorted groups:

before sorting case-sensitive
sorting
case-insensitive
stable sorting
case-insensitive
unstable sorting
aaaaaaAAAAAAaaaaaaaaaaaa
bbbbbbAAAaaaAAAaaaAAAAAA
ccccccBBBBBBAAAAAAAAAaaa
AAAaaaBBBbbbbbbbbbbbbbbb
BBBbbbCCCCCCBBBbbbBBBbbb
CCCcccCCCcccBBBBBBBBBBBB
AAAAAAaaaaaaccccccCCCccc
BBBBBBbbbbbbCCCccccccccc
CCCCCCccccccCCCCCCCCCCCC
Note: the unstable results shown are real results of a shellsort operation.

Not only is this issue relevant for case-insensitive string sorting, but for any array sorting algorithm that can handle, or be modified to handle, other data types.

Free Usage

As usual, you are free to use any part or all of this code even for commercial purposes in any way you wish under the one condition that no copyright notice is moved or removed from where it is.

Happy coding :)

...