PowerBASIC Sorting Faster |
Would You Like a Faster Sort?
Probably more research has been devoted to sort routines than any other
algorithm. Literally trillions of items are sorted every day in the computer
world, and speed is crucial. If you could produce a sort package that was
5% faster than the best existing algorithms, you probably could name your
price.
Well, I haven't written one of those, but I would offer an alternative to your
compiler's built-in sorting mechanism. This idea isn't actually a sort
at all, but a linear algorithm — that is, no nested loops.
Under certain conditions, the simple routine shown below can run exponentially
faster than PowerBASIC's ARRAY SORT function.
ARRAY MAP
There are two restrictions. The data must consist of integers,
and the utility array X() must be dimensioned at least to the
size of the greatest data value. So if the value 9,000,000 occurs
somewhere in the data, then X() must be dimensioned at least to
nine million cells. That is not an issue for PowerBASIC, which supports
arrays exceeding four billion cells.
SUB ArrayMapSort
DIM A(100000000) AS LONG 'integer data to be sortedDIM X(100000000) AS LONG 'dimensioned >= greatest data valueDIM SortSize AS LONG 'number of data itemsDIM MaxInt AS LONG 'greatest integer in dataDIM J,K,N AS LONG FOR J=1 TO SortSize: INCR X(A(J)): NEXT N=1 FOR K=1 TO SortSize 'for DESCENDING order, use: FOR K=SortSize TO 1 STEP-1DO IF X(N) THEN A(K)=N: DECR X(N): EXIT INCR N LOOP NEXT K END SUB '(arraymapsort) |
Each value in A() is flagged at the corresponding address of
X(). So if A(any)=375, then
X(375) is incremented. Then variable N steps through
X(); whenever a positive value is encountered, the location
of N is written back sequentially to A(). A nifty
feature is that A() doesn't have to be cleared out before being
rewritten.
Data which include negative integers can be accommodated either by adding a
suitable offset when writing to X(), then subtracting that value as
it is written back to A(), or by setting up the array to include
negative addresses in the first place. PowerBASIC lets you do that.
Also, using an appropriate multiplier as an offset would enable the sorting of
non-integer values that conform to a rigid format — such as
decimal currency or baseball batting averages.
Because ARRAY SORT was written in assembler, it is of course more
inherently efficient than anything written in a high-level language such
as PowerBASIC. That is the only reason that it can compete with
Array Map at all. As it is, ARRAY SORT
performs better with smaller data samples — up to about 25,000
items. But for a sort of, say, 50 million items,
Array Map runs 30 times faster on my computer when the
integer size is limited to 100,000, and five times faster when the integer
values can range up to 100,000,000.
BITMAP SORT
This works just like Array Map, except that instead of flagging
whole array elements, each data integer in A() is represented by a
single bit in X(). The tradeoff is that there can be
no matching data values, unless it is okay that only one of equal values
would survive the process.
The sole advantage is that array X() can be eight times
smaller than before. Admittedly, in today's world of virtually unlimited
memory resources, this routine has become functionally obsolete; I include it just
for nostalgia purposes, for in my DOS/GW-BASIC days it was a godsend.
SUB BitmapSort
DIM (as above)
DIM EL AS LONG 'array elementDIM BT AS LONG 'bit counterFOR J=1 TO SortSize EL=A(J)\32: BT=A(J) MOD 32 BIT SET X(EL),BT NEXT J K=1 FOR EL=0 TO (MaxInt+31)\32 'step through the arrayFOR BT=0 TO 31 'step through the bitsIF BIT(X(EL),BT) THEN N=32*EL+BT: A(K)=N: INCR K IF K > SortSize THEN EXIT,EXIT 'doneEND IF NEXT BT NEXT EL END SUB '(BitmapSort) |