PowerBASIC | Faster Sorting with an Array Map

PowerBASIC
Sorting Faster

Would You Like a Faster Sort?

Probably more research has been devoted to sort routines than any other algorithm. Literally trillions of items are sorted every day in the computer world, and speed is crucial. If you could produce a sort package that was ~~5% faster~~ than the best existing algorithms, you probably could name your price.

Well, I haven't written one of those, but I would offer an alternative to your compiler's ~~built-in~~ sorting mechanism. This idea isn't actually a sort at all, but a linear ~~algorithm —~~ that is, no nested loops. Under certain conditions, the simple routine shown below can run exponentially faster than PowerBASIC's ~~ARRAY SORT~~ function.

ARRAY MAP

There are two restrictions. The data must consist of integers, and the utility array ~~X()~~ must be dimensioned at least to the size of the greatest data value. So if the value 9,000,000 occurs somewhere in the data, then ~~X()~~ must be dimensioned at least to nine million cells. That is not an issue for PowerBASIC, which supports arrays exceeding four billion cells.

SUB ArrayMapSort DIM A(100000000) AS LONG 'integer data to be sorted DIM X(100000000) AS LONG 'dimensioned >= greatest data value DIM SortSize AS LONG 'number of data items DIM MaxInt AS LONG 'greatest integer in data DIM J,K,N AS LONG FOR J=1 TO SortSize: INCR X(A(J)): NEXT N=1 FOR K=1 TO SortSize 'for DESCENDING order, use: FOR K=SortSize TO 1 STEP-1 DO IF X(N) THEN A(K)=N: DECR X(N): EXIT INCR N LOOP NEXT K END SUB '(arraymapsort)

Each value in A() is flagged at the corresponding address of X(). So if A(any)=375, then ~~X(375)~~ is incremented. Then variable N steps through X(); whenever a positive value is encountered, the location of N is written back sequentially to A(). A nifty feature is that ~~A()~~ doesn't have to be cleared out before being rewritten.

Data which include negative integers can be accommodated either by adding a suitable offset when writing to X(), then subtracting that value as it is written back to A(), or by setting up the array to include negative addresses in the first place. PowerBASIC lets you do that. Also, using an appropriate multiplier as an offset would enable the sorting of ~~non-integer~~ values that conform to a rigid ~~format —~~ such as decimal currency or baseball batting averages.

Because ARRAY SORT was written in assembler, it is of course more inherently efficient than anything written in a ~~high-level~~ language such as PowerBASIC. That is the only reason that it can compete with ~~Array Map~~ at all. As it is, ~~ARRAY SORT~~ performs better with smaller data ~~samples —~~ up to about 25,000 items. But for a sort of, say, ~~50 million~~ items, ~~Array Map~~ runs 30 times faster on my computer when the integer size is limited to 100,000, and five times faster when the integer values can range up to ~~100,000,000.~~

BITMAP SORT

This works just like Array Map, except that instead of flagging whole array elements, each data integer in ~~A()~~ is represented by a single bit in ~~X().~~ The tradeoff is that there can be no matching data values, unless it is okay that only one of equal values would survive the process.

The sole advantage is that array ~~X()~~ can be eight times smaller than before. Admittedly, in today's world of virtually unlimited memory resources, this routine has become functionally obsolete; I include it just for nostalgia purposes, for in my ~~DOS/GW-BASIC~~ days it was a godsend.

SUB BitmapSort DIM (as above) DIM EL AS LONG 'array element DIM BT AS LONG 'bit counter FOR J=1 TO SortSize EL=A(J)\32: BT=A(J) MOD 32 BIT SET X(EL),BT NEXT J K=1 FOR EL=0 TO (MaxInt+31)\32 'step through the array FOR BT=0 TO 31 'step through the bits IF BIT(X(EL),BT) THEN N=32*EL+BT: A(K)=N: INCR K IF K>SortSize THEN EXIT,EXIT 'done END IF NEXT BT NEXT EL END SUB '(BitmapSort)