Ted's Computer World
Sorting, Shuffling & Dealing

Out of Sorts?


Probably more research has been devoted to sort routines than to any other algorithm.  Literally trillions of items are sorted every day in the computer world, and speed is crucial.  If you could present IBM with a sort package that would save its mainframe users 5% on their sort times, you could name your price.

I have analyzed a multitude of sorts — Bubble, Insertion, Tree, Heap, Selection, Merge, O-Sort, Endsort, Jumpsort, Flipsort, Fastsort, Shell, and Quicksort.  The Quicksort is rated by most as the fastest of the bunch; however, the routine is recursive, and that causes GW-BASIC to run out of stack space in a hurry.  Of the remaining options, the Shell Sort performed by far the best in my testing, so that is what will be featured here.

Included are three other methods, which aren't actually sorts at all.  Although their usefulness is limited to certain types of data, they are worth a look: because they run in linear time, they are dramatically faster and more economical than any traditional sorting routine!

The stated times are for comparison purposes only; they represent the average runtime on several arrays of 10,000 random integers, run in BASIC direct mode.  Compiled runtimes would be far faster.

40 DIM A(10000), X(12500)   'data array, mirror array
50 SIZE=10000               'total elements to sort


SHELL-METZNER SORT     [time: 1.21 sec]

This is inherently a multi-layer Insertion Sort.  A decreasing offset D enables data slated for greater moves to make them early, rather than being transferred later in many smaller increments.  No one knows the optimal strategy for the values of D, but Donald Knuth has suggested a series of the form  (3^n-1)/2, or 1, 4, 13, 40, 121, 364, 1093, etc.  Knuth also suggests that the starting point be one step lower in the series than the greatest D-value that is less than the sort size.  Example: to sort 500 items, the greatest qualifying value is 364, so the starting value for D would be 121.  This strategy proved best overall in my testing, and Knuth is a better programmer than I am; so his recommendation is featured in this sort.  The strategy D=2^n-1 performed nearly as well, however; the only absolute requirement is that the final value be D=1.

2100 '-----------[ Shell-Metzner ]-----------
2101 '
2102 'In:  SIZE = number of data elements
2103 '     A() = array to be sorted
2104 'Out: A() as sorted list
2109 '
2110 DLOOPS=INT(LOG(2*SIZE)/LOG(3))-1
2115 D=(3^DLOOPS-1)/2: IF D=0 THEN D=1
2119 '
2120 WHILE D
2130 FOR J=D+1 TO SIZE
2135 T=A(J)
2140 FOR K=J-D TO 1 STEP -D
2150 IF T>=A(K) GOTO 2165
2155 A(K+D)=A(K)
2160 NEXT K
2165 A(K+D)=T
2170 NEXT J
2180 D=D\3
2190 WEND

Lines 2110-2115 calculate the starting value for D, which is then reduced at 2180 by integer division.  The defaulted K-loops (those abandoned early via the GOTO at 2150) increase the speed of the routine, as they are handled quite efficiently at the machine level.
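As a quick direct-mode check of that calculation for the 10,000-element test array: the greatest series value below 10,000 is 9841, so the routine starts one step lower, at 3280.

PRINT INT(LOG(2*10000)/LOG(3))-1     'prints 8, the value of DLOOPS
PRINT (3^8-1)/2                      'prints 3280, the starting D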

String input can be accommodated by setting  DEFSTR A, or by changing the array type to A$().
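A minimal sketch of the DEFSTR approach (the DIM merely mirrors the setup at line 40):

10 DEFSTR A          'every variable whose name begins with A is now a string
20 DIM A(10000)      'A() holds strings; the comparison at 2150 becomes a string compare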

TED'S SHELL SORT     [time: 1.13 sec]

One can do better, however.  According to my testing, the following improvement yields a 2-5% gain in efficiency.  Making the swap at each opportunity (line 2155) obviates the temporary variable T, and it eliminates the assignment at 2165, which is redundant whenever no swap is to be made.

2100 '-----------[ Shell-Metzner-Ted ]-----------
2101 '
2102 'In:  SIZE = number of data elements
2103 '     A() = array to be sorted
2104 'Out: A() as sorted data
2109 '
2110 DLOOPS=INT(LOG(2*SIZE)/LOG(3))-1
2115 D=(3^DLOOPS-1)/2: IF D=0 THEN D=1
2119 '
2120 WHILE D
2130 FOR J=D+1 TO SIZE
2140 FOR K=J-D TO 1 STEP -D
2150 IF A(K)<=A(K+D) GOTO 2170
2155 SWAP A(K),A(K+D)
2160 NEXT K
2170 NEXT J
2180 D=D\3
2190 WEND

That is the pretty, readable version.  However, as with most routines, runtime can be minimized by packing the code.  Every time BASIC accesses a new line, some processor time is consumed.  Also, whenever a GOTO or GOSUB is executed, BASIC hunts for the target line, starting at the top of the program.  This means not only that speed-critical routines ideally would be positioned early in the code, but that time can be saved by eliminating cosmetic spacing and combining statements to reduce the number of logical lines.  This three-line packing of Ted's Shell Sort saves noticeable time in direct mode:

[time: 1.10 sec]

2120 WHILE D:FOR J=D+1 TO SIZE:FOR K=J-D TO 1 STEP -D:IF A(K)<=A(K+D) GOTO 2170
2155 SWAP A(K),A(K+D):NEXT
2170 NEXT J:D=D\3:WEND

My tests indicate that the packing of code cuts runtimes substantially when using the BASCOM 2.0 Compiler; but the benefits are negligible when using the far smarter QuickBASIC 4.0 Compiler.  And yes, it's okay to jump out of the K-loop, as long as you don't make a habit of it.  GW-BASIC provides stack space for tracking only nine concurrent FOR-loops; however, if you re-initialize a loop using the same counter variable, no additional stack space is required, and your program will not crash for that reason.


The next three routines are unlike traditional sorts, in that no data values are compared or swapped at any time.  All are linear algorithms — that is, no nested loops.  Speeds are up to ten times faster than any other sort.  The tradeoff is that the data must conform to specific criteria.

ARRAY MAP     [time: 0.15 sec]

This routine is for use with positive integers only, with no duplication of values.  A mirror array X() needs as many elements as the greatest integer to be sorted.  Any matching values are absorbed; that is, if two or more elements have the same integer value, only one such element will survive the process.  If that effect is unwanted, then use Array Match.

What it does: for each value in A(), the X() element at the corresponding index is flagged with -1 (although any nonzero value would do).  Example: if A(24) = 375, then X(375) is set equal to -1.  Then variable N steps through X(); whenever it encounters a flag, the routine writes the flag's index (not its value) back to the lowest-numbered available element of A().  The result is a sorted list!

2400 '-------[ Array Map Sort - no matching data ]-------
2401 '
2402 'In:  SIZE = number of elements to sort
2403 '     MAXINT = greatest value in list
2404 '     A() = array of unique integers
2405 'Out: A() as sorted data
2409 '
2410 FOR J=1 TO MAXINT: X(J)=0: NEXT
2420 FOR J=1 TO SIZE: X(A(J))=-1: NEXT
2430 N=0
2440 FOR K=1 TO SIZE
2450 N=N+1: IF X(N)=0 THEN 2450
2460 A(K)=N
2470 NEXT

That's it!  Two variable assignments per data element, and it is finished.

The code could easily be adjusted to process other types of numeric data having a rigid format, such as decimal currency or baseball batting averages.  It would be necessary only to convert the data to or from integers before and after the sort.
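For instance, here is a hedged sketch for currency; P() is a hypothetical array of dollar amounts, and GW-BASIC's 16-bit integers would cap the amounts at $327.67:

100 FOR J=1 TO SIZE: A(J)=CINT(P(J)*100): NEXT   'dollars to integer cents
109 '...sort A() with the Array Map routine at 2400...
110 FOR J=1 TO SIZE: P(J)=A(J)/100: NEXT         'cents back to dollars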


ARRAY MATCH    [time: 0.18 sec]

This adjustment accommodates repeated data values by counting them, then writing each value back to A() as many times as it was counted.  The changes from ARRAY MAP are the tally at line 2520, the revised stepping logic at 2530-2550, and the count-down at line 2570.

2500 '-------[ Array Map - matching data ok ]-------
2501 '
2502 'In:  SIZE = number of elements to sort
2503 '     MAXINT = greatest value in list
2504 '     A() = array of positive integers
2505 '     X() = dimensioned >= greatest integer
2506 'Out: A() sorted
2509 '
2510 FOR J=1 TO MAXINT: X(J)=0: NEXT
2520 FOR J=1 TO SIZE: X(A(J))=X(A(J))+1: NEXT
2530 N=1
2540 FOR J=1 TO SIZE
2550 IF X(N)=0 THEN N=N+1: GOTO 2550
2560 A(J)=N
2570 X(N)=X(N)-1
2580 NEXT J


BITMAP     [time: 0.16 sec]

This is similar in concept to Array Map.  But this time, instead of flagging whole array elements, each data integer in A() is represented by a single bit in X(); that's 16 bits per array element.  Array X() therefore needs to be only 1/16 as large as the greatest integer.  The data are packed and unpacked in binary fashion, using the bitwise AND and OR operators.

This option could be useful if the data integers were large and/or array space was at a premium.  (In GW-BASIC, it's not difficult to run short of such resources.)  Any matches in the data are absorbed, being flagged by the same bit; therefore, if there are two or more instances of any integer in the data list, only one would be written to the output.
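To see the packing arithmetic with the value 375 from the earlier example (a direct-mode check): 375\16 = 23 and 375 MOD 16 = 7, so the integer 375 is recorded by setting bit 7 of X(23).

PRINT 375\16; 375 MOD 16     'prints 23  7
PRINT 2^7                    'prints 128, the OR mask applied at line 430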

400 '-----------[ Bitmap Sort - matching values are absorbed ]-----------
401 '
402 'In:  SIZE = number of elements to sort
403 '     MAXINT = greatest value in list
404 '     A() = array of non-negative integers
405 '     X() = initialized at 0; must be at least 1/16 the size of the greatest integer
406 'Out: A() sorted
409 '
410 FOR J=1 TO SIZE
420 EL=A(J)\16: BIT=A(J) MOD 16
430 IF BIT<15 THEN X(EL)=X(EL) OR 2^BIT
440 IF BIT=15 THEN IF X(EL)>=0 THEN X(EL)=X(EL) OR -32768!
450 NEXT J
499 '
500 N=1
510 FOR EL=0 TO (MAXINT+15)\16
520 BIT=0
530 WHILE BIT<16 AND N<=SIZE
540 IF BIT=15 AND X(EL)<0 THEN A(N)=16*EL+15: N=N+1
550 IF BIT<15 THEN IF X(EL) AND 2^BIT THEN A(N)=16*EL+BIT: N=N+1
560 BIT=BIT+1
570 WEND
580 NEXT EL

This method also can accommodate larger sorts that cannot be handled by other methods.  In GW-BASIC, the memory limit for total integer array elements is approximately 25,000.  Of course, most lists are far smaller than that, so no problem arises.  Theoretically, however, if A() were maximally dimensioned, a list of 25,000 × 16, or 400,000, unique values could be sorted to and from a disk file.  Unfortunately, another limitation of BASIC would restrict such a list to a maximum of 32767, as that's all the unique positive integers that are available!  Newer programming languages, with 32-bit or 64-bit integers, are not similarly restricted.

---

Shuffling Around

As with sort routines, there has been an overwhelming amount of research on shuffling.  Gaming casinos and online poker sites alike are keenly interested in maximizing the randomness of their shuffles; otherwise, the results could be predicted, and their games could be mathematically beaten.  Despite all the study, debates still rage over the relative validity of certain algorithms.

Being addicted to the game of contract bridge as well as to computer programming, I have devoted a lot of independent research to shuffling, particularly in regard to mixing a deck of cards.  For my testing, I deal four 13-card hands using the various shuffles, then compare the results with known mathematical expectations.  Everything meaningful is counted: hand patterns, suit lengths, runs, individual card placements, and more.

I am satisfied as to the validity of a shuffle routine only if it passes such tests within close tolerances.  The following two algorithms, both demonstrably sound, fared better than the others.


SORTING RANDOMS

This routine generates a list of random numbers, then sorts them.  Mirroring the movement of the data with that of the random values results in the data itself being mixed as the randoms are put in order.  Any sorting module could be utilized.  For the sake of simplicity, a Bubble Sort is used in this example:

3000 '-----------[ Shuffle: Sort Randoms ]-----------
3001 '
3002 'In:  A() = data to be shuffled
3003 '     SIZE of list
3004 'Out: A() as shuffled data
3009 '
3010 FOR J=1 TO SIZE: X(J)=RND: NEXT
3020 FOR J=1 TO SIZE-1: FOR K=J+1 TO SIZE
3030 IF X(J)>X(K) THEN SWAP X(J),X(K): SWAP A(J),A(K)
3040 NEXT K,J

This is the method used by the American Contract Bridge League (ACBL) to prepare its hand records.  One can do far better, however, in terms of speed:


COLLECTION SHUFFLE

This one is my personal favorite, for a number of reasons.  Being another linear algorithm, it is more or less as fast as possible.  From the source list A(), a random element is selected and "collected" sequentially into X().  Then the last data element within the current scope is moved into the slot vacated by the selection, and the scope of the random search is decremented.  The process repeats until all data elements have been assigned.  Array X() then contains the shuffled list.

7000 '--------------[ Collection Shuffle ]---------------
7001 '
7002 'In:  A() = data to be shuffled
7003 '     SIZE of list
7004 'Out: X() as shuffled data
7005 '     A() trashed
7009 '
7010 D=SIZE
7020 FOR J=1 TO SIZE
7030 R=INT(RND*D)+1     'random, 1 to D
7040 X(J)=A(R)          'assign the value
7050 A(R)=A(D)          'fill the hole
7060 D=D-1              'decrement the data choices
7070 NEXT J

Filling the "hole" left by each selected element eliminates the problem of subsequent random selections landing on an already-collected (empty) slot, a retry process that theoretically could run indefinitely.  Oddly enough, in practice such a procedure invariably runs approximately three times as long as the more robust setup (perhaps pi (3.14) is the actual ratio, or is it e (2.72)?).
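For comparison, a hypothetical sketch of that retry version; it assumes the data values are nonzero, since 0 here marks an already-emptied slot:

8010 FOR J=1 TO SIZE
8020 R=INT(RND*SIZE)+1      'random over the whole array every time
8030 IF A(R)=0 THEN 8020    'value already collected: draw again
8040 X(J)=A(R): A(R)=0      'collect the value, then empty the slot
8050 NEXT J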


A WORD ABOUT HUMAN SHUFFLES 

Using a deck of Kem cards of average wear and average grime, I made 100 normal shuffles, noting the sizes of both the cuts and the resultant "packets".  The cuts varied from 24 to 28 cards, and the packets ranged between 1 and 7 cards each.  Starting with an ordered deck, a shuffling routine then simulated my human shuffles by making weighted random choices of cut and packets, based upon their empirically suggested frequencies of occurrence.
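The weighted choices can be made with a simple cumulative-frequency walk.  A sketch, in which W() is a hypothetical table holding the observed frequency of each packet size 1-7:

8100 T=0: FOR P=1 TO 7: T=T+W(P): NEXT   'total of the weights
8110 R=RND*T: P=0                        'random point within the total
8120 P=P+1: R=R-W(P): IF R>0 THEN 8120   'walk down the cumulative table
8130 'P now holds a packet size, chosen in proportion to its frequency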

Results of testing proved reasonable, albeit unacceptable by computer standards.  Six shuffles yielded much better statistics than three, and ten shuffles were even better yet.  Perhaps that disparity would be lessened by starting with a more randomly organized deck, such as from a live game.

---

Dealing (with) a Deck of Cards

Sometime in the 1980s, when computer bridge scoring was just getting started, the ACBL's Contract Bridge Forum — a monthly newspaper — published a card-shuffling routine in BASIC code.  It is unfortunate, however, that no one — including, apparently, the programmer — had bothered to test or analyze the routine, for it was hideously flawed.  It suffered from what I call the 'clubs problem', an error in the handling of probabilities.

Any dealing routine must begin with the pack arrayed in some ordered fashion; otherwise, there would be no way of tracking the cards' locations.  One popular choice is to arrange the cards spades first, ace to deuce, then the hearts, diamonds, and clubs.  So card #1 would be the ace of spades, and card #52 the deuce of clubs.
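With that scheme, suit and rank can be recovered from the card number by simple division; a sketch, using the 0-51 numbering that the deal routine below employs:

SUIT=C\13       '0=spades, 1=hearts, 2=diamonds, 3=clubs
RANK=C MOD 13   '0=ace, 1=king, ... 12=deuce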

Here is the error: it is invalid to repetitively select a random hand and then deal a card to it.  Why?  Because the hands do not fill up evenly.  After dealing, say, 40 cards, they might be distributed 12-11-9-8, or any other pattern.  With that distribution, if the next card were dealt truly at random, it would be 5 times as likely to go to the hand missing 5 cards as to the hand missing only 1 card.  Assigning equal weight to the four hands themselves distorts the randomness, because the possible outcomes do not carry equal weight.  Additionally, the odds are against the first 48 cards being distributed 12-12-12-12, which means that in the majority of cases the last two (or more) cards are dealt to the same hand.
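For illustration, here is a hypothetical reconstruction of that flawed method (not the Forum's actual listing); CNT() tracks how many cards each hand holds:

90 DIM DEAL(3,12): FOR H=0 TO 3: CNT(H)=0: NEXT
100 FOR C=1 TO 52             'deal the ordered deck, top card first
110 H=INT(RND*4)              'pick one of the four hands, equal weight
120 IF CNT(H)=13 THEN 110     'hand already full: pick again
130 DEAL(H,CNT(H))=C: CNT(H)=CNT(H)+1
140 NEXT C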

Why is that bad?  Because it results in the lowest-ranking clubs going to the same hand an inordinate number of times; hence the name 'clubs problem'.  This distortion of the probabilities has a number of disastrous effects, most notably a great increase in the frequency of the more distributional hands.  A different ordering of the original deck would not help, either; the same last few cards would still be paired up far too frequently.

So let us opt for a mathematically valid routine instead, selecting the cards at random and dealing them to the hands in some orderly sequence.  Although it makes no difference whether the cards are dispensed one at a time or thirteen at a time, the following routine simulates a standard human deal of one card at a time around the table.  It combines the Collection Shuffle with a simple dealing mechanism: the randomly selected cards are dealt directly into array DEAL(), with no prior shuffle required.

7000 '----------[ Deal 52 Cards to 4 Hands ]----------
7001 '
7002 'Out: DEAL(h,c) = hands #0-3, cards #0-12
7003 '     (assumes DIM DK(51), DEAL(3,12))
7009 '
7010 FOR C=0 TO 51: DK(C)=C: NEXT   'initialize the pack
7019 '
7020 FOR CD=0 TO 51
7030 R=INT(RND*(52-CD))             'random pick from the undealt cards
7040 HAND=CD MOD 4                  'deal around the table
7050 CARD=CD MOD 13                 'slot within the hand
7060 DEAL(HAND,CARD)=DK(R)
7070 DK(R)=DK(51-CD)                'fill the hole with the last card
7080 NEXT CD

Comprehensive methods don't necessarily require a lot of code!  To maximize speed, this entire routine could be packed into a single logical line, saving 5-6% in runtime.  At home, my Pentium 4 generates more than 2,200 deals per second in direct mode, and nearly 8,000 deals per second compiled.
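One possible packing, a sketch whose behavior is identical to the listing above:

7010 FOR C=0 TO 51:DK(C)=C:NEXT:FOR CD=0 TO 51:R=INT(RND*(52-CD)):DEAL(CD MOD 4,CD MOD 13)=DK(R):DK(R)=DK(51-CD):NEXT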
