World of Shareware - Software Farm 2

BASIC Source File | 1987-10-06 | 10KB | 243 lines

REM ----- "SPLITSRT"
' By J. G. Krol   Copyright 1987 Sep 29
' Duplication for personal use allowed, but not for commercial use

' TO USE SPLITSORT AS A QUICKBASIC 3.0 SUBPROGRAM,
' DIMENSION TWO NUMERICAL DATA ARRAYS:
'    N = ARRAY SIZE FROM 0 BASE (N+1 DATA, TOTAL)
'    XRAW(N) = RAW DATA FROM CALLING PROGRAM
'    XSORTED(N) = SORTED DATA BACK TO CALLING PROGRAM
' THEN CALL THE SUBPROGRAM WITH:
'    CALL SPLITSORT(XRAW(), XSORTED(), N)

' SAMPLE DRIVER
N = 20
DIM XRAW(N), XSORTED(N)
FOR L = 0 TO N
   XRAW(L) = 20 - L
NEXT L
CALL SPLITSORT(XRAW(), XSORTED(), N)
PRINT
PRINT"(SPACE) to show results"
WHILE INKEY$ <> CHR$(32):WEND
PRINT
PRINT" SHOW L, XRAW(L), AND XSORTED(L)"
PRINT
FOR L = 0 TO N
   PRINT L, XRAW(L), XSORTED(L)
NEXT L
END

SUB SPLITSORT (RAW(1), SORTED(1), N) STATIC
PRINT
PRINT"Sorting"
' MOVE RAW DATA INTO OUTPUT ARRAY
FOR L = 0 TO N
   SORTED(L) = RAW(L)
NEXT L
' DIMENSION WITH A VARIABLE TO FORCE THE ARRAYS TO BE DYNAMIC
' SO THAT THEY CAN BE DISARRAYED WITH AN "ERASE"
' SO THAT THEY CAN BE REDIMENSIONED THE NEXT TIME THROUGH
STACKSIZE = 20
DIM LOWEND(STACKSIZE), HIGHEND(STACKSIZE)
' INITIALIZE THE PROBLEM POINTER STACK
STACKPOINTER = 1
MAXSTACKPOINTER = 1' FOR REFERENCE
LOWEND(1) = 0
HIGHEND(1) = N
' MAIN PROBLEM LOOP - DO ANY AVAILABLE SORTING PROBLEM
DO UNTIL STACKPOINTER = 0
   ' CROSSED LIMITS MEAN NULL PROBLEM
   DO WHILE HIGHEND(STACKPOINTER) > LOWEND(STACKPOINTER)
      LOW = LOWEND(STACKPOINTER)
      HIGH = HIGHEND(STACKPOINTER)
      GAP = LOW + INT((HIGH - LOW)/2)
      PIVOT = SORTED(GAP)
      WHILE HIGH > LOW
         ' SCAN UPWARDS TOWARDS THE GAP FOR A BIG VALUE
         IF LOW < GAP THEN
            FOR SCAN = LOW TO GAP
               IF SORTED(SCAN) > PIVOT THEN
                  SORTED(GAP) = SORTED(SCAN)' FILL THE OLD GAP
                  GAP = SCAN' LOCATE THE NEW GAP
                  EXIT FOR' A NEW COMMAND IN QB 3.0
               END IF
            NEXT SCAN
            LOW = GAP
         END IF
         ' SCAN DOWNWARDS TOWARDS THE GAP FOR A SMALL VALUE
         IF HIGH > GAP THEN
            FOR SCAN = HIGH TO GAP STEP -1
               IF SORTED(SCAN) < PIVOT THEN
                  SORTED(GAP) = SORTED(SCAN)' FILL THE OLD GAP
                  GAP = SCAN' LOCATE THE NEW GAP
                  EXIT FOR
               END IF
            NEXT SCAN
            HIGH = GAP
         END IF
      WEND
      ' FILL THE FINAL GAP
      SORTED(GAP) = PIVOT
      ' THE FILE IS NOW PARTITIONED AT THE FINAL GAP LOCATION
      ' STACK UP THE TWO NEW PROBLEMS
      LOWENDLENGTH = GAP - LOWEND(STACKPOINTER)
      HIGHENDLENGTH = HIGHEND(STACKPOINTER) - GAP
      ' PUT LONGER PROBLEM ON STACK FIRST
      ' OVERWRITING THE PROBLEM JUST SOLVED
      IF LOWENDLENGTH < HIGHENDLENGTH THEN
         LOWEND(STACKPOINTER+1) = LOWEND(STACKPOINTER)
         HIGHEND(STACKPOINTER) = HIGHEND(STACKPOINTER)
         LOWEND(STACKPOINTER) = GAP + 1
         HIGHEND(STACKPOINTER+1) = GAP - 1
      ELSE
         LOWEND(STACKPOINTER) = LOWEND(STACKPOINTER)
         HIGHEND(STACKPOINTER+1) = HIGHEND(STACKPOINTER)
         LOWEND(STACKPOINTER+1) = GAP + 1
         HIGHEND(STACKPOINTER) = GAP -1
      END IF
      STACKPOINTER = STACKPOINTER + 1
      IF STACKPOINTER > MAXSTACKPOINTER THEN_
         MAXSTACKPOINTER = STACKPOINTER
   LOOP
   STACKPOINTER = STACKPOINTER - 1
LOOP
ERASE LOWEND, HIGHEND
PRINT
PRINT"Data are now sorted"
PRINT"Maximum stackpointer was";MAXSTACKPOINTER
END SUB

' Program description:

' SPLITSORT is a "partition" sorting algorithm with two advantages
' over the classical QUICKSORT algorithm that has long been the
' standard of comparison.

' Partition sorts repeatedly divide the given array into three parts:
'   1. The "lower part" up to the "pivot"; all numbers in the lower part
'      are constructed to be no bigger than the pivot itself. Given M
'      numbers to sort, the lower part may contain 0 to M-1 numbers.
'   2. The pivot, a single number.
'   3. The "upper part" from the pivot to the end; all numbers in the
'      upper part of the array are constructed to be no smaller than
'      the pivot. The upper part may contain 0 to M-1 numbers.

' The result of such a partition is to change the given problem of
' sorting M numbers into three subproblems, each independent of the
' other two:
'   1. Sort the lower part. If the lower part contains but 1 or 0 numbers,
'      then this becomes a null problem: nothing needs to be done.
'   2. Sort the pivot, a single number. This is obviously a null problem.
'   3. Sort the upper part.
'      If it contains but 1 or 0 numbers, again
'      there's really nothing to do.

' Repeatedly partitioning the original array reduces the original,
' difficult, time-consuming sorting problem into a number of null
' problems. Thus a partition sort sorts by avoiding all sorting.

' In the best case, the pivot ends up in the middle of a given array. Then
' the lower and upper parts each contain about M/2 numbers. Each of the
' two new subproblems is (a bit less than) half as long as the old
' problem. In this ideal case, the number of comparisons and thus the
' time required to sort a given array vary in proportion to M*log(M).
' Since log(M) increases very slowly compared to M itself, the sorting
' time increases only modestly faster than the number of numbers to be
' sorted.

' In the worst case, however, the pivot winds up in the first (last)
' position, the lower (upper) part contains 0 items, the upper (lower)
' part contains M-1 items, and the resulting sorting time varies as
' M*M. Since M-squared increases much faster than M itself, the time
' soon becomes prohibitively long as M increases. Thus the BUBBLE SORT,
' SELECT SORT, and INSERT SORT -- all of which vary as M*M -- are fine
' for short arrays, but impracticable for long arrays.

' The time taken by any partition sort varies between the
' best case and the worst case, depending upon the particular array
' being sorted. In a theoretical sense, averaging sorting time over
' all possible data configurations, each considered equally likely
' to exist, QUICKSORT and SPLITSORT both vary as M*log(M).

' The peculiar and potentially disastrous quirk of QUICKSORT is that
' its worst case array configuration -- the cases in which its time
' actually varies as M*M instead of the theoretical M*log(M) -- is a
' sorted or antisorted array. Nearly-sorted and nearly-antisorted arrays
' are nearly as bad. Since such configurations are highly likely to exist
' in a real data processing setting, QUICKSORT can be very treacherous.
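
' The M*M behavior on sorted input can be checked by counting comparisons.
' The Python sketch below is an illustration only, not part of SPLITSRT.BAS:
' the name quicksort_counting is hypothetical, and the partition is a simple
' first-number-as-pivot scheme chosen for brevity rather than any particular
' published QUICKSORT.

```python
import random


def quicksort_counting(data):
    """Sort a copy of data with a first-element-pivot quicksort.

    Returns (sorted_list, comparisons), where comparisons counts the
    element-versus-pivot tests made while partitioning.
    """
    a = list(data)
    comparisons = 0
    stack = [(0, len(a) - 1)]           # explicit stack instead of recursion
    while stack:
        low, high = stack.pop()
        if low >= high:                 # 0 or 1 numbers: a null problem
            continue
        pivot = a[low]                  # first number in the subarray
        boundary = low                  # last position holding a value < pivot
        for i in range(low + 1, high + 1):
            comparisons += 1
            if a[i] < pivot:
                boundary += 1
                a[boundary], a[i] = a[i], a[boundary]
        a[low], a[boundary] = a[boundary], a[low]   # drop the pivot into place
        stack.append((low, boundary - 1))
        stack.append((boundary + 1, high))
    return a, comparisons


if __name__ == "__main__":
    m = 200
    shuffled = list(range(m))
    random.Random(1987).shuffle(shuffled)   # fixed seed, arbitrary value
    print("sorted input:  ", quicksort_counting(range(m))[1])
    print("shuffled input:", quicksort_counting(shuffled)[1])
```

' With this variant a sorted array of M numbers costs M*(M-1)/2 comparisons
' (19900 for M = 200), because every partition leaves one part empty. A
' shuffled array of the same size needs far fewer.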

' Here is what can happen. You sort a random array in a couple of minutes,
' per the M*log(M) timing. You now change a few values in the array, and
' try to re-sort it. Intuitively, this ought to take even less time, since
' most of the numbers already are sorted. Not so. QUICKSORT may now run
' for hours before it's done, since its performance is now near the
' worst-case timing of M*M!

' This infamously treacherous behavior has been attacked various ways:
'   1. Provide whatever manual and programmed means are necessary to
'      avoid applying QUICKSORT to ill-conditioned arrays. This of course
'      requires some other sorting algorithm to handle those cases. The
'      nearly sorted condition that makes QUICKSORT work worst happily
'      makes BUBBLE SORT and INSERT SORT work best (very nearly proportional
'      to M).
'   2. Deliberately unsort every array before attempting to sort it. This
'      requires some randomizing routine to scramble the numbers, and is
'      a paradoxical approach, to say the least.
'   3. Devise some more-or-less elaborate scheme for choosing the pivot.
'      The original QUICKSORT took the first number in the array as the
'      pivot. If the numbers are "randomly" ordered, the expected value
'      of each number is the average of all the numbers, so the expected
'      final location of the pivot (at which the array will be partitioned)
'      is virtually in the middle (the ideal condition). Thus there is no
'      advantage to any other choice of a pivot: one record is as good
'      as another, on average. To get a better choice, you can examine
'      the first several numbers (maybe 3 or 5 of them), doing a little
'      subsort to find the middle one, and use it as the pivot.

' Clearly, all those options are clumsy and troublesome. But they're
' the sort of things you have to do to apply QUICKSORT to real-world
' sorting problems.
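
' The third remedy, a "little subsort" of the first few numbers, might look
' like the Python sketch below. It is an illustration under stated
' assumptions: the function name middle_of_first_three and its arguments are
' hypothetical, and it examines exactly the first three numbers of the
' subarray, as the text suggests.

```python
def middle_of_first_three(a, low, high):
    """Return the index of the middle value among the first numbers of a[low..high].

    Looks at up to three positions starting at low; with fewer than three
    numbers available it falls back to whatever is there.
    """
    candidates = list(range(low, min(low + 3, high + 1)))
    candidates.sort(key=lambda i: a[i])     # the "little subsort"
    return candidates[len(candidates) // 2]


if __name__ == "__main__":
    values = [9, 1, 5, 7]
    k = middle_of_first_three(values, 0, len(values) - 1)
    print("pivot index:", k, "pivot value:", values[k])
```

' Many implementations sample the first, middle, and last positions instead
' of the first three, since on an already sorted array the first three
' numbers are also the three smallest.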

' SPLITSORT works *consistently best* for arrays that are actually sorted
' or nearly sorted or random or reverse-sorted or nearly reverse-sorted,
' that is, for all data configurations most likely to be encountered.
' As with any possible partition sort, it does have a worst case
' in which its time goes as M*M, not M*log(M). But this is a queer
' sequence of values extremely unlikely to exist in practice. SPLITSORT
' is thus a much more robust, reliable, versatile and dependable algorithm
' than the classical QUICKSORT. You can apply it indiscriminately. There's
' no need for the sort of special tricks described above.

' Nor is there any speed penalty for this greater consistency and
' dependability. There's actually a slight speed *increase*. Here's why.

' SPLITSORT stores the pivot at the start of each problem, creating
' a logical gap in the array. Then it fills that gap with a selected
' number from the array, creating a new gap at that number's old
' location, which new gap it then fills with the next-selected
' number, etc. until a final gap location is attained. The pivot
' is then put back into that gap. Thus SPLITSORT takes N+1 assignments
' to move N numbers.

' QUICKSORT, like all classical sorts, needs 3 assignments to move
' 2 numbers, e.g., TEMP = X, X = Y, Y = TEMP. This takes 3N/2
' assignments to move N numbers, or about 50% more assignments and
' thus 50% more time than SPLITSORT. In terms of the time spent moving
' numbers, SPLITSORT is always slightly faster than QUICKSORT -- a nice
' though noncritical advantage.

' The critical advantage is in terms of the time spent comparing numbers,
' and the way that time varies as a function of the particular ordering of
' numbers in a given data array. From this viewpoint, SPLITSORT is at least
' as fast as QUICKSORT when QS works best (random data) and is decisively
' faster than QS in the several important and frequent situations where QS
' misbehaves.

' SPLITSORT is consistently fast and dependable.
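
' The gap-filling move described above can be sketched in Python as a rough
' transliteration of the inner loop of SUB SPLITSORT. The names
' gap_partition and splitsort and the moves counter are illustrative
' additions, not part of the original program.

```python
def gap_partition(a, low, high):
    """Partition a[low..high] in place around its middle element.

    Returns (gap, moves): the pivot's final index and the number of
    assignments used to move values (pivot out, gap fills, pivot back).
    """
    gap = low + (high - low) // 2
    pivot = a[gap]                      # lifting the pivot out leaves a logical gap
    moves = 1
    while high > low:
        # Scan upward toward the gap for a value bigger than the pivot.
        if low < gap:
            for scan in range(low, gap + 1):
                if a[scan] > pivot:
                    a[gap] = a[scan]    # fill the old gap
                    gap = scan          # the new gap is where that value was
                    moves += 1
                    break
            low = gap
        # Scan downward toward the gap for a value smaller than the pivot.
        if high > gap:
            for scan in range(high, gap - 1, -1):
                if a[scan] < pivot:
                    a[gap] = a[scan]
                    gap = scan
                    moves += 1
                    break
            high = gap
    a[gap] = pivot                      # fill the final gap
    return gap, moves + 1


def splitsort(data):
    """Return a sorted copy of data using gap_partition and an explicit stack."""
    a = list(data)
    stack = [(0, len(a) - 1)]
    while stack:
        low, high = stack.pop()
        if high <= low:                 # crossed limits mean a null problem
            continue
        gap, _ = gap_partition(a, low, high)
        lower, upper = (low, gap - 1), (gap + 1, high)
        # Push the longer problem first so the shorter one is solved next.
        if gap - low < high - gap:
            stack.extend([upper, lower])
        else:
            stack.extend([lower, upper])
    return a


if __name__ == "__main__":
    print(splitsort([20 - i for i in range(21)]))   # same data as the sample driver
```

' Partitioning the five numbers 5, 1, 4, 2, 3 this way fills four gaps, so
' five values (four numbers plus the pivot) are relocated with six
' assignments, in line with the N+1 count given above.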