Programming IT Technology

Genetic Algorithm Improves Shellsort (71 comments)

gstover writes "As a personal programming project recently, I used a genetic algorithm to find a better sequence of increments for Shellsort. It found at least one sequence that out-performs the best previously known by about 5 percent. Here is an article about it (also here). I believe this is the first time a genetic algorithm has actually improved Shellsort."
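For readers who want to see exactly what was being evolved, here is a minimal Python sketch (not the article's code): the shellsort algorithm itself stays fixed, and only the list of increments (gaps) is the tunable part. The gap sequence used below is purely illustrative.

    def shellsort(a, gaps=(701, 301, 132, 57, 23, 10, 4, 1)):
        # Each pass is an insertion sort over elements that are `gap` apart;
        # the final pass with gap 1 guarantees the array ends up fully sorted.
        for gap in gaps:
            for i in range(gap, len(a)):
                v, j = a[i], i
                while j >= gap and a[j - gap] > v:
                    a[j] = a[j - gap]
                    j -= gap
                a[j] = v
        return a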
  • by tetrode ( 32267 ) on Monday December 09, 2002 @01:43PM (#4844826) Homepage
    You'll be included in THE book & be famous

    Mark
  • Impressive (Score:2, Interesting)

    by trajano ( 220061 )
    Personally, I am quite impressed with the concept of Genetic Algorithms.

    I could imagine your shell sort could be applied to database technologies by creating a separate low priority thread that would generate the "gene pool" to improve the indexing of the tables in the background.

    On a side note I am learning about genetic algorithms with hopes that I may work on it when I take my Masters in Computer Science.
    • Re:Impressive (Score:4, Interesting)

      by jovlinger ( 55075 ) on Monday December 09, 2002 @04:08PM (#4845985) Homepage
      Danny Hillis used his CM-1 to evolve sorting networks for known lengths. Using a predator-prey model (the predator won by generating sequences that the networks failed to sort), he evolved several "optimal" sorters.

      An obvious extension to generic lengths is to use these precomputed networks as recursion base cases for quicksort, instead of switching to selection sort for lengths below some small x (x ~ 5, typically).

      IIRC shellsort works similarly: recursively sorting subsequences and merging results.
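
      For anyone who hasn't seen one, a sorting network is just a fixed, data-independent sequence of compare-exchange steps. This small Python sketch shows a standard five-comparator network for four inputs, the sort of object Hillis's GA was evolving for larger sizes:

          NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

          def run_network(a, network=NETWORK_4):
              # Always the same comparisons in the same order, whatever the data.
              for i, j in network:
                  if a[i] > a[j]:
                      a[i], a[j] = a[j], a[i]
              return a

          # run_network([3, 1, 4, 2]) -> [1, 2, 3, 4]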
    • Re:Impressive (Score:2, Informative)

      by pirap ( 593922 )

      You should not look only at genetic algorithms, but perhaps also at a newer "family" known as PSO (Particle Swarm Optimization). It's obviously very similar to GAs but builds cooperation into the development of solutions. This is the "weak" point of GAs, as they only work through "battling" other solutions.

      Swarms are a concept which was "invented" in 1995, I believe. Read some articles about it.
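
      For the curious, the core of PSO fits in a few lines. This rough Python sketch uses the standard textbook update rule (no particular library assumed); the "cooperation" is that each particle's velocity is pulled toward both its own best position and the swarm's global best:

          import random

          def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
              xs = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
              vs = [[0.0] * dim for _ in range(n_particles)]
              pbest = [list(x) for x in xs]
              gbest = min(pbest, key=objective)
              for _ in range(iters):
                  for p in range(n_particles):
                      for d in range(dim):
                          r1, r2 = random.random(), random.random()
                          # velocity blends inertia, personal memory and the swarm's best
                          vs[p][d] = (w * vs[p][d]
                                      + c1 * r1 * (pbest[p][d] - xs[p][d])
                                      + c2 * r2 * (gbest[d] - xs[p][d]))
                          xs[p][d] += vs[p][d]
                      if objective(xs[p]) < objective(pbest[p]):
                          pbest[p] = list(xs[p])
                  gbest = min(pbest, key=objective)
              return gbest

          # e.g. pso(lambda x: sum(v * v for v in x), dim=3) drifts toward the origin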
  • This may be the first time something potentially important hit Slashdot before being reported elsewhere! :-)
  • I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?
    • Re:Nice and all (Score:2, Informative)

      by jalet ( 36114 )
      quicksort doesn't preserve the order of duplicate keys, IIRC
      • Re:Nice and all (Score:3, Informative)

        by Yokaze ( 70883 )
        Bottom-up mergesort does.
        Theta(n log(n)) instead of Theta(n^2) like the (standard) Quicksort.

        (Shellsort has Theta(n log(n)^2), IIRC)
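
        For concreteness, here is a short Python sketch of a stable bottom-up mergesort (illustrative only): it merges runs of width 1, 2, 4, ... and keeps equal keys in their original order by preferring the left run on ties.

            def bottom_up_mergesort(a):
                width, n = 1, len(a)
                while width < n:
                    for lo in range(0, n, 2 * width):
                        mid, hi = min(lo + width, n), min(lo + 2 * width, n)
                        left, right = a[lo:mid], a[mid:hi]
                        i = j = 0
                        k = lo
                        while i < len(left) and j < len(right):
                            # "<=" is what makes the sort stable
                            if left[i] <= right[j]:
                                a[k] = left[i]; i += 1
                            else:
                                a[k] = right[j]; j += 1
                            k += 1
                        a[k:hi] = left[i:] + right[j:]
                    width *= 2
                return a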
    • Re:Nice and all (Score:4, Interesting)

      by photon317 ( 208409 ) on Monday December 09, 2002 @03:08PM (#4845484)

      There may be some cases where shellsort is more desirable for the exact data being sorted; I don't really know for sure. The importance of this is that he has used a GA to better the optimization work of humans on shellsort. He has laid out the groundwork and circumstantial proof for others to do the same with other algorithms. Of course, he evolved a set of constants more than an algorithm itself.

      The next logical place to go with this work, IMHO:

      1) Invent a concise fake machine language for sorting algorithms (a convenience to make the rest of this easier). It should have basic instructions used in manipulating and sorting arrays (move, compare, copy, branching, etc...).

      2) Write a "sort engine" that takes as input algorithms written in your fake language and uses them to sort things (outputting some performance numbers).

      3) Implement every known array sorting algorithm you can find in your little fake machine sort language.

      4) Let a GA evolve the whole algorithm by arbitrarily replacing bytes with other valid bytes from your specialized assembler language, starting with all the known sort algs as parents. Let it run until it stabilizes, using a relatively high mutation rate.

      Of course, the big problem is that if your language implements any kind of looping construct, or any other way that code can be re-executed (and it will almost have to), then you face a "halting problem" when testing each new child. The practical workaround is of course to know that any reasonable algorithm must finish the sort in a certain bounded number of CPU cycles, and terminate any children who take longer (see the sketch after step 5 below).

      5) Translate the winning candidate(s)' custom machine source back into a generic language like C, and puzzle over exactly why it works so damn well.
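
      A hedged Python sketch of that "bounded budget" workaround: run each evolved candidate under a hard operation budget, and only score it if it both finishes and actually sorts. The names here (CountingList, evaluate) are invented for illustration, not taken from any real framework, and a real fitness function would count weighted operations rather than raw accesses.

          import random

          class BudgetExceeded(Exception):
              pass

          class CountingList(list):
              # A list that counts element reads/writes and aborts past a budget.
              def __init__(self, data, budget):
                  super().__init__(data)
                  self.ops, self.budget = 0, budget
              def _tick(self):
                  self.ops += 1
                  if self.ops > self.budget:
                      raise BudgetExceeded()
              def __getitem__(self, i):
                  self._tick()
                  return super().__getitem__(i)
              def __setitem__(self, i, v):
                  self._tick()
                  super().__setitem__(i, v)

          def evaluate(candidate_sort, trials=20, n=256, budget=200000):
              # Fitness = average operations used; runaway or incorrect children score infinity.
              total = 0
              for _ in range(trials):
                  data = CountingList(random.sample(range(n), n), budget)
                  try:
                      candidate_sort(data)            # each child gets a bounded amount of work
                  except BudgetExceeded:
                      return float("inf")
                  if list(data) != sorted(data):      # it must actually sort to be scored
                      return float("inf")
                  total += data.ops
              return total / trials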
      • Re:Nice and all (Score:3, Interesting)

        by platypus ( 18156 )
        But you'd better make damn sure that your GA doesn't improve the algorithm specifically for your "virtual machine". I.e. if pointer arithmetic were very slow (pulling that example out of my ass, btw), your GA might tend to penalize algorithms that use it more often than others.
        • Re:Nice and all (Score:3, Interesting)

          by photon317 ( 208409 )
          True, true. Probably the answer would be, instead of measuring the real execution time in your engine, to measure it in counts of the various operations, weighted by how expensive those operations are on a typical modern 32-bit processor.
      • Re:Nice and all (Score:3, Interesting)

        by doubtless ( 267357 )
        You might want to check out genetic programming instead. It is the method of evolving programs, instead of evolving solutions like the genetic algorithm described here.

        There are some inherent disadvantages to GP, even though it is much more powerful than GA. One of them is the problem of code growth: usually after some generations the organisms get too large and start to consume too much memory and CPU resources to be feasible. One of the papers I wrote while taking an Evolutionary Computation class in college was an investigation and a 'solution/improvement' to the code growth problem.
    • Re:Nice and all (Score:2, Informative)

      by Pentagram ( 40862 )
      Quicksort is slow if the data it is sorting are already almost sorted, a situation which (IIRC) shellsort is particularly efficient at dealing with.
    • Re:Nice and all (Score:4, Interesting)

      by Da VinMan ( 7669 ) on Monday December 09, 2002 @03:18PM (#4845566)
      GA is good for a lot of things. For instance, it was used to redesign diesel engines to be more efficient [ceintl.com].

      The big problem with GA though, IIRC, is that the resulting solution is often incomprehensible to a human. I believe Bill Joy did some work with GAs and had comments along those lines (sorry, I couldn't find the quote). Consider for a moment though trying to troubleshoot code generated by a computer. Bad variable names would be just the start of your problems. The logic patterns employed would be essentially random to a human. Many of the patterns would be vestigial and wouldn't even be relevant, but you wouldn't even know that. Identifying the primary execution paths would be a huge chore... never mind actually understanding the basis for why the generated solution actually works.

      How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?
      • by skaffen42 ( 579313 ) on Monday December 09, 2002 @10:51PM (#4850368)
        How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

        I'm just happy my parents didn't have the same concerns when they "deployed" me! :)

      • How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

        In this case it does not matter much. The method used to sort data is not likely to be visible in the design.

        Any program that lives through multiple maintainers (esp. any interns) likely has code that no one understands. Layers of compatibility requirements mean you can never change the software's behaviour, but no one can figure out everything the software is doing.

        Ever see modules with comments like:

        /* XXX: 09/17/97 - What does this do? */
        if (someglobal->_pt->getTarget()->foo() == MAGIC_COOKIE_42 + 7)

        Maybe with some inconsistently applied attempts at a naming convention thrown in to repel all attempts to figure out what the code is doing.

      • Consider for a moment though trying to troubleshoot code generated by a computer. Bad variable names would be just the start of your problems. The logic patterns employed would be essentially random to a human. Many of the patterns would be vestigial and wouldn't even be relevant, but you wouldn't even know that. Identifying the primary execution paths would be a huge chore... never mind actually understanding the basis for why the generated solution actually works.


        How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

        This sort of (non)design is often found in real code written by humans. The only difference is that the 'generated solution' doesn't usually work too well. I have many years' experience fixing such things once deployed :-(

      • >How comfortable would you be deploying a solution (hardware or software) where the fundamental design isn't even understood? How the heck do you fix such a thing once it's deployed?

        --I dunno, it seems to work for MS... :b

      • > The big problem with GA though, IIRC, is that the resulting solution is often incomprehensible to a human.

        Same with perl, but people use it anyway.

        You're confusing GA and GP (genetic programming). GA usually refers to a case where the program is fixed, and the genes consist only of parameters to the program. With genetic programming your issues are valid, however they can be worked around:

        - A lot of simplification of a program can be automated. Some of that simplification will require significant work to implement, but it's doable. Other parts of the simplification can be done by changing the fitness function once you have an efficient algorithm: in addition to performance or whatever criteria are used, you add measurements of program size and complexity as secondary influences on the fitness (a minimal sketch of this follows the list below).

        - Robustness of the language the program is generated in can be strengthened. Typically GP uses custom domain-specific languages. This both simplifies understanding and prevents crashes.

        - The experiments can be rerun with additions to the language that group constructs which are complex but understood, in order to simplify the solutions. The population can be steered towards using these constructs by rewarding them in the fitness function.
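
        As a tiny, hedged Python illustration of that size-penalty idea (the names and the weight are made up for illustration):

            def fitness(raw_score, program_size, size_weight=0.001):
                # Primary term: task performance (higher is better).  Secondary term:
                # gentle pressure toward smaller programs, so bloat is penalised
                # without dominating the search.
                return raw_score - size_weight * program_size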

        All in all, GP is very promising for certain types of problems, but it shouldn't be understood as "press this button and out comes a polished working program", rather as a tool to approach problems that are hard to analyse (especially where there is no 100% "correct" solution, or a correct solution is extremely hard to find, but gradual approximation has the potential of working well enough). GA and GP also have great potential wherever requirements change slowly over time and the software needs to adapt, as a way of avoiding manual tweaking.

        Imagine a lift control GA, for instance (actually used as an example in Dr Dobbs some time ago), where a near-optimal solution could be used as a starting point for the lifts' behaviour. However, "traffic patterns" in a building change - new tenants move in, old ones move out, departments move and create changes in which floors have the heaviest traffic and which floors the traffic is between - and a GA could quite possibly be used to automatically adapt to such changes.

        On a sidenote, before anyone raises concerns over letting a "random" program run a lift, the whole point here is that a GA isn't random. You use a hard-coded algorithm that enforces rules for what is acceptable and what is not. For a lift, of course, you'd have low-level requirements such as braking at a specific speed, and high-level requirements such as never changing direction before all the floors requested in the current direction have been visited.

        The GA will only adapt the parameters you choose to surrender control of, such as whether or not the lifts should move when not requested (some of them moving to a specific floor for lunch time, for instance), and how long to wait before doing so.

    • Quicksort will still be slightly faster than shellsort assuming it is coded properly. But coding quicksort "properly" can be a bit tricky. You must be careful about many things, and you have to come up with some way to choose a good pivot. Also, quicksorting small lists is usually a bad idea, so we also have to write an auxiliary sorting algorithm for the small lists that quicksort gives us.

      By contrast, shellsort is quite easy and clean to code, and this improvement to this algorithm just uses a better increment sequence, which doesn't complicate things by much.

      So, to answer your question, this has a lot of practical value, namely when development time is more valuable than running time.
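
      To make those "tricky parts" concrete, here is a rough Python sketch (illustrative, not production code) of a quicksort with median-of-three pivot selection, recursion into the smaller half, and a small-range cutoff that falls back to insertion sort:

          def quicksort(a, lo=0, hi=None):
              if hi is None:
                  hi = len(a) - 1
              while hi - lo > 16:                        # large ranges: partition
                  mid = (lo + hi) // 2
                  # median-of-three pivot selection
                  if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
                  if a[hi] < a[lo]:  a[lo], a[hi] = a[hi], a[lo]
                  if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
                  pivot = a[mid]
                  i, j = lo, hi
                  while i <= j:                          # Hoare-style partition
                      while a[i] < pivot: i += 1
                      while a[j] > pivot: j -= 1
                      if i <= j:
                          a[i], a[j] = a[j], a[i]
                          i += 1; j -= 1
                  # recurse into the smaller side, loop on the larger one
                  if j - lo < hi - i:
                      quicksort(a, lo, j); lo = i
                  else:
                      quicksort(a, i, hi); hi = j
              # small leftover range: plain insertion sort
              for k in range(lo + 1, hi + 1):
                  v, m = a[k], k
                  while m > lo and a[m - 1] > v:
                      a[m] = a[m - 1]; m -= 1
                  a[m] = v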
    • Re:Nice and all (Score:5, Informative)

      by Gumshoe ( 191490 ) on Monday December 09, 2002 @03:36PM (#4845719) Journal
      I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?


      You're generalising, which is an easy mistake. Quicksort is good, but for many data sets it is very likely suboptimal and another algorithm would perform better. Shellsort is good for small data sets. In fact, a good quicksort implementation may use shellsort once the divisions get small enough.

      Another example: quicksort is poor at sorting strings. This is because you have to perform a full strcmp() (or equivalent) at each comparison. For this reason, a modified radix sort is often used.
      • Another example: quicksort is poor at sorting strings. This is because you have to perform a full strcmp() (or equivalent) at each comparison. For this reason, a modified radix sort is often used.



        Radix sort is also not very good at sorting strings. Radix sort requires O(nk), where k equals the number of radix points. If one sorts a list of names from the phone book, the keys in this case can have over 30 radix points. Since 2^32 is over 4 billion, a quicksort (roughly n log2(n) comparisons, and log2 of a billion is only about 30) can sort better than a radix sort as long as there are fewer than about 1 billion names to be sorted in this case.


        Also, strcmp() is not that slow overall. Since there are far, far more people not named John Smith than there are named John Smith, strcmp() will complete the vast majority of its calls in just a few operations.


        Radix sort cannot be done in place, thus requiring double the memory to run (well, at least for the pointers, which for a large list can still be quite a cost). If this additional memory needs to be swapped in and out, the performance penalty can be quite nasty.


        Radix sort is best used when the number of radix points is quite small such as zip codes (only 5 radix points). In general this only happens when a large number of items have only a few keys. This prevents radix sort from being a good general purpose sort.


        Quicksort with a good pivot selection algorithm (to avoid the already sorted list problem) or Mergesort are still used quite often today.
        In fact the Java library sort uses a modified mergesort, and the standard C library has qsort().


        I did check Knuth about this, and radix sort can get around its problems with a beefy machine. I did some checking and in fact radix is standard for supercomputers. So if you are programming a Cray or whatever they call them nowadays, you can ignore everything I have said.

        • Radix sort is also not very good at sorting strings.


          It's the best. It's impossible to look at fewer characters than what is possible with a radix sort.

          Algorithm:

          1. Put the strings into piles according to their first letter.

          2. Subdivide each pile into further piles according to the second letter.

          3. Subdivide each of the new piles according to the third letter.

          4. Continue subdividing on successive letters until each pile holds at most one distinct string. 5. Collect the piles together.

          You only ever look at each character once! The problem in implementation is keeping track of all the piles, and everything you've said by way of criticism (all perfectly valid) is really criticism of an implementation, not the theory.

          If you're interested, this paper [bostic.com] suggests a good, efficient algorithm.
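
          For what it's worth, here is a rough Python sketch of the "piles" idea above (a plain recursive MSD radix sort, not the in-place American Flag variant, and illustrative only):

              def msd_string_sort(strings, depth=0):
                  if len(strings) <= 1:
                      return strings
                  done, piles = [], {}
                  for s in strings:
                      if len(s) == depth:
                          done.append(s)                 # exhausted strings sort first
                      else:
                          piles.setdefault(s[depth], []).append(s)
                  out = done
                  for ch in sorted(piles):               # visit piles in character order
                      out.extend(msd_string_sort(piles[ch], depth + 1))
                  return out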

          Also, strcmp() is not that slow overall. Since there are far, far more people not named John Smith than there are named John Smith, strcmp() will complete the vast majority of its calls in just a few operations.


          strcmp() itself isn't slow, but when you're checking the same characters over and over again, performance is shattered.

          Radix sort cannot be done in place, thus requiring double the memory to run (well, at least for the pointers, which for a large list can still be quite a cost). If this additional memory needs to be swapped in and out, the performance penalty can be quite nasty.


          Naive radix sort implementations don't sort in place, but it's certainly possible. The American Flag algorithm described in the paper linked above does just that. Memory usage is still problematic, but it's nowhere near double.

          Quicksort with a good pivot selection algorithm (to avoid the already sorted list problem) or Mergesort are still used quite often today.


          They are, and they have their place, but that's not my argument. My point was that it's dangerous to generalise quicksort as a superior sort for everything, e.g. shellsort is superior for small data sets.

          This applies to radix sort too of course: for very specific input with very specific properties, radix sorting of strings may be weaker than some other algorithm. In fact, the sorting stage algorithm suggested in the original Burrows-Wheeler Transform paper highlights this.

          In fact the Java library sort uses a modified mergesort, and the standard C library has qsort()


          qsort() isn't necessarily a quicksort implementation (i.e. the ANSI standard doesn't require it to be), although it very often is.

          I did check Knuth about this, and radix sort can get around its problems with a beefy machine.


          Did you ask Knuth personally, or did you simply check the Book? That sounds really impertinent, I know, but the reason I ask is because most of the research on Radix Sort has been done since the Book was written.
    • Samplesort vs Timsort [sourceforge.net].

      Intro
      -----
      "This describes an adaptive, stable, natural mergesort, modestly called
      timsort (hey, I earned it ). It has supernatural performance on many
      kinds of partially ordered arrays (less than lg(N!) comparisons needed, and
      as few as N-1), yet as fast as Python's previous highly tuned samplesort
      hybrid on random arrays."
    • Re:Nice and all (Score:3, Insightful)

      by Kynde ( 324134 )
      I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?

      Shell sort has better worst-case running-time than quick sort does.

      Most programming these days is all about the common case, but in real-time programming it's all about predictability. Thus mergesort is the standard choice there over quicksort.
    • I guess it's a nice thing that shellsort is getting improved, and I do realize that this is great for genetic research, but, pardon me for being an uneducated ignorant bastard, does it have any practical value, seeing as quicksort is much faster than shellsort? Or am I wrong?

      Quicksort is good for a lot of cases, but not for all. Yet one more good example (not mentioned here yet) would be the presence of knowledge of how unsorted things really are. Meaning that if you know something extra about the "unsorted" state of the data, quicksort can be outperformed easily.

      A classic example is to arrange bank payments based on the actual purchase date on the credit card. As each store sends the bill with some variation, the bills arrive unsorted in time. On average they're still almost sorted, but not quite. For such a case quicksort would perform quite horribly with respect to what could be done.

      Truly random data is encountered far more seldom, thus other means of sorting can be applied more often than people commonly think (a small illustration follows below).
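
      One way to see the point in code: a plain insertion sort does work proportional to n plus the number of inversions, so "almost sorted" bills like these are handled in close to linear time. A minimal Python sketch (not from the article):

          def insertion_sort(a):
              for i in range(1, len(a)):
                  v, j = a[i], i
                  while j > 0 and a[j - 1] > v:    # each shift fixes exactly one inversion
                      a[j] = a[j - 1]
                      j -= 1
                  a[j] = v
              return a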
  • Now if only there could be a genetic algorithm enhancement of current methods of finding the factors of large numbers whose factors are two very large primes.
    • Trust me, you do not want to suddenly unleash an efficient algorithm for factorising into large primes on the world. For a start, security as we know it would more-or-less disappear overnight, if only because so much important data has been encrypted using algorithms that assume such factorisation to be hard.

      • The current system needs to be replaced, and that is the best method I can think of to replace it.
        • The current system needs to be replaced, and that is the best method I can think of to replace it.

          I truly and sincerely believe that you should think about that claim, and then think a whole lot harder about what you'd like to see replace it. There are plenty of places in the world, still, where the protections that we take for granted don't exist. I'm guessing you wouldn't like to be there a whole lot, and releasing all the stuff that's currently protected is a one-way trip.

          • What protections do you take for granted? I don't like being here a whole lot either, but I guess it's better than being elsewhere on the planet for the time being.
            • What protections do you take for granted?

              You've seen and heard a lot about terrorism lately, I assume. When the intelligence agencies screw up, and something like Sept 11 happens, you hear all about it. How many terrorist incidents do you think such agencies prevent for every one that happens? I'm betting it's quite a few.

              I realise that a lot of intelligence officers just read the local papers and use their brains, but I'm guessing that on-the-ground in-with-the-bad-guys people are still pretty important as well. If you blow the cover of every single intelligence officer in the world, who do you think it's going to hurt more, us or the bad guys?

              Next up... You have a bank account, right? How would you feel about not only letting anyone who wanted to see the details, but letting any script kiddie with half a brain play with the money itself? If you remove the encryption that's used when things like transfers are done electronically, what exactly is going to stop them?

              Now, I don't know whether either of these specific things is a valid example. I kinda assume that military and serious finance types have more than just a couple of prime numbers to protect their data. Then again, screw ups do happen, so maybe it's not so much more. It doesn't really matter anyway. Even if these exact things wouldn't crumble if you released an efficient prime factorising algorithm, there are numerous other things along the same lines that are bound to. RSA is a well-known and widely used algorithm that would be cracked instantly, for a start.

              Now, I don't know about you, but personally I rate my personal safety, the security of my finances, the confidentiality of my health records and such pretty highly. Right about now, they're reasonably protected as far as I know. If you release something that cracks such a popular and fundamental encryption technique overnight, that all disappears. Me, I reckon I'd prefer living in the West to living in a third world country where the only people who have any control are the guys with the biggest guns, but maybe I'm just dumb like that...

              • You act as if the same things that were used before encryption becomes useless will still be used after encryption becomes useless. When encryption becomes invalid, at the very least transactions will have to be performed by armored truck. Hopefully the hassles involved will bring a sea change to a non-monetary system, one in which if someone doesn't want to work then they are treated as an invalid, and that is their incentive to work instead of money.
                • Unfortunately, the sort of society you're describing couldn't come about overnight. Those sorts of changes would take many decades to get right even starting from an organised beginning. In the period that would follow the breakdown of encryption, I suspect what you'd actually see would be a period of panic and relative anarchy, a clampdown into police states to control it, and subsequently many years of corruption in governments desperate to retain control until they could reestablish the mechanisms that were there before using alternative means. The entire game doesn't change at all -- breaking one crypto algorithm isn't anything like enough to catalyse that -- it just gets really ****ty for a while.

  • simply well done
  • I don't mean to be a rube, but what exactly does this mean, what's the significance, and will this change my life for the better? tia.
  • by one9nine ( 526521 ) on Monday December 09, 2002 @03:18PM (#4845563) Journal

    "Notice that the mating process has nothing to do with the constraints placed on valid organisms."

    I wish more women would read Slashdot. Chalk one up for the "Size Doesn't Matter" team.

    "I chose single-point crossover as the method of mating because it is simple & generic."

    I choose doggie style for the same reasons.
  • by stackdump ( 553408 ) on Monday December 09, 2002 @03:19PM (#4845568)
    One of my professors at Midwestern State University [mwsu.edu] (Texas) has done this before [mwsu.edu], although it seems you may have improved on his solution (I couldn't find a copy of his paper).

    "Faster Shellsort Sequences: A Genetic Algorithm Application", Proceedings of the International Society for Computers and Their Applications(ISCA), April 7-9, 1999. (With Shashidhar Yachavaram) Presented at the conference, Cancun Mexico.
    • The only GA-originated increments for Shellsort that I could find online was the one in Crew fall 2001 report [allegheny.edu], but that sequence didn't perform very well, & they didn't document their technique except to say that it wasn't a full-blown GA. (They placed some limitations on it. They also mention that Simpson & Yachavaram [allegheny.edu] used only a limited GA.)

      I wondered if a GA that fully manipulated bit-strings could produce a sequence as good as Sedgewick's, & I also wanted to document the technique so someone else interested in GAs could implement their own or at least be a little entertained with how it was done.

  • In Melanie Mitchell's wonderful book, "An Introduction to Genetic Algorithms", pages 21-27, the author describes using genetic algorithms to find optimal patterns for "sorting networks". They are used in sorting chips (hardware) that sort in chunks of, say, 16 or 64 units. The order of comparisons is important. Using genetic algorithms, they were able to come very close to the best sequences found by humans. (I don't know if hardware sorting is still used. I had the impression that it is a mainframe thing.)

    Thus, optimizing sorting algorithms with GA is not new in itself.
    • At my university (Umeå, Sweden) we had exactly this as an assignment in the AI course, using genes that were interpreted into a sorting net, then split, paired and mutated a little bit at a time. It was great fun seeing it start with a giant random network, and then seeing it get shorter and shorter.

      Even though one of my first versions only used half of the genes due to a bug, it still managed to create a network that was just one step larger than the theoretical minimum.

      After Ctrl-C'ing it I implemented a way to actually print and save the network, after which it never got better than three steps worse than the minimum. Damn.
  • The sequence "run 8" in figure 2 in section 9 does not seem to be a valid sequence, i.e. it is not in sorted order and the number 13 appears twice. Is this a typo?
  • From a GA fan, and seeing all the articles on distributed computing...

    You may also be interested in Tierra [atr.co.jp]. Open-ended networked evolution - it's pretty nifty.

    Got to love watching the computer solve problems for you. Ahh...

  • by Anonymous Coward
    Are you sure the "improvement" is not an artifact of the particular random inputs used? This could easily explain such a small (5%) improvement.

    Also, randomly ordered inputs are not common in practical applications. Did you test (or consider testing) partially sorted and partially reverse-sorted numbers? Presumably the contrived sequences (such as Sedgewick's) are balanced to provide good results for both random and non-random data.
  • The comb sort is a variation of the bubble sort algorithm that achieves N log(N) execution times, similar to heapsort. The algorithm is simple and memory efficient, compared to Quicksort.

    http://cs.clackamas.cc.or.us/molatore/cs260Spr01/combsort.htm

    http://world.std.com/~jdveale/combsort.htm
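
    For reference, a small Python sketch of the comb sort being described (bubble sort with a gap that shrinks by the usual factor of about 1.3 each pass; illustrative only):

        def comb_sort(a):
            gap, swapped = len(a), True
            while gap > 1 or swapped:
                gap = max(1, int(gap / 1.3))       # shrink the gap each pass
                swapped = False
                for i in range(len(a) - gap):
                    if a[i] > a[i + gap]:
                        a[i], a[i + gap] = a[i + gap], a[i]
                        swapped = True
            return a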
  • Fitting... (Score:3, Funny)

    by wcbarksdale ( 621327 ) on Monday December 09, 2002 @07:20PM (#4848412)
    A paper about genetic algorithms, written by a guy named Gene.
  • by acomj ( 20611 ) on Monday December 09, 2002 @11:27PM (#4850670) Homepage
    Genetic algorithms are interesting in that they can become more efficient and modify themselves. I don't mean to sound like a Luddite, but is this a good way to develop algorithms?

    Is the efficiency gained worth the difficulty of proving that the new genetically enhanced algorithm works?

    It's hard enough with rigorous testing to prove that simple programs work without fail, without introducing some random "genetic" changes that may break the algorithm for some input cases.

    One of the problems with AI and self-modifying algorithms is that as they learn/evolve you don't get the same output from the same input... I guess that's the point, but I don't want my programs running like that.

    • In this case, the algorithm is guaranteed to work with a wide range of parameters governing its precise functioning. The genetic algorithm is only being used to search the parameter space for the most efficient parameters. Since the algorithm will work no matter which parameters you pick, this is a perfectly fine way to use genetic algorithms.

      Also, genetic algorithms aren't self-modifying.

    • I think you fundamentally misunderstand what is being done here. Shell Sort works, period. His project does not modify the algorithm in any way. There is no chance of him introducing an error.

      Shell Sort works by roughly sorting, then sorting a little finer, and so on, until you finely sort every piece of data, and hopefully at this final and potentially most time-consuming step, there isn't a lot of work to be done.

      The thing is, there's no platonic, God-like, perfect, mathematically-proven best choice of how fine or rough the early iterations should be. No matter what increments you use (within certain constraints), Shell Sort will still work every single time (the last step at increment 1 guarantees it) -- it's just a matter of performance. Nobody knows what increments work best, and you can't just mathematically figure it out.

      ..Which is exactly what makes using a genetic algorithm such a good idea, for trying to come up with some values. GAs are great for optimization problems that defy analysis, as long as you can represent possible solutions as genotype strings and you can write a fitness function to evaluate them. That's what this guy did, and IMHO it is really cool.
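
      This is not the article's code, but a minimal Python sketch of the shape of such a GA: the sort itself stays fixed, candidates are just gap lists (kept valid by always ending in 1), and fitness is the comparison count on random arrays. The mutation and selection scheme here is deliberately crude.

          import random

          def shellsort_cost(gaps, data):
              a, cost = list(data), 0
              for gap in gaps:                      # gaps assumed descending, ending in 1
                  for i in range(gap, len(a)):
                      v, j = a[i], i
                      while j >= gap:
                          cost += 1                 # count one comparison
                          if a[j - gap] <= v:
                              break
                          a[j] = a[j - gap]
                          j -= gap
                      a[j] = v
              return cost

          def fitness(gaps, trials=10, n=1000):
              tests = [random.sample(range(n), n) for _ in range(trials)]
              return sum(shellsort_cost(gaps, t) for t in tests)

          def mutate(gaps):
              g = sorted({max(1, x + random.randint(-3, 3)) for x in gaps}, reverse=True)
              return g if g[-1] == 1 else g + [1]   # a valid organism always ends in 1

          def evolve(pop, generations=50):
              for _ in range(generations):
                  pop.sort(key=fitness)             # fewer comparisons is better
                  pop = pop[: len(pop) // 2]        # keep the better half...
                  pop += [mutate(random.choice(pop)) for _ in range(len(pop))]  # ...and refill
              return min(pop, key=fitness)

          # e.g. evolve([mutate([2048, 512, 128, 32, 8, 1]) for _ in range(16)])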

        I was extrapolating to the "AI future" of much more complex algorithms being evaluated (evolved) in more fundamental ways than just changing parameters.

        My frame of mind has changed. Right now I'm working on very time-constrained real-time stuff, and when I have to do it in X seconds I like to know what the worst case is, not the best average case.

        The genetic algorithm sounds like an iterative test to try to find the optimum increments for shellsort. Really nothing revolutionary.

        I couldn't tell from the test description, but it doesn't seem that different arrays to sort were used to evolve the parameters of the sort (the increment list). If this is the case, the solution is just a sort optimized for a particular dataset and of little use.

        I would be interested to see how well the algorithm would work evolved against different random data sets (almost in order, always reverse order, random).

        And then compared with other standard increments.

        They could have a competition like the ones held to come up with good prisoner's dilemma strategies, but instead come up with the best shellsort increments...

"May your future be limited only by your dreams." -- Christa McAuliffe

Working...