Power-Programmierung

home *** CD-ROM | disk | FTP | other *** search

/ Power-Programmierung / CD2.mdf / c / library / xplatfrm / tierra / tierra1.tex < prev next >

Wrap

Text File | 1992-04-26 | 55.1 KB | 1,031 lines

Please let me know if you receive this. The enclosed manuscript is a draft, which contains more than any published paper. Papers that have been published or are in press are: Ray, T. S. 1991. ``Is it alive, or is it GA?'' Proceedings of the 1991 International Conference on Genetic Algorithms, Eds. Belew, R. K., and L. B. Booker, San Mateo, CA: Morgan Kaufmann, 527--534. Ray, T. S. 1991. ``An approach to the synthesis of life.'' Artificial Life II, Santa Fe Institute Studies in the Sciences of Complexity, vol. XI, Eds. C. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen, Redwood City, CA: Addison-Wesley, 371--408. Ray, T. S. 1991. ``Population dynamics of digital organisms.'' Artificial Life II Video Proceedings, Ed. C. G. Langton, Redwood City, CA: Addison Wesley. Ray, T. S. 1991. ``Evolution and optimization of digital organisms.'' Scientific Excellence in Supercomputing: The IBM 1990 Contest Prize Papers, Eds. Keith R. Billingsley, Ed Derohanes, Hilton Brown, III. Athens, GA, 30602, The Baldwin Press, The University of Georgia. Publication date: December 1991. I don't have a paper in it, but you might be interested in the following book: Langton, Christopher G. [ed.]. 1989. Artificial life: proceedings of an interdisciplinary workshop on the synthesis and simulation of living systems. Vol. VI in the series: Santa Fe Institute studies in the sciences of complexity. Addison-Wesley. I will be at the Santa Fe Institute Feb. 1 thru Aug. 31, 1992. This work will also be presented in the following upcoming seminars: National Science Foundation, Washington, September 20, 1991 University of Rochester, Biology, October 18, 1991 University of Illinois, Complex Systems Colloquia, October 25, 1991 University of Maryland, Zoology, October 29, 1991 University of Kentucky, Lexington, Biology, October 31, 1991 University of Delaware, Entomology, November 5, 1991 Stony Brook, Department of Ecology and Evolution, November 6, 1991 Drexel University, Electrical Engineering, November 8, 1991 The University of the Arts, Philadelphia, Design in Cyberspace lectures, November 12, 1991 IBM, T. J. Watson Research Center, Yorktown Heights, NY, November 13, 1991 Thinking Machines Corp., Cambridge, November 14, 1991 Digital Equipment Corp., Hudson, MA, November 15, 1991 American Society of Information Science, New Jersey, November 19, 1991 Texas Instruments, Dallas, November 21, 1991 Harvard University, Biology (Lewontin's lab), December 2, 1991 Boston University, Computational Sciences Center, December 3, 1991 MIT Nanotechnology Study Group, December 3, 1991 University of Massachusetts Boston, Biology, December 5, 1991 Yale University, Biology, December 6, 1991 University of Arizona, Ecology & Evolutionary Biology, March 10, 1992 Cornell University, Mathematical Sciences Institute, CA Workshop, May 1992 Gordon Conference on Theoretical Biology, New Hampshire, June 8-12, 1992 A DOS version of the Tierra software with a decent front end will be ready for distribution by December. The source code is available by ftp at: tierra.slhs.udel.edu or life.slhs.udel.edu in a directory named tierra The source code in the ftp site will compile and run on DOS (tested with Turbo C) but does not have a nice front end). There is an Artificial Life mailing list. It is distributed as a digest to avoid chaos. You may subscribe to the list by sending a message to: alife-request@cognet.ucla.edu and you may post to the list by sending a message to: alife@cognet.ucla.edu Tom Ray University of Delaware School of Life & Health Sciences Newark, Delaware 19716 ray@tierra.slhs.udel.edu ray@life.slhs.udel.edu ray@brahms.udel.edu 302-451-2281 (FAX) 302-451-2753 The figures for this paper are in available on request in Generic CADD batch file format. This paper is being sent in two parts. You will have to cut the headers of the two parts at the scissors marks, then cat them together. O / ================================= x ------------------------------------- o \ \documentstyle[12pt]{article} \flushbottom \textheight 9in \textwidth 6.5in \textfloatsep 30pt plus 3pt minus 6pt \parskip 7.5pt plus 1pt minus 1pt \oddsidemargin 0in \evensidemargin 0in \topmargin 0in \headheight 0in \headsep 0in % Hanging Paragraph \def\XP{\par\begingroup\parindent 0in\everypar{\hangindent .3in}} \def\eXP{\par\endgroup} % Left Justified Paragraph \def\LP{\par\begingroup\parindent 0in\everypar{\hangindent 0in}} \def\eLP{\par\endgroup} \begin{document} \thispagestyle{empty} \LP \bf Thomas S. Ray\rm \\ School of Life \& Health Sciences, University of Delaware, Newark, Delaware 19716,\\ ray@brahms.udel.edu\\ \rule[6pt]{6.5in}{1pt} \Large \bf Evolution and Optimization of Digital Organisms\rm \normalsize\\ \rule[6pt]{6.5in}{2pt} \eLP Digital organisms have been synthesized based on a computer metaphor of organic life in which CPU time is the ``energy'' resource and memory is the ``material'' resource. Memory is organized into informational ``genetic'' patterns that exploit CPU time for self-replication. Mutation generates new forms, and evolution proceeds by natural selection as different ``genotypes'' compete for CPU time and memory space. In addition, new genotypes appear which exploit other ``creatures'' for informational or energetic resources. The digital organisms are self-replicating computer programs, however, they can not escape because they run exclusively on a virtual computer in its unique machine language. From a single ancestral ``creature'' there have evolved tens of thousands of self-replicating genotypes of hundreds of genome size classes. Parasites evolved, then creatures that were immune to parasites, and then parasites that could circumvent the immunity. Hyper-parasites evolved which subvert parasites to their own reproduction and drive them to extinction. The resulting genetically uniform communities evolve sociality in the sense of creatures that can only reproduce in cooperative aggregations, and these aggregations are then invaded by cheating hyper-hyper-parasites. Diverse ecological communities have emerged. These digital communities have been used to experimentally study ecological and evolutionary processes: e.g., competitive exclusion and coexistance, symbiosis, host/parasite density dependent population regulation, the effect of parasites in enhancing community diversity, evolutionary arms races, punctuated equilibrium, and the role of chance and historical factors in evolution. It is possible to extract information on any aspect of the system without disturbing it, from phylogeny or community structure through time to the ``genetic makeup'' and ``metabolic processes'' of individuals. Digital life demonstrates the power of the computational approach to science as a complement to the traditional approaches of experiment, and theory based on analysis through calculus and differential equations. Optimization experiments have shown that freely evolving digital organisms can optimize their algorithms by a factor of 5.75 in a few hours of real time. In addition, evolution discovered the optimization technique of ``unrolling the loop''. Evolution may provide a new method for the optimization or generation of application programs. This method may prove particularly useful for programming massively parallel machines. \LP \rule[6pt]{6.5in}{1pt} evolution, ecology, artificial life, synthetic life, emergence, self-replication, diversity, adaptation, coevolution, optimization \eLP \newpage % \baselineskip = 20pt \LP \bf Thomas S. Ray\rm \\ School of Life \& Health Sciences, University of Delaware, Newark, Delaware 19716,\\ ray@brahms.udel.edu\\ \rule[6pt]{6.5in}{1pt} \Large \bf Evolution and Optimization of Digital Organisms\rm \normalsize\\ \rule[6pt]{6.5in}{2pt} \vspace{6cm} \begin{quote} Marcel, a mechanical chessplayer... his exquisite 19th-century brainwork --- the human art it took to build which has been flat lost, lost as the dodo bird ... But where inside Marcel is the midget Grandmaster, the little Johann Allgeier? where's the pantograph, and the magnets? Nowhere. Marcel really is a mechanical chessplayer. No fakery inside to give him any touch of humanity at all.\\ \hspace*{2in}--- Thomas Pynchon, \it Gravity's Rainbow\rm . \end{quote} \large \bf 1 INTRODUCTION\rm \normalsize \eLP Ideally, the science of biology should embrace all forms of life. However in practice, it has been restricted to the study of a single instance of life, life on earth. Life on earth is very diverse, but it is presumably all part of a single phylogeny. Because biology is based on a sample size of one, we can not know what features of life are peculiar to earth, and what features are general, characteristic of all life. A truly comparative natural biology would require inter-planetary travel, which is light years away. The ideal experimental evolutionary biology would involve creation of multiple planetary systems, some essentially identical, others varying by a parameter of interest, and observing them for billions of years. A practical alternative to an inter-planetary or mythical biology is to create synthetic life in a computer. The objective is not necessarily to create life forms that would serve as models for the study of natural life, but rather to create radically different life forms, based on a completely different physics and chemistry, and let these life forms evolve their own phylogeny, leading to whatever forms are natural to their unique physical basis. These truly independent instances of life may then serve as a basis for comparison, to gain some insight into what is general and what is peculiar in biology. Those aspects of life that prove to be general enough to occur in both natural and synthetic systems can then be studied more easily in the synthetic system. ``Evolution in a bottle'' provides a valuable tool for the experimental study of evolution and ecology. The intent of this work is to synthesize rather than simulate life. This approach starts with hand crafted organisms already capable of replication and open-ended evolution, and aims to generate increasing diversity and complexity in a parallel to the Cambrian explosion. To state such a goal leads to semantic problems, because life must be defined in a way that does not restrict it to carbon based forms. It is unlikely that there could be general agreement on such a definition, or even on the proposition that life need not be carbon based. Therefore, I will simply state my conception of life in its most general sense. I would consider a system to be living if it is self-replicating, and capable of open-ended evolution. Synthetic life should self-replicate, and evolve structures or processes that were not designed-in or pre-conceived by the creator (Pattee [30]; Cariani [5]). Core Wars programs, computer viruses, and worms (Cohen [6]; Dewdney [10, 11, 13, 14]; Denning [9]; Rheingold [32]; Spafford et al. [33]) are capable of self-replication, but fortunately, not evolution. It is unlikely that such programs will ever become fully living, because they are not likely to be able to evolve. Most evolutionary simulations are not open-ended. Their potential is limited by the structure of the model, which generally endows each individual with a genome consisting of a set of pre-defined genes, each of which may exist in a pre-defined set of allelic forms (Holland [20]; Dewdney [12]; Dawkins [7, 8]; Packard [29]; Ackley \& Littman [1]). The object being evolved is generally a data structure representing the genome, which the simulator program mutates and/or recombines, selects, and replicates according to criteria designed into the simulator. The data structures do not contain the mechanism for replication, they are simply copied by the simulator if they survive the selection phase. Self-replication is critical to synthetic life because without it, the mechanisms of selection must also be pre-determined by the simulator. Such artificial selection can never be as creative as natural selection. The organisms are not free to invent their own fitness functions. Freely evolving creatures will discover means of mutual exploitation and associated implicit fitness functions that we would never think of. Simulations constrained to evolve with pre-defined genes, alleles and fitness functions are dead ended, not alive. The approach presented here does not have such constraints. Although the model is limited to the evolution of creatures based on sequences of machine instructions, this may have a potential comparable to evolution based on sequences of organic molecules. Sets of machine instructions similar to those used in the Tierra Simulator have been shown to be capable of ``universal computation'' (Aho et al. [2]; Minsky [26]; Langton [24]). This suggests that evolving machine codes should be able to generate any level of complexity. Other examples of the synthetic approach to life can be seen in the work of Holland [21], Farmer et al.\ [16], Langton [22], Rasmussen et al.\ [31], and Bagley et al.\ [3]. A characteristic these efforts generally have in common is that they parallel the origin of life event by attempting to create prebiotic conditions from which life may emerge spontaneously and evolve in an open ended fashion. While the origin of life is generally recognized as an event of the first order, there is another event in the history of life that is less well known but of comparable significance: the origin of biological diversity and macroscopic multicellular life during the Cambrian explosion 600 million years ago. This event involved a riotous diversification of life forms. Dozens of phyla appeared suddenly, many existing only fleetingly, as diverse and sometimes bizarre ways of life were explored in a relative ecological void (Gould [18]; Morris [27]). The work presented here aims to parallel the second major event in the history of life, the origin of diversity. Rather than attempting to create prebiotic conditions from which life may emerge, this approach involves engineering over the early history of life to design complex evolvable organisms, and then attempting to create the conditions that will set off a spontaneous evolutionary process of increasing diversity and complexity of organisms. This work represents a first step in this direction, creating an artificial world which may roughly parallel the RNA world of self-replicating molecules (still falling far short of the Cambrian explosion). The approach has generated rapidly diversifying communities of self-replicating organisms exhibiting open-ended evolution by natural selection. From a single rudimentary ancestral creature containing only the code for self-replication, interactions such as parasitism, immunity, hyper-parasitism, sociality and cheating have emerged spontaneously. This paper presents a methodology and some first results. Apart from its value as a tool for the study or teaching of ecology and evolution, synthetic life may have commercial applications. Evolution of machine code provides a new approach to the design and optimization of computer programs. In an analogy to genetic engineering, pieces of application code may be inserted into the genomes of digital organisms, and then evolved to new functionality or greater efficiency. \LP \rule[6pt]{6.5in}{1pt} \begin{quote} Here was a world of simplicity and certainty... a world based on the one and zero of life and death. Minimal, beautiful. The patterns of lives and deaths.... weightless, invisible chains of electronic presence or absence. If patterns of ones and zeros were ``like'' patterns of human lives and deaths, if everything about an individual could be represented in a computer record by a long string of ones and zeros, then what kind of creature would be represented by a long string of lives and deaths? It would have to be up one level at least --- an angel, a minor god, something in a UFO.\\ \hspace*{2in} --- Thomas Pynchon, \it Vineland\rm . \end{quote} \large \bf 2 METHODS\rm \normalsize \bf 2.1 THE METAPHOR\rm \eLP Organic life is viewed as utilizing energy, mostly derived from the sun, to organize matter. By analogy, digital life can be viewed as using CPU (central processing unit) time, to organize memory. Organic life evolves through natural selection as individuals compete for resources (light, food, space, etc.) such that genotypes which leave the most descendants increase in frequency. Digital life evolves through the same process, as replicating algorithms compete for CPU time and memory space, and organisms evolve strategies to exploit one another. CPU time is thought of as the analog of the energy resource, and memory as the analog of the spatial resource. The memory, the CPU and the computer's operating system are viewed as elements of the ``abiotic'' (physical) environment. A ``creature'' is then designed to be specifically adapted to the features of the computational environment. The creature consists of a self-replicating assembler language program. Assembler languages are merely mnemonics for the machine codes that are directly executed by the CPU. These machine codes have the characteristic that they directly invoke the instruction set of the CPU and services provided by the operating system. All programs, regardless of the language they are written in, are converted into machine code before they are executed. Machine code is the natural language of the machine, and machine instructions are viewed by this author as the ``atomic units'' of computing. It is felt that machine instructions provide the most natural basis for an artificial chemistry of creatures designed to live in the computer. In the biological analogy, the machine instructions are considered to be more like the amino acids than the nucleic acids, because they are ``chemically active''. They actively manipulate bits, bytes, CPU registers, and the movements of the instruction pointer (see below). The digital creatures discussed here are entirely constructed of machine instructions. They are considered analogous to creatures of the RNA world, because the same structures bear the ``genetic'' information and carry out the ``metabolic'' activity. A block of RAM memory (random access memory, also known as ``main'' or ``core'' memory) in the computer is designated as a ``soup'' which can be inoculated with creatures. The ``genome'' of the creatures consists of the sequence of machine instructions that make up the creature's self-replicating algorithm. The prototype creature consists of 80 machine instructions, thus the size of the genome of this creature is 80 instructions, and its ``genotype'' is the specific sequence of those 80 instructions (Appendix C). \LP \bf 2.2 THE VIRTUAL COMPUTER --- TIERRA SIMULATOR\rm \eLP The computers we use are general purpose computers, which means, among other things, that they are capable of emulating through software, the behavior of any other computer that ever has been built or that could be built (Aho et al. [2]; Minsky [26]; Langton [24]). We can utilize this flexibility to design a computer that would be especially hospitable to synthetic life. There are several good reasons why it is not wise to attempt to synthesize digital organisms that exploit the machine codes and operating systems of real computers. The most urgent is the potential threat of natural evolution of machine codes leading to virus or worm type of programs that could be difficult to eradicate due to their changing ``genotypes''. This potential argues strongly for creating evolution exclusively in programs that run only on virtual computers and their virtual operating systems. Such programs would be nothing more than data on a real computer, and therefore would present no more threat than the data in a data base or the text file of a word processor. Another reason to avoid developing digital organisms in the machine code of a real computer is that the artificial system would be tied to the hardware and would become obsolete as quickly as the particular machine it was developed on. In contrast, an artificial system developed on a virtual machine could be easily ported to new real machines as they become available. A third issue, which potentially makes the first two moot, is that the machine languages of real machines are not designed to be evolvable, and in fact might not support significant evolution. Von Neuman type machine languages are considered to be ``brittle'', meaning that the ratio of viable programs to possible programs is virtually zero. Any mutation or recombination event in a real machine code is almost certain to produce a non-functional program. The problem of brittleness can be mitigated by designing a virtual computer whose machine code is designed with evolution in mind. Farmer \& Belin [17] have suggested that overcoming this brittleness and ``Discovering how to make such self-replicating patterns more robust so that they evolve to increasingly more complex states is probably the central problem in the study of artificial life.'' The work described here takes place on a virtual computer known as Tierra (Spanish for Earth). Tierra is a parallel computer of the MIMD (multiple instruction, multiple data) type, with a processor (CPU) for each creature. Parallelism is imperfectly emulated by allowing each CPU to execute a small time slice in turn. Each CPU of this virtual computer contains two address registers, two numeric registers, a flags register to indicate error conditions, a stack pointer, a ten word stack, and an instruction pointer. Each virtual CPU is implemented via the C structure listed in Appendix A. Computations performed by the Tierran CPUs are probabilistic due to flaws that occur at a low frequency (see Mutation below). The instruction set of a CPU typically performs simple arithmetic operations or bit manipulations, within the small set of registers contained in the CPU. Some instructions move data between the registers in the CPU, or between the CPU registers and the RAM (main) memory. Other instructions control the location and movement of an ``instruction pointer'' (IP). The IP indicates an address in RAM, where the machine code of the executing program (in this case a digital organism) is located. The CPU perpetually performs a fetch-decode-execute-increment\_IP cycle: The machine code instruction currently addressed by the IP is fetched into the CPU, its bit pattern is decoded to determine which instruction it corresponds to, and the instruction is executed. Then the IP is incremented to point sequentially to the next position in RAM, from which the next instruction will be fetched. However, some instructions like JMP, CALL and RET directly manipulate the IP, causing execution to jump to some other sequence of instructions in the RAM. In the Tierra Simulator this CPU cycle is implemented through the time\_slice routine listed in Appendix B. \LP \bf 2.3 THE TIERRAN LANGUAGE\rm \eLP Before attempting to set up a synthetic life system, careful thought must be given to how the representation of a programming language affects its adaptability in the sense of being robust to genetic operations such as mutation and recombination. The nature of the virtual computer is defined in large part by the instruction set of its machine language. The approach in this study has been to loosen up the machine code in a ``virtual bio-computer'', in order to create a computational system based on a hybrid between biological and classical von Neumann processes. In developing this new virtual language, which is called ``Tierran'', close attention has been paid to the structural and functional properties of the informational system of biological molecules: DNA, RNA and proteins. Two features have been borrowed from the biological world which are considered to be critical to the evolvability of the Tierran language. First, the instruction set of the Tierran language has been defined to be of a size that is the same order of magnitude as the genetic code. Information is encoded into DNA through 64 codons, which are translated into 20 amino acids. In its present manifestation, the Tierran language consists of 32 instructions, which can be represented by five bits, \it operands included\rm. Emphasis is placed on this last point because some instruction sets are deceptively small. Some versions of the redcode language of Core Wars (Dewdney [10, 13]; Rasmussen et al. [31]) for example are defined to have ten operation codes. It might appear on the surface then that the instruction set is of size ten. However, most of the ten instructions have one or two operands. Each operand has four addressing modes, and then an integer. When we consider that these operands are embedded into the machine code, we realize that they are in fact a part of the instruction set, and this set works out to be about $10^{11}$ in size. Similarly, RISC machines may have only a few opcodes, but they probably all use 32 bit instructions, so from a mutational point of view, they really have $2^{32}$ instructions. Inclusion of numeric operands will make any instruction set extremely large in comparison to the genetic code. In order to make a machine code with a truly small instruction set, we must eliminate numeric operands. This can be accomplished by allowing the CPU registers and the stack to be the only operands of the instructions. When we need to encode an integer for some purpose, we can create it in a numeric register through bit manipulations: flipping the low order bit and shifting left. The program can contain the proper sequence of bit flipping and shifting instructions to synthesize the desired number, and the instruction set need not include all possible integers. A second feature that has been borrowed from molecular biology in the design of the Tierran language is the addressing mode, which is called ``address by template''. In most machine codes, when a piece of data is addressed, or the IP jumps to another piece of code, the exact numeric address of the data or target code is specified in the machine code. Consider that in the biological system by contrast, in order for protein molecule A in the cytoplasm of a cell to interact with protein molecule B, it does not specify the exact coordinates where B is located. Instead, molecule A presents a template on its surface which is complementary to some surface on B. Diffusion brings the two together, and the complementary conformations allow them to interact. Addressing by template is illustrated by the Tierran JMP (jump) instruction. Each JMP instruction is followed by a sequence of NOP (no-operation) instructions, of which there are two kinds: NOP\_0 and NOP\_1. Suppose we have a piece of code with five instruction in the following order: JMP NOP\_0 NOP\_0 NOP\_0 NOP\_1. The system will search outward in both directions from the JMP instruction looking for the nearest occurrence of the complementary pattern: NOP\_1 NOP\_1 NOP\_1 NOP\_0. If the pattern is found, the instruction pointer will move to the end of the complementary pattern and resume execution. If the pattern is not found, an error condition (flag) will be set and the JMP instruction will be ignored (in practice, a limit is placed on how far the system may search for the pattern). The Tierran language is characterized by two unique features: a truly small instruction set without numeric operands, and addressing by template. Otherwise, the language consists of familiar instructions typical of most machine languages, e.g., MOV, CALL, RET, POP, PUSH etc. The complete instruction set is listed in Appendix B. \LP \bf 2.4 THE TIERRAN OPERATING SYSTEM\rm \eLP The Tierran virtual computer needs a virtual operating system that will be hospitable to digital organisms. The operating system will determine the mechanisms of interprocess communication, memory allocation, and the allocation of CPU time among competing processes. Algorithms will evolve so as to exploit these features to their advantage. More than being a mere aspect of the environment, the operating system together with the instruction set will determine the topology of possible interactions between individuals, such as the ability of pairs of individuals to exhibit predator-prey, parasite-host or mutualistic relationships. \LP \bf 2.4.1 Memory Allocation --- Cellularity\rm \eLP The Tierran computer operates on a block of RAM of the real computer which is set aside for the purpose. This block of RAM is referred to as the ``soup''. In most of the work described here the soup consisted of about 60,000 bytes, which can hold the same number of Tierran machine instructions. Each ``creature'' occupies some block of memory in this soup. Cellularity is one of the fundamental properties of organic life, and can be recognized in the fossil record as far back as 3.6 billion years (Barbieri [4]). The cell is the original individual, with the cell membrane defining its limits and preserving its chemical integrity. An analog to the cell membrane is needed in digital organisms in order to preserve the integrity of the informational structure from being disrupted easily by the activity of other organisms. The need for this can be seen in Artificial Life models such as cellular automata where virtual state machines pass through one another (Langton [22, 23]), or in core wars type simulations where coherent structures demolish one another when they come into contact (Dewdney [10, 13]; Rasmussen et al. [31]). Tierran creatures are considered to be cellular in the sense that they are protected by a ``semi-permeable membrane'' of memory allocation. The Tierran operating system provides memory allocation services. Each creature has exclusive write privileges within its allocated block of memory. The ``size'' of a creature is just the size of its allocated block (e.g., 80 instructions). This usually corresponds to the size of the genome. This ``membrane'' is described as ``semi-permeable'' because while write privileges are protected, read and execute privileges are not. A creature may examine the code of another creature, and even execute it, but it can not write over it. Each creature may have exclusive write privileges in at most two blocks of memory: the one that it is born with which is referred to as the ``mother cell'', and a second block which it may obtain through the execution of the MAL (memory allocation) instruction. The second block, referred to as the ``daughter cell'', may be used to grow or reproduce into. When Tierran creatures ``divide'', the mother cell loses write privileges on the space of the daughter cell, but is then free to allocate another block of memory. At the moment of division, the daughter cell is given its own instruction pointer, and is free to allocate its own second block of memory. \LP \bf 2.4.2 Time Sharing --- The Slicer\rm \eLP The Tierran operating system must be multi-tasking (or parallel) in order for a community of individual creatures to live in the soup simultaneously. The system doles out small slices of CPU time to each creature in the soup in turn. The system maintains a circular queue called the ``slicer queue''. As each creature is born, a virtual CPU is created for it, and it enters the slicer queue just ahead of its mother, which is the active creature at that time. Thus the newborn will be the last creature in the soup to get another time slice after the mother, and the mother will get the next slice after its daughter. As long as the slice size is small relative to the generation time of the creatures, the time sharing system causes the world to approximate parallelism. In actuality, we have a population of virtual CPUs, each of which gets a slice of the real CPU's time as it comes up in the queue. The number of instructions to be executed in each time slice may be set proportional to the size of the genome of the creature being executed, raised to a power. If the ``slicer power'' is equal to one, then the slicer is size neutral, the probability of an instruction being executed does not depend on the size of the creature in which it occurs. If the power is greater than one, large creatures get more CPU cycles per instruction than small creatures. If the power is less than one, small creatures get more CPU cycles per instruction. The power determines if selection favors large or small creatures, or is size neutral. A constant slice size selects for small creatures. \LP \bf 2.4.3 Mortality --- The Reaper\rm \eLP Self-replicating creatures in a fixed size soup would rapidly fill the soup and lock up the system. To prevent this from occurring, it is necessary to include mortality. The Tierran operating system includes a ``reaper'' which begins ``killing'' creatures from a queue when the memory fills to some specified level (e.g., 80\%). Creatures are killed by deallocating their memory, and removing them from both the reaper and slicer queues. Their ``dead'' code is not removed from the soup. In the present system, the reaper uses a linear queue. When a creature is born it enters the bottom of the queue. The reaper always kills the creature at the top of the queue. However, individuals may move up or down in the reaper queue according to their success or failure at executing certain instructions. When a creature executes an instruction that generates an error condition, it moves one position up the queue, as long as the individual ahead of it in the queue has not accumulated a greater number of errors. Two of the instructions are somewhat difficult to execute without generating an error, therefore successful execution of these instructions moves the creature down the reaper queue one position, as long as it has not accumulated more errors than the creature below it. The effect of the reaper queue is to cause algorithms which are fundamentally flawed to rise to the top of the queue and die. Vigorous algorithms have a greater longevity, but in general, the probability of death increases with age. \LP \bf 2.4.4 Mutation\rm \eLP In order for evolution to occur, there must be some change in the genome of the creatures. This may occur within the lifespan of an individual, or there may be errors in passing along the genome to offspring. In order to insure that there is genetic change, the operating system randomly flips bits in the soup, and the instructions of the Tierran language are imperfectly executed. Mutations occur in two circumstances. At some background rate, bits are randomly selected from the entire soup (e.g., 60,000 instructions totaling 300,000 bits) and flipped. This is analogous to mutations caused by cosmic rays, and has the effect of preventing any creature from being immortal, as it will eventually mutate to death. The background mutation rate has generally been set at about one bit flipped for every 10,000 Tierran instructions executed by the system. In addition, while copying instructions during the replication of creatures, bits are randomly flipped at some rate in the copies. The copy mutation rate is the higher of the two, and results in replication errors. The copy mutation rate has generally been set at about one bit flipped for every 1,000 to 2,500 instructions moved. In both classes of mutation, the interval between mutations varies randomly within a certain range to avoid possible periodic effects. In addition to mutations, the execution of Tierran instructions is flawed at a low rate. For most of the 32 instructions, the result is off by plus or minus one at some low frequency. For example, the increment instruction normally adds one to its register, but it sometimes adds two or zero. The bit flipping instruction normally flips the low order bit, but it sometimes flips the next higher bit or no bit. The shift left instruction normally shifts all bits one bit to the left, but it sometimes shifts left by two bits, or not at all. In this way, the behavior of the Tierran instructions is probabilistic, not fully deterministic. It turns out that bit flipping mutations and flaws in instructions are not necessary to generate genetic change and evolution, once the community reaches a certain state of complexity. Genetic parasites evolve which are sloppy replicators, and have the effect of moving pieces of code around between creatures, causing rather massive rearrangements of the genomes. The mechanism of this ad hoc sexuality has not been worked out, but is likely due to the parasites' inability to discriminate between live, dead or embryonic code. Mutations result in the appearance of new genotypes, which are watched by an automated genebank manager. In one implementation of the manager, when new genotypes replicate twice, producing a genetically identical offspring at least once, they are given a unique name and saved to disk. Each genotype name contains two parts, a number and a three letter code. The number represents the number of instructions in the genome. The three letter code is used as a base 26 numbering system for assigning a unique label to each genotype in a size class. The first genotype to appear in a size class is assigned the label aaa, the second is assigned the label aab, and so on. Thus the ancestor is named 80aaa, and the first mutant of size 80 is named 80aab. The first creature of size 45 is named 45aaa. The genebanker saves some additional information with each genome: the genotype name of its immediate ancestor which makes possible the reconstruction of the entire phylogeny; the time and date of origin; ``metabolic'' data including the number of instructions executed in the first and second reproduction, the number of errors generated in the first and second reproduction, and the number of instructions copied into the daughter cell in the first and second reproductions (see Appendix C, D); some environmental parameters at the time of origin including the search limit for addressing, and the slicer power, both of which affect selection for size. \LP \bf 2.5 THE TIERRAN ANCESTOR\rm \eLP I have used the Tierran language to write a single self-replicating program which is 80 instructions long. This program is referred to as the ``ancestor'', or alternatively as genotype 0080aaa (Fig.\ 1). The ancestor is a minimal self-replicating algorithm which was originally written for use during the debugging of the simulator. No functionality was designed into the ancestor beyond the ability to self-replicate, nor was any specific evolutionary potential designed in. The commented Tierran assembler and machine code for this program is presented in Appendix C. The ancestor examines itself to determine where in memory it begins and ends. The ancestor's beginning is marked with the four no-operation template: 1 1 1 1, and its ending is marked with 1 1 1 0. The ancestor locates its beginning with the five instructions: ADRB, NOP\_0, NOP\_0, NOP\_0, NOP\_0. This series of instructions causes the system to search backwards from the ADRB instruction for a template complementary to the four NOP\_0 instructions, and to place the address of the complementary template (the beginning) in the ax register of the CPU (see Appendix A). A similar method is used to locate the end. Having determined the address of its beginning and its end, it subtracts the two to calculate its size, and allocates a block of memory of this size for a daughter cell. It then calls the copy procedure which copies the entire genome into the daughter cell memory, one instruction at a time. The beginning of the copy procedure is marked by the four no-operation template: 1 1 0 0. Therefore the call to the copy procedure is accomplished with the five instructions: CALL, NOP\_0, NOP\_0, NOP\_1, NOP\_1. When the genome has been copied, it executes the DIVIDE instruction, which causes the creature to lose write privileges on the daughter cell memory, and gives an instruction pointer to the daughter cell (it also enters the daughter cell into the slicer and reaper queues). After this first replication, the mother cell does not examine itself again; it proceeds directly to the allocation of another daughter cell, then the copy procedure is followed by cell division, in an endless loop. Fourty-eight of the eighty instructions in the ancestor are no-operations. Groups of four no-operation instructions are used as complementary templates to mark twelve sites for internal addressing, so that the creature can locate its beginning and end, call the copy procedure, and mark addresses for loops and jumps in the code, etc. The functions of these templates are commented in the listing in Appendix C. \pagebreak \LP \rule[6pt]{6.5in}{1pt} \large \bf 3 RESULTS\rm \normalsize \bf 3.1 EVOLUTION\rm \eLP Evolutionary runs of the simulator are begun by inoculating the soup of about 60,000 instructions with a single individual of the 80 instruction ancestral genotype. The passage of time in a run is measured in terms of how many Tierran instructions have been executed by the simulator. The original ancestral cell executes 839 instructions in its first replication, and 813 for each additional replication. The initial cell and its replicating daughters rapidly fill the soup memory to the threshold level of 80\% which starts the reaper. Typically, the system executes about 400,000 instructions in filling up the soup with about 375 individuals of size 80 (and their gestating daughter cells). Once the reaper begins, the memory remains roughly 80\% filled with creatures for the remainder of the run. \LP \bf 3.1.1 Micro-Evolution\rm \eLP If there were no mutations at the outset of the run, there would be no evolution. However, the bits flipped as a result of copy errors or background mutations result in creatures whose list of 80 instructions (genotype) differs from the ancestor, usually by a single bit difference in a single instruction. Mutations in and of themselves, can not result in a change in the size of a creature, they can only alter the instructions in its genome. However, by altering the genotype, mutations may affect the process whereby the creature examines itself and calculates its size, potentially causing it to produce an offspring that differs in size from itself. Four out of the five possible mutations in a no-operation instruction convert it into another kind of instruction, while one out of five converts it into the complementary no-operation. Therefore 80\% of mutations in templates destroy or change the size of the template, while one in five alters the template pattern. An altered template may cause the creature to make mistakes in self examination, procedure calls, or looping or jumps of the instruction pointer, all of which use templates for addressing. \LP \bf 3.1.1.1 parasites\rm \eLP An example of the kind of error that can result from a mutation in a template is a mutation of the low order bit of instruction 42 of the ancestor (Appendix C). Instruction 42 is a NOP\_0, the third component of the copy procedure template. A mutation in the low order bit would convert it into NOP\_1, thus changing the template from 1 1 0 0 to: 1 1 1 0. This would then be recognized as the template used to mark the end of the creature, rather than the copy procedure. A creature born with a mutation in the low order bit of instruction 42 would calculate its size as 45. It would allocate a daughter cell of size 45 and copy only instructions 0 through 44 into the daughter cell. The daughter cell then, would not include the copy procedure. This daughter genotype, consisting of 45 instructions, is named 0045aaa. Genotype 0045aaa (Fig.\ 1) is not able to self-replicate in isolated culture. However, the semi-permeable membrane of memory allocation only protects write privileges. Creatures may match templates with code in the allocated memory of other creatures, and may even execute that code. Therefore, if creature 0045aaa is grown in mixed culture with 0080aaa, when it attempts to call the copy procedure, it will not find the template within its own genome, but if it is within the search limit (generally set at 200--400 instructions) of the copy procedure of a creature of genotype 0080aaa, it will match templates, and send its instruction pointer to the copy code of 0080aaa. Thus a parasitic relationship is established (see ECOLOGY below). Typically, parasites begin to emerge within the first few million instructions of elapsed time in a run. \LP \bf 3.1.1.2 immunity to parasites\rm \eLP At least some of the size 79 genotypes demonstrate some measure of resistance to parasites. If genotype 45aaa is introduced into a soup, flanked on each side with one individual of genotype 0079aab, 0045aaa will initially reproduce somewhat, but will be quickly eliminated from the soup. When the same experiment is conducted with 0045aaa and the ancestor, they enter a stable cycle in which both genotypes coexist indefinitely. Freely evolving systems have been observed to become dominated by size 79 genotypes for long periods, during which parasitic genotypes repeatedly appear, but fail to invade. \LP \bf 3.1.1.3 circumvention of immunity to parasites\rm \eLP Occasionally these evolving systems dominated by size 79 were successfully invaded by parasites of size 51. When the immune genotype 0079aab was tested with 0051aao (a direct, one step, descendant of 0045aaa in which instruction 39 is replaced by an insertion of seven instructions of unknown origin), they were found to enter a stable cycle. Evidently 0051aao has evolved some way to circumvent the immunity to parasites possessed by 0079aab. The fourteen genotypes 0051aaa through 0051aan were also tested with 0079aab, and none were able to invade. \LP \bf 3.1.1.4 hyper-parasites\rm \eLP Hyper-parasite have been discovered, (e.g., 0080gai, which differs by 19 instructions from the ancestor, Fig.\ 1). Their ability to subvert the energy metabolism of parasites is based on two changes. The copy procedure does not return, but jumps back directly to the proper address of the reproduction loop. In this way it effectively seizes the instruction pointer from the parasite. However it is another change which delivers the coup de gr\^{a}ce: after each reproduction, the hyper-parasite re-examines itself, resetting the bx register with its location and the cx register with its size. After the instruction pointer of the parasite passes through this code, the CPU of the parasite contains the location and size of the hyper-parasite and the parasite thereafter replicates the hyper-parasite genome. \LP \bf 3.1.1.5 social hyper-parasites\rm \eLP Hyper-parasites drive the parasites to extinction. This results in a community with a relatively high level of genetic uniformity, and therefore high genetic relationship between individuals in the community. These are the conditions that support the evolution of sociality, and social hyper-parasites soon dominate the community. Social hyper-parasites (Fig.\ 2) appear in the 61 instruction size class. For example, 0061acg is social in the sense that it can only self-replicate when it occurs in aggregations. When it jumps back to the code for self-examination, it jumps to a template that occurs at the end rather than the beginning of its genome. If the creature is flanked by a similar genome, the jump will find the target template in the tail of the neighbor, and execution will then pass into the beginning of the active creature's genome. The algorithm will fail unless a similar genome occurs just before the active creature in memory. Neighboring creatures cooperate by catching and passing on jumps of the instruction pointer. It appears that the selection pressure for the evolution of sociality is that it facilitates size reduction. The social species are 24\% smaller than the ancestor. They have achieved this size reduction in part by shrinking their templates from four instructions to three instructions. This means that there are only eight templates available to them, and catching each others jumps allows them to deal with some of the consequences of this limitation as well as to make dual use of some templates. \LP \bf 3.1.1.6 cheaters: hyper-hyper-parasites\rm \eLP The cooperative social system of hyper-parasites is subject to cheating, and is eventually invaded by hyper-hyper-parasites (Fig.\ 2). These cheaters (e.g., 0027aab) position themselves between aggregating hyper-parasites so that when the instruction pointer is passed between them, they capture it. \LP \bf 3.1.1.7 a novel self-examination\rm \eLP All creatures discussed thus far mark their beginning and end with templates. They then locate the addresses of the two templates and determine their genome size by subtracting them. In one run, creatures evolved without a template marking their end. These creatures located the address of the template marking their beginning, and then the address of a template in the middle of their genome. These two addresses were then subtracted to calculate half of their size, and this value was multiplied by two (by shifting left) to calculate their full size. \LP \bf 3.1.1.8 an intricate adaptation\rm \eLP The arms race described in the paragraphs above took place over a period of a billion instructions executed by the system. Another run was allowed to continue for fifteen billion instructions, but was not examined in detail. A creature present at the end of the run was examined and found to have evolved an intricate adaptation. The adaptation is an optimization technique known as ``unrolling the loop''. The central loop of the copy procedure performs the following operations: 1) copies an instruction from the mother to the daughter, 2) decrements the cx register which initially contains the size of the parent genome, 3) tests to see if cx is equal to zero, if so it exits the loop, if not it remains in the loop, 4) increment the ax register which contains the address in the daughter where the next instruction will be copied to, 5) increment the bx register which contains the address in the mother where the next instruction will be copied from, 6) jump back to the top of the loop. The work of the loop is contained in steps 1, 2, 4 and 5. Steps 3 and 6 are overhead. The efficiency of the loop can be increased by duplicating the work steps within the loop, thereby saving on overhead. The creature from the end of the long run had repeated the work steps three times within the loop, as illustrated in Appendix E, which compares the copy loop of the ancestor with that of its decendant. \LP \bf 3.1.2 Macro-Evolution\rm \eLP When the simulator is run over long periods of time, hundreds of millions or billions of instructions, various patterns emerge. Under selection for small sizes there is a proliferation of small parasites and a rather interesting ecology (see below). Selection for large creatures has usually lead to continuous incrementally increasing sizes (but not to a trivial concatenation of creatures end-to-end) until a plateau in the upper hundreds is reached. In one run, selection for large size lead to apparently open ended size increase, evolving genomes larger than 23,000 instructions in length. These evolutionary patterns might be described as phyletic gradualism. The most thoroughly studied case for long runs is where selection, as determined by the slicer function, is size neutral. The longest runs to date (as much as 2.86 billion Tierran instructions) have been in a size neutral environment, with a search limit of 10,000, which would allow large creatures to evolve if there were some algorithmic advantage to be gained from larger size. These long runs illustrate a pattern which could be described as periods of stasis punctuated by periods of rapid evolutionary change, which appears to parallel the pattern of punctuated equilibrium described by Eldredge \& Gould [15] and Gould \& Eldredge [19]. Initially these communities are dominated by creatures with genome sizes in the eighties. This represents a period of relative stasis, which has lasted from 178 million to 1.44 billion instructions in the several long runs conducted to date. The systems then very abruptly (in a span of 1 or 2 million instructions) evolve into communities dominated by sizes ranging from about 400 to about 800. These communities have not yet been seen to evolve into communities dominated by either smaller or substantially larger size ranges. The communities of creatures in the 400 to 800 size range also show a long-term pattern of punctuated equilibrium. These communities regularly come to be dominated by one or two size classes, and remain in that condition for long periods of time. However, they inevitably break out of that stasis and enter a period where no size class dominates. These periods of rapid evolutionary change may be very chaotic. Close observations indicate that at least at some of these times, no genotypes breed true. Many self-replicating genotypes will coexist in the soup at these times, but at the most chaotic times, none will produce offspring which are even their same size. Eventually the system will settle down to another period of stasis dominated by one or a few size classes which breed true. \LP \bf 3.2 DIVERSITY\rm \eLP Most observations on the diversity of Tierran creatures have been based on the diversity of size classes. Creatures of different sizes are clearly genetically different, as their genomes are of different sizes. Different sized creatures would have some difficulty engaging in recombination if they were sexual, thus it is likely that they would be different species. In a run of 526 million instructions, 366 size classes were generated, 93 of which achieved abundances of five or more individuals. In a run of 2.56 billion instructions, 1180 size classes were generated, 367 of which achieved abundances of five or more. Each size class consists of a number of distinct genotypes which also vary over time. There exists the potential for great genetic diversity within a size class. There are 32$^{80}$ distinct genotypes of size 80, but how many of those are viable self-replicating creatures? This question remains unanswered, however some information has been gathered through the use of the automated genebank manager. In several days of running the genebanker, over 29,000 self-replicating genotypes of over 300 size classes accumulated. The size classes and the number of unique genotypes banked for each size are listed in Table 1. The genotypes saved to disk can be used to inoculate new soups individually, or collections of these banked genotypes may be used to assemble ``ecological communities''. In ``ecological'' runs, the mutation rates can be set to zero in order to inhibit evolution.