WHAT IS AN ALN?

An ALN (adaptive logic network) is a kind of artificial neural network. It is a special case of the familiar multilayer perceptron (MLP) feedforward network and has been designed to perform basic pattern classification computations at extremely high speed -- literally at the upper limit of speed for any digital system.

WHAT COULD ONE USE ALNS FOR?

They are currently being tried out in simulations using the atree software for various applications:

1. Optical character recognition (OCR) -- a demo of OCR is included in atree release 2.7; it makes clear the great immunity of ALN trees to noise (Thomas).

2. Discriminating between two kinds of particles produced by an accelerator, where a decision must be made in less than 1/2 microsecond (Volkemer).

3. Designing control systems for prostheses to enable people with spinal cord damage to walk by means of functional electrical stimulation. Simplicity of fabrication, safety and lightness are important here (Stein et al.).

4. Analyzing spectra of tar sands to determine their composition; data analysis was used to cut down the number of spectral measurements required from 700 to 25 (Parsons).

5. Classifying the amount of fat in beef based on texture measures taken from ultrasound images of beef cattle (McCauley et al.).

INPUTS

The following assumes the reader has read something about typical feedforward neural nets. In the case of ALNs, the input data to the learning system must be boolean. Non-boolean data is coded into boolean vectors before entering the learning part. This may involve arithmetic and table lookup operations. Several boolean outputs may be combined to provide outputs more complex than a single boolean value, say continuous values. There are no a priori limitations on the kind of functions that can be treated using ALNs.

ARCHITECTURE OF THE NET

The interconnections of nodes usually take the form of a binary tree, except that there are multiple connections of inputs to the leaf nodes of the tree, some involving inversions. There is no practical limit to the number of hidden layers; ten or fifteen is typical. A small net might have 255 nodes, a large one 65,535 nodes. One doesn't have to worry about the shape of the net as much as with other networks.

THE ELEMENTS OF THE NEURAL NETWORK

The nodes (or adaptive logic elements) have two input leads. The input signals x1, x2 are boolean (0 or 1). A connection "weight" for each input (to use the term from the usual kind of neural net) is determined by a single bit of information. The nonlinearity, or "squashing function", used in ALNs is just a simple threshold (comparison to a constant). Specifically, if b1 and b2 are boolean, then the element computes a boolean value which is 1 if and only if (b1 + 1) * x1 + (b2 + 1) * x2 >= 2. The four combinations of b1 and b2 (00, 11, 10, 01) generate four boolean functions of two variables: AND, OR, LEFT, and RIGHT, where LEFT(x1, x2) = x1 and RIGHT(x1, x2) = x2. For example, 1 * x1 + 1 * x2 >= 2 if and only if both x1 AND x2 are 1. The LEFT and RIGHT functions, in effect, serve to provide some flexibility in the interconnections. They disappear after training is complete. Two counters in the node count up and down in response to the 0s and 1s desired of the network when the input to the node is 01 or 10, respectively. They try to determine the best function for the node even in the presence of noise and conflicting training data. The counters also disappear after training is complete.
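To make the element concrete, here is a minimal C sketch of the node function described above. It only illustrates the threshold formula and is not taken from the atree sources; the names (aln_node_output, etc.) are invented for this example.

    #include <assert.h>
    #include <stdio.h>

    /* One adaptive logic element: the two bits b1, b2 select the node
       function via the threshold test (b1+1)*x1 + (b2+1)*x2 >= 2.      */
    static int aln_node_output(int b1, int b2, int x1, int x2)
    {
        return (b1 + 1) * x1 + (b2 + 1) * x2 >= 2;
    }

    int main(void)
    {
        /* b1 b2 = 00 gives AND, 11 gives OR, 10 gives LEFT, 01 gives RIGHT */
        assert(aln_node_output(0, 0, 1, 1) == 1);   /* AND(1,1)   = 1 */
        assert(aln_node_output(0, 0, 1, 0) == 0);   /* AND(1,0)   = 0 */
        assert(aln_node_output(1, 1, 0, 1) == 1);   /* OR(0,1)    = 1 */
        assert(aln_node_output(1, 0, 1, 0) == 1);   /* LEFT(1,0)  = 1 */
        assert(aln_node_output(0, 1, 1, 0) == 0);   /* RIGHT(1,0) = 0 */
        printf("all node-function checks pass\n");
        return 0;
    }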
THE TRAINING OF THE TREE

A tree of such elements is randomly connected to some boolean input variables and their complements. Inputs enter the tree at the leaves and the output is at the root node. The input variables are components of a vector representing a pattern, such as a handwritten character. Training on a set of input vectors, furnished with the desired boolean outputs, consists of presenting the input vectors in random sequence along with the desired outputs. It results in an assignment of functions to nodes that allows the tree to approximate the desired outputs specified by the training data.

To orchestrate the changes to be made during training, a solution to the credit assignment problem is used. It is based on the idea that if one input to an AND (OR) is 0 (1), then it doesn't matter what the nodes of the subtree attached to the other input are doing. According to this simple rule, a node is responsible if a change in its output would change the network output. (This is like backprop, except that the quantity being propagated backwards is either 0 or 1.) The fact that all node functions are increasing causes a positive correlation between the node output and the network output, which indicates the direction of changes that must be made. In addition, there is a small refinement which goes beyond this simple scheme. (You can look at the function adapt(tree, vec, desired) in atree.c to confirm that this is a very simple yet effective mechanism; a rough C sketch of the idea appears after the HARDWARE SPEED section below.)

After training, the ALN has a built-in capacity to generalize, i.e. to give appropriate responses to other possible input vectors not presented during training. This comes about not because of any additional hardware or software, but just because of the architecture of the tree -- trees filter out perturbations of the inputs. Hence no speed is lost in order to generalize well -- the architecture does it. The training set can contain contradictory samples, and the relative numbers of such conflicting samples for the same input will be taken into account in the training procedure to approximate maximum likelihood and Bayes decision rules.

BUILDING HARDWARE FROM THE TRAINED SYSTEM

Training an ALN results in a combinational logic circuit. After adaptation, one eliminates the LEFT and RIGHT functions, which are now redundant, and groups the connected subtrees of two-input ANDs together to get ANDs of fanin >= 2. Similarly for ORs. The (alternating) layers of ANDs and ORs are logically equivalent to layers consisting entirely of NANDs. (Use De Morgan's laws.) The result is directly translatable into easily built hardware -- essentially a tree of transistors. This is far, far simpler than any other neural system. There are no built-in delays, flip-flops, arithmetic units or table look-ups to slow the system down.

HARDWARE SPEED

The execution time of such a circuit depends on the depth, not the size. If we were using a twenty-layer tree of transistors in submicron CMOS, the entire function would be executed in a few nanoseconds. In the usual terms of numbers of one-bit connections per second, we would get many trillions of connections per second from an ALN, thousands of times more than existing special hardware systems for implementing backprop. The fastest we know of on the market for backprop only allows a few billion connections per second at peak rate.
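The responsibility rule mentioned under THE TRAINING OF THE TREE can be sketched in a few lines of C. This is only a rough illustration of the idea, not the actual adapt() routine from atree.c, and it omits the refinement mentioned above; the type and function names are invented for the example.

    #include <stddef.h>

    /* The four node functions selected by the two adaptation bits. */
    typedef enum { AND_FN, OR_FN, LEFT_FN, RIGHT_FN } node_fn;

    typedef struct node {
        node_fn fn;                 /* current function of the element     */
        int c01, c10;               /* counters for the input cases 01, 10 */
        struct node *left, *right;  /* NULL in both fields marks a leaf    */
        int leaf_index;             /* which input bit a leaf reads        */
    } node;

    /* Evaluate the subtree rooted at n on the boolean input vector x. */
    static int eval(const node *n, const int *x)
    {
        if (n->left == NULL)
            return x[n->leaf_index];
        int a = eval(n->left, x), b = eval(n->right, x);
        switch (n->fn) {
        case AND_FN:  return a && b;
        case OR_FN:   return a || b;
        case LEFT_FN: return a;
        default:      return b;
        }
    }

    /* One adaptation step on pattern x with desired network output d.
       Counters are only touched when the node's inputs are 01 or 10, and a
       subtree is only visited when a change in its output could change this
       node's output (the responsibility rule described above).            */
    static void adapt_sketch(node *n, const int *x, int d)
    {
        if (n->left == NULL)
            return;                              /* leaves have no counters */
        int a = eval(n->left, x), b = eval(n->right, x);

        if (a == 0 && b == 1) n->c01 += d ? 1 : -1;   /* vote on output at 01 */
        if (a == 1 && b == 0) n->c10 += d ? 1 : -1;   /* vote on output at 10 */
        /* a fuller version would re-derive fn from the signs of (c01, c10):
           (-,-) -> AND, (+,+) -> OR, (-,+) -> LEFT, (+,-) -> RIGHT          */

        int go_left = 1, go_right = 1;
        if (n->fn == AND_FN)  { go_left = b;  go_right = a;  }
        if (n->fn == OR_FN)   { go_left = !b; go_right = !a; }
        if (n->fn == LEFT_FN)  go_right = 0;
        if (n->fn == RIGHT_FN) go_left  = 0;

        if (go_left)  adapt_sketch(n->left,  x, d);
        if (go_right) adapt_sketch(n->right, x, d);
    }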
COST OF HARDWARE

For small problems, even one of the off-the-shelf programmable chips that exist today can do the job, though at a slower rate than a custom chip. Such fast, inexpensive hardware is being investigated at an accelerator facility for making a fast trigger to recognize elementary particles in under 1/2 microsecond. A few hundred dollars should cover the materials. This is in sharp contrast to other kinds of neural networks, where you need processors costing thousands of dollars just to get into the game seriously, and even then the result of all their expensive learning will be a system that executes far more slowly than an ALN tree of transistors. With ALNs, you are seriously in the game from day one with the atree simulators.

SPEED OF THE ATREE SIMULATOR

Even in the atree software simulator, evaluation is fast, thanks to the fact that boolean logic can be lazily evaluated. Once one input to an AND gate is computed to be 0, we don't need the other input, so we can suppress a whole subtree of the computation. Laziness can provide speedups by factors of 10 or 100 for typical problems. Backprop systems, on the other hand, do not enjoy such speedups. Non-logical MLPs compute *everything*. It's like doing programming without any "if"-statements. (It's no wonder the usual neural networks need massive parallelism -- they suffer from this serious omission.)

TRAINING SPEED

The adaptation algorithms are also designed to be fast: what corresponds to backprop, a credit assignment computation, is also combinational logic in hardware, so it will be possible to build systems which adapt a tree at a rate of, say, 25 million patterns per second. The difference in speed of training and execution between backprop systems on the one hand and ALN systems on the other results from the drastic simplification of the usual MLP paradigm to base it on boolean logic. The difference is immense. One shouldn't think of it as getting a racing car from a station wagon (twice as fast, say); it is more like getting a rocket from a hot air balloon (thousands of times faster). As the problems get larger, the advantage of the ALN system over the usual MLP just keeps on growing.

GENERALIZATION AND NOISE IMMUNITY

Good generalization without loss of speed is the strong point of ALNs. Pattern classification problems generally require an output which is the same for all patterns in a class. Members of a class are in some way similar. Similarity could be based on all sorts of criteria, but generally it can be defined by saying two individuals have many features in common. If a feature is represented by a boolean variable, then the above says that similarity can be measured by the Hamming distance between boolean vectors. In order to use ALNs, we generally have to code other representations of individuals into boolean vectors in such a way that similarity for pattern classification purposes is reflected in Hamming distance. (A small illustrative example of such a coding appears at the end of this section.)

Binary trees with node functions AND, OR, LEFT, and RIGHT have been shown to have the property that the functions they realize tend to have the same output value when the inputs are perturbed. This depends on averaging over all functions on the tree, weighted by the number of ways each can be produced by setting the two bits at each node which determine the node's function. It is also necessary to average over all vectors and all perturbations of a given number of bits. The plausible reasoning behind this is that when an input signal to a gate changes, it may or may not change the output. A change to an input of an AND will not change its output if the other input to the AND is 0. Overall, the probability of a change is 0.5 if one input changes. So for a 10-layer tree, the probability of a single bit change at a leaf causing a change of the tree output is (0.5)^10 = 1/1024.
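As an example of coding a non-boolean quantity so that similarity shows up as small Hamming distance, here is one simple possibility: a thermometer (unary) code for a bounded continuous value. This is only an illustrative scheme made up for this posting, not necessarily the encoding used by atree, and the names are invented.

    #include <stdio.h>

    #define BITS 16  /* boolean inputs used for one continuous value (assumed) */

    /* Thermometer code: quantize v (assumed to lie in [lo, hi]) to a level in
       0..BITS and set that many leading bits to 1.  Values that are close
       numerically then differ in only a few bits, so numeric closeness is
       reflected in Hamming distance, as the ALN requires.                    */
    static void thermometer_encode(double v, double lo, double hi, int out[BITS])
    {
        int level = (int)((v - lo) / (hi - lo) * BITS + 0.5);
        if (level < 0)    level = 0;
        if (level > BITS) level = BITS;
        for (int i = 0; i < BITS; i++)
            out[i] = (i < level) ? 1 : 0;
    }

    static int hamming(const int a[BITS], const int b[BITS])
    {
        int d = 0;
        for (int i = 0; i < BITS; i++)
            d += (a[i] != b[i]);
        return d;
    }

    int main(void)
    {
        int a[BITS], b[BITS], c[BITS];
        thermometer_encode(0.50, 0.0, 1.0, a);
        thermometer_encode(0.55, 0.0, 1.0, b);   /* close to 0.50 */
        thermometer_encode(0.95, 0.0, 1.0, c);   /* far from 0.50 */
        printf("d(0.50, 0.55) = %d bits\n", hamming(a, b));   /* small */
        printf("d(0.50, 0.95) = %d bits\n", hamming(a, c));   /* large */
        return 0;
    }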
ALNs thus match the solution to the problem, providing fast pattern recognition. Simple properties of boolean logic provide speed and immunity to noise, and offer the possibility of lazy evaluation.

THE FUTURE

In the Fall of 1992, we expect to have a much enhanced version, embodying some recent developments that have come about in part through interaction with people on the Internet via email and news. It will have

- superior adaptive algorithms for both continuous and boolean data
- an ability to deal with continuous values, without significant loss in speed, that will make other MLPs obsolete, including those using variants of backprop
- a sound statistical basis for fitting continuous functions
- a design methodology that will be easy to understand and will result in provably *safe* systems; there will be no more problem of wild values being produced by a "black box", so ALNs can be applied in safety-critical areas
- an interface to common database management systems on micros and mainframes
- an interface to a popular spreadsheet program under Windows
- facilities to encourage developers to apply ALNs to pattern recognition problems: OCR, voice, control of virtual realities, etc.
- a price, since we can't carry on otherwise.

AN INVITATION

I hope the above arguments will be enough to convince the reader to try the atree release 2.0 software for Unix and/or the new release 2.7 for Windows on the IBM PC and compatibles. At the very least, you should take a look at the OCR demo in release 2.7 to see the great noise immunity which results from such simple means. The lf language should make it easy to get started, and there are lots of examples in release 2.7. (You could look at them even if you only want the Unix version.) If you like what you see, work with us. Lots of researchers are doing so, some via the net: physicists, engineers, neuroscientists, speech pathologists, ... Join the alnl mailing list (email alnl-request@cs.ualberta.ca, providing a correct address that will last for a while, please).

The interest in backprop-type neural networks which has blossomed in the last few years is obviously fading fast, and funding is less than expected. I have heard that it is because "neural networks" were just slightly better than standard techniques. The problems that need high-speed pattern recognition are still there, though, and when speed is an issue, ALNs are not just slightly better, but thousands of times better than anything else. As for the quality of ALN classifiers, you'll just have to try them out in your area of application. For beef grading, at last report, ALNs were the best technique they had. A particle physicist is, as far as I know, still trying to explain what information an ALN was using when it managed to learn even after he deprived it of momentum information that was thought to be essential. In prosthetics, the ALN avoided a mistake that a physiotherapist made, even though the mistake was in the training set. A lot of people are pretty happy about their results with ALNs. Good luck with your results using them!

Bill Armstrong

The file for release 2.7 will be announced as soon as it is available. Another posting will provide a list of references.