FORM, a program for the manipulation of very large formulae

J.A.M.Vermaseren

CERN/NIKHEF–H Amsterdam

Most computer algebra programs provide the user with a mathematics workbench. They give the user control over massive amounts of built in knowledge and much of this is used automatically. This is what most users need. There are however problems for which this approach turns out to be ``sub-optimal''. The design of Form had some of these problems in mind. Hence Form can be considered complementary to a number of these computer algebra packages. Throughout this talk the term computer algebra is used for systems that have much built in knowledge, much of which is applied automatically, while the term symbolic manipulation indicates systems that are mainly rewrite systems.

Many high level operations in computer algebra can only be implemented efficiently if the whole expression to which they are applied resides inside the physical memory. These operations need simultaneous input from more than one term. A good example of such an operation is factorization in which all terms in an entire expression have to ``conspire'' in order to make the expression factorizable. Such an operation will be very time consuming once the expression doesn't fit inside the physical memory. I will call operations like the above ``non-local operations''. Local operations take their input from only one term at a time. This means that one can process formulae sequentially and there is much less need to have the whole formula inside the physical memory. Alas it isn't possible to construct a symbolic manipulation program from local operations only. The operation that brings expressions to a unique form is necessarily non-linear as it has to do some sorting. On the other hand it is possible to build a symbolic manipulation program in which the only non-local operation is a variation of sorting. This can be done very efficiently, even when the expressions are disk based. What is even more interesting is that many algorithms that are implemented in a non-local way in computer algebra programs can in principle be rewritten as a sequence of local operations and sort operations. It is not clear how much this influences their performance as not much research has been done in this direction.

A system that has its expressions disk based will use its memory in an entirely different way than a memory based system. The main memory use of Form concerns various types of buffers to not let the disks slow down the execution significantly. The advantage of this is that even on small computers Form can deal with rather large formulae, although very small buffer sizes may make execution somewhat slower. This means that even inside an MS-DOS computer with 640 Kbytes formulae of Mega bytes can be manipulated. On bigger computers FORM will usually run in a space that is smaller than 4 Mbytes and a large part of that can even be swapped to disk without influencing the performance noticeably. It is even possible to control the size of the buffers by means of a special setup file. At times this can be very handy. On such systems the available diskspace is usually the factor that limits the size of the expressions.

To design a program like Form that must deal with big expressions and be very fast involves some tradeoffs. Of course the definition of local operations needs the definition of a ``term''. For most large scale formula crunching the addition (and subtraction) and the multiplication (and division) have a special role. The addition separates the terms and the multiplication separates subterms. Both operations are associative and the addition is commutative. The multiplication isn't necessarily commutative. This scheme looses some generality and is less elegant than a completely flexible setup, but it is a minor loss. Other operations can be built in by using functions and defining rules for the interactions of the arguments of the functions. This is of course slower than addition and multiplication.

In the beginning most of the syntax of Form was a modernized version of the language of Schoonschip [1]. This was however abandoned when the Schoonschip syntax prevented the use of modern language constructs. An example of this is the introduction of the semicolon to indicate the end of a statement. Before this was done it was impossible to use indentation. The statements in Form still use the Schoonschip construct of a keyword, possibly followed by subkeys, after which come either the body of the statement or a number of parameters. This construction is used in many languages. Because Form knows several types of variables and more are planned for the future the decision has been taken to make it a strong typing language. Hence all symbols, vectors, indices, functions, sets and expressions have to be declared properly. In the declaration it is possible to assign some properties to these objects. An example is the power restriction on symbols. This can be very useful for power series expansions, as terms that contain such an object to an ``illegal'' power are set to zero automatically at a very early stage. Sets are a hybrid between arrays and real mathematical sets. They are mainly used for pattern matching in which a potential match can be restricted to an element of a set. Functions can be declared to be commuting functions or non commuting functions. The order of non commuting functions will always be respected.

The pattern matching of Form is very extensive. Much attention has been given to wildcarding of more than one object and to very special wildcard modes. Yet the scheme isn't complete as some types of wildcarding might require very large search times and could introduce ambiguities. In future versions most of those modes will also be included anyway. Some examples of wildcarding that are allowed are:

    id  F(x?,??)*g(??)*x1^n1?*x2^n1?*p?.q?^n2? = ...

The language of Form knows several types of expressions. Expressions are the formulae that are being manipulated. There are local and global expressions. Global expressions can be kept after a program has terminated. After the proper command they are stored inside a special file and take the status of ``stored expressions''. The other possible status is that of an active expression. Active expressions are the expressions that are being manipulated. In principle all operations act on all active expressions, although there is a way to exclude expressions from such activity temporarily.

Form statements are grouped in modules. Each module is translated only when its turn has come. This means that at the module level Form acts as an interpreter. The statements inside the module are compiled, so that the execution of a module will proceede rapidly. This hybrid fashion is dictated by some considerations of flexibility. Internally the variables aren't stored by name, but rather by a number in a table. It is possible for Form to ``forget'' names as only names that have been declared as global variables cannot be removed from the tables. This loosing of variables and expressions is a special command. Hence the fresh translation before a module is executed. This compilation takes rarely a measurable amount of time anyway.

After a module has been executed the expressions are brought to normal form by sorting them and adding the coefficients of terms that are otherwise identical. This means that the sorting is only executed when the user asks for it. Such a feature provides great flexibility and can speed up execution considerably. An example of such a case would be a sequence of substitutions in which each substitution affects only a small fraction of the terms. In such a case sorting after each substitution would be a waste of time. For small expressions one would hardly notice the difference, but for expressions with more than 105 terms this can become annoying.

As in most modern languages Form is equiped with a preprocessor that can treat the input on a character level before this input is offered to the compiler part of Form. This preprocessor is rather flexible as it contains preprocessor variables, loops, conditions, macro's (named procedures) and a little calculator that allows constructs of the type:

    #define MAX "10"
    Local F0 = 1;
    Local F1 = 1;
    #do i = 2,'MAX'
      .sort
      drop F{'i'-2};
      skip F{'i'-1};
      Local F'i' = F{'i'-1}+F{'i'-2};
      print;
    #enddo
    .end

Here the define assigns the string 10 to the preprocessor variable MAX. Preprocessor variables have to be enclosed by quotes when they are used. This allows them to be concatenated to form new strings. The do loop parameter i is also a preprocessor variable. It gets assigned a string that can be interpreted numerically. The curly brackets invoke the calculator. The string between them will be interpreted as a numerical expression, be evaluated and then it is reconverted to a string. This means that the first pass in the loop generates the statements

      .sort
      drop F0;
      skip F1;
      Local F2 = F1+F0;
      print;

This can be very useful when setting up recursions.

The substitutions are in principle a one shot deal. Their pattern is matched, possibly using some rules that are given in a subkey, the matching object is taken out, and only after all matching due to this statement and possible follow up statements that are indicated as such is finished are all the right hand sides inserted. After this the statement is only used again for the other terms that haven't had their turn yet or the statement is reused when it has been placed inside a loop:

    id  x = x+1;

will not result in an infinite loop.

    repeat;
      id  x = x+1;
    endrepeat;

would result in an infinite loop, as the repeat loop will execute the statements inside the loop until they have no more effect. In the above example execution would come to a halt, unless the expressions don't contain the object x. Repeat loops, and if statements can be nested, but their ``body'' cannot contain and end-of-module instruction.

The arithmetic of Form is in principle over the rational numbers, but it can also be switched to integer arithmetic modulus a finite number. Fractions in such a mode may cause problems if this number isn't a prime number. If the number isn't too large it is possible to print all coefficients as powers of a (user provided) generator. Currently no floating point numbers have been built in although the internal notations leave room for it. In principle symbolic manipulation has not much need for floating point numbers as all transcendental numbers are usually given names like ``pi''. This is unlike computer algebra where the system may try to evaluate expressions when the user asks for it.

Because High Energy Physics has been one of the main reasons to develop symbolic manipulation and nowadays it cannot do without, Form would not be complete without efficient Dirac algebra algorithms. Both 4-dimensional and n-dimensional algorithms have been built in. The n-dimensional algorithms use only the anticommutation rules for the evaluation of traces. The 4-dimensional algorithms are faster as they use some identities that are specific to 4 dimensions. Of course the Chisholm identities have been built in. An example of a typical nontrivial computation in High Energy theory is the amplitude for the reaction 2 gluons W+W-. This reaction needs a quark loop. To compute such a reaction in the most general case (massive quarks, offshell vector particles) is quite some task. One obtains integrals with four powers of the loop momentum in the numerator and these should be expressed in terms of integrals with no powers of this momentum in the numerator. Details of the method used are given in ref [2]. Below is the processing of one of the diagrams.

    #include declare.h
    .global
    G   fermi = ( -(g_(1,Q)+m1)*g_(1,e1)*g7_(1)
            *(g_(1,Q)+g_(1,p1)+m2)*g_(1,e2)
            *(g_(1,Q)+g_(1,p1)+g_(1,p2)+m2)*g_(1,e3)
            *(g_(1,Q)+g_(1,p1)+g_(1,p2)+g_(1,p3)+m2)*g_(1,e4)*g7_(1)
*-- Plus the reversed string to eliminate Levi-Civita tensors
            )/N1/N2/N3/N4;
    trace4,1;
        id  p1.e1 = 0;
        al  p2.e2 = 0;
        al  p3.e3 = 0;
    .sort
Time =      16.00 sec    Generated terms =       1886
            fermi        Terms in output =        339
                         Bytes used      =      18882
        id  Q.p1 = N2/2 - N1/2 + s1.p1;
        id  Q.p2 = N3/2 - N2/2 + s1.p2;
        id  Q.p3 = N4/2 - N3/2 + s1.p3;
        id  Q.Q = N1 + m1^2;
    .sort
Time =      20.35 sec    Generated terms =        716
            fermi        Terms in output =        527
                         Bytes used      =      27284
    #call reduce4{NN|N1|N2|N3|N4|S|p1|p2|p3|p4|v|Delta3|D1234}
    .sort
Time =      60.73 sec    Generated terms =       2582
            fermi        Terms in output =       1946
                         Bytes used      =     111348
    #call reduce3{NN|N1|N2|N3|S123|p1|p2|p34|mu|nu|Delta123|C123|0|0}
    #call reduce3{NN|N1|N2|N4|S124|p1|p23|p4|mu|nu|Delta124|C124|0|0}
    #call reduce3{NN|N1|N3|N4|S134|p12|p3|p4|mu|nu|Delta134|C134|0|0}
    #call reduce3{NN|N2|N3|N4|S234|p2|p3|p14|mu|nu|Delta234|C234|1|p1}
        id  QQPP = QQPP*2/3*(1+epsilon/6);
    .sort
Time =     143.98 sec    Generated terms =       4069
            fermi        Terms in output =       3351
                         Bytes used      =     284206
    #call reduce2{NN|N1|N2|+|p1|s1|s2|B12|0|0|1|1}
    #call reduce2{NN|N1|N3|+|p12|s1|s3|B13|0|0|1|1}
    #call reduce2{NN|N1|N4|-|p4|s1|s4|B14|0|0|1|1}
    #call reduce2{NN|N2|N3|+|p2|s2|s3|B23|1|p1|0|1}
    #call reduce2{NN|N2|N4|+|p23|s2|s4|B24|1|p1|1|1}
    #call reduce2{NN|N3|N4|+|p3|s3|s4|B34|1|p12|0|1}
    .sort
Time =     227.58 sec    Generated terms =       4181
            fermi        Terms in output =       3656
                         Bytes used      =     330898
 
*-- Here are some more statements for some simplification
    print +f;
    .store
Time =     428.99 sec    Generated terms =       3478
            fermi        Terms in output =       3466
                         Bytes used      =     362086
   fermi =
       + A1 * ( 16*delta(p1,p2,p34,e3)*e_(v,p1,p2,e4)
         *p1.e2*p12.p12^-1*p12.e1*Delta3^-1*Delta123^-1 + .....
        etc.
    save fermi.sav;
    .end

The file declare.h contains a list of declarations that can be used in the various programs that are used to come to a final solution of the problem. Then the amplitude is entered and the polarizations of the vector particles are kept in terms of polarization vectors, so that the diagram can be extended if needed. Next some simplifications are performed to keep things as simple as possible. Then come the calls to some subroutines, called procedures. These procedures can either be provided inside the program, or they can be present in the current directory. In such procedures one can solve a problem once in a general way. Because of the complexity of the problem the answer is in a somewhat abstract form. It contains still many Gram-determinants which may or may not take a simple form when they are written out. This depends on the problem. In this case it may be better to rewrite all those objects into symbols that are computed in a Fortran program. If the problem is sufficiently simple (like photon-photon scattering) one would substitute these objects and obtain eventually a formula that fits on less than half a page. The above program ran on an Atari-ST home computer with only one Mbyte of memory.

Form can be used also for many problems that have no connection whatsoever to the perturbation expansions of field theory. Below is a small example of the expansion of the Campbell-Hausdorf series.

    #define MAX "6"
    S   i;
    F   A,B;
    #do k = 1,'MAX'
      S  x'k'(:'k');
      F  C'k';
    #enddo
    .global
    #do k = 2,'MAX'
    G   F'k' = sump_(i,0,'k',x'k'*A/i) * sump_(i,0,'k',x'k'*B/i)
             - sump_(i,0,'k',x'k'*C1/i) + x'k'^'k'*C'k' ;
    #do j = 2,'k'
      id  C{'j'-1} = C{'j'-1}+x'k'*C'j';
    #enddo
    id  x'k'^'k' = 1;
    id  x'k' = 0;
    id  C1 = A+B;
    #do j = 2,'k'-1
        .sort
        id  C'j' = F'j';
    #enddo
    print;
    .store
    #enddo
    save chseries.sav;
    .end

This program makes an intensive use of the preprocessor features. There are loops and the preprocessor calculator facilitates the setting up of some recursions. The program results in a successive calculation of the terms in the Campbell- Hausdorf series as shown below:

   F2 = 1/2*A*B - 1/2*B*A;
   F3 = 1/12*A*A*B - 1/6*A*B*A + 1/12*A*B*B + 1/12*B*A*A
       - 1/6*B*A*B + 1/12*B*B*A;
   F4 = 1/24*A*A*B*B - 1/12*A*B*A*B + 1/12*B*A*B*A - 1/24*B*B*A*A;
   etc.

There exists a closed form for this expansion due to Dynkin (see [3]) but this formula still contains many summations and isn't necessarily more convenient. By changing the value of the preprocessor variable MAX one can run the expansion as far as the computer will allow. In practise this means that a mainframe or large workstation can obtain F12 in less than 20 minutes.

The program that expresses the expansion in terms of commutators is a little more complicated. To make it really efficient it was necessary to build in some extra patterns. These will become available in version 2. It takes less than an hour to express F12 in terms of commutators and the expression thus obtained is shorter than the expression coming from the Dynkin formula, due to the application of relations between the commutators. Below are for instance the results for F6 and F8:

   F6 =
       - 1/720*C(A,A,B,B) + 1/240*C(A,B,A,B)
       + 1/1440*C(A,B,B,B)+ 1/1440*C(B,A,A,A);
   F8 =
       - 5/24192*C(A,A,A,B,B,B)  + 1/2520*C(A,A,B,A,B,B)
       + 1/20160*C(A,A,B,B,B,B)  + 1/15120*C(A,B,A,A,B,B)
       - 1/2016*C(A,B,A,B,A,B)   + 1/20160*C(A,B,A,B,B,B)
       + 1/20160*C(A,B,B,A,A,B)  - 1/10080*C(A,B,B,A,B,B)
       - 1/60480*C(A,B,B,B,B,B)  - 1/60480*C(B,A,A,A,A,A)
       - 1/10080*C(B,A,A,B,A,A)  - 1/20160*C(B,A,B,A,A,A)
       + 1/20160*C(B,B,A,A,A,A) ;

These were separate programs. They took 12 sec and 129 sec respectively on an Atari–ST computer. The function C is defined by

C(x1, x2,..., xn) = x1C(x2, ... , xn) - C(x2, ... , xn)x1  
C = AB - BA  

This looks very much like the Ad operator in ref [3] The program that produced this expression is rather short but uses very high level wildcarding. Some essential statements in the program are:

    repeat;
        id  C(??)*E?AB = E*C(..) - C(E,..);
        id  B*A = A*B - C;
    endrepeat;
 
    id  C(??,B,A) = C(..,A,B);
    repeat   id,disorder,C(??)*C(???)=C(...)*C(..)-C(C(...),..);
    repeat   id  C(?,C(a?,??),???) =
                 C(.,a,C(..),...)-C(.,C(..),a,...);
    repeat   id  C(?,C,??) = C(.,A,B,..)-C(.,B,A,..);

The wildcarding like C(??) means that any argument field will be picked up. This becomes even wilder in C(?,C(a?,??),???) which means: C with a number of arguments, then an argument that must be C with at least one argument, and then any number of extra arguments. In the right hand side the dots match the question marks. The disorder option will cause the substitution to be made only if the objects are not properly ordered, due to the fact that they are non commuting. In the end there are relations between the commutators. These can be derived systematically and each even power introduces more of them. Due to these relations the obtained expressions aren't unique and it isn't easy to set up the relations such that the result is minimal.

Form 1.0 is nowadays available on many different computers. Even though the system still contains occasional errors the situation doesn't seem to be worse than with other systems. Due to the policy decision to make version 1.0 available for free it has already a large number of users. At the moment version 2 is being prepared. It will have a large number of new features, many of which will have their main use outside the perturbation expansions of High Energy Physics. In addition some research is being done to determine what is the best way to parallellize Form. It seems to be quite feasible.