home *** CD-ROM | disk | FTP | other *** search
- # @(#)TOUR 5.1 (Berkeley) 3/7/91
-
- A Tour through Ash
-
- Copyright 1989 by Kenneth Almquist.
-
-
- DIRECTORIES: The subdirectory bltin contains commands which can
- be compiled stand-alone. The rest of the source is in the main
- ash directory.
-
- SOURCE CODE GENERATORS: Files whose names begin with "mk" are
- programs that generate source code. A complete list of these
- programs is:
-
- program intput files generates
- ------- ------------ ---------
- mkbuiltins builtins builtins.h builtins.c
- mkinit *.c init.c
- mknodes nodetypes nodes.h nodes.c
- mksignames - signames.h signames.c
- mksyntax - syntax.h syntax.c
- mktokens - token.def
- bltin/mkexpr unary_op binary_op operators.h operators.c
-
- There are undoubtedly too many of these. Mkinit searches all the
- C source files for entries looking like:
-
- INIT {
- x = 1; /* executed during initialization */
- }
-
- RESET {
- x = 2; /* executed when the shell does a longjmp
- back to the main command loop */
- }
-
- SHELLPROC {
- x = 3; /* executed when the shell runs a shell procedure */
- }
-
- It pulls this code out into routines which are when particular
- events occur. The intent is to improve modularity by isolating
- the information about which modules need to be explicitly
- initialized/reset within the modules themselves.
-
- Mkinit recognizes several constructs for placing declarations in
- the init.c file.
- INCLUDE "file.h"
- includes a file. The storage class MKINIT makes a declaration
- available in the init.c file, for example:
- MKINIT int funcnest; /* depth of function calls */
- MKINIT alone on a line introduces a structure or union declara-
- tion:
- MKINIT
- struct redirtab {
- short renamed[10];
- };
- Preprocessor #define statements are copied to init.c without any
- special action to request this.
-
- INDENTATION: The ash source is indented in multiples of six
- spaces. The only study that I have heard of on the subject con-
- cluded that the optimal amount to indent is in the range of four
- to six spaces. I use six spaces since it is not too big a jump
- from the widely used eight spaces. If you really hate six space
- indentation, use the adjind (source included) program to change
- it to something else.
-
- EXCEPTIONS: Code for dealing with exceptions appears in
- exceptions.c. The C language doesn't include exception handling,
- so I implement it using setjmp and longjmp. The global variable
- exception contains the type of exception. EXERROR is raised by
- calling error. EXINT is an interrupt. EXSHELLPROC is an excep-
- tion which is raised when a shell procedure is invoked. The pur-
- pose of EXSHELLPROC is to perform the cleanup actions associated
- with other exceptions. After these cleanup actions, the shell
- can interpret a shell procedure itself without exec'ing a new
- copy of the shell.
-
- INTERRUPTS: In an interactive shell, an interrupt will cause an
- EXINT exception to return to the main command loop. (Exception:
- EXINT is not raised if the user traps interrupts using the trap
- command.) The INTOFF and INTON macros (defined in exception.h)
- provide uninterruptable critical sections. Between the execution
- of INTOFF and the execution of INTON, interrupt signals will be
- held for later delivery. INTOFF and INTON can be nested.
-
- MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
- which call error when there is no memory left. It also defines a
- stack oriented memory allocation scheme. Allocating off a stack
- is probably more efficient than allocation using malloc, but the
- big advantage is that when an exception occurs all we have to do
- to free up the memory in use at the time of the exception is to
- restore the stack pointer. The stack is implemented using a
- linked list of blocks.
-
- STPUTC: If the stack were contiguous, it would be easy to store
- strings on the stack without knowing in advance how long the
- string was going to be:
- p = stackptr;
- *p++ = c; /* repeated as many times as needed */
- stackptr = p;
- The folloing three macros (defined in memalloc.h) perform these
- operations, but grow the stack if you run off the end:
- STARTSTACKSTR(p);
- STPUTC(c, p); /* repeated as many times as needed */
- grabstackstr(p);
-
- We now start a top-down look at the code:
-
- MAIN.C: The main routine performs some initialization, executes
- the user's profile if necessary, and calls cmdloop. Cmdloop is
- repeatedly parses and executes commands.
-
- OPTIONS.C: This file contains the option processing code. It is
- called from main to parse the shell arguments when the shell is
- invoked, and it also contains the set builtin. The -i and -j op-
- tions (the latter turns on job control) require changes in signal
- handling. The routines setjobctl (in jobs.c) and setinteractive
- (in trap.c) are called to handle changes to these options.
-
- PARSING: The parser code is all in parser.c. A recursive des-
- cent parser is used. Syntax tables (generated by mksyntax) are
- used to classify characters during lexical analysis. There are
- three tables: one for normal use, one for use when inside single
- quotes, and one for use when inside double quotes. The tables
- are machine dependent because they are indexed by character vari-
- ables and the range of a char varies from machine to machine.
-
- PARSE OUTPUT: The output of the parser consists of a tree of
- nodes. The various types of nodes are defined in the file node-
- types.
-
- Nodes of type NARG are used to represent both words and the con-
- tents of here documents. An early version of ash kept the con-
- tents of here documents in temporary files, but keeping here do-
- cuments in memory typically results in significantly better per-
- formance. It would have been nice to make it an option to use
- temporary files for here documents, for the benefit of small
- machines, but the code to keep track of when to delete the tem-
- porary files was complex and I never fixed all the bugs in it.
- (AT&T has been maintaining the Bourne shell for more than ten
- years, and to the best of my knowledge they still haven't gotten
- it to handle temporary files correctly in obscure cases.)
-
- The text field of a NARG structure points to the text of the
- word. The text consists of ordinary characters and a number of
- special codes defined in parser.h. The special codes are:
-
- CTLVAR Variable substitution
- CTLENDVAR End of variable substitution
- CTLBACKQ Command substitution
- CTLBACKQ|CTLQUOTE Command substitution inside double quotes
- CTLESC Escape next character
-
- A variable substitution contains the following elements:
-
- CTLVAR type name '=' [ alternative-text CTLENDVAR ]
-
- The type field is a single character specifying the type of sub-
- stitution. The possible types are:
-
- VSNORMAL $var
- VSMINUS ${var-text}
- VSMINUS|VSNUL ${var:-text}
- VSPLUS ${var+text}
- VSPLUS|VSNUL ${var:+text}
- VSQUESTION ${var?text}
- VSQUESTION|VSNUL ${var:?text}
- VSASSIGN ${var=text}
- VSASSIGN|VSNUL ${var=text}
-
- In addition, the type field will have the VSQUOTE flag set if the
- variable is enclosed in double quotes. The name of the variable
- comes next, terminated by an equals sign. If the type is not
- VSNORMAL, then the text field in the substitution follows, ter-
- minated by a CTLENDVAR byte.
-
- Commands in back quotes are parsed and stored in a linked list.
- The locations of these commands in the string are indicated by
- CTLBACKQ and CTLBACKQ+CTLQUOTE characters, depending upon whether
- the back quotes were enclosed in double quotes.
-
- The character CTLESC escapes the next character, so that in case
- any of the CTL characters mentioned above appear in the input,
- they can be passed through transparently. CTLESC is also used to
- escape '*', '?', '[', and '!' characters which were quoted by the
- user and thus should not be used for file name generation.
-
- CTLESC characters have proved to be particularly tricky to get
- right. In the case of here documents which are not subject to
- variable and command substitution, the parser doesn't insert any
- CTLESC characters to begin with (so the contents of the text
- field can be written without any processing). Other here docu-
- ments, and words which are not subject to splitting and file name
- generation, have the CTLESC characters removed during the vari-
- able and command substitution phase. Words which are subject
- splitting and file name generation have the CTLESC characters re-
- moved as part of the file name phase.
-
- EXECUTION: Command execution is handled by the following files:
- eval.c The top level routines.
- redir.c Code to handle redirection of input and output.
- jobs.c Code to handle forking, waiting, and job control.
- exec.c Code to to path searches and the actual exec sys call.
- expand.c Code to evaluate arguments.
- var.c Maintains the variable symbol table. Called from expand.c.
-
- EVAL.C: Evaltree recursively executes a parse tree. The exit
- status is returned in the global variable exitstatus. The alter-
- native entry evalbackcmd is called to evaluate commands in back
- quotes. It saves the result in memory if the command is a buil-
- tin; otherwise it forks off a child to execute the command and
- connects the standard output of the child to a pipe.
-
- JOBS.C: To create a process, you call makejob to return a job
- structure, and then call forkshell (passing the job structure as
- an argument) to create the process. Waitforjob waits for a job
- to complete. These routines take care of process groups if job
- control is defined.
-
- REDIR.C: Ash allows file descriptors to be redirected and then
- restored without forking off a child process. This is accom-
- plished by duplicating the original file descriptors. The redir-
- tab structure records where the file descriptors have be dupli-
- cated to.
-
- EXEC.C: The routine find_command locates a command, and enters
- the command in the hash table if it is not already there. The
- third argument specifies whether it is to print an error message
- if the command is not found. (When a pipeline is set up,
- find_command is called for all the commands in the pipeline be-
- fore any forking is done, so to get the commands into the hash
- table of the parent process. But to make command hashing as
- transparent as possible, we silently ignore errors at that point
- and only print error messages if the command cannot be found
- later.)
-
- The routine shellexec is the interface to the exec system call.
-
- EXPAND.C: Arguments are processed in three passes. The first
- (performed by the routine argstr) performs variable and command
- substitution. The second (ifsbreakup) performs word splitting
- and the third (expandmeta) performs file name generation. If the
- "/u" directory is simulated, then when "/u/username" is replaced
- by the user's home directory, the flag "didudir" is set. This
- tells the cd command that it should print out the directory name,
- just as it would if the "/u" directory were implemented using
- symbolic links.
-
- VAR.C: Variables are stored in a hash table. Probably we should
- switch to extensible hashing. The variable name is stored in the
- same string as the value (using the format "name=value") so that
- no string copying is needed to create the environment of a com-
- mand. Variables which the shell references internally are preal-
- located so that the shell can reference the values of these vari-
- ables without doing a lookup.
-
- When a program is run, the code in eval.c sticks any environment
- variables which precede the command (as in "PATH=xxx command") in
- the variable table as the simplest way to strip duplicates, and
- then calls "environment" to get the value of the environment.
- There are two consequences of this. First, if an assignment to
- PATH precedes the command, the value of PATH before the assign-
- ment must be remembered and passed to shellexec. Second, if the
- program turns out to be a shell procedure, the strings from the
- environment variables which preceded the command must be pulled
- out of the table and replaced with strings obtained from malloc,
- since the former will automatically be freed when the stack (see
- the entry on memalloc.c) is emptied.
-
- BUILTIN COMMANDS: The procedures for handling these are scat-
- tered throughout the code, depending on which location appears
- most appropriate. They can be recognized because their names al-
- ways end in "cmd". The mapping from names to procedures is
- specified in the file builtins, which is processed by the mkbuil-
- tins command.
-
- A builtin command is invoked with argc and argv set up like a
- normal program. A builtin command is allowed to overwrite its
- arguments. Builtin routines can call nextopt to do option pars-
- ing. This is kind of like getopt, but you don't pass argc and
- argv to it. Builtin routines can also call error. This routine
- normally terminates the shell (or returns to the main command
- loop if the shell is interactive), but when called from a builtin
- command it causes the builtin command to terminate with an exit
- status of 2.
-
- The directory bltins contains commands which can be compiled in-
- dependently but can also be built into the shell for efficiency
- reasons. The makefile in this directory compiles these programs
- in the normal fashion (so that they can be run regardless of
- whether the invoker is ash), but also creates a library named
- bltinlib.a which can be linked with ash. The header file bltin.h
- takes care of most of the differences between the ash and the
- stand-alone environment. The user should call the main routine
- "main", and #define main to be the name of the routine to use
- when the program is linked into ash. This #define should appear
- before bltin.h is included; bltin.h will #undef main if the pro-
- gram is to be compiled stand-alone.
-
- CD.C: This file defines the cd and pwd builtins. The pwd com-
- mand runs /bin/pwd the first time it is invoked (unless the user
- has already done a cd to an absolute pathname), but then
- remembers the current directory and updates it when the cd com-
- mand is run, so subsequent pwd commands run very fast. The main
- complication in the cd command is in the docd command, which
- resolves symbolic links into actual names and informs the user
- where the user ended up if he crossed a symbolic link.
-
- SIGNALS: Trap.c implements the trap command. The routine set-
- signal figures out what action should be taken when a signal is
- received and invokes the signal system call to set the signal ac-
- tion appropriately. When a signal that a user has set a trap for
- is caught, the routine "onsig" sets a flag. The routine dotrap
- is called at appropriate points to actually handle the signal.
- When an interrupt is caught and no trap has been set for that
- signal, the routine "onint" in error.c is called.
-
- OUTPUT: Ash uses it's own output routines. There are three out-
- put structures allocated. "Output" represents the standard out-
- put, "errout" the standard error, and "memout" contains output
- which is to be stored in memory. This last is used when a buil-
- tin command appears in backquotes, to allow its output to be col-
- lected without doing any I/O through the UNIX operating system.
- The variables out1 and out2 normally point to output and errout,
- respectively, but they are set to point to memout when appropri-
- ate inside backquotes.
-
- INPUT: The basic input routine is pgetc, which reads from the
- current input file. There is a stack of input files; the current
- input file is the top file on this stack. The code allows the
- input to come from a string rather than a file. (This is for the
- -c option and the "." and eval builtin commands.) The global
- variable plinno is saved and restored when files are pushed and
- popped from the stack. The parser routines store the number of
- the current line in this variable.
-
- DEBUGGING: If DEBUG is defined in shell.h, then the shell will
- write debugging information to the file $HOME/trace. Most of
- this is done using the TRACE macro, which takes a set of printf
- arguments inside two sets of parenthesis. Example:
- "TRACE(("n=%d0, n))". The double parenthesis are necessary be-
- cause the preprocessor can't handle functions with a variable
- number of arguments. Defining DEBUG also causes the shell to
- generate a core dump if it is sent a quit signal. The tracing
- code is in show.c.
-