@** Introduction. This is the \.{CWEAVE} program by Silvio Levy and Donald E. Knuth, based on \.{WEAVE} by Knuth. We are thankful to Steve Avery, Nelson Beebe, Hans-Hermann Bode (to whom the \CPLUSPLUS/ adaptation is due), Klaus Guntermann, Norman Ramsey, Tomas Rokicki, Joachim Schnitter, Joachim Schrod, Lee Wittenberg, and others who have contributed improvements.

The ``banner line'' defined here should be changed whenever \.{CWEAVE} is modified.

@d banner "This is CWEAVE (Version 3.4)\n"

@c @<Include files@>
@h
@<Common code for \.{CWEAVE} and \.{CTANGLE}@>
@<Typedef declarations@>
@<Global variables@>
@<Predeclaration of procedures@>

@ We predeclare several standard system functions here instead of including their system header files, because the names of the header files are not as standard as the names of the functions. (For example, some \CEE/ environments have \.{<string.h>} where others have \.{<strings.h>}.)

@<Predecl...@>=
extern int strlen(); /* length of string */
extern int strcmp(); /* compare strings lexicographically */
extern char* strcpy(); /* copy one string to another */
extern int strncmp(); /* compare up to $n$ string characters */
extern char* strncpy(); /* copy up to $n$ string characters */

@ \.{CWEAVE} has a fairly straightforward outline. It operates in three phases: first it inputs the source file and stores cross-reference data, then it inputs the source once again and produces the \TEX/ output file, and finally it sorts and outputs the index.

Please read the documentation for \.{common}, the set of routines common to \.{CTANGLE} and \.{CWEAVE}, before proceeding further.

@c
int main (ac, av)
int ac; /* argument count */
char **av; /* argument values */
{
  argc=ac; argv=av;
  program=cweave;
  make_xrefs=force_lines=1; /* controlled by command-line options */
  common_init();
  @<Set initial values@>;
  if (show_banner) printf(banner); /* print a ``banner line'' */
  @<Store all the reserved words@>;
  phase_one(); /* read all the user's text and store the cross-references */
  phase_two(); /* read all the text again and translate it to \TEX/ form */
  phase_three(); /* output the cross-reference index */
  return wrap_up(); /* and exit gracefully */
}

@ The following parameters were sufficient in the original \.{WEAVE} to handle \TEX/, so they should be sufficient for most applications of \.{CWEAVE}. If you change |max_bytes|, |max_names|, |hash_size|, or |buf_size|, you must also change them in the file |"common.w"|.

@d max_bytes 90000 /* the number of bytes in identifiers, index entries, and section names */
@d max_names 4000 /* number of identifiers, strings, section names; must be less than 10240; used in |"common.w"| */
@d max_sections 2000 /* greater than the total number of sections */
@d hash_size 353 /* should be prime */
@d buf_size 100 /* maximum length of input line, plus one */
@d longest_name 1000 /* section names and strings shouldn't be longer than this */
@d long_buf_size (buf_size+longest_name)
@d line_length 80 /* lines of \TEX/ output have at most this many characters; should be less than 256 */
@d max_refs 20000 /* number of cross-references; must be less than 65536 */
@d max_toks 20000 /* number of symbols in \CEE/ texts being parsed; must be less than 65536 */
@d max_texts 4000 /* number of phrases in \CEE/ texts being parsed; must be less than 10240 */
@d max_scraps 2000 /* number of tokens in \CEE/ texts being parsed */
@d stack_size 400 /* number of simultaneous output levels */

@ The next few sections contain stuff from the file |"common.w"| that must be included in both |"ctangle.w"| and |"cweave.w"|. It appears in file |"common.h"|, which needs to be updated when |"common.w"| changes.

@i common.h


@* Data structures exclusive to {\tt CWEAVE}. As explained in \.{common.w}, the field of a |name_info| structure that contains the |rlink| of a section name is used for a completely different purpose in the case of identifiers. It is then called the |ilk| of the identifier, and it is used to distinguish between various types of identifiers, as follows:

\yskip\hang |normal| identifiers are part of the \CEE/ program and will appear in italic type.

\yskip\hang |roman| identifiers are index entries that appear after \.{@@\^} in the \.{CWEB} file.

\yskip\hang |wildcard| identifiers are index entries that appear after \.{@@:} in the \.{CWEB} file.

\yskip\hang |typewriter| identifiers are index entries that appear after \.{@@.} in the \.{CWEB} file.

\yskip\hang |else_like|, \dots, |typedef_like| identifiers are \CEE/ reserved words whose |ilk| explains how they are to be treated when \CEE/ code is being formatted.

@d ilk dummy.Ilk
@d normal 0 /* ordinary identifiers have |normal| ilk */
@d roman 1 /* normal index entries have |roman| ilk */
@d wildcard 2 /* user-formatted index entries have |wildcard| ilk */
@d typewriter 3 /* `typewriter type' entries have |typewriter| ilk */
@d abnormal(a) (a->ilk>typewriter) /* tells if a name is special */
@d custom 4 /* identifiers with user-given control sequence */
@d unindexed(a) (a->ilk>custom) /* tells if uses of a name are to be indexed */
@d quoted 5 /* \.{NULL} */
@d else_like 26 /* \&{else} */
@d public_like 40 /* \&{public}, \&{private}, \&{protected} */
@d operator_like 41 /* \&{operator} */
@d new_like 42 /* \&{new} */
@d catch_like 43 /* \&{catch} */
@d for_like 45 /* \&{for}, \&{switch}, \&{while} */
@d do_like 46 /* \&{do} */
@d if_like 47 /* \&{if}, \&{ifdef}, \&{endif}, \&{pragma}, \dots */
@d raw_rpar 48 /* `\.)' or `\.]' when looking for \&{const} following */
@d raw_unorbin 49 /* `\.\&' or `\.*' when looking for \&{const} following */
@d const_like 50 /* \&{const}, \&{volatile} */
@d raw_int 51 /* \&{int}, \&{char}, \&{extern}, \dots */
@d int_like 52 /* same, when not followed by left parenthesis */
@d case_like 53 /* \&{case}, \&{return}, \&{goto}, \&{break}, \&{continue} */
@d sizeof_like 54 /* \&{sizeof} */
@d struct_like 55 /* \&{struct}, \&{union}, \&{enum}, \&{class} */
@d typedef_like 56 /* \&{typedef} */
@d define_like 57 /* \&{define} */

@ We keep track of the current section number in |section_count|, which is the total number of sections that have started. Sections that have been altered by a change file entry have their |changed_section| flag turned on during the first phase.

@<Global...@>=
boolean change_exists; /* has any section changed? */

@ The other large memory area in \.{CWEAVE} keeps the cross-reference data. All uses of the name |p| are recorded in a linked list beginning at |p->xref|, which points into the |xmem| array. The elements of |xmem| are structures consisting of an integer, |num|, and a pointer |xlink| to another element of |xmem|. If |x=p->xref| is a pointer into |xmem|, the value of |x->num| is either a section number where |p| is used, or |cite_flag| plus a section number where |p| is mentioned, or |def_flag| plus a section number where |p| is defined; and |x->xlink| points to the next such cross-reference for |p|, if any. This list of cross-references is in decreasing order by section number. The next unused slot in |xmem| is |xref_ptr|. The linked list ends at |&xmem[0]|.

The global variable |xref_switch| is set either to |def_flag| or to zero, depending on whether the next cross-reference to an identifier is to be underlined or not in the index. This switch is set to |def_flag| when \.{@@!} or \.{@@d} is scanned, and it is cleared to zero when the next identifier or index entry cross-reference has been made. Similarly, the global variable |section_xref_switch| is either |def_flag| or |cite_flag| or zero, depending on whether a section name is being defined, cited, or used in \CEE/ text.

@<Type...@>=
typedef struct xref_info {
  sixteen_bits num; /* section number plus zero or |def_flag| */
  struct xref_info *xlink; /* pointer to the previous cross-reference */
} xref_info;
typedef xref_info *xref_pointer;

@ @<Global...@>=
xref_info xmem[max_refs]; /* contains cross-reference information */
xref_pointer xmem_end = xmem+max_refs-1;
xref_pointer xref_ptr; /* the largest occupied position in |xmem| */
sixteen_bits xref_switch,section_xref_switch; /* either zero or |def_flag| */

@ A section that is used for multi-file output (with the \.{@@(} feature) has a special first cross-reference whose |num| field is |file_flag|.

@d file_flag (3*cite_flag)
@d def_flag (2*cite_flag)
@d cite_flag 10240 /* must be strictly larger than |max_sections| */
@d xref equiv_or_xref

@<Set init...@>=
xref_ptr=xmem; name_dir->xref=(char*)xmem; xref_switch=0; section_xref_switch=0;
xmem->num=0; /* sentinel value */
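@ The flag arithmetic above can be sketched as follows. This is a minimal illustration, not code from \.{CWEAVE}: the helper names |xref_flag_of| and |xref_section_of| are hypothetical, and the sketch assumes that every section number is smaller than |cite_flag|, as the comment on |cite_flag| requires.

```c
#include <assert.h>

/* Constants copied from the definitions above. */
#define cite_flag 10240
#define def_flag (2*cite_flag)
#define file_flag (3*cite_flag)

/* Hypothetical helper: which flag is folded into a |num| value? */
static unsigned xref_flag_of(unsigned num)
{
  assert(num <= file_flag); /* larger values never occur in this scheme */
  if (num >= file_flag) return file_flag;
  if (num >= def_flag) return def_flag;
  if (num >= cite_flag) return cite_flag;
  return 0; /* a plain use */
}

/* Hypothetical helper: the section number that remains without the flag. */
static unsigned xref_section_of(unsigned num)
{
  return num - xref_flag_of(num);
}
```

Under these assumptions, a |num| of |def_flag+17| denotes a definition in section 17, and all four kinds of entry fit in sixteen bits because |file_flag| is only 30720.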

@ A new cross-reference for an identifier is formed by calling |new_xref|, which discards duplicate entries and ignores non-underlined references to one-letter identifiers or \CEE/'s reserved words.

If the user has set the |no_xref| flag (the \.{-x} option of the command line), it is unnecessary to keep track of cross-references for identifiers. If one were careful, one could probably make more changes around section 100 to avoid a lot of identifier looking up.

@d append_xref(c) if (xref_ptr==xmem_end) overflow("cross-reference");
  else (++xref_ptr)->num=c;
@d no_xref (flags['x']==0)
@d make_xrefs flags['x'] /* should cross references be output? */
@d is_tiny(p) ((p+1)->byte_start==(p)->byte_start+1)

@c
void
new_xref(p)
name_pointer p;
{
  xref_pointer q; /* pointer to previous cross-reference */
  sixteen_bits m, n; /* new and previous cross-reference value */
  if (no_xref) return;
  if ((unindexed(p) || is_tiny(p)) && xref_switch==0) return;
  m=section_count+xref_switch; xref_switch=0; q=(xref_pointer)p->xref;
  if (q != xmem) {
    n=q->num;
    if (n==m || n==m+def_flag) return;
    else if (m==n+def_flag) {
        q->num=m; return;
    }
  }
  append_xref(m); xref_ptr->xlink=q; p->xref=(char*)xref_ptr;
}
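@ The duplicate test in |new_xref| can be tried out in isolation. The following is a simplified model under stated assumptions, not the routine itself: |record_xref| and |MAX_REFS| are hypothetical names, the list head is passed and returned explicitly instead of living in |p->xref|, and the |no_xref|, |unindexed|, and |is_tiny| filters are left out.

```c
#include <assert.h>

#define def_flag 20480 /* same value as 2*cite_flag above */
#define MAX_REFS 16    /* hypothetical small capacity for the sketch */

typedef struct xref_info {
  unsigned short num;
  struct xref_info *xlink;
} xref_info;

static xref_info xmem[MAX_REFS]; /* xmem[0] terminates every list */
static xref_info *xref_ptr = xmem; /* largest occupied position */

/* Record a use (flag==0) or a definition (flag==def_flag) in |section|;
   the return value is the possibly new head of the list. */
static xref_info *record_xref(xref_info *head, unsigned section, unsigned flag)
{
  unsigned m = section + flag;
  assert(flag == 0 || flag == def_flag);
  if (head != xmem) {
    unsigned n = head->num;
    if (n == m || n == m + def_flag) return head; /* already recorded */
    if (m == n + def_flag) { /* a use becomes a definition */
      head->num = (unsigned short)m; return head;
    }
  }
  assert(xref_ptr < xmem + MAX_REFS - 1); /* stands in for |overflow| */
  (++xref_ptr)->num = (unsigned short)m;
  xref_ptr->xlink = head;
  return xref_ptr;
}
```

Only the head of the list is inspected, which suffices because sections are processed in increasing order; a repeated use within one section is dropped, and a definition in the same section overwrites the earlier use instead of adding an entry.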

@ The cross-reference lists for section names are slightly different. Suppose that a section name is defined in sections $m_1$, \dots, $m_k$, cited in sections $n_1$, \dots, $n_l$, and used in sections $p_1$, \dots, $p_j$. Then its list will contain $m_1+|def_flag|$, \dots, $m_k+|def_flag|$, $n_1+|cite_flag|$, \dots, $n_l+|cite_flag|$, $p_1$, \dots, $p_j$, in this order.

Although this method of storage takes quadratic time in the length of the list, under foreseeable uses of \.{CWEAVE} this inefficiency is insignificant.

@c
void
new_section_xref(p)
name_pointer p;
{
  xref_pointer q,r; /* pointers to previous cross-references */
  q=(xref_pointer)p->xref; r=xmem;
  if (q>xmem)
    while (q->num>section_xref_switch) {r=q; q=q->xlink;}
  if (r->num==section_count+section_xref_switch)
    return; /* don't duplicate entries */
  append_xref(section_count+section_xref_switch);
  xref_ptr->xlink=q; section_xref_switch=0;
  if (r==xmem) p->xref=(char*)xref_ptr;
  else r->xlink=xref_ptr;
}
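@ The ordering that |new_section_xref| maintains can be checked with a small stand-alone model. The names |sxref|, |smem|, |sref_ptr|, |MAX_REFS|, and |record_section_xref| are hypothetical, and the list head is passed explicitly; the walk and the insertion follow the routine above.

```c
#include <assert.h>

#define cite_flag 10240
#define def_flag (2*cite_flag)
#define MAX_REFS 16 /* hypothetical small capacity for the sketch */

typedef struct sxref {
  unsigned short num;
  struct sxref *xlink;
} sxref;

static sxref smem[MAX_REFS]; /* smem[0].num==0 terminates every list */
static sxref *sref_ptr = smem;

/* Insert |section+flag| behind all entries whose |num| exceeds |flag|,
   so definitions precede citations, which precede plain uses. */
static sxref *record_section_xref(sxref *head, unsigned section, unsigned flag)
{
  sxref *q = head, *r = smem;
  assert(flag == 0 || flag == cite_flag || flag == def_flag);
  if (q > smem)
    while (q->num > flag) { r = q; q = q->xlink; }
  if (r->num == section + flag) return head; /* don't duplicate entries */
  assert(sref_ptr < smem + MAX_REFS - 1); /* stands in for |overflow| */
  (++sref_ptr)->num = (unsigned short)(section + flag);
  sref_ptr->xlink = q;
  if (r == smem) return sref_ptr; /* new entry becomes the head */
  r->xlink = sref_ptr;
  return head;
}
```

Each call walks the list from its head, which is where the quadratic cost mentioned above comes from.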

@ The cross-reference list for a section name may also begin with |file_flag|. Here's how that flag gets put~in.

@c
void
set_file_flag(p)
name_pointer p;
{
  xref_pointer q;
  q=(xref_pointer)p->xref;
  if (q->num==file_flag) return;
  append_xref(file_flag);
  xref_ptr->xlink = q;
  p->xref = (char *)xref_ptr;
}

@ A third large area of memory is used for sixteen-bit `tokens', which appear in short lists similar to the strings of characters in |byte_mem|. Token lists are used to contain the result of \CEE/ code translated into \TEX/ form; further details about them will be explained later. A |text_pointer| variable is an index into |tok_start|.

@<Typed...@>=
typedef sixteen_bits token;
typedef token *token_pointer;
typedef token_pointer *text_pointer;

@ The first position of |tok_mem| that is unoccupied by replacement text is called |tok_ptr|, and the first unused location of |tok_start| is called |text_ptr|. Thus, we usually have |*text_ptr==tok_ptr|.

@<Global...@>=
token tok_mem[max_toks]; /* tokens */
token_pointer tok_mem_end = tok_mem+max_toks-1; /* end of |tok_mem| */
token_pointer tok_start[max_texts]; /* directory into |tok_mem| */
token_pointer tok_ptr; /* first unused position in |tok_mem| */
text_pointer text_ptr; /* first unused position in |tok_start| */
text_pointer tok_start_end = tok_start+max_texts-1; /* end of |tok_start| */
token_pointer max_tok_ptr; /* largest value of |tok_ptr| */
text_pointer max_text_ptr; /* largest value of |text_ptr| */

@ @<Set init...@>=
tok_ptr=tok_mem+1; text_ptr=tok_start+1; tok_start[0]=tok_mem+1;
tok_start[1]=tok_mem+1;
max_tok_ptr=tok_mem+1; max_text_ptr=tok_start+1;
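@ The relation between |tok_mem| and its directory |tok_start| may be easier to see in a small model. The sketch below assumes that the tokens of text |p| occupy the positions from |*p| up to but not including |*(p+1)|; the helpers |init_tokens|, |append_token|, |close_text|, and |text_length| and the small capacities are hypothetical and are not taken from this part of the program.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short token;
typedef token *token_pointer;
typedef token_pointer *text_pointer;

#define MAX_TOKS 64 /* hypothetical small capacities */
#define MAX_TEXTS 8

static token tok_mem[MAX_TOKS];
static token_pointer tok_start[MAX_TEXTS]; /* directory into tok_mem */
static token_pointer tok_ptr;              /* first unused token slot */
static text_pointer text_ptr;              /* first unused directory slot */

static void init_tokens(void) /* mirrors the initial values above */
{
  tok_ptr = tok_mem + 1; text_ptr = tok_start + 1;
  tok_start[0] = tok_mem + 1; tok_start[1] = tok_mem + 1;
}

static void append_token(token t)
{
  assert(tok_ptr < tok_mem + MAX_TOKS - 1);
  *tok_ptr++ = t;
}

/* Close the text under construction and return a pointer to it. */
static text_pointer close_text(void)
{
  text_pointer finished = text_ptr;
  assert(text_ptr < tok_start + MAX_TEXTS - 1);
  *(++text_ptr) = tok_ptr; /* the next text starts where this one ends */
  return finished;
}

static size_t text_length(text_pointer p)
{
  return (size_t)(*(p + 1) - *p);
}
```

After every call of |close_text| the invariant |*text_ptr==tok_ptr| holds again, which is the situation described above.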

@ Here are the three procedures needed to complete |id_lookup|:

@c
int names_match(p,first,l,t)
name_pointer p; /* points to the proposed match */
char *first; /* position of first character of string */
int l; /* length of identifier */
eight_bits t; /* desired ilk */
{
  if (length(p)!=l) return 0;
  if (p->ilk!=t && !(t==normal && abnormal(p))) return 0;
  return !strncmp(first,p->byte_start,l);
}

void
init_p(p,t)
name_pointer p;
eight_bits t;
{
  p->ilk=t; p->xref=(char*)xmem;
}

void
init_node(p)
name_pointer p;
{
  p->xref=(char*)xmem;
}
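@ The ilk rule in |names_match| deserves a small illustration: a lookup with |normal| ilk also accepts an entry whose ilk is abnormal, so that an identifier spelled like a reserved word finds the reserved word. The sketch below is a simplified model; the |name_entry| type with a NUL-terminated |text| field and the functions |ilk_is_abnormal| and |entry_matches| are hypothetical stand-ins for |name_info|, |abnormal|, and |names_match|.

```c
#include <assert.h>
#include <string.h>

enum { normal = 0, roman = 1, wildcard = 2, typewriter = 3,
       custom = 4, quoted = 5, else_like = 26 };

typedef struct { /* hypothetical simplification of a name entry */
  const char *text;
  unsigned char ilk;
} name_entry;

static int ilk_is_abnormal(unsigned char ilk)
{
  return ilk > typewriter;
}

/* Does entry |p| match the |l| characters at |first| with desired ilk |t|? */
static int entry_matches(const name_entry *p, const char *first,
                         size_t l, unsigned char t)
{
  assert(p != NULL && first != NULL);
  if (strlen(p->text) != l) return 0;
  if (p->ilk != t && !(t == normal && ilk_is_abnormal(p->ilk))) return 0;
  return strncmp(first, p->text, l) == 0;
}
```

In this model an entry for \&{else} with ilk |else_like| matches a |normal| lookup of the four characters \.{else}, whereas an index entry of ilk |roman| with the same spelling does not.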

@ We have to get \CEE/'s reserved words into the hash table, and the simplest way to do this is to insert them every time \.{CWEAVE} is run. Fortunately there are relatively few reserved words. (Some of these are not strictly ``reserved,'' but are defined in header files of the ISO Standard \CEE/ Library.)
@^reserved words@>

@<Store all the reserved words@>=
id_lookup("asm",NULL,sizeof_like);
id_lookup("auto",NULL,int_like);
id_lookup("break",NULL,case_like);
id_lookup("case",NULL,case_like);
id_lookup("catch",NULL,catch_like);
id_lookup("char",NULL,raw_int);
id_lookup("class",NULL,struct_like);
id_lookup("clock_t",NULL,raw_int);
id_lookup("const",NULL,const_like);
id_lookup("continue",NULL,case_like);
id_lookup("default",NULL,case_like);
id_lookup("define",NULL,define_like);
id_lookup("defined",NULL,sizeof_like);
id_lookup("delete",NULL,sizeof_like);
id_lookup("div_t",NULL,raw_int);
id_lookup("do",NULL,do_like);
id_lookup("double",NULL,raw_int);
id_lookup("elif",NULL,if_like);
id_lookup("else",NULL,else_like);
id_lookup("endif",NULL,if_like);
id_lookup("enum",NULL,struct_like);
id_lookup("error",NULL,if_like);
id_lookup("extern",NULL,int_like);
id_lookup("FILE",NULL,raw_int);
id_lookup("float",NULL,raw_int);
id_lookup("for",NULL,for_like);
id_lookup("fpos_t",NULL,raw_int);
id_lookup("friend",NULL,int_like);
id_lookup("goto",NULL,case_like);
id_lookup("if",NULL,if_like);
id_lookup("ifdef",NULL,if_like);
id_lookup("ifndef",NULL,if_like);
id_lookup("include",NULL,if_like);
id_lookup("inline",NULL,int_like);
id_lookup("int",NULL,raw_int);
id_lookup("jmp_buf",NULL,raw_int);
id_lookup("ldiv_t",NULL,raw_int);
id_lookup("line",NULL,if_like);
id_lookup("long",NULL,raw_int);
id_lookup("new",NULL,new_like);
id_lookup("NULL",NULL,quoted);
id_lookup("offsetof",NULL,sizeof_like);
id_lookup("operator",NULL,operator_like);
id_lookup("pragma",NULL,if_like);
id_lookup("private",NULL,public_like);
id_lookup("protected",NULL,public_like);
id_lookup("ptrdiff_t",NULL,raw_int);
id_lookup("public",NULL,public_like);
id_lookup("register",NULL,int_like);
id_lookup("return",NULL,case_like);
id_lookup("short",NULL,raw_int);
id_lookup("sig_atomic_t",NULL,raw_int);
id_lookup("signed",NULL,raw_int);
id_lookup("size_t",NULL,raw_int);
id_lookup("sizeof",NULL,sizeof_like);
id_lookup("static",NULL,int_like);
id_lookup("struct",NULL,struct_like);
id_lookup("switch",NULL,for_like);
id_lookup("template",NULL,int_like);
id_lookup("TeX",NULL,custom);
id_lookup("this",NULL,quoted);
id_lookup("throw",NULL,case_like);
id_lookup("time_t",NULL,raw_int);
id_lookup("try",NULL,else_like);
id_lookup("typedef",NULL,typedef_like);
id_lookup("undef",NULL,if_like);
id_lookup("union",NULL,struct_like);
id_lookup("unsigned",NULL,raw_int);
id_lookup("va_dcl",NULL,decl); /* Berkeley's variable-arg-list convention */
id_lookup("va_list",NULL,raw_int); /* ditto */
id_lookup("virtual",NULL,int_like);
id_lookup("void",NULL,raw_int);
id_lookup("volatile",NULL,const_like);
id_lookup("wchar_t",NULL,raw_int);
id_lookup("while",NULL,for_like);


@** Lexical scanning. Let us now consider the subroutines that read the \.{CWEB} source file and break it into meaningful units. There are four such procedures: One simply skips to the next `\.{@@\ }' or `\.{@@*}' that begins a section; another passes over the \TEX/ text at the beginning of a section; the third passes over the \TEX/ text in a \CEE/ comment; and the last, which is the most interesting, gets the next token of a \CEE/ text. They all use the pointers |limit| and |loc| into the line of input currently being studied.

@ Control codes in \.{CWEB}, which begin with `\.{@@}', are converted into a numeric code designed to simplify \.{CWEAVE}'s logic; for example, larger numbers are given to the control codes that denote more significant milestones, and the code of |new_section| should be the largest of all. Some of these numeric control codes take the place of |char| control codes that will not otherwise appear in the output of the scanning routines.
@^ASCII code dependencies@>

@d ignore 00 /* control code of no interest to \.{CWEAVE} */
@d verbatim 02 /* takes the place of extended ASCII \.{\char2} */
@d begin_short_comment 03 /* \CPLUSPLUS/ short comment */
@d begin_comment '\t' /* tab marks will not appear */
@d underline '\n' /* this code will be intercepted without confusion */
@d noop 0177 /* takes the place of ASCII delete */
@d xref_roman 0203 /* control code for `\.{@@\^}' */
@d xref_wildcard 0204 /* control code for `\.{@@:}' */
@d xref_typewriter 0205 /* control code for `\.{@@.}' */
@d TeX_string 0206 /* control code for `\.{@@t}' */
@f TeX_string TeX
@d ord 0207 /* control code for `\.{@@'}' */
@d join 0210 /* control code for `\.{@@\&}' */
@d thin_space 0211 /* control code for `\.{@@,}' */
@d math_break 0212 /* control code for `\.{@@\v}' */
@d line_break 0213 /* control code for `\.{@@/}' */
@d big_line_break 0214 /* control code for `\.{@@\#}' */
@d no_line_break 0215 /* control code for `\.{@@+}' */
@d pseudo_semi 0216 /* control code for `\.{@@;}' */
@d macro_arg_open 0220 /* control code for `\.{@@[}' */
@d macro_arg_close 0221 /* control code for `\.{@@]}' */
@d trace 0222 /* control code for `\.{@@0}', `\.{@@1}' and `\.{@@2}' */
@d translit_code 0223 /* control code for `\.{@@l}' */
@d output_defs_code 0224 /* control code for `\.{@@h}' */
@d format_code 0225 /* control code for `\.{@@f}' and `\.{@@s}' */
@d definition 0226 /* control code for `\.{@@d}' */
@d begin_C 0227 /* control code for `\.{@@c}' */
@d section_name 0230 /* control code for `\.{@@<}' */
@d new_section 0231 /* control code for `\.{@@\ }' and `\.{@@*}' */

@ Control codes are converted to \.{CWEAVE}'s internal representation by means of the table |ccode|.

@<Global...@>=
eight_bits ccode[256]; /* meaning of a char following \.{@@} */

@ @<Set ini...@>=
{int c; for (c=0; c<256; c++) ccode[c]=0;}
ccode[' ']=ccode['\t']=ccode['\n']=ccode['\v']=ccode['\r']=ccode['\f']
   =ccode['*']=new_section;
ccode['@@']='@@'; /* `quoted' at sign */
ccode['=']=verbatim;
ccode['d']=ccode['D']=definition;
ccode['f']=ccode['F']=ccode['s']=ccode['S']=format_code;
ccode['c']=ccode['C']=ccode['p']=ccode['P']=begin_C;
ccode['t']=ccode['T']=TeX_string;
ccode['l']=ccode['L']=translit_code;
ccode['q']=ccode['Q']=noop;
ccode['h']=ccode['H']=output_defs_code;
ccode['&']=join; ccode['<']=ccode['(']=section_name;
ccode['!']=underline; ccode['^']=xref_roman;
ccode[':']=xref_wildcard; ccode['.']=xref_typewriter; ccode[',']=thin_space;
ccode['|']=math_break; ccode['/']=line_break; ccode['#']=big_line_break;
ccode['+']=no_line_break; ccode[';']=pseudo_semi;
ccode['[']=macro_arg_open; ccode[']']=macro_arg_close;
ccode['\'']=ord;
@<Special control codes for debugging@>@;

@ Users can write \.{@@2}, \.{@@1}, and \.{@@0} to turn tracing fully on, partly on, and off, respectively.

@<Special control codes...@>=
ccode['0']=ccode['1']=ccode['2']=trace;
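@ The table lookup can be exercised on its own. The sketch below fills in only a handful of the entries defined above and uses ordinary \CEE/ character constants rather than \.{CWEB} source conventions; the function names |init_ccode_subset| and |control_code| are hypothetical.

```c
#include <assert.h>

typedef unsigned char eight_bits;

/* A few of the codes defined above, with the same octal values. */
enum { ignore = 0, verbatim = 02, noop = 0177, join = 0210,
       definition = 0226, section_name = 0230, new_section = 0231 };

static eight_bits ccode[256]; /* meaning of a char following an at sign */

static void init_ccode_subset(void)
{
  int c;
  for (c = 0; c < 256; c++) ccode[c] = ignore;
  ccode[' '] = ccode['\t'] = ccode['\n'] = ccode['*'] = new_section;
  ccode['@'] = '@'; /* quoted at sign */
  ccode['='] = verbatim;
  ccode['d'] = ccode['D'] = definition;
  ccode['q'] = ccode['Q'] = noop;
  ccode['&'] = join;
  ccode['<'] = ccode['('] = section_name;
}

/* Classify the control code whose at sign is at |at_sign|. */
static eight_bits control_code(const char *at_sign)
{
  assert(*at_sign == '@');
  return ccode[(eight_bits)at_sign[1]];
}
```

The cast to |eight_bits| before indexing matters on systems where |char| is signed, since a character above 0177 would otherwise yield a negative index. Characters with no entry fall back to |ignore|.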

@ The |skip_limbo| routine is used on the first pass to skip through portions of the input that are not in any sections, i.e., that precede the first section. After this procedure has been called, the value of |input_has_ended| will tell whether or not a section has actually been found.

There's a complication that we will postpone until later: If the \.{@@s} operation appears in limbo, we want to use it to adjust the default interpretation of identifiers.

@<Predec...@>=
void skip_limbo();

@ @c
void
skip_limbo() {
  while(1) {
    if (loc>limit && get_line()==0) return;
    *(limit+1)='@@';
    while (*loc!='@@') loc++; /* look for '@@', then skip two chars */
    if (loc++ <=limit) { int c=ccode[(eight_bits)*loc++];
      if (c==new_section) return;
      if (c==noop) skip_restricted();
      else if (c==format_code) @<Process simple format in limbo@>;
    }
  }
}

@ The |skip_TeX| routine is used on the first pass to skip through the \TEX/ code at the beginning of a section. It returns the next control code or `\.{\v}' found in the input. A |new_section| is assumed to exist at the very end of the file.

@f skip_TeX TeX

@c
unsigned
skip_TeX() /* skip past pure \TEX/ code */
{
  while (1) {
    if (loc>limit && get_line()==0) return(new_section);
    *(limit+1)='@@';
    while (*loc!='@@' && *loc!='|') loc++;
    if (*loc++ =='|') return('|');
    if (loc<=limit) return(ccode[(eight_bits)*(loc++)]);
  }
}
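@ Both |skip_limbo| and |skip_TeX| plant an at sign just past the end of the line so that the inner scanning loop needs no bounds test. Here is a minimal stand-alone sketch of that sentinel technique, written with ordinary \CEE/ character constants. It assumes that |limit| points just past the last character of the line; |buffer|, |BUF_SIZE|, |load_line|, and |find_at_or_bar| are hypothetical names, and |load_line| merely stands in for |get_line|.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 100 /* hypothetical line capacity */

static char buffer[BUF_SIZE + 2]; /* room for the blank and the sentinel */
static char *loc, *limit;

/* Stand-in for reading a line: copy |text| into the buffer. */
static void load_line(const char *text)
{
  size_t n = strlen(text);
  assert(n < BUF_SIZE);
  memcpy(buffer, text, n);
  loc = buffer;
  limit = buffer + n; /* one past the last character of the line */
}

/* Advance |loc| to the next at sign or vertical bar on the line;
   return its position, or NULL if the line contains neither. */
static char *find_at_or_bar(void)
{
  *limit = ' ';
  *(limit + 1) = '@'; /* sentinel: the loop must stop here at the latest */
  while (*loc != '@' && *loc != '|') loc++;
  return loc <= limit ? loc : NULL;
}
```

When the scan stops on the sentinel, |loc| exceeds |limit|, which is the condition the routines above use to decide that a new line has to be read.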


@*1 Inputting the next token. As stated above, \.{CWEAVE}'s most interesting lexical scanning routine is the |get_next| function that inputs the next token of \CEE/ input. However, |get_next| is not especially complicated.

The result of |get_next| is either a |char| code for some special character, or it is a special code representing a pair of characters (e.g., `\.{!=}'), or it is the numeric value computed by the |ccode| table, or it is one of the following special codes:

\yskip\hang |identifier|: In this case the global variables |id_first| and |id_loc| will have been set to the beginning and ending-plus-one locations in the buffer, as required by the |id_lookup| routine.

\yskip\hang |string|: The string will have been copied into the array |section_text|; |id_first| and |id_loc| are set as above (now they are pointers into |section_text|).

\yskip\hang |constant|: The constant is copied into |section_text|, with slight modifications; |id_first| and |id_loc| are set.

\yskip\noindent Furthermore, some of the control codes cause |get_next| to take additional actions:

\yskip\hang |xref_roman|, |xref_wildcard|, |xref_typewriter|, |TeX_string|, |verbatim|: The values of |id_first| and |id_loc| will have been set to the beginning and ending-plus-one locations in the buffer.

\yskip\hang |section_name|: In this case the global variable |cur_section| will point to the |byte_start| entry for the section name that has just been scanned. The value of |cur_section_char| will be |'('| if the section name was preceded by \.{@@(} instead of \.{@@<}.

\yskip\noindent If |get_next| sees `\.{@@!}' it sets |xref_switch| to |def_flag| and goes on to the next token.

@d constant 0200 /* \CEE/ constant */
@d string 0201 /* \CEE/ string */
@d identifier 0202 /* \CEE/ identifier or reserved word */

@<Global...@>=
name_pointer cur_section; /* name of section just scanned */
char cur_section_char; /* the character just before that name */

@ @<Include...@>=
#include <ctype.h> /* definition of |isalpha|, |isdigit| and so on */
#include <stdlib.h> /* definition of |exit| */

@ As one might expect, |get_next| consists mostly of a big switch that branches to the various special cases that can arise.

@d isxalpha(c) ((c)=='_') /* non-alpha character allowed in identifier */
@d ishigh(c) ((eight_bits)(c)>0177)

