TREE & FOX
Explorations in computer aided natural language analysis
Manfred Jahn
English Department
University of Cologne
1993
TREES     A program to create graphs and phrase markers.
TREECAD   A utility for generating, manipulating and exploring X-bar trees,
          transformations and cross-language variation.
FOX       A "Frame Oriented X-bar Parser" which parses sentences in
          interactive or automatic mode.
TREE & FOX documents three PC-based programs aimed at processing
linguistic data structures. The programs run under non-386 MS-DOS ICON
(from version 8.0). Due to the experimental and provisional nature of
the programs the author makes no warranties of any kind as to their
robustness or suitability for any application.
Notes:
======
(1) In this file, asterisks (*) are used to mark strings that are
italicized in the original printed output (available from the
author).
(2) If you print this document, make sure to set a nonproportional font
such as Courier or Letter Gothic.
1. TREES - a tree drawing utility
1.1. Structural representations. The basic data type of syntactic analysis
is the directed graph. There are two common representations: labelled
bracketings and trees. Labelled bracketing quickly tends to become
obscure even with trees of moderate complexity. A tree is a far superior
representation, but it takes up more space and is expensive to print.
Consider the following representations:
a. [NP [Det the] [Nbar [AP very lucky] [Nbar [N girl] ] ] ]
[category identifiers appear as subscripts in the original
printing]
b. (NP,(Det,the),(Nbar,(AP,very lucky),(Nbar,(N,girl))))
c.
       NP
  ┌─────┴──────┐
 Det         Nbar
  │        ┌───┴─────┐
  │       AP       Nbar
  │        │         │
  │        │         N
  │        │         │
 the  very lucky   girl
(a) is a type of labelled bracketing frequently found in linguistic
textbooks. (b) is a straightforward mapping of (a) into a plain string
format. Directly or indirectly, it serves as the basic input data
structure for all of the utility programs introduced in this report.
Note that in representation (b) every label that directly follows an
opening bracket is a nonterminal category (NP, Det, Nbar etc.), and every
item that directly follows a comma is a terminal item. (c) has been
generated from (b), and it is clearly the most easily comprehensible
representation of the structural relationships involved.
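As an illustration of format (b), the following Python sketch reads a
labelled bracketing string into nested lists. It is not taken from the
Trees program, which is written in Icon; the function names
parse_bracketing and leaves and the nested-list representation are
assumptions made for this example only.

```python
def parse_bracketing(text):
    """Parse a plan such as '(NP,(Det,the),(N,girl))' into nested lists.

    Every bracketed group becomes [label, child, child, ...]; every other
    comma-separated item is kept as a plain string (a terminal item).
    """
    pos = 0

    def peek():
        # Current character, or "" once the end of the string is reached.
        return text[pos] if pos < len(text) else ""

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    def parse_item():
        nonlocal pos
        if peek() != "(":
            return read_label()          # a terminal item
        pos += 1                         # skip "("
        node = [read_label()]            # the nonterminal category label
        while peek() == ",":
            pos += 1                     # skip ","
            node.append(parse_item())
        if peek() != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1                         # skip ")"
        return node

    tree = parse_item()
    if pos != len(text):
        raise ValueError("unexpected characters after the end of the plan")
    return tree


def leaves(tree):
    """Return the terminal items of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(leaves(child))
    return result


plan = "(NP,(Det,the),(Nbar,(AP,very lucky),(Nbar,(N,girl))))"
print(parse_bracketing(plan))
print(leaves(parse_bracketing(plan)))    # ['the', 'very lucky', 'girl']
```

Multi-word terminals such as "very lucky" survive intact because only
brackets and commas act as delimiters.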
1.2. Input/Output. Input for program Trees comes from a plain text file
called trees.in. This file can contain any number of "tree plans" in the
formats specified below. The trees generated from these plans are
displayed on the screen and saved to an output file called trees.out. Two
input formats may be used:
a. Single lines of labelled bracketing (as in 1.1b).
b. A sequence of lines with indentations representing the tree
structure:
NP
  Det
    the
  Nbar
    AP
      very lucky
    Nbar
      N
        girl
The root category must appear in column 1. Subordinate levels
are indicated by progressive indentations of two spaces. Phrases
consisting of several words (e.g., "very lucky") are acceptable
node labels. Do not use round brackets or commas. Most higher
ASCII characters (particularly 250 and up) should also be avoided.
c. Successive tree plans must be separated by one or more blank
lines. Lines beginning with a hash character (#) are treated as
comments. Use a file lister to view some sample plans in trees.in.
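The relation between the two input formats can be sketched in Python as
follows. The function name indented_to_bracketing is hypothetical and the
code is not part of Trees; it assumes a single tree plan, indentation by
spaces only (two per level), and comment lines starting with "#".

```python
def indented_to_bracketing(lines, indent=2):
    """Convert one indented tree plan into a labelled bracketing string."""
    # Collect (depth, label) pairs, skipping blank lines and comments.
    items = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent
        items.append((depth, stripped.rstrip()))

    out = []
    open_depths = []    # depths of the brackets that are still open
    for i, (depth, label) in enumerate(items):
        # Close every open node that is not an ancestor of this item.
        while open_depths and open_depths[-1] >= depth:
            out.append(")")
            open_depths.pop()
        if out:
            out.append(",")
        has_children = i + 1 < len(items) and items[i + 1][0] > depth
        if has_children:
            out.append("(" + label)     # nonterminal: open a bracket
            open_depths.append(depth)
        else:
            out.append(label)           # terminal item
    out.append(")" * len(open_depths))  # close whatever is still open
    return "".join(out)


plan = [
    "# sample plan",
    "NP",
    "  Det",
    "    the",
    "  Nbar",
    "    AP",
    "      very lucky",
    "    Nbar",
    "      N",
    "        girl",
]
print(indented_to_bracketing(plan))
# (NP,(Det,the),(Nbar,(AP,very lucky),(Nbar,(N,girl))))
```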
1.3. Invoke the program with the command line iconx trees. The following
parameters are then requested by the program:
a. Terminal nodes on baseline or *in situ*.
b. The depth of the tree (default is 8). Since Trees is a small
program and memory is the only limitation, trees can be built to
considerable depths.
c. Optional: Tab offset and increments - see below.
1.4. Postediting for proportional fonts. The output trees will display
correctly on the computer's text screen or if printed with a monospaced
(nonproportional) typeface. Trees also offers limited support for
proportionally spaced output such as the following:
[sorry, unable to display this here; refer to original printed text]
Unfortunately, this type of output requires a certain amount of
postediting. To begin with, your printer must have a monospaced typeface
and a proportional typeface of roughly the same dimensions. Test this by
printing or previewing a couple of sample lines with several typefaces. On
HP type printers, viable combinations include Courier 16.67/Times Roman
10 pt and Courier 12/Times 12pt. The following notes assume the Courier
12/Times 12 configuration.
The basic idea is to superimpose proportional node labels onto a
monospaced scaffold of pseudographics lines. Under WordPerfect 5.1, this
involves the following steps:
a. In WordPerfect, set the monospaced font (Courier 12). Also, via
the Setup option (Shift-F1,3,8), select a small unit of
measurement for the position display, preferably point sizes (pt).
As you can see on the status line, a left margin of 1 inch is
equivalent to a horizontal offset of 72 pt. Type one space, and
under Courier 12 the cursor will move in increments of 6 pts.
Verify this on the status line.
b. Run trees. At the "Calculate Tabstops" prompt press any key
except ENTER. The program will now ask for an increment value,
the left margin setting, an indent factor and a proportional
adjustment. The defaults suggested by the program are 6, 72, 1
and 4, which happen to be right for 12 c.p.i. Courier/Times 12 pt.
(Set 4.32, 72, 1 and 2.6 for Courier 16.67/Times 10.) The indent
value can be used to move the tree towards the middle of the
page. The proportional adjustment varies with different typefaces
and has to be determined by trial and error.
c. Trees produces the following output:
Branches in columns: 12 18 21 25 31
Tabstops from Margin offset: 10; by Increments: 6
Set center Tabs at: 142 178 196 220 256
NP
142 178 220
┌─────┴──────┐
Det Nbar
142 196 220 256
│ ┌───┴─────┐
│ AP N
142 178196 220 256
│ ┌──┴───┐ │
│ │ Abar │
142 178 220 256
│ │ │ │
│ │ A │
142 178 220 256
│ │ │ │
a very lucky girl
The figures flush to the branches provide a visual cue (needed for
step f, below) as to which edges are associated with which
tabstops.
d. Back in WordPerfect, set the monospaced font (Courier 12) and
the standard fixed line height appropriate for this font. Import the
tree (from trees.out) together with its list of tab stop positions.
e. Move below the imported list of tab stop positions. Load the Tabs
Menu. Make sure that the tab type is "absolute". Set the tabs;
first by clearing them (Ctrl-End), then by entering the values
provided by trees. By default, WordPerfect sets "left align" (L)
tabs. You need center tab stops, however, so simply place the
cursor on the newly created "L"s in the Tabs line and change
them to "C"s. Exit the tabs menu.
f. Scroll past the monospaced tree and set the Times 12 font. Enter
the text of the nodes tabbing to the proper branch positions
indicated in the tree and leaving blank lines as needed. What you
want is an image of the tree consisting of the labels of the nodes
only, in their proper positions, but without any of the
pseudographics. Additional features such as bold, underlining,
superscripting, italicizing etc. can all be set, now or later,
providing considerable flexibility.
g. Move back to the monospaced image of the tree. Turn on
"typeover" mode and, using spaces, overwrite all monospaced text,
including the tab position cues, until only a bare scaffold of
pseudographics lines remains. Turn off typeover mode when
finished.
h. Make a note of the line position of the first line of the
monospaced tree. Superimpose the Times Roman section by
calling WordPerfect's "Advance to Line" (Shift-F8,4,1,3) function.
Switch over to previewing mode to check alignment, and correct
any mistakes.
i. Once you have mastered the basic technique, it is worth
considering putting all trees into "text boxes" which in
WordPerfect define their own set of tab stops and, more
importantly, have an independent line positioning feature which
is not affected by any editing changes in the main text. Since tab
settings in text boxes are calculated relative to a user-specified
horizontal position, trees should be instructed to calculate tab
stops from a left margin of zero.
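The tab stop figures shown in step c are consistent with a simple rule:
left margin, plus (column - 1) times the increment, plus the proportional
adjustment. The Python sketch below applies this rule. The formula is
inferred from the sample output and the defaults 6, 72 and 4; it is not
taken from the Trees source, the function name center_tabs is
hypothetical, and the indent factor is not modelled.

```python
def center_tabs(columns, increment=6, margin=72, adjustment=4):
    """Map branch columns (1-based) to center tab positions in points.

    Assumed rule, inferred from the sample output in step c:
        tab = margin + (column - 1) * increment + adjustment
    """
    return [margin + (col - 1) * increment + adjustment for col in columns]


# Branch columns from the sample output in step c.
print(center_tabs([12, 18, 21, 25, 31]))   # [142, 178, 196, 220, 256]

# For text boxes (step i) the left margin would be zero.
print(center_tabs([12, 18, 21, 25, 31], margin=0))
```

Other font combinations would need their own increment and adjustment
values, which the text says must be found by trial and error.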
2. TREECAD - Designing structural trees
2.1. Basic requirements. TreeCad runs on 386/486 based PCs with a
VGA or EGA screen (no Hercules mode), a mouse and a hard disk. Operation
without a mouse or with lesser processors is possible, but rather
tedious. The program needs as much ordinary memory as can be made
available. TreeCad can be run either from the DOS prompt or under
Windows 3.1 in 386 mode.
2.2. Uses. TreeCad is a text-and-pseudographics based utility for
generating, displaying and manipulating tree structures of all kinds. It
is therefore especially suited to:
o constructing and editing arbitrary trees. Special support is
provided for Xbar structures.
o hilighting structural relationships such as c-command, m-command
and government.
o demonstrating and exploring adjunction and movement patterns.
2.3. Operation under DOS.
a. Program invocation:
iconx TreeCad
iconx TreeCad nomode [see "switches", below ]
iconx TreeCad 7,15,23,59,69,120 nomode
TreeCad.icx can only be run from an ordinary (non-386) DOS
version of ICON.
b. Two optional switches may be set:
- nomode [prohibits switching into 80,43 mode]
The program attempts to switch into 80,43 mode as soon as
the system.max variable (i.e., the tree depth) is set to a value
larger than 11. Mode switching may not work for a number
of reasons (e.g., insufficient memory, idiosyncratic mode
commands etc.). In this case, execute a suitable mode
command and invoke TreeCad with the nomode switch.
- n1,n2,n3,n4,n5,n6 [colours]
These are six colour attribute numbers in the range 0-255.
Default settings (for an ordinary DOS screen) are
7,15,78,14,6,120. If you are displaying the screen via an LCD
connected to an overhead projector you may find that some
of the colours do not reproduce effectively. In this case, run
the ATTRS.EXE utility to determine adequate attributes and
invoke TreeCad with the new values. The attributes are used
in the main menu's show group: n1 is the standard normal
attribute; n2 is the general hilight attribute; n3 is the hilight
attribute for a c-commanding constituent; n4 is the hilight
attribute for a c-commanded item; n5 tones down items which
are not commanded; and n6 hilights a governed item.
c. Support programs. SCROLLER.EXE and the corpus file,
treecad.in, must be present for the data.corpus command to
work. Treecad.in is an editable textfile containing a selection of
trees in labelled bracketing format. If the PC is short on memory,
TreeCad may not be able to run the SCROLLER. In this case,
TreeCad can only be used in its scratch mode. Note that the
SCROLLER is a standalone program which can handle at most
100 lines, each at most 255 characters long.
d. Other files. The corpus item selected via the SCROLLER program
is fed into scroller.dsk which is consulted when the data.corpus
option is activated from TreeCad. During normal operation of
TreeCad, all trees generated are saved in a protocol file called
treecad.tmp. When the verbose option is ON, all diagnostic
output is fed both to the screen and to treecad.tmp.
Treecad.tmp is overwritten each time TreeCad is started.
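When choosing values for the colour switch, it can help to see what a
given attribute number stands for. The Python sketch below decodes a
number in the range 0-255, assuming that TreeCad's attributes are
standard PC text-mode attribute bytes (low four bits foreground, next
three bits background, top bit blink or bright background depending on
the video mode). That assumption and the names used are not confirmed by
this document; ATTRS.EXE remains the tool to use on the actual screen.

```python
COLOURS = [
    "black", "blue", "green", "cyan", "red", "magenta", "brown",
    "light grey", "dark grey", "light blue", "light green", "light cyan",
    "light red", "light magenta", "yellow", "white",
]


def decode_attribute(n):
    """Split a text-mode attribute byte into its colour components."""
    if not 0 <= n <= 255:
        raise ValueError("attribute must be in the range 0-255")
    return {
        "foreground": COLOURS[n & 0x0F],         # bits 0-3
        "background": COLOURS[(n >> 4) & 0x07],  # bits 4-6
        "blink": bool(n & 0x80),                 # bit 7
    }


# The default settings quoted above: 7,15,78,14,6,120
for value in (7, 15, 78, 14, 6, 120):
    print(value, decode_attribute(value))
```

Under this reading the default n3 (78) is yellow on red, and n6 (120) is
dark grey on light grey.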
2.4. Operation under Windows 3.1. TreeCad is not a proper Windows
program. However, in Windows 3.1 386 mode it can be run as a "non-
windows application in a window". This has a number of advantages such
as access to the black-on-white screen, a smoothly moving arrow-shaped
mouse pointer, resizable system fonts, data exchange via the clipboard and
inclusion of explicatory text in a separate window. Do the following steps
to set up TreeCad for Windows:
a. Copy TreeCad.ico and TreeCad.pif, i.e., the icon and program
information files, to your main Windows 3.1 directory.
b. In Windows, start the PIF-Editor. Click *File/new*. Click *browse*
to locate and select TreeCad.pif. The only entries requiring any
change are the lines specifying the ICON directory. Adjust this to
whatever directory you are using for your TREE&FOX files. Exit
the PIF-Editor, saving the changes. Invoke the Program Manager's
*File/new* menu and OK the box *program item*. Enter
"Treecad.pif" as *commandline* text. Click *change icon*.
Disregard the error message and click OK. Use *browse* to locate
treecad.ico. Select it and click OK. The TreeCad icon will appear
among the Program Manager's other program symbols, and
TreeCad is ready to run.
c. Some further hints:
- There is an option to adjust font sizes in TreeCad's system
menu field. The most suitable font sizes are 8x12 and 7x12.
- The "edit" option lets you copy all or part of your TreeCad
display to the clipboard.
d. Notice: TreeCad may crash when the system.max variable is set
to a value larger than 10. This may be due to a lack of memory
or the fact that no ANSI.SYS driver is presently specified in the
CONFIG.SYS file. For large values of max (i.e., 11..15), make sure
to resize the window and select a suitable font.
2.5. Initial menu. The initial screen presents three button groups:
     data                  system              action
┌──────┼───────┐ ┌──────┬─────┴─────┬──────┐ ┌────┼──────┐
│corpus│scratch│ │max=10│verbose=OFF│tree=b│ │quit│resume│
a. Data.
- The corpus option runs SCROLLER.EXE. This program lists the
contents of the corpus file treecad.in and allows trees to be
imported.
- The scratch option presents two major Xbar structures (a CP
and an IP subtree) as initial experimental structures.
b. System. This group provides three buttons to change defaults.
- max is the number of tree levels to be displayed. If it is set to
a value greater than 11 (and the nomode switch is not in force),
TreeCad attempts to execute "mode 80,43", giving access, in
theory, to a 43 line screen. In practice, critical values for max
begin at around 15, when TreeCad runs out of memory and
crashes. Nothing serious happens; control simply returns to DOS
or Windows.
- verbose ON instructs the program to dump all debugging writes
to treecad.tmp. Under ordinary circumstances, it should remain
OFF.
- tree toggles the display of the trees to either the baseline or
the *in-situ* format.
c. Action. This is either quit or resume. The main use of resume
is to reload previously edited structures after having reset one of
the system variables.
2.6. The main menu consists of four groups. The show group highlights
Xbar relationships. The ops group handles Xbar operations. The edit group
contains a range of editing tools that allow various tree manipulations
such as cutting and copying. The system group has options to undo, redo
and save steps and also provides the quit button, which returns control to
the initial screen. All options appear as idiosyncratic two or three letter
strings:
      show          ops            edit             system
┌──┬──┬─┴┬──┬──┐ ┌───┼───┐ ┌───┬───┬─┴─┬───┬───┐ ┌──┬──┼──┬──┐
│hi│cc│mc│gv│Gv│ │adj│mov│ │cpy│cut│gen│mir│ren│ │un│Re│sv│qu│
Note that the following documentation of the individual options has
been arranged so as to provide a step by step tutorial as well as a
reference guide.
2.7. Keyboard-based input. For users without a mouse, the following
guidelines apply. 1) In order to execute a "click on option/button XYZ",
type in *two* option letters and press ENTER or the space bar. 2) In order
to "click a node", hit PgUp. Then, using the cursor keys, navigate the
cursor to the beginning of a node label and press ENTER. Ctrl-Left and
Ctrl-Right jump to the beginning of the next word on the left or on the
right.
Whenever literal keyboard input is requested, the space bar (as well as
ENTER) serves as a terminator - this is very convenient for entering input
strings with one hand only. However, it also means that you cannot enter
strings containing spaces.
2.8. If you want to follow the examples, start the program and set the
max variable to four or five. Select data.scratch in the initial menu.
TreeCad will present the following basic configuration:
   CP                 IP
 ┌──┴────┐         ┌───┴────┐
 │     Cbar        │      Ibar
 │     ┌─┴───┐     │     ┌──┴────┐
 │     │     │     │     │      VP
 │     │     │     │     │       │
 │     │     │     │     │     Vbar
 │     │     │     │     │     ┌─┴───┐
CSp    C    IP    NP     I     V    NP
2.9. The edit group.
a. *cut* deletes nodes and subtrees. Click on cut. The option will be
hilighted and the prompt "CUT<node>" appears. Click a terminal
node, and it will be pruned from the tree. After deleting an item,
cut remains hilighted and active. If you want to continue lopping
off branches just continue clicking other nodes you want
removed. Clicking a nonterminal node such as Cbar will delete
both the parent and the daughter nodes. Clicking a root node will
delete a whole tree.
Use system.un (see 2.10.a, below) to undo steps, if necessary.
Of course, you can also always quit and restart from scratch.
As an exercise, cut the scratch configuration down so that only
the CP tree remains.
b. ren (rename) affects only node text and does not alter any
structural relationships. Click on ren, then click on CP and
rename it to A. Continue traversing the tree changing CSp to B,
C to D and IP to E, eventually obtaining the following tree
(remember that you can undo steps if you make a mistake):
1. CP 2. A
┌──┴────┐ ┌──┴────┐
│ Cbar │ C
│ ┌─┴───┐ │ ┌─┴───┐
│ │ │ │ │ │
CSp C IP B D E
RENAME<node> CP TO: A
RENAME<node> CSp TO: B
(etc.)
c. mir (mirror) exchanges peripheral daughter nodes. Click mir,
then A (B and C will change places):
1. A 2. A
┌──┴────┐ ┌───┴─────┐
│ C C │
│ ┌─┴───┐ ┌─┴───┐ │
│ │ │ │ │ │
B D E D E B
MIRROR<nonterminal>: A
Click C (D and E will change places). Click C again (D and E will
revert to their original positions). Click A again to reconstitute the
original tree. Mir has no effect if activated on a terminal node or
a parent with only one daughter.
d. cpy (copy) copies trees, subtrees or terminals to a destination
node, replacing the destination. Click cpy and node C. Then click
B as the destination.
1. A 2. A
┌──┴────┐ ┌────┴──────┐
│ C C C
│ ┌─┴───┐ ┌─┴───┐ ┌─┴───┐
│ │ │ │ │ │ │
B D E D E D E
COPY<node>: C TO: B
Another major function of cpy is to create independent trees or
subtrees on the left or right periphery of the tree display space.
If the destination click occurs on an empty space in column 1, an
independent tree is created on the left. If the destination click
falls on an empty space in column 2-79, the copy is created on the
right.
1. A 2. C A
┌──┴────┐ ┌─┴───┐ ┌──┴────┐
│ C │ │ │ C
│ ┌─┴───┐ │ │ │ ┌─┴───┐
│ │ │ │ │ │ │ │
B D E D E B D E
COPY<node> C TO: [click at column 1, row 1]
e. gen (generate) is a tool for creating a variety of tree structures.
You begin by clicking a destination position which may be any
node of a given tree or an empty space in column 1 or an empty
space in column 2-79 (the same convention as for cpy).
Then you either specify a head of an Xbar structure or a list of
daughter nodes, or reconstitute an item from a previous cut.
To obtain tree #2, below, activate gen, click node B and enter
"N". An NP subtree is generated and replaces B. *Any* letter X
entered at gen's "TO" prompt is understood to indicate the head
of an Xbar structure. Gen creates standard Xbar structures,
containing a maximal projection (XP), a specifier (XSp) and a
complement (YP).
1. A 2. A
┌──┴────┐ ┌───────┴────────┐
│ C NP C
│ ┌─┴───┐ ┌──┴────┐ ┌─┴───┐
│ │ │ │ Nbar │ │
│ │ │ │ ┌─┴───┐ │ │
B D E NSp N YP D E
GENERATE<node/pos> B TO<head/paste/list>: N
At the TO prompt, the user can also enter a list of comma-
delimited node labels prefixed by the space character. The new
nodes will then become daughter nodes under (i.e., added to) the
destination node.
1. A 2. A
┌──┴────┐ ┌────┴──────┐
│ C B C
│ ┌─┴───┐ ┌─┴───┐ ┌─┴───┐
│ │ │ │ │ │ │
B D E xx yy D E
GENERATE<node/pos> B TO<head/paste/list>: [space]xx,yy
If, at the <node/pos> prompt, an empty space is clicked either
in column 1 or column 2-79, an independent Xbar tree is
generated to the left or right of the current tree space:
1. A 2. A VP
┌──┴────┐ ┌──┴────┐ ┌──┴────┐
│ C │ C │ Vbar
│ ┌─┴───┐ │ ┌─┴───┐ │ ┌─┴───┐
│ │ │ │ │ │ │ │ │
B D E B D E VSp V YP
GENERATE<node/pos>[click col. 60, row 1] TO<head/paste/list>: V
Finally, a previously cut item may be resurrected in a different
location by entering space+ENTER, which inserts the current
contents of the paste buffer. In the following example, the C-
subtree was first cut and then pasted into the position of node B.
1. A 2. A
┌──┴────┐ │
│ C C
│ ┌─┴───┐ ┌─┴───┐
│ │ │ │ │
B D E D E
CUT<node>: C
GENERATE<node/pos> B TO<head/paste/list> [space][ENTER]
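The standard structure that gen builds from a head letter can be written
down compactly. The following Python sketch is an illustration only, not
TreeCad's Icon code: the function names xbar and to_bracketing and the
nested-list representation ([label, daughter, ...]) are assumptions made
for the example.

```python
def xbar(head, complement="YP"):
    """Build the standard Xbar structure for a head X.

    Result: XP dominating a specifier XSp and Xbar, with Xbar
    dominating the head X and a complement (YP by default).
    """
    return [head + "P", head + "Sp", [head + "bar", head, complement]]


def to_bracketing(tree):
    """Render a nested-list tree as a labelled bracketing string."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_bracketing(child) for child in tree) + ")"


print(to_bracketing(xbar("N")))   # (NP,NSp,(Nbar,N,YP))
print(to_bracketing(xbar("V")))   # (VP,VSp,(Vbar,V,YP))

# Daughter list variant: "[space]xx,yy" entered for destination node B.
daughters = " xx,yy".strip().split(",")
print(to_bracketing(["B"] + daughters))   # (B,xx,yy)
```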
2.10. The system group provides four options:
a. un (undo) undoes the last operation. A maximum of five steps
can be undone.
b. Re (redo) redoes a step previously undone. A maximum of three
steps can be saved in the redo buffer. Thus if you have just
undone five steps, you will only be able to return to the
antepenultimate stage.
c. The sv (save) option allows you to append the current tree to
treecad.in, making it permanently available for selection via the
data.corpus option of the initial screen.
d. qu (quit) returns control to the initial menu. The current tree can
be reloaded by clicking action.resume.
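The interplay of the two buffer limits (five undo steps, three redo
steps) can be modelled with two bounded stacks. The Python sketch below
is a model of the behaviour described in a and b, not TreeCad's actual
implementation; the class name History is hypothetical, and clearing the
redo buffer after a new editing step is an assumption.

```python
from collections import deque


class History:
    """Bounded undo/redo history for tree states."""

    def __init__(self, state, max_undo=5, max_redo=3):
        self.current = state
        self._undo = deque(maxlen=max_undo)   # oldest entries drop out
        self._redo = deque(maxlen=max_redo)

    def record(self, new_state):
        """Register the result of a new editing step."""
        self._undo.append(self.current)
        self._redo.clear()                    # assumption, see lead-in
        self.current = new_state

    def undo(self):
        if not self._undo:
            return False
        self._redo.append(self.current)
        self.current = self._undo.pop()
        return True

    def redo(self):
        if not self._redo:
            return False
        self._undo.append(self.current)
        self.current = self._redo.pop()
        return True


h = History("step 0")
for i in range(1, 6):
    h.record(f"step {i}")
while h.undo():          # undo all five steps
    pass
while h.redo():          # only three of them can be redone
    pass
print(h.current)         # step 3
```

After five undos only the three most recently undone states are still in
the redo buffer, so redoing stops at the antepenultimate stage (step 3 of
5), as stated in b.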
2.11. The show group. With the exception of the general purpose option
hi, this group mainly demonstrates Xbar-specific structural relations. The
screens created in this group can be undone but not redone. Click on
empty space if you want the hilighting removed.
a. hi (hilight). Use this option if you want to hilight specific nodes.
Clicking an already hilighted node resets the normal colour
attribute. (The option is not available for keyboard-based input.)
b. cc (c-command). Click cc and then any tree node. Different
colour attributes hilight the scope of the c-command relation, the
c-commanding node, the c-commanded nodes, and the nodes
excepted from c-command. The implementation follows the
operational definition given in Haegeman (1991:122):
Start from node A and move upwards to the first branching
node. Every node down (except those dominated by, or
dominating, A) is a B c-commanded by A.
In the following tree for the ungrammatical sentence *John will
invite herself* (treecad.in.9), node *John* was clicked. As a
result, all c-commanded nodes are hilighted on the screen
(italicized below).
IP
┌────┴─────┐
NP Ibar
│ ┌────┴─────┐
│ I VP
│ │ │
│ │ Vbar
│ │ ┌───┴────┐
│ │ V NP
│ │ │ │
│ │ │ │
John will invite herself
Briefly, the sentence is ungrammatical because the anaphor
*herself* needs a c-commanding binder. *John* is the only
candidate, but *John* is excluded because it does not agree
with *herself* in gender.
c. mc (m-command). This is similar to c-command except that the
scope is slightly different:
Go from node A upwards to the first maximal projection. Every
node down from there is a B, m-commanded by A (except nodes
dominating A, or dominated by A). (Haegeman 1991:125)
This configurational property is mainly needed for the definition
of the concept of government (for which see below). In the
following tree (treecad.in.11), node *will* has been clicked. The
italicized items indicate nodes m-commanded by *will*.
IP
┌───┴────┐
NP Ibar
│ ┌──┴────┐ [m-command relations not shown here]
│ I VP
│ │ │
│ │ Vbar
│ │ ┌─┴───┐
│ │ V NP
│ │ │ │
│ │ │ │
He will do it
d. gv (government). Government is a crucial configurational
property underlying a number of syntactic phenomena. The
following definition (cp. Haegeman 1991:125) has been
implemented:
1) A governs B if A m-commands B and no barrier intervenes
between A and B. 2) Maximal projections except infinitival IP
are barriers to government. 3) Governors are lexical nodes V,
N, P, A and tensed I.
Consider *We want him to do it* (treecad.in.12), below. If you
click the V of *want*, the program hilights three nodes: the
embedded IP, the NP *him* and the VP *do it*.
IP
┌───┴─────┐
NP Ibar
│ ┌───┴─────┐
│ I VP
│ │ │
│ │ Vbar
│ │ ┌───┴─────┐
│ │ V *IP*
│ │ │ ┌───┴────┐
│ │ │ *NP* Ibar
│ │ │ │ ┌──┴────┐
│ │ │ │ I *VP*
│ │ │ │ │ │
│ │ │ │ │ Vbar
│ │ │ │ │ ┌─┴───┐
│ │ │ │ │ V NP
│ │ │ │ │ │ │
we +t1 want him to do it
The important thing is that *want* governs into the infinitival IP,
but not into the VP *do it*. As a consequence, *want* can
function as the case-assigner of *him*. An infinitival (non-tensed)
I is represented by either (I,to), as in the example above, or by
(I,+t0).
e. Gv (Passive government). This is basically the same as
government, except that you click a potential governee in order to
find its governor. As a counterpart to the ungrammatical sentence
in 2.11.b, above, consider *John will like Mary's description of
herself* (treecad.in.15). This sentence is grammatical because it
obeys the Principle of Reflexive Binding, according to which
A reflexive must be bound in the minimal domain containing
it, its governor and a subject (Haegeman 1991:202).
Click Gv, then the NP *herself* to determine its governor: it is
the preposition *of*.
IP
┌────┴─────┐
NP Ibar
│ ┌────┴──────┐
│ I VP
│ │ │
│ │ Vbar
│ │ ┌─────┴───────┐
│ │ V NP
│ │ │ ┌──────┴────────┐
│ │ │ NSp Nbar
│ │ │ │ ┌────┴──────┐
│ │ │ NP N PP
│ │ │ │ │ │
│ │ │ │ │ Pbar
│ │ │ │ │ ┌──┴────┐
│ │ │ │ │ *P* NP
│ │ │ │ │ │ │
John will like Mary's description of herself
The subtree that includes the reflexive, the governor and the NP
*Mary's* constitutes the minimal domain for reflexive binding
here (cp. Haegeman 1991:201).
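The operational definitions of c-command (2.11.b) and m-command (2.11.c)
translate almost directly into code. The Python sketch below is an
illustration and not TreeCad's Icon implementation: the Node class, the
helper names and the test for a maximal projection (a nonterminal whose
label is longer than one character and ends in "P") are assumptions made
for the example.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


def build(tree):
    """Build Node objects from nested lists [label, daughter, ...]."""
    if isinstance(tree, str):
        return Node(tree)
    return Node(tree[0], [build(child) for child in tree[1:]])


def _commanded(a, is_anchor):
    # Move upwards from A to the first node accepted by is_anchor.
    anchor = next((n for n in a.ancestors() if is_anchor(n)), None)
    if anchor is None:
        return []
    # Every node below the anchor counts, except A itself and the
    # nodes dominating or dominated by A.
    excluded = {a, *a.ancestors(), *a.descendants()}
    return [n for n in anchor.descendants() if n not in excluded]


def c_command(a):
    """Anchor: the first branching node above A."""
    return _commanded(a, lambda n: len(n.children) > 1)


def m_command(a):
    """Anchor: the first maximal projection above A (assumed test)."""
    return _commanded(
        a,
        lambda n: bool(n.children) and len(n.label) > 1
        and n.label.endswith("P"),
    )


vp = ["VP", ["Vbar", ["V", "invite"], ["NP", "herself"]]]
root = build(["IP", ["NP", "John"], ["Ibar", ["I", "will"], vp]])
john = root.children[0].children[0]
print([n.label for n in c_command(john)])
# ['Ibar', 'I', 'will', 'VP', 'Vbar', 'V', 'invite', 'NP', 'herself']
```

For the clicked node *John* this yields everything under Ibar, which
includes the anaphor *herself*, in line with the discussion in 2.11.b.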
2.12. The ops group covers two types of Xbar specific operations:
adjunction and movement.
a. adj (adjunction) joins two solitary trees either on a bar level or
on the level of a maximal projection. For the following example
of a "bar adjunction" click adj, then PP and then Xbar.
1. XP PP 2. XP
┌──┴────┐ │ ┌─────┴──────┐
│ Xbar Pbar │ Xbar
│ ┌─┴───┐ ┌─┴───┐ │ ┌────┴──────┐
│ │ │ │ │ │ Xbar PP
│ │ │ │ │ │ ┌─┴───┐ │
│ │ │ │ │ │ │ │ Pbar
│ │ │ │ │ │ │ │ ┌─┴───┐
│ │ │ │ │ │ │ │ │ │
XSp X YP P YP XSp X YP P YP
ADJOIN<subtree>: PP TO<node>: Xbar
If the subtree originally occurs on the left of the matrix tree it
will be adjoined as a left branch.
As an exercise, you may want to try out possible bar
adjunctions for the ambiguous sentence *we saw the boy with the
telescope* (treecad.in.18):
IP PP
┌───┴─────┐ │
NP Ibar Pbar
│ ┌───┴────┐ ┌───┴─────┐
│ I VP P NP
│ │ │ │ ┌───┴────┐
│ │ Vbar │ NSp Nbar
│ │ ┌──┴────┐ │ │ │
│ │ V NP │ │ N
│ │ │ ┌─┴───┐ │ │ │
│ │ │ NSp Nbar │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ N │ │ │
│ │ │ │ │ │ │ │
we +t2 saw the boy with the telescope
Similarly, for an adjunction to a maximal projection, click adj,
then ZP and then XP:
1. XP ZP 2. XP
┌──┴────┐ │ ┌──────┴───────┐
│ Xbar Zbar XP ZP
│ ┌─┴───┐ │ ┌──┴────┐ │
│ │ │ │ │ Xbar Zbar
│ │ │ │ │ ┌─┴───┐ │
│ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
XSp X YP Z XSp X YP Z
ADJOIN<subtree>: ZP TO<node>: XP
For a concrete example, see 2.14 below. As with bar-adjunction,
if the subtree originates on the left of the matrix tree it will be
adjoined as a left branching adjunction.
b. mov (move) is both a general editing tool and an Xbar-specific
function. In its editing use, it moves a solitary tree to a
terminal node of a matrix tree, an operation which is equivalent
to combining a replacement copy and a cut. In configuration #1,
below, click mov, then the root node Z, then the terminal node
D:
1. A Z 2. A
┌──┴────┐ ┌─┴───┐ ┌────┴──────┐
│ C │ │ │ C
│ ┌─┴───┐ │ │ │ ┌───┴─────┐
│ │ │ │ │ │ Z │
│ │ │ │ │ │ ┌─┴───┐ │
│ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │
B D E X Y B X Y E
MOVE<node>: Z TO<terminal>: D
However, the main function of mov is Xbar-specific: it moves
a constituent of a tree from one position to a suitable landing site,
leaving a trace. The first tree below (treecad.in.24) represents the
structure of the echo question *He did talk about what?* Click
mov, then *did,* then C and you will get the structure of another
echo question, *Did he talk about what?*
1. CP 2. CP
┌───┴─────┐ ┌────┴─────┐
│ Cbar │ Cbar
│ ┌───┴─────┐ │ ┌───┴─────┐
│ │ IP │ │ IP
│ │ ┌───┴─────┐ │ │ ┌───┴─────┐
│ │ NP Ibar │ │ NP Ibar
│ │ │ ┌───┴─────┐ │ │ │ ┌───┴─────┐
│ │ │ I VP │ │ │ I VP
│ │ │ │ │ │ │ │ │ │
│ │ │ │ Vbar │ │ │ │ Vbar
│ │ │ │ ┌───┴────┐ │ │ │ │ ┌───┴────┐
│ │ │ │ V PP │ │ │ │ V PP
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ │ Pbar │ │ │ │ │ Pbar
│ │ │ │ │ ┌─┴───┐ │ │ │ │ │ ┌─┴───┐
│ │ │ │ │ P NP │ │ │ │ │ P NP
│ │ │ │ │ │ │ │ │ │ │ │ │ │
CSp C he did talk about what CSp did#1 he #1 talk about what
MOVE<node>: did TO<terminal>: C
As can be seen, the trace and the moved item are coindexed by
the notation #n.
As an exercise, continue moving *what* to CSp, deriving *What
did he talk about?* Undo this step and derive the variant *About
what did he talk?* These steps illustrate the movement patterns
known as "preposition stranding" and "pied piping" (Haegeman
1991: 341).
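The coindexing of a moved item and its trace can be illustrated with a
small Python sketch. It covers only the simplest case, moving one
terminal item to another terminal position, and it is not TreeCad's
implementation; the function names move_leaf and leaves and the
nested-list representation ([label, daughter, ...]) are assumptions. If
a word occurs more than once, every occurrence is affected, so this is a
toy model at best.

```python
def move_leaf(tree, source, target, index):
    """Return a copy of tree with terminal `source` moved to `target`.

    The old position receives the trace "#n"; the landing site receives
    the moved item coindexed as "source#n".
    """
    if isinstance(tree, str):
        if tree == source:
            return f"#{index}"             # trace left behind
        if tree == target:
            return f"{source}#{index}"     # moved item at landing site
        return tree
    # tree[0] is the category label and is never rewritten.
    return [tree[0]] + [
        move_leaf(child, source, target, index) for child in tree[1:]
    ]


def leaves(tree):
    """Return the terminal items of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(leaves(child))
    return result


# "He did talk about what?" with empty CSp and C positions
pp = ["PP", ["Pbar", ["P", "about"], ["NP", "what"]]]
vp = ["VP", ["Vbar", ["V", "talk"], pp]]
ip = ["IP", ["NP", "he"], ["Ibar", ["I", "did"], vp]]
cp = ["CP", "CSp", ["Cbar", "C", ip]]

moved = move_leaf(cp, "did", "C", 1)
print(" ".join(leaves(moved)))   # CSp did#1 he #1 talk about what
```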
2.13. Exploring German main clause patterns. Despite the apparent
structural resemblance between main clauses in English and German (*Mary
likes John - Marie mag Jan*), it is now generally assumed that German VPs
and IPs have a head-last configuration. This is indeed borne out by the
fact that verb definitions in German are spontaneously presented by
native speakers as (object-)object-verb paradigms, e.g. *ein Buch
kaufen,* *jemandem etwas geben* etc. The following example (*daß Jan
bestimmt morgen das Buch kaufen wird,* treecad.in.30) may serve to
illustrate the fact that it is the German subordinate clause structure
which is the most productive base structure for deriving all kinds of
main clauses:
CP
┌────┴──────┐
│ Cbar
│ ┌─────┴───────┐
│ C IP
│ │ ┌───────┴─────────┐
│ │ NP Ibar
│ │ │ ┌──────────┴────────────┐
│ │ │ AdvP Ibar
│ │ │ │ ┌─────────┴──────────┐
│ │ │ │ VP I
│ │ │ │ │ │
│ │ │ │ Vbar │
│ │ │ │ ┌─────┴───────┐ │
│ │ │ │ AdvP Vbar │
│ │ │ │ │ ┌────┴─────┐ │
│ │ │ │ │ NP V │
│ │ │ │ │ ┌─┴───┐ │ │
CSp daß Jan bestimmt morgen das Buch kaufen wird
As a first step, cut off *daß*, so that two landing sites are available,
CSp for phrasal structures, and C for head-to-head movement. Next, mov
*wird* to C to obtain *Wird Jan bestimmt morgen das Buch kaufen?* Next,
mov *bestimmt* to CSp (*Bestimmt wird Jan morgen ...*). Undo this
version. Move *morgen* to CSp: *Morgen wird Jan bestimmt ...*. Undo that
and, finally, derive *Jan wird bestimmt morgen das Buch kaufen.*
2.14. Another typical feature of Germanic languages is the phenomenon
referred to as scrambling. Consider the derivation of *die Torte mit dem
Messer schneiden* whose D-structure Haegeman (1991:540) takes to be
(treecad.in.34):
VP
│
Vbar
┌─────────┴───────────┐
PP Vbar
│ ┌────┴─────┐
Pbar NP V
┌───┴────┐ │ │
P NP │ │
│ │ │ │
│ │ │ │
mit dem Messer die Torte schneiden
It appears that there is no suitable landing site for moving *die Torte* to
a place in front of *mit dem Messer*. However, landing sites may be
created by adjunction. To do this in TreeCad, generate a solitary NP tree
on the left of the VP. Adjoin this to the VP and cut it down to the
following shape:
VP
┌────────┴──────────┐
│ VP
│ │
│ Vbar
│ ┌─────────┴───────────┐
│ PP Vbar
│ │ ┌────┴─────┐
│ Pbar NP V
│ ┌───┴────┐ │ │
│ P NP │ │
│ │ │ │ │
NP mit dem Messer die Torte schneiden
*Die Torte* may now be moved to the newly created landing site:
VP
┌────────┴──────────┐
NP#1 VP
│ │
│ Vbar
│ ┌───────┴─────────┐
│ PP Vbar
│ │ ┌───┴────┐
│ Pbar │ V
│ ┌───┴────┐ │ │
│ P NP │ │
│ │ │ │ │
die Torte mit dem Messer #1 schneiden
2.15. Associative transfer rules map parametric (language specific)
structural features of one language into those of another (see Rolshoven
1991 for a discussion of the concept). As shown above, an important
parametric difference between English and German is the fact that the
former is an SVO language in which the heads of IP and VP phrases come
before their complements, whilst the latter is an SOV language with head-
last characteristics. Consider again the German D-structure for *Jan wird
morgen das Buch kaufen* (treecad.in.36):
        CP
  ┌─────┴───────┐
  │            Cbar
  │     ┌───────┴────────┐
  │     │                IP
  │     │     ┌──────────┴───────────┐
  │     │     NP                    Ibar
  │     │     │            ┌─────────┴──────────┐
  │     │     │            VP                   I
  │     │     │            │                    │
  │     │     │           Vbar                  │
  │     │     │      ┌─────┴───────┐            │
  │     │     │     AdvP          Vbar          │
  │     │     │      │        ┌────┴─────┐      │
  │     │     │      │        NP         V      │
  │     │     │      │      ┌─┴───┐      │      │
  │     │     │      │     NSp   Nbar    │      │
  │     │     │      │      │     │      │      │
 CSp    C    Jan   morgen  das   Buch  kaufen  wird
As it happens, the transfer rule that maps this structural configuration
into English syntax is TreeCad's mirror operation. First, mirror the
innermost Vbar to exchange the positions of verb and object. Then mirror
the higher Vbar to move the adverb to the end of the VP. Finally, mirror
the Ibar to move the auxiliary into pre-VP position.
       CP
  ┌────┴─────┐
  │         Cbar
  │     ┌────┴──────┐
  │     │           IP
  │     │     ┌─────┴───────┐
  │     │     NP           Ibar
  │     │     │     ┌───────┴─────────┐
  │     │     │     I                 VP
  │     │     │     │                 │
  │     │     │     │                Vbar
  │     │     │     │          ┌──────┴────────┐
  │     │     │     │         Vbar            AdvP
  │     │     │     │      ┌───┴────┐          │
  │     │     │     │      V        NP         │
  │     │     │     │      │      ┌─┴───┐      │
  │     │     │     │      │     NSp   Nbar    │
  │     │     │     │      │      │     │      │
 CSp    C    Jan   wird  kaufen  das   Buch  morgen
             John  will   buy    the   book tomorrow
As a result we get a structural scaffold for *John will buy the book
tomorrow.*
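The three mirror steps of 2.15 can be reproduced with a short Python sketch. It assumes that mirroring only reverses the immediate daughters of the selected node (which is what the three-step recipe suggests); the nested-list tree format and the names mirror, leaves and GLOSS are hypothetical and not part of TreeCad.

```python
# Trees are nested lists: [label, daughter, ...]; a leaf is a plain string.
# `mirror`, `leaves` and `GLOSS` are hypothetical names, not TreeCad's own.

def mirror(node):
    """Reverse the order of a node's immediate daughters, in place."""
    node[1:] = node[:0:-1]
    return node

def leaves(node):
    """Collect the terminal strings of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for daughter in node[1:] for leaf in leaves(daughter)]

# German D-structure (treecad.in.36); CSp and C are empty positions.
lower_vbar = ["Vbar", ["NP", ["NSp", "das"], ["Nbar", "Buch"]], ["V", "kaufen"]]
upper_vbar = ["Vbar", ["AdvP", "morgen"], lower_vbar]
ibar = ["Ibar", ["VP", upper_vbar], ["I", "wird"]]
german = ["CP", ["CSp"], ["Cbar", ["C"], ["IP", ["NP", "Jan"], ibar]]]

german_order = " ".join(leaves(german))

# Innermost Vbar first, then the higher Vbar, finally the Ibar.
for node in (lower_vbar, upper_vbar, ibar):
    mirror(node)

# Word-for-word gloss, only to label the resulting scaffold.
GLOSS = {"Jan": "John", "wird": "will", "kaufen": "buy",
         "das": "the", "Buch": "book", "morgen": "tomorrow"}
english = " ".join(GLOSS[word] for word in leaves(german))

print(german_order)
print(english)
```

Because each call touches one node only, the order of the three calls does not matter for the final terminal string; the sequence used here simply follows the text.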
3. FOX - A Frame-Oriented X-bar Parser
3.1. Overview and uses. FOX processes simple English sentences and
attempts to represent their syntactic structure in the form of X-bar
phrase markers (Haegeman 1991). FOX is designed to run on DOS PCs with a
VGA/EGA screen and a hard disk, preferably on 80386 or higher platforms.
Operation with lesser processors is possible, but tends to be sluggish.
Technically, the parser is a left-to-right, bottom-up, multipass
nondeterministic parser. In the event of unresolvable lexical or
structural ambiguity it attempts to produce all possible outcomes by
backtracking. In its present form the parser can be used for the
following purposes:
- as an interactive demonstration package illustrating the automatic
processing of a variety of core grammar (mostly textbook) cases;
- as an exploratory model for investigating lexical subcategorization
and syntactic ambiguity and developing disambiguation strategies.
3.2. Operation: DOS or Windows 3.1. FOX is not a proper Windows
program. However, under Windows 3.1 (386 mode) it can be run as a
"non-windows application in a window". Operation under Windows has a
number of advantages such as access to a smoothly moving mouse pointer,
resizeable system fonts, and data exchange via the clipboard. Take the
following steps to set up FOX for Windows 3.1:
a. Copy the files Fox.ico and Fox.pif to the Windows 3.1 directory.
b. In Windows, start the PIF-Editor. Click *File/new*. Click *browse*
to locate and select Fox.pif. The only entries requiring any change
are the lines specifying the ICON directory. Adjust this to
whatever directory you are using for your TREE&FOX files. Exit
the PIF-Editor, saving the changes. Invoke the Program Manager's
*File/new* menu and OK the box *program item*. Enter "Fox.pif"
as *commandline* text. Click *change icon*. Disregard the error
message and click OK. Use *browse* to locate Fox.ico. Select it
and click OK. The FOX icon will appear among the Program
Manager's other program icons, and Fox is ready to run.
c. Further hints:
- There is an option to adjust font sizes in Fox's system menu
field. The most suitable font sizes are 8x12 and 7x12.
- The "edit" option lets you copy all or part of your Fox display
to the clipboard.
d. FOX may crash when the sysvars.max variable is set to a value
larger than 10. This may be owing to a lack of memory or the fact
that no ANSI.SYS driver is specified in the CONFIG.SYS file. For
large values of max (i.e., 11..15), make sure to resize the window
and select a suitable font.
3.3. Frame Orientation. The word "frame" in the parser's acronym goes
back to Marvin Minsky's conceptual model of human recognition
processes. His introductory definition of the key concepts is a good
starting point:
We can think of a frame as a network of nodes and relations. The
"top levels" of a frame are fixed, and represent things that are
always true about the supposed situation. The lower levels have many
*terminals* - "slots" that must be filled by specific instances or
data. Each terminal can specify conditions its assignments must
meet. (The assignments themselves are usually smaller "sub-frames.")
(Minsky 1975:1)
3.4. Linguistic Orientation. In keeping with Government and Binding (GB)
theory conventions, the FOX parser attempts to assign X-bar S-
structures which preserve their "underlying" D-structures. For its
initial syntactic frames, FOX depends on lexical "subcategorization
frames" (particularly those of verbs), and it capitalizes on the
"projection principle" which posits that all low-level syntactic
structure is based on lexical subcategorization (for details, see
Haegeman 1991). Whilst elements of "theta theory" have been encapsulated
in the subcategorization frames of lexical entries, FOX is currently
not aware of any of the other supplementary modular subtheories (e.g.,
case and binding) normally treated within GB theory.
3.5. Limitations. The parser's recognition capabilities are restricted to a
purely syntactic level. To the parser, all sentences are like "The mome
raths outgrabe" (from Lewis Carroll's "Jabberwocky"). Even for input
such as this, human recognizers have an immeasurable advantage over the
FOX parser because they assume intuitively that *mome* is an adjective,
that *raths* is the plural form of a noun, and that *outgrabe* is a past
tense of a verb *outgribe*. FOX must be given this information before it
is able to perform a successful parse (the sentence is listed as fox.in.29).
At present, the FOX parser's grounding in realistic language data is still
extremely tenuous. Among the many features the parser does not know
how to handle are compounds, conjunctions, negation, gerunds, phrasal
verbs, tags and many other constructions. If you have inadvertently
entered a sentence containing such "unknown" elements, then a bogus
subcategorization category (such as X) may be used provisionally. (That
won't crash the parser.) Alternatively, enter an empty string to cancel the
processing of the sentence.
3.6. Trees. The following tree represents the last stage in the parser's
processing of fox.in.56, *Which book will John give to Mary?*
             CP
      ┌──────┴───────┐
     CSp            Cbar
      │         ┌────┴─────┐
     NP#2       C          IP
    ┌─┴───┐     │     ┌────┴─────┐
   NSp   Nbar  I#1    NP        Ibar
    │     │     │     │     ┌────┴──────┐
    wh    N     │     │     I           VP
    │     │     │     │     │           │
    │     │     │     │     │          Vbar
    │     │     │     │     │     ┌─────┼───────┐
    │     │     │     │     │     V     NP      PP
    │     │     │     │     │     │     │       │
  which  book  will  John   #1   give   #2   to_Mary
Note the following details:
a. There are some slight terminological idiosyncrasies - in particular,
"bars" are spelled out and the various conventional designations
C'', C', Spec, CompSpec, SpecComp, Det etc. are not used. In the
parser's notation, which is primarily motivated by ease of
computational handling, any head category X has the projections
X, Xbar and XP, and the specifier node is an XSp.
b. Movements and traces are indicated by indices #1, #2, etc.
c. The display depth of the tree shown is 8, and its virtual depth is
10, which means that some of its lower nonterminal nodes are not
represented (in this case, note the elliptical prep phrase). The
display depth can be adjusted from 4 to around 15. From display
depth 11 upwards, the parser switches to display mode 80,43 (this
may not work for all screens). Normal screen mode (80,25) is
restored after regular program termination.
d. Very observant readers will have noticed that the noun phrases
*John* and *Mary* appear as plain NPs, whereas *book* is pro-
jected fully. This is a subcategorizing option in the lexicon.
3.7. Invocation. The ready-to-run version of the FOX parser is started by
typing iconx fox at the command prompt or by clicking the FOX icon in
Windows' Program Manager. The initial menu comprises four options:
ESC:quit ENTER:corpus SPACE:interactive mode s:SYSVARS
a. Hit ENTER to view the corpus file fox.in. FOX runs the SCROLLER
program to list this file; if this feature does not work, the parser
can only be operated in interactive mode. Pick fox.in.1, *John will
see Mary.* FOX looks up the words in its internal lexicon and
presents the following initial ("given") structure:
   NP    I        IP               VP        NP
   │     │     ┌──┴────┐           │         │
   │     │     │      Ibar        Vbar       │
   │     │     │     ┌─┴───┐     ┌─┴───┐     │
   │     │     │     │     │     V     │     │
   │     │     │     │     │     │     │     │
  John  will  ?NP    ?I   ?VP   see   ?NP   Mary
The parser will now automatically continue with a series of more
or less successful attempts to unify the material in a fully
saturated single structural tree. Most intermediate results are
obtained by procedures that "build" or "grab" or "trace" some-
thing. Once the parse has run its course - either by bottoming out
with a single tree structure or by getting stuck on a sequence of
incompatible subtrees - press ENTER to return to the main menu.
Repeat the process with some of the other sentences in the corpus
file, if you like, or ESCAPE to exit the parser.
Hint: Begin with the simple sentences in the corpus file in order
to get an impression of the parser's operation. These sentences
should all come out without any user intervention. Most of the
sentences from fox.in.17 onwards contain material requiring
interactive subcategorization (for which see below).
b. The SYSVARS option allows you to adapt the following variables
to specific requirements and circumstances:
- Verbose. Normally OFF (0). If toggled to 1 all debugging writes
are echoed on the screen and written to the protocol file.
- Max is the display depth of the tree (see 3.6c). From 11 upwards,
Fox attempts to execute "mode 80,43" to set a 43 line display. The
safe upper limit for max lies around 15.
- Steps. The initial setting is *automatic step-by-step*. Two other
settings can be toggled: (1) *user-prompted step-by-step*; (0)
*no intermediate steps, final outcome only*.
c. SPACE:interactive mode. The words listed in the lexicon are
displayed on the screen, and the user is prompted to enter a
sentence. There are a few simple ground rules:
(1) Punctuation is allowed but will be ignored.
(2) The parser is case sensitive but will test whether a lower case
version of the first word in the sentence is listed in the
lexicon.
(3) There is no limit on the length of sentences, but the parser
will trim the tree display to the 80 columns of the screen and
has no horizontal scrolling facility. Output to the session
protocol, however, is not truncated in this manner.
(4) For words not listed in the lexicon, subcategorization frames
will be requested from the user.
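Ground rules (1), (2) and (4) can be illustrated by a small Python sketch. It is a simplification under stated assumptions, not FOX's own lookup code: the function name prepare is hypothetical, the lexicon is reduced to a set of words, and attachment cues such as (Nbar,?PP) (see 3.12) are not handled.

```python
import string

def prepare(sentence, lexicon):
    """Split a sentence into words and report those missing from the lexicon.

    Hypothetical helper: punctuation at word edges is dropped, lookup is
    case sensitive, and only the first word gets a lower-case retry."""
    words = [word.strip(string.punctuation) for word in sentence.split()]
    words = [word for word in words if word]
    if words and words[0] not in lexicon and words[0].lower() in lexicon:
        words[0] = words[0].lower()
    unknown = [word for word in words if word not in lexicon]
    return words, unknown

lexicon = {"which", "book", "will", "John", "give", "to", "Mary"}

words, unknown = prepare("Which book will John give to Mary?", lexicon)
print(words, unknown)

words2, unknown2 = prepare("John kissed Mary.", lexicon)
print(words2, unknown2)
```

In the second call *kissed* ends up in the list of unknown words, which is the point at which a subcategorization frame would be requested from the user.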
3.8. Session protocol. There is no way of undoing or replaying steps, but
all trees generated are saved in a plain text file named fox.tmp. Note that
this session file is overwritten (i.e., its previous contents are lost) at
the beginning of each FOX session. If any of the trees are to be saved,
fox.tmp must be copied or renamed before FOX is restarted. Be warned that,
for long sessions, fox.tmp can become quite large.
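Copying the protocol before the next session can be automated. The following Python sketch assumes a present-day system with long file names (the time-stamped target name would not fit the DOS 8.3 scheme); archive_protocol is a hypothetical helper, and the demonstration works on a throwaway directory rather than on a real FOX installation.

```python
import shutil
import tempfile
import time
from pathlib import Path

def archive_protocol(directory, name="fox.tmp"):
    """Copy the session protocol to a time-stamped file next to it.

    Returns the path of the copy, or None if there is no protocol file."""
    source = Path(directory) / name
    if not source.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"fox-{stamp}.txt")
    shutil.copy2(source, target)
    return target

# Demonstration in a temporary directory with a stand-in protocol file.
with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "fox.tmp").write_text("NP\n|\nJohn\n", encoding="utf-8")
    saved = archive_protocol(workdir)
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
    missing = archive_protocol(workdir, name="absent.tmp")

print(saved_name)
```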
3.9. The lexicon file. A lexicon has been provided in the file fox-lex
which can be edited with an ASCII editor. Its format is largely self-explan-
atory, but note the following details:
a. Since fox-lex is read when FOX is started and is kept in memory
until program termination, its size must obviously remain within
manageable proportions. (I am assuming there won't be much
space left.)
b. Blank lines are ignored, likewise lines beginning with the hash
character (#).
c. In the file dump shown below, the definition of the sub-
categorizing frames begins in column 14. The word itself and its
definition must be separated by at least two spaces.
d. Words can be entered in any order.
### FOX LEXICON ###
### Auxiliaries
be           V ?NP_?NP?AP
am           +t1 be
are          +t1 be
is           +t1 be
was          +t2 be
were         +t2 be
being        +pt1 be
been         +pt2 be
do           V ?NP_?NP
does         +t1 do
did          +t2 do
doing        +pt1 do
done         +pt2 do
have         V ?NP_?NP
has          +t1 have
had          +t2+pt2 have
having       +pt1 have
to           P
### Adjectives
able         A _%CP1
green        A
little       A
lucky        A _%CP1
wrong        A _%CP1
### Complementizers
whether      C
that         C
### Inflexionals
will         I
would        I
### Noun-Specifiers
a            NSp
the          NSp
### Nouns
boy          N
book         N _?of
Friday       N
he           NP
it           NP
John         NP
London       N
Mary         NP
man          N
student      N _?of
we           NP
you          NP
Xself        NP
### wh-words
what         NPwh
who          NPwh
whom         NPwh
which        NSpwh
when         PPwh
### Prepositions
about        P
by           P
in           P
of           P
on           P
without      P _%CP1
### Verbs
believe      V ?NP_?NP?IP?CP
believes     +t1 believe
believed     +pt2 believe
buy          V ?NP_?NP,%NP
give         V ?NP_%NP,%NP?PP
gave         +t2 give
given        +pt2 give
hate         V ?NP_?NP
hated        +t2+pt2 hate
invite       V ?NP_?NP
know         V ?NP_?CP?NP
leave        V ?NP_%NP
like         V ?NP_?NP
meet         V ?NP_?NP
met          +t2 meet
persuade     V ?NP_?NP,?CP2
persuaded    +t2+pt2 persuade
promise      V ?NP_%NP,?NP?CP1
promised     +t2+pt2 promise
promising    +pt1+a promise
read         V ?NP_?NP?CP
reading      +pt1 read
relax        V ?NP_
relaxed      +t2 relax
resign       V ?NP_
resigned     +t2 resign
see          V ?NP_?NP?CP
saw          +t2 see
seeing       +pt1+a see
seen         +pt2 see
seem         V ?ISp_?IP
seemed       +t2 seem
talk         V ?NP_?about
want         V ?NP_?NP?IP?CP1
wonder       V ?NP_?CP1
wondered     +t2 wonder
### Multiple Cats
big          N;A
sleep        V ?NP_;N
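The format rules of 3.9 (blank lines and hash lines ignored, word and definition separated by at least two spaces, several frames separated by semicolons) can be sketched as a small Python reader. This is an illustration only; parse_lexicon and SAMPLE are hypothetical names, and FOX's own ICON code may read the file differently.

```python
import re

def parse_lexicon(text):
    """Map each word to its list of subcategorization frames."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # blank lines and comments
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"need two or more spaces after the word: {line!r}")
        word, definition = parts
        # Lexically ambiguous words list several frames, separated by ";".
        lexicon[word] = [frame.strip() for frame in definition.split(";")]
    return lexicon

SAMPLE = """\
### Verbs
give         V ?NP_%NP,%NP?PP
gave         +t2 give

### Multiple Cats
big          N;A
sleep        V ?NP_;N
"""

lexicon = parse_lexicon(SAMPLE)
for word, frames in lexicon.items():
    print(word, frames)
```

Splitting on two or more spaces (rather than on a fixed column) keeps the sketch tolerant of entries whose definition does not start exactly in column 14.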
3.10. Notes on subcategorization.
a. Many lexical items can simply be subcategorized by specifying
their grammatical category (cp. the entries for *green, that, boy,*
etc.). In some cases, unnecessary X-bar structure may be
suppressed by directly specifying the maximal projection (cp.
*John, we*).
b. The notation N _?of in the case of *student* indicates that we
want the parser to treat an *of*-PP following *student* as a
complement.
c. Verbs are major structure determiners, and the parser will use the
subcategorization information of each verb to hypothesize two
major structures: a clausal IP frame and a verb phrase frame, the
latter to be slotted into the IP frame at a later stage.
For the parser, verbs are subcategorized according to the
number and type of their external and internal arguments (their
theta grid). The external argument of a verb is usually a subject
noun phrase in the specifier position of an IP. Internal arguments
are noun phrases, clausal phrases or prep phrases in the
complement position of a verb phrase (cp. the entries for *resign,
invite, talk*). Variant complement categories are simply
concatenated (cp. *believe*).
d. As for inflected verb forms, +t1 denotes present tense, +t2 past
tense, +pt1 the present participle, +pt2 the past participle. If a
tensed or participle form can be used as an adjunct (as in *a
promising boy*), specify +a.
e. Optional or implicit theta roles are experimentally flagged by the
notation %XP (cp. *give*).
f. For items exerting subject control (*promise*), specify an ?XP1
complement. For object control items (*persuade*), specify ?XP2.
g. Note the special subcategorization for the raising verb *seem*.
h. Lexical ambiguity is indicated by concatenating several sub-
categorization frames and separating them by a semicolon (cp.
*big* and *sleep*). The parser's (inefficient) heuristic for dealing
with multiple subcategorization is backtracking (see para 3.13).
i. Certain trivial subcategorization detail is handled automatically by
the parser. For instance, it is not necessary to specify NP comple-
ments for prepositions. Words ending in *-ly* are taken to be
adverbs. Words ending in *'s* are processed as genitive case NPs.
The parser can also usually recognize passives and proceed
accordingly. Also, the parser's lookup procedure makes a decision
on whether a word such as *hated* is a tensed form or the past
participle of *hate* in the context given.
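To make the frame notation more tangible, here is a Python sketch that takes a frame string apart. It encodes one possible reading of the notes above and is not FOX's own decoder: the part before the underscore is taken as the external argument, commas are assumed to separate successive complement slots, concatenated items within a slot are treated as alternatives, % marks an optional or implicit role, and a trailing 1 or 2 marks subject or object control. The names ARG, parse_slot and parse_frame are hypothetical.

```python
import re

# One argument: flag (? or %), category or word, optional control digit.
ARG = re.compile(r"([?%])([A-Za-z]+)([12]?)")
CONTROL = {"1": "subject", "2": "object"}

def parse_slot(text):
    """Return the alternative fillers of one argument slot."""
    return [{"category": category,
             "optional": flag == "%",
             "control": CONTROL.get(digit)}
            for flag, category, digit in ARG.findall(text)]

def parse_frame(frame):
    """Split a frame such as 'V ?NP_?NP,?CP2' into its components."""
    category, _, grid = frame.partition(" ")
    external, _, internal = grid.partition("_")
    return {"category": category,
            "external": parse_slot(external),
            "internal": [parse_slot(slot) for slot in internal.split(",") if slot]}

persuade = parse_frame("V ?NP_?NP,?CP2")
give = parse_frame("V ?NP_%NP,%NP?PP")
resign = parse_frame("V ?NP_")
student = parse_frame("N _?of")
green = parse_frame("A")

print(persuade["internal"])
```

Under this reading *persuade* takes two complements, the second being an object-control CP, while *resign* has an external argument but no internal ones.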
3.11. Sample dialogue. The following is a typical interactive dialogue in
which the parser requests an additional subcategorization frame (user
input italicized):
*John kissed Mary.*
looking up: John kissed Mary
NO LEX ENTRY FOR: kissed
SUBCATEGORIZE: *+t2+pt2 kiss*
NO LEX ENTRY FOR: kiss
SUBCATEGORIZE: *V ?NP_?NP*
Interactively subcategorized items are added to the lexicon for the
duration of the session and will not be newly requested in subsequent
occurrences. Interactive additions to the lexicon are temporary, and on
leaving the program the added words and their definitions are forgotten.
There is no provision for interactively retracting or changing entries.
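The dialogue above follows a simple chain: an inflected form points to its base form, and the base form carries the frame. A Python sketch of such a lookup with a session-only lexicon might look as follows. The names lookup and ask are hypothetical, the dictionary-based lexicon is a simplification, and the prompt is replaced by canned answers so that the example runs unattended.

```python
def lookup(word, lexicon, ask):
    """Follow inflection entries until a base entry with a frame is found.

    Missing entries are requested through `ask` and stored in `lexicon`,
    so that they are not requested again during the session."""
    features = []
    while True:
        if word not in lexicon:
            lexicon[word] = ask(word)
        definition = lexicon[word]
        if not definition.startswith("+"):
            return word, definition, features
        # Inflection entry such as "+t2+pt2 kiss": tags, then the base form.
        tags, _, word = definition.partition(" ")
        features.extend(tags.strip("+").split("+"))

# Session copy of a (tiny) lexicon; the file on disk is never modified.
session_lexicon = {"John": "NP", "Mary": "NP"}

# Canned answers standing in for the user's replies to SUBCATEGORIZE.
answers = {"kissed": "+t2+pt2 kiss", "kiss": "V ?NP_?NP"}
asked = []

def ask(word):
    asked.append(word)
    return answers[word]

first = lookup("kissed", session_lexicon, ask)
second = lookup("kissed", session_lexicon, ask)   # no further questions
print(first, asked)
```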
3.12. Handling of adjuncts. The parser has no sophisticated heuristics
for placing adjuncts. Thus for the notorious *we saw the boy with the
telescope* (fox.in.20), the parser will just leave the PP stranded. On the
other hand, the parser can be given a cue as to where to attach the PP by
changing the input either to *we saw the boy (Nbar,?PP) with the
telescope* or to *we saw the boy (Vbar,?PP) with the telescope.* See the
corpus file for a number of similar cases.
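How such cues might be separated from the words of the input can be sketched in Python. The sketch only extracts the cues and records their position; how FOX actually uses them for attachment is not modelled. CUE, split_cues and the dictionary keys are hypothetical names.

```python
import re

# A cue has the shape (Category,?Expected), e.g. (Nbar,?PP).
CUE = re.compile(r"\((\w+),\?(\w+)\)")

def split_cues(sentence):
    """Separate plain words from attachment cues.

    Each cue records the node category it names, the category it expects,
    and the index of the word it precedes."""
    words, cues = [], []
    for token in sentence.split():
        match = CUE.fullmatch(token)
        if match:
            cues.append({"before_word": len(words),
                         "host": match.group(1),
                         "expects": match.group(2)})
        else:
            words.append(token)
    return words, cues

words, cues = split_cues("we saw the boy (Vbar,?PP) with the telescope")
print(words)
print(cues)
```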
3.13. Lexical ambiguity. The parser's processing of ambiguous strings
can be illustrated by letting it parse fox.in.27, *the big sleep*. As shown
in the dump of fox-lex, *big* has been subcategorized as N;A (i.e., both
as a noun and an adjective), and *sleep* has been subcategorized V
?NP_;N, i.e., both as an intransitive verb and a noun. Four outcomes are
possible, two of which, namely *big1+sleep1* and *big2+sleep2*,
succeed, whilst the other two, *big1+sleep2* and *big2+sleep1*, fail. To
observe the parsing strategy in detail, set the steps SYSVAR to 1.
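The search space for *the big sleep* can be laid out with a few lines of Python. Note that this sketch enumerates all combinations exhaustively instead of backtracking, and that the acceptance test is a toy stand-in for the parser (it merely accepts the two category sequences reported as successful above). The names readings, ACCEPTED and label are hypothetical.

```python
from itertools import product

lexicon = {"the": ["NSp"], "big": ["N", "A"], "sleep": ["V ?NP_", "N"]}

def readings(words, lexicon):
    """Yield every combination of frames, one frame per word.

    Each choice is a tuple (word, number of the frame, frame)."""
    options = [[(word, number, frame)
                for number, frame in enumerate(lexicon[word], start=1)]
               for word in words]
    yield from product(*options)

# Toy stand-in for the parser: category sequences that yield a parse.
ACCEPTED = {("NSp", "N", "V"), ("NSp", "A", "N")}

def label(reading):
    """Name a reading after its ambiguous words, e.g. 'big1+sleep1'."""
    return "+".join(f"{word}{number}" for word, number, _ in reading
                    if len(lexicon[word]) > 1)

succeed, fail = [], []
for reading in readings(["the", "big", "sleep"], lexicon):
    categories = tuple(frame.split()[0] for _, _, frame in reading)
    (succeed if categories in ACCEPTED else fail).append(label(reading))

print("succeed:", succeed)
print("fail:   ", fail)
```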
4. References
Carroll, Lewis. Through the Looking-Glass. In The Annotated Alice, ed.
Martin Gardner. Harmondsworth: Penguin, 1974 [1896].
Fanselow, Gisbert/Sascha W. Felix. 1990. Die Rektions- und Bindungs-
theorie. Tübingen: Francke.
Griswold, Ralph E./Madge T. Griswold. 1990. The ICON Programming
Language: Second Edition. Englewood Cliffs: Prentice Hall.
Griswold, Ralph. 1992. Version 8.5 of Icon for MS-DOS 386/486
Platforms. The U. of Arizona Icon Project, Doc. IPD184. [See note on
ICON, below.]
Haegeman, Liliane. 1991. Introduction to Government and Binding Theory.
Cambridge, Mass.: Blackwell.
Minsky, Marvin. 1975. "A Framework for Representing Knowledge".
Frame conceptions and text understanding, ed. Dieter Metzing. Berlin:
deGruyter.
Radford, Andrew. 1988. Transformational Grammar: A First Course.
Cambridge U.P.
Rolshoven, Jürgen. 1991. "GB und sprachliche Informationsverarbeitung
mit LPS". Romanische Computerlinguistik: Theorien und Implemen-
tationen, ed. J. Rolshoven and D. Seelbach. Tübingen: Niemeyer.
Note on ICON: All main program modules in TREE & FOX were implemented
using the sophisticated features offered by the ICON programming
language. Griswold & Griswold (1990) is the primary reference text. The
University of Arizona publishes a monthly Icon Newsletter as well as a
bi-monthly technical report called The Icon Analyst. Icon has been
ported to practically all types of platforms and operating systems. For
subscription and ordering details, contact The Icon Project, Dept. of
Computer Science, Gould-Simpson Building, The University of Arizona,
Tucson AZ 85721, U.S.A.