- 08/29/91
-
- Dear Fellow Icon enthusiasts,
-
- Here's a letter answering some questions that were asked
- about the TRAN1 machine translation program. Perhaps some of the
- information here is of general interest.
-
- First of all, a fundamental principle underlying the design
- of this machine translation program is the idea that it is
- reasonable to put a good deal of manual analysis into a text that
- will be translated into a multitude of target languages. An
- example of such a text is the Bible which still has not been
- translated into some 3500 minority languages. Other suitable
- candidates for this type of treatment are owners' manuals for
- various products and the legislation of the European Community. A
- corollary to this first principle is the notion that any machine
- translation program will be more successful if the grammar of the
- source text is as limited as possible. In keeping with this
- corollary the syntax of the program's input text has been greatly
- simplified.
-
- A second fundamental principle is that the program attempts
- to translate meaning rather than just words. To that end the
- analysis of the source text is based on the theory expounded in
- _The Semantic Structure of Written Communication_ (Beekman,
- Callow, & Kopesec 1981). According to that work (SSWC), concepts or
- meanings come in four classes: Things, Events, Attributes, and Relations
- (1981:49). In their simplest forms things are represented by
- nouns, events by verbs, attributes by adjectives and adverbs, and
- relations by function words like conjunctions, sentence adverbs,
- and prepositions. A formidable problem for the translator
- presents itself when concepts are not represented in their
- simplest forms; this is called lexical skewing. For instance, in
- the sentence "John gave Mary some help," the word "help" is really
- an event. A simpler/unskewed way to express the same meaning
- would be, "John helped Mary."
-
- In the analysis of the source text included with the program
- an attempt was made to eliminate lexical skewing to the fullest
- extent possible. It should be noted that this is not entirely
- necessary when translating between closely related languages, but
- it becomes critical when translating into minority languages which
- may lack abstract nouns for events like "love" or "forgiveness".
-
- As noted above, an attempt was also made to utilize a very
- limited syntax in the analysis of the source text. Ideally a
- sentence should consist of a subject, verb, objects, and
- possibly a relative clause. Passive voice is not permitted
- because it does not exist in all languages. Conjunctions and
- sentence adverbs are used in a stylized manner (i.e., they always
- mean the same thing).
-
- To facilitate translation of meanings rather than words, a
- system utilizing connecting underscores and subscripting digits
- was employed. For instance, "chief_priests1" is treated as a
- single concept, and thus contains a connecting underscore. It is
- also followed by the subscripting digit "1" to distinguish this
- concept from any others which might possibly be renderable by the
- same English words. The subscripting digits used are somewhat
- arbitrary, but in the case of verbs the digits 1 through 3 were
- used for first, second, and third person singular verbs, and the
- digits 4 through 6 were used for the plural forms. Thus "know6"
- would mean "they know".
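-
- The tag convention just described can be illustrated with a small
- sketch. It is written in Python rather than the program's Icon and is
- not taken from TRAN1; the names parse_tag, verb_person_number, and
- VERB_DIGITS are hypothetical. It assumes that digits 4 through 6 mark
- the plural verb forms, which is consistent with "know6" meaning
- "they know".
-
```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed verb convention: 1-3 singular, 4-6 plural,
# each group in first/second/third person order.
VERB_DIGITS = {
    1: ("first", "singular"),
    2: ("second", "singular"),
    3: ("third", "singular"),
    4: ("first", "plural"),
    5: ("second", "plural"),
    6: ("third", "plural"),
}

# One or more words joined by underscores, then optional subscript digits.
TAG_PATTERN = re.compile(r"([A-Za-z]+(?:_[A-Za-z]+)*)(\d*)")


class SemanticTag(NamedTuple):
    words: Tuple[str, ...]   # e.g. ("chief", "priests")
    digit: Optional[int]     # subscripting digit, or None if absent


def parse_tag(tag: str) -> SemanticTag:
    """Split a tag such as "chief_priests1" into its words and digit."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a semantic tag: {tag!r}")
    base, digit = match.groups()
    return SemanticTag(tuple(base.split("_")), int(digit) if digit else None)


def verb_person_number(tag: str) -> Optional[Tuple[str, str]]:
    """Decode person and number from a verb tag; only meaningful for verbs."""
    return VERB_DIGITS.get(parse_tag(tag).digit)
```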
-
- Forms such as "chief_priests1" and "know6" are considered to
- be arbitrary symbols for units of meaning. They could just as
- easily have been rendered as "abc1" and "xyz6", but this would
- have resulted in an input text that was unreadable. Nevertheless,
- the idea that these symbols are arbitrary is important. For
- example, "chief_priests1" may be rendered fairly literally in one
- language (i.e., 'sacerdotes principales' in Spanish), but in another
- language the translation may sound more like 'honored old men of
- ceremonial rites'. The arbitrary forms used to represent meanings
- are called semantic tags in the program.
-
- Since the program is attempting to translate meanings rather
- than words, it uses an invention called a semanticon rather than a
- lexicon. Each semanticon entry begins with a semantic tag as
- described above. The next field in each entry is a morphological
- tag. A morphological tag is basically a part of speech, but it
- can contain additional information such as person, number, gender,
- tense, and so on. The morphological tag refers to the target
- language rendering of the concept represented by the semantic tag.
- This target language rendering may not strictly match the morphological
- tag in the traditional sense. For instance, "sacerdotes
- principales" 'priests high' is not a noun in the traditional
- sense, but a combination of a noun plus an adjective. However, it
- functions as a single unit, and for this reason the conglomerate
- is treated as a noun in the semanticon. The next field in the
- semanticon entry is the target language rendering of the concept
- represented by the semantic tag. It generally contains a single
- target language word, but it may contain multiple words. If the
- morphological tag is "n" for noun, the entry consists of an
- article followed by one or more words connected by underscores
- which loosely represent a noun. If the morphological tag is one
- of those for adjectives, the entry consists of four words: a
- masculine and a feminine singular adjective and a masculine and a
- feminine plural adjective.
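-
- The three fields of a semanticon entry can be pictured with the
- following sketch, written in Python rather than Icon. The letter does
- not give the on-disk format of the semanticon, so the in-memory layout
- here is an assumption, as are the names Entry, SEMANTICON, and render,
- the morphological tags "v" and "adj", and the sample tag "good1".
- Only the tag "n" and the field order come from the description above.
-
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    morph_tag: str   # part of speech, possibly with extra information
    rendering: str   # target language words for the concept


# Hypothetical Spanish entries keyed by semantic tag.
SEMANTICON = {
    # Noun: an article followed by underscore-connected words.
    "chief_priests1": Entry("n", "los sacerdotes_principales"),
    # Verb: a single inflected word.
    "know6": Entry("v", "saben"),
    # Adjective: masc. sg., fem. sg., masc. pl., fem. pl., in that order.
    "good1": Entry("adj", "bueno buena buenos buenas"),
}

# Position of each adjective form within the four-word rendering.
ADJ_SLOTS = {("m", "sg"): 0, ("f", "sg"): 1, ("m", "pl"): 2, ("f", "pl"): 3}


def render(tag: str, gender: str = "m", number: str = "sg") -> str:
    """Return the target language text for a semantic tag."""
    entry = SEMANTICON[tag]
    if entry.morph_tag == "adj":
        # Pick the form that agrees with the requested gender and number.
        return entry.rendering.split()[ADJ_SLOTS[(gender, number)]]
    # Underscores only bind words into one unit; print them as spaces.
    return entry.rendering.replace("_", " ")
```
-
- With these entries, render("good1", "f", "pl") selects the feminine
- plural form, and render("chief_priests1") yields the article and noun
- with the underscore replaced by a space.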
-
- The source language text to be translated contains braces.
- These braces are used to delimit portions of the text which should
- be translated as a unit. For instance, noun phrases and
- prepositional phrases should be surrounded by braces, and it's a
- good idea to surround the main clause by braces. The program
- translates each braced span as a unit, so if a noun phrase is
- surrounded by braces, the program will never make the article of
- that noun phrase agree with a noun which is outside that noun
- phrase.
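-
- One way to picture the brace convention is as a parse into nested
- units, sketched here in Python rather than Icon. This is not the
- program's own code: the function name parse_units and the sample tags
- "the1" and "truth1" are hypothetical, and the sketch assumes that
- braces may nest (a noun phrase inside a braced main clause).
-
```python
def parse_units(text: str) -> list:
    """Parse brace-delimited text into nested lists of tokens.

    Each braced span becomes its own list, so later steps (such as
    article agreement) can be confined to a single unit.
    """
    # Pad braces with spaces so they split off as separate tokens.
    tokens = text.replace("{", " { ").replace("}", " } ").split()
    root: list = []
    stack = [root]          # stack[-1] is the unit currently being filled
    for token in tokens:
        if token == "{":
            unit: list = []
            stack[-1].append(unit)
            stack.append(unit)
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unmatched closing brace")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unmatched opening brace")
    return root


units = parse_units("{ {the1 chief_priests1} know6 {the1 truth1} }")
# units[0] is the main clause; its first and last items are noun phrases.
```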
-
- To make the program translate into some other language such
- as French, it is first necessary to change the semanticon to
- contain French renderings for the semantic tags. (The semanticon
- can be changed with a text editor.) Note that French requires
- explicit subject pronouns, so the entry for "know6" would contain
- two words meaning 'they know' rather than the single Spanish word
- 'saben'. After this is done, it will still be necessary to make
- some program modifications, but they should not be too formidable
- for French. First of all, the program has some global variables
- containing Spanish articles. These need to be changed to their
- French counterparts, but it probably won't be necessary to change
- the identifier names. Second, it will be necessary to modify the
- procedure contract(). The rules for contraction will be different
- in French. Likewise, the procedure phono_adj() which makes
- phonological adjustments (like "a house" but "an hour") will have
- to be modified to follow French rules. The procedure which moves
- object pronouns in front of verbs may or may not need to be
- modified. (I don't know what the rules are for French.) None of
- the required modifications should be too time-consuming since the
- entire program was written in just fifteen days.
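-
- To show why the contraction rules must change per language, here is a
- table-driven sketch in Python rather than Icon. It is not the
- program's contract() procedure, whose internals are not described in
- this letter; the function below merely shares its name, and the two
- rule tables are illustrative. The contractions themselves (Spanish
- "a el" to "al", "de el" to "del"; French "à le" to "au", "à les" to
- "aux", "de le" to "du", "de les" to "des") are standard.
-
```python
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, str], str]

SPANISH_CONTRACTIONS: Rules = {
    ("a", "el"): "al",
    ("de", "el"): "del",
}

FRENCH_CONTRACTIONS: Rules = {
    ("à", "le"): "au",
    ("à", "les"): "aux",
    ("de", "le"): "du",
    ("de", "les"): "des",
}


def contract(words: List[str], rules: Rules) -> List[str]:
    """Merge adjacent preposition + article pairs listed in rules."""
    result: List[str] = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in rules:            # a lone final word never matches
            result.append(rules[pair])
            i += 2                   # both words were consumed
        else:
            result.append(words[i])
            i += 1
    return result
```
-
- Under this design, moving from Spanish to French means swapping the
- rule table while leaving the procedure body alone.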
-
- Translations into Portuguese, Italian, and possibly French,
- as well as a Papua New Guinea language called Tigak, are planned
- for later this year.
-
- Doug Witmer
- Internet: b912dieg@utarlg.uta.edu
- Bitnet: b912dieg@utarlg
- smail: 1102 Enterprise Drive #149, Grand Prairie, Texas 75051
-