ProfitPress Mega CDROM2 Shareware Freeware (MSDOS)(1992)(Eng)

Text File | 1991-08-29 | 7.6 KB | 141 lines

08/29/91

Dear Fellow Icon enthusiasts,

Here's a letter answering some questions that were asked about the TRAN1 machine translation program. Perhaps some of the information here is of general interest.

First of all, a fundamental principle underlying the design of this machine translation program is the idea that it is reasonable to put a good deal of manual analysis into a text that will be translated into a multitude of target languages. An example of such a text is the Bible, which still has not been translated into some 3500 minority languages. Other suitable candidates for this type of treatment are owner's manuals for various products and the legislation of the European Community.

A corollary to this first principle is the notion that any machine translation program will be more successful if the grammar of the source text is as limited as possible. In keeping with this corollary, the syntax of the program's input text has been greatly simplified.

A second fundamental principle is that the program attempts to translate meaning rather than just words. To that end, the analysis of the source text is based on the theory expounded in _The Semantic Structure of Written Communication_ (Beekman, Callow, & Kopesec 1981). According to the SSWC, concepts/meanings come in four classes: Things, Events, Attributes, and Relations (1981:49). In their simplest forms, things are represented by nouns, events by verbs, attributes by adjectives and adverbs, and relations by function words like conjunctions, sentence adverbs, and prepositions.

A formidable problem for the translator presents itself when concepts are not represented in their simplest forms; this is called lexical skewing. For instance, in the sentence "John gave Mary some help", the word "help" is really an event. A simpler/unskewed way to express the same meaning would be "John helped Mary."

In the analysis of the source text included with the program, an attempt was made to eliminate lexical skewing to the fullest extent possible. It should be noted that this is not entirely necessary when translating between closely related languages, but it becomes critical when translating into minority languages which may lack abstract nouns for events like "love" or "forgiveness".

As noted above, an attempt was also made to utilize a very limited syntax in the analysis of the source text. Ideally a sentence should consist of a subject, verb, objects, and possibly a relative clause. Passive voice is not permitted because it does not exist in all languages. Conjunctions and sentence adverbs are used in a stylized manner (i.e., they always mean the same thing).

To facilitate translation of meanings rather than words, a system utilizing connecting underscores and subscripting digits was employed. For instance, "chief_priests1" is treated as a single concept, and thus contains a connecting underscore. It is also followed by the subscripting digit "1" to distinguish this concept from any others which might possibly be renderable by the same English words. The subscripting digits used are somewhat arbitrary, but in the case of verbs the digits 1 through 3 were used for first, second, and third person singular verbs, and the digits 4 through 6 were used for the corresponding plural forms. Thus "know6" would mean "they know".

Forms such as "chief_priests1" and "know6" are considered to be arbitrary symbols for units of meaning. They could just as easily have been rendered as "abc1" and "xyx6", but this would have resulted in an input text that was unreadable. Nevertheless, the idea that these symbols are arbitrary is important. For example, "chief_priests1" may be rendered fairly literally in one language (e.g., 'sacerdotes principales' in Spanish), but in another language the translation may sound more like 'honored old men of ceremonial rites'.

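The tag convention described above can be illustrated with a short sketch. This is not code from TRAN1; it is a minimal Python illustration, and the names `parse_tag`, `verb_person_number`, and `SemanticTag` are hypothetical. It assumes a tag is a run of letters and underscores followed by digits, and it takes the plural verb digits to be 4 through 6, which is what the "know6" = "they know" example implies.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed tag shape: words joined by underscores, then subscript digits.
TAG_RE = re.compile(r"([A-Za-z]+(?:_[A-Za-z]+)*)(\d+)")

# Verb digits: 1-3 singular; 4-6 plural (inferred from "know6" = "they know").
PERSON_NUMBER = {
    1: ("first", "singular"),
    2: ("second", "singular"),
    3: ("third", "singular"),
    4: ("first", "plural"),
    5: ("second", "plural"),
    6: ("third", "plural"),
}


class SemanticTag(NamedTuple):
    concept: str    # e.g. "chief_priests"
    subscript: int  # e.g. 1


def parse_tag(tag: str) -> SemanticTag:
    """Split a semantic tag into its concept name and subscript digit(s)."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a semantic tag: {tag!r}")
    return SemanticTag(match.group(1), int(match.group(2)))


def verb_person_number(tag: str) -> Optional[Tuple[str, str]]:
    """Read person and number off a verb tag; None if the digit is not 1-6.

    Only meaningful for verbs. For other parts of speech the digit is
    just an arbitrary disambiguator.
    """
    return PERSON_NUMBER.get(parse_tag(tag).subscript)
```

For example, `parse_tag("chief_priests1")` yields the concept `chief_priests` with subscript 1, and `verb_person_number("know6")` yields third person plural.
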
The arbitrary forms used to represent meanings are called semantic tags in the program. Since the program is attempting to translate meanings rather than words, it uses an invention called a semanticon rather than a lexicon. Each semanticon entry begins with a semantic tag as described above.

The next field in each entry is a morphological tag. A morphological tag is basically a part of speech, but it can contain additional information such as person, number, gender, tense, and so on. The morphological tag refers to the target language rendering of the concept represented by the semantic tag. This target language rendering may not strictly match the morphological tag in the traditional sense. For instance, "sacerdotes principales" 'priests high' is not a noun in the traditional sense, but a combination of a noun plus an adjective. However, it functions as a single unit, and for this reason the conglomerate is treated as a noun in the semanticon.

The next field in the semanticon entry is the target language rendering of the concept represented by the semantic tag. It generally contains a single target language word, but it may contain multiple words. If the morphological tag is "n" for noun, the entry consists of an article followed by one or more words connected by underscores which loosely represent a noun. If the morphological tag is one of those for adjectives, the entry consists of four words: a masculine and a feminine singular adjective and a masculine and a feminine plural adjective.

The source language text to be translated contains braces. These braces are used to delimit portions of the text which should be translated as a unit. For instance, noun phrases and prepositional phrases should be surrounded by braces, and it's a good idea to surround the main clause by braces.

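To make the semanticon fields and the brace-delimited units more concrete, here is a minimal Python sketch. It is not taken from TRAN1, and several details are assumptions: that an entry is one line of whitespace-separated fields, that the adjective and verb tags are spelled "adj" and "v" (only "n" is named above), and that braces may nest. The names `Entry`, `parse_entry`, `load_semanticon`, and `parse_units` are hypothetical.

```python
import re
from typing import Any, Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    tag: str          # semantic tag, e.g. "chief_priests1"
    morph: str        # morphological tag, e.g. "n"
    forms: List[str]  # target language rendering(s)


def parse_entry(line: str) -> Entry:
    """Parse one semanticon line (assumed layout: tag, morph tag, forms)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"incomplete entry: {line!r}")
    tag, morph, *forms = fields
    if morph == "n" and len(forms) < 2:
        # A noun entry holds an article followed by the noun rendering.
        raise ValueError(f"noun entry needs article + rendering: {line!r}")
    if morph == "adj" and len(forms) != 4:
        # Adjective order: masc. sg., fem. sg., masc. pl., fem. pl.
        raise ValueError(f"adjective entry needs four forms: {line!r}")
    return Entry(tag, morph, forms)


def load_semanticon(lines: Iterable[str]) -> Dict[str, Entry]:
    """Index entries by semantic tag for lookup during translation."""
    return {entry.tag: entry for entry in map(parse_entry, lines)}


def parse_units(text: str) -> List[Any]:
    """Turn brace-delimited text into nested lists, one list per unit."""
    stack: List[List[Any]] = [[]]
    for token in re.findall(r"[{}]|[^\s{}]+", text):
        if token == "{":
            stack.append([])          # open a new unit
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unmatched closing brace")
            unit = stack.pop()        # close the unit and attach it
            stack[-1].append(unit)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unmatched opening brace")
    return stack[0]
```

With made-up sample entries such as `"chief_priests1 n los sacerdotes_principales"` and `"know6 v saben"`, `load_semanticon` gives a dictionary keyed by semantic tag, and `parse_units("{ {chief_priests1} know6 }")` gives a main-clause unit that contains a noun-phrase unit. Keeping each unit as its own list is one way to ensure that agreement rules never look outside the unit they are applied to.
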
The program translates text in braces as units, so if a noun phrase is surrounded by braces the program will never make the article of that noun phrase agree with a noun which is outside that noun phrase.

To make the program translate into some other language such as French, it is first necessary to change the semanticon to contain French renderings for the semantic tags. (The semanticon can be changed with a text editor.) Note that French requires explicit subject pronouns, so the entry for "know6" would contain two words meaning 'they know' rather than the single Spanish word 'saben'.

After this is done, it will still be necessary to make some program modifications, but they should not be too formidable for French. First of all, the program has some global variables containing Spanish articles. These need to be changed to their French counterparts, but it probably won't be necessary to change the identifier names. Second, it will be necessary to modify the procedure contract(). The rules for contraction will be different in French. Likewise, the procedure phono_adj(), which makes phonological adjustments (like "a house" but "an hour"), will have to be modified to follow French rules. The procedure which moves object pronouns in front of verbs may or may not need to be modified. (I don't know what the rules are for French.)

None of the required modifications should be too time consuming, since the entire program was written in just fifteen days.

Translations into Portuguese, Italian, and possibly French, as well as a Papua New Guinea language called Tigak, are planned for later this year.

Doug Witmer
Internet: b912dieg@utarlg.uta.edu
Bitnet: b912dieg@utarlg
smail: 1102 Enterprise Drive #149, Grand Prairie, Texas 75051