GNU Info File | 1996-10-12 | 45.3 KB | 968 lines |
- This is Info file gettext.info, produced by Makeinfo-1.64 from the
- input file /ade-src/fsf/gettext/doc/gettext.texi.
-
- START-INFO-DIR-ENTRY
- * Gettext Utilities: (gettext). GNU gettext utilities.
- * gettextize: (gettext)gettextize Invocation. Prepare a package for gettext.
- * msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files.
- * msgmerge: (gettext)msgmerge Invocation. Update two PO files into one.
- * xgettext: (gettext)xgettext Invocation. Extract strings into a PO file.
- END-INFO-DIR-ENTRY
-
- This file provides documentation for GNU `gettext' utilities.
-
- Copyright (C) 1995 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided that
- the entire resulting derived work is distributed under the terms of a
- permission notice identical to this one.
-
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that this permission notice may be stated in a
- translation approved by the Foundation.
-
- File: gettext.info, Node: Modifying Translations, Next: Modifying Comments, Prev: Obsolete Entries, Up: Updating
-
- Modifying Translations
- ======================
-
- PO mode prevents direct editing of the PO file by the usual means
- Emacs gives for altering a buffer's contents. By doing so, it aims to
- help the translator avoid small clerical errors in the overall file
- format, or in the proper quoting of strings, as those errors are
- easily made. Other kinds of errors are still possible, but some may be
- caught and diagnosed by the batch validation process, which the
- translator may always trigger with the `V' command. For all other
- errors, the translator has to rely on her own judgment, and also on the
- linguistic reports submitted to her by users of the translated
- package who share her mother tongue.
-
- When the time comes to create a translation, or to correct an error
- diagnosed mechanically or reported by a user, the translator uses the
- following commands to modify translations.
-
- `RET'
- Interactively edit the translation.
-
- `LFD'
- Reinitialize the translation with the original, untranslated
- string.
-
- `k'
- Save the translation on the kill ring, and delete it.
-
- `w'
- Save the translation on the kill ring, without deleting it.
-
- `y'
- Replace the translation, taking the new from the kill ring.
-
- The command `RET' (`po-edit-msgstr') opens a new Emacs window
- containing a copy of the translation taken from the current PO file
- entry, ready for editing, fully modifiable with the complete range of
- GNU Emacs editing commands. The string is presented to the translator
- stripped of all quoting marks, and she modifies the *unquoted* string
- in this window to her heart's content. Once done, the regular Emacs
- command `M-C-c' (`exit-recursive-edit') may be used to return the
- edited translation into the PO file, replacing the original
- translation. The keys `C-c C-c' are bound so they have the same effect
- as `M-C-c'.
-
- If the translator becomes dissatisfied with her edit, to the extent
- that she prefers to keep the translation which existed prior to the
- `RET' command, she may use the regular Emacs command `C-]'
- (`abort-recursive-edit') to discard the edit while preserving the
- original translation. Another way would be for her to exit normally
- with `C-c C-c', then type `U' once to undo the whole effect of the
- last edit.
-
- While editing her translation, the translator should take care not
- to insert unwanted `RET' (carriage return) characters at the end of
- the translated string if those are not meant to be there, nor to
- remove such characters when they are required. Since these characters
- are not visible in the editing buffer, they are easily introduced by
- mistake. To help her, `RET' automatically puts the character `<' at
- the end of the string being edited, but this `<' is not really part of
- the string. On exiting the editing window with `C-c C-c', PO mode
- automatically removes this `<' and all whitespace added after it. If
- the translator adds characters after the terminating `<', it loses its
- delimiting property and becomes an integral part of the string. If she
- removes the delimiting `<', then the edited string is taken *as is*,
- with all trailing newlines, even if invisible. Also, if the translated
- string ought to end with a genuine `<', then the delimiting `<' must
- not be removed; the string should then appear, in the editing window,
- as ending with two `<' in a row.
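-
- The exit rule just described can be sketched in C. This is only an
- illustration of the rule, not PO mode's actual Emacs Lisp code; the
- function name `strip_edit_delimiter' is hypothetical.
-
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PO mode: apply the end-of-edit rule
   to BUF in place and return the resulting length.  */
size_t strip_edit_delimiter(char *buf)
{
    assert(buf != NULL);

    size_t len = strlen(buf);
    size_t end = len;

    /* Look past any whitespace typed after the delimiter.  */
    while (end > 0 && isspace((unsigned char) buf[end - 1]))
        end--;

    /* A `<' that comes last (apart from that whitespace) is the
       delimiter: drop it together with everything following it.  */
    if (end > 0 && buf[end - 1] == '<') {
        buf[end - 1] = '\0';
        return end - 1;
    }

    /* No delimiter: the string is taken as is, trailing newlines
       included.  */
    return len;
}
```
-
- Under this sketch, a buffer ending in two `<' in a row keeps exactly
- one of them, and a buffer in which text follows the `<' is left
- untouched.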
-
- When a translation (or a comment) is being edited, the translator
- may move the cursor back into the PO file buffer and freely move to
- other entries, browsing at will. The edited entry will be recovered as
- soon as the edit ceases, because it is this entry only which is being
- modified. If, with an edit still open, the translator wanders in the
- PO file buffer, she cannot modify any other entry. If she tries to, PO
- mode will react by suggesting that she abort the current edit, or
- else by inviting her to finish the current edit prior to any other
- modification.
-
- The command `LFD' (`po-msgid-to-msgstr') initializes, or
- reinitializes, the translation with the original string. This command
- is normally used when the translator wants to redo a fresh translation
- of the original string, disregarding any previous work.
-
- In fact, whether it is best to start a translation with an empty
- string, or rather with a copy of the original string, is a matter of
- taste or habit. Sometimes, the source language and the target language
- are so different that it is simply best to start writing on an empty
- page. At other times, the source and target languages are so close
- that it would be a waste to retype a number of words already written
- in the original string. A translator may also like having the original
- string right before her eyes, as she progressively overwrites the
- original text with the translation, even if this requires some extra
- editing work to get rid of the original.
-
- The command `k' (`po-kill-msgstr') merely empties the translation
- string, thus turning the entry into an untranslated one. But while
- doing so, it sets the previous contents aside in a special place, known
- as the kill ring. The command `w' (`po-kill-ring-save-msgstr') also
- puts a copy of the translation onto the kill ring, but it otherwise
- leaves the entry alone, and does *not* remove the translation from the
- entry. Both commands use the regular Emacs kill ring, which is shared
- between buffers, and which is already well known to GNU Emacs lovers.
-
- The translator may use `k' or `w' many times in the course of her
- work, as the kill ring may hold several saved translations. From the
- kill ring, strings may later be reinserted in various Emacs buffers.
- In particular, the kill ring may be used for moving translation strings
- between different entries of a single PO file buffer, or if the
- translator is handling many such buffers at once, even between PO files.
-
- To facilitate exchanges with buffers which are not in PO mode, the
- translation string put on the kill ring by the `k' command is fully
- unquoted before being saved: external quotes are removed, multi-line
- strings are concatenated, and backslash escape sequences are turned
- into their corresponding characters. In the special case of obsolete
- entries, the translation is also uncommented prior to saving.
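-
- As a rough illustration of this unquoting, here is a small C sketch.
- It is not PO mode's own code (PO mode is written in Emacs Lisp); the
- name `po_unquote' is hypothetical, and only the escapes `\n', `\t',
- `\r', `\\' and `\"' are handled, which is an assumption made to keep
- the example short.
-
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: unquote the PO-style string literal(s) in SRC
   into DST, which has room for CAP bytes.  Returns the unquoted
   length, or (size_t) -1 if DST is too small.  */
size_t po_unquote(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t) -1;

    size_t n = 0;
    int inside = 0;             /* are we between two double quotes? */

    for (; *src != '\0'; src++) {
        char c = *src;

        if (!inside) {
            /* Whatever separates two literals (usually a newline) is
               dropped, which concatenates multi-line strings.  */
            if (c == '"')
                inside = 1;
            continue;
        }
        if (c == '"') {         /* closing external quote */
            inside = 0;
            continue;
        }
        if (c == '\\' && src[1] != '\0') {
            src++;
            switch (*src) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case 'r': c = '\r'; break;
            default:  c = *src; break;  /* \\ and \" stand for themselves */
            }
        }
        if (n + 1 >= cap)       /* keep room for the final NUL */
            return (size_t) -1;
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```
-
- Given the two-line quoted form `"Hello, "' followed by `"world\n"',
- the sketch produces the single string `Hello, world' ending with a
- real newline character.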
-
- The command `y' (`po-yank-msgstr') completely replaces the
- translation of the current entry by a string taken from the kill ring.
- Following GNU Emacs terminology, we then say that the replacement
- string is "yanked" into the PO file buffer. *Note Yanking:
- (emacs)Yanking. The first time `y' is used, the translation receives
- the value of the most recent addition to the kill ring. If `y' is
- typed once again, immediately, without intervening keystrokes, the
- translation just inserted is taken away and replaced by the second most
- recent addition to the kill ring. By repeating `y' many times in a row,
- the translator may travel along the kill ring for saved strings, until
- she finds the string she really wanted.
-
- When a string is yanked into a PO file entry, it is fully and
- automatically requoted to comply with the format PO files should
- have. Further, if the entry is obsolete, PO mode then appropriately
- pushes the inserted string inside comments. Once again, translators
- should not burden themselves with quoting considerations beyond, of
- course, whatever the translated string itself requires for the program
- using it.
-
- Note that `k' and `w' are not the only commands pushing strings on
- the kill ring, as almost any PO mode command replacing translation
- strings (or the translator comments) automatically saves the old string
- on the kill ring. The main exceptions to this general rule are the
- yanking commands themselves.
-
- To better illustrate the operation of killing and yanking, let's use
- an actual example, taken from a common situation. When the programmer
- slightly modifies some string right in the program, his change is later
- reflected in the PO file by the appearance of a new untranslated entry
- for the modified string, and the fact that the entry translating the
- original or unmodified string becomes obsolete. In many cases, the
- translator might spare herself some work by retrieving the unmodified
- translation from the obsolete entry, then initializing the untranslated
- entry `msgstr' field with this retrieved translation. Once this is
- done, the obsolete entry is not wanted anymore, and may be safely
- deleted.
-
- When the translator finds an untranslated entry and suspects that a
- slight variant of the translation exists, she immediately uses `m' to
- mark the current entry location, then starts chasing obsolete entries
- with `o', hoping to find some translation corresponding to the
- unmodified string. Once found, she uses the `z' command for deleting
- the obsolete entry, knowing that `z' also *kills* the translation, that
- is, pushes the translation on the kill ring. Then, `r' returns to the
- initial untranslated entry, `y' then *yanks* the saved translation
- right into the `msgstr' field. The translator is then free to use
- `RET' for fine tuning the translation contents, and maybe to later use
- `u', then `m' again, for going on with the next untranslated string.
-
- When some sequence of keys has to be typed over and over again, the
- translator may find it useful to become better acquainted with the GNU
- Emacs capability of learning these sequences and playing them back on
- request. *Note Keyboard Macros: (emacs)Keyboard Macros.
-
- File: gettext.info, Node: Modifying Comments, Next: Auxiliary, Prev: Modifying Translations, Up: Updating
-
- Modifying Comments
- ==================
-
- Any translation work done seriously will raise many linguistic
- difficulties, for which decisions have to be made, and the choices
- further documented. This documentation may be saved within the PO file
- in the form of translator comments, which the translator is free to
- create, delete, or modify at will. These comments may be useful to
- herself when she returns to this PO file after a while.
-
- The following commands are somewhat similar to those modifying
- translations, so the general indications given for those apply here.
- *Note Modifying Translations::.
-
- `#'
- Interactively edit the translator comments.
-
- `K'
- Save the translator comments on the kill ring, and delete them.
-
- `W'
- Save the translator comments on the kill ring, without deleting them.
-
- `Y'
- Replace the translator comments, taking the new from the kill ring.
-
- These commands parallel PO mode commands for modifying the
- translation strings, and behave much the same way as they do, except
- that they handle the part of PO file comments meant for translator
- use, rather than the translation strings. So, if the descriptions
- given below are rather succinct, it is because the full details have
- already been given. *Note Modifying Translations::.
-
- The command `#' (`po-edit-comment') opens a new Emacs window
- containing a copy of the translator comments on the current PO file
- entry. If there are no such comments, PO mode understands that the
- translator wants to add a comment to the entry, and she is presented
- with an empty window. Comment marks (`#') and the space following them
- are automatically removed before editing, and reinstated after. For
- translator comments pertaining to obsolete entries, the uncommenting
- and recommenting operations are done twice. Once in the editing
- window, the keys `C-c C-c' allow the translator to signal that she has
- finished editing the comment.
-
- The command `K' (`po-kill-comment') gets rid of all translator
- comments, while saving those comments on the kill ring. The command
- `W' (`po-kill-ring-save-comment') takes a copy of the translator
- comments on the kill ring, but leaves them undisturbed in the current
- entry. The command `Y' (`po-yank-comment') completely replaces the
- translator comments by a string taken at the front of the kill ring.
- When this command is immediately repeated, the comments just inserted
- are withdrawn, and replaced by other strings taken along the kill ring.
-
- On the kill ring, all strings have the same nature. There is no
- distinction between *translation* strings and *translator comments*
- strings. So, for example, let's presume the translator has just
- finished editing a translation, and wants to create a new translator
- comment to document why the previous translation was not good, just to
- remember what the problem was. Foreseeing that she will do that in her
- documentation, the translator may want to quote the previous
- translation in her translator comments. To do so, she may initialize
- the translator comments with the previous translation, still at the
- head of the kill ring. Because editing already pushed the previous
- translation on the kill ring, she merely has to type `M-w' prior to
- `#', and the previous translation will be right there, all ready for
- being introduced by some explanatory text.
-
- On the other hand, presume there are some translator comments already
- and that the translator wants to add to those comments, instead of
- wholly replacing them. Then, she should edit the comment right away
- with `#'. Once inside the editing window, she can use the regular GNU
- Emacs commands `C-y' (`yank') and `M-y' (`yank-pop') to get the
- previous translation where she likes.
-
- File: gettext.info, Node: Auxiliary, Prev: Modifying Comments, Up: Updating
-
- Consulting Auxiliary PO Files
- =============================
-
- An upcoming feature of PO mode should help the knowledgeable
- translator take advantage of translations already achieved in other
- languages she happens to know, by providing these other language
- translations as additional context for her own work. Each PO file
- existing for the same package the translator is working on, but
- targeted at a different language, is called an "auxiliary" PO file.
- Commands will exist for declaring and handling auxiliary PO files, and
- also for showing contexts for the entry being worked on. For this to
- work fully, all auxiliary PO files will have to be normalized.
-
- File: gettext.info, Node: Binaries, Next: Users, Prev: Updating, Up: Top
-
- Producing Binary MO Files
- *************************
-
- * Menu:
-
- * msgfmt Invocation:: Invoking the `msgfmt' Program
- * MO Files:: The Format of GNU MO Files
-
- File: gettext.info, Node: msgfmt Invocation, Next: MO Files, Prev: Binaries, Up: Binaries
-
- Invoking the `msgfmt' Program
- =============================
-
- Usage: msgfmt [OPTION] FILENAME.po ...
-
- `-a NUMBER'
- `--alignment=NUMBER'
- Align strings to NUMBER bytes (default: 1).
-
- `-h'
- `--help'
- Display this help and exit.
-
- `--no-hash'
- Binary file will not include the hash table.
-
- `-o FILE'
- `--output-file=FILE'
- Specify output file name as FILE.
-
- `--strict'
- Direct the program to work strictly following the Uniforum/Sun
- implementation. Currently this only affects the naming of the
- output file. If this option is not given the name of the output
- file is the same as the domain name. If the strict Uniforum mode
- is enabled, the suffix `.mo' is added to the file name if it is not
- already present.
-
- We find this behaviour of Sun's implementation rather silly and so
- by default this mode is *not* selected.
-
- `-v'
- `--verbose'
- Detect and diagnose input file anomalies which might represent
- translation errors. The `msgid' and `msgstr' strings are studied
- and compared. It is considered abnormal that one string starts or
- ends with a newline while the other does not.
-
- Also, if the string represents a format string used in a
- `printf'-like function, both strings should have the same number of
- `%' format specifiers, with matching types. If the flag
- `c-format' or `possible-c-format' appears in the special comment
- #, for this entry, a check is performed. For example, the check
- will diagnose using `%.*s' against `%s', or `%d' against `%s', or
- `%d' against `%x'. It can even handle positional parameters.
-
- Normally the `xgettext' program automatically decides whether a
- string is a format string or not. This algorithm is not perfect,
- though. It might regard a string as a format string though it is
- not used in a `printf'-like function and so `msgfmt' might report
- errors where there are none. Or the other way round: a string is
- not regarded as a format string but it is used in a `printf'-like
- function.
-
- To solve this problem the programmer can dictate the decision to
- the `xgettext' program (*note c-format::.). The translator should
- not consider removing the flag from the #, line. This "fix" would
- be reversed again as soon as `msgmerge' is called the next time.
-
- `-V'
- `--version'
- Output version information and exit.
-
- If input file is `-', standard input is read. If output file is
- `-', output is written to standard output.
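-
- The newline check that `--verbose' performs on each `msgid' and
- `msgstr' pair, as described above, can be sketched in C as follows.
- This is not `msgfmt' source code; the function names are hypothetical,
- and skipping entries whose `msgstr' is empty (that is, untranslated)
- is an assumption of this sketch.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int starts_with_newline(const char *s)
{
    return s[0] == '\n';
}

static int ends_with_newline(const char *s)
{
    size_t len = strlen(s);
    return len > 0 && s[len - 1] == '\n';
}

/* Hypothetical helper: return 1 when exactly one of MSGID and MSGSTR
   starts (or ends) with a newline, 0 when the pair looks consistent.  */
int newline_mismatch(const char *msgid, const char *msgstr)
{
    assert(msgid != NULL && msgstr != NULL);

    /* Assumption of this sketch: an empty msgstr means the entry is
       untranslated, so there is nothing to compare yet.  */
    if (msgstr[0] == '\0')
        return 0;

    return starts_with_newline(msgid) != starts_with_newline(msgstr)
        || ends_with_newline(msgid) != ends_with_newline(msgstr);
}
```
-
- For instance, an original ending in a newline paired with a
- translation that does not end in one would be reported as an anomaly.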
-
- File: gettext.info, Node: MO Files, Prev: msgfmt Invocation, Up: Binaries
-
- The Format of GNU MO Files
- ==========================
-
- The format of the generated MO files is best described by a picture,
- which appears below.
-
- The first two words serve to identify the file. The magic number
- will always signal GNU MO files. The number is stored in the byte
- order of the generating machine, so the magic number really is two
- numbers: `0x950412de' and `0xde120495'. The second word describes the
- current revision of the file format. For now the revision is 0. This
- might change in future versions, and ensures that the readers of MO
- files can distinguish new formats from old ones, so that both can be
- handled correctly. The version is kept separate from the magic number,
- instead of using different magic numbers for different formats, mainly
- because `/etc/magic' is not updated often. It might be better to have
- magic separated from internal format version identification.
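-
- A reader of MO files might use the two forms of the magic number to
- find out whether the remaining words need byte swapping. The C sketch
- below illustrates this under the assumption that the whole file is
- already in memory; `mo_identify', `struct mo_ident' and the macro
- names are hypothetical and are not taken from the GNU `gettext'
- sources.
-
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC         0x950412deU
#define MO_MAGIC_SWAPPED 0xde120495U

/* Hypothetical result type for this sketch.  */
struct mo_ident {
    int swapped;        /* 1 if the file was written in the other byte order */
    uint32_t revision;  /* file format revision (second word) */
};

/* Reverse the four bytes of V.  */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00U)
         | ((v << 8) & 0x00ff0000U) | (v << 24);
}

/* Inspect the first two words of the in-memory image FILE of SIZE
   bytes.  Returns 0 and fills OUT on success, -1 if the image is too
   short or does not start with either form of the magic number.  */
int mo_identify(const unsigned char *file, size_t size, struct mo_ident *out)
{
    assert(file != NULL && out != NULL);
    if (size < 8)
        return -1;

    uint32_t magic, revision;
    memcpy(&magic, file, sizeof magic);          /* host byte order */
    memcpy(&revision, file + 4, sizeof revision);

    if (magic == MO_MAGIC)
        out->swapped = 0;
    else if (magic == MO_MAGIC_SWAPPED)
        out->swapped = 1;
    else
        return -1;

    out->revision = out->swapped ? swap32(revision) : revision;
    return 0;
}
```
-
- A caller would then apply the same swapping to every other 32-bit
- word it reads from a file reported as swapped.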
-
- A number of pointers to later tables in the file follow, allowing
- for the extension of the prefix part of MO files without having to
- recompile programs reading them. This might become useful for later
- inserting a few flag bits, an indication of the charset used, new
- tables, or other things.
-
- Then, at offset O and offset T in the picture, two tables of string
- descriptors can be found. In both tables, each string descriptor uses
- two 32-bit integers, one for the string length, another for the offset
- of the string in the MO file, counting in bytes from the start of the
- file. The first table contains descriptors for the original strings,
- and is sorted so the original strings are in increasing lexicographical
- order. The second table contains descriptors for the translated
- strings, and is parallel to the first table: to find the corresponding
- translation one has to access the array slot in the second array with
- the same index.
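-
- The parallel tables can be walked as in the following C sketch, which
- fetches the original string and the translation stored at a given
- index. It assumes the file image is in memory and was written in the
- byte order of the reading machine; the function names are
- hypothetical, and the header offsets 8, 12 and 16 are those of N, O
- and T in the picture.
-
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit word stored OFFSET bytes into the file image.  */
static uint32_t word_at(const unsigned char *file, size_t offset)
{
    uint32_t v;
    memcpy(&v, file + offset, sizeof v);
    return v;
}

/* Resolve descriptor number INDEX of the table starting at byte TABLE.
   Returns NULL if the descriptor or its string falls outside the file.  */
static const char *string_at(const unsigned char *file, size_t size,
                             uint32_t table, uint32_t index)
{
    uint64_t desc = (uint64_t) table + (uint64_t) index * 8;
    if (desc + 8 > size)
        return NULL;

    uint32_t length = word_at(file, (size_t) desc);      /* first word */
    uint32_t offset = word_at(file, (size_t) desc + 4);  /* second word */

    /* The string and its terminating NUL must lie inside the file.  */
    if ((uint64_t) offset + length + 1 > size)
        return NULL;
    return (const char *) file + offset;
}

/* Hypothetical accessor: store entry INDEX of both tables in *ORIGINAL
   and *TRANSLATION.  Returns 0 on success, -1 on any inconsistency.  */
int mo_entry(const unsigned char *file, size_t size, uint32_t index,
             const char **original, const char **translation)
{
    assert(file != NULL && original != NULL && translation != NULL);
    if (size < 28)                      /* the seven header words */
        return -1;

    uint32_t n = word_at(file, 8);      /* number of strings */
    uint32_t o = word_at(file, 12);     /* table of original strings */
    uint32_t t = word_at(file, 16);     /* table of translations */
    if (index >= n)
        return -1;

    *original = string_at(file, size, o, index);
    *translation = string_at(file, size, t, index);
    return (*original != NULL && *translation != NULL) ? 0 : -1;
}
```
-
- The same index is used in both tables, which is all that "parallel"
- means here.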
-
- Having the original strings sorted enables the use of simple binary
- search, for when the MO file does not contain a hash table, or for
- when it is not practical to use the hash table provided in the MO
- file. This also has another advantage: the empty string in a PO file
- is usually *translated* by GNU `gettext' into some system information
- attached to that particular MO file, and the empty string necessarily
- becomes the first in both the original and translated tables, making
- the system information very easy to find.
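-
- The binary search mentioned above might look like the following C
- sketch. To stay short it works on two in-memory arrays standing for
- the sorted original strings and their parallel translations, rather
- than on a real MO file; `catalog_lookup' is a hypothetical name, and
- returning the original string when nothing is found is a choice made
- for this example.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: ORIGINALS holds COUNT strings in increasing
   strcmp order, TRANSLATIONS is parallel to it.  Returns the
   translation of MSGID, or MSGID itself when it is not in the table.  */
const char *catalog_lookup(const char *const *originals,
                           const char *const *translations,
                           size_t count, const char *msgid)
{
    assert(originals != NULL && translations != NULL && msgid != NULL);

    size_t low = 0;
    size_t high = count;        /* search the half-open range [low, high) */

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        int cmp = strcmp(msgid, originals[mid]);

        if (cmp == 0)
            return translations[mid];   /* same index in both tables */
        if (cmp < 0)
            high = mid;
        else
            low = mid + 1;
    }
    return msgid;
}
```
-
- Because the empty string sorts before every other string, looking up
- `""' in such a table always lands on index 0.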
-
- The size S of the hash table can be zero. In this case, the hash
- table itself is not contained in the MO file. Some people might prefer
- this because a precomputed hashing table takes disk space, and does not
- win *that* much speed. The hash table contains indices to the sorted
- array of strings in the MO file. Conflict resolution is done by double
- hashing. The precise hashing algorithm used depends closely on the GNU
- `gettext' code, and is not documented here.
-
- As for the strings themselves, they follow the hash table, and each
- is terminated with a NUL, and this NUL is not counted in the length
- which appears in the string descriptor. The `msgfmt' program has an
- option selecting the alignment for MO file strings. With this option,
- each string is separately aligned so it starts at an offset which is a
- multiple of the alignment value. On some RISC machines, a correct
- alignment will speed things up.
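-
- The effect of the alignment option on string offsets can be pictured
- with a short C sketch. It is an illustration only, not `msgfmt' code;
- `layout_strings' is a hypothetical name, and the sketch merely
- computes where each string would start.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round OFFSET up to the next multiple of ALIGNMENT.  */
static size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: compute in OFFSETS where each of the COUNT
   strings would start when the first one may begin no earlier than
   START and every string is separately aligned.  Returns the offset
   just past the NUL of the last string.  */
size_t layout_strings(const char *const *strings, size_t count,
                      size_t start, size_t alignment, size_t *offsets)
{
    assert(strings != NULL && offsets != NULL);
    assert(alignment > 0);

    size_t position = start;
    for (size_t i = 0; i < count; i++) {
        position = align_up(position, alignment);
        offsets[i] = position;
        /* The NUL occupies a byte in the file even though it is not
           counted in the length stored in the descriptor.  */
        position += strlen(strings[i]) + 1;
    }
    return position;
}
```
-
- With the default alignment of 1 the strings are simply packed one
- after the other.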
-
- Nothing prevents an MO file from having embedded NULs in strings.
- However, the program interface currently used already presumes that
- strings are NUL terminated, so embedded NULs are somewhat useless. But
- the MO file format is general enough that other interfaces would be
- possible later if, for example, we ever want to implement wide
- characters right in MO files, where NUL bytes may accidentally appear.
-
- This particular issue has been strongly debated in the GNU `gettext'
- development forum, and it is to be expected that the MO file format
- will evolve or change over time. It is even possible that many formats
- may later be supported concurrently. But surely, we have to start
- somewhere, and the MO file format described here is a good start.
- Nothing is cast in concrete, and the format may later evolve fairly
- easily, so we should feel comfortable with the current approach.
-
-         byte
-              +------------------------------------------+
-           0  | magic number = 0x950412de                |
-              |                                          |
-           4  | file format revision = 0                 |
-              |                                          |
-           8  | number of strings                        |  == N
-              |                                          |
-          12  | offset of table with original strings    |  == O
-              |                                          |
-          16  | offset of table with translation strings |  == T
-              |                                          |
-          20  | size of hashing table                    |  == S
-              |                                          |
-          24  | offset of hashing table                  |  == H
-              |                                          |
-              .                                          .
-              .    (possibly more entries later)         .
-              .                                          .
-              |                                          |
-           O  | length & offset 0th string  ----------------.
-       O + 8  | length & offset 1st string  ------------------.
-               ...                                    ...   | |
- O + ((N-1)*8)| length & offset (N-1)th string           |  | |
-              |                                          |  | |
-           T  | length & offset 0th translation  ---------------.
-       T + 8  | length & offset 1st translation  -----------------.
-               ...                                    ...   | | | |
- T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
-              |                                          |  | | | |
-           H  | start hash table                         |  | | | |
-               ...                                    ...   | | | |
-   H + S * 4  | end hash table                           |  | | | |
-              |                                          |  | | | |
-              | NUL terminated 0th string  <----------------' | | |
-              |                                          |    | | |
-              | NUL terminated 1st string  <------------------' | |
-              |                                          |      | |
-               ...                                    ...       | |
-              |                                          |      | |
-              | NUL terminated 0th translation  <---------------' |
-              |                                          |        |
-              | NUL terminated 1st translation  <-----------------'
-              |                                          |
-               ...                                    ...
-              |                                          |
-              +------------------------------------------+
-
- File: gettext.info, Node: Users, Next: Programmers, Prev: Binaries, Up: Top
-
- The User's View
- ***************
-
- When GNU `gettext' has truly reached its goal, average users
- should feel some kind of astonished pleasure, seeing the effect of that
- strange kind of magic that just makes their own native language appear
- everywhere on their screens. As for naive users, they would ideally
- have no special pleasure about it, merely taking their own language for
- *granted*, and becoming rather unhappy otherwise.
-
- So, let's try to describe here how we would like the magic to
- operate, as we want the users' view to be the simplest, among all ways
- one could look at GNU `gettext'. All other software engineers:
- programmers, translators, maintainers, should work together in such a
- way that the magic becomes possible. This is a long and progressive
- undertaking, and information is available about the progress of the GNU
- Translation Project.
-
- When a package is distributed, there are two kinds of users:
- "installers" who fetch the distribution, unpack it, configure it,
- compile it and install it for themselves or others to use; and "end
- users" who call programs of the package, once these have been
- installed at their site. GNU `gettext' offers magic for both
- installers and end users.
-
- * Menu:
-
- * Matrix:: The Current `NLS' Matrix for GNU
- * Installers:: Magic for Installers
- * End Users:: Magic for End Users
-
- File: gettext.info, Node: Matrix, Next: Installers, Prev: Users, Up: Users
-
- The Current `NLS' Matrix for GNU
- ================================
-
- Languages are not equally supported in all GNU packages. To know if
- some GNU package uses GNU `gettext', one may check the distribution for
- the `NLS' information file, for some `LL.po' files, often kept together
- in some `po/' directory, or for an `intl/' directory.
- Internationalized packages usually have many `LL.po' files, where LL
- represents the language. *Note End Users:: for a complete description
- of the format for LL.
-
- More generally, a matrix is available for showing the current state
- of GNU internationalization, listing which packages are prepared for
- multi-lingual messages, and which languages are supported by each.
- Because this information changes often, this matrix is not kept within
- this GNU `gettext' manual. This information is often found in the file
- `NLS' from various GNU distributions, but it is then only as recent as
- the distribution itself. A recent copy of this `NLS' file, containing
- up-to-date information, should generally be found on most GNU archive
- sites.
-
- File: gettext.info, Node: Installers, Next: End Users, Prev: Matrix, Up: Users
-
- Magic for Installers
- ====================
-
- By default, packages fully using GNU `gettext' internally are
- installed in such a way that they allow translation of messages. At
- *configuration* time, those packages should automatically detect
- whether the underlying host system provides usable `catgets' or
- `gettext' functions. If neither is present, the GNU `gettext' library
- should be automatically prepared and used. Installers may use special
- options at configuration time for changing this behavior. The command
- `./configure --with-included-gettext' bypasses system `catgets' or
- `gettext' to use GNU `gettext' instead, while `./configure
- --disable-nls' produces programs totally unable to translate messages.
-
- Internationalized packages usually have many `LL.po' files. Unless
- translations are disabled, all those available are installed together
- with the package. However, the environment variable `LINGUAS' may be
- set, prior to configuration, to limit the installed set. `LINGUAS'
- should then contain a space-separated list of two-letter codes, stating
- which languages are allowed.
-
- File: gettext.info, Node: End Users, Prev: Installers, Up: Users
-
- Magic for End Users
- ===================
-
- We consider here those packages using GNU `gettext' internally, and
- for which the installers did not disable translation at *configure*
- time. Then, users only have to set the `LANG' environment variable to
- the appropriate `LL' prior to using the programs in the package. *Note
- Matrix::. For example, let's presume a German site. At the shell
- prompt, users merely have to execute `setenv LANG de' (in `csh') or
- `export LANG; LANG=de' (in `sh'). They could even do this from their
- `.login' or `.profile' file.
-
- File: gettext.info, Node: Programmers, Next: Translators, Prev: Users, Up: Top
-
- The Programmer's View
- *********************
-
- One aim of the current message catalog implementation provided by
- GNU `gettext' was to use the system's message catalog handling, if the
- installer wishes to do so. So we perhaps should first take a look at
- the solutions we know about. The people in the POSIX committee did not
- manage to agree on one of the semi-official standards which we'll
- describe below. In fact they couldn't agree on anything, so they
- decided only to include an example of an interface. The major Unix
- vendors are split in the usage of the two most important
- specifications: X/Open's catgets vs. Uniforum's gettext interface.
- We'll describe them both and later explain our solution of this
- dilemma.
-
- * Menu:
-
- * catgets:: About `catgets'
- * gettext:: About `gettext'
- * Comparison:: Comparing the two interfaces
- * Using libintl.a:: Using libintl.a in own programs
- * gettext grok:: Being a `gettext' grok
- * Temp Programmers:: Temporary Notes for the Programmers Chapter
-
- File: gettext.info, Node: catgets, Next: gettext, Prev: Programmers, Up: Programmers
-
- About `catgets'
- ===============
-
- The `catgets' implementation is defined in the X/Open Portability
- Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
- process of creating this standard seemed to be too slow for some of the
- Unix vendors, so they based their implementations on preliminary
- versions of the standard. Of course this leads again to problems while
- writing platform independent programs: even the usage of `catgets' does
- not guarantee a uniform interface.
-
- Another, personal comment on this: only a bunch of committee
- members could have made this interface. They never really tried to
- program using this interface. It is a fast, memory-saving
- implementation, and a user can happily live with it. But programmers
- hate it (at least me and some others do...)
-
- But we must not forget one point: after all the trouble with
- transferring the rights on Unix(tm), they at last came to X/Open, the
- very same who published this specification. This leads me to making
- the prediction that this interface will be in future Unix standards
- (e.g. Spec1170) and therefore part of all Unix implementations
- (implementations which are *allowed* to wear this name).
-
- * Menu:
-
- * Interface to catgets:: The interface
- * Problems with catgets:: Problems with the `catgets' interface?!
-
- File: gettext.info, Node: Interface to catgets, Next: Problems with catgets, Prev: catgets, Up: catgets
-
- The Interface
- -------------
-
- The interface to the `catgets' implementation consists of three
- functions which correspond to those used in file access: `catopen' to
- open the catalog for using, `catgets' for accessing the message tables,
- and `catclose' for closing after work is done. Prototypes for the
- functions and the needed definitions are in the `<nl_types.h>' header
- file.
-
- `catopen' is used like this:
-
- nl_catd catd = catopen ("catalog_name", 0);
-
- The function takes as its first argument the name of the catalog. This
- usually refers to the name of the program or the package. The second
- parameter is not further specified in the standard. I don't even know
- whether it is implemented consistently among various systems. So the
- common advice is to use `0' as the value. The return value is a handle
- to the message catalog, equivalent to the file handles returned by
- `open'.
-
- This handle is of course used in the `catgets' function which can be
- used like this:
-
- char *translation = catgets (catd, set_no, msg_id, "original string");
-
- The first parameter is this catalog descriptor. The second parameter
- specifies the set of messages in this catalog from which the message
- identified by `msg_id' is obtained. `catgets' therefore uses a
- three-stage addressing:
-
- catalog name => set number => message ID => translation
-
- The fourth argument is not used to address the translation. It is
- given as a default value in case one of the addressing stages fails.
- One important thing to remember is that although the return type of
- `catgets' is `char *', the resulting string *must not* be changed. It
- would better be `const char *', but the standard was published in
- 1988, one year before ANSI C.
-
- The last of these functions is used and behaves as expected:
-
- catclose (catd);
-
- After this no `catgets' call using the descriptor is legal anymore.
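-
- The three calls can be put together as in the following minimal
- sketch. The set and message numbers and the function names
- `lookup_hello' and `print_hello' are made up for illustration; the
- check against `(nl_catd) -1' assumes the failure value that current
- standards specify for `catopen'.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers.  A real package has to keep
   them in step with the numbers used in its catalog source.  */
#define SET_GREETINGS 1
#define MSG_HELLO     1

/* Return the greeting from CATD, or the default string if the catalog
   could not be opened or the message is missing.  The result must not
   be modified.  */
const char *
lookup_hello (nl_catd catd)
{
  if (catd == (nl_catd) -1)
    return "Hello, world!";
  return catgets (catd, SET_GREETINGS, MSG_HELLO, "Hello, world!");
}

/* Open the catalog, print one message and close the catalog again.
   Return 0 on success and -1 if the catalog was not available.  */
int
print_hello (const char *catalog_name)
{
  nl_catd catd = catopen (catalog_name, 0);

  /* Use the string before catclose invalidates the descriptor.  */
  puts (lookup_hello (catd));

  if (catd == (nl_catd) -1)
    return -1;
  return catclose (catd) == 0 ? 0 : -1;
}
```

- A program would call `print_hello' with its own catalog name. When
- no catalog is installed the default string is printed, so the
- program still works untranslated.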
-
- File: gettext.info, Node: Problems with catgets, Prev: Interface to catgets, Up: catgets
-
- Problems with the `catgets' Interface?!
- ---------------------------------------
-
- Now that this description has made everything seem really easy,
- where are the problems we speak of? In fact the interface could be
- used in a reasonable way, but constructing the message catalogs is a
- pain. The reason for this lies in the third argument of `catgets': the
- unique message ID. This has to be a numeric value, unique among all
- messages in a single set. Perhaps you can imagine the problems of
- keeping such a list up to date while changing the source code. Add a
- new message here, remove one there. Of course a lot of tools have been
- developed to help organize this chaos, but each of them fails in one
- respect or another. We don't want to say that the other approach has
- no problems, but they are far easier to manage.
-
- File: gettext.info, Node: gettext, Next: Comparison, Prev: catgets, Up: Programmers
-
- About `gettext'
- ===============
-
- The definition of the `gettext' interface comes from a Uniforum
- proposal and it is followed by at least one major Unix vendor (Sun) in
- its latest developments. It is not specified in any official standard,
- though.
-
- The main points about this solution are that it does not follow the
- method of normal file handling (open-use-close) and that it does not
- burden the programmer with so many tasks, especially the unique key
- handling. Of course a unique key is needed here too, but this key is
- the message itself (however long or short it is). *Note Comparison::
- for a more detailed comparison of the two methods.
-
- The following section contains a rather detailed description of the
- interface. We make it this detailed because this is the interface we
- chose for the GNU `gettext' Library. Programmers interested in using
- this library will be interested in this description.
-
- * Menu:
-
- * Interface to gettext:: The interface
- * Ambiguities:: Solving ambiguities
- * Locating Catalogs:: Locating message catalog files
- * Optimized gettext:: Optimization of the *gettext functions
-
- File: gettext.info, Node: Interface to gettext, Next: Ambiguities, Prev: gettext, Up: gettext
-
- The Interface
- -------------
-
- The minimal functionality an interface must have is a) to select a
- domain the strings are coming from (a single domain for all programs is
- not reasonable because its construction and maintenance is difficult,
- perhaps impossible) and b) to access a string in a selected domain.
-
- This is basically a description of the `gettext' interface. It
- has a global domain which unqualified uses reference. Of course this
- domain is selectable by the user.
-
- char *textdomain (const char *domain_name);
-
- This provides the possibility to change or query the current status
- of the current global domain of the `LC_MESSAGES' category. The
- argument is a null-terminated string, whose characters must be legal
- for use in filenames. If the DOMAIN_NAME argument is `NULL', the
- function returns the current value. If no value has been set before,
- the name of the default domain is returned: *messages*. Please note
- that although the return value of `textdomain' is of type `char *', it
- must not be changed. It is also important to know that no check is
- made whether the domain is available. If it is not, you will notice
- this by the fact that no translations are provided.
-
- To use a domain set by `textdomain' the function
-
- char *gettext (const char *msgid);
-
- is to be used. This is the simplest reasonable form one can imagine.
- The translation of the string MSGID is returned if it is available in
- the current domain. If not available the argument itself is returned.
- If the argument is `NULL' the result is undefined.
-
- One thing which should come to mind is that no explicit dependency
- on the domain used is given. The current value of the domain for the
- `LC_MESSAGES' locale is used. If this changes between two executions
- of the same `gettext' call in the program, the two calls reference
- different message catalogs.
-
- In the easiest case, which is the normal one in internationalized GNU
- packages, a call to `textdomain' is issued once at the beginning of
- execution, setting the domain to a unique name, normally the package
- name. In the code that follows, all strings which have to be
- translated are filtered through the `gettext' function. That's all,
- the package speaks your language.
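-
- A minimal sketch of this easiest case follows. The function names
- `init_messages' and `greeting' and any package name passed in are
- invented for the example. The call to `setlocale' is an assumption
- not discussed above: on most systems the program has to select the
- user's locale itself before any translation can be found.

```c
#include <libintl.h>
#include <locale.h>

/* Select the user's locale and make PACKAGE the current domain.
   Return the name of the domain now in effect, or NULL on failure.
   The returned string must not be modified.  */
const char *
init_messages (const char *package)
{
  setlocale (LC_ALL, "");
  return textdomain (package);
}

/* Every translatable string is simply passed through gettext.  If no
   translation is available the argument itself comes back.  */
const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

- A program would call `init_messages' once at startup, for instance
- with its package name, and afterwards print `greeting ()' wherever
- the message is needed.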
-
- File: gettext.info, Node: Ambiguities, Next: Locating Catalogs, Prev: Interface to gettext, Up: gettext
-
- Solving Ambiguities
- -------------------
-
- While this single-domain approach works well for most applications,
- there might be the need to get translations from more than one domain.
- Of course one could switch between different domains with calls to
- `textdomain', but this is neither convenient nor fast. One possible
- situation was under discussion at the time of this writing: all error
- messages of functions in the set of commonly used functions should go
- into a separate domain `error'. By this means we would only need to
- translate them once.
-
- For this reason there are two more functions to retrieve strings:
-
- char *dgettext (const char *domain_name, const char *msgid);
- char *dcgettext (const char *domain_name, const char *msgid,
- int category);
-
- Both take an additional first argument, which corresponds to the
- argument of `textdomain'. The third argument of `dcgettext' allows the
- use of a locale category other than `LC_MESSAGES'. But I really don't
- know where this can be useful. If the DOMAIN_NAME is `NULL' or
- CATEGORY has a value other than the known ones, the result is
- undefined. It should also be noted that `dcgettext' is not part of
- the second known implementation of this function family, the one found
- in Solaris.
-
- A second ambiguity can arise from the fact that more than one
- domain may have the same name. This can be solved by specifying where
- the needed message catalog files can be found.
-
- char *bindtextdomain (const char *domain_name,
- const char *dir_name);
-
- Calling this function binds the given domain to a file in the
- specified directory (how this file is determined follows below). In
- particular, a file in the system's default place is no longer
- preferred over the specified file (as it would be when using
- `textdomain' alone). A `NULL' pointer for the DIR_NAME parameter
- returns the binding associated with DOMAIN_NAME. If DOMAIN_NAME itself
- is `NULL' nothing happens and a `NULL' pointer is returned. Here
- again, as for all the other functions, none of the return values may
- be changed!
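-
- The `error' domain mentioned above could be handled as in the
- following sketch. The helper names `bind_error_domain' and
- `error_text' are invented for the example, and any directory passed
- in is the caller's choice, not a location this manual prescribes.

```c
#include <libintl.h>

/* Bind the domain "error" to the catalogs below DIR_NAME.  With a
   NULL DIR_NAME the current binding is only queried.  The returned
   string must not be modified.  */
const char *
bind_error_domain (const char *dir_name)
{
  return bindtextdomain ("error", dir_name);
}

/* Translate MSGID using the "error" domain, leaving the domain set
   by textdomain untouched.  */
const char *
error_text (const char *msgid)
{
  return dgettext ("error", msgid);
}
```

- The program's own messages still go through plain `gettext', while
- shared error messages use `error_text' and therefore need to be
- translated only once.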
-
- File: gettext.info, Node: Locating Catalogs, Next: Optimized gettext, Prev: Ambiguities, Up: gettext
-
- Locating Message Catalog Files
- ------------------------------
-
- Because many different languages for many different packages have to
- be stored, we need some way to encode this information for the message
- catalog files. The way usually used in Unix environments is to have
- this encoding in the file name. This is also done here. The directory
- name given in `bindtextdomain's second argument (or the default
- directory), followed by the value and name of the locale and the domain
- name are concatenated:
-
- DIR_NAME/LOCALE/LC_CATEGORY/DOMAIN_NAME.mo
-
- The default value for DIR_NAME is system specific. For the GNU
- library it's:
- /usr/local/share/locale
-
- LOCALE is the value of the locale category whose name is
- `LC_CATEGORY'. For `gettext' and `dgettext' this category is always
- `LC_MESSAGES'. `dcgettext' specifies the category by the third
- argument.(1) (2)
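-
- The concatenation can be sketched as follows. The function name
- `catalog_path' is invented for the example and is not part of the
- library; it only illustrates how the four components are combined.

```c
#include <stdio.h>

/* Build DIR_NAME/LOCALE/LC_CATEGORY/DOMAIN_NAME.mo in BUF, which has
   room for SIZE bytes.  Return 0 on success and -1 if the name does
   not fit.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

- For the default directory, the locale value `de', the category
- `LC_MESSAGES' and a domain named `hello', this yields
- `/usr/local/share/locale/de/LC_MESSAGES/hello.mo'.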
-
- ---------- Footnotes ----------
-
- (1) Some systems, e.g. Ultrix, don't have `LC_MESSAGES'. Here we use
- a more or less arbitrary value for it.
-
- (2) When the system does not support `setlocale' its behavior in
- setting the locale values is simulated by looking at the environment
- variables.
-
- File: gettext.info, Node: Optimized gettext, Prev: Locating Catalogs, Up: gettext
-
- Optimization of the *gettext functions
- --------------------------------------
-
- At this point of the discussion we should talk about an advantage of
- the GNU `gettext' implementation. Some readers might have pointed out
- that an internationalized program might perform poorly if some string
- has to be translated in an inner loop. While this is unavoidable when
- the string varies from one run of the loop to the next, it is simply a
- waste of time when the string is always the same. Take the following
- example:
-
- {
-   while (...)
-     {
-       puts (gettext ("Hello world"));
-     }
- }
-
- When the locale selection does not change between two runs, the
- resulting string is always the same. One way to exploit this is:
-
- {
-   const char *str = gettext ("Hello world");
-   while (...)
-     {
-       puts (str);
-     }
- }
-
- But this solution is not usable in all situations (e.g. when the
- locale selection changes), nor is it very readable.
-
- The GNU C compiler, version 2.7 and above, provides another solution
- for this. To describe it we show here some lines of the
- `intl/libgettext.h' file. For an explanation of the statement
- expression block see *Note Statements and Declarations in Expressions:
- (gcc)Statement Exprs.
-
- # if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
- # define dcgettext(domainname, msgid, category) \
-   (__extension__ \
-    ({ \
-      char *result; \
-      if (__builtin_constant_p (msgid)) \
-        { \
-          extern int _nl_msg_cat_cntr; \
-          static char *__translation__; \
-          static int __catalog_counter__; \
-          if (! __translation__ \
-              || __catalog_counter__ != _nl_msg_cat_cntr) \
-            { \
-              __translation__ = \
-                dcgettext__ ((domainname), (msgid), (category)); \
-              __catalog_counter__ = _nl_msg_cat_cntr; \
-            } \
-          result = __translation__; \
-        } \
-      else \
-        result = dcgettext__ ((domainname), (msgid), (category)); \
-      result; \
-    }))
- # endif
-
- The interesting thing here is the `__builtin_constant_p' predicate.
- This is evaluated at compile time and so optimization can take place
- immediately. Here two cases are distinguished. If the argument to
- `gettext' is not a constant value, the function `dcgettext__' is
- simply called; this is the real implementation of the `dcgettext'
- function.
-
- If the string argument *is* constant we can reuse the translation
- once obtained, as long as the locale selection has not changed. This
- is exactly what is done here. The `_nl_msg_cat_cntr' variable is
- defined in `loadmsgcat.c', which is part of `libintl.a', and is
- changed whenever a new message catalog is loaded.
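-
- The same caching idea can be sketched in plain C, without the GNU C
- extensions. Every name below (`catalog_generation', `slow_lookup',
- `catalog_changed', `lookup_count', `cached_hello') is a hypothetical
- stand-in and not part of the library; `slow_lookup' merely counts its
- calls and returns its argument where a real implementation would
- search the catalog.

```c
#include <stddef.h>

/* Hypothetical stand-in for a counter that changes whenever a new
   message catalog is loaded.  */
static int catalog_generation;

/* Number of expensive lookups performed so far.  */
static int lookup_calls;

/* Stand-in for the real lookup; here it just returns MSGID.  */
static const char *
slow_lookup (const char *msgid)
{
  ++lookup_calls;
  return msgid;
}

/* Announce that the set of loaded catalogs has changed.  */
void
catalog_changed (void)
{
  ++catalog_generation;
}

/* Report how many expensive lookups have happened.  */
int
lookup_count (void)
{
  return lookup_calls;
}

/* Return the translation of a constant string.  The lookup is redone
   only on the first call or after the catalogs have changed.  */
const char *
cached_hello (void)
{
  static const char *translation;
  static int seen_generation;

  if (translation == NULL || seen_generation != catalog_generation)
    {
      translation = slow_lookup ("Hello world");
      seen_generation = catalog_generation;
    }
  return translation;
}
```

- Calling `cached_hello' repeatedly in a loop performs one lookup only;
- after `catalog_changed' the next call looks the string up again. The
- macro shown earlier achieves this for every call site with a
- constant argument, without a separate function per string.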
-
-