VIETNAMESE CHARACTER ENCODING STANDARDIZATION REPORT
VISCII AND VIQR 1.1 CHARACTER ENCODING SPECIFICATIONS
============================================================================
Bilingual Report
Viet-Std, September 1992
(c) Copyright 1992, Viet-Std. All rights reserved. Permission is
hereby granted for limited use, duplication, and distribution,
in whole or in part, provided that no modifications are made.
All documentation is provided without express or implied warranty.
The Viet-Std Group
1212 Somerset Drive
San Jose, California 95132
U.S.A.
TABLE OF CONTENTS
~~~~~~~~~~~~~~~~~
PREFACE ................................................................... 1
OPEN LETTER TO MEMBERS OF THE COMPUTING PROFESSION ........................ 3
A UNIFIED FRAMEWORK FOR VIETNAMESE INFORMATION PROCESSING ................. 7
Abstract ................................................................ 7
1. INTRODUCTION ......................................................... 7
2. REVIEW OF CURRENT CONVENTIONS ........................................ 10
3. VISCII: 8-BIT ENCODING SPECIFICATION FOR VIETNAMESE .................. 11
3.1 MOTIVATION ....................................................... 11
3.2 ENCODING RATIONALE ............................................... 13
4. VIQR: MNEMONIC ENCODING SPECIFICATION FOR VIETNAMESE ................. 16
4.1 MOTIVATION ....................................................... 16
4.2 QUOTED-READABLE SPECIFICATION (VIQR) ............................. 18
4.2.1 Implicit Composition ....................................... 18
4.2.2 Explicit Composition ....................................... 19
4.2.3 Literal State .............................................. 19
4.2.4 English State .............................................. 20
4.2.5 Vietnamese State ........................................... 20
4.2.6 Character Literals in English and Vietnamese States ........ 20
4.2.7 Closure .................................................... 21
5. SPECIFIC APPLICATIONS ................................................ 21
5.1 ELECTRONIC MAIL OVER 7-BIT CHANNELS .............................. 21
5.2 VIETNAMESE KEYBOARDING ........................................... 22
5.2.1 Immediate Echo for Implicit Composition .................... 22
5.2.2 Delayed Echo for Explicit Composition ...................... 23
5.3 ADAPTING EXISTING VIETNAMESE APPLICATIONS ........................ 24
6. SUMMARY & CONCLUSIONS ................................................ 24
REFERENCES .............................................................. 24
GLOSSARY OF TERMS ....................................................... 26
APPENDIX A: Vietnamese Characters under VISCII and VIQR
by Collating Order .......................................... 29
APPENDIX B: Vietnamese Characters under VISCII and VIQR
by Encoding Order ........................................... 30
A UNIFIED FRAMEWORK FOR VIETNAMESE INFORMATION PROCESSING
(Vietnamese translation) ................................................. 32
Abstract ................................................................ 32
1. INTRODUCTION ......................................................... 32
2. REVIEW OF CURRENT CONVENTIONS ........................................ 33
3. VISCII: 8-BIT ENCODING SPECIFICATION FOR VIETNAMESE .................. 37
3.1 MOTIVATION ....................................................... 37
3.2 ENCODING RATIONALE ............................................... 37
4. VIQR: MNEMONIC ENCODING SPECIFICATION FOR VIETNAMESE ................. 41
4.1 MOTIVATION ....................................................... 41
4.2 QUOTED-READABLE SPECIFICATION (VIQR) ............................. 43
4.2.1 Implicit Composition ....................................... 43
4.2.2 Explicit Composition ....................................... 42
4.2.3 Literal State .............................................. 44
4.2.4 English State .............................................. 45
4.2.5 Vietnamese State ........................................... 45
4.2.6 Character Literals in English and Vietnamese States ........ 45
4.2.7 Closure .................................................... 46
5. SPECIFIC APPLICATIONS ................................................ 46
5.1 ELECTRONIC MAIL OVER 7-BIT CHANNELS .............................. 46
5.2 VIETNAMESE KEYBOARDING ........................................... 47
5.2.1 Immediate Echo for Implicit Composition .................... 47
5.2.2 Delayed Echo for Explicit Composition ...................... 48
5.3 ADAPTING EXISTING VIETNAMESE APPLICATIONS ........................ 49
6. SUMMARY & CONCLUSIONS ................................................ 49
REFERENCES .............................................................. 49
ENGLISH-VIETNAMESE GLOSSARY ............................................. 51
APPENDIX A: Vietnamese Characters by Collating Order .................... 58
APPENDIX B: Vietnamese Characters by Encoding Order ..................... 59
ANNOUNCEMENT OF VISCII-COMPLIANT SOFTWARE APPLICATIONS .................... 61
1. FOR UNIX & X-WINDOWS ................................................. 61
2. FOR MS-WINDOWS ....................................................... 63
3. FOR MS-DOS ........................................................... 64
4. WORK IN PROGRESS ..................................................... 65
5. PROPOSED PROJECTS .................................................... 65
6. GETTING SOFTWARE ..................................................... 66
6.1 Using Anonymous ftp .............................................. 66
6.2 Using Modem or MailServer at Saigon.COM .......................... 67
6.3 Using Modem to Phsys.COM ......................................... 68
6.4 Using Post Office ................................................ 68
7. COPYRIGHT ............................................................ 69
8. ABOUT THE TRICHLOR GROUP ............................................. 69
PREFACE
The Vietnamese Standardization Group (Viet-Std) was formed in the fall of 1989
to promote the standardization of Vietnamese character encoding and to monitor
ongoing work of international bodies in this regard. The group has been
working on designing and implementing a code table that can be integrated into
existing computing environments on many platforms. In addition, the group has
contacted the two committees on multilingual character encoding---the Unicode
Consortium and the International Organization for Standardization (ISO)---to
request that the Vietnamese characters be encoded in a precomposed form in the
same manner as other Latin-based European languages. All these efforts will
be reported fully in a special issue to be published in the near future. This
special report collects only works concerning the 7- and 8-bit encodings of
the Vietnamese alphabet.
The first article is a cover letter in Vietnamese that summarizes the
Viet-Std Group's work in the area of 7- and 8-bit Vietnamese character
encoding.
The second article covers the Vietnamese character encoding in 7 and 8
bits. It reviews the pros and cons of current encoding schemes and discusses
the vital need, which any standard must address, to integrate into existing
computing environments. It then presents Viet-Std's 8-bit proposal for the
Vietnamese Standard Code for Information Interchange (known as VISCII) and a
Vietnamese Quoted-Readable convention (VIQR) to represent Vietnamese
characters in 7-bit ASCII. The article also examines some guidelines and
conventions for handling Vietnamese electronic mail over 7-bit channels,
Vietnamese keyboarding, and adapting existing Vietnamese applications. It
includes two appendices listing both VISCII and VIQR for reference purposes.
The third article is the Vietnamese translation of the above to serve the
general Vietnamese community.
The last item included in this report is the announcement of the third
release of public-domain Vietnamese software by the TriChlor Group. It
summarizes software packages either developed or enhanced by TriChlor members
and other independent developers to run on the Unix, X-Windows, DOS, and MS-
Windows platforms.
Since the release of version 1.0 in January 1992, VISCII has undergone
only one change---swapping the two characters ạ (a dot-below) and Õ (O
tilde)---to accommodate MS-Windows. To reflect this change, the versions of
VISCII and VIQR published in this report are called VISCII 1.1 and VIQR 1.1,
although VIQR was unchanged. This report therefore supersedes all previous
publications concerning VISCII and VIQR.
A brief history of the Vietnamese Standardization Group is in order. The
group was born out of the Viet-Net electronic mailing list, which comprised
members at companies and universities throughout the computing world. At the
time, Vietnamese software for publishing existed commercially only on the
personal computer platform, but each package was limited to its designed
function and could not be easily intermixed with others. No standard encoding
existed. Developers designed their own schemes to perform the job they
needed. Even Viet-Net itself had invented a 7-bit mnemonic writing style for
Vietnamese for use in e-mail. For example, "Nước Việt Nam" would be typed as
"Nu+o+'c Vie^.t Nam." Everyone agreed that some form of standardization was
necessary so as to promote availability of and access to Vietnamese computing,
and Viet-Net provided the forum that electronically brought together for the
first time a large number of individuals committed to this goal. With this
specific interest in mind, Viet-Std was formed by Thành Văn Nguyễn, and
standardization discussions moved to the mailing list
"Viet-Std@images.Sun.COM" and later to "Viet-Std@Haydn.Stanford.EDU." Early
contributors included Cường Tấn Nguyễn, Tâm Nguyễn, Nhân Trần, Randall
Atkinson, Khoa Tôn, Khiêm Hồ, Tước Lương, and many others too numerous to
acknowledge properly. Later joining the group were Cương Minh Bùi, Học Đình
Ngô, and others. This special report is a testimony to the success of the
group. In addition, the group has contributed its expertise to international
organizations on matters relevant to Vietnamese encoding; the details will be
reported in a separate issue.
It should be mentioned that TriChlor Software (a non-profit experimental
group) was formed by Cường Tấn Nguyễn, Cương Minh Bùi, and Tín Lê to
independently explore and implement any encoding scheme for Vietnamese to gain
real experience with the pros and cons in each scheme. It has helped
integrate VISCII-compliant Vietnamese into popular public domain software
products.
It is our dream one day to be able to read, write, and exchange Vietnamese
data of a common format on any machine, any platform, and to take advantage of
all the processing tools that have been produced by the computing world. That
dream, once a pure exercise in imagination, has today come many steps closer
to realization.
Viet-Std Vietnamese Standardization Group
California, USA
September 1992
OPEN LETTER TO MEMBERS OF THE COMPUTING PROFESSION AND TO ALL THOSE
INTERESTED IN THE ENCODING OF VIETNAMESE CHARACTERS
Ladies and Gentlemen:
We are a group of overseas Vietnamese professionals working together on a
non-profit basis to follow the work of the information-technology standards
bodies and computer companies of international stature, and to contribute
technical expertise on Vietnamese script to them. In recent years we have
actively lobbied the Unicode Consortium and ISO to standardize the Vietnamese
character encoding within the international 16-bit and 32-bit character codes
in the most advantageous manner. In addition, we have researched and proposed
7-bit and 8-bit Vietnamese character encodings to meet present needs in
software development and interchange. The 8-bit code set is named the
Vietnamese Standard Code for Information Interchange, or VISCII for short, to
distinguish it from other code sets. The 7-bit standard is called the
Vietnamese Quoted-Readable Specification, or VIQR for short. These are the
main subjects we wish to discuss with you in this letter.
We first present the 8-bit VISCII code set. Reviewing the development of
Vietnamese-language computing overseas, we found that nearly every company or
group had devised its own Vietnamese encoding, with the inevitable consequence
that software could not easily be exchanged. We therefore decided to work out
a Vietnamese character encoding standard to resolve this situation.
As you know, most current software is written on the basis that each
character is encoded in 8 bits (1 byte). With 8 bits we can encode 256
distinct characters or signals. For historical reasons the American ASCII code
set, which uses the first 128 code points (code values), has become an
international standard. It comprises 32 control characters with code values 0
to 31, and the remaining 96 code values are assigned to Latin letters,
punctuation marks, and symbols. (For convenience of discussion, the Vietnamese
text of this letter provisionally renders "control character" as kiểm-tự and
the remaining characters as ký-tự. From the computing standpoint we must also
define a mã-tự, or coded character, as anything represented by a code value;
each coded character corresponds to one code value and vice versa.) The
remainder (the 128 code values from 128 to 255) is usually defined and used
according to the needs of each country or each software package. We can
therefore standardize these 128 code values for the Vietnamese letters that
are not among the 128 ASCII characters.
In practice, encoding Vietnamese is a complex problem. Besides the
consonants, we have 12 base vowels (A Ă Â E Ê I O Ô Ơ U Ư Y) and 60 further
vowels formed by combining the base vowels with the 5 tone marks (acute,
grave, hook above, tilde, dot below). Vietnamese thus has 144 vowels in all,
counting both lowercase and uppercase. Two main methods of encoding the
Vietnamese vowels can be identified:
1. Each base vowel and each tone mark is treated as a separate coded
character, so only 29 code values are needed to encode the 12 lowercase
vowels, the 12 uppercase vowels, and the 5 tone marks. For example, the
letter À is regarded as two coded characters, "A" and "`".
2. Each vowel, together with its tone mark if any, is treated as a single
coded character, so 144 code values are needed to encode all the
Vietnamese vowels.
The first method, also called the floating-diacritic method, has been
applied in many countries that use Latin script with diacritical marks.
Experience has shown, however, that it has major drawbacks in many respects:
reduced processing speed, increased storage and memory requirements, more
complex programming, and the inability to integrate into existing software
and hardware environments. For these reasons the floating-diacritic method is
now hardly used in the computing industry.
To integrate with the world's computing industry we must accept the second
solution, which means encoding a very large number of Vietnamese letters.
Excluding the Vietnamese letters already present in the ASCII code set, there
are 134 letters to encode, whereas only 128 code values are free. The usual
remedy is to replace some 6 ASCII coded characters with 6 Vietnamese letters.
There are many ways to do this, but each runs into drawbacks of its own.
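The figures quoted in the preceding paragraphs can be checked mechanically.
The following is a minimal Python sketch, not part of the Viet-Std proposal.
The names BASE_VOWELS, TONE_MARKS, and vietnamese_letters are illustrative,
and the sketch assumes (as the total of 134 implies) that the letter đ/Đ is
counted alongside the vowels.

```python
import unicodedata

# The 12 base vowels, written as an ASCII letter plus an optional
# combining breve (U+0306), circumflex (U+0302), or horn (U+031B).
BASE_VOWELS = ["a", "a\u0306", "a\u0302", "e", "e\u0302", "i",
               "o", "o\u0302", "o\u031b", "u", "u\u031b", "y"]

# No tone, then acute, grave, hook above, tilde, dot below.
TONE_MARKS = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]


def vietnamese_letters():
    """Return every precomposed vowel in both cases, plus d-with-stroke."""
    lower = {unicodedata.normalize("NFC", base + tone)
             for base in BASE_VOWELS for tone in TONE_MARKS}
    lower.add("\u0111")  # lowercase d with stroke
    return lower | {ch.upper() for ch in lower}


letters = vietnamese_letters()
vowels = {ch for ch in letters if ch not in "\u0111\u0110"}
beyond_ascii = {ch for ch in letters if ord(ch) > 127}

print(len(vowels))              # 144 vowels in both cases
print(len(beyond_ascii))        # 134 letters that ASCII does not supply
print(len(beyond_ascii) - 128)  # 6 letters left over after codes 128-255
```

The last figure is the shortfall that any 8-bit precomposed encoding has to
place somewhere below code value 128.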
After careful analysis, and taking into account future trends in hardware
and software, Viet-Std resolved this problem on the principle of fully
preserving the 96 ASCII characters in the G0 area (code values 32 to 127).
The decision is set out in detail in the English article reprinted in this
volume [see pp. 6--22]; for the purposes of this letter it can be summarized
as follows: by preserving the 96 ASCII characters, we can use most software
and hardware produced anywhere in the world without investing manpower and
resources in adjusting or converting it to suit Vietnamese. Examples include
compilers, operating systems, and X windows. Only this approach lets us take
advantage of the newest technical achievements in the world.
Following this principle, we replaced 6 ASCII control characters in the C0
area with the 6 Vietnamese letters Ẳ, Ẵ, Ẫ, Ỷ, Ỹ, and Ỵ. In addition, with
reference to the 8859/Latin-1 standard code set for Western Europe, we
decided to keep the code values of all Vietnamese letters already present in
that standard. Full details of the 8-bit VISCII code set are given in Section
3 of the English article [see pp. 8-11] and of the Vietnamese translation
[see pp. 26--29].
We now briefly outline the Vietnamese Quoted-Readable specification, VIQR.
It is a convention for writing Vietnamese using the 7-bit US ASCII code set.
Overseas Vietnamese have long used this form to exchange electronic mail in
Vietnamese on computers that lack Vietnamese characters. Under this
convention, marks are typed after the vowels; the acute, grave, hook-above,
tilde, and dot-below tone marks are replaced by the similar-looking ASCII
symbols "'", "`", "?", "~", and "."; the breve is replaced by "(", the
circumflex by "^", the horn (as in Ơ, Ư) by "+", and the letter đ by "dd".
For example, these two lines from The Tale of Kiều by Nguyễn Du:
Trăm năm trong cõi người ta
Chữ tài chữ mệnh khéo là ghét nhau
appear as follows when written in VIQR:
Tra(m na(m trong co~i ngu+o+`i ta
Chu+~ ta`i chu+~ me^.nh khe'o la` ghe't nhau
Some people, however, prefer other symbols, such as "*" for the horn, "<"
for the breve, "\" for the grave accent, and so on. But as a standard, VIQR
must lay down a single, minimal rule as a common basis for interchange: each
Vietnamese mark must be represented by one and only one ASCII character.
Given the widespread use of ASCII keyboards among Vietnamese computer users,
we chose the VIQR conventions on the pragmatic principle that they be easy to
read and easy to remember for the overwhelming majority of users. The VIQR
rules are described in detail in Section 4 of the attached English and
Vietnamese documents.
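The mark substitutions summarized above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
Viet-Std reference converter: the function name viqr_to_unicode is
hypothetical, the output is Unicode rather than VISCII, upper and lower case
of "dd" are decided by the first letter alone, and the literal and
state-switching mechanisms of the full specification are ignored (so a
sentence-ending "." or "?" directly after a vowel would be taken as a tone
mark).

```python
import unicodedata

# VIQR mark -> Unicode combining character.
MARKS = {
    "(": "\u0306",  # breve
    "^": "\u0302",  # circumflex
    "+": "\u031b",  # horn
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "?": "\u0309",  # hook above
    "~": "\u0303",  # tilde
    ".": "\u0323",  # dot below
}
# Vowel-shape marks that each base letter accepts.
MODIFIERS = {"a": "(^", "e": "^", "o": "^+", "u": "+"}
TONES = "'`?~."
VOWELS = "aeiouy"


def viqr_to_unicode(text):
    """Convert implicit-composition VIQR text to precomposed Unicode."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        low = ch.lower()
        # "dd" stands for d-with-stroke.
        if low == "d" and i + 1 < n and text[i + 1].lower() == "d":
            out.append("\u0110" if ch == "D" else "\u0111")
            i += 2
            continue
        if low in VOWELS:
            composed = ch
            i += 1
            # At most one shape mark, then at most one tone mark.
            if i < n and text[i] in MODIFIERS.get(low, ""):
                composed += MARKS[text[i]]
                i += 1
            if i < n and text[i] in TONES:
                composed += MARKS[text[i]]
                i += 1
            out.append(unicodedata.normalize("NFC", composed))
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(viqr_to_unicode("Tra(m na(m trong co~i ngu+o+`i ta"))
print(viqr_to_unicode("Chu+~ ta`i chu+~ me^.nh khe'o la` ghe't nhau"))
```

Letting Unicode normalization do the composition keeps the table small: the
sketch only attaches combining marks, and NFC selects the precomposed letter.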
We also wish to emphasize that the 8-bit VISCII encoding and the 7-bit VIQR
convention have been widely disseminated on computer networks in the United
States and in other advanced countries around the world. Overseas Vietnamese
computer professionals have used these standards to write software
applications and software tools for the DOS and Unix operating systems. This
software has been tested by both its authors and its users on machines such
as the PC, AT, 386/486, workstations, and mainframes over a considerable
period, and experience shows that all of it works well with Vietnamese data.
Within the scope of this letter, we summarize the following products:
* programs that convert data from 7-bit VIQR form to 8-bit VISCII form and
back.
* Vietnamese keyboard drivers for Unix, DOS, and MS-Windows 3.1. These let
the user type Vietnamese on a US ASCII keyboard while the screen displays
Vietnamese characters.
* Vietnamese display applications that run in X windows or in MS-DOS
windows.
* mail applications that read and write mail in 8-bit VISCII but
automatically convert it to 7-bit VIQR form when storing or sending.
* applications that print 8-bit VISCII data on laser, dot-matrix, or
PostScript printers.
* editors that use 8-bit VISCII.
* commonly used software tools and libraries for Unix and DOS.
* a spreadsheet application that runs on Unix.
* educational applications for composing quizzes or multiple-choice tests in
Vietnamese.
* spelling checkers.
* games.
* font sets in bitmap, TrueType, and PostScript form.
There are many more products that cannot conveniently be listed here. For
details, see the announcement of the third release of Vietnamese software
[p. 41]. We are currently sharing out the work of designing additional VISCII
fonts for use on laser printers. Because Viet-Std is non-commercial, all of
the group's research and software products are distributed free of charge. A
number of software companies and Vietnamese professional organizations in the
United States have declared their support for the 8-bit VISCII code set and
the 7-bit VIQR convention, among them VNU, TIẾN, and Hội Chuyên Gia Việt Nam.
The VISCII code set and the VIQR convention are the work of a large number
of overseas Vietnamese professionals from many different fields, and they
have undergone a lengthy period of testing. We are ready to send you the
necessary software so that you can run it yourselves and observe the merits
of VISCII and VIQR concretely. We ask that you study the enclosed documents
carefully and lend us the support needed for the 8-bit VISCII code set and
the 7-bit VIQR convention to become the official standards for Vietnamese.
We respectfully send you our regards and look forward to receiving your
comments at your earliest convenience.
Vietnamese Standardization Group
California, USA
September 1992
A UNIFIED FRAMEWORK FOR VIETNAMESE INFORMATION PROCESSING
Vietnamese Standardization Working Group {1}
September 1992 {2}
Abstract
Increasing demand for Vietnamese electronic information processing has been
answered by a wide array of Vietnamese-capable applications. The inevitable
need for integration of Vietnamese into existing environments and the exchange
of data among them point to the necessity of standardization. This paper
presents the strategic and pragmatic technical considerations that must go
into such a standard, and reviews existing conventions/proposals in these
important contexts. A full description of the Viet-Std proposal is presented,
including 1) an 8-bit, fully precomposed encoding table for Vietnamese
Standard Code for Information Interchange (known as VISCII), 2) a 7-bit
Vietnamese Quoted-Readable (known as VIQR) standard for data interchange over
7-bit channels, with a seamless interface to the 8-bit encoding, and 3) a
keyboard user-interface specification that works transparently with both 1 and
2. Together, these provide a truly unified framework for a Vietnamese
information processing environment with simplicity, efficiency, and
straightforward integration. The real-world construction of this framework
has proven quite successful in an array of compliant applications from a
number of group and individual developers across a number of platforms,
including Unix and its variants, the X window system, MS-DOS, and Windows,
with ongoing work elsewhere.
1. INTRODUCTION
~~~~~~~~~~~~~~~
With the growing Vietnamese population abroad and the proliferation of
computer usage within Viet Nam, the Vietnamese language has seen rapidly
increasing representation in electronic information processing. The
concomitant growth in demand for Vietnamese-capable software has resulted in
successful launches of myriad vendors in the U.S. and elsewhere, mainly in the
area of Vietnamese word processing. In addition, individual and group efforts
have also been productive in providing Vietnamese-language users with high-
quality public-domain applications. In Viet Nam, centers such as the
Institute of Informatics [1] have reported impressive progress on many fronts,
among which is the Vietnamization of standard software packages.
------------------------------------------------------------------------------
{1} Postal address: Viet-Std, 1212 Somerset Dr., San Jose, California
95132, USA. E-mail address: Viet-Std@Haydn.Stanford.EDU
{2} Reprinted December 3, 1992. This version 1.1 supersedes version 1.0 of
January 1992. The only significant difference is the exchange of the
positions of ạ (a dot-below) and Õ (O tilde) in the 8-bit table.
All of the above illustrate two important points: 1) There are growing
market demands for Vietnamese-capable processing engines, and 2) There is no
shortage of technical talent to fulfill those demands. Unfortunately, therein
lies a large problem: most existing Vietnamese applications have been designed
to operate in the exclusive framework or environment of the developer, and all
are incompatible with one another. As long as this trend continues, the
application base for Vietnamese can never keep reasonable pace with demand.
Users want to do more with Vietnamese than mere word processing, and to expect
one single vendor to provide all potential applications across all platforms
is to dream the impossible. Technicians providing these applications are
limited to the Vietnamese tools they must themselves learn and develop from
the ground up. Standardization is necessary. Anyone who has had to deal with
the incompatibility between ASCII and EBCDIC can try to imagine a world where
every machine is using a different character set, and appreciate how limited
that world would be in its application base and how cumbersome in its data
interchange. A uniform framework will greatly benefit both the user and the
technician alike.
Any proposal for Vietnamese data standardization must consider several
important points in their proper contexts. First and foremost, since this
discussion is geared toward existing 7- and 8-bit environments, the prime goal
is straightforward and direct integration onto current platforms. The
standard must work here and now. This implies the use of precomposed
Vietnamese characters, because the handling of floating diacritics will never
see full or simple support outside of specific contexts. The standard must be
designed so as to take advantage of existing applications as much as possible.
The familiar "don't reinvent the wheel" rule is not only an advantage---but a
necessity---if a meaningful application base is to be established in any
reasonable length of time. Furthermore, it is known that overall efficiency
both in time and space is greater in processing precomposed character units
when compared with the floating-diacritic approach [2]. Floating diacritics
therefore must be limited to only where they are necessary and inevitable,
such as in keyboard entry or 7-bit data transmission. There is no reason to
require that all applications must deal with the complexities and
inefficiencies of floating diacritics, for example, in 8-bit data processing,
storage, transmission, screen rendering, or printing.
The second major context points to the pragmatic and vital consideration
of existing precedents set in the Vietnamese software base. Standardization
necessarily requires adaptation, but it makes little sense to propose to
change the world so significantly that the inertia against large changes
greatly delays adoption of the standard. The trend towards 16-bit and wider
data standards for multinational character sets has gained momentum with the
recent works of Unicode [3] and ISO 10646 [4]. However, the need for an 8-bit
Vietnamese standard is irreplaceable until these new standards are fully
supported and completely dominate the computing world. An 8-bit Vietnamese
standard must not ignore existing software precedents so that it can gain
speedy acceptance before it becomes obsolete.
Thirdly, the standard must address the issue of user interface; if it does
not define one, it must at least consider its possible effects on the end-user.
This relates primarily to the 7-bit keyboarding and representation of
Vietnamese---in both instances diacritics are necessarily floating, and
represented mnemonically by existing 7-bit characters with similar appearance.
With keyboarding, one must preserve where possible existing practices such as
that defined for the Viet-Net mailing list and the Usenet newsgroup
Soc.Culture.Vietnamese, both with members worldwide. For 7-bit readable
representation, the keyword is "readable." The goals here are to maintain a
short learning time and to promote a uniform interface so that it is not
necessary for a user to re-learn the particulars of every software
installation before being able to use it effectively.
Finally, to every extent possible, the standard must stay within the
framework of international standards, e.g., ISO-8859/x [5], in order to ensure
compatibility with existing environments. For example, this goal means
preservation of the ASCII encoding. It should also extend to encoding those
Vietnamese characters that 8859/Latin-1 already defines into the same slots,
thus ensuring that 8859/Latin-1 keyboards will work transparently for
those Vietnamese characters. However, there are many standards requirements
that are obsolete from a practical viewpoint. For example, in recent
Unicode/ISO-10646 decisions, the prohibition on the use of the available
control character space---those with encodings between xx00h and xx1Fh,
except for C0 itself---was discarded on the grounds that it was a waste of
encoding space.
As will be discussed later, the encoding of Vietnamese into the existing 8-bit
space presents some well-known trade-offs. Where trade-offs are made, they
must be justified with good reason---pragmatic preferred over theoretical.
These primary requirements are summarized as follows:
R1. Straightforward and direct integration into existing platforms.
R2. Ease of adaptation for existing software.
R3. User-friendly mnemonic encoding scheme and interface.
R4. Adherence to international standards.
R5. Trade-offs made only on practical usage considerations and with good
reason.
In the following section we present a brief review of the strengths and
weaknesses of different approaches to Vietnamese encoding. Section 3 will
describe the proposed 8-bit encoding table in detail. A quoted-readable
encoding scheme encompassing 7-bit data streams, including electronic mail and
keyboard input, is presented in Section 4. Finally, Section 5 outlines the
particular rules and conventions relevant in some application-specific
contexts.
2. REVIEW OF CURRENT CONVENTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A review of current conventions used by software vendors reveals one distinct
feature: virtually all realize the strengths of a precomposed encoding and
adopt it as a primary requirement. The complications arise from a familiar
fact: apart from the alphabetics already available in the ASCII standard,
Vietnamese requires an additional 134 unique characters. Of these, 128 can be
coded in the C1 and G1 areas. The allocation of the remaining 6 characters in
the lower C0 and G0 space is handled with differing approaches:
A1. Encode into 6 of the "least-used" G0 characters in the context of
Vietnamese data processing.
A2. Encode into 6 slots of the National Replacement Character {3} (NRC)
set.
A3. Drop 6 of the "least-used" {4} Vietnamese characters, typically
accented capitals such as Ẳ, Ẵ, Ẫ, Ỷ, Ỹ, and Ỵ.
A4. Map accented "y" combinations into corresponding "i" combinations,
e.g., "kỹ sư" is replaced with "kĩ sư."
A5. Encode into the ASCII control space C0.
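The four code areas named in this review partition the 8-bit code space. The
short Python sketch below is illustrative only: the function name code_area
is hypothetical, the C1/G1 boundary at A0h is the conventional 8859-style
layout rather than something this report states, and G0 is taken to span 32
to 127 as in the cover letter. It shows why 6 of the 134 characters cannot
fit in the upper half.

```python
from collections import Counter


def code_area(code):
    """Name the area of the 8-bit code space that a code value falls in."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an 8-bit code value")
    if code < 0x20:
        return "C0"  # ASCII control characters
    if code < 0x80:
        return "G0"  # ASCII graphic characters (32 to 127 here)
    if code < 0xA0:
        return "C1"  # control area of the upper half (assumed boundary)
    return "G1"      # graphic area of the upper half


sizes = Counter(code_area(code) for code in range(256))
upper_half = sizes["C1"] + sizes["G1"]
shortfall = 134 - upper_half  # 134 is the figure given in the text

print(dict(sizes))
print(upper_half, shortfall)
```

Approaches A1 to A5 differ only in where these 6 leftover characters go, or
in whether they are encoded at all.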
Approaches A1 and A2 both satisfy the typical needs of the word processing
environments in which rarely used ASCII characters can be avoided, or employed
by font shifting. However, they both eliminate prospects for integration of
Vietnamese into existing ASCII environments where all graphic characters in G0
are needed. A character that already serves one purpose cannot be re-used for
another. First, it makes rendering of the needed G0 character incorrect, as
it would now look like a Vietnamese character. The frequency of use of G0
characters in an integrated environment is far too high for this conflict to
be tolerable. Second, while font shifting may be employed to remedy this in
some situations, a more serious problem occurs when the Vietnamese character
is needed. The environment would typically have assigned some specific
meaning to the G0 character, particularly with those in the NRC set.
Consider, for example, using the backslash character "\" for a Vietnamese
character under Unix. The backslash is used for many escape mechanisms under
Unix so that the Vietnamese character cannot simply be used but must be
escaped in one way or another. This is more than just an inconvenience; it
means data interchange is complicated by the fact that the escape mechanism
will not be understood on another platform, and data integrity has thus not
been preserved. A standard employing this approach fails at its basic
mission: to provide cross-platform transparency. A similar case can be made
for the other G0 characters.
Both A3 and A4 propose to limit Vietnamese language data in one way or
another. Most agree that eliminating some Vietnamese characters is simply
unacceptable; indeed, this point is so fundamental that we have in the
foregoing chosen to assume it as a technical requirement without elaboration.
However, it must be said that A4 is not a proposal without rationale. A
school of thought exists that believes y's existing in words as a single vowel
------------------------------------------------------------------------------
{3} This set contains 12 country-specific characters at code positions
corresponding to ASCII characters #, $, @, [, \, ], ^, `, {, |, }, ~.
{4} Least-used because they (a) rarely begin words and therefore do not
often get capitalized, and (b) appear in fewer words.
should be mapped to corresponding i's, as their pronunciations are indeed
identical. The concept dates as far back as 1948 [6, 7]. However, it is not
the function of an encoding standard to settle a linguistic issue, and hence
A4 is also a bad choice.
The immediate objection to A5 is primarily in data communication channels
where many C0 characters are used as data control. In addition, it also
presents problems for integration into environments where some C0 characters
are used in the keyboard interface and in data format controls, similar to the
problem facing A1 and A2. However, as will be discussed further, a judicious
choice of the 6 C0 characters has in practice successfully avoided the
characters that are significant in data communication.
Furthermore, most data channels provide for clean transfer of binary data, and
there is no reason to worry that arbitrary data bits cannot be employed over
these binary routes.
With those particular cases where C0 is used in the keyboard interface,
judicious choice as well as remapping of keys can minimize conflict. Data
format control is application-specific but is typically scattered in C0 and
C1. It is therefore a universal problem for integration because C1 is
necessarily densely encoded, but, again, conflict can be avoided by studying
significant applications. Finally, the choice can be made for 6 least-used
Vietnamese characters so that the probability of conflict is greatly reduced.
It should be noted here that the foregoing discussion has subjected the
alternatives to the requirements of integration into existing applications and
platforms, as outlined in Section 1. The importance of this goal cannot be
overstated, and it does present complications that result in the following
Pragmatism Principle: it is obviously impossible to define a standard that
would operate seamlessly with all existing applications; therefore, pragmatic
considerations must be made to make a standard workable in as many important
applications and on as many platforms as possible, with emphasis on the word
"workable."
3. VISCII: 8-BIT ENCODING SPECIFICATION FOR VIETNAMESE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3.1. MOTIVATION
~~~~~~~~~~~~~~~
The available body of evidence shows that alternative A5 described in the
previous section, encoding into 6 of the C0 characters, has the greatest
chance of success in fulfilling the requirements outlined in Section 1. The
choice of the 6 C0 codes and the 6 least-used Vietnamese capital letters to
encode, when made carefully, greatly reduces the probability of conflict for
all practical purposes. Concerns regarding data communications are well
addressed by avoiding C0 codes that are in fact often used for data control.
Indeed, data communication concerns are more applicable to C1 and G1 encoding;
a prominent example is electronic mail transfer through 7-bit gateways and
mail agents. Communication failure here has in most cases been due to the use
of the eighth bit and not because of C0 encoding. In any event, the option
exists for data to be sent in some "binary" mode, or to employ the Vietnamese
Table 1: A sampler of possible C0 usage conflicts. Codes selected
for this standard proposal are noted with a "+".
======================================================================
CODE COMM. CTRL- GENERAL PRINTER(PC) PC Unix vi (Unix)
----------------------------------------------------------------------
0 NUL @ C string strings
1 SOH A
+2 STX B back screen
3 ETX C INTR INTR INTR INTR
4 EOT D EOF EOF back tab
+5 ENQ E
+6 ACK F forw.screen
7 BEL G BEL BEL BEL BEL
8 BS H BS BS BS BS BS
9 HT I HT HT HT HT HT
10 LF J LF LF LF LF LF
11 VT K VT
12 FF L FF FF FF redraw
13 CR M CR CR CR CR CR
14 SO N wide on(IBM)
15 SI O comp.on(IBM)
16 DLE P Prt.on/off
17 DC1 Q XOFF XOFF XOFF XOFF
18 DC2 R comp.off(IBM) retype
19 DC3 S XON XON XON XON
+20 DC4 T wide off(IBM) forw.tab
21 NAK U clr buf(IBM) kill kill
22 SYN V literal literal
23 ETB W werase werase
24 CAN X kill
+25 EM Y suspend
26 SUB Z EOF suspend
27 ESC [ ESC ESC sequence ESC ESC ESC
28 FS \ quit
29 GS ] Telnet ESC
+30 RS ^
31 US _ Windows
=====================================================================
Table 2: Vietnamese-specific characters already present in 8859/Latin-1.
     +===============================================================+
     | 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+----+---------------------------------------------------------------+
| Cx | À : Á : Â : Ã :   :   :   :   : È : É : Ê :   : Ì : Í :   :   |
| Dx | Đ :   : Ò : Ó : Ô : Õ :   :   :   : Ù : Ú :   :   : Ý :   :   |
| Ex | à : á : â : ã :   :   :   :   : è : é : ê :   : ì : í :   :   |
| Fx | đ :   : ò : ó : ô : õ :   :   :   : ù : ú :   :   : ý :   :   |
+====================================================================+
Quoted-Readable format to be described in Section 4.
The overwhelming advantage of this approach is that it is readily and
easily integrated into existing environments without many of the problems
plaguing the other alternatives, if they can be integrated at all. As a
testimony to the approach's successful application, this document itself was
prepared using the TeX system under Unix. The text source was edited in an
8-bit X terminal window using a minimally modified version of Elvis {5}, a
public-domain 8-bit version of Unix's Vi text editor. Both TeX (a document
preparation system) and Dvi2ps (a PostScript generator) readily accepted and
processed Vietnamese (8-bit) data transparently.
Many other applications including a spreadsheet, various text viewers,
PostScript and dot-matrix printing, DOS's WordPerfect, Word, PC Tools, etc.,
have been tested and seen to operate well with Vietnamese text.
Modifications, if any, were primarily in making these applications accept
8-bit data. An educational teaching tool for Vietnamese has also been
produced using the C programming language with 8-bit Vietnamese strings
embedded in the source code. With increasing system internationalization,
applications and tools are being made 8-bit "clean," further facilitating
integration of this Vietnamese encoding.
3.2. ENCODING RATIONALE
~~~~~~~~~~~~~~~~~~~~~~~
A basic requirement is to preserve the 7-bit ASCII graphic characters (G0)
layout, since the emphasis is on integration. G0 was therefore left
unchanged. For the 6 C0 characters, we first lay out the code space and
consider typical usage, a sampler of which is in Table 1. The codes selected,
STX (2), ENQ (5), ACK (6), DC4 (20), EM (25), and RS (30) present the least
possible problems with data communication and significant applications
considered. The use of ACK, for example, is actually context-dependent. In
those protocols we have reviewed, it is only considered a "control" character
outside of a data frame; within a data frame it is transferred without special
interpretation. To reduce the probability of conflict even further, the 6
least-often used Vietnamese capital letters, Ẳ, Ẵ, Ẫ, Ỷ, Ỹ, and Ỵ, are encoded
into these slots.
The remaining task is to encode the other 128 Vietnamese characters into
the extended ASCII space (C1 and G1). Since no unique international encoding
standard exists in this region, the philosophy is to be as conservative as
as possible so that in the worst case the user can still use all of the lower
case Vietnamese letters.
The encoding of C1 is less troublesome, although in application-specific
contexts it has been found that some C1 characters are employed with special
meanings. A review of ongoing work on 8-bit mail transport standardization
indicates that C1 characters will be fully supported as graphic characters
without special interpretation. Nevertheless it is prudent to encode only
------------------------------------------------------------------------------
{5} The modifications provided the keyboard interface described in later
sections.
upper-case characters into the C1 space.
For G1, the aim is to accommodate the popular PC character set (code page
850) and to adhere, if possible, to the 8859/Latin-1 mapping where Vietnamese-
specific characters are already encoded.
Experience in development of this encoding on the MS-DOS platform
motivates the consideration of line-drawing glyphs in the PC character set.
In many situations where both Vietnamese and line-drawing characters are
desirable but font switching is impossible, the best we can do is to preserve
all the lower case Vietnamese characters and all the single- and double-line
drawing characters. This means that code positions occupied by single- and
double-line drawing characters must be populated with upper case letters.
With this provision, the MS-DOS user can be supplied with either code pages
containing all Vietnamese glyphs or code pages where a number of upper case
Vietnamese characters are replaced by PC line-drawing characters. For
existing applications, the user can choose the code page most appropriate for
her purpose. Where the code page with line-drawing characters must be used,
the penalty from missing Vietnamese characters has been minimized by the
choice of the infrequently used ones. For new applications, code page
switching can easily be done on the fly, if it is desired.
Compatibility with the 8859/Latin-1 standard is merely for user
friendliness and is not mandatory. It is natural and reasonable for a user in
France to expect that the same keystrokes producing é on the screen for French
will do the same for Vietnamese. The motivation for this compatibility is the
predominant and increasing availability of 8859/Latin-1 keyboards and font
sets, e.g., Digital's VT-terminal series, Xterm keymaps, and Microsoft's
Windows. Table 2 lists the subset of 8859/Latin-1 characters in G1 that are
also Vietnamese {6}. It can be concluded that all 8859/Latin-1 text that
contains characters mostly from G0 (ASCII) and this table, French text for
example, is highly readable in the Vietnamese environment.
Finally, certain characters in G1 are not renderable in a number of
applications such as character codes 160 (non-breaking space character in
8859/Latin-1), 202 (non-breaking space character on Macintosh), or 255. The
list of potentially non-graphic characters in C1 and G1 can be quite large:
nearly 30 characters in MS Windows 3.0 and roughly 25 characters in MS Windows
3.1. These positions must be populated with upper case characters,
consistent with the above philosophy. In applications where font switching
is allowed and upper case characters are blocked out, a solution is to supply
fonts in pairs: a normal font and a capital font. In the capital font all the
positions that should be filled with lower case characters are actually filled
with the corresponding upper case. When a capital letter in the normal font
cannot be rendered, the user simply switches to the corresponding capital font
and types in the corresponding lower case character.
With the above guidelines, the task is then to lay out the remaining
------------------------------------------------------------------------------
{6} Note that the <đ> in Table 2 is actually a similar-looking Icelandic
"edh" in 8859/Latin-1; the Vietnamese rendering form is better reflected in
8859/Latin-2.
Table 3: VISCII 8-bit Encoding Standard Proposal for Vietnamese
+======================================================================+
|    ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+======================================================================+
| 0x || NUL:SOH: Ẳ :ETX:EOT: Ẵ : Ẫ :BEL:BS :HT :LF :VT :FF :CR :SO :SI |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 1x || DLE:DC1:DC2:DC3: Ỷ :NAK:SYN:ETB:CAN: Ỹ :SUB:ESC:FS :GS : Ỵ :US |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 2x ||  SP: ! : " : # : $ : % : & : ' : ( : ) : * : + : , : - : . : / |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 3x ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : : : ; : < : = : > : ? |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 4x ||  @ : A : B : C : D : E : F : G : H : I : J : K : L : M : N : O |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 5x ||  P : Q : R : S : T : U : V : W : X : Y : Z : [ : \ : ] : ^ : _ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 6x ||  ` : a : b : c : d : e : f : g : h : i : j : k : l : m : n : o |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 7x ||  p : q : r : s : t : u : v : w : x : y : z : { : | : } : ~ :DEL|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 8x ||  Ạ : Ắ : Ằ : Ặ : Ấ : Ầ : Ẩ : Ậ : Ẽ : Ẹ : Ế : Ề : Ể : Ễ : Ệ : Ố |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 9x ||  Ồ : Ổ : Ỗ : Ộ : Ợ : Ớ : Ờ : Ở : Ị : Ỏ : Ọ : Ỉ : Ủ : Ũ : Ụ : Ỳ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ax ||  Õ : ắ : ằ : ặ : ấ : ầ : ẩ : ậ : ẽ : ẹ : ế : ề : ể : ễ : ệ : ố |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Bx ||  ồ : ổ : ỗ : Ỡ : Ơ : ộ : ờ : ở : ị : Ự : Ứ : Ừ : Ử : ơ : ớ : Ư |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Cx ||  À : Á : Â : Ã : Ả : Ă : ẳ : ẵ : È : É : Ê : Ẻ : Ì : Í : Ĩ : ỳ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Dx ||  Đ : ứ : Ò : Ó : Ô : ạ : ỷ : ừ : ử : Ù : Ú : ỹ : ỵ : Ý : ỡ : ư |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ex ||  à : á : â : ã : ả : ă : ữ : ẫ : è : é : ê : ẻ : ì : í : ĩ : ỉ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Fx ||  đ : ự : ò : ó : ô : õ : ỏ : ọ : ụ : ù : ú : ũ : ủ : ý : ợ : Ữ |
+======================================================================+
Vietnamese characters in some fashion, perhaps even arbitrarily. This has been
done in such a way as to provide some degree of symmetry, simply for
aesthetics. It turns out that all the above guidelines can be adhered to
except for compatibility with the letter Õ (O tilde) in 8859/Latin-1. Note
that the Vietnamese collating order cannot in any case be preserved, but this
is not a major issue since collation for non-ASCII characters is well accepted
to be a table-lookup problem.
The preceding guidelines have resulted in the VISCII 8-bit Vietnamese
encoding proposal listed in Table 3. It is intended to be a single table that
applies to Vietnamese data handling including storage, processing,
transmission, and font encoding. This greatly simplifies the integration,
implementation, and usage processes and is indeed one of the major strengths
of the proposal.
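Because a single table covers storage, processing, and transmission, converting VISCII bytes to another character set reduces to one table lookup. The following is a minimal sketch that maps VISCII bytes to Unicode text; the table is transcribed from Table 3, and the names C0_LETTERS, VISCII_HIGH, VISCII_TABLE, and decode_viscii are assumptions made for illustration.

```python
import unicodedata

# The six capitals placed in C0 (codes 2, 5, 6, 20, 25, 30).
C0_LETTERS = {
    0x02: "\u1eb2", 0x05: "\u1eb4", 0x06: "\u1eaa",
    0x14: "\u1ef6", 0x19: "\u1ef8", 0x1E: "\u1ef4",
}

# Upper half of Table 3, one letter per code position.
# NFC normalization guards against decomposed forms in this source text.
VISCII_HIGH = unicodedata.normalize(
    "NFC",
    "ẠẮẰẶẤẦẨẬẼẸẾỀỂỄỆỐ"   # 0x80-0x8F
    "ỒỔỖỘỢỚỜỞỊỎỌỈỦŨỤỲ"   # 0x90-0x9F
    "Õắằặấầẩậẽẹếềểễệố"   # 0xA0-0xAF
    "ồổỗỠƠộờởịỰỨỪỬơớƯ"   # 0xB0-0xBF
    "ÀÁÂÃẢĂẳẵÈÉÊẺÌÍĨỳ"   # 0xC0-0xCF
    "ĐứÒÓÔạỷừửÙÚỹỵÝỡư"   # 0xD0-0xDF
    "àáâãảăữẫèéêẻìíĩỉ"   # 0xE0-0xEF
    "đựòóôõỏọụùúũủýợỮ",  # 0xF0-0xFF
)

# Lower half: ASCII, except for the six C0 slots.
_low = [chr(code) for code in range(0x80)]
for code, letter in C0_LETTERS.items():
    _low[code] = letter
VISCII_TABLE = _low + list(VISCII_HIGH)


def decode_viscii(data: bytes) -> str:
    """Map each VISCII byte to its Unicode character by table lookup."""
    return "".join(VISCII_TABLE[byte] for byte in data)


print(decode_viscii(b"Vi\xaet Nam"))  # Việt Nam
```

The reverse direction can be obtained by inverting VISCII_TABLE into a dictionary, since every code position maps to a distinct character.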
4. VIQR: MNEMONIC ENCODING SPECIFICATION FOR VIETNAMESE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4.1. MOTIVATION
~~~~~~~~~~~~~~~
While the 8-bit specification attempts to standardize Vietnamese encoding in
8-bit environments, much remains to be addressed in important 7-bit
environments such as electronic mail transport and other 7-bit data lines, as
well as in keyboard entry applications where the interface for generating
Vietnamese characters needs to be standardized.
Transporting more than 128 unique symbols over 7-bit data channels is not
a problem specific to the Vietnamese language. Since its proposal in 1982,
the Internet Simple Mail Transfer Protocol ("SMTP", [8]) has seen unrelenting
efforts to extend it to accommodate 8-bit and wider-word data in European
Latin scripts and Oriental ideographic characters (see, e.g., [9]). While
clean 8-bit transport is highly desirable, not all mail gateways are going to
be converted overnight. For the foreseeable future there is a need for
unambiguous transport of Vietnamese text over existing 7-bit channels.
Indeed there is an ad-hoc standard in use on the Viet-Net mailing list and
the Usenet newsgroup Soc.Culture.Vietnamese, where mnemonic use of appropriate
characters to follow a vowel proves to be quite readable; for example, "Việt
Nam" would be written as "Vie^.t Nam". However, this is troubled by the
ambiguity in the multiple roles played by the mnemonic diacritical marks; for
example, does "tha?" mean "tha?" or "thả"?
The Viet-Net convention is not far in concept from a quoted-readable
format proposed by K. Simonsen [10, 11] which disambiguates such texts by
specifying text states at both the character and character set levels.
Unfortunately, in its attempt to provide a universal solution to mnemonic
encoding, the proposal does not provide a good answer for Vietnamese text.
First, it restricts the use of mnemonics to the 83 invariant ISO-646 [12]
graphic characters, which is a good idea in principle, but sacrifices
readability in the process. For example, the counter-intuitive mnemonics for
hook-above (dau hoi) and tilde (dau nga) are "2" and "?", respectively, in
order to avoid "~" itself, which is not an invariant. The wide availability
of ASCII keyboards to the great majority of Vietnamese users makes this too
unreasonable a limitation in the context of Vietnamese processing. It should
be noted that we are in fact arguing in favor of "readability for most"
against "illegibility for all." Furthermore, with ongoing progress in keyboard
and display internationalization, e.g., in graphical window environments where
keyboard mapping and font switching are easily implemented, this availability
is on the increase, further obsoleting the restriction.
The greater difficulty is that the two-character fixed-length encoding {7}
cannot provide a readable or mnemonic representation of all Vietnamese
------------------------------------------------------------------------------
{7} The convention is "&xy" where x is a literal character and y
represents some combining form.
characters, in particular those with 2 diacritical marks. The variable-length
mnemonics {8} have been extended to include all Vietnamese characters, but
this scheme is so cluttered with announcers and delimiters that readability
and efficiency are near nil, keeping in mind that diacritics are heavily used
in Vietnamese. While machine data translators will have little trouble with
any "mnemonic" scheme, one that is directly accessible to human users, who are
in many cases typing mail messages using 7-bit editors, needs to be more user-
friendly. A Vietnamese user will not want to learn or remember among all
possible combinations that, say, "a5" stands for "ắ", nor will she like typing
sequences as long as "&_a('_" for some letter in every word.
To satisfy the readability and flexibility requirements, a separate
specification is necessary. It is better to adopt an approach like code-page
switching under ISO-2022 [13] to switch the text into "Vietnamese" mode and
optimize encoding according to the language state. Recently, van der Poel put
forth a mnemonic proposal [14] which emphasizes language-specific conventions
for these reasons. This proposal provides a means to specify the language state,
each with its own (efficient) encoding method. Its strength lies in the
flexible specification that conformant implementations "need not be able to
display all of the character sets specified"; they have the option of stating
messages such as "undisplayable Greek appeared here" for unsupported languages
(for a more precise specification, see [14]). This allows networking
communities to determine the best approach for encoding their own languages.
The VIQR convention is compatible with this approach and should easily be
incorporated into this framework.
The specification here encompasses all data streams including text
transfer, file I/O, and keyboard entry. This principle has been the major
reason for success in operating systems such as Unix, in which device-specific
details are hidden as much as possible from the applications programmer,
leaving a uniform interface above which tools such as common library routines
can be shared. Indeed as the keyboard example above has implied, the
characters actually typed by the user are often not different from the text
data that is eventually stored or transmitted. It is therefore desirable to
provide a common base on which to build data interpreters for all data
streams, independent of the input source. In actual implementation, this has
greatly facilitated development of the Vietnamese-capable software base.
In addition, the user stands to benefit tremendously from standardization
of keyboard entry. One does not need to learn a different keyboard entry
technique for each different Vietnamese application. If one standard keyboard
model is fully supported by all Vietnamese software, a user familiar with the
standard can sit down and start typing Vietnamese immediately. This standard
defines the minimum expected behavior from compliant software; any additional
input techniques can of course be incorporated as a superset of the standard
behavior. This is discussed further in Section 5.2 on Vietnamese keyboarding.
------------------------------------------------------------------------------
{8} The convention is "&_xxxx_" where xxxx can be an arbitrary mnemonic
sequence.
4.2. QUOTED-READABLE SPECIFICATION (VIQR)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The mnemonic model from Viet-Net is fully employed in the specification. The
Vietnamese QR comprises three major states: Literal, English, and Vietnamese.
The Literal state is intended for completely transparent handling of literal
data (except of course for the escape sequences into and out of Literal
state). The English and Vietnamese states are designed for mixed use of
English and Vietnamese, with each optimized in appearance as well as data size
for texts containing mostly English and Vietnamese, respectively. In either
state there exist methods for composing Vietnamese-specific characters, using
a base vowel followed by one or two diacritics.
We first introduce the concept of implicit and explicit composition, then
discuss how they are used in each of the states.
4.2.1. Implicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Implicit composition is useful for data containing a large percentage of
Vietnamese characters.
With implicit composition, a sequence of a base vowel followed by one or
two diacritical marks is combined into one Vietnamese letter as long as it is
grammatically legal. This is best illustrated by examples:
     a^         --->  â
     o+?        --->  ở
     ơ?         --->  ở
     Vie^.t     --->  Việt
     Viê.t      --->  Việt
     la'^n      --->  lá^n   (not lấn)
     lá^n       --->  lá^n   (not lấn)
Note in the last two examples that the sequence "a^'" is not grammatically
equivalent to "a'^" or "á^". In general a modifier ("(", "^", "+") must
immediately follow the appropriate vowel in order to be combined.
The special sequence dd is composed into đ; DD, dD, and Dd all represent
Đ.
The base vowels are: a, ă, â, e, ê, i, o, ô, ơ, u, ư, y, and their
corresponding capitals. The encoding values are those listed in Table 3, the
8-bit VISCII proposed standard.
The diacritical marks are represented by ASCII characters having
correspondingly similar appearances. Table 4 lists the 8 ASCII characters
used as mnemonic replacements for the Vietnamese diacritics; the first three
are modifiers, and the remaining five are tone marks.
Table 4: ASCII Mnemonics for Vietnamese Diacritics
================================================================
Diacritic    Char   ASCII Code          Dấu     Example
----------------------------------------------------------------
breve         (     0x28, left paren    trăng   ba(n khoa(n
circumflex    ^     0x5E, caret         mũ      ho^m nay
horn          +     0x2B, plus sign     móc     Qui Nho+n
acute         '     0x27, apostrophe    sắc     La'i Thie^u
grave         `     0x60, backquote     huyền   Bi`nh Du+o+ng
hook above    ?     0x3F, question      hỏi     Thu? DDu+'c
tilde         ~     0x7E, tilde         ngã     di~ va~ng
dot below     .     0x2E, period        nặng    ho.c ta^.p
================================================================
4.2.2. Explicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit composition is associated with the concept of a leading character
which explicitly announces the composition. The announcer character is the
backslash ("\", ASCII 0x5C), known here as <COM> {*}. The subsequent combining
characters are defined in the same way as those in implicit composition. Thus
the examples given above would appear in explicit composition mode as:
     \a^        --->  â
     \o+?       --->  ở
     Vi\e^.t    --->  Việt
Explicit composition is useful for data containing mainly English text, as
well as for maintaining real-time compatibility with keyboard character
events, as will be discussed in Section 5.2 on Vietnamese keyboarding.
With the composition methods described, we are now ready to discuss how
they are employed in each of the three states. The state of the data stream
is specified by the two-character sequence <COM>x, where "x" is specified
below.
4.2.3. Literal State
~~~~~~~~~~~~~~~~~~~~
The appearance of <COM>L or <COM>l in the data stream initiates the Literal
state. This state is intended for near-perfect transparent literal data
transfer. Neither implicit nor explicit composition is available here, nor is
the <COM> character special, except when it is followed by one of the six
------------------------------------------------------------------------------
{*} In cases of ambiguity the notation <...> will be used to indicate that
the whole sequence <...> represents a byte in memory or storage; i.e., it
corresponds to one and only one character. For instance, the word TỐT can
also be written as <T><Ố><T>.
characters l, L, v, V, m or M which initiates one of the three states {9}.
4.2.4. English State
~~~~~~~~~~~~~~~~~~~~
The sequence <COM>M or <COM>m sets the data stream state to English. In
English state, only explicit composition is supported. This means that in
order to generate a Vietnamese letter, the announcer character <COM> must be
used. A "composition" sequence not preceded by <COM> will be left
uninterpreted. Examples:
     \mD\u~ng, how are you?   --->  Dũng, how are you?
     \mKho\e? kh\o^ng?        --->  Khoẻ không?
As noted, the sequence "you?" above was not converted into "yoủ" because
no composition was specified.
4.2.5. Vietnamese State
~~~~~~~~~~~~~~~~~~~~~~~
The data stream state is set to Vietnamese when the sequence <COM>V or <COM>v
is encountered. In Vietnamese mode, both explicit and implicit compositions
are in effect. The following examples assume that the data stream is
initially in English state:
     \vChu+~ Vie^.t       --->  Chữ Việt
     \vCh\u+~ Vi\e^.t     --->  Chữ Việt
     Chu+~ \vVie^.t       --->  Chu+~ Việt
The availability of implicit composition in Vietnamese state ensures that
the text is not cluttered with unnecessary <COM>s, as would be the case in
Vietnamese text using explicit composition. Explicit composition is included
to maintain compatibility with the English state so that there is no need to
define additional meanings for the <COM> sequences. The real-time
keyboard compatibility mentioned previously is also available in Vietnamese
state through explicit composition.
4.2.6. Character Literals in English and Vietnamese States
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following example:
     \vDu~ng, how are you?    --->  Dũng, how are yoủ
In this example, the sequence "you?" was interpreted as "yoủ" because the
data stream was still in Vietnamese state. Thus it is sometimes desirable to
suppress composition altogether without having to switch states. The literal
property of the <COM> character conveniently accomplishes this. In either
------------------------------------------------------------------------------
{9} To effect <COM>L, <COM>M, and <COM>V themselves, it is necessary to
switch to either English or Vietnamese state and use the Character Literal
feature available there.
Vietnamese or English state, whenever <COM> is followed by a non-combining
character c the result is the literal character c itself. The <COM> is
discarded from the data stream. To get the <COM> character literally, use
<COM><COM>. Consider the following examples:
     \vddi dda^u?       --->  đi đâủ
     \vddi dda^u\?      --->  đi đâu?
     \m\ddi v\o^?       --->  đi vổ
     \m\ddi v\o^\?      --->  đi vô?
     \\                 --->  \
     \\V                --->  \V
     \\M                --->  \M
     \\L                --->  \L
4.2.7. Closure
~~~~~~~~~~~~~~
The data stream supports another special character used to generate explicit
closure. The closure character is CTRL-A (ASCII 0x01), known here as <CLS>.
When <CLS> is encountered in the data stream, it immediately terminates any
ongoing composition sequence. The <CLS> itself is always discarded, unless it
appears in the literal sequence <COM><CLS>.
Explicit closure is useful in real-time character applications such as
keyboard entry, when it is necessary to specify that a composition sequence
has in fact ended and the input engine should not stay hanging and wait for
more data.
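The states, composition rules, character literals, and closure described in Section 4.2 can be sketched as a small state machine. This is an illustration under stated assumptions, not a reference implementation: the names compose and decode_viqr are hypothetical, output is Unicode rather than VISCII bytes, "grammatically legal" is read as one permitted modifier followed by at most one tone mark, the initial state defaults to English, and a lone trailing <COM> is silently dropped.

```python
import unicodedata

COM, CLS = "\\", "\x01"
STATES = {"l": "literal", "m": "english", "v": "vietnamese"}
# Mnemonic -> (combining mark, ASCII vowels it may legally follow).
MODIFIERS = {"(": ("\u0306", "aA"), "^": ("\u0302", "aAeEoO"),
             "+": ("\u031b", "oOuU")}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309",
         "~": "\u0303", ".": "\u0323"}
# a e i o u y plus the six modified vowels, in both cases.
BASE_VOWELS = set("aeiouy\u0103\u00e2\u00ea\u00f4\u01a1\u01b0")
BASE_VOWELS |= {v.upper() for v in BASE_VOWELS}


def compose(text, i):
    """Compose one letter starting at text[i]; return (letter, next_i) or None."""
    c, nxt = text[i:i + 1], text[i + 1:i + 2]
    if c in ("d", "D") and nxt in ("d", "D"):
        return ("\u0111" if c + nxt == "dd" else "\u0110"), i + 2
    if c not in BASE_VOWELS:
        return None
    letter, j = c, i + 1
    # A modifier must immediately follow an appropriate vowel.
    if text[j:j + 1] in MODIFIERS and c in MODIFIERS[text[j]][1]:
        letter += MODIFIERS[text[j]][0]
        j += 1
    if text[j:j + 1] in TONES:
        letter += TONES[text[j]]
        j += 1
    if j == i + 1:
        return None                      # no diacritic followed the vowel
    return unicodedata.normalize("NFC", letter), j


def decode_viqr(text, state="english"):
    """Decode a VIQR stream into composed Unicode text."""
    out, i = [], 0
    while i < len(text):
        c, nxt = text[i], text[i + 1:i + 2]
        if c == COM and nxt.lower() in STATES:
            state, i = STATES[nxt.lower()], i + 2   # state switch
        elif state == "literal":
            out.append(c)                           # fully transparent
            i += 1
        elif c == CLS:
            i += 1                                  # closure is discarded
        elif c == COM:
            composed = compose(text, i + 1)
            if composed:                            # explicit composition
                letter, i = composed
                out.append(letter)
            else:                                   # character literal
                out.append(nxt)
                i += 2
        else:
            composed = compose(text, i) if state == "vietnamese" else None
            if composed:                            # implicit composition
                letter, i = composed
                out.append(letter)
            else:
                out.append(c)
                i += 1
    return "".join(out)


print(decode_viqr("\\vChu+~ Vie^.t"))      # Chữ Việt
print(decode_viqr("\\vddi dda^u\\?"))      # đi đâu?
```

Because a <CLS> is not a diacritic, it ends any composition in progress simply by appearing in the stream; the decoder then drops it.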
5. SPECIFIC APPLICATIONS
~~~~~~~~~~~~~~~~~~~~~~~~
This section outlines application-specific guidelines and conventions that
have evolved in the software development community. It is intended to be a
live and growing documentation of such discussions as more experience is
gathered. Readers are welcome to participate in these discussions and
contribute to the development of these guidelines in particular, and to the
standards in general.
5.1. ELECTRONIC MAIL OVER 7-BIT CHANNELS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many of the available channels for electronic mail currently still enforce the
7-bit limitation. The 8-bit character set defined in Section 3 cannot be
transported verbatim over these channels. VIQR plays an important role here,
as it provides for 7-bit transport of Vietnamese text without the ambiguity
problem of deciding what to do with the double usage of a
diacritical/punctuation mark, e.g., the hook-above or question mark, "?".
Because of the 7-bit nature of these communications channels, mail agents will
typically not encounter those Vietnamese-specific base vowels that are encoded
in the G1 area, namely: ă, Ă, â, Â, ê, Ê, ô, Ô, ơ, Ơ, ư, and Ư. However, mail
agents designed to work with 8-bit channels are still expected to handle the
occurrence of these characters according to the complete VIQR, namely to
combine base vowels and diacritical marks as appropriate, for example:
     ă'  --->  ắ
In order to be correctly interpreted, electronic mail messages must
explicitly set the language state either in the headers or text body. One
cannot assume what state the receiving input engine is in at the start of the
message, since messages are not always read in message units, e.g., when a
file containing multiple mail messages is scanned.
Furthermore, if a language state specification (\L, \V or \M) is present
in a mail message, it is highly recommended that the message end in the
Literal state. This helps applications reading multiple mail messages in one
data stream, such as a terminal application. It is useful because mail
headers do not adhere to the VIQR, and they are more adversely affected when
interpreted in non-Literal states.
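A mail-sending filter could apply this recommendation mechanically. The
following Python sketch is only an assumption about how that might look: the
function name end_in_literal is hypothetical, only the three markers named
above are recognized, and escaped backslashes are not treated specially as a
complete VIQR parser would need to do.

```python
import re

# Language-state markers named in the text: \L (Literal), \V, and \M.
STATE_MARKER = re.compile(r"\\([LVM])")

def end_in_literal(body: str) -> str:
    """Append \\L when the last state marker in the body is not \\L."""
    markers = STATE_MARKER.findall(body)
    if markers and markers[-1] != "L":
        return body + "\\L"
    return body  # no state marker at all, or already ending in Literal

print(end_in_literal("\\VVie^.t Nam"))
```

A message without any state specification is left untouched, since the
recommendation applies only when a specification is present.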
5.2. VIETNAMESE KEYBOARDING
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Keyboards are becoming increasingly internationalized. As mentioned in the
8-bit specification, this is the major reason for using the same code
positions for those Vietnamese characters already present in ISO 8859/Latin-1.
A Vietnamese keyboard driver designed to work in the 7-bit-only environment
can assume that it will not encounter Vietnamese base vowels residing in G1.
Keyboard drivers for the 8-bit environments, like 8-bit electronic mail agents
(Section 5.1), must be prepared to accept any base vowel, including those
encoded in G1.
The real-time echoing behavior of keyboard input during composition
requires further specification. The options are to report the character only
after the composition sequence has finished, or to report all intermediate
forms and backspace over them. Each has its own useful context as described
below.
5.2.1. Immediate Echo for Implicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Implicit composition is designed to be convenient for a user processing data
that is mostly Vietnamese. As such it is desirable for the keyboarding user
to get immediate feedback on typed keys. With implicit composition, the
keyboard works in immediate-echo mode. Keypresses immediately generate key
events. If a character is subsequently composed with a diacritical mark, a
backspace (typically BS, ASCII 0x08) is sent followed by the new composed
character. This cycle continues as long as composition is possible. The
sequence of events for the key sequence "a^'n" under immediate echo is:
1. user types a, a is sent to the application
2. user types ^, BS and â are sent
3. user types ', BS and ấ are sent
4. user types n, the single key n is sent
The actual backspace character code may vary depending on the system,
application, and user settings. The keyboard interface should use the
appropriate code, and/or allow the user to specify the preferred backspace
character.
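The immediate-echo cycle can be simulated in a few lines of Python. This is a
sketch under stated assumptions, not a reference keyboard driver: the names
immediate_echo, extend, and render are hypothetical, Unicode NFC
normalization stands in for a VISCII table, BS is fixed at 0x08, only plain
ASCII base vowels are handled, and the escape mechanisms of the full VIQR are
omitted.

```python
import unicodedata

BS = "\x08"  # assumed backspace code; as noted above, it may vary
# Modifier -> (combining character, base vowels that accept it)
MODIFIERS = {"(": ("\u0306", "aA"), "^": ("\u0302", "aeoAEO"), "+": ("\u031b", "ouOU")}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323"}
VOWELS = "aeiouyAEIOUY"

def extend(base, marks, key):
    """Return the longer mark string if key can join the sequence, else None."""
    if key in MODIFIERS and marks == "" and base in MODIFIERS[key][1]:
        return marks + key          # a modifier must directly follow the base
    if key in TONES and not any(m in TONES for m in marks):
        return marks + key          # at most one tone mark per vowel
    return None

def render(base, marks):
    """Build the composed character for a base vowel and its marks."""
    combining = "".join(MODIFIERS[m][0] if m in MODIFIERS else TONES[m] for m in marks)
    return unicodedata.normalize("NFC", base + combining)

def immediate_echo(keys):
    """Return the key events sent to the application, in order."""
    events, base, marks = [], None, ""
    for key in keys:
        longer = extend(base, marks, key) if base else None
        if longer is not None:
            marks = longer
            events += [BS, render(base, marks)]   # erase, then resend composed form
        else:
            events.append(key)
            base, marks = (key, "") if key in VOWELS else (None, "")
    return events

print(immediate_echo("a^'n"))
```

For the key sequence "a^'n" the returned list matches the four steps listed
above: a, then BS and â, then BS and ấ, then n.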
5.2.2. Delayed Echo for Explicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a composition sequence is started, the keyboard interface must not send
any key events to the application expecting keyboard input until the sequence
is terminated. Composition may end either naturally when the interface
receives a character that cannot be composed into the sequence, or when the
closure character <CLS> is received. A single key event for the composed
character is then sent to the application above. Subsequent processing can
proceed naturally. Consider what happens when the user types the sequence
"\a^'n" under delayed echo:
1. user types \, no key is sent to the application
2. user types a, no key is sent
3. user types ^, no key is sent
4. user types ', the single key ấ is sent
5. user types n, the single key n is sent
Or an example involving closure, t\o+<CLS>:
1. user types t, the key t is sent
2. user types \, no key is sent
3. user types o, no key is sent
4. user types +, no key is sent
5. user types CTRL-A, the single key ơ is sent
Note that without the closure key the keyboard interface would still be
left hanging after the "+" key has been pressed, because the user can still
enter a tone mark as part of the composition sequence.
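The two delayed-echo traces above can be reproduced with the Python sketch
below. It rests on several assumptions that go beyond the text: the names
delayed_echo, extend, and render are hypothetical, the announcer is taken to
be the backslash and <CLS> to be CTRL-A (0x01) as in the examples, Unicode
NFC normalization stands in for a VISCII table, and an announcer followed by
a non-vowel simply sends that key.

```python
import unicodedata

COM = "\\"    # composition announcer, as typed in the examples above
CLS = "\x01"  # closure character; CTRL-A in the second example
MODIFIERS = {"(": ("\u0306", "aA"), "^": ("\u0302", "aeoAEO"), "+": ("\u031b", "ouOU")}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323"}
VOWELS = "aeiouyAEIOUY"

def extend(base, marks, key):
    """Return the longer mark string if key can join the sequence, else None."""
    if key in MODIFIERS and marks == "" and base in MODIFIERS[key][1]:
        return marks + key
    if key in TONES and not any(m in TONES for m in marks):
        return marks + key
    return None

def render(base, marks):
    """Build the composed character for a base vowel and its marks."""
    combining = "".join(MODIFIERS[m][0] if m in MODIFIERS else TONES[m] for m in marks)
    return unicodedata.normalize("NFC", base + combining)

def delayed_echo(keys):
    """Return the key events sent to the application, in order."""
    events, pending = [], None            # pending: None or a (base, marks) pair
    for key in keys:
        if pending is None:
            if key == COM:
                pending = ("", "")        # announcer seen; nothing is sent
            else:
                events.append(key)
        elif pending[0] == "":
            if key in VOWELS:
                pending = (key, "")       # hold the base vowel
            else:
                events.append(key)        # assumption: send the key as-is
                pending = None
        else:
            base, marks = pending
            longer = extend(base, marks, key)
            if longer is None:            # natural end, or closure by CLS
                events.append(render(base, marks))
                pending = ("", "") if key == COM else None
                if key not in (COM, CLS):
                    events.append(key)
            elif any(m in TONES for m in longer):
                events.append(render(base, longer))  # nothing more can follow a tone
                pending = None
            else:
                pending = (base, longer)
    return events

print(delayed_echo("\\a^'n"))
print(delayed_echo("t\\o+" + CLS))
```

Calling delayed_echo on "t\o+" without the closure character returns only
the event for t, which mirrors the "left hanging" situation described in the
note above.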
This delayed-echo behavior for explicit composition is designed to ensure
compatibility with applications expecting single key events for each
character, particularly in the English state where only explicit composition
is available.
While it is certainly possible to have immediate echo in explicit
composition or delayed echo in implicit composition, these combinations are
not useful and serve only to confuse the user learning how to use a
Vietnamese keyboard. It is therefore simplest to associate delayed echo with
explicit composition and immediate echo with implicit composition. These
pairings are the natural ones.
This standard defines the minimal "look-and-feel" behavior a user can
expect from a compliant Vietnamese software package. A standardized interface
decreases the required learning time for each new application. This standard
does not preclude other input mechanisms to improve user-friendliness, e.g.,
intelligent menu-driven diacritics, or to assist in speed typing, e.g.,
through the use of CONTROL or FUNCTION keys. Any enhancement in compliant
applications is a bonus for the user, so long as such enhancements do not
adversely conflict with the minimum expected behavior described here.
5.3. ADAPTING EXISTING VIETNAMESE APPLICATIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A realistic approach to standardization provides for the inertia against
change in existing software applications. While it is desirable that the
standard 8-bit encoding described here be fully supported, an alternative
exists which is more amenable to rapid adoption. All applications should
provide a means for importing and exporting data encoded using the VISCII
8-bit encoding table. At the same time, the VIQR keyboard interface should be
implemented, at least as an optional entry method. Such moves are highly
desirable for the user and the vendor alike. The user will be able to
use the software immediately because of the uniform keyboard interface, as
well as process the same data in different applications and on different
platforms, with increased productivity and interactivity among users. This
ease of use means greater acceptance and a correspondingly larger customer
base for the vendor.
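An import/export path of the kind recommended here can be as small as a pair
of table lookups. The Python sketch below is illustrative only: the names
PARTIAL_VISCII, import_viscii, and export_viscii are hypothetical, the table
holds just eight code positions copied from Appendix A rather than the full
set, and Unicode strings are assumed as the application's internal form.

```python
# Partial table only: a few code positions copied from Appendix A.
PARTIAL_VISCII = {
    164: "\u1ea5",  # a^'  a circumflex acute
    174: "\u1ec7",  # e^.  e circumflex dot-below
    189: "\u01a1",  # o+   o horn
    208: "\u0110",  # DD   D bar
    223: "\u01b0",  # u+   u horn
    226: "\u00e2",  # a^   a circumflex
    229: "\u0103",  # a(   a breve
    240: "\u0111",  # dd   d bar
}
TO_VISCII = {char: code for code, char in PARTIAL_VISCII.items()}

def import_viscii(data: bytes) -> str:
    """Decode VISCII bytes into the application's internal text."""
    chars = []
    for code in data:
        if code in PARTIAL_VISCII:
            chars.append(PARTIAL_VISCII[code])
        elif 32 <= code < 127 or code in (9, 10, 13):
            chars.append(chr(code))       # ASCII positions are unchanged
        else:
            raise ValueError(f"code {code} is not in this partial table")
    return "".join(chars)

def export_viscii(text: str) -> bytes:
    """Encode internal text as VISCII bytes for exchange with other software."""
    codes = []
    for char in text:
        if char in TO_VISCII:
            codes.append(TO_VISCII[char])
        elif 32 <= ord(char) < 127 or char in "\t\n\r":
            codes.append(ord(char))
        else:
            raise ValueError(f"{char!r} is not in this partial table")
    return bytes(codes)

print(import_viscii(bytes([86, 105, 174, 116, 32, 78, 97, 109])))
```

An existing application can keep its internal encoding unchanged and call
such functions only at its file and clipboard boundaries.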
6. SUMMARY & CONCLUSIONS
~~~~~~~~~~~~~~~~~~~~~~~~
This paper has presented a proposal for standardization of Vietnamese
information processing. A case has been made for the necessity of
standardization; we hope to have encouraged vendors and users of Vietnamese
alike to work together toward this goal to benefit everyone involved. Various
encoding approaches were discussed, leading to the choice of the VISCII 8-bit
encoding proposal. A single encoding table was presented that has been shown
in actual practice to work well for Vietnamese including editing, processing,
storage, transfer, font encoding, and printing. Where 8-bit data handling was
not available or reliable, e.g., electronic mail transport, the Vietnamese
Quoted-Readable specification (VIQR) was introduced to provide a seamless
filtering gateway. VIQR was defined to be input-source-independent and hence
has been designed to be applicable to Vietnamese keyboard input as well as
machine data filters. All of this was shown to have been integrated into
existing environments facilitating the use of existing tools and
applications---a major strength of the encoding. Finally, these
specifications have been linked together seamlessly to include every point in
the input-process/transfer-output cycle of data handling and provide for a
truly unified framework for Vietnamese information processing.
References
~~~~~~~~~~
[1] Bạch Hưng Khang. "Institute of Informatics." Hà Nội,
Việt Nam, Feb. 1991.
[2] B. Jerman-Blazic, "Will the Multi-octet Standard Character Set
Code Solve the World Coding Problems for Information Interchange?"
Computer Standards & Interfaces, vol. 8, pages 127-136, 1988.
[3] The Unicode Consortium. The Unicode Standard: Worldwide
Character Encoding Version 1.0. Addison-Wesley, Reading, MA,
first edition, Oct. 1991.
[4] ISO Technical Committee, "Universal Multiple-Octet Coded
Character Set (UCS), ISO/IEC DIS 10646-1.2," Draft standard,
International Organization for Standardization, 1992.
[5] International Organization for Standardization, ISO 8859/x:
8-bit International Code Sets. ISO, 1987.
[6] Famjxuaen Thais. Việt Ngữ Cải Cách. Tứ Hải, Hà Nội,
Việt Nam, Mar. 1948.
[7] Phạm Xuân Thái. Chữ Việt Hợp Lí, Tín-Đức Thư Xã,
Sài Gòn, Việt Nam, Apr. 1958.
[8] J. Postel, "Simple Mail Transfer Protocol," RFC 821, USC
Information Sciences Institute, Aug. 1982.
[9] J. C. Klensin et al., "SMTP Extensions for Transport of
Text-Based Messages Containing 8-bit Characters," Internet draft,
Massachusetts Institute of Technology, July 1991.
[10] K. Simonsen, "Character Mnemonics & Character Sets," Internet
Draft, Danish Unix Users Group, Jan. 1992.
[11] K. Simonsen, "Mnemonic Text Format," Internet draft, Danish
Unix Users Group, Aug. 1991.
[12] International Organization for Standardization. ISO 646:
7-bit Coded Character Set for Information Interchange. ISO,
third edition, 1991.
[13] International Organization for Standardization. ISO 2022:
7-bit and 8-bit Coded Character Sets--Code Extension Techniques.
ISO, third edition, 1986.
[14] E. M. van der Poel, "Multilingual Character Encoding for Internet
Messages," Internet draft, Software Research Associates,
Japan, Jan. 1992.
[15] IBM. System/370 Reference Summary-GX20-1850-5, sixth edition, 1984.
[16] C.E. Mackenzie. Coded-Character Sets: History and Development.
Addison-Wesley, Reading, MA, 1980.
[17] D.E. Knuth. The TeXbook. Addison-Wesley, Reading, MA, 1984.
Glossary of Terms
~~~~~~~~~~~~~~~~~
Announcer: A character or sequence of characters appearing in the data that
signifies the start of some special sequence. In this text, it announces
a Vietnamese composition sequence.
ASCII: American Standard Code for Information Interchange, a 128-character
code used almost universally by computers for representing and
transmitting character data, in which each character corresponds to a
decimal number between 0 and 127. Eight- or nine-bit codes of which the
first 128 characters correspond to ASCII are called Extended ASCII; the
additional characters are used to provide graphic characters for roman
alphabets with diacritics, non-roman alphabets, special screen effects,
etc.
Base Vowel: In this text, the unaccented Vietnamese vowels: a, ă, â, e, ê, i,
o, ô, ơ, u, ư, y (and their capitals). Contrast this with Vowel.
C0 Space: "Control characters" at code positions with hex values 00 through
1F.
C1 Space: "Control characters" at code positions with hex values 80 through
9F.
Code: In data communication, the numeric or internal representation for a
character, e.g., in ASCII.
Code Page: Name used to denote glyph sets on the IBM PC. Abbreviated as CP.
CP850 is the multilingual code page, CP860 is for Portugal, CP863 is for
French Canada, CP865 is for Norway.
Control Character: An ASCII character in the range 0 to 31, plus ASCII
character 127, contrasted with the printable, or graphic, characters in
the range 32 to 126. It is produced on an ASCII terminal by holding down
the CTRL key and typing the desired character.
EBCDIC: Extended Binary Coded Decimal Interchange Code. The character code
used on IBM mainframes. Not covered by any formal standards but described
definitively in [15] and discussed at length in [16].
Floating Diacritics: A multiple-unit encoding approach for Vietnamese that
treats the vowel and its diacritics as separate units. The diacritics may
either precede or follow the vowel, or even the word. Contrast this with
Precomposed Character.
Glyph: The physical appearance of a character as displayed on the screen or
printed on paper.
G0 Space: "Graphic characters" at code positions with hex values 20 through
7F.
G1 Space: "Graphic characters" at code positions with hex values A0 through
FF.
ISO: International Organization for Standardization. A voluntary
international group of national standards organizations that issues
standards in all areas, including computers, information processing, and
character sets.
ISO 646: The standard 7-bit code set, equivalent to ASCII [12].
ISO Standard 8859: An ISO standard specifying a series of 8-bit computer
character sets that include characters from many languages. These include
ISO Latin Alphabets 1-9, which cover most of the written languages based
on Roman letters, plus special character sets for Cyrillic, Greek, Arabic,
and Hebrew [5].
ISO 8859/1: ISO Standard 8859 Latin Alphabet Number 1. Supports at least the
following languages: Latin, Danish, Dutch, English, Faeroese, Finnish,
French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish,
and Swedish [5].
ISO 2022 and ISO 4873: ISO standards for switching code pages [13].
ISO DIS 10646: The prospective 16- and 32-bit Universal Coded Set (Draft
International Standard) [4].
Latin: Referring to the Latin, or Roman, alphabet, comprising the letters A
through Z, or to any alphabet based upon it.
MS-DOS: Microsoft's Disk Operating System for microcomputers based on the
Intel 80x86 family of CPU chips.
Modifier: A phonetic diacritical mark. The Vietnamese modifiers are: breve
(dấu trăng or dấu á), circumflex (dấu mũ, ^), horn (dấu móc).
PC: Personal Computer. In this text, the term PC refers to the entire IBM PC
and PS/2 families and compatibles, which includes the AT, 286, 386, and
486 PC's.
PostScript: A page description language with graphics capabilities designed
for electronic printing. The description is high-level and device-
independent. PostScript is a trademark of Adobe Systems Incorporated.
Precomposed Characters: An encoding approach for Vietnamese that treats all
vowel combinations as single units. Contrast this with Floating
Diacritics.
TeX: A computerized typesetting system developed by Donald Knuth [17],
providing nearly everything needed for high-quality typesetting of
mathematical notations as well as of ordinary text. TeX is a trademark of
the American Mathematical Society.
Tone Mark: A tonal diacritical mark that indicates the tone/accent. The
Vietnamese tone marks are: acute (sắc), grave (huyền), hook above (hỏi),
tilde (ngã), dot below (nặng).
Unicode: A 16-bit multilingual character code proposed by the Unicode
Consortium [3].
Unix: A popular operating system developed at AT&T Bell Laboratories and noted
for its portability.
Usenet: A worldwide network available to users for sending messages (or "news
articles") that can be read and responded to by other users. Participating
in Usenet is like subscribing to a collection of electronic magazines.
These "magazines," called newsgroups, are devoted to particular topics.
The "Soc.Culture.Vietnamese" newsgroup is very popular among both
Vietnamese and non-Vietnamese worldwide.
Viet-Std: A non-profit group of overseas Vietnamese and other professionals
working on software & hardware standards for the Vietnamese language.
Members of the group exchange ideas via electronic mail and meetings.
Vowel: In this text, a generic term applying to all Vietnamese vowels and
their various combining forms, e.g., a, ă, and ắ. See Base Vowel.
Appendix A: Vietnamese Characters under VISCII and VIQR by Collating Order
+================++================++================++================+
|Chr:VIQR :VISCII||Chr:VIQR :VISCII||Chr:VIQR :VISCII||Chr:VIQR :VISCII|
+---:-----:------++---:-----:------++---:-----:------++---:-----:------+
| A : A   : 065  || N : N   : 078  || a : a   : 097  || n : n   : 110  |
| Á : A'  : 193  || O : O   : 079  || á : a'  : 225  || o : o   : 111  |
| À : A`  : 192  || Ó : O'  : 211  || à : a`  : 224  || ó : o'  : 243  |
| Ả : A?  : 196  || Ò : O`  : 210  || ả : a?  : 228  || ò : o`  : 242  |
| Ã : A~  : 195  || Ỏ : O?  : 153  || ã : a~  : 227  || ỏ : o?  : 246  |
| Ạ : A.  : 128  || Õ : O~  : 213  || ạ : a.  : 160  || õ : o~  : 245  |
| Ă : A(  : 197  || Ọ : O.  : 154  || ă : a(  : 229  || ọ : o.  : 247  |
| Ắ : A(' : 129  || Ô : O^  : 212  || ắ : a(' : 161  || ô : o^  : 244  |
| Ằ : A(` : 130  || Ố : O^' : 143  || ằ : a(` : 162  || ố : o^' : 175  |
| Ẳ : A(? : 002  || Ồ : O^` : 144  || ẳ : a(? : 198  || ồ : o^` : 176  |
| Ẵ : A(~ : 005  || Ổ : O^? : 145  || ẵ : a(~ : 199  || ổ : o^? : 177  |
| Ặ : A(. : 131  || Ỗ : O^~ : 146  || ặ : a(. : 163  || ỗ : o^~ : 178  |
| Â : A^  : 194  || Ộ : O^. : 147  || â : a^  : 226  || ộ : o^. : 181  |
| Ấ : A^' : 132  || Ơ : O+  : 180  || ấ : a^' : 164  || ơ : o+  : 189  |
| Ầ : A^` : 133  || Ớ : O+' : 149  || ầ : a^` : 165  || ớ : o+' : 190  |
| Ẩ : A^? : 134  || Ờ : O+` : 150  || ẩ : a^? : 166  || ờ : o+` : 182  |
| Ẫ : A^~ : 006  || Ở : O+? : 151  || ẫ : a^~ : 231  || ở : o+? : 183  |
| Ậ : A^. : 135  || Ỡ : O+~ : 179  || ậ : a^. : 167  || ỡ : o+~ : 222  |
| B : B   : 066  || Ợ : O+. : 148  || b : b   : 098  || ợ : o+. : 254  |
| C : C   : 067  || P : P   : 080  || c : c   : 099  || p : p   : 112  |
| D : D   : 068  || Q : Q   : 081  || d : d   : 100  || q : q   : 113  |
| Đ : DD  : 208* || R : R   : 082  || đ : dd  : 240  || r : r   : 114  |
| E : E   : 069  || S : S   : 083  || e : e   : 101  || s : s   : 115  |
| É : E'  : 201  || T : T   : 084  || é : e'  : 233  || t : t   : 116  |
| È : E`  : 200  || U : U   : 085  || è : e`  : 232  || u : u   : 117  |
| Ẻ : E?  : 203  || Ú : U'  : 218  || ẻ : e?  : 235  || ú : u'  : 250  |
| Ẽ : E~  : 136  || Ù : U`  : 217  || ẽ : e~  : 168  || ù : u`  : 249  |
| Ẹ : E.  : 137  || Ủ : U?  : 156  || ẹ : e.  : 169  || ủ : u?  : 252  |
| Ê : E^  : 202  || Ũ : U~  : 157  || ê : e^  : 234  || ũ : u~  : 251  |
| Ế : E^' : 138  || Ụ : U.  : 158  || ế : e^' : 170  || ụ : u.  : 248  |
| Ề : E^` : 139  || Ư : U+  : 191  || ề : e^` : 171  || ư : u+  : 223  |
| Ể : E^? : 140  || Ứ : U+' : 186  || ể : e^? : 172  || ứ : u+' : 209  |
| Ễ : E^~ : 141  || Ừ : U+` : 187  || ễ : e^~ : 173  || ừ : u+` : 215  |
| Ệ : E^. : 142  || Ử : U+? : 188  || ệ : e^. : 174  || ử : u+? : 216  |
| F : F   : 070  || Ữ : U+~ : 255  || f : f   : 102  || ữ : u+~ : 230  |
| G : G   : 071  || Ự : U+. : 185  || g : g   : 103  || ự : u+. : 241  |
| H : H   : 072  || V : V   : 086  || h : h   : 104  || v : v   : 118  |
| I : I   : 073  || W : W   : 087  || i : i   : 105  || w : w   : 119  |
| Í : I'  : 205  || X : X   : 088  || í : i'  : 237  || x : x   : 120  |
| Ì : I`  : 204  || Y : Y   : 089  || ì : i`  : 236  || y : y   : 121  |
| Ỉ : I?  : 155  || Ý : Y'  : 221  || ỉ : i?  : 239  || ý : y'  : 253  |
| Ĩ : I~  : 206  || Ỳ : Y`  : 159  || ĩ : i~  : 238  || ỳ : y`  : 207  |
| Ị : I.  : 152  || Ỷ : Y?  : 020  || ị : i.  : 184  || ỷ : y?  : 214  |
| J : J   : 074  || Ỹ : Y~  : 025  || j : j   : 106  || ỹ : y~  : 219  |
| K : K   : 075  || Ỵ : Y.  : 030  || k : k   : 107  || ỵ : y.  : 220  |
| L : L   : 076  || Z : Z   : 090  || l : l   : 108  || z : z   : 122  |
| M : M   : 077  ||   :     :      || m : m   : 109  ||   :     :      |
+================++================++================++================+
* VIQR also allows "Đ" to be represented by "Dd" or "dD".
Appendix B: Vietnamese Characters under VISCII and VIQR by Encoding Order
+=====================================+==========================================+
|VISCII:Chr: VIQR: Descriptive Name   |VISCII:Chr: VIQR: Descriptive Name        |
+------:---:-----:--------------------+------:---:-----:-------------------------+
| 002  : Ẳ : A(? : A breve hook-above | 112  : p : p   : p                       |
| 005  : Ẵ : A(~ : A breve tilde      | 113  : q : q   : q                       |
| 006  : Ẫ : A^~ : A circumflex tilde | 114  : r : r   : r                       |
| 020  : Ỷ : Y?  : Y hook-above       | 115  : s : s   : s                       |
| 025  : Ỹ : Y~  : Y tilde            | 116  : t : t   : t                       |
| 030  : Ỵ : Y.  : Y dot-below        | 117  : u : u   : u                       |
| 065  : A : A   : A                  | 118  : v : v   : v                       |
| 066  : B : B   : B                  | 119  : w : w   : w                       |
| 067  : C : C   : C                  | 120  : x : x   : x                       |
| 068  : D : D   : D                  | 121  : y : y   : y                       |
| 069  : E : E   : E                  | 122  : z : z   : z                       |
| 070  : F : F   : F                  | 128  : Ạ : A.  : A dot-below             |
| 071  : G : G   : G                  | 129  : Ắ : A(' : A breve acute           |
| 072  : H : H   : H                  | 130  : Ằ : A(` : A breve grave           |
| 073  : I : I   : I                  | 131  : Ặ : A(. : A breve dot-below       |
| 074  : J : J   : J                  | 132  : Ấ : A^' : A circumflex acute      |
| 075  : K : K   : K                  | 133  : Ầ : A^` : A circumflex grave      |
| 076  : L : L   : L                  | 134  : Ẩ : A^? : A circumflex hook-above |
| 077  : M : M   : M                  | 135  : Ậ : A^. : A circumflex dot-below  |
| 078  : N : N   : N                  | 136  : Ẽ : E~  : E tilde                 |
| 079  : O : O   : O                  | 137  : Ẹ : E.  : E dot-below             |
| 080  : P : P   : P                  | 138  : Ế : E^' : E circumflex acute      |
| 081  : Q : Q   : Q                  | 139  : Ề : E^` : E circumflex grave      |
| 082  : R : R   : R                  | 140  : Ể : E^? : E circumflex hook-above |
| 083  : S : S   : S                  | 141  : Ễ : E^~ : E circumflex tilde      |
| 084  : T : T   : T                  | 142  : Ệ : E^. : E circumflex dot-below  |
| 085  : U : U   : U                  | 143  : Ố : O^' : O circumflex acute      |
| 086  : V : V   : V                  | 144  : Ồ : O^` : O circumflex grave      |
| 087  : W : W   : W                  | 145  : Ổ : O^? : O circumflex hook-above |
| 088  : X : X   : X                  | 146  : Ỗ : O^~ : O circumflex tilde      |
| 089  : Y : Y   : Y                  | 147  : Ộ : O^. : O circumflex dot-below  |
| 090  : Z : Z   : Z                  | 148  : Ợ : O+. : O horn dot-below        |
| 097  : a : a   : a                  | 149  : Ớ : O+' : O horn acute            |
| 098  : b : b   : b                  | 150  : Ờ : O+` : O horn grave            |
| 099  : c : c   : c                  | 151  : Ở : O+? : O horn hook-above       |
| 100  : d : d   : d                  | 152  : Ị : I.  : I dot-below             |
| 101  : e : e   : e                  | 153  : Ỏ : O?  : O hook-above            |
| 102  : f : f   : f                  | 154  : Ọ : O.  : O dot-below             |
| 103  : g : g   : g                  | 155  : Ỉ : I?  : I hook-above            |
| 104  : h : h   : h                  | 156  : Ủ : U?  : U hook-above            |
| 105  : i : i   : i                  | 157  : Ũ : U~  : U tilde                 |
| 106  : j : j   : j                  | 158  : Ụ : U.  : U dot-below             |
| 107  : k : k   : k                  | 159  : Ỳ : Y`  : Y grave                 |
| 108  : l : l   : l                  | 160  : ạ : a.  : a dot-below             |
| 109  : m : m   : m                  | 161  : ắ : a(' : a breve acute           |
| 110  : n : n   : n                  | 162  : ằ : a(` : a breve grave           |
| 111  : o : o   : o                  | 163  : ặ : a(. : a breve dot-below       |
+=====================================+==========================================+
Appendix B: Vietnamese Characters under VISCII and VIQR by
Encoding Order (continued)
+==========================================+=====================================+
|VISCII:Chr: VIQR: Descriptive Name        |VISCII:Chr: VIQR: Descriptive Name   |
+------:---:-----:-------------------------+------:---:-----:--------------------+
| 164  : ấ : a^' : a circumflex acute      | 210  : Ò : O`  : O grave            |
| 165  : ầ : a^` : a circumflex grave      | 211  : Ó : O'  : O acute            |
| 166  : ẩ : a^? : a circumflex hook-above | 212  : Ô : O^  : O circumflex       |
| 167  : ậ : a^. : a circumflex dot-below  | 213  : Õ : O~  : O tilde            |
| 168  : ẽ : e~  : e tilde                 | 214  : ỷ : y?  : y hook-above       |
| 169  : ẹ : e.  : e dot-below             | 215  : ừ : u+` : u horn grave       |
| 170  : ế : e^' : e circumflex acute      | 216  : ử : u+? : u horn hook-above  |
| 171  : ề : e^` : e circumflex grave      | 217  : Ù : U`  : U grave            |
| 172  : ể : e^? : e circumflex hook-above | 218  : Ú : U'  : U acute            |
| 173  : ễ : e^~ : e circumflex tilde      | 219  : ỹ : y~  : y tilde            |
| 174  : ệ : e^. : e circumflex dot-below  | 220  : ỵ : y.  : y dot-below        |
| 175  : ố : o^' : o circumflex acute      | 221  : Ý : Y'  : Y acute            |
| 176  : ồ : o^` : o circumflex grave      | 222  : ỡ : o+~ : o horn tilde       |
| 177  : ổ : o^? : o circumflex hook-above | 223  : ư : u+  : u horn             |
| 178  : ỗ : o^~ : o circumflex tilde      | 224  : à : a`  : a grave            |
| 179  : Ỡ : O+~ : O horn tilde            | 225  : á : a'  : a acute            |
| 180  : Ơ : O+  : O horn                  | 226  : â : a^  : a circumflex       |
| 181  : ộ : o^. : o circumflex dot-below  | 227  : ã : a~  : a tilde            |
| 182  : ờ : o+` : o horn grave            | 228  : ả : a?  : a hook-above       |
| 183  : ở : o+? : o horn hook-above       | 229  : ă : a(  : a breve            |
| 184  : ị : i.  : i dot-below             | 230  : ữ : u+~ : u horn tilde       |
| 185  : Ự : U+. : U horn dot-below        | 231  : ẫ : a^~ : a circumflex tilde |
| 186  : Ứ : U+' : U horn acute            | 232  : è : e`  : e grave            |
| 187  : Ừ : U+` : U horn grave            | 233  : é : e'  : e acute            |
| 188  : Ử : U+? : U horn hook-above       | 234  : ê : e^  : e circumflex       |
| 189  : ơ : o+  : o horn                  | 235  : ẻ : e?  : e hook-above       |
| 190  : ớ : o+' : o horn acute            | 236  : ì : i`  : i grave            |
| 191  : Ư : U+  : U horn                  | 237  : í : i'  : i acute            |
| 192  : À : A`  : A grave                 | 238  : ĩ : i~  : i tilde            |
| 193  : Á : A'  : A acute                 | 239  : ỉ : i?  : i hook-above       |
| 194  : Â : A^  : A circumflex            | 240  : đ : dd  : d bar              |
| 195  : Ã : A~  : A tilde                 | 241  : ự : u+. : u horn dot-below   |
| 196  : Ả : A?  : A hook-above            | 242  : ò : o`  : o grave            |
| 197  : Ă : A(  : A breve                 | 243  : ó : o'  : o acute            |
| 198  : ẳ : a(? : a breve hook-above      | 244  : ô : o^  : o circumflex       |
| 199  : ẵ : a(~ : a breve tilde           | 245  : õ : o~  : o tilde            |
| 200  : È : E`  : E grave                 | 246  : ỏ : o?  : o hook-above       |
| 201  : É : E'  : E acute                 | 247  : ọ : o.  : o dot-below        |
| 202  : Ê : E^  : E circumflex            | 248  : ụ : u.  : u dot-below        |
| 203  : Ẻ : E?  : E hook-above            | 249  : ù : u`  : u grave            |
| 204  : Ì : I`  : I grave                 | 250  : ú : u'  : u acute            |
| 205  : Í : I'  : I acute                 | 251  : ũ : u~  : u tilde            |
| 206  : Ĩ : I~  : I tilde                 | 252  : ủ : u?  : u hook-above       |
| 207  : ỳ : y`  : y grave                 | 253  : ý : y'  : y acute            |
| 208* : Đ : DD  : D bar                   | 254  : ợ : o+. : o horn dot-below   |
| 209  : ứ : u+' : u horn acute            | 255  : Ữ : U+~ : U horn tilde       |
+==========================================+=====================================+
* VIQR also allows "Đ" to be represented by "Dd" or "dD".
A UNIFIED FRAMEWORK FOR VIETNAMESE INFORMATION PROCESSING
The Vietnamese Standardization Working Group (Viet-Std) {1}
September 1992 {2}
Abstract
Many Vietnamese-capable software applications have appeared in response to
the growing demand for computer processing of Vietnamese data. The need to
integrate Vietnamese into existing computing environments, and to exchange
data among those environments, shows that a common standard is required.
This paper presents the important practical technical considerations that
such a standard must address, and reviews a number of existing conventions
and proposals in these areas. It also presents the complete Viet-Std
proposal, which comprises: 1) an 8-bit code table for all precomposed
Vietnamese characters (the Vietnamese Standard Code for Information
Interchange, or VISCII); 2) a 7-bit quoted-readable standard (Vietnamese
Quoted-Readable, or VIQR) for exchanging data over 7-bit channels, which
interfaces smoothly with the 8-bit character code above; and 3) a keyboard
interface specification that lets users work easily with both 1 and 2.
Together these form a unified framework for Vietnamese processing
environments that is simple, efficient, and easy to integrate. The framework
has been implemented successfully in practice through conforming
applications produced by several groups and individuals on a number of
different platforms, including the Unix operating system and its variants,
the X Window System, MS-DOS, and Windows, and through work in progress
elsewhere.
1. INTRODUCTION
~~~~~~~~~~~~~~~
With the growing number of Vietnamese living overseas and the spread of
microcomputers in Vietnam, the use of Vietnamese in information processing
has grown rapidly. Demand for Vietnamese software has grown with it, and
many companies have been founded and have prospered in the United States and
elsewhere, most of them specializing in Vietnamese word processing. In
addition, many organizations and individuals have worked to provide
high-quality, free public applications to the Vietnamese community. In
Vietnam, centers such as the Institute of Informatics [1] have reported
encouraging progress on many fronts, including the Vietnamization of popular
software packages.
------------------------------------------------------------------------------
{1} Address: Viet-Std, 1212 Somerset Dr., San Jose, California 95132, USA.
E-mail address: Viet-Std@Haydn.Stanford.EDU
{2} Revised December 3, 1992. This edition 1.1 supersedes edition 1.0,
published in January 1992. The principal difference between the two editions
is the exchange of the positions of the two characters ạ and Õ in the 8-bit
code table.
All of this makes two important points clear: 1) the demand for software
that works with Vietnamese keeps growing, and 2) there is no shortage of
talent to meet that demand. Unfortunately, we face a very large obstacle:
almost all current Vietnamese applications work only within a single
framework or environment defined by their vendor, and applications from
different vendors are incompatible with one another. The body of Vietnamese
applications will never keep pace with market demand as long as this trend
persists. Users want to use Vietnamese in many areas beyond word processing,
and it is unrealistic to expect one company to supply every application in
every area on every platform. Moreover, the developers writing these
applications are limited to the Vietnamese software tools that they must
learn and develop themselves. Standardization is therefore a necessity.
Anyone who has run into the incompatibility between ASCII and EBCDIC can
imagine an environment in which each machine uses a different character
code, and will recognize that the number of applications for such an
environment is very limited and data exchange very cumbersome. A unified
framework will greatly benefit both users and software developers.
Any draft Vietnamese standard must consider several key points within its
proper scope. The first and most important is integration. Because this
paper concentrates on current 7-bit and 8-bit environments, the primary goal
must be the direct and easy integration of Vietnamese into existing systems.
The standard must be usable immediately. This implies the use of precomposed
Vietnamese characters rather than floating diacritics accompanying the
vowels, because apart from specialized software no general-purpose
application can handle floating diacritics. The proposed standard must be
designed carefully to take maximum advantage of existing software. The
familiar rule "Don't reinvent the wheel" is not merely an advantage but a
requirement if we want to build a set of essential basic applications within
a reasonable time. Furthermore, in terms of both processing time and storage
space, it is well known that processing precomposed codes is more efficient
than processing codes with floating diacritics [2]. The use of floating
diacritics should therefore be limited to cases where it is unavoidable,
such as keyboard entry or 7-bit data transmission. Beyond those cases there
is no reason to force every application to contend with the complexity and
inefficiency of floating diacritics when processing, storing, transmitting,
displaying, and printing 8-bit data.
The second important point is to take account of the precedents already
established in existing Vietnamese software. Any standardization takes time
to adopt; if the standard demands too many changes it will meet resistance
from users and will only delay its own adoption. Data standards 16 bits wide
or wider, such as Unicode [3] and ISO 10646 [4], are gradually emerging.
While waiting for these wide standards to become common, we need an 8-bit
Vietnamese standard, and it must be accepted quickly so that it does not
become obsolete. An 8-bit standard that hopes for rapid acceptance cannot
ignore the precedents in the existing software base.
The third important point is that the user interface must be addressed; if
the standard does not define it, it must at least consider its effect on
users. This mostly concerns 7-bit keyboards and the representation of
Vietnamese with 7-bit characters. In both cases the diacritics must be
floating and must be represented by 7-bit characters whose shapes resemble
the real marks. For Vietnamese typing, we should preserve where possible the
typing habits already established on the Viet-Net forum (a Vietnamese
electronic mail network) and the Soc.Culture.Vietnamese forum (the
Vietnamese newsgroup on Usenet), whose members are spread around the world.
For the 7-bit representation of Vietnamese, the important thing is that it
be "readable." The aim is to shorten learning time and promote a uniform
interface so that users need not spend time learning a different method for
each software package.
Finally, the standard must make every effort to conform to the framework of
international standards such as ISO 8859/x [5], to ensure compatibility with
existing environments. For example, this requires that the Vietnamese
standard preserve the US ASCII code table and keep the positions of all
Vietnamese characters already present in 8859/Latin-1, so that an
8859/Latin-1 keyboard works normally for those characters. In practice,
however, some requirements have become outdated. For example, the
Unicode/ISO 10646 committee recently decided to lift the prohibition on
using the control-character regions (codes xx00h through xx1Fh, except for
C0), on the grounds that it merely wasted code space. As we shall see below,
assigning Vietnamese characters to these slots has both advantages and
drawbacks. Any such trade-off must be justified by sound reasons and should
lean toward practice rather than theory.
   The main requirements for a standard, discussed above, are summarized as
follows:
   R1. Integrate easily and directly into existing computer systems.
   R2. Let existing software adapt easily to the new standard.
   R3. The encoding method and the interface must be easy to remember and
       easy to use.
   R4. Conform to international standards.
   R5. Trade-offs must be weighed on practical grounds and be well justified.
   In the following sections we review the strengths and weaknesses of
existing Vietnamese encoding methods. Section 3 describes the Viet-Std 8-bit
Vietnamese code table in detail. Section 4 presents a "quoted-readable"
encoding for 7-bit data streams, including electronic mail and keyboarding.
Finally, Section 5 outlines rules and conventions adapted to a few specific
applications.
2. REVIEW OF CURRENT CONVENTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A review of the conventions used by current software developers reveals one
striking feature: almost everyone recognizes the advantage of encoding each
letter as a whole (precomposed) and takes it as a prerequisite. This choice,
however, runs into a familiar difficulty: beyond the letters already available
in ASCII, Vietnamese needs 134 more. Of these, 128 can be placed in the C1
and G1 regions. The six remaining Vietnamese letters can be dealt with in
several ways, some of which place them in the C0 or G0 regions:
   A1. Put them in place of the 6 G0 characters that are relatively "least
       used" in Vietnamese text processing.
   A2. Put them in place of 6 characters of the National Replacement
       Character (NRC) set {3}.
   A3. Drop the 6 relatively "least used" {4} Vietnamese letters altogether,
       such as Ẳ, Ẵ, Ẫ, Ỷ, Ỹ, and Ỵ.
   A4. Replace letters based on "y" with "i"; for example, "kỹ sư" becomes
       "kĩ sư."
   A5. Put these 6 letters in the ASCII C0 control region.
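
The figure of 134 extra letters can be checked with a short calculation. The
sketch below is illustrative only: it assumes the usual inventory of 12 base
vowels, each taking no tone mark or one of five tone marks, in both cases,
plus đ and Đ, and it counts the letters that fall outside ASCII. The names
BASE_VOWELS, TONES, and extra_letters are hypothetical helpers, not part of
the specification.

```python
import unicodedata

# Base vowels written as ASCII letter + combining modifier (assumed inventory):
# a, a-breve, a-circumflex, e, e-circumflex, i, o, o-circumflex, o-horn,
# u, u-horn, y.
BASE_VOWELS = ["a", "a\u0306", "a\u0302", "e", "e\u0302", "i",
               "o", "o\u0302", "o\u031b", "u", "u\u031b", "y"]
# No tone, acute, grave, hook above, tilde, dot below.
TONES = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]


def extra_letters():
    """Return the set of Vietnamese letters that are not plain ASCII."""
    letters = {"\u0111", "\u0110"}  # d-stroke, lowercase and capital
    for vowel in BASE_VOWELS:
        for tone in TONES:
            lower = unicodedata.normalize("NFC", vowel + tone)
            letters.update((lower, lower.upper()))
    return {ch for ch in letters if ord(ch) > 0x7F}


print(len(extra_letters()))  # 134
```

Of these 134 letters, 128 fit in the C1 and G1 regions, which leaves the six
that proposals A1 through A5 try to place.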
   Solutions A1 and A2 meet the typical needs of word-processing
environments, where the rarely used ASCII characters can be avoided or can be
reached by font shifting. Both, however, destroy the prospect of integrating
Vietnamese into existing ASCII environments, in which every G0 character
must be preserved. Each G0 character has a single role and cannot be reused
for another. The first reason is that the G0 character would be rendered
wrongly, since the glyph produced would be a Vietnamese letter. Because G0
characters are used so frequently, a conflict between Vietnamese letters and
G0 characters in an integrated environment is unacceptable. The second reason
is that, although font shifting can ease this conflict in some cases, it leads
to more serious trouble. Software environments usually attach a specific
meaning to each G0 character, and to the NRC characters in particular.
Consider replacing the backslash "\" with a Vietnamese letter (say, ổ) in a
Unix environment. The backslash is used in many Unix escape mechanisms, so
the letter "ổ" could not be used normally and would itself have to be
"escaped" in some special way. This is no minor inconvenience: data
interchange would suffer badly, because other systems would not understand the
special escape mechanism and the data would not be preserved. Any standard
that adopts this solution fails in its basic function, which is to provide
uniformity across different systems. The backslash is only one example;
other G0 characters raise similar difficulties.
------------------------------------------------------------------------------
   {3} The NRC set consists of 12 ASCII characters in the G0 region: #, $, @,
[, \, ], ^, `, {, |, }, and ~. These characters may be replaced by other
characters according to each country's needs.
   {4} Least used because these letters (a) seldom begin a word and therefore
seldom need to be capitalized, and (b) seldom occur in Vietnamese words.
   Proposals A3 and A4 restrict Vietnamese data in one way or another. Almost
everyone agrees that eliminating some Vietnamese letters is unacceptable;
indeed, the discussion above took the preservation of all Vietnamese letters
for granted. It should be added, however, that proposal A4 is not without
merit. One school of thought holds that in words containing "y" (with a tone
mark, if any) as the only vowel, "y" can be replaced by "i," because the two
are pronounced the same. This idea dates back to 1948 [6,7]. It is not the
job of an encoding standard, however, to settle linguistic questions, so
adopting proposal A4 would be a poor choice.
   The first objection to proposal A5 comes mainly from data communications,
because data communication channels use many C0 codes for data control.
The proposal also creates difficulties when integrating Vietnamese into
environments where some C0 codes are used in the keyboard interface and for
data format control, much like the difficulties met in proposals A1 and A2.
Nevertheless, as the following sections will show, a careful choice of 6 C0
codes has in practice worked well while still avoiding the codes that matter
for data communications. Moreover, most communication channels allow
transparent, faithful transfer of 8-bit data, and there is no reason to fear
that any particular code cannot be carried over them.
   In the particular cases where C0 codes are used in the keyboard interface,
a judicious choice of codes together with remapping of keys will minimize the
conflicts. Codes used for data format control vary from one application to
another, but they are usually scattered across the C0 and C1 regions. This is
therefore a common difficulty for integration, since all the C1 codes are
needed for Vietnamese letters. Here again, conflicts can be minimized by
studying the important applications. Finally, we can choose the 6 least used
Vietnamese letters so as to reduce the probability of conflict as far as
possible.
   Note that the discussion above examined each proposal in turn to see
whether it meets the conditions, set out in Section 1, for integrating
Vietnamese into existing applications and systems. The importance of this
integration goal cannot be overemphasized. It has raised many difficulties
and leads us to accept the following Principle of Pragmatism: no standard can
work perfectly with every existing application, so practical considerations
must guide us toward a standard that is usable in as many important
applications and on as many systems as possible. We stress the word "usable."
3. VISCII: 8-BIT ENCODING SPECIFICATION FOR VIETNAMESE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3.1 MOTIVATION
~~~~~~~~~~~~~~
Concrete evidence shows that solution A5 described above, placing 6
Vietnamese letters in the C0 region, is the one most likely to satisfy the
conditions set out in Section 1. A careful choice of the 6 C0 codes and of
the 6 least used Vietnamese letters greatly reduces the probability of
conflict in practice. The data communication concerns are addressed by
avoiding the C0 codes commonly used for data format control. In fact,
concern about data communications should be directed at the characters in the
C1 and G1 regions; a prominent example is mail transfer through 7-bit gateways
and mail transfer agents. Transfer failures there are usually caused by the
use of the eighth bit, not by C0 codes. In any case, there are ways to carry
the data in some "8-bit" mode, or to use the "quoted-readable Vietnamese" form
described in Section 4.
   The main advantage of this approach is that it integrates readily and
easily into existing environments, without the obstacles faced by the other
solutions (assuming they can be integrated at all). The fact that this
document was prepared with TeX running on Unix is strong evidence of its
success. The text was written in an 8-bit X window using a slightly
modified {5} version of Elvis. (Elvis is a publicly available 8-bit editor
that works like the Unix editor vi.) Both TeX (a document preparation system)
and Dvi2ps (a program that produces PostScript output) accept and process
8-bit Vietnamese data easily and transparently. Other applications, including
spreadsheets, text viewers, PostScript and dot-matrix printing, WordPerfect,
Word, and PC Tools under DOS, have been tested and work well with Vietnamese
documents. The modifications, if any, were mainly needed to make these
applications accept 8-bit data. A Vietnamese educational application has been
written in the C programming language with 8-bit Vietnamese strings embedded
in the source code. With the current trend toward software
internationalization, applications and tools are being modified to accept
8-bit data, which makes integrating this Vietnamese code set easier still.
3.2 ENCODING RATIONALE
~~~~~~~~~~~~~~~~~~~~~~
A basic requirement is to preserve the 7-bit ASCII graphic characters (G0),
since the goal is integration with existing environments. The G0 region is
therefore left intact. For the 6 C0 codes, we first surveyed the C0 region
and its typical uses, which are listed in Table 1. The codes chosen, STX (2),
ENQ (5), ACK (6), DC4 (20), EM (25), and RS (30), are those that interfere
least with data communications and with the important applications examined.
The use of ACK, for instance, really depends on its context. In the protocols
we reviewed, ACK is treated as a "control" code only when it lies outside a
data frame; inside a data frame it is treated as an ordinary code. To reduce
the probability of conflict further, the 6 relatively least used Vietnamese
capital letters, Ẳ, Ẵ, Ẫ, Ỷ, Ỹ, and Ỵ, are placed at these 6 C0 positions.
------------------------------------------------------------------------------
   {5} Modified to allow Vietnamese keyboarding, as described in later
sections.

      Table 1: Possible conflicts when using C0.  The codes used in the
      8-bit VISCII standard are marked with "+".
=====================================================================
CODE NAME CTRL- GENERAL PRINTER(PC) PC Unix vi (Unix)
---------------------------------------------------------------------
  0  NUL  @  C string strings
  1  SOH  A
 +2  STX  B  back screen
  3  ETX  C  INTR INTR INTR INTR
  4  EOT  D  EOF EOF back tab
 +5  ENQ  E
 +6  ACK  F  forw.screen
  7  BEL  G  BEL BEL BEL BEL
  8  BS   H  BS BS BS BS BS
  9  HT   I  HT HT HT HT HT
 10  LF   J  LF LF LF LF LF
 11  VT   K  VT
 12  FF   L  FF FF FF redraw
 13  CR   M  CR CR CR CR CR
 14  SO   N  wide on(IBM)
 15  SI   O  comp.on(IBM)
 16  DLE  P  Prt.on/off
 17  DC1  Q  XOFF XOFF XOFF XOFF
 18  DC2  R  comp.off(IBM) retype
 19  DC3  S  XON XON XON XON
+20  DC4  T  wide off(IBM) forw.tab
 21  NAK  U  clr buf(IBM) kill kill
 22  SYN  V  literal literal
 23  ETB  W  werase werase
 24  CAN  X  kill
+25  EM   Y  suspend
 26  SUB  Z  EOF suspend
 27  ESC  [  ESC ESC seq. ESC ESC ESC
 28  FS   \  quit
 29  GS   ]  Telnet ESC
+30  RS   ^
 31  US   _  Windows
=====================================================================

      Table 2: Vietnamese letters already present in 8859/Latin-1.
     +===============================================================+
     | 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+----+---------------------------------------------------------------+
| Cx | À : Á : Â : Ã :   :   :   :   : È : É : Ê :   : Ì : Í :   :   |
| Dx | Đ :   : Ò : Ó : Ô : Õ :   :   :   : Ù : Ú :   :   : Ý :   :   |
| Ex | à : á : â : ã :   :   :   :   : è : é : ê :   : ì : í :   :   |
| Fx | đ :   : ò : ó : ô : õ :   :   :   : ù : ú :   :   : ý :   :   |
+====================================================================+
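
As an illustration of how a tool might watch for the conflicts discussed
above, the following minimal sketch scans a byte buffer for the six C0 codes
that the text says are reassigned (STX, ENQ, ACK, DC4, EM, RS). The names
VISCII_C0 and reassigned_c0 are hypothetical; the specification itself does
not define such a check.

```python
# The six C0 codes named in the text, keyed by code value.
VISCII_C0 = {0x02: "STX", 0x05: "ENQ", 0x06: "ACK",
             0x14: "DC4", 0x19: "EM", 0x1E: "RS"}


def reassigned_c0(data: bytes):
    """Return (offset, control name) for each byte at a reassigned C0 code."""
    return [(i, VISCII_C0[b]) for i, b in enumerate(data) if b in VISCII_C0]


# A buffer containing ACK and RS, which VISCII would read as capital letters.
print(reassigned_c0(b"ab\x06c\x1e"))
```

A program that treats one of these codes as a control function could use a
check of this kind to warn before handling 8-bit Vietnamese data.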
   The next problem is to encode the remaining 128 Vietnamese letters in the
extended-ASCII region (C1 and G1). Since no single international standard
governs this region, the best policy is to be as cautious as possible, so that
even in the worst case users can still use all the lowercase letters.
   Encoding characters in the C1 region is less troublesome, although a few C1
characters carry special meanings in some applications. A survey of current
efforts to standardize 8-bit mail transfer shows that C1 characters are
treated as graphic characters and given no special meaning. The cautious
course, however, is to place only capital letters in the C1 region.
   For the G1 region, we aimed to accommodate the code set used on PCs (code
page 850) and to follow, where possible, the 8859/Latin-1 character set, which
already contains some Vietnamese letters.
   Experience in designing a code set for MS-DOS led us to consider the
line-drawing characters of the PC code set. If some applications are to use
both Vietnamese letters and line-drawing characters without font shifting, the
most we can preserve is the lowercase letters together with the single- and
double-line drawing characters. This means that capital letters must occupy
the positions of the single- and double-line drawing characters. MS-DOS users
can then be given either a code set with all the Vietnamese letters, or one
in which some capitals are replaced by the single- and double-line drawing
characters. For existing applications, users can choose whichever code set
best suits their purpose. When the line-drawing code set must be used, the
loss is limited, because the missing letters are relatively rarely used
capitals. New applications can use the "code set switching" method if
desired.
   Compatibility with 8859/Latin-1 is only a secondary factor, a convenience
for users rather than a requirement. A concrete example: users in France
expect to press the same keys to produce letters that Vietnamese and French
share, such as é. That is natural and reasonable. The choice of
compatibility with 8859/Latin-1 stems from the popularity of 8859/Latin-1
keyboards and fonts, as found in Digital's VT terminal series, the Xterm
keyboard mapping, and Microsoft Windows. Table 2 lists the 8859/Latin-1
letters in the G1 region that coincide with Vietnamese letters {6}. It
follows that any 8859/Latin-1 text consisting mostly of ASCII and the letters
in Table 2, such as French text, remains largely readable in a Vietnamese
environment.
------------------------------------------------------------------------------
   {6} Note that the đ in Table 2 is actually the Icelandic "edh" in
8859/Latin-1; the Vietnamese form of đ corresponds more closely to the one in
8859/Latin-2.

      Table 3: VISCII -- Draft 8-bit standard for Vietnamese.
+======================================================================+
|    ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+======================================================================+
| 0x || NUL:SOH: Ẳ :ETX:EOT: Ẵ : Ẫ :BEL:BS :HT :LF :VT :FF :CR :SO :SI |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 1x || DLE:DC1:DC2:DC3: Ỷ :NAK:SYN:ETB:CAN: Ỹ :SUB:ESC:FS :GS : Ỵ :US |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 2x || SP : ! : " : # : $ : % : & : ' : ( : ) : * : + : , : - : . : / |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 3x ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : : : ; : < : = : > : ? |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 4x ||  @ : A : B : C : D : E : F : G : H : I : J : K : L : M : N : O |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 5x ||  P : Q : R : S : T : U : V : W : X : Y : Z : [ : \ : ] : ^ : _ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 6x ||  ` : a : b : c : d : e : f : g : h : i : j : k : l : m : n : o |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 7x ||  p : q : r : s : t : u : v : w : x : y : z : { : | : } : ~ :DEL|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 8x ||  Ạ : Ắ : Ằ : Ặ : Ấ : Ầ : Ẩ : Ậ : Ẽ : Ẹ : Ế : Ề : Ể : Ễ : Ệ : Ố |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 9x ||  Ồ : Ổ : Ỗ : Ộ : Ợ : Ớ : Ờ : Ở : Ị : Ỏ : Ọ : Ỉ : Ủ : Ũ : Ụ : Ỳ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ax ||  Õ : ắ : ằ : ặ : ấ : ầ : ẩ : ậ : ẽ : ẹ : ế : ề : ể : ễ : ệ : ố |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Bx ||  ồ : ổ : ỗ : Ỡ : Ơ : ộ : ờ : ở : ị : Ự : Ứ : Ừ : Ử : ơ : ớ : Ư |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Cx ||  À : Á : Â : Ã : Ả : Ă : ẳ : ẵ : È : É : Ê : Ẻ : Ì : Í : Ĩ : ỳ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Dx ||  Đ : ứ : Ò : Ó : Ô : ạ : ỷ : ừ : ử : Ù : Ú : ỹ : ỵ : Ý : ỡ : ư |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ex ||  à : á : â : ã : ả : ă : ữ : ẫ : è : é : ê : ẻ : ì : í : ĩ : ỉ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Fx ||  đ : ự : ò : ó : ô : õ : ỏ : ọ : ụ : ù : ú : ũ : ủ : ý : ợ : Ữ |
+======================================================================+
   Finally, some applications cannot render certain G1 codes, such as code
160 (the non-breaking space in 8859/Latin-1), 202 (the non-breaking space on
the Macintosh), or 255. The list of codes that may fail to render can be
long: nearly 30 codes in MS Windows 3.0 and about 25 in MS Windows 3.1.
Following the cautious policy stated above, we place capital letters at these
positions. In applications that allow font switching, the display of capitals
can be handled by supplying a pair of fonts for each typeface: a normal font
and a capital font. In the capital font, every letter that would normally be
lowercase is replaced by the corresponding capital. In practice, when a
capital letter (say Õ) cannot be displayed, the user simply switches to the
corresponding capital font and types the lowercase letter (õ).
   With the guidelines above established, all that remains is to arrange
the remaining Vietnamese letters in some order, which may even be arbitrary.
This was done so as to give the table a reasonably balanced, pleasing layout.
As a result, all the guidelines above are met, with one exception: the
8859/Latin-1 position of Õ could not be kept. Note that there is no way to
preserve the alphabetical order of the Vietnamese letters, but this is not a
serious problem, since the ordering of non-ASCII letters can be handled by
table lookup.
   The draft VISCII 8-bit Vietnamese code set (Table 3) was built on these
principles. We intend it to be a single code table for every use of
Vietnamese data: storage, processing, transfer, and font encoding. This
greatly simplifies integration, implementation, and use, and is truly one of
the strong points of this draft.
4. VIQR: MNEMONIC ENCODING SPECIFICATION FOR VIETNAMESE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4.1 MOTIVATION
~~~~~~~~~~~~~~
While the 8-bit specification seeks to standardize Vietnamese in 8-bit
environments, many problems remain to be solved in 7-bit environments, such
as electronic mail and other 7-bit communication lines; the interfaces for
producing Vietnamese letters also need to be standardized.
   The difficulty of carrying more than 128 distinct codes over a 7-bit
channel is not peculiar to Vietnamese. Ever since the Simple Mail Transfer
Protocol (SMTP, [8]) was proposed in 1982, there have been many efforts to
extend it to carry data of 8 bits or more, for the Latin scripts of Europe and
the ideographic scripts of the East (see [9], for example). Although 8-bit
transport is clearly needed, mail gateways will not change overnight. For the
foreseeable future we still need to send Vietnamese mail transparently over
7-bit channels.
   There is in fact an ad hoc standard in use on Viet-Net and in the
Soc.Culture.Vietnamese newsgroup on Usenet. It places suitable mnemonic
characters after a vowel to represent its diacritics (for example, ^ stands
for the circumflex); thus "Việt Nam" is written "Vie^.t Nam." This convention
is ambiguous, however, because some of the characters that stand for
diacritics also serve as punctuation; for example, "tha?" can be read either
as tha? or as thả.
   The Viet-Net convention is similar to the "quoted-readable" concept
proposed by K. Simonsen [10,11], which resolves ambiguities such as the one
above by adding state to the text at both the character level and the
character-set level. Unfortunately, in trying to provide a solution for the
whole world, that proposal does not handle Vietnamese satisfactorily. First,
it restricts mnemonics to the 83 invariant graphic characters of ISO-646
[12]. This is fine in principle, but it makes Vietnamese hard to read. For
example, the "hỏi" and "ngã" tone marks are assigned 2 and ? respectively, to
avoid the ~, which is not an invariant character. The prevalence of ASCII
keyboards among the majority of Vietnamese users makes this restriction
unreasonable. We stress that we are arguing for "easy to read for most
users" over "hard to read for all users." Moreover, with the progress of
internationalization in keyboards and displays, for example in graphical
window environments, redefining keys and changing fonts is easy, which makes
the restriction increasingly obsolete.
   A greater difficulty with that proposal is that its two-character
(fixed-length) encoding {7} makes Vietnamese hard to read, especially for
letters with two diacritics (for example, ấ). Its multi-character
(variable-length) encoding {8} is also hard to read and inefficient, cluttered
as it is with opening and closing delimiters in a language that uses many
diacritics. Although a computer can easily read any "mnemonic" scheme, the
scheme given to users must be one they can read and write easily with 7-bit
editors. Vietnamese users do not want to learn or remember sequences like
"a5" for ắ, or to type long sequences like "&_a('_" for a single Vietnamese
letter throughout a long document.
   To meet the need for readability and flexibility, a different convention
is required. A better approach is to use code-page switching in the manner of
ISO-2022 [13] to put the text into a Vietnamese state, and to optimize the
encoding for each language state. Recently van der Poel proposed a mnemonic
method [14] that emphasizes language-specific conventions. The proposal
provides a means of declaring the language state, with each language defining
its own most effective encoding. Its main advantage is that applications
following it need not render every character set declared in the stream; they
may instead report unsupported languages, as in "Greek text here that cannot
be rendered" (see [14] for the precise details of the specification). This
lets each community use the method that best encodes its own language. The
VIQR convention is in line with this approach and can easily be incorporated
into that framework.
   The rules laid down here apply to every data stream, including text
transfer, file I/O, and keyboarding. This principle is a main reason for the
success of the Unix operating system, in which programmers need not be
concerned with the particulars of peripheral devices, only with a uniform
interface through which the system's library software can be shared. What is
needed, therefore, is a common basis on which to build data interpreters for
any data stream, whatever its origin. In practice this has made it much
easier to develop software that handles Vietnamese.
------------------------------------------------------------------------------
   {7} The form is "&xy", where x is the main character and y is the
secondary character to be combined with x.
   {8} The form is "&_xxxx_", where xxxx can be any character sequence.
   Users, in addition, benefit greatly from a standardized way of typing.
They need not learn a different typing method for each program. If all
software supports a common standard, users familiar with it can type
Vietnamese right away without relearning. The standard in this document
specifies the minimum features that supporting software must have; other
keyboarding techniques can of course be combined with it into a more general
specification. This is discussed further in Section 5.2 on Vietnamese
keyboarding.
4.2 QUOTED-READABLE SPECIFICATION (VIQR)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This specification adopts the "readable" style of Viet-Net in full. The
Vietnamese "quoted-readable" specification, VIQR, has three states: Literal,
English, and Vietnamese. The Literal state is meant for transferring data
as is, unchanged (except for the escape sequences that enter and leave the
Literal state). The English and Vietnamese states are meant mainly for data
streams that mix English and Vietnamese, each state being optimized for
appearance and size according to whether the text is mostly English or mostly
Vietnamese. Each state has its own mechanism for writing Vietnamese letters,
using one or two diacritics after the base vowel.
   We first introduce the notions of implicit composition and explicit
composition.
4.2.1 Implicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~
Implicit composition is typically used, and works well, for data that is
mostly Vietnamese. Under implicit composition, whenever one or two
diacritics follow a base vowel, they combine with that vowel into a single
Vietnamese letter, in accordance with the grammar rules. Examples:
         a^       --->   â
         o+?      --->   ở
         ơ?       --->   ở
         Vie^.t   --->   Việt
         Viê.t    --->   Việt
         la'^n    --->   lá^n   (not lấn)
         lá^n     --->   lá^n   (not lấn)
   In the last two examples, the sequence a^' is not grammatically equivalent
to a'^ or á^. In general, the three modifier characters (, ^, and + must
immediately follow the appropriate vowel in order to combine.
   The special sequence dd composes to đ; DD, dD, and Dd all stand for Đ.
   The base vowels are a, ă, â, e, ê, i, o, ô, ơ, u, ư, y, and the
corresponding capitals. The codes for these letters are listed in Table 3,
the VISCII draft 8-bit standard.
   The Vietnamese diacritics are represented by ASCII characters of similar
shape. Table 4 lists the ASCII mnemonics used in place of the Vietnamese
diacritics. Appendices A and B list all Vietnamese letters and their VIQR
sequences, in alphabetical order and in code order respectively.

      Table 4. ASCII mnemonics used in place of Vietnamese diacritics
   ==================================================================
   Diacritic          Char   ASCII code                 Example
   ------------------------------------------------------------------
   trăng (breve)       (     0x28, left parenthesis     ba(n khoa(n
   mũ (circumflex)     ^     0x5E, circumflex           ho^m nay
   móc (horn)          +     0x2B, plus sign            Qui Nho+n
   sắc (acute)         '     0x27, apostrophe           La'i Thie^u
   huyền (grave)       `     0x60, backquote            Bi`nh Du+o+ng
   hỏi (hook above)    ?     0x3F, question mark        Thu? DDu+'c
   ngã (tilde)         ~     0x7E, tilde                di~ va~ng
   nặng (dot below)    .     0x2E, period               ho.c ta^.p
   ==================================================================
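
The mnemonics of Table 4 can be applied mechanically in the encoding
direction. The following is a minimal sketch, not part of the specification:
it assumes the input is Unicode text, decomposes each letter with the standard
unicodedata module, and writes the modifier (breve, circumflex, horn) before
the tone mark. The function name to_viqr is hypothetical, and the sketch does
not escape punctuation that could be mistaken for a diacritic.

```python
import unicodedata

# Combining marks mapped to the ASCII mnemonics of Table 4.
MODIFIERS = {"\u0306": "(", "\u0302": "^", "\u031b": "+"}
TONES = {"\u0301": "'", "\u0300": "`", "\u0309": "?",
         "\u0303": "~", "\u0323": "."}


def to_viqr(text: str) -> str:
    """Rewrite Vietnamese letters as base letter + VIQR mnemonics."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in "\u0111\u0110":                 # d-stroke has no decomposition
            out.append("dd" if ch == "\u0111" else "DD")
            continue
        base, *marks = unicodedata.normalize("NFD", ch)
        if marks and all(m in MODIFIERS or m in TONES for m in marks):
            out.append(base)
            # Modifier first, then tone, whatever order NFD produced.
            out.extend(MODIFIERS[m] for m in marks if m in MODIFIERS)
            out.extend(TONES[m] for m in marks if m in TONES)
        else:
            out.append(ch)                       # not a Vietnamese letter
    return "".join(out)


print(to_viqr("Vi\u1ec7t Nam"))  # Vie^.t Nam
```

Sorting the marks is needed because canonical decomposition may list the tone
(for example, the dot below) before the modifier.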
4.2.2 Explicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit composition uses a leading character to announce composition
explicitly. The announcing character is the backslash ("\", ASCII code 0x5C),
referred to from here on as <COM> {*}. The characters that follow it are
composed under the same grammar rules as in implicit composition. The
examples of the preceding section thus appear as follows under explicit
composition:
         \a^       --->   â
         \o+?      --->   ở
         Vi\e^.t   --->   Việt
   Explicit composition is convenient in data streams that are mostly
English, and it also suits the real-time keyboarding discussed in Section 5.2.
   We now examine how the two composition methods are used in the three
states. The state of a data stream is selected by the two-character sequence
<COM>x, where x is defined below.
4.2.3 Literal State
~~~~~~~~~~~~~~~~~~~
An occurrence of <COM>L or <COM>l in the data stream starts the Literal
state. Its purpose is to carry data through almost entirely unchanged.
Neither implicit nor explicit composition applies here, and even <COM> has no
special meaning unless it is followed by one of the six letters l, L, v, V,
m, or M, in which case it starts one of the three states {9}.
------------------------------------------------------------------------------
   {*} The notation <...> emphasizes that the whole sequence <...> stands for
a single byte in memory or storage, that is, one and only one character. For
example, the word HÁT can also be written <H><Á><T>.
4.2.4 English State
~~~~~~~~~~~~~~~~~~~
The English state is entered with <COM>M or <COM>m. In the English state only
explicit composition is supported; that is, producing a Vietnamese letter
requires the <COM> character. A character sequence not introduced by <COM>
is never composed. Examples:
         \mD\u~ng, how are you?   --->   Dũng, how are you?
         \mKho\e? kh\o^ng?        --->   Khoẻ không?
   As noted, "you?" is not turned into "yoủ", because no <COM> precedes the
letter u.
4.2.5 Vietnamese State
~~~~~~~~~~~~~~~~~~~~~~
The sequence <COM>V or <COM>v switches the data stream to the Vietnamese
state. In this state both implicit and explicit composition may be used. The
following examples assume that the stream starts in the English state:
         \vChu+~ Vie^.t       --->   Chữ Việt
         \vCh\u+~ Vi\e^.t     --->   Chữ Việt
         Chu+~ \vVie^.t       --->   Chu+~ Việt
   The Vietnamese state uses implicit composition to keep the text compact,
free of the clutter of <COM> characters that explicit composition requires.
It also permits explicit composition, for compatibility with the English state
and to avoid having to define another meaning for sequences that begin with
<COM>. Explicit composition likewise keeps the state compatible with
real-time keyboarding.
4.2.6 Literals in English and Vietnamese States
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hπy xΘt thφ d° sau:
\vDu~ng, how are you? ---> D√ng, how are yoⁿ
Trong thφ d° nαy, chu▓i "you?" tr╖ thαnh "yoⁿ" v∞ d≥ng dµ ki«n ≡ang ╖
tr╒ng thßi Vi«t ngµ. V∞ th¬ ta thñy ≡⌠i khi cÑn t╒m ng▀ng vi«c k¬t h■p chµ mα
kh⌠ng phΣi chuy¼n tr╒ng thßi. Tφnh chñt "nguyΩn d╒ng" cⁿa mτu t± <COM> tr╖
------------------------------------------------------------------------------
{9} ╨¼ c≤ ≡▀■c chφnh chµ <COM>L, <COM>M, hoúc <COM>V ta cÑn phΣi chuy¼n
sang tr╒ng thßi Vi«t hoúc Anh vα d∙ng ≡úc ≡i¼m "nguyΩn d╒ng" quy ≡╕nh trong
nhµng tr╒ng thßi ≡≤. Xem 4.2.6
46
thαnh ti«n d°ng ╖ ≡Γy. Trong cΣ hai tr╒ng thßi Vi«t ngµ vα Anh ngµ, m▓i khi
k² t± <COM> ≡▀■c theo sau b╖i m╡t k² t± kh⌠ng th¼ k¬t h■p c, k¬t quΣ s¿ lα
chφnh k² t± c c≥n k² t± gi╛i thi«u <COM> s¿ b╕ lo╒i b÷ kh÷i d≥ng dµ ki«n. ╨¼
c≤ k² t± <COM>, d∙ng <COM><COM>. Hπy xem nhµng thφ d° sau:
\vddi dda^u? ---> ≡i ≡Γⁿ
\vddi dda^u\? ---> ≡i ≡Γu?
\m\ddi v\o^? ---> ≡i v▒
\m\ddi v\o^\? ---> ≡i v⌠?
\\ ---> \
\\V ---> \V
\\M ---> \M
\\L ---> \L
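The state rules of Sections 4.2.4 to 4.2.6 can be sketched as a small decoder. This is a minimal illustration, not a conforming implementation: the function names take and decode are hypothetical, <COM> is assumed to be the backslash as in the examples, the output is Unicode rather than VISCII, and the closure character <CLS> is not handled.

```python
import unicodedata

COM = "\\"
# VIQR mark -> (Unicode combining mark, base vowels that accept it).
MODIFIERS = {"^": ("\u0302", "aeo"), "(": ("\u0306", "a"), "+": ("\u031b", "ou")}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323"}


def take(text, i):
    """Greedily compose one letter starting at text[i]; return (letter, next_i)."""
    ch = text[i]
    if ch in "dD" and i + 1 < len(text) and text[i + 1] in "dD":
        return ("\u0111" if text[i:i + 2] == "dd" else "\u0110"), i + 2
    if ch.lower() not in "aeiouy":
        return ch, i + 1          # non-combining character: passed through
    j, marks = i + 1, ""
    if j < len(text) and text[j] in MODIFIERS and ch.lower() in MODIFIERS[text[j]][1]:
        marks += MODIFIERS[text[j]][0]
        j += 1
    if j < len(text) and text[j] in TONES:
        marks += TONES[text[j]]
        j += 1
    return unicodedata.normalize("NFC", ch + marks), j


def decode(text, state="L"):
    """Decode VIQR text; state is 'L' (literal), 'M' (English) or 'V' (Vietnamese)."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == COM and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in "lLvVmM":          # state switch, recognized in every state
                state, i = nxt.upper(), i + 2
                continue
            if state != "L":             # explicit composition, or literal c
                piece, i = take(text, i + 1)
                out.append(piece)
                continue
        if state == "V":                 # implicit composition
            piece, i = take(text, i)
            out.append(piece)
        else:                            # literal or English state: as is
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, decode(r"\vddi dda^u\?") walks through the Vietnamese state, composes "dd" and "a^" implicitly, and treats "\?" as a literal question mark, which reproduces the second example above.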
4.2.7 Closure
~~~~~~~~~~~~~
The data stream may contain a special character, the closure character,
which terminates any composition in progress. This character is CTRL-A
(ASCII 0x1), henceforth denoted <CLS>. When <CLS> is encountered in the data
stream, every composition in progress is terminated. <CLS> is always
discarded, except when it appears in the literal sequence <COM><CLS>.

The closure character is useful in real-time applications such as
keyboarding, where it signals that a combining sequence has ended and that
the receiver need not wait for further data.
5. SPECIFIC APPLICATIONS
~~~~~~~~~~~~~~~~~~~~~~~~
This section outlines guidelines and conventions for specific applications
already in use among software developers. It is intended as a living
document that collects the experience gained so far as well as experience
still to come. Readers are welcome to join these discussions and to
contribute to the development of these guidelines in particular, and to
standardization in general.
5.1 ELECTRONIC MAIL OVER 7-BIT CHANNELS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Most existing channels used to carry electronic mail are still limited to
7 bits. The 8-bit character set specified in Section 3 cannot be transmitted
unchanged over these channels. VIQR therefore plays an important role: it
can carry Vietnamese text over 7 bits transparently, without the ambiguity
caused by characters such as "?" that serve both as diacritics and as
punctuation. Because these channels are 7-bit, a mail agent will not
encounter the Vietnamese letters that lie in G1, such as ă, Ă, â, Â, ê, Ê,
ô, Ô, ơ, Ơ, ư, and Ư. A mail agent built for 8-bit channels, however, must
handle these letters according to the VIQR composition rules; that is, it
must combine the base vowel with a following diacritic where possible. For
example:

     ă'  --->  ắ
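The 8-bit case above, where a base vowel that already carries a modifier is followed by a tone mark, might be handled along these lines. This is a narrow sketch under assumptions: the name combine_tones is hypothetical, the text is assumed to be already decoded to Unicode and to be in the Vietnamese state throughout, and state switches and <COM> escapes are ignored.

```python
import unicodedata

# VIQR tone mark -> Unicode combining mark (Unicode is an assumption here).
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323"}
# The base vowels listed in this report, including those that lie in G1.
BASE_VOWELS = set("aăâeêioôơuưyAĂÂEÊIOÔƠUƯY")


def combine_tones(text):
    """Combine each base vowel with a tone mark that directly follows it."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in BASE_VOWELS and i + 1 < len(text) and text[i + 1] in TONES:
            # NFC merges the vowel and the tone into one precomposed letter.
            out.append(unicodedata.normalize("NFC", ch + TONES[text[i + 1]]))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```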
To be interpreted correctly, a mail message must state its language state
explicitly, either in the header or in the text body. The state of the
receiver at the start of each message cannot be assumed, because mail may be
read from a file that holds many messages rather than just one, which makes
it hard to tell where another message begins.

Furthermore, if a message contains a language state sequence (<COM>L,
<COM>V, or <COM>M), it should end in the literal state, that is, end with
<COM>L. This lets a mail reader, for example an on-screen mail viewer, read
the messages that follow in the same file. It is helpful because mail
headers generally do not follow the VIQR rules and may therefore be
misinterpreted outside the literal state.
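A sending agent could enforce the recommendation to end in the literal state with a check like the one below. It is a sketch only: final_state and close_message are hypothetical names, <COM> is assumed to be the backslash, and the message body is assumed to start in the literal state.

```python
COM = "\\"


def final_state(body, state="L"):
    """Return the language state ('L', 'M' or 'V') in effect at the end of body."""
    i = 0
    while i < len(body):
        if body[i] == COM and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "lLvVmM":      # state switch
                state = nxt.upper()
                i += 2
                continue
            if state != "L":         # escaped character, e.g. <COM><COM>
                i += 2
                continue
        i += 1
    return state


def close_message(body):
    """Append <COM>L unless the body already ends in the literal state."""
    return body if final_state(body) == "L" else body + COM + "L"
```

Skipping the character after an escape matters: in the Vietnamese or English state the sequence "\\L" is a literal backslash followed by L, not a switch to the literal state.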
5.2 VIETNAMESE KEYBOARDING
~~~~~~~~~~~~~~~~~~~~~~~~~~
Keyboards are becoming increasingly internationalized. As noted in the 8-bit
specification, this is a major reason for giving the Vietnamese letters
already present in ISO 8859/Latin-1 the same code values. A Vietnamese
keyboard driver designed only for a 7-bit environment may assume that it
will never encounter Vietnamese base vowels in the G1 region. A keyboard
driver for an 8-bit environment, like an 8-bit mail receiver (Section 5.1),
must be prepared to accept any base vowel, including those in G1.

The echoing behavior, that is, how letters are displayed on screen while a
composition from the keyboard is in progress, needs further specification.
A letter may be displayed only after its composition is complete.
Alternatively, every intermediate form may be displayed, each new form being
produced by backspacing to erase the previous form and printing over it.
Each approach has its own use, as described below.
5.2.1 Immediate Echo with Implicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Implicit composition is intended for convenient processing of data that is
mostly Vietnamese. To that end, typists need to see the letters they have
just typed. With implicit composition, the keyboard operates in
immediate-echo mode: every keypress immediately generates a key event. If a
letter (for example a) combines with a diacritic that follows it (for
example ^), a backspace character (normally BS, ASCII 0x8) is sent, followed
by the newly formed letter (â). This cycle repeats until composition is
complete. With immediate echo, the events generated by the four keypresses
a^'n are:

     1. The user presses a; a is sent to the application
     2. The user presses ^; BS and â are sent
     3. The user presses '; BS and ấ are sent
     4. The user presses n; n is sent

The backspace character may vary with the application, the operating system,
and the user's environment. The keyboard driver should use the correct
backspace character, and/or let the user specify the preferred one.
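The immediate-echo cycle can be sketched as a function that maps keypresses to the characters sent for each one. The name immediate_echo is hypothetical, the backspace character is assumed to be BS (0x8), the letters are Unicode rather than VISCII, and the "dd" digraph and <COM> escapes are left out for brevity.

```python
import unicodedata

BS = "\b"  # assumed backspace character; the text notes it may vary
# VIQR mark -> (Unicode combining mark, base vowels that accept it).
MODIFIERS = {"^": ("\u0302", "aeo"), "(": ("\u0306", "a"), "+": ("\u031b", "ou")}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323"}


def immediate_echo(keys):
    """Return, for each keypress, the list of characters sent to the application."""
    events = []
    # stage 0: bare vowel, 1: vowel with modifier, 2: vowel with tone (complete)
    base, marks, stage = None, "", 0
    for key in keys:
        combined = False
        if base is not None:
            if stage == 0 and key in MODIFIERS and base.lower() in MODIFIERS[key][1]:
                marks, stage, combined = marks + MODIFIERS[key][0], 1, True
            elif stage < 2 and key in TONES:
                marks, stage, combined = marks + TONES[key], 2, True
        if combined:
            # Erase the previous form, then send the newly formed letter.
            events.append([BS, unicodedata.normalize("NFC", base + marks)])
            continue
        events.append([key])
        # A vowel starts a new composition; anything else ends it.
        base, marks, stage = (key, "", 0) if key.lower() in "aeiouy" else (None, "", 0)
    return events
```

Calling immediate_echo("a^'n") yields four events that correspond to the four steps listed above.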
5.2.2 Delayed Echo with Explicit Composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once a composition has begun, the keyboard driver does not pass each key
event to the application but waits until the composition ends. Composition
may end naturally, when the combining sequence is complete or when the
driver receives a character that cannot combine, or it may end on receipt of
the closure character <CLS>. Only then is a single letter formed and sent to
the waiting application, after which processing resumes as normal. Consider
what happens when the user types the key sequence "\a^'n":

     1. The user types \; nothing is sent
     2. The user types a; nothing is sent
     3. The user types ^; nothing is sent
     4. The user types '; the letter ấ is sent
     5. The user types n; the letter n is sent

The next example uses the closure character <CLS>, with the sequence
"t\o+<CLS>":

     1. The user types t; the letter t is sent
     2. The user types \; nothing is sent
     3. The user types o; nothing is sent
     4. The user types +; nothing is sent
     5. The user types CTRL-A; the letter ơ is sent

Note that without the closure key <CLS>, the keyboard driver would continue
to wait after the + key is pressed, because the user could still type a tone
mark as part of the combining sequence.
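The two traces above can be reproduced with a small sketch of a delayed-echo driver. The name delayed_echo is hypothetical, <COM> is assumed to be the backslash and <CLS> to be CTRL-A as stated in Section 4.2.7, the letters are Unicode rather than VISCII, and only vowels are composed (the "dd" digraph is omitted).

```python
import unicodedata

COM, CLS = "\\", "\x01"
# VIQR mark -> (Unicode combining mark, base vowels that accept it).
MODIFIERS = {"^": ("\u0302", "aeo"), "(": ("\u0306", "a"), "+": ("\u031b", "ou")}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323"}


def delayed_echo(keys):
    """Return, for each keypress, the list of characters sent to the application."""
    events = []
    armed = False                      # True right after <COM>
    base, marks, stage = None, "", 0   # pending composition; stage 1 = has modifier
    for key in keys:
        sent = []
        if base is not None:
            if stage == 0 and key in MODIFIERS and base.lower() in MODIFIERS[key][1]:
                marks, stage = marks + MODIFIERS[key][0], 1
                events.append(sent)    # keep waiting: a tone mark may follow
                continue
            if key in TONES:           # sequence complete: send the letter
                sent.append(unicodedata.normalize("NFC", base + marks + TONES[key]))
                base, marks, stage = None, "", 0
                events.append(sent)
                continue
            # Non-combining key or <CLS>: the pending letter is finished.
            sent.append(unicodedata.normalize("NFC", base + marks))
            base, marks, stage = None, "", 0
            if key == CLS:             # <CLS> itself is discarded
                events.append(sent)
                continue
        if armed:
            armed = False
            if key.lower() in "aeiouy":
                base = key             # start an explicit composition
            else:
                sent.append(key)       # <COM> followed by literal character
        elif key == COM:
            armed = True
        elif key != CLS:
            sent.append(key)
        events.append(sent)
    return events
```

A composition that is still pending when the input ends is not sent, which mirrors the remark that the driver keeps waiting until a closing key arrives.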
Delayed echo for explicit composition is proposed to ensure integration with
applications that require every character to be associated with a single
key event, particularly in the English state, which permits explicit
composition only.

Although immediate echo is possible with explicit composition, and delayed
echo with implicit composition, these pairings are not useful and only
confuse users. The simplest course is therefore to associate immediate echo
only with implicit composition, and delayed echo only with explicit
composition. These pairings seem more natural.

The standard in this document specifies the minimal "look and feel" that
users can expect of a conforming Vietnamese application. A standardized
interface reduces the time needed to learn each new application. This
standard does not preclude other keyboarding schemes intended to make
applications easier to use, for example smart table-driven diacritic
placement, or to speed up typing, for example with CONTROL or FUNCTION keys.
Any enhancement in a compliant application benefits the user, provided it
does not conflict with the minimal behavior described here.
5.3 BRINGING EXISTING VIETNAMESE APPLICATIONS INTO CONFORMANCE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Any practical approach to standardization must anticipate resistance to
change from existing applications. While we hope that the 8-bit standard in
this document will be fully supported, we also propose an alternative that
can be adopted more readily and more quickly. All applications need to
provide a means of importing and exporting data encoded in the 8-bit VISCII
standard. At the same time, they must implement a VIQR-compliant keyboarding
interface, if not as the primary input method then at least as an
alternative offered to the user. These steps are essential for users and
vendors alike. Users can use the software right away because the keyboarding
interface is uniform, and can process data from different applications and
on different platforms. This will raise productivity and increase exchange
among users. Ease of use will bring an application wider acceptance, and
with it more customers for the vendor.
6. SUMMARY & CONCLUSION
~~~~~~~~~~~~~~~~~~~~~~~
This document has presented a draft standard for Vietnamese information
processing. The need for standardization has also been made clear. We hope
to have encouraged Vietnamese software makers and users to work together
toward this goal, to the benefit of everyone concerned.

A discussion of the different encoding approaches led to the choice of the
8-bit VISCII draft. We have proposed a single character table, and practical
testing shows that it works well for Vietnamese in tasks such as editing,
processing, storage, transmission, font encoding, and printing. For areas
where 8-bit data is not yet permitted or not reliable, such as electronic
mail, we have proposed the Vietnamese Quoted-Readable specification (VIQR)
to provide a smooth filtering gateway. VIQR is specified independently of
the source of the data, and was therefore designed to apply both to
Vietnamese keyboards and to data filters. All of this has been shown to
integrate into existing environments, making existing tools and applications
easier to use, which is a major advantage of this encoding approach.
Finally, these specifications fit together smoothly at every stage of the
data processing cycle (data input, data processing/transmission, and data
output, or the input-process-output cycle for short). They provide a truly
unified framework for Vietnamese information processing.
REFERENCES
~~~~~~~~~~
[1] Bạch Hưng Khang. "Institute of Informatics." Hà Nội,
     Việt Nam, February 1991.
[2] B. Jerman-Blazic, "Will the Multi-octet Standard Character Set
     Code Solve the World Coding Problems for Information Interchange?"
     Computer Standards & Interfaces, vol. 8, pp. 127-136, 1988.
[3] The Unicode Consortium. The Unicode Standard: Worldwide
     Character Encoding Version 1.0. Addison-Wesley, Reading, MA,
     first edition, October 1991.
[4] ISO Technical Committee, "Universal Multiple-Octet Coded
     Character Set (UCS), ISO/IEC DIS 10646-1.2," Draft standard,
     International Organization for Standardization, 1992.
[5] International Organization for Standardization, ISO 8859/x:
     8-bit International Code Sets. ISO, 1977.
[6] Famjxuaen Thais. Việt Ngữ Cải Cách. Tứ Hải, Hà Nội,
     Việt Nam, March 1948.
[7] Phạm Xuân Thái. Chữ Việt Hợp Lí, Tín-Đức Thư Xã,
     Sài Gòn, Việt Nam, April 1958.
[8] J. Postel, "Simple Mail Transfer Protocol," RFC 821, USC
     Information Sciences Institute, August 1982.
[9] J. C. Klensin et al., "SMTP Extensions for Transport of
     Text-Based Messages Containing 8-bit Characters," Internet draft,
     Massachusetts Institute of Technology, July 1991.
[10] K. Simonsen, "Character Mnemonics & Character Sets," Internet
     Draft, Danish Unix Users Group, January 1992.
[11] K. Simonsen, "Mnemonic Text Format," Internet draft, Danish
     Unix Users Group, August 1991.
[12] International Organization for Standardization. ISO 646:
     7-bit Coded Character Set for Information Interchange. ISO,
     third edition, 1991.
[13] International Organization for Standardization. ISO 2022:
     7-bit and 8-bit Coded Character Sets--Code Extension Techniques.
     ISO, third edition, 1986.
[14] E. M. van der Poel, "Multilingual Character Encoding for
     Internet Messages," Internet draft, Software Research Associates,
     Japan, January 1992.
[15] D.E. Knuth. The TeXbook. Addison-Wesley, Reading, MA, 1984.
ENGLISH-VIETNAMESE GLOSSARY
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Announcer: mã tự (or chuỗi mã tự) báo tin. A character (or character
   sequence) whose appearance in the data stream signals that the characters
   that follow have a special meaning. In this document, it marks the start
   of a Vietnamese composition.
ASCII: American Standard Code for Information Interchange, bộ mã tự
   tiêu-chuẩn Hoa-kỳ dành cho việc trao đổi tin-tức. This code has 128 code
   values and is used by most computers to represent and transmit text data.
   Each character has a code value in the range 0 to 127. 8-bit or 9-bit
   codes whose first 128 characters match ASCII are called extended ASCII
   (bộ mã ASCII-rộng). The additional characters are accented Latin letters,
   non-Latin letters, screen control characters, and so on.
Backslash: gạch-chéo-ngược (\).
Base Vowel: nguyên âm cơ bản. For the purposes of encoding Vietnamese, this
   document treats the following vowels as base vowels: a, ă, â, e, ê, i, o,
   ô, ơ, u, ư, y, and their upper-case counterparts.
Binary Data: dữ kiện nhị phân. Depending on context it can also mean 8-bit
   data, particularly in data communication.
C0 Space: vùng (miền) C0. The set of characters with hexadecimal code values
   00 to 1F (the control region of ASCII).
C1 Space: vùng (miền) C1. The set of characters with hexadecimal code values
   80 to 9F (the control region of extended ASCII).
Character: mẫu tự, ký tự, mã tự. In computing, "character" usually denotes
   anything associated with a code value, so the most accurate rendering is
   mã tự. A mã tự may be a letter (mẫu tự, such as a, b, c, ...), a sign or
   symbol (such as +, -, =, ...), or a control signal. Ký tự denotes a
   symbol in the broad sense, covering signs, letters, and ideographs such
   as Chinese characters.
Character Set: tập mã tự, bộ mã tự. It can be rendered freely as hệ mã tự,
   because each character set is a system of characters for one or more
   languages, or as bảng mã tự, because a character set is usually presented
   as a table. The number of characters in a set is 2^n (2 to the power n),
   where n is the number of bits used to encode one character. Familiar
   character sets are 7-bit US ASCII, 8-bit sets such as ISO-8859/X, 16-bit
   sets such as Unicode, and 32-bit sets such as the draft ISO DIS 10646.
Code: mã số (in data communication). A number representing a letter, symbol,
   or control signal. For example, decimal 65 in US ASCII represents the
   letter A.
Code Page: a term commonly used for the character sets of the IBM PC,
   abbreviated CP. CP850 is the multilingual character set, CP860 the
   Portuguese set, CP863 the Canadian French set, and CP865 the Norwegian
   set.
Code Page Switching: đổi bảng (tập, bộ, hệ) mã tự.
Compatible: tương hợp, tương dung.
Compliant: hợp thức; conforming in the proper way.
Composed Character: chữ ghép, chữ rời (from the encoding point of view). See
   floating diacritic.
Context-Dependent: tùy thuộc vào ngữ cảnh (dependent on the surrounding
   meaning).
Control Character: mã tự điều khiển, kiểm tự. An ASCII character in the
   range 0 to 31, or character 127, as opposed to the printable characters
   (ký tự hình) in the range 32 to 126. On ASCII keyboards, a control
   character (for example CTRL-A, code 1) is produced by holding down the
   CTRL key and typing the corresponding letter (A).
Cross-Platform: xuyên-giàn; across many different machine platforms.
Data: dữ kiện, dữ liệu.
Data Channel: mạch dữ kiện.
Data Communication: thông tin dữ kiện; the field of data transmission.
Data Frame: khung dữ kiện.
Data Integrity: sự toàn vẹn dữ kiện, sự bảo toàn dữ kiện.
Data Stream: luồng (dòng) dữ kiện, dòng tin.
Diacritic: dấu phụ. A mark added to a base letter to form a different
   letter. For example, the letter Â is formed from the base letter A and
   the diacritic ^ (dấu mũ, the circumflex).
Display: hiển thị, tạo hình, in hình (on screen). See rendering.
EBCDIC: Extended Binary Coded Decimal Interchange Code, an 8-bit code of 256
   characters used on IBM mainframes.
Editor: ứng dụng viết bài, viết-cụ (a tool for writing text).
Electronic Mail: điện thư.
to Encode: mã hóa.
Escape Mechanism: cơ chế thoát.
Fax: điện hình thư. It differs from electronic mail in that only image dots
   are transmitted; the recipient cannot modify the content with an editor.
File: hồ sơ. Some translate it as tệp.
Floating Diacritic: dấu rời (from the encoding point of view). A letter with
   a diacritic can be encoded as a single code value or as several. For
   example, Ô can be encoded as one code value, in which case it is called a
   precomposed character (chữ nguyên vẹn, chữ dựng sẵn), or as two code
   values, one for the base vowel O and one for the circumflex (^). In the
   latter case Ô is called a composed character (chữ ghép) and the
   circumflex is called a floating diacritic.
Font: "phông," bộ kiểu chữ. A collection of glyphs that share certain
   characteristics and can be rendered on screen or on paper. Italic font:
   bộ chữ nghiêng. Bold face font: bộ chữ in đậm. Each glyph (hình chữ) in a
   font is assigned a code. An 8-bit font can hold up to 256 glyphs. A glyph
   code need not equal the character code of the corresponding character in
   the character set. For example, the character A has code 65 in ASCII, but
   the glyph A could be placed at position 35 in some font table if desired.
   Conversely, one glyph in a font may correspond to several characters in
   the character set. This commonly happens with encodings that use floating
   diacritics (see floating diacritic). In such an encoding, the letter À is
   really the combination of the character A (code 65) and the grave accent
   ` (code 196, say), but it is rendered with the glyph for À, located at
   position 135 of the font table, say. This makes processing complex and
   inefficient, so most Western software and hardware avoid it. Therefore,
   to integrate into existing software environments, glyph codes and
   character codes must be identical.
Font Shifting: chuyển phông, chuyển mẫu chữ.
Format: khuôn thức (khuôn khổ và hình thức; frame and form).
Format Data Control Character: mã tự điều khiển khuôn thức.
Frame: khung, sườn.
Framework: khuôn khổ.
G0 Space: the region of graphic characters (ký tự hình) with hexadecimal
   code values 20 to 7F.
G1 Space: the region of graphic characters with hexadecimal code values A0
   to FF.
Gateway: cổng, đầu cầu.
Glyph: hình chữ; an element of a font (phông chữ).
Graphic Character: ký tự hình (a printable character).
Interface: giao-diện, mạch nối.
Interface Between 2 Computers: giao diện (mạch nối) giữa 2 máy.
ISO: International Organization for Standardization. A voluntary
   international body made up of national standards bodies that cooperate to
   set standards in every field, including computers, information
   processing, and character sets.
ISO 646: the 7-bit character set standard, similar to ASCII.
ISO Standard 8859: tiêu chuẩn ISO 8859. It specifies a family of 8-bit
   character sets covering the scripts of many languages. The family
   includes the Latin alphabets 1 to 9, which serve all languages written
   with Roman-based letters, plus a number of special alphabets such as
   Cyrillic, Greek, and Hebrew.
ISO 8859/1: ISO 8859 Latin Alphabet No. 1. It supports at least the
   following languages: Latin, Danish, German, Dutch, English, Faeroese,
   Finnish, French, Icelandic, Irish, Italian, Norwegian, Portuguese,
   Spanish, and Swedish.
ISO 2022 and ISO 4873: ISO standards for switching between character sets.
ISO DIS 10646: ISO Draft International Standard; the draft 16-bit and 32-bit
   ISO international standard for all the languages of the world.
Latin: denotes the Latin or Roman alphabet, consisting of the letters A to
   Z, or any alphabet based on Latin.
Keyboard Interface: giao diện bàn đánh chữ.
Integrated Environment: môi trường tích hợp.
Integration: sự tích hợp, sát nhập.
Keyboard: bàn đánh chữ.
Line-Drawing: vẽ đường thẳng.
Literal: nguyên dạng; taken at face value.
Literal character: nguyên tự; a character understood exactly as written.
Lower-Case Character: chữ in thường.
Look-and-Feel: hình thức và cảm giác.
Mail Agent: bộ chuyển thư, đại lý thư từ.
Mnemonic: dễ nhớ.
Modifier: a diacritic that changes the vowel sound. In Vietnamese these are
   the breve (dấu trăng, dấu á), the circumflex (dấu mũ, ^), and the horn
   (dấu móc, as in Ơ or Ư).
On the Fly: theo lúc đó, ngay lúc đó.
PC: Personal Computer, máy vi tính cá nhân. In this document, PC refers to
   the whole range of IBM PC and PS/2 machines and compatibles, including
   the AT, 286, 386, and 486.
PostScript: A page description language (for sheets, books, newspapers,
   etc.) with graphic capabilities, used in computer-based printing. It is a
   high-level, device-independent programming language. PostScript is a
   registered trademark of Adobe Systems Incorporated.
Precomposed Character: chữ nguyên vẹn, chữ dựng sẵn (from the encoding point
   of view). See floating diacritic.
Processing: xử lý.
Protocol: biên bản, nghi thức.
Public-Domain: thuộc lãnh vực công cộng.
Quoted-Readable: đọc được trong ngoặc. Traditionally, the meaning of special
   characters can be changed by enclosing them in single or double quotes,
   or by preceding them with an escape character such as the backslash (\)
   used in the Unix operating system or the C programming language. Hence
   the term "quoted" (trong-ngoặc). This is called a quoting mechanism (cơ
   chế ngoặc). For example, in the Unix operating system the character * is
   expanded into other characters, so the command
        rm *
   deletes every file, whereas the command
        rm "*"
   deletes only the one file whose name is the character *. Similarly, if
   the quoting convention were [cd] for an accented vowel, where c is the
   vowel and d the diacritic, then the word cả could be written c[a?], since
   the sequence a? means ả inside the brackets. The VIQR specification in
   this document uses visually suggestive characters such as ' ` ? ~ . to
   represent tone marks, so it is easy to read and to remember.
Real Time: thực thời. In computing, processing that occurs in step with the
   actual events.
Rendering: sự hiển thị, sự tạo hình, in hình (on screen or on paper). See
   display.
Sequence: chuỗi, loạt.
Software: nhu liệu. Some render it freely as hư liệu (hư meaning intangible,
   not physical).
Software Application: nhu liệu ứng dụng, or ứng dụng for short.
Software Library: thư viện nhu liệu. May be rendered freely as hư-viện.
Software Tool: nhu liệu dụng cụ, hư cụ.
Source Code: chương trình gốc (the original form of a program, which a
   compiler reads and translates into machine language).
Specification: quy định.
Table-Lookup: tra bảng.
TeX: A computerized typesetting system developed by Donald Knuth [15], able
   to meet every need for high-quality typesetting of mathematical symbols
   and text. TeX is a registered trademark of the American Mathematical
   Society.
Text: văn bản.
Text Viewer: ứng dụng nhìn chữ. Typically used to preview on screen how a
   text will look before it is printed on paper.
Trade-off: việc chọn lợi hại.
Transmission: chuyển, truyền.
Transparent: thông suốt, vô hình, không dấu vết. Used here to mean that
   processing at lower levels may differ while the interface at higher
   levels sees no change or difference.
Unicode: A 16-bit character set established by the Unicode Consortium, a
   consortium of companies. Version 1.0 [3], released in 1991, covers most
   of the world's languages, including Latin, Western European, Eastern
   European, Vietnamese, Greek, Russian, Hebrew, Arabic, and the Chinese,
   Japanese, and Korean ideographs, among others. In version 1.0, Vietnamese
   is encoded with floating diacritics. Semi-official reports indicate that
   when Unicode is merged with ISO DIS 10646, Vietnamese will be encoded
   with precomposed characters.
Unix: A very popular computer operating system, developed at AT&T Bell
   Laboratories. It is well known for running on many different machine
   platforms.
Upper-Case Character: chữ hoa.
Usenet: An international computer information network that lets computer
   users send messages to others, who can reply. Taking part in Usenet is
   like subscribing to electronic magazines (magazines that exist on the
   computer network rather than in print). These magazines, called
   newsgroups (nhóm-tin), cover a wide range of topics. The newsgroup
   "Soc.Culture.Vietnamese" is very popular among Vietnamese and foreigners
   all over the world.
User: người dùng, người sử dụng.
User-Interface: giao-diện cho người dùng.
Utility Software: nhu liệu dụng cụ, dụng liệu (nhu liệu hữu dụng).
   Applications that assist software development, such as editors, text
   search tools (grep and awk on Unix), and software that controls the
   printing of multiple documents.
Vietnamese Character Code: bộ mã chữ Việt, bộ Việt-tự-mã.
Vietnamese Character Encoding Standard: bộ Việt-tự-mã tiêu chuẩn.
Viet-Std: A non-profit group of overseas Vietnamese professionals who
   cooperate to define hardware and software standards for Vietnamese.
   Members exchange ideas through electronic mail and meetings.
Variant: dị-bản.
Word Processing: xử lý chữ.
Word Processor: máy xử lý chữ.
Word Processing Software: nhu liệu xử lý chữ.
Appendix A: Vietnamese Letters Listed in Collating Order
+============================================================================+
|Chr: VIQR:VISCII || Chr: VIQR:VISCII || Chr: VIQR:VISCII || Chr: VIQR:VISCII|
+---:-----:-------++----:-----:-------++----:-----:-------++----:-----:------+
| A : A   :  065  ||  N : N   :  078  ||  a : a   :  097  ||  n : n   :  110 |
| Á : A'  :  193  ||  O : O   :  079  ||  á : a'  :  225  ||  o : o   :  111 |
| À : A`  :  192  ||  Ó : O'  :  211  ||  à : a`  :  224  ||  ó : o'  :  243 |
| Ả : A?  :  196  ||  Ò : O`  :  210  ||  ả : a?  :  228  ||  ò : o`  :  242 |
| Ã : A~  :  195  ||  Ỏ : O?  :  153  ||  ã : a~  :  227  ||  ỏ : o?  :  246 |
| Ạ : A.  :  128  ||  Õ : O~  :  213  ||  ạ : a.  :  160  ||  õ : o~  :  245 |
| Ă : A(  :  197  ||  Ọ : O.  :  154  ||  ă : a(  :  229  ||  ọ : o.  :  247 |
| Ắ : A(' :  129  ||  Ô : O^  :  212  ||  ắ : a(' :  161  ||  ô : o^  :  244 |
| Ằ : A(` :  130  ||  Ố : O^' :  143  ||  ằ : a(` :  162  ||  ố : o^' :  175 |
| Ẳ : A(? :  002  ||  Ồ : O^` :  144  ||  ẳ : a(? :  198  ||  ồ : o^` :  176 |
| Ẵ : A(~ :  005  ||  Ổ : O^? :  145  ||  ẵ : a(~ :  199  ||  ổ : o^? :  177 |
| Ặ : A(. :  131  ||  Ỗ : O^~ :  146  ||  ặ : a(. :  163  ||  ỗ : o^~ :  178 |
| Â : A^  :  194  ||  Ộ : O^. :  147  ||  â : a^  :  226  ||  ộ : o^. :  181 |
| Ấ : A^' :  132  ||  Ơ : O+  :  180  ||  ấ : a^' :  164  ||  ơ : o+  :  189 |
| Ầ : A^` :  133  ||  Ớ : O+' :  149  ||  ầ : a^` :  165  ||  ớ : o+' :  190 |
| Ẩ : A^? :  134  ||  Ờ : O+` :  150  ||  ẩ : a^? :  166  ||  ờ : o+` :  182 |
| Ẫ : A^~ :  006  ||  Ở : O+? :  151  ||  ẫ : a^~ :  231  ||  ở : o+? :  183 |
| Ậ : A^. :  135  ||  Ỡ : O+~ :  179  ||  ậ : a^. :  167  ||  ỡ : o+~ :  222 |
| B : B   :  066  ||  Ợ : O+. :  148  ||  b : b   :  098  ||  ợ : o+. :  254 |
| C : C   :  067  ||  P : P   :  080  ||  c : c   :  099  ||  p : p   :  112 |
| D : D   :  068  ||  Q : Q   :  081  ||  d : d   :  100  ||  q : q   :  113 |
| Đ : DD  :  208* ||  R : R   :  082  ||  đ : dd  :  240  ||  r : r   :  114 |
| E : E   :  069  ||  S : S   :  083  ||  e : e   :  101  ||  s : s   :  115 |
| É : E'  :  201  ||  T : T   :  084  ||  é : e'  :  233  ||  t : t   :  116 |
| È : E`  :  200  ||  U : U   :  085  ||  è : e`  :  232  ||  u : u   :  117 |
| Ẻ : E?  :  203  ||  Ú : U'  :  218  ||  ẻ : e?  :  235  ||  ú : u'  :  250 |
| Ẽ : E~  :  136  ||  Ù : U`  :  217  ||  ẽ : e~  :  168  ||  ù : u`  :  249 |
| Ẹ : E.  :  137  ||  Ủ : U?  :  156  ||  ẹ : e.  :  169  ||  ủ : u?  :  252 |
| Ê : E^  :  202  ||  Ũ : U~  :  157  ||  ê : e^  :  234  ||  ũ : u~  :  251 |
| Ế : E^' :  138  ||  Ụ : U.  :  158  ||  ế : e^' :  170  ||  ụ : u.  :  248 |
| Ề : E^` :  139  ||  Ư : U+  :  191  ||  ề : e^` :  171  ||  ư : u+  :  223 |
| Ể : E^? :  140  ||  Ứ : U+' :  186  ||  ể : e^? :  172  ||  ứ : u+' :  209 |
| Ễ : E^~ :  141  ||  Ừ : U+` :  187  ||  ễ : e^~ :  173  ||  ừ : u+` :  215 |
| Ệ : E^. :  142  ||  Ử : U+? :  188  ||  ệ : e^. :  174  ||  ử : u+? :  216 |
| F : F   :  070  ||  Ữ : U+~ :  255  ||  f : f   :  102  ||  ữ : u+~ :  230 |
| G : G   :  071  ||  Ự : U+. :  185  ||  g : g   :  103  ||  ự : u+. :  241 |
| H : H   :  072  ||  V : V   :  086  ||  h : h   :  104  ||  v : v   :  118 |
| I : I   :  073  ||  W : W   :  087  ||  i : i   :  105  ||  w : w   :  119 |
| Í : I'  :  205  ||  X : X   :  088  ||  í : i'  :  237  ||  x : x   :  120 |
| Ì : I`  :  204  ||  Y : Y   :  089  ||  ì : i`  :  236  ||  y : y   :  121 |
| Ỉ : I?  :  155  ||  Ý : Y'  :  221  ||  ỉ : i?  :  239  ||  ý : y'  :  253 |
| Ĩ : I~  :  206  ||  Ỳ : Y`  :  159  ||  ĩ : i~  :  238  ||  ỳ : y`  :  207 |
| Ị : I.  :  152  ||  Ỷ : Y?  :  020  ||  ị : i.  :  184  ||  ỷ : y?  :  214 |
| J : J   :  074  ||  Ỹ : Y~  :  025  ||  j : j   :  106  ||  ỹ : y~  :  219 |
| K : K   :  075  ||  Ỵ : Y.  :  030  ||  k : k   :  107  ||  ỵ : y.  :  220 |
| L : L   :  076  ||  Z : Z   :  090  ||  l : l   :  108  ||  z : z   :  122 |
| M : M   :  077  ||    :     :       ||  m : m   :  109  ||    :     :      |
+============================================================================+
* The VIQR specification also allows "Đ" to be represented as "Dd" or "dD".
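As an illustration of how the table in Appendix A can be used, the sketch below converts VISCII bytes to Unicode text. The table is deliberately partial: it holds only a handful of rows copied from Appendix A, and the names VISCII_TO_UNICODE and viscii_to_text are hypothetical. A complete converter would need all 134 letters, including the six that occupy C0 positions.

```python
# Partial VISCII -> Unicode table; every entry is a row from Appendix A.
VISCII_TO_UNICODE = {
    2: "Ẳ",                              # one of the letters placed in C0
    193: "Á", 225: "á", 224: "à", 226: "â", 164: "ấ",
    174: "ệ", 208: "Đ", 240: "đ", 244: "ô", 189: "ơ",
    223: "ư", 230: "ữ",
}


def viscii_to_text(data):
    """Decode VISCII bytes using the partial table above.

    ASCII bytes missing from the table pass through unchanged; high bytes
    missing from the table become U+FFFD because this sketch is incomplete.
    """
    out = []
    for b in data:
        if b in VISCII_TO_UNICODE:
            out.append(VISCII_TO_UNICODE[b])
        elif b < 128:
            out.append(chr(b))
        else:
            out.append("\ufffd")
    return "".join(out)
```

For example, the bytes 67, 104, 230, 32, 86, 105, 174, 116 decode to the two words used in the Section 4.2.5 examples.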
Appendix B: Vietnamese Letters Listed by Decimal VISCII Code
+===================================+========================================+
|VISCII:Chr:VIQR : English Name     |VISCII:Chr:VIQR : English Name          |
+------:---:-----:------------------+------:---:-----:-----------------------+
| 002  : Ẳ : A(? :A breve hook-above| 112  : p : p   : p                     |
| 005  : Ẵ : A(~ : A breve tilde    | 113  : q : q   : q                     |
| 006  : Ẫ : A^~ :A circumflex tilde| 114  : r : r   : r                     |
| 020  : Ỷ : Y?  : Y hook-above     | 115  : s : s   : s                     |
| 025  : Ỹ : Y~  : Y tilde          | 116  : t : t   : t                     |
| 030  : Ỵ : Y.  : Y dot-below      | 117  : u : u   : u                     |
| 065  : A : A   : A                | 118  : v : v   : v                     |
| 066  : B : B   : B                | 119  : w : w   : w                     |
| 067  : C : C   : C                | 120  : x : x   : x                     |
| 068  : D : D   : D                | 121  : y : y   : y                     |
| 069  : E : E   : E                | 122  : z : z   : z                     |
| 070  : F : F   : F                | 128  : Ạ : A.  : A dot-below           |
| 071  : G : G   : G                | 129  : Ắ : A(' : A breve acute         |
| 072  : H : H   : H                | 130  : Ằ : A(` : A breve grave         |
| 073  : I : I   : I                | 131  : Ặ : A(. : A breve dot-below     |
| 074  : J : J   : J                | 132  : Ấ : A^' : A circumflex acute    |
| 075  : K : K   : K                | 133  : Ầ : A^` : A circumflex grave    |
| 076  : L : L   : L                | 134  : Ẩ : A^? :A circumflex hook-above|
| 077  : M : M   : M                | 135  : Ậ : A^. : A circumflex dot-below|
| 078  : N : N   : N                | 136  : Ẽ : E~  : E tilde               |
| 079  : O : O   : O                | 137  : Ẹ : E.  : E dot-below           |
| 080  : P : P   : P                | 138  : Ế : E^' : E circumflex acute    |
| 081  : Q : Q   : Q                | 139  : Ề : E^` : E circumflex grave    |
| 082  : R : R   : R                | 140  : Ể : E^? :E circumflex hook-above|
| 083  : S : S   : S                | 141  : Ễ : E^~ : E circumflex tilde    |
| 084  : T : T   : T                | 142  : Ệ : E^. : E circumflex dot-below|
| 085  : U : U   : U                | 143  : Ố : O^' : O circumflex acute    |
| 086  : V : V   : V                | 144  : Ồ : O^` : O circumflex grave    |
| 087  : W : W   : W                | 145  : Ổ : O^? :O circumflex hook-above|
| 088  : X : X   : X                | 146  : Ỗ : O^~ : O circumflex tilde    |
| 089  : Y : Y   : Y                | 147  : Ộ : O^. : O circumflex dot-below|
| 090  : Z : Z   : Z                | 148  : Ợ : O+. : O horn dot-below      |
| 097  : a : a   : a                | 149  : Ớ : O+' : O horn acute          |
| 098  : b : b   : b                | 150  : Ờ : O+` : O horn grave          |
| 099  : c : c   : c                | 151  : Ở : O+? : O horn hook-above     |
| 100  : d : d   : d                | 152  : Ị : I.  : I dot-below           |
| 101  : e : e   : e                | 153  : Ỏ : O?  : O hook-above          |
| 102  : f : f   : f                | 154  : Ọ : O.  : O dot-below           |
| 103  : g : g   : g                | 155  : Ỉ : I?  : I hook-above          |
| 104  : h : h   : h                | 156  : Ủ : U?  : U hook-above          |
| 105  : i : i   : i                | 157  : Ũ : U~  : U tilde               |
| 106  : j : j   : j                | 158  : Ụ : U.  : U dot-below           |
| 107  : k : k   : k                | 159  : Ỳ : Y`  : Y grave               |
| 108  : l : l   : l                | 160  : ạ : a.  : a dot-below           |
| 109  : m : m   : m                | 161  : ắ : a(' : a breve acute         |
| 110  : n : n   : n                | 162  : ằ : a(` : a breve grave         |
| 111  : o : o   : o                | 163  : ặ : a(. : a breve dot-below     |
+===================================+========================================+
Appendix B: Vietnamese Letters Listed in Order of Decimal VISCII Code (continued)
+========================================+===================================+
|VISCII:Chr:VIQR : English name |VISCII:Chr:VIQR : English name |
+------:---:-----:-----------------------+------:---:-----:------------------+
| 164 : ấ : a^' : a circumflex acute | 210 : Ò : O` : O grave |
| 165 : ầ : a^` : a circumflex grave | 211 : Ó : O' : O acute |
| 166 : ẩ : a^? : a circumflex hook-above | 212 : Ô : O^ : O circumflex |
| 167 : ậ : a^. : a circumflex dot-below | 213 : Õ : O~ : O tilde |
| 168 : ẽ : e~ : e tilde | 214 : ỷ : y? : y hook-above |
| 169 : ẹ : e. : e dot-below | 215 : ừ : u+` : u horn grave |
| 170 : ế : e^' : e circumflex acute | 216 : ử : u+? : u horn hook-above |
| 171 : ề : e^` : e circumflex grave | 217 : Ù : U` : U grave |
| 172 : ể : e^? : e circumflex hook-above | 218 : Ú : U' : U acute |
| 173 : ễ : e^~ : e circumflex tilde | 219 : ỹ : y~ : y tilde |
| 174 : ệ : e^. : e circumflex dot-below | 220 : ỵ : y. : y dot-below |
| 175 : ố : o^' : o circumflex acute | 221 : Ý : Y' : Y acute |
| 176 : ồ : o^` : o circumflex grave | 222 : ỡ : o+~ : o horn tilde |
| 177 : ổ : o^? : o circumflex hook-above | 223 : ư : u+ : u horn |
| 178 : ỗ : o^~ : o circumflex tilde | 224 : à : a` : a grave |
| 179 : Ỡ : O+~ : O horn tilde | 225 : á : a' : a acute |
| 180 : Ơ : O+ : O horn | 226 : â : a^ : a circumflex |
| 181 : ộ : o^. : o circumflex dot-below | 227 : ã : a~ : a tilde |
| 182 : ờ : o+` : o horn grave | 228 : ả : a? : a hook-above |
| 183 : ở : o+? : o horn hook-above | 229 : ă : a( : a breve |
| 184 : ị : i. : i dot-below | 230 : ữ : u+~ : u horn tilde |
| 185 : Ự : U+. : U horn dot-below | 231 : ẫ : a^~ : a circumflex tilde |
| 186 : Ứ : U+' : U horn acute | 232 : è : e` : e grave |
| 187 : Ừ : U+` : U horn grave | 233 : é : e' : e acute |
| 188 : Ử : U+? : U horn hook-above | 234 : ê : e^ : e circumflex |
| 189 : ơ : o+ : o horn | 235 : ẻ : e? : e hook-above |
| 190 : ớ : o+' : o horn acute | 236 : ì : i` : i grave |
| 191 : Ư : U+ : U horn | 237 : í : i' : i acute |
| 192 : À : A` : A grave | 238 : ĩ : i~ : i tilde |
| 193 : Á : A' : A acute | 239 : ỉ : i? : i hook-above |
| 194 : Â : A^ : A circumflex | 240 : đ : dd : d bar |
| 195 : Ã : A~ : A tilde | 241 : ự : u+. : u horn dot-below |
| 196 : Ả : A? : A hook-above | 242 : ò : o` : o grave |
| 197 : Ă : A( : A breve | 243 : ó : o' : o acute |
| 198 : ẳ : a(? : a breve hook-above | 244 : ô : o^ : o circumflex |
| 199 : ẵ : a(~ : a breve tilde | 245 : õ : o~ : o tilde |
| 200 : È : E` : E grave | 246 : ỏ : o? : o hook-above |
| 201 : É : E' : E acute | 247 : ọ : o. : o dot-below |
| 202 : Ê : E^ : E circumflex | 248 : ụ : u. : u dot-below |
| 203 : Ẻ : E? : E hook-above | 249 : ù : u` : u grave |
| 204 : Ì : I` : I grave | 250 : ú : u' : u acute |
| 205 : Í : I' : I acute | 251 : ũ : u~ : u tilde |
| 206 : Ĩ : I~ : I tilde | 252 : ủ : u? : u hook-above |
| 207 : ỳ : y` : y grave | 253 : ý : y' : y acute |
| *208 : Đ : DD : D bar | 254 : ợ : o+. : o horn dot-below |
| 209 : ứ : u+' : u horn acute | 255 : Ữ : U+~ : U horn tilde |
+========================================+===================================+
* The VIQR specification also allows "Đ" to be represented by "Dd" or "dD".
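
As a rough illustration of how the decimal table above can drive an 8-bit to 7-bit filter, the sketch below copies a handful of rows into a lookup table and maps VISCII bytes to their VIQR mnemonics. The function name, the choice of sample rows, and the "<NNN>" placeholder for codes missing from the sample are assumptions made for this example; a complete filter would carry every row of the table.

```python
# A few rows copied from the decimal VISCII table; not the full set.
VISCII_TO_VIQR = {
    20: "Y?", 25: "Y~", 30: "Y.",
    128: "A.", 174: "e^.", 208: "DD", 209: "u+'",
    223: "u+", 225: "a'", 231: "a^~", 234: "e^",
    240: "dd", 241: "u+.", 254: "o+.",
}


def viscii_to_viqr(data: bytes) -> str:
    """Map VISCII bytes to VIQR text using the sample table above."""
    pieces = []
    for code in data:
        if code in VISCII_TO_VIQR:
            # Vietnamese letter listed in the sample table.
            pieces.append(VISCII_TO_VIQR[code])
        elif code < 128:
            # Plain ASCII passes through unchanged.
            pieces.append(chr(code))
        else:
            # Hypothetical placeholder for rows left out of this sketch.
            pieces.append("<%d>" % code)
    return "".join(pieces)


print(viscii_to_viqr(bytes([86, 105, 174, 116])))  # Vie^.t
```

The table is consulted before the ASCII test because a few codes below 128 (020, 025, 030) are assigned to Vietnamese capitals.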
ANNOUNCEMENT OF VISCII-COMPLIANT SOFTWARE APPLICATIONS - RELEASE 3
San Jose, California, September 1992
by
The TriChlor Group
3388 Burgundy Drive
San Jose, CA 95132, U.S.A.
Email: TriChlor@Haydn.Stanford.EDU
This release includes major support for Microsoft Windows 3.1 and an upgrade
to VISCII 1.1. It has been an unusually productive period for TriChlor and
TriChlor-Talk members. Thanks to everyone for a great effort. New software added
in this release:
winvnkey, vnfont1, vn-nhac, butviet, more bdf fonts for X,
vietxrn, vprint, thitap1.
With the newly added Windows and TrueType support, one can easily generate
"beautiful" and "professional" publications such as special issues (đặc san),
New Year papers (báo xuân), wall newspapers (bích báo), books (sách), ... All
data files are identical across Unix, DOS and Windows platforms, with the same
keyboard entry style. If you use TriChlor software to generate these
publications, we would be glad to receive a complimentary/souvenir copy.
Most TriChlor software provides a foundation upon which other Vietnamese-
dedicated applications can be built. Following is a short description of all
programs. All are Viet-Std VISCII 1.1 & VIQR compliant. Unless otherwise
specified, most software is written/ported by Cường Tấn Nguyễn
(cuong@Haydn.Stanford.EDU) and Cương Minh Bùi (bui@berlioz.nsc.com).
1. FOR UNIX & X-WINDOWS:
~~~~~~~~~~~~~~~~~~~~~~~
include: Include files for software developers.
lib: Library routines for software developers.
fonts: Contains X-Window bitmap fonts, METAFONT sources for TeX, pre-made
TeX bitmap .pk fonts, and scalable outline Type-1 fonts.
METAFONT can be used to generate different font sizes and typefaces
for TeX bitmap .pk fonts.
X-Window .bdf fonts include:
vn10x20.bdf, vnlucida18 [NEW]
designed by Thu V. Vu (tvu@sg102a.ess.harris.com)
vn9x15.bdf, vn-r14.bdf.
Type 1 fonts: include all the TrueType fonts in vnfont1 archived
under VN/WINDOWS
vietterm:
A modified version of X-Window xterm. It allows entry and display
of Vietnamese characters while staying compatible with the US-ASCII
environment and working with existing 8-bit clean Unix tools.
Changes include 8-bit Viet-Std, support for both Athena and MOTIF
widgets, radio buttons to switch/display keyboard and screen modes.
vnterm:
A modified version of X-Window xterm. It allows entry and display
of Vietnamese characters while staying compatible with the US-ASCII
environment and working with existing 8-bit clean Unix tools.
Changes include 8-bit Viet-Std.
vn7to8:
Filters for conversion between 8-bit Viet-Std VISCII codes and 7-bit
Viet-Std (VIQR), for printing, posting or mailing. SCV articles and
Viet-Net mails (7-bit) can be printed by converting them to 8-bit
Viet-Std. Algorithm by Học Đ. Ngô (ngo@nas.nasa.gov)
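
To make the 7-bit to 8-bit direction concrete, here is a minimal sketch of VIQR implicit composition. It is not the vn7to8 algorithm; it is an illustration under stated assumptions: the output is Unicode rather than VISCII bytes (Python ships no VISCII codec), the function name is hypothetical, and the literal state, English state and escape handling of the VIQR specification are left out. As a result, a "." or "?" that directly follows a vowel is always read as a tone mark, and every "dd" becomes d-bar.

```python
import unicodedata

# VIQR modifier -> (Unicode combining mark, base vowels that accept it)
MODIFIERS = {
    "^": ("\u0302", "aeo"),  # circumflex
    "(": ("\u0306", "a"),    # breve
    "+": ("\u031b", "ou"),   # horn
}
# VIQR tone mark -> Unicode combining mark
TONES = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "?": "\u0309",  # hook-above
    "~": "\u0303",  # tilde
    ".": "\u0323",  # dot-below
}
VOWELS = "aeiouy"


def viqr_to_unicode(text: str) -> str:
    """Compose VIQR mnemonics into precomposed Unicode letters."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        # "dd" is d-bar; "DD", "Dd" and "dD" are all capital D-bar.
        if ch in "dD" and i + 1 < n and text[i + 1] in "dD":
            out.append("\u0111" if text[i:i + 2] == "dd" else "\u0110")
            i += 2
            continue
        out.append(ch)
        i += 1
        if ch.lower() not in VOWELS:
            continue
        # Optional modifier, accepted only on a vowel that can carry it.
        if i < n and text[i] in MODIFIERS:
            mark, bases = MODIFIERS[text[i]]
            if ch.lower() in bases:
                out.append(mark)
                i += 1
        # Optional tone mark.
        if i < n and text[i] in TONES:
            out.append(TONES[text[i]])
            i += 1
    # NFC folds base letter + combining marks into single code points.
    return unicodedata.normalize("NFC", "".join(out))


print(viqr_to_unicode("Vie^.t Nam"))
```

Emitting combining marks and letting NFC normalization do the composition keeps the sketch free of a hand-written table of all letter, modifier and tone combinations.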
vnpstext:
Prints 8-bit Viet-Std file to PostScript printers. Uses PostScript
Type 3 Courier font. Both regular and bold typefaces are supported.
Defaults to font size 10; the prolog files can be modified for other
font sizes. Also available for DOS.
vietxmail:
A modified version of X Window xmail. Used together with
"vnmailtool" to allow one to read/write 8-bit Vietnamese chars in
7-bit channel mail transparently. Updates include 8-bit Viet-Std
VISCII codes. Allows choice of different mail-agents (sendmail) or
filters. Needs utilities from "vnmailtool" to work.
vnmailtool:
A set of conversion filters that, when set up properly, transparently
convert mail between 8-bit Viet-Std and 7-bit Viet-Std.
Allows choice of different mail-agents (sendmail) or filters. Can
be used with vietxmail or vietterm/vnterm. Also included are
viet7to8 and viet8to7 for conversion between 8-bit Viet-Std and
7-bit Viet-Std for posting or mailing.
vnelvis:
A modified "elvis1.4" ("vi" editor clone), tuned for 8-bit Viet-Std
& C0 space characters. Needs vietterm or vnterm. Works with most
Unix systems. Also available on DOS. Needs vietdos on DOS.
vpp8:
A modified version of vpp, a troff preprocessor, used to print 8-bit
Viet-Std characters. Original program written by Hữu, ported to C
by Nhân Trần (tran@peora.sdc.ccur.com), ported to 8-bit Viet-Std by
TriChlor. Support for Italic added. troff is standard on most Unix
systems. Scalable fonts.
vnfortune:
Same as the Unix "fortune" program. Displays Vietnamese poems, proverbs
(tục ngữ), folk verses (ca dao), ... at random. Users can add to or modify
the database, which consists of around 100 phrases contributed by SCV and
Vietnetters, most notably Nguyễn Thị Ca-Dao (ca-dao@sjsuvm1.bitnet).
Author: Quân Trần (qdt00@duts.ccc.amdahl.com)
locale.sun:
VISCII locale for SUN running OS 4.1. When installed will allow
direct use of Sun's "vi", "ls" and other tools that support locale
to work also on VISCII 8-bit Vietnamese chars.
vnless:
A modified "less" that works with 8-bit Viet-Std. A very powerful pager.
vietsc:
A modified "sc", a popular public domain Unix spreadsheet. Includes
mortgage calculation examples. May need vncurses.
vncurses:
Vietnamese capable curses library. Unix's curses from BSD 4.3 or
SVR3 is not 8 bit clean.
lambai:
A sample application for teaching Vietnamese. Teachers can generate
coded quiz in Vietnamese. Students answer in either English or
Vietnamese. Also available on IBM PC.
vietxrn:
Vietnamese capable version of xrn 6.17. For posting, reading and
replying using Vietnamese 8-bit Viet-Std. Automatic, transparent
conversion to Viet-Std 7-bit VIQR for posting and mailing. To
compose and print in 8-bit, needs vietterm or vnterm, vnelvis or
locale, and vpp8 or vnpstext. Supports both MOTIF and Athena
widgets.
2. FOR MS-WINDOWS
~~~~~~~~~~~~~~~~~
winvnkey: [NEW]
Allows entry of Vietnamese chars in Microsoft Windows applications.
Works with most existing Windows applications such as Word, Notepad,
Paintbrush, ... Viet-Std compliant.
vnfont1: [NEW]
Set of 4 Vietnamese TrueType fonts, VISCII encoding.
HeoMay (vn-helvetica)
HoangYen (vn-present-script)
MinhQuan (vn-courier)
UHoai (vn-utopia)
From these fonts, Windows can generate many point sizes and
variations such as Italic, Bold and Bold-Italic.
monmoi: [NEW]
Adds Vietnamese descriptions to icons, which the Windows File Manager
has trouble doing.
thitap1: [NEW]
Electronic poetry collection of 11 poems by Quang Dũng. Displays and
prints poems using the Windows Help facility.
3. FOR MS-DOS
~~~~~~~~~~~~~
vietdos:
Allows entry and display of Vietnamese chars. Works with most
existing software such as vi, vnelvis, pctools, xtree, MS Edit,
WordPerfect, brief, Paradox, Word, C, Pascal, ASM, ... Requires EGA
or VGA. Use vprint or vlaser to print.
vnelvis:
"elvis1.4", a "vi" editor clone, tuned for Vietnamese. This is the
same version of vnelvis on Unix, ported to DOS.
viet78:
filters for 7-bit (used by SCV, vietnet) to 8-bit Viet-Std. Also
from 8-bit Viet-Std to 7-bit.
less:
public domain "less" pager, better than "more", with environment
variable for charset tuned for 8-bit Viet-Std
lambai:
same version of lambai on Unix, ported to DOS. Dedicated to
teaching/testing using the Vietnamese language. Requires vietdos.
vlaser2:
printing 8-bit Viet-Std Vietnamese on laser printers. Supports 4
typefaces.
Author: Hòa Nguyễn (nguyenh@cod.nosc.mil)
He is also author of viet1p6, a Vietnamese word processor.
testdata.std:
A file of VISCII-1.1 8-bit Vietnamese characters for testing
software.
Author: Hòa Nguyễn (nguyenh@cod.nosc.mil)
vn-nhac: [NEW]
A colorful, menu-driven program that displays a collection of Vietnamese
songs. Currently, there are four volumes of 200 songs contributed
by SCVers and Viet-Netters.
Author: Hồ Phước Hùng (hho@aludra.usc.EDU)
butviet: [NEW]
A Vietnamese editor, Viet-Std VISCII and VIQR compliant, can be used
with CGA, Hercules, EGA, VGA. Has multi-windows and mouse support.
Author: Nguyễn Doãn Vượng (vnguyen@adobe.com)
vprint: [NEW]
VISCII 1.1 printer driver for Epson and IBM dot-matrix printer. Can
be invoked from a pipe.
Author: Nguyễn Doãn Vượng (vnguyen@adobe.com)
convert 1.3: [NEW]
Convert between many Vietnamese coding formats, including VISCII and
VIQR.
Author: Vietnamese Professional Society
(Hội Chuyên Gia Việt Nam)
Email: hcgvn@netcom.com
4. WORK IN PROGRESS
~~~~~~~~~~~~~~~~~~~
vietvu:
similar to winvnkey, can also use clipboard besides message hook.
Author: Vũ V. Châu (chauv@microsoft.com)
thitap:
a bookshelf of poetry volumes, currently holding thitap1 (Quang Dũng).
Will add thitap2 (Nguyễn Bính), thitap3 (various authors), ... Many
poems were posted by Viet-Netters and SCV-ers. Needs volunteer
editors! Contact bui@berlioz.nsc.com if interested.
5. PROPOSED PROJECTS (Need volunteers!)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TrueType fonts:
Additional fonts are needed. Should be public domain and re-
distributable.
text2speech:
to encode Vietnamese speech and implement text to speech conversion
for Vietnamese text files.
multimedia application:
vietspell:
a dictionaryless speller based on lex/yacc.
6. GETTING SOFTWARE
~~~~~~~~~~~~~~~~~~~
There is an archive site at sonygate for anonymous ftp, or if you don't have
ftp, there is a mailserver at saigon.com. Pre-compiled binaries for IBM-PC
and Sun 4/SPARC are also available via floppy disks. Binaries are compiled
under SunOS 4.1, X11R4, OpenWindows 3.0, and Motif 1.1.
6.1 Using Anonymous ftp
~~~~~~~~~~~~~~~~~~~~~~~
Here are the instructions for accessing the anonymous FTP archive.
Site name: sonygate.Sony.Com [192.65.138.2] (try 192.65.138.240 if the
first one does not work)
Login as anonymous, password is anything (please use your email name).
% mkdir VN # where you will be compiling everything
% cd VN
% ftp sonygate.sony.com
Connected to sonygate.sony.com
220 sonygate.sony.com FTP server ready.
Name (sonygate.sony.com:<your_login>): anonymous
Password (sonygate.sony.com:anonymous): <type_your_name>
331 Guest login ok, send ident as password.
230 Guest login ok, access restrictions apply.
ftp> cd tin/VN
250 CWD command successful.
ftp> type image # (or "binary")
200 Type set to I
ftp> get <file>.tar.Z
ftp> quit
If you have questions or problems accessing the archive, please contact
Tín Lê (tin@smsc.sony.com)
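
The interactive session above can also be scripted. The following is only a sketch using Python's standard ftplib: the function name and its parameters are hypothetical, the file name in the usage comment is an assumption (the session above leaves it as <file>), and the 1992 host may no longer be reachable.

```python
import os
from ftplib import FTP


def fetch_from_archive(ftp, filename, password, dest_dir=".", remote_dir="tin/VN"):
    """Download one file from the archive directory.

    `ftp` is a connected ftplib.FTP, or any object with the same
    login/cwd/retrbinary/quit methods.
    """
    # Anonymous login; the password is conventionally your email address.
    ftp.login("anonymous", password)
    ftp.cwd(remote_dir)
    local_path = os.path.join(dest_dir, filename)
    with open(local_path, "wb") as out:
        # retrbinary transfers in image (binary) mode, like "type image".
        ftp.retrbinary("RETR " + filename, out.write)
    ftp.quit()
    return local_path


# Hypothetical usage, with an assumed file name:
# fetch_from_archive(FTP("sonygate.sony.com"), "vnterm.tar.Z", "you@example.org")
```

Passing the connection in as an argument keeps the transfer steps separate from the choice of host, so the same function works against either of the two addresses listed above.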
6.2 Using Modem or MailServer at Saigon.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: tin@smsc.sony.com (Tín Lê)
Subject: [TECH] Instruction for accessing Viet files
Date: Tue, 4 Aug 92 00:19:56 GMT
The mailserver at Saigon.COM has been updated to contain the latest files from
the anonymous FTP archive at sonygate. This is for those of you without FTP
access. You can retrieve the files by sending email requests.
You may also dialup Saigon.COM directly and download the files. The front end
to it is Station Zebra, a BBS. It is located in the San Francisco Bay Area.
The phone number is 408-739-1520. Login with user id of BBS.
[ MailServer Instructions ]
Vietnet and Vietnamese Software MailServer Archive
-------------------------------------------------
This is the MailServer archive for Vietnet and Vietnamese software. It is a
duplicate of the anonymous FTP archive. The purpose of this mailserver is to
allow those without ftp access to retrieve these files.
The MailServer looks for instructions in the body of the message. It ignores
the subject line. Following are some of the instructions that it understands:
LIST directory_name
DIR directory_name
CD directory_name
GET filename
HELP
Please make sure that the email address in the header of your message is valid.
The MailServer uses it to send the files to you. I have gotten a lot of
bounced messages from those of you with unreachable email addresses.
Here is an example session:
$ mail mailserver@saigon.com
Subject: <anything here>
DIR viet.net
GET viet.net/vnterm.tar.Z
EOF (type a Control-D)
The email address of the server is mailserver@saigon.com. Some of you who are
in the UUCP domain might have problems sending mail to an Internet style
address. You could try the following bang paths:
uunet!sonyusa!szebra!mailserver
claris!szebra!mailserver
zorch!szebra!mailserver
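
For those who would rather script a request than type it, the sketch below builds a message whose body carries the MailServer instructions. It is an illustration only: the function name, the fixed subject text and the check against the five listed instructions are assumptions (the server is said to understand "some of" these, so it may accept others), and actually sending the message is left to whatever mail agent is at hand.

```python
from email.message import EmailMessage

SERVER = "mailserver@saigon.com"
# Only the instructions listed above; the server may accept others.
KNOWN = {"LIST", "DIR", "CD", "GET", "HELP"}


def build_request(sender, commands):
    """Return an EmailMessage with one instruction per body line."""
    for line in commands:
        words = line.split()
        if not words or words[0].upper() not in KNOWN:
            raise ValueError("unrecognized instruction: %r" % line)
    msg = EmailMessage()
    msg["From"] = sender          # replies are sent to this address
    msg["To"] = SERVER
    msg["Subject"] = "archive request"  # ignored by the server
    msg.set_content("\n".join(commands) + "\n")
    return msg


request = build_request(
    "user@example.org",
    ["DIR viet.net", "GET viet.net/vnterm.tar.Z"],
)
print(request.get_content())
```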
6.3 Using Modem to Phsys.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
VIETNET BBS - phsys.com (PH. Systems)
Phone No: 1-213-734-8528 (Modem: Hayes V32-V42, 8N1, 300-38400bps)
Login: NEW (then fill out questionnaire for account)
Owner: Hùng P. Hồ, hung@phsys.com (Los Angeles, California, USA)
Mail Server will be in operation starting 2nd week of Oct 92,
please mail to mail-server@phsys.com for more info
The BBS is a 24-hour public system with Vietnamese newsgroups, Job listings,
limited Usenet News, mail connections to many other nets including Internet.
Archive site of all Vietnamese public-domain software and technical documents
from Viet-Std, Vietnamese Professional Society, TriChlor, MAT/CAT, etc.
The graphics section has been updated with *.gif files from the Viet Nam War
era. You are welcome to log in and download software and images. Please leave
the owner a message or mail so that he can give you full access.
6.4 Using Post Office
~~~~~~~~~~~~~~~~~~~~~
For DOS/Windows, send $5.00 to cover the set of 2 floppy disks, and specify
your disk type (5-inch disk or 3.5-inch disk). The disks come with an
easy-to-use setup program that puts the executables in the proper place and
installs them in the proper Windows group.
For Sun Sparc/Sun4/X-Windows binaries, send $10.00 to cover a set of 2 floppy
disks containing the following programs: vietterm, vnpstext, vpp8, vietxrn,
vnelvis, locale, ... and bdf fonts.
Make check payable to Cường T. Nguyễn (acting accountant) and send to:
TriChlor Software
3388 Burgundy Drive
San Jose, CA 95132, U.S.A.
We definitely are not going to profit from this. The $5 covering mailing
expenses does not come anywhere close to representing the huge effort
everyone, named and unnamed, has put into this, both in terms of time,
keyboard stress, head strain, and out-of-pocket expenses to buy compilers,
font converters, hardware, etc. to bring VISCII/VIQR compliant software to the
Vietnamese community. If you like what we are doing and would like to help
us meet expenses---which are quite real and considerable---your contribution
in any amount is always welcome, but is never required.
[ For an experimental period, precompiled binaries for a select set of
TriChlor software will be available via anonymous FTP to Haydn.Stanford.EDU
(36.22.0.47). Although limits are not enforced, please try to schedule your
access between 7PM - 6AM Pacific time. Depending on the resulting load on the
system, this service may become permanent, off-hours, or altogether removed.
Binaries are currently available for DecStation 3100, Sun-3, and Sun-4.
Knowledgeable users with binaries compiled for these and other platforms are
encouraged to make them available to other netters. Trichlor-
binaries@Haydn.Stanford.EDU is a mailing list for that circle of volunteers
who participate in this distribution system. If you have binaries and are
interested in helping, please send a "SIGN-ON" message to
TriChlor-binaries-request@Haydn.Stanford.EDU
]
7. COPYRIGHT
~~~~~~~~~~~~
All software products are copyrighted by respective authors or the TriChlor
group. All rights reserved. You may copy, distribute, and use this software
for any purpose, as long as you do not charge a fee for doing so and you include
this copyright notice. All software is provided "as is", without express or
implied warranty.
All software is provided free of charge as copyrighted freeware (that means
you can have it for free and make as many copies for as many friends as you
like, but we reserve the copyright in order to prevent unauthorized
modifications and to prevent people from selling it to unsuspecting users---we
want to keep the freeware free).
8. ABOUT THE TRICHLOR GROUP
~~~~~~~~~~~~~~~~~~~~~~~~~~~
TriChlor is a group of volunteers whose common interest is to provide
free, high-quality utilities, code, and libraries for use with the Vietnamese
language, and to integrate the Vietnamese language into current computing
environments.
If you would like to pitch in and help, we can be contacted at
TriChlor@Haydn.Stanford.EDU
If you wish, you may get on the mailing list and participate in discussions by
sending a message to
TriChlor-talk-request@Haydn.Stanford.EDU
with the subject
"SIGN-ON".
But please be prepared to donate your time :-).
All works, unless specified, will be placed in public domain.
The TriChlor Group