The Unsorted BBS Collection

Text File | 1993-09-21 | 25.4 KB | 567 lines

Newly Revised and Updated Formatting Standard
for Project Galactic Guide

Revised 19930420 by Paul Clegg, with lots of information supplied by
Stephane Lussier, Tobias B Koehler, and everyone on alt.galactic-guide

Introduction:

The point of all this is to have a very, very extensive reference for programmers and editors to create and maintain the data archives for Project Galactic Guide. This extensive formatting design is necessary because the Guide will be (and already has been) ported to various computer architectures, and not all computers use the same character sets, or can handle the same type of information. In particular, the Unix systems that most of us use can only handle 7-bit ASCII for mailings, news posts, etc., so we are constrained to use the worst possible character set for our data.

This does not mean that we cannot represent alternate character sets. That was the primary reason for updating the design into an extremely complex standard in the first place. The purpose has since expanded to include various text effects, margin control, etc., that are or might be needed to properly portray specific articles.

This text should not intimidate field researchers in any way. Articles will be accepted in raw ASCII format, hand-written hardcopy, or even in text printed with a word processor package. The editors would like to encourage field researchers to use the following standard, to lighten their workload, but the hierarchy here does not at all require field researchers to use this format for their submissions.

With that aside, I now cast you into the world of 7-bit data representations...

Special Characters:

This section details all the special characters that might be used in any given article. Accompanying the name of each character is the code, the 7-bit replacement (for use if there is no better replacement in a given character set), and numerical codes for several popular character sets.
Most of the information contained within this section has been derived from Tobias B. Koehler's posting to alt.galactic-guide.

Definitions of accents:

  breve accent:    \_/   (above letter)
  acute accent:    /     (above letter)
  grave accent:    \     (above letter)
  circumflex:      /\    (above letter)
  hacek accent:    \/    (above letter)
  tilde:           ~     (above letter)
  two dots:        ..    (above letter)
  ring:            o     (above letter)
  two acute acc:   //    (above letter)
  dot:             .     (above letter)
  cedilla:         _)    (under letter)
  ogonek hook:     (_    (under letter)

Special letters:

Eth and Thorn are special Icelandic characters. The uppercase Eth looks like a slashed D, the lowercase eth looks like a horizontally flipped 6 with a slash. The uppercase Thorn looks like the upper half of a b combined with the lower half of a p. The long s looks like the f without the horizontal bar; the sharp s is a ligature of a long s and a normal s. Both are German thingies.

  code:   Textual code
  repl:   7-bit replacement to be used if character not available
  EC:     TeX Extended Computer Modern character set code
  ISO:    ISO 8859/1 (Amiga, Windows) character set code
  850:    IBM codepage 850 (MS-DOS, OS/2) character set code

Most important: To represent a backslash (which is normally an escape character to denote a special character or effect) use a double backslash: \\ inserts a single \ character.
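For programmers: a minimal sketch in Python of how a reader program might apply the double-backslash rule while scanning an article. The function name scan_backslashes and its return shape are made up for this illustration and are not part of the standard.

```python
def scan_backslashes(text):
    """Collapse doubled backslashes and locate escape-code starts.

    Returns (literal_text, code_positions): literal_text has every
    doubled backslash reduced to a single one, and code_positions
    lists the indexes in literal_text where a lone backslash begins
    an escape code. Hypothetical helper, for illustration only.
    """
    out = []              # one output character per list element
    code_positions = []
    i = 0
    while i < len(text):
        if text.startswith("\\\\", i):
            # Doubled backslash: emit one literal backslash.
            out.append("\\")
            i += 2
        elif text[i] == "\\":
            # Lone backslash: the start of some escape code.
            code_positions.append(len(out))
            out.append("\\")
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out), code_positions
```

A reader would then decode whatever code begins at each reported position; that part is left out here because the codes are only defined in the sections that follow.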
code   |repl |description                      |position
       |     |                                 |EC    |ISO |850
\ch``   "     Eng dbl left/Ger dbl right quote  16     147
\ch''   "     English double right quote        17     148
\ch,,   "     German double left quote          18     132
\ch<<   "     French double left quote          19     171  174
\ch>>   "     French double right quote         20     187  175
\ch <   `     French single left quote          14     139
\ch >   '     French single right quote         15     152
\ch--   --    long dash (as opposed to hyphen)  22     151  196
\ch r   d     degree sign                       6      176  248
\ch$$   $     paragraph or section sign         159    167  245
\%o     o/oo  promille sign                     37+24  137
\chOC   (C)   copyright sign                           169  184
\chOR   (R)   registered trademark sign                174  169
\ch=L   L     pound sterling sign               191    163  156
\chuA   A     A with breve accent               128
\ch;A   A     A with ogonek hook                129
\ch`A   A     A with grave accent               192    192  183
\ch'A   A     A with acute accent               193    193  181
\ch^A   A     A with circumflex                 194    194  182
\ch~A   A     A with tilde                      195    195  199
\ch"A   Ae    A with two dots                   196    196  142
\chrA   Aa    A with ring (ala Angstrom)        197    197  143
\chAE   AE    AE ligature                       198    198  146
\chua   a     a with breve accent               160
\ch;a   a     a with ogonek hook                161
\ch`a   a     a with grave accent               224    224  133
\ch'a   a     a with acute accent               225    225  160
\ch^a   a     a with circumflex                 226    226  131
\ch~a   a     a with tilde                      227    227  198
\ch"a   ae    a with two dots                   228    228  132
\chra   aa    a with ring                       229    229  134
\chae   ae    ae ligature                       230    230  145
\ch'C   C     C with acute accent               130
\chvC   C     C with hacek accent               131
\ch,C   C     C with cedilla                    199    199  128
\ch'c   c     c with acute accent               162
\chvc   c     c with hacek accent               163
\ch,c   c     c with cedilla                    231    231  135
\chvD   D     D with hacek accent               132
\ch-D   D     slashed D or Eth (\chEt)          208    208  209
\ch-d   d     slashed d                         158
\chet         eth (\chet)                       240    240  208
\chvE   E     E with hacek accent               133
\ch;E   E     E with ogonek hook                134
\ch`E   E     E with grave accent               200    200  212
\ch'E   E     E with acute accent               201    201  144
\ch^E   E     E with circumflex                 202    202  210
\ch"E   E     E with two dots                   203    203  211
\chve   e     e with hacek accent               165
\ch;e   e     e with ogonek hook                166
\ch`e   e     e with grave accent               232    232  138
\ch'e   e     e with acute accent               233    233  130
\ch^e   e     e with circumflex                 234    234  136
\ch"e   e     e with two dots                   235    235  137
\chuG   G     G with breve accent               135
\chug   g     g with breve accent               167
\ch.I   I     I with dot                        157
\ch`I   I     I with grave accent               204    204  222
\ch'I   I     I with acute accent               205    205  214
\ch^I   I     I with circumflex                 206    206  215
\ch"I   I     I with two dots                   207    207  216
\ch i   i     dotless i                         25          213
\ch`i   i     i with grave accent               236    236  141
\ch'i   i     i with acute accent               237    237  161
\ch^i   i     i with circumflex                 238    238  140
\ch"i   i     i with two dots                   239    239  139
\ch j   j     dotless j                         26
\ch'L   L     L with acute accent               27
\ch-L   L     slashed L                         138
\ch'l   l     l with acute accent               168
\ch-l   l     slashed l                         169
\ch'N   N     N with acute accent               139
\chvN   N     N with hacek accent               140
\chNJ   Nj    NJ ligature                       141
\ch~N   N     N with tilde                      209    209  165
\ch'n   n     n with acute accent               170
\chvn   n     n with hacek accent               171
\chnj   nj    nj ligature                       173
\ch~n   n     n with tilde                      241    241  164
\chhO   Oe    O with two acute accents          142
\ch`O   O     O with grave accent               210    210  227
\ch'O   O     O with acute accent               211    211  224
\ch^O   O     O with circumflex                 212    212  226
\ch~O   O     O with tilde                      213    213  229
\ch"O   Oe    O with two dots                               153
\chOE   OE    OE ligature                       215    140
\ch/O   Oe    slashed O                         216    216  157
\chho   oe    o with two acute accents          174
\ch`o   o     o with grave accent               242    242  149
\ch'o   o     o with acute accent               243    243  162
\ch^o   o     o with circumflex                 244    244  147
\ch~o   o     o with tilde                      245    245  228
\ch"o   oe    o with two dots                               148
\choe   oe    oe ligature                       247    156
\ch/o   oe    slashed o                         248    248  155
\ch'R   R     R with acute accent               143
\chvR   R     R with hacek accent               144
\ch'r   r     r with acute accent               175
\chvr   r     r with hacek accent               176
\ch'S   S     S with acute accent               145
\chvS   S     S with hacek accent               146    138
\ch,S   S     S with cedilla                    147
\ch's   s     s with acute accent               177
\chvs   s     s with hacek accent               178    154
\ch,s   s     s with cedilla                    179
\chss   ss    sharp s                           255    223  225
\chls   s     long s
\chvT   T     T with hacek accent               148
\ch,T   T     T with cedilla                    149
\chTh         Thorn                             222    222  232
\ch,t   t     t with cedilla                    181
\chth         thorn                             254    254  231
\chhU   UE    U with two acute accents          150
\chrU   U     U with ring                       151
\ch`U   U     U with grave accent               217    217  235
\ch'U   U     U with acute accent               218    218  233
\ch^U   U     U with circumflex                 219    219  234
\ch"U   Ue    U with two dots                   220    220  154
\ch.U   U     U with dot
\chhu   ue    u with two acute accents          182
\chru   u     u with ring                       183
\ch`u   u     u with grave accent               249    249  151
\ch'u   u     u with acute accent               250    250  163
\ch^u   u     u with circumflex                 251    251  150
\ch"u   ue    u with two dots                   252    252  129
\ch.u   u     u with dot
\ch"Y   Y     Y with two dots                   152
\ch'Y   Y     Y with acute accent               221    221  237
\ch"y   y     y with two dots                   184         152
\ch'y   y     y with acute accent               253    253  236
\ch'Z   Z     Z with acute accent               153
\chvZ   Z     Z with hacek accent               154
\ch.Z   Z     Z with dot                        155
\ch'z   z     z with acute accent               185
\chvz   z     z with hacek accent               186
\ch.z   z     z with dot                        187

NOTE: The following information was mostly picked out of one of Stephane Lussier's numerous informative posts.

The following are REALLY special characters that are usually only used in special circumstances, such as mathematical texts. I do not have the resources to research the characters in the various character sets, so in this case, the character code is followed by the 7-bit ASCII representation and a short explanation.
Greek Characters:

code   |repl |description
\Galp   a     lowercase alpha
\GALP   A     uppercase alpha
\Gbet   b     lowercase beta
\GBET   B     uppercase beta
\Ggam   g     lowercase gamma
\GGAM   G     uppercase gamma
\Gdel   d     lowercase delta
\GDEL   D     uppercase delta
\Geps   e     lowercase epsilon
\GEPS   E     uppercase epsilon
\Gzet   z     lowercase zeta
\GZET   Z     uppercase zeta
\Geta   h     lowercase eta
\GETA   H     uppercase eta
\Gthe   o     lowercase theta
\GTHE   O     uppercase theta
\Giot   i     lowercase iota
\GIOT   I     uppercase iota
\Gkap   k     lowercase kappa
\GKAP   K     uppercase kappa
\Glam   l     lowercase lambda
\GLAM   L     uppercase lambda
\G*mu   m     lowercase mu
\G*MU   M     uppercase mu
\G*nu   n     lowercase nu
\G*NU   N     uppercase nu
\G*xi   x     lowercase xi
\G*XI   X     uppercase xi
\Gomi   o     lowercase omicron
\GOMI   O     uppercase omicron
\G*pi   pi    lowercase pi
\G*PI   PI    uppercase pi
\Grho   p     lowercase rho
\GRHO   P     uppercase rho
\Gsig   s     lowercase sigma
\GSIG   S     uppercase sigma
\Gtau   t     lowercase tau
\GTAU   T     uppercase tau
\Gups   u     lowercase upsilon
\GUPS   U     uppercase upsilon
\Gphi   o     lowercase phi
\GPHI   O     uppercase phi
\Gchi   x     lowercase chi
\GCHI   X     uppercase chi
\Gpsi   y     lowercase psi
\GPSI   Y     uppercase psi
\Gome   w     lowercase omega
\GOME   W     uppercase omega

Note: Some 7-bit representations have been duplicated. From a programming standpoint, it's probably preferable to replace the symbol with its full name (sans upper/lowercase), since the 7-bit letters often don't correspond well to the real characters.
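A rough Python sketch of the full-name replacement suggested in the note above. The table GREEK_NAMES, the function greek_to_names, and the choice to leave unknown \G codes untouched are assumptions for illustration only; the standard does not prescribe any of them.

```python
import re

# Three-character key that follows "\G" in each code, mapped to the
# letter's full name. Keys are stored in lowercase so that one entry
# serves both the uppercase and the lowercase code.
GREEK_NAMES = {
    "alp": "alpha", "bet": "beta", "gam": "gamma", "del": "delta",
    "eps": "epsilon", "zet": "zeta", "eta": "eta", "the": "theta",
    "iot": "iota", "kap": "kappa", "lam": "lambda", "*mu": "mu",
    "*nu": "nu", "*xi": "xi", "omi": "omicron", "*pi": "pi",
    "rho": "rho", "sig": "sigma", "tau": "tau", "ups": "upsilon",
    "phi": "phi", "chi": "chi", "psi": "psi", "ome": "omega",
}

# Match a doubled backslash first so that an escaped backslash followed
# by "Galp" is not mistaken for a Greek code.
_GREEK_RE = re.compile(r"\\\\|\\G([*A-Za-z]{3})")


def greek_to_names(text):
    """Replace each known Greek code in text with the letter's name."""
    def substitute(match):
        key = match.group(1)
        if key is None:
            # Doubled backslash: leave it for the general decoder.
            return match.group(0)
        # Unknown codes are passed through unchanged (an assumption).
        return GREEK_NAMES.get(key.lower(), match.group(0))
    return _GREEK_RE.sub(substitute, text)
```

Using names rather than the repl column sidesteps the duplicates the note mentions, such as theta, omicron, and phi all falling back to "o".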
Mathematical Characters:

code   |repl  |description
\M**8   oo     infinity
\M*+-   +-     plus over minus
\MNOT   -      negation character (horizontal bar w/ short vertical bar on left)
\M*lv   V      logic: OR
\M(+)   (+)    logic: XOR (Exclusive OR)
\M(/)   0      empty set notation
\M*|^   v      logic: NOR (down arrow type of thing)
\M-->   -->    implication
\M-/>   -/->   "does not imply"
\M<--   <--    implication
\M</-   <-/-   "does not imply"
\M<->   <-->   double implication
\M</>   <-/->  "there is no double implication"
\M==>   ==>    implication
\M=/>   =/=>   "does not imply"
\M<==   <==    implication
\M</=   <=/=   "does not imply"
\M<=>   <==>   equivalence
\M</>   <=/=>  "there is no equivalence"
\M*-=   =      congruence (three horizontal bars)
\M/-=   !=     not congruent
\M*/=   !=     not equal (slashed equal sign)
\M**~   ~      is equivalent to
\M*~-   ~-     isomorphism (tilde over single bar)
\M*~~   ~=     approximately equals (two stacked wavy lines)
\M*~=   =      wavy line over equal sign
\M*)(          asymptotic (upcurve over downcurve)
\M*||   ||     two parallel lines
\M*rA          upturned A, "for all"
\M*rE          reversed E, "there exists"
\M/rE          slashed reversed E, "there does not exist"
\M*.:          three dots in triangle, "therefore"
\M**U   U      union
\M*rU          intersection (overturned U)
\M**E          "is an element of"
\M*/E          "is not an element of"
\M**C   C      "is a subset of"
\M*/C   !C     "is not a subset of"
\M**X   X      Cartesian product sign
\M**|   |      Full vertical bar for absolute values, etc.
\M*/|   !|     Does not divide (vertical bar w/ slash)
\M**o   o      Composition (small circle)
\M**.   *      Product (small point)
\M**>          Derivable, right pointing hollow triangle
\M**<          Normal subgroup notation, left pointing hollow triangle
\M**%          Division sign (circle over and below horizontal line)
\M*>=   >=     Greater than or equal to
\M/>=   !>=    Not greater than or equal to
\M*<=   <=     Less than or equal to
\M/<=   !<=    Not less than or equal to
\Mint          Integration sign
\Mont          Integration sign with small circle on it
\M**'   '      Prime
\M**"   "      Double prime
\M*'"   '"     Triple prime (etc. up to \M""", sextuple prime)

Formatting Effects:

The following sections include various special text effects and devices to allow various platforms to display various things in special formats. Since monospaced ASCII has been shown to not work very well, particularly with varying display widths, it is impossible to relegate text formatting to the ASCII dump.

Many of the ideas within this section have been taken straight from Stephane Lussier's post(s), though everyone's posts have influenced the end result you see here.

Text Effects:

Text effects are things such as bold, italic, superscript, subscript, underline, and other visual effects that may be applied to text to make it more visually appealing, clear, and informative.

All format controls are denoted by a backslash, a code (usually four letters), and a left curly brace ("{"). These sections are terminated by a right curly brace ("}"). The text that should have the given effect goes between the two curly braces. Because there may easily be a reason to have a right curly brace in the text, a literal right curly brace is written as \}, to indicate that it is not part of the text coding. There is no reason for an alternate marker for left curly braces.

  Bold:               \bold{ <text> }
  Italic:             \ital{ <text> }
  Underlined:         \undl{ <text> }
  Double Underlined:  \dund{ <text> }
  Subdued:            \subd{ <text> }
  Flashing:           \flsh{ <text> }
  Subscript:          \subs{ <text> }
  Superscript:        \sups{ <text> }

Effects primarily used in mathematics:

  Overlined:                              \ovrl{ <text> }
  Right Arrow Over Expression (vector):   \raro{ <text> }
  Left Arrow Over Expression:             \laro{ <text> }
  Hat Over Expression:                    \mhat{ <text> }
    Note: <text> here must be a single character.

NOTE: Very intricate mathematical formatting instructions may eventually be included in this standard, but they are not being included in this version. For programmers writing code: if you come across the \MATH{ <text> } escape code sequence, ignore it all.
This will allow reader programs written to this format to handle the only major expansion to this format that I foresee in the future, or at least not barf if they come across an article with the expanded math features.

Addendum: You WILL have to check that all the curly braces are matched within the \MATH structure, in order to figure out where the \MATH structure ends. Within MATH structures, \{ and \} indicate curly braces with no escape codes attached (and thus don't affect the stack of braces).

Standard Structure:

The body of every article is organized into sections. For instance, should this become an entry, this paragraph is considered a section. A table would have to be used for the character codes above, and that would be another section. In this case even the subtitles (such as "Standard Structure:") would be separate sections. Whether or not sections should be separated by blank lines is optional, and may be left to a user-defined option, or programmer's choice; the ruling is not made here.

Text formatting codes (such as underline, etc. as listed above) should be reset to default in between sections. If a text style is to be continued into the next section, the proper codes must be re-applied within that section's curly braces.

Paragraphs:

The most common type of section should be the standard paragraph. A paragraph is denoted by \para{ followed by all the text that should go into that paragraph. The paragraph must be terminated by an ending }. Escape codes are allowed in paragraphs provided they are not section codes. You cannot embed paragraphs inside other paragraphs, nor can you embed matrices, lists, etc. within paragraphs.

An example paragraph:

\para{This is an example paragraph. Other than the initial escape code, and the ending curly brace, and any required escape codes within this text, this text should be completely \bold{ASCII}. For electronic mail transmission purposes, the length of a line should not be more than 78 characters in width, and lines of less than 76 characters are appreciated. Because the end of a paragraph is only when a \} is found, the reader programs can wrap text on their own, and so the EOL can be relatively ignored. Do \bold{NOT} hyphenate words.}

Individual Lines:

Often individual lines are wanted or required, particularly for things such as subsection headers, and so on. An individual line is still considered a section, and as such should leave a blank line after it. However, single lines are much more flexible than paragraphs in most respects, and there are actually several types of individual lines that may be employed in an article.

Justification:

Single lines may be justified in any one of three ways: left, right, and center. The codes for this are, respectively, \jstl{ }, \jstr{ }, and \cntr{ }.

Preformat:

A single line may be dictated as being preformatted, or absolute, where the reader should accept the text as being formatted for a 75-column display and should not try to "play" with the text involved. This is included only for those rare problems, and should not be used if at all possible. The escape code is \PREF{ <text> }. Textual effects may still be applied to the text contained in a preformatted line, but spacing should not be toyed with by the reader program.

Special Effects:

A single line allows us some freedom in other ways, too. Inserting a \. into a single line inserts a line feed, such that the text should drop to the same column, but the next row. This may be accomplished almost as easily, if not more easily, by simply using several preformat commands.

Internal Passages:

Long quotes should be treated as a special case, different from a standard paragraph. Text enclosed in the \quot{} formatting code should be treated as a normal paragraph, but it should be indented on both sides when displayed.
For an 80-column text screen, a five-space indent on both sides is suggested.

Lists:

Lists are obviously used for lists of information, which may be any number of things. The list command, however, also works for outline designs, which are basically a specialized list design. There are several types of lists, and all of them may be nested within each other, with the one exception of the military notation list (see below).

In any case, an element in a list should be offset from the left margin by some number of characters; for an 80-column display, the suggested indent space is 10 characters. Text that wraps around a display should be indented so as to line up with the first character of the actual text, and not just with the first digit of the element identifier. Sublists, or lists embedded in other lists, should be indented again.

For all lists, the list type is used only to determine the type of list. Each element in the list must be contained in a \item{} field.

Arabic Number List:

This is your basic list, with elements numbered 1, 2, 3, etc. The escape code for this type of list is \LSAx { ... }, where x is the character that follows the number (see below).

Lowercase Letter List:

This uses the alphabet to denote its elements. The first element will be marked by "a", the next by "b", etc. There may NOT be more than 26 elements in a letter list. The escape code is \LSlx { ... }.

Uppercase Letter List:

Exactly like the \LSlx { ... } list type, but using uppercase letters instead. The escape code is \LSLx { ... }, and it too is restricted to 26 or fewer elements.

Lowercase Roman List:

Uses lowercase Roman numerals, i, ii, iii, iv, etc. The escape code is \LSrx { ... }.

Uppercase Roman List:

Uses uppercase Roman numerals, I, II, III, IV, etc. The escape code is \LSRx { ... }.

No Identifier List:

This does not use any number or character to differentiate between elements. The escape code is \LS_x { ... }, which allows the author to still use the special characters listed below to mark elements.

Military Notation List:

This is a tricky one. Only Military Notation Lists may be nested within Military Notation Lists. The identifying numbers are in Arabic numerals (i.e. decimal), but also show the hierarchy of the list itself. The reader program must run through the list and determine how deep the sublists embedded in the list go, as each number must be expanded to show this. Thus, if you have a list that has a sublist inside it, and that sublist has yet another sublist, the numbers must be expanded to three places, so the very first element would be 1.0.0, the second element would be 2.0.0, etc., but the sublist off the first element would have 1.1.0 for its first element. The first element off the first sublist of the first sublist would be 1.1.1. If sublists are nested five deep, the very first number would be 1.0.0.0.0, but if they are nested only two deep, the first number would be 1.0. The escape code for this type of list is \LSMN { ... }.

Separator Characters:

With the exception of the Military Notation List, all the lists have one space in their command for a single character. This character must be chosen from the following list:

  .   Uses a period after the list identifier.
  ,   Uses a comma after the list identifier.
  :   Uses a colon after the list identifier.
  -   Uses a dash after the list identifier.
  )   Uses a right parenthesis after the list identifier.
  _   Puts nothing after the list identifier.
  >   Puts an arrow after the list identifier.
  *   Puts a bullet after the list identifier.

Matrices:

There have been several suggestions for matrices, but I have yet to figure out exactly how to implement them. A matrix will be given the escape code \MTRX { ... }, so until a matrix standard is produced, ignore the matrices.

Conclusion:

This is the first Really Big Galactic Guide Format in the Guide's history.
Undoubtedly, there are many problems with what I've put together here, and I've almost certainly left things out. But that's what revisions are all about. With this standard, however, the use of escape codes allows for future expansion very easily, and any revisions will most likely not be of such a large scale.

I want to take this time here to thank everyone who actually put more than thirty seconds of thought into this project, and especially everyone who stuck with the project from the very beginning. And a really big hand to all the programmers who've created Guide readers, cuz they're really going to be pissed when they try to program for this monstrosity!

...Paul
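Postscript for programmers: the Military Notation List numbering described earlier can be sketched in Python as follows. The nested (text, sublist) representation and the names list_depth and military_labels are assumptions made for this example; a real reader would build that structure from the \LSMN and \item codes, which this sketch does not parse.

```python
def list_depth(items):
    """Return how many levels deep a list of (text, sublist) pairs goes."""
    if not items:
        return 0
    return 1 + max(list_depth(sublist) for _, sublist in items)


def military_labels(items):
    """Return (label, text) pairs in display order.

    Each label has one number per nesting level of the whole list,
    padded on the right with zeros, e.g. 1.0.0, 1.1.0, 1.1.1, 2.0.0.
    """
    places = list_depth(items)   # first pass: find the deepest nesting
    labelled = []

    def walk(nodes, prefix):
        for number, (text, sublist) in enumerate(nodes, start=1):
            path = prefix + [number]
            padded = path + [0] * (places - len(path))
            labelled.append((".".join(str(n) for n in padded), text))
            walk(sublist, path)  # sublist elements follow their parent

    walk(items, [])
    return labelled


# Hypothetical article outline: one element with a two-level sublist.
outline = [
    ("Introduction", [("Scope", [("Exclusions", [])])]),
    ("Conclusion", []),
]
for label, text in military_labels(outline):
    print(label, text)
```

The two passes mirror the text: the depth has to be known before the first label can be written, because even the top-level numbers carry the trailing zeros.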