eWriter's Tagsets

                    & eXtensible Markup

 

In the no longer distant future, a writer using eWriter as his or her textwriter or etypewriter may not be turning out HTML documents for the main "web" with its tens of millions of existing documents, but documents "punctuated" with an XML-defined tagset for a more closed-in situation: a company's or organization's intranet, extranet, or "overlay" on the Internet. That writer will want to adapt this textwriter (no longer a simple "text editor") to the new chore without losing all the quick tagging or punctuating.

What's an XML-defined tagset? How does this relate to the HTML tags you've been using as an advanced, semantically rich punctuation? HTML is, like the punctuation marks you learned in "grammar" school, a tagset. It's an SGML-defined tagset. SGML is an acronym that expands to Standard Generalized Markup Language. If you accept the idea that the HTML tagset is a language, then SGML is a metalanguage -- a language for defining languages. You could have, and do have, other SGML-defined tagsets used in document handling systems and have had these, I believe, for some twenty years.

XML, eXtensible Markup Language, is a lighter, faster, adapted subset-plus of SGML, and so, in the "language" jargon, a metalanguage. It's used to define markup languages, punctuation systems, or ...tagsets. It's for use on webs. Or its tagsets are. The "eXtensible" catches the nature of a generalized markup definer or metalanguage that's of most use to people writing with HTML. The HTML browser vendors have always pushed the envelope. They "add tags" that their browsers will respond to, doing things in the "interface" copies of documents that the writers can drive from their "engine" (manuscript) copies. So, we get the defined HTML moving from 2.0 to 3.2 to the current, already chafing, 4.0. Then, there's DHTML (D, Dynamic), style sheets linked in and, finally, Cascading Style Sheets (CSS), and so forth. What the "eXtensible" stands for, really, is the ability to "define your own tags, your own tagset." Of course, you have to have software that can read the tags and do what the tags ask it to do.

The "you" in that isn't the writer who is writing "using a tagset" -- obviously. You don't create tags on the fly and there'd be no point. A tagset is defined. Whether SGML or XML is doing the defining, it's done in a document type definition (DTD) document. That is usually one or more files kept separate from the documents written where the DTD is used. But, there can be "overriding" DTD-type material in a header at the top of, say, particular sorts of documents. The actual you may write DTDs and style sheets, whether CSSs or those written using XSL, or work on them with others involved in the company or organization. But, mostly, when it comes to using eWriter shortcuts, you'll be using the tagset.

The line between defining and using tagsets, when you are working with XML, isn't sharp. Typists, as distinct from typing writers, can simply use the tags that a supervisor has said to use for particular items. For instance, a typist can be told "Every time you type in anybody's email address use the <MAILTO-LINK> tag from the Tagset menu (Alt+G,3) and type the address inside it, which means between the start and end tags."

That's what the writer actually does at that point in the text, of course, even if generating the text out of his or her head. The difference is that the writer knows why and, indeed, will do it only sometimes, for some addresses in the report or eletter or whatever else.

This is what semantically rich punctuation is for: the "reason" for making a particular address a live link for the reader of the report or letter. In the example, the tag simply described what was being done. The reader does not need to do any "doubled" reading, reading of the "engine" or manuscript copy. A reader might, though, hoping the tag will tell why this address is a link if the text doesn't say why. And the value of such a tag increases if it tells why. In the company's "web" of documents it may be understood that the addresses of certain classes of people will always be live links. And a tag can be used that identifies the class of people this person (or the person's address) falls into. So a reader may see the linked address and look at the manuscript to see if the tag is <COMPANY-VIP> or <PREF-CUST> or....

Actually, as I understand it, since tag-pairs surround and define "elements," the declaration can be such that it's the name that is tagged, with the email address added in automatically as a live link in parentheses. The writer's choice point, to get this, would be in tagging the person by class. You can see, therefore, that illustrations c'n spiral out of sight into the stratosphere.

The thing to remember here, about "eXtensibly"-defined tags, is that you can have tags that initiate any action the handling software (a browser, for instance) is capable of doing and knowing that you want done. And that's about all the XML I c'n try to define here. In any case, I only know, like Will Rogers, "what I read in the papers" -- and what I c'n imagine.
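
To make that a little more concrete, here is a minimal sketch in Python of "handling software" that looks up an action by tag name. It is not what eWriter or any real browser does. The tag names COMPANY-VIP and PREF-CUST come from the example above; the EMAIL attribute, the NOTE wrapper element, and the function and table names are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def render_person(elem):
    """Show the tagged name, then the address as a live link in parentheses.

    Assumes a hypothetical EMAIL attribute carries the address.
    """
    name = elem.text or ""
    email = elem.get("EMAIL")
    if email:
        return '%s (<A HREF="mailto:%s">%s</A>)' % (name, email, email)
    return name


# Hypothetical table: which action the software takes for which tag.
HANDLERS = {
    "COMPANY-VIP": render_person,
    "PREF-CUST": render_person,
}


def render(xml_text):
    """Turn the "engine" (manuscript) copy into the "interface" copy."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        handler = HANDLERS.get(child.tag)
        # Tags the software doesn't know are shown as plain text.
        parts.append(handler(child) if handler else (child.text or ""))
        parts.append(child.tail or "")
    return "".join(parts)
```

Fed a line like <NOTE>Ask <PREF-CUST EMAIL="pat@example.com">Pat Doe</PREF-CUST> first.</NOTE>, this sketch would show the name followed by the linked address in parentheses. The point is only that the writer tags the person by class and the software decides what to do about it.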

If eWriter is going to handle XML-defined tagsets, it needs a menu that can be created or swapped in almost as if it was a boilerplate cylinder container -- but with the advantages that eWriter's menus have over boilerplate. So we have the Tagset menu.

When you first see the Tagset menu, it has three items on it. First there is a label with an ellipsis before the colon, which means this label is to be clicked on. Clicking on it gets an info-plaque "manual." The other two items are "Add Tag" and "Remove Tag". Clicking on one of these brings up an info-plaque and then an input dialog that uses the title bar and prompt to further guide the user. The user enters the content of the start tag, without the < or >. This will be the element's name and any attributes with values or, more usually, the empty pairs of quotes where values can be plugged in. Then, eWriter will make the pair of tags and store them for when the menu item is clicked. It will also put the tag pair on the menu so the user sees what's there. To emulate the punctuation mark's key, you key the menu item. This is Alt+G, # (where # is 1 to 9 or A to F).

Some tags are "empties." In the HTML tagset, <BR> is considered an empty element. There is no end tag. In any XML-defined tagset, a BR tag, if you included one, would be <BR/>. Since the Tagset menu can be used to add to your HTML menus, eWriter can take either kind of empty. To make an XML-type empty, add the / to your content. To make an HTML empty, add the > usually left off. The prompt in the input box reminds you of these cues.
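
The Add Tag behavior described above can be pictured with a small Python sketch. This is a guess at the logic, not eWriter's actual code: the function name make_tag_pair, and the idea of returning an offset for where typing would continue, are assumptions for illustration only.

```python
def make_tag_pair(content):
    """Build the text a Tagset menu slot would insert.

    content is what the user typed in the Add Tag dialog: the inside of
    the start tag, without "<" and normally without ">".
    Returns (text, offset), where offset is where typing would continue.
    """
    content = content.strip()
    if not content:
        raise ValueError("the start tag needs at least an element name")
    if content.endswith("/"):
        # XML-type empty: the user added the "/", as in BR/ -> <BR/>
        tag = "<" + content + ">"
        return tag, len(tag)
    if content.endswith(">"):
        # HTML-type empty: the user added the ">" usually left off
        tag = "<" + content
        return tag, len(tag)
    # Ordinary element: the name is the first word, the rest is attributes.
    name = content.split()[0]
    start = "<" + content + ">"
    end = "</" + name + ">"
    return start + end, len(start)
```

For instance, make_tag_pair('A HREF=""') would give the pair <A HREF=""></A> with the offset sitting between the start and end tags, while 'BR/' and 'BR>' would each give a single tag with no end tag.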

XML is set up so that everything can be very familiar to HTML "developers" (writers). Take the familiar link from HTML which would have a tag-pair like this on the menu:

      <A HREF="">|</A>

The | represents eWriter's placing of the cursor. (A boilerplate insertion leaves the cursor at the end.) You can, now, type what will be clicked or, with another tag-pair, split the line (Return) and then generate an empty line to type in with Shift+Spacebar. First, you will want to back up and put a URL in the HREF="|".

XML has very advanced link handling and an entire language specification (XLL). The "simple" link, though, is made so that it's very much like the HTML link and, if you named the element A, you could even seem to have an identical link. Some relatively rarely used attributes, like REL, REV, and TARGET, take some tricky handling for actual duplication, but those are not usually used. Still, XML's power lies in that any element can be a link along with being whatever else it is. When the element is declared and its ATTLIST set up and all (you don't have to know what that's all about here), there are a number of attributes for its link nature to be set up. But most of these will, for a given link element you will use in a tagset, be set by default or are optional and won't be used. Only HREF="" has to be set in the instance. So if you named your element A (not necessarily to tie it to HTML links, but just because you like the name A or it made semantic sense in your A, B, C system), you'd get the same tag-pair you saw in the last paragraph. Of course, it is more likely to be named QRTLY-RPT. In which case, the URL you'd plug in would be some quarter's quarterly report document.

The naming of the parts... If everybody in your group is using eWriter as the "etypewriter" or "textwriter" of choice, there could be a reason to use a general simple link, using content to explain it, and name it A. You could then use the "anchor" keying that eWriter has on Ctrl+NumPad#1 (and the HTML menu). This lets you fill in the HREF="" and the clickable content in serially put up input boxes. This would be for general or "on the fly" use and you could still have your specifically named (like QRTLY-RPT) linking element tags on the menu.

Naming an element for your writers' convenience, given their use of a particular textwriter? Uhmmmm. Well, your boss may think that's a little too much self-indulgence. He or she may wonder, out loud, if you have stock in eWriter's future. But let's look at another f'rinstance. Let's look at the famous <I>, the italics tag.

Originally, there was no <I> or <B> tag. There were <EM> and <STRONG> tags. The early browsers were of quite varied capabilities. And Lynx, at least, was bound to the old green monster's (monitor from IBM) ASCII text. Lynx screened <EM>...</EM> content as underlined text. Many writing about XML look back to those tags as illustrating the separation of formatting and semantics with formatting, finally, being left to the browser and its style sheet, whether built in or stored somewhere. <TT>, <CODE>, <PRE> all elicit monospaced type from (probably) every browser. But, the tags have different meanings and while they all converge on a single, familiar (in each of the contexts) formatting ...well, that c'n conceivably change if there is a reason, in any instance, to change it. The separation is good.

Even if <EM> was named before any browser was GUI and could present italics ...that's what American readers had in mind, of course. It's what, as writers, they used to get the voice shift that enabled their "talking through their fingers." It matches their finding *...* to be italics in the non-GUI email "plain text."

Today, in XML-defined tagsets, we're not going to go back to an <EM>. We're going to assign italics and other qualities like that to <BOOK-TITLE> or something. But, there is a writer's "on the fly" emphasizing ...in free narrative, say. So, maybe something like an <EMPHASIS-1a> which, today, most would put in italics, but, tomorrow, most might handle quite differently, like, say, blue type. Hence, that very logical but, for humans, really meaningless (in terms of their life gather of experience) label. An academic attempt to be very "logical" and very "precise" while being "general."

Uhmmmm. Remember those first creators of the <EM>. They wanted italics, which they knew from all their reading and from their published writing, if any. They knew that manuscript writers and IBM monitors used underline with a margin note to "simulate" italics. So they found a way to say "italics or what you've got to fake italics" which was, with only slight inaccuracy, "emphasis." So boldface print had to be "strong emphasis," which shortened to just STRONG as the other shortened to EM. Now...

...People using these had to keep "interpreting" in their heads. Uhmmmm. The namers were too uncertain to say "italics" and let it lie there as obvious that a browser that can't produce italic, or "oblique" type will do something else.

We can't go back and revisit that. But our "on the fly" italic tag can be <I>. Furthermore, as our society evolves in the new multimediac age ..."italic" itself may generalize, along more than one line, to various sorts of presentation much different from, perhaps more elaborate than, "slanted text." And it may not relate in any way recognizable to non-historians with any practice identified with Italians from the past.

The <I> and <B> tags have advantages. They are shorter than <EM> and particularly <STRONG> and for inline tags (that can't be dropped left to a gutter as is done in eletters) that's important. They're more like *...* of email and don't take even automatic "reading," but simply "seeing." Since "EM" was an attempt to generalize from "I" in the heads of the developers, nobody will have trouble generalizing "I" in evolving usage.

And, for the writers, using not only eWriter, but say Winword or WordPerfect or Communicator ...You get that nice, built-in ability to use Ctrl+I as you type and such extras as selecting text, hitting Ctrl+I to get that text inside the tags. True, in eWriter, the HTML tags where it makes sense can do that on clicking as well as hot keys and I haven't put that into the Tagset menu slots. But in general, every word processor and any text editor that helps with the typing will have that Ctrl+I and Ctrl+B keying. So these general tags should, forever, ...mean the voice shifting (I) and volume shifting (B) that they've meant for all of us through our reading lives. Sure, I'm biased. I want eWriter to remain a topnotch 21st century etypewriter or textwriter ...with minimal rebuilding. And that includes this note exploring the notion that, when building tagsets, you can think in terms of what tools exist whose value can be maximized with only a little thinking while naming tags. Salvaging eWriter's Ctrl+NumPad# keys (formatting pad) may be pushing it. Salvaging every word processor's Ctrl+I or Ctrl+B makes very good and generally useful sense. And the principle c'n prove handy in evaluating all sorts of very general and very unique "special cases."

Meanwhile, back at the ranch... I think adding in the second half of eWriter's XML menu and Tagset is about the best life-extending idea I've had for my poor old home made etypewriter.

Gene Fowler
July, 1998