The requirements for transferring HTML text by means of the clipboard differ depending on the scenario. This article is concerned with cutting and pasting fragments of an HTML document. There may be requirements for transferring entire HTML documents through the clipboard; however, this article is driven by a requirement to transfer fragments of selected HTML text. As such, a method that requires the entire HTML document to be copied to the clipboard is more complicated than necessary.
The CF_HTML clipboard format allows a fragment of raw HTML text and its context to be stored on the clipboard as text (an ASCII description followed by UTF-8 encoded HTML). The context of the HTML fragment, which consists of all preceding and surrounding tags, can then be examined by an application so that the tags surrounding the fragment, and their attributes, can be noted. Although it is up to an application to interpret such fragments, some basic guidelines are included here based on IE4/MSHTML implementations.
The official name of the clipboard format (the string used by RegisterClipboardFormat) is HTML Format.
CF_HTML is entirely a text format and uses UTF-8. It includes a description, an optional context, and, within the context, the fragment.
The following is an example of a clipboard:
    Version:0.9
    StartHTML:71
    EndHTML:170
    StartFragment:140
    EndFragment:160
    StartSelection:140
    EndSelection:160
    <!DOCTYPE>
    <HTML>
    <HEAD>
    <TITLE>The HTML Clipboard</TITLE>
    <BASE HREF="http://sample/specs">
    </HEAD>
    <BODY>
    <UL>
    <!--StartFragment -->
    <LI>The Fragment</LI>
    <!--EndFragment -->
    </UL>
    </BODY>
    </HTML>
The description includes the clipboard version number and offsets, indicating where the context and the fragment start and end. The description is a list of ASCII text keywords (case is not significant), each followed by a colon (:) and a string value.
Keyword | Description |
---|---|
Version vv | Version number of the clipboard. Starting version is 0.9. |
StartHTML | Bytecount from the beginning of the clipboard to the start of the context, or -1 if no context. |
EndHTML | Bytecount from the beginning of the clipboard to the end of the context, or -1 if no context. |
StartFragment | Bytecount from the beginning of the clipboard to the start of the fragment. |
EndFragment | Bytecount from the beginning of the clipboard to the end of the fragment. |
StartSelection | Bytecount from the beginning of the clipboard to the start of the selection. |
EndSelection | Bytecount from the beginning of the clipboard to the end of the selection. |
The StartSelection and EndSelection keywords are optional. If the application does not generate selection information, both keywords must be omitted.
Other information of this kind could be added here later, since the HTML starts at the StartHTML offset. For example, multiple pairs of StartFragment/EndFragment could be added later to support noncontiguous selection of fragments.
For the convenience of the programs generating the bytecounts, bytecounts can be left-padded with zeros. For example, a program generating the bytecounts could initially assign ten (10) zeros to each keyword (StartHTML: 0000000000). Then, when the exact StartHTML bytecount is known (say, 71), the program can overwrite the trailing zeros with the bytecount (StartHTML: 0000000071).
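The padding trick can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `build_cf_html`, the choice of CR/LF line endings, the omission of a space after the colon, and the placement of the fragment offsets immediately after and before the marker comments are all assumptions made for the example.

```python
# Sketch: build a CF_HTML payload with fixed-width, zero-padded bytecounts.
# All names here are illustrative; they are not defined by the format.

START_MARK = "<!--StartFragment-->"
END_MARK = "<!--EndFragment-->"

# Every bytecount is written as exactly ten digits, so the length of the
# description is known before any real offset has been computed.
HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{start_html:010d}\r\n"
    "EndHTML:{end_html:010d}\r\n"
    "StartFragment:{start_frag:010d}\r\n"
    "EndFragment:{end_frag:010d}\r\n"
)


def build_cf_html(prefix: str, fragment: str, suffix: str) -> bytes:
    """Return description + context, where prefix/suffix surround the fragment."""
    # First pass: format with zeros only to measure the description.
    header_len = len(
        HEADER_TEMPLATE.format(
            start_html=0, end_html=0, start_frag=0, end_frag=0
        ).encode("ascii")
    )

    # Offsets are byte counts, so measure the UTF-8 encoded pieces.
    before = (prefix + START_MARK).encode("utf-8")
    body = fragment.encode("utf-8")
    after = (END_MARK + suffix).encode("utf-8")

    start_html = header_len
    start_frag = start_html + len(before)
    end_frag = start_frag + len(body)
    end_html = end_frag + len(after)

    # Second pass: the real values replace the zeros without changing length.
    header = HEADER_TEMPLATE.format(
        start_html=start_html,
        end_html=end_html,
        start_frag=start_frag,
        end_frag=end_frag,
    )
    return header.encode("ascii") + before + body + after
```

Because the width of every bytecount is fixed, the description can be measured once and filled in afterwards without shifting any of the offsets it records.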
The only character set supported by the clipboard is Unicode in its UTF-8 encoding. Because UTF-8 encodes the ASCII characters exactly as ASCII does, the description is always ASCII, but the bytes of the context (starting at StartHTML) can contain any other characters encoded in UTF-8.
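One practical consequence is that the offsets in the description count bytes, not characters, so the two diverge as soon as the context contains a non-ASCII character. The short Python sketch below illustrates the difference; the helper name `byte_offset` is hypothetical and exists only for this example.

```python
# Sketch: convert a character index into a UTF-8 byte offset.
# byte_offset is an illustrative helper, not part of any clipboard API.

def byte_offset(text: str, char_index: int) -> int:
    """Return the number of UTF-8 bytes that precede text[char_index]."""
    return len(text[:char_index].encode("utf-8"))


ascii_text = "<LI>cafe</LI>"
accented_text = "<LI>caf\u00e9</LI>"  # same length in characters

# For pure ASCII text, character and byte positions coincide.
ascii_end = byte_offset(ascii_text, len(ascii_text))

# The accented letter occupies two bytes in UTF-8, so every position
# after it is shifted by one byte.
accented_end = byte_offset(accented_text, len(accented_text))
```

Here `ascii_end` is 13 while `accented_end` is 14, even though both strings are 13 characters long.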
End of lines can be represented in a clipboard format header as CR, CR/LF, or LF.
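A reader of the format therefore has to accept all three line endings. The following Python sketch shows one way to parse the description under that rule. The function name `parse_description`, the lower-casing of keywords, and the decision to keep unrecognized values as strings are assumptions made for the example, not requirements of the format.

```python
# Sketch: parse the CF_HTML description, accepting CR, CR/LF, or LF.
import re

# One "keyword:value" line followed by any of the three line endings.
_LINE = re.compile(rb"([A-Za-z]+):([^\r\n]*)(?:\r\n|\r|\n)")


def parse_description(data: bytes) -> dict:
    """Return the description fields keyed by lower-cased keyword."""
    fields = {}
    pos = 0
    limit = len(data)
    while pos < limit:
        match = _LINE.match(data, pos)
        if match is None:
            break  # not a keyword line, so the description has ended
        key = match.group(1).decode("ascii").lower()
        value = match.group(2).decode("ascii").strip()
        try:
            fields[key] = int(value)  # bytecounts, possibly zero-padded
        except ValueError:
            fields[key] = value  # for example the version string
        pos = match.end()
        # The HTML starts at StartHTML, so never read keywords past it.
        if key == "starthtml" and fields[key] >= 0:
            limit = min(limit, fields[key])
    return fields
```

Stopping at the StartHTML offset leaves room for keywords added in later versions while ensuring that text inside the context is never mistaken for a keyword.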
The fragment contains valid HTML representing the area the user has selected (in order to copy, for example). This contains the selected text plus the opening tags and attributes of any element that has an end tag within the selected text, as well as end tags at the end of the fragment for any start tag included. All this information is required for basic pasting of an HTML fragment.
The fragment should be preceded and followed by the HTML comments <!--StartFragment--> and <!--EndFragment--> (no space allowed between the !-- and the text) to indicate where the fragment starts and ends. Thus the start and end of the fragment are indicated by the presence of these comments and by StartFragment and EndFragment byte counts in the description. Tools are expected to produce this information. This redundancy has been introduced to be able to rapidly find the start of the fragment (from the byte count) and mark the position of the fragment directly in the HTML tree.
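The redundancy between the byte counts and the comments can be exploited by a consumer as a consistency check. The Python sketch below extracts the fragment by byte count and reports whether the marker comments sit exactly at those positions. The name `extract_fragment` and the strict "markers must be adjacent" test are assumptions for the example; a producer that places whitespace between a comment and the fragment would simply be reported as not agreeing.

```python
# Sketch: read the fragment by byte count and cross-check the comments.
# extract_fragment is an illustrative name, not part of any clipboard API.

START_MARK = b"<!--StartFragment-->"
END_MARK = b"<!--EndFragment-->"


def extract_fragment(data: bytes, start_fragment: int, end_fragment: int):
    """Return (fragment_text, markers_agree) for the given byte counts."""
    if not 0 <= start_fragment <= end_fragment <= len(data):
        raise ValueError("fragment byte counts are out of range")

    # Fast path described in the text: slice directly by byte count.
    fragment = data[start_fragment:end_fragment].decode("utf-8")

    # Redundancy check: the start comment should end where the fragment
    # begins, and the end comment should begin where the fragment ends.
    markers_agree = (
        data.endswith(START_MARK, 0, start_fragment)
        and data.startswith(END_MARK, end_fragment)
    )
    return fragment, markers_agree
```

A caller could fall back to searching for the comments when `markers_agree` is false, on the assumption that the byte counts were computed incorrectly by the producing tool.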
The selection indicates the exact HTML area the user has selected (in order to copy, for example). This adds more information to the fragment by indicating the exact selected text, without the opening and ending tags that have been added to ensure that the fragment is well-formed HTML.
The selection is optional, since sufficient information is included in the fragment for basic pasting. If the selection is not stored, neither StartSelection nor EndSelection is stored in the description.
The context is a valid, complete HTML document. It contains the fragment and all preceding and surrounding tags (start and end tags); these tags represent all the parent nodes of the fragment, up to the HTML node. The context also contains the complete HEAD element, which allows the BASE and TITLE elements, for example, to be included so that this additional information can be obtained. An application copying a fragment of HTML to the clipboard can choose to create a BASE element to include in the context if such an element is not already present, so that partial URLs in the fragment can be resolved.
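As an illustration of why the BASE element matters, the Python sketch below resolves a partial URL from a fragment against the BASE found in its context. It is a rough sketch only: the helper name `resolve_against_context` is hypothetical, and the regular expression handles just the simple double-quoted form of the attribute, where a real consumer would more likely use an HTML parser.

```python
# Sketch: resolve a partial URL using the BASE element of the context.
import re
from urllib.parse import urljoin

# Assumption: HREF is the first attribute and is double-quoted.
_BASE = re.compile(r'<BASE\s+HREF\s*=\s*"([^"]*)"', re.IGNORECASE)


def resolve_against_context(context: str, partial_url: str) -> str:
    """Return partial_url made absolute using the context's BASE, if any."""
    match = _BASE.search(context)
    if match is None:
        return partial_url  # no BASE available, leave the URL untouched
    return urljoin(match.group(1), partial_url)


context = (
    "<HTML><HEAD><TITLE>The HTML Clipboard</TITLE>"
    '<BASE HREF="http://sample/specs"></HEAD>'
    "<BODY><UL><LI>The Fragment</LI></UL></BODY></HTML>"
)
resolved = resolve_against_context(context, "images/logo.gif")
```

With the BASE from the example clipboard, `resolved` is `http://sample/images/logo.gif`; without a BASE element the partial URL is returned unchanged, which is the situation a copying application avoids by adding one.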
The context is optional, since sufficient information is included in the fragment for basic pasting of an HTML fragment to take place. If the context is not stored, only the fragment is stored, and StartHTML and EndHTML are both set to -1.
The following scenarios describe how the IE4/MSHTML HTML editor handles HTML cutting and pasting; other applications may or may not follow these scenarios. The clipboard format described here is intended to allow flexibility for how an application chooses to function. (These scenarios show only good HTML, that is, no overlapping tags.)
Head1 | Item 1 | Item 2 | Item 3 | Item 4 |
---|---|---|---|---|
Item 5 | Item 6 | Item 7 | Item 8 | |
Head2 | Item 9 | Item 10 | Item 11 | Item 12 |
© 1998 Microsoft Corporation. All rights reserved. Terms of use.