HTML Clipboard Format


The requirements for transferring HTML text by means of the clipboard differ depending on the scenario. This article is concerned with cutting and pasting fragments of an HTML document. There may be requirements for transferring entire HTML documents through the clipboard; however, this article is driven by a requirement to transfer fragments of selected HTML text. As such, a method that requires the entire HTML document to be copied to the clipboard is more complicated than necessary.

The CF_HTML clipboard format allows a fragment of raw HTML text and its context to be stored on the clipboard as ASCII. This allows the context of the HTML fragment, which consists of all preceding and surrounding tags, to be examined by an application so that the surrounding tags of the HTML fragment can be noted with their attributes. Although it is up to an application to interpret such fragments, some basic guidelines are included here based on IE4/MSHTML implementations.

The official name of the clipboard format (the string passed to RegisterClipboardFormat) is HTML Format.

Description

CF_HTML is entirely a text format and uses UTF-8 encoding. It includes a description, an optional context, and, within the context, the fragment.

The following is an example of the clipboard contents:

    Version:0.9
    StartHTML:71
    EndHTML:170
    StartFragment:140
    EndFragment:160
    StartSelection:140
    EndSelection:160
    <!DOCTYPE>
    <HTML>
    <HEAD>
    <TITLE>The HTML Clipboard</TITLE>
    <BASE HREF="http://sample/specs"> 
    </HEAD>
    <BODY>
    <UL>
    <!--StartFragment -->
    <LI>The Fragment</LI>
    <!--EndFragment -->
    </UL>
    </BODY>
    </HTML>

The description includes the clipboard version number and the offsets that indicate where the context and the fragment start and end. The description is a list of ASCII text keywords (letter case is not significant), each followed by a colon (:) and a string value.

Keyword         Description
Version vv      Version number of the clipboard. Starting version is 0.9.
StartHTML       Bytecount from the beginning of the clipboard to the start of the context, or -1 if no context.
EndHTML         Bytecount from the beginning of the clipboard to the end of the context, or -1 if no context.
StartFragment   Bytecount from the beginning of the clipboard to the start of the fragment.
EndFragment     Bytecount from the beginning of the clipboard to the end of the fragment.
StartSelection  Bytecount from the beginning of the clipboard to the start of the selection.
EndSelection    Bytecount from the beginning of the clipboard to the end of the selection.
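
As a rough illustration of how a consumer might read the description, the following Python sketch parses the keyword lines into a dictionary. The function name `parse_description` and the lowercase dictionary keys are assumptions made for this example, and the offsets in the sample are illustrative values rather than recomputed ones.

```python
import re

def parse_description(payload: bytes) -> dict:
    """Collect the leading 'Keyword:value' lines of a CF_HTML payload."""
    fields = {}
    # Accept CR, LF, or CR/LF as the line terminator.
    for raw in re.split(rb"\r\n|\r|\n", payload):
        key, sep, value = raw.strip().partition(b":")
        # The first line that is not 'Keyword:value' ends the description.
        if not sep or not key.isalpha():
            break
        # Assumption: keywords are compared without regard to letter case.
        name = key.decode("ascii").lower()
        text = value.strip().decode("ascii")
        # Version stays a string; every other field is a byte count.
        fields[name] = text if name == "version" else int(text)
    return fields

sample = (
    b"Version:0.9\r\n"
    b"StartHTML:0000000071\r\n"
    b"EndHTML:0000000170\r\n"
    b"StartFragment:0000000140\r\n"
    b"EndFragment:0000000160\r\n"
    b"<!DOCTYPE>\r\n<HTML>\r\n</HTML>"
)
fields = parse_description(sample)
print(fields)
```

Stopping at the first line that does not look like a keyword keeps the sketch short; a stricter reader could instead stop at the StartHTML offset once it has been read.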

The StartSelection and EndSelection keywords are optional. Omit both if the application does not generate selection information.

Other information of this kind could be added here later, since the HTML starts at the StartHTML offset. For example, multiple pairs of StartFragment/EndFragment could be added later to support noncontiguous selection of fragments.

For the convenience of programs generating the bytecounts, bytecounts can be left-padded with zeros. For example, a program could initially write ten (10) zeros after each keyword (StartHTML:0000000000). Then, when the exact StartHTML bytecount is known (say, 71), the program can overwrite the trailing zeros with the bytecount (StartHTML:0000000071).
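
The padding technique can be sketched in Python as follows. Because every bytecount occupies exactly ten digits, the length of the description is known before any offset has been computed. The function name `build_cf_html` and its three parameters are hypothetical, and the sketch assumes that StartFragment points just past the `<!--StartFragment-->` comment and EndFragment just before `<!--EndFragment-->`.

```python
def build_cf_html(context_before: str, fragment: str, context_after: str) -> bytes:
    """Assemble a CF_HTML payload using fixed-width, zero-padded bytecounts."""
    template = (
        "Version:0.9\r\n"
        "StartHTML:{:010d}\r\n"
        "EndHTML:{:010d}\r\n"
        "StartFragment:{:010d}\r\n"
        "EndFragment:{:010d}\r\n"
    )
    # Fill in zeros first: the description has the same length whatever
    # the final values are, so its size can be measured up front.
    header_len = len(template.format(0, 0, 0, 0))

    before = (context_before + "<!--StartFragment-->").encode("utf-8")
    body = fragment.encode("utf-8")
    after = ("<!--EndFragment-->" + context_after).encode("utf-8")

    start_html = header_len
    start_fragment = start_html + len(before)   # assumption: just after the comment
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(after)

    # Replace the zeros with the real bytecounts.
    header = template.format(start_html, end_html, start_fragment, end_fragment)
    return header.encode("ascii") + before + body + after

payload = build_cf_html("<HTML><BODY>", "<B>hi</B>", "</BODY></HTML>")
start_html = int(payload.split(b"StartHTML:")[1][:10])
start = int(payload.split(b"StartFragment:")[1][:10])
end = int(payload.split(b"EndFragment:")[1][:10])
print(payload[start:end])
```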

The only character set supported by the clipboard is Unicode in its UTF-8 encoding. Because the first 128 characters of Unicode are encoded in UTF-8 exactly as they are in ASCII, the description is always plain ASCII, but the bytes of the context (starting at StartHTML) can contain any other characters encoded in UTF-8.
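
One practical consequence is that the offsets in the description count bytes, not characters, so the two diverge as soon as the context contains a non-ASCII character. A minimal Python sketch (the helper name `byte_offset` and the sample text are made up for this example):

```python
def byte_offset(text: str, char_index: int) -> int:
    """Convert a character index into a UTF-8 byte offset."""
    return len(text[:char_index].encode("utf-8"))

html = "<P>naïve café</P>"
char_index = html.index("café")
offset = byte_offset(html, char_index)

# The 'ï' occupies two bytes in UTF-8, so the byte offset is one
# larger than the character index from that point on.
print(char_index, offset)
```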

End of lines can be represented in a clipboard format header as CR, CR/LF, or LF.

The fragment contains valid HTML representing the area the user has selected (in order to copy, for example). This contains the selected text plus the opening tags and attributes of any element that has an end tag within the selected text, as well as end tags at the end of the fragment for any start tag included. All this information is required for basic pasting of an HTML fragment.

The fragment should be preceded and followed by the HTML comments <!--StartFragment--> and <!--EndFragment--> (no space allowed between the !-- and the text) to indicate where the fragment starts and ends. Thus the start and end of the fragment are indicated by the presence of these comments and by StartFragment and EndFragment byte counts in the description. Tools are expected to produce this information. This redundancy has been introduced to be able to rapidly find the start of the fragment (from the byte count) and mark the position of the fragment directly in the HTML tree.
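
When the byte counts are unavailable or need to be cross-checked, the fragment can also be located from the comments alone. The Python sketch below is one possible approach, not the IE4/MSHTML implementation; the function name `extract_fragment` is hypothetical. It tolerates whitespace before the closing `-->`, as in the first example above.

```python
def extract_fragment(html: bytes) -> bytes:
    """Return the bytes between the StartFragment and EndFragment comments."""
    open_at = html.find(b"<!--StartFragment")
    if open_at < 0:
        raise ValueError("no StartFragment comment found")
    close_at = html.find(b"-->", open_at)
    if close_at < 0:
        raise ValueError("StartFragment comment is not terminated")
    start = close_at + len(b"-->")
    end = html.find(b"<!--EndFragment", start)
    if end < 0:
        raise ValueError("no EndFragment comment found")
    return html[start:end]

context = (
    b"<UL>\r\n"
    b"<!--StartFragment -->\r\n"
    b"<LI>The Fragment</LI>\r\n"
    b"<!--EndFragment -->\r\n"
    b"</UL>"
)
fragment = extract_fragment(context)
print(fragment.strip())
```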

The selection indicates the exact HTML area the user has selected (in order to copy, for example). This adds more information to the fragment by indicating the exact selected text, without the opening and ending tags that have been added to ensure that the fragment is well-formed HTML.

The selection is optional, since sufficient information is included in the fragment for basic pasting. If the selection is not stored, both StartSelection and EndSelection are not stored in the header.

The context is a valid, complete HTML document. It contains the fragment and all preceding surrounding tags (start and end tags; these preceding surrounding tags represent all the parent nodes of the fragment, up to the HTML node). The context can also contain the complete HEAD element, which allows the BASE and TITLE elements, for example, to be included so that this additional information can be obtained. An application copying a fragment of HTML to the clipboard can choose to create a BASE element to include in the context if such an element is not already present, so that partial URLs in the fragment can be resolved.
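
To illustrate how a pasting application might use the BASE element, the Python sketch below reads the HREF from the context and resolves a partial URL against it with the standard library. The class and function names, and the partial URL `images/logo.gif`, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class BaseFinder(HTMLParser):
    """Remember the HREF of the first BASE element seen."""
    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")

def resolve(context_html: str, partial_url: str) -> str:
    finder = BaseFinder()
    finder.feed(context_html)
    if finder.base_href is None:
        return partial_url  # nothing to resolve against
    return urljoin(finder.base_href, partial_url)

context = (
    '<HTML><HEAD><TITLE>The HTML Clipboard</TITLE>'
    '<BASE HREF="http://sample/specs"></HEAD><BODY></BODY></HTML>'
)
resolved = resolve(context, "images/logo.gif")
print(resolved)
```

Because the base path `/specs` has no trailing slash, its last segment is replaced during resolution rather than extended.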

The context is optional, since sufficient information is included in the fragment for basic pasting of an HTML fragment to take place. If the context is not stored, only the fragment is stored, and StartHTML and EndHTML are both set to -1.
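
A reader therefore has to treat -1 as "no context" rather than as an offset. The Python sketch below shows one way to do so; the function name `split_payload` and the shape of the `offsets` dictionary are assumptions, and the sample assumes that a payload without context carries no fragment comments.

```python
def split_payload(payload: bytes, offsets: dict):
    """Return (context, fragment); context is None when it was not stored."""
    fragment = payload[offsets["StartFragment"]:offsets["EndFragment"]]
    if offsets["StartHTML"] == -1 and offsets["EndHTML"] == -1:
        context = None
    else:
        context = payload[offsets["StartHTML"]:offsets["EndHTML"]]
    return context, fragment

# Build a fragment-only sample with fixed-width fragment offsets.
body = b"<B>bold </B>"
head = b"Version:0.9\r\nStartHTML:-1\r\nEndHTML:-1\r\n"
width = len(b"StartFragment:0000000000\r\nEndFragment:0000000000\r\n")
start = len(head) + width
payload = (
    head
    + b"StartFragment:%010d\r\nEndFragment:%010d\r\n" % (start, start + len(body))
    + body
)
offsets = {
    "StartHTML": -1,
    "EndHTML": -1,
    "StartFragment": start,
    "EndFragment": start + len(body),
}
context, fragment = split_payload(payload, offsets)
print(context, fragment)
```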

Scenarios

The following scenarios describe how the IE4/MSHTML HTML editor handles HTML cutting and pasting; other applications may or may not follow these scenarios. The clipboard format described here is intended to allow flexibility for how an application chooses to function. (These scenarios show only good HTML, that is, no overlapping tags.)

  1. Simple Fragment of HTML.
    • HTML text:
      <BODY>This is normal <B>This is bold </B><I><B>This is bold italic </B></I><I>This is italic </I></BODY>
    • Appears as:
      This is normal This is bold This is bold italic This is italic
    • The text between the ** is selected and copied to the clipboard:
      This is normal This is **bold This is bold italic This** is italic
    • This is what will be on the clipboard (note this is IE4/MSHTML's interpretation):
      Version:0.9
      StartHTML:71
      EndHTML:160
      StartFragment:130
      EndFragment:150
      StartSelection:130
      EndSelection:150
      <!DOCTYPE ...>
      <HTML>
      <BODY>
      <!--StartFragment-->
      <B>bold </B><I><B>This is bold italic </B>This</I>
      <!--EndFragment-->
      </BODY>
      </HTML>
    • In this scenario, only the BODY element and the HTML element appear in the context, because they are the only elements that surround the selected fragment. Note that both start tags and end tags are included in the context. The selection, as delimited by StartSelection and EndSelection, is shown in bold.
  2. Fragment of a table in HTML.
    • HTML text:
      <BODY><TABLE BORDER><TR><TH ROWSPAN=2>Head1</TH><TD>Item 1</TD> <TD>Item 2</TD> <TD>Item 3</TD> <TD>Item 4</TD></TR><TR><TD>Item 5</TD> <TD>Item 6</TD> <TD>Item 7</TD> <TD>Item 8</TD></TR><TR><TH>Head2</TH><TD>Item 9</TD> <TD>Item 10</TD> <TD>Item 11</TD> <TD>Item 12</TD></TR></TABLE></BODY>
    • Appears as:
      Head1Item 1 Item 2 Item 3 Item 4
      Item 5 Item 6 Item 7 Item 8
      Head2Item 9 Item 10 Item 11 Item 12
    • The Item 6, Item 7, Item 10, and Item 11 elements of the table are selected as a block and copied to the clipboard. The following is an IE4/MSHTML interpretation of what will be on the clipboard.
      <!DOCTYPE ...>
      <HTML><BODY><TABLE BORDER>
      <!--StartFragment-->
      <TR><TD>Item 6</TD> <TD>Item 7</TD></TR><TR><TD>Item 10</TD> <TD>Item 11</TD></TR>
      <!--EndFragment-->
      </TABLE>
      </BODY></HTML>
      The selection, as delimited by StartSelection and EndSelection, is shown in bold.
  3. Pasting a fragment of an ordered list into plain text.
    • HTML text:
      <BODY><OL TYPE = a><LI>Item 1<LI>Item 2<LI>Item 3<LI>Item 4<LI>Item 5<LI>Item 6</OL></BODY>
    • Appears as:
      a. Item 1
      b. Item 2
      c. Item 3
      d. Item 4
      e. Item 5
      f. Item 6
    • The user selects and copies items 3 through 5 to the clipboard.
      The following HTML is in the clipboard:
      <!DOCTYPE ...><HTML><BODY><OL TYPE = a>
      <!--StartFragment-->
      <LI>Item 3<LI>Item 4<LI>Item 5
      <!--EndFragment-->
      </OL></BODY></HTML>
      The selection, as delimited by StartSelection and EndSelection, is shown in bold.
    • If this fragment is now pasted into an empty document, the following HTML will be created:
      <BODY><OL TYPE = a><LI>Item 3<LI>Item 4<LI>Item 5</OL></BODY>
    • Appearing as:
      a. Item 3
      b. Item 4
      c. Item 5
  4. Pasting a partially selected region.
    • HTML text:
      <P> IE4/MSHTML is a WYSIWYG Editor that supports :<UL><LI>Cut<LI>Copy<LI>Paste
      </UL> <P>This is a Great Tool!
    • Appears as:
      IE4/MSHTML is a WYSIWYG Editor that supports:
      • Cut
      • Copy
      • Paste
      This is a Great Tool!
    • The user selects from "WYSIWYG" until "Cop". The following HTML is in the clipboard:
      <!DOCTYPE ...><HTML><BODY>
      <!--StartFragment-->
      <P>
      WYSIWYG Editor that supports :
      <UL><LI>Cut<LI>Cop
      </UL>
      <!--EndFragment-->
      </BODY></HTML>
      The selection, as delimited by StartSelection and EndSelection, is shown in bold.
    • The user selects from "opy" until "Great". The following HTML is in the clipboard:
      <!DOCTYPE ...><HTML><BODY>
      <!--StartFragment-->
      <UL><LI>
      opy<LI>Paste</UL><P> This is a Great
      </P>
      <!--EndFragment-->
      </BODY></HTML>
      The selection, as delimited by StartSelection and EndSelection, is shown in bold.


© 1998 Microsoft Corporation. All rights reserved. Terms of use.

 
