Channel Definition Format (CDF)

Version 1.01

Updated: April 1, 1998

Related documentation: Open Software Description (OSD) Specification
CDF Reference

Author: Castedo Ellerman, Microsoft Corporation, castedoe@microsoft.com

Specification changes since CDF 1.0 (Sept 30, 1997):

The ENDDATE attribute is obsoleted by the STOPDATE attribute.

Specification changes since July 14, 1997:

The STYLE attribute of the LOGO element can take an additional value of "IMAGE-WIDE".
The use of OSD elements in CDF is specified in the OSD specification instead of this CDF specification.
The TIMEZONE attribute of the SCHEDULE element has been added.
The LASTMOD attribute of ITEM and CHANNEL no longer takes a time zone suffix.
All attributes of the LOGIN element have been removed.
The LASTMOD and PRECACHE attributes may now appear inside the CHANNEL tag.
The SELF attribute of the CHANNEL element has been removed for consideration in later CDF versions.

Table of Contents

Introduction
CDF Document
- Linking Declarations
- Time Related Declarations
  - Date and Time
  - Time Quantity
- Schedule Declarations
- Content Description Declarations
  - Bibliographic Elements
  - Page Retrieval Attributes
- Hierarchical Declarations
  - Item Element
  - Channel Element
  - Subchannel Element
  - Base Attribute
- Miscellaneous Declarations
  - Logo Element
  - Login Element
  - Usage Element
- Logging Declarations
  - LogTarget Element
  - Log Element

Appendices

Introduction

Content "push" means automatic delivery of up-to-date, personalized information. Today's world of "pushing" content and streaming media is littered with a number of related but distant technology efforts. Each of these efforts provides a solution for a focused problem, but few promise the flexibility to expand to fill the larger role of providing an application standard for the overall problem of pushing or streaming content. This document poses an application standard and description format for publishing content that can be delivered to end-users using different mechanisms (pull, push, streaming), and that can scale to meet various device limits, network sizes, and bandwidth constraints.

The Channel Definition Format provides publishers with the ability to author content a single time for publishing via many different avenues - Web browsers, Web crawlers, "push" clients, or direct streaming push. A Channel is defined as a set of documents or a grouping of content that can be "pushed", pulled, or operated on as a unit. In today's applications, the types of operations on a channel primarily involve automatic scheduled download for offline use ("smart pull"), or multicast delivery for later use. However, the emergence of a standard in this space will also enable the next generation of applications and technology in content publishing - searching, indexing, profiling, filtering, and personalizing content independent of the publishing mechanism. In order to provide the flexibility to expand into next generation technologies, it is essential to have a standardized declarative meta-data schema along with a standardized procedural object model for pushing or streaming content. The proposal here for the Channel Definition Format depends on the eXtensible Markup Language (XML) for its declarative syntax, and therefore assumes the use of the eventual standard for the XML Document Object Model for procedural access to this syntax.

To understand the use of CDF within a "push" application, it is best to visualize its use for "smart pull". A number of "push" technologies in the industry are more accurately described as smart pull. In a smart pull solution for "pushing" content, a client-side mechanism automatically schedules the HTTP content-pulling mechanism of regular Web browsers, presenting content pulled over HTTP to the user automatically so the user experience is one of pushed content. The underlying technology here consists of HTTP as a transport protocol, combined with a scheduler for triggering requests, and a user-interface that presents regular HTML content that is automatically retrieved via HTTP, providing a "push" experience that brings content to the user without requiring browsing to a site. Of the various smart pull clients on the Web today, some use simple web-crawl based pull mechanisms to retrieve content - pulling all the new content from a site using a scheduled crawl of every link starting from the home page. However, most clients use an application-specific directive file to program the scheduler to pull only the content that is necessary to present to the user as part of a "channel" experience. Regular HTTP polling is used to determine when new content is available and to pull it automatically for presentation to the user. In these more advanced clients, this file optimizes this "smart pull" mechanism declaratively, providing content publishers a declarative mechanism for specifying the schedule and exact set of resources that need to be pulled. The Channel Definition Format, proposed here, aims to standardize the file that is used in such applications.

Relationship to Existing Standards and Proposals

CDF is an application of the eXtensible Markup Language (XML); thus this specification relies heavily on the definition of XML. This specification itself uses much of the notation and many of the definitions from the Extensible Markup Language V1.0 specification from the W3C [W3C-WD-xml].

CDF also takes advantage of the following standards in different parts of its definition:

IETF RFC 1738 for URLs [RFC-1738]
W3C HTML 3.2 for precedence on how relative URLs are resolved relative to a "base" URL [W3C-REC-html32] [RFC-1808]
IETF RFC 1945 and the W3C Extended Log File Format Working Draft on how logging data is transferred [RFC-1945] [W3C-WD-logfile]
ISO 8601 standard for times and dates in CDF [ISO-8601] [W3C-NOTE-datetime]
Most of the Open Software Description (OSD) Format constructs can be applied to CDF, see the OSD submission at the W3C for details [W3C-NOTE-OSD]

Notation

Since CDF is an application of XML, a formal grammar is only given for the root element of a CDF document. The formal grammar for applications of XML and the definition of a root element are given in the XML Working Draft [W3C-WD-xml]. The notation used in this specification is an extension of the simple Extended Backus-Naur Form (EBNF) notation used in the XML working draft. A "multiset expression" notation is introduced which allows concise definition of order-independent concatenations of expressions with restrictions on the multiplicity of those expressions. A formal definition is included in the appendix Formal Definition of "Multiset" Expressions.

The following is a summary of the expressions found in EBNF and "multiset" notation.

[a-zA-Z], [#xN-#xN]: represents any character with a value in the range(s) indicated (inclusive).
'string': set of strings which match case-insensitive with the string inside the single quotes.
a b: a followed by b.
a | b: a or b but not both.
a - b: the set of strings represented by a but not represented by b
a?: a or nothing; optional a.
a+: one or more occurrences of a.
a*: zero or more occurrences of a.
A·B: multiset expression of combined multiset expressions A and B
<a>: multiset expression containing one occurrence of a
<a>?: multiset expression with just one or zero occurrences of a
<a>*: multiset expression with just a but any number of occurrences (incl. zero occurrences)

Useful Syntactic Constructs

Below are syntactical constructs which appear frequently in this CDF specification. With the exception of the last construct, all of them also appear in the W3C XML specification.

Useful Constructs

`[1]`	`S`	`::=`	`(#x20 | #x9 | #xd | #xa)+`
`[2]`	`Eq`	`::=`	`S? '='` `S?`
`[3]`	`content`	`::=`		`/* see XML 1.0 */`
`[4]`	`Name`	`::=`		`/* see XML 1.0 */`
`[5]`	`NaturalNum`	`::=`	`[0-9]+`

The symbol S specifies one or more white space characters: spaces, tabs, carriage returns or line feeds.

The symbol Eq specifies an equal sign optionally surrounded by white space and is used to separate attribute names from attribute values (e.g. PRECACHE = "NO").

The symbols content and Name are from the W3C XML specification [W3C-WD-xml]. The content symbol specifies any sequence of character data, elements and entity references, in addition to some other constructs. The Name symbol more or less specifies a letter followed by any number of alphanumeric characters.

The symbol NaturalNum specifies a natural number in text (e.g. 0, 15, 23, ...).

CDF Document

XML applications such as CDF consist of a single root element, which varies from application to application. For CDF the root element is the CHANNEL element. Around the root element are other constructs which are used by XML parsers. These surrounding constructs are, for the most part, independent of any particular application of XML. For this reason only the grammar for the CHANNEL element and its contents is defined in this CDF specification. For details on surrounding XML constructs see the W3C XML Specification [W3C-WD-xml].

The constructs that are found inside the CHANNEL element are defined in the following sections.

Linking Declarations

The following constructs are used for referring to external resources such as HTML pages.

Linking Constructs

`[6]`	`URL`	`::=`		`/* see [RFC-1738]*/`
`[7]`	`HrefAtt`	`::=`	`'HREF'` `Eq` `URL`
`[8]`	`Anchor`	`::=`	`'<A'` `S+` `HrefAtt` `S? '>'` `content` `'</A'` `S? '>'`

The URL symbol specifies a Uniform Resource Locator (URL) as defined in [RFC-1738]. The HrefAtt symbol simply specifies the attribute name/value pair that is always used with a URL. Finally, the Anchor symbol specifies an element inside which an HrefAtt attribute name/value pair can be placed for compatibility reasons.

Time Related Declarations

Dates, and dates with a time, are expressed in CDF as follows:

Date and Time

`[9]`	`Date`	`::=`	`Year` `'-'` `Month` `'-'` `Day`
`[10]`	`Year`	`::=`	`[0-9][0-9][0-9][0-9]`
`[11]`	`Month`	`::=`	`[0-1][0-9]`
`[12]`	`Day`	`::=`	`[0-3][0-9]`
`[13]`	`DateTime`	`::=`	`Date` `'T'` `Hour` `':'` `Minute`
`[14]`	`Hour`	`::=`	`[0-2][0-9]`
`[15]`	`Minute`	`::=`	`[0-5][0-9]`

This format is a simplified subset of the ISO 8601 date and time standard [ISO-8601] [W3C-NOTE-datetime]. Note that Month must be between 01 and 12, Day must be between 01 and 31, and Hour must be between 00 and 23.

There also needs to be a way to express a quantity of time rather than a specific time or date:
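The grammar alone admits values such as month 13 or hour 24, so a client has to apply the range restrictions separately. The following Python sketch shows one way to do that; the function names are hypothetical, and `datetime.strptime` is stricter than the text requires (it also rejects impossible dates such as February 30 and the year 0000), which is an assumption of this sketch rather than a requirement of CDF.

```python
import re
from datetime import datetime

# Productions [9]-[15]: YYYY-MM-DD, optionally followed by 'T' and HH:MM.
# Quoted strings in the grammar match case-insensitively, hence IGNORECASE.
_DATE_RE = re.compile(r"[0-9]{4}-[0-1][0-9]-[0-3][0-9]")
_DATETIME_RE = re.compile(
    r"[0-9]{4}-[0-1][0-9]-[0-3][0-9]T[0-2][0-9]:[0-5][0-9]", re.IGNORECASE
)


def parse_cdf_date(text):
    """Return a datetime for a CDF Date, or None if the text is not valid."""
    if not _DATE_RE.fullmatch(text):
        return None
    try:
        # strptime enforces the month and day ranges the grammar cannot.
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return None


def parse_cdf_datetime(text):
    """Return a datetime for a CDF DateTime, or None if the text is not valid."""
    if not _DATETIME_RE.fullmatch(text):
        return None
    try:
        # upper() normalises a lowercase 't' separator before parsing.
        return datetime.strptime(text.upper(), "%Y-%m-%dT%H:%M")
    except ValueError:
        return None
```

For example, `parse_cdf_datetime("1998-04-01T09:30")` yields a `datetime`, while `parse_cdf_date("1997-13-01")` matches the grammar but is rejected by the range check.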

Time Quantity

`[16]`	`TimeQuantAttList`	`::=`	`( <DayAtt>?·<HourAtt>?·<MinAtt>?·<S>* ) - (S?)`
`[17]`	`DayAtt`	`::=`	`'DAY'` `Eq` `NaturalNum`
`[18]`	`HourAtt`	`::=`	`'HOUR'` `Eq` `NaturalNum`
`[19]`	`MinAtt`	`::=`	`'MIN'` `Eq` `NaturalNum`

Schedule Declarations

Channel Definition Format can specify how often a channel needs to be updated. This updating involves downloading and locally saving (precaching) on a client machine the contents referenced by the CHANNEL and ITEM elements and also sending page hit logging information to the target server specified in the LOGTARGET element.

Schedule

`[20]`	`Schedule`	`::=`	`'<SCHEDULE' ( <StartDateAtt>? · <StopDateAtt>? · <TimeZoneAtt>? · <S>* ) '>' <IntervalTime> · <EarliestTime>? · <LatestTime>? '</SCHEDULE' S? '>'`
`[21]`	`StartDateAtt`	`::=`	`'STARTDATE'` `Eq` `Date`
`[22]`	`StopDateAtt`	`::=`	`('STOPDATE' | 'ENDDATE')` `Eq` `Date`
	`TimeZoneAtt`	`::=`	`'TIMEZONE'` `Eq` `( '-' | '+' ) [0-1][0-9][0-9][0-9]`
`[23]`	`IntervalTime`	`::=`	`'<INTERVALTIME'` `TimeQuantAttList` `'/>'`
`[24]`	`EarliestTime`	`::=`	`'<EARLIESTTIME'` `TimeQuantAttList` `'/>'`
`[25]`	`LatestTime`	`::=`	`'<LATESTTIME'` `TimeQuantAttList` `'/>'`

STARTDATE: Specifies the day on which the schedule will start to apply. If this attribute is omitted, the schedule will start to apply on the current day. Time values specified with the date will be ignored.
STOPDATE: Specifies the day on which the schedule expires. Time values specified with the date will be ignored. Obsoletes ENDDATE for which most existing clients have implementation problems.
TIMEZONE: Specifies the time zone to which the times of the schedule are relative.
INTERVALTIME: This value declares how often updates will occur (on average).
EARLIESTTIME: Specifies the earliest time within the repeating interval that the schedule applies to. This is the beginning of the valid range of time that the update to the CDF can occur. By default, if this value is not specified, the earliest time is set to the beginning of the INTERVALTIME interval. The days, hours, and minutes are totaled to determine the offset value from the start of the INTERVALTIME.
LATESTTIME: Specifies the latest time during the INTERVALTIME to which the schedule is applied and updated. The days, hours, and minutes are totaled to determine the offset value from the INTERVALTIME that represents the latest valid time for updating a channel. If omitted, the latest time is set to the beginning of the INTERVALTIME.

The most important function of a CDF schedule is to specify how often a task should be repeated. Between each repetition of a task (e.g. downloading content) is a quantity of time, the "interval time", which is specified in the INTERVALTIME element. A schedule is thus a series of consecutive periods, each lasting one interval time. This series begins at midnight at the start of the day specified by the STARTDATE attribute. If STARTDATE is not present, it is assumed to be the day on which a client starts to use the schedule. Likewise, a schedule ceases to apply once the day specified by STOPDATE ends.

By default, channel updating occurs at the beginning of each of these interval time periods. This presents a problem for content servers, which can be overloaded when a large number of clients download content at the exact same time specified by a CDF schedule. To solve this problem, clients can, instead of updating at one specific time, update at a random time within a range of times inside each interval time period. When a "latest time" quantity is given, clients randomly choose a single time between the beginning of the interval time period and the point a "latest time" amount of time after it. If the range from which the random time is chosen should not begin at the start of each interval time period, then EARLIESTTIME can be introduced so that the range begins an "earliest time" amount of time after the start of the interval time period. For example:

<SCHEDULE STARTDATE="1997-03-24">
   <INTERVALTIME HOUR="6" />
   <LATESTTIME HOUR="3" />
   <EARLIESTTIME HOUR="1" />
</SCHEDULE>

specifies that every day from March 24th, 1997 on, an update should randomly occur once between 1am and 3am, once between 7am and 9am, once between 1pm and 3pm and once between 7pm and 9pm.

A TIMEZONE attribute is introduced since schedules can be published in different time zones from where a client may use the schedule. When the TIMEZONE attribute is not present then all the times deduced from a schedule are in a client's local time. If a TIMEZONE attribute is specified it specifies the time zone of the publisher of a CDF schedule. All clients in other time zones should adjust their interpretation of a CDF schedule so that it matches the times in the publisher's time zone. If a client in Israel (time zone +0200) uses a schedule from Spain (time zone +0100):

<SCHEDULE TIMEZONE="+0100" >
   <INTERVALTIME DAY="1" />
   <LATESTTIME HOUR="3" />
</SCHEDULE>

then the client will update between 1am and 4am in the client's local Israeli time, which is between 12am and 3am in the publisher's Spanish time.
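The adjustment in this example can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the client knows its own offset in the same "+hhmm" form as the TIMEZONE attribute; daylight saving time is ignored.

```python
from datetime import datetime, timedelta, timezone


def parse_timezone(value):
    """Turn a TIMEZONE value such as '+0100' into a datetime.timezone."""
    sign = -1 if value[0] == "-" else 1
    hours, minutes = int(value[1:3]), int(value[3:5])
    return timezone(sign * timedelta(hours=hours, minutes=minutes))


def to_client_time(publisher_time, publisher_tz, client_tz):
    """Re-express a publisher wall-clock time as client wall-clock time."""
    aware = publisher_time.replace(tzinfo=parse_timezone(publisher_tz))
    return aware.astimezone(parse_timezone(client_tz)).replace(tzinfo=None)


# The Spanish schedule above yields the range 12am to 3am publisher time.
begin = to_client_time(datetime(1998, 4, 1, 0, 0), "+0100", "+0200")
end = to_client_time(datetime(1998, 4, 1, 3, 0), "+0100", "+0200")
```

Here `begin` and `end` come out as 1am and 4am on the same day, matching the Israeli client's range described above. The date 1998-04-01 is arbitrary and chosen only to make the example concrete.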

The set of time ranges during which a client should randomly choose a time to update content can be expressed formally on a timeline as follows:

{ [ S + E + I*n , S + L + I*n ] : n ≥ 0 } / { [a,b] : b > T }

where

S is the point on a timeline for STARTDATE
T is the point on a timeline for STOPDATE
I is the quantity of time in INTERVALTIME
E is the quantity of time in EARLIESTTIME
L is the quantity of time in LATESTTIME
the times are relative to the time zone specified in TIMEZONE
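The formal definition above translates fairly directly into code. The following Python sketch generates the ranges and picks a random update time inside one of them. The function names and the `limit` parameter are hypothetical; the sketch also assumes that the caller has already converted STARTDATE and STOPDATE into points on the timeline (S and T), and that EARLIESTTIME and LATESTTIME default to zero when omitted.

```python
import random
from datetime import datetime, timedelta


def update_windows(start, interval, earliest=timedelta(0),
                   latest=timedelta(0), stop=None, limit=8):
    """Return up to `limit` ranges [S+E+I*n, S+L+I*n] for n = 0, 1, 2, ...

    Ranges that end after `stop` (T in the formal definition) are dropped.
    """
    if interval <= timedelta(0):
        raise ValueError("INTERVALTIME must be a positive quantity of time")
    windows = []
    n = 0
    while len(windows) < limit:
        begin = start + earliest + interval * n
        end = start + latest + interval * n
        if stop is not None and end > stop:
            break  # every later range ends even later, so stop here
        windows.append((begin, end))
        n += 1
    return windows


def pick_update_time(window):
    """Choose a uniformly random time inside one (begin, end) range."""
    begin, end = window
    span = (end - begin).total_seconds()
    return begin + timedelta(seconds=random.uniform(0, span))


# The first example schedule of this section: 6-hour interval, 1h to 3h range.
windows = update_windows(datetime(1997, 3, 24), timedelta(hours=6),
                         earliest=timedelta(hours=1),
                         latest=timedelta(hours=3), limit=4)
```

For that schedule the first four ranges start at 01:00, 07:00, 13:00 and 19:00 on March 24th, 1997, each lasting two hours, which agrees with the prose description of the example.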

Content Description Declarations

The primary purpose of CDF is to describe and index external content, usually in the form of HTML pages. The information that serves this purpose can be divided into two areas: bibliographic description, and information specific to where content resides and how it should be retrieved.

Bibliographic Description

`[26]`	`BiblioElemSeq`	`::=`	`<Title>? · <Abstract>?`
`[27]`	`XMLSpaceAtt`	`::=`	`'XML-SPACE'` `Eq` `( 'DEFAULT' | 'PRESERVE' )`
`[28]`	`Title`	`::=`	`'<TITLE'` `S? XMLSpaceAtt?` `S? '>'` `content` `'</TITLE'` `S? '>'`
`[29]`	`Abstract`	`::=`	`'<ABSTRACT'` `S? XMLSpaceAtt?` `S? '>'` `content` `'</ABSTRACT'` `S? '>'`

TITLE

Specifies a text string representing the title of the associated content

(e.g. <TITLE>CDF Reference</TITLE>)

ABSTRACT

Represents a description of the associated content

(e.g. <ABSTRACT>How to create CDF</ABSTRACT>)

XML-SPACE: As specified in the W3C XML specification, "The value DEFAULT signals that applications' default white-space processing modes are acceptable for this element; the value PRESERVE indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of this element, unless overridden with another instance of the XML-SPACE attribute."

Page Retrieval Attributes

`[30]`	`PageAttSeq`	`::=`	`<HrefAtt> · <LastModAtt>? · <PreCacheAtt>? · <Level>?`
`[31]`	`LastModAtt`	`::=`	`'LASTMOD'` `Eq` `DateTime`
`[32]`	`PreCacheAtt`	`::=`	`'PRECACHE'` `Eq` `( 'YES' | 'NO' )`
`[33]`	`Level`	`::=`	`'LEVEL'` `Eq` `NaturalNum`

These attributes specify from where and how the resources and HTML pages indexed by a channel should be retrieved during channel updating. The HREF attribute specifies the URL of the page or resource indexed.

LASTMOD: Specifies the last date on which content was modified. This allows a client that has downloaded only the CDF file to determine whether the content has changed since the last time it was downloaded. The content is downloaded only if the date associated with the cached content is older than this LASTMOD date.
PRECACHE: Specifies whether content should be downloaded and cached on a client computer. If PRECACHE is set to "NO", content is not downloaded and the LEVEL attribute is ignored. If PRECACHE is omitted the default is "YES".
LEVEL: Specifies the number of levels (or links) deep the client should "site crawl" and precache the HTML content from the URL specified in the HREF attribute. The default is zero, which specifies to only precache the content and associated resources needed to properly render HTML content. For instance, if the HTML content contains frames, the client will also retrieve the HTML content inside the frames.
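The interaction of these three attributes can be summarized in a short Python sketch. The function names are hypothetical. Downloading whenever nothing is cached yet, or whenever LASTMOD is absent, is an assumption of this sketch; the text above does not specify those cases.

```python
from datetime import datetime


def should_download(precache="YES", lastmod=None, cached_at=None):
    """Decide whether a client should fetch the content behind an HREF.

    precache  -- value of the PRECACHE attribute ("YES" when omitted)
    lastmod   -- datetime parsed from LASTMOD, or None if absent
    cached_at -- date associated with the cached copy, or None if not cached
    """
    if precache.upper() == "NO":
        return False  # content is never precached
    if cached_at is None or lastmod is None:
        return True   # assumption: nothing to compare against, so fetch
    return cached_at < lastmod  # fetch only if the cached copy is older


def crawl_depth(precache="YES", level=0):
    """Return the number of link levels to crawl, or None when LEVEL is ignored."""
    if precache.upper() == "NO":
        return None   # LEVEL is ignored when PRECACHE is "NO"
    return level      # zero: only the page and what it needs to render
```

A client would typically call `should_download` once per ITEM or CHANNEL after fetching the CDF file, and only start a crawl of `crawl_depth` levels when it returns true.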

Hierarchical Declarations

In addition to indexing and describing external content, CDF can also organize that indexed content into a hierarchical structure. When an external resource is a leaf node in the hierarchical structure provided in CDF, its bibliographic and retrieval information is provided in an ITEM element. For nodes which contain other nodes in the hierarchical structure, the CHANNEL element is used.

ITEM element - leaf node in hierarchy

[34] Item ::= '<ITEM' ( PageAttSeq · <S>* ) '>' (BiblioElemSeq·<Logo>*·<Anchor>?·<Log>? · <Usage>* · <S>* ) '</ITEM'S? '>'

Example:

<ITEM HREF="http://www.foosports.com/intro.htm">
  <TITLE>Welcome to FooSports!</TITLE>
  <ABSTRACT>FooSports articles, news, and promotional offers</ABSTRACT>
</ITEM>

Two symbols which are in ITEM but not CHANNEL are Log and Usage. Log (defined farther below) indicates whether the indexed page should be logged. Usage, also defined farther below, indicates how the indexed resource or page should be used.

CHANNEL element - container node in hierarchy

`[35]`	`ChanElem`	`::=`	`'<CHANNEL' ( PageAttSeq · <BaseAtt>? · <S>* ) '>' ( BiblioElemSeq · HierarchySeq · <Logo>* · <Anchor>? · <Schedule>? · <LogTarget>? · <Login>? · <S>* ) '</CHANNEL' S? '>'`
`[36]`	`HierarchySeq`	`::=`	`<SubChannel>* · <Item>*`
`[37]`	`SubChannel`	`::=`	`'<CHANNEL' ( PageAttSeq · <BaseAtt>? · <S>* ) '>' ( BiblioElemSeq · HierarchySeq · <Logo>* · <Anchor>? · <S>* ) '</CHANNEL' S? '>'`

The resource or page indexed by a node in the CDF hierarchy is provided through the HREF attribute in either the PageAttSeq or Anchor symbols, with preference given to the PageAttSeq symbol. The bibliographic information in BiblioElemSeq refers to this page. The Logo symbol provides a reference to an image which serves as an associated visual for the indexed resource or page.

The CHANNEL and ITEM elements which are specified in the HierarchySeq symbol are nodes which are contained by the containing CHANNEL element in the hierarchy specified by CDF.
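As an illustration of the hierarchy and of the HREF preference rule, the following Python sketch walks a small channel with the standard library's `xml.etree.ElementTree`. The document and the function names are hypothetical. The sketch assumes upper-case tag and attribute names, because ElementTree compares names case-sensitively, and it leniently accepts a node whose URL appears only in an A child.

```python
import xml.etree.ElementTree as ET

# Hypothetical channel: the first ITEM carries both URL forms, and the nested
# CHANNEL carries its URL in an <A> child only.
CDF_TEXT = """
<CHANNEL HREF="http://www.foosports.com/index.htm">
  <TITLE>FooSports</TITLE>
  <ITEM HREF="http://www.foosports.com/intro.htm">
    <A HREF="http://www.foosports.com/old-intro.htm"></A>
    <TITLE>Welcome to FooSports!</TITLE>
  </ITEM>
  <CHANNEL>
    <A HREF="http://www.foosports.com/scores/index.htm"></A>
    <TITLE>Scores</TITLE>
    <ITEM HREF="http://www.foosports.com/scores/today.htm"/>
  </CHANNEL>
</CHANNEL>
"""


def node_href(node):
    """Return the node's URL, preferring the HREF attribute over an A child."""
    href = node.get("HREF")
    if href is None:
        anchor = node.find("A")
        if anchor is not None:
            href = anchor.get("HREF")
    return href


def walk(node, depth=0):
    """Yield (depth, tag, title, href) for a node and everything below it."""
    yield depth, node.tag, node.findtext("TITLE"), node_href(node)
    for child in node:
        if child.tag in ("CHANNEL", "ITEM"):
            yield from walk(child, depth + 1)


rows = list(walk(ET.fromstring(CDF_TEXT.strip())))
```

Each entry of `rows` describes one node; the first ITEM reports the URL from its HREF attribute rather than the one in its A child, and the untitled ITEM reports `None` as its title.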

Lastly we have the BASE attribute in CHANNEL.

Base Attribute

[38] BaseAtt ::= 'BASE' Eq URL /* See BASE element in [W3C-REC-html32] */

BASE="url": Specifies the base URL for the channel. This is used to resolve relative URLs [RFC-1808] specified in ITEM and CHANNEL elements contained within this channel. The BASE value applies to all child elements of the current channel or "folder." This attribute supersedes the BASE value previously defined by a parent CHANNEL element, if any exists. The URL for this attribute is used just like the BASE element in the W3C HTML specification [W3C-REC-html32].
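A minimal Python sketch of this inheritance rule follows, using `urllib.parse.urljoin` for the relative URL resolution. The document and the function name are hypothetical. The sketch reads the text literally: a channel's BASE applies to the elements it contains, not to the channel's own HREF.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical channel with a BASE on the root and another on a subchannel.
CDF_TEXT = """
<CHANNEL HREF="http://www.foosports.com/index.htm"
         BASE="http://www.foosports.com/">
  <ITEM HREF="intro.htm"/>
  <CHANNEL HREF="scores/index.htm" BASE="http://www.foosports.com/scores/">
    <ITEM HREF="today.htm"/>
  </CHANNEL>
</CHANNEL>
"""


def resolve_hrefs(node, inherited_base=None):
    """Yield the absolute URL of each CHANNEL and ITEM, in document order."""
    href = node.get("HREF")
    if href is not None:
        # The node's own HREF is resolved against the base it inherited.
        yield urljoin(inherited_base, href) if inherited_base else href
    # A BASE on this node supersedes the inherited one for its children.
    child_base = node.get("BASE", inherited_base)
    for child in node:
        if child.tag in ("CHANNEL", "ITEM"):
            yield from resolve_hrefs(child, child_base)


urls = list(resolve_hrefs(ET.fromstring(CDF_TEXT.strip())))
```

In this example the subchannel's own `scores/index.htm` is resolved against the root's BASE, while the ITEM inside it is resolved against the subchannel's BASE.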

Miscellaneous Declarations

Logo

`[39]`	`Logo`	`::=`	`'<LOGO' ( <HrefAtt> · <StyleAtt> · <S>* ) '/>'`
`[40]`	`StyleAtt`	`::=`	`'STYLE'` `Eq` `( 'ICON' | 'IMAGE' | 'IMAGE-WIDE' )`

Specifies an image that can be associated with a channel or channel item.

STYLE="ICON" | "IMAGE" | "IMAGE-WIDE"

Specifies a text string indicating the context in which a logo will be used as follows:

Value	Size (pixels)
"ICON"	16H x 16W
"IMAGE"	32H x 80W
"IMAGE-WIDE"	32H x 194W

Note: GIF and JPEG formats should be supported for logo images.

Example:

<LOGO HREF="http://www.foosports.com/images/logo.gif" STYLE="IMAGE"/>

Usage

[41] Usage ::= '<USAGE' S+ 'VALUE' Eq Name ( '/>' | '>' content '</USAGE' S? '>' )

Indicates how an ITEM element should be used.

VALUE

Required. Specifies a special usage of the parent element.

"Channel"	These items appear visually on the client program. This is the default behavior when no USAGE element appears under an ITEM.
"NONE"	If a "NONE" value is used as the only USAGE element in an ITEM, the item should not be displayed on the client.
"ScreenSaver"	These items are displayed in a screen saver.

The content of the element is optional and varies according to the VALUE attribute. Any other values can be specified in the VALUE attribute.

Example:

<ITEM HREF="http://www.foosports.com/screensaver.htm">
    <USAGE VALUE="ScreenSaver"></USAGE>
</ITEM>
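The defaulting rules for USAGE can be sketched in Python as follows. The function names are hypothetical, and comparing the VALUE case-insensitively is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def usages(item):
    """Return the VALUEs of an ITEM's USAGE children, defaulting to "Channel"."""
    values = [usage.get("VALUE", "") for usage in item.findall("USAGE")]
    return values or ["Channel"]


def is_hidden(item):
    """True when "NONE" is the only USAGE value of the ITEM."""
    return [value.upper() for value in usages(item)] == ["NONE"]


plain = ET.fromstring('<ITEM HREF="http://www.foosports.com/intro.htm"/>')
saver = ET.fromstring(
    '<ITEM HREF="http://www.foosports.com/screensaver.htm">'
    '<USAGE VALUE="ScreenSaver"></USAGE></ITEM>'
)
hidden = ET.fromstring(
    '<ITEM HREF="http://www.foosports.com/data.htm">'
    '<USAGE VALUE="NONE"/></ITEM>'
)
```

With these inputs, `plain` falls back to the "Channel" usage, `saver` reports "ScreenSaver", and only `hidden` is excluded from display. The `data.htm` URL is made up for the illustration.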

Login

[42] Login ::= '<LOGIN' S? '/>'

Indicates that when a channel is subscribed to, the client should prompt the user for user authentication information to use in later unattended channel updates.

Logging Declarations

Logging

`[43]`	`LogTarget`	`::=`	`'<LOGTARGET' ( <HrefAtt> · <MethodAtt> · <ScopeAtt>? · <S>* ) '>' PurgeTime? '</LOGTARGET' S? '>'`
`[44]`	`MethodAtt`	`::=`	`'METHOD'` `Eq` `( 'POST' | 'PUT' )`
`[45]`	`ScopeAtt`	`::=`	`'SCOPE'` `Eq` `( 'ALL' | 'OFFLINE' | 'ONLINE' )`
`[46]`	`PurgeTime`	`::=`	`'<PURGETIME'` `TimeQuantAttList` `'/>'`

Specifies where to upload a client's page-hit log file in Extended Log File Format [W3C-WD-logfile].

NOTE:

http://www.microsoft.com/cdf/example.cdf

http://www.msn.com/example.htm

HREF="url": Required. Specifies the URL of where the log file should be sent.
METHOD="POST" | "PUT": Required. Specifies the HTTP method to be used for sending the data [RFC-1945].
SCOPE="ALL" | "OFFLINE" | "ONLINE": Specifies which type of page hits should be logged. Page hits can be logged for offline (read from local cache) or online (read from URL) browsing. The default for this attribute is "ALL", which logs both types of hits.
PURGETIME: When the log file is being uploaded, any page hits older than PURGETIME will not be reported.

Example:

<LOGTARGET HREF="http://www.foosports.com/logging"
  Method="POST" SCOPE="OFFLINE">
    <PURGETIME HOUR="12"/>
</LOGTARGET>
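How SCOPE and PURGETIME narrow the set of reported page hits can be sketched in Python. The hit record layout `(timestamp, url, was_offline)` and the function name are hypothetical, and applying SCOPE at upload time rather than when the hit is recorded is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def hits_to_upload(hits, now, purge_time=None, scope="ALL"):
    """Select the page hits to report to a LOGTARGET.

    hits       -- iterable of (timestamp, url, was_offline) tuples
    purge_time -- timedelta from PURGETIME, or None when it is absent
    scope      -- value of the SCOPE attribute ("ALL" when omitted)
    """
    scope = scope.upper()
    selected = []
    for when, url, was_offline in hits:
        if purge_time is not None and now - when > purge_time:
            continue  # older than PURGETIME: not reported
        if scope == "OFFLINE" and not was_offline:
            continue  # hit was read from the URL, not the local cache
        if scope == "ONLINE" and was_offline:
            continue  # hit was read from the local cache
        selected.append((when, url, was_offline))
    return selected


now = datetime(1998, 4, 1, 12, 0)
hits = [
    (datetime(1998, 3, 31, 22, 0), "http://www.foosports.com/a.htm", True),
    (datetime(1998, 4, 1, 6, 0), "http://www.foosports.com/b.htm", True),
    (datetime(1998, 4, 1, 7, 0), "http://www.foosports.com/c.htm", False),
]
# Settings from the LOGTARGET example: PURGETIME HOUR="12", SCOPE="OFFLINE".
report = hits_to_upload(hits, now, timedelta(hours=12), "OFFLINE")
```

With the example's settings only the hit on `b.htm` is reported: `a.htm` is 14 hours old and `c.htm` was viewed online. The three URLs and timestamps are made up for the illustration.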

Log

[47] Log ::= '<LOG' S+ 'VALUE' Eq 'DOCUMENT:VIEW' S? '/>'

Specifies that the page referenced by the parent CHANNEL or ITEM element should be recorded in a page hit log file.

NOTE:

http://www.microsoft.com/cdf/example.cdf

http://www.msn.com/example.htm

VALUE="document:view": Specifies the event to be logged.

An ITEM can be logged only if the path of the ITEM's HREF attribute falls under the path of the CDF's URL or the path of the LOGTARGET's HREF.

Example:

<ITEM  HREF="http://www.foosports.com/promotion.htm">
    <LOG VALUE="document:view"/>
</ITEM>

A. Formal Definition of "Multiset" Expressions

Multisets are collections of objects which do not have any order but do have a number of occurrences associated with each object. Sets are multisets in which every member has either one or zero occurrences. Multisets provide a clean way to define any collection where the multiplicity of members is of concern but order is not. Formally, a multiset is a function from a universe of possible member objects to the natural numbers (i.e. {0,1,2,...}). Like functions, a multiset can be defined from a set of ordered pairs of a multiset member and a natural number. When such a set is provided, it is assumed that the ordered pair (x,0) is included if no ordered pair (x,n) is included. With this representation of multisets, the multiset union of two multisets, ( a ∪ b ), is { (x,n+m) : (x,n) ∈ a, (x,m) ∈ b }. We will find that collections of multisets of languages are useful for creating string expressions where the multiplicity of component parts is important but not their order. We will refer to collections of multisets of languages as multiset expressions.

A useful function L maps multisets of languages (sets of strings) to languages.

L(M) := { s₁ s₂ ... sₙ : s₁ ∈ L₁, s₂ ∈ L₂, ..., sₙ ∈ Lₙ, {(L₁,1)} ∪ {(L₂,1)} ∪ ... ∪ {(Lₙ,1)} = M }

A version of L for multiset expressions naturally results with:

L(C) := { s : s ∈ L(M), M ∈ C }

When a multiset expression needs to be mapped to a language it is natural to use the L function. For this reason, if a multiset expression is found in a regular expression of languages, it will be assumed that the L function is to be applied to the multiset expression.

Now that there is a well-defined means to include multiset expressions in regular expressions, it will be useful to have operations whereby multiset expressions can be constructed from languages and other multiset expressions:

<a> := { {(a,1)} }: collection of the multiset containing one occurrence of a
<a>* := { {(a,n)} : n a non-negative integer }: collection of multisets with just a but any number of occurrences (incl. zero occurrences)
<a>? := { {(a,0)}, {(a,1)} }: collection of multisets with just one or zero occurrences of a
A·B := { a ∪ b : a ∈ A, b ∈ B }: collection of combined multisets from collection A and collection B
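To make the four operations concrete, the following Python sketch represents a multiset expression as a mapping from each symbol to the range of occurrence counts it allows. This representation is an assumption of the sketch; it is exact only for expressions built from the operations above. The sketch also works on sequences of already-classified symbols rather than on raw strings, so it illustrates the order-independence of L without implementing L itself. All names are hypothetical.

```python
from collections import Counter


def one(a):
    """<a>: exactly one occurrence of a."""
    return {a: (1, 1)}


def optional(a):
    """<a>?: zero or one occurrence of a."""
    return {a: (0, 1)}


def any_number(a):
    """<a>*: any number of occurrences of a (None means no upper bound)."""
    return {a: (0, None)}


def combine(*expressions):
    """A·B: occurrence counts add up, so the allowed ranges add up too."""
    merged = {}
    for expression in expressions:
        for symbol, (low, high) in expression.items():
            old_low, old_high = merged.get(symbol, (0, 0))
            if high is None or old_high is None:
                new_high = None
            else:
                new_high = old_high + high
            merged[symbol] = (old_low + low, new_high)
    return merged


def matches(expression, symbols):
    """True if the symbols, in any order, form a multiset of the expression."""
    counts = Counter(symbols)
    if any(symbol not in expression for symbol in counts):
        return False
    return all(
        low <= counts[symbol] and (high is None or counts[symbol] <= high)
        for symbol, (low, high) in expression.items()
    )


# Production [30], PageAttSeq, combined with <S>* as in the ITEM start tag.
page_atts = combine(one("HrefAtt"), optional("LastModAtt"),
                    optional("PreCacheAtt"), optional("Level"),
                    any_number("S"))
```

For instance, `matches(page_atts, ["S", "LastModAtt", "S", "HrefAtt"])` is true regardless of the order of the symbols, while a sequence without HrefAtt, or with two of them, does not match.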

B. References

[W3C-WD-xml]: Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Extensible Markup Language (XML), Aug 1997.
[RFC-1808]: R. Fielding, Relative Uniform Resource Locators, June 1995.
[RFC-1738]: T. Berners-Lee, L. Masinter, M. McCahill, Uniform Resource Locators (URL), Dec 1994.
[W3C-NOTE-OSD]: Arthur van Hoff, Hadi Partovi, Tom Thai, Open Software Description Format (OSD), Aug 1997.
[ISO-8601]: ISO (International Organization for Standardization), ISO 8601:1988 (E), Data elements and interchange formats - Information interchange - Representation of dates and times, 1988.
[W3C-NOTE-datetime]: Misha Wolf, Charles Wicksteed, Date and Time Formats, Sept 1997.
[W3C-REC-html32]: Dave Raggett, HTML 3.2 Reference Specification, Jan 1997.
[W3C-WD-logfile]: Phillip M. Hallam-Baker, Brian Behlendorf, Extended Log File Format, March 1996.
[RFC-1945]: T. Berners-Lee, R. Fielding, H. Frystyk, Hypertext Transfer Protocol -- HTTP/1.0, May 1996.