Updated: April 1, 1998
Specification changes since CDF 1.0 (Sept 30, 1997):
Specification changes since July 14, 1997:
Content "push" means automatic delivery of up-to-date, personalized information. Today's world of "pushing" content and streaming media is littered with a number of related but distinct technology efforts. Each of these efforts solves a focused problem, but few promise the flexibility to grow into an application standard for the overall problem of pushing or streaming content. This document proposes an application standard and description format for publishing content that can be delivered to end users through different mechanisms (pull, push, streaming), and that can scale to meet various device limits, network sizes, and bandwidth constraints.
The Channel Definition Format (CDF) gives publishers the ability to author content a single time for publishing via many different avenues: Web browsers, Web crawlers, "push" clients, or direct streaming push. A Channel is defined as a set of documents or a grouping of content that can be "pushed", pulled, or operated on as a unit. In today's applications, the operations on a channel primarily involve automatic scheduled download for offline use ("smart pull") or multicast delivery for later use. However, the emergence of a standard in this space will also enable the next generation of applications and technology in content publishing: searching, indexing, profiling, filtering, and personalizing content independent of the publishing mechanism. To provide the flexibility to expand into these next-generation technologies, it is essential to have a standardized declarative meta-data schema along with a standardized procedural object model for pushing or streaming content. The proposal here for the Channel Definition Format depends on the eXtensible Markup Language (XML) for its declarative syntax, and therefore assumes the use of the eventual standard for the XML Document Object Model for procedural access to this syntax.
To understand the use of CDF within a "push" application, it is best to visualize its use for "smart pull". A number of "push" technologies in the industry are more accurately described as smart pull. In a smart pull solution, a client-side mechanism automatically schedules the HTTP content-pulling mechanism of regular Web browsers and presents the pulled content to the user automatically, so that the user experience is one of pushed content. The underlying technology consists of HTTP as a transport protocol, a scheduler for triggering requests, and a user interface that presents regular HTML content retrieved automatically via HTTP. Together these provide a "push" experience that brings content to the user without requiring the user to browse to a site. Of the various smart pull clients on the Web today, some use simple crawl-based pull mechanisms to retrieve content, pulling all the new content from a site with a scheduled crawl of every link starting from the home page. However, most clients use an application-specific directive file to program the scheduler to pull only the content needed to present a "channel" experience to the user. Regular HTTP polling is used to determine when new content is available and to pull it automatically for presentation to the user. In these more advanced clients, the directive file gives content publishers a declarative mechanism for specifying the schedule and the exact set of resources to pull. The Channel Definition Format proposed here aims to standardize the file used in such applications.
CDF is an application of the eXtensible Markup Language (XML), so
this specification relies heavily on the definition of XML. The
specification itself uses much of the notation and
definitions from the Extensible
Markup Language V1.0 specification from the W3C
[W3C-WD-xml].
CDF also takes advantage of the following standards in different parts of its definition:
Since CDF is an application of XML, a formal grammar is only
given for the root element of a CDF document. The formal grammar
for applications of XML and the definition of a root element is
given in the XML Working
Draft [W3C-WD-xml]. The notation used in this specification
is an extension of the simple Extended Backus-Naur Form (EBNF)
notation used in the XML working draft. A "multiset
expression" notation is introduced which allows concise
definition of order-independent concatenations of expressions
with restrictions on the multiplicity of those expressions. A formal
definition is included in the appendix Formal Definition of "Multiset
Expressions".
The following is a summary of the expressions found in EBNF and "multiset" notation.
Below are syntactical constructs which appear frequently in this CDF specification. With the exception of the last construct, all of them also appear in the W3C XML specification.
Useful Constructs
The symbol S specifies one or more white space characters: spaces, tabs, carriage returns or line feeds.
The symbol Eq specifies an equal sign surrounded by white space and is used to separate attribute names from attribute values (e.g. PRECACHE = "NO").
The symbols content and Name are from the W3C XML specification mentioned in the reference [W3C-WD-xml]. The content symbol specifies any sequence of character data, elements, and entity references, in addition to some other constructs. The Name symbol more or less specifies a letter followed by any number of alphanumeric characters.
The symbol NaturalNum specifies a natural number in text (e.g. 0, 15, 23, ...).
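As a rough illustration, the symbols described above can be approximated with regular expressions. This is only a sketch: the pattern names and the `parse_attribute` helper are assumptions made for this example, white space around the equal sign is treated as optional, and NAME follows the informal "letter followed by alphanumerics" description rather than the full XML Name production.

```python
import re

# Approximations of the constructs described above (not the full XML productions).
S = r"[ \t\r\n]+"                 # one or more white space characters
EQ = r"[ \t\r\n]*=[ \t\r\n]*"     # equal sign, white space around it assumed optional
NAME = r"[A-Za-z][A-Za-z0-9]*"    # simplified: a letter, then alphanumerics
NATURAL_NUM = r"[0-9]+"           # a natural number written in text

ATTRIBUTE = re.compile(rf'({NAME}){EQ}"([^"]*)"')

def parse_attribute(text):
    """Return (name, value) for a single Name Eq "value" pair, or None."""
    match = ATTRIBUTE.fullmatch(text.strip())
    return match.groups() if match else None

# Usage with the attribute quoted in the text above.
pair = parse_attribute('PRECACHE = "NO"')   # ('PRECACHE', 'NO')
```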
XML applications such as CDF consist of a single root element, which varies from application to application. For CDF the root element is the CHANNEL element. Around the root element are other constructs which are used by XML parsers. These surrounding constructs are, for the most part, independent of any particular application of XML. For this reason, only the grammar for the CHANNEL element and its contents is defined in this CDF specification. For details on the surrounding XML constructs, see the W3C XML Specification [W3C-WD-xml].
The constructs that are found inside the CHANNEL element are defined in the following sections.
The following constructs are used for referring to external resources such as HTML pages.
Linking Constructs
The URL symbol specifies a Uniform Resource Locator (URL) as defined in RFC-1738. The HrefAtt symbol specifies the attribute name/value pair that is always used with a URL. Finally, the Anchor symbol specifies an element that can carry an HrefAtt name/value pair and is provided for compatibility.
Dates and dates with a time are expressed in CDF as follows:
Date and Time
This format is a simplified subset of the ISO 8601 date and time standard [ISO 8601], [W3C-NOTE-datetime]. Note that Month must be between 01 and 12, Day must be between 01 and 31 and the Hour must be between 00 and 23.
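The range rules above can be checked mechanically. The following is a minimal sketch, not the specification's grammar: it assumes a date is written YYYY-MM-DD (as in STARTDATE="1997-03-24") and that a date with a time appends "T" and HH:MM in the style of [W3C-NOTE-datetime]. The function name `parse_cdf_date` and the minute range check are assumptions made for this example.

```python
import re

# Assumed layout: YYYY-MM-DD, optionally followed by "T" and HH:MM.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2}))?")

def parse_cdf_date(text):
    """Return (year, month, day, hour, minute); hour and minute are None for plain dates."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a CDF date: {text!r}")
    year, month, day = (int(g) for g in match.group(1, 2, 3))
    if not 1 <= month <= 12:
        raise ValueError("Month must be between 01 and 12")
    if not 1 <= day <= 31:
        raise ValueError("Day must be between 01 and 31")
    hour = minute = None
    if match.group(4) is not None:
        hour, minute = int(match.group(4)), int(match.group(5))
        if not 0 <= hour <= 23:
            raise ValueError("Hour must be between 00 and 23")
        if not 0 <= minute <= 59:   # assumption: not stated in the text above
            raise ValueError("Minute must be between 00 and 59")
    return year, month, day, hour, minute
```

Note that, like the rule it mirrors, this check accepts any day from 01 to 31 regardless of the month.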
There also needs to be a way to express a quantity of time rather than a specific time or date:
Time Quantity
Channel Definition Format can specify how often a channel needs to be updated. This updating involves downloading and locally saving (precaching) on a client machine the contents referenced by the CHANNEL and ITEM elements and also sending page hit logging information to the target server specified in the LOGTARGET element.
Schedule
The most important function of a CDF schedule is to specify how often a task should be repeated on a periodic basis. Between each repeated task (e.g. downloading content) is a quantity of time. This is the "interval time", which is specified in the INTERVALTIME element. A schedule is a series of repeating periods, each lasting one interval time. This series begins at midnight at the start of the day specified by the STARTDATE attribute. If STARTDATE is not present, it is assumed to be the day a client starts using the schedule. Likewise, a schedule ceases to apply after the day specified by STOPDATE ends.
By default, channel updating occurs at the beginning of each of these interval time periods. This presents a problem for content servers, which can be overloaded when a large number of clients download content at the exact same time specified by a CDF schedule. To solve this problem, clients can, instead of updating at a specific time, update at a random time within a range of times inside an interval time period. When a "latest time" quantity is introduced with LATESTTIME, clients randomly choose a single time between the beginning of the interval time period and the point a "latest time" amount of time later. If the range from which the random time is chosen should not begin at the start of each interval time period, EARLIESTTIME can be introduced so that the range begins an "earliest time" amount of time after the start of the interval time period. For example:
<SCHEDULE STARTDATE="1997-03-24">
    <INTERVALTIME HOUR="6" />
    <LATESTTIME HOUR="3" />
    <EARLIESTTIME HOUR="1" />
</SCHEDULE>
specifies that every day from March 24th, 1997 on, an update should randomly occur once between 1am and 3am, once between 7am and 9am, once between 1pm and 3pm and once between 7pm and 9pm.
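The windows in this example can be reproduced with a few lines of arithmetic. The sketch below works in whole hours measured from midnight; the function names `update_windows` and `pick_update_time` are assumptions made for illustration, and a real client would also have to honor STARTDATE and STOPDATE.

```python
import random

def update_windows(interval_h, earliest_h=0, latest_h=0, day_hours=24):
    """List the (start, end) update windows of one day, in hours after midnight."""
    windows = []
    period_start = 0
    while period_start < day_hours:
        # Each window is offset from the start of its interval time period.
        windows.append((period_start + earliest_h, period_start + latest_h))
        period_start += interval_h
    return windows

def pick_update_time(window, rng=random):
    """Choose one random update time (in hours) inside a window."""
    start, end = window
    return rng.uniform(start, end)

# INTERVALTIME HOUR="6", EARLIESTTIME HOUR="1", LATESTTIME HOUR="3"
windows = update_windows(6, earliest_h=1, latest_h=3)
# windows == [(1, 3), (7, 9), (13, 15), (19, 21)]
```

Spreading each client's update across a window, rather than firing at the window's start, is what keeps a large subscriber base from hitting the server at the same instant.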
A TIMEZONE attribute is introduced since schedules can be published in different time zones from where a client may use the schedule. When the TIMEZONE attribute is not present then all the times deduced from a schedule are in a client's local time. If a TIMEZONE attribute is specified it specifies the time zone of the publisher of a CDF schedule. All clients in other time zones should adjust their interpretation of a CDF schedule so that it matches the times in the publisher's time zone. If a client in Israel (time zone +0200) uses a schedule from Spain (time zone +0100):
<SCHEDULE TIMEZONE="+0100">
    <INTERVALTIME DAY="1" />
    <LATESTTIME HOUR="3" />
</SCHEDULE>
then the client will update between 1am and 4am in the client's local Israeli time, which corresponds to between 12am and 3am in the publisher's Spanish time.
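The adjustment is a simple offset difference. The following sketch assumes time zones are written as a sign followed by HHMM, as in the example above; the helper names `parse_offset` and `to_client_local` are hypothetical, and times are handled as minutes after midnight with any change of calendar day ignored.

```python
def parse_offset(tz):
    """Convert a "+HHMM" or "-HHMM" string into minutes east of UTC."""
    sign = -1 if tz[0] == "-" else 1
    return sign * (int(tz[1:3]) * 60 + int(tz[3:5]))

def to_client_local(minutes_after_midnight, publisher_tz, client_tz):
    """Shift a publisher-local time of day into the client's local time of day."""
    shift = parse_offset(client_tz) - parse_offset(publisher_tz)
    return (minutes_after_midnight + shift) % (24 * 60)

# Publisher window 12am to 3am at +0100, seen by a client at +0200.
start = to_client_local(0, "+0100", "+0200")        # 60, i.e. 1am
end = to_client_local(3 * 60, "+0100", "+0200")     # 240, i.e. 4am
```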
The set of time ranges during which a client should randomly choose a time to update content can be expressed formally on a timeline as follows:
{ [ S + E + I*n , S + L + I*n ] : n ≥ 0 } / { [a,b] : b > T }
where
The primary purpose of CDF is to describe and index external content, usually in the form of HTML pages. The information that serves this purpose can be divided into two areas: bibliographic description, and information specific to where content resides and how it should be retrieved.
Bibliographic Description
(e.g. <TITLE>CDF Reference</TITLE>)
(e.g. <ABSTRACT>How to create CDF</ABSTRACT>)
Page Retrieval Attributes
These attributes specify where and how the resources and HTML pages indexed by a channel should be retrieved during channel updating. The HREF attribute specifies the URL of the page or resource indexed.
In addition to indexing and describing external content, CDF can also organize that indexed content into a hierarchical structure. When an external resource is a leaf node in the hierarchical structure provided in CDF, its bibliographic and retrieval information is provided in an ITEM element. For nodes which contain other nodes in the hierarchical structure, the CHANNEL element is used.
ITEM element - leaf node in hierarchy
Example:
<ITEM HREF="http://www.foosports.com/intro.htm">
    <TITLE>Welcome to FooSports!</TITLE>
    <ABSTRACT>FooSports articles, news, and promotional offers</ABSTRACT>
</ITEM>
Two symbols which are in ITEM but not CHANNEL are Log and Usage. Log (defined further below) indicates whether the indexed page should be logged. Usage, also defined further below, indicates how the indexed resource or page should be used.
CHANNEL element - container node in hierarchy
The resource or page indexed by a node in the CDF hierarchy is
provided through the HREF attribute in either the PageAttSeq
or Anchor
symbols, with preference given to the PageAttSeq symbol. The
bibliographic information in BiblioElemSeq
refers to this page. The Logo symbol provides a reference to an
image which serves as an associated visual for the indexed
resource or page.
The CHANNEL and ITEM elements specified in the HierarchySeq symbol are the child nodes of the enclosing CHANNEL element in the hierarchy specified by CDF.
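To make the container and leaf relationship concrete, the sketch below walks a small CDF document with Python's standard XML parser. The document is hypothetical: only the intro.htm item comes from the example above, and the other URLs, titles, and the `outline` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical channel: a CHANNEL containing an ITEM and a nested CHANNEL.
CDF_XML = """
<CHANNEL HREF="http://www.foosports.com/index.htm">
    <TITLE>FooSports</TITLE>
    <ITEM HREF="http://www.foosports.com/intro.htm">
        <TITLE>Welcome to FooSports!</TITLE>
    </ITEM>
    <CHANNEL HREF="http://www.foosports.com/scores/index.htm">
        <TITLE>Scores</TITLE>
        <ITEM HREF="http://www.foosports.com/scores/today.htm">
            <TITLE>Today</TITLE>
        </ITEM>
    </CHANNEL>
</CHANNEL>
"""

def outline(node, depth=0):
    """Yield (depth, tag, title) for a CHANNEL or ITEM node and its descendants."""
    yield depth, node.tag, node.findtext("TITLE")
    for child in node:
        # Only CHANNEL and ITEM children are nodes of the hierarchy.
        if child.tag in ("CHANNEL", "ITEM"):
            yield from outline(child, depth + 1)

root = ET.fromstring(CDF_XML.strip())
for depth, tag, title in outline(root):
    print("    " * depth + f"{tag}: {title}")
```

ITEM nodes never contribute further levels here because, as leaf nodes, they contain no CHANNEL or ITEM children.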
Lastly we have the BASE attribute in CHANNEL.
Base Attribute
Logo
Specifies an image that can be associated with a channel or channel item.
Value | Size |
"ICON" | 16H x 16W |
"IMAGE" | 32H x 80W |
"IMAGE-WIDE" | 32H x 194W |
Note: GIF and JPEG formats should be supported as logo images.
Example:
<LOGO HREF="http://www.foosports.com/images/logo.gif" STYLE="IMAGE"/>
Usage
Indicates how an ITEM element should be used.
Value | Description |
"Channel" | These items appear visually in the client program. This is the default behavior when no USAGE element appears under an ITEM. |
"NONE" | If a "NONE" value is used as the only USAGE element in an ITEM, the item should not be displayed on the client. |
"ScreenSaver" | These items are displayed in a screen saver. |
The content of the element is optional and varies according to the VALUE attribute. Other values can also be specified in the VALUE attribute.
Example:
<ITEM HREF="http://www.foosports.com/screensaver.htm">
    <USAGE VALUE="ScreenSaver"></USAGE>
</ITEM>
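A client might apply the usage rules above roughly as follows. This is a sketch under stated assumptions: the helper names `usage_values` and `is_displayed` are hypothetical, VALUE strings are compared exactly as written in the table, and the second and third items are made-up inputs.

```python
import xml.etree.ElementTree as ET

def usage_values(item):
    """Return the VALUE of each USAGE child, or ["Channel"] when there is none."""
    values = [usage.get("VALUE") for usage in item.findall("USAGE")]
    return values or ["Channel"]

def is_displayed(item):
    """An item is hidden only when "NONE" is its sole USAGE value."""
    return usage_values(item) != ["NONE"]

screensaver = ET.fromstring(
    '<ITEM HREF="http://www.foosports.com/screensaver.htm">'
    '<USAGE VALUE="ScreenSaver"></USAGE></ITEM>'
)
plain = ET.fromstring('<ITEM HREF="http://www.foosports.com/intro.htm"/>')
hidden = ET.fromstring(
    '<ITEM HREF="http://www.foosports.com/hidden.htm"><USAGE VALUE="NONE"/></ITEM>'
)
```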
Login
Indicates that when a channel is subscribed to, the client should prompt the user for authentication information to use in later unattended channel updates.
Logging
Specifies where to upload a client's page-hit log file in Extended Log File Format [W3C-WD-logfile].
Example:
<LOGTARGET HREF="http://www.foosports.com/logging" METHOD="POST" SCOPE="OFFLINE">
    <PURGETIME HOUR="12"/>
</LOGTARGET>
Log
Specifies that the page referenced by the parent CHANNEL or ITEM element should be recorded in a page hit log file.
An ITEM can be logged only if the path of the ITEM's HREF attribute falls under the path of the CDF's URL or the path of the LOGTARGET's HREF.
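One possible reading of this rule is sketched below. It is an interpretation, not the specification's definition: "falls under the path" is assumed to mean that the item is on the same scheme and host and that its path starts with the directory of the other URL. The helper names and the CDF file URL are hypothetical.

```python
from urllib.parse import urlsplit
import posixpath

def falls_under(href, base):
    """True if href is on the same scheme and host as base and inside base's directory."""
    h, b = urlsplit(href), urlsplit(base)
    if (h.scheme, h.netloc) != (b.scheme, b.netloc):
        return False
    # Directory of the base URL, always ending in "/".
    if b.path.endswith("/"):
        base_dir = b.path
    else:
        base_dir = posixpath.dirname(b.path).rstrip("/") + "/"
    return (h.path or "/").startswith(base_dir)

def can_log(item_href, cdf_url, logtarget_href):
    """Apply the rule: loggable if under the CDF's URL or the LOGTARGET's HREF."""
    return falls_under(item_href, cdf_url) or falls_under(item_href, logtarget_href)

# Hypothetical CDF location; the item and log target come from the examples.
ok = can_log(
    "http://www.foosports.com/promotion.htm",
    "http://www.foosports.com/channels/sports.cdf",
    "http://www.foosports.com/logging",
)   # True: the item sits under the directory of the LOGTARGET's HREF
```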
Example:
<ITEM HREF="http://www.foosports.com/promotion.htm">
    <LOG VALUE="document:view"/>
</ITEM>
Multisets are collections of objects which do not have any order but do have a number of occurrences associated with each object. Sets are multisets in which every member has either one or zero occurrences. Multisets provide a clean way to define any collection where the multiplicity of members is of concern but not their order. Formally, a multiset is a function from a universe of possible member objects to the natural numbers (i.e. {0,1,2,...}). Like any function, a multiset can be defined by a set of ordered pairs, each consisting of a multiset member and a natural number. When such a set is provided, it is assumed that the ordered pair (x,0) is included for any x for which no ordered pair (x,n) is included. With this representation of multisets, the multiset union of two multisets, ( a ∪ b ), is { (x,n+m) : (x,n) ∈ a, (x,m) ∈ b }. Collections of multisets of languages are useful for creating string expressions where the multiplicity of component parts is important but not their order. We will refer to collections of multisets of languages as multiset expressions.
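For readers who prefer code, Python's `collections.Counter` behaves like the multisets described here: a member that was never added has count 0, and adding two counters sums multiplicities. The wrapper name `multiset_union` is an assumption made for this sketch.

```python
from collections import Counter

def multiset_union(a, b):
    """Multiset union as defined above: multiplicities of each member are added."""
    return a + b

a = Counter({"x": 2, "y": 1})
b = Counter({"x": 1, "z": 3})
u = multiset_union(a, b)
# u == Counter({"x": 3, "y": 1, "z": 3}); u["w"] == 0 for a member never added
```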
A useful function L maps multisets of languages (sets of strings) to languages.
L(M) := { s1 s2 ... sn : s1 ∈ L1 , s2 ∈ L2 , ... , sn ∈ Ln , {(L1,1)} ∪ {(L2,1)} ∪ ... ∪ {(Ln,1)} = M }
A version of L for multiset expressions naturally results with:
L(C) := { s : s ∈ L(M), M ∈ C }
When a multiset expression needs to be mapped to a language, it is natural to use the L function. For this reason, if a multiset expression is found in a regular expression of languages, it will be assumed that the L function is to be applied to the multiset expression.
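For finite languages, the L function can be computed directly by enumeration. The sketch below is an illustration under stated assumptions: languages are represented as frozensets of strings, a multiset of languages as a `Counter`, and a multiset expression as a list of such counters. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

def language_of_multiset(m):
    """L(M): concatenations taking one string per language occurrence, in any order."""
    occurrences = list(m.elements())   # each language repeated by its multiplicity
    result = set()
    for ordering in set(permutations(occurrences)):
        for strings in product(*ordering):
            result.add("".join(strings))
    return result

def language_of_expression(c):
    """L(C): the union of L(M) over every multiset M in the collection C."""
    result = set()
    for m in c:
        result |= language_of_multiset(m)
    return result

A = frozenset({"a"})
B = frozenset({"b", "c"})
one_of_each = Counter({A: 1, B: 1})
strings = language_of_multiset(one_of_each)
# strings == {"ab", "ac", "ba", "ca"}: one string from A and one from B, in either order
```

Enumerating permutations is exponential in the size of the multiset, which is acceptable for illustration but not how a parser would check order-independent content.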
Now that there is a well-defined means to include multiset expressions in regular expressions, it will be useful to have operations whereby multiset expressions can be constructed from languages and other multisets:
© 1998 Microsoft Corporation. All rights reserved. Terms of use.