Network Working Group Yong Xie
Internet-Draft Myung J. Lee
Tarek N. Saadawi
The City College of New York
August 20, 1996
Sink-Assisted Routing Protocol (SARP) For General IP Multicasting
<draft-yong-sarp-00.txt>
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
Abstract
This Internet-Draft proposes a routing protocol for supporting
real-time multicast service over the Internet. The work is
motivated by the fact that existing multicast routing protocols are
designed under the assumption of symmetric paths, an assumption that
does not necessarily hold, particularly for real-time applications.
The essence of the proposed protocol is to use the inter-subnet
Routing Guide Message (RGM) flows generated by sink-subnets to form
multicast spanning trees.
The underlying algorithms enable source and intermediate nodes
to construct minimal end-to-end delay paths to sink-subnets
based on the path and the session information conveyed by
incident RGM-flows.
1. Introduction
The primary function of a multicast routing protocol is to select a
subset of the Internet topology to form the group-specific spanning
trees over which multicast datagrams are delivered from their
sources to their sinks.  In practice, a multipoint-to-multipoint
tree is constructed from multiple point-to-multipoint subtrees.
Multicast routers need to look up the actual source or the sinks of
a particular datagram in order to make forwarding decisions.
There have been a number of multicast routing protocols and
associated algorithms proposed by the Internet community. A
typical example is the Distance Vector Multicast Routing Protocol
(DVMRP) [4] which employs the Reverse Path Multicast (RPM) algorithm
in the latest mrouted version 3.5 and in some vendor implementations.
As surveyed in [3], RPM is an enhancement of the Reverse Path
Broadcast (RPB) and the Truncated Reverse Path Broadcasting (TRPB).
RPM defines a way to form the source-rooted spanning trees. The
distance back to the source of datagrams is the main factor in
determining the forwarding paths.  It assumes path symmetry, which
is a fundamental limitation of the entire "reverse optimal path"
family of routing algorithms, including RPB, TRPB, and RPM.  In
fact, a shortest path does not necessarily guarantee a shortest
reverse path, because a full-duplex link may carry substantially
imbalanced loads in the two directions.  Furthermore, using fixed
"metrics" (e.g., hop count) to compare alternative routes is
inappropriate in situations where routes need to be chosen based on
real-time parameters (e.g., real-time delay) (see [2]).  These two
problems, the symmetric-path assumption and the fixed metric, pose a
significant challenge for any IP multicast routing protocol that is
to support real-time applications.
The proposed Sink-Assisted Routing Protocol (SARP) is an attempt to
address the aforementioned two issues.  That is, SARP focuses on
computing the minimal-delay path from sources to sinks using
a real-time delay metric. In contrast, DVMRP is derived from
the Routing Information Protocol (RIP) and concerned with computing
the shortest path, in terms of fixed metrics, back to the source of a
multicast datagram.  In short, SARP forms sink-rooted rather than
source-rooted spanning trees using an inter-gateway primitive called
the Routing Guide Message (RGM).
In order to search for potential multicast source groups, it is
assumed that a sink-subnet periodically generates and multicasts
RGMs to every possible subnet within a limited scope of the
Internet.  A source-subnet starts multicasting datagrams out only
after intercepting the corresponding RGM flow(s).  The information
conveyed by an RGM includes a list of the multicast groups requested
on behalf of the entire sink-subnet, as well as the path status.
RGMs are to be forwarded using a typical Reverse Path Broadcast
algorithm, except that the optimal path is determined based on the
measure of Reverse-Relative-Delay (RRD). In other words, RGMs are
routed along paths whose reverse paths are optimal in terms of
real-time delay back to the sink-subnets.
Currently, multicast service is considered connectionless, due to
the generic properties of IP.  A conventional higher-layer,
handshake-type setup of a point-to-point connection is often too
expensive for most real-time and multicast applications.  Also, flow
control and error correction can become quite complicated when a
dynamic multicast session group has to be maintained.  Nevertheless,
there is an emerging demand for a quality-assured IP multicast
service, which requires a certain degree of connectivity between
sources and sinks.
Towards that end, SARP is also intended to introduce the concept
of "loose-IP-connectivity". That is, the logical connectivity between
two IP end-points can be loosely defined by the continuity of
end-to-end control-message-flow, while using the basic IP support.
In SARP, the multicast spanning tree is completely specified by the
incidence relations between the RGM-flows and the source nodes.
The sink-nodes send periodic RGMs as requests, wait for data to
arrive, and treat that arrival as a natural acknowledgement, while a
source-node is aware of the existence of the sink-nodes and
maintains transmission only as long as RGM-flows are engaged, giving
a sense of a data connection.  We strongly believe that, as long as
multicasting and routing are part of the IP functions, it is
necessary to extend IP to provide an unacknowledged
connection-oriented service.  First, such a service is beneficial to
IP multicasting, in which point-to-point addressing is impossible
and the data connection is more important than the logical
connection.  Second, it is capable of
conjoining the routing and the resource reservation, which are
naturally two inseparable issues.
Throughout this draft, the term "IP-entity" is used generally to cover
a single network or a subnet, in which all hosts share the same subnet
IP address. It has at least one full-duplex link to its neighbor.
The access point of an IP-entity to the link is called the port.
The Internet can be abstracted as a planar graph with the set of
IP-entities as its nodes and pairs of directed edges connecting
adjacent nodes.  Thus, we shall frequently refer to an IP-entity as
a node; for example, the source-node, the sink-node, and the
intermediate-node.
2. Model of Operation
Multicast routing within an entity is transparent to IP. That means,
as long as a datagram remains on a subnet, it can reach any sink
application within the subnet using the technology that is specific
to that subnetwork. For example, on the Ethernet, multicasting is
handled by mapping a class-D address into an Ethernet address. On
connection-oriented networks, such as an ATM network, it can be done
by using a signaling mechanism to establish host-to-host or
host-to-gateway Virtual Circuits (VCs).
Since the task of IP multicast routing is primarily a matter of
finding gateways among networks, the proposed protocol is concerned
only with routing multicast datagrams across an internetwork,
along the minimal-delay paths from source-subnets to sink-subnets.
SARP depends on the participation of the sinks to establish the
group-specific multicast spanning trees.  According to the
standardized host implementation for supporting IP multicasting [1],
incoming multicast IP datagrams are received by upper-layer protocol
modules using the same "Receive IP" operation as for normal unicast
datagrams, and an application process needs to explicitly ask its IP
module to join a group through a service interface.  In addition,
SARP assumes that a process claims itself as a "sink" by opening the
file descriptor in read or read/write mode.  The host IP module must
be extended to report the multicast session information to the
subnet-router, so that the routing protocol module is able to
summarize the demands of the subnet's hosts into a per-subnet
sink-group-table.  As long as the table is not empty, RGMs are
generated.
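As an illustration only, the per-subnet sink-group-table might be
maintained along the following lines; the structure, field, and
function names below are our own assumptions, not part of the
protocol specification:

    #include <stdint.h>

    #define MAX_GROUPS 64

    /* Hypothetical per-subnet sink-group-table kept by the subnet-router.
     * Each entry is a multicast group that at least one local host has
     * opened as a "sink"; RGMs are generated as long as n > 0.
     * All names and sizes are illustrative, not part of the draft. */
    struct sink_group_table {
        uint32_t group[MAX_GROUPS];    /* class-D addresses              */
        int      refcnt[MAX_GROUPS];   /* number of local sink processes */
        int      n;
    };

    /* Called when the host IP module reports that a local process has
     * joined a group in read or read/write mode. */
    void sink_join(struct sink_group_table *t, uint32_t g)
    {
        for (int i = 0; i < t->n; i++)
            if (t->group[i] == g) { t->refcnt[i]++; return; }
        if (t->n < MAX_GROUPS) {
            t->group[t->n]  = g;
            t->refcnt[t->n] = 1;
            t->n++;
        }
    }

    /* Called on the corresponding leave; the periodic RGM generator
     * simply checks t->n > 0 before emitting the next RGM. */
    void sink_leave(struct sink_group_table *t, uint32_t g)
    {
        for (int i = 0; i < t->n; i++) {
            if (t->group[i] != g)
                continue;
            if (--t->refcnt[i] == 0) {          /* last local sink left */
                t->group[i]  = t->group[t->n - 1];
                t->refcnt[i] = t->refcnt[t->n - 1];
                t->n--;
            }
            return;
        }
    }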
3. Algorithms for SARP
3.1. Measuring the Reverse-Relative-Delay of a Path
Considering the real-time properties of most multicast applications,
it is desirable that the distance between two nodes be quantified in
terms of real-time delay.  The absolute delay, however, is costly to
measure, since network clocks are usually not synchronized
accurately enough for real-time interactive applications.  Moreover,
the current network synchronization protocols (e.g., [5]) assume
symmetric network paths and use the round-trip delay as an
approximation.  We call the delay measured by referencing
asynchronous clocks the "relative-delay".  It is the real-time delay
measure with a clock-offset error included.
In a best-effort service packet-switching internetwork, "optimal" is
quite a relative term. When a source/intermediate-node intercepts
only one RGM-flow from a given sink-node, the route of the incoming RGM-
flow is certainly the only and therefore the optimal route to forward
datagrams.  When multiple RGM-flows incident on a source-node are
rooted at the same sink-node but arrive via different routes, the
optimal route back to the sink-node can be determined by simply
comparing the relative-delays of the distinct paths.  This is possible
because the incurred relative-delays include the same amount of
clock-offset for any path between a given pair of nodes.
To measure the relative-delay of a path, each node in the Internet
that participates in the routing protocol is assumed to send periodic
delay-probe-packets to every possible adjacent node. Thus, the
receiving node is able to compute the directional One-Hop-Relative-
Delays (OHRDs) from (not to) its neighbors. It is required that each
node maintain an OHRD cache with an entry for each neighbor.  No
further routine information exchange or propagation among routing
entities is required by the protocol, which protects the algorithm
from slow convergence.  The RGM is designed to convey the
"run-time" relative-delay information across the internetwork.
Let d(i,i+1) represent the absolute-delay from node i to adjacent
node i+1. The OHRD d'(i,i+1) can be measured by sending a probe
IP datagram from i to i+1, whose departure time t(i) and arrival
time t(i+1) are time-stamped according to the clocks at i and i+1,
respectively. d'(i,i+1) actually consists of two components:
d'(i,i+1) = d(i,i+1) + o(i,i+1)
where o(i,i+1) denotes the clock-offset between i and i+1.
Obviously, a single trial of the delay measurement would be
insufficient to represent the statistics of the path. It is assumed
that the OHRD database maintained by each node is updated with
moving averages of instantaneous measurements.  The measurement
interval and the averaging window size are critical to both the
sensitivity and the stability of the routing protocol.  The basic
criterion is that the OHRD database should reflect the link
statistics while requiring minimal computation.  Further study is
required to determine those parameters.  From now on, we shall refer
to d'(i,i+1) as the measure of the mean OHRD from node i to its
neighbor i+1.  It is recorded by node i+1 and is available for
reference at any time.
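As an informal illustration of the OHRD measurement, the following C
sketch shows how a node might turn probe timestamps into a
per-neighbor moving-average cache; the data-structure names and the
smoothing constant are illustrative assumptions, since the draft
leaves the measurement interval and averaging window open:

    #include <stdint.h>

    #define MAX_NEIGHBORS 32

    /* One OHRD cache entry per neighbor.  The names and the EWMA
     * smoothing constant are illustrative assumptions; the draft leaves
     * the measurement interval and averaging window for further study. */
    struct ohrd_entry {
        uint32_t neighbor;   /* neighbor's address                     */
        double   ohrd;       /* mean relative delay FROM that neighbor */
        int      valid;
    };

    static struct ohrd_entry ohrd_cache[MAX_NEIGHBORS];

    /* Called when a delay-probe packet from 'neighbor' arrives.
     * t_depart was stamped by the neighbor's clock, t_arrive by the
     * local clock; their difference is the instantaneous OHRD
     * d' = d + o (true one-hop delay plus clock-offset). */
    void ohrd_update(uint32_t neighbor, double t_depart, double t_arrive)
    {
        const double alpha  = 0.125;        /* arbitrary smoothing factor */
        double       sample = t_arrive - t_depart;

        for (int i = 0; i < MAX_NEIGHBORS; i++) {
            if (ohrd_cache[i].valid && ohrd_cache[i].neighbor == neighbor) {
                /* exponentially weighted moving average */
                ohrd_cache[i].ohrd += alpha * (sample - ohrd_cache[i].ohrd);
                return;
            }
        }
        for (int i = 0; i < MAX_NEIGHBORS; i++) {  /* first probe: new entry */
            if (!ohrd_cache[i].valid) {
                ohrd_cache[i].neighbor = neighbor;
                ohrd_cache[i].ohrd     = sample;
                ohrd_cache[i].valid    = 1;
                return;
            }
        }
    }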
When an RGM traverses a multi-hop path from node j to node i, the
relative delay of the reverse path, denoted D'(i,j), can be
calculated by adding up the OHRDs of every hop along the path.  Among
other information, each RGM-datagram carries a Reverse-Relative-Delay
(RRD) parameter, to which the OHRD of a hop is added each time
before the RGM crosses that hop.  Thus, the receiving node is able
to examine the reverse path by simply reading the value of the RRD
field:
    D'(i,j) = d'(j-1,j) + d'(j-2,j-1) + ... + d'(i,i+1)
            = d(j-1,j) + d(j-2,j-1) + ... + d(i,i+1) + o(i,j)
where o(i,j) is the clock-offset between node i and node j. It is
equal to the sum of the clock-offset components of all OHRDs along
the path.  Let c(i) represent the value of node i's clock at a given
instant; then o(i,j) is independent of the intermediate clocks, and
hence of the path:
o(i,j) = c(i) - c(j)
= c(i) - c(i+1) + c(i+1) - c(i+2) + ... + c(j-1) - c(j)
= o(i,i+1) + o(i+1, i+2) + ... + o(j-1, j)
As shown in the example topology below, two RGM-flows originate at
node-A and terminate at node-D, via B and C respectively.  If
D'(D,B,A) < D'(D,C,A), the port to B is considered by node-D as the
optimal route for forwarding datagrams of a given multicast group to
sink-node A, and vice versa.
B---------D D'(D,B,A) = d'(B,A) + d'(D,B)
/ / = d(B,A) + o(B,A) + d(D,B) + o(D,B)
/ / = D(D,B,A) + o(D,A)
A---------C D'(D,C,A) = D(D,C,A) + o(D,A)
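The RRD accumulation described above can be sketched as follows; the
structure layout and the helpers ohrd_from(), port_neighbor(), and
send_rgm() are our own illustrative assumptions about how a node
might keep its OHRD cache and transmit RGMs:

    #include <stdint.h>

    /* Sketch of RRD handling when an RGM is forwarded.  The structure
     * layout and the helpers below are illustrative assumptions, not
     * protocol definitions. */
    struct rgm {
        uint32_t sink_subnet;   /* sink-subnet-id (IP source address) */
        double   rrd;           /* accumulated Reverse-Relative-Delay */
        uint8_t  ttl;
        /* ... requested multicast-group list, etc. ... */
    };

    double   ohrd_from(uint32_t neighbor);            /* cached d'(neighbor, this node) */
    uint32_t port_neighbor(int port);                 /* adjacent node on that port     */
    void     send_rgm(const struct rgm *m, int port); /* link-level transmit            */

    /* Before the RGM crosses the hop to the neighbor on 'port', the
     * sender adds the OHRD of the reverse direction of that hop, i.e.
     * the delay measured FROM that neighbor.  Since every path between
     * the same pair of nodes accumulates the same clock-offset o(i,j),
     * the receiver may compare RRDs of alternative paths directly. */
    void forward_rgm(struct rgm *m, int port)
    {
        struct rgm copy = *m;

        copy.rrd += ohrd_from(port_neighbor(port));
        copy.ttl--;
        send_rgm(&copy, port);
    }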
3.2. Forming the Sink-Rooted Spanning Trees
As mentioned in the foregoing, RGMs are used to search for multicast
source-node groups on behalf of multicast sink-node groups, and
to convey the information about the end-to-end RRD measures.  The IP
header of an RGM datagram contains the sink-subnet-id as its Source
Address and, in the Destination Address field, a well-known
multicast address assigned specifically to the routing protocol.  The
initial Time-To-Live (TTL) parameter is set to a number greater than
1 and less than 255 to limit the sink-rooted tree to span within a
given scope. RGMs also carry other information which will be
discussed later.
RGMs are to be forwarded using the typical Reverse Path Broadcasting
(RPB) algorithm. It is assumed that each node that has more than one
port maintains a session database, called the "RGM-flow table", to
cache the information about every incident RGM-flow.  An entry is
created for a sink-node in the cache upon receiving the first RGM
from that node; successive arrivals from the same sink-node keep the
entry active.
For a node on the edge of the Internet, i.e., one with only a single
port connecting to the outside world, it suffices to record the union
of the multicast-group-lists of all incoming RGMs.  The actual
source addresses of the RGMs are of no interest, because the node
must send datagrams through that single port anyway, if it is a source of
the requested multicast group.
In case an intermediate/source node has direct connections to
multiple adjacent nodes, it may intercept multiple RGM-flows
originated by the same sink-node via different routes.
Nevertheless, only one RGM-flow from a given sink-node is accepted
and possibly forwarded to form the segment of the sink-rooted
spanning tree.  In the RGM-flow table, an incoming RGM-flow is
represented by the (SinkNode, InPort) pair.  The RRD is recorded for
the respective RGM-flow whenever an RGM arrives.  Any port that is
currently not being contacted by that sink-node is considered to
have an infinite RRD value.
Suppose that node-A has three neighboring nodes and receives RGMs
rooted at sink-subnet S (134.74.60.0) through ports p0 and p2.
According to the RRD measurements, the route via p0 is the optimal
reverse path back to node-S.  The RGMs arriving at p0 are forwarded
through ports p1 and p2 to the downstream neighbors (with respect to
the RGM-flows), whereas those arriving at p2 are discarded.  The
RGM-flow table of node-A should then look as follows:
SinkNode InPort RRD FlowTTL
--------------------------------------
134.74.60.0 p0 12345 2
p1 inf /
p2 23456 3
As shown in the table, each (SinkNode, InPort) pair also has a Flow
Time-To-Live (FlowTTL) counter to monitor the continuity of the RGM-
flow.  A FlowTTL is decreased by one every time unit and reset upon
receiving a subsequent RGM.  The RRD value of a sub-entry is set to
infinity if no more RGMs arrive at the port and the FlowTTL has
expired.  The entry for a given SinkNode is cleaned up only when all
sub-entries have infinite RRD values.  It is mandatory that the
interval between any two consecutive RGMs a sink-node sends along
the same outgoing path have a protocol-specific upper bound, in order
to quantify the continuity of the RGM-flow. Since packet-loss and
delay-jitter are expected in the Internet, the duration of FlowTTL is
supposed to cover several such maximal RGM-intervals.
In summary, a source/intermediate node records the RRD information
for each incoming RGM-flow, and only forwards RGMs that arrive via
the optimal reverse path (the one with the smallest RRD for a given sink-node)
to downstream neighbors through all ports except the incoming port.
Any RGM-flow will be terminated by the receiving node if the TTLs of
RGM-datagrams are exhausted, and/or the reverse path of RGMs is not
considered as the optimal reverse path back to their origins.
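Putting the pieces of this section together, a node's handling of an
arriving RGM might look roughly like the C sketch below; the table
layout, constants, and helper names are illustrative assumptions,
and the periodic FlowTTL aging is only indicated in a comment:

    #include <math.h>
    #include <stdint.h>

    #define MAX_PORTS    8
    #define MAX_SINKS    64
    #define RGM_FLOW_TTL 4   /* assumed to cover several maximal RGM intervals */

    struct rgm { uint32_t sink_subnet; double rrd; uint8_t ttl; /* ... */ };

    /* One RGM-flow-table entry per sink-node, with an (RRD, FlowTTL)
     * sub-entry per port.  Layout and sizes are illustrative only. */
    struct rgm_flow_entry {
        uint32_t sink;                 /* SinkNode                       */
        double   rrd[MAX_PORTS];       /* INFINITY if port not contacted */
        int      flow_ttl[MAX_PORTS];
        int      valid;
    };

    static struct rgm_flow_entry flow_table[MAX_SINKS];

    int  num_ports(void);                       /* assumed helper         */
    void forward_rgm(struct rgm *m, int port);  /* see the earlier sketch */

    /* Record the RGM arriving on in_port, then forward it out of every
     * other port only if in_port is the minimum-RRD port for that sink. */
    void rgm_receive(struct rgm *m, int in_port)
    {
        struct rgm_flow_entry *e = 0;

        for (int i = 0; i < MAX_SINKS && !e; i++)
            if (flow_table[i].valid && flow_table[i].sink == m->sink_subnet)
                e = &flow_table[i];

        if (!e) {                                 /* first RGM from this sink */
            for (int i = 0; i < MAX_SINKS && !e; i++) {
                if (!flow_table[i].valid) {
                    e = &flow_table[i];
                    e->sink  = m->sink_subnet;
                    e->valid = 1;
                    for (int p = 0; p < MAX_PORTS; p++) {
                        e->rrd[p]      = INFINITY;
                        e->flow_ttl[p] = 0;
                    }
                }
            }
            if (!e)
                return;                           /* table full: drop */
        }

        e->rrd[in_port]      = m->rrd;            /* run-time RRD      */
        e->flow_ttl[in_port] = RGM_FLOW_TTL;      /* the flow is alive */

        for (int p = 0; p < num_ports(); p++)     /* is in_port optimal?  */
            if (p != in_port && e->rrd[p] < e->rrd[in_port])
                return;                           /* no: discard the RGM  */

        if (m->ttl > 1)
            for (int p = 0; p < num_ports(); p++)
                if (p != in_port)
                    forward_rgm(m, p);

        /* A separate periodic timer (not shown) decrements every
         * flow_ttl[p] and resets rrd[p] to INFINITY when it expires;
         * an entry whose sub-entries are all INFINITY is cleaned up. */
    }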
3.3. Forwarding Multicast Datagrams to Sink-Subnets
Once the sink-rooted trees are formed, delivery of datagrams seems
straightforward.  The Forwarding-Table of a source/intermediate node
is derived from its RGM-flow-table.  It contains an entry for each
multicast group that is of interest to at least one active
sink-node.  Each multicast group in the table is mapped to a set of
ports that represents the union of the optimal routes to every
sink-node within the group.  Thus, the basic forwarding rule is that
any node that has either locally generated or received datagrams
forwards them to selected downstream neighbors according to the
MulticastGroup-to-OutPorts map (a sketch of deriving such a table is
given after the example tables below).  However, multiple
sink-rooted trees might join each other so that the aggregated
spanning graph contains circuits.  In order to form the
group-specific spanning tree, the circuits must be broken.
#----------------------------------------------------------------#
| |
| +++++p0 +++++ G-E G E |
| + G @--------@ E @------- | | \ |
| ++@++ p1+++++p0 \ A-B-D A-B-D |
| p1| \ | | |
| |p1 \ p0 F-C (A) F-C (B)(C)(F) |
| ++@++ p1+++++ p1++@++ |
| + A @--------@ B @--------@ D + G-E G-E G-E |
| +++++p0 ++@++p0 +++++ \ | \ | \ |
| p2| A-B-D A B-D A-B D |
| |p0 | | | |
| +++++p0 p1++@++ F-C (D) F-C (E) F-C (G) |
| + F @--------@ C + |
| +++++ +++++ the tree rooted at a given |
| sink-node |
| |
| (*) represents sink node. |
#----------------------------------------------------------------#
The above figure illustrates an example topology and a possible
set of trees rooted at every potential sink-node in the graph.
Suppose that (A,C,G) is the set of active sink-nodes of a given
multicast group, 239.10.10.1.  The forwarding-table of each node
should then contain the following information:
MulticastGroup OutPorts SinkNodes
-----------------------------------
node-A: 239.10.10.1 p0 C
p1 G
node-B: 239.10.10.1 p1 A,G
p2 C
node-C: 239.10.10.1 p0 A,G
node-D: 239.10.10.1 p1 A,C
node-E: 239.10.10.1 p0 C
p1 A,G
node-F: 239.10.10.1 p0 A,C,G
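One way the forwarding-table entries shown above might be derived
from the RGM-flow table of Section 3.2 is sketched below; the helper
functions over the flow table are assumptions introduced only for
illustration:

    #include <stdint.h>

    #define MAX_PORTS 8
    #define MAX_SINKS 64

    /* One forwarding-table entry per multicast group; out_sinks[p] is the
     * "SinkNodes" column for port p in the tables above.  The layout and
     * the helpers over the RGM-flow table are illustrative assumptions. */
    struct fwd_entry {
        uint32_t group;
        uint32_t out_sinks[MAX_PORTS][MAX_SINKS];
        int      n_sinks[MAX_PORTS];
    };

    int      flow_table_size(void);                  /* entries in the RGM-flow table */
    uint32_t flow_sink(int i);                       /* SinkNode of entry i           */
    int      flow_best_port(int i);                  /* port with the smallest RRD    */
    int      flow_sink_wants(int i, uint32_t group); /* group listed in its RGMs?     */

    /* Rebuild the entry for one group: every active sink-node that has
     * requested the group is placed behind its minimum-RRD port, so the
     * OutPorts of the group are the union of the optimal routes. */
    void fwd_rebuild(struct fwd_entry *e, uint32_t group)
    {
        e->group = group;
        for (int p = 0; p < MAX_PORTS; p++)
            e->n_sinks[p] = 0;

        for (int i = 0; i < flow_table_size(); i++) {
            if (!flow_sink_wants(i, group))
                continue;
            int p = flow_best_port(i);
            if (e->n_sinks[p] < MAX_SINKS)
                e->out_sinks[p][e->n_sinks[p]++] = flow_sink(i);
        }
    }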
A problem arises if node-E serves as one of the sources of the
multicast group, since the branches of tree(A) and tree(C) intersect
at source-node E from different directions.  According to the basic
forwarding rule, datagrams are to be multicast out through both
port p0 and port p1 of node-E.  Upon receiving datagrams on its
port p0, intermediate-node B is supposed to forward them further
via port p1 and port p2, since both routes are optimal from B's
viewpoint to sink-nodes A/G and C, respectively.  Similarly, node-A
considers its OutPort p0 as the optimal route to sink-node C.
Obviously, this is a case of "mutual confusion", in which both node-A
and node-B will forward and receive duplicated datagrams to and from
each other.
The basic cause of such an error is that the sink-rooted routing
algorithm, as described so far, works in a manner that is totally
transparent to the source address of the multicast datagram.  The
protocol has defined
the mechanisms to enable an intermediate-node to explicitly determine
the downstream optimal route to a given sink-node. However, the node
has no idea whether an upstream path is also optimal for the same
sink-node. It is evident that the "mutual confusion" problem exists
between a pair of sink/intermediate nodes on a circuit, only because
they are receiving datagrams from a particular source/intermediate
node that forks multiple data-flows by duplicating datagrams. In the
previous example, only node A and node B are confused when receiving
datagrams sourced by node E; no node other than node-E would cause
the same problem.
The simplest solution to the circuit problem is that any node that
receives identical datagrams from different neighbors always
forwards the earlier copy of a datagram and discards the later one.
In so doing, any intermediate-node that has multiple OutPorts for a given
multicast group has to check the source address and the sequence
number of every incoming datagram. Furthermore, the node may send
messages suggesting that its neighbors not forward datagrams from a
particular source-node; the link that carries such messages in both
directions must be the one causing the trouble.  The problem is that,
in addition to the processing overhead and the potential "flip-flop"
situation, link-waste is inevitable.
Since any source/intermediate node knows exactly which sink-nodes
are expecting the data and binds them to the corresponding OutPorts,
a deterministic
solution is suggested: If a source/intermediate node "splits" the
data-flow, i.e., forwards a single incoming data-flow to more than
one OutPort, it is responsible for specifying the sink-information
regarding the departure-port in each outgoing datagram. The
information can be a list of sink-nodes either exclusive or inclusive
of the OutPort. Also, a receiving intermediate-node will check and
possibly re-specify the applicable sink information only if it is the
next splitting point of the data-flow. The proposed forwarding-algorithm
can be expressed as the following pseudo-code, in which inclusive
sink-specification is assumed.
while ( receiving datagram D(G) of group G from InPort(D) ) {
    if ( G has an entry in forwarding-table && TTL(D) != 0 ) {
        outport[] = OutPorts(G) - InPort(D);
        dup = number of entries in outport[];
        if ( dup == 0 )
            discard D;
        else if ( dup == 1 ) {
            TTL(D) = TTL(D) - 1;
            forward D through outport[0];       /* only active port */
        }
        else {
            for ( i = 0; i < dup; i++ ) {
                /* forward only toward sinks that are both listed in
                   the datagram and bound to this OutPort */
                if ( sinks(D) * sinks(outport[i]) != empty ) {
                    D' = copy of D;
                    sinks(D') = sinks(D) * sinks(outport[i]);
                    TTL(D') = TTL(D) - 1;
                    forward D' through outport[i];
                }
            }
        }
    } else
        discard D;
}
Here, sinks(D) is the inclusive sink-node list carried by datagram
D, sinks(outport[i]) is the set of sink-nodes bound to a given
OutPort for multicast group G, and "*" denotes set intersection.
Discarding a datagram serves only to resolve a possible protocol
error.  In the steady state,
the algorithm guarantees that any node intercepts data-flows only
from the optimal upstream paths.  Therefore, the delivery paths of
multicast datagrams are guaranteed to be end-to-end optimal.  An additional
advantage is that any source/intermediate node is capable of
rejecting a particular sink-node for the purpose of security or
resource-reservation.
4. Discussion
One of the limitations of SARP is the additional overhead of
processing RGMs.  With the basic SARP algorithm, the amount of
system memory required by the protocol is proportional to the
number of active sink-nodes.  Any node is potentially saturated by
RGM-flows, especially when routing takes place in an internetwork
with a regular mesh topology.
The protocol is expected to be further improved, in terms of
efficiency and scalability, by supplementary means.  Since the
multicast spanning tree is completely specified by the incidence
relations between RGM-flows and nodes, optimization can be achieved
by reducing unnecessary or redundant forwarding of RGMs.  An
immediate adjustment is to use the RPM algorithm instead of the RPB
algorithm to forward RGMs.  Moreover, an intermediate-node that
interconnects two disjoint sub-topologies of the Internet, such as a
node on the multicast backbone, can be used to merge the RGM-flows
from both directions so as to represent the two groups of
sink-nodes.
The current focus of SARP is to find minimal-delay paths with
respect to every sink-node; the overall network load for a given
multicast group is not necessarily optimized.  We recognize that
realizing minimal network load is equivalent to finding a minimal
RGM-flow pattern.  This issue also remains for further study.
References
[1] Deering, S., "Host Extensions for IP Multicasting," RFC 1112,
August 1989.
[2] Hedrick, C., "Routing Information Protocol," RFC 1058, Rutgers
University, June 1988.
[3] Semeria, C., and T. Maufer, "Introduction to IP Multicast
Routing," draft-rfced-info-semeria-00.txt, 3Com Corporation,
March 1996.
[4] Waitzman, D., C. Partridge, and S. Deering, "Distance Vector
Multicast Routing Protocol," RFC 1075, November 1988.
[5] Mills, D., "Internet Time Synchronization: The Network Time
Protocol," IEEE Trans. on Commun., Vol. 39, Oct. 1991.
Author's Address
Yong Xie
Phone: 212-650-7261
Email: xyong@ee-mail.engr.ccny.cuny.edu
Myung J. Lee
Phone: 212-650-7260
Email: mjlee@ee-mail.engr.ccny.cuny.edu
Tarek N. Saadawi
Phone: 212-650-7263
Email: eetns@ee-mail.engr.ccny.cuny.edu
Department of Electrical Engineering
City University of New York, City College
New York, NY 10031