Network Working Group Mike O'Dell
Internet-Draft UUNET Technologies
1996/10/22 05:58:54GMT
Expires in six months
8+8 - An Alternate Addressing Architecture for IPv6
<draft-odell-8+8-00.txt>
1. Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
2. Abstract
This document presents an alternative addressing architecture for
IPv6 which controls global routing growth with very aggressive
topological aggregation. It also includes support for scalable
multihoming as a distinguished service while freeing sites and
service resellers from the tyranny of CIDR-based aggregation by
providing transparent rehoming of both.
3. Introduction
IP version 6 represents a significant advancement in the technology
of the Internet. It provides large addresses, many sorely-needed
functional capabilities, and was intended to be a platform for the
further evolution of the Global Internet. Unfortunately, when IPv6
was created, Route Scaling, which has become the most significant
problem for the continued growth of the Internet, was not widely
understood to be the forcing function we now know it to be. Because
of that, the current IPv6 addressing proposal fails to provide an
operationally-scalable scheme for aggressive topological aggregation
and the continued scaling of the routing architecture.
O'Dell v2.21 [Page 1]
Internet-Draft 8+8 for IPv6 1996/10/22 05:58:54GMT
The current IPv6 addressing proposals continue to rely almost
entirely upon CIDR-style aggregation for route growth control. Unlike
IPv4, in IPv6 this mechanism is coupled with support for easier
network renumbering which may make so-called "provider-based
addressing" a bit more palatable.
In general, the current IPv6 addressing model is inadequate for
several reasons. CIDR-style aggregation breaks down in the face of
the accelerating growth of multi-homed sites (leaf sites or regional
networks). Renumbering to accomplish simple topological rehoming
(e.g., changing ISPs) is a problem whose magnitude will only grow
over time. It will always be difficult to explain this to customers,
increasingly so with decreasing customer sophistication. While the
large IPv6 addresses provide for a huge increase in the number of end
systems which can be accommodated, they also portend a huge increase
in the number of routes required to reach them. Even if CIDR
aggregation continues at current levels, this presents a serious
problem because of the scaling behavior of the global route
computations.
This document presents a new proposal for using the 16-byte IPv6
address which mitigates the route scaling problem and with it a
number of collateral issues. This model provides for aggressive
topological aggregation while controlling the complexity of flat-
routed regions. It uses and supports the dynamic address assignment
machinery in IPv6, but makes the exact role of that machinery a local
decision with understandable costs and benefits rather than a
mandatory mechanism for simple rehoming situations.
The model also identifies the special work done by the global
Internet infrastructure to support multihomed sites, isolating it
into a specific mechanism which is then traceable to and incurred by
only those sites wishing to use this capability. This then makes it
possible for sites to make informed cost-benefit decisions about
multihoming.
4. Central Concepts
The addressing model proposed here is called "8+8" to distinguish it
from the existing proposals which are called "Flat-16" in this
document. The first central concept in 8+8 is simple:
The 16-byte IPv6 address is split into two 8-byte objects stored
in the existing 16-byte container.
The lower 8 bytes (least significant) form the "End System
Designator," or ESD. The upper 8 bytes (most significant) are called
the "Routing Goop", or RG. The ESD designates a computer system and
the RG encodes information about its attachment to the global
Internet topology.
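As a minimal sketch, assuming Python's standard `ipaddress` module and treating the address as a 128-bit integer, splitting the 16-byte container into RG and ESD is just a 64-bit shift and mask. The function names are hypothetical, and the example address is an arbitrary documentation-prefix value, not a real 8+8 assignment.

```python
import ipaddress
from typing import Tuple

ESD_MASK = (1 << 64) - 1


def split_8plus8(addr: str) -> Tuple[int, int]:
    """Split a 16-byte IPv6 address into its (RG, ESD) 8-byte halves."""
    value = int(ipaddress.IPv6Address(addr))
    rg = value >> 64            # upper 8 bytes: Routing Goop (locator)
    esd = value & ESD_MASK      # lower 8 bytes: End System Designator
    return rg, esd


def join_8plus8(rg: int, esd: int) -> str:
    """Pack an RG and an ESD back into the 16-byte container."""
    return str(ipaddress.IPv6Address((rg << 64) | esd))


rg, esd = split_8plus8("2001:db8:1:2:a:b:c:d")
print(hex(rg), hex(esd))
```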
As with other schemes distinguishing location from identity, the 8+8
model requires modifying the upper level protocols to consider only
the ESD when performing pseudo-header operations meant to identify
the end system as opposed to its location in the topology. A few
important examples: the TCP checksum pseudo-header would use only the
ESDs instead of the Flat-16 addresses; TCP associations would be
identified by ESD/Port instead of Flat-16/Port; IPSEC Authentication
and ESP header calculations would only consider the ESD and not the
RG of the address. Together these allow session-scale state like TCP
connections to survive global topology changes without special
considerations in the transport protocol.
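The sketch below illustrates this under stated assumptions. The pseudo-header layout (source ESD, destination ESD, 32-bit length, three zero bytes, next-header value 6) is hypothetical, modeled on the IPv6 pseudo-header with 8-byte ESDs in place of 16-byte addresses. The sum itself is the ordinary Internet one's-complement checksum. Because the RG never enters either the checksum or the connection key, rewriting the RG leaves both unchanged.

```python
import ipaddress
import struct

ESD_MASK = (1 << 64) - 1


def esd_bytes(addr: str) -> bytes:
    """The lower 8 bytes (ESD) of a 16-byte IPv6 address."""
    return (int(ipaddress.IPv6Address(addr)) & ESD_MASK).to_bytes(8, "big")


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in
    return total


def tcp_checksum_8plus8(src: str, dst: str, segment: bytes) -> int:
    """Checksum over a hypothetical ESD-only pseudo-header plus the segment."""
    pseudo = (esd_bytes(src) + esd_bytes(dst)
              + struct.pack("!I3xB", len(segment), 6))  # length, zeros, TCP
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def connection_key(src: str, sport: int, dst: str, dport: int) -> tuple:
    """Identify a TCP association by ESD/Port instead of Flat-16/Port."""
    return (esd_bytes(src), sport, esd_bytes(dst), dport)
```

For example, two addresses that differ only in their upper 8 bytes, such as `2001:db8:1:2:a:b:c:d` and `2001:db8:9:9:a:b:c:d`, yield the same checksum and the same connection key.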
Note: this proposal does not affect the IPv6 multicast, loopback, or
link-local address formats or usage. It is probably necessary to
create a new version of the "IPv6 site-local prefix" which uses an
ESD as the lower 8 bytes and would be used for within-site sessions
(in the existing IPv6 sense) and for originating external traffic.
The second central concept is:
Formalize the distinction between "Public Topology" and "Private
Topology".
"Public Topology" is structure which must be understood by a number
of other organizations, especially and specifically transit networks,
for constructing global Internet connectivity. "Private Topology" is
structure which is of no particular interest outside the containing
organization. In particular, general transit service is provided by
networks exposed in the Public Topology; networks composed of only
Private Topology cannot provide general transit service to the Global
Internet.
In the current IPv4 Internet, the distinction between Public and
Private Topology exists as a side-effect, but it is not exploited
beyond the advantage that arises naturally from CIDR-style
aggregation. A current example of private topology is the subnet
structure a site applies within its own CIDR block. No one outside
the site particularly cares about that internal structure, so there
is no real need to carry any routing information about it other than
the CIDR block describing the site as a whole.
The 8+8 model elevates this observation to a major architectural
component providing an explicit notion of a "Site". A "Site" is the
simplest unit of attachment to the Global Internet and is also the unit
of Private Topology. Within a Site, the ESD of a system is
sufficient for reaching it across the Private Topology as well as
globally identifying the system outside the confines of the Site.
This site-internal reachability can be accomplished either by flat-
routing on the ESD within a site (whether this is called "LAN
Switching" or something else is irrelevant), or by using a structured
ESD within the site. Both of these solutions are supported by the
structure of the ESD and each has identifiable and understandable
costs and benefits. These will be discussed at length later.
The "Public Topology" is the transit infrastructure which carries
traffic from one Site to another. It is composed of the various
carrier, reseller, and regional networks which we know today. The
Routing Goop portion of an 8+8 address is a locator which encodes
information about the way a Site (containing Private Topology) is
connected to the Public Topology of the transit networks. As will be
explained later, Routing Goop compactly encodes topology information
with very high degrees of aggregation while still affording the
opportunity to carry local detail for optimizing regional routes
without sacrificing global aggregation. Again, this will be
discussed later.
The third central concept is:
Dynamic insertion of Routing Goop into source addresses by Site
Boundary Routers when a packet leaves a Site and enters the
Public Topology.
This is one of the most radical parts of this proposal and was not
included in earlier versions of this document, but discussions with
various people convinced the author that it solves a sufficiently
compelling number of problems with one simple mechanism that it was
adopted. It too will be discussed later.
5. The Structure of End System Designators - the ESD
End System Designators designate every computer system in the 8+8
Internet regardless of whether it is a host, router, or other network
element. While a given system can have more than one ESD, each ESD
is globally unique. This is critical for their utility to the
upper-level protocols. This uniqueness can be induced in several ways
as will be seen.
An interesting question is whether an ESD identifies a system,
possibly as in the XNS architecture, or an interface, as in the
existing IPv4 and IPv6 architecture. The answer is that an ESD
designates an interface on a computer system and that interface can
be either physical or virtual.
When processing an 8+8 address, a computer system need only examine
the ESD portion of the address to determine whether a packet is
destined for that system.
There are circumstances when it is quite useful to have "an address"
for a computer system which is independent of any particular physical
interface on that system. It has become commonplace in IPv4 practice
to use a distinguished virtual interface to provide a system with
such an "interface independent identity". This provides the same
architectural utility as XNS while still allowing the flexibility of
the IPv4 "addressed interface" model. We chose to retain the
successful IPv4/IPv6 model.
NOTE: We specifically avoid being pedantic about exactly what
constitutes an "interface" and a "computer system" as the
malleability of those notions in IPv4 has proven manifestly useful in
practice.
To summarize the ESD uniqueness characteristics:
(1) an ESD is globally unique
(2) an ESD designates an "interface" on "a computer system"
(3) an Interface may have more than one ESD
(current IPv6 already requires implementations to support
multiple addresses per interface)
(4) an ESD may not necessarily designate a particular
physical computer (Neighbor Discovery continues to provide
a level of virtual address translation and great
cleverness can be contained therein)
The following describes the 8 bytes of the currently-defined ESD
structures.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Private Topology Partition |M| top 16 bits of Identity Token |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    3               4                   5                   6
    2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               bottom 32 bits of Identity Token                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Bits 0-14: 15-bit Private Topology Partition (PTP)
Provides for 32768 distinct partitions in the
Private Topology
Bit 15: Identity Token Mode Indicator
0 => 48-bit Identity Token
1 => Mode in upper bits of Identity Token
Bits 16-63: 48-bit Identity Token
Identity Tokens are formed as follows:
Mode 0 ESDs: (Bit 15: 0)
Identity Token is 48-bits of IEEE MAC Address
Bits 16-63: IEEE 48-bit MAC Address
Mode 1 ESDs: (Bits 15-18: 1001)
Identity Token is a 45-bit "IETF NodeID" integer, assigned
densely starting with 1.
Bits 19-63: IETF NodeID
Mode 2 ESDs: (Bits 15-18: 1010)
Identity Token is a 32-bit officially-assigned public IPv4
address (i.e., NOT an RFC-1918 private-use address),
zero padded
Bits 19-31: must be zero
Bits 32-63: valid IPv4 Address
Mode 3 through Mode 7 ESDs (Bits 15-18: 1011 - 1111)
RESERVED
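A minimal encoder/decoder sketch for the three defined ESD modes follows, assuming (as in the diagram) that bit 0 is the most significant bit of the 64-bit ESD. All function and constant names are hypothetical. The public-address check only rejects the three RFC-1918 blocks; it does not screen other non-public ranges.

```python
import ipaddress

# Hypothetical helpers for the ESD layout above. Bit 0 in the diagram is the
# most significant bit of the 64-bit ESD, so the 15-bit PTP sits at the top.
PTP_SHIFT = 49                    # 64 - 15
MODE_SHIFT = 45                   # bits 15-18 sit just above the 45-bit NodeID
TOKEN_MASK = (1 << 48) - 1        # bits 16-63: 48-bit Identity Token
NODEID_MASK = (1 << 45) - 1       # bits 19-63: 45-bit IETF NodeID
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def _check_ptp(ptp: int) -> None:
    if not 0 <= ptp < (1 << 15):
        raise ValueError("PTP must fit in 15 bits")


def esd_mode0(ptp: int, mac: bytes) -> int:
    """Mode 0: bit 15 is 0; bits 16-63 hold a 48-bit IEEE MAC address."""
    _check_ptp(ptp)
    if len(mac) != 6:
        raise ValueError("Mode-0 ESDs need a 48-bit MAC address")
    return (ptp << PTP_SHIFT) | int.from_bytes(mac, "big")


def esd_mode1(ptp: int, node_id: int) -> int:
    """Mode 1: bits 15-18 are 1001; bits 19-63 hold the IETF NodeID."""
    _check_ptp(ptp)
    if not 1 <= node_id <= NODEID_MASK:
        raise ValueError("NodeIDs start at 1 and fit in 45 bits")
    return (ptp << PTP_SHIFT) | (0b1001 << MODE_SHIFT) | node_id


def esd_mode2(ptp: int, ipv4: str) -> int:
    """Mode 2: bits 15-18 are 1010; bits 19-31 zero; bits 32-63 an IPv4 address."""
    _check_ptp(ptp)
    addr = ipaddress.IPv4Address(ipv4)
    if any(addr in net for net in RFC1918):
        raise ValueError("RFC-1918 addresses are not globally unique")
    return (ptp << PTP_SHIFT) | (0b1010 << MODE_SHIFT) | int(addr)


def decode_esd(esd: int) -> tuple:
    """Return (ptp, mode, identity) for a 64-bit ESD."""
    ptp = esd >> PTP_SHIFT
    if not (esd >> 48) & 1:                       # bit 15 clear: Mode 0
        return ptp, 0, (esd & TOKEN_MASK).to_bytes(6, "big")
    mode = ((esd >> MODE_SHIFT) & 0xF) - 0b1000   # 1001 -> 1, 1010 -> 2, ...
    if mode == 1:
        return ptp, 1, esd & NODEID_MASK
    if mode == 2:
        if (esd >> 32) & ((1 << 13) - 1):
            raise ValueError("bits 19-31 of a Mode-2 ESD must be zero")
        return ptp, 2, str(ipaddress.IPv4Address(esd & 0xFFFFFFFF))
    raise ValueError(f"reserved or undefined ESD mode bits {mode + 0b1000:04b}")
```

For instance, `decode_esd(esd_mode1(7, 12345))` recovers PTP 7, Mode 1, and NodeID 12345. Bit patterns 1011 through 1111 are rejected as reserved.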
For interfaces with IEEE-assigned 48-bit MAC addresses, a Mode-0 ESD
is the most natural ESD for that particular interface. On the other
hand, a point-to-point interface with no other naturally-occurring
MAC address could be labeled using a Mode-1 ESD. Mode-2 ESDs provide
for exploiting an already widely-deployed identifier space for easing
the transition to 8+8. Links with MAC addresses larger than 6 bytes
can use Mode-2 ESDs and IPv6 dynamic configuration support with
Neighbor Discovery.
The IETF NodeID in the Mode-1 ESD is a 45-bit unsigned integer which
starts at one (1) and is incremented, assigning the numbers as
densely as possible. There is no particular need to delegate on bit
boundaries as powers-of-2 don't matter. The numbers merely must be
assigned uniquely to requesters. We leave the actual assignment
strategy and any potential delegation to the purview of the IANA.
A few comments on "global uniqueness" are in order because in
previous discussions, some people seem to think that unless
"uniqueness" can be accomplished with absolute and complete
mathematical perfection any scheme using the concept is unworkable.
This is complete and utter nonsense, and it is rendered patently false
by multiple counter-examples:
IEEE MAC addresses are globally unique by nature of the delegation
process where they are assigned to interfaces by the manufacturers.
Both XNS and IPX rely on this uniqueness and it works very well in
practice. IETF-NodeID values will be globally unique by nature of
the same kind of assignment mechanism. IPv4 addresses must be
globally unique for the Internet to function, and it does, mostly, by
nature of exactly the same kind of assignment mechanism.
Yes, it is true that sometimes accidents happen and an IPv4 prefix is
misconfigured and it can be troublesome to track down. But the
problem is quite manageable. Moreover, rare as such misconfiguration is,
it is much more common than two Ethernet interfaces having the same
MAC address. The author believes that the IEEE MAC address
assignment machinery coupled with the job the manufacturers do is the
closest approximation to "global uniqueness" which any significant
human enterprise can achieve, and it is more than adequate to the
task at hand. The IETF NodeIDs will be assigned at least as well as
IPv4 addresses, and IPv4 seems to work well enough for the Global
Internet to function with incredibly few problems arising from this
particular source.
6. The Structure of a Site
The 8+8 global routing architecture ultimately views a Site as a leaf
of the topology and doesn't concern itself with the interior of this
private topology. However, the internal topology of a Site is
extremely important to the management and operation of the Site so
the ESD structure provides for a rich set of organizational
alternatives with different cost-benefit tradeoffs.
ESDs are globally unique but can also carry internal structure. The
global uniqueness is provided by the Identity Token while the
internal structure is carried in the Private Topology Partition. The
ESD structure provides for 32768 distinct Private Topology Partitions
(PTPs) within a Site. This is the equivalent of EVERY Site having
been assigned a CIDR block of 128 Class-B networks subnetted down to
a Class-C. The difference is that in an ESD, the subnet population
is limited strictly by the link-level (LAN) technology and not by the
253 host limit of the Class-C subnet. This allows an extremely rich
topology to be contained within a Site without it exporting
complexity into the global routing structure which must then be
concealed by tricks like CIDR aggregation.
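The equivalence in the paragraph above is simple arithmetic. This short sketch checks it with Python's `ipaddress` module, using an arbitrary /16 as the stand-in Class-B network.

```python
import ipaddress

# The 15-bit PTP field gives 2**15 partitions per Site.
ptp_partitions = 2 ** 15

# A single Class-B (/16) network holds 256 Class-C-sized (/24) subnets.
class_b = ipaddress.ip_network("10.0.0.0/16")
class_c_per_class_b = sum(1 for _ in class_b.subnets(new_prefix=24))

# 128 Class-B networks subnetted to Class-C size gives the same count.
equivalent_subnets = 128 * class_c_per_class_b
print(ptp_partitions, equivalent_subnets)   # 32768 32768
```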
Of course, an organization is not constrained to being structured as
a single Site. The trade-off is that the inter-Site topology must
then be part of the Public Topology. While the individual Sites
retain considerable independence in topological structure and
attachment to the Global Internet, they must remain aware of topology
changes between the constituent Sites, and rehoming a constituent Site
can potentially impact long-running sessions. That is the cost of
exploiting the routing machinery available to the Public Topology.
Given the flexibility available for organizing a Site, it is
worthwhile to examine a few examples. Note that none of these
organizational approaches is exclusive. A large Site might well mix
these approaches to good effect and indeed the goal is to provide the
designer of private Site topology with a broad spectrum of design
alternatives.
The simplest structure to imagine is a Site using all Mode-0 ESDs
with all the systems connected in a single Private Topology Partition
(i.e., all the ESDs carry the same PTP value which is assigned by the
local network administration). Given the sophistication of current
LAN-switching technology, a Site like this could be both large and
internally complex, but the complexity is absorbed into the LAN
infrastructure and it appears to be only one partition from the 8+8
Private Topology view. This structure has one very significant
advantage: rehoming a system within this structure will not change
the ESD and TCP sessions (for example) will survive arbitrary changes
in the private topology. This works, of course, because the single
PTP is a virtual topology with the real topology hidden by the LAN
Switching machinery.
The second Site model is like the one just described, except it would
have multiple PTPs with routing carrying traffic between the
segments. This is very close to the common IPv4 structure of a CIDR
block being subnetted to assign a prefix to each PTP. This approach
has the advantage of familiarity, but it has the disadvantage that
long-lived TCP connections don't necessarily survive arbitrary
changes to the private topology. The existing IPv6 dynamic address
assignment machinery will serve to make such internal changes much
less painful than with IPv4, however. One point worth noting,
though, is that even with multiple PTPs routed within a Site, a
"Private Topology Partition" need not correspond to a "physical" LAN
cable. The PTP values could be used to label larger organizational
structures like "Engineering" or "Finance". This could reduce the
likelihood that common internal topology changes break long-lived
connections.
The third Site model uses Mode-2 ESDs based on existing IPv4 address
assignments. In this case, all the IPv4 Identity Tokens could be
placed in a single PTP and then routed internally on the IPv4 address
in the lowest 4 bytes of the Identity Token. This has the advantage
of significant familiarity, but also can induce externally-visible
changes if ESDs must be reassigned because of private topology
requirements. Again, it must be emphasized that the IPv4 addresses
used in a Mode-2 ESD must be an officially-registered, public-use
IPv4 address and NOT an RFC-1918 private-use address. Using an RFC-
1918 private-use address violates the global uniqueness properties
required of an ESD.
In all of the multi-segment cases, a Mode-1 ESD could be used to
designate any point-to-point link endpoint, the loopback addresses in
routers, or any other IP-accessible network elements which don't
naturally have an IEEE MAC address for forming a Mode-0 ESD. And in all
of the cases, Mode-1 ESDs could be used universally, although it is
more appropriate to use Mode-0 whenever possible; no sense wasting
Identity Tokens when it isn't necessary.
In all of the cases where the real topology is not completely
virtualized by the LAN technology, there will be "Internal
Renumbering" events caused by moving systems between infrastructure
segments (PTPs). This will have the effect of killing long-running
off-Site connections unless provisions are made to allow the systems
to carry the previous ESDs as synonyms for a while. Given that most
significant topology moves involve powering off the end system in
question, this is hardly a hardship. However, the powerful
renumbering support already developed for IPv6 can make those other
moves considerably less disruptive.
But most importantly, external rehoming of a Site to the global
infrastructure can be made completely transparent in almost every
case.
7. The Structure of Routing Goop
Routing Goop, or "RG" is the upper 8 bytes of an 8+8 address. This
somewhat non-technical term was chosen because all the other
alternatives seem to have various degrees of conceptual baggage which
would be as much work to neutralize as the new notions are to explain
in the first place.
Fundamentally, RG is a Locator. It encodes the topological
connectivity of the Site containing the computer system identified by
the ESD in the lower 8 bytes. In the case of a singly-homed Site,
rehoming to a new attachment to the Public Topology will change ONLY
the RG in full 8+8 addresses for computer systems at that Site. One
example of such a rehoming would be a change of the Site's Internet
Service Provider. This change-over can be made essentially
completely transparent to users both inside and outside the Site,
although it does involve a practical limit on the transition duration
relating to how long the departing ISP is willing to extend
transitional courtesies. During a changeover, though, all new
connections will be initiated via the new ISP connection.
This brings up the deep structure of the topology information carried
in RG and how it is encoded. More specifically, RG is a hierarchical
locator which can be viewed as a rooted path-expression of flat-
routed regions which are tangent. Each element in the path-
expression contains only enough detail to negotiate the flat-routed
region.
It has been observed before that the graph of the Global Internet is
not obviously a hierarchy, so how can this work?
We start with the observation that every connected graph has at least
one labeling which forms a spanning tree covering the nodes. The
hierarchy is induced by a labeling function which partitions the
global graph into regions and recursively into subregions. This
function is only globally visible at the top-level where an initial
partitioning of the graph is used to form the first level of what
will become the hierarchy. Within each partition there is a local
sub-partition function which assigns labels, and we proceed
recursively. The nested recursions directly induce the hierarchy.
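To make the existence argument concrete, here is a minimal sketch that induces one such labeling with a breadth-first spanning tree. Each node's label is its parent's label extended by a child index, so nested partitions show up as shared label prefixes. BFS is only an arbitrary choice that shows a labeling exists. The labeling functions this document calls for are chosen for aggregation properties, which this sketch does not attempt. The graph and function name are hypothetical.

```python
from collections import deque


def hierarchical_labels(graph: dict, root: str) -> dict:
    """Induce a hierarchy on a connected graph via a BFS spanning tree."""
    labels = {root: ()}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        next_index = 0
        for neighbor in graph[node]:
            if neighbor not in labels:          # tree edge of the spanning tree
                labels[neighbor] = labels[node] + (next_index,)
                next_index += 1
                queue.append(neighbor)
    return labels


graph = {"R": ["A", "B"], "A": ["R", "B", "C"], "B": ["R", "A", "D"],
         "C": ["A", "D"], "D": ["B", "C"]}
print(hierarchical_labels(graph, "R"))
```

Here C and D are adjacent in the graph but land in different subtrees, (0, 0) and (1, 0). Non-tree adjacencies of this sort are what the hierarchy alone does not exploit.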
This decomposition of the Global Internet produces a recursive graph
where each level is composed of a set of subgraphs which are
explicitly connected (i.e., explicitly routed between the subgraphs)
while the structure within each subgraph is assumed to be flat-routed
(at least as seen at that level).
From an abstract viewpoint, a hierarchical partitioning can be
induced with an arbitrary choice of labeling function (as long as the
function produces the minimally-required partitioning). However, we
desire the partitions to have several important properties, which
affect the choice of labeling function.
The general goal is to produce a global labeling which represents the
topology as compactly as possible, yet allows rich connectivity while
bounding the complexity of the discrete regions which are flat-
routed.
The top level objects in the 8+8 graph hierarchy are called "Large
Structures". These are objects chosen for their ability to naturally
represent significant topological aggregation of substructure (not
geographical, political, or geometric). The number of Large
Structures is explicitly limited to bound the complexity at the top
level of the aggregation graph.
Within Large Structures, the (sub-)partition function is a trade-off
between the flat-routing complexity within a region and minimizing
total depth of the substructure. This is driven by the internal
topology of a Large Structure and the choices in different Large
Structures will not necessarily be the same. This is why Routing Goop
only has one hard bit boundary; Large Structures are free to
internally subdivide as they choose. They are only required to
encapsulate a significant portion of the Public Topology.
One obvious candidate for Large Structures is large networks which
already represent considerable aggregation based on existing CIDR
deployment. Another good candidate might be "Exchange Points". The
8+8 model can accommodate both of these simultaneously, allowing
IPv6-style "Network-anchored Prefixes" and "Exchange-anchored
Prefixes" like those proposed by some to coexist and be subsumed into
a unified notion of "Aggregator-anchored Prefixes." Of course, these
aren't prefixes strictly in the IPv4 CIDR sense, but the left-
anchored substrings of the Routing Goop are intuitively quite
similar.
Large Structures are assigned a Large Structure Identifier, known as
an LSID. The total number of LSIDs is intentionally limited as we
assume the paths between Large Structures are only flat-routed.
Two consenting Large Structures remain free to share a tangency below
the top level and exchange routes so as to provide for improved
routing between the two of them (formalizing cut-throughs in the
natural hierarchy). The goal is to provide for manageable complexity
of the ultimate default-free zone (the top level of the global
hierarchy) while allowing for controlled circumvention of the natural
hierarchical paths.
Bit-level structure of Routing Goop:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | xxx |     13 Bits of LSID     |     Upper 16 bits of Goop     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    3               4                   5                   6
    2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Bottom 32 bits of Routing Goop                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
NOTE: The Routing Goop structure above assumes that the 8+8 proposal
is designated by a 3-bit type of IPv6 address. If an 8+8 address is
identified by two upper bits, the LSID would expand to 14 bits. If
identified by one bit, the LSID would stay at 14 bits and the Upper
16 bits of Goop would expand to 17 bits.
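A minimal sketch of extracting the LSID under each of the three type-prefix lengths in the NOTE follows, again taking bit 0 as the most significant bit of the 64-bit RG. The type-prefix values used in the examples (such as 001) are placeholders, since the actual "xxx" value is not assigned here, and the function name is hypothetical.

```python
from typing import Tuple

# Type-prefix length -> LSID width, per the NOTE above.
LSID_WIDTH = {3: 13, 2: 14, 1: 14}


def split_rg(rg: int, type_bits: int = 3) -> Tuple[int, int]:
    """Return (LSID, remaining Goop) from a 64-bit RG; bit 0 is the MSB."""
    width = LSID_WIDTH[type_bits]
    goop_bits = 64 - type_bits - width       # 48, 48, or 49 bits of Goop
    lsid = (rg >> goop_bits) & ((1 << width) - 1)
    goop = rg & ((1 << goop_bits) - 1)
    return lsid, goop


# Placeholder 3-bit type 001, LSID 1234, arbitrary Goop.
rg = (0b001 << 61) | (1234 << 48) | 0xABCDEF
print(split_rg(rg))
```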
Routing between two interior points of two Large Structures is always
possible based solely on the LSID. This provides a "forwarding
strategy of last resort" for a router running "default-free". From
one point of view, the LSID partitions the Global Internet into a set
of regions such that an interior router only need carry a "per-LSID
default" pointing at an appropriate boundary router which knows how
to handle traffic bound outside the containing Large Structure for
a point in the other Large Structure.
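The forwarding rule can be sketched as follows, assuming the 3-bit type prefix (so the LSID is RG bits 3-15) and representing routes as hypothetical (RG prefix, prefix length, next hop) triples. A default-free router first tries its specific RG routes, longest match first. These cover its own Large Structure and any cut-throughs. Failing that, it falls back to the per-LSID default. Names and next-hop strings are illustrative only.

```python
from typing import Dict, List, Tuple

Route = Tuple[int, int, str]   # (64-bit RG prefix, prefix length, next hop)


def forward(rg: int, specific: List[Route], lsid_default: Dict[int, str]) -> str:
    """Longest-match on specific RG routes, else the per-LSID default."""
    best_len, best_hop = -1, None
    for prefix, length, hop in specific:
        shift = 64 - length
        if rg >> shift == prefix >> shift and length > best_len:
            best_len, best_hop = length, hop
    if best_hop is not None:
        return best_hop
    # Forwarding strategy of last resort: route on the LSID alone.
    return lsid_default[(rg >> 48) & 0x1FFF]
```

For example, with a specific route for the router's own LSID and per-LSID defaults for the others, a destination in a foreign Large Structure resolves to that structure's boundary router unless a more specific cut-through route matches.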
If two Large Structures share a tangency somewhere below the top
level, then some interior routers of both Large Structures will share
routes to exploit the tangency for optimizing paths. How this cut-
through information is distributed within the two Large Structures is
not revealed elsewhere in the global topology. The exact "shape" of
the optimization region is controlled by the decisions about which
routes to advertise across the cut-through. These decisions are made
by the collaborators and the optimized region need not be symmetric
with respect to the cut-through. The size of the optimization area
is controlled by how far routes learned via the cut-through are
propagated within the sub-graphs tangent via the cut-through. Again,
this is a matter of engineering choices made by the collaborators
operating the cut-through.
We note that while the LSID is intuitively similar to the Autonomous
System Number currently used in IPv4 policy-based routing machinery,
the LSID is quite distinct from the AS number and the two identifiers
play very different roles. AS Numbers will continue to be used for
policy routing information exchange and will remain distinct.
8. The "Flow" of Routing Goop
It is intuitively useful to think about Routing Goop as "flowing
downhill" through the hierarchy from the topmost Large Structures,
through the intermediate levels of the Public Topology, and
ultimately down to the Site. As the RG propagates downward, the
prefix extends to the right, just like in IPv4 CIDR, with each
extension navigating the nested flat-routed subgraphs, eventually
terminating at the Site, which then descends invisibly into the
Private Topology of that Site.
The nested flat-routed areas correspond to transit subnetworks of the
Large Structure. One very important example of such subnets is the
"reseller" or "wholesale transit customer" of a Large Structure.
(Note that whether the Large Structure is a network or an exchange
point doesn't matter.) The reseller network provides transit for
Sites, so must be part of the Public Topology and appears as a
substring within the Routing Goop, usually the right-most extension
unless the reseller has further reseller customers. In that case,
the next level reseller will have his own extension to record his
place in the Public Topology and to provide for navigating through it
as well.
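The "flowing downhill" picture can be sketched as left-anchored bit strings that each level extends to the right. In this sketch, a prefix is a hypothetical (bits, length) pair. The extension widths (12 bits for a reseller, 8 for a Site) are arbitrary, reflecting that Large Structures subdivide as they choose below the one hard boundary. The leading value (type 001, LSID 5) is a placeholder.

```python
from typing import Tuple

Prefix = Tuple[int, int]   # (prefix bits, prefix length in bits)


def extend(prefix: Prefix, value: int, width: int) -> Prefix:
    """Append a width-bit extension to the right of an RG prefix."""
    bits, length = prefix
    if not (0 <= value < (1 << width) and length + width <= 64):
        raise ValueError("extension does not fit")
    return (bits << width) | value, length + width


def to_rg(prefix: Prefix) -> int:
    """Left-align a prefix into a full 64-bit RG, zero-filling the rest."""
    bits, length = prefix
    return bits << (64 - length)


large_structure = ((0b001 << 13) | 5, 16)       # type 001, LSID 5
reseller = extend(large_structure, 0x2A, 12)    # reseller's extension
site = extend(reseller, 0x3, 8)                 # Site below the reseller
print(hex(to_rg(site)))
```

Every Site below the reseller shares the reseller's left-anchored prefix, which in turn shares the Large Structure's, just as CIDR prefixes nest.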
The overall picture can now be drawn as a forest of trees
distributing Routing Goop down to the Sites, with each tree being a
Large Structure and the Large Structures connected arbitrarily at the
top level. This structure will be mirrored by the actual machinery
for distributing Routing Goop to the Sites as will be discussed a bit
later, but this mental image of the prefixes "flowing" from the
anchoring Large Structures is critical to understanding fundamental
self-organizing abilities in the 8+8 model.
While the 8+8 machinery is intended to be adequate for almost
completely automated self-organization with respect to the
construction and propagation of Routing Goop on an Internet-wide
basis, we proceed for now closely following current practice
(admitting manual configuration of certain information like Routing
Goop) because of the additional complexity of the self-organization
functions. Initial deployment following current practice would not
preclude eventual deployment of a fully self-organizing Global
Internet.
9. The Distribution of Routing Goop
There are two cases to consider for how Routing Goop gets
distributed: source addresses and destination addresses. In both
cases RG is part of the address, one way or another, so we show how a
full 16-byte address with the right RG gets created in these two
cases.
9.1 RG for Source Addresses
The RG of a source address is almost always the site-local prefix.
If the destination address is not within the Site, the packet will
leave the Site via one of possibly several Site Boundary Routers.
The Site Boundary Router inserts the correct RG in the source address
based on the path the destination should use to return a packet to
the sender. Except in very unusual circumstances this will be the RG
which corresponds to the attachment path of the Site Boundary Router
to the Global Internet.
If the Site is Multihomed via just one Site Boundary Router, then the
router is free to apply whatever local policy suits. It simply must
fill in a valid RG path which leads back to a Site Boundary Router
for that Site. If the Site is Multihomed via more than one Site
Boundary Router, which router the packet leaves by is purely local
policy and which RG gets applied is likewise local policy.
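The exit-time insertion of RG can be sketched in Python as below. This
is a minimal model, not a router implementation: addresses are raw
16-byte values, the "is the destination inside the Site" test simply
compares the destination's upper 8 bytes with the site-local prefix (a
real router would likely also recognize the Site's public RG), and the
Uplink type and first_uplink policy are hypothetical names standing in
for whatever local policy the Site applies.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Uplink:
    name: str
    rg: bytes  # 8-byte Routing Goop recording this attachment path


Policy = Callable[[bytes, Sequence[Uplink]], Uplink]


def first_uplink(dst: bytes, uplinks: Sequence[Uplink]) -> Uplink:
    """Trivial local policy: always leave via the first listed uplink."""
    return uplinks[0]


def rewrite_source(src: bytes, dst: bytes, site_local_rg: bytes,
                   uplinks: Sequence[Uplink],
                   policy: Policy = first_uplink) -> Tuple[bytes, Optional[Uplink]]:
    """Return (new_src, exit uplink); the uplink is None for intra-Site traffic."""
    if dst[:8] == site_local_rg:
        return src, None                 # stays inside: keep the site-local RG
    chosen = policy(dst, uplinks)        # exit choice, and hence RG, is local policy
    return chosen.rg + src[8:], chosen   # the ESD (lower 8 bytes) is never touched


site_rg = bytes.fromhex("fec0000000000001")  # hypothetical site-local prefix
isp_a = Uplink("isp-a", bytes.fromhex("20010db8000a0000"))
isp_b = Uplink("isp-b", bytes.fromhex("20010db8000b0000"))
src = site_rg + bytes.fromhex("0000000000000abc")
dst = bytes.fromhex("20010db8ffff0000") + bytes(8)
new_src, via = rewrite_source(src, dst, site_rg, [isp_a, isp_b])
```

Passing a different policy function is how a multihomed Site would
steer particular destinations out a particular attachment.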
The dynamic insertion of RG upon Site exit accomplishes a number of
things.
(1) It means that for most purposes, a computer system at a Site need
not concern itself with exit topology policy matters which can be
particularly tricky in Multihomed Sites.
(2) It means that computer systems are essentially not impacted at
all by topological rehoming of the Site.
(3) It means that more complex Multihoming scenarios with multiple
Site Boundary Routers each with multiple connections to the Global
Internet can execute arbitrarily complex path recovery policy without
concern for how it might impact a computer system doing source
address selection.
(4) It means that Mobile IP is dramatically simplified over the
current model, but we postpone that discussion to another day.
(5) It means that while a computer system might forge the ESD in a
source address, it CANNOT forge the point of injection into the
Public Topology. This is not strong authentication down to the
particular computer system, but it is probably a strong deterrent to
certain obnoxious activities due to the dramatically improved
traceability. We also note that the first-hop attachment router in
the Public Topology is free to insert or override the RG if somehow
an errant packet escapes a Site without it, thereby enforcing
traceability. Of course, the Public first-hop router could always just
drop a packet carrying inappropriate source RG as well. But to make
it very clear, we put the burden of inserting correct RG in exiting
source addresses squarely and solely on the Site and the Site Border
Router; placing the task anywhere else scales poorly.
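A first-hop attachment router's optional check might look like the
Python sketch below. It assumes the router has been told (by
configuration or by some protocol not specified here) the set of RGs
that legitimately lead back to the attached Site; since a multihomed
Site may stamp any valid return path, that set can hold more than the
RG of this particular attachment. The function name and the drop_bad
switch are hypothetical.

```python
from typing import AbstractSet, Optional


def first_hop_filter(src: bytes, allowed_rgs: AbstractSet[bytes],
                     attachment_rg: bytes, drop_bad: bool = False) -> Optional[bytes]:
    """Return the (possibly rewritten) source address, or None to drop it."""
    if src[:8] in allowed_rgs:
        return src                      # the Site already stamped a valid return path
    if drop_bad:
        return None                     # policy: discard inappropriate source RG
    return attachment_rg + src[8:]      # insert/override the RG; the ESD is untouched
```

Whether to override or drop is the operator's choice; since the
primary burden sits on the Site Border Router, a check like this is a
backstop rather than the main mechanism.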
This simple mechanism solves a number of problems and actually
simplifies the operation and deployment of this architecture, so it
is well worth the implications it has for Site Border Routers.
The Site Border Router gets the necessary RG from the first-hop
attachment router in the Public Topology. Alternatively, as an initial
mechanism the RG could be statically configured, but the real goal is
completely automated propagation down the tree so that an entire
complex subtree can be rehomed without human intervention or service
disruption.
9.2 RG for Destination Addresses
Currently, an IPv6 address lookup for a DNS name returns the
information in an "AAAA" record containing the full 16 bytes of the
IPv6 address.
The 8+8 design proposes synthesizing the 16 bytes of information in a
query response from two different sources: an "AA" record and an "RG"
record. The "AA" record carries the 8-byte ESD for the DNS name in
question and the "RG" record carries 8 bytes of the appropriate
Routing Goop.
One interesting question is how the AA record gets paired with an RG
record in a given nameserver. One simpleminded implementation would
be to pair an RG record with a zone, but that has the problem of
requiring all the systems in that zone to use the same Routing Goop
and hence be in the same Site.
A better scheme is to carry an "RG Name" in the "AA" record which
would allow a nameserver to concatenate an arbitrary RG prefix to the
ESD producing the full 16 byte response. The "RG Name" would be a
full DNS name which could be recursively translated (and the result
cached). Structured as an "upward delegation" with an appropriate
Time-to-Live, a Site could import the Routing Goop information from
their service provider completely automatically. This capability
will be used to great advantage in the discussions of rehoming that
follow. [The interaction between RG TTL and zone TTL is an issue to be
explored further.]
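The pairing can be sketched in Python as follows. Real nameserver
behavior is not specified by this draft; here the AA record holds the
ESD plus an RG Name, rg_source is a hypothetical callable standing in
for the recursive translation of that name (possibly through an upward
delegation to the provider), and results are cached for their TTL. The
injectable clock exists only to make the example testable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AARecord:
    esd: bytes    # 8-byte End System Designator for the node
    rg_name: str  # DNS name whose RG record supplies the upper 8 bytes


@dataclass
class Resolver:
    aa: Dict[str, AARecord]
    rg_source: Callable[[str], Tuple[bytes, float]]  # rg_name -> (rg, ttl seconds)
    clock: Callable[[], float] = time.monotonic
    _cache: Dict[str, Tuple[bytes, float]] = field(default_factory=dict, init=False)

    def lookup_rg(self, rg_name: str) -> bytes:
        """Translate an RG Name, reusing a cached answer until its TTL expires."""
        now = self.clock()
        hit = self._cache.get(rg_name)
        if hit is not None and hit[1] > now:
            return hit[0]
        rg, ttl = self.rg_source(rg_name)
        self._cache[rg_name] = (rg, now + ttl)
        return rg

    def resolve(self, name: str) -> bytes:
        """Synthesize the full 16-byte answer: RG followed by ESD."""
        rec = self.aa[name]
        return self.lookup_rg(rec.rg_name) + rec.esd
```

In this model, rehoming changes only what rg_source returns for the RG
Name; the node's AA record is never edited.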
Alternatively, one special case for an RG record could be a delegation
to a Site Border Router which could supply the correct RG
automatically, at least in single-homed cases, and possibly in
multihomed cases.
The result of this structure is that individual zone entries for
individual nodes (AA records) do NOT change when a Site rehomes. The
only thing which changes (logically) is the RG information which is
composed with the node's AA record to produce a full 16-byte response.
This means the general Dynamic DNS machinery is NOT required to
support Site rehoming.
It also gives rise to significant potential for "smart nameservers"
which examine the source address of a query to provide a more
topologically appropriate translation for a given DNS query. This
isn't perfect, but it is much more detail than current nameservers
have available without processing a full BGP routing table to
ascertain IPv4 prefix/AS correspondence.
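One way a "smart nameserver" might use the query's source address is
sketched below in Python: among the RGs available for a multihomed
Site, return the one sharing the longest leading bit string with the
querier's RG, on the theory that a longer shared path prefix means a
topologically closer path. This heuristic is an assumption made for
illustration, not a mechanism the draft specifies.

```python
from typing import Sequence


def common_prefix_bits(a: bytes, b: bytes) -> int:
    """Number of leading bits on which two equal-length byte strings agree."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return len(a) * 8 - diff.bit_length()


def pick_rg(query_src: bytes, candidates: Sequence[bytes]) -> bytes:
    """Choose the candidate RG closest (by shared prefix) to the querier's RG."""
    querier_rg = query_src[:8]
    return max(candidates, key=lambda rg: common_prefix_bits(rg, querier_rg))
```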
10. Rehoming A Site
When a Site changes its point of attachment to the Global Internet,
it is said to "rehome". One of the significant criticisms of IPv4
CIDR and IPv6 "Provider-based Addressing" is the requirement to
"renumber" a Site when it rehomes. One of the explicit goals of the
8+8 architecture is to eliminate, or at least mitigate, the impact of
this.
It is important to reiterate the notion that the Routing Goop of an
8+8 address is not just a Locator, but that it encodes a PATH from
the top level of the global hierarchy down to the Site. Changing
that path is what makes Rehoming and Multihoming essentially
equivalent operations. We proceed with the simple case first.
When a Site wishes to rehome, it must establish a new attachment
point to the Global Internet, and hence establish a new access path.
Then it must start using that new path before the old path is
removed. The procedure is as follows:
A Site establishes a connection with a new ISP and it becomes able to
carry the traffic. At that point, the Site alters the upward
delegation of the DNS RG records. Henceforth, all new connections
made with the new translations will follow the new path to the Site.
The new connection path is then made the preferred exit path and
source addresses in packets exiting the Site immediately start being
marked with the new return path. The old connection should be
maintained for some administratively determined grace period to allow
DNS timeouts to transition new sessions to the new path and for
long-running sessions to terminate.
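The timing constraint in that procedure can be made concrete with a
small Python sketch. It assumes (a simplification) that the grace
period is measured from the moment the upward delegation is switched,
that it must be at least the RG TTL so cached old translations expire
while the old path still works, and that any session still using the
old RG when the old connection is removed is lost. Session and
plan_teardown are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Session:
    peer: str
    uses_old_rg: bool  # destination address was learned before the switch
    ends_at: float     # expected end time, in seconds


def plan_teardown(t_switch: float, rg_ttl: float, grace: float,
                  sessions: Sequence[Session]) -> Tuple[float, List[Session]]:
    """Return when the old connection may be removed and which sessions it strands."""
    if grace < rg_ttl:
        raise ValueError("grace period shorter than RG TTL: cached old "
                         "translations could outlive the old path")
    teardown = t_switch + grace
    stranded = [s for s in sessions if s.uses_old_rg and s.ends_at > teardown]
    return teardown, stranded
```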
At first blush, it might appear that when the exit path for the Site
switches over to the new path and the Site Border Router starts
marking packets with the new RG, the return path for long-running
sessions would automatically switch over to the new path. Alas, this
is not so because a long-running session will be using a destination
address containing the old RG acquired when the session first
started.
Consideration was given to providing some kind of "path redirect"
which would allow the other end to deal with "flying cutovers" of a
running session, but the security implications of this mechanism are
too far-reaching to consider as part of initial deployment. If at
some later point it becomes clear how to accomplish this safely, then
it could be added downstream. But given the complexity and security
risks, the added value is not large enough to make it worthwhile at
present, although the author would love to be convinced otherwise.
Alternatively, the Site could request a "Rehoming Courtesy" from its
old ISP, which would effectively make it a multihomed Site for some
period of time. After multihoming was established, the old
connection could be taken down and the long-running sessions would
continue to survive as long as the Site was multihomed by way of the
Rehoming Courtesy.
Note that at no time did the rehoming affect anything internal to the
Site's Private Topology. The only change was the attachment to the
Public Topology and the Routing Goop which records that attachment
location.
11. Multihoming a Site
One of the curiosities of IPv4 is that the network does a lot more
work for a multihomed site, but that work is very hard to pin down so
that the instigator of the effort can compensate those who perform it.
In the 8+8 model, multihoming is an explicit service which is
performed for a Site by the agents of the Public Topology which
provide the access for the Site. This mechanism can be made more
sophisticated, but the notion is most readily explained by
considering a Site which is dual homed to two different ISPs and
hence has two distinct access paths represented by two distinct blobs
of Routing Goop.
The Site is attached to each ISP via some link and we postulate some
kind of keep-alive protocol which determines when reachability to the
Site's border router is lost. The ISP routers serving the dual-homed
Site are identified to each other (via static configuration
information in the simplest case or a dynamic protocol in the more
general case), and when a link to the Site is lost, the ISP router
anchoring the dead link simply tunnels any traffic destined for the
Site via the other ISP router.
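The ISP-side behavior can be sketched in Python as below. The
keep-alive protocol is unspecified in the draft; here a link is simply
declared dead when no keep-alive has arrived for dead_after seconds,
and the peer relationship is set by static configuration, as in the
simplest case above. HomingRouter and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class HomingRouter:
    name: str
    dead_after: float = 3.0       # seconds without keep-alive before the link is dead
    last_keepalive: float = 0.0
    # The other ISP router serving the same Site (static configuration).
    peer: Optional["HomingRouter"] = field(default=None, repr=False, compare=False)

    def keepalive(self, now: float) -> None:
        """Record a keep-alive received from the Site's border router."""
        self.last_keepalive = now

    def link_up(self, now: float) -> bool:
        return now - self.last_keepalive <= self.dead_after

    def forward(self, now: float) -> Tuple[str, Optional[str]]:
        """Decide what to do with a packet destined for the Site."""
        if self.link_up(now):
            return "deliver", self.name
        if self.peer is not None and self.peer.link_up(now):
            return "tunnel", self.peer.name   # reach the Site via the other ISP
        return "drop", None                   # both access paths are down
```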
This approach clearly requires coordination between the two serving
ISPs. This is not a new constraint - multihoming already requires
considerable coordination between the Site and its providers. Of
course, creating a protocol for dynamically creating a "homing group"
is probably a very worthwhile investment but it is not absolutely
necessary at the outset.
It should be obvious now that the "Rehoming Courtesy" in the previous
section is simply doing the router-pair coordination with the new ISP
for some period of time.
12. Rehoming a Reseller
Rehoming a Reseller is a slightly more general case of rehoming a
Site, primarily characterized by more lead time, a longer grace
period, and some necessary coordination with customer Sites to ensure
that the Routing Goop propagates correctly.
The Reseller will establish a new connection which will not only
result in a new path for the Reseller's topology, but also for that of his
customer Sites. When the Reseller alters his upward delegation of
Routing Goop, it will ripple downward to his customer Sites by nature
of their upward delegations. The downward ripple of Routing Goop via
the upward delegations should cause the Site zone TTLs to be reduced
appropriately to ensure caches expire well within the dual-homed
transition grace period for the Reseller.
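The downward ripple can be sketched in Python as follows. Each entry
records an upward delegation (None at the top) and that node's own
extension bits, so a Site's effective RG is computed by walking its
delegations upward; changing only the Reseller's delegation changes
every customer's RG. The table layout, the bit widths, and the
one-quarter "well within" margin used for the TTL check are all
hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# name -> (upward delegation target or None at the top, extension bits, extension length)
Table = Dict[str, Tuple[Optional[str], int, int]]


def effective_rg(name: str, table: Table) -> Tuple[int, int]:
    """Return (prefix bits, prefix length) by following upward delegations."""
    parent, ext, ext_len = table[name]
    if parent is None:
        return ext, ext_len
    pbits, plen = effective_rg(parent, table)
    return (pbits << ext_len) | ext, plen + ext_len


def ttls_too_long(site_ttls: Dict[str, float], grace: float,
                  margin: float = 0.25) -> List[str]:
    """Sites whose zone TTL is not well within the grace period."""
    return [s for s, ttl in site_ttls.items() if ttl > grace * margin]


table: Table = {
    "isp-old": (None, 0x11, 8),
    "isp-new": (None, 0x22, 8),
    "reseller": ("isp-old", 0x5, 4),
    "site-a": ("reseller", 0x1, 4),
    "site-b": ("reseller", 0x2, 4),
}
before = effective_rg("site-a", table)   # (0x1151, 16)
table["reseller"] = ("isp-new", 0x5, 4)  # the only change: the Reseller rehomes
after = effective_rg("site-a", table)    # (0x2251, 16)
```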
This essentially rehomes all the Reseller's customer Sites all at the
same time the Reseller's infrastructure is rehoming and should be
completely transparent except for long-lived sessions which do not
terminate by the end of the grace period.
13. Multihoming a Reseller
There are two parts to multihoming a Reseller - one part similar to
the Multihomed Site case above, and one part which is quite
different.
For this discussion, assume a Reseller which is dual-homed and hence
has two different Routing Goop prefixes (remember that each path to
the top level of the hierarchy has a distinct prefix). The reseller
can solicit multihomed tunneling services from his two access point
routers to provide alternate path service just like a multihomed
Site. Why traffic is coming to any particular router, though, is
influenced entirely by what routes are advertised out that particular
connection via BGP5 (or IDRP). This is rather different from the
multihomed Site case where the ESD is the object of interest and the
RG simply gets the traffic to the Site boundary.
The question arises, however, as to which prefix gets used for
extending downward to his customer Sites. The answer in the simplest
case is to pick one and use it, making the Sites "natural" in the
chosen prefix. The alternate prefix can, of course, be advertised
out the alternate path if desired. But this work can be ascribed to
the instigator and the superior attachment points can charge for this
service. (This is somewhat akin to charging for routes, but only
routes which create a discontinuity in the routing space.)
14. A Comment on NAT Boxes
Discussions of this proposal raised the question of what it means for
Network Address Translation (NAT) boxes. On the one hand, the 8+8
model allows a NAT box to modify the Routing Goop during forwarding
without impeding end-to-end TCP checksums which only rely upon the
ESDs. On the other hand, it isn't very clear what purpose a NAT
box would serve given the 8+8 model.
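The claim that rewriting RG leaves transport checksums intact can be
illustrated in Python, assuming, as this section does, that the TCP
pseudo-header covers only the ESDs. The pseudo-header layout below
mirrors the IPv6 one (addresses, upper-layer length, three zero bytes,
next header) with each 16-byte address replaced by its 8-byte ESD;
that layout is an assumption, not a specified format. The checksum
itself is the standard Internet one's-complement checksum.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (the RFC 1071 algorithm)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_8plus8(src: bytes, dst: bytes, segment: bytes,
                        next_header: int = 6) -> int:
    """Checksum over a hypothetical ESD-only pseudo-header plus the TCP segment."""
    pseudo = src[8:] + dst[8:] + struct.pack("!I3xB", len(segment), next_header)
    return internet_checksum(pseudo + segment)


esd_src = bytes.fromhex("0000000000000abc")
esd_dst = bytes.fromhex("0000000000000def")
segment = bytes.fromhex("30390050") + bytes(16)   # arbitrary illustrative bytes
dst = bytes.fromhex("20010db8ffff0000") + esd_dst
before = tcp_checksum_8plus8(bytes.fromhex("fec0000000000001") + esd_src, dst, segment)
# A NAT box (or Site Border Router) replaces only the source RG:
after = tcp_checksum_8plus8(bytes.fromhex("20010db8000a0000") + esd_src, dst, segment)
assert before == after
```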
Typically a NAT box is cited as a way to have private topology within
a site (note lower case) which is then attached to the Public
Topology via the NAT box without revealing anything about that
private topology. The basic structure of the 8+8 model accomplishes
exactly this goal - providing genuine Private Topology within local
purview while providing independence of attachment point to the
Public Topology. The broad conclusion is that pure NAT boxes don't
have much of a future given the 8+8 model. More general application
gateways performing firewall functions or "intranet bridges"
providing crypto-tunnels between the protected interior of two Sites,
however, are altogether another matter.
15. General Comments
While parts of 8+8 are something of a radical departure from IPv6 as we
currently know it, in general it relies deeply on all the IPv6
underpinnings which contribute so much to the attractiveness of IPv6:
Neighbor Discovery, all the dynamic configuration machinery designed
to make renumbering palatable even using "provider-based addressing",
and the flexibility of the "salami headers" which make tunneling and
security attractive. The general forwarding operations based on
longest-match-under-prefix-mask and the policy-based routing
machinery of BGP5/IDRP are also simply assumed. All of these will
need a tweak or two based on this proposal and it is beyond this
author to do all the analysis required to identify every such tweak
needed, so it will be up to the community to analyze this proposal
and if embraced, look at all the related machinery which is touched
in some subtle manner.
This document has presented both an outline and the deep ideas behind
an 8+8 proposal, and the author believes it has addressed the "hard
problems" to the point it can convince the reader of the viability,
and indeed the merits of this approach. The routing scaling problems
going forward require the kind of flexibility afforded by this
approach. Once the 8+8 partitioning of the address is accomplished,
we are freed to tinker with the routing and forwarding machinery in
ways which cannot be achieved nearly as readily as with a monolithic
16-byte address.
16. Closing Comments
This document presents a model which has been under construction by
the author since before Fall of 1995, at least. Conversations with a
great many people have contributed to the design presented in this
document. A skeletal version of this proposal first appeared in some
email from Dave Clark of MIT who planted the seed and provided the
moniker "8+8". A great many others have contributed ideas and
observations, all of which went into the stew pot for the synthesis
contained here. While it is impossible to mention all of them, a few
deserve special mention as having provided comments on drafts or
otherwise significantly influenced the thinking contained herein:
Vadim Antonov, Ran Atkinson, Scott Bradner, Brian Carpenter, Noel
Chiappa, Steve Deering, Sean Doran, Joel Halpern, Christian Huitema,
Tony Li, Peter Lothberg, Louis Mamakos, Radia Perlman, Yakov Rekhter,
Paul Traina. And a special thanks to all those folks in the IPng
working groups who contributed to the foundation which is IPv6.
17. Security Considerations
Almost certainly lots of them.
18. Author's Address
Mike O'Dell
UUNET Technologies, Inc.
3060 Williams Drive
Fairfax, VA 22031
voice: 703-206-5890
fax: 703-206-5471
email: mo@uu.net