Network Working Group                          Danny Cohen (Myricom)
Internet Draft                                 Craig Lund (Mercury)
expires in six months                          Tony Skjellum (MSU)
                                               Thom McMahon (MSU)
                                               and Robert George (MSU)
                                               February 1997

          Proposed Specification for the PacketWay Protocol
               draft-ietf-pktway-protocol-spec-03.txt
                        expires August 1997
Status of this Memo
This document is an independent submission. Comments should be
submitted to the PktWay@myri.com mailing list.
Distribution of this memo is unlimited.
This document is an Internet-Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months, and may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet
Drafts as reference material, or to cite them other than as a
"working draft" or "work in progress."
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the internet-drafts Shadow
Directories on:
ftp.is.co.za (Africa)
nic.nordu.net (Europe)
ds.internic.net (US East Coast)
ftp.isi.edu (US West Coast)
munnari.oz.au (Pacific Rim)
Abstract
PacketWay's goal is to move data from a "Source" (a node on a System
Area Network) to a "Destination" (another node, probably on another
System Area Network) at the high performance available on these SANs.
Sources and Destinations can be physical things (a processor or a
smart memory board). They can also be "logical" things, such as a
group of cooperating processes.
PktWay-WG <01> PktWay-WG
D R A F T
February 1997
Proposed Specification for the
PacketWay Protocol
------------------
Danny Cohen (Myricom)
Craig Lund (Mercury Computers),
Tony Skjellum (MSU), Thom McMahon (MSU)
and Robert George (MSU)
PktWay-WG
This page....................................1
Cheat-sheet..................................2
Introduction.................................3
Notations....................................4
Part-1: PacketWay EEP Messages.......................5
PktWay Message Structure.....................5
Part-2: PacketWay RRP Messages......................15
The Basic Model.............................16
Node Attributes.............................17
Part-3: PacketWay RRP Message Format................19
RRP Message sub-types.......................19
The Structure of RRP messages...............20
RRP Record Format...........................23
RRP Message Examples........................26
Appendix-A: Enumerations................................31
Appendix-B: Example of the use of RRP for discovery.....35
Appendix-C: Routing Tables..............................43
Appendix-D: Glossary....................................45
Appendix-E: Acronyms and Abbreviations..................47
Please send your comments re this draft to <Cohen@myri.com>.
PktWay at a Glance
+-----------------
       2   6                24                     16                16
PW-Hdr+-+------+-------+--------+---------+--------+--------+--------+--------+
   PH1|V|  P   |   Destination-Address    | Type-Extension  |   Packet-Type   |
      +-+-+---++--------------------------+-+------+--------+-----------------+
   PH2| E | PL|  Data-Length (8B-words)   |h|  RZ  |0     Source-Address      |
      +---+---+--------+--------+---------+-+------+--------+--------+--------+
        4   3               25             1    7   1           23

       2   6    2   6       8        8        8        8        8        8
      +--------+--------+--------+--------+--------+--------+--------+--------+
L2RH  |vv000000|11LLLLLL|  SR01  |  SR02  |........|........|........|........|
      +--------+--------+--------+--------+--------+--------+--------+--------+
                  Length

       2   6    4   6       8        8        8        8        8        8
      +--------+--------+--------+--------+--------+--------+--------+--------+
Symbol|vv000000|1011ssss|ssssssss|ssssssss| Length |  data  |........|........|
      +--------+--------+--------+--------+--------+--------+--------+--------+
                    <---- Symbol Type --->

       2   6       8        8        8        8        8        8        8
Opt'l +--------+--------+--------+--------+--------+--------+--------+--------+
hdr   |TCtttttt|LLLLLLLL|  data  |........|........|........|........|........|
fields+--------+--------+--------+--------+--------+--------+--------+--------+
T: 0=optional, 1=mandatory; C: 0=more OH-fields follow, 1=last OH-field

          8        8        8        8        8        8        8        8
RRP   +--------+--------+--------+--------+--------+--------+--------+--------+
Record|  RTyp  |   PL   |       RL        |........|........|........|........|
      +--------+--------+--------+--------+--------+--------+--------+--------+
RRP-messages: GVL2, L2SR, RDRC, TELL, INFO, HRTO, WRU;
RTyp: ADDR, NAME, CAPA, LADR, SRQR, MTUR;
INTRODUCTION
------------
PacketWay is an open family of specifications for internetworking
high-performance System Area Networks (SANs) and high-performance LANs.
Even though most modern SANs have much in common (such as high rates,
low latency, low BER, being packet networks made of point-to-point links
with flow control, and the usage of source routes), each is an island
unto itself, incapable of direct inter-communication with other SANs.
PacketWay's goal is to "internet" such SANs and high-performance LANs.
The core of the PktWay protocol is its End/End Protocol (EEP) and its
Router/Router Protocol (RRP). Above the core, several extensions are
expected to be defined (and implemented), including dynamic resource
and routing discovery, secure-PktWay, and multicast-PktWay.
Part-1 describes the PacketWay EEP (End/End Protocol). Part-2
describes the PacketWay RRP (Router/Router Protocol). Part-3 defines
the format of the RRP packets. Other PacketWay layers, such as
PacketWay dynamic discovery, security, multicast, and a PktWay Server
Layer, will be described in documents to be provided later.
Some basic PacketWay terminology requires explanation. PacketWay
interconnects high-performance System Area Networks (SANs). Each
SAN contains some "nodes". At least one node in each SAN is also
a PacketWay "router", connected to more than one SAN.
PacketWay's goal is to move data from a "Source" (e.g., a node on
a SAN) to a "Destination" (e.g., a node on another SAN). Sources and
Destinations can be physical entities (a processor or a smart memory
board). They can also be logical entities (a group of cooperating
processes). These nodes include sources, destinations, and routers.
Within each instance of PacketWay all nodes have unique 24-bit
PacketWay addresses. A system designer can assign these "PacketWay
Addresses" manually. Alternatively, the optional PacketWay Server
Layer provides a way to assign and discover addresses dynamically.
Throughout this document "address" always means the 24-bit PacketWay
address.
SANs also may have PacketWay addresses, aka SAN-IDs. They are also
24-bit quantities, sharing the address space with the nodes. These
addresses, of SANs and nodes, are unique within each instance of
PacketWay.
To optimize for performance, PacketWay has a data transfer mode that
leverages the native message routing schemes used within the SANs.
This mode uses a "Planned Transfer" paradigm. During the planning phase,
a source collects information on optimal routes to a destination,
expressed in the various native formats of the intervening SANs.
A source later uses this information for low latency transfers to that
destination. In PacketWay, the transfer phase of a Planned Transfer
is called "L2-forwarding." Appendix-B shows an example of the planning
phase.
PacketWay also optionally supports a more traditional data transfer
mode that requires no planning. Such transfers specify the destinations
by their addresses only. PacketWay calls this more traditional
approach "L3-forwarding."
PacketWay packets travel through SANs encapsulated inside the native
packet format of each SAN, by being prefixed with the routing header and
followed by the tail as required by that SAN.
PacketWay packets get to their destinations by Level-2 (L2) forwarding,
Level-3 (L3) forwarding, or a combination thereof.
In L3-forwarding (similar to IP forwarding), the L2-routing through each
SAN is determined by an inter-SAN router upon entering that SAN. The
router prefixes the packet with an L2 routing header (such as a source
route) corresponding to the destination address specified in the packet.
It is a task for that router to determine the L2-routing-header
corresponding to the given PacketWay-address.
In L2-forwarding the source prefixes the packet with all the L2-routing
headers needed along the path to the destination. Each router has only
to get the L2-routing-header from the leading L2RH (L2-Routing-Header
record) that was provided by the source.
PacketWay does not provide Segmentation and Reassembly (SAR).
Therefore, the length of a packet cannot exceed the minimum MTU
(Maximum Transmission Unit) along its path.
PacketWay does not detect errors. It only gathers error detection
information from the SANs and inter-SAN routers that a packet transits.
PacketWay is big-Endian 8B-word based.
NOTATIONS
+--------
8B means "8-byte" (64 bits).
0x indicates hexadecimal values (e.g., 0x0100 is 2^8=256-decimal).
0b indicates binary values (e.g., 0b0100 is 4-decimal).
xxx indicates a field that is discarded without any checking (e.g., padding).
[fff] indicates that fff is an optional field.
Length fields never include themselves, and therefore may be zero.
Part-1: PacketWay EEP messages
-------------------------------
The PacketWay MESSAGE STRUCTURE
+------------------------------
PacketWay messages have 6 components, including 4 optional ones:
[1]: [Optional Sequence of L2-Routing-Headers Records (L2RHs) and Symbols]
[2]: EEP Header (16 bytes) (PH)
[3]: [Optional Header fields] (OH)
[4]: [Optional, Most likely: Data Block] (DB)
[5]: [Optional Trailer fields] (OT)
[6]: EEP Trailer (8 bytes) (TAIL)
Re [1]: as explained later, if the 9th and 10th bits of a message are
0b11 then the message starts with an L2RH. If the 9th through the 12th
bits of a message are 0b1011 then the message starts with a "symbol".
Other values of these bits indicate that there are no L2RHs or symbols
and that the message begins with the EEP-header.
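The test on the second byte can be sketched as follows. This is only
an illustration in Python: the function name classify_leading_word is
hypothetical and is not defined by PacketWay, and the two bit patterns
are taken from the L2RH and Symbol formats given in this document.

```python
def classify_leading_word(word: bytes) -> str:
    """Return 'L2RH', 'SYMBOL' or 'EEP' for the first 8B-word of a message."""
    if len(word) < 2:
        raise ValueError("at least the first 2 bytes are needed")
    second = word[1]            # holds the 9th through 16th bits
    if second >> 6 == 0b11:     # 9th and 10th bits are 0b11
        return "L2RH"
    if second >> 4 == 0b1011:   # 9th through 12th bits are 0b1011
        return "SYMBOL"
    return "EEP"                # anything else: the EEP-header starts here
```

A receiver or router could call this repeatedly while stepping over
leading records until it returns 'EEP'.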
Re [3]: if the h-bit in the EEP header [2] is 1 then there are optional
header fields. The sequence of these header fields is terminated with a
word whose 2 MSBytes are 0xFF00.
Re [4]: if DL, in the EEP header, is greater than zero then a Data
Block (DB) is included in this message.
Re [5]: the optional header fields, [3], may indicate that some optional
trailer fields are present after the DB, [4]. The order and the formats
of the trailer fields are defined by the optional header fields.
It is expected that most messages will have Data Blocks (DB), and that
most messages will not have Optional Header fields (OH), nor trailer
fields (OT).
The leading L2RHs and symbols, [1], are consumed by the SANs before the
message reaches the destination, which receives only the other
components, [2] through [6]. These parts, [2] to [6], constitute the
End/End Protocol of PacketWay.
TAIL, the EEP trailer, [6], may be modified along the way to the
destination, unlike [2], [3] and [4], which arrive exactly as sent
by the source.
Each PacketWay packet may be first L2-forwarded (zero or more times)
before being L3-forwarded (zero or more times).
Although PacketWay headers and trailers are always in Big Endian order,
the byte order of the Data Block is not defined by PacketWay.
Since all the elements of PacketWay (L2RHs, EEP-headers, optional
fields, data, and EEP-trailers) are always multiples of 8B-words,
it is recommended that PacketWay headers (and data) be aligned on
8B-boundaries.
[1]: Optional Sequence of L2-Routing-Headers Records (L2RHs) and Symbols
+-----------------------------------------------------------------------
A PacketWay source may specify native routes, by placing the native
routes before the PacketWay Header. The native routes (for all SANs and
LANs beyond the initial one) must appear within a sequence of PacketWay
L2-Routing-Header records (L2RH).
The contents of the L2RH are totally SAN dependent, with the exception
of the first 2 bytes that distinguish this record from an EEP-header and
also provide the Length (L) indicating the number of routing bytes of
that L2RH (not including these 2 bytes).
L is always between 0 and 63. The total number of bytes in the L2RH is
L+2, packed in [(L+9)/8] 8B-words (where the square brackets [] indicate
the integer part of the quantity within).
It's up to each SAN to provide padding, if needed, to fill the L2RH words.
Each L2RH is defined by the entity that will process it. In addition
to routing information per se, it may also include demuxing information
such as a local message-type. For example, over Myrinet it should end
with 0x0300 which is the Myrinet-type assigned to PacketWay.
The L2 header must contain enough information to allow a router to
quickly create any necessary local routing headers and trailers.
PacketWay implementations that support L2-forwarding must document
their unique L2 header requirements.
When a PacketWay message is encapsulated inside any native SAN message
(Paragon or Myrinet, for example), it's up to that SAN to distinguish
between it and its own native packets. This is not a PacketWay issue.
For example, Myrinet uses its Message-Type to recognize PacketWay
messages.
PacketWay-Routers on the boundaries between SANs are asked to forward
packets with either L2 or L3 routings. The former start with an L2RH,
(having both its 9th and its 10th bits set to 1), whereas the latter
start with PacketWay-addresses (with other values for these 2 bits).
FORMAT:
Each L2RH is in the format:
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11LLLLLL|  SR01  |  SR02  |........|........|........|  xxx   | L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
The first 2 bits are 0b00 for the working version of the protocol.
They may have other values for experimental versions.
The next 6 bits should be all zeroes.
The next two bits must be 0b11 to indicate that this is an L2RH record.
The next 6 bits are the byte count of the routing information that
starts in the third byte and is followed by as many padding bytes as
needed to fill to the next 8B-boundary. The length of the routing
information is expected to be between 1 and 63 bytes.
This 0b11 was chosen to be consistent with the 0b11 of PktWay-addresses,
as described in [2] below.
EXAMPLES:
An L2RH with an SR with 5 routing bytes:
          0b11 L=5    #1       #2       #3       #4       #5    padding
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11000101|  SR01  |  SR02  |  SR03  |  SR04  |  SR05  |  xxx   | L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
          ^^      |<---------- routing information ----------->|
An L2RH with an SR with 13 routing bytes:
          0b11 L=13   #1       #2       #3       #4       #5       #6
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11001101|  SR01  |  SR02  |  SR03  |  SR04  |  SR05  |  SR06  | L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
|  SR07  |  SR08  |  SR09  |  SR10  |  SR11  |  SR12  |  SR13  |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
    #7       #8       #9      #10      #11      #12      #13    padding
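The L2RH layout can be sketched in Python as shown here. This is a
minimal illustration, not a reference implementation: build_l2rh and
l2rh_size are hypothetical names, and padding with zero bytes is an
assumption (the padding is discarded without checking, and each SAN
chooses its own).

```python
def build_l2rh(route: bytes, version: int = 0) -> bytes:
    """Build one L2RH record from the routing bytes of a single SAN."""
    if not 0 <= version <= 3:
        raise ValueError("version is a 2-bit field")
    if len(route) > 63:
        raise ValueError("L is a 6-bit field (0..63)")
    words = (len(route) + 9) // 8            # integer part of (L+9)/8
    # Byte 0: vv000000.  Byte 1: 0b11 marker followed by the 6-bit length L.
    record = bytes([version << 6, 0b11000000 | len(route)]) + route
    return record + bytes(words * 8 - len(record))   # zero padding (assumed)


def l2rh_size(record_start: bytes) -> int:
    """Size in bytes (a multiple of 8) of the L2RH starting at record_start."""
    length = record_start[1] & 0x3F          # low 6 bits of the second byte
    return 8 * ((length + 9) // 8)
```

With 5 routing bytes the record occupies one 8B-word, and with 13
routing bytes it occupies two, matching the two examples above.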
Symbols (to be defined later) may be mixed among the L2RHs, before the
EEP-header. Their format is:
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|1011ssss|ssssssss|ssssssss| Length |  data  |........|........|Symbol
+--------+--------+--------+--------+--------+--------+--------+--------+
              <---- Symbol Type --->
[2]: EEP Header (16 bytes) (PH)
+------------------------------
 2   6                24                     16                16
+-+------+-------+--------+---------+--------+--------+--------+--------+
|V|  P   |   Destination-Address    | Type-Extension  |   Packet-Type   |PH1
+-+-+---++--------------------------+-+------+--------+--------+--------+
| E | PL| Data-Length>=0 (8B-words) |h|  RZ  |0     Source-Address      |PH2
+---+---+--------+--------+---------+-+------+--------+--------+--------+
  4   3               25             1    7               24
These fields are described below.
[3]: Optional Header Fields (OH)
+-------------------------------
A PacketWay-message has optional header fields (OH) if the Option-Flag
(h) is set to 1 in the EEP-header.
Each OH is in the format:
+--------+--------+--------+--------+--------+--------+--------+--------+
|TCtttttt|LLLLLLLL|  data  |........|........|........|........|........| OH
+--------+--------+--------+--------+--------+--------+--------+--------+
The first byte indicates the optional header field type (OH-TYPE).
The first bit, T, of the first byte indicates the processing of this
OH-TYPE:
T=0: Optional (may drop this field if this OH-TYPE is unknown)
T=1: Mandatory (should not process this message if this OH-TYPE is unknown)
The second bit, C, of the first byte indicates whether more optional
header fields follow (i.e., whether this is the last OH field of this
message).
C=0: More Optional header fields follow
C=1: End of Optional header fields group
The other 6 bits of this byte, tttttt, define application-specific
OH-TYPEs.
The second byte is the byte-count of the data for this field that starts
in the third byte, and is padded with as many padding bytes as needed to
fill 8B-words. E.g., L=(0-6) implies one 8B-word. L=(7-14) implies two.
L does not include itself, and can range from 0 to 255.
Example: An Optional Header Field (OH) with a mandatory OH-TYPE and 4
data bytes:
            L=4       #1       #2       #3       #4    padding  padding
+--------+--------+--------+--------+--------+--------+--------+--------+
|1xtttttt|00000100| data01 | data02 | data03 | data04 |  xxx   |  xxx   | OH
+--------+--------+--------+--------+--------+--------+--------+--------+
                  |<------------- value ------------->|
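A minimal Python sketch of how an OH field might be assembled from the
format above. The function name build_oh, the OH-TYPE value used in
the usage note, and the zero padding bytes are assumptions made for
illustration; none of them is defined by PacketWay.

```python
def build_oh(oh_type: int, data: bytes, mandatory: bool, last: bool) -> bytes:
    """Encode one Optional Header field, padded to a whole number of 8B-words."""
    if not 0 <= oh_type <= 0x3F:
        raise ValueError("tttttt is a 6-bit field")
    if len(data) > 255:
        raise ValueError("L is a one-byte count (0..255)")
    # First byte: T (mandatory), C (last field), then the 6 type bits.
    first = (int(mandatory) << 7) | (int(last) << 6) | oh_type
    raw = bytes([first, len(data)]) + data
    return raw + bytes(-len(raw) % 8)        # zero padding (assumed value)
```

For example, build_oh(0x01, four_bytes, mandatory=True, last=True)
yields a single 8B-word whose first byte is 0xC1 and whose second byte
is 4, followed by the 4 data bytes and 2 padding bytes.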
[4]: Optional Data Block (DB)
+----------------------------
The DB is free for applications to use in any way. Routers must not
modify this field.
The DB has DL 8B-words, including optional padding (at the end) of PL
bytes. Hence, the number of data bytes is 8*DL-PL. Both DL and PL are
defined in the EEP-header.
The maximum length of the DB is 8*(2^25-1) bytes, just under 256MB.
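The relation between the number of data bytes, DL and PL can be worked
through in a few lines of Python. The helper name data_block_lengths
is hypothetical.

```python
MAX_DL = 2**25 - 1   # DL is a 25-bit field


def data_block_lengths(n_bytes: int) -> tuple[int, int]:
    """Return (DL, PL) for a Data Block carrying n_bytes of data."""
    if n_bytes < 0:
        raise ValueError("byte count cannot be negative")
    dl = (n_bytes + 7) // 8      # 8B-words, rounded up
    pl = 8 * dl - n_bytes        # padding bytes at the end, 0..7
    if dl > MAX_DL:
        raise ValueError("Data Block too long for the 25-bit DL field")
    return dl, pl
```

For 13 data bytes this gives DL=2 and PL=3, and the receiver recovers
the byte count as 8*DL-PL = 13.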
[5]: Optional Trailer Fields (OT)
+--------------------------------
A PacketWay-message has optional trailer fields (OT) if so indicated in
an Optional Header field, e.g., an OH field may indicate that a CRC64
is in the OT.
An OT may have just the data for an OH defined above (in the EEP
header), or be a stand alone field in the same format as OH.
The OT-fields are in the order defined by the OHs. For example, if an
OH-field indicating that a CRC32 is in the OT is followed by another
OH-field indicating that a CRC64 is in the OT, then the OT with the
CRC32 should be followed by the OT with the CRC64.
[6]: EEP Trailer (TAIL)
+----------------------
The TAIL consists of only the Error Indication (EI) field which is a
single 8B-word.
Routers may start forwarding packets toward their destinations before
detecting transmission errors (wormhole routing). The EI field provides
such routers with a means to append an error indication to the end of a
packet.
An all zero EI value means that no error was indicated.
Any non zero EI value indicates one or more errors.
The packet source will usually initialize the EI field to all zeros.
However, as an alternative example, a memory board may create a packet
with a non zero EI field (EI=1) that indicates that a parity error was
detected by the memory board.
Each router does an arithmetic left shift of the EI field by one bit,
unless its MSbit is 1. Routers that detect transmission errors also set
the LSbit (after the shift) to 1.
This provides the ability to identify which routers have indicated
errors (if the route is known).
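The per-router EI update can be sketched in Python as follows. The
name update_ei is hypothetical. The text does not say whether a router
still sets the LSbit when the shift is skipped because the MSbit is
already 1; this sketch assumes that it does.

```python
MASK64 = (1 << 64) - 1   # EI is a single 8B-word


def update_ei(ei: int, error_detected: bool) -> int:
    """Apply one router's update to the 64-bit Error Indication field."""
    if not (ei >> 63) & 1:           # shift only while the MSbit is 0
        ei = (ei << 1) & MASK64
    if error_detected:
        ei |= 1                      # assumption: set even if no shift occurred
    return ei
```

For example, on a path of three routers where only the second one
detects an error, EI goes 0, 1, 2. The destination sees EI=2, whose
set bit is one position from the LSbit, pointing at the router one hop
before the last.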
THE DETAILS OF THE EEP-HEADER, [2]
+---------------------------------
                                    Bytes.bits
   Version                (V)           0.2
   Priority               (P)           0.6
   Destination Address    (DA)          3.0
   Packet Type Extension  (TE)          2.0
   Packet Type            (PT)          2.0
   Endianness             (E)           0.4
   Padding Length         (PL)          0.3
   Data Length            (DL)          3.1
   Options flag           (h)           0.1
   Reserved               (RZ)          0.7
   Source Address         (SA)          3.0
Version (V) 2 bits
This field is static. Its 2 bits are 0b00 for the working version of
the protocol. These bits should have other values for experimental
versions.
Priority (P) unsigned integer, 6 bits
It is anticipated that some SANs, especially those working in real
time, will want to implement priorities. This field supports such
usage.
All ones is the highest priority, and all zeroes the lowest. Ideally,
packets with higher priority should gain access to contested resources
before packets with lower priority. Implementations may ignore the
Priority field.
Destination Address (DA) 24 bits
This field contains the PacketWay address of the destination.
Addresses are unique within each instance of PacketWay. Nodes
should have addresses assigned to them. The method of assigning
addresses to PacketWay nodes is not specified here.
Examples of potentially addressable PacketWay nodes include: groups
of cooperating processes, an entire MPP, or each of an MPP's many
processors or processes.
All half-routers (as defined in Part-2) must have addresses so that
they can exchange control and configuration packets with other
routers.
The 24-bit PacketWay address space is divided into several segments,
each identified by the most significant bit(s) of the address.
 MSbits  | Segment  | Count | Range
---------+----------+-------+-------------------
 0XXX    | Physical |  8M   | 0x000000-0x7FFFFF
 100X    | Unused   |  2M   | 0x800000-0x9FFFFF
 1010    | Logical  |  1M   | 0xA00000-0xAFFFFF
 1011    | Symbol   |  1M   | 0xB00000-0xBFFFFF
 11XX    | L2RH     |  4M   | 0xC00000-0xFFFFFF

               PHYSICAL ADDRESS MAP             Logical
Segment:             Physical             Unused   ^ Symbol    L2RH
         /---------------^--------------\ /--^--\ / \ / \ /------^------\
         +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
         |       .       :       .       |       |   |   |       .       |
         +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Memory:  0                              8M      10M 11M 12M             16M
The 0b11xx was chosen for L2RH to be consistent with the 0b11
indication of L2RHs, as described in [1] above.
LAs, "Logical Addresses", (for broadcast and for multicast groups)
are also in this address space. An address is a "Logical Address"
if its 4 MSbits are set to 0b1010.
Certain RRP messages specify addresses either as a unique address or
as a set of addresses, by (min,max) or by (value,mask).
A few Physical-addresses are reserved:
0x000000 Undefined address (illegal where an address is expected)
0x7FFFFE ("Hey-You!") This address could be used at power up
to address nodes or routers, over point-to-point links.
("If you receive it, it's for you.")
0x7FFFFF (Broadcast) This address is reserved for broadcast
operations which may be added in later versions.
("If you receive it, it's for you.")
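The segment table and the reserved addresses can be expressed as a
short Python lookup. This is only a sketch: the names address_segment
and RESERVED are hypothetical, and the strings are labels chosen for
illustration.

```python
RESERVED = {
    0x000000: "Undefined",
    0x7FFFFE: "Hey-You!",
    0x7FFFFF: "Broadcast",
}


def address_segment(addr: int) -> str:
    """Name the segment of a 24-bit PacketWay address from its 4 MSbits."""
    if not 0 <= addr <= 0xFFFFFF:
        raise ValueError("PacketWay addresses are 24 bits")
    top4 = addr >> 20
    if top4 < 0b1000:        # 0XXX
        return "Physical"
    if top4 < 0b1010:        # 100X
        return "Unused"
    if top4 == 0b1010:       # 1010
        return "Logical"
    if top4 == 0b1011:       # 1011
        return "Symbol"
    return "L2RH"            # 11XX
```

A caller would typically check RESERVED first, since the three
reserved values all fall inside the Physical segment.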
Type Extension (TE) 2 bytes
An extension of the following PT field.
Logically, the TE should be after the PT. However, the PT is 8B-word
aligned and therefore easier to process than the TE, which is not
8B-aligned. Since the PT is more frequently used than the TE, it was
assigned to the better aligned field.
Packet Type (PT) 2 bytes
The intent of the PT field is to provide all the information needed
for demuxing in support of multiple protocol layers. Whereas
traditional protocol layering requires several stages of sequential
demuxing, PacketWay is expected to provide enough information to
support a single combined demuxing (such as in support of zero copy
TCP).
PT values to support popular parallel programming APIs such as MPI
will be defined. The Enumeration Appendix (A1) defines several values
for this PT field.
Some PTs also use the preceding 2 bytes of the Type Extension (TE)
field for passing PT-specific parameters.
However, layered protocols cannot be ignored. The PT field can also
define data blocks as containing IP, SNMP, ATM, Ethernet, and other
popular layered protocols. The PT will be then used for that purpose,
as done throughout the internet (e.g., "ether-types").
For example, here are PT values a memory board may need:
PT           Meaning
---------    --------------------------------------------------------
MEM-WRITE -- Treat the first 8 bytes of the Data Block as a local
             memory-address and write the remaining data into memory.
MEM-READ  -- Treat the data block as 2 8B-memory-addresses and an
             8B-byte count. Generate a return WRITE packet containing
             the first address, followed by the appropriate data,
             that was read from the second address.
The PT field will also indicate the commands used in the PacketWay
Router/Router configuration and control Protocol (RRP).
We will define a special PT value that specifies that the Data Block
contains an embedded PacketWay message, complete with another EEP
header, and, potentially, prefixed L2-Routing-Headers. This feature
will allow the use of L3-routing to an intermediate node, followed by
L2-routing from there to the final destination.
Special Types
RRP - PacketWay's Router/Router protocol (see Part-2).
ERR - Error reporting packet, usually sent to the Source Address
(SA, see below) in response to a PacketWay message that
could not be properly handled, such as "Destination Unknown."
The TE indicates the nature of the error (e.g., UNK) as
defined in the Enumeration Appendix (A4).
Endianness (E), 4 bits
The idea is that if the SAN interface of the receiving-node detects
Endianness that is different than its own and if the entire Data Block
(DB) consists of N-byte fields, then it may kick in byte-swapping
hardware for N-byte fields, saving much work for the receiving node.
e, the first bit (MSbit) of E, indicates that the DB is in Big-Endian
order (e=0) or in Little-Endian order (e=1). The next 3 bits could
control hardware byte swapping, if any, which assumes that all the
data consists of words of the same length.
e000: don't swap, it's 8-bit data
e001: swap as if all the data is 16-bit words
e010: swap as if all the data is 32-bit words
e011: swap as if all the data is 64-bit words
e100: swap as if all the data is 128-bit words
e101: illegal and reserved for future use
e110: illegal and reserved for future use
e111: illegal and reserved for future use
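The E field can be decoded, and the swap it calls for applied, along
these lines. This is a software sketch in Python of what the text
expects byte-swapping hardware to do; decode_endianness and
swap_fields are hypothetical names.

```python
def decode_endianness(e_field: int) -> tuple[bool, int]:
    """Return (little_endian, field_width_in_bytes) for the 4-bit E field."""
    if not 0 <= e_field <= 0xF:
        raise ValueError("E is a 4-bit field")
    little = bool(e_field & 0b1000)   # e: 0 = Big-Endian, 1 = Little-Endian
    code = e_field & 0b0111
    if code > 0b100:
        raise ValueError("e101, e110 and e111 are illegal and reserved")
    return little, 1 << code          # 1, 2, 4, 8 or 16 bytes


def swap_fields(data: bytes, width: int) -> bytes:
    """Reverse the bytes of each width-byte field in data."""
    if len(data) % width:
        raise ValueError("data is not a whole number of fields")
    return b"".join(data[i:i + width][::-1]
                    for i in range(0, len(data), width))
```

A receiving node would swap only when the decoded order differs from
its own; with a width of 1 (8-bit data) the swap leaves the data
unchanged.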
Pad Length (PL) unsigned integer, 3 bits
The number of padding bytes that were added at the end of the DB
(i.e., from the end of the data to the end of the DB). PL can be
between 0 and 7.
Data Length (DL) unsigned integer, 25 bits
Length, in 8B-words, of the data block (not including the L2RHs,
EEP-header, OH, OT, and TAIL), including any optional padding. Hence,
the net length of the Data Block is 8*DL-PL bytes. The minimum is
zero, and the maximum length is (2^25-1)*8 bytes ~ 2^28 ~ 256 MBytes.
Optional Header-Field Flag (h) 1 bit
This bit is set to 1 if there are one or more optional header fields
following the standard 16-byte EEP-header.
Reserved (RZ) 7 bits
This field is reserved for future use. Applications should neither
use it, nor count on others not to use it. In this version it should
always be set to zero (0b0000000).
Source Address (SA) 24 bits
This field contains the physical address of the packet's original
source in the same format as DA. However, unlike DA, the SA must
be a physical address.
Filling in this field is optional. A value of zero means that the SA
is not specified.
Routers may use this field to identify the sender to which error
messages may be returned.
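Putting the field widths together, the 16-byte EEP-header can be
packed and unpacked as two big-endian 8B-words. The following Python
sketch assumes nothing beyond the layout shown in [2]. The function
names are hypothetical, and the PT value in the usage note is a
placeholder, not an assigned enumeration.

```python
import struct


def _field(name: str, value: int, bits: int) -> int:
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")
    return value


def pack_eep_header(*, da, pt, te=0, sa=0, dl=0, pl=0, e=0,
                    priority=0, version=0, h=0) -> bytes:
    """Pack the two header words PH1 and PH2 (RZ is always sent as zero)."""
    ph1 = (_field("V", version, 2) << 62 | _field("P", priority, 6) << 56 |
           _field("DA", da, 24) << 32 | _field("TE", te, 16) << 16 |
           _field("PT", pt, 16))
    ph2 = (_field("E", e, 4) << 60 | _field("PL", pl, 3) << 57 |
           _field("DL", dl, 25) << 32 | _field("h", h, 1) << 31 |
           _field("SA", sa, 24))
    return struct.pack(">QQ", ph1, ph2)


def unpack_eep_header(buf: bytes) -> dict:
    """Split the first 16 bytes of buf into the EEP-header fields."""
    ph1, ph2 = struct.unpack(">QQ", buf[:16])
    return {
        "version": ph1 >> 62, "priority": (ph1 >> 56) & 0x3F,
        "da": (ph1 >> 32) & 0xFFFFFF, "te": (ph1 >> 16) & 0xFFFF,
        "pt": ph1 & 0xFFFF,
        "e": ph2 >> 60, "pl": (ph2 >> 57) & 0x7,
        "dl": (ph2 >> 32) & 0x1FFFFFF, "h": (ph2 >> 31) & 0x1,
        "rz": (ph2 >> 24) & 0x7F, "sa": ph2 & 0xFFFFFF,
    }
```

For example, pack_eep_header(da=0x000102, pt=0x0001, sa=0x000203,
dl=2, pl=3) describes a packet with 13 data bytes, and
unpack_eep_header returns the same field values from the packed bytes.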
Part-2: PacketWay RRP messages
------------------------------
PacketWay is an open family of specifications for internetworking System
Area Networks (SANs). Part-2 of the PktWay specification describes
the RRP (Router to Router Protocol) part of the PacketWay-protocol.
The RRP is built on top of the PacketWay-EEP described in Part-1.
Part-3 defines and discusses the format of the RRP packets.
We introduce some new terminology within this document. A PacketWay
Router always bridges (at least) two SANs. The Router consists of three
parts: the "Half Router" (HR) attached to the first SAN, the HR attached
to the second SAN, and their interconnection.
PacketWay does not define the nature of this interconnection.
However, we believe the PCI Local Bus de facto standard will become
a very popular link.
This document specifies a series of options that allow system designers
to deploy PacketWay routers of varying levels of intelligence. Each
router is considered as a set of interconnected Half-Routers (HRs),
each being a full fledged address-bearing node on some SAN.
There are several implementation levels of PktWay, indicated by a letter
code, which are specified differently for nodes and routers. The higher
the letter code ("A" = lowest), the more interoperability and
adaptability result. System designers may choose the level of
implementation to best suit their needs.
Node implementation levels ("A" being the lowest):
Level-A: Built-in L2 source routes
Level-B: Built-in L3 addresses (dynamic update of first HR)
Level-C: Requesting and receiving dynamic information
Level-A nodes send messages by using L2-forwarding, by specifying
SRs (in L2RHs) that are hard-coded into them, without the ability
to dynamically acquire or modify them.
Level-B nodes have, in addition, the ability to send messages by
using L3-forwarding, by specifying addresses that are hard-coded into
them (without the ability to dynamically acquire them). These nodes can
ask HRs for the best first HR for any destination node (specified by its
address) and for the SR to destination nodes. In addition they can also
handle re-direct messages, telling them which HR to use for given nodes.
Level-C nodes can also locate L3-nodes by asking HRs to provide the
attributes of nodes specified by addresses, names, and/or capabilities.
They also respond to such queries by reporting their own attributes.
Router implementation levels ("A" being the lowest):
Level-A: Forwarding according to L2 source routes
Level-B: Handling L3 addresses, and dynamic first HR (re-direct, etc)
Level-C: Supporting node discovery
HRs can support nodes of the same (or lower) implementation level.
Level-A routers support only L2-Forwarding, and do not support the
planning phase of Planned Transfers. Therefore, nodes which use Level-A
routers must have the necessary native routes hard-coded into them
(e.g., burned into a PROM somewhere).
Level-B routers also support L3-Forwarding, and advise nodes about the
first HR to use for each destination. They add the planning phase of
Planned Transfers (by supporting requests for routes, [GVL2] and
[L2SR]).
Level-C routers help nodes discover resources (by capabilities).
PktWay is designed for the highest implementation levels, but will
interoperate with instances of PktWay using lower implementation levels.
THE BASIC PACKETWAY MODEL
The basic model of PktWay is a set of SANs (System Area Networks), each
with its own conventions and protocols, using a common protocol (PktWay)
for interconnection.
The interconnection between SANs is via PacketWay-routers. A router
between SAN-A and SAN-B is composed of two interconnected processes,
each a node on a SAN, complying with the conventions of their SANs.
These processes are known as HRs ("Half-Routers") or "SAN-interfaces."
These HRs may be implemented by two separate "boxes" with an inter-SAN
communication link between them, or inside a single "multi-homed" box
that has interfaces to both SANs, interconnected via some bus or SAN.
RRP defines (via message structure and behavior) the interactions
between HRs, and between HRs and computing nodes. RRP does not define
the lower level protocols that deliver its messages (over links, or
between processes in multi-homed routers). In particular, RRP does not
define the inter-SAN interconnection links between the HRs -- these are
left for mutual agreements among the implementors. These links are
expected to range from serial fibers to PCI buses. An optional PPP-like
protocol may be defined later for these links.
It is assumed that each HR has a Routing Table (RT) for its own SAN
(aka Local Routing Table, LRT), with (at least) the addresses of all
the nodes, and the source routes to each of them from the HR. This
information could be dynamic or static, even manually configured.
The HRs may (or may not) perform dynamic mapping of their SANs.
It is also assumed that each node, on each SAN/LAN, knows the SR to at
least one HR on its SAN/LAN.
In L2 operation under levels B and C, when a source node, SA, needs to send
a message to a destination node, DA, it first asks any of the HRs on its
[SA's] SAN for a source route (SR) from HR to DA. That HR would (1)
provide such an SR, or (2) reply with a "re-direct" message, suggesting
to ask another HR which is also on SA's SAN, or (3) report no knowledge
of DA (using the UNK error message).
SA may ask more than one HR for SRs to the same DA and use any algorithm
to choose which of these SRs to use.
RRP does not specify whether (and how) to cache SRs.
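The three possible outcomes of a route request (a route, a re-direct, or
UNK) suggest a simple loop on the requesting node. The Python sketch below
is one possible shape for it, under assumptions that are not in this
document: the "ask" callable stands for sending [GVL2] and waiting for the
reply, the tuple reply shapes are invented for illustration, and the limit
on re-directs is an arbitrary safeguard.

```python
from typing import Callable, List, Optional, Tuple

# Assumed reply shapes (not wire formats):
#   ("L2SR", [route_bytes, ...])   routes from the HR to DA
#   ("RDRC", other_hr_address)     ask this HR instead
#   ("UNK", None)                  the HR has no knowledge of DA
Reply = Tuple[str, object]


def request_routes(ask: Callable[[int, int], Reply], hr: int, da: int,
                   max_redirects: int = 4) -> Optional[List[bytes]]:
    """Ask `hr` for routes to `da`, following re-directs up to a limit."""
    for _ in range(max_redirects + 1):
        kind, payload = ask(hr, da)
        if kind == "L2SR":
            return list(payload)
        if kind == "RDRC":
            hr = payload  # retry with the suggested HR
            continue
        return None  # "UNK" or anything unexpected
    return None  # too many re-directs
```

Whether the returned routes are cached, and for how long, is left to the
caller, in line with RRP not specifying it.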
In L3 operation, when a source node, SA, needs to send a message to a
destination node, DA, it sends that message to any of the HRs on its
SAN, using L2, expecting L3-forwarding to DA, using DA's PacketWay
address. That HR would either (1) forward the message toward DA, and
possibly return to SA a "re-direct" message, suggesting to use, in the
future, another HR on SA's SAN for DA, or (2) report no knowledge of DA
(using the UNK error message).
Under level C, nodes may be located by PacketWay-addresses, names, or
capabilities, but only addresses may be used for routing.
NODE ATTRIBUTES
+--------------
Each node has: Physical Address, Name, Capabilities, and Logical-Addresses
  Address (Physical): 3 bytes, flat, unique in this PacketWay
  Name:               flat, globally unique (e.g., IP address),
                      arbitrary length
  Capabilities:       regular GP node, router, PacketWay-server, NFS,
                      paging server, M/C server, DSP, printer, ....
                      Some capabilities may need additional parameters
                      (e.g., SAN-ID for routers, and resolution+colors
                      for printers).
                      The capabilities are defined in the Enumeration
                      Appendix (A5).
  Logical-Addresses:  a set of (logical) addresses to which this node
                      requests to listen. Logical addresses designate
                      multicast and broadcast groups.
                      The control of the Logical-Addresses (a la IGMP)
                      is not defined in this document. This will be
                      designed by the applications that use it (e.g.,
                      PktWay-multicast).
                      The management of logical addresses (e.g., JOIN
                      and LEAVE) is not defined yet.
Part-3: PacketWay RRP Message Format
-------------------------------------
RRP messages are PacketWay messages with PT="RRP" in their EEP-header.
The EEP-header is followed by some (zero or more) RRP-records according
to their RRP-type, followed (always) by the TAIL which is the EI field.
The RRP-records constitute the DB of the PacketWay-message. They must
be in Big-Endian order, with e=0 in the EEP-header.
The RRP-Type is carried in the TE of the EEP-header.
Following are the RRP messages, with their RRP-type:
RRP MESSAGE SUBTYPES
+-------------------
RRP-           Impl'n
Type           Levels  Description
+------------  ------  -----------------------------------------------
[GVL2]         BC      Please give me L2-routes to node (address)
                       The reply to [GVL2] is [L2SR], [RDRC], or [ERR/UNK].
[L2SR]         BC      Here are L2-routes to node (address)
[RDRC]         BC      Re-direct to node (address) via a neighbor HR (address)
[TELL]         C       Please tell me about node (address, name, capabilities)
                       The reply to [TELL] is [INFO], or [ERR/UNK].
[INFO]         C       Info about node (address, name, capabilities, LAs)
[HRTO]         BC      Which HR should I use for node (address)
                       The reply to [HRTO] is [RDRC], or [ERR/UNK].
[WRU?]         C       Who/what-Are-You?
                       The reply to [WRU?] is [INFO].
RRP also uses the following error messages:
[ERR/UNK]      BC      Destination Unknown (address)
[ERR/HRDOWN]   BC      HR Down
[ERR/LINKDOWN] BC      Link Down
[ERR/GENERAL]  ABC     General error message
All these messages may be sent from nodes or from HRs, to nodes or
to HRs.
The format of these messages is defined in this part.
The implementation levels are:
Level-A: pre-wired (static) native routing, "MAC"-based operation
Level-B: L3 forwarding (planned transfers), IP-like operation
Level-C: Node discovery (static routing)
The RRP records are:
RTyp Description
---- ----------------------------------
ADDR Address
NAME Name
CAPA Capability
LADR Logical Addresses
SRQR Source Route and its Quality (SR,Q)
MTUR MTU (for the previous SRQR)
THE STRUCTURE OF THE RRP MESSAGES
+--------------------------------
The RRP-records are made of one or more 8B-words. In the following the
RRP-type is in [] and its implementation level in (). Each message ends
with a TAIL which is not shown here.
* [GVL2] (BC) Please give me L2-routes from you to node (address)
PH (with [PT/TE]=[RRP/GVL2])
ADDR (address of the node for which SR is requested)
* [L2SR] (BC) Here are L2-routes to node (address)
PH (with [PT/TE]=[RRP/L2SR])
ADDR (address of the node for which SR is provided)
SRQR (SR with Q)
MTUR (MTU for the above SR)
This message may have several (SRQR,MTUR)s, one for each SR.
* [RDRC] (BC) Re-direct to node (address) via a neighbor HR (address)
PH (with [PT/TE]=[RRP/RDRC])
ADDR (address of the destination node for which re-direct is issued)
ADDR (address of the HR to be used for that destination node)
The above addresses are expected to be physical (but they may be
otherwise).
* [TELL] (C) Please tell me about node (address | name | capabilities)
PH (with [PT/TE]=[RRP/TELL])
ADDR (address of the node for which more information is requested)
or
PH (with [PT/TE]=[RRP/TELL])
NAME (name of the node for which more information is requested)
or
PH (with [PT/TE]=[RRP/TELL])
CAPA (capabilities for which nodes are requested)
This message may have several CAPA's, one for each capability.
[TELL] identifies a node by an address and/or a name and/or
capabilities. If more than one attribute is specified (e.g., an
address and a name) any node that meets any of them should be
considered (like an implied OR).
* [INFO] (C) Info about node (address, name, capabilities)
PH (with [PT/TE]=[RRP/INFO])
ADDR (address of the node for which more information is requested)
NAME (name of the node for which more information is requested)
CAPA (capabilities for which nodes are requested)
LADR (Logical-Addresses for the requested node)
This message may have several CAPA's, one for each capability.
For nodes without NAME or LADR, these records are omitted.
[INFO] provides all the known information about that node,
address, name, capabilities, and logical-addresses.
* [HRTO] (BC) Which HR should I use for node (address)
PH (with [PT/TE]=[RRP/HRTO])
ADDR (address of the node for which initial HR is requested)
* [WRU?] (C) Who/what-Are-You?
PH (with [PT/TE]=[RRP/WRU?] and [DA]=0x7FFFFE)
* [ERR/UNK] (BC) Destination Unknown (address)
PH (with [PT/TE]=[ERROR/UNK])
XXXX (XXXX of the Destination node for which the requested
information is not available), where XXXX is the ADDR
and/or NAME and/or CAPA of the node(s) about which this
message is sent
* [ERR/HRDOWN] (BC) HR Down (or Router-Down)
PH (with [PT/TE]=[ERROR/HRDOWN])
ADDR (address of the HR that is down)
ADDR (the other address of the router that is down)
* [ERR/LINKDOWN] (BC) Link Down
PH (with [PT/TE]=[ERROR/LINKDOWN])
ADDR (address of one end of the link that is down)
ADDR (address of the other end of the link that is down)
* [ERR/GENERAL] (ABC) General error message
PH (with [PT/TE]=[ERROR/GENERAL])
XX (The entire message that caused that error: PH+OH+DB+TAIL)
RRP RECORD FORMAT
+----------------
Each RRP-record starts with an 8B-word header as shown below. Its first
byte identifies the record type (RTyp). The second byte is the
Pad-Count byte (PL) indicating the number of padding bytes. The third
and the fourth bytes (RL) are the length (in 8B-words) of the record,
excluding the record header, hence it may be zero. The rest of the
header bytes depend on the record type (RTyp).
+--------+--------+--------+--------+--------+--------+--------+--------+
|  RTyp  |   PL   |       RL        |        |        |        |        |Record
+--------+--------+--------+--------+--------+--------+--------+--------+
Some records that have an arbitrary length are "right justified" and
have PL padding bytes before the data. Padding Before Data [PBD].
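Some records that have an arbitrary length are "left justified" and
have PL padding bytes after the data. Padding After Data [PAD].
In either case the total number of data bytes is: (8*RL-PL+4), i.e., the
4 bytes that follow RL in the header word, plus 8 bytes for each of the
RL additional words, less the PL padding bytes.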
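The header layout above can be read mechanically. The following Python
sketch is illustrative only and not part of the specification: the
function names are invented here, and the byte count assumes (as the NAME
and CAPA examples in this part suggest) that a record carries 4 bytes
after RTyp/PL/RL in its header word, plus 8 bytes per additional word,
less the PL padding bytes.

```python
import struct


def parse_record_header(word: bytes):
    """Return (RTyp, PL, RL) from the first 4 bytes of an 8-byte header word."""
    if len(word) != 8:
        raise ValueError("a record header is one 8-byte word")
    # RTyp and PL are one byte each; RL is a 2-byte Big-Endian integer.
    rtyp, pl, rl = struct.unpack(">BBH", word[:4])
    return rtyp, pl, rl


def data_byte_count(pl: int, rl: int) -> int:
    """Data bytes in a record: 4 in the header word + 8 per extra word - padding."""
    return 4 + 8 * rl - pl
```

For the 9-character NAME example (PL=3, RL=1) this gives 9 data bytes, and
for a single-address ADDR record (PL=0, RL=0) it gives 4 (AT plus the
3-byte address).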
Following are the RRP-records. These records are the building blocks
used to construct RRP-messages.
In the following xxx indicate bytes that are discarded, such as for
padding. It is recommended to set them to all-0.
===> [ADDR] Node-Address Record [PAD]
This record specifies either a single address (with AT=1) or a range
of addresses (with AT=2 followed by AT=3, or by AT=4 followed by AT=5).
AT is the "Address-Type".
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |      PktWay-Address      |
+--------+--------+--------+--------+--------+--------+--------+--------+
or:
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" |  PL=4  |      RL=1       |  AT=2  |    Min-PktWay-Address    |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  AT=3  |    Max-PktWay-Address    |  xxx   |  xxx   |  xxx   |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
or:
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" |  PL=4  |      RL=1       |  AT=4  |   PktWay-Address-Value   |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  AT=5  |   PktWay-Address-Mask    |  xxx   |  xxx   |  xxx   |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
The address-mask follows the address-value, and is followed by 4 padding
bytes.
The above addresses may be physical or logical.
The address X is specified by an ADDR record if:
if AT=1: X == PktWay-Address
if AT=2,3: Min-PktWay-Address <= X <= Max-PktWay-Address
if AT=4,5: (PktWay-Address-Mask & X) == PktWay-Address-Value
An ADDR-record defines only one PktWay-address (or one range),
unlike an LADR record that may specify multiple addresses and
multiple address-ranges.
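The three matching rules for an ADDR record translate directly into code.
The Python sketch below is illustrative: the function name and the
convention of passing the two address fields as "first" and "second" are
assumptions of this sketch, with the AT of the first field selecting the
rule (AT=2 implies a following AT=3 field, AT=4 a following AT=5 field).

```python
def addr_matches(x: int, at: int, first: int, second: int = 0) -> bool:
    """Does PktWay address x fall under an ADDR record?

    at == 1: first is the single address.
    at == 2: first/second are the Min/Max of a range (AT=2,3).
    at == 4: first/second are the Value/Mask pair (AT=4,5).
    """
    if at == 1:
        return x == first
    if at == 2:
        return first <= x <= second
    if at == 4:
        return (second & x) == first
    raise ValueError("unsupported Address-Type")
```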
If the ADDR record is followed by other records that describe the same
node (such as NAME, CAPA, LADR, SRQR, and MTUR) then the RL of the ADDR
records also covers all these records. All these records apply to all
the addresses specified in this ADDR-record. Needless to say that NAME
is not expected to appear within a record that specifies more than one
address.
Hence, if an ADDR-record with AT=1 has RL>0, or if an ADDR-record with
AT>1 has RL>1, then this ADDR-record includes additional records (such
as CAPA, LADR, SRQR, and/or MTUR) about the specified address(es).
===> [NAME] Node-Name Record [PAD] (e.g., a name with 9 characters: A1..A9):
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "NAME" |  PL=3  |      RL=1       |   A1   |   A2   |   A3   |   A4   |Name
+--------+--------+--------+--------+--------+--------+--------+--------+
|   A5   |   A6   |   A7   |   A8   |   A9   |  xxx   |  xxx   |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
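A NAME record such as the one above could be produced as follows. This
Python sketch is illustrative only: the function name is an assumption,
the RTyp code 42 is taken from the Enumeration Appendix (A3), and padding
bytes are set to zero as recommended.

```python
import struct

RTYP_NAME = 42  # code for NAME in Appendix A3


def encode_name(name: bytes) -> bytes:
    """Build a left-justified [PAD] NAME record for the given name bytes."""
    # The 4 bytes RTyp/PL/RL plus the name must fill whole 8-byte words.
    pl = -(4 + len(name)) % 8
    total = 4 + len(name) + pl
    rl = total // 8 - 1  # RL excludes the header word
    return struct.pack(">BBH", RTYP_NAME, pl, rl) + name + bytes(pl)
```

For a 9-character name this yields PL=3 and RL=1, matching the diagram.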
===> [CAPA] Node-Capability Record [PAD] (e.g., with 9 parameter bytes):
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "CAPA" |  PL=2  |      RL=1       | CC=Cx  |   P1   |   P2   |   P3   |cap
+--------+--------+--------+--------+--------+--------+--------+--------+
|   P4   |   P5   |   P6   |   P7   |   P8   |   P9   |  xxx   |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
Byte#4 is the Capability Code, CC, followed by as many parameter bytes
as needed.
The capability codes are listed in the Enumeration Appendix (A5).
The number of bytes used by the parameters is 8*RL-PL+3 (e.g., 9 for the
record above, with RL=1 and PL=2).
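Decoding a CAPA record then amounts to reading CC from byte#4 and taking
the parameter bytes that follow it. The Python sketch below is
illustrative; the function name is an assumption, and the parameter count
is derived from the CAPA examples in this part (the bytes after CC in the
header word, plus 8 per additional word, less the PL padding bytes).

```python
import struct

RTYP_CAPA = 43  # code for CAPA in Appendix A3


def decode_capa(rec: bytes):
    """Return (capability code, parameter bytes) from one CAPA record."""
    rtyp, pl, rl = struct.unpack(">BBH", rec[:4])
    if rtyp != RTYP_CAPA:
        raise ValueError("not a CAPA record")
    if len(rec) != 8 * (rl + 1):
        raise ValueError("record length does not match RL")
    n_params = 8 * rl + 3 - pl  # bytes after CC, excluding padding
    return rec[4], rec[5:5 + n_params]
```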
===> [LADR] Logical-Addresses Record [PAD] (e.g., 2 logical addresses
and a range of logical addresses):
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "LADR" |  PL=4  |      RL=2       |  AT=1  |1010 Logical-Address-#1   |LogAdr
+--------+--------+--------+--------+--------+--------+--------+--------+
|  AT=2  |1010 Min-Logical-Address  |  AT=3  |1010 Max-Logical-Address  |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  AT=1  |1010 Logical-Address-#2   |  xxx   |  xxx   |  xxx   |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
Whereas an ADDR-record defines only one PktWay-address (or one range),
an LADR record may specify multiple addresses (each with AT=1) and
multiple ranges (each with a pair of AT=2,3 or AT=4,5).
===> [SRQR] Source-Route Record [PBD], with Q for that route.
(e.g., a combined SR with 13 bytes and an SR with 4 bytes)
This record carries one, or more, L2RHs (2 in the following example).
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "SRQR" |  PL=2  |      RL=3       |  xxx   |  xxx   |        Q        |SR+Q
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=13B|  SR01  |  SR02  |  SR03  |  SR04  |  SR05  |  SR06  |L2RH#1
+--------+--------+--------+--------+--------+--------+--------+--------+
|  SR07  |  SR08  |  SR09  |  SR10  |  SR11  |  SR12  |  SR13  |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=4B |  SR01  |  SR02  |  SR03  |  SR04  |  xxx   |  xxx   |L2RH#2
+--------+--------+--------+--------+--------+--------+--------+--------+
Q (the Route Quality) is an unsigned 16-bit integer. The units are not
defined here. It is assumed that it is monotonic with all-0 being the
best and all-1 the worst. If there is an MTUR (MTU-record) for that SR
it should follow this SRQR record. However, the RL of this SRQR does
not include the RL of the MTUR.
===> [MTUR] MTU record [PBD]:
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
| "MTUR" |  PL=0  |      RL=0       |         MTU (in 8B-words)         |MTU
+--------+--------+--------+--------+--------+--------+--------+--------+
The MTU record provides the MTU for the SR defined before (by an SRQR).
The value of 0 means indefinite MTU (i.e., any length is OK).
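A node holding several (SRQR, MTUR) pairs for one destination has to pick
one. The selection rule is left open by this document; the Python sketch
below shows just one possibility (lowest Q among the routes whose MTU is
large enough), with the tuple layout and function name being assumptions
made for illustration.

```python
from typing import List, Optional, Tuple

# (source route bytes, Q, MTU in 8B-words; MTU 0 means indefinite)
Route = Tuple[bytes, int, int]


def pick_route(routes: List[Route], needed_words: int) -> Optional[Route]:
    """Pick the best-quality route that can carry `needed_words` 8B-words."""
    usable = [r for r in routes if r[2] == 0 or r[2] >= needed_words]
    if not usable:
        return None
    # Q is monotonic, with all-0 the best, so the smallest Q wins.
    return min(usable, key=lambda r: r[1])
```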
RRP MESSAGE EXAMPLES
+-------------------
Node-S asks HR1 to provide an L2RH to node-X:
==> [GVL2] Please give me L2-routes from you to node-X
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |       HR1-Address        |     "GVL2"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0|  RZ  |0       S-Address         |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |        X-Address         |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
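The [GVL2] message above can be assembled byte by byte. The Python sketch
below is a rough illustration only: the bit positions inside the two PH
words are read off the diagram, the P and RZ fields are simply set to
zero, and the function names are invented here. The codes (PT=1 for RRP,
TE=21 for GVL2, RTyp=41 for ADDR) are those of the Enumeration Appendix.

```python
import struct

PT_RRP = 1      # Appendix A1
TE_GVL2 = 21    # Appendix A2
RTYP_ADDR = 41  # Appendix A3


def addr_record(address: int) -> bytes:
    """Single-address ADDR record: PL=0, RL=0, AT=1, 3-byte address."""
    return struct.pack(">BBHB", RTYP_ADDR, 0, 0, 1) + address.to_bytes(3, "big")


def build_gvl2(hr: int, src: int, target: int) -> bytes:
    """PH1 + PH2 + one ADDR record + all-zero TAIL (P and RZ assumed 0)."""
    db = addr_record(target)
    ph1 = bytes([0]) + hr.to_bytes(3, "big") + struct.pack(">HH", TE_GVL2, PT_RRP)
    # First byte holds E=0 and PL=0; Data-Length is counted in 8B-words.
    ph2 = bytes([0]) + (len(db) // 8).to_bytes(3, "big") + bytes([0]) + src.to_bytes(3, "big")
    tail = bytes(8)
    return ph1 + ph2 + db + tail
```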
==> [L2SR] HR1 replies with two L2-routes to node-X with Qs and MTUs
(e.g., an SR of 2 L2RHs (of 5+4 bytes), and an SR of one L2RH of 3 bytes)
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |     "L2SR"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=8 (8B-words) |0|  RZ  |0      HR1-Address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=7       |  AT=1  |        X-Address         |Addr
+--------+--------+--------+--------+--------+--------+--------+--------+
| "SRQR" |  PL=2  |      RL=2       |  xxx   |  xxx   |        Q        |SR+Q
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=5B |  SR01  |  SR02  |  SR03  |  SR04  |  SR05  |  xxx   |L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=4B |  SR01  |  SR02  |  SR03  |  SR04  |  xxx   |  xxx   |L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
| "MTUR" |  PL=0  |      RL=0       |         MTU (in 8B-words)         |MTU
+--------+--------+--------+--------+--------+--------+--------+--------+
| "SRQR" |  PL=2  |      RL=1       |  xxx   |  xxx   |        Q        |SR+Q
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=3B |  SR01  |  SR02  |  SR03  |  xxx   |  xxx   |  xxx   |L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
| "MTUR" |  PL=0  |      RL=0       |         MTU (in 8B-words)         |MTU
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
==> [RDRC] HR1 redirects Node-S to use HR2 for node-X
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |     "RDRC"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=2 (8B-words) |0|  RZ  |0      HR1-Address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |        X-Address         |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |       HR2-Address        |via-HR
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
==> [TELL] Please tell me about Node-X (address | name | capabilities)
This message may have any of the following 3 forms:
If by PacketWay-address:
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |       HR1-Address        |     "TELL"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0|  RZ  |0       S-Address         |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |        X-Address         |Addr
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
If by name (e.g., a name with 9 characters: A1...A9):
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |       HR1-Address        |     "TELL"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=2 (8B-words) |0|  RZ  |0       S-Address         |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "NAME" |  PL=3  |      RL=1       |   A1   |   A2   |   A3   |   A4   |Name
+--------+--------+--------+--------+--------+--------+--------+--------+
|   A5   |   A6   |   A7   |   A8   |   A9   |  xxx   |  xxx   |  xxx   |
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
If by capabilities (e.g., 2 capabilities, C1 with 2 parameter bytes,
and C2 with no parameter bytes):
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |       HR1-Address        |     "TELL"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=2 (8B-words) |0|  RZ  |0       S-Address         |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "CAPA" |  PL=1  |      RL=0       | CC=C1  |   P1   |   P2   |  xxx   |cap
+--------+--------+--------+--------+--------+--------+--------+--------+
| "CAPA" |  PL=3  |      RL=0       | CC=C2  |  xxx   |  xxx   |  xxx   |cap
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
A "TELL" may specify several nodes, by addresses, names, and
capabilities. Any node that matches any of the specifications will be
included in the reply.
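The "implied OR" of [TELL] can be expressed as a filter over the nodes an
HR knows about. The Python sketch below is illustrative only; the Node
class, its fields, and the function name are assumptions of this sketch
and say nothing about how an HR actually stores its tables.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Node:
    address: int
    name: str = ""
    capabilities: FrozenSet[int] = frozenset()


def tell_select(nodes: Iterable[Node], addresses=(), names=(),
                capabilities=()) -> List[Node]:
    """Return every node matching ANY requested address, name, or capability."""
    wanted_caps = set(capabilities)
    return [n for n in nodes
            if n.address in addresses
            or (n.name != "" and n.name in names)
            or bool(wanted_caps & n.capabilities)]
```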
==> [INFO] Info about Node-X (address, name, capabilities) e.g., a name
with 9 characters (A1...A9) and 3 capabilities (Cx, Cy, and Cz):
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |     "INFO"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=7 (8B-words) |0|  RZ  |0      HR1-Address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=6       |  AT=1  |        X-Address         | *
+--------+--------+--------+--------+--------+--------+--------+--------+ *
| "NAME" |  PL=3  |      RL=1       |   A1   |   A2   |   A3   |   A4   | *
+--------+--------+--------+--------+--------+--------+--------+--------+ *
|   A5   |   A6   |   A7   |   A8   |   A9   |  xxx   |  xxx   |  xxx   | *
+--------+--------+--------+--------+--------+--------+--------+--------+ *
| "CAPA" |  PL=1  |      RL=0       | CC=Cx  |   P1   |   P2   |  xxx   | *
+--------+--------+--------+--------+--------+--------+--------+--------+ *
| "CAPA" |  PL=3  |      RL=0       | CC=Cy  |  xxx   |  xxx   |  xxx   | *
+--------+--------+--------+--------+--------+--------+--------+--------+ *
| "CAPA" |  PL=5  |      RL=1       | CC=Cz  |   P1   |   P2   |   P3   | *
+--------+--------+--------+--------+--------+--------+--------+--------+ *
|   P4   |   P5   |   P6   |  xxx   |  xxx   |  xxx   |  xxx   |  xxx   | *
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
The INFO records aggregate all the nodes that meet any of the attributes
specified in the TELL record. When such aggregation is used, the DL
(data length) in the PH is the sum, over all the ADDR records, of RL+1
(each ADDR record's RL plus its own header word).
(*) The ADDR, NAME, and CAPA records are repeated for each applicable node.
Same also for LADR, SRQR, and MTUR, if any.
If several capabilities are specified in [TELL], any node that has any of
these capabilities should be reported in [INFO].
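Because the RL of each ADDR record covers the records that describe the
same node, a receiver can split an aggregated [INFO] data block into
per-node groups without understanding the nested records. The Python
sketch below is illustrative only (the function name is an assumption),
and it relies on the examples in this part, in which RL counts the words
that follow the ADDR header word.

```python
import struct

RTYP_ADDR = 41  # Appendix A3


def split_info_groups(db: bytes):
    """Split an INFO data block into one bytes object per ADDR group."""
    if len(db) % 8:
        raise ValueError("data block must be whole 8-byte words")
    groups = []
    off = 0
    while off < len(db):
        rtyp, _pl, rl = struct.unpack(">BBH", db[off:off + 4])
        if rtyp != RTYP_ADDR:
            raise ValueError("expected an ADDR record")
        end = off + 8 * (rl + 1)  # header word plus RL nested words
        if end > len(db):
            raise ValueError("truncated ADDR group")
        groups.append(db[off:end])
        off = end
    return groups
```

The total length of the groups, in 8B-words, is what the PH carries as the
data length.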
==> [HRTO] Node-S asks HR1 which HR to use for Node-X.
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |       HR1-Address        |     "HRTO"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0|  RZ  |0       S-Address         |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |        X-Address         |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
==> [WRU?] Who/what-Are-You?
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |01111111|11111111|11111110|     "WRU?"      |     "R R P"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=0 (8B-words) |0|  RZ  |0       S-Address         |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
This is addressed to 0x7FFFFE, the "Hey-You" address.
==> [ERR/UNK] Destination Unknown (address). HR1 tells Node-S that
it does not know about Node-X.
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |       UNK       |     "E R R"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0|  RZ  |0      HR1-Address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |        X-Address         |Addr
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
==> [ERR/HRDOWN] HR Down (2 addresses). HR1 tells Node-S that HR-X is down
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |    "HRDOWN"     |     "E R R"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=2 (8B-words) |0|  RZ  |0      HR1-Address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |      HRX-Address-1       |Addr
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |      HRX-Address-2       |Addr
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
HR1 knows 2 addresses of the downed router.
==> [ERR/LINKDOWN] Link Down (2 addresses)
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |   "LINKDOWN"    |     "E R R"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=2 (8B-words) |0|  RZ  |0      HR1-Address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |          A-Addr          |
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" |  PL=0  |      RL=0       |  AT=1  |          B-Addr          |
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
This message reports that the link between A-Addr and B-Addr is down.
==> [ERR/GENERAL] General error: HR1 tells node-S that it (HR1) could
not handle the enclosed message
     0        1        2        3        4        5        6        7
+--------+--------+--------+--------+--------+--------+--------+--------+
|00  P   |        S-Address         |     GENERAL     |     "E R R"     |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=? (8B-words) |0|  RZ  |0      HR1-address        |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|                                                                       |Data
|<------The entire message that could not be handled by the sender----->|Data
|                                                                       |Data
+--------+--------+--------+--------+--------+--------+--------+--------+
|      64 zero bits, unless any error was indicated along the path      |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
This message reports that the enclosed message could not be handled by
its receiver (the sender of this error message).
Appendix-A: Enumerations
------------------------
(A1) PacketWay Packet Types
+--------------------------
The EEP header reserves 4 bytes for signaling from the source node
directly to the destination node. They are the PACKET TYPE (PT),
and the TYPE EXTENSION (TE), 2 bytes each.
This list defines values for the PACKET-TYPE (PT) 2B-field. Each
packet-type has its own interpretation of the TE and the h-fields.
2B-Code Packet Type
+---------- ----------------------
0 Illegal
1 RRP
2 Embedded PacketWay Packet
3 MEM-READ
4 MEM-WRITE
Higher level protocols:
21 IP
22 SNMP
23 ATM
Link layer Protocols
50 Ethernet (E10)
51 Ethernet (E100)
52 Ethernet (E1000)
53 Myrinet
54 Fibre Channel
55 RACEway
56 SCI
57 VME
Application level protocols:
81 MPI
82 PVM
Secure Protocols
121 Secure (1)
122 Secure (2)
123 Secure (3)
1,024-2,047 User defined
65,535 ERR (for Error)
More values will be assigned. "Ether-types" should be added with
a pointer to those used by the Internet.
(A2) RRP Messages (Type Extensions of PT="RRP")
+----------------------------------------------
RRP-
Type     Code  Description
+------  ----  ----------------------------------------------------
            0  Illegal
GVL2       21  Please give me L2-routes from you to node (address)
L2SR       22  Here are L2-routes to node (address)
RDRC       23  Re-direct to node (address) via a neighbor HR (address)
TELL       24  Please tell about node (address | name | capabilities)
INFO       25  Info about node (address, name, capabilities)
HRTO       26  Which HR should I use for node (address)
WRU?       27  Who/what-Are-You?
GVRT       28  Please give me your RTs
RTBL       29  Here is an RT
Throughout this document the RRP messages are indicated by their
type (e.g., RDRC for re-direct). In actual messages the code is used
(e.g., 23 for RDRC).
(A3) RRP records
+---------------
RTyp     Code  Description
+------  ----  ----------------------------------------------------
            0  Illegal
ADDR       41  Address record for one or many nodes
NAME       42  Node Name record
CAPA       43  Node Capability record
LADR       44  Node Logical Addresses record
SRQR       45  Source Route record and its Quality (SR, Q)
MTUR       46  MTU record (for the previous SRQR)
Throughout this document the RRP records are indicated by their RTyp (e.g.,
ADDR for address). In actual messages the code is used (e.g., 41 for ADDR).
(A4) Error Messages
+------------------
Subtype    Code  Description
---------  ----  ----------------------------------------------
              0  Illegal
UNK          71  Unknown (address)
HRDOWN       72  HR-Down (and the links associated with it)
LINKDOWN     73  Link-Down (between two HRs)
GENERAL      74  General error message
Throughout this document the error messages are indicated by their
subtype (e.g., LINKDOWN for Link-Down). In actual messages the code
is used (e.g., 73 for LINKDOWN).
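The code tables (A1), (A2), (A3), and (A4) map naturally onto enumerations
in an implementation. The Python sketch below is one illustrative way to
hold them; the class and function names are assumptions of this sketch,
and "WRU?" is spelled WRU because "?" is not valid in a Python identifier.

```python
from enum import IntEnum

PT_RRP = 1      # Packet Type for RRP (A1)
PT_ERR = 65535  # Packet Type for errors (A1)


class RrpType(IntEnum):      # (A2), carried in TE when PT is RRP
    GVL2 = 21
    L2SR = 22
    RDRC = 23
    TELL = 24
    INFO = 25
    HRTO = 26
    WRU = 27
    GVRT = 28
    RTBL = 29


class RecordType(IntEnum):   # (A3), the RTyp byte of each record
    ADDR = 41
    NAME = 42
    CAPA = 43
    LADR = 44
    SRQR = 45
    MTUR = 46


class ErrorType(IntEnum):    # (A4), carried in TE when PT is ERR
    UNK = 71
    HRDOWN = 72
    LINKDOWN = 73
    GENERAL = 74


def describe(pt: int, te: int) -> str:
    """Render a (PT, TE) pair in the bracket notation used in this document."""
    if pt == PT_RRP:
        return "RRP/" + RrpType(te).name
    if pt == PT_ERR:
        return "ERR/" + ErrorType(te).name
    raise ValueError("not an RRP or error packet type")
```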
(A5) PacketWay Node Capabilities
+-------------------------------
Code  Capability                   Parameters
+---  ---------------------------  --------------------------------------
   0  Illegal
   1  GP Computing Node
   2  Router                       SAN-IDs, 1+3 Bytes each
   3  PacketWay Server
   4  Network Multicast Server
   5  NFS
   6  NPS (Paging Server)
   7  Floating-point DSP           IEEE word-sizes (in bytes), 1B per size
   8  Fixed-point DSP              word-sizes (in bytes), 1B per size
   9  Printer
 253  Secure PacketWay HR
 254  Multicast agent for its SAN
 255  SAN
(A6) Optional Header Fields Types (OH)
+-------------------------------------
The MSbit of the type field (T) is the type-of-type. Its assignment is:
0b0: Optional (may drop this OH if its type, tttttt, is unknown)
0b1: Mandatory (should not process this OH if its type is unknown)
The next bit is the "Completion bit" (C). Its assignment is:
0b0: More options follow
0b1: This is the last option field
The 6 LSbits, tttttt, are the type field. Their assignment is:
0x00: Illegal
0x01: TBD
0x02: CRC32 here
0x03: CRC32 following in the OT (after the DB)
0x04: CRC64 here
0x05: CRC64 following in the OT (after the DB)
0x06: There is an OT (Optional Trailer)
0x07-0x3D: TBD
0x3E: Cryptographic data
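The three parts of the OH type byte can be separated with simple masks.
The Python sketch below is illustrative only; the function name and the
dictionary keys are assumptions of this sketch.

```python
def decode_oh_type(t: int) -> dict:
    """Split an Optional Header type byte into its three fields."""
    if not 0 <= t <= 0xFF:
        raise ValueError("type field is a single byte")
    return {
        "mandatory": bool(t & 0x80),  # MSbit: 0 = optional, 1 = mandatory
        "last": bool(t & 0x40),       # completion bit: 1 = last option field
        "type": t & 0x3F,             # 6 LSbits, tttttt
    }
```

For instance, 0xC2 decodes as a mandatory, final option of type 0x02
(CRC32 here).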
(A7) Byte Order (Endianness)
+---------------------------
A 4-bit field (E) is used to indicate Endianness, with e being its first
bit (MSbit).
e=0: Big-Endian order
e=1: Little-Endian order
The 3 LSbits indicate the size of data chunks (must be the same for the
entire data block) to allow hardware swapping
e000: don't swap, it's 8-bit data
e001: swap as if all the data is 16-bit words
e010: swap as if all the data is 32-bit words
e011: swap as if all the data is 64-bit words
e100: swap as if all the data is 128-bit words
e101: illegal and reserved for future use
e110: illegal and reserved for future use
e111: illegal and reserved for future use
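A receiver whose byte order differs from the sender's could use the E
field as follows. The Python sketch below is illustrative only: the
function names are assumptions, and swapping is shown in software although
the field is meant to allow hardware swapping.

```python
def decode_e(e: int):
    """Return (is_little_endian, unit size in bytes) for a 4-bit E field."""
    if not 0 <= e <= 0xF:
        raise ValueError("E is a 4-bit field")
    size_code = e & 0x7
    if size_code > 4:
        raise ValueError("illegal/reserved size code")
    # Codes 0..4 stand for 8-, 16-, 32-, 64-, and 128-bit units.
    return bool(e & 0x8), 1 << size_code


def swap_units(data: bytes, unit: int) -> bytes:
    """Reverse the byte order inside each `unit`-byte chunk of data."""
    if len(data) % unit:
        raise ValueError("data is not a whole number of units")
    return b"".join(data[i:i + unit][::-1] for i in range(0, len(data), unit))
```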
(A8) Symbol Types (ST)
+---------------------
The format of Symbols is:
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|1011ssss|ssssssss|ssssssss| Length |  data  |........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+
              <---- Symbol Type --->
Code Symbol Type
-------- -----------------
0x00000: Reserved
0x00001: Multicast
0x00002: SCID
Appendix-B: Example of the use of RRP (over Myrinets)
-----------------------------------------------------
In this example Node1 on SAN1 (with MTU=16KB) is looking for an
automatic spectral analyzer (CC=X). It uses TELL (s1) to ask its
default router (RTRA1, the half of RouterA connected to SAN1) which
nodes have this capability. RTRA1 knows of no such node and says so
in an ERROR/UNK reply (s2). (The extent to which RTRA1 checks with
others before sending this reply is not specified here.)
Failing to find such an analyzer, Node1 looks for a DSP that handles
IEEE floating-point 64-bit data. (s3) Node1 asks RTRA1 to provide the
list of floating-point DSPs that can handle 64-bit IEEE data. (s4) RTRA1
provides the addresses of both Node2 and Node3. For its own reasons
Node1 decides to use Node2. (s5) Node1 asks RTRA1 which router to use
for Node2. (s6) RTRA1 suggests using RouterB. (s7) Node1 uses
L3-forwarding, via RouterB, to verify Node2's capabilities, by asking
Node2 for information about itself. (s8) Node2 provides this
information, which satisfies Node1. (s9) Node1 asks RouterB for L2RH(s)
to Node2. (s10) RouterB provides the requested L2RH with its MTU of 1,024
8B-words (8KB). Finally, (s11) Node1 starts sending data to Node2 using
L2-forwarding. Similarly, Node2 may ask its default router which HR to
use for Node1 and for L2RH(s) to Node1.
If Node1 had only a Level-A implementation, it would have the
combined L2RH from itself to RouterB and from there to Node2 pre-wired,
saving all this message exchange.
+-------+ +--0--+ SAN1 +--0--+ +--0--+
| Node1 +----------3 SW0 1----------3 SW1 1----------3 SW2 1 MTU=16KB
+-------+ +--2--+ +--2--+ +--2--+
| |
RTRA1 *********** +---+---+ *********** RTRB1
* RouterA * | Node2 | * RouterB *
RTRA3 *********** +---+---+ *********** RTRB2
| | |
+-------+ SAN3 +--0--+ +--0--+ SAN2 +--0--+
| Node3 +----------3 SW3 1 3 SW4 1----------3 SW5 1 MTU=8KB
+-------+ +--2--+ +--2--+ +--2--+
The sequence of messages is shown below.
(s1) Node1 sends a [TELL] message asking its default router (RTRA1) to
provide a list of nodes with the capability code X (CC=X). Node1 knows
that RTRA1 is on its network, with SR={2,PW}={2,3,0}, where PW=0x0300 is
the 16-bit Myrinet-type assigned to PacketWay. Myrinet is described
here with absolute addresses.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node1 to RouterA1 ----> |
| It may be any number of bytes. In this example it is 3 bytes: {2,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | RTRA1 | "TELL" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0| RZ |0 Node1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "CAPA" | PL=3 | RL=0 | CC=X | xxx | xxx | xxx |Spect
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
This asks for information about nodes with capability-X.
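The local L2 routing bytes used in this appendix, such as {2,PW}={2,3,0}, can be reproduced with a small sketch. It follows the simplified absolute port numbers of this example (not the actual Myrinet route encoding), and it assumes that the 16-bit packet type is sent MSbyte first, as the expansion {2,3,0} suggests. The function name is illustrative.

```python
PW = 0x0300  # the 16-bit Myrinet packet type assigned to PacketWay


def local_route(ports, packet_type=PW):
    """Return the switch port bytes followed by the 2-byte packet type."""
    return bytes(ports) + packet_type.to_bytes(2, "big")
```

Here local_route([2]) gives the 3 bytes {2,3,0} used from Node1 to RTRA1, and local_route([1, 1, 2]) gives the 5 bytes {1,1,2,PW} used later from Node1 to RTRB1.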
(s2) RTRA1 uses [ERR/UNK] to tell Node1 that no such node is known to RTRA1.
0 1 2 3 4 5 6 7
+--------+--------+--------+--------+--------+--------+--------+--------+
| <---- The L2-header needed to get from RouterA1 to Node1 ----> |
| It may be any number of bytes. In this example it is 3 bytes: {3,PW} |
+---+----+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node1 | UNK | "E R R" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0| RZ |0 RTRA1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "CAPA" | PL=3 | RL=0 | CC=X | xxx | xxx | xxx |Spect
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
(s3) Node1 sends another [TELL] message to RTRA1 asking for a list of
floating-point DSPs that handle 64bit IEEE data (CC=7,8).
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node1 to RouterA1 ----> |
| It may be any number of bytes. In this example it is 3 bytes: {2,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | RTRA1 | "TELL" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0| RZ |0 Node1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "CAPA" | PL=2 | RL=0 | CC=7 | 8 | xxx | xxx |64-DSP
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
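The single-word CAPA records shown in (s1) and (s3) can be sketched as follows. The layout (a 2-byte record type, then PL, RL, CC, and up to three parameter bytes) is read off the figures above. The numeric value of the "CAPA" record type is not given in this section, so RTYP_CAPA below is a placeholder, and zero is assumed for the padding bytes shown as "xxx".

```python
RTYP_CAPA = b"\x00\x00"  # placeholder only: the real "CAPA" code is not shown here


def build_capa(cc, params=()):
    """Build a one-word (8-byte) CAPA record with RL=0."""
    if len(params) > 3:
        raise ValueError("this sketch only handles records that fit in one word")
    pl = 3 - len(params)                 # padding length, in bytes
    return RTYP_CAPA + bytes([pl, 0, cc]) + bytes(params) + bytes(pl)
```

For the request in (s3), build_capa(7, [8]) yields PL=2, RL=0, CC=7, the word size 8, and two padding bytes.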
(s4) RTRA1 uses [INFO] to provide the addresses and capabilities of
both Node2 and Node3 (the former only 64 bits, the latter both 32 and 64).
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from RouterA1 to Node1 ----> |
| It may be any number of bytes. In this example it is 3 bytes: {3,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node1 | "INFO" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=4 (8B-words) |0| RZ |0 RTRA1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=1 | AT=1 | Node2 |adr2
+--------+--------+--------+--------+--------+--------+--------+--------+
| "CAPA" | PL=2 | RL=0 | CC=7 | 8 | xxx | xxx |FP-DSP
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" | PL=0 | RL=1 | AT=1 | Node3 |adr3
+--------+--------+--------+--------+--------+--------+--------+--------+
| "CAPA" | PL=1 | RL=0 | CC=7 | 4 | 8 | xxx |FP-DSP
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
It is possible that Node2 and Node3 are specified by addresses that are
not physical addresses.
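The way RL groups the records of the [INFO] message above can be sketched as a walk over the data block. This assumes, as the figures suggest, that RL is the fourth byte of a record's first word and counts the 8B-words that follow that word and belong to the record. The function name is illustrative.

```python
def top_level_records(data_block):
    """Return (word_index, RL) for each top-level record in a data block."""
    if len(data_block) % 8:
        raise ValueError("the data block must be a whole number of 8B-words")
    n_words = len(data_block) // 8
    records = []
    i = 0
    while i < n_words:
        rl = data_block[i * 8 + 3]       # RL byte of the record's first word
        if i + 1 + rl > n_words:
            raise ValueError("RL runs past the end of the data block")
        records.append((i, rl))
        i += 1 + rl                      # skip the words nested in this record
    return records
```

Applied to the four data words of (s4), the walk finds two top-level ADDR records (at words 0 and 2), each followed by its nested CAPA record.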
(s5) For its own reasons Node1 decides to use Node2 and sends [HRTO] to
ask RTRA1 which HR to use for Node2.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node1 to RouterA1 ----> |
| It may be any number of bytes. In this example it is 3 bytes: {2,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | RTRA1 | "HRTO" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0| RZ |0 Node1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=0 | AT=1 | Node2 |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
(s6) RTRA1 uses [RDRC] to re-direct to Node2 via RouterB.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from RouterA1 to Node1 ----> |
| It may be any number of bytes. In this example it is 3 bytes: {3,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node1 | "RDRC" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=2 (8B-words) |0| RZ |0 RTRA1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=0 | AT=1 | Node2 |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
| "ADDR" | PL=0 | RL=0 | AT=1 | RTRB1 |via-HR
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
Node1 knows how to get to RouterB over its SAN.
(s7) Node1 uses [TELL] (still using L3-forwarding via RouterB) to verify
Node2's capabilities, by asking Node2 for information about itself.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node1 to RouterB1 ----> |
| It may be any number of bytes. Here it is 5 bytes: {1,1,2,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node2 | "TELL" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0| RZ |0 Node1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=0 | AT=1 | Node2 |Addr
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
(s8) Node2 uses [INFO] (via RouterB2, also using L3-forwarding) to give
Node1 more information about itself than RTRA1 did.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node2 to RouterB2 ----> |
| It may be any number of bytes. Here it is 4 bytes: {1,0,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node1 | "INFO" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=5 (8B-words) |0| RZ |0 Node2 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=4 | AT=1 | Node2 |
+--------+--------+--------+--------+--------+--------+--------+--------+
| "NAME" | PL=7 | RL=1 | "S" | "u" | "p" | "e" |
+--------+--------+--------+--------+--------+--------+--------+--------+
| "r" | xxx | xxx | xxx | xxx | xxx | xxx | xxx |
+--------+--------+--------+--------+--------+--------+--------+--------+
| "CAPA" | PL=1 | RL=0 | CC=7 | 4 | 8 | xxx |FP-DSP
+--------+--------+--------+--------+--------+--------+--------+--------+
| "CAPA" | PL=3 | RL=0 | CC=5 | xxx | xxx | xxx |NFS
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
Node2 provided more information about itself than RTRA1 did: its name,
"Super", its ability to handle 32-bit IEEE floating point (in addition
to 64-bit), and its being an NFS (CC=5).
(s9) Node1 uses [GVL2] to ask RouterB for L2RH(s) from RouterB to Node2.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node1 to RouterB1 ----> |
| It may be any number of bytes. Here it is 5 bytes: {1,1,2,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | RTRB1 | "GVL2" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=1 (8B-words) |0| RZ |0 Node1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=0 | AT=1 | Node2 |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
(s10) RouterB uses [L2SR] to provide Node1 with an L2RH from RTRB2 to
Node2, with its Q and MTU. Here it is {3,0,PW} from RouterB to Node2.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from RouterB1 to Node1 ----> |
| It may be any number of bytes. Here it is 5 bytes: {3,3,3,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node1 | "L2SR" | "R R P" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=0|PL=0| Data-Length=4 (8B-words) |0| RZ |0 RTRB1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| "ADDR" | PL=0 | RL=3 | AT=1 | Node2 |Dest
+--------+--------+--------+--------+--------+--------+--------+--------+
| "SRQR" | PL=2 | RL=1 | xxx | xxx | Q |SR+Q
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=4B | 3 | 0 | 3 | 0 | xxx | xxx |L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
| "MTUR" | PL=1 | RL=0 | MTU=1,024 (in 8B-words) |MTU
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
The MTU in the MTUR above is the lesser of the MTUs of both networks.
The RL (record-length) of the last MTUR-record is included both in the
RL of the preceding SRQR-record and in the RL of the preceding
ADDR-record (since the RL of the SRQR is included in the RL of the ADDR).
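The MTU value carried in the MTUR record of (s10) can be checked with a short calculation. The sketch below assumes that "KB" means 1,024 bytes and that MTUs are whole numbers of 8B-words; the function names are illustrative.

```python
def mtu_in_words(mtu_bytes):
    """Convert an MTU in bytes to 8B-words."""
    if mtu_bytes % 8:
        raise ValueError("the MTU is assumed to be a multiple of 8 bytes")
    return mtu_bytes // 8


def path_mtu_words(*mtus_bytes):
    """The usable MTU of a path is the smallest MTU of the networks on it."""
    return min(mtu_in_words(m) for m in mtus_bytes)
```

For SAN1 (16KB) and SAN2 (8KB), path_mtu_words(16 * 1024, 8 * 1024) gives the 1,024 8B-words reported by RouterB.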
(s11) Finally, Node1 starts sending data to Node2 using L2-forwarding.
0 1 2 3 4 5 6 7
+-----------------------------------------------------------------------+
| <---- The L2-header needed to get from Node1 to RouterB1 ----> |
| It may be any number of bytes. Here it is 5 bytes: {1,1,2,PW} |
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|11 L=4B | 3 | 0 | 3 | 0 | xxx | xxx |L2RH
+--------+--------+--------+--------+--------+--------+--------+--------+
|00 P | Node2 |Sensor.SubType=? | "Sensor" |PH1
+---+----+--------+--------+--------+-+------+--------+--------+--------+
|E=3|PL=0| Data-Length=? (8B-words) |0| RZ |0 Node1 |PH2
+---+----+--------+--------+--------+-+------+--------+--------+--------+
| |Data
| <------------------- The sensor data goes here ---------------------> |....
| |Data
+--------+--------+--------+--------+--------+--------+--------+--------+
| 64 zero bits, unless any error was indicated along the path |TAIL
+--------+--------+--------+--------+--------+--------+--------+--------+
E=3 (0b0011) indicates that all the data is 64-bit, in Big Endian order.
Again, if Node1 had only a Level-A implementation, it would have
pre-wired the combined L2RH from itself to RouterB and from there to
Node2, saving all this message exchange.
All the messages shown in this appendix start with local L2 routing
bytes needed to get across either SAN1 or SAN2 (indicated with "The
L2-header needed to get from ... to ...") which are not L2RHs. The
difference is that these bytes are in front of the packet, exposed to
the local switches, whereas the L2RHs are only exposed to
PacketWay-entities.
These local L2 routing bytes are the actual bytes required by the SANs
and are likely to be consumed as the message traverses the SAN, unlike
the L2RHs, which stay intact until converted to actual routing bytes.
The L2RHs start with 0b0000000011 followed by the number of routing
bytes in that L2RH, and possibly also by several bytes of padding.
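The L2RH word shown in (s10) and (s11) can be reproduced with the sketch below. It assumes that the byte count L occupies the 6 bits that follow the "11" marker in the second byte, that padding bytes are zero, and that the whole L2RH fits in one 8B-word (at most 6 routing bytes). Longer L2RHs are not covered, and the function name is illustrative.

```python
def build_l2rh(route, version=0):
    """Build a one-word L2RH: vv000000, 11+L, routing bytes, padding."""
    if not 0 <= version <= 3:
        raise ValueError("vv is a 2-bit field")
    n = len(route)
    if not 1 <= n <= 6:
        raise ValueError("this sketch only builds single-word L2RHs")
    padding = 6 - n                       # fill the rest of the 8B-word
    return bytes([version << 6, 0xC0 | n]) + bytes(route) + bytes(padding)
```

For the route {3,0,PW} from RouterB to Node2, build_l2rh([3, 0, 3, 0]) yields the bytes 0x00, 0xC4, 3, 0, 3, 0 followed by two padding bytes.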
Appendix-C: Routing Tables (RTs)
--------------------------------
Using only Levels A, B, and C of PktWay does not require coordination
of how routing tables (local and external/remote) are structured.
However, it is anticipated that dynamic inter-SAN routing, mapping, and
resource discovery will be added to PktWay in the future. This appendix
discusses a recommended structure for routing tables. Implementors
working only from the Level A-C document should not adopt an arbitrary
internal representation for their routing tables that would hamper
future interoperability. Instead, it is expected that keeping in mind
the future need for common routing-table structures will lead to
structures that interoperate easily once the PktWay specification is
extended to include dynamic inter-SAN routing, mapping, and resource
discovery.
For that purpose this section includes a discussion of routing tables
which is NOT a part of this specification.
Routing tables provide the information needed for finding SRs to
destinations specified by their addresses. In future levels of the
PktWay specification the RTs will provide means to identify nodes also
by names and/or capabilities.
The RTs are based on "maps" for SANs prepared by "mappers", local
nodes on the SANs with maps (i.e., routing tables, LRTs) of their SANs,
obtained dynamically or statically. The inter-SAN routing process
depends on the exchange of these maps to form local and remote RTs.
The attributes of an RT are:
SN Serial Number of this RT (by RCVF)
SAN-ID ID of the SAN which this RT describes
RCVF List of Received-From physical addresses or SAN-IDs (history)
CSR+Q Common Source Route for the entire RT and its Q
MinMTU Min MTU for this RT (along the above CSR)
Local-RT Node-Structures, for nodes on the SAN specified by this RT
The Local-RT has one or more Node-Structures for each node on the SAN
specified by this RT.
These Node-Structures are of the form:
Address Physical address on this SAN
[Name] Optional
[Capabilities] Optional
[Logical-Address(es)] Optional: The LA(s) to which it listens
SR From the mapper to the node specified by
this structure
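The RT attributes and the Node-Structure listed above could be held in memory as in the following sketch. Like this appendix, it is not part of the specification: the field names and types are illustrative choices, and only the attribute list itself comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeStructure:
    address: int                      # physical address on this SAN
    sr: bytes                         # SR from the mapper to this node
    q: int = 0                        # quality of that SR
    name: Optional[str] = None        # optional
    capabilities: List[int] = field(default_factory=list)       # optional
    logical_addresses: List[int] = field(default_factory=list)  # optional


@dataclass
class RoutingTable:
    sn: int                           # serial number of this RT
    san_id: int                       # ID of the SAN this RT describes
    rcvf: List[int]                   # Received-From history
    csr: bytes                        # common source route for the whole RT
    csr_q: int                        # quality of the CSR
    min_mtu: int                      # minimal MTU along the CSR
    local_rt: List[NodeStructure] = field(default_factory=list)
```

A node with only the mandatory attributes is then NodeStructure(address=0x000102, sr=bytes([3, 0])); the optional attributes default to empty.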
Each SR entry (and the CSR, too) contains Q, the quality of the SR, an
unsigned 16-bit integer. The units are not defined yet. It is assumed
that Q is monotonic (sort of analogous to latency, hence additive) with
all-0 being the best and all-1 the worst.
Until otherwise defined, let Q be an unsigned integer in microseconds.
When it is updated, its value should be clipped to the maximum value
(65,535 microseconds, about 65 msec).
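The additive update of Q with clipping can be sketched as follows. The function name is illustrative; the 16-bit limit comes from the definition of Q as an unsigned 16-bit integer.

```python
Q_MAX = 0xFFFF  # all-1, the worst quality a 16-bit Q can express


def add_q(q1, q2):
    """Add two Q values, clipping the sum to the 16-bit maximum."""
    for q in (q1, q2):
        if not 0 <= q <= Q_MAX:
            raise ValueError("Q is an unsigned 16-bit integer")
    return min(q1 + q2, Q_MAX)
```

Two routes with Q of 60,000 and 10,000 microseconds therefore combine to 65,535 rather than wrapping around to a small (and misleadingly good) value.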
The CSR has a MinMTU which is the minimal MTU along the entire CSR.
The RCVF is the list of the physical addresses along which this RT was
forwarded. Its entries are either HR-addresses or SAN-IDs.
The purpose of the RCVF is to identify the genealogy of a composite
route. It could be used for preventing routing loops.
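One possible use of the RCVF for loop prevention is sketched below. The policy (refuse an RT whose RCVF already contains the receiver, otherwise append the receiver before forwarding) is an assumption made for illustration; this appendix does not define it.

```python
def accept_rt(rcvf, my_address):
    """Return the extended RCVF, or None if this RT has already passed here."""
    if my_address in rcvf:
        return None                   # forwarding again would form a loop
    return rcvf + [my_address]        # history to attach when forwarding
```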
The RCVF could have been derived from the CSR, if only the HRs could
parse the CSR and associate HR-addresses with SRs and SAN-IDs with
HR-addresses, which should not be assumed.
Different RTs for the same SAN may be kept. Each RCVF has its own SN.
The Node-Structure (in an RT) has SRs from the mapper (of that RT) to
that node. The CSR is an SR to the same mapper. Hence, prepending the
CSR to the SR in the Node-Structure yields an SR all the way from the
local node (where the RT resides) to the remote node.
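The derivation of an end-to-end SR from the CSR and a Node-Structure's SR can be sketched as follows. Treating SRs as plain byte strings and combining the two Q values additively (clipped to 16 bits) are simplifying assumptions; the function name is illustrative.

```python
def route_to_remote(csr, csr_q, node_sr, node_q):
    """Join the CSR and a node's SR into one SR with a combined Q."""
    full_sr = csr + node_sr                   # CSR first, then mapper-to-node SR
    full_q = min(csr_q + node_q, 0xFFFF)      # Q is additive, clipped to 16 bits
    return full_sr, full_q
```

For example, a CSR of {1,1,2} with Q=40 and a node SR of {3,0} with Q=10 give the combined SR {1,1,2,3,0} with Q=50.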
Each SAN has a unique SAN-ID, known to the HRs on it. The SAN-IDs share
the PacketWay-address space with the nodes. Hence, a SAN-ID is also a
unique 24-bit physical PacketWay-address (starting with a 0 bit).
Appendix-D: Glossary
--------------------
Address: A unique designation of a node (actually an interface to
that node) or a SAN.
Buddy-HR: HRs are "buddies" if they are on the same SAN.
Cut-Thru: See wormhole.
Destination: The node for which a packet is intended.
Dynamic-Routing: Routing according to dynamic information
(i.e., acquired at run time, rather than pre-set).
Endianness: The property of being Big-Endian or Little-Endian
(transmission order, etc.)
Ethertype: A 16-bit value designating the type of Level-3 packets
carried by a Level-2 communication system.
HR: Half-Router, the part of a router that handles one
network only.
L2-Forwarding: Forwarding based on Level-2 (i.e., data-link layer
of the ISORM) information, e.g., the native technique
of each SAN or LAN. Also called "source routing."
L3-Forwarding: Forwarding based on end-to-end Level-3 (i.e., network
layer of the ISORM) addresses. Also called
"destination routing."
MAC: Message Authentication Code.
Map: The topology of a network.
Mapper: A node on a SAN/LAN that has the map and an RT for that
network. It is expected that the mapper dynamically
updates the map and the RT.
Multi-homed Node: A node with more than one network interface, where each
interface has its own address.
Node: Whatever can send and receive packets (e.g., a computer,
an MPP, a software process, etc.)
Node structure: A C-struct (or equivalent) containing values for some
attributes of a node.
Planned Transfer: Transfer of information, occurs after an initial phase
in which the sender decides which Level-2 route to use
for that transfer.
RCVF: The "Received From" set includes all the physical
addresses through which an RT was disseminated, starting
with that of the mapper that created that RT.
Re-direct-message: A message that tells nodes which HR should be used in
order to get to a certain remote address (or range of).
Router: The inter-SAN communication device
SAN: System Area Network.
Security Context: A relationship between 2 (or more) nodes that defines
how the nodes utilize security services to communicate
securely.
Source: The node that created a packet.
Source-Route: A Level-2 route that is chosen for a packet by its source.
Symbol: Data preceding the EEP header of a PktWay message,
interleaved with the L2RHs.
Twin-HR: Two HRs are twins if they both are parts of the same
inter-SAN router.
Wormhole-routing: (aka cut-thru routing) forwarding packets out of
switches as soon as possible, without storing the
entire packet in the switch (unlike store-and-forward).
Zero-copy TCP: A TCP system that copies data directly between the user
area and the network device, bypassing OS copies.
Appendix-E: Acronyms and Abbreviations
--------------------------------------
0bNNNN The binary number NNNN (e.g., 0b0100 is 4-decimal)
0xNNNN The hexadecimal number NNNN (e.g., 0x0100 is 256-decimal)
8B 8 byte (64 bits) entity
ADDR The Address-record of RRP
API Application/Program Interface
AT Address Type
ATM Asynchronous Transfer Mode
B Byte (e.g., 4B)
b bit (e.g., 32b)
BC Byte Count (of parameters)
BER Bit Error Rate
CAPA The capability-record of RRP
CC Capability Code
CSR Common Source-Route
DA Destination Address
DB Data Block
DL Data Length (in 8B words)
DSP Digital Signal Processor
e The MSbit of E
E The Endianness field (in the EEP header)
EEP End/End Protocol
EI Error Indication
GP General Purpose
GVL2 An RRP message, requesting L2 route to a given destination
GVRT An RRP message asking an HR to give its routing tables
h Optional header fields flag
HR Half Router
HRTO An RRP message asking which HR to use for a given destination
ID Identification
IGMP Internet Group Management Protocol
INFO An RRP message providing information about nodes
IP The Internet protocol
ISORM The ISO Reference Model
L Length field (exclusive of itself)
L2 Level-2 of the ISORM (Link)
L2RH Level-2 Routing Header
L2SR An RRP message providing a Level-2 Source Route (L2RH)
L3 Level-3 of the ISORM (Network)
LA Logical Address
LADR The Logical-addresses-record of RRP
LAN Local Area Network
LRT Local Routing Table
LSbit Least Significant bit
LSbyte Least Significant byte
MPI Message Passing Interface
MPP Massively Parallel Processing system
MSbit Most Significant bit
MSbyte Most Significant byte
MSU Mississippi State University
MTU Maximum Transmission Unit
MTUR The MTU-record of RRP
M/C Multicast
NAME The name-record of RRP
NFS Network File Server
OH Optional Header field
OH-TYPE The Type of an Optional Header field
OT Optional Trailer field
P The Priority field
PAD Padding After Data
PBD Padding Before Data
PCI The Peripheral Component Interconnect "standard"
PH PacketWay Header
PL Padding Length (always in bytes)
PPP The Point-to-Point Protocol
PROM Programmable ROM (Read-Only-Memory)
PT Packet Type (2B)
PVM Parallel Virtual Machine
PW The Myrinet Packet Type assigned to PktWay (PW=0x0300)
Q Quality (of a path)
RCVF Received-From list, or the Received-From record of RRP
RDRC A re-direct message of RRP
RH Routing Header
RID Record ID
RL Record Length (in 8B-words)
RRP Router/Router Protocol
RT-hd RT (Routing Table) header
RT Routing Table
RTBL An RRP message providing a Routing Table
RTHD The Routing-Table-Header record of RRP
RTyp RRP's Record Type
RZ The Reserved field (in the EEP header)
SA Source Address
SAN System Area Network
SAN-ID The 24-bit PktWay-address of a SAN
SAR Segmentation and Reassembly
SN Serial Number
SNID SAN-ID
SNMP Simple Network Management Protocol
SR Source Route (always at Level-2)
SRQR The Source-Route-and-Q-record of RRP
ST Symbol Type
TAIL PacketWay EEP Trailer
TE Type Extension (2B)
TELL An RRP message requesting information about partially specified nodes
UNK Unknown
V Version
WRU? An RRP message asking its recipient to identify itself
XRT External Routing Table
xxx A padding byte
draft-ietf-pktway-protocol-spec-03.txt
[end]