home *** CD-ROM | disk | FTP | other *** search
-
- Performance Monitoring
- -or-
- "I wanna fix it, is it broke?"
-
- Skip Hansen, WB6YMH
- Harold Price, NK6K
-
- Presented at the 6th ARRL Computer Networking Conference
- Redondo Beach, California, August 1987
-
-
- Abstract
-
-
- Much of the performance information on Amateur Packet Radio is
- anecdotal and ephemeral; a subjective and non-detailed account
- usually limited to a gross statement of "goodness" or "badness",
- which is neither well documented nor long remembered. While
- there are several papers which describe the expected performance
- of CSMA-type systems, there is little actual data about the live
- amateur packet system.
-
- The authors discuss the need for accumulating performance data
- and describe work in progress to supply performance measurement
- software using a C program and a TNC with KISS software.
-
-
- 1. Why Performance Monitoring?
-
- Big changes are coming in amateur packet radio. In early 1987,
- most of the amateur packet network was based exclusively on AX.25
- and digipeaters. By the end of 1988, if not sooner, much of the
- packet world will be made of up a conglomeration of NET/ROM,
- TEXNET, TCP/IP, and other systems interconnecting 40,000 AX.25-
- based users. Each will be implemented and installed by
- packeteers eager to make the network better than it was before.
-
- Each system contains a myriad of trade-offs and compromises.
- Each system has several tuning knobs which can be used to
- modify the way it operates, affecting both local user
- performance and global network performance. In many cases, these
- knobs will be cranked by people with no data on how things are
- running and therefore no way to tell if anything got better. In
- other cases, the knobs will be tuned to optimize local
- performance, to the undetected detriment of the rest of the
- network.
-
- Performance data is vital to a local network. It is needed
- before the current network can be tuned, and it should be
- available to those who will help specify the next network. Put
- simply, if you don't know what you have now, how will you know
- if what you get next is any better?
-
-
- 1.1 We've already missed one chance.
-
- We've already missed one chance to monitor a major change, and in
- California, we've missed a second. The first version of AX.25
- did not use the Poll/Final facility of LAPB. In that version, if
- an acknowledgment of a data frame was not received, the data
- frame was re-transmitted. If multiple data frames were
- outstanding, only the first one was re-sent. In the second
- version of AX.25, the poll/final facility was implemented. In
- AX25v2, if a data frame is not acknowledged, a "poll" is sent
- out, soliciting a new acknowledgment. If that ack does not
- indicate that the data frame was received, the data frame is then
- retransmitted, otherwise transmission continues with new data
- frames.
-
- Any change to a protocol like the one described above entails
- some cost. Whether it is the effort involved in updating and
- distributing new software, or the trek to a snowed in
- mountaintop to swap ROMS, some of our limited people resources are
- expended. In the poll/final update, was a improvement in network
- performance obtained that in some way offset the effort involved in
- implementing it and updating the user base?
-
- Unfortunately, we'll never know. Since there was no network
- performance data before the change, and none was taken after,
- there is no way to tell. Our only indication is indirect; one
- of the original major proponents of the change to poll/final is
- now suggesting that poll/final not be used in some cases. [1]
-
- For future changes, we must do better.
-
-
- 1.2 NET/ROM
-
- In California, the old digipeater backbone which connected
- northern California, southern California, and Arizona has been
- largely supplanted by NET/ROM nodes. We had no data showing the
- performance of the old system, and we have no data on the
- performance of the new system. It is therefore difficult to
- measure the improvement.
-
-
- 2. Field Experience vs. Theoretical Predications.
-
- There is a large amount of literature on the topic of packet
- switching systems, and on packet radio. Some is quite accessible
- to the average amateur, one networking textbook in particular, by
- Tanenbaum [2], has been cited so often that it is stocked by
- local amateur radio stores. There is little written, however, on
- packet as it is practiced in the amateur radio world. In most
- cases, if you notice the discussion leaning toward the way we do
- it, you find it given as an example of the wrong way. Actually,
- the word "wrong" is seldom used, "less optimal" is more common.
-
- Much of the non-amateur networking experience of those who make
- up the amateur packet radio community is in the area of local
- area networks (LANs). Although there are a great many common
- problems and solutions between commercial LANs and the amateur
- packet network, there is a danger in assuming calculated
- performance parameters for the former have relevance in the
- later. Unfortunately, there is a tendency, with a lack of actual
- data, to use predicted LAN data in design and implementation
- discussions as if it were gospel.
-
- A LAN, as discussed in Tanenbaum [2] page 286, generally has three
- distinctive characteristics:
-
- 1. A diameter of not more than a few kilometers.
- 2. A total data rate exceeding 1 Mbps.
- 3. Ownership by a single organization.
-
-
- Although (1) is of importance only as it relates to propagation
- delay for very high data rates, (2) and (3) are worthy of note.
- The standard data rate in 1987 is still 1200 baud. There are
- 56kbps modems being beta tested now, but that is still only 6% of
- 1 Mbps. Ownership by a single organization is also something
- that is unusual in the amateur radio network. Item (3) tends to
- lead to either a homogeneous set of network hardware, a common
- set of goals, or at least a common forum for discussing those
- items. In the amateur world, users and implementors in northern
- California, Southern CA, and Arizona don't get together very
- often. That's another advantage of (1), in Los Angeles, one node
- can cover a area with a diameter of 200 miles.
-
- Many of the studies done on LAN performance make assumptions that
- are not valid in the amateur environment. One study, for
- example, from [2] page 289 assumed:
-
- o All packets are of constant length
-
- o There are no errors, except those caused by collisions.
-
- o There is no capture effect.
-
- o Each station can sense the transmissions of all other
- stations.
-
- None of these assumptions hold for our current environment.
-
- Another difference between our network and more commonly modeled
- networks is in the large number of autonomous stations we have on
- the network, and the large number of different traffic patterns
- running simultaneously. During the three days that data was
- gathered for this paper, 371 transmitters were on the air at some
- time in southern California. The peak number of active
- transmitters on a single 1200 baud frequency in a single five
- minute interval was 42. Most modeled networks have higher baud
- rates and/or low data throughput, and assume traffic is moving
- between a large number of outlying stations and a central
- station.
-
- Again, while there is much value in reading and modeling, we
- should make the attempt to measure what we have; both to feed the
- result back into the models, and to establish a base against
- which future modifications can be judged.
-
-
-
-
- 2. The Current State of Affairs.
-
- There appears to be only one kind of monitoring being done in
- amateur packet radio today. The two most common BBS systems, by
- W0RLI and by WA7MBL, both produce a log of BBS activities. An
- analysis program produces a report for the BBS operator of the
- number of connects from users and the number of messages
- forwarded, among other items. While this gives a BBS operator
- some idea of his local usage patterns, it does little to
- described total network activity, or even the throughput the BBS
- experiences.
-
- For global network performance we are left with anecdotal
- evidence, e.g., "01 really stinks tonight" (translation:
- performance is less than expected), and "I had no problem with 01
- today" (translation: I'm retired and was on at 10:00am).
-
- For local user performance, we get "I can talk to Utah all night
- long", and "I haven't been able to connect up north all week".
- Obviously, we need something better.
-
-
- 3. It's not easy
-
- There are two ways of looking at network performance, one is from
- the network's point of view, the other is from the user's point
- of view. In the first case we are interested in how the channel
- is performing, in the simplest view, how many bytes of data it is
- carrying. Is the network carrying a large number of user bytes,
- or is most of the capacity going to overhead or retries? Are we
- losing data to collisions, or to bad RF paths?
-
- In the second case, the user's point of view, the questions are
- more toward what level of service an individual user is getting
- from the network. Is the response time from distant locations
- adequate? Do many connections time-out? Are some destinations
- unreachable due to congestions or path failures?
-
- There are several ways of acquiring performance data. One is to
- have each user station collect it. As updating 40,000 user's is
- a non-trivial exercise, we've chosen another route. A specialized
- monitor station sits at a central place and looks at all the
- activity on the channel. Unfortunately, it isn't easy to answer
- any of the questions from a third party monitor station. Some
- of the problems are discussed below.
-
-
- 3.1 The problem is, it's Radio.
-
- In most wire based, broadcast-type LANs, a monitor program can
- make the assumption that if it heard a packet, everyone else in
- the LAN heard the packet. More importantly, if it didn't hear a
- packet, no one else did either. Even if the LAN is relaying data
- between two other LANs, it is at least certain that for data
- originated on the LAN or destined for the LAN, the monitor has a
- high probability of having the seen the same data as the other
- stations on the LAN. In the amateur packet network, due to
- hidden terminals, the FM capture effect, and propagation, all
- stations do not hear the same packets.
-
- If the monitor station heard all packets, it could easily follow
- the state of all connections on the LAN. For connection oriented
- protocols like AX.25 and TCP, and providing the monitor has been
- up as long as the other stations on the LAN, the monitor can tell
- how long a connection has been in place based on the circuit
- start and end protocols. In the amateur radio case, the monitor
- station can not be certain that it heard all packets. It may
- miss a circuit startup or end. It must instead be prepared to
- infer that a connection exists because it sees data flowing, or
- that a circuit has closed because it has seen no data for an
- interval of time. This will add uncertainty to data gathered in
- an RF environment but it does not invalidate the entire effort.
-
- Although collisions can be directly detected on a wire LAN, they
- can not be as easily detected on radio. Due to the capture
- effect, a stronger FM station will completely override a weaker
- station such that stronger packet is received without error,
- even though two packets were being transmitted at the same time.
- A collision may be inferred if the received packet is seen again.
-
- Some tasks then become exercises in gather as much information as
- possible, and then making an educated guess. Still, this is
- better than no data at all.
-
-
- 3.2 Users are Easy to Replace.
-
- It is somewhat easier to gather user oriented data, e.g., does a
- path to station X exist at this time, or what is the round-trip
- delay for packets between Los Angeles and Salt Lake City. The
- monitor station can actually be a user and directly measure these
- values.
-
- While data can be gathered about the performance of the channel at a
- specific time in this way, this alone will not supply information
- about the global network status at the time the measurement was
- taken. To be able to draw a meaningful conclusion from the data,
- aside from variable X was equal to Y at time T, other information
- is needed, such as the number transmitters on the air, and the
- number of other packets on the channel. In sort, both types of
- monitoring must be performed, direct measurement of user
- performance and global network measurement.
-
- 4. Monitoring Software
-
- The software currently under development by the authors addresses
- the problem of global network monitoring. Other types of
- monitoring will be added in the future.
-
- In this early version of the software, we are attempting to
- determine what sorts of questions can be answered by a program
- which listens to a channel and takes note of the packets it hears.
- Some questions, such has how many total bytes are being
- received at the monitor site, how many transmitters are seen, how
- many beacons are heard, are easy to answer.
-
- A much more difficult question is "How many times does the
- average forwarding BBS send a 20k file before it goes all the way
- without timing out?" The type of information we're collecting,
- and the type of questions that can be answered, are discussed
- below.
-
-
- 4.1 Questions to Answer
-
- There are two basic questions which are reasonably easy to answer.
- One is "What is the efficiency of the channel", the other is "How
- many users does the channel support".
-
- We have chosen to define efficiency as the ratio of the number
- of unique bytes of user data on the channel verses the total
- number of bytes on the channel. "Unique data bytes" is our term
- for actual user data not including frame overhead, retransmitted
- copies, or digipeated copies. For example, if the string "hello"
- is entered, digipeated once, not acked, retransmitted,
- redigipeated, and acked, the total number of bytes on the channel
- would be 168, the number of unique data bytes is 5, an efficiency
- of 2.9%. If 256 user bytes are sent and directly acked, the
- efficiency is 88%.
-
-
- To keep statistics on each user of the channel, we store pairs of
- Source and Destination calls from the frame header. The pair is
- called a circuit. A normal two-way connection would consist of
- two circuits. If NK6K and WB6YMH were connected, one circuit
- would be (TO:NK6K,FROM:WB6YMH), the other circuit would be
- (TO:WB6YMH,FROM:NK6K). Statistics for each circuit are maintained
- separately.
-
- In addition to two basic questions, we wanted to be able to determine
- the number of digipeaters the circuit used, what the average size
- of a data frame was, the number of RNR (input blocked) frames
- transmitted, and similar questions. Since this required looking
- into the control fields of the frame, the standard TNC interface
- was unsuitable.
-
-
- 4.2 KISS
-
- We chose the KISS TNC interface to give us access to all fields
- of the frame. KISS sends the entire frame, minus the checksum, to
- the terminal port using an async framing format. The KISS
- interface has been implemented on the TAPR TNC 1, the TAPR TNC 2
- and clones, and on the AEA PK-232. The KISS software for the TNC
- 2 is included with the KA9Q TCP/IP package.
-
- There are no modifications required to the KISS code for use in
- this application.
-
-
- 4.3 Software Design
-
- The current implementation of the monitor package consists of
- three programs.
-
- o STATS.EXE - This program monitors the received frames and
- accumulates data, periodically dumping the data into a log file.
- STATS also displays the addresses, data, control fields, and a
- "retry" flag in real-time as frame are received. NET/ROM and
- TCP/IP control fields are also displayed.
-
- o REPORT.EXE - This program massages selected data from the log
- file into a form suitable for passing to a plotting program. The
- plotting program is not included.
-
- o AVERAGE.EXE - This program massages the output of REPORT,
- combining and averages the records into larger intervals of time.
- This can result in clearer plots.
-
-
- 4.3.1 STATS.EXE
-
- STATS collects data over a five minute interval, storing it into
- several different tables. These tables are then written into the
- log file at the end of each interval, along with a time stamp
- record. The tables are summarized below.
-
-
- Digipeater Data.
-
- The total number of packets and bytes heard from a digipeater is
- stored, along with the call of the digipeater.
-
-
- Frequency Data.
-
- Totals on bytes and packets heard on the channel without regard
- to source are maintained. Packet are also counted by length into
- five buckets: 32, 64, 128, 256, and greater than 256 bytes. The
- total number of ticks of the 18.2 Hz clock when the data carrier
- detect (DCD) line was high are recorded, as are the number of
- ticks when DCD was low.
-
-
- Circuit Data.
-
- Several items are stored for each circuit, or TO:/FROM: pair.
- This includes the number of digipeaters used, the Protocol ID
- Byte (PID) of the last I frame received in the interval, the
- total number of packets and bytes received, the number of unique
- packets and bytes received, and the number of packets and bytes
- ignoring those heard from multiple digipeaters. Also included is
- the number of unique frames heard of each frame type (sabm, ua,
- etc.), the number of frames with POLL, and the number of frames
- with FINAL. The number of I frames heard is also counted into
- five buckets based on the data size: 32, 64, 128, 256, and
- greater than 256 bytes.
-
- As an indication of the difficulty of accurately determining the
- status of a frame, the algorithm used to determine uniqueness is
- described below.
-
-
- Uniqueness
-
- Depending on the packet type one of three different algorithms
- are used to test for uniqueness.
-
- I frames are judged to be unique if the N(s) variable matches the
- expected V(s), or if the locally computed checksum of the
- information portion of the current frame does not match the
- checksum of the last frame received with the same N(s). Note
- that the checksum is only used to resolve the ambiguity resulting
- from lost frames. An algorithm based solely on checksums would
- be confused easily by data streams containing identical
- consecutive lines. For example consider the transmission of text
- files containing multiple blank lines separating pages. In such
- cases several consecutive packets would contain identical
- information, a single carriage return, but still be unique.
-
- S and U type frames are judged to be unique if the control field
- of the current frame is different than the control field the last
- S or U frame received. Note that this does not detect retries of
- frames such as multiple SABMs sent because the target station is
- not responding.
-
- UI frames are judged to be unique frames if the checksum of the
- information field of the current frame is different than the
- checksum of the last UI frame which was received.
-
-
- Digipeated frame filtering logic
-
- The various "non-digipeated" counters in the software are
- designed to show the number of times a particular frame appears
- on the channel without regard to multiple retransmissions by
- digipeaters. The "non-digipeated" counters are advanced once and
- only once regardless of how many digipeater hops are observable
- by the monitoring station. This data is used to determine the
- number of retries of a packet without confusing a retry for a
- digipeat.
-
- The software maintains bit maps of observed hops for its use in
- filtering out digipeated frames. A separate bit map of observed
- hops is maintained for UI, S and U frames types as well one bit
- map for each outstanding I frame. There are 9 bits in each map
- which correspond to the originating station plus up to 8
- digipeaters.
-
- A frame is considered to be "non-digipeated" when it is either
- heard for the first time or it is heard from a hop from which it
- had been previously heard. The first condition is met when a
- frame is first transmitted, the second condition is met when a
- frame is retransmitted successive times. If neither case is met
- the frame is a digipeated frame and is not used to increment the
- "non-digipeated" counters.
-
- The digipeat bit map is cleared when either the uniqueness
- subroutine determines the frame is unique or when the digipeat
- filter subroutine determines that the frame is a retransmission.
-
-
- 4.3.2 REPORT.EXE
-
- REPORT produces several output formats. The RAW format displays
- each field in each record. This is useful if a particular
- interval is being examined in detail, or when debugging STATS.
- Several other formats are used to produce data for plotting. One
- report totals all circuit data for an interval. Examples of this
- output are provided later.
-
-
- 4.4 Hardware
-
- As discussed above in the section on KISS, TNC 1 and TNC 2
- clones, and the AEA PK-232 can be used with this software. If the
- DCD ON and OFF times are desired, a jumper must be added.
-
- DCD Jumpering
-
- Since most of the current TNC designs use the DCD signal on the RS-232
- interface as a connect status indicator it is necessary to modify
- the TNC hardware slightly to provide a true modem DCD on the RS-232
- interface. The modification for the TNC 2 and clones is very
- simple, consisting of a single jumper wire. The jumper goes
- between pin 2 of the modem disconnect header (DCD output from the
- modem) and the pin of JMP1 which is NOT connected to +5 volts
- (input to the DCD driver). On the MFJ-1270B artwork the correct
- pin of JMP1 is the one closest to the front panel. The authors
- have not researched modifications to other TNC designs, but it is
- expected the modifications will be similar. It is NOT necessary
- to perform the DCD modification to run the monitoring software,
- it is only necessary if the statistics of DCD activity are
- desired.
-
- Most terminal software used on packet will be unaffected by this
- modification, however most BBS software will require the jumper to
- be removed for normal operation.
-
- The software was developed on an IBM PC/AT using Microsoft C 4.0.
- It should be easily transportable to other systems provided a
- suitable serial port interface is available.
-
- A hard disk is highly recommended. Twenty-four hours of data for
- 145.01 MHz as monitored in southern California produced 500k
- bytes of log file data. This may be reduced, of course, by
- increasing the interval time.
-
-
-
- 5. Examples
-
- We used the STATS program to acquire performance data on all of
- the active packet channels in southern California. The
- monitoring site used for 145.01 MHz was at 700 feet on Palos
- Verdes. On this frequency, the site can "see" 8 NET/ROM nodes.
- During the 24 hours during which 145.01 was monitored, from 00:00
- to 23:59 local time on a Thursday, 105 total transmitters were
- seen.
-
- The data shown in the sample graphs is based on the five minute
- interval data from STATS, which was then processed by REPORT.
- The output of REPORT was then averaged by AVERAGE into 15 minute
- samples. Each point plotted represents the average of three five
- minute intervals.
-
- Figure 1 shows the number of user circuits seen in an interval.
- A "user circuit" is a subset of the total circuits; beacons,
- repeater ids, and other circuits consisting of a small number of
- UI frames have been removed. The peak of data just after midnight
- is caused by forwarded BBS traffic, large broadcast messages such
- as newsletters are restricted to being forwarded between 00:00
- and 8:00 by local custom.
-
- Figure 2 shows the total bytes per minute. Further analysis of
- the data would show the distribution of bytes in the peaks; how
- much is destined for local users, and how much is going
- "overhead", passing through the backbone to other NET/ROM
- locations. The major NET/ROM path through to Arizona is still on
- 145.01, this should change before the summer is out. It will be
- interesting to see what effect moving the backbone will have on
- this graph.
-
- Figure 3 shows the efficiency of the channel, computed as
- discussed above.
-
- Figure 4 is a plot of efficiency vs. the number of user circuits
- on the channel. The distribution on a plot of efficiency vs.
- total packets is similar to this one. The occurrence of low
- efficiency over the entire range of users (and number of packets)
- shows that there are causes of low efficiency other than
- congestion. One interpretation would be that more data is lost
- due to poor RF paths than to collisions. Another would be that
- hidden terminals are causing problems. Further analysis of the
- data, coupled with a knowledge of the geography and stations
- involved, might result in information that could be used to
- improve the network.
-
-
- 6. Other Uses / Future Goals
-
- Once a basic set of data gathering tools and formats has been
- define, the applications are boundless. For example, STATS can
- be used to make improvements to the current 14.109 HF forwarding
- scheme. For example, data gathered during the day on Friday,
- July 29, shows that of the two top stations in terms of the total
- bytes transmitted, shows that one had 30% better efficiency than
- the other. If monitoring was continued, and the trend continued
- over time, it may mean that the less efficient station is trying
- to reach stations beyond its range, or that there are local
- receiver problems. It also means that the monitor station was
- hearing more data frames from the transmitting station that the
- target station was, perhaps the mail between those stations
- should be re-routed.
-
- STATS can be used to check propagation between the monitor
- station and other stations. Figure 5 shows then number of bytes
- received on 14.109 MHz in a 24 hour period. It can also be used to
- infer propagation between other station. For example, if you
- hear a station in Indiana sending packets to Seattle and the
- efficiency is high, then a path must exist between those two
- points, even if you do not hear packets from Seattle at the
- monitor site.
-
- The "unique" subroutine can be used filter retries out of a
- monitored connection as the data of the connection is displayed.
- AEA offers a similar feature on some of its TNCs. STATS will be
- updated in the future to allow the capture of filtered text from
- each circuit into files for later review. This can serve several
- purposes, as a diagnostic aid, a periodic check for intruders on
- the amateur network as required by the FCC, or to satisfy the
- standard urge to "read the mail".
-
- This type of data collection could also assist in message traffic
- analysis, e.g., how many bytes are in the average connection?
- Are most of the BBS messages forwarded on a channel destined for
- users in the local area or are they just passing through?
-
- Currently, STATS monitors at the link-layer level. Higher layer
- protocols such as TCP/IP and NET/ROM add additional complications
- to traffic analysis, primarily in determining the actual
- origination and destination point. Work remains to be done in
- this area.
-
-
- 7. Conclusion
-
- There is much good to be gained from gathering and analyzing
- performance data. It can tell us where we are and suggest where
- we might go. It will also help determine if we like where we've
- gone once we get there. The work discussed here is a start
- toward developing tools to aid in this task. Others are invited
- to participate.
-
-
- 8. Availability
-
- The software described in this paper is available in source form
- from the WB6YMH-2 BBS on 145.36 in southern California. This BBS
- is also available by phone for those not in the local area at
- (213) 541-2503. Updates will periodically be sent to the HAMNET
- BBS on Compuserve.
-
-
- 9. Acknowledgments
-
- Thanks to Craig Robins, WB6FVC, for his help in the preparation
- of this paper. Thanks also to those who have implemented the
- KISS code for the TNC 1 and TNC 2, and the folks at AEA.
-
-
- 10. References.
-
- [1] Karn, P., KA9Q, "Proposed Changes to AX.25 Level 2", informal
- paper circulated on various mail systems and reprinted in the
- July/August 1987 NEPRA PacketEar, the newsletter of the New
- England Packet Radio Association.
-
- [2] Tanenbaum, A., "Computer Networks", Englewood Cliffs, NJ:
- Prentice Hall, 1981.
-
-