Internet Info 1997 December

home *** CD-ROM | disk | FTP | other *** search

/ Internet Info 1997 December / Internet_Info_CD-ROM_Walnut_Creek_December_1997.iso / drafts / draft_ietf_a_c / draft-ietf-avt-reconsider-00.txt < prev next >

Wrap

Text File | 1997-07-29 | 63KB | 1,449 lines

Internet Engineering Task Force AVT WG Internet Draft J.Rosenberg,H.Schulzrinne draft-ietf-avt-reconsider-00.txt Bell Laboratories/Columbia U. July 1997 Expires: January 1998 Timer Reconsideration for Enhanced RTP Scalability STATUS OF THIS MEMO This document is an Internet-Draft. Internet-Drafts are working docu- ments of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute work- ing documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference mate- rial or to cite them other than as ``work in progress''. To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Distribution of this document is unlimited. 1 Abstract RTP, the Real Time Transport Protocol, has gained widespread accep- tance as the transport protocol for voice and video on the Internet. It provides services such as timestamping, sequence numbering, and payload identification. It also contains a control component, the Real Time Control Protocol (RTCP), which is used for loose session control, QoS reporting, and media synchronization, among other func- tions. The RTP specification describes an algorithm for determining the RTCP packet transmission rate at a host participating in a multi- cast RTP session. This algorithm was designed to allow RTP to be used in sessions with anywhere from one to a million members. However, we have discovered several problems with this algorithm when used with very large groups with rapidly changing group membership. One problem is the flood of RTCP packets which occurs when many users join a mul- ticast RTP session at nearly the same time. To solve this problem, we present a novel adaptive timer algorithm called reconsideration. We present a mathematical analysis of this algorithm, and demonstrate that it performs extremely well, reducing the congestion problem by several orders of magnitude. We also back up these results with simu- lation. J.Rosenberg,H.Schulzrinne [Page 1] 2 Introduction Internet Draft Reconsideration July 1997 There has recently been a flood of interest in the delivery of multi- media services on the Internet. The growing popularity of Internet telephony, streaming audio and video services (such as those provided by Real Audio) and the Mbone are all indicators of this trend. To support these applications, standards are being developed to insure interoperability. The ITU-T H.323 [1] specification for Internet telephony is gaining widespread acceptance among software vendors. The IETF is developing protocols such as SIP [2] for multimedia ses- sion initiation, and RTSP [3] for controlling multimedia servers on the Internet. Interwoven with all of the above protocols is the Real Time Transport Protocol (RTP) [4]. It is used by H.323 terminals as the transport pprotocol for multimedia; both SIP and RTSP were designed to control multimedia sessions delivered over RTP. Its main function is to carry real time services, such as voice and video, over an IP network. It provides payload type identification so that the receiver can deter- mine the media type contained in the packet. Sequence numbers and timestamps are also provided, so that packets can be reordered, losses can be detected, and data can be played out at the right speeds. RTP was designed to be easily used in multicast conferences. To this end, it guarantees that each participant in a session has a unique identifier called the synchronization source (SSRC). This identifier is carried in each packet, providing applications a way to de-multiplex packets from different users. RTP also contains a control component, called the Real Time Control Protocol (RTCP). It is multicast to the same multicast group as RTP, but on a different port number. Both data senders and receivers peri- odically multicast RTCP messages. RTCP packets provide many services. First, they are used to identify the users in a session. One RTCP packet type, the Source Descriptor (SDES) contains the name, email address, telephone number, fax, and location of the participant. Another, the receiver report, contains reception quality reporting. This information can be used by senders to adapt their transmission rates or encodings dynamically during a session [5]. It can also be used by network administrators to monitor network quality [6]. It could potentially be used by receivers to decide which multicast groups to join in a layered multimedia session; such an application is similar to RLM [7]. Yet another RTCP packet type, the sender report (SR), is used to aid receivers in inter-media synchronization (lip sync), and to indicate transmitted bit rates, among other func- tions. Since RTP was designed for multicast, it was engineered to work well with both large and small sessions. A typical small session might be a teleconference among five business executives, while a typical large session might be an Mbone broadcast of a shuttle launch, where J.Rosenberg,H.Schulzrinne [Page 2] Internet Draft Reconsideration July 1997 group sizes of two hundred listeners have been reported [8]. As the demand for multimedia continues to grow, larger and larger group sizes will become commonplace. It is not difficult to envision Mbone concert broadcasts with thousands of members. It has even been sug- gested that RTP might be the transport protocol of choice for multi- cast distribution of multimedia in future cable networks, where tens of thousands of users might be the norm. The principle difficulty in achieving scalability to large group sizes is the rate of RTCP packet transmissions from a host. If each host sends packets at some fixed interval, the total packet rate sent to the multicast group increases linearly with the group size, N. This traffic would quickly congest the network, and be particularly problematic for hosts connected through low speed dialup modems. To counter this, the RTP specification requires that end systems utiliz- ing RTP listen to the multicast group, and count the number of dis- tinct RTP end systems which have sent an RTCP packet. This results in a group size estimate, L computed locally at each host. The interval between packet transmissions is then set to scale linearly with L. This has the effect of keeping the total traffic in the multicast group constant, independent of the number of group members. The above scaling mechanism works well for small to medium sized groups (up to perhaps a few hundred members). However, it suffers from problems when applied to larger groups, particularly ones whose group membership is dynamic. We have identified three inter-related problems which arise with large, dynamic multicast groups: oCongestion: In many cases, the access bandwidths for users will be small compared to network bandwidths (28.8 kb/s modems, for exam- ple, can now handle multimedia RTP sessions when RTP header com- pression [9] is used). We also anticipate that many multicast RTP sessions will exhibit rapid increases in group membership at cer- tain points in time. This can happen for a number of reasons. Many sessions have precise start times. Multimedia tools such as vat and vic can be programmed to join a session at the instant of its inception. Even without automation, users are likely to fire up their applications around the time the session is scheduled to begin. Such phenomena are common in current cable networks, where people change channels when shows begin and end. Studies have been performed to look at the group membership over time of some of the popular sessions on the Mbone [10][8]. These studies show exactly this kind of step-join behavior. The result of these step joins are inaccuracies in the group size estimates obtained by listening to the group. Each newly joined member believes that they are the only member, at least initially, and begins to send packets at a relatively fast rate. Combined with slow access links, the result is a flood of RTCP reports, causing access link congestion and J.Rosenberg,H.Schulzrinne [Page 3] Internet Draft Reconsideration July 1997 loss. For example, consider an RTP session where the total RTCP rate is to be limited to 1 kb/s. If all RTCP packets are 1 kbit, packets should be sent at a total rate of one per second. Under steady state conditions, if there are 100 group members, each member will send a packet once every 100 seconds, and everything works. How- ever, if 100 group members all join the session at about the same time, each thinks they are initially the only group member. Each therefore sends packets at a rate of 1 per second, yielding an aggregate rate of 100 packets per second, or 100 kb/s, into the group. oState Storage: In order to estimate the group size, hosts must listen to the multicast group and count the number of distinct end systems which send an RTCP packet. To make sure an end system is counted only once, its unique identifier (SSRC) must be stored. Clearly, this does not scale well to extremely large groups, which would require megabytes of memory just to track users. Alternate solutions must be found, particularly for set top boxes, where memory is limited. oDelay: As the group sizes grow, the time between RTCP reports from any one particular user becomes very large (in the example above, with 3000 group members, each would get to send an RTCP packet about once an hour). This interval may easily exceed the duration of group membership. This means that timely reporting of QoS prob- lems from a specific user will not occur, and the value of the actual reports is lost. In this paper, we consider only the first problem, that of congestion. It is our aim to solve this problem with a single mechanism, applicable to large groups and small alike. It is possible to develop solutions which work well for specific applications. For example, RTCP reporting can be disabled completely [11]. This, in fact, solves all three of the above problems. However, many applications require RTCP, and therefore this approach is not a general one. Another alternative is to use hier- archical summarizers. This helps improve convergence time, and may relieve congestion, but it requires special servers to be deployed. It is also not appropriate for small groups, and therefore does not work well as a universal solution to the congestion problem. There has been a small amount of prior work on resolving difficulties with timers in Internet protocols. Most prominent among this work is [12] and [13]. Sharma et. al. consider how to scale soft state timers to varying link capacities and state quantities. Their work considers only the point to point case. In that scenario, any sharp increases in the amount of state to send (which is equivalent to the sharp increases of J.Rosenberg,H.Schulzrinne [Page 4] Internet Draft Reconsideration July 1997 group membership we consider here) are known instantly by the sender, since all of the state resides there. The congestion problem which we treat here arises due to the distributed nature of the system and the multicast interconnect. In that scenario, a rapid change in group mem- bership is represented by a change in group state distributed across many nodes. As such, our work can be viewed as a generalization of their's to distributed multicast groups. In fact, our algorithm for controlling the congestion problem in RTP is applicable to other protocols and systems. An extension to the Service Location Protocol [14] has been proposed [15] which uses the reconsider- ation algorithm to control congestion in the multicast group used to disseminate information on network services. The algorithm is generally applicable to distributed systems where (1) control of bandwidth is desirable, (2) the bandwidth is used to transmit state, (3) the state is used to determine end system transmission rates, and (4) the state is dynamic. These constraints apply to BGP [16], for example, when a route server is used and update rates are to be controlled. The remainder of the paper is organized as follows. In Section 3, we detail the current RTCP packet transmission algorithm. Section 4 describes the desired ideal behavior. Section 5 describes our solution, an algorithm called timer reconsideration, and shows its performance. Section 6 then analyzes the algorithm to provide more insight into its performance. Section 7 discusses the algorithms performance under steady state conditions, and Section 8 summarizes our work. 3 Current RTCP Algorithm Each user i in a multicast group using RTP maintains a single state variable, the learning curve, which we denote by L(t). This variable represents the number of other users that have been heard from at time t. The state is initialized to L(0)=1 when the user joins the group. Each user multicasts RTCP reports periodically to the group. In order to avoid network congestion, the total rate of RTCP reports multicast to the group, summed across all users, is set at 5% of the total mul- ticast session bandwidth (it is assumed in RTP that this quantity is known apriori). We define C as the average interval between arrivals of RTCP packets, across all users, into the group, so that C is the average RTCP packet size divided by 5% of the session bandwidth. To meet this criteria, each user computes a deterministic interval, which represents the nominal interval between their own packet trans- missions required to meet the 5% constraint. This interval is given by : __________________________ In actuality, the RTP specification allocates 75% of J.Rosenberg,H.Schulzrinne [Page 5] Internet Draft Reconsideration July 1997 T_d= (T_ min, C L(t)), whereTmmin is2.5sfortheinitialpacketfromtheuser,and5s for all other packets. To avoid synchronization, the actual interval is then computed as a random number uniformly distributed between 0.5 and 1.5 times Td. The algorithm for sending RTCP packets follows directly. Assume a user joins at time t=0. The first packet from that user is scheduled at a time uniformly distributed between 1/2 and 3/2 of Tmin (which is 2.5 s for the first packet), putting the first packet transmission time between 1.25 and 3.75 seconds. We denote this time as t0. All subsequent packets are sent at a time tn equal to: t_n = t_n-1 + R(alpha) (5,CL(t_n-1)), where we have defined R(alpha) as a random variable uniformly dis- tributed between (1-alpha) and (1+alpha). (A equals 1/2 in the cur- rent algorithm; we generalize because A has a strong impact on tran- sient behavior). A pseudo-code algorithm describing the behavior upon expiration of the interval timer is given in Figure 1. 3.5in new_interval = C * current_group_size_estimate; new_interval = max(new_interval, Tmin); new_interval = new_interval * random_factor; send_packet(); schedule_timer(current_time + new_interval); Figure 1: Current RTCP Algorithm The difficuly arises when a large number (say, N) of users all join the group at the same time. We call this a step-join. Since all users start out with L(t)=1, all schedule their first packet to be sent between t=1.25 and t=3.75, a fixed, 2.5 second interval. The result is a flood of N packets for 2.5 s, many of which are lost if the access bandwidth is low. Since group size estimates are based on the reception of these packets, losing them will continue to cause each user to have a low estimate of the actual group size. This will cause __________________________ the RTCP bandwidth to data senders, and the remaining 25% to listeners. In the remainder of the paper, we assume that everyone is a listener. This simplifies the analysis and simulations, all of which can be trivially extended to include the case where there are senders. J.Rosenberg,H.Schulzrinne [Page 6] Internet Draft Reconsideration July 1997 continued congestion until enough packets get through to make the group size estimates correct. 4 Ideal Behavior The flood of packets caused by the current RTCP algorithm with a step join has both good and bad consequences. Clearly, the congestion which results is not desirable. However, the flood allows the end systems to very rapidly learn about the group sizes and group member- ship, which is desireable. There is a fundamental and unavoidable tradeoff between the convergence time (i.e., the time until the observed group size L(t) equals the actual group size) and the band- width used to achieve convergence. What, then, represents the behav- ior which is desirable? Our approach is to define the ideal behavior as the one where feed- back into the group never exceeds its specified threshold (5% for RTCP). This implies that convergence times will grow as the group sizes grow. However, it is the most social solution, in the sense that it will never congest the network, no matter how large the group sizes become. If we define the ideal behavior as convergence within any amount of time that grows less than linearly with the group size, the result is a protocol that does not scale and can eventually result in congestion. We also consider congestion avoidance to be more important because we expect many users to be connected via low speed dialup lines. In that case, bandwidth is at a premium, and it is in the self-interest of users to make the best use of it. Most users probably consider RTCP feedback much less important than the video or audio data itself, and therefore it is important to keep the feedback below the required 5%. We now state the desired ideal behavior: 1. The learning curve L(t) grows linearly at a rate of C users per second, until it reaches the group size N, at which point it becomes flat, and remains at N. 2. The bandwidth used by all feedback is always equal to C pack- ets per second during the convergence period. 5 Reconsideration Our contribution is a new solution which we call reconsideration. The effect of the algorithm is to reduce the initial flood of packets which occur when a number of users simultaneously join the group. This algorithm operates in two modes, conditional and unconditional. We first discuss conditional reconsideration. J.Rosenberg,H.Schulzrinne [Page 7] Internet Draft Reconsideration July 1997 At time tn, as defined above, instead of sending the packet, the user checks if the group size estimate L(t) has changed since tn-1. If it has, the user reconsiders. This means that the user recomputes the RTCP interval (including the randomization factor) based on the cur- rent state (call this new interval Trime), and adds it to tn-1. If the result is a time before the current time tn, the packet is sent, else it is rescheduled for tn-1+Trime. In other words, the state at time tn gives us potentially new infor- mation about the group size, compared to the state at time tn-1. Therefore, we redo the interval computation that was done previously at time tn-1, but using the new state. If the resulting interval would have caused the packet to be scheduled before the current time, we know that our interval estimate was not too low. If, however, the recomputation pushes the timer off into the future, we know that our initial timer estimate was computed incorrectly, and we delay trans- mission based on our new timer. A pseudo-code specification of the algorithm is given in Figure 2. 4.5in new_interval = C * current_group_size_estimate; new_interval = max(new_interval, Tmin); new_interval = new_interval * random_factor; if ((last_transmission + new_interval < current_time) (current_group_size_estimate vious_group_size_estimate)) send_packet(); schedule_timer(current_time + new_interval); last_transmission = current_time; previous_group_size_estimate = current_group_size_estimate; else schedule_timer(last_transmission + new_interval); previous_group_size_estimate = current_group_size_estimate; Figure 2: Conditional Reconsideration Intuitively, this mechanism should help alleviate congestion by restricting the transmission of packets during the convergence peri- ods, where the perceived group sizes L(t) are rapidly increasing. In unconditional reconsideration, the user reconsiders independently of whether the number of perceived users has changed since the last report time. Thus, the RTCP interval is always recomputed, added to the last transmission time tn-1, and the packet is only sent if the resulting time is before the current time. Clearly, when the group sizes are increasing, this algorithm behaves identically to condi- tional reconsideration. However, its behavior differs in two respects. First, consider the case where we have converged, and group J.Rosenberg,H.Schulzrinne [Page 8] Internet Draft Reconsideration July 1997 sizes are no longer changing. In conditional reconsideration, no timer recomputation is done. But for unconditional, it is redone. Since group sizes have not changed, the deterministic part of the interval remains the same. However, the random factor is redrawn each time. This means that packets will be transmitted when the recomputed random factor is smaller than the previous factor, and packets will be delayed when the recomputed random factor is greater than the pre- vious one. Note that since the random factor is of finite extent (between 1/2 and 3/2), packets are guaranteed to eventually be sent. However, the result is an average increase in the interval between RTCP packets. 4in new_interval = C * current_group_size_estimate; new_interval = max(new_interval, Tmin); new_interval = new_interval * random_factor; if (last_transmission + new_interval < current_time) send_packet; schedule_timer(current_time + new_interval); last_transmission = current_time; else schedule_timer(last_transmission + new_interval); Figure 3: Unconditional Reconsideration The behavior of unconditional reconsideration differs during the ini- tial transient as well. Consider N users who simultaneously join the group at time 0. They all schedule their first RTCP packets to be sent between t=1.25 and t=3.75. The users whose packets were sched- uled earliest (at a time a little bit after t=1.25) will not recon- sider with conditional reconsideration, and will always send their packets. This is because no one else has sent any packets yet, and thus they have not perceived the group size to have changed. In fact, because of network delays, many users may send packets without recon- sidering. Once the first transmitted packet has reached the end sys- tems, conditional reconsideration kicks in, since users will perceive a change in group size only then. With unconditional reconsideration, those first few users do not wait for the first packet to arrive before using the reconsideration algorithm. They will all recompute the timer. Obviously, the group size estimate hasn't changed, but the random variable will be redrawn. For the first few users, the random factor was initially extremely small (that's why they are the first few users to send). In all likelihood, when the factor is redrawn, it will be larger than the initial factor, and thus the resulting inter- val will be larger. This will delay transmission of RTCP packets for those users. As time goes on, it becomes less likely than the new J.Rosenberg,H.Schulzrinne [Page 9] Internet Draft Reconsideration July 1997 random factor will be greater than the initial one. However, by then, any RTCP packets which may have been sent will begin to arrive, increasing the group size estimates for each user. In this fashion, unconditional reconsideration alleviates the initial spike of packets which are present in conditional reconsideration. These arguments are all quantified in later sections. Both modes of the algorithm are advantageous in that they do not require any modifications to the current RTCP protocol structure. In fact, they operate properly even when only a subset of the multicast group utilizes them. As more and more members of a group use the algorithm, the amount of congestion is lessened in proportion. This leaves open a smooth migration path which is absent for most of the other proposed solutions. 5.1 Simulations We ran a number of simulations to examine the performance of the reconsideration algorithms. In our simulation model each user is connected to the network via an access link of 28.8 kb/s downstream (i.e., from the network to the user). We assume upstream links are infinitely fast, since congestion occurs only downstream. This congestion is due to the RTCP reports from all of the other users being sent to any particular user. Multi- cast join latencies are ignored; this is reasonable in protocols such as DVMRP [17] since initial packets are flooded. We assume that the network introduces a delay of D seconds, where D is uniformly dis- tributed between 0 and 600 ms. Each user has a 100 kB buffer on the downstream access link. We assume all RTCP packets are 128 bytes in size. Figures only available in postscript version Figures only available in postscript version Figure 5.1 and Figure 5.1 depict state evolution for a single user when 10,000 users simultaneously join a multicast group at t=0. The figures depict the system with no reconsideration (the current speci- fication), conditional reconsideration, unconditional reconsidera- tion, and the ideal behavior. The graphs are plotted on a log-log scale to emphasize the beginning and complete evolution of the sys- tem. Figure 5.1 depicts the learning curve, and Figure 5.1 shows the integral of r(t), i.e., the total number of packets sent, given that r(t) is the packet transmission rate into the multicast group. Note the burst of packets sent in the beginning by the current algorithm. Exactly 10,000 packets are sent out in a 2.5 s interval. This is almost 3000 times the desired RTCP packet rate. However, this burst J.Rosenberg,H.Schulzrinne [Page 10] Internet Draft Reconsideration July 1997 is reduced substantially by the reconsideration mechanisms. Condi- tional reconsideration causes only 197 packets to be sent over a 210 ms interval, and unconditional reconsideration causes merely 75 pack- ets to be sent over a 327 ms interval. We also observed similar improvements with a variety of different link speeds, delays, and group memberships. We noted that the startup burst with reconsideration was particularly disturbing when network delays were deterministic instead of uni- formly distributed. This is demonstrated in Figure 5.1, which looks at the cumulative number of packets sent when 10,000 users simultane- ously join at t=0, using conditional reconsideration. In all cases, the mean network delay was 300ms, but the distribution varies. Expo- nentially distributed network delays exhibited nearly identical per- formance to a uniform distribution. Later sections will demonstrate that the spike is dependent on the amount of time until the first packet arrives. As the number of users in the step join becomes large, the number of users who send their packets within the first S seconds after t=1.25 grows large for any S. Consider an S much smaller than typical network delays, say 10 mus. As far as computing arrival times at end stations, these packets can be treated as though they were all sent at the same time. The amount of time until the first of these packets arrives at any end system is thus the minimum network delay experienced by all of those packets. If the network delays are exponential, the expected minimum of N exponential random variables goes to zero as N grows. The same is also true for a uni- form random variable. For a deterministic variable, this is not the case; the minimum is always the same. Therefore, the performance is worse for network delays which are fixed. Figures only available in postscript version We have also observed that the reconsideration mechanisms cause a complete pause in packet transmissions after the initial spike. This pause (which we call the plateau effect) lasts for a time propor- tional to the number of packets in the spike. This has both positive and negative implications. On the plus side, it gives network buffers time to clear. However, it also causes the send rate to deviate from our desired fixed 1/C packets per second. The phenomenon occurs because the spike of packets in the beginning causes the system to reconsider, and not send, all packets after the spike. A more detailed explanation of the phenomenon is given in Section 6. How- ever, after the spike and plateau, the packet rate behaves fairly well, sending packets at a nearly constant rate. We also ran simulations to observe performance in linear joins, where groups of users join the system at time seconds apart, for some small . The results are shown in Figure 5.1 and Figure 5.1. Both J.Rosenberg,H.Schulzrinne [Page 11] Internet Draft Reconsideration July 1997 plots depict the cumulative number of packets sent by all users. The simulation parameters are identical to the above cases, except net- work delays are deterministic 300 ms. The first plot depicts condi- tional reconsideration, and the second, unconditional. In all cases, 2500 users join the system, but the rate that they do so is varied. Both plots depict the step join, and joins at a rate of 5000, 2500, and 500 users per second. The plots indicate that linear joins quickly eliminate the initial transient of packets and the plateau period, with the reduction being better for unconditional reconsider- ation. We have done some analysis to examine how the behavior of reconsider- ation changes under linear joins. Our analysis has shown that as soon as the number of users who join, times , exceeds the network delays, the initial bursts in the reconsideration algorithms begin to disap- pear, whereas they remain for the current specification. All other aspects of the system performance (including long term growth of L(t)) are identical to the step-join case. Figures only available in postscript version Figures only available in postscript version 6 Analysis In this section, we present a mathematical analysis of the reconsid- eration mechanism. We first consider the case where there are no net- work delays. This results in a differential equation which describes the learning curve. The analysis also applies to networks with delay, but only models the post-transient behavior of the system. However, this is sufficient to compute the post-transient packet rate and sys- tem convergence times. We then extend this analysis to the case of network delays, and derive expressions which describe the transient spikes and plateaus in the learning curve. We also analytically demonstrate the reasons for improved performance from unconditional reconsideration, which only exists in the presence of network delays. 6.1 No Delay We consider a system where all of the users join the network at the same time, t=0. It is assumed that the network introduces neither delay nor loss, and that access links have infinite bandwidth. The result is that when a user sends an RTCP packet, it is received by all of the users simultaneously at the time it was transmitted. In the model considered here, all users will have exactly the same state (in terms of L(t)) at all times. Thus, we trace state evolution as seen by a particular user. The user estimate has converged when J.Rosenberg,H.Schulzrinne [Page 12] Internet Draft Reconsideration July 1997 L(t)=N, the number of users actually in the group. Whenever a packet is reconsidered, it is either sent, or it is not, depending on whether the newly computed send time is before or after the current time. We can therefore view the reconsideration mechanism as causing any packet to be sent with some probability P. In the most general case, P is a function of the current time t, the time of the last RTCP report, and the number of users observed at t, L(t). For- tunately, the learning curve is only affected by packets which are initial, that is, sent by users which have not yet sent a packet. For all such users, the last report time is initialized to t=0, so that the send probability is a function of t and L(t) only. If we consider some small interval of time, the change in L(t) is equal to the number of initial packets scheduled to be sent during this interval, times the probability of sending a packet in that interval. Based on this, we can immediately write the differential equation: (dL)/(dt) = d(t) P(t,L(t)), where d(t) is the rate of packets scheduled for transmission during some time interval. What remains is the evaluation of the scheduled rate d(t) and probability P(t,L(t)). We first consider the send prob- ability. Consider an initial packet scheduled to be transmitted by a user at time t. Since the number of perceived users, L(t) has surely changed over any time interval, this packet is reconsidered user perceives L(t) other users in the system. It thus calculates a new packet interval, which is equal to: T = R(alpha) (T_ min,C L(t)) SinceCL(t)islargerthanTmmin mostofthetime,weignorethemax operator. Keeping in mind that the previous report time is always t=0, we can immediately write the probability of sending a packet using Equation (1): ll P_ send = (t-(1- alpha) C L(t))/(2 alpha C L(t))& (1 - alpha) C L(t) < t < (1 + alpha) C L(t) __________________________ It is for this reason that we make no distinction between conditional and unconditional reconsideration here J.Rosenberg,H.Schulzrinne [Page 13] Internet Draft Reconsideration July 1997 Figures only available in postscript version The numerator represents the range of times in the interval widow which fall below the current time t, while the denominator represents the total range over which the times for the interval are selected. This is illustrated in Figure 6.1. Note that this probability only makes sense when t is between (1-alpha) and (1+alpha) of CL(t). When t is to the left of the reconsideration window, the probability is zero, and when t is to the right of the window, it is one. An important implication of this equation is that the send probabil- ity is zero when t < (1 - alpha) C L(t) . This places an upper bound on the learning curve; if the learning curve should reach this bound, no initial packets would be sent, and the curve would remain flat until it fell back below this upper bound. We therefore define the maximumlearningcurveLmmax (t)tobe: L_ max(t) = (1)/((1 - alpha) C) t TheactuallearningcurveL(t)isalwaysbelowLmmax (t). The next step is to compute the scheduled rate, which is signifi- cantly harder. In the ideal case, the rate that packets have been scheduled at should equal the number of users in the system, N, divided by the average RTCP interval size perceived by those users at time t, namely CL(t). At any point in time the fraction of packets to be sent which are initial is (N-L)/N. Thus, the scheduled rate of initial packets is roughly given by: d(t)=(N - L(t))/(C L(t)) The curves of Figure 5.1 show that the reconsideration algorithms exhibit linear behavior between roughly t=100 and t=9000 (thus ignor- ing the transient behavior in the beginning few seconds). We there- fore attempt to determine the slope a of this line based on the dif- ferential equation. Substituting L(t)=at into (2): a = (N - L(t))/(C L(t)) (1-(1- alpha) C a)/(2 alpha C a) For small t, L(t)<N, so we can ignore the L in the first term's numerator. Thus: (2 alpha C^2 L(t))/(N) a^2 + a (1- alpha) C -1 = 0 Thus, for large N and small t, L(t)ect the a2 term, and obtain the desired result: a = (1)/((1 - alpha) C) J.Rosenberg,H.Schulzrinne [Page 14] Internet Draft Reconsideration July 1997 Not coincidentally, this is also the slope of the maximum learning curve. The equation indicates, therefore, that L(t) grows at its max- imum rate until the approximation is no longer valid, at which point its growth tapers off. As mentioned previously, the equation for the scheduled rate d(t) is very approximate. We have done some more extensive analysis, and found that a slightly better approximation is given by: d(t) = (N - L(t))/(C (1 - alpha)/(2 - alpha) L(t)) This is of the same form as the previous equation, but tends to model the nonlinearities of the system better. Now, with the density and send probabilities computed, we can write the final differential equation, which is: (dL)/(dt) = (N - L(t))/(C (1 - alpha)/(2 - alpha) L(t)) (t - (1 - alpha) C L(t))/(2 alpha C L(t)) This ODE allows us to compute a numerical solution, which can be com- pared against the simulations. Figure 6.1 shows the learning curve, with 10,000 users joining at t=0, for both analysis and simulation. In the simulation, however, we take into account non-zero delays and finite link speeds; network delays are a deterministic 300 ms, and link speeds are 28.8 kbps. Note that despite this change in assump- tions, the analytical expression still comes extremely close to the experimental for a large portion of the convergence period. In par- ticular, it is very close during the period of linearity of L(t) and less accurate afterwards. In addition, the differential equation does not capture the behavior of L(t) for 0 the experimental curve exhibits the spike and plateau (this is difficult to see in Figure 6.1 because of the x axis scale). We believe that network delays only impact the behavior of L(t) when they are on the order of CL(t). This is somewhat intuitive; the timescale of transmission events for any particular user is CL(t). If network delays are much smaller than this, they are almost instanta- neous as far as sending packets goes, and therefore do not affect the system behavior. It is for this reason that network delays only impact the learning curve during the first minute or so. Figures only available in postscript version With an understanding of the behavior of L(t), we are now in a posi- tion to discuss the real quantity of interest; the aggregate bit rate generated by these sources as they move towards convergence. We call this quantity r(t). Since the integral of this quantity is the total J.Rosenberg,H.Schulzrinne [Page 15] Internet Draft Reconsideration July 1997 number of packets sent, we have, as an immediate consequence: r(t) >= (d)/(dt)L(t) Experimentally, we have observed that r(t) is actually equal to the derivative of L(t) for a large fraction of the time until conver- gence. The reason for this is that the reconsideration mechanism favors packets from users who have not yet sent a packet (initial packets). Consider two packets, both scheduled to be sent at some time t. One is an initial packet, and the other is from a user who has sent a packet previously. For the initial packet, the last report time is at t=0, whereas for the other packet, the last report time is at some time t*, not equal to zero. In the latter case, the bottom edge of the interval window is at t*+C(1-alpha)L(t). Thus, the proba- bility of sending a non-initial packet at time t is: P_ sendold = (t - t^* - C (1 - alpha) L(t))/(2 alpha C L(t)) This quantity is always less than the send probability for initial packets as given in (3). In fact, for small t, L(t) is equal to t/C(1-alpha). Plugging this in to (7), we get that the numerator of the fraction is negative, so the send probability is exactly zero. is exactly equal to the derivative of L(t) while L(t) is linear. We expect it to continue to track the derivative closely even as L(t) tapers off. Once L(t) has converged to N, packets are sent at a rate of 1/C with conditional reconsideration. With unconditional reconsideration, this rate is somewhat less. Therefore, r(t) exhibits a dual-constant behavior; it starts at 1/(1-alpha)C, stays there for some time, then reduces to 1/C, where it remains from then on. The final step is to approximate the convergence time. Unfortunately, the precise time depends on the non-linear regime of L(t), which we cannot capture adequately. However, we can bound the convergence time by assuming linear behavior until L(t) equals N. Since the actual L(t) is less than this curve, the convergence time Tc is easily bounded on the left by: __________________________ Note that plugging in L(t)=t/C(1-alpha) to equation (3) yields a numerator of zero, and thus a probability of zero also. In fact, the send probability is zero only in the limit for N=infty; it is slightly positive for all real cases. This is in contrast to the send probability for non-initial packets, which is exactly zero for finite N. J.Rosenberg,H.Schulzrinne [Page 16] Internet Draft Reconsideration July 1997 T_c >= N C (1 - alpha) This bound grows linearly with the group size, as expected. It is possible to compute an upper bound as well. Consider the last initial packet to be sent. Before it is sent, L(t)=N-1. As long as the send probability is less than one, it is possible that this last initial packet will not be sent. But, according to (3), the send probability is one when t>(1+alpha)CL(t). This means that the last initial packet must be sent as soon as t=(1+alpha)C(N-1). This gives us an upper bound of: T_c <= N C (1 + alpha) 6.2 Modeling Delay and Loss In this section, we consider the reconsideration algorithm in the presence of network delay and link bottlenecks. We compute the size of the spike during the initial transient, and the duration of the plateau. We also demonstrate the superiority of unconditional recon- sideration in reducing these startup effects. The spike and plateau are easily explained. At t=0, all N users join the system. They schedule their packets to be sent between (1-alpha)Tmin and (1+alpha)Tmin. At time (1-alpha)Tmin, packets begin to be sent. Lets say the network introduces a delay of D seconds. This means that no packets will arrive at any end system until time (1-alpha)Tmin+D. During these D seconds, many packets will be sent by end-systems, causing the initial spike of packets. After D seconds, this burst of packets will arrive. This causes a sharp increase in the perceived group size L(t). This, in turn, increases the packet transmission interval, and moves the left hand side of the interval window well beyond the current time, so that Psend=0. The result is a complete halt in transmissions until real time catches up with the left hand side of the reconsideration window. This qualitative description of the system is easily quantified. For a large enough N, the flood of packets starting at time (1-alpha)Tmin will saturate the access links D seconds later, independent of whether conditional or unconditional reconsideration is used. While the links remain saturated, packets arrive at a continuous rate at the link speed, which we denote as m packets per second. We can therefore express the arrival time of the nth packet as: t_n = (1 - alpha) T_min + D + (n)/(m) Since each packet arrival increases L(t) by one, we can set n equal to L(t) in the above equation and solve for L(t): J.Rosenberg,H.Schulzrinne [Page 17] Internet Draft Reconsideration July 1997 L(t) = m(t - (1 - alpha) T_min - D) This flood of packets will cause the learning curve L(t) to advance very quickly, beyond its maximum as given in (4). When the learning curve exceeds this maximum, all sending will stop. Call this stopping time tstop. It can be obtained as the solution to: (1 - alpha) C L(t_stop) = t_stop t_stop = (1 - alpha) T_min + D + ((1 - alpha) T_min + D)/((1 - alpha) C m - 1) We can then plug this into (9) and solve for the number of packets which have arrived at this point, nstop: n_stop = ((1 - alpha) T_min + D)/((1 - alpha) C - 1/m) The next step is to determine the number of packets sent up to this point. This figure differs based on whether the reconsideration mech- anism is conditional or unconditional. We first look at conditional. The number of packets sent consists of two terms. Before the arrival of the first packet (at time (1-alpha)Tmin+D+1/m), all packets sched- uled to be sent are actually sent, since no users have observed a change in the group size (which would activate the reconsideration mechanism). The number of packets sent is then the density of packets scheduled to be sent (which is N/2ATmin) times the amount of time until the first packet arrives. We call this quantity nsenta, and it is: n_senta = (N)/(2 alpha T_min) (D + (1)/(m) ) Once the first packet arrives, reconsideration kicks in, and not all packets will be sent. Each will be sent with some probability, P. Unfortunately, this is not the same probability Psend as defined in Equation 3. That equation ignored the max operator, assuming L(t) was large most of the time. This is not true in the very beginning, where it takes a few packets to increase CL(t) beyond Tmin. We assume that once enough packets have arrived to do this, the result will be to move the left hand side of the reconsideration window ahead of the current time (this is true when D<C). In other words, we assume the left hand side of the reconsideration window is always at (1-alpha)CTmin until tstop. With this in mind, the send probability between the arrival of the first packet, and the stopping of transmission, is given by: P_send = (t - (1 - alpha) T_min)/(2 alpha T_min) J.Rosenberg,H.Schulzrinne [Page 18] Internet Draft Reconsideration July 1997 The number of packets sent is given by the integral of the scheduled packet rate times the send probability: n_sentb = INTEGRAL_(1 - alpha) T_min+ D + 1/m^t_stop d(t) P_send dt Since the density is N/2ATmin during this time period of interest, the number of packets sent is obtained by: n_sentb = INTEGRAL_(1 - alpha) T_min+ D + 1/m^t_stop (N)/(2 alpha T_min) (t - (1 - alpha) T_min)/(2 alpha T_min) dt This integral results in a growth in the number of sent packets as t2 until complete cutoff at tstop. The solution to the integral is: n_sentb = (N)/(8 alpha^2 T_min^2) ( ( ((1 - alpha) T_min + D)/((1 - alpha) C m - 1) + D)^2 - (D + (1)/(m) )^2 ) And the total number of packets sent, using conditional reconsidera- tion, during this transient spike is: n_sent = n_senta + n_sentb These analytical results are compared with simulation in Figure 6.2. The figure displays the cumulative number of packets sent for a step join. For the simulation, 100,000 users join the system at t=0. Net- work delays are deterministic and equal to 300 ms, and link speeds are 28.8 kbps. The plot shows only the initial transient. The linear and then t2 behavior is clear from the simulation. Our approximation for both nsenta and nsentb is quite good. The analysis also predicts that sending will stop at tstop=1.72s, which agrees with the simula- tion. Also note that the number of packets sent is dominated by the nsenta term. Figures only available in postscript version For unconditional reconsideration, the number of packets sent during the transient is different. In the conditional case, the total con- sisted of two parts; one before the arrival of the first packet (as the reconsideration mechanism had not kicked in yet), and one after. In the case of unconditional, we do not need to wait for the arrival of a packet for the mechanism to activate. Therefore, the number of packets sent is given by an equation similar to that for nsentb above. It is the integral of the scheduled rate, times the send prob- ability. In this case, the integral is between (1-alpha)CTmin and tstop, instead of just between the arrival of the first packet and tstop. The number of packets sent for unconditional is therefore: n_sent = INTEGRAL_(1 - alpha) T_min^t_1 (N)/(2 alpha T_min) (t - (1 - J.Rosenberg,H.Schulzrinne [Page 19] Internet Draft Reconsideration July 1997 alpha) T_min)/(2 alpha T_min) dt Solving, we obtain: n_sent = (N)/(8 alpha^2 T_min^2) ( ((1 - alpha) T_min + D)/((1 - alpha) C m - 1) + D)^2 This quantity is small compared to nsenta for conditional reconsider- ation, thus the improved performance. These results are compared with simulation in Figure 6.2. The simulation model is identical to that in Figure 6.2, except unconditional reconsideration is used. As the plot indicates, only the t2 behavior is present here. The total num- ber of packets sent during the transient is much reduced, and reason- ably well predicted by our analysis. Figures only available in postscript version The next step is to determine the duration of the plateau period. Packet sending will start again when the current time catches up with the left hand side of the interval window, which will have quickly advanced to (1-alpha)Cnsent. The time at which this happens, tstart is: t_start = (1 - alpha) C n_sent For conditional reconsideration, if we assume nsent....pproxnsenta, we obtain: t_start = (C (1 - alpha) N)/(2 alpha T_min) (D + (1)/(m) ) The duration of the plateau period itself is given by: T_plat = t_start - t_stop Table 1 lists the values of the parameters derived above for various group sizes. In all cases, A=1/2, Tmin=2.5, C=.711s, and D=300ms. The unconditional mechanism provides clear gains in terms of reducing the number of packets sent during the transient, and the duration of the plateau effect. 7 Steady State Behavior It is important to consider the behavior of the reconsideration algo- rithms when the learning curve has reached steady state (i.e., L(t)=N). The ideal behavior is for the total send rate of the group to be 1/C RTCP packets per second, equally divided among all users. J.Rosenberg,H.Schulzrinne [Page 20] Internet Draft Reconsideration July 1997 _________________________________________________________ Conditional Unconditional 2-5 Group Size N nsent Tplat nsent Tplat _________________________________________________________ 1000 143 49 s 18 5 s 10000 1430 506 s 178 61 s 100000 14305 5083 s 1784 632 s _________________________________________________________ Table 1: Transient Behavior for Various Group Sizes There are actually two situations which can be reasonably deemed as steady state. The first of these is a group size which remains exactly fixed. However, in real systems, users come and go, so a sec- ond definition of steady state is a group whose membership oscillates slightly about some large value. We ran simulations to examine performance of the algorithms under both of these conditions. In first, fixed-group size scenario, both conditional reconsideration and the current RTCP algorithm both gen- erate packets at the desired rate of 1/C per second. We found, as expected, that the packet rate was reduced in the unconditional case, and packets were sent at .82/C packets per second, a reduction by 18%. We also performed a stochastic analysis of the unconditional algo- rithm in steady state. The analysis demonstrated that the packet intervals, instead of being uniformly distributed between 1/2 and 3/2 of the deterministic interval, were distributed with a density of (y-1/2)e y-1/2dybetween1/2and3/2.Theresultisthatthepacket rates are reduced by 1-ac1e-3/2, or 18%, matching our simulations exactly. In the second, slightly oscillating scenario, unconditional reconsid- eration and the current algorithm performed identically to their behavior in the first scenario. Conditional reconsideration, however, exhibited an average packet rate of .91/C, a reduction by 9%. This makes sense. When a user is about to send an RTCP packet, half the time the group size is larger than when the last packet is sent, activating the reconsideration. The other half of the time, the group size is slightly less, and the packet is sent, as if there were no reconsideration. Thus, the packet rate should be halfway between unconditional and the current algorithm. We also ran some simulations to investigate the fairness properties J.Rosenberg,H.Schulzrinne [Page 21] Internet Draft Reconsideration July 1997 of the algorithm. By fairness, we mean the variation of the number of packets transmitted per user, across all users. In a perfectly fair system, all users should have transmitted the same number of packets. We found all algorithms, including the current RTCP algorithm, to be extremely fair, with coefficients of variation below 0.005 after about an hour of running time. Finally, we investigated the impact of reconsideration on synchro- nization. The problem of synchronization in the Internet was studied by Van Jacobson and Sally Floyd in [18]. Their study focused on the synchronization of periodic routing messages, such as those generated by RIP or IGRP. However, they generalize their results to any system which is characterized by their periodic messages model. Fortunately, the RTCP feedback mechanism fits perfectly into this model, making their results directly applicable here. Although reconsideration reduces the randomness of the interval, the reduction is negligible compared to the amount required to induce synchronization. 8 Summary and Future Work RTP was meant to support real-time communications ranging from two- party telephone calls to broadcast applications with very large user populations. It incorporates an adaptive feedback mechanism that allows scaling to moderately sized groups, but shows a number of deficiencies once the group size exceeds on the order of a thousand. The problems can be summarized as congestion, convergence delays and state storage problems. We have solved the congestion problem via a simple algorithm called reconsideration. Both analysis and simulation have shown that the algorithm reduces the initial congestion by orders of magnitude under a variety of conditions. Furthermore, the algorithm is backwards compatible with the existing RTCP algorithm, allowing for a simple migration path. The reconsideration algorithm has been implemented as part of a gen- eric RTP Library, and is now operational in several common Mbone tools, such as rat and Nevot. It has also been proposed to the IETF as an improvement to the RTP specification, and is likely to be incorporated into the next release. Future work involves considering the problem of simultaneous leaves, to which reconsideration cannot be directly applied. More work is also needed to solve the other RTP scalability problems. 9 Acknowledgments The authors wish to thank Steve Casner, T.V. Lakshman, Sanjay Mithal, Daniel Rubenstein, Bernhard Suter, and Don Towsley for their insight- ful comments and discussion. J.Rosenberg,H.Schulzrinne [Page 22] Internet Draft Reconsideration July 1997 10 Bibliography [1] ITU-T, Recommendation H.323 - Visual Telephone Systems and Equipment for Local Area Networks which Provide Non-Guaranteed Qual- ity of Service , February 1996. [2] M. Handley, H. Schulzrinne, and E. Schooler, SIP: Session initia- tion protocol, Internet Draft, Internet Engineering Task Force, Dec. 1996. Work in progress. [3] H. Schulzrinne, A real-time stream control protocol (RTSP'), Internet Draft, Internet Engineering Task Force, Dec. 1996. Work in progress. [4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: a transport protocol for real-time applications, Request for Comments (Proposed Standard) RFC 1889, Internet Engineering Task Force, Jan. 1996. [5] I. Busse, B. Deffner, and H. Schulzrinne, Dynamic QoS control of multimedia applications based on RTP, Computer Communications , vol. 19, pp. 4958, Jan. 1996. [6] J. Robinson and J. Stewart, Multimon - an ipmulticast monitor. The MultiMON web page can be found at http://www.merci.crc.doc.ca/mbone/MultiMON. [7] S. McCanne, V. Jacobson, and M. Vetterli, Receiver-driven layered multicast, in SIGCOMM Symposium on Communications Architectures and Protocols , (Palo Alto, California), Aug. 1996. [8] K. Almeroth and M. Ammar, Multicast group behavior in the internet's multicast backbone (mbone), IEEE Communications Magazine , vol. 35, June 1997. [9] S. Casner and V. Jacobson, Compressing IP/UDP/RTP headers for low-speed serial links, Internet Draft, Internet Engineering Task Force, Nov. 1996. Work in progress. [10] K. Almeroth and M. Ammar, Collecting and modeling the join/leave behavior of multicast group members in the mbone, in Proceedings of the Symposium on High Performance Distributed Computing , (Syracuse, NY), pp. 20916, IEEE, Aug. 1996. [11] B. Aboba, Alternatives for enhancing RTCP scalability, Internet Draft, Internet Engineering Task Force, Jan. 1997. Work in progress. [12] P. Sharma, D. Estrin, S. Floyd, and V. Jacobson, Scalable timers J.Rosenberg,H.Schulzrinne [Page 23] Internet Draft Reconsideration July 1997 for soft state protocols, in Proc. of IEEE Infocom , 1997. [13] P. Sharma, D. Estrin, S. Floyd, and V. Jacobson, Scalable timers for soft state protocols, technical report, University of Southern California, June 1996. [14] J. Veizades, E. Guttman, and C. Perkins, Service location proto- col, Request for Comments (Proposed Standard) RFC 2165, Internet Engineering Task Force, June 1997. [15] J. Rosenberg, H. Schulzrinne, and B. Suter, Wide area service location, (internet draft), Internet Engineering Task Force, July 1997. Work In Progress. [16] Y. Rekhter and T. Li, A border gateway protocol 4 (BGP-4), Request for Comments (Draft Standard) RFC 1771, Internet Engineering Task Force, Mar. 1995. (Obsoletes RFC1654). [17] T. Pusateri, Distance vector multicast routing protocol, Inter- net Draft, Internet Engineering Task Force, Sept. 1996. Work in pro- gress. [18] S. Floyd and V. Jacobson, The synchronization of periodic rout- ing messages, IEEE/ACM Transactions on Networking , vol. 2, pp. 122136, Apr. 1994. J.Rosenberg,H.Schulzrinne [Page 24]