NetNews Usenet Archive 1992 #30

Xref: sparky comp.unix.wizards:5234 comp.unix.admin:6653
Newsgroups: comp.unix.wizards,comp.unix.admin
Path: sparky!uunet!elroy.jpl.nasa.gov!swrinde!gatech!nntp.msstate.edu!news
From: fwp@CC.MsState.Edu (Frank Peters)
Subject: Re: Causes Of Death On Heavily-Loaded UNIX System?
Message-ID: <1992Dec15.151754.27354@ra.msstate.edu>
Sender: news@ra.msstate.edu
Nntp-Posting-Host: jester.cc.msstate.edu
Organization: Computing Center, Mississippi State University
References: <1gjlnjINN9eb@smurf.sti.com>
Date: Tue, 15 Dec 1992 15:17:54 GMT
Lines: 117

In article <1gjlnjINN9eb@smurf.sti.com> westes@smurf.sti.com (Will Estes) says:
: A local SunOS host is experiencing extremely poor performance, and I am
: interested in understanding a little more about the dynamics of UNIX
: performance given a large number of users.  The system in question is a
: SPARC2 system that routinely gets about 80 users on it.  The workload
: mix is probably something like this:
:
: 50% mail
: 30% news
: 20% compiles or other activities with heavy file I/O
:
: A typical symptom of poor performance on the subject system would be
: very slow character echoing in a vi session, with occasional delays of
: as much as 15 seconds when you type a character.
:
: What are the major factors that contribute to performance death on a
: system that is heavily loaded by the sheer number of users?  If you
: could list the major causes of system slowdown, what would be the
: percentage contribution of each cause to the total slowdown of performance
: of the system?
:
: I realize that performance is a very complex topic, and a lot depends on
: particulars of a situation.  Still, I am hoping that some
: generalizations can be made, even based on the sparse facts listed
: above.  Thanks to all who respond.

There are some useful bits of information left out.  How much memory
does the system have?  How many disks on how many controllers?
Are typical users connecting via telnet?
Are any of them running X applications (reading news via xvnews is
a lot different from reading news via rn)?  Does the system have a
graphic console that is in regular use?  Is the poor performance
fairly steady or does it come in fits?  (For example, does character
echo get slow for a few seconds and then quick and then slow and then
quick, or does it get slow and stay slow for a while?)

Keeping what I don't know about your system and your last paragraph
above firmly in mind, I'd look at vmstat, ps and iostat and think about
memory, disk IO and context switches.

Run the command "vmstat 5 10" and ignore the first line of data.

To the far left, under procs, look at the b and w columns.  Ideally
these should sit at 0.  Anything under the b column indicates processes
blocked waiting for IO.  Consistently nonzero numbers there can indicate
a disk bottleneck, among other things.  The w column is a more likely
suspect.  It indicates potentially active jobs that are swapped out.
Anything there is a sign of memory limitations.

Also look at the po column under the 'page' heading.  This indicates
page outs and it too should ideally sit at 0.  Like the w column,
anything there can be a sign of memory shortage.

The cs column under the 'faults' heading indicates the context switch
rate.  A lot of context switches can give a distinct knee in your
system performance curves.  Unfortunately, there isn't any particular
number to look for here and relatively little you can do about it.
Different Sun systems have different numbers of hardware contexts and
can thus deal with different context switch rates.  But such a change
requires a system upgrade...you can't just buy more contexts for your
SS2.

Run "ps -au".  Look for individual processes using an unusually high
%CPU or %MEM.  It only takes one process allocating 30MB of memory or
using 30% of CPU to drag down the whole system.  Often you can find some
unusual process that you were unaware of which can be optimised in some
way or run in off hours.
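
As a rough illustration of the vmstat checks described above, here is
a minimal Python sketch that counts how many samples show nonzero b, w
and po values.  It assumes a two-line header whose second line starts
with "r b w", followed by rows of integers.  The function names and the
sample text are invented for illustration (the sample is simplified and
leaves out the disk columns); they are not real output from any system.

```python
# Sketch only: flag the vmstat columns discussed above (b, w, po).
# Assumes a header line beginning "r b w" followed by numeric rows.
# SAMPLE is made-up, simplified text, not captured from a real host.

WATCHED = ("b", "w", "po")  # ideally all zero, per the discussion above

SAMPLE = """\
 procs     memory              page               faults     cpu
 r b w   avm   fre  re at  pi  po  fr  de  sr  in  sy  cs us sy id
 1 0 0     0  1200   0  1   2   0   0   0   0  40 120  30  5  3 92
 3 1 2     0   300   0  4  10  25  30   0  12 210 900 450 40 35 25
 2 0 1     0   280   0  2   6  12  15   0   8 190 850 400 38 30 32
 4 2 2     0   250   0  5  14  30  36   0  15 230 990 510 45 40 15
"""


def parse_vmstat(text):
    """Return one dict per data row, skipping the first data row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The column-name line is the one that starts with "r b w".
    header_at = next(i for i, ln in enumerate(lines)
                     if ln.split()[:3] == ["r", "b", "w"])
    names = lines[header_at].split()
    rows = []
    for ln in lines[header_at + 1:]:
        fields = ln.split()
        if len(fields) != len(names) or not all(f.isdigit() for f in fields):
            continue  # repeated headers or anything unexpected
        row = {}
        for name, value in zip(names, fields):
            # "sy" appears under both faults and cpu; keep the first one.
            row.setdefault(name, int(value))
        rows.append(row)
    return rows[1:]  # the first line of data is to be ignored


def nonzero_counts(rows, columns=WATCHED):
    """Count the samples in which each watched column was above zero."""
    return {c: sum(1 for r in rows if r.get(c, 0) > 0) for c in columns}


if __name__ == "__main__":
    print(nonzero_counts(parse_vmstat(SAMPLE)))
```

To try it on real data, save the output of "vmstat 5 10" to a file and
pass the file's contents to parse_vmstat in place of SAMPLE.  Columns
that are nonzero in most samples are the ones worth a closer look.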

In the ps output, look under the STAT heading for processes in P or D
state.  These indicate processes waiting for page in or IO and can
reflect a memory or disk IO constraint.

Run "iostat -d 5 10".  If you have more than 4 disks use the -l option
to display more drives ("iostat -l 5 -d 5 10" for 5 drives, for
example).  Ignore the msps columns and the first line of data.

This gives a feel for how busy your disks are.  Using this information
can be tricky.  The ideal, of course, is for all of your disks to be
equivalent and have the load evenly distributed among them.  In
practice different disks have different performance, and without special
disk striping software (like Online Disksuite) you may not be able to
balance the load.  But if you see a lot of activity on your cheapest,
slowest disk you may have found a bottleneck (especially if you found a
lot of processes waiting for IO with vmstat and ps).  If you have more
than one disk controller, look at how the load is distributed between
the controllers (are, for example, all of your busy disks on one
controller?).

These are all short term things to look at.  To really get a useful
feel you need to run these things over the long term at relatively
idle, moderately loaded and peak periods.  Many of the numbers
reported by performance monitoring tools don't have any right answer.
All you can do is look at them at various load levels and see which
ones show unusual values when load is high.

Think hard about memory.  UNIX likes to have a lot of memory, and when it
runs out and has to start going to disk, performance goes downhill
fast.  This seems to be by far the most common constraint on most
systems.

Also think hard about the disk system.  On larger systems disk load is
distributed across multiple disks on multiple controllers.  Yet it is
fairly common in the workstation server world to try to hang all of the
disk off the single built-in SCSI controller with a few big disks.
You might be surprised at the performance boost a second (or third)
controller would give if you don't already have one.

BIG DISCLAIMER TIME.  There are entire books written about performance
tuning that cover a wide variety of issues like tuning kernel
parameters and long term monitoring strategies.  I strongly suggest
buying one of them.  These few lines (well...these several lines then)
can only be a rough first pass at the task of finding out what your
system is doing.

--
Frank Peters - UNIX Systems Programmer - Mississippi State University
Internet: fwp@CC.MsState.Edu - Phone: (601)325-7030 - FAX: (601)325-8921