- Xref: sparky comp.unix.wizards:5234 comp.unix.admin:6653
- Newsgroups: comp.unix.wizards,comp.unix.admin
- Path: sparky!uunet!elroy.jpl.nasa.gov!swrinde!gatech!nntp.msstate.edu!news
- From: fwp@CC.MsState.Edu (Frank Peters)
- Subject: Re: Causes Of Death On Heavily-Loaded UNIX System?
- Message-ID: <1992Dec15.151754.27354@ra.msstate.edu>
- Sender: news@ra.msstate.edu
- Nntp-Posting-Host: jester.cc.msstate.edu
- Organization: Computing Center, Mississippi State University
- References: <1gjlnjINN9eb@smurf.sti.com>
- Date: Tue, 15 Dec 1992 15:17:54 GMT
- Lines: 117
-
- In article <1gjlnjINN9eb@smurf.sti.com> westes@smurf.sti.com (Will Estes) says:
- : A local SunOS host is experiencing extremely poor performance, and I am
- : interested in understanding a little more about the dynamics of UNIX
- : performance given a large number of users. The system in question is a
- : SPARC2 system that routinely gets about 80 users on it. The workload
- : mix is probably something like this:
- :
- : 50% mail
- : 30% news
- : 20% compiles or other activities with heavy file I/O
- :
- : A typical symptom of poor performance on the subject system would be
- : very slow character echoing in a vi session, with occasional delays of
- : as much as 15 seconds when you type a character.
- :
- : What are the major factors that contribute to performance death on a
- : system that is heavily loaded by the sheer number of users? If you
- : could list the major causes of system slowdown, what would be the
- : percentage contribution of each cause to the total slowdown of performance
- : of the system?
- :
- : I realize that performance is a very complex topic, and a lot depends on
- : particulars of a situation. Still, I am hoping that some
- : generalizations can be made, even based on the sparse facts listed
- : above. Thanks to all who respond.
-
- There are some useful bits of information left out. How much memory
- does the system have? How many disks on how many controllers? Are
- typical users connecting via telnet? Are any of them running X
- applications (reading news via xvnews is a lot different from reading
- news via rn)? Does the system have a graphic console that is in
- regular use? Is the poor performance fairly steady or does it come in
- fits? (for example, does character echo get slow for a few seconds and
- then quick and then slow and then quick or does it get slow and stay
- slow for a while?)
-
- Keeping what I don't know about your system and your last paragraph
- above firmly in mind, I'd look at vmstat, ps and iostat and think about
- memory, disk IO and context switches.
-
- Run the command "vmstat 5 10" and ignore the first line of data.
-
- To the far left, under procs look at the b and w columns. Ideally
- these should sit at 0. Anything under the b column indicates processes
- blocked waiting for IO. Any consistent numbers there can indicate a
- disk bottleneck among other things. The w column is a more likely
- suspect. It indicates potentially active jobs that are swapped out.
- Anything there is a sign of memory limitations.
-
- Also look at the po column under the 'page' heading. This indicates
- page outs and it too should ideally sit at 0. Like the w column,
- anything there can be a sign of memory shortage.
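-
- A rough sketch of that check in Python, for anyone who would rather
- not eyeball the columns. It assumes the column names all sit on one
- header line and that every data line is plain whitespace-separated
- integers. The function names and the sample figures are invented for
- illustration; they do not come from a real system.

```python
def parse_vmstat(text):
    """Return (column_names, data_rows) with the first data line dropped."""
    tokens = [ln.split() for ln in text.splitlines() if ln.strip()]
    # Assumption: the column-name line is the first one holding b, w and po.
    start = next(i for i, t in enumerate(tokens) if {"b", "w", "po"} <= set(t))
    names = tokens[start]
    rows = [
        [int(x) for x in t]
        for t in tokens[start + 1:]
        if len(t) == len(names) and all(x.isdigit() for x in t)
    ]
    return names, rows[1:]  # ignore the first line of data, as advised above


def warning_signs(names, rows):
    """List (column, samples_above_zero, total_samples) for b, w and po."""
    signs = []
    for col in ("b", "w", "po"):
        i = names.index(col)  # first match; 'sy' repeats but these do not
        hits = sum(1 for r in rows if r[i] > 0)
        if hits:
            signs.append((col, hits, len(rows)))
    return signs


# Invented sample in the layout assumed above.
SAMPLE = """\
 procs     memory                       page      disk       faults     cpu
 r b w   avm  fre  re at  pi  po  fr  de  sr s0 s1 s2 s3  in  sy  cs us sy id
 1 0 0     0 2048   0  3   4   9   0   0   0  1  0  0  0  40 200 150 10  5 85
 2 1 3     0  512   0  5   8  12  20   0  15  9  0  0  0  90 400 310 30 20 50
 1 0 2     0  480   0  2   4   0   8   0   6  4  0  0  0  70 350 280 25 15 60
"""

names, rows = parse_vmstat(SAMPLE)
for col, hits, total in warning_signs(names, rows):
    print(f"{col}: above 0 in {hits} of {total} samples")
```

- Feed it the saved output of "vmstat 5 10". A column that is above 0
- in most samples deserves more attention than one that blips once.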
-
- The cs column under the 'faults' heading indicates the context switch
- rate. A lot of context switches can give a distinct knee in your
- system performance curves. Unfortunately, there isn't any particular
- number to look for here and relatively little you can do about it.
- Different Sun systems have different numbers of hardware contexts and
- can thus deal with different context switch rates. But such a change
- requires a system upgrade...you can't just buy more contexts for your
- SS2.
-
- Run "ps -au". Look for individual processes using an unusually high
- %CPU or %MEM. It only takes one process allocating 30MB of memory or
- using 30% of CPU to hit the whole system. Often you can find some
- unusual process that you were unaware of which can be optimised in some
- way or run in off hours.
-
- Under the STAT heading look for processes in P or D state. These
- indicate processes waiting for page in or IO and can reflect memory or
- disk IO constraints.
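-
- The same two checks can be sketched in Python. This assumes the
- "ps -au" header names its columns %CPU, %MEM and STAT, that the fields
- up to STAT are separated by whitespace, and that the first letter of
- STAT is the process state. The 30% thresholds, the function name and
- the sample listing are all made up for illustration.

```python
def suspicious_processes(ps_text, cpu_limit=30.0, mem_limit=30.0):
    """Return [(pid, [reasons])] for processes worth a closer look."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_i, cpu_i, mem_i, stat_i = (
        header.index(n) for n in ("PID", "%CPU", "%MEM", "STAT")
    )
    hits = []
    for ln in lines[1:]:
        # Split only up to STAT; START and COMMAND may contain spaces.
        fields = ln.split(None, stat_i + 1)
        if len(fields) <= stat_i:
            continue
        reasons = []
        if float(fields[cpu_i]) >= cpu_limit:
            reasons.append("cpu")
        if float(fields[mem_i]) >= mem_limit:
            reasons.append("mem")
        if fields[stat_i][0] in "PD":  # waiting for page in or IO
            reasons.append("waiting")
        if reasons:
            hits.append((int(fields[pid_i]), reasons))
    return hits


# Invented listing in the layout assumed above.
PS_SAMPLE = """\
USER       PID %CPU %MEM   SZ  RSS TT STAT START  TIME COMMAND
alice     2301 42.5  3.1  900  700 p3 R    09:12  12:40 cc1 big.c
bob       2410  0.3  1.0  200  150 p4 S    Dec 14  0:01 rn
carol     2555  1.2 35.0 9000 8000 p5 D    10:02  0:30 sim
dave      2600  0.0  0.5  100   60 p6 IW   08:00  0:00 -csh
"""

for pid, reasons in suspicious_processes(PS_SAMPLE):
    print(pid, ", ".join(reasons))
```

- If wide values make the columns run together on your system, the
- whitespace split will not hold and you would need to cut the fields
- by position instead.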
-
- Run "iostat -d 5 10". If you have more than 4 disks use the -l option
- to display more drives ("iostat -l 5 -d 5 10" for 5 drives, for
- example). Ignore the msps columns and the first line of data. This
- gives a feel for how busy your disks are.
-
- Using this information can be tricky. The ideal, of course, is for all
- of your disks to be equivalent and have the load evenly distributed
- among them. In practice different disks have different performance and
- without special disk striping software (like Online Disksuite) you may
- not be able to balance the load. But if you see a lot of activity on
- your cheapest, slowest disk you may have found a bottleneck (especially
- if you found a lot of processes waiting for IO with vmstat and ps). If
- you have more than one disk controller look at how the load is
- distributed between the controllers (are, for example, all of your busy
- disks on one controller?).
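-
- To put a number on how the load is spread, something like the sketch
- below averages the tps column per disk. It assumes "iostat -d" prints
- one line of disk names, then one line of column names repeated for
- each disk (bps tps msps here), then whitespace-separated numbers.
- Check that against your own output first. The function names and the
- sample figures are invented.

```python
def _numbers(tokens):
    """Convert tokens to floats, or return None if any is not a number."""
    try:
        return [float(t) for t in tokens]
    except ValueError:
        return None


def disk_load(iostat_text):
    """Return {disk: average tps}, ignoring the first line of data."""
    lines = [ln.split() for ln in iostat_text.splitlines() if ln.strip()]
    disks, cols = lines[0], lines[1]
    per_disk = len(cols) // len(disks)  # columns printed for each disk
    tps_off = cols.index("tps")         # position of tps within a group
    samples = [
        nums
        for nums in (_numbers(t) for t in lines[2:] if len(t) == len(cols))
        if nums is not None
    ][1:]
    if not samples:
        return {}
    return {
        d: sum(row[i * per_disk + tps_off] for row in samples) / len(samples)
        for i, d in enumerate(disks)
    }


# Invented sample in the layout assumed above.
IOSTAT_SAMPLE = """\
          sd0           sd1           sd3
 bps tps msps  bps tps msps  bps tps msps
   5   1  0.0    4   1  0.0    2   0  0.0
  80  30  0.0    8   2  0.0    0   0  0.0
 120  50  0.0   16   6  0.0    0   0  0.0
"""

load = disk_load(IOSTAT_SAMPLE)
for disk in sorted(load, key=load.get, reverse=True):
    print(f"{disk}: {load[disk]:.1f} tps on average")
```

- In the made-up sample nearly all of the traffic lands on sd0, which
- is the kind of lopsided picture worth chasing.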
-
- These are all short term things to look at. To really get a useful
- feel you need to run these things over the long term at relatively
- idle, moderately loaded and peak periods. So many of the numbers
- reported by performance monitoring tools don't have any right answer.
- All you can do is look at them at various load levels and see which
- ones show unusual values when load is high.
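-
- One way to make that comparison less of a judgment call is sketched
- below: keep per-column averages from an idle period and a peak period
- and list the columns that moved the most. The factor of 3, the
- function name and the figures are arbitrary choices for illustration,
- not recommended values.

```python
def stands_out(idle, peak, factor=3.0):
    """Return column names whose peak average is far above the idle one."""
    flagged = []
    for name in sorted(peak):
        low, high = idle.get(name, 0), peak[name]
        # Anything that was 0 when idle and is not at peak is flagged;
        # otherwise require the peak to be `factor` times the idle value.
        if (low == 0 and high > 0) or (low > 0 and high / low >= factor):
            flagged.append(name)
    return flagged


# Invented averages for two load levels.
idle_avg = {"b": 0, "w": 0, "po": 0, "cs": 120}
peak_avg = {"b": 0, "w": 4, "po": 25, "cs": 260}

print(stands_out(idle_avg, peak_avg))
```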
-
- Think hard about memory. UNIX likes to have a lot of memory and when
- it runs out and has to start going to disk, performance goes downhill
- fast. This seems to be by far the most common constraint on most
- systems.
-
- Also think hard about the disk system. On larger systems disk load is
- distributed across multiple disks on multiple controllers. Yet it is
- fairly common in the workstation server world to try to hang all of the
- disk off the single builtin SCSI controller with a few big disks. You
- might be surprised at the performance boost a second (or third)
- controller would make if you don't already have them.
-
- BIG DISCLAIMER TIME. There are entire books written about performance
- tuning that cover a wide variety of issues like tuning kernel
- parameters and long term monitoring strategies. I strongly suggest
- buying one of them. These few lines (well...these several lines then)
- can only be a rough first pass at the task of finding out what your
- system is doing.
-
- --
- Frank Peters - UNIX Systems Programmer - Mississippi State University
- Internet: fwp@CC.MsState.Edu - Phone: (601)325-7030 - FAX: (601)325-8921
-