-
- Mini_HOWTO: Multi Disk System Tuning
-
- Version 0.2
- Date 960321
- By Stein Gjoen <sgjoen@nyx.net>
-
- This document was written for two reasons: mainly because I got hold
- of 3 old SCSI disks to set up my Linux system on and was pondering
- how best to utilize the inherent possibilities for parallelising in a
- SCSI system, and secondly because I hear there is a prize for people
- who write docs...
-
- This is intended to be read in conjunction with the Linux File System
- Standard (FSSTND). It does not in any way replace it, but tries to
- suggest where physically to place the directories detailed in FSSTND,
- in terms of drives, partitions, types, RAID, file system (fs),
- physical sizes and other parameters that should be considered and
- tuned in a Linux system, ranging from single home systems to large
- servers.
-
- This is also a learning experience for myself and I hope I can start
- the ball rolling with this Mini-HOWTO and that it perhaps can evolve
- into a larger, more detailed and hopefully even more correct HOWTO.
- Notes in square brackets indicate where I need more information.
-
- Note that this is a guide on how to design and map logical partitions
- onto multiple disks and tune for performance and reliability, NOT how
- to actually partition the disks or format them.
-
- This is the first update, still without any inputs...
-
- So let's cut to the chase where swap and /tmp are racing along the
- hard drive...
-
- ---------------------------------------------------------------
-
- 1. Considerations
-
- The starting point in this will be to consider where you are and what
- you want to do. The typical home system starts out with existing
- hardware and the newly converted will want to get the most out of
- existing hardware. Someone setting up a new system for a specific
- purpose (such as an Internet provider) will instead have to consider
- what the goal is and buy accordingly. Being ambitious I will try to
- cover the entire range.
-
- Various purposes will also have different requirements regarding file
- system placement on the drives; a large multiuser machine, for
- example, would probably be best off with the /home directory on a
- separate disk.
-
- In general, for performance it is advantageous to split most things
- over as many disks as possible, but there is a limit to the number of
- devices that can live on a SCSI bus, and cost is naturally also a
- factor.
-
- 1.1 File system features
-
- The various parts of FSSTND have different requirements regarding
- speed, reliability and size; for instance, losing root is a pain but
- easily recovered, while losing /var/spool/mail is a rather different
- issue. Here is a quick summary of some essential parts and their
- properties and requirements. [This REALLY needs some beefing up]:
-
- 1.1.1 Swap
-
- Speed: Maximum! Though if you rely too much on swap you
- should consider buying some more RAM.
-
-              Size:        Quick and dirty algorithm: just as for tea,
- 16M for the machine and 2M for each user. The smallest kernels run in
- 1M but that is tight; use 4M for general work and light applications,
- 8M for X11 or GCC, or 16M to be comfortable. [The author is known to
- brew rather powerful tea...]
-
- Reliability: Medium. When it fails you know it pretty quickly and
- failure will cost you some lost work. You save often, don't you?
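-
- As a sketch, the tea-like rule of thumb above can be written out in a
- few lines; the constants are just the suggestions from the text, not
- hard limits:

```python
def swap_size_mb(users, base_mb=16, per_user_mb=2):
    """Quick and dirty swap sizing: 16M for the machine plus 2M per
    user, as suggested in the text. Tune the constants to taste."""
    return base_mb + per_user_mb * users

# A single user workstation:
print(swap_size_mb(1))    # 18
# A machine with 100 users:
print(swap_size_mb(100))  # 216
```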
-
- 1.1.2 /tmp and /var/tmp
-
-              Speed:       Very high. A separate disk/partition will
- generally reduce fragmentation, though ext2fs handles fragmentation
- rather well.
-
-              Size:        Hard to tell. Small systems are easy to run
- with a few megs, but these are notorious hiding places for stashing
- files away from prying eyes and quota enforcement, and can grow
- without control on larger machines. Suggested: small machines 8M,
- large machines up to 500M (the machine here has 1100 users and a 300M
- /tmp).
-
- Reliability: Low. Often programs will warn or fail gracefully when
- these areas fail or are filled up. Random file errors will of course
- be more serious, no matter what file area this is.
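-
- To put /tmp and /var/tmp on a separate disk you would mount them via
- /etc/fstab; the device names and partitioning below are purely
- illustrative, a sketch rather than a recipe:

```
# Hypothetical /etc/fstab entries: /tmp and /var/tmp on a second disk.
# Device names are examples only; adjust to your own system.
/dev/sdb1    /tmp        ext2    defaults    0  2
/dev/sdb2    /var/tmp    ext2    defaults    0  2
```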
-
- (* That was 50 lines, I am home and dry! *)
-
- 1.1.3 Spool areas (/var/spool/news, /var/spool/mail)
-
- Speed: High, especially on large news servers. News transfer
- and expiring are disk intensive and will benefit from fast drives.
- Print spools: low. Consider RAID0 for news.
-
-              Size:        For news/mail servers: whatever you can
- afford. For single user systems a few megs will be sufficient if you
- read continuously. Joining a list server and then taking a holiday is,
- on the other hand, not a good idea. (Again, the machine I use has 100M
- reserved for the entire /var/spool.)
-
- Reliability: Mail: very high, news: medium, print spool: low. If
- your mail is very important (isn't it always?) consider RAID for
- reliability. [Is mail spool failure frequent? I have never experienced
- it but there are people catering to this market of reliability...]
-
- Note: Some of the news documentation suggests putting all
- the .overview files on a drive separate from the news files, check out
- the news FAQs for more information.
-
- 1.1.4 Home directories (/home)
-
-              Speed:       Medium. Although many programs use /var for
- temporary storage, others, such as some newsreaders, frequently update
- files in the home directory, which can be noticeable on large
- multiuser systems. For small systems this is not a critical issue.
-
-              Size:        Tricky! On some systems people pay for
- storage, so this is then usually a question of economy. Large systems
- such as nyx.net (a free Internet service with mail, news and WWW
- services) run successfully with a suggested limit of 100K per user and
- 300K as a maximum. If however you are writing books or doing design
- work the requirements balloon quickly.
-
- Reliability: Variable. Losing /home on a single user machine is
- annoying but when 2000 users call you to tell you their home
- directories are gone it is more than just annoying. For some their
- livelihood relies on what is here. You do regular backups of course?
-
- Note: You might consider RAID for either speed or
- reliability. If you want extremely high speed and reliability you
- might be looking at other OSes and platforms anyway. (Fault tolerance
- etc.)
-
- 1.1.5 Main binaries (/usr/bin and /usr/local/bin)
-
-              Speed:       Low. Often data is bigger than the programs,
- which are demand loaded anyway, so this is not speed critical. Witness
- the success of live file systems on CD-ROM.
-
- Size: The sky is the limit but 200M should give you most of
- what you want for a comprehensive system. (The machine I use, including
- the libraries, uses about 800M)
-
- Reliability: Low. This is usually mounted under root where all
- the essentials are collected. Nevertheless losing all the binaries is
- a pain...
-
- 1.1.6 Libraries (/usr/lib and /usr/local/lib)
-
- Speed: Medium. These are large chunks of data loaded often,
- ranging from object files to fonts, all susceptible to bloating. Often
- these are also loaded in their entirety and speed is of some use here.
-
- Size: Variable. This is for instance where word processors
- store their immense font files. [actual sizes, anyone? I'd like data
- for GCC related libraries, TeX/LaTeX, X11 and others that can be relevant]
-
- Reliability: Low. See point 1.1.5
-
- 1.1.7 Root
-
- Speed: Quite low: only the bare minimum is here, much of
- which is only run at startup time.
-
-              Size:        Quite small. The biggest file is /vmlinuz;
- unless you have a large rescue file collection, about 4M should be
- sufficient.
-
-              Reliability: High. A failure here will possibly cause a
- lot of grief, and leave you rescuing your boot partition. Naturally
- you do have a rescue disk?
-
- 1.2 Explanation of terms
-
- Naturally, the faster the better, but often the happy installer of
- Linux has several disks of varying speed and reliability, so even
- though this document describes performance as 'fast' and 'slow' it is
- just a rough guide; no finer granularity is feasible. Even so, there
- are a few details that should be kept in mind:
-
- 1.2.1 Speed
-
- This is really a rather woolly mix of several terms: CPU load, transfer
- setup overhead, disk seek time and transfer rate. It is in the very nature
- of tuning that there is no fixed optimum, and in most cases price is the
- dictating factor. CPU load is only significant for IDE systems, where the
- CPU does the transfer itself [more details needed here !!], but is generally
- low for SCSI; see the SCSI documentation for actual numbers. Disk seek time
- is also small, usually in the millisecond range. This is not a problem if
- you use command queuing on SCSI, where commands are overlapped to keep the
- bus busy all the time. News spools are a special case, consisting of a huge
- number of normally small files, so there seek time can become more
- significant.
-
- 1.2.2 Reliability
-
- Naturally no one wants low reliability disks, but one might be better off
- regarding old disks as unreliable. Also, for RAID purposes (see the relevant
- docs) it is suggested to use a mixed set of disks so that simultaneous disk
- crashes become less likely.
-
- 1.3 RAID
-
- This is a method of increasing reliability, speed or both by using multiple
- disks in parallel, thereby reducing effective access time and increasing
- transfer speed. A checksum or mirroring system can be used to increase
- reliability. Large servers can take advantage of such a setup, but it might
- be overkill for a single user system unless you already have a large number
- of disks available. See other docs and FAQs for more information.
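-
- The speed gain of a striping (RAID0) setup comes from spreading
- consecutive blocks round-robin across the member disks, so large
- transfers keep all disks busy at once. A minimal sketch of the idea
- follows; this illustrates the principle only, not any actual md or
- controller code:

```python
def stripe(blocks, n_disks):
    """Distribute consecutive logical blocks round-robin over n_disks,
    as RAID0 striping does. Neighbouring blocks land on different
    disks, so sequential I/O can proceed on all disks in parallel."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

# Six logical blocks striped over three disks:
print(stripe(list(range(6)), 3))  # [[0, 3], [1, 4], [2, 5]]
```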
-
- 1.4 AFS, Veritas and Other Volume Management Systems
-
- Although multiple partitions and disks have the advantage of making for more
- space and higher speed and reliability, there is a significant snag: if for
- instance the /tmp partition is full you are in trouble even if the news spool
- is empty, as it is not easy to transfer quotas across disks. Volume management
- is a system that does just this, and AFS and Veritas are two of the best known
- examples. Some also offer other file systems, like log file systems and others
- optimised for reliability or speed. Note that Veritas is not available (yet)
- for Linux and it is not certain they can sell kernel modules without providing
- source for their proprietary code; this is just mentioned for information on
- what is out there. Still, you can check their web page http://www.veritas.com
- to see how such systems function.
-
- 1.5 Linux md Kernel Patch
-
- There is however one kernel project that attempts to do some of this: md,
- which has been part of the kernel distributions since 1.3.69. Currently
- providing spanning and RAID, it is still in early development and people
- report varying degrees of success as well as total wipe-outs. Use with
- caution.
-
- 1.6 General File System Consideration
-
- In the Linux world, ext2fs is well established as a general purpose file
- system. Still, for some purposes others can be a better choice. News spools
- lend themselves to a log file based system, whereas high reliability data
- might need other formats. This is a hotly debated topic and there are
- currently few choices available, but work is underway. [I believe someone
- from Yggdrasil mentioned a log file based system once, details? And AFS is
- available for Linux I think, sources anyone?]
-
- There is room for access control lists (ACLs) and other unimplemented
- features in the existing ext2fs; stay tuned for future updates. There has
- also been some talk about adding on-the-fly compression.
-
- DouBle already features file compression with some limitations.
- Zlibc adds transparent on-the-fly decompression of files as they load.
-
- Also there is the user file system, which allows an ftp based file system,
- some compression (arcfs), fast prototyping and many other features.
-
-
- 2. Disk Layout
-
- With all this in mind we are now ready to embark on the layout [and no
- doubt controversy]. I have based this on my own method used when I got hold
- of 3 old SCSI disks and boggled over the possibilities.
-
- 2.1 Selection
-
- Determine your needs and set up a list of all the parts of the file system
- you want on separate partitions, sorted in descending order of speed
- requirement, together with how much space you want to give each partition.
-
- If you plan to use RAID, make a note of the disks you want to use and which
- partitions you want to RAID. Remember that various RAID solutions offer
- different speeds and degrees of reliability.
-
- (Just to make it simple I'll assume we have a set of identical SCSI disks
- and no RAID)
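-
- The selection step above can be sketched in a few lines; the mount
- points, speed ranks and sizes below are purely hypothetical examples:

```python
# Hypothetical partitions: (mount point, speed rank 1-9, size in MB).
wanted = [("/home", 5, 500), ("swap", 9, 32), ("/", 2, 20),
          ("/tmp", 8, 50), ("/var/spool/news", 7, 300)]

# Sort in descending order of speed requirement, ready for mapping:
by_speed = sorted(wanted, key=lambda part: part[1], reverse=True)

print([name for name, speed, size in by_speed])
# ['swap', '/tmp', '/var/spool/news', '/home', '/']
```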
-
- 2.2 Mapping
-
- Then we want to place the partitions onto physical disks. The point of the
- following algorithm is to maximise parallelizing and bus capacity. In this
- example the drives are A, B and C and the partitions are 987654321, where 9
- is the partition with the highest speed requirement. Starting at one drive
- we 'meander' the partition line back and forth over the drives in this way:
-
- A : 9 4 3
- B : 8 5 2
- C : 7 6 1
-
- This makes the 'sum of speed requirements' the most equal across each
- drive.
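-
- The meander pattern above can be sketched as a small routine; this is
- just an illustration of the assignment and the names are made up:

```python
def meander(partitions, drives):
    """Walk back and forth over the drives, handing out partitions
    (fastest first) so each drive ends up with a similar mix of fast
    and slow partitions."""
    layout = {d: [] for d in drives}
    i, step = 0, 1
    for p in partitions:
        layout[drives[i]].append(p)
        if not 0 <= i + step < len(drives):
            step = -step      # turn around at either end
        else:
            i += step
    return layout

# Partitions 9 (fastest) to 1 (slowest) over drives A, B and C:
for drive, parts in meander("987654321", "ABC").items():
    print(drive, ":", " ".join(parts))
# A : 9 4 3
# B : 8 5 2
# C : 7 6 1
```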
-
- 2.3 Optimizing
-
- After this there are usually a few partitions that have to be 'shuffled' over
- the drives either to make them fit or if there are special considerations
- regarding speed, reliability, special file systems etc. Nevertheless this
- gives [what this author believes is] a good starting point for the complete
- setup of the drives and the partitions. In the end it is actual use that will
- determine the real needs after we have made so many assumptions. After
- commencing operations one should assume a time comes when a repartitioning
- will be beneficial.
-
- 3. Further Information
-
- There is a wealth of information one should go through when setting up a
- major system, for instance for a news or general Internet service provider.
- The FAQs in the following groups are useful:
-
- News groups:
- comp.arch.storage, comp.sys.ibm.pc.hardware.storage, alt.filesystems.afs,
- comp.periphs.scsi ...
- Mailing lists:
- raid, scsi ...
-
- Many mailing lists are hosted at vger.rutgers.edu, but this machine is
- notoriously overloaded, so try to find a mirror. There are some lists
- mirrored at http://www.redhat.com [more references please!].
-
- [much more info needed here]
-
- 4. Concluding Remarks
-
- Disk tuning and partition decisions are difficult to make, and there are no
- hard rules here. Nevertheless it is a good idea to work on this, as the
- payoffs can be considerable. Maximizing usage of one drive while the others
- are idle is unlikely to be optimal; watch the drive lights, they are not
- there just for decoration. For a properly set up system the lights should
- look like Christmas in a disco. Linux offers software RAID as well as
- support for some hardware based SCSI RAID controllers. Check what is
- available. As your system and experience evolve you are likely to
- repartition, and you might look at this document again. Additions are
- always welcome.
-
-