- Path: sparky!uunet!ukma!usenet.ins.cwru.edu!agate!toe.CS.Berkeley.EDU!bostic
- From: bostic@toe.CS.Berkeley.EDU (Keith Bostic)
- Newsgroups: comp.unix.bsd
- Subject: Re: Largest file size for 386BSD ?
- Date: 9 Nov 1992 18:58:25 GMT
- Organization: University of California, Berkeley
- Lines: 128
- Message-ID: <1dmcchINNt54@agate.berkeley.edu>
- References: <1992Nov6.031757.20766@ntuix.ntu.ac.sg> <1992Nov6.173454.17896@fcom.cc.utah.edu>
- NNTP-Posting-Host: toe.cs.berkeley.edu
-
- There are four issues for file size in a UNIX-like system:
-
- 1: the off_t type, the file "offset" measured in bytes
- 2: the logical block type, measured in X block units
- 3: the physical block type, measured in Y block units
- 4: the number of data blocks reachable through the meta-data blocks
-
- The off_t is the value returned by lseek, and in all BSD systems with
- the exception of 4.4BSD, it's a 32-bit signed quantity. In 4.4BSD,
- it's a 64-bit signed quantity. (As a side-note, this change broke
- every application on the system. The two big issues were programs
- that depended on fseek and lseek returning similar values, and
- programs that explicitly cast lseek values to longs.) The 32-bit
- off_t limit means that files cannot grow to be more than 2G in size;
- the 64-bit limit means that you don't have to worry about it, because
- the next three limits are going to kick in first. So, the bottom line
- for this limit is 2^(N-1) - 1 bytes for an N-bit off_t; the type has
- to be signed because a single out-of-band value, -1, is used to
- denote an error.
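-
- To make the fseek/lseek point concrete, here is a sketch of the cast
- that broke and of the out-of-band error check (illustrative user-level
- code, not anything from the 4.4BSD sources; the file name is just an
- example):
-
-         #include <sys/types.h>
-
-         #include <err.h>
-         #include <fcntl.h>
-         #include <stdio.h>
-         #include <unistd.h>
-
-         int
-         main(void)
-         {
-                 off_t pos;
-                 int fd;
-
-                 if ((fd = open("/etc/passwd", O_RDONLY)) == -1)
-                         err(1, "open");
-
-                 /* -1 is the single out-of-band value lseek uses to
-                  * report an error; every other value in the signed
-                  * range is a legal offset, hence the 2^(N-1) - 1
-                  * byte ceiling. */
-                 if ((pos = lseek(fd, (off_t)0, SEEK_END)) == (off_t)-1)
-                         err(1, "lseek");
-
-                 /* The classic breakage: stuffing the result into a
-                  * long silently truncates offsets past 2^31 - 1 once
-                  * off_t is a 64-bit quantity. */
-                 printf("size: %ld\n", (long)pos);
-                 return (0);
-         }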
-
- The second limit is the logical block type, which in a BSD system is
- a daddr_t, a signed 32-bit quantity. The logical block type bounds
- the number of logical blocks that a file may have. The reason this
- has to be a signed quantity is that the "name space" for logical blocks
- is split into two parts, the data blocks and the meta-data blocks.
- Before 4.4BSD, the FFS used physical addresses for meta-data, so this
- division wasn't necessary; however, that approach requires knowing
- the disk address of a block at all times. In a log-structured file
- system, since you don't know the address until you actually write the
- block (for lots of reasons), the "logical" name space has to be divided
- up. In the 4BSD LFS (and the 4.4BSD FFS and the Sprite LFS) the
- logical name space is split by the top bit, i.e. "negative" block
- numbers are meta-data blocks. So, the bottom line for this limit is
- 2^31 logical blocks in a file.
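-
- As a sketch of what that split looks like (the type and predicate
- below are hypothetical stand-ins, not the actual LFS declarations):
-
-         #include <stdio.h>
-
-         typedef int daddr32_t;  /* stand-in for a 32-bit daddr_t */
-
-         /* With the name space split on the top bit, the "negative"
-          * half names meta-data blocks and the non-negative half,
-          * 0 .. 2^31 - 1, names data blocks. */
-         static int
-         is_meta(daddr32_t bn)
-         {
-                 return (bn < 0);
-         }
-
-         int
-         main(void)
-         {
-                 printf("block 100: %s\n", is_meta(100) ? "meta" : "data");
-                 printf("block -5:  %s\n", is_meta(-5) ? "meta" : "data");
-                 return (0);
-         }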
-
- The third limit is the physical block type. In UNIX-like systems, the
- physical block is also a daddr_t. In the FFS, the physical block is
- the fragment, and the FFS addresses the disk in units of fragments,
- i.e. an 8K-block/1K-fragment file system addresses the disk in 1K
- units. This limits the size of the physical device.
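-
- For example (back-of-the-envelope arithmetic only, assuming a signed
- 32-bit daddr_t and 1K fragments):
-
-         #include <stdio.h>
-
-         int
-         main(void)
-         {
-                 /* A signed 32-bit daddr_t can name 2^31 fragments;
-                  * at 1K per fragment the device tops out at 2T. */
-                 long long maxfrags = 1LL << 31;
-                 long long fragsize = 1024;
-
-                 printf("max device: %lld bytes (~2T)\n",
-                     maxfrags * fragsize);
-                 return (0);
-         }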
-
- The fourth limit is the number of data blocks that are accessible
- through direct and (up to triple) indirect addressing. In 4BSD there
- are 12 (NDADDR) direct blocks and 3 (NIADDR) levels of indirection,
- for a total of:
-
- NDADDR +
- NINDIR(blocksize) + NINDIR(blocksize)^2 + NINDIR(blocksize)^3
-
- data blocks.
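-
- Here is a little user-level sketch that evaluates this formula for
- both 4-byte and 8-byte disk addresses (NDADDR and the NINDIR
- computation are re-derived here, not pulled from the kernel headers);
- it reproduces the tables below, before the 2^31 logical-block cap
- from limit #2 is applied:
-
-         #include <stdio.h>
-
-         #define NDADDR  12      /* direct blocks in the inode */
-
-         /* daddr_t's that fit in one indirect block. */
-         static unsigned long long
-         nindir(unsigned long bsize, unsigned long daddrsize)
-         {
-                 return (bsize / daddrsize);
-         }
-
-         /* NDADDR + NINDIR + NINDIR^2 + NINDIR^3 data blocks. */
-         static unsigned long long
-         maxdblocks(unsigned long bsize, unsigned long daddrsize)
-         {
-                 unsigned long long n = nindir(bsize, daddrsize);
-
-                 return (NDADDR + n + n * n + n * n * n);
-         }
-
-         int
-         main(void)
-         {
-                 unsigned long bs;
-
-                 for (bs = 512; bs <= 16384; bs *= 2)
-                         printf("%5lu: %11llu (4-byte) %11llu (8-byte)\n",
-                             bs, maxdblocks(bs, 4), maxdblocks(bs, 8));
-                 return (0);
-         }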
-
- Given 64-bit off_t's and 32-bit daddr_t's, this all boils down to:
-
-     Block size    # of data blocks    Max file size    Limiting type
-        .5K               2113676          ~  1G               4
-         1K              16843020          ~ 16G               4
-         2K             134480396          ~256G               4
-         4K            1074791436          ~  4T               4
-         8K            2147483648          ~ 16T               2
-        16K            2147483648          ~ 32T               2
-
- Note 1:
- For 32-bit off_t's, the maximum file size is 2G, except for 512
- byte block file systems where it's 1G. The limiting type for
- all of these is #1, except for 512 byte block file systems where
- it's #4.
-
- Note 2:
- If we go to 64-bit daddr_t's, the branching factor goes DOWN,
- because you need 8 bytes in the indirect block for each physical
- block. The table then becomes:
-
-     Block size    # of data blocks    Max file size    Limiting type
-        .5K                266316          ~130M               4
-         1K               2113676          ~  2G               4
-         2K              16843020          ~ 32G               4
-         4K             134480396          ~512G               4
-         8K            1074791436          ~  8T               4
-        16K            8594130956          ~128T               4
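-
- (The table-generator sketched under limit #4 reproduces these numbers
- when handed 8-byte disk addresses: NINDIR is halved, so the three
- indirect terms shrink by factors of 2, 4 and 8, and limit #2 no
- longer caps the 8K and 16K rows.)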
-
-
- >In article <1992Nov6.031757.20766@ntuix.ntu.ac.sg> eoahmad@ntuix.ntu.ac.sg (Othman Ahmad) writes:
-
- >>This will be an important issue because soon we'll have hundreds of
- >>gigabytes instead of megabytes.
- >> It took just 10 years to jump from tens of megabytes to hundreds.
-
- There are two issues that you need to consider. The first is the actual
- physical data that you have, which can probably be satisfied, in 99.99
- percent of the cases, by 2G, let alone 16T. The latter figure is also
- probably fine given what we can physically store on both magnetic and
- tertiary storage. While it is true that big files are getting bigger (by
- roughly an order of magnitude), most files are about the same size they
- were ten years ago, i.e. 40% are under 1K and 80% are under 20K
- [SOSP '91, Mary Baker, Measurements of a Distributed File System].
- Even that order of magnitude isn't all that interesting for this
- case, as most files simply aren't larger than 16T.
-
- The second issue is the addressability of the data. Some applications
- want to store large objects (measured in megabytes) in a huge sparse file.
- These applications may have a 2G disk, but want files sized in terabytes.
- There is no satisfactory answer on most current UNIX systems, but the
- 64-bit daddr_t's would seem to make the situation better.
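-
- A sketch of that sparse-file case (hypothetical file name; it takes
- the 64-bit off_t to get past 2^31 - 1 at all):
-
-         #include <sys/types.h>
-
-         #include <err.h>
-         #include <fcntl.h>
-         #include <unistd.h>
-
-         int
-         main(void)
-         {
-                 int fd;
-
-                 if ((fd = open("huge.db", O_RDWR | O_CREAT, 0666)) == -1)
-                         err(1, "open");
-
-                 /* Seek 1T into the file and write one byte; only the
-                  * blocks actually written are allocated, so the file
-                  * is a terabyte in name but one fragment on disk. */
-                 if (lseek(fd, (off_t)1 << 40, SEEK_SET) == (off_t)-1)
-                         err(1, "lseek");
-                 if (write(fd, "x", 1) != 1)
-                         err(1, "write");
-                 return (0);
-         }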
-
- In article <1992Nov6.173454.17896@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
-
- >Get around the problem:
- >
- >1) Multiple partitions not exceeding the 4 Gig limit.
- >2) Larger terminal blocks.
- >3) Additional indirection levels.
- >4) Assumption of larger files = log-structure file systems (ala Sprite).
-
- The interesting point for me is #4 -- although I'm not really sure what
- you meant. The advantages of LFS are two-fold. First, the features
- that theoretically would be available to applications, due to its
- no-overwrite policy, are attractive, e.g. "unrm", versioning,
- transactions. Second, with multiple writers it has the potential for
- improved performance.
-
- It is becoming clearer, at least to me, that the LFS performance
- advantages are not as obvious as they originally appeared, mostly
- because of the strong effects of the cleaner. I'm starting to agree
- with Larry McVoy [USENIX, January 1991, Extent-like Performance
- from a UNIX File System] that FFS with read/write clustering is just
- as fast as LFS in many circumstances, and faster in lots of large-file
- applications where the disk is over, say, 80% utilized.
-
- Keith Bostic
-
-