Return-Path: 
From: ben@REX.RE.uokhsc.edu (Benjamin Z. Goldsteen)
Subject: Re: >2GB file FSes
To: rdv@alumni.caltech.edu (Rodney D. Van Meter)
Date: Thu, 25 Aug 1994 12:46:00 -0500 (CDT)
In-Reply-To: <199408240722.AAA16399@alumni.caltech.edu> from "Rodney D. Van Meter" at Aug 24, 94 00:22:39 am
Reply-To: benjamin-goldsteen@uokhsc.edu
X-Mailer: ELM [version 2.4 PL21]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 6284

> You had a list of Unixes (and other OSes?) that support single files
> larger than 2GB. Can you send me a current copy of that list, and is it
> okay to include it in the FAQ?

I suppose.  It is still somewhat incomplete.  I included it at the end.

> I seem to recall that there is a 3rd-party product that supports large
> files for SunOS, do you know anything about it?

I think there is/was a product called DiskSuite that is/was based on the
Veritas stuff.  I don't think it supports large files because that would
require kernel modifications.

> NFS Version 3 supports 64-bit files? Do you know of anybody who's

I have heard that NFS V.3 is 64-bit.

> implemented it yet? Do you know where it's documented -- presumably an

I don't know of anybody, but I haven't followed it.  I bet Cray or Convex
will be the first (if they haven't already), though.

> RFC?

I believe NFS V.3 is an Internet standard (as opposed to the previous
versions, which were wholly created by Sun?).  If so, there must be an RFC.

> AIX 4.1, announced recently, supports large files.

You corrected that in your later message.  Also, one can only get that if
they make certain changes with respect to block sizes or PP sizes or
something like that.

> Obviously the implementation is trivial on machines with integers
> larger than 32 bits, such as the Cray and Convex, but, as you've noted
> before on comp.arch.storage, doing it on machines with OSes that have
> traditionally had 32-bit file offsets is a problem. Are people adding
> separate calls, or bypassing VFS, or have they hacked the VFS layer so

Unfortunately, many are not doing anything!  However, some of those that are
adding large file support are doing it via separate system calls (I think
Solaris has a system call for large seeks on a device driver -- at least the
version of Solaris used on the Cray SuperServers does).

> that it really takes 64-bit values and somehow still understands
> 32-bit ones?

ConvexOS does something weird that I still have not had a chance to
investigate.  In particular, I think they have separate calls that are
available in certain compiling environments (e.g. off64_t, etc).  Don't
quote me on that...


[excerpts of my last posting on the matter]
     Commercial UNIX's support for filesystems >2GB is becoming fairly
common though it annoys me to no end that our RS/6000-AIX 3.2 system is
limited in that way.  It usually isn't very hard to fix either.  I'm
glad that Linux supports large filesystems though I would guess that
*few* Linux users need 2GB filesystems.

UNIX's that support >2GB filesystems:
-------------------------------------
Cray UNICOS
  Heard it was extent-based and supports 64-bit files and file systems.

Amdahl's UTS (2.1 and 4)
  No additional information

Convex's ConvexOS (~1 TB)
  ConvexOS is heavily based on BSD with substantial enhancements for
  the supercomputer environment.  While they use the FFS, it isn't
  quite stock...Ed Hamrick posted a nice article about its performance a
  while ago.

[New Info: AIX 4.1 supports file systems up to at least 32 GB when the
appropriate options are chosen.  AIX 4.1 also adds true striping among
other things.  AIX's JFS (including the version in AIX 3.2) is a journaled
file system offering fast recovery.  It sits on top of an LVM that offers
mirroring, load balancing (pseudo-striping), partition placement control,
spanning, etc.  Under AIX, disks can be added and removed on the fly.  File
systems/logical volumes can be increased on the fly, too.]

DEC's AXP OSF/1
  OSF's OSF/1 ships with a BSD FFS (as well as other file systems like
  memory filesystems) which DEC provides as the default file system.
  DEC also offers something called The AdvFS which offers journaling,
  on-line addition and removal of disks, per-file striping, etc.  Both
  file systems support large file systems (i.e. tens of millions of
  dollars of disk).

[New info: I think it maxes out at 128 GB]

Data General's DG/UX 5.4 (2 TB)
  UFS (based on BSD FFS?) file systems can sit on a logical volume that
  supports mirroring, striping, and logging for fast fsck'ing.
  DG/UX 5.4.2 allowed non-mirrored, non-striped logical volumes to be
  expanded and shrunk when un-mounted.  DG/UX 5.4.3 offers a new
  virtual disk management system that can reconfigure file systems when
  mounted (among other things).

BSD 4.4
  I believe "long long" files and file systems are supported for both
  the trendy log-structured (LFS) and traditional block/fragment-
  structured (FFS).  Not exactly production...

Linux
  "ext2" is an extent-based file system supporting 2 TB file
  systems.

[New Info: an alpha/beta-level (but "works for me" ;-) defragger is available
for Linux's ext2, xiafs, and original file system.  BTW, I don't know if
Linux's device driver can handle 2 TB of disk...]

SGI's IRIX (3?,4,5) (8 GB)
  SGI IRIX (version 4 at least) supports EFS file systems of up to 8
  GB.  EFS is an extent-based file system.  Such file systems are
  optimized for sequential performance.  SGI includes a defragger (as
  configured from the factory, it runs once a week during the wee
  hours).  SGI also offers a logical volume manager for spanning disks.
  I am not sure if SGI offers any optional file systems.  I am not sure
  what is offered in IRIX 5 or IRIX 6 except that SGI's EFS is limited
  to 8 GB.  SGI is also working on a file system that offers "near raw
  disk performance", though.

[New Info: IRIX 6 still has an 8 GB file system limit because they use the
same EFS from previous versions.  However, IRIX 6 is 64-bit and support for
large file/file systems can be added relatively trivially (the interfaces
are all 64-bit, off_t is 64-bit, etc).  IRIX 5 uses the same EFS as IRIX 4.
However, I understand it is/can be considerably faster]

Sun's SunOS 4 w/DiskSuite
Sun's SunOS 5
  No additional information

HP's HP/UX 9 without LVM (4 GB)
  No additional information

HP's HP/UX 9 with LVM (server only)
  No additional information

Systems with Veritas LVM
  No additional information


UNIX's that support >2GB files:
-------------------------------
Amdahl's UTS
Cray's UNICOS
Convex's ConvexOS (~1 TB)
DEC's AXP OSF/1
BSD 4.4


--
Benjamin Z. Goldsteen
If I find out any further information, I will try to send it to you.

Return-Path: 
From: EdHamrick@aol.com
Subject: Re: Convex FS

ConvexOS is a stock FFS, with a few performance mods I made.  It uses
32-bit integers containing block numbers, where a block is 512 bytes
(a legacy of VAX).  We actually built a 1 TByte file system at NASA
Ames, and wrote a 1 TByte file.  Because of the signed 32-bit limit on
block numbers, this is the maximum.

> Is the integer size on the Convex 32 bits? I thought it was 64. Under
> C, what datatype is 64 bits -- "long" or "long long" or "int64" or...?

int and long are 32 bits, "long long" is 64 bits.  The "A" registers on
the C-series are 32 bit integers, and the "S" registers can hold
64-bit integers or 64-bit floating point (and 32-bit floating point).

> At the VFS layer (or its ConvexOS equivalent) do you not pass an off_t
> (if it's an off_t, what is that declared as)? Is it more like VMS,
> with both a block number and an offset that must be supplied by the
> programmer? If so, then it's not transparent programming to write
> large files. How do you seek to a large offset?

There are two system call interfaces  - lseek and llseek.  The former takes
an off_t and the latter takes an off64_t (long long).

A 64-bit offset is passed all the way down through the code till it needs to
be converted into a block number.  inodes hold 32-bit block numbers, and
these block numbers are in increments of 512 bytes, thus the limit of
1 TByte files and file systems.
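
The arithmetic behind that limit can be checked directly.  What follows is a
rough sketch, not ConvexOS code; it is written in Python only for brevity, and
the function name max_addressable_bytes is made up for illustration.  The only
inputs taken from the message are the signed 32-bit block number and the
512-byte block.

```python
def max_addressable_bytes(block_number_bits: int, block_size: int,
                          signed: bool = True) -> int:
    """Largest byte range reachable with fixed-width block numbers."""
    # A signed field loses one bit to the sign.
    usable_bits = block_number_bits - 1 if signed else block_number_bits
    return (1 << usable_bits) * block_size


TIB = 1 << 40  # one binary terabyte

limit = max_addressable_bytes(32, 512)
print(limit, "bytes =", limit // TIB, "TiB")
```

Under the same assumptions, treating the block number as unsigned would only
double the ceiling to 2 TiB, so the block-number width rather than the 64-bit
offset is the binding constraint.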

That's why I like the OSF/1 approach - there's no 1 TByte limit, and lseek is
fully 64-bit clean (no need for llseek).

Regards,
Ed Hamrick
 
comp.arch.storage #2492 (0 + 37 more)
From: john@iastate.edu (John Hascall)
Subject: Re: File system limits for various UNIX's
Date: Sat Jan 15 08:00:32 PST 1994
Organization: Iowa State University, Ames, IA
Lines: 24

maxstrat@shell.portal.com (Peter van Cruyningen) writes:
}Benjamin Z. Goldsteen  wrote:
}>     I posted a quick list of UNIX's that supported >2GB files and
}>filesystems.  

}What restrictions do these UNIX's have in their client NFS code?  Can they
}support file systems > 2GB, or files > 2GB (for NFS V.3)?

For DEC OSF/1 V1.3, (from "man mountd" and "man nfsd" respectively):

  NFS can export partitions that are greater than 2 gigabytes. However, they
  appear as 2 gigabyte partitions when viewed from NFS clients.

  Files that are larger than 2 gigabytes are exported as 2 gigabyte files,
  because NFS Version 2 is a 32-bit protocol. Therefore, the size and offset
  fields are 32-bit quantities (on Alpha AXP UFS they are 64-bit quantities).
  Use caution when accessing files larger than 2 gigabytes from NFS clients.
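
A small illustration of what the man page text implies for a client, written
in Python for brevity.  The exact value the server reports is not stated
above; this sketch assumes it is the largest value a signed 32-bit field can
hold, and the names NFS2_MAX_SIZE and size_seen_by_v2_client are hypothetical.

```python
# Assumption: the V2 size field is treated as a signed 32-bit quantity.
NFS2_MAX_SIZE = 2**31 - 1


def size_seen_by_v2_client(real_size: int) -> int:
    """File size as an NFS V2 client would see it under the assumption above."""
    return min(real_size, NFS2_MAX_SIZE)


for real in (10 * 2**20, 2**31 - 1, 5 * 2**30):
    print(real, "->", size_seen_by_v2_client(real))
```

Anything at or below the limit passes through unchanged; a 5 GB file on the
server would look like a file of just under 2 GB to the client, which is why
the man page advises caution.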

John
-- 
John Hascall                   ``An ill-chosen word is the fool's messenger.''
Systems Software Engineer
Project Vincent
Iowa State University Computation Center  +  Ames, IA  50011  +  515/294-9551


comp.arch.storage #3017 (1 + 3 more)
From: sesow@csn.org (Timothy S Sesow)
Subject: Re: Unix files bigger than 2^31 bytes
Organization: Colorado SuperNet, Inc.
X-Newsreader: TIN [version 1.2 PL1]
Date: Tue Apr 19 05:46:40 PDT 1994
Lines: 38

Guy Harris (guy@nova.netapp.com) wrote:
: >The Veritas Volume Manager which will soon be available for Solaris supports
: >files larger than 2^31 bytes.

: I didn't think the Veritas Volume Manager supported files, period; I
: thought that was VxFS's job.

: I.e., we're not talking about *file systems* >2^31 bytes, we're talking
: about *individual files* >2^31 bytes.

: Does the SunOS 5.x VFS layer and VM system support file offsets with
: more than 32 bits?  If not, it's difficult to see how a file system
: placed under that VFS layer, and using that VM system, could support
: files of a size that doesn't fit in 32 bits.

There are many file system offerings that support greater than 2^31
bytes of file SYSTEM space, including my company (shameless plug ;-)).
However to get a file size greater than 2^31 bytes you have two 
issues:
1. The O/S needs to support greater than 2^31 bytes per file through
the whole I/O subsystem.  Not really a major job, but has some serious
impacts on the software vendors.  As it stands, the Virtual File
System interface on SunOS (4.x and 5.x) currently uses a 32-bit 
value to pass in the offset for I/O operations, so therefore 
no files greater than 2^31 bytes.

2. The Application needs to support this.  If the application does
an lseek or uses file size in the fstat call, then it just won't work.
If, however, the application uses sequential I/O (just reads or writes)
then it could conceivably work with a changed file system.  
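
To make the application-side problem concrete, here is a sketch (Python, for
brevity) of what happens when an offset past 2^31 is stored in a signed 32-bit
variable, as a program might do if it keeps file positions in a 32-bit "long"
or "int".  The helper name as_int32 is made up for illustration.

```python
import ctypes


def as_int32(offset: int) -> int:
    """Value a signed 32-bit variable holds after being assigned `offset`."""
    # ctypes integer types do no overflow checking; the value simply wraps.
    return ctypes.c_int32(offset).value


for offset in (2**31 - 1, 2**31, 3 * 2**30):
    print(offset, "->", as_int32(offset))
```

The last representable offset survives intact; the next byte wraps to a
negative number, which a seek call would be expected to reject or misinterpret.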


--
--------------------------------------------
Tim Sesow
Automated Network Technologies LLC
P.O. Box 280507, Lakewood, CO 80228


comp.arch.storage #3026 (0 + 0 more)
From: ben@rex.uokhsc.edu (Benjamin Z. Goldsteen)
Subject: Re: Unix files bigger than 2^31 bytes
Date: Wed Apr 20 13:35:47 PDT 1994
Organization: Health Sciences Center, University of Oklahoma
Lines: 36

sesow@csn.org (Timothy S Sesow) writes:

>There are many file system offerings that support greater than 2^31
>bytes of file SYSTEM space, including my company (shameless plug ;-)).
>However to get a file size greater than 2^31 bytes you have two 
>issues:
>1. The O/S needs to support greater than 2^31 bytes per file through
>the whole I/O subsystem.  Not really a major job, but has some serious
>impacts on the software vendors.  As it stands, the Virtual File
>System interface on SunOS (4.x and 5.x) currently uses a 32-bit 
>value to pass in the offset for I/O operations, so therefore 
>no files greater than 2^31 bytes.

>2. The Application needs to support this.  If the application does
>an lseek or uses file size in the fstat call, then it just won't work.

I don't see why this is.  lseek() just has to know how to take >32-bit
offsets.  "struct stat" just needs to have >32-bit "st_size".  This could be
accomplished easily with 64-bit "long"'s and/or "off_t" (of course, 64-bit
"long" and 32-bit "off_t" wouldn't work...).  As we have noticed, operating
systems vendors tend not to want to do this.  This is probably because
various customers will find that their free mixing of "int", "long", "char
*", and/or complete ignorance of "off_t" will no longer work.  They should
be shot.  Both groups.
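
A quick way to see whether a given system's lseek() accepts offsets beyond
2^31 is to try it.  This is a minimal sketch in Python (os.lseek is a thin
wrapper over the C call), assuming the platform's off_t is 64 bits; the helper
name seek_past_2gb is made up for illustration.  Seeking past end-of-file
writes nothing, so no large file is actually created.

```python
import os
import tempfile


def seek_past_2gb(extra: int = 4096) -> int:
    """Seek to 2^31 + extra in an empty temporary file; return the new offset."""
    target = 2**31 + extra
    with tempfile.TemporaryFile() as f:
        # lseek returns the resulting offset measured from the start of the file.
        return os.lseek(f.fileno(), target, os.SEEK_SET)


print(seek_past_2gb())
```

On a system with a 32-bit off_t the same call would be expected to fail with
an error rather than return the requested position.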

[This all assumes C.  What needs to be done for other languages?  What is
involved for typical applications written in Pascal/FORTRAN-77/Fortran-90
/Ada/COBOL/Modula-2,3?]

>If, however, the application uses sequential I/O (just reads or writes)
>then it could conceivably work with a changed file system.  

I don't like that; it is too unpredictable.
-- 
Benjamin Z. Goldsteen
BSD Net/2: What does "it" mean in the sentence "What time is it?"?


Date: Wed, 24 Aug 1994 09:57:57 -0400
From: < friends at Digital >
To: rdv@alumni.caltech.edu
In-Reply-To: Rodney D. Van Meter's message of Wed, 24 Aug 1994 01:39:51 -0700 <199408240839.BAA17930@alumni.caltech.edu>
Subject: Alpha >2GB file systems

> Which Alpha OSes support, in any fashion, files larger than 2GB (2^31
> bytes)? If so, since what version, and how is this achieved -- by
> changing, augmenting, or bypassing the VFS layer (in OSF/1)?

I really can only speak for DEC OSF/1.  DEC OSF/1 has supported >2GB
files with UFS since V1.2, which was the first version of DEC OSF/1 on
Alpha AXP.  The VFS layer was changed to provide 64-bit offsets
throughout.

> How does this appear to the programmer? How do programs that want to
> use larger offsets declare them and call functions such as lseek()?
> Is there a largeint or doubleint library or datatype for the C
> compiler, or is it simply that all integers are 64 bits?

The offset argument to lseek is typed as 'off_t', which is a 64-bit
type on DEC OSF/1 AXP.  The 'long' datatype is 64 bits, and 'int' is
32 bits.  This strategy was chosen primarily to preserve
sizeof(void *) == sizeof(long), while still providing a 32-bit
integer datatype to conserve on memory.
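
The same properties can be inspected on whatever machine is at hand.  The
sketch below uses Python's ctypes module to report the widths of the C types
of the local platform; the function name data_model is made up for
illustration, and the output depends entirely on where it is run.  On a system
following the scheme described above, one would expect 32 for "int" and 64 for
both "long" and "void *".

```python
import ctypes


def data_model() -> dict:
    """Widths, in bits, of a few C types on the platform running this script."""
    return {
        "int": ctypes.sizeof(ctypes.c_int) * 8,
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "long long": ctypes.sizeof(ctypes.c_longlong) * 8,
        "void *": ctypes.sizeof(ctypes.c_void_p) * 8,
    }


sizes = data_model()
print(sizes)
print("sizeof(void *) == sizeof(long):", sizes["void *"] == sizes["long"])
```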

> What's the maximum size of a partition?

Maximum partition size is limited by the disk itself, or, using the
Logical Storage Manager (LSM), up to 128GB.

> Do you have, or plan to have, NFS V3 support?

DEC OSF/1 V3.0 (started shipping last week) fully supports NFS V3.  In
fact, I believe Digital is the first vendor to ship NFS V3.  I suspect Sun
will be next.

> Any additional info you can give me or point me to concerning large
> files would be a help. For that matter, one-sentence to one-paragraph
> descriptions of the types of file systems supported under OSF/1 would
> be nice, if you have them -- I think one of them is an extent-based FS?

Here is the list (from the Software Product Description for DEC OSF/1):

-----------------------------------------------------
File System Support

 The DEC OSF/1 file system architecture is based on OSF/1 Virtual File
 System (VFS) which is based on the Berkeley 4.3 Reno Virtual File
 System. VFS provides an abstract layer interface into files regardless
 of the file systems in which the file resides.

 DEC OSF/1 supports the following file system types:

 o  UNIX File System (UFS) - based on the Berkeley Fast File system

 o  Network File System (NFS)

 o  Memory File System (MFS)

 o  ISO 9660 Compact Disc File System (CDFS)

 o  POLYCENTER Advanced File System (AdvFS)

 o  File-on-File Mounting File System (FFM)

 o  /proc File System


The ones probably most interesting to you:


 UNIX File System

 UFS is compatible with the Berkeley 4.3 Tahoe release. UFS allows a
 pathname component to be 255 bytes, with the fully qualified pathname
 length restriction of 1023 bytes. The DEC OSF/1 implementation of UFS
 supports a maximum file size equivalent to the largest supported file
 system (32 GB for DEC OSF/1 V2.0, 128 GB for V3.0).



 POLYCENTER Advanced File System (AdvFS)

 The POLYCENTER Advanced File System is a local file system that uses
 journaling to recover from unplanned system restarts (such as power
 failures) significantly faster than the standard UFS. In addition to
 fast restart, the AdvFS provides the flexibility of allowing filesets
 (filesystems) to share a single storage pool. Since more than one
 fileset can occupy the same storage pool, soft and hard quotas may be
 specified for filesets in addition to the common application of quotas
 to users and groups. Optionally, user data may be logged to allow
 recoverability of user data writes. The AdvFS backup utility allows
 data to be archived in compressed form.

 The right to use the POLYCENTER Advanced File System is granted by the
 DEC OSF/1 Operating System license. In addition, a separately licensed,
 optional layered product, the POLYCENTER Advanced File System
 Utilities, may be ordered. Please refer to the OPTIONAL SOFTWARE section
 of this SPD for more information.
-----------------------------------------------------

In addition to the above, we will soon be shipping our implementation
of DCE DFS, which will also support >2GB files.


> WWW pointers welcome, too.

A good place to start is http://www.digital.com/info.html.  There is
probably some info in there about VMS and NT file system limits.

As for further info on DEC OSF/1, there is a good summary of file
system limits in 'DEC OSF/1 Technical Summary, Aug. 1994, V3.0', order
number AA-Q0R1B-TE.  The most recent Software Product Description
should be available at www.digital.com, although they may not have
gotten the most recent version on there yet.

Disclaimer: Please be aware that I don't officially speak for Digital.
The above information is correct to the best of my knowledge, but
customers should refer to the official documents (SPDs) for the
official information.  Feel free to reference the www pointer in your
FAQ, though I'd appreciate your not including my email address --
thanks!

Hope this helps,

	XXX

Return-Path: 
Date: Wed, 24 Aug 1994 12:05:35 -0700
From: olson@anchor.engr.sgi.com (Dave Olson)
To: rdv@alumni.caltech.edu (Rodney D. Van Meter)
Subject: Re:  SGI 64-bit File Systems

|  I'm unofficially keeper of the comp.arch.storage FAQ now, and I'm
|  writing a section on 64-bit file systems (this info is also important
|  for a potential customer of ours).
|
|  Can you answer some questions for me, or forward this to someone who
|  can? Thanks.

I'll answer some of it, and also pass it on to somebody who might be
able to comment further.

|  Does Irix 5.x support, in any fashion, files larger than 2GB (2^31
|  bytes)? If so, since what version, and how is this achieved -- by
|  changing, augmenting, or bypassing the VFS layer?

Not yet (in any release).  We do support *filesystems* up to 8 GB.

|  How does this appear to the programmer? How do programs that want to
|  use larger offsets declare them and call functions such as lseek()?
|  Does Irix have a largeint or doubleint library or datatype for the C
|  compiler?
|
|  What's the maximum size of a partition?
|
|  Any additional info you can give me or point me to concerning large
|  files would be a help. For that matter, one-sentence to one-paragraph
|  descriptions of the types of file systems supported under Irix would
|  be nice, if you have them -- I know one of them is an extent-based FS?

EFS (Extent File System) is our only 'serious' filesystem, although we
do support ISO9660/RockRidge, HFS, and DOS.  EFS is indeed extent
based, as you might guess from the name; up to 248 (512 byte) blocks
per extent.  Files are typically grown in 16KB chunks (or larger for
large writes, of course) then truncated back on close if necessary.
Bitmap freelist (on disk and in memory), and directory/inodes spread
around cylinder groups, similar to UFS.
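
For a sense of scale, the extent figures above work out as follows.  This is
a back-of-the-envelope sketch in Python, not a description of how EFS actually
allocates; it assumes every extent is filled to the 248-block maximum, so the
count it gives is a lower bound.  The names are made up for illustration.

```python
BLOCK_SIZE = 512             # bytes per block, from the message
MAX_BLOCKS_PER_EXTENT = 248  # from the message

MAX_EXTENT_BYTES = BLOCK_SIZE * MAX_BLOCKS_PER_EXTENT


def min_extents(file_bytes: int) -> int:
    """Fewest extents that could hold a file if every extent were full-sized."""
    blocks = -(-file_bytes // BLOCK_SIZE)         # round up to whole blocks
    return -(-blocks // MAX_BLOCKS_PER_EXTENT)    # round up to whole extents


print("largest extent:", MAX_EXTENT_BYTES, "bytes")
print("extents for a 1 GiB file: at least", min_extents(2**30))
```

A single extent therefore tops out at 124 KB, and a 16KB growth chunk is 32
blocks, well inside one extent.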


Like many other system vendors, we are looking at doing a 64 bit (large
files and larger partitions) filesystem, but I don't know if this is
announced as any kind of a real product yet; the person I've forwarded
to may be able to say more.

Our current (IRIX 5.2) compilers support long long (64 bit) in 32 bit
programs via a combo of inline ops and library support routines.  We
are releasing a 64 bit OS and libraries on a subset of our high end
machines 'soon'.

NFS v3 is being worked on; I don't think a release date is announced.
I don't think there's an RFC for that, just the Sun docs.

The most beautiful things in the world are              |   Dave Olson
those from which all excess weight has been             |   Silicon Graphics
removed.  -Henry Ford                                   |   olson@sgi.com

Return-Path: 
From: woan@austin.ibm.com (Ronald S. Woan)
Subject: Re: more 64-bit FS questions
To: rdv@alumni.caltech.edu (Rodney D. Van Meter)
Date: Wed, 24 Aug 1994 16:51:23 -0500 (CDT)
Reply-To: woan@austin.ibm.com
In-Reply-To: <199408240854.BAA18234@alumni.caltech.edu> from "Rodney D. Van Meter" at Aug 24, 94 01:54:43 am
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 937

Rodney D. Van Meter writes:
> If you support files larger than 2GB, how does that appear to the
> programmer? Does the C compiler support a 64-bit integer datatype?

No, files are still limited to 2GB :-(. xlc does indeed support a long
long 64 bit integer datatype.

> I'm the maintainer of the comp.arch.storage FAQ, and I'm preparing a
> section on OSes with >32-bit file support.
>
> Regardless, you need to change the WWW AIX 4.1 release notes slightly;
> they're a little confusing as to whether you support >2GB
> _partitions_ or >2GB _files_. Which is it?

Up to 64 GB filesystems... 2GB file size limit still.

 +------All Views Expressed Are My Own And Not Necessarily Shared By IBM-----+
 + Ronald S. Woan       (IBM VNET)WOAN AT AUSTIN, woan@exeter.austin.ibm.com +
 + outside of IBM  woan@austin.ibm.com or woan@cactus.org or r.woan@ieee.org +
 + other woan@soda.csua.berkeley.edu Prodigy: XTCR74A Compuserve: 73530,2537 +