- Newsgroups: comp.unix.aix
- Path: sparky!uunet!cs.utexas.edu!torn!csd.unb.ca!dedourek
- From: dedourek@jupiter.sun.csd.unb.ca (John DeDourek)
- Subject: Re: NFS write error intrepretation
- Message-ID: <1993Jan5.153622.25957@jupiter.sun.csd.unb.ca>
- Summary: NFS write error interp. and print service on diskless client
- Keywords: NFS write error diskless client printing
- Sender: dedourek@jupiter.sun.csd.unb.ca
- Organization: University of New Brunswick
- References: <1993Jan4.131704.86@rcwusr>
- Date: Tue, 5 Jan 1993 15:36:22 GMT
- Lines: 74
-
- In article <1993Jan4.131704.86@rcwusr> jenkinsonjp@rcwusr.bp.com (John P. Jenkinson) writes:
- >We get several NFS write errors on our consoles every day in the format
- >nfs write error ___ on host ___ ___ ___ ___ ___ ___ ___
- >Typically the error is 13 (permission), so I know it is benign, but some
- >time ago in this conference I saw an explanation of how to interpret the
- >other numbers to get the major,minor numbers, the inode, etc. to find out
- >why. When I used this posting to interpret ours, the numbers didn't make
- >sense. This is a Sun 690 as the NFS server. I can't find this in info or
- >techlib, and IBM tech support is making this a problem. I just want the
- >how-to on interpreting these numbers. If you know, please reply. Thanks.
-
-
- The following is the technique I used with errors on an RS/6000 client
- of an RS/6000 server. I can't tell whether it applies to a Sun server.
-
- There are eight numbers in the message. Each is an 8-digit hex number
- with leading zeroes suppressed. You are interested in the first and
- the fourth numbers; pad each to 8 digits by prepending zeroes.
-
- The first number holds the "major,minor" numbers of the "logical
- disk" holding the file system on the SERVER: the major number in the
- left 16 bits (4 hex digits) and the minor number in the right 16 bits.
- These correspond to the "major,minor" numbers of the "block special
- device" holding the file system with the offending file. To find that
- device, do an "ls -l" on the SERVER in the /dev directory, or in one
- of its subdirectories (depending on how your server is organized),
- look at the entries whose mode begins with "b" (the Sun may differ),
- and pick the one whose major,minor numbers match.
- WARNING: the major,minor numbers in the NFS error are HEX, while
- those shown by "ls -l" are decimal, so you have to convert.
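To make the arithmetic concrete: a first number printed as a0005 pads to 000a0005, which gives major 0x000a = 10 and minor 0x0005 = 5 in decimal. The Python sketch that follows (Python is an illustrative choice; the function names are hypothetical) does that padding and conversion, then, instead of reading "ls -l" output by eye, walks /dev and compares each block device's numbers using os.major and os.minor. It assumes the 16/16 bit layout described above, which was observed with an RS/6000 server and may not hold for other servers.

```python
import os
import stat


def split_device_field(hex_field):
    """Split the first NFS error number into (major, minor), in decimal.

    Assumes an 8-digit hex value with leading zeroes suppressed,
    major in the left 16 bits and minor in the right 16 bits.
    """
    padded = hex_field.lower().zfill(8)          # restore suppressed zeroes
    if len(padded) != 8:
        raise ValueError("expected at most 8 hex digits: %r" % hex_field)
    major = int(padded[:4], 16)                  # left 4 hex digits
    minor = int(padded[4:], 16)                  # right 4 hex digits
    return major, minor


def find_block_devices(major, minor, dev_dir="/dev"):
    """Return paths under dev_dir of block devices with this major,minor.

    Does the job of scanning "ls -l" output for entries starting with
    "b", but compares the numbers directly. Run it on the SERVER.
    """
    matches = []
    for root, _dirs, files in os.walk(dev_dir):  # device files land in files
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                         # entry vanished or unreadable
            if (stat.S_ISBLK(st.st_mode)
                    and os.major(st.st_rdev) == major
                    and os.minor(st.st_rdev) == minor):
                matches.append(path)
    return matches


if __name__ == "__main__":
    # "a0005" is an illustrative value, not taken from a real error message.
    print(split_device_field("a0005"))           # decimal, as ls -l shows them
```

On the server, find_block_devices(*split_device_field(first_number)) then lists the candidate device(s); with the example value it would look for major 10, minor 5.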
-
- Next, locate the directory on which that file system is mounted
- (the mount command displays this).
-
- Now, the fourth number in the NFS error contains the file's inode
- number within that file system, in the leftmost 16 bits (4 hex
- digits). Convert it to decimal and locate the inode with find on
- the SERVER, e.g. on an RS/6000:
-     find /mnt-point -inum decimalnumber -xdev -print
- You will need the equivalent command on the Sun server. Here,
- /mnt-point is where the partition is mounted, decimalnumber is the
- inode number in decimal, and the -xdev option keeps find within
- that one file system.
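The inode step can be sketched the same way, again in Python with hypothetical function names. It assumes the layout described above (inode in the leftmost 16 bits of the padded fourth number) and only builds the find command line; you still run that command on the SERVER yourself. The mount point and hex value in the example are illustrative, not taken from a real error.

```python
import shlex


def inode_from_field(hex_field):
    """Return the inode number (decimal) from the fourth NFS error number.

    Pads to 8 hex digits and takes the leftmost 16 bits (4 hex digits).
    This layout was observed with an RS/6000 server; others may differ.
    """
    padded = hex_field.lower().zfill(8)
    if len(padded) != 8:
        raise ValueError("expected at most 8 hex digits: %r" % hex_field)
    return int(padded[:4], 16)


def find_command(mount_point, hex_field):
    """Build the find command line to run on the SERVER."""
    inode = inode_from_field(hex_field)      # find -inum wants decimal
    return " ".join([
        "find", shlex.quote(mount_point),    # quote odd mount-point names
        "-inum", str(inode),
        "-xdev", "-print",                   # stay within one file system
    ])


if __name__ == "__main__":
    # Illustrative values: "3e80000" pads to 03e80000, so the inode is 0x03e8.
    print(find_command("/home", "3e80000"))
```

With those example values it prints find /home -inum 1000 -xdev -print.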
-
- As a further note, I suspect that many of the NFS write error 13
- messages have the following cause. Unix has many "set-uid root"
- programs that work roughly like this: user smith runs the program.
- The program starts as root, so it has full access. It opens a log
- file for writing, and that file is writable only by root. It then
- switches to run with smith's access permissions, so that it can
- process the files smith supplied as arguments without laboriously
- checking whether smith really has access to each of them. This works
- fine on a diskful system, because local Unix checks the permissions
- only when the file is opened.
-
- Now put the program on a diskless client, where the log file lives on
- the server and is mounted over NFS. NFS is a "stateless" protocol:
- the server doesn't remember the open, and its daemons fetch the
- appropriate block each time a request arrives. Because the daemons
- must be privileged to reach everything, they check the caller's access
- rights on EVERY REQUEST THAT COMES IN. The write requests arrive after
- the program on the client has switched to smith, and smith has no
- write access to the log file. Hence NFS WRITE ERROR 13.
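The difference can be shown with a toy Python model (hypothetical classes, not real kernel or NFS code): a local file handle checks permission once, when it is opened, while a stateless server re-checks the caller's uid on every write. Both runs open a root-only log as root, then switch to a hypothetical uid for smith before writing.

```python
import errno

ROOT = 0  # uid of root


class LogFile:
    """A file with an owner and permission bits (toy model)."""

    def __init__(self, owner, mode):
        self.owner = owner
        self.mode = mode
        self.data = []

    def may_write(self, uid):
        # Root may write anything; otherwise only the owner, and only if
        # the owner-write bit is set (group/other bits ignored for brevity).
        return uid == ROOT or (uid == self.owner and bool(self.mode & 0o200))


class LocalHandle:
    """Local Unix: permission is checked once, when the file is opened."""

    def __init__(self, f, uid):
        if not f.may_write(uid):
            raise PermissionError(errno.EACCES, "open denied")
        self.f = f

    def write(self, uid, text):
        self.f.data.append(text)      # no re-check; the open already succeeded
        return 0


class StatelessNFSHandle:
    """NFS server: it does not remember the open, so each write is re-checked."""

    def __init__(self, f, uid):
        if not f.may_write(uid):
            raise PermissionError(errno.EACCES, "open denied")
        self.f = f

    def write(self, uid, text):
        if not self.f.may_write(uid):  # checked on EVERY request
            return errno.EACCES        # what the console reports as error 13
        self.f.data.append(text)
        return 0


def setuid_program(handle_cls, user_uid):
    """Open a root-only log as root, switch to user_uid, then write to it."""
    log = LogFile(owner=ROOT, mode=0o600)
    handle = handle_cls(log, uid=ROOT)                   # opened while root
    return handle.write(user_uid, "percent completed")  # after switching uid
```

Here setuid_program(LocalHandle, 100) returns 0, while setuid_program(StatelessNFSHandle, 100) returns errno.EACCES, which is 13 on typical Unix systems.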
-
- In our case, the problem I debugged involved a printer attached to
- one of the diskless clients, which acted as a print server for the
- network. The print spooler apparently opens a status file, to which
- it writes the "percent completed", and then runs non-privileged.
- The penalty: on a long print job, the queue status displays (e.g. lpq
- or qchk) won't show the percent completed, and you get NFS WRITE
- ERROR 13 messages on the client's console.
-
- There is a PTF available for the print spooler.
-
- dedourek@jupiter.sun.csd.unb.ca
-