- From: jsq@usenix.uucp (John Quarterman)
-
- Pipe Write Problems Page 1 of 11 IEEE 1003.1 N.116
-
-
-
- John S. Quarterman
- Institutional Representative
- From USENIX to IEEE P1003
- {uunet,ucbvax,seismo}!usenix!jsq
-
- Texas Internet Consulting
- 701 Brazos, Suite 500
- Austin, Texas 78701-3243
- +1-512-320-9031
- jsq@longway.tic.com
-
- 24 August 1987
-
- Attention: P1003 Working Group
- Secretary, IEEE Standards Board
- 345 East 47th Street
- New York, NY 10017
-
- Cc: 1003.1 Technical Reviewers:
-
- Maggie Lee, 2 Jeff Smits, 6 Hal Jespersen, Rationale
- +1-408-746-7216 +1-201-522-6263 +1-415-420-6400
- ihnp4!amdahl!maggie ihnp4!attunix!smits ucbvax!unisoft!hlj
-
- There are several problems in IEEE Std 1003.1, Draft 11
- regarding writes to a pipe or FIFO. These problems are
- sufficient to produce a ``no'' ballot from USENIX. This
- objection includes discussion of the problems, their
- sources, and suggested solutions, including both standard
- and rationale text.
-
-
- 1. Problems
-
- 1.1 Ambiguous O_NONBLOCK wording in Draft 11, 6.4.2.2.
-
- Understanding the case of the triple condition
-
- + O_NONBLOCK is set,
-
- + and {PIPE_BUF} < nbyte <= {PIPE_MAX},
-
- + and 0 < immediately writable < nbyte,
-
- requires a close reading of Draft 11, 6.4.2.2, page 125,
- lines 224-227:
-
- $Revision: 3.1 $ DRAFT $Date: 87/08/24 10:54:56 $
-
- If the O_NONBLOCK flag is set, write() shall not block
- the process. If nbyte > {PIPE_BUF}, and some data can
- be written without blocking the process, write() shall
- write what it can and return the number of bytes
- written. Otherwise, it shall return -1 and errno shall
- be set to [EAGAIN].
-
- It is not immediately obvious what ``Otherwise'' refers to
- (which clause of the condition?). But in the context of the
- paragraph at lines 217-221 it must refer to the case when
- {PIPE_BUF} < nbyte <= {PIPE_MAX} and no data can be written
- without blocking the process.
-
- 1.2 Nonblocking partial pipe writes are an option in
- Draft 11.
-
- According to David Willcox, who was in many of the atomic
- pipe write small groups, the word ``can'' in both uses in
- the preceding quote is meant to refer to what the
- implementation permits. In other words, the case where
- ``some data can be written'' may refer to there being some
- space free in the pipe, or the case may be null, meaning
- that [EAGAIN] will always be returned when {PIPE_BUF} <
- nbyte <= {PIPE_MAX}, regardless of whether there is free
- space in the pipe or not. Which is to say that the standard
- permits the implementation to perform partial writes, but
- does not require it to do so.
-
- Partial writes are not implementation-defined (according to
- the definition in 2.1), because the standard completely
- describes their behavior (or attempts to). So partial
- writes are an interface implementation option in Draft 11,
- even though they are not properly specified as such by the
- use of the word ``may'' or listing in 2.2.1.2.
-
- 1.3 Incorrect error code?
-
- If partial writes are not implemented, the error [EAGAIN] is
- not appropriate, because the write will never succeed, no
- matter how many times it is retried. Better would be
- [EINVAL], which matches the other cases where retrying will
- not help. However, this argument assumes that {PIPE_BUF} is
- not only the maximum atomic size, but also the maximum
- amount writable in one operation. This may not be so; see
- below.
-
- 1.4 {PIPE_MAX} with O_NONBLOCK clear.
-
- Should {PIPE_MAX} apply when O_NONBLOCK is not set? All of
- Version 7, System V Release 3, 4.2BSD, and 4.3BSD permit
- arbitrarily large values of nbyte when O_NDELAY is not set.
- While it is possible to imagine a system where such a limit
- would be required by the implementation, there seem to be
- none at the moment, so there are probably no applications
- that depend on it. The enforcement of such a limit would
- make pipes basically different from other things that
- write() can be applied to, requiring extra code in
- applications. Thus there is no obvious advantage in
- portability for applications. So {PIPE_MAX} should not be
- applied when O_NONBLOCK is clear.
-
-
- 2. Sources of the problems.
-
- There are three basic sources of confusion about the
- behavior of pipes and FIFOs (especially when the non-
- blocking flag is set):
-
- 1. It is not clear what the various existing systems do.
-
- 2. It is clear that they do many things differently.
-
- 3. It is not clear what behavior is important to
- applications, and thus worth standardizing.
-
- 2.1 Existing systems.
-
- Some of the following descriptions may not be totally
- accurate, but they should serve to illustrate the point of
- diversity.
-
- + Version 7 introduced atomicity of writes to pipes. The
- manual page write(2) guarantees that write requests of
- 4096 bytes or less will not be interleaved with writes
- from any other process. The purpose of this feature
- was to allow multiple processes to write to the same
- pipe while permitting a single reader to parse their
- data.
-
- 4096 also happens to be the size of a pipe, and is
- fixed at compile time (it is not larger because that
- would have made pipes large files, that is, they would
- have had indirect blocks).
-
- Any amount (that will fit in an int) of data may be
- requested on a single call to write().
-
- Version 7 does not have a non-blocking flag.
-
- + The SVID requires atomicity of writes to pipes when the
- request is of {PIPE_BUF} bytes or less. This feature
- may have been introduced from the /usr/group Standard,
- which had it.
-
- There is no maximum write request, regardless of
- whether O_NDELAY is set.
-
- With O_NDELAY set, write requests of less than
- {PIPE_BUF} bytes either succeed or return zero. Write
- requests of more than that may also succeed partially,
- returning the amount written.
-
- + 4.2BSD appears to guarantee atomicity of pipe write
- requests up to 1024 bytes. It will return an error for
- requests for more than 4096 bytes when the O_NDELAY
- flag is set. Partial writes are not done. With the
- flag clear, any size write request will succeed
- eventually.
-
- + 4.3BSD does not guarantee atomicity of any size pipe
- write (greater than one byte). The maximum amount that
- can be requested will vary dynamically, as will the
- maximum amount that can be written on a single
- operation. With the O_NDELAY flag set, any write of
- more than one byte may be partial. UCB CSRG is
- probably amenable to changing this behavior.
-
- + Version 8 does not necessarily measure the maximum
- amount of data that can be written to a pipe on a given
- operation in bytes, i.e., it may depend on the number
- of outstanding write requests.
-
- There is no nonblocking flag in Version 8 or Version 9.
-
- 2.2 Useful behavior.
-
- It is more useful to specify how an application should
- interpret a return value than it is to specify precisely
- when the implementation shall return it. I believe this
- observation may be the rope for climbing out of the chronic
- pipe write morass.
-
- [EAGAIN] should mean that retrying later with the same
- size request may succeed. The Rationale should
- recommend actions the application should take in
- such a case. Because some systems dynamically
- vary their pipe size, what would have succeeded
- this time on an empty pipe may not succeed next
- time. Of course, if the request was for
- {PIPE_BUF} or fewer bytes, retries shall
- eventually succeed (unless no reader reads
- enough from the pipe). But it is not useful for
- the standard to attempt to specify for exactly
- what larger requests [EAGAIN] will be returned,
- or the probability of success on later retries.
- After all, if the reader does not read, no
- retries will succeed.
-
- [EINVAL] should mean that retrying later with the same
- size request shall never succeed. But the
- standard should not require the implementation
- to always return this error at a fixed limit.
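-
- The interpretation of return values argued for above can be
- written down as a small classification function. This is a
- sketch of the proposed meanings only: the enumeration and
- classify_write() are hypothetical names, and an existing
- system may return [EINVAL] from write() for other reasons.
-
```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

enum write_verdict {
    WRITE_DONE,        /* ret >= 0: some or all of the data was written */
    WRITE_RETRY_LATER, /* [EAGAIN]: the same request may succeed later */
    WRITE_NEVER,       /* [EINVAL]: the same request will never succeed */
    WRITE_OTHER_ERROR  /* any other errno value */
};

/* Classify the result of write(); err is errno sampled after the call. */
enum write_verdict classify_write(ssize_t ret, int err)
{
    if (ret >= 0)
        return WRITE_DONE;
    assert(ret == -1);          /* write() has no other negative result */
    if (err == EAGAIN)
        return WRITE_RETRY_LATER;
    if (err == EINVAL)
        return WRITE_NEVER;
    return WRITE_OTHER_ERROR;
}
```
-
- An application would retry (perhaps after a delay) only on
- WRITE_RETRY_LATER, and on WRITE_NEVER would reduce nbyte
- before trying again.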
-
- There is no reason for the standard to try to specify what
- happens in every corner case produced by the intersections
- of all the known implementations. The standard should
- specify behavior that promotes portability of applications
- and that is implementable relatively readily on existing
- systems. In addition, the behavior of writes to pipes or
- FIFOs should be made as little different from that of writes
- to other file descriptors as possible. The main reason for
- making it different at all is that POSIX does not currently
- include any more sophisticated interprocess communication
- facility: for example, given a reliable sequenced datagram
- service, there would be no need to require pipes to be
- atomic.
-
- 1. Atomic writes are useful. The standard should specify
- that write requests of {PIPE_BUF} or fewer bytes shall
- be atomic, regardless of whether O_NONBLOCK is set.
-
- 2. Write requests of more than {PIPE_BUF} bytes with
- O_NONBLOCK set are useful. A real time data
- acquisition process might want to write large amounts
- of data through a pipe to a single processing process,
- while never blocking.
-
- 3. Partial writes are useful, but not useful enough for
- the standard to require the implementation to include
- them. The standard should, however, require portable
- applications to expect them, since applications should
- expect them for other kinds of writes anyway. In other
- words, partial writes should not be a major option, but
- merely an implementation-defined detail. Exactly when they
- occur is not important enough to specify (especially
- considering that it is not specified for other kinds
- of writes), except that they are prohibited when nbyte
- <= {PIPE_BUF} because of the guarantee of atomicity.
-
- There is no strong reason for an application to be
- able to discover at compile or run time whether
- partial writes are implemented: every application
- should assume that they may be implemented.
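-
- An application that assumes partial writes may occur would
- typically wrap write() in a loop. The following is one
- possible sketch, not a required technique: set_nonblocking()
- and write_fully() are hypothetical names, and the use of
- poll() to wait for space is an assumption about the
- interfaces available on the system at hand.
-
```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd, preserving the other file status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Write all len bytes to fd, tolerating partial writes and [EAGAIN].
 * Waits in poll() when nothing can be written, so the caller may
 * pause here even though no individual write() call blocks.
 * Returns 0 on success, -1 with errno set on failure.
 */
int write_fully(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0) {
            done += (size_t)n;          /* complete or partial write */
        } else if (n == -1 && errno == EAGAIN) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };

            if (poll(&p, 1, -1) == -1 && errno != EINTR)
                return -1;
        } else if (n == -1 && errno == EINTR) {
            continue;                   /* interrupted, try again */
        } else {
            if (n == 0)
                errno = EIO;            /* unexpected zero return */
            return -1;                  /* hard error, e.g. [EPIPE] */
        }
    }
    return 0;
}
```
-
- A process that must never pause, such as the data
- acquisition example above, would instead return to its own
- work on [EAGAIN] and resume from the recorded offset later.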
-
- The usefulness of {PIPE_MAX} is slightly dubious, and it
- might be better to eliminate it, instead specifying that
- [EINVAL] may be returned whenever O_NONBLOCK is set and
- nbyte > {PIPE_BUF}. But let us assume that it is useful.
-
- 1. A maximum amount that can be requested without ever
- producing [EINVAL] is worthwhile. {PIPE_MAX} could be
- used for this. But it should not apply if O_NONBLOCK
- is not set.
-
- 2. {PIPE_MAX} >= {PIPE_BUF}. Allowing {PIPE_MAX} <
- {PIPE_BUF} would permit a guaranteed atomic write to
- return [EINVAL], which is a contradiction.
-
- 3. The standard should explicitly permit an
- implementation to set {PIPE_MAX} = {PIPE_BUF}, simply
- because there is no reason to prohibit it. This would
- not rule out partial writes, but would mean that
- applications running on such an implementation should
- never depend on successful writes with nbyte >
- {PIPE_BUF}.
-
- 4. The standard should permit an implementation to set
- {PIPE_MAX} = {INT_MAX}, meaning that [EINVAL] will
- never be returned. That is effectively what some
- implementations do, and there is no reason not to if
- partial writes are implemented.
-
- 5. An implementation could even set all three limits
- equal: {PIPE_BUF} = {PIPE_MAX} = {INT_MAX}, meaning
- that [EINVAL] will never be returned, there are no
- partial writes, and all writes are atomic.
-
- Finally, this is an interface standard: it should not try to
- specify implementation details, such as the internal
- buffering arrangements of the pipe. Such phrases as ``it
- shall write as much as it can'' are inappropriate.
-
-
- 3. Rewording.
-
- Here is rewording to account for the implications of the
- above arguments.
-
- The text and tables below include specifications and
- rationale for {PIPE_MAX}. But, if the Working Group decides
- to drop {PIPE_MAX}, it can be excised with no ill effects.
- References to it should then also be removed from Draft 11
- 2.9.2, page 42, lines 808-810, and 5.7.1.2, page 117, line
- 971.
-
- 3.1 Standard.
-
- Move the definition of {PIPE_MAX} down into the text that
- specifies what happens when O_NONBLOCK is set. That is,
- first remove Draft 11 6.4.2.2, page 125, lines 215-216:
-
- Write requests for greater than {PIPE_MAX} bytes shall
- result in a return value of -1 and set errno to
- [EINVAL].
-
- Then replace the wording (quoted in 1.1 above) of Draft 11,
- 6.4.2.2, page 125, lines 224-227 with this new wording:
-
- If the O_NONBLOCK flag is set, write requests shall be
- handled differently in the following ways: The write()
- function shall not block the process. Write requests
- for {PIPE_BUF} or fewer bytes shall either succeed
- completely and return nbyte, or return -1 and set errno
- to [EAGAIN] to indicate that retrying the write() later
- with the same arguments may succeed. Write requests
- for more than {PIPE_BUF} bytes may in addition write
- some amount of data less than nbyte and return the
- amount written. Write requests for more than
- {PIPE_MAX} bytes may in addition return -1 and set
- errno to [EINVAL] to indicate that retrying the write()
- later with the same arguments shall never succeed.
- {PIPE_MAX} shall be greater than or equal to {PIPE_BUF}
- and less than or equal to {INT_MAX}.
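-
- As a check on the new wording, the outcomes it permits with
- O_NONBLOCK set can be modelled as a function of nbyte and
- the two limits. This is only a reading aid under stated
- assumptions: permitted_outcomes() and the OUT_ names are
- hypothetical, and the limits are passed as parameters
- because an implementation need not define {PIPE_MAX} as a
- compile-time constant.
-
```c
#include <assert.h>
#include <limits.h>

enum {
    OUT_COMPLETE = 1 << 0,  /* returns nbyte */
    OUT_PARTIAL  = 1 << 1,  /* returns less than nbyte */
    OUT_EAGAIN   = 1 << 2,  /* returns -1, errno is [EAGAIN] */
    OUT_EINVAL   = 1 << 3   /* returns -1, errno is [EINVAL] */
};

/*
 * Set of results the proposed wording allows for a write of nbyte
 * bytes to a pipe or FIFO with O_NONBLOCK set.
 */
unsigned permitted_outcomes(long nbyte, long pipe_buf, long pipe_max)
{
    unsigned out = (unsigned)(OUT_COMPLETE | OUT_EAGAIN);

    assert(nbyte > 0);
    /* {PIPE_BUF} <= {PIPE_MAX} <= {INT_MAX}, as the wording requires. */
    assert(pipe_buf > 0 && pipe_buf <= pipe_max && pipe_max <= INT_MAX);

    if (nbyte > pipe_buf)
        out |= (unsigned)OUT_PARTIAL;   /* atomicity no longer promised */
    if (nbyte > pipe_max)
        out |= (unsigned)OUT_EINVAL;    /* permitted, never required */
    return out;
}
```
-
- Note that no request of {PIPE_BUF} bytes or fewer can yield
- a partial write or [EINVAL], whatever the value of
- {PIPE_MAX}.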
-
- The beginning of the following paragraph, 6.4.2.2, page 125,
- lines 228-229, is misleading and should be changed from
-
- When attempting to write to a file descriptor...
-
- to
-
- When attempting to write to a file descriptor (other
- than one for a pipe or FIFO)...
-
- The meaning of [EINVAL] when set by write() as specified in
- 6.4.2.4, page 126, lines 260-261, should be changed from
-
- [EINVAL] An attempt was made to write more than
- {PIPE_MAX} bytes to a pipe or FIFO special file.
-
- to
-
- [EINVAL] An attempt was made to write to a pipe or FIFO
- special file with a value of nbyte greater than
- {PIPE_MAX} and also large enough that the
- operation shall never succeed if retried.
-
- 3.2 Rationale.
-
- In the Rationale, remove the editorial note from B.6.4.2,
- Page 240, line 2104, and replace B.6.4.2, Page 240, line
- 2105 (``Write to a Pipe'') with:
-
- [begin replacement]
-
- An attempt to write to a pipe or FIFO has several major
- characteristics:
-
- Atomic/non-atomic
- A write is atomic if the whole amount written in one
- operation is not interleaved with data from any other
- process. This is useful when there are multiple
- writers sending data to a single reader. Applications
- need to know how large a write request can be expected
- to be performed atomically. We call this maximum
- {PIPE_BUF}. The standard does not say whether write
- requests for more than {PIPE_BUF} bytes will be atomic,
- but requires that writes of {PIPE_BUF} or fewer bytes
- shall be atomic.
-
- Blocking/immediate
- Blocking is only possible with O_NONBLOCK clear. If
- there is enough space for all the data requested to be
- written immediately, the implementation should do so.
- Otherwise, the process may block, that is, pause until
- enough space is available for writing. The effective
- size of a pipe or FIFO (the maximum amount that can be
- written in one operation without blocking) may vary
- dynamically, depending on the implementation, so it is
- not possible to specify a fixed value for it.
-
- Complete/partial/deferred
- A write request,
-
- int fildes, nbyte, ret;
- char *buf;
-
- ret = write(fildes, buf, nbyte);
-
- may return
-
- complete: ret = nbyte
-
- partial: ret < nbyte
- This shall never happen if nbyte <=
- {PIPE_BUF}. If it does happen (with nbyte
- > {PIPE_BUF}), the standard does not
- guarantee atomicity, even if ret <=
- {PIPE_BUF}, because atomicity is guaranteed
- according to the amount requested, not the
- amount written.
-
- deferred: ret = -1, errno = [EAGAIN]
- This error indicates that a later request
- may succeed. It does not indicate that it
- shall succeed, even if nbyte <= {PIPE_BUF},
- because if no process reads from the pipe
- or FIFO, the write will never succeed. An
- application could usefully count the number
- of times [EAGAIN] is caused by a particular
- value of nbyte > {PIPE_BUF} and perhaps do
- later writes with a smaller value, on the
- assumption that the effective size of the
- pipe may have decreased.
-
- Partial and deferred writes are only possible with
- O_NONBLOCK set.
-
- Requestable/invalid
- If a write request shall never succeed with the value
- given for nbyte, the request is invalid, and write()
- shall return -1 with errno set to [EINVAL]. This is
- only permitted to happen when nbyte > {PIPE_MAX} and
- O_NONBLOCK is set, and it is never required to happen.
- {PIPE_MAX} is not necessarily a minimum on the
- effective size of a pipe or FIFO; if it says anything
- about that size, it is that it sometimes varies above
- {PIPE_MAX}. Because {PIPE_MAX} specifies the maximum
- size write request that shall never cause [EINVAL], it
- must be greater than or equal to the maximum atomic
- write size, {PIPE_BUF}. {PIPE_BUF} and {PIPE_MAX} may
- be equal, which means that [EINVAL] may be produced by
- any write of greater than {PIPE_BUF} bytes. {PIPE_MAX}
- may be equal to {INT_MAX}, meaning that [EINVAL] shall
- never be returned (unless nbyte > {INT_MAX}, when the
- result is implementation-defined). All three limits
- may be equal, meaning that [EINVAL] shall never be
- returned, no partial writes are done, and all completed
- writes are atomic. Applications should be prepared for
- all these cases.
-
- The relations of these properties are best shown in tables.
-
-  _______________________________________________________
- | Write to a Pipe or FIFO with O_NONBLOCK clear.        |
- |_____________|_________________________________________|
- | immediately |                                         |
- | writable:   |  none         some         nbyte        |
- |_____________|_________________________________________|
- |             |  atomic       atomic       atomic       |
- | nbyte <=    |  blocking     blocking     immediate    |
- | {PIPE_BUF}  |  nbyte        nbyte        nbyte        |
- |_____________|_________________________________________|
- |             |  atomic?      atomic?      atomic?      |
- | nbyte >     |  blocking     blocking     immediate    |
- | {PIPE_BUF}  |  nbyte        nbyte        nbyte        |
- |_____________|_________________________________________|
-
- If the O_NONBLOCK flag is clear, a write request shall block
- if the amount writable immediately is less than that
- requested. If the flag is set (by fcntl()), a write request
- shall never block.
-
-  _____________________________________________________________
- | Write to a Pipe or FIFO with O_NONBLOCK set.                |
- |_____________|_______________________________________________|
- | immediately |                                               |
- | writable:   |  none           some           nbyte          |
- |_____________|_______________________________________________|
- | nbyte <=    |  -1,            -1,            atomic         |
- | {PIPE_BUF}  |  [EAGAIN]       [EAGAIN]       nbyte          |
- |_____________|_______________________________________________|
- |             |                 atomic?        atomic?        |
- |             |                 < nbyte        <= nbyte       |
- | nbyte >     |  -1,            or -1,         or -1,         |
- | {PIPE_BUF}  |  [EAGAIN]       [EAGAIN]       [EAGAIN]       |
- |_____________|_______________________________________________|
- |             |                 atomic?        atomic?        |
- |             |                 < nbyte        <= nbyte       |
- | nbyte >     |  -1,            or -1,         or -1,         |
- | {PIPE_MAX}  |  ([EAGAIN]      ([EAGAIN]      ([EAGAIN]      |
- |             |  or [EINVAL])   or [EINVAL])   or [EINVAL])   |
- |_____________|_______________________________________________|
-
- There is no way provided for an application to determine
- whether the implementation will ever perform partial writes
- to a pipe or FIFO. Every application should be prepared to
- handle partial writes when O_NONBLOCK is set and the
- requested amount is greater than {PIPE_BUF}, just as every
- application should be prepared to handle partial writes on
- other kinds of file descriptors.
-
- Where the standard requires -1 returned and errno set to
- [EAGAIN], most historical implementations return 0 (with the
- O_NDELAY flag set: that flag is the historical predecessor
- of O_NONBLOCK, but is not itself in the standard). The
- error indications in the standard were chosen so that an
- application can distinguish these cases from end of file.
- While write() cannot receive an indication of end of file,
- read() can, and the Working Group chose to make the two
- functions have similar return values. Also, some existing
- systems (e.g., Version 8) permit a write of zero bytes to
- mean that the reader should get an end of file indication:
- for those systems, a return value of zero from write
- indicates a successful write of an end of file indication.
- [end replacement]
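-
- The suggestion in the ``deferred'' case of the Rationale text
- above (count the [EAGAIN] results for a given nbyte >
- {PIPE_BUF}, then try a smaller value) might be sketched as
- below. The structure, the function names, the halving policy,
- and the threshold are all assumptions made for illustration;
- the text itself prescribes none of them.
-
```c
#include <assert.h>
#include <stddef.h>

struct write_sizer {
    size_t request;      /* current value of nbyte to ask for */
    size_t floor;        /* never go below this, e.g. {PIPE_BUF} */
    unsigned eagain_run; /* consecutive [EAGAIN] results at this size */
    unsigned threshold;  /* shrink after this many in a row */
};

/* Record one [EAGAIN]; halve the request once the run is long enough. */
void sizer_note_eagain(struct write_sizer *s)
{
    assert(s != NULL && s->floor > 0 && s->threshold > 0);
    if (s->request <= s->floor)
        return;                     /* already at the smallest size */
    if (++s->eagain_run >= s->threshold) {
        s->request /= 2;
        if (s->request < s->floor)
            s->request = s->floor;
        s->eagain_run = 0;
    }
}

/* Record a write that transferred data; the run of failures is over. */
void sizer_note_success(struct write_sizer *s)
{
    assert(s != NULL);
    s->eagain_run = 0;
}
```
-
- The floor would normally be {PIPE_BUF}, since requests of
- that size or smaller are the ones for which retries are
- expected to succeed once the reader makes room.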
-
- CONTENTS
-
- 1. Problems
- 1.1 Ambiguous O_NONBLOCK wording in Draft 11, 6.4.2.2
- 1.2 Nonblocking partial pipe writes are an option in Draft 11
- 1.3 Incorrect error code?
- 1.4 {PIPE_MAX} with O_NONBLOCK clear
-
- 2. Sources of the problems
- 2.1 Existing systems
- 2.2 Useful behavior
-
- 3. Rewording
- 3.1 Standard
- 3.2 Rationale
-
- Volume-Number: Volume 12, Number 22
-
-