- Newsgroups: comp.std.c
- Path: sparky!uunet!uunet.ca!wildcan!sq!msb
- From: msb@sq.sq.com (Mark Brader)
- Subject: struct hack, and other out-of-array references
- Message-ID: <1992Sep10.014137.16209@sq.sq.com>
- Summary: struct hack legal even though int a[5][5];a[1][7]...; illegal
- Organization: SoftQuad Inc., Toronto, Canada
- References: <9209080014.AA03467@enet-gw.pa.dec.com> <1992Sep09.112101.1139@x.co.uk> <1992Sep07.104932.20060@x.co.uk> <1992Sep8.124655.1498@Urmel.Informatik.RWTH-Aachen.DE>
- Date: Thu, 10 Sep 92 01:41:37 GMT
- Lines: 148
-
-
- [<> = Norman Diamond; o' = Clive Feather; ^^ = Stephen R. van den Berg;
- || = Interpretation Ruling; ## = the standard; no indent = me.]
-
- <> You can't go past the end of an array object. But if malloc() or some
- <> other variable has defined the end of the actual array object, then the
- <> + operator can get you that far, regardless of the declared type that
- <> some other array variable had before getting flattened to a pointer.
-
- o' But there is an interpretation that says that, given
- o' int a [5][5];
- o' the access "a [1][6]" is illegal, because it goes past the bounds of the
- o' array "a [1]". In other words, the declared type of the array does
- o' restrict what can happen to a pointer derived from it.
-
- Clive is right to the extent that Norman shouldn't have said "or some other
- variable". (Sorry, Norman, I'd forgotten about this myself when I emailed
- you before.) However, objects returned by malloc() are another matter.
-
-
- <> Is that an actual interpretation ruling ...?
-
- o' RFI 17, item 16.
-
- || For an array of arrays, the permitted pointer arithmetic in Standard
- || ##3.3.6 Semantics (page 48, lines 12-40) is to be understood by
- || interpreting the use of the word "object" as denoting the specific
- || object determined directly by the pointer's type and value, *not* other
- || objects related to that one by contiguity. For example, the following
- || code has undefined behaviour:
- ||     int a [4][5];
- ||     a [1][7] = 0; /* undefined */
- || Some conforming implementations may choose to diagnose an "array bounds
- || violation", while others may choose to interpret such attempted accesses
- || successfully with the "obvious" extended semantics.
-
- I found this ruling very surprising, but after reviewing the standard,
- I decided that it was correct. The standard defines [] in terms of +
- and *, and the relevant text restricting this usage of + (in section 3.3.6,
- ANSI numbering, 6.3.6 in ISO) is the following. In the typical use
- of [], of course, "the pointer operand" is the first operand and "the
- integral expression" is the second operand of the []:
-
- ## If the pointer operand points to an element of an array object,
- ## and the array is large enough, the result points to an element
- ## offset from the original element such that the difference in the
- ## subscripts of the resulting and original array elements equals
- ## the integral expression. In other words, if the expression P
- ## points to the i-th element of an array object, the expressions
- ## P+(N) ... and P-(N) (where N has the value n) point to, respectively,
- ## the i+n-th and i-n-th elements of the array object, provided they
- ## exist.
-
- There is then text allowing a pointer one place past the last element to
- be computed, and then...
-
- ## If both the pointer operand and the result point to elements of the
- ## same array object, or one past the last element of the same array
- ## object, the evaluation shall not produce an overflow; otherwise, the
- ## behavior is undefined.
-
- Now, the important thing is that nowhere in the standard is there any
- license to treat the object a as an array of 20 ints. It is an array
- of 4 arrays of 5 ints each. If you write a[1][7], you are calling for
- the computation of a[1]+7. a[1] here decays to a pointer to int, which
- points to the first element of the array a[1]. a[1]+7 does not point
- to that array. The fact that we know that it points to a certain element
- of the array a[2] is irrelevant; it is undefined behavior to compute this.
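The restriction can be sketched in C. The helper name get_normalized and the ROWS/COLS constants are hypothetical, chosen only for illustration; the sketch assumes the int a[4][5] of the ruling. It reaches the element that a[1][7] was "obviously" aiming at, but renormalizes the subscripts first so that every pointer addition stays inside one int[5] subarray:

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };   /* dimensions of the ruling's example array */

/* Read the element that a "flat" column offset from row `row` would reach,
   carrying whole rows out of the column first, so that a[r] + c never
   leaves the subarray a[r]. */
int get_normalized(int a[ROWS][COLS], size_t row, size_t col)
{
    size_t r = row + col / COLS;   /* carry whole rows out of the column */
    size_t c = col % COLS;         /* remaining offset, now < COLS */
    assert(r < ROWS);              /* still inside the outer array */
    return a[r][c];
}
```

With this helper, get_normalized(a, 1, 7) reads a[2][2], the element the "obvious" extended semantics would have reached, without ever forming a[1]+7.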
-
- ^^ ... but:
- ^^     int a[5][5]; int*p;
- ^^     p=a[1]+5;
- ^^     printf("a[2][0]=a[1][6]=%d\n",*++p);
- ^^
- ^^ sure seems to be allowed (although not recommended).
-
- Even assuming that there is supposed to be a line assigning a value to the
- accessed sub-element of a, this code is certainly not recommended, because
- its intent is to print "a[2][0]=" followed by the value of a[2][1]!
-
- As to whether it's allowed, there seems to be some room for doubt about
- that. The issue, which I think was raised and not resolved some time
- back on this newsgroup, is what exactly the phrase "points one past the
- last element of the array object" means. There is of course an obvious
- interpretation/implementation where the pointer value just gets
- incremented by the appropriate element size, but it could also be that
- the phrase means to signify an *out-of-band* pointer value, which cannot
- be dereferenced unless decremented back into the array.
-
- It would take an Interpretation Ruling to settle that. My own
- interpretation is that Stephen's code is legal.
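Whatever a ruling might say about dereferencing it, the use of the one-past pointer that nobody disputes is as a limit for comparison. A minimal sketch, with the hypothetical helper name sum_row, in which the one-past pointer is computed and compared against but never dereferenced and never incremented further:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the n ints starting at row. `end` points one past the last element:
   it is only ever compared against, never dereferenced. */
int sum_row(const int *row, size_t n)
{
    const int *end;
    const int *p;
    int total = 0;

    assert(row != NULL);
    end = row + n;                 /* one past the last element: allowed */
    for (p = row; p != end; ++p)   /* p stops when it equals end */
        total += *p;
    return total;
}
```

Called as sum_row(a[1], 5) for an int a[5][5], every pointer formed stays within a[1] or one past its end, so the question raised by Stephen's code does not arise.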
-
-
- o' This RFI applies to slices of arrays, but, in *my* opinion, it is
- o' extendable to this case:
-
- [In case anyone has read this far in the thread and doesn't know, this code
- is what the "struct hack" of the subject line refers to.]
-
- o'     struct fred { int i; char s [1]; } *f;
- o'     char *s;
- o'
- o'     /* ... */
- o'     f = malloc (sizeof *f + strlen (ss));
- o'     if (f != NULL)
- o'         strcpy (f->s, ss);
- o'
- o' The pointer f->s points to an object with type "char [1]", and so, if
- o' strlen (ss) > 0, the access to (f->s)[1] required by strcpy is undefined
- o' according to this RFI, even though the array has already decayed to a
- o' pointer.
-
- Clive's opinion here is wrong (see also signature quote). The difference
- between this and the other case is that the standard provides (in ANSI
- section 4.10.3, ISO 7.10.3) specific dispensation for the value returned by
- malloc() to:
-
- ## ... be assigned to a pointer to any type of object and then used to
- ## access such an object or an array of such objects in the space
- ## allocated ...
-
- The pointer f->s does point to an object with type char[1], but it is
- also a pointer into the space returned by malloc(), which may be treated
- as an array of chars. That is, when strcpy computes the equivalent of
- (f->s)[1], and therefore (f->s)+1, both the pointer operand of +, i.e.
- f->s, and the computed result *are* within the same array, namely the one
- returned by malloc(), and all is well.
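Under that reading, the struct hack can be sketched as a self-contained function. The name fred_new is hypothetical; struct fred is the one from Clive's example. The sketch assumes the allocation is sized from the structure itself (sizeof *f, not the size of the pointer), with s[1] already supplying the byte for the terminating '\0':

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fred { int i; char s[1]; };

/* Allocate a struct fred with enough trailing space to hold a copy of src.
   sizeof *f already includes one char of s, which covers the '\0'.
   Returns NULL if malloc fails; the caller frees the result with free(). */
struct fred *fred_new(const char *src)
{
    struct fred *f;

    assert(src != NULL);
    f = malloc(sizeof *f + strlen(src));
    if (f != NULL) {
        f->i = 0;
        strcpy(f->s, src);   /* writes past s[0], but inside the malloc'd space */
    }
    return f;
}
```

Every byte strcpy writes lies within the space malloc returned, which is the property the argument above relies on; whether that suffices is exactly the point in dispute with Clive.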
-
-
- I would also argue that the Interpretation Ruling cited above would not
- apply if the type of the example array "a" had been char instead of int.
- The definitions of "object" and "byte" (in ANSI section 1.6, ISO 3.14 and
- 3.4) in effect require that it be possible to treat any object as an
- array of any character type. This guarantee is assumed, of course, by
- any number of library functions. Once again, this should render legal any
- references that are out of bounds in a subarray but which address locations
- known to be in the containing array.
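The guarantee that library functions lean on can be sketched as a byte-wise copy. The name copy_bytes is hypothetical, and this is only an illustration of the idea, not how any particular library implements memcpy. It walks an arbitrary object as an array of unsigned char, crossing subarray boundaries freely:

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes from src to dst, treating both objects as arrays of
   unsigned char. The objects must not overlap. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;
}
```

For example, copy_bytes(b, a, sizeof a) with int a[4][5] and int b[4][5] steps through all of a one byte at a time, taking no notice of where a[0] ends and a[1] begins.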
-
- If all this had been raised during standardization, I'd've asked for a
- change to render a[1][7] legal as well, because I think it's more C-like
- for it to be legal, and more consistent with the three exceptions that I've
- just mentioned. But it isn't, even though I think they all are.
- --
- Mark Brader                 "'A matter of opinion'[?]  I have to say you are
- SoftQuad Inc., Toronto       right.  There['s] your opinion, which is wrong,
- utzoo!sq!msb, msb@sq.com     and mine, which is right."  -- Gene Ward Smith
-
- This article is in the public domain.
-