- Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!ames!agate!dog.ee.lbl.gov!horse.ee.lbl.gov!torek
- From: torek@horse.ee.lbl.gov (Chris Torek)
- Newsgroups: comp.lang.c
- Subject: Re: Where are literals stored?
- Date: 24 Nov 1992 01:34:12 GMT
- Organization: Lawrence Berkeley Laboratory, Berkeley
- Lines: 131
- Message-ID: <27627@dog.ee.lbl.gov>
- References: <27542@dog.ee.lbl.gov> <By0vzy.4Ly@ocsmd.ocs.com> <14542.610.uupcb@spacebbs.com> <27583@dog.ee.lbl.gov>
- Reply-To: torek@horse.ee.lbl.gov (Chris Torek)
- NNTP-Posting-Host: 128.3.112.15
-
- In article <27542@dog.ee.lbl.gov> I claimed that strings used as array
- initializers are not `string literals'. Mark Brader and others inform
- me that my terminology disagreed with the standard, so I will abandon
- that. (I just moved and my copy of the standard is in a box somewhere
- within a pile of furniture.... On the bright side, I now live two
- blocks from campus, and hence from the free LBL shuttle.)
-
- In article <14542.610.uupcb@spacebbs.com> ted.jensen@spacebbs.com
- (Ted Jensen) writes:
- >... according to K&R2 p194 under "A2.6 String Literals"
- >"A string literal, also called a string constant, is a sequence
- >of characters surrounded by double quotes, as in "..."."
- >"A string has type 'array of characters' and storage class static
- >and is initialized with the given characters."
-
- Note that K&R-2 is not the standard, and is itself sometimes at
- variance with the standard. (Neither of the quoted statements is
- wrong in and of itself, as far as I know, but they do not tell
- the whole story.)
-
- >void my_func(void) {
- > static char a[] = "ABC";
- > char b[] = "ABC";
-
- >The actual strings themselves were, in both cases, stored in the
- >data segment. ...
-
- I will note (without reference to the standard, although I am confident
- that this is correct) that a compiler will have to emit `a' in a data
- segment (or local system equivalent). On the other hand, `b' could be
- set up with something like:
-
- // hypothetical 16-bit big-endian machine
- sub #4,sp // make room for 4 bytes
- mov #'AB',0(sp) // set b[0], b[1]
- mov #'C\0',2(sp) // set b[2], b[3]
-
- This might take less space and time than a call to strcpy() or memcpy(),
- and would violate nothing in the standard, but it would mean that the
- "ABC\0" for array `b' would appear nowhere in the data segment.
-
- >In the case of array b[] the [Borland] code, on entering the function at
- >run time, copies the string from the data segment to the stack and 'b'
- >takes on the value of a pointer pointing to the string now on the stack.
- >This was not clear from some of the replies which, IMHO, made it sound
- >like that in the case of b[] the string literal ABC was not stored in
- >the data segment.
-
- There is no reason that the string "ABC" *must* appear in the data
- segment (other than for a[]); the Borland compiler merely uses this
- as an implementation technique.
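
Whichever technique a compiler picks, the observable behaviour is the same: `a` is initialized once, while `b` is initialized afresh on every entry to the function. A minimal sketch, assuming a variant of the quoted function with two out-parameters (`first_a` and `first_b` are hypothetical names added here for illustration, not part of the original):

```c
/* Hypothetical variant of my_func(): reports the first character of each
 * array as seen on entry, then overwrites both first characters. */
void my_func(char *first_a, char *first_b)
{
    static char a[] = "ABC";   /* one object for the whole program run */
    char b[] = "ABC";          /* new object, re-initialized on every call */

    *first_a = a[0];
    *first_b = b[0];
    a[0] = 'x';                /* still there on the next call */
    b[0] = 'x';                /* gone when the function returns */
}
```

Calling it twice in a row should report 'A' for both arrays the first time, and 'x' for `a` but 'A' for `b` the second time. How `b` gets its bytes (a copy from the data segment, immediate moves, or anything else) is invisible to the program.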
-
- In article <27583@dog.ee.lbl.gov> I noted that
- >... It is not completely clear whether merging overlapping literals
- >is permitted by the ANSI standard, but I would not object to a compiler
- >that did so.
-
- Several people have sent mail asking how a conformant program could
- tell whether a compiler has done so. (If a conformant program cannot
- test for some effect, that effect is implicitly allowed under the
- `as-if' rule.)
-
- The standard leaves undefined the effect of `<' or `>' comparisons on
- pointers to different objects, but the `==' and `!=' comparisons are
- fully specified. In particular, given any two valid pointers `p' and
- `q', of the same type, the standard tells us that:
-
- p == q if and only if they point to the same object;
- p != q otherwise.
-
- (This rule can have negative effects on run time for segmented
- implementations. In particular, on some PCs, == and != comparisons of
- pointers must normalize the pointers, while </> comparisons can compare
- only the offsets. This means that, in some cases, a loop of the form:
-
- for (p = &a[0]; p < &a[N]; p++)
-
- will run faster than one of the form:
-
- for (p = &a[0]; p != &a[N]; p++)
-
- A compiler needs to perform some analysis to discover that these are
- equivalent [assuming they are in fact equivalent; this depends on the
- loop body]).
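
To make the two loop forms concrete, here is a sketch that wraps each in a hypothetical function (`sum_lt` and `sum_ne` are names invented for this example). They give the same result only because neither loop body modifies `p`; whether one is actually faster depends entirely on the implementation.

```c
#include <stddef.h>

/* Sum n ints using a relational (<) loop test. */
long sum_lt(const int *a, size_t n)
{
    long total = 0;
    const int *p;

    for (p = &a[0]; p < &a[n]; p++)
        total += *p;
    return total;
}

/* Same loop with an equality (!=) test. This is equivalent only because
 * the body never changes p, so p cannot step past &a[n]. */
long sum_ne(const int *a, size_t n)
{
    long total = 0;
    const int *p;

    for (p = &a[0]; p != &a[n]; p++)
        total += *p;
    return total;
}
```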
-
- Anyway, given this fact, we can write:
-
- (p + strlen(p)) == (q + strlen(q))
-
- to see whether or not two strings overlap. This compares pointers to
- the objects (of type char or const char) holding the '\0's that end
- those strings. Those pointers will be equal if and only if the two
- objects are in fact the same single object. (Thanks to someone whose
- mail address I failed to save for this simplification---I was going
- to run a pointer forward along the longer string to look for overlap!)
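
The comparison can be wrapped in a small helper. This is a sketch only, and the name `shares_terminator` is made up for this example:

```c
#include <string.h>

/* Hypothetical helper: nonzero if the strings at p and q end in the very
 * same '\0' object, i.e. one string is a tail of the other in memory. */
int shares_terminator(const char *p, const char *q)
{
    return p + strlen(p) == q + strlen(q);
}
```

Given `char buf[] = "string";`, the call `shares_terminator(buf, buf + 2)` is true, because `buf + 2` is the string "ring" living inside `buf`. Two separately declared arrays always give false.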
-
- So, we can now write a strictly conformant program along these lines:
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>   /* for strlen() */
-
- int main(void) {
-     const char *p, *q;
-
-     p = "string";
-     q = "ring";
-     /* compare the addresses of the two terminating '\0's */
-     if (p + strlen(p) == q + strlen(q))
-         printf("this compiler merged (%s,%s)\n", p, q);
-     else
-         printf("this compiler did not merge (%s,%s)\n", p, q);
-     exit(EXIT_SUCCESS);
- }
-
- The standard does not explicitly grant license to compilers to do such
- merging unless the strings exactly match (e.g., "string" and
- "string"). On the other hand, the standard does not explicitly
- prohibit this either. It *does* say that pointers to distinct objects
- always compare unequal, and "string" and "ring" are certainly distinct
- objects---but are the last four characters of each *also* distinct
- objects? This is where comp.std.c gets involved. :-)
-
- Note that if we write, e.g.,
-
- char ap[] = "string", aq[] = "ring";
-
- we can tell from the standard that ap + strlen(ap) != aq + strlen(aq).
- Pointers to anonymous `string objects' are quite different from pointers
- into named arrays.
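
A sketch of the named-array case, using a hypothetical function name (`named_arrays_are_distinct` is not from the post). Because `ap` and `aq` are separate declared objects, their terminators cannot be the same object, so the function should return 1 on any conforming implementation:

```c
#include <string.h>

/* Returns 1 when the terminating '\0' of ap and that of aq are different
 * objects, which must hold because ap and aq are distinct arrays. */
int named_arrays_are_distinct(void)
{
    char ap[] = "string", aq[] = "ring";

    return ap + strlen(ap) != aq + strlen(aq);
}
```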
- --
- In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
- Berkeley, CA Domain: torek@ee.lbl.gov
-