NetNews Usenet Archive 1992 #27



Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!ames!agate!dog.ee.lbl.gov!horse.ee.lbl.gov!torek
From: torek@horse.ee.lbl.gov (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: Where are literals stored?
Date: 24 Nov 1992 01:34:12 GMT
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 131
Message-ID: <27627@dog.ee.lbl.gov>
References: <27542@dog.ee.lbl.gov> <By0vzy.4Ly@ocsmd.ocs.com> <14542.610.uupcb@spacebbs.com> <27583@dog.ee.lbl.gov>
Reply-To: torek@horse.ee.lbl.gov (Chris Torek)
NNTP-Posting-Host: 128.3.112.15

In article <27542@dog.ee.lbl.gov> I claimed that strings used as array
initializers are not `string literals'.  Mark Brader and others inform
me that my terminology disagreed with the standard, so I will abandon
that.  (I just moved and my copy of the standard is in a box somewhere
within a pile of furniture....  On the bright side, I now live two
blocks from campus, and hence from the free LBL shuttle.)

In article <14542.610.uupcb@spacebbs.com> ted.jensen@spacebbs.com
(Ted Jensen) writes:
>... according to K&R2 p194 under "A2.6 String Literals"
>"A string literal, also called a string constant, is a sequence
>of characters surrounded by double quotes, as in "..."."
>"A string has type 'array of characters' and storage class static
>and is initialized with the given characters."

Note that K&R-2 is not the standard, and is itself sometimes at
variance with the standard.  (Neither of the quoted statements is
wrong in and of itself, as far as I know, but they do not tell the
whole story.)

>void my_func(void) {
>    static char a[] = "ABC";
>    char b[] = "ABC";
>The actual strings themselves were, in both cases, stored in the
>data segment. ...

I will note (without reference to the standard, although I am confident
that this is correct) that a compiler will have to emit `a' in a data
segment (or local system equivalent).
On the other hand, `b' could be set up with something like:

    // hypothetical 16-bit big-endian machine
    sub   #4,sp           // make room for 4 bytes
    mov   #'AB',0(sp)     // set b[0], b[1]
    mov   #'C\0',2(sp)    // set b[2], b[3]

This might take less space and time than a call to strcpy() or
memcpy(), and would violate nothing in the standard, but it would mean
that the "ABC\0" for array `b' would appear nowhere in the data
segment.

>In the case of array b[] the [Borland] code, on entering the function at
>run time, copies the string from the data segment to the stack and 'b'
>takes on the value of a pointer pointing to the string now on the stack.
>This was not clear from some of the replies which, IMHO, made it sound
>like that in the case of b[] the string literal ABC was not stored in
>the data segment.

There is no reason that the string "ABC" *must* appear in the data
segment (other than for a[]); the Borland compiler merely uses this as
an implementation technique.

In article <27583@dog.ee.lbl.gov> I noted that
>... It is not completely clear whether merging overlapping literals
>is permitted by the ANSI standard, but I would not object to a compiler
>that did so.

Several people have sent mail asking how a conformant program could
tell whether a compiler has done so.  (If a conformant program cannot
test for some effect, that effect is implicitly allowed under the
`as-if' rule.)

The standard leaves undefined the effect of `<' or `>' comparisons on
pointers to different objects, but the `==' and `!=' comparisons are
fully specified.  In particular, given any two valid pointers `p' and
`q', of the same type, the standard tells us that:

    p == q if and only if they point to the same object;
    p != q otherwise.

(This rule can have negative effects on run time for segmented
implementations.  In particular, on some PCs, == and != comparisons of
pointers must normalize the pointers, while </> comparisons can compare
only the offsets.
This means that, in some cases, a loop of the form:

    for (p = &a[0]; p < &a[N]; p++)

will run faster than one of the form:

    for (p = &a[0]; p != &a[N]; p++)

A compiler needs to perform some analysis to discover that these are
equivalent [assuming they are in fact equivalent; this depends on the
loop body]).

Anyway, given this fact, we can write:

    (p + strlen(p)) == (q + strlen(q))

to see whether or not two strings overlap.  This compares pointers to
the objects (of type char or const char) holding the '\0's that end
those strings.  Those pointers will be equal if and only if the two
objects are in fact the same single object.  (Thanks to someone whose
mail address I failed to save for this simplification---I was going to
run a pointer forward along the longer string to look for overlap!)

So, we can now write a strictly conformant program along these lines:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>     /* for strlen() */

    int main(void) {
        char *p, *q;

        p = "string";
        q = "ring";
        /* equal end pointers mean both strings share one '\0' object */
        if (p + strlen(p) == q + strlen(q))
            printf("this compiler merged (%s,%s)\n", p, q);
        else
            printf("this compiler did not merge (%s,%s)\n", p, q);
        exit(EXIT_SUCCESS);
    }

The standard does not explicitly grant license to compilers to do such
merging unless the strings exactly match (e.g., "string" and "string").
On the other hand, the standard does not explicitly prohibit this
either.  It *does* say that pointers to distinct objects always compare
unequal, and "string" and "ring" are certainly distinct objects---but
are the last four characters of each *also* distinct objects?  This is
where comp.std.c gets involved. :-)

Note that if we write, e.g.,

    char ap[] = "string", aq[] = "ring";

we can tell from the standard that ap + strlen(ap) != aq + strlen(aq).
Pointers to anonymous `string objects' are quite different from
pointers into named arrays.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
Berkeley, CA            Domain: torek@ee.lbl.gov
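
The named-array case from the last paragraph of the post can be
sketched as follows.  This is only an illustration and not part of the
original article: the helper name shares_terminator() and the two
small check functions are made up here, and the assert() is merely a
guard against null arguments.  The sketch applies the same
end-of-string comparison to two named arrays, which are distinct
objects, and to two pointers into a single array, which necessarily
end at the same '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the post): returns nonzero if the
 * strings at p and q end in the very same '\0' object.  Only == is
 * used, which is defined even for pointers to unrelated objects.
 */
int shares_terminator(const char *p, const char *q)
{
    assert(p != NULL && q != NULL);
    return p + strlen(p) == q + strlen(q);
}

/*
 * Named arrays are distinct objects, so their terminators must be
 * distinct too.  Returns 1 if that holds, 0 otherwise.
 */
int named_arrays_are_distinct(void)
{
    char ap[] = "string", aq[] = "ring";

    return !shares_terminator(ap, aq);
}

/*
 * Two pointers into ONE array do share a terminator:
 * ap + 2 is the string "ring" inside "string".
 */
int tail_of_same_array_overlaps(void)
{
    char ap[] = "string";

    return shares_terminator(ap, ap + 2);
}
```

Both check functions should return 1 on any conforming compiler.
Calling shares_terminator("string", "ring") with literals instead is
the merge test discussed in the post, and its result is up to the
compiler.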