- Path: sparky!uunet!think.com!ames!agate!dog.ee.lbl.gov!horse.ee.lbl.gov!torek
- From: torek@horse.ee.lbl.gov (Chris Torek)
- Newsgroups: comp.lang.c
- Subject: Re: Where are literals stored?
- Date: 20 Nov 1992 21:17:46 GMT
- Organization: Lawrence Berkeley Laboratory, Berkeley
- Lines: 109
- Message-ID: <27583@dog.ee.lbl.gov>
- References: <27542@dog.ee.lbl.gov> <By0vzy.4Ly@ocsmd.ocs.com>
- Reply-To: torek@horse.ee.lbl.gov (Chris Torek)
- NNTP-Posting-Host: 128.3.112.15
-
- In article <27542@dog.ee.lbl.gov> I wrote:
- >>String literals are formed from quoted strings that appear in a
- >>value context, e.g.,
- >> char *p, *q;
- >> p = "ABC";
- >> q = "ABC";
- >>
- >> /* it is unspecified whether p == q */
-
- In article <By0vzy.4Ly@ocsmd.ocs.com> ted@ocsmd.ocs.com (Ted Scott) writes:
- >Ok, *I'm* confused here. Does this mean that if I do something like:
- >
- > *p[1] = ' ';
- >
- >q will now point to "A C" ??
-
- Since p[1] is a value of type `char', *p[1] is illegal: the operand of
- a unary `*' indirection operator must have type `pointer to T', for some
- valid type T.
-
- Assuming you meant
-
- p[1] = ' ';
-
- >or will the above assignment yield a SEGV? or what? (I know probably what :)
-
- Yes, `or what' :-) . In ANSI C, the effect is explicitly undefined;
- *anything* can happen. Your computer might suddenly quote King Lear
- (`old Tom's a-cold!'). If this were comp.std.c we could stop there,
- but...:
-
- >To carry this on further, what if p and q are in different scope, but the
- >same source file?
-
- The scope and duration of p and q are irrelevant until we limit ourselves
- to specific implementations.
-
- Given the code fragment:
-
- char *p = "ABC";
- p[1] = ' ';
-
- one of three effects is likely:
-
- 1) p[1] (which was 'B') is replaced by the code for ' ',
- so that p points to "A C".
- 2) The program aborts (e.g., segmentation fault).
- 3) The program continues, but p[1] remains 'B'.
-
- Effect (1) occurs in a number of implementations, all of which simply
- put string literals somewhere in RAM. Effect (2) occurs on many
- quality UNIX C implementations, all of which simply put string literals
- somewhere in the read-only protected text space. Effect (3) occurs
- on most embedded implementations, which simply put string literals
- somewhere in ROM.
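
For contrast with the undefined write above, here is a minimal sketch of a write whose effect is defined: the literal initializes an array, so the program owns a private, writable copy. The function name `modify_copy` is made up for illustration and is not from the original post.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the private copy reads "A C". */
int modify_copy(void)
{
    char buf[] = "ABC";        /* array initialized from the literal */

    assert(sizeof buf == 4);   /* 'A', 'B', 'C', and the terminating NUL */
    buf[1] = ' ';              /* well defined: buf is ordinary writable storage */
    return strcmp(buf, "A C") == 0;
}
```

Because `buf` is an object of its own, none of the three effects listed above applies to it, wherever the implementation keeps the literal itself.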
-
- Now, with effects (2) and (3), if we have:
-
- char *q = "ABC";
-
- anywhere else in the same program, we will not be able to see any
- change, because there *was* no change (either the program aborted or
- the write attempt was ignored). So let us assume we see effect (1).
- What happens to q?
-
- This time the effect depends on exactly *where* string literals go
- ---which was part of the original question. Again, it is unanswerable
- without referring to specific implementations.
-
- Some compilers, particularly those which attempt to save space in the
- final executable, will `remember' every string literal as it appears in
- the source, and will make only one copy of each distinct literal. In
- this case, given
-
- char *p = "ABC", *q = "ABC";
-
- we will have p == q and (assuming effect (1)) both will point to "A C"
- after the assignment (p[1] = ' ').
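
A small probe, offered only as a sketch, can report what a particular compiler does with two identical literals. The name `literals_shared` is hypothetical. The result may differ between compilers, and between option settings of the same compiler, and either answer is conforming. The probe only compares addresses and never writes through them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical probe: returns 1 if both "ABC" literals share storage
 * on this implementation, 0 if they are distinct. */
int literals_shared(void)
{
    const char *p = "ABC";
    const char *q = "ABC";

    assert(strcmp(p, q) == 0);  /* the contents always match */
    return p == q;              /* the addresses may or may not */
}
```

Declaring the pointers as `const char *` lets the compiler diagnose an accidental write such as `p[1] = ' '`.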
-
- If p and q are separated by some distance---say, a scope block or
- separate source files---whether they will still compare equal depends
- on just how much effort the compiler puts into conserving space. It is
- not too difficult to keep a complete table of every string in every
- single source file; any compiler that defers code generation until
- `link' time is capable of arranging for p==q even if p and q are in
- separate files, or even if one (or both) is in a library. The reward
- for this effort is typically fairly small, however, and I know of no
- compilers that do it. (This does not mean that none exist.)
-
- That said, in a compiler for which compile-time space is not a serious
- issue (e.g., GCC), it is reasonable for the compiler to remember every
- string literal in a single source file. In this case, p==q iff the two
- "ABC"s appear in the same source file. Those who want better string
- optimization can use `helper' programs like the BSD UNIX `xstr'.
-
- Incidentally, note that even xstr is suboptimal: it does not catch
- libraries, and given something like:
-
- char *p = "world", *q = "Hello world";
-
- it misses the chance to make the `world's overlap. (It catches them
- when they appear in the other order. To do a better job, xstr should
- read the entire file, then sort literals by length before attempting to
- combine tails.) It is not completely clear whether merging overlapping
- literals is permitted by the ANSI standard, but I would not object to a
- compiler that did so.
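
The "sort by length, then combine tails" idea can be sketched as follows. This is not xstr's actual code. The `struct pool` layout, the fixed limits, and the names `pool_build` and `longer_first` are assumptions made for illustration. Literals are placed longest first, and each later literal is checked against the tails of those already placed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LITS 64                 /* arbitrary limit for this sketch */

struct pool {
    char   data[1024];              /* merged storage, NUL-terminated strings */
    size_t used;                    /* bytes of data[] in use */
    size_t offset[MAX_LITS];        /* offset[i]: where input literal i lives */
};

struct item { const char *s; size_t len; size_t idx; };

/* qsort comparison: longer strings sort first */
static int longer_first(const void *a, const void *b)
{
    const struct item *x = a, *y = b;

    if (x->len != y->len)
        return x->len < y->len ? 1 : -1;
    return 0;
}

/* Returns 0 on success, -1 if the sketch's fixed limits are exceeded. */
int pool_build(struct pool *pl, const char *const *lits, size_t n)
{
    struct item it[MAX_LITS];
    size_t i, j;

    assert(pl != NULL && lits != NULL);
    if (n > MAX_LITS)
        return -1;
    for (i = 0; i < n; i++) {
        it[i].s = lits[i];
        it[i].len = strlen(lits[i]);
        it[i].idx = i;
    }
    qsort(it, n, sizeof it[0], longer_first);

    pl->used = 0;
    for (i = 0; i < n; i++) {
        /* Every it[j] with j < i is already placed and is no shorter. */
        for (j = 0; j < i; j++) {
            size_t skip = it[j].len - it[i].len;

            if (strcmp(it[j].s + skip, it[i].s) == 0) {
                /* it[i] is a tail of it[j]: point into its storage */
                pl->offset[it[i].idx] = pl->offset[it[j].idx] + skip;
                break;
            }
        }
        if (j == i) {               /* no tail matched: append a fresh copy */
            if (pl->used + it[i].len + 1 > sizeof pl->data)
                return -1;
            memcpy(pl->data + pl->used, it[i].s, it[i].len + 1);
            pl->offset[it[i].idx] = pl->used;
            pl->used += it[i].len + 1;
        }
    }
    return 0;
}
```

Given the two literals "world" and "Hello world" in either order, this stores the 12 bytes of "Hello world" once and places "world" 6 bytes into it. The inner search is quadratic in the number of literals, which is acceptable for a sketch but not for a large program.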
- --
- In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
- Berkeley, CA Domain: torek@ee.lbl.gov
-