home *** CD-ROM | disk | FTP | other *** search
- From: richw@inmet.camb.inmet.com
- Date: 21 Aug 1992 17:25 EDT
- Newsgroups: comp.lang.c
- Subject: Re: Allocate memory to typed in string,
- Message-ID: <20900033@inmet>
- Path: sparky!uunet!noc.near.net!inmet!inmet!richw
- Nf-ID: #R:<bibhas:712893811:inmet:20900033:000:5742
- Nf-From: inmet.camb.inmet.com!richw Aug 21 17:25:00 1992
- References: <712893811@<bibhas>
- Lines: 139
-
-
- An algorithm to read an arbitrarily long input line that I've used might be
- encoded in a routine with the following kind of "spec":
-
- char *
- read_line (f, buffer, buffer_size)
- FILE *f;
- char *buffer;
- int buffer_size;
- /*
- Reads characters from "f" until either a newline has been read
- or an EOF. The characters read are either placed in the memory
- pointed to by "buffer" or in memory allocated by "malloc";
- "buffer" is used if fewer than "buffer_size" characters are
- read. A pointer to the memory used (i.e. the value of "buffer"
- or of the "malloc"-ed string) is returned, with the input line
- terminated by a '\0'.
-
- To always have this function use "malloc"-ed memory, pass in a
- value of "0" for buffer_size.
- */
-
- Note that this function lets one allocate a "usually big enough" buffer and
- pass it in as "buffer", but it *also* handles the "rare" case where the input
- line is longer than that. (Personally, I think that any answer to the
- original question that refuses to deal with the exceptional case is not
- really answering the question.)
-
- I'll describe the algorithm for "read_line":
-
- "read_line" uses a secondary, recursive function with the following spec:
-
-
- char *
- finish_line (f, reserve_count)
- FILE *f;
- int reserve_count;
- /*
- Like "read_line", this reads N characters from "f" until either a
- newline has been read or an EOF. The characters are placed in
- a "malloc"-ed piece of memory whose length is X, where X equals
- (reserve_count + N + 1), where the additional 1 is for the term-
- inating '\0'. The pointer, P, to the "malloc"-ed space is returned.
- The input N characters and the terminating '\0' are written to
- (P + reserve_count), thus leaving space for "reserve_count" characters
- read earlier.
- */
-
-
- The body of "finish_line" includes a local buffer:
-
- char buffer [BUFFER_SIZE];
-
- where "BUFFER_SIZE" is some "#define"-ed constant. The second to last element
- of "buffer" is initialized to '\0'. Then, "buffer" is given to "fgets",
- which may or may not read a complete line (i.e. it won't if "BUFFER_SIZE"
- isn't big enough. To determine whether or not "fgets" did read a whole line:
-
- (1) if the second-to-last "buffer" character is still '\0',
- then "fgets" must have read a whole line;
-
- (2) Otherwise, if the second-to-last "buffer" character is '\n',
- then, again, "fgets" must have read a whole line;
-
- (3) Otherwise, we don't know if EOF was encountered after *exactly*
- "BUFFER_SIZE - 1" characters were read, or if characters remain
- in "f" in this line. In either case, we recursively call
- "finish_line", with "reserve_count + BUFFER_SIZE - 1" as the
- "reserve_count" parameter to the recursive call.
-
- In case (1), we "malloc" a string size: (reserve_count + strlen (buffer) + 1);
- in case (2), we "malloc" a string size: (reserve_count + BUFFER_SIZE);
- in case (3), the "malloc"-ed space is returned by the recursive call.
-
- In cases (1) and (2), use "strcpy (result + reserve_count, buffer)" to transfer
- this final invocation's chunk of the line into the result string; in case (3),
- use "memcpy" instead since the '\0' in "buffer" should not be transferred.
-
-
- The implementation of "read_line" is entirely analogous, except that the
- buffer and its size are the passed-in arguments, and not the "buffer" local
- and the BUFFER_SIZE constant. Also, in cases (1) and (2), no "malloc" or
- "strcpy" is needed.
-
-
- Performance:
-
- In those cases where the "buffer_size" argument to "read_line" is larger than
- the size of the input string, the cost equals the cost of a "fgets" plus the
- minimal overhead of checking that "fgets" got the whole line. Thus, if your
- "buffer_size" is "usually big enough", then this "usually" is just a little
- slower than "fgets" (but a lot safer).
-
- In those cases where "buffer_size" is not big enough, then the cost is:
-
- 1 malloc + N strcpy calls + (N + 1) fgets calls
- + (read_line overhead) + N * (finish_line overhead)
-
- where N is the number of times "finish_line" is called, or about:
-
- (((input line length) - buffer_size) / BUFFER_SIZE) rounded-up
-
- In terms of memory usage:
-
- When "buffer_size" is big enough, there's just the memory needed for the
- "read_line" stack frame (which intentionally does *not* include a "buffer"
- like "finish_line"). When it's not big enough, there are N additional
- "finish_line" stack frames, each of which includes a BUFFER_SIZE "buffer".
-
-
-
- Note that this *intentionally* avoids using "strcat" with each recursize call,
- since the cost of each such call gets "BUFFER_SIZE" more expensive than the
- last.
-
- Note this also effectively allocates temporary storage in the run-time stack
- for each extra "chunk" of input that remains; this should be faster than
- using a linked-list of "malloc"-ed chunks, as someone suggested, presumably
- since stack allocation and freeing is faster than "malloc" and "free" calls.
-
- Rich Wagner
- Intermetrics, Inc.
- (richw@inmet.camb.inmet.com)
-
-
- P.S. I should also point out that an alternative solution involves being
- optimistic in "finish_line" and -- instead of using a "buffer" local --
- using "malloc" to allocate a pointer, P, to (reserve_count + BUFFER_SIZE)
- bytes. The "fgets" then reads directly into (P + "reserve_count").
- This loses if "fgets" then doesn't get the whole string; in that case,
- you paid the price of one extra "malloc" and one extra "free", whereas
- using a "buffer" local would've been faster.
-
- Also, even if the "fgets" in "finish_line" does fit the rest of the
- line into the the space at P, there'll probably unused bytes at the
- end of that "malloc"-ed space, whereas the original method allocates
- exactly the number of needed bytes from the heap.
-
-
-