
AnsiString Primer

This is a concise background primer for those unfamiliar with the long dynamic string type introduced with 32-bit Delphi, otherwise known as AnsiString.

Structure

Under the hood, an AnsiString is primarily just a pointer to a dynamic memory block. Implicit pointer de-referencing and automatic memory management tend to obscure this fact. In response to an AnsiString declaration, only the pointer itself is allocated. Memory for the actual storage of text, preceded by a small header, is dynamically allocated on assignment. In contrast, the older Pascal style strings were, by default, allocated a static 256 byte memory block. The first byte always held the effective length; therefore, usable string length was limited to 255 characters.
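The size difference described above can be checked directly. The following is a minimal sketch, assuming a desktop Delphi compiler or Free Pascal in Delphi mode; the program and variable names are illustrative only.

```pascal
program SizeDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
var
  Long: AnsiString;
  Short: ShortString;
begin
  // The AnsiString variable occupies only pointer-sized storage.
  WriteLn('SizeOf(AnsiString)  = ', SizeOf(Long));
  WriteLn('SizeOf(Pointer)     = ', SizeOf(Pointer));

  // The older Pascal style string always reserves its full static block.
  WriteLn('SizeOf(ShortString) = ', SizeOf(Short));

  // While the string is empty, no text storage exists at all.
  WriteLn('Empty string is nil:    ', Pointer(Long) = nil);

  // Requesting a length makes the memory manager allocate a block.
  SetLength(Long, 10);
  WriteLn('Allocated after resize: ', Pointer(Long) <> nil);
end.
```

The first two lines should report the same value (4 on a 32-bit compiler), while the ShortString reports 256.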

Power and Performance

The dynamic memory used by an AnsiString is transparently managed (allocated, re-allocated and released as necessary) by the Delphi memory manager. As a result, an AnsiString can be easily assigned any length within the limits of available memory. A string that holds 2 bytes can be effortlessly expanded to hold 2 million bytes if needed and vice versa. This is much more powerful than the older Pascal style strings; however, with power comes responsibility. Even though memory management is out of sight, its implications cannot be totally out of mind if good performance is to be achieved. Consider these simple examples:

Poor:
S2 := '';
for I := 2 to length(S1) do S2 := S2 + S1[I];

Better:
setlength(S2, length(S1) - 1);
for I := 2 to length(S1) do S2[I-1] := S1[I];

Ignoring the fact that Copy( ) would normally be used for this, the second example is potentially much more efficient. In the first case, string S2 is being built incrementally. This means that S2 may require repeated re-allocation inside the loop. Re-allocation is done automatically, but it still takes time. In the second example, allocation is performed only once, prior to the loop.
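For comparison, here is a small self-contained sketch that runs both loops next to the Copy( ) call that would normally be used, assuming Delphi or Free Pascal in Delphi mode. The variable names and the sample text are illustrative only.

```pascal
program TailDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
var
  S1, Incremental, Preallocated, Copied: AnsiString;
  I: Integer;
begin
  S1 := 'xHello';

  // Incremental build: each concatenation may trigger a re-allocation.
  Incremental := '';
  for I := 2 to Length(S1) do
    Incremental := Incremental + S1[I];

  // Pre-allocated build: one SetLength call, then plain character stores.
  SetLength(Preallocated, Length(S1) - 1);
  for I := 2 to Length(S1) do
    Preallocated[I - 1] := S1[I];

  // Usual idiom: Copy limits the count to the characters actually available,
  // so MaxInt simply means "everything from position 2 onward".
  Copied := Copy(S1, 2, MaxInt);

  WriteLn(Incremental);
  WriteLn(Preallocated);
  WriteLn(Copied);
end.
```

All three lines print Hello; only the amount of work done by the memory manager differs.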

Complaints regarding AnsiString performance can usually be traced to this sort of memory manager abuse and overuse. As shown above, such abuse can often be eliminated with some simple changes in coding style. Aside from their more powerful, dynamic nature, AnsiStrings are just strings: a linear sequence of bytes in memory. With proper use and management, they are no more and no less efficient than any other string type.

Compatibility

Windows is largely written in C. As a result, the WinAPI functions expect C-style, null-terminated strings. For compatibility, AnsiStrings are also null-terminated. However, outside the API, this terminating null is not normally accessible, and thus it cannot and does not serve as an indicator of string length.

If a null doesn't do it, what sets the length of an AnsiString? Instead of an embedded indicator, effective string length is stored in a small header kept in the dynamic memory block, just ahead of the text. This uses a very small amount of storage overhead, but the benefits are well worth it. Having the string length always readily available (using Length( )) simplifies almost every aspect of AnsiString use and makes string operations more efficient. In comparison, string functions in C/C++ routinely demand either continuous testing for end of string or an advance scan to determine effective length, both of which can adversely affect efficiency and ease of use.
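The difference between the stored length and a C-style scan can be sketched as follows, assuming Delphi or Free Pascal in Delphi mode. CStyleLength is a hypothetical helper written for this illustration, not a library routine; it mimics what a C string function has to do.

```pascal
program LengthDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: counts characters up to the first null,
// the way a C-style string routine must.
function CStyleLength(P: PAnsiChar): Integer;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    Inc(P);
    Inc(Result);
  end;
end;

var
  S: AnsiString;
begin
  // Build a string with a null embedded in the middle.
  S := 'ab';
  S := S + #0 + 'cd';

  // Length reads the stored value; no scan is needed.
  WriteLn('Length(S)      = ', Length(S));

  // The C-style scan stops at the embedded null.
  WriteLn('C-style length = ', CStyleLength(PAnsiChar(S)));

  // The compatibility null sits just past the last character.
  // PAnsiChar indexing is zero-based, so index Length(S) is the terminator.
  WriteLn('Terminator present: ', PAnsiChar(S)[Length(S)] = #0);
end.
```

Length reports 5 while the scan reports 2, which is why the terminating null cannot serve as the length indicator. The PAnsiChar cast is the same one typically used when handing an AnsiString to a WinAPI function.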

Versatility

Within an AnsiString, a character is a character. No single character has any special significance over any other, not even a null. Therefore, an AnsiString is capable of holding not only text but binary data as well. As such, AnsiStrings make good, convenient general purpose buffers. A dynamically allocated buffer (using GetMem( ), GlobalAlloc( ), etc.) with tedious pointer addressing and mandatory cleanup can often be easily replaced with a safer, easier to use AnsiString.
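A minimal sketch of an AnsiString used as a binary buffer, assuming Delphi or Free Pascal in Delphi mode; the buffer size and contents are arbitrary choices for illustration.

```pascal
program BufferDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
var
  Buf: AnsiString;
  I: Integer;
begin
  // One call sizes the buffer; no GetMem/FreeMem pair to keep balanced.
  SetLength(Buf, 256);

  // Store every byte value 0..255, including the null at position 1.
  for I := 1 to Length(Buf) do
    Buf[I] := AnsiChar(I - 1);

  WriteLn('Bytes held: ', Length(Buf));
  WriteLn('First byte: ', Ord(Buf[1]));
  WriteLn('Last byte:  ', Ord(Buf[256]));

  // No explicit cleanup: the memory is released automatically
  // when Buf goes out of scope or is assigned an empty string.
end.
```

Routines that take an untyped buffer parameter are commonly given Buf[1] together with Length(Buf), so the same variable can stand in wherever a GetMem( ) block would otherwise be passed.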

Summary

AnsiStrings are a new, more powerful string type available for the first time with 32-bit Delphi. With proper coding, these new strings are just as efficient as the older Pascal style strings, yet much more powerful and versatile. They also offer compatibility with the C strings used by the WinAPI.

With power, versatility and compatibility, why use anything but AnsiString?