═══ 1. Authors ═══

Version 1.0

Steve Hargis
Mike Skelton

IBM Corporation
Personal Software Products
LAN Systems Performance
Austin, Texas

Please send comments to:
Internet: hargis@vnet.ibm.com
IBM Mail ID: usib4pvk@ibmmail

═══ 2. Trademarks ═══

References in this publication to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any of IBM's intellectual property rights or other legally protectable rights may be used instead of the IBM product, program, or service. Evaluation and verification of operation in conjunction with other products, programs, or services, except those expressly designated by IBM, are the user's responsibility.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not imply giving license to these patents. Written license inquiries may be sent to the IBM Director of Commercial Relations, IBM Corporation, Purchase, NY 10577.

The following terms in this publication are trademarks of the IBM Corporation in the United States and/or other countries:

IBM Corporation
C Set/2, C/C++ Tools, IBM, OS/2, SPM/2, Theseus2

═══ 3. Abstract ═══

This paper reviews memory leak detection and fixes as experienced by the IBM LAN Systems Performance area using the IBM C Set/2, IBM C/C++ Tools, and other compilers with Theseus2. The use of compiler facilities and utility programs to detect and fix memory leaks is demonstrated. The paper also discusses why memory leaks are a concern, what can be done to detect and fix them, and how to avoid memory leak problems. We give extensive examples of how common memory leaks are detected and fixed using the tools described in the paper.
Software designers, developers, testers, performance analysts or others who are responsible for producing memory neutral software will benefit from the design and programming recommendations and tools described in this paper. It is our hope that the techniques used in this paper will become part of the repertoire of every developer and that the tools covered here will become part of the toolbox of every analyst.

═══ 4. Introduction ═══

Creating an excellent software product includes two goals. The first is making the product excellent through good design and development. The second is iterating through comprehensive testing until all problems are fixed. We will focus on these goals as they pertain to memory management in software products. We cover the methodology and tools needed to be proactive/preventative in getting memory management right during design, code and unit test. We also address the methodology and tools needed to effectively react to memory management bugs found in the integration and system test phases.

It seems only fair that we disclose any bias that may occur as we write. This white paper is written by members of a performance team; that is not to say that none of us has code in products, because some of us do. It does indicate that we highly value design and analysis before development, in order to avoid errors, and further analysis during development and test. As we look at code we consider its resource usage (i.e. CPU time, disk, memory, communication channels) and effects on other applications that will run concurrently. Thus we value low resource utilization over fast development and economy of resources over easy-to-code solutions. It is our experience that adherence to disciplined design and development will lead to increased software productivity and shorter development and test cycles.

Teachers are often told to teach by 1) telling the students what you are going to teach them; 2) teaching them; and 3) telling them what you taught them.
Thus, after some basic definitions (the first section), the paper lists the lessons that we learned (the second section) about memory debugging as it relates to product development. We try to be clear and explicit. The third section of the paper lists the tools that we used and discusses how we used each tool; hopefully, this will equip developers to apply the tools to software products. The last section of the paper consists of several examples of exactly how we detected and plugged different types of memory leaks. This last section can be viewed as a tutorial.

Two appendices are also included. The first appendix documents an anomaly that causes memory leaks when C code is interfaced with C++ code. The second appendix contains excerpted and edited forum questions and answers about the memory management function malloc.

═══ 5. Definitions ═══

Memory Leak

A memory leak is memory that a program allocates, or that is allocated on a program's behalf, and that is not freed after the program is finished with it. If the same action is executed again, the program allocates more memory instead of using memory previously allocated. With repeated executions the program accumulates more memory than it needs. The accumulation of memory by the leaking program prevents other applications and system functions from using the memory, thereby interfering with the system's operation. Memory leaks are usage dependent, like any programming bug. A program that has not leaked when running one set of test cases may leak when running a different test case.

Memory Overwrite

Also called a memory walker. A memory overwrite occurs when a program writes more data to a location than it allocated memory for. The effect is to overwrite and possibly wipe out data that happens to be located immediately after it in memory. Whether or not a memory overwrite occurs depends on how many additional bytes are stored beyond the amount allocated.
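The two definitions above can be made concrete with a minimal C sketch. The function names (copy_string, length_leaky, length_neutral) are hypothetical and are not taken from any product code discussed in this paper; they only illustrate the general pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of text, or NULL on failure. The caller must free() it. */
char *copy_string(const char *text)
{
    size_t needed;
    char *copy;

    assert(text != NULL);
    /* strlen() does not count the terminating '\0'. Allocating only
       strlen(text) bytes and then copying the whole string would store one
       byte past the end of the block: a memory overwrite. */
    needed = strlen(text) + 1;
    copy = malloc(needed);
    if (copy != NULL)
        memcpy(copy, text, needed);
    return copy;
}

/* Leaky: the copy is never freed and its address is lost on return, so every
   call accumulates another block. */
size_t length_leaky(const char *text)
{
    char *copy = copy_string(text);
    return copy ? strlen(copy) : 0;
}

/* Memory neutral: the function frees exactly the memory it allocated. */
size_t length_neutral(const char *text)
{
    char *copy = copy_string(text);
    size_t length = copy ? strlen(copy) : 0;
    free(copy);                 /* free(NULL) is a harmless no-op */
    return length;
}
```

Both length functions return the same value for the same input; the only difference is what remains allocated afterwards, which is why a leak is easy to miss in a short functional test.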
The observable symptom is intermittent data corruption, which is dependent upon the length of data stored in memory (which is not an intuitive item to check). How the length of data is related to memory overwrites is the subject of an example in the section Detection and Solution Examples.

Memory Neutrality

A module, component, function, API, program, system, etc. is said to be memory neutral if it

1. deallocates all memory for which it is responsible (this is usually the same memory that it allocates, however some specification standards specify that function A is supposed to deallocate memory allocated by function B); and
2. does not attempt to deallocate more memory than that for which it is responsible.

Both of these items must be true for memory neutrality. The basic definition is straightforward: deallocate all, and only, the memory that you allocate. However, when specifications call for a module other than the one that allocated the memory to delete it, the definition gets a little fuzzy in order to include such cases. The "for which it is responsible" wording takes care of unusual specifications.

═══ 6. Design and Programming Recommendations ═══

Our experiences have motivated creation of a set of design and programming recommendations that focus specifically on memory management. Their purpose is to be proactive during design, code, unit test and functional test so that discovery of memory problems in complex system tests is limited to genuinely complex problems that could not reasonably be found in simple test environments. The earlier in product development these recommendations are applied, the smoother (and faster) development will go.

Know the memory usage of each component within the product.

A development approach that has gained much popularity is prototyping code.
A quick prototype is used to prove that a concept will work, with the expectation that the lessons learned from the prototype will be kept but the prototype itself will be discarded and "real," quality code will be written for the product. However, schedules are often aggressively short and the prototype becomes the product. We have noticed that a major short cut taken in prototypes is in memory management. And why not? The prototype runs for a short time using whatever memory is needed and ends, thus freeing any memory used. As the accumulation of prototypes comes together as a product there has been neither a design for memory management nor a prototype for memory management. It is difficult to diagnose memory problems when no one has the big picture of how the product is supposed to use memory. The logical person to know what is supposed to happen is the developer, but he may need assistance in discovering what is really happening. Diagnosing memory problems usually requires an analyst using special software tools to provide an "under the covers" look at program behavior AND the developer to provide knowledge about what is supposed to be happening. The analyst provides expertise on what is actually occurring and the developer provides expertise on what is supposed to be occurring. Together their expertise can be used to make the product behave as it was designed. Ideally, the analyst and the developer are the same person. Do not rely on the operating system to delete memory when the process dies. This is a common practice with, what would seem, no ill effects. After all, if you need the memory until the process dies, why not let the operating system get rid of it for you? That the operating system will deallocate resources when the process dies is true; however, counting on that only works for relatively short-lived processes. A short running utility can count on that, but major processes that stay up for days, weeks or months should not be implemented that way. 
There are four main reasons 24/7 (24 hours a day, 7 days a week) processes, and arguably even shorter lived processes, should not rely on the operating system to delete resources.

The first is that resource consumption will grow over time; if the process is short-lived and the resource consumption is small this may be acceptable, but if the process runs for a long time the resources consumed will begin to reach the maximum of the machine. For example, if memory is allocated to store information until a process dies and the process does not die for 3 weeks, the information stored will consume a lot of memory. Consuming lots of memory will cause some memory somewhere (wherever it is used the least) to be swapped. It will not take very long for the swap file to be so large that it adversely affects the performance of everything that runs on the machine - all because memory usage was not tracked when allocations began so that the memory could be freed or reused when its previous use had ended.

The second reason is that code may be ported or expected to run on a different operating system/platform. The new platform may not clean up resources as nicely as OS/2. Or perhaps the other platform does clean up nicely today, but it may not in the next release.

A third reason to not rely on the operating system to delete memory when a process dies is that the program's design may change. Code may be written to run in a process of its own; however, the process/thread model of a product is subject to change. Code may be consolidated from several processes into a smaller number of longer lived processes in order to avoid process overhead and process creation and deletion overhead. Therefore, code that started out in a process of its own may be consolidated to run on a thread(s) of a process containing other threads. Resources such as memory, semaphores and files are owned by the process, not by the thread.
So, when the code now terminates the thread, the process remains - along with all of the resources accumulated by the thread but never freed.

The last reason for not relying on the operating system to delete resources is that debugging is greatly complicated by it. We strongly advocate use of the C Set/2 and C/C++ Tools debug memory functions; when using them we have noticed that many programs are not memory neutral at the instruction immediately prior to returning control to the operating system. So if we are trying to fix a memory leak we must first sift through the "noise" to separate memory that is being leaked "on purpose" from memory that is being leaked accidentally. This is a major hindrance to debugging.

Let us close this piece of advice by saying that sound coding practices include allocating machine resources as needed, tracking those resources and deallocating them at the earliest reasonable time. Relying on the operating system to clean up a process's use of resources is dangerous in today's cross-platform code environment.

Avoid designs where one module allocates memory and another module deallocates the memory.

The exception, of course, is when the design specification comes from a standards institute and you have no control over it. This sort of design can lead to problems such as:

1. Poor communication between the memory allocation and deallocation modules, or between their programmers, will lead to memory leaks.
2. When more than one C runtime exists, each runtime will need APIs to free memory and run _heapmin. Then a module compiled with one runtime can call the API of another runtime to deallocate memory that was allocated by a module compiled with that other runtime. (_heapmin is a C runtime function that attempts to return unused heap memory to the operating system. It is an expensive call, thus it should not be run frequently. The algorithm for when it successfully returns memory to the operating system is complex.
We recommend running _heapmin after a significant amount of processing in memory has completed and a lull of activity in the system is expected; this way the time to execute _heapmin may go unnoticed by the system users.)
3. Very few people, if anyone, will really know who is allocating what, for what, and who can deallocate or reuse what, under what conditions; and communicating that to fellow team members will be extremely difficult, not to mention the difficulties in answering/debugging memory usage questions.

Minimize the number of different compilers used in the product.

Each compiler (and usually each version of a single compiler) has its own runtime library, and runtime libraries do not communicate with each other. Inevitably, a module compiled with compiler x will want to know about memory structures compiled with compiler y. They will not be able to communicate unless you code that ability; that seems like a waste of good time. Multiple compilers will also complicate your build environment and your builds.

Set the criteria for determining memory neutrality early in the development cycle.

Testing for memory neutrality can consume some time to create a build with memory debug features in it, execute test cases and analyze the data, but the results are worth the effort. If the memory usage is correct this analysis goes quickly. Our strong recommendation is that you use the debug memory features available in C Set/2 and C/C++ Tools in order to have an authoritative (the compiler knows best) source certify that your code is memory neutral. We recommend that memory neutrality be a test exit criterion beginning with functional verification test. If code has memory problems, finding and fixing them in a small environment is much easier than waiting until tests in a large, complex environment. Since memory use bugs are usage dependent, like any other code bug, some memory usage bugs may not show up until the code is used in a large, complex environment.

═══ 7. Memory Debugging Tools ═══

There are several tools that we have found to be very useful for debugging OS/2 memory problems. The sections that follow introduce the tools and their features. We recommend that you become familiar with each of them.

═══ 7.1. C Set/2 and C/C++ Tools Debug Memory Management Features ═══

The C Set/2 and C/C++ Tools compilers have some invaluable features when it comes to debugging memory problems. Most notable is their battery of debug memory management features. This consists of debug versions of the calloc, free, _heapmin, malloc and realloc functions. The debug versions, respectively, are _debug_calloc, _debug_free, _debug_heapmin, _debug_malloc and _debug_realloc; there is also a _dump_allocated function that prints information about every object on the heap.

Version 2 of the compiler also supports debugging the new and delete operators. New and delete are C++ operators/functions to allocate and deallocate memory; they are analogous to malloc and free respectively (although some memory bookkeeping is added). Using new eliminates the need for a call to sizeof and the need to cast the returned pointer; likewise, using delete causes class destructors to be automatically called. Note that although there are functions to debug tiled memory (see the C Library Reference), there are no functions to debug DosAllocMem since that function is not part of the compiler.

The compiler can be told to use the debug versions of the memory management functions instead of the regular versions at compile time. When you do this, each memory function call automatically invokes _heap_check, which performs a thorough check of the heap, its memory objects and all associated pointers. If anything is out of order, processing is aborted and a diagnostic message is output on stderr. At first it may be annoying that processing is aborted, but it is helpful to know that memory was overwritten or that a bad pointer was passed to a memory function.
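One way to picture how a heap check can notice an overwrite is the guard-byte technique sketched below in portable C. This is an illustration only: the paper does not describe how _heap_check is implemented, and the names guard_alloc, guard_check and guard_free, the guard pattern and the guard size are all hypothetical. Each block is allocated with a few known bytes placed just past the user area; if those bytes have changed, something wrote beyond the end of the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 4        /* hypothetical number of guard bytes   */
#define GUARD_BYTE 0xA5     /* hypothetical pattern stored in them  */

/* Header kept in front of every block so the checker knows its size. */
typedef union {
    size_t      size;
    max_align_t align;      /* keeps the user area suitably aligned (C11) */
} guard_header;

/* Allocates size usable bytes followed by GUARD_SIZE guard bytes. */
void *guard_alloc(size_t size)
{
    guard_header *header = malloc(sizeof *header + size + GUARD_SIZE);
    unsigned char *user;

    if (header == NULL)
        return NULL;
    header->size = size;
    user = (unsigned char *)(header + 1);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Returns 1 if the guard bytes are intact, 0 if something wrote past the
   end of the block. */
int guard_check(const void *ptr)
{
    const guard_header *header = (const guard_header *)ptr - 1;
    const unsigned char *guard = (const unsigned char *)ptr + header->size;
    size_t i;

    for (i = 0; i < GUARD_SIZE; i++)
        if (guard[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Checks the block, stops the program if it was overwritten, then frees it. */
void guard_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(guard_check(ptr));
    free((guard_header *)ptr - 1);
}
```

As with the debug functions described above, a pointer obtained from guard_alloc must only be released with guard_free, because the address handed to the caller is not the address that malloc returned.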
A bad or unrecognized pointer can result from allocating memory with malloc and trying to free it with _debug_free, so do not attempt to mix the regular and the debug versions of the memory functions. Details about the C Set/2 and C/C++ Tools debug memory management functions can be found in C Set/2 v1.00 User's Guide, chapter 13 and C/C++ Tools v2.00 C Library Reference, pp.42-44 and under the specific function names.

When a program aborts due to memory errors it is common for the compiler to indicate that the error was before or after module xxx line yyy. An error message like this indicates that an error was just detected; therefore, the actual error occurred between the line referenced in the message and the last memory function called (thus the last _heap_check performed). Finding the memory function executed just prior to the one flagged is usually easy but can be quite difficult in a multi-threaded environment. The section Detection and Solution Examples gives detailed examples on how to solve the problems once they are detected.

Let's review the advantages of learning and using this tool. Then we will cover how to use the debug versions of the memory management functions for versions 1 and 2 of the compiler.

═══ 7.1.1. Benefits ═══

1. The compiler is the definitive source as to whether or not a module has memory leaks. A home-grown memory check scheme may suffer from the same blind spots as the code it was intended to check. We know of no better way to ensure that the C Set/2 and C/C++ Tools memory management functions are used in a memory neutral way than to use the __DEBUG_ALLOC__ flag and its features.
2. Some memory errors are detected only when the memory debug functions are turned on (e.g. passing a bad pointer to free, and memory overwrites).
3. It is not hard to use, although some code additions may be required for C++ code that redefines/overloads the new and delete operators.

═══ 7.1.2. Turning them on for C Set/2 v1 ═══

1. Include stdlib.h in each module (or in each module that uses the memory functions if you want to take the time to determine which ones those are).
2. On all compilations add the __DEBUG_ALLOC__ flag (i.e. /D__DEBUG_ALLOC__). We suggest making the __DEBUG_ALLOC__ flag, and/or compile flags, an environment variable in the build process. This allows uniform control over the build. Having both normal and debug memory management functions in a build should not be a problem as long as memory allocated by one version is never freed by the other. The _heap_check and _dump_allocated functions only work with memory allocated by the debug memory management functions.
3. Locate the point in your code where you return control to the operating system (your code should be memory neutral at this point). Just prior to this point add a call to the _dump_allocated(16) function that is conditional based on the flag discussed in step 2 above. For example:

       #ifdef __DEBUG_ALLOC__
       _dump_allocated(16);
       #endif

   It can also be useful to use _dump_allocated before and after a memory neutral function, operation or test case is executed to determine if the function, operation or test case is actually memory neutral.
4. Recompile and relink your code. Run a test case.

═══ 7.1.3. Turning them on for C/C++ Tools v2 ═══

1. Include stdlib.h in each module (or in each module that uses the memory functions if you want to take the time to determine which ones those are).
2. On all compilations add the /Tm+ option (which implicitly defines a macro called __DEBUG_ALLOC__). We suggest making the __DEBUG_ALLOC__ flag, and/or compile flags, an environment variable in the build process. This allows uniform control over the build. Having both normal and debug memory management functions in a build should not be a problem as long as memory allocated by one version is never freed by the other. The _heap_check and _dump_allocated functions only work with memory allocated by the debug memory management functions.
3. Every place that your code redefines (not calls) the new or delete operators you must conditionally compile based on the flag/macro discussed in step 2 above. When the __DEBUG_ALLOC__ macro is defined, definitions of new or delete must include room for 2 additional parameters, and they must be the second and third parameters. The examples below indicate what the definitions should look like; these modifications have been tested on simple and complex usage of the new and delete operators. Complex uses include redefining the operators for an inherited class and overriding/adding parameters to the new operator (this cannot be done with the delete operator).

       #ifdef __DEBUG_ALLOC__
       /* Debug build: two extra parameters (const char *, size_t) in
          positions two and three. */
       void* operator new(size_t size, const char *, size_t)
       {
           void* pTemp;
           APIRET SubAllocRC = SH.SubAlloc(size, &pTemp);
           if (SubAllocRC) {
               cout << "shared::operator new: ";
               cout << "SH.SubAlloc error: " << SubAllocRC << endl;
               return NULL;
           }
           return (pTemp);
       }

       void operator delete(void* pDel, const char *, size_t, size_t size)
       {
           APIRET SubFreeRC = SH.SubFree(pDel);
           if (SubFreeRC) {
               cout << "shared::operator delete: ";
               cout << "heap.SubFree error: " << SubFreeRC << endl;
           }
       }
       #else
       /* Normal build: standard signatures. */
       void* operator new(size_t size)
       {
           void* pTemp;
           APIRET SubAllocRC = SH.SubAlloc(size, &pTemp);
           if (SubAllocRC) {
               cout << "shared::operator new: ";
               cout << "SH.SubAlloc error: " << SubAllocRC << endl;
               return NULL;
           }
           return (pTemp);
       }

       void operator delete(void* pDel, size_t size)
       {
           APIRET SubFreeRC = SH.SubFree(pDel);
           if (SubFreeRC) {
               cout << "shared::operator delete: ";
               cout << "heap.SubFree error: " << SubFreeRC << endl;
           }
       }
       #endif

   Note in this example that new is redefined to use SH.SubAlloc, which belongs to a class for using DosSubAllocMem, and delete is redefined to use SH.SubFree, which belongs to a class for using DosSubFreeMem. Note also that when the __DEBUG_ALLOC__ macro is defined, new and delete have 2 additional parameters, 1 of type const char* and 1 of type size_t; they are the second and third parameters respectively.
4. Locate the point in your code where you return control to the operating system (your code should be memory neutral at this point); just prior to this point add a call to the _dump_allocated(16) function that is conditional based on the flag discussed in step 2 above. For example:

       #ifdef __DEBUG_ALLOC__
       _dump_allocated(16);
       #endif

   It can also be useful to use _dump_allocated before and after a memory neutral function, operation or test case is executed to determine if the function, operation or test case is actually memory neutral.
5. Recompile and relink your code. Run your test cases.

═══ 7.1.4. Memory Debug Output and Analysis ═══

After compiling with the memory debug options turned on, your programs may behave differently. They will definitely run slower, approximately double previous run times. Each malloc, etc. will actually be calling _debug_malloc, etc. plus a function called _heap_check. _heap_check runs through the heap and validates all entries; if you pass an invalid pointer to free (which is truly _debug_free now), _heap_check will print a message to that effect on stderr (file handle 2) and cause the process to stop running. Thus all pointers to free must be valid before the programs will run to completion; this has been used to find messed-up pointers and DosSubAllocMem pointers sent to free.

When the heap is in a good state you should be able to run a test case and then shut down your component. Just before control is returned to the OS, the _dump_allocated call will print information about heap storage that is still allocated. Ideally it will say that your programs have no storage allocated; if you do have storage allocated then you have a memory leak!
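To picture what a report of still-allocated storage involves, the following portable C sketch records where each block was allocated and lists whatever has not been freed. It is a stand-in written under stated assumptions, not the compiler's _debug_malloc or _dump_allocated: the names track_malloc, track_free, track_dump and TRACK_MALLOC, the fixed table size and the output format are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define TRACK_MAX 256               /* hypothetical fixed table size */

typedef struct {
    void       *ptr;                /* NULL marks an unused slot     */
    size_t      size;
    const char *file;
    int         line;
} track_entry;

static track_entry track_table[TRACK_MAX];

/* Allocates size bytes and remembers which source line asked for them. */
void *track_malloc(size_t size, const char *file, int line)
{
    void *ptr = malloc(size);
    int i;

    if (ptr == NULL)
        return NULL;
    for (i = 0; i < TRACK_MAX; i++) {
        if (track_table[i].ptr == NULL) {
            track_table[i].ptr = ptr;
            track_table[i].size = size;
            track_table[i].file = file;
            track_table[i].line = line;
            return ptr;
        }
    }
    free(ptr);                      /* table full: refuse an untracked block */
    return NULL;
}

/* Frees a tracked block; an unknown pointer is treated as a bug. */
void track_free(void *ptr)
{
    int i;

    if (ptr == NULL)
        return;
    for (i = 0; i < TRACK_MAX; i++) {
        if (track_table[i].ptr == ptr) {
            track_table[i].ptr = NULL;
            free(ptr);
            return;
        }
    }
    assert(0 && "track_free: pointer was not allocated by track_malloc");
}

/* Writes one line per block still allocated and returns how many there are. */
int track_dump(FILE *out)
{
    int i, count = 0;

    for (i = 0; i < TRACK_MAX; i++) {
        if (track_table[i].ptr != NULL) {
            fprintf(out, "%s(%d): %zu bytes at %p\n", track_table[i].file,
                    track_table[i].line, track_table[i].size,
                    track_table[i].ptr);
            count++;
        }
    }
    return count;
}

/* Captures the caller's file and line automatically. */
#define TRACK_MALLOC(n) track_malloc((n), __FILE__, __LINE__)
```

Calling track_dump(stderr) just before the program returns control to the operating system, or before and after an operation that is supposed to be memory neutral, gives a result of zero when nothing has been leaked.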
The _dump_allocated report is sent to stderr (file handle 2) and includes the line number and name of the module which allocated the memory, the pointer that points to the memory, how much memory was allocated, and the first 16 bytes of that memory location (the parameter 16 can be changed to as much or as little as is helpful to you). This will tell you exactly where your memory leak is.

The memory shown by the _dump_allocated located before the return to the operating system will be freed by the operating system when the process is terminated. A common response from developers is "Hey, all this memory will go away when the process dies so I don't have to free it." Two points need to be recalled here. First, if the memory shown by _dump_allocated gets larger the longer the program has run, or gets larger depending on the operations the program executed, then there is a problem to be fixed. Second, the process/thread model of the product may change so that this code is no longer in a process that terminates frequently to clean up behind the code. You also will have to decide which memory from the _dump_allocated report was leaked on purpose (because you decided to let the operating system clean up after your process dies) and which was accidental - good luck.

Your analysis can be greatly aided by the knowledge contained in the appendix Forum Q & A about Malloc and by the help screens of the next tool, Theseus2.

═══ 7.2. Theseus2 ═══

Theseus (pronounced Thee'-see-us) was one of the Attic heroes of Greek mythology. According to legend, he was one of the young Greeks chosen to be sacrificed to the Minotaur, the mythical half man/half bull confined in the Labyrinth. Instead, Theseus killed the Minotaur. One of the local girls, Ariadne, gave him a thread, which he used to mark his trail into and out of the Labyrinth. He then married Ariadne and they sailed off into the sunset.
The analogy of the labyrinth is appropriate, because assigning memory to a user in the OS/2 environment is quite complex, much like the Labyrinth of old. However, this program knows its way through the labyrinth and can assist you in determining the amount of memory used by your program.

We know of no other tool that gives such a complete view and analysis of what is occurring with OS/2 memory management. It is commercially available as part of the SPM/2 (System Performance Monitor/2) product; internal users can get it from the OS2TOOLS disk. Its device driver gives it access to memory structures that lesser programs would fear to grapple with. OS/2 memory experts have spent years putting their expertise into this program. It analyzes memory as well as the swapper.dat file, giving access to all contents. It contains a memory leak detector, for private and shared memory, along with formatted, hyper-linked displays of the virtual, linear, physical and module views of memory. Monitoring or reporting can be done at the system level or on a per process basis.

This is one slick tool that opens up the world of OS/2 memory management as far as you care to go. The uninitiated (to OS/2 memory management) can get very useful information from Theseus2, and the experts who created the tool routinely use it to debug. Its memory leak detector will detect leaks long before monitoring the swapper.dat file would indicate, and it will indicate who is leaking.

═══ 7.2.1. Theseus2 Memory Utilization ═══

Knowing how processes in OS/2 are using memory is very helpful in determining which program is leaking. Let us repeat that Theseus2 is a very comprehensive tool; what we demonstrate here is very limited. Theseus2 can show how much memory is used (private and shared, allocated, committed and present within private and shared).
Taking a baseline system measurement, running a program that leaks and taking a second measurement indicates which process grew and might be leaking; that process is surely a good place to look more closely. Below is an example of Theseus2 output for both system and process level memory utilization at a point in time. If some of the terminology is foreign don't worry, Theseus2 has context-sensitive help including an "Explanation of the contents of this window" for every screen. Below is the initial screen and the System menu.

[Figure: Theseus2 main screen]

═══ 7.2.1.1. At the System Level ═══

If the option above, where the arrow is pointing, is chosen then output formatted like the example below would result. The System, RAM Usage by Process report shows which executable (.exe or .dll) owns how much private and shared memory. The last column on the right-hand side, labeled "who", is the internal name of the executable. The internal name is the same as the external or file name for DLLs, but that is not necessarily true for .exe files. The division of private and shared memory is clearly labeled; the "bytes", "Kbytes" and "Mbytes" labels are the same information in three different formats.
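As a check on how the three columns relate, the following small C sketch converts a value from the "bytes" column, which the report prints in hexadecimal, into the other two formats. The function names are hypothetical, and the divisors of 1024 and 1024 * 1024 are inferred from the figures in the report rather than taken from Theseus2 documentation.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses one value from the "bytes" column, which is printed in hex. */
unsigned long hex_to_bytes(const char *hex)
{
    char *end;
    unsigned long bytes;

    assert(hex != NULL);
    bytes = strtoul(hex, &end, 16);
    assert(end != hex);             /* at least one hex digit was read */
    return bytes;
}

/* "Kbytes" column: bytes divided by 1024. */
unsigned long bytes_to_kbytes(unsigned long bytes)
{
    return bytes / 1024UL;
}

/* "Mbytes" column: bytes divided by 1024 * 1024. */
double bytes_to_mbytes(unsigned long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

For the private memory of the "system" row, hex_to_bytes("008C9000") gives 9211904 bytes, which is 8996 Kbytes and about 8.785 Mbytes, matching the printed columns.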
RAM Usage by Process: --------- private -------- ------ owned shared ------ bytes Kbytes Mbytes bytes Kbytes Mbytes who 008C9000 8996 8.785 001D3000 1868 1.824 system 00000000 0 0.000 sysinit 00000000 0 0.000 IP2IDMN 00000000 0 0.000 LANDLL 00000000 0 0.000 LANMSGEX 00006000 24 0.023 CNTRL 00002000 8 0.008 00002000 8 0.008 SPMNBL 00000000 0 0.000 LSDAEMON 00002000 8 0.008 LOGDAEM 00000000 0 0.000 EPWROUT 00017000 92 0.090 00112000 1096 1.070 PMSHL32 00000000 0 0.000 EPWRES 00002000 8 0.008 SPMSNAPL 00000000 0 0.000 00015000 84 0.082 SPMNET 00000000 0 0.000 HARDERR 00000000 0 0.000 00002000 8 0.008 STOPLAN 00054000 336 0.328 0001F000 124 0.121 PMSHL32 00006000 24 0.023 00004000 16 0.016 LEXAES 0000F000 60 0.059 0000C000 48 0.047 WKSTA 0000D000 52 0.051 00005000 20 0.020 CMKFMSMI 00000000 0 0.000 MUGLRQST 00000000 0 0.000 WKSTAHLP 00000000 0 0.000 CMD 00000000 0 0.000 LSCLIENT 00001000 4 0.004 MSRV 00000000 0 0.000 NETPOPUP 00000000 0 0.000 00043000 268 0.262 NETPSINI 0000C000 48 0.047 00004000 16 0.016 NETPSERV 00004000 16 0.016 NETPSERV 00007000 28 0.027 00004000 16 0.016 NBQRSPND 00000000 0 0.000 CMD 00000000 0 0.000 CMD 00009000 36 0.035 00003000 12 0.012 CMD 0000C000 48 0.047 VDM 00059000 356 0.348 00004000 16 0.016 E 00000000 0 0.000 CLIPAPI 0005F000 380 0.371 00009000 36 0.035 THESEUS2 00008000 32 0.031 00004000 16 0.016 CMVCPM 00000000 0 0.000 LPD 00009000 36 0.035 0000F000 60 0.059 PORTMAP 0000E000 56 0.055 NFSD 00000000 0 0.000 CMD 00000000 0 0.000 NFSCTL 00000000 0 0.000 NFSBIOD 00000000 0 0.000 NFSBIOD 00000000 0 0.000 NFSBIOD 00000000 0 0.000 NFSBIOD 00000000 0 0.000 CMD 00000000 0 0.000 CMD 00000000 0 0.000 DMCM 00001000 4 0.004 00017000 92 0.090 RMMINTRM 00003000 12 0.012 0002C000 176 0.172 ASP000 00019000 100 0.098 0001D000 116 0.113 PCPRINT 00028000 160 0.156 0005E000 376 0.367 ACS3EINI 00021000 132 0.129 00006000 24 0.023 CMD 0002C000 176 0.172 00006000 24 0.023 MEMLEAKS -------- ------ ------ -------- ------ ------ 00AF8000 11232 10.969 0046A000 
4520 4.414 total RAM in use 0003E000 248 0.242 free RAM -------- ------ ------ 00FA0000 16000 15.625 total of all RAM pages found (Pvt + Shr + Free) < End of THESEUS2 (v 2.0.1c) output @ 11:05:31 on 1-21-1994 > ═══ 7.2.1.2. At the Process Level ═══ Memory Utilization for Process with PID = 00AB, name = 'MEMLEAKS': bytes bytes number bytes bytes allocated committed present each present description 00000824 00000824 1 0824 00000824 PTDA 000001AC 000001AC 1 01AC 000001AC TCB 00001000 00001000 1 1000 00001000 TSD 00010000 00005000 5 1000 00005000 LDT 00000200 00000200 1 0200 00000200 Process Page Directory 00080000 00013000 19 1000 00013000 Page Tables 02D30000 007A8000 644 1000 00284000 Accessible Shared memory 00840000 00009000 6 1000 00006000 Originated Shared memory 00120000 00057000 44 1000 0002C000 Private memory -------- -------- -------- 00091BD0 00019BD0 00019BD0 Total System 00840000 00009000 00006000 Total Shared originated 00120000 00057000 0002C000 Total Private -------- -------- -------- 009F1BD0 00079BD0 0004BBD0 Total RAM for the Process 10183 487 303 (in Kbytes) 9.945 0.476 0.296 (in Mbytes) The following values are taken directly from the PTDA: Allocated PTEs: private = 00057, shared = 007B8, total = 0080F. Present PTEs: private = 0002C, shared = 00294, total = 002C0. Resident PTEs: private = 00000, shared = 00010, total = 00010. < End of THESEUS2 (v 2.0.1c) output @ 11:05:44 on 1-21-1994 > ═══ 7.2.2. Theseus2 Object Summary ═══ Another helpful report that is very fast when checking for memory leaks in a specific process is the Private and Shared Object Summary. Often this affords a quick look at a process that is supposed to use memory during processing and free it all when processing is done. It lists each memory object belonging to the process and, since the memory addresses on the screen are hyper-linked, it is easy to quickly choose a memory that is new or has grown in committed size (since a base measurement) and examine its contents. 
This is often a first step in fixing a leak; if the developer recognizes the data structures in the object that appears to be leaked, then he may know where in the program to look. Depending on the contents of the memory object you may discover that it is a particular configuration file read into memory during initialization, a recognizable data structure, or a set of linked lists x bytes in length. Each piece of knowledge gained should narrow the search in the code for where the leak originated.

Below is example output of the Private Object Summary of Theseus2; it was captured by single clicking on the process name in the main window of Theseus2, then choosing the Process option from the main menu, and clicking on the Private Object Summary option. Note that the name of the executable in the column labeled "Description" is MEMLEAKS. This is the name of our executable, not a label that Theseus2 uses to indicate leaks. You will have to analyze the information to determine if a memory leak is present; Theseus2 merely provides the information.
Private Object Summary for 'MEMLEAKS':

Object    Allocated Committed Present   Swapped
address   memory    memory    memory    memory    Description
00010000  00010000  0000A000  0000A000  00000000  MEMLEAKS #0001 (shared code)
00020000  00010000  0000A000  00003000  00000000  MEMLEAKS #0002 (private)
00030000  00010000  00001000  00001000  00000000  User Environment (hmte)
00040000  00010000  00001000  00001000  00000000  Thread Information Block (hmte)
00050000  00010000  00004000  00002000  00000000  stack (hmte [system owner])
00060000  00010000  00001000  00000000  00000000  MEMLEAKS allocated it
00070000  00010000  00010000  00005000  00000000  MEMLEAKS allocated it
00080000  00010000  00001000  00000000  00000000  MEMLEAKS allocated it
00090000  00010000  00002000  00001000  00000000  MEMLEAKS allocated it
000A0000  00010000  00001000  00001000  00000000  MEMLEAKS allocated it
000B0000  00010000  00002000  00001000  00000000  MEMLEAKS allocated it
000C0000  00010000  00001000  00000000  00000000  MEMLEAKS allocated it
000D0000  00010000  00002000  00000000  00000000  MEMLEAKS allocated it
000E0000  00010000  00010000  00010000  00000000  MEMLEAKS allocated it
00100000  00010000  00001000  00001000  00000000  PMWIN allocated it
00110000  00010000  00001000  00001000  00000000  PMGRE allocated it
00120000  00010000  00010000  00001000  00000000  PMGRE allocated it
00130000  00010000  00001000  00000000  00000000  DISPLAY #0000 (private)
          --------  --------  --------  --------
Totals:   00120000  00057000  0002C000  00000000  (in bytes)
              1152       348       176         0  (in Kbytes)
             1.125     0.340     0.172     0.000  (in Mbytes)

Number of objects = 18.

< End of THESEUS2 (v 2.0.1c) output @ 15:17:09 on 1-21-1994 >

By double clicking on one of the highlighted (indicating a hyper-link) pointers, the contents of the field located at that pointer are brought up. We choose the memory object at 00080000; note that its description above (EXE or internal DLL name) is MEMLEAKS.
Thus we know that the program allocated this memory itself; if the program were dynamically linked, the summary might well include objects allocated from the C runtime DLL. This memory object contains what was entered from the command line in response to a question posed by a test program; it should not be too hard to guess what number we entered in response to the question.

Memory from Linear address 00080000 for 100 bytes for process 'MEMLEAKS':

00080000 (0000) 31 0A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *1...............*
00080010 (0010) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080020 (0020) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080030 (0030) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080040 (0040) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080050 (0050) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080060 (0060) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080070 (0070) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080080 (0080) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
00080090 (0090) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
000800A0 (00A0) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
000800B0 (00B0) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
000800C0 (00C0) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
000800D0 (00D0) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
000800E0 (00E0) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*
000800F0 (00F0) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................*

< End of THESEUS2 (v 2.0.1c) output @ 15:25:47 on 1-21-1994 >

═══ 7.2.3. Theseus2 Memory Leak Detection ═══

The last of the Theseus2 capabilities whose surface we will scratch is the memory leak detection option.
We hope our reasons for covering it are obvious; it was designed to indicate memory leaks before a mere mortal analyst could ever catch them. Theseus2 provides a leak detector at both the system level and the process level. The system level leak detector reports on the processes in the system. The process level leak detector reports on the memory objects within an individual process.

A useful approach in applying this pair of leak detectors is as follows. First, use the system level leak detector to identify the leaking process or processes. Then apply the process level leak detector to identify which memory objects are leaking. Finally, apply some code level analysis to identify why the memory object(s) are leaking. The code level analysis tools identified in this paper are the Theseus2 memory object leak detection function and the C/C++ debug memory management functions.

Consider this output from a sample program. It is being monitored by the memory leak detection function at a system wide level, which reports changes in memory activity every 15 seconds. Note that nothing occurred in the system until the report at 15:54:03, which shows that a module named T2SAMP allocated and committed some memory in the preceding 15 second period. The report at 15:54:18 shows that T2SAMP then deallocated the same amount of memory, and the report at 15:54:33 shows that it allocated more memory. After that nothing else happened until the memory leak detector was stopped.

Memory leak detection:
< End of THESEUS2 (v 2.0.1c) output @ 15:52:51 on 1-21-1994 >
Use the 'Function' pull-down to start and stop the data collection.
Leak data captured.
Previous sample will be used as the base.
Periodic update started with interval of 0:15.
< End of THESEUS2 (v 2.0.1c) output @ 15:53:25 on 1-21-1994 >
< End of THESEUS2 (v 2.0.1c) output @ 15:53:48 on 1-21-1994 >
----------Private----------- -----------Shared-----------
Allocated Committed Actual Allocated Committed Actual PID Name
+80 +5 +0 +0 +0 00E1 T2SAMP
< End of THESEUS2 (v 2.0.1c) output @ 15:54:03 on 1-21-1994 >
----------Private----------- -----------Shared-----------
Allocated Committed Actual Allocated Committed Actual PID Name
-80 -5 +0 +0 +0 00E1 T2SAMP
< End of THESEUS2 (v 2.0.1c) output @ 15:54:18 on 1-21-1994 >
----------Private----------- -----------Shared-----------
Allocated Committed Actual Allocated Committed Actual PID Name
+80 +5 +0 +0 +0 00E1 T2SAMP
< End of THESEUS2 (v 2.0.1c) output @ 15:54:33 on 1-21-1994 >
< End of THESEUS2 (v 2.0.1c) output @ 15:54:48 on 1-21-1994 >
< End of THESEUS2 (v 2.0.1c) output @ 15:55:03 on 1-21-1994 >
< End of THESEUS2 (v 2.0.1c) output @ 15:55:18 on 1-21-1994 >
Periodic update stopped.
< End of THESEUS2 (v 2.0.1c) output @ 15:55:20 on 1-21-1994 >

This is a simplified example, with only 1 process doing anything in the system, but it does clarify how to spot a module that has memory activity. The output below is the same module being monitored at a process level.

Memory leak detection for Process with PID = 00E2, name = T2SAMP:
< End of THESEUS2 (v 2.0.1c) output @ 15:56:00 on 1-21-1994 >
Use the 'Function' pull-down to start and stop the data collection.
Leak data captured.
Previous sample will be used as the base.
Periodic update started with interval of 0:15.
< End of THESEUS2 (v 2.0.1c) output @ 15:56:18 on 1-21-1994 >
< End of THESEUS2 (v 2.0.1c) output @ 15:56:35 on 1-21-1994 >
Allocated Committed Actual har  Address   P/S Description
      +16        +1     +0 0C32 +00060000 Pvt T2SAMP allocated it
      +16        +1     +0 0B65 +000B0000 Pvt T2SAMP allocated it
      +16        +1     +0 0B6C +000D0000 Pvt T2SAMP allocated it
      +16        +1     +0 0C72 +000E0000 Pvt T2SAMP allocated it
      +16        +1     +0 0BAD +000F0000 Pvt T2SAMP allocated it
< End of THESEUS2 (v 2.0.1c) output @ 15:56:50 on 1-21-1994 >
Allocated Committed Actual har  Address   P/S Description
      -16        -1     +0 0C32 -00060000 Pvt
      -16        -1     +0 0B65 -000B0000 Pvt
      -16        -1     +0 0B6C -000D0000 Pvt
      -16        -1     +0 0C72 -000E0000 Pvt
      -16        -1     +0 0BAD -000F0000 Pvt
< End of THESEUS2 (v 2.0.1c) output @ 15:57:05 on 1-21-1994 >
Allocated Committed Actual har  Address   P/S Description
      +16        +1     +0 0BAD +00060000 Pvt T2SAMP allocated it
      +16        +1     +0 0C72 +000B0000 Pvt T2SAMP allocated it
      +16        +1     +0 0B6C +000D0000 Pvt T2SAMP allocated it
      +16        +1     +0 0B65 +000E0000 Pvt T2SAMP allocated it
      +16        +1     +0 0C32 +000F0000 Pvt T2SAMP allocated it
< End of THESEUS2 (v 2.0.1c) output @ 15:57:20 on 1-21-1994 >
< End of THESEUS2 (v 2.0.1c) output @ 15:57:35 on 1-21-1994 >
Periodic update stopped.
< End of THESEUS2 (v 2.0.1c) output @ 15:57:45 on 1-21-1994 >

Note that the +80 from the system level is seen at the process level as five +16's, and the +5 at the system level is seen as five +1's. Note also that even though the description is missing when the memory was deallocated, the address and har allow us to know that the memory belonged to T2SAMP. It will take the developer of T2SAMP to know whether this behavior is normal or abnormal. If it is abnormal, then with this output, along with knowledge of what test case was being run, we can begin to work with the developer to account for the module's behavior and fix it if we determine that the behavior is not what was intended.

═══ 7.2.3.1. At the Object Level ═══

A single click on any linear address in Theseus2 activates the "Memory Object Leak Detection" option of the "Functions" menu at the top of the window. Choosing this function will bring up a "Memory Leak Detection for " screen. Once we have determined that a memory leak exists (via system memory leak detection) and we know which process is leaking (via process memory leak detection), we can use the object memory leak detection facility.

Often the memory object is small enough that developers recognize enough detail to know where the leak is occurring. However, it is possible that the memory object is huge. For instance, debugging the usage of the PM APIs is very difficult because several PMGRE memory objects can each be many megabytes large and sparsely populated. The object level memory leak detector gives detailed information about each page state within the memory object as it changes.

In the example of object level memory leak detection below, the memory object we are working with starts at linear address 00090000 and contains 16 pages. The states of the pages, while tracing the memory object, are represented by lower case letters with a summary line labeled "state counts." A page can be in any one of ten different states; the states, listed in the order shown in the "state counts" line, are: uncommitted (.=), special system (?=), allocate on demand (a=), claimable (c=), idle (i=), to be loaded (l=), present (p=), resident (r=), swapped (s=) and UVirt (u=). The lines labeled with "dif" at the beginning indicate pages whose states have changed since the last data capture. If the "dif" line contains a "." in a page's position then that page's state has not changed; otherwise a capital letter indicates which line below contains an explanation of how the page state changed.
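The "dif" notation just described is mechanical enough to reproduce in a few lines of C. What follows is only a sketch for illustration, not Theseus2 code: the function names state_name, diff_states and print_diff are hypothetical, and the 4 KB page size is an assumption taken from the 1000 (hex) "bytes each" column of the process level report. It marks each changed page with the next capital letter and prints one explanation line per letter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 0x1000UL /* assumed 4 KB page size */

/* Translate one page-state character into the state name listed in the text. */
const char *state_name(char c)
{
    switch (c) {
    case '.': return "uncommitted";
    case '?': return "special system";
    case 'a': return "allocate on demand";
    case 'c': return "claimable";
    case 'i': return "idle";
    case 'l': return "to be loaded";
    case 'p': return "present";
    case 'r': return "resident";
    case 's': return "swapped";
    case 'u': return "UVirt";
    default:  return "unknown";
    }
}

/* Build the "dif" line: '.' for an unchanged page, 'A', 'B', ... for each
   changed page ('*' once the alphabet runs out).  Returns the number of
   changed pages.  Both state strings must have the same length, and dif
   must have room for that length plus the terminating NUL. */
int diff_states(const char *old_states, const char *new_states,
                char *dif, size_t dif_size)
{
    size_t i, n = strlen(old_states);
    int changed = 0;

    assert(strlen(new_states) == n && n < dif_size);
    for (i = 0; i < n; i++) {
        if (old_states[i] == new_states[i]) {
            dif[i] = '.';
        } else {
            dif[i] = (changed < 26) ? (char)('A' + changed) : '*';
            changed++;
        }
    }
    dif[n] = '\0';
    return changed;
}

/* Print the old/new/dif lines plus one explanation line per changed page.
   base is the linear address of the first page in the state strings. */
void print_diff(unsigned long base, const char *old_states,
                const char *new_states)
{
    char dif[256];
    size_t i;

    diff_states(old_states, new_states, dif, sizeof dif);
    printf("old %s\nnew %s\ndif %s\n", old_states, new_states, dif);
    for (i = 0; dif[i] != '\0'; i++) {
        if (dif[i] != '.')
            printf("%c: %08lX  %s -> %s\n", dif[i],
                   base + (unsigned long)i * PAGE_SIZE,
                   state_name(old_states[i]), state_name(new_states[i]));
    }
}
```

Called as print_diff(0x00090000UL, ".slspssissppp", "aplspssispppp"), the sketch marks the first, second and tenth pages as A, B and C, which is the pattern shown in the first periodic update of the capture that follows.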
Leak detection for Process with PID = 0882, name = LAMAIL, linear requested = 00090000, object start linear = 00090000, object has 16 pages:
< End of THESEUS2 (v 2.0.1c) output @ 16:49:50 on 3-3-1994 >
Use the 'Function' pull-down to start and stop the data collection.
Previous sample will no longer be used as the base.
< End of THESEUS2 (v 2.0.1c) output @ 16:49:55 on 3-3-1994 >
The data captured is:
Page states, starting with the page at 00090000
sssspsssssspp
state counts: .=3, ?=0, a=0, c=0, i=0, l=0, p=3, r=0, s=10, u=0
Leak data captured.
< End of THESEUS2 (v 2.0.1c) output @ 16:50:00 on 3-3-1994 >
Previous sample will not be used as the base.
Periodic update started with interval of 0:05.
< End of THESEUS2 (v 2.0.1c) output @ 16:50:05 on 3-3-1994 >
old .slspssissppp
new aplspssispppp
dif AB.......C...
dif address  old state           new state
A:  00090000 uncommitted         allocate on demand
B:  00091000 swapped             present
C:  00099000 swapped             present
< End of THESEUS2 (v 2.0.1c) output @ 16:50:10 on 3-3-1994 >
old aplspssispppp
new pslspssispppp
dif AB...........
dif address  old state           new state
A:  00090000 allocate on demand  present
B:  00091000 present             swapped
< End of THESEUS2 (v 2.0.1c) output @ 16:50:15 on 3-3-1994 >
Periodic update stopped.
< End of THESEUS2 (v 2.0.1c) output @ 16:50:20 on 3-3-1994 >
Leak data compared.
old pslspssispppp
new splspssispppp
dif AB...........
dif address  old state           new state
A:  00090000 present             swapped
B:  00091000 swapped             present
< End of THESEUS2 (v 2.0.1c) output @ 16:50:25 on 3-3-1994 >

In a large and sparsely populated memory object, the object level memory leak detector is invaluable when looking for memory usage patterns or for pages that are changing during processing. This function is used most often when a particular memory object is suspected of leaking when a particular transaction is executed.
It is possible to capture the memory object's state, execute the transaction, capture the changed state and ask Theseus2 to compare the before and after page states. The pages within the memory object that changed are obvious; the before and after contents of the pages are also helpful in determining what part of the code is responsible for the leak. Once the memory contents are known (implying that the process and executable responsible for the object are also known), it is not difficult to find a developer who recognizes what is going on. The developer who recognizes the executable, the process, the memory object and its contents owns the responsibility for fixing the leak. At times developers have been known to flee when approached with what looks like memory object dumps.

═══ 7.3. Detecting Memory Leaks ═══

There are several ways to detect memory leaks; some are good, some are bad, some are fast and some are slow. This section discusses a few of the bad ones so that you will avoid them, and then suggests a few good techniques that have proven valuable.

═══ 7.3.1. SWAPPER.DAT File Growth ═══

We have included this section because many developers and testers tend to equate SWAPPER.DAT file growth with memory leaks. While it is true that memory leaks usually cause swap file growth, the converse is not true. Swap file growth can occur, and be sustained, when there is no memory leak, and memory leaks do not always reveal themselves in swap file growth. The lesson here is: Do not rely on utilities that merely report SWAPPER.DAT size (e.g. SWAPMON)! Such utilities, for the purpose of finding memory leaks, are not worth the resources needed to run them, even if they were free. The reason is that not only do they fail to give an accurate picture of what is occurring, they are misleading, leading to errors of commission (crying wolf when there is no wolf). We have answered countless such calls and have acquired quite the distaste for such utilities as indicators of memory leaks.
If you must monitor the swap file, then use Theseus2 so that you know how many pages within the file are used and how many are free. If the number of pages used grows consistently with multiple executions of the same case, you may have a memory leak. Note also that OS/2's algorithm for shrinking the swap file is complex, containing several or'd conditions; if any one of the or'd conditions fails, then the swap file will not shrink in size. Even if the swap file does shrink in size, the rate is much slower than its rate of growth.

═══ 7.3.2. Multiple C Runtime Libraries ═══

We include a section about multiple runtime libraries in a paper about memory leaks because multiple runtime libraries are a pit that is tricky to discover you have fallen into. As you know, most compilers require that you not ship a runtime library with your product that has the same name as the runtime they shipped to you. For example, if you use the C Set/2 compiler, you cannot ship a runtime library that has the name DDE4SBS, DDE4MBS or any other name that the compiler ships as a DLL. Usually developers rebuild the compiler's runtime with another name. Now the tricky part comes with large development efforts where several components each create their own runtime libraries, or when you use several versions of the same compiler, each with a uniquely renamed runtime. With multiple runtimes every developer must be cognizant of which runtime he is allocating memory from, so that he can deallocate the memory from the same runtime; runtimes do not communicate with each other. This is a particularly difficult situation if you have a design that specifies that one module allocate and another module deallocate memory. Either ensure that both modules have been compiled with the same runtime or provide APIs for each other to call.

═══ 8. Detection and Solution Examples ═══

In this last section of the paper we want to show you how we detected and fixed several common types of leaks.
The examples are relatively simple but sufficient to indicate the steps you may want to take first when fixing memory leaks. Below we have included the source code of the program that we used to generate most of the examples in this section; we compiled the source with the DEBUG and __DEBUG_ALLOC__ macros on. It will be helpful to refer back to the source to see each error as we detect and fix it.

═══ 8.1. Example Program: MemLeaks ═══

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
   MemLeaks.h     debug defines
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

#ifdef DEBUG
   #define DEBUG_CODE(code) \
              code

   #define DEBUG_MSG(msg) \
              printf(msg); \
              fflush(stdout)

   #define DEBUG_MSG1(msg, p1) \
              printf(msg, p1); \
              fflush(stdout)

   #define DEBUG_MSG2(msg, p1, p2) \
              printf(msg, p1, p2); \
              fflush(stdout)

   #define DEBUG_MSG3(msg, p1, p2, p3) \
              printf(msg, p1, p2, p3); \
              fflush(stdout)

   #define DEBUG_MSG4(msg, p1, p2, p3, p4) \
              printf(msg, p1, p2, p3, p4); \
              fflush(stdout)
#else
   #define DEBUG_CODE(code)
   #define DEBUG_MSG(msg)
   #define DEBUG_MSG1(msg, p1)
   #define DEBUG_MSG2(msg, p1, p2)
   #define DEBUG_MSG3(msg, p1, p2, p3)
   #define DEBUG_MSG4(msg, p1, p2, p3, p4)
#endif

  1 /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  2    MemLeaks.c
  3
  4    MODULE:      MemLeak   1.24.94
  5
  6    AUTHOR:      SD Hargis
  7
  8    FUNCTION:    Memory leak examples used with a White paper on detecting
  9                 and fixing memory leaks. Several kinds of leaks are demon-
 10                 strated by this program.
 11
 12    DESCRIPTION: The type of memory problems demonstrated in this program:
 13                 1) trying to free memory allocated by DosAllocMem;
 14                 2) memory overwrites;
 15                 3) memory leak;
 16
 17    INPUTS:      free : A parameter indicating that memory should be
 18                        freed; this is an optional parameter, its
 19                        absence indicates that memory should not be
 20                        freed.
 21                 User will be prompted for a string.
 22
 23    OUTPUTS:     none :
 24                 There is some screen output.
 25
 26    ERROR CODES: >= 0 successful completion
 27                 <  0 error condition occurred
 28
 29 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
 30
 31 #define ARRAY_SIZE 8
 32
 33 #include
 34 #include
 35 #include
 36 #include
 37 #include
 38 #include
 39 #include "memleaks.h"
 40
 41
 42 APIRET main(int argc, char *argv[])
 43 {
 44    APIRET rc=0;
 45    char *array=NULL;
 46    int *a=NULL;
 47    PVOID pDOS=NULL;
 48
 49    /* DEFINE VARIABLES */
 50
 51
 52    /* INITIALIZE VARIABLES */
 53
 54
 55    /* MAIN LINE PROCESSING */
 56    if (argc > 2)
 57    {
 58       printf("You did not correctly specify the parameters.\n");
 59       printf("Necessary parameters, in order, are:\n");
 60
 61       if (argv[1] != NULL)
 62       {
 63          printf("\nfree : [optional parameter]\n");
 64       }
 65       DEBUG_CODE(
 66       else
 67       {
 68          printf("\nNo input parameters, good.\n\n");
 69       } /* endif */
 70       )
 71
 72       printf("Please reenter \"%s\" followed by valid parameters.\n"
 73              "\n", argv[0]);
 74       goto end;
 75    } /* end if */
 76
 77
 78    /* dosallocmem memory for an array */
 79    if (rc = DosAllocMem (&pDOS, (ULONG) ARRAY_SIZE, fALLOC))
 80    {
 81       printf("%s: DosAllocMem error at %d with %ld.\n", __FILE__,
 82              __LINE__-3, rc);
 83       rc = -1;
 84       goto end1;
 85    }
 86    DEBUG_CODE(
 87    else
 88    {
 89       printf("%d in %s: %d bytes DosAllocMem'd @ %p.\n", __LINE__-10,
 90              __FILE__, ARRAY_SIZE, &pDOS);
 91    }
 92    )
 93
 94
 95    /* malloc memory for an array and an integer */
 96    if (! (array = malloc(ARRAY_SIZE)))
 97    {
 98       printf("%s: no memory to malloc at %d.\n", __FILE__, __LINE__-2);
 99       rc = -1;
100       goto end2;
101    }
102    DEBUG_CODE(
103    else
104    {
105       printf("%d in %s: %d bytes malloc'd @ %p.\n", __LINE__-9, __FILE__,
106              ARRAY_SIZE, array);
107    }
108    )
109
110    if (! (a = malloc(sizeof(int))))
111    {
112       printf("%s: no memory to malloc at %d.\n", __FILE__, __LINE__-2);
113       rc = -1;
114       goto end3;
115    }
116    DEBUG_CODE(
117    else
118    {
119       printf("%d in %s: %d bytes malloc'd @ %p.\n", __LINE__-9, __FILE__,
120              sizeof(int), a);
121    }
122    )
123
124    /* have user input an integer and characters for the array */
125    printf("\nType an integer (-32768 - 32767) & press the key.\n");
126    scanf("%d", a);
127
128    /* we are using malloc'd memory so free dosallocmem memory */
129    free(&pDOS);
130
131    printf("Type some characters and press the key.\n");
132    scanf("%s", array);
133    printf("\nCheers! The number you entered was %d &\n"
134           "the characters you entered were \"%s\".\n", *a, array);
135
136
137 end3:
138    if (argv[1]) free(a);
139 end2:
140    if (argv[1]) free(array);
141 end1:
142    if (argv[1]) DosFreeMem(pDOS);
143 end:
144
145 #ifdef __DEBUG_ALLOC__
146    _dump_allocated(16);
147 #endif
148
149    return(rc);
150 } /* main */

═══ 8.2. Normal Execution ═══

This is how the program looks when it executes as it should. Note that to get the program to execute flawlessly we had to correct the error found in the next section. Bold letters are user typed responses.

[D:\WORK\LEAKS] memleaks.exe free
79 in MEMLEAKS.C: 8 bytes DosAllocMem'd @ 2986C.
96 in MEMLEAKS.C: 8 bytes malloc'd @ 70250.
110 in MEMLEAKS.C: 4 bytes malloc'd @ 70270.

Type an integer (-32768 - 32767) & press the key.
12345
Type some characters and press the key.
abcdefg

Cheers! The number you entered was 12345 &
the characters you entered were "abcdefg".
════════════════════════════════════════════════════════════════════════════════
START OF DUMP OF ALLOCATED MEMORY BLOCKS
════════════════════════════════════════════════════════════════════════════════
END OF DUMP OF ALLOCATED MEMORY BLOCKS
════════════════════════════════════════════════════════════════════════════════

═══ 8.3. Bad Memory Pointers ═══

This execution uses the source exactly as shown above and demonstrates how easy it can be to locate some memory errors with the correct tools. In this case we will locate an invalid pointer passed to a memory management function.

[D:\WORK\LEAKS] memleaks.exe free
79 in MEMLEAKS.C: 8 bytes DosAllocMem'd @ 2986C.
96 in MEMLEAKS.C: 8 bytes malloc'd @ 70250.
110 in MEMLEAKS.C: 4 bytes malloc'd @ 70270.

Type an integer (-32768 - 32767) & press the key.
12345
The invalid memory block address 0x0002987C was passed in at line 129 of MEMLEAKS.C.

Look at that, an error on line 129. How easy to spot. Examining the line in question we discover that we are trying to free memory allocated by line 79. The allocation method on line 79 is DosAllocMem; so the error is calling free with a pointer received from DosAllocMem. The proper call is DosFreeMem.

Let us trace how the error was caught. At the beginning of every C/C++ debug memory management call, _heap_check is called. As _heap_check went through the heap it noticed that the pointer passed to free was not a pointer that it was managing. Thus it aborted the process with an appropriate message.

This type of memory error is found only when the compiler's debug memory management functions are turned on. The library reference, where it discusses the free function, says that calls with an invalid pointer are ignored; however, the behavior is undefined. We have seen repeated calls to free with the same pointer cause intermittent heap corruption. Such errors are hard to find, and skipping the tools that are available to eradicate them is not worth the trouble that results. For our program we will fix line 129 by commenting it out. Future executions of MemLeaks.exe will be with this update.

═══ 8.4. Memory Left on the Heap ═══

OK, now we are going to execute MemLeaks.exe without any arguments/parameters. What do the diagnostic messages indicate?
[D:\WORK\LEAKS] memleaks.exe
79 in MEMLEAKS.C: 8 bytes DosAllocMem'd @ 2986C.
96 in MEMLEAKS.C: 8 bytes malloc'd @ 70250.
110 in MEMLEAKS.C: 4 bytes malloc'd @ 70270.

Type an integer (-32768 - 32767) & press the key.
12345
Type some characters and press the key.
abcdefg

Cheers! The number you entered was 12345 &
the characters you entered were "abcdefg".
════════════════════════════════════════════════════════════════════════════════
START OF DUMP OF ALLOCATED MEMORY BLOCKS
════════════════════════════════════════════════════════════════════════════════
Address: 0x00070270 Size: 0x00000004 (4)
This memory block was (re)allocated at line number 110 in MEMLEAKS.C.
Memory contents: 39300000 [90.. ]
════════════════════════════════════════════════════════════════════════════════
Address: 0x00070250 Size: 0x00000008 (8)
This memory block was (re)allocated at line number 96 in MEMLEAKS.C.
Memory contents: 61626364 65666700 [abcdefg. ]
════════════════════════════════════════════════════════════════════════════════
END OF DUMP OF ALLOCATED MEMORY BLOCKS
════════════════════════════════════════════════════════════════════════════════

The "DUMP OF ALLOCATED MEMORY BLOCKS" is generated by the _dump_allocated call that is ifdef'd at the end of the program, just before returning control to the operating system. _dump_allocated printed out the first 16 bytes of each memory object that is still being managed by the debug memory management functions. Obviously we have left memory on the heap - Yes, a memory leak! How easy it is to see that the memory was left on the heap because we malloc'd it but never free'd it. When we include a parameter to the execution of MemLeaks.exe the if statements on lines 138-142 are executed and the memory is freed.

You're getting good. Let's move on to memory overwrite errors; they will challenge your newly found expertise.

═══ 8.5. Memory Overwrites ═══

Since this type of error is hard to debug let us set the stage for what is to occur.
On line 96 of the source we malloc enough room for an array of 8 bytes (see also line 31). The C Set/2 and C/C++ Tools compilers do not necessarily allocate exactly 8 bytes. They happen to allocate according to the following formula, where we have asked for x bytes: x + 16, rounded up to the next power of 2. So when we asked for 8 bytes we really got 32 bytes (8 + 16 = 24, rounded up to 32). And on line 110 when we asked for 4 bytes, we really got another 32 bytes (4 + 16 = 20, rounded up to 32). You can confirm this by noting that the first malloc returned the address 70250 and the second malloc returned 70270, a difference of hex 20 or decimal 32. Thus the successive mallocs are 32 bytes apart. Isn't math fun!

Now what would happen if the array of characters that we read in were longer than 32 bytes? Wouldn't the compiler limit us to the 8 bytes that we asked for, or at least 16 (32 allocated - 16 for book keeping)? No. As we will see shortly, it is possible to go beyond the size of the memory object you allocated and overwrite the next object. Oh, these errors are hard to find without the compiler's assistance. Often memory overwrite errors are innocuous because the data does not go beyond the padding that the compiler allocated due to its rounding up to the next power of 2, but no one wants to bet the quality of their product on all such errors being innocuous.

Below is what MemLeaks.exe looks like when a memory overwrite occurs.

[D:\WORK\LEAKS] memleaks.exe
79 in MEMLEAKS.C: 8 bytes DosAllocMem'd @ 2986C.
96 in MEMLEAKS.C: 8 bytes malloc'd @ 70250.
110 in MEMLEAKS.C: 4 bytes malloc'd @ 70270.

Type an integer (-32768 - 32767) & press the key.
12345
Type some characters and press the key.
abcdefghijklmnopqrstuvwxyzabcdefghij

Cheers! The number you entered was 1785292903 &
the characters you entered were "abcdefghijklmnopqrstuvwxyzabcdefghij".
Memory was overwritten before the allocated memory block which starts at address 0x00070270.
The first eight bytes of the memory block (in hex) are: 6768696A00A5B5A5.
The file name and line number are not available.

Shucks, no file name or line number. Boy, what a mess we are in; most data is correct and sometimes, for no apparent reason at all, some data goes berserk on us. Well, not all is lost; we have the address where the corruption is occurring. The compiler figured that much out by surrounding the memory that we allocated with sentinels. The sentinel is a xD5A5 at the front and a xB5A5B5A5 at the end of the memory, surrounding only the number of bytes that we asked to be allocated.

The collection of screen dumps below shows how we used IPMD, the C/C++ debugger, to gain more information about memory overwrite errors and fix them. We start with the initial IPMD screens.

[Figure: Initial IPMD screen]

Once the initial screens are up you can double click on the line numbers where you want to set a break point. Execution stops at each break point before the line is executed. For our program we will break at line 79 where the allocation of memory begins.

[Figure: A break point is set]

Clicking on the storage icon at the top of the screen will bring up a window that shows you the contents of memory locations.

[Figure: Click on the storage icon to see data in memory]

Click on a number in the address section and type in the address that you want to see. Since our memory overwrite problem was at 70270 we will display the contents of memory starting a few words before that.

[Figure: Type in the memory address whose contents you want to see]

Now click on the green run button at the top and the program will run until the next break point is met. We have placed break points at strategic places in the program; the alternative is to "step over" each function.

[Figure: Clicking the green run button executes the program until the next break point]

While executing your program under IPMD you can "step over," "step into," "step debug," "step return" or run the functions. Stepping over lines of code (executing the line and all lower level function calls) seems to be the most useful to begin with.
[Figure: Execution can proceed one function call at a time]

Note that each memory area that we malloc'd is evident by the sentinels. The sentinels are immediately before and after the memory that we requested, and only the amount of memory that we requested is available to be used. It can be helpful to note that the 16 bytes before the address that is returned to us as allocated are used for memory book keeping information. See how the bytes immediately above and below the mouse pointer show the length of the memory that was requested. Remember that the numeric information in memory is byte reversed (character information is not byte reversed), thus 08000000 in memory is equivalent to 00000008 in the way that we look at it. Between this information and the sentinels you should be able to locate almost any allocated memory field.

o Beginning sentinel: xD5A5
o Ending sentinel: xB5A5B5A5
o Length field: (pointer returned from allocation) - 16

While stepping over the scanf on line 126 we need to go to the session that the program is running in and type in a response to the prompt to input a number. The number we are using is still 12345, which is 3039 in hex, which is stored as 39300000 in memory.

[Figure: The memory contents now show the number we input]

While stepping over the scanf on line 132 we switched to the running program and input more than 8 characters. Stepping over lines of code allows us to clearly see which line is the offender. As each line is stepped over, watch the storage window. Eventually a line of code will overwrite memory objects; in this case it is the scanf on line 132.

[Figure: The memory overwrite is clearly indicated and so is the offending code]

By looking at the last storage screen you can see that the behavior of the program depends on how much memory you overwrite. In this case we overwrote the book keeping information, beginning sentinel and most of the contents of the integer we previously input.
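The byte arithmetic in this walkthrough can be checked without a debugger. The sketch below is illustrative only: the helper names store_le32, load_le32 and simulate_overwrite are hypothetical, it assumes the byte-reversed (least significant byte first) storage described above, and it models only the 32-byte distance between the two malloc'd blocks (70250 and 70270), not the real heap's book keeping fields or sentinels.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_DISTANCE 32 /* 70270 - 70250, from the program output */

/* Store a 32-bit value byte reversed: least significant byte first. */
void store_le32(unsigned char *dst, uint32_t value)
{
    int i;

    for (i = 0; i < 4; i++)
        dst[i] = (unsigned char)((value >> (8 * i)) & 0xFF);
}

/* Read back a 32-bit value that was stored byte reversed. */
uint32_t load_le32(const unsigned char *src)
{
    return (uint32_t)src[0] |
           ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) |
           ((uint32_t)src[3] << 24);
}

/* Model the two blocks inside one local buffer: the character array
   starts at offset 0 and the integer at offset BLOCK_DISTANCE.  The
   integer is stored first, then the input string (with its terminating
   NUL) is copied in with no regard for the 8 bytes that were requested,
   just as scanf("%s") does.  Returns the integer as it reads afterwards. */
uint32_t simulate_overwrite(const char *input, uint32_t original)
{
    unsigned char arena[128];
    size_t len = strlen(input) + 1;

    assert(len <= sizeof arena); /* keep the sketch itself in bounds */
    memset(arena, 0, sizeof arena);
    store_le32(arena + BLOCK_DISTANCE, original);
    memcpy(arena, input, len);
    return load_le32(arena + BLOCK_DISTANCE);
}
```

In this model store_le32 turns 12345 (x3039) into the bytes 39 30 00 00, matching the storage window. A 7-character response leaves the integer at 12345, while the 36-character response used in this section makes simulate_overwrite return 1785292903, the wrong number that the program reported.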
Note that the program continued and reported a wrong number; we input the integer 12345 and it is reporting 1785292903, which is x6A696867. Do you see where the number it reported came from?

[Figure: The debug memory management functions catch the memory overwrite]

═══ 8.6. Closing Suggestions ═══

Well, there you have it: most of the knowledge that we have about memory leaks. We discussed what they are, how to detect them, how to find them, how to plug them and what tools to use. We even gave you a pictorial guide to detecting and fixing the most common types of leaks. Our examples were simple, so fixing leaks in your code may require you to work with more noise and less clear logic, but the basic skills are all there now.

You may find yourself with a system of programs that has memory leaks and not know how to narrow the focus to just one component. Here are several ideas that developers whom we personally know have tried successfully.

o Use a common function to do all allocation and deallocation of memory, and add some tracking information to that common function. Many developers have their own versions of this idea. Most versions access the stack to print out who the caller is; they also track the pointer returned from allocating memory, the number of bytes requested and a timestamp. This information makes it possible to pair up allocation and deallocation requests.

o If you can get the source code for your compiler's runtime library, you can supplement the memory allocation and deallocation calls with tracking information. A post-processor can then be used to digest the information and indicate where one might focus. It is even possible to build the modified functions into a DLL alongside the original functions. At runtime, based on an environment variable, the DLL can be initialized to use either the modified or the original version of the memory allocation and deallocation functions.
o Build separate components with separate runtime libraries and use Theseus2 to show which runtime the leak(s) is coming from. Be sure to avoid creating additional memory leaks, since you now have multiple runtime libraries.

Remember: Nothing beats having the right tools for the job!

═══ 9. Leaky Interfaces between C and C++ ═══

When writing functions that allow standard C programs to access C++ objects, care must be taken when supplying object destructors. If object references are not declared correctly, the destructor may not free all of the memory that was claimed by the object's constructor, and the result is a memory leak. The symptom is that a pointer is freed but not the object pointed to by the pointer.

An example of an object constructor and destructor to be accessed by standard C programs is given below for an object named Trace_Class. Compilers commonly behave in the manner indicated by this example.

    void * TC_Constructor(void)
    {
        return (void *) new Trace_Class();
    }

    void TC_Destructor(void *pTC)
    {
        delete pTC;
    }

The constructor above, when called by standard C code, will return a pointer to a newly created instance of an object of type Trace_Class. Note that the pointer to the object instance is returned as a (void *), since this pointer should be considered a reference to opaque data by standard C routines.

The destructor above is supposed to free the object created by the call to TC_Constructor. There is an error, however. The declaration of the argument to TC_Destructor is (void *), which is appropriate since that is the same type that is returned by TC_Constructor. However, the C++ delete operator is applied to pTC, which the compiler translates to free(pTC) since pTC's type is (void *). (Standard C++ in fact leaves deleting through a void * undefined, so the exact behavior can vary by compiler.) This will not have the desired effect, since properly freeing all of the memory assigned by "new Trace_Class()" requires a call to Trace_Class's destructor.
This can be accomplished by properly declaring pTC's type as a pointer to an object of type Trace_Class. The modified TC_Destructor that frees the object's memory completely is:

    void TC_Destructor(void *pTC)
    {
        delete (Trace_Class *)pTC;
    }

With this change the compiler translates the delete into a call to the Trace_Class destructor with a pointer to the instance held in pTC.

═══ 10. Forum Q & A about Malloc ═══

Like many developers, we have had occasion to question the inner workings of C/C++ memory management. Below is a compilation of questions and answers that we found to be helpful. We hope that you do too. The edited excerpts are from the C-SET2 FORUM and C-SET2 ANSWERS. Any implementation details below the API level are compiler specific and subject to improvements or other changes in future releases without notice.

o How does malloc work internally, i.e., how much memory does it really allocate? If a program uses malloc exclusively, will C Set/2 then take care of all memory management related matters (such as committing and decommitting pages)? Are there any limits to be observed?

o Internally, malloc adds 16 to the size requested, rounds it up to the next power of 2, and grabs that much memory. If malloc has a block of that size available, it gives it to you without going to the OS. Otherwise, it grabs a block of at least 4K bytes from the OS (since the page size is 4K). Anything extra is kept around for later calls to malloc. When you call free(), the memory is not returned to the OS, but is kept around for later allocation. If you want to return memory to the OS, you must call _heapmin(). The algorithm is the Berkeley Bucket Algorithm. It's very fast, but has a tendency to waste memory with some allocation sizes.

o Can somebody give me more details, or a reference to details, for the Berkeley Bucket Algorithm?

o Allocating memory in sizes which are powers of 2 allows us to write a very simple, very fast malloc routine.
If the memory were cut up into the exact size that you malloc (plus, of course, overhead), we would have to search what was on our heap. The Berkeley Bucket algorithm allows us to check in one place to see if we have memory for you. A search algorithm is nowhere near as fast.

o The rounding up to the next power of 2 disturbs me a little. When getting into larger sizes this could allocate a great deal, i.e., asking for 33K would allocate 64K (if I understand correctly what was said). And would the part from 33K to 64K be available to be used in a later malloc, or is it assumed to be part of the 33K request? Also, out of curiosity, does malloc allocate a large area uncommitted and commit pages out of it, or does it do a DosAllocMem of just that size every time it needs to ask OS/2 for memory?

o Yes, if you malloc 33K, you really get 64K. The remaining 31K is wasted storage (but it only exists on the swapper, not in real memory).

(Authors' note: In this scenario 36K of physical memory will be committed when it is accessed. The amount is 36K because 36 is a multiple of 4 and memory is committed in 4K chunks; in other words, virtual memory is backed by physical memory, when accessed, in 4K chunks. This 36K could be swapped out, but it need not be. In OS/2 v1.x space in the swap file was reserved; in OS/2 v2.x that is no longer the case. The remaining 28K is allocated (virtual) memory only.)

malloc calls DosAllocMem to allocate blocks which are a multiple of 4K bytes (the same size as a memory page). It doesn't use DosSubAlloc to cut them up, since the performance of DosSubAlloc isn't that great. The Berkeley Bucket algorithm optimizes speed, not total memory used.

(Authors' note: The sentences above state that DosAllocMem allocates memory in 4K chunks; that is most of the story, but not all of it. DosAllocMem allocates virtual memory in 4K chunks; virtual memory is backed by physical memory in 4K chunks.
For memory objects that are used by 16-bit code, OS/2 uses a tiled LDT that has 8192 descriptor slots. Each descriptor maps a maximum of 64K. If you DosAllocMem a tiled object, a descriptor is used and that 64K of virtual address space is unavailable even if the actual size of the object is only 33K. So if 33K is DosAllocMem'd, 64K of virtual memory becomes unavailable, 36K will be allocated, and the allocated memory is backed by physical memory in 4K chunks when it is accessed.)

You may be overlooking some significant details concerning the memory management in OS/2. OS/2 uses what is called "lazy page allocation". This means that if I DosAllocMem 10 megabytes, all the OS does is set up the page table entries so they are marked "not-present", and remember the range of entries. *Nothing* else happens until you actually *access* the page. At that point, the OS creates a 0 filled page for you. Since pages are 4K in size, if I only use the first 10K of that 10 megabytes, only 12K worth of real memory is taken (plus the overhead for the page table entries). The malloc routines in the library take advantage of this (very good) allocation strategy on the part of the OS to keep the malloc and free code simple and fast. Committing and de-committing memory involves calling ring 0 routines in the OS; this will cost about 400 clocks just to do the privilege transitions. malloc and free currently take less time than this to do *all* their work.

P.S. Please note that BSD implementations of Unix use this algorithm too.

o If I allocate 10 megabytes of memory, apart from what you describe, doesn't that also take up 10 megabytes of SWAPPER space if that memory is allocated as committed, or does the reservation of SWAPPER.DAT space only occur at the first memory hit on each page?

o Does the OS do the allocation on the hit of each page? The answer I always got was that the *only* thing created at the time DosAllocMem is called are the page table entries.

P.S. Lots of UNIX programs assume the OS does lazy page allocation.

o Also, if you have a process that periodically requires large amounts of memory and then doesn't require it for a while, does it make sense to call _heapmin to return it? I'm thinking of a process that maybe is refreshing data from the host, say every 30 minutes, and during the refresh would need large amounts of data in memory but then when done doesn't need it. This doesn't take up memory from other processes since it will be swapped out, but it still takes up all that room on the swapper. It could be a couple of megabytes of memory.

o By all means, if you allocate a lot of memory, use it for a short time, free it, and then "go to sleep" for a long period, use _heapmin. It's not required, but it will free up swapper space. It makes your program a "good citizen".

o Use _alloca to allocate dynamic storage that is specific to a function. It allocates from the stack and the storage is automatically freed when the function returns; make sure the function is short-lived and does not use all of the stack space.

o Use malloc to allocate storage that is small or will be resized. If you have hundreds of small mallocs you should consider doing a large malloc and managing the small pieces yourself.

o Use DosAllocMem for allocations greater than 3,000 to 4,000 bytes.

o Ensure that objects favoring DWORD alignment are DWORD aligned. malloc and _alloca always return storage that is DWORD aligned, so keep objects within that storage properly aligned.

    o WORD align shorts.
    o DWORD align int, long and float.
    o QWORD align doubles.