═══ 1. Copyright and License Agreements ═══ Copyright and License Agreements ═══ 1.1. Copyright Notices ═══ Copyright Notices LXOPT and the LXOPT logo are trademarks of Functional Software Limited. OS/2, C Set ++ and VisualAge C++ are trademarks of International Business Machines Corporation. The LXOPT software and accompanying documentation may be distributed and used free of charge but are copyright (C) 1994-1997 Functional Software Limited. All rights reserved. Functional Software may be contacted via the internet at funcsoft@cix.compulink.co.uk ═══ 1.2. LXOPT License Agreement ═══ LXOPT License Agreement Definition of terms used in this agreement LXOPT: The LXOPT software, utility programs and accompanying documentation. USER: You, the purchaser of the LXOPT software. FSL: Us, Functional Software Limited. OUTPUT: Executable computer code, or derivative thereof, created or altered by the LXOPT software. Acceptance Use of LXOPT indicates acceptance by USER of the terms and conditions of this agreement. If you do not agree to these terms and conditions you may not make use of LXOPT and must destroy any and all installed copies of the software. Grant of License The LXOPT software is copyright of Functional Software Limited. FSL retains ownership of LXOPT. You are hereby granted a nonexclusive license to use LXOPT subject to the permitted uses and restrictions contained in this agreement. Permitted Uses LXOPT may be applied by USER to software owned by USER. LXOPT may be applied by USER to software licensed by USER where such actions are consistent with that license. Copies of LXOPT may be freely distibuted provided that each copy is complete and unaltered. No fee may be made for distribution other than a nominal distribution charge. USER may distribute unlimited copies of OUTPUT. USER may distribute unlimited copies of the files PRELOAD.EXE and PRELOAD.INF. Restrictions USER may not alter LXOPT unless such alteration is approved by FSL. Such prohibited alterations include, but are not limited to, the operation and appearance of the software and the text of the documentation which shall include this agreement and accompanying copyright notices. USER may not reverse compile, reverse engineer or reverse assemble any part of the LXOPT software. For the purposes of this agreement use of LXOPT as described in the accompanying documentation shall not constitute reverse engineering of the software. All rights not expressly granted by this agreement are retained by FSL. Limited Warranty LXOPT does not work with all valid OS/2 executables. Programming constructs exist that defeat the mechanisms used within LXOPT. It is the responsibility of USER to test the suitability of LXOPT for USER's applications. In view of this USER assumes all liability and responsibility for the decision to use LXOPT and all OUTPUT produced including any consequences thereof. LXOPT is supplied "AS IS", without warranty of any kind, either expressed or implied, statutory or otherwise, including but not limited to the implied warranties of merchantability or fitness for a particular purpose that may be made by FSL or its software suppliers on this product. No oral or written information or advice given by FSL, its software suppliers, dealers, distributors, agents or employees shall create a warranty and you cannot rely on the correctness of any such information or advice. Neither FSL, its software suppliers, dealers, distributors, agents or employees shall be liable for any direct, indirect, consequential or incidental damages. Including but not limited to damages for loss of business profits, business interruption or loss of business information, arising out of the use or inability to use the software or accompanying documentation, whether or not FSL has been advised of the possibility of such damages. Under no circumstances will FSL liability exceed the purchase price of the software. Governing Law This license is governed by the laws of England and USER agrees to submit to the jurisdiction of the English courts. Where the local laws of USER prohibit the jurisdiction of English law this license will be governed by the laws of the country in which LXOPT is used. FSL may, at its own discretion, elect to enforce, apply and interpret the terms of this agreement under any applicable foreign jurisdiction. If any provision of this agreement is unenforceable, all others shall remain in effect. Furthermore, any such unenforceable provision shall remain and be interpreted in its strictest sense which remains consistent with governing law. ═══ 2. URGENT MESSAGE ═══ URGENT MESSAGE This message only applies to OS/2 Warp 3.0 at FixPak level 10 or lower. Users of OS/2 V2.x or Warp 3.0 users who have applied FixPak 11 or later may ignore this message. All affected users are strongly advised to upgrade their OS/2 installation. There is an obscure software fault in the virtual memory manager of OS/2 WARP V3.0 (PJ18014) where FixPak 11 or later has not been applied. This fault can result in the unpredictable corruption of a page of memory when an application allocates memory in excess of the computers physical RAM and accesses/alters it in a particular order forcing swapfile growth. Normal applications have little need to concern themselves with this problem. Unfortunately LXOPT reliably reproduces the conditions for failure when processing a large executable file. For these purposes a file of 1Mb is considered large, although the precise limit is unknown and will vary with available RAM. The most common manifestation of the fault is the failure of an internal consistency check resulting in an LXO0116 internal error. Given the nature of the fault errors can appear at any stage and some users have experienced the generation of corrupt executable files where a page containing application code has been corrupted before being written to disk. Other users have experienced protection violations or unexplained premature termination of the software. While all software faults are of concern, the unpredictability and potentially undetectable nature of this fault make it particularly dangerous. A workaround involving the removal of the need for swapfile growth has fixed all known manifestations of this problem. This is achieved by pre-setting the initial size of the swapfile in config.sys to a value that ensures that execution of LXOPT will not require swapfile growth. To calculate the required figure multiply the size of the largest EXE/DLL to be processed by 15 and add it to the 'normal' swapfile size as given by a directory listing on your machine. Obviously running other applications simultaneously with LXOPT will affect swapfile requirements and the initial swapfile size should be set accordingly. Adding an additional 5 or 10Mb to the resulting figure would be a wise precaution. The line below sets the initial swapfile size to 50Mb with a 2Mb minimum free disk space limit. SWAPPATH=D:\ 2048 51200 Regardless of file size, if you experience symptoms matching the description above you should attempt this workaround. ═══ 3. This Software is FREE! ═══ This Software is FREE! LXOPT is now freeware! LXOPT was previously a commercial OS/2 development tool but was withdrawn from sale at the end of January 1997. Rather than allow the software to disappear this final version is now distributed as unsupported freeware. Unfortunately this also means that development for the OS/2 platform has ceased. Technical support for this freeware version is not available. ═══ 4. V1.22 - What's Changed? ═══ V1.22 - What's Changed? Version 1.22 Change to Freeware LXOPT is now distributed as unsupported freeware. Version 1.21 Demo Version Changes The LXOPT Demonstrator has been upgraded to include all the LXOPT utility programs and will now produce unrestricted applications when processing code of less than 256Kb in size. TLINK Problems In some circumstances the Borland linker (TLINK) can generate invalid values in the fixup page table within executables. LXOPT now detects and corrects illegal values. Software Faults Fault corrected which could cause files arranged with /preload to reject preload requests. Code recognition updated to prevent reported rejections of valid code sequences. Code generation algorithm updated to prevent potentially endless application of transfer optimisation. Version 1.2 Code Preloader The new /preload option allows application code to be transferred to the swapfile at boot time. A freely distributable preload utility works in combination with LXOPT processed code allowing users to selectively preload code. Executed code is loaded direct from the swapfile with no need to apply fixups or for network traffic. Preloads of network files may be automatically deferred until specific network drive(s) are available allowing preload requests (e.g. in startup.cmd) to be made prior to restoration of network connections. Dead Code Elimination Code recognition algorithms have been revised to improve the detection and elimination of unused code/data in the processed code object. Code previously included due to a cyclic reference or unused pointer reference (typically from unused CASE statement jump tables) is now removed. Recording Efficiency Code path detection will now predict more code paths removing the need to record their use in the recording file. Untraced sequences will execute faster and help reduce recording file sizes. Fixup Encodings Generation of new fixup tables has been improved to allow the grouping of fixups with common targets. Although previous versions of LXOPT performed this optimisation, some combinations were omitted. This is particularly important for DLLs which typically contain a far larger quantity of fixups in relation to code size. Undo Utility LXOPT now includes an undo utility. Applied to processed DLL/EXE files this will restore the original unprocessed version of a file and delete any related files generated by LXOPT. This utility is installed in the WPS as part of the installation process. See UNLXOPT for more details. DosGetMessage Code inserted from IBM libraries to handle DosGetMessage API calls violates an LXOPT restriction. LXOPT now rejects such code which needs to be moved to another code object to operate correctly. See the LXO0169 error description for more details. Default Recording File The default recording file pathname is now based on the absolute path name of the processed executable. This forces recording files into the same directory as the executable by default. Relative path names may still be specified using the /recFile option. Software Faults Small EXE/DLL files produced by the Borland linker sometimes use 16 bit values to indicate entry points into 32 bit code. This is a valid technique which is now correctly handled by LXOPT. Some assembler routines in the Watcom floating point library contain operations which violate LXOPT restrictions. V1.2 will now handle the violations which have been reported. Assembler sequences inserted into code to help trace execution have been altered to avoid a potential memory update race condition with a store from an outstanding floating point instruction. The method by which access to recording code is serialised has been updated. This improves performance and prevents the deadlock that often occurred during recording sessions where multiple threads of varying priorities were executed. Previous Changes Version 1.1 Pricing The price of the single user license increased slightly and all distribution license fees were removed. Processing Large EXE/DLL Files Under WARP Advice on how to avoid problems that may occur processing large EXE/DLL files under WARP has been revised. It is important that LXOPT users are aware and act on this information. See Urgent Message for details. WARP Compressed Page Support LXOPT now supports the EXE/DLL file page compression introduced with OS/2 WARP. See the /pack2 option for more details. New Arrangement Algorithms LXOPT has two new arrangement algorithms. The binary algorithm arranges code based on the similarity of binary patterns of use and is the new default algorithm. Parkonly, as the name suggests, parks unused code but performs no other arrangement. It is intended for use where developers wish to retain control over the arrangement of executed code. CPU Optimisations V1.1 contains the first LXOPT CPU oriented optimisations. These focus on CPU instruction cache utilisation efficiency (see CPU Instruction Caching) and branch prediction. New options giving greater control over alignment within the processed code object have also been provided. See /alignCode, /alignData, /cLineSize and / cLineWaste for more details. CPU bound applications should achieve performance improvements of between one and five per cent. There are numerous other optimisation opportunities which will be included in future versions of LXOPT. Special code arrangement pre and post processing have also been introduced to strike a better balance between CPU efficiency and working set tuning. This suppresses layout divisions within performance critical areas and significantly reduces the size of recording files. Default Alignment The default alignment of code pointer targets is changed from 4 to 1. Performance related code alignment issues should be addressed using the new /cLineSize and /cLineWaste options. Applications which rely on alignment to make the low order bits of addresses redundant should set alignment explicitly using the /align option. ICC.EXE Patch for CSet++ V2.0 and 2.1 Users Linking initiated by ICC.EXE always forced a base address for EXE files resulting in the removal of internal fixups. LXOPT requires internal fixups for correct processing of the input executable. Although many users could avoid this problem by direct use of LINK386, C++ template users were forced to link directly via ICC preventing the use of LXOPT. A patch for ICC.EXE is now included to remove this restriction. See Patch for CSet++ V2.0 and 2.1 Users for a description of this patch and how to apply it. Demo Version Restrictions Code arranged by the demonstration version will not produce a tone on start-up unless an error is encountered. The tone has been replaced by a processed application lifetime limit of seven days. A warning is also produced if the machine in use has not been recently rebooted to encourage testing in line with the new Performance Testing section. This change also allows evaluators to time the execution of processed code without having to deduct the duration of the start-up and termination tones. Code produced by the LXOPT Demo will now execute almost identically to that produced by the full product but may produce up to three additional page faults during the runtime of the processed application. Recording Session Error Messages V1.0 of LXOPT placed error messages produced during recording sessions in the file "LXREC.ERR". V1.1 retains this behaviour but when the message has been safely recorded it will now attempt to display the message on screen. Code Offset Translations New options to translate code offsets between pre and post arrangement code are now provided. These are designed to assist in debugging situations and to help trace instruction pointer based error messages back to the original source code. See the /getOld and /getNew options for more details. Code Disassembly Code within the processed code object can now be disassembled using the /disasm option. Utility Programs New utility programs have been introduced to time application execution, simulate low memory conditions and translate EXE/DLL files between the OS/2 2.x and WARP compressed executable file formats. See TimeRun, Thrash, LXWarp and LXUnWarp for more details. Installation/WPS Set-up The installer now creates an LXOPT desktop folder. The contained program objects are prepared to allow use of LXOPT and associated utilities direct from the WPS. Alteration of DPATH is not required for the use of LXOPT V1.1. Software Faults The LXOPT recorder has been redesigned to remove the problems some users experienced while attempting to perform multiple simultaneous recording sessions. Unique recording DLLs are now created on a per application basis. The code analyser has been updated to prevent rejection of code generated by the Watcom C++ compiler. Run times on large files (>2Mb) have been reduced by improved efficiency in the file creation routines. Note that LXOPT is not intended for use on a daily basis but as a final 'pre-shipment' optimiser. Future design targets will permit arrangement times of up to 12 hours (i.e. over night) if optimisation rewards warrant it. ═══ 5. *** START HERE! *** ═══ *** START HERE! *** Welcome to LXOPT LXOPT is a unique tool for the OS/2 developer. By working set tuning EXE/DLL files LXOPT will typically halve the amount of memory required to store application code. The instruction stream is also processed to ensure maximum CPU instruction cache efficiency and a new preloading option allows the resulting code to be transferred to the swapfile allowing the fastest possible application startup and paging. See benefits for a full list of what LXOPT can do or go to the Introduction section for a more complete introduction to the software. You Can Now Use LXOPT Free! LXOPT is now freeware, see This Software is FREE! for details. Quick Start The installer has created program objects to process applications using standard LXOPT defaults which appear on the Open menu of EXE and DLL files. If you are eager to get started go straight to Using LXOPT. Spreading the Word Working set tuned applications page less and use less memory. This leaves more memory free for use by other applications, reducing their need to page. When users execute tuned code the entire system benefits. Only when the majority of applications have been working set tuned will the full benefits of tuning appear. Please help to spread the word by passing on this copy of LXOPT to other OS/2 developers. ═══ 6. Introduction ═══ Introduction ═══ 6.1. Background ═══ Background With the release of OS/2 V2.0 came the introduction of the Linear Executable file format, the format now used by all 32-bit OS/2 EXE and DLL files. For the first time the Linear Executable allowed code to be loaded into memory in 4Kb units, the page size of the 80386 and later processors. This new efficient design changed the way code was loaded into memory. Applications were no longer loaded on a per segment basis but used the virtual memory mechanisms now used by the rest of the operating system. A code page is loaded into memory when an attempt is made to execute an instruction within the page. The 4Kb page is read from the disk and relocations ( fixups ) are applied from data structures contained within the Linear Executable. When memory is heavily utilised code pages will be recycled as with other system memory and the code is discarded. If instructions on the code page are later referenced the page must be reloaded from the disk and fixups reapplied. An efficient program would ensure that all instructions on a single code page were executed at roughly the same time to make sure that memory was used and paged most efficiently. Unfortunately modern programming methods and convenience favour grouping code logically rather than by time of execution. During the run time of a typical application between 30% and 50% of the code loaded is never executed. Entire 4Kb code pages are often loaded to execute only twenty or thirty bytes of code. ═══ 6.2. What is LXOPT? ═══ What is LXOPT? LXOPT (Linear eXecutable OPTimiser) is a development tool designed to improve the code layout of 32-bit OS/2 applications. Applied directly to EXE and DLL files LXOPT rearranges code at the assembler level to minimise page faults, maximise CPU instruction cache efficiency and provides many other useful benefits. It is particularly effective on large applications forced to run in low memory conditions and can reduce code load page faults by up to 95% in extreme conditions. LXOPT can group together all unused assembler sequences and move them to other code pages from where they will not occupy memory unless executed. This technique, known as 'Sleeping Code Parking', reduces the total code memory requirements of a typical application by between 30% and 50%. LXOPT is unique in that it works at the assembler level and is able to change the location of not just whole procedures but individual processor instructions. Code handling infrequently used branches of IF or CASE statements may be moved to different code pages significantly reducing a programs working set. LXOPT also produces minor CPU related performance improvements by improving CPU cache efficiency and branch prediction. CPU bound applications should achieve performance improvements of between one and five percent. Applications to which LXOPT is to be applied must adhere to certain restrictions or may require special caution, see restrictions for more information. LXOPT operates directly on 32-bit code in OS/2 Linear Executable files and does not normally require alteration or recompilation of source code. Processed applications may contain 16-bit code but this code is not optimised by LXOPT. LXOPT is not designed for daily use but intended as a final stage in the development cycle. A completed application should be processed by LXOPT as a final optimisation phase before retesting and internal or external deployment. Note The normal method of testing a performance enhancing tool is to compare the execution times of pre and post optimised code. LXOPT primarily operates by improving the caching characteristics of application code and results are distorted by caching effects within the file and operating systems. See Performance Testing for details of how to negate these effects. If you intend using this software on files greater than 1Mb in size and are using OS/2 WARP 3.0 (without FixPak 11 or higher) there is a workaround for a software fault of which you MUST be aware. See Urgent Message for details. ═══ 6.3. Benefits of LXOPT ═══ Benefits of LXOPT Working Set Reduction The primary benefit of LXOPT is its ability to identify an applications working set and produce an optimised code layout based upon it. Any application with code greater than 4Kb in size may benefit from this effect. The effect is most apparent when an application is forced to execute in a restricted amount of memory. Reduction of the working set reduces page faults which improves performance. Large applications constrained by physical memory may benefit greatly from this effect. Small applications benefit from improved start up times and contribute to a general reduction in system load. Code Parking Users may be familiar with 'Dead Code Elimination', an optimisation used by compilers to remove unused code. LXOPT performs 'Sleeping Code Parking', the moving of apparently unused code to the end of an applications code space. 'Parked' code will not normally be loaded by your application BUT REMAINS ACCESSIBLE should it be required. Code parking can significantly reduce the total amount of memory used by application code, typically by 30% to 50%. It is particularly useful in reducing the memory overhead of largely dormant applications that run continuously in the background. Code Removal The LXOPT code analyser will detect and completely remove unreferenced instruction sequences. Although modern linkers are normally very good at performing this function they are unable to detect some forms of unused code. LXOPT identifies every byte of code, if any instruction is unreachable LXOPT will remove it. Preloading LXOPT includes a /preload option which permits application files to be transferred to the swapfile on machine start-up. This technique, formerly reserved for OS/2 system DLLs, is now available to all applications. Preloading permits faster application start-up and paging at the expense of an extended boot time and increase in swapfile requirements. CPU Optimisations Code arrangement and CPU caching related options allow the tuning of code for maximum CPU cache efficiency. Improved instruction caching aids performance both by helping to reduce instruction fetch times and freeing the main memory bus for other instruction/data accesses. The execution history generated by LXOPT is also used to assist in branch prediction. Ease of Use LXOPT works directly on application EXE and DLL files. Code usage gathering and new layout generation are fully automated, no source code files are ever examined or altered. Developers no longer need to alter code structure, insert compiler pragmas or predict runtime code usage to tune their code layouts. Near optimal code layouts which can effectively halve code size can often be achieved with less than an hours work. Users With Insufficient RAM Despite declining memory prices many users still operate machines with insufficient RAM for their requirements. For many applications LXOPT provides a means of quickly and effectively supporting such users while providing a useful performance enhancement regardless of the target machine. Multitasking Users are often limited in running multiple concurrent applications by memory constraints. Multitasking applications compete with each other for valuable system memory. Applications processed by LXOPT work more effectively in low memory conditions, reducing system load when running several large programs concurrently. Fewer/Faster Disk Accesses In addition to the reduction in disk activity due to fewer page faults, optimised code layouts have other beneficial effects. Pages in the LXOPT processed application file will tend to be arranged in the order in which they are used. Once a page is loaded from disk following pages are likely to be fully or partially loaded at the same time by disk hardware/cache. Subsequent page faults can therefore often avoid a disk hit, significantly reducing the time taken to service the fault. Even if not already loaded, subsequent code pages will tend to be located at nearby locations on the disk so reducing head movement. In addition LXOPT aligns each code page within the file to minimise the number disk blocks which need to be read by the disk controller. Using Libraries All application developers use libraries. Some come with the compiler, others are developed in house or purchased externally. Normally these libraries are developed for general use without regard for a specific application. LXOPT can process these libraries to optimise their memory usage with your application. Unused code in the libraries is parked where it causes least overhead to your application. This benefit may be gained whether the library is statically or dynamically linked. Error handling/Debugging code Most software contains error handling for unexpected internal errors. Often such code is pre-processed out before final release. While the tests for error conditions remain, LXOPT will park the error handling itself. While your application is operating normally the overhead of the error handler code is reduced to the size of a single transfer instruction. ═══ 6.4. Restrictions ═══ Restrictions Please ensure that all programmers involved in the development of the software to be processed read the contents of this section. LXOPT may be applied to almost all OS/2 Linear Executable DLL or EXE files containing a pageable 32 bit code object. Some unusual programming techniques may cause LXOPT to fail or require special caution. A 32 bit OS/2 linear executable application may contain many objects. An object is analogous to a segment within the old 16 bit file format with each object typically containing a different type of information. For example a normal 32 bit executable file might contain 32 bit code object(s), data object(s), resource object(s) and perhaps a 16 bit code object to interface with some 16 bit APIs. LXOPT always applies itself to the largest 32 bit code object. To function correctly LXOPT needs to identify all potential execution paths through your program. LXOPT will work correctly with all transfers of control generated by standard compilers and normal assembler techniques. Some coding techniques which rely on assumptions about code layout will cause LXOPT to fail or require special caution. Run times will also be reduced if users avoid placing read-only application data within the processed code object. These following restrictions apply only when they involve the code object which LXOPT processes. A restriction may often be avoided by moving the offending code/data to a different object. This can be achieved by placing the code in a named code segment via compiler options/pragmas and adding this name to the module definition (.DEF) file in the SEGMENTS section. Pointer Alignment If your code relies on the alignment of functions/data (i.e. lower bits used for informational purposes and masked out before dereference) you must use a consistent /align option value. Hard Coded Relative Distances Your application may not make assumptions about relative distances between functions or the absolute location thereof. For example, a statement such as ((NextFunc *)((char *)MyProc+10))() will cause your processed application to expire just ahead of your programming career. In general the absolute distance between two locations within the processed code object should never be used within a calculation. The only exception to this rule is where both addresses refer to data and no code exists between them. Assembler programmers may need to take particular care in observing this restriction. A special case of this error occurs in code inserted by IBM compiler libraries when handling a DosGetMessage call. See the LXO0169 error for more details. Timing During the recording process LXOPT increases the load on the CPU and generates occasional disk activity. If an application is timing sensitive this may affect its operation. CPU load/disk activity can often be reduced or redistributed to reduce such problems. See Creating a Recording Version for more information. Exporting Code Object Data LXOPT assumes all exports from the processed code object are 32 bit function entry points. Do not export data from your applications 32 bit code object as this will cause failure of LXOPT or the resulting program. Self Referencing Code Code that deliberately alters itself to effect transfers may fail. If your application contains assembler that uses this technique disassemble the resulting code to ensure it has been translated correctly. LXOPT will issue a warning if the code object it is processing is writeable. Invalid SS:ESP LXOPT uses the applications stack during recording sessions. To ensure correct operation of the recorder SS:ESP must be valid at all transfers of control within your application (e.g. during a JMP instruction). If your 32 bit code receives control with an invalid or 16 bit stack a valid SS:ESP pair must be created before the next transfer is executed. LXOPT also uses the DS register. All recording operations are suspended while the DS register contains a value other than that in effect at application start-up. ═══ 6.5. An Example ═══ An Example The effect of LXOPT on code is best demonstrated by example. The caption below shows a simple piece of disassembled 'C' code which opens a file and checks that it contains the correct version number. /* open the file */ fp = fopen(filePath, "r"); PUSH "r" PUSH filePath CALL fopen MOV fp, EAX /* check if opened ok */ if (!fp) { CMP EAX, EAX JNE DO_FSCANF ReportError(RE_OPEN_FAIL, filePath); PUSH filePath PUSH RE_OPEN_FAIL CALL ReportError return FALSE; MOV EAX, 0 RET } /* read in the version number */ fscanf(fp, "VERSION %d", & version); PUSH & version PUSH "VERSION %d" PUSH fp CALL fscanf /* test the version number */ if (version != APP_VERSION) { CMP version, APP_VERSION JE VERSION_OK ReportError(RE_WRONG_VERSION, filePath); PUSH filePath PUSH RE_WRONG_VERSION CALL ReportError return FALSE; MOV EAX, 0 RET } The code performs some simple error checking typical of this type of operation. In normal use the error handling code will not be executed yet is always loaded due to its proximity to the other code. LXOPT uses the execution history to identify the error handling as 'Sleeping Code' and moves it to another code page. This produces the new instruction sequence below. ... PUSH "r" PUSH filePath CALL fopen MOV fp, EAX CMP EAX, EAX JE NOT_OPENED PUSH & version PUSH "VERSION %d" PUSH fp CALL fscanf CMP version, APP_VERSION JNE WRONG_VERSION ... The twenty two instructions of the original code sequence is reduced to twelve while the application executes normally. If a file access error does occur the error handling is loaded and executed normally. LXOPT improvements don't stop there. The entire 'ReportError' function is also parked and code from the library functions for 'fopen' and 'fscanf' will be separated from the other library code allowing it to be moved to the code page on which it is used. Also 'fscanf' is a powerful function which contains code capable of reading many data types and includes floating point conversions. LXOPT breaks the function into its component parts and only moves the code from 'fscanf' which specifically handles the input formats specified by your application. The result is a code sequence that is effectively halved in size and which only needs a single code page to contain all the required code. ═══ 7. User Guide ═══ User Guide ═══ 7.1. Using LXOPT ═══ Using LXOPT Use of LXOPT is divided into four stages. Preparation The EXE/DLL file to be processed may need some special preparation. What preparation is needed (if any) is described in preparing your application. Creating a Recording Version Next a special version of your application must be created. This will execute as normal but will create a recording detailing where and when all code is used. LXOPT creates this special version of your application for you. See Creating a Recording Version for more details. Running the Special Version The special recording version of the application is then executed to generate a recording. This is the most important part of the optimisation process. See Recording Program Statistics for details. Arrangement Finally LXOPT is used again, this time with the /arrange parameter to create an optimised code arrangement. Options may also be specified to compress pages in the output file (/pack2) or to allow the created file to be preloaded (/preload). Arrangement issues are discussed in Creating An Optimised Code Arrangement The result is a leaner, faster application with dramatically improved performance characteristics when forced to run in low memory conditions. ═══ 7.1.1. Preparing Programs ═══ Preparing Programs EXE File Preparation LXOPT can process both EXE and DLL files. To process EXE files they must contain internal fixups which are normally removed by the linker when the executable is given a base address. LINK386 users must ensure that EXE files have been linked without the /base linker option. ICC.EXE in CSet++ V2.0 and V2.1 automatically provides /base:65536 as an option to LINK386 when initiating a link. Users of this compiler should invoke LINK386 separately. This is not possible where the code to be processed uses templates and for these users a patch for ICC.EXE is included to allow use of LXOPT. See Patch For Cset++ V2.0 and V2.1 Users for more details. Users of VisualAge C++ should use ILINK with the '/nobase' option. Do not link using ICC.EXE, this provides the /base option to a link even if /nobase is specified. Watcom created EXE files need the 'op int' linker option to ensure internal fixups are retained. DLL File Preparation Standard DLL's always retain internal fixups and require no special preparation. DosGetMessage Code within the IBM CSet/VAC libraries violates an LXOPT restriction when handling a DosGetMessage call. Even where an application does not make direct use of this function it will often be included to handle messages required by the compiler library. LXOPT detects use of this technique and produces an LXO0169 error. See the description of this error for information on how to avoid this problem. Restrictions and Stack Use Before using LXOPT please read the Restrictions section to ensure that it is suitable for your application. LXOPT uses up to 2Kb of your applications stack while recording. To reduce stack size problems LXOPT intercepts thread creation within the recording application and increases the allocated stack by 4Kb. The main application stack is not altered. ═══ 7.1.2. Creating a Recording Version ═══ Creating a Recording Version To create a recording version of your EXE or DLL file it must be processed by LXOPT using the /prep parameter. If the /arrange parameter is not specified then /prep is assumed by default. To prepare an application using default options select LXOPT Prepare from the Open menu or using the command line type: LXOPT LXOPT searches the file for 32 bit code objects and selects the largest for processing. Although most applications will be processed in under five minutes, LXOPT is effectively reconstructing your entire application including any libraries to which it is statically linked. Processing may take up to a few hours for multi Mb application files. Errors and Warnings LXOPT detects all unreferenced bytes in the processed code object. Often these bytes are initialised data or unused code. For each unused byte sequence which contains fixups LXOPT produces a LXO0150 warning. It is normal for files to produce several of these warnings during processing. All unused bytes are removed from the output executable. When LXOPT detects an error processing a file a simple error message is produced. More detailed information on the error and how to avoid it is provided via the F1 key which provides a direct link to the reference section in this documentation. Memory Use Analysis During processing LXOPT identifies every byte within the processed code object. There are three basic classifications; 'code', 'data' and 'unused'. 'Code' bytes are those used to form processor instructions and will normally be the largest group. 'Data' bytes are read-only data placed within the code object. Such data is normally inserted by a compiler and are usually compile time constants or jump tables used in the encoding of CASE statements. 'Unused' bytes are never referenced within the application. Typically these are padding used to adjust procedures and data to alignment boundaries. Often applications contain unused code or data that is linked into the final executable image. Some development tools and techniques can prevent standard linkers from detecting unused code. Files Produced When processing has completed the original application file is replaced by a special recording version. The original file is renamed to have an '.ORI' extension. A special recording DLL is also created to assist with runtime recording. The name of this DLL is generated by placing an '@1' at the end of the root name and adding a '.DLL' extension. For example, processing of 'MYAPP.EXE' will create the recording DLL 'MYAPP_@1.DLL'. The numeric value is altered if necessary to ensure generation of a unique file name. Special Options Some applications may require special options for recording to increase performance or alter the distribution of the overhead of the recording process. Often the most simple solution for timing problems is to record on a higher performance PC or remove timing sensitive code from the processed code object. See the /buff, /compress, /thread and /recfile options for more information. Note If you intend using this software on files greater than 1Mb in size and are using OS/2 WARP prior to application of FixPak 11 there is a workaround for a software fault of which you MUST be aware. See Urgent Message for details. ═══ 7.1.3. Recording Program Statistics ═══ Recording Program Statistics To record program statistics run your prepared EXE/DLL performing the operations to be typically performed by the user. The quality of recording information directly affects the performance of LXOPT and is the most important part of the optimising process. During the recording process the recording DLL generated during preparation (e.g. 'MYAPP_@1.DLL') must be somewhere on your LIBPATH. By default this DLL was created in the directory of the processed EXE/DLL file. Typically LIBPATH contains the current directory '.' which often allows the DLL to be found at its default location. If your application consists of multiple EXE/DLL files you may prepare and record all of them simultaneously. Recording Strategy The purpose of recording is to tell LXOPT where and when instructions are normally executed and identify which are rarely or not normally used. During the recording session you perform operations that you would normally expect of the user. Avoiding unusual program conditions while recording will greatly enhance the effectiveness of LXOPT. DO NOT be tempted to use preprepared test scripts designed to test program stability, these rarely mimic true user behaviour. A good general rule is to start your application and do the most commonly performed actions first. Then progress through the interface in order based on expected frequency of use. If you wish to focus tuning on a specific area of the code simply execute that code more frequently. Recording (.REC) Files During execution your application will create a recording file which by default is the name of the processed file with a '.REC' extension. Your application may be used for more than one recording session, each session appends its results to the existing recording. Recording sessions may also be performed on separate machines, the files concatenated later to form one continuous recording. If you wish to record multiple instances of your application at the same time a separate recording file is created for each instance. These special circumstances require special treatment of the recording file(s), see the /recfile option for more details. Special Options If recording seems slow or your application is timing sensitive the /buff, /thread, /compress and /recfile options can all help to reduce the recording overhead if used when creating a recording version. If an error is detected during recording a two pitch alternating tone is generated and an error message placed in the file 'LXREC.ERR'. An attempt will then be made to display the error message on screen. This will normally succeed but given the unknown state of the application it cannot be guaranteed. The application will then terminate with error code 99. WARNING Recording may create large recording files ranging from a few Kb to several hundred Mb in size. Running out of disk space will cause recording to fail and may impair other applications if recording files are placed on the same drive as your swap file. The recording file pathname may be specified using the /recfile option and the recording file may be reduced in size by the /compress option. You are advised to start with a short recording session to judge the disk space requirements for your application. ═══ 7.1.4. Creating An Optimised Code Arrangement ═══ Creating An Optimised Code Arrangement When a recording has been created LXOPT is used with the /arrange parameter to create an optimised layout. To use all defaults select LXOPT Arrange from the Open menu or using the command line type: LXOPT /arrange If you wish to preload the resulting file you must also specify the /preload parameter. This option is also available as LXOPT Arrange Preload on the Open menu. As with preparation, LXOPT may take a significant time to complete when processing large input files. Performance Testing If you wish to compare performance between the original and optimised applications please read the Performance Testing section. Caching effects of the operating system, network servers and disk caches need to be considered to ensure valid results. Files Produced When complete the optimised EXE/DLL file is created and overwrites the existing recording version. The recording version of the file is renamed with a '.PRP' extension. The original application file remains with a '.ORI' extension. Arrangement Report When a new arrangement has been created LXOPT performs a page fault simulation on the original and optimised versions. The test simulates the execution of the entire recording session with the original and optimised versions of the application file. The results are analysed and used to generate a comparative report for a series memory load conditions. See Reading Reports for an explanation of the information displayed. Production of a report may be disabled using the /noreport option Recording Files LXOPT arrangement uses a single recording file which by default is the EXE/DLL path name with a '.REC' extension. If you wish to specify another file or use multiple files please see the /recfile option for more details. Alternative Arrangement Algorithms LXOPT uses named arrangement algorithms to create optimised code layouts. Although the default algorithm is generally the best, algorithm performance is dependent on program structure and can vary dramatically. For information on customising the optimisation phase refer to the /alg option. CS:EIP Based Messages Many applications use the current instruction pointer as part of their fatal error messages. The default exception handler also provides this information. Developers often use a .MAP file to trace back these pointers to the offending code. The movement of assembler sequences within the processed code object means that offsets within this region will have altered. LXOPT provides the /getOld and /getnew options to translate these offsets between the old and new versions of the executable. ═══ 7.2. Reading Reports ═══ Reading Reports When the arrangement process is complete LXOPT generates a report to show how effectively the new instruction layout will reduce page faults due to the loading of application code. The data for this report is generated using a simulation of the page fault behaviour of the old and new code layouts for a range of available free memory. The execution history generated by the recording process is used to recreate the flow of control throughout the lifetime of the application. OS/2 is an advanced multitasking operating system and as such the resources allocated to an application will vary dynamically based on system load. To allow production of a meaningful report page fault data is generated for fixed amounts of available memory. A typical report appears below: Calculating page faults loading instructions from code object 1 ... Memory (Kb) Old Faults New Faults Percentage Reduction 28 13187 4014 69% 84 2048 290 85% 140 743 58 92% 196 461 43 90% 252 337 43 87% 308 240 43 82% 364 182 43 76% 420 160 43 73% 476 144 43 70% 532 131 43 67% 375161 bytes (63%) of the code was parked. Each row of the table details page fault behaviour for a fixed amount of available memory. To take the first row, if code object 1 were restricted to the use of 28Kb of memory the old version would generate 13,187 page faults while executing the recording sequence. The new code layout would generate 4,014 page faults, a 9,173 (69%) reduction. Code parking often has a significant effect on the arrangement process and in the above example roughly 366Kb was parked (i.e. 375,161 bytes of unused code fragments were collected from throughout the application and placed together at the end of the code area). The last row of the table shows the number of page faults generated when all the memory the code needs is available. This figure is never zero as application code is always loaded via page faults (*). As no page is ever forced out the number of page faults is equal to the total number of pages referenced. The old layout used 131 pages (524Kb) while the new code layout uses 43 pages (172Kb) a reduction of 88 pages (352Kb) or 67%. This is the reason why for a range of available memory the new code layout causes 43 page faults, 43 pages (172Kb) of memory is all that is required to load all of the executed code. So has the applications total memory requirements been reduced by 67%? No. All figures in the table relate to page faults loading instructions from the processed code object, all references to data and resources remain unaltered. If your application is relatively small but manipulates large amounts of data the effect may even go unnoticed. If your applications memory requirements are primarily due to the size of the code then the effect can be transformative. (*) OS/2 V2.0 and later ignore the PRELOAD attribute specified in module definition files. ═══ 7.3. Preloading ═══ Preloading Preloading (transferring exe/dll files to the swapfile) can give a significant load time performance boost at the expense of additional swapfile space consumption and initial preloading delays (normally at boot/network connection time). For a file to be preloaded it must have been arranged using the /preload option. The resulting file will execute normally until preloaded using the Preload Utility. Use of preloading also raises other more subtle issues. DLL Initialisation Applications are preloaded by use of the DosStartSession and DosLoadModule APIs. When preloaded via the preload utility no DLL initialisation code is executed in DLLs that have been arranged with the /preload option. This is required to ensure transparent operation and also to protect the preloading mechanism from harm. No EXE file code is ever executed. The only exception is for DLLs with global initialisation for which normal start-up is performed. When a DLL/EXE is loaded, so are all the DLLs on which it depends. The DLLs not processed by LXOPT will initialise normally and are not preloaded. This places constraints on the actions that may be performed while executing the _DLL_InitTerm function within unprocessed or global init DLLs used by an LXOPT preloadable file. In these circumstances the _DLL_InitTerm routine should not attempt any user interaction or perform any action likely to materially affect other client processes. It should also not attempt to access functions exported by other LXOPT processed DLLs, their DLL initialisation will not have been performed making exported functions unreliable. Although it is very unusual for such DLL initialisation code to exist it is extremely important that this restriction is not violated. Philosophical Issues Preloading is a compromise. It trades the performance boost of application paging against increased use of machine resources; namely swap space and preload (usually boot) time. Problems appear where each application considers itself "important" enough to be preloaded. Clearly if a users machine were to load every piece of software they possess every time the machine booted preloading would quickly become counter productive. OS/2 used to permit the preloading of segments with the PRELOAD segment keyword in module definition files. The loader would read all PRELOAD segments into memory on application start-up. Unlike the LXOPT preload, this OS/2 preloading only took effect when the user attempted to start an application. PRELOAD segments were loaded but often at the expense of other executing applications. Although tools still support this option and indicate preload requirements in the executable files they produce the operating system now ignores them. A major factor in this is the potential abuse of PRELOAD to boost a single applications apparent performance at the expense of the rest of the system. LXOPT preloadable files are preload enabled, they do not automatically preload by themselves. This is deliberate. By requiring the use of the preload utility the decision to preload is taken away from the developer and given to the user. Your development tools are a good example of the issues involved. If you use them on a daily basis then preloading all the executables and DLLs may be very beneficial but it is hardly warranted if development is confined to an annual tweak of an in-house utility. In general do not make assumptions about the desirability of preloading your application. If you are working set tuning your software with LXOPT, enable preloading and deploy with the preload utility wherever possible. Users sometimes run server applications intermittently while apparently trivial applications are often executed thousands of times within automated scripts. Deployment Issues The Preload Utility which accompanies LXOPT is freely distributable with your application. Remember that a users installation may already be using the preload utility. Two issues arise, where to put it and what to do if the files already exist. Preload needs to be available prior to network availability. It is therefore recommended that the preload utility is stored in the "\PRELOAD" directory on the OS/2 boot drive. If the utility already exists follow the simple rule that if the files have a more recent creation date than the ones supplied with your application, leave them alone!. Future versions of the preload utility will maintain backwards compatibility. The preloader operates by copying the entire processed file into the swapfile. The original application file on disk is closed and all further page loading is performed directly from the swapfile. While this reduces the page loading overhead it also increases disk space requirements on the swap drive. Running out of swap space may make the system unstable. If the problems are caused by commands in startup.cmd it will make recovery more difficult. By default the preload utility will refuse to preload a file if there is less than 10Mb of free swap space. If your application installation places preload commands in startup.cmd and predict the need for greater free space use the /M option. ═══ 7.4. Working Set Tuning ═══ Working Set Tuning Working set tuning is the act of optimising the arrangement of information to be stored in a cache to allow the most efficient operation of the caching mechanism. When an operating system manages pages of memory the system RAM effectively becomes a giant cache. 4Kb code and data pages travel via this cache from executable files or the swapfile on their way to the processor in the same way that files travel through a disk cache. The success of caching relies on the non random nature of access requests. The greater the locality of reference the more efficiently a cache operates. The path of execution through code is not random. Instructions execute in sequences broken up by transfers of control such as CALL or JMP instructions. It is these transfers of control that often produce a 'cache miss' resulting in a disk access. While the targets of these transfers are sometimes some distance from the current instruction pointer the destination is often known or predictable. It is this predictability on which a working set tuner for code is based. Code Arrangement When applications are divided into pages it is done without regard to the underlying contents, every 4096 bytes the code is indiscriminately severed. Functions and even individual instructions are split across page boundaries. What appears on each page is dictated by the order of the code within the executable file. What dictates that order? You do. Within each compilation unit the order in which code appears is generated by the compiler. It builds the instruction sequences and orders them roughly as they appear in the source file. The linker takes each object file and concatenates the contents, appends any used libraries and outputs the executable. The result is that code within an executable will appear in the order that it is typed in the source files and the order in which those files are processed by the linker. Well so what? - if code is only loaded when referenced then there's no problem, right? Take a look at the two code layouts below where A to L are individual code sequences which for simplicity have been grouped 3 to each 4Kb page. ┌──────────┬────────┬────────┬────────┬────────┐ │ │ Page 1 │ Page 2 │ Page 3 │ Page 4 │ ├──────────┼────────┼────────┼────────┼────────┤ │ Untuned: │ A,B,C │ D,E,F │ G,H,I │ J,K,L │ ├──────────┼────────┼────────┼────────┼────────┤ │ Tuned: │ G,E,D │ L,B,C │ H,I,K │ A,F,J │ └──────────┴────────┴────────┴────────┴────────┘ Untuned vs Tuned Code Layout Reduced Code Memory Requirements If the normal path of execution is "GEDLBLBLBCHIK" the benefits of the tuned layout become clearer. The normal layout loads four pages in the sequence 3241, the tuned layout loads only three pages in the sequence 123. Both layouts contained and executed the same code but the tuned layout avoided one disk access and used 25% less memory. This memory saving is a direct result of the sequences A, F and J not being executed. If this seems contrived consider that on average between 30% and 50% of a typical applications code is loaded but not executed during normal operation. If that sounds high examine your own code for error handling of memory allocation, file access or OS/2 API errors. Include with this the code that your users will rarely execute such as routines handling the import of that obscure file format, changing application configuration or displaying that list of programmer credits (try clicking on the Warp desktop and then press Ctrl-Alt-Shift-O). Often such code appears within branches of IF or CASE statements, lying dormant inside otherwise active functions. In an untuned code layout much of this code is loaded because it happens to share part of a code page with some other active code sequence. Performance In Low Memory Conditions Limit the code sequence to the use of one 4Kb page as a crude simulation of low memory conditions and more significant differences emerge. The normal layout will execute as "GEdLBLBLBcHiK" where an upper case character represents the need to load a page from disk. The result is a total of 10 page loads. The tuned layout sequence is "GedLblblbcHik". The total of 3 page loads has been unchanged by the restriction to one page of memory. The reduction in page faults is due to the way that tuned code layouts group code by time of execution. When a tuned application executes a CALL or JMP instruction the target is much more likely to be on the same page of memory. In low memory conditions any attempt to execute an instruction on all but the most recently used pages is likely to be punished by a page fault. While this small example conveys the basic principles, in reality the situation is much more complex. A typical code page contains not 3 but 300 separate code sequences and OS/2 recycles each page on approximately a least recently used basis. In low memory conditions applications and the operating system compete for ownership of these memory pages, the number of pages available to an application varying dynamically based on current demand. As a result, when working set tuned code is executed the reduced memory requirements benefit not just the tuned software but all other executing applications. ═══ 7.5. CPU Instruction Caching ═══ CPU Instruction Caching Level 1 CPU cache memory is a limited and valuable resource. Modern processors require these caches to allow the instruction pipelines to be fed at their maximum rate. The Intel Pentium processor implements an 8Kb on chip instruction cache. Compared to the size of modern applications 8Kb is extremely small and operational efficiency depends on its ability to contain small repeatedly executed code sequences (such as program loops) in their entirety. There are three inefficiencies in the use of CPU instruction caches that LXOPT addresses. Processor Caching Algorithm The unit of storage within a cache is the 'cache line' which on a Pentium processor is 32 bytes long. These cache lines are analogous to the 512 byte block storage units in a disk cache. Ideally an instruction cache would operate using a pure Least Recently Used algorithm to maximise the chances of keeping repetitive instruction sequences within the cache. But caches are high speed devices, often with access times of less than 10 nanoseconds. Time constraints prevent the implementation of optimal algorithms. Early devices used a simple direct mapping approach where some bits from an address were used to directly address a cache line. Tag Line Adr Binary Address: 00110110110001001110001011000010 In the example above 8 bits are used from the address to identify the cache line to use. This gives us 256 possible cache lines or 8Kb of total cache assuming a 32 byte line. A problem occurs where two memory locations need to be cached that have the same bit pattern for the cache line address. As each location is loaded the previous occupier of the cache line is pushed out. For simple assembler loops this rarely occurs, but if the loop contains a call to a function with a clashing address performance suffers dramatically. Hardware designers responded to this problem with the set associative cache. Tag Line Adr Binary Address: 00110110110001001110001011000010 Here the cache line address is reduced to 7 bits giving a total of 128 addresses but now the cache holds 2 cache lines per address. The cache maintains an LRU mechanism within each group of two lines. The total cache size is unchanged but it is now more flexible, reducing the number of clashes which force out useful cache lines. The use of 2 cache lines per address (2 way) set associative cache is the method used by the Pentium processor. The Pentium Pro also uses an 8Kb 2 way set associative instruction cache. The 486 processor uses a 4 way set associative cache with a 16 byte cache line. Set associative caches reduce cache line address clashes but do not eliminate the problem. Modern applications, unlike simple benchmark code, contain deep hierarchies of programming constructs that can significantly increase the risk of clashes. When these occur not only is instruction fetching delayed but the use of the external bus blocks other data memory access. When clashes occur subsequent sequential instruction fetches are also much more likely to push out reusable cache lines. While these valuable cache lines are discarded, old unneeded lines are retained by virtue of their uncontested cache line address. LXOPT organises instructions such that commonly executed code is laid out in near sequential form. This is the most efficient ordering for the instruction cache as it minimises the chances of related code sequences having the same cache line address. The resulting more sequential order of cache line loads allows the cache to perform as if a near pure LRU algorithm had been implemented. Reduced Cache Line Wastage Grouping commonly executed code together has another simple benefit, it moves rarely executed code elsewhere. The Intel x86 instruction set was designed with memory efficiency as a high priority and many of the most commonly executed instructions are contained within a single byte. The average size of an uninterrupted sequence of assembler instructions is only 13 bytes. As a result almost every cache line will span two or more separate instruction sequences. If these sequences are not closely grouped by time of execution then significant portions of the cache are wasted. LXOPT arranges code by dividing code into each uninterrupted sequence and tracing its use. On arrangement these sequences are recombined based on time of execution, exactly the criteria for the least cache memory waste. Target Alignment The third instruction caching benefit is also the result of the sequential grouping of executed code. Most modern compilers align code reached by transfer instructions (such as the start of procedures) on a 16 byte boundary. The principle is simple, once the procedure is reached the most efficient thing to do is minimise the number of cache line reads. Up to 15 bytes of padding must be inserted to ensure 16 byte paragraph alignment. This is clearly a good strategy in normal circumstances but LXOPT processed code is no longer 'normal'. After processing, the instructions immediately preceding the aligned procedure are very likely to already be inside the instruction cache. LXOPT has grouped code by time of use. Even if the code is not already in the cache it is likely to be needed in the immediate future. Padding code out to 16 byte boundaries is now a less attractive proposition. Most instruction padding becomes a waste of valuable CPU cache memory. The benefit of improved locality of reference is enhanced if the result of a transfer instruction has come as a surprise to the processors instruction prefetch mechanism. If the new target is already in the cache the initial instructions may be fed directly to the execution unit, keeping it busy while the instruction prefetch struggles to regain its lead over the execution pipeline(s). Removal of all padding is not necessarily optimal even after LXOPT processing. Performance of the 486 processor can be degraded where an instruction crosses an alignment boundary. LXOPT removes padding between arranged code but allows the developer to choose how many bytes of CPU cache to waste to ensure alignment on a cache line boundary. /cLineSize (default 16) permits the developer to specify the target cache line size and /cLineWaste (default 3) specifies how many bytes of cache line you are willing to waste to ensure alignment. ═══ 7.6. Performance Testing ═══ Performance Testing The working set tuning benefits of LXOPT in percentage terms are reasonably consistent across applications but other factors affect how this translates into performance improvements. LXOPT reduces the overhead of loading code and the memory needed to store it. A 50Kb application that manipulates 10Mb of data will benefit little from working set tuning of code. If you are evaluating LXOPT for use on a large project do not performance test on a 'Hello World' type program. The only way to obtain reliable results is to test with the intended application. The first reaction of a developer after creating a new program arrangement is to test that the application behaves and performs as expected. Next is to reach for the stopwatch and test to see how much of a performance improvement has been achieved. Normal performance testing involves executing the original and processed versions of the application through a pre-set sequence of tests and comparing the execution times. There are several issues that a developer should be aware of when performance testing LXOPT processed code. Although LXOPT performs some minor CPU related optimisations, its primary role is still that of a working set tuner. When OS/2 manages the allocation and recycling of 4Kb pages of memory it is effectively performing the role of a cache controller. In a virtual memory environment your system RAM has become a giant cache memory. LXOPT operates by arranging your code to allow this cache to work as efficiently as possible. When your application executes it results in the loading of code pages from not only the application executable but from other DLLs on which it depends (e.g. PMMERGE.DLL). When an application terminates many of these DLLs code pages remain in memory. Execution of the application has changed the set of code pages that are loaded into memory. This caching of code pages will result in varying execution times for actions which appear to the user to be equivalent. Caching is used not just by the OS/2 loader but by the file system(s) and often disk access hardware. If you are working on a network your file server is caching there too. This caching which is normally of great benefit is also a major obstacle to valid performance testing. As a developer you may have recently created or executed the code to be tested. Either way you will have caused the contents of the executable file to pass through the cache(s) on the way to or from the disk. Program data files may benefit/suffer from similar effects. For valid testing we need an equal playing field with all code starting out on disk. The most simple and effective way to do this is to restart your machine, wait for all disk activity to cease (wait for 3 whole minutes of inactivity) and then time the execution of the original application. Repeat this process for the LXOPT processed version and the runtimes can be reliably compared. Avoid using code or data files on a shared file server for obvious reasons. The program utility TimeRun will assist in measuring the run time of an application. Remember also that OS/2 is a multitasking operating system and other tasks are always executing together with your own. This may become a problem if these other tasks behave differently between test runs. The simple act of moving a mouse pointer over another window or forcing an additional redraw may significantly affect results as code is loaded and executed to handle these actions. The ratio of code size to available system memory is another major influence. Developers tend to have the fastest disks, modern controllers and more system RAM than other types of computer users. Working set tuning really shows its worth when a tuned application is executed in a restricted memory environment. When performance testing try using a machine typical of your users. Using a machine with only 8Mb or less will help to simulate low memory conditions. Alternatively use the program utility Thrash while performance testing your application. This utility will simulate the low memory conditions that can be created by other memory hungry applications. Ultimately memory savings will depend on application size. Pages not used by your application are free for use by all. These free pages are allocated to applications on demand by OS/2. If your application is the major consumer of memory then it will have the greatest reward. Trivial short-lived applications are usually not memory restricted as memory pages allocated to them will rarely be loaded long enough to reach the end of the LRU page queue. The benefit here is a shortened load time and reduced impact on the rest of the system. In an ideal world all code would be working set tuned. Assuming a 50:50 code to data ratio the combined effect would be equivalent to a RAM upgrade of approximately 33% for all OS/2 users. ═══ 7.7. Patch for CSet++ V2.0 and 2.1 Users ═══ Patch for CSet++ V2.0 and 2.1 Users LXOPT requires internal fixups to be retained within executable files. LINK386 will remove these internal fixups if it is invoked with the '/base' option. When ICC.EXE initiates a link of an EXE file it automatically passes this option to LINK386. Although many users can invoke LINK386 directly, ICC requires that applications using C++ templates use ICC to initiate the link. Under these circumstances it is impossible to prevent the setting of a base address. LXOPT contains the program ICCPATCH.EXE which disables this behaviour in ICC.EXE. The patch locates the parameter string within ICC.EXE and removes it, preventing it from being passed to the linker. Usage of ICCPATCH is: [C:\DEVTOOLS\IBMCPP\BIN] ICCPATCH ICC.EXE Make a copy of the file before you patch it, the given file is altered in place. If ICCPATCH reports that it is unable to open the file it is likely that ICC.EXE is still in memory. Type 'ICC /tl-' and reattempt the patch. Providing a base address of 65536 for an EXE file is good practice and should normally always be observed. Based executables will be a little smaller and load a little faster, which is the reason ICC provides a base address by default. After applying the patch be sure to provide the '/base:65536' parameter to the linker via the ICC.EXE /B option for all executables which are not to be processed by LXOPT. DLLs require no such action and if you plan to apply LXOPT only to DLL files you should not apply this patch. If you later apply a CSet++ CSD you may have to reapply this patch. Note that this patch is supplied by Functional Software Ltd and is not supported by IBM. ═══ 8. Reference ═══ Reference ═══ 8.1. Options ═══ Options Options are identified by a preceding / and are not case sensitive. If an option requires a value that value must immediately follow the option name separated by a colon. e.g. /alg:stat ═══ 8.1.1. /alg ═══ /alg Syntax: /alg: Specify the algorithm to be used to generate the new code arrangement. Available algorithms are binary, firstuse, parkonly and stat. Firstuse is a simple (and fast) arrangement algorithm which orders code blocks based on the order in which they are first used. Stat is a more powerful algorithm which identifies the pattern of executed code using statistical methods and uses this to order code blocks. The 'stat' algorithm is capable of producing significantly better results than 'firstuse' but is extremely sensitive to its /groups option. Binary is a powerful arrangement algorithm which uses the recording history to create a binary pattern of usage for each code block. Blocks are then grouped by similarity of the patterns produced. Parkonly performs no arrangement other than the parking of unused code. It is intended for use by developers who wish to retain control over the location of executed code and is not recommended for normal use. Default is to use the binary algorithm. ═══ 8.1.2. /align ═══ /align Syntax: /align: Specify the alignment to be given to both instruction and data pointer targets in the code area. Set to 1 (/align:1) for maximum compression. Alignment of code and data may be specified separately using the /alignCode and /alignData options. The option /align:16 is equivalent to /alignCode:16 /alignData:16. The default alignment is 1 for code pointer targets and 4 for data pointer targets. Use of alignment is not just a performance issue, it may be vital for some types of program, see restrictions for more information. Alignment of code for performance reasons is better addressed via the /cLineSize and /cLineWaste options. See CPU Instruction Caching for a discussion of the issues involved. ═══ 8.1.3. /alignCode ═══ /alignCode Syntax: /alignCode: Specify the alignment to be given to processor instructions in the processed code object that are referenced by a pointer. Default is 1 for maximum compression. Use of alignment is not just a performance issue, it may be vital for some types of program, see restrictions for more information. Alignment of code is also affected by the more general /align option. Data in the processed code object may be aligned using the /alignData option. Alignment of code for performance reasons is better addressed via the /cLineSize and /cLineWaste options. See CPU Instruction Caching for a discussion of the issues involved. ═══ 8.1.4. /alignData ═══ /alignData Syntax: /alignData: Specify the alignment to be given to data in the processed code object. Default alignment is 4. Use of alignment is not just a performance issue, it may be vital for some types of program, see restrictions for more information. Alignment of data is also affected by the more general /align option. Code reached via pointer values in the processed code object may be aligned using the /alignCode option. ═══ 8.1.5. /arrange ═══ /arrange Syntax: /arrange Generate new code arrangement based on results of previous recording session(s). Alternative is /prep. For an explanation of the arrangement process see Creating Optimised Code Arrangement. ═══ 8.1.6. /base ═══ /base Syntax: /base: (default - See below) Specify the new base address for the application file processed. By default EXE files will automatically be based at address 00010000H. By default DLL files will retain any existing base address. If not previously based the DLL is given a default base of 00800000H. ═══ 8.1.7. /buff ═══ /buff Syntax: /buff: (default 1024Kb) Specify the size of the recording buffer in Kb. This is the size of the recording buffer to be used while recording. Code references are stored in the buffer while recording. Once the buffer is full it must be written to disk. Writing is done as part of the application instance/thread being recorded. If your application is timing sensitive adjusting this value may remove timing problems while recording. Valid values are from 4Kb to the maximum allocatable amount of virtual memory. ═══ 8.1.8. /cLineSize ═══ /cLineSize Syntax: /cLineSize: (default 16) Specifies the cache line size (and hence alignment) to be assumed for CPU cache optimisation. Default is 16 bytes. Only code which cannot be reached after executing the preceding instruction in memory is affected by this option. A typical example of such code is the first instruction of a procedure in a high level language. Together with /cLineWaste this option controls the performance related alignment of code within the processed code object. In combination these options allow the developer to influence CPU cache efficiency. See CPU Instruction Caching for a discussion of the issues involved. ═══ 8.1.9. /cLineWaste ═══ /cLineWaste Syntax: /cLineWaste: (default 3) Specifies the number of bytes of a CPU cache line the developer is willing to waste to ensure that code reached only via transfer instructions aligns on a cache line boundary. Default is 3 bytes. Only code which cannot be reached after executing the preceding instruction in memory is affected by this option. A typical example of such code is the first instruction of a procedure in a high level language. Together with /cLineSize this option controls the performance related alignment of code within the processed code object. In combination these options allow the developer to influence CPU cache efficiency. See CPU Instruction Caching for a discussion of the issues involved. ═══ 8.1.10. /compress ═══ /compress Syntax: /compress: (default 50) Compress recording by only recording code references not used within the last 'comp_distance' recorded items. As each use of code is detected the recorder checks to see how recently this code was last recorded. If within 'comp_distance' the use of the code is not recorded. This significantly reduces the size of the recording file saving disk space and the time taken to write the data. Compressed recording has a higher CPU and memory overhead but this overhead is unaffected by the size of 'comp_distance'. Performance of the stat arrangement algorithm is progressively degraded by increasing values of 'comp_distance'. Performance of the other arrangement algorithms is relatively unaffected by compression. The frequency with which recordings are flushed to disk may be altered by the /buff option. ═══ 8.1.11. /disasm ═══ /disasm Syntax: /disasm: (default 0) Perform disassembly beginning at the given offset into the code object. For performance reasons the disassembly does not take advantage of the code identification features of LXOPT but performs a 'blind' disassembly as typically provided by a debugging tool. The default number of bytes to disassemble is 50 but can be altered via the /dislen option. ═══ 8.1.12. /dislen ═══ /dislen Syntax: /dislen: (default 50) Set the number of bytes to disassemble when using the /disasm option. The locations are given and provided as hexadecimal offsets into the processed code object. Offsets within other objects are not affected by LXOPT. Note that translated offsets of transfer instructions may point to a different (possibly zero length) transfer instruction sequence in the original executable file. The corresponding /getNew option provides the reverse of this translation. ═══ 8.1.13. /forceIntFix ═══ /forceIntFix Syntax: /forceIntFix (default OFF) The setting of the internal fixups flag in the executable file header has proven to vary between development tools. LXOPT now ignores the header flag and searches for internal fixups within the fixup section. This option disables this test and forces LXOPT to assume that all required internal fixups are present in the file. This option is provided to cover the rare but theoretically possible situation where a file (most probably a code only DLL) does not require a single internal fixup. Inappropriate use of this option will cause the produced recording version of the application to fail with an access violation. DO NOT USE THIS OPTION UNLESS YOU ARE SURE YOUR EXE/DLL FILE DOES NOT CONTAIN A SINGLE DIRECT MEMORY REFERENCE. ═══ 8.1.14. /getOld ═══ /getOld Syntax: /getOld: Find the original position of the code/data that is located at the given offset into the processed code object in the optimised executable file. The locations are given and provided as hexadecimal offsets into the processed code object. Offsets within other objects are not affected by LXOPT. Note that LXOPT alters, inserts and removes transfer instructions (JMP, JNZ etc) as part of the optimisation process. When using the offsets of transfer instructions the translated offset may point to a different instruction sequence but will represent the same logical position in the path of execution. The corresponding /getNew option provides the reverse of this translation. ═══ 8.1.15. /getNew ═══ /getNew Syntax: /getNew: Find the new position of the code/data that was located at the given offset into the processed code object in the original executable file. The locations are given and provided as offsets into the processed code object. Offsets within other objects are not affected by LXOPT. Note that unused code or data from the original version of the executable may be removed by LXOPT optimisation and so will have no corresponding new offset. Translated offsets of transfer instructions may point to a different transfer instruction sequence or the transfer sequence may have been optimised out entirely. The /getOld option provides the reverse of this translation process. ═══ 8.1.16. /groups ═══ /groups Syntax: /groups: (default 3) Specify the number of distribution groups into which to divide code blocks when using the stat algorithm. The optimum value for this setting is application specific and may require experimentation to achieve the best result. ═══ 8.1.17. /ignoreMsgSeg ═══ /ignoreMsgSeg Syntax: /ignoreMsgSeg WARNING: USE OF THIS OPTION WILL CAUSE PROCESSED CODE TO FAIL ON THE FIRST ATTEMPT TO RETRIEVE DosGetMessage DATA. Force LXOPT to ignore DosGetMessage data within application code. This data bypasses normal fixup referencing and is incompatible with LXOPT processing. To avoid problems with DosGetMessage data refer to the instructions given with the LXO0169 error. Only use the /ignoreMsgSeg option if you are unable to relink the exe/dll. The DosGetMessage data is detected by searching for a (0xFF,"_MSGSEG") byte sequence within the processed code. This option should only be used where the detection is believed to be incorrect (BEWARE - DosGetMessage is often included and used by library code) or if it is known that code logic prevents any calls to DosGetMessage. ═══ 8.1.18. /lxinfo ═══ /lxinfo Syntax: /lxinfo: Specify path name for information gathered from source. The default is to use the application name with a .LXI extension. Use this option to place the file in another location. This option must be consistent across use of the /prep and /arrange options. ═══ 8.1.19. /noreport ═══ /noreport Syntax: /noreport Do not generate comparative report on old and new code arrangements. See the /report option for a description of report generation. ═══ 8.1.20. /orig ═══ /orig Syntax: /orig: Specify path name to which the original application file should be renamed. If this file already exists it is deleted. Default is to use the application name with a .ORI extension. This option must be consistent across use of the /prep and /arrange options. ═══ 8.1.21. /overwrite ═══ /overwrite Syntax: /overwrite Disable test which checks that original EXE/DLL file has not been updated before overwriting it after a new code arrangement has been generated. During arrangement LXOPT will normally check that an application file has not changed since it was prepared. If it has confirmation is sought that the file may be overwritten. This option switches off that test and always overwrites the original file. This option is only relevant when used with the /arrange option. ═══ 8.1.22. /pack2 ═══ /pack2 Syntax: /pack2[:] (default auto) Force the use of executable file page compression as introduced with OS/2 WARP. Each page in the file is compressed and the reduction in size calculated. If the percentage reduction is greater than 15% the compressed form of the page is used. The threshold of 15% may be varied by providing the new threshold to be used as a numeric parameter. FILES USING PAGE LEVEL COMPRESSION WILL NOT EXECUTE ON VERSIONS OF OS/2 PRIOR TO WARP. To preserve the operating system version compatibility of the existing software LXOPT will only use page compression if the file already contains compressed pages or if the /pack2 option is specified. All pages in the file including those containing data and resources are candidates for compression. The use of /pack2:100 will remove all compression from the resulting executable. ═══ 8.1.23. /preload ═══ /preload Syntax: /preload Allow the produced EXE/DLL file to be used with the Preload Utility. This option should be used in combination with the /arrange option. Preloading raises important development and performance issues. See the section on Preloading Code for a discussion of when and how to use files prepared with this option. Use of /preload causes LXOPT to intercept EXE/DLL initialisation and insert a small preloading code stub of approximately 200 bytes. The resulting file may be used normally or preloaded using the Preload Utility. Files with 16 bit initialisation entry points may be not be preloaded. The inserted code stub is activated by the Preload Utility when a request is made to preload the file. ═══ 8.1.24. /prep ═══ /prep Syntax: /prep (default) Prepare the given exe/dll for a recording session. Alternative is /arrange. For an explanation of the preparation process see Creating a Recording Version. ═══ 8.1.25. /recfile ═══ /recfile Syntax: /recfile: Specify path name of recording file. If the file already exists when recording starts the new recording is appended. The path name is used exactly as entered. A relative path name will be used to open/create the recording file if one is specified. For example, /recfile:MYAPP.REC will create the recording file MYAPP.REC in the current directory of the executing program. This can be useful if recording is to be performed on other machines The default is to use the full application path name with a .REC extension. Moving data from the recording buffer to the recording file can form a significant part of the overhead of the recording process. Recording files should ideally be placed on the fastest available local drive. Certain applications and DLLs have more than one instance while running. When multiple instances are detected the recording DLL creates a new recording file for each instance. The name for this file is generated by appending the instance number separated by a dot to the normal recording file name. Thus 3 instances of a DLL during recording might create: MYDLL.REC MYDLL.REC.2 MYDLL.REC.3 If recording onto a FAT based partition you will need to use a /recfile path without an extension to allow production of valid 8.3 filenames. When using the /arrange option only one recfile may be specified. To use multiple files concatenate them into one file with the OS/2 COPY command and specify the result as the recording file (e.g. COPY C:\RECDIR\*.REC* COMBINED.REC). File concatenation also allows multiple recordings to be created at different times or on different machines and then combined before use. ═══ 8.1.26. /reckeep ═══ /reckeep Syntax: /recKeep: Specify path name to which the previously generated recording application should be renamed. If this file already exists it is deleted. When a new arrangement is created LXOPT keeps the recording version of the application to allow further recording if required. Valid only with the /arrange option, this option allows the user to specify the path name to which the recording version should be renamed. Default is to use application path name with a .PRP extension. ═══ 8.1.27. /report ═══ /report Syntax: /report (default) Show comparative page fault report using old and new arrangements. LXOPT can use the recording file to simulate program page fault behaviour under various available memory conditions. This is used to provide a comparison between the old and new code arrangements. Note: LXOPT can only simulate code accesses. If data is stored within the code object page faults caused by references to it will not be simulated. While informative a report may take some time to calculate and may be disabled by the alternative /noreport option. ═══ 8.1.28. /thread ═══ /thread Syntax: /thread: Specifies the how the target application uses threads. This information is used to decide if or how the LXOPT recorder should protect itself from interruption. Users need not use this option unless they experience performance problems while recording. single Target application is single threaded. Recording code does not need to guard against interruption by another thread. This involves the least overhead during recording and should be used whenever possible. multi Target application is multi-threaded but does not terminate/suspend threads asynchronously. Recording code guards against interruption by another thread but assumes that no thread halts execution of another by use of DosKillThread(otherThread) or DosSuspendThread(otherThread). This option carries a slightly greater overhead during recording than the 'single' option. async Target application is multi-threaded and may terminate/suspend threads asynchronously. Recording code ensures that it is not halted inside critical instruction sequences. This involves a higher overhead during recording but does not allow for multiple threads of differing priorities. crit (Default) Target application is multi-threaded, may terminate/suspend threads asynchronously and may vary thread priorities. Recording code ensures that no thread can ever interrupt critical instruction sequences. This involves a higher overhead during recording but is the safest option. Use the option with the least overhead that your application allows. If you are unsure which option is correct for your application use 'crit'. If a thread safety level lower than required is used your application may lock-up, deadlock on semaphores or fail during recording. The /thread option takes effect only with the /prep option. ═══ 8.2. Glossary ═══ Glossary ═══ 8.2.1. Binary Arrangement Algorithm ═══ Binary Arrangement Algorithm An algorithm to calculate a new code layout based on code usage information obtained while recording. The binary arrangement algorithm is the default LXOPT arrangement algorithm. The recording history is used to generate a binary pattern with bits set corresponding to time of use in the executing application. A pattern is created for each code block within the application. Blocks are then grouped together based on similarity of these patterns. For many applications the binary algorithm will produce the best results. Algorithms are chosen by use of the /alg option. ═══ 8.2.2. Firstuse Arrangement Algorithm ═══ Firstuse Arrangement Algorithm An algorithm to calculate a new code layout based on code usage information obtained while recording. The firstuse algorithm arranges code on disk in the order in which it is first executed. This is a simple but effective algorithm which is best used on small applications and those with a simple flow of control. Algorithms are chosen by use of the /alg option. ═══ 8.2.3. Fixup ═══ Fixup When an application is loaded any absolute references to other parts of the application and all external references such as calls to the operating system must be resolved. For this to be achieved information must be retained in the exe/dll file describing the references which need to be 'fixed'. This information, known as 'relocations' or 'fixups', allows code to be located at a position in memory chosen by the operating system. Fixups are automatically applied as each code (or data) page is loaded from the executable file. ═══ 8.2.4. Page Fault ═══ Page Fault All code for normal applications is divided into 4Kb pages. A page fault is generated when an attempt is made to reference an address within a page that is not already in memory (*). When a page fault is generated OS/2 must load the page from disk, apply fixups to the page and return control to the application. All application code is loaded via page faults. When an application is started OS/2 does not immediately read any application code from the disk (**). Code is loaded into memory only as a result of page faults generated as the path of execution strays on to each new page. Execution of the errant thread is suspended until the page is loaded and available for use. It is this effect which often causes the system to appear slow or 'jerky', most noticeably at application start-up or after the use of a large memory hungry application. The problem is compounded in low memory conditions where OS/2 is forced to release the contents of one memory page to provide space for another. If the original page is later referenced it must be reloaded, again taking space from another. Servicing page faults can represent a significant overhead for both your application and the system as a whole. LXOPT optimises your code layout to minimise the number of page faults required to run your application. * Not all page faults result in a disk access. OS/2 will mark pages 'not present' and use resulting page faults to detect page references. This mechanism helps to prevent repeatedly accessed pages being removed from memory. These 'artificial' page faults have only a small CPU impact, the page is found in memory and will not be read from disk. ** OS/2 V2.0 and later ignore EXE/DLL preload pages created by use of the PRELOAD keyword used in module definition files. ═══ 8.2.5. Parked Code ═══ Parked Code Code that is logically reachable but is not normally executed by your application. Such code is detected by LXOPT and placed or 'parked' in your applications executable file on code pages where it causes least overhead. See Creating Optimised Code Arrangement for more details. ═══ 8.2.6. ParkOnly Arrangement Algorithm ═══ ParkOnly Arrangement Algorithm An algorithm to calculate a new code layout based on code usage information obtained while recording. The parkonly arrangement algorithm does not attempt to perform any code arrangement other than parking any unused code. This algorithm is intended to allow developers with special code arrangement needs to retain control over the arrangement of executed code while still gaining the benefit of code parking. This algorithm is not intended for normal use and should only be chosen by developers who have created manual code layouts to match their special requirements. Algorithms are chosen by use of the /alg option. ═══ 8.2.7. Recording File ═══ Recording File A file generated during the recording phase of LXOPT usage. This file is created and expanded as your application runs, recording the paths of execution through your code. See /recfile for more information and how to specify the recording file path. ═══ 8.2.8. Sleeping Code ═══ Sleeping Code Code not normally executed by your application. Such code is typically used to handle unusual situations not encountered by your application during recording. This code may also be logically unreachable. LXOPT can place such code in your applications executable file in a location where it causes the least overhead while remaining accessible. See Creating Optimised Code Arrangement for more details. ═══ 8.2.9. Statistical Arrangement Algorithm ═══ Statistical Arrangement Algorithm An algorithm to calculate a new code layout based on code usage information obtained while recording. The stat algorithm uses statistics to detect the pattern of use of each code sequence throughout the life of the application. Code sequences are grouped by this pattern and then arranged within the group based on the most frequently used ordering. The optimum number of groups into which to split the code is dependant on your application and is specified with the /groups option. Experiment with this option using different group values to find the optimum value for your application. This is typically between 1 and 20. Performance of the stat algorithm is degraded when using high values for the /compress option. The stat algorithm is best suited to large applications or those with a complex flow of control. Experiment with values for the /groups option to achieve the best results. Algorithms are chosen by use of the /alg option. ═══ 8.3. Program Messages ═══ Program Messages ═══ 8.3.1. Internal Errors ═══ Internal Errors The LXOPT software performs numerous internal consistency checks during its operation. If one of these tests should fail an internal error is produced. If you have not read the restrictions section please do so now, it is likely that some code has violated a restriction. Try moving suspect code out of the processed code object (put it in a separate named segment) and reattempt LXOPT processing. ═══ 8.3.2. LXO0100 ═══ LXO0100 Failed to expand program fixup records. LXOPT failed to expand the applications fixup records. Check that the application file is not corrupt (e.g. run exehdr on it). ═══ 8.3.3. LXO0101 ═══ LXO0101 Unable to create dump file. Dump files are used for debugging purposes and are created with the name "dump" in the current directory of the application. ═══ 8.3.4. LXO0102 ═══ LXO0102 Unable to pursue code instruction sequence. LXOPT was unable to pursue a sequence of bytes which it assumed to be code. This may be caused by exporting data from your applications code segment or by an internal error in LXOPT. To rectify export problems move any exported data to a separate object. To rectify code recognition problems move read-only application data out of your code object to a read-only data object. ═══ 8.3.5. LXO0103 ═══ LXO0103 Preparation of new code arrangement failed. An error occurred preparing the structures for the new application. Please ensure your original application file is valid. ═══ 8.3.6. LXO0104 ═══ LXO0104 Failed to create/write LXI file. A disk error occurred creating or writing the .LXI file. Check that sufficient disk space is available and usage of the /lxinfo option. ═══ 8.3.7. LXO0105 ═══ LXO0105 Cannot copy program to "pathname" Failed to copy the original application file. Check that the /orig option parameter. If the file is a DLL check that it is not currently loaded by another application. ═══ 8.3.8. LXO0106 ═══ LXO0106 Cannot create file "pathname" Failed to create the given file. Check the path is valid, there is sufficient free disk space, the file is not in use and the required access rights to the target directory are available. ═══ 8.3.9. LXO0107 ═══ LXO0107 Failed to write new program file. A failure occurred while writing the new executable. Check disk space. ═══ 8.3.10. LXO0108 ═══ LXO0108 Failed to read LXI file "pathname" The LXI file could not be read or is invalid. Check use of the /lxinfo option. ═══ 8.3.11. LXO0109 ═══ LXO0109 Invalid parameter or value for option: "option". The option or its parameter given to LXOPT was not recognised. ═══ 8.3.12. LXO0110 ═══ LXO0110 Failed to open "pathname" Could not open the given file. Check that the pathname is valid, the file exists and you have sufficient access rights to its directory. ═══ 8.3.13. LXO0111 ═══ LXO0111 File "pathname" is not a valid linear executable Could not load the specified file. The file is corrupt or is not an OS/2 2.x/WARP Linear eXecutable file. ═══ 8.3.14. LXO0112 ═══ LXO0112 See Internal Errors ═══ 8.3.15. LXO0113 ═══ LXO0113 Input offset is out of range. The supplied offset does not exist. This message appears if an incorrect parameter is used with the /getOld or /getNew options. Check that the value given is the offset in hex from the start of the processed code object. If this message appears during LXOPT preparation or arrangement this is an Internal Error. ═══ 8.3.16. LXO0114 ═══ LXO0114 See LXO0113 ═══ 8.3.17. LXO0115 ═══ LXO0115 See LXO0113 ═══ 8.3.18. LXO0116 ═══ LXO0116 See Internal Errors Although an internal error, this message has only previously occurred where a user is not following the advice in the Urgent Message section. This is important advice, do not ignore it. ═══ 8.3.19. LXO0117 ═══ LXO0117 See Internal Errors ═══ 8.3.20. LXO0118 ═══ LXO0118 See Internal Errors ═══ 8.3.21. LXO0119 ═══ LXO0119 See Internal Errors ═══ 8.3.22. LXO0120 ═══ LXO0120 Information (.LXI) file was not created by this version of LXOPT. The LXI file you have attempted to use was created by another version of LXOPT. Applications must be prepared and arranged using the same version of LXOPT. Prepare and generate new recording information for the application using the current version of the LXOPT software. ═══ 8.3.23. LXO0121 ═══ LXO0121 Recording (.REC) file does not belong to the processed application. Your recording file was not made by the application being processed. Check that you are attempting to arrange the correct application and that you are not using a recording file intended for an earlier version or a different application. See also the /recfile option. ═══ 8.3.24. LXO0122 ═══ LXO0122 .REC file is not valid. The given recording file has been corrupted or is not a recording file. Check use of the /recfile option parameter. ═══ 8.3.25. LXO0123 ═══ LXO0123 Unrecognised arrangement algorithm "alg-name" The given arrangement algorithm name is not recognised. See the /alg option for more details. ═══ 8.3.26. LXO0124 ═══ LXO0124 See Internal Errors ═══ 8.3.27. LXO0125 ═══ LXO0125 See Internal Errors ═══ 8.3.28. LXO0126 ═══ LXO0126 See Internal Errors ═══ 8.3.29. LXO0127 ═══ LXO0127 See Internal Errors ═══ 8.3.30. LXO0128 ═══ LXO0128 See Internal Errors ═══ 8.3.31. LXO0129 ═══ LXO0129 See Internal Errors ═══ 8.3.32. LXO0130 ═══ LXO0130 See Internal Errors ═══ 8.3.33. LXO0131 ═══ LXO0131 See Internal Errors ═══ 8.3.34. LXO0132 ═══ LXO0132 See Internal Errors ═══ 8.3.35. LXO0133 ═══ LXO0133 See Internal Errors ═══ 8.3.36. LXO0134 ═══ LXO0134 See Internal Errors ═══ 8.3.37. LXO0135 ═══ LXO0135 See Internal Errors ═══ 8.3.38. LXO0136 ═══ LXO0136 See Internal Errors ═══ 8.3.39. LXO0137 ═══ LXO0137 See Internal Errors ═══ 8.3.40. LXO0138 ═══ LXO0138 Code object exports a 286 call gate entry point. LXOPT has detected an exported 16 bit call gate entry which refers to the 32 bit code object which is being processed. It is likely that when control is received via this entry point that some 16 bit selectors will be in effect. LXOPT is designed to be applied to 32 bit code. You cannot export 16 bit call gate entry points from the processed 32 bit code object. ═══ 8.3.41. LXO0139 ═══ LXO0139 See Internal Errors ═══ 8.3.42. LXO0140 ═══ LXO0140 See Internal Errors ═══ 8.3.43. LXO0141 ═══ LXO0141 See Internal Errors ═══ 8.3.44. LXO0142 ═══ LXO0142 See Internal Errors ═══ 8.3.45. LXO0143 ═══ LXO0143 See Internal Errors ═══ 8.3.46. LXO0144 ═══ LXO0144 See Internal Errors ═══ 8.3.47. LXO0145 ═══ LXO0145 See Internal Errors ═══ 8.3.48. LXO0146 ═══ LXO0146 Out of memory. LXOPT ran out of memory while processing the application. Up to approximately 15 times the size of the application code size will need to be allocated for processing to succeed. Check that there is sufficient free disk space on your swap partition. ═══ 8.3.49. LXO0147 ═══ LXO0147 Failed to follow code pointer. See LXO0102 ═══ 8.3.50. LXO0148 ═══ LXO0148 No objects in program module. The processed application does not contain any code. ═══ 8.3.51. LXO0149 ═══ LXO0149 No pages in program module. The processed application does not contain any code. ═══ 8.3.52. LXO0150 ═══ LXO0150 Unused area "start address" to "end address" contains fixups. The bytes between the start and end offsets have been detected as unused. However they represent either valid code or initialised data. This warning will appear for almost all applications processed and demonstrates LXOPT's effectiveness at finding unused code or data. Often such code is contained in libraries to which your application is linked. Offsets are relative to the start of the processed code object. Unused code or data is not referenced by any part of the application and is removed by LXOPT If your application contains assembler code that you suspect might be violating LXOPT restrictions you can use the address range and your map file to identify the source code. ═══ 8.3.53. LXO0151 ═══ LXO0151 Code object "object-number" is writeable. The given code object is writeable. Although LXOPT can still process the application writeable code objects can indicate that programming techniques that violate LXOPT restrictions may be in use. This is particularly true if the object named is the one being processed. ═══ 8.3.54. LXO0152 ═══ LXO0152 See Internal Errors ═══ 8.3.55. LXO0153 ═══ LXO0153 See Internal Errors ═══ 8.3.56. LXO0154 ═══ LXO0154 Base not multiple of 64Kb. The specified new program base is not a multiple of 64Kb. This warning normally appears when the new base has been incorrectly specified. Base values are entered as an address in hex. See the /base option for correct usage. ═══ 8.3.57. LXO0155 ═══ LXO0155 Module is not standard EXE or DLL. The application file is of a type not processed by LXOPT. Typically this message is produced by an attempt to apply LXOPT to a physical or virtual device driver. ═══ 8.3.58. LXO0156 ═══ LXO0156 See Internal Errors ═══ 8.3.59. LXO0157 ═══ LXO0157 Internal fixups have been removed. The EXE file has been given a base address. Applying a base address when linking strips internal fixups from EXE files. LXOPT needs internal fixups to enable it to correctly process your application. See Preparing Applications for more information. ═══ 8.3.60. LXO0158 ═══ LXO0158 File contains an unknown page type. The processed file contains code/data encoded in a manner unknown to LXOPT or OS/2 V2.1. This is probably due to the use of a new linker or other development tool. To rectify the problem choose options with the new linker/tool that will allow the resulting application to run under OS/2 V2.x or WARP. Check also that you are using the latest available version of LXOPT. ═══ 8.3.61. LXO0159 ═══ LXO0159 Failed to decompress EXEPACK2 page, file corrupted?. The processed file contains code or data identified as compressed using the technique introduced by OS/2 V3.0 (Warp). A compressed page did not expand correctly. It is likely that the file is corrupted. Please regenerate any files created via the resource compiler -X2 option and relink the application. ═══ 8.3.62. LXO0160 ═══ LXO0160 Information (.LXI) file does not belong to the processed application. Your information file was not made by the application being processed. Check that you are attempting to arrange the correct application and that you are not using an information file intended for an earlier version or a different application. See also the /lxinfo option. ═══ 8.3.63. LXO0161 ═══ LXO0161 The preload option or LXOPT demo version cannot process files with 16 bit library initialisation code. The /preload option needs to alter library initialisation to ensure the DLL is correctly loaded. Your DLL contains a 16 bit entry point and LXOPT is unable to process the initialisation sequence. To use the /preload option with this DLL you must move your library initialisation to 32 bit code. Alternatively do not use /preload with this DLL. A 16 bit main entry point indicates that the restrictions section will need to be read with particular notice to the section on SS:ESP. ═══ 8.3.64. LXO0162 ═══ LXO0162 No recorder dll name available. LXOPT has failed to generate a unique name for your application specific recording DLL. It is likely that over a period of time a large number of unused recording DLLs has built up in your development directory. Delete all old versions and reattempt preparation. ═══ 8.3.65. LXO0163 ═══ LXO0163 Cannot locate LXOPT recorder data. Check your PATH and installation. LXOPT failed to find its installation directory where vital data is stored. Please ensure that the LXOPT installation directory is on your PATH and that installation completed successfully. ═══ 8.3.66. LXO0164 ═══ LXO0164 Disk access failure creating recorder DLL. Creation of a recorder DLL failed due to a file access failure. Check that you have enough available disk space and that you have sufficient access rights if the target directory is on a network. Check also that an existing recording DLL is not in use. ═══ 8.3.67. LXO0165 ═══ LXO0165 See Internal Errors ═══ 8.3.68. LXO0166 ═══ LXO0166 High number of layout attempts: , continuing... LXOPT is having difficulty recreating your EXE/DLL. Some large applications make take a significant number of attempts before a successful layout can be achieved. LXOPT should always eventually succeed. ═══ 8.3.69. LXO0167 ═══ LXO0167 Cannot copy prepared recording program to "pathname" Failed to copy the prepared application file. Check disk space or file/directory access permissions. ═══ 8.3.70. LXO0168 ═══ LXO0168 See Internal Errors ═══ 8.3.71. LXO0169 ═══ LXO0169 Active DosGetMessage data (MSGSEG32) detected in code area. Some message data used by DosGetMessage is contained within the code object processed by LXOPT. This message data is retrieved during execution of DosGetMessage code using the address of the containing code object and adding a predetermined offset to it. This mechanism bypasses normal code/data referencing and is incompatible with LXOPT processing. This data is often introduced by use of functions within the IBM CSet/VAC libraries which in turn depend on DosGetMessage for their own message handling. To prevent this problem move the message data to another code object by identifying the 'segment' in your module definition (.DEF) file. If the processed file was an executable which does not currently have a .DEF file, create a file .DEF and insert the lines below. Remember to include your definition file when you relink. NAME SEGMENTS _MSGSEG32 CLASS 'CODE' For existing .DEF files simply add the line containing _MSGSEG32 to the SEGMENTS section. MSGSEG32 data is detected by searching for data which starts with a 0xFF byte followed by the text MSGSEG32. In the unlikely event that your code contains this data sequence for another purpose or you are SURE that it is not used this LXOPT test may be disabled, see the /ignoreMsgSeg option for more details. ═══ 9. Utilities ═══ Utilities ═══ 9.1. LXWarp - Apply OS/2 WARP compression to 2.x executables ═══ LXWarp - Apply OS/2 WARP compression to 2.x executables Usage: LXWARP [/clear] [/bakfile:] [/threshold:] Examples: To compress 'myapp.dll' and delete the backup file on successful completion. LXWARP myapp.dll /clear Compress 'myapp.dll' pages where compression reduces the page size by more than 20%. Original file stored as 'myapp.wbk'. LXWARP myapp.dll /threshold:20 LXWARP takes an existing executable and applies OS/2 WARP page compression to it. The EXE/DLL file created will occupy less disk space and take less time to be loaded by the operating system. This new file will not execute under versions of OS/2 prior to WARP. LXWARP allows OS/2 WARP users of 2.x targeted code to gain the benefits of page level compression. Developers may also use LXWARP to process third party 2.x targeted DLLs for use with their OS/2 WARP specific applications. ═══ 9.1.1. /clear ═══ /clear Syntax: /clear (default OFF) Delete the copy of the original executable file when LXWARP/LXUNWARP processing is successfully completed. The LXWARP and LXUNWARP applications both keep backup copies of the original file during processing. If an error occurs the original file is automatically restored. If the new executable is successfully created use of this option will cause the backup file to be deleted. ═══ 9.1.2. /bakfile ═══ /bakfile Syntax: /bakfile: (default see below) Specify the name of the backup file in which to keep the unprocessed executable. Both LXWARP and LXWARP always create a copy of the original file before processing. By default this backup file is created with the same root name and in the same directory as the original file. By default LXWARP backup files have the extension 'WBK', LXUNWARP backup files have the extension 'UBK'. ═══ 9.1.3. /threshold ═══ /threshold Syntax: /threshold: (default 15) Specify the reduction in page size required for compression to be used. Each page in the file is compressed and the reduction in size compared with the original page size. If the percentage reduction is greater than the value supplied by this parameter the compressed form of the page is used. ═══ 9.2. LXUnWarp - Remove OS/2 WARP compression from executables ═══ LXUnWarp - Remove OS/2 WARP compression from executables Usage: LXUNWARP [/clear] [/bakfile:] Example: LXUNWARP myapp.dll LXUNWARP takes an existing executable file and expands OS/2 WARP compressed pages. Although the resulting file will be loadable by OS/2 V2.x the application may still not execute due to OS/2 WARP specific API dependencies. Expanded pages are tested for normal iterated data encoding which is performed as required. LXUNWARP allows OS/2 2.x users to run code linked specifically for OS/2 WARP installations. It can also be used to undo prior use of the LXWARP command. Use of LXUNWARP on an LXWARPed file may not result in an exact copy of the original file due to page alignment issues and choice of iteration strategy. ═══ 9.3. Preload - Transfer executables to swapfile ═══ Preload - Transfer executables to swapfile Usage: PRELOAD [options] [] [options] /Q .- Quiet mode (suppress copyright notice) /S - Silent mode (suppress all output) /G: - Group name (default is exe/dll file name) /I - Make Preload Manager invisible (remove from task list) /V - Make Preload Manager visible (add to task list) /U - Unload file/group, unload ALL if none specified /L - List previous active load instructions /W: - Wait for drive letters to become available. e.g. /W:FGH /X - Unload all and terminate the preload manager /T: - Exe load time-out in seconds /M: - Deny request if below Mb free swap space (default 10) /? or /h - Display options The preload utility allows users to selectively preload and unload LXOPT produced EXE/DLL files arranged with the /preload option. There are important DLL initialisation, disk space requirements, boot time and user control issues raised by preloading code, see the Preloading section for more details. The utility (PRELOAD.EXE) and its accompanying on-line documentation (PRELOAD.INF) are included with the LXOPT software package and these two files may also be distributed with LXOPT processed files. The program operates in three modes, as a command line utility as described above, in background Wait Mode awaiting network drive availability and as a continuously active Preload Manager. The preload utility automatically activates a manager process if one is not currently running. All utility requests are passed on to the manager to be performed. Preload Manager The preload manager performs commands supplied via the preload utility. Started by the first use of the preload utility the manager runs continuously in the background. When made visible by the /V option the preload process appears to the user as a normal windowable VIO application. If made invisible via the /I option the process is hidden, removed from the PM task list and prevented from rejecting an operating system shutdown. When operating invisibly the process may only be interacted with via the preload utility. Termination of the manager will cause all preloaded programs to be unloaded as the use count of each module drops to zero. The manager may be terminated by closing the visible manager session or via the preload utility with the /X option. Although the preload manager uses the Presentation Manager API it does not require PM to be present. If running with another shell (e.g. TSHELL) interaction with the task list is not attempted and sessions are started using parameters compatible with TSHELL operation. Wait Mode Permanently invisible, a wait mode process is started to handle a preload request where the /W parameter is used and the drives specified are not currently available. The wait mode process performs the actions normally performed by the preload utility, it simply waits for the specified drives to become available before issuing the load request to the preload manager. Wait Mode is vital for PRELOAD to be able to operate effectively in a customer network environment. Users/installation programs are able to insert preload requests into startup.cmd without concern for network availability. Preload Utility The preload utility provides a simple command line interface to the preload manager. The utility has options to suppress output (/Q /S), control manager visibility (/V /I), list current active loads (/L) as well as provide the basic file loading interface. DLL files are loaded into the Preload Manager process, each executable is provided with its own session. Execution of normal initialisation is suppressed in LXOPT processed EXE files and per instance initialised DLLs. This is designed to ensure preloading is both transparent and does not adversely affect normally loaded versions of the code. LXOPT processed globally initialised DLLs and all unprocessed DLLs will initialise normally. For a fuller discussion of initialisation and the issues raised see Preloading Files are loaded into named groups. By default the group name is the filename derived from the pathname specified on the command line. For example PRELOAD C:\APPS\FAST_APP.EXE will preload the LXOPT processed executable file FAST_APP.EXE, LXOPT processed DLLs on which it depends and load normally any other DLLs which it requires. These files are treated as a single loadable unit under the group name FAST_APP.EXE. Group names become useful when a user wishes to unload all the preloaded files associated with a single product. For example PRELOAD C:\APPS\FAST_APP.EXE /G:FAST_SUITE PRELOAD C:\APPS\FAST_SRV.EXE /G:FAST_SUITE PRELOAD C:\APPS\FAST_WPS.DLL /G:FAST_SUITE preloads the 'FAST' application, background server and workplace shell object DLL under the single name 'FAST_SUITE'. To release the preload on all these files requires one command PRELOAD /G:FAST_SUITE /U The preloader uses the operating system calls DosStartSession and DosLoadModule to load files and then transfers them to the swapfile. This can be a time consuming process and by default the preloader will wait for up to 120 seconds for a load request to complete. This time limit may be adjusted using the /T option. Preload requests where the amount of free swap space is less than 10Mb will be rejected. This limit may be varied by use of the /M option. /M:0 will disable free swap space testing but is not recommended, particularly for preload requests in startup.cmd. Preload requests may be issued if some or all of the requested files are already active. The preload operation is transparent to any existing executing code. When complete all instances of preloaded code, including those previously running, will benefit from the preloading effect. ═══ 9.4. TimeRun - Measure application run-time ═══ TimeRun - Measure application run-time Usage: TIMERUN [parameters] Example: TIMERUN touch data.txt Execute touch.exe with the parameter 'data.txt' TIMERUN executes the given application and displays the total execution time on termination. This is the total time elapsed from start-up to termination. This is not a measure of CPU time. Before executing the application TIMERUN waits 5 seconds to ensure disk activity due to delayed writes has completed. Repeating the same command often results in a reduced execution time due to caching effects. Subsequent execution times may then vary due to the activity of background processes. This can make reliable comparative performance testing very difficult, see Performance Testing for more details. TIMERUN uses CMD.EXE to execute the given command. Total time elapsed may be dominated by the time taken to invoke the command processor where total run time is less than 10 seconds. ═══ 9.5. Thrash - Create high memory load for performance testing ═══ Thrash - Create high memory load for performance testing Usage: THRASH Examples: THRASH 4 Allocate and continuously access 4Mb of memory. THRASH 50% Allocate an amount of memory equivalent to 50% of system RAM and continuously access it. THRASH 75P Allocate an amount of memory equivalent to 75% of system RAM and continuously access it. THRASH allocates the given amount of memory and continuously accesses it. Order of access is updated dynamically to maximise the amount of the allocation retained in system RAM. THRASH provides a simple way to produce a constant high memory load as might be generated by several large background applications. This can be very useful for testing software performance under low memory conditions. The alternative of 'P' for the '%' trailing character is provided to simplify the passing of parameters via a WPS program object. Use THRASH with caution. Thrashing large quantities of memory may cause your system to run unacceptably slowly. Any keyboard input to THRASH will cause it to release the allocated memory and terminate. ═══ 9.6. UnLXOPT - Undo processing and delete LXOPTed files ═══ UnLXOPT - Undo processing and delete LXOPTed files Usage: UNLXOPT Example: UNLXOPT MYAPP.DLL Restore original MYAPP.DLL and delete LXOPT created files UNLXOPT is a simple tool that restores the original version of the LXOPTed file and then deletes any .REC, .PRP, .ORI and automatically generated recording DLL. The file names to be deleted are generated by using the default names that would be used by LXOPT when processing the supplied application file. ═══ 10. FAQs ═══ FAQs ═══ 10.1. How Do I Trace CS:EIP Back to My Source Code? ═══ How Do I Trace CS:EIP Back to My Source Code? This is in fact two questions: how to find the original CS:EIP of a location in an LXOPTed file and then how to trace that back to the source code. CS:EIP values are typically provided as : pairs. LXOPT only alters the offsets of code within the largest code object so if the object number is not the same as the one LXOPT processed that offset will not have been altered by LXOPT processing. If the object number is the one processed by LXOPT use the LXOPT /getOld option to find the original offset value. Once you have obtained the original offset value you can then search your map file to find the containing function or examine the code in a debugger to locate the instruction that caused the error. ═══ 10.2. Isn't /BASE:65536 Better With EXE Files? ═══ Isn't /BASE:65536 Better With EXE Files? The simple answer is yes. All EXE files produced by LXOPT are automatically based at address 65536 and the internal fixups removed. LXOPT only requires input EXE files to avoid this option to ensure the file contains all the information required for correct processing. This has no adverse affect on the resulting LXOPT processed executable. ═══ 10.3. Which is the Best Arrangement Algorithm? ═══ Which is the Best Arrangement Algorithm? There is no single algorithm that out performs all others for all situations/executables. The binary algorithm introduced with V1.1 is now the default algorithm and is most likely to produce the best overall results. Choice of algorithm is best made by experimentation. There may not always be a clear winner, sometimes the best algorithm for a 100Kb memory restriction will be beaten by another at 200Kb. Due to the 'Least Recently Used' algorithm used to recycle pages of memory, applications with a short runtime are unlikely to ever be memory restricted unless they require large quantities of system RAM. Such applications have little need for complex arrangements and may be better suited to the firstuse algorithm. This algorithm produces the most sequential code which may result in more efficient page loads and cpu caching. The Stat arrangement algorithm may be used on larger applications with long complex runtimes. Use of Stat is complicated by its /groups option and will often be out performed by the binary algorithm. ═══ 10.4. Where is my Recording File? ═══ Where is my Recording File? First be sure you have executed the recording version of the code. If you invoke your code via a program object it may now reference the .ORI file. If a previous version of an EXE/DLL is preloaded this old version will still be executed even though a new file exists. Recording files are created by an LXOPT prepared application while it executes. By default this file has the full path name of the executable provided to LXOPT but with a .REC extension. If the executable file name provided via /recfile does not have a full path (e.g. .\MYAPP.REC) then when the recording version of the application executes it will also attempt to create a file without a full path. As no full path is used this file will be created based on the current drive/directory at time of creation. This can be useful where recording is performed on a separate machine and facilitates simultaneous use of the recording version with different machines on the same network. This can however cause problems where the current drive/directory is not predictable and in these circumstances specify a full pathname for the recording file by use of the /recfile option. ═══ 10.5. LXOPT Just Hangs During Processing! ═══ LXOPT Just Hangs During Processing! Although LXOPT may appear to be a simple tuning utility it requires a considerable amount of CPU time and memory to process large input files. Preparation/arrangement times of over 5 hours have been reported. Run times can be particularly long where a large file is processed on a machine with 8Mb or less of RAM. During processing LXOPT is effectively recompiling and relinking your entire application in addition to performing its working set tuning function. If you suspect LXOPT has hung please allow it to run uninterrupted overnight before reporting the problem. ═══ 10.6. Why Isn't my Application 50% Faster? ═══ Why Isn't my Application 50% Faster? LXOPT rearranges application code to reduce the total memory occupied by code and the time taken to load it. The percentages reported by the arrangement process are expected reductions in page faults, not overall run time. It follows that any performance improvement in this area is dictated by how much time the original application spent loading its code. If your application is a 30Kb executable searching for prime numbers then performance improvements are likely to be restricted to CPU caching benefits. These CPU caching improvements are unlikely to produce more than a 5% performance boost. Working set tuning, like any efficiency measure, shows its worth when the commodity in question is in short supply. Examination of an arrangement report shows that page fault reductions improve under increasingly restricted memory conditions. The improvement is not just in percentage terms, but much more significantly in total number of faults. The greater the page fault load, the better LXOPT performs. ═══ 10.7. Why Does my Page Fault Monitor Report More Page Faults? ═══ Why Does my Page Fault Monitor Report More Page Faults? When LXOPT produces an arrangement report it shows the expected number of page faults loading code from the processed code object. LXOPT only tunes the code arrangement within this processed object. LXOPT does not tune data accesses or attempt to simulate page faults generated by any other means. Your application will generate page faults loading application data and executing code in other DLLs such as Presentation Manager APIs. Performance monitoring tools tend to collect and report all such page fault data in combined form. Although the number of page faults reported by such tools will differ from that provided by LXOPT, they can still be used to assess the improvement in an LXOPT processed applications performance. Use your tool on the original and processed versions of the application and examine the reduction in total page faults. If your test mimics the actions performed during LXOPT recording this reduction should agree the numeric reduction in page faults predicted by LXOPT. Page fault monitoring tools will not reveal CPU caching or reduced disk seek time benefits. ═══ 10.8. Why Does my Application Make a Beeping Noise? ═══ Why Does my Application Make a Beeping Noise? Applications produced by LXOPT produce an alternating tone when attempting to display an LXOPT message to the user. If a message does not appear it is likely that the LXOPT installation directory is not on your PATH. Add the LXOPT directory to the path and retry the application. Tones generated by a recording version of an application are an indication that an error has occurred. If an error message is not displayed examine the file 'LXREC.ERR' to identify the source of the error. This is typically a failure writing to the recording file due to an invalid file name or insufficient disk space. LXOPT initiated tones or messages are not generated by optimised EXE/DLL files. These files will perform identically to their unoptimised originals and do not require access to any LXOPT support files. ═══ 10.9. My Application Fails - Cannot Find @1? ═══ My Application Fails - Cannot Find @1? The system could not execute the recording version of your application because it could not find the file @1.DLL. This DLL was created for your application by LXOPT to aid in the recording process. This file is located in the directory in which the original EXE/DLL file was processed. Copy the DLL to a directory on your LIBPATH.