OS/2 Help File | 1997-07-01 | 117KB | 3,518 lines
═══ 1. Copyright and License Agreements ═══
═══ 1.1. Copyright Notices ═══
LXOPT and the LXOPT logo are trademarks of Functional Software Limited.
OS/2, C Set ++ and VisualAge C++ are trademarks of International Business
Machines Corporation.
The LXOPT software and accompanying documentation may be distributed and used
free of charge but are copyright (C) 1994-1997 Functional Software Limited. All
rights reserved.
Functional Software may be contacted via the internet at
funcsoft@cix.compulink.co.uk
═══ 1.2. LXOPT License Agreement ═══
Definition of terms used in this agreement
LXOPT: The LXOPT software, utility programs and accompanying documentation.
USER: You, the purchaser of the LXOPT software.
FSL: Us, Functional Software Limited.
OUTPUT: Executable computer code, or derivative thereof, created or altered by
the LXOPT software.
Acceptance
Use of LXOPT indicates acceptance by USER of the terms and conditions of this
agreement.
If you do not agree to these terms and conditions you may not make use of LXOPT
and must destroy any and all installed copies of the software.
Grant of License
The LXOPT software is copyright of Functional Software Limited. FSL retains
ownership of LXOPT. You are hereby granted a nonexclusive license to use LXOPT
subject to the permitted uses and restrictions contained in this agreement.
Permitted Uses
LXOPT may be applied by USER to software owned by USER.
LXOPT may be applied by USER to software licensed by USER where such actions
are consistent with that license.
Copies of LXOPT may be freely distributed provided that each copy is complete
and unaltered. No fee may be charged for distribution other than a nominal
distribution charge.
USER may distribute unlimited copies of OUTPUT.
USER may distribute unlimited copies of the files PRELOAD.EXE and PRELOAD.INF.
Restrictions
USER may not alter LXOPT unless such alteration is approved by FSL. Such
prohibited alterations include, but are not limited to, the operation and
appearance of the software and the text of the documentation which shall
include this agreement and accompanying copyright notices.
USER may not reverse compile, reverse engineer or reverse assemble any part of
the LXOPT software. For the purposes of this agreement use of LXOPT as
described in the accompanying documentation shall not constitute reverse
engineering of the software.
All rights not expressly granted by this agreement are retained by FSL.
Limited Warranty
LXOPT does not work with all valid OS/2 executables. Programming constructs
exist that defeat the mechanisms used within LXOPT. It is the responsibility
of USER to test the suitability of LXOPT for USER's applications. In view of
this USER assumes all liability and responsibility for the decision to use
LXOPT and all OUTPUT produced including any consequences thereof.
LXOPT is supplied "AS IS", without warranty of any kind, either expressed or
implied, statutory or otherwise, including but not limited to the implied
warranties of merchantability or fitness for a particular purpose that may be
made by FSL or its software suppliers on this product. No oral or written
information or advice given by FSL, its software suppliers, dealers,
distributors, agents or employees shall create a warranty and you cannot rely
on the correctness of any such information or advice.
Neither FSL nor its software suppliers, dealers, distributors, agents or
employees shall be liable for any direct, indirect, consequential or incidental
damages, including but not limited to damages for loss of business profits, business
interruption or loss of business information, arising out of the use or
inability to use the software or accompanying documentation, whether or not FSL
has been advised of the possibility of such damages.
Under no circumstances will FSL's liability exceed the purchase price of the
software.
Governing Law
This license is governed by the laws of England and USER agrees to submit to
the jurisdiction of the English courts.
Where the local laws of USER prohibit the jurisdiction of English law this
license will be governed by the laws of the country in which LXOPT is used.
FSL may, at its own discretion, elect to enforce, apply and interpret the terms
of this agreement under any applicable foreign jurisdiction.
If any provision of this agreement is unenforceable, all others shall remain in
effect. Furthermore, any such unenforceable provision shall remain and be
interpreted in its strictest sense which remains consistent with governing law.
═══ 2. URGENT MESSAGE ═══
This message applies only to OS/2 Warp 3.0 at FixPak level 10 or lower. Users
of OS/2 V2.x, and Warp 3.0 users who have applied FixPak 11 or later, may ignore
this message. All affected users are strongly advised to upgrade their OS/2
installation.
There is an obscure software fault (PJ18014) in the virtual memory manager of
OS/2 Warp V3.0 where FixPak 11 or later has not been applied. This fault can
result in the unpredictable corruption of a page of memory when an application
allocates memory in excess of the computer's physical RAM and accesses or alters
it in a particular order, forcing swapfile growth.
Normal applications have little need to concern themselves with this problem.
Unfortunately LXOPT reliably reproduces the conditions for failure when
processing a large executable file. For these purposes a file of 1Mb is
considered large, although the precise limit is unknown and will vary with
available RAM.
The most common manifestation of the fault is the failure of an internal
consistency check, resulting in an LXO0116 internal error. Given the nature of
the fault, errors can appear at any stage. Some users have experienced the
generation of corrupt executable files where a page containing application code
was corrupted before being written to disk. Other users have experienced
protection violations or unexplained premature termination of the software.
While all software faults are of concern, the unpredictability and potentially
undetectable nature of this fault make it particularly dangerous.
A workaround involving the removal of the need for swapfile growth has fixed
all known manifestations of this problem. This is achieved by pre-setting the
initial size of the swapfile in config.sys to a value that ensures that
execution of LXOPT will not require swapfile growth.
To calculate the required figure, multiply the size of the largest EXE/DLL to be
processed by 15 and add the result to the 'normal' swapfile size as given by a
directory listing on your machine. Running other applications simultaneously
with LXOPT will affect swapfile requirements, and the initial swapfile size
should be set accordingly. Adding a further 5 or 10Mb to the resulting figure
would be a wise precaution.
The line below sets the initial swapfile size to 50Mb (51200Kb) with a 2Mb
(2048Kb) minimum free disk space limit.
    SWAPPATH=D:\ 2048 51200
Regardless of file size, if you experience symptoms matching the description
above you should attempt this workaround.
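The sizing rule above is simple arithmetic, and the sketch below works through it in C. The function names initial_swap_kb and format_swappath are hypothetical and are not part of LXOPT or OS/2. All sizes are assumed to be in Kb, matching the values in the SWAPPATH line shown above.

```c
#include <assert.h>
#include <stdio.h>

/* Multiplier from the rule above: 15 x the largest EXE/DLL to be processed. */
#define LXOPT_SIZE_FACTOR 15UL

/* Suggested initial swapfile size in Kb (hypothetical helper).
 *   normal_kb  - 'normal' swapfile size taken from a directory listing
 *   largest_kb - size of the largest EXE/DLL to be processed
 *   margin_kb  - extra headroom; the text suggests 5 or 10Mb */
unsigned long initial_swap_kb(unsigned long normal_kb,
                              unsigned long largest_kb,
                              unsigned long margin_kb)
{
    return normal_kb + LXOPT_SIZE_FACTOR * largest_kb + margin_kb;
}

/* Writes a SWAPPATH line (path, minimum free Kb, initial size Kb) into buf.
 * Returns the value of snprintf: the length of the formatted line. */
int format_swappath(char *buf, size_t len, const char *path,
                    unsigned long min_free_kb, unsigned long initial_kb)
{
    assert(buf != NULL && path != NULL);
    return snprintf(buf, len, "SWAPPATH=%s %lu %lu",
                    path, min_free_kb, initial_kb);
}
```

As an illustration with made-up inputs: a 2Mb executable (2048Kb), a normal swapfile of 10Mb (10240Kb) and a 10Mb margin give 10240 + 15 x 2048 + 10240 = 51200Kb, which is the 50Mb figure used in the example line.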
═══ 3. This Software is FREE! ═══
LXOPT is now freeware! LXOPT was previously a commercial OS/2 development tool
but was withdrawn from sale at the end of January 1997. Rather than allow the
software to disappear this final version is now distributed as unsupported
freeware.
Unfortunately this also means that development for the OS/2 platform has
ceased. Technical support for this freeware version is not available.
═══ 4. V1.22 - What's Changed? ═══
Version 1.22
Change to Freeware
LXOPT is now distributed as unsupported freeware.
Version 1.21
Demo Version Changes
The LXOPT Demonstrator has been upgraded to include all the LXOPT utility
programs and will now produce unrestricted applications when processing
code of less than 256Kb in size.
TLINK Problems
In some circumstances the Borland linker (TLINK) can generate invalid
values in the fixup page table within executables. LXOPT now detects and
corrects illegal values.
Software Faults
Fault corrected which could cause files arranged with /preload to reject
preload requests.
Code recognition updated to prevent reported rejections of valid code
sequences.
Code generation algorithm updated to prevent potentially endless
application of transfer optimisation.
Version 1.2
Code Preloader
The new /preload option allows application code to be transferred to the
swapfile at boot time. A freely distributable preload utility works in
combination with LXOPT processed code allowing users to selectively
preload code. Executed code is loaded direct from the swapfile with no
need to apply fixups or for network traffic. Preloads of network files
may be automatically deferred until specific network drive(s) are
available allowing preload requests (e.g. in startup.cmd) to be made prior
to restoration of network connections.
Dead Code Elimination
Code recognition algorithms have been revised to improve the detection and
elimination of unused code/data in the processed code object. Code
previously included due to a cyclic reference or unused pointer reference
(typically from unused CASE statement jump tables) is now removed.
Recording Efficiency
Code path detection will now predict more code paths removing the need to
record their use in the recording file. Untraced sequences will execute
faster and help reduce recording file sizes.
Fixup Encodings
Generation of new fixup tables has been improved to allow the grouping of
fixups with common targets. Although previous versions of LXOPT performed
this optimisation, some combinations were omitted. This is particularly
important for DLLs which typically contain a far larger quantity of fixups
in relation to code size.
Undo Utility
LXOPT now includes an undo utility. Applied to processed DLL/EXE files
this will restore the original unprocessed version of a file and delete
any related files generated by LXOPT. This utility is installed in the
WPS as part of the installation process. See UNLXOPT for more details.
DosGetMessage
Code inserted from IBM libraries to handle DosGetMessage API calls
violates an LXOPT restriction. LXOPT now rejects such code which needs to
be moved to another code object to operate correctly. See the LXO0169
error description for more details.
Default Recording File
The default recording file pathname is now based on the absolute path name
of the processed executable. This forces recording files into the same
directory as the executable by default. Relative path names may still be
specified using the /recFile option.
Software Faults
Small EXE/DLL files produced by the Borland linker sometimes use 16 bit
values to indicate entry points into 32 bit code. This is a valid
technique which is now correctly handled by LXOPT.
Some assembler routines in the Watcom floating point library contain
operations which violate LXOPT restrictions. V1.2 will now handle the
violations which have been reported.
Assembler sequences inserted into code to help trace execution have been
altered to avoid a potential memory update race condition with a store
from an outstanding floating point instruction.
The method by which access to recording code is serialised has been
updated. This improves performance and prevents the deadlock that often
occurred during recording sessions where multiple threads of varying
priorities were executed.
Previous Changes
Version 1.1
Pricing
The price of the single user license increased slightly and all
distribution license fees were removed.
Processing Large EXE/DLL Files Under WARP
Advice on how to avoid problems that may occur processing large EXE/DLL
files under WARP has been revised. It is important that LXOPT users are
aware and act on this information. See Urgent Message for details.
WARP Compressed Page Support
LXOPT now supports the EXE/DLL file page compression introduced with OS/2
WARP. See the /pack2 option for more details.
New Arrangement Algorithms
LXOPT has two new arrangement algorithms. The binary algorithm arranges
code based on the similarity of binary patterns of use and is the new
default algorithm. Parkonly, as the name suggests, parks unused code but
performs no other arrangement. It is intended for use where developers
wish to retain control over the arrangement of executed code.
CPU Optimisations
V1.1 contains the first LXOPT CPU oriented optimisations. These focus on
CPU instruction cache utilisation efficiency (see CPU Instruction Caching)
and branch prediction. New options giving greater control over alignment
within the processed code object have also been provided. See /alignCode,
/alignData, /cLineSize and /cLineWaste for more details. CPU bound
applications should achieve performance improvements of between one and
five per cent. There are numerous other optimisation opportunities which
will be included in future versions of LXOPT.
Special code arrangement pre and post processing have also been introduced
to strike a better balance between CPU efficiency and working set tuning.
This suppresses layout divisions within performance critical areas and
significantly reduces the size of recording files.
Default Alignment
The default alignment of code pointer targets is changed from 4 to 1.
Performance related code alignment issues should be addressed using the
new /cLineSize and /cLineWaste options. Applications which rely on
alignment to make the low order bits of addresses redundant should set
alignment explicitly using the /align option.
ICC.EXE Patch for CSet++ V2.0 and 2.1 Users
Linking initiated by ICC.EXE always forced a base address for EXE files
resulting in the removal of internal fixups. LXOPT requires internal
fixups for correct processing of the input executable. Although many
users could avoid this problem by direct use of LINK386, C++ template
users were forced to link directly via ICC preventing the use of LXOPT. A
patch for ICC.EXE is now included to remove this restriction. See Patch
for CSet++ V2.0 and 2.1 Users for a description of this patch and how to
apply it.
Demo Version Restrictions
Code arranged by the demonstration version will not produce a tone on
start-up unless an error is encountered. The tone has been replaced by a
processed application lifetime limit of seven days. A warning is also
produced if the machine in use has not been recently rebooted to encourage
testing in line with the new Performance Testing section.
This change also allows evaluators to time the execution of processed code
without having to deduct the duration of the start-up and termination
tones. Code produced by the LXOPT Demo will now execute almost
identically to that produced by the full product but may produce up to
three additional page faults during the runtime of the processed
application.
Recording Session Error Messages
V1.0 of LXOPT placed error messages produced during recording sessions in
the file "LXREC.ERR". V1.1 retains this behaviour but when the message
has been safely recorded it will now attempt to display the message on
screen.
Code Offset Translations
New options to translate code offsets between pre and post arrangement
code are now provided. These are designed to assist in debugging
situations and to help trace instruction pointer based error messages back
to the original source code. See the /getOld and /getNew options for more
details.
Code Disassembly
Code within the processed code object can now be disassembled using the
/disasm option.
Utility Programs
New utility programs have been introduced to time application execution,
simulate low memory conditions and translate EXE/DLL files between the
OS/2 2.x and WARP compressed executable file formats. See TimeRun,
Thrash, LXWarp and LXUnWarp for more details.
Installation/WPS Set-up
The installer now creates an LXOPT desktop folder. The contained program
objects are prepared to allow use of LXOPT and associated utilities direct
from the WPS. Alteration of DPATH is not required for the use of LXOPT
V1.1.
Software Faults
The LXOPT recorder has been redesigned to remove the problems some users
experienced while attempting to perform multiple simultaneous recording
sessions. Unique recording DLLs are now created on a per application
basis.
The code analyser has been updated to prevent rejection of code generated
by the Watcom C++ compiler.
Run times on large files (>2Mb) have been reduced by improved efficiency
in the file creation routines. Note that LXOPT is not intended for use on
a daily basis but as a final 'pre-shipment' optimiser. Future design
targets will permit arrangement times of up to 12 hours (i.e. over night)
if optimisation rewards warrant it.
═══ 5. *** START HERE! *** ═══
Welcome to LXOPT
LXOPT is a unique tool for the OS/2 developer. By working set tuning EXE/DLL
files, LXOPT will typically halve the amount of memory required to store
application code. The instruction stream is also processed to ensure maximum
CPU instruction cache efficiency, and a new preloading option allows the
resulting code to be transferred to the swapfile for the fastest possible
application startup and paging. See Benefits of LXOPT for a full list of what
LXOPT can do, or go to the Introduction section for a more complete introduction
to the software.
You Can Now Use LXOPT Free!
LXOPT is now freeware; see This Software is FREE! for details.
Quick Start
The installer has created program objects to process applications using
standard LXOPT defaults which appear on the Open menu of EXE and DLL files. If
you are eager to get started go straight to Using LXOPT.
Spreading the Word
Working set tuned applications page less and use less memory. This leaves more
memory free for use by other applications, reducing their need to page. When
users execute tuned code the entire system benefits. Only when the majority of
applications have been working set tuned will the full benefits of tuning
appear. Please help to spread the word by passing on this copy of LXOPT to
other OS/2 developers.
═══ 6. Introduction ═══
═══ 6.1. Background ═══
With the release of OS/2 V2.0 came the introduction of the Linear Executable
file format, the format now used by all 32-bit OS/2 EXE and DLL files. For the
first time the Linear Executable allowed code to be loaded into memory in 4Kb
units, the page size of the 80386 and later processors.
This new efficient design changed the way code was loaded into memory.
Applications were no longer loaded on a per segment basis but used the virtual
memory mechanisms now used by the rest of the operating system.
A code page is loaded into memory when an attempt is made to execute an
instruction within the page. The 4Kb page is read from the disk and
relocations (fixups) are applied from data structures contained within the
Linear Executable. When memory is heavily utilised, code pages are recycled
like other system memory and the code is discarded. If instructions on the
code page are later referenced, the page must be reloaded from the disk and
the fixups reapplied.
An efficient program would ensure that all instructions on a single code page
were executed at roughly the same time to make sure that memory was used and
paged most efficiently. Unfortunately modern programming methods and
convenience favour grouping code logically rather than by time of execution.
During the run time of a typical application between 30% and 50% of the code
loaded is never executed. Entire 4Kb code pages are often loaded to execute
only twenty or thirty bytes of code.
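To make that cost concrete, the following C sketch counts how many 4Kb pages a set of executed code fragments touches. It is an illustration only and not part of LXOPT: the names code_range, pages_touched and LX_PAGE_SIZE are hypothetical, and the ranges are assumed to be sorted by offset and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

#define LX_PAGE_SIZE 4096UL   /* page size of the 80386 and later processors */

/* A contiguous run of executed code within the code object. */
struct code_range {
    unsigned long offset;     /* start offset in bytes */
    unsigned long length;     /* length in bytes, must be non-zero */
};

/* Counts the distinct pages touched by the given ranges.
 * Assumes the ranges are sorted by offset and do not overlap. */
unsigned long pages_touched(const struct code_range *r, size_t n)
{
    unsigned long count = 0;
    unsigned long last_page = 0;   /* highest page counted so far */
    int have_last = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long first, last;

        assert(r[i].length > 0);
        first = r[i].offset / LX_PAGE_SIZE;
        last = (r[i].offset + r[i].length - 1) / LX_PAGE_SIZE;

        /* Skip any page already counted for an earlier range. */
        if (have_last && first <= last_page)
            first = last_page + 1;

        if (first <= last) {
            count += last - first + 1;
            last_page = last;
            have_last = 1;
        }
    }
    return count;
}
```

With these assumptions, three 30 byte fragments at offsets 100, 5000 and 9000 touch three pages, so 12Kb is loaded to execute 90 bytes. The same 90 bytes placed together at offset 0 touch a single page.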
═══ 6.2. What is LXOPT? ═══
LXOPT (Linear eXecutable OPTimiser) is a development tool designed to improve
the code layout of 32-bit OS/2 applications. Applied directly to EXE and DLL
files, LXOPT rearranges code at the assembler level to minimise page faults and
maximise CPU instruction cache efficiency, and provides many other useful
benefits. It is particularly effective on large applications forced to run in
low memory conditions and can reduce code load page faults by up to 95% in
extreme conditions.
LXOPT can group together all unused assembler sequences and move them to other
code pages from where they will not occupy memory unless executed. This
technique, known as 'Sleeping Code Parking', reduces the total code memory
requirements of a typical application by between 30% and 50%.
LXOPT is unique in that it works at the assembler level and is able to change
the location of not just whole procedures but individual processor
instructions. Code handling infrequently used branches of IF or CASE
statements may be moved to different code pages, significantly reducing a
program's working set.
LXOPT also produces minor CPU related performance improvements by improving CPU
cache efficiency and branch prediction. CPU bound applications should achieve
performance improvements of between one and five percent.
Applications to which LXOPT is to be applied must adhere to certain
restrictions or may require special caution; see Restrictions for more
information.
LXOPT operates directly on 32-bit code in OS/2 Linear Executable files and does
not normally require alteration or recompilation of source code. Processed
applications may contain 16-bit code but this code is not optimised by LXOPT.
LXOPT is not designed for daily use but intended as a final stage in the
development cycle. A completed application should be processed by LXOPT as a
final optimisation phase before retesting and internal or external deployment.
Note: The normal method of testing a performance enhancing tool is to
compare the execution times of pre and post optimised code. LXOPT
primarily operates by improving the caching characteristics of
application code and results are distorted by caching effects within
the file and operating systems. See Performance Testing for details
of how to negate these effects.
Note: If you intend using this software on files greater than 1Mb in size
and are using OS/2 WARP 3.0 (without FixPak 11 or higher) there is a
workaround for a software fault of which you MUST be aware. See
Urgent Message for details.
═══ 6.3. Benefits of LXOPT ═══
Working Set Reduction
The primary benefit of LXOPT is its ability to identify an application's working
set and produce an optimised code layout based upon it. Any application with
code greater than 4Kb in size may benefit from this effect. The effect is most
apparent when an application is forced to execute in a restricted amount of
memory.
Reduction of the working set reduces page faults which improves performance.
Large applications constrained by physical memory may benefit greatly from this
effect. Small applications benefit from improved start up times and contribute
to a general reduction in system load.
Code Parking
Users may be familiar with 'Dead Code Elimination', an optimisation used by
compilers to remove unused code. LXOPT performs 'Sleeping Code Parking', the
moving of apparently unused code to the end of an application's code space.
'Parked' code will not normally be loaded by your application BUT REMAINS
ACCESSIBLE should it be required.
Code parking can significantly reduce the total amount of memory used by
application code, typically by 30% to 50%. It is particularly useful in
reducing the memory overhead of largely dormant applications that run
continuously in the background.
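The idea of parking can be sketched in a few lines of C. This is a deliberately simplified model and not LXOPT's algorithm, which works on individual instructions using a recorded execution history. The names code_block, park_blocks and hot_pages are hypothetical, and the blocks are assumed to be laid out contiguously from the start of a page.

```c
#include <assert.h>
#include <stddef.h>

/* A unit of code with a flag saying whether it was seen to execute. */
struct code_block {
    unsigned long size;   /* size in bytes */
    int executed;         /* non-zero if executed during recording */
};

/* Stable partition: executed blocks first, unexecuted ('parked') blocks last.
 * scratch must have room for n blocks. Returns the number of executed blocks. */
size_t park_blocks(struct code_block *blocks, struct code_block *scratch,
                   size_t n)
{
    size_t i, out = 0, hot;

    assert(blocks != NULL && scratch != NULL);
    for (i = 0; i < n; i++)
        if (blocks[i].executed)
            scratch[out++] = blocks[i];
    hot = out;
    for (i = 0; i < n; i++)
        if (!blocks[i].executed)
            scratch[out++] = blocks[i];
    for (i = 0; i < n; i++)
        blocks[i] = scratch[i];
    return hot;
}

/* Number of 4Kb pages needed to hold the first `hot` blocks. */
unsigned long hot_pages(const struct code_block *blocks, size_t hot)
{
    unsigned long bytes = 0;
    size_t i;

    for (i = 0; i < hot; i++)
        bytes += blocks[i].size;
    return (bytes + 4095UL) / 4096UL;
}
```

In this model, four 3000 byte blocks of which only two execute occupy three pages in their original order, but the executed pair fits in two pages once the other two are parked at the end.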
Code Removal
The LXOPT code analyser will detect and completely remove unreferenced
instruction sequences. Although modern linkers are normally very good at
performing this function they are unable to detect some forms of unused code.
LXOPT identifies every byte of code; if any instruction is unreachable, LXOPT
will remove it.
Preloading
LXOPT includes a /preload option which permits application files to be
transferred to the swapfile on machine start-up. This technique, formerly
reserved for OS/2 system DLLs, is now available to all applications.
Preloading permits faster application start-up and paging at the expense of an
extended boot time and increase in swapfile requirements.
CPU Optimisations
Code arrangement and CPU caching related options allow the tuning of code for
maximum CPU cache efficiency. Improved instruction caching aids performance
both by helping to reduce instruction fetch times and freeing the main memory
bus for other instruction/data accesses. The execution history generated by
LXOPT is also used to assist in branch prediction.
Ease of Use
LXOPT works directly on application EXE and DLL files. Code usage gathering
and new layout generation are fully automated; no source code files are ever
examined or altered.
Developers no longer need to alter code structure, insert compiler pragmas or
predict runtime code usage to tune their code layouts. Near optimal code
layouts which can effectively halve code size can often be achieved with less
than an hour's work.
Users With Insufficient RAM
Despite declining memory prices many users still operate machines with
insufficient RAM for their requirements.
For many applications LXOPT provides a means of quickly and effectively
supporting such users while providing a useful performance enhancement
regardless of the target machine.
Multitasking
Users are often limited in running multiple concurrent applications by memory
constraints. Multitasking applications compete with each other for valuable
system memory.
Applications processed by LXOPT work more effectively in low memory conditions,
reducing system load when running several large programs concurrently.
Fewer/Faster Disk Accesses
In addition to the reduction in disk activity due to fewer page faults,
optimised code layouts have other beneficial effects.
Pages in the LXOPT processed application file will tend to be arranged in the
order in which they are used. Once a page is loaded from disk following pages
are likely to be fully or partially loaded at the same time by disk
hardware/cache. Subsequent page faults can therefore often avoid a disk hit,
significantly reducing the time taken to service the fault. Even if not
already loaded, subsequent code pages will tend to be located at nearby
locations on the disk so reducing head movement.
In addition, LXOPT aligns each code page within the file to minimise the number
of disk blocks which need to be read by the disk controller.
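The effect of page alignment within the file can be shown with a little arithmetic. The C sketch below is illustrative only: blocks_spanned is a hypothetical helper, the 512 byte block size used in the example is an assumption, and file offsets are assumed to coincide with disk block boundaries.

```c
#include <assert.h>

/* Number of disk blocks that must be read to fetch `length` bytes
 * starting at byte `offset` within the file. */
unsigned long blocks_spanned(unsigned long offset, unsigned long length,
                             unsigned long block_size)
{
    assert(length > 0 && block_size > 0);
    return (offset + length - 1) / block_size - offset / block_size + 1;
}
```

Under these assumptions a 4Kb page stored at file offset 4096 spans 8 blocks of 512 bytes, while the same page stored at offset 4100 spans 9, so every load of the misaligned page costs one extra block read.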
Using Libraries
All application developers use libraries. Some come with the compiler, others
are developed in house or purchased externally. Normally these libraries are
developed for general use without regard for a specific application.
LXOPT can process these libraries to optimise their memory usage with your
application. Unused code in the libraries is parked where it causes least
overhead to your application. This benefit may be gained whether the library is
statically or dynamically linked.
Error handling/Debugging code
Most software contains error handling for unexpected internal errors. Often
such code is pre-processed out before final release. While the tests for error
conditions remain, LXOPT will park the error handling itself. While your
application is operating normally the overhead of the error handler code is
reduced to the size of a single transfer instruction.
═══ 6.4. Restrictions ═══
Please ensure that all programmers involved in the development of the software
to be processed read the contents of this section.
LXOPT may be applied to almost all OS/2 Linear Executable DLL or EXE files
containing a pageable 32 bit code object. Some unusual programming techniques
may cause LXOPT to fail or require special caution.
A 32 bit OS/2 linear executable application may contain many objects. An
object is analogous to a segment within the old 16 bit file format with each
object typically containing a different type of information. For example a
normal 32 bit executable file might contain 32 bit code object(s), data
object(s), resource object(s) and perhaps a 16 bit code object to interface
with some 16 bit APIs. LXOPT always applies itself to the largest 32 bit code
object.
To function correctly LXOPT needs to identify all potential execution paths
through your program. LXOPT will work correctly with all transfers of control
generated by standard compilers and normal assembler techniques. Some coding
techniques which rely on assumptions about code layout will cause LXOPT to fail
or require special caution. Run times will also be reduced if users avoid
placing read-only application data within the processed code object.
The following restrictions apply only when they involve the code object which
LXOPT processes. A restriction may often be avoided by moving the offending
code/data to a different object. This can be achieved by placing the code in a
named code segment via compiler options/pragmas and adding this name to the
module definition (.DEF) file in the SEGMENTS section.
Pointer Alignment
If your code relies on the alignment of functions/data (i.e. lower bits used
for informational purposes and masked out before dereference) you must use a
consistent /align option value.
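For readers unfamiliar with the technique, the C sketch below shows the kind of low-bit tagging this restriction refers to. It is a generic illustration with hypothetical names (tag_pointer, untag_pointer, pointer_tag, TAG_ALIGN) and an assumed alignment of 4. It uses a data pointer because ISO C does not guarantee that function pointers convert to integers, although code that tags function addresses relies on the same principle.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_ALIGN 4u                            /* assumed target alignment */
#define TAG_MASK  ((uintptr_t)(TAG_ALIGN - 1u))

/* Stores a small tag in the low bits of an aligned pointer. */
uintptr_t tag_pointer(const void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;

    assert((bits & TAG_MASK) == 0);   /* fails if the target is misaligned */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Masks out the tag before the pointer is dereferenced. */
void *untag_pointer(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}

/* Extracts the tag stored in the low bits. */
unsigned pointer_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}
```

The first assertion in tag_pointer is exactly what breaks if targets are placed with an alignment of 1: the low bits are no longer guaranteed to be zero, so the tag and the address can no longer be separated.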
Hard Coded Relative Distances
Your application may not make assumptions about relative distances between
functions or the absolute location thereof. For example, a statement such as
((NextFunc *)((char *)MyProc+10))() will cause your processed application to
expire just ahead of your programming career.
In general the absolute distance between two locations within the processed
code object should never be used within a calculation. The only exception to
this rule is where both addresses refer to data and no code exists between
them. Assembler programmers may need to take particular care in observing this
restriction.
A special case of this error occurs in code inserted by IBM compiler libraries
when handling a DosGetMessage call. See the LXO0169 error for more details.
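One way to avoid address arithmetic of this kind is to take each function's address in the ordinary way and dispatch through a table. The C sketch below is a generic illustration, not something prescribed by LXOPT, and the names step_fn, steps and run_step are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*step_fn)(int);

static int step_double(int x) { return x * 2; }
static int step_negate(int x) { return -x; }

/* Each entry is a function address taken by name, never computed
 * from the address of a neighbouring function. */
static const step_fn steps[] = { step_double, step_negate };

/* Calls the step selected by index. */
int run_step(size_t index, int value)
{
    assert(index < sizeof steps / sizeof steps[0]);
    return steps[index](value);
}
```

Because no entry depends on how far apart the functions happen to be, the call still reaches the intended function if the code is rearranged.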
Timing
During the recording process LXOPT increases the load on the CPU and generates
occasional disk activity. If an application is timing sensitive this may
affect its operation. CPU load/disk activity can often be reduced or
redistributed to reduce such problems. See Creating a Recording Version for
more information.
Exporting Code Object Data
LXOPT assumes all exports from the processed code object are 32 bit function
entry points. Do not export data from your application's 32 bit code object as
this will cause failure of LXOPT or the resulting program.
Self Referencing Code
Code that deliberately alters itself to effect transfers may fail. If your
application contains assembler that uses this technique, disassemble the
resulting code to ensure it has been translated correctly. LXOPT will issue a
warning if the code object it is processing is writeable.
Invalid SS:ESP
LXOPT uses the application's stack during recording sessions. To ensure correct
operation of the recorder SS:ESP must be valid at all transfers of control
within your application (e.g. during a JMP instruction). If your 32 bit code
receives control with an invalid or 16 bit stack a valid SS:ESP pair must be
created before the next transfer is executed.
LXOPT also uses the DS register. All recording operations are suspended while
the DS register contains a value other than that in effect at application
start-up.
═══ 6.5. An Example ═══
The effect of LXOPT on code is best demonstrated by example. The listing below
shows a simple piece of 'C' code, interleaved with its disassembly, which opens
a file and checks that it contains the correct version number.
/* open the file */
fp = fopen(filePath, "r");
        PUSH "r"
        PUSH filePath
        CALL fopen
        MOV  fp, EAX
/* check if opened ok */
if (!fp) {
        CMP  EAX, 0
        JNE  DO_FSCANF
    ReportError(RE_OPEN_FAIL, filePath);
        PUSH filePath
        PUSH RE_OPEN_FAIL
        CALL ReportError
    return FALSE;
        MOV  EAX, 0
        RET
}
/* read in the version number */
fscanf(fp, "VERSION %d", &version);
    DO_FSCANF:
        PUSH &version
        PUSH "VERSION %d"
        PUSH fp
        CALL fscanf
/* test the version number */
if (version != APP_VERSION) {
        CMP  version, APP_VERSION
        JE   VERSION_OK
    ReportError(RE_WRONG_VERSION, filePath);
        PUSH filePath
        PUSH RE_WRONG_VERSION
        CALL ReportError
    return FALSE;
        MOV  EAX, 0
        RET
}
    VERSION_OK:
The code performs some simple error checking typical of this type of operation.
In normal use the error handling code will not be executed yet is always loaded
due to its proximity to the other code.
LXOPT uses the execution history to identify the error handling as 'Sleeping
Code' and moves it to another code page. This produces the new instruction
sequence below.
        ...
        PUSH "r"
        PUSH filePath
        CALL fopen
        MOV fp, EAX
        CMP EAX, 0
        JE NOT_OPENED
        PUSH &version
        PUSH "VERSION %d"
        PUSH fp
        CALL fscanf
        CMP version, APP_VERSION
        JNE WRONG_VERSION
        ...
The twenty two instructions of the original code sequence are reduced to twelve
while the application executes normally. If a file access error does occur the
error handling is loaded and executed normally.
LXOPT improvements don't stop there. The entire 'ReportError' function is also
parked and code from the library functions for 'fopen' and 'fscanf' will be
separated from the other library code allowing it to be moved to the code page
on which it is used. Also 'fscanf' is a powerful function which contains code
capable of reading many data types and includes floating point conversions.
LXOPT breaks the function into its component parts and only moves the code from
'fscanf' which specifically handles the input formats specified by your
application.
The result is a code sequence that is effectively halved in size and which only
needs a single code page to contain all the required code.
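The instruction counts quoted above can be checked with a small sketch. It is an illustration only, not LXOPT's algorithm: the instruction counts are taken from the listing, while the "executed" flags are an assumption standing in for a real recording in which no file error occurred.

```python
# Each entry: (label, instruction count, executed during recording?)
# Counts come from the disassembled listing; the flags are assumed.
SEQUENCES = [
    ("call fopen",           4, True),
    ("test fp",              2, True),
    ("report open failure",  5, False),
    ("call fscanf",          4, True),
    ("test version",         2, True),
    ("report wrong version", 5, False),
]


def split_by_use(sequences):
    """Return (active, parked) lists, preserving the original order."""
    active = [s for s in sequences if s[2]]
    parked = [s for s in sequences if not s[2]]
    return active, parked


def instruction_count(sequences):
    """Total number of instructions in a list of sequences."""
    return sum(count for _, count, _ in sequences)


active, parked = split_by_use(SEQUENCES)
print(instruction_count(SEQUENCES), instruction_count(active),
      instruction_count(parked))
```

The ten parked instructions are the two error handling blocks, leaving the twelve instructions that execute in normal use.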
═══ 7. User Guide ═══
User Guide
═══ 7.1. Using LXOPT ═══
Using LXOPT
Use of LXOPT is divided into four stages.
Preparation
The EXE/DLL file to be processed may need some special preparation. What
preparation is needed (if any) is described in Preparing Programs.
Creating a Recording Version
Next a special version of your application must be created. This will execute
as normal but will create a recording detailing where and when all code is
used. LXOPT creates this special version of your application for you. See
Creating a Recording Version for more details.
Running the Special Version
The special recording version of the application is then executed to generate a
recording. This is the most important part of the optimisation process. See
Recording Program Statistics for details.
Arrangement
Finally LXOPT is used again, this time with the /arrange parameter to create an
optimised code arrangement. Options may also be specified to compress pages in
the output file (/pack2) or to allow the created file to be preloaded
(/preload).
Arrangement issues are discussed in Creating An Optimised Code Arrangement.
The result is a leaner, faster application with dramatically improved
performance characteristics when forced to run in low memory conditions.
═══ 7.1.1. Preparing Programs ═══
Preparing Programs
EXE File Preparation
LXOPT can process both EXE and DLL files. To process EXE files they must
contain internal fixups which are normally removed by the linker when the
executable is given a base address.
LINK386 users must ensure that EXE files have been linked without the /base
linker option. ICC.EXE in CSet++ V2.0 and V2.1 automatically provides
/base:65536 as an option to LINK386 when initiating a link. Users of this
compiler should invoke LINK386 separately. This is not possible where the code
to be processed uses templates and for these users a patch for ICC.EXE is
included to allow use of LXOPT. See Patch For Cset++ V2.0 and V2.1 Users for
more details.
Users of VisualAge C++ should use ILINK with the '/nobase' option. Do not link
using ICC.EXE, as it provides the /base option to a link even if /nobase is
specified.
Watcom created EXE files need the 'op int' linker option to ensure internal
fixups are retained.
DLL File Preparation
Standard DLLs always retain internal fixups and require no special
preparation.
DosGetMessage
Code within the IBM CSet/VAC libraries violates an LXOPT restriction when
handling a DosGetMessage call. Even where an application does not make direct
use of this function it will often be included to handle messages required by
the compiler library.
LXOPT detects use of this technique and produces an LXO0169 error. See the
description of this error for information on how to avoid this problem.
Restrictions and Stack Use
Before using LXOPT please read the Restrictions section to ensure that it is
suitable for your application.
LXOPT uses up to 2Kb of your application's stack while recording. To reduce
stack size problems LXOPT intercepts thread creation within the recording
application and increases the allocated stack by 4Kb. The main application
stack is not altered.
═══ 7.1.2. Creating a Recording Version ═══
Creating a Recording Version
To create a recording version of your EXE or DLL file it must be processed by
LXOPT using the /prep parameter. If the /arrange parameter is not specified
then /prep is assumed by default. To prepare an application using default
options select LXOPT Prepare from the Open menu or, from the command line, type:
LXOPT <exe/dll path name>
LXOPT searches the file for 32 bit code objects and selects the largest for
processing.
Although most applications will be processed in under five minutes, LXOPT is
effectively reconstructing your entire application including any libraries to
which it is statically linked. Processing may take up to a few hours for multi
Mb application files.
Errors and Warnings
LXOPT detects all unreferenced bytes in the processed code object. Often these
bytes are initialised data or unused code. For each unused byte sequence which
contains fixups LXOPT produces an LXO0150 warning. It is normal for files to
produce several of these warnings during processing. All unused bytes are
removed from the output executable.
When LXOPT detects an error processing a file a simple error message is
produced. More detailed information on the error and how to avoid it is
provided via the F1 key which provides a direct link to the reference section
in this documentation.
Memory Use Analysis
During processing LXOPT identifies every byte within the processed code object.
There are three basic classifications; 'code', 'data' and 'unused'.
'Code' bytes are those used to form processor instructions and will normally be
the largest group.
'Data' bytes are read-only data placed within the code object. Such data is
normally inserted by a compiler and usually consists of compile time constants
or jump tables used in the encoding of CASE statements.
'Unused' bytes are never referenced within the application. Typically these
are padding used to adjust procedures and data to alignment boundaries. Often
applications contain unused code or data that is linked into the final
executable image. Some development tools and techniques can prevent standard
linkers from detecting unused code.
Files Produced
When processing has completed the original application file is replaced by a
special recording version. The original file is renamed to have an '.ORI'
extension.
A special recording DLL is also created to assist with runtime recording. The
name of this DLL is generated by placing '_@1' at the end of the root name
and adding a '.DLL' extension. For example, processing of 'MYAPP.EXE' will
create the recording DLL 'MYAPP_@1.DLL'. The numeric value is altered if
necessary to ensure generation of a unique file name.
Special Options
Some applications may require special options for recording to increase
performance or alter the distribution of the overhead of the recording process.
Often the simplest solution for timing problems is to record on a higher
performance PC or remove timing sensitive code from the processed code object.
See the /buff, /compress, /thread and /recfile options for more information.
Note: If you intend to use this software on files greater than 1Mb in size
and are using OS/2 WARP prior to application of FixPak 11, there is a
workaround for a software fault of which you MUST be aware. See Urgent
Message for details.
═══ 7.1.3. Recording Program Statistics ═══
Recording Program Statistics
To record program statistics run your prepared EXE/DLL and perform the
operations a user would typically perform. The quality of recording
information directly affects the performance of LXOPT and is the most important
part of the optimising process.
During the recording process the recording DLL generated during preparation
(e.g. 'MYAPP_@1.DLL') must be somewhere on your LIBPATH. By default this DLL
was created in the directory of the processed EXE/DLL file. Typically LIBPATH
contains the current directory '.' which often allows the DLL to be found at
its default location.
If your application consists of multiple EXE/DLL files you may prepare and
record all of them simultaneously.
Recording Strategy
The purpose of recording is to tell LXOPT where and when instructions are
normally executed and identify which are rarely or not normally used. During
the recording session, perform the operations that you would normally expect of
the user. Avoiding unusual program conditions while recording will greatly
enhance the effectiveness of LXOPT. DO NOT be tempted to use pre-prepared test
scripts designed to test program stability; these rarely mimic true user
behaviour.
A good general rule is to start your application and do the most commonly
performed actions first. Then progress through the interface in order based on
expected frequency of use. If you wish to focus tuning on a specific area of
the code simply execute that code more frequently.
Recording (.REC) Files
During execution your application will create a recording file which by default
has the name of the processed file with a '.REC' extension. Your application may
be used for more than one recording session; each session appends its results
to the existing recording.
Recording sessions may also be performed on separate machines and the files
concatenated later to form one continuous recording. If you wish to record
multiple instances of your application at the same time a separate recording
file is created for each instance. These special circumstances require special
treatment of the recording file(s); see the /recfile option for more details.
Special Options
If recording seems slow or your application is timing sensitive the /buff,
/thread, /compress and /recfile options can all help to reduce the recording
overhead if used when creating a recording version.
If an error is detected during recording a two pitch alternating tone is
generated and an error message placed in the file 'LXREC.ERR'. An attempt will
then be made to display the error message on screen. This will normally
succeed but given the unknown state of the application it cannot be guaranteed.
The application will then terminate with error code 99.
WARNING
Recording may create large recording files ranging from a few Kb to several
hundred Mb in size. Running out of disk space will cause recording to fail and
may impair other applications if recording files are placed on the same drive
as your swap file.
The recording file pathname may be specified using the /recfile option and the
recording file may be reduced in size by the /compress option. You are advised
to start with a short recording session to judge the disk space requirements
for your application.
═══ 7.1.4. Creating An Optimised Code Arrangement ═══
Creating An Optimised Code Arrangement
When a recording has been created LXOPT is used with the /arrange parameter to
create an optimised layout.
To use all defaults select LXOPT Arrange from the Open menu or, from the
command line, type:
LXOPT <exe/dll path name> /arrange
If you wish to preload the resulting file you must also specify the /preload
parameter. This option is also available as LXOPT Arrange Preload on the Open
menu.
As with preparation, LXOPT may take a significant time to complete when
processing large input files.
Performance Testing
If you wish to compare performance between the original and optimised
applications please read the Performance Testing section. Caching effects of
the operating system, network servers and disk caches need to be considered to
ensure valid results.
Files Produced
When complete the optimised EXE/DLL file is created and overwrites the existing
recording version.
The recording version of the file is renamed with a '.PRP' extension.
The original application file remains with a '.ORI' extension.
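The file names produced across preparation, recording and arrangement can be summarised in a short sketch. The helper name `lxopt_files` and the sample path are hypothetical; the extensions and the recording DLL name follow the defaults and the 'MYAPP_@1.DLL' example given in this guide, and the sketch ignores the /recfile option and the renumbering LXOPT performs to keep the DLL name unique.

```python
from pathlib import PureWindowsPath


def lxopt_files(app_path, dll_index=1):
    """Default file names for one processed EXE/DLL (illustrative helper)."""
    app = PureWindowsPath(app_path)
    return {
        # Original file, renamed when the recording version is created.
        "original": app.with_suffix(".ORI"),
        # Recording version, renamed once an arrangement has been created.
        "recording_version": app.with_suffix(".PRP"),
        # Recording file written while the recording version runs.
        "recording": app.with_suffix(".REC"),
        # Recording DLL, which must be found via LIBPATH while recording.
        "recording_dll": app.with_name(f"{app.stem}_@{dll_index}.DLL"),
    }


for role, path in lxopt_files(r"C:\APPS\MYAPP.EXE").items():
    print(f"{role:18} {path}")
```

After arrangement the application's own path (here C:\APPS\MYAPP.EXE) holds the optimised file.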
Arrangement Report
When a new arrangement has been created LXOPT performs a page fault simulation
on the original and optimised versions. The test simulates the execution of
the entire recording session with the original and optimised versions of the
application file. The results are analysed and used to generate a comparative
report for a series of memory load conditions. See Reading Reports for an
explanation of the information displayed.
Production of a report may be disabled using the /noreport option.
Recording Files
LXOPT arrangement uses a single recording file which by default is the EXE/DLL
path name with a '.REC' extension. If you wish to specify another file or use
multiple files please see the /recfile option for more details.
Alternative Arrangement Algorithms
LXOPT uses named arrangement algorithms to create optimised code layouts.
Although the default algorithm is generally the best, algorithm performance is
dependent on program structure and can vary dramatically. For information on
customising the optimisation phase refer to the /alg option.
CS:EIP Based Messages
Many applications use the current instruction pointer as part of their fatal
error messages. The default exception handler also provides this information.
Developers often use a .MAP file to trace back these pointers to the offending
code. The movement of assembler sequences within the processed code object
means that offsets within this region will have altered. LXOPT provides the
/getold and /getnew options to translate these offsets between the old and new
versions of the executable.
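Why an offset taken from an optimised executable no longer matches the original .MAP file can be shown with a small model. This is a conceptual sketch only: the move table, its values and the `translate` function are hypothetical and do not describe LXOPT's internal data or the syntax of its options.

```python
# Hypothetical move table: (old_offset, new_offset, length) for each code
# sequence relocated during arrangement. The values are invented.
MOVES = [
    (0x0000, 0x0400, 0x40),   # rarely used sequence moved to a later page
    (0x0040, 0x0000, 0x20),
    (0x0060, 0x0020, 0x30),
]


def translate(offset, moves, to_new=True):
    """Map an offset between layouts; None if it lies outside every sequence."""
    for old, new, length in moves:
        src, dst = (old, new) if to_new else (new, old)
        if src <= offset < src + length:
            return dst + (offset - src)
    return None


# An instruction pointer reported by the optimised file, mapped back to the
# offset that would appear in the original .MAP file.
print(hex(translate(0x0005, MOVES, to_new=False)))
```

The distance into a sequence is preserved; only the position of the sequence as a whole changes.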
═══ 7.2. Reading Reports ═══
Reading Reports
When the arrangement process is complete LXOPT generates a report to show how
effectively the new instruction layout will reduce page faults due to the
loading of application code.
The data for this report is generated using a simulation of the page fault
behaviour of the old and new code layouts for a range of available free memory.
The execution history generated by the recording process is used to recreate
the flow of control throughout the lifetime of the application.
OS/2 is an advanced multitasking operating system and as such the resources
allocated to an application will vary dynamically based on system load. To
allow production of a meaningful report page fault data is generated for fixed
amounts of available memory.
A typical report appears below:
Calculating page faults loading instructions from code object 1 ...
Memory (Kb)  Old Faults  New Faults  Percentage Reduction
         28       13187        4014  69%
         84        2048         290  85%
        140         743          58  92%
        196         461          43  90%
        252         337          43  87%
        308         240          43  82%
        364         182          43  76%
        420         160          43  73%
        476         144          43  70%
        532         131          43  67%
375161 bytes (63%) of the code was parked.
Each row of the table details page fault behaviour for a fixed amount of
available memory. To take the first row, if code object 1 were restricted to
the use of 28Kb of memory the old version would generate 13,187 page faults
while executing the recording sequence. The new code layout would generate
4,014 page faults, a 9,173 (69%) reduction.
Code parking often has a significant effect on the arrangement process and in
the above example roughly 366Kb was parked (i.e. 375,161 bytes of unused code
fragments were collected from throughout the application and placed together at
the end of the code area).
The last row of the table shows the number of page faults generated when all
the memory the code needs is available. This figure is never zero as
application code is always loaded via page faults (*). As no page is ever
forced out the number of page faults is equal to the total number of pages
referenced. The old layout used 131 pages (524Kb) while the new code layout
uses 43 pages (172Kb), a reduction of 88 pages (352Kb) or 67%. This is why
the new code layout causes 43 page faults across a range of available memory:
43 pages (172Kb) of memory is all that is required to load all of the
executed code.
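The arithmetic behind the report can be reproduced in a few lines. The function names are illustrative, and truncating the percentage to a whole number is an assumption inferred from the sample figures (for example 69.56% is shown as 69%).

```python
PAGE_KB = 4  # size of one code page in Kb

# (memory Kb, old faults, new faults) copied from the sample report
REPORT = [
    (28, 13187, 4014), (84, 2048, 290), (140, 743, 58), (196, 461, 43),
    (252, 337, 43), (308, 240, 43), (364, 182, 43), (420, 160, 43),
    (476, 144, 43), (532, 131, 43),
]


def reduction_percent(old_faults, new_faults):
    """Percentage reduction in page faults, truncated to a whole number."""
    return (old_faults - new_faults) * 100 // old_faults


def pages_to_kb(pages):
    """Memory occupied by a number of 4Kb pages."""
    return pages * PAGE_KB


for memory_kb, old, new in REPORT:
    print(memory_kb, old - new, f"{reduction_percent(old, new)}%")

# Last row: faults equal pages referenced, so they give the code working set.
print(pages_to_kb(131), pages_to_kb(43), pages_to_kb(131 - 43))
```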
So have the application's total memory requirements been reduced by 67%? No.
All figures in the table relate to page faults loading instructions from the
processed code object; all references to data and resources remain unaltered.
If your application is relatively small but manipulates large amounts of data
the effect may even go unnoticed. If your application's memory requirements are
primarily due to the size of the code then the effect can be transformative.
(*) OS/2 V2.0 and later ignore the PRELOAD attribute specified in module
definition files.
═══ 7.3. Preloading ═══
Preloading
Preloading (transferring exe/dll files to the swapfile) can give a significant
load time performance boost at the expense of additional swapfile space
consumption and initial preloading delays (normally at boot/network connection
time). For a file to be preloaded it must have been arranged using the
/preload option. The resulting file will execute normally until preloaded
using the Preload Utility.
Use of preloading also raises other more subtle issues.
DLL Initialisation
Applications are preloaded by use of the DosStartSession and DosLoadModule
APIs.
When preloaded via the preload utility no DLL initialisation code is executed
in DLLs that have been arranged with the /preload option. This is required to
ensure transparent operation and also to protect the preloading mechanism from
harm. No EXE file code is ever executed. The only exception is for DLLs with
global initialisation for which normal start-up is performed.
When a DLL/EXE is loaded, so are all the DLLs on which it depends. The DLLs
not processed by LXOPT will initialise normally and are not preloaded.
This places constraints on the actions that may be performed while executing
the _DLL_InitTerm function within unprocessed or global init DLLs used by an
LXOPT preloadable file. In these circumstances the _DLL_InitTerm routine
should not attempt any user interaction or perform any action likely to
materially affect other client processes. It should also not attempt to access
functions exported by other LXOPT processed DLLs: their DLL initialisation will
not have been performed, making exported functions unreliable.
Although it is very unusual for such DLL initialisation code to exist it is
extremely important that this restriction is not violated.
Philosophical Issues
Preloading is a compromise. It trades the performance boost of application
paging against increased use of machine resources; namely swap space and
preload (usually boot) time.
Problems appear where each application considers itself "important" enough to
be preloaded. Clearly if a user's machine were to load every piece of software
they possess every time the machine booted, preloading would quickly become
counter productive.
OS/2 used to permit the preloading of segments with the PRELOAD segment keyword
in module definition files. The loader would read all PRELOAD segments into
memory on application start-up. Unlike the LXOPT preload, this OS/2 preloading
only took effect when the user attempted to start an application. PRELOAD
segments were loaded but often at the expense of other executing applications.
Although tools still support this option and indicate preload requirements in
the executable files they produce, the operating system now ignores them. A
major factor in this is the potential abuse of PRELOAD to boost a single
application's apparent performance at the expense of the rest of the system.
LXOPT preloadable files are preload enabled; they do not automatically preload
by themselves. This is deliberate. By requiring the use of the preload
utility the decision to preload is taken away from the developer and given to
the user.
Your development tools are a good example of the issues involved. If you use
them on a daily basis then preloading all the executables and DLLs may be very
beneficial but it is hardly warranted if development is confined to an annual
tweak of an in-house utility.
In general do not make assumptions about the desirability of preloading your
application. If you are working set tuning your software with LXOPT, enable
preloading and deploy with the preload utility wherever possible. Users
sometimes run server applications intermittently while apparently trivial
applications are often executed thousands of times within automated scripts.
Deployment Issues
The Preload Utility which accompanies LXOPT is freely distributable with your
application. Remember that a user's installation may already be using the
preload utility. Two issues arise: where to put it and what to do if the files
already exist. Preload needs to be available prior to network availability.
It is therefore recommended that the preload utility is stored in the
"\PRELOAD" directory on the OS/2 boot drive. If the utility already exists
follow the simple rule that if the files have a more recent creation date than
the ones supplied with your application, leave them alone! Future versions of
the preload utility will maintain backwards compatibility.
The preloader operates by copying the entire processed file into the swapfile.
The original application file on disk is closed and all further page loading is
performed directly from the swapfile. While this reduces the page loading
overhead it also increases disk space requirements on the swap drive. Running
out of swap space may make the system unstable. If the problems are caused by
commands in startup.cmd it will make recovery more difficult. By default the
preload utility will refuse to preload a file if there is less than 10Mb of
free swap space. If your application installation places preload commands in
startup.cmd and you predict the need for greater free space, use the /M option.
═══ 7.4. Working Set Tuning ═══
Working Set Tuning
Working set tuning is the act of optimising the arrangement of information to
be stored in a cache to allow the most efficient operation of the caching
mechanism.
When an operating system manages pages of memory the system RAM effectively
becomes a giant cache. 4Kb code and data pages travel via this cache from
executable files or the swapfile on their way to the processor in the same way
that files travel through a disk cache. The success of caching relies on the
non random nature of access requests. The greater the locality of reference
the more efficiently a cache operates.
The path of execution through code is not random. Instructions execute in
sequences broken up by transfers of control such as CALL or JMP instructions.
It is these transfers of control that often produce a 'cache miss' resulting in
a disk access. While the targets of these transfers are sometimes some
distance from the current instruction pointer the destination is often known or
predictable. It is this predictability on which a working set tuner for code
is based.
Code Arrangement
When applications are divided into pages it is done without regard to the
underlying contents: every 4096 bytes the code is indiscriminately severed.
Functions and even individual instructions are split across page boundaries.
What appears on each page is dictated by the order of the code within the
executable file. What dictates that order? You do.
Within each compilation unit the order in which code appears is generated by
the compiler. It builds the instruction sequences and orders them roughly as
they appear in the source file. The linker takes each object file and
concatenates the contents, appends any used libraries and outputs the
executable. The result is that code within an executable will appear in the
order that it is typed in the source files and the order in which those files
are processed by the linker.
Well so what? - if code is only loaded when referenced then there's no problem,
right? Take a look at the two code layouts below where A to L are individual
code sequences which for simplicity have been grouped 3 to each 4Kb page.
ΓöîΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö¼ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö¼ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö¼ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö¼ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÉ
Γöé Γöé Page 1 Γöé Page 2 Γöé Page 3 Γöé Page 4 Γöé
Γö£ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöñ
Γöé Untuned: Γöé A,B,C Γöé D,E,F Γöé G,H,I Γöé J,K,L Γöé
Γö£ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö╝ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöñ
Γöé Tuned: Γöé G,E,D Γöé L,B,C Γöé H,I,K Γöé A,F,J Γöé
ΓööΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö┤ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö┤ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö┤ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓö┤ΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÇΓöÿ
Untuned vs Tuned Code Layout
Reduced Code Memory Requirements
If the normal path of execution is "GEDLBLBLBCHIK" the benefits of the tuned
layout become clearer. The normal layout loads four pages in the sequence
3241, while the tuned layout loads only three pages in the sequence 123. Both
layouts contained and executed the same code but the tuned layout avoided one
disk access and used 25% less memory.
This memory saving is a direct result of the sequences A, F and J not being
executed. If this seems contrived consider that on average between 30% and
50% of a typical application's code is loaded but not executed during normal
operation. If that sounds high examine your own code for error handling of
memory allocation, file access or OS/2 API errors. Include with this the code
that your users will rarely execute such as routines handling the import of
that obscure file format, changing application configuration or displaying
that list of programmer credits (try clicking on the Warp desktop and then
press Ctrl-Alt-Shift-O). Often such code appears within branches of IF or
CASE statements, lying dormant inside otherwise active functions. In an
untuned code layout much of this code is loaded because it happens to share
part of a code page with some other active code sequence.
Performance In Low Memory Conditions
Limit the code sequence to the use of one 4Kb page as a crude simulation of
low memory conditions and more significant differences emerge. The normal
layout will execute as "GEdLBLBLBcHiK" where an upper case character
represents the need to load a page from disk. The result is a total of 10
page loads. The tuned layout sequence is "GedLblblbcHik". The total of 3
page loads has been unchanged by the restriction to one page of memory.
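The page load counts for both layouts can be reproduced with a small simulation. This is a minimal sketch, assuming a strict least recently used replacement policy (OS/2 is only approximately LRU); the function names are illustrative and the layouts and execution path are those of the example above.

```python
from collections import OrderedDict

UNTUNED = ["ABC", "DEF", "GHI", "JKL"]   # sequences on pages 1 to 4
TUNED = ["GED", "LBC", "HIK", "AFJ"]
PATH = "GEDLBLBLBCHIK"                   # normal path of execution


def trace(layout, path, resident_pages):
    """Replay `path`; upper case marks a page load, lower case a hit."""
    page_of = {seq: page for page, seqs in enumerate(layout) for seq in seqs}
    resident = OrderedDict()             # least recently used page first
    out = []
    for seq in path:
        page = page_of[seq]
        if page in resident:
            resident.move_to_end(page)   # page becomes most recently used
            out.append(seq.lower())
        else:
            if len(resident) >= resident_pages:
                resident.popitem(last=False)   # discard least recently used
            resident[page] = True
            out.append(seq.upper())
    return "".join(out)


def page_loads(layout, path, resident_pages):
    """Number of pages read from disk while replaying `path`."""
    return sum(ch.isupper() for ch in trace(layout, path, resident_pages))


for name, layout in (("untuned", UNTUNED), ("tuned", TUNED)):
    for pages in (4, 1):
        print(name, pages, trace(layout, PATH, pages),
              page_loads(layout, PATH, pages))
```

With four pages available the untuned layout needs 4 loads and the tuned layout 3; restricted to one page the counts become 10 and 3.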
The reduction in page faults is due to the way that tuned code layouts group
code by time of execution. When a tuned application executes a CALL or JMP
instruction the target is much more likely to be on the same page of memory.
In low memory conditions any attempt to execute an instruction on all but the
most recently used pages is likely to be punished by a page fault.
While this small example conveys the basic principles, in reality the
situation is much more complex. A typical code page contains not 3 but 300
separate code sequences and OS/2 recycles each page on approximately a least
recently used basis. In low memory conditions applications and the operating
system compete for ownership of these memory pages, the number of pages
available to an application varying dynamically based on current demand. As a
result, when working set tuned code is executed the reduced memory
requirements benefit not just the tuned software but all other executing
applications.
═══ 7.5. CPU Instruction Caching ═══
CPU Instruction Caching
Level 1 CPU cache memory is a limited and valuable resource. Modern processors
require these caches to allow the instruction pipelines to be fed at their
maximum rate. The Intel Pentium processor implements an 8Kb on chip
instruction cache. Compared to the size of modern applications 8Kb is
extremely small and operational efficiency depends on its ability to contain
small repeatedly executed code sequences (such as program loops) in their
entirety.
There are three inefficiencies in the use of CPU instruction caches that LXOPT
addresses.
Processor Caching Algorithm
The unit of storage within a cache is the 'cache line' which on a Pentium
processor is 32 bytes long. These cache lines are analogous to the 512 byte
block storage units in a disk cache.
Ideally an instruction cache would operate using a pure Least Recently Used
algorithm to maximise the chances of keeping repetitive instruction sequences
within the cache. But caches are high speed devices, often with access times
of less than 10 nanoseconds. Time constraints prevent the implementation of
optimal algorithms. Early devices used a simple direct mapping approach where
some bits from an address were used to directly address a cache line.
                 Tag                  Line Adr
Binary Address:  0011011011000100111  00010110  00010
In the example above 8 bits are used from the address to identify the cache
line to use. This gives us 256 possible cache lines or 8Kb of total cache
assuming a 32 byte line. A problem occurs where two memory locations need to
be cached that have the same bit pattern for the cache line address. As each
location is loaded the previous occupier of the cache line is pushed out. For
simple assembler loops this rarely occurs, but if the loop contains a call to a
function with a clashing address performance suffers dramatically.
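A clash in a direct mapped cache can be demonstrated by splitting addresses into their fields. The sketch assumes the conventional arrangement in which the low 5 bits select a byte within the 32 byte line and the next 8 bits select the line; the text does not state which bits are used. The `callee` address is hypothetical.

```python
LINE_BYTES = 32    # cache line size quoted above
NUM_LINES = 256    # 8 address bits select the line: 256 * 32 bytes = 8Kb


def split_address(address):
    """Return (tag, line, offset) for a direct mapped cache."""
    offset = address % LINE_BYTES
    line = (address // LINE_BYTES) % NUM_LINES
    tag = address // (LINE_BYTES * NUM_LINES)
    return tag, line, offset


loop_body = 0b00110110110001001110001011000010   # the address shown above
callee = loop_body + 3 * LINE_BYTES * NUM_LINES  # hypothetical, 24Kb further on

print(split_address(loop_body))
print(split_address(callee))
```

Both addresses select line 22 but carry different tags, so each fetch evicts the other. Any two addresses separated by a multiple of 8Kb clash in this way.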
Hardware designers responded to this problem with the set associative cache.
                 Tag                   Line Adr
Binary Address:  00110110110001001110  0010110  00010
Here the cache line address is reduced to 7 bits giving a total of 128
addresses but now the cache holds 2 cache lines per address. The cache
maintains an LRU mechanism within each group of two lines. The total cache
size is unchanged but it is now more flexible, reducing the number of clashes
which force out useful cache lines. A set associative cache with 2 cache lines
per address (2 way) is the method used by the Pentium processor. The
Pentium Pro also uses an 8Kb 2 way set associative instruction cache. The 486
processor uses a 4 way set associative cache with a 16 byte cache line.
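The difference between the two designs can be sketched by replaying the same instruction fetches through both. This is a simplified model: it tracks one fetch per address, assumes LRU replacement within each set, and uses hypothetical addresses chosen to clash.

```python
LINE_BYTES = 32
CACHE_BYTES = 8 * 1024


def count_misses(addresses, ways):
    """Replay fetch addresses through an LRU set associative cache."""
    num_sets = CACHE_BYTES // (LINE_BYTES * ways)
    sets = [[] for _ in range(num_sets)]   # tags per set, least recent first
    misses = 0
    for address in addresses:
        line_number = address // LINE_BYTES
        index, tag = line_number % num_sets, line_number // num_sets
        tags = sets[index]
        if tag in tags:
            tags.remove(tag)               # hit: re-appended as most recent
        else:
            misses += 1
            if len(tags) == ways:
                tags.pop(0)                # evict the least recently used
        tags.append(tag)
    return misses


loop_body = 0x00010000                     # hypothetical addresses
callee = loop_body + CACHE_BYTES           # same cache line address
helper = loop_body + 2 * CACHE_BYTES       # a third clashing address

two_way_clash = [loop_body, callee] * 100
print(count_misses(two_way_clash, ways=1), count_misses(two_way_clash, ways=2))

three_way_clash = [loop_body, callee, helper] * 100
print(count_misses(three_way_clash, ways=2))
```

With `ways=1` the model behaves as a direct mapped cache and every fetch misses. Two lines per address absorb a two-way clash, but a third clashing address brings the misses back.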
Set associative caches reduce cache line address clashes but do not eliminate
the problem. Modern applications, unlike simple benchmark code, contain deep
hierarchies of programming constructs that can significantly increase the risk
of clashes. When these occur not only is instruction fetching delayed but the
use of the external bus blocks other data memory access. When clashes occur
subsequent sequential instruction fetches are also much more likely to push out
reusable cache lines. While these valuable cache lines are discarded, old
unneeded lines are retained by virtue of their uncontested cache line address.
LXOPT organises instructions such that commonly executed code is laid out in
near sequential form. This is the most efficient ordering for the instruction
cache as it minimises the chances of related code sequences having the same
cache line address. The resulting more sequential order of cache line loads
allows the cache to perform as if a near pure LRU algorithm had been
implemented.
Reduced Cache Line Wastage
Grouping commonly executed code together has another simple benefit: it moves
rarely executed code elsewhere.
The Intel x86 instruction set was designed with memory efficiency as a high
priority and many of the most commonly executed instructions are contained
within a single byte. The average size of an uninterrupted sequence of
assembler instructions is only 13 bytes. As a result almost every cache line
will span two or more separate instruction sequences. If these sequences are
not closely grouped by time of execution then significant portions of the cache
are wasted.
LXOPT arranges code by dividing it into uninterrupted sequences and tracing the
use of each. On arrangement these sequences are recombined based on time of
execution, exactly the criterion for the least cache memory waste.
Target Alignment
The third instruction caching benefit is also the result of the sequential
grouping of executed code. Most modern compilers align code reached by
transfer instructions (such as the start of procedures) on a 16 byte boundary.
The principle is simple: once the procedure is reached the most efficient thing
to do is minimise the number of cache line reads. Up to 15 bytes of padding
must be inserted to ensure 16 byte paragraph alignment.
This is clearly a good strategy in normal circumstances but LXOPT processed
code is no longer 'normal'. After processing, the instructions immediately
preceding the aligned procedure are very likely to already be inside the
instruction cache. LXOPT has grouped code by time of use. Even if the code is
not already in the cache it is likely to be needed in the immediate future.
Padding code out to 16 byte boundaries is now a less attractive proposition.
Most instruction padding becomes a waste of valuable CPU cache memory.
The benefit of improved locality of reference is enhanced if the result of a
transfer instruction has come as a surprise to the processor's instruction
prefetch mechanism. If the new target is already in the cache the initial
instructions may be fed directly to the execution unit, keeping it busy while
the instruction prefetch struggles to regain its lead over the execution
pipeline(s).
Removal of all padding is not necessarily optimal even after LXOPT processing.
Performance of the 486 processor can be degraded where an instruction crosses
an alignment boundary.
LXOPT removes padding between arranged code but allows the developer to choose
how many bytes of CPU cache to waste to ensure alignment on a cache line
boundary. /cLineSize (default 16) specifies the target cache line size and
/cLineWaste (default 3) specifies how many bytes of a cache line may be wasted
to ensure alignment.
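One plausible reading of how the two options interact is sketched below in Python. It assumes that a transfer target is padded up to the next cache line boundary only when the padding needed does not exceed the permitted waste; this rule and the function name are assumptions made for illustration, not a description of LXOPT internals.

```python
def padding_for_target(offset, line_size=16, max_waste=3):
    """Bytes of padding to insert before a transfer target at `offset`.

    Pads to the next cache line boundary only if that costs no more than
    `max_waste` bytes; otherwise the target is left unaligned.
    """
    needed = -offset % line_size
    return needed if needed <= max_waste else 0


# A target 3 bytes short of a boundary is aligned; one 8 bytes short is not.
near_boundary = padding_for_target(0x1D)
mid_line = padding_for_target(0x18)
```

Under this reading, /cLineWaste:0 removes all padding and /cLineWaste:15 with /cLineSize:16 restores conventional paragraph alignment.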
ΓòÉΓòÉΓòÉ 7.6. Performance Testing ΓòÉΓòÉΓòÉ
Performance Testing
The working set tuning benefits of LXOPT in percentage terms are reasonably
consistent across applications but other factors affect how this translates
into performance improvements. LXOPT reduces the overhead of loading code and
the memory needed to store it. A 50Kb application that manipulates 10Mb of data
will benefit little from working set tuning of code. If you are evaluating
LXOPT for use on a large project do not performance test on a 'Hello World'
type program. The only way to obtain reliable results is to test with the
intended application.
The first reaction of a developer after creating a new program arrangement is
to test that the application behaves and performs as expected.
Next is to reach for the stopwatch and test to see how much of a performance
improvement has been achieved. Normal performance testing involves executing
the original and processed versions of the application through a pre-set
sequence of tests and comparing the execution times. There are several issues
that a developer should be aware of when performance testing LXOPT processed
code.
Although LXOPT performs some minor CPU related optimisations, its primary role
is still that of a working set tuner.
When OS/2 manages the allocation and recycling of 4Kb pages of memory it is
effectively performing the role of a cache controller. In a virtual memory
environment your system RAM has become a giant cache memory. LXOPT operates by
arranging your code to allow this cache to work as efficiently as possible.
When your application executes it results in the loading of code pages from not
only the application executable but from other DLLs on which it depends (e.g.
PMMERGE.DLL). When an application terminates many of these DLLs' code pages
remain in memory. Execution of the application has changed the set of code
pages that are loaded into memory. This caching of code pages will result in
varying execution times for actions which appear to the user to be equivalent.
Caching is used not just by the OS/2 loader but by the file system(s) and often
disk access hardware. If you are working on a network your file server is
caching there too. This caching, which is normally of great benefit, is also a
major obstacle to valid performance testing. As a developer you may have
recently created or executed the code to be tested. Either way you will have
caused the contents of the executable file to pass through the cache(s) on the
way to or from the disk. Program data files may benefit/suffer from similar
effects.
For valid testing we need an equal playing field with all code starting out on
disk. The simplest and most effective way to do this is to restart your machine,
wait for all disk activity to cease (wait for 3 whole minutes of inactivity)
and then time the execution of the original application. Repeat this process
for the LXOPT processed version and the runtimes can be reliably compared.
Avoid using code or data files on a shared file server, where server caching
and the activity of other users will distort timings.
The program utility TimeRun will assist in measuring the run time of an
application.
Remember also that OS/2 is a multitasking operating system and other tasks are
always executing together with your own. This may become a problem if these
other tasks behave differently between test runs. The simple act of moving a
mouse pointer over another window or forcing an additional redraw may
significantly affect results as code is loaded and executed to handle these
actions.
The ratio of code size to available system memory is another major influence.
Developers tend to have the fastest disks, modern controllers and more system
RAM than other types of computer users. Working set tuning really shows its
worth when a tuned application is executed in a restricted memory environment.
When performance testing try using a machine typical of your users. Using a
machine with 8Mb of RAM or less will help to simulate low memory conditions.
Alternatively use the program utility Thrash while performance testing your
application. This utility will simulate the low memory conditions that can be
created by other memory hungry applications.
Ultimately memory savings will depend on application size. Pages not used by
your application are free for use by all. These free pages are allocated to
applications on demand by OS/2. If your application is the major consumer of
memory then it will have the greatest reward. Trivial short-lived applications
are usually not memory restricted as memory pages allocated to them will rarely
be loaded long enough to reach the end of the LRU page queue. The benefit here
is a shortened load time and reduced impact on the rest of the system.
In an ideal world all code would be working set tuned. Assuming a 50:50 code
to data ratio the combined effect would be equivalent to a RAM upgrade of
approximately 33% for all OS/2 users.
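The arithmetic behind a figure of this kind can be sketched as follows. The sketch assumes that working set tuning halves the memory needed for code; that 50% saving is not stated here and is chosen only because it reproduces the 33% result. The function name is hypothetical.

```python
def equivalent_ram_upgrade(code_fraction=0.5, code_saving=0.5):
    """Fractional RAM upgrade equivalent to shrinking the code working set.

    code_fraction: share of memory occupied by code (0.5 for a 50:50 ratio).
    code_saving:   assumed reduction in the code working set (0.5 = halved).
    """
    remaining = 1.0 - code_fraction * code_saving  # memory still required
    return 1.0 / remaining - 1.0


# Halving the code half of memory leaves 75% in use: 1 / 0.75 - 1 = 0.333...
upgrade = equivalent_ram_upgrade()
```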
ΓòÉΓòÉΓòÉ 7.7. Patch for CSet++ V2.0 and 2.1 Users ΓòÉΓòÉΓòÉ
Patch for CSet++ V2.0 and 2.1 Users
LXOPT requires internal fixups to be retained within executable files. LINK386
will remove these internal fixups if it is invoked with the '/base' option.
When ICC.EXE initiates a link of an EXE file it automatically passes this
option to LINK386.
Although many users can invoke LINK386 directly, ICC requires that applications
using C++ templates use ICC to initiate the link. Under these circumstances it
is impossible to prevent the setting of a base address.
LXOPT contains the program ICCPATCH.EXE which disables this behaviour in
ICC.EXE. The patch locates the parameter string within ICC.EXE and removes it,
preventing it from being passed to the linker.
Usage of ICCPATCH is:
[C:\DEVTOOLS\IBMCPP\BIN] ICCPATCH ICC.EXE
Make a copy of the file before you patch it; the given file is altered in
place. If ICCPATCH reports that it is unable to open the file it is likely
that ICC.EXE is still in memory. Type 'ICC /tl-' and reattempt the patch.
Providing a base address of 65536 for an EXE file is good practice and should
normally be observed. Based executables will be a little smaller and
load a little faster, which is the reason ICC provides a base address by
default. After applying the patch be sure to provide the '/base:65536'
parameter to the linker via the ICC.EXE /B option for all executables which are
not to be processed by LXOPT.
DLLs require no such action and if you plan to apply LXOPT only to DLL files
you should not apply this patch.
If you later apply a CSet++ CSD you may have to reapply this patch. Note that
this patch is supplied by Functional Software Ltd and is not supported by IBM.
ΓòÉΓòÉΓòÉ 8. Reference ΓòÉΓòÉΓòÉ
Reference
ΓòÉΓòÉΓòÉ 8.1. Options ΓòÉΓòÉΓòÉ
Options
Options are identified by a preceding / and are not case sensitive.
If an option requires a value that value must immediately follow the option
name separated by a colon.
e.g. /alg:stat
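The option syntax described above can be illustrated with a short Python sketch. It is not LXOPT's own parser; the function name is hypothetical and the sketch assumes only the first colon separates the name from the value, so that path values containing a drive letter survive intact.

```python
def parse_option(arg):
    """Parse '/name' or '/name:value' into (lower case name, value or None)."""
    if not arg.startswith("/"):
        raise ValueError(f"not an option: {arg!r}")
    # Split at the first colon only; the value may itself contain colons.
    name, sep, value = arg[1:].partition(":")
    return name.lower(), (value if sep else None)


example = parse_option("/alg:stat")
```

Because names are folded to lower case, /ALG:stat and /alg:stat are treated alike, while the value is passed through unchanged.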
ΓòÉΓòÉΓòÉ 8.1.1. /alg ΓòÉΓòÉΓòÉ
/alg
Syntax: /alg:<binary | firstuse | stat | parkonly>
Specify the algorithm to be used to generate the new code arrangement.
Available algorithms are binary, firstuse, parkonly and stat.
Firstuse is a simple (and fast) arrangement algorithm which orders code blocks
based on the order in which they are first used.
Stat is a more powerful algorithm which identifies the pattern of executed code
using statistical methods and uses this to order code blocks. The 'stat'
algorithm is capable of producing significantly better results than 'firstuse'
but is extremely sensitive to its /groups option.
Binary is a powerful arrangement algorithm which uses the recording history to
create a binary pattern of usage for each code block. Blocks are then grouped
by similarity of the patterns produced.
Parkonly performs no arrangement other than the parking of unused code. It is
intended for use by developers who wish to retain control over the location of
executed code and is not recommended for normal use.
Default is to use the binary algorithm.
ΓòÉΓòÉΓòÉ 8.1.2. /align ΓòÉΓòÉΓòÉ
/align
Syntax: /align:<num>
Specify the alignment to be given to both instruction and data pointer targets
in the code area. Set to 1 (/align:1) for maximum compression.
Alignment of code and data may be specified separately using the /alignCode and
/alignData options. The option /align:16 is equivalent to /alignCode:16
/alignData:16.
The default alignment is 1 for code pointer targets and 4 for data pointer
targets.
Use of alignment is not just a performance issue; it may be vital for some
types of program. See restrictions for more information.
Alignment of code for performance reasons is better addressed via the
/cLineSize and /cLineWaste options. See CPU Instruction Caching for a
discussion of the issues involved.
ΓòÉΓòÉΓòÉ 8.1.3. /alignCode ΓòÉΓòÉΓòÉ
/alignCode
Syntax: /alignCode:<num>
Specify the alignment to be given to processor instructions in the processed
code object that are referenced by a pointer. Default is 1 for maximum
compression.
Use of alignment is not just a performance issue; it may be vital for some
types of program. See restrictions for more information.
Alignment of code is also affected by the more general /align option.
Data in the processed code object may be aligned using the /alignData option.
Alignment of code for performance reasons is better addressed via the
/cLineSize and /cLineWaste options. See CPU Instruction Caching for a
discussion of the issues involved.
ΓòÉΓòÉΓòÉ 8.1.4. /alignData ΓòÉΓòÉΓòÉ
/alignData
Syntax: /alignData:<num>
Specify the alignment to be given to data in the processed code object.
Default alignment is 4.
Use of alignment is not just a performance issue; it may be vital for some
types of program. See restrictions for more information.
Alignment of data is also affected by the more general /align option.
Code reached via pointer values in the processed code object may be aligned
using the /alignCode option.
ΓòÉΓòÉΓòÉ 8.1.5. /arrange ΓòÉΓòÉΓòÉ
/arrange
Syntax: /arrange
Generate new code arrangement based on results of previous recording
session(s). Alternative is /prep.
For an explanation of the arrangement process see Creating Optimised Code
Arrangement.
ΓòÉΓòÉΓòÉ 8.1.6. /base ΓòÉΓòÉΓòÉ
/base
Syntax: /base:<hex base address> (default - See below)
Specify the new base address for the application file processed.
By default EXE files will automatically be based at address 00010000H.
By default DLL files will retain any existing base address. If not previously
based the DLL is given a default base of 00800000H.
ΓòÉΓòÉΓòÉ 8.1.7. /buff ΓòÉΓòÉΓòÉ
/buff
Syntax: /buff:<size> (default 1024Kb)
Specify the size, in Kb, of the buffer to be used while recording.
Code references are stored in the buffer while recording. Once the buffer is
full it must be written to disk. Writing is done as part of the application
instance/thread being recorded.
If your application is timing sensitive adjusting this value may remove timing
problems while recording. Valid values are from 4Kb to the maximum allocatable
amount of virtual memory.
ΓòÉΓòÉΓòÉ 8.1.8. /cLineSize ΓòÉΓòÉΓòÉ
/cLineSize
Syntax: /cLineSize:<cache_line_size> (default 16)
Specifies the cache line size (and hence alignment) to be assumed for CPU cache
optimisation. Default is 16 bytes.
Only code which cannot be reached after executing the preceding instruction in
memory is affected by this option. A typical example of such code is the first
instruction of a procedure in a high level language.
Together with /cLineWaste this option controls the performance related
alignment of code within the processed code object. In combination these
options allow the developer to influence CPU cache efficiency.
See CPU Instruction Caching for a discussion of the issues involved.
ΓòÉΓòÉΓòÉ 8.1.9. /cLineWaste ΓòÉΓòÉΓòÉ
/cLineWaste
Syntax: /cLineWaste:<cache_waste_bytes> (default 3)
Specifies the number of bytes of a CPU cache line the developer is willing to
waste to ensure that code reached only via transfer instructions aligns on a
cache line boundary. Default is 3 bytes.
Only code which cannot be reached after executing the preceding instruction in
memory is affected by this option. A typical example of such code is the first
instruction of a procedure in a high level language.
Together with /cLineSize this option controls the performance related alignment
of code within the processed code object. In combination these options allow
the developer to influence CPU cache efficiency.
See CPU Instruction Caching for a discussion of the issues involved.
ΓòÉΓòÉΓòÉ 8.1.10. /compress ΓòÉΓòÉΓòÉ
/compress
Syntax: /compress:<comp_distance> (default 50)
Compress recording by only recording code references not used within the last
'comp_distance' recorded items.
As each use of code is detected the recorder checks to see how recently this
code was last recorded. If within 'comp_distance' the use of the code is not
recorded. This significantly reduces the size of the recording file saving
disk space and the time taken to write the data.
Compressed recording has a higher CPU and memory overhead but this overhead is
unaffected by the size of 'comp_distance'.
Performance of the stat arrangement algorithm is progressively degraded by
increasing values of 'comp_distance'. Performance of the other arrangement
algorithms is relatively unaffected by compression.
The frequency with which recordings are flushed to disk may be altered by the
/buff option.
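The filtering rule described above can be sketched in Python as follows. This is an illustration of the idea only: the data structures and names are assumptions, chosen so that the cost of each check does not depend on 'comp_distance', as the text states for the real recorder.

```python
from collections import deque


def compress_trace(refs, comp_distance=50):
    """Drop code references already among the last comp_distance recorded items."""
    recent = deque()    # last comp_distance recorded items, oldest first
    in_window = set()   # the same items, for constant time lookup
    recorded = []
    for ref in refs:
        if ref in in_window:
            continue    # used recently: do not record it again
        recorded.append(ref)
        recent.append(ref)
        in_window.add(ref)
        if len(recent) > comp_distance:
            in_window.discard(recent.popleft())
    return recorded


# With a window of 2, block 1 is recorded again once blocks 2 and 3 displace it.
short_window = compress_trace([1, 2, 1, 2, 3, 1], comp_distance=2)
```

A larger window discards more repeated references, which is consistent with the loss of detail that degrades the stat algorithm as 'comp_distance' grows.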
ΓòÉΓòÉΓòÉ 8.1.11. /disasm ΓòÉΓòÉΓòÉ
/disasm
Syntax: /disasm:<hex offset> (default 0)
Perform disassembly beginning at the given offset into the code object.
For performance reasons the disassembly does not take advantage of the code
identification features of LXOPT but performs a 'blind' disassembly as
typically provided by a debugging tool.
The default number of bytes to disassemble is 50 but can be altered via the
/dislen option.
ΓòÉΓòÉΓòÉ 8.1.12. /dislen ΓòÉΓòÉΓòÉ
/dislen
Syntax: /dislen:<count> (default 50)
Set the number of bytes to disassemble when using the /disasm option.
The locations are given and provided as hexadecimal offsets into the processed
code object. Offsets within other objects are not affected by LXOPT. Note
that translated offsets of transfer instructions may point to a different
(possibly zero length) transfer instruction sequence in the original executable
file.
The corresponding /getNew option provides the reverse of this translation.
ΓòÉΓòÉΓòÉ 8.1.13. /forceIntFix ΓòÉΓòÉΓòÉ
/forceIntFix
Syntax: /forceIntFix (default OFF)
The setting of the internal fixups flag in the executable file header has
proven to vary between development tools. LXOPT now ignores the header flag
and searches for internal fixups within the fixup section. This option
disables this test and forces LXOPT to assume that all required internal fixups
are present in the file.
This option is provided to cover the rare but theoretically possible situation
where a file (most probably a code only DLL) does not require a single internal
fixup. Inappropriate use of this option will cause the produced recording
version of the application to fail with an access violation. DO NOT USE THIS
OPTION UNLESS YOU ARE SURE YOUR EXE/DLL FILE DOES NOT CONTAIN A SINGLE DIRECT
MEMORY REFERENCE.
ΓòÉΓòÉΓòÉ 8.1.14. /getOld ΓòÉΓòÉΓòÉ
/getOld
Syntax: /getOld:<hex_offset>
Find the original position of the code/data that is located at the given offset
into the processed code object in the optimised executable file.
Locations are both accepted and reported as hexadecimal offsets into the processed
code object. Offsets within other objects are not affected by LXOPT. Note
that LXOPT alters, inserts and removes transfer instructions (JMP, JNZ etc) as
part of the optimisation process. When using the offsets of transfer
instructions the translated offset may point to a different instruction
sequence but will represent the same logical position in the path of execution.
The corresponding /getNew option provides the reverse of this translation.
ΓòÉΓòÉΓòÉ 8.1.15. /getNew ΓòÉΓòÉΓòÉ
/getNew
Syntax: /getNew:<hex offset>
Find the new position of the code/data that was located at the given offset
into the processed code object in the original executable file.
Locations are both accepted and reported as hexadecimal offsets into the
processed code object.
Offsets within other objects are not affected by LXOPT. Note that unused code
or data from the original version of the executable may be removed by LXOPT
optimisation and so will have no corresponding new offset.
Translated offsets of transfer instructions may point to a different transfer
instruction sequence or the transfer sequence may have been optimised out
entirely.
The /getOld option provides the reverse of this translation process.
ΓòÉΓòÉΓòÉ 8.1.16. /groups ΓòÉΓòÉΓòÉ
/groups
Syntax: /groups:<group_count> (default 3)
Specify the number of distribution groups into which to divide code blocks when
using the stat algorithm.
The optimum value for this setting is application specific and may require
experimentation to achieve the best result.
ΓòÉΓòÉΓòÉ 8.1.17. /ignoreMsgSeg ΓòÉΓòÉΓòÉ
/ignoreMsgSeg
Syntax: /ignoreMsgSeg
WARNING: USE OF THIS OPTION WILL CAUSE PROCESSED CODE TO FAIL ON
THE FIRST ATTEMPT TO RETRIEVE DosGetMessage DATA.
Force LXOPT to ignore DosGetMessage data within application code. This data
bypasses normal fixup referencing and is incompatible with LXOPT processing.
To avoid problems with DosGetMessage data refer to the instructions given with
the LXO0169 error. Only use the /ignoreMsgSeg option if you are unable to
relink the exe/dll.
The DosGetMessage data is detected by searching for a (0xFF,"_MSGSEG") byte
sequence within the processed code. This option should only be used where the
detection is believed to be incorrect (BEWARE - DosGetMessage is often
included and used by library code) or if it is known that code logic prevents
any calls to DosGetMessage.
ΓòÉΓòÉΓòÉ 8.1.18. /lxinfo ΓòÉΓòÉΓòÉ
/lxinfo
Syntax: /lxinfo:<path>
Specify path name for information gathered from source.
The default is to use the application name with a .LXI extension. Use this
option to place the file in another location.
This option must be consistent across use of the /prep and /arrange options.
ΓòÉΓòÉΓòÉ 8.1.19. /noreport ΓòÉΓòÉΓòÉ
/noreport
Syntax: /noreport
Do not generate comparative report on old and new code arrangements. See the
/report option for a description of report generation.
ΓòÉΓòÉΓòÉ 8.1.20. /orig ΓòÉΓòÉΓòÉ
/orig
Syntax: /orig:<path>
Specify path name to which the original application file should be renamed. If
this file already exists it is deleted.
Default is to use the application name with a .ORI extension.
This option must be consistent across use of the /prep and /arrange options.
ΓòÉΓòÉΓòÉ 8.1.21. /overwrite ΓòÉΓòÉΓòÉ
/overwrite
Syntax: /overwrite
Disable test which checks that original EXE/DLL file has not been updated
before overwriting it after a new code arrangement has been generated.
During arrangement LXOPT will normally check that an application file has not
changed since it was prepared. If it has, confirmation is sought that the file
may be overwritten. This option switches off that test and always overwrites
the original file.
This option is only relevant when used with the /arrange option.
ΓòÉΓòÉΓòÉ 8.1.22. /pack2 ΓòÉΓòÉΓòÉ
/pack2
Syntax: /pack2[:<reduction threshold% (default 15)>] (default auto)
Force the use of executable file page compression as introduced with OS/2 WARP.
Each page in the file is compressed and the reduction in size calculated. If
the percentage reduction is greater than 15% the compressed form of the page is
used. The threshold of 15% may be varied by providing the new threshold to be
used as a numeric parameter. FILES USING PAGE LEVEL COMPRESSION WILL NOT
EXECUTE ON VERSIONS OF OS/2 PRIOR TO WARP.
To preserve the operating system version compatibility of the existing software
LXOPT will only use page compression if the file already contains compressed
pages or if the /pack2 option is specified. All pages in the file including
those containing data and resources are candidates for compression.
The use of /pack2:100 will remove all compression from the resulting
executable.
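The per page decision described above can be sketched in Python. Note that zlib is used here purely as a stand-in compressor; OS/2 page compression uses its own algorithm, and the function name is hypothetical.

```python
import zlib

PAGE_SIZE = 4096


def choose_page_form(page, threshold_percent=15):
    """Return the compressed page if it saves more than the threshold, else the original."""
    packed = zlib.compress(page)  # stand-in for the OS/2 page compressor
    reduction = 100.0 * (len(page) - len(packed)) / len(page)
    return packed if reduction > threshold_percent else page


# A page of zeros compresses well, so the compressed form is chosen.
zero_page = bytes(PAGE_SIZE)
stored = choose_page_form(zero_page)
```

Since no page can shrink by more than 100%, a threshold of 100 never selects the compressed form, which matches the behaviour given for /pack2:100.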
ΓòÉΓòÉΓòÉ 8.1.23. /preload ΓòÉΓòÉΓòÉ
/preload
Syntax: /preload
Allow the produced EXE/DLL file to be used with the Preload Utility. This
option should be used in combination with the /arrange option.
Preloading raises important development and performance issues. See the
section on Preloading Code for a discussion of when and how to use files
prepared with this option.
Use of /preload causes LXOPT to intercept EXE/DLL initialisation and insert a
small preloading code stub of approximately 200 bytes. The resulting file may
be used normally or preloaded using the Preload Utility. Files with 16 bit
initialisation entry points may not be preloaded.
The inserted code stub is activated by the Preload Utility when a request is
made to preload the file.
ΓòÉΓòÉΓòÉ 8.1.24. /prep ΓòÉΓòÉΓòÉ
/prep
Syntax: /prep (default)
Prepare the given exe/dll for a recording session. Alternative is /arrange.
For an explanation of the preparation process see Creating a Recording Version.
ΓòÉΓòÉΓòÉ 8.1.25. /recfile ΓòÉΓòÉΓòÉ
/recfile
Syntax: /recfile:<path>
Specify path name of recording file. If the file already exists when recording
starts the new recording is appended.
The path name is used exactly as entered. A relative path name will be used to
open/create the recording file if one is specified. For example,
/recfile:MYAPP.REC will create the recording file MYAPP.REC in the current
directory of the executing program. This can be useful if recording is to be
performed on other machines.
The default is to use the full application path name with a .REC extension.
Moving data from the recording buffer to the recording file can form a
significant part of the overhead of the recording process. Recording files
should ideally be placed on the fastest available local drive.
Certain applications and DLLs have more than one instance while running. When
multiple instances are detected the recording DLL creates a new recording file
for each instance. The name for this file is generated by appending the
instance number separated by a dot to the normal recording file name. Thus 3
instances of a DLL during recording might create:
MYDLL.REC
MYDLL.REC.2
MYDLL.REC.3
If recording onto a FAT based partition you will need to use a /recfile path
without an extension to allow production of valid 8.3 filenames.
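The naming rule for per instance recording files can be expressed as a one line Python function. The function name is hypothetical; the rule itself follows the MYDLL.REC example above.

```python
def instance_recfile(base, instance):
    """Recording file name for a given instance number (1 is the first instance)."""
    return base if instance == 1 else f"{base}.{instance}"


names = [instance_recfile("MYDLL.REC", n) for n in (1, 2, 3)]
fat_names = [instance_recfile("MYDLL", n) for n in (1, 2, 3)]
```

With a base name that has no extension the instance number becomes the extension (MYDLL, MYDLL.2, MYDLL.3), which is why such a path yields valid 8.3 names on FAT.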
When using the /arrange option only one recfile may be specified. To use
multiple files concatenate them into one file with the OS/2 COPY command and
specify the result as the recording file (e.g. COPY C:\RECDIR\*.REC*
COMBINED.REC). File concatenation also allows multiple recordings to be
created at different times or on different machines and then combined before
use.
ΓòÉΓòÉΓòÉ 8.1.26. /reckeep ΓòÉΓòÉΓòÉ
/reckeep
Syntax: /recKeep:<path>
Specify path name to which the previously generated recording application
should be renamed. If this file already exists it is deleted.
When a new arrangement is created LXOPT keeps the recording version of the
application to allow further recording if required.
Valid only with the /arrange option, this option allows the user to specify the
path name to which the recording version should be renamed.
Default is to use application path name with a .PRP extension.
ΓòÉΓòÉΓòÉ 8.1.27. /report ΓòÉΓòÉΓòÉ
/report
Syntax: /report (default)
Show comparative page fault report using old and new arrangements.
LXOPT can use the recording file to simulate program page fault behaviour under
various available memory conditions. This is used to provide a comparison
between the old and new code arrangements.
Note: LXOPT can only simulate code accesses. If data is stored within the code
object page faults caused by references to it will not be simulated.
While informative a report may take some time to calculate and may be disabled
by the alternative /noreport option.
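To give a feel for the kind of simulation involved, the following Python sketch replays a recorded sequence of code block uses against a layout and counts page faults under an LRU replacement policy. It is a simplified model under stated assumptions, not LXOPT's report code: blocks are assumed not to span pages, and all names and offsets are invented for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


def count_page_faults(trace, layout, resident_pages):
    """Count page faults for a trace of block names.

    trace:          block names in order of use
    layout:         block name -> offset within the code object
    resident_pages: number of code pages that fit in memory at once
    """
    resident = OrderedDict()  # page number -> None, least recently used first
    faults = 0
    for block in trace:
        page = layout[block] // PAGE_SIZE
        if page in resident:
            resident.move_to_end(page)
        else:
            faults += 1
            if len(resident) >= resident_pages:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = None
    return faults


trace = ["a", "b", "c"] * 3
old_layout = {"a": 0, "b": 5000, "c": 9000}  # three separate pages
new_layout = {"a": 0, "b": 100, "c": 200}    # grouped on one page
old_faults = count_page_faults(trace, old_layout, resident_pages=2)
new_faults = count_page_faults(trace, new_layout, resident_pages=2)
```

Running the same trace with several values of resident_pages gives the comparison under "various available memory conditions" that the report describes.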
ΓòÉΓòÉΓòÉ 8.1.28. /thread ΓòÉΓòÉΓòÉ
/thread
Syntax: /thread:<single | multi | async | crit>
Specifies how the target application uses threads. This information is
used to decide if or how the LXOPT recorder should protect itself from
interruption. Users need not use this option unless they experience
performance problems while recording.
single
Target application is single threaded. Recording code does not need to guard
against interruption by another thread. This involves the least overhead
during recording and should be used whenever possible.
multi
Target application is multi-threaded but does not terminate/suspend threads
asynchronously. Recording code guards against interruption by another thread
but assumes that no thread halts execution of another by use of
DosKillThread(otherThread) or DosSuspendThread(otherThread). This option
carries a slightly greater overhead during recording than the 'single' option.
async
Target application is multi-threaded and may terminate/suspend threads
asynchronously. Recording code ensures that it is not halted inside critical
instruction sequences. This involves a higher overhead during recording but
does not allow for multiple threads of differing priorities.
crit (Default)
Target application is multi-threaded, may terminate/suspend threads
asynchronously and may vary thread priorities. Recording code ensures that no
thread can ever interrupt critical instruction sequences. This involves a
higher overhead during recording but is the safest option.
Use the option with the least overhead that your application allows. If you
are unsure which option is correct for your application use 'crit'. If a thread
safety level lower than required is used your application may lock-up, deadlock
on semaphores or fail during recording.
The /thread option takes effect only with the /prep option.
ΓòÉΓòÉΓòÉ 8.2. Glossary ΓòÉΓòÉΓòÉ
Glossary
ΓòÉΓòÉΓòÉ 8.2.1. Binary Arrangement Algorithm ΓòÉΓòÉΓòÉ
Binary Arrangement Algorithm
An algorithm to calculate a new code layout based on code usage information
obtained while recording.
The binary arrangement algorithm is the default LXOPT arrangement algorithm.
The recording history is used to generate a binary pattern with bits set
corresponding to time of use in the executing application. A pattern is
created for each code block within the application. Blocks are then grouped
together based on similarity of these patterns. For many applications the
binary algorithm will produce the best results.
Algorithms are chosen by use of the /alg option.
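The idea of grouping blocks by usage pattern can be sketched in Python. Everything specific here is an assumption made for illustration: the number of time slices, the use of Hamming distance as the similarity measure and the greedy nearest neighbour ordering are not taken from LXOPT.

```python
def usage_patterns(trace, slices=8):
    """Map each block to an integer whose bit i is set if it ran in time slice i."""
    patterns = {}
    for position, block in enumerate(trace):
        slice_index = position * slices // len(trace)
        patterns[block] = patterns.get(block, 0) | (1 << slice_index)
    return patterns


def arrange_by_similarity(patterns):
    """Greedy ordering: each block is followed by the most similar remaining block."""
    if not patterns:
        return []
    remaining = dict(patterns)
    # Start with the block whose earliest use comes first (lowest set bit).
    current = min(remaining, key=lambda b: (remaining[b] & -remaining[b], b))
    order = [current]
    last = remaining.pop(current)
    while remaining:
        # Hamming distance between patterns; ties are broken by block name.
        current = min(remaining,
                      key=lambda b: (bin(remaining[b] ^ last).count("1"), b))
        order.append(current)
        last = remaining.pop(current)
    return order


trace = ["open", "parse", "open", "parse", "draw", "print", "draw", "print"]
patterns = usage_patterns(trace, slices=4)
order = arrange_by_similarity(patterns)
```

In this example 'open' and 'parse' share one pattern and 'draw' and 'print' share another, so each pair ends up adjacent in the new order.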
ΓòÉΓòÉΓòÉ 8.2.2. Firstuse Arrangement Algorithm ΓòÉΓòÉΓòÉ
Firstuse Arrangement Algorithm
An algorithm to calculate a new code layout based on code usage information
obtained while recording.
The firstuse algorithm arranges code on disk in the order in which it is first
executed. This is a simple but effective algorithm which is best used on small
applications and those with a simple flow of control.
Algorithms are chosen by use of the /alg option.
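The ordering rule is simple enough to show in a few lines of Python. This is a minimal sketch with a hypothetical function name; it covers only the ordering of executed blocks and leaves aside the parking of unused code.

```python
def first_use_order(trace):
    """Return block names in the order in which they are first used."""
    # dict preserves insertion order, so duplicates keep their first position.
    return list(dict.fromkeys(trace))


order = first_use_order(["init", "draw", "init", "save", "draw"])
```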
ΓòÉΓòÉΓòÉ 8.2.3. Fixup ΓòÉΓòÉΓòÉ
Fixup
When an application is loaded any absolute references to other parts of the
application and all external references such as calls to the operating system
must be resolved.
For this to be achieved information must be retained in the exe/dll file
describing the references which need to be 'fixed'. This information, known as
'relocations' or 'fixups', allows code to be located at a position in memory
chosen by the operating system. Fixups are automatically applied as each code
(or data) page is loaded from the executable file.
ΓòÉΓòÉΓòÉ 8.2.4. Page Fault ΓòÉΓòÉΓòÉ
Page Fault
All code for normal applications is divided into 4Kb pages. A page fault is
generated when an attempt is made to reference an address within a page that is
not already in memory (*).
When a page fault is generated OS/2 must load the page from disk, apply fixups
to the page and return control to the application. All application code is
loaded via page faults. When an application is started OS/2 does not
immediately read any application code from the disk (**). Code is loaded into
memory only as a result of page faults generated as the path of execution
strays on to each new page. Execution of the errant thread is suspended until
the page is loaded and available for use. It is this effect which often causes
the system to appear slow or 'jerky', most noticeably at application start-up
or after the use of a large memory hungry application.
The problem is compounded in low memory conditions where OS/2 is forced to
release the contents of one memory page to provide space for another. If the
original page is later referenced it must be reloaded, again taking space from
another. Servicing page faults can represent a significant overhead for both
your application and the system as a whole.
LXOPT optimises your code layout to minimise the number of page faults required
to run your application.
* Not all page faults result in a disk access. OS/2 will mark pages 'not
present' and use resulting page faults to detect page references. This
mechanism helps to prevent repeatedly accessed pages being removed from memory.
These 'artificial' page faults have only a small CPU impact: the page is found
in memory and will not be read from disk.
** OS/2 V2.0 and later ignore EXE/DLL preload pages created by use of the
PRELOAD keyword used in module definition files.
ΓòÉΓòÉΓòÉ 8.2.5. Parked Code ΓòÉΓòÉΓòÉ
Parked Code
Code that is logically reachable but is not normally executed by your
application. Such code is detected by LXOPT and placed or 'parked' in your
application's executable file on code pages where it causes the least overhead.
See Creating Optimised Code Arrangement for more details.
ΓòÉΓòÉΓòÉ 8.2.6. ParkOnly Arrangement Algorithm ΓòÉΓòÉΓòÉ
ParkOnly Arrangement Algorithm
An algorithm to calculate a new code layout based on code usage information
obtained while recording.
The parkonly arrangement algorithm does not attempt to perform any code
arrangement other than parking any unused code.
This algorithm is intended to allow developers with special code arrangement
needs to retain control over the arrangement of executed code while still
gaining the benefit of code parking. This algorithm is not intended for normal
use and should only be chosen by developers who have created manual code
layouts to match their special requirements.
Algorithms are chosen by use of the /alg option.
ΓòÉΓòÉΓòÉ 8.2.7. Recording File ΓòÉΓòÉΓòÉ
Recording File
A file generated during the recording phase of LXOPT usage. This file is
created and expanded as your application runs, recording the paths of execution
through your code.
See /recfile for more information and how to specify the recording file path.
ΓòÉΓòÉΓòÉ 8.2.8. Sleeping Code ΓòÉΓòÉΓòÉ
Sleeping Code
Code not normally executed by your application. Such code is typically used to
handle unusual situations not encountered by your application during recording.
This code may also be logically unreachable.
LXOPT can place such code in your application's executable file in a location
where it causes the least overhead while remaining accessible.
See Creating Optimised Code Arrangement for more details.
ΓòÉΓòÉΓòÉ 8.2.9. Statistical Arrangement Algorithm ΓòÉΓòÉΓòÉ
Statistical Arrangement Algorithm
An algorithm to calculate a new code layout based on code usage information
obtained while recording.
The stat algorithm uses statistics to detect the pattern of use of each code
sequence throughout the life of the application. Code sequences are grouped by
this pattern and then arranged within the group based on the most frequently
used ordering. The optimum number of groups into which to split the code is
dependent on your application and is specified with the /groups option.
Experiment with this option using different group values to find the optimum
value for your application. This is typically between 1 and 20.
Performance of the stat algorithm is degraded when using high values for the
/compress option.
The stat algorithm is best suited to large applications or those with a complex
flow of control. Experiment with values for the /groups option to achieve the
best results.
Algorithms are chosen by use of the /alg option.
ΓòÉΓòÉΓòÉ 8.3. Program Messages ΓòÉΓòÉΓòÉ
Program Messages
ΓòÉΓòÉΓòÉ 8.3.1. Internal Errors ΓòÉΓòÉΓòÉ
Internal Errors
The LXOPT software performs numerous internal consistency checks during its
operation. If one of these tests should fail an internal error is produced.
If you have not read the restrictions section please do so now, as it is likely
that some code has violated a restriction. Try moving suspect code out of the
processed code object (put it in a separate named segment) and reattempt LXOPT
processing.
ΓòÉΓòÉΓòÉ 8.3.2. LXO0100 ΓòÉΓòÉΓòÉ
LXO0100
Failed to expand program fixup records.
LXOPT failed to expand the application's fixup records. Check that the
application file is not corrupt (e.g. run exehdr on it).
ΓòÉΓòÉΓòÉ 8.3.3. LXO0101 ΓòÉΓòÉΓòÉ
LXO0101
Unable to create dump file.
Dump files are used for debugging purposes and are created with the name "dump"
in the current directory of the application.
ΓòÉΓòÉΓòÉ 8.3.4. LXO0102 ΓòÉΓòÉΓòÉ
LXO0102
Unable to pursue code instruction sequence.
LXOPT was unable to pursue a sequence of bytes which it assumed to be code.
This may be caused by exporting data from your application's code segment or by
an internal error in LXOPT.
To rectify export problems move any exported data to a separate object.
To rectify code recognition problems move read-only application data out of
your code object to a read-only data object.
ΓòÉΓòÉΓòÉ 8.3.5. LXO0103 ΓòÉΓòÉΓòÉ
LXO0103
Preparation of new code arrangement failed.
An error occurred preparing the structures for the new application. Please
ensure your original application file is valid.
ΓòÉΓòÉΓòÉ 8.3.6. LXO0104 ΓòÉΓòÉΓòÉ
LXO0104
Failed to create/write LXI file.
A disk error occurred creating or writing the .LXI file. Check that sufficient
disk space is available and that the /lxinfo option is used correctly.
ΓòÉΓòÉΓòÉ 8.3.7. LXO0105 ΓòÉΓòÉΓòÉ
LXO0105
Cannot copy program to "pathname"
Failed to copy the original application file. Check the /orig option
parameter. If the file is a DLL check that it is not currently loaded by
another application.
ΓòÉΓòÉΓòÉ 8.3.8. LXO0106 ΓòÉΓòÉΓòÉ
LXO0106
Cannot create file "pathname"
Failed to create the given file. Check the path is valid, there is sufficient
free disk space, the file is not in use and the required access rights to the
target directory are available.
ΓòÉΓòÉΓòÉ 8.3.9. LXO0107 ΓòÉΓòÉΓòÉ
LXO0107
Failed to write new program file.
A failure occurred while writing the new executable. Check disk space.
ΓòÉΓòÉΓòÉ 8.3.10. LXO0108 ΓòÉΓòÉΓòÉ
LXO0108
Failed to read LXI file "pathname"
The LXI file could not be read or is invalid. Check use of the /lxinfo option.
ΓòÉΓòÉΓòÉ 8.3.11. LXO0109 ΓòÉΓòÉΓòÉ
LXO0109
Invalid parameter or value for option: "option".
The option or its parameter given to LXOPT was not recognised.
ΓòÉΓòÉΓòÉ 8.3.12. LXO0110 ΓòÉΓòÉΓòÉ
LXO0110
Failed to open "pathname"
Could not open the given file. Check that the pathname is valid, the file
exists and you have sufficient access rights to its directory.
ΓòÉΓòÉΓòÉ 8.3.13. LXO0111 ΓòÉΓòÉΓòÉ
LXO0111
File "pathname" is not a valid linear executable
Could not load the specified file. The file is corrupt or is not an OS/2
2.x/WARP Linear eXecutable file.
ΓòÉΓòÉΓòÉ 8.3.14. LXO0112 ΓòÉΓòÉΓòÉ
LXO0112
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.15. LXO0113 ΓòÉΓòÉΓòÉ
LXO0113
Input offset is out of range.
The supplied offset does not exist. This message appears if an incorrect
parameter is used with the /getOld or /getNew options. Check that the value
given is the offset in hex from the start of the processed code object.
If this message appears during LXOPT preparation or arrangement this is an
Internal Error.
ΓòÉΓòÉΓòÉ 8.3.16. LXO0114 ΓòÉΓòÉΓòÉ
LXO0114
See LXO0113
ΓòÉΓòÉΓòÉ 8.3.17. LXO0115 ΓòÉΓòÉΓòÉ
LXO0115
See LXO0113
ΓòÉΓòÉΓòÉ 8.3.18. LXO0116 ΓòÉΓòÉΓòÉ
LXO0116
See Internal Errors
Although an internal error, this message has only previously occurred where a
user is not following the advice in the Urgent Message section. This is
important advice; do not ignore it.
ΓòÉΓòÉΓòÉ 8.3.19. LXO0117 ΓòÉΓòÉΓòÉ
LXO0117
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.20. LXO0118 ΓòÉΓòÉΓòÉ
LXO0118
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.21. LXO0119 ΓòÉΓòÉΓòÉ
LXO0119
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.22. LXO0120 ΓòÉΓòÉΓòÉ
LXO0120
Information (.LXI) file was not created by this version of LXOPT.
The LXI file you have attempted to use was created by another version of LXOPT.
Applications must be prepared and arranged using the same version of LXOPT.
Prepare and generate new recording information for the application using the
current version of the LXOPT software.
ΓòÉΓòÉΓòÉ 8.3.23. LXO0121 ΓòÉΓòÉΓòÉ
LXO0121
Recording (.REC) file does not belong to the processed application.
Your recording file was not made by the application being processed.
Check that you are attempting to arrange the correct application and that you
are not using a recording file intended for an earlier version or a different
application.
See also the /recfile option.
ΓòÉΓòÉΓòÉ 8.3.24. LXO0122 ΓòÉΓòÉΓòÉ
LXO0122
.REC file is not valid.
The given recording file has been corrupted or is not a recording file. Check
use of the /recfile option parameter.
ΓòÉΓòÉΓòÉ 8.3.25. LXO0123 ΓòÉΓòÉΓòÉ
LXO0123
Unrecognised arrangement algorithm "alg-name"
The given arrangement algorithm name is not recognised. See the /alg option for
more details.
ΓòÉΓòÉΓòÉ 8.3.26. LXO0124 ΓòÉΓòÉΓòÉ
LXO0124
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.27. LXO0125 ΓòÉΓòÉΓòÉ
LXO0125
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.28. LXO0126 ΓòÉΓòÉΓòÉ
LXO0126
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.29. LXO0127 ΓòÉΓòÉΓòÉ
LXO0127
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.30. LXO0128 ΓòÉΓòÉΓòÉ
LXO0128
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.31. LXO0129 ΓòÉΓòÉΓòÉ
LXO0129
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.32. LXO0130 ΓòÉΓòÉΓòÉ
LXO0130
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.33. LXO0131 ΓòÉΓòÉΓòÉ
LXO0131
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.34. LXO0132 ΓòÉΓòÉΓòÉ
LXO0132
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.35. LXO0133 ΓòÉΓòÉΓòÉ
LXO0133
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.36. LXO0134 ΓòÉΓòÉΓòÉ
LXO0134
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.37. LXO0135 ΓòÉΓòÉΓòÉ
LXO0135
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.38. LXO0136 ΓòÉΓòÉΓòÉ
LXO0136
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.39. LXO0137 ΓòÉΓòÉΓòÉ
LXO0137
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.40. LXO0138 ΓòÉΓòÉΓòÉ
LXO0138
Code object exports a 286 call gate entry point.
LXOPT has detected an exported 16 bit call gate entry which refers to the 32
bit code object which is being processed. It is likely that when control is
received via this entry point some 16 bit selectors will be in effect.
LXOPT is designed to be applied to 32 bit code. You cannot export 16 bit call
gate entry points from the processed 32 bit code object.
ΓòÉΓòÉΓòÉ 8.3.41. LXO0139 ΓòÉΓòÉΓòÉ
LXO0139
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.42. LXO0140 ΓòÉΓòÉΓòÉ
LXO0140
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.43. LXO0141 ΓòÉΓòÉΓòÉ
LXO0141
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.44. LXO0142 ΓòÉΓòÉΓòÉ
LXO0142
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.45. LXO0143 ΓòÉΓòÉΓòÉ
LXO0143
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.46. LXO0144 ΓòÉΓòÉΓòÉ
LXO0144
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.47. LXO0145 ΓòÉΓòÉΓòÉ
LXO0145
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.48. LXO0146 ΓòÉΓòÉΓòÉ
LXO0146
Out of memory.
LXOPT ran out of memory while processing the application. Up to approximately
15 times the application code size will need to be allocated for
processing to succeed. Check that there is sufficient free disk space on your
swap partition.
ΓòÉΓòÉΓòÉ 8.3.49. LXO0147 ΓòÉΓòÉΓòÉ
LXO0147
Failed to follow code pointer.
See LXO0102
ΓòÉΓòÉΓòÉ 8.3.50. LXO0148 ΓòÉΓòÉΓòÉ
LXO0148
No objects in program module.
The processed application does not contain any code.
ΓòÉΓòÉΓòÉ 8.3.51. LXO0149 ΓòÉΓòÉΓòÉ
LXO0149
No pages in program module.
The processed application does not contain any code.
ΓòÉΓòÉΓòÉ 8.3.52. LXO0150 ΓòÉΓòÉΓòÉ
LXO0150
Unused area "start address" to "end address" contains fixups.
The bytes between the start and end offsets have been detected as unused.
However they represent either valid code or initialised data.
This warning will appear for almost all applications processed and demonstrates
LXOPT's effectiveness at finding unused code or data. Often such code is
contained in libraries to which your application is linked. Offsets are
relative to the start of the processed code object. Unused code or data is not
referenced by any part of the application and is removed by LXOPT.
If your application contains assembler code that you suspect might be violating
LXOPT restrictions you can use the address range and your map file to identify
the source code.
ΓòÉΓòÉΓòÉ 8.3.53. LXO0151 ΓòÉΓòÉΓòÉ
LXO0151
Code object "object-number" is writeable.
The given code object is writeable. Although LXOPT can still process the
application, writeable code objects can indicate that programming techniques
that violate LXOPT restrictions may be in use. This is particularly true if the
object named is the one being processed.
ΓòÉΓòÉΓòÉ 8.3.54. LXO0152 ΓòÉΓòÉΓòÉ
LXO0152
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.55. LXO0153 ΓòÉΓòÉΓòÉ
LXO0153
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.56. LXO0154 ΓòÉΓòÉΓòÉ
LXO0154
Base not multiple of 64Kb.
The specified new program base is not a multiple of 64Kb. This warning normally
appears when the new base has been incorrectly specified. Base values are
entered as an address in hex.
See the /base option for correct usage.
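The test behind this warning can be sketched as follows. This is an
illustration only, not LXOPT's own code; the function name is_valid_base is
hypothetical and the value is assumed to be supplied as a hex string.

```python
def is_valid_base(base_hex):
    """Return True if a base address given in hex is a multiple of 64Kb."""
    base = int(base_hex, 16)        # base values are entered in hex
    return base % 0x10000 == 0      # 0x10000 bytes = 64Kb

print(is_valid_base("10000"))   # hex 10000 is decimal 65536
print(is_valid_base("12345"))
```

A common mistake is to type a decimal value such as 65536, which is read as
hex 65536 and is not a multiple of 64Kb.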
ΓòÉΓòÉΓòÉ 8.3.57. LXO0155 ΓòÉΓòÉΓòÉ
LXO0155
Module is not standard EXE or DLL.
The application file is of a type not processed by LXOPT. Typically this
message is produced by an attempt to apply LXOPT to a physical or virtual
device driver.
ΓòÉΓòÉΓòÉ 8.3.58. LXO0156 ΓòÉΓòÉΓòÉ
LXO0156
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.59. LXO0157 ΓòÉΓòÉΓòÉ
LXO0157
Internal fixups have been removed.
The EXE file has been given a base address. Applying a base address when
linking strips internal fixups from EXE files. LXOPT needs internal fixups to
enable it to correctly process your application.
See Preparing Applications for more information.
ΓòÉΓòÉΓòÉ 8.3.60. LXO0158 ΓòÉΓòÉΓòÉ
LXO0158
File contains an unknown page type.
The processed file contains code/data encoded in a manner unknown to LXOPT or
OS/2 V2.1. This is probably due to the use of a new linker or other
development tool.
To rectify the problem choose options with the new linker/tool that will allow
the resulting application to run under OS/2 V2.x or WARP. Check also that you
are using the latest available version of LXOPT.
ΓòÉΓòÉΓòÉ 8.3.61. LXO0159 ΓòÉΓòÉΓòÉ
LXO0159
Failed to decompress EXEPACK2 page, file corrupted?.
The processed file contains code or data identified as compressed using the
technique introduced by OS/2 V3.0 (Warp). A compressed page did not expand
correctly.
It is likely that the file is corrupted. Please regenerate any files created
via the resource compiler -X2 option and relink the application.
ΓòÉΓòÉΓòÉ 8.3.62. LXO0160 ΓòÉΓòÉΓòÉ
LXO0160
Information (.LXI) file does not belong to the processed application.
Your information file was not made by the application being processed.
Check that you are attempting to arrange the correct application and that you
are not using an information file intended for an earlier version or a
different application.
See also the /lxinfo option.
ΓòÉΓòÉΓòÉ 8.3.63. LXO0161 ΓòÉΓòÉΓòÉ
LXO0161
The preload option or LXOPT demo version cannot process files with 16 bit
library initialisation code.
The /preload option needs to alter library initialisation to ensure the DLL is
correctly loaded. Your DLL contains a 16 bit entry point and LXOPT is unable to
process the initialisation sequence.
To use the /preload option with this DLL you must move your library
initialisation to 32 bit code. Alternatively do not use /preload with this
DLL. A 16 bit main entry point also indicates that you should read the
restrictions section, paying particular attention to the section on SS:ESP.
ΓòÉΓòÉΓòÉ 8.3.64. LXO0162 ΓòÉΓòÉΓòÉ
LXO0162
No recorder dll name available.
LXOPT has failed to generate a unique name for your application specific
recording DLL.
It is likely that over a period of time a large number of unused recording DLLs
has built up in your development directory. Delete all old versions and
reattempt preparation.
ΓòÉΓòÉΓòÉ 8.3.65. LXO0163 ΓòÉΓòÉΓòÉ
LXO0163
Cannot locate LXOPT recorder data. Check your PATH and installation.
LXOPT failed to find its installation directory where vital data is stored.
Please ensure that the LXOPT installation directory is on your PATH and that
installation completed successfully.
ΓòÉΓòÉΓòÉ 8.3.66. LXO0164 ΓòÉΓòÉΓòÉ
LXO0164
Disk access failure creating recorder DLL.
Creation of a recorder DLL failed due to a file access failure.
Check that you have enough available disk space and that you have sufficient
access rights if the target directory is on a network. Check also that an
existing recording DLL is not in use.
ΓòÉΓòÉΓòÉ 8.3.67. LXO0165 ΓòÉΓòÉΓòÉ
LXO0165
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.68. LXO0166 ΓòÉΓòÉΓòÉ
LXO0166
High number of layout attempts: <N>, continuing...
LXOPT is having difficulty recreating your EXE/DLL.
Some large applications may take a significant number of attempts before a
successful layout can be achieved. LXOPT should always eventually succeed.
ΓòÉΓòÉΓòÉ 8.3.69. LXO0167 ΓòÉΓòÉΓòÉ
LXO0167
Cannot copy prepared recording program to "pathname"
Failed to copy the prepared application file. Check disk space or
file/directory access permissions.
ΓòÉΓòÉΓòÉ 8.3.70. LXO0168 ΓòÉΓòÉΓòÉ
LXO0168
See Internal Errors
ΓòÉΓòÉΓòÉ 8.3.71. LXO0169 ΓòÉΓòÉΓòÉ
LXO0169
Active DosGetMessage data (MSGSEG32) detected in code area.
Some message data used by DosGetMessage is contained within the code object
processed by LXOPT. This message data is retrieved during execution of
DosGetMessage code using the address of the containing code object and adding a
predetermined offset to it. This mechanism bypasses normal code/data
referencing and is incompatible with LXOPT processing.
This data is often introduced by use of functions within the IBM CSet/VAC
libraries which in turn depend on DosGetMessage for their own message handling.
To prevent this problem move the message data to another code object by
identifying the 'segment' in your module definition (.DEF) file. If the
processed file was an executable which does not currently have a .DEF file,
create a file <exe-name>.DEF and insert the lines below. Remember to include
your definition file when you relink.
NAME <exe-name> <WINDOWAPI | WINDOWCOMPAT | NOTWINDOWCOMPAT>
SEGMENTS
_MSGSEG32 CLASS 'CODE'
For existing .DEF files simply add the line containing _MSGSEG32 to the
SEGMENTS section.
MSGSEG32 data is detected by searching for data which starts with a 0xFF byte
followed by the text MSGSEG32. In the unlikely event that your code contains
this data sequence for another purpose or you are SURE that it is not used this
LXOPT test may be disabled, see the /ignoreMsgSeg option for more details.
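The detection rule described above (a 0xFF byte followed by the text MSGSEG32)
can be sketched as a simple byte search. This is an illustration of the rule as
stated, not LXOPT's implementation; the function name find_msgseg32 is
hypothetical and the code object is assumed to be available as a bytes value.

```python
MARKER = b"\xffMSGSEG32"   # 0xFF byte followed by the text MSGSEG32

def find_msgseg32(code_bytes):
    """Return every offset at which the marker sequence starts."""
    offsets = []
    pos = code_bytes.find(MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = code_bytes.find(MARKER, pos + 1)
    return offsets

sample = b"\x90\x90" + MARKER + b"\x00abc"
print(find_msgseg32(sample))
```

Any offsets reported can be compared with your map file to see which library
or module contributed the message data.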
ΓòÉΓòÉΓòÉ 9. Utilities ΓòÉΓòÉΓòÉ
Utilities
ΓòÉΓòÉΓòÉ 9.1. LXWarp - Apply OS/2 WARP compression to 2.x executables ΓòÉΓòÉΓòÉ
LXWarp - Apply OS/2 WARP compression to 2.x executables
Usage: LXWARP <filename> [/clear] [/bakfile:<backup_pathname>]
[/threshold:<reduction_threshold>]
Examples:
To compress 'myapp.dll' and delete the backup file on successful completion.
LXWARP myapp.dll /clear
Compress 'myapp.dll' pages where compression reduces the page size by more
than 20%. Original file stored as 'myapp.wbk'.
LXWARP myapp.dll /threshold:20
LXWARP takes an existing executable and applies OS/2 WARP page compression to
it. The EXE/DLL file created will occupy less disk space and take less time
to be loaded by the operating system. This new file will not execute under
versions of OS/2 prior to WARP.
LXWARP allows OS/2 WARP users of 2.x targeted code to gain the benefits of
page level compression. Developers may also use LXWARP to process third party
2.x targeted DLLs for use with their OS/2 WARP specific applications.
ΓòÉΓòÉΓòÉ 9.1.1. /clear ΓòÉΓòÉΓòÉ
/clear
Syntax: /clear (default OFF)
Delete the copy of the original executable file when LXWARP/LXUNWARP processing
is successfully completed.
The LXWARP and LXUNWARP applications both keep backup copies of the original
file during processing. If an error occurs the original file is automatically
restored. If the new executable is successfully created use of this option
will cause the backup file to be deleted.
ΓòÉΓòÉΓòÉ 9.1.2. /bakfile ΓòÉΓòÉΓòÉ
/bakfile
Syntax: /bakfile:<pathname> (default see below)
Specify the name of the backup file in which to keep the unprocessed
executable.
Both LXWARP and LXUNWARP always create a copy of the original file before
processing. By default this backup file is created with the same root name and
in the same directory as the original file. By default LXWARP backup files
have the extension 'WBK', LXUNWARP backup files have the extension 'UBK'.
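The default naming rule can be sketched as follows. This is an illustration
under stated assumptions: the function name default_backup_name is
hypothetical, and Python's ntpath module is used only because it understands
drive letters and backslash separators.

```python
import ntpath

# Default backup extensions as described in the text above.
BACKUP_EXTENSION = {"LXWARP": "WBK", "LXUNWARP": "UBK"}

def default_backup_name(path, tool="LXWARP"):
    """Same directory and root name as the original, with the tool's extension."""
    root, _old_extension = ntpath.splitext(path)
    return root + "." + BACKUP_EXTENSION[tool]

print(default_backup_name(r"C:\APPS\myapp.dll"))
print(default_backup_name(r"C:\APPS\myapp.dll", tool="LXUNWARP"))
```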
ΓòÉΓòÉΓòÉ 9.1.3. /threshold ΓòÉΓòÉΓòÉ
/threshold
Syntax: /threshold:<threshold%> (default 15)
Specify the reduction in page size required for compression to be used.
Each page in the file is compressed and the reduction in size compared with the
original page size. If the percentage reduction is greater than the value
supplied by this parameter the compressed form of the page is used.
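The per-page decision can be sketched as below. This is a minimal illustration
of the comparison described above, not LXWARP's code; the function name
use_compressed_page is hypothetical and page sizes are assumed to be in bytes.

```python
def use_compressed_page(original_size, compressed_size, threshold_pct=15):
    """Return True if the compressed form of a page should be used.

    The compressed page is kept only when the percentage reduction is
    greater than the threshold (default 15).
    """
    reduction_pct = 100.0 * (original_size - compressed_size) / original_size
    return reduction_pct > threshold_pct

print(use_compressed_page(4096, 3000))                    # about 27% smaller
print(use_compressed_page(4096, 3600))                    # about 12% smaller
print(use_compressed_page(4096, 3600, threshold_pct=10))
```

Raising the threshold leaves more pages uncompressed; lowering it favours a
smaller file.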
ΓòÉΓòÉΓòÉ 9.2. LXUnWarp - Remove OS/2 WARP compression from executables ΓòÉΓòÉΓòÉ
LXUnWarp - Remove OS/2 WARP compression from executables
Usage: LXUNWARP <filename> [/clear] [/bakfile:<backup_pathname>]
Example: LXUNWARP myapp.dll
LXUNWARP takes an existing executable file and expands OS/2 WARP compressed
pages.
Although the resulting file will be loadable by OS/2 V2.x the application may
still not execute due to OS/2 WARP specific API dependencies.
Expanded pages are tested for normal iterated data encoding which is performed
as required.
LXUNWARP allows OS/2 2.x users to run code linked specifically for OS/2 WARP
installations. It can also be used to undo prior use of the LXWARP command.
Use of LXUNWARP on an LXWARPed file may not result in an exact copy of the
original file due to page alignment issues and choice of iteration strategy.
ΓòÉΓòÉΓòÉ 9.3. Preload - Transfer executables to swapfile ΓòÉΓòÉΓòÉ
Preload - Transfer executables to swapfile
Usage: PRELOAD [options] [<exe/dll path>] [options]
/Q         - Quiet mode (suppress copyright notice)
/S         - Silent mode (suppress all output)
/G:<name>  - Group name (default is exe/dll file name)
/I         - Make Preload Manager invisible (remove from task list)
/V         - Make Preload Manager visible (add to task list)
/U         - Unload file/group, unload ALL if none specified
/L         - List previous active load instructions
/W:<drvs>  - Wait for drive letters to become available, e.g. /W:FGH
/X         - Unload all and terminate the preload manager
/T:<num>   - Exe load time-out in seconds (default 120)
/M:<num>   - Deny request if below <num>Mb free swap space (default 10)
/? or /h   - Display options
The preload utility allows users to selectively preload and unload LXOPT
produced EXE/DLL files arranged with the /preload option. There are important
DLL initialisation, disk space requirements, boot time and user control issues
raised by preloading code, see the Preloading section for more details.
The utility (PRELOAD.EXE) and its accompanying on-line documentation
(PRELOAD.INF) are included with the LXOPT software package and these two files
may also be distributed with LXOPT processed files.
The program operates in three modes, as a command line utility as described
above, in background Wait Mode awaiting network drive availability and as a
continuously active Preload Manager. The preload utility automatically
activates a manager process if one is not currently running. All utility
requests are passed on to the manager to be performed.
Preload Manager
The preload manager performs commands supplied via the preload utility.
Started by the first use of the preload utility the manager runs continuously
in the background. When made visible by the /V option the preload process
appears to the user as a normal windowable VIO application. If made invisible
via the /I option the process is hidden, removed from the PM task list and
prevented from rejecting an operating system shutdown. When operating
invisibly the process may only be interacted with via the preload utility.
Termination of the manager will cause all preloaded programs to be unloaded as
the use count of each module drops to zero. The manager may be terminated by
closing the visible manager session or via the preload utility with the /X
option.
Although the preload manager uses the Presentation Manager API it does not
require PM to be present. If running with another shell (e.g. TSHELL)
interaction with the task list is not attempted and sessions are started using
parameters compatible with TSHELL operation.
Wait Mode
Permanently invisible, a wait mode process is started to handle a preload
request where the /W parameter is used and the drives specified are not
currently available. The wait mode process performs the actions normally
performed by the preload utility; it simply waits for the specified drives to
become available before issuing the load request to the preload manager.
Wait Mode is vital for PRELOAD to be able to operate effectively in a customer
network environment. Users/installation programs are able to insert preload
requests into startup.cmd without concern for network availability.
Preload Utility
The preload utility provides a simple command line interface to the preload
manager. The utility has options to suppress output (/Q /S), control manager
visibility (/V /I), list current active loads (/L) as well as provide the
basic file loading interface.
DLL files are loaded into the Preload Manager process; each executable is
provided with its own session. Execution of normal initialisation is
suppressed in LXOPT processed EXE files and per instance initialised DLLs.
This is designed to ensure preloading is both transparent and does not
adversely affect normally loaded versions of the code. LXOPT processed
globally initialised DLLs and all unprocessed DLLs will initialise normally.
For a fuller discussion of initialisation and the issues raised see Preloading.
Files are loaded into named groups. By default the group name is the filename
derived from the pathname specified on the command line. For example
PRELOAD C:\APPS\FAST_APP.EXE
will preload the LXOPT processed executable file FAST_APP.EXE, LXOPT processed
DLLs on which it depends and load normally any other DLLs which it requires.
These files are treated as a single loadable unit under the group name
FAST_APP.EXE. Group names become useful when a user wishes to unload all the
preloaded files associated with a single product. For example
PRELOAD C:\APPS\FAST_APP.EXE /G:FAST_SUITE
PRELOAD C:\APPS\FAST_SRV.EXE /G:FAST_SUITE
PRELOAD C:\APPS\FAST_WPS.DLL /G:FAST_SUITE
preloads the 'FAST' application, background server and workplace shell object
DLL under the single name 'FAST_SUITE'. To release the preload on all these
files requires one command
PRELOAD /G:FAST_SUITE /U
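The grouping behaviour in the examples above can be modelled with a small
table. This is only a sketch of the bookkeeping, not how the Preload Manager
is implemented; the class GroupTable and its methods are hypothetical, and
group names are assumed to be compared exactly as typed.

```python
import ntpath

class GroupTable:
    """Toy model of preload groups: group name -> list of loaded paths."""

    def __init__(self):
        self.groups = {}

    def load(self, path, group=None):
        # Without /G the group name defaults to the file name from the path.
        name = group if group else ntpath.basename(path)
        self.groups.setdefault(name, []).append(path)
        return name

    def unload(self, group=None):
        """Release one group, or every group if none is named (as with /U)."""
        if group is None:
            released = [p for paths in self.groups.values() for p in paths]
            self.groups.clear()
            return released
        return self.groups.pop(group, [])

table = GroupTable()
print(table.load(r"C:\APPS\FAST_APP.EXE"))                      # default name
table.load(r"C:\APPS\FAST_SRV.EXE", group="FAST_SUITE")
table.load(r"C:\APPS\FAST_WPS.DLL", group="FAST_SUITE")
print(table.unload("FAST_SUITE"))                               # both released
```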
The preloader uses the operating system calls DosStartSession and
DosLoadModule to load files and then transfers them to the swapfile. This can
be a time consuming process and by default the preloader will wait for up to
120 seconds for a load request to complete. This time limit may be adjusted
using the /T option.
Preload requests where the amount of free swap space is less than 10Mb will be
rejected. This limit may be varied by use of the /M option. /M:0 will
disable free swap space testing but is not recommended, particularly for
preload requests in startup.cmd.
Preload requests may be issued if some or all of the requested files are
already active. The preload operation is transparent to any existing
executing code. When complete all instances of preloaded code, including
those previously running, will benefit from the preloading effect.
ΓòÉΓòÉΓòÉ 9.4. TimeRun - Measure application run-time ΓòÉΓòÉΓòÉ
TimeRun - Measure application run-time
Usage: TIMERUN <application-name> [parameters]
Example:
TIMERUN touch data.txt Execute touch.exe with the parameter
'data.txt'
TIMERUN executes the given application and displays the total execution time
on termination. This is the total time elapsed from start-up to termination.
This is not a measure of CPU time.
Before executing the application TIMERUN waits 5 seconds to ensure disk
activity due to delayed writes has completed.
Repeating the same command often results in a reduced execution time due to
caching effects. Subsequent execution times may then vary due to the activity
of background processes. This can make reliable comparative performance
testing very difficult, see Performance Testing for more details.
TIMERUN uses CMD.EXE to execute the given command. Total time elapsed may be
dominated by the time taken to invoke the command processor where total run
time is less than 10 seconds.
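The measurement TIMERUN performs can be approximated as shown below. This is a
sketch, not the TIMERUN source: the function name time_run is hypothetical, and
the command is assumed to be run through the platform's command processor by
way of Python's subprocess.run with shell=True. The run, clock and sleep
parameters exist only so the function can be exercised without starting a real
program.

```python
import subprocess
import time

def time_run(command, settle_seconds=5, run=subprocess.run,
             clock=time.perf_counter, sleep=time.sleep):
    """Return elapsed wall-clock seconds for `command` (not CPU time)."""
    sleep(settle_seconds)         # allow delayed disk writes to complete
    start = clock()
    run(command, shell=True)      # invoke via the command processor
    return clock() - start

# Typical use (starts a real process, so it is left commented out here):
# print(time_run("touch data.txt"))
```

As with TIMERUN itself, the figure includes the cost of starting the command
processor, so very short runs are dominated by that overhead.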
ΓòÉΓòÉΓòÉ 9.5. Thrash - Create high memory load for performance testing ΓòÉΓòÉΓòÉ
Thrash - Create high memory load for performance testing
Usage: THRASH <n | n% | nP>
Examples:
THRASH 4 Allocate and continuously access 4Mb of memory.
THRASH 50% Allocate an amount of memory equivalent to 50% of
system RAM and continuously access it.
THRASH 75P Allocate an amount of memory equivalent to 75% of
system RAM and continuously access it.
THRASH allocates the given amount of memory and continuously accesses it.
Order of access is updated dynamically to maximise the amount of the
allocation retained in system RAM.
THRASH provides a simple way to produce a constant high memory load as might
be generated by several large background applications. This can be very
useful for testing software performance under low memory conditions.
The alternative of 'P' for the '%' trailing character is provided to simplify
the passing of parameters via a WPS program object.
Use THRASH with caution. Thrashing large quantities of memory may cause your
system to run unacceptably slowly. Any keyboard input to THRASH will cause it
to release the allocated memory and terminate.
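The three parameter forms can be interpreted as in the sketch below. This is an
illustration of the usage rules above, not THRASH's own parser; the function
name thrash_megabytes is hypothetical and the amount of system RAM is passed in
rather than queried from the operating system.

```python
def thrash_megabytes(arg, system_ram_mb):
    """Convert a THRASH parameter (n, n% or nP) into megabytes to allocate."""
    text = arg.strip()
    if text[-1:] in ("%", "P"):
        percent = int(text[:-1])              # trailing % or P: share of RAM
        return system_ram_mb * percent // 100
    return int(text)                          # plain number: megabytes

print(thrash_megabytes("4", system_ram_mb=16))
print(thrash_megabytes("50%", system_ram_mb=16))
print(thrash_megabytes("75P", system_ram_mb=16))
```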
ΓòÉΓòÉΓòÉ 9.6. UnLXOPT - Undo processing and delete LXOPTed files ΓòÉΓòÉΓòÉ
UnLXOPT - Undo processing and delete LXOPTed files
Usage: UNLXOPT <application-name>
Example:
UNLXOPT MYAPP.DLL Restore original MYAPP.DLL and delete
LXOPT created files
UNLXOPT is a simple tool that restores the original version of the LXOPTed
file and then deletes any .REC, .PRP, .ORI and automatically generated
recording DLL. The file names to be deleted are generated by using the
default names that would be used by LXOPT when processing the supplied
application file.
ΓòÉΓòÉΓòÉ 10. FAQs ΓòÉΓòÉΓòÉ
FAQs
ΓòÉΓòÉΓòÉ 10.1. How Do I Trace CS:EIP Back to My Source Code? ΓòÉΓòÉΓòÉ
How Do I Trace CS:EIP Back to My Source Code?
This is in fact two questions: how to find the original CS:EIP of a location in
an LXOPTed file and then how to trace that back to the source code.
CS:EIP values are typically provided as <object_number>:<offset> pairs. LXOPT
only alters the offsets of code within the largest code object so if the object
number is not the same as the one LXOPT processed that offset will not have
been altered by LXOPT processing.
If the object number is the one processed by LXOPT use the LXOPT /getOld option
to find the original offset value.
Once you have obtained the original offset value you can then search your map
file to find the containing function or examine the code in a debugger to
locate the instruction that caused the error.
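Once the original offset is known, finding the containing function is a search
for the last symbol that starts at or before that offset. The sketch below
assumes the map file has already been reduced to a sorted list of
(start offset, name) pairs for the processed code object; the function name
containing_function and the symbol names are hypothetical, and parsing of the
map file itself is not shown.

```python
import bisect

def containing_function(symbols, offset):
    """Return the name of the symbol whose range contains `offset`.

    `symbols` is a list of (start_offset, name) pairs sorted by start_offset.
    Returns None if the offset lies before the first symbol.
    """
    starts = [start for start, _name in symbols]
    index = bisect.bisect_right(starts, offset) - 1
    if index < 0:
        return None
    return symbols[index][1]

# Hypothetical entries taken from a map file.
symbols = [(0x0000, "main"), (0x0120, "parse_args"), (0x0400, "report_error")]
print(containing_function(symbols, 0x0130))
```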
ΓòÉΓòÉΓòÉ 10.2. Isn't /BASE:65536 Better With EXE Files? ΓòÉΓòÉΓòÉ
Isn't /BASE:65536 Better With EXE Files?
The simple answer is yes. All EXE files produced by LXOPT are automatically
based at address 65536 and the internal fixups removed.
LXOPT only requires input EXE files to avoid this option to ensure the file
contains all the information required for correct processing. This has no
adverse effect on the resulting LXOPT processed executable.
ΓòÉΓòÉΓòÉ 10.3. Which is the Best Arrangement Algorithm? ΓòÉΓòÉΓòÉ
Which is the Best Arrangement Algorithm?
There is no single algorithm that outperforms all others for all
situations/executables.
The binary algorithm introduced with V1.1 is now the default algorithm and is
most likely to produce the best overall results.
Choice of algorithm is best made by experimentation. There may not always be a
clear winner; sometimes the best algorithm for a 100Kb memory restriction will
be beaten by another at 200Kb.
Due to the 'Least Recently Used' algorithm used to recycle pages of memory,
applications with a short runtime are unlikely to ever be memory restricted
unless they require large quantities of system RAM. Such applications have
little need for complex arrangements and may be better suited to the firstuse
algorithm. This algorithm produces the most sequential code which may result
in more efficient page loads and CPU caching.
The Stat arrangement algorithm may be used on larger applications with long
complex runtimes. Use of Stat is complicated by its /groups option and will
often be outperformed by the binary algorithm.
ΓòÉΓòÉΓòÉ 10.4. Where is my Recording File? ΓòÉΓòÉΓòÉ
Where is my Recording File?
First be sure you have executed the recording version of the code. If you
invoke your code via a program object it may now reference the .ORI file. If a
previous version of an EXE/DLL is preloaded this old version will still be
executed even though a new file exists.
Recording files are created by an LXOPT prepared application while it executes.
By default this file has the full path name of the executable provided to LXOPT
but with a .REC extension.
If the recording file name provided via /recfile does not have a full path
(e.g. .\MYAPP.REC) then the recording version of the application will also
attempt to create the file without a full path when it executes. The file is
therefore created relative to the current drive/directory at the time of
creation. This can be useful where recording is performed on a separate
machine, and it allows the recording version to be used simultaneously on
different machines on the same network. It can however cause problems where
the current drive/directory is not predictable. In these circumstances
specify a full pathname for the recording file by use of the /recfile option.
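The naming rules above can be modelled as follows. This is a sketch of the behaviour as described in this section, not of LXOPT's internal code; the function names and example paths are hypothetical, and Python's ntpath module is used only because it applies drive-letter and backslash path rules.

```python
import ntpath  # applies DOS-style path rules (drive letters, backslashes)

def default_recfile(exe_path):
    """Default name: the executable's full path with a .REC extension."""
    root, _ext = ntpath.splitext(exe_path)
    return root + ".REC"

def effective_recfile(recfile, current_dir):
    """Location at which a recording file would be created.

    A name with a full path is used as given. Any other name is taken
    relative to `current_dir`, the current drive/directory at the time
    the file is created.
    """
    if ntpath.isabs(recfile):
        return ntpath.normpath(recfile)
    return ntpath.normpath(ntpath.join(current_dir, recfile))

# Hypothetical paths, for illustration only.
print(default_recfile(r"C:\APPS\MYAPP.EXE"))
print(effective_recfile(r".\MYAPP.REC", r"D:\TEST"))
print(effective_recfile(r".\MYAPP.REC", r"E:\WORK\RUN2"))
```

The last two calls show the same /recfile value landing in different directories depending on where the application was started, which is the unpredictability the full pathname avoids.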
═══ 10.5. LXOPT Just Hangs During Processing! ═══
LXOPT Just Hangs During Processing!
Although LXOPT may appear to be a simple tuning utility it requires a
considerable amount of CPU time and memory to process large input files.
Preparation/arrangement times of over 5 hours have been reported. Run times
can be particularly long where a large file is processed on a machine with 8Mb
or less of RAM.
During processing LXOPT is effectively recompiling and relinking your entire
application in addition to performing its working set tuning function. If you
suspect LXOPT has hung please allow it to run uninterrupted overnight before
reporting the problem.
═══ 10.6. Why Isn't my Application 50% Faster? ═══
Why Isn't my Application 50% Faster?
LXOPT rearranges application code to reduce the total memory occupied by code
and the time taken to load it. The percentages reported by the arrangement
process are expected reductions in page faults, not overall run time.
It follows that any performance improvement in this area is dictated by how
much time the original application spent loading its code. If your application
is a 30Kb executable searching for prime numbers then performance improvements
are likely to be restricted to CPU caching benefits. These CPU caching
improvements are unlikely to produce more than a 5% performance boost.
Working set tuning, like any efficiency measure, shows its worth when the
commodity in question is in short supply. Examination of an arrangement report
shows that page fault reductions improve under increasingly restricted memory
conditions. The improvement is not just in percentage terms, but much more
significantly in total number of faults. The greater the page fault load, the
better LXOPT performs.
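The relationship between a reported page fault reduction and overall run time can be estimated with simple arithmetic. The sketch below assumes that code load time scales linearly with the number of page faults and ignores CPU caching effects; the function name and the figures are hypothetical.

```python
def expected_runtime_fraction(load_fraction, fault_reduction):
    """Estimate the fraction of the original run time that remains.

    load_fraction   -- share of the original run time spent loading code (0..1)
    fault_reduction -- page fault reduction reported by the arrangement (0..1)

    Assumes load time is proportional to the number of page faults.
    """
    if not (0.0 <= load_fraction <= 1.0 and 0.0 <= fault_reduction <= 1.0):
        raise ValueError("both arguments must lie between 0 and 1")
    return 1.0 - load_fraction * fault_reduction

# Hypothetical figures: the same 50% fault reduction in two situations.
for load_fraction in (0.02, 0.60):
    remaining = expected_runtime_fraction(load_fraction, 0.50)
    print(f"load share {load_fraction:.0%}: run time falls to {remaining:.0%}")
```

Under these assumptions a 50% reduction in page faults saves about 1% of run time when code loading accounts for 2% of it, but about 30% when code loading accounts for 60%.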
═══ 10.7. Why Does my Page Fault Monitor Report More Page Faults? ═══
Why Does my Page Fault Monitor Report More Page Faults?
When LXOPT produces an arrangement report it shows the expected number of page
faults loading code from the processed code object. LXOPT only tunes the code
arrangement within this processed object. LXOPT does not tune data accesses or
attempt to simulate page faults generated by any other means.
Your application will generate page faults loading application data and
executing code in other DLLs such as Presentation Manager APIs. Performance
monitoring tools tend to collect and report all such page fault data in
combined form.
Although the number of page faults reported by such tools will differ from that
provided by LXOPT, they can still be used to assess the improvement in an
LXOPT-processed application's performance. Use your tool on the original and
processed versions of the application and examine the reduction in total page
faults. If your test mimics the actions performed during LXOPT recording this
reduction should agree with the numeric reduction in page faults predicted by
LXOPT.
Page fault monitoring tools will not reveal CPU caching or reduced disk seek
time benefits.
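The comparison described above is between absolute reductions, not percentages, because the monitor's totals also include faults that LXOPT does not model. A small sketch with hypothetical figures (the function name and all numbers are invented for illustration):

```python
def compare_with_prediction(monitor_before, monitor_after,
                            lxopt_before, lxopt_after):
    """Compare fault reductions seen by a monitor with an LXOPT report.

    The monitor counts include data faults and faults in other DLLs, so
    only the absolute reductions are expected to agree.
    """
    if monitor_before <= 0 or lxopt_before <= 0:
        raise ValueError("the 'before' counts must be positive")
    measured = monitor_before - monitor_after
    predicted = lxopt_before - lxopt_after
    return {
        "measured_reduction": measured,
        "predicted_reduction": predicted,
        "monitor_percent": 100.0 * measured / monitor_before,
        "lxopt_percent": 100.0 * predicted / lxopt_before,
    }

# Hypothetical figures: monitor totals 5000 -> 4400, LXOPT report 1000 -> 400.
result = compare_with_prediction(5000, 4400, 1000, 400)
for key, value in result.items():
    print(f"{key}: {value}")
```

In this invented case both sources show a reduction of 600 faults, yet the monitor sees a 12% improvement while the LXOPT report shows 60%, because the monitor's baseline includes faults outside the processed code object.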
═══ 10.8. Why Does my Application Make a Beeping Noise? ═══
Why Does my Application Make a Beeping Noise?
Applications produced by LXOPT produce an alternating tone when attempting to
display an LXOPT message to the user. If a message does not appear it is
likely that the LXOPT installation directory is not on your PATH. Add the
LXOPT directory to the path and retry the application.
Tones generated by a recording version of an application are an indication that
an error has occurred. If an error message is not displayed examine the file
'LXREC.ERR' to identify the source of the error. This is typically a failure
writing to the recording file due to an invalid file name or insufficient disk
space.
LXOPT-initiated tones or messages are not generated by optimised EXE/DLL files.
These files will perform identically to their unoptimised originals and do not
require access to any LXOPT support files.
═══ 10.9. My Application Fails - Cannot Find <app-name>@1? ═══
My Application Fails - Cannot Find <app-name>@1?
The system could not execute the recording version of your application because
it could not find the file <app-name>@1.DLL. This DLL was created for your
application by LXOPT to aid in the recording process.
This file is located in the directory in which the original EXE/DLL file was
processed. Copy the DLL to a directory on your LIBPATH.
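To confirm that the DLL is now reachable, the directories on your LIBPATH can be checked one by one. The sketch below is an illustration only: the function name, the DLL name and the directory list are hypothetical. LIBPATH is normally set by a LIBPATH= statement in CONFIG.SYS, so the value is passed in as a semicolon-separated string and is not read from the environment.

```python
import os

def find_in_libpath(dll_name, libpath):
    """Return the first directory in `libpath` that contains `dll_name`.

    `libpath` is a semicolon-separated list of directories, as written
    in a LIBPATH= statement. File names are compared without regard to
    case. Returns None if no listed directory contains the file.
    """
    wanted = dll_name.lower()
    for directory in libpath.split(";"):
        directory = directory.strip()
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        for entry in os.listdir(directory):
            if entry.lower() == wanted:
                return directory
    return None

# Hypothetical DLL name and LIBPATH value, for illustration only.
location = find_in_libpath("MYAPP@1.DLL", r".;C:\OS2\DLL;C:\MYAPP")
print(location if location else "MYAPP@1.DLL is not on the LIBPATH")
```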