Andys Binary Folding Editor is primarily designed for structured browsing, although it also provides minimal editing facilities.
This program is designed to take in a set of binary files, and with the aid of an initialisation file, decode and display the definitions (structures or unions) within them. BE is particularly suited to displaying non-variable length definitions within the files.
This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps. BE is often used as the data navigation half of a debugger.
For a summary of how to use the editor, see the section Using the editor.
This documentation corresponds to the 12/01/00 version of BE.
BE has the following features, including support for 64 bit numbers (where the compiler provides a long long type). These are currently all versions except 32 bit OS/2.
usage: be [-i inifile] {-I incpath} {-D symbol} {-S name=val}
          { [-s s1[-s2]] [-d defn] [-a addr] [-f field] }
          {-Y symfmt} {-y symfile[@bias]} [-C dx]
          [-w width] [-h height] [-c colscheme] [-p] [-r]
          [-v viewflags] [-g] [-A size] [--]
          { binfile[@addr] | mx![args[@addr]] }
flags: -i inifile      override default initialisation file
       -I incpath      append include path(s) for use by inifile
       -D symbol       pre-$define symbol(s) for use by inifile
       -S name=val     set constant name to be value
       -s s1-s2        set sessions following -d, -a, -f and -p apply to
       -d defn         initial definition to use (default: main)
       -a addr         initial address to use (default: 0)
       -f field        field name within defn (list link, or array to expand)
       -Y symfmt       symbol table format
       -y symfile@bias input symbol table file(s) (with optional biases)
       -C dx           code disassembler extension
       -w width        set screen width
       -h height       set screen height
       -c colscheme    set colour scheme (0 to 3, default: 0)
       -p              print data to stdout, non-interactive
       -r              restricted mode, no shelling out allowed
       -v viewflags    combinations of A,O,L,I,a,e,b,o,d,h,j,+,-
       -g              perform seg:off->physical mapping on all addresses
       -A size         address space size (8 to 64, default: 32)
       binfile@addr    binary file(s) (with optional address, default: 0)
       mx!args@addr    memory extension with arguments (and optional address)
The -i flag overrides the default initialisation file.
The -I flag affects the operation of the include command in the initialisation file.
The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.
The -S flag allows the definition of a named constant for use in numeric expressions in the initialisation file.
The editor has 10 editing sessions, and the -d, -a and -f options affect all of these (by default), unless the -s option is used to specify which session(s) are affected.
By default only session 0 is shown by -p, but this too can be changed with the -s option.
The initial structure definition and address to decode on each session may be overridden with the -d and -a flags.
Normally BE starts by looking up a definition called 'main', and decoding the data at address 0 as such.
The address expression is allowed to refer to symbols in symbol tables, as it is evaluated after the symbol tables have been loaded.
All the other numeric command line arguments are evaluated before any symbol table loading takes place, and so can't refer to symbols.
If the -f flag is used, it must identify a field within the specified structure.
If the field is a pointer to a structure of the same type, BE will initially display a linked list of structures, rather than just one structure.
Otherwise, the field is assumed to be an array of fields, and an element list is displayed instead.
Symbol table(s) may be specified using the -y flag.
Symbol files are assumed to be in the format generated by the ARM linker.
However, the -Y flag can be used to tell BE that symbols in other formats follow.
Multiple symbol files in differing formats may be specified, as in :-

be -Y aix_nm -y syms.nm -Y arm -y syms.sym ...

See the section on symbol table formats for a description of the supported file formats.
If a bias is specified, then it is added to each symbol value in the file. This is handy when a symbol table contains relative values, rather than absolute addresses.
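The bias behaviour amounts to adding a fixed offset to every symbol value as the table is loaded. A minimal Python sketch of that idea (the helper name and the dict representation are my own, not BE's):

```python
def load_symbols(pairs, bias=0):
    """Model of -y symfile@bias: return {name: value + bias} for each symbol."""
    return {name: value + bias for name, value in pairs}

# e.g. a table of link-relative symbols, loaded at 0x8000
syms = load_symbols([("start", 0x0), ("table", 0x40)], bias=0x8000)
print(hex(syms["table"]))  # 0x8040
```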
The -C dx option may be used to extend BE by the use of a disassembler extension.
This is a piece of code with a well defined interface, which BE uses to disassemble data annotated as code.
The -w and -h arguments can be used to try to override the current screen size.
This doesn't work on UNIX or NetWare, but does on 32 bit DOS, 32 bit OS/2 and Windows.
The -c argument allows you to choose from a small selection of colour schemes.
The -p flag causes BE to be invoked in a non-interactive manner.
It decodes the address given, as a structure of the type specified, and writes the result to stdout.
Multiple structure dumps can be obtained by judicious use of the -s flag above.
The -r flag prevents a user of BE from shelling out to a nested operating system command.
The -v flag allows you to state that addresses, offsets, lengths and array indices are to be displayed next to the data initially (note that -vI turns off indices).
You can also turn on the symbolic display of addresses.
In addition, you can specify the display mode of indices: one of binary, octal, decimal or hex.
The + and - view flags affect the initial level of detail of the display, and only have an effect when used with the -f flag.
This is particularly useful when combined with the -p flag.
Unfortunately, view flags are global, rather than per-session.
The -g argument is the 'segmented mode' switch.
When enabled, BE translates all addresses prior to using them to fetch or store data: address 0xSSSSOOOO is mapped to SSSS*16+OOOO.
This is obviously intended for debugging dumps from embedded Intel processors, and anyone with a sensible file format can ignore this flag.
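The seg:off translation above can be sketched in Python. This is a model of the mapping rule only, not BE's actual code:

```python
def seg_off_to_physical(addr):
    """Map a 16:16 segmented address 0xSSSSOOOO to SSSS*16 + OOOO."""
    seg = (addr >> 16) & 0xFFFF   # top 16 bits are the segment
    off = addr & 0xFFFF           # bottom 16 bits are the offset
    return seg * 16 + off

# segment 0x1234, offset 0x5678
print(hex(seg_off_to_physical(0x12345678)))  # 0x179b8
```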
Normally BE operates in a 32 bit address space.
-A can be used to change this.
For example, you could select a 24 bit address space.
In this case BE would ignore bit 24 and above when addressing data, and would only show the bottom 24 bits when displaying addresses.
Support for >32 bit address spaces is currently only available in certain operating system versions of BE.
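The address-space behaviour described above is simply masking to the selected width. A small illustrative sketch (the function name is mine):

```python
def mask_address(addr, space_bits=24):
    """Ignore bits at and above `space_bits` when addressing data,
    as BE does for a narrow -A address space."""
    return addr & ((1 << space_bits) - 1)

print(hex(mask_address(0x01FF0010)))      # 0xff0010 (bit 24 ignored)
print(hex(mask_address(0x123456789, 32))) # 0x23456789
```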
Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.
BE supports -- to end options, thus allowing filenames given afterwards to have names starting with a -.
Each binary file provides data for a part of the memory space which BE can view or edit. Therefore each binary file may be described as a memory section.
Alternatively, a memory section may be specified as mx!args.
This instructs BE to load a memory extension, and to access the data identified by the arguments via the memory extension.
This feature allows BE to be extended to edit non-file data directly, such as sectors on a disk.
Typical invocations of BE might be :-

be picture.bmp

to edit a file, which is loaded into the BE memory space at 0 onwards.

be -y gizmo.sym gizmo.rom gizmo.ram@0x8000

to edit dumps from the RAM and ROM of a coprocessor, where the ROM starts at 0, and the RAM at 0x8000. gizmo.sym contains the symbols for the microcode the coprocessor was running.

be -Y map -y ucode.map -i ucode.ini -g -C i86 coproc!io=0x400,mem=0xc0000

to live edit a running coprocessor. ucode.map has the symbols for the microcode the coprocessor is running. ucode.ini is a custom initialisation file. BEcoproc.DLL provides BE with access to coprocessor memory. io=0x400,mem=0xc0000 tells BEcoproc.DLL how to find the coprocessor. BEi86.DLL allows BE to disassemble any code in the data.

be -d HEADER -a 512 -p -vA file.dat

to display the HEADER structure at 512 bytes into file.dat. The decoded data is written to stdout; BE is not interactive. Addresses are displayed next to the data.

be -s 1 -d STRUCT1 -a 0x1000 \
   -s 2 -d STRUCT2 -a 0x2000 \
   -s 3 -d STRUCT3 -a 0x3000 \
   -s 1-3 -p \
   filename.dat

to pick three structures at three addresses, and to have BE decode and display all three to stdout.
One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.
Under 32 bit OS/2, Windows and 32 bit DOS, BE finds the initialisation file by searching along the path for the .EXE file, and then looking for a .INI file with the same name.
BE for NetWare looks for a .INI in the same directory as the .NLM file.
Under UNIX, BE looks for ~/.berc, and failing that, it looks along the path for be and then appends .ini.
If be is renamed to xx, then the files will be ~/.xxrc and xx.ini.
BE can be made to look elsewhere using the -i command line option.
Also, $define, $undef, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of pre-processing/conditional processing step.
The -D command line option may be used to pre-$define such conditional processing symbols.
It should be noted that $define, $undef, $ifdef and $ifndef can all be given a list of symbols (rather than just one).
This causes $define or $undef to define or undefine all the symbols in the list.
It causes $ifdef or $ifndef to check that all the symbols in the list are defined or that they are all undefined.
De Morgan's law can be used to achieve OR combinations :-
$ifndef A B C   // None of A, B or C is $defined
$else           // This part is therefore if A, B or C is $defined
$endif
If BE is running on 32 bit OS/2, then OS2 is pre-$defined.
If running on Windows, then WIN32 is pre-$defined.
If running on NetWare, then NETWARE is pre-$defined.
If running on a type of UNIX, then UNIX is pre-$defined.
If running specifically on AIX, then AIX is pre-$defined.
If running specifically on Linux, then LINUX is pre-$defined.
If running specifically on HP-UX, then HP is pre-$defined.
If running specifically on SunOS, then SUN is pre-$defined.
If running on 32 bit DOS, then DOS is pre-$defined.
Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine.
If BE supports 64 bit numbers and a 64 bit address space, then BE64 is pre-$defined.
These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.
An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, then along the BEINCLUDE environment variable, and finally along the PATH environment variable.
The internal include path is usually empty, but may be appended to by use of the -I command line option.
By the time the initialisation file is processed, any symbol files specified on the command line will have been loaded, along with any data files. This means that initialisation files may make reference to symbols and also to the data itself.
The initialisation file contains commands to set the default data display attributes, set constant, structure definitions, alignment declarations and include statements.
As BE processes the initialisation file, it generates warnings (such as undefined symbol table symbol), and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts.
This initialisation file may contain C or C++ style comments.
Numbers may be given in binary, octal, decimal or hex, as the following examples, all of which represent 13 decimal :-
0b1101, 0o15, 13, 0x0d
Numbers may also be given in character form. Multiple characters may be given to form a number, and this is quite handy because sometimes files/datastructures use magic numbers which are formed out of characters so as to be eye-catching. More than 4 characters give undefined results. Characters may be quoted, similar to traditional C/C++ style :-
'a'    = 0x61
'ab'   = 0x6162
'abc'  = 0x616263
'abcd' = 0x61626364
'\n'   = 10
'\x34' = 0x34    always 2 hex digits after \x
'\040' = 32      always 3 octal digits after \ (unlike C/C++)
'\0'   isn't legal, must be 3 octal digits
'\000' isn't legal, 0 is the string terminator
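The multi-character rule packs each successive character into the next lower byte. A Python model of that packing (the function name is mine; as the text says, results beyond 4 characters are undefined in BE):

```python
def char_const(s):
    """'ab' -> 0x6162: shift the accumulator left one byte per character."""
    value = 0
    for ch in s:
        value = (value << 8) | ord(ch)
    return value

print(hex(char_const("ab")))    # 0x6162
print(hex(char_const("abcd")))  # 0x61626364
print(char_const("\n"))         # 10
```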
Strings may be given in traditional C/C++ style too :-
"Hello World"
"One line\nAnother line"
"String with a tab\tin the middle"
"String with funny character at the end\x9a"
"String using octal \377 notation to get character 255 in it"
"String with \000 string terminator in it"     isn't legal
"String which starts on one line \
and continues on another"
Strings can be no more than 250 characters long.
Note that all strings used in the BE initialisation file must be 'clean', in that they can only contain the regular ASCII characters within the range ' ' to '~' (ie: 32 to 126 inclusive). Given this, you may ask why BE allows the escaped character notation: Well, strings can also be typed in by the user when interactively editing data, and it is very useful to allow a way to type non-ASCII characters.
When displaying strings, BE typically makes best possible use of the terminal in use, and may show the glyphs for unusual non-ASCII characters if it can. However, non-displayable characters are simply shown as '.'s.
Identifiers start with an underscore or a letter, and continue with more underscores, letters or digits. Some identifiers are actually reserved words in the BE initialisation file language.
The fact that NULs aren't allowed within BE strings is a rather irritating side effect of the way BE is implemented using traditional C/C++ NUL terminated strings. Perhaps one day I'll fix this.
Wherever the initialisation file calls for a number, the following variants may be used :-

addr "symbolinthesymboltable"
The value of the named symbol from the symbol table. If the symbol isn't found, the result is the value of the constant nosym, or if that isn't defined, its ~0.

sizeof DEFN
offsetof DEFN "fieldname"
valof "fieldname"
map MAPNAME "mapletstring"

A plain identifier may also be given. BE first tries valof "identifier" (this includes constants defined via the set command), then addr "identifier", and finally map M "identifier" for all maps M.
This last step isn't particularly quick, as there can be a very large number of mappings.
Use the explicit addr, valof or map forms to avoid this.
Also, using the explicit forms is more efficient, as BE needn't look in all the possible places, as it does above.
The reason BE looks in all the places when just the identifier is given is to reduce typing when using BE interactively.
` identifier expression `
. (dot) has a meaning which depends on context.
When prompted for an address by the @ command, it is the current address.
When prompted for a delta value, dot is the current delta.
When using = to change a numeric value, dot is the current value.
When specifying a value in a maplet, dot means the previous value plus one, or zero if this is the first maplet.
When specifying a mask in a maplet, dot means the maplet value.
[ type attributes , address , defaultvalue ]
This fetches a data item of the given type (eg: n32), taking into account the given attributes (eg: signed be), from the given address.
If nothing can be fetched from that address, then the result is the defaultvalue.
If the defaultvalue is omitted, then the expression cannot be evaluated.
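The fetch-with-default behaviour can be modelled over a plain bytes buffer. This is a sketch under my own naming, not BE's implementation:

```python
def fetch(buf, addr, size=4, big_endian=False, default=None):
    """Fetch a `size`-byte number at `addr`, or `default` if it can't be read.
    With no default, the 'expression cannot be evaluated' (raise)."""
    if addr < 0 or addr + size > len(buf):
        if default is None:
            raise ValueError("expression cannot be evaluated")
        return default
    chunk = buf[addr:addr + size]
    return int.from_bytes(chunk, "big" if big_endian else "little")

data = bytes([0x44, 0x33, 0x22, 0x11])
print(hex(fetch(data, 0)))              # 0x11223344 (little-endian read)
print(fetch(data, 0x1000, default=-5))  # -5, address out of range
```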
[[ buff , e0 , e1 , e2 , defaultvalue ]]
This searches for the buff pattern, effectively performing for ( a = e0; a != e1; a += e2 ) match(buff,a), and results in the address at which the pattern is found.
Be careful using this: there is no way to abort the scan.
If the search doesn't locate the pattern, then the result is the defaultvalue, unless it has been omitted, in which case the expression cannot be evaluated.
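The scan loop above translates directly into Python. A sketch with my own names (BE's matcher is not shown in this document):

```python
def scan(mem, pattern, e0, e1, e2, default=None):
    """for (a = e0; a != e1; a += e2) match(pattern, a) -- first match wins."""
    a = e0
    while a != e1:
        if mem[a:a + len(pattern)] == pattern:
            return a
        a += e2
    if default is None:
        raise ValueError("expression cannot be evaluated")
    return default

mem = b"\x00" * 0x10 + b"SIGNATURE" + b"\x00" * 7
print(hex(scan(mem, b"SIGNATURE", 0x0, 0x20, 4)))  # 0x10 (4-byte aligned scan)
```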
strlen ADDR
This gives the length of the NUL terminated string at ADDR, much like the C strlen function.
Note that the commas in the [ and [[ expressions can be omitted, although this is not recommended.
Consider the expression [n32 0xf000 -5] : this looks like it means 'the 32 bit word from address 0xf000, or -5 if it can't be fetched', but it actually means 'the 32 bit word at 0xeffb, with no default if it can't be fetched'.
Writing [n32 0xf000 (-5)] would fix this problem, but using commas makes the intention explicit.
Semicolons (not just commas) are also valid syntax for separating expressions.
It should be noted that when using the offsetof or map keywords, leading and trailing space is not significant in the "mapletstring" or "fieldname".
Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings. Operators grouped together have equal precedence. Higher precedence operators listed first :-
+ - ~ !       unary plus, unary minus, complement, not
* / %         multiply, divide, modulo
+ -           add (plus), subtract (minus)
<< >> >>>     shift left, shift right (signed), shift right (unsigned) [Note 1]
> < >= <=     greater than, less than, greater than or equal, less than or equal
== !=         equal, not equal
&             bitwise AND
^             bitwise exclusive OR
|             bitwise inclusive OR
&&            logical AND
^^            logical exclusive OR [Note 2]
||            logical inclusive OR
? :           conditional expression
Note 1: The >> is a signed shift right, and >>> is the unsigned shift right (much like Java).
This distinction is necessary as all numbers in BE expressions are unsigned.
(This affects the outcome of expressions like -2/2, which is 0xfffffffffffffffe/2, which is 0x7fffffffffffffff, rather than the -1 you might expect.)
Note 2: C/C++ does not have a logical exclusive OR, but BE does for symmetry.
Note also that the operator precedence now matches that of C++. Versions of BE prior to 1/7/99 had incorrect precedence for the shift operators. Luckily people tend to use brackets with these anyway.
Such numeric expressions can also be used when BE prompts for a number, not just in the initialisation file.
Some example expressions :-
addr "tablebase" + 4 * sizeof RGB
-- symbol tablebase plus four times the size of the RGB definition

[ n32 be , 0x70200+0x44 ] + 27
-- fetch big-endian 32 bit word from 0x70244, then add 27

[ n16 be bits 11:4 , 0x1000 ]
-- get big-endian 16 bit word from 0x1000, extract bits 11 to 4 inclusive
-- if the word was 0x1234, this would give a result of 0x23

[[ "SIGNATURE" , 0x1000 , 0x2000 , 4 ]]
-- locate "SIGNATURE" between 0x1000 and 0x2000, 4 byte aligned
BE maintains a smallish list of global numeric constants. eg:
set num_elements 14+5
Avoid using constant names which clash with other identifiers, such as map or structure definition names. Also, avoid clashing with reserved words in the initialisation file language.
The constant can be assigned any numeric expression, including referencing other constants.
This feature allows initialisation files with the following technique for managing multiple configurations of data :-
$ifdef BIG_DATA_FILE
set n_entries 100
$else
set n_entries 10
$endif

def DATA_RECORD
{
    n_entries n32 buf 100 asc "names"
    n_entries n32 dec "salaries"
}
Attempting to set a constant which is already defined produces an error.
The unset command can be used to undefine a previous value.
It is not an error to unset a constant which is not previously set to anything :-

set elems 100
unset elems
set elems 200
The -S command line flag can be used to set a constant before the initialisation file is processed.
Because the constant is set before the initialisation file is processed, the expression the constant is set to can't refer to things within the initialisation file.
Assuming the initialisation file debinfo.ini uses a constant called tabsize :-

be -i debinfo.ini -S tabsize=10 debug.dat              is fine
be -i debinfo.ini -S tabsize=10+4 debug.dat            is fine
be -i debinfo.ini -S "tabsize=sizeof STRUCT" debug.dat is illegal
The value of a constant may be interactively set, changed or unset by the user using the $ keystroke.
The special constant nosym, if set, is returned when the addr "symbol" syntax is used in an expression to try to determine the numeric value of a symbol which isn't defined.
The usual use of this is to define a value which is miles away from any sensible value.
The special constant disp_limit, if set, affects the way BE displays address values in symbol+offset form.
If the offset (ie: the displacement) from the symbol exceeds the disp_limit value, then the address isn't displayed in symbol+offset form.
When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noglue noseg nozterm.
To change this default setting, just include one or more of the following keywords in the file :-
be - read multibyte values from memory in a big-endian fashion.
le - read multibyte values from memory in a little-endian fashion.
signed - when fetching numeric values sign extend them, and when displaying numerically show '+signedvalue' or '-signedvalue'.
unsigned - when fetching numeric values zero extend them, and when displaying numerically show 'unsignedvalue'.
asc - set display mode to ASCII.
ebc - set display mode to EBCDIC.
bin - set display mode to binary.
oct - set display mode to octal.
dec - set display mode to decimal.
hex - set display mode to hex.
time - set display mode to time (decode seconds since epoch).
sym - set display mode to symbolic, ie: look up the value in the symbol table, and if found, display symbol+hexoffset, else display the value in hex.
null - allow following of 0 pointers.
nonull - disallow following of 0 pointers.
seg - cope with 16:16 segmented pointers.
noseg - pointers are not segmented.
mul - pointer values should be multiplied by the size of the data type being pointed to.
nomul - pointer values are given in regular byte addresses.
abs - pointer values are absolute.
rel - pointer values are to be considered relative to their own addresses.
code - specify that the numeric value is actually a code address.
nocode - specify that the numeric value is not a code address.
lj - perform ARM specific long-jump interpretation of code addresses.
nolj - don't do long-jump interpretation.
glue - perform PowerPC specific pointer glue interpretation of code addresses.
noglue - don't do pointer glue interpretation.
zterm - stop displaying buf data when a NUL terminator is reached.
nozterm - display data beyond NUL terminators.
Note that when multibyte numeric values are displayed in ASCII or EBCDIC, the ordering of the characters produced works like this :-
Type | Sample value       | Displays in ASCII
-----|--------------------|------------------
n8   | 0x41               | 'A'
n16  | 0x4142             | 'AB'
n24  | 0x414243           | 'ABC'
n32  | 0x41424344         | 'ABCD'
n40  | 0x4142434445       | 'ABCDE'
n48  | 0x414243444546     | 'ABCDEF'
n56  | 0x41424344454647   | 'ABCDEFG'
n64  | 0x4142434445464748 | 'ABCDEFGH'
Support for >32 bit numbers is only present in certain operating system versions of BE.
This can have the side effect that when people design eye-catcher values as numbers to store into memory, they may appear reversed when displayed.
In such cases, it might make more sense to decode the field as an N byte ASCII buffer, rather than a number.
Alternatively, use the big-endian designation, as in n32 be etc..
Mappings are BE's equivalent of C enumerated types and bitfield support.
These define a mapping between symbolic names and numeric values. A typical mapping definition in the initialisation file might be :-
map compression_type
{
    "uncompressed" 1
    "huffman"      2
    "lzw"          3
}
If the numeric value on display matches the value given, then it can be converted to the textual description.
Mappings in which the values are one bigger than the previous one are quite common.
So BE gives a shorthand, where . in the value means 0 for the first maplet given after the open curly brace, and one plus the previous value otherwise :-

map ordinals        { "zero" .  "one" .  "two" . }
map larger_ordinals { "four" 4  "five" .  "six" . }
Bitfields may be achieved in the following fashion :-

map pending_events
{
    "reconfiguration" 0x0001 : 0x0001
    "flush_cache"     0x0002 : 0x0002
    "restart_io"      0x0004 : 0x0004
}
The : symbol introduces an additional mask.
The number to string conversion algorithm inside BE works like this :-

for each maplet in the map
    if ( value & maplet.mask ) == maplet.value then
        display the maplet.name
if some unexplained bits left over then
    display the remaining value in hex
The case where the value and the following mask are the same is much more common than the case where they are not.
So BE provides a typing shortcut where . in the mask means 'the same as the value'.
So the above example can be written :-

map pending_events
{
    "reconfiguration" 0x0001 : .
    "flush_cache"     0x0002 : .
    "restart_io"      0x0004 : .
}
It is possible to have multiple field decodes from a single value :-
map twobitfields
{
    "green" 0x0001 : 0x000f
    "blue"  0x0002 : 0x000f
    "red"   0x0003 : 0x000f
    "small" 0x0100 : 0x0f00
    "large" 0x0200 : 0x0f00
}
The value 0x0243 would be converted to red|large|0x40.
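The conversion algorithm described earlier, applied to the twobitfields map, can be written directly in Python. This is a model of the documented algorithm, with maplets held as (name, value, mask) tuples of my own devising:

```python
def decode(value, maplets):
    """Apply each maplet via (value & mask) == maplet-value; show any
    unexplained bits left over in hex."""
    parts, explained = [], 0
    for name, mapval, mask in maplets:
        if (value & mask) == mapval:
            parts.append(name)
            explained |= mask
    leftover = value & ~explained
    if leftover:
        parts.append("0x%x" % leftover)
    return "|".join(parts)

twobitfields = [
    ("green", 0x0001, 0x000F), ("blue", 0x0002, 0x000F), ("red", 0x0003, 0x000F),
    ("small", 0x0100, 0x0F00), ("large", 0x0200, 0x0F00),
]
print(decode(0x0243, twobitfields))  # red|large|0x40
```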
It has been alluded to above that when supplying numeric expressions, the map keyword may also be used.
In the following example, the expression evaluates to 0x0105 :-

map twobitfields "small" + 5

In fact, if there is no constant or symbol with the same name, you can use the following shorthand for the above example :-

small + 5
Even sophisticated mappings like the following will work as expected :-
map attribute_byte
{
    "colour" 0x10 : 0xf0
    "red"    0x13 : 0xff
    "green"  0x14 : 0xff
    "shape"  0x20 : 0xf0
    "round"  0x23 : 0xff
    "square" 0x24 : 0xff
}
In this example the meaning of the bottom 4 bits is dependent on the value of the top 4 bits.
The top 4 bits encode whether the attribute is encoding information about the colour or shape of something, and the bottom 4 bits encode which colour or shape.
The value 0x23 is displayed as "shape|round".
Sometimes it can be convenient to add to the definition of a mapping.
This can be done via the add keyword, as in the following example :-

map animals     { "dog" 1  "cat" 2 }
map animals     { "giraffe" 3 }   // Error, redefinition of map animals
map animals add { "zebra" 4 }     // Okay, extends map animals
map birds add   { "pelican" 5 }   // Error, no map birds to extend
When displaying a maplet decoded value, the M key can be used to bring up a list of the maplets and whether they decode or not. Through this, the value can be edited.
You can use the suppress keyword to prevent BE using a maplet when converting a number to a string.
This is not normally used, but can sometimes be handy to cut down screen clutter, as a number is normally displayed in less space.
In the following example, 0xc3, bright flashing blue, is shown as "blue|0xc0".
Maybe we are only interested in the colour :-

map obscure_mapping
{
    "bright" suppress 0x80 : .
    "flash"  suppress 0x40 : .
    "red"    0x01 : 0x3f
    "green"  0x02 : 0x3f
    "blue"   0x03 : 0x3f
}
Much more common is to interactively suppress maplets from the M maplet list using the @S and @N keys.
Definitions are BE's equivalent of C structures and unions.
Definitions are a list of at OFFSET clauses, align ALIGNMENT clauses and field definitions.
When the structure definition is processed, the current-offset is initialised to 0.
An at OFFSET clause moves the current-offset to the specified numeric value.
An align ALIGNMENT clause moves the current-offset to the next integer multiple of the specified numeric value.
A field definition defines a field which lives at the current-offset into the structure.
After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used, or a union is being defined).
The size of the structure is the largest value that the current-offset ever attains.
This is the value returned whenever sizeof DEFN is used as a number.
Duplicate definitions of the same named definition are not allowed.
A structure definition may have zero or more fields, align ALIGNMENT clauses and/or at OFFSET clauses.
A structure definition may behave like a C struct definition, in that each field follows on from the previous one in memory.
Or it may behave like a C union definition, in that all fields overlay each other in memory, and the total size is the size of the largest field.

def A_STRUCTURE struct
{
    n32 "first field, bytes 0 to 3"
    n32 "next field, bytes 4 to 7"
    // sizeof A_STRUCTURE is 8
}

def A_UNION union
{
    n32 "first field, bytes 0 to 3"
    n16 "second field, bytes 0 to 1"
    // sizeof A_UNION is 4
}

The keyword struct is unnecessary, and may be omitted.
These may be combined, as in the following :-

def MY_COMPLICATED_STRUCTURE
{
    n32 "first field, occupying bytes 0 to 3"
    union
    {
        n32 "second field, occupying bytes 4 to 7"
        struct
        {
            n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
            n8 "the upper middle byte, occupying byte 6"
            n8 "the top byte, occupying byte 7"
        }
    }
}
The at OFFSET clause also allows the same areas of a structure to be displayed in more than one way, thus also allowing the implementation of unions :-

def UNION_THE_HARD_WAY
{
    n32 le "first value, bytes 0 to 3"
    at 0
    n8 "the lower byte, byte 0"
    // sizeof UNION_THE_HARD_WAY is 4
}
Note: in the above style of example, you can't use the offsetof keyword to position a new field on top of an earlier field, because whilst you are defining a structure definition, it isn't actually fully defined yet, and so the offsetof keyword will not be able to find it.
Each clause can be terminated or separated with a ;, although normally this isn't necessary.
One example of where it is required is :-

def WONT_BEHAVE_AS_EXPECTED
{
    n8 "first"
    align 4 // #1
    +5 n16 "array"
}

The lack of a ; at #1 causes BE to interpret this as align 9, followed by a single n16 field.
Here are some examples of field definitions :-
n8 asc "initial"
buf 20 "surname"
n16 be unsigned dec "age"
3 pet "pet names"
3 n16 be unsigned dec "pet costs"
2 n32 le unsigned hex ptr person "2 pointers to parents"
2 n32 ptr person null "2 pointers, null legal"
person "a person"
n32 sym code "__main"
1024 n32 unsigned dec "memory as 32 bit words"
9 n16 map errorcodes "results"
buf 100 asc zterm "a C style string"
GENERIC_POINTER suppress "pointer"
n32 ptr FRED add -. "link"
n32 bits 31:28 "top 4 bits"
n32 bits 27:0 "bottom 28 bits (of another word)"
n32 sym code width 10 "function"
n32 time "last_update_time"
Each example is of the form :-

optional-count type optional-attrs name

The field describes count data items of the specified type.
If count is not 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers").
When you select the field, you are presented with an element list with count lines, from which you can select the element you are interested in.
The type of the data is one of n8, n16, n24, n32, n40, n48, n56, n64, buf N or DEFN, where DEFN is the name of a previously defined definition.
This type may be considered to be the way in which BE is told the size of the data item concerned.
n8, n16, n24, n32, n40, n48, n56 and n64 mean 8, 16, 24, 32, 40, 48, 56 or 64 bit numeric data items.
Support for >32 bit values is only present in certain operating system versions of BE.
buf N means a buffer of N bytes.
There is also a special expr E type, which defines a 'field' whose value is the result of the expression E.
The expression E may be any expression, and may even refer to other fields in the definition.
The . symbol evaluates to the address of the field.
Obviously you can't edit/change the value of an expression.
So the following sort of thing becomes possible :-
def RECTANGLE
{
    n8 dec "width"
    n16 be dec "height"
    expr "width*height" dec "area"
}
The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.
In addition to the data display attribute keywords given above is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping, which must have previously been defined.
If the field is one of n8, n16, n24, n32, n40, n48, n56 or n64, the bits MS:LS designation can be used to say that only a subset of the bits fetched are to be displayed.
Also, if you edit the field, only that subset of bits is changed.
BE does a read-modify-write of the numeric field to achieve this.
Despite only showing a subset of the bits, the field is still the same 'size', and the union mechanism must be used to decode multiple bit ranges in the same numeric field.
eg:

union
{
    n16 be bits 15:12 bin "top 4 bits"
    n16 be bits 11: 0 hex "bottom 12 bits"
}
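The bits MS:LS extraction, and the read-modify-write used when editing such a field, can be sketched as follows (helper names are mine):

```python
def get_bits(value, ms, ls):
    """Extract bits ms..ls inclusive (eg: 0x1234 bits 11:4 -> 0x23)."""
    width = ms - ls + 1
    return (value >> ls) & ((1 << width) - 1)

def set_bits(value, ms, ls, field):
    """Read-modify-write: replace only bits ms..ls, keeping the rest."""
    width = ms - ls + 1
    mask = ((1 << width) - 1) << ls
    return (value & ~mask) | ((field << ls) & mask)

print(hex(get_bits(0x1234, 11, 4)))        # 0x23
print(hex(set_bits(0x1234, 11, 4, 0xff)))  # 0x1ff4
```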
The ptr DEFN attribute says that the numeric value is in fact a pointer to a definition of type DEFN.
DEFN need not be defined yet in the initialisation file.
The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to.
You can use mult MULT to multiply the pointer value by MULT (therefore mul is effectively the same as mult sizeof DEFN).
The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0.
The keyword add BASE may be used, and there is also an align ALIGNMENT keyword.
ALIGNMENT can only be 1, 2, 4, 8, 16, 32 or 64 in the current implementation.
Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value.
By using combinations of the pointer keywords, various effects may be achieved :-
n32 ptr DEFN abs
n32 ptr DEFN add 0x40000 abs
n32 ptr DEFN mul add addr "table" abs
n32 ptr DEFN rel
n32 ptr DEFN add 8 rel
n8 ptr DEFN add 1 align 4 abs
Clearly the expr mechanism described above can be used to similar effect.
The procedure for following pointers is :-
If nonull and the pointer is 0, then don't follow the pointer.
If mul, then multiply the pointer value by the size of the item being pointed to.
If mult MULT, then multiply the pointer value by MULT.
If add BASE, then add BASE to the pointer value.
If rel, then add the address of the pointer itself.
If seg, then mangle the pointer value to account for the 16:16 segmented mode of x86 processors.
If align ALIGNMENT, then round the pointer up to the next multiple of ALIGNMENT.
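The steps above can be modelled in Python. This is an illustrative sketch, not BE's actual code; the attribute names are passed as hypothetical keyword flags, and the steps are applied in the order listed.

```python
def resolve_pointer(value, ptr_addr, *, sizeof_defn=1, nonull=False,
                    mul=False, mult=None, add=None, rel=False,
                    seg=False, align=None):
    # nonull: a zero pointer is never followed
    if nonull and value == 0:
        return None
    if mul:                      # multiply by the size of the pointed-to item
        value *= sizeof_defn
    if mult is not None:         # multiply by an explicit factor
        value *= mult
    if add is not None:          # add a base value
        value += add
    if rel:                      # add the address of the pointer itself
        value += ptr_addr
    if seg:                      # 16:16 segment:offset -> physical
        value = ((value >> 16) & 0xFFFF) * 16 + (value & 0xFFFF)
    if align is not None:        # round up to the next multiple
        value = (value + align - 1) // align * align
    return value

# Modelling 'n8 ptr DEFN add 1 align 4 abs' with a pointer value of 5:
print(resolve_pointer(5, 0x1000, add=1, align=4))  # -> 8
```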
The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom 16 bits as the offset, and producing a new pointer value which is segment*16+offset.
This feature may be of use for decoding large memory model program dumps which have been running on x86 processors in real mode, or in a 16:16 protected mode with a linear selector mapping.
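The segment:offset arithmetic can be expressed in Python as follows (an illustrative helper, not part of BE):

```python
def seg_to_physical(ptr):
    # Top 16 bits of the pointer value are the segment,
    # the bottom 16 bits are the offset.
    segment = (ptr >> 16) & 0xFFFF
    offset = ptr & 0xFFFF
    # Real-mode style physical address: segment*16 + offset.
    return segment * 16 + offset

# 0x1234:0x0005 -> 0x12340 + 0x5 = 0x12345
print(hex(seg_to_physical(0x12340005)))  # -> 0x12345
```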
This feature is not recommended - it's much easier to use the -g command line switch instead.
Anyone with a sensible file format to decode, or a memory dump taken from the
memory space of a processor of a sensible architecture, can ignore this
feature.
The keyword open may be given, and this has the effect of increasing the level of detail that is initially displayed.
See the description of the level of detail of display feature later in this document.
This feature has its problems (bugs), but can be used to ensure that small arrays and short definitions are displayed in full, without the user having to increase the level of detail by hand.
The suppress field attribute may be given using the suppress keyword.
Suppressed fields are omitted from display when showing a whole definition on one line (by expanding the level of display).
Suppressed fields are shown in round brackets when viewing a definition with each field on a new line.
The tag attribute may be given.
When such a field is first displayed, its line will initially be tagged.
Typically you might pre-tag one or two specific fields in a structure, if the structure were large and certain fields were more important than others.
The width WIDTH attribute may also be given.
By default, field widths are 0, which means don't pad or truncate fields when they are displayed.
When set non-0, each field (or each individual element of an array) is padded or truncated to the given width.
If a field is truncated, a > or < symbol is shown.
The width can be changed interactively by the user.
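The pad-or-truncate behaviour can be sketched in Python. This is a simplified model: the choice of which truncation marker (> or <) appears, and its placement in the last column, are assumptions here, not a statement of BE's exact rendering.

```python
def fit_width(text, width, marker='>'):
    # Width 0 means: display the field as-is, no padding or truncation.
    if width == 0:
        return text
    # Truncate, keeping one column for the truncation marker (assumed).
    if len(text) > width:
        return text[:width - 1] + marker
    # Otherwise pad with spaces to the requested width.
    return text.ljust(width)

print(fit_width("PartitionLength", 8))  # -> 'Partiti>'
```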
A validity check expression may be associated with each field, via the valid V syntax.
Any field which passes its validity check has ++ displayed next to it, and any which fails has -- displayed next to it.
When a whole definition is shown on one line (by expanding the level of detail of display), those fields which fail their validity tests are not shown.
This provides a handy way of doing conditional decode of variant records.
map T_
  {
  "T_SHORT" 1
  "T_LONG" 2
  }

def VARIABLE
  {
  buf 20 asc zterm "name"
  n8 map T_ "type"
  union
    {
    n16 dec valid "type==T_SHORT" "value16"
    n32 dec valid "type==T_LONG" "value32"
    }
  }
Sometimes validity checks can get quite long, so remember that backslash at the end of a line causes 'line continuation', as in :-
n8 dec "discriminator"
n16 dec valid "discriminator==1||\
               discriminator==2||\
               discriminator==3" "conditional_field"
Aside: Beware of using the C/C++ pre-processor (or other macro pre-processors) on BE initialisation files - they may not handle things like 'line continuation' quite the same way as BE does.
eg: In the example, BE ignores the white space preceding the word discriminator on the last two lines, but some (all?) C++ pre-processors include the white space in the final string!
Finally, the name of the field must be given. You used to have to pad all field names of the same definition to the same width with spaces, so that when displayed, everything lined up nicely. But now BE does this automatically for you.
A typical structure definition might look like :-
def FROGLISTELEM
  {
  n32 ptr FROGLISTELEM "next_frog_in_list"
  buf 100 asc "name_of_this_frog"
  }
However, consider the case where BE is being used to edit a dump of a processor's memory space.
In this case we also wish to be able to see all the global variables, whose addresses are determined by a symbol (rather than some fixed address).
So it is typical to take advantage of the fact that fields can be placed at any offset into a structure (using at EXPR), and that expressions may refer to the symbol table (using addr "SYM").
You put such fields in a structure holding global variables, which would be decoded from address 0.
You'd write something like :-
def GLOBAL_VARS
  {
  at addr "frog_list" n32 ptr FROG "frog_list"
  ...
  }
Now this can be a very common idiom, and you usually want the displayed field name to match the symbol name. So to avoid typing everything twice, BE provides a short-cut :-
def GLOBAL_VARS
  {
  n32 ptr FROG at "frog_list"
  ...
  }
When this feature was added to BE, and some real-world BE initialisation files were modified to take advantage of it, the files got 17% smaller.
Normally, when parsing a structure definition, each field is positioned immediately after the one before (unless the union, align, or at keywords are used).
When BE begins processing the initialisation file, it believes that all n8, n16, n24, n32, n40, n48, n56 and n64 variables should be aligned on a 1 byte boundary.
In other words, no special alignment is to be automatically performed.
This is radically different from the way high level languages such as C lay out the fields within their structures and unions.
These languages enforce constraints such as '32 bit integers are aligned on 4 byte boundaries'.
This is usually done because certain processor architectures either can't access certain sizes of data from odd alignments, or are slower doing so.
This can be accounted for by manually adding padding to structure definitions :-
def ALIGNED_USING_MANUAL_PADDING
  {
  n8 "fred"
  buf 3 "padding to align bill on a 4 byte boundary"
  n32 "bill"
  }
Or alternatively, the align keyword could be used :-
def ALIGN_USING_align_KEYWORD
  {
  n8 "fred"
  align 4
  n32 "bill"
  }
It is possible to tell BE to automatically align n8, n16, n24, n32 or nested definition fields on specific byte (offset) boundaries by constructs such as the following (which corresponds to many 32 bit C compilers) :-
align n16 2
align n32 4
align def 4
align { 4
align } 4

def ALIGNED_AUTOMATICALLY
  {
  n8 "fred"
  n32 "bill"
  }
The align { directive specifies that nested definitions must start on the indicated boundary.
The align } directive specifies that structure sizes get rounded up to a multiple of the alignment.
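The round-up arithmetic behind alignment is the standard one; in Python (illustrative only):

```python
def align_up(offset, alignment):
    # Round offset up to the next multiple of alignment.
    return (offset + alignment - 1) // alignment * alignment

# With 'align n32 4', an n32 following an n8 at offset 0 lands at offset 4.
print(align_up(1, 4))  # -> 4
```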
Clearly, this feature is more useful when BE is being used to probe memory spaces of running programs via a memory extension, or doing post-mortem examination of program memory dumps.
Most data file formats don't-need-to and/or don't-bother-to align their fields.
The initialisation file can contain the following, as long as it is outside of any other definition :-
include "anotherfile.ini"
Be sure to notice that this is an initialisation language command, not a pre-processor directive like $ifdef.
This is why it is not $include.
There is also a tryinclude variant, which tries to open the file specified, but does not get upset if it can't :-
tryinclude "extrastuff.ini"
The following are reserved words, and so should be avoided as names of constants in the initialisation file :-
abs add addr align asc at be bin bits buf code dec def ebc expr glue hex include le lj map mul mult n8 n16 n24 n32 n40 n48 n56 n64 nocode noglue nolj nomul nonull noseg nozterm null oct offsetof open ptr rel seg set signed sizeof struct suppress sym tag time tryinclude union unset unsigned valid valof width zterm
Here is a real initialisation file, which is intended for viewing the master boot record written on sector 0 of PC disks :-
//
// mbr.ini - BE initialisation file for decoding master boot records
//
// Under Linux, root can obtain the MBR via a command much like :-
//
//   # dd if=/dev/sda of=mbr.dat bs=512 count=1
//
// Then you'd invoke BE via :-
//
//   % be -i mbr.ini mbr.dat
//
// The file assumes the drive from which the MBR was obtained has
// 63 sectors per track and 255 heads. These assumptions are used in
// computations of LBAs given CHS information. If the disk geometry is
// actually different (as is likely for <8GB disks), you can override
// the assumptions via a command line much like :-
//
//   % be -Ssectors_per_track=32 -Sheads=127 -i mbr.ini mbr.dat
//
// Information obtained mainly from STORAGE.INF.
//
set nspt `sectors_per_track 63`
set nh `heads 255`
map BOOTINDIC
  {
  "Not Active" 0x80 : 0x80
  "Active" 0x00 : 0x80
  }
map PARTOWNER
  {
  "Unused" 0x00
  "DOS, 12-bit FAT" 0x01
  "XENIX System" 0x02
  "XENIX User" 0x03
  "DOS, 16-bit FAT" 0x04
  "Extended" 0x05
  "DOS, >32MB support, <=64KB Allocation unit" 0x06
  "OS/2, >32MB partition support" 0x07
  "Linux swap" 0x82
  "Linux native" 0x83
  // Note: lots missing for brevity of example
  }
def PARTCHS
  {
  at 1 n8 bits 7:6 hex width 3 suppress "CylinderHigh"
  at 2 n8 hex width 4 suppress "CylinderLow"
  expr "(CylinderHigh<<8)+CylinderLow" dec width 4 "Cylinder"
  at 0 n8 dec width 3 "Head"
  at 1 n8 bits 5:0 dec width 2 "Sector"
  expr "(Cylinder*nh+Head)*nspt+Sector-1" width 8 dec suppress "lba"
  }
def PART
  {
  n8 map BOOTINDIC "BootIndicator"
  PARTCHS open "PartitionStart"
  n8 map PARTOWNER "SystemIndicator"
  PARTCHS open "PartitionEnd"
  n32 dec width 8 "OffsetFromStartOfDiskInSectors"
  n32 dec width 8 "PartitionLengthInSectors"
  // By adding these two, you can work out the LBA
  // immediately following the partition
  expr "OffsetFromStartOfDiskInSectors+PartitionLengthInSectors" dec width 8 suppress "next_lba"
  expr "OffsetFromStartOfDiskInSectors*512" hex ptr MBR valid "SystemIndicator==Extended" suppress "extended"
  }
def MBR
  {
  buf 446 hex "MasterBootRecordProgram"
  4 PART "PartitionTable"
  n16 be hex valid "Signature==0x55aa" "Signature"
  }
def main
  {
  MBR "mbr"
  }
In the above example quite a few of the BE features are demonstrated.
The setting of nspt and nh shows the expression syntax meaning "value of symbol if defined, else default value".
These variables represent the geometry of the disk.
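The backtick syntax behaves like a dictionary lookup with a fallback. In Python terms (a hypothetical model of how `sectors_per_track 63` resolves, where the symbols come from -S command line options):

```python
def valof_or_default(symbols, name, default):
    # `name default` : value of the symbol if defined, else the default value
    return symbols.get(name, default)

# No override given on the command line:
print(valof_or_default({}, "sectors_per_track", 63))  # -> 63
# Overridden via -Ssectors_per_track=32:
print(valof_or_default({"sectors_per_track": 32}, "sectors_per_track", 63))  # -> 32
```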
The map BOOTINDIC shows using map for decoding bits (in this case just one bit).
This mapping decodes the per-partition 'boot indicator' flag.
The map PARTOWNER shows using map for decoding enumerations.
This mapping decodes the owner (or type) of the partition.
The def PARTCHS, which shows the cylinder, head and sector of either the start or end of a partition, shows the following BE features: the at OFFSET construct; bits MS:LS, and how to combine them together to make a meaningful value using expr "EXPRESSION"; width WIDTH, so that the screen layout is nice; and suppress, to suppress some fields, so that only the ones worthy of display are shown when the entire PARTCHS structure is shown expanded on one line.
In the def PART, which decodes an entire partition entry in the partition table in the master boot record, we can see the use of the earlier two mappings, and also the use of open so that the PARTCHS structures are shown 'ready expanded'.
Because of the earlier use of suppress in the def PARTCHS explained above, you'll just see the decoded cylinder, head and sector.
Of course, when you select the PARTCHS, you'll see everything.
The use of expr "EXPRESSION" for computing next_lba is really useful when you use it in conjunction with the computed lba in the def PARTCHS above.
Basically, the LBA beyond the end of a partition should be the start of the next partition, and so should be the OffsetFromStartOfDiskInSectors of another partition, and this should tally with the LBA computed from the CHS in the PartitionStart.
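The lba expression in the def PARTCHS is ordinary CHS-to-LBA arithmetic. In Python, mirroring the expr string, with the same assumed geometry defaults as mbr.ini:

```python
def chs_to_lba(cylinder, head, sector, nh=255, nspt=63):
    # Mirrors the mbr.ini expression: (Cylinder*nh+Head)*nspt+Sector-1
    # Sectors are 1-based on disk, hence the trailing -1.
    return (cylinder * nh + head) * nspt + sector - 1

# CHS 0/0/1 is the very first sector of the disk, LBA 0.
print(chs_to_lba(0, 0, 1))  # -> 0
```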
In the def MBR there is the use of a valid "EXPRESSION" validity check.
It says the Signature field is valid if it is 0x55aa.
So if, when you load BE to view an MBR, the Signature is shown with a -- indicator, you know the MBR isn't valid.
Clearly, with the above file, it is possible to do some rather low-and-dirty FDISK-like things to MBR data, especially if you are using BE via a memory extension to directly access live disk sectors. Just because you can, it doesn't mean you should - be careful.
The supplied initialisation file contains enough definitions to enable you to examine the contents of many file formats.
Bitmap files supported include :-
Animation formats :-
Also, the following miscellaneous file formats :-
The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files' contents; they are merely intended to aid in browsing the contents of such files.
Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.
If you are simply interested in looking at some of the file raw, you can use the DB, DW and DD definitions that come supplied in the default initialisation file.
If you wanted to look at memory at 0x8000 as dwords, you could type :-
@ DD Enter 0x8000 Enter Enter
Here is a more formal specification of the BE initialisation file language. Actually, BE will accept variations on the following, but here we document the clearest/least-ambiguous use of the language. Where BE accepts variations on the following, typically it is in the ordering of independent attributes.
Some basics just before we start :-
<number> ::= a number in C/C++ style as in 0b1101, 0o15, 13, or 0x0d, or '\r' or similar
<id> ::= a C/C++ style identifier
<string> ::= a C/C++ style double quoted string which is clean (characters between 32 and 126 only) as in "Hello World" etc.
<buffer> ::= a string or hexstring buffer as in "SIGNATURE" or @FF0022
Numeric expressions :-
<sep> ::= { ',' | ';' }
<expr13> ::= <number>
           | '+' <expr13> | '-' <expr13> | '~' <expr13> | '!' <expr13>
           | 'addr' <string>
           | 'sizeof' <id>
           | 'offsetof' <id> <string>
           | 'valof' <string>
           | 'map' <id> <string>
           | '(' <expr> ')'
           | <id>
           | '`' <id> <expr> '`'
           | '.'
           | '[' <n_value> <sep> <expr> [ <sep> <expr> ] ']'
           | '[[' <buffer> <sep> <expr> <sep> <expr> <sep> <expr> [ <sep> <expr> ] ']]'
           | 'strlen' <expr>
<expr12> ::= <expr13> { ( '*' | '/' | '%' ) <expr13> }
<expr11> ::= <expr12> { ( '+' | '-' ) <expr12> }
<expr10> ::= <expr11> { ( '<<' | '>>' | '>>>' ) <expr11> }
<expr9> ::= <expr10> { ( '>' | '<' | '>=' | '<=' ) <expr10> }
<expr8> ::= <expr9> { ( '==' | '!=' ) <expr9> }
<expr7> ::= <expr8> { '&' <expr8> }
<expr6> ::= <expr7> { '^' <expr7> }
<expr5> ::= <expr6> { '|' <expr6> }
<expr4> ::= <expr5> { '&&' <expr5> }
<expr3> ::= <expr4> { '^^' <expr4> }
<expr2> ::= <expr3> { '||' <expr3> }
<expr> ::= <expr2> { '?' <expr2> ':' <expr2> }
Sometimes in expressions . (dot) is allowed; it usually refers to some default amount.
Other times it isn't allowed.
A maplet is a mapping from a number to a string to display, and a map is zero or more maplets.
Using . in the maplet value (first expression) means 0, or the previous value plus 1; in the maplet mask (optional second expression) it means the same as the value :-
<maplet> ::= <string> [ 'suppress' ] <expr> [ ':' <expr> ]
<map> ::= 'map' <id> [ 'add' ] '{' { <maplet> } '}'
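The '.' defaulting rule for maplet values can be modelled in Python (an illustrative sketch, not BE's parser):

```python
def expand_maplet_values(values):
    # '.' means 0 for the first maplet, or the previous value plus 1.
    out, prev = [], None
    for v in values:
        if v == '.':
            v = 0 if prev is None else prev + 1
        out.append(v)
        prev = v
    return out

# Hypothetical map COLOUR { "red" . "green" . "blue" 10 "cyan" . }
print(expand_maplet_values(['.', '.', 10, '.']))  # -> [0, 1, 10, 11]
```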
Numeric fields. Where the value comes from, how to display it, how to use it as a pointer (if it is one), and putting it all together :-
<n_value> ::= ( 'n8' | 'n16' | 'n24' | 'n32' | 'n40' | 'n48' | 'n56' | 'n64' ) [ 'le' | 'be' ] [ 'bits' <expr> ':' <expr> ] [ 'signed' | 'unsigned' ]
<expr_value> ::= 'expr' <string>
<code_attrs> ::= [ 'lj' | 'nolj' | 'glue' | 'noglue' ]
<numeric_attrs> ::= [ 'map' <id> ] [ 'asc' | 'ebc' | 'bin' | 'oct' | 'dec' | 'hex' | 'sym' | 'time' ] [ 'code' <code_attrs> | 'nocode' ]
<pointer_attrs> ::= [ 'ptr' <id> [ 'null' | 'nonull' ] [ 'rel' | 'abs' ] [ 'mul' | 'nomul' ] [ 'mult' <expr> ] [ 'add' <expr> ] [ 'align' <expr> ] [ 'seg' | 'noseg' ] ]
<numeric_field> ::= ( <n_value> | <expr_value> ) <numeric_attrs> <pointer_attrs>
The expr string is itself a numeric expression.
You'll need to escape any quotes within it.
A buffer field: how big, how to show the data, and whether to stop at a NUL byte.
Using . in the buffer size expression gives the current offset into the definition :-
<buffer_field> ::= 'buf' <expr> [ 'hex' | 'asc' | 'ebc' ] [ 'zterm' | 'nozterm' ]
A field may name a nested definition :-
<def_field> ::= <id>
All fields share a set of general attributes, and have a name, so a complete field specification looks like :-
<field> ::= ( <numeric_field> | <buffer_field> | <def_field> ) { 'open' } [ 'valid' <string> ] [ 'width' <expr> ] [ 'suppress' ] [ 'tag' ] [ 'at' ] <string>
The valid string is itself a numeric expression.
Fields are just one type of item which can be found within a definition; items also cover offset within the definition, alignment, and nested definitions.
Items in an itemlist follow one another (as in C/C++ structs), or overlay each other (as in C/C++ unions).
Using . in the at expression gives the current offset into the definition.
So definitions are specified as :-
<item> ::= 'at' <expr> | 'align' <expr> | <itemlist> | <field> | ';'
<itemlist> ::= [ 'struct' | 'union' ] '{' { <item> } '}'
<def> ::= 'def' <id> <itemlist>
File includes are specified :-
<include> ::= ( 'include' | 'tryinclude' ) <string>
The default attributes used, if not fully specified in the fields, can be specified globally :-
<default> ::= 'asc' | 'ebc' | 'bin' | 'oct' | 'dec' | 'hex' | 'sym' | 'time'
            | 'signed' | 'unsigned' | 'be' | 'le' | 'rel' | 'abs'
            | 'mul' | 'nomul' | 'seg' | 'noseg' | 'null' | 'nonull'
            | 'code' | 'nocode' | 'lj' | 'nolj' | 'glue' | 'noglue'
            | 'zterm' | 'nozterm'
            | ( 'align' ( 'n8' | 'n16' | 'n24' | 'n32' | 'n40' | 'n48' | 'n56' | 'n64' | 'def' | '{' | '}' ) <expr> )
Setting and unsetting :-
<set> ::= 'set' <id> <expr>
<unset> ::= 'unset' <id>
So the total language is :-
<be> ::= <map> | <def> | <include> | <default> | <set> | <unset>
BE displays most of the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.
BE works by presenting lists to the user. These can be lists of data fields, lists of array elements etc.. A user action can result in a new list being displayed on top of the previous one. Effectively, there is a 'stack' of lists, where you always get to see the topmost one. The level of nesting is always on display at the top right hand corner of the screen.
Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home, End, Left and Right all work in the obvious ways, traversing the list on display. The Wordstar 'cursor diamond' keys ^E, ^X, ^R, ^C, ^W, ^Z, ^S and ^D also work.
As you move around the current list, your line number and the total number of lines in the list are shown at the top right of the screen, in the form line/totallines.
The user can discard the current list, and go back to the previous one by pressing Esc.
q or @X (ie: Alt+X) exits the program. If you have made any changes, you will be prompted as to whether BE should write them out to disk. @W can write out any unsaved changes.
p allows you to 'print' the list on display to a file.
You can specify the filename, and whether to append to or overwrite any existing file of that name.
^P is a shorthand way of saying append to the same file as last time.
If you haven't specified a file yet with p, the default is be.log.
Non-printable (but displayable) characters get converted to '.' dots.
f or / or F9 allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n or F10 can be used to repeat the last find. If a find is taking a long time, it may be interrupted using Ctrl+Break on OS/2, Windows or DOS. Elsewhere, the Esc key may be used. The \ key will reverse the direction of the find, ready for when you next use the 'repeat the last find' function.
i allows you to generate a new list, which only has lines which include a pattern you specify. This new list pops-up on top of the current one. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you generate a display which excludes lines which match the pattern.
S can be used to generate a new list which is the same as the current list, except the lines are sorted.
You are prompted for a 'sort after' pattern, and as to whether the result is to be sorted in ascending or descending order.
You are also prompted whether to do textual or numeric comparison.
Anything on each line, up to and including the 'sort after' pattern, is ignored for the purpose of the sort.
With textual comparison, the strings are compared; with numeric comparison, the strings are expected to start with a decimal or 0x-preceded hex value, and these are compared.
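The 'sort after' behaviour can be sketched in Python. This is an approximation, not BE's actual implementation; in particular, the exact rules for parsing the leading number and for lines that don't contain the pattern are assumptions here.

```python
import re

def sort_after(lines, pattern, numeric=False, descending=False):
    # Everything up to and including the 'sort after' pattern is
    # ignored for the purpose of the sort.
    def key(line):
        idx = line.find(pattern)
        rest = line[idx + len(pattern):] if idx >= 0 else line
        if not numeric:
            return rest
        # Numeric comparison: the remainder should start with a
        # decimal, or 0x-preceded hex, value (assumed parsing).
        m = re.match(r'\s*(0x[0-9a-fA-F]+|\d+)', rest)
        return int(m.group(1), 0) if m else 0
    return sorted(lines, key=key, reverse=descending)

# Numeric comparison sorts 2 before 10; textual comparison would not.
print(sort_after(["a, 10", "b, 2"], ",", numeric=True))  # -> ['b, 2', 'a, 10']
```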
N takes a textual snapshot of the current list. As the result is just text, you'll find that all you can do is view the data. This feature can be useful if you are using BE to view changing live data.
The find, include and exclude commands normally do a straight case sensitive textual comparison.
The editor can be toggled in and out of Extended Regular Expression mode (as in UNIX egrep), using the @R key.
When set into this mode, future finds, includes and excludes all work with extended regular expressions.
eg: include (fred|bill)[0-9]+ will include all lines with 'fred' or 'bill', followed by one or more digits.
Similarly, @I can be used to toggle in and out of case sensitive search mode.
The Extended Regular Expression mode and case sensitivity mode also affect the sort command.
The sort command and the use of Extended Regular Expression mode go naturally hand in hand, because you often want to be able to sort upon the Nth field of each line.
It is trivial to write an ERE like ,[^,]*, which matches the first pair of commas (so the sort can be done on the third field), or 0x[0-9a-f]+ which matches the first hex number.
The Extended Regular Expression mode and case sensitivity mode also affects the 'power address slide' patterns, and tag/untag all matching commands, as explained later.
The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If a memory extension providing data to BE was caching data, this type of refresh causes it to drop its cache. Sometimes BE is used with an extension to watch live real-time data, and continual refresh is desired. By pressing the periodic update key, @U, you can put BE into a mode whereby it refreshes at regular intervals. The interval is user-selectable. You exit this mode using Ctrl+Break on OS/2, Windows or DOS. Elsewhere, Esc may be used.
Tags may be placed or removed within the list on display by pressing the @T key. You may quickly move backwards or forwards between tags by pressing ^Home or ^End. Tags appear as little 'T's on the right hand side of the line. Placing or removing tags in one session or list has no effect on any others.
T and U may be used to tag or untag all lines matching a given pattern or extended regular expression.
The ! key may be used to execute an operating system command.
This capability can be disabled by the -r command line flag.
@V can be used to bring up a view of a regular text file. There is no text editing capability. As special cases, F1 tries to bring up the help file, and F2 tries to bring up the configuration file.
BE doesn't just maintain a single stack of lists. In fact it maintains 10 parallel stacks, or 'sessions'. You can jump between them using the @0, @1, ... @9 keys. This allows you to be looking at several places within your data at once, and to be able to easily hop between them. The current session number is the second from last number on display on the top right corner of the screen. It is initially 1.
@C copies the stack of lists from the previous session onto the current session. Typically you use this when you've found something interesting, and you'd like to leave the current session showing the interesting data, and yet you'd also like to continue investigations around that area.
Given there are 10 sessions, each with any amount of nesting, it can be easy to get lost, so the @K allows you to generate a summary of where you are in each session.
@Z may be used to pop off all the lists in the current session, and effectively reset the nesting level to 1.
@F1 to @F4 inclusive may be used to change the colour scheme to schemes 0 to 3, as initially specified by the -c command line argument, or as initially defaulting to 0.
The keys A, O, L and I toggle the display of addresses, offsets, lengths and array indices.
@A, @E, @B, @O, @D and @H may be used to set the display mode of the array indices to ASCII, EBCDIC, binary, octal, decimal or hex.
Also, @Y toggles the display of addresses between raw hex, and symbol table entry plus offset.
The @J command toggles the display of symbolic code addresses which have the lj attribute between the short and long forms.
By default, at startup, BE chooses only to show array indices, the array index mode is hex, addresses are not shown symbolic, and long jumps are shown in their short form.
The -v command line flag can also be used to change the startup display flags.
The | (pipe-bar) key toggles the display of pipe bars between flags in a mapping.
This is typically only used when a mapping has been cleverly defined to do something like RISC instruction set disassembly, to tidy up the display.
The & (ampersand) key toggles the display of pointer values.
Normally they are shown, but quite a bit of screen clutter can sometimes be removed by not showing them.
Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then pop-up a new list, decoding the memory at the given address as if it were of the specified structure type. When being prompted for the definition name, you can actually type a definition name followed by a numeric expression, in order to display an array of that many elements, each of which is a definition of the given type.
The C key allows you to disassemble from a given address, assuming a disassembler extension has been supplied to BE via the -C command line argument.
D can be used to pass user options through to the disassembler.
Initially, if a symbol table is supplied to BE, disassembly stops when the symbolic address (as in symbol+offset) changes. ie: BE won't disassemble more than one function. Although one compiled C function typically has one label, hand written assembler tends to have many labels within one function, so the Y key can toggle between stopping on label changes and ignoring them.
The @F key pops up a list of the memory sections BE is editing. There is one for each file (or memory extension invocation) currently being edited. Against each, BE says whether it has any unsaved changes.
The editor holds a list of 12 'address slide' patterns, and these may be displayed by pressing @M. These are used when the 'power address slide' feature is used. You can set one of the 12 patterns by using the ~F1 to ~F12 keys. To disable one, you specify a new pattern as an empty string.
The editor holds an 'address slide' delta value. Initially this delta value is 4, but it may be changed using the # key. When using #, dot '.' may be used in the numeric expression, and its current value is the current delta value. This delta value is used by the manual 'address slide' feature using the < and > keys, and also the 'power address slide' feature.
If you press ?, BE will prompt for a numeric expression, which it will then evaluate. You can then choose to see the result in binary, octal, decimal, hex or symbolic forms, signed or unsigned.
$ is similar, except it will prompt for a variable name first. It will set the variable to the result of evaluating the expression. If the variable is already set, its value is changed. If the expression is empty, the variable is unset.
When you use the ^L key, you are prompted for a count and a keystroke. BE presses the keystroke on the current line, and then steps down a line. It does this once for each of the count of lines you specified. The count value can be 0 or blank, meaning upto the end of the list on display. This keypress, step down and repeat loop, will stop if the keypress is not 'understood' by the line it is pressed on. This means that only keypresses which operate on a given line are sensible for using with ^L. It will also stop if the end of the list is reached.
^K toggles the keep-going-on-error flag. This flag is initially false, causing ^L to stop if a line doesn't understand the keystroke. However, when true, ^L simply advances to the next line.
@G can be used to go to the Nth line on display. 0 means the first line, a blank line number, or a very large number means the very last line.
Normally BE will only show at most 4096 lines of data, ie: elements of an array or elements of a linked-list. ^ can be used to change this number to anywhere in the range 256 to 65536. Use with caution: BE can get much slower when dealing with longer lists.
^U and ^V can be used to cycle the memory sections around one way or the other. When you have multiple files/memory sections covering the same address range, this controls which one a memory reference will hit first.
At any given time you may be displaying some data from some start address, as indicated on the title at the top of the screen.
The . key can be used to change the current address, and the , key can be used to add to the current address.
The editor provides a feature known as 'address sliding'.
You can use the ( and ) keys to step (slide) the address backwards or forwards by 1.
You can also use the < and > keys to step (slide) the address backwards or forwards by a particular delta (as setup by the # key, described above).
The 'power address slide' feature is the combination of regular 'address sliding' with a pattern match capability. You set up the power address slide patterns and then press [ or ] (for a backwards or forwards search). You then state whether one, all, or all-in-order of the patterns must match, and how to refresh the screen as the search proceeds. You're also prompted for an address to stop at. BE then slides through memory, checking to see whether the patterns can be matched with the screen, and if so it stops.
A 'power address slide' may be interrupted via Ctrl+Break (OS/2, Windows or DOS), or Esc (elsewhere).
There are a few main uses of address sliding :-
One is paging up and down through memory using, say, the DD definition in the default initialisation file: you set the delta value to be a page worth of data, and then use the < and > keys to page up and down.
The justification for the default delta of 4 is that many structures within processor memory spaces or within files are 4 byte aligned.
The @ command described earlier works a little better when you are viewing data, because a dot used in the numeric address expression is taken to mean the current address (as shown on the title).
Similarly, the C command described earlier works a little better when you are viewing data, because a dot used in the numeric address expression is taken to mean the current address (as shown on the title).
Often you may find yourself looking at a definition that is actually a member of a larger definition. If you know the offset of the smaller definition in the larger definition, you can subtract this from the current address and display the larger parent definition. This can be awkward, so the @P key will pop-up a list of all possible parent definitions, with an entry for every time the smaller definition appears in another definition.
g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.
s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.
A subset of the keys a/e/b/o/d/h/k/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, decode seconds since epoch, symbolic or via a mapping table.
z is displayed if you are allowed to toggle the 'stop displaying when a nul terminator is found' attribute.
The t key will decode the current field as if it were raw ASCII text, and will break it up into lines upon CR, LF, CR-LF pair, or NUL boundaries. The new line-by-line list pops up on top of the current list.
If the datum is a code address (marked with the code
attribute in the initialisation file), then c can disassemble
the code at that address.
+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.
Increasing the level of display can make BE open up an array,
and enumerate the elements.
eg: 3 n32 to [123,123,456].
Increasing the level of display can also make BE open up a
definition, and display the fields.
eg: VAR to {"name",123}.
This is capable of opening up the datastructure pointed to by a pointer, providing the pointer may be fetched and followed.
Some examples :-
level 0 (=type) | level 1 | level 2 | level 3 |
---|---|---|---|
n32 | 7 | 7 | 7 |
3 n32 | 3 n32 | [8,9,10] | [8,9,10] |
VAR | VAR | {"a",1} | {"a",1} |
2 VAR | 2 VAR | [VAR,VAR] | [{"b",2},{"c",3}] |
n16 ptr VAR | 22->VAR | 22->{"d",4} | 22->{"d",4} |
2 n8 ptr VAR | 2 n8 ptr VAR | [33->VAR,44->VAR] | [33->{"e",5},44->{"f",6}] |
Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. This results in a new list of fields or array elements being popped-up. The Esc key brings you back to where you are now.
There is a shorthand for the above @ command.
If you are on a numeric field, and you know this is an absolute pointer
to a structure definition, you can use the follow pointer key *.
BE will then prompt for the definition name.
This shortcut ignores any pointer information that may be deducible
from the value on display, so even if you are looking at a relative
pointer which is aligned, BE will decode a definition at an absolute
address.
Another handy command is P. If you press this when on a numeric field, it allows you to set or change what datatype the value points to. This is great for when you've forgotten to put something in the BE initialisation file.
The editor provides the @L key, which makes the job of following long linked lists especially easy. If you are looking at the members of a definition, and are on a member which is in fact a pointer to the same type of definition, then you can use the @L (show list) key. You will be presented with the elements in the linked list (at least the first 4096 by default), and at the end the reason the link following ended. This reason can be that there are too many to show at once, 'can't fetch value', 'can't follow null pointer', or the list has 'looped back' to an element shown earlier. If your list is really long, you can always go to the last linked list element on display, select it, and then use the @L key again to get the next 4096 elements!
The = key may be used to edit the current field on display.
If the current field is a numeric value, then you can type a new expression, according to the rules for numbers and expressions used when parsing the initialisation file. Dot '.' evaluates to the field's current numeric value. Examples include :-
1
1+2
addr "symbol"
sizeof RGBTRIPLE
map FF_ "FF_Split" | 0x20
If the current field is displayed via a mapping table, then the M key can be used to bring up a list of the maplets, and whether each of them can be decoded from the numeric value. The current fields value can be edited from this new list. Esc quits the maplet list.
If the current field is a buffer, then either ASCII data or raw hex bytes may be supplied :-
"a string within quotes"
@1234FF00
If the zterm
attribute is applicable to the current field,
then after the data is stored, a NUL terminator is appended.
The @S key toggles the suppress attribute of the current datum, and the @N key unconditionally sets it. The suppress attribute affects how the containing structure is displayed when shown in short: only non-suppressed fields are shown in the one line summary.
The v key can be used to disable (or re-enable) a field's validity check. Validity checks can act as a form of 'suppression' when viewing definitions 'one to a line'. This keystroke can help cancel that effect (if desired). The V key allows you to set/change the validity check on a field.
w can be used to set the field width. Normally fields are shown in as many characters as are necessary; this corresponds to a field width value of 0. When non-0, fields are padded or truncated to the indicated width.
Del and Ins can be used to copy and paste between the current datum and a memory clipboard or file. To use the memory clipboard, simply specify a blank filename when prompted. Only smallish blocks of data (<=4MB) can be copied or pasted. The amount of data transferred is always the minimum of the datum size, the clipboard size and 4MB.
The external edit key, E, works by prompting you for an
editor command.
It then saves away the current datum into a temporary file and invokes
the editor on it.
Afterwards, the file contents are re-read.
At most 4MB can be processed in this way.
This might be useful if a file contained a chunk of free-flow text, and
you wished to perform some complicated editing on it, involving inserting
and deleting - you could externally edit that chunk using a text editor.
Or, sometimes when editing binary data, you might like to see it in a
typical hex dump and edit raw hex - you can externally edit with a
normal hex editor.
This command doesn't work if BE is running in restricted mode,
ie: has been invoked with the -r
command line argument.
Z will zero the current datum. Only datums of 4MB or smaller can be zeroed.
Each possible maplet in the mapping is displayed in the list. Each maplet has a mask and value, and the maplet is deemed to match if :-
value & maplet.mask = maplet.value
In this case a 1 is displayed next to it, otherwise a 0 is shown.
If you press 0 then the value is anded with the complement of the mask.
If you press 1 then the value is anded with the complement of the mask, and then the maplet's value is or-ed in.
Although this may seem strange, the net effect is that when maps are being used for enumerations, 1 will change the value from whatever it was before to the new desired value.
When the mapping is used for decoding bitfields, 1 will turn on a bit and 0 will turn it off.
Examples of enumeration and bitfield style mappings :-
map ENUMERATION
{
"first value" 1
"second value" 2
"third value" 3
}

map BITFIELD
{
"lowest bit" 0x01 : 0x01
"next bit" 0x02 : 0x02
"high bit" 0x80 : 0x80
}
The @S and @N keys toggle or set the maplet suppress attribute. Suppressed maplets are ignored when converting numbers to textual display.
If the current line of code references another routine or code address, c can be used to pop up another list of the referenced routine.
Similarly, if data is referenced, and the address is easily determinable by the disassembler, the * can be used to follow a pointer and display a structure at that address.
W can be used to write back any unsaved changes on the current memory section. This isn't normally necessary, as when you leave BE using q or @X, you are prompted as to whether you wish to save any unsaved changes on a memory section by memory section basis.
o can be used to pass a user supplied option string to the memory extension piece of code providing the memory section. The memory extension is given the memory section instance and the option string. It can parse the option string in any way it sees fit. If there is a syntax error, or other problem, it can fail the options command with an error message to say why. If a memory section is provided from a file, this command will fail (files have no options). This user-exit mechanism might be used to allow you to tell a memory extension to change how much caching it can do.
These are shown in the list brought up by the @M key, as described earlier.
It is a list of 12 entries, each of which may be disabled, a pattern or an Extended Regular Expression.
You can set one of the entries using the = key. This is the same as using ~F1 to ~F12.
Many of the keystrokes listed above were chosen so as to match the default key bindings of Andys Source Code Folding Editor (AE).
Although OS/2, Windows, NetWare, AIX, Linux and DOS machines are able to
support Alt keys, not all UNIXes are.
In fact Alt key support for UNIX can vary depending upon terminal types.
Therefore UNIX versions of BE provide a 'feature' whereby Esc
quickly followed by a key is equivalent to pressing Alt and the key together.
As BE is often used for viewing memory dumps from embedded programs, support for symbol tables is highly desirable. Although BE technically need only support one format, it actually supports a few of the more commonly used formats to avoid a proliferation of symbol file conversion programs.
The arm
symbol format is the default.
Each non-blank line in the symbol file has the symbol name,
followed by a number of spaces, followed by the address specified
in hex (without an 0x
prefix).
Additional information is sometimes present on the end of the line
(particularly if overlays are used), but this is ignored.
On a Linux computer, the 'proc' filesystem provides a special file called /proc/ksyms.
Each line of this file has an address in hex (without an 0x
prefix), followed by a space, followed by the symbol name.
This is the ksyms
symbol table format.
eg:
be -Y ksyms -y /proc/ksyms kernel.dat
  -- assuming kernel.dat is a dump of the kernels memory
Note that sometimes the address and symbol are followed by more information. This additional information is ignored.
Linux has a symbol versioning convention whereby it can append a suffix to each symbol. The suffix varies depending upon the type of Linux kernel in use, ie: whether it is SMP or not, or compiled in '2GB mode' or not. BE has the following symbol formats, which strip the indicated suffix off each symbol as it is read :-
BE symbol format | what suffix is stripped |
---|---|
ksyms_R | _Rhhhhhhhh |
ksyms_Rsmp | _Rsmp_hhhhhhhh |
ksyms_R2gig | _R2gig_hhhhhhhh |
ksyms_Rsmp2gig | _Rsmp2gig_hhhhhhhh |
hhhhhhhh are lower case hex digits, which contain the versioning information.
BE allows 8 or 16 digits in the versioning information.
See /usr/src/linux/Rules.make
to understand where these
suffixes come from.
The nm
command on an AIX 4.1 or later machine
generates output which is understood by the aix_nm
symbol
table format.
Typically nm
is invoked with the -e
argument,
so that only external symbols get listed.
Each line has the symbol name,
followed by a symbol type character,
followed by an address
and optionally followed by a length.
Fields are separated with white space.
Addresses and lengths are 0x prefixed if they are listed in hex
(this is caused by invoking nm with the -x flag).
BE ignores 4 byte type d data entries from the table,
as these tend to refer to TOC entries.
BE also ignores machine generated symbols which start with _$STATIC.
C++ symbol names are typically listed demangled, and so can contain spaces. BE has quite complicated special logic to handle this.
Note that the symbol values obtained using nm
are actually
offsets from the beginning of the executable.
You'll need to determine where the executable is in memory or the crash
dump memory image, perhaps using the AIX crash
command.
Assuming this base value to be 0xBBBBBBBB
, you would pass the
following options to BE :-
-Y aix_nm -y symbolfile.sym@0xBBBBBBBB
It is not too difficult to write a BE memory extension
which accesses AIX kernel memory space by accessing /dev/kmem
.
Hey presto, BE can show live datastructures within the AIX kernel!
The map format corresponds to the .map files written by the 16 bit DOS link.exe program.
This has a section at the beginning of the file which declares segment names, positions and sizes. BE ignores this.
Next the symbols are listed, ordered by name, and BE ignores this too.
Finally the symbols are listed again, ordered by value. BE reads this data.
Each line is of the form :-
SSSS:OOOO SymbolName
BE enters an entry in the symbol table of value 0xSSSSOOOO
for each symbol.
This works well in conjunction with BEs -g
command line argument.
eg:
be -Y map -y embedded.map -g dump.dat@0xf0000
  -- assuming embedded.map is the map file from linking some embedded
  -- application, and that dump.dat is a dump of the memory starting
  -- at physical location 0xf0000
NLMs and drivers can be linked using the NetWare or Watcom linkers
and these can both be made to spit out a .map
file.
In the .map
file, symbols are listed with their offset
from the start of the CODE or DATA segment.
In order to know a symbol's address we must load the NLM and determine its
CODE and DATA segment base addresses.
These base values can then be added onto the offset values in the
.map
file.
The bases can be determined using the built-in NetWare debugger. Enter it via the Shift+Shift+Alt+Esc sequence, use .m nlmname to get the bases, and g to resume NetWare.
The following options are provided :-

-Y nw_nw_code   read a .map file produced by the NetWare linker, and extract and process CODE symbols
-Y nw_nw_data   read a .map file produced by the NetWare linker, and extract and process DATA symbols
-Y nw_wc_code   read a .map file produced by the Watcom linker, and extract and process CODE symbols
-Y nw_wc_data   read a .map file produced by the Watcom linker, and extract and process DATA symbols
Assuming an NLM had its code based at 0xCCCCCCCC
, and
its data based at 0xDDDDDDDD
, and it was linked with the Watcom
linker, you would use the following BE options :-
-Y nw_wc_code -y nlmname.map@0xCCCCCCCC -Y nw_wc_data -y nlmname.map@0xDDDDDDDD
Notice how we process the .map
twice - once to get the code
symbols and to relocate them by 0xCCCCCCCC
, and once to get the
data symbols and to relocate them by 0xDDDDDDDD
.
Awkward, but it works, without having to post-process the .map
file output by the linker.
The NetWare linker output has a section which looks like :-
Publics By Address

DATA 00005B94 Evan (D:\build\ham\hamdata.c)
DATA 00005B98 deviceName (D:\build\ham\hamhacb.c)
DATA 00005BA8 hamName (D:\build\ham\hamnlm.c)
It is this section which BE uses.
The Watcom linker output has lines in it of the form :-
CODE:00305678 fhbf
CODE:00045678+ symmy
DATA:00345678* sym
DATA:00345008s symbol
To complete the picture, all that is needed is a BE memory extension which allows BE to access the memory space of an NLM.
The binary file arguments to BE are normally of the form :-
filename[@address]
This tells BE to load the file and whenever data at a memory address
from address
to address+filelength
is accessed,
to supply the data from the file.
However, it is possible to supply binary file arguments of the form :-
extension!args[@address]
Memory extensions may be written to provide either read-only, or read-write access to their data.
BE loads the memory extension DLL or shared library.
It then passes the args and address to the memory extension, which does something of its own choosing with them.
The memory extension DLL can then supply data to BE on request.
One use of the BE memory extension feature is the provision of a memory extension for handling files too massive to load into memory all at once. The memory extension opens a file handle and reads bytes demanded by BE upon request. Source for BEBIG is included in this document. The user can type :-
be big!verybigfile.dat
It ought to be noted that the author regularly uses BE on files of several megabytes in size, without a problem. However, files of several gigabytes would present a problem!
Another use is in the live debugging of running adapter cards.
The memory extension can provide data bytes directly from the memory space
of the adapter.
args
could be used to identify the slot the adapter is in.
Alternatively, args
could identify IO base addresses,
memory window addresses, or a device driver to use to access the data.
Memory extensions which do this, do exist, and they almost turn BE into
a debugger (almost, because there is no run, stop, or single step).
Run, stop and single step of an adapter could be driven by the
options mechanism, if that were possible and/or desired.
When using these, a customised initialisation file is typically also used,
which understands all the structure definitions and variables used in the
firmware on the adapter.
Yet another use, might be providing BE with access to physical or
virtual or process specific linear address spaces, perhaps via the use
of a device driver.
Shared memory windows might give addressability of datastructures in
other programs.
A simple example of this is a memory extension which reads bytes from
the /dev/kmem
special device in the AIX or Linux environment.
Using this, kernel device drivers may be debugged.
Also, the surface of a disk or block device can be made accessible via a memory extension. Again, a memory extension which does this does exist (but it uses a non-standard mechanism for accessing the disk blocks). BE could then debug and repair filesystem data.
Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.
The file bememext.h
documents the extension interface.
Currently extensions may be built for the following platforms :-

Under AIX, BE uses the loadAndInit API to load the extension, named beextension, searching along the directories listed in the PATH and LIBPATH environment variables. The IBM xlC C++ compiler can be used to compile the extension. Unfortunately xlC C++ is expensive and being discontinued, so in future BE may use dlopen to load shared libraries named beextension.so, searching along the LIBPATH environment variable; g++ will then be usable to compile the extension.

Under Linux, BE uses the dlopen API to load the extension, which is named beextension.so. dlopen locates shared libraries by looking along the LD_LIBRARY_PATH, directories listed in the /etc/ld.so.cache file, and in the /usr/lib and /lib directories.

Under HP-UX, BE uses the shl_load API to load the extension, which is named beextension.sl. shl_load locates shared libraries by looking along the SHLIB_PATH. I reserve the right to switch-over-to or add-support-for using dlopen based shared library support.

Under SunOS, BE uses the dlopen API to load the extension, which is named beextension.so. dlopen locates shared libraries by looking along the LD_LIBRARY_PATH (it may also look in other places).

Under Windows, BE uses the LoadLibrary API to load BEextension.DLL.

Under OS/2, BE uses the DosLoadModule API to load BEextension.DLL, which it finds by looking along the LIBPATH environment variable.

Under 32 bit DOS, BE uses a LoadLibrary style API to load the extension, which is named BEextension.DLL, which it seems to find by looking in the current directory or along the PATH.

Under NetWare, extensions are NLMs given a .ndl file extension. The DLL remains in memory for as long as BE needs it. The .ndl files must be on the search path so BE can find them.
BEBIG is a simple memory extension for accessing enormous (ie: up to 4GB) files. The source for it is included here primarily as a reference for writing others. Despite not implementing the full richness of the memory extension interface, it should serve well to get you writing and testing your own extensions.
The C++ source code, bebig.C
, looks like :-
//
// bebig.C - BE memory extension for editing massive files
//
// This is a rather simple implementation that simply seeks
// around an open file, and gets and puts bytes.
//
// This only supports files less than 4GB in size due to the use of 32 bit
// addresses. BE with support for the 64 bit address space can use alternative
// memory extension entrypoints with _64 suffixes and which pass BEMEMADDR64
// addresses instead. However support for massive files varies from platform
// to platform, and thus would have complicated this example somewhat.
//

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#include <memory.h>

#include "bememext.h"
class BeMem
{
	FILE *fp;
	BEMEMADDR32 base_addr, len;
	Boolean read_only;
public:
	BeMem(FILE *fp, BEMEMADDR32 base_addr, Boolean read_only)
		: fp(fp), base_addr(base_addr), read_only(read_only)
	{
		fseek(fp, 0L, SEEK_END);
		len = ftell(fp);
	}
	~BeMem()
	{
		fclose(fp);
	}
	Boolean read(BEMEMADDR32 addr, unsigned char & b)
	{
		addr -= base_addr;
		if ( addr >= len )
			return FALSE;
		if ( fseek(fp, addr, SEEK_SET) != 0 )
			return FALSE;
		return fread(&b, 1, 1, fp) == 1;
	}
	Boolean write(BEMEMADDR32 addr, unsigned char b)
	{
		if ( read_only )
			return FALSE;
		addr -= base_addr;
		if ( addr >= len )
			return FALSE;
		if ( fseek(fp, addr, SEEK_SET) != 0 )
			return FALSE;
		return fwrite(&b, 1, 1, fp) == 1;
	}
};
BEMEMEXPORT Boolean BEMEMENTRY bemem_read(
	void * ptr,
	BEMEMADDR32 addr,
	unsigned char & b
	)
{
	BeMem *bemem = (BeMem *) ptr;
	return bemem->read(addr, b);
}

BEMEMEXPORT Boolean BEMEMENTRY bemem_write(
	void * ptr,
	BEMEMADDR32 addr,
	unsigned char b
	)
{
	BeMem *bemem = (BeMem *) ptr;
	return bemem->write(addr, b);
}

BEMEMEXPORT void * BEMEMENTRY bemem_create(
	const char *args,
	BEMEMADDR32 addr,
	const char *(&err)
	)
{
	FILE *fp;
	BeMem *bemem;
	if ( !memcmp(args, "RO:", 3) )
	{
		if ( (fp = fopen(args+3, "rb")) == 0 )
		{
			err = "can't open file in read only mode";
			return 0;
		}
		bemem = new BeMem(fp, addr, TRUE);
	}
	else
	{
		if ( (fp = fopen(args, "rb+")) == 0 )
		{
			err = "can't open file in read/write mode";
			return 0;
		}
		bemem = new BeMem(fp, addr, FALSE);
	}
	if ( bemem == 0 )
	{
		fclose(fp);
		err = "out of memory";
		return 0;
	}
	return (void *) bemem;
}

BEMEMEXPORT void BEMEMENTRY bemem_delete(
	void * ptr
	)
{
	BeMem *bemem = (BeMem *) ptr;
	delete bemem;
}
#ifdef DOS32
// Note: Required due to the way DOS CauseWay DLLs are constructed
int main(int term)
{
	term = term;
	return 0;
}
#endif
#ifdef AIX
// Note: The need for this section may vanish if the AIX version of BE
// stops using loadAndInit to load shared libraries, and uses dlopen.
extern "C" {
BEMEM_EXPORT * __start(void)
{
	static BEMEM_EXPORT exports[] =
	{
		(BEMEM_EP) bemem_read   , "bemem_read"   ,
		(BEMEM_EP) bemem_write  , "bemem_write"  ,
		(BEMEM_EP) bemem_create , "bemem_create" ,
		(BEMEM_EP) bemem_delete , "bemem_delete" ,
		(BEMEM_EP) 0            , 0
	};
	return exports;
}
}
#endif
#ifdef NW
// Rather ugly mechanism required to make NLMs behave like DLLs under NetWare.
#include <conio.h>
#include <process.h>
#include <advanced.h>

static int tid;

extern "C" BEMEMEXPORT void BEMEMENTRY _bemem_term(void);

BEMEMEXPORT void BEMEMENTRY _bemem_term(void)
{
	ResumeThread(tid);
}

int main(int argc, char *argv[])
{
	argc=argc; argv=argv; // Suppress warnings
	int nid = GetNLMID();
	SetAutoScreenDestructionMode(TRUE);
	SetNLMDontUnloadFlag(nid);
	tid = GetThreadID();
	SuspendThread(tid);
	ClearNLMDontUnloadFlag(nid);
	return 0;
}
#endif
Yes, sure, I could use C++ streams and I could cache the data read, but this is supposed to be just a simple example.
Note that some operating systems require you to include specific bits of
code in your source to make the DLL mechanism work.
That's the stuff that's #ifdef'd at the end.
Under AIX, the makefile
looks like the following.
Note that this will change when I move from xlC++ loadAndInit
style shared libraries, to g++ dlopen
style :-
bebig: bebig.o
	/usr/lpp/xlC/bin/makeC++SharedLib \
		-p 1 -n __start -o $@ bebig.o
	chmod a-x $@

bebig.o: bebig.C bememext.h
	xlC -DUNIX -DAIX -c $*.C
Under Linux, the makefile
looks like :-
bebig.so: bebig.o
	g++ -shared -o $@ bebig.o
	chmod a-x $@

bebig.o: bebig.C bememext.h
	g++ -DUNIX -DLINUX -fPIC -c $*.C
Under HP-UX, the makefile
looks like :-
bebig.sl: bebig.o
	aCC -b -o $@ bebig.o

bebig.o: bebig.C bememext.h
	aCC -DUNIX -DHP +z -c $*.C
Under SunOS, the makefile
looks like :-
bebig.so: bebig.o
	CC -G -Kpic -o $@ bebig.o

bebig.o: bebig.C bememext.h
	CC -DUNIX -DSUN -w -G -Kpic -c $*.C
Under Windows, the makefile
is very similar :-
bebig.dll: bebig.obj
	link /NOLOGO /INCREMENTAL:NO /DLL $** /OUT:$@

bebig.obj: bebig.C bememext.h
	cl /c /DWIN32 /G4 /Gs /Oit /MT /nologo /W3 /WX /Tp $*.C
Under OS/2, using IBM Visual Age C++, a module definition file,
bebig.def
, is needed :-
LIBRARY BEBIG INITINSTANCE TERMINSTANCE
DATA MULTIPLE NONSHARED READWRITE
CODE PRELOAD EXECUTEREAD
EXPORTS
	bemem_create
	bemem_delete
	bemem_read
	bemem_write
Under OS/2, the makefile
will typically look like :-
bebig.dll: bebig.obj bebig.def
	ilink /NOI /NOLOGO /OUT:$@ $**

bebig.obj: bebig.C bememext.h
	icc /C+ /W3 /Wcmp+cnd+dcl+ord+par+use+ \
		/Ge-d-m+ /Q+ /DOS2 /Tp $*.C
Under 32 bit DOS, the makefile looks like the following example.
If you don't explicitly reference plib3r.lib, and the C++ code
uses operator new, then its multithreaded equivalent gets dragged
in (which causes link problems) :-
bebig.dll: bebig.obj
	wlink @<<
System CWDLLR
Name $@
File bebig.obj
Library %watcom%\lib386\plib3r.lib
Option Quiet
<<

bebig.obj: bebig.C bememext.h
	wpp386 -bt=DOS -dDOS32 -oit -4r -s -w3 -zp4 -mf -zq -fr -bd $*.C
Under NetWare, the makefile
looks like the following.
Again, the plib3s.lib
reference is required.
bebig.ndl: bebig.obj
	wlink @<<
Format Novell NLM '$@'
Name $@
Option Quiet
Option Map
Option ScreenName 'System Console'
Option ThreadName '$@'
Debug Novell
Module clib, mathlib
File bebig.obj
Library $(WATCOM)\lib386\plib3s.lib
Library $(WATCOM)\lib386\math387s.lib
Library $(WATCOM)\lib386\noemu387.lib
Library $(WATCOM)\lib386\netware\clib3s.lib
Import @$(WATCOM)\novi\clib.imp
Export _bemem_term
Export bemem_read, bemem_write
Export bemem_create, bemem_delete
<<

bebig.obj: bebig.C bememext.h
	wpp386 /s /fpi87 /mf /zfp /zgp /zl /zq /4s /fpd /wx /bt=NETWARE /DNW $*.C
Despite BE being compiled multi-threaded on 32 bit OS/2 and Win32, it's not compiled that way for 32 bit DOS, AIX, Linux, HP-UX, SunOS and NetWare. One day BE for AIX, Linux, HP-UX and SunOS may be multi-threaded, and thus the makefiles for making BE memory extensions may need appropriate modifications. Even though BE only uses one thread, compiling multi-threaded gives the memory extension writer the flexibility to write code which tries to read data in the background, in advance of it being needed.
The -C dx
command line argument is a way of telling BE
to load and use a disassembler extension for displaying any code in the data.
The same rules for naming and locating disassembly extensions apply, as for memory extensions.
eg: If you have an Intel 8086 disassembler, you could type :-
be -C i86 dump.ram
The file bedisext.h
documents the extension interface.
Disassembler extensions are compiled and linked in exactly the same way as memory extensions (see example above), although they obviously provide different entrypoints.
When editing files, changes to the data are recorded in memory. When BE is closed down, it attempts to write any changes back into the disk files where the data originally came from. BE will prompt you as to whether to save the changes back to disk.
If a memory extension is providing the data to BE for display, and the memory extension supports modification of the data, it has a choice :-
As most memory extensions provide a live view of some real-time data, they tend to opt for the first choice.
Over the years a number of requests have popped up, some of which may actually get implemented, time allowing :-
Note that a significant number of the existing BE features have arisen from user requests. No promises though - my free time is very scarce...
The latest version of the full BE package is most easily obtainable over the Internet via the links on my home page :-
A smaller package is available from the Hobbes FTP site, which only includes the PC versions.
Copy the be_aix4, be_linux, be_hpux or be_sun executable to somewhere like /usr/bin, /usr/local/bin or ~/bin, or wherever on the path you consider appropriate, and rename it to be.
Copy be.ini to the same directory as be so it can be found, or copy it to .berc in your home directory. BE uses your local initialisation file in preference to the common one.
Copy be.hlp to the same directory as be so it can be found.
Copy be.htm to wherever you keep documentation.
On AIX, best keyboard and colour support is obtained by using an
aixterm
, or by logging in from OS/2 using
HFTTERM.EXE
.
It should be noted that HFTTERM.EXE
appears to have a bug
whereby it doesn't generate the correct datastream for the @9 and
@0 keystrokes.
On Linux, best colour and keyboard support is found using the
regular linux
terminal.
On the RedHat distribution, the xterm
terminfo entry may not
include support for colour, and you may have to set the TERM
environment variable to be xterm-color
.
BE for Linux is now compiled on a RedHat 6.1 system with glibc using egcs-2.91.66. This effectively rules out running it on earlier libc5 based Linux systems.
On HP-UX, I get best keyboard and screen support when using an
xterm
or a vt100
.
I can even get colour support if I roll my own vt100-color
by
taking the default vt100
terminfo definition, adding support
for the ANSI colour escape sequences (and op
and
bce
capabilities), and using tic_colr
to compile the
new definition.
On SunOS, I get best keyboard and screen support when using an
xterm
or a vt100
.
I can even get colour support if I roll my own vt100-color
by
taking the default vt100
terminfo definition, adding support
for the ANSI colour escape sequences (and op
and
bce
capabilities).
For an easy way to enhance your terminal support to support colour, see the TERMINFO package on my home page.
Copy be_win.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Copy be.hlp to the same directory as be.exe so it can be found.
Copy be.htm to wherever you keep documentation.
BE is a Win32 application, which has had extensive testing on Windows NT. Rather less testing has been performed with Windows 95, and quite a few bugs in the Windows 95 version of the Win32 Console API (used for screen redraw) have been identified and worked around. Some oddities relating to the use of the unusual screen sizes still remain. I would not be surprised if there are more problems to be found...
Copy be_os2.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Copy be.hlp to the same directory as be.exe so it can be found.
Copy be.htm to wherever you keep documentation.
Copy be.ico to the same directory as be.exe. This allows BE to have a cute icon when running in the Workplace shell.
be.ini
can be found.
Copy be_dos32.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Copy be.hlp to the same directory as be.exe so it can be found.
Copy be.htm to wherever you keep documentation.
Obviously, because BE for DOS is a 32 bit program, which uses a DOS extender, the machine upon which you run it must have a 32 bit processor.
Copy be.nlm to somewhere on the search path.
Copy be.ini to the same directory as be.nlm.
Copy be.hlp to the same directory as be.nlm.
Copy be.htm to wherever you keep documentation.
Unfortunately I don't have continual access to all the platforms, so the latest improvements in one version may not yet be reflected into the others.
Definitions (structures or unions) are declared using the def keyword in the initialisation file.
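The effect of decoding a fixed-length definition can be pictured outside BE. The following is a minimal Python sketch, not BE's ini syntax; the record layout and field names are invented for illustration:

```python
import struct

# Hypothetical fixed-length "definition": a 4-byte little-endian id,
# a 2-byte version, and an 8-byte fixed-width name field.
RECORD = struct.Struct("<IH8s")

def decode(buf, offset=0):
    # Unpack one record starting at offset and present it as fields.
    ident, version, name = RECORD.unpack_from(buf, offset)
    return {"id": ident, "version": version,
            "name": name.rstrip(b"\x00").decode("ascii")}

data = struct.pack("<IH8s", 7, 2, b"main")
print(decode(data))
```

Because the definition has a fixed length, record N of a file always lives at offset N * RECORD.size, which is what makes browsing such files fast and reliable.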
Wherever BE prompts for a number, any numeric expression may be used, such as 1+2*3. Basic arithmetic is supported, along with symbol table lookup and support for mapping. See the section on numbers for more details.
Left justification of fields is controlled via the lj and nolj keywords.
The map keyword in the initialisation file defines a mapping between numbers and strings. Essentially it is a way of mapping numbers back to a more readable enumerated type form. The map MAPNAME "MAPLETSTRING" syntax may be used in any expression in the initialisation file, or at any time BE prompts you for a number, and it evaluates to the numeric equivalent of the enumerated type named value. Data displayed via mapping tables can be edited via the M key.
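A mapping table behaves like a two-way enum lookup. Here is a Python sketch of the concept (the table name and colour values are invented, and this is not BE's ini syntax):

```python
# Hypothetical mapping table: numbers <-> readable names,
# like an enumerated type.
COLOUR = {0: "black", 1: "red", 2: "green"}
COLOUR_REV = {name: num for num, name in COLOUR.items()}

def show(value):
    # Display a raw number in its readable mapped form,
    # falling back to the number itself if unmapped.
    return COLOUR.get(value, str(value))

print(show(1))              # mapped display of the raw value 1
print(COLOUR_REV["green"])  # numeric equivalent of a named value
```

The reverse lookup is the analogue of writing map COLOUR "green" in an expression and having it evaluate to 2.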
Constants may be set via the -S name=val command line argument, or through the set and unset keywords in the initialisation file.
When the user presses Enter on a pointer value, BE pops up the data in the 'pointed to' definition, unless the value is 0 and the null-pointer attribute is present. The nullptr and nonullptr keywords control this attribute.
ptr DEFN is used in a field definition to indicate that a numeric field identifies the address of another definition.
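The follow-unless-null behaviour can be sketched in Python. The pointer width, endianness, and names below are assumptions for illustration, not BE behaviour on any particular platform:

```python
import struct

def follow_pointer(mem, addr):
    # Read a 32-bit little-endian pointer field at addr; refuse to
    # follow a 0 value, as with the null-pointer attribute.
    (target,) = struct.unpack_from("<I", mem, addr)
    if target == 0:
        return None          # null pointer: nothing to pop up
    return target            # address of the pointed-to definition

mem = struct.pack("<II", 0, 0x1000)
print(follow_pointer(mem, 0))   # null, not followed
print(follow_pointer(mem, 4))   # address of the next definition
```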
The _ptrgl glue loads the TOC register from bytes 4 to 7 in the glue block and branches to the code specified by bytes 0 to 3 in the glue block. This mechanism is much like __loadds in 16 bit Intel code, in that it ensures the callee can access its own global data, even if it is in a separate library or module.
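The glue block described above is an 8-byte record, which is easy to decode. A Python sketch, assuming big-endian byte order as on PowerPC (the field names are mine):

```python
import struct

def decode_glue(block):
    # Bytes 0-3: code address to branch to; bytes 4-7: TOC value
    # to load. ">" selects big-endian, as on PowerPC.
    code, toc = struct.unpack(">II", block[:8])
    return {"code": code, "toc": toc}

glue = struct.pack(">II", 0x10001234, 0x20005678)
print(decode_glue(glue))
```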
The suppress keyword may be used in the initialisation file on a field, or the @S and @N keys may be used interactively.
A symbol table may be supplied via the -y symtab command line argument. It is a list of names (the symbols) and their values. Typically these are code or data addresses for functions or variables within an executable program. BE can use this information to display addresses in symbol+offset form, or to allow you to type addr "symbol" in an expression and have BE substitute the numeric value of the symbol.
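Producing symbol+offset form is a nearest-symbol-at-or-below search. A Python sketch with an invented symbol table (the names and addresses are not from any real program):

```python
import bisect

# Hypothetical symbol table, kept sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1200, "helper"), (0x2000, "data")]
ADDRS = [addr for addr, _ in SYMBOLS]

def sym_offset(addr):
    # Find the nearest symbol at or below addr and render addr
    # as symbol+offset, falling back to plain hex.
    i = bisect.bisect_right(ADDRS, addr) - 1
    if i < 0:
        return hex(addr)
    base, name = SYMBOLS[i]
    off = addr - base
    return name if off == 0 else "%s+0x%x" % (name, off)

print(sym_offset(0x1234))   # falls inside helper
```

The reverse direction, substituting the numeric value of addr "symbol", is just a name-to-address dictionary lookup over the same table.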
Validity checks may be attached to a field using the valid "EXPR" syntax in the initialisation file, or by pressing the V key whilst on the field. Fields with validity checks have either ++ or -- shown next to them, depending upon whether the check passes. Fields failing their validity check are suppressed when viewing a structure definition in single line summary form. This feature can be used to effectively give conditional decode.
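A validity check is just a predicate attached to a field. A Python sketch of the pass/fail marking (the magic value and field are invented for illustration):

```python
def check(value, predicate):
    # Mark a field ++ or -- depending on whether its validity
    # expression passes.
    return "++" if predicate(value) else "--"

# Hypothetical check: a magic field must equal 0x4D5A.
is_magic = lambda v: v == 0x4D5A

print(check(0x4D5A, is_magic))  # passes
print(check(0x0000, is_magic))  # fails
```

Suppressing the failing fields in summary form is what turns a set of such predicates into conditional decode: only the interpretation whose checks pass is shown.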
Zero termination of strings is controlled via the zterm and nozterm keywords in the initialisation file.
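Extracting a zero-terminated string from a fixed-width field can be sketched as follows (the function name and limits are mine, not BE's):

```python
def zterm_string(buf, offset, maxlen):
    # Read up to maxlen bytes, stopping at the first NUL, as a
    # zero-terminated string field would be displayed.
    raw = buf[offset:offset + maxlen]
    end = raw.find(b"\x00")
    return (raw if end < 0 else raw[:end]).decode("ascii")

print(zterm_string(b"hello\x00world", 0, 16))
```

With the attribute disabled, the full fixed-width field would be shown instead, NULs and all.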
Copying of this program is encouraged, as it is fully public domain. The source code is not publicly available. Caveat emptor.