Andys Binary Folding Editor

Introduction

Andys Binary Folding Editor is primarily designed for structured browsing, although it also provides minimal editing facilities.

This program is designed to take in a set of binary files, and with the aid of an initialisation file, decode and display the definitions (structures or unions) within them. BE is particularly suited to displaying non-variable length definitions within the files.

This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps. BE is often used as the data navigation half of a debugger.

For a summary of how to use the editor, see the section Using the editor.

This documentation corresponds to the 12/01/00 version of BE.

Features

BE has the following features :-

Ability to decode definitions in (multiple) files.
Ability to handle either endian multibyte values.
Ability to handle signed or unsigned numeric values.
Ability to display fields in definitions as ASCII, binary, EBCDIC, octal, decimal, hex, seconds since epoch, via symbol table lookup, or via a mapping table (enumerations and/or bit-flags).
Fields can be numeric or buffers or nested definitions.
Computed fields (whose values may be computed from a numeric expression, which may include references to other fields and/or memory locations).
Selectable level of detail of display.
Ability to suppress fields of structures considered irrelevant. This includes within nested sub-structures etc..
Ability to expand sub-definitions or follow absolute or relative pointers to other definitions.
Automatic linked-list following.
Ability to write/append current view of data to a text file.
Searching over data on display, optionally using Extended Regular Expressions.
Include/exclude and sort lines features.
Refresh data and auto-refresh data.
Ability to edit data not actually in a file, but supplied by a (possibly user written) BE memory extension module. This feature can turn BE into the data-navigation half of a debugger!
Tagging of lines on display, and rapid stepping between tags.
Ability to view a text file.
Ability to bring up online help.
Ability to bring up initialisation file for user review.
Shelling out to the operating system.
Multiple sessions, with copying between sessions.
Address sliding feature.
Power address sliding feature, for systematically tracking down definitions at unknown locations in the file or memory space. Keeps decoding definition at successive addresses, until patterns match. This feature is truly awesome!
Optional display of addresses, offsets, lengths and indices for fields in a definition. Indices of arrays displayable in ASCII, EBCDIC, binary, octal, decimal or hex. Addresses displayable as hex or symbol+offset. Ability to set initial display mode on the command line.
ARM long-jump decoding of symbolic code addresses.
Ability to use a symbol table file(s), so that addresses may be displayed symbol+offset, and so the user may refer to symbolic addresses in numeric expressions.
Symbol files in a variety of formats are supported.
User extendable initialisation file, defining definitions within the files or memory spaces.
Initialisation file influencable via command line options.
A non-interactive mode, where BE displays data to stdout and quits.
Support for a plug-in disassembler.
Support for AIX, Linux, HP-UX, SunOS, Windows (both NT and 9x), 32 bit OS/2, 32 bit DOS (via a DOS extender), and NetWare.
Support for >32 bit datatypes and address space (upto 64 bit) on certain systems (subject to compiler support for long long). These are currently all versions except 32 bit OS/2.

Command line arguments

  usage: be [-i inifile] {-I incpath} {-D symbol} {-S name=val}
            { [-s s1-[s2]] [-d defn] [-a addr] [-f field] }
            {-Y symfmt} {-y symfile[@bias]} [-C dx]
            [-w width] [-h height] [-c colscheme] [-p] [-r] [-v viewflags]
            [-g] [--] { binfile[@addr] | mx![args[@addr]] }
  flags: -i inifile       override default initialisation file
         -I incpath       append include path(s) for use by inifile
         -D symbol        pre-$define symbol(s) for use by inifile
         -S name=val      set constant name to be value
         -s s1-s2         set sessions following -d, -a, -f and -p apply to
         -d defn          initial definition to use (default: main)
         -a addr          initial address to use (default: 0)
         -f field         field name within defn (list link, or array to expand)
         -Y symfmt        symbol table format
         -y symfile@bias  input symbol table file(s) (with optional biases)
         -C dx            code disassembler extension
         -w width         set screen width
         -h height        set screen height
         -c colscheme     set colour scheme (0 to 3, default: 0)
         -p               print data to stdout, non-interactive
         -r               restricted mode, no shelling out allowed
         -v viewflags     combinations of A,O,L,I,a,e,b,o,d,h,j,+,-
         -g               perform seg:off->physical mapping on all addresses
         -A size          address space size (8 to 64, default: 32)
         binfile@addr     binary file(s) (with optional address, default: 0)
         mx!args@addr     memory extension with arguments (and optional address)

The -i flag overrides the default initialisation file.

The -I flag affects the operation of the include command in the initialisation file.

The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.

The -S flag allows the definition of a named constant for use in numeric expressions in the configuration file.

The editor has 10 editing sessions, and the -d, -a and -f options affect all of these (by default), unless the -s option is used to specify which session(s) are affected. By default only session 0 is shown by -p, but this too can be changed with the -s option.

The initial structure definition and address to decode on all/each session may be overridden with the -d and -a flags. Normally BE starts by looking up the definition of a 'main' definition, and decoding the data at address 0 as such. The address expression is allowed to refer to symbols in symbol tables, as it is evaluated after the symbol tables have been loaded. All the other numeric command line arguments are evaluated before any symbol table loading takes place, and so can't refer to symbols.

If the -f flag is used, it must identify a field within the specified structure. If the field is a pointer to a structure of the same type, BE will initially display a linked list of structures, rather than just one structure. Otherwise, the field is assumed to be an array of fields, and an element list is displayed instead.

Symbol table(s) may be specified using the -y flag. Symbol files are assumed to be the format generated by the ARM linker. However, the -Y flag can be used to tell BE that symbols in other formats follow. Multiple symbol files in differing formats may be specified, as in :-

  be -Y aix_nm -y syms.nm -Y arm -y syms.sym ...

See the section on symbol table formats for a description of the supported file formats.

If a bias is specified, then it is added to each symbol value in the file. This is handy when a symbol table contains relative values, rather than absolute addresses.

The -C dx option may be used to extend BE by the use of a disassembler extension. This is a peice of code with a well defined interface, which BE uses to disassemble data annotated as code.

The -w and -h arguments can be used to try to override the current screen size. This doesn't work on UNIX or NetWare, but does on 32 bit DOS, 32 bit OS/2 and Windows. The -c argument allows you to choose from a small selection of colour schemes.

The -p flag causes BE to be invoked in a non-interactive manner. It decodes the address given, as a structure of the type specified, and writes the result to the screen (as stdout). Multiple structure dumps can be obtained by judicious use of the -s flag above.

The -r flag prevents a user of BE from shelling out a nested operating system command.

The -v flag allows you to state that addresses, offsets, lengths and array indices are to be displayed next to the data on display initially (note that -vI turns off indices). You can also turn on the symbolic display of addresses. In addition, you can specify the display mode of indices, one of binary, octal, decimal or hex. The + and - keys affect the initial level of detail of display, and only has effect when used with the -f flag. This is particularly useful when combined with the -p flag. Unfortunately, view flags are global, rather than per-session.

The -g argument is the 'segmented mode' switch. When enabled, BE translates all addresses prior to using them to fetch or store data. ie: address 0xSSSSOOOO is mapped to SSSS*16+OOOO. This is obviously intended for debugging dumps from embedded Intel processor dumps, and anyone with a sensible file format can ignore this flag.

Normally BE operates in a 32 bit address space. -A can be used to change this. For example, you could select a 24 bit address space. In this case BE would ignore bit 24 and above when addressing data, and would only show the bottom 24 bits when displaying addresses. Support for >32 bit address spaces is currently only available on certain operating systems versions of BE.

Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.

BE supports the -- to end options and thus allow filenames given afterwards to have names starting with a -.

Each binary file provides data for a part of the memory space which BE can view or edit. Therefore each binary file may be described as a memory section.

Alternatively a memory section may be specified as mx!args. This instructs BE to load a memory extension, and to access the data indentified by the arguments via the memory extension. This feature allows BE to be extended to be able to edit non-file data directly, such as sectors on a disk.

Typical invokations of BE might be :-

  be picture.bmp
    to edit a file, which is loaded into the BE memory space at 0 onwards.

  be -y gizmo.sym gizmo.rom gizmo.ram@0x8000
    to edit dumps from the RAM and ROM of a coprocessor.
    where the ROM starts at 0, and the RAM at 0x8000.
    gizmo.sym is the symbols for the microcode the coprocessor was running.

  be -Y map -y ucode.map -i ucode.ini -g -C i86 coproc!io=0x400,mem=0xc0000
    to live edit a running coprocessor.
    ucode.map has the symbols for the microcode the coprocessor is running.
    ucode.ini is a custom initialisation file.
    BEcoproc.DLL provides BE with access to coprocessor memory.
    io=0x400,mem=0xc0000 tells BEcoproc.DLL how to find the coprocessor.
    BEi86.DLL allows BE to disassemble any code in the data.

  be -d HEADER -a 512 -p -vA file.dat
    display the header at 512 bytes into file.dat.
    decoded data is to be written to stdout, BE is not interactive.
    addresses are to be displayed next to the data.

  be -s 1 -d STRUCT1 -a 0x1000 \
     -s 2 -d STRUCT2 -a 0x2000 \
     -s 3 -d STRUCT3 -a 0x3000 \
     -s 1-3 -p \
     filename.dat
    to pick three structures at three addresses,
    and to have BE decode and display all three to stdout.

The initialisation file

One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.

Under 32 bit OS/2, Windows and 32 bit DOS, BE finds the initialisation file by searching along the path for the .EXE file, and then looking for a .INI file with the same name.

BE for NetWare looks for a .INI in the same directory as the .NLM file.

Under UNIX, BE looks for ~/.berc, and failing that, it looks along the path for be and then appends .ini. If be is renamed to xx, then the files will be ~/.xxrc and xx.ini.

BE can be made to look elsewhere using the -i command line option.

Also, $define, $undef, $ifdef, $ifndef,$else, $endif and $error are supported, as a form of a pre-processing/conditional processing step. The -D command line option may be used to pre-$define such conditional processing symbols.

It should be noted that $define, $undef, $ifdef and $ifndef can all be given a list of symbols (rather that just one). This causes $define or $undef to define or undefine all the symbols in the list. It causes $ifdef or $ifndef to check that all the symbols in the list are defined or that they are all undefined.

De-Morgans law can be used to acheive OR combinations :-

  $ifndef A B C
    // None of A, B or C is $defined
  $else
    // This part is therefore if A, B or C is $defined
  $endif

If BE is running on 32 bit OS/2, then OS2 is pre-$defined. If running on Windows, then WIN32 is pre-$defined. If running on NetWare, then NETWARE is pre-$defined. If running on a type of UNIX, then UNIX is pre-$defined. If running specifically on AIX, then AIX is pre-$defined. If running specifically on Linux, then LINUX is pre-$defined. If running specifically on HP-UX, then HP is pre-$defined. If running specifically on SunOS, then SUN is pre-$defined. If running on 32 bit DOS, then DOS is pre-$defined. Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine. If BE supports 64 bit numbers and a 64 bit address space, then BE64 is pre-$defined. These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.

An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, along the BEINCLUDE environment variable, and finally along the PATH environment variable. The internal include path is usually empty, but may be appended to by the use of the -I command line option.

By the time the initialisation file is processed, any symbol files specified on the command line will have been loaded, along with any data files. This means that initialisation files may make reference to symbols and also to the data itself.

The initialisation file contains commands to set the default data display attributes, set constant, structure definitions, alignment declarations and include statements.

As BE processes the initialisation file, it generates warnings (such as undefined symbol table symbol), and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts.

Lexical elements

This initialisation file may contain C or C++ style comments.

Numbers may be given in binary, octal, decimal or hex, as the following examples, all of which represent 13 decimal :-

  0b1101, 0o15, 13, 0x0d

Numbers may also be given in character form. Multiple characters may be given to form a number, and this is quite handy because sometimes files/datastructures use magic numbers which are formed to out of characters so as to be eye-catching. More than 4 characters give undefined results. Characters may be quoted, similar to traditional C/C++ style :-

  'a'    = 0x61
  'ab'   = 0x6162
  'abc'  = 0x616263
  'abcd' = 0x61626364
  '\n'   = 10
  '\x34' = 0x34           always 2 hex digits after \x
  '\040' = 32             always 3 octal digits after \ (unlike C/C++)
  '\0'                    isn't legal, must be 3 octal digits
  '\000'                  isn't legal, 0 is the string terminator

Strings may be given in traditional C/C++ style too :-

  "Hello World"
  "One line\nAnother line"
  "String with a tab\tin the middle"
  "String with funny character at the end\x9a"
  "String using octal \377 notation to get character 255 in it"
  "String with \000 string terminator in it"                isn't legal
  "String which starts on one line \
  and continues on another"

Strings can be no more than 250 characters long.

Note that all strings used in the BE initialisation file must be 'clean', in that they can only contain the regular ASCII characters within the range ' ' to '~' (ie: 32 to 126 inclusive). Given this, you may ask why BE allows the escaped character notation: Well, strings can also be typed in by the user when interactively editing data, and it is very useful to allow a way to type non-ASCII characters.

When displaying strings, BE typically makes best possible use of the terminal in use, and may show the glyphs for unusual non-ASCII characters if it can. However, non-displayable characters are simply shown as '.'s.

Identifiers start with an underscore or a letter, and continue with more underscores, letters or digits. Some identifiers are actually reserved words in the BE initialisation file language.

The fact the NULs aren't allowed within BE strings is a rather irritating side effect of the way BE is implemented using traditional C/C++ NUL terminated strings. Perhaps one day I'll fix this.

Numbers

Wherever the initialisation file calls for a number, the following variants may be used :-

number

Just a number, as given in the Lexical Elements section above.

addr "symbolinthesymboltable"

if a symbol table is loaded, and the symbol can be found then the result is the numeric value of the symbol. Otherwise a warning is generated, and the result is the value of the constant nosym, or if that isn't defined, its ~0.

sizeof DEFN

this gives the size in bytes of the earlier defined definition called DEFN. If DEFN isn't already defined, then an error results.

offsetof DEFN "fieldname"

this gives the offset in bytes of the given field in the earlier defined DEFN. If DEFN isn't already defined, then an error results. If the field can't be found in the DEFN, then an error results.

valof "fieldname"

When displaying a list of fields within a definition, you can refer to the value of a numeric field using this notation.

map MAPNAME "mapletstring"

this gives the numeric value that corresponds to the given string defined in the map defintion, as explained below.

identifier

BE tries the following steps in order to find a value for the identifier :-

If you are displaying a list of fields within a definition, then BE looks to see if the identifier matches a numeric field.
ie: BE trys valof "identifier".
Then BE scans its internal list of numeric constants (which can be defined with the set command).
Then BE will then try to look up the name in the symbol table.
ie: BE trys addr "identifier".
After this, BE will scan all the map definitions to see if the identifier matches exactly one maplet name.
ie: BE trys map M "identifier" for all M. This step isn't particularly quick, as there can be a very large number of mappings

Obviously you will never be able to refer to a field, symbol or maplet with white space in its name by this shorthand mechanism. Use addr, valof or map to do this. Also, using the explicit forms is more efficient, as BE needn't look in all the possible places, as it does above. The reason BE looks in all the places when just the identifier is given is to reduce typing when using BE interactively.

` identifier expression `

The value of the identifier if defined, or the value of the expression if not.

. (dot)

When defining a DEFN, dot evaluates to the current offset. When prompted for an address in the @ command, it is the current address. When prompted for a delta value, dot is the current delta. When using the = to change a numeric value, dot is the current value. When specifying a value in a maplet, dot means the previous value plus one, or zero if this the first maplet. When specifying a mask in a maplet, dot means the maplet value.

[ type attributes , address , defaultvalue ]

This trys to fetch a numeric datum of the given type (eg: n32), to take into account the given attributes (eg: signed be), from the given address. If nothing can be fetched from that address, then the result is the defaultvalue. If the defaultvalue is omitted, then the expression cannot be evaluated.

[[ buff , e0 , e1 , e2 , defaultvalue ]]

This loops through some addresses trying to match the pattern specified. The loop is basically of the form for ( a = e0; a != e1; a += e2 ) match(buff,a). Be careful using this, there is no way to abort the scan. If the search doesn't locate the pattern, then the result is the defaultvalue, unless it has been omitted, in which case the expression cannot be evaluted.

strlen ADDR

This loops starting at address ADDR fetching bytes until it fetches a 0 byte. It returns the number of bytes prior to the 0. Consider it analogous to the C/C++ strlen function.

Note that the commas in the [ and [[ expressions can be omitted, although this is not recommended. Consider the expression [n32 0xf000 -5]: this looks like it means 'the 32 bit word from address 0xf000, or -5 if it can't be fetched', but it actually means 'the 32 bit word at 0xeffb, with no default if it can't be fetched'. Writing [n32 0xf000 (-5)] would fix this problem, but using commas makes the intention explicit. Semicolons (not just commas) are also valid syntax for seperating expressions.

It should be noted that when using the offsetof or map keywords, leading and trailing space is not significant in the "mapletstring" or "fieldname".

Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings. Operators grouped together have equal precedence. Higher precedence operators listed first :-

+, -, ~, ! unary plus, unary minus, complement, not
*, /, % multiply, divide, modulo
+, - add (plus), subtract (minus)
<<, >>, >>> shift left, shift right (signed), shift right (unsigned) [Note 1]
>, <, >=, <= greater than, less than, greater than or equal, less than or equal
==, != equal, not equal
& bitwise AND
^ bitwise exclusive OR
| bitwise inclusive OR
&& logical AND
^^ logical exclusive OR [Note 2]
|| logical inclusive OR
? : conditional expression

Note 1: The >> is a signed shift right, and >>> is the unsigned shift right (much like Java). This distinction is necessary as all numbers in BE expressions are unsigned. (This affects affects the outcome of expressions like -2/2 which is 0xfffffffffffffffe/2 which is 0x7fffffffffffffff, rather than the -1 you might expect).

Note 2: C/C++ does not have a logical exclusive OR, but BE does for symmetry.

Note also that the operator precedence now matches that of C++. Versions of BE prior to 1/7/99 had incorrect precedence for the shift operators. Luckily people tend to use brackets with these anyway.

Such numeric expressions can also be used when BE prompts for a number, not just in the initialisation file.

Some example expressions :-

  addr "tablebase" + 4 * sizeof RGB
    -- symbol tablebase plus four times the size of the RGB definition

  [ n32 be , 0x70200+0x44 ] + 27
    -- fetch big-endian 32 bit word from 0x70244, then add 27

  [ n16 be bits 11:4 , 0x1000 ]
    -- get big-endian 16 bit word from 0x1000, extract bits 11 to 4 inclusive
    -- if the word was 0x1234, this would give a result of 0x23

  [[ "SIGNATURE" , 0x1000 , 0x2000 , 4 ]]
    -- locate "SIGNATURE" between 0x1000 and 0x2000, 4 byte aligned

Set constant

BE maintains a smallish list of global numeric constants. eg:

  set num_elements 14+5

Avoid using constant names which clash with other identifiers, such as map or structure definition names. Also, avoid clashing with reserved words in the initialisation file language.

The constant can be assigned any numeric expression, including referencing other constants.

This feature allows initialisation files with the following technique for managing multiple configurations of data :-

  $ifdef BIG_DATA_FILE
  set n_entries 100
  $else
  set n_entries 10
  $endif

  def DATA_RECORD
    {
    n_entries n32 buf 100 asc "names"
    n_entries n32 dec         "salaries"
    }

Attempting to set a constant which is already defined produces an error.

The unset command can be used to undefine a previous value. It is not an error to unset a constant which is not previously set to anything :-

  set elems 100
  unset elems
  set elems 200

The -S command line flag can be used to set a constant before the initialisation file is processed. Because the constant is set before the initialisation file is processed, the expression the constant is set to can't refer to things within the initialisation file. Assuming the initialisation file debinfo.ini uses a constant called tabsize :-

  be -i debinfo.ini -S tabsize=10   debug.dat              is fine
  be -i debinfo.ini -S tabsize=10+4 debug.dat              is fine
  be -i debinfo.ini -S "tabsize=sizeof STRUCT" debug.dat   is illegal

The value of a constant may be interactively set, changed or unset by the user using the $ keystroke.

The special constant nosym if set, is returned when the addr "symbol" syntax is used in an expression, to try to determine the numeric value of a symbol which isn't defined. The usual use of this is in defining a value which is miles away from any sensible value.

The special constant disp_limit if set, affects the way BE displays address values in symbol+offset form. If the offset (ie: the displacement) from the symbol exceeds the disp_limit value, then the address isn't displayed in symbol+offset form.

Commands to set the default data display attributes

When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noglue noseg nozterm.

To change this default setting, just include one or more of the following keywords in the file :-

be - read multibyte values from memory in a big-endian fashion.
le - read multibyte values from memory in a little-endian fashion.
signed - when fetching numeric values sign extend them, and when displaying numerically show '+signedvalue' or '-signedvalue'.
unsigned - when fetching numeric values zero extend them, and when displaying numerically show 'unsignedvalue'.
asc - set display mode to ASCII.
ebc - set display mode to EBCDIC.
bin - set display mode to binary.
oct - set display mode to octal.
dec - set display mode to decimal.
hex - set display mode to hex.
time - set display mode to time (decode seconds since epoch).
sym - set display mode to symbolic. ie: look up the value in the symbol table, and if found, display symbol+hexoffset, else display value in hex.
null - allow following of 0 pointers.
nonull - disallow following of 0 pointers.
seg - cope with 16:16 segmented pointers.
noseg - pointers are not segmented.
mul - pointer values should be multiplied by the size of the data type being pointed to.
nomul - pointer values are given in regular byte addresses.
abs - pointer values are absolute.
rel - pointer values are to be considered relative to their own addresses.
code - specify that numeric value is actually a code address.
nocode - specify that numeric value is not a code address.
lj - perform ARM specific long-jump interpretation of code addresses.
nolj - don't do long-jump interpretation.
glue - perform PowerPC specific pointer glue interpretation of code addresses.
noglue - don't do pointer glue interpretation.
zterm - stop displaying buf data when a nul terminator is reached.
nozterm - display data beyond nul terminators.

Note that when multibyte numeric values are displayed in ASCII or EBCDIC, the ordering of the characters produced works like this :-

Type Sample value Displays in ASCII
n8 0x41 'A'
n16 0x4142 'AB'
n24 0x414243 'ABC'
n32 0x41424344 'ABCD'
n40 0x4142434445 'ABCDE'
n48 0x414243444546 'ABCDEF'
n56 0x41424344454647 'ABCDEFG'
n64 0x4142434445464648 'ABCDEFGH'

Type	Sample value	Displays in ASCII
n8	0x41	'A'
n16	0x4142	'AB'
n24	0x414243	'ABC'
n32	0x41424344	'ABCD'
n40	0x4142434445	'ABCDE'
n48	0x414243444546	'ABCDEF'
n56	0x41424344454647	'ABCDEFG'
n64	0x4142434445464648	'ABCDEFGH'

Support for >32 bit numbers is only present in certain operating systems versions of BE.

This can have the side effect that when people design eye-catcher values as numbers to store into memory, they may appear reversed when displayed. In such cases, it might make more sense to decode the field as a N byte ASCII buffer, rather than a number. Alternatively, use the big-endian designation, as in n32 be etc..

Map definitions

Mappings are BE's equivelent to C enumerated types and bitfield support.

These define a mapping between symbolic names and numeric values. A typical mapping definition in the initialisation file might be :-

  map compression_type
    {
    "uncompressed" 1
    "huffman"      2
    "lzw"          3
    }

If the numeric value on display matches the value given, then it can be converted to the textual description.

Mappings in which the values are one bigger than the previous one are quite common. So BE gives a shorthand, where . in the value means 0 for the first maplet given after the open curly brace, and one plus the previous value otherwise :-

  map ordinals
    {
    "zero" .
    "one"  .
    "two"  .
    }

  map larger_ordinals
    {
    "four" 4
    "five" .
    "six"  .
    }

Bitfields may be acheived in the following fashion :-

  map pending_events
    {
    "reconfiguration" 0x0001 : 0x0001
    "flush_cache"     0x0002 : 0x0002
    "restart_io"      0x0004 : 0x0004
    }

The : symbol introduces an additional mask. The number to string conversion algorithm inside BE works like this :-

  for each maplet in the map
    if ( value & maplet.mask ) == maplet.value then
      display the maplet.name
  if some unexplained bits left over then
      display the remaining value in hex

The case where the value and following mask are the same is much more common than the case where they are not. So BE provides a typing shortcut where . in the mask means 'the same as the value'. So the above example can be written :-

  map pending_events
    {
    "reconfiguration" 0x0001 : .
    "flush_cache"     0x0002 : .
    "restart_io"      0x0004 : .
    }

It is possible to have multiple field decodes from a single value :-

  map twobitfields
    {
    "green" 0x0001 : 0x000f
    "blue"  0x0002 : 0x000f
    "red"   0x0003 : 0x000f
    "small" 0x0100 : 0x0f00
    "large" 0x0200 : 0x0f00
    }

The value 0x0243 would be converted to red|large|0x40.

It has been alluded to above, that when supplying numeric expressions, the map keyword may also be used. In the following example, the expression evaluates to 0x0105 :-

  map twobitfields "small" + 5

In fact, if there is no constant or symbol with the same name, you can use the following shorthand for the above example :-

  small + 5

Even sophisticated mappings like the following will work as expected :-

  map attribute_byte
    {
    "colour" 0x10 : 0xf0
    "red"    0x13 : 0xff
    "green"  0x14 : 0xff
    "shape"  0x20 : 0xf0
    "round"  0x23 : 0xff
    "square" 0x24 : 0xff
    }

In this example the meaning of the bottom 4 bits is dependent on the value of the top 4 bits. The top 4 bits encode whether the attribute is encoding information about the colour or shape of something, and the bottom 4 bits encode which colour or shape. The value 0x23 is displayed as "shape|round".

Sometimes it can be convenient to add to the definition of a mapping. This can be done via the add keyword, as follows. An example :-

  map animals { "dog" 1 "cat" 2 }
  map animals { "giraffe" 3 }        // Error, redefinition of map animals
  map animals add { "zebra" 4 }      // Okay, extends map animals
  map birds add { "pelican" 5 }      // Error, no map birds to extend

When displaying a maplet decoded value, the M key can be used to bring up a list of the maplets and whether they decode or not. Through this, the value can be edited.

You can use the suppress keyword to prevent BE using a maplet when converting a number to a string. Not normally used, but can sometimes be handy to cut down screen clutter, as a number is normally displayed in less space. In the following example, 0xc3, bright flashing blue, is shown as "blue|0xc0". Maybe we are only interested in the colour :-

  map obscure_mapping
    {
    "bright" suppress 0x80 : .
    "flash"  suppress 0x40 : .
    "red"             0x01 : 0x3f
    "green"           0x02 : 0x3f
    "blue"            0x03 : 0x3f
    }

Much more common is to interactively suppress maplets from the M maplet list using the @S and @N keys.

Structure definitions

Definitions are BEs equivelent to C structures and unions.

Definitions are a list of at OFFSET clauses, align ALIGNMENT clauses and field definitions. When the structure definition is processed, then the current-offset is initialised to 0.

An at OFFSET clause moves the current-offset to the specified numeric value.

An align ALIGNMENT clause moves the current-offset to be the next integer multiple of the specified numeric value.

A field definition defines a field which lives at the current-offset into the structure. After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used, or a union is being defined).

The size of the structure is the largest value that the current-offset ever attains. This is the value returned whenever sizeof DEFN is used as a number.

Duplicate definitions of the same named definition are not allowed.

A structure definition may have zero or more fields, align ALIGNMENT clauses and/or at OFFSET clauses.

A structure definition may behave like a C struct definition, in that each field follows on from the previous one in memory. Or it may behave like a C union definition, in that all fields overlay each other in memory, and the total size is the size of the largest field.

  def A_STRUCTURE struct
    {
    n32 "first field, bytes 0 to 3"
    n32 "next field, bytes 4 to 7"
      // sizeof A_STRUCTURE is 8
    }

  def A_UNION union
    {
    n32 "first field, bytes 0 to 3"
    n16 "second field, bytes 0 to 1"
      // sizeof A_UNION is 4
    }

The keyword struct is unnecessary, and may be omitted.

These may be combined, like in the following :-

  def MY_COMPLICATED_STRUCTURE
    {
    n32 "first field, occupying bytes 0 to 3"
    union
      {
      n32 "second field, occupying bytes 4 to 7"
      struct
        {
        n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
        n8  "the upper middle byte, occupying byte 6"
        n8  "the top byte, occupying byte 7"
        }
      }
    }

The at OFFSET clause also allows the same areas of a structure to be displayed in more than one way, thus also allowing the implementation of unions :-

  def UNION_THE_HARD_WAY
    {
    n32 le  "first value, bytes 0 to 3"
    at 0 n8 "the lower byte, byte 0"
      // sizeof UNION_THE_HARD_WAY is 4
    }

Note: in the above style of example, you can't use the offsetof keyword to position a new field on top of an earlier field, because whilst you are defining a structure definition, it isn't actually fully defined yet, and so the offsetof keyword will not be able to find it.

Each clause can be terminated or seperated with a ;, although normally this isn't necessary. One example of where it is required is :-

  def WONT_BEHAVE_AS_EXPECTED
    {
    n8     "first"
    align 4          // #1
    +5 n16 "array"
    }

The lack of a ; at #1 causes BE to interpret this as align 9, followed by a single n16 field.

Field definitions

Here are some examples of field definitions :-

  n8 asc "initial"
  buf 20 "surname"
  n16 be unsigned dec "age"
  3 pet "pet names"
  3 n16 be unsigned dec "pet costs"
  2 n32 le unsigned hex ptr person "2 pointers to parents"
  2 n32 ptr person null "2 pointers, null legal"
  person "a person"
  n32 sym code "__main"
  1024 n32 unsigned dec "memory as 32 bit words"
  9 n16 map errorcodes "results"
  buf 100 asc zterm "a C style string"
  GENERIC_POINTER suppress "pointer"
  n32 ptr FRED add -. "link"
  n32 bits 31:28 "top 4 bits"
  n32 bits 27:0  "bottom 28 bits (of another word)"
  n32 sym code width 10 "function"
  n32 time "last_update_time"

Each example is of the form :-

  optional-count type optional-attrs name

The field describes count data items of the specified type. If count is not 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers"). When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in.

The type of the data is one of n8, n16, n24, n32, n40, n48, n56, n64, buf N or DEFN, where DEFN is the name of a previously defined definition. This type may be considered to be the way in which BE is told the size of the data item concerned.

n8, n16, n24, n32 n40, n48, n56, n64 mean 8, 16, 24, 32, 40, 48, 56 or 64 bit numeric data item.

Support for >32 bit values is only present in certain operating systems versions of BE.

buf N means a buffer of N bytes.

There is also a special expr E type, which defines a 'field' whose value is the result of the expression E. The expression E may be any expression and may even refer to other fields in the definition. The . symbol evaluates to the address of the field. Obviously you can't edit/change the value of an expression. So the following sort of thing becomes possible :-

  def RECTANGLE
    {
    n8                  dec "width"
    n16 be              dec "height"
    expr "width*height" dec "area"
    }

The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.

In addition to the data display attribute keywords given above is the map MAP attribute which means display the numeric field by looking up a textual equivelent of the numeric value using the mapping which must have previously been defined.

If the field is one of n8, n16, n24, n32, n40, n48, n56 or n64, the bits MS:LS designation can be used to say that only a subset of the bits fetched are to be displayed. Also, if you edit the field, only the subset of bits are changed. BE does a read-modify-write of the numeric field to acheive this. Despite only showing a subset of the bits, the field is still the same 'size', and the union mechanism must be used to decode multiple bit ranges in the same numeric field. eg:

  union
    {
    n16 be bits 15:12 bin "top 4 bits"
    n16 be bits 11: 0 hex "bottom 12 bits"
    }

The ptr DEFN attribute says that the numeric value is in fact a pointer to a definition of type DEFN. DEFN need not be defined yet in the initialisation file. The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to. You can use mult MULT to multiply the pointer value by MULT (therefore mul is effectively the same as mult sizeof DEFN). The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0. The keyword add BASE may be used, and there is also a align ALIGNMENT keyword. ALIGNMENT can only be 1, 2, 4, 8, 16, 32 or 64 in the current implementation. Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value. By using combinations of the pointer keywords, various effects may be acheived :-

n32 ptr DEFN abs: fetch pointer value, and decode DEFN at that address. This case is very common for file format decoding and memory dumps.
n32 ptr DEFN add 0x40000 abs: fetch pointer value, add 0x40000, and decode DEFN at that address. This case can be used to handle multiple memory space problems.
n32 ptr DEFN mul add addr "table" abs: fetch pointer value, multiply by the size of a DEFN, add the address of the table (as determined from the symbol table), and decode the DEFN at that address. This case is typical for when the pointer is in fact a table index.
n32 ptr DEFN rel: fetch pointer value, add address of the pointer itself, and decode the DEFN at that address. When a file consists of a list of variable length structures, where the first field is the size of the structure, this provides a handy way to skip past it to the next.
n32 ptr DEFN add 8 rel: fetch pointer value, add address of the pointer itself, add the numeric value 8 (this can be negative), and decode the DEFN at that address. This case is common for when one structure includes a field which identifies an amount of data to skip before the next structure is seen.
n8 ptr DEFN add 1 align 4 abs: fetch pointer value, add 1, and round up to the next 4 aligned address, before decoding DEFN at that address. Sometimes data items in files have length fields which need to be rounded up to a multiple of N (typically 2 or 4), before the next data field appears.

Clearly the expr mechanism described above can be used to similar effect.

The procedure for following pointers is :-

Fetch pointers numeric value.
If nonull and pointer is 0, then don't follow the pointer.
If mul, then multiply the pointer value by the size of the item being pointed to.
If mult MULT, then multiply the pointer value by MULT.
If add BASE, then add BASE to the pointer value.
If rel, then add the address of the pointer itself.
If seg, then mangle pointer address to account for the 16:16 segmented mode of x86 processors.
If align ALIGNMENT, then round up pointer to the next multiple of ALIGNMENT.
Decode and display data item at resultant address.

The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom as the offset, and producing a new pointer value which is segment*16+offset. This feature may be of use for decoding large memory model program dumps which have been running on x86 processors running in real mode, or a 16:16 protected mode with a linear selector mapping. This feature is not recommended - its much easier to use the new -g command line switch instead. Anyone with a sensible file format to decode, or a memory dump taken from the memory space of a processor of a sensible architecture, can ignore this feature.

The keyword open may be given and this has the effect of increasing the level of detail that is initially displayed. See the description of the level of detail of display feature later in this document. This feature has its problems (bugs), but can be used to ensure that small arrays and short definitions are displayed in full without the user having to manually increase the level of detail by hand.

The suppress field attribute may be given using the suppress keyword. Suppressed fields are omitted from display when showing a whole definition on one line (by expanding the level of display). Suppressed fields are shown in round brackets when viewing a definition with each field on a new line.

The tag attribute may be given. When this field is initially displayed, the line will initially be tagged. Typically you might pre-tag one or two specific fields in a structure, if the structure were large, and certain fields were more important than others.

The width WIDTH attribute may also be given. By default, field widths are 0, which means don't pad or truncate fields when they are displayed. When set non-0, each field (or each individual field of an array) is padded or truncated to be the given width. If a field is truncated, a > or < symbol is shown. The width can be changed interactively by the user.

A validity check expression may be associated with each field, via the valid V syntax. Any field which passes its validity check has ++ displayed next to it, and any which fails has -- displayed next to it. When a whole definition is shown on one line (by expanding the level of detail of display), those fields which fail their validity tests, are not shown. This provides a handy way of doing conditional decode of variant records.

  map T_
    {
    "T_SHORT" 1
    "T_LONG"  2
    }

  def VARIABLE
    {
    buf 20 asc zterm                "name"
    n8 map T_                       "type"
    union
      {
      n16 dec valid "type==T_SHORT" "value16"
      n32 dec valid "type==T_LONG"  "value32"
      }
    }

Sometimes validity checks can get quite long, so remember that backslash at the end of a line causes 'line continuation', as in :-

  n8 dec                             "discriminator"
  n16 dec valid "discriminator==1||\
                 discriminator==2||\
                 discriminator==3"   "conditional_field"

Aside: Beware of using the C/C++ pre-processor (or other macro pre-processors) on BE initialisation files - they may not handle things like 'line continuation' quite the same way as BE does. eg: In the example, BE ignores the white space preceeding the word discriminator on the last two lines, but some (all?) C++ pre-processors include the white space in the final string!

Finally the name of the field must be given. You used to have to pad all field names of the same definition to be the same width with spaces, so that when displayed, everything lines up nice. But now BE does this automatically for you.

A typical structure definition might look like :-

  def FROGLISTELEM
    {
    n32 ptr FROGLISTELEM "next_frog_in_list"
    buf 100 asc          "name_of_this_frog"
    }

However, consider the case that BE is being used to edit a dump of a processors memory space. In this case we also wish to be able to see all the global variables, whose addresses are determined by a symbol (rather than some fixed address). So it is typical to take advantage of the fact that fields can be placed at any offset into a structure (using at EXPR), and that expressions may refer to the symbol table (using addr "SYM"). You put such fields in a structure holding global variables, which would be decoded from address 0. You'd write something like :-

  def GLOBAL_VARS
    {
    at addr "frog_list" n32 ptr FROG "frog_list"
    ...
    }

Now this can be a very common idiom, and you usually want the displayed field name to match the symbol name. So to avoid typing everything twice, BE provides a short-cut :-

  def GLOBAL_VARS
    {
    n32 ptr FROG at "frog_list"
    ...
    }

When this feature was added to BE, and some real-world BE initialisation files were modified to take advantage of it, the files got 17% smaller.

Alignment declarations

Normally, when parsing a structure definition, each field is positioned immediately after the one before (unless the union, align, or at keywords are used).

When BE begins processing the initialisation file, it believes that all n8, n16, n24, n32, n40, n48, n56 and n64 variables should be aligned on a 1 byte boundary. In other words, no special alignment is to be automatically performed.

This is radically different from the way the high level languages such as C lay out the fields within their structures and unions. These languages enforce constraints such as '32 bit integers are aligned on 4 byte boundaries'. This is usually done because certain processor architectures either can't access certain sizes of data from odd alignments, or are slower doing so. This can be accounted for by manually adding padding to structure definitions :-

  def ALIGNED_USING_MANUAL_PADDING
    {
    n8 "fred"
    buf 3 "padding to align bill on a 4 byte boundary"
    n32 "bill"
    }

Or alternatively, the align keyword could be used :-

  def ALIGN_USING_align_KEYWORD
    {
    n8 "fred"
    align 4
    n32 "bill"
    }

It is possible to tell BE to automatically align n8, n16, n24, n32 or nested definition fields on specific byte (offset) boundaries by constructs such as the following (which corresponds to many 32 bit C compilers) :-

  align n16 2
  align n32 4
  align def 4
  align { 4
  align } 4

  def ALIGNED_AUTOMATICALLY
    {
    n8  "fred"
    n32 "bill"
    }

The align { directive specifies that nested definitions must start on the indicated boundary. The align } directive specifies that structure sizes get rounded up to a multiple of the alignment.

Clearly, this feature is more useful when BE is being used to probe memory spaces of running programs via an memory extension, or doing post-mortem examination of program memory dumps.

Most data file formats don't-need-to and/or don't-bother-to align their fields.

Include directives

The initialisation file can contain the following, as long as it is outside of any other definition :-

  include "anotherfile.ini"

Be sure to notice that this is a initialisation language command, not a pre-processor directive like $ifdef. This is why it is not $include.

There is also a tryinclude variant, which tries to open the file specified, but does not get upset if it can't :-

  tryinclude "extrastuff.ini"

Reserved words

The following are reserved words, and so should be avoided as names of constants in the initialisation file :-

  abs add addr align asc at be bin bits buf code dec def ebc expr glue hex
  include le lj map mul mult n8 n16 n24 n32 n40 n48 n56 n64 nocode noglue
  nolj nomul nonull noseg nozterm null oct offsetof open ptr rel seg set
  signed sizeof struct suppress sym tag time tryinclude union unset unsigned
  valid valof width zterm

A sample initialisation file

Here is a real initialisation file, which is intended for viewing the master boot record written on sector 0 of PC disks :-

  //
  // mbr.ini - BE initialisation file for decoding master boot records
  //
  // Under Linux, root can obtain the MBR via a command much like :-
  //
  //   # dd if=/dev/sda of=mbr.dat bs=512 count=1
  //
  // Then you'd invoke BE via :-
  //
  //   % be -i mbr.ini mbr.dat
  //
  // The file assumes the drive from which the MBR was obtained has
  // 63 sectors per track and 255 heads. These assumptions are used in
  // computations of LBAs given CHS information. If the disk geometry is
  // actually different (as is likely for <8GB disks), you can override
  // the assumptions via a command line much like :-
  //
  //   % be -Ssectors_per_track=32 -Sheads=127 -i mbr.ini mbr.dat
  //
  // Information obtained mainly from STORAGE.INF.
  //

  set nspt `sectors_per_track 63`
  set nh   `heads 255`

  map BOOTINDIC
    {
    "Not Active" 0x80 : 0x80
    "Active"     0x00 : 0x80
    }

  map PARTOWNER
    {
    "Unused"                                           0x00
    "DOS, 12-bit FAT"                                  0x01
    "XENIX System"                                     0x02
    "XENIX User"                                       0x03
    "DOS, 16-bit FAT"                                  0x04
    "Extended"                                         0x05
    "DOS, >32MB support, <=64KB Allocation unit"       0x06
    "OS/2, >32MB partition support"                    0x07
    "Linux swap"                                       0x82
    "Linux native"                                     0x83
      // Note: lots missing for brevity of example
    }

  def PARTCHS
    {
    at 1 n8 bits 7:6 hex width 3            suppress "CylinderHigh"
    at 2 n8 hex width 4                     suppress "CylinderLow"
    expr "(CylinderHigh<<8)+CylinderLow" dec width 4 "Cylinder"
    at 0 n8 dec width 3                              "Head"
    at 1 n8 bits 5:0 dec width 2                     "Sector"

    expr "(Cylinder*nh+Head)*nspt+Sector-1" width 8 dec
                                            suppress "lba"
    }

  def PART
    {
    n8 map BOOTINDIC          "BootIndicator"
    PARTCHS open              "PartitionStart"
    n8 map PARTOWNER          "SystemIndicator"
    PARTCHS open              "PartitionEnd"
    n32 dec width 8           "OffsetFromStartOfDiskInSectors"
    n32 dec width 8           "PartitionLengthInSectors"

    // By adding these two, you can work out the LBA
    // immediately following the partition
    expr "OffsetFromStartOfDiskInSectors+PartitionLengthInSectors"
         dec width 8 suppress "next_lba"

    expr "OffsetFromStartOfDiskInSectors*512" hex ptr MBR
         valid "SystemIndicator==Extended" suppress "extended" 
    }

  def MBR
    {
    buf 446 hex                          "MasterBootRecordProgram"
    4 PART                               "PartitionTable"
    n16 be hex valid "Signature==0x55aa" "Signature"
    }

  def main
    {
    MBR "mbr"
    }

In the above example quite a few of the BE features are demonstrated.

The setting of nspt and nh shows the expression syntax meaning "value of symbol if defined, else default value". These variables represent the geometry of the disk.

The map BOOTINC shows using map for decoding bits (in this case just one bit). This mapping decodes the per partition 'boot indicator' flag.

The map PARTOWNER shows using map for decoding enumerations. This mapping decodes the owner (or type) of the partition.

The def PARTCHS, which shows the cylinder head and sector of either the start or end of a partition, shows the following BE features :-

How to decode the bytes in the structure in an order other than first byte first, by using the at OFFSET construct.
How to extract just some bits of the bytes/words using bits MS:LS and how to combine them together to make a meaningful value using expr "EXPRESSION".
How to use width WIDTH so that the screen layout is nice.
How to use suppress to suppress some fields, so that only the ones worthy of display are shown when the entire PARTCHS structure is shown expanded on one line.

In the def PART, which decodes an entire partition entry in the partition table in the master boot record, we can see the use of the earlier two mappings, and also the use of open so that the PARTCHS structures are shown 'ready expanded'. Because of the earlier use of suppress in the def PARTCHS explained above, you'll just see the decoded cylinder head and sector. Of course when you select the PARTCHS, you'll see everything.

The use of expr "EXPRESSION" for computing next_lba is really useful when you use it in conjunction with the computed lba in the def PARTCHS above. Basically, the LBA beyond the end of a partition should be the start of the next partition, and so should be the OffsetFromStartOfDiskInSectors of another partition, and this should tally with the LBA computed from the CHS in the PartitionStart.

In the def MBR there is the use of a valid "EXPRESSION" validity check. It says the Signature field is valid, if it is 0x55aa. So if when you load BE to view an MBR, the Signature is shown with a -- indicator, you know the MBR isn't valid.

Clearly, with the above file, it is possible to do some rather low and dirty FDISK like things to MBR data, especially if you are using BE via a memory extension to directly access live disk sectors. Just because you can, it doesn't mean you should - be careful.

The supplied initialisation file

The supplied initialisation file contains enough definitions to enable you to examine the contents of many file formats.

Bitmap files supported include :-

Windows or OS/2 bitmap.
Dr.Halo bitmap.
Compu$erve GIF bitmap.
Amiga ILBM bitmap.
JFIF JPEG bitmap.
IBM KIPS bitmap.
Microsoft Paint bitmap.
Atari ST, NEOchrome bitmap.
Dr.Halo palette.
ZSoft PCX bitmap.
Portable Network Graphics bitmap.
IBM Page Segment.
Sun Raster file.
Utah Raster Toolkit bitmap.
Silicon Graphics bitmap.
RiscOS Sprite bitmap.
Targa/Vista bitmap.
Aldus/Microsoft TIFF.
IBM M-Motion bitmap.
Kodak YCC printer bitmap.
X Windows 10 window dump.
X Windows 11 window dump.

Animation formats :-

Atari ST, NEOchrome Animation Format.
Atari ST, Animatic File Format.
Atari ST, Cyber Paint Sequence Format.

Also, the following miscellaneous file formats :-

IBM OS/2 Resource file.
Microsoft RIFF chunked file.
RT and MR 3D texture format.
ZIP compressed archive file.

The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files contents, but are merely intended to aid in the browsing of the contents of such files.

Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.

If you are simply interested in looking at some of the file raw, you can use the DB, DW and DD definitions that come supplied in the default initialisation file. If you wanted to look at memory at 0x8000 as dwords, you could type :-

  @ DD Enter 0x8000 Enter Enter

BNF definition

Here is a more formal specification of the BE initialisation file language. Actually, BE will accept variations on the following, but here we document the clearest/least-ambiguous use of the language. Where BE accepts variations on the following, typically it is in the ordering of independant attributes.

Some basics just before we start :-

  <number> ::= a number in C/C++ style
               as in 0b1101, 0o15, 13, or 0x0d, or '\r' or similar
  <id>     ::= a C/C++ style identifier
  <string> ::= a C/C++ style double quoted string
               which is clean (characters between 32 and 126 only)
               as in "Hello World" etc.
  <buffer> ::= a string or hexstring buffer
               as in "SIGNATURE" or @FF0022

Numeric expressions :-

  <sep>    ::= { ',' | ';' }

  <expr13> ::= <number>
             | '+' <expr13>
             | '-' <expr13>
             | '~' <expr13>
             | '!' <expr13>
             | 'addr' <string>
             | 'sizeof' <id>
             | 'offsetof' <id> <string>
             | 'valof' <string>
             | 'map' <id> <string>
             | '(' <expr> ')'
             | <id>
             | '`' <id> <expr> '`'
             | '.'
             | '[' <n_value> <sep> <expr> [ <sep> <expr> ] ']'
             | '[[' <buffer> >sep>
               <expr> <sep> <expr> <sep> <expr> [ >sep> <expr> ] ']]'
             | 'strlen' <expr>

  <expr12> ::= <expr13> { ( '*' | '/' | '%' ) <expr13> }
  <expr11> ::= <expr12> { ( '+' | '-' ) <expr12> }
  <expr10> ::= <expr11> { ( '<<' | '>>' | '>>>' ) <expr11> }
  <expr9>  ::= <expr10> { ( '>' | '<' | '>=' | '<=' ) <expr10> }
  <expr8>  ::= <expr9>  { ( '==' | '!=' ) <expr9> }
  <expr7>  ::= <expr8>  { '&' <expr8> }
  <expr6>  ::= <expr7>  { '^' <expr7> }
  <expr5>  ::= <expr6>  { '|' <expr6> }
  <expr4>  ::= <expr5>  { '&&' <expr5> }
  <expr3>  ::= <expr4>  { '^^' <expr4> }
  <expr2>  ::= <expr3>  { '||' <expr3> }
  <expr>   ::= <expr2>  { '?' <expr2> ':' <expr2> }

Sometimes in expressions . (dot) is allowed. It usually refers to some default amount. Other times it isn't allowed.

A maplet is mapping from a number to a string to display, and a map is zero or more maplets :- Using . in the maplet value (first expression) means 0 or previous value plus 1, and in the maplet mask (optional second expression) it means the same as the value.

  <maplet> ::= <string> [ 'suppress' ] <expr> [ ':' <expr> ]
  <map>    ::= 'map' <id> [ 'add' ] '{' { <maplet> } '}'

Numeric fields. Where the value comes from, how to display it, how to use it as a pointer (if it is one), and putting it all together :-

  <n_value>       ::= ( 'n8'  | 'n16' | 'n24' | 'n32' |
                        'n40' | 'n48' | 'n56' | 'n64' )
                      [ 'le' | 'be' ]
                      [ 'bits' <expr> ':' <expr> ]
                      [ 'signed' | 'unsigned' ]
  <expr_value>    ::= 'expr' <string>
  <code_attrs>    ::= [ 'lj' | 'nolj' | 'glue' | 'noglue' ]
  <numeric_attrs> ::= [ 'map' <id> ]
                      [ 'asc' | 'ebc' | 'bin' | 'oct' |
                        'dec' | 'hex' | 'sym' | 'time' ]
                      [ 'code' <code_attrs> | 'nocode' ]
  <pointer_attrs> ::= [ 'ptr' <id>
                        [ 'null' | 'nonull' ]
                        [ 'rel' | 'abs' ]
                        [ 'mul' | 'nomul' ]
                        [ 'mult' <expr> ]
                        [ 'add' <expr> ]
                        [ 'align' <expr> ]
                        [ 'seg' | 'noseg' ]
                      ]
  <numeric_field> ::= ( <n_value> | <expr_value> )
                      <numeric_attrs> <pointer_attrs>

The expr string is itself a numeric expression. You'll need to escape any quotes within it.

A buffer field. How big, how to show the data, and whether to stop at a NUL byte. Using . in the buffer size expression gives the current offset into the definition :-

  <buffer_field> ::= 'buf' <expr>
                     [ 'hex' | 'asc' | 'ebc' ]
                     [ 'zterm' | 'nozterm' ]

A field may name a nested definition :-

  <def_field> ::= <id>

All fields share a set of general attributes, and have a name, so a complete field specification looks like :-

  <field> ::= ( <numeric_field> | <buffer_field> | <def_field> )
              { 'open' }
              [ 'valid' <string> ]
              [ 'width' <expr> ]
              [ 'suppress' ]
              [ 'tag' ]
              [ 'at' ]
              <string>

The valid string is itself a numeric expression.

Fields are just one type of item which can be found within a definition. Offset within the definition, alignment and nested definitions. Items in an itemlist follow one another (as in C/C++ structs), or overlay each other (as in C/C++ unions). Using . in the at expression gives the current offset into the definition. So definitions are specified as :-

  <item>     ::= 'at' <expr>
               | 'align' <expr>
               | <itemlist>
               | <field>
               | ';'
  <itemlist> ::= [ 'struct' | 'union' ] '{' { <item> } '}'
  <def>      ::= 'def' <id> <itemlist>

File includes are specified :-

  <include> := ( 'include' | 'tryinclude' ) <string>

The default attributes used, if not fully specified in the fields, can be specified globally :-

  <default> := 'asc' | 'ebc' | 'bin' | 'oct' | 'dec' | 'hex' | 'sym' | 'time'
             | 'signed' | 'unsigned'
             | 'be' | 'le'
             | 'rel' | 'abs'
             | 'mul' | 'nomul'
             | 'seg' | 'noseg'
             | 'null' | 'nonull'
             | 'code' | 'nocode'
             | 'lj' | 'nolj'
             | 'glue' | 'noglue'
             | 'zterm' | 'nozterm'
             | ( 'align' ( 'n8'  | 'n16' | 'n24' | 'n32' |
                           'n40' | 'n48' | 'n56' | 'n64' |
                           'def' | '{' | '}' ) 
                 <expr> )

Set and unsetting :-

  <set>   ::= 'set' <id> <expr>
  <unset> ::= 'unset' <id>

So the total language is :-

  <be> ::= <map>
         | <def>
         | <include>
         | <default>
         | <set>
         | <unset>

Using the editor

BE displays most of the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.

BE works by presenting lists to the user. These can be lists of data fields, lists of array elements etc.. A user action can result in a new list being displayed on top of the previous one. Effectively, there is a 'stack' of lists, where you always get to see the topmost one. The level of nesting is always on display at the top right hand corner of the screen.

General keys

Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home, End, Left and Right all work in the obvious ways, traversing the list on display. The Wordstar 'cursor diamond' keys ^E, ^X, ^R, ^C, ^W, ^Z, ^S and ^D also work.

As you move around the current list, your line number and total number of lines in the list are shown on the top right of the screen in the form line/totallines.

The user can discard the current list, and go back to the previous one by pressing Esc.

q or @X (ie: Alt+X) exits the program. If you have made any changes, you will be prompted as to whether BE should write them out to disk. @W can write out any unsaved changes.

p allows you to 'print' the list on display to a file. You can specify the filename, and whether to append to or overwrite any existing file of that name. ^P is a short hand way of saying append to the same file as last time. If you haven't specified a file yet with p, the default is be.log. Non-printable (but displayable) characters get converted to '.' dots.

f or / or F9 allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n or F10 can be used to repeat the last find. If a find is taking a long time, it may be interrupted using Ctrl+Break on OS/2, Windows or DOS. Elsewhere, the Esc key may be used. The \ key will reverse the direction of the find, ready for when you next use the 'repeat the last find' function.

i allows you to generate a new list, which only has lines which include a pattern you specify. This new list pops-up on top of the current one. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you generate a display which excludes lines which match the pattern.

S can be used to generate a new list which is the same as the current list, except the lines are sorted. You are prompted for a 'sort after' pattern, and as to whether the result is to be sorted in ascending or descending order. You are also prompted whether to do textual or numeric comparison. Anything on each line, upto and including the 'sort after' pattern is ignored for the purpose of the sort. With textual comparison, the strings are compared, with numeric comparison, the strings are expected to start with a decimal, or 0x preceeded hex value, and these are compared.

N takes a textual snapshot of the current list. As the result is just text, you'll find that all you can do is view the data. This feature can be useful if you are using BE to view changing live data.

The find, include and exclude commands normally do a straight case sensitive textual comparison. The editor can be toggled in and out of Extended Regular Expression mode (as in UNIX egrep), using the @R key. When set into this mode, future finds, includes and excludes all work with extended regular expressions. eg: include (fred|bill)[0-9]+ will include all lines with 'fred' or 'bill', followed by one or more digits.

Similarly, @I can be used to toggle in and out of case sensitive search mode.

The Extended Regular Expression mode case sensitivity mode also affects the sort command. The sort command and the use of Extended Regular Expression mode go naturally hand in hand, because you often want to be able to sort upon the Nth field of each line. It is trivial to write an ERE like ,[^,]*, which matches the first pair of commas (so the sort can be done on the third field), or 0x[0-9a-f]+ which matches the first hex number.

The Extended Regular Expression mode and case sensitivity mode also affects the 'power address slide' patterns, and tag/untag all matching commands, as explained later.

The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If a memory extension providing data to BE was caching data, this type of refresh causes it to drop its cache. Sometimes BE is used with an extension to watch live real-time data, and continual refresh is desired. By pressing the periodic update key, @U, you can put BE into a mode whereby it refreshes at regular intervals. The interval is user-selectable. You exit this mode using Ctrl+Break on OS/2, Windows or DOS. Elsewhere, Esc may be used.

Tags may be placed or removed within the list on display by pressing the @T key. You may quickly move backwards or forwards between tags by pressing ^Home or ^End. Tags appear as little 'T's on the right hand side of the line. Placing or removing tags in one session or list has no effect on any others.

T and U may be used to tag or untag all lines matching a given pattern or extended regular expression.

The ! key may be used to execute an operating system command. This capability can be disabled by the -r command line flag.

@V can be used to bring up a view of a regular text file. There is no text editing capability. As special cases, F1 trys to bring up the help file, and F2 trys to bring up the configuration file.

BE doesn't just maintain a single stack of lists. In fact it maintains 10 parallel stacks, or 'sessions'. You can jump between them using the @0, @1, ... @9 keys. This allows you to be looking at several places within your data at once, and to be able to easily hop between them. The current session number is the second from last number on display on the top right corner of the screen. It is initially 1.

@C copies the stack of lists from the previous session onto the current session. Typically you use this when you've found something interesting, and you'd like to leave the current session showing the interesting data, and yet you'd also like to continue investigations around that area.

Given there are 10 sessions, each with any amount of nesting, it can be easy to get lost, so the @K allows you to generate a summary of where you are in each session.

@Z may be used to pop off all the lists in the current session, and effectively reset the nesting level to 1.

@F1 to @F4 inclusive may be used to change the colour scheme to scheme 0 to 3, as initially specified by the -c command line argument, or as initially defaulting to 0.

The keys A,O,L,I toggle the display of addresses, offsets, lengths and array indices. @A, @E, @B, @O, @D and @H may be used to set the display mode of the array indices to ASCII, EBCDIC, binary, octal, decimal or hex. Also, @Y toggles the display of addresses between raw hex, and symbol table entry and offset. The @J command toggles the display of symbolic code addresses which have the lj attribute between the short and long forms. By default, at startup, BE choses only to show array indices, the array index mode is hex, addresses are not shown symbolic, and long jumps are shown in their short form. The -v command line flag can also be used to change the startup display flags.

The | (pipe-bar) key toggles the display of pipe bars between flags in a mapping. This is typically only used when a mapping has been cleverly defined to do something like RISC instruction set disassembly, to tidy up the display.

The & (ampersand) key toggles the display of pointer values. Normally they are shown, but quite a bit of screen clutter can sometimes be removed by not showing them.

Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then pop-up a new list, decoding the memory at the given address as if it were of the specified structure type. When being prompted for the definition name, you can actually type a definition name followed by a numeric expression, in order to display an array of that many elements, each of which is a definition of the given type.

The C allows you to disassemble from a given address, assuming a disassembler extension has been supplied to BE via the -C command line argument.

D can be used to pass user-options through to the disassembler.

Initially, if a symbol table is supplied to BE, disassembly stops when the addresses symbols (as in symbol+offset) change. ie: BE stops disassembling more than one function. Although one compiled C function typically has one label, hand written assembler tends to have many labels within one function, so the Y key can toggle between stopping on label changes and ignoring them.

The @F key pops up a list of the memory sections BE is editing. There is one for each file (or memory extension invokation) currently being edited. Against each, BE says whether it has any unsaved changes.

The editor holds a list of 12 'address slide' patterns, and these may be displayed by pressing @M. These are used when the 'power address slide' feature is used. You can set one of the 12 patterns by using the ~F1 to ~F12 keys. To disable one, you specify a new pattern as a empty string.

The editor holds an 'address slide' delta value. Initially this delta value is 4, but it may be changed using the # key. When using #, dot '.' may be used in the numeric expression, and its current value is the current delta value. This delta value is used by the manual 'address slide' feature using the < and > keys, and also the 'power address slide' feature.

If you press ?, BE will prompt for an numeric expression, which it will then evaluate. You can then choose to see the result in binary, octal, decimal, hex or symbolic forms, signed or unsigned.

$ is similar, except it will prompt for a variable name first. It will set the variable to the result of evaluating the expression. If the variable is already set, its value is changed. If the expression is empty, the variable is unset.

When you use the ^L key, you are prompted for a count and a keystroke. BE presses the keystroke on the current line, and then steps down a line. It does this once for each of the count of lines you specified. The count value can be 0 or blank, meaning upto the end of the list on display. This keypress, step down and repeat loop, will stop if the keypress is not 'understood' by the line it is pressed on. This means that only keypresses which operate on a given line are sensible for using with ^L. It will also stop if the end of the list is reached.

^K toggles the keep-going-on-error flag. This flag is initially false, causing ^L to stop if a line doesn't understand the keystroke. However, when true, ^L simply advances to the next line.

@G can be used to go to the Nth line on display. 0 means the first line, a blank line number, or a very large number means the very last line.

Normally BE will only show at most 4096 lines of data, ie: elements of an array or elements of a linked-list. ^ can be used to change this number to anywhere in the range 256 to 65536. Use with caution, BE can get much slower when dealing with longer lists.

^U and ^V can be used to cycle the memory sections around one way or the other. When you have multiple files/memory sections covering the same address range, this controls which one a memory reference will hit first.

When viewing data

At any given time you may be displaying some data from some start address, as indicated on the title at the top of the screen.

The . key can be used to change the current address, and the , key can be used to add to the current address.

The editor provides a feature known as 'address sliding'.

You can use the ( and ) keys to step (slide) the address backwards or forwards by 1.

You can also use the < and > keys to step (slide) the address backwards or forwards by a particular delta (as setup by the # key, described above).

The 'power address slide' feature is the combination of regular 'address sliding' with a pattern match capability. You set up the power address slide patterns and then press [ or ] (for a backwards or forwards search). You then state whether one, all, or all-in-order of the patterns must match, and how to refresh the screen as the search proceeds. You're also prompted for an address to stop at. BE then slides through memory, checking to see whether the patterns can be matched with the screen, and if so it stops.

A 'power address slide' may be interrupted via Ctrl+Break (OS/2, Windows or DOS), or Esc (elsewhere).

There are a few main uses of address sliding :-

You know the rough address at which a particular structure is, so you use the keys to step through memory until the display changes from a structure that looks obviously wrong, to one which looks possibly right.
You wish to browse memory hex style, perhaps by using the DD definition in the default initialisation file. You set the delta value to be a page worth of data, and then use the < and > keys to page up and down.
You have an array of a large number of elements, each of which is a structure defininition. You display the first one, and then set the delta to be the size of the element. Then < and > can be used to rapidly step from element to element.
You use the power address sliding feature to locate a structure in the file or memory space.

The justification for the default delta of 4 is that many structures within processor memory spaces or within files are 4 byte aligned.

The @ command described earlier works a little better when you are viewing data, because a dot used in the numeric address expression is taken to mean the current address (as shown on the title).

Similarly, the C command described earlier works a little better when you are viewing data, because a dot used in the numeric address expression is taken to mean the current address (as shown on the title).

Often you may find yourself looking at a definition that is actually a member of a larger definition. If you know the offset of the smaller definiton in the larger definition, you can subtract this from the current address and display the larger parent definition. This can be awkward, so the @P key will pop-up a list of all possible parent definitions, with an entry for every time the smaller definition appears in another definition.

Manipulating the current datum

g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.

s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.

A subset of the keys a/e/b/o/d/h/k/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, decode seconds since epoch, symbolic or via a mapping table.

z is displayed if you are allowed to toggle the 'stop displaying when a nul terminator is found' attribute.

The t will decode the current field as if it were raw ASCII text, and will break it up into lines upon CR, LF, CR-LF pair, or NUL boundarys. The new line-by-line list pops-up on top the current list.

If the datum is a code address (marked with the code attribute in the initialisation file), then c can disassemble the code at that address.

+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.

Increasing the level of display can make BE open up an array, and enumerate the elements. eg: 3 n32 to [123,123,456].

Increasing the level of display can also make BE open up a definition, and display the fields. eg: VAR to {"name",123}.

This is capable of opening up the datastructure pointed to by a pointer, providing the pointer may be fetched and followed.

Some examples :-

level 0 (=type) level 1 level 2 level 3

n32 7 7 7

3 n32 3 n32 [8,9,10] [8,9,10]

VAR VAR {"a",1} {"a",1}

2 VAR 2 VAR [VAR,VAR] [{"b",2},{"c",3}]

n16 ptr VAR 22->VAR 22->{"d",4} 22->{"d",4}

2 n8 ptr VAR 2 n8 ptr VAR [33->VAR,44->VAR] [33->{"e",5},44->{"f",6}]

level 0 (=type)	level 1	level 2	level 3
`n32`	`7`	`7`	`7`
`3 n32`	`3 n32`	`[8,9,10]`	`[8,9,10]`
`VAR`	`VAR`	`{"a",1}`	`{"a",1}`
`2 VAR`	`2 VAR`	`[VAR,VAR]`	`[{"b",2},{"c",3}]`
`n16 ptr VAR`	`22->VAR`	`22->{"d",4}`	`22->{"d",4}`
`2 n8 ptr VAR`	`2 n8 ptr VAR`	`[33->VAR,44->VAR]`	`[33->{"e",5},44->{"f",6}]`

Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. This results in a new list of fields or array elements being popped-up. The Esc key brings you back to where you are now.

There is a shorthand of the above @ command. If you are on a numeric field, and you know this is an absolute pointer to a structure definition, you can use the follow pointer key *. BE will then prompt for the definition name. This shortcut ignores any pointer information that may be deduce-able from the value on display, so even if you are looking at a relative pointer which is aligned, BE will decode a definition at an absolute address.

Another handy command is P. If you press this when on a numeric field, it allows you set or change what datatype the value points to. This is great for when you've forgot to put something in the BE initialisation file.

The editor provides the @L key, which makes the job of following long linked lists especially easy. If you looking at the members of a definition, and are on a member which is in fact a pointer to the same type of definition, then you can use the @L (show list) key. You will be presented with the elements in the linked list (at least the first 4096 by default), and at the end the reason the link following ended. This reason can be that there are too many to show at once, 'can't fetch value', 'can't follow null pointer', or the list has 'looped back' to an element shown earlier. If your list is really long, you can always go to the last linked list element on display, select it, and then use the @L key again to get the next 4096 elements!

The = key may be used to edit the current field on display.

If the current field is a numeric value, then you can type a new expression, according to the rules for numbers and expressions used when parsing the initialisation file. Dot '.' evaluates to the fields current numeric value. Examples include :-

  1
  1+2
  addr "symbol"
  sizeof RGBTRIPLE
  map FF_ "FF_Split" | 0x20

If the current field is displayed via a mapping table, then the M key can be used to bring up a list of the maplets, and whether each of them can be decoded from the numeric value. The current fields value can be edited from this new list. Esc quits the maplet list.

If the current field is a buffer, then either ASCII data or raw hex bytes may be supplied :-

  "a string within quotes"
  @1234FF00

If the zterm attribute is applicable to the current field, then after the data is stored, a NUL terminator is appended.

The @S key toggles the suppress attribute of the current datum. This affects how the current structure shall be displayed, when displayed in short. The @N key unconditionally sets the suppress attribute of the current datum. Only non-suppressed fields are shown in the one line summary.

The v key can be used to disable (or re-enable) a fields validity check. Validity checks can act as a form of 'suppression' when viewing definitions 'one to a line'. This keystroke can help cancel that effect (if desired). The V key allows you to set/change the validity check on a field.

w can be used to set the field width. Normally fields are shown is as many characters as are necessary. This corresponds to a field width value of 0. When non-0, fields are padded or truncated to the indicated width.

Del and Ins can be used to copy and paste between the current datum and a memory clipboard or file. To use the memory clipboard, simply specify a blank filename when prompted. Only smallish blocks of data (<=4MB) can be copied or pasted. The amount of data transferred is always the minimum of the datum size, the clipboard size and 4MB.

The external edit key, E, works by prompting you for an editor command. It then saves away the current datum into a temporary file and invokes the editor on it. Afterwards, the file contents are re-read. At most 4MB can be processed in this way. This might be useful if a file contained a chunk of free-flow text, and you wished to perform some complicated editing on it, involving inserting and deleting - you could externally edit that chunk using a text editor. Or, sometimes when editing binary data, you might like to see it in a typical hex dump and edit raw hex - you can externally edit with a normal hex editor. This command doesn't work if BE is running in restricted mode, ie: has been invoked with the -r command line argument.

Z will zero the current datum. Only datums of 4MB or smaller can be zeroed.

When on a maplet list

Each possible maplet in the mapping is displayed in the list. Each maplet has a mask and value, and the maplet is deemed to match if :-

  value & maplet.mask = maplet.value

In this case a 1 is displayed next to it, otherwise a 0 is shown.

If you press 0 then the value is anded with the complement of the mask.

If you press 1 then the value is anded with the complement of the mask, and then the value is or-ed in.

Although this may seem strange, the net effect is that when maps are being used for enumerations, 1 will change the value from whatever it was before to the new desired value.

When the mapping is used for decoding bitfields, 1 will turn on a bit and 0 will turn it off.

Examples of enumeration and bitfield style mappings :-

  map ENUMERATION              map BITFIELD
      {                            {
      "first value"  1             "lowest bit" 0x01 : 0x01
      "second value" 2             "next bit"   0x02 : 0x02
      "third value"  3             "high bit"   0x80 : 0x80
      }                            }

The @S and @N keys toggle or set the maplet suppress attribute. Suppressed maplets are ignored when converting numbers to textual display.

When on a line of code

If the current line of code references another routine or code code address, c can be used to pop up another list of the referenced routine.

Similarly, if data is referenced, and the address is easily determinable by the disassembler, the * can be used to follow a pointer and display a structure at that address.

When on a memory section

W can be used to write back any unsaved changes on the current memory section. This isn't normally necessary, as when you leave BE using q or @X, you are prompted as to whether you wish to save any unsaved changes on a memory section by memory section basis.

o can be used to pass an user supplied option string to the memory extension peice of code providing the memory section. The memory extension is given the memory section instance and the option string. It can parse the option string in any way it sees fit. If there is a syntax error, or other problem, it can fail the options command with an error message to say why. If a memory section is provided from a file, this command will fail (files have no options). This user-exit mechanism might be used to allow you to tell a memory extension to change how much caching it can do.

When on a power address slide pattern

These are shown in the list brought up by the @M key, as described earlier.

It is a list of 12 entries, each of which may be disabled, a pattern or an Extended Regular Expression.

You can set one of the entries using the = key. This is the same as using ~F1 to ~F12.

Many of the keystrokes listed above were chosen so as to match the default key bindings of Andys Source Code Folding Editor (AE).

Although OS/2, Windows, NetWare, AIX, Linux and DOS machines are able to support Alt keys, not all UNIXes are. In fact Alt key support for UNIX can vary depending upon terminal types. Therefore UNIX versions of BE provides a 'feature' whereby Esc quickly followed by a key is equivelent to pressing Alt and the key together.

Symbol table files

As BE is often used for viewing memory dumps from embedded programs, support for symbol tables is highly desirable. Although BE technically need only support one format, it actually supports a few of the more commonly used formats to avoid a proliferation of symbol file conversion programs.

ARM linker .sym

The arm symbol format is the default. Each non-blank line in the symbol file has the symbol name, followed by a number of spaces, followed by the address specified in hex (without an 0x prefix). Additional information is sometimes present on the end of the line (particularly if overlays are used), but this is ignored.

Linux Kernel symbols

On a Linux computer, the 'proc' filesystem provides a special file called /proc/ksyms. Each line of this file has an address in hex (without an 0x prefix), followed by a space, followed by the symbol name.

This is the ksyms symbol table format.

eg:

  be -Y ksyms -y /proc/ksyms kernel.dat
    -- assuming kernel.dat is a dump of the kernels memory

Note that sometimes the address and symbol are followed by more information. This additional information is ignored.

Linux has a symbol versioning convention whereby it can append a suffix to each symbol. The suffix varies depending upon the type of Linux kernel in use, ie: whether it is SMP or not, or compiled in '2GB mode' or not. BE has the following symbol formats, which strip the indicated suffix of each symbol as it is read :-

BE symbol format what suffix is stripped
_ksyms_R _Rhhhhhhhh
_ksyms_Rsmp_ _Rsmp_hhhhhhhh
_ksyms_R2gig_ _R2gig_hhhhhhhh
_ksyms_Rsmp2gig_ _Rsmp2gig_hhhhhhhh
In the above hhhhhhhh are lower case hex digits, which contain the versioning information. BE allow 8 or 16 digits in the versioning information.

BE symbol format	what suffix is stripped
_ksyms_R	`_Rhhhhhhhh`
_ksyms_Rsmp_	`_Rsmp_hhhhhhhh`
_ksyms_R2gig_	`_R2gig_hhhhhhhh`
_ksyms_Rsmp2gig_	`_Rsmp2gig_hhhhhhhh`

See /usr/src/linux/Rules.make to understand where these suffixes come from.

AIX NM

The nm command on an AIX 4.1 or later machine generates output which is understood by the aix_nm symbol table format.

Typically nm is invoked with the -e argument, so that only external symbols get listed.

Each line has the symbol name, followed by a symbol type character, followed by an address and optionally followed by a length. Fields are seperated with white space. Addresses and lengths are 0x preceeded if they are listed in hex (this is caused by invoking nm with the -x flag).

BE ignores 4 byte type d data entries from the table, as these tend to refer to TOC entries.

BE also ignores machine generated symbols which start _$STATIC.

C++ symbol names are typically listed demangled, and so can contain spaces. BE has quite complicated special logic to handle this.

Note that the symbol values obtained using nm are actually offsets from the beginning of the executable. You'll need to determine where the executable is in memory or the crash dump memory image, perhaps using the AIX crash command. Assuming this base value to be 0xBBBBBBBB, you would pass the following options to BE :-

  -Y aix_nm -y symbolfile.sym@0xBBBBBBBB

It is not too difficult to write a BE memory extension which accesses AIX kernel memory space by accessing /dev/kmem. Hey presto, BE can show live datastructures within the AIX kernel!

link.exe .map

The map format corresponds the .map files written by the 16 bit DOS link.exe program.

This has a section at the beginning of the file which declares segment names, positions and sizes. BE ignores this.

Next the symbols are listed, ordered by name, and BE ignores this too.

Finally the symbols are listed again, ordered by value. BE reads this data.

Each line is of the form :-

  SSSS:OOOO SymbolName

BE enters an entry in the symbol table of value 0xSSSSOOOO for each symbol. This works well in conjunction with BEs -g command line argument.

eg:

  be -Y map -y embedded.map -g dump.dat@0xf0000
    -- assuming embedded.map is the map file from linking some embedded
    -- application, and that dump.dat is a dump of the memory starting
    -- at physical location 0xf0000

NetWare .map

NLMs and drivers can be linked using the NetWare or Watcom linkers and these can both be made to spit out a .map file.

In the .map file, symbols are listed with their offset from the start of the CODE or DATA segment. In order to know a symbols address we must load the NLM and determine its CODE and DATA segment base addresses. These base values can then be added onto the offset values in the .map file.

The bases can be determined using the built-in NetWare debugger. Enter it via the Shift+Shift+Alt+Esc sequence, use .m nlmname to get the bases, and g to resume NetWare.

The following options are provided :-

-Y nw_nw_code: Read .map file produced by NetWare linker, and extract and process CODE symbols.
-Y nw_nw_data: Read .map file produced by NetWare linker, and extract and process DATA symbols.
-Y nw_wc_code: Read .map file produced by Watcom linker, and extract and process CODE symbols.
-Y nw_wc_data: Read .map file produced by Watcom linker, and extract and process DATA symbols.

Assuming an NLM had its code based at 0xCCCCCCCC, and its data based at 0xDDDDDDDD, and it was linked with the Watcom linker, you would use the following BE options :-

  -Y nw_wc_code -y nlmname.map@0xCCCCCCCC
  -Y nw_wc_data -y nlnmame.map@0xDDDDDDDD

Notice how we process the .map twice - once to get the code symbols and to relocate them by 0xCCCCCCCC, and once to get the data symbols and to relocate them by 0xDDDDDDDD. Awkward, but it works, without having to post-process the .map file output by the linker.

The NetWare linker output has a section which looks like :-

  Publics By Address
    DATA 00005B94 Evan                                (D:\build\ham\hamdata.c)
    DATA 00005B98 deviceName                          (D:\build\ham\hamhacb.c)
    DATA 00005BA8 hamName                             (D:\build\ham\hamnlm.c)

It is this section which BE uses.

The Watcom linker output has lines in it of the form :-

  CODE:00305678  fhbf
  CODE:00045678+ symmy
  DATA:00345678* sym
  DATA:00345008s symbol

To complete the picture, all that is needed is a BE memory extension which allows BE to access the memory space of an NLM.

Memory extensions

The binary file arguments to BE are normally of the form :-

  filename[@address]

This tells BE to load the file and whenever data at a memory address from address to address+filelength is accessed, to supply the data from the file.

However, it is possible to supply binary file arguments of the form :-

  extension!args[@address]

Memory extensions may be written to provide either read-only, or read-write access to their data.

BE loads the memory extension DLL or shared library. It then passes the args and address to the memory extension, who does something of its own chosing with them. The memory extension DLL can then supply data to BE on request.

One use of the BE memory extension feature is the provision of a memory extension for handling files too massive to load into memory all at once. The memory extension opens a file handle and reads bytes demanded by BE upon request. Source for BEBIG is included in this document. The user can type :-

  be big!verybigfile.dat

It ought to be noted that the author regularly uses BE on files of several megabytes in size, without a problem. However, files of several gigabytes would present a problem!

Another use is the in live-debug of running adapter cards. The memory extension can provide data bytes directly from the memory space of the adapter. args could be used to identify the slot the adapter is in. Alternatively, args could identify IO base addresses, memory window addresses, or a device driver to use to access the data. Memory extensions which do this, do exist, and they almost turn BE into a debugger (almost, because there is no run, stop, or single step). Run, stop and single step of an adapter could be driven by the options mechanism, if that were possible and/or desired. When using these, a customised initialisation file is typically also used, which understands all the structure definitions and variables used in the firmware on the adapter.

Yet another use, might be providing BE with access to physical or virtual or process specific linear address spaces, perhaps via the use of a device driver. Shared memory windows might give addressibility of datastructures in other programs. A simple example of this is a memory extension which reads bytes from the /dev/kmem special device in the AIX or Linux environment. Using this, kernel device drivers may be debugged.

Also, the surface of a disk or block device can be made accessible via an memory extension. Again, a memory extension which does this does exist (but it uses a non-standard mechanism for accessing the disk blocks). BE could then debug and repair filesystem data.

Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.

The file bememext.h documents the extension interface. Currently extensions may be built for :-

AIX - today: BE is compiled with xlC C++. It runs on AIX 4.1.5 or later and can use the xlC C++ loadAndInit API to load the extension, named beextension, searching along the directories listed in the PATH and LIBPATH environment variables. The IBM xlC C++ compiler can be used to compile the extension. Unfortunately xlC C++ is expensive and being discontinued.
AIX - near future: BE will be compiled with g++. It will run on AIX 4.2.1 or later only, and will use dlopen to load shared libraries named beextension.so, searching along the LIBPATH environment variable. g++ will be usable to compile the extension.
Linux: The egcs C++ compiler (version 2.91.66, or later) can be used to compile the extension. BE uses the dlopen API to load the extension, which is named beextension.so. dlopen locates shared libraries by looking along the LD_LIBRARY_PATH, directories listed in the /etc/ld.so.cache file, and in the /usr/lib and /lib directories.
HP-UX: The aCC C++ compiler can be used to compile the extension. BE uses the shl_load API to load the extension, which is named beextension.sl. shl_load locates shared libraries by looking along the SHLIB_PATH. I reserve the right to switch-over-to or add-support-for using dlopen based shared library support.
SunOS: The CC C++ compiler can be used to compile the extension. BE uses the dlopen API to load the extension, which is named beextension.so. dlopen locates shared libraries by looking along the LD_LIBRARY_PATH (it may also look in other places).
Windows: The MS Visual C++ compiler (version 4.2 or 5.0, or later) can be used to compile the extension. BE uses the Win32 LoadLibrary API to load BEextension.DLL.
32 bit OS/2: The IBM VisualAge C++ compiler (version 3.0, with fixpaks CTC306, CTD302 and CTU304, or later recommended) can be used to compile the extension. BE uses the OS/2 DosLoadModule API to load BEextension.DLL, which it finds by looking along the LIBPATH environment variable.
32 bit DOS: The Watcom C/C++ compiler (version 11.0b) and the CauseWay DOS extender (version 1.3 or later) can be used to compile the extension. Despite using the Watcom compiler, I still prefer NMAKE over WMAKE. BE uses the CauseWay provided LoadLibrary API to load the extension, which is named BEextension.DLL, which it seems to find by looking in the current directory or along the PATH.
NetWare: The Watcom C/C++ compiler (version 11.0b) can be used to compile the extension. Despite using the Watcom compiler, I still prefer NMAKE over WMAKE. The DLL mechanism works by the DLL actually being an NLM named with a .ndl file extension. The DLL remains in memory for as long as BE needs it. The .ndl files must be on the search path so BE can find them.

A sample memory extension - BEBIG

BEBIG is a simple memory extension for accessing enormous (ie: upto 4GB) files. The source for it is included here primarily as a reference for writing others. Despite not implementing the full richness of the memory extension interface, it should serve well to get you writing and testing your own extensions.

The C++ source code, bebig.C, looks like :-

//
// bebig.C - BE memory extension for editing massive files
//
// This is a rather simple implementation that simply seeks
// around an open file, and gets and puts bytes.
//
// This only supports files less than 4GB in size due to the use of 32 bit
// addresses. BE with support for the 64 bit address space can use alternative
// memory extension entrypoints with _64 suffixes and which pass BEMEMADDR64
// addresses instead. However support for massive files varies from platform
// to platform, and thus would have complicated this example somewhat.
//

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#include <memory.h>

#include "bememext.h"

class BeMem
	{
	FILE *fp;
	BEMEMADDR32 base_addr, len;
	Boolean read_only;
public:
	BeMem(FILE *fp, BEMEMADDR32 base_addr, Boolean read_only)
		:
		fp(fp), base_addr(base_addr), read_only(read_only)
		{ fseek(fp, 0L, SEEK_END); len = ftell(fp); }
	~BeMem() { fclose(fp); }
	Boolean read(BEMEMADDR32 addr, unsigned char & b)
		{
		addr -= base_addr;
		if ( addr >= len )
			return FALSE;
		if ( fseek(fp, addr, SEEK_SET) != 0 )
			return FALSE;
		return fread(&b, 1, 1, fp) == 1;
		}
	Boolean write(BEMEMADDR32 addr, unsigned char b)
		{
		if ( read_only )
			return FALSE;
		addr -= base_addr;
		if ( addr >= len )
			return FALSE;
		if ( fseek(fp, addr, SEEK_SET) != 0 )
			return FALSE;
		return fwrite(&b, 1, 1, fp) == 1;
		}
	};

BEMEMEXPORT Boolean BEMEMENTRY bemem_read(
	void * ptr, BEMEMADDR32 addr, unsigned char & b
	)
	{
	BeMem *bemem = (BeMem *) ptr;
	return bemem->read(addr, b);
	}

BEMEMEXPORT Boolean BEMEMENTRY bemem_write(
	void * ptr, BEMEMADDR32 addr, unsigned char b
	)
	{
	BeMem *bemem = (BeMem *) ptr;
	return bemem->write(addr, b);
	}

BEMEMEXPORT void * BEMEMENTRY bemem_create(
	const char *args, BEMEMADDR32 addr, const char *(&err)
	)
	{
	FILE *fp;
	BeMem *bemem;
	if ( !memcmp(args, "RO:", 3) )
		{
		if ( (fp = fopen(args+3, "rb")) == 0 )
			{ err = "can't open file in read only mode"; return 0; }
		bemem = new BeMem(fp, addr, TRUE);
		}
	else
		{
		if ( (fp = fopen(args, "rb+")) == 0 )
			{ err = "can't open file in read/write mode"; return 0; }
		bemem = new BeMem(fp, addr, FALSE);
		}
	if ( bemem == 0 )
		{ fclose(fp); err = "out of memory"; return 0; }
	return (void *) bemem;
	}

BEMEMEXPORT void BEMEMENTRY bemem_delete(
	void * ptr
	)
	{
	BeMem *bemem = (BeMem *) ptr;
	delete bemem;
	}

#ifdef DOS32

// Note: Required due to the way DOS CauseWay DLLs are constructed

int main(int term) { term=term; return 0; }

#endif

#ifdef AIX

// Note: The need for this section may vanish if the AIX version of BE
// stops using loadAndInit to load shared libraries, and uses dlopen.

extern "C" {

BEMEM_EXPORT * __start(void)
	{
	static BEMEM_EXPORT exports[] =
		{
		(BEMEM_EP) bemem_read   , "bemem_read"   ,
		(BEMEM_EP) bemem_write  , "bemem_write"  ,
		(BEMEM_EP) bemem_create , "bemem_create" ,
		(BEMEM_EP) bemem_delete , "bemem_delete" ,
		(BEMEM_EP) 0            , 0
		};
	return exports;
	}

}

#endif

#ifdef NW

// Rather ugly mechanism required to make NLMs behave like DLLs under NetWare.

#include <conio.h>
#include <process.h>
#include <advanced.h>

static int tid;

extern "C" BEMEMEXPORT void BEMEMENTRY _bemem_term(void);

BEMEMEXPORT void BEMEMENTRY _bemem_term(void)
	{
	ResumeThread(tid);
	}

int main(int argc, char *argv[])
	{
	argc=argc; argv=argv; // Suppress warnings
	int nid = GetNLMID();
	SetAutoScreenDestructionMode(TRUE);
	SetNLMDontUnloadFlag(nid);
	tid = GetThreadID();
	SuspendThread(tid);
	ClearNLMDontUnloadFlag(nid);
	return 0;
	}

#endif

Yes, sure, I could use C++ streams and I could cache the data read, but this is supposed to be just a simple example.

Note that some operating systems require you to include specific bits of code in your source to make the DLL mechanism work. Thats the stuff thats #ifdefd at the end.

Under AIX, the makefile looks like the following. Note that this will change when I move from xlC++ loadAndInit style shared libraries, to g++ dlopen style :-

bebig:		bebig.o
		/usr/lpp/xlC/bin/makeC++SharedLib \
			-p 1 -n __start -o $@ bebig.o
		chmod a-x $@

bebig.o:	bebig.C bememext.h
		xlC -DUNIX -DAIX -c $*.C

Under Linux, the makefile looks like :-

bebig.so:	bebig.o
		g++ -shared -o $@ bebig.o
		chmod a-x $@

bebig.o:	bebig.C bememext.h
		g++ -DUNIX -DLINUX -fPIC -c $*.C

Under HP-UX, the makefile looks like :-

bebig.sl:	bebig.o
		aCC -b -o $@ bebig.o

bebig.o:	bebig.C bememext.h
		aCC -DUNIX -DHP +z -c $*.C

Under SunOS, the makefile looks like :-

bebig.so:	bebig.o
		CC -G -Kpic -o $@ bebig.o

bebig.o:	bebig.C bememext.h
		CC -DUNIX -DSUN -w -G -Kpic -c $*.C

Under Windows, the makefile is very similar :-

bebig.dll:	bebig.obj
		link /NOLOGO /INCREMENTAL:NO /DLL $** /OUT:$@

bebig.obj:	bebig.C bememext.h
		cl /c /DWIN32 /G4 /Gs /Oit /MT /nologo /W3 /WX /Tp $*.C

Under OS/2, using IBM Visual Age C++, a module definition file, bebig.def, is needed :-

LIBRARY BEBIG INITINSTANCE TERMINSTANCE

DATA MULTIPLE NONSHARED READWRITE

CODE PRELOAD EXECUTEREAD

EXPORTS
  bemem_create
  bemem_delete
  bemem_read
  bemem_write

Under OS/2, the makefile will typically look like :-

bebig.dll:	bebig.obj bebig.def
		ilink /NOI /NOLOGO /OUT:$@ $**

bebig.obj:	bebig.C bememext.h
		icc /C+ /W3 /Wcmp+cnd+dcl+ord+par+use+ \
			/Ge-d-m+ /Q+ /DOS2 /Tp $*.C

Under 32 bit DOS, the makefile looks like the following example. If you don't explicitly reference plib3r.lib, and the C++ code uses operator new then its multithreaded equivelent gets dragged in (which causes link problems) :-

bebig.dll:	bebig.obj
		wlink @<<
			System CWDLLR
			Name $@
			File bebig.obj
			Library %watcom%\lib386\plib3r.lib
			Option Quiet
<<

bebig.obj:	bebig.C bememext.h
		wpp386 -bt=DOS -dDOS32 -oit -4r -s -w3 -zp4 -mf -zq -fr -bd $*.C

Under NetWare, the makefile looks like the following. Again, the plib3s.lib reference is required.

bebig.ndl:	bebig.obj
		wlink @<<
			Format Novell NLM '$@'
			Name $@
			Option Quiet
			Option Map
			Option ScreenName 'System Console'
			Option ThreadName '$@'
			Debug Novell
			Module clib, mathlib
			File bebig.obj
			Library $(WATCOM)\lib386\plib3s.lib
			Library $(WATCOM)\lib386\math387s.lib
			Library $(WATCOM)\lib386\noemu387.lib
			Library $(WATCOM)\lib386\netware\clib3s.lib
			Import @$(WATCOM)\novi\clib.imp
			Export _bemem_term
			Export bemem_read, bemem_write
			Export bemem_create, bemem_delete
<<

bebig.obj:	bebig.C bememext.h
		wpp386 /s /fpi87 /mf /zfp /zgp /zl /zq /4s /fpd /wx /bt=NETWARE /DNW $*.C

Despite BE being compiled multi-threaded on 32 bit OS/2 and Win32, its not compiled that way for 32 bit DOS, AIX, Linux, HP-UX, SunOS and NetWare. One day BE for AIX, Linux, HP-UX and SunOS may be multi-threaded and thus the makefiles for making BE memory extensions may need appropriate modifications. Even though BE only uses one thread, compiling multi-threaded gives the memory extension writer the flexibility to write code which tries to read data in the background in advance of it being needed.

Disassembler extensions

The -C dx command line argument is a way of telling BE to load and use a disassembler extension for displaying any code in the data.

The same rules for naming and locating disassembly extensions apply, as for memory extensions.

eg: If you have an Intel 8086 disassembler, you could type :-

  be -C i86 dump.ram

The file bedisext.h documents the extension interface.

Disassembler extensions are compiled and linked in exactly the same way as memory extensions (see example above), although they obviously provide different entrypoints.

Flushing

When editing files, changes to the data are recorded in memory. When BE is closed down, it attempts to write back any changes back into the disk files where the data originally came from. BE will prompt you as to whether to save the changes back to disk.

If a memory extension is providing the data to BE for display, and the memory extension supports modification of the data, it has a choice :-

it can make modifications immediately.
it can accumulate modifications, and 'make them good' in response to the 'flush' request that the extension will receive when BE shuts down, if you say you would like your changes saved.

As most memory extensions provide a live view of some real-time data, they tend to opt for the first choice.

Wish list

Over the years a number of requests have popped up, some of which may actually get implemented, time allowing :-

Key binding / macro.
Save/restore session positions.
Unicode, UTF-8, DBCS, and other wide character variants.
Support for assorted floating point types.
Ability to obtain information regarding positions of fields in defns.

Note that a significant number of the existing BE features have arisen from user requests. No promises though - my free time is very scarse...

Installation

The latest version of the full BE package is most easily obtainable over the Internet via the links on my home page :-

http://www.interalpha.net/customer/nyangau/

A smaller package is available from the Hobbes FTP site, which only includes the PC versions.

Installing BE for UNIX

Copy the be_aix4, be_linux, be_hpux or be_sun executable to somewhere like /usr/bin, /usr/local/bin or ~/bin, or wherever on the path you consider appropriate, and rename it to be.
Either copy the be.ini to the same directory as be so it can be found, or copy it to .berc in your home directory. BE uses your local initialisation file in preference to the common one.
Optionally copy be.hlp to the same directory as be so it can be found.
Optionally copy be.htm to wherever you keep documentation.

On AIX, best keyboard and colour support is obtained by using an aixterm, or by logging in from OS/2 using HFTTERM.EXE. It should be noted that HFTTERM.EXE appears to have a bug whereby it doesn't generate the correct datastream for the @9 and @0 keystrokes.

On Linux, best colour and keyboard support is found using the regular linux terminal. On the RedHat distribution, the xterm terminfo entry may not include support for colour, and you may have to set the TERM environment variable to be xterm-color.

BE for Linux is now compiled on a RedHat 6.1 system with glibc using egcs-2.91.66. This effectively rules out running it on earlier libc5 based Linux systems.

On HP-UX, I get best keyboard and screen support when using an xterm or a vt100. I can even get colour support if I roll my own vt100-color by taking the default vt100 terminfo definition, adding support for the ANSI colour escape sequences (and op and bce capabilities), and using tic_colr to compile the new definition.

On SunOS, I get best keyboard and screen support when using an xterm or a vt100. I can even get colour support if I roll my own vt100-color by taking the default vt100 terminfo definition, adding support for the ANSI colour escape sequences (and op and bce capabilities).

For an easy way to enhance your terminal support to support colour, see the TERMINFO package on my home page.

Installing BE for Windows

Copy be_win.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Optionally copy be.hlp to the same directory as be.exe so it can be found.
Optionally copy be.htm to wherever you keep documentation.

BE is a Win32 application, which has had extensive testing on Windows NT. Rather less testing has been performed with Windows 95, and quite a few bugs in the Windows 95 version of the Win32 Console API (used for screen redraw) have been identified and worked around. Some oddities relating to the use of the unusual screen sizes still remain. I would not be surprised if there are more problems to be found...

Installing BE for 32 bit OS/2

Copy be_os2.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Optionally copy be.hlp to the same directory as be.exe so it can be found.
Optionally copy be.htm to wherever you keep documentation.
Optionally copy be.ico to the same directory as be.exe. This allows BE to have a cute icon when running in the Workplace shell.
Optionally create a Workplace Shell Program Object(s) that references the BE executable. The working directory should be the directory where be.ini can be found.

Installing BE for 32 bit DOS

Copy be_dos32.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Optionally copy be.hlp to the same directory as be.exe so it can be found.
Optionally copy be.htm to wherever you keep documentation.

Obviously, because BE for DOS is a 32 bit program, which uses a DOS extender, the machine upon which you run it must have a 32 bit processor.

Installing BE for NetWare

Copy be.nlm to somewhere on the search path.
Copy be.ini to the same directory as be.nlm.
Optionally copy be.hlp to the same directory as be.nlm.
Optionally copy be.htm to wherever you keep documentation.

Unfortunately I don't have continual access to all the platforms, so the latest improvements in one version may not yet be reflected into the others.

Glossary

Address slide.: When displaying data, BE is showing a particular definition at a given address. This address is shown on the title. Address sliding is a mechanism whereby this address may be advanced forwards or backwards.
Alignment.: Certain data items are required to exist at addresses which are multiples of 2, 4 or other numbers. This is often because certain processor architectures run slower accessing mis-aligned data, or are unable to do so.
Caching.: Caching is the practice of keeping a local copy of (less easily accessible) data, for speedier access. For example, when BE uses a memory extension as a means of editing some data not in a file, the memory extension may cache some of the data in memory. If the user does a full refresh (using the R key), this cached data is discarded, so any data which is subsequently displayed definitely comes from the actual data, rather than the cached copy. Also, when the user uses BE to modify data, the data in the cache may be updated, and the real data may not immediately be updated. If the user flushes the data, any pending changes (in the cache) are written back into to the real data.
Current offset.: As a definition is being defined, the current offset indicates the byte offset within it that the next field will be placed. Typically in a C structure, each field immediately follows the previous field (subject to alignment restrictions). In a C union, all the fields can overlay each other, sharing the same offset. BEs definitions are flexible enough to handle all these cases.
Data display attributes.: Each data field on display has some data display attributes which govern the way in which the fields data is fetched from memory (ie: the endianness), and the way it is displayed.
Definition.: A definition is like a C structure or union definition. It is made up of a number of fields. A definition is defined via the def keyword in the initialisation file.
Disassembler extension.: A BE disassembler extension is a peice of (possibly user written) code which BE can call upon to disassemble raw bytes of data into some instruction set. Typically disassembler extensions exist as DLLs or shared libraries.
Endianness.: Multibyte numeric values can be stored within the data with the most or least significant byte first or last. If the least significant byte is first, then the data is typically referred to as little endian, or in the Intel byte order. If the most significant byte is last, then the data is typically referred to as big endian, or in the Motorola byte order.
Expression.: Typically refers to a numeric expression, such as 1+2*3. Wherever BE prompts for a number, any numeric expression may be used. Basic arithmetic is supported, along with symbol table lookup and support for mapping. See the section on numbers for more details.
Extended Regular Expression.: This is a powerful form of a search pattern, which allows for searching for several alternatives at once, zero or one occurance of a pattern, or one or more, or zero or more, and character classes.
Field.: A number of fields together form a definition. Fields in a definition can be made to overlay each other or not, thus acheiving the effect of C structures or unions. It is possible to tell BE to display the fields in a variety of ways, via the use of data display attributes.
Flushing.: BE may hold data in a memory cache for speed of access, and may choose to 'make the changes good' in response to a flush command. The @W key will try to flush any cached data. The W key can be used to flush cached data from a single memory section. BE prompts you as to whether you wish to flush any unsaved changes before exiting.
Initialisation file.: When BE runs it locates and processes an initialisation file which includes within it all the definitions of all the structures within the data.
Level of detail.: When displaying a field, BE displays it to a specific level of detail. This level of detail may be adjusted using the + and - keys. Increasing the level of detail can result in the fields of definitions being displayed, or pointers being followed and the fields in the 'pointed-to' definitions being displayed, or elements of an array being shown.
Long jump.: The ARM instruction set only includes a branch instruction which can only jump a certain distance forwards or backwards in memory. The ARM C compiler typically generates code which uses this branch instruction. To branch long distances, a trick can be done whereby the normal branch is made to branch to an instruction which loads the instruction pointer from the word of memory immediately following. This trick means that the mapping of addresses to function names using the symbol table doesn't work properly. By using the long jump data display attribute, BE is told to take this mechanism into account, when displaying code addresses symbolically. The lj and nolj keywords are used for this purpose.
Map.: The map keyword in the initialisation file defines a mapping between numbers and strings. Essentially it is a way of mapping numbers back to more a readable enumerated type form. The map MAPNAME "MAPLETSTRING" syntax may be used in any expression in the initialisation file or at any time BE prompts you for a number, and it evaluates the the numeric equivelent of the enumerated type named value. Data displayed via mapping tables can be edited via the M key.
Memory section.: In any given invokation of BE a number of filename arguments are specified, and each of these constitutes a memory section, because the data from the file covers a section of the memory space. BE can also edit data, where the data is provided to BE via a memory extension, invoked with some parameters.
Memory space.: Every byte of data BE can edit is presented to BE at an address in the BE memory space.
Memory extension.: A BE memory extension is a peice of (possibly user written) code which provides access to the data on demand. Typically memory extensions exist as DLLs or shared libraries.
Named constant.: BE keeps a small collection of named constants. These can be created by use of the -S name=val command line argument, or through the set and unset keywords in the initialisation file.
Null pointer.: Is a pointer whose numeric value indicates that the pointer doesn't actually point to another data item at this time. Typically the numeric value 0 is used to represent this. BE has a data display attribute which indicates whether the numeric value 0 represents a null pointer. The keywords nullptr and nonullptr are used. When the user presses Enter on a pointer value, BE pops up the data in the 'pointed to' definition, unless the value is 0, and null-pointer attribute is present.
Parent definition(s): Often definitions include other definitions. Thus any given definition will have 0 or more parent definitions which include it. When displaying a definition @P will pop-up a list of all those definitions which use the current definition on display.
Pointer.: A pointer is typically a numeric value which somehow gives the address of another definition in the data. The keyword ptr DEFN is used in a field definition to indicate that a numeric field identifies the address of another definition.
Pointer glue.: When one PowerPC function calls another function via a function pointer this function pointer is actually the address of a 12 byte 'glue block'. The caller loads the address of the glue block and calls a routine called _ptrgl. This loads the TOC register from bytes 4 to 7 in the glue block and branches to the code specified by bytes 0 to 3 in the glue block. This mechanism is much like __loadds in 16 bit Intel code, in that it ensures the callee can access its own global data, even if it is a seperate library or module.
Power address slide.: This is a form of address slide, whereby BE can be made to automatically address slide until certain patterns (which can be Extended Regular Expressions) appear in the decoded data.
Session.: Navigation of the data being edited starts by displaying a list of some of the data, and bringing up other lists. You effectively build up a stack of lists, and can step back to an earlier list. This stack of lists, or thread of investigation, is referred to as a session, and BE maintains 10 independent sessions, which may be switched between via @0, @1, ... @9.
Suppressing: When displaying a definition, BE normally displays all the fields. However it is possible to display all the fields of a definition in a single one line summary, by increasing the level of detail of display. In this case, only non-suppressed fields are displayed. When viewing a structure definition with one field to a line, suppressed fields are shown in round brackets. The suppress keyword may be used in the initialisation file on a field, or the @S and @N keys may be used interactively.
Symbol table: Is typically provided from a file via the -y symtab command line argument. It is a list of names (the symbols) and their values. Typically these are code or data addresses for functions or variable within an executable program. BE can use this information so it can display addresses in symbol+offset form, or so it can allow you to type addr "symbol" in an expression and have BE substitute the numeric value of the symbol.
Validity check.: Any field can have a validity check expression associated with it, either using the valid "EXPR" syntax in the initialisation file, or by pressing the V key whilst on the field. Fields with validity checks have either ++ or -- shown next to them, depending upon whether the check passes. Fields failing their validity check are suppressed when viewing a structure definition in single line summary form. This feature can be used to effectively give conditional decode.
Zero terminator.: When strings are stored in memory or in files, often a 0 byte is appended to indicate the end of the string. BE can be told to stop displaying string data (or not) when it hits a 0 byte via the 'stop at zero terminator' data display attribute, specified using the zterm or nozterm keywords in the initialisation file.

Copying

Copying of this program is encouraged, as it is fully public domain. The source code is not publically available. Caveat Emptor.

This documentation is written and maintained by the Binary Editor author, Andy Key

nyangau@interalpha.co.uk

`+`, `-`, `~`, `!`	unary plus, unary minus, complement, not
`*`, `/`, `%`	multiply, divide, modulo
`+`, `-`	add (plus), subtract (minus)
`<<`, `>>`, `>>>`	shift left, shift right (signed), shift right (unsigned) [Note 1]
`>`, `<`, `>=`, `<=`	greater than, less than, greater than or equal, less than or equal
`==`, `!=`	equal, not equal
`&`	bitwise AND
`^`	bitwise exclusive OR
`\|`	bitwise inclusive OR
`&&`	logical AND
`^^`	logical exclusive OR [Note 2]
`\|\|`	logical inclusive OR
`? :`	conditional expression