Before diving into the internals, you should understand the formal requirements and other expectations for GDB. Although some of these may seem obvious, there have been proposals for GDB that have run counter to these requirements.
First of all, GDB is a debugger. It's not designed to be a front panel for embedded systems. It's not a text editor. It's not a shell. It's not a programming environment.
GDB is an interactive tool. Although a batch mode is available, GDB's primary role is to interact with a human programmer.
GDB should be responsive to the user. A programmer hot on the trail of a nasty bug, and operating under a looming deadline, is going to be very impatient of everything, including the response time to debugger commands.
GDB should be relatively permissive, such as for expressions. While the compiler should be picky (or have the option to be made picky), since source code usually lives for a long time, the programmer doing debugging shouldn't be spending time figuring out how to mollify the debugger.
GDB will be called upon to deal with really large programs. Executable sizes of 50 to 100 megabytes occur regularly, and we've heard reports of programs approaching 1 gigabyte in size.
GDB should be able to run everywhere. No other debugger is available for even half as many configurations as GDB supports.
GDB consists of three major subsystems: user interface, symbol handling (the "symbol side"), and target system handling (the "target side").
The user interface consists of several actual interfaces, plus supporting code.
The symbol side consists of object file readers, debugging info interpreters, symbol table management, source language expression parsing, type and value printing.
The target side consists of execution control, stack frame analysis, and physical target manipulation.
The target side/symbol side division is not formal, and there are a number of exceptions. For instance, core file support involves symbolic elements (the basic core file reader is in BFD) and target elements (it supplies the contents of memory and the values of registers). Instead, this division is useful for understanding how the minor subsystems should fit together.
The symbolic side of GDB can be thought of as "everything you can do in GDB without having a live program running". For instance, you can look at the types of variables, and evaluate many kinds of expressions.
The target side of GDB is the "bits and bytes manipulator". Although it may make reference to symbolic info here and there, most of the target side will run with only a stripped executable available -- or even no executable at all, in remote debugging cases.
Operations such as disassembly, stack frame crawls, and register display, are able to work with no symbolic info at all. In some cases, such as disassembly, GDB will use symbolic info to present addresses relative to symbols rather than as raw numbers, but it will work either way.
Host refers to attributes of the system where GDB runs. Target refers to the system where the program being debugged executes. In most cases they are the same machine, in which case a third type of attribute, Native, comes into play.
Defines and include files needed to build on the host are host support. Examples are tty support, system defined types, host byte order, host float format.
Defines and information needed to handle the target format are target dependent. Examples are the stack frame format, instruction set, breakpoint instruction, registers, and how to set up and tear down the stack to call a function.
Information that is only needed when the host and target are the same is native dependent. One example is Unix child process support; if the host and target are not the same, doing a fork to start the target process is a bad idea. The various macros needed for finding the registers in the upage, running ptrace, and such are all in the native-dependent files.
Another example of native-dependent code is support for features that are really part of the target environment, but which require #include files that are only available on the host system. Core file handling and setjmp handling are two common cases.
When you want to make GDB work "native" on a particular machine, you have to include all three kinds of information.
GDB uses a number of debugging-specific algorithms. They are often not very complicated, but get lost in the thicket of special cases and real-world issues. This chapter describes the basic algorithms and mentions some of the specific target definitions that they use.
A frame is a construct that GDB uses to keep track of calling and called functions.
FRAME_FP in the machine description has no meaning to the machine-independent part of GDB, except that it is used when setting up a new frame from scratch, as follows:
create_new_frame (read_register (FP_REGNUM), read_pc ());
Other than that, all the meaning imparted to FP_REGNUM is imparted by the machine-dependent code. So, FP_REGNUM can have any value that is convenient for the code that creates new frames. (create_new_frame calls INIT_EXTRA_FRAME_INFO if it is defined; that is where you should use the FP_REGNUM value, if your frames are nonstandard.)
Given a GDB frame, define FRAME_CHAIN to determine the address of the calling function's frame. This will be used to create a new GDB frame struct, and then INIT_EXTRA_FRAME_INFO and INIT_FRAME_PC will be called for the new frame.
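The idea of following a frame chain can be sketched with a toy model. This is not GDB's code: the word-addressed memory array, the layout in which the word at a frame's FP holds the caller's FP, the use of 0 to mark the outermost frame, and every `toy_` name are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy target memory: word-addressed, each slot holds one "address".
   Assumed layout: the word at a frame's FP holds the caller's FP,
   and an FP of 0 marks the outermost frame.  */
#define TOY_MEM_WORDS 16
unsigned toy_memory[TOY_MEM_WORDS];

/* Plays the role that FRAME_CHAIN plays for a real target: given one
   frame's FP, return the calling function's FP (0 if there is none).  */
unsigned
toy_frame_chain (unsigned fp)
{
  assert (fp < TOY_MEM_WORDS);
  return toy_memory[fp];
}

/* Walk outward from the innermost frame, storing each FP in OUT.
   Returns the number of frames found (at most MAX).  */
size_t
toy_backtrace (unsigned innermost_fp, unsigned *out, size_t max)
{
  size_t n = 0;
  unsigned fp = innermost_fp;

  while (fp != 0 && n < max)
    {
      out[n++] = fp;
      fp = toy_frame_chain (fp);
    }
  return n;
}
```

A real target's chain function would read the saved frame pointer from the inferior's memory and would need its own rule for recognizing the outermost frame.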
In general, a breakpoint is a user-designated location in the program where the user wants to regain control if program execution ever reaches that location.
There are two main ways to implement breakpoints; either as "hardware" breakpoints or as "software" breakpoints.
Hardware breakpoints are sometimes available as a built-in debugging feature with some chips. Typically these work by having a dedicated register into which the breakpoint address may be stored. If the PC ever matches a value in a breakpoint register, the CPU raises an exception and reports it to GDB. Another possibility is when an emulator is in use; many emulators include circuitry that watches the address lines coming out from the processor, and forces it to stop if the address matches a breakpoint's address. A third possibility is that the target already has the ability to do breakpoints somehow; for instance, a ROM monitor may do its own software breakpoints. So although these are not literally "hardware breakpoints", from GDB's point of view they work the same; GDB need do nothing more than set the breakpoint and wait for something to happen.
Since they depend on hardware resources, hardware breakpoints may be limited in number; when the user asks for more, GDB will start trying to set software breakpoints.
Software breakpoints require GDB to do somewhat more work. The basic theory is that GDB will replace a program instruction with a trap, illegal divide, or some other instruction that will cause an exception, and then when it's encountered, GDB will take the exception and stop the program. When the user says to continue, GDB will restore the original instruction, single-step, re-insert the trap, and continue on.
Since it literally overwrites the program being tested, the program area must be writeable, so this technique won't work on programs in ROM. It can also distort the behavior of programs that examine themselves, although the situation would be highly unusual.
Also, the software breakpoint instruction should be the smallest size of instruction, so it doesn't overwrite an instruction that might be a jump target, and cause disaster when the program jumps into the middle of the breakpoint instruction. (Strictly speaking, the breakpoint must be no larger than the smallest interval between instructions that may be jump targets; perhaps there is an architecture where only even-numbered instructions may be jumped to.) Note that it's possible for an instruction set not to have any instructions usable for a software breakpoint, although in practice only the ARC has failed to define such an instruction.
The basic definition of the software breakpoint is the macro BREAKPOINT.
Basic breakpoint object handling is in `breakpoint.c'. However, much of the interesting breakpoint action is in `infrun.c'.
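The software breakpoint cycle described in this section (save the original instruction, plant a trap, then restore, step, and re-insert when continuing) can be sketched as follows. This is a minimal illustration over a plain byte array, not the code in `breakpoint.c' or `infrun.c'; the one-byte `TOY_TRAP` opcode, the struct layout, and all `toy_` names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed one-byte trap opcode; a real target supplies its own
   encoding through its breakpoint definition.  */
#define TOY_TRAP 0xCC

struct toy_breakpoint
{
  size_t addr;            /* where the trap is planted */
  unsigned char shadow;   /* original byte saved at insertion */
  int inserted;
};

/* Save the original instruction byte and overwrite it with the trap.  */
void
toy_insert_breakpoint (unsigned char *mem, size_t len,
                       struct toy_breakpoint *bp)
{
  assert (bp->addr < len && !bp->inserted);
  bp->shadow = mem[bp->addr];
  mem[bp->addr] = TOY_TRAP;
  bp->inserted = 1;
}

/* Put the original instruction byte back.  */
void
toy_remove_breakpoint (unsigned char *mem, size_t len,
                       struct toy_breakpoint *bp)
{
  assert (bp->addr < len && bp->inserted);
  mem[bp->addr] = bp->shadow;
  bp->inserted = 0;
}

/* Continue from a breakpoint: restore, "single-step", re-insert.
   Returns the opcode the step would execute, to show that the CPU
   sees the original instruction rather than the trap.  */
unsigned char
toy_step_over (unsigned char *mem, size_t len, struct toy_breakpoint *bp)
{
  unsigned char executed;

  toy_remove_breakpoint (mem, len, bp);
  executed = mem[bp->addr];   /* stand-in for a real single-step */
  toy_insert_breakpoint (mem, len, bp);
  return executed;
}
```

The sketch also shows why the program area must be writable: both insertion and removal store directly into the program's own instruction bytes.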
GDB has support for figuring out that the target is doing a longjmp and for stopping at the target of the jump, if we are stepping. This is done with a few specialized internal breakpoints, which are visible in the maint info breakpoint command.
To make this work, you need to define a macro called GET_LONGJMP_TARGET, which will examine the jmp_buf structure and extract the longjmp target address. Since jmp_buf is target specific, you will need to define it in the appropriate `tm-xyz.h' file. Look in `tm-sun4os4.h' and `sparc-tdep.c' for examples of how to do this.
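As a rough sketch of what such an extraction involves, the function below pulls a saved PC out of a copy of a jmp_buf. The layout is invented: the assumption that the buffer is an array of words with the PC in slot `TOY_JB_PC`, the function name, and its success/failure convention are all hypothetical and do not describe any real target or the actual macro's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the target's jmp_buf is an array of words and the
   saved PC lives in slot TOY_JB_PC.  Real layouts are target specific.  */
#define TOY_JB_WORDS 8
#define TOY_JB_PC    5

/* Extract the longjmp target from a copy of the target's jmp_buf.
   Stores the address through PC and returns 1 on success, or 0 if the
   buffer is too short to contain the PC slot.  */
int
toy_get_longjmp_target (const unsigned long *jmp_buf_words, size_t nwords,
                        unsigned long *pc)
{
  assert (jmp_buf_words != NULL && pc != NULL);
  if (nwords <= TOY_JB_PC)
    return 0;
  *pc = jmp_buf_words[TOY_JB_PC];
  return 1;
}
```

In a real port the buffer contents would first have to be read from the inferior's memory, and the slot index and word size would come from the target's C library.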
GDB has several user interfaces. Although the command-line interface is the most common and most familiar, there are others.
The command interpreter in GDB is fairly simple. It is designed to allow for the set of commands to be augmented dynamically, and also has a recursive subcommand capability, where the first argument to a command may itself direct a lookup on a different command list.
For instance, the set command just starts a lookup on the setlist command list, while set thread recurses to the set_thread_cmd_list.
To add commands in general, use add_cmd. add_com adds to the main command list, and should be used for those commands. The usual place to add commands is in the _initialize_xyz routines at the ends of most source files.
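The recursive subcommand lookup can be illustrated with a small table-driven sketch. It is not GDB's command interpreter: `struct toy_cmd`, `toy_lookup_cmd`, and the exact-match rule (no abbreviations) are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_cmd
{
  const char *name;
  const struct toy_cmd *sublist;   /* non-NULL for prefix commands */
  size_t sublist_len;
};

/* Look up the first word of LINE in LIST.  If it names a prefix command
   and more words follow, recurse into that command's own list.  Returns
   the most specific command found, or NULL if the first word is unknown.  */
const struct toy_cmd *
toy_lookup_cmd (const struct toy_cmd *list, size_t len, const char *line)
{
  size_t i, wlen;

  assert (list != NULL && line != NULL);
  while (*line == ' ')
    line++;
  wlen = strcspn (line, " ");
  if (wlen == 0)
    return NULL;

  for (i = 0; i < len; i++)
    if (strlen (list[i].name) == wlen
        && strncmp (list[i].name, line, wlen) == 0)
      {
        const char *rest = line + wlen;

        while (*rest == ' ')
          rest++;
        if (list[i].sublist != NULL && *rest != '\0')
          {
            const struct toy_cmd *sub
              = toy_lookup_cmd (list[i].sublist, list[i].sublist_len, rest);
            if (sub != NULL)
              return sub;
          }
        return &list[i];
      }
  return NULL;
}
```

With a main list containing a `set` entry whose sublist contains `thread`, looking up "set thread name" descends one list per word, which mirrors the way the first argument to a command can direct a lookup on a different command list.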
libgdb was an abortive project of years ago. The theory was to provide an API to GDB's functionality.
Symbols are a key part of GDB's operation. Symbols include variables, functions, and types.
GDB reads symbols from "symbol files". The usual symbol file is the file containing the program which GDB is debugging. GDB can be directed to use a different file for symbols (with the symbol-file command), and it can also read more symbols via the "add-file" and "load" commands, or while reading symbols from shared libraries.
Symbol files are initially opened by code in `symfile.c' using the BFD library. BFD identifies the type of the file by examining its header. symfile_init then uses this identification to locate a set of symbol-reading functions.
Symbol reading modules identify themselves to GDB by calling add_symtab_fns during their module initialization. The argument to add_symtab_fns is a struct sym_fns which contains the name (or name prefix) of the symbol format, the length of the prefix, and pointers to four functions. These functions are called at various times to process symbol-files whose identification matches the specified prefix.
The functions supplied by each module are:
xyz_symfile_init(struct sym_fns *sf)
Called from symbol_file_add when we are about to read a new symbol file. This function should clean up any internal state (possibly resulting from half-read previous files, for example) and prepare to read a new symbol file. Note that the symbol file which we are reading might be a new "main" symbol file, or might be a secondary symbol file whose symbols are being added to the existing symbol table.
The argument to xyz_symfile_init is a newly allocated struct sym_fns whose bfd field contains the BFD for the new symbol file being read. Its private field has been zeroed, and can be modified as desired. Typically, a struct of private information will be malloc'd, and a pointer to it will be placed in the private field.
There is no result from xyz_symfile_init, but it can call error if it detects an unavoidable problem.
xyz_new_init()
Called from symbol_file_add when discarding existing symbols. This function need only handle the symbol-reading module's internal state; the symbol table data structures visible to the rest of GDB will be discarded by symbol_file_add. It has no arguments and no result. It may be called after xyz_symfile_init, if a new symbol table is being read, or may be called alone if all symbols are simply being discarded.
xyz_symfile_read(struct sym_fns *sf, CORE_ADDR addr, int mainline)
Called from symbol_file_add to actually read the symbols from a symbol-file into a set of psymtabs or symtabs.
sf points to the struct sym_fns originally passed to xyz_symfile_init for possible initialization. addr is the offset between the file's specified start address and its true address in memory. mainline is 1 if this is the main symbol table being read, and 0 if a secondary symbol file (e.g. shared library or dynamically loaded file) is being read.
In addition, if a symbol-reading module creates psymtabs when xyz_symfile_read is called, these psymtabs will contain a pointer to a function xyz_psymtab_to_symtab, which can be called from any point in the GDB symbol-handling code.
xyz_psymtab_to_symtab (struct partial_symtab *pst)
Called from psymtab_to_symtab (or the PSYMTAB_TO_SYMTAB macro) if the psymtab has not already been read in and had its pst->symtab pointer set. The argument is the psymtab to be fleshed-out into a symtab. Upon return, pst->readin should have been set to 1, and pst->symtab should contain a pointer to the new corresponding symtab, or zero if there were no symbols in that part of the symbol file.
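The registration and prefix-matching scheme can be sketched as a linked list of reader descriptions. This is a simplified analogue, not the real struct sym_fns: the field names, the three function-pointer signatures, and the `toy_` functions are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified analogue of a symbol reader description: a format name
   prefix plus some of the reader's entry points.  */
struct toy_sym_fns
{
  const char *name;        /* name (or name prefix) of the format */
  size_t namelen;          /* length of the prefix to compare */
  void (*new_init) (void);
  void (*symfile_init) (void *private_data);
  void (*symfile_read) (void *private_data, unsigned long addr,
                        int mainline);
  struct toy_sym_fns *next;
};

struct toy_sym_fns *toy_symtab_fns;   /* head of the registered readers */

/* Analogue of registering a reader: push it onto the global list.  */
void
toy_add_symtab_fns (struct toy_sym_fns *sf)
{
  assert (sf != NULL && sf->name != NULL);
  sf->next = toy_symtab_fns;
  toy_symtab_fns = sf;
}

/* Find the reader whose prefix matches the format name reported for
   the file, or NULL if no registered reader claims it.  */
struct toy_sym_fns *
toy_find_sym_fns (const char *format_name)
{
  struct toy_sym_fns *sf;

  for (sf = toy_symtab_fns; sf != NULL; sf = sf->next)
    if (strncmp (format_name, sf->name, sf->namelen) == 0)
      return sf;
  return NULL;
}
```

Matching on a prefix rather than a full name lets one reader serve several variants of a format, for example a reader registered as "elf" claiming a file identified as "elf32-i386" (both strings are illustrative).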
GDB has three types of symbol tables.
This section describes partial symbol tables.
A psymtab is constructed by doing a very quick pass over an executable file's debugging information. Small amounts of information are extracted -- enough to identify which parts of the symbol table will need to be re-read and fully digested later, when the user needs the information. The speed of this pass causes GDB to start up very quickly. Later, as the detailed rereading occurs, it occurs in small pieces, at various times, and the delay therefrom is mostly invisible to the user.
The symbols that show up in a file's psymtab should be, roughly, those visible to the debugger's user when the program is not running code from that file. These include external symbols and types, static symbols and types, and enum values declared at file scope.
The psymtab also contains the range of instruction addresses that the full symbol table would represent.
The idea is that there are only two ways for the user (or much of the code in the debugger) to reference a symbol:
By its address: find_pc_function, find_pc_line, and other find_pc_... functions handle this.
By its name: lookup_symbol does most of the work here.
The only reason that psymtabs exist is to cause a symtab to be read in at the right moment. Any symbol that can be elided from a psymtab, while still causing that to happen, should not appear in it. Since psymtabs don't have the idea of scope, you can't put local symbols in them anyway. Psymtabs don't have the idea of the type of a symbol, either, so types need not appear, unless they will be referenced by name.
It is a bug for GDB to behave one way when only a psymtab has been read, and another way if the corresponding symtab has been read in. Such bugs are typically caused by a psymtab that does not contain all the visible symbols, or which has the wrong instruction address ranges.
The psymtab for a particular section of a symbol-file (objfile) could be thrown away after the symtab has been read in. The symtab should always be searched before the psymtab, so the psymtab will never be used (in a bug-free environment). Currently, psymtabs are allocated on an obstack, and all the psymbols themselves are allocated in a pair of large arrays on an obstack, so there is little to be gained by trying to free them unless you want to do a lot more work.
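The "read in the full symtab at the right moment" behavior can be sketched as a lazy expansion keyed on an address range. This is a toy model only: the structures are greatly reduced, the embedded `storage` field stands in for the expensive full read, and the `toy_` names and the half-open range convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_symtab
{
  const char *filename;
};

struct toy_psymtab
{
  const char *filename;
  unsigned long textlow, texthigh;   /* covers [textlow, texthigh) */
  int readin;                        /* nonzero once expanded */
  struct toy_symtab *symtab;         /* NULL until read in */
  struct toy_symtab storage;         /* stand-in for the full read */
  struct toy_psymtab *next;
};

int toy_full_reads;   /* counts how many expansions have happened */

/* Expand a psymtab into its full symtab, at most once.  */
struct toy_symtab *
toy_psymtab_to_symtab (struct toy_psymtab *pst)
{
  assert (pst != NULL);
  if (!pst->readin)
    {
      pst->storage.filename = pst->filename;
      pst->symtab = &pst->storage;
      pst->readin = 1;
      toy_full_reads++;
    }
  return pst->symtab;
}

/* Reference by address: find the psymtab whose range covers PC and
   make sure its full symtab has been read in.  */
struct toy_symtab *
toy_find_pc_symtab (struct toy_psymtab *list, unsigned long pc)
{
  struct toy_psymtab *pst;

  for (pst = list; pst != NULL; pst = pst->next)
    if (pc >= pst->textlow && pc < pst->texthigh)
      return toy_psymtab_to_symtab (pst);
  return NULL;
}
```

Only the psymtab whose range covers the address is expanded, and a second lookup in the same range reuses the existing symtab. A psymtab with a wrong address range would cause the lookup to miss, which is the kind of bug described above.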
Fundamental Types (e.g., FT_VOID, FT_BOOLEAN).
These are the fundamental types that GDB uses internally. Fundamental types from the various debugging formats (stabs, ELF, etc.) are mapped into one of these. They are basically a union of all fundamental types that GDB knows about for all the languages that GDB knows about.
Type Codes (e.g., TYPE_CODE_PTR, TYPE_CODE_ARRAY).
Each time GDB builds an internal type, it marks it with one of these types. The type may be a fundamental type, such as TYPE_CODE_INT, or a derived type, such as TYPE_CODE_PTR which is a pointer to another type. Typically, several FT_* types map to one TYPE_CODE_* type, and are distinguished by other members of the type struct, such as whether the type is signed or unsigned, and how many bits it uses.
Builtin Types (e.g., builtin_type_void, builtin_type_char).
These are instances of type structs that roughly correspond to fundamental types and are created as global types for GDB to use for various ugly historical reasons. We eventually want to eliminate these. Note for example that builtin_type_int initialized in gdbtypes.c is basically the same as a TYPE_CODE_INT type that is initialized in c-lang.c for an FT_INTEGER fundamental type. The difference is that the builtin_type is not associated with any particular objfile, and only one instance exists, while c-lang.c builds as many TYPE_CODE_INT types as needed, with each one associated with some particular objfile.
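How a type code combines with other members of the type struct can be sketched with a reduced structure. This is not GDB's struct type: the three-member enum, the field names, and the helper functions are assumptions, kept only to show that signed and unsigned integers share one code while a pointer type refers to another type.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type_code { TOY_CODE_INT, TOY_CODE_PTR, TOY_CODE_ARRAY };

struct toy_type
{
  enum toy_type_code code;
  unsigned length;                 /* size in bytes */
  int is_unsigned;                 /* distinguishes int from unsigned int */
  const struct toy_type *target;   /* pointee or element type, else NULL */
};

/* Build a derived type: a pointer that refers to TARGET.  */
struct toy_type
toy_make_pointer_type (const struct toy_type *target, unsigned ptr_size)
{
  struct toy_type t;

  assert (target != NULL);
  t.code = TOY_CODE_PTR;
  t.length = ptr_size;
  t.is_unsigned = 1;
  t.target = target;
  return t;
}

/* Count how many levels of pointer wrap the base type.  */
unsigned
toy_pointer_depth (const struct toy_type *t)
{
  unsigned depth = 0;

  while (t != NULL && t->code == TOY_CODE_PTR)
    {
      depth++;
      t = t->target;
    }
  return depth;
}
```

Here a signed and an unsigned 4-byte integer would both carry `TOY_CODE_INT` and differ only in `is_unsigned`, which parallels several fundamental types mapping to one type code.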
The `a.out' format is the original file format for Unix. It consists of three sections: text, data, and bss, which are for program code, initialized data, and uninitialized data, respectively.
The `a.out' format is so simple that it doesn't have any reserved place for debugging information. (Hey, the original Unix hackers used `adb', which is a machine-language debugger.) The only debugging format for `a.out' is stabs, which is encoded as a set of normal symbols with distinctive attributes.
The basic `a.out' reader is in `dbxread.c'.
The COFF format was introduced with System V Release 3 (SVR3) Unix. COFF files may have multiple sections, each prefixed by a header. The number of sections is limited.
The COFF specification includes support for debugging. Although this was a step forward, the debugging information was woefully limited. For instance, it was not possible to represent code that came from an included file.
The COFF reader is in `coffread.c'.
ECOFF is an extended COFF originally introduced for Mips and Alpha workstations.
The basic ECOFF reader is in `mipsread.c'.
The IBM RS/6000 running AIX uses an object file format called XCOFF. The COFF sections, symbols, and line numbers are used, but debugging symbols are dbx-style stabs whose strings are located in the `.debug' section (rather than the string table). For more information, see section `Top' in The Stabs Debugging Format.
The shared library scheme has a clean interface for figuring out what shared libraries are in use, but the catch is that everything which refers to addresses (symbol tables and breakpoints at least) needs to be relocated for both shared libraries and the main executable. At least using the standard mechanism this can only be done once the program has been run (or the core file has been read).
Windows 95 and NT use the PE (Portable Executable) format for their executables. PE is basically COFF with additional headers.
While BFD includes special PE support, GDB needs only the basic COFF reader.
The ELF format came with System V Release 4 (SVR4) Unix. ELF is similar to COFF in being organized into a number of sections, but it removes many of COFF's limitations.
The basic ELF reader is in `elfread.c'.
SOM is HP's object file and debug format (not to be confused with IBM's SOM, which is a cross-language ABI).
The SOM reader is in `hpread.c'.
Other file formats that have been supported by GDB include Netware Loadable Modules (`nlmread.c').
This section describes characteristics of debugging information that are independent of the object file format.
stabs started out as special symbols within the a.out format. Since then, it has been encapsulated into other file formats, such as COFF and ELF.
While `dbxread.c' does some of the basic stab processing, including for encapsulated versions, `stabsread.c' does the real work.
The basic COFF definition includes debugging information. The level of support is minimal and non-extensible, and is not often used.
ECOFF includes a definition of a special debug format.
The file `mdebugread.c' implements reading for this format.
DWARF 1 is a debugging format that was originally designed to be used with ELF in SVR4 systems.
The DWARF 1 reader is in `dwarfread.c'.
DWARF 2 is an improved but incompatible version of DWARF 1.
The DWARF 2 reader is in `dwarf2read.c'.
Like COFF, the SOM definition includes debugging information.
If you are using an existing object file format (a.out, COFF, ELF, etc), there is probably little to be done.
If you need to add a new object file format, you must first add it to BFD. This is beyond the scope of this document.
You must then arrange for the BFD code to provide access to the debugging symbols. Generally GDB will have to call swapping routines from BFD and a few other BFD internal routines to locate the debugging information. As much as possible, GDB should not depend on the BFD internal data structures.
For some targets (e.g., COFF), there is a special transfer vector used to call swapping routines, since the external data structures on various platforms have different sizes and layouts. Specialized routines that will only ever be implemented by one object file format may be called directly. This interface should be described in a file `bfd/libxyz.h', which is included by GDB.
GDB's language support is mainly driven by the symbol reader, although it is possible for the user to set the source language manually.
GDB chooses the source language by looking at the extension of the file recorded in the debug info; .c means C, .f means Fortran, etc. It may also use a special-purpose language identifier if the debug format supports it, such as DWARF.
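Choosing a language from a file extension can be sketched as below. The enum and the function name are hypothetical, and only the two extensions mentioned above are handled; this is an illustration, not the table GDB actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_language { TOY_LANG_UNKNOWN, TOY_LANG_C, TOY_LANG_FORTRAN };

/* Guess the source language from the extension of FILENAME.
   Unrecognized or missing extensions yield TOY_LANG_UNKNOWN.  */
enum toy_language
toy_deduce_language (const char *filename)
{
  const char *ext;

  assert (filename != NULL);
  ext = strrchr (filename, '.');
  if (ext == NULL)
    return TOY_LANG_UNKNOWN;
  if (strcmp (ext, ".c") == 0)
    return TOY_LANG_C;
  if (strcmp (ext, ".f") == 0)
    return TOY_LANG_FORTRAN;
  return TOY_LANG_UNKNOWN;
}
```

A language identifier carried in the debug format, where one exists, would take the place of this guess.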
To add other languages to GDB's expression parser, follow these steps:
#define yyparse lang_parse
#define yylex lang_lex
#define yyerror lang_error
#define yylval lang_lval
#define yychar lang_char
#define yydebug lang_debug
#define yypact lang_pact
#define yyr1 lang_r1
#define yyr2 lang_r2
#define yydef lang_def
#define yychk lang_chk
#define yypgo lang_pgo
#define yyact lang_act
#define yyexca lang_exca
#define yyerrflag lang_errflag
#define yynerrs lang_nerrs
At the bottom of your parser, define a struct language_defn and initialize it with the right values for your language. Define an initialize_lang routine and have it call `add_language(lang_language_defn)' to tell the rest of GDB that your language exists. You'll need some other supporting variables and functions, which will be used via pointers from your lang_language_defn. See the declaration of struct language_defn in `language.h', and the other `*-exp.y' files, for more information.
eval.c:evaluate_subexp(). Add cases for new opcodes in two functions from `parse.c': prefixify_subexp() and length_of_subexp(). These compute the number of exp_elements that a given operation takes up.
enum language in `defs.h'.
Update the routines in `language.c' so your language is included.
These routines include type predicates and such, which (in some cases)
are language dependent. If your language does not appear in the switch
statement, an error is reported.
Also included in `language.c' is the code that updates the variable current_language, and the routines that translate the language_lang enumerated identifier into a printable string.
Update the function _initialize_language to include your language. This function picks the default language upon startup, so it is dependent upon which languages GDB is built for.
Update allocate_symtab in `symfile.c' and/or symbol-reading code so that the language of each symtab (source file) is set properly.
This is used to determine the language to use at each stack frame level.
Currently, the language is set based upon the extension of the source
file. If the language can be better inferred from the symbol
information, please set the language of the symtab in the symbol-reading
code.
Add helper code to expprint.c:print_subexp() to handle any new expression opcodes you have added to `expression.h'. Also, add the printed representations of your operators to op_print_tab.
lang_parse() and lang_error in parse.c:parse_exp_1().
_LANG_lang defined in it. Use #ifdefs to leave out large routines that the user won't need if he or she is not using your language.
Note that you do not need to do this in your YACC parser, since if GDB is not built for lang, then `lang-exp.tab.o' (the compiled form of your parser) is not linked into GDB at all.
See the file `configure.in' for how GDB is configured for different
languages.
HFILES and OBJS, otherwise your code may not get linked in, or, worse yet, it may not get tarred into the distribution!
With the advent of autoconf, it's rarely necessary to have host definition machinery anymore.
Most of GDB's host configuration support happens via autoconf. It should be rare to need new host-specific definitions. GDB still uses the host-specific definitions and files listed below, but these mostly exist for historical reasons, and should eventually disappear.
Several files control GDB's configuration for host systems:
XM_FILE= xm-xyz.h. You can also define CC, SYSV_DEFINE, XM_CFLAGS, XM_ADD_FILES, XM_CLIBS, XM_CDEPS, etc.; see `Makefile.in'.
XDEPFILES line in `gdb/config/arch/xyz.mh'.
There are some "generic" versions of routines that can be used by
various systems. These can be customized in various ways by macros
defined in your `xm-xyz.h' file. If these routines work for
the xyz host, you can just include the generic file's name (with
`.o', not `.c') in XDEPFILES.
Otherwise, if your machine needs custom support routines, you will need
to write routines that perform the same functions as the generic file.
Put them into xyz-xdep.c, and put xyz-xdep.o into XDEPFILES.
SER_HARDWIRE; override this variable in the `.mh' file to avoid it.
When GDB is configured and compiled, various macros are defined or left undefined, to control compilation based on the attributes of the host system. These macros and their meanings (or if the meaning is not documented here, then one of the source files where they are used is indicated) are:
GDBINIT_FILENAME
MEM_FNS_DECLARED
memcpy and memset. Define this to avoid conflicts between the native include files and the declarations in `defs.h'.
NO_SYS_FILE
<sys/file.h>.
SIGWINCH_HANDLER
SIGWINCH, you can define this to be the name of a function to be called if SIGWINCH is received.
SIGWINCH_HANDLER_BODY
SIGWINCH_HANDLER.
ALIGN_STACK_ON_STARTUP
tgetent if the stack happens not to be longword-aligned when main is called. This is a rare situation, but is known to occur on several different types of systems.
CRLF_SOURCE_FILES
\r\n rather than \n as a line terminator. This will cause source file listings to omit \r characters when printing and it will allow \r\n line endings of files which are "sourced" by GDB. It must be possible to open files in binary mode using O_BINARY or, for fopen, "rb".
DEFAULT_PROMPT
"(gdb) ").
DEV_TTY
"/dev/tty".
FCLOSE_PROVIDED
fclose in the headers included in defs.h. This isn't needed unless your compiler is unusually anal.
FOPEN_RB
GETENV_PROVIDED
getenv in its headers included in defs.h. This isn't needed unless your compiler is unusually anal.
HAVE_MMAP
mmap for reading symbol tables. For some machines this allows for sharing and quick updates.
HAVE_SIGSETMASK
sigsetmask(). Currently, this is only true of the RS/6000.
HAVE_TERMIO
termio.h.
HOST_BYTE_ORDER
BIG_ENDIAN or LITTLE_ENDIAN.
INT_MAX
INT_MIN
LONG_MAX
UINT_MAX
ULONG_MAX
ISATTY
LONGEST
long long or long, depending on CC_HAS_LONG_LONG.
CC_HAS_LONG_LONG
PRINTF_HAS_LONG_LONG
HAVE_LONG_DOUBLE
PRINTF_HAS_LONG_DOUBLE
SCANF_HAS_LONG_DOUBLE
LSEEK_NOT_LINEAR
lseek (n) does not necessarily move to byte number n in the file. This is only used when reading source files. It is normally faster to define CRLF_SOURCE_FILES when possible.
L_SET
MAINTENANCE_CMDS
MALLOC_INCOMPATIBLE
malloc differs from the ANSI definition.
MMAP_BASE_ADDRESS
MMAP_INCREMENT
NEED_POSIX_SETPGID
setpgid to determine whether job control is available.
NORETURN
volatile, that can be used in both the declaration and definition of functions to indicate that they never return. The default is already set correctly if compiling with GCC. This will almost never need to be defined.
ATTR_NORETURN
__attribute__ ((noreturn)), that can be used in the declarations of functions to indicate that they never return. The default is already set correctly if compiling with GCC. This will almost never need to be defined.
USE_MMALLOC
mmalloc library for memory allocation for symbol reading if this symbol is defined. Be careful defining it since there are systems on which mmalloc does not work for some reason. One example is the DECstation, where its RPC library can't cope with our redefinition of malloc to call mmalloc. When defining USE_MMALLOC, you will also have to set MMALLOC in the Makefile, to point to the mmalloc library. This define is set when you configure with --with-mmalloc.
NO_MMCHECK
mmalloc, but don't want the overhead of checking the heap with mmcheck. Note that on some systems, the C runtime makes calls to malloc prior to calling main, and if free is ever called with these pointers after calling mmcheck to enable checking, a memory corruption abort is certain to occur. These systems can still use mmalloc, but must define NO_MMCHECK.
MMCHECK_FORCE
mmcheck being called, but that memory is never freed so we don't have to worry about it triggering a memory corruption abort. The default is 0, which means that mmcheck will only install the heap checking functions if there have not yet been any memory allocation calls, and if it fails to install the functions, GDB will issue a warning. This is currently defined if you configure using --with-mmalloc.
NO_SIGINTERRUPT
R_OK
SEEK_CUR
SEEK_SET
STOP_SIGNAL
USE_O_NOCTTY
USG
lint
volatile
__volatile__ or /**/.
GDB relies on a number of libraries:
-lreadline and -lhistory libraries for command-line processing. The -lreadline library handles command-line editing, terminal interface, keymap interfaces, and file completion; the -lhistory library handles history processing and history substitution using csh-style syntax. For more information, see `readline/doc/hist.texi' and `readline/doc/rlman.texi'.
malloc() library.
-liberty library of free software. It is a collection of subroutines used by various GNU programs, typically functions that are included in GNU libc, but not in certain vendor versions of libc. Example functions provided by -liberty:
getopt
obstack
strerror
strtol
strtoul
The sources to GDB itself are currently stored in four subdirectories, all of which are used to build the final executable:
Until recently, it was possible to build both GDB 4.17 and the GDB 4.14/4.17 hybrid that shipped with DR1 from the same source tree. GDB 4.17 was built in `gdb' and used files from `gdb-next' and `gdb/', in that order, and GDB 4.14 was built in `gdb-4.14' and used the files from `gdb-next-4.14/', `gdb-next/', `gdb-4.14/', and `gdb/', also in that order.
As of January 8, I have stopped building GDB 4.14 along with GDB 4.17 from the same sources. I suspect GDB 4.14 will no longer build from these sources without modification, although I suspect the necessary changes would be relatively minor.
GDB also uses the following subdirectories:
wait_for_inferior, probably the hairiest function in all of GDB.
nextstep-* functions.
A type
is the fundamental data structure in GDB for representing
type information. Each type
structure is associated with a
particular object file, with the exception of a few pre-created type
structures used for backwards compatibility with other parts of GDB.
GDB provides a number of "fundamental" data types; more complex data
types can be represented by nesting type
structures within each
other. See section Types, Values, and Expressions for more information.
A value is the GDB data structure for representing both R- and L-values of any type. A value contains a pointer to a GDB type structure, as well as a region of memory containing the value's contents (for an R-value) or address (for an L-value).
An expression is the GDB data structure for all expressions in all programming languages. Expressions can be parsed and evaluated interactively according to the current language syntax, can be used by breakpoints and watchpoints to compute values, and can cause execution to take place within a target process (by evaluating function expressions). Expressions are parsed, evaluated, and printed using the language-dispatching mechanisms described in `language.c' and section Language-Specific Sources.
GDB source files to manipulate type structures:
GDB source files to manipulate expression structures:
expression.
Interfaces to the language-specific expression parsing routines described in section Language-Specific Sources.
expression structures in the current execution context.
expression structures in readable (infix) form.
Interfaces to the language-specific type printing routines described in section Language-Specific Sources.
GDB source files to manipulate value structures:
The following files allow GDB to parse and manage symbol information in a variety of formats. For an overview of GDB object file and symbol handling, see section Symbol Tables.
The following source files provide symbol-reading interfaces for various file formats. Although all these files are compiled into GDB for Mac OS X, only the first three (`stabsread', `dbxread', and `machoread') are actively used by the rest of GDB.
The following files are used to provide language-specific expression evaluation and printing support. The file lang-exp handles expression parsing, lang-typeprint prints human-readable versions of GDB 'type' structures, lang-valprint prints human-readable versions of GDB 'value' structures, and lang-lang provides general language-specific support functions. For more information on language-specific support in GDB, see section Languages.
C
C++
Objective-C
Chill
Fortran
Java
Modula II
Scheme
GDB's target architecture defines what sort of machine-language programs GDB can work with, and how it works with them.
At present, the target architecture definition consists of a number of C macros.
GDB's model of the target machine is rather simple. GDB assumes the machine includes a bank of registers and a block of memory. Each register may have a different size.
GDB does not have a magical way to match up with the compiler's idea of which registers are which; however, it is critical that they do match up accurately. The only way to make this work is to get accurate information about the order that the compiler uses, and to reflect that in the REGISTER_NAME and related macros.
GDB can handle big-endian, little-endian, and bi-endian architectures.
This section describes the macros that you can use to define the target machine.
ADDITIONAL_OPTIONS
ADDITIONAL_OPTION_CASES
ADDITIONAL_OPTION_HANDLER
ADDITIONAL_OPTION_HELP
ADDR_BITS_REMOVE (addr)
((addr) & ~3).
BEFORE_MAIN_LOOP_HOOK
BELIEVE_PCC_PROMOTION
BELIEVE_PCC_PROMOTION_TYPE
BITS_BIG_ENDIAN
BREAKPOINT
BIG_BREAKPOINT
LITTLE_BREAKPOINT
REMOTE_BREAKPOINT
LITTLE_REMOTE_BREAKPOINT
BIG_REMOTE_BREAKPOINT
BREAKPOINT_FROM_PC (pcptr, lenptr)
CALL_DUMMY
CALL_DUMMY_LOCATION
CALL_DUMMY_STACK_ADJUST
CANNOT_FETCH_REGISTER (regno)
FETCH_INFERIOR_REGISTERS is not defined.
CANNOT_STORE_REGISTER (regno)
DO_DEFERRED_STORES
CLEAR_DEFERRED_STORES
CPLUS_MARKER
'$'. Most System V targets should define this to '.'.
DBX_PARM_SYMBOL_CLASS
SYMBOL_CLASS of a parameter when decoding DBX symbol information. In the i960, parameters can be stored as locals or as args, depending on the type of the debug record.
DECR_PC_AFTER_BREAK
DECR_PC_AFTER_HW_BREAK
DISABLE_UNSETTABLE_BREAK addr
DO_REGISTERS_INFO
END_OF_TEXT_DEFAULT
EXTRACT_RETURN_VALUE(type,regbuf,valbuf)
EXTRACT_STRUCT_VALUE_ADDRESS(regbuf)
FLOAT_INFO
FP_REGNUM
FRAMELESS_FUNCTION_INVOCATION(fi, frameless)
FRAME_ARGS_ADDRESS_CORRECT
FRAME_CHAIN(frame)
FRAME_CHAIN_COMBINE(chain,frame)
FRAME_CHAIN_VALID(chain,thisframe)
default_frame_chain_valid (the default) is nonzero if the chain pointer is nonzero and the given frame's PC is not inside the startup file (such as `crt0.o'). alternate_frame_chain_valid is nonzero if the chain pointer is nonzero and the given frame's PC is not in main() or a known entry point function (such as _start()).
FRAME_INIT_SAVED_REGS(frame)
frame->saved_regs. Space for frame->saved_regs shall be allocated by FRAME_INIT_SAVED_REGS using either frame_saved_regs_zalloc or frame_obstack_alloc.
FRAME_FIND_SAVED_REGS and EXTRA_FRAME_INFO are deprecated.
FRAME_NUM_ARGS (val, fi)
FRAME_SAVED_PC(frame)
FUNCTION_EPILOGUE_SIZE
x_sym.x_misc.x_fsize field of the function end symbol is 0. For such targets, you must define FUNCTION_EPILOGUE_SIZE to expand into the standard size of a function's epilogue.
GCC_COMPILED_FLAG_SYMBOL
GCC2_COMPILED_FLAG_SYMBOL
gcc_compiled. and gcc2_compiled., respectively. (Currently only defined for the Delta 68.)
GDB_TARGET_IS_HPPA
GDB_TARGET_IS_MACH386
GDB_TARGET_IS_SUN3
GDB_TARGET_IS_SUN386
GET_LONGJMP_TARGET
GET_SAVED_REGISTER
get_saved_register. Currently this is only done for the a29k.
HAVE_REGISTER_WINDOWS
REGISTER_IN_WINDOW_P (regnum)
IBM6000_TARGET
IEEE_FLOAT
INIT_EXTRA_FRAME_INFO (fromleaf, frame)
frame->extra_info. Space for frame->extra_info is allocated using frame_obstack_alloc.
INIT_FRAME_PC (fromleaf, prev)
INNER_THAN (lhs,rhs)
Define this as lhs < rhs if the target's stack grows downward in memory, or lhs > rhs if the stack grows upward.
IN_SIGTRAMP (pc, name)
SIGTRAMP_START (pc)
SIGTRAMP_END (pc)
IN_SOLIB_CALL_TRAMPOLINE pc name
IN_SOLIB_RETURN_TRAMPOLINE pc name
IS_TRAPPED_INTERNALVAR (name)
NEED_TEXT_START_END
NO_HIF_SUPPORT
SOFTWARE_SINGLE_STEP_P
SOFTWARE_SINGLE_STEP must also be defined.
SOFTWARE_SINGLE_STEP(signal,insert_breakpoints_p)
sparc-tdep.c and rs6000-tdep.c for examples.
PCC_SOL_BROKEN
PC_IN_CALL_DUMMY
PC_LOAD_SEGMENT
PC_REGNUM
TARGET_WRITE_PC is not defined.
NPC_REGNUM
NNPC_REGNUM
PRINT_REGISTER_HOOK (regno)
PRINT_TYPELESS_INTEGER
print_longest that seems to have been defined for the Convex target.
PROCESS_LINENUMBER_HOOK
PROLOGUE_FIRSTLINE_OVERLAP
PS_REGNUM
POP_FRAME
PUSH_ARGUMENTS (nargs, args, sp, struct_return, struct_addr)
PUSH_DUMMY_FRAME
REGISTER_BYTES
REGISTER_NAME(i)
REG_STRUCT_HAS_ADDR (gcc_p, type)
SDB_REG_TO_REGNUM
SHIFT_INST_REGS
SKIP_PROLOGUE (pc)
SKIP_PROLOGUE_FRAMELESS_P
SKIP_PROLOGUE will be used instead.
SKIP_TRAMPOLINE_CODE (pc)
SP_REGNUM
STAB_REG_TO_REGNUM
STACK_ALIGN (addr)
STEP_SKIPS_DELAY (addr)
STORE_RETURN_VALUE (type, valbuf)
SUN_FIXED_LBRAC_BUG
SYMBOL_RELOADING_DEFAULT
TARGET_BYTE_ORDER_DEFAULT
BIG_ENDIAN or LITTLE_ENDIAN. This macro replaces TARGET_BYTE_ORDER, which is deprecated.
TARGET_BYTE_ORDER_SELECTABLE_P
BIG_ENDIAN and LITTLE_ENDIAN variants. This macro replaces TARGET_BYTE_ORDER_SELECTABLE, which is deprecated.
TARGET_CHAR_BIT
TARGET_COMPLEX_BIT
Defaults to 2 * TARGET_FLOAT_BIT.
TARGET_DOUBLE_BIT
Defaults to 8 * TARGET_CHAR_BIT.
TARGET_DOUBLE_COMPLEX_BIT
Defaults to 2 * TARGET_DOUBLE_BIT.
TARGET_FLOAT_BIT
Defaults to 4 * TARGET_CHAR_BIT.
TARGET_INT_BIT
Defaults to 4 * TARGET_CHAR_BIT.
TARGET_LONG_BIT
Defaults to 4 * TARGET_CHAR_BIT.
TARGET_LONG_DOUBLE_BIT
Defaults to 2 * TARGET_DOUBLE_BIT.
TARGET_LONG_LONG_BIT
Defaults to 2 * TARGET_LONG_BIT.
TARGET_PTR_BIT
Defaults to TARGET_INT_BIT.
TARGET_SHORT_BIT
Defaults to 2 * TARGET_CHAR_BIT.
TARGET_READ_PC
TARGET_WRITE_PC (val, pid)
TARGET_READ_SP
TARGET_WRITE_SP
TARGET_READ_FP
TARGET_WRITE_FP
read_pc, write_pc, read_sp, write_sp, read_fp and write_fp.
For most targets, these may be left undefined. GDB will call the read and write register functions with the relevant _REGNUM argument.
These macros are useful when a target keeps one of these registers in a hard to get at place; for example, part in a segment register and part in an ordinary register.
TARGET_VIRTUAL_FRAME_POINTER(pc,regp,offsetp)
(register, offset) pair representing the virtual frame pointer in use at the code address "pc". If virtual frame pointers are not used, a default definition simply returns FP_REGNUM, with an offset of zero.
USE_STRUCT_CONVENTION (gcc_p, type)
VARIABLES_INSIDE_BLOCK (desc, gcc_p)
n_desc from the N_RBRAC symbol, and gcc_p is true if GDB has noticed the presence of either the GCC_COMPILED_SYMBOL or the GCC2_COMPILED_SYMBOL. By default, this is 0.
OS9K_VARIABLES_INSIDE_BLOCK (desc, gcc_p)
Motorola M68K target conditionals.
BPT_VECTOR
Default value is 0xf.
REMOTE_BPT_VECTOR
Defaults to 1.
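A few of the macros listed above can be pictured as follows. This is only a minimal sketch for an imaginary target, not an excerpt from any real `tm-*.h' file: the CORE_ADDR typedef is a stand-in, the value 8 for TARGET_CHAR_BIT is an assumption, and the ADDR_BITS_REMOVE and INNER_THAN definitions assume a machine with two flag bits in each address and a stack that grows downward. The toy_check_target_macros function is hypothetical and exists only to exercise the definitions.

```c
#include <assert.h>

typedef unsigned long CORE_ADDR;   /* Stand-in for GDB's address type.  */

/* Imaginary target: the two low-order address bits are flags, so
   they are masked off before the address is used.  */
#define ADDR_BITS_REMOVE(addr) ((addr) & ~(CORE_ADDR) 3)

/* Stack grows downward, so the inner frame has the lower address.  */
#define INNER_THAN(lhs, rhs) ((lhs) < (rhs))

/* Size defaults in the style listed above.  A target header that
   defined one of these first would override the default.  */
#ifndef TARGET_CHAR_BIT
#define TARGET_CHAR_BIT 8          /* Assumed value.  */
#endif
#ifndef TARGET_INT_BIT
#define TARGET_INT_BIT (4 * TARGET_CHAR_BIT)
#endif
#ifndef TARGET_LONG_BIT
#define TARGET_LONG_BIT (4 * TARGET_CHAR_BIT)
#endif
#ifndef TARGET_LONG_LONG_BIT
#define TARGET_LONG_LONG_BIT (2 * TARGET_LONG_BIT)
#endif
#ifndef TARGET_PTR_BIT
#define TARGET_PTR_BIT TARGET_INT_BIT
#endif

/* Hypothetical self-check for the definitions above.  */

static void
toy_check_target_macros (void)
{
  assert (ADDR_BITS_REMOVE ((CORE_ADDR) 0x1003) == (CORE_ADDR) 0x1000);
  assert (INNER_THAN (0x100, 0x200));
  assert (TARGET_LONG_LONG_BIT == 2 * TARGET_LONG_BIT);
  assert (TARGET_PTR_BIT == TARGET_INT_BIT);
}
```

Wrapping each default in #ifndef is what lets a target definition file override only the sizes that differ on that machine.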
The following files define a target to GDB:
If you are adding a new operating system for an existing CPU chip, add a `config/tm-os.h' file that describes the operating system facilities that are unusual (extra symbol table info; the breakpoint instruction needed; etc). Then write an `arch/tm-os.h' that just #includes `tm-arch.h' and `config/tm-os.h'.
The target vector defines the interface between GDB's abstract handling of target systems, and the nitty-gritty code that actually exercises control over a process or a serial port. GDB includes some 30-40 different target vectors; however, each configuration of GDB includes only a few of them.
Both executables and core files have target vectors.
GDB's file `remote.c' talks a serial protocol to code that runs in the target system. GDB provides several sample "stubs" that can be integrated into target programs or operating systems for this purpose; they are named `*-stub.c'.
The GDB user's manual describes how to put such a stub into your target code. What follows is a discussion of integrating the SPARC stub into a complicated operating system (rather than a simple program), by Stu Grossman, the author of this stub.
The trap handling code in the stub assumes the following upon entry to trap_low:
As long as your trap handler can guarantee those conditions, then there
is no reason why you shouldn't be able to `share' traps with the stub.
The stub has no requirement that it be jumped to directly from the hardware trap vector. That is why it calls exceptionHandler(), which is provided by the external environment. For instance, this could set up the hardware traps to actually execute code which calls the stub first, and then transfers to its own trap handler.
For the most part, there probably won't be much of an issue with `sharing' traps, as the traps we use are usually not used by the kernel, and often indicate unrecoverable error conditions. Anyway, this is all controlled by a table, and is trivial to modify. The most important trap for us is for ta 1. Without that, we can't single step or do breakpoints. Everything else is unnecessary for the proper operation of the debugger/stub.
From reading the stub, it's probably not obvious how breakpoints work. They are simply done by deposit/examine operations from GDB.
Several files control GDB's configuration for native support:
There are some "generic" versions of routines that can be used by various systems. These can be customized in various ways by macros defined in your `nm-xyz.h' file. If these routines work for the xyz host, you can just include the generic file's name (with `.o', not `.c') in NATDEPFILES.
Otherwise, if your machine needs custom support routines, you will need to write routines that perform the same functions as the generic file. Put them into xyz-nat.c, and put xyz-nat.o into NATDEPFILES.
ptrace call in a vanilla way.
register_addr(), see below. Now that BFD is used to read core files, virtually all machines should use core-aout.c, and should just provide fetch_core_registers in xyz-nat.c (or REGISTER_U_ADDR in nm-xyz.h).
If your nm-xyz.h file defines the macro REGISTER_U_ADDR(addr, blockend, regno), it should be defined to set addr to the offset within the `user' struct of GDB register number regno. blockend is the offset within the "upage" of u.u_ar0. If REGISTER_U_ADDR is defined, `core-aout.c' will define the register_addr() function and use the macro in it. If you do not define REGISTER_U_ADDR, but you are using the standard fetch_core_registers(), you will need to define your own version of register_addr(), put it into your xyz-nat.c file, and be sure xyz-nat.o is in the NATDEPFILES list. If you have your own fetch_core_registers(), you may not need a separate register_addr(). Many custom fetch_core_registers() implementations simply locate the registers themselves.
When making GDB run native on a new operating system, to make it
possible to debug core files, you will need to either write specific
code for parsing your OS's core files, or customize
`bfd/trad-core.c'. First, use whatever #include files your
machine uses to define the struct of registers that is accessible
(possibly in the u-area) in a core file (rather than
`machine/reg.h'), and an include file that defines whatever header
exists on a core file (e.g. the u-area or a `struct core'). Then
modify trad_unix_core_file_p() to use these values to set up the
section information for the data segment, stack segment, any other
segments in the core file (perhaps shared library contents or control
information), "registers" segment, and if there are two discontiguous
sets of registers (e.g. integer and float), the "reg2" segment. This
section information basically delimits areas in the core file in a
standard way, which the section-reading routines in BFD know how to seek
around in.
Then back in GDB, you need a matching routine called fetch_core_registers(). If you can use the generic one, it's in `core-aout.c'; if not, it's in your `xyz-nat.c' file. It will be passed a char pointer to the entire "registers" segment, its length, and a zero; or a char pointer to the entire "reg2" segment, its length, and a 2. The routine should extract the supplied register values and install them into GDB's "registers" array.
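The calling convention just described can be sketched with a toy routine. Everything here is hypothetical: the register counts, the 4-byte register size, the toy_registers array standing in for GDB's "registers" array, and the return value are assumptions chosen to keep the example small, and a real fetch_core_registers() would follow the layout of the actual core file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: 4 integer registers followed by 2 float
   registers, each 4 bytes wide.  Real targets differ.  */
#define TOY_NUM_INT_REGS 4
#define TOY_NUM_FP_REGS 2
#define TOY_REG_SIZE 4
#define TOY_FP0_REGNUM TOY_NUM_INT_REGS

/* Stand-in for GDB's "registers" array.  */
static char toy_registers[(TOY_NUM_INT_REGS + TOY_NUM_FP_REGS) * TOY_REG_SIZE];

/* CORE_REG_SECT points at a whole register segment of CORE_REG_SIZE
   bytes.  WHICH is 0 for the "registers" segment and 2 for the second
   register segment.  Returns the number of registers installed.  */

static int
toy_fetch_core_registers (const char *core_reg_sect, unsigned core_reg_size,
                          int which)
{
  int first;
  int count;
  int avail;

  assert (core_reg_sect != NULL);
  if (which != 0 && which != 2)
    return 0;

  first = (which == 2) ? TOY_FP0_REGNUM : 0;
  count = (which == 2) ? TOY_NUM_FP_REGS : TOY_NUM_INT_REGS;

  /* A truncated segment supplies fewer registers; install only those.  */
  avail = (int) (core_reg_size / TOY_REG_SIZE);
  if (avail < count)
    count = avail;

  memcpy (toy_registers + first * TOY_REG_SIZE, core_reg_sect,
          (size_t) count * TOY_REG_SIZE);
  return count;
}
```

The third argument is what lets one routine serve both segments: the caller hands over each segment in turn, and the routine decides where in the register array the bytes belong.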
If your system uses `/proc' to control processes, and uses ELF format core files, then you may be able to use the same routines for reading the registers out of processes and out of core files.
When GDB is configured and compiled, various macros are defined or left undefined, to control compilation when the host and target systems are the same. These macros should be defined (or left undefined) in `nm-system.h'.
ATTACH_DETACH
attach and detach commands.
CHILD_PREPARE_TO_STORE
FETCH_INFERIOR_REGISTERS
fetch_inferior_registers and store_inferior_registers in `HOST-nat.c'. If this symbol is not defined, and `infptrace.c' is included in this configuration, the default routines in `infptrace.c' are used for these functions.
FILES_INFO_HOOK
FP0_REGNUM
GET_LONGJMP_TARGET
KERNEL_U_ADDR
u structure (the "user struct", also known as the "u-page") in kernel virtual memory. GDB needs to know this so that it can subtract this address from absolute addresses in the upage that are obtained via ptrace or from core files. On systems that don't need this value, set it to zero.
KERNEL_U_ADDR_BSD
u at runtime, by using Berkeley-style nlist on the kernel's image in the root directory.
KERNEL_U_ADDR_HPUX
u at runtime, by using HP-style nlist on the kernel's image in the root directory.
ONE_PROCESS_WRITETEXT
PROC_NAME_FMT
PTRACE_FP_BUG
PTRACE_ARG3_TYPE
ptrace system call, if it exists and is different from int.
REGISTER_U_ADDR
SHELL_COMMAND_CONCAT
SHELL_FILE
"/bin/sh".
SOLIB_ADD (filename, from_tty, targ)
SOLIB_CREATE_INFERIOR_HOOK
START_INFERIOR_TRAPS_EXPECTED
SVR4_SHARED_LIBS
USE_PROC_FS
U_REGS_OFFSET
FETCH_INFERIOR_REGISTERS is not defined). If the default value from `infptrace.c' is good enough, leave it undefined.
The default value means that u.u_ar0 points to the location of the registers. I'm guessing that #define U_REGS_OFFSET 0 means that u.u_ar0 is the location of the registers.
CLEAR_SOLIB
DEBUG_PTRACE
BFD provides support for GDB in several ways:
The opcodes library provides GDB's disassembler. (It's a separate library because it's also used in binutils, for `objdump').
Regex conditionals.
C_ALLOCA
NFAILURES
RE_NREGS
SIGN_EXTEND_CHAR
SWITCH_ENUM_BUG
SYNTAX_TABLE
Sword
sparc
This chapter covers topics that are lower-level than the major algorithms of GDB.
Cleanups are a structured way to deal with things that need to be done later. When your code does something (like malloc some memory, or open a file) that needs to be undone later (e.g. free the memory or close the file), it can make a cleanup. The cleanup will be done at some future point: when the command is finished, when an error occurs, or when your code decides it's time to do cleanups.
You can also discard cleanups, that is, throw them away without doing what they say. This is only done if you ask that it be done.
Syntax:
struct cleanup *old_chain;
old_chain = make_cleanup (function, arg);
Make a cleanup which will cause function to be called with arg (a char *) later. The result, old_chain, is a handle that can be passed to do_cleanups or discard_cleanups later. Unless you are going to call do_cleanups or discard_cleanups yourself, you can ignore the result from make_cleanup.
do_cleanups (old_chain);
Perform all cleanups made since make_cleanup returned old_chain. E.g.:
make_cleanup (a, 0);
old = make_cleanup (b, 0);
do_cleanups (old);
will call b() but will not call a(). The cleanup that calls a() will remain in the cleanup chain, and will be done later unless otherwise discarded.
discard_cleanups (old_chain);
Same as do_cleanups except that it just removes the cleanups from the chain and does not call the specified functions.
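The behaviour of the three calls above can be modelled with a small linked list. This is a toy model and not GDB's implementation: every toy_ name is hypothetical, the argument is a void * rather than the char * used by GDB, and error handling is reduced to abort().

```c
#include <assert.h>
#include <stdlib.h>

struct toy_cleanup
{
  struct toy_cleanup *next;
  void (*function) (void *);
  void *arg;
};

/* Head of the chain; the most recently made cleanup comes first.  */
static struct toy_cleanup *toy_chain;

/* Push a cleanup and return the chain as it was before the push.  */

static struct toy_cleanup *
toy_make_cleanup (void (*function) (void *), void *arg)
{
  struct toy_cleanup *old_chain = toy_chain;
  struct toy_cleanup *c = malloc (sizeof *c);

  if (c == NULL)
    abort ();
  c->next = old_chain;
  c->function = function;
  c->arg = arg;
  toy_chain = c;
  return old_chain;
}

/* Pop every cleanup made since OLD_CHAIN was returned.  Call each one
   if RUN is nonzero; otherwise just throw it away.  */

static void
toy_unwind (struct toy_cleanup *old_chain, int run)
{
  while (toy_chain != old_chain)
    {
      struct toy_cleanup *c = toy_chain;

      assert (c != NULL);       /* OLD_CHAIN must be on the chain.  */
      toy_chain = c->next;
      if (run)
        c->function (c->arg);
      free (c);
    }
}

static void
toy_do_cleanups (struct toy_cleanup *old_chain)
{
  toy_unwind (old_chain, 1);
}

static void
toy_discard_cleanups (struct toy_cleanup *old_chain)
{
  toy_unwind (old_chain, 0);
}

/* Example cleanup: count how often it was called.  */

static void
toy_count (void *arg)
{
  ++*(int *) arg;
}
```

Because the handle is simply the previous head of the chain, running or discarding "everything since old_chain" is a matter of popping until the head equals the handle again.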
Some functions, e.g. fputs_filtered() or error(), specify that they "should not be called when cleanups are not in place". This means that any actions you need to reverse in the case of an error or interruption must be on the cleanup chain before you call these functions, since they might never return to your code (they `longjmp' instead).
Output that goes through printf_filtered or fputs_filtered or fputs_demangled needs only to have calls to wrap_here added in places that would be good breaking points. The utility routines will take care of actually wrapping if the line width is exceeded.
The argument to wrap_here is an indentation string which is printed only if the line breaks there. This argument is saved away and used later. It must remain valid until the next call to wrap_here or until a newline has been printed through the *_filtered functions. Don't pass in a local variable and then return!
It is usually best to call wrap_here() after printing a comma or space. If you call it before printing a space, make sure that your indentation properly accounts for the leading space that will print if the line wraps there.
Any function or set of functions that produce filtered output must finish by printing a newline, to flush the wrap buffer, before switching to unfiltered ("printf") output. Symbol reading routines that print warnings are a good example.
GDB follows the GNU coding standards, as described in `etc/standards.texi'. This file is also available for anonymous FTP from GNU archive sites. GDB takes a strict interpretation of the standard; in general, when the GNU standard recommends a practice but does not require it, GDB requires it.
GDB follows an additional set of coding standards specific to GDB, as described in the following sections.
You can configure with `--enable-build-warnings' to get GCC to check on a number of these rules. GDB sources ought not to engender any complaints, unless they are caused by bogus host systems. (The exact set of enabled warnings is currently `-Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations'.)
The standard GNU recommendations for formatting must be followed strictly.
Note that while in a definition, the function's name must be in column zero; in a function declaration, the name must be on the same line as the return type.
In addition, there must be a space between a function or macro name and the opening parenthesis of its argument list (except for macro definitions, as required by C). There must not be a space after an open paren/bracket or before a close paren/bracket.
While additional whitespace is generally helpful for reading, do not use more than one blank line to separate blocks, and avoid adding whitespace after the end of a program line (as of 1/99, some 600 lines had whitespace after the semicolon). Excess whitespace causes difficulties for diff and patch.
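The formatting rules above can be seen together in a short fragment. The function toy_add_offset is hypothetical and serves only to show the layout; it is written with an ANSI-style definition so that it compiles with current compilers.

```c
#include <assert.h>

/* Declaration: the name is on the same line as the return type.  */
static int toy_add_offset (int base, int offset);

/* Definition: the return type sits on its own line so that the
   function name starts in column zero.  */

static int
toy_add_offset (int base, int offset)
{
  int values[2];

  /* A space separates a function name from its argument list; no
     space follows an open bracket or precedes a close bracket.  */
  assert (offset >= 0);
  values[0] = base;
  values[1] = offset;
  return values[0] + values[1];
}
```

Keeping definition names in column zero also makes them easy to find with a search anchored at the start of a line.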
The standard GNU requirements on comments must be followed strictly.
Block comments must appear in the following form, with no `/*'- or `*/'-only lines, and no leading `*':
/* Wait for control to return from inferior to debugger.  If inferior
   gets a signal, we may decide to start it up again instead of
   returning.  That is why there is a loop in this function.  When
   this function actually returns it means the inferior should be left
   stopped and GDB should read more commands.  */
(Note that this format is encouraged by Emacs; tabbing for a multi-line comment works correctly, and M-Q fills the block consistently.)
Put a blank line between the block comments preceding function or variable definitions, and the definition itself.
In general, put function-body comments on lines by themselves, rather than trying to fit them into the 20 characters left at the end of a line, since either the comment or the code will inevitably get longer than will fit, and then somebody will have to move it anyhow.
Code must not depend on the sizes of C data types, the format of the host's floating point numbers, the alignment of anything, or the order of evaluation of expressions.
Use functions freely. There are only a handful of compute-bound areas in GDB that might be affected by the overhead of a function call, mainly in symbol reading. Most of GDB's performance is limited by the target interface (whether serial line or system call).
However, use functions with moderation. A thousand one-line functions are just as hard to understand as a single thousand-line function.
Prototypes must be used to declare functions but never to define them. Prototypes for GDB functions must include both the argument type and name, with the name matching that used in the actual function definition.
For the sake of compatibility with pre-ANSI compilers, define prototypes with the PARAMS macro:
extern int memory_remove_breakpoint PARAMS ((CORE_ADDR addr, char *contents_cache));
Note the double parentheses around the parameter types. This allows an arbitrary number of parameters to be described, without freaking out the C preprocessor. When the function has no parameters, it should be described like:
extern void noprocess PARAMS ((void));
The PARAMS macro expands to its argument in ANSI C, or to a simple () in traditional C.
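To make the expansion concrete, here is one way such a macro could be written. The preprocessor condition is an assumption, not a quotation from GDB's headers, and toy_scale is a hypothetical function. Its definition uses ANSI syntax, contrary to the rule above, only so that the sketch compiles with current compilers.

```c
#include <assert.h>

/* One plausible definition; treat the exact condition as an
   assumption rather than what GDB's own headers test.  */
#if defined (__STDC__) || defined (__cplusplus)
#define PARAMS(paramlist) paramlist
#else
#define PARAMS(paramlist) ()
#endif

/* Under ANSI C the next line reads
     extern int toy_scale (int value, int factor);
   and under traditional C it reads
     extern int toy_scale ();  */
extern int toy_scale PARAMS ((int value, int factor));

/* ANSI-style definition, used here only for the sake of current
   compilers.  */

int
toy_scale (int value, int factor)
{
  return value * factor;
}

/* Hypothetical self-check.  */

static void
toy_check_params (void)
{
  assert (toy_scale (6, 7) == 42);
}
```

The inner pair of parentheses turns the whole parameter list into a single macro argument, which is why the commas inside it do not confuse the preprocessor.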
All external functions should have a PARAMS declaration in a header file that callers include, except for _initialize_* functions, which must be external so that `init.c' construction works, but shouldn't be visible to random source files.
All static functions must be declared in a block near the top of the source file.
In addition to getting the syntax right, there's the little question of semantics. Some things are done in certain ways in GDB because long experience has shown that the more obvious ways caused various kinds of trouble.
You can't assume the byte order of anything that comes from a target (including values, object files, and instructions). Such things must be byte-swapped using SWAP_TARGET_AND_HOST in GDB, or one of the swap routines defined in `bfd.h', such as bfd_get_32.
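The underlying idea can be shown without BFD at all. The helpers below are hypothetical stand-ins, not the BFD routines themselves: they assemble a 32-bit value one byte at a time, so the result is the same whatever the host's byte order.

```c
#include <assert.h>

/* Read a 32-bit big-endian value from BUF.  */

static unsigned long
toy_get_b32 (const unsigned char *buf)
{
  return (((unsigned long) buf[0] << 24)
          | ((unsigned long) buf[1] << 16)
          | ((unsigned long) buf[2] << 8)
          | (unsigned long) buf[3]);
}

/* Read a 32-bit little-endian value from BUF.  */

static unsigned long
toy_get_l32 (const unsigned char *buf)
{
  return (((unsigned long) buf[3] << 24)
          | ((unsigned long) buf[2] << 16)
          | ((unsigned long) buf[1] << 8)
          | (unsigned long) buf[0]);
}

/* Hypothetical self-check: the same four target bytes give different
   values depending on the target's byte order.  */

static void
toy_check_byte_order (void)
{
  static const unsigned char raw[4] = { 0x12, 0x34, 0x56, 0x78 };

  assert (toy_get_b32 (raw) == 0x12345678UL);
  assert (toy_get_l32 (raw) == 0x78563412UL);
}
```

Casting a target buffer to a host integer pointer and dereferencing it would instead bake the host's byte order (and alignment rules) into the code, which is exactly what the rule above forbids.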
You can't assume that you know what interface is being used to talk to the target system. All references to the target must go through the current target_ops vector.
You can't assume that the host and target machines are the same machine (except in the "native" support modules). In particular, you can't assume that the target machine's header files will be available on the host machine. Target code must bring along its own header files -- written from scratch or explicitly donated by their owner, to avoid copyright problems.
Insertion of new #ifdef's will be frowned upon. It's much better to write the code portably than to conditionalize it for various systems.
New #ifdef's which test for specific compilers or manufacturers or operating systems are unacceptable. All #ifdef's should test for features. The information about which configurations contain which features should be segregated into the configuration files. Experience has proven far too often that a feature unique to one particular system often creeps into other systems; and that a conditional based on some predefined macro for your current system will become worthless over time, as new versions of your system come out that behave differently with regard to this feature.
Adding code that handles specific architectures, operating systems, target interfaces, or hosts, is not acceptable in generic code. If a hook is needed at that point, invent a generic hook and define it for your configuration, with something like:
#ifdef WRANGLE_SIGNALS
   WRANGLE_SIGNALS (signo);
#endif
In your host, target, or native configuration file, as appropriate, define WRANGLE_SIGNALS to do the machine-dependent thing. Take a bit of care in defining the hook, so that it can be used by other ports in the future, if they need a hook in the same place.
If the hook is not defined, the code should do whatever "most" machines want. Using #ifdef, as above, is the preferred way to do this, but sometimes that gets convoluted, in which case use
#ifndef SPECIAL_FOO_HANDLING
#define SPECIAL_FOO_HANDLING(pc, sp) (0)
#endif
where the macro is used or in an appropriate header file.
Whether to include a small hook, a hook around the exact pieces of code which are system-dependent, or whether to replace a whole function with a hook depends on the case. A good example of this dilemma can be found in get_saved_register. All machines that GDB 2.8 ran on just needed the FRAME_FIND_SAVED_REGS hook to find the saved registers. Then the SPARC and Pyramid came along, and HAVE_REGISTER_WINDOWS and REGISTER_IN_WINDOW_P were introduced. Then the 29k and 88k required the GET_SAVED_REGISTER hook. The first three are examples of small hooks; the latter replaces a whole function. In this specific case, it is useful to have both kinds; it would be a bad idea to replace all the uses of the small hooks with GET_SAVED_REGISTER, since that would result in much duplicated code. Other times, duplicating a few lines of code here or there is much cleaner than introducing a large number of small hooks.
Another way to generalize GDB along a particular interface is with an attribute struct. For example, GDB has been generalized to handle multiple kinds of remote interfaces -- not by #ifdef's everywhere, but by defining the "target_ops" structure and having a current target (as well as a stack of targets below it, for memory references). Whenever something needs to be done that depends on which remote interface we are using, a flag in the current target_ops structure is tested (e.g. `target_has_stack'), or a function is called through a pointer in the current target_ops structure. In this way, when a new remote interface is added, only one module needs to be touched -- the one that actually implements the new remote interface. Other examples of attribute-structs are BFD access to multiple kinds of object file formats, or GDB's access to multiple source languages.
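The attribute-struct approach can be sketched in a few lines. This is a toy with hypothetical names throughout: the real target_ops structure has many more members, and the "simulator" target here is just an in-memory array.

```c
#include <assert.h>
#include <stddef.h>

/* A much reduced attribute struct: one flag and one operation.  */
struct toy_target_ops
{
  const char *shortname;
  int has_stack;                /* Flag tested by generic code.  */
  int (*read_byte) (unsigned long addr, unsigned char *out);
};

/* One module implements one interface and fills in its own vector.  */
static unsigned char toy_sim_memory[4] = { 0xde, 0xad, 0xbe, 0xef };

static int
toy_sim_read_byte (unsigned long addr, unsigned char *out)
{
  if (addr >= sizeof toy_sim_memory)
    return -1;
  *out = toy_sim_memory[addr];
  return 0;
}

static struct toy_target_ops toy_sim_ops =
{
  "toy-sim", 1, toy_sim_read_byte
};

static struct toy_target_ops *toy_current_target = &toy_sim_ops;

/* Generic code contains no #ifdef; it calls through whatever vector
   is current.  */

static int
toy_target_read_byte (unsigned long addr, unsigned char *out)
{
  assert (toy_current_target != NULL);
  return toy_current_target->read_byte (addr, out);
}
```

Adding another interface in this scheme means writing one more module with its own vector; toy_target_read_byte and its callers stay untouched.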
Please avoid duplicating code. For example, in GDB 3.x all the code interfacing between ptrace and the rest of GDB was duplicated in `*-dep.c', and so changing something was very painful. In GDB 4.x, these have all been consolidated into `infptrace.c'. `infptrace.c' can deal with variations between systems the same way any system-independent file would (hooks, #if defined, etc.), and machines which are radically different don't need to use infptrace.c at all.
Most of the work in making GDB compile on a new machine is in specifying the configuration of the machine. This is done in a dizzying variety of header files and configuration scripts, which we hope to make more sensible soon. Let's say your new host is called an xyz (e.g. `sun4'), and its full three-part configuration name is arch-xvend-xos (e.g. `sparc-sun-sunos4').
In particular:
In the top level directory, edit `config.sub' and add arch, xvend, and xos to the lists of supported architectures, vendors, and operating systems near the bottom of the file. Also, add xyz as an alias that maps to arch-xvend-xos. You can test your changes by running
./config.sub xyz
and
./config.sub arch-xvend-xos
which should both respond with arch-xvend-xos and no error messages.
You need to port BFD, if that hasn't been done already. Porting BFD is beyond the scope of this manual.
To configure GDB itself, edit `gdb/configure.host' to recognize your system and set gdb_host to xyz, and (unless your desired target is already available) also edit `gdb/configure.tgt', setting gdb_target to something appropriate (for instance, xyz).
Finally, you'll need to specify and define GDB's host-, native-, and target-dependent `.h' and `.c' files used for your configuration.
From the top level directory (containing `gdb', `bfd', `libiberty', and so on):
make -f Makefile.in gdb.tar.gz
This will properly configure, clean, rebuild any files that are distributed pre-built (e.g. `c-exp.tab.c' or `refcard.ps'), and will then make a tarfile. (If the top level directory has already been configured, you can just do make gdb.tar.gz instead.)
This procedure requires:
makeinfo (texinfo2 level)
dvips
yacc or bison
... and the usual slew of utilities (sed, tar, etc.).
`gdb.texinfo' is currently marked up using the texinfo-2 macros, which are not yet a default for anything (but we have to start using them sometime).
For making paper, the only thing this implies is the right generation of `texinfo.tex' needs to be included in the distribution.
For making info files, however, rather than duplicating the texinfo2 distribution, generate `gdb-all.texinfo' locally, and include the files `gdb.info*' in the distribution. Note the plural; makeinfo will split the document into one overall file and five or so included files.
Check the `README' file; it often has useful information that does not appear anywhere else in the directory.
GDB is a large and complicated program, and if you are first starting to work on it, it can be hard to know where to start. Fortunately, if you know how to go about it, there are ways to figure out what is going on.
This manual, the GDB Internals manual, has information which applies generally to many parts of GDB.
Information about particular functions or data structures is located in comments with those functions or data structures. If you run across a function or a global variable which does not have a comment correctly explaining what it does, this can be thought of as a bug in GDB; feel free to submit a bug report, with a suggested comment if you can figure out what the comment should say. If you find a comment which is actually wrong, be especially sure to report that.
Comments explaining the function of macros defined in host, target, or native dependent files can be in several places. Sometimes they are repeated every place the macro is defined. Sometimes they are where the macro is used. Sometimes there is a header file which supplies a default definition of the macro, and the comment is there. This manual also documents all the available macros.
Start with the header files. Once you have some idea of how GDB's internal symbol tables are stored (see `symtab.h', `gdbtypes.h'), you will find it much easier to understand the code which uses and creates those symbol tables.
You may wish to process the information you are getting somehow, to enhance your understanding of it. Summarize it, translate it to another language, add some (perhaps trivial or non-useful) feature to GDB, use the code to predict what a test case would do and write the test case and verify your prediction, etc. If you are reading code and your eyes are starting to glaze over, this is a sign you need to use a more active approach.
Once you have a part of GDB to start with, you can find more specifically the part you are looking for by stepping through each function with the next command. Do not use step or you will quickly get distracted; when the function you are stepping through calls another function try only to get a big-picture understanding (perhaps using the comment at the beginning of the function being called) of what it does. This way you can identify which of the functions being called by the function you are stepping through is the one which you are interested in. You may need to examine the data structures generated at each stage, with reference to the comments in the header files explaining what the data structures are supposed to look like.
Of course, this same technique can be used if you are just reading the code, rather than actually stepping through it. The same general principle applies--when the code you are looking at calls something else, just try to understand generally what the code being called does, rather than worrying about all its details.
A good place to start when tracking down some particular area is with a command which invokes that feature. Suppose you want to know how single-stepping works. As a GDB user, you know that the `step' command invokes single-stepping. The command is invoked via command tables (see `command.h'); by convention the function which actually performs the command is formed by taking the name of the command and adding `_command', or in the case of an `info' subcommand, `_info'. For example, the `step' command invokes the `step_command' function and the `info display' command invokes `display_info'. When this convention is not followed, you might have to use `grep' or `M-x tags-search' in emacs, or run GDB on itself and set a breakpoint in `execute_command'.
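The naming convention described above can be turned into a quick search. The following is only a sketch for a POSIX shell: `handler_name' is a hypothetical helper written for this illustration, not something that ships with GDB, and it covers only the two cases named above (ordinary commands and `info' subcommands).

```shell
# handler_name is a hypothetical helper (not part of GDB). It maps a command,
# as the user types it, to the conventional name of the function behind it.
handler_name() {
    case "$1" in
        "info "*) printf '%s_info\n' "${1#info }" ;;   # info subcommand -> NAME_info
        *)        printf '%s_command\n' "$1" ;;        # ordinary command -> NAME_command
    esac
}

handler_name step             # prints step_command
handler_name "info display"   # prints display_info

# From the GDB source directory, the result can be handed to grep, e.g.:
#   grep -n "$(handler_name step)" *.c
```

If the search finds nothing, the command probably does not follow the convention, and the fallbacks mentioned above apply.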
If all of the above fail, it may be appropriate to ask for information on `bug-gdb'. But never post a generic question like "I was wondering if anyone could give me some tips about understanding GDB"; if we had some magic secret we would put it in this manual. Suggestions for improving the manual are always welcome, of course.
If GDB is limping on your machine, debugging GDB with itself is the preferred way to get it fully functional. Be warned that on some ancient Unix systems, like Ultrix 4.2, a program can't be running in one process while it is being debugged in another. Rather than typing the command `./gdb ./gdb', which works on Suns and such, you can copy `gdb' to `gdb2' and then type `./gdb ./gdb2'.
When you run GDB in the GDB source directory, it will read a `.gdbinit' file that sets up some simple things to make debugging gdb easier. The `info' command, when executed without a subcommand in a GDB being debugged by gdb, will pop you back up to the top level gdb. See `.gdbinit' for details.
If you use emacs, you will probably want to do a `make TAGS' after you configure your distribution; this will put the machine dependent routines for your local machine where they will be accessed first by `M-.'.
Also, make sure that you've either compiled GDB with your local cc, or have run `fixincludes' if you are compiling with gcc.
Thanks for thinking of offering your changes back to the community of GDB users. In general we like to get well-designed enhancements. Thanks also for checking in advance about the best way to transfer the changes.
The GDB maintainers will only install "cleanly designed" patches. You may not always agree on what is clean design.
If the maintainers don't have time to put the patch in when it arrives, or if there is any question about a patch, it goes into a large queue with everyone else's patches and bug reports.
The legal issue is that incorporating substantial changes requires a copyright assignment from you and/or your employer, granting ownership of the changes to the Free Software Foundation. You can get the standard document for doing this by sending mail to gnu@prep.ai.mit.edu and asking for it. I recommend that people write in "All programs owned by the Free Software Foundation" as "NAME OF PROGRAM", so that changes in many programs (not just GDB, but GAS, Emacs, GCC, etc.) can be contributed with only one piece of legalese pushed through the bureaucracy and filed with the FSF. I can't start merging changes until this paperwork is received by the FSF (their rules, which I follow since I maintain it for them).
Technically, the easiest way to receive changes is to receive each feature as a small context diff or unidiff, suitable for "patch". Each message sent to me should include the changes to C code and header files for a single feature, plus ChangeLog entries for each directory where files were modified, and diffs for any changes needed to the manuals (gdb/doc/gdb.texi or gdb/doc/gdbint.texi). If there are a lot of changes for a single feature, they can be split down into multiple messages.
In this way, if I read and like the feature, I can add it to the sources with a single patch command, do some testing, and check it in. If you leave out the ChangeLog, I have to write one. If you leave out the doc, I have to puzzle out what needs documenting. Etc.
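As a rough illustration of the workflow described above, the following sketch produces a unidiff for a single change and then applies it with one `patch' command. Everything in it is made up for the example: the scratch directory, the file `example.c' and its contents are assumptions, not GDB sources. The round trip at the end runs only if `patch' is installed.

```shell
# Work in a scratch directory; all file names and contents are invented.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir orig new
printf 'int answer = 41;\n' > orig/example.c   # pristine source
printf 'int answer = 42;\n' > new/example.c    # source with the change

# Contributor's side: one unidiff per feature (use diff -c for a context diff).
# diff exits with status 1 when the files differ, which is expected here.
diff -u orig/example.c new/example.c > feature.diff || true

# Maintainer's side: apply the diff to a pristine copy with a single command.
if command -v patch >/dev/null 2>&1; then
    cp orig/example.c applied.c
    patch applied.c < feature.diff
    cmp applied.c new/example.c && echo "patch applied cleanly"
fi
```

A real submission would also carry the ChangeLog entries and documentation diffs mentioned above; this sketch shows only the mechanics of the diff itself.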
The reason to send each change in a separate message is that I will not install some of the changes. They'll be returned to you with questions or comments. If I'm doing my job, my message back to you will say what you have to fix in order to make the change acceptable. The reason to have separate messages for separate features is so that other changes (which I am willing to accept) can be installed while one or more changes are being reworked. If multiple features are sent in a single message, I tend to not put in the effort to sort out the acceptable changes from the unacceptable, so none of the features get installed until all are acceptable.
If this sounds painful or authoritarian, well, it is. But I get a lot of bug reports and a lot of patches, and most of them don't get installed because I don't have the time to finish the job that the bug reporter or the contributor could have done. Patches that arrive complete, working, and well designed, tend to get installed on the day they arrive. The others go into a queue and get installed if and when I scan back over the queue -- which can literally take months sometimes. It's in both our interests to make patch installation easy -- you get your changes installed, and I make some forward progress on GDB in a normal 12-hour day (instead of them having to wait until I have a 14-hour or 16-hour day to spend cleaning up patches before I can install them).
Please send patches directly to the GDB maintainers at gdb-patches@cygnus.com.
Fragments of old code in GDB sometimes reference or set the following configuration macros. They should not be used by new code, and old uses should be removed as those parts of the debugger are otherwise touched.
STACK_END_ADDR
PYRAMID_CONTROL_FRAME_DEBUGGING
PYRAMID_CORE
PYRAMID_PTRACE
REG_STACK_SEGMENT
This document was generated on 6 April 1999 using the texi2html translator version 1.51.