Porting GAS

Each GAS target specifies two main things: the CPU file and the object format file. Two main switches in the `configure.in' file handle this. The first switches on CPU type to set the shell variable cpu_type. The second switches on the entire target to set the shell variable fmt.

The configure script uses the value of cpu_type to select two files in the `config' directory: `tc-CPU.c' and `tc-CPU.h'. The configuration process will create a file named `targ-cpu.h' in the build directory which includes `tc-CPU.h'.

The configure script also uses the value of fmt to select two files: `obj-fmt.c' and `obj-fmt.h'. The configuration process will create a file named `obj-format.h' in the build directory which includes `obj-fmt.h'.

You can also set the emulation in the configure script by setting the em variable. Normally the default value of `generic' is fine. The configuration process will create a file named `targ-env.h' in the build directory which includes `te-em.h'.
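
The three generated wrapper headers each contain a single include of the selected file. The sketch below mimics that name mapping in C for illustration only. The real work is done by the configure script, and the helper name wrapper_include and the sample values m68k, elf, and generic are assumptions made for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of GAS or its build system: format the
   single #include line that a generated wrapper header (targ-cpu.h,
   obj-format.h, or targ-env.h) would contain.  PREFIX is "tc", "obj",
   or "te"; NAME is the value of cpu_type, fmt, or em.  Returns the
   snprintf result, i.e. the number of characters the line needs.  */
static int
wrapper_include (char *buf, size_t size, const char *prefix, const char *name)
{
  return snprintf (buf, size, "#include \"%s-%s.h\"\n", prefix, name);
}
```

Under these assumptions, calling wrapper_include with "tc" and "m68k" yields the line that `targ-cpu.h' would hold for an m68k target, and calling it with "te" and "generic" yields the line for the default emulation.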

Porting GAS to a new CPU requires writing the `tc-CPU' files. Porting GAS to a new object file format requires writing the `obj-fmt' files. There is sometimes some interaction between these two files, but it is normally minimal.

The best approach is, of course, to copy existing files. The documentation below assumes that you are looking at existing files to see usage details.

These interfaces have grown over time, and have never been carefully thought out or designed. Nothing about the interfaces described here is cast in stone. It is possible that they will change from one version of the assembler to the next. Also, new macros are added all the time as they are needed.

Writing a CPU backend

The CPU backend files are the heart of the assembler. They are the only parts of the assembler which actually know anything about the instruction set of the processor.

You must define a reasonably small list of macros and functions in the CPU backend files. You may define a large number of additional macros in the CPU backend files, not all of which are documented here. You must, of course, define macros in the `.h' file, which is included by every assembler source file. You may define the functions as macros in the `.h' file, or as functions in the `.c' file.
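
As a sketch of that choice, the md_number_to_chars hook described later in this list could be written in either form. The two helper names are the ones that entry mentions, but the bodies given here and the simplified unsigned long value type are stand-ins written for this example, not GAS's own definitions. The names ending in _as_macro and _as_function, and the flag example_big_endian, are hypothetical and exist only so both forms can appear in one file.

```c
/* Stand-ins for the GAS helpers named in the md_number_to_chars entry.
   Assumption: simplified signatures that use unsigned long for the value.  */
static void
number_to_chars_bigendian (char *buf, unsigned long val, int n)
{
  /* Fill from the last byte backwards so the low byte ends up last.  */
  while (n-- > 0)
    {
      buf[n] = (char) (val & 0xff);
      val >>= 8;
    }
}

static void
number_to_chars_littleendian (char *buf, unsigned long val, int n)
{
  int i;

  /* Fill from the first byte forwards so the low byte ends up first.  */
  for (i = 0; i < n; i++)
    {
      buf[i] = (char) (val & 0xff);
      val >>= 8;
    }
}

/* Form 1: a fixed-endian target can make the hook a macro in the `.h' file.  */
#define md_number_to_chars_as_macro(buf, val, n) \
  number_to_chars_bigendian ((buf), (val), (n))

/* Form 2: a target whose endianness is chosen at run time needs a
   function in the `.c' file.  The flag is hypothetical.  */
static int example_big_endian = 1;

static void
md_number_to_chars_as_function (char *buf, unsigned long val, int n)
{
  if (example_big_endian)
    number_to_chars_bigendian (buf, val, n);
  else
    number_to_chars_littleendian (buf, val, n);
}
```

The macro form avoids a function call but fixes the decision at compile time; the function form is the one to use when a command line option can change the answer.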

TC_CPU
By convention, you should define this macro in the `.h' file. For example, `tc-m68k.h' defines TC_M68K. You might have to use this if it is necessary to add CPU specific code to the object format file.
TARGET_FORMAT
This macro is the BFD target name to use when creating the output file. This will normally depend upon the OBJ_FMT macro.
TARGET_ARCH
This macro is the BFD architecture to pass to bfd_set_arch_mach.
TARGET_MACH
This macro is the BFD machine number to pass to bfd_set_arch_mach. If it is not defined, GAS will use 0.
TARGET_BYTES_BIG_ENDIAN
You should define this macro to be non-zero if the target is big endian, and zero if the target is little endian.
md_shortopts
md_longopts
md_longopts_size
md_parse_option
md_show_usage
GAS uses these variables and functions during option processing. md_shortopts is a const char * which GAS adds to the machine independent string passed to getopt. md_longopts is a struct option [] which GAS adds to the machine independent long options passed to getopt; you may use OPTION_MD_BASE, defined in `as.h', as the start of a set of long option indices, if necessary. md_longopts_size is a size_t holding the size of md_longopts. GAS will call md_parse_option whenever getopt returns an unrecognized code, presumably indicating a special code value which appears in md_longopts. GAS will call md_show_usage when a usage message is printed; it should print a description of the machine specific options.
md_begin
GAS will call this function at the start of the assembly, after the command line arguments have been parsed and all the machine independent initializations have been completed.
md_cleanup
If you define this macro, GAS will call it at the end of each input file.
md_assemble
GAS will call this function for each input line which does not contain a pseudo-op. The argument is a null terminated string. The function should assemble the string as an instruction with operands. Normally md_assemble will do this by calling frag_more and writing out some bytes (see section Frags). md_assemble will call fix_new to create fixups as needed (see section Fixups). Targets which need to do special purpose relaxation will call frag_var.
md_pseudo_table
This is a const array of type pseudo_typeS. It is a mapping from pseudo-op names to functions. You should use this table to implement pseudo-ops which are specific to the CPU.
tc_conditional_pseudoop
If this macro is defined, GAS will call it with a pseudo_typeS argument. It should return non-zero if the pseudo-op is a conditional which controls whether code is assembled, such as `.if'. GAS knows about the normal conditional pseudo-ops, and you should normally not have to define this macro.
comment_chars
This is a null terminated const char array of characters which start a comment.
tc_comment_chars
If this macro is defined, GAS will use it instead of comment_chars.
line_comment_chars
This is a null terminated const char array of characters which start a comment when they appear at the start of a line.
line_separator_chars
This is a null terminated const char array of characters which separate lines (the semicolon is such a character by default, and need not be listed in this array).
EXP_CHARS
This is a null terminated const char array of characters which may be used as the exponent character in a floating point number. This is normally "eE".
FLT_CHARS
This is a null terminated const char array of characters which may be used to indicate a floating point constant. A zero followed by one of these characters is assumed to be followed by a floating point number; thus they operate the way that 0x is used to indicate a hexadecimal constant. Usually this includes `r' and `f'.
LEX_AT
You may define this macro to the lexical type of the @ character. The default is zero. Lexical types are a combination of LEX_NAME and LEX_BEGIN_NAME, both defined in `read.h'. LEX_NAME indicates that the character may appear in a name. LEX_BEGIN_NAME indicates that the character may appear at the beginning of a name.
LEX_BR
You may define this macro to the lexical type of the brace characters {, }, [, and ]. The default value is zero.
LEX_PCT
You may define this macro to the lexical type of the % character. The default value is zero.
LEX_QM
You may define this macro to the lexical type of the ? character. The default value is zero.
LEX_DOLLAR
You may define this macro to the lexical type of the $ character. The default value is LEX_NAME | LEX_BEGIN_NAME.
SINGLE_QUOTE_STRINGS
If you define this macro, GAS will treat single quotes as string delimiters. Normally only double quotes are accepted as string delimiters.
NO_STRING_ESCAPES
If you define this macro, GAS will not permit escape sequences in a string.
ONLY_STANDARD_ESCAPES
If you define this macro, GAS will warn about the use of nonstandard escape sequences in a string.
md_start_line_hook
If you define this macro, GAS will call it at the start of each line.
LABELS_WITHOUT_COLONS
If you define this macro, GAS will assume that any text at the start of a line is a label, even if it does not have a colon.
TC_START_LABEL
You may define this macro to control what GAS considers to be a label. The default definition is to accept any name followed by a colon character.
NO_PSEUDO_DOT
If you define this macro, GAS will not require pseudo-ops to start with a . character.
TC_EQUAL_IN_INSN
If you define this macro, it should return nonzero if the instruction is permitted to contain an = character. GAS will use this to decide if a = is an assignment or an instruction.
TC_EOL_IN_INSN
If you define this macro, it should return nonzero if the current input line pointer should be treated as the end of a line.
md_parse_name
If this macro is defined, GAS will call it for any symbol found in an expression. You can define this to handle special symbols in a special way. If a symbol always has a certain value, you should normally enter it in the symbol table, perhaps using reg_section.
md_undefined_symbol
GAS will call this function when a symbol table lookup fails, before it creates a new symbol. Typically this would be used to supply symbols whose name or value changes dynamically, possibly in a context sensitive way. Predefined symbols with fixed values, such as register names or condition codes, are typically entered directly into the symbol table when md_begin is called.
md_operand
GAS will call this function for any expression that can not be recognized. When the function is called, input_line_pointer will point to the start of the expression.
tc_unrecognized_line
If you define this macro, GAS will call it when it finds a line that it can not parse.
md_do_align
You may define this macro to handle an alignment directive. GAS will call it when the directive is seen in the input file. For example, the i386 backend uses this to generate efficient nop instructions of varying lengths, depending upon the number of bytes that the alignment will skip.
HANDLE_ALIGN
You may define this macro to do special handling for an alignment directive. GAS will call it at the end of the assembly.
md_flush_pending_output
If you define this macro, GAS will call it each time it skips any space because of a space filling or alignment or data allocation pseudo-op.
TC_PARSE_CONS_EXPRESSION
You may define this macro to parse an expression used in a data allocation pseudo-op such as .word. You can use this to recognize relocation directives that may appear in such directives.
BITFIELD_CONS_EXPRESSION
If you define this macro, GAS will recognize bitfield instructions in data allocation pseudo-ops, as used on the i960.
REPEAT_CONS_EXPRESSION
If you define this macro, GAS will recognize repeat counts in data allocation pseudo-ops, as used on the MIPS.
md_cons_align
You may define this macro to do any special alignment before a data allocation pseudo-op.
TC_CONS_FIX_NEW
You may define this macro to generate a fixup for a data allocation pseudo-op.
md_number_to_chars
This should just call either number_to_chars_bigendian or number_to_chars_littleendian, whichever is appropriate. On targets like the MIPS which support options to change the endianness, which function to call is a runtime decision. On other targets, md_number_to_chars can be a simple macro.
md_reloc_size
This variable is only used in the original version of GAS (not BFD_ASSEMBLER and not MANY_SEGMENTS). It holds the size of a relocation entry.
WORKING_DOT_WORD
md_short_jump_size
md_long_jump_size
md_create_short_jump
md_create_long_jump
If WORKING_DOT_WORD is defined, GAS will not do broken word processing (see section Broken words). Otherwise, you should set md_short_jump_size to the size of a short jump (a jump that is just long enough to jump around a long jump) and md_long_jump_size to the size of a long jump (a jump that can go anywhere in the function). You should define md_create_short_jump to create a short jump around a long jump, and define md_create_long_jump to create a long jump.
md_estimate_size_before_relax
This function returns an estimate of the size of a rs_machine_dependent frag before any relaxing is done. It may also create any necessary relocations.
md_relax_frag
This macro may be defined to relax a frag. GAS will call this with the frag and the change in size of all previous frags; md_relax_frag should return the change in size of the frag. See section Relaxation.
TC_GENERIC_RELAX_TABLE
If you do not define md_relax_frag, you may define TC_GENERIC_RELAX_TABLE as a table of relax_typeS structures. The machine independent code knows how to use such a table to relax PC relative references. See `tc-m68k.c' for an example. See section Relaxation.
md_prepare_relax_scan
If defined, it is a C statement that is invoked prior to scanning the relax table.
LINKER_RELAXING_SHRINKS_ONLY
If you define this macro, and the global variable `linkrelax' is set (because of a command line option, or unconditionally in md_begin), a `.align' directive will cause extra space to be allocated. The linker can then discard this space when relaxing the section.
md_convert_frag
GAS will call this for each rs_machine_dependent fragment. The instruction is completed using the data from the relaxation pass. It may also create any necessary relocations. See section Relaxation.
md_apply_fix
GAS will call this for each fixup. It should store the correct value in the object file.
TC_HANDLES_FX_DONE
If this macro is defined, it means that md_apply_fix correctly sets the fx_done field in the fixup.
tc_gen_reloc
A BFD_ASSEMBLER GAS will call this to generate a reloc. GAS will pass the resulting reloc to bfd_install_relocation. This currently works poorly, as bfd_install_relocation often does the wrong thing, and instances of tc_gen_reloc have been written to work around the problems, which in turn makes it difficult to fix bfd_install_relocation.
RELOC_EXPANSION_POSSIBLE
If you define this macro, it means that tc_gen_reloc may return multiple relocation entries for a single fixup. In this case, the return value of tc_gen_reloc is a pointer to a null terminated array.
MAX_RELOC_EXPANSION
You must define this if RELOC_EXPANSION_POSSIBLE is defined; it indicates the largest number of relocs which tc_gen_reloc may return for a single fixup.
tc_fix_adjustable
You may define this macro to indicate whether a fixup against a locally defined symbol should be adjusted to be against the section symbol. It should return a non-zero value if the adjustment is acceptable.
MD_PCREL_FROM_SECTION
If you define this macro, it should return the offset between the address of a PC relative fixup and the position from which the PC relative adjustment should be made. On many processors, the base of a PC relative instruction is the next instruction, so this macro would return the length of an instruction.
md_pcrel_from
This is the default value of MD_PCREL_FROM_SECTION. The difference is that md_pcrel_from does not take a section argument.
tc_frob_label
If you define this macro, GAS will call it each time a label is defined.
md_section_align
GAS will call this function for each section at the end of the assembly, to permit the CPU backend to adjust the alignment of a section.
tc_frob_section
If you define this macro, a BFD_ASSEMBLER GAS will call it for each section at the end of the assembly.
tc_frob_file_before_adjust
If you define this macro, GAS will call it after the symbol values are resolved, but before the fixups have been changed from local symbols to section symbols.
tc_frob_symbol
If you define this macro, GAS will call it for each symbol. You can indicate that the symbol should not be included in the object file by defining this macro to set its second argument to a non-zero value.
tc_frob_file
If you define this macro, GAS will call it after the symbol table has been completed, but before the relocations have been generated.
tc_frob_file_after_relocs
If you define this macro, GAS will call it after the relocs have been generated.
LISTING_HEADER
A string to use on the header line of a listing. The default value is simply "GAS LISTING".
LISTING_WORD_SIZE
The number of bytes to put into a word in a listing. This affects the way the bytes are clumped together in the listing. For example, a value of 2 might print `1234 5678' where a value of 1 would print `12 34 56 78'. The default value is 4.
LISTING_LHS_WIDTH
The number of words of data to print on the first line of a listing for a particular source line, where each word is LISTING_WORD_SIZE bytes. The default value is 1.
LISTING_LHS_WIDTH_SECOND
Like LISTING_LHS_WIDTH, but applying to the second and subsequent lines of the data printed for a particular source line. The default value is 1.
LISTING_LHS_CONT_LINES
The maximum number of continuation lines to print in a listing for a particular source line. The default value is 4.
LISTING_RHS_WIDTH
The maximum number of characters to print from one line of the input file. The default value is 100.
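
The grouping effect described under LISTING_WORD_SIZE above can be sketched as follows. This is not GAS's listing code: format_listing_bytes is a hypothetical helper written for this example, and it only reproduces how bytes are clumped into words for a given word size.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not GAS's listing code: write COUNT bytes from
   DATA into OUT as hex digits, inserting one space between groups of
   WORD_SIZE bytes.  WORD_SIZE must be non-zero, and OUT must have room
   for at least 3 * COUNT + 1 characters.  */
static void
format_listing_bytes (char *out, const unsigned char *data, size_t count,
                      size_t word_size)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      /* Start a new word every WORD_SIZE bytes, except at the very start.  */
      if (i != 0 && i % word_size == 0)
        *out++ = ' ';
      out += sprintf (out, "%02x", data[i]);
    }
  *out = '\0';
}
```

With the four bytes 0x12, 0x34, 0x56, 0x78, a word size of 2 gives `1234 5678' and a word size of 1 gives `12 34 56 78', matching the example in the LISTING_WORD_SIZE entry.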

Writing an object format backend

As with the CPU backend, the object format backend must define a few things, and may define some other things. The interface to the object format backend is generally simpler; most of the support for an object file format consists of defining a number of pseudo-ops.

The object format `.h' file must include `targ-cpu.h'.

This section will only define the BFD_ASSEMBLER version of GAS. It is impossible to support a new object file format using any other version anyhow, as the original GAS version only supports a.out, and the MANY_SEGMENTS GAS version only supports COFF.

OBJ_format
By convention, you should define this macro in the `.h' file. For example, `obj-elf.h' defines OBJ_ELF. You might have to use this if it is necessary to add object file format specific code to the CPU file.
obj_begin
If you define this macro, GAS will call it at the start of the assembly, after the command line arguments have been parsed and all the machine independent initializations have been completed.
obj_app_file
If you define this macro, GAS will invoke it when it sees a .file pseudo-op or a `#' line as used by the C preprocessor.
OBJ_COPY_SYMBOL_ATTRIBUTES
You should define this macro to copy object format specific information from one symbol to another. GAS will call it when one symbol is equated to another.
obj_fix_adjustable
You may define this macro to indicate whether a fixup against a locally defined symbol should be adjusted to be against the section symbol. It should return a non-zero value if the adjustment is acceptable.
obj_sec_sym_ok_for_reloc
You may define this macro to indicate that it is OK to use a section symbol in a relocation entry. If it is not, GAS will define a new symbol at the start of a section.
EMIT_SECTION_SYMBOLS
You should define this macro with a zero value if you do not want to include section symbols in the output symbol table. The default value for this macro is one.
obj_adjust_symtab
If you define this macro, GAS will invoke it just before setting the symbol table of the output BFD. For example, the COFF support uses this macro to generate a .file symbol if none was generated previously.
SEPARATE_STAB_SECTIONS
You may define this macro to indicate that stabs should be placed in separate sections, as in ELF.
INIT_STAB_SECTION
You may define this macro to initialize the stabs section in the output file.
OBJ_PROCESS_STAB
You may define this macro to do specific processing on a stabs entry.
obj_frob_section
If you define this macro, GAS will call it for each section at the end of the assembly.
obj_frob_file_before_adjust
If you define this macro, GAS will call it after the symbol values are resolved, but before the fixups have been changed from local symbols to section symbols.
obj_frob_symbol
If you define this macro, GAS will call it for each symbol. You can indicate that the symbol should not be included in the object file by defining this macro to set its second argument to a non-zero value.
obj_frob_file
If you define this macro, GAS will call it after the symbol table has been completed, but before the relocations have been generated.
obj_frob_file_after_relocs
If you define this macro, GAS will call it after the relocs have been generated.
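
The second-argument convention used by obj_frob_symbol (and by tc_frob_symbol in the CPU backend) can be sketched as below. Everything except the macro name is an assumption made for illustration: struct example_symbol stands in for a real GAS symbol, the is_temporary field is an invented format-specific flag, and filter_symbols only imitates the caller's side of the hook.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GAS symbol.  */
struct example_symbol
{
  const char *name;
  int is_temporary;		/* assumption: a format-specific flag */
};

/* The hook written as a macro: set PUNT to a non-zero value to keep
   SYM out of the object file.  */
#define obj_frob_symbol(sym, punt) \
  do { if ((sym)->is_temporary) (punt) = 1; } while (0)

/* Sketch of the caller's side: apply the hook to each symbol and keep
   only those that were not punted.  Returns the number kept; the kept
   symbols are moved to the front of SYMS in their original order.  */
static size_t
filter_symbols (struct example_symbol *syms, size_t count)
{
  size_t i, kept = 0;

  for (i = 0; i < count; i++)
    {
      int punt = 0;

      obj_frob_symbol (&syms[i], punt);
      if (!punt)
        syms[kept++] = syms[i];
    }
  return kept;
}
```

Because the hook is a macro, it can assign to its second argument directly, which is why the documentation speaks of setting the argument instead of returning a value.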

Writing emulation files

Normally you do not have to write an emulation file. You can just use `te-generic.h'.

If you do write your own emulation file, it must include `obj-format.h'.

An emulation file will often define TE_EM; this may then be used in other files to change the output.

