Class Parser

public class Parser
{
  // Fields
  protected static final String PackageName;
  protected PrintWriter PuntOutput;
  protected Vector Functions;
  protected Vector Symbols;
  protected Hashtable TypeLUT;
  protected Hashtable StringTypes;
  protected Hashtable OutputClasses;
  protected Hashtable StructureLUT;
  protected Hashtable Precedence;
  protected Vector IncludeHeaders;
  protected Vector ExcludeHeaders;
  protected Vector ExcludeFunctions;
  protected static final boolean ReadSymbols;
  protected static final boolean Suppress_UnknownLib_Functions;
  protected static final boolean Suppress_Unused_Structures;
  protected static final boolean Comment_Variant_Types;
  protected static final int DEBUG;
  protected static final String AnonymousString;
  protected static final String UnknownFileString;
  protected static final String UnknownLibraryString;
  protected static final String ExcludeFunctionFile;
  public static final String CallbackString;
  

  // Constructors
  public Parser();

  // Methods
  public void finalizer();
  protected void ReadExcludeFunctions();
  public static final void usage();
  protected void PopulateTypeLUT();
  protected void SetStringTypes();
  public void Convert() throws InvalidParameterException;
  protected void OutputToClassFile(Function func)
        throws InvalidParameterException;
  public String ConvertFunction(Function func)
        throws InvalidParameterException;
  public String ConvertArgumentType(Variable var, Function func)
        throws UnrecognizedCodeException, InvalidParameterException;
  public void ParseFile(String FileIn) throws UnrecognizedCodeException,
        InvalidParameterException;
  protected void MungeVariables(Function func, StreamTokenizer st)
        throws IOException, UnrecognizedCodeException;
  protected multiFieldDataStructure ReadStructure(StreamTokenizer st,
        boolean insideStructure) throws UnrecognizedCodeException,
        InvalidParameterException, IOException;
  protected Variable readField(StreamTokenizer st, char separator,
        char terminator, boolean isInsideStruct, boolean allowAnonymous)
        throws UnrecognizedCodeException, InvalidParameterException, IOException,
        PuntException;
  public boolean isCOperator(char c);
  public Operator readOperator(StreamTokenizer st, boolean couldBePrefix)
        throws IOException;
  protected void PackHandler(StreamTokenizer st, Stack packsize)
        throws InvalidParameterException, IOException, UnrecognizedCodeException;
  protected void ParseSymbolFile(String File) throws BadInputFileException,
        InvalidParameterException;
  protected void CompareFunctionWithSymbols();
  public void UnifyFunctions();
  public void UnifyStructures();
  protected void SetupFileFilters();
  protected void SetupOutputClasses();
  protected void SetupPrecedenceTable();
  protected boolean CheckFile(String File);
  protected Function findFunction(String Name);
  protected void FindLibrary(Function func) throws InvalidParameterException;
  public void OfficeFunctions(String OfficeFileName, String MissingFileName);
  public void WriteOutFunctions(PrintWriter pw);
  public void ReadListofSymbolFiles(String list)
        throws InvalidParameterException;
  public static final void main(String args[]);
}

Parser for Win32 API header files

By Brian Grunkemeyer, June-August 1997

This tool was used to generate a significant portion of the Win32 API classes. It is being included for you to use and modify to fit your specific needs. Remember that C header files were not designed to be language-independent descriptions, and that there is more than one correct way to represent some data types in Java. Thus, some functions will require hand-translation. For information on how to do this, see the J/Direct documentation.

Notes:

Constructors

Parser

public Parser();

Methods

CheckFile

protected boolean CheckFile(String File);

Decides if we should examine the current file or not. Checks IncludeHeaders, ExcludeHeaders, whether it's an IDL file, and whether its name starts with "mm".

Return Value

Returns true if we should parse it, else false.

Parameter  Description
File       Header file we may want to parse.
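The filtering rules above can be sketched roughly as follows. This is not the parser's code: the class HeaderFilter is hypothetical, the lists are assumed to hold lower-case file names, and the direction of the IDL and "mm" rules (both treated as "skip") is an assumption, since the entry only says that these properties are checked. The fallback of parsing unlisted headers with a warning follows the SetupFileFilters description.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of CheckFile-style filtering; not the parser's implementation.
final class HeaderFilter {
    private final List<String> includeHeaders; // assumed lower-case file names
    private final List<String> excludeHeaders; // assumed lower-case file names

    HeaderFilter(List<String> includeHeaders, List<String> excludeHeaders) {
        this.includeHeaders = includeHeaders;
        this.excludeHeaders = excludeHeaders;
    }

    boolean checkFile(String file) {
        String name = file.toLowerCase(Locale.ROOT);
        if (excludeHeaders.contains(name)) {
            return false; // explicitly excluded
        }
        if (name.endsWith(".idl")) {
            return false; // assumption: IDL (COM) files are skipped
        }
        if (name.startsWith("mm")) {
            return false; // assumption: "mm" headers are skipped
        }
        if (!includeHeaders.contains(name)) {
            // Unlisted headers are still parsed, but with a warning.
            System.err.println("Warning: " + file + " is not in IncludeHeaders");
        }
        return true;
    }
}
```

Checking exclusions first means a header cannot be both excluded and parsed; whether the real method uses the same order is not documented here.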

CompareFunctionWithSymbols

protected void CompareFunctionWithSymbols();

Compares parsed functions with symbols from a DLL. Assumes a file has been parsed and a symbols file has been read in.

Return Value

No return value.

Convert

public void Convert() throws InvalidParameterException;

Converts all functions and structures from C to Java.

Return Value

No return value.

ConvertArgumentType

public String ConvertArgumentType(Variable var, Function func)
        throws UnrecognizedCodeException, InvalidParameterException;

Converts a C function argument's type into the equivalent Java type. Also determines how to do any string conversion (ANSI vs. Unicode) by setting the function's stringformat.

Return Value

Returns String holding Java type name.

Parameter  Description
var        Variable object to convert
func       Function containing this Variable

Exceptions

UnrecognizedCodeException if the argument's C type cannot be converted to Java.

ConvertFunction

public String ConvertFunction(Function func)
        throws InvalidParameterException;

Converts a C function prototype to a Java wrapper.

Return Value

Returns String of converted Java wrapper or "" if conversion failed.

Parameter  Description
func       A Function object to convert.

Exceptions

InvalidParameterException if func is null.

finalizer

public void finalizer();

findFunction

protected Function findFunction(String Name);

Finds a Function with the given name in the Functions vector, returning the Function object.

Return Value

Returns a reference to the Function, or null if no function with that name exists.

Parameter  Description
Name       Name of the Function to look for.

FindLibrary

protected void FindLibrary(Function func) throws InvalidParameterException;

Finds which library a function occurs in, based on the loaded symbol files. Assumes the symbol tables have already been set up. Sets the Function's library field.

Return Value

No return value.

Parameter  Description
func       Function to search for in the symbol tables.
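A lookup of this kind can be sketched with a table mapping each exported symbol to the symbol file it came from. The class LibraryLocator and the placeholder string UNKNOWN_LIBRARY are hypothetical (the real value of UnknownLibraryString is not documented here), and the sketch returns the library name instead of setting a field on a Function. Trying the "A" and "W" suffixes is also an assumption, motivated by the fact that a unified name such as AddAtom is exported only as AddAtomA and AddAtomW.

```java
import java.util.Hashtable;

// Hypothetical sketch of a FindLibrary-style lookup; not the parser's implementation.
final class LibraryLocator {
    // Placeholder; the real UnknownLibraryString value is not documented here.
    static final String UNKNOWN_LIBRARY = "<unknown library>";

    // Exported symbol name -> symbol file it was read from, e.g. "AddAtomA" -> "kernel32.sym".
    private final Hashtable<String, String> symbols = new Hashtable<>();

    void addSymbol(String name, String symbolFile) {
        symbols.put(name, symbolFile);
    }

    // Tries the exact name first, then the ANSI ("A") and Unicode ("W") exports.
    String findLibrary(String functionName) {
        String[] candidates = { functionName, functionName + "A", functionName + "W" };
        for (String candidate : candidates) {
            String library = symbols.get(candidate);
            if (library != null) {
                return library;
            }
        }
        return UNKNOWN_LIBRARY;
    }
}
```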

isCOperator

public boolean isCOperator(char c);

Is this character an operator or a valid first token in an operator in C?

Return Value

Returns true if c is a C operator or the first character of a C operator, else false.

Parameter  Description
c          char to test
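A test like this reduces to set membership over the characters that can begin a C operator (for example, '-' begins "-", "--", "-=" and "->"). The class COperators and the exact character set below are assumptions: treating brackets, parentheses, '?', ':' and ',' as operator characters is a judgment call, and the parser's own set may differ.

```java
// Hypothetical sketch; the parser's actual character set is not documented here.
final class COperators {
    // Characters that are C operators or can start a multi-character operator.
    private static final String OPERATOR_CHARS = "+-*/%=!<>&|^~?:,.[]()";

    static boolean isCOperator(char c) {
        return OPERATOR_CHARS.indexOf(c) >= 0;
    }
}
```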

main

public static final void main(String args[]);

Main. Runs the application.

Return Value

No return value.

Parameter  Description
args[]     Array of Strings containing command line parameters.

MungeVariables

protected void MungeVariables(Function func, StreamTokenizer st)
        throws IOException, UnrecognizedCodeException;

MungeVariables takes the current function name and a StreamTokenizer positioned right after the first parenthesis. It prints the function name, a tab, the type of a parameter, then the parameter name on a line for every parameter.

Example: void WINAPI foo(int, char ch);

Translates into:
foo\t void
foo\t int\t <anonymous>
foo\t char\t ch

Return Value

No return value.

Parameter  Description
func       Function object whose arguments are being munged.
st         StreamTokenizer positioned right after beginning '(' of function arguments.

Exceptions

IOException if the StreamTokenizer has an I/O problem.

UnrecognizedCodeException if it can't parse a variable (unlikely, but possible)
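The output format in the example above can be reproduced with a short sketch. It only illustrates the format: the real method reads from a StreamTokenizer and handles full C declarations, while this hypothetical Munger class takes the parameter list as a plain string and understands only bare `type` and `type name` parameters (no pointers attached to names, no multi-word anonymous types such as `unsigned int`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the MungeVariables output format only; not the parser's code.
final class Munger {
    static List<String> munge(String functionName, String returnType, String params) {
        List<String> lines = new ArrayList<>();
        lines.add(functionName + "\t" + returnType); // first line: return type
        String trimmed = params.trim();
        if (trimmed.isEmpty() || trimmed.equals("void")) {
            return lines; // no parameters
        }
        for (String param : trimmed.split(",")) {
            String[] tokens = param.trim().split("\\s+");
            if (tokens.length == 1) {
                // Only a type was given, as in foo(int, ...).
                lines.add(functionName + "\t" + tokens[0] + "\t<anonymous>");
            } else {
                // Last token is the name; everything before it is the type.
                String type = String.join(" ", Arrays.copyOf(tokens, tokens.length - 1));
                lines.add(functionName + "\t" + type + "\t" + tokens[tokens.length - 1]);
            }
        }
        return lines;
    }
}
```

For example, `Munger.munge("foo", "void", "int, char ch")` yields the three lines from the example above.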

OfficeFunctions

public void OfficeFunctions(String OfficeFileName, String MissingFileName);

Given a file name for a list of function names, it will print out the ones not in the parser's internal storage. Writes out the missing function names to the screen and to a file.

Return Value

No return value.

Parameter        Description
OfficeFileName   File containing a list of function names, separated by whitespace
MissingFileName  File to output all missing function names to
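The comparison amounts to a set-membership check over whitespace-separated names. In this hypothetical sketch, MissingFunctionReport takes a Reader and a PrintWriter instead of file names so it can run without touching the file system, and the set of known names stands in for the parser's Functions vector.

```java
import java.io.PrintWriter;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical sketch of an OfficeFunctions-style report; not the parser's code.
final class MissingFunctionReport {
    static List<String> findMissing(Reader nameList, Set<String> knownFunctions, PrintWriter missingOut) {
        List<String> missing = new ArrayList<>();
        Scanner scanner = new Scanner(nameList); // default delimiter is whitespace
        while (scanner.hasNext()) {
            String name = scanner.next();
            if (!knownFunctions.contains(name)) {
                missing.add(name);
                System.out.println(name); // to the screen
                missingOut.println(name); // and to the output file
            }
        }
        missingOut.flush();
        return missing;
    }
}
```

The scanner is deliberately not closed, because closing it would also close the caller's Reader.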

OutputToClassFile

protected void OutputToClassFile(Function func)
        throws InvalidParameterException;

Writes out a Function to the correct class file for that function.

Return Value

No return value.

Parameter  Description
func       Function to write to a class.

Exceptions

InvalidParameterException if func is null.

PackHandler

protected void PackHandler(StreamTokenizer st, Stack packsize)
        throws InvalidParameterException, IOException, UnrecognizedCodeException;

Handles #pragma pack lines. Adjusts packsize stack as needed.

Return Value

No return value.

Parameter  Description
st         StreamTokenizer positioned on a #pragma pack (specifically on "pack").
packsize   Stack of PackContainer objects representing the current alignment.

Exceptions

InvalidParameterException if st isn't positioned on the word "pack"

IOException if st encounters an I/O error.

UnrecognizedCodeException if PackHandler encounters syntax error.
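The stack discipline behind #pragma pack can be modelled in a few lines. This hypothetical PackStack class stores plain ints instead of PackContainer objects and does not tokenize anything; its methods correspond to the pack(n), pack(), pack(push), pack(push, n) and pack(pop) forms. The starting value of 8 bytes is an assumption based on the 32-bit Visual C++ default, and popping an empty stack is silently ignored here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the alignment stack PackHandler maintains; not the parser's code.
final class PackStack {
    // Assumption: 8-byte packing is the starting value (32-bit Visual C++ default).
    static final int DEFAULT_PACK = 8;

    private final Deque<Integer> saved = new ArrayDeque<>();
    private int alignment = DEFAULT_PACK;

    void set(int n) { alignment = n; }                          // #pragma pack(n)
    void reset() { alignment = DEFAULT_PACK; }                  // #pragma pack()
    void push() { saved.push(alignment); }                      // #pragma pack(push)
    void push(int n) { saved.push(alignment); alignment = n; }  // #pragma pack(push, n)

    void pop() {                                                // #pragma pack(pop)
        if (!saved.isEmpty()) {
            alignment = saved.pop();
        }
    }

    int alignment() { return alignment; }
}
```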

ParseFile

public void ParseFile(String FileIn) throws UnrecognizedCodeException,
        InvalidParameterException;

ParseFile(String) reads in the file you pass it and stores its functions in a vector of functions and its structures in a hash table of structures. This is the main input processing function.

Return Value

No return value.

Parameter  Description
FileIn     Filename to parse.

Exceptions

UnrecognizedCodeException if there was a parsing problem.

InvalidParameterException if there's a problem with a function called by this one.

ParseSymbolFile

protected void ParseSymbolFile(String File) throws BadInputFileException,
        InvalidParameterException;

Reads in a symbol file, adding all symbols to the Symbol table, noting which file each symbol came from for use later when putting functions in files.

Symbol files can be generated by calling dumpbin on a library, like this:

dumpbin /exports c:\windows\system\kernel32.dll > kernel32.sym

Do not edit symbol files, except to get rid of function names with question marks or other really odd names in them. You can leave the Microsoft dumpbin header and trailer info in, or you can remove them if you need to. (It's not used by this program, but there is a keyword used to stop skipping over the header.) Symbol files should contain info like this:

                  1    0   AddAtomA  (000079FE)
                  2    1   AddAtomW  (00004478)

Return Value

No return value.

Parameter  Description
File       Symbol file name.

Exceptions

BadInputFileException if file is not strictly the output of dumpbin /exports

InvalidParameterException if one of the functions called here failed.
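Recognizing one export line in the format shown above can be sketched with a regular expression. The class DumpbinLine and its pattern are hypothetical and assume every export line has exactly the four columns in the sample (decimal ordinal, hex hint, name, hex address in parentheses). Header and trailer lines simply fail to match. Later versions of dumpbin print a different column layout and forwarded exports, which this pattern does not handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch for the dumpbin /exports format shown above; not the parser's code.
final class DumpbinLine {
    // ordinal (decimal), hint (hex), name, address in parentheses (hex)
    private static final Pattern EXPORT = Pattern.compile(
            "^\\s*(\\d+)\\s+([0-9A-Fa-f]+)\\s+(\\S+)\\s+\\(([0-9A-Fa-f]+)\\)\\s*$");

    // Returns the exported name, or null for lines that are not export entries.
    static String exportedName(String line) {
        Matcher m = EXPORT.matcher(line);
        return m.matches() ? m.group(3) : null;
    }
}
```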

PopulateTypeLUT

protected void PopulateTypeLUT();

Inserts C type names and their corresponding Java types into TypeLUT, this class's internal hashtable. Edit this function if you want to handle another type name in a different way.

Tricky types are handled by leaving their type names as they were. Then you are forced to deal with them yourself when you try to compile the resulting file.

Most pointer-to-function types were taken out, in the hope of recognizing them at runtime. They need some special-case handling anyway that I don't think we can do easily.

Return Value

No return value.
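A type lookup table of this kind, including the "leave tricky types alone" fallback, might look like the following. The class TypeTable and its entries are illustrative only: they loosely follow common J/Direct conventions (for example, LPCSTR as String and LPSTR as StringBuffer) and are not the parser's actual table, so check them against the J/Direct documentation before relying on them.

```java
import java.util.Hashtable;

// Hypothetical sketch of a TypeLUT-style table; the entries are illustrative.
final class TypeTable {
    private final Hashtable<String, String> typeLUT = new Hashtable<>();

    TypeTable() {
        typeLUT.put("DWORD", "int");
        typeLUT.put("WORD", "short");
        typeLUT.put("BYTE", "byte");
        typeLUT.put("BOOL", "boolean");
        typeLUT.put("LPCSTR", "String");       // read-only string
        typeLUT.put("LPSTR", "StringBuffer");  // string the callee may write to
    }

    // Unknown ("tricky") types are returned unchanged, so the generated
    // Java file fails to compile until they are fixed by hand.
    String toJava(String cType) {
        String javaType = typeLUT.get(cType);
        return javaType != null ? javaType : cType;
    }
}
```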

ReadExcludeFunctions

protected void ReadExcludeFunctions();

Reads the list of functions to exclude from the file named by ExcludeFunctionFile.

Return Value

No return value.

readField

protected Variable readField(StreamTokenizer st, char separator,
        char terminator, boolean isInsideStruct, boolean allowAnonymous)
        throws UnrecognizedCodeException, InvalidParameterException, IOException,
        PuntException;

Given a StreamTokenizer, it will read a variable type and name, including the more complex user-defined data types like unions and structs. Assumes it is being called on text within a structure or a union, although it should work with functions too. Recursively calls ReadStructure if it hits a structure embedded within this one. Reads a field up to the ending separator or terminator, leaving st positioned there. Also tries to handle some slightly different conversion rules while reading fields from structures.

Return Value

Returns a Variable object (or if isInsideStruct is true, a Field) representing the field read in.

Parameter       Description
st              StreamTokenizer positioned at beginning of a field.
separator       character used to separate multiple fields
terminator      character used to end a list of fields
isInsideStruct  whether we're reading a data structure
allowAnonymous  whether we can have anonymous data types declared in place.

Exceptions

UnrecognizedCodeException if the function gets lost.

InvalidParameterException if separator or terminator equal StreamTokenizer.TT_WORD, or if st is null.

IOException if StreamTokenizer has an IO problem.

PuntException if the parser doesn't understand this field or is told to ignore it, based on name and/or type.

ReadListofSymbolFiles

public void ReadListofSymbolFiles(String list)
        throws InvalidParameterException;

Reads in a file containing filenames of symbol files, then subsequently parses each symbol file. Filenames should be separated by newlines. Comment char is '#'.

Return Value

No return value.

Parameter  Description
list       name of file containing paths to symbol files.

Exceptions

InvalidParameterException if list file isn't in the correct format.
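Reading such a list can be sketched as follows. The class SymbolFileList is hypothetical, and it assumes that '#' starts a comment running to the end of the line and that blank lines are ignored; the entry above only says that '#' is the comment character. Each returned path would then be handed to ParseSymbolFile.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading a newline-separated list of symbol files.
final class SymbolFileList {
    static List<String> read(Reader in) throws IOException {
        List<String> paths = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // assumption: comment runs to end of line
            }
            line = line.trim();
            if (!line.isEmpty()) {
                paths.add(line);
            }
        }
        return paths;
    }
}
```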

readOperator

public Operator readOperator(StreamTokenizer st, boolean couldBePrefix)
        throws IOException;

Reads in a C++ operator, given a set of constraints on what this operator could be.

Return Value

Returns Operator instance of token we just read.

Parameter  Description
st         StreamTokenizer positioned at start of operator.

Exceptions

IOException if StreamTokenizer has problems.

ReadStructure

protected multiFieldDataStructure ReadStructure(StreamTokenizer st,
        boolean insideStructure) throws UnrecognizedCodeException,
        InvalidParameterException, IOException;

Parses a multiFieldDataStructure, reading in its fields, etc. Returns the Struct or Union object.

Return Value

Returns a new multiFieldDataStructure object.

Parameter        Description
st               StreamTokenizer positioned at the struct or union keyword.
insideStructure  true if ReadStructure is nested in a struct or union.

Exceptions

UnrecognizedCodeException if the struct was unparsable.

InvalidParameterException if StreamTokenizer wasn't positioned on struct.

IOException if StreamTokenizer can't read stream.

SetStringTypes

protected void SetStringTypes();

Fills the StringTypes Hashtable with all String types and how to convert them from Unicode to whatever format is needed. Punts on TCHAR and derivatives, setting them to auto.

Return Value

No return value.

SetupFileFilters

protected void SetupFileFilters();

Builds file Include and Exclude lists. Here is where we set IncludeHeaders and ExcludeHeaders to their original values. This should be edited when you add a new set of libraries, although the default rules should let your own header files be parsed with a warning. Remember, this program can't parse COM.

Return Value

No return value.

SetupOutputClasses

protected void SetupOutputClasses();

Initializes the hash table containing package files for each of the DLLs. To add a new output class, create a PrintWriter for it, output the header info and class name to it, and add it to the hash table, using the symbol file name as the key. This function totally controls how various functions are routed into their own Java classes.

Return Value

No return value.

SetupPrecedenceTable

protected void SetupPrecedenceTable();

Sets up a table of C operator precedence, taken from the VC++ 5 online help. Keys are Strings containing the operator, and values are ints describing its precedence, with 0 being the lowest. Operators that are spelled the same way are distinguished by appending odd characters to the key that convey some sense of their meaning. I spaced out the precedence numbers to leave room for new operators, in case the table was incomplete or the ISO committee goes change-happy.

Return Value

No return value.
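A table with spaced-out precedence values can be sketched like this. The class PrecedenceTable is hypothetical and lists only a representative subset of C operators, ordered from lowest to highest as in standard C. The "u" suffix used to tell unary '&' and '*' apart from their binary forms is an invented key convention, not the one the parser uses.

```java
import java.util.Hashtable;

// Hypothetical sketch of a precedence table; a representative subset only.
final class PrecedenceTable {
    private final Hashtable<String, Integer> precedence = new Hashtable<>();

    PrecedenceTable() {
        // Lowest to highest; steps of 10 leave gaps for operators added later.
        precedence.put(",", 0);
        precedence.put("=", 10);
        precedence.put("?:", 20);
        precedence.put("||", 30);
        precedence.put("&&", 40);
        precedence.put("|", 50);
        precedence.put("^", 60);
        precedence.put("&", 70);    // binary bitwise AND
        precedence.put("==", 80);
        precedence.put("<", 90);
        precedence.put("<<", 100);
        precedence.put("+", 110);   // binary plus
        precedence.put("*", 120);   // multiplication
        precedence.put("&u", 130);  // hypothetical key: unary address-of
        precedence.put("*u", 130);  // hypothetical key: unary dereference
    }

    int of(String op) {
        Integer p = precedence.get(op);
        if (p == null) {
            throw new IllegalArgumentException("unknown operator: " + op);
        }
        return p;
    }

    // True if operator a binds more tightly than operator b.
    boolean bindsTighter(String a, String b) {
        return of(a) > of(b);
    }
}
```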

UnifyFunctions

public void UnifyFunctions();

Scans through the functions read in, looking for the ASCII and Unicode versions of each function. If it finds both, it strips off the last character of the name, merging them into one function call. Deals with the 4 special cases I found in the Win32 API.

Return Value

No return value.
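The core A/W merge can be sketched on plain names. The class AnsiUnicodeMerger is hypothetical, works on a list of strings rather than Function objects, and does not model the four special cases mentioned above. A name is merged only when both its "A" and "W" forms are present, so a lone LoadLibraryA is kept unchanged.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of merging ANSI/Unicode name pairs; not the parser's code.
final class AnsiUnicodeMerger {
    static Set<String> unify(List<String> names) {
        Set<String> all = new TreeSet<>(names);
        Set<String> result = new TreeSet<>();
        for (String name : all) {
            if (name.length() > 1 && (name.endsWith("A") || name.endsWith("W"))) {
                String base = name.substring(0, name.length() - 1);
                char other = name.endsWith("A") ? 'W' : 'A';
                if (all.contains(base + other)) {
                    result.add(base); // both variants exist: keep one merged name
                    continue;
                }
            }
            result.add(name); // no matching pair: keep the name as-is
        }
        return result;
    }
}
```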

UnifyStructures

public void UnifyStructures();

Scans through the structures read in, looking for the ASCII and Unicode versions of each structure. If it finds both, it strips off the last character of the name, merging them into one structure.

Return Value

No return value.

usage

public static final void usage();

Prints the command line syntax to stdout.

Return Value

No return value.

WriteOutFunctions

public void WriteOutFunctions(PrintWriter pw);

Prints the functions out to the PrintWriter, using the format specified in Function.toString().

Return Value

No return value.

Parameter  Description
pw         PrintWriter to send output to.

Fields

AnonymousString
CallbackString
Comment_Variant_Types
CopyrightNotice
DEBUG
ExcludeFunctionFile
ExcludeFunctions
ExcludeHeaders
Functions
IncludeHeaders
OutputClasses
PackageName
Precedence
PuntOutput
ReadSymbols
StringTypes
StructureLUT
Suppress_UnknownLib_Functions
Suppress_Unused_Structures
Symbols
TypeLUT
UnknownFileString
UnknownLibraryString