Earley Deduction and Earley Parsing:
Parsing of Arbitrary Context-Free Grammars
Overview
There are a number of parsing algorithms for context-free grammars, but most have restrictions. If you are looking for an efficient algorithm to handle arbitrary context-free grammars, this is the page for you.
The parsing algorithms commonly used for programming languages, including the LL and LR algorithms, can only handle unambiguous grammars. However, when processing natural languages, ambiguous grammars are generally used, since natural languages themselves contain much ambiguity.
Prolog is a logic programming language and every context-free grammar can be written as a Prolog program with virtually no change. A Prolog system can then be used to "execute" the logic program, which results in parsing a string.
Unfortunately, the Prolog execution model results in a top-down parse, which is not generally adequate for natural language grammars.
Earley deduction is an execution model which can be applied to logic programs, but unlike the Prolog execution model, Earley deduction is complete in the sense that it will find all solutions. As a general proof technique, Earley deduction can be applied to an arbitrary set of Horn clauses, an important subset of First-Order Logic.
When Earley deduction is applied to a logic program representing a context-free grammar, the resulting algorithm is known as "Earley Parsing" and "Chart Parsing."
DATALOG is a subset of logic programs, i.e., a subset of Horn clauses. DATALOG is sufficient to encode relational operators and context-free grammars. Definite Clause Grammars (DCGs), a notation for expressing context-free grammars, fits within the DATALOG subset.
These papers go into greater depth on Earley deduction (and by extension, Earley parsing) and discuss optimizations for efficient implementation. Some performance results are also mentioned.
Papers
-
Earley Deduction (12 pages)
Describes Earley deduction; gives straightforward arguments for consistency and completeness; describes the DATALOG subset into which context-free grammars can be written; discusses implementation techniques;
html
pdf
-
Optimizations to Earley Deduction for DATALOG Programs (11 pages)
More details on the efficient implementation of Earley deduction for the DATALOG subset.
html
pdf
About the Author
Harry H. Porter III, Ph.D.
Computer Science Department
Portland State University
Short Bio:
click here
Harry's Website: www.cs.pdx.edu/~harry
Email: harry@cs.pdx.edu
This page was produced May 15, 2009.