LALR(1) parser in C#
Always being interested in parsers, I decided to try my luck on writing an LALR(1) parser in C#. Traditionally these parsers (Yacc, Bison) have been written so that you must run the scanner/lexer first, then include the output into your project as separate files.
I decided it was smarter if everything was assembled into one Parser class, which you could then instantiate in your program. I wanted to be able to configure it via a grammar also based on objects, build it, and then run the parser every time I wanted to parse a text string.
This is similar to the way the Regex class in C# works. You compile your regex first by instantiating a Regex class and give it the regex pattern, thereafter you can use it to match text strings multiple times.
An example
The following example is for a parser implementing the following grammar (written in BNF notation)
Expr ::= <Term>
Term ::= <Term> Plus <Factor>
| <Term> Minus <Factor>
| <Factor>
Factor ::= <Factor> Multiply <Primary>
| <Factor> Divide <Primary>
| <Primary>
Primary ::= Number
| Pi
| LPar <Term> RPar
| Minus <Primary>
Terminals and non-terminals
Instead of having the grammar in a separate text file, the grammar for this parser is built by creating a number of Terminal and NonTerminal objects:
// Terminals
Terminal Plus = new Terminal("plus", "\\+");
Terminal Minus = new Terminal("minus", "-");
Terminal Multiply = new Terminal("multiply", "\\*");
Terminal Divide = new Terminal("divide", "/");
Terminal LPar = new Terminal("lPar", "\\(");
Terminal RPar = new Terminal("rPar", "\\)");
Terminal Pi = new Terminal("pi", "pi");
ValueTerminal Number = new ValueTerminal("number", "([0-9]*\\.)?[0-9]+”);
// Non-terminals
NonTerminal Expr = new NonTerminal(”Expression”);
NonTerminal Term = new NonTerminal(”Term”);
NonTerminal Factor = new NonTerminal(”Factor”);
NonTerminal Primary = new NonTerminal(”Primary”);
The Terminal class constructor takes two parameters: A name (used for debugging) and a regex pattern. ValueTerminal is a terminal whose value is of interest, and which should be available to the code fragments that are evaluated when the parser is run (see later).
Productions
In addition to declaring terminals and non-terminals, the set of productions must also be specified:
new Production3(Term, new Symbol[] { Term, Plus, Factor }, (a, b) => a + b);
new Production3(Term, new Symbol[] { Term, Minus, Factor }, (a, b) => a - b);
new Production(Term, new Symbol[] { Factor });
new Production3(Factor, new Symbol[] { Factor, Multiply, Primary }, (a, b) => a * b);
new Production3(Factor, new Symbol[] { Factor, Divide, Primary }, (a, b) => a / b);
new Production(Factor, new Symbol[] { Primary });
new Production2(Primary, new Symbol[] { Number }, a => Double.Parse(a));
new Production1(Primary, new Symbol[] { Pi }, () => Math.PI);
new Production(Primary, new Symbol[] { LPar, Term, RPar });
new Production2(Primary, new Symbol[] { Minus, Primary }, a => -a);
The various Production classes take as their first parameter the non-terminal that goes on the laft hand side of the ::= operator of the production. The second parameter is a list of terminals and non-terminals that correspond to the right side of the ::= operator.
The third parameter - which is optional - is a delegate that should be called whenever the corresponding production is reduced. That is, when a number of symbols have been read and matched against the right hand side of the production.
If the Production class is used, the delegate will receive one parameter, an Object[], containing the values of the symbols on the right hand side of the ::= operatorand. It should also return an Object. If one of the template classes are used, the delegate will be called with a number of parameters corresponding to the actual number of symbols on the right hand side of the ::= operator. The parameters will be correctly typed depending on the types specified for the template.
- Use Production1<R> when there are no input parameters and you only want to specify the type (R) of the output of the delegate
- Use Production2<A,R> when there is one input parameter (with type A) and output (with type R)
- Use Production3<A,B,R> when there are two input parameters (with types A and B) and output (with type R)
- etc.
Note that not all symbols on the right hand side of the ::= operator will be passed on to the delegate. Only those that represent non-terminals or value-terminals, as mentioned above. This is because the value of a terminal like “+” is always “+” and is therefore not significant.
Creating the parser
To create the parser, create a Grammar and give that and the start symbol as input to the Parser’s constructor:
parser = new Parser<double>(
new Grammar(
new NonTerminal[] { Expr, Term, Factor, Primary },
new Terminal[] { Plus, Minus, Multiply, Divide, LPar, RPar, Pi, Number }
),
Expr
);
The parser can now be used to parse text strings, like this:
double result = parser.Parse("(2+3)*4");
Which will put 20.0 in result.
