Flex & Bison are a set of tools developed by GNU that are used to generate a parser that can deal with structured input. Flex is a tokenizer that divided input character stream into tokens. These tokens are then sent to Bison which will match them against specific grammar rules and perform actions accordingly.
Flex & Bison have been around for more than 40 years. When they first born, there is even no cpp. So at first they only generate parser in c which includes lot of global variables and usually can not be reused. Unfortunately, today even they are officially support cpp, most of the tutorial is still using c code. And that is why I want to do this tutorial to assist you switch to cpp. Here I suppose you have master the basic usage and if you are not you may refer to this article.
The outline of our parser will be like this.
Note: All of the codes are available at the end of this article.
class Driver
{
public:
Driver();
~Driver();
int parse();
int parse_file(std::string& path);
private:
Scanner* scanner; //Generated by Flex
Parser* parser; //Generated by Bison
location* location;//Used to track errors
};
In this part, we will set up out flex file named scanner.l.
%option nodefault
%option debug
%option noyywrap
%option prefix="MyParser"
%option yylineno
%option c++
First, we need to put the above options into our flex file. Their meanings are as following:
nodefault: Let flex do not generate default token.debug: Enable debug infomation.noyywrap: Flex will continue even if an EOF is read.prefix: Specific namespace of generated lexer.yylineno: Count line number.c++: Ask flex to generate c++ lexer.After this, we can defined our rules like this.
[0-9]{1,} return MyParser::Parser::make_NUM(atoi(yytext),loc);
"-" return MyParser::Parser::make_MINUS(loc);
"+" return MyParser::Parser::make_PLUS(loc);
\n return MyParser::Parser::make_NEWLINE(loc);
[ \t]+ /* ignore whitespace */
. return MyParser::Parser::make_ILLEGAL(std::string(yytext),loc);
These options need to be put into parser.y.
%locations
%define api.namespace {MyParser}
%define api.parser.class {Parser}
%lex-param {MyParser::Driver &driver}
%parse-param {MyParser::Driver &driver}
%define parse.error verbose
%language "c++"
%define api.value.type variant
%define api.token.constructor
Their meanings are as following:
location: Enable location tracking.parse.error verbose: Let bison generate detailed error messages.parse-param: Parameters passed to yyparse.lex-param: Parameters passed to yylex.language: Enable location tracking.variant: Use cpp variant feature instead of c style union.constructor: Generate constructor like make_PLUS we have used in scanner.l.Then we can add tokens and typed.
%token NEWLINE PLUS MINUS
%token NUM
%token END
%token ILLEGAL
%type EXPR
We may return value type for each grammer. Note that token type must match the rules specificed by scanner.l.
In order to read input line by line so that we can properly display error message. We need to reload LexerInput.
virtual size_t LexerInput( char* buf, size_t max_size );
And also update laoction when a token is read or a newline arrives.
/* scanner.l */
#define YY_USER_ACTION \
{loc.columns(yyleng); \
driver.scanner->current_col = \
driver.scanner->current_col_end; \
driver.scanner->current_col_end += yyleng;}
/* parser.y */
STATEMENT :
{
printf("Enter expression:");
}
| STATEMENT EXPR NEWLINE
{
printf("The result is %f\n",$2);
printf("Enter expression:");
driver.location->lines();
driver.location->step();
driver.scanner->reset_current_col();
}
Now we have our location tracked and every line of input buffered. When an error occur, Bison will call Parser::error() and stop parsing. To prevent this from happening, we can utilize a special rule error which will match all unrecognized tokens.
/* Error display */
void Parser::error
(const location& loc, const std::string& m)
{
size_t current_col
= driver.scanner->current_col;
std::cout<< "line " <<loc<< ": " <<m<< "\n";
fprintf(stderr,"\t%s\t",
driver.scanner->current_line.c_str());
for(int i = 0; i < current_col; i++)
fprintf(stderr,"~");
fprintf(stderr,"^\n");
}
/* Error recovery */
| STATEMENT error NEWLINE
{
driver.location->lines();
driver.location->step();
driver.scanner->reset_current_col();
printf("Enter expression:");
}
The error message will be like this.
Enter expression:12+6-0&+66
line 1.1-7: syntax error, unexpected ILLEGAL, expecting NEWLINE
12+6-0&+66
~~~~~~^
Enter expression:
For more details and complete codes, you may want to check my github repository here.