Flex & Bison are a set of tools developed by GNU that are used to generate a parser that can deal with structured input. Flex is a tokenizer that divided input character stream into tokens. These tokens are then sent to Bison which will match them against specific grammar rules and perform actions accordingly.
Flex & Bison have been around for more than 40 years. When they first born, there is even no cpp. So at first they only generate parser in c which includes lot of global variables and usually can not be reused. Unfortunately, today even they are officially support cpp, most of the tutorial is still using c code. And that is why I want to do this tutorial to assist you switch to cpp. Here I suppose you have master the basic usage and if you are not you may refer to this article.
The outline of our parser will be like this.
Note: All of the codes are available at the end of this article.
class Driver
{
public:
Driver();
~Driver();
int parse();
int parse_file(std::string& path);
private:
Scanner* scanner; //Generated by Flex
Parser* parser; //Generated by Bison
location* location;//Used to track errors
};
In this part, we will set up out flex file named scanner.l
.
%option nodefault
%option debug
%option noyywrap
%option prefix="MyParser"
%option yylineno
%option c++
First, we need to put the above options into our flex file. Their meanings are as following:
nodefault
: Let flex do not generate default token.debug
: Enable debug infomation.noyywrap
: Flex will continue even if an EOF is read.prefix
: Specific namespace of generated lexer.yylineno
: Count line number.c++
: Ask flex to generate c++ lexer.After this, we can defined our rules like this.
[0-9]{1,} return MyParser::Parser::make_NUM(atoi(yytext),loc);
"-" return MyParser::Parser::make_MINUS(loc);
"+" return MyParser::Parser::make_PLUS(loc);
\n return MyParser::Parser::make_NEWLINE(loc);
[ \t]+ /* ignore whitespace */
. return MyParser::Parser::make_ILLEGAL(std::string(yytext),loc);
These options need to be put into parser.y
.
%locations
%define api.namespace {MyParser}
%define api.parser.class {Parser}
%lex-param {MyParser::Driver &driver}
%parse-param {MyParser::Driver &driver}
%define parse.error verbose
%language "c++"
%define api.value.type variant
%define api.token.constructor
Their meanings are as following:
location
: Enable location tracking.parse.error verbose
: Let bison generate detailed error messages.parse-param
: Parameters passed to yyparse.lex-param
: Parameters passed to yylex.language
: Enable location tracking.variant
: Use cpp variant feature instead of c style union.constructor
: Generate constructor like make_PLUS
we have used in scanner.l
.Then we can add tokens and typed.
%token NEWLINE PLUS MINUS
%token NUM
%token END
%token ILLEGAL
%type EXPR
We may return value type for each grammer. Note that token type must match the rules specificed by scanner.l
.
In order to read input line by line so that we can properly display error message. We need to reload LexerInput
.
virtual size_t LexerInput( char* buf, size_t max_size );
And also update laoction when a token is read or a newline arrives.
/* scanner.l */
#define YY_USER_ACTION \
{loc.columns(yyleng); \
driver.scanner->current_col = \
driver.scanner->current_col_end; \
driver.scanner->current_col_end += yyleng;}
/* parser.y */
STATEMENT :
{
printf("Enter expression:");
}
| STATEMENT EXPR NEWLINE
{
printf("The result is %f\n",$2);
printf("Enter expression:");
driver.location->lines();
driver.location->step();
driver.scanner->reset_current_col();
}
Now we have our location tracked and every line of input buffered. When an error occur, Bison will call Parser::error()
and stop parsing. To prevent this from happening, we can utilize a special rule error
which will match all unrecognized tokens.
/* Error display */
void Parser::error
(const location& loc, const std::string& m)
{
size_t current_col
= driver.scanner->current_col;
std::cout<< "line " <<loc<< ": " <<m<< "\n";
fprintf(stderr,"\t%s\t",
driver.scanner->current_line.c_str());
for(int i = 0; i < current_col; i++)
fprintf(stderr,"~");
fprintf(stderr,"^\n");
}
/* Error recovery */
| STATEMENT error NEWLINE
{
driver.location->lines();
driver.location->step();
driver.scanner->reset_current_col();
printf("Enter expression:");
}
The error message will be like this.
Enter expression:12+6-0&+66
line 1.1-7: syntax error, unexpected ILLEGAL, expecting NEWLINE
12+6-0&+66
~~~~~~^
Enter expression:
For more details and complete codes, you may want to check my github repository here.