Terence Ng
| Blogs

Flex & Bison Cpp Tutorial

What is Flex & Bison ?

Flex & Bison are a set of tools developed by GNU that are used to generate a parser that can deal with structured input. Flex is a tokenizer that divided input character stream into tokens. These tokens are then sent to Bison which will match them against specific grammar rules and perform actions accordingly.

Flex & Bison and c++

Flex & Bison have been around for more than 40 years. When they first born, there is even no cpp. So at first they only generate parser in c which includes lot of global variables and usually can not be reused. Unfortunately, today even they are officially support cpp, most of the tutorial is still using c code. And that is why I want to do this tutorial to assist you switch to cpp. Here I suppose you have master the basic usage and if you are not you may refer to this article.

Setup Driver

The outline of our parser will be like this.
Note: All of the codes are available at the end of this article.

class Driver
{
public:
    Driver();
    ~Driver();

    int parse();
    int parse_file(std::string& path);

private:
    Scanner*    scanner; //Generated by Flex
    Parser*        parser;  //Generated by Bison
    location*    location;//Used to track errors
};

Setup Scanner

In this part, we will set up out flex file named scanner.l.

%option nodefault
%option debug
%option noyywrap
%option prefix="MyParser"
%option yylineno
%option c++

First, we need to put the above options into our flex file. Their meanings are as following:

After this, we can defined our rules like this.

[0-9]{1,} return MyParser::Parser::make_NUM(atoi(yytext),loc);
"-" return MyParser::Parser::make_MINUS(loc);
"+" return MyParser::Parser::make_PLUS(loc);
\n return MyParser::Parser::make_NEWLINE(loc);
[ \t]+ /* ignore whitespace */
. return MyParser::Parser::make_ILLEGAL(std::string(yytext),loc);

Setup Parser

These options need to be put into parser.y.

%locations
%define api.namespace {MyParser}
%define api.parser.class {Parser}
%lex-param {MyParser::Driver &driver}
%parse-param {MyParser::Driver &driver}
%define parse.error verbose
%language "c++"
%define api.value.type variant
%define api.token.constructor

Their meanings are as following:

Then we can add tokens and typed.

%token NEWLINE PLUS MINUS 
%token  NUM 
%token END
%token  ILLEGAL
%type  EXPR    

We may return value type for each grammer. Note that token type must match the rules specificed by scanner.l.

Error Detection and Recovery

In order to read input line by line so that we can properly display error message. We need to reload LexerInput.

virtual size_t LexerInput( char* buf, size_t max_size );

And also update laoction when a token is read or a newline arrives.

/* scanner.l */
#define YY_USER_ACTION \
{loc.columns(yyleng); \
driver.scanner->current_col = \
    driver.scanner->current_col_end; \
driver.scanner->current_col_end += yyleng;}

/* parser.y */
STATEMENT : 
{  
    printf("Enter expression:");
}
| STATEMENT EXPR NEWLINE
{
    printf("The result is %f\n",$2);
    printf("Enter expression:");
    driver.location->lines();
    driver.location->step();
    driver.scanner->reset_current_col();
}

Now we have our location tracked and every line of input buffered. When an error occur, Bison will call Parser::error() and stop parsing. To prevent this from happening, we can utilize a special rule error which will match all unrecognized tokens.

/* Error display */
void Parser::error
    (const location& loc, const std::string& m)
{
    size_t current_col 
        = driver.scanner->current_col;
    std::cout<< "line " <<loc<< ": " <<m<< "\n";
    fprintf(stderr,"\t%s\t",
        driver.scanner->current_line.c_str());
    for(int i = 0; i < current_col; i++)
        fprintf(stderr,"~");
    fprintf(stderr,"^\n");
    }

/* Error recovery */
| STATEMENT error NEWLINE
{
    driver.location->lines();
    driver.location->step();
    driver.scanner->reset_current_col();
    printf("Enter expression:");
}

The error message will be like this.

Enter expression:12+6-0&+66
line 1.1-7: syntax error, unexpected ILLEGAL, expecting NEWLINE
    12+6-0&+66
    ~~~~~~^
Enter expression:

Code Download

For more details and complete codes, you may want to check my github repository here.