- Source: GitHub
- Reference:
Interpreters - Context Free Grammar
Agenda
- Introduction to Context Free Grammar
- Defining CFG rules for Lua
What is Context Free Grammar?
-
A formal grammar for defining the syntax of a language
-
Consists of a set of rules describing all possible strings
-
Need to establish the syntax rules before implementation
-
Create recursive descent parser
- Simple to implement
- Directly map rules to parsing functions
Components of CFG
-
Production rules: Rules combining terminals and non-terminals
-
Terminals: The basic "alphabet" which forms strings
- Letters form words, words form sentences
- Tokens form the grammar of the programming language
-
Non-terminals: A named reference to another rule to compose grammar
- S --> 'a'A ;
- A --> 'b' | 'c' ;
-
Start symbol: Represents the language, root of the production process
-
Other symbols: |, ?, *
Example
identifier --> (letter | "_") (letter | digit | "_")*
letter --> 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' |
'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' |
'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' |
'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z'
digit --> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
- A programming language's grammar is composed of statements and expressions
program --> (statement)* ;
statement --> ?? ;
expression --> ?? ;
Regular vs Context Free
- A regular language can be described by regular expressions
- Regular expressions are sequences of characters defining a pattern
- Don't use extra memory apart from knowing its current state
- Context-free languages can be described with CFG
- May use additional memory (stack to track recursive structures)
Balanced Parentheses Example
Usage in interpreters
-
Tokenisation involves matching the substrings with a specific pattern
-
The lexical structure is regular, it can be defined with regular expressions
-
Parsing analyses the structure of the program
- Conditional statements
- Loops
- Function calls
-
CFGs being more expressive, can describe nested structures of arbitrary depth
Expressions and statements
-
An expression is evaluated to produce a value
- Values (ex.
10
) - Operators (ex.
4 + 8 / 2
) - Function calls (ex.
max(a, b)
)
- Values (ex.
-
A statement is an instruction that can be executed to perform an action
- Assignments
- Control flow (conditions, loops)
- Function and class definitions
- Expression statements (ex.
print(x)
)
Statements
-
Define a set of grammar rules for:
- Variable assignments / declarations
- Blocks
- Control structures
- Function calls
Variables
- There are global, local variables and table fields
- An identifier represents a variable (
myTable
,x
,_variable
) - Square brackets are used to index a table (
myTable["key"]
,myTable[2]
) - Can also use "." syntax (
myTable.key
) - Can index the result of a variable, result of function call or another table
variable --> identifier |
prefixExpression '[' expression ']' |
prefixExpression '.' identifier ;
prefixExpression --> variable | functionCall | '(' expression ')'
prefixExpression examples
table = { x = 30 }
print(table['x']) -- `prefixExpression` = `table`
function getTable()
return { x = 30 }
end
print(getTable()['x']) -- `prefixExpression` = `getTable()`
table = { x = { y = 30 }}
print(table['x']['y']) -- `prefixExpression` = `table['x']`
Variable assignments
-
Multiple assignments allowed
-
Left side: List of variables
-
Right side: List of expressions
-
Delimiter: ,
a = 3
x = {}
b, c, x[a] = a - 1, "Hello", 20
Variables grammar
statement --> variableList '=' expressionList ;
variableList --> variable (',' variable)* ;
expressionList --> expression (',' expression)* ;
variable --> identifier |
prefixExpression '[' expression ']' |
prefixExpression '.' identifier ;
Local declarations
- Use local keyword
- Can specify attributes with < > (
local x<const> = 10
)
statement --> variableList '=' expressionList |
'local' attrVarList ('=' expressionList)? ;
attrVarList --> identifier attribute (',' identifier attribute)* ;
attribute --> ('<' identifier '>')? ;
Blocks
-
Blocks are a list of statements executed sequentially
- A program can have 0 or many statements
- Blocks sometimes have a return statement
block --> (statement)* returnStatement? ;
Chunks
-
In Lua, a "chunk" represents a sequence of statements followed optionally by a return statement
-
Represents a complete or part of a Lua script
-
Syntactically, a chunk is just a block
- Source files are treated as a chunk
- Using load function to execute a string of Lua code
- Blocks of anonymous functions
chunk --> block ;
Control structures
-
if, then and end are required
-
0 or more elseif
-
Optional else
-
Blocks containing statements to execute
if x > 0 then print("Number is positive") elseif x == 0 then print("Number is zero") else print("Number is negative") end
-
Updated statement
statement --> variableList '=' expressionList | 'local' attrVarList ('=' expressionList)? | 'if' expression 'then' block ('elseif' expression 'then' block)* ('else' block)? 'end' ;
While loop
- while, do and end keywords
- Block containing statements to execute
count = 10
while count > 0 do
print(count)
count = count - 1
end
statement --> ... |
'while' expression 'do' block 'end' ;
Repeat until loop
-
repeat and until keywords
-
Block containing statements to execute
-
Expression at the end
count = 0 repeat print(count) count = count + 1 until count == 10
-
Updated statement
statement --> ... | 'repeat' block 'until' expression ;
For loops
-
Numerical for loop
- Block is repeated as a variable undergoes arithmetic progression
-
Generic for loop (iterators)
- Block is repeated as long as iterator function can produce a new value
Numerical for loop
-
for, do and end keywords
-
Block containing statements to execute
-
Can specify three values for initial, min / max value, increment
for i = 1, 5, 2 do print("Iteration", i) end
-
Updated statement
statement --> ... | 'for' identifier '=' expression ',' expression (',' expression)? 'do' block 'end' ;
Generic for loop
-
for, in, do and end keywords
-
List of variable names
-
Block containing statements to execute
local person = { name = "Bob", age = 30, city = "Perth" } for key, value in pairs(person) do print(key, ":", value) end
-
Updated statement
statement --> ... | 'for' identifierList 'in' expressionList 'do' block 'end' ; identifierList --> identifier (',' identifier)* ;
Function calls
-
Can be executed to simply perform actions with no return value
-
0 to many arguments
-
Can use ":" syntax to invoke functions on tables, passing its own reference
function greet(name) print("Hello, " .. name) end greet("Bob")
table = { greet = function(self, name) print("Hello, " .. name) print("Hello, " .. self.name) end, name = "Mary" } table:greet("Bob")
-
Updated statement
statement --> ... | functionCall ; functionCall --> prefixExpression arguments | prefixExpression ':' identifier arguments ; arguments --> '(' expressionList? ')' ;
Function definitions
- Optionally, use local keyword
- function and end required
- Name, optional list of parameters, function body
function greet() print("Hello") end
local function greet() print("Hello") end
Function names
-
Identifier
-
Declare functions in table fields (global)
-
Declare functions with colon syntax (global)
table = { nestedTable = {} } function table.nestedTable.greet() print("Hello") end table.nestedTable.greet()
table = { name = "Bob" } function table:greet() print("Hello from: ", self.name) end table:greet() -- Same as table.greet(table)
Function parameters
-
Empty
-
List of identifiers
-
Optionally, use spread syntax at the end for variadic functions
function func(a, b, ...) print(...) -- 3, 4 end func(1, 2, 3, 4)
-
Updated statement
statement --> ... | 'function' functionName body | 'local function' identifier body ; functionName --> identifier ('.' identifier)* (':' identifier)? ; body --> '(' (parameterList)? ')' block 'end' ; parameterList --> identifierList (',' '...')? | '...' ;
Expressions
- Types of expressions to consider:
- Literals (number, string, boolean)
- Unary operations
- Binary operations
- Prefix expressions
- Anonymous function
- Table
Table example
- Separate with either , or ;
- Implicit index assignments override explicit assignments
mixedTable = { [1] = "first", [2] = "second"; key1 = "value1", key2 = "value2"; "default" } print(mixedTable[1]) -- "default"
Expressions in Lua
expression --> literal |
unaryOp expression |
expression binaryOp expression |
prefixExpression | table | functionDef ;
literal --> NUMBER | STRING | "true" | "false" | "nil" | '...' ;
unaryOp --> "-" | "~" | "not" | "#"
binaryOp --> "==" | "~=" | "<" | "<=" | ">" | ">=" | "+" | "-" | "*" | "/" |
"^" | "%" | ".." | "|" | "//" | "%" | "&"| "<<" | ">>" |
"and" | "or" ;
functionDef --> 'function' body ;
table --> '{' fieldList? '}' ;
fieldList --> field (fieldSeparator field)* fieldSeparator? ;
field --> '[' expression ']' '=' expression | identifier '=' expression | expression ;
fieldSeparator --> ',' | ';'
Ambiguity
- Precedence: Determine which operator is evaluated first
- Consider the expression:
10 / 2 + 3
- The rules generate the same string but different trees
Addressing ambiguity
-
Restructure the grammar
- Evaluate operators with higher precedence first
- Create a rule for each precedence level
- Each rule matches expressions at its precedence level or higher
-
Associativity: Determine which common operators are evaluated first
- Left-associative:
10 / 5 / 2
- Right-associative:
a = b = c
- Left-associative:
Precedence and associativity in Lua
-
Binary operators are left associative, except .. and ^
-
Listed from highest to lowest precedence
- ^
- not, #, -, ~
- *, /, //, %
- +, -
- ..
- <<, >>
- &
- ~
- |
- <, >, <=, >=, ~=, ==
- and
- or
New grammar for expressions
expression --> logicalOr ;
logicalOr --> logicalAnd ("or" logicalAnd)* ;
logicalAnd --> comparison ("and" comparison)* ;
comparison --> bitwiseOr (( "==" | "~=" | ">" | ">=" | "<" | "<=" ) bitwiseOr)* ;
bitwiseOr --> bitwiseXor ( "|" bitwiseXor )* ;
bitwiseXor --> bitwiseAnd ( "~" bitwiseAnd )* ;
bitwiseAnd --> shift ( "&" shift )* ;
shift --> concatenation (( "<<" | ">>" ) concatenation)* ;
concatenation --> term ( ".." concatenation )* ;
term --> factor (( "+" | "-" ) factor)* ;
factor --> unary (( "*" | "/" | "//" | "%" ) unary)* ;
unary --> ( "-" | "not" | "#" | "~" ) unary | power ;
power --> primary ("^" power)* ;
primary --> NUMBER | STRING | BOOLEAN | NIL | "(" expression ")" ;
Entire grammar for Lua
chunk --> block ;
block --> (statement)* returnStatement? ;
statement --> variableList '=' expressionList |
'local' attrVarList ('=' expressionList)? |
'if' expression 'then' block
('elseif' expression 'then' block)*
('else' block)?
'end' |
'while' expression 'do' block 'end' |
'repeat' block 'until' expression |
'for' identifier '=' expression ',' expression
(',' expression)?
'do' block 'end' |
'for' identifierList 'in' expressionList 'do' block 'end' |
functionCall |
'function' functionName body |
'local function' identifier body ;
variableList --> variable (',' variable)* ;
expressionList --> expression (',' expression)* ;
identifierList --> identifier (',' identifier)* ;
attrVarList --> identifier attribute (',' identifier attribute)* ;
attribute --> ('<' identifier '>')? ;
prefixExpression --> variable | functionCall | '(' expression ')'
variable --> identifier |
prefixExpression '[' expression ']' |
prefixExpression '.' identifier ;
functionCall --> prefixExpression arguments |
prefixExpression ':' identifier arguments ;
arguments --> '(' expressionList? ')' ;
functionName --> identifier ('.' identifier)* (':' identifier)? ;
body --> '(' (parameterList)? ')' block 'end' ;
parameterList --> identifierList (',' '...')? | '...' ;
expression --> logicalOr ;
logicalOr --> logicalAnd ("or" logicalAnd)* ;
logicalAnd --> comparison ("and" comparison)* ;
comparison --> bitwiseOr (( "==" | "~=" | ">" | ">=" | "<" | "<=" ) bitwiseOr)* ;
bitwiseOr --> bitwiseXor ( "|" bitwiseXor )* ;
bitwiseXor --> bitwiseAnd ( "~" bitwiseAnd )* ;
bitwiseAnd --> shift ( "&" shift )* ;
shift --> concatenation (( "<<" | ">>" ) concatenation)* ;
concatenation --> term ( ".." concatenation )* ;
term --> factor (( "+" | "-" ) factor)* ;
factor --> unary (( "*" | "/" | "//" | "%" ) unary)* ;
unary --> ( "-" | "not" | "#" | "~" ) unary | power ;
power --> primary ("^" power)* ;
primary --> NUMBER | STRING | "true" | "false" | "nil" | functionDef | table | prefixExpression ;