Abstract Syntax Tree
What is it?
- Acts as the intermediate step between the front end and back end of the compiler.
- It's called an abstract syntax tree because it abstracts away information by storing it in a tree structure.
- Storing it in a tree structure allows it to abstract away information such as brackets due to it being in a hierarchical structure.
- Below are some basic equations broken into an AST.
How is Source Code Processed
- Source code goes through lexical analysis which involves the compiler looking through the source code for tokens.
- A token can be anything from a literal to function.
- Once all the tokens have been identified, they are parsed in for syntax analysis.
- This is where the AST is created.
Lexical Analysis - Tokenization
- Breaks the code down into smaller units called tokens.
- Tokens can represent identifiers, keywords, operators, and constants.
- The process involves scanning code character by character.
- Tokens are classified based on predefined rules that are defined by the programming language.
- Lexical Analysis really helps in identifying the structure of code and is a crucial step in language processing tasks.
Syntax Analysis - Parsing
- The step proceeding lexical analysis.
- Analyses the structure of the code based on the grammar rules of the programming language.
- Checks if the sequence of tokens generated by lexical analysis forms valid statements or expressions according to the language syntax.
- Parsing identifies syntactical errors.
- Successful parsing ensures that the code conforms to the language's syntax rules, facilitating further compilation.
Developer Applications of AST
1. Audits
2. Transforms
3. Linting
1.) Code Audits
A common scenario:
Counting how many times a function, variable, component or prop is used in source code.
- Some examples of things you may audit your code for include:
- TODOs
- console.log()
- legal comments
- Any other specific patterns
2.) Transforming Code
A common scenario:
Transforming code from one syntax to another.
- Babel does this, a common transformation for JavaScript is to convert from ES6 to ES5
3.) Linting
A common scenario:
Enforcing and applying specific rules for syntax on a code base.
-
A common linter is ESLint, where the code will be transformed into an AST where it can then be traversed to check for certain styles.
-
Below is an example of what doing that manually can look like:
const { ESLint } = require('eslint'); const codeToLint = ` const add = (a, b) => { return a + b; } console.log(add(2, 3)); `; async function lintCode(code) { const eslint = new ESLint(); const results = await eslint.lintText(code); return results[0].messages; } (async () => { const lintingIssues = await lintCode(codeToLint); console.log("Linting issues:"); console.log(lintingIssues); })();
Links to AST Visualizer
Have a Play Around with Creating and Traversing an AST
You only need to install 2 dependencies:
npm install --save-dev @babel/parser
npm install --save-dev @babel/traverse
Below are the code snippets from the talk:
const { parse } = require("@babel/parser");
// const code = "2 + (4 * 10)";
// const ast = parse(code);
// console.log(ast);
// ------------------------------------------------
// const code = "2 + (4 * 10)";
// const ast = parse(code);
// console.log(ast.program.body[0]);
// ---------------------------------------------------
const code = "2 + (4 * 10)";
const ast = parse(code);
console.log(ast.program.body[0].expression.left.value);
console.log(ast.program.body[0].expression.right.left.value);
console.log(ast.program.body[0].expression.right.right.value);
const { parse } = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const code = "2 + (3 - (4 * 10))";
const ast = parse(code);
traverse(ast, {
NumericLiteral(path) {
console.log(path.node.value);
}
// NumberLiteral: {
// enter(path) {
// console.log(`Entered ${path.node.value}`);
// },
// exit(path) {
// console.log(`Exited ${path.node.value}`)
// }
// }
});