Graduate Program KB

Abstract Syntax Tree

What is it?

  • Acts as the intermediate step between the front end and back end of the compiler.
  • It's called an abstract syntax tree because it abstracts away information by storing it in a tree structure.
    • Storing it in a tree structure allows it to abstract away information such as brackets due to it being in a hierarchical structure.
  • Below are some basic equations broken into an AST.

How is Source Code Processed

  • Source code goes through lexical analysis which involves the compiler looking through the source code for tokens.
    • A token can be anything from a literal to function.
  • Once all the tokens have been identified, they are parsed in for syntax analysis.
    • This is where the AST is created.

width:900px


Lexical Analysis - Tokenization

  • Breaks the code down into smaller units called tokens.
  • Tokens can represent identifiers, keywords, operators, and constants.
  • The process involves scanning code character by character.
  • Tokens are classified based on predefined rules that are defined by the programming language.
  • Lexical Analysis really helps in identifying the structure of code and is a crucial step in language processing tasks.

Syntax Analysis - Parsing

  • The step proceeding lexical analysis.
  • Analyses the structure of the code based on the grammar rules of the programming language.
  • Checks if the sequence of tokens generated by lexical analysis forms valid statements or expressions according to the language syntax.
  • Parsing identifies syntactical errors.
  • Successful parsing ensures that the code conforms to the language's syntax rules, facilitating further compilation.

Developer Applications of AST

1. Audits
2. Transforms
3. Linting

1.) Code Audits

A common scenario:
Counting how many times a function, variable, component or prop is used in source code.

  • Some examples of things you may audit your code for include:
    • TODOs
    • console.log()
    • legal comments
    • Any other specific patterns

2.) Transforming Code

A common scenario:
Transforming code from one syntax to another.

  • Babel does this, a common transformation for JavaScript is to convert from ES6 to ES5

3.) Linting

A common scenario:
Enforcing and applying specific rules for syntax on a code base.

  • A common linter is ESLint, where the code will be transformed into an AST where it can then be traversed to check for certain styles.

  • Below is an example of what doing that manually can look like:

    const { ESLint } = require('eslint');
    
    const codeToLint = `
    const add = (a, b) => {
      return a + b;
    }
    
    console.log(add(2, 3));
    `;
    
    async function lintCode(code) {
      const eslint = new ESLint();
      const results = await eslint.lintText(code);
      return results[0].messages;
    }
    
    (async () => {
      const lintingIssues = await lintCode(codeToLint);
      console.log("Linting issues:");
      console.log(lintingIssues);
    })();
    

Links to AST Visualizer

Have a Play Around with Creating and Traversing an AST

You only need to install 2 dependencies:

npm install --save-dev @babel/parser
npm install --save-dev @babel/traverse

Below are the code snippets from the talk:

  const { parse } = require("@babel/parser");

  // const code = "2 + (4 * 10)";

  // const ast = parse(code);

  // console.log(ast);

  // ------------------------------------------------

  // const code = "2 + (4 * 10)";

  // const ast = parse(code);

  // console.log(ast.program.body[0]);

  // ---------------------------------------------------

  const code = "2 + (4 * 10)";

  const ast = parse(code);

  console.log(ast.program.body[0].expression.left.value);
  console.log(ast.program.body[0].expression.right.left.value);
  console.log(ast.program.body[0].expression.right.right.value);
  const { parse } = require("@babel/parser");
  const traverse = require("@babel/traverse").default;

  const code = "2 + (3 - (4 * 10))";

  const ast = parse(code);

  traverse(ast, {
      NumericLiteral(path) {
          console.log(path.node.value);
      }
      // NumberLiteral: {
      //     enter(path) {
      //         console.log(`Entered ${path.node.value}`);
      //     },
      //     exit(path) {
      //         console.log(`Exited ${path.node.value}`)
      //     }
      // }
  });

References