How To Read JavaScript - Part 2

Specification

I will use the chapter numbers from the ECMA-262 standard. These may differ across versions.

Chapters 1-4
- Outline the language and define conformance
Chapter 5: Notational Conventions
- Explains how to read the spec, most notably the grammar and algorithm notation it uses.
- I would recommend reading this
Chapter 6: ECMAScript Data Types and Values
- Defines the js type system
Chapter 7: Abstract operations
- Defines general abstract operations
- More specialized operations are defined locally to the chapter they are relevant to
- Abstract operations are procedures that are performed by the interpreter but not available directly in the language
  - How the language behaves internally
Chapter 8: Syntax directed operations
- Syntax directed operations are procedures that the interpreter performs which are associated with grammar from a js script
- Syntax directed operations can reference elements from the syntax as inputs
Chapter 9: Executable code and execution contexts
- Specifies execution contexts and bindings
Chapter 10: Ordinary and Exotic Objects Behaviours
- Specifies the prototype system
Chapters 11 - 16
- Define the grammar of the language
- Also, partly how the language executes in the form of syntax directed operations
Chapter 17
- Specifies error handling
Chapters 18 - 28
- Specify the standard library available to the programmer
Chapter 29
- Specifies the memory consistency model

Reading grammar

Programming languages are made of two parts, syntax and semantics. The syntax is what the language looks like, how it's written. The semantics are how it behaves. Syntax is expressed in a grammar, which defines the structure of a string: it specifies which strings are legal and which are illegal in the language.

A context-free grammar is made of two parts: an alphabet, which is the set of symbols the language is made out of, and a set of rules that specifies how they can be combined.

A context-free grammar is typically defined recursively in a notation such as Backus-Naur form (BNF). BNF is a family of notations with different extensions. BNF defines a symbol in terms of one or more replacement rules separated by |s. A rule consists of symbols and terminals, and specifies what the symbol on the left can be replaced by. A terminal is a string that represents itself: it cannot be replaced any further, so it terminates the recursive definition.

Example

<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<number> ::= <digit> | <digit> <number>
<add-expression> ::= <expression> '+' <expression>
<subtract-expression> ::= <expression> '-' <expression>
<expression> ::= <number> | <add-expression> | <subtract-expression>

In this example a digit is defined as any terminal character from 0-9. A number is defined recursively as either a single digit (the base case) or a digit followed by a number. Expression, add-expression, and subtract-expression are mutually recursive. We define two specific kinds of expressions, add and subtract. These can join any two expressions.

To make a number symbol we can substitute it by anything on the right hand side, for example a digit

1

We could also substitute it by the second option on the right hand side

1 <number>

But we haven't finished because we still have an unexpanded symbol. We can replace it again

13 <number>

And now use the base case of <number> to <digit> to terminate

134

The recursive definition for number allows us to create a number with any number of digits.

Similarly the mutually recursive definition for expression, add-expression, and subtract-expression allows us to build any expression with numbers, +, or -. Let's start with an expression

<expression>

Our base case is that an expression can be replaced by a number (which we went through previously)

134

Let's do something more interesting and substitute the expression with an add-expression instead

<add-expression>

And now substitute the add-expression with its one option

<expression> + <expression>

Notice that we have expressions again. We could terminate in the next step by replacing the expressions with the number base case, or we could use different expressions. Let's do a subtract-expression and another add-expression

<subtract-expression> + <add-expression>

And now replace those expressions with their only option

<expression> - <expression> + <expression> + <expression>

Let's terminate all the expressions except the last with numbers

1 - 2 + 3 + <expression>

We could then further expand this last expression.
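The substitution steps above can be sketched in JavaScript. This is only an illustration under some assumptions: `grammar`, `expandLeftmost`, and `derive` are hypothetical names, `<number>` is given a `<digit>` base case so the recursion can terminate, and the sketch always replaces the leftmost unexpanded symbol, which is just one of several valid orders.

```javascript
// The toy grammar from the example. Each nonterminal maps to its list of
// replacement rules; anything that is not a key of `grammar` is a terminal.
const grammar = {
  '<digit>': [...'0123456789'].map((d) => [d]),
  '<number>': [['<digit>'], ['<digit>', '<number>']],
  '<add-expression>': [['<expression>', '+', '<expression>']],
  '<subtract-expression>': [['<expression>', '-', '<expression>']],
  '<expression>': [['<number>'], ['<add-expression>'], ['<subtract-expression>']],
}

// Replace the leftmost nonterminal with the rule at position `ruleIndex`.
function expandLeftmost(symbols, ruleIndex) {
  const position = symbols.findIndex((s) => Object.keys(grammar).includes(s))
  if (position === -1) return symbols // only terminals left, nothing to expand
  const replacement = grammar[symbols[position]][ruleIndex]
  return [...symbols.slice(0, position), ...replacement, ...symbols.slice(position + 1)]
}

// Apply a sequence of rule choices, starting from a single symbol.
function derive(start, choices) {
  return choices.reduce((symbols, ruleIndex) => expandLeftmost(symbols, ruleIndex), [start])
}

// <number> -> <digit> <number> -> 1 <number> -> ... -> 1 3 4
console.log(derive('<number>', [1, 1, 1, 3, 0, 4]).join('')) // 134
// <expression> -> <add-expression> -> <expression> + <expression> -> ... -> 1 + 2
console.log(derive('<expression>', [1, 0, 0, 0, 1, 0, 0, 2]).join(' ')) // 1 + 2
```

Each entry in the choices array picks which alternative (counting from 0) to use for the next replacement, so a derivation is fully described by its list of choices.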

In the JS spec the basic format for specifying syntax is a line with the symbol name to be defined followed by a colon. The following lines define replacement rules. Symbol names are italicized, while terminals are bold

Example

IfStatement:
 if ( Expression )  Statement  else Statement
 if ( Expression )  Statement

Symbols

A symbol is a primitive that's guaranteed to be unique. Symbols are most commonly used for creating properties with unique names that won't conflict with existing names, and that can only be accessed by code that holds the symbol.
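As a small illustration of that uniqueness, the sketch below uses a symbol as a property key. The names `id`, `otherId`, and `user` are made up for the example.

```javascript
// Two symbols are never equal, even with the same description.
const id = Symbol('id')
const otherId = Symbol('id')
console.log(id === otherId) // false

// A symbol-keyed property does not collide with the string key 'id'.
const user = { name: 'Ada', [id]: 42 }
console.log(user[id]) // 42
console.log(user['id']) // undefined

// Symbol keys are skipped by the usual string-key enumeration.
console.log(Object.keys(user)) // [ 'name' ]
console.log(JSON.stringify(user)) // {"name":"Ada"}
```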

Corner cases

typeof null === "object"
// null is not an object, but typeof reports "object" anyway
typeof (() => {}) === "function"
// For all other objects typeof returns "object"
NaN !== NaN
// According to the floating point standard (IEEE 754) NaN is not equal to itself, use isNaN(NaN) to test for it
isNaN("string") === true
// The string isn't NaN but isNaN coerces its argument to a number first
Number.isNaN("string") === false
// This might be what you actually want, it doesn't coerce
typeof NaN === "number"
// NaN is not a stand-in like null or undefined, it is an actual value of the number type (specified by IEEE 754)
(-0 === 0) === true
// -0 and 0 are different values with different representations, js tries to pretend they're the same
(-0).toString() === '0'

Object.is(NaN, NaN) === true
Object.is(-0, -0) === true
Object.is(0, -0) === false
// Object.is acts like === but distinguishes -0 from 0 and treats NaN as equal to itself

Destructuring

An object literal

var a = {'prop1': 1, 'prop2': 2}

is a textual representation of an object consisting of a property definition list contained in braces. There are a few forms of property definitions that can go in the list

PropertyName: AssignmentExpression

Evaluates the AssignmentExpression and sets the result on the object with the property name from PropertyName
There are two forms for PropertyName: a literal name, key: value (the key can also be a string), and a computed name, ['key']: value.
A literal name is used as is for the property name. A computed name is an expression surrounded by square brackets, which is evaluated to determine the name.

IdentifierReference

someVar
The property name is the identifier name and the value is the value bound to the identifier

MethodDefinition

myMethod() {}

Note that in the representation of an object the property names are before the colon.

With destructuring we move the object representation to the left side of the equals so we can refer to parts of it. The representation keeps the same format, however instead of putting things on, we take things off.

const my = "my"
const obj = {
  myA: 1,
  myB: 2,
  [my+'C']: 3
}

const {
  myA: myAVar, 
  myB, 
  [my+'C']: myC,
  myD: myD = 4
} = obj

console.log(myAVar, myB, myC, myD)

Also note that the default initializer for myD is on the right side of the colon. It is part of the target we are assigning to, not of the property we are reading from the object.
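The same pattern syntax works in function parameters. The sketch below uses a hypothetical `describe` function to show renaming and defaults together. Note that a default only applies when the property is undefined, not when it is null.

```javascript
// `name` is read from the argument and bound to the local variable `label`.
// `size` falls back to 1, and the whole parameter falls back to {}.
function describe({ name: label, size = 1 } = {}) {
  return `${label}:${size}`
}

console.log(describe({ name: 'box', size: 3 })) // box:3
console.log(describe({ name: 'box' })) // box:1
console.log(describe({ name: 'box', size: null })) // box:null
console.log(describe()) // undefined:1
```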

Types and Coercion

Variables don't have types, values stored in variables do. You can query the type currently in a variable with typeof

var myVar = 1
typeof myVar === 'number'
myVar = 'abc'
typeof myVar === 'string'

JS has the following primitive types

number
string
boolean
undefined
null
symbol
bigint

Everything else is an object

Actual coercion in the interpreter is performed by abstract operations. These are procedures defined by the specification that the interpreter will perform, but that are not available as functions in the runtime.

Coercing to boolean

Coercing to boolean is done by the ToBoolean abstract operation. To coerce to a boolean the interpreter looks up whether the value is falsy. If the value is falsy it returns false, otherwise it returns true. The following are falsy

false
undefined
null
NaN
""
0
-0
0n

Note that the empty object and empty array are not on this list, and are therefore truthy
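The lookup can be sketched as a plain function. `toBooleanSketch` is a hypothetical name, not something the runtime exposes, and the sketch ignores the browser-only `document.all` special case. It relies on `Array.prototype.includes`, which matches NaN and treats 0 and -0 as equal.

```javascript
// Every falsy value, including the BigInt zero 0n.
const falsyValues = [false, undefined, null, NaN, '', 0, 0n]

// Returns false for the falsy values and true for everything else.
function toBooleanSketch(value) {
  return !falsyValues.includes(value)
}

console.log(toBooleanSketch(-0)) // false
console.log(toBooleanSketch('0')) // true, a non-empty string
console.log(toBooleanSketch({})) // true
console.log(toBooleanSketch([])) // true
```

The built-in `Boolean(value)` and `!!value` give the same answers for these inputs.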

Coercing to a string

Coercing to a string is performed by the ToString abstract operation

If the value is already a string it is returned
If the value is a symbol throw a TypeError
If the value is undefined return "undefined"
If the value is null return "null"
If the value is true return "true"
If the value is false return "false"
If the value is a BigInt return BigInt::toString(argument, 10)
If the value is a Number return Number::toString(argument, 10)
Otherwise we have an object
- Return ToString(ToPrimitive(value, string))

Note that this depends on the ToPrimitive operation which we have not defined yet (it's mutually recursive). ToString handles all primitive types, so as long as ToPrimitive returns some kind of primitive we will finish recursing.
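ToString is not callable directly, but a template literal applies it to each substituted value, so the cases above can be observed from user code. `viaToString` is a hypothetical helper name.

```javascript
// Template literal substitution runs ToString on the value.
const viaToString = (value) => `${value}`

console.log(viaToString(undefined)) // undefined
console.log(viaToString(null)) // null
console.log(viaToString(true)) // true
console.log(viaToString(10n)) // 10
console.log(viaToString(-0)) // 0
console.log(viaToString({})) // [object Object]
console.log(viaToString([1, [2, 3]])) // 1,2,3

// Symbols are the one primitive that ToString refuses to convert.
let symbolError = null
try {
  viaToString(Symbol('x'))
} catch (error) {
  symbolError = error
}
console.log(symbolError instanceof TypeError) // true
```

The explicit `String(value)` function behaves the same way except for symbols, which it converts to a description string instead of throwing.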

Coercing to number

Performed by the ToNumber abstract operation

If the value is already a number it is returned
If the value is a Symbol or BigInt throw a TypeError
If value is undefined return NaN
If value is null or false return +0
If value is true return 1
If value is string return StringToNumber(argument)
Otherwise we have an object
- Return ToNumber(ToPrimitive(value, number))

Note that this has the same structure as ToString and is also mutually recursive with ToPrimitive.
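The unary plus operator applies ToNumber to its operand, which makes these cases easy to check. `viaToNumber` and `throwsTypeError` are hypothetical helper names.

```javascript
// Unary plus runs ToNumber on the value.
const viaToNumber = (value) => +value

console.log(viaToNumber(undefined)) // NaN
console.log(viaToNumber(null)) // 0
console.log(viaToNumber(false)) // 0
console.log(viaToNumber(true)) // 1
console.log(viaToNumber(' 42\n')) // 42, surrounding whitespace is ignored
console.log(viaToNumber('4x')) // NaN
console.log(viaToNumber([])) // 0, [] -> '' -> 0
console.log(viaToNumber({})) // NaN, {} -> '[object Object]' -> NaN

// Symbols and BigInts throw instead of converting.
function throwsTypeError(value) {
  try {
    viaToNumber(value)
    return false
  } catch (error) {
    return error instanceof TypeError
  }
}
console.log(throwsTypeError(Symbol('x'))) // true
console.log(throwsTypeError(10n)) // true
```

The explicit `Number(value)` function differs in one case: it accepts a BigInt and converts it rather than throwing.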

Coercing to primitive

The ToPrimitive abstract operation takes a type hint as a second parameter, which is the kind of primitive we'd like to try to get. In the case of ToString we'd like a string, but if we get back something else it can be converted in ToString. Likewise for ToNumber, except we want a number.

If value is a primitive return it
If the object defines a method Symbol.toPrimitive we call that method on the value with the type hint as its argument
- If it gives back a primitive we return it, otherwise throw a TypeError
If the method is not defined we return OrdinaryToPrimitive(value, hint)

OrdinaryToPrimitive tries to get a primitive using the toString and valueOf methods on the object. If we want a string it will try toString first, otherwise it will try valueOf first. Note that the otherwise means we default to coercing to number. Object.prototype defines a toString method which returns '[object Object]'. Its valueOf method returns the object itself, which is not a primitive, so for plain objects the result falls through to toString.
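The steps above can be sketched in user code. This is an approximation for illustration, not the engine's implementation: `isObject`, `ordinaryToPrimitiveSketch`, and `toPrimitiveSketch` are hypothetical names, and the hint is passed as one of the strings 'string', 'number', or 'default'.

```javascript
function isObject(value) {
  return (typeof value === 'object' && value !== null) || typeof value === 'function'
}

// Try toString and valueOf in the order the hint asks for.
function ordinaryToPrimitiveSketch(obj, hint) {
  const methodNames = hint === 'string' ? ['toString', 'valueOf'] : ['valueOf', 'toString']
  for (const name of methodNames) {
    const method = obj[name]
    if (typeof method === 'function') {
      const result = method.call(obj)
      if (!isObject(result)) return result // first primitive wins
    }
  }
  throw new TypeError('Cannot convert object to primitive value')
}

function toPrimitiveSketch(value, hint = 'default') {
  if (!isObject(value)) return value // already a primitive
  const custom = value[Symbol.toPrimitive]
  if (custom !== undefined && custom !== null) {
    const result = custom.call(value, hint)
    if (!isObject(result)) return result
    throw new TypeError('Symbol.toPrimitive returned an object')
  }
  // With no preference, ordinary objects are treated as if a number was wanted.
  return ordinaryToPrimitiveSketch(value, hint === 'default' ? 'number' : hint)
}

console.log(toPrimitiveSketch({}, 'number')) // [object Object]
console.log(toPrimitiveSketch({ valueOf: () => 7 }, 'number')) // 7
console.log(toPrimitiveSketch({ valueOf: () => 7 }, 'string')) // [object Object]
console.log(toPrimitiveSketch([1, 2])) // 1,2
console.log(toPrimitiveSketch(new Date(0), 'number')) // 0
```

Date is an example of a built-in that defines its own Symbol.toPrimitive method, which is why it is handled by the first branch rather than by the toString/valueOf fallback.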

Equality

Loose and strict equality are defined by the IsLooselyEqual(x,y) and IsStrictlyEqual(x,y) abstract operations.

IsStrictlyEqual

Returns false if the types don't match, otherwise compares numbers with Number::equal(x, y) and everything else with SameValueNonNumber(x, y)

IsLooselyEqual

IsLooselyEqual will try to coerce both operands to the same type before comparing.

If the types are already the same compare with IsStrictlyEqual
If both arguments are in [null, undefined] return true
If one argument is a number and the other is a string we rerun IsLooselyEqual with the string coerced to a number
If either argument is a boolean we coerce it to a number and rerun IsLooselyEqual
If we are comparing a primitive to an object we rerun IsLooselyEqual with the object coerced to primitive

Notice that loose equality likes to coerce to numbers and only coerces objects to strings if the valueOf method fails to return a primitive (See coercing to primitive).
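A few comparisons make the rules concrete. The table below is a sketch with a hypothetical name, `looseCases`. Each row holds two operands and the result of `==`, with a comment tracing the coercions.

```javascript
const looseCases = [
  [null, undefined, true], // both are in [null, undefined]
  [null, 0, false], // null is only loosely equal to null and undefined
  [1, '1', true], // '1' -> 1
  [true, '1', true], // true -> 1, then '1' -> 1
  [true, 'true', false], // true -> 1, then 'true' -> NaN
  [[], 0, true], // [] -> '' -> 0
  [{}, '[object Object]', true], // {} -> '[object Object]'
  [NaN, NaN, false], // same type, so strict equality decides
]

for (const [x, y, expected] of looseCases) {
  console.log((x == y) === expected) // true for every row
}
```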

Primitive type constructors will coerce if not called with new, otherwise they will box

Number
String
Boolean

const primitiveString = String(123)
const objectString = new String(123)

typeof primitiveString === 'string' // true
typeof objectString === 'object' // true
objectString instanceof String // true

primitiveString === objectString // false
primitiveString == objectString // true

primitiveString was created by coercing the number 123 to a string, objectString was created by coercing 123 to a string and boxing that string in an object. Note that objectString is an object which is an instance of String. Strict equality returns false since they are not the same type, while coercive equality returns true because the object is first coerced back to the primitive string '123'.

Other constructors should be invoked with new but may have different behaviour if they aren't e.g.

typeof new Date() === 'object'
new Date() instanceof Date // true
typeof Date() === 'string'
console.log(Date())

Others throw an error

const p = Promise() // throws a TypeError
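To see the failure without crashing a script, the call can be wrapped in a try/catch. This is a small sketch and `caught` is a made-up variable name.

```javascript
let caught = null
try {
  Promise() // missing `new`
} catch (error) {
  caught = error
}
console.log(caught instanceof TypeError) // true

// With `new` and an executor function the constructor works as intended.
const ok = new Promise((resolve) => resolve(1))
console.log(ok instanceof Promise) // true
```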

Corner cases

  Number("") // 0
  Number("   \t\n") // 0
  Number(null) // 0
  Number(undefined) // NaN
  Number([]) // 0
  Number([1,2,3]) // NaN
  Number([null]) // 0
  Number([undefined]) // 0
  Number({}) // NaN

  String(-0) // "0"
  String(null) // "null"
  String(undefined) // "undefined"
  String([null]) // ""
  String([undefined]) // ""

  Boolean(new Boolean(false)) // true

Iterators

The iterator pattern abstracts traversing over a container. We can iterate over an array using a for loop

for (let i=0; i<myArray.length; i++) {
  // Stuff
}

However,

- We need to repeat the end check and increment everywhere we iterate
- Seems like a lower layer of abstraction
- What about other kinds of containers
  - e.g. different tree traversals, custom map

Can we generalize this?

We can use an iterator with for..of

for (const element of myArray) {
  // Stuff
}

Now our user code does not need to know the details of iterating over the container, but how does this work? Let's try defining one. We need an object to store the current index and a next method to get the next value

class MyArrayIterator {
  constructor(array) {
    this.array = array
    this.index = 0
  }
  next() {
    return this.array[this.index++]
  }
}

The next problem is that we need to know when to stop iterating. We can't use a sentinel value since null or undefined could be a value from the collection, so instead next returns an object that says whether we've finished iterating. We could have used a Maybe monad, but this is the interface js specifies.

class MyArrayIterator {
  constructor(array) {
    this.array = array
    this.index = 0
  }
  next() {
    if (this.index < this.array.length) {
      return {done: false, value: this.array[this.index++]}
    }
    else {
      return {done: true, value: null}
    }
  }
}

Note that we return done as true past the end of the array.

const myArray = [1,2,3]
const iterator = new MyArrayIterator(myArray)
var {done, value} = iterator.next()
for (; done == false; {done, value} = iterator.next()) {
  console.log(value)
}

for...of will look for an iterator factory on the Symbol.iterator property.

const myObj = {
  myArray: [1,2,3],
  [Symbol.iterator]: function ()  {
    return new MyArrayIterator(this.myArray)
  }
}
for (const element of myObj) {
  console.log(element)
}

We could inline the class into myObj using a closure

const myObj = {
  myArray: [1,2,3],
  [Symbol.iterator]: function() {
    var index = 0
    return {
      next: () => 
        (index < this.myArray.length) 
        ? {done: false, value: this.myArray[index++]}
        : {done: true, value: null}
    }
  }
}
for (const element of myObj) {
  console.log(element)
}

Implementing the Symbol.iterator method on an object is called the iterable protocol. Exposing a next method that returns an object of that particular shape is called the iterator protocol.
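for...of is not the only consumer of these protocols. As a sketch, the hypothetical `countdown` object below implements Symbol.iterator and can then be used with spread, Array.from, and array destructuring, each of which asks for a fresh iterator.

```javascript
const countdown = {
  from: 3,
  [Symbol.iterator]() {
    let current = this.from
    return {
      next: () =>
        current > 0
          ? { done: false, value: current-- }
          : { done: true, value: undefined },
    }
  },
}

console.log([...countdown]) // [ 3, 2, 1 ]
console.log(Array.from(countdown)) // [ 3, 2, 1 ]
const [first, second] = countdown
console.log(first, second) // 3 2
console.log(Math.max(...countdown)) // 3
```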

Generators

What if we created an object that implements both protocols

function createMyArrayIterator(myArray) {
  var index = 0
  return {
    next: () => 
      (index < myArray.length) 
      ? {done: false, value: myArray[index++]}
      : {done: true, value: null},

    [Symbol.iterator]: function () {
      return this
    }
  }
}

for (const e of createMyArrayIterator([1,2,3])) {
  console.log(e)
}

This is the idea of a generator. There is an easier way to write one

function* createMyArrayGenerator(myArray) {
  var index = 0;
  while (index < myArray.length) {
    yield myArray[index++]
  }
}

for (const e of createMyArrayGenerator([1,2,3])) {
  console.log(e)
}

function* defines a generator function which returns a Generator object implementing both iteration protocols, like the previous code block. The Generator constructor is not available and the only way to create one is through a generator function

const generatorObject = createMyArrayGenerator([])
console.log(generatorObject) // Object [Generator] {}

The generator function defines behaviour similar to what we had previously in next. The yield operator pauses the generator's execution, saving the execution context. The value given to the yield operator will be returned from the generator object's next method wrapped in an object following the iteration protocol. The next time next is called we will resume the generator from that yield with the saved execution context.
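Calling next by hand shows the pausing and resuming. In this sketch the hypothetical `twoSteps` generator records each stage in a `log` array so the order of execution is visible.

```javascript
const log = []

function* twoSteps() {
  log.push('start')
  yield 1
  log.push('resumed')
  yield 2
  log.push('end')
}

const gen = twoSteps()
console.log(log.length) // 0, the body has not started running yet

const firstResult = gen.next()
console.log(firstResult, log) // { value: 1, done: false } [ 'start' ]

const secondResult = gen.next()
console.log(secondResult, log) // { value: 2, done: false } [ 'start', 'resumed' ]

const lastResult = gen.next()
console.log(lastResult, log) // { value: undefined, done: true } [ 'start', 'resumed', 'end' ]

// The generator object is its own iterator.
console.log(gen[Symbol.iterator]() === gen) // true
```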