How To Read Javascript - Part 2
Specification
I will use the chapter numbers from the ECMA-262 standard; these may differ across versions.
- Chapters 1-4
- Outline the language and define conformance
- Chapter 5: Notational Conventions
- Explains how to read the spec, most notably the particular convention used.
- I would recommend reading this
- Chapter 6: ECMAScript Data Types and Values
- Defines the js type system
- Chapter 7: Abstract operations
- Defines general abstract operations
- More specialized operations are defined locally to the chapter they are relevant to
- Abstract operations are procedures that are performed by the interpreter but not available directly in the language
- How the language behaves internally
- Chapter 8: Syntax directed operations
- Syntax directed operations are procedures that the interpreter performs which are associated with grammar from a js script
- Syntax directed operations can reference elements from the syntax as inputs
- Chapter 9: Executable code and execution contexts
- Specifies execution contexts and bindings
- Chapter 10: Ordinary and Exotic Objects Behaviours
- Specifies the prototype system
- Chapters 11 - 16
- Define the grammar of the language
- Also, partly how the language executes in the form of syntax directed operations
- Chapter 17
- Specifies error handling
- Chapters 18 - 28
- Specify the standard library available to the programmer
- Chapter 29
- Specifies the memory consistency model
Reading grammar
Programming languages are made of two parts, syntax and semantics. The syntax is what it looks like, how it's written. The semantics are how it behaves. Syntax is expressed in a grammar, which defines the structure of a string - specifies which strings are legal and illegal in the language.
A context-free grammar is made of two parts: an alphabet, which is the set of symbols the language is made out of, and a set of rules which specify how they can be combined.
A context-free grammar is typically defined recursively in a notation such as Backus-Naur form (BNF). BNF is a family of notations with different extensions. BNF defines a symbol in terms of one or more replacement rules separated by |s. A rule consists of symbols and terminals, and specifies what the symbol on the left can be replaced by. A terminal is a string that represents its own value; it terminates the recursive definition.
Example
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<number> ::= <digit> | <digit> <number>
<add-expression> ::= <expression> '+' <expression>
<subtract-expression> ::= <expression> '-' <expression>
<expression> ::= <number> | <add-expression> | <subtract-expression>
In this example a digit is defined as any terminal character from 0-9. A number is defined recursively as either a single digit or a digit followed by a number. Expression, add-expression, and subtract-expression are mutually recursive. We define two specific kinds of expressions - add and subtract. These can join any two expressions.
To make a number symbol we can substitute it by anything on the right hand side, for example a digit
1
We could also substitute it by the second option on the right hand side
1 <number>
But we haven't finished because we still have an unexpanded symbol. We can replace it again
13<number>
And now use the base case of <number> to <digit> to terminate
134
The recursive definition for number allows us to create a number with any number of digits.
Similarly the mutually recursive definition for expression, add-expression, and subtract-expression allows us to build any expression with numbers, +, or -. Let's start with an expression
<expression>
Our base case is that an expression can be replaced by a number (which we went through previously)
134
Let's do something more interesting and substitute the expression with an add-expression instead
<add-expression>
And now substitute the add-expression with its only option
<expression> + <expression>
Notice that we have expressions again. We could terminate in the next step, replacing the expressions with the number base case, or we could use different expressions. Let's do a subtract-expression and another add-expression
<subtract-expression> + <add-expression>
And now replace those expressions with their only option
<expression> - <expression> + <expression> + <expression>
Let's terminate all the expressions except the last with numbers
1 - 2 + 3 + <expression>
We could then further expand this last expression.
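The substitution steps above can be sketched in code. This is only an illustration: the grammar object and the helper names isNonTerminal, step and derive are made up for this example, and <number> uses <digit> as its base case, as in the walk-through. At each step the leftmost unexpanded symbol is replaced by one of its alternatives, chosen by index.

```javascript
// Toy grammar from the example, written as data.
// Keys are symbols; each alternative is a list of symbols and terminals.
const grammar = {
  digit: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'].map((d) => [d]),
  number: [['digit'], ['digit', 'number']],
  'add-expression': [['expression', '+', 'expression']],
  'subtract-expression': [['expression', '-', 'expression']],
  expression: [['number'], ['add-expression'], ['subtract-expression']],
}

// Anything that is not a key of the grammar is treated as a terminal.
const isNonTerminal = (item) => Object.keys(grammar).includes(item)

// Replace the leftmost unexpanded symbol with its alternative number `choice`.
function step(form, choice) {
  const i = form.findIndex((item) => isNonTerminal(item))
  if (i === -1) return form // only terminals left, nothing to expand
  return [...form.slice(0, i), ...grammar[form[i]][choice], ...form.slice(i + 1)]
}

// Apply a list of choices in order, starting from a single symbol.
function derive(start, choices) {
  return choices.reduce((form, choice) => step(form, choice), [start])
}

// <number> -> <digit> <number> -> 1 <number> -> ... -> 134
console.log(derive('number', [1, 1, 1, 3, 0, 4]).join('')) // prints "134"

// <expression> -> <add-expression> -> <expression> + <expression>
console.log(derive('expression', [1, 0]).join(' ')) // prints "expression + expression"
```

Stopping early leaves unexpanded symbols in the result, which mirrors the intermediate lines of the walk-through.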
In the JS spec the basic format for specifying syntax is a line with the symbol name to be defined followed by a colon. The following lines define replacement rules. Symbol names are italicized, while terminals are bold
Example
IfStatement:
if ( Expression ) Statement else Statement
if ( Expression ) Statement
Symbols
A symbol is a primitive value that's guaranteed to be unique. Symbols are most commonly used for creating properties with unique names that won't conflict with existing names and that require the symbol itself to access.
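A short sketch of both points, uniqueness and collision-free property names. The names id, otherId and user are made up for this example.

```javascript
// Two symbols are never equal, even with the same description.
const id = Symbol('id')
const otherId = Symbol('id')
console.log(id === otherId) // false

// A symbol-keyed property cannot collide with the string key "id".
const user = { id: 'string key', [id]: 42 }
console.log(user.id) // string key
console.log(user[id]) // 42
console.log(user[otherId]) // undefined

// Symbol keys are skipped by Object.keys and JSON.stringify.
console.log(Object.keys(user)) // [ 'id' ]
console.log(JSON.stringify(user)) // {"id":"string key"}
```

Note that Object.getOwnPropertySymbols(user) still lists the key, so symbols are about avoiding name collisions rather than hiding data.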
Corner cases
typeof null === "object"
// Null is not an object
typeof (() => {}) === "function"
// For all other objects it returns "object"
NaN !== NaN
// According to the floating point standard NaN is not equal to itself, use isNaN(NaN) to check for it
isNaN("string") === true
// The string isn't NaN but isNaN coerces to a number
Number.isNaN("string") === false
// This might be what you actually want, doesn't coerce
typeof NaN === "number"
// Unlike null, which is not really an object, NaN really is a number (specified by IEEE 754)
(-0 === 0) === true
// -0 and 0 are different values with different representations, but JS pretends they're the same
(-0).toString() === '0'
Object.is("123", NaN) === false
Object.is(-0, -0) === true
Object.is(0, -0) === false
// Object.is acts like === but checks -0 and NaN more correctly
Destructuring
An object literal
var a = {'prop1': 1, 'prop2': 2}
is a textual representation of an object consisting of a property definition list contained in braces. There are a few forms of property definitions that can go in the list
- PropertyName: AssignmentExpression
- Evaluates the AssignmentExpression and sets the result on the object with the property name from PropertyName
- Two forms for PropertyName: literal, key: value (the key can also be a string), and computed, ['key']: value
- A literal name is used as is for the property name; a computed name is an expression surrounded by square brackets, and the expression is evaluated to determine the name
- IdentifierReference
- someVar
- The property name is the identifier name and the value is the value bound to the identifier
- MethodDefinition
- myMethod() {}
Note that in the representation of an object the property names are before the colon.
With destructuring we move the object representation to the left side of the equals so we can refer to parts of it. The representation keeps the same format, however instead of putting things on, we take things off.
const my = "my"
const obj = {
myA: 1,
myB: 2,
[my+'C']: 3
}
const {
myA: myAVar,
myB,
[my+'C']: myC,
myD: myD = 4
} = obj
console.log(myAVar, myB, myC, myD)
Also note that the default initializer for myD is on the right side of the colon. It is part of the binding we are assigning to, not part of the object.
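One detail worth seeing in code: a default initializer is only used when the value read from the object is undefined. The settings object and its property names below are hypothetical, chosen only to show each case.

```javascript
// Hypothetical settings object, used only for illustration.
const settings = { retries: undefined, timeout: null, verbose: false }

const {
  retries: retryCount = 3, // present but undefined, so the default is used
  timeout = 1000, // null is a real value, so the default is skipped
  verbose = true, // false is a real value, so the default is skipped
  missing = 'fallback', // not on the object at all, so the default is used
} = settings

console.log(retryCount, timeout, verbose, missing) // 3 null false fallback
```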
Types and Coercion
Variables don't have types, values stored in variables do. You can query the type currently in a variable with typeof
var myVar = 1
typeof myVar === 'number'
myVar = 'abc'
typeof myVar === 'string'
JS has the following primitive types
- number
- string
- boolean
- undefined
- null
- symbol
- bigint
Everything else is an object.
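As a quick check, typeof can be applied to one sample value of each kind. The samples array is just an illustration; note the two quirks, null reporting "object" and functions reporting "function" even though they are objects.

```javascript
const samples = [1, 'abc', true, undefined, null, Symbol('s'), 10n, {}, [], () => {}]
const types = samples.map((value) => typeof value)

console.log(types.join(', '))
// number, string, boolean, undefined, object, symbol, bigint, object, object, function
```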
Actual coercion in the interpreter is performed by abstract operations. These are procedures defined by the specification that the interpreter will perform, but that are not available as functions in the runtime.
Coercing to boolean
Coercing to boolean is done by the ToBoolean abstract operation. To coerce to a boolean the interpreter looks up whether the value is falsy. If the value is falsy it returns false, otherwise it returns true. The following are falsy
- false
- undefined
- null
- NaN
- ""
- 0
- -0
- 0n
Note that the empty object and empty array are not on this list, and are therefore truthy.
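The Boolean function, called without new, exposes this coercion, so the list can be checked directly. The array names are made up for this sketch, and the falsy array also includes 0n, the BigInt zero, which is falsy as well.

```javascript
const falsy = [false, undefined, null, NaN, '', 0, -0, 0n]
const truthy = [{}, [], '0', 'false', new Boolean(false)]

console.log(falsy.map((value) => Boolean(value)).every((result) => result === false)) // true
console.log(truthy.map((value) => Boolean(value)).every((result) => result === true)) // true

// `!!value` and the condition of an `if` perform the same coercion.
console.log(!![] === Boolean([])) // true
```

Non-empty strings such as '0' and 'false' are truthy because only the empty string is on the list.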
Coercing to a string
Coercing to a string is performed by the ToString abstract operation
- If the value is already a string it is returned
- If the value is a symbol throw a TypeError
- If the value is undefined return "undefined"
- If the value is null return "null"
- If the value is true return "true"
- If the value is false return "false"
- If the value is a BigInt return BigInt::toString(argument, 10)
- If the value is a Number return Number::toString(argument, 10)
- Otherwise we have an object
- Return ToString(ToPrimitive(value, string))
Note that this depends on the ToPrimitive operation which we have not defined yet (it's mutually recursive). ToString handles all primitive types, so as long as ToPrimitive returns some kind of primitive the recursion will finish.
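These steps can be observed from user code. The sketch below assumes a template literal as the way to trigger ToString; the helper name toStr is made up for this example.

```javascript
// A template literal applies ToString to each substitution.
const toStr = (value) => `${value}`

console.log(toStr(undefined)) // undefined
console.log(toStr(null)) // null
console.log(toStr(true)) // true
console.log(toStr(10n)) // 10
console.log(toStr(1.5)) // 1.5
console.log(toStr({})) // [object Object]
console.log(toStr({ toString: () => 'custom' })) // custom

// Symbols are the one primitive that ToString rejects.
let symbolError = null
try {
  toStr(Symbol('s'))
} catch (error) {
  symbolError = error
}
console.log(symbolError instanceof TypeError) // true
```

The String function is deliberately not used here: String(Symbol('s')) special-cases symbols and returns their description instead of throwing.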
Coercing to number
Performed by the ToNumber abstract operation
- If the value is already a number it is returned
- If the value is a Symbol or BigInt throw a TypeError
- If value is undefined return NaN
- If value is null or false return +0
- If value is true return 1
- If value is string return StringToNumber(argument)
- Otherwise we have an object
- Return ToNumber(ToPrimitive(value, number))
Note that this has the same structure as ToString and is also mutually recursive with ToPrimitive.
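A sketch of the same steps from user code, assuming unary plus as the way to trigger ToNumber. The helper names toNum and throwsTypeError are made up for this example.

```javascript
// Unary plus applies ToNumber to its operand.
const toNum = (value) => +value

// Hypothetical helper: reports whether calling fn throws a TypeError.
const throwsTypeError = (fn) => {
  try {
    fn()
    return false
  } catch (error) {
    return error instanceof TypeError
  }
}

console.log(toNum(undefined)) // NaN
console.log(toNum(null)) // 0
console.log(toNum(false)) // 0
console.log(toNum(true)) // 1
console.log(toNum('42')) // 42
console.log(toNum('abc')) // NaN
console.log(toNum({ valueOf: () => 7 })) // 7
console.log(throwsTypeError(() => toNum(10n))) // true
console.log(throwsTypeError(() => toNum(Symbol('s')))) // true
```

The Number function behaves slightly differently for BigInts (Number(10n) returns 10), which is why unary plus is used here.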
Coercing to primitive
The ToPrimitive abstract operation takes a type hint as a second parameter, which says what kind of primitive we'd like to get. In the case of ToString we'd like a string, but if we get back something else it can be converted in ToString. Likewise for ToNumber, except we want a number.
- If value is a primitive return it
- If the object defines a Symbol.toPrimitive method we call that method on the value with the type hint as its argument
- If it gives back a primitive we return it, otherwise we throw a TypeError
- If the method is not defined we return OrdinaryToPrimitive(value, hint)
OrdinaryToPrimitive tries to get a primitive using the toString and valueOf methods on the object. If we want a string it will try toString first, otherwise it will try valueOf first. Note that the otherwise means we default to coercing to number. Object.prototype defines a toString method which returns '[object Object]'. It also defines a valueOf method, but that returns the object itself, which is not a primitive, so the conversion falls through to toString.
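A sketch of both paths. The objects money and plain and the recording arrays hints and calls are hypothetical, made up for this example; they log which hint or method the interpreter asks for.

```javascript
// Hypothetical object with a custom conversion; it records each hint it receives.
const hints = []
const money = {
  [Symbol.toPrimitive](hint) {
    hints.push(hint)
    return hint === 'string' ? '$5' : 5
  },
}

const asString = `${money}` // ToString asks with hint "string"
const asNumber = +money // ToNumber asks with hint "number"
const asDefault = money + 1 // binary + states no preference: hint "default"
console.log(asString, asNumber, asDefault) // $5 5 6
console.log(hints.join(', ')) // string, number, default

// Without Symbol.toPrimitive, OrdinaryToPrimitive picks the method order from the hint.
const calls = []
const plain = {
  toString() { calls.push('toString'); return 'text' },
  valueOf() { calls.push('valueOf'); return {} }, // not a primitive, so the result is ignored
}

const plainAsString = `${plain}` // tries toString first
const plainAsNumber = +plain // tries valueOf, falls back to toString, then converts 'text'
console.log(plainAsString, plainAsNumber) // text NaN
console.log(calls.join(', ')) // toString, valueOf, toString
```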
Equality
Loose and strict equality are defined by the IsLooselyEqual(x,y) and IsStrictlyEqual(x,y) abstract operations.
IsStrictlyEqual
Returns false if the types don't match, otherwise compares numbers with Number::equal(x, y) and everything else with SameValueNonNumber(x, y)
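A few cases that show each branch. The object strict and its property names are made up for this illustration.

```javascript
const sameObject = {}
const strict = {
  differentTypes: 1 === '1', // false: the types don't match
  nanWithItself: NaN === NaN, // false: Number::equal treats NaN as unequal to everything
  signedZeros: 0 === -0, // true: Number::equal treats +0 and -0 as equal
  equalStrings: 'abc' === 'abc', // true: same sequence of characters
  twoObjects: {} === {}, // false: two distinct objects
  oneObject: sameObject === sameObject, // true: the same object
}
console.log(strict)
```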
IsLooselyEqual
IsLooselyEqual will try to coerce both operands to the same type before comparing.
- If the types are already the same compare with IsStrictlyEqual
- If both arguments are in [null, undefined] return true
- If one argument is a number and the other is a string we rerun IsLooselyEqual with the string coerced to a number
- If either argument is a boolean we coerce it to a number and rerun IsLooselyEqual
- If we are comparing a primitive to an object we rerun IsLooselyEqual with the object coerced to primitive
Notice that loose equality likes to coerce to numbers and only coerces objects to strings if the valueOf method does not return a primitive (see coercing to primitive).
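The rules above can be traced with a handful of comparisons. The object loose and its property names are made up for this illustration, and the comments give the chain of coercions for each case.

```javascript
const loose = {
  sameType: 'a' == 'a', // true: same type, falls through to strict equality
  nullUndefined: null == undefined, // true: both are in [null, undefined]
  nullZero: null == 0, // false: no rule coerces null to a number here
  numberString: 1 == '1', // true: '1' is coerced to 1
  booleanString: true == '1', // true: true becomes 1, then '1' becomes 1
  booleanTwo: true == 2, // false: true becomes 1, and 1 is not 2
  arrayNumber: [1] == 1, // true: [1] becomes '1', then '1' becomes 1
  emptyArrayFalse: [] == false, // true: false becomes 0, [] becomes '', '' becomes 0
  objectString: {} == '[object Object]', // true: {} becomes '[object Object]'
}
console.log(loose)
```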
Primitive type constructors will coerce if not called with new, otherwise they will box
- Number
- String
- Boolean
const primitiveString = String(123)
const objectString = new String(123)
typeof primitiveString === 'string' // true
typeof objectString === 'object' // true
objectString instanceof String // true
primitiveString === objectString // false
primitiveString == objectString // true
primitiveString was created by coercing the number 123 to a string, objectString was created by coercing 123 to a string and boxing that string in an object. Note that objectString is an object which is an instance of String. Strict equality returns false since they are not the same type, while coercive equality returns true because the object is first coerced to a primitive string.
Other constructors should be invoked with new but may have different behaviour if they aren't e.g.
typeof new Date() === 'object'
new Date() instanceof Date // true
typeof Date() === 'string'
console.log(Date())
Others throw an error when called without new
const p = Promise() // TypeError
Corner cases
Number("") // 0
Number(" \t\n") // 0
Number(null) // 0
Number(undefined) // NaN
Number([]) // 0
Number([1,2,3]) // NaN
Number([null]) // 0
Number([undefined]) // 0
Number({}) // NaN
String(-0) // "0"
String(null) // "null"
String(undefined) // "undefined"
String([null]) // ""
String([undefined]) // ""
Boolean(new Boolean(false)) // true
Iterators
The iterator pattern abstracts traversing over a container. We can iterate over an array using a for loop
for (let i = 0; i < myArray.length; i++) {
// Stuff
}
However,
- We need to repeat the end check and increment everywhere we iterate
- Seems like a lower layer of abstraction
- What about other kinds of containers
- e.g. different tree traversals, custom map
- Can we generalize this?
We can use an iterator with for...of
for (const element of myArray) {
// Stuff
}
Now our user code does not need to know the details of iterating over the container, but how does this work? Let's try defining one: we need an object to store the current index and a next method to get the next value.
class MyArrayIterator {
constructor(array) {
this.array = array
this.index = 0
}
next() {
return this.array[this.index++]
}
}
The next problem is that we need to know when to stop iterating. Instead of the bare value, we can return an object that also says whether we've finished iterating. We can't use a sentinel value since null or undefined could be a value from the collection. We could use a Maybe monad, but this is the interface JS specifies.
class MyArrayIterator {
constructor(array) {
this.array = array
this.index = 0
}
next() {
if (this.index < this.array.length) {
return {done: false, value: this.array[this.index++]}
}
else {
return {done: true, value: null}
}
}
}
Note that we return done as true past the end of the array.
const myArray = [1,2,3]
const iterator = new MyArrayIterator(myArray)
var {done, value} = iterator.next()
for (; done == false; {done, value} = iterator.next()) {
console.log(value)
}
for...of will look for an iterator factory on the Symbol.iterator property.
const myObj = {
myArray: [1,2,3],
[Symbol.iterator]: function () {
return new MyArrayIterator(this.myArray)
}
}
for (const element of myObj) {
console.log(element)
}
We could inline the class into myObj using a closure
const myObj = {
myArray: [1,2,3],
[Symbol.iterator]: function() {
var index = 0
return {
next: () =>
(index < this.myArray.length)
? {done: false, value: this.myArray[index++]}
: {done: true, value: null}
}
}
}
for (const element of myObj) {
console.log(element)
}
An object that implements the Symbol.iterator method follows the iterable protocol. An object that exposes a next method returning objects of that particular shape follows the iterator protocol.
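for...of is not the only consumer of these protocols. As a sketch, the hypothetical range object below (its name and its from/to properties are made up for this example) also works with spread, Array.from and array destructuring, because all of them ask for Symbol.iterator and then call next.

```javascript
// Hypothetical iterable: counts from `from` to `to` inclusive.
const range = {
  from: 1,
  to: 4,
  [Symbol.iterator]() {
    let current = this.from
    const last = this.to
    return {
      next: () =>
        current <= last
          ? { done: false, value: current++ }
          : { done: true, value: undefined },
    }
  },
}

console.log([...range]) // [ 1, 2, 3, 4 ]
console.log(Array.from(range)) // [ 1, 2, 3, 4 ]

const [firstItem, secondItem] = range
console.log(firstItem, secondItem) // 1 2

console.log(Math.max(...range)) // 4
```

Each consumer calls Symbol.iterator again and gets a fresh iterator, which is why the range can be traversed more than once.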
Generators
What if we created an object that implements both protocols?
function createMyArrayIterator(myArray) {
var index = 0
return {
next: () =>
(index < myArray.length)
? {done: false, value: myArray[index++]}
: {done: true, value: null},
[Symbol.iterator]: function () {
return this
}
}
}
for (const e of createMyArrayIterator([1,2,3])) {
console.log(e)
}
This is the idea of a generator. There is an easier way to write one
function* createMyArrayGenerator(myArray) {
var index = 0;
while (index < myArray.length) {
yield myArray[index++]
}
}
for (const e of createMyArrayGenerator([1,2,3])) {
console.log(e)
}
function* defines a generator function which returns a Generator object implementing both iteration protocols, like the previous code block.
The Generator constructor is not available and the only way to create one is through a generator function
const generatorObject = createMyArrayGenerator([])
console.log(generatorObject) // Object [Generator] {}
The generator function defines behaviour similar to what we had previously in next. The yield operator pauses the generator's execution, saving the execution context. The value given to the yield operator will be returned from the generator object's next method, wrapped in an object following the iterator protocol. The next time next is called we will resume the generator from that yield with the saved execution context.
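The pausing and resuming can be made visible by calling next by hand. In this sketch the generator twoSteps and the log array are made up for illustration; the log records which parts of the body have run after each call.

```javascript
const log = []

function* twoSteps() {
  log.push('started')
  yield 1
  log.push('resumed')
  yield 2
  log.push('finished')
}

const gen = twoSteps()
console.log(log.length) // 0: calling the generator function runs none of its body

const step1 = gen.next()
console.log(step1.value, step1.done, log.join(', ')) // 1 false started

const step2 = gen.next()
console.log(step2.value, step2.done, log.join(', ')) // 2 false started, resumed

const step3 = gen.next()
console.log(step3.value, step3.done, log.join(', ')) // undefined true started, resumed, finished

// A generator object is its own iterator, so it is also iterable.
console.log(gen[Symbol.iterator]() === gen) // true
```

Running off the end of the function body is what produces the final result with done set to true.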