Graduate Program KB

Generators

Iterators

We can iterate over an array using a for loop

const myArray = [1,2,3]
for (const i=0; i<myArray.length; i++) {
  // Do stuff
}

But there are some issues:

  1. We need to repeat the range check and increment every time we want to iterate
  • Seems like a lower level concern than the idea of iterating over the container
  1. What about other kinds of containers? e.g. a binary tree or a set

We can solve these issues using the iterator pattern. We create an object which encapsulates the traversal over the container.

Javascript supports iterators with the for...of syntax

const myArray = [1,2,3]
for (const element of myArray) {
  // Do stuff
}

Now our user code does not need to know the details of iterating over the container, but how does this work? There are two parts to this

  1. Defining the iterator - iteration protocol
  2. Creating an iterator - iterable protocol In the case of a container, the container would implement the iterable protocol to describe how to create a new iterator for the container. That iterator is an object implementing the iterator protocol.

We'll start by making an iterator

Iteration protocol

Let's start by creating a class, the constructor should perform our initialization, and we can define a next method to handle the traversal.

class MyArrayIterator {
  constructor(array) {
    this.array = array
    this.index = 0
  }
  next() {
    return array[index++]
  }
}

But we still need to know when to stop, javascript specifies that next should return an object which specifies whether the iteration is finished as well as the value

class MyArrayIterator {
  constructor(array) {
    this.array = array
    this.index = 0
  }
  next() {
    if (this.index < this.array.length) {
      return {done: false, value: this.array[this.index++]}
    }
    else {
      return {done: true, value: null}
    }
  }
}

There are other ways that this could be specified, for example returning a monad, or a sentinel symbol - but this is the protocol required by for..of.

We can now use this iterator

const myArray = [1,2,3]
const iterator = new MyArrayIterator(myArray)
var {done, value} = iterator.next()
for (; done === false; {done, value} = iterator.next()) {
  console.log(value)
}

We can now iterate the same way over any collection, but the code looks messier than a simple for loop over an array. Next, we will implement the iterable protocol so we can use a simpler syntax.

Iterable protocol

Now that we have an iterator we need to tell for..of how to create one. This is the iterable protocol, and it describes an iterable container.

const myIterable = {
  myArray: [1,2,3],
  [Symbol.iterator]: function ()  {
    return new MyArrayIterator(this.myArray)
  }
}
for (element of myIterable) {
  console.log(element)
}

Here, the Symbol.iterator property is set to a function that specifies how to create a new iterator.

We could also combine these protocols into one object instead of having to define two separate ones.

class MyArrayIterator {
  constructor(array) {
    this.array = array
    this.index = 0
    this[Symbol.iterator] = this
  }
  next() {
    if (this.index < this.array.length) {
      return {done: false, value: this.array[this.index++]}
    }
    else {
      return {done: true, value: null}
    }
  }
}

const myArray = [1,2,3]
const iterator = new MyArrayIterator(myArray)
for (element of iterator) {
  console.log(element)
}

Here, this object implements next - the iterator protocol, but it also has the property Symbol.iterator set to itself. This allows it to be used directly in the for..of loop. In this case we use the object directly as Symbol.iterator rather than a function since we have already constructed a new iterator, and don't need to do it again.

Other uses for iterators

Iterators can also be used to represent more abstract sequences resulting from computations. For example, the cartesian product of two arrays.

The cartesian product pairs everything in the first array with everything in the second, like this:

const numbers = [1,2]
const letters = ['a', 'b', 'c']
const product = cartesianProduct([1, 2], ['a', 'b', 'c'])

expect(cartesianProduct(product)).toEqual([
  [1, 'a'],
  [1, 'b'],
  [1, 'c'],
  [2, 'a'],
  [2, 'b'],
  [2, 'c'],
])

Here's an implementation

function cartesianProduct(array1, array2) {
  let result = []
  for (let index1=0; index1 < array1.length; index1++) {
    for (let index2=0; index2 < array2.length; index2++) {
      result.push([array1[index1], array2[index2]])
    }
  }
  return result
}

We can use it like this with for..of since cartesianProduct returns a list which implements the iterable protocol.

const numbers = [1, 2]
const letters = ['a', 'b', 'c']
for (const [number, letter] of cartesianProduct(numbers, letters)) {
  // Do something
}

But what if we don't need all the values. cartesianProduct will always complete both for loops in order to construct a list - it can't be interrupted. We could make an iterator that will lazily evaluate the next result that should be in the list when it's requested by calling next

class CartesianProductIterator {
  constructor(array1, array2) {
    this.array1 = array1
    this.array2 = array2
    this.index1 = 0
    this.index2 = 0

    this[Symbol.iterator] = this
  }

  isDone() {
    return this.index1 >= this.array1.length
  }

  next() {
    if (this.isDone()) {
      return { done: true, value: null }
    }

    const value = [this.array1[this.index1], this.array2[this.index2]]

    if (this.index2 < this.array2.length - 1) {
      this.index2++
    } else {
      this.index1 += 1
      this.index2 = 0
    }
    return { done: false, value }
  }
}

This looks a lot nastier. The next function needs to be reentrant - it can pick up where it left of the next time it's called. This requires storing our loop counters as state on the iterator object. We also can't use nice loops, since we only process one at a time - the increment has to be written manually.

const numbers = [1, 2]
const letters = ['a', 'b', 'c']
for (const [number, letter] of new CartesianProductIterator(numbers, letters)) {
  // Do something
}

We can now use our iterator with for..of like the list above, but instead of creating the entire list up front it produces the next result on the next iteration. We could stop at any point (e.g. break from the loop) and not have to calculate the unused values.

Callback

Another method you may think of for abstracting this is a callback. We want to do something in the nested loop with the value, and we don't know what it is yet. So, we defer it to runtime by passing a callback.

function cartesianProduct(array1, array2, callback) {
  for (let index1=0; index1 < array1.length; index1++) {
    for (let index2=0; index2 < array2.length; index2++) {
      callback([array1[index1], array2[index2]])
    }
  }
}

const numbers = [1,2]
const letters = ['a', 'b', 'c']
let result = []
cartesianProduct(array1, array2, ([a, b]) => result.push(a+b))

With the callback we can write the calculation as a normal function without worrying about making it reentrant - we never reenter

But we've lost the power of the iterators - we don't have an obvious way to break from the loop (maybe we could return something from the callback), and if we did we would have no way to resume. We've also lost our normal control flow. With the iterator we call .next on the iterator and it returns a result. Here's it's inverted, cartesianProduct does the calling - it calls our callback with each value.

If we want to perform multiple operations in this way we need to nest callbacks:

function cartesianProduct(array1, array2, callback) {
  for (let index1=0; index1 < array1.length; index1++) {
    for (let index2=0; index2 < array2.length; index2++) {
      callback([array1[index1], array2[index2]])
    }
  }
}

function sumPairs([a, b], callback) {
  callback(a+b)
}

const array1 = [1,2,3]
const array2 = [4,5,6]
let results = []
cartesianProduct(array1, array2,
  (pair) => sumPairs(pair,
    (result) => results.push(result)
  )
)

We can make this neater, similar to what we'll see later with some functional constructs, but we can't fix its other flaws.

  1. Saving the state and resuming it later
  2. Having non inverted control, and the nicer for..of syntax

Generators

What if the language could make a function reentrant for you without having to write an iterator object. The advantage of the iterator was its interface, we can save it somewhere and pull from it when we need. There's also a nice handy for..of syntax. The advantage of the callback is that the entire calculation runs in the same execution context - we don't need to manually save values and resume the computation.

What if we could just stop the function from executing, save it's state and return control back to the client code with an iterator interface. This is a generator.

function *myGeneratorFunction() {
  yield 1
}

const myGenerator = myGeneratorFunction()
const one = myGenerator.next()

This defines a generator function with function*. When that generator function is called it returns a generator. That generator implements both the iterable and iterator protocols and can be used with for..of. When .next() is called on the generator it resumes execution until the next yield expression. At that point .next() will return that value to the called. The generator will save its state and resume from that expression when .next() is called again. The generator will complete when the function returns (not yields).

Generators allow us to easily create iterators which we can easily save, pass around, resume, or even throw away before they're finished.

Here's the cartesian product from before:

function* cartesianProductGeneratorFunction(array1, array2) {
  for (let index1=0; index1 < array1.length; index1++) {
    for (let index2=0; index2 < array2.length; index2++) {
      yield [array1[index1], array2[index2]]
    }
  }
}

const array1 = [1,2]
const array2 = ['a', 'b', 'c']
const generator = cartesianProductGeneratorFunction(array1, array2)
for (const result of generator) {
  // Do stuff
}

Here we have syntax nearly identical to the simple nested for loop, except it's an iterator that lazily evaluates the next product when needed.

We can also create an array by fully executing an iterator (like a generator) with Array.from:

const numbers = [1, 2]
const letters = ['a', 'b', 'c']
expect(
  Array.from(cartesianProductGeneratorFunction(numbers, letters)),
).toEqual([
  [1, 'a'],
  [1, 'b'],
  [1, 'c'],
  [2, 'a'],
  [2, 'b'],
  [2, 'c'],
])

Data structures

We can use a generator to easily implement traversal over a custom data structure. For our earlier list we don't need to implement a constructor saving the list manage incrementing an index, and create the output format like we did with an iterator. We can just wrap a for loop.

function* listGenerator(list) {
  for (let i=0; i<list.length; i++) {
    yield list[i]
  }
}

We can also use yield* to easily work with recursive structures.

A preorder traversal of a binary tree visits the root node followed by all its left children, and then its right children.

Here yield* lets us delegate to another generator. First we yield the root, then we create a preorderTraversalGenerator for the left subtree. We delegate control to that iterator until the left tree is finished. Then we do the same for the right

In-order and post-order traversals can be written by reordering the three yield statements.

export class TreeNode {
  constructor(value, left, right) {
    this.value = value
    this.left = left
    this.right = right
  }
}

export function* preorderTraversalGenerator(tree) {
  if (tree) {
    yield tree.value
    yield* preorderTraversalGenerator(tree.left)
    yield* preorderTraversalGenerator(tree.right)
  }
}

Lazy sequences

The ability to throw the generator away part way through is particularly powerful because it allows us to represent calculations over large amounts, or even infinite amounts of data without having to process all of it.

The following is a generator function which will create a generator that represents an infinite process for generating every integer

export function* integers() {
  for (let value = 0; ; value++) {
    yield value
  }
}

Here's a similar generator function with a few more parameters to control the generated sequence of numbers.

function* range(start, stop = Infinity, step = 1) {
  for (let i = start; i < stop; i += step) {
    yield i
  }
}

We can write a utility function, take, to materialise some amount of our process into a list

export function take(amount, generator) {
  let result = []
  for (let i = 0; i < amount; i++) {
    result.push(generator.next().value)
  }
  return result
}
take(5, integers()) // [0,1,2,3,4]

Note that if you ask for more values than the generator can supply you will receive undefineds.

We can also represent operations over infinite lists of values with map

function* map(mapFn, generator) {
  for (let value of generator) {
    yield mapFn(value)
  }
}

const square = (a) => a*a

take(5,
  map(square,
    integers()
  )
) // [0,1,4,9,16]

Or selectively return values with filter

function* filter(mapFn, generator) {
  for (let value of generator) {
    if (predicate(value)) yield value
  }
}

const isEven = (a) => a % 2 === 0

take(5,
  filter(isEven,
    integers()
  )
) // [0,2,4,6,8]
take(5,
  map(square,
    filter(isEven,
      integers()
    )
  )
) // [0,4,16,36,64]

Generators can easily composed into more complex calculations. We can create a pipe to perform the composition.

const calculateSquareOfEvenNumbers = pipe(
  integers,
  filter(isEven),
  map(square),
)

take(5, calculateSquareOfEvenNumbers())

Pipe returns a generator function which needs to be invoked to produce a generator, this allows us to pass arguments to the first generator in the pipe.

const calculateSquareOfEvenNumbers = pipe(
  range,
  filter(isEven),
  map(square),
)

Array.from(calculateSquareOfEvenNumbers(5, 10)) // [4, 16, 36, 64]

You may notice something weird here though, map and filter took two arguments, but I'm only providing one. We could easily rewrite map to be called twice to return a generator

const map = (mapFn) => function *(generator) {
  for (let value of generator) {
    yield mapFn(value)
  }
}

Map is now a function that creates a generator function. pipe is given the result of calling map with mapFn which is a generator function - a generator factory. All that's left for pipe to do is to call that generator function with the previous generator to construct a new one that modifies the results.

export function pipe(...generatorFunctions) {
  if (generatorFunctions.length === 0) return function* () {}
  return function* (...argumentsForFirstGeneratorFunction) {
    let generator = generatorFunctions[0](...argumentsForFirstGeneratorFunction)
    for (var i = 1; i < generatorFunctions.length; i++) {
      generator = generatorFunctions[i](generator)
    }
    yield* generator
  }
}

Here pipe returns a generator function. This generator function passes its arguments along to the first generator function in the pipe to produce a generator. That generator is given as input to subsequent generator functions to create successive generators which modify the output of the previous. We then delegate control to this generator.