Generators
Iterators
We can iterate over an array using a for loop
const myArray = [1,2,3]
for (const i=0; i<myArray.length; i++) {
// Do stuff
}
But there are some issues:
- We need to repeat the range check and increment every time we want to iterate
- Seems like a lower level concern than the idea of iterating over the container
- What about other kinds of containers? e.g. a binary tree or a set
We can solve these issues using the iterator pattern. We create an object which encapsulates the traversal over the container.
Javascript supports iterators with the for...of syntax
const myArray = [1,2,3]
for (const element of myArray) {
// Do stuff
}
Now our user code does not need to know the details of iterating over the container, but how does this work? There are two parts to this
- Defining the iterator - iteration protocol
- Creating an iterator - iterable protocol In the case of a container, the container would implement the iterable protocol to describe how to create a new iterator for the container. That iterator is an object implementing the iterator protocol.
We'll start by making an iterator
Iteration protocol
Let's start by creating a class, the constructor should perform our initialization, and we can define a next method to handle the traversal.
class MyArrayIterator {
constructor(array) {
this.array = array
this.index = 0
}
next() {
return array[index++]
}
}
But we still need to know when to stop, javascript specifies that next should return an object which specifies whether the iteration is finished as well as the value
class MyArrayIterator {
constructor(array) {
this.array = array
this.index = 0
}
next() {
if (this.index < this.array.length) {
return {done: false, value: this.array[this.index++]}
}
else {
return {done: true, value: null}
}
}
}
There are other ways that this could be specified, for example returning a monad, or a sentinel symbol - but this is the protocol required by for..of.
We can now use this iterator
const myArray = [1,2,3]
const iterator = new MyArrayIterator(myArray)
var {done, value} = iterator.next()
for (; done === false; {done, value} = iterator.next()) {
console.log(value)
}
We can now iterate the same way over any collection, but the code looks messier than a simple for loop over an array. Next, we will implement the iterable protocol so we can use a simpler syntax.
Iterable protocol
Now that we have an iterator we need to tell for..of how to create one. This is the iterable protocol, and it describes an iterable container.
const myIterable = {
myArray: [1,2,3],
[Symbol.iterator]: function () {
return new MyArrayIterator(this.myArray)
}
}
for (element of myIterable) {
console.log(element)
}
Here, the Symbol.iterator property is set to a function that specifies how to create a new iterator.
We could also combine these protocols into one object instead of having to define two separate ones.
class MyArrayIterator {
constructor(array) {
this.array = array
this.index = 0
this[Symbol.iterator] = this
}
next() {
if (this.index < this.array.length) {
return {done: false, value: this.array[this.index++]}
}
else {
return {done: true, value: null}
}
}
}
const myArray = [1,2,3]
const iterator = new MyArrayIterator(myArray)
for (element of iterator) {
console.log(element)
}
Here, this object implements next - the iterator protocol, but it also has the property Symbol.iterator set to itself. This allows it to be used directly in the for..of loop. In this case we use the object directly as Symbol.iterator rather than a function since we have already constructed a new iterator, and don't need to do it again.
Other uses for iterators
Iterators can also be used to represent more abstract sequences resulting from computations. For example, the cartesian product of two arrays.
The cartesian product pairs everything in the first array with everything in the second, like this:
const numbers = [1,2]
const letters = ['a', 'b', 'c']
const product = cartesianProduct([1, 2], ['a', 'b', 'c'])
expect(cartesianProduct(product)).toEqual([
[1, 'a'],
[1, 'b'],
[1, 'c'],
[2, 'a'],
[2, 'b'],
[2, 'c'],
])
Here's an implementation
function cartesianProduct(array1, array2) {
let result = []
for (let index1=0; index1 < array1.length; index1++) {
for (let index2=0; index2 < array2.length; index2++) {
result.push([array1[index1], array2[index2]])
}
}
return result
}
We can use it like this with for..of since cartesianProduct returns a list which implements the iterable protocol.
const numbers = [1, 2]
const letters = ['a', 'b', 'c']
for (const [number, letter] of cartesianProduct(numbers, letters)) {
// Do something
}
But what if we don't need all the values. cartesianProduct
will always complete both for loops in order to construct a list - it can't be interrupted.
We could make an iterator that will lazily evaluate the next result that should be in the list when it's requested by calling next
class CartesianProductIterator {
constructor(array1, array2) {
this.array1 = array1
this.array2 = array2
this.index1 = 0
this.index2 = 0
this[Symbol.iterator] = this
}
isDone() {
return this.index1 >= this.array1.length
}
next() {
if (this.isDone()) {
return { done: true, value: null }
}
const value = [this.array1[this.index1], this.array2[this.index2]]
if (this.index2 < this.array2.length - 1) {
this.index2++
} else {
this.index1 += 1
this.index2 = 0
}
return { done: false, value }
}
}
This looks a lot nastier. The next
function needs to be reentrant - it can pick up where it left of the next time it's called. This requires storing our loop counters as state on the iterator object. We also can't use nice loops, since we only process one at a time - the increment has to be written manually.
const numbers = [1, 2]
const letters = ['a', 'b', 'c']
for (const [number, letter] of new CartesianProductIterator(numbers, letters)) {
// Do something
}
We can now use our iterator with for..of like the list above, but instead of creating the entire list up front it produces the next result on the next iteration. We could stop at any point (e.g. break from the loop) and not have to calculate the unused values.
Callback
Another method you may think of for abstracting this is a callback. We want to do something in the nested loop with the value, and we don't know what it is yet. So, we defer it to runtime by passing a callback.
function cartesianProduct(array1, array2, callback) {
for (let index1=0; index1 < array1.length; index1++) {
for (let index2=0; index2 < array2.length; index2++) {
callback([array1[index1], array2[index2]])
}
}
}
const numbers = [1,2]
const letters = ['a', 'b', 'c']
let result = []
cartesianProduct(array1, array2, ([a, b]) => result.push(a+b))
With the callback we can write the calculation as a normal function without worrying about making it reentrant - we never reenter
But we've lost the power of the iterators - we don't have an obvious way to break from the loop (maybe we could return something from the callback), and if we did we would have no way to resume. We've also lost our normal control flow. With the iterator we call .next on the iterator and it returns a result. Here's it's inverted, cartesianProduct
does the calling - it calls our callback with each value.
If we want to perform multiple operations in this way we need to nest callbacks:
function cartesianProduct(array1, array2, callback) {
for (let index1=0; index1 < array1.length; index1++) {
for (let index2=0; index2 < array2.length; index2++) {
callback([array1[index1], array2[index2]])
}
}
}
function sumPairs([a, b], callback) {
callback(a+b)
}
const array1 = [1,2,3]
const array2 = [4,5,6]
let results = []
cartesianProduct(array1, array2,
(pair) => sumPairs(pair,
(result) => results.push(result)
)
)
We can make this neater, similar to what we'll see later with some functional constructs, but we can't fix its other flaws.
- Saving the state and resuming it later
- Having non inverted control, and the nicer for..of syntax
Generators
What if the language could make a function reentrant for you without having to write an iterator object. The advantage of the iterator was its interface, we can save it somewhere and pull from it when we need. There's also a nice handy for..of syntax. The advantage of the callback is that the entire calculation runs in the same execution context - we don't need to manually save values and resume the computation.
What if we could just stop the function from executing, save it's state and return control back to the client code with an iterator interface. This is a generator.
function *myGeneratorFunction() {
yield 1
}
const myGenerator = myGeneratorFunction()
const one = myGenerator.next()
This defines a generator function with function*
. When that generator function is called it returns a generator. That generator implements both the iterable and iterator protocols and can be used with for..of. When .next()
is called on the generator it resumes execution until the next yield
expression. At that point .next()
will return that value to the called. The generator will save its state and resume from that expression when .next()
is called again. The generator will complete when the function returns (not yields).
Generators allow us to easily create iterators which we can easily save, pass around, resume, or even throw away before they're finished.
Here's the cartesian product from before:
function* cartesianProductGeneratorFunction(array1, array2) {
for (let index1=0; index1 < array1.length; index1++) {
for (let index2=0; index2 < array2.length; index2++) {
yield [array1[index1], array2[index2]]
}
}
}
const array1 = [1,2]
const array2 = ['a', 'b', 'c']
const generator = cartesianProductGeneratorFunction(array1, array2)
for (const result of generator) {
// Do stuff
}
Here we have syntax nearly identical to the simple nested for loop, except it's an iterator that lazily evaluates the next product when needed.
We can also create an array by fully executing an iterator (like a generator) with Array.from
:
const numbers = [1, 2]
const letters = ['a', 'b', 'c']
expect(
Array.from(cartesianProductGeneratorFunction(numbers, letters)),
).toEqual([
[1, 'a'],
[1, 'b'],
[1, 'c'],
[2, 'a'],
[2, 'b'],
[2, 'c'],
])
Data structures
We can use a generator to easily implement traversal over a custom data structure. For our earlier list we don't need to implement a constructor saving the list manage incrementing an index, and create the output format like we did with an iterator. We can just wrap a for loop.
function* listGenerator(list) {
for (let i=0; i<list.length; i++) {
yield list[i]
}
}
We can also use yield* to easily work with recursive structures.
A preorder traversal of a binary tree visits the root node followed by all its left children, and then its right children.
Here yield* lets us delegate to another generator. First we yield the root, then we create a preorderTraversalGenerator for the left subtree. We delegate control to that iterator until the left tree is finished. Then we do the same for the right
In-order and post-order traversals can be written by reordering the three yield statements.
export class TreeNode {
constructor(value, left, right) {
this.value = value
this.left = left
this.right = right
}
}
export function* preorderTraversalGenerator(tree) {
if (tree) {
yield tree.value
yield* preorderTraversalGenerator(tree.left)
yield* preorderTraversalGenerator(tree.right)
}
}
Lazy sequences
The ability to throw the generator away part way through is particularly powerful because it allows us to represent calculations over large amounts, or even infinite amounts of data without having to process all of it.
The following is a generator function which will create a generator that represents an infinite process for generating every integer
export function* integers() {
for (let value = 0; ; value++) {
yield value
}
}
Here's a similar generator function with a few more parameters to control the generated sequence of numbers.
function* range(start, stop = Infinity, step = 1) {
for (let i = start; i < stop; i += step) {
yield i
}
}
We can write a utility function, take, to materialise some amount of our process into a list
export function take(amount, generator) {
let result = []
for (let i = 0; i < amount; i++) {
result.push(generator.next().value)
}
return result
}
take(5, integers()) // [0,1,2,3,4]
Note that if you ask for more values than the generator can supply you will receive undefineds.
We can also represent operations over infinite lists of values with map
function* map(mapFn, generator) {
for (let value of generator) {
yield mapFn(value)
}
}
const square = (a) => a*a
take(5,
map(square,
integers()
)
) // [0,1,4,9,16]
Or selectively return values with filter
function* filter(mapFn, generator) {
for (let value of generator) {
if (predicate(value)) yield value
}
}
const isEven = (a) => a % 2 === 0
take(5,
filter(isEven,
integers()
)
) // [0,2,4,6,8]
take(5,
map(square,
filter(isEven,
integers()
)
)
) // [0,4,16,36,64]
Generators can easily composed into more complex calculations. We can create a pipe to perform the composition.
const calculateSquareOfEvenNumbers = pipe(
integers,
filter(isEven),
map(square),
)
take(5, calculateSquareOfEvenNumbers())
Pipe returns a generator function which needs to be invoked to produce a generator, this allows us to pass arguments to the first generator in the pipe.
const calculateSquareOfEvenNumbers = pipe(
range,
filter(isEven),
map(square),
)
Array.from(calculateSquareOfEvenNumbers(5, 10)) // [4, 16, 36, 64]
You may notice something weird here though, map and filter took two arguments, but I'm only providing one. We could easily rewrite map to be called twice to return a generator
const map = (mapFn) => function *(generator) {
for (let value of generator) {
yield mapFn(value)
}
}
Map is now a function that creates a generator function. pipe
is given the result of calling map with mapFn which is a generator function - a generator factory. All that's left for pipe to do is to call that generator function with the previous generator to construct a new one that modifies the results.
export function pipe(...generatorFunctions) {
if (generatorFunctions.length === 0) return function* () {}
return function* (...argumentsForFirstGeneratorFunction) {
let generator = generatorFunctions[0](...argumentsForFirstGeneratorFunction)
for (var i = 1; i < generatorFunctions.length; i++) {
generator = generatorFunctions[i](generator)
}
yield* generator
}
}
Here pipe returns a generator function. This generator function passes its arguments along to the first generator function in the pipe to produce a generator. That generator is given as input to subsequent generator functions to create successive generators which modify the output of the previous. We then delegate control to this generator.