# Swift 5.1 Collection Diffing

Disclaimer: this article refers to the first Swift implementation of collection diffing. The implementation itself has already been improved, however the concepts behind this article are exactly the same.

Until recently, dealing with dynamic tables/collection views meant one of the following:

- Call
`reloadData()`

at every change - Use a dependency like IGListKit or DifferenceKit
- Manually deal with every possible change (while praying to not miss anything 🤞🏻)

Swift 5.1 introduces a new alternative: Use the new `Collection`

’s `difference(from:)`

.

Want to know about what else is new Swift 5.1? Check out my lightning talk here.

Unlike other features, Swift collection diffing is entirely additive and it is completely written in Swift:

this presents us an unique opportunity to learn how the collection diffing is actually implemented without digging into any other lower level language.

In this article I'm going to do exactly that: at over 5000 words with a lot of technicalities and code, you might want to find a comfortable place before digging in.

Ready? Let's go!

** Open Source iOS & macOS Chat SDK** - Check out Stream's Cross-Platform Open Source Chat SDK on GitHub! Write once and deploy your app with fully featured chat UI on iOS and macOS!

## CollectionDifference<ChangeElement>

Before talking about the function itself, I want to focus on what it returns:

an instance of `CollectionDifference<ChangeElement>`

.

CollectionDifference is a generic `Collection`

defined as "*A collection of insertions and removals that describe the difference between two ordered collection states*".

The generic type `ChangeElement`

is the element type of the two collections that we are diffing, note that, if we look solely at the `CollectionDifference`

definition, `ChangeElement`

is completely unconstrained: it doesn’t even need to conform to `Equatable`

!

The elements of this collection are of type `CollectionDifference.Change`

, which is a new enum with two cases: `.insert`

and `.remove`

. Before moving into this enum, there are a couple of points that must be disclosed about `CollectionDifference`

:

- The collection returns first all the removals (all the
`.remove`

cases), and then all the insertions (all the`.insert`

cases). This is assured by an internal initializer: even if we try to initialize this collection with changes in random order, the initialized collection will be always ordered properly. - The collection insertions, deletions, and association between the two must be unique (more on this later). This is assured by an internal validator: while trying to initialize an unsorted
`CollectionDifference`

is allowed, failing this uniqueness validation will fail the collection initialization.

### Insertions and Removals

If we’re interested exclusively on the insertions **or** the removals of a `CollectionDifference`

, instead of traversing the collection itself, we can use its public properties `insertions`

and `removals`

, which are two arrays (with elements of type `CollectionDifference.Change`

).

These arrays are actually where the elements of the main collection are stored, and accessing them directly is more performant the traversing the collection itself.

## CollectionDifference.Change

When comparing two collections, you can think of any change between the collections elements as either:

- an insertion (a new element has been added)
- a removal (an element has been removed)
- or both (when an element changes position, it is
*removed*from the current position and*added*to the new position).

With that being said, it comes as no surprise that `CollectionDifference.Change`

is an `Enum`

with two cases: `insert`

and `remove`

.

Both cases come with three associated values: an offset, an element, and an optional association.

### Change Offset

The offset is the position of the change.

In case of a removal, it reflects the position where the element used to be:

```
let oldArray = ["a", "b", "c", "d"]
let newArray = ["a", "b", "d"]
// The element "c" at index 2 has been removed
let difference = newArray.difference(from: oldArray)
// difference is a one-element collection
// where the element is of type `.remove` with `offset` 2
```

In the case of an insertion, it reflects the position of the element in the final collection:

```
let oldArray = ["a", "b", "c", "d"]
let newArray = ["a", "b", "c", "d", "e"]
// A new element "e" has been added at index 4
let difference = newArray.difference(from: oldArray)
// difference is a one-element collection
// where the element is of type `.insert` with `offset` 4
```

### Change Element

This element is the whole reason why `CollectionDifference`

is declared generic:

the change element is the actual collection element that has been removed or inserted.

```
let oldArray = ["a", "b", "c", "d"]
let newArray = ["a", "b", "d", "e"]
// The element "c" at index 2 has been removed
// A new element "e" has been added at index 3
let difference = newArray.difference(from: oldArray)
// difference is a two elements collection:
// - the first element is of type `.remove` (with offset 2) with element "c"
// - the second element is of type `.insert` (with offset 3) with element "e"
```

### Change Association

Again, when an element moves position between two collection states, the change can be considered as a deletion (from the old position) and as an insertion (to the new position).

This is what the `Change`

association is for: to link that removal and that insertion.

The association, called `associatedWith`

, is an index representing *the other* change offset:

if we’re looking at a `.insert`

change, the `associatedWith`

index will be equal to the associated `.remove`

`offset`

index, and vice versa.

As change associations add an extra cost on the diffing, they’re not computed by default:

the `difference(from:)`

result is a collection of `.remove`

/`.insert`

elements all with `associatedWith`

value set to `nil`

. If we’re also interested in this association, we need to call `.inferringMoves()`

in our difference collection:

```
let oldArray = ["a", "b", "d", "e", "c"]
let newArray = ["a", "b", "c", "d", "e"]
// the element "c" has moved from index 4 to 2
// therefore we can think it as:
// - a removal from position 4
// - and an insertion at position 2
let difference = newArray.difference(from: oldArray)
// difference is a two elements collection
// - the first element is a `.remove` with offset 4
// and associated offset of `nil`
// - the second element is an `.insert` with offset 2
// and associated offset of `nil`
let differenceWithMoves = difference.inferringMoves()
// differenceWithMoves is a two elements collection
// - the first element is a `.remove` with offset 4
// and associated offset 2
// - the second element is an `.insert` with offset 2
// and associated offset 4
```

#### InferringMoves

This function is straightforward: it scans the whole `CollectionDifference`

instance and adds an association when an element change matches both among the removals and the insertions, otherwise it just returns the Change *unchanged*.

### A Look Back to CollectionDifference’s Rules

Now that we have a clear definition of what the `Change`

type is, we can look back to the `CollectionDifference`

definition and have a clearer understanding of some of its rules:

- Having non-unique removals would mean having multiple removals at the same index: it's impossible.
- Same with additions: having multiple additions at the same offset would mean having multiple elements at the same index in the final collection state.
- Lastly, associations must always come in pairs (an insertion linked to a removal and vice versa): there can’t be a one way association.

### A Second CollectionDifference Order

If we look at the how the collection internal init is defined, we can see that `CollectionDifference`

has a clear order that goes beyond "*removals first and insertions second*":

all the insertions are stored in the `insertions`

array in order from lowest to highest `offset`

, and same goes for the deletions with the `removals`

array.

However, if we look at the `CollectionDifference`

`subscript`

, we can see that the collection is exposed in, again, a different order:

the removals are exposed from highest to lowest `offset`

, while the insertions from lowest to highest `offset`

.

This exposed order allows us to use the returned `CollectionDifference`

instance to transform a collection from the old state to the new state by applying, one by one, the collection `Change`

s:

```
let oldArray = ["a", "b", "c", "d"]
let newArray = ["x", "a", "e", "c"]
var anotherArray = oldArray
let difference = newArray.difference(from: oldArray)
// difference is a four elements collection:
// 1. `.remove` at offset 3
// 2. `.remove` at offset 1
// 3. `.insert` at offset 0
// 4. `.insert` at offset 2
for change in difference {
switch change {
case let .remove(offset, _, _):
anotherArray.remove(at: offset)
case let .insert(offset, newElement, _):
anotherArray.insert(newElement, at: offset)
}
}
// at this point `anotherArray` is equal to `newArray`
```

`anotherArray`

starts in the same state of `oldArray`

and ends up like `newArray`

by following the `difference`

collection order:

```
["a", "b", "c", "d"] // initial state
["a", "b", "c"] // removal at index 3
["a", "c"] // removal at index 1
["x", "a", "c"] // insertion at index 0
["x", "a", "e", "c"] // insertion at index 2
```

If the collection was ordered in any another way, we would have obtained a different outcome (or even a crash!).

### Applying

If we need to apply the `difference`

result to a collection, we don’t have to do it ourselves (like in the example above):

Swift offers a new method, `applying(_:)`

, that does it for us.

Therefore the example above could have been written entirely as:

```
let oldArray = ["a", "b", "c", "d"]
let newArray = ["x", "a", "e", "c"]
var anotherArray = oldArray
let difference = newArray.difference(from: oldArray)
// applying the difference here 👇🏻
anotherArray = anotherArray.applying(difference)!
// at this point `anotherArray` is equal to `newArray`
```

Since this method can be called on *any* collection (as long the collection elements are compatible), instead of applying the `difference`

in place, it returns a new optional collection with the difference applied.

The reason why the return type is optional is clear as soon as we imagine doing a `remove`

or an `insert`

to an index out of range:

instead of crashing, the function will return `nil`

when the difference is *applied* to an incompatible collection.

## difference(from:)

Now that we’ve covered the basics, it’s time to uncover what this method actually does, it might take you by surprise (it did to me!) but the whole implementation is a one liner:

```
extension BidirectionalCollection where Element : Equatable {
public func difference<C: BidirectionalCollection>(
from other: C
) -> CollectionDifference<Element> where C.Element == Self.Element {
return difference(from: other, by: ==)
}
}
```

It turns out that we were using a convenience method all along!

Before digging into the real `difference(from:by:)`

method, it's important to note how it's `difference(from:)`

that requires the elements of the collections to conform to `Equatable`

, this requirement is nowhere to be seen in `difference(from:by:)`

👍🏻.

## difference(from:by:)

So far we’ve used `CollectionDifference`

to compute the difference in the *equality* of two states of a collection, however the truth is that this comparison could be used for many more use cases:

in fact, the original pitch for collection diffing describes the feature as a "*diffing functionality […] necessary to provide easy creation, representation, and application of ordered collection state transitions*".

A *state transition* can have different meanings based on the context, therefore it makes a lot of sense for Swift to set no limits on what this *transition* can be.

With that being said, it should come with no surprise that the real `difference`

function signature is:

```
public func difference<C: BidirectionalCollection>(
from other: C,
by areEquivalent: (Element, C.Element) -> Bool
) -> CollectionDifference<Element>
where C.Element == Self.Element
```

Note how non-constraining this function is: all it requires is two `BidirectionalCollection`

s (two collections that can be traversed both by moving backward and/or forward) with the same associated types.

The associated types don’t need to conform to any specific protocol, all it matters is that we pass a method that takes two elements of this type and returns a boolean:

what this *comparison* method does, and what it means, is entirely up to the developer.

### An Example

As a fun example, let’s run `difference(from:by:)`

with two identical collections and a comparison method that returns always `false`

:

```
let oldArray = ["a", "b", "c"]
let newArray = ["a", "b", "c"] // same as `oldArray`
let difference = newArray.difference(from: oldArray,
by: { _, _ in false })
// `difference` is a 6 elements collection with:
// - three removals
// - three insertions
```

The result is a collection with three removals and three insertions:

this is because our comparison method (which, again, returns always `false`

in this example) removes any connection between any element in either state, therefore the `difference`

algorithm will see this transition as a complete rewrite of the array, regardless of the contents of those arrays!

### A Note on difference(from:by:) Return Type

`difference`

returns a non-optional `CollectionDifference`

.

Previously I've mentioned that a `CollectionDifference`

initialization fails when we try to initialize it with elements that do not comply to its rules. How does Swift guarantees a non-optional `CollectionDifference`

instance?

While Swift exposes only one failable initializer for `CollectionDifference`

, it turns out that there's also a non-failable initializer:

this second initializer is marked as `internal`

, therefore it can only be used within the Swift repository, and it is not exposed when we use Swift from a toolchain in Xcode.

This is because the internal initializer is used only with algorithms that have been mathematically proven to instantiate a correct `CollectionDifference`

, therefore it is ok for Swift to return a non-optional collection.

### Inside difference(from:by:)

The first thing that this method does is creating two collections, `source`

and `target`

, of internal type `_CountingIndexCollection`

:

the `source`

reflects the *old* state of the original collection, while `target`

reflects the *new* state of the original collection.

If we call `["a","b","c"].difference(from: ["e", "f", "g"])`

for example, `source`

mirrors `["e", "f", "g"]`

while `target`

mirrors `["a", "b", "c"]`

.

`_CountingIndexCollection`

is a wrapper of the real collection, with an easy way to get the offset of its indices from its start index.

```
let originalCollection = ["a", "b", "c", "d"]
Let offsetCollection = _CountingIndexCollection(originalCollection)
if let index = offsetCollection.index(of: "d") {
print(index.offset) // prints "3"
print(offsetCollection[index]) // prints "d"
}
```

Lastly, the method creates an instance of internal type `_CollectionChanges`

, which takes the newly created `source`

and `target`

collections along with our injected comparison block.

After the `_CollectionChanges`

initialization is complete, the `difference`

method maps back the `_CollectionChanges`

instance into an array of `Change`

instances, and returns a `CollectionDifference`

initialized with these changes.

Therefore, the real diffing doesn’t happen in this method, but inside the `_CollectionChanges`

initialization:

we need to explore what this `_CollectionChanges`

is all about.

### _CollectionChanges

`_CollectionChanges`

is a generic collection of changes (*duh!*) between a source and target collection. In the documentation it is stated that an instance of this type can be used to:

- Traverse the longest common subsequence of
`source`

and`target`

, which means the longest subsequence common between the two collections. For example, the longest common sequence between X**MJ**Y**AU**Z and**M**Z**JA**WX**U**is MJAU: the subsequence, as long as it is found while traversing both collections in the same direction, doesn’t have to occupy consecutive positions within the original collections.

- Traverse the shortest edit script of remove and insert operations, which means finding the minimum number of operations needed to transform the
`source`

collection into the`target`

.

These definitions/challenges are actually dual:

if we solve one, we've also found a way to solve the other as well!

Both challenges are very valuable to our diffing: since additions and removals are costly, and no change is free, solving those problems means finding the cheapest (read: fastest) way to turn our old collection state into the new collection.

Let's take two collections for example, `ABCABBA`

and `CBABAC`

: how many ways there are to transform one into the other?

The answer is several: `_CollectionChanges`

promises to find the most efficient one.

### Endpoint

The first declaration that we met inside the `_CollectionChanges`

body is a new `typealias`

that forms a tuple by taking one index of the `source`

collection and one from the `target`

collection:

```
typealias Endpoint = (x: SourceIndex, y: TargetIndex)
```

This is what we will use when associating two locations of the two collections:

For example, if we want to associate the first element of the `source`

with the second element of the `target`

, we will define the tuple (nee, the `Endpoint`

) as `(1, 2)`

.

### PathStorage: An Hidden Swift Game 👾

Let’s assume to have a 2D chart in front of us. The X-axis is our `source`

collection and our Y-axis is our `target`

collection.

```
| • | X | A | B | C | D |
• | | | | | | |
X | | | | | | |
Y | | | | | | |
C | | | | | | |
D | | | | | | |
```

You’re now tasked to draw a path that goes:

- from
`(0,0)`

- to
`(lastIndexOfTarget, lastIndexOfSource)`

```
| • | X | A | B | C | D |
• | S | | | | | |
X | | | | | | | // S: Start Here
Y | | | | | | | // T: Target Destination
C | | | | | | |
D | | | | | | T |
// Start Position S: (0,0)
// Target Position T: (4,5)
```

Fill the gaps

Rules:

- You can only advance left to right (no going back)
- You can only advance top to bottom (no going back)
- You can advance left to right and top to bottom at the same time (diagonal move) only if you advance equally in both directions
- You can advance left to right and top to bottom at the same time (diagonal move) only when the X-axis and Y-axis elements are the same

Note: I’m using the equality as a comparison for simplicity sake, however the rules above are still applied in general sense.

The game is to find the allowed path with the most diagonal moves.

Sounds familiar? Because it is:

this game is exactly what we introduced before in `_CollectionChanges`

as the *longest common subsequence* challenge.

Finding the path with the most diagonal moves is the same as finding the *longest common subsequence* of our two collections, and finding the path with the most diagonal moves means finding the path with as little non-diagonal moves as possible. Non-diagonal moves correspond to a change:

- a vertical move (top to bottom) is an insertion
- while an horizontal move correspond to a deletion.

Here’s the solution of the game above:

```
| • | X | A | B | C | D |
• | S | | | | | |
X | | x | x | x | | | // S: Start Here
Y | | | | x | | | // T: Target Destination
C | | | | | x | |
D | | | | | | T |
// Final path:
(0,0) → (1,1) → (1,2) → (1,3) → (2,3) → (3,4) → (4,5)
```

Let's look at the moves, one by one:

- we start at
`(0,0)`

, nothing to see here - we first move diagonally to
`(1,1)`

, as both axis have the value`X`

at that position - then we move horizontally to
`(1,2)`

, which is equivalent to removing`A`

- then we move horizontally to
`(1,3)`

, which is equivalent to removing`B`

- then we move vertically to
`(2,3)`

, which is equal to inserting`Y`

to our collection - the we move diagonally to
`(3,4)`

which is allowed because we have`C`

in both axis - lastly we move diagonally to
`(4,5)`

, which is allowed because we have`D`

in both axis, arriving to our target destination.

In other words, we can use this path to find the difference operations necessary to go from `target`

to `destination`

, looking back again to the path: - the two horizontal moves, `(1,1) → (1,2)`

and `(1,2) → (1,3)`

, correspond to two deletions - the one vertical move, `(1,3) → (2,3)`

, correspond to an insertion

Before translating those moves to an

`offset`

of`CollectionDifference.Change`

we must remember that the indexes of this path are shifted by one, as`(0,0)`

is the initial position in the chart, and not the start index of each collection.

Remember our `Endpoint`

definition?

```
typealias Endpoint = (x: SourceIndex, y: TargetIndex)
```

The path that we have just found is just an array of endpoints:

the second declaration that we find in `_CollectionChanges`

is `PathStorage`

: which is where we store this path.

### pathStartIndex

Our final property found inside `_CollectionChanges`

is `pathStartIndex`

, which is the index in `pathStorage`

of the first segment in the *difference path*.

This is an internal optimization:

if two collections have the same prefix, like **AE**OS and **AE**FT, it’s no use to start doing our diffing in the common prefix AE, therefore we skip it and do all our diffing operations starting from the `pathStartIndex`

instead.

### enum Element

This is an `internal`

enum declaration that will help us when we need to transform the path found above into a `CollectionDifference.Change`

instance:

```
enum Element {
case removed(Range<SourceIndex>)
case inserted(Range<TargetIndex>)
case matched(Range<SourceIndex>, Range<TargetIndex>)
}
```

Note how the associated values of each case is a range instead of a simple index:

while in the example above we've separated each move to one step, multiple *consecutive* moves in the same direction can be grouped together into one move, therefore moves like our two horizontal moves `(1,1) → (1,2) → (1,3)`

can be grouped into one as `(1,1) → (1,3)`

, therefore we can describe our path succinctly, and this contraction allows us to use ranges.

If you recall were we started, we've said that our diffing method initializes a `_CollectionChanges`

instance: > After the `_CollectionChanges`

initialization is complete, the `difference`

method maps back the `_CollectionChanges`

instance into an array of `Change`

instances, and returns a `CollectionDifference`

initialized with these changes.

an excerpt of the chapter "Inside difference(from:by:)"

This `Element`

definition is exactly what `_CollectionChanges`

is a collection of, therefore what our original `difference`

method does is take these `Element`

instances and, one by one, turn them into `Change`

s (while ignoring all the matches).

We've covered the basics: it's time to look at the `_CollectionChanges`

initialization.

### _CollectionChanges Main init

As a reminder, this

`init`

has three parameters: our two collection states and the comparison method.

This init does two things: - initialize the `pathStartIndex`

with value `0`

and the `PathStorage`

array to an empty array - call a `private`

method `formChanges`

while passing to it all the `init`

parameters.

### _CollectionChanges formChanges

As a reminder, what we want to do now is filling the

`_CollectionChanges`

's`PathStorage`

(which is an array of`Endpoint`

s, which correspond to our path as we've seen above).

Once called, this method does some initial checks.

If both collection states are empty (e.g. we're comparing two empty arrays), this method stops immediately: the difference is empty as there's nothing to compare.

If we continue, it means that at least one of the collection states is not empty.

This is the point where our method finds the common prefix among the collection states (according to our comparison method):

- if the common prefix is equal to at least one of the two collections, then the change between the two collections is trivial: going back to our path game, having a collection equal to the prefix of the other means that we can use diagonal moves until we hit one of the edges. Once we hit the edge, we are only allowed to move either by going down (insertions) or right (deletions). Here are three examples where the collections are one the prefix of the other.

```
// source = ASDFO
// target = ASD
// the target is a prefix of the source:
| • | A | S | D | F | O |
• | S | | | | | |
A | | x | | | | | // S: Start Here
S | | | x | | | | // T: Target Destination
D | | | | x | x | T |
↑
we hit the edge here
// Final path:
(0,0) → (3,3) → (3,5)
// Which translates into:
// - three matches (0,0) → (3,3)
// - two deletions (of letter F, and O), (3,3) → (3,5)
// source = ASD
// target = ASDFO
// the source is a prefix of the target:
| • | A | S | D |
• | S | | | | // S: Start Here
A | | x | | | // T: Target Destination
S | | | x | |
D | | | | x | ← we hit the edge here
F | | | | x |
O | | | | x |
// Final path:
(0,0) → (3,3) → (5,3)
// Which translates into:
// - three matches (0,0) → (3,3)
// - two insertions (of letter F, and O), (3,3) → (5,3)
// source = ASD
// target = ASD
// both collections are equal, therefore they are each other's prefixes
| • | A | S | D |
• | S | | | | // S: Start Here
A | | x | | | // T: Target Destination
S | | | x | |
D | | | | T | ← we hit the edge here
// Final path:
(0,0) → (3,3)
// Which translates into:
// - three matches (0,0) → (3,3)
```

Once again I’m using the equality as a comparison for simplicity sake.

Based on which of the three cases we fall into, our final `pathStorage`

will have two or three elements (endpoints) and the method returns.

- Lastly, if we arrive here, it means that our collection states are:
- Not the same (according to our comparison method)
- Not empty
- Might have a common prefix

This is when our method `formChanges`

calls another method, `formChangesCore`

.

### _CollectionChanges formChangesCore

Remember what we are accomplishing:we are trying to find the most efficient way to go from one collection state to another, by using as little insertions/deletions as possible, while reusing the

`source`

collection elements as much as possible (a.k.a. finding the path with the the most diagonal moves in our path game).

This method is invoked with a few parameters: the usual original collections and comparison method, plus the end indexes on each collection of their common prefix (that we have just computed above).

Since we've already got rid of the base cases, this method is (finally!) where the magic happens.

To put it simply, this method implements the "Greedy LCS/SES Algorithm" published in 1986 (!) by Eugene W. Myers.

#### The Greedy LCS/SES Algorithm

Curious on which algorithm IGListKit uses? Check out Paul Heckel's paper here. Here's a Swift implementation of said algorithm.

Without going too deeply on the theory behind the algorithm, which is explained and demonstrated very clearly in the original paper (approachable by everyone), the algorithm is based on three steps:

- Find the furthest valid path that starts at
`(0,0)`

and has up to`D`

non-diagonal moves.

- If we've reached the Target Destination, end the algorithm.

- If not, increase
`D`

by one (initially set to`0`

) and start again.

In order to get to the Target Destination, instead of computing over and over all the possible valid paths with `D`

non-diagonal moves, the algorithm stores the previous explored paths.

However, not all possible paths are stored, nor all the possible paths are explored.

Instead, the algorithm bases its research for the best path on the *diagonals* of our 2D chart game, these diagonals are defined as `k = x - y`

. Here are a few examples:

```
diagonal with k = 0 diagonal with k = 1 diagonal with k = 2
| • | A | S | D | | • | A | S | D | | • | A | S | D |
• | k | | | | • | | k | | | • | | | k | |
A | | k | | | A | | | k | | A | | | | k |
S | | | k | | S | | | | k | S | | | | |
D | | | | k | D | | | | | D | | | | |
F | | | | | F | | | | | F | | | | |
O | | | | | O | | | | | O | | | | |
diagonal with k = -1 diagonal with k = -2 diagonal with k = -3
| • | A | S | D | | • | A | S | D | | • | A | S | D |
• | | | | | • | | | | | • | | | | |
A | k | | | | A | | | | | A | | | | |
S | | k | | | S | k | | | | S | | | | |
D | | | k | | D | | k | | | D | k | | | |
F | | | | k | F | | | k | | F | | k | | |
O | | | | | O | | | | k | O | | | k | |
```

Here are the first few iterations of the algorithm:

- Initially, the algorithm starts at
`(0,0)`

and finds the furthest path with`0`

non diagonal moves, which will necessarily end its path in the 0th diagonal (`k = 0`

).

- On the first iteration (if we haven't reached the Target Destination yet), we try to find the furthest paths that have
`1`

non diagonal moves,**starting from where we left in the previous found path**. We will necessarily end up on a diagonal with k equal to 1 or -1 (remember, by moving to the diagonal with`k = 1`

it means that we do an insertion, dually, we perform a removal by moving into the diagonal with`k = -1`

). Assuming that we haven't arrived to the final destination yet, we have 2 potential paths now: one that ends in diagonal 1 and one in diagonal -1. We keep both and keep iterating.

- On the next iteration we will try to find the furthest path with
`2`

non diagonal moves, instead of starting from scratch, again, we use the two potential paths from the previous iteration. Starting from the previous two potential paths that ended in diagonal 1 and -1, this iteration paths will necessarily end in either diagonal -2, 0, or 2. If we haven't found the Target Destination yet, we keep the furthest path on each diagonal (three in total) and keep iterating.

Would you like to see this process in action? Robert Elder has an interactive Visualization here (jump to step 5!)

This iteration keeps going until we finally reach the Target Point.

A few important observations:

- a path at a given iteration with "x" non-diagonal moves is just a path from the previous iteration, with "x-1" moves, and an extra non diagonal move (and potentially multiple diagonal moves).

- we grow the number of potential paths by one at every iteration:
- we start with one (that lies at diagonal 0)
- then we grow to 2 (one that lies at diagonal 1, one at diagonal -1)
- then 3 (diagonals -2, 0, 2)
- etcetera

- at a given iteration D, all the recorded path will end in one of the following diagonals: { −D, −D+2, ... D−2, D }.
- we start with one path that lays at diagonal 0 (iteration D = 0 -> diagonals {0})
- at the first iteration we have two paths, one at diagonal -1, one at diagonal 1 (iteration D = 1 -> diagonals {-1, 1})
- at the next iteration we have three paths, one at diagonal -2, one at diagonal 0, one at diagonal 2 (iteration D = 2 -> diagonals {-2, 0, 2})
- etcetera

We're basically alternating diagonals (while also expanding at each iteration).

All these observations are demonstrated in the original paper.

That's the whole algorithm! Let's see how to implement it in Swift.

#### The Greedy LCS/SES Algorithm in Swift

Since `formChanges`

passes the computed end indexes of the common prefix of our two collection states, the first iteration has been done already:

computing the common prefix of the two collections is equivalent to finding the furthest path with `0`

non diagonal moves.

All we have to do now is to store this first path somewhere.

##### _SearchState

As we've observed during the explanation of the algorithm, at every iteration we increase the potential paths by one, and one potential path at a given iteration is exactly a potential path from the previous iteration with one extra non-diagonal move.

Therefore we can describe each potential path by pointing at its last move, which is an `Endpoint`

instance as we've defined before, and then refer to its previous path from the previous iteration.

This is exactly what the private `_SearchState`

generic struct does.

This struct relies on an internal generic storage called `_LowerTriangularMatrix`

:

by definition, a square matrix is called lower triangular if all the entries above the main diagonal are zero. Example:

```
| * | 0 | 0 | 0 |
| * | * | 0 | 0 |
| * | * | * | 0 |
| * | * | * | * |
```

In order to optimize performance and allocation, this `_LowerTriangularMatrix`

externally behaves like a real lower triangular matrix (with subscript and all), however internally is just an array of elements along with a `dimension`

property (this is all this type needs to mirror a real lower triangular matrix).

Going back to `_SearchState`

, the struct considers every row of its storage (which, again, is a lower triangular matrix) as the *search frontier* of each iteration, while the columns represent the furthest path on the *x*th diagonal for that search frontier.

In other words:

- the position
`(0,0)`

in the storage contains the final position of our furthest path with`0`

non-diagonal moves starting at`(0,0)`

in our game chart. - the positions
`(1,0)`

and`(1,1)`

in the storage contain the furthest paths with`1`

non-diagonal moves, the former contains the one that ends in diagonal -1, while the latter contains the one that ends in diagonal 1. - etcetera

##### The Algorithm

Finally let's take a look at the algorithm.

Each line comes with extended explanation, to associate everything with what we've discussed so far.

```
private mutating func formChangesCore<Source: BidirectionalCollection,
Target: BidirectionalCollection>(
from a: Source, to b: Target,
x: Source.Index, y: Target.Index,
by areEquivalent: (Source.Element, Target.Element) -> Bool
) where
Source.Element == Target.Element, Source.Index == SourceIndex,
Target.Index == TargetIndex
{
// current furthest position coordinate,
// according to our collection common prefix.
// a.k.a. the furthest path with `0` non-diagonal moves
var (x, y) = (x, y)
// our Target Destination coordinates
let (n, m) = (a.endIndex, b.endIndex)
// initialize the storage
var v = _SearchState<Source.Index, Target.Index>(consuming: &pathStorage)
// put the end of the common prefix endpoint
// at position `(0,0)` of our storage
v.appendFrontier(repeating: (x, y))
// set the iteration number
var d = 1
// used later
var delta = 0
// iteration body
outer: while true {
// expand our storage to contain a new row of _search frontier_
// a.k.a. the furthest paths endpoints of the `d`th iteration
v.appendFrontier(repeating: (n, m))
// one of the observations of the algorithm was about the fact
// that at a given iteration D, all the furthest paths would
// lie at the diagonals { -D, -D + 2, ... D -2, D }
// This observation explains the following stride
for k in stride(from: -d, through: d, by: 2) {
// if we're finding the leftmost new path, a.k.a. the one with
// diagonal -d, or if we're finding the new furthest path at a
// diagonal kth, which is not the right most new diagonal,
// a.k.a. k == t, and if the furthest path in diagonal k-1 has
// traveled our `target` collection less than or furthest path
// in diagonal k+1 in our previous iteration...
if k == -d || (k != d && v[d - 1, k - 1].x < v[d - 1, k + 1].x) {
// ...then we start computing our new furthest path by
// reading the previous iteration furthest path at diagonal
// k+1
(x, y) = v[d - 1, k + 1]
// if we are not at the edge, move down by one
if y != m { b.formIndex(after: &y) }
} else {
// like above, but this time we use the previous iteration
// furthest path at diagonal k-1
(x, y) = v[d - 1, k - 1]
// if we are not at the edge, move right by one
if x != n { a.formIndex(after: &x) }
}
// with the if/else above we have just "used" our non-diagonal
// move of this iteration
// now that we've moved either down or right by one, check if
// we can move diagonally, and do it as long as you can.
// find the common prefix, according to our comparison method,
// from the current position in the two collection states
let matches = a[x..<n]._commonPrefix(with: b[y..<m],
by: areEquivalent)
// our new furthest path at the *d*th iteration for diagonal
// *k*th is equivalent to the end of the prefix that we've
// just found
(x, y) = (matches.0.endIndex, matches.1.endIndex)
// store the new endpoint
v[d, k] = (x, y)
// if this new endpoint matches the target destination...
if x == n && y == m {
// ...note down at which diagonal we've stopped at..
delta = k
// ...and end the algorithm.
break outer
}
}
// if we haven't found found the target destination yet,
// increase the depth by one and iterate again.
d += 1
}
// after finding the cheapest path at the *d*th iteration, on
// the *delta*th diagonal, convert our `_SearchState` instance
// storage into the final `_CollectionChanges` instance.
self = v.removeCollectionChanges(a: a, b: b, d: d, delta: delta)
}
```

** Open Source iOS & macOS Chat SDK** - Check out Stream's Cross-Platform Open Source Chat SDK on GitHub! Write once and deploy your app with fully featured chat UI on iOS and macOS!

### Conclusions

There you have it: if you've come so far, congratulations! You now know exactly how Swift implements the diffing between two collections.

It took us a long journey to arrive here, however, for something non-trivial and error-prone like collection diffing, it's very good to have a native, efficient solution implemented right into the language itself.

Lastly, if you are an iOS or mac developer, I would like to remind you that you shouldn't use this in collection and table views:

with the upcoming iOS and macOS releases, we will have an even better API that takes care of diffing and more for us. You can watch and hear all about Collection and Table Diffable Data Sources in the WWDC19 session here, or, if you prefer an article form, you can read about them in John Sundell's article here.

Thank you for reading and stay tuned for more articles!

### References

- The original pitch.
- The Swift Evolution proposal.
- The proposal review thread.
- The Swift implementation here and here.