# Erasure coding metrics

This document presents a reliability metric and a redundancy metric that can help tune the erasure coding mechanism to ensure reliability and efficiency, and to minimize reconstruction failures in the system.

## Parameters

Name | Type | Description |
---|---|---|
`honestPercent` | uint | Fraction of chunks assumed to be honest - either sent by honest parents, or (unlikely) sent in a truly random fashion by malicious parents |
`totChunks` | uint | Total number of chunks generated (by erasure coding) from the underlying message |
`reqChunks` | uint | Number of chunks required to reconstruct the underlying message |
`rcvChunks` | uint | Number of honest chunks received from parents in aggregate |

## Reliability metric

The reliability metric captures the complement (one minus) of the failure rate of the erasure coding system defined by the above parameters.

### Combinatorics

Let `nCr(n, r)` denote the combinatorics (binomial coefficient) operator. We need to calculate the probability of reconstructing the final message given `rcvChunks` truly random chunks in a `reqChunks`-out-of-`totChunks` erasure coded system. We ignore the fact that an honest peer wouldn't send duplicate chunks and model this as the probability of getting at least `reqChunks` unique chunks in a random sequence of size `rcvChunks`.

Define:

- `h(n)`: In a sequence of length `rcvChunks`, the number of sequences constructible with exactly `n` unique chunk types, given `n` total chunk types
- `s(n)`: In a sequence of length `rcvChunks`, the number of sequences constructible with exactly `n` unique chunk types, given `totChunks` total chunk types
- `r`: The reliability metric which we wish to compute

We compute the reliability metric iteratively as follows:

- To compute `h(1)`, note that we can construct only one unique sequence given one unique chunk type.

```
h(1) = 1
```

- To compute `h(n)`, we construct all sequences possible with at most `n` chunk types and remove sequences with fewer than `n` unique chunk types, computed by iteratively choosing `k` out of `n` possible chunks and subtracting the sequences constructible with exactly those `k` chunks. This leaves sequences with exactly `n` unique chunk types.

```
h(n) = (n ** rcvChunks)
     - (nCr(n, n - 1) * h(n - 1))
     - (nCr(n, n - 2) * h(n - 2))
     ...
     - (nCr(n, 1) * h(1))
```

- To compute `s(n)`, we choose `n` chunk types out of `totChunks` and construct all sequences with exactly those `n` types.

```
s(n) = nCr(totChunks, n) * h(n)
```

- We finally compute `r` by taking the fraction of sequences that have at most `reqChunks - 1` unique chunk types and subtracting it from 1.

```
r = 1 - (s(1) + s(2) + ... + s(reqChunks - 1)) / (totChunks ** rcvChunks)
```
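As a sanity check, the recurrence can be compared against brute-force enumeration for small parameters. The sketch below uses illustrative values (not taken from this document) and counts every possible draw sequence directly:

```
from itertools import product
from math import comb

# Illustrative small parameters (assumed for this sketch)
totChunks, reqChunks, rcvChunks = 5, 3, 4

# Closed form via the h(n) / s(n) recurrences above
h = {1: 1}
for n in range(2, reqChunks):
    h[n] = n**rcvChunks - sum(comb(n, k) * h[k] for k in range(1, n))
bad = sum(comb(totChunks, n) * h[n] for n in range(1, reqChunks))
r = 1 - bad / totChunks**rcvChunks

# Brute force: enumerate every sequence of rcvChunks draws and count
# those containing at least reqChunks unique chunk types
ok = sum(1 for seq in product(range(totChunks), repeat=rcvChunks)
         if len(set(seq)) >= reqChunks)
assert abs(r - ok / totChunks**rcvChunks) < 1e-12
print(r)
```

Enumeration is only feasible for tiny parameters (`totChunks ** rcvChunks` sequences), but it confirms the recurrence term by term.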

### Python reference code

```
import math

# Example parameter values; substitute the actual system parameters
totChunks, reqChunks, rcvChunks = 10, 4, 6

def nCr(n, r):
    return math.comb(n, r)

# h[i] holds h(i); h[0] accumulates the sum s(1) + ... + s(reqChunks - 1)
h = [0]
for i in range(1, reqChunks):
    h += [i**rcvChunks]
    for j in range(1, i):
        h[i] -= nCr(i, j) * h[j]
    h[0] += nCr(totChunks, i) * h[i]
reliability = 1 - h[0] / totChunks**rcvChunks
print(reliability)
```
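Since the model treats chunk arrivals as uniformly random draws, a Monte Carlo simulation offers an independent cross-check of the closed form. The parameters and trial count below are illustrative; for these values the recurrence gives `r = 0.9324`:

```
import random

# Illustrative parameters (assumed for this sketch)
totChunks, reqChunks, rcvChunks = 10, 4, 6
trials = 200_000

random.seed(1)  # fixed seed for reproducibility

# Draw rcvChunks chunks uniformly at random and check whether at least
# reqChunks unique chunk types were collected
success = sum(
    1 for _ in range(trials)
    if len({random.randrange(totChunks) for _ in range(rcvChunks)}) >= reqChunks
)
print(success / trials)  # should agree with the closed form within sampling error
```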

## Redundancy metric

The redundancy metric captures the bandwidth multiple that nodes spend in aggregate in the erasure coding system defined by the above parameters.

This is simply given by `(rcvChunks / reqChunks) / honestPercent`, assuming that in the erasure coding system the total size of the required chunks is close to the original message size.
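For concreteness, a minimal sketch of the redundancy computation, using illustrative parameter values and treating `honestPercent` as a fraction in (0, 1]:

```
# Illustrative parameters (assumed for this sketch)
rcvChunks = 6        # honest chunks received in aggregate
reqChunks = 4        # chunks needed to reconstruct the message
honestPercent = 0.8  # fraction of chunks assumed to be honest

# Bandwidth multiple spent in aggregate, relative to the original message size
redundancy = (rcvChunks / reqChunks) / honestPercent
print(redundancy)
```

Intuitively, `rcvChunks / reqChunks` is the over-receipt factor among honest chunks, and dividing by `honestPercent` accounts for the bandwidth wasted on chunks that were not honest.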