Byzantine resistant networking protocols feat. ODSBR
As a project dedicated to securely optimizing data transmission at the network layer, exploring prior work in the area is unsurprisingly of great interest to us. Unfortunately, while there’s abundant work from the 2000s in the area of P2P networking and performant overlay networks (with a significant amount of seminal work coming from our colleague Prateesh’s lab at MIT - Chord DHT and Resilient overlay networks being amongst a few), there’s substantially much less research out there in the context of Byzantine environments.
We suspect this to be the case due to the unincentivized nature of P2P systems during the pre-blockchain era. While certain anti-adversarial measures had been incorporated into systems like Bittorrent, there was never a billion dollar risk until Bitcoin and Ethereum came along. Fortunately, a particular area that we have identified, Wanets (Wireless Ad-hoc Networks) and Manets (Mobile Ad-hoc networks), has a close resemblance to incentivized P2P networks. Used for wireless communication in cases where there is poor internet connectivity (battlefields, disaster recovery) or low-latency requirements (vehicle-to-vehicle communication), wireless ad-hoc networks have attack scenarios similar to permissionless blockchains (imagine an army or airplane eavesdropping, manipulating or delaying its enemy’s communications or a terrorist causing road accidents).
We thus think that it would be interesting for the community to learn more about existing solutions to problems that slow down P2P communication or even worse, make them insecure. This series on byzantine-resistant networking protocols will feature some high-quality papers that we feel are relevant. In this post we are going to explore “ODSBR - An on-demand secure Byzantine resilient routing protocol for wireless ad hoc networks” by Baruch Awerbuch et al which appeared in ACM TISSEC, a pretty well regarded publication (above CCS for instance). We will start with a brief overview of ODSBR followed by the threat model, routing strategy and the suggested technique to detect malicious actors. We conclude by examining the guarantees provided and considerations of applying this method to blockchain protocols.
What is ODSBR?
ODSBR is an on-demand secure Byzantine resilient routing protocol for wireless adhoc networks. First let’s break the name down and understand what it means.
“On-demand” implies that the route from source to destination is not precalculated but is determined when necessary. This property ensures that the algorithm can work in situations where the network topology is not stable. Though the other side of the coin is that more resources and discovery time are required to find a route for every request.
“Secure Byzantine Resilient” means that the routing technique is secure against Byzantine actors who can behave arbitrarily. As a result, in spite of malicious actors who attempt to disrupt the network while trying their best to not be detected, the protocol provides a mechanism to detect such actors and even if unsuccessful, it provides guaranteed limits on the maximum harm they can cause the network.
“for wireless sensor networks” - Wireless sensor networks have characteristics that are typical of a open decentralized network layer: links between peers show unpredictable characteristics, peers come and go as they wish making the topology of the network dynamic, propagation is peer-to-peer as the range of wireless networks is limited and environments are power-constrained which is not a strict assumption for decentralized networks but good to have as they translate to reduced communication overhead (control packets, bandwidth etc).
ODSBR assumes that the network is not open but rather has a known but unbounded set of nodes. Each node is represented by a public key. One of the major assumptions that might not easily translate to all cases in the distributed world is that source and destination are considered to be trusted. A non adversarial path between source and destination is also assumed to exist. No assumption is made on the number/fraction of nodes in the network that can perform byzantine actions (a plus!) which are defined as any action by an authenticated node performed at the network layer that results in disruption or degradation of the routing service. Sybil attacks are not considered in this paper but very much relevant to open distributed systems. A fault is defined as any disruption that results in significant loss or delay. It can either be an intended attack or benign failure.
The goal of the protocol is to detect byzantine behaviour, be it either in isolation or collusion, and avoid or bound its effect on the overall system.
There are 3 different components to ODSBR.
1. Route discovery
2. Byzantine fault detection
3. Link weight management
Routes are discovered by flooding the network with requests that contain the destination. The requests are forwarded and the actual path discovery happens when the destination receives it and sends back a response. A metric called ‘link weights’ which captures reliability and adversarial behaviour is used to find the most performant path. A node forwards a response only if the cumulative weight arrived at by adding the individual link weights is the least among the ones it received so far. So multiple paths can be found with varying weights from which sender chooses the path with the least weight.
Byzantine Fault Detection
The fault detection mechanism only detects the faulty link but can’t attribute the responsibility to any nodes on either side of the link.
The source starts a timer after sending the data packet. If a valid acknowledgement is not received within a timeout, the source assumes that the packet is lost. A fault is assumed when the loss rate for a path is more than the threshold loss rate defined by the source. Once the source detects that a path is faulty, a probing mechanism is initiated on that path to detect the faulty link.
The probing mechanism uses binary search to detect the faulty link. The source controls the search by specifying on the data packets the list of intermediate nodes that must send an acknowledgement. These intermediate nodes are called probes. Probes divide the path into multiple intervals between which the fault is being searched for and if found, the interval is further divided into two by inserting another probe.
Probes are retired when not necessary as they are an overhead to the protocol. When a fault is detected in the interval, a probe is inserted by dividing the interval into 2 parts. Each side of the probe starts a counter proportional to the amount of loss (due to the corresponding interval) that triggered the fault (thus the probe). The counter decreases with every acknowledgement received while probing. When the 2 counters on both sides of the probe run out, the probe is removed joining both intervals and thus ending the probe. This procedure bounds the rate of loss any malicious behaviour can inflict on the path.
Link Weight Management
Link weight is the metric used to determine the route which is most reliable. Once the malicious links are determined by the detection process, link weights are updated to reflect the faults.
The goal of link weight management is to make sure that faults are penalized as well as that transient failures can be recovered from. The link weight is doubled for every fault so that continuous and persistent faults are heavily punished and consistently bad paths are eventually abandoned. To make sure that nodes can recover from transient faults, the link weight is reset to a default value after a timeout proportional to the number of losses while identifying the faulty link.
ODSBR provides an interesting approach to detect malicious links and in case that the malicious actor behaves intelligently in such a way to avoid detection while still misbehaving, it provides a bound on the loss rate that the misbehaviour can inflict.
ODSBR in the decentralized networks context
The binary search probing technique to detect malicious links is an interesting approach but some of the assumptions behind the approach like a trusted receiver and sender is a difficult assumption to make in a decentralized context. This prevents the approach from being used in and of itself to make global decisions. For example, it's hard to slash someone by believing a certain sender. A malicious or faulty receiver failing to send ACKs would make the node just before the receiver in the path suspect.
However, the technique can be used to make decisions locally to maintain local reputations for self-use. It could possibly also be used globally by corroborating data from multiple sources (albeit in a cheat-proof way).
Sybil resistance can be introduced through staking. Similarly, the assumption about nodes being authenticated is not bad due to the usual existence of a bonding period in staking networks.
ODSBR sends probe requests piggybacked on the data packets through the same path in which a malicious user is known to be present. While it may appear as though that the sender can just use an alternate path instead of wasting resources on searching a faulty link which might not yield results, there might be (and mostly will be) at least 1 malicious node (and thus one faulty path) in alternate paths (if something like 33 or 49% of the whole network is malicious). ODSBR provides a mechanism to bound the loss rate on any path as any link (and associated nodes) which are detected are blacklisted (and thus lose revenue for that period of time). As a corollary, in cases where penalties (in this case blacklisting) are imposed only after a certain threshold, rational nodes are incentivized to strictly provide only the threshold amount of services.
ODSBR provides a number of interesting techniques and insights that can be applied to networking in byzantine environments. Feel free to share your views in our discussion channel or research forum. Do not forget to subscribe to our blog to receive more such interesting content.
Other official social media channels: