April 11, 2013

Hash functions are not the only answer to load balancing problems

Load balancing is important in many networks because it allows to spread the load over different ressources. Load balancing can happen at different layers of the protocol stack. A web server farm uses load balancing to distribute the load between different servers. Routers rely on Equal Cost Multipath (ECMP) to balance packets without breaking TCP flows. Bonding enables to combine several links in the datalink layer.

Many deployments of load balancing rely on the utilisation of hash functions. ECMP is a typical example. When a router has several equal cost paths towards the same destination, it can use a hash function to distribute the packets over these paths. Typically, the router would compute for each packet to be forwarded hash(IPsrc, IPdst, Portsrc, Portdst) mod N where N is the number of equal cost paths towards the destination to select the nexthop. Various hash functions have been evaluated in the literature RFC 2992 : CRC, checksum, MD5, … Many load balancing techniques have adopted hash functions because they can be efficiently computed. An important criteria in selecting a function to perform load balancing is whether a small change in its input gives a large change in its output. This is sometimes called the avalanche effect and is a requirement for strong hash functions used in crypto applications. Unfortunately, hash functions have one important drawback : it is very difficult to predict an input that would lead to a given output. For crypto applications, this is the expectation, but many load balancing applications would like to be able to predict the output of the load balancing function while still benefitting from the avalanche effect.

In a recent paper, we have shown that it is possible to design a load balancing technique that both provides the avalanche effect (which is key for a good balancing of the traffic) and is predictable. The intuition behind this idea is that the hash function can be replaced by a block cipher. Block ciphers are usually used to encrypt/decrypt information by using a secret key. Since they are designed to provide an output that appears as random as possible for a wide range of inputs, they also exhibit the avalanche effect. Furthermore, provided that the key is known, the input that leads to a given output can be easily predicted. Our paper provides all the details and shows the benefits that such a technique can provide with Multipath TCP in datacenter networks, but there are many other potential applications.

Posted by Olivier Bonaventure