Disruption free reconfiguration for link-state routing protocols implementable on real routers
Link state routing protocols like OSPF and IS-IS flood information about the network topology to all routers. When a link needs to be shutdown or its weight changes, a new link state packet is flooded again. Each router processes the received link state packet and updates its forwarding table. Updating the forwarding table of all routers inside a large network can take several hundreds of milliseconds up to a few seconds depending on the configuration of the routing protocol. For his PhD thesis, Pierre Francois studied this problem in details. He first designed a simulator to analyse all the factors that influence the convergence time in a large ISP network. This analysis revealed that it is difficult for a large network to converge within less than 100-200 milliseconds after a topology change. During this period, transient loops can happen and packets get loss even for planned and non-urgent topology changes. Pierre Francois designed several techniques to avoid these transient loops during the convergence of a link-state routing protocol. The first solution required changes to the link-state routing protocol. We thought that getting IETF consensus on this solution would be very difficult. We were right since the framework draft has still not yet been adopted.
At INFOCOM 2007, by refining an intuition proposed by Mike Shand, we proposed a solution that does not require standardisation. This solution relies on the fact that if the weight of a link is changed by one (increase or decrease), no transient loop can happen. However, using only unit weight changes is not practical given the wide range of weights in real networks. Fortunately, our paper showed that when shutting down a link (i.e. setting its weight to infinity), it is possible to use a small number of metric increments to safely perform the reconfiguration. This metric reconfiguration was well accepted by the research community and received the INFOCOM 2007 best paper award.
The algorithms proposed in our INFOCOM 2007 paper were implemented in Java and took a few seconds or more to run. It was possible to run them on a management platform, but not on a router. Two years ago, Pascal Merindol and Francois Clad reconsidered the problem. They found new ways of expressing the proofs that enable loop-free reconfiguration of link-state protocols upon topology changes. Furthermore, they have also significantly improved the performance of the algorithms proposed in 2007 and reimplemented everything in C. The new algorithms operate one order of magnitude faster than the 2007 version. It becomes now possible to implement them directly on routers to enable OSPF and IS-IS to avoid low transient loops when dealing with non-urgent topology changes.
All the details are provided in our forthcoming paper that will appear in IEEE/ACM Transactions on Networking.