Olivier Bonaventure: Homepage and blog
http://perso.uclouvain.be/olivier.bonaventure/blog/html/

TLS or HTTPS everywhere is not necessarily the right answer (Fri, 08 May 2015)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2015/05/08/tls.html

Since Edward Snowden’s revelations about massive surveillance, we have observed a strong move towards increasing the use of encryption to protect the end-to-end traffic exchanged by Internet hosts. Various Internet stakeholders have made strong moves towards recommending strong encryption, e.g.:

  • The IETF has confirmed in RFC 7258 that pervasive monitoring is an attack and needs to be countered
  • The EFF has promoted the utilisation of HTTPS through the HTTPS-everywhere campaign and browser extension
  • The Let’s Encrypt campaign is preparing a new certification authority to ease the utilisation of TLS
  • Mozilla has announced plans to deprecate non-secure HTTP
  • Most large web companies have announced plans to encrypt traffic between their datacenters
  • ...

Pervasive monitoring is not desirable and researchers should aim at finding solutions, but encrypting everything is not necessarily the best solution. As an Internet user, I am also very concerned by the massive surveillance that is conducted by various commercial companies.

http://arstechnica.com/security/2013/11/encrypt-all-the-worlds-web-traffic-internet-architects-propose/

Segment Routing in the Linux kernel (Thu, 13 Nov 2014)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2014/11/13/segment_routing_in_the_linux_kernel.html

Segment Routing is a new packet forwarding technique which is being developed by the SPRING working group of the IETF. Until now, two packet forwarding techniques were supported by IETF protocols:

  • datagram mode with IPv4 and IPv6
  • label swapping with MPLS

Segment Routing is a modern realisation of source routing, which was supported by IPv4 in RFC 791 and initially by IPv6 in RFC 2460. Source routing enables a source to indicate, inside each packet that it sends, the list of intermediate nodes to be traversed to reach the final destination. Although rather old, this technique is not widely used today because it causes several security problems. For IPv6, various attacks against source routing were demonstrated in 2007. In the end, the IETF chose to deprecate source routing in IPv6 in RFC 5095.

However, source routing has several very useful applications inside a controlled network such as an enterprise or a single ISP network. For this reason, the IETF has revived source routing and considers two data planes:

  • IPv6
  • MPLS

In both cases, labels/addresses can be associated with routers and links and are advertised by the intradomain routing protocol. To steer packets along a chosen path, the source node simply adds to the packet an MPLS label stack or an IPv6 extension header that lists all the intermediate nodes/links. To understand the benefits of this approach, let us consider the simple network shown below.
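To make the idea concrete, here is a minimal Python sketch of how a segment list imposed by the source steers a packet hop by hop; the four-node topology and its routes are invented for the example, and real deployments encode the segment list as an MPLS label stack or an IPv6 extension header:

# Conceptual sketch of source routing with a segment list (made-up topology).
# Each node only needs to know how to reach the next segment in the list.

ROUTES = {
    # next-hop table of each node, for illustration only
    "S":  {"R1": "R1", "R3": "R1", "D": "R1"},
    "R1": {"R3": "R2", "D": "R2"},
    "R2": {"R3": "R3", "D": "R3"},
    "R3": {"D": "D"},
}

def forward(source, segments):
    """Follow the segment list from the source to the last segment."""
    path, current = [source], source
    for segment in segments:
        while current != segment:
            current = ROUTES[current][segment]   # forward towards the active segment
            path.append(current)
    return path

# The source imposes the path by listing intermediate nodes (segments).
print(forward("S", ["R3", "D"]))   # ['S', 'R1', 'R2', 'R3', 'D']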

The MPLS dataplane reuses the label

Evolution of link bandwidths (Mon, 15 Sep 2014)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2014/09/15/evolution_of_link_bandwidths.html

During my first lesson for the undergrad networking class, I wanted to provide the students with some historical background on the evolution of link bandwidths. Fortunately, Wikipedia provides a very interesting page that lists most of the standards for modems, optical fibers, ...

A first interesting plot is the evolution of the modems that allowed data to be transmitted over the traditional telephone network. The figure below, based on information extracted from http://en.m.wikipedia.org/wiki/List_of_device_bandwidths, shows the evolution of modem technology. The first method to transfer data was Morse code, which appeared in the mid-1800s. After that, it took more than a century to move to the Bell 101 modem, which was capable of transmitting data at 110 bits/sec. Slowly, 300 bps and later 1200 bps modems appeared. The late 1980s saw the arrival of faster modems with 9.6 kbps and later 28.8 and 56 kbps, which marked the highest bandwidth feasible on a traditional phone line. ISDN appeared in the late 1980s with a bandwidth of 64 kbps on digital lines, which was later doubled.
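Such a plot is easy to reproduce; in the Python sketch below, the bit rates are the ones mentioned above, while the years are approximate and only meant to show the trend:

# Rough sketch of the modem-bandwidth plot; the years are approximate.
import matplotlib.pyplot as plt

modems = {
    1958: 110,     # Bell 101
    1962: 300,     # 300 bps modems (approximate year)
    1980: 1200,    # 1200 bps modems (approximate year)
    1988: 9600,    # late 1980s, 9.6 kbps
    1994: 28800,   # 28.8 kbps
    1998: 56000,   # 56 kbps, the practical limit of analog phone lines
}

years, rates = zip(*sorted(modems.items()))
plt.semilogy(years, rates, marker="o")
plt.xlabel("Year")
plt.ylabel("Bit rate (bit/s, log scale)")
plt.title("Evolution of telephone-line modems")
plt.show()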

When the telephone network became the bottleneck, telecommunication manufacturers and network operators moved to various types of Digital Subscriber Line technologies, ADSL being the most widespread. From the early days at 1.5 Mbps downstream to the latest VDSL deployments, bandwidth has increased by almost two orders of magnitude. As of this writing, it seems that xDSL technology is reaching its limits and while bandwidth will continue to grow, the rate of improvement will not remain as high as in the past. In parallel, CATV operators have deployed various versions of the DOCSIS standards to provide data services in cable networks. The next step is probably to go to fiber-based solutions, but they cost more than an order of magnitude more than DSL services and can be difficult to deploy in rural areas.

The performance of wireless networks has also significantly improved. As an illustration, and again based on data from http://en.m.wikipedia.org/wiki/List_of_device_bandwidths here is the theoretical maximum bandwidth for the various WiFi standards. From 2 Mbps for 802.11 in 1997, bandwidth increased to 54 Mbps in 2003 for 802.11g and 600 Mbps for 802.11n in 2009.

The datasets used in this post are partial. Suggestions for additional datasets that could be used to provide a more detailed view of the evolution of bandwidth are more than welcome. For optical fiber, an interesting figure appeared in Nature, see http://www.nature.com/nphoton/journal/v7/n5/fig_tab/nphoton.2013.94_F1.html

Flipping an advanced networking course (Tue, 11 Feb 2014)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2014/02/11/flipping_an_advanced_networking_course.html

Before the beginning of the semester, Nick Feamster informed me that he had decided to flip his advanced networking course. Various teachers have opted for flipped classrooms to increase the interaction with students. Instead of using class time to present theory, the teacher focuses his/her attention during the class on solving problems with the students. Various organisations of a flipped classroom have been tested. Often, the teacher posts short videos that explain the basic principles before the class and the students have to watch these videos before attending the class. This is partially the approach adopted by Nick Feamster for his class.

By bringing more interaction into the classroom, the flipped approach is often considered to be more interesting for the teacher as well as for the students. Since my advanced networking class gathers only a few tens of students, compared to the 100+ and 300+ students of the other courses that I teach, I also decided to flip one course this year.

The advanced networking course is a follow-up to the basic networking course. I cover several advanced topics and aim at explaining to the students the operation of large Internet Service Provider networks. The main topics covered are:

  • Interdomain routing with BGP (route reflectors, traffic engineering, ...)
  • Traffic control and Quality of Service (from basic mechanisms - shaping, policing, scheduling, buffer acceptance - to services - integrated or differentiated services)
  • IP Multicast and Multicast routing protocols
  • Multiprotocol Label Switching
  • Virtual Private Networks
  • Middleboxes

The course is complemented by projects during which the students configure and test realistic networks built from Linux-based routers.

During the last decade, I’ve taught this course by presenting slides to the students and discussing the theoretical material. I could have used some of these slides to record videos explaining the basic principles, but I’m still not convinced by the benefits of using online video as a learning vehicle. Video is nice for demonstrations and short introductory material, but students need written material to understand the details. For this reason, I’ve decided to opt for a seminar-type approach where the students read one or two articles every week to understand the basic principles. Then, the class focuses on discussing real cases or exercises.

Many courses are organized as seminars during which the students read recent articles and discuss them. Often, these are advanced courses and the graduate students read and comment on recent scientific articles. This approach was not applicable in my case given the maturity of the students who follow the advanced networking course. Instead of using purely scientific articles, I’ve opted for tutorial articles that appear in magazines such as IEEE Communications Magazine or the Internet Protocol Journal. These articles are easier for the students to read and often provide good tutorial content with references that the students can exploit if they need additional information.

The course started a few weeks ago and the interaction with the students has been really nice. I’ll regularly post updates on the articles that I’ve used, the exercises that have been developed and the students’ reactions. Comments are, of course, welcome.

Happy eyeballs makes me unhappy... (Tue, 03 Dec 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/03/happy.html

Happy eyeballs, defined in RFC 6555, is a technique that enables dual-stack hosts to automatically select between IPv6 and IPv4 based on their respective performance. When a dual-stack host tries to contact a webserver that is reachable over both IPv6 and IPv4, it proceeds as follows (a minimal sketch in Python follows the list):

  • first tries to establish a TCP connection towards the IPv6 or IPv4 address and starts a short timeout, say 300 msec
  • if the connection is established over the chosen address family, it continues
  • if the timer expires before the establishment of the connection, a second connection is tried with the other address family
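As a rough illustration, here is a minimal Python sketch of this behaviour; it tries IPv6 first with a short timeout and then falls back to IPv4, whereas real RFC 6555 implementations keep the first attempt running while starting the second one. The host name and timeout are placeholders.

# Minimal, sequential approximation of Happy Eyeballs (RFC 6555): try IPv6 first
# with a short timeout, then fall back to IPv4.
import socket

def connect_happy_eyeballs(host, port, timeout=0.3):
    last_error = None
    for family in (socket.AF_INET6, socket.AF_INET):
        try:
            addr = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)[0][4]
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.settimeout(timeout)          # the "say 300 msec" timer
            sock.connect(addr)
            sock.settimeout(None)
            return sock                       # connection established over this family
        except OSError as exc:
            last_error = exc                  # timer expired or family unavailable
    raise last_error

# sock = connect_happy_eyeballs("www.example.org", 80)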

Happy eyeballs works well when one of the two address families provides bad performance or is broken. In this case, a host using happy eyeballs will automatically avoid the broken address family. However, when both IPv6 and IPv4 work correctly, happy eyeballs may cause frequent switches between the two address families.

As an example, here is a summary of a packet trace that I collected when contacting a dual-stack web server from my laptop using the latest version of MacOS.

First connection

09:40:47.504618 IP6 client6.65148 > server6.80: Flags [S], cksum 0xe3c1 (correct), seq 2500114810, win 65535, options [mss 1440,nop,wscale 4,nop,nop,TS val 1009628701 ecr 0,sackOK,eol], length 0
09:40:47.505886 IP6 server6.80 > client6.65148: Flags [S.], cksum 0x1abd (correct), seq 193439890, ack 2500114811, win 14280, options [mss 1440,sackOK,TS val 229630052 ecr 1009628701,nop,wscale 7], length 0

The interesting information in these packets is the TCP timestamps. Defined in RFC 1323, these timestamps are extracted from a local clock. The server returns its current timestamp in the SYN+ACK segment.

Thanks to happy eyeballs, the next TCP connection is sent over IPv4 (it might be faster than IPv6, who knows). IPv4 works well and the server answers immediately:

09:40:49.512112 IP client4.65149 > server4.80: Flags [S], cksum 0xee77 (incorrect -> 0xb4bd), seq 321947613, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1009630706 ecr 0,sackOK,eol], length 0
09:40:49.513399 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 60) server4.80 > client4.65149: Flags [S.], cksum 0xd86f (correct), seq 873275860, ack 321947614, win 5792, options [mss 1380,sackOK,TS val 585326122 ecr 1009630706,nop,wscale 7], length 0

Note the TS val in the returning SYN+ACK. The value over IPv4 is much larger than over IPv6. This is not because one address family is faster than the other; it indicates that there is a load-balancer that balances the TCP connections between (at least) two different servers.

Shortly after, I authenticated myself over an SSL connection that was established over IPv4

09:41:26.566362 IP client4.65152 > server4.443: Flags [S], cksum 0xee77 (incorrect -> 0x420d), seq 3856569828, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1009667710 ecr 0,sackOK,eol], length 0
09:41:26.567586 IP server4.443 > client4.65152: Flags [S.], cksum 0x933e (correct), seq 3461360247, ack 3856569829, win 14480, options [mss 1380,sackOK,TS val 229212430 ecr 1009667710,nop,wscale 7], length 0

Again, a closer look at the TCP timestamps reveals that there is a third server that terminated the TCP connection. Apparently, in this case it was the load-balancer itself that forwarded the data extracted from the connection to one of the servers.

Thanks to happy eyeballs, my TCP connections reach different servers behind the load-balancer. This is annoying because the web servers maintain session state and every time a new connection switches from one address family to the other, I might switch from one server to another. In my experience, this happens randomly with this server, possibly as a function of the IP addresses that I’m using and the server load. As a user, I experience difficulties logging on to the server or random logouts, while the problem lies in unexpected interactions between happy eyeballs and a load balancer. The load balancer would like to stick all the TCP connections from one host to the same server, but due to the frequent switchovers between IPv6 and IPv4 it cannot stick each client to a server.

I’d be interested in any suggestions on how to improve this load balancing scheme without changing the web servers...

Sandstorm : even faster TCP (Sun, 01 Dec 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/01/sandstorm.html

Researchers have worked on improving the performance of TCP since the early 1980s. At that time, many researchers considered that achieving high performance with a software-based TCP implementation was impossible. Several new transport protocols were designed at that time, such as XTP. Some researchers even explored the possibility of implementing transport protocols in hardware. Hardware-based implementations are usually interesting to achieve high performance, but they are too inflexible for a transport protocol. In parallel with this effort, some researchers continued to believe in TCP. Dave Clark and his colleagues demonstrated in [1] that TCP stacks could be optimized to achieve high performance.

TCP implementations continued to evolve in order to achieve even higher performance. The early 2000s, with the advent of Gigabit interfaces, saw a better coupling between TCP and the hardware on the network interface. Many high-speed network interfaces can compute the TCP checksum in hardware, which reduces the load on the main CPU. Furthermore, high-speed interfaces often support large segment offload. A naive implementation of a TCP/IP stack would send segments and acknowledgements independently. For each segment sent/received, such a stack could need to process one interrupt, a very costly operation on current hardware. Large segment offload provides an alternative by exposing to the TCP/IP stack a large segment size, up to 64 KBytes. By sending and receiving larger segments, the TCP/IP stack minimizes the cost of processing the interrupts and thus maximises its performance.

Read more...

Broadband in Europe (Sun, 01 Dec 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/01/broadband_in_europe.html

The European Commission has recently published an interesting survey of the deployment of broadband access technologies in Europe. The analysis is part of the Commission’s Digital Agenda, which aims at enabling home users to have speeds of 30 Mbps or more. Technologies like ADSL and cable modems are widely deployed in Europe and more than 95% of European households can subscribe to fixed broadband networks.

Read more...

Multipath RTP (Sat, 30 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/30/multipath_rtp.html

Multipath TCP enables communicating nodes to use different interfaces and paths to reliably exchange data. In today’s Internet, most applications use TCP and can benefit from Multipath TCP. However, multimedia applications often use the Real-time Transport Protocol (RTP) on top of UDP. A few years after the initial work on Multipath TCP, researchers at Aalto University analyzed how RTP could be extended to support multiple paths. Thanks to their work, there is now a backward compatible extension of RTP that can be used on multihomed hosts. This extension will be important to access mobile streaming websites that use the Real Time Streaming Protocol (RTSP).

Read more...

The Software Defined Operator (Tue, 26 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/26/the_software_defined_operator.html

Network operators are reconsidering the architecture of their networks to better address the quickly evolving traffic and connectivity requirements. DT is one of them and, in a recent presentation at the Bell Labs Open Days in Antwerp, Axel Clauberg gave his vision of the next generation ISP network. This is not the first presentation that DT employees have given on their TerraStream vision for future networks. However, there are some points that are worth being noted.

Read more...

Lessons learned from SDN experiments and deployments (Mon, 25 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/25/lessons_learned_from_sdn_experiments_and_deployments.html

The scientific literature is full of papers that propose a new technique that (sometimes only slightly) improves the state of the art and evaluate its performance (by means of mathematical models or simulations, rarely experiments with real systems). During the last few years, Software Defined Networking has seen a growing interest in both the scientific community and among vendors. Initially proposed at Stanford University, Software Defined Networking aims at changing how networks are managed and operated. Today’s networks are composed of off-the-shelf devices that support standardized protocols with proprietary software and hardware implementations. Networked devices implement the data plane to forward packets and the control plane to correctly compute their forwarding tables. Both planes are today implemented directly on the devices.

Software Defined Networking proposes to completely change how networks are built and managed. Networked devices still implement the data plane in hardware, but this data plane, or more precisely the forwarding table that controls its operation, is exposed through a simple API to software defined by the network operator to manage the network. This software runs on a controller and controls the update of the forwarding tables and the creation/removal of flows through the network according to policies defined by the network operator. Many papers have already been written on Software Defined Networking and entire workshops are already dedicated to this field.
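To make the split concrete, here is a small, purely illustrative Python sketch of a controller that installs match/action entries into the forwarding tables of the switches it manages; the class names and rule format are invented and this is not the OpenFlow protocol or any real controller API.

# Conceptual sketch of the SDN split: the data plane only matches packets against
# a table; the control logic that fills the table lives in a separate controller.

class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = []            # list of (match_fn, action) entries

    def install(self, match_fn, action):
        self.flow_table.append((match_fn, action))

    def forward(self, packet):
        for match_fn, action in self.flow_table:
            if match_fn(packet):
                return action           # e.g. "output:port2" or "drop"
        return "send-to-controller"     # table miss

class Controller:
    def __init__(self, switches):
        self.switches = switches

    def apply_policy(self):
        # Operator-defined policy: drop telnet, forward web traffic on port 2.
        for sw in self.switches:
            sw.install(lambda p: p.get("tcp_dst") == 23, "drop")
            sw.install(lambda p: p.get("tcp_dst") == 80, "output:port2")

s1 = Switch("s1")
Controller([s1]).apply_policy()
print(s1.forward({"tcp_dst": 80}))      # output:port2
print(s1.forward({"udp_dst": 53}))      # send-to-controller (table miss)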

A recently published paper, Maturing of OpenFlow and software-defined networking through deployments, written by M. Kobayashi and his colleagues, analyzes Software Defined Networking from a different angle. This paper does not present a new contribution. Instead, it takes one step back and discusses the lessons that the networking group at Stanford has learned from designing, using and experimenting with the first Software Defined Networks that are used by real users. The paper discusses many of the projects carried out at Stanford in different phases, from small lab experiments to international wide-area networks and the use of SDN for production traffic. For each phase, and this is probably the most interesting part of the paper, the authors highlight several of the lessons that they have learned from these deployments. Several of these lessons are worth highlighting:

  • the size of the forwarding table on OpenFlow switches matters
  • the embedded CPU on networking devices is a barrier to innovation
  • virtualization and slicing are important when deployments are considered
  • the interactions between OpenFlow and existing protocols such as STP can cause problems. Still, it is unlikely that existing control plane protocols will disappear soon.

This paper is a must-read for researchers working on Software Defined Networks because it provides information that is rarely discussed in scientific papers. Furthermore, it shows that eating your own dog food, i.e. really implementing and using the solutions that we propose in our papers, is useful and has value.

Bibliography

[1] Masayoshi Kobayashi, Srini Seetharaman, Guru Parulkar, Guido Appenzeller, Joseph Little, Johan van Reijendam, Paul Weissmann, Nick McKeown, Maturing of OpenFlow and software-defined networking through deployments, Computer Networks, Available online 18 November 2013, ISSN 1389-1286, http://dx.doi.org/10.1016/j.bjp.2013.10.011.

Another type of attack on Multipath TCP ? (Sun, 24 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/24/another_type_of_attack_on_multipath_tcp.html

In a recent paper presented at Hotnets, M. Zubair Shafiq and his colleagues discuss a new type of “attack” on Multipath TCP.

When the paper was announced on the Multipath TCP mailing list, I was somewhat concerned by the title. However, after having read it in detail, I do not consider the inference “attack” discussed in this paper to be a concern. The paper explains that, thanks to Multipath TCP, it is possible for an operator to infer the performance of “another operator” by observing the Multipath TCP packets that pass through its own network. The “attack” is discussed in the paper and some measurements are carried out in the lab to show that it is possible to infer some characteristics of the performance of the other network.

After having read the paper, I don’t think that the problem is severe enough to be classified as an “attack”. First, if I want to test the performance of TCP in my competitor’s network, I can easily subscribe to this network, in particular for wireless networks that would likely benefit from Multipath TCP. There are even public measurement facilities that collect measurement data, see SamKnows, the FCC measurement app, speedtest or MLab.

More fundamentally, if an operator observes one subflow of a Multipath TCP connection, it cannot easily determine how many subflows are used in this Multipath TCP connection and what the endpoints of these subflows are. Without this information, it becomes more difficult to infer TCP performance in another specific network.

The technique proposed in the paper mainly considers the measured throughput on each subflow as a time series whose evolution needs to be predicted. A passive measurement device could get more accurate predictions by looking at the packets that are exchanged, in particular the data-level sequence numbers and acknowledgements. There is plenty of room to improve the inference technique described in this paper. Once Multipath TCP gets widely deployed and used for many applications, it might be possible to extend the technique to learn more about the performance of TCP in the global Internet.

The Multipath TCP buzz (Fri, 27 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/27/news.html

The inclusion of Multipath TCP in iOS7 last week was a nice surprise for the designers and first implementors of the protocol. The initial announcement created a buzz that was echoed by many online publications.

The same information also appeared on news sites in Spanish, Norwegian, Japanese, Chinese and Portuguese (see 1, 2, 3) and on various blogs. See Google news search for recent links.

If you’ve seen postings about Multipath TCP in other major online or print publications, let me know.

Computer Networking : starting from the principles (Mon, 23 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/23/cnp3_reliability.html

In 2009, I took my first sabbatical and decided to spend a large fraction of my time writing the open-source Computer Networking : Principles, Protocols and Practice ebook. This ebook was well received by the community and it received a $20,000 award from the Saylor foundation, which published it on iTunes.

There are two main approaches to teaching standard computer networking classes. Most textbooks are structured on the basis of the OSI or TCP/IP layered reference models. The most popular organisation is the bottom-up approach. Students start by learning about the physical layer, then move to the datalink layer, ... This approach is used by Computer Networks among others. Almost a decade ago, Kurose and Ross took the opposite approach and started from the application layer. I liked this approach and adopted a similar one for the first edition of Computer Networking : Principles, Protocols and Practice.

However, after a few years of experience of using the textbook with students and discussions with several colleagues who were using parts of the text, I’ve decided to revise it. This major revision will include the following modifications.

  • the second edition of the ebook will use a hybrid approach. One half of the ebook will be devoted to the key principles that any CS student must know. To explain these principles, the ebook will start from the physical layer and go up to the application layer. The main objective of this first part is to give the students a broad picture of the operation of computer networks without entering into any protocol detail. Several existing books discuss this briefly in their introduction or first chapter, but one chapter is not sufficient to grasp this entirely.

  • the second edition will discuss at least two different protocols in each layer to allow the students to compare different designs.

    • the application layer will continue to cover DNS and HTTP but will also include different types of remote procedure calls
    • the transport layer will continue to explain UDP and TCP, but will also cover SCTP. SCTP is cleaner than TCP and provides a different design for the students.
    • the network layer will continue to cover the data and control planes. In the control plane, RIP, OSPF and BGP remain, except that iBGP will probably not be covered due to time constraints. Concerning the data plane, given the same time constraints, we can only cover two protocols. The first edition covered IPv4 and IPv6. The second edition will cover IPv6 and MPLS. Describing MPLS (the basics, not all the details about LDP and RSVP-TE, more on this in a few weeks) is important to show the students a different design than IP. Once this choice has been made, one needs to select between IPv4 and IPv6. Covering both protocols is a waste of students’ time and the second edition will only discuss IPv6. At this point, it appears that IPv6 is more future-proof than IPv4. The description of IPv4 can still be found in the first edition of the ebook.
    • the datalink layer will continue to cover Ethernet and WiFi. Zigbee or other techniques could appear in future minor revisions
  • Practice remains an important skill that networking students need to learn. The second edition will include IPv6 labs built on top of netkit to allow the students to learn how to perform basic network configuration and management tasks on Linux.

The second edition of the book will be tested by the students who follow INGI2141 at UCL. The source code is available from https://github.com/obonaventure/cnp3/ and drafts will be posted on http://cnp3bis.info.ucl.ac.be/ every Wednesday during this semester.

Sources of networking information (Sat, 21 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/21/journals.html

Students who start their Master thesis in networking sometimes have difficulties locating scientific information related to their Master thesis’ topic. Many of them start by googling a few keywords and find random documents and wikipedia pages. To aid them, I list below some relevant sources of scientific information about networking in general. The list is far from complete and is biased by my own research interests, which do not cover the entire networking domain.

Digital Libraries

During the last decade, publishers of scientific journals and conference organizers have created large digital libraries that are accessible through a web portal. Many of them are protected by a paywall that provides full access only to paid subscribers, but many universities have (costly) subscriptions to (some of) these libraries. Most of these digital libraries provide access to tables of contents and abstracts.

Magazines

Conferences

Journals

Standardisation bodies

Is your network ready for iOS7 and Multipath TCP ? (Fri, 20 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/20/tracebox.html

During the last few days, millions of users have installed iOS7 on their iPhones and iPads. Estimates published by The Guardian reveal that more than one third of the users have already upgraded their devices to support the new release. As I still don’t use a smartphone, I usually don’t check these new software releases. From a networking viewpoint, this iOS update is different because it is the first step towards a wide deployment of Multipath TCP [RFC 6824]. Until now, Multipath TCP has mainly been used by researchers. With iOS7, the situation changes since millions of devices are capable of using Multipath TCP.

From a networking viewpoint, the deployment of Multipath TCP is an important change that will affect many network operators. In the 20th century, networks were only composed of routers and switches. These devices are completely transparent to TCP and never change any field of the TCP header or payload. Today’s networks, mainly enterprise and cellular networks, are much more complex. They include various types of middleboxes that process the IP header but also analyze the TCP headers and payload and sometimes modify them for various reasons. Michio Honda and his colleagues presented at IMC2011 a paper that reveals the impact of these middleboxes on TCP and its extensibility. In a nutshell, this paper revealed the following behaviors:

  • some middleboxes drop TCP options that they do not understand
  • some middleboxes replace TCP options by dummy options
  • some middleboxes change fields of the TCP header (source and destination ports for NAT, but also sequence/acknowledgement numbers, window fields, ...)
  • some middleboxes inspect the payload of TCP segments, reject out-of-sequence segments and sometimes modify the TCP payload (e.g. ALG for ftp on NAT)

These results had a huge influence on the design of Multipath TCP, which includes various mechanisms that enable it to work around most of these middleboxes and fall back to regular TCP in case of problems (e.g. payload modifications) to preserve connectivity.

Of course, Multipath TCP will achieve the best performance when running in a network which is fully transparent and does not include middleboxes that interfere with it. Network operators might have difficulties checking the possible interference between their devices and TCP extensions like Multipath TCP. While implementing Multipath TCP in the Linux kernel, we spent a lot of time understanding the interference caused by our standard firewall that randomizes TCP sequence numbers.

To support network operators who want to check the transparency of their network, we have recently released a new open-source tool called tracebox. tracebox is described in a forthcoming paper that will be presented at IMC2013.

In a nutshell, tracebox can be considered as an extension to traceroute. Like traceroute, it allows one to discover devices in a network. However, while traceroute only detects IP routers, tracebox is able to detect any type of middlebox that modifies some fields of the network or transport header. tracebox can be used as a command-line tool but also includes a scripting language that allows operators to develop more complex tests.

For example, tracebox can be used to verify that a path is transparent for Multipath TCP as shown below

# tracebox -n -p IP/TCP/MSS/MPCAPABLE/WSCALE bahn.de
tracebox to 81.200.198.6 (bahn.de): 64 hops max
1: 130.104.228.126 IP::CheckSum
2: 130.104.254.229 IP::TTL IP::CheckSum
3: 193.191.3.85 IP::TTL IP::CheckSum
4: 193.191.16.21 IP::TTL IP::CheckSum
5: 195.69.144.123 IP::TTL IP::CheckSum
6: 145.254.5.158 IP::TTL IP::CheckSum
7: 88.79.13.62 IP::TTL IP::CheckSum
8: 81.200.194.234 IP::TTL IP::CheckSum
9: 81.200.197.9 IP::TTL IP::CheckSum
10: 81.200.198.6 TCP::CheckSum IP::TTL IP::CheckSum TCPOptionMaxSegSize::MaxSegSize -TCPOptionMPTCPCapable -TCPOptionWindowScale

At each hop, tracebox verifies which fields of the IP/TCP headers have been modified. In the trace above, tracebox sends a SYN TCP segment on port 80 that contains the MSS, MP_CAPABLE and WSCALE options. The last hop corresponds to a middlebox that changes the MSS option and removes the MP_CAPABLE and WSCALE options. Thanks to the flexibility of tracebox, it is possible to use it to detect almost any type of middlebox interference.

You can use it on Linux and MacOS to verify whether the network that you use is fully transparent to TCP. If not, tracebox will point you to the offending middlebox.

Apple seems to also believe in Multipath TCP (Wed, 18 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/18/mptcp.html

Multipath TCP is a TCP extension that allows a TCP connection to send/receive packets over different interfaces. Multipath TCP has various use cases.

Designing such a major TCP extension has been a difficult problem and took a lot of effort within several research projects. The work started within the FP7 Trilogy project funded by the European Commission. It continues within the CHANGE and Trilogy 2 projects.

After five years of effort, we are getting close to a wide adoption of Multipath TCP.

  • In January 2013, the IETF published the Multipath TCP specification as an Experimental standard in RFC 6824
  • In July 2013, the MPTCP working group reported three independent implementations of Multipath TCP, including our implementation in the Linux kernel. To my knowledge, this is the first time that a large TCP extension is implemented so quickly.
  • On September 18th, 2013, Apple released iOS7, which includes the first large scale commercial deployment of Multipath TCP. Given the marketing buzz around new iOS7 releases, we can expect tens of millions of users to use a Multipath TCP enabled device.

Packet traces collected on an iPad running iOS7 reveal that it uses Multipath TCP to reach some destinations that seem to be directly controlled by Apple. You won’t see Multipath TCP for regular TCP connections from applications like Safari, but if you use Siri, you might see that the connection with one of the Apple servers uses Multipath TCP. The screenshot below shows the third ACK of a three-way handshake sent by an iPad running iOS7.

[screenshot: siri.png]

At this stage, the actual usage of Multipath TCP by iOS7 is unclear to me. If you have any hint on the type of information exchanged over this SSL connection, let me know.

The next step will, of course, be the utilisation of Multipath TCP by default for all applications running over iOS7.

Quickly producing time-sequence diagrams (Tue, 10 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/10/msc.html

Networking researchers and teachers often need to draw time-sequence diagrams that represent the exchange of packets through a network. Any drawing tool can be used to produce these diagrams, which contain mainly lines, arrows and text. However, while writing an article or a textbook, switching from the text to the drawing tool can be cumbersome.

A better approach would be to write a description of the diagrams directly in the text as a set of commands in a simple language. Latex hackers can probably manage this easily, but I’m far from a latex guru. Thanks to Benjamin Hesmans, I recently found an interesting tool called MSCGen. MSCGen was designed to write Message Sequence Chart descriptions. It produces SVG and PNG images and is integrated with sphinx thanks to the mscgen extension. This integration is very useful since it allows both the text and the figures to be written directly in ASCII.

The language supported by mscgen is similar to the DOT language used by graphviz and is very easy to use. For example, the code below

.. msc::

    a [label="", linecolour=white],
    b [label="Host A", linecolour=black],
    z [label="Physical link", linecolour=white],
    c [label="Host B", linecolour=black],
    d [label="", linecolour=white];

    a=>b [ label = "DATA.req(0)" ] ,
    b>>c [ label = "", arcskip=1];
    c=>d [ label = "DATA.ind(1)" ];

Produces the following image.

[figure: msc.png]

The only drawback of MSCGen is that it is currently difficult to write a diagram that contains a window of packets being exchanged together with the opposite flow of acknowledgements. Besides that, I’m planning to use it to produce all the time-sequence diagrams in the planned revision of Computer Networking : Principles, Protocols and Practice

Adding bibliographic information to pdf files (Tue, 27 Aug 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/08/27/citation.html

Researchers often distribute pdf files of their articles on their homepages or through institutional repositories like DIAL. Researchers are encouraged to distribute their scientific papers electronically and measurements have shown that distributing papers online improves their impact. Still, there is often one important piece of information missing when a paper is posted on a website: the precise bibliographic information needed to cite the paper. Without this bibliographic information, readers of a paper may print or save it without knowing where it has been published and are more likely to ignore it when preparing the bibliography of their own papers.

A better approach is to add the bibliographic information directly inside the pdf file. This is what the default ACM Latex style provides for accepted papers. For the SIGCOMM ebook on Recent Advances in Networking, we opted for a simple note on each paper.
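A complementary, programmatic option, which is not the approach used for the ebook, is to also store the citation in the pdf document metadata; here is a minimal sketch using the pypdf library, where the filenames and citation text are placeholders:

# Minimal sketch: embed citation information in the pdf document metadata.
# Filenames and the citation string are made up for the example.
from pypdf import PdfReader, PdfWriter

reader = PdfReader("paper.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

writer.add_metadata({
    "/Title": "Paper title",
    "/Author": "First Author, Second Author",
    "/Subject": "Published in Example Conference 2013; please cite the published version",
})

with open("paper-with-citation.pdf", "wb") as out:
    writer.write(out)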

Read more...

How quickly can we scan the entire Internet (Thu, 22 Aug 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/08/22/zmap.html

A random host on the Internet receives a large number of unsolicited packets. Some of these packets are caused by transmission errors that modify the destination address of packets, or by bugs and implementation errors. Still, most of the background Internet noise observed by network telescopes comes from worms that try to propagate, or from researchers, security experts and attackers trying to find characteristics of remote hosts.

When researchers try to map the Internet, they usually operate slowly. For example, CAIDA takes a few days to send traceroute probes towards all reachable class C networks. The 2012 anonymous Internet Census, which exploited a large number of vulnerable routers to serve as probes, took months. nmap, the default tool to probe open services on a remote host or network, also uses a slow mode of operation. These slow modes of operation are mainly chosen to avoid triggering alarms on the remote sites. A few packets can easily go unnoticed on an enterprise network, not millions of them.
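To put these durations in perspective, here is a back-of-the-envelope computation; the assumptions (minimum-size Ethernet frames sent at line rate from a single gigabit link) are mine and the numbers are not taken from the paper discussed below:

# Estimate: time to probe every IPv4 address once from a single gigabit link,
# assuming minimum-size Ethernet frames (64 bytes + 20 bytes of overhead per frame).
link_rate = 1e9                                # bits per second
frame_bits = (64 + 20) * 8                     # 672 bits per minimum-size frame
packets_per_second = link_rate / frame_bits    # about 1.49 million packets/s
addresses = 2 ** 32

seconds = addresses / packets_per_second
print(f"{seconds / 60:.0f} minutes")           # roughly 48 minutes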

A recent paper presented at the USENIX 2013 Security symposium takes a completely different approach.

Read more...

Adding hyperlinks to our Latex articles (Tue, 20 Aug 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/08/20/pdflinks.html

When they write papers, scientists spend a lot of time preparing their bibliography and correctly citing all their references. However, bibliographies and the corresponding bibtex styles were designed when everyone read scientific papers on paper. This is rarely the case today and most scientific papers are read online. Still, we insist on placing volume numbers, page numbers and other information from the paper era in each reference, but rarely URLs or DOIs. This is probably a mistake...

When developing the first edition of Computer Networking : Principles, Protocols and Practice, I quickly found that students read references provided that these references were easily accessible through hyperlinks. Today’s students, and I guess a growing number of researchers, are used to browsing the web but rarely go to their library to read articles on paper. For the recently published SIGCOMM ebook on Recent Advances in Networking, we did a small experiment in adding hyperlinks directly to each chapter in pdf format. Adding these hyperlinks was surprisingly easy and, I hope, useful for the readers.

Read more...

TCP over UDP : a new hack to pass through (some) middleboxes (Thu, 04 Jul 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/07/04/tcp_over_udp.html

Extending TCP in the presence of middleboxes is a difficult but not impossible task, as shown by Multipath TCP RFC 6824. A recent IETF draft proposed by Apple suggests encapsulating TCP segments inside UDP to prevent modifications performed by middleboxes. Apparently, some measurements indicate that UDP passes better through some types of NAT boxes than regular TCP segments. Since TCP is more widely used than UDP, the draft proposes to encapsulate TCP inside UDP. The proposed encapsulation technique is a bit unusual. A classical encapsulation would put the entire TCP segment after the UDP header. Instead, the TCP-over-UDP draft proposes to rewrite the TCP header as follows

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           | |A|P|R|S|F|                               |
| Offset| Reserved  |0|C|S|S|Y|I|            Window             |
|       |           | |K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      (Optional) Options                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Read more...

Should we completely deprecate IP fragmentation ? (Tue, 25 Jun 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/06/25/fragmentation.html

Fragmentation and reassembly have been part of the IPv4 specification since the beginning. One of the main motivations for including such mechanisms in the network layer is of course to allow IP packets to be exchanged over subnetworks that support different packet sizes. IPv4 fragmentation forced routers to be able to fragment packets that were too large. When routers were entirely software based, doing fragmentation on the router was a viable solution. However, with the advent of hardware assisted routers, performing fragmentation on the routers quickly became too expensive. In a seminal paper, Christopher Kent and Jeff Mogul argued that fragmentation should be considered harmful. This encouraged endhosts to avoid in-network packet fragmentation and most TCP implementations now include Path MTU discovery RFC 1191.

When IPv6 was designed, in-network fragmentation was quickly left out. However, the designers of IPv6 still believed in the benefits of fragmentation. IPv6 supports a fragmentation header that can be used by endhosts to fragment packets that are too large for a given path. One of the motivations for host-based fragmentation is that some packets need to be transmitted over subnets that only support small packet sizes (IPv6 mandates a minimum MTU of 1280 bytes).

Read more...

Don’t use ping for accurate delay measurements (Wed, 22 May 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/05/22/don_t_use_ping_for_delay_measurements.html

The ping software was designed decades ago to verify the reachability of a given IPv4 address. For this, it relies on ICMP, which runs on top of IPv4. A host that receives an ICMP echo request message is supposed to reply immediately by sending an ICMP echo reply message. This confirms the reachability of the remote host. By measuring the delay between the transmission of the echo request message and the reception of the echo reply message, it is possible to infer the round-trip-time between the two hosts. Since the round-trip-time matters for the performance of many Internet protocols, it is an important metric reported by ping. Some variants of ping also report the minimum and maximum delays after measuring a number of round-trip-times. A typical example is shown below

ping www.psg.com
PING psg.com (147.28.0.62): 56 data bytes
64 bytes from 147.28.0.62: icmp_seq=0 ttl=48 time=148.715 ms
64 bytes from 147.28.0.62: icmp_seq=1 ttl=48 time=163.814 ms
64 bytes from 147.28.0.62: icmp_seq=2 ttl=48 time=148.780 ms
64 bytes from 147.28.0.62: icmp_seq=3 ttl=48 time=153.456 ms
64 bytes from 147.28.0.62: icmp_seq=4 ttl=48 time=148.935 ms
64 bytes from 147.28.0.62: icmp_seq=5 ttl=48 time=153.647 ms
64 bytes from 147.28.0.62: icmp_seq=6 ttl=48 time=148.682 ms
64 bytes from 147.28.0.62: icmp_seq=7 ttl=48 time=163.926 ms
64 bytes from 147.28.0.62: icmp_seq=8 ttl=48 time=148.669 ms
64 bytes from 147.28.0.62: icmp_seq=9 ttl=48 time=153.352 ms
64 bytes from 147.28.0.62: icmp_seq=10 ttl=48 time=163.688 ms
64 bytes from 147.28.0.62: icmp_seq=11 ttl=48 time=148.729 ms
64 bytes from 147.28.0.62: icmp_seq=12 ttl=48 time=163.691 ms
64 bytes from 147.28.0.62: icmp_seq=13 ttl=48 time=148.536 ms
^C
--- psg.com ping statistics ---
14 packets transmitted, 14 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 148.536/154.044/163.926/6.429 ms

In Computer Networking, Principles, Protocols and Practice, the following figure was used to illustrate the variation of the round-trip-time. This measurement was taken more than ten years ago between a host connected to a CATV modem in Charleroi and a server at the University of Namur. The main reason for the delay variations was the utilisation of the low-speed link that we used at that time.

[figure: transport-fig-070-c.png]

Evolution of the round-trip-time between two hosts

In a recent presentation at RIPE66, Randy Bush and several of his colleagues revealed some unexpected measurements collected using ping. For these measurements, they used two unloaded servers and sent pings mainly through backbone networks. The figure below shows the CDF of the measured delays. The staircase curves were the first curves that they obtained. These delays look strange and several plateaux appear, but it is not easy to immediately find a clear explanation.

They studied these delays in more detail and tried to understand the reason for the huge delay variations that they observed. To understand the source of the delay variations, it is useful to look back at the format of an ICMP message encapsulated inside an IPv4 packet.

[figure: icmpv4.png]

The important part of this header is the first 32-bit word of the ICMPv4 header. For TCP and UDP, this word contains the source and destination ports of the transport flow. Many routers that support Equal Cost Multipath will compute a hash function over the source and destination IP addresses and ports for packets carrying TCP/UDP segments. However, how should such a load balancing router handle ICMP messages or other types of protocols that run directly on top of IPv4? A first option would be to always send ICMP messages over the same path, i.e. disable load balancing for ICMP messages. This is probably not a good idea from an operational viewpoint since it would imply that ICMP messages, which are often used for debugging, would not necessarily follow the same paths as regular packets. A better option would be to only use the source and destination IP addresses when load balancing ICMP messages. However, this requires the router to distinguish between UDP/TCP and other types of flows and to react based on the Protocol field of the IP header. This likely increases the cost of implementing load-balancing in hardware. The measurements presented above are probably, at least partially, caused by load-balancing routers that use the first 32-bit word of the IP payload to make their load balancing decision, without verifying the Protocol field in the IP header. The vertical bars shown in the figure above correspond to a modified ping that always sends ICMP messages that start with the same first 32-bit word. However, this does not completely explain why there is a delay difference of more than 15 milliseconds on the equal cost paths between two servers. Something else might be happening in this network.
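To illustrate why successive echo requests may follow different equal-cost paths, here is a small Python sketch; the router model is deliberately naive (it hashes the word where TCP/UDP ports would be, without checking the Protocol field) and the addresses and hash function are only examples, not any specific vendor implementation:

# The first 32-bit word of an ICMP echo request is type|code|checksum, and the
# checksum changes with the sequence number. A router that blindly hashes the
# "port" word of the payload therefore spreads successive pings over different paths.
import struct
import zlib

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def icmp_echo_request(identifier: int, sequence: int) -> bytes:
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)   # checksum = 0
    checksum = internet_checksum(header)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)

def naive_ecmp_nexthop(src: str, dst: str, payload: bytes, n_paths: int = 4) -> int:
    # Hash the addresses plus the first 32-bit word of the payload, without
    # checking the Protocol field: this is the problematic behaviour.
    key = src.encode() + dst.encode() + payload[:4]
    return zlib.crc32(key) % n_paths

for seq in range(4):
    pkt = icmp_echo_request(identifier=1234, sequence=seq)
    print(seq, naive_ecmp_nexthop("192.0.2.1", "147.28.0.62", pkt))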

Additional details are discussed in On the suitability of ping to measure latency by Cristel Pelsser, Luca Cittadini, Stefano Vissicchio and Randy Bush. See https://ripe66.ripe.net/archives/video/12/ for the recorded video.

Disruption free reconfiguration for link-state routing protocols implementable on real routers (Thu, 09 May 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/05/09/revisiting_the_infocom2007_best_paper.html

Link state routing protocols like OSPF and IS-IS flood information about the network topology to all routers. When a link needs to be shut down or its weight changes, a new link state packet is flooded again. Each router processes the received link state packet and updates its forwarding table. Updating the forwarding tables of all routers inside a large network can take several hundreds of milliseconds up to a few seconds depending on the configuration of the routing protocol. For his PhD thesis, Pierre Francois studied this problem in detail. He first designed a simulator to analyse all the factors that influence the convergence time in a large ISP network. This analysis revealed that it is difficult for a large network to converge within less than 100-200 milliseconds after a topology change. During this period, transient loops can happen and packets get lost even for planned and non-urgent topology changes. Pierre Francois designed several techniques to avoid these transient loops during the convergence of a link-state routing protocol. The first solution required changes to the link-state routing protocol. We thought that getting IETF consensus on this solution would be very difficult. We were right since the framework draft has still not been adopted.

At INFOCOM 2007, by refining an intuition proposed by Mike Shand, we proposed a solution that does not require standardisation. This solution relies on the fact that if the weight of a link is changed by one (increase or decrease), no transient loop can happen. However, using only unit weight changes is not practical given the wide range of weights in real networks. Fortunately, our paper showed that when shutting down a link (i.e. setting its weight to infinity), it is possible to use a small number of metric increments to safely perform the reconfiguration. This metric reconfiguration was well accepted by the research community and received the INFOCOM 2007 best paper award.

The algorithms proposed in our INFOCOM 2007 paper were implemented in Java and took a few seconds or more to run. It was possible to run them on a management platform, but not on a router. Two years ago, Pascal Merindol and Francois Clad reconsidered the problem. They found new ways of expressing the proofs that enable loop-free reconfiguration of link-state protocols upon topology changes. Furthermore, they have also significantly improved the performance of the algorithms proposed in 2007 and reimplemented everything in C. The new algorithms operate one order of magnitude faster than the 2007 version. It now becomes possible to implement them directly on routers to enable OSPF and IS-IS to avoid transient loops when dealing with non-urgent topology changes.

All the details are provided in our forthcoming paper that will appear in IEEE/ACM Transactions on Networking.

Hash functions are not the only answer to load balancing problems (Thu, 11 Apr 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/04/11/hash_functions_are_not_the_only_answer_to_load_balancing_problems.html

Load balancing is important in many networks because it makes it possible to spread the load over different resources. Load balancing can happen at different layers of the protocol stack. A web server farm uses load balancing to distribute the load between different servers. Routers rely on Equal Cost Multipath (ECMP) to balance packets without breaking TCP flows. Bonding makes it possible to combine several links at the datalink layer.

Many deployments of load balancing rely on the utilisation of hash functions. ECMP is a typical example. When a router has several equal cost paths towards the same destination, it can use a hash function to distribute the packets over these paths. Typically, to select the nexthop, the router computes hash(IPsrc, IPdst, Portsrc, Portdst) mod N for each packet to be forwarded, where N is the number of equal cost paths towards the destination. Various hash functions have been evaluated in the literature RFC 2992 : CRC, checksum, MD5, ... Many load balancing techniques have adopted hash functions because they can be efficiently computed. An important criterion in selecting a function to perform load balancing is whether a small change in its input gives a large change in its output. This is sometimes called the avalanche effect and is a requirement for strong hash functions used in crypto applications. Unfortunately, hash functions have one important drawback : it is very difficult to predict an input that would lead to a given output. For crypto applications, this is precisely what is expected, but many load balancing applications would like to be able to predict the output of the load balancing function while still benefiting from the avalanche effect.
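As a minimal illustration, the sketch below mimics such a hash-based selection; MD5 is used here only because it is readily available in Python, while real routers use simpler hardware-friendly functions.

# Toy ECMP-style selection: hash the flow identifier and take it modulo the
# number of equal-cost next hops, so all packets of a flow follow the same path.
import hashlib

def select_nexthop(ip_src, ip_dst, port_src, port_dst, proto, nexthops):
    key = f"{ip_src}|{ip_dst}|{port_src}|{port_dst}|{proto}".encode()
    digest = hashlib.md5(key).digest()
    return nexthops[int.from_bytes(digest[:4], "big") % len(nexthops)]

paths = ["nexthop-A", "nexthop-B", "nexthop-C"]
print(select_nexthop("10.0.0.1", "192.0.2.1", 54321, 443, "tcp", paths))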

In a recent paper, we have shown that it is possible to design a load balancing technique that both provides the avalanche effect (which is key for a good balancing of the traffic) and is predictable. The intuition behind this idea is that the hash function can be replaced by a block cipher. Block ciphers are usually used to encrypt/decrypt information by using a secret key. Since they are designed to provide an output that appears as random as possible for a wide range of inputs, they also exhibit the avalanche effect. Furthermore, provided that the key is known, the input that leads to a given output can be easily predicted. Our paper provides all the details and shows the benefits that such a technique can provide with Multipath TCP in datacenter networks, but there are many other potential applications.
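The sketch below is only a toy illustration of this general principle, not the construction evaluated in the paper: a small keyed Feistel permutation gives an avalanche-like but invertible mapping, so that, knowing the key, one can compute which flow identifier lands on a chosen path.

# Toy 32-bit Feistel permutation: keyed, avalanche-like and, unlike a hash,
# invertible, so the input leading to a chosen output can be computed.
import hashlib

def _round(half, key, i):
    data = key + bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def permute(value, key, rounds=4):
    left, right = value >> 16, value & 0xFFFF
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right

def unpermute(value, key, rounds=4):
    left, right = value >> 16, value & 0xFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right

key, flow_id, n_paths = b"secret", 0x12345678, 8
assert unpermute(permute(flow_id, key), key) == flow_id
# Forward use (load balancing): path = permute(flow_id, key) % n_paths.
# Reverse use (predictability): pick any output o with o % n_paths == target
# and derive flow_id = unpermute(o, key), e.g. to choose a source port.
print(permute(flow_id, key) % n_paths)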

]]>
Thu, 11 Apr 2013 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/02/05/humming.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/02/05/humming.html <![CDATA[Humming in the classroom]]> Humming in the classroom

One of the challenges of teaching large classes is to encourage the students to interact during the class. Today’s professors do not simply read their course to passive students. Most try to initiate interaction with the students by asking questions, polling the students’ opinions, ... However, my experience in asking questions to students in a large class shows that it is difficult to get answers from many students. Asking the students to raise their hands to vote on a binary question almost always results in :

  • a small fraction of the students voting for the correct answer
  • a (usually smaller) fraction of the students voting for the wrong answer
  • most of the students not voting

One of the reasons why students do not vote is that they are unsure about their answer and do not want their colleagues or, worse, their professor to notice that they had the wrong answer. Unfortunately, this decreases the engagement of the students and after some time some of them do not even think about the questions that are asked.

To cope with those shy students, I’ve started to ask them to hum to answer some of my questions. Humming is a technique that is used by the IETF to evaluate consensus during a working group meeting. The IETF develops the specifications for most of the protocols that are used on the Internet. An IETF working group is responsible for a given protocol. IETF participants meet in person three times a year. During these meetings, engineers and protocol experts discuss the new protocol specifications being developed. From time to time, working group chairs need to evaluate whether the working group agrees with one proposal. This question can be discussed on the working group’s mailing list. Another possibility would be to use a show of hands during the meeting, but during a show of hands, it is possible to recognize who is in favor of and who is against a given proposal. This is not always desirable. The IETF uses a nice trick to solve this problem. Instead of asking participants to raise their hands, working group chairs ask participants to hum. If most of the participants in the room hum in favor, the noise level is high and the proposal is accepted. Otherwise, if the noise level is similar in favor of and against a proposal, then there is no consensus and the proposal will need to be discussed at another meeting later.

Humming also works well in the classroom when asking a binary question or a question with a small number of possible answers. Students can give their opinion without individually revealing it to the professor or to their colleagues. Of course, electronic voting systems can be used to preserve the anonymity of students, but deploying these systems in large classes is more costly and more time consuming than humming...


]]>
Tue, 05 Feb 2013 00:00:00 +0100
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/22/looking_at_the_dns_through_a_looking_glass.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/22/looking_at_the_dns_through_a_looking_glass.html <![CDATA[Looking at the DNS through a looking glass]]> Looking at the DNS through a looking glass

In the networking community, a looking glass is often a router located inside an ISP network that can be contacted openly via a telnet server or sometimes an HTTP server. These looking glasses are very useful for debugging networking problems since they can be used to detect filtering of BGP routes or other problems.

A nice example of these looking glasses is the web server maintained by GEANT, the pan-European research network. The GEANT looking glass provides dumps of BGP routing tables, traceroutes and other tools from most routers of GEANT. As an example, GEANT routers have three routes towards network 8.8.8.0/24, which includes the open DNS resolver managed by Google.

inet.0: 433796 destinations, 1721271 routes (433773 active, 12 holddown, 755 hidden)
+ = Active Route, - = Last Active, * = Both

8.8.8.0/24         *[BGP/170] 3w6d 01:51:39, MED 0, localpref 95
                      AS path: 15169 I
                    > to 62.40.125.202 via xe-4/1/0.0
                    [BGP/170] 20w6d 03:35:32, MED 0, localpref 80
                      AS path: 3356 15169 I
                    > to 212.162.4.9 via xe-2/1/0.0
                    [BGP/170] 4w5d 04:06:28, MED 0, localpref 80
                      AS path: 174 15169 I
                    > to 149.6.42.73 via ge-6/1/0.0

As on all BGP routers, the best path, which is actually used to forward packets, is prefixed by *. Network operators have deployed many looking glasses; a list can be found on http://www.lookinglass.org/ and www.traceroute.org, among others.

The correct operation of today’s Internet does not depend only on the propagation of BGP prefixes. Another frequent issue today is the correct dissemination of DNS names. In the early days, command-line tools like nslookup and dig were sufficient to detect most DNS problems. Today, this is no longer always the case since Content Distribution Networks provide different DNS answers to the same DNS request coming from different clients. Furthermore, some network operators use DNS resolvers that sometimes provide invalid answers to some DNS requests. Some of these DNS resolvers are deployed due to legal constraints, as some countries block Internet access to some sites. However, some ISPs sometimes have less legitimate reasons to deploy lying DNS resolvers, as shown recently in France where Free deployed DNS masquerading to block access to some Internet advertisement companies. Checking the correct distribution of DNS names is thus becoming an operational problem. Several authors have proposed tools to examine the answers provided by the DNS to remote clients. Stéphane Bortzmeyer, who has sent many patches and improvements for the CNP3 book, has developed a very interesting alternative to these DNS looking glasses. dns-lg can be used manually through a web server, but also through a REST API that provides JSON output, which is pretty convenient for developing automated tests and for querying the looking glass from scripts. For example, http://dns.bortzmeyer.org/multipath-tcp.org/NS returns the NS record for multipath-tcp.org while http://dns.bortzmeyer.org/www.uclouvain.be/AAAA returns the IPv6 address (AAAA) record of UCL’s web server. Thanks to its web interface, dns-lg could be a very nice alternative for students who have difficulty using classical command-line tools when they start learning networking.
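As a small example, the sketch below shows how the REST API might be used from a script (it assumes the requests library; the Accept header and the exact structure of the JSON reply are assumptions, so the reply is simply printed as-is).

# Query the dns-lg looking glass through its REST API and print the JSON reply.
import json
import requests

def dns_lg(name, rrtype, server="http://dns.bortzmeyer.org"):
    r = requests.get(f"{server}/{name}/{rrtype}",
                     headers={"Accept": "application/json"}, timeout=10)
    r.raise_for_status()
    return r.json()

print(json.dumps(dns_lg("multipath-tcp.org", "NS"), indent=2))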

]]>
Tue, 22 Jan 2013 00:00:00 +0100
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/05/broadband_and_cellular_performance.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/05/broadband_and_cellular_performance.html <![CDATA[A closer look at broadband and cellular performance]]> A closer look at broadband and cellular performance

The Internet started almost exactly 30 years ago when ARPANet switched to the TCP/IP protocol suite. At that time, only a few experimental hosts were connected to the network. Three decades later, the Internet is part of our lives and most users access the Internet by using broadband access or cellular networks. Understanding the performance characteristics of these networks is essential to understand the factors that influence Internet protocols.

In 2011, Srikanth Sundaresan and several other researchers presented a very interesting paper [3] at SIGCOMM that analysed a large number of measurements collected with modified home access routers. Two types of devices were used : dozens of home routers modified by the Bismark project and several thousand measurement boxes deployed by SamKnows throughout the US. This paper revealed that there is a huge diversity in broadband performance in the US. This diversity depends on the chosen ISP, the chosen plan and also the geographical location. The paper was later revised and published in Communications of the ACM, ACM’s flagship magazine [4].

During the last edition of the Internet Measurement Conference, Joel Sommers, Paul Barford and Igor Canadi presented two papers that analyse Internet performance from a different viewpoint. [1] uses data from http://www.speedtest.net and SamKnows to analyse broadband performance. This enables them to provide a much more global analysis of broadband performance. For example, the figure below shows the average download throughput measured in different countries.

Figure: Average download performance (source [1])

The second paper, [2], explores the performance of WiFi and cellular data networks in more than a dozen cities, including Brussels. Latency and bandwidth information is extracted from http://www.speedtest.net. A first point to be noted from these measurements is that cellular and WiFi performance are significantly lower than broadband performance, despite all the efforts in deploying higher speed wireless networks. Note that the data was collected in February-June 2011; network performance might have changed since then. When comparing WiFi and cellular data, WiFi is consistently faster in all studied regions. In Brussels, for example, the average WiFi download throughput is 8.6 Mbps while the average cellular download throughput is only 1.2 Mbps. Latency is also an important performance factor. In Brussels, the average WiFi latency is slightly above 100 milliseconds while it reaches 281 milliseconds for cellular networks. Both papers are recommended reading for anyone willing to better understand the performance of Internet access networks.

[1](1, 2) Igor Canadi, Paul Barford, and Joel Sommers. Revisiting Broadband Performance. In the 2012 Internet Measurement Conference, 273–286. New York, New York, USA, 2012. ACM Press.
[2]Joel Sommers and Paul Barford. Cell vs. WiFi : On the Performance of Metro Area Mobile Connections. In the 2012 Internet Measurement Conference, 301. New York, New York, USA, 2012. ACM Press.
[3]S. Sundaresan, W. de Donato, N. Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescapè. Broadband internet performance: a view from the gateway. SIGCOMM, 2011.
[4]Srikanth Sundaresan, Walter de Donato, Nick Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescapè. Measuring home broadband performance. Communications of the ACM, 55(11):100, November 2012.
]]> Sat, 05 Jan 2013 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/17/reconfiguration_matters.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/17/reconfiguration_matters.html <![CDATA[REconfiguration matters]]> REconfiguration matters

Network configuration and management is an important problem for all network operators. IP networks are implemented by using a large number of switches and routers from different vendors. All these devices must be configured by the operators to maximise the performance of the network. In some networks, configuration files contain several tens of thousands of lines per device. Managing all these configurations is costly in large networks. Some researchers have worked on analysing the complexity of these networks and proposing abstractions to allow operators to better configure their network. Still, network configuration and management is often closer to an art than to science.

Researchers often consider the network configuration problem as a design problem. Starting from a blank sheet, how can a network operator define his/her objectives and then derive the configuration files that meet these objectives? This is not how networks are operated. Network operators almost never have the opportunity to design and configure a network from scratch. This only happens in new datacenters or new enterprise networks. In their recent work, Laurent Vanbever, Stefano Vissicchio and other colleagues have addressed a slightly different but much more relevant problem : network REconfiguration. There are only two letters of difference between network configuration and network REconfiguration, but these two letters reflect one of the important sources of complexity in managing a network and introducing new services. Once a network has been configured, it must remain in operation 24 hours a day, 365 days a year. Network equipment can remain in operation for 5-10 years and during that period its role in the network changes. All these changes must be done with as little impact as possible on the network.

To better understand the difficulty of reconfiguring a network, it is interesting to have a brief look at earlier papers that deal with similar problems. A decade ago, routing sessions had to be reset for each policy change or when the operating system of the router had to be upgraded. Aman Shaikh and others have shown that it is possible to update the control plane of a router without disrupting the dataplane [5]. Various graceful shutdown and graceful restart techniques have been proposed and implemented for the major control plane protocols. Another simple example of a reconfiguration problem is when operators need to change the OSPF weight associated with one link. This can happen for traffic engineering or maintenance purposes. This change triggers an OSPF convergence that can cause transient loops. Pierre Francois and others have proposed techniques that allow these simple reconfigurations to occur without causing transient forwarding problems [3][2]. Another step to aid network reconfiguration was the shadow configuration paper [1] that shows how to run different configurations in the same network at the same time.

In recent years, several network REconfiguration problems have been addressed. The first problem is the migration from one configuration of a link-state routing protocol (e.g. OSPF without areas) to another link-state routing protocol (e.g. IS-IS with areas). At first glance, this problem could appear to be simple. However, network operators who have performed such a transition have spent more than half a year to plan the transition and analyse all the problems that could occur. [6] first provides a theoretical framework that shows the problems that could occur during such a reconfiguration. It shows that it is possible to avoid transient forwarding problems during the reconfiguration by using a ships-in-the-night approach and updating the configuration of the routers in a specific order. Unfortunately, finding this ordering is an NP-complete problem. However, the paper proposes heuristics that find a suitable ordering, applies them to real networks, and provides measurements from a prototype reconfigurator that manages an emulated network.

A second problem is BGP reconfiguration. Given the complexity of BGP, it is not surprising that BGP reconfigurations are more difficult than IGP reconfigurations. [7] first shows that signalling and forwarding correctness, which are usually used to verify iBGP configurations, are not sufficient properties. Dissemination correctness must be ensured in addition to these two properties. :cite:`6327628` analyses several iBGP reconfiguration problems and identifies some problematic configurations. To allow an iBGP reconfiguration, this paper proposes and evaluates a BGP multiplexer that, combined with encapsulation, enables iBGP reconfigurations. The proposed solution provably enables lossless BGP reconfigurations by leveraging existing technology to run multiple isolated control-planes in parallel.

This work on REconfiguration has already led to some follow-up work. For example, [4] has proposed techniques that use tagging to allow software-defined networks to support migrations in a seamless manner. We can expect to read more papers that deal with REconfiguration problems in the coming years.

[1]Richard Alimi, Ye Wang, and Y. Richard Yang. Shadow configuration as a network management primitive. SIGCOMM Comput. Commun. Rev., 38(4):111–122, August 2008.
[2]P. Francois, M. Shand, and O. Bonaventure. Disruption free topology reconfiguration in OSPF networks. In INFOCOM 2007, 26th IEEE International Conference on Computer Communications, pages 89–97, May 2007.
[3]Pierre Francois and Olivier Bonaventure. Avoiding transient loops during the convergence of link-state routing protocols. IEEE/ACM Trans. Netw., 15(6):1280–1292, December 2007.
[4]Mark Reitblatt, Nate Foster, Jennifer Rexford, Cole Schlesinger, and David Walker. Abstractions for network update. In Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, SIGCOMM ‘12, 323–334. New York, NY, USA, 2012. ACM.
[5]Aman Shaikh, Rohit Dube, and Anujan Varma. Avoiding instability during graceful shutdown of multiple ospf routers. IEEE/ACM Trans. Netw., 14(3):532–542, June 2006.
[6]Laurent Vanbever, Stefano Vissicchio, Cristel Pelsser, Pierre Francois, and Olivier Bonaventure. Seamless network-wide igp migrations. In Proceedings of the ACM SIGCOMM 2011 conference, SIGCOMM ‘11, 314–325. New York, NY, USA, 2011. ACM.
[7]S. Vissicchio, L. Cittadini, L. Vanbever, and O. Bonaventure. iBGP deceptions: more sessions, fewer routes. In INFOCOM 2012, Proceedings IEEE, pages 2122–2130, March 2012.
]]> Mon, 17 Dec 2012 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/14/mininet.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/14/mininet.html <![CDATA[Mininet : improving the reproducibility of networking research]]> Mininet : improving the reproducibility of networking research

In most research communities, the ability to reproduce research results is a key step in validating and accepting new research results. Ideally, all published papers should contain enough information to enable researchers to reproduce the results discussed in the paper. Reproducibility is relatively easy for theoretical or mathematically oriented papers. If the main contribution of the paper is a proof or a mathematical model, then the paper contains all the information about the results. If the paper is more experimental, then reproducibility is often a concern. There are many (sometimes valid) reasons that can explain why the results obtained by a paper are difficult to reproduce :

  • the paper contains measurement data that are proprietary. This argument is often used by researchers who have tested their new solution in a commercial network or datacenter, or who used measurement data such as packet traces whose publication could reveal private information
  • the source code used for the paper is proprietary and cannot be released to other researchers. This argument is weaker, especially when researchers extend publicly available (and often open-source) software to perform their research. Although they have benefited from the publicly available software, they do not release their modifications to this software

At CoNEXT 2012, Brandon Heller, Nikhil Handigol, Vimalkumar Jeyakumar, Bob Lantz and Nick McKeown presented a container-based emulation technique called Mininet 2.0 that enables researchers to easily create reproducible experiments. The paper describes in detail the extensions that they have developed on top of the Linux kernel to efficiently emulate, on a single machine, a set of hosts interconnected by virtual links. The performance that they obtained is impressive. More importantly, they explain how they were able to reproduce recent networking papers on top of Mininet. Instead of performing the experiments themselves, they used Mininet 2.0 for a seminar at Stanford University and 17 groups of students were able to reproduce various measurements by using one virtual machine on EC2.
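To give an idea of what such an experiment looks like, here is a minimal sketch using Mininet's Python API (it assumes a recent Mininet installation, must be run as root, and the topology and link parameters are of course arbitrary).

# Minimal Mininet experiment: two hosts behind one switch, rate- and
# delay-limited links, and a quick reachability and latency test.
from mininet.net import Mininet
from mininet.topo import Topo
from mininet.link import TCLink

class TwoHosts(Topo):
    def build(self):
        s1 = self.addSwitch("s1")
        h1 = self.addHost("h1")
        h2 = self.addHost("h2")
        self.addLink(h1, s1, bw=10, delay="5ms")   # 10 Mbps, 5 ms
        self.addLink(h2, s1, bw=10, delay="5ms")

net = Mininet(topo=TwoHosts(), link=TCLink)
net.start()
net.pingAll()                                      # basic connectivity check
h1, h2 = net.get("h1", "h2")
print(h1.cmd("ping -c 3 %s" % h2.IP()))            # repeatable measurement
net.stop()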

Beyond proposing a new tool, they also propose a new way to submit papers. In the introduction, they note :

To demonstrate that network systems research can indeed be made repeatable, each result described in this paper can be repeated by running a single script on an Amazon EC2 [5] instance or on a physical server. Following Claerbout’s model, clicking on each figure in the PDF (when viewed electronically) links to instructions to replicate the experiment that generated the figure. We encourage you to put this paper to the test and replicate its results for yourself.

I sincerely hope that we will see more directly reproducible experimental papers in the main networking conferences in the coming months and years.

Brandon Heller, Nikhil Handigol, Vimalkumar Jeyakumar, Bob Lantz, Nick McKeown, Reproducible Network Experiments using Container Based Emulation, Proc. Conext 2012, December 2012, Nice, France

]]>
Fri, 14 Dec 2012 00:00:00 +0100
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/07/a_real_use_case_for_the_locator_identifier_separation_protocol.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/07/a_real_use_case_for_the_locator_identifier_separation_protocol.html <![CDATA[A real use case for the Locator/Identifier Separation Protocol]]> A real use case for the Locator/Identifier Separation Protocol

The Locator/Identifier Separation Protocol (LISP) was designed several years ago by Dino Farinacci and his colleagues at Cisco Systems as an architecture to improve the scalability of the Internet routing system. Like several other such proposals, LISP proposed to separate the two usages of addresses. In contrast with many other proposals that were discussed in the IRTF Routing Research Group, LISP has been fully implemented and tested on the global Internet through the lisp4.net testbed. Several implementations of LISP exist on different types of Cisco routers and there is also the OpenLISP open-source implementation on FreeBSD [3].

On the Internet, IP addresses are used for identifying an endhost (or more precisely an interface on such a host) where TCP connections can be terminated. Addresses are also used as locators to indicate to the routing system the location of each router and endpoint. Today, both endpoint addresses and locators are advertised in the routing system and contribute to its growth. With LISP, there are two types of addresses :

  • locators. These addresses are used to identify routers inside the network. The locators are advertised in the routing system.
  • identifiers. These addresses are used to identify endhosts. They are not distributed in the routing system, which is the main advantage of LISP from a scalability viewpoint.

A typical deployment of LISP in the global Internet is described in the figure below [6].

Figure: LISP packet flow (source [6])

Endhosts are assigned identifiers. Identifiers are IP addresses (IPv4 or IPv6) whose reachability is advertised by the intradomain routing protocol inside the enterprise network. Two hosts that belong to the same network can exchange packets directly (A->B arrow in the above figure). Locators are assigned to border routers (ITRx and ETRx in the figure above). These locators are advertised in the global Internet routing system by using BGP. As the addresses of hosts A, B and C are not advertised by BGP, packets destined to these addresses cannot appear in the global Internet. LISP solves this problem by using map-and-encap. When A sends a packet towards C, it first sends a regular packet to its border router (ITR2 in the above figure). The border router performs two operations. First, it queries the LISP mapping system to retrieve the locator of the border routers that can reach the destination identifier C. Then, the original packet destined to C is encapsulated inside a LISP packet whose source is ITR2 and whose destination is ETR1, one of the locators of identifier C. When ETR1 receives the encapsulated packet, it removes the outer IP header and forwards the original packet as a regular IP packet towards its destination.
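The toy sketch below illustrates this map-and-encap logic, with the mapping system reduced to a static dictionary and packets represented as dictionaries; it is purely illustrative and all prefixes and locators are made up.

# Toy map-and-encap at an ITR: look up the destination identifier (EID) in a
# mapping table to find a locator (RLOC), then wrap the original packet in an
# outer header addressed to that locator.
import ipaddress

MAPPINGS = {                                   # EID prefix -> locators (RLOCs)
    ipaddress.ip_network("192.0.2.0/24"): ["203.0.113.1"],
    ipaddress.ip_network("198.51.100.0/24"): ["203.0.113.9"],
}
ITR_LOCATOR = "203.0.113.5"

def lookup(eid):
    """Return the locators of the ETRs announcing eid (most specific match)."""
    addr = ipaddress.ip_address(eid)
    matches = [(net, rlocs) for net, rlocs in MAPPINGS.items() if addr in net]
    if not matches:
        raise KeyError("no mapping for %s" % eid)
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def encapsulate(packet):
    """Prepend an outer 'header' (here a dict) whose destination is an RLOC."""
    rloc = lookup(packet["dst"])[0]
    return {"outer_src": ITR_LOCATOR, "outer_dst": rloc, "inner": packet}

print(encapsulate({"src": "10.1.0.7", "dst": "192.0.2.42", "data": "hello"}))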

The mapping system plays an important role in the performance and the stability of a LISP-based network since border routers need to maintain a cache of the mappings that they use [2]. The first releases of LISP used a hack that combined GRE tunnels and BGP to distribute mapping information [1]. This solution had the advantage of being simple to implement (at least on Cisco routers) but we expected that it would become complex to operate and maintain in the long term. After many discussions and simulations, we convinced the LISP designers to opt for a different mapping system whose operation is inspired by the Domain Name System. Our LISP-TREE proposal [4] is the basis for the DDT mapping system that is now implemented and used by LISP routers.

Simulation-based studies have shown that LISP can provide several important benefits compared to the classic Internet architecture [5]. Some companies have used LISP to support specific services. For example, Facebook has relied on LISP to support IPv6-based services [6]. However, until now the deployment use cases have not been completely convincing from a commercial viewpoint. A recent announcement could change the situation. In a whitepaper, Cisco describes how LISP can be combined with encryption techniques to support Virtual Private Network services. Given the importance of VPN services for enterprise networks, this could become a killer application for LISP. There are apparently already several networks using LISP to support VPN services. The future of LISP will be guaranteed once a second major router vendor decides to implement LISP.

[1]V. Fuller, D. Farinacci, D. Meyer, and D. Lewis. Lisp alternative topology (lisp+alt). Internet draft, draft-ietf-lisp-alt-10.txt, December 2011.
[2]Luigi Iannone and Olivier Bonaventure. On the cost of caching locator/ID mappings. In CoNEXT ‘07: Proceedings of the 2007 ACM CoNEXT conference. ACM, December 2007.
[3]Luigi Iannone, Damien Saucez, and Olivier Bonaventure. Implementing the Locator/ID Separation Protocol: Design and experience. Computer Networks: The International Journal of Computer and Telecommunications Networking, March 2011.
[4]Loránd Jakab, Albert Cabellos-Aparicio, Florin Coras, Damien Saucez, and Olivier Bonaventure. LISP-TREE: a DNS hierarchy to support the lisp mapping system. IEEE Journal on Selected Areas in Communications, October 2010.
[5]Bruno Quoitin, Luigi Iannone, Cédric de Launois, and Olivier Bonaventure. Evaluating the benefits of the locator/identifier separation. In MobiArch ‘07: Proceedings of 2nd ACM/IEEE international workshop on Mobility in the evolving internet architecture. ACM Request Permissions, August 2007.
[6](1, 2, 3) Damien Saucez, Luigi Iannone, Olivier Bonaventure, and Dino Farinacci. Designing a Deployable Internet: The Locator/Identifier Separation Protocol. IEEE Internet Computing Magazine, 16(6):14–21, 2012.
]]> Fri, 07 Dec 2012 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/04/tcp_congestion_control_schemes.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/04/tcp_congestion_control_schemes.html <![CDATA[TCP congestion control schemes]]> TCP congestion control schemes

Since the publication of two end-to-end congestion control schemes at SIGCOMM‘88 [3] [4], congestion control has been a very popular and important topic in the scientific community. The IETF tried to mandate a standard TCP congestion control scheme that would be used by all TCP implementations RFC 5681, but today’s TCP implementations contain different congestion control schemes. Linux supports several congestion control schemes that can be configured by the system administrator. A detailed analysis of these implementations was recently presented in [2]. Windows has opted for its own congestion control scheme, which is included in the Microsoft stack.
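For reference, the sketch below shows how the schemes available on a Linux machine can be listed and how one of them can be requested for a single socket; it assumes a Linux host and a Python version that exposes the TCP_CONGESTION socket option.

# List the congestion control schemes offered by a Linux kernel and request
# one for a single TCP socket (Linux-only; needs the TCP_CONGESTION option).
import socket

with open("/proc/sys/net/ipv4/tcp_available_congestion_control") as f:
    print("available:", f.read().strip())
with open("/proc/sys/net/ipv4/tcp_congestion_control") as f:
    print("system default:", f.read().strip())

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")
name = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print("this socket uses:", name.split(b"\0")[0].decode())
s.close()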

Given the importance of the congestion control scheme from a performance viewpoint, it is useful to have a detailed overview of the different congestion control schemes that have been proposed and evaluated. The best survey paper on TCP congestion control is probably the one written by Alexander Afanasyev and his colleagues on Host-to-Host Congestion Control for TCP [1], which appeared in IEEE Communications Surveys & Tutorials. This paper provides a detailed overview of the different TCP congestion control schemes and classifies them.

Last week, I received an alert from Google Scholar indicating that a new survey on TCP congestion control appeared in the Journal of Network and Computer Applications. This paper tries to provide a classification of the different TCP congestion control schemes. Unfortunately, the paper is not convincing at all and, furthermore, it reuses two of the figures published in [1] without citing this previously published survey. This is a form of plagiarism that should have been detected by the editors of the Journal of Network and Computer Applications.

[1](1, 2) Alexander Afanasyev, Neil Tilley, Peter Reiher, and Leonard Kleinrock. Host-to-Host Congestion Control for TCP. IEEE Communications Surveys & Tutorials, 12(3):304–342, 2010.
[2]C Callegari, S Giordano, M Pagano, and T Pepe. Behavior analysis of TCP Linux variants. Computer Networks, 56(1):462–476, January 2012.
[3]V. Jacobson. Congestion avoidance and control. In ACM SIGCOMM Computer Communication Review, volume 18, 314–329. ACM, 1988.
[4]KK Ramakrishnan and R. Jain. A binary feedback scheme for congestion avoidance in computer networks with a connectionless network layer. In ACM SIGCOMM Computer Communication Review, volume 18, 303–313. ACM, 1988.
]]> Tue, 04 Dec 2012 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/25/towards_faster_web_downloads_and_some_interesting_data.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/25/towards_faster_web_downloads_and_some_interesting_data.html <![CDATA[Towards faster web downloads and some interesting data]]> Towards faster web downloads and some interesting data

Decreasing the time required to download web pages is an obsession for a large number of content providers. They rely on various tricks to speed up the download of web pages. Many of these tricks are widely known, although they are not implemented on all web servers. Some depend on the content itself. A smaller web page will always load faster than a large web page. This is the reason why major content providers optimise their HTML content to remove unnecessary tags or compress their JavaScript. They also heavily rely on cacheable information such as images, CSS, JavaScript, ... Some also use gzip-based compression to dynamically compress the data that needs to be transmitted over the wire. This is particularly useful when web pages are delivered over low-bandwidth links such as those to mobile phones.

Two years ago, at the beginning of their work on SPDY, Google published interesting data about the size of web-accessible content on https://developers.google.com/speed/articles/web-metrics

The analysis was based on the web pages collected by googlebot. It thus reflects a large fraction of the public web. Some of the key findings of this analysis include :

  • the average web page results in the transmission of 320 KBytes
  • there are on average more than 40 GET requests per web page and a web page is retrieved by contacting 7 different hostnames. The average page contains 29 images, 7 scripts and 3 CSS files
  • on average, each HTTP GET results in the retrieval of only 7 KBytes of data

The public web is thus widely fragmented and this fragmentation has an impact on its performance.

Last year, Yahoo researchers presented an interesting analysis of the work they did to optimise the performance of the web pages served by Yahoo [1]. This analysis focuses on the Yahoo web pages but still provides some very interesting insights on the performance of the web. Some of the key findings of this analysis include :

  • around 90% of the objects served by yahoo are smaller than 25 Kbytes

  • despite the utilisation of HTTP/1.1, the average number of requests per TCP connection is still only 2.24 and 90% of the TCP connections do not carry more than 4 GETs

  • web page downloads are heavily affected by packet losses, and losses do occur: 30% of the TCP connections are slowed by retransmissions

    Figure: Packet retransmission rate observed on the Yahoo! CDN (source [1])

  • increasing the initial TCP window size as recently proposed and implemented on Linux reduces the page download time, even when packet losses occur

  • increasing the initial TCP window size may cause some unfairness problems given the short duration of TCP connections

There is still a lot of work to be done to reduce page download times and many factors influence the perceived performance of the web.

[1](1, 2) Mohammad Al-Fares, Khaled Elmeleegy, Benjamin Reed, and Igor Gashinsky. Overclocking the Yahoo! CDN for faster web page loads. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, IMC ‘11, 569–584. New York, NY, USA, 2011. ACM.
]]> Tue, 25 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/14/understanding_the_dropbox_protocol_and_quantifying_the_usage_of_cloud_storage_services.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/14/understanding_the_dropbox_protocol_and_quantifying_the_usage_of_cloud_storage_services.html <![CDATA[Understanding the dropbox protocol and quantifying the usage of cloud storage services]]> Understanding the dropbox protocol and quantifying the usage of cloud storage services

Measurement studies show that video is consuming more and more bandwidth in the global Internet. Another type of service whose usage is growing is cloud-based storage, such as Dropbox, iCloud, SkyDrive or GDrive. These cloud storage services use proprietary protocols and allow users to exchange files and share folders efficiently. Dropbox is probably one of the most widely known storage services. It heavily relies on Amazon EC2 and AWS services. The Dropbox application is easy to use, but little is known about the operation of the underlying protocol. In a paper that will be presented this fall at IMC‘12, Idilio Drago and his colleagues provide a very detailed analysis of the Dropbox protocol and its usage in home and campus networks [1].

Several of the findings of this paper are very interesting. First, despite its popularity, Dropbox is still provided from servers located mainly in the US. This implies a long round-trip-time for the large population of Dropbox users who do not reside in North America. Since Dropbox uses the Amazon infrastructure, it is surprising that they do not seem to use Amazon datacenters outside the US. All the files that you store in your Dropbox folder are likely stored on US servers. Another surprising result is that Dropbox divides the files to be transferred into chunks of 4 MBytes and each chunk needs to be acknowledged by the application. Coupled with the long round-trip-time, this results in a surprisingly low transfer rate of about 500 Kbps. This performance issue seems to have been solved recently by Dropbox with the ability to send chunks in batches.

[1] also provides an analysis of the operation of the main Dropbox protocol. Dropbox mainly uses servers hosted in Amazon datacenters for the various types of operations. Although Dropbox uses TLS to encrypt the data, the authors used SSLBump running on Squid to perform a man-in-the-middle attack between a Dropbox client and the official servers.

Figure: An example of a storage operation with Dropbox (source [1])

Another interesting element provided in [1] is an analysis of Dropbox traffic in campus and home networks. This analysis, performed using tstat, shows that cloud storage services already contribute a large volume of data to the global Internet. The analysis also considers the percentage of clients that are uploading, downloading and silent. Users who have installed Dropbox but are not using it should be aware that the Dropbox client always opens connections to Dropbox servers, even if no data needs to be exchanged. The entire dataset collected for the paper is available from http://www.simpleweb.org/wiki/Dropbox_Traces

[1](1, 2, 3, 4) Idilio Drago, Marco Mellia, Maurizio M. Munafò, Anna Sperotto, Ramin Sadre, and Aiko Pras. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement, IMC‘12. 2012.
]]> Fri, 14 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/06/an_interesting_retrospective_of_a_decade_of_transport_protocol_research_with_sctp.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/06/an_interesting_retrospective_of_a_decade_of_transport_protocol_research_with_sctp.html <![CDATA[An interesting retrospective of a decade of transport protocol research with SCTP]]> An interesting retrospective of a decade of transport protocol research with SCTP

TCP is still the dominant transport protocol on the Internet. However, since the late 1990s, researchers and IETFers have worked on developing and improving another reliable transport protocol for the Internet : SCTP. The Stream Control Transmission Protocol (SCTP) RFC 4960 provides several additional features compared to the venerable TCP, including :

  • multihoming and failover support
  • partially reliable data transfer
  • preservation of message boundaries
  • support for multiple concurrent streams

The main motivation for the design of SCTP has been the support of IP Telephony signaling applications that require reliable delivery with small delays and fast failover in case of failures. SCTP has since evolved to support other types of applications.
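As a very small taste of the API, the sketch below opens an SCTP association in the one-to-one socket style on Linux; it assumes kernel SCTP support and a reachable server, and features such as multiple streams or partial reliability need additional socket options not shown here.

# Open an SCTP association with the one-to-one socket API (Linux only; the
# kernel SCTP module must be available). Unlike TCP, each send() below is
# delivered to the peer as a separate message with preserved boundaries.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_SCTP)
try:
    sock.connect(("192.0.2.10", 5000))   # illustrative server address
    sock.send(b"first message")
    sock.send(b"second message")
finally:
    sock.close()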

Since the publication of the first RFC on SCTP, various research groups have proposed extensions and improvements to SCTP. SCTP’s flexibility and extensibility have enabled researchers to explore various new techniques and solutions to improve transport protocols. A recently published survey paper [1] analyses almost a decade of transport protocol research by looking at over 430 SCTP-related publications. Like most survey papers, it provides an introduction to the paper’s topic, in this case SCTP, and briefly compares some of the studied papers.

[1] goes beyond a simple summary of research papers and the approach chosen by the authors could probably be applied in other fields. In order to understand the evolution of the main topics of SCTP research, the authors of [1] classified each paper along four different dimensions :

  • protocol features (handover, load sharing, congestion, partial reliability, ...)
  • application (signaling, multimedia, bulk transfer, web, ...)
  • network environment (wireless, satellite, best effort, ...)
  • study approach (analytical, simulation, emulation, live measurements, ...)

By using these four dimensions, [1] provides a quick snapshot of past SCTP research and its evolution over the years. Not surprisingly, simulation is more popular than live measurements, and bulk data transfer is more often explored than signaling. Furthermore, [1] provides interesting visualizations, such as the figure below on the chosen study approach.

Figure: The study approach (fourth dimension) used for SCTP papers during the last decade [1]

[1] is a very interesting starting point for any researcher interested in transport protocol research. The taxonomy and the presentation could also inspire researchers in other fields. A web interface for the taxonomy is also available. Unfortunately, it does not seem to have been maintained after the finalization of the paper.

[1](1, 2, 3, 4, 5, 6, 7) Łukasz Budzisz, Johan Garcia, Anna Brunstrom, and Ramon Ferrús. A taxonomy and survey of SCTP research. ACM Comput. Surv., 44(4):18:1–18:36, September 2012. http://doi.acm.org/10.1145/2333112.2333113.
]]> Thu, 06 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/05/less_than_best_effort___congestion_control_schemes_do_not_always_try_to_optimize_goodput.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/05/less_than_best_effort___congestion_control_schemes_do_not_always_try_to_optimize_goodput.html <![CDATA[Less than best effort : congestion control schemes do not always try to optimize goodput]]> Less than best effort : congestion control schemes do not always try to optimize goodput

The TCP congestion control scheme aims at providing a fair distribution of the network resources among the competing hosts while still achieving the highest possible throughput for all sources. In some sense, TCP’s congestion control scheme considers that all sources are equal and should obtain the same fraction of the available resources. In practice, this is not completely true since it is known that TCP is unfair and favors sources with a low round-trip-time compared to sources with a high round-trip-time. Furthermore, TCP’s congestion control scheme operates by filling the available buffers in the routers. This mode of operation results in an increase in the end-to-end delay perceived by the applications. This increased delay can be penalizing for interactive applications.

However, TCP’s congestion control RFC 5681 is only one possible design point in the space of congestion control schemes. Some congestion control schemes start from a different assumption than TCP. The Low Extra Delay Background Transport (LEDBAT) IETF working group is exploring such an alternate design point. Instead of assuming that all sources are equal, LEDBAT assumes that some sources are background sources that should benefit from the available network resources without creating unnecessary delays that would affect the other sources. The LEDBAT congestion control scheme is an example of a delay-based congestion controller. LEDBAT operates by estimating the one-way delay between the source and the destination. This is done by measuring the minimum one-way delay observed and using it as a reference to estimate the current queueing delay. The figure below (from [1]) clearly illustrates the estimation of the minimum one-way delay and then the estimation of the current one-way delay. LEDBAT uses a congestion window and adjusts it every time the measured delay changes. If the measured delay increases, this indicates that the network is becoming more congested and thus background sources need to back off and reduce their congestion window.
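The sketch below is a simplified version of this controller, loosely following the formulas later standardised in RFC 6817; the constants are illustrative and many details (base-delay history, slow start, filtering of delay samples) are omitted.

# Simplified LEDBAT-style window update: estimate the queueing delay as the
# difference between the current one-way delay and the smallest delay seen so
# far, then nudge the window towards a small delay target.
TARGET = 0.100   # queueing-delay target in seconds (illustrative)
GAIN = 1.0
MSS = 1460.0

class LedbatController:
    def __init__(self):
        self.base_delay = float("inf")   # smallest one-way delay observed
        self.cwnd = 2 * MSS              # congestion window in bytes

    def on_ack(self, one_way_delay, bytes_acked):
        self.base_delay = min(self.base_delay, one_way_delay)
        queueing_delay = one_way_delay - self.base_delay
        off_target = (TARGET - queueing_delay) / TARGET
        # Grow while the estimated queue stays below TARGET, shrink above it.
        self.cwnd += GAIN * off_target * bytes_acked * MSS / self.cwnd
        self.cwnd = max(self.cwnd, MSS)
        return self.cwnd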

Figure: Estimation of the queueing delay in LEDBAT [1]

LEDBAT is only one example of the congestion control schemes that can be used by less-than-best-effort applications. A recent survey [1] RFC 6297 summarizes the main features and properties of many of these congestion control schemes. Furthermore, [1] provides pointers to implementations of such controllers. Such congestion control schemes have notably been implemented inside BitTorrent clients. An analysis of the performance of such p2p clients may be found in [2].

[1](1, 2, 3, 4) D. Ros and M. Welzl. Less-than-best-effort service: a survey of end-to-end approaches. Communications Surveys Tutorials, IEEE, PP(99):1 –11, 2012. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6226797.
[2]D. Rossi, C. Testa, and S. Valenti. Yes, we ledbat: playing with the new bittorrent congestion control algorithm. In Passive and Active Measurement (PAM‘10). Zurich, Switzerland, April 2012. http://www.pam2010.ethz.ch/papers/full-length/4.pdf.
]]> Wed, 05 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/31/getting_a_better_understanding_of_the_internet_via_the_lenses_of_ripe_atlas.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/31/getting_a_better_understanding_of_the_internet_via_the_lenses_of_ripe_atlas.html <![CDATA[Getting a better understanding of the Internet via the lenses of RIPE Atlas]]> Getting a better understanding of the Internet via the lenses of RIPE Atlas

The Internet is probably the most complex system to have been built by humans. All the devices and software that compose the Internet interact in various ways. Most of the time, these interactions are positive and help improve network performance. However, some interactions can cause loss of connectivity, decreased performance or other types of problems.

Understanding all the factors that affect the performance of the Internet is complex. One way to approach the problem is to collect measurements about various metrics such as delay, bandwidth or paths through the network. Several research projects and companies are currently collecting large amounts of data about the Internet.

A very interesting project is RIPE Atlas. RIPE is a non-profit organisation, mainly composed of network operators, whose objective is to allocate IP addresses in Europe. In addition to this address allocation activity, they also carry out various projects that are useful for their members. Atlas is one of their recent projects. To obtain a better understanding of the performance and the connectivity of Internet nodes, RIPE engineers have developed a very small network probe that contains an embedded operating system, has an Ethernet plug and is powered via USB. A photo of the Atlas probe is available here.

This embedded system has low power and low performance, but it can be deployed at a large scale. As of this writing, more than 1800 probes are connected to the Internet and new ones are added on a daily basis. This large number of nodes places RIPE in a very good position to collect valuable data about the performance of the network since the probes can run various types of measurements, including ping and traceroute. As of this writing, Atlas is mainly used to check network connectivity, and Atlas hosts can request their own measurements. In the future, it can be expected that Atlas hosts will be able to program various forms of measurements; RIPE has developed a credit system that allows hosts to obtain credits based on the number of Atlas probes that they host.

Atlas already covers a large fraction of the Internet. You can check on https://atlas.ripe.net/ which probes have been activated near your location. If you live in an area without Atlas probes and have permanent Internet connectivity, you can apply on https://atlas.ripe.net/pre-register/

]]>
Fri, 31 Aug 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/30/dns_injection_can_pollute_the_entire_internet.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/30/dns_injection_can_pollute_the_entire_internet.html <![CDATA[DNS injection can pollute the entire Internet]]> DNS injection can pollute the entire Internet

The Domain Name System is one of the key applications on the Internet since it enables the translation of domain names into IP addresses. There are many usages of the DNS and unfortunately some abuses as well. Since the DNS maps domain names onto IP addresses, a simple attack on the DNS consists in providing incorrect answers to DNS queries. This can be performed by attackers wishing to launch a man-in-the-middle attack, but also by some ISPs and governments to block some websites. Various countries have laws that force ISPs to block specific websites, for various purposes. In Belgium, this technique has been used several times to block a small number of websites, see e.g. http://blog.rootshell.be/2011/10/04/the-great-firewall-of-belgium-is-back/

Some countries have more ambitious goals than Belgium and try to block a large number of websites. China is a well-known example with the Great Firewall of China. One of the techniques used in China is DNS injection. In a few words, a DNS injector is a device that is strategically placed inside the network to capture DNS requests. Every time the injector sees a DNS request that matches a blocked domain, it sends a fake DNS reply containing invalid information. A recent article published in SIGCOMM CCR analyses the impact of such injectors [1]. Surprisingly, DNS injectors can lead to pollution of DNS resolvers outside of the country where the injection takes place. This is partially because the DNS is a hierarchy and when a resolver sends a DNS request, it usually needs to query top-level domain name servers. When the path to such a server passes through a DNS injector, even if the actual data traffic would not pass through the DNS injector later, the injector will inject a fake reply and the website will not be reachable by any client of this resolver during the lifetime of the cached information. The analysis presented in the paper shows that this DNS injection technique can pollute a large number of resolvers abroad. The article reports that domains belonging to .de are affected by the Great Firewall of China.
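A simple way to observe this kind of pollution is to ask several resolvers the same question and compare their answers; the sketch below does this with the dnspython library (version 2 or later), and the resolver addresses and the queried name are purely illustrative.

# Compare the answers that different resolvers return for the same name; a
# resolver whose queries cross a DNS injector may return a fake address.
import dns.resolver

def answers(name, resolver_ip, rrtype="A"):
    res = dns.resolver.Resolver(configure=False)
    res.nameservers = [resolver_ip]
    try:
        return sorted(rr.to_text() for rr in res.resolve(name, rrtype))
    except Exception as exc:
        return ["error: %s" % exc]

for ip in ("8.8.8.8", "9.9.9.9"):          # example open resolvers
    print(ip, answers("www.example.org", ip))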

Figure: DNS injection (source: [1])

The article on The Collateral Damage of Internet Censorship by DNS Injection should become a must-read for ISP operators who are forced by their governments to deploy DNS injectors. In parallel, this should also be a strong motivation to deploy DNSSEC, which will enable resolvers to detect and ignore these DNS injections.

[1](1, 2) Anonymous. The collateral damage of internet censorship by dns injection. SIGCOMM Comput. Commun. Rev., 42(3):21–27, 2012. http://doi.acm.org/10.1145/2317307.2317311.
]]> Thu, 30 Aug 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/29/interesting_networking_magazines.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/29/interesting_networking_magazines.html <![CDATA[Interesting networking magazines]]> Interesting networking magazines

During their studies, students learn the basics of networking. After one or maybe two networking courses, they know a very small subset of the networking field. This subset should allow them to continue learning new protocols, architectures and algorithms. However, finding information about these new techniques can be difficult. Here is a short list of magazines and journals that regularly publish interesting information about the evolution of the networking field.

A first source of information is magazines. There are various trade press magazines that usually present new products and provide information on new commercial deployments. A networking student should not limit his/her networking knowledge to the information published in the trade press. Here is a subset of the magazines that I often try to read :

  • the Internet Protocol Journal published by Cisco is a very useful and very interesting source of tutorial papers. Each issue usually contains articles about new Internet protocols written by an expert. The papers are easy to read and do not contain marketing information.
  • the IETF Journal published by the Internet Society publishes short papers about the evolution of the protocols standardized by the IETF

These two magazines are freely available to anyone.

Scientific societies also publish magazines with networking content. These magazines are not available at the newsstand, but good university libraries should store them.

The IEEE publishes several magazines with some networking content :

  • IEEE Network Magazine publishes tutorial papers in the broad networking field.
  • IEEE Communications Magazine publishes tutorial articles on various topics of interest to telecommunication engineers, sometimes with networking sections or articles
  • IEEE Internet Computing Magazine publishes articles mainly on Internet-based applications with sometimes articles on lower layers
  • IEEE Security and Privacy publishes articles on new advances in security and privacy in general. Some of the published articles are related to networking problems.

The ACM publishes various magazines and journals that cover most areas of Computer Science.

  • Communications of the ACM is ACM’s main magazine. This magazine is an essential read for any computer scientist willing to track the evolution of his/her scientific field. It sometimes publishes networking articles. ACM Queue is a special section of the magazine that is devoted to more practically oriented articles. It has published very interesting networking articles and furthermore all content on ACM Queue is publicly available.

If you read other networking magazines that are of interest to networking students, let me know. I will cover networking conferences and networking journals in subsequent posts.

]]>
Wed, 29 Aug 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/27/multipath_tcp___beyond_grandmother__s_tcp.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/27/multipath_tcp___beyond_grandmother__s_tcp.html <![CDATA[Multipath TCP : beyond grandmother's TCP]]> Multipath TCP : beyond grandmother’s TCP

TCP, the Transmission Control Protocol, is the default transport protocol in the TCP/IP protocol suite. TCP is essentially based on research carried out during the 1970s that resulted in the publication of RFC 793. Since then, TCP has slowly evolved. A major step in TCP’s history is the congestion control scheme proposed and implemented by Van Jacobson [2]. RFC 1323, proposed a few years later, extended TCP to support larger windows, a key extension for today’s high-speed networks. Various changes have been included in TCP over the years and RFC 4614 provides pointers to many of the TCP standardisation documents.

In the late 1990s and early 2000s, the IETF developed the Stream Control Transmission Protocol (SCTP) RFC 4960. Compared to TCP, SCTP is a more advanced transport protocol that provides a richer API to the application. SCTP was also designed with multihoming in mind and can support hosts with multiple interfaces, something that TCP cannot easily do. Unfortunately, as of this writing, SCTP has not yet been widely deployed despite being implemented in major operating systems. The key difficulties in deploying SCTP appear to be :

  • the lack of SCTP support in middleboxes such as NAT, firewalls, ...
  • the need to update applications to support SCTP, although this is changing with RFC 6458 (a minimal sketch of such a change follows this list)
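
To give an idea of the second point, here is a minimal sketch (not taken from any of the cited documents) of what the application-level change can look like on a Linux host whose kernel provides SCTP support. In the simplest one-to-one style defined in RFC 6458, switching a stream-oriented client from TCP to SCTP is essentially a matter of requesting IPPROTO_SCTP when the socket is created; the destination below is a placeholder.

    import socket

    # Plain TCP client socket (the traditional code)
    tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

    # One-to-one style SCTP socket (RFC 6458): same stream-oriented API,
    # but the kernel must provide SCTP support (e.g. the Linux sctp module),
    # otherwise this call fails.
    sctp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_SCTP)

    # Both sockets are then used in the same way; the address is a placeholder.
    # tcp_sock.connect(("server.example.net", 8080))
    # sctp_sock.connect(("server.example.net", 8080))

Of course, exploiting the richer SCTP features such as multi-streaming or multihoming requires additional socket options and calls beyond this one-line change.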

SCTP seems to be stuck in a classical chicken and egg problem. As there are not enough SCTP applications, middlebox vendors do not support it, and application developers do not use SCTP since middleboxes do not support it. Multipath TCP is a major extension to TCP whose specification is currently being finalised by the MPTCP working group of the IETF. Multipath TCP allows a TCP connection to be spread over several interfaces during the lifetime of the connection. Multipath TCP has several possible use cases :

  • datacenters, where Multipath TCP allows the load to be better spread among all available paths [4]
  • smartphones, where Multipath TCP allows WiFi and 3G to be used at the same time [3]

The design of Multipath TCP [1] has been more complicated than expected due to the difficulty of supporting various types of middleboxes [5], but the protocol is now ready and you can even try our implementation in the Linux kernel from http://www.multipath-tcp.org
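
One attractive property of Multipath TCP is that, unlike SCTP, it does not require any change to the applications. With the multipath-tcp.org kernel, an ordinary TCP socket can transparently become a Multipath TCP connection (depending on how the kernel is configured) when the remote peer also supports the extension, and falls back to regular TCP otherwise. The sketch below is therefore nothing more than plain TCP client code; the destination is a placeholder.

    import socket

    # Ordinary TCP client code: nothing here refers to Multipath TCP.
    # On a host running an MPTCP-capable kernel (e.g. the multipath-tcp.org
    # kernel), this connection can transparently use several interfaces,
    # provided the remote peer also supports Multipath TCP.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("multipath-tcp.org", 80))  # placeholder destination
    sock.sendall(b"HEAD / HTTP/1.0\r\nHost: multipath-tcp.org\r\n\r\n")
    print(sock.recv(4096).decode(errors="replace"))
    sock.close()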

[1] Alan Ford, Costin Raiciu, Mark Handley, and Olivier Bonaventure. TCP Extensions for Multipath Operation with Multiple Addresses. Internet draft, draft-ietf-mptcp-multiaddressed-07, March 2012.
[2] V. Jacobson. Congestion Avoidance and Control. In Proceedings of the Symposium on Communications Architectures and Protocols, SIGCOMM ’88, pages 314–329, New York, NY, USA, 1988. ACM.
[3] Christoph Paasch, Gregory Detal, Fabien Duchene, Costin Raiciu, and Olivier Bonaventure. Exploring Mobile/WiFi Handover with Multipath TCP. In ACM SIGCOMM Workshop on Cellular Networks (CellNet’12), 2012.
[4] Costin Raiciu, Sebastien Barre, Christopher Pluntke, Adam Greenhalgh, Damon Wischik, and Mark Handley. Improving Datacenter Performance and Robustness with Multipath TCP. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM ’11, pages 266–277, New York, NY, USA, 2011. ACM.
[5] Costin Raiciu, Christoph Paasch, Sebastien Barre, Alan Ford, Michio Honda, Fabien Duchene, Olivier Bonaventure, and Mark Handley. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In USENIX Symposium on Networked Systems Design and Implementation (NSDI’12), San Jose, CA, 2012.
]]> Mon, 27 Aug 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/21/anatomy.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/21/anatomy.html <![CDATA[Anatomy of a Large European IXP]]> Anatomy of a Large European IXP

The Internet is still evolving and measurements allow us to better understand its evolution. In [3], Craig Labovitz and his colleagues used extensive measurements to show the growing importance of large content providers such as Google and Yahoo and of content distribution networks such as Akamai. This paper forced researchers to reconsider their assumptions about the organisation of the Internet as a hierarchy of fully meshed Tier-1 ISPs serving Tier-2 ISPs that in turn serve Tier-3 ISPs and content providers. [3] showed that many of the traffic sources are now directly connected to the last-mile ISPs.

In a recent paper [1] presented during SIGCOMM 2012, Bernhard Ager and his colleagues used statistics collected at one of the largest Internet eXchange Points (IXP) in Europe. IXPs are locations where various Internet providers place routers to exchange traffic via a switched network. While the first IXPs were simple Ethernet switches in the back of a room, current IXPs are huge. For example, AMS-IX gathers more than 500 different ISPs on more than 800 different ports. As of this writing, its peak traffic has been larger than 1.5 Terabits per second ! IXPs play a crucial role in the distribution of Internet traffic, in particular in Europe where there are many large IXPs. [1] highlights many previously unknown aspects of these IXPs, such as the large number of peering links established at the IXP, the application mix and the traffic matrices. [1] will become a classic paper for those who want to understand the organisation of the Internet.

[1] B. Ager, N. Chatzis, A. Feldmann, N. Sarrar, S. Uhlig, and W. Willinger. Anatomy of a Large European IXP. In SIGCOMM 2012, Helsinki, Finland, August 2012.
[2] H. Babiker, I. Nikolova, and K. Chittimaneni. Deploying IPv6 in the Google Enterprise Network: Lessons Learned. In USENIX LISA 2011, 2011.
[3] Craig Labovitz, Scott Iekel-Johnson, Danny McPherson, Jon Oberheide, and Farnam Jahanian. Internet Inter-Domain Traffic. SIGCOMM Comput. Commun. Rev., 41(4), August 2010.
[4] T. Tsou, D. Lopez, J. Brzozowski, C. Popoviciu, C. Perkins, and D. Cheng. Exploring IPv6 Deployment in the Enterprise: Experiences of the IT Department of Futurewei Technologies. IETF Journal, June 2012.
]]> Tue, 21 Aug 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/23/ipv6_deployment.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/23/ipv6_deployment.html <![CDATA[Deploying IPv6 in enterprise networks]]> Deploying IPv6 in enterprise networks

The Internet is slowly moving towards IPv6. IPv6 traffic is now growing and more and more enterprise networks are migrating to IPv6, but migrating all enterprise networks will be slow. One of the main difficulties faced by network administrators when migrating to IPv6 is that it is not sufficient to migrate the hosts (most operating systems already support IPv6) and configure the routers as dual-stack. All devices that process packets need to be verified or updated to support IPv6.
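
On the host side, one way to keep the applications out of the critical path of such a migration is to write them in an address-family-agnostic manner. The sketch below is a generic illustration (not taken from the reports cited in this post): it uses getaddrinfo so that the same client code works over IPv6 or IPv4, whichever the resolver and the network provide; the destination name and port are placeholders.

    import socket

    def connect_any(host, port):
        """Try every address returned by the resolver (AAAA and A records)
        and return the first socket that connects, regardless of family."""
        last_error = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
            try:
                s = socket.socket(family, socktype, proto)
                s.connect(sockaddr)
                return s
            except OSError as exc:
                last_error = exc
        raise last_error or OSError("no address found for %s" % host)

    # Placeholder destination: uses IPv6 when available, IPv4 otherwise.
    conn = connect_any("www.example.org", 80)
    print("connected via", conn.family)
    conn.close()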

Network administrators who perform IPv6 migrations sometimes document their findings in articles. These articles are interesting for other network administrators who might face similar problems.

In [2], Google engineers explain the issues that they faced when adding IPv6 support to the enterprise networks on Google premises. Surprisingly, one of the main bottlenecks in this migration was the support of IPv6 on the transparent proxies that they use to accelerate web access.

In [4], researchers and network administrators from FutureWei report their experience in adding IPv6 connectivity to their network. The report is brief on enabling IPv6 itself, but also discusses more recent solutions that are being developed within the IETF.

Another interesting viewpoint is the one discussed in RFC 6586. In this RFC, Jari Arkko and Ari Keranen report all the issues that they faced while running an IPv6-only network in their lab and using it on a daily basis to access an Internet that is still mainly IPv4.

[2] H. Babiker, I. Nikolova, and K. Chittimaneni. Deploying IPv6 in the Google Enterprise Network: Lessons Learned. In USENIX LISA 2011, 2011.
[4] T. Tsou, D. Lopez, J. Brzozowski, C. Popoviciu, C. Perkins, and D. Cheng. Exploring IPv6 Deployment in the Enterprise: Experiences of the IT Department of Futurewei Technologies. IETF Journal, June 2012.
]]> Mon, 23 Jul 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/19/middlexboxes.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/19/middlexboxes.html <![CDATA[Don't ignore the middleboxes]]> Don’t ignore the middleboxes

Traditional networks contain routers, switches, clients and servers. Most introductory networking textbooks focus on these devices and the protocols that they use. However, real networks can be much more complex than the typical academic networks that are considered in textbooks. During the last decade, enterprise networks have included more and more middleboxes. A middlebox can be roughly defined as a device that resides inside the network and is able both to forward packets (like a router or a switch) and to modify them. For this reason, middleboxes are often considered as layer-7 relays, but they are not officially part of the Internet architecture. These middleboxes are usually deployed by network operators to better control the traffic in their network or to improve its performance. There exist various types of middleboxes RFC 3234. The most common ones are :

  • Network Address Translators (NAT) that rewrite IP addresses and port numbers (see the sketch after this list)
  • Firewalls that control the incoming and outgoing packets
  • Network Intrusion Detection Systems that analyse packet payloads to detect possible attacks
  • Load balancers that distribute the load among several servers
  • WAN optimizers that compress packets before transmitting them over expensive low bandwidth links
  • Media gateways that are able to transcode voice and video formats
  • transparent proxy caches that speed up access to remote web servers by maintaining caches
  • ...
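
As a small illustration of the first item, the sketch below shows why NATs are visible to transport protocols: the address and port that an application observes locally are not necessarily those that the remote peer sees, because the NAT rewrites them on the way out. The destination is a placeholder; running this behind a NAT typically prints a private address.

    import socket

    # Connect to any public server (placeholder destination).
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.example.org", 80))

    # The address/port pair that the application sees locally...
    print("local view :", s.getsockname())
    # ...is rewritten by the NAT: behind a NAT, getsockname() returns a
    # private address (e.g. 192.168.x.y), while the server observes the
    # NAT's public address and possibly a different port. Discovering that
    # public mapping requires outside help, e.g. a STUN server.
    s.close()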

The list of middleboxes keeps growing and managing them in addition to the routers and the switches is becoming a concern for enterprise network operators. In a recent paper presented at USENIX NSDI12, Vyas Sekar and colleagues describe a survey that they performed in an anonymous enterprise network. This network contained about 900 routers and more than 600 middleboxes ! The table below summarises the appliances that they found.

Appliance type                        Number
Firewall                              166
Network Intrusion Detection System    127
Conferencing/Media gateway            110
Load balancers                        67
Proxy caches                          66
VPN devices                           45
WAN optimizers                        44
Voice gateways                        11
Routers                               about 900
]]>
Thu, 19 Jul 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/17/internet_topology_zoo.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/17/internet_topology_zoo.html <![CDATA[Internet Topology Zoo]]> Internet Topology Zoo

A recent article published on Slate provided nice artistic views of the layout of the optical fibers that are used for the Internet.

Researchers have spent a lot of time collecting data about ISP networks during the last decade. If you are looking for nice maps of real networks, I encourage you to have a look at the Internet Topology Zoo. This website, maintained by researchers from the University of Adelaide, contains maps of a few hundred networks, an excellent starting point if you would like to understand a bit more about how ISP networks are designed.
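
The Topology Zoo datasets are distributed in standard graph formats such as GraphML, so they are easy to explore programmatically. A minimal sketch, assuming you have downloaded one of the GraphML files (the filename below is hypothetical) and installed the networkx library:

    import networkx as nx

    # Hypothetical filename: any GraphML file downloaded from the Topology Zoo
    g = nx.read_graphml("SomeNetwork.graphml")

    print("nodes :", g.number_of_nodes())
    print("links :", g.number_of_edges())
    # Node attributes in these files typically include a human-readable label.
    for node, data in list(g.nodes(data=True))[:5]:
        print(node, data.get("label"))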

If you are more interested in the layout of cables, notably submarine cables, you can also check the geographical maps provided by TeleGeography.

]]>
Tue, 17 Jul 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/adaptive_queue_management.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/adaptive_queue_management.html <![CDATA[Controlling Queueing delays]]> Controlling Queueing delays

Routers use a buffer to store the packets that have arrived but have not yet been retransmitted on their output links. These buffers play an important role in combination with TCP’s congestion control scheme, since TCP uses packet losses to detect congestion. To manage their buffers, routers rely on a buffer acceptance algorithm. The simplest buffer acceptance algorithm is to discard packets as soon as the buffer is full. This algorithm can be easily implemented, but simulations and measurements have shown that it does not always provide good performance with TCP.

In the 1990s, various buffer acceptance algorithms were proposed to overcome this problem. Random Early Detection (RED) probabilistically drops packets when the average buffer occupancy becomes too high. RED has been implemented on routers and has been strongly recommended by the IETF in RFC 2309. However, as of this writing, RED is still not widely deployed. One of the reasons is that RED uses many parameters and is difficult to configure and tune correctly (see the references listed on http://www.icir.org/floyd/red.html).
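
To make the discussion a bit more concrete, here is a simplified sketch of RED’s drop decision. It omits several refinements of the published algorithm, such as the count-based adjustment of the drop probability and the handling of idle periods, and the parameter values are arbitrary examples.

    import random

    # Example parameters (arbitrary values, to be tuned per link)
    W      = 0.002   # weight of the exponential moving average
    MIN_TH = 5       # minimum threshold (packets)
    MAX_TH = 15      # maximum threshold (packets)
    MAX_P  = 0.1     # maximum drop probability

    avg = 0.0        # average queue occupancy

    def red_accept(current_queue_len):
        """Return True if the arriving packet is accepted, False if dropped."""
        global avg
        # Exponentially weighted moving average of the queue occupancy
        avg = (1 - W) * avg + W * current_queue_len
        if avg < MIN_TH:
            return True                  # low occupancy: always accept
        if avg >= MAX_TH:
            return False                 # high occupancy: always drop
        # In between: drop with a probability that grows linearly with avg
        p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() >= p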

In a recent paper published in ACM Queue, Kathleen Nichols and Van Jacobson propose a new Active Queue Management algorithm. The new algorithm measures the waiting time of each packet in the buffer, and its control law depends on the minimum queueing delay observed over a recent interval rather than on the buffer occupancy. An implementation for Linux-based routers seems to be in progress. Maybe it’s time to revisit buffer acceptance algorithms again...
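
The idea can be sketched as follows. This is a deliberately simplified illustration, not the algorithm published in the ACM Queue paper (which, in particular, paces its drops with a control law): timestamp packets when they are enqueued, measure their sojourn time when they are dequeued, and start dropping only when even the smallest sojourn time observed during a whole interval stays above a small target.

    import time
    from collections import deque

    TARGET   = 0.005   # 5 ms acceptable standing queueing delay (example value)
    INTERVAL = 0.100   # 100 ms observation window (example value)

    queue = deque()              # (enqueue_timestamp, packet) pairs
    first_above_target = None    # when the sojourn time first exceeded TARGET

    def enqueue(packet):
        queue.append((time.monotonic(), packet))

    def dequeue():
        """Return the next packet to transmit, dropping a packet when the
        queueing delay has stayed above TARGET for a whole INTERVAL."""
        global first_above_target
        while queue:
            enqueued_at, packet = queue.popleft()
            sojourn = time.monotonic() - enqueued_at
            if sojourn < TARGET or not queue:
                first_above_target = None    # delay is acceptable again
                return packet
            if first_above_target is None:
                first_above_target = time.monotonic()
            if time.monotonic() - first_above_target < INTERVAL:
                return packet                # above target, but not yet for long enough
            # The delay stayed above target for a full interval: drop this
            # packet and restart the observation. (The real algorithm keeps
            # dropping at a gradually increasing rate instead; this sketch
            # omits that control law.)
            first_above_target = None
        return None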

]]>
Mon, 16 Jul 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/unicode_is_growing.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/unicode_is_growing.html <![CDATA[Unicode is growing]]> Unicode is growing

The Internet was created using the 7-bit US-ASCII character set. Over the years, the internationalisation of the Internet forced protocol designers to reconsider the utilisation of this 7-bit character set. A first move was the utilisation of 8-bit character sets. Unicode then became the unifying standard that can encode all written languages. A recent article in IEEE Spectrum provides interesting data about the progression of Unicode on web servers. See http://spectrum.ieee.org/telecom/standards/will-unicode-soon-be-the-universal-code
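
A small illustration of why 7-bit US-ASCII had to be abandoned: it simply cannot represent most written languages, whereas UTF-8, the dominant Unicode encoding on the web, can, while remaining byte-compatible with ASCII for the characters that ASCII does cover.

    text = "café – 東京"

    # 7-bit US-ASCII only has 128 code points and cannot represent this string
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        print("ASCII cannot encode it:", exc)

    # UTF-8 can encode any Unicode code point...
    encoded = text.encode("utf-8")
    print(len(text), "characters become", len(encoded), "bytes in UTF-8")

    # ...while staying identical to ASCII for pure-ASCII text
    assert "cafe".encode("utf-8") == "cafe".encode("ascii")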

]]>
Mon, 16 Jul 2012 00:00:00 +0200