Olivier Bonaventure: Homepage and blog
http://perso.uclouvain.be/olivier.bonaventure/blog/html/

TLS or HTTPS everywhere is not necessarily the right answer (Fri, 08 May 2015)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2015/05/08/tls.html

Since Edward Snowden’s revelations about massive surveillance, we have observed a strong move towards increasing the use of encryption to protect the end-to-end traffic exchanged by Internet hosts. Various Internet stakeholders have made strong moves towards recommending strong encryption, e.g.:

  • The IETF has confirmed in RFC 7258 that pervasive monitoring is an attack and needs to be countered
  • The EFF has promoted the utilisation of HTTPS through the HTTPS-everywhere campaign and browser extension
  • The Let’s Encrypt campaign is preparing a new certification authority to ease the utilisation of TLS
  • Mozilla has announced plans to deprecate non-secure HTTP
  • Most large web companies have announced plans to encrypt traffic between their datacenters
  • ...

Pervasive monitoring is not desirable and researchers should aim at finding solutions, but encrypting everything is not necessarily the best solution. As an Internet user, I am also very concerned by the massive surveillance that is conducted by various commercial companies.

http://arstechnica.com/security/2013/11/encrypt-all-the-worlds-web-traffic-internet-architects-propose/

Segment Routing in the Linux kernel (Thu, 13 Nov 2014)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2014/11/13/segment_routing_in_the_linux_kernel.html

Segment Routing is a new packet forwarding technique which is being developed by the SPRING working group of the IETF. Until now, two packet forwarding techniques were supported by IETF protocols:

  • datagram mode with IPv4 and IPv6
  • label swapping with MPLS

Segment Routing is a modern realisation of source routing, which was supported by IPv4 in RFC 791 and initially by IPv6 in RFC 2460. Source routing enables a source to indicate, inside each packet that it sends, the list of intermediate nodes to be traversed to reach the final destination. Although rather old, this technique is not widely used today because it causes several security problems. For IPv6, various attacks against source routing were demonstrated in 2007. In the end, the IETF chose to deprecate source routing in IPv6 in RFC 5095.

However, source routing has several very useful applications inside a controlled network such as an enterprise or a single ISP network. For this reason, the IETF has revived source routing and considers two data planes:

  • IPv6
  • MPLS

In both cases, labels/addresses can be associated with routers and links and are advertised by the intradomain routing protocol. To steer packets along a chosen path, the source node simply adds to the packet an MPLS label stack or an IPv6 extension header that lists all the intermediate nodes/links. To understand the benefits of this approach, let us consider the simple network shown below.
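To make the idea concrete, here is a minimal Python sketch of how a segment list imposed by the source steers a packet hop by hop; the four-node topology and its routes are invented for the example, and real deployments encode the segment list as an MPLS label stack or an IPv6 extension header:

# Conceptual sketch of source routing with a segment list (made-up topology).
# Each node only needs to know how to reach the next segment in the list.

ROUTES = {
    # next-hop table of each node, for illustration only
    "S":  {"R1": "R1", "R3": "R1", "D": "R1"},
    "R1": {"R3": "R2", "D": "R2"},
    "R2": {"R3": "R3", "D": "R3"},
    "R3": {"D": "D"},
}

def forward(source, segments):
    """Follow the segment list from the source to the last segment."""
    path, current = [source], source
    for segment in segments:
        while current != segment:
            current = ROUTES[current][segment]   # forward towards the active segment
            path.append(current)
    return path

# The source imposes the path by listing intermediate nodes (segments).
print(forward("S", ["R3", "D"]))   # ['S', 'R1', 'R2', 'R3', 'D']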

The MPLS dataplane reuses the label

Evolution of link bandwidths (Mon, 15 Sep 2014)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2014/09/15/evolution_of_link_bandwidths.html

During my first lesson for the undergrad networking class, I wanted to provide the students with some historical background on the evolution of link bandwidths. Fortunately, Wikipedia provides a very interesting page that lists most of the standards for modems, optical fibers, ...

A first interesting plot is the evolution of the modems that allowed data to be transmitted over the traditional telephone network. The figure below, based on information extracted from http://en.m.wikipedia.org/wiki/List_of_device_bandwidths, shows the evolution of modem technology. The first method to transfer data was Morse code, which appeared in the mid-1800s. After that, it took more than a century to move to the Bell 101 modem, which was capable of transmitting data at 110 bits/sec. Slowly, 300 bps and later 1200 bps modems appeared. The late 1980s saw the arrival of faster modems with 9.6 kbps and later 28.8 and 56 kbps, which marked the highest bandwidth feasible on a traditional phone line. ISDN appeared in the late 1980s with a bandwidth of 64 kbps on digital lines, which was later doubled.
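Such a plot is easy to reproduce; in the Python sketch below, the bit rates are the ones mentioned above, while the years are approximate and only meant to show the trend:

# Rough sketch of the modem-bandwidth plot; the years are approximate.
import matplotlib.pyplot as plt

modems = {
    1958: 110,     # Bell 101
    1962: 300,     # 300 bps modems (approximate year)
    1980: 1200,    # 1200 bps modems (approximate year)
    1988: 9600,    # late 1980s, 9.6 kbps
    1994: 28800,   # 28.8 kbps
    1998: 56000,   # 56 kbps, the practical limit of analog phone lines
}

years, rates = zip(*sorted(modems.items()))
plt.semilogy(years, rates, marker="o")
plt.xlabel("Year")
plt.ylabel("Bit rate (bit/s, log scale)")
plt.title("Evolution of telephone-line modems")
plt.show()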

When the telephone network became the bottleneck, telecommunication manufacturers and network operators moved to various types of Digital Subscriber Line technologies, ADSL being the most widespread. From the early days at 1.5 Mbps downstream to the latest VDSL deployments, bandwidth has increased by almost two orders of magnitude. As of this writing, it seems that xDSL technology is reaching its limits and while bandwidth will continue to grow, the rate of improvement will not remain as high as in the past. In parallel, CATV operators have deployed various versions of the DOCSIS standards to provide data services in cable networks. The next step is probably to go to fiber-based solutions, but they cost more than an order of magnitude more than DSL services and can be difficult to deploy in rural areas.

The performance of wireless networks has also significantly improved. As an illustration, and again based on data from http://en.m.wikipedia.org/wiki/List_of_device_bandwidths here is the theoretical maximum bandwidth for the various WiFi standards. From 2 Mbps for 802.11 in 1997, bandwidth increased to 54 Mbps in 2003 for 802.11g and 600 Mbps for 802.11n in 2009.

The datasets used in this post are partial. Suggestions for additional datasets that could be used to provide a more detailed view of the evolution of bandwidth are more than welcome. For optical fiber, an interesting figure appeared in Nature, see http://www.nature.com/nphoton/journal/v7/n5/fig_tab/nphoton.2013.94_F1.html

Flipping an advanced networking course (Tue, 11 Feb 2014)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2014/02/11/flipping_an_advanced_networking_course.html

Before the beginning of the semester, Nick Feamster informed me that he had decided to flip his advanced networking course. Various teachers have opted for flipped classrooms to increase the interaction with students. Instead of using class time to present theory, the teacher focuses his/her attention during the class on solving problems with the students. Various organisations of a flipped classroom have been tested. Often, the teacher posts short videos that explain the basic principles before the class and the students have to watch these videos before attending the class. This is partially the approach adopted by Nick Feamster for his class.

By bringing more interaction into the classroom, the flipped approach is often considered to be more interesting for the teacher as well as for the students. Since my advanced networking class gathers only a few tens of students, compared to the 100+ and 300+ students of the other courses that I teach, I also decided to flip one course this year.

The advanced networking course is a follow-up to the basic networking course. I cover several advanced topics and aim at explaining to the students the operation of large Internet Service Provider networks. The main topics covered are:

  • Interdomain routing with BGP (route reflectors, traffic engineering, ...)
  • Traffic control and Quality of Service (from basic mechanisms - shaping, policing, scheduling, buffer acceptance - to services - integrated or differentiated services)
  • IP Multicast and Multicast routing protocols
  • Multiprotocol Label Switching
  • Virtual Private Networks
  • Middleboxes

The course is complemented by projects during which the students configure and test realistic networks built from Linux-based routers.

During the last decade, I’ve taught this course by presenting slides to the students and discussing the theoretical material. I could have used some of these slides to record videos explaining the basic principles, but I’m still not convinced by the benefits of using online video as a learning vehicle. Video is nice for demonstrations and short introductory material, but students need written material to understand the details. For this reason, I’ve decided to opt for a seminar-type approach where the students read one or two articles every week to understand the basic principles. Then, the class focuses on discussing real cases or exercises.

Many courses are organized as seminars during which the students read recent articles and discuss them. Often, these are advanced courses and the graduate students read and comment on recent scientific articles. This approach was not applicable in my case given the maturity of the students who follow the advanced networking course. Instead of using purely scientific articles, I’ve opted for tutorial articles that appear in magazines such as IEEE Communications Magazine or the Internet Protocol Journal. These articles are easier for the students to read and often provide good tutorial content with references that the students can exploit if they need additional information.

The course started a few weeks ago and the interaction with the students has been really nice. I’ll regularly post updates on the articles that I’ve used, the exercises that have been developed and the students’ reactions. Comments are, of course, welcome.

Happy eyeballs makes me unhappy... (Tue, 03 Dec 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/03/happy.html

Happy eyeballs, defined in RFC 6555, is a technique that enables dual-stack hosts to automatically select between IPv6 and IPv4 based on their respective performance. When a dual-stack host tries to contact a webserver that is reachable over both IPv6 and IPv4, it proceeds as follows (a minimal sketch in Python follows the list):

  • first tries to establish a TCP connection towards the IPv6 or IPv4 address and starts a short timeout, say 300 msec
  • if the connection is established over the chosen address family, it continues
  • if the timer expires before the establishment of the connection, a second connection is tried with the other address family
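As a rough illustration, here is a minimal Python sketch of this behaviour; it tries IPv6 first with a short timeout and then falls back to IPv4, whereas real RFC 6555 implementations keep the first attempt running while starting the second one. The host name and timeout are placeholders.

# Minimal, sequential approximation of Happy Eyeballs (RFC 6555): try IPv6 first
# with a short timeout, then fall back to IPv4.
import socket

def connect_happy_eyeballs(host, port, timeout=0.3):
    last_error = None
    for family in (socket.AF_INET6, socket.AF_INET):
        try:
            addr = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)[0][4]
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.settimeout(timeout)          # the "say 300 msec" timer
            sock.connect(addr)
            sock.settimeout(None)
            return sock                       # connection established over this family
        except OSError as exc:
            last_error = exc                  # timer expired or family unavailable
    raise last_error

# sock = connect_happy_eyeballs("www.example.org", 80)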

Happy eyeballs works well when one of the two address families provides bad performance or is broken. In this case, a host using happy eyeballs will automatically avoid the broken address family. However, when both IPv6 and IPv4 work correctly, happy eyeballs may cause frequent switches between the two address families.

As an example, here is a summary of a packet trace that I collected when contacting a dual-stack web server from my laptop using the latest version of MacOS.

First connection

09:40:47.504618 IP6 client6.65148 > server6.80: Flags [S], cksum 0xe3c1 (correct), seq 2500114810, win 65535, options [mss 1440,nop,wscale 4,nop,nop,TS val 1009628701 ecr 0,sackOK,eol], length 0
09:40:47.505886 IP6 server6.80 > client6.65148: Flags [S.], cksum 0x1abd (correct), seq 193439890, ack 2500114811, win 14280, options [mss 1440,sackOK,TS val 229630052 ecr 1009628701,nop,wscale 7], length 0

The interesting information in these packets is the TCP timestamps. Defined in RFC 1323, these timestamps are extracted from a local clock. The server returns its current timestamp in the SYN+ACK segment.

Thanks to happy eyeballs, the next TCP connection is sent over IPv4 (it might be faster than IPv6, who knows). IPv4 works well and the server answers immediately:

09:40:49.512112 IP client4.65149 > server4.80: Flags [S], cksum 0xee77 (incorrect -> 0xb4bd), seq 321947613, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1009630706 ecr 0,sackOK,eol], length 0
09:40:49.513399 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 60) server4.80 > client4.65149: Flags [S.], cksum 0xd86f (correct), seq 873275860, ack 321947614, win 5792, options [mss 1380,sackOK,TS val 585326122 ecr 1009630706,nop,wscale 7], length 0

Note the TS val in the returning SYN+ACK. The value over IPv4 is much larger than over IPv6. This is not because one address family is faster than the other; it indicates that there is a load-balancer that balances the TCP connections between (at least) two different servers.

Shortly after, I authenticated myself over an SSL connection that was established over IPv4

09:41:26.566362 IP client4.65152 > server4.443: Flags [S], cksum 0xee77 (incorrect -> 0x420d), seq 3856569828, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1009667710 ecr 0,sackOK,eol], length 0
09:41:26.567586 IP server4.443 > client4.65152: Flags [S.], cksum 0x933e (correct), seq 3461360247, ack 3856569829, win 14480, options [mss 1380,sackOK,TS val 229212430 ecr 1009667710,nop,wscale 7], length 0

Again, a closer look at the TCP timestamps reveals that there is a third server that terminated the TCP connection. Apparently, in this case it was the load-balancer itself that forwarded the data extracted from the connection to one of the servers.

Thanks to happy eyeballs, my TCP connections reach different servers behind the load-balancer. This is annoying because the web servers maintain session state and every time a new connection switches from one address family to the other, I might switch from one server to another. In my experience, this happens randomly with this server, possibly as a function of the IP addresses that I’m using and the server load. As a user, I experience difficulties logging on to the server or random logouts, while the problem lies in unexpected interactions between happy eyeballs and a load balancer. The load balancer would like to stick all the TCP connections from one host to the same server, but due to the frequent switchovers between IPv6 and IPv4 it cannot stick each client to a server.

I’d be interested in any suggestions on how to improve this load balancing scheme without changing the web servers...

Sandstorm : even faster TCP (Sun, 01 Dec 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/01/sandstorm.html

Researchers have worked on improving the performance of TCP since the early 1980s. At that time, many researchers considered that achieving high performance with a software-based TCP implementation was impossible. Several new transport protocols were designed at that time, such as XTP. Some researchers even explored the possibility of implementing transport protocols in hardware. Hardware-based implementations are usually interesting to achieve high performance, but they are too inflexible for a transport protocol. In parallel with this effort, some researchers continued to believe in TCP. Dave Clark and his colleagues demonstrated in [1] that TCP stacks could be optimized to achieve high performance.

TCP implementations continued to evolve in order to achieve even higher performance. The early 2000s, with the advent of Gigabit interfaces, saw a better coupling between TCP and the hardware on the network interface. Many high-speed network interfaces can compute the TCP checksum in hardware, which reduces the load on the main CPU. Furthermore, high-speed interfaces often support large segment offload. A naive implementation of a TCP/IP stack would send segments and acknowledgements independently. For each segment sent/received, such a stack could need to process one interrupt, a very costly operation on current hardware. Large segment offload provides an alternative by exposing to the TCP/IP stack a large segment size, up to 64 KBytes. By sending and receiving larger segments, the TCP/IP stack minimizes the cost of processing the interrupts and thus maximises its performance.

Read more...

Broadband in Europe (Sun, 01 Dec 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/01/broadband_in_europe.html

The European Commission has recently published an interesting survey of the deployment of broadband access technologies in Europe. The analysis is part of the Commission’s Digital Agenda, which aims at enabling home users to have speeds of 30 Mbps or more. Technologies like ADSL and cable modems are widely deployed in Europe and more than 95% of European households can subscribe to fixed broadband networks.

Read more...

Multipath RTP (Sat, 30 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/30/multipath_rtp.html

Multipath TCP enables communicating nodes to use different interfaces and paths to reliably exchange data. In today’s Internet, most applications use TCP and can benefit from Multipath TCP. However, multimedia applications often use the Real-time Transport Protocol (RTP) on top of UDP. A few years after the initial work on Multipath TCP, researchers at Aalto University analyzed how RTP could be extended to support multiple paths. Thanks to their work, there is now a backward compatible extension of RTP that can be used on multihomed hosts. This extension will be important to access mobile streaming websites that use the Real Time Streaming Protocol (RTSP).

Read more...

The Software Defined Operator (Tue, 26 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/26/the_software_defined_operator.html

Network operators are reconsidering the architecture of their networks to better address the quickly evolving traffic and connectivity requirements. DT is one of them and, in a recent presentation at the Bell Labs Open Days in Antwerp, Axel Clauberg gave his vision of the next generation ISP network. This is not the first presentation that DT employees have given on their TerraStream vision for future networks. However, there are some points that are worth being noted.

Read more...

Lessons learned from SDN experiments and deployments (Mon, 25 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/25/lessons_learned_from_sdn_experiments_and_deployments.html

The scientific literature is full of papers that propose a new technique that (sometimes only slightly) improves the state of the art and evaluate its performance (by means of mathematical models or simulations, rarely experiments with real systems). During the last few years, Software Defined Networking has seen a growing interest in both the scientific community and among vendors. Initially proposed at Stanford University, Software Defined Networking aims at changing how networks are managed and operated. Today’s networks are composed of off-the-shelf devices that support standardized protocols with proprietary software and hardware implementations. Networked devices implement the data plane to forward packets and the control plane to correctly compute their forwarding tables. Both planes are today implemented directly on the devices.

Software Defined Networking proposes to completely change how networks are built and managed. Networked devices still implement the data plane in hardware, but this data plane, or more precisely the forwarding table that controls its operation, is exposed through a simple API to software defined by the network operator to manage the network. This software runs on a controller and controls the update of the forwarding tables and the creation/removal of flows through the network according to policies defined by the network operator. Many papers have already been written on Software Defined Networking and entire workshops are already dedicated to this field.
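To make the split concrete, here is a small, purely illustrative Python sketch of a controller that installs match/action entries into the forwarding tables of the switches it manages; the class names and rule format are invented and this is not the OpenFlow protocol or any real controller API.

# Conceptual sketch of the SDN split: the data plane only matches packets against
# a table; the control logic that fills the table lives in a separate controller.

class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = []            # list of (match_fn, action) entries

    def install(self, match_fn, action):
        self.flow_table.append((match_fn, action))

    def forward(self, packet):
        for match_fn, action in self.flow_table:
            if match_fn(packet):
                return action           # e.g. "output:port2" or "drop"
        return "send-to-controller"     # table miss

class Controller:
    def __init__(self, switches):
        self.switches = switches

    def apply_policy(self):
        # Operator-defined policy: drop telnet, forward web traffic on port 2.
        for sw in self.switches:
            sw.install(lambda p: p.get("tcp_dst") == 23, "drop")
            sw.install(lambda p: p.get("tcp_dst") == 80, "output:port2")

s1 = Switch("s1")
Controller([s1]).apply_policy()
print(s1.forward({"tcp_dst": 80}))      # output:port2
print(s1.forward({"udp_dst": 53}))      # send-to-controller (table miss)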

A recently published paper, Maturing of OpenFlow and software-defined networking through deployments, written by M. Kobayashi and his colleagues, analyzes Software Defined Networking from a different angle. This paper does not present a new contribution. Instead, it takes one step back and discusses the lessons that the networking group at Stanford has learned from designing, using and experimenting with the first Software Defined Networks that are used by real users. The paper discusses many of the projects carried out at Stanford in different phases, from small lab experiments to international wide-area networks and the use of SDN for production traffic. For each phase, and this is probably the most interesting part of the paper, the authors highlight several of the lessons that they have learned from these deployments. Several of these lessons are worth highlighting:

  • the size of the forwarding table on OpenFlow switches matters
  • the embedded CPU on networking devices is a barrier to innovation
  • virtualization and slicing are important when deployments are considered
  • the interactions between OpenFlow and existing protocols such as STP can cause problems. Still, it is unlikely that existing control plane protocols will disappear soon.

This paper is a must-read for researchers working on Software Defined Networks because it provides information that is rarely discussed in scientific papers. Furthermore, it shows that eating your own dog food, i.e. really implementing and using the solutions that we propose in our papers, is useful and has value.

Bibliography

[1] Masayoshi Kobayashi, Srini Seetharaman, Guru Parulkar, Guido Appenzeller, Joseph Little, Johan van Reijendam, Paul Weissmann, Nick McKeown, Maturing of OpenFlow and software-defined networking through deployments, Computer Networks, Available online 18 November 2013, ISSN 1389-1286, http://dx.doi.org/10.1016/j.bjp.2013.10.011.

Another type of attack on Multipath TCP ? (Sun, 24 Nov 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/11/24/another_type_of_attack_on_multipath_tcp.html

In a recent paper presented at Hotnets, M. Zubair Shafiq and his colleagues discuss a new type of “attack” on Multipath TCP.

When the paper was announced on the Multipath TCP mailing list, I was somewhat concerned by the title. However, after having read it in detail, I do not consider the inference “attack” discussed in this paper to be a concern. The paper explains that, thanks to Multipath TCP, it is possible for an operator to infer the performance of “another operator” by observing the Multipath TCP packets that pass through its own network. The “attack” is discussed in the paper and some measurements are carried out in the lab to show that it is possible to infer some characteristics of the performance of the other network.

After having read the paper, I don’t think that the problem is severe enough to be classified as an “attack”. First, if I want to test the performance of TCP in my competitor’s network, I can easily subscribe to this network, in particular for wireless networks that would likely benefit from Multipath TCP. There are even public measurement facilities that collect measurement data, see SamKnows, the FCC measurement app, speedtest or MLab.

More fundamentally, if an operator observes one subflow of a Multipath TCP connection, it cannot easily determine how many subflows are used in this Multipath TCP connection and what the endpoints of these subflows are. Without this information, it becomes more difficult to infer TCP performance in another specific network.

The technique proposed in the paper mainly considers the measured throughput on each subflow as a time series whose evolution needs to be predicted. A passive measurement device could get more accurate predictions by looking at the packets that are exchanged, in particular the data-level sequence numbers and acknowledgements. There is plenty of room to improve the inference technique described in this paper. Once Multipath TCP gets widely deployed and used for many applications, it might be possible to extend the technique to learn more about the performance of TCP in the global Internet.

The Multipath TCP buzz (Fri, 27 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/27/news.html

The inclusion of Multipath TCP in iOS7 last week was a nice surprise for the designers and first implementors of the protocol. The initial announcement created a buzz that was echoed by many online publications.

The same information also appeared on news sites in Spanish, Norwegian, Japanese, Chinese and Portuguese (see 1, 2, 3) and on various blogs. See Google news search for recent links.

If you’ve seen postings about Multipath TCP in other major online or print publications, let me know.

Computer Networking : starting from the principles (Mon, 23 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/23/cnp3_reliability.html

In 2009, I took my first sabbatical and decided to spend a large fraction of my time writing the open-source Computer Networking : Principles, Protocols and Practice ebook. This ebook was well received by the community and it received a $20,000 award from the Saylor foundation, which published it on iTunes.

There are two main approaches to teaching standard computer networking classes. Most textbooks are structured on the basis of the OSI or TCP/IP layered reference models. The most popular organisation is the bottom-up approach. Students start by learning about the physical layer, then move to the datalink layer, ... This approach is used by Computer Networks among others. Almost a decade ago, Kurose and Ross took the opposite approach and started from the application layer. I liked this approach and adopted a similar one for the first edition of Computer Networking : Principles, Protocols and Practice.

However, after a few years of experience of using the textbook with students and discussions with several colleagues who were using parts of the text, I’ve decided to revise it. This major revision will include the following modifications.

  • the second edition of the ebook will use a hybrid approach. One half of the ebook will be devoted to the key principles that any CS student must know. To explain these principles, the ebook will start from the physical layer and go up to the application layer. The main objective of this first part is to give the students a broad picture of the operation of computer networks without entering into any protocol detail. Several existing books discuss this briefly in their introduction or first chapter, but one chapter is not sufficient to grasp this entirely.

  • the second edition will discuss at least two different protocols in each layer to allow the students to compare different designs.

    • the application layer will continue to cover DNS and HTTP but will also include different types of remote procedure calls
    • the transport layer will continue to explain UDP and TCP, but will also cover SCTP. SCTP is cleaner than TCP and provides a different design for the students.
    • the network layer will continue to cover the data and control planes. In the control plane, RIP, OSPF and BGP remain, except that iBGP will probably not be covered due to time constraints. Concerning the data plane, given the same time constraints, we can only cover two protocols. The first edition covered IPv4 and IPv6. The second edition will cover IPv6 and MPLS. Describing MPLS (the basics, not all the details about LDP and RSVP-TE, more on this in a few weeks) is important to show the students a different design than IP. Once this choice has been made, one needs to select between IPv4 and IPv6. Covering both protocols is a waste of students’ time and the second edition will only discuss IPv6. At this point, it appears that IPv6 is more future-proof than IPv4. The description of IPv4 can still be found in the first edition of the ebook.
    • the datalink layer will continue to cover Ethernet and WiFi. Zigbee or other techniques could appear in future minor revisions
  • Practice remains an important skill that networking students need to learn. The second edition will include IPv6 labs built on top of netkit to allow the students to learn how to perform basic network configuration and management tasks on Linux.

The second edition of the book will be tested by the students who follow INGI2141 at UCL. The source code is available from https://github.com/obonaventure/cnp3/ and drafts will be posted on http://cnp3bis.info.ucl.ac.be/ every Wednesday during this semester.

Sources of networking information (Sat, 21 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/21/journals.html

Students who start their Master thesis in networking sometimes have difficulties locating scientific information related to their Master thesis’ topic. Many of them start by googling a few keywords and find random documents and wikipedia pages. To aid them, I list below some relevant sources of scientific information about networking in general. The list is far from complete and is biased by my own research interests, which do not cover the entire networking domain.

Digital Libraries

During the last decade, publishers of scientific journals and conference organizers have created large digital libraries that are accessible through a web portal. Many of them are protected by a paywall that provides full access only to paid subscribers, but many universities have (costly) subscriptions to (some of) these libraries. Most of these digital libraries provide access to tables of contents and abstracts.

Magazines

Conferences

Journals

Standardisation bodies

Is your network ready for iOS7 and Multipath TCP ? (Fri, 20 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/20/tracebox.html

During the last few days, millions of users have installed iOS7 on their iPhones and iPads. Estimates published by The Guardian reveal that more than one third of the users have already upgraded their devices to support the new release. As I still don’t use a smartphone, I usually don’t check these new software releases. From a networking viewpoint, this iOS update is different because it is the first step towards a wide deployment of Multipath TCP [RFC 6824]. Until now, Multipath TCP has mainly been used by researchers. With iOS7, the situation changes since millions of devices are capable of using Multipath TCP.

From a networking viewpoint, the deployment of Multipath TCP is an important change that will affect many network operators. In the 20th century, networks were only composed of routers and switches. These devices are completely transparent to TCP and never change any field of the TCP header or payload. Today’s networks, mainly enterprise and cellular networks, are much more complex. They include various types of middleboxes that process the IP header but also analyze the TCP headers and payload and sometimes modify them for various reasons. Michio Honda and his colleagues presented at IMC2011 a paper that reveals the impact of these middleboxes on TCP and its extensibility. In a nutshell, this paper revealed the following behaviors:

  • some middleboxes drop TCP options that they do not understand
  • some middleboxes replace TCP options by dummy options
  • some middleboxes change fields of the TCP header (source and destination ports for NAT, but also sequence/acknowledgement numbers, window fields, ...)
  • some middleboxes inspect the payload of TCP segments, reject out-of-sequence segments and sometimes modify the TCP payload (e.g. ALG for ftp on NAT)

These results had a huge influence on the design of Multipath TCP, which includes various mechanisms that enable it to work around most of these middleboxes and fall back to regular TCP in case of problems (e.g. payload modifications) to preserve connectivity.

Of course, Multipath TCP will achieve the best performance when running in a network which is fully transparent and does not include middleboxes that interfere with it. Network operators might have difficulties checking the possible interference between their devices and TCP extensions like Multipath TCP. While implementing Multipath TCP in the Linux kernel, we spent a lot of time understanding the interference caused by our standard firewall that randomizes TCP sequence numbers.

To support network operators who want to check the transparency of their network, we have recently released a new open-source tool called tracebox. tracebox is described in a forthcoming paper that will be presented at IMC2013.

In a nutshell, tracebox can be considered as an extension to traceroute. Like traceroute, it allows one to discover devices in a network. However, while traceroute only detects IP routers, tracebox is able to detect any type of middlebox that modifies some fields of the network or transport header. tracebox can be used as a command-line tool but also includes a scripting language that allows operators to develop more complex tests.

For example, tracebox can be used to verify that a path is transparent for Multipath TCP as shown below

# tracebox -n -p IP/TCP/MSS/MPCAPABLE/WSCALE bahn.de
tracebox to 81.200.198.6 (bahn.de): 64 hops max
1: 130.104.228.126 IP::CheckSum
2: 130.104.254.229 IP::TTL IP::CheckSum
3: 193.191.3.85 IP::TTL IP::CheckSum
4: 193.191.16.21 IP::TTL IP::CheckSum
5: 195.69.144.123 IP::TTL IP::CheckSum
6: 145.254.5.158 IP::TTL IP::CheckSum
7: 88.79.13.62 IP::TTL IP::CheckSum
8: 81.200.194.234 IP::TTL IP::CheckSum
9: 81.200.197.9 IP::TTL IP::CheckSum
10: 81.200.198.6 TCP::CheckSum IP::TTL IP::CheckSum TCPOptionMaxSegSize::MaxSegSize -TCPOptionMPTCPCapable -TCPOptionWindowScale

At each hop, tracebox verifies which fields of the IP/TCP headers have been modified. In the trace above, tracebox sends a SYN TCP segment on port 80 that contains the MSS, MP_CAPABLE and WSCALE options. The last hop corresponds to a middlebox that changes the MSS option and removes the MP_CAPABLE and WSCALE options. Thanks to the flexibility of tracebox, it is possible to use it to detect almost any type of middlebox interference.

You can use it on Linux and MacOS to verify whether the network that you use is fully transparent to TCP. If not, tracebox will point you to the offending middlebox.

Apple seems to also believe in Multipath TCP (Wed, 18 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/18/mptcp.html

Multipath TCP is a TCP extension that allows a TCP connection to send/receive packets over different interfaces. Multipath TCP has various use cases.

Designing such a major TCP extension has been a difficult problem and took a lot of effort within several research projects. The work started within the FP7 Trilogy project funded by the European Commission. It continues within the CHANGE and Trilogy 2 projects.

After five years of effort, we are getting close to a wide adoption of Multipath TCP.

  • In January 2013, the IETF published the Multipath TCP specification as an Experimental standard in RFC 6824
  • In July 2013, the MPTCP working group reported three independent implementations of Multipath TCP, including our implementation in the Linux kernel. To my knowledge, this is the first time that a large TCP extension is implemented so quickly.
  • On September 18th, 2013, Apple released iOS7, which includes the first large scale commercial deployment of Multipath TCP. Given the marketing buzz around new iOS7 releases, we can expect tens of millions of users to use a Multipath TCP enabled device.

Packet traces collected on an iPad running iOS7 reveal that it uses Multipath TCP to reach some destinations that seem to be directly controlled by Apple. You won’t see Multipath TCP for regular TCP connections from applications like Safari, but if you use Siri, you might see that the connection with one of the Apple servers uses Multipath TCP. The screenshot below shows the third ACK of a three-way handshake sent by an iPad running iOS7.

[screenshot: siri.png]

At this stage, the actual usage of Multipath TCP by iOS7 is unclear to me. If you have any hint on the type of information exchanged over this SSL connection, let me know.

The next step will, of course, be the utilisation of Multipath TCP by default for all applications running over iOS7.

Quickly producing time-sequence diagrams (Tue, 10 Sep 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/09/10/msc.html

Networking researchers and teachers often need to draw time-sequence diagrams that represent the exchange of packets through a network. Any drawing tool can be used to produce these diagrams, which contain mainly lines, arrows and text. However, while writing an article or a textbook, switching from the text to the drawing tool can be cumbersome.

A better approach would be to write a description of the diagrams directly in the text as a set of commands in a simple language. Latex hackers can probably manage this easily, but I’m far from a latex guru. Thanks to Benjamin Hesmans, I recently found an interesting tool called MSCGen. MSCGen was designed to write Message Sequence Chart descriptions. It produces SVG and PNG images and is integrated with sphinx thanks to the mscgen extension. This integration is very useful since it allows both the text and the figures to be written directly in ASCII.

The language supported by mscgen is similar to the DOT language used by graphviz and is very easy to use. For example, the code below

.. msc::

    a [label="", linecolour=white],
    b [label="Host A", linecolour=black],
    z [label="Physical link", linecolour=white],
    c [label="Host B", linecolour=black],
    d [label="", linecolour=white];

    a=>b [ label = "DATA.req(0)" ] ,
    b>>c [ label = "", arcskip=1];
    c=>d [ label = "DATA.ind(1)" ];

Produces the following image.

[figure: msc.png]

The only drawback of MSCGen is that it is currently difficult to write a diagram that contains a window of packets being exchanged together with the opposite flow of acknowledgements. Besides that, I’m planning to use it to produce all the time-sequence diagrams in the planned revision of Computer Networking : Principles, Protocols and Practice

Adding bibliographic information to pdf files (Tue, 27 Aug 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/08/27/citation.html

Researchers often distribute pdf files of their articles on their homepages or through institutional repositories like DIAL. Researchers are encouraged to distribute their scientific papers electronically and measurements have shown that distributing papers online improves their impact. Still, there is often one important piece of information missing when a paper is posted on a website: the precise bibliographic information needed to cite the paper. Without this bibliographic information, readers of a paper may print or save it without knowing where it has been published and are more likely to ignore it when preparing the bibliography of their own papers.

A better approach is to add the bibliographic information directly inside the pdf file. This is what the default ACM Latex style provides for accepted papers. For the SIGCOMM ebook on Recent Advances in Networking, we opted for a simple note on each paper.
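A complementary, programmatic option, which is not the approach used for the ebook, is to also store the citation in the pdf document metadata; here is a minimal sketch using the pypdf library, where the filenames and citation text are placeholders:

# Minimal sketch: embed citation information in the pdf document metadata.
# Filenames and the citation string are made up for the example.
from pypdf import PdfReader, PdfWriter

reader = PdfReader("paper.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

writer.add_metadata({
    "/Title": "Paper title",
    "/Author": "First Author, Second Author",
    "/Subject": "Published in Example Conference 2013; please cite the published version",
})

with open("paper-with-citation.pdf", "wb") as out:
    writer.write(out)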

Read more...

How quickly can we scan the entire Internet (Thu, 22 Aug 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/08/22/zmap.html

A random host on the Internet receives a large number of unsolicited packets. Some of these packets are caused by transmission errors that modify the destination address of packets, or by bugs and implementation errors. Still, most of the background Internet noise observed by network telescopes comes from worms that try to propagate, or from researchers, security experts and attackers trying to find characteristics of remote hosts.

When researchers try to map the Internet, they usually operate slowly. For example, CAIDA takes a few days to send traceroute probes towards all reachable class C networks. The 2012 anonymous Internet Census, which exploited a large number of vulnerable routers to serve as probes, took months. nmap, the default tool to probe open services on a remote host or network, also uses a slow mode of operation. These slow modes of operation are mainly chosen to avoid triggering alarms on the remote sites. A few packets can easily go unnoticed on an enterprise network, not millions of them.
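To put these durations in perspective, here is a back-of-the-envelope computation; the assumptions (minimum-size Ethernet frames sent at line rate from a single gigabit link) are mine and the numbers are not taken from the paper discussed below:

# Estimate: time to probe every IPv4 address once from a single gigabit link,
# assuming minimum-size Ethernet frames (64 bytes + 20 bytes of overhead per frame).
link_rate = 1e9                                # bits per second
frame_bits = (64 + 20) * 8                     # 672 bits per minimum-size frame
packets_per_second = link_rate / frame_bits    # about 1.49 million packets/s
addresses = 2 ** 32

seconds = addresses / packets_per_second
print(f"{seconds / 60:.0f} minutes")           # roughly 48 minutes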

A recent paper presented at the USENIX 2013 Security symposium takes a completely different approach.

Read more...

Adding hyperlinks to our Latex articles (Tue, 20 Aug 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/08/20/pdflinks.html

When they write papers, scientists spend a lot of time preparing their bibliography and correctly citing all their references. However, bibliographies and the corresponding bibtex styles were designed when everyone read scientific papers on paper. This is rarely the case today and most scientific papers are read online. Still, we insist on placing volume numbers, page numbers and other information from the paper era in each reference, but rarely URLs or DOIs. This is probably a mistake...

When developing the first edition of Computer Networking : Principles, Protocols and Practice, I quickly found that students read references provided that these references were easily accessible through hyperlinks. Today’s students, and I guess a growing number of researchers, are used to browsing the web but rarely go to their library to read articles on paper. For the recently published SIGCOMM ebook on Recent Advances in Networking, we did a small experiment in adding hyperlinks directly to each chapter in pdf format. Adding these hyperlinks was surprisingly easy and, I hope, useful for the readers.

Read more...

TCP over UDP : a new hack to pass through (some) middleboxes (Thu, 04 Jul 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/07/04/tcp_over_udp.html

Extending TCP in the presence of middleboxes is a difficult but not impossible task, as shown by Multipath TCP RFC 6824. A recent IETF draft proposed by Apple suggests encapsulating TCP segments inside UDP to prevent modifications performed by middleboxes. Apparently, some measurements indicate that UDP passes better through some types of NAT boxes than regular TCP segments. Since TCP is more widely used than UDP, the draft proposes to encapsulate TCP inside UDP. The proposed encapsulation technique is a bit unusual. A classical encapsulation would put the entire TCP segment after the UDP header. Instead, the TCP-over-UDP draft proposes to rewrite the TCP header as follows

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           | |A|P|R|S|F|                               |
| Offset| Reserved  |0|C|S|S|Y|I|            Window             |
|       |           | |K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      (Optional) Options                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Read more...

Should we completely deprecate IP fragmentation ? (Tue, 25 Jun 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/06/25/fragmentation.html

Fragmentation and reassembly have been part of the IPv4 specification since the beginning. One of the main motivations for including such mechanisms in the network layer is of course to allow IP packets to be exchanged over subnetworks that support different packet sizes. IPv4 fragmentation forced routers to be able to fragment packets that were too large. When routers were entirely software based, doing fragmentation on the router was a viable solution. However, with the advent of hardware assisted routers, performing fragmentation on the routers quickly became too expensive. In a seminal paper, Christopher Kent and Jeff Mogul argued that fragmentation should be considered harmful. This encouraged endhosts to avoid in-network packet fragmentation and most TCP implementations now include Path MTU discovery RFC 1191.

When IPv6 was designed, in-network fragmentation was quickly left out. However, the designers of IPv6 still believed in the benefits of fragmentation. IPv6 supports a fragmentation header that can be used by endhosts to fragment packets that are too large for a given path. One of the motivations for host-based fragmentation is that some packets need to be transmitted over subnets that only support small packet sizes (IPv6 mandates a minimum MTU of 1280 bytes).

Read more...

Don’t use ping for accurate delay measurements (Wed, 22 May 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/05/22/don_t_use_ping_for_delay_measurements.html

The ping software was designed decades ago to verify the reachability of a given IPv4 address. For this, it relies on ICMP, which runs on top of IPv4. A host that receives an ICMP echo request message is supposed to reply immediately by sending an ICMP echo reply message. This confirms the reachability of the remote host. By measuring the delay between the transmission of the echo request message and the reception of the echo reply message, it is possible to infer the round-trip-time between the two hosts. Since the round-trip-time matters for the performance of many Internet protocols, it is an important metric reported by ping. Some variants of ping also report the minimum and maximum delays after measuring a number of round-trip-times. A typical example is shown below

ping www.psg.com
PING psg.com (147.28.0.62): 56 data bytes
64 bytes from 147.28.0.62: icmp_seq=0 ttl=48 time=148.715 ms
64 bytes from 147.28.0.62: icmp_seq=1 ttl=48 time=163.814 ms
64 bytes from 147.28.0.62: icmp_seq=2 ttl=48 time=148.780 ms
64 bytes from 147.28.0.62: icmp_seq=3 ttl=48 time=153.456 ms
64 bytes from 147.28.0.62: icmp_seq=4 ttl=48 time=148.935 ms
64 bytes from 147.28.0.62: icmp_seq=5 ttl=48 time=153.647 ms
64 bytes from 147.28.0.62: icmp_seq=6 ttl=48 time=148.682 ms
64 bytes from 147.28.0.62: icmp_seq=7 ttl=48 time=163.926 ms
64 bytes from 147.28.0.62: icmp_seq=8 ttl=48 time=148.669 ms
64 bytes from 147.28.0.62: icmp_seq=9 ttl=48 time=153.352 ms
64 bytes from 147.28.0.62: icmp_seq=10 ttl=48 time=163.688 ms
64 bytes from 147.28.0.62: icmp_seq=11 ttl=48 time=148.729 ms
64 bytes from 147.28.0.62: icmp_seq=12 ttl=48 time=163.691 ms
64 bytes from 147.28.0.62: icmp_seq=13 ttl=48 time=148.536 ms
^C
--- psg.com ping statistics ---
14 packets transmitted, 14 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 148.536/154.044/163.926/6.429 ms

In Computer Networking, Principles, Protocols and Practice, the following figure was used to illustrate the variation of the round-trip-time. This measurement was taken more than ten years ago between a host connected to a CATV modem in Charleroi and a server at the University of Namur. The main reason for the delay variations was the utilisation of the low-speed link that we used at that time.

[figure: transport-fig-070-c.png]

Evolution of the round-trip-time between two hosts

In a recent presentation at RIPE66, Randy Bush and several of his colleagues revealed some unexpected measurements collected using ping. For these measurements, they used two unloaded servers and sent pings mainly through backbone networks. The figure below shows the CDF of the measured delays. The staircase curves were the first curves that they obtained. These delays look strange and several plateaux appear, but it is not easy to immediately find a clear explanation.

They studied these delays in more detail and tried to understand the reason for the huge delay variations that they observed. To understand the source of the delay variations, it is useful to look back at the format of an ICMP message encapsulated inside an IPv4 packet.

[figure: icmpv4.png]

The important part of this header is the first 32-bit word of the ICMPv4 header. For TCP and UDP, this word contains the source and destination ports of the transport flow. Many routers that support Equal Cost Multipath will compute a hash function over the source and destination IP addresses and ports for packets carrying TCP/UDP segments. However, how should such a load balancing router handle ICMP messages or other types of protocols that run directly on top of IPv4? A first option would be to always send ICMP messages over the same path, i.e. disable load balancing for ICMP messages. This is probably not a good idea from an operational viewpoint since it would imply that ICMP messages, which are often used for debugging, would not necessarily follow the same paths as regular packets. A better option would be to only use the source and destination IP addresses when load balancing ICMP messages. However, this requires the router to distinguish between UDP/TCP and other types of flows and to react based on the Protocol field of the IP header. This likely increases the cost of implementing load-balancing in hardware. The measurements presented above are probably, at least partially, caused by load-balancing routers that use the first 32-bit word of the IP payload to make their load balancing decision, without verifying the Protocol field in the IP header. The vertical bars shown in the figure above correspond to a modified ping that always sends ICMP messages that start with the same first 32-bit word. However, this does not completely explain why there is a delay difference of more than 15 milliseconds on the equal cost paths between two servers. Something else might be happening in this network.
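To illustrate why successive echo requests may follow different equal-cost paths, here is a small Python sketch; the router model is deliberately naive (it hashes the word where TCP/UDP ports would be, without checking the Protocol field) and the addresses and hash function are only examples, not any specific vendor implementation:

# The first 32-bit word of an ICMP echo request is type|code|checksum, and the
# checksum changes with the sequence number. A router that blindly hashes the
# "port" word of the payload therefore spreads successive pings over different paths.
import struct
import zlib

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def icmp_echo_request(identifier: int, sequence: int) -> bytes:
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)   # checksum = 0
    checksum = internet_checksum(header)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)

def naive_ecmp_nexthop(src: str, dst: str, payload: bytes, n_paths: int = 4) -> int:
    # Hash the addresses plus the first 32-bit word of the payload, without
    # checking the Protocol field: this is the problematic behaviour.
    key = src.encode() + dst.encode() + payload[:4]
    return zlib.crc32(key) % n_paths

for seq in range(4):
    pkt = icmp_echo_request(identifier=1234, sequence=seq)
    print(seq, naive_ecmp_nexthop("192.0.2.1", "147.28.0.62", pkt))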

Additional details are discussed in On the suitability of ping to measure latency by Cristel Pelsser, Luca Cittadini, Stefano Vissicchio and Randy Bush. See https://ripe66.ripe.net/archives/video/12/ for the recorded video.

Disruption free reconfiguration for link-state routing protocols implementable on real routers (Thu, 09 May 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/05/09/revisiting_the_infocom2007_best_paper.html

Link state routing protocols like OSPF and IS-IS flood information about the network topology to all routers. When a link needs to be shut down or its weight changes, a new link state packet is flooded again. Each router processes the received link state packet and updates its forwarding table. Updating the forwarding tables of all routers inside a large network can take several hundreds of milliseconds up to a few seconds depending on the configuration of the routing protocol. For his PhD thesis, Pierre Francois studied this problem in detail. He first designed a simulator to analyse all the factors that influence the convergence time in a large ISP network. This analysis revealed that it is difficult for a large network to converge within less than 100-200 milliseconds after a topology change. During this period, transient loops can happen and packets get lost even for planned and non-urgent topology changes. Pierre Francois designed several techniques to avoid these transient loops during the convergence of a link-state routing protocol. The first solution required changes to the link-state routing protocol. We thought that getting IETF consensus on this solution would be very difficult. We were right since the framework draft has still not been adopted.

At INFOCOM 2007, by refining an intuition proposed by Mike Shand, we proposed a solution that does not require standardisation. This solution relies on the fact that if the weight of a link is changed by one (increase or decrease), no transient loop can happen. However, using only unit weight changes is not practical given the wide range of weights in real networks. Fortunately, our paper showed that when shutting down a link (i.e. setting its weight to infinity), it is possible to use a small number of metric increments to safely perform the reconfiguration. This metric reconfiguration was well accepted by the research community and received the INFOCOM 2007 best paper award.

The algorithms proposed in our INFOCOM 2007 paper were implemented in Java and took a few seconds or more to run. It was possible to run them on a management platform, but not on a router. Two years ago, Pascal Merindol and Francois Clad reconsidered the problem. They found new ways of expressing the proofs that enable loop-free reconfiguration of link-state protocols upon topology changes. Furthermore, they have also significantly improved the performance of the algorithms proposed in 2007 and reimplemented everything in C. The new algorithms operate one order of magnitude faster than the 2007 version. It now becomes possible to implement them directly on routers to enable OSPF and IS-IS to avoid transient loops when dealing with non-urgent topology changes.

All the details are provided in our forthcoming paper that will appear in IEEE/ACM Transactions on Networking.

Hash functions are not the only answer to load balancing problems (Thu, 11 Apr 2013)
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/04/11/hash_functions_are_not_the_only_answer_to_load_balancing_problems.html

Load balancing is important in many networks because it makes it possible to spread the load over different resources. Load balancing can happen at different layers of the protocol stack. A web server farm uses load balancing to distribute the load between different servers. Routers rely on Equal Cost Multipath (ECMP) to balance packets without breaking TCP flows. Bonding makes it possible to combine several links at the datalink layer.

Many deployments of load balancing rely on the utilisation of hash functions. ECMP is a typical example. When a router has several equal cost paths towards the same destination, it can use a hash function to distribute the packets over these paths. Typically, to select the nexthop, the router computes hash(IPsrc, IPdst, Portsrc, Portdst) mod N for each packet to be forwarded, where N is the number of equal cost paths towards the destination. Various hash functions have been evaluated in the literature RFC 2992 : CRC, checksum, MD5, ... Many load balancing techniques have adopted hash functions because they can be efficiently computed. An important criterion in selecting a function to perform load balancing is whether a small change in its input gives a large change in its output. This is sometimes called the avalanche effect and is a requirement for strong hash functions used in crypto applications. Unfortunately, hash functions have one important drawback : it is very difficult to predict an input that would lead to a given output. For crypto applications, this is precisely what is expected, but many load balancing applications would like to be able to predict the output of the load balancing function while still benefiting from the avalanche effect.
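As a minimal illustration, the sketch below mimics such a hash-based selection; MD5 is used here only because it is readily available in Python, while real routers use simpler hardware-friendly functions.

# Toy ECMP-style selection: hash the flow identifier and take it modulo the
# number of equal-cost next hops, so all packets of a flow follow the same path.
import hashlib

def select_nexthop(ip_src, ip_dst, port_src, port_dst, proto, nexthops):
    key = f"{ip_src}|{ip_dst}|{port_src}|{port_dst}|{proto}".encode()
    digest = hashlib.md5(key).digest()
    return nexthops[int.from_bytes(digest[:4], "big") % len(nexthops)]

paths = ["nexthop-A", "nexthop-B", "nexthop-C"]
print(select_nexthop("10.0.0.1", "192.0.2.1", 54321, 443, "tcp", paths))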

In a recent paper, we have shown that it is possible to design a load balancing technique that both provides the avalanche effect (which is key for a good balancing of the traffic) and is predictable. The intuition behind this idea is that the hash function can be replaced by a block cipher. Block ciphers are usually used to encrypt/decrypt information by using a secret key. Since they are designed to provide an output that appears as random as possible for a wide range of inputs, they also exhibit the avalanche effect. Furthermore, provided that the key is known, the input that leads to a given output can be easily predicted. Our paper provides all the details and shows the benefits that such a technique can provide with Multipath TCP in datacenter networks, but there are many other potential applications.
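The sketch below is only a toy illustration of this general principle, not the construction evaluated in the paper: a small keyed Feistel permutation gives an avalanche-like but invertible mapping, so that, knowing the key, one can compute which flow identifier lands on a chosen path.

# Toy 32-bit Feistel permutation: keyed, avalanche-like and, unlike a hash,
# invertible, so the input leading to a chosen output can be computed.
import hashlib

def _round(half, key, i):
    data = key + bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def permute(value, key, rounds=4):
    left, right = value >> 16, value & 0xFFFF
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right

def unpermute(value, key, rounds=4):
    left, right = value >> 16, value & 0xFFFF
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right

key, flow_id, n_paths = b"secret", 0x12345678, 8
assert unpermute(permute(flow_id, key), key) == flow_id
# Forward use (load balancing): path = permute(flow_id, key) % n_paths.
# Reverse use (predictability): pick any output o with o % n_paths == target
# and derive flow_id = unpermute(o, key), e.g. to choose a source port.
print(permute(flow_id, key) % n_paths)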

]]>
Thu, 11 Apr 2013 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/02/05/humming.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/02/05/humming.html <![CDATA[Humming in the classroom]]> Humming in the classroom

One of the challenges of teaching large classes is to encourage the students to interact during the class. Today’s professors do not simply read their course to passive students. Most try to initiate interaction with the students by asking questions, polling the students’ opinions, ... However, my experience in asking questions to students in a large class shows that it is difficult to get answers from many students. Asking the students to raise their hands to vote on a binary question almost always results in :

  • a small fraction of the students voting for the correct answer
  • a (usually smaller) fraction of the students voting for the wrong answer
  • most of the students not voting

One of the reasons why students do not vote is that they are unsure about their answer and do not want their colleagues or, worse, their professor to notice that they had the wrong answer. Unfortunately, this decreases the engagement of the students and after some time some of them do not even think about the questions that are asked.

To cope with those shy students, I’ve started to ask them to hum to answer some of my questions. Humming is a technique that is used by the IETF to evaluate consensus during a working group meeting. The IETF develops the specifications for most of the protocols that are used on the Internet. An IETF working group is responsible for a given protocol. IETF participants meet in person three times a year. During these meetings, engineers and protocol experts discuss the new protocol specifications being developed. From time to time, working group chairs need to evaluate whether the working group agrees with one proposal. This question can be discussed on the working group’s mailing list. Another possibility would be to use a show of hands during the meeting, but during a show of hands, it is possible to recognize who is in favor of and who is against a given proposal. This is not always desirable. The IETF uses a nice trick to solve this problem. Instead of asking participants to raise their hands, working group chairs ask participants to hum. If most of the participants in the room hum in favor, the noise level is high and the proposal is accepted. Otherwise, if the noise level is similar in favor of and against a proposal, then there is no consensus and the proposal will need to be discussed at another meeting later.

Humming also works well in the classroom when asking a binary question or a question with a small number of possible answers. Students can give their opinion without individually revealing it to the professor or to their colleagues. Of course, electronic voting systems can be used to preserve the anonymity of students, but deploying these systems in large classes is more costly and more time consuming than humming...


]]>
Tue, 05 Feb 2013 00:00:00 +0100
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/22/looking_at_the_dns_through_a_looking_glass.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/22/looking_at_the_dns_through_a_looking_glass.html <![CDATA[Looking at the DNS through a looking glass]]> Looking at the DNS through a looking glass

In the networking community, a looking glass is often a router located inside an ISP network that can be contacted openly via a telnet server or sometimes an HTTP server. These looking glasses are very useful for debugging networking problems since they can be used to detect filtering of BGP routes or other problems.

A nice example of these looking glasses is the web server maintained by GEANT, the pan-European research network. The GEANT looking glass provides dumps of BGP routing tables, traceroutes and other tools from most routers of GEANT. As an example, GEANT routers have three routes towards network 8.8.8.0/24, which includes the open DNS resolver managed by Google.

inet.0: 433796 destinations, 1721271 routes (433773 active, 12 holddown, 755 hidden)
+ = Active Route, - = Last Active, * = Both

8.8.8.0/24         *[BGP/170] 3w6d 01:51:39, MED 0, localpref 95
                      AS path: 15169 I
                    > to 62.40.125.202 via xe-4/1/0.0
                    [BGP/170] 20w6d 03:35:32, MED 0, localpref 80
                      AS path: 3356 15169 I
                    > to 212.162.4.9 via xe-2/1/0.0
                    [BGP/170] 4w5d 04:06:28, MED 0, localpref 80
                      AS path: 174 15169 I
                    > to 149.6.42.73 via ge-6/1/0.0

As on all BGP routers, the best path, which is actually used to forward packets, is prefixed by *. Network operators have deployed many looking glasses; a list can be found on http://www.lookinglass.org/ and www.traceroute.org, among others.

The correct operation of today’s Internet does not depend only on the propagation of BGP prefixes. Another frequent issue today is the correct dissemination of DNS names. In the early days, command-line tools like nslookup and dig were sufficient to detect most DNS problems. Today, this is no longer always the case since Content Distribution Networks provide different DNS answers to the same DNS request coming from different clients. Furthermore, some network operators use DNS resolvers that sometimes provide invalid answers to some DNS requests. Some of these DNS resolvers are deployed due to legal constraints, as some countries block Internet access to some sites. However, some ISPs sometimes have less legitimate reasons to deploy lying DNS resolvers, as shown recently in France where Free deployed DNS masquerading to block access to some Internet advertisement companies. Checking the correct distribution of DNS names is thus becoming an operational problem. Several authors have proposed tools to examine the answers provided by the DNS to remote clients. Stéphane Bortzmeyer, who has sent many patches and improvements for the CNP3 book, has developed a very interesting alternative to these DNS looking glasses. dns-lg can be used manually through a web server, but also through a REST API that provides JSON output, which is pretty convenient for developing automated tests and for querying the looking glass from scripts. For example, http://dns.bortzmeyer.org/multipath-tcp.org/NS returns the NS record for multipath-tcp.org while http://dns.bortzmeyer.org/www.uclouvain.be/AAAA returns the IPv6 address (AAAA) record of UCL’s web server. Thanks to its web interface, dns-lg could be a very nice alternative for students who have difficulty using classical command-line tools when they start learning networking.
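As a small example, the sketch below shows how the REST API might be used from a script (it assumes the requests library; the Accept header and the exact structure of the JSON reply are assumptions, so the reply is simply printed as-is).

# Query the dns-lg looking glass through its REST API and print the JSON reply.
import json
import requests

def dns_lg(name, rrtype, server="http://dns.bortzmeyer.org"):
    r = requests.get(f"{server}/{name}/{rrtype}",
                     headers={"Accept": "application/json"}, timeout=10)
    r.raise_for_status()
    return r.json()

print(json.dumps(dns_lg("multipath-tcp.org", "NS"), indent=2))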

]]>
Tue, 22 Jan 2013 00:00:00 +0100
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/05/broadband_and_cellular_performance.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/01/05/broadband_and_cellular_performance.html <![CDATA[A closer look at broadband and cellular performance]]> A closer look at broadband and cellular performance

The Internet started almost exactly 30 years ago when ARPANet switched to the TCP/IP protocol suite. At that time, only a few experimental hosts were connected to the network. Three decades later, the Internet is part of our lives and most users access the Internet by using broadband access or cellular networks. Understanding the performance characteristics of these networks is essential to understand the factors that influence Internet protocols.

In 2011, Srikanth Sundaresan and several other researchers presented a very interesting paper [3] at SIGCOMM that analysed a large number of measurements collected with modified home access routers. Two types of devices were used : dozens of home routers modified by the Bismark project and several thousand measurement boxes deployed by SamKnows throughout the US. This paper revealed that there is a huge diversity in broadband performance in the US. This diversity depends on the chosen ISP, the chosen plan and also the geographical location. The paper was later revised and published in Communications of the ACM, ACM’s flagship magazine [4].

During the last edition of the Internet Measurement Conference, Joel Sommers, Paul Barford and Igor Canadi presented two papers that analyse Internet performance from a different viewpoint. [1] uses data from http://www.speedtest.net and SamKnows to analyse broadband performance. This enables them to provide a much more global analysis of broadband performance. For example, the figure below shows the average download throughput measured in different countries.

Figure: Average download performance (source [1])

The second paper, [2], explores the performance of WiFi and cellular data networks in more than a dozen cities, including Brussels. Latency and bandwidth information is extracted from http://www.speedtest.net. A first point to be noted from these measurements is that cellular and WiFi performance are significantly lower than broadband performance, despite all the efforts in deploying higher speed wireless networks. Note that the data was collected in February-June 2011; network performance might have changed since then. When comparing WiFi and cellular data, WiFi is consistently faster in all studied regions. In Brussels, for example, the average WiFi download throughput is 8.6 Mbps while the average cellular download throughput is only 1.2 Mbps. Latency is also an important performance factor. In Brussels, the average WiFi latency is slightly above 100 milliseconds while it reaches 281 milliseconds for cellular networks. Both papers are recommended reading for anyone willing to better understand the performance of Internet access networks.

[1](1, 2) Igor Canadi, Paul Barford, and Joel Sommers. Revisiting Broadband Performance. In the 2012 Internet Measurement Conference, 273–286. New York, New York, USA, 2012. ACM Press.
[2]Joel Sommers and Paul Barford. Cell vs. WiFi : On the Performance of Metro Area Mobile Connections. In the 2012 Internet Measurement Conference, 301. New York, New York, USA, 2012. ACM Press.
[3]S. Sundaresan, W. de Donato, N. Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescapè. Broadband internet performance: a view from the gateway. SIGCOMM, 2011.
[4]Srikanth Sundaresan, Walter de Donato, Nick Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescapè. Measuring home broadband performance. Communications of the ACM, 55(11):100, November 2012.
]]> Sat, 05 Jan 2013 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/17/reconfiguration_matters.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/17/reconfiguration_matters.html <![CDATA[REconfiguration matters]]> REconfiguration matters

Network configuration and management is an important problem for all network operators. IP networks are implemented by using a large number of switches and routers from different vendors. All these devices must be configured by the operators to maximise the performance of the network. In some networks, configuration files contain several tens of thousands of lines per device. Managing all these configurations is costly in large networks. Some researchers have worked on analysing the complexity of these networks and proposing abstractions to allow operators to better configure their network. Still, network configuration and management is often closer to an art than to science.

Researchers often consider the network configuration problem as a design problem. Starting from a blank sheet, how can a network operator define his/her objectives and then derive the configuration files that meet these objectives? This is not how networks are operated. Network operators almost never have the opportunity to design and configure a network from scratch. This only happens in new datacenters or new enterprise networks. In their recent work, Laurent Vanbever, Stefano Vissicchio and other colleagues have addressed a slightly different but much more relevant problem : network REconfiguration. There are only two letters of difference between network configuration and network REconfiguration, but these two letters reflect one of the important sources of complexity in managing a network and introducing new services. Once a network has been configured, it must remain in operation 24 hours a day, 365 days a year. Network equipment can remain in operation for 5-10 years and during that period its role in the network changes. All these changes must be done with as little impact as possible on the network.

To better understand the difficulty of reconfiguring a network, it is interesting to have a brief look at earlier papers that deal with similar problems. A decade ago, routing sessions had to be reset for each policy change or when the operating system of the router had to be upgraded. Aman Shaikh and others have shown that it is possible to update the control plane of a router without disrupting the dataplane [5]. Various graceful shutdown and graceful restart techniques have been proposed and implemented for the major control plane protocols. Another simple example of a reconfiguration problem is when operators need to change the OSPF weight associated with one link. This can happen for traffic engineering or maintenance purposes. This change triggers an OSPF convergence that can cause transient loops. Pierre Francois and others have proposed techniques that allow these simple reconfigurations to occur without causing transient forwarding problems [3][2]. Another step to aid network reconfiguration was the shadow configuration paper [1] that shows how to run different configurations in the same network at the same time.

In recent years, several network REconfiguration problems have been addressed. The first problem is the migration from one configuration of a link-state routing protocol (e.g. OSPF without areas) to another link-state routing protocol (e.g. IS-IS with areas). At first glance, this problem could appear to be simple. However, network operators who have performed such a transition have spent more than half a year to plan the transition and analyse all the problems that could occur. [6] first provides a theoretical framework that shows the problems that could occur during such a reconfiguration. It shows that it is possible to avoid transient forwarding problems during the reconfiguration by using a ships-in-the-night approach and updating the configuration of the routers in a specific order. Unfortunately, finding this ordering is an NP-complete problem. However, the paper proposes heuristics that find a suitable ordering, applies them to real networks, and provides measurements from a prototype reconfigurator that manages an emulated network.

A second problem is BGP reconfiguration. Given the complexity of BGP, it is not surprising that BGP reconfigurations are more difficult than IGP reconfigurations. [7] first shows that signalling and forwarding correctness, which are usually used to verify iBGP configurations, are not sufficient properties. Dissemination correctness must be ensured in addition to these two properties. :cite:`6327628` analyses several iBGP reconfiguration problems and identifies some problematic configurations. To allow an iBGP reconfiguration, this paper proposes and evaluates a BGP multiplexer that, combined with encapsulation, enables iBGP reconfigurations. The proposed solution provably enables lossless BGP reconfigurations by leveraging existing technology to run multiple isolated control-planes in parallel.

This work on REconfiguration has already led to some follow-up work. For example, [4] has proposed techniques that use tagging to allow software-defined networks to support migrations in a seamless manner. We can expect to read more papers that deal with REconfiguration problems in the coming years.

[1]Richard Alimi, Ye Wang, and Y. Richard Yang. Shadow configuration as a network management primitive. SIGCOMM Comput. Commun. Rev., 38(4):111–122, August 2008.
[2]P. Francois, M. Shand, and O. Bonaventure. Disruption free topology reconfiguration in OSPF networks. In INFOCOM 2007, 26th IEEE International Conference on Computer Communications, pages 89–97, May 2007.
[3]Pierre Francois and Olivier Bonaventure. Avoiding transient loops during the convergence of link-state routing protocols. IEEE/ACM Trans. Netw., 15(6):1280–1292, December 2007.
[4]Mark Reitblatt, Nate Foster, Jennifer Rexford, Cole Schlesinger, and David Walker. Abstractions for network update. In Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, SIGCOMM ‘12, 323–334. New York, NY, USA, 2012. ACM.
[5]Aman Shaikh, Rohit Dube, and Anujan Varma. Avoiding instability during graceful shutdown of multiple ospf routers. IEEE/ACM Trans. Netw., 14(3):532–542, June 2006.
[6]Laurent Vanbever, Stefano Vissicchio, Cristel Pelsser, Pierre Francois, and Olivier Bonaventure. Seamless network-wide igp migrations. In Proceedings of the ACM SIGCOMM 2011 conference, SIGCOMM ‘11, 314–325. New York, NY, USA, 2011. ACM.
[7]S. Vissicchio, L. Cittadini, L. Vanbever, and O. Bonaventure. iBGP deceptions: more sessions, fewer routes. In INFOCOM 2012, Proceedings IEEE, pages 2122–2130, March 2012.
]]> Mon, 17 Dec 2012 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/14/mininet.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/14/mininet.html <![CDATA[Mininet : improving the reproducibility of networking research]]> Mininet : improving the reproducibility of networking research

In most research communities, the ability to reproduce research results is a key step in validating and accepting new research results. Ideally, all published papers should contain enough information to enable researchers to reproduce the results discussed in the paper. Reproducibility is relatively easy for theoretical or mathematically oriented papers. If the main contribution of the paper is a proof or a mathematical model, then the paper contains all the information about the results. If the paper is more experimental, then reproducibility is often a concern. There are many (sometimes valid) reasons that can explain why the results obtained by a paper are difficult to reproduce :

  • the paper contains measurement data that are proprietary. This argument is often used by researchers who have tested their new solution in a commercial network or datacenter, or who used measurement data such as packet traces whose publication could reveal private information
  • the source code used for the paper is proprietary and cannot be released to other researchers. This argument is weaker, especially when researchers extend publicly available (and often open-source) software to perform their research. Although they have benefited from the publicly available software, they do not release their modifications to this software

At CoNEXT 2012, Brandon Heller, Nikhil Handigol, Vimalkumar Jeyakumar, Bob Lantz and Nick McKeown presented a container-based emulation technique called Mininet 2.0 that enables researchers to easily create reproducible experiments. The paper describes in detail the extensions that they have developed on top of the Linux kernel to efficiently emulate, on a single machine, a set of hosts interconnected by virtual links. The performance that they obtained is impressive. More importantly, they explain how they were able to reproduce recent networking papers on top of Mininet. Instead of performing the experiments themselves, they used Mininet 2.0 for a seminar at Stanford University and 17 groups of students were able to reproduce various measurements by using one virtual machine on EC2.
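To give an idea of what such an experiment looks like, here is a minimal sketch using Mininet's Python API (it assumes a recent Mininet installation, must be run as root, and the topology and link parameters are of course arbitrary).

# Minimal Mininet experiment: two hosts behind one switch, rate- and
# delay-limited links, and a quick reachability and latency test.
from mininet.net import Mininet
from mininet.topo import Topo
from mininet.link import TCLink

class TwoHosts(Topo):
    def build(self):
        s1 = self.addSwitch("s1")
        h1 = self.addHost("h1")
        h2 = self.addHost("h2")
        self.addLink(h1, s1, bw=10, delay="5ms")   # 10 Mbps, 5 ms
        self.addLink(h2, s1, bw=10, delay="5ms")

net = Mininet(topo=TwoHosts(), link=TCLink)
net.start()
net.pingAll()                                      # basic connectivity check
h1, h2 = net.get("h1", "h2")
print(h1.cmd("ping -c 3 %s" % h2.IP()))            # repeatable measurement
net.stop()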

Beyond proposing a new tool, they also propose a new way to submit papers. In the introduction, they note :

To demonstrate that network systems research can indeed be made repeatable, each result described in this paper can be repeated by running a single script on an Amazon EC2 [5] instance or on a physical server. Following Claerbout’s model, clicking on each figure in the PDF (when viewed electronically) links to instructions to replicate the experiment that generated the figure. We encourage you to put this paper to the test and replicate its results for yourself.

I sincerely hope that we will see more directly reproducible experimental papers in the main networking conferences in the coming months and years.

Brandon Heller, Nikhil Handigol, Vimalkumar Jeyakumar, Bob Lantz, Nick McKeown, Reproducible Network Experiments using Container Based Emulation, Proc. Conext 2012, December 2012, Nice, France

]]>
Fri, 14 Dec 2012 00:00:00 +0100
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/07/a_real_use_case_for_the_locator_identifier_separation_protocol.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/07/a_real_use_case_for_the_locator_identifier_separation_protocol.html <![CDATA[A real use case for the Locator/Identifier Separation Protocol]]> A real use case for the Locator/Identifier Separation Protocol

The Locator/Identifier Separation Protocol (LISP) was designed several years ago by Dino Farinacci and his colleagues at Cisco Systems as an architecture to improve the scalability of the Internet routing system. Like several other such proposals, LISP proposed to separate the two usages of addresses. In contrast with many other proposals that were discussed in the IRTF Routing Research Group, LISP has been fully implemented and tested on the global Internet through the lisp4.net testbed. Several implementations of LISP exist on different types of Cisco routers and there is also the OpenLISP open-source implementation on FreeBSD [3].

On the Internet, IP addresses are used for identifying an endhost (or more precisely an interface on such a host) where TCP connections can be terminated. Addresses are also used as locators to indicate to the routing system the location of each router and endpoint. Today, both endpoint addresses and locators are advertised in the routing system and contribute to its growth. With LISP, there are two types of addresses :

  • locators. These addresses are used to identify routers inside the network. The locators are advertised in the routing system.
  • identifiers. These addresses are used to identify endhosts. They are not distributed in the routing system, which is the main advantage of LISP from a scalability viewpoint.

A typical deployment of LISP in the global Internet is described in the figure below [6].

Figure: LISP packet flow (source [6])

Endhosts are assigned identifiers. Identifiers are IP addresses (IPv4 or IPv6) whose reachability is advertised by the intradomain routing protocol inside the enterprise network. Two hosts that belong to the same network can exchange packets directly (A->B arrow in the above figure). Locators are assigned to border routers (ITRx and ETRx in the figure above). These locators are advertised in the global Internet routing system by using BGP. As the addresses of hosts A, B and C are not advertised by BGP, packets destined to these addresses cannot appear in the global Internet. LISP solves this problem by using map-and-encap. When A sends a packet towards C, it first sends a regular packet to its border router (ITR2 in the above figure). The border router performs two operations. First, it queries the LISP mapping system to retrieve the locator of the border routers that can reach the destination identifier C. Then, the original packet destined to C is encapsulated inside a LISP packet whose source is ITR2 and whose destination is ETR1, one of the locators of identifier C. When ETR1 receives the encapsulated packet, it removes the outer IP header and forwards the original packet as a regular IP packet towards its destination.
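The toy sketch below illustrates this map-and-encap logic, with the mapping system reduced to a static dictionary and packets represented as dictionaries; it is purely illustrative and all prefixes and locators are made up.

# Toy map-and-encap at an ITR: look up the destination identifier (EID) in a
# mapping table to find a locator (RLOC), then wrap the original packet in an
# outer header addressed to that locator.
import ipaddress

MAPPINGS = {                                   # EID prefix -> locators (RLOCs)
    ipaddress.ip_network("192.0.2.0/24"): ["203.0.113.1"],
    ipaddress.ip_network("198.51.100.0/24"): ["203.0.113.9"],
}
ITR_LOCATOR = "203.0.113.5"

def lookup(eid):
    """Return the locators of the ETRs announcing eid (most specific match)."""
    addr = ipaddress.ip_address(eid)
    matches = [(net, rlocs) for net, rlocs in MAPPINGS.items() if addr in net]
    if not matches:
        raise KeyError("no mapping for %s" % eid)
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def encapsulate(packet):
    """Prepend an outer 'header' (here a dict) whose destination is an RLOC."""
    rloc = lookup(packet["dst"])[0]
    return {"outer_src": ITR_LOCATOR, "outer_dst": rloc, "inner": packet}

print(encapsulate({"src": "10.1.0.7", "dst": "192.0.2.42", "data": "hello"}))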

The mapping system plays an important role in the performance and the stability of a LISP-based network since border routers need to maintain a cache of the mappings that they use [2]. The first releases of LISP used a hack that combined GRE tunnels and BGP to distribute mapping information [1]. This solution had the advantage of being simple to implement (at least on Cisco routers) but we expected that it would become complex to operate and maintain in the long term. After many discussions and simulations, we convinced the LISP designers to opt for a different mapping system whose operation is inspired by the Domain Name System. Our LISP-TREE proposal [4] is the basis for the DDT mapping system that is now implemented and used by LISP routers.

Simulation-based studies have shown that LISP can provide several important benefits compared to the classic Internet architecture [5]. Some companies have used LISP to support specific services. For example, Facebook has relied on LISP to support IPv6-based services [6]. However, until now the deployment use cases have not been completely convincing from a commercial viewpoint. A recent announcement could change the situation. In a whitepaper, Cisco describes how LISP can be combined with encryption techniques to support Virtual Private Network services. Given the importance of VPN services for enterprise networks, this could become a killer application for LISP. There are apparently already several networks using LISP to support VPN services. The future of LISP will be guaranteed once a second major router vendor decides to implement LISP.

[1]V. Fuller, D. Farinacci, D. Meyer, and D. Lewis. Lisp alternative topology (lisp+alt). Internet draft, draft-ietf-lisp-alt-10.txt, December 2011.
[2]Luigi Iannone and Olivier Bonaventure. On the cost of caching locator/ID mappings. In CoNEXT ‘07: Proceedings of the 2007 ACM CoNEXT conference. ACM, December 2007.
[3]Luigi Iannone, Damien Saucez, and Olivier Bonaventure. Implementing the Locator/ID Separation Protocol: Design and experience. Computer Networks: The International Journal of Computer and Telecommunications Networking, March 2011.
[4]Loránd Jakab, Albert Cabellos-Aparicio, Florin Coras, Damien Saucez, and Olivier Bonaventure. LISP-TREE: a DNS hierarchy to support the lisp mapping system. IEEE Journal on Selected Areas in Communications, October 2010.
[5]Bruno Quoitin, Luigi Iannone, Cédric de Launois, and Olivier Bonaventure. Evaluating the benefits of the locator/identifier separation. In MobiArch ‘07: Proceedings of 2nd ACM/IEEE international workshop on Mobility in the evolving internet architecture. ACM Request Permissions, August 2007.
[6](1, 2, 3) Damien Saucez, Luigi Iannone, Olivier Bonaventure, and Dino Farinacci. Designing a Deployable Internet: The Locator/Identifier Separation Protocol. IEEE Internet Computing Magazine, 16(6):14–21, 2012.
]]> Fri, 07 Dec 2012 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/04/tcp_congestion_control_schemes.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/12/04/tcp_congestion_control_schemes.html <![CDATA[TCP congestion control schemes]]> TCP congestion control schemes

Since the publication of two end-to-end congestion control schemes at SIGCOMM‘88 [3] [4], congestion control has been a very popular and important topic in the scientific community. The IETF tried to mandate a standard TCP congestion control scheme that would be used by all TCP implementations RFC 5681, but today’s TCP implementations contain different congestion control schemes. Linux supports several congestion control schemes that can be configured by the system administrator. A detailed analysis of these implementations was recently presented in [2]. Windows has opted for its own congestion control scheme, which is included in the Microsoft stack.
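For reference, the sketch below shows how the schemes available on a Linux machine can be listed and how one of them can be requested for a single socket; it assumes a Linux host and a Python version that exposes the TCP_CONGESTION socket option.

# List the congestion control schemes offered by a Linux kernel and request
# one for a single TCP socket (Linux-only; needs the TCP_CONGESTION option).
import socket

with open("/proc/sys/net/ipv4/tcp_available_congestion_control") as f:
    print("available:", f.read().strip())
with open("/proc/sys/net/ipv4/tcp_congestion_control") as f:
    print("system default:", f.read().strip())

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")
name = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print("this socket uses:", name.split(b"\0")[0].decode())
s.close()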

Given the importance of the congestion control scheme from a performance viewpoint, it is useful to have a detailed overview of the different congestion control schemes that have been proposed and evaluated. The best survey paper on TCP congestion control is probably the one written by Alexander Afanasyev and his colleagues on Host-to-Host Congestion Control for TCP [1], which appeared in IEEE Communications Surveys & Tutorials. This paper provides a detailed overview of the different TCP congestion control schemes and classifies them.

Last week, I received an alert from Google Scholar indicating that a new survey on TCP congestion control appeared in the Journal of Network and Computer Applications. This paper tries to provide a classification of the different TCP congestion control schemes. Unfortunately, the paper is not convincing at all and, furthermore, it reuses two of the figures published in [1] without citing this previously published survey. This is a form of plagiarism that should have been detected by the editors of the Journal of Network and Computer Applications.

[1](1, 2) Alexander Afanasyev, Neil Tilley, Peter Reiher, and Leonard Kleinrock. Host-to-Host Congestion Control for TCP. IEEE Communications Surveys & Tutorials, 12(3):304–342, 2010.
[2]C Callegari, S Giordano, M Pagano, and T Pepe. Behavior analysis of TCP Linux variants. Computer Networks, 56(1):462–476, January 2012.
[3]V. Jacobson. Congestion avoidance and control. In ACM SIGCOMM Computer Communication Review, volume 18, 314–329. ACM, 1988.
[4]KK Ramakrishnan and R. Jain. A binary feedback scheme for congestion avoidance in computer networks with a connectionless network layer. In ACM SIGCOMM Computer Communication Review, volume 18, 303–313. ACM, 1988.
]]> Tue, 04 Dec 2012 00:00:00 +0100 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/25/towards_faster_web_downloads_and_some_interesting_data.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/25/towards_faster_web_downloads_and_some_interesting_data.html <![CDATA[Towards faster web downloads and some interesting data]]> Towards faster web downloads and some interesting data

Decreasing the time required to download web pages is an obsession for a large number of content providers. They rely on various tricks to speed up the download of web pages. Many of these tricks are widely known, although they are not implemented on all web servers. Some depend on the content itself. A smaller web page will always load faster than a large web page. This is the reason why major content providers optimise their HTML content to remove unnecessary tags or compress their JavaScript. They also heavily rely on cacheable information such as images, CSS, JavaScript, ... Some also use gzip-based compression to dynamically compress the data that needs to be transmitted over the wire. This is particularly useful when web pages are delivered over low-bandwidth links such as those to mobile phones.

Two years ago, at the beginning of their work on SPDY, Google published interesting data about the size of web-accessible content on https://developers.google.com/speed/articles/web-metrics

The analysis was based on the web pages collected by googlebot. It thus reflects a large fraction of the public web. Some of the key findings of this analysis include :

  • the average web page results in the transmission of 320 KBytes
  • there are on average more than 40 GET requests per web page and a web page is retrieved by contacting 7 different hostnames. The average page contains 29 images, 7 scripts and 3 CSS files
  • on average, each HTTP GET results in the retrieval of only 7 KBytes of data

The public web is thus widely fragmented and this fragmentation has an impact on its performance.

Last year, Yahoo researchers presented an interesting analysis of the work they did to optimise the performance of the web pages served by Yahoo [1]. This analysis focuses on the Yahoo web pages but still provides some very interesting insights on the performance of the web. Some of the key findings of this analysis include :

  • around 90% of the objects served by yahoo are smaller than 25 Kbytes

  • despite the utilisation of HTTP/1.1, the average number of requests per TCP connection is still only 2.24 and 90% of the TCP connections do not carry more than 4 GETs

  • web page downloads are heavily affected by packet losses, and losses do occur: 30% of the TCP connections are slowed by retransmissions

    Figure: Packet retransmission rate observed on the Yahoo! CDN (source [1])

  • increasing the initial TCP window size as recently proposed and implemented on Linux reduces the page download time, even when packet losses occur

  • increasing the initial TCP window size may cause some unfairness problems given the short duration of TCP connections

There is still a lot of work to be done to reduce page download times and many factors influence the perceived performance of the web.

[1](1, 2) Mohammad Al-Fares, Khaled Elmeleegy, Benjamin Reed, and Igor Gashinsky. Overclocking the Yahoo! CDN for faster web page loads. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, IMC ‘11, 569–584. New York, NY, USA, 2011. ACM.
]]> Tue, 25 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/14/understanding_the_dropbox_protocol_and_quantifying_the_usage_of_cloud_storage_services.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/14/understanding_the_dropbox_protocol_and_quantifying_the_usage_of_cloud_storage_services.html <![CDATA[Understanding the dropbox protocol and quantifying the usage of cloud storage services]]> Understanding the dropbox protocol and quantifying the usage of cloud storage services

Measurement studies show that video is consuming more and more bandwidth in the global Internet. Another type of service whose usage is growing is cloud-based storage, such as Dropbox, iCloud, SkyDrive or GDrive. These cloud storage services use proprietary protocols and allow users to exchange files and share folders efficiently. Dropbox is probably one of the most widely known storage services. It heavily relies on Amazon EC2 and AWS services. The Dropbox application is easy to use, but little is known about the operation of the underlying protocol. In a paper that will be presented this fall at IMC‘12, Idilio Drago and his colleagues provide a very detailed analysis of the Dropbox protocol and its usage in home and campus networks [1].

Several of the findings of this paper are very interesting. First, despite its popularity, Dropbox is still provided from servers located mainly in the US. This implies a long round-trip-time for the large population of Dropbox users who do not reside in North America. Since Dropbox uses the Amazon infrastructure, it is surprising that they do not seem to use Amazon datacenters outside the US. All the files that you store in your Dropbox folder are likely stored on US servers. Another surprising result is that Dropbox divides the files to be transferred into chunks of 4 MBytes and each chunk needs to be acknowledged by the application. Coupled with the long round-trip-time, this results in a surprisingly low transfer rate of about 500 Kbps. This performance issue seems to have been solved recently by Dropbox with the ability to send chunks in batches.

[1] also provides an analysis of the operation of the main Dropbox protocol. Dropbox mainly uses servers hosted in Amazon datacenters for the various types of operations. Although Dropbox uses TLS to encrypt the data, the authors used SSLBump running on Squid to perform a man-in-the-middle attack between a Dropbox client and the official servers.

Figure: An example of a storage operation with Dropbox (source [1])

Another interesting element provided in [1] is an analysis of Dropbox traffic in campus and home networks. This analysis, performed using tstat, shows that cloud storage services already contribute a large volume of data to the global Internet. The analysis also considers the percentage of clients that are uploading, downloading and silent. Users who have installed Dropbox but are not using it should be aware that the Dropbox client always opens connections to Dropbox servers, even if no data needs to be exchanged. The entire dataset collected for the paper is available from http://www.simpleweb.org/wiki/Dropbox_Traces

[1](1, 2, 3, 4) Idilio Drago, Marco Mellia, Maurizio M. Munafò, Anna Sperotto, Ramin Sadre, and Aiko Pras. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement, IMC‘12. 2012.
]]> Fri, 14 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/06/an_interesting_retrospective_of_a_decade_of_transport_protocol_research_with_sctp.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/06/an_interesting_retrospective_of_a_decade_of_transport_protocol_research_with_sctp.html <![CDATA[An interesting retrospective of a decade of transport protocol research with SCTP]]> An interesting retrospective of a decade of transport protocol research with SCTP

TCP is still the dominant transport protocol on the Internet. However, since the late 1990s, researchers and IETFers have worked on developing and improving another reliable transport protocol for the Internet : SCTP. The Stream Control Transmission Protocol (SCTP) RFC 4960 provides several additional features compared to the venerable TCP, including :

  • multihoming and failover support
  • partially reliable data transfer
  • preservation of message boundaries
  • support for multiple concurrent streams

The main motivation for the design of SCTP has been the support of IP Telephony signaling applications that require reliable delivery with small delays and fast failover in case of failures. SCTP has since evolved to support other types of applications.
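As a very small taste of the API, the sketch below opens an SCTP association in the one-to-one socket style on Linux; it assumes kernel SCTP support and a reachable server, and features such as multiple streams or partial reliability need additional socket options not shown here.

# Open an SCTP association with the one-to-one socket API (Linux only; the
# kernel SCTP module must be available). Unlike TCP, each send() below is
# delivered to the peer as a separate message with preserved boundaries.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_SCTP)
try:
    sock.connect(("192.0.2.10", 5000))   # illustrative server address
    sock.send(b"first message")
    sock.send(b"second message")
finally:
    sock.close()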

Since the publication of the first RFC on SCTP, various research groups have proposed extensions and improvements to SCTP. SCTP’s flexibility and extensibility have enabled researchers to explore various new techniques and solutions to improve transport protocols. A recently published survey paper [1] analyses almost a decade of transport protocol research by looking at over 430 SCTP-related publications. Like most survey papers, it provides an introduction to the paper’s topic, in this case SCTP, and briefly compares some of the studied papers.

[1] goes beyond a simple summary of research papers and the approach chosen by the authors could probably be applied in other fields. In order to understand the evolution of the main topics of SCTP research, the authors of [1] classified each paper along four different dimensions :

  • protocol features (handover, load sharing, congestion, partial reliability, ...)
  • application (signaling, multimedia, bulk transfer, web, ...)
  • network environment (wireless, satellite, best effort, ...)
  • study approach (analytical, simulation, emulation, live measurements, ...)

By using these four dimensions, [1] provides a quick snapshot of past SCTP research and its evolution over the years. Not surprisingly, simulation is more popular than live measurements, and bulk data transfer is more often explored than signaling. Furthermore, [1] provides interesting visualizations, such as the figure below on the chosen study approach.

Figure: The study approach (fourth dimension) used for SCTP papers during the last decade [1]

[1] is a very interesting starting point for any researcher interested in transport protocol research. The taxonomy and the presentation could also inspire researchers in other fields. A web interface for the taxonomy is also available. Unfortunately, it does not seem to have been maintained after the finalization of the paper.

[1](1, 2, 3, 4, 5, 6, 7) Łukasz Budzisz, Johan Garcia, Anna Brunstrom, and Ramon Ferrús. A taxonomy and survey of SCTP research. ACM Comput. Surv., 44(4):18:1–18:36, September 2012. http://doi.acm.org/10.1145/2333112.2333113.
]]> Thu, 06 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/05/less_than_best_effort___congestion_control_schemes_do_not_always_try_to_optimize_goodput.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/09/05/less_than_best_effort___congestion_control_schemes_do_not_always_try_to_optimize_goodput.html <![CDATA[Less than best effort : congestion control schemes do not always try to optimize goodput]]> Less than best effort : congestion control schemes do not always try to optimize goodput

The TCP congestion control scheme aims at providing a fair distribution of the network resources among the competing hosts while still achieving the highest possible throughput for all sources. In some sense, TCP’s congestion control scheme considers that all sources are equal and should obtain the same fraction of the available resources. In practice, this is not completely true since it is known that TCP is unfair and favors sources with a low round-trip-time compared to sources with a high round-trip-time. Furthermore, TCP’s congestion control scheme operates by filling the available buffers in the routers. This mode of operation results in an increase in the end-to-end delay perceived by the applications. This increased delay can be penalizing for interactive applications.

However, TCP’s congestion control RFC 5681 is only one possible design point in the space of congestion control schemes. Some congestion control schemes start from a different assumption than TCP. The Low Extra Delay Background Transport (LEDBAT) IETF working group is exploring such an alternate design point. Instead of assuming that all sources are equal, LEDBAT assumes that some sources are background sources that should benefit from the available network resources without creating unnecessary delays that would affect the other sources. The LEDBAT congestion control scheme is an example of a delay-based congestion controller. LEDBAT operates by estimating the one-way delay between the source and the destination. This is done by measuring the minimum one-way delay observed and using it as a reference to estimate the current queueing delay. The figure below (from [1]) clearly illustrates the estimation of the minimum one-way delay and then the estimation of the current one-way delay. LEDBAT uses a congestion window and adjusts it every time the measured delay changes. If the measured delay increases, this indicates that the network is becoming more congested and thus background sources need to back off and reduce their congestion window.
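The sketch below is a simplified version of this controller, loosely following the formulas later standardised in RFC 6817; the constants are illustrative and many details (base-delay history, slow start, filtering of delay samples) are omitted.

# Simplified LEDBAT-style window update: estimate the queueing delay as the
# difference between the current one-way delay and the smallest delay seen so
# far, then nudge the window towards a small delay target.
TARGET = 0.100   # queueing-delay target in seconds (illustrative)
GAIN = 1.0
MSS = 1460.0

class LedbatController:
    def __init__(self):
        self.base_delay = float("inf")   # smallest one-way delay observed
        self.cwnd = 2 * MSS              # congestion window in bytes

    def on_ack(self, one_way_delay, bytes_acked):
        self.base_delay = min(self.base_delay, one_way_delay)
        queueing_delay = one_way_delay - self.base_delay
        off_target = (TARGET - queueing_delay) / TARGET
        # Grow while the estimated queue stays below TARGET, shrink above it.
        self.cwnd += GAIN * off_target * bytes_acked * MSS / self.cwnd
        self.cwnd = max(self.cwnd, MSS)
        return self.cwnd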

Figure: Estimation of the queueing delay in LEDBAT [1]

LEDBAT is only one example of the congestion control schemes that can be used by less-than-best-effort applications. A recent survey [1] RFC 6297 summarizes the main features and properties of many of these congestion control schemes. Furthermore, [1] provides pointers to implementations of such controllers. Such congestion control schemes have notably been implemented inside BitTorrent clients. An analysis of the performance of such p2p clients may be found in [2].

[1](1, 2, 3, 4) D. Ros and M. Welzl. Less-than-best-effort service: a survey of end-to-end approaches. Communications Surveys Tutorials, IEEE, PP(99):1 –11, 2012. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6226797.
[2]D. Rossi, C. Testa, and S. Valenti. Yes, we ledbat: playing with the new bittorrent congestion control algorithm. In Passive and Active Measurement (PAM‘10). Zurich, Switzerland, April 2012. http://www.pam2010.ethz.ch/papers/full-length/4.pdf.
]]> Wed, 05 Sep 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/31/getting_a_better_understanding_of_the_internet_via_the_lenses_of_ripe_atlas.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/31/getting_a_better_understanding_of_the_internet_via_the_lenses_of_ripe_atlas.html <![CDATA[Getting a better understanding of the Internet via the lenses of RIPE Atlas]]> Getting a better understanding of the Internet via the lenses of RIPE Atlas

The Internet is probably the most complex system to have been built by humans. All the devices and software that compose the Internet interact in various ways. Most of the time, these interactions are positive and help improve network performance. However, some interactions can cause loss of connectivity, decreased performance or other types of problems.

Understanding all the factors that affect the performance of the Internet is complex. One way to approach the problem is to collect measurements about various metrics such as delay, bandwidth or paths through the network. Several research projects and companies are currently collecting large amounts of data about the Internet.

A very interesting project is RIPE Atlas. RIPE is a non-profit organisation, mainly composed of network operators, whose objective is to allocate IP addresses in Europe. In addition to this address allocation activity, they also carry out various projects that are useful for their members. Atlas is one of their recent projects. To obtain a better understanding of the performance and the connectivity of Internet nodes, RIPE engineers have developed a very small network probe that contains an embedded operating system, has an Ethernet plug and is powered via USB. A photo of the Atlas probe is available here.

This embedded system has low power and low performance, but it can be deployed at a large scale. As of this writing, more than 1800 probes are connected to the Internet and new ones are added on a daily basis. This large number of nodes places RIPE in a very good position to collect valuable data about the performance of the network since the probes can run various types of measurements, including ping and traceroute. As of this writing, Atlas is mainly used to check network connectivity, and Atlas hosts can request their own measurements. In the future, it can be expected that Atlas hosts will be able to program various forms of measurements; RIPE has developed a credit system that allows hosts to obtain credits based on the number of Atlas probes that they host.

Atlas already covers a large fraction of the Internet. You can check on https://atlas.ripe.net/ which probes have been activated near your location. If you live in an area without Atlas probes and have permanent Internet connectivity, you can apply on https://atlas.ripe.net/pre-register/

]]>
Fri, 31 Aug 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/30/dns_injection_can_pollute_the_entire_internet.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/30/dns_injection_can_pollute_the_entire_internet.html <![CDATA[DNS injection can pollute the entire Internet]]> DNS injection can pollute the entire Internet

The Domain Name System is one of the key applications on the Internet since it enables the translation of domain names into IP addresses. There are many usages of the DNS and unfortunately some abuses as well. Since the DNS maps domain names onto IP addresses, a simple attack on the DNS consists in providing incorrect answers to DNS queries. This can be performed by attackers wishing to launch a man-in-the-middle attack, but also by some ISPs and governments to block some websites. Various countries have laws that force ISPs to block specific websites, for various purposes. In Belgium, this technique has been used several times to block a small number of websites, see e.g. http://blog.rootshell.be/2011/10/04/the-great-firewall-of-belgium-is-back/

Some countries have more ambitious goals than Belgium and try to block a large number of websites. China is a well-known example with the Great Firewall of China. One of the techniques used in China is DNS injection. In a few words, a DNS injector is a device that is strategically placed inside the network to capture DNS requests. Every time the injector sees a DNS request that matches a blocked domain, it sends a fake DNS reply containing invalid information. A recent article published in SIGCOMM CCR analyses the impact of such injectors [1]. Surprisingly, DNS injectors can lead to pollution of DNS resolvers outside of the country where the injection takes place. This is partially because the DNS is a hierarchy and when a resolver sends a DNS request, it usually needs to query top-level domain name servers. When the path to such a server passes through a DNS injector, even if the actual data traffic would not pass through the DNS injector later, the injector will inject a fake reply and the website will not be reachable by any client of this resolver during the lifetime of the cached information. The analysis presented in the paper shows that this DNS injection technique can pollute a large number of resolvers abroad. The article reports that domains belonging to .de are affected by the Great Firewall of China.
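A simple way to observe this kind of pollution is to ask several resolvers the same question and compare their answers; the sketch below does this with the dnspython library (version 2 or later), and the resolver addresses and the queried name are purely illustrative.

# Compare the answers that different resolvers return for the same name; a
# resolver whose queries cross a DNS injector may return a fake address.
import dns.resolver

def answers(name, resolver_ip, rrtype="A"):
    res = dns.resolver.Resolver(configure=False)
    res.nameservers = [resolver_ip]
    try:
        return sorted(rr.to_text() for rr in res.resolve(name, rrtype))
    except Exception as exc:
        return ["error: %s" % exc]

for ip in ("8.8.8.8", "9.9.9.9"):          # example open resolvers
    print(ip, answers("www.example.org", ip))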

Figure: DNS injection (source: [1])

The article on The Collateral Damage of Internet Censorship by DNS Injection should become a must-read for ISP operators who are forced by their governments to deploy DNS injectors. In parallel, this should also be a strong motivation to deploy DNSSEC, which will enable resolvers to detect and ignore these DNS injections.

[1](1, 2) Anonymous. The collateral damage of internet censorship by dns injection. SIGCOMM Comput. Commun. Rev., 42(3):21–27, 2012. http://doi.acm.org/10.1145/2317307.2317311.
]]> Thu, 30 Aug 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/29/interesting_networking_magazines.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/29/interesting_networking_magazines.html <![CDATA[Interesting networking magazines]]> Interesting networking magazines

During their studies, students learn the basics of networking. After one or maybe two networking courses, they know a very small subset of the networking field. This subset should allow them to continue learning new protocols, architectures and algorithms. However, finding information about these new techniques can be difficult. Here is a short list of magazines and journals that regularly publish interesting information about the evolution of the networking field.

A first source of information is magazines. There are various trade press magazines that usually present new products and provide information on new commercial deployments. A networking student should not limit his/her networking knowledge to the information published in the trade press. Here is a subset of the magazines that I often try to read :

  • the Internet Protocol Journal published by Cisco is a very useful and very interesting source of tutorial papers. Each issue usually contains articles about new Internet protocols written by an expert. The papers are easy to read and do not contain marketing information.
  • the IETF Journal published by the Internet Society publishes short papers about the evolution of the protocols standardized by the IETF

These two magazines are freely available to anyone.

Scientific societies also publish magazines with networking content. These magazines are not available at the newsstand, but good university libraries should store them.

The IEEE publishes several magazines with some networking content :

  • IEEE Network Magazine publishes tutorial papers in the broad networking field.
  • IEEE Communications Magazine publishes tutorial articles on various topics of interest to telecommunication engineers, sometimes with networking sections or articles
  • IEEE Internet Computing Magazine publishes articles mainly on Internet-based applications with sometimes articles on lower layers
  • IEEE Security and Privacy publishes articles on new advances in security and privacy in general. Some of the published articles are related to networking problems.

The ACM publishes various magazines and journals that cover most areas of Computer Science.

  • Communications of the ACM is ACM’s main magazine. This magazine is an essential read for any computer scientist willing to track the evolution of his/her scientific field. It sometimes publishes networking articles. ACM Queue is a special section of the magazine that is devoted to more practically oriented articles. It has published very interesting networking articles and furthermore all content on ACM Queue is publicly available.

If you read other networking magazines that are of interest to networking students, let me know. I will cover networking conferences and networking journals in subsequent posts.

]]>
Wed, 29 Aug 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/27/multipath_tcp___beyond_grandmother__s_tcp.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/27/multipath_tcp___beyond_grandmother__s_tcp.html <![CDATA[Multipath TCP : beyond grandmother's TCP]]> Multipath TCP : beyond grandmother’s TCP

TCP, the Transmission Control Protocol, is the default transport protocol in the TCP/IP protocol suite. TCP is essentially based on research carried out during the 1970s that resulted in the publication of RFC 793. Since then, TCP has slowly evolved. A major step in TCP’s history is the congestion control scheme proposed and implemented by Van Jacobson [2]. RFC 1323, proposed a few years later, extended TCP to support larger windows, a key extension for today’s high-speed networks. Various changes have been included in TCP over the years and RFC 4614 provides pointers to many of the TCP standardisation documents.

In the late 1990s and early 2000s, the IETF developed the Stream Control Transmission Protocol (SCTP) RFC 4960. Compared to TCP, SCTP is a more advanced transport protocol that provides a richer API to the application. SCTP was also designed with multihoming in mind and can support hosts with multiple interfaces, something that TCP cannot easily do. Unfortunately, as of this writing, SCTP has not yet been widely deployed despite being implemented in major operating systems. The key difficulties in deploying SCTP appear to be :

  • the lack of SCTP support in middleboxes such as NAT, firewalls, ...
  • the need to update applications to support SCTP, although this is changing with RFC 6458 (a minimal sketch of such a change follows this list)
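
To give an idea of the second point, here is a minimal sketch (not taken from any of the cited documents) of what the application-level change can look like on a Linux host whose kernel provides SCTP support. In the simplest one-to-one style defined in RFC 6458, switching a stream-oriented client from TCP to SCTP is essentially a matter of requesting IPPROTO_SCTP when the socket is created; the destination below is a placeholder.

    import socket

    # Plain TCP client socket (the traditional code)
    tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

    # One-to-one style SCTP socket (RFC 6458): same stream-oriented API,
    # but the kernel must provide SCTP support (e.g. the Linux sctp module),
    # otherwise this call fails.
    sctp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_SCTP)

    # Both sockets are then used in the same way; the address is a placeholder.
    # tcp_sock.connect(("server.example.net", 8080))
    # sctp_sock.connect(("server.example.net", 8080))

Of course, exploiting the richer SCTP features such as multi-streaming or multihoming requires additional socket options and calls beyond this one-line change.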

SCTP seems to be stuck in a classical chicken and egg problem. As there are not enough SCTP applications, middlebox vendors do not support it, and application developers do not use SCTP since middleboxes do not support it. Multipath TCP is a major extension to TCP whose specification is currently being finalised by the MPTCP working group of the IETF. Multipath TCP allows a TCP connection to be spread over several interfaces during the lifetime of the connection. Multipath TCP has several possible use cases :

  • datacenters, where Multipath TCP allows the load to be better spread among all available paths [4]
  • smartphones, where Multipath TCP allows WiFi and 3G to be used at the same time [3]

The design of Multipath TCP [1] has been more complicated than expected due to the difficulty of supporting various types of middleboxes [5], but the protocol is now ready and you can even try our implementation in the Linux kernel from http://www.multipath-tcp.org
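
One attractive property of Multipath TCP is that, unlike SCTP, it does not require any change to the applications. With the multipath-tcp.org kernel, an ordinary TCP socket can transparently become a Multipath TCP connection (depending on how the kernel is configured) when the remote peer also supports the extension, and falls back to regular TCP otherwise. The sketch below is therefore nothing more than plain TCP client code; the destination is a placeholder.

    import socket

    # Ordinary TCP client code: nothing here refers to Multipath TCP.
    # On a host running an MPTCP-capable kernel (e.g. the multipath-tcp.org
    # kernel), this connection can transparently use several interfaces,
    # provided the remote peer also supports Multipath TCP.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("multipath-tcp.org", 80))  # placeholder destination
    sock.sendall(b"HEAD / HTTP/1.0\r\nHost: multipath-tcp.org\r\n\r\n")
    print(sock.recv(4096).decode(errors="replace"))
    sock.close()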

[1] Alan Ford, Costin Raiciu, Mark Handley, and Olivier Bonaventure. TCP Extensions for Multipath Operation with Multiple Addresses. Internet draft, draft-ietf-mptcp-multiaddressed-07, March 2012.
[2] V. Jacobson. Congestion Avoidance and Control. In Proceedings of the Symposium on Communications Architectures and Protocols, SIGCOMM ’88, pages 314–329, New York, NY, USA, 1988. ACM.
[3] Christoph Paasch, Gregory Detal, Fabien Duchene, Costin Raiciu, and Olivier Bonaventure. Exploring Mobile/WiFi Handover with Multipath TCP. In ACM SIGCOMM Workshop on Cellular Networks (CellNet’12), 2012.
[4] Costin Raiciu, Sebastien Barre, Christopher Pluntke, Adam Greenhalgh, Damon Wischik, and Mark Handley. Improving Datacenter Performance and Robustness with Multipath TCP. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM ’11, pages 266–277, New York, NY, USA, 2011. ACM.
[5] Costin Raiciu, Christoph Paasch, Sebastien Barre, Alan Ford, Michio Honda, Fabien Duchene, Olivier Bonaventure, and Mark Handley. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In USENIX Symposium on Networked Systems Design and Implementation (NSDI’12), San Jose, CA, 2012.
]]> Mon, 27 Aug 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/21/anatomy.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/08/21/anatomy.html <![CDATA[Anatomy of a Large European IXP]]> Anatomy of a Large European IXP

The Internet is still evolving and measurements allow us to better understand its evolution. In [3], Craig Labovitz and his colleagues used extensive measurements to show the growing importance of large content providers such as Google and Yahoo and of content distribution networks such as Akamai. This paper forced researchers to reconsider their assumptions about the organisation of the Internet as a hierarchy of fully meshed Tier-1 ISPs serving Tier-2 ISPs that in turn serve Tier-3 ISPs and content providers. [3] showed that many of the traffic sources are now directly connected to the last-mile ISPs.

In a recent paper [1] presented during SIGCOMM 2012, Bernhard Ager and his colleagues used statistics collected at one of the largest Internet eXchange Points (IXP) in Europe. IXPs are locations where various Internet providers place routers to exchange traffic via a switched network. While the first IXPs were simple Ethernet switches in the back of a room, current IXPs are huge. For example, AMS-IX gathers more than 500 different ISPs on more than 800 different ports. As of this writing, its peak traffic has been larger than 1.5 Terabits per second ! IXPs play a crucial role in the distribution of Internet traffic, in particular in Europe where there are many large IXPs. [1] highlights many previously unknown aspects of these IXPs, such as the large number of peering links established at the IXP, the application mix and the traffic matrices. [1] will become a classic paper for those who want to understand the organisation of the Internet.

[1] B. Ager, N. Chatzis, A. Feldmann, N. Sarrar, S. Uhlig, and W. Willinger. Anatomy of a Large European IXP. In SIGCOMM 2012, Helsinki, Finland, August 2012.
[2] H. Babiker, I. Nikolova, and K. Chittimaneni. Deploying IPv6 in the Google Enterprise Network: Lessons Learned. In USENIX LISA 2011, 2011.
[3] Craig Labovitz, Scott Iekel-Johnson, Danny McPherson, Jon Oberheide, and Farnam Jahanian. Internet Inter-Domain Traffic. SIGCOMM Comput. Commun. Rev., 41(4), August 2010.
[4] T. Tsou, D. Lopez, J. Brzozowski, C. Popoviciu, C. Perkins, and D. Cheng. Exploring IPv6 Deployment in the Enterprise: Experiences of the IT Department of Futurewei Technologies. IETF Journal, June 2012.
]]> Tue, 21 Aug 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/23/ipv6_deployment.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/23/ipv6_deployment.html <![CDATA[Deploying IPv6 in enterprise networks]]> Deploying IPv6 in enterprise networks

The Internet is slowly moving towards IPv6. IPv6 traffic is now growing and more and more enterprise networks are migrating to IPv6, but migrating all enterprise networks will be slow. One of the main difficulties faced by network administrators when migrating to IPv6 is that it is not sufficient to migrate the hosts (most operating systems already support IPv6) and configure the routers as dual-stack. All devices that process packets need to be verified or updated to support IPv6.
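
On the host side, one way to keep the applications out of the critical path of such a migration is to write them in an address-family-agnostic manner. The sketch below is a generic illustration (not taken from the reports cited in this post): it uses getaddrinfo so that the same client code works over IPv6 or IPv4, whichever the resolver and the network provide; the destination name and port are placeholders.

    import socket

    def connect_any(host, port):
        """Try every address returned by the resolver (AAAA and A records)
        and return the first socket that connects, regardless of family."""
        last_error = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
            try:
                s = socket.socket(family, socktype, proto)
                s.connect(sockaddr)
                return s
            except OSError as exc:
                last_error = exc
        raise last_error or OSError("no address found for %s" % host)

    # Placeholder destination: uses IPv6 when available, IPv4 otherwise.
    conn = connect_any("www.example.org", 80)
    print("connected via", conn.family)
    conn.close()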

Network administrators who perform IPv6 migrations sometimes document their findings in articles. These articles are interesting for other network administrators who might face similar problems.

In [2], Google engineers explain the issues that they faced when adding IPv6 support to the enterprise networks on Google premises. Surprisingly, one of the main bottlenecks in this migration was the support of IPv6 on the transparent proxies that they use to accelerate web access.

In [4], researchers and network administrators from FutureWei report their experience in adding IPv6 connectivity to their network. The report is brief on enabling IPv6 itself, but also discusses more recent solutions that are being developed within the IETF.

Another interesting viewpoint is the one discussed in RFC 6586. In this RFC, Jari Arkko and Ari Keranen report all the issues that they faced while running an IPv6-only network in their lab and using it on a daily basis to access an Internet that is still mainly IPv4.

[2] H. Babiker, I. Nikolova, and K. Chittimaneni. Deploying IPv6 in the Google Enterprise Network: Lessons Learned. In USENIX LISA 2011, 2011.
[4] T. Tsou, D. Lopez, J. Brzozowski, C. Popoviciu, C. Perkins, and D. Cheng. Exploring IPv6 Deployment in the Enterprise: Experiences of the IT Department of Futurewei Technologies. IETF Journal, June 2012.
]]> Mon, 23 Jul 2012 00:00:00 +0200 http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/19/middlexboxes.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/19/middlexboxes.html <![CDATA[Don't ignore the middleboxes]]> Don’t ignore the middleboxes

Traditional networks contain routers, switches, clients and servers. Most introductory networking textbooks focus on these devices and the protocols that they use. However, real networks can be much more complex than the typical academic networks that are considered in textbooks. During the last decade, enterprise networks have included more and more middleboxes. A middlebox can be roughly defined as a device that resides inside the network and is able both to forward packets (like a router or a switch) and to modify them. For this reason, middleboxes are often considered as layer-7 relays, but they are not officially part of the Internet architecture. These middleboxes are usually deployed by network operators to better control the traffic in their network or to improve its performance. There exist various types of middleboxes RFC 3234. The most common ones are :

  • Network Address Translators (NAT) that rewrite IP addresses and port numbers (see the sketch after this list)
  • Firewalls that control the incoming and outgoing packets
  • Network Intrusion Detection Systems that analyse packet payloads to detect possible attacks
  • Load balancers that distribute the load among several servers
  • WAN optimizers that compress packets before transmitting them over expensive low bandwidth links
  • Media gateways that are able to transcode voice and video formats
  • transparent proxy caches that speed up access to remote web servers by maintaining caches
  • ...
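
As a small illustration of the first item, the sketch below shows why NATs are visible to transport protocols: the address and port that an application observes locally are not necessarily those that the remote peer sees, because the NAT rewrites them on the way out. The destination is a placeholder; running this behind a NAT typically prints a private address.

    import socket

    # Connect to any public server (placeholder destination).
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.example.org", 80))

    # The address/port pair that the application sees locally...
    print("local view :", s.getsockname())
    # ...is rewritten by the NAT: behind a NAT, getsockname() returns a
    # private address (e.g. 192.168.x.y), while the server observes the
    # NAT's public address and possibly a different port. Discovering that
    # public mapping requires outside help, e.g. a STUN server.
    s.close()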

The list of middleboxes keeps growing and managing them in addition to the routers and the switches is becoming a concern for enterprise network operators. In a recent paper presented at USENIX NSDI12, Vyas Sekar and colleagues describe a survey that they performed in an anonymous enterprise network. This network contained about 900 routers and more than 600 middleboxes ! The table below summarises the appliances that they found.

Appliance type                        Number
Firewall                              166
Network Intrusion Detection System    127
Conferencing/Media gateway            110
Load balancers                        67
Proxy caches                          66
VPN devices                           45
WAN optimizers                        44
Voice gateways                        11
Routers                               about 900
]]>
Thu, 19 Jul 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/17/internet_topology_zoo.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/17/internet_topology_zoo.html <![CDATA[Internet Topology Zoo]]> Internet Topology Zoo

A recent article published on Slate provided nice artistic views of the layout of the optical fibers that are used for the Internet.

Researchers have spent a lot of time collecting data about ISP networks during the last decade. If you are looking for nice maps of real networks, I encourage you to have a look at the Internet Topology Zoo. This website, maintained by researchers from the University of Adelaide, contains maps of a few hundred networks, an excellent starting point if you would like to understand a bit more about how ISP networks are designed.
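
The Topology Zoo datasets are distributed in standard graph formats such as GraphML, so they are easy to explore programmatically. A minimal sketch, assuming you have downloaded one of the GraphML files (the filename below is hypothetical) and installed the networkx library:

    import networkx as nx

    # Hypothetical filename: any GraphML file downloaded from the Topology Zoo
    g = nx.read_graphml("SomeNetwork.graphml")

    print("nodes :", g.number_of_nodes())
    print("links :", g.number_of_edges())
    # Node attributes in these files typically include a human-readable label.
    for node, data in list(g.nodes(data=True))[:5]:
        print(node, data.get("label"))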

If you are more interested in the layout of cables, notably submarine cables, you can also check the geographical maps provided by TeleGeography.

]]>
Tue, 17 Jul 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/adaptive_queue_management.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/adaptive_queue_management.html <![CDATA[Controlling Queueing delays]]> Controlling Queueing delays

Routers use a buffer to store the packets that have arrived but have not yet been retransmitted on their output links. These buffers play an important role in combination with TCP’s congestion control scheme, since TCP uses packet losses to detect congestion. To manage their buffers, routers rely on a buffer acceptance algorithm. The simplest buffer acceptance algorithm is to discard packets as soon as the buffer is full. This algorithm can be easily implemented, but simulations and measurements have shown that it does not always provide good performance with TCP.

In the 1990s, various buffer acceptance algorithms were proposed to overcome this problem. Random Early Detection (RED) probabilistically drops packets when the average buffer occupancy becomes too high. RED has been implemented on routers and has been strongly recommended by the IETF in RFC 2309. However, as of this writing, RED is still not widely deployed. One of the reasons is that RED uses many parameters and is difficult to configure and tune correctly (see the references listed on http://www.icir.org/floyd/red.html).
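
To make the discussion a bit more concrete, here is a simplified sketch of RED’s drop decision. It omits several refinements of the published algorithm, such as the count-based adjustment of the drop probability and the handling of idle periods, and the parameter values are arbitrary examples.

    import random

    # Example parameters (arbitrary values, to be tuned per link)
    W      = 0.002   # weight of the exponential moving average
    MIN_TH = 5       # minimum threshold (packets)
    MAX_TH = 15      # maximum threshold (packets)
    MAX_P  = 0.1     # maximum drop probability

    avg = 0.0        # average queue occupancy

    def red_accept(current_queue_len):
        """Return True if the arriving packet is accepted, False if dropped."""
        global avg
        # Exponentially weighted moving average of the queue occupancy
        avg = (1 - W) * avg + W * current_queue_len
        if avg < MIN_TH:
            return True                  # low occupancy: always accept
        if avg >= MAX_TH:
            return False                 # high occupancy: always drop
        # In between: drop with a probability that grows linearly with avg
        p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() >= p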

In a recent paper published in ACM Queue, Kathleen Nichols and Van Jacobson propose a new Active Queue Management algorithm. The new algorithm measures the waiting time of each packet in the buffer, and its control law depends on the minimum queueing delay observed over a recent interval rather than on the buffer occupancy. An implementation for Linux-based routers seems to be in progress. Maybe it’s time to revisit buffer acceptance algorithms again...
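
The idea can be sketched as follows. This is a deliberately simplified illustration, not the algorithm published in the ACM Queue paper (which, in particular, paces its drops with a control law): timestamp packets when they are enqueued, measure their sojourn time when they are dequeued, and start dropping only when even the smallest sojourn time observed during a whole interval stays above a small target.

    import time
    from collections import deque

    TARGET   = 0.005   # 5 ms acceptable standing queueing delay (example value)
    INTERVAL = 0.100   # 100 ms observation window (example value)

    queue = deque()              # (enqueue_timestamp, packet) pairs
    first_above_target = None    # when the sojourn time first exceeded TARGET

    def enqueue(packet):
        queue.append((time.monotonic(), packet))

    def dequeue():
        """Return the next packet to transmit, dropping a packet when the
        queueing delay has stayed above TARGET for a whole INTERVAL."""
        global first_above_target
        while queue:
            enqueued_at, packet = queue.popleft()
            sojourn = time.monotonic() - enqueued_at
            if sojourn < TARGET or not queue:
                first_above_target = None    # delay is acceptable again
                return packet
            if first_above_target is None:
                first_above_target = time.monotonic()
            if time.monotonic() - first_above_target < INTERVAL:
                return packet                # above target, but not yet for long enough
            # The delay stayed above target for a full interval: drop this
            # packet and restart the observation. (The real algorithm keeps
            # dropping at a gradually increasing rate instead; this sketch
            # omits that control law.)
            first_above_target = None
        return None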

]]>
Mon, 16 Jul 2012 00:00:00 +0200
http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/unicode_is_growing.html http://perso.uclouvain.be/olivier.bonaventure/blog/html/2012/07/16/unicode_is_growing.html <![CDATA[Unicode is growing]]> Unicode is growing

The Internet was created using the 7-bit US-ASCII character set. Over the years, the internationalisation of the Internet forced protocol designers to reconsider the utilisation of this 7-bit character set. A first move was the utilisation of 8-bit character sets. Unicode then became the unifying standard that can encode all written languages. A recent article in IEEE Spectrum provides interesting data about the progression of Unicode on web servers. See http://spectrum.ieee.org/telecom/standards/will-unicode-soon-be-the-universal-code
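
A small illustration of why 7-bit US-ASCII had to be abandoned: it simply cannot represent most written languages, whereas UTF-8, the dominant Unicode encoding on the web, can, while remaining byte-compatible with ASCII for the characters that ASCII does cover.

    text = "café – 東京"

    # 7-bit US-ASCII only has 128 code points and cannot represent this string
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        print("ASCII cannot encode it:", exc)

    # UTF-8 can encode any Unicode code point...
    encoded = text.encode("utf-8")
    print(len(text), "characters become", len(encoded), "bytes in UTF-8")

    # ...while staying identical to ASCII for pure-ASCII text
    assert "cafe".encode("utf-8") == "cafe".encode("ascii")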

]]>
Mon, 16 Jul 2012 00:00:00 +0200