Concrete steps to improve the reproducibility of networking research

Schloss Dagstuhl – Leibniz Center for Informatics is a well-known and important place for Computer Science. Since 1990, it has been a meeting place where Computer Science researchers spend a few days or a week discussing interesting research issues in a small castle in southern Germany. Dagstuhl is well known for the high quality of its seminars.

Last week, together with kc Claffy, Daniel Karrenberg and Vaibhav Bajpai, I had the pleasure of organising a Dagstuhl seminar on Encouraging Reproducibility in Scientific Research of the Internet. Like many fields of science, networking research has reproducibility issues. During the seminar, the thirty participants discussed several very concrete steps to improve the reproducibility of networking research. This is a long-term objective that the community needs to tackle step by step to achieve a sustainable solution.

../../../_images/dagstuhl.jpg

Several excellent ideas were discussed during the seminar and some of them will materialise in the coming months. The first one is a review form for research artifacts that will be used for the evaluation of the artifacts of SIGCOMM-sponsored conferences in the coming weeks. It will probably be followed by a set of guidelines to help young researchers carry out reproducible research, and by other more long-term ideas.

A printed ebook

In 2008, UCLouvain agreed to reduce my teaching load to let me concentrate on writing the open-source Computer Networking: Principles, Protocols and Practice ebook. Since then, this ebook has served as the required textbook for basic networking courses at UCLouvain and at various other universities. The ebook is completely open and available under a Creative Commons license. This license enables anyone, including for-profit printing companies, to distribute and sell copies of the book. Honestly, I had never checked whether someone had decided to take the financial risk of printing copies of the book. With today's print-on-demand solutions, the risk is very small.

While attending a Dagstuhl seminar on Encouraging Reproducibility in Scientific Research of the Internet, I visited their famous library, probably one of the best Computer Science libraries. I was delighted to see that they had a bound copy of the first edition of Computer Networking: Principles, Protocols and Practice. This was the first time that I had seen a printed copy of this book.

../../../_images/cnp3a.jpg ../../../_images/cnp3b.jpg

This printed version has been published by Textbook Equity.

A blog to complement university courses

When I was a student, university courses were an opportunity for the professor to teach all the important principles of a given topic to the students who registered for the course. At that time, students relied almost exclusively on the course syllabus or one reference book. They rarely went to the library to seek additional information on the topics discussed by the professor. This forced the professor to be as complete as possible and cover all the important topics during the classes.

Today's professors have a completely different job. Given the vast amount of information that is available to all students over the Internet, university courses have become a starting point that guides students in their exploration of the course topic. It remains important to teach the key principles to the students, but it is equally important to encourage them to explore the field by themselves. There are several activities that professors can organise in their classes to encourage the students to go further. For example, my networking course is based on the open-source Computer Networking: Principles, Protocols and Practice ebook. Initially, the ebook was distributed as a pdf file. The students were satisfied with the contents of the ebook, but they almost never spent time in the library looking at the books and articles referenced in the bibliography. This changed dramatically in 2011 after I modified the bibliography to include clickable URLs for most cited references. Since then, I have observed that more and more students spend time looking at some of the references, including RFCs, to better understand specific parts of the course.

Another activity that I organise within the networking course to encourage students to explore the field is the detailed analysis of a popular website that each student has to carry out. During the last month of the semester, i.e. once the students have understood the basics of computer networking and some of the key protocols, each student has to apply his/her knowledge by writing a detailed four-page report that analyses the operation of a popular website. During the course, the students learn the basics of DNS, TLS, HTTP, TCP and IPv6, and they mobilise this knowledge to understand the protocol optimisations deployed by popular websites. They use standard tools such as the developer extensions of web browsers, dig, traceroute, wireshark, tcpdump, or openssl to interact with the website and analyse the protocol optimisations that it supports. During this analysis, they often see unexpected results that force them to understand one of these protocols in more detail by looking at tutorials on the web, scientific articles or Internet drafts and RFCs. With this kind of activity, the students gain a more in-depth knowledge of the Internet protocols that are explained during the course. More importantly, they also learn to find accurate technical information on the web, which is a very important skill for any computer scientist.
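For readers who would like to try a similar analysis, here is a minimal Python sketch of two of the checks that the students typically perform by hand with dig and openssl: resolving a site over both address families and inspecting the negotiated TLS parameters. The hostname is only a placeholder.

import socket
import ssl

# Placeholder hostname; replace it with the website being analysed.
HOST = "www.example.org"

# DNS: list the IPv4 (A) and IPv6 (AAAA) addresses returned for the site.
for family, name in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
    try:
        infos = socket.getaddrinfo(HOST, 443, family, socket.SOCK_STREAM)
        print(name, sorted({info[4][0] for info in infos}))
    except socket.gaierror:
        print(name, "no address returned")

# TLS: connect on port 443 and report the negotiated version and cipher.
ctx = ssl.create_default_context()
with socket.create_connection((HOST, 443), timeout=5) as raw:
    with ctx.wrap_socket(raw, server_hostname=HOST) as tls:
        print("TLS version:", tls.version())
        print("Cipher:", tls.cipher()[0])
        print("Certificate subject:", tls.getpeercert()["subject"])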

The exam is an important event for the students. It confirms that they have mastered the topic. However, the topics that were discussed during the course continue to evolve after the exam. While the basic principles of computer networking are stable, Internet protocols continue to evolve at a rapid pace. Various updates have been made to the Computer Networking: Principles, Protocols and Practice ebook. This ensures that future students will use up-to-date material to start their exploration of the networking field. However, former students are also interested in the evolution of the field and do not want to wait for the next edition of the ebook. For them, I have launched a companion blog for the ebook. On this blog, I summarise recent news, articles, or Internet drafts that could affect the evolution of the field. This blog is also available as an RSS feed.

TLS or HTTPS everywhere is not necessarily the right answer

Since the revelations about massive surveillance by Edward Snowden, we have observed a strong move towards increasing the use of encryption to protect the end-to-end traffic exchanged by Internet hosts. Various Internet stakeholders have made strong moves towards recommending strong encryption, e.g.:

  • The IETF has confirmed in RFC 7258 that pervasive monitoring is an attack and needs to be countered
  • The EFF has promoted the utilisation of HTTPS through the HTTPS-everywhere campaign and browser extension
  • The Let’s Encrypt campaign prepares a new certification authority to ease the utilisation of TLS
  • Mozilla has announced plans to deprecate non-secure HTTP
  • Most large web companies have announced plans to encrypt traffic between their datacenters

Pervasive monitoring is not desirable and researchers should aim at countering it, but encrypting everything is not necessarily the best solution. As an Internet user, I am also very concerned by the massive surveillance that is conducted by various commercial companies.

http://arstechnica.com/security/2013/11/encrypt-all-the-worlds-web-traffic-internet-architects-propose/

Segment Routing in the Linux kernel

Segment Routing is a new packet forwarding technique that is being developed by the SPRING working group of the IETF. Until now, the IETF protocols supported two packet forwarding techniques:

  • datagram mode with IPv4 and IPv6
  • label swapping with MPLS

Segment Routing is a modern realisation of source routing, which was supported by IPv4 in RFC 791 and initially by IPv6 in RFC 2460. Source routing enables a source to indicate, inside each packet that it sends, a list of intermediate nodes to traverse before reaching the final destination. Although rather old, this technique is not widely used today because it causes several security problems. For IPv6, various attacks against source routing were demonstrated in 2007. In the end, the IETF chose to deprecate source routing in IPv6 in RFC 5095.
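To make the idea of a segment list more concrete, here is a minimal Python sketch of how a node could process a packet that carries such a list, loosely inspired by the IPv6 segment routing header. The field names and addresses are purely illustrative and do not reflect the actual packet format.

# Illustrative model of source routing: the packet carries a list of
# segments (last segment first, as in the IPv6 segment routing header)
# and a "segments left" counter. Each listed node rewrites the destination
# address and decrements the counter before forwarding the packet.
def process_at_segment_endpoint(packet):
    """Run by the node whose address matches the packet's current destination."""
    if packet["segments_left"] == 0:
        return "deliver locally"                      # final destination reached
    packet["segments_left"] -= 1
    packet["dst"] = packet["segments"][packet["segments_left"]]
    return "forward towards " + packet["dst"]

packet = {
    "dst": "2001:db8::a",                             # first segment to visit
    "segments": ["2001:db8::1", "2001:db8::b", "2001:db8::a"],
    "segments_left": 2,                               # index of the next segment
}
print(process_at_segment_endpoint(packet))            # forward towards 2001:db8::b
print(process_at_segment_endpoint(packet))            # forward towards 2001:db8::1
print(process_at_segment_endpoint(packet))            # deliver locally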

However, source routing has several very useful applications inside a controlled network such as an enterprise or a single ISP network. For this reason, the IETF has revived source routing and considers two data planes:

  • IPv6
  • MPLS

Evolution of link bandwidths

During my first lesson of the undergrad networking class, I wanted to provide the students with some historical background on the evolution of link bandwidth. Fortunately, Wikipedia provides a very interesting page that lists most of the standards for modems, optical fibers, …

A first interesting plot is the evolution of the modems that allowed data to be transmitted over the traditional telephone network. The figure below, based on information extracted from http://en.m.wikipedia.org/wiki/List_of_device_bandwidths, shows the evolution of modem technology. The first method to transfer data was Morse code, which appeared in the mid-1800s. After that, it took more than a century to move to the Bell 101 modem, which was capable of transmitting data at 110 bits/sec. Slowly, 300 bps and later 1200 bps modems appeared. The late 1980s saw the arrival of faster modems at 9.6 kbps, later followed by 28.8 and 56 kbps, the highest bandwidth feasible on a traditional phone line. ISDN appeared in the late 1980s with a bandwidth of 64 kbps on digital lines, which was later doubled.
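For readers who want to reproduce such a plot, here is a minimal Python/matplotlib sketch based only on the data points quoted above; the years of introduction are approximate.

import matplotlib.pyplot as plt

# Approximate data points for telephone-line modems, as quoted in the text.
# The years of introduction are indicative only.
modems = [
    (1958, 110),     # Bell 101
    (1962, 300),     # 300 bps modems
    (1980, 1200),    # 1200 bps modems
    (1988, 9600),    # 9.6 kbps
    (1994, 28800),   # 28.8 kbps
    (1998, 56000),   # 56 kbps, the practical limit of analog phone lines
]

years, rates = zip(*modems)
plt.semilogy(years, rates, marker="o")
plt.xlabel("Approximate year of introduction")
plt.ylabel("Bandwidth (bit/s)")
plt.title("Evolution of telephone-line modem bandwidth")
plt.show()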

When the telephone network became the bottleneck, telecommunication manufacturers and network operators moved to various types of Digital Subscriber Line technologies, ADSL being the most widespread. From the early days at 1.5 Mbps downstream to the latest VDSL deployments, bandwidth has increased by almost two orders of magnitude. As of this writing, it seems that xDSL technology is reaching its limits: while bandwidth will continue to grow, the rate of improvement will not remain as high as in the past. In parallel, CATV operators have deployed various versions of the DOCSIS standards to provide data services on cable networks. The next step is probably to move to fiber-based solutions, but they cost more than an order of magnitude more than DSL services and can be difficult to deploy in rural areas.

The performance of wireless networks has also significantly improved. As an illustration, and again based on data from http://en.m.wikipedia.org/wiki/List_of_device_bandwidths, here is the theoretical maximum bandwidth for the various WiFi standards. From 2 Mbps for 802.11 in 1997, bandwidth increased to 54 Mbps for 802.11g in 2003 and 600 Mbps for 802.11n in 2009.

The datasets used in this post are partial. Suggestions for additional datasets that could be used to provide a more detailed view of the evolution of bandwidth are more than welcome. For optical fiber, an interesting figure appeared in Nature, see http://www.nature.com/nphoton/journal/v7/n5/fig_tab/nphoton.2013.94_F1.html

Flipping an advanced networking course

Before the beginning of the semester, Nick Feamster informed me that he had decided to flip his advanced networking course. Various teachers have opted for flipped classrooms to increase the interaction with students. Instead of using class time to present theory, the teacher focuses his or her attention during class on solving problems with the students. Various organisations of a flipped classroom have been tested. Often, the teacher posts short videos that explain the basic principles and the students have to watch these videos before attending the class. This is partially the approach adopted by Nick Feamster for his class.

By bringing more interaction into the classroom, the flipped approach is often considered to be more interesting for the teacher as well as for the students. Since my advanced networking class gathers only a few tens of students, compared to the 100+ and 300+ students of the other courses that I teach, I also decided to flip this course this year.

The advanced networking course is a follow-up to the basic networking course. It covers several advanced topics and aims at explaining to the students the operation of large Internet Service Provider networks. The main topics covered are:

  • Interdomain routing with BGP (route reflectors, traffic engineering, …)
  • Traffic control and Quality of Service (from basic mechanisms - shaping, policing, scheduling, buffer acceptance - to services - integrated or differentiated services)
  • IP Multicast and Multicast routing protocols
  • Multiprotocol Label Switching
  • Virtual Private Networks
  • Middleboxes

The course is complemented by projects during which the students configure and test realistic networks built from Linux-based routers.

During the last decade, I've taught this course by presenting slides to the students and discussing the theoretical material. I could have used some of them to record videos explaining the basic principles, but I'm still not convinced by the benefits of using online video as a learning vehicle. Video is nice for demonstrations and short introductory material, but students need written material to understand the details. For this reason, I've decided to opt for a seminar-type approach where the students read one or two articles every week to understand the basic principles. Then, the class focuses on discussing real cases or exercises.

Many courses are organised as seminars during which the students read recent articles and discuss them. Often, these are advanced courses in which graduate students read and comment on recent scientific articles. This approach was not applicable in my case given the maturity of the students who follow the advanced networking course. Instead of purely scientific articles, I've opted for tutorial articles that appear in magazines such as IEEE Communications Magazine or the Internet Protocol Journal. These articles are easier for the students to read and often provide good tutorial content with references that the students can exploit if they need additional information.

The course started a few weeks ago and the interaction with the students has been really nice. I'll regularly post updates on the articles that I've used, the exercises that have been developed and the students' reactions. Comments are, of course, welcome.

Happy eyeballs makes me unhappy…

Happy eyeballs, defined in RFC 6555, is a technique that enables dual-stack hosts to automatically select between IPv6 and IPv4 based on their respective performance. When a dual-stack host tries to contact a web server that is reachable over both IPv6 and IPv4, it proceeds roughly as follows (a small sketch in Python is given after the list):

  • first tries to establish a TCP connection towards the IPv6 or IPv4 address and starts a short timeout, say 300 msec
  • if the connection is established over the chosen address family, it continues
  • if the timer expires before the establishment of the connection, a second connection is tried with the other address family
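As a rough illustration (not an actual RFC 6555 implementation), here is a minimal Python sketch of this fallback behaviour, assuming a fixed timer and a single attempt per address family; real implementations cache results and may run both attempts in parallel.

import socket

# Minimal happy-eyeballs-like sketch: try one address family first with a
# short timeout, then fall back to the other one on failure or timeout.
def connect_dual_stack(host, port, timeout=0.3):
    last_error = None
    for family in (socket.AF_INET6, socket.AF_INET):    # try IPv6 first
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.settimeout(timeout)                     # short timer, e.g. 300 msec
            try:
                sock.connect(infos[0][4])                # raises on timeout/failure
            except OSError:
                sock.close()
                raise
            sock.settimeout(None)
            return sock                                  # connected over this family
        except OSError as exc:
            last_error = exc                             # fall back to the other family
    raise last_error

# Example use, with a placeholder hostname:
# sock = connect_dual_stack("www.example.org", 80)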

Happy eyeballs works well when one of the two address families provides bad performance or is broken. In this case, a host using happy eyeballs will automatically avoid the broken address family. However, when both IPv6 and IPv4 work correctly, happy eyeballs may cause frequent switches between the two address families.

As an example, here is a summary of a packet trace that I collected when contacting a dual-stack web server from my laptop using the latest version of macOS.

First connection

09:40:47.504618 IP6 client6.65148 > server6.80: Flags [S], cksum 0xe3c1 (correct), seq 2500114810, win 65535, options [mss 1440,nop,wscale 4,nop,nop,TS val 1009628701 ecr 0,sackOK,eol], length 0
09:40:47.505886 IP6 server6.80 > client6.65148: Flags [S.], cksum 0x1abd (correct), seq 193439890, ack 2500114811, win 14280, options [mss 1440,sackOK,TS val 229630052 ecr 1009628701,nop,wscale 7], length 0

The interesting information in these packets is the TCP timestamps. Defined in RFC 1323, these timestamps are taken from a local clock. The server returns its current timestamp in the SYN+ACK segment.

Thanks to happy eyeballs, the next TCP connection is sent over IPv4 (it might be faster than IPv6, who knows). IPv4 works well and the server answers immediately:

09:40:49.512112 IP client4.65149 > server4.80: Flags [S], cksum 0xee77 (incorrect -> 0xb4bd), seq 321947613, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1009630706 ecr 0,sackOK,eol], length 0
09:40:49.513399 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 60) server4.80 > client4.65149: Flags [S.], cksum 0xd86f (correct), seq 873275860, ack 321947614, win 5792, options [mss 1380,sackOK,TS val 585326122 ecr 1009630706,nop,wscale 7], length 0

Note the TS val in the returning SYN+ACK. The value over IPv4 is much larger than over IPv6. This does not mean that one address family is faster than the other; it indicates that a load-balancer distributes the TCP connections between (at least) two different servers, each with its own timestamp clock.

Shortly after, I authenticated myself over an SSL connection that was established over IPv4

09:41:26.566362 IP client4.65152 > server4.443: Flags [S], cksum 0xee77 (incorrect -> 0x420d), seq 3856569828, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 1009667710 ecr 0,sackOK,eol], length 0
09:41:26.567586 IP server4.443 > client4.65152: Flags [S.], cksum 0x933e (correct), seq 3461360247, ack 3856569829, win 14480, options [mss 1380,sackOK,TS val 229212430 ecr 1009667710,nop,wscale 7], length 0

Again, a closer look at the TCP timestamps reveals that there is a third server that terminated the TCP connection. Apparently, in this case it was the load-balancer itself that forwarded the data extracted from the connection to one of the servers.
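One way to verify these conclusions from the trace is a quick sanity check on the timestamps: an RFC 1323 timestamp clock ticks at a fixed rate, typically between 1 Hz and 1000 Hz, so two SYN+ACKs from the same server must show a timestamp difference that is compatible with the elapsed wall-clock time. Here is a minimal Python sketch using the (approximate) values from this trace.

# Can two SYN+ACK segments come from the same server, given their TCP
# timestamps? A server's timestamp clock ticks at a fixed rate, assumed
# here to lie between 1 Hz and 1000 Hz.
def same_clock_possible(ts1, t1, ts2, t2, min_hz=1, max_hz=1000):
    """Could timestamps ts1 (seen at wall-clock t1) and ts2 (at t2) share one clock?"""
    elapsed = t2 - t1                  # seconds between the two observations
    delta = ts2 - ts1                  # increase of the timestamp value
    return min_hz * elapsed <= delta <= max_hz * elapsed

# Values from the trace, with t=0 at the first SYN+ACK (received over IPv6).
print(same_clock_possible(229630052, 0.0, 585326122, 2.0))    # False: another server
print(same_clock_possible(229630052, 0.0, 229212430, 39.0))   # False: not the first server
print(same_clock_possible(585326122, 2.0, 229212430, 39.0))   # False: nor the second one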

Thanks to happy eyeballs, my TCP connections reach different servers behind the load-balancer. This is annoying because the web servers maintain session state, and every time I switch from one connection to another I might switch from one server to another. In my experience, this happens randomly with this server, possibly as a function of the IP addresses that I'm using and the server load. As a user, I experience difficulties logging on to the server or random logouts, while the problem lies in unexpected interactions between happy eyeballs and a load balancer. The load balancer would like to stick all the TCP connections from one host to the same server, but due to the frequent switchovers between IPv6 and IPv4 it cannot stick each client to a server.

I’d be interested in any suggestions on how to improve this load balancing scheme without changing the web servers…

Sandstorm: even faster TCP

Researchers have worked on improving the performance of TCP implementations since the early 1980s. At that time, many researchers considered that achieving high performance with a software-based TCP implementation was impossible. Several new transport protocols were designed at that time, such as XTP. Some researchers even explored the possibility of implementing transport protocols in hardware. Hardware-based implementations are usually interesting to achieve high performance, but they are usually too inflexible for a transport protocol. In parallel with this effort, some researchers continued to believe in TCP. Dave Clark and his colleagues demonstrated in [1] that TCP stacks could be optimized to achieve high performance.

TCP implementations continued to evolve in order to achieve even higher performance. The early 2000s, with the advent of Gigabit interfaces, saw a better coupling between TCP and the hardware on the network interface. Many high-speed network interfaces can compute the TCP checksum in hardware, which reduces the load on the main CPU. Furthermore, high-speed interfaces often support large segment offload. A naive implementation of a TCP/IP stack would send segments and acknowledgements independently. For each segment sent or received, such a stack could need to process one interrupt, a very costly operation on current hardware. Large segment offload provides an alternative by exposing to the TCP/IP stack a large segment size, up to 64 KBytes. By sending and receiving larger segments, the TCP/IP stack minimizes the cost of processing interrupts and thus maximises its performance.
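As a rough, back-of-the-envelope illustration of why this matters, the following Python sketch compares the number of segments the stack has to handle for the same transfer with and without large segment offload; the transfer size is just an example.

# Order-of-magnitude illustration of large segment offload: with LSO the
# stack hands the interface 64 KByte segments instead of MSS-sized ones,
# so it performs far fewer per-segment operations for the same data.
DATA = 1_000_000_000           # a 1 GByte transfer
MSS = 1460                     # typical Ethernet MSS, in bytes
LSO_SEGMENT = 64 * 1024        # segment size exposed to the stack with LSO

print("segments without LSO:", DATA // MSS)           # about 685,000 segments
print("segments with LSO   :", DATA // LSO_SEGMENT)   # about 15,000 segments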


Broadband in Europe

The European Commission has recently published an interesting survey of the deployment of broadband access technologies in Europe. The analysis is part of the Commission's Digital Agenda, which aims at enabling home users to have speeds of 30 Mbps or more. Technologies like ADSL and cable modems are widely deployed in Europe and more than 95% of European households can subscribe to fixed broadband networks.

The report provides lots of data about the different technologies and countries. Some of the figures are worth highlighting. A first interesting figure is the distribution of the different advanced access technologies. Satellite is widely available given its large footprint. DSL is also widely available, but less so in rural areas. Newer technologies like VDSL, FTTP, WiMAX, LTE and DOCSIS 3 cable are starting to appear.

../../../_images/coverage.png

Broadband coverage in Europe, source: Study on broadband coverage 2012

For the standard fixed broadband technologies, the top three countries in terms of coverage are the Netherlands, Malta and Belgium. It remains simpler to deploy broadband widely in small countries. For next generation broadband access, the same three countries remain at the top, except that Malta has better coverage than the Netherlands. For rural regions, Luxembourg is the best country in terms of coverage.

../../../_images/coverage-map.png

The coverage map of next generation broadband coverage shows that some countries are much better covered than others. There is still work to be done by network operators to deploy advanced technologies and meet the 30 Mbps objective of the Digital Agenda.