May 22, 2013

Don’t use ping for accurate delay measurements

The ping software was designed decades ago to verify the reachability of a given IPv4 address. For this, it relies on ICMP, which runs on top of IPv4. A host that receives an ICMP echo request message is supposed to reply immediately by sending an ICMP echo reply message, which confirms the reachability of the remote host. By measuring the delay between the transmission of the echo request message and the reception of the echo reply message, it is possible to infer the round-trip-time between the two hosts. Since the round-trip-time is important for the performance of many Internet protocols, it is a useful metric, and ping reports it for each reply. Some variants of ping also report summary statistics, such as the minimum and maximum delays, after measuring a number of round-trip-times. A typical example is shown below:

ping www.psg.com
PING psg.com (147.28.0.62): 56 data bytes
bytes from 147.28.0.62: icmp_seq=0 ttl=48 time=148.715 ms
bytes from 147.28.0.62: icmp_seq=1 ttl=48 time=163.814 ms
bytes from 147.28.0.62: icmp_seq=2 ttl=48 time=148.780 ms
bytes from 147.28.0.62: icmp_seq=3 ttl=48 time=153.456 ms
bytes from 147.28.0.62: icmp_seq=4 ttl=48 time=148.935 ms
bytes from 147.28.0.62: icmp_seq=5 ttl=48 time=153.647 ms
bytes from 147.28.0.62: icmp_seq=6 ttl=48 time=148.682 ms
bytes from 147.28.0.62: icmp_seq=7 ttl=48 time=163.926 ms
bytes from 147.28.0.62: icmp_seq=8 ttl=48 time=148.669 ms
bytes from 147.28.0.62: icmp_seq=9 ttl=48 time=153.352 ms
bytes from 147.28.0.62: icmp_seq=10 ttl=48 time=163.688 ms
bytes from 147.28.0.62: icmp_seq=11 ttl=48 time=148.729 ms
bytes from 147.28.0.62: icmp_seq=12 ttl=48 time=163.691 ms
bytes from 147.28.0.62: icmp_seq=13 ttl=48 time=148.536 ms
^C
--- psg.com ping statistics ---
14 packets transmitted, 14 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 148.536/154.044/163.926/6.429 ms
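
The summary line can be reproduced from the individual samples. The sketch below is only an illustration: it parses the time= values from the output above with a regular expression and assumes that this ping computes a population standard deviation (dividing by n rather than n-1), which is what matches the reported figure. The names PING_OUTPUT and rtts are introduced here for the example.

```python
import re
import statistics

# Reply lines copied from the ping output shown above.
PING_OUTPUT = """\
64 bytes from 147.28.0.62: icmp_seq=0 ttl=48 time=148.715 ms
64 bytes from 147.28.0.62: icmp_seq=1 ttl=48 time=163.814 ms
64 bytes from 147.28.0.62: icmp_seq=2 ttl=48 time=148.780 ms
64 bytes from 147.28.0.62: icmp_seq=3 ttl=48 time=153.456 ms
64 bytes from 147.28.0.62: icmp_seq=4 ttl=48 time=148.935 ms
64 bytes from 147.28.0.62: icmp_seq=5 ttl=48 time=153.647 ms
64 bytes from 147.28.0.62: icmp_seq=6 ttl=48 time=148.682 ms
64 bytes from 147.28.0.62: icmp_seq=7 ttl=48 time=163.926 ms
64 bytes from 147.28.0.62: icmp_seq=8 ttl=48 time=148.669 ms
64 bytes from 147.28.0.62: icmp_seq=9 ttl=48 time=153.352 ms
64 bytes from 147.28.0.62: icmp_seq=10 ttl=48 time=163.688 ms
64 bytes from 147.28.0.62: icmp_seq=11 ttl=48 time=148.729 ms
64 bytes from 147.28.0.62: icmp_seq=12 ttl=48 time=163.691 ms
64 bytes from 147.28.0.62: icmp_seq=13 ttl=48 time=148.536 ms
"""

# Extract every round-trip-time, in milliseconds.
rtts = [float(m) for m in re.findall(r"time=([0-9.]+) ms", PING_OUTPUT)]

rtt_min = min(rtts)
rtt_avg = statistics.fmean(rtts)
rtt_max = max(rtts)
# Assumption: population standard deviation (divide by n, not n-1).
rtt_stddev = statistics.pstdev(rtts)

print(f"round-trip min/avg/max/stddev = "
      f"{rtt_min:.3f}/{rtt_avg:.3f}/{rtt_max:.3f}/{rtt_stddev:.3f} ms")
```

Note that these four numbers hide the shape of the distribution: an average and a standard deviation cannot tell whether the samples are spread evenly or grouped around a few distinct values.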

In Computer Networking, Principles, Protocols and Practice, the following figure was used to illustrate the variation of the round-trip-time. This measurement was taken more than ten years ago between a host connected to a CATV modem in Charleroi and a server at the University of Namur. The main reason for the delay variations was the utilisation of the low-speed link that we used at that time.


Evolution of the round-trip-time between two hosts

In a recent presentation at RIPE66, Randy Bush and several of his colleagues revealed some unexpected measurements collected by using ping. For these measurements, they used two unloaded servers and sent pings mainly through backbone networks. The figure below shows the cumulative distribution function (CDF) of the measured delays. The staircase curves were the first curves that they obtained. These delays look strange: several plateaux appear, and there is no obvious explanation for them.

Source: https://ripe66.ripe.net/presentations/128-130513.tokyo-ping.pdf
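
To make the idea of a staircase CDF concrete, here is a minimal sketch of how an empirical CDF is built and how plateaux can be spotted as gaps between sorted samples. It does not use the RIPE66 dataset: RTT_MS holds the 14 samples from the ping output earlier in this post, and the 1 ms gap threshold and the helper names are arbitrary choices made for this illustration. So few samples prove nothing about the cause of the grouping.

```python
# Round-trip-times (ms) from the ping output earlier in this post.
RTT_MS = [148.715, 163.814, 148.780, 153.456, 148.935, 153.647, 148.682,
          163.926, 148.669, 153.352, 163.688, 148.729, 163.691, 148.536]


def empirical_cdf(samples):
    """Return (value, fraction of samples <= value) pairs, sorted by value."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


def split_on_gaps(samples, min_gap_ms=1.0):
    """Group sorted samples; a gap larger than min_gap_ms starts a new group.

    Each gap is a plateau of the CDF: no sample falls inside it, so the
    curve stays flat until the next group begins.
    """
    ordered = sorted(samples)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > min_gap_ms:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters


cdf = empirical_cdf(RTT_MS)
clusters = split_on_gaps(RTT_MS)

for value, fraction in cdf:
    print(f"{value:8.3f} ms  {fraction:.3f}")
for group in clusters:
    print(f"{len(group):2d} samples between {group[0]:.3f} and {group[-1]:.3f} ms")
```

Even on this tiny sample, the values fall into a few narrow groups separated by several milliseconds, which is the kind of pattern that shows up as steps in a CDF.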

They studied these delays in more detail and tried to understand the reason for the huge delay variations that they observed. To understand the source of the delay variations, it is useful to look back at the format of an ICMP message encapsulated inside an IPv4 packet.

The important part in this header is the first 32-bit word of the ICMPv4 header, which holds the Type, Code and Checksum fields. For TCP and UDP, the first 32-bit word of the segment contains the source and destination ports of the transport flow. Many routers that support Equal Cost Multipath compute a hash function over the source and destination IP addresses and ports for packets carrying TCP/UDP segments. However, how should such a load-balancing router handle ICMP messages or other types of protocols that run directly on top of IPv4?

A first option would be to always send ICMP messages over the same path, i.e. disable load balancing for ICMP messages. This is probably not a good idea from an operational viewpoint since it would imply that ICMP messages, which are often used for debugging, would not necessarily follow the same paths as regular packets. A better option would be to only use the source and destination IP addresses when load balancing ICMP messages. However, this requires the router to distinguish between UDP/TCP and other types of flows and to act according to the Protocol field of the IP header. This likely increases the cost of implementing load balancing in hardware.

The measurements presented above are probably, at least partially, caused by load-balancing routers that use the first 32-bit word of the IP payload to make their load-balancing decision, without verifying the Protocol field in the IP header. Since the ICMP checksum covers the sequence number and the data carried in the echo request, it changes from one ping packet to the next, and such routers can send successive packets over different paths. The vertical bars shown in the figure above correspond to a modified ping that always sends ICMP messages that start with the same first 32-bit word. However, this does not completely explain why there is a delay difference of more than 15 milliseconds on the equal cost paths between two servers. Something else might be happening in this network.
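
The effect can be sketched in a few lines of Python. Everything below is an illustration built on assumptions: naive_ecmp_path is a hypothetical stand-in for a router hash (real routers use their own, usually undocumented, functions), the addresses come from the documentation ranges, and keeping the checksum constant with a compensating 16-bit word in the payload is one possible technique, not necessarily the one used by the modified ping from the presentation. A real ping usually also puts a timestamp in the payload, which changes the checksum as well.

```python
import struct
import zlib


def internet_checksum(data: bytes) -> int:
    """One's-complement Internet checksum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def naive_ecmp_path(src: str, dst: str, ip_payload: bytes, n_paths: int) -> int:
    """Hypothetical router hash: addresses plus the first 32-bit word of the
    IP payload, used as if it held ports, without checking the Protocol field."""
    key = src.encode() + dst.encode() + ip_payload[:4]
    return zlib.crc32(key) % n_paths


SRC, DST, IDENT = "192.0.2.1", "198.51.100.7", 0x1234  # example values
FILL = b"\x00" * 54  # with the 2-byte word below: 56 data bytes

# Regular pings: only the sequence number changes, but so does the checksum.
plain = [echo_request(IDENT, seq, b"\x00\x00" + FILL) for seq in range(10)]

# Assumed trick: a payload word chosen so that seq + word is constant, which
# keeps the checksum, and thus the first 32-bit word, identical.
fixed = [echo_request(IDENT, seq, struct.pack("!H", 0xFFFF - seq) + FILL)
         for seq in range(10)]

plain_words = {pkt[:4].hex() for pkt in plain}
fixed_words = {pkt[:4].hex() for pkt in fixed}
plain_paths = [naive_ecmp_path(SRC, DST, pkt, 4) for pkt in plain]
fixed_paths = [naive_ecmp_path(SRC, DST, pkt, 4) for pkt in fixed]

print("distinct first words, regular ping: ", len(plain_words))
print("distinct first words, modified ping:", len(fixed_words))
print("paths, regular ping: ", plain_paths)
print("paths, modified ping:", fixed_paths)
```

With regular echo requests, every packet starts with a different 32-bit word, so a hash that includes this word is free to spread them over several paths. With the compensating word, all packets start with the same 32-bit word and the same hash always selects the same path.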

Additional details are discussed in On the suitability of ping to measure latency by Cristel Pelsser, Luca Cittadini, Stefano Vissicchio and Randy Bush. See https://ripe66.ripe.net/archives/video/12/ for the recorded video.

Posted by Olivier Bonaventure

Tags: measurements