Towards faster web downloads and some interesting data

Decreasing the time required to download web pages is an obsession for many content providers. They rely on various tricks to speed up the download of web pages. Many of these tricks are widely known, although they are not implemented on all web servers. Some depend on the content itself: a smaller web page will always load faster than a larger one. This is why major content providers optimise their HTML content to remove unnecessary tags or minify their JavaScript. They also rely heavily on cacheable information such as images, CSS, JavaScript, … Some also use gzip-based compression to dynamically compress the data that needs to be transmitted over the wire. This is particularly useful when web pages are delivered over low-bandwidth links such as those used by mobile phones.
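
As a rough illustration of the gzip trick, here is a minimal Python sketch (the HTML document is a made-up placeholder, not real content) showing how much a repetitive HTML body shrinks when a server compresses it before sending it with a Content-Encoding: gzip header:

    import gzip

    # A small HTML document standing in for a dynamically generated page
    # (hypothetical content, used only to illustrate the size reduction).
    html = ("<html><head><title>Example</title></head><body>"
            + "<p>Some repetitive text that compresses well. </p>" * 200
            + "</body></html>").encode("utf-8")

    # This is what a server does before sending the body with
    # "Content-Encoding: gzip"; the browser decompresses it transparently.
    compressed = gzip.compress(html)

    print(f"original size  : {len(html)} bytes")
    print(f"gzip-compressed: {len(compressed)} bytes "
          f"({100 * len(compressed) / len(html):.1f}% of original)")

On such repetitive text the compressed body is a small fraction of the original size, which is why gzip helps so much on low-bandwidth links.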

Two years ago, at the beginning of their work on SPDY, Google published interesting data about the size of web-accessible content at https://developers.google.com/speed/articles/web-metrics.

The analysis was based on the web pages collected by Googlebot. It thus reflects a large fraction of the public web. Some of the key findings of this analysis include:

  • the average web page results in the transmission of 320 KBytes
  • there are on average more than 40 GET requests per web page and a web page is retrieved by contacting 7 different hostnames. The average page contains 29 images, 7 scripts and 3 CSS files
  • on average, each HTTP GET results in the retrieval of only 7 KBytes of data

The public web is thus widely fragmented and this fragmentation has an impact on its performance.
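
To give an idea of how such per-page numbers can be collected, the following Python sketch (standard library only, with a toy page instead of real crawl data, and deliberately ignoring CSS background images and other corner cases) counts the images, scripts, CSS files and distinct hostnames referenced by one HTML document:

    from html.parser import HTMLParser
    from urllib.parse import urlparse

    class ObjectCounter(HTMLParser):
        """Counts the embedded objects referenced by an HTML page (simplified)."""
        def __init__(self):
            super().__init__()
            self.images = self.scripts = self.css = 0
            self.hostnames = set()

        def _note_host(self, url):
            host = urlparse(url).hostname
            if host:
                self.hostnames.add(host)

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "img" and "src" in attrs:
                self.images += 1
                self._note_host(attrs["src"])
            elif tag == "script" and "src" in attrs:
                self.scripts += 1
                self._note_host(attrs["src"])
            elif tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
                self.css += 1
                self._note_host(attrs.get("href") or "")

    # A toy page, not real measurement data.
    page = """<html><head>
    <link rel="stylesheet" href="http://static.example.org/style.css">
    <script src="http://cdn.example.net/app.js"></script>
    </head><body>
    <img src="http://images.example.com/logo.png">
    <img src="http://images.example.com/banner.png">
    </body></html>"""

    counter = ObjectCounter()
    counter.feed(page)
    print(counter.images, "images,", counter.scripts, "scripts,",
          counter.css, "CSS files,", len(counter.hostnames), "hostnames")

Each distinct hostname typically means at least one extra DNS lookup and one extra TCP connection, which is where the fragmentation hurts.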

Last year, Yahoo researchers presented an interesting analysis of the work they did to optimise the performance of the web pages served by Yahoo [AFERG11]. This analysis focuses on the Yahoo web pages but still provides some very interesting insights into the performance of the web. Some of the key findings of this analysis include:

  • around 90% of the objects served by Yahoo are smaller than 25 KBytes

  • despite the utilisation of HTTP/1.1 persistent connections, the average number of requests per TCP connection is still only 2.24, and 90% of the TCP connections do not carry more than 4 GETs (see the sketch after this list)

  • web page downloads are heavily affected by packet losses, and packet losses do occur: 30% of the TCP connections are slowed down by retransmissions

    [Figure: Packet retransmission rate observed on the Yahoo! CDN. Source: [AFERG11]]

  • increasing the initial TCP window size, as recently proposed and implemented in Linux, reduces the page download time, even when packet losses occur

  • increasing the initial TCP window size may cause some unfairness problems, given the short duration of TCP connections
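
To see what these numbers mean in practice, the short sketch below (a minimal illustration using the Python standard library; the host and paths are placeholders, not taken from the study) issues several GETs over a single HTTP/1.1 persistent connection, i.e. the kind of reuse that the 2.24 requests per connection figure shows is far from being fully exploited:

    import http.client

    # One HTTP/1.1 connection; "example.org" is only a placeholder host.
    conn = http.client.HTTPConnection("example.org", 80, timeout=10)

    # Each GET reuses the same underlying TCP connection, as long as the
    # server keeps it open and the previous response has been fully read.
    for path in ("/", "/index.html", "/robots.txt"):
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()   # read the body before sending the next request
        print(path, response.status, len(body), "bytes")

    conn.close()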

There is still a lot of work to be done to reduce page download times, and many factors influence the perceived performance of the web.

[AFERG11] Mohammad Al-Fares, Khaled Elmeleegy, Benjamin Reed, and Igor Gashinsky. Overclocking the Yahoo! CDN for faster web page loads. In Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference, IMC '11, 569-584. New York, NY, USA, 2011. ACM. URL: http://doi.acm.org/10.1145/2068816.2068869, doi:10.1145/2068816.2068869.