Performance Diagnosis: Network Unreliability

Knowledge Drop

Last tested: Nov 2, 2018
 

To use Looker, end users make requests from their client browser to the Looker server. The Looker server usually lives on a different machine from the one end users are requesting from, so these requests need to traverse one or more networks.

A network is a collection of connected machines capable of transmitting data amongst themselves. If a network has a route to another network by means of a router and routing table, machines in one network can transmit data to and from machines in another network. The internet is just a massive conglomeration of networks.

If I'm at work within my office's network and I'm requesting resources from a Looker server hosted in Looker's VPC, I'm traversing multiple networks. In other words, the packets transmitted to and from our respective machines need to jump through multiple networks on their journey. When troubleshooting network unreliability, it's usually helpful to determine:

  1. what network devices are between the client and the server
  2. how long it generally takes to make each jump from device to device
  3. whether there is any packet loss between devices

To do this, these are your tools:

Traceroute

Traceroute is a simple tool that shows the path your traffic takes to a remote server. To use it, open a command line terminal on the machine you want to test from and run traceroute <host> (on Windows, the equivalent command is tracert). For example:

traceroute google.com

The first line of the output tells us the conditions that traceroute is operating under: the maximum number of hops to make and the size of the packets. After the first line, each subsequent line represents a "hop", an intermediate host that your traffic must pass through to reach the host you specified. Each line has the following format:

hop_number host_name (IP_address) packet_round_trip_times 

Round-trip times above 150ms are generally considered long for a trip within the continental United States. (Times over 150ms may be normal if the signal crosses an ocean, however.) Look for the hop where latency first jumps, and determine whether that hop sits inside the end user's own network or further along the path. For a deeper dive on analyzing the results see here.
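As a rough illustration of reading the hop format above, the following Python sketch parses captured traceroute lines and flags hops over the 150ms mark. The function names (parse_hop, slow_hops) and the sample hostnames and addresses are made up for this example, and real traceroute output varies by platform, so treat it as a starting point rather than a complete parser.

```python
import re

# Threshold taken from the text: over 150 ms is long within the continental US.
SLOW_MS = 150.0

# hop_number host_name (IP_address) packet_round_trip_times
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([^)]+)\)\s+(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_hop(line):
    """Parse one hop line; return None for the header or '* * *' timeouts."""
    match = HOP_RE.match(line)
    if match is None:
        return None
    hop, host, ip, rest = match.groups()
    rtts = [float(t) for t in RTT_RE.findall(rest)]
    return {"hop": int(hop), "host": host, "ip": ip, "rtts_ms": rtts}


def slow_hops(lines, threshold_ms=SLOW_MS):
    """Return parsed hops whose worst round trip exceeds threshold_ms."""
    hops = [parse_hop(line) for line in lines]
    return [h for h in hops if h and h["rtts_ms"] and max(h["rtts_ms"]) > threshold_ms]


# Hypothetical captured output (addresses are from documentation ranges).
sample = [
    "traceroute to example.com (203.0.113.10), 64 hops max, 52 byte packets",
    " 1  gateway.example (192.0.2.1)  1.2 ms  1.1 ms  0.9 ms",
    " 2  * * *",
    " 3  core.example (198.51.100.7)  180.4 ms  175.0 ms  190.2 ms",
]

print([h["hop"] for h in slow_hops(sample)])  # [3]
```

Hops that only show asterisks are skipped here because they carry no timing data; a router that declines to answer probes is not necessarily dropping real traffic.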

Ping

The ping command sends packets of data to a specific host on a network, and then reports how long it took to transmit that data and get a response. To use it, open a shell on the machine you want to test from and run ping <host>. For example:

ping google.com

Each reply line will have the following format:

x bytes from <host> (<ip>): icmp_seq=x ttl=x time=x.x ms

The first line shows the hostname you're pinging, the IP address it resolves to, and the size of the packets being sent. The next lines show the reply to each individual packet, including the round-trip time in milliseconds and the packet's time-to-live (TTL). Despite its name, TTL is a hop count rather than a duration: each router that forwards the packet decrements it, and the packet is discarded when the value reaches zero. Stop the process with Control+C to see the summary statistics: how many packets were sent and received, as well as the minimum, maximum, and average response times.

ICMP (the protocol ping uses) has no agreed "standard" response time, so there's no golden rule on what constitutes a slow reply. At a high level, though, keep in mind that there may be a lower tolerance for high latency when streaming resources than when loading a page. For a deeper dive on analyzing the results see here.
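If you have saved ping output and want to recompute the summary yourself, a minimal Python sketch along these lines may help. It assumes reply lines in the icmp_seq / ttl / time format shown above; parse_reply and summarize are hypothetical helper names, and the sample data is invented.

```python
import re

# Matches the tail of a reply line: icmp_seq=x ttl=x time=x.x ms
REPLY_RE = re.compile(r"icmp_seq=(\d+)\s+ttl=(\d+)\s+time=(\d+(?:\.\d+)?)\s*ms")


def parse_reply(line):
    """Extract sequence number, TTL, and round-trip time from one reply line."""
    match = REPLY_RE.search(line)
    if match is None:
        return None
    seq, ttl, time_ms = match.groups()
    return {"seq": int(seq), "ttl": int(ttl), "time_ms": float(time_ms)}


def summarize(lines, sent):
    """Recompute loss and min/avg/max from reply lines.

    `sent` is the number of packets transmitted; any reply missing
    from `lines` is counted as lost.
    """
    if sent <= 0:
        raise ValueError("sent must be a positive packet count")
    times = [r["time_ms"] for r in map(parse_reply, lines) if r]
    received = len(times)
    return {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent,
        "min_ms": min(times) if times else None,
        "avg_ms": sum(times) / received if times else None,
        "max_ms": max(times) if times else None,
    }


# Hypothetical capture: four packets sent, the reply for icmp_seq=2 never arrived.
sample = [
    "PING example.com (203.0.113.10): 56 data bytes",
    "64 bytes from example.com (203.0.113.10): icmp_seq=0 ttl=56 time=10.0 ms",
    "64 bytes from example.com (203.0.113.10): icmp_seq=1 ttl=56 time=20.0 ms",
    "64 bytes from example.com (203.0.113.10): icmp_seq=3 ttl=56 time=30.0 ms",
]

stats = summarize(sample, sent=4)
print(stats["loss_pct"], stats["avg_ms"])  # 25.0 20.0
```

Because there is no standard for a "slow" reply, the sketch reports numbers only and leaves the judgment to you.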
 

MTR

MTR combines ping and traceroute: it continuously polls every hop on the way to a remote server so you can see how latency and packet loss change over time. It's not installed on most systems by default. To use it, open a command line terminal on the machine you want to test from and run mtr <host>. For example:

mtr google.com

The output has the following column headers:

HOST: example Loss% Snt Last Avg Best Wrst StDev

mtr will run indefinitely unless you include the --report option, which sends a fixed number of packets to each hop (10 by default; use --report-cycles to change the count) and then prints a summary. After the first line, which contains the column headers, each subsequent line represents a hop. In addition to the hosts that packets pass through on their journey, the results show the percentage of packet loss at each hop, the number of packets sent, the last, average, best, and worst latency in milliseconds, and the standard deviation of latency at each hop. The higher the standard deviation, the more the latency varies from packet to packet and the less consistent that hop is. Review the hops from mtr alongside the round-trip latency from ping to narrow down where packets are being slowed during the request/response. For a deeper dive on analyzing the results see here.
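To make the column layout concrete, here is a small Python sketch that parses report-style mtr lines and picks out hops with packet loss or jittery latency. The helper names are hypothetical, the sample report is invented, and the 10ms standard-deviation cutoff is an arbitrary assumption for illustration, not a recommended value.

```python
def parse_mtr_line(line):
    """Parse one hop line of an mtr report; return None for other lines.

    Expected columns: hop marker, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev.
    """
    parts = line.split()
    if len(parts) != 9 or not parts[0].endswith("|--"):
        return None
    return {
        "hop": int(parts[0].split(".")[0]),
        "host": parts[1],
        "loss_pct": float(parts[2].rstrip("%")),
        "sent": int(parts[3]),
        "last_ms": float(parts[4]),
        "avg_ms": float(parts[5]),
        "best_ms": float(parts[6]),
        "worst_ms": float(parts[7]),
        "stdev_ms": float(parts[8]),
    }


def unreliable_hops(lines, max_loss_pct=0.0, max_stdev_ms=10.0):
    """Return hops whose loss or latency spread exceeds the given limits.

    Both limits are illustrative assumptions; tune them for your network.
    """
    hops = [h for h in map(parse_mtr_line, lines) if h]
    return [
        h for h in hops
        if h["loss_pct"] > max_loss_pct or h["stdev_ms"] > max_stdev_ms
    ]


# Hypothetical report output.
sample = [
    "HOST: example                Loss%   Snt   Last   Avg  Best  Wrst StDev",
    "  1.|-- gateway.example       0.0%    10    1.2   1.3   1.0   2.1   0.3",
    "  2.|-- core.example         10.0%    10   25.0  30.1  20.2  90.5  21.4",
    "  3.|-- server.example        0.0%    10   26.0  27.0  25.0  31.0   1.5",
]

print([h["hop"] for h in unreliable_hops(sample)])  # [2]
```

In this invented sample the loss at hop 2 does not carry through to hop 3, which is worth noticing when reading real reports: loss that appears at one intermediate hop but not at later ones may mean that device is deprioritizing probe replies rather than dropping traffic.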

This content is subject to limited support.                

Last update: 06-14-2021 05:46 PM