A Look Into Dublin Traceroute's Results · Dublin Traceroute's blog

16 Sep 2017, 13:06

Author: Andrea Barberio

In the previous post, Observing ECMP with Dublin Traceroute, I briefly talked about ECMP and showed how to visualize all (well, hopefully) the paths between two nodes A and B. In particular I had run dublin-traceroute from a 3G connection in Ireland towards Google’s public DNS, 8.8.8.8. This is a very interesting IP address: it is a DNS server, it is deployed via anycast, and it is widely distributed around the world, so I will use it often throughout this blog.

In this post we will analyze the output of that traceroute, and at the end I’ll show how to interpret certain packet drops and what to do in those cases.

Dublin Traceroute’s output

In the previous post we simply ran dublin-traceroute 8.8.8.8. The output started with:

Starting dublin-traceroute
Traceroute from 0.0.0.0:12345 to 8.8.8.8:33434~33453 (probing 20 paths,
    min TTL is 1, max TTL is 30, delay is 10 ms)

The second line contains some interesting information: dublin-traceroute is sending packets to 8.8.8.8 with 20 different destination ports (33434 to 33453), the source port is 12345, and the TTL in the packets is variable between 1 and 30.

All the packets are UDP, which is the only layer 4 protocol that dublin-traceroute implements at the moment; more will come though. Each packet has a custom payload that is generated so as to control the UDP checksum (more on this in a future post).

Before sending each packet, dublin-traceroute will wait 10 milliseconds, which means 100 packets per second. It may seem a small number for a modern network, but we will see later that for traceroute it could be too large.

All of the above means that dublin-traceroute will send 30 * 20 = 600 packets for a full 20-paths traceroute, and it will take approximately 6 seconds to send everything out.
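The arithmetic above can be written down as a tiny sketch. The function name probe_budget is made up for illustration and is not part of dublin-traceroute; the numbers are the ones printed in the header line.

```python
# Values taken from the header line printed by dublin-traceroute above.
MIN_TTL = 1
MAX_TTL = 30
NUM_PATHS = 20   # destination ports 33434 to 33453
DELAY_MS = 10    # wait before each packet


def probe_budget(min_ttl, max_ttl, num_paths, delay_ms):
    """Return (total packets, approximate seconds spent sending them).

    Hypothetical helper: it only reproduces the back-of-the-envelope
    calculation from the text, one packet per (TTL, path) pair.
    """
    packets = (max_ttl - min_ttl + 1) * num_paths
    seconds = packets * delay_ms / 1000
    return packets, seconds


packets, seconds = probe_budget(MIN_TTL, MAX_TTL, NUM_PATHS, DELAY_MS)
print(f"{packets} packets, about {seconds:.0f} seconds")
```

Doubling the delay doubles the sending time, which is worth keeping in mind when we get to rate limiters later in this post.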

Flow ID

Next we see a header:

== Flow ID 33434 ==

This header is repeated below with different values, and each one represents a single flow of packets through a unique path (unless the paths change during the traceroute). The flow ID also corresponds to the destination port of the UDP packets and, as seen above, we are using 20 different destination ports, one for each of the 20 paths.

As also discussed above, dublin-traceroute will send 30 packets for each (source port, destination port) combination towards a given host.

Flows, microflows, 5-tuples?

A flow in Dublin Traceroute’s slang is a 5-tuple of source host and port, destination host and port, and protocol. This is also known as a microflow (RFC 2474, Differentiated Services Field in IPv4 and IPv6), and it is discussed in Paris Traceroute as well. This 5-tuple is used by ECMP to decide which next hop a packet will be forwarded to, and by Dublin Traceroute to force packets through a specific path.
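To make the idea more concrete, here is a minimal sketch of how a router could map a 5-tuple to a next hop. Everything in it is an assumption for illustration: real devices use their own (usually vendor-specific) hash functions, and the next-hop addresses, the source address, and the pick_next_hop name are made up. The only point is that the same 5-tuple always lands on the same next hop, while changing the destination port may select a different one.

```python
import hashlib
from collections import namedtuple

# The 5-tuple that identifies a flow (microflow).
FiveTuple = namedtuple("FiveTuple", "src_ip src_port dst_ip dst_port proto")


def pick_next_hop(flow, next_hops):
    """Hypothetical ECMP decision: hash the 5-tuple, pick one next hop.

    SHA-256 is used here only because it is deterministic and in the
    standard library; it is not what real routers use.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


NEXT_HOPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # made-up addresses

for dst_port in range(33434, 33438):
    # Only the destination port changes, as in a Dublin Traceroute run.
    flow = FiveTuple("192.0.2.10", 12345, "8.8.8.8", dst_port, "udp")
    print(dst_port, "->", pick_next_hop(flow, NEXT_HOPS))
```

Keeping the 5-tuple constant across all the TTLs of a flow is what lets every probe of that flow follow the same path.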

Packet replies

Then we see a summary of the replies for each TTL in the given flow. Not all of them are interesting, so only a few are shown below (output line broken for readability):

1    192.168.43.1 (gateway), IP ID: 17503 RTT 7.657 ms
     ICMP (type=11, code=0) 'TTL expired in transit', NAT ID: 0,
     flow hash: 25516

This is the first hop, specifically the gateway. We have:

the IP address that responded to our probe, 192.168.43.1
its host name as resolved by DNS (gateway)
the IP ID of the packet that this device received from us (17503). This is one of the new features added by Dublin Traceroute, and it is important to detect NATs
the Round-Trip Time in milliseconds (7.657)
the ICMP type, code, and description, TTL expired in transit
the NAT ID, related to the IP ID above
the flow hash, 25516, a number used internally to represent a network path between two hosts

The silent hop

The second hop did not respond at all, so we just see an asterisk:

2    *

Traversing NATs

Things are getting interesting at hop 11:

11    172.16.101.1 (172.16.101.1), IP ID: 0 RTT 40.522 ms
      ICMP (type=11, code=0) 'TTL expired in transit',
      NAT ID: 42753 (NAT detected), flow hash: 25516

Here we see a new piece of information: NAT ID: 42753 (NAT detected). What this means is that Dublin Traceroute has detected a device 11 hops away that is translating network addresses. NATs rewrite part of the IP header to replace at least the source IP, and possibly more fields, and this condition can be detected from the data contained in the response. How this is done is material for another article, so we’ll skip this for now. But the source code is available, so feel free to find out yourself :-)

One last note on this: Dublin Traceroute can detect multiple different NATs, so if a new address translation happens on the path it will be detected.

And eventually, the destination

18    8.8.8.8 (google-public-dns-a.google.com), IP ID: 39240 RTT 68.68 ms
      ICMP (type=3, code=3) 'Destination port unreachable',
      NAT ID: 42753, flow hash: 25516

The last hop is our destination, 8.8.8.8, which diligently responded to our requests. To be fair, we were diligent too, and used the port range dedicated to traceroute, i.e. 33434 and above.

Side note: 33434 seems to have been born as 2 ^ 15 (halfway to the highest port number) + 666, or in other words traceroute is the tool of the beast.

Note the ‘Destination port unreachable’ ICMP, different from the ‘TTL expired in transit’ that we had received so far. This means that we reached the target, and Dublin Traceroute will consider this the last hop.
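The distinction between the two kinds of ICMP reply can be sketched in a few lines. This is not Dublin Traceroute's actual code, and the classify_reply function is hypothetical; the type and code values are the ones shown in the output above.

```python
ICMP_TIME_EXCEEDED = 11     # with code 0: 'TTL expired in transit'
ICMP_DEST_UNREACHABLE = 3   # with code 3: 'Destination port unreachable'


def classify_reply(icmp_type, icmp_code):
    """Hypothetical helper: say what a reply means for the traceroute.

    Pass (None, None) when no reply was received for a probe.
    """
    if icmp_type is None:
        return "no reply"              # printed as an asterisk
    if (icmp_type, icmp_code) == (ICMP_TIME_EXCEEDED, 0):
        return "intermediate hop"      # a router on the path answered
    if (icmp_type, icmp_code) == (ICMP_DEST_UNREACHABLE, 3):
        return "destination reached"   # the target itself answered
    return "other"


print(classify_reply(11, 0))      # hop 1 and hop 11 in the output above
print(classify_reply(3, 3))       # hop 18, i.e. 8.8.8.8
print(classify_reply(None, None)) # hop 2, the silent one
```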

What if some intermediate device or the target did not respond?

Simple enough: we see an asterisk! Does this mean that the device is broken? Not necessarily, there are several possible causes. For example:

well, yes, the device is broken
the return path is asymmetric, and something is broken on the return path
the device is overloaded and dropped our request
the device is actively dropping some or all of our requests

This list is not exhaustive.

We will see in the future that it’s possible to detect all of these issues, some more easily than others. But let’s see at least what happened to our original traceroute.

If you look at the traceroute from the previous post you will notice that some hops are consistently not responding, while a couple of them (192.168.43.1 and 8.8.8.8) are not responding only to certain probes.

While it’s not possible to tell with certainty what’s happening, we can interpret it and make good (or bad) guesses. After all, traceroute is all about guessing what’s going on!

The totally unresponsive hops: my guess is that these hops are just configured not to respond with ICMP TTL-expired. Why? Maybe to save resources, since ICMP TTL-expired packets are processed in the slow path, i.e. in the CPU rather than in the faster ASICs, and for this reason they are very expensive. Or maybe the network administrators for that device believe that it’s better to hide it for security reasons. I won’t judge :-)

The partially-responsive hops: these are more interesting. Why did they respond to certain packets, and ignore others? Maybe we hit a rate limiter: after all, ICMP TTL-expired packets are expensive, and the network administrators may have configured a rate limiter so that a burst of them cannot take down the network device. Rate limiters normally use a counter that is shared among all the packets, so running traceroute from a different IP probably won’t help. Remember when I said that 100 packets per second could be too much? Well, this could be the case. We can try to increase the inter-packet delay using the --delay option of dublin-traceroute, and the responses should appear again.

Or maybe there is an actual problem on some paths, i.e. some faulty network device or cable. In that case we would see the problem consistently, and slowing down the packet rate won’t help. And obviously we would see that all the hops on that path stop responding after that point. What can we do? Well, wait until it’s fixed, or contact the network administrator :-)
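To get a feel for the rate limiter guess, here is a toy simulation of a token bucket sitting in front of a hop. All the numbers are assumptions picked for illustration (20 replies per second, a burst of 5, 20 probes hitting the same device); real limiters are configured differently and I have no idea what these particular hops use. Tokens are counted in thousandths to keep the arithmetic in integers.

```python
def simulate_rate_limiter(num_probes, delay_ms, limit_per_sec, burst):
    """Count how many probes get a reply from a rate-limited hop.

    Toy token bucket: each reply costs one token, tokens refill at
    limit_per_sec and never exceed burst. Tokens are stored as
    milli-tokens so that no floating point rounding is involved.
    """
    capacity = burst * 1000
    tokens = capacity
    answered = 0
    for _ in range(num_probes):
        # Refill for the delay that precedes this probe.
        tokens = min(capacity, tokens + limit_per_sec * delay_ms)
        if tokens >= 1000:
            tokens -= 1000
            answered += 1
    return answered


# Hypothetical limiter: 20 replies per second, burst of 5.
for delay_ms in (10, 50):
    replies = simulate_rate_limiter(20, delay_ms, limit_per_sec=20, burst=5)
    print(f"delay {delay_ms} ms: {replies} replies out of 20 probes")
```

With these made-up settings the default 10 ms delay loses more than half of the replies, while a longer delay gets all of them back. A faulty device, on the other hand, would keep dropping packets no matter how slowly we send them.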

What’s next?

In the next post I will show how the graphical representation of a multipath traceroute can be more immediate to understand, and can be useful to spot problems that would otherwise be hard to see. All of this with some real-life examples of course :-)