Bug #2150: NFD does not honor netem delays - NFD - NDN project issue tracking system

Actions

Copy link

Bug #2150

closed

NFD does not honor netem delays

Added by Syed Amin over 10 years ago. Updated about 3 years ago.

Status:

Rejected

Priority:

Normal

Assignee:

Category:

Target version:

Start date:

Due date:

% Done:

Estimated time:

Description

I've a machine with multiple interfaces:

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:24:e8:7a:44:cd
          inet addr:155.98.38.81  Bcast:155.98.39.255  Mask:255.255.252.0
          inet6 addr: fe80::224:e8ff:fe7a:44cd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11552 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2958 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1770097 (1.7 MB)  TX bytes:559424 (559.4 KB)

eth1      Link encap:Ethernet  HWaddr 00:24:e8:7a:44:cf
          inet addr:10.1.1.2  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e8ff:fe7a:44cf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:674 errors:0 dropped:0 overruns:0 frame:0
          TX packets:636 errors:1 dropped:0 overruns:0 carrier:1
          collisions:14 txqueuelen:1000
          RX bytes:57477 (57.4 KB)  TX bytes:48981 (48.9 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:334 errors:0 dropped:0 overruns:0 frame:0
          TX packets:334 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:49347 (49.3 KB)  TX bytes:49347 (49.3 KB)

and the following entry in the /etc/hosts:

10.1.1.3        CAIDA-lan0 CAIDA-0 CAIDA

I added a delay using netem on eth1 (10.1.1.2):

sudo tc qdisc add dev eth1 root netem delay 100ms

and added an entry in nfd using nfdc:

HOME=/tmp nfdc register -c 25 /caida udp://caida

$ nfd-status -fb
Faces:
  faceid=1 remote=internal:// local=internal:// counters={in={0i 143d 0B} out={144i 0d 0B}} local persistent point-to-point
  faceid=254 remote=contentstore:// local=contentstore:// counters={in={0i 0d 0B} out={0i 0d 0B}} local persistent point-to-point
  faceid=255 remote=null:// local=null:// counters={in={0i 0d 0B} out={0i 0d 0B}} local persistent point-to-point
  faceid=256 remote=udp4://224.0.23.170:56363 local=udp4://155.98.38.81:56363 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=257 remote=udp4://224.0.23.170:56363 local=udp4://10.1.1.2:56363 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eth0 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=259 remote=ether://[01:00:5e:00:17:aa] local=dev://eth1 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=260 remote=fd://23 local=unix:///run/nfd.sock counters={in={112i 27d 37014B} out={27i 102d 66238B}} local on-demand point-to-point
  faceid=262 remote=fd://24 local=unix:///run/nfd.sock counters={in={1i 0d 422B} out={0i 1d 785B}} local on-demand point-to-point
  faceid=264 remote=udp4://10.1.1.3:6363 local=udp4://10.1.1.2:6363 counters={in={0i 23d 8829B} out={23i 0d 894B}} non-local persistent point-to-point
  faceid=299 remote=fd://25 local=unix:///run/nfd.sock counters={in={1i 0d 46B} out={0i 0d 0B}} local on-demand point-to-point
FIB:
  /localhost/nfd nexthops={faceid=1 (cost=0)}
  /caida nexthops={faceid=264 (cost=25)}
  /localhost/nfd/rib nexthops={faceid=260 (cost=0)}

The remote machine (caida in this case) is running ndnpingserver on "/caida" prefix.

When I do simple ping, I can see the intended delays:

$ ping caida
PING CAIDA-lan0 (10.1.1.3) 56(84) bytes of data.
64 bytes from CAIDA-lan0 (10.1.1.3): icmp_req=1 ttl=64 time=100 ms
64 bytes from CAIDA-lan0 (10.1.1.3): icmp_req=2 ttl=64 time=100 ms

But when I do ndnping I don't see any delay?

$ ndnping /caida

=== Pinging /caida ===

Content From /caida - Ping Reference = 500905038    - Round Trip Time = 0.012777 ms
Content From /caida - Ping Reference = 500905039    - Round Trip Time = 0.009988 ms
Content From /caida - Ping Reference = 500905040    - Round Trip Time = 0.009912 ms

Any idea what's going on here? My guess was nfd is not using the right face to forward the interest. But ndndump shows that the packets are going through eth1:

$ HOME=/tmp/ sudo ndndump -v -i eth1 > /tmp/ndndump.out &

$ ndnping /caida

=== Pinging /caida ===

Content From /caida - Ping Reference = 716405588    - Round Trip Time = 0.012228 ms
Content From /caida - Ping Reference = 716405589    - Round Trip Time = 0.00985 ms

$ cat /tmp/ndndump.out
1415385072.714421 From: 10.1.1.2, To: 10.1.1.3, Tunnel Type: UDP, INTEREST: /caida/ping/716405588?ndn.MustBeFresh=1&ndn.Nonce=716405588
1415385072.722551 From: 10.1.1.3, To: 10.1.1.2, Tunnel Type: UDP, DATA: /caida/ping/716405588
1415385073.706848 From: 10.1.1.2, To: 10.1.1.3, Tunnel Type: UDP, INTEREST: /caida/ping/716405589?ndn.MustBeFresh=1&ndn.Nonce=716405589
1415385073.715069 From: 10.1.1.3, To: 10.1.1.2, Tunnel Type: UDP, DATA: /caida/ping/716405589

Actions

Copy link

Updated by Syed Amin over 10 years ago

Please move this bug to NFD. Thanks!

Actions

Copy link

Updated by Junxiao Shi over 10 years ago

Project changed from ndn-cxx to NFD
Subject changed from NFD is not supporting the delays enforced by netem. to NFD does not honor netem delays
Description updated (diff)
Category set to Faces
Target version set to v0.3

Actions

Copy link

Updated by Junxiao Shi over 10 years ago

Assignee set to Giulio Grassi

Confirmed. This happens for both UDP and TCP face.

UDP-based IP traceroute shows over 100ms delay, so I assume netem command is correct.

Actions

Copy link

Updated by Syed Amin over 10 years ago

Want to add that this started to happen after recent git pull. The commit that I was using previously (8bfd56d051bb1b9f8f78623fba7e7414cb6a5edd) didn't have this problem.

Actions

Copy link

Updated by Alex Afanasyev over 10 years ago

How this can ever happen? NFD doesn't do anything special, except using IP stack. If some combination of flags that we are setting on sockets does some funky with tc, then it is tc's bug.

There could be some timing issues with ndnping, though...

Actions

Copy link

Updated by Syed Amin over 10 years ago

Okay, so the problem is not in NFD, it is in ndn-tlv-ping. I downgraded ndn-tlv-ping to the commit: 38f1ea7698108bdd1cdbdd621316f020b8d0e4dc and it seems to be working fine now.

Other commit ids are:

ndn-cxx: 56860f5cd10bc68d771e9bb93a83d0906107cc9d
nfd: 689f0e9d52f6c08bc1e4f57690340a31b65e27d8

Actions

Copy link

Updated by Lan Wang over 10 years ago

Alex/Junxiao: Who's the right person to fix ndn-tlv-ping?

Actions

Copy link

Updated by Junxiao Shi over 10 years ago

Assignee deleted (~~Giulio Grassi~~)

Okay, so the problem is not in NFD, it is in ndn-tlv-ping. I downgraded ndn-tlv-ping to the commit: 38f1ea7698108bdd1cdbdd621316f020b8d0e4dc and it seems to be working fine now.

How were you able to downgrade ndn-tlv-ping to a version before C++11 transition? It won't compile in C++03 mode.

If you believe it's indeed a problem in ndn-tlv-ping, please do an additional experiment:

Use two topologies. The first topology is same as what you are initially using. The second topology is below:

Topology: A-R-B
R is an IP router; use netem to add 500ms delay on both NICs
A runs producer without netem rules
B runs consumer without netem rules

Please try three producer/consumer combinations:

ndnpingserver/ndnping
ndn-traffic/ndn-traffic-server
ndn-tlv-poke/ndn-tlv-peek (measure rough RTT with time on consumer end)

If ndnpingserver/ndnping is the only pair that reports incorrect RTT, please report a bug on ndn-tlv-ping GitHub site.

Actions

Copy link

Updated by Syed Amin over 10 years ago

How were you able to downgrade ndn-tlv-ping to a version before C++11 transition? It won't compile in C++03 mode.

I first downgraded all packages (ndn-cxx, nfd and ndn-tlv-ping) to the last versions that support C++03 mode and built them. After that I upgraded ndn-cxx and nfd only.

Please try three producer/consumer combinations:

ndnpingserver/ndnping

ndn-traffic/ndn-traffic-server

ndn-tlv-poke/ndn-tlv-peek (measure rough RTT with time on consumer end)

If ndnpingserver/ndnping is the only pair that reports incorrect RTT, please report a bug on ndn-tlv-ping GitHub site.

I am quite sure the problem is in ndn-tlv-ping but I can check above tools as well.

Actions

Copy link

#10

Updated by Davide Pesavento over 10 years ago

I think the regression was introduced in ndn-tlv-ping commit 81b054658eb9221128625e05bf6fd0458b661e22, where the two bind() in expressInterest() were replaced by lambdas. The problem is that the body of the lambda is lazily evaluated, as opposed to the bind arguments which are evaluated immediately at the call site, therefore in the current code steady_clock::now() is evaluated only when the callback is executed, i.e. when the ping reply comes back. If my analysis is correct, then ndn-tlv-ping RTT calculation is completely broken at the moment, even without netem.

Actions

Copy link

#11