Project

General

Profile

Bug #2150

NFD does not honor netem delays

Added by Syed Amin over 5 years ago. Updated over 5 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
Faces
Target version:
Start date:
11/07/2014
Due date:
% Done:

0%

Estimated time:

Description

I've a machine with multiple interfaces:

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:24:e8:7a:44:cd
          inet addr:155.98.38.81  Bcast:155.98.39.255  Mask:255.255.252.0
          inet6 addr: fe80::224:e8ff:fe7a:44cd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11552 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2958 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1770097 (1.7 MB)  TX bytes:559424 (559.4 KB)

eth1      Link encap:Ethernet  HWaddr 00:24:e8:7a:44:cf
          inet addr:10.1.1.2  Bcast:10.1.1.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e8ff:fe7a:44cf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:674 errors:0 dropped:0 overruns:0 frame:0
          TX packets:636 errors:1 dropped:0 overruns:0 carrier:1
          collisions:14 txqueuelen:1000
          RX bytes:57477 (57.4 KB)  TX bytes:48981 (48.9 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:334 errors:0 dropped:0 overruns:0 frame:0
          TX packets:334 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:49347 (49.3 KB)  TX bytes:49347 (49.3 KB)

and the following entry in the /etc/hosts:

10.1.1.3        CAIDA-lan0 CAIDA-0 CAIDA

I added a delay using netem on eth1 (10.1.1.2):

sudo tc qdisc add dev eth1 root netem delay 100ms

and added an entry in nfd using nfdc:

HOME=/tmp nfdc register -c 25 /caida udp://caida
$ nfd-status -fb
Faces:
  faceid=1 remote=internal:// local=internal:// counters={in={0i 143d 0B} out={144i 0d 0B}} local persistent point-to-point
  faceid=254 remote=contentstore:// local=contentstore:// counters={in={0i 0d 0B} out={0i 0d 0B}} local persistent point-to-point
  faceid=255 remote=null:// local=null:// counters={in={0i 0d 0B} out={0i 0d 0B}} local persistent point-to-point
  faceid=256 remote=udp4://224.0.23.170:56363 local=udp4://155.98.38.81:56363 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=257 remote=udp4://224.0.23.170:56363 local=udp4://10.1.1.2:56363 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eth0 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=259 remote=ether://[01:00:5e:00:17:aa] local=dev://eth1 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
  faceid=260 remote=fd://23 local=unix:///run/nfd.sock counters={in={112i 27d 37014B} out={27i 102d 66238B}} local on-demand point-to-point
  faceid=262 remote=fd://24 local=unix:///run/nfd.sock counters={in={1i 0d 422B} out={0i 1d 785B}} local on-demand point-to-point
  faceid=264 remote=udp4://10.1.1.3:6363 local=udp4://10.1.1.2:6363 counters={in={0i 23d 8829B} out={23i 0d 894B}} non-local persistent point-to-point
  faceid=299 remote=fd://25 local=unix:///run/nfd.sock counters={in={1i 0d 46B} out={0i 0d 0B}} local on-demand point-to-point
FIB:
  /localhost/nfd nexthops={faceid=1 (cost=0)}
  /caida nexthops={faceid=264 (cost=25)}
  /localhost/nfd/rib nexthops={faceid=260 (cost=0)}

The remote machine (caida in this case) is running ndnpingserver on "/caida" prefix.

When I do simple ping, I can see the intended delays:

$ ping caida
PING CAIDA-lan0 (10.1.1.3) 56(84) bytes of data.
64 bytes from CAIDA-lan0 (10.1.1.3): icmp_req=1 ttl=64 time=100 ms
64 bytes from CAIDA-lan0 (10.1.1.3): icmp_req=2 ttl=64 time=100 ms

But when I do ndnping I don't see any delay?

$ ndnping /caida

=== Pinging /caida ===

Content From /caida - Ping Reference = 500905038    - Round Trip Time = 0.012777 ms
Content From /caida - Ping Reference = 500905039    - Round Trip Time = 0.009988 ms
Content From /caida - Ping Reference = 500905040    - Round Trip Time = 0.009912 ms

Any idea what's going on here? My guess was nfd is not using the right face to forward the interest. But ndndump shows that the packets are going through eth1:

$ HOME=/tmp/ sudo ndndump -v -i eth1 > /tmp/ndndump.out &

$ ndnping /caida

=== Pinging /caida ===

Content From /caida - Ping Reference = 716405588    - Round Trip Time = 0.012228 ms
Content From /caida - Ping Reference = 716405589    - Round Trip Time = 0.00985 ms

$ cat /tmp/ndndump.out
1415385072.714421 From: 10.1.1.2, To: 10.1.1.3, Tunnel Type: UDP, INTEREST: /caida/ping/716405588?ndn.MustBeFresh=1&ndn.Nonce=716405588
1415385072.722551 From: 10.1.1.3, To: 10.1.1.2, Tunnel Type: UDP, DATA: /caida/ping/716405588
1415385073.706848 From: 10.1.1.2, To: 10.1.1.3, Tunnel Type: UDP, INTEREST: /caida/ping/716405589?ndn.MustBeFresh=1&ndn.Nonce=716405589
1415385073.715069 From: 10.1.1.3, To: 10.1.1.2, Tunnel Type: UDP, DATA: /caida/ping/716405589
#1

Updated by Syed Amin over 5 years ago

Please move this bug to NFD. Thanks!

#2

Updated by Junxiao Shi over 5 years ago

  • Project changed from ndn-cxx to NFD
  • Subject changed from NFD is not supporting the delays enforced by netem. to NFD does not honor netem delays
  • Description updated (diff)
  • Category set to Faces
  • Target version set to v0.3
#3

Updated by Junxiao Shi over 5 years ago

  • Assignee set to Giulio Grassi

Confirmed. This happens for both UDP and TCP face.

UDP-based IP traceroute shows over 100ms delay, so I assume netem command is correct.

#4

Updated by Syed Amin over 5 years ago

Want to add that this started to happen after recent git pull. The commit that I was using previously (8bfd56d051bb1b9f8f78623fba7e7414cb6a5edd) didn't have this problem.

#5

Updated by Alex Afanasyev over 5 years ago

How this can ever happen? NFD doesn't do anything special, except using IP stack. If some combination of flags that we are setting on sockets does some funky with tc, then it is tc's bug.

There could be some timing issues with ndnping, though...

#6

Updated by Syed Amin over 5 years ago

Okay, so the problem is not in NFD, it is in ndn-tlv-ping. I downgraded ndn-tlv-ping to the commit: 38f1ea7698108bdd1cdbdd621316f020b8d0e4dc and it seems to be working fine now.

Other commit ids are:

ndn-cxx: 56860f5cd10bc68d771e9bb93a83d0906107cc9d
nfd: 689f0e9d52f6c08bc1e4f57690340a31b65e27d8
#7

Updated by Lan Wang over 5 years ago

Alex/Junxiao: Who's the right person to fix ndn-tlv-ping?

#8

Updated by Junxiao Shi over 5 years ago

  • Assignee deleted (Giulio Grassi)

Okay, so the problem is not in NFD, it is in ndn-tlv-ping. I downgraded ndn-tlv-ping to the commit: 38f1ea7698108bdd1cdbdd621316f020b8d0e4dc and it seems to be working fine now.

How were you able to downgrade ndn-tlv-ping to a version before C++11 transition? It won't compile in C++03 mode.

If you believe it's indeed a problem in ndn-tlv-ping, please do an additional experiment:

Use two topologies. The first topology is same as what you are initially using. The second topology is below:

  • Topology: A-R-B
  • R is an IP router; use netem to add 500ms delay on both NICs
  • A runs producer without netem rules
  • B runs consumer without netem rules

Please try three producer/consumer combinations:

  • ndnpingserver/ndnping
  • ndn-traffic/ndn-traffic-server
  • ndn-tlv-poke/ndn-tlv-peek (measure rough RTT with time on consumer end)

If ndnpingserver/ndnping is the only pair that reports incorrect RTT, please report a bug on ndn-tlv-ping GitHub site.

#9

Updated by Syed Amin over 5 years ago

How were you able to downgrade ndn-tlv-ping to a version before C++11 transition? It won't compile in C++03 mode.

I first downgraded all packages (ndn-cxx, nfd and ndn-tlv-ping) to the last versions that support C++03 mode and built them. After that I upgraded ndn-cxx and nfd only.

Please try three producer/consumer combinations:

  • ndnpingserver/ndnping
  • ndn-traffic/ndn-traffic-server
  • ndn-tlv-poke/ndn-tlv-peek (measure rough RTT with time on consumer end)

If ndnpingserver/ndnping is the only pair that reports incorrect RTT, please report a bug on ndn-tlv-ping GitHub site.

I am quite sure the problem is in ndn-tlv-ping but I can check above tools as well.

#10

Updated by Davide Pesavento over 5 years ago

I think the regression was introduced in ndn-tlv-ping commit 81b054658eb9221128625e05bf6fd0458b661e22, where the two bind() in expressInterest() were replaced by lambdas. The problem is that the body of the lambda is lazily evaluated, as opposed to the bind arguments which are evaluated immediately at the call site, therefore in the current code steady_clock::now() is evaluated only when the callback is executed, i.e. when the ping reply comes back. If my analysis is correct, then ndn-tlv-ping RTT calculation is completely broken at the moment, even without netem.

#11

Updated by Junxiao Shi over 5 years ago

  • Status changed from New to Rejected

Not a NFD bug. Please report to ndn-tlv-ping bug tracker.

#12

Updated by Syed Amin over 5 years ago

David, your analysis seems right. The problem is also in the ndn-traffic-generator. I'll report it to the GitHub.

Also available in: Atom PDF