Bug #2150: NFD does not honor netem delays
Status: Closed
Description
I have a machine with multiple interfaces:
$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:24:e8:7a:44:cd
inet addr:155.98.38.81 Bcast:155.98.39.255 Mask:255.255.252.0
inet6 addr: fe80::224:e8ff:fe7a:44cd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:11552 errors:0 dropped:0 overruns:0 frame:0
TX packets:2958 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1770097 (1.7 MB) TX bytes:559424 (559.4 KB)
eth1 Link encap:Ethernet HWaddr 00:24:e8:7a:44:cf
inet addr:10.1.1.2 Bcast:10.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::224:e8ff:fe7a:44cf/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:674 errors:0 dropped:0 overruns:0 frame:0
TX packets:636 errors:1 dropped:0 overruns:0 carrier:1
collisions:14 txqueuelen:1000
RX bytes:57477 (57.4 KB) TX bytes:48981 (48.9 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:334 errors:0 dropped:0 overruns:0 frame:0
TX packets:334 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:49347 (49.3 KB) TX bytes:49347 (49.3 KB)
and the following entry in /etc/hosts:
10.1.1.3 CAIDA-lan0 CAIDA-0 CAIDA
I added a delay using netem on eth1 (10.1.1.2):
sudo tc qdisc add dev eth1 root netem delay 100ms
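To double-check that the rule is installed, the qdisc can be listed (the output should show the netem qdisc with its configured delay):
$ tc qdisc show dev eth1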
Then I registered a route in NFD using nfdc:
HOME=/tmp nfdc register -c 25 /caida udp://caida
$ nfd-status -fb
Faces:
faceid=1 remote=internal:// local=internal:// counters={in={0i 143d 0B} out={144i 0d 0B}} local persistent point-to-point
faceid=254 remote=contentstore:// local=contentstore:// counters={in={0i 0d 0B} out={0i 0d 0B}} local persistent point-to-point
faceid=255 remote=null:// local=null:// counters={in={0i 0d 0B} out={0i 0d 0B}} local persistent point-to-point
faceid=256 remote=udp4://224.0.23.170:56363 local=udp4://155.98.38.81:56363 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
faceid=257 remote=udp4://224.0.23.170:56363 local=udp4://10.1.1.2:56363 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
faceid=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eth0 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
faceid=259 remote=ether://[01:00:5e:00:17:aa] local=dev://eth1 counters={in={0i 0d 0B} out={0i 0d 0B}} non-local persistent point-to-point
faceid=260 remote=fd://23 local=unix:///run/nfd.sock counters={in={112i 27d 37014B} out={27i 102d 66238B}} local on-demand point-to-point
faceid=262 remote=fd://24 local=unix:///run/nfd.sock counters={in={1i 0d 422B} out={0i 1d 785B}} local on-demand point-to-point
faceid=264 remote=udp4://10.1.1.3:6363 local=udp4://10.1.1.2:6363 counters={in={0i 23d 8829B} out={23i 0d 894B}} non-local persistent point-to-point
faceid=299 remote=fd://25 local=unix:///run/nfd.sock counters={in={1i 0d 46B} out={0i 0d 0B}} local on-demand point-to-point
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/caida nexthops={faceid=264 (cost=25)}
/localhost/nfd/rib nexthops={faceid=260 (cost=0)}
The remote machine (caida in this case) is running ndnpingserver on the "/caida" prefix.
When I do a simple ping, I can see the intended delay:
$ ping caida
PING CAIDA-lan0 (10.1.1.3) 56(84) bytes of data.
64 bytes from CAIDA-lan0 (10.1.1.3): icmp_req=1 ttl=64 time=100 ms
64 bytes from CAIDA-lan0 (10.1.1.3): icmp_req=2 ttl=64 time=100 ms
But when I do ndnping, I don't see any delay:
$ ndnping /caida
=== Pinging /caida ===
Content From /caida - Ping Reference = 500905038 - Round Trip Time = 0.012777 ms
Content From /caida - Ping Reference = 500905039 - Round Trip Time = 0.009988 ms
Content From /caida - Ping Reference = 500905040 - Round Trip Time = 0.009912 ms
Any idea what's going on here? My guess was that NFD is not using the right face to forward the Interest, but ndndump shows that the packets are going through eth1:
$ HOME=/tmp/ sudo ndndump -v -i eth1 > /tmp/ndndump.out &
$ ndnping /caida
=== Pinging /caida ===
Content From /caida - Ping Reference = 716405588 - Round Trip Time = 0.012228 ms
Content From /caida - Ping Reference = 716405589 - Round Trip Time = 0.00985 ms
$ cat /tmp/ndndump.out
1415385072.714421 From: 10.1.1.2, To: 10.1.1.3, Tunnel Type: UDP, INTEREST: /caida/ping/716405588?ndn.MustBeFresh=1&ndn.Nonce=716405588
1415385072.722551 From: 10.1.1.3, To: 10.1.1.2, Tunnel Type: UDP, DATA: /caida/ping/716405588
1415385073.706848 From: 10.1.1.2, To: 10.1.1.3, Tunnel Type: UDP, INTEREST: /caida/ping/716405589?ndn.MustBeFresh=1&ndn.Nonce=716405589
1415385073.715069 From: 10.1.1.3, To: 10.1.1.2, Tunnel Type: UDP, DATA: /caida/ping/716405589
Updated by Junxiao Shi almost 10 years ago
- Project changed from ndn-cxx to NFD
- Subject changed from "NFD is not supporting the delays enforced by netem." to "NFD does not honor netem delays"
- Description updated (diff)
- Category set to Faces
- Target version set to v0.3
Updated by Junxiao Shi almost 10 years ago
- Assignee set to Giulio Grassi
Confirmed. This happens for both UDP and TCP faces.
A UDP-based IP traceroute shows over 100 ms of delay, so I assume the netem command is correct.
Updated by Syed Amin almost 10 years ago
I want to add that this started happening after a recent git pull. The commit I was using previously (8bfd56d051bb1b9f8f78623fba7e7414cb6a5edd) didn't have this problem.
Updated by Alex Afanasyev almost 10 years ago
How can this ever happen? NFD doesn't do anything special except use the IP stack. If some combination of flags that we set on sockets does something funky with tc, then it is tc's bug.
There could be a timing issue in ndnping, though...
Updated by Syed Amin almost 10 years ago
Okay, so the problem is not in NFD; it is in ndn-tlv-ping. I downgraded ndn-tlv-ping to commit 38f1ea7698108bdd1cdbdd621316f020b8d0e4dc and it seems to be working fine now.
The other commit IDs are:
ndn-cxx: 56860f5cd10bc68d771e9bb93a83d0906107cc9d
nfd: 689f0e9d52f6c08bc1e4f57690340a31b65e27d8
Updated by Lan Wang almost 10 years ago
Alex/Junxiao: Who's the right person to fix ndn-tlv-ping?
Updated by Junxiao Shi almost 10 years ago
- Assignee deleted (Giulio Grassi)
> Okay, so the problem is not in NFD; it is in ndn-tlv-ping. I downgraded ndn-tlv-ping to commit 38f1ea7698108bdd1cdbdd621316f020b8d0e4dc and it seems to be working fine now.
How were you able to downgrade ndn-tlv-ping to a version before the C++11 transition? It won't compile in C++03 mode.
If you believe it's indeed a problem in ndn-tlv-ping, please do an additional experiment:
Use two topologies. The first topology is the same as the one you are initially using. The second topology is below:
- Topology: A-R-B
- R is an IP router; use netem to add a 500 ms delay on both NICs (example commands at the end of this note)
- A runs producer without netem rules
- B runs consumer without netem rules
Please try three producer/consumer combinations:
- ndnpingserver/ndnping
- ndn-traffic/ndn-traffic-server
- ndn-tlv-poke/ndn-tlv-peek (measure a rough RTT with the time command on the consumer end)
If ndnpingserver/ndnping is the only pair that reports an incorrect RTT, please report a bug on the ndn-tlv-ping GitHub site.
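For the second topology, the netem rules on R can be added the same way as in the original report, e.g. (assuming R's two NICs are named eth0 and eth1; substitute the actual interface names):
sudo tc qdisc add dev eth0 root netem delay 500ms
sudo tc qdisc add dev eth1 root netem delay 500ms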
Updated by Syed Amin almost 10 years ago
> How were you able to downgrade ndn-tlv-ping to a version before the C++11 transition? It won't compile in C++03 mode.
I first downgraded all packages (ndn-cxx, nfd, and ndn-tlv-ping) to the last versions that support C++03 mode and built them. After that, I upgraded only ndn-cxx and nfd.
> Please try three producer/consumer combinations:
> - ndnpingserver/ndnping
> - ndn-traffic/ndn-traffic-server
> - ndn-tlv-poke/ndn-tlv-peek (measure a rough RTT with the time command on the consumer end)
> If ndnpingserver/ndnping is the only pair that reports an incorrect RTT, please report a bug on the ndn-tlv-ping GitHub site.
I am quite sure the problem is in ndn-tlv-ping, but I can check the above tools as well.
Updated by Davide Pesavento almost 10 years ago
I think the regression was introduced in ndn-tlv-ping commit 81b054658eb9221128625e05bf6fd0458b661e22, where the two bind() calls in expressInterest() were replaced by lambdas. The problem is that the body of a lambda is lazily evaluated, as opposed to bind arguments, which are evaluated immediately at the call site. Therefore, in the current code steady_clock::now() is evaluated only when the callback is executed, i.e. when the ping reply comes back, so the "send" and "receive" timestamps are effectively the same instant. If my analysis is correct, then the ndn-tlv-ping RTT calculation is completely broken at the moment, even without netem.
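For illustration, here is a minimal standalone C++ sketch of that difference (not the actual ndn-tlv-ping code; all names are made up):

#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

using steady_clock = std::chrono::steady_clock;

// Pretend this is the ping-reply callback: it computes the RTT
// from a timestamp taken when the Interest was sent.
void onReply(steady_clock::time_point sentTime)
{
  auto rtt = std::chrono::duration_cast<std::chrono::milliseconds>(
               steady_clock::now() - sentTime);
  std::cout << "RTT = " << rtt.count() << " ms\n";
}

int main()
{
  // bind: steady_clock::now() is evaluated HERE, at "send" time.
  auto cbBind = std::bind(&onReply, steady_clock::now());

  // Naive lambda: steady_clock::now() is evaluated only when the
  // callback runs, so the measured RTT is always near zero (the bug).
  auto cbLambda = [] { onReply(steady_clock::now()); };

  // Correct lambda: capture the timestamp at send time.
  auto sentTime = steady_clock::now();
  auto cbFixed = [sentTime] { onReply(sentTime); };

  // Simulate 100 ms of network delay before the "reply" arrives.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));

  cbBind();   // prints ~100 ms
  cbLambda(); // prints ~0 ms
  cbFixed();  // prints ~100 ms
}

Running this, cbBind and cbFixed report roughly the simulated delay, while cbLambda reports a near-zero value like the RTTs in the original report.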
Updated by Junxiao Shi almost 10 years ago
- Status changed from New to Rejected
Not an NFD bug. Please report it to the ndn-tlv-ping bug tracker.
Updated by Syed Amin almost 10 years ago
Davide, your analysis seems right. The problem also exists in ndn-traffic-generator. I'll report it on GitHub.
Updated by Davide Pesavento over 2 years ago
- Category deleted (Faces)
- Target version deleted (v0.3)
- Start date deleted (11/07/2014)