Bug #1971
NccStrategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore
Status: Closed
% Done: 100%
Description
Steps to reproduce:
- send an Interest with MustBeFresh=yes
- respond with a Data whose FreshnessPeriod is 0ms
- within 50ms, retransmit the same Interest with a fresh Nonce
Expected: Interest is forwarded to producer
Actual: Interest is not forwarded and eventually times out
Root cause:
- The first Interest creates a PIT entry. NCC strategy knows this PIT entry is new.
- When the first Interest is satisfied, in-records and out-records are deleted from the PIT entry; the PIT entry itself is scheduled to be deleted when the straggler timer expires (100ms).
- The second Interest arrives before the straggler timer expires, so it finds the same PIT entry. The straggler timer is cancelled.
- The second Interest cannot be satisfied by ContentStore because the Data is already stale.
- NCC strategy concludes this PIT entry is old via NccStrategy::PitEntryInfo::isNewInterest, so it does not forward the Interest.
Original report: nfd-status -f : gets frequent timeout on Testbed nodes
When running nfd-status on Testbed nodes we get frequent results of:
Request timed out
I've narrowed it down to the face status portion of nfd-status.
I have not been able to get this to happen on my local nfd running on MacOS.
Updated by Alex Afanasyev over 9 years ago
Do you have an estimate about the amount of traffic testbed node has? I mean, is there active Interest-Data exchanges during these timeouts?
Updated by John DeHart over 9 years ago
Alex Afanasyev wrote:
Do you have an estimate about the amount of traffic testbed node has? I mean, is there active Interest-Data exchanges during these timeouts?
There is some traffic but not anything more than our typical background traffic.
I also just updated our local node that supports ndnmap to the latest Ubuntu version of nfd from the PPA, and I can't get it to give me a timeout. So I don't suspect it is purely an Ubuntu nfd issue.
Updated by Junxiao Shi over 9 years ago
I also see this behavior frequently on testbed nodes, and Face Dataset is the only portion that has this problem.
I suspect it's not due to ContentStore eviction, because other datasets don't have this problem.
Updated by John DeHart over 9 years ago
I have a clue.
If I stop the collection of bandwidth data from NFD for display on ndnmap then I do not get timeouts from nfd-status.
We collect data from nfd by sending an interest: Interest interest("/localhost/nfd/faces/list");
We sleep for 1 second between each subsequent interest to collect the data.
With face logging turned up it looks something like this:
1410205451.481137 DEBUG: [FaceManager] command result: processing verb: list
1410205452.991051 DEBUG: [FaceManager] command result: processing verb: list
1410205454.505937 DEBUG: [FaceManager] command result: processing verb: list
1410205456.015470 DEBUG: [FaceManager] command result: processing verb: list
So it turns out to be about every 1.5 seconds that we collect the data.
What I have noticed is that when an nfd-status -f times out, the Face logging locks up for about 4 seconds (the default lifetime of an Interest) and then goes back to normal.
What that looks like in the log is something like this:
1410205619.113779 DEBUG: [FaceManager] command result: processing verb: list
1410205620.625075 DEBUG: [FaceManager] command result: processing verb: list
1410205622.133072 DEBUG: [FaceManager] command result: processing verb: list
1410205623.643113 DEBUG: [FaceManager] command result: processing verb: list
1410205623.700257 INFO: [FaceTable] Added face id=414 remote=fd://31 local=unix:///run/nfd.sock
1410205623.705905 DEBUG: [FaceManager] command result: processing verb: events
1410205627.701083 INFO: [UnixStreamFace] [id:414,uri:fd://31] Connection closed
1410205627.708151 INFO: [FaceTable] Removed face id=414 remote=fd://31 local=unix:///run/nfd.sock
1410205627.708982 DEBUG: [FaceManager] command result: processing verb: events
1410205630.154877 DEBUG: [FaceManager] command result: processing verb: list
1410205631.664020 DEBUG: [FaceManager] command result: processing verb: list
1410205633.172634 DEBUG: [FaceManager] command result: processing verb: list
A normal nfd-status -f in the log file looks like this:
1410205616.097154 DEBUG: [FaceManager] command result: processing verb: list
1410205617.174842 INFO: [FaceTable] Added face id=413 remote=fd://31 local=unix:///run/nfd.sock
1410205617.180671 DEBUG: [FaceManager] command result: processing verb: list
1410205617.184581 DEBUG: [FaceManager] command result: processing verb: events
1410205617.185192 INFO: [UnixStreamFace] [id:413,uri:fd://31] Connection closed
1410205617.189059 INFO: [FaceTable] Removed face id=413 remote=fd://31 local=unix:///run/nfd.sock
1410205617.189812 DEBUG: [FaceManager] command result: processing verb: events
1410205617.606849 DEBUG: [FaceManager] command result: processing verb: list
I thought at first that it might be how close together the data request (processing verb: list) and the nfd-status -f (processing verb: events) were, but in the successful run above they are also very close together...
Updated by John DeHart over 9 years ago
Another clue. If I turn the data collection off and do this:
ndnops@wundngw:~$ nfd-status -f; nfd-status -f
The first nfd-status works and the second nfd-status always times out.
However, if I do that on my laptop I still do not get a timeout from nfd-status.
Updated by Junxiao Shi over 9 years ago
One difference between testbed hub and laptop is: hub has a full ContentStore.
I suggest trying this on the laptop:
- set a smaller ContentStore capacity
- fill the ContentStore with traffic generator; for the worst case, use long lifetime
- request Face Dataset twice consecutively
Updated by John DeHart over 9 years ago
I've tried before, during, and after sending data through NFD, and still no timeouts on my laptop.
Updated by Junxiao Shi over 9 years ago
- Category set to Forwarding
I'm able to reproduce this bug using the command from note-5 on the ndn6.tk node.
Some related TRACE logs are:
# first nfd-status is connected
1411684697.083361 DEBUG: [UnixStreamChannel] [/tmp/nfd.sock] << Incoming connection
1411684697.083496 INFO: [FaceTable] Added face id=274 remote=fd://24 local=unix:///tmp/nfd.sock
1411684697.089089 TRACE: [UnixStreamFace] [id:274,uri:fd://24] Received: 46 bytes
# new PIT entry
1411684697.089369 DEBUG: [Forwarder] onIncomingInterest face=274 interest=/localhost/nfd/faces/list
1411684697.089484 TRACE: [NameTree] lookup /localhost/nfd/faces/list
1411684697.090265 TRACE: [NameTree] Did not find /localhost/nfd/faces/list, need to insert it to the table
# no CS match, forwarding to InternalFace
1411684697.090362 TRACE: [ContentStore] find() /localhost/nfd/faces/list
1411684697.105279 DEBUG: [Forwarder] onOutgoingInterest face=1 interest=/localhost/nfd/faces/list
# FaceManager generates dataset
1411684697.106135 DEBUG: [InternalFace] received Interest: /localhost/nfd/faces/list
1411684697.106240 DEBUG: [InternalFace] found Interest filter for /localhost/nfd/faces (previous match)
1411684697.106318 DEBUG: [FaceManager] command result: processing verb: list
# Data is admitted to ContentStore
1411684697.110192 DEBUG: [Forwarder] onIncomingData face=1 data=/localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
1411684697.110379 TRACE: [NameTree] NameTree::findAllMatches/localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
1411684697.110535 TRACE: [NameTree] findLongestPrefixMatch /localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
1411684697.110763 TRACE: [ContentStore] insert() /localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00/%11%90%9Ds%F2%88%85%B5%A7%9F%05%C7%A33x%29%0D2VWa%20Y%CD%848F3%ACGDw
1411684697.111454 TRACE: [ContentStore] insertToSkipList() /localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00/%11%90%9Ds%F2%88%85%B5%A7%9F%05%C7%A33x%29%0D2VWa%20Y%CD%848F3%ACGDw, skipList size 67
1411684697.111775 TRACE: [ContentStore] Not a duplicate
1411684697.111797 TRACE: [ContentStore] insertafter
1411684697.111822 DEBUG: [Forwarder] onIncomingData matching=/localhost/nfd/faces/list
# Data is returned to nfd-status
1411684697.111969 DEBUG: [Forwarder] onOutgoingData face=274 data=/localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
# first nfd-status is gone
1411684697.130626 INFO: [UnixStreamFace] [id:274,uri:fd://24] Connection closed
# second nfd-status is connected
1411684697.158898 DEBUG: [UnixStreamChannel] [/tmp/nfd.sock] << Incoming connection
1411684697.158972 INFO: [FaceTable] Added face id=275 remote=fd://24 local=unix:///tmp/nfd.sock
1411684697.163958 TRACE: [UnixStreamFace] [id:275,uri:fd://24] Received: 46 bytes
# existing PIT entry
1411684697.164012 DEBUG: [Forwarder] onIncomingInterest face=275 interest=/localhost/nfd/faces/list
1411684697.164105 TRACE: [NameTree] lookup /localhost/nfd/faces/list
1411684697.164793 TRACE: [NameTree] Name /localhost/nfd/faces/list hash value = 16399172738370688085 location = 85
# no CS match
1411684697.164900 TRACE: [ContentStore] find() /localhost/nfd/faces/list
1411684697.178367 TRACE: [ContentStore] violates mustBeFresh
# but Interest is not forwarded
This appears to be a bug in the forwarding pipeline that only occurs when the FreshnessPeriod is less than the InterestLifetime.
Updated by Junxiao Shi over 9 years ago
- Subject changed from nfd-status -f : gets frequent timeout on Testbed nodes to NCC strategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore
- Description updated (diff)
- Assignee set to Junxiao Shi
I recall that both testbed nodes and ndn6.tk node are using NCC strategy, while most laptops are on best-route v2.
NCC strategy uses a field in its StrategyInfo to determine whether a PIT entry is new.
This is incompatible with the straggler timer, but the problem only appears if a similar Interest arrives within that period and cannot be satisfied by ContentStore.
I don't remember why NCC strategy is designed in this way, and I don't have a concrete solution yet.
Updated by Junxiao Shi over 9 years ago
- Subject changed from NCC strategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore to NccStrategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore
- Status changed from New to In Progress
- Target version set to v0.3
- Estimated time set to 3.00 h
Updated by Junxiao Shi over 9 years ago
- Status changed from In Progress to Code review
Updated by Junxiao Shi over 9 years ago
- Status changed from Code review to Closed
- % Done changed from 0 to 100