Bug #1971

NccStrategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore

Added by John DeHart about 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: Normal
Assignee: Junxiao Shi
Category: Forwarding
Target version: v0.3
Start date: 09/08/2014
Due date:
% Done: 100%
Estimated time: 3.00 h

Description

Steps to reproduce:

  1. send an Interest with MustBeFresh=yes
  2. respond with a Data with FreshnessPeriod=0ms
  3. within 50ms, retransmit the same Interest with a fresh Nonce

Expected: Interest is forwarded to producer

Actual: Interest is not forwarded and eventually times out
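
A minimal consumer sketch of these steps, using the ndn-cxx client API (callback signatures follow current ndn-cxx; the prefix /app/test and the producer behind the forwarder are assumptions for illustration):

#include <ndn-cxx/face.hpp>
#include <iostream>

int
main()
{
  ndn::Face face;

  ndn::Interest interest(ndn::Name("/app/test")); // hypothetical prefix
  interest.setMustBeFresh(true);                  // step 1
  interest.setInterestLifetime(ndn::time::seconds(4));

  face.expressInterest(interest,
    [&] (const ndn::Interest&, const ndn::Data& data) {
      // step 2 happened: the producer answered with FreshnessPeriod=0ms,
      // so the cached copy is immediately stale
      std::cout << "first Data: " << data.getName() << std::endl;

      // step 3: retransmit well within the 100ms straggler timer,
      // with a fresh Nonce so it is not dropped as a duplicate
      ndn::Interest retx(interest);
      retx.refreshNonce();
      face.expressInterest(retx,
        [] (const ndn::Interest&, const ndn::Data&) {
          std::cout << "retx satisfied (expected)" << std::endl;
        },
        [] (const ndn::Interest&, const ndn::lp::Nack&) {},
        [] (const ndn::Interest&) {
          std::cout << "retx timed out (actual, under NCC)" << std::endl;
        });
    },
    [] (const ndn::Interest&, const ndn::lp::Nack&) {},
    [] (const ndn::Interest&) {
      std::cout << "first Interest timed out" << std::endl;
    });

  face.processEvents();
  return 0;
}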

Root cause:

  1. The first Interest creates a PIT entry. NCC strategy knows this PIT entry is new.
  2. When the first Interest is satisfied, in-records and out-record are deleted from the PIT entry; the PIT entry itself is scheduled to be deleted when straggler timer expires (100ms).
  3. The second Interest arrives before the straggler timer expires, so it finds the same PIT entry. The straggler timer is cancelled.
  4. The second Interest cannot be satisfied by ContentStore because the Data is already stale.
  5. NCC strategy, via NccStrategy::PitEntryInfo::isNewInterest, considers this PIT entry old, so it won't forward the Interest.

Original report: nfd-status -f gets frequent timeouts on Testbed nodes

When running nfd-status on Testbed nodes we get frequent results of:

Request timed out

I've narrowed it down to the face status portion of nfd-status.

I have not been able to get this to happen on my local nfd running on macOS.

Actions #1

Updated by Alex Afanasyev about 10 years ago

Do you have an estimate of the amount of traffic the testbed node has? I mean, are there active Interest-Data exchanges during these timeouts?

Actions #2

Updated by John DeHart about 10 years ago

Alex Afanasyev wrote:

Do you have an estimate of the amount of traffic the testbed node has? I mean, are there active Interest-Data exchanges during these timeouts?

There is some traffic, but nothing more than our typical background traffic.

I also just updated our local node that supports ndnmap to the latest Ubuntu version of nfd from the PPA, and I can't get it to give me a timeout. So I don't suspect it is purely an Ubuntu nfd issue.

Actions #3

Updated by Junxiao Shi about 10 years ago

I also see this behavior frequently on testbed nodes, and Face Dataset is the only portion that has this problem.
I suspect it's not due to ContentStore eviction, because other datasets don't have this problem.

Actions #4

Updated by John DeHart about 10 years ago

I have a clue.

If I stop the collection of bandwidth data from NFD for display on ndnmap, then I do not get timeouts from nfd-status.

We collect data from nfd by sending an interest: Interest interest("/localhost/nfd/faces/list");

We sleep for 1 second between each subsequent interest to collect the data.
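
For context, here is a sketch of that polling loop as I understand it (ndn-cxx; hypothetical structure, not the actual ndnmap collector code):

#include <ndn-cxx/face.hpp>
#include <chrono>
#include <iostream>
#include <thread>

int
main()
{
  ndn::Face face;
  while (true) {
    ndn::Interest interest(ndn::Name("/localhost/nfd/faces/list"));
    interest.setMustBeFresh(true); // MustBeFresh, consistent with the repro in this bug
    face.expressInterest(interest,
      [] (const ndn::Interest&, const ndn::Data& data) {
        std::cout << "dataset segment: " << data.getName() << std::endl;
      },
      [] (const ndn::Interest&, const ndn::lp::Nack&) {},
      [] (const ndn::Interest&) { std::cout << "timeout" << std::endl; });
    face.processEvents(); // runs until this exchange completes
    std::this_thread::sleep_for(std::chrono::seconds(1)); // 1s between polls
  }
}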

With face logging turned up it looks something like this:

1410205451.481137 DEBUG: [FaceManager] command result: processing verb: list
1410205452.991051 DEBUG: [FaceManager] command result: processing verb: list
1410205454.505937 DEBUG: [FaceManager] command result: processing verb: list
1410205456.015470 DEBUG: [FaceManager] command result: processing verb: list

So it turns out to be about every 1.5 seconds that we collect the data.

What I have noticed is that if I do an nfd-status -f that times out, the Face logging locks up for about 4 seconds (the default lifetime of an Interest) and then goes back to normal.

What that looks like in the log is something like this:

1410205619.113779 DEBUG: [FaceManager] command result: processing verb: list
1410205620.625075 DEBUG: [FaceManager] command result: processing verb: list
1410205622.133072 DEBUG: [FaceManager] command result: processing verb: list
1410205623.643113 DEBUG: [FaceManager] command result: processing verb: list
1410205623.700257 INFO: [FaceTable] Added face id=414 remote=fd://31 local=unix:///run/nfd.sock
1410205623.705905 DEBUG: [FaceManager] command result: processing verb: events
1410205627.701083 INFO: [UnixStreamFace] [id:414,uri:fd://31] Connection closed
1410205627.708151 INFO: [FaceTable] Removed face id=414 remote=fd://31 local=unix:///run/nfd.sock
1410205627.708982 DEBUG: [FaceManager] command result: processing verb: events
1410205630.154877 DEBUG: [FaceManager] command result: processing verb: list
1410205631.664020 DEBUG: [FaceManager] command result: processing verb: list
1410205633.172634 DEBUG: [FaceManager] command result: processing verb: list

A normal nfd-status -f in the log file looks like this:

1410205616.097154 DEBUG: [FaceManager] command result: processing verb: list
1410205617.174842 INFO: [FaceTable] Added face id=413 remote=fd://31 local=unix:///run/nfd.sock
1410205617.180671 DEBUG: [FaceManager] command result: processing verb: list
1410205617.184581 DEBUG: [FaceManager] command result: processing verb: events
1410205617.185192 INFO: [UnixStreamFace] [id:413,uri:fd://31] Connection closed
1410205617.189059 INFO: [FaceTable] Removed face id=413 remote=fd://31 local=unix:///run/nfd.sock
1410205617.189812 DEBUG: [FaceManager] command result: processing verb: events
1410205617.606849 DEBUG: [FaceManager] command result: processing verb: list

I thought at first that it might be due to how close together the data request (processing verb: list) and the nfd-status -f (processing verb: events) were, but in the successful run above they are also very close together...

Actions #5

Updated by John DeHart about 10 years ago

Another clue. If I turn the data collection off and do this:

ndnops@wundngw:~$ nfd-status -f; nfd-status -f

The first nfd-status works and the second nfd-status always times out.

However, if I do that on my laptop I still do not get a timeout from nfd-status.

Actions #6

Updated by Junxiao Shi about 10 years ago

One difference between a testbed hub and a laptop is that the hub has a full ContentStore.

I suggest trying this on the laptop:

  1. set a smaller ContentStore capacity
  2. fill the ContentStore with a traffic generator; for the worst case, use a long lifetime
  3. request the Face Dataset twice consecutively

Actions #7

Updated by John DeHart about 10 years ago

I've tried before, during, and after sending data through NFD, and still get no timeouts on my laptop.

Actions #8

Updated by Junxiao Shi about 10 years ago

  • Category set to Forwarding

I'm able to reproduce this bug using the command from note 5 on the ndn6.tk node.

Some related TRACE logs are:

# first nfd-status is connected
1411684697.083361 DEBUG: [UnixStreamChannel] [/tmp/nfd.sock] << Incoming connection
1411684697.083496 INFO: [FaceTable] Added face id=274 remote=fd://24 local=unix:///tmp/nfd.sock
1411684697.089089 TRACE: [UnixStreamFace] [id:274,uri:fd://24] Received: 46 bytes

# new PIT entry
1411684697.089369 DEBUG: [Forwarder] onIncomingInterest face=274 interest=/localhost/nfd/faces/list
1411684697.089484 TRACE: [NameTree] lookup /localhost/nfd/faces/list
1411684697.090265 TRACE: [NameTree] Did not find /localhost/nfd/faces/list, need to insert it to the table

# no CS match, forwarding to InternalFace
1411684697.090362 TRACE: [ContentStore] find() /localhost/nfd/faces/list
1411684697.105279 DEBUG: [Forwarder] onOutgoingInterest face=1 interest=/localhost/nfd/faces/list

# FaceManager generates dataset
1411684697.106135 DEBUG: [InternalFace] received Interest: /localhost/nfd/faces/list
1411684697.106240 DEBUG: [InternalFace] found Interest filter for /localhost/nfd/faces (previous match)
1411684697.106318 DEBUG: [FaceManager] command result: processing verb: list

# Data is admitted to ContentStore
1411684697.110192 DEBUG: [Forwarder] onIncomingData face=1 data=/localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
1411684697.110379 TRACE: [NameTree] NameTree::findAllMatches/localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
1411684697.110535 TRACE: [NameTree] findLongestPrefixMatch /localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00
1411684697.110763 TRACE: [ContentStore] insert() /localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00/%11%90%9Ds%F2%88%85%B5%A7%9F%05%C7%A33x%29%0D2VWa%20Y%CD%848F3%ACGDw
1411684697.111454 TRACE: [ContentStore] insertToSkipList() /localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00/%11%90%9Ds%F2%88%85%B5%A7%9F%05%C7%A33x%29%0D2VWa%20Y%CD%848F3%ACGDw, skipList size 67
1411684697.111775 TRACE: [ContentStore] Not a duplicate
1411684697.111797 TRACE: [ContentStore] insertafter
1411684697.111822 DEBUG: [Forwarder] onIncomingData matching=/localhost/nfd/faces/list

# Data is returned to nfd-status
1411684697.111969 DEBUG: [Forwarder] onOutgoingData face=274 data=/localhost/nfd/faces/list/%00%00%01H%AE%F7%04%12/%00%00

# first nfd-status is gone
1411684697.130626 INFO: [UnixStreamFace] [id:274,uri:fd://24] Connection closed

# second nfd-status is connected
1411684697.158898 DEBUG: [UnixStreamChannel] [/tmp/nfd.sock] << Incoming connection
1411684697.158972 INFO: [FaceTable] Added face id=275 remote=fd://24 local=unix:///tmp/nfd.sock
1411684697.163958 TRACE: [UnixStreamFace] [id:275,uri:fd://24] Received: 46 bytes

# existing PIT entry
1411684697.164012 DEBUG: [Forwarder] onIncomingInterest face=275 interest=/localhost/nfd/faces/list
1411684697.164105 TRACE: [NameTree] lookup /localhost/nfd/faces/list
1411684697.164793 TRACE: [NameTree] Name /localhost/nfd/faces/list hash value = 16399172738370688085  location = 85

# no CS match
1411684697.164900 TRACE: [ContentStore] find() /localhost/nfd/faces/list
1411684697.178367 TRACE: [ContentStore] violates mustBeFresh

# but Interest is not forwarded

This appears to be a bug in the forwarding pipeline that only happens when FreshnessPeriod is less than InterestLifetime.
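
For reference, the freshness rule behind the "violates mustBeFresh" line, as a simplified sketch (hypothetical names, not the actual ContentStore source):

#include <chrono>

using Clock = std::chrono::steady_clock;

struct CsEntry
{
  Clock::time_point staleAt; // arrival time + FreshnessPeriod

  bool
  isFresh(Clock::time_point now) const
  {
    return now < staleAt;
  }
};

bool
canSatisfy(const CsEntry& entry, bool mustBeFresh, Clock::time_point now)
{
  // With FreshnessPeriod=0ms, staleAt equals the arrival time, so the
  // entry is stale immediately and every MustBeFresh lookup is rejected:
  // this is the "violates mustBeFresh" TRACE line above.
  return !mustBeFresh || entry.isFresh(now);
}

int
main()
{
  auto now = Clock::now();
  CsEntry entry{now}; // FreshnessPeriod=0ms: stale from the start
  return canSatisfy(entry, true, now) ? 0 : 1; // returns 1: no CS match
}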

Actions #9

Updated by Junxiao Shi about 10 years ago

  • Subject changed from nfd-status -f : gets frequent timeout on Testbed nodes to NCC strategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore
  • Description updated (diff)
  • Assignee set to Junxiao Shi

I recall that both the testbed nodes and the ndn6.tk node are using the NCC strategy, while most laptops are on best-route v2.

NCC strategy uses a field in its StrategyInfo to determine whether a PIT entry is new.
This is incompatible with the straggler timer, but the problem only appears if a similar Interest arrives within that period and cannot be satisfied by the ContentStore.

I don't remember why NCC strategy is designed in this way, and I don't have a concrete solution yet.
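
To illustrate the incompatibility, a simplified sketch of the pattern (hypothetical names and structure, not the actual NccStrategy source):

#include <iostream>

struct Interest {};

struct PitEntry
{
  // StrategyInfo attached to the PIT entry; it survives as long as the
  // entry does, including across the 100ms straggler timer
  struct StrategyInfo
  {
    bool isNewInterest = true; // set false once the strategy has acted
  } info;
};

void
forwardToUpstreams(PitEntry&, const Interest&)
{
  std::cout << "forwarding Interest upstream" << std::endl;
}

void
afterReceiveInterest(PitEntry& pitEntry, const Interest& interest)
{
  if (!pitEntry.info.isNewInterest) {
    // The strategy assumes an "old" entry already has forwarding in
    // progress. But when the first Data satisfied the entry and only the
    // straggler timer keeps it alive, its in/out-records are gone:
    // nothing is in progress, and the retransmission is silently dropped.
    return;
  }
  pitEntry.info.isNewInterest = false;
  forwardToUpstreams(pitEntry, interest);
}

int
main()
{
  PitEntry entry;
  afterReceiveInterest(entry, Interest{}); // first Interest: forwarded
  // ... Data returns; records erased; straggler timer scheduled ...
  afterReceiveInterest(entry, Interest{}); // retransmission: dropped (the bug)
}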

Actions #10

Updated by Junxiao Shi about 10 years ago

  • Subject changed from NCC strategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore to NccStrategy: Interest not forwarded if a similar Interest arrives during straggler timer and cannot be satisfied by ContentStore
  • Status changed from New to In Progress
  • Target version set to v0.3
  • Estimated time set to 3.00 h

Actions #11

Updated by Junxiao Shi about 10 years ago

  • Status changed from In Progress to Code review

Actions #12

Updated by Junxiao Shi about 10 years ago

  • Status changed from Code review to Closed
  • % Done changed from 0 to 100