Bug #1742
closedSome entries got stuck in RIB manager, causing problems
100%
Description
I will update this issue as I have more information. Currently, I'm running repo-ng test cases and unless I re-start nrd/nfd, the second run of test cases fails.
I was getting a lot of unrelated records (something from previous runs of the scenario), and at the moment I'm getting (NFD Error: 410 Face not found)
Updated by Alex Afanasyev over 10 years ago
Potential reason could be in missed notification of face destruction
Updated by Alex Afanasyev over 10 years ago
I have pushed 2 commits to NFD and ndn-cxx: http://gerrit.named-data.net/#/c/1000/ and http://gerrit.named-data.net/#/c/999/. Assuming they are used, the following relatively constantly reproduces the problem on my laptop (OSX 10.9.4)
Run
sudo nfd &
nrd
# we will need terminal output from nrd later
Run (may require multiple times to get the actual)
./build/examples/producer
Run
killall -HUP nrd
Expected: nothing extra is printed in NRD terminal screen
Actual:
RibEntry{
Name: /example/testApp/0
FaceEntry( faceid: 11 origin: 0 cost: 0 flags: 1 expires in: 3598024833000 nanoseconds)
}
RibEntry{
Name: /example/testApp/0/1
FaceEntry( faceid: 11 origin: 0 cost: 0 flags: 1 expires in: 3598027837000 nanoseconds)
}
RibEntry{
Name: /example/testApp/0/1/2
FaceEntry( faceid: 11 origin: 0 cost: 0 flags: 1 expires in: 3598030794000 nanoseconds)
}
Updated by Alex Afanasyev over 10 years ago
More details. Seem to be directly related to missed notifications. The following shows log after running producer. The strange thing that neither notification from created face nor destroyed face is received:
1404781914.553415 DEBUG: [RibManager] Parameters parsed OK
1404781914.553466 DEBUG: [RibManager] command result: processing verb: register
1404781914.553549 TRACE: [RibManager] register prefix: FaceEntry( faceid: 25 origin: 0 cost: 0 flags: 1 expires in: 3599999910000 nanoseconds)
1404781914.553685 DEBUG: [RibManager] Applying 1 updates to FIB
1404781914.556456 DEBUG: [RibManager] Parameters parsed OK
1404781914.556505 DEBUG: [RibManager] command result: processing verb: register
1404781914.556579 TRACE: [RibManager] register prefix: FaceEntry( faceid: 25 origin: 0 cost: 0 flags: 1 expires in: 3599999904000 nanoseconds)
1404781914.556720 DEBUG: [RibManager] Applying 1 updates to FIB
1404781914.559484 DEBUG: [RibManager] Parameters parsed OK
1404781914.559530 DEBUG: [RibManager] command result: processing verb: register
1404781914.559604 TRACE: [RibManager] register prefix: FaceEntry( faceid: 25 origin: 0 cost: 0 flags: 1 expires in: 3599999921000 nanoseconds)
1404781914.559733 DEBUG: [RibManager] Applying 1 updates to FIB
Compare this to a "normal" run:
1404781774.371073 TRACE: [RibManager] onNotification: FaceEventNotification(Kind: created, FaceID: 16, RemoteUri: fd://30, LocalUri: unix:///private/var/run/nfd.sock, Flags: 3)
1404781774.371282 DEBUG: [FaceMonitor] onNotification: Interest sent: /localhost/nfd/faces/events/%17?ndn.InterestLifetime=60000
1404781774.374280 TRACE: [RibManager] onNotification: FaceEventNotification(Kind: created, FaceID: 17, RemoteUri: fd://31, LocalUri: unix:///private/var/run/nfd.sock, Flags: 3)
1404781774.374459 DEBUG: [FaceMonitor] onNotification: Interest sent: /localhost/nfd/faces/events/%18?ndn.InterestLifetime=60000
1404781774.383791 DEBUG: [RibManager] Parameters parsed OK
1404781774.383858 DEBUG: [RibManager] command result: processing verb: register
1404781774.383976 TRACE: [RibManager] register prefix: FaceEntry( faceid: 16 origin: 0 cost: 0 flags: 1 expires in: 3599999869000 nanoseconds)
1404781774.384176 DEBUG: [RibManager] Applying 1 updates to FIB
1404781774.387594 DEBUG: [RibManager] Parameters parsed OK
1404781774.387670 DEBUG: [RibManager] command result: processing verb: register
1404781774.387795 TRACE: [RibManager] register prefix: FaceEntry( faceid: 16 origin: 0 cost: 0 flags: 1 expires in: 3599999857000 nanoseconds)
1404781774.388012 DEBUG: [RibManager] Applying 1 updates to FIB
1404781774.391074 DEBUG: [RibManager] Parameters parsed OK
1404781774.391122 DEBUG: [RibManager] command result: processing verb: register
1404781774.391197 TRACE: [RibManager] register prefix: FaceEntry( faceid: 16 origin: 0 cost: 0 flags: 1 expires in: 3599999909000 nanoseconds)
1404781774.391336 DEBUG: [RibManager] Applying 1 updates to FIB
1404781774.486277 TRACE: [RibManager] onNotification: FaceEventNotification(Kind: destroyed, FaceID: 16, RemoteUri: fd://30, LocalUri: unix:///private/var/run/nfd.sock, Flags: 3)
1404781774.486513 DEBUG: [RibManager] Applying 3 updates to FIB
1404781774.495401 DEBUG: [FaceMonitor] onNotification: Interest sent: /localhost/nfd/faces/events/%19?ndn.InterestLifetime=60000
Updated by Lan Wang over 10 years ago
When you do "kill -HUP nrd", what is nrd expected to do and what does it currently do? Does it get this signal and remove all the entries it inserted? Or it just dies. If the latter, then some entries will be stuck in the FIB, right?
Updated by Alex Afanasyev over 10 years ago
Sorry for spamming, but I have more input. I wrote a simple app that just subscribes to face notifications (also part of http://gerrit.named-data.net/#/c/999/ now). This simple app does not misses a single notification, while rib-manager does miss some consistently.
Updated by Alex Afanasyev over 10 years ago
Lan, I have added processing of HUP signal in http://gerrit.named-data.net/#/c/999/. This is a workaround of issue #1684 and #1662 (RIB listing), which we don't have.
Previously, nrd suppose to simply die. In http://gerrit.named-data.net/#/c/999/ it is devoted to dumping content of RIB database.
Updated by Alex Afanasyev over 10 years ago
- Category changed from RIB to Management
- Assignee set to Alex Afanasyev
Updated by Lan Wang over 10 years ago
Maybe unrelated to this bug, but in general there needs to be a way to remove the entries nrd inserted (especially those MAX time entries) before it crashes. Previously application's entries get only 1 hour of lifetime, so this is not a big issue. But I think now the default is MAX time, so this problem needs to be addressed.
Updated by Alex Afanasyev over 10 years ago
- Blocked by Task #1743: Proper default settings are not applied to ContentStore added
Updated by Alex Afanasyev over 10 years ago
Ok. I found the main reason for the current issue. Due to bug #1743 and my outdated nfd.conf
, my content store was only 10 packets. Due to the way notifications are processed in NRD (this is a bug and I will fix it), in some cases NRD was requesting notification that was already evicted from the ContentStore.
Updated by Alex Afanasyev over 10 years ago
- Status changed from New to Code review
- % Done changed from 0 to 50
(The other 50% after #1743 finished)
Updated by Alex Afanasyev over 10 years ago
Btw. Should RIB dump on HUP signal be a current feature or still just for testing?
Updated by Junxiao Shi over 10 years ago
Due to bug #1743 and my outdated
nfd.conf
, my content store was only 10 packets. Due to the way notifications are processed in NRD (this is a bug and I will fix it), in some cases NRD was requesting notification that was already evicted from the ContentStore.
This is one consequence of the design choice that StatusDataset and Notification rely on ContentStore.
A subscriber of face status update notifications should request Face Dataset if it suspects some notifications are lost.
Btw. Should RIB dump on HUP signal be a current feature or still just for testing?
This should be for testing only.
#1684 provides RIB enumeration via Interest-Data exchange.
Updated by Junxiao Shi over 10 years ago
Maybe unrelated to this bug, but in general there needs to be a way to remove the entries nrd inserted (especially those MAX time entries) before it crashes. Previously application's entries get only 1 hour of lifetime, so this is not a big issue. But I think now the default is MAX time, so this problem needs to be addressed.
No. When nrd
process crashes, nfd
should also quit.
There is no need to clean up FIB entries inserted by RIB Daemon.
Updated by Lan Wang over 10 years ago
Alex: what was the problem in nrd's notification processing?
Updated by Alex Afanasyev over 10 years ago
See comments in http://gerrit.named-data.net/#/c/1001/
Updated by Alex Afanasyev over 10 years ago
- Status changed from Code review to Closed
- % Done changed from 50 to 100