Project

General

Profile

Actions

Bug #1742

closed

Some entries got stuck in RIB manager, causing problems

Added by Alex Afanasyev over 10 years ago. Updated over 10 years ago.

Status:
Closed
Priority:
Normal
Category:
Management
Target version:
Start date:
07/07/2014
Due date:
% Done:

100%

Estimated time:

Description

I will update this issue as I have more information. Currently, I'm running repo-ng test cases and unless I re-start nrd/nfd, the second run of test cases fails.

I was getting a lot of unrelated records (something from previous runs of the scenario), and at the moment I'm getting (NFD Error: 410 Face not found)


Related issues 1 (0 open1 closed)

Blocked by NFD - Task #1743: Proper default settings are not applied to ContentStoreClosed07/07/2014

Actions
Actions #1

Updated by Alex Afanasyev over 10 years ago

Potential reason could be in missed notification of face destruction

Actions #2

Updated by Alex Afanasyev over 10 years ago

I have pushed 2 commits to NFD and ndn-cxx: http://gerrit.named-data.net/#/c/1000/ and http://gerrit.named-data.net/#/c/999/. Assuming they are used, the following relatively constantly reproduces the problem on my laptop (OSX 10.9.4)

Run

sudo nfd &
nrd
# we will need terminal output from nrd later

Run (may require multiple times to get the actual)

./build/examples/producer

Run

killall -HUP nrd

Expected: nothing extra is printed in NRD terminal screen

Actual:

RibEntry{
    Name:  /example/testApp/0
    FaceEntry( faceid: 11 origin: 0 cost: 0 flags: 1 expires in: 3598024833000 nanoseconds)
}
RibEntry{
    Name:  /example/testApp/0/1
    FaceEntry( faceid: 11 origin: 0 cost: 0 flags: 1 expires in: 3598027837000 nanoseconds)
}
RibEntry{
    Name:  /example/testApp/0/1/2
    FaceEntry( faceid: 11 origin: 0 cost: 0 flags: 1 expires in: 3598030794000 nanoseconds)
}
Actions #3

Updated by Alex Afanasyev over 10 years ago

More details. Seem to be directly related to missed notifications. The following shows log after running producer. The strange thing that neither notification from created face nor destroyed face is received:

1404781914.553415 DEBUG: [RibManager] Parameters parsed OK
1404781914.553466 DEBUG: [RibManager] command result: processing verb: register
1404781914.553549 TRACE: [RibManager] register prefix: FaceEntry( faceid: 25 origin: 0 cost: 0 flags: 1 expires in: 3599999910000 nanoseconds)
1404781914.553685 DEBUG: [RibManager] Applying 1 updates to FIB
1404781914.556456 DEBUG: [RibManager] Parameters parsed OK
1404781914.556505 DEBUG: [RibManager] command result: processing verb: register
1404781914.556579 TRACE: [RibManager] register prefix: FaceEntry( faceid: 25 origin: 0 cost: 0 flags: 1 expires in: 3599999904000 nanoseconds)
1404781914.556720 DEBUG: [RibManager] Applying 1 updates to FIB
1404781914.559484 DEBUG: [RibManager] Parameters parsed OK
1404781914.559530 DEBUG: [RibManager] command result: processing verb: register
1404781914.559604 TRACE: [RibManager] register prefix: FaceEntry( faceid: 25 origin: 0 cost: 0 flags: 1 expires in: 3599999921000 nanoseconds)
1404781914.559733 DEBUG: [RibManager] Applying 1 updates to FIB

Compare this to a "normal" run:

1404781774.371073 TRACE: [RibManager] onNotification: FaceEventNotification(Kind: created, FaceID: 16, RemoteUri: fd://30, LocalUri: unix:///private/var/run/nfd.sock, Flags: 3)
1404781774.371282 DEBUG: [FaceMonitor] onNotification: Interest sent: /localhost/nfd/faces/events/%17?ndn.InterestLifetime=60000
1404781774.374280 TRACE: [RibManager] onNotification: FaceEventNotification(Kind: created, FaceID: 17, RemoteUri: fd://31, LocalUri: unix:///private/var/run/nfd.sock, Flags: 3)
1404781774.374459 DEBUG: [FaceMonitor] onNotification: Interest sent: /localhost/nfd/faces/events/%18?ndn.InterestLifetime=60000
1404781774.383791 DEBUG: [RibManager] Parameters parsed OK
1404781774.383858 DEBUG: [RibManager] command result: processing verb: register
1404781774.383976 TRACE: [RibManager] register prefix: FaceEntry( faceid: 16 origin: 0 cost: 0 flags: 1 expires in: 3599999869000 nanoseconds)
1404781774.384176 DEBUG: [RibManager] Applying 1 updates to FIB
1404781774.387594 DEBUG: [RibManager] Parameters parsed OK
1404781774.387670 DEBUG: [RibManager] command result: processing verb: register
1404781774.387795 TRACE: [RibManager] register prefix: FaceEntry( faceid: 16 origin: 0 cost: 0 flags: 1 expires in: 3599999857000 nanoseconds)
1404781774.388012 DEBUG: [RibManager] Applying 1 updates to FIB
1404781774.391074 DEBUG: [RibManager] Parameters parsed OK
1404781774.391122 DEBUG: [RibManager] command result: processing verb: register
1404781774.391197 TRACE: [RibManager] register prefix: FaceEntry( faceid: 16 origin: 0 cost: 0 flags: 1 expires in: 3599999909000 nanoseconds)
1404781774.391336 DEBUG: [RibManager] Applying 1 updates to FIB
1404781774.486277 TRACE: [RibManager] onNotification: FaceEventNotification(Kind: destroyed, FaceID: 16, RemoteUri: fd://30, LocalUri: unix:///private/var/run/nfd.sock, Flags: 3)
1404781774.486513 DEBUG: [RibManager] Applying 3 updates to FIB
1404781774.495401 DEBUG: [FaceMonitor] onNotification: Interest sent: /localhost/nfd/faces/events/%19?ndn.InterestLifetime=60000
Actions #4

Updated by Lan Wang over 10 years ago

When you do "kill -HUP nrd", what is nrd expected to do and what does it currently do? Does it get this signal and remove all the entries it inserted? Or it just dies. If the latter, then some entries will be stuck in the FIB, right?

Actions #5

Updated by Alex Afanasyev over 10 years ago

Sorry for spamming, but I have more input. I wrote a simple app that just subscribes to face notifications (also part of http://gerrit.named-data.net/#/c/999/ now). This simple app does not misses a single notification, while rib-manager does miss some consistently.

Actions #6

Updated by Alex Afanasyev over 10 years ago

Lan, I have added processing of HUP signal in http://gerrit.named-data.net/#/c/999/. This is a workaround of issue #1684 and #1662 (RIB listing), which we don't have.

Previously, nrd suppose to simply die. In http://gerrit.named-data.net/#/c/999/ it is devoted to dumping content of RIB database.

Actions #7

Updated by Alex Afanasyev over 10 years ago

  • Category changed from RIB to Management
  • Assignee set to Alex Afanasyev
Actions #8

Updated by Lan Wang over 10 years ago

Maybe unrelated to this bug, but in general there needs to be a way to remove the entries nrd inserted (especially those MAX time entries) before it crashes. Previously application's entries get only 1 hour of lifetime, so this is not a big issue. But I think now the default is MAX time, so this problem needs to be addressed.

Actions #9

Updated by Alex Afanasyev over 10 years ago

  • Blocked by Task #1743: Proper default settings are not applied to ContentStore added
Actions #10

Updated by Alex Afanasyev over 10 years ago

Ok. I found the main reason for the current issue. Due to bug #1743 and my outdated nfd.conf, my content store was only 10 packets. Due to the way notifications are processed in NRD (this is a bug and I will fix it), in some cases NRD was requesting notification that was already evicted from the ContentStore.

Actions #11

Updated by Alex Afanasyev over 10 years ago

  • Status changed from New to Code review
  • % Done changed from 0 to 50

(The other 50% after #1743 finished)

Actions #12

Updated by Alex Afanasyev over 10 years ago

Btw. Should RIB dump on HUP signal be a current feature or still just for testing?

Actions #13

Updated by Junxiao Shi over 10 years ago

Due to bug #1743 and my outdated nfd.conf, my content store was only 10 packets. Due to the way notifications are processed in NRD (this is a bug and I will fix it), in some cases NRD was requesting notification that was already evicted from the ContentStore.

This is one consequence of the design choice that StatusDataset and Notification rely on ContentStore.
A subscriber of face status update notifications should request Face Dataset if it suspects some notifications are lost.

Btw. Should RIB dump on HUP signal be a current feature or still just for testing?

This should be for testing only.
#1684 provides RIB enumeration via Interest-Data exchange.

Actions #14

Updated by Junxiao Shi over 10 years ago

Maybe unrelated to this bug, but in general there needs to be a way to remove the entries nrd inserted (especially those MAX time entries) before it crashes. Previously application's entries get only 1 hour of lifetime, so this is not a big issue. But I think now the default is MAX time, so this problem needs to be addressed.

No. When nrd process crashes, nfd should also quit.
There is no need to clean up FIB entries inserted by RIB Daemon.

Actions #15

Updated by Lan Wang over 10 years ago

Alex: what was the problem in nrd's notification processing?

Actions #17

Updated by Alex Afanasyev over 10 years ago

  • Status changed from Code review to Closed
  • % Done changed from 50 to 100
Actions

Also available in: Atom PDF