Bug #2757
Gateway RIB crashes after remote unregistration
Status: Closed, 100% done
Description
When the RIB manager on a gateway processes a remote unregistration command from a laptop, the NFD process on the gateway crashes after sending a successful response.
Environment: two machines; A acts as the laptop and B acts as the gateway.
Steps to reproduce:
- run nfd on A with rib.remote-register enabled, and set the log level to INFO.
- generate an identity /Z/A on A and install its cert.
- run nfd on B with rib.localhop-security enabled, and configure the trust anchor as type any.
- run nfdc register /localhop/nfd udp4://<ip_address_of_B> on A to gain connectivity to B.
- run ndnpingserver ndn:/Z/A/H on A to register prefix /Z/A/H/ping locally. Confirm that /Z/A is successfully registered in B's RIB (run nfd-status -r on B).
- stop ndnpingserver on A. Confirm that /Z/A/H/ping is unregistered from A's RIB and that no other entry in A's RIB starts with prefix /Z/A, so that /Z/A will be unregistered from B's RIB.
Actual: A's nfd receives a response from B reporting successful unregistration of /Z/A. B's nfd has exited unexpectedly.
Expected: B's nfd does not crash.
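The last step relies on a prefix rule: /Z/A stays registered on B only while some entry in A's RIB still falls under it. A minimal C++ sketch of that rule, assuming simple slash-separated names without URI escaping; splitName, isPrefixOf and shouldUnregisterRemotely are hypothetical helpers, not NFD's RemoteRegistrator API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a name such as "/Z/A/H/ping" into its components.
// Assumes no percent-encoded '/' inside components.
std::vector<std::string> splitName(const std::string& uri) {
  std::vector<std::string> components;
  std::string current;
  for (char c : uri) {
    if (c == '/') {
      if (!current.empty()) {
        components.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (!current.empty()) {
    components.push_back(current);
  }
  return components;
}

// Component-wise prefix test: "/Z/A" is a prefix of "/Z/A/H/ping" but not of "/Z/AB".
bool isPrefixOf(const std::string& prefix, const std::string& name) {
  const auto p = splitName(prefix);
  const auto n = splitName(name);
  if (p.size() > n.size()) {
    return false;
  }
  for (std::size_t i = 0; i < p.size(); ++i) {
    if (p[i] != n[i]) {
      return false;
    }
  }
  return true;
}

// Hypothetical decision rule: unregister the identity prefix (e.g. /Z/A) from the
// gateway only when no remaining local RIB entry falls under it.
bool shouldUnregisterRemotely(const std::string& identity,
                              const std::vector<std::string>& localRibNames) {
  for (const auto& name : localRibNames) {
    if (isPrefixOf(identity, name)) {
      return false;  // some local prefix still needs the remote registration
    }
  }
  return true;
}
```

With A's RIB entries from the steps above, the rule keeps /Z/A registered while /Z/A/H/ping is present and unregisters it once ndnpingserver stops and only /localhost/nfd/rib and /localhop/nfd remain.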
Updated by Junxiao Shi over 9 years ago
- Subject changed from nfd process will exit unexpectedly after the rib manager performed remote prefix unregistration. to Gateway RIB crashes after remote unregistration
- Description updated (diff)
- Target version set to v0.4
Offending commit is probably commit:76c751ce80109cd429cd45d32a04015f7715546b.
Updated by Vince Lehman over 9 years ago
I tried to reproduce this bug, but was unable. Could you check to make sure I did not miss a step and that I generated the cert correctly?
On laptop:
rib.remote-register is enabled; default_level is INFO
$ sudo nfd-start
$ ndnsec-keygen /Z/A > key.req
$ ndnsec-certgen -N /Z/A key.req > tmp.cert
$ ndnsec-cert-install tmp.cert
$ ndnsec-set-default /Z/A
On server:
Uncomment rib.localhop-security; change trust anchor to type “any”
$ sudo nfd-start
On laptop:
$ nfdc register /localhop/nfd udp4://server-IP
$ ndnpingserver ndn:/Z/A/H
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/Z/A/H/ping nexthops={faceid=261 (cost=0)}
/localhop/nfd nexthops={faceid=260 (cost=0)}
/localhost/nfd/rib nexthops={faceid=258 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
/localhop/nfd route={faceid=260 (origin=255 cost=0 ChildInherit)}
/Z/A/H/ping route={faceid=261 (origin=0 cost=0 ChildInherit)}
On server:
1429567713.335752 DEBUG: [RibManager] Parameters parsed OK
1429567713.335786 DEBUG: [RibManager] command result: processing verb: register
1429567713.346025 INFO: [RibManager] Adding route /Z/A nexthop=260 origin=65 cost=15
1429567713.368157 INFO: [RemoteRegistrator] no hub connected when registering /Z/A
1429567713.368395 DEBUG: [RibManager] RIB update succeeded for RibUpdate {
Name: /Z/A
Action: REGISTER
Route(faceid: 260, origin: 65, cost: 15, flags: 1, never expires)
}
FIB:
/Z/A nexthops={faceid=260 (cost=15)}
/localhost/nfd nexthops={faceid=1 (cost=0)}
/localhop/nfd/rib nexthops={faceid=259 (cost=0)}
/localhost/nfd/rib nexthops={faceid=259 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/Z/A route={faceid=260 (origin=65 cost=15 ChildInherit)}
On laptop:
Press Ctrl+C to quit ndnpingserver
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/localhop/nfd nexthops={faceid=260 (cost=0)}
/localhost/nfd/rib nexthops={faceid=258 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
/localhop/nfd route={faceid=260 (origin=255 cost=0 ChildInherit)}
On server:
1429567782.676441 DEBUG: [RibManager] Parameters parsed OK
1429567782.676485 DEBUG: [RibManager] command result: processing verb: unregister
1429567782.686422 INFO: [RibManager] Removing route /Z/A nexthop=260 origin=65
1429567782.707531 INFO: [RemoteRegistrator] no hub connected when unregistering /Z/A
1429567782.707796 DEBUG: [RibManager] RIB update succeeded for RibUpdate {
Name: /Z/A
Action: UNREGISTER
Route(faceid: 260, origin: 65, cost: 0, flags: 0, expires in: 9216282403347378533 nanoseconds)
}
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/localhop/nfd/rib nexthops={faceid=259 (cost=0)}
/localhost/nfd/rib nexthops={faceid=259 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
After the server received the unregister command, it continued to run and appeared to function properly. I was able to retrieve the nfd-status output and add a route to the RIB.
NFD version on laptop:
$ nfd -V
0.1.0-316-ge06b627
NFD version on server:
$ nfd -V
0.3.1-12-ge8f4246
Both versions are newer than the FibUpdater commit.
Updated by Junxiao Shi over 9 years ago
- Assignee set to Vince Lehman
Vince agreed to work on this bug at the 20150420 conference call.
Updated by Yanbiao Li over 9 years ago
Thanks for your testing.
The only difference between our configurations is that, in my testing environment, the server has a route toward the laptop (I need this route for other testing purposes).
I tested again. As long as the server has a route toward the laptop, nfd on the server crashes after remote unregistration (see attached pictures).
Updated by Junxiao Shi over 9 years ago
As shown in crash.png, this appears to be a problem with route inheritance.
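A simplified model of child inheritance helps show why an extra route on the server matters. This C++ sketch assumes that an entry's own routes shadow inherited routes on the same face and that any ancestor route carrying ChildInherit is otherwise inherited; Route, Rib, parentOf and effectiveNextHops are illustrative names, not NFD's types:

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified route record; fields mirror the nfd-status output, not NFD's classes.
struct Route {
  int faceId;
  int origin;
  int cost;
  bool childInherit;
};

using Rib = std::map<std::string, std::vector<Route>>;

// Hypothetical helper: parent of a name ("/Z/A" -> "/Z", "/Z" -> "/").
std::string parentOf(const std::string& name) {
  const auto pos = name.find_last_of('/');
  return (pos == 0 || pos == std::string::npos) ? "/" : name.substr(0, pos);
}

// Effective next hops (faceId -> cost) for `name`: its own routes first, then
// ChildInherit routes from ancestors on faces not already present.
std::map<int, int> effectiveNextHops(const Rib& rib, const std::string& name) {
  std::map<int, int> nextHops;
  auto own = rib.find(name);
  if (own != rib.end()) {
    for (const auto& r : own->second) {
      auto it = nextHops.find(r.faceId);
      if (it == nextHops.end() || r.cost < it->second) {
        nextHops[r.faceId] = r.cost;
      }
    }
  }
  std::string current = name;
  while (current != "/") {
    current = parentOf(current);
    auto entry = rib.find(current);
    if (entry == rib.end()) {
      continue;  // no RIB entry at this level
    }
    for (const auto& r : entry->second) {
      if (r.childInherit && nextHops.count(r.faceId) == 0) {
        nextHops[r.faceId] = r.cost;  // inherited, shadowed by any own route
      }
    }
  }
  return nextHops;
}
```

With / on face 263 (ChildInherit) and /Z/A on face 263 with cost 15, as in the server RIB later in this thread, /Z/A resolves to face 263 at cost 15; after erasing /Z/A it resolves to face 263 at cost 0, inherited from /. Under this model, removing the /Z/A route changes what /Z/A should inherit, which is one plausible way an update for /Z/A gets computed even though /Z/A itself is about to disappear.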
Updated by Vince Lehman over 9 years ago
I've added the step to register a route from the server back to the laptop:
On server:
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
$ nfdc register / udp4://<laptop-IP>
Successful in name registration: ControlParameters(Name: /, FaceId: 263, Origin: 255, Cost: 0, Flags: 1, )
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/ route={faceid=263 (origin=255 cost=0 ChildInherit)}
On laptop:
$ ndnpingserver /Z/A/H
On server:
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/Z/A route={faceid=263 (origin=65 cost=15 ChildInherit)}
/ route={faceid=263 (origin=255 cost=0 ChildInherit)}
On laptop:
Press Ctrl+C to kill ndnpingserver
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
/localhop/nfd route={faceid=261 (origin=255 cost=0 ChildInherit)}
On server:
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/ route={faceid=263 (origin=255 cost=0 ChildInherit)}
Which OS/versions are you using for the laptop and the gateway?
Updated by Yanbiao Li over 9 years ago
- File laptop.nfd.conf added
- File laptop.sh added
- File server.nfd.conf added
- File server.sh added
- File system.inf added
Both the laptop and the server run Ubuntu 14.04, and the latest NFD runs on both ends.
I uploaded my test scripts and corresponding config files:
laptop.sh; server.sh; laptop.nfd.conf; server.nfd.conf
1/ on the server: sh server.sh LAPTOP_IP
2/ on the laptop: sh laptop.sh SERVER_IP
3/ run nfd-status -r on the server. I get "ERROR: error while connecting to the forwarder (Connection refused)"
Updated by Vince Lehman over 9 years ago
- Status changed from New to In Progress
I am able to reproduce the bug using the command line on a single machine:
$ sudo nfd-start
$ nfdc register / 258
$ nfdc register -o 65 -c 15 /Z/A 258
$ nfdc unregister -o 65 /Z/A 258
$ nfd-status
ERROR: error while connecting to the forwarder (Connection refused)
This problem is caused by a FibUpdate being generated for a namespace that is removed from the RIB.
The RIB looks up the namespace and tries to apply the FibUpdate to it. A BOOST_ASSERT in the code checks that the namespace exists, but no code prevents the RibEntry from being dereferenced when it does not. When I compile with the --debug flag and run the above commands, I see that the assertion fails.
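The failure mode described above can be sketched in plain C++. BOOST_ASSERT expands to assert() by default, and assert() is compiled out when NDEBUG is defined, which would be consistent with the assertion only firing in the --debug build. The sketch substitutes the standard assert() for Boost; FibUpdate, RibTable and applyInheritedRouteUpdate are hypothetical simplifications:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified types: a namespace maps to the face IDs it currently uses.
struct FibUpdate {
  std::string name;
  int faceId;
};
using RibTable = std::map<std::string, std::vector<int>>;

// Unsafe pattern (shown only as a comment):
//   auto it = rib.find(update.name);
//   assert(it != rib.end());              // removed entirely when NDEBUG is defined
//   it->second.push_back(update.faceId);  // dereferences end() if the name was erased
//
// Defensive variant: check at run time instead of relying on the assertion alone.
bool applyInheritedRouteUpdate(RibTable& rib, const FibUpdate& update) {
  auto it = rib.find(update.name);
  if (it == rib.end()) {
    return false;  // namespace was already removed; skip the update
  }
  it->second.push_back(update.faceId);
  return true;
}
```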
I will push a patch to stop the FibUpdater from generating FibUpdates for a namespace that will be removed.
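The approach described here can be sketched as a filter: before applying the updates for an unregistration, detect whether the unregistered route is the namespace's last route and, if so, drop any next-hop additions aimed at that namespace. Removals are kept so the FIB entry still goes away. This is a minimal C++ model under those assumptions, not the actual patch; RouteKey, PendingFibUpdate, willBeErased and dropUpdatesForErasedNamespace are hypothetical:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified types.
struct RouteKey {
  int faceId;
  int origin;
};
struct PendingFibUpdate {
  std::string name;
  int faceId;
  bool isAdd;  // true: add next hop, false: remove next hop
};
using Routes = std::map<std::string, std::vector<RouteKey>>;

// True if removing (faceId, origin) from `name` leaves it with no routes,
// i.e. the RIB entry itself will be erased.
bool willBeErased(const Routes& rib, const std::string& name, int faceId, int origin) {
  auto it = rib.find(name);
  if (it == rib.end()) {
    return false;
  }
  return std::all_of(it->second.begin(), it->second.end(), [&](const RouteKey& r) {
    return r.faceId == faceId && r.origin == origin;
  });
}

// Drop next-hop additions aimed at a namespace that is about to disappear.
std::vector<PendingFibUpdate> dropUpdatesForErasedNamespace(
    std::vector<PendingFibUpdate> updates, const Routes& rib,
    const std::string& name, int faceId, int origin) {
  if (!willBeErased(rib, name, faceId, origin)) {
    return updates;
  }
  updates.erase(std::remove_if(updates.begin(), updates.end(),
                               [&](const PendingFibUpdate& u) {
                                 return u.isAdd && u.name == name;
                               }),
                updates.end());
  return updates;
}
```

Under this model, unregistering the only /Z/A route while / carries a ChildInherit route keeps the next-hop removal for /Z/A and discards the inherited re-add that would otherwise target an erased entry.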
Updated by Vince Lehman over 9 years ago
- Status changed from In Progress to Code review
- % Done changed from 0 to 90
Updated by Yanbiao Li over 9 years ago
I ran tests with the latest commit; according to the results, this bug has been resolved.
Updated by Junxiao Shi over 9 years ago
- Status changed from Code review to Closed
- % Done changed from 90 to 100