Bug #2757
Gateway RIB crashes after remote unregistration
Status: Closed
% Done: 100%
Description
When the RIB manager on a gateway processes a remote unregistration command from a laptop, the nfd process on the gateway crashes after sending a successful response.
Environment: two machines; A acts as the laptop and B acts as the gateway.
Steps to reproduce:
- Run nfd on A with rib.remote-register enabled, and set the log level to INFO.
- Generate an identity /Z/A on A and install its cert.
- Run nfd on B with rib.localhop-security enabled, and configure the trust anchor as type any.
- Run nfdc register /localhop/nfd udp4://<ip_address_of_B> on A to gain connectivity to B.
- Run ndnpingserver ndn:/Z/A/H on A to register prefix /Z/A/H/ping locally. Confirm that /Z/A is successfully registered in B's RIB (run nfd-status -r on B).
- Stop ndnpingserver on A. Confirm that /Z/A/H/ping is unregistered from A's RIB and that no other entry in A's RIB starts with prefix /Z/A, so that /Z/A will be unregistered from B's RIB.
Actual: A's nfd receives the response from B reporting successful unregistration of /Z/A, but B's nfd then exits unexpectedly.
Expected: B's nfd does not crash.
Updated by Junxiao Shi almost 10 years ago
- Subject changed from nfd process will exit unexpectedly after the rib manager performed remote prefix unregistration. to Gateway RIB crashes after remote unregistration
- Description updated
- Target version set to v0.4
Offending commit is probably commit:76c751ce80109cd429cd45d32a04015f7715546b.
Updated by Vince Lehman almost 10 years ago
I tried to reproduce this bug but was unable to. Could you check that I did not miss a step and that I generated the cert correctly?
On laptop:
rib.remote-register is enabled; default_level is INFO
$ sudo nfd-start
$ ndnsec-keygen /Z/A > key.req
$ ndnsec-certgen -N /Z/A key.req > tmp.cert
$ ndnsec-cert-install tmp.cert
$ ndnsec-set-default /Z/A
On server:
Uncomment rib.localhop-security; change trust anchor to type “any”
$ sudo nfd-start
On laptop:
$ nfdc register /localhop/nfd udp4://server-IP
$ ndnpingserver ndn:/Z/A/H
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/Z/A/H/ping nexthops={faceid=261 (cost=0)}
/localhop/nfd nexthops={faceid=260 (cost=0)}
/localhost/nfd/rib nexthops={faceid=258 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
/localhop/nfd route={faceid=260 (origin=255 cost=0 ChildInherit)}
/Z/A/H/ping route={faceid=261 (origin=0 cost=0 ChildInherit)}
On server:
1429567713.335752 DEBUG: [RibManager] Parameters parsed OK
1429567713.335786 DEBUG: [RibManager] command result: processing verb: register
1429567713.346025 INFO: [RibManager] Adding route /Z/A nexthop=260 origin=65 cost=15
1429567713.368157 INFO: [RemoteRegistrator] no hub connected when registering /Z/A
1429567713.368395 DEBUG: [RibManager] RIB update succeeded for RibUpdate {
Name: /Z/A
Action: REGISTER
Route(faceid: 260, origin: 65, cost: 15, flags: 1, never expires)
}
FIB:
/Z/A nexthops={faceid=260 (cost=15)}
/localhost/nfd nexthops={faceid=1 (cost=0)}
/localhop/nfd/rib nexthops={faceid=259 (cost=0)}
/localhost/nfd/rib nexthops={faceid=259 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/Z/A route={faceid=260 (origin=65 cost=15 ChildInherit)}
On laptop:
Ctrl+C to quit ndnpingserver
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/localhop/nfd nexthops={faceid=260 (cost=0)}
/localhost/nfd/rib nexthops={faceid=258 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
/localhop/nfd route={faceid=260 (origin=255 cost=0 ChildInherit)}
On server:
1429567782.676441 DEBUG: [RibManager] Parameters parsed OK
1429567782.676485 DEBUG: [RibManager] command result: processing verb: unregister
1429567782.686422 INFO: [RibManager] Removing route /Z/A nexthop=260 origin=65
1429567782.707531 INFO: [RemoteRegistrator] no hub connected when unregistering /Z/A
1429567782.707796 DEBUG: [RibManager] RIB update succeeded for RibUpdate {
Name: /Z/A
Action: UNREGISTER
Route(faceid: 260, origin: 65, cost: 0, flags: 0, expires in: 9216282403347378533 nanoseconds)
}
FIB:
/localhost/nfd nexthops={faceid=1 (cost=0)}
/localhop/nfd/rib nexthops={faceid=259 (cost=0)}
/localhost/nfd/rib nexthops={faceid=259 (cost=0)}
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
After the server received the unregister command, it continued to run and seemed to function properly. I was able to retrieve the nfd-status and add a route to the RIB.
Nfd version on laptop:
$ nfd -V
0.1.0-316-ge06b627
Nfd version on server:
$ nfd -V
0.3.1-12-ge8f4246
Both versions are after the FibUpdater commit.
Updated by Junxiao Shi almost 10 years ago
- Assignee set to Vince Lehman
Vince agreed to work on this bug at the 20150420 conference call.
Updated by Yanbiao Li almost 10 years ago
Thanks for your testing.
The only difference between our configurations is that there is a route on the server toward the laptop in my testing environment. (I need this route for other test purposes.)
I tested again. As long as there is a route toward the laptop on the server, nfd crashes after remote unregistration. (see attached pictures)
Updated by Junxiao Shi almost 10 years ago
As shown in crash.png, this appears to be a problem with route inheritance.
Updated by Vince Lehman almost 10 years ago
I've added the step to register a route from the server back to the laptop:
On server:
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
$ nfdc register / udp4://<laptop-IP>
Successful in name registration: ControlParameters(Name: /, FaceId: 263, Origin: 255, Cost: 0, Flags: 1, )
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/ route={faceid=263 (origin=255 cost=0 ChildInherit)}
On laptop:
$ ndnpingserver /Z/A/H
On server:
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/Z/A route={faceid=263 (origin=65 cost=15 ChildInherit)}
/ route={faceid=263 (origin=255 cost=0 ChildInherit)}
On laptop:
Ctrl+C to kill ndnpingserver
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
/localhop/nfd route={faceid=261 (origin=255 cost=0 ChildInherit)}
On server:
$ nfd-status -r
RIB:
/localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
/ route={faceid=263 (origin=255 cost=0 ChildInherit)}
Which OS/versions are you using for the laptop and the gateway?
Updated by Yanbiao Li almost 10 years ago
- File laptop.nfd.conf added
- File laptop.sh added
- File server.nfd.conf added
- File server.sh added
- File system.inf added
Both the laptop and the server run Ubuntu 14.04, with the latest NFD on both ends.
I uploaded my test scripts and corresponding config files:
laptop.sh; server.sh; laptop.nfd.conf; server.nfd.conf
1/ on the server: sh server.sh LAPTOP_IP
2/ on the laptop: sh laptop.sh SERVER_IP
3/ run nfd-status -r on the server. I get "ERROR: error while connecting to the forwarder (Connection refused)"
Updated by Vince Lehman almost 10 years ago
- Status changed from New to In Progress
I am able to reproduce the bug using the command line on a single machine:
$ sudo nfd-start
$ nfdc register / 258
$ nfdc register -o 65 -c 15 /Z/A 258
$ nfdc unregister -o 65 /Z/A 258
$ nfd-status
ERROR: error while connecting to the forwarder (Connection refused)
The problem is caused by a FibUpdate being generated for a namespace that has been removed from the RIB.
The RIB searches for the namespace and tries to apply the FibUpdate to it. There is a BOOST_ASSERT in the code that checks that the namespace exists, but no code stops the RibEntry from being dereferenced when it does not. When I compile with the --debug flag and run the above commands, I see that the assertion fails.
I will push a patch to stop the FibUpdater from generating FibUpdates for a namespace that will be removed.
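The failure and the intended fix can be illustrated with a small Python model. This is a sketch under assumptions, not NFD's code: the Route class, the dict-based RIB, and the functions make_rib, ancestors, unregister, and apply_inherited are hypothetical names chosen for illustration, and a KeyError stands in for dereferencing a RibEntry that was not found. The model mirrors the single-machine reproduction: / and /Z/A both have a route on face 258, and the /Z/A route is then unregistered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    face_id: int
    origin: int
    cost: int
    child_inherit: bool = True


def make_rib():
    # State after: nfdc register / 258; nfdc register -o 65 -c 15 /Z/A 258
    return {
        "/": {"routes": [Route(258, 255, 0)], "inherited": []},
        "/Z/A": {"routes": [Route(258, 65, 15)], "inherited": []},
    }


def ancestors(name):
    """Yield the proper ancestors of a name, nearest first (/Z/A -> /Z, /)."""
    parts = [p for p in name.split("/") if p]
    for i in range(len(parts) - 1, -1, -1):
        yield "/" + "/".join(parts[:i])


def unregister(rib, name, face_id, origin, skip_removed=True):
    """Remove one route and return the updates that the change implies."""
    entry = rib[name]
    entry["routes"] = [r for r in entry["routes"]
                       if (r.face_id, r.origin) != (face_id, origin)]
    removed = not entry["routes"]
    if removed:
        del rib[name]  # the namespace no longer exists in the RIB
    elif any(r.face_id == face_id for r in entry["routes"]):
        return []  # another route of this entry still uses the face
    # Look for an ancestor route on the same face that the name could inherit.
    parent = next((r for a in ancestors(name)
                   for r in rib.get(a, {"routes": []})["routes"]
                   if r.face_id == face_id and r.child_inherit), None)
    if parent is not None and not (removed and skip_removed):
        return [("add_inherited", name, parent)]
    return [("remove_nexthop", name, face_id)]


def apply_inherited(rib, updates):
    """Record inherited routes on the RIB entries they target."""
    for action, name, payload in updates:
        if action == "add_inherited":
            # Raises KeyError when the namespace is gone; this stands in for
            # dereferencing a RibEntry that was not found.
            rib[name]["inherited"].append(payload)


if __name__ == "__main__":
    for skip in (False, True):
        rib = make_rib()
        updates = unregister(rib, "/Z/A", 258, 65, skip_removed=skip)
        try:
            apply_inherited(rib, updates)
            print(skip, updates, "applied")
        except KeyError:
            print(skip, updates, "update targets a removed namespace")
```

With skip_removed=False the model emits an inherited-route update for /Z/A after that entry has been deleted, and applying it fails. With skip_removed=True no such update is generated for the removed namespace, which is the behavior the patch aims for.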
Updated by Vince Lehman almost 10 years ago
- Status changed from In Progress to Code review
- % Done changed from 0 to 90
Updated by Yanbiao Li almost 10 years ago
I ran tests with the latest commit; according to the results, this bug has been resolved.
Updated by Junxiao Shi almost 10 years ago
- Status changed from Code review to Closed
- % Done changed from 90 to 100