Bug #2757

Gateway RIB crashes after remote unregistration

Added by Yanbiao Li over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: RIB
Target version:
Start date: 04/17/2015
Due date:
% Done: 100%
Estimated time:

Description

When the RIB manager on a gateway processes a remote unregistration command from a laptop, the nfd process on the gateway crashes after sending a successful response.

Environment: two machines. A acts as the laptop and B acts as the gateway.

Steps to reproduce:

  1. Run nfd on A with rib.remote-register enabled, and set the log level to INFO.
  2. Generate an identity /Z/A on A and install its certificate.
  3. Run nfd on B with rib.localhop-security enabled, and configure the trust anchor as type any.
  4. Run nfdc register /localhop/nfd udp4://<ip_address_of_B> on A to gain connectivity to B.
  5. Run ndnpingserver ndn:/Z/A/H on A to register prefix /Z/A/H/ping locally. Confirm that /Z/A is successfully registered in B's RIB (run nfd-status -r on B).
  6. Stop ndnpingserver on A. Confirm that /Z/A/H/ping is unregistered from A's RIB and that no other entry in A's RIB starts with prefix /Z/A, so that /Z/A is unregistered from B's RIB.

Actual: A's nfd receives the response from B indicating that /Z/A was successfully unregistered, but B's nfd has exited unexpectedly.

Expected: B's nfd does not crash.

ok.png (93.8 KB), without a route toward the laptop, Yanbiao Li, 04/21/2015 04:47 PM
crash.png (80.4 KB), with a route toward the laptop, Yanbiao Li, 04/21/2015 04:47 PM
laptop.sh (642 Bytes), Yanbiao Li, 04/22/2015 01:52 PM
laptop.nfd.conf (10.8 KB), Yanbiao Li, 04/22/2015 01:52 PM
server.nfd.conf (10.8 KB), Yanbiao Li, 04/22/2015 01:52 PM
system.inf (129 Bytes), Yanbiao Li, 04/22/2015 01:52 PM
server.sh (377 Bytes), Yanbiao Li, 04/22/2015 01:52 PM

History

#1 Updated by Junxiao Shi over 4 years ago

  • Subject changed from nfd process will exit unexpectedly after the rib manager performed remote prefix unregistration. to Gateway RIB crashes after remote unregistration
  • Description updated (diff)
  • Target version set to v0.4

Offending commit is probably commit:76c751ce80109cd429cd45d32a04015f7715546b.

#2 Updated by Vince Lehman over 4 years ago

I tried to reproduce this bug, but was unable to. Could you check that I did not miss a step and that I generated the cert correctly?

On laptop:

rib.remote-register is enabled; default_level is INFO

$ sudo nfd-start

$ ndnsec-keygen /Z/A > key.req
$ ndnsec-certgen -N /Z/A key.req > tmp.cert
$ ndnsec-cert-install tmp.cert
$ ndnsec-set-default /Z/A

On server:

Uncomment rib.localhop-security; change trust anchor to type “any”
$ sudo nfd-start

On laptop:

$ nfdc register /localhop/nfd udp4://server-IP

$ ndnpingserver ndn:/Z/A/H

FIB:
  /localhost/nfd nexthops={faceid=1 (cost=0)}
  /Z/A/H/ping nexthops={faceid=261 (cost=0)}
  /localhop/nfd nexthops={faceid=260 (cost=0)}
  /localhost/nfd/rib nexthops={faceid=258 (cost=0)}
RIB:
  /localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd route={faceid=260 (origin=255 cost=0 ChildInherit)}
  /Z/A/H/ping route={faceid=261 (origin=0 cost=0 ChildInherit)}

On server:

1429567713.335752 DEBUG: [RibManager] Parameters parsed OK
1429567713.335786 DEBUG: [RibManager] command result: processing verb: register
1429567713.346025 INFO: [RibManager] Adding route /Z/A nexthop=260 origin=65 cost=15
1429567713.368157 INFO: [RemoteRegistrator] no hub connected when registering /Z/A
1429567713.368395 DEBUG: [RibManager] RIB update succeeded for RibUpdate {
  Name: /Z/A
  Action: REGISTER
  Route(faceid: 260, origin: 65, cost: 15, flags: 1, never expires)
}

FIB:
  /Z/A nexthops={faceid=260 (cost=15)}
  /localhost/nfd nexthops={faceid=1 (cost=0)}
  /localhop/nfd/rib nexthops={faceid=259 (cost=0)}
  /localhost/nfd/rib nexthops={faceid=259 (cost=0)}
RIB:
  /localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /Z/A route={faceid=260 (origin=65 cost=15 ChildInherit)}

On laptop:

Press Ctrl+C to quit ndnpingserver

FIB:
  /localhost/nfd nexthops={faceid=1 (cost=0)}
  /localhop/nfd nexthops={faceid=260 (cost=0)}
  /localhost/nfd/rib nexthops={faceid=258 (cost=0)}
RIB:
  /localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd route={faceid=260 (origin=255 cost=0 ChildInherit)}

On server:

1429567782.676441 DEBUG: [RibManager] Parameters parsed OK
1429567782.676485 DEBUG: [RibManager] command result: processing verb: unregister
1429567782.686422 INFO: [RibManager] Removing route /Z/A nexthop=260 origin=65
1429567782.707531 INFO: [RemoteRegistrator] no hub connected when unregistering /Z/A
1429567782.707796 DEBUG: [RibManager] RIB update succeeded for RibUpdate {
  Name: /Z/A
  Action: UNREGISTER
  Route(faceid: 260, origin: 65, cost: 0, flags: 0, expires in: 9216282403347378533 nanoseconds)
}

FIB:
  /localhost/nfd nexthops={faceid=1 (cost=0)}
  /localhop/nfd/rib nexthops={faceid=259 (cost=0)}
  /localhost/nfd/rib nexthops={faceid=259 (cost=0)}
RIB:
  /localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}

After the server received the unregister command, it continued to run and seemed to function properly. I was able to retrieve the nfd-status and add a route to the RIB.

Nfd version on laptop:

$ nfd -V
0.1.0-316-ge06b627

Nfd version on server:

$ nfd -V
0.3.1-12-ge8f4246

Both versions are after the FibUpdater commit.

#3 Updated by Junxiao Shi over 4 years ago

  • Assignee set to Vince Lehman

Vince agreed to work on this bug at the 20150420 conference call.

#4 Updated by Yanbiao Li over 4 years ago

Thanks for your testing.

The only difference between our configurations is that, in my testing environment, there is a route on the server toward the laptop. (I need this route for other testing purposes.)

I tested again. As long as there is a route toward the laptop on the server, nfd crashes after remote unregistration. (See the attached pictures.)

#5 Updated by Junxiao Shi over 4 years ago

As shown in crash.png, this appears to be a problem with route inheritance.
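
To illustrate why a route toward the laptop could matter here, the following is a toy model of route inheritance. It is only a sketch and not NFD's data structures: ToyRib, Route, isProperAncestor, and effectiveNexthops are hypothetical names, the face IDs are illustrative, and the model ignores the Capture flag, origins, and costs. It assumes that a route flagged ChildInherit on a prefix also applies to longer names under that prefix.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy route: only the fields needed to illustrate inheritance (hypothetical type).
struct Route {
  int faceId;
  bool childInherit;
};

// Toy RIB keyed by name URI such as "/" or "/Z/A" (hypothetical, not NFD's Rib class).
using ToyRib = std::map<std::string, std::vector<Route>>;

// True if `ancestor` is a proper prefix of `name` on a name-component boundary.
bool isProperAncestor(const std::string& ancestor, const std::string& name) {
  if (ancestor == name) return false;
  if (ancestor == "/") return true;  // the root is an ancestor of every other name
  return name.size() > ancestor.size() &&
         name.compare(0, ancestor.size(), ancestor) == 0 &&
         name[ancestor.size()] == '/';
}

// Faces that would serve `name`: the routes registered on `name` itself,
// plus the child-inherit routes registered on any of its ancestors.
std::set<int> effectiveNexthops(const ToyRib& rib, const std::string& name) {
  std::set<int> faces;
  for (const auto& entry : rib) {
    const bool own = (entry.first == name);
    if (!own && !isProperAncestor(entry.first, name)) continue;
    for (const Route& r : entry.second) {
      if (own || r.childInherit) faces.insert(r.faceId);
    }
  }
  return faces;
}
```

In this model, with a child-inherit route on / and a separate route on /Z/A, the name /Z/A is served by both faces. Once the /Z/A entry is erased, the name is still covered by the route inherited from /, so any bookkeeping of inherited routes has to cope with a name whose own entry no longer exists.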

#6 Updated by Vince Lehman over 4 years ago

I've added the step to register a route from the server back to the laptop:

On server:

$ nfd-status -r
RIB:
  /localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}

$ nfdc register / udp4://<laptop-IP>
Successful in name registration: ControlParameters(Name: /, FaceId: 263, Origin: 255, Cost: 0, Flags: 1, )

$ nfd-status -r
RIB:
  /localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  / route={faceid=263 (origin=255 cost=0 ChildInherit)}

On laptop:

$ ndnpingserver /Z/A/H

On server:

$ nfd-status -r
RIB:
  /localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /Z/A route={faceid=263 (origin=65 cost=15 ChildInherit)}
  / route={faceid=263 (origin=255 cost=0 ChildInherit)}

On laptop:

Press Ctrl+C to kill ndnpingserver

$ nfd-status -r 
RIB:
  /localhost/nfd/rib route={faceid=258 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd route={faceid=261 (origin=255 cost=0 ChildInherit)}

On server:

$ nfd-status -r
RIB:
  /localhost/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  /localhop/nfd/rib route={faceid=259 (origin=0 cost=0 ChildInherit)}
  / route={faceid=263 (origin=255 cost=0 ChildInherit)}

Which OS/versions are you using for the laptop and the gateway?

#7 Updated by Yanbiao Li over 4 years ago

Both the laptop and the server run Ubuntu 14.04. The latest NFD is running on both ends.

I uploaded my test scripts and corresponding config files:

laptop.sh; server.sh; laptop.nfd.conf; server.nfd.conf

1/ on the server: sh server.sh LAPTOP_IP

2/ on the laptop: sh laptop.sh SERVER_IP

3/ run nfd-status -r on the server. I get "ERROR: error while connecting to the forwarder (Connection refused)"

#8 Updated by Vince Lehman over 4 years ago

  • Status changed from New to In Progress

I am able to reproduce the bug using the command line on a single machine:

$ sudo nfd-start
$ nfdc register / 258
$ nfdc register -o 65 -c 15 /Z/A 258
$ nfdc unregister -o 65 /Z/A 258
$ nfd-status
ERROR: error while connecting to the forwarder (Connection refused)

This is a problem caused by a FibUpdate being generated for a namespace that is removed from the RIB.

The RIB searches for the namespace and tries to apply the FibUpdate to it. There is a BOOST_ASSERT in the code to catch the case where the namespace does not exist, but nothing stops the missing RibEntry from being dereferenced. When I compile with the --debug flag and run the above commands, I see that the assertion fails.

I will push a patch to stop the FibUpdater from generating FibUpdates for a namespace that will be removed.
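
The failure mode described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch or NFD's code: ToyRibEntry, ToyFibUpdate, ToyEntryMap, dropUpdatesForRemoved, and applyInheritedUpdate are hypothetical names. The sketch shows two complementary ideas: filtering out updates that target a namespace about to be removed, and guarding the lookup with a real runtime check, since a standard assertion is compiled out when NDEBUG is defined.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not NFD's RibEntry or FibUpdate.
struct ToyRibEntry {
  std::vector<int> inheritedFaces;
};

struct ToyFibUpdate {
  std::string name;
  int faceId;
};

using ToyEntryMap = std::map<std::string, ToyRibEntry>;

// Generation side: keep only updates whose namespace is not scheduled for removal.
std::vector<ToyFibUpdate> dropUpdatesForRemoved(const std::vector<ToyFibUpdate>& updates,
                                                const std::set<std::string>& removedNames) {
  std::vector<ToyFibUpdate> kept;
  for (const ToyFibUpdate& u : updates) {
    if (removedNames.count(u.name) == 0) kept.push_back(u);
  }
  return kept;
}

// Application side: apply an update only if the namespace is still present.
// Returns false and changes nothing when the entry has already been removed.
bool applyInheritedUpdate(ToyEntryMap& rib, const ToyFibUpdate& update) {
  auto it = rib.find(update.name);
  // An assertion here would document the expectation, but it disappears when
  // NDEBUG is defined, so it cannot be the only protection against end().
  if (it == rib.end()) {
    return false;  // namespace is gone; skip instead of dereferencing end()
  }
  it->second.inheritedFaces.push_back(update.faceId);
  return true;
}
```

For example, given updates for /Z/A and / with /Z/A marked as removed, dropUpdatesForRemoved keeps only the update for /, and applyInheritedUpdate refuses an update for a name that has no entry rather than touching an invalid iterator.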

#9 Updated by Vince Lehman over 4 years ago

  • Status changed from In Progress to Code review
  • % Done changed from 0 to 90

#10 Updated by Yanbiao Li over 4 years ago

I ran tests with the latest commit. According to the results, this bug has been resolved.

#11 Updated by Junxiao Shi over 4 years ago

  • Status changed from Code review to Closed
  • % Done changed from 90 to 100
