Task #1711

closed

Unregister routes from a routing protocol when the routing daemon dies

Added by Junxiao Shi over 9 years ago. Updated over 9 years ago.

Status: Rejected
Priority: Normal
Category: RIB
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

RIB Daemon should detect when a routing daemon dies, and withdraw all routes installed by that routing protocol.

Actions #1

Updated by Junxiao Shi over 9 years ago

RIB Daemon is already able to detect a process failing from face status change notifications.
However, a method needs to be designed for RIB Daemon to know which process is the routing daemon.
Inferring from Origin is not sufficient, because nfdc is able to send Register on behalf of any Origin.

Actions #2

Updated by Lan Wang over 9 years ago

A comment: allowing nfdc to specify origin complicates matters and it seems to defeat the purpose of the origin field. Now a question: suppose we are able to detect the failure of the routing daemon, then should we remove the routes inserted by only that routing daemon or all the routes with that routing daemon's origin (i.e., including the ones inserted by nfdc)? If the former, we need to keep track of which routes are inserted by which process (in addition to the origin).

Actions #3

Updated by Junxiao Shi over 9 years ago

Answer to note-2:

nfdc's register with specific Origin is intended for testing. Therefore, it's okay to unregister all routes of a routing protocol's Origin when it dies.

Actions #4

Updated by Alex Afanasyev over 9 years ago

I think this task can be rejected. RIB/FIB entries that originated from NLSR will disappear about 1 hour after NLSR dies (because of the 1-hour expiration lifetime).

That is, we already have a soft state to solve the problem and implementing hard-state may not be necessary.
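To illustrate the soft-state behaviour described in this note, here is a minimal Python sketch. It is not NFD code: the class `SoftStateRib`, its methods, and the constant `ROUTE_LIFETIME` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow. It assumes each route carries a lifetime of about one hour that is reset whenever the route is registered again.

```python
ROUTE_LIFETIME = 3600.0  # seconds; stands in for the ~1 hour lifetime mentioned above


class SoftStateRib:
    """Hypothetical in-memory model of routes that vanish unless refreshed."""

    def __init__(self):
        self._expiry = {}  # (prefix, origin) -> absolute expiry time

    def register(self, prefix, origin, now, lifetime=ROUTE_LIFETIME):
        # Registering an existing route again simply pushes its expiry forward.
        self._expiry[(prefix, origin)] = now + lifetime

    def purge_expired(self, now):
        # Remove and return every route whose lifetime has run out.
        expired = sorted(k for k, t in self._expiry.items() if t <= now)
        for key in expired:
            del self._expiry[key]
        return expired

    def routes(self):
        return sorted(self._expiry)


rib = SoftStateRib()
rib.register("/a", 128, now=0.0)
rib.register("/b", 128, now=0.0)
rib.register("/a", 128, now=1800.0)   # a live routing daemon refreshes "/a"
gone = rib.purge_expired(now=3600.0)  # only the unrefreshed "/b" has expired
```

If the routing daemon stops refreshing, every route it installed is purged within one lifetime, with no failure detection needed.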

Actions #5

Updated by Junxiao Shi over 9 years ago

If we rely on Route expiration, every routing daemon, upon restart after crashing, should request RIB dataset and remove all routes installed by its previous instance.

Actions #6

Updated by Lan Wang over 9 years ago

Even though there is a one-hour expiration time, I think it's still preferable for RIB daemon to remove the routes from the routing process as soon as it dies, if it can detect this event. Question: how does RIB daemon detect whether the routing process has died? I suppose it does not know the face id of the routing process.

Actions #7

Updated by Junxiao Shi over 9 years ago

Answer to note-6:

A protocol extension would be needed.

A routing daemon SHOULD claim its ownership of an Origin via the rib/claimorigin command.

Parameters of this command include an Origin and a FaceId. RIB daemon MUST verify the signing key of this command owns the Origin.

After this command is accepted, when the specified face is destroyed, RIB daemon MUST erase all Routes carrying the specified Origin.
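The bookkeeping this proposal implies can be sketched in Python as follows. This is only an illustration under stated assumptions, not NFD code: `Route`, `Rib`, `claim_origin`, and `on_face_destroyed` are hypothetical names, the Origin values are placeholders, and verification of the signing key is reduced to a boolean argument.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str
    face_id: int  # next-hop face of the route
    origin: int


@dataclass
class Rib:
    routes: set = field(default_factory=set)
    claims: dict = field(default_factory=dict)  # claiming face id -> Origin

    def register(self, route):
        self.routes.add(route)

    def claim_origin(self, origin, face_id, signer_owns_origin):
        # The proposal requires verifying that the signing key owns the Origin.
        # That check is assumed to happen elsewhere and arrives as a boolean.
        if not signer_owns_origin:
            return False
        self.claims[face_id] = origin
        return True

    def on_face_destroyed(self, face_id):
        # Erase all routes carrying the Origin claimed through this face.
        origin = self.claims.pop(face_id, None)
        if origin is None:
            return 0  # the face never claimed an Origin; nothing to erase
        doomed = {r for r in self.routes if r.origin == origin}
        self.routes -= doomed
        return len(doomed)


ORIGIN_ROUTING, ORIGIN_STATIC = 128, 255  # placeholder values for illustration

rib = Rib()
rib.register(Route("/a", 300, ORIGIN_ROUTING))
rib.register(Route("/b", 301, ORIGIN_ROUTING))
rib.register(Route("/c", 300, ORIGIN_STATIC))
rib.claim_origin(ORIGIN_ROUTING, face_id=262, signer_owns_origin=True)
erased = rib.on_face_destroyed(262)  # the routing daemon's face goes away
```

Note that the erase step matches on Origin alone, so routes registered under the same Origin by another process (such as nfdc) would be removed as well, which is the behaviour note-3 considers acceptable.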

Actions #8

Updated by Lan Wang over 9 years ago

  • Assignee set to Vince Lehman
  • Target version set to v0.3

The protocol extension looks good to me.

Actions #9

Updated by Alex Afanasyev over 9 years ago

We briefly discussed this issue during today's NFD call (July 28, 2014). A protocol extension is really unnecessary. It is possible for NLSR to implement "taking ownership" without defining any new protocol, as Junxiao suggested in note-5: NLSR should request the RIB dataset and then request removal of all routes with NLSR's origin.

Actions #10

Updated by Lan Wang over 9 years ago

Alex: the question is not about how to remove NLSR's routes, but how to detect that NLSR has died. NRD sees face events, but it doesn't know which face corresponds to NLSR, right?

Actions #11

Updated by Junxiao Shi over 9 years ago

As note-9 said, RIB daemon needs no action when a routing daemon dies.

This means, when a routing daemon crashes, the Routes from this routing daemon remain in the RIB.
When the routing daemon is restarted (e.g. by upstart), it should request the RIB dataset and take care of the old Routes from its Origin (e.g. delete them or reconcile the differences).

This approach is better than automatically deleting all Routes, because a short-term routing daemon failure won't affect network operation.
Installed Routes are likely valid even if the routing daemon has failed; if a Route isn't working, forwarding strategy can get around it.
Traffic can still be forwarded during this period without the routing daemon.

The routing daemon is expected to come back soon (before the installed Routes expire).
When it comes back, it should perform an initial route discovery (e.g. learn the whole topology in a link-state protocol), and reconcile the difference from the current RIB.
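The reconciliation step on restart might look like the following Python sketch. It is an illustration only: the function `reconcile` and the tuple layouts are hypothetical, the Origin values are placeholders, and fetching the RIB dataset and sending the register/unregister commands are assumed to happen elsewhere.

```python
def reconcile(rib_dataset, computed_routes, my_origin):
    """Compare the RIB dataset with freshly computed routes.

    rib_dataset:     iterable of (prefix, face_id, origin) tuples from the RIB
    computed_routes: iterable of (prefix, face_id) pairs from route discovery
    Returns (to_register, to_unregister), each a sorted list of pairs.
    """
    # Only routes carrying this daemon's own Origin are its responsibility.
    installed = {(p, f) for (p, f, origin) in rib_dataset if origin == my_origin}
    wanted = set(computed_routes)
    to_register = sorted(wanted - installed)    # missing from the RIB
    to_unregister = sorted(installed - wanted)  # stale leftovers from the old instance
    return to_register, to_unregister


MY_ORIGIN, OTHER_ORIGIN = 128, 255  # placeholder values for illustration

rib_dataset = [
    ("/a", 300, MY_ORIGIN),     # still valid, left untouched
    ("/b", 301, MY_ORIGIN),     # stale: the next hop has changed
    ("/c", 300, OTHER_ORIGIN),  # belongs to another Origin, ignored
]
computed = [("/a", 300), ("/b", 302)]
to_register, to_unregister = reconcile(rib_dataset, computed, MY_ORIGIN)
```

Routes that are still valid are never withdrawn, so forwarding continues undisturbed across the restart; only the differences are pushed to the RIB.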

Actions #12

Updated by Lan Wang over 9 years ago

  • Assignee changed from Vince Lehman to A K M Mahmudul Hoque

This sounds good to me. We'll learn more from actual testbed deployment to know if there's a need to delete the prefixes when the routing daemon dies. I've reassigned the task to Hoque.

Actions #13

Updated by Lan Wang over 9 years ago

If the suggestion in Note #11 is accepted, then this task needs to be moved to NLSR.

Actions #14

Updated by Junxiao Shi over 9 years ago

  • Status changed from New to Rejected

No change is required in NFD.

NLSR should have another Task to solve the problem, see note-11.
