Project

General

Profile

Feature #1941

FibUpdater

Added by Junxiao Shi about 5 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
RIB
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
12.00 h

Description

Currently, FIB updates are generated by RibManager class. This is semantically incorrect, because RibManager's responsibility is to accept RIB register/unregister commands, while FIB updates can result from not only RIB register/unregister commands, but also Route expiration.

This Task is to implement the proposed design: send FIB updates from a separate FibUpdater module, and trigger FIB updates from a RibUpdateBatch.
This design has been approved and clarified.

Design

RibManager updates the RIB only; it does not directly update the FIB.

RibManager can respond to prefix registration commands right away without waiting for Rib::beginApplyUpdate to complete, because a successful response indicates the acceptance, not the completion, of a RibMgmt command.

RIB updates are batched. A batch of RIB updates is packaged into a RibUpdateBatch object. All RIB updates in the same RibUpdateBatch object must refer to the same face.

When RibManager receives one or more RIB updates, it generates a RibUpdateBatch, and passes it to Rib::beginApplyUpdate() to begin the RIB
update procedure.

When one or more Route expires, the RIB itself generates a RibUpdateBatch, and begin the RIB update procedure.

Note: before #1698, RibUpdateBatch may contain only one RIB update.

The RIB update procedure has admission control: if a previous RibUpdateBatch is in progress, the new batch will be put in a queue until the previous batch is complete.

To apply a batch of RIB updates, the current collection of RIB entries and the RibUpdateBatch are passed to FibUpdater component to compute the optimal set of FIB updates, and the FibUpdater component shall send these FIB updates.

FIB updates should be send sequentially, in order to avoid #1990 and avoid overloading NFD; given that NFD is local, this shouldn't cause long latency.

FIB updates for RibUpdateBatch's face should be sent before FIB updates for other faces.

If any FIB update fails,

(a) in case of a non-recoverable error (eg. signing key of RIB Daemon is not trusted, more than 10 timeouts for the same command), the RIB Daemon shall crash;

(b) in case of a non-existent face error of the same face, the RibUpdateBatch is abandoned;

(c) in case of a non-existent face error of a different face, the FIB update is skipped;

(d) in other cases (eg. timeout), the FIB update is retried.

The updates can be skipped in case (c) because they are for inherited routes. Inherited routes can only affect the calculation of updates for the same face, which is an invalid face, and do not alter the route flags.

If the RibUpdateBatch is abandoned, the RIB is unchanged; otherwise, after all FIB updates are successful, the RIB entries are updated.

Fib-Updater.pdf (4.97 KB) Fib-Updater.pdf FibUpdater Class and Sequence Diagram Vince Lehman, 11/11/2014 01:51 PM
fib-update-summary.pdf (85.4 KB) fib-update-summary.pdf FIB Update Summary Vince Lehman, 12/09/2014 01:03 PM

Related issues

Blocks NFD - Feature #1698: Queuing FIB updates generated as a response to different RIB updatesNew2014-06-25

Blocks NFD - Feature #1697: Calculate FIB cost as the lowest cost among all inherited RoutesNew

History

#1 Updated by Junxiao Shi about 5 years ago

  • Blocks Feature #1698: Queuing FIB updates generated as a response to different RIB updates added

#2 Updated by Junxiao Shi about 5 years ago

  • Blocks Feature #1697: Calculate FIB cost as the lowest cost among all inherited Routes added

#3 Updated by Vince Lehman almost 5 years ago

  • Status changed from New to In Progress

#4 Updated by Vince Lehman almost 5 years ago

Before I start implementation, I want to make sure I understand the proposal correctly. I've attached a class diagram and sequence diagram for review.

#5 Updated by Lan Wang almost 5 years ago

A couple more clarifications Vince gave me:

"I’ve left FaceEntry as is since this change is more important than the style change of refactoring FaceEntry as Route; I can implement the FibUpdater first as it is an important functional change and then refactor FaceEntry.

The flow for NLSR would be almost exactly the same except NRD would not wait for the FibUpdates to be applied before responding with a success message. (Message 10 would occur as the second message)"

Junxiao/Alex: could you take a look? If there's no objection, Vince can start implementing it today.

#6 Updated by Junxiao Shi almost 5 years ago

Sequence graph in note-4 seems correct.

Class diagram is never accurate because it doesn't show everything.
If you want a review on API, upload a patchset that contains updated headers.

#7 Updated by Vince Lehman almost 5 years ago

I have a few questions about the design:

1.) Assume you have the following RIB:

Name Face CHILD_INHERIT CAPTURE cost
/ 2 T F 5
/a 2 T T 10
/a/b 2 F T 15
/a/c 2 F F 15
/a/d 2 F F 15

And this pending RibUpdateBatch:

Name Face CHILD_INHERIT CAPTURE cost
/a 2 T F 10
/ 2 F F 5

Processing the first RibUpdate turns off CAPTURE on /a and generates the following FibUpdates:

Name Face cost action
/a 2 5 ADD
/a/c 2 5 ADD
/a/d 2 5 ADD

Processing the second RibUpdate turns off CHILD_INHERIT for / and generates no FibUpdates because in the Rib, /a still has CAPTURE set.

This means the FibUpdater will now need to check previously calculated updates to guarantee the current FibUpdates because the CAPTURE flag or CHILD_INHERIT flag may have changed on another name during a previous RibUpdate.

The previous FibUpdates should be modified to:

Name Face cost action
/a 2 10 ADD
/a/c 2 10 ADD
/a/d 2 10 ADD

but the FibUpdater does not know how previous RibUpdates should affect the current RibUpdate without maintaining some sort of state.

Is there a simple way to calculate the FibUpdates based on a non-modified Rib and other generated FibUpdates?

2.) If multiple registrations are batched together, how should responses be handled? Should each RibUpdate have callbacks for FibUpdate success and failure?

Say an application registers /app/video on face 2 and another application registers /app/music on face 2. Both RibUpdates are added to a RibUpdateBatch for face 2. The FibUpdater calculates that /app/video generates 3 FibUpdates while /app/music generates 1 FibUpdate.

Both the application that registered /app/video and the application that registered /app/music need to received independent responses. If FibUpdates are calculated for all of the RibUpdates before they are applied, how can you tell when to respond to each specific application (i.e., respond to one application when 1 update is successful and another when 3 updates are successful)? Is the intent of the design to individually calculate the FibUpdates for each RibUpdate and send them before considering the next RibUpdate or to calculate all of the FibUpdates for an entire RibUpdate batch and then send them?

#8 Updated by Junxiao Shi almost 5 years ago

Is there a simple way to calculate the FibUpdates based on a non-modified Rib and other generated FibUpdates?

As I said in proposed design, given a pre-update RIB snapshot and a RibUpdateBatch, there exists an optimal set of FIB updates.

The algorithm to calculate this set shall be designed as part of #1698.
In #1941, RibUpdateBatch contains only one RIB update.

2.) If multiple registrations are batched together, how should responses be handled? Should each RibUpdate have callbacks for FibUpdate success and failure?

No. The entire batch can succeed or fail only once. There's no callback for individual updates, because FIB updates are calculated from the batch, and there's no way to associate a subset of FIB updates to an individual RIB update.

#9 Updated by Vince Lehman almost 5 years ago

As I said in proposed design, given a pre-update RIB snapshot and a RibUpdateBatch, there exists an optimal set of FIB updates.

Yes, I don’t disagree with this.

The algorithm to calculate this set shall be designed as part of #1698.
In #1941, RibUpdateBatch contains only one RIB update.

I need to think of these issues now as this will affect my design.

No. The entire batch can succeed or fail only once. There's no callback for individual updates, because FIB updates are calculated from the batch, and there's no way to associate a subset of FIB updates to an individual RIB update.

So multiple callbacks should be associated with one RibUpdateBatch?

Another question I have is where should I assume FIB update failure can occur? If FibUpdates fail because of a nonexistent FaceId, this is simple as the updates will not be applied to the FIB and the RibUpdateBatch can be abandoned. But, can FibUpdates fail in the middle of a batch update for other reasons not covered by the design specification? If so, there are certain scenarios that I am worried about and will need to consider carefully.

For example:

Assume you have the following RIB:

|  Name  | Face | CHILD_INHERIT | CAPTURE | cost |
|--------|------|---------------|---------|------|
| /a/b   |  1   |      F        |    F    |   5  |
| /a/c   |  1   |      F        |    F    |   5  |     

and assume you have the following RibUpdateBatch:

|  Name  | Face | CHILD_INHERIT | CAPTURE | cost |
|--------|------|---------------|---------|------|
| /a     |  2   |      T        |    F    |   10 |

This will generate the following FibUpdates:

|  Name  | Face |  cost  |  action  |
|--------|------|--------|----------|
| /a     |  2   |     10 | ADD      |
| /a/b   |  2   |     10 | ADD      |
| /a/c   |  2   |     10 | ADD      |

Now, is it possible that the updates for /a and /a/b will succeed but the update for
/a/c will fail due to timeout or some other factor?

If so, the entire RibUpdateBatch will fail and the RIB will not be updated.

The problem is that the updates for /a and /a/b are in the FIB but there will be no way to unregister them. If an unregistration command comes for /a, the entry does not exist in the RIB and there will be no way to know what if longer prefixes inherited this route.

#10 Updated by Vince Lehman almost 5 years ago

I want to clarify that the above assumes that after a certain number of retries, the FibUpdater will stop trying to apply the update; This can cause the FibUpdate process to fail sometime in the middle or at the end. When the FibUpdater stops retrying, this is what will cause the RibUpdateBatch to fail as a whole. If this is not the intended behavior, let me know.

#11 Updated by Junxiao Shi almost 5 years ago

  • Description updated (diff)

So multiple callbacks should be associated with one RibUpdateBatch?

No. There is only one callback. The called function can reply to multiple prefix registration commands.

Hint: RibManager can place relevant prefix registration commands in a container that is passed to the called function.

Where should I assume FIB update failure can occur? If FibUpdates fail because of a nonexistent FaceId, this is simple as the updates will not be applied to the FIB and the RibUpdateBatch can be abandoned. But, can FibUpdates fail in the middle of a batch update for other reasons not covered by the design specification? If so, there are certain scenarios that I am worried about and will need to consider carefully.

Please enumerate all possible failure scenarios.

The spec is updated to consider the failure scenario in note-9, and another failure scenario mentioned during 20141208 conference call.

#12 Updated by Vince Lehman almost 5 years ago

(b) in case of a non-existent face error of the same face, the RibUpdateBatch is abandoned;

In this case, isn't it still possible that an update for another face ID has already been applied to the FIB? You would still need to reverse the previously successful update to keep the FIB and RIB consistent.

For example:

The RibUpdateBatch has face ID 1 and the FibUpdater sends updates with face 1 and face 2. By the time the FibUpdater receives a response that face 1 is invalid, the update with face 2 has already been sent.

#13 Updated by Vince Lehman almost 5 years ago

Junxiao,

I've attached a summary of a meeting between Lan and me about the issues and concerns with the FibUpdater design. I would appreciate if you could read it and confirm my understanding of the design is correct. I would especially appreciate any comments on whether batching is a necessity and if so, how to handle calculation in an efficient and non-overly complicated manner.

#14 Updated by Junxiao Shi almost 5 years ago

  • Description updated (diff)

The following is added to solve the problem in note-12:

FIB updates for RibUpdateBatch's face should be sent before FIB updates for other faces.

#15 Updated by Junxiao Shi almost 5 years ago

  • Tracker changed from Task to Feature

In #1941,

  • Each RibUpdateBatch contains exactly one RIB update.
  • The algorithm to compute the set of FIB updates in response to a RIB update is reused from #1325.

whether batching is a necessity and if so, how to handle calculation in an efficient and non-overly complicated manner.

The necessity of having more than one RIB updates in a RibUpdateBatch, and how to design the algorithm, is out of scope of #1941.
Please discuss them in #1698.

#16 Updated by Junxiao Shi almost 5 years ago

#17 Updated by Vince Lehman almost 5 years ago

(c) in case of a non-existent face error of a different face, the FIB update is skipped;

I think this should be OK since inherited routes cannot change a CAPTURE or CHILD_INHERIT flag.

The inherited routes can only affect FIB Updates with the same Face ID as themselves. If inherited routes with an invalid Face are left in the RIB, they should only affect inherited routes which would have the invalid Face and can only block FIB Updates for the invalid Face.

e.g.)

If /a has inherited route (FaceId=3, Cost=10) and a RibUpdate generates the inherited route (3, 15) for /a, the FibUpdater will not generate a FIB update as the new inherited route has a higher cost. If instead, the RibUpdate generates the inherited route (1, 10) for /a, the inherited route (3, 10) will not block the FIB update generation or installation of (1, 10).

The RIB will be inconsistent with the FIB but once the RibManager is notified of the destroyed face or finds the face to be gone, the invalid Face will be removed from the RIB, including inherited routes.

#18 Updated by Vince Lehman almost 5 years ago

Blocked by Feature #2293: Controller: stop-and-wait added

Would it not be possible to instead send the FibUpdates which have the RibUpdateBatch's face without stop-and-wait, wait for all of them to be responded to successfully, and then send the rest of the FibUpdates without stop-and-wait?

Or could NFD potentially reject some due to Bug #1990?

#19 Updated by Vince Lehman almost 5 years ago

  • % Done changed from 0 to 40

#20 Updated by Junxiao Shi almost 5 years ago

#21 Updated by Vince Lehman almost 5 years ago

  • Description updated (diff)
  • % Done changed from 40 to 60

#22 Updated by Junxiao Shi over 4 years ago

Rib/CancelExpirationEvent test case needs to be completed before this issue can close.

#23 Updated by Junxiao Shi over 4 years ago

  • Description updated (diff)

Design is updated due to the decision in #2468 note-1 and the related RibMgmt protocol update.

#24 Updated by Vince Lehman over 4 years ago

Junxiao Shi wrote:
> `Rib/CancelExpirationEvent` test case needs to be completed before this issue can close.

I think RibManager/CancelExpirationEvent completely covers that functionality; is there something additional I should test in Rib/CancelExpirationEvent or should I just remove the test case?

Junxiao Shi wrote:
> Design is updated due to the decision in #2468 note-1 and the related RibMgmt protocol update.

I will update the implementation based on the design decision.

#25 Updated by Vince Lehman over 4 years ago

  • Status changed from In Progress to Code review
  • % Done changed from 60 to 90

#26 Updated by Junxiao Shi over 4 years ago

NFD Developer Guide needs updated before closing this issue.

#27 Updated by Junxiao Shi over 4 years ago

commit:a4b0ef0b999cea0dd94188d11d61c86d949fbf42 adds a 'wasFaceDestroyed' parameter to FibUpdater::computeAndSendFibUpdates, and skips updates for batch.getFaceId() if it is true.

The problem was due to an assumption in design that if a FibUpdate fails for an update with the same Face ID as the batch, the whole batch should be abandoned and the RIB should remain unmodified.
In the case of a face destroy event, every update would fail since the FIB would respond with a non-existent face error. The RIB would remain unchanged and the routes would not be removed from the RIB.
This change checks whether or not this update is the result of a face destroy event, and if it is, only tries to apply the updates with a different face ID than the batch.

This design is wrong, because:

  • RibUpdate::Action::DESTROY_FACE has incorrect semantics. It's associated with an empty Route (with only FaceId), which is invalid.
  • A RIB update with RibUpdate::Action::DESTROY_FACE is converted to many RIB updates with RibUpdate::Action::UNREGISTER, but removal of Routes after a face is destroyed is semantically different from removal of a Route after an unregister command.
  • Whether the face was destroyed is a property of either RIB update or RibUpdateBatch. It shouldn't be represented outside of RibUpdateBatch.

The correct fix is:

  • Introduce RibUpdate::Action::REMOVE_FACE. Semantics: indicates a Route needs to be removed after a face is destroyed.
  • Add void Rib::beginRemoveFace(uint64_t faceId) as an entrypoint after FaceMonitor detects a face was destroyed.
  • RibManager::onFaceDestroyedEvent should call Rib::beginRemoveFace instead of generating a RibUpdate with empty Route and RibUpdate::Action::DESTROY_FACE.
  • Drop RibUpdate::Action::DESTROY_FACE.
  • FibUpdater::computeUpdates should inspect update.getAction(), and don't generate FIB updates for batch.getFaceId() when it's RibUpdate::Action::REMOVE_FACE.

#28 Updated by Junxiao Shi over 4 years ago

  • Status changed from Code review to Feedback

Code is merged.

NFD Developer Guide needs updated before closing this issue.

#29 Updated by Tai-Lin Chu over 4 years ago

I want to report a regression from this commit, which is found by bisecting all commits after v0.3.1.

Test case:

  1. a producer registers a name.
  2. right after the response is returned, a client sends an interest with the exact same name.

Expected result: the client gets the data

Actual result: the interest never reaches the producer

Further examination shows that this is because the fib update happens after the command response is generated. I think this is semantically incorrect because there is no way to know when fib update is done after producer sends rib register command.

#30 Updated by Junxiao Shi over 4 years ago

I've entered Bug #2761 based on note-29. Please discuss this further in #2761.

#31 Updated by Vince Lehman over 4 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

NFD Developer's Guide has been updated

#32 Updated by Junxiao Shi over 4 years ago

  • Status changed from Resolved to Closed
  • Target version changed from v0.3 to v0.4

Also available in: Atom PDF