Feature #4683

closed

add RIB entry update with prefix announcement in self-learning

Added by Teng Liang over 6 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: RIB
Target version:
Start date: 07/24/2018
Due date:
% Done: 100%
Estimated time: 6.00 h

Description

Given that a forwarding strategy is able to run events on the rib io_service, this feature adds handlers in the RIB service to assist self-learning updating RIB.


Related issues 2 (0 open, 2 closed)

Has duplicate NFD - Feature #4470: Extend RIB for prefix announcement in self-learning (Duplicate, Teng Liang)
Blocks NFD - Feature #4279: Self-learning strategy (Closed, Teng Liang, 09/27/2017)

Actions #1

Updated by Teng Liang over 6 years ago

Actions #2

Updated by Davide Pesavento over 6 years ago

  • Category set to RIB
  • Assignee set to Teng Liang
  • Target version set to v0.7

Teng Liang wrote:

Given that a forwarding strategy is able to run events on the rib io_service, this feature adds handlers in the RIB service to assist self-learning updating RIB.

I have no idea what this sentence means. "run events"... do you mean callbacks? "adds handlers"... what handlers? what do they handle?

Actions #3

Updated by Teng Liang over 6 years ago

Davide Pesavento wrote:

Teng Liang wrote:

Given that a forwarding strategy is able to run events on the rib io_service, this feature adds handlers in the RIB service to assist self-learning updating RIB.

I have no idea what this sentence means. "run events"... do you mean callbacks? "adds handlers"... what handlers? what do they handle?

With the global RIB io_service pointer, one can post functions to it, and they run in the RIB thread. To register RIB entries from a forwarding strategy, the RIB service needs to be exposed and passed to the RIB io_service as a parameter. In addition, handlers (functions) should be passed together with the RIB service. More specifically, I am thinking of adding two functions, registerEntryFromPA and unregisterEntryFromPA, to the RIB service class. A forwarding strategy could then call runOnRibIoService(bind(&rib::Service::registerEntryFromPA, g_ribService, pa)) to register a RIB entry, which in turn registers a FIB entry.
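
A minimal sketch of the call pattern proposed in this note, using only the standard library. `TaskQueue` stands in for the RIB io_service, and `RibService`, `PrefixAnn`, and `registerFromStrategy` are hypothetical placeholders rather than NFD's actual classes. Note that `std::bind` takes the address of the member function and a pointer (or reference wrapper) to the object it should be invoked on.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a prefix announcement (PA).
struct PrefixAnn {
  std::string prefix;
};

// Hypothetical stand-in for the RIB service proposed in this note.
class RibService {
public:
  void registerEntryFromPA(const PrefixAnn& pa) { m_prefixes.push_back(pa.prefix); }
  const std::vector<std::string>& prefixes() const { return m_prefixes; }

private:
  std::vector<std::string> m_prefixes;
};

// Stand-in for the RIB io_service: post() queues a function, poll() runs what is queued.
class TaskQueue {
public:
  void post(std::function<void()> task) { m_tasks.push_back(std::move(task)); }

  // Runs all queued tasks in FIFO order and returns how many were executed.
  std::size_t poll() {
    std::size_t count = 0;
    while (!m_tasks.empty()) {
      auto task = std::move(m_tasks.front());
      m_tasks.pop_front();
      task();
      ++count;
    }
    return count;
  }

private:
  std::deque<std::function<void()>> m_tasks;
};

// What a strategy would do: bind the member function to the service and post it.
// The PA is copied into the bound object, so it outlives the caller's argument.
inline void registerFromStrategy(TaskQueue& ribQueue, RibService& service, const PrefixAnn& pa) {
  ribQueue.post(std::bind(&RibService::registerEntryFromPA, &service, pa));
}
```

Nothing happens at the time of the call to `registerFromStrategy`; the registration runs only when the queue owner executes the posted task, which is what keeps RIB state confined to one thread.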

Actions #4

Updated by Teng Liang over 6 years ago

  • Subject changed from "add RIB service handlers to update RIB for self-learning" to "add RIB entry update with prefix announcement in self-learning"
  • Status changed from New to In Progress
Actions #5

Updated by Junxiao Shi over 6 years ago

  • Blocked by Feature #4650: Accept and store PrefixAnnouncement in rib/announce command added
Actions #6

Updated by Junxiao Shi over 6 years ago

  • Assignee changed from Teng Liang to Junxiao Shi
  • % Done changed from 0 to 10
  • Estimated time set to 6.00 h

https://gerrit.named-data.net/#/c/NFD/+/4914/ patchset4 has the headers for the methods provided to self-learning.

I did not include a "withdraw" function. The strategy is expected to "announce" the route when a PA arrives, and "renew" the route every time a Data is successfully retrieved using a learned route. Both operations accept a maxLifetime parameter that is expected to be much shorter than the PA's own lifetime (I suggest 15-60 seconds; self-learning is effective when maxLifetime is greater than the Interest interval). If the strategy detects a bad route, this information should be recorded in the measurements table, while the route on the RIB side is retained for later probing.
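
To make the announce/renew contract concrete, here is a minimal sketch. It is not the code in the Gerrit change: `RouteTable`, `announce`, `renew`, and `isActive` are hypothetical names, and clamping the route's expiry to the PA's own expiry is an assumption based on the statement that maxLifetime is expected to be shorter than the PA's lifetime.

```cpp
#include <algorithm>
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;
using TimePoint = Clock::time_point;
using Duration = Clock::duration;

// Hypothetical record of one learned route.
struct LearnedRoute {
  TimePoint paExpiry;     // when the prefix announcement itself expires
  TimePoint routeExpiry;  // when the route expires unless it is renewed
};

class RouteTable {
public:
  // Called when a PA arrives: install the route for at most maxLifetime.
  void announce(const std::string& prefix, TimePoint now, TimePoint paExpiry,
                Duration maxLifetime) {
    m_routes[prefix] = LearnedRoute{paExpiry, std::min(now + maxLifetime, paExpiry)};
  }

  // Called when Data is retrieved over a learned route: extend it by at most maxLifetime.
  // Returns false if the route is unknown or has already expired.
  bool renew(const std::string& prefix, TimePoint now, Duration maxLifetime) {
    auto it = m_routes.find(prefix);
    if (it == m_routes.end() || it->second.routeExpiry <= now) {
      return false;
    }
    it->second.routeExpiry = std::min(now + maxLifetime, it->second.paExpiry);
    return true;
  }

  // A route is usable strictly before its expiry time.
  bool isActive(const std::string& prefix, TimePoint now) const {
    auto it = m_routes.find(prefix);
    return it != m_routes.end() && it->second.routeExpiry > now;
  }

private:
  std::map<std::string, LearnedRoute> m_routes;
};
```

With this shape there is no separate withdraw operation: a route that is not renewed simply stops being active once `routeExpiry` passes.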


Since the functions are usable on the RIB thread only, the strategy needs to invoke them like this:

// asynchronous function
runOnRibIoService([=] {
  rib::Service::get().getRibManager().slX(arg,
    [=] (const Result& res) {
      runOnMainIoService([=] {
        // use the result
      });
    });
});

// synchronous function
runOnRibIoService([=] {
  Result res = rib::Service::get().getRibManager().slX(arg);
  runOnMainIoService([=] {
    // use the result
  });
});

I'll let Teng implement this runOnMainIoService function.
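
The round trip between the two threads can be tried out without NFD. The following is a sketch under stated assumptions: `EventLoop` is a hypothetical, simplified stand-in for an io_service, `queryRib` is a hypothetical helper, and the doubling of `arg` is only a placeholder for whatever slX would compute.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Minimal event loop: any thread may post(), one thread executes tasks in run().
class EventLoop {
public:
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_tasks.push_back(std::move(task));
    }
    m_cv.notify_one();
  }

  // Ask run() to return after the tasks queued so far have been executed.
  void stop() {
    post([this] { m_stopped = true; });
  }

  // Execute tasks on the calling thread until a stop() request is processed.
  void run() {
    while (!m_stopped) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_tasks.empty(); });
        task = std::move(m_tasks.front());
        m_tasks.pop_front();
      }
      task();
    }
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::deque<std::function<void()>> m_tasks;
  bool m_stopped = false;  // written and read only by the thread inside run()
};

// Compute on the RIB loop, then deliver the result on the main loop.
inline void queryRib(EventLoop& ribLoop, EventLoop& mainLoop, int arg,
                     std::function<void(int)> useResult) {
  ribLoop.post([&mainLoop, arg, useResult] {
    int res = arg * 2;  // placeholder for the RIB-side work
    mainLoop.post([useResult, res] { useResult(res); });
  });
}
```

The result is captured by value before hopping back, so the callback never touches RIB-side state from the main thread.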

Actions #7

Updated by Teng Liang over 6 years ago

So feature #4650 is not designed to be used for self-learning, but only the PA storage part?

Actions #8

Updated by Junxiao Shi over 6 years ago

So feature #4650 is not designed to be used for self-learning, but only the PA storage part?

#4650 provides the PA structure and storage, which serve both self-learning and the #4648 use case.
I'll add self-learning specific functions under this issue.

Actions #9

Updated by Teng Liang over 6 years ago

Junxiao Shi wrote:

https://gerrit.named-data.net/#/c/NFD/+/4914/ patchset4 has the headers for the methods provided to self-learning.

I did not include a "withdraw" function.

Why not? If the producer becomes inaccessible, how are the bad routes removed?

Both operations accept a maxLifetime parameter that is expected to be much shorter than PA's own lifetime (suggest 15-60 seconds, self-learning is effective when maxLifetime is greater than Interest interval).

There are stable networks, where operators can set the route lifetime high (compared to what you suggest) to reduce the chances of Interest broadcasting.

In case the strategy detects a bad route, this information should be recorded in the measurements table, while the route on RIB side is retained for later probing.

How to detect a bad route? When will the probing be executed (and by the forwarding strategy)?

Actions #10

Updated by Junxiao Shi over 6 years ago

There are stable networks, where operators can set the route lifetime high (compared to what you suggest) to reduce the chances of Interest broadcasting.

Setting a short route lifetime does not lead to higher overhead in a stable network, as long as the route is actively being used. Every time a Data is retrieved following a route, the route lifetime gets extended (see slRenew function). Therefore, if strategy specifies maxLifetime=15_s, there would be no flooding as long as there’s at least one Interest under the route’s prefix every 15 seconds.
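
The arithmetic behind this argument can be checked with a small simulation. It is a sketch under simplifying assumptions: Interests arrive at a fixed interval, every Interest (flooded or not) retrieves Data and thereby installs or renews the route, and a route counts as expired at the instant its lifetime ends, so the interval has to be strictly shorter than maxLifetime to avoid flooding. `countFloods` is a hypothetical helper.

```cpp
#include <cassert>

// Counts how many of `numInterests` Interests, sent at a fixed interval, have to be flooded.
// All times are in seconds.
inline int countFloods(int numInterests, int interestInterval, int maxLifetime) {
  assert(interestInterval > 0 && maxLifetime >= 0);
  int floods = 0;
  int routeExpiry = 0;  // no route has been learned yet at time 0
  for (int i = 0; i < numInterests; ++i) {
    const int now = i * interestInterval;
    if (now >= routeExpiry) {
      ++floods;  // no live route: the Interest is flooded to discover one
    }
    routeExpiry = now + maxLifetime;  // Data came back: announce or renew the route
  }
  return floods;
}
```

For example, 100 Interests sent every 10 seconds with maxLifetime set to 15 seconds need only the initial flood, while the same traffic sent every 20 seconds floods every time.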

Setting a long route lifetime is beneficial when a producer moves back and forth between several attachment points, because strategy can immediately try a different known route when a non-discovery Interest is Nacked.
On the other hand, if you set the route lifetime to 1 year, it’s very likely that the producer is no longer in the network after such a long time, so storing the route is pure overhead. Moreover, if the producer is still in the network but at a different location, having that old route means the network would first forward a non-discovery Interest to the old location, and then re-flood after getting a Nack, causing a higher delay than flooding directly.

I did not include a "withdraw" function.

Why not? If the producer becomes inaccessible, how are the bad routes removed?

Strategy doesn’t remove “bad routes” because strategy can never confirm a route has gone bad forever, even if you receive one or more Nacks against the route. Instead, if there’s no Data coming back from a nexthop for long enough (either because a link failed or producer moved away, or because there’s no Interest coming), the route expires on its own.

Of course, calling slRenew with zero lifetime effectively erases the route, but strategy shouldn’t need it.

How to detect a bad route?

When an Interest is sent following a route but is unsatisfied (Nacked or timed out), the route turns “yellow” (see Cheng’s dissertation about semantics of route colors). Whether a route is “green” or “yellow” is recorded in measurements table, similar to how ASF and NCC strategy record per-namespace information.

A route can also turn “red” but only when the face fails or disappears. This is handled on RIB side through face monitor and does not need a withdraw command from strategy.
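
The color transitions described in this reply can be summarized as a small state machine. This is a sketch of the description above only: `RouteColor`, `RouteEvent`, and `nextColor` are hypothetical names, and two transitions are assumptions that the thread does not state explicitly, namely that retrieving Data turns a route green and that a red route stays red until it is learned again.

```cpp
enum class RouteColor { Green, Yellow, Red };
enum class RouteEvent { DataRetrieved, Nack, Timeout, FaceDown };

// Returns the color of a route after the given event.
inline RouteColor nextColor(RouteColor current, RouteEvent event) {
  if (current == RouteColor::Red) {
    // Assumption: a failed face is handled on the RIB side, so strategy events do not revive it.
    return RouteColor::Red;
  }
  switch (event) {
    case RouteEvent::DataRetrieved:
      return RouteColor::Green;   // assumption: a satisfied Interest confirms the route
    case RouteEvent::Nack:
    case RouteEvent::Timeout:
      return RouteColor::Yellow;  // unsatisfied Interest: suspect, but not removed
    case RouteEvent::FaceDown:
      return RouteColor::Red;     // only a face failure or disappearance turns a route red
  }
  return current;  // not reached for valid events
}
```

Green and yellow would live in the measurements table on the strategy side, while the red transition corresponds to the face monitor on the RIB side.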

When will the probing be executed (and by the forwarding strategy)?

The strategy records the next probing time in the measurements table. Probing allows the strategy to discover an alternative route that is faster than the current active route.
I advise against implementing probing initially. Self-learning can work without probing.

Actions #11

Updated by Teng Liang over 6 years ago

Junxiao Shi wrote:

Setting a short route lifetime does not lead to higher overhead in a stable network, as long as the route is actively being used. Every time a Data is retrieved following a route, the route lifetime gets extended (see slRenew function). Therefore, if strategy specifies maxLifetime=15_s, there would be no flooding as long as there’s at least one Interest under the route’s prefix every 15 seconds.

Why is 15 seconds the perfect value? I am thinking of 10 minutes or hours.

How to detect a bad route?

When an Interest is sent following a route but is unsatisfied (Nacked or timed out), the route turns “yellow” (see Cheng’s dissertation about semantics of route colors). Whether a route is “green” or “yellow” is recorded in measurements table, similar to how ASF and NCC strategy record per-namespace information.

A route can also turn “red” but only when the face fails or disappears. This is handled on RIB side through face monitor and does not need a withdraw command from strategy.

How does self-learning treat "yellow" and "red" faces?

I am thinking of a different design. When an Interest is Nacked on a discovered path, the routes along the path should be revoked, and the consumer should start a new round of route discovery by broadcasting Interests. In this design, the logic becomes simpler, because it does not need to color faces.

Actions #12

Updated by Junxiao Shi over 6 years ago

Why is 15 seconds the perfect value? I am thinking of 10 minutes or hours.

When a route is actively in use, it's likely that there's at least one Interest every 15 seconds. When a route is not actively in use, there's no point in keeping it.

How does self-learning treat "yellow" and "red" faces?

In the same way as described in Cheng Yi's dissertation.

I am thinking of a different design. When an Interest is Nacked on a discovered path, the routes along the path should be revoked, and the consumer should start a new round of route discovery by broadcasting Interests. In this design, the logic becomes simpler, because it does not need to color faces.

No, a single Nack/timeout should not revoke the route. See "producer moves back and forth" scenario in note-10 and also #4193.

Actions #13

Updated by Teng Liang over 6 years ago

Junxiao Shi wrote:

Why is 15 seconds the perfect value? I am thinking of 10 minutes or hours.

When a route is actively in use, it's likely that there's at least one Interest every 15 seconds. When a route is not actively in use, there's no point in keeping it.

Do you have any data (traffic) to support this argument? I think it's likely to be 10 minutes.

I am thinking of a different design. When an Interest is Nacked on a discovered path, the routes along the path should be revoked, and the consumer should start a new round of route discovery by broadcasting Interests. In this design, the logic becomes simpler, because it does not need to color faces.

No, a single Nack/timeout should not revoke the route. See "producer moves back and forth" scenario in note-10 and also #4193.

In your scenario, the route is kept for only 15s. Are you saying the producer moves back and forth within 15s? Please give some practical use cases.

Actions #14

Updated by Junxiao Shi over 6 years ago

Do you have any data (traffic) to support this argument? I think it's likely to be 10 minutes.
In your scenario, the route is kept for only 15s. Are you saying the producer moves back and forth within 15s? Please give some practical use cases.

Let’s not get hung up on this. It’s a parameter controlled from the strategy side (#3868).

Actions #15

Updated by Junxiao Shi over 6 years ago

  • % Done changed from 10 to 30

https://gerrit.named-data.net/#/c/NFD/+/4947 has yet another minor refactoring: the RIB test suite has a new way to mock FIB updates. Then, RibManager's logic for creating a RibUpdateBatch is abstracted out of the control command handler into the beginAddRoute and beginRemoveRoute functions. slAnnounce will need to use beginAddRoute to update the RIB and get notified about the result.

Reminder: Teng needs to work on runOnMainIoService function, see #4683-6 second half.

Actions #16

Updated by Junxiao Shi over 6 years ago

  • % Done changed from 30 to 40
Actions #17

Updated by Junxiao Shi over 6 years ago

  • % Done changed from 40 to 80

https://gerrit.named-data.net/#/c/NFD/+/4914/ patchset12 completes the three helper functions.
The current implementation has a known limitation: if slAnnounce is in progress, the results of slRenew and slFindAnn may reflect the prior RIB state. Resolving this problem would require refactoring the RIB update queuing mechanism, which should be done together with #1698.

Actions #18

Updated by Junxiao Shi over 6 years ago

  • Has duplicate Feature #4470: Extend RIB for prefix announcement in self-learning added
Actions #19

Updated by Junxiao Shi over 6 years ago

  • Blocked by deleted (Feature #4650: Accept and store PrefixAnnouncement in rib/announce command)
Actions #20

Updated by Junxiao Shi over 6 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100
Actions #21

Updated by Junxiao Shi about 6 years ago

  • Tags set to SelfLearning
Actions #22

Updated by Junxiao Shi almost 6 years ago

During the development of this feature, I was lazy and reused the localhop_security validator.
This issue was brought up on the 2019-01-11 NFD call because NFD-Android does not have an effective way to configure certificates, and allowing any registration would enable attackers to manipulate the RIB remotely.
As a short-term solution, I think it's unnecessary to change any validation logic. NFD-Android can just configure the validator to (1) accept all Data packets and (2) reject all Interest packets.
Since /localhop/nfd/rib/register commands are Interest packets, NFD-Android would not allow attackers to manipulate its RIB; since PAs are Data packets, NFD-Android can still enjoy the benefit of self-learning.

Actions #23

Updated by Davide Pesavento about 1 year ago

  • Tags changed from SelfLearning to self-learning