Feature #4683
Closed
add RIB entry update with prefix announcement in self-learning
100%
Description
Given that a forwarding strategy is able to run events on the rib io_service, this feature adds handlers in the RIB service to assist self-learning updating RIB.
Updated by Teng Liang over 6 years ago
- Blocks Feature #4279: Self-learning strategy added
Updated by Davide Pesavento over 6 years ago
- Category set to RIB
- Assignee set to Teng Liang
- Target version set to v0.7
Teng Liang wrote:
Given that a forwarding strategy is able to run events on the rib io_service, this feature adds handlers in the RIB service to assist self-learning updating RIB.
I have no idea what this sentence means. "run events"... do you mean callbacks? "adds handlers"... what handlers? what do they handle?
Updated by Teng Liang over 6 years ago
Davide Pesavento wrote:
Teng Liang wrote:
Given that a forwarding strategy is able to run events on the rib io_service, this feature adds handlers in the RIB service to assist self-learning updating RIB.
I have no idea what this sentence means. "run events"... do you mean callbacks? "adds handlers"... what handlers? what do they handle?
With the global rib io_service pointer, one can post functions to it, and they run in the RIB thread. To register RIB entries from a forwarding strategy, the rib service must be exposed and passed to the rib io_service as a parameter. In addition, handlers (functions) should be passed together with the rib service. More specifically, I am thinking of adding two functions, registerEntryFromPA and unregisterEntryFromPA, to the rib service class. A forwarding strategy can then call runOnRibIoService(bind(&rib::Service::registerEntryFromPA, g_ribService, pa)) to register a RIB entry, which in turn registers a FIB entry.
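The idea of posting a bound member function to another thread can be sketched as follows. This is only an illustration under stated assumptions: TaskQueue stands in for the rib io_service (NFD uses Boost.Asio, which is not used here), and RibService, its two methods, and registerFromStrategy are hypothetical stand-ins, not NFD code.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Stand-in for an io_service: one worker thread draining a task queue.
class TaskQueue
{
public:
  TaskQueue() : m_worker([this] { run(); }) {}

  ~TaskQueue()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_stopping = true;
    }
    m_cv.notify_one();
    m_worker.join(); // the worker drains remaining tasks before exiting
  }

  // Enqueue a task; it executes later on the worker thread.
  void post(std::function<void()> task)
  {
    assert(task); // an empty std::function cannot be invoked
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_tasks.push(std::move(task));
    }
    m_cv.notify_one();
  }

private:
  void run()
  {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_stopping || !m_tasks.empty(); });
        if (m_tasks.empty())
          return; // stopping and fully drained
        task = std::move(m_tasks.front());
        m_tasks.pop();
      }
      task();
    }
  }

  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::queue<std::function<void()>> m_tasks;
  bool m_stopping = false;
  std::thread m_worker; // declared last so the other members exist before run()
};

// Hypothetical RIB service; its state is only touched from the "RIB thread".
class RibService
{
public:
  void registerEntryFromPA(const std::string& prefix) { m_prefixes.push_back(prefix); }

  void unregisterEntryFromPA(const std::string& prefix)
  {
    m_prefixes.erase(std::remove(m_prefixes.begin(), m_prefixes.end(), prefix),
                     m_prefixes.end());
  }

  const std::vector<std::string>& prefixes() const { return m_prefixes; }

private:
  std::vector<std::string> m_prefixes;
};

// Hypothetical strategy-side helper: posts one registration per announced prefix.
std::vector<std::string>
registerFromStrategy(const std::vector<std::string>& announced)
{
  RibService rib;
  {
    TaskQueue ribThread;
    for (const auto& prefix : announced)
      ribThread.post(std::bind(&RibService::registerEntryFromPA, &rib, prefix));
  } // the destructor joins the worker, so reading rib afterwards is safe
  return rib.prefixes();
}
```

Note that the member function pointer is taken with `&`, and the service object is passed as the first bound argument; the strategy thread never calls into the service directly.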
Updated by Teng Liang over 6 years ago
- Subject changed from add RIB service handlers to update RIB for self-learning to add RIB entry update with prefix announcement in self-learning
- Status changed from New to In Progress
Updated by Junxiao Shi over 6 years ago
- Blocked by Feature #4650: Accept and store PrefixAnnouncement in rib/announce command added
Updated by Junxiao Shi about 6 years ago
- Assignee changed from Teng Liang to Junxiao Shi
- % Done changed from 0 to 10
- Estimated time set to 6.00 h
https://gerrit.named-data.net/#/c/NFD/+/4914/ patchset4 has the headers for the methods provided to self-learning.
I did not include a "withdraw" function. The strategy is expected to "announce" the route when a PA arrives, and "renew" the route every time a Data is successfully retrieved using a learned route. Both operations accept a maxLifetime parameter that is expected to be much shorter than PA's own lifetime (suggest 15-60 seconds, self-learning is effective when maxLifetime is greater than Interest interval). In case the strategy detects a bad route, this information should be recorded in the measurements table, while the route on RIB side is retained for later probing.
Since the functions are usable on RIB thread only, strategy needs to invoke them like:
// asynchronous function
runOnRibIoService([=] {
  rib::Service::get().getRibManager().slX(arg,
    [=] (const Result& res) {
      runOnMainIoService([=] {
        // use the result
      });
    });
});
// synchronous function
runOnRibIoService([=] {
  Result res = rib::Service::get().getRibManager().slX(arg);
  runOnMainIoService([=] {
    // use the result
  });
});
I'll let Teng implement this runOnMainIoService function.
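For illustration, the round trip in the synchronous case could look roughly like the sketch below. It is not the NFD implementation: Loop is a minimal stand-in for an io_service, slX and Result are placeholders taken from the snippet above with an invented body, and roundTrip is a hypothetical driver that plays the role of the main thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Minimal event loop: run() executes posted tasks on the calling thread.
class Loop
{
public:
  void post(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_tasks.push(std::move(task));
    }
    m_cv.notify_one();
  }

  // An empty task is used as the stop signal.
  void stop() { post(nullptr); }

  void run()
  {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_tasks.empty(); });
        task = std::move(m_tasks.front());
        m_tasks.pop();
      }
      if (!task)
        return;
      task();
    }
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::queue<std::function<void()>> m_tasks;
};

// Placeholder result type and RIB-side operation (bodies are invented).
struct Result
{
  bool ok;
  std::string prefix;
};

Result slX(const std::string& arg) { return {!arg.empty(), arg}; }

// Hypothetical driver: returns what the strategy observes back on the main thread.
Result roundTrip(const std::string& arg)
{
  Loop mainLoop;
  Loop ribLoop;
  std::thread ribThread([&] { ribLoop.run(); });

  auto runOnRibIoService = [&] (std::function<void()> f) { ribLoop.post(std::move(f)); };
  auto runOnMainIoService = [&] (std::function<void()> f) { mainLoop.post(std::move(f)); };

  Result seen{false, ""};
  std::thread::id handledOn;

  runOnRibIoService([&, arg] {
    Result res = slX(arg); // executes on the RIB thread
    runOnMainIoService([&, res] {
      seen = res; // executes on the main thread
      handledOn = std::this_thread::get_id();
      mainLoop.stop();
    });
  });

  mainLoop.run(); // the main thread processes the posted continuation
  ribLoop.stop();
  ribThread.join();
  assert(handledOn == std::this_thread::get_id());
  return seen;
}
```

The result is copied into the continuation (`[&, res]`) so that nothing on the main thread refers to a local variable of the RIB-thread lambda after it returns.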
Updated by Teng Liang about 6 years ago
So feature #4650 is not designed to be used for self-learning, but only the PA storage part?
Updated by Junxiao Shi about 6 years ago
Updated by Teng Liang about 6 years ago
Junxiao Shi wrote:
https://gerrit.named-data.net/#/c/NFD/+/4914/ patchset4 has the headers for the methods provided to self-learning.
I did not include a "withdraw" function.
Why not? If the producer becomes inaccessible, how are the bad routes removed?
Both operations accept a maxLifetime parameter that is expected to be much shorter than PA's own lifetime (suggest 15-60 seconds, self-learning is effective when maxLifetime is greater than Interest interval).
There are stable networks, where operators can set the route lifetime high (compared to what you suggest) to reduce the chances of Interest broadcasting.
In case the strategy detects a bad route, this information should be recorded in the measurements table, while the route on RIB side is retained for later probing.
How to detect a bad route? When will the probing be executed (and by the forwarding strategy)?
Updated by Junxiao Shi about 6 years ago
There are stable networks, where operators can set the route lifetime high (compared to what you suggest) to reduce the chances of Interest broadcasting.
Setting a short route lifetime does not lead to higher overhead in a stable network, as long as the route is actively being used. Every time a Data is retrieved following a route, the route lifetime gets extended (see slRenew function). Therefore, if strategy specifies maxLifetime=15_s, there would be no flooding as long as there’s at least one Interest under the route’s prefix every 15 seconds.
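A toy model of this renewal behavior is sketched below. It is an assumption-laden illustration, not the RibManager code: LearnedRoutes, announce, renew, hasRoute and countFloods are hypothetical names, time is passed in explicitly as whole seconds, the announcement is assumed to stay valid for one hour, and capping the route expiration at the announcement's own expiration is my assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Seconds = std::chrono::seconds;

// Toy table of learned routes with explicit timestamps.
class LearnedRoutes
{
public:
  // Install a route whose lifetime is capped by maxLifetime and by the
  // announcement's expiration (the cap is an assumption of this sketch).
  void announce(const std::string& prefix, Seconds now, Seconds annExpiry, Seconds maxLifetime)
  {
    assert(maxLifetime >= Seconds::zero());
    m_routes[prefix] = Route{std::min(now + maxLifetime, annExpiry), annExpiry};
  }

  // Extend a live route; returns false if the route is absent or expired.
  bool renew(const std::string& prefix, Seconds now, Seconds maxLifetime)
  {
    auto it = m_routes.find(prefix);
    if (it == m_routes.end() || it->second.expiry <= now)
      return false;
    it->second.expiry = std::min(now + maxLifetime, it->second.annExpiry);
    return true;
  }

  bool hasRoute(const std::string& prefix, Seconds now) const
  {
    auto it = m_routes.find(prefix);
    return it != m_routes.end() && it->second.expiry > now;
  }

private:
  struct Route
  {
    Seconds expiry;
    Seconds annExpiry;
  };

  std::map<std::string, Route> m_routes;
};

// Counts how many Interests find no live route and therefore trigger flooding.
// Every Interest is assumed to be answered immediately with Data (and a PA if flooded).
int countFloods(const std::vector<int>& interestTimes, Seconds maxLifetime)
{
  const std::string prefix = "/example";
  const Seconds annLifetime(3600); // assumed PA validity of one hour
  LearnedRoutes routes;
  int floods = 0;
  for (int t : interestTimes) {
    const Seconds now(t);
    if (routes.hasRoute(prefix, now)) {
      routes.renew(prefix, now, maxLifetime); // Data came back via the learned route
    }
    else {
      ++floods; // discovery Interest is flooded and a PA is learned
      routes.announce(prefix, now, now + annLifetime, maxLifetime);
    }
  }
  return floods;
}
```

With maxLifetime set to 15 seconds, Interests at 0, 10, 20 and 30 seconds cause a single flood in this model, while a 30-second gap between Interests lets the route expire and causes a second one.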
Setting a long route lifetime is beneficial when a producer moves back and forth between several attachment points, because strategy can immediately try a different known route when a non-discovery Interest is Nacked.
On the other hand, suppose you set the route lifetime to 1 year, it’s very likely that the producer is no longer in the network after such a long time, and thus there’s an overhead in storing the route. Moreover, if the producer is still in the network but at a different location, having that old route means the network would first forward a non-discovery Interest to the old location, and then re-flood after getting a Nack, causing a higher delay than flooding directly.
I did not include a "withdraw" function.
Why not? If the producer becomes inaccessible, how are the bad routes removed?
Strategy doesn’t remove “bad routes” because strategy can never confirm a route has gone bad forever, even if you receive one or more Nacks against the route. Instead, if there’s no Data coming back from a nexthop for long enough (either because a link failed or producer moved away, or because there’s no Interest coming), the route expires on its own.
Of course, calling slRenew with zero lifetime effectively erases the route, but strategy shouldn’t need it.
How to detect a bad route?
When an Interest is sent following a route but is unsatisfied (Nacked or timed out), the route turns “yellow” (see Cheng’s dissertation about semantics of route colors). Whether a route is “green” or “yellow” is recorded in measurements table, similar to how ASF and NCC strategy record per-namespace information.
A route can also turn “red” but only when the face fails or disappears. This is handled on RIB side through face monitor and does not need a withdraw command from strategy.
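The color transitions described here can be written down as a small state function. This is a sketch under assumptions, not strategy code: the enum and function names are hypothetical, returning to green when Data arrives follows my reading of the dissertation, and treating red as sticky on the strategy side is a simplification because recovery would be driven by the RIB-side face monitor.

```cpp
#include <cassert>

enum class RouteColor { Green, Yellow, Red };

enum class Event { DataReceived, NackOrTimeout, FaceDown };

// Next color of a route after one event, as recorded in a measurements entry.
RouteColor nextColor(RouteColor current, Event ev)
{
  if (current == RouteColor::Red)
    return RouteColor::Red; // face is gone; the strategy cannot revive the route

  switch (ev) {
    case Event::DataReceived:
      return RouteColor::Green; // assumption: a satisfied Interest marks the route as working
    case Event::NackOrTimeout:
      return RouteColor::Yellow; // unsatisfied Interest: keep the route, but distrust it
    case Event::FaceDown:
      return RouteColor::Red; // only a face failure or removal makes a route red
  }
  assert(false && "unhandled event");
  return current;
}
```

A single Nack or timeout only demotes the route to yellow in this model; nothing here removes the route from the RIB.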
When will the probing be executed (and by the forwarding strategy)?
Strategy records the next probing time in the measurements table. Probing allows strategy to discover an alternative route that is faster than the currently active route.
I advise against implementing probing initially. Self-learning can work without probing.
Updated by Teng Liang about 6 years ago
Junxiao Shi wrote:
Setting a short route lifetime does not lead to higher overhead in a stable network, as long as the route is actively being used. Every time a Data is retrieved following a route, the route lifetime gets extended (see slRenew function). Therefore, if strategy specifies maxLifetime=15_s, there would be no flooding as long as there’s at least one Interest under the route’s prefix every 15 seconds.
Why is 15 seconds the perfect value? I am thinking of 10 minutes or hours.
How to detect a bad route?
When an Interest is sent following a route but is unsatisfied (Nacked or timed out), the route turns “yellow” (see Cheng’s dissertation about semantics of route colors). Whether a route is “green” or “yellow” is recorded in measurements table, similar to how ASF and NCC strategy record per-namespace information.
A route can also turn “red” but only when the face fails or disappears. This is handled on RIB side through face monitor and does not need a withdraw command from strategy.
How does self-learning treat "yellow" and "red" faces?
I am thinking of a different design. When an Interest is Nacked on a discovered path, the routes along the path should be revoked, and the consumer should start a new round of route discovery by broadcasting Interests. In this design, the logic becomes simpler, because it does not need to color faces.
Updated by Junxiao Shi about 6 years ago
Why is 15 seconds the perfect value? I am thinking of 10 minutes or hours.
When a route is actively in use, it's likely that there's at least one Interest every 15 seconds. When a route is not actively in use, there's no point in keeping it.
How does self-learning treat "yellow" and "red" faces?
In the same way as described in Cheng Yi's dissertation.
I am thinking of a different design. When an Interest is Nacked on a discovered path, the routes along the path should be revoked, and the consumer should start a new round of route discovery by broadcasting Interests. In this design, the logic becomes simpler, because it does not need to color faces.
No, a single Nack/timeout should not revoke the route. See "producer moves back and forth" scenario in note-10 and also #4193.
Updated by Teng Liang about 6 years ago
Junxiao Shi wrote:
Why is 15 seconds the perfect value? I am thinking of 10 minutes or hours.
When a route is actively in use, it's likely that there's at least one Interest every 15 seconds. When a route is not actively in use, there's no point in keeping it.
Do you have any data (traffic) to support this argument? I think it's likely to be 10 minutes.
I am thinking of a different design. When an Interest is Nacked on a discovered path, the routes along the path should be revoked, and the consumer should start a new round of route discovery by broadcasting Interests. In this design, the logic becomes simpler, because it does not need to color faces.
No, a single Nack/timeout should not revoke the route. See "producer moves back and forth" scenario in note-10 and also #4193.
In your scenario, the route is kept for only 15s. Are you saying the producer moves back and forth within 15s? Please give some practical use cases.
Updated by Junxiao Shi about 6 years ago
Do you have any data (traffic) to support this argument? I think it's likely to be 10 minutes.
In your scenario, the route is kept for only 15s. Are you saying the producer moves back and forth within 15s? Please give some practical use cases.
Let’s not get hung up on this. It’s a parameter controlled from strategy side (#3868).
Updated by Junxiao Shi about 6 years ago
- % Done changed from 10 to 30
https://gerrit.named-data.net/#/c/NFD/+/4947 has yet another minor refactoring: RIB test suite has a new way to mock FIB updates. Then, RibManager's logic of creating RibUpdateBatch is abstracted out of control command handler into beginAddRoute and beginRemoveRoute functions. slAnnounce will need to use beginAddRoute to update the RIB and get notified about the result.
Reminder: Teng needs to work on runOnMainIoService function, see #4683-6 second half.
Updated by Junxiao Shi about 6 years ago
- % Done changed from 40 to 80
https://gerrit.named-data.net/#/c/NFD/+/4914/ patchset12 completes the three helper functions.
Current implementation has a known limitation: if slAnnounce is in progress, the result of slRenew and slFindAnn may reflect prior RIB state. Resolving this problem would require a refactoring of RIB update queuing mechanism, which should be done together with #1698.
Updated by Junxiao Shi about 6 years ago
- Has duplicate Feature #4470: Extend RIB for prefix announcement in self-learning added
Updated by Junxiao Shi about 6 years ago
- Blocked by deleted (Feature #4650: Accept and store PrefixAnnouncement in rib/announce command)
Updated by Junxiao Shi about 6 years ago
- Status changed from In Progress to Closed
- % Done changed from 80 to 100
Updated by Junxiao Shi almost 6 years ago
During the development of this feature, I was lazy and reused the localhop_security validator.
This issue was brought up on 20190111 NFD call because NFD-Android does not have an effective way to configure certificates, and allowing any registration would enable attackers to manipulate RIB remotely.
As a short-term solution, I think it's unnecessary to change any validation logic. NFD-Android can just configure the validator to (1) accept all Data packets (2) reject all Interest packets.
Since /localhop/nfd/rib/register commands are Interest packets, NFD-Android would not allow attackers to manipulate its RIB; since PAs are Data packets, NFD-Android can enjoy the benefit of self-learning.
Updated by Davide Pesavento 12 months ago
- Tags changed from SelfLearning to self-learning