Feature #5029: Self-learning forwarding strategy v2 - NFD - NDN project issue tracking system

Feature #5029

open

Feature #5031: Make NFD work out of the box

Self-learning forwarding strategy v2

Added by Teng Liang almost 6 years ago. Updated 11 months ago.

Status: New
Priority: Normal
Assignee:
Category: Forwarding
Target version:
Start date:
Due date:
% Done:
Estimated time:
Tags: self-learning

Description

The current version of the self-learning forwarding strategy mainly handles the "no-next-hop" case and multicasts to one or more eligible faces. This is far from being useful or robust in real-world usage, so we plan to improve self-learning in the following five aspects:

Interest processing pipelines (the added mechanisms are borrowed from the best-route forwarding strategy):
- Interest Retransmission Suppression (Code Review)
  - Add a retransmission suppression decision algorithm that suppresses Interest retransmissions using exponential backoff.
- Interest Retransmission Forwarding
  - For a retransmitted Interest that passes the suppression period, forward it to the unused next hop with the lowest cost; if all next hops have been used, forward it in a round-robin manner starting from the lowest-cost next hop.
- Add afterCsHit trigger (Code Review)
  - Attach a prefix announcement (PA) to cached Data if necessary (#5018)
Data processing pipelines: (Code Review)
- Creating a unicast Face on receiving data from a multi-access Face
  - This feature is designed for WiFi AP-station mode (#4973)
NACK processing pipelines: (Code Review)
- NACK handling
  - Send Interest to an eligible unused next hop on receiving a no-route NACK
  - Once all next hops have returned a no-route NACK, if this node is not the consumer, send the NACK back downstream
NFD configuration:
- Make NFD work out-of-the-box in a local network
  - Set self-learning as the default forwarding strategy for / in nfd.conf.sample (#5031)
Unit Testing (#5026) (Code Review)
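
As an illustration of the retransmission suppression item above, here is a minimal Python sketch of an exponential-backoff suppression decision. It is not NFD code: the class name, the millisecond values (10 ms initial interval, factor 2, 250 ms cap), and the string results are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetxSuppression:
    """Per-PIT-entry suppression state (hypothetical; times in milliseconds)."""
    initial_interval: int = 10   # assumed value
    multiplier: int = 2          # assumed value
    max_interval: int = 250      # assumed value
    last_forwarded: Optional[int] = None
    interval: int = 0

    def decide(self, now: int) -> str:
        """Return "new", "suppress", or "forward" for an Interest arriving at `now`."""
        if self.last_forwarded is None:
            # First Interest for this entry: forward it and arm the timer.
            self.last_forwarded = now
            self.interval = self.initial_interval
            return "new"
        if now - self.last_forwarded < self.interval:
            # Retransmission arrived inside the suppression interval: drop it.
            return "suppress"
        # Retransmission is outside the interval: forward it and back off.
        self.last_forwarded = now
        self.interval = min(self.interval * self.multiplier, self.max_interval)
        return "forward"
```

With these assumed values, a retransmission 5 ms after the first Interest is suppressed, one at 10 ms is forwarded, and the suppression interval then doubles to 20 ms.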

Discussion on May 15 2020

NACK discovery Interests received from a unicast face if there are no next hops. This helps in the edge case where an outside network can react more quickly to a NACK from a local network with only unicast faces.
Auto-created faces should have "on-demand" persistency.
Deal with duplicate transmissions for multiple multicast faces on the same physical interface.
To make self-learning work at gateways:
- configure self-learning strategy with local name prefix, e.g., /edu/ucla
- enable whitelist and blacklist configuration on Face system to indicate which faces can be used for broadcasting
- Self-learning strategy uses face flag to determine broadcasting behavior

Read self-learning packet processing flows for more details.

Updated by Davide Pesavento almost 6 years ago

I'm having a hard time understanding what this issue is about. The beginning of the description is pretty general and it sounds like you're talking about the whole forwarder... then you mention the best-route strategy (so, is it only about best-route?)... and right after you mentioned self-learning (and now you lost me)... please clarify.

the forwarding plane should be able to try other next hops if there are any

It is already able to do that. What and when other nexthops are tried is up to the strategy.

Updated by Teng Liang almost 6 years ago

Subject changed from Handling NACK in forwarding strategies to Improve best-route with self-learning and new NACK handler
Description updated (diff)

I updated the description in response to note-1.

Updated by Teng Liang almost 6 years ago

Related to Feature #5031: Make NFD work out of the box added

Updated by Davide Pesavento almost 6 years ago

Tags set to SelfLearning

Updated by Davide Pesavento almost 6 years ago

Start date deleted (~~10/16/2019~~)

Updated by Davide Pesavento almost 6 years ago

The best-route strategy has always had pretty specific forwarding semantics. One of its most important characteristics is that it only looks at routes in the FIB, starting from the lowest-cost nexthop. Introducing a new version of best-route with completely different semantics is not a good idea in my opinion. We have a virtually infinite set of strategy names to choose from, why do you want to overload the already poorly named best-route?

Updated by Teng Liang almost 6 years ago

Davide Pesavento wrote:

The best-route strategy has always had pretty specific forwarding semantics. One of its most important characteristics is that it only looks at routes in the FIB, starting from the lowest-cost nexthop. Introducing a new version of best-route with completely different semantics is not a good idea in my opinion. We have a virtually infinite set of strategy names to choose from, why do you want to overload the already poorly named best-route?

I don't care much about the strategy name. The point is whatever name we choose, we need to make it the default strategy for / in the conf file. The current default strategy is best route, so I used it. Do you have a name suggestion?

Updated by Davide Pesavento almost 6 years ago

Tracker changed from Task to Feature

Updated by Davide Pesavento almost 6 years ago

Teng Liang wrote:

I don't care much about the strategy name.

Names are important. We've already gotten enough complaints about the "best route" name for the current default strategy, because it doesn't clearly express the semantics. Let's not make it even worse.

Do you have a name suggestion?

Ehm... what's wrong with "self-learning"? Really, this issue should be about improving/fixing the existing self-learning strategy to handle more (real-world) cases. There is no need to merge anything with anything. Have a properly working, robust, self-learning strategy (as it should have been from the beginning) and then we can consider changing the default in nfd.conf.

#10

Updated by Teng Liang almost 6 years ago

Subject changed from Improve best-route with self-learning and new NACK handler to Improve self-learning forwarding strategy
Description updated (diff)
Status changed from New to In Progress
Assignee set to Teng Liang

Description is updated based on note-9.

#11

Updated by Davide Pesavento almost 6 years ago

Description updated (diff)

#12

Updated by Davide Pesavento almost 6 years ago

Teng Liang wrote:

If multiple next hops exist, an eligible next hop with the lowest cost should be picked. For a retransmitted Interest that can be forwarded, forward it to an unused next hop with the lowest cost; if all next hops have been tried, pick the first chosen next hop to forward.

How do you assign costs to routes created through self-learning?

Once all next hops return no-route NACK

if this node is consumer, reinitiate discovery Interest flooding

if this node is not consumer, send the NACK back to downstreams

How do you plan to implement the "consumer or not?" check?

If the same retransmitted Interest has been forwarded 5 times (RETX_TRIGGER_BROADCAST_COUNT), reinitiate Interest flooding.

Who would do this? Any node in the network?

#13

Updated by Davide Pesavento almost 6 years ago

Description updated (diff)

#14

Updated by Teng Liang almost 6 years ago

Davide Pesavento wrote:

Teng Liang wrote:

If multiple next hops exist, an eligible next hop with the lowest cost should be picked. For a retransmitted Interest that can be forwarded, forward it to an unused next hop with the lowest cost; if all next hops have been tried, pick the first chosen next hop to forward.

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

Once all next hops return no-route NACK

if this node is consumer, reinitiate discovery Interest flooding

if this node is not consumer, send the NACK back to downstreams

How do you plan to implement the "consumer or not?" check?

If there is one in-record, the face is local.

If the same retransmitted Interest has been forwarded 5 times (RETX_TRIGGER_BROADCAST_COUNT), reinitiate Interest flooding.

Who would do this? Any node in the network?

Should be the consumer node only (will update the processing flow).
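
The answers above can be pictured with a small Python sketch. The hard-coded cost (2048) and the RETX_TRIGGER_BROADCAST_COUNT threshold (5) come from this thread; everything else is an assumption made for illustration, including the names NextHop, PitState, pick_next_hop and on_retransmission, the rule that flooding happens once five retransmissions have already been forwarded, and the omission of the consumer-only restriction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SELF_LEARNED_COST = 2048           # hard-coded cost mentioned in this thread
RETX_TRIGGER_BROADCAST_COUNT = 5   # threshold named in this thread


@dataclass
class NextHop:
    face_id: int
    cost: int = SELF_LEARNED_COST


@dataclass
class PitState:
    """Hypothetical per-PIT-entry bookkeeping."""
    tried: List[int] = field(default_factory=list)  # face ids, in the order used
    retx_forwarded: int = 0


def pick_next_hop(next_hops: List[NextHop], state: PitState) -> Optional[int]:
    """Lowest-cost unused next hop; if all were tried, the first one chosen."""
    unused = [nh for nh in next_hops if nh.face_id not in state.tried]
    if unused:
        # min() keeps the first element on ties, so equal costs fall back to list order.
        chosen = min(unused, key=lambda nh: nh.cost)
        state.tried.append(chosen.face_id)
        return chosen.face_id
    return state.tried[0] if state.tried else None


def on_retransmission(next_hops: List[NextHop], state: PitState) -> Tuple[str, Optional[int]]:
    """Handle a retransmission that was not suppressed."""
    if state.retx_forwarded >= RETX_TRIGGER_BROADCAST_COUNT:
        # Assumed rule: after five forwarded retransmissions, restart discovery.
        state.retx_forwarded = 0
        return ("flood", None)
    state.retx_forwarded += 1
    return ("unicast", pick_next_hop(next_hops, state))
```

Because every self-learned route carries the same cost in this sketch, the ranking among such routes reduces to list order, which is an arbitrary choice rather than a design decision taken in this issue.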

#15

Updated by Teng Liang almost 6 years ago

Description updated (diff)

#16

Updated by Davide Pesavento almost 6 years ago

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

How do you plan to implement the "consumer or not?" check?

If there is one in-record, the face is local.

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

#17

Updated by Teng Liang almost 6 years ago

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

The current self-learning will only learn one route (from the first Data that comes back with a PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is not yet defined.

How do you plan to implement the "consumer or not?" check?

If there is one in-record, the face is local.

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

True. First, I meant if there is only one in-record, and the inFace is local. Second, these conditions cover our testing scenarios. Do you know if there is other information that can be used to make a better decision?
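
For reference, the check described here ("only one in-record, and the inFace is local") can be written as a tiny Python sketch. The InRecord type and its face_is_local flag are hypothetical stand-ins, not NFD data structures, and the example also shows the objection raised above: a consumer attached through a non-local face is not recognized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InRecord:
    """Hypothetical PIT in-record: the incoming face and whether it is local."""
    face_id: int
    face_is_local: bool


def looks_like_consumer_node(in_records: List[InRecord]) -> bool:
    """Heuristic from this thread: exactly one in-record, arriving on a local face."""
    return len(in_records) == 1 and in_records[0].face_is_local


# A local app on this node: the heuristic says "consumer node".
local_app = [InRecord(face_id=260, face_is_local=True)]
# A consumer reached over a non-local face: the heuristic misses it.
remote_consumer = [InRecord(face_id=300, face_is_local=False)]
# Two downstreams, one of them local: also rejected by the heuristic.
aggregated = [InRecord(260, True), InRecord(300, False)]
```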

#18

Updated by Davide Pesavento almost 6 years ago

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

The current self-learning will only learn one route (from the first Data that comes back with a PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is not yet defined.

So once you've learned one route/nexthop, you never flood discovery Interests anymore?

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

True. First, I meant if there is only one in-record, and the inFace is local. Second, these conditions cover our testing scenarios. Do you know if there is other information that can be used to make a better decision?

Can you avoid basing the "re-flood" decision on whether it's a consumer node or not? It seems fragile in general. Maybe you can let the consumer app (or library) itself retransmit the Interest, and base the re-flood logic on incoming retransmissions?

#19

Updated by Teng Liang almost 6 years ago

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

The current self-learning will only learn one route (from the first Data that comes back with a PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is not yet defined.

So once you've learned one route/nexthop, you never flood discovery Interests anymore?

If there is a no-route NACK (and after tried all next hops), or the Interest retransmission (not suppressed) has reached a threshold. Re-flooding Interests only happens at consumer.

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

True. First, I meant if there is only one in-record, and the inFace is local. Second, these conditions cover our testing scenarios. Do you know if there is other information that can be used to make a better decision?

Can you avoid basing the "re-flood" decision on whether it's a consumer node or not? It seems fragile in general. Maybe you can let the consumer app (or library) itself retransmit the Interest, and base the re-flood logic on incoming retransmissions?

Why is it fragile? Is it because it is hard to decide whether the NFD is directly connected to the consumer? Depending on apps may not be reliable. E.g., the current ndncatchunks just terminates on receiving a no-route NACK.

#20

Updated by Davide Pesavento almost 6 years ago

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

The current self-learning will only learn one route (from the first Data that comes back with a PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is not yet defined.

So once you've learned one route/nexthop, you never flood discovery Interests anymore?

If there is a no-route NACK (and after tried all next hops), or the Interest retransmission (not suppressed) has reached a threshold.

This "if" is missing a "then" branch, so I don't understand the sentence.

Re-flooding Interests only happens at consumer.

And what if the re-flooding discovers a new nexthop? This is what I asked earlier and you said "The current self-learning will only learn one route"... you seem to be contradicting yourself.

Can you avoid basing the "re-flood" decision on whether it's a consumer node or not? It seems fragile in general. Maybe you can let the consumer app (or library) itself retransmit the Interest, and base the re-flood logic on incoming retransmissions?

Why is it fragile? Is it because it is hard to decide whether the NFD is directly connected to the consumer?

Yes, it's impossible in the general case.

Depending on apps may not be reliable.

You said previously that when the number of retransmissions reaches a threshold, you trigger re-flooding. So you are relying on apps... which contradicts the above statement.

E.g., the current ndncatchunks just terminates on receiving a no-route NACK.

I don't get this argument. If catchunks doesn't behave properly, fix it! instead of adding workarounds to the forwarder.

#21

Updated by Teng Liang almost 6 years ago

Description updated (diff)

Responding to note-20:

The current self-learning only learns one route per discovery Interest broadcast, even if there are multiple producers. On receiving a NACK, the route will be cleared, so the next Interest will trigger Interest broadcasting.

We plan to add a more aggressive action to the forwarding strategy: on receiving a NACK, NFD should try alternative paths; if all have been tried, it should send the NACK downstream. The NFD directly connected to apps should trigger Interest broadcasting instead of sending the no-route NACK back. This is not a workaround. I think the forwarding strategy can take better actions (reinitiating discovery) instead of sending a no-route NACK back to apps (which does not help much).

In addition, there are several cases in which a NACK may not be returned, e.g., on a multicast face, or when the producer cannot respond with a NACK. However, another producer in the network may still be able to serve the data, so we intend to add another aggressive trigger for Interest broadcasting after receiving consecutive Interest retransmissions. The benefit is that the forwarding plane can react to unreachable producers faster.

Deciding whether an NFD is directly connected to consumers is challenging with the state NFD currently has. How about we add a direct-sent-from-app tag to the Interest? The directly connected NFD would record this state in the PIT in-record and remove the tag before forwarding the Interest upstream.
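
The NACK handling proposed in this note can be summarized in a short Python sketch. It is only a reading of the text above: the function name, the use of sets for "tried" and "nacked" faces, the returned action strings, and the is_consumer_node flag (whose derivation is exactly the open question in this thread) are all assumptions.

```python
from typing import List, Optional, Set, Tuple


def on_no_route_nack(
    next_hop_faces: List[int],   # assumed to be ordered by preference
    tried: Set[int],             # faces the Interest was already sent to
    nacked: Set[int],            # faces that already returned a no-route NACK
    nack_face: int,              # face the current NACK arrived on
    is_consumer_node: bool,      # assumed to be provided by some other check
) -> Tuple[str, Optional[int]]:
    """Decide what to do after a no-route NACK arrives from `nack_face`."""
    nacked.add(nack_face)
    # 1. Try an alternative path if one has not been used yet.
    for face in next_hop_faces:
        if face not in tried:
            tried.add(face)
            return ("forward", face)
    # 2. Every next hop was tried; act only once all of them have NACKed.
    if all(face in nacked for face in next_hop_faces):
        if is_consumer_node:
            # Proposed behavior: restart discovery instead of NACKing the app.
            return ("broadcast", None)
        return ("nack_downstream", None)
    # 3. Some upstream has not answered yet, so keep waiting.
    return ("wait", None)
```

For example, with next hops [1, 2] and only face 1 tried, a NACK from face 1 yields ("forward", 2); a later NACK from face 2 yields ("nack_downstream", None) on an intermediate node and ("broadcast", None) on a node flagged as the consumer's.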

#22

Updated by Davide Pesavento over 5 years ago

Related to deleted (Feature #5031: Make NFD work out of the box)

#23

Updated by Davide Pesavento over 5 years ago

Parent task set to #5031

#24

Updated by Davide Pesavento over 5 years ago

Target version set to 22.02

#25

Updated by Teng Liang about 5 years ago

Description updated (diff)

#26

Updated by Teng Liang about 5 years ago

Description updated (diff)

The current face created by self-learning has FacePersistency = persistent. The discussion on May 15 2020 proposed changing it to on-demand. On second thought, a persistent face is better, because on-demand face cannot be created directly by protocol factory, and unicast faces learned in local environment are indeed persistent.

#27

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

because on-demand face cannot be created directly by protocol factory

This is a purely implementational concern and should not affect any design decision.

and unicast faces learned in local environment are indeed persistent.

What does this mean? Sounds like a circular argument. Please clarify.

I'm a little concerned about keeping the auto-created faces "forever". There is clearly a potential resource consumption issue here (there is a per-process limit on the number of open file descriptors), and it's easily exploitable.

#28

Updated by Teng Liang about 5 years ago

Davide Pesavento wrote:

Teng Liang wrote:

because on-demand face cannot be created directly by protocol factory

This is a purely implementational concern and should not affect any design decision.

Right, so what was the concern behind avoiding on-demand face creation in the protocol factory?

and unicast faces learned in local environment are indeed persistent.

What does this mean? Sounds like a circular argument. Please clarify.

I'm a little concerned about keeping the auto-created faces "forever". There is clearly a potential resource consumption issue here (there is a per-process limit on the number of open file descriptors), and it's easily exploitable.

The number of unicast faces in a local network is normally limited, and their existence is consistent. But I accept either way.

#29

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

because on-demand face cannot be created directly by protocol factory

This is a purely implementational concern and should not affect any design decision.

Right, so what was the concern behind avoiding on-demand face creation in the protocol factory?

No concern. There was simply no use case.

#30

Updated by Teng Liang about 5 years ago

Description updated (diff)

#31

Updated by Teng Liang almost 5 years ago

Subject changed from Improve self-learning forwarding strategy to Self-learning Forwarding Strategy Version 2
Description updated (diff)

#32

Updated by Teng Liang almost 5 years ago

Description updated (diff)

#33

Updated by Davide Pesavento about 4 years ago

Target version changed from 22.02 to 22.12

#34

Updated by Davide Pesavento almost 3 years ago

Target version deleted (~~22.12~~)

#35

Updated by Davide Pesavento over 1 year ago

Tags changed from SelfLearning to self-learning
Subject changed from Self-learning Forwarding Strategy Version 2 to Self-learning forwarding strategy v2
Status changed from In Progress to New
Assignee deleted (~~Teng Liang~~)

#36

Updated by Jan Romann 11 months ago

Would you still be interested in adding this feature to the NFD codebase? A colleague of mine and I updated Teng's NFD fork a while ago to the latest upstream state, so we could potentially follow up on the work that has already been done and, for example, incorporate the review comments made to the changes/patch sets pending on Gerrit (such as https://gerrit.named-data.net/c/NFD/+/6230/1).

#37

Updated by Beichuan Zhang 11 months ago

Jan Romann wrote in #note-36:

Would you still be interested in adding this feature to the NFD codebase? A colleague of mine and I updated Teng's NFD fork a while ago to the latest upstream state, so we could potentially follow up on the work that has already been done and, for example, incorporate the review comments made to the changes/patch sets pending on Gerrit (such as https://gerrit.named-data.net/c/NFD/+/6230/1).

Yes, we are interested! The current NFD implements part of self-learning but not the full mechanism due to certain issues. It may be more effective to discuss this over a Zoom meeting if you're interested. We have a weekly meeting on Fridays, 9-11am Pacific Time, and you're welcome to join. We'd like to learn more about how self-learning fits into your research and how your code will address certain complexities in the implementation. But if the meeting time is not good for you, maybe you can follow up on Gerrit to get this going. Thanks!

#38

Updated by Jan Romann 11 months ago

Beichuan Zhang wrote in #note-37:

Yes, we are interested! The current NFD implements part of self-learning but not the full mechanism due to certain issues. It may be more effective to discuss this over a Zoom meeting if you're interested. We have a weekly meeting on Fridays, 9-11am Pacific Time, and you're welcome to join. We'd like to learn more about how self-learning fits into your research and how your code will address certain complexities in the implementation. But if the meeting time is not good for you, maybe you can follow up on Gerrit to get this going. Thanks!

That is great, thank you very much for your response :) My colleague and I will probably join you in today's call, looking forward to discussing this topic with you!
