Feature #5029

open

Feature #5031: Make NFD work out of the box

Self-learning forwarding strategy v2

Added by Teng Liang about 5 years ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Forwarding
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

The current version of self-learning forwarding strategy mainly handles the "no-next-hop" case, and uses multicast to one or more eligible faces. This is far from being useful or robust in real usage, so we plan to improve self-learning in the following five aspects:

  1. Interest processing pipelines (the added mechanisms are borrowed from the best-route forwarding strategy):

    • Interest Retransmission Suppression (Code Review)
      • Add a retransmission suppression decision algorithm that suppresses Interest retransmissions using exponential backoff.
    • Interest Retransmission Forwarding
      • For a retransmitted Interest that passes the suppression period, forward it to the unused next hop with the lowest cost; if all next hops have been used, forward it in a round-robin manner starting from the lowest-cost next hop.
    • Add afterCsHit trigger (Code Review)
      • Attach a PA (prefix announcement) to cached Data if necessary (#5018)
  2. Data processing pipelines: (Code Review)

    • Creating a unicast Face on receiving data from a multi-access Face
      • This feature is designed for WiFi AP-station mode (#4973)
  3. NACK processing pipelines: (Code Review)

    • NACK handling
      • Send Interest to an eligible unused next hop on receiving a no-route NACK
      • Once all next hops have returned a no-route NACK, send the NACK back downstream if this node is not the consumer
  4. NFD configuration:

    • Make NFD work out-of-the-box in a local network
      • Set self-learning as the default forwarding strategy for / in nfd.conf.sample (#5031)
  5. Unit Testing (#5026) (Code Review)
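
The retransmission suppression item in the list above can be illustrated with a small model. This is a minimal sketch in Python, not NFD code: the class name, the method name, and the three interval parameters are assumptions chosen for illustration, and the default values are placeholders rather than values taken from this issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    NEW = "new"            # first time this Interest is seen
    FORWARD = "forward"    # retransmission outside the suppression interval
    SUPPRESS = "suppress"  # retransmission inside the suppression interval


@dataclass
class RetxSuppression:
    # Intervals are in seconds; all default values are illustrative assumptions.
    initial_interval: float = 0.010
    multiplier: float = 2.0
    max_interval: float = 0.250
    last_forwarded: Optional[float] = None
    interval: float = 0.0

    def decide(self, now: float) -> Decision:
        if self.last_forwarded is None:
            # First transmission: start the suppression window.
            self.last_forwarded = now
            self.interval = self.initial_interval
            return Decision.NEW
        if now - self.last_forwarded < self.interval:
            return Decision.SUPPRESS
        # The retransmission is let through; back off exponentially up to the cap.
        self.last_forwarded = now
        self.interval = min(self.interval * self.multiplier, self.max_interval)
        return Decision.FORWARD
```

Each forwarded retransmission widens the window, so a consumer that retransmits aggressively is throttled progressively, while the cap keeps the window from growing without bound.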

Discussion on May 15 2020

  1. NACK discovery Interests received from a unicast face if there are no next hops. This helps in the edge case where an outside network can react more quickly to a NACK from a local network that has only unicast faces.
  2. Auto-created faces should have "on-demand" persistency.
  3. Deal with duplicate transmissions for multiple multicast faces on the same physical interface
  4. To make self-learning work at gateways
    • Configure the self-learning strategy with a local name prefix, e.g., /edu/ucla
    • Enable whitelist and blacklist configuration on the Face system to indicate which faces can be used for broadcasting
    • Have the self-learning strategy use a face flag to determine broadcasting behavior

Read self-learning packet processing flows for more details.
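
The retransmission forwarding rule from item 1 of the description (unused lowest-cost next hop first, then round-robin starting from the lowest-cost next hop) could look roughly like the following. This is a sketch under stated assumptions: `NextHop` and `RetxNextHopPicker` are hypothetical names, ties in cost are broken by face ID for determinism, and eligibility checks (for example, excluding the incoming face) are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class NextHop:
    face_id: int
    cost: int


class RetxNextHopPicker:
    """Per-Interest state for choosing where to send each retransmission."""

    def __init__(self, next_hops: Iterable[NextHop]) -> None:
        # Sort once by cost; face_id is only a deterministic tie-breaker.
        self.ordered: List[NextHop] = sorted(
            next_hops, key=lambda h: (h.cost, h.face_id))
        self.used: Set[int] = set()
        self.rr_index = 0

    def pick(self) -> Optional[NextHop]:
        if not self.ordered:
            return None
        # Prefer the lowest-cost next hop that has not been used yet.
        for hop in self.ordered:
            if hop.face_id not in self.used:
                self.used.add(hop.face_id)
                return hop
        # All next hops used: cycle through them, starting from the lowest cost.
        hop = self.ordered[self.rr_index % len(self.ordered)]
        self.rr_index += 1
        return hop
```

With three next hops of cost 10, 15, and 20, successive calls to `pick()` visit them in cost order and then keep cycling in that same order.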

Actions #1

Updated by Davide Pesavento about 5 years ago

I'm having a hard time understanding what this issue is about. The beginning of the description is pretty general and it sounds like you're talking about the whole forwarder... then you mention the best-route strategy (so, is it only about best-route?)... and right after you mentioned self-learning (and now you lost me)... please clarify.

the forwarding plane should be able to try other next hops if there are any

It is already able to do that. What and when other nexthops are tried is up to the strategy.

Actions #2

Updated by Teng Liang about 5 years ago

  • Subject changed from Handling NACK in forwarding strategies to Improve best-route with self-learning and new NACK handler
  • Description updated (diff)

I updated the description in response to note-1.

Actions #3

Updated by Teng Liang about 5 years ago

Actions #4

Updated by Davide Pesavento about 5 years ago

  • Tags set to SelfLearning
Actions #5

Updated by Davide Pesavento about 5 years ago

  • Start date deleted (10/16/2019)
Actions #6

Updated by Davide Pesavento about 5 years ago

The best-route strategy has always had pretty specific forwarding semantics. One of its most important characteristics is that it only looks at routes in the FIB, starting from the lowest-cost nexthop. Introducing a new version of best-route with completely different semantics is not a good idea in my opinion. We have a virtually infinite set of strategy names to choose from, why do you want to overload the already poorly named best-route?

Actions #7

Updated by Teng Liang about 5 years ago

Davide Pesavento wrote:

The best-route strategy has always had pretty specific forwarding semantics. One of its most important characteristics is that it only looks at routes in the FIB, starting from the lowest-cost nexthop. Introducing a new version of best-route with completely different semantics is not a good idea in my opinion. We have a virtually infinite set of strategy names to choose from, why do you want to overload the already poorly named best-route?

I don't care much about the strategy name. The point is whatever name we choose, we need to make it the default strategy for / in the conf file. The current default strategy is best route, so I used it. Do you have a name suggestion?

Actions #8

Updated by Davide Pesavento about 5 years ago

  • Tracker changed from Task to Feature
Actions #9

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

I don't care much about the strategy name.

Names are important. We've already gotten enough complaints about the "best route" name for the current default strategy, because it doesn't clearly express the semantics. Let's not make it even worse.

Do you have a name suggestion?

Ehm... what's wrong with "self-learning"? Really, this issue should be about improving/fixing the existing self-learning strategy to handle more (real-world) cases. There is no need to merge anything with anything. Have a properly working, robust, self-learning strategy (as it should have been from the beginning) and then we can consider changing the default in nfd.conf.

Actions #10

Updated by Teng Liang about 5 years ago

  • Subject changed from Improve best-route with self-learning and new NACK handler to Improve self-learning forwarding strategy
  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to Teng Liang

Description is updated based on note-9.

Actions #11

Updated by Davide Pesavento about 5 years ago

  • Description updated (diff)
Actions #12

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

  • If multiple next hops exist, an eligible next hop with the lowest cost should be picked. For a retransmitted Interest that can be forwarded, forward it to an unused next hop with the lowest cost; if all next hops have been tried, pick the first chosen next hop to forward.

How do you assign costs to routes created through self-learning?

  1. Once all next hops return no-route NACK
    • if this node is consumer, reinitiate discovery Interest flooding
    • if this node is not consumer, send the NACK back to downstreams

How do you plan to implement the "consumer or not?" check?

  • If the same retransmitted Interest has been forwarded 5 times (RETX_TRIGGER_BROADCAST_COUNT), reinitiate Interest flooding.

Who would do this? Any node in the network?

Actions #13

Updated by Davide Pesavento about 5 years ago

  • Description updated (diff)
Actions #14

Updated by Teng Liang about 5 years ago

Davide Pesavento wrote:

Teng Liang wrote:

  • If multiple next hops exist, an eligible next hop with the lowest cost should be picked. For a retransmitted Interest that can be forwarded, forward it to an unused next hop with the lowest cost; if all next hops have been tried, pick the first chosen next hop to forward.

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

  1. Once all next hops return no-route NACK
    • if this node is consumer, reinitiate discovery Interest flooding
    • if this node is not consumer, send the NACK back to downstreams

How do you plan to implement the "consumer or not?" check?

If there is one in-record, the face is local.

  • If the same retransmitted Interest has been forwarded 5 times (RETX_TRIGGER_BROADCAST_COUNT), reinitiate Interest flooding.

Who would do this? Any node in the network?

Should be the consumer node only (will update the processing flow).
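
A minimal sketch of the retransmission threshold discussed above, assuming the counter is kept per Interest and that re-flooding fires on the forwarding that reaches the threshold. `RETX_TRIGGER_BROADCAST_COUNT` and its value 5 come from the thread; the class, the method, and the `is_consumer_node` flag are hypothetical, and how that flag would be determined is exactly the open question in this discussion.

```python
RETX_TRIGGER_BROADCAST_COUNT = 5  # value quoted in the thread


class RetxCounter:
    """Counts forwarded retransmissions of one Interest (illustrative only)."""

    def __init__(self, is_consumer_node: bool) -> None:
        self.is_consumer_node = is_consumer_node
        self.count = 0

    def on_retx_forwarded(self) -> str:
        """Return "flood" when discovery should be reinitiated, else "unicast"."""
        self.count += 1
        if self.is_consumer_node and self.count >= RETX_TRIGGER_BROADCAST_COUNT:
            # Assumption: the counter restarts after a re-flood.
            self.count = 0
            return "flood"
        return "unicast"
```

On a node that is not treated as the consumer, the counter never triggers flooding, which matches the "consumer node only" answer above.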

Actions #15

Updated by Teng Liang about 5 years ago

  • Description updated (diff)
Actions #16

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

How do you plan to implement the "consumer or not?" check?

If there is one in-record, the face is local.

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

Actions #17

Updated by Teng Liang about 5 years ago

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

The current self-learning will only learn one route (the first coming back Data with PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is undefined yet.

How do you plan to implement the "consumer or not?" check?

If there is one in-record, the face is local.

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

True. First, I meant if there is only one in-record, and the inFace is local. Second, these conditions cover our testing scenarios. Do you know if there is other information that can be used to make a better decision?
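
The heuristic described in this reply (exactly one in-record, and its incoming face is local) can be written as a small predicate. This is only a sketch of the condition as stated here: `InRecord` and its `is_local` field are hypothetical stand-ins for PIT in-record and face scope information, not NFD types.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class InRecord:
    face_id: int
    is_local: bool  # hypothetical stand-in for the face's scope


def looks_like_consumer_node(in_records: Sequence[InRecord]) -> bool:
    """True if there is exactly one in-record and it arrived on a local face."""
    return len(in_records) == 1 and in_records[0].is_local
```

The predicate returns False for a consumer that reaches NFD over a non-local face, which is the limitation raised earlier in the thread.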

Actions #18

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

The current self-learning will only learn one route (the first coming back Data with PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is undefined yet.

So once you've learned one route/nexthop, you never flood discovery Interests anymore?

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

True. First, I meant if there is only one in-record, and the inFace is local. Second, these conditions cover our testing scenarios. Do you know if there is other information that can be used to make a better decision?

Can you avoid basing the "re-flood" decision on whether it's a consumer node or not? It seems fragile in general. Maybe you can let the consumer app (or library) itself retransmit the Interest, and base the re-flood logic on incoming retransmissions?

Actions #19

Updated by Teng Liang about 5 years ago

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

Davide Pesavento wrote:

How do you assign costs to routes created through self-learning?

A hard-coded value (2048 for the current imp).

And how do you choose the next hop when you have multiple routes discovered via self-learning? In other words, how do you rank next hops with the same cost?

The current self-learning will only learn one route (the first coming back Data with PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is undefined yet.

So once you've learned one route/nexthop, you never flood discovery Interests anymore?

If there is a no-route NACK (and after tried all next hops), or the Interest retransmission (not suppressed) has reached a threshold. Re-flooding Interests only happens at consumer.

Huh? Having one in-record doesn't mean that the face is local. And not all consumers are connected to NFD through a local face.

True. First, I meant if there is only one in-record, and the inFace is local. Second, these conditions cover our testing scenarios. Do you know if there is other information that can be used to make a better decision?

Can you avoid basing the "re-flood" decision on whether it's a consumer node or not? It seems fragile in general. Maybe you can let the consumer app (or library) itself retransmit the Interest, and base the re-flood logic on incoming retransmissions?

Why is it fragile? Is it because it is hard to decide whether the NFD is directly connected to a consumer? Depending on apps may not be reliable. E.g., the current ndncatchunks just terminates on receiving a no-route NACK.

Actions #20

Updated by Davide Pesavento about 5 years ago

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

The current self-learning will only learn one route (the first coming back Data with PrefixAnn). It is possible to learn multiple paths in future designs, but how to set the cost is undefined yet.

So once you've learned one route/nexthop, you never flood discovery Interests anymore?

If there is a no-route NACK (and after tried all next hops), or the Interest retransmission (not suppressed) has reached a threshold.

This "if" is missing a "then" branch, so I don't understand the sentence.

Re-flooding Interests only happens at consumer.

And what if the re-flooding discovers a new nexthop? This is what I asked earlier and you said "The current self-learning will only learn one route"... you seem to be contradicting yourself.

Can you avoid basing the "re-flood" decision on whether it's a consumer node or not? It seems fragile in general. Maybe you can let the consumer app (or library) itself retransmit the Interest, and base the re-flood logic on incoming retransmissions?

Why is it fragile? Is it because it is hard to decide whether the NFD is directly connected to a consumer?

Yes, it's impossible in the general case.

Depending on apps may not be reliable.

You said previously that when the number of retransmissions reaches a threshold, you trigger re-flooding. So you are relying on apps... which contradicts the above statement.

E.g., the current ndncatchunks just terminates on receiving a no-route NACK.

I don't get this argument. If catchunks doesn't behave properly, fix it! instead of adding workarounds to the forwarder.

Actions #21

Updated by Teng Liang about 5 years ago

  • Description updated (diff)

Responding to note-20:

The current self-learning learns only one route per discovery Interest broadcast, even if there are multiple producers. On receiving a NACK, the route is cleared, so the next Interest triggers Interest broadcasting.

We plan to add a more aggressive action to the forwarding strategy: on receiving a NACK, NFD should try alternative paths; if all have been tried, it sends the NACK downstream. An NFD directly connected to apps should trigger Interest broadcasting instead of sending a no-route NACK back. This is not a workaround. I think the forwarding strategy can take a better action (reinitiating discovery) instead of sending a no-route NACK back to apps (which does not help much).

In addition, there are several cases in which a NACK may not be returned, e.g., on a multicast face, or when the producer cannot respond with a NACK. However, another producer in the network may still be able to serve the data, so we intend to add another aggressive trigger: Interest broadcasting after receiving consecutive Interest retransmissions. The benefit is that the forwarding plane can react to unreachable producers faster.

Deciding whether an NFD is directly connected to consumers is challenging with the state NFD currently has. How about we add a direct-sent-from-app tag to the Interest? The directly connected NFD would record this state in the PIT in-record and remove the tag before forwarding the Interest upstream.
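
The NACK handling proposed in this note can be summarized as a small decision function. This is a sketch under assumptions, not the implementation under review: `PitEntryState`, `Action`, and the `app_attached` flag are hypothetical, and `app_attached` simply stands in for whatever mechanism (such as the proposed tag) would tell an NFD that it is directly connected to the app.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set, Tuple


class Action(Enum):
    TRY_NEXT_HOP = "try-next-hop"
    REFLOOD = "reflood"
    NACK_DOWNSTREAM = "nack-downstream"


@dataclass
class PitEntryState:
    next_hops: List[int]                          # face IDs, ordered by cost
    tried: Set[int] = field(default_factory=set)  # faces already used
    app_attached: bool = False                    # hypothetical flag, see lead-in


def on_no_route_nack(entry: PitEntryState) -> Tuple[Action, Optional[int]]:
    """Decide what to do when an upstream returns a no-route NACK."""
    for face_id in entry.next_hops:
        if face_id not in entry.tried:
            entry.tried.add(face_id)
            return (Action.TRY_NEXT_HOP, face_id)
    # Every next hop has been tried.
    if entry.app_attached:
        entry.tried.clear()  # start over with a fresh discovery round
        return (Action.REFLOOD, None)
    return (Action.NACK_DOWNSTREAM, None)
```

Only the node attached to the app re-floods; every other node passes the NACK downstream once its alternatives are exhausted.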

Actions #22

Updated by Davide Pesavento about 5 years ago

  • Related to deleted (Feature #5031: Make NFD work out of the box)
Actions #23

Updated by Davide Pesavento about 5 years ago

  • Parent task set to #5031
Actions #24

Updated by Davide Pesavento almost 5 years ago

  • Target version set to 22.02
Actions #25

Updated by Teng Liang over 4 years ago

  • Description updated (diff)
Actions #26

Updated by Teng Liang over 4 years ago

  • Description updated (diff)

The current face created by self-learning has FacePersistency = persistent. The discussion on May 15 2020 proposes to change it to on-demand. On second thought, persistent Face is better, because on-demand face cannot be created directly by protocol factory, and unicast faces learned in local environment are indeed persistent.

Actions #27

Updated by Davide Pesavento over 4 years ago

Teng Liang wrote:

because on-demand face cannot be created directly by protocol factory

This is a purely implementational concern and should not affect any design decision.

and unicast faces learned in local environment are indeed persistent.

What does this mean? Sounds like a circular argument. Please clarify.

I'm a little concerned about keeping the auto-created faces "forever". There is clearly a potential resource consumption issue here (there is a per-process limit on the number of open file descriptors), and it's easily exploitable.

Actions #28

Updated by Teng Liang over 4 years ago

Davide Pesavento wrote:

Teng Liang wrote:

because on-demand face cannot be created directly by protocol factory

This is a purely implementational concern and should not affect any design decision.

Right, so what was the concern behind avoiding on-demand face creation in the protocol factory?

and unicast faces learned in local environment are indeed persistent.

What does this mean? Sounds like a circular argument. Please clarify.

I'm a little concerned about keeping the auto-created faces "forever". There is clearly a potential resource consumption issue here (there is a per-process limit on the number of open file descriptors), and it's easily exploitable.

The number of unicast faces in a local network is normally limited, and their existence is consistent. But I accept either way.

Actions #29

Updated by Davide Pesavento over 4 years ago

Teng Liang wrote:

Davide Pesavento wrote:

Teng Liang wrote:

because on-demand face cannot be created directly by protocol factory

This is a purely implementational concern and should not affect any design decision.

Right, so what was the concern behind avoiding on-demand face creation in the protocol factory?

No concern. There was simply no use case.

Actions #30

Updated by Teng Liang over 4 years ago

  • Description updated (diff)
Actions #31

Updated by Teng Liang about 4 years ago

  • Subject changed from Improve self-learning forwarding strategy to Self-learning Forwarding Strategy Version 2
  • Description updated (diff)
Actions #32

Updated by Teng Liang about 4 years ago

  • Description updated (diff)
Actions #33

Updated by Davide Pesavento over 3 years ago

  • Target version changed from 22.02 to 22.12
Actions #34

Updated by Davide Pesavento about 2 years ago

  • Target version deleted (22.12)
Actions #35

Updated by Davide Pesavento 11 months ago

  • Tags changed from SelfLearning to self-learning
  • Subject changed from Self-learning Forwarding Strategy Version 2 to Self-learning forwarding strategy v2
  • Status changed from In Progress to New
  • Assignee deleted (Teng Liang)
Actions #36

Updated by Jan Romann 2 months ago

Would you still be interested in adding this feature to the NFD codebase? A colleague of mine and I updated Teng's NFD fork a while ago to the latest upstream state, so we could potentially follow up on the work that has already been done and, for example, incorporate the review comments made to the changes/patch sets pending on Gerrit (such as https://gerrit.named-data.net/c/NFD/+/6230/1).

Actions #37

Updated by Beichuan Zhang 2 months ago

Jan Romann wrote in #note-36:

Would you still be interested in adding this feature to the NFD codebase? A colleague of mine and I updated Teng's NFD fork a while ago to the latest upstream state, so we could potentially follow up on the work that has already been done and, for example, incorporate the review comments made to the changes/patch sets pending on Gerrit (such as https://gerrit.named-data.net/c/NFD/+/6230/1).

Yes, we are interested! The current NFD implements part of self-learning but not the full mechanism due to certain issues. It may be more effective to discuss this over a Zoom meeting if you're interested. We have a weekly meeting on Fridays, 9-11am Pacific Time, and you're welcome to join. We'd like to learn more about how self-learning fits into your research and how your code will address certain complexities in the implementation. But if the meeting time is not good for you, maybe you can follow up on Gerrit to get this going. Thanks!

Actions #38

Updated by Jan Romann about 2 months ago

Beichuan Zhang wrote in #note-37:

Yes, we are interested! The current NFD implements part of self-learning but not the full mechanism due to certain issues. It may be more effective to discuss this over a Zoom meeting if you're interested. We have a weekly meeting on Fridays, 9-11am Pacific Time, and you're welcome to join. We'd like to learn more about how self-learning fits into your research and how your code will address certain complexities in the implementation. But if the meeting time is not good for you, maybe you can follow up on Gerrit to get this going. Thanks!

That is great, thank you very much for your response :) My colleague and I will probably join you in today's call, looking forward to discussing this topic with you!
