Feature #4343
open
ASF strategy should isolate timed-out or NACK'd interests into separate namespaces
0%
Description
In the week of October 9, 2017, some problems occurred on the NDN Testbed that revealed some flaws with the ASF strategy.
Currently, if the local NFD instance has a FIB entry for a very broad prefix (in our case, it was /ndn/edu/ucla), then even one misbehaving or malicious prefix anchored there is capable of causing major routing disruptions for all traffic at that prefix. The strategy should be able to defend against things like this, in some way.
I have one suggestion to mitigate this: Whenever some interest provokes a NACK or is timed-out, the strategy should remember which namespace that was, and only change its nexthop selection for that namespace. Moreover, the strategy could choose to isolate a namespace that's an n-component prefix of the offending interest. The amount of components to remove should be configurable.
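A minimal sketch of the suggestion above, not actual NFD code: failures are recorded per isolated namespace, where the namespace is the Interest name with its last few components removed. The names isolation_prefix, IsolatedMeasurements, and the strip parameter are hypothetical, and the example names under /ndn/edu/ucla are made up for illustration.

```python
def isolation_prefix(interest_name: str, strip: int) -> str:
    """Return the Interest name with its last `strip` components removed."""
    if strip < 0:
        raise ValueError("strip must be non-negative")
    components = [c for c in interest_name.split("/") if c]
    kept = components[: max(len(components) - strip, 0)]
    return "/" + "/".join(kept)


class IsolatedMeasurements:
    """Failure counts keyed by (isolated namespace, nexthop). Hypothetical helper."""

    def __init__(self, strip: int):
        self.strip = strip  # configurable number of components to remove
        self.failures = {}

    def record_failure(self, interest_name: str, nexthop: str) -> None:
        key = (isolation_prefix(interest_name, self.strip), nexthop)
        self.failures[key] = self.failures.get(key, 0) + 1

    def failure_count(self, interest_name: str, nexthop: str) -> int:
        key = (isolation_prefix(interest_name, self.strip), nexthop)
        return self.failures.get(key, 0)


m = IsolatedMeasurements(strip=2)
# A timeout or NACK under /ndn/edu/ucla/scripts only affects that namespace.
m.record_failure("/ndn/edu/ucla/scripts/job/seg0", nexthop="face-1")
```

With strip=2, the failure above is charged to /ndn/edu/ucla/scripts on face-1, so a lookup for an Interest such as /ndn/edu/ucla/other/file/seg0 still sees zero failures on that face.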
Updated by Nicholas Gordon about 7 years ago
I have a few thoughts, but these are more rough and un-discussed, so they are living in a note:
There could be a mechanism such that a prefix is considered "non-responsible", that is, interests under this prefix would always be isolated into their own namespaces, so that other, unknown traffic is considered "innocent", and nexthop choices are only affected for offending interests. This could be used in supplement to or instead of the suggestion in the issue description.
For example, /ndn/edu/memphis/nmgordon could be considered non-responsible, so that failures or NACKs from /ndn/edu/memphis/nmgordon/some-experiment don't cause a change in nexthop selection behavior for /ndn/edu/memphis/nmgordon/home-automation, or any other prefixes I may advertise.
I'm not sure if failures at /ndn/edu/memphis should cause selection changes for the nmgordon sub-prefix, or if it should be considered separately. The main issue I see with something like this is that the strategy begins to subsume the function of the FIB, where it would need to keep a detailed and complex set of relationships for each FIB entry, maintaining a sort of "meta-FIB".
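One way the "non-responsible" idea could look, as a rough sketch only: if an Interest falls under a non-responsible prefix, its measurements are keyed by that prefix plus the next name component; otherwise they stay with the FIB prefix. The function names and the "one extra component" rule are assumptions made for illustration, not something settled in this note.

```python
def split_name(name: str) -> tuple:
    """Split a URI-style name into its components."""
    return tuple(c for c in name.split("/") if c)


def measurement_namespace(interest_name: str, fib_prefix: str, non_responsible) -> str:
    """Pick the namespace whose measurements a failure should affect.

    Assumption: under a non-responsible prefix, each direct child
    (prefix + one component) gets its own measurements.
    """
    name = split_name(interest_name)
    best = None
    for prefix in non_responsible:
        p = split_name(prefix)
        # Match only Interests strictly below the non-responsible prefix.
        if len(name) > len(p) and name[: len(p)] == p:
            if best is None or len(p) > len(best):
                best = p  # prefer the longest matching prefix
    if best is None:
        return fib_prefix
    return "/" + "/".join(name[: len(best) + 1])


non_responsible = {"/ndn/edu/memphis/nmgordon"}
experiment = measurement_namespace(
    "/ndn/edu/memphis/nmgordon/some-experiment/data/1",
    "/ndn/edu/memphis", non_responsible)
automation = measurement_namespace(
    "/ndn/edu/memphis/nmgordon/home-automation/lamp",
    "/ndn/edu/memphis", non_responsible)
```

Here the two Interests map to different namespaces, so a failure in one does not touch the other. The extra table of non-responsible prefixes is exactly the kind of "meta-FIB" state mentioned above.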
Updated by Nicholas Gordon about 7 years ago
Additionally, I should note that it seems clear to me that if a topology has a generally high degree, failures of this sort are less damaging, because in most cases the performance along alternate routes will be similar, but slightly worse. However, the concerned nodes on the testbed have comparatively low degree, and the difference in costs for those nexthops is very high, as some cross the Pacific Ocean. Possibly a strategy should be configured with a "preference" to use seemingly-unreliable links, because the alternatives have very high latency.
I understand that the SRTT mechanism is supposed to fill this role, but it evidently needs some kind of tuning.
Updated by John DeHart about 7 years ago
I also wonder if there is perhaps a different NACK that is needed. If I remember correctly,
what was happening was the Interest with a prefix of /ndn/edu/ucla/scripts/ was getting
to the UCLA node but the daemon that should have handled that had died and the FIB entry
that that daemon had registered had gone away. So, the routing and forwarding actually
worked to the extent that the Interest reached the node it was supposed to. But, again
if I remember correctly, since the Interest was within the network region for that node
it did not get a NACK NoRoute, it just got Rejected. The previous nodes just saw a
timeout for that set of Interests. Seems like what ASF needs is for that type of Interest to
get a NACK NoLocalRoute or something like that that indicates that we had reached the
correct place in the network but there was just no local server for it. Then ASF could
ignore Interests that get that NACK and not make any changes.
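As a sketch of how this could be expressed, assuming such a NACK existed: the strategy classifies each outcome and leaves nexthop measurements alone when the reason says the Interest reached the right node. NoLocalRoute is the hypothetical reason proposed above and does not exist in NDN today; the Outcome enum and should_penalize_nexthop are illustrative names, and treating every other non-Data outcome as a failure is a simplification.

```python
from enum import Enum


class Outcome(Enum):
    DATA = "data"
    TIMEOUT = "timeout"
    NACK_CONGESTION = "nack-congestion"
    NACK_DUPLICATE = "nack-duplicate"
    NACK_NO_ROUTE = "nack-no-route"
    NACK_NO_LOCAL_ROUTE = "nack-no-local-route"  # hypothetical, proposed in this comment


# Outcomes that say nothing bad about the path that was taken.
IGNORED_FOR_ROUTING = {Outcome.DATA, Outcome.NACK_NO_LOCAL_ROUTE}


def should_penalize_nexthop(outcome: Outcome) -> bool:
    """Return True if the outcome should count against the chosen nexthop."""
    return outcome not in IGNORED_FOR_ROUTING
```

Without the new reason, upstream nodes only observe a timeout, which this sketch counts against the nexthop; with it, the same event would leave the measurements unchanged.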
I'll go back and review the log files and see if I can verify that that was really what I saw.
Updated by Anonymous about 7 years ago
One question: is the FIB prefix /ndn/edu/ucla split up at some point into finer-grained prefixes?
Whenever some interest provokes a NACK or is timed-out, the strategy should remember which namespace that was, and only change its nexthop selection for that namespace.
How do you know the granularity of the offending namespace? (i.e. the prefix length)
You can make the guess "n components broader than offending Interest", but then you have to make a trade-off between too much routing state (n too small) and isolating a too large prefix (n too large).
Some suggestions:
- Don't use timeouts as indication of a routing problem
- Use NACKs but consider the specific NACK type (as you said in the other issue)
- Require a signature for the NACKs to prevent/reduce malicious use
Timeouts can be caused by either a problem inside the network (link failure, router failure) or a problem at the end-point application (app not answering). Thus, unless your messages are specifically addressing routers (like the different OSPF message types), I wouldn't use timeouts to influence routing decisions.
Updated by Davide Pesavento over 5 years ago
- Tracker changed from Task to Feature
- Subject changed from ASF Strategy should isolate timed-out or NACK'd interests into separate namespaces. to ASF Strategy should isolate timed-out or NACK'd interests into separate namespaces
Updated by Davide Pesavento over 1 year ago
- Tags set to ASF
- Subject changed from ASF Strategy should isolate timed-out or NACK'd interests into separate namespaces to ASF strategy should isolate timed-out or NACK'd interests into separate namespaces
Updated by Junxiao Shi over 1 year ago
if the local NFD instance has a FIB entry for a very broad prefix (in our case, it was /ndn/edu/ucla), then even one misbehaving or malicious prefix anchored there is capable of causing major routing disruptions for all traffic at that prefix.
It doesn't take a misbehaving or malicious prefix to cause major routing disruptions.
As many have observed, yoursunny ndn6 network peers with the global NDN testbed in multiple locations around the world.
In early 2022, I was announcing a single prefix, /yoursunny, to the testbed.
Consequently, ASF strategy maintains a single set of measurements for all names under that prefix.
If I start ndnping /yoursunny/_/nrt on several end hosts each single-homed to a testbed router, testbed routers will gradually optimize their paths toward Japan, applied to the prefix /yoursunny.
When I add ndnping /yoursunny/_/mia, I would observe traffic going toward Japan, even though the latter prefix is in the USA.
Neither prefix is misbehaving or malicious, but all but one prefix would suffer from suboptimal forwarding paths.
I have one suggestion to mitigate this: Whenever some interest provokes a NACK or is timed-out, the strategy should remember which namespace that was, and only change its nexthop selection for that namespace. Moreover, the strategy could choose to isolate a namespace that's an n-component prefix of the offending interest. The amount of components to remove should be configurable.
This is the classical name granularity problem.
n-component prefix isn't going to work well, because the configuration parameter has to be tailored to each namespace.
While I can describe every name pattern in my network, the patterns may change over time, which means I have to repeatedly request NDNOPS to update their configs.
Instead, ASF strategy should adopt the self-tuning solution described in:
Teng Liang, Junxiao Shi, and Beichuan Zhang. 2020. On the Prefix Granularity Problem in NDN Adaptive Forwarding. In Proceedings of the 7th ACM Conference on Information-Centric Networking (ICN '20). Association for Computing Machinery, New York, NY, USA, 41–51. https://doi.org/10.1145/3405656.3418712
There's also an issue with ForwardingHint.
Suppose consumers are trying to fetch different content from yoursunny ndn6 network, using the same ForwardingHint=/yoursunny.
In this case, the strategy should keep separate measurements for "ForwardingHint + a prefix of Interest name", instead of just the ForwardingHint or just the name.
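A minimal sketch of such a combined key, under assumptions: the MeasurementKey type, the measurement_key function, and the fixed prefix_len parameter are hypothetical, and the /example/... names are made up. How long the name prefix should be is the granularity problem discussed above, so a fixed length is used here only to keep the example small.

```python
from typing import NamedTuple, Optional


class MeasurementKey(NamedTuple):
    forwarding_hint: Optional[str]  # None when the Interest carries no hint
    name_prefix: str


def measurement_key(interest_name: str,
                    forwarding_hint: Optional[str] = None,
                    prefix_len: int = 2) -> MeasurementKey:
    """Key measurements by ForwardingHint plus a prefix of the Interest name."""
    components = [c for c in interest_name.split("/") if c]
    prefix = "/" + "/".join(components[:prefix_len])
    return MeasurementKey(forwarding_hint, prefix)


# Same ForwardingHint, different content: the keys differ, so measurements stay separate.
k1 = measurement_key("/example/video/clip1/seg0", forwarding_hint="/yoursunny")
k2 = measurement_key("/example/files/report/seg0", forwarding_hint="/yoursunny")
```

Keying on the hint alone would merge k1 and k2 into one set of measurements, which is the same problem described for the single /yoursunny prefix.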