Bug #4499


Unbounded queuing in StreamTransport

Added by Davide Pesavento about 6 years ago. Updated over 3 years ago.

Status: New
Priority: Low
Assignee: -
Category: Faces
Target version: -
Start date: -
Due date: -
% Done: 0%
Estimated time: -
Description

The send queue in StreamTransport has no upper bound on its size. This can result in increased queuing delay for outbound traffic and unbounded memory growth in the forwarder itself.
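
For concreteness, one possible shape of a fix is a byte-capped, drop-tail send queue. The following is only a hypothetical sketch (the class and member names are made up, this is not actual NFD code), and the discussion below covers the trade-offs of introducing such a cap at all:

```cpp
// Minimal sketch of a byte-capped, drop-tail send queue; illustrative only.
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

class BoundedSendQueue
{
public:
  explicit BoundedSendQueue(std::size_t capacityBytes)
    : m_capacityBytes(capacityBytes)
  {
  }

  // Returns false and drops the packet if the byte cap would be exceeded.
  bool
  push(std::vector<uint8_t> packet)
  {
    if (m_queuedBytes + packet.size() > m_capacityBytes) {
      return false; // drop-tail: the caller can count/log this event
    }
    m_queuedBytes += packet.size();
    m_queue.push(std::move(packet));
    return true;
  }

  std::vector<uint8_t>
  pop()
  {
    std::vector<uint8_t> packet = std::move(m_queue.front());
    m_queue.pop();
    m_queuedBytes -= packet.size();
    return packet;
  }

private:
  std::size_t m_capacityBytes;
  std::size_t m_queuedBytes = 0;
  std::queue<std::vector<uint8_t>> m_queue;
};
```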


Related issues (1 open, 1 closed)

Related to NFD - Bug #4407: Large queuing delay in stream-based faces (Closed)

Related to NFD - Feature #1301: Shrink table size during memory stress (New)
#1

Updated by Anonymous about 6 years ago

  • Related to Bug #4407: Large queuing delay in stream-based faces added
#2

Updated by Anonymous about 6 years ago

Here's my comment from the older thread:

Klaus Schneider wrote:

Davide Pesavento wrote:

Hmm, that fixes the symptoms, but the infinite queue is still there. I'll open a separate ticket though, as there are many unrelated comments here.

I don't think the infinite queue itself is the problem. As long as the average delay (aka Bufferbloat) is controlled, the infinite queue might be a feature, not a bug.

See the CoDel paper or this recent IETF article: https://www.ietf.org/blog/blind-men-and-elephant/

Also consider the downsides of introducing an upper bound on the number of queued packets for StreamTransport: now StreamTransports can suddenly drop packets, which needs to be considered by applications and by the congestion control scheme using them.

#3

Updated by Davide Pesavento about 6 years ago

Klaus Schneider wrote:

I don't think the infinite queue itself is the problem. As long as the average delay (aka Bufferbloat) is controlled, the infinite queue might be a feature, not a bug.

An infinite queue can hardly be considered a feature... I don't know of any systems that have infinite queues.
And they can easily be exploited by malicious actors to cause a DoS.

See the CoDel paper or this recent IETF article: https://www.ietf.org/blog/blind-men-and-elephant/

I (quickly) read the article. I don't see where it is said that infinite queues are a good thing.

Also consider the downsides of introducing an upper bound on the number of queued packets for StreamTransport: now StreamTransports can suddenly drop packets, which needs to be considered by applications and by the congestion control scheme using them.

That can already happen. Transports are best effort. If your application cannot deal with Interests being unanswered, it's already severely broken.

#4

Updated by Anonymous about 6 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

I don't think the infinite queue itself is the problem. As long as the average delay (aka Bufferbloat) is controlled, the infinite queue might be a feature, not a bug.

An infinite queue can hardly be considered a feature... I don't know of any systems that have infinite queues.
And they can easily be exploited by malicious actors to cause a DoS.

Well not literally infinite, but my understanding was that with CoDel the queue capacity can be arbitrarily large, since the average queue size does not depend on the capacity.

See the CoDel paper or this recent IETF article: https://www.ietf.org/blog/blind-men-and-elephant/

I (quickly) read the article. I don't see where it is said that infinite queues are a good thing.

Yeah, I think a better reference would be the CoDel paper https://queue.acm.org/detail.cfm?id=2209336 or https://en.wikipedia.org/wiki/CoDel

From the paper: The queue capacity should be large enough to allow for all the "good queue" (= a temporary increase in queue delay; "shock absorbers to convert bursty arrivals into smooth, steady departures.")

The "bad queue" (= permanent excess queueing delay) is already handled by CoDel. The bad queue is not increased by having a larger queuing capacity. Hence, CoDel (just released last month as an RFC https://www.rfc-editor.org/rfc/rfc8289.txt) does not make a recommendation for a maximum queue capacity (other than "large enough to allow for the 'good queue'").

So in theory, there should not be much of a drawback to leaving it infinite.

I can see the problem of DoS attacks, but I don't see how having a simple queue capacity limit would help that much. Rather than causing excess queueing delay, an attacker would just cause excess packet loss (still achieving the DoS).

Also consider the downsides of introducing an upper bound on the number of queued packets for StreamTransport: now StreamTransports can suddenly drop packets, which needs to be considered by applications and by the congestion control scheme using them.

That can already happen. Transports are best effort. If your application cannot deal with Interests being unanswered, it's already severely broken.

#5

Updated by Davide Pesavento about 6 years ago

Klaus Schneider wrote:

Davide Pesavento wrote:

Klaus Schneider wrote:

I don't think the infinite queue itself is the problem. As long as the average delay (aka Bufferbloat) is controlled, the infinite queue might be a feature, not a bug.

An infinite queue can hardly be considered a feature... I don't know of any systems that have infinite queues.
And they can easily be exploited by malicious actors to cause a DoS.

Well not literally infinite, but my understanding was that with CoDel the queue capacity can be arbitrarily large, since the average queue size does not depend on the capacity.

Not even "arbitrarily large"; there's always a limit. For example, as far as I know, on Linux it's a fraction of the available RAM, and it can be further reduced under memory pressure.

From the paper: The queue capacity should be large enough to allow for all the "good queue" (= a temporary increase in queue delay; "shock absorbers to convert bursty arrivals into smooth, steady departures.")

Well yes, I'm not saying to make the queue so small that it's not even enough for the "good queue". I'm just talking about a fail-safe to avoid exhausting all available memory in case of misbehaving applications (this actually happened to me a couple of years ago while working on chunks; it wasn't malicious, just a programming error).

I can see the problem of DoS attacks, but I don't see how having a simple queue capacity limit would help that much. Rather than causing excess queueing delay, an attacker would just cause excess packet loss (still achieving the DoS).

My main concern is not queueing delay but triggering an out-of-memory condition on the router, which will result in a crash and no packets being forwarded anymore. Of course the DoS needs way more sophisticated countermeasures, but it's hard to apply those countermeasures if the router is dead :)

I guess using the FQ (flow queuing) variant of CoDel could partially help in this situation, so that well-behaved flows are not penalized. The question is how to identify "flows" in NDN.

#6

Updated by Anonymous about 6 years ago

Davide Pesavento wrote:

I guess using the FQ (flow queuing) variant of CoDel could partially help in this situation, so that well-behaved flows are not penalized. The question is how to identify "flows" in NDN.

I thought about this for a little while, and there are basically two options:

  1. A flow is defined by some prefix of the packet (Data or Interest) name. However, this approach doesn't seem to help at all, since an attacker can trivially send out Interests with arbitrary names, thus creating so many flows that FQ-CoDel becomes useless.

  2. A flow is defined as all Interest traffic coming from a specific interface. This seems more useful, since a single attacker cannot simply create multiple flows. However, FQ-CoDel should be applied as close to the attacker as possible (possibly at its access router) to avoid punishing other flows that share the incoming interface (see the sketch below).
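
To make option 2 concrete, a hypothetical sketch of the classification step (FaceId, FlowQueue, and NUM_FLOW_QUEUES are invented names for illustration):

```cpp
// Hypothetical sketch of option 2: classify "flows" by incoming face,
// hashing the face ID into a fixed set of per-flow queues (FQ-CoDel style).
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

using FaceId = uint64_t;
constexpr std::size_t NUM_FLOW_QUEUES = 1024;

struct FlowQueue
{
  std::deque<std::vector<uint8_t>> packets;
  // per-queue CoDel state (firstAboveTime, dropNext, ...) would live here
};

std::array<FlowQueue, NUM_FLOW_QUEUES> flowQueues;

FlowQueue&
classify(FaceId incomingFace)
{
  // A given upstream face always maps to the same queue, so a single
  // misbehaving neighbor cannot spread its backlog across many queues.
  return flowQueues[std::hash<FaceId>{}(incomingFace) % NUM_FLOW_QUEUES];
}
```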

But back to the main topic:

I'm just talking about a fail-safe to avoid exhausting all available memory in case of misbehaving applications

That makes sense to me. So what should the limit be in your opinion? A fraction of the available RAM?

#7

Updated by Davide Pesavento about 6 years ago

  • Priority changed from Normal to Low

Klaus Schneider wrote:

I'm just talking about a fail-safe to avoid exhausting all available memory in case of misbehaving applications

That makes sense to me. So what should the limit be in your opinion? A fraction of the available RAM?

I think so. Unfortunately, there is no portable way to know the total physical RAM on a host, let alone the available (i.e. free) RAM. But I guess a static limit based on total RAM is a good enough first step. On POSIX systems we can use sysconf to that end.
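
For illustration, querying the total physical RAM via sysconf could look like the following. Note that _SC_PHYS_PAGES is a common extension (e.g. on Linux and glibc) rather than a mandatory POSIX constant, and the fraction used for the cap below is an arbitrary placeholder:

```cpp
// Sketch: derive a static send-queue cap from total physical RAM via sysconf().
// _SC_PHYS_PAGES is widely supported but not required by POSIX.
#include <unistd.h>
#include <cstdint>

uint64_t
getTotalPhysicalRam()
{
  long pages = sysconf(_SC_PHYS_PAGES);
  long pageSize = sysconf(_SC_PAGE_SIZE);
  if (pages < 0 || pageSize < 0) {
    return 0; // unsupported on this platform; fall back to a fixed default
  }
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(pageSize);
}

// Example: cap all send queues at 1/64 of total RAM (fraction is arbitrary).
const uint64_t sendQueueCapBytes = getTotalPhysicalRam() / 64;
```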

#8

Updated by Davide Pesavento about 6 years ago

  • Related to Feature #1301: Shrink table size during memory stress added
#9

Updated by Anonymous about 6 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

I'm just talking about a fail-safe to avoid exhausting all available memory in case of misbehaving applications

That makes sense to me. So what should the limit be in your opinion? A fraction of the available RAM?

I think so. Unfortunately, there is no portable way to know the total physical RAM on a host, let alone the available (i.e. free) RAM. But I guess a static limit based on total RAM is a good enough first step. On POSIX systems we can use sysconf to that end.

This still leaves the task of coming up with a good percentage.

Is this limit for each individual StreamTransport (face) or shared for the whole NFD daemon?

#10

Updated by Junxiao Shi about 6 years ago

I think the send queue shouldn't exist at all. The kernel already maintains a queue on the socket. If the socket is blocked (i.e., the kernel queue is full), the outgoing packet should be dropped, just like in any IP router. The queue length can be tuned via kernel parameters, if needed.
The exception to drop-tail is that, if an L2 frame is partially accepted by the socket, the remaining portion can stay in StreamTransport's buffer; otherwise, an incomplete frame would cause a decode error on the other end.
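
A rough sketch of this approach, assuming a non-blocking Boost.Asio TCP socket (the function and variable names are illustrative, not actual NFD code):

```cpp
// Drop-tail at the kernel socket, retaining only the tail of a partially
// written frame so the peer never sees a truncated TLV. Illustrative only;
// the socket is assumed to be in non-blocking mode.
#include <boost/asio.hpp>
#include <cstddef>
#include <cstdint>
#include <vector>

void
sendOrDrop(boost::asio::ip::tcp::socket& socket,
           std::vector<uint8_t>& pendingTail, // unsent remainder of a partial frame
           const std::vector<uint8_t>& frame)
{
  boost::system::error_code ec;

  // Finish flushing a previously retained partial frame first.
  if (!pendingTail.empty()) {
    std::size_t n = socket.write_some(boost::asio::buffer(pendingTail), ec);
    pendingTail.erase(pendingTail.begin(), pendingTail.begin() + n);
    if (!pendingTail.empty()) {
      return; // kernel buffer still full: drop the new frame (drop-tail)
    }
  }

  std::size_t n = socket.write_some(boost::asio::buffer(frame), ec);
  if (n == 0) {
    return; // nothing accepted: whole frame dropped, like an IP router
  }
  if (n < frame.size()) {
    // Frame partially accepted by the kernel: retain the remainder,
    // per the exception described above.
    pendingTail.assign(frame.begin() + n, frame.end());
  }
}
```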

#11

Updated by Davide Pesavento over 3 years ago

  • Tags set to needs-discussion