Bug #4407


Large queuing delay in stream-based faces

Added by Anonymous almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Faces
Target version: v0.7
Start date: -
Due date: -
% Done: 0%
Estimated time: -

Description

There currently does not seem to be any limit on the queuing delay inside an NFD process.

For example, if I run putchunks and catchunks on my local laptop:

sudo ndnputchunks /test < 66MB_file
ndncatchunks --aimd-rto-min=2000 -v /test > /dev/null

I will indeed get over 2000 ms of queuing delay before a timeout happens. This leads to a huge number of packets needing to be retransmitted (over 30%):

All segments have been received.
Time elapsed: 9801.08 milliseconds
Total # of segments received: 15595
Total size: 68615.4kB
Goodput: 56.006347 Mbit/s
Total # of packet loss events: 2
Packet loss rate: 0.000128246
Total # of retransmitted segments: 5562
Total # of received congestion marks: 0

(There are only 2 "loss events", which means that all retransmissions happened over a period of just 2 RTTs.)

My questions are:

  • Should there be an upper limit on queuing delay inside NFD?

  • Where exactly can I find these queues in the code? Can we access them and use them as part of the congestion signaling mechanism?


Related issues: 3 (1 open, 2 closed)

  • Related to NFD - Feature #1624: Design and Implement Congestion Control (Closed)
  • Related to NFD - Feature #4362: Congestion detection by observing socket queues (Closed, Eric Newberry)
  • Related to NFD - Bug #4499: Unbounded queuing in StreamTransport (New)

Actions #1

Updated by Anonymous almost 7 years ago

  • Related to Feature #1624: Design and Implement Congestion Control added
Actions #2

Updated by Davide Pesavento almost 7 years ago

  • Category set to Faces
  • Start date deleted (12/20/2017)

How can you be sure that this queuing happens inside NFD? (as opposed to the receive socket buffer, for example)

Actions #3

Updated by Anonymous almost 7 years ago

Good question.

I just checked http://linuxinsight.com/proc_sys_net_unix_max_dgram_qlen.html, which I think describes the unix socket buffer size.

On my machine it's 512 packets.

I reduced it to 10 packets, but got the same results.

Also, the math doesn't add up (when using a 4000 ms min RTO, I get up to 4000 ms of delay):

4 s queuing delay * 56.006347 Mbit/s = 224 Mbit = 28 MB = roughly 7000 packets.

I doubt the unix buffer is that large.
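
For reference, the back-of-envelope calculation above, spelled out as a small illustrative program (the ~4 kB packet size is an assumption that roughly matches the 68615.4 kB / 15595 segments reported above):

#include <cstdio>

int
main()
{
  const double delaySec = 4.0;           // observed queuing delay with a 4000 ms min RTO
  const double goodputMbit = 56.006347;  // goodput from the ndncatchunks summary
  const double queuedMbit = delaySec * goodputMbit;  // ~224 Mbit queued
  const double queuedMB = queuedMbit / 8.0;          // ~28 MB
  const double packets = queuedMB * 1e6 / 4000.0;    // ~7000 packets at ~4 kB each
  std::printf("%.0f Mbit = %.0f MB = ~%.0f packets\n", queuedMbit, queuedMB, packets);
  return 0;
}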

Actions #4

Updated by Anonymous almost 7 years ago

  • My number is probably an underestimate, since the goodput (56 Mbit/s) only counts successfully retrieved data. The throughput (including retransmissions) is higher.
Actions #5

Updated by Anonymous almost 7 years ago

Also note that the default UDP socket buffer size in Ubuntu is only 200 kilobytes.

Actions #6

Updated by Anonymous almost 7 years ago

From our email:

On Wed, Dec 20, 2017 at 2:16 AM, Klaus Schneider klaus@cs.arizona.edu wrote:

I got some comments from Davide on this issue during the Hackathon (CC'd).
It seems that NFD has no maximum internal queuing delay after which it
starts dropping packets, and that it may not even respect the Interest
lifetime in this case.

@Davide: Feel free to clarify these statements.

That's true, NFD doesn't perform any kind of AQM.

Note that the only "explicit" queue inside NFD is the send queue in
stream-based transports (i.e. TCP and Unix faces). IIUC, you were also
measuring the length of this queue for your hackathon project: what
numbers were you seeing in the case of local file transfers (catchunks
and putchunks on the same node as nfd)?

(feel free to continue this discussion on redmine #4407 instead of here)

Davide

The highest sendQueue values were around 610 (packets) followed by this error:

size of sendQueu= 611
size of sendQueu= 610
size of sendQueu= 609
size of sendQueu= 608
size of sendQueu= 607
size of sendQueu= 606
size of sendQueu= 606
size of sendQueu= 605
size of sendQueu= 604
size of sendQueu= 603
size of sendQueu= 602
size of sendQueu= 601
size of sendQueu= 600
size of sendQueu= 599
size of sendQueu= 598
size of sendQueu= 597
size of sendQueu= 596
size of sendQueu= 595
size of sendQueu= 594
size of sendQueu= 593
size of sendQueu= 592
size of sendQueu= 591
size of sendQueu= 590
size of sendQueu= 589
size of sendQueu= 588
size of sendQueu= 587
size of sendQueu= 586
1513819147.615661 ERROR: [UnixStreamTransport] [id=264,local=unix:///run/nfd.sock,remote=fd://30] Send or receive operation failed: Broken pipe
1513819147.615704 INFO: [Transport] [id=264,local=unix:///run/nfd.sock,remote=fd://30] setState UP -> FAILED
1513819147.616073 INFO: [Transport] [id=264,local=unix:///run/nfd.sock,remote=fd://30] setState FAILED -> CLOSED
1513819147.617232 INFO: [FaceTable] Removed face id=264 remote=fd://30 local=unix:///run/nfd.sock
1513819152.747982 INFO: [UnixStreamTransport] [id=0,local=unix:///run/nfd.sock,remote=fd://30] Creating transport
1513819152.748016 INFO: [FaceTable] Added face id=265 remote=fd://30 local=unix:///run/nfd.sock

Actions #7

Updated by Anonymous almost 7 years ago

Actually, this error occurs multiple times, even at lower sendQueues:

size of sendQueu= 46
size of sendQueu= 45
size of sendQueu= 44
size of sendQueu= 43
size of sendQueu= 42
size of sendQueu= 41
size of sendQueu= 40
size of sendQueu= 39
size of sendQueu= 38
1513819374.337269 ERROR: [UnixStreamTransport] [id=266,local=unix:///run/nfd.sock,remote=fd://30] Send or receive operation failed: Broken pipe
1513819374.337293 INFO: [Transport] [id=266,local=unix:///run/nfd.sock,remote=fd://30] setState UP -> FAILED
1513819374.337342 INFO: [Transport] [id=266,local=unix:///run/nfd.sock,remote=fd://30] setState FAILED -> CLOSED
1513819374.338398 INFO: [FaceTable] Removed face id=266 remote=fd://30 local=unix:///run/nfd.sock
1513819385.166847 INFO: [UnixStreamTransport] [id=0,local=unix:///run/nfd.sock,remote=fd://30] Creating transport
1513819385.166883 INFO: [FaceTable] Added face id=267 remote=fd://30 local=unix:///run/nfd.sock
size of sendQueu= 1
size of sendQueu= 1

Actions #8

Updated by Davide Pesavento almost 7 years ago

Klaus Schneider wrote:

I just checked http://linuxinsight.com/proc_sys_net_unix_max_dgram_qlen.html which I think is the unix socket buffer size.

On my machine it's 512 packets

That's for datagram sockets. Our Unix face is stream-based (like TCP), so there's no concept of a "packet". You need to look at /proc/sys/net/core/{r,w}mem_{default,max}. All those values are set to 208 KiB on a Linux server I have.

Actions #9

Updated by Anonymous almost 7 years ago

Correction: I think the "UnixStreamTransport ERROR" came from me manually terminating ndncatchunks.

Also, I saw even higher sendQueue sizes:

size of sendQueu= 917
size of sendQueu= 916
size of sendQueu= 915
size of sendQueu= 914
size of sendQueu= 913
size of sendQueu= 912
size of sendQueu= 911
size of sendQueu= 910

910 packets ≈ 3.6 MB = 28.8 Mbit, which at ~56 Mbit/s is roughly 500 ms of queuing delay.

So I think we can use this sendQueue to add a threshold after which NFD adds a congestion mark to outgoing packets.
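
A rough sketch of that threshold idea (this is not NFD's actual implementation; the type and function names below are made up for illustration):

#include <cstddef>
#include <cstdint>

struct SketchPacket
{
  uint64_t congestionMark = 0;  // 0 = no congestion signal
};

// Threshold in packets; 100 is the value experimented with here.
constexpr std::size_t CONGESTION_THRESHOLD = 100;

inline void
markIfCongested(SketchPacket& pkt, std::size_t sendQueueLength)
{
  if (sendQueueLength > CONGESTION_THRESHOLD) {
    pkt.congestionMark = 1;  // consumers like ndncatchunks can then reduce their window
  }
}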

Actions #10

Updated by Anonymous almost 7 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

I just checked http://linuxinsight.com/proc_sys_net_unix_max_dgram_qlen.html which I think is the unix socket buffer size.

On my machine it's 512 packets

That's for datagram sockets. Our Unix face is stream-based (like TCP), so there's no concept of a "packet". You need to look at /proc/sys/net/core/{r,w}mem_{default,max}. All those values are set to 208 KiB on a Linux server I have.

Same for me.

Actions #11

Updated by Davide Pesavento almost 7 years ago

Klaus Schneider wrote:

The highest sendQueue values were around 610 (packets) followed by this error:
[...]

Is this the face towards the consumer or the producer?

Actions #12

Updated by Anonymous almost 7 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

The highest sendQueue values were around 610 (packets) followed by this error:
[...]

Is this the face towards the consumer or the producer?

No idea. Looks like we need a more specific output, which includes the faceId.

Actions #13

Updated by Anonymous almost 7 years ago

My guess is that it's the producer's data packets, since they're larger.

Actions #14

Updated by Davide Pesavento almost 7 years ago

Klaus Schneider wrote:

Davide Pesavento wrote:

Klaus Schneider wrote:

The highest sendQueue values were around 610 (packets) followed by this error:
[...]

Is this the face towards the consumer or the producer?

No idea. Looks like we need a more specific output, which includes the faceId.

Yes, you should've used the NFD_LOG_FACE_* macros instead of std::cout.

Actions #15

Updated by Davide Pesavento almost 7 years ago

Klaus Schneider wrote:

My guess is that it's the producer's data packets, since they're larger.

Yes, it's mainly the face towards ndncatchunks, i.e. the Data packets are queued. I see some queuing in the other direction as well, but much less.

(btw, the --verbose flag of ndncatchunks seems to make a huge difference: without it the transfer takes half the time! writing to stderr is slow...)

Actions #16

Updated by Anonymous almost 7 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

My guess is that it's the producer's data packets, since they're larger.

Yes, it's mainly the face towards ndncatchunks, i.e. the Data packets are queued. I see some queuing in the other direction as well, but much less.

(btw, the --verbose flag of ndncatchunks seems to make a huge difference: without it the transfer takes half the time! writing to stderr is slow...)

Yeah, I think we should have an option to show only the summary, without a line for each individual packet.

In any case, I wonder if the "send queue" is the only source of queuing? Does it fully explain the amount of queuing delay that we see?

Actions #17

Updated by Davide Pesavento almost 7 years ago

Klaus Schneider wrote:

In any case, I wonder if the "send queue" is the only source of queuing? Does it fully explain the amount of queuing delay that we see?

For the case of local Unix faces, I can't think of anything else right now, beyond the socket buffers and StreamTransport::m_sendQueue.

Actions #18

Updated by Anonymous almost 7 years ago

Davide Pesavento wrote:

Klaus Schneider wrote:

In any case, I wonder if the "send queue" is the only source of queuing? Does it fully explain the amount of queuing delay that we see?

For the case of local Unix faces, I can't think of anything else right now, beyond the socket buffers and StreamTransport::m_sendQueue.

There might also be a queue at the producer app?

Actions #19

Updated by Anonymous almost 7 years ago

I've just enabled the congestion marks (threshold sendQueue > 100), with quite astounding results:

Edit: Actually, the congestion is detected via the TIOCOUTQ ioctl (marking when more than 50 KB are still queued in the send buffer).
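
For reference, a minimal, Linux-specific sketch of the TIOCOUTQ query in isolation (this is not NFD code; it only shows the ioctl that reports how many bytes are still sitting in a socket's send buffer):

#include <cstdio>
#include <sys/ioctl.h>   // TIOCOUTQ (Linux)
#include <sys/socket.h>
#include <unistd.h>

int
main()
{
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);  // in NFD this would be the face's connected socket
  if (fd < 0) {
    std::perror("socket");
    return 1;
  }

  int unsentBytes = 0;
  if (ioctl(fd, TIOCOUTQ, &unsentBytes) == 0) {
    // e.g. start marking congestion once unsentBytes exceeds ~50 KB, as described above
    std::printf("bytes still queued in the send buffer: %d\n", unsentBytes);
  }
  else {
    std::perror("ioctl(TIOCOUTQ)");
  }

  close(fd);
  return 0;
}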

Before:

ndncatchunks --aimd-rto-min=2000 --aimd-ignore-cong-marks -v /test > /dev/null

All segments have been received.
Time elapsed: 10576.5 milliseconds
Total # of segments received: 15595
Total size: 68615.4kB
Goodput: 51.900165 Mbit/s
Total # of packet loss events: 2
Packet loss rate: 0.000128246
Total # of retransmitted segments: 5195
Total # of received congestion marks: 15495

After:

ndncatchunks --aimd-rto-min=2000 -v /test > /dev/null

All segments have been received.
Time elapsed: 1087.8 milliseconds
Total # of segments received: 15595
Total size: 68615.4kB
Goodput: 504.619057 Mbit/s
Total # of packet loss events: 0
Packet loss rate: 0
Total # of retransmitted segments: 0
Total # of received congestion marks: 893

Roughly 10x the throughput, no packet loss, and the delay never exceeds 25ms.

Actions #20

Updated by Davide Pesavento almost 7 years ago

Klaus Schneider wrote:

Davide Pesavento wrote:

Klaus Schneider wrote:

In any case, I wonder if the "send queue" is the only source of queuing? Does it fully explain the amount of queuing delay that we see?

For the case of local Unix faces, I can't think of anything else right now, beyond the socket buffers and StreamTransport::m_sendQueue.

There might also be a queue at the producer app?

oh, I'm only considering the forwarder here, given the issue title. I'm totally unfamiliar with the internals of ndn-cxx's Face, but I bet there's a send queue there as well, for the same reason we have to have it in NFD (there cannot be more than one outstanding asynchronous write on the same socket at any given time). The cat/putchunks app logic itself does not do any queuing that I'm aware of.
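
To illustrate why stream-based transports need such a queue at all (only one asynchronous write may be outstanding on a socket at a time), here is a minimal, self-contained sketch of the pattern; it is not NFD's or ndn-cxx's actual code, and the class and callback names are invented:

#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

class SketchStreamSender
{
public:
  // asyncWrite stands in for an asynchronous socket write (e.g. boost::asio::async_write);
  // it must invoke the completion callback exactly once when the write finishes.
  using AsyncWrite = std::function<void(const std::vector<uint8_t>&, std::function<void()>)>;

  explicit
  SketchStreamSender(AsyncWrite asyncWrite)
    : m_asyncWrite(std::move(asyncWrite))
  {
  }

  void
  send(std::vector<uint8_t> packet)
  {
    bool wasIdle = m_sendQueue.empty();
    m_sendQueue.push_back(std::move(packet));  // no upper bound here -> unbounded queuing
    if (wasIdle) {
      startNextWrite();
    }
  }

  std::size_t
  queueLength() const
  {
    return m_sendQueue.size();  // the number printed as "size of sendQueu=" above
  }

private:
  void
  startNextWrite()
  {
    m_asyncWrite(m_sendQueue.front(), [this] {
      // previous write completed: drop the sent packet and keep draining
      m_sendQueue.pop_front();
      if (!m_sendQueue.empty()) {
        startNextWrite();
      }
    });
  }

private:
  AsyncWrite m_asyncWrite;
  std::deque<std::vector<uint8_t>> m_sendQueue;
};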

Actions #21

Updated by Anonymous almost 7 years ago

  • Related to Feature #4362: Congestion detection by observing socket queues added
Actions #22

Updated by Davide Pesavento almost 7 years ago

  • Subject changed from NFD Internal Queuing to Unbounded queuing in StreamTransport

I'm not sure what the goal of this task is. @Klaus, please clarify.

Actions #23

Updated by Anonymous almost 7 years ago

Davide Pesavento wrote:

I'm not sure what the goal of this task is. @Klaus, please clarify.

I think it should be fixed by #4362.

Actions #24

Updated by Davide Pesavento almost 7 years ago

  • Tracker changed from Task to Bug
  • Subject changed from Unbounded queuing in StreamTransport to Large queuing delay in stream-based faces
  • Status changed from New to Closed
  • Target version set to v0.7

Hmm that fixes the symptoms, but the infinite queue is still there. I'll open a separate ticket though, as there are many unrelated comments here.

Actions #25

Updated by Anonymous almost 7 years ago

Davide Pesavento wrote:

Hmm that fixes the symptoms, but the infinite queue is still there. I'll open a separate ticket though, as there are many unrelated comments here.

I don't think the infinite queue itself is the problem. As long as the average delay (i.e., bufferbloat) is kept under control, the infinite queue might be a feature, not a bug.

See the CoDel paper or this recent IETF article: https://www.ietf.org/blog/blind-men-and-elephant/
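
For context, a much-simplified sketch of the CoDel idea (not the full algorithm from the paper): rather than capping the queue length, look at how long packets have been sitting in the queue and start marking once the sojourn time has stayed above a small target for a whole interval.

#include <chrono>
#include <optional>

class SketchCodelMarker
{
public:
  using Clock = std::chrono::steady_clock;

  // Call when a packet leaves the queue, passing how long it waited there;
  // returns true if that packet should carry a congestion mark.
  bool
  shouldMark(Clock::duration sojournTime)
  {
    const auto target = std::chrono::milliseconds(5);      // acceptable standing delay
    const auto interval = std::chrono::milliseconds(100);  // tolerance before marking

    auto now = Clock::now();
    if (sojournTime < target) {
      m_firstAboveTarget.reset();  // delay dropped below the target, reset
      return false;
    }
    if (!m_firstAboveTarget) {
      m_firstAboveTarget = now;    // delay just crossed the target
      return false;
    }
    return now - *m_firstAboveTarget >= interval;  // above target for a full interval
  }

private:
  std::optional<Clock::time_point> m_firstAboveTarget;
};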

Actions #26

Updated by Anonymous almost 7 years ago

  • Related to Bug #4499: Unbounded queuing in StreamTransport added