Feature #4362: Congestion detection by observing socket queues
Status: Closed
Parent task: Feature #1624: Design and Implement Congestion Control
% Done: 100%
Description
Routers should be able to detect congestion by measuring their queue (either backlog size, or queuing delay).
This congestion measurement should then be used to signal consumers by putting congestion marks into packets (#3797).
In our Hackathon project we figured out that we can use the ioctl command TIOCOUTQ to retrieve the buffer backlog size of all three socket types: TCP, UDP, and Unix sockets.
The current design includes:
- Check whether the buffer is above THRESHOLD = MAX_BUF_SIZE / 3 (defaults to 200KB / 3).
- Use some default INTERVAL as a rough estimate of the RTT (default: 100ms).
- If the buffer stays above THRESHOLD for at least one INTERVAL -> insert the first congestion mark into a packet.
- As long as the buffer stays above THRESHOLD: mark the following packets at a decreasing interval, as described in the CoDel paper (start at INTERVAL, then shrink it in proportion to 1/sqrt(drops)); see the sketch below.
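For illustration, here is a minimal C++ sketch of the marking logic described above. All names (CongestionMarker, THRESHOLD, INTERVAL, shouldMark) are hypothetical and do not correspond to the NFD code; it simply mirrors the four steps listed.

#include <chrono>
#include <cmath>
#include <cstddef>
#include <optional>

class CongestionMarker
{
public:
  // Called for every outgoing packet with the current send queue length (bytes).
  // Returns true if this packet should carry a congestion mark.
  bool
  shouldMark(std::size_t queueLength, std::chrono::steady_clock::time_point now)
  {
    if (queueLength <= THRESHOLD) {
      m_firstTimeAboveLimit.reset(); // queue drained: leave the congested state
      m_marks = 0;
      return false;
    }
    if (!m_firstTimeAboveLimit) {
      m_firstTimeAboveLimit = now;   // start timing the congestion episode
      return false;
    }
    if (now < *m_firstTimeAboveLimit + INTERVAL) {
      return false;                  // must stay above THRESHOLD for one INTERVAL first
    }
    if (m_marks == 0 || now >= m_nextMarkTime) {
      ++m_marks;
      // CoDel-style schedule: next mark after INTERVAL / sqrt(number of marks)
      m_nextMarkTime = now + std::chrono::duration_cast<std::chrono::steady_clock::duration>(
                               INTERVAL / std::sqrt(m_marks));
      return true;
    }
    return false;
  }

private:
  static constexpr std::size_t THRESHOLD = 200 * 1024 / 3;  // MAX_BUF_SIZE / 3
  static constexpr std::chrono::milliseconds INTERVAL{100}; // rough RTT estimate

  std::optional<std::chrono::steady_clock::time_point> m_firstTimeAboveLimit;
  std::chrono::steady_clock::time_point m_nextMarkTime;
  std::size_t m_marks = 0;
};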
Files
Updated by Anonymous about 7 years ago
- Related to Feature #1624: Design and Implement Congestion Control added
Updated by Anonymous about 7 years ago
Can someone make this a subtask of #1624?
I can't figure out how to do that.
Updated by Davide Pesavento about 7 years ago
- Tracker changed from Task to Feature
- Subject changed from Congestion Detection by observing router queues. to Congestion Detection by observing router queues
Updated by Anonymous almost 7 years ago
We made some progress on this task at the 5th NDN Hackathon: https://github.com/5th-ndn-hackathon/congestion-control
I attached our slides.
Updated by Anonymous almost 7 years ago
- Description updated (diff)
I've updated the description with our current version of the design.
It is very simple and can later be replaced by a more sophisticated AQM scheme like CoDel.
Updated by Anonymous almost 7 years ago
- Related to Bug #4407: Large queuing delay in stream-based faces added
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
In our Hackathon project we figured out, that we can use the ioctl command TIOCOUTQ to receive the buffer backlog size of all three socket types: TCP, UDP, and unix sockets.
When is the ioctl called?
Updated by Davide Pesavento almost 7 years ago
- Subject changed from Congestion Detection by observing router queues to Congestion detection by observing socket queues
- Category changed from Forwarding to Faces
- Target version set to v0.7
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
In our Hackathon project we figured out, that we can use the ioctl command TIOCOUTQ to receive the buffer backlog size of all three socket types: TCP, UDP, and unix sockets.
When is the ioctl called?
The code is here: https://github.com/5th-ndn-hackathon/congestion-control/blob/master/daemon/face/stream-transport.hpp
doSend(Transport::Packet&& packet) calls checkCongestionLevel(packet);
The question of where it should be called is still up for debate.
Updated by Davide Pesavento almost 7 years ago
I don't disagree in principle with placing it there. But an ioctl is a system call. What's the performance impact of adding an extra system call for each sent packet?
Updated by Anonymous almost 7 years ago
No idea. Do you think the impact is significant compared to all the other NFD overhead?
Updated by Davide Pesavento almost 7 years ago
I did a very rough test: that ioctl seems to take less than 1 µs on average, so no need to worry about it.
Updated by Anonymous almost 7 years ago
- Description updated (diff)
I added the part about the marking interval to the design. It shouldn't be too complicated: you only need to keep a timestamp "firstTimeAboveLimit" and a counter of drops since the queue went above the limit.
See also https://github.com/schneiderklaus/ns-3-dev/blob/ndnSIM-v2/src/internet/model/codel-queue2.cc
Updated by Eric Newberry almost 7 years ago
- Status changed from New to In Progress
Updated by Eric Newberry almost 7 years ago
- Status changed from In Progress to Code review
- % Done changed from 10 to 70
Updated by Eric Newberry almost 7 years ago
For reliable transports (like TCP), do we want the queue length to be the total number of bytes not yet sent plus the number of unacked bytes or do we just want to use the total number of unsent bytes? I know how to do the first on both Linux and macOS (the mechanism is different) but I am unsure if the second is possible on macOS.
Updated by Anonymous almost 7 years ago
I think we want only the number of packets not sent.
The number of unacked packets (= inflight packets) depends on the propagation delay, which we don't want to mix with the queuing delay.
For example, if we set a threshold equivalent to 5ms of total delay (not sent + not acked), then tunnels with 10ms propagation delay would always cause congestion marks.
Thus, it's clear that we want the threshold to only apply to queuing delay (not sent).
Also, isn't it easier to implement "not sent" for both? According to https://android.googlesource.com/kernel/msm.git/+/android-6.0.1_r0.21/include/linux/sockios.h
- TIOCOUTQ output queue size (not sent + not acked) is Linux-specific.
- SIOCOUTQNSD output queue size (not sent only) is platform independent.
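For reference, a minimal Linux-only sketch of how these two queries could be issued on a TCP socket (illustrative names, no error handling):

#include <sys/ioctl.h>
#include <linux/sockios.h> // defines SIOCOUTQ (== TIOCOUTQ) and SIOCOUTQNSD

// Query both flavours of the TCP send queue size, in bytes.
inline void
queryTcpSendQueue(int sockfd, int& notSentPlusNotAcked, int& notSentOnly)
{
  ioctl(sockfd, SIOCOUTQ, &notSentPlusNotAcked); // not sent + not acked
  ioctl(sockfd, SIOCOUTQNSD, &notSentOnly);      // not sent only
}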
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
Also, isn't it easier to implement "not sent" for both? According to https://android.googlesource.com/kernel/msm.git/+/android-6.0.1_r0.21/include/linux/sockios.h
- TIOCOUTQ output queue size (not sent + not acked) is Linux-specific.
- SIOCOUTQNSD output queue size (not sent only) is platform independent.
Neither is platform independent. Where did you read that SIOCOUTQNSD is?
TIOCOUTQ is usable only on TTYs on macOS, but there's a viable alternative: the SO_NWRITE socket option. There is no BSD/macOS counterpart to SIOCOUTQNSD as far as I know.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
Also, isn't it easier to implement "not sent" for both? According to https://android.googlesource.com/kernel/msm.git/+/android-6.0.1_r0.21/include/linux/sockios.h
- TIOCOUTQ output queue size (not sent + not acked) is Linux-specific.
- SIOCOUTQNSD output queue size (not sent only) is platform independent.
Neither is platform independent. Where did you read that SIOCOUTQNSD is?
Well, TIOCOUTQ was listed under "Linux-specific", so I assumed the rest would be platform-independent.
But you're right, SIOCOUTQNSD isn't platform independent either.
TIOCOUTQ is usable only on TTYs on macOS, but there's a viable alternative: the SO_NWRITE socket option. There is no BSD/macOS counterpart to SIOCOUTQNSD as far as I know.
Yes, "SO_NWRITE" sounds like what we are looking for.
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:
Also, isn't it easier to implement "not sent" for both? According to https://android.googlesource.com/kernel/msm.git/+/android-6.0.1_r0.21/include/linux/sockios.h
- TIOCOUTQ output queue size (not sent + not acked) is Linux-specific.
- SIOCOUTQNSD output queue size (not sent only) is platform independent.
Neither is platform independent. Where did you read that SIOCOUTQNSD is?
Well, TIOCOUTQ was listed under "Linux-specific", so I assumed the rest would be platform-independent.
But you're right, SIOCOUTQNSD isn't platform independent either.
TIOCOUTQ is usable only on TTYs on macOS, but there's a viable alternative: the SO_NWRITE socket option. There is no BSD/macOS counterpart to SIOCOUTQNSD as far as I know.
Yes, "SO_NWRITE" sounds like what we are looking for.
Linux actually doesn't support SO_NWRITE, so we have separate implementations for Linux and macOS: Linux uses SIOCOUTQ via ioctl and macOS uses SO_NWRITE via getsockopt.
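For illustration, a rough standalone sketch of those two platform-specific mechanisms (not the NFD code; the helper name is made up and errors are reduced to returning -1):

#include <sys/ioctl.h>
#include <sys/socket.h>
#ifdef __linux__
#include <linux/sockios.h> // SIOCOUTQ
#endif

// Returns the number of bytes queued in the socket's send buffer, or -1 on error.
inline long
getSocketSendQueueBytes(int sockfd)
{
#if defined(__linux__)
  int len = 0;
  if (ioctl(sockfd, SIOCOUTQ, &len) < 0)
    return -1;
  return len;
#elif defined(__APPLE__)
  int len = 0;
  socklen_t optlen = sizeof(len);
  if (getsockopt(sockfd, SOL_SOCKET, SO_NWRITE, &len, &optlen) < 0)
    return -1;
  return len;
#else
  return -1; // unsupported platform
#endif
}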
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
Yes, "SO_NWRITE" sounds like what we are looking for.
SO_NWRITE is the # of unsent + unacked bytes. So, not good if we want unsent only per note-23.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
Yes, "SO_NWRITE" sounds like what we are looking for.
SO_NWRITE is the # of unsent + unacked bytes. So, not good if we want unsent only per note-23.
Alright, I got it.
Is there a way to count only the unacked bytes? (let's call it X)
Then subtract SO_NWRITE - X?
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:
Yes, "SO_NWRITE" sounds like what we are looking for.
SO_NWRITE is the # of unsent + unacked bytes. So, not good if we want unsent only per note-23.
Alright, I got it.
Is there a way to count only the unacked bytes? (let's call it X)
Then subtract SO_NWRITE - X?
Not sure if there is, but I would worry about the values changing in the time between the two calls used to obtain them.
Updated by Anonymous almost 7 years ago
Well, one fall-back option would be to just use m_sendQueue for macOS + TCP.
I think this solution is fine, since TCP tunnels are less important than UDP tunnels anyway.
Updated by Anonymous almost 7 years ago
I wonder whether SO_NWRITE will work as desired in macOS + Unix Domain socket?
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
Is there a way to count only the unacked bytes? (let's call it X)
Then subtract SO_NWRITE - X?
In theory, there's the TCP_INFO socket option. But:
- it's marked as PRIVATE, which means it's not exposed to user-space, but I think we can ignore that.
- it's also completely undocumented (I had to look at the xnu kernel source).
- the struct tcp_info it returns is huge; copying all of it to user-space for every packet sent is much more expensive than SO_NWRITE. But since we're not marking every packet anyway, maybe we can call getSendQueueLength only when we would mark the packet currently being sent (if the buffer was congested). Would that work?
The advantage of this approach is that struct tcp_info contains both fields:
u_int32_t tcpi_snd_sbbytes; /* bytes in snd buffer including data inflight */
...
u_int64_t tcpi_txunacked __attribute__((aligned(8))); /* current number of bytes not acknowledged */
so there shouldn't be any race conditions/inconsistent values.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
Is there a way to count only the unacked bytes? (let's call it X)
Then subtract SO_NWRITE - X?
In theory, there's the TCP_INFO socket option. But:
- it's marked as PRIVATE, which means it's not exposed to user-space, but I think we can ignore that.
- it's also completely undocumented (I had to look at the xnu kernel source).
- the struct tcp_info it returns is huge; copying all of it to user-space for every packet sent is much more expensive than SO_NWRITE. But since we're not marking every packet anyway, maybe we can call getSendQueueLength only when we would mark the packet currently being sent (if the buffer was congested). Would that work?
Well that would work after you entered the congested state (where the queue is above target). But how often would you check the queue size before the first packet is marked?
Also, even in the congested state, your solution would miss if the queue runs empty before the next packet is expected to be marked. This is fine in our current code, but would not allow us to extend it to the proper CoDel, where there is a minimum interval (100ms) that the queue has to be above target (5ms) to enter the congested state.
I think since TCP tunnels are not that important, we can use the simpler solution of just using m_sendQueue.
The advantage of this approach is that struct tcp_info contains both fields:
u_int32_t tcpi_snd_sbbytes; /* bytes in snd buffer including data inflight */
...
u_int64_t tcpi_txunacked __attribute__((aligned(8))); /* current number of bytes not acknowledged */
Why is the not-acked number a uint64 while snd_sbbytes is uint32? Shouldn't snd_sbbytes always be the same or larger?
so there shouldn't be any race conditions/inconsistent values.
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
I think since TCP tunnels are not that important, we can use the simpler solution of just using m_sendQueue.
Fair enough.
u_int32_t tcpi_snd_sbbytes; /* bytes in snd buffer including data inflight */
...
u_int64_t tcpi_txunacked __attribute__((aligned(8))); /* current number of bytes not acknowledged */
Why is the not-acked number a uint64 while snd_sbbytes is uint32? Shouldn't snd_sbbytes always be the same or larger?
I had the same question, no idea. A 32-bit quantity would be more than enough for both values, I guess the second one was extended due to alignment requirements...
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
I wonder whether SO_NWRITE will work as desired in macOS + Unix Domain socket?
I believe it should. There are no ACKs in Unix sockets, data is written directly to the memory buffer from which the peer is reading. So my guess is that SO_NWRITE simply returns the # of bytes in that buffer.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
I wonder whether SO_NWRITE will work as desired in macOS + Unix Domain socket?
I believe it should. There are no ACKs in Unix sockets, data is written directly to the memory buffer from which the peer is reading. So my guess is that SO_NWRITE simply returns the # of bytes in that buffer.
Sounds good. Let's do that.
So in summary we use:
Linux
- UDP: TIOCOUTQ
- TCP: SIOCOUTQNSD + m_sendQueue
- Unix: SIOCOUTQNSD + m_sendQueue
macOS
- UDP: SO_NWRITE
- TCP: m_sendQueue
- Unix: SO_NWRITE + m_sendQueue
Is that correct?
Updated by Davide Pesavento almost 7 years ago
Mostly correct. SIOCOUTQNSD makes sense for TCP sockets only. For Unix sockets, we must use SIOCOUTQ (+ m_sendQueue).
Updated by Eric Newberry almost 7 years ago
Since WebSocket runs over TCP, should we also attempt to detect congestion in WebSocketTransport? I found out how to get the underlying Asio socket through websocketpp. However, WebSocketTransport doesn't contain a send queue (like TcpTransport does), so macOS would not be able to detect congestion using the mechanisms outlined above.
Updated by Junxiao Shi almost 7 years ago
It makes no sense to detect congestion on Unix streams and WebSockets, which exclusively connect to end applications.
Unix streams are local IPC channels. If an app can’t read from a socket fast enough, it can detect the situation by itself (there are almost no idle cycles between select calls), and a congestion mark does not help it more.
WebSockets are proxied through nginx in a typical deployment, and congestion happens between nginx and browser, not between NFD and nginx. Therefore, detecting congestion on NFD side won’t work. Also, NDN-JS does not recognize congestion marks (and I don’t see an open issue requesting that) so this won’t help applications.
Updated by Davide Pesavento almost 7 years ago
Junxiao Shi wrote:
Unix streams are local IPC channels. If an app can’t read from a socket fast enough, it can detect the situation by itself (there are almost no idle cycles between select calls),
That condition doesn't imply a congested state by itself. And in any case, why would you want to do this in applications? Let's not make writing apps harder than it already is.
and a congestion mark does not help it more.
#4407 seems to prove you wrong. Marking packets is helpful to signal the consumer to slow down.
WebSockets are proxied through nginx in a typical deployment, and congestion happens between nginx and browser, not between NFD and nginx.
[citation needed]
Also, NDN-JS does not recognize congestion marks (and I don’t see an open issue requesting that) so this won’t help applications.
That's a completely orthogonal issue. Libraries will add support over time, as with any other new feature.
Updated by Junxiao Shi almost 7 years ago
WebSockets are proxied through nginx in a typical deployment, and congestion happens between nginx and browser, not between NFD and nginx.
[citation needed]
As described in https://lists.named-data.net/mailman/private/operators/2016-June/001062.html, modern browsers cannot use non-TLS WebSockets except in very limited circumstances, so the testbed uses nginx as a proxy.
The connection between NFD and nginx is local so it cannot be congested.
Updated by Eric Newberry almost 7 years ago
For transports that add the size of m_sendQueue onto the send queue length obtained from the socket, if there is an error obtaining the queue length from the socket, do we want to just return the size of m_sendQueue? Or do we want to return an error code (QUEUE_ERROR)?
Updated by Davide Pesavento almost 7 years ago
Eric Newberry wrote:
For transports that add the size of m_sendQueue onto the send queue length obtained from the socket, if there is an error obtaining the queue length from the socket, do we want to just return the size of m_sendQueue? Or do we want to return an error code (QUEUE_ERROR)?
IMO, return at least the size of m_sendQueue, which is always available in a stream transport. Some info is better than none.
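A trivial sketch of that policy (names are illustrative: socketQueueBytes would come from the platform-specific query, internalQueueBytes from m_sendQueue):

// If the socket query failed (negative), report at least the transport-internal part.
inline long
combinedSendQueueLength(long socketQueueBytes, long internalQueueBytes)
{
  if (socketQueueBytes < 0) {
    return internalQueueBytes; // some info is better than none
  }
  return socketQueueBytes + internalQueueBytes;
}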
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Eric Newberry wrote:
For transports that add the size of m_sendQueue onto the send queue length obtained from the socket, if there is an error obtaining the queue length from the socket, do we want to just return the size of m_sendQueue? Or do we want to return an error code (QUEUE_ERROR)?
IMO, return at least the size of m_sendQueue, which is always available in a stream transport. Some info is better than none.
Agreed. I don't think returning an error code would be very useful.
I also agree fully with Davide's note-40.
Regarding Websockets: Do you have any concrete design for MacOS? If not, I think it's fine to leave it out (+ document it).
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Davide Pesavento wrote:
Eric Newberry wrote:
For transports that add the size of m_sendQueue onto the send queue length obtained from the socket, if there is an error obtaining the queue length from the socket, do we want to just return the size of m_sendQueue? Or do we want to return an error code (QUEUE_ERROR)?
IMO, return at least the size of m_sendQueue, which is always available in a stream transport. Some info is better than none.
Agreed. I don't think returning an error code would be very useful.
I also agree fully with Davide's note-40.
Regarding Websockets: Do you have any concrete design for MacOS? If not, I think it's fine to leave it out (+ document it).
Patchset 14 should implement everything according to note 40.
I believe WebSocket would use the same information as TCP, without m_sendQueue, meaning that it would have absolutely no send queue length information on macOS.
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
Regarding Websockets: Do you have any concrete design for MacOS? If not, I think it's fine to leave it out (+ document it).
It would be exactly the same as TCP since the WebSocket protocol is layered over a TCP connection. But let's leave WebSocketTransport support out for now, we can add it later.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
Regarding Websockets: Do you have any concrete design for MacOS? If not, I think it's fine to leave it out (+ document it).
It would be exactly the same as TCP since the WebSocket protocol is layered over a TCP connection. But let's leave WebSocketTransport support out for now, we can add it later.
Sounds good.
Updated by Eric Newberry almost 7 years ago
Do we plan to attempt an implementation of queue length retrieval for Ethernet faces in the future?
Also, we should create another issue to allow congestion marking to be enabled and disabled from NFD management. This would include design work.
Updated by Davide Pesavento almost 7 years ago
Eric Newberry wrote:
Do we plan to attempt an implementation of queue length retrieval for Ethernet faces in the future?
I hope so, but does that impact change 4411?
Updated by Eric Newberry almost 7 years ago
Davide Pesavento wrote:
Eric Newberry wrote:
Do we plan to attempt an implementation of queue length retrieval for Ethernet faces in the future?
I hope so, but does that impact change 4411?
No, but it definitely relates to this issue, since the description mentions Ethernet links.
Updated by Davide Pesavento almost 7 years ago
Eric Newberry wrote:
No, but it definitely relates to this issue, since the description mentions Ethernet links.
"Ethernet links" doesn't imply ethernet faces are being used. Although I'm not sure what "When NFD is running natively" means TBH.
Let's keep this issue focused on socket-based transports only, like the title and the rest of the description say. Ethernet transports differ substantially from the rest, and the detection mechanism(s) will probably be very different, assuming it's even feasible to do it. Use a separate issue for them.
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
Davide Pesavento wrote:
Eric Newberry wrote:
Do we plan to attempt an implementation of queue length retrieval for Ethernet faces in the future?
I hope so, but does that impact change 4411?
No, but it definitely relates to this issue, since the description mentions Ethernet links.
Looks like the description needs an update. Let's exclude Ethernet links from this issue.
Also, we should create another issue to allow congestion marking to be enabled and disabled from NFD management. This would include design work.
Which design work are you thinking about?
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Also, we should create another issue to allow congestion marking to be enabled and disabled from NFD management. This would include design work.
Which design work are you thinking about?
Just a design of how it would be enabled from management (probably a bit in the Flags and Mask fields), plus adding any other necessary options. It should be pretty straightforward, with nothing too complex.
Updated by Eric Newberry almost 7 years ago
Does "drops" in the description refer to the number of packets marked in the incident of congestion or the total number of packets processed in the incident of congestion?
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
Does "drops" in the description refer to the number of packets marked in the incident of congestion or the total number of packets processed in the incident of congestion?
"drops" (better called "marks") means the number of packets marked after entering the marking state (i.e. the queue being above target).
Once the queue goes below target, drops is reset to 0.
In general, any reference to "drops" in the CoDel RFC will translate to "marked packets" in our design.
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Eric Newberry wrote:
Does "drops" in the description refer to the number of packets marked in the incident of congestion or the total number of packets processed in the incident of congestion?
"drops" (better called "marks") means the number of packets marked after entering the marking state (i.e. the queue being above target).
Once the queue goes below target, drops is reset to 0.
Previous patchsets used the second incorrect definition, but the latest patchset (18) should use the correct one stated above.
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
Previous patchsets used the second incorrect definition, but the latest patchset (18) should use the correct one stated above.
Sounds good.
Updated by Anonymous almost 7 years ago
- % Done changed from 100 to 90
I ran a simple test on my local machine (100MB file) and it failed:
ndncatchunks -v /test > /dev/null
All segments have been received.
Time elapsed: 4003.19 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 208.181317 Mbit/s
Total # of packet loss events: 1
Total # of retransmitted segments: 1292
Total # of received congestion marks: 0
RTT min/avg/max = 0.875/131/260 ms
Catchunks doesn't receive any congestion marks.
In contrast, our hackathon code works:
ndncatchunks -v /test > /dev/null
All segments have been received.
Time elapsed: 1701.96 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 489.664074 Mbit/s
Total # of packet loss events: 0
Total # of retransmitted segments: 0
Total # of received congestion marks: 1046
RTT min/avg/max = 0.388/2.34/20.4 ms
Edit: removed the misleading "packet loss rate", which is currently a bug in ndncatchunks.
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
I ran a simple test on my local machine (100MB file) and it failed:
ndncatchunks -v /test > /dev/null
All segments have been received.
Time elapsed: 4003.19 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 208.181317 Mbit/s
Total # of packet loss events: 1
Packet loss rate: 4.22369e-05
Total # of retransmitted segments: 1292
Total # of received congestion marks: 0
RTT min/avg/max = 0.875/131/260 msCatchunks doesn't receive any congestion marks.
In contrast, our hackathon code works:
ndncatchunks -v /test > /dev/null
All segments have been received.
Time elapsed: 1701.96 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 489.664074 Mbit/s
Total # of packet loss events: 0
Packet loss rate: 0
Total # of retransmitted segments: 0
Total # of received congestion marks: 1046
RTT min/avg/max = 0.388/2.34/20.4 ms
I don't think we can expect any system to work 100% perfectly 100% of the time. The loss could also have been caused by external factors. Did you run multiple trials?
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
I don't think we can expect any system to work 100% perfectly 100% of the time. The loss could also have been caused by external factors. Did you run multiple trials?
Currently, it works 0% of the time.
The queue is clearly above the limit (avg. RTT of 131ms), yet there are no congestion marks.
Edit: Currently the ndncatchunks output of "packet loss event" and "packet loss rate" is misleading. Look at "# of retransmitted segments" to see the actual packet loss rate.
Updated by Anonymous almost 7 years ago
Okay, I figured out that I forgot to enable the congestion detection in the code (we should be able to do this in a config file, rather than by recompiling).
Now it works quite well, but needs some tuning for links with low delay (here less than 1ms).
ndncatchunks -v /test > /dev/null
All segments have been received.
Time elapsed: 2304.11 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 361.697410 Mbit/s
Total # of retransmitted segments: 970
Total # of received congestion marks: 22
RTT min/avg/max = 0.489/18.9/246 ms
The problem can be fixed by changing the CoDel interval from 100ms to 10ms:
All segments have been received.
Time elapsed: 1640.65 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 507.962230 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 45
RTT min/avg/max = 0.499/2.74/25.3 ms
Or by increasing the consumer RTO:
ndncatchunks -v --aimd-rto-min 500 /test > /dev/null
All segments have been received.
Time elapsed: 1856.24 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 448.965759 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 15
RTT min/avg/max = 0.658/25.2/233 ms
Also note that (as expected) we get far fewer congestion marks here than in the hackathon code.
I still have to do the evaluation with higher RTTs and TCP/UDP tunnels.
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Okay, I figured out that I forgot to enable the congestion detection in the code (we should be able to do this in a config file, rather than by recompiling).
Whether to allow congestion marking on a face or not should be set through the management system (and therefore via nfdc). It should be implemented in a similar manner to LpReliability.
The problem can be fixed by changing the CoDel interval from 100ms to 10ms:
I assume we're not going to change this value in the current commit. We can allow it to be changed through the management system.
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
Klaus Schneider wrote:
Okay, I figured out that I forgot to enable the congestion detection in the code (we should be able to do this in a config file, rather than by recompiling).
Whether to allow congestion marking on a face or not should be set through the management system (and therefore via nfdc). It should be implemented in a similar manner to LpReliability.
Sounds good. Are we doing that in next commit?
The problem can be fixed by changing the CoDel interval from 100ms to 10ms:
I assume we're not going to change this value in the current commit. We can allow it to be changed through the management system.
Yes, the current default of 100ms is fine. But it should be reasonably easy to change.
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Eric Newberry wrote:
Klaus Schneider wrote:
Okay, I figured out that I forgot to enable the congestion detection in the code (we should be able to do this in a config file, rather than by recompiling).
Whether to allow congestion marking on a face or not should be set through the management system (and therefore via nfdc). It should be implemented in a similar manner to LpReliability.
Sounds good. Are we doing that in next commit?
Yes
The problem can be fixed by changing the CoDel interval from 100ms to 10ms:
I assume we're not going to change this value in the current commit. We can allow it to be changed through the management system.
Yes, the current default of 100ms is fine. But it should be reasonably easy to change.
We can probably just have this (from the user's perspective) as a command-line argument to nfdc.
Updated by Davide Pesavento almost 7 years ago
Eric Newberry wrote:
Klaus Schneider wrote:
The problem can be fixed by changing the CoDel interval from 100ms to 10ms:
I assume we're not going to change this value in the current commit. We can allow it to be changed through the management system.
Yes we can do that, but the problem is that no one is actually going to change it, especially not for local (unix stream) faces.
Updated by Eric Newberry almost 7 years ago
Davide Pesavento wrote:
Eric Newberry wrote:
Klaus Schneider wrote:
The problem can be fixed by changing the CoDel interval from 100ms to 10ms:
I assume we're not going to change this value in the current commit. We can allow it to be changed through the management system.
Yes we can do that, but the problem is that no one is actually going to change it, especially not for local (unix stream) faces.
We could have the management system set a different default for local faces?
Updated by Davide Pesavento almost 7 years ago
Eric Newberry wrote:
We could have the management system set a different default for local faces?
Please elaborate. Currently, management has no power over Unix faces, and more generally, management is not involved in the creation of on-demand faces.
Updated by Eric Newberry almost 7 years ago
Davide Pesavento wrote:
Eric Newberry wrote:
We could have the management system set a different default for local faces?
Please elaborate. Currently, management has no power over Unix faces, and more generally, management is not involved in the creation of on-demand faces.
What about local TCP faces? Are they ever utilized and are they also created on-demand?
Updated by Davide Pesavento almost 7 years ago
Eric Newberry wrote:
Davide Pesavento wrote:
Eric Newberry wrote:
We could have the management system set a different default for local faces?
Please elaborate. Currently, management has no power over Unix faces, and more generally, management is not involved in the creation of on-demand faces.
What about local TCP faces? Are they ever utilized and are they also created on-demand?
Same thing. Whenever you have an application on the other side, the face is on-demand. Local TCP faces are also rarely used, compared to Unix faces.
Updated by Anonymous almost 7 years ago
I did some testing over UDP on my virtual machine (both NFDs have congestion detection enabled). I also added 50ms link delay.
First case: Producer runs on laptop, consumer runs on VM:
All segments have been received.
Time elapsed: 29136.8 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 28.602673 Mbit/s
Total # of retransmitted segments: 338
Total # of received congestion marks: 0
RTT min/avg/max = 43.954/55.612/110.566 ms
Second case: Consumer runs on laptop, producer runs on VM:
All segments have been received.
Time elapsed: 13365.2 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 62.355268 Mbit/s
Total # of retransmitted segments: 842
Total # of received congestion marks: 0
RTT min/avg/max = 51/101/325 ms
As you can see, there are no congestion marks. Increasing the consumer RTO didn't help.
Updated by Anonymous almost 7 years ago
The log files look like this (grep "cong").
Congestion detection is clearly enabled, but it never marks a packet (no occurrence of "LpPacket was marked as congested").
1515462666.399212 TRACE: [GenericLinkService] [id=267,local=unix:///run/nfd.sock,remote=fd://29] Send queue length: 91392 - congestion threshold: 65536
1515462666.399356 TRACE: [GenericLinkService] [id=267,local=unix:///run/nfd.sock,remote=fd://29] Send queue length: 96768 - congestion threshold: 65536
1515462666.399499 TRACE: [GenericLinkService] [id=267,local=unix:///run/nfd.sock,remote=fd://29] Send queue length: 102144 - congestion threshold: 65536
1515462673.287010 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
1515462681.545601 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
1515462681.546727 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 65536 - congestion threshold: 53248
1515462681.548310 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 73728 - congestion threshold: 53248
1515462681.943691 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
1515462681.944187 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 65536 - congestion threshold: 53248
1515462681.944718 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 73728 - congestion threshold: 53248
1515462681.945164 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 81920 - congestion threshold: 53248
1515462681.945894 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 90112 - congestion threshold: 53248
1515462681.946476 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 98304 - congestion threshold: 53248
1515462681.947019 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 106496 - congestion threshold: 53248
1515462681.947638 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 114688 - congestion threshold: 53248
1515462681.948119 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 122880 - congestion threshold: 53248
1515462681.948629 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 131072 - congestion threshold: 53248
1515462681.949120 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 139264 - congestion threshold: 53248
1515462682.917569 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
1515462682.945887 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
1515462682.946526 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 65536 - congestion threshold: 53248
1515462685.098668 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
1515462685.099018 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 65536 - congestion threshold: 53248
1515462685.099470 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 73728 - congestion threshold: 53248
1515462686.671630 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.1.107:6363,remote=udp4://192.168.1.104:6363] Send queue length: 57344 - congestion threshold: 53248
Updated by Anonymous almost 7 years ago
As written in Gerrit, it would be good to have a better logging output.
Updated by Anonymous almost 7 years ago
Regarding the UDP test:
- I found that the congestion level on the actual UDP face never exceeds the threshold:
1515480726.669771 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 768 - send queue capacity: 106496 - congestion threshold: 53248
1515480726.673690 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 768 - send queue capacity: 106496 - congestion threshold: 53248
1515480726.673833 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 1536 - send queue capacity: 106496 - congestion threshold: 53248
1515480726.674149 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 2304 - send queue capacity: 106496 - congestion threshold: 53248
1515480726.674294 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 3072 - send queue capacity: 106496 - congestion threshold: 53248
Instead, the congestion occurs at the Unix socket.
However, for the Unix socket, the congestion level goes below the threshold quite often and inexplicably:
1515480710.620540 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 53760 - send queue capacity: -1 - congestion threshold: 65536
1515480710.620611 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 59136 - send queue capacity: -1 - congestion threshold: 65536
1515480710.620692 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 64512 - send queue capacity: -1 - congestion threshold: 65536
1515480710.620766 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 69888 - send queue capacity: -1 - congestion threshold: 65536
1515480710.620843 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 75264 - send queue capacity: -1 - congestion threshold: 65536
1515480710.627133 DEBUG: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length dropped below congestion threshold
1515480710.627363 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 5376 - send queue capacity: -1 - congestion threshold: 65536
1515480710.627483 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 10752 - send queue capacity: -1 - congestion threshold: 65536
I wonder whether this is caused by packet loss/buffer overflow?
Another problem is that we currently can't determine the queue capacity for Unix sockets (result = -1).
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Another problem, is that we currently can't determine the queue capacity for unix sockets (result = -1).
Can you test again with the latest patchset (25)? It should implement queue capacity detection for Unix sockets.
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
Klaus Schneider wrote:
Another problem, is that we currently can't determine the queue capacity for unix sockets (result = -1).
Can you test again with the latest patchset (25)? It should implement queue capacity detection for Unix sockets.
I think Davide made a good point: The unix socket also uses m_sendQueue() and thus can't drop any packets.
Is that true? If yes, the earlier design (returning -1) was fine.
But then the question remains why the queue suddenly becomes empty.
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
I wonder whether this is caused by packet loss/buffer overflow?
No. Unix sockets don't lose packets.
The unix socket also uses m_sendQueue() and thus can't drop any packets.
Is that true? If yes, the earlier design (returning -1 was fine).
Yes, that's correct.
Updated by Anonymous almost 7 years ago
Here is the output of the consumer (60ms link propagation delay):
All segments have been received.
Time elapsed: 36545.2 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 22.804349 Mbit/s
Total # of packet loss events: 21
Packet loss rate: 0.000886974
Total # of retransmitted segments: 206
Total # of received congestion marks: 0
RTT min/avg/max = 60.673/68.721/132.907 ms
There are no timeouts caused by exceeding the RTO (200ms), but 206 retransmitted packets.
So where do these packet drops come from, if the UDP tunnel never exceeds the limit, and both unix sockets and NFD are perfectly reliable (i.e. introduce delay, but don't drop packets)?
Updated by Anonymous almost 7 years ago
Another weird outcome: The UDP tunnel send queue size seems to be counting # of packets rather than bytes:
1515482564.798918 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 18 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.798998 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 19 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.799072 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 20 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.799144 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 21 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.799255 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 22 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.799968 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 23 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.800050 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 24 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.800190 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 25 - send queue capacity: 106496 - congestion threshold: 53248
1515482564.800975 TRACE: [GenericLinkService] [id=260,local=udp4://10.0.2.15:6363,remote=udp4://10.134.223.61:6363] Send queue length: 26 - send queue capacity: 106496 - congestion threshold: 53248
Or how else do you explain that the "send queue length" is incremented in steps of 1?
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
So where do these packet drops come from, if the UDP tunnel never exceeds the limit, and both unix sockets and NFD are perfectly reliable (i.e. introduce delay, but don't drop packets)?
How do you know the udp socket isn't dropping packets?
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
So where do these packet drops come from, if the UDP tunnel never exceeds the limit, and both unix sockets and NFD are perfectly reliable (i.e. introduce delay, but don't drop packets)?
How do you know the udp socket isn't dropping packets?
If it was, we would first see its queue go above the threshold. But it's never even close to the threshold. (assuming that our queue size detection works correctly)
I have another theory: Maybe some process clears the buffer in batches, which empties it periodically?
That would explain why the queue suddenly decreases so much:
1515485721.065771 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 129024 - congestion threshold: 53248 - capacity: 106496
1515485721.065844 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 134400 - congestion threshold: 53248 - capacity: 106496
1515485721.065943 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 139776 - congestion threshold: 53248 - capacity: 106496
1515485721.075803 DEBUG: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length dropped below congestion threshold
1515485721.668688 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 5376 - congestion threshold: 53248 - capacity: 106496
1515485721.845179 TRACE: [GenericLinkService] [id=262,local=unix:///run/nfd.sock,remote=fd://27] Send queue length: 5376 - congestion threshold: 53248 - capacity: 106496
But it still doesn't explain the packet loss.
Updated by Anonymous almost 7 years ago
I'm still wondering what's up with the non-byte queue lengths:
1515486333.962175 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 20 - congestion threshold: 53248 - capacity: 106496
1515486333.962446 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 20 - congestion threshold: 53248 - capacity: 106496
1515486333.969706 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 12 - congestion threshold: 53248 - capacity: 106496
1515486333.970008 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 13 - congestion threshold: 53248 - capacity: 106496
1515486333.978128 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 8 - congestion threshold: 53248 - capacity: 106496
1515486333.978493 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 9 - congestion threshold: 53248 - capacity: 106496
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:
So where do these packet drops come from, if the UDP tunnel never exceeds the limit, and both unix sockets and NFD are perfectly reliable (i.e. introduce delay, but don't drop packets)?
How do you know the udp socket isn't dropping packets?
If it was, we would first see its queue go above the threshold. But it's never even close to the threshold. (assuming that our queue size detection works correctly)
I wouldn't make that assumption, seeing that the queue length values don't make much sense.
I'm still wondering what's up with the non-byte queue lengths:
I don't know what's happening in your test, but the queue length is in bytes.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:
So where do these packet drops come from, if the UDP tunnel never exceeds the limit, and both unix sockets and NFD are perfectly reliable (i.e. introduce delay, but don't drop packets)?
How do you know the udp socket isn't dropping packets?
If it was, we would first see its queue go above the threshold. But it's never even close to the threshold. (assuming that our queue size detection works correctly)
I wouldn't make that assumption, seeing that the queue length values don't make much sense.
So how can we figure out if the UDP tunnel drops something?
I'm still wondering what's up with the non-byte queue lengths:
I don't know what's happening in your test, but the queue length is in bytes.
So why does SIOCOUTQ return values that differ by 1? Are there 1-byte packets in the buffer?
Can you (or someone) try to run the code in UDP tunnels, and see if you can replicate my results?
Updated by Anonymous almost 7 years ago
I just figured out a way in which the UDP tunnel could drop packets without us noticing (http://developerweb.net/viewtopic.php?id=6488): if the receive queue at the other endpoint is overflowing!
Can we also monitor the receive queue of each UDP face and print out trace output?
The command would be ioctl(SIOCINQ)
Updated by Anonymous almost 7 years ago
Actually, the numbers make perfect sense if the UDP queue length is somehow mistakenly in #packets rather than bytes:
1515543750.854293 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 136 - congestion threshold: 53248 - capacity: 106496
1515543750.854851 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 136 - congestion threshold: 53248 - capacity: 106496
1515543750.855346 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 136 - congestion threshold: 53248 - capacity: 106496
1515543750.856916 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 132 - congestion threshold: 53248 - capacity: 106496
1515543750.857660 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 132 - congestion threshold: 53248 - capacity: 106496
1515543750.858332 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 136 - congestion threshold: 53248 - capacity: 106496
1515543750.859473 TRACE: [GenericLinkService] [id=260,local=udp4://192.168.56.101:6363,remote=udp4://192.168.56.1:6363] Send queue length: 132 - congestion threshold: 53248 - capacity: 106496
The UDP buffer capacity is 106,496 bytes.
The highest observed queue length is 136 x 768 (packet size) = 104,448 bytes, right below capacity.
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
So how can we figure out if the UDP tunnel drops something?
I'm not sure, I don't think there's a way to know that for a specific socket. You can try "nstat -az | grep UdpSndbufErrors" for a system-wide counter, but other types of errors may be conflated in the same metric.
But let's take a step back, why do you care if the socket dropped a packet?
The command would be ioctl(SIOCINQ)
No, SIOCINQ returns the size of the first datagram in the receive queue.
Actually, the numbers make perfect sense if the UDP queue length is somehow mistakenly in #packets rather than bytes:
Where do you see these weird numbers? on macOS?
The highest observed queue length is 136 x 768(packet size) = 104,448 bytes. (Right below capacity).
Are you sure the packet is 768 bytes? Are you using chunks? You must be doing something unusual if the packets have that size (too big for an Interest, too small for a Data).
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
So how can we figure out if the UDP tunnel drops something?
I'm not sure, I don't think there's a way to know that for a specific socket. You can try "nstat -az | grep UdpSndbufErrors" for a system-wide counter, but other types of errors may be conflated in the same metric.
But let's take a step back, why do you care if the socket dropped a packet?
Well there is packet loss somewhere, and I'm trying to figure out where it occurs.
The command would be ioctl(SIOCINQ)
No, SIOCINQ returns the size of the first datagram in the receive queue.
Fair enough. It seems to work on every platform but Linux https://stackoverflow.com/questions/9278189/how-do-i-get-amount-of-queued-data-for-udp-socket
Actually, the numbers make perfect sense if the UDP queue length is somehow mistakenly in #packets rather than bytes:
Where do you see these weird numbers? on macOS?
Ubuntu 16.04 VM.
The highest observed queue length is 136 x 768(packet size) = 104,448 bytes. (Right below capacity).
Are you sure the packet is 768 bytes? Are you using chunks? You must be doing something unusual if the packets have that size (too big for an Interest, too small for a Data).
Not sure, but 768 bytes was the "step size" in the unix queue output.
Updated by Anonymous almost 7 years ago
I ran netstat -s -su and got a couple of packet receive errors and RcvbufErrors.
Before catchunks run:
Udp:
291724 packets received
435 packet receive errors
293536 packets sent
RcvbufErrors: 435
After catchunks run:
Udp:
315400 packets received
544 packet receive errors
317322 packets sent
RcvbufErrors: 544
So definitely, the UDP sockets are dropping something.
Updated by Anonymous almost 7 years ago
And indeed it's a receive buffer overflow. You can see it when running:
watch -n .1 ss -nump
On both ends, the Send-Q is almost empty, while the Recv-Q is overflowing.
Updated by Anonymous almost 7 years ago
Note that the problem can be mitigated by some combination of:
- Reducing the default marking interval, that is, the time the queue has to stay above the limit before the first packet is marked (here: to 5ms)
- Increasing the UDP receive buffer size (here to around 10x the default, 2000KB; see the sketch after the results below)
- Increasing the consumer min-rto (here to 600ms)
All segments have been received.
Time elapsed: 13476.7 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 61.839497 Mbit/s
Total # of packet loss events: 0
Packet loss rate: 0
Total # of retransmitted segments: 0
Total # of received congestion marks: 2
RTT min/avg/max = 41.858/140.903/348.999 ms
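For the second item in the list above (enlarging the UDP receive buffer), a hedged sketch using the standard socket API; the function name and the 2000KB value are illustrative, and the kernel may clamp the buffer unless net.core.rmem_max is raised as well:

#include <sys/socket.h>

// Request a larger receive buffer for a UDP socket; returns true on success.
inline bool
enlargeUdpRecvBuffer(int sockfd, int bytes = 2000 * 1024)
{
  return setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0;
}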
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
- Reducing the default marking interval, that is, the time the queue has to stay above limit before the first packet is marked (here: to 5ms)
We could make this a separate interval in GenericLinkService::Options from baseCongestionMarkingInterval.
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
Klaus Schneider wrote:
- Reducing the default marking interval, that is, the time the queue has to stay above limit before the first packet is marked (here: to 5ms)
We could make this a separate interval in GenericLinkService::Options from baseCongestionMarkingInterval.
Actually, I thought about the following design:
- Ignore the interval when first marking the packet. Just mark the first packet once queue > threshold.
- Then, wait at least "INTERVAL" before marking the next packet. If the queue stays above the threshold, reduce the interval, as it is done currently.
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
Eric Newberry wrote:
Klaus Schneider wrote:
- Reducing the default marking interval, that is, the time the queue has to stay above limit before the first packet is marked (here: to 5ms)
We could make this a separate interval in GenericLinkService::Options from baseCongestionMarkingInterval.
Actually, I thought about the following design:
- Ignore the interval when first marking the packet. Just mark the first packet once queue > threshold.
- Then, wait at least "INTERVAL" before marking the next packet. If the queue stays above the threshold, reduce the interval, as it is done currently.
I believe this is actually more or less the design I had in a previous patchset. I can't remember which number at the moment.
Updated by Anonymous almost 7 years ago
Yeah, I know. Let's go back to that and try it out.
Updated by Eric Newberry almost 7 years ago
The design changes in note 94 have been implemented in patchset 29.
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
- Ignore the interval when first marking the packet. Just mark the first packet once queue > threshold.
- Then, wait at least "INTERVAL" before marking the next packet. If the queue stays above the threshold, reduce the interval, as it is done currently.
The second point is unclear. If the queue was above threshold, then goes below, and then again above it before one interval has passed since the previous mark, should we mark or not? If yes, a queue that oscillates around the threshold value would mark too many packets I think.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
- Ignore the interval when first marking the packet. Just mark the first packet once queue > threshold.
- Then, wait at least "INTERVAL" before marking the next packet. If the queue stays above the threshold, reduce the interval, as it is done currently.
The second point is unclear. If the queue was above threshold, then goes below, and then again above it before one interval has passed since the previous mark, should we mark or not? If yes, a queue that oscillates around the threshold value would mark too many packets I think.
I would say "no", don't mark the packet.
Updated by Eric Newberry almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
- Ignore the interval when first marking the packet. Just mark the first packet once queue > threshold.
- Then, wait at least "INTERVAL" before marking the next packet. If the queue stays above the threshold, reduce the interval, as it is done currently.
The second point is unclear. If the queue was above threshold, then goes below, and then again above it before one interval has passed since the previous mark, should we mark or not? If yes, a queue that oscillates around the threshold value would mark too many packets I think.
How would we track and prevent this?
Updated by Eric Newberry almost 7 years ago
The changes in notes 98 and 99 should be implemented in patchset 31.
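Putting notes 94, 98, and 99 together, a rough sketch of the revised marking logic (purely illustrative names, not the actual patchset): the first packet above the threshold is marked immediately, later marks follow a shrinking CoDel-style interval, and the next-mark time is kept across dips below the threshold so that a queue oscillating around the threshold stays rate-limited.

#include <chrono>
#include <cmath>
#include <cstddef>

using Clock = std::chrono::steady_clock;

struct MarkingState
{
  unsigned marks = 0;         // packets marked in the current congestion episode
  Clock::time_point nextMark; // deliberately not reset on dips below the threshold
};

constexpr std::size_t kThreshold = 200 * 1024 / 3;      // illustrative, as in the description
constexpr std::chrono::milliseconds kBaseInterval{100};

inline bool
shouldMark(std::size_t queueLength, Clock::time_point now, MarkingState& s)
{
  if (queueLength <= kThreshold) {
    s.marks = 0;              // episode over: the interval restarts at kBaseInterval
    return false;
  }
  if (now < s.nextMark) {
    return false;             // rate-limit, even right after re-crossing the threshold
  }
  ++s.marks;
  // CoDel-style shrinking interval: kBaseInterval / sqrt(marks)
  s.nextMark = now + std::chrono::duration_cast<Clock::duration>(
                       kBaseInterval / std::sqrt(s.marks));
  return true;
}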
Updated by Anonymous almost 7 years ago
I just figured out that enabling TRACE logging (and redirecting it to a file) completely destroyed the performance of my earlier measurements.
Here is one run over UDP (link RTT 20ms) with the TRACE logging:
All segments have been received.
Time elapsed: 13284.3 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 62.734813 Mbit/s
Total # of retransmitted segments: 953
Total # of received congestion marks: 1
RTT min/avg/max = 21.155/104.836/454.905 ms
Here is the exact same configuration without trace logging:
All segments have been received.
Time elapsed: 4716.32 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 176.703426 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 4
RTT min/avg/max = 20.311/43.109/137.979 ms
We get 3x higher throughput, less than half the avg. delay, and the retransmissions completely disappear.
Looks like I need to do the testing again, without trace logging.
Updated by Anonymous almost 7 years ago
So with the current settings, I get pretty good throughput on my local machine:
klaus@Latitude-E7470:~/backup_ccn/NFD$ ndncatchunks /local > /dev/null
All segments have been received.
Time elapsed: 1034.94 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 805.251379 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 6
RTT min/avg/max = 0.931/3.331/9.483 ms
And for the UDP tunnel (20ms RTT), the results are okay, even though one might want to fine-tune the marking a bit to get rid of the retransmissions:
klaus@Latitude-E7470:~/backup_ccn/NFD$ ndncatchunks /udp > /dev/null
All segments have been received.
Time elapsed: 8509.57 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 97.935614 Mbit/s
Total # of retransmitted segments: 43
Total # of received congestion marks: 9
RTT min/avg/max = 20.246/22.962/59.581 ms
Thus, after a couple small changes (see Gerrit), I'm fine with merging the current code.
Updated by Davide Pesavento almost 7 years ago
Klaus Schneider wrote:
I just figured out that enabling the TRACE logging (redirecting it to a file), completely destroyed the performance of my earlier measurements.
Well, yes, that's expected. Benchmarks should be run in release mode and with minimal logging. In particular, never enable TRACE globally for every module if you're doing something time-sensitive.
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
I just figured out that enabling the TRACE logging (redirecting it to a file), completely destroyed the performance of my earlier measurements.
Well, yes, that's expected. Benchmarks should be run in release mode and with minimal logging. In particular, never enable TRACE globally for every module if you're doing something time-sensitive.
I'll consider that next time :)
Updated by Anonymous almost 7 years ago
Just for completeness, here's the result for a TCP face (same scenario as above; RTT of 20ms):
All segments have been received.
Time elapsed: 21059.5 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 39.573042 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 19
RTT min/avg/max = 21.920/39.302/65.801 ms
Updated by Eric Newberry almost 7 years ago
Change 4411 has been merged. Are we ready to close this issue and move on to the NFD management component of this system (allowing us to enable and disable congestion marking on a face and set any applicable parameters)?
Updated by Davide Pesavento almost 7 years ago
I believe so, unless Klaus wanted to do more as part of this task. Please open separate issues for 1) integration with NFD management and nfdc, and 2) EthernetTransport support.
Updated by Anonymous almost 7 years ago
- Status changed from Code review to Closed
- % Done changed from 90 to 100
Sure, we can put up separate issues.
Updated by Anonymous almost 7 years ago
@Eric: Can you create the issue for the NFD Management support?
Updated by Eric Newberry almost 7 years ago
Klaus Schneider wrote:
@Eric: Can you create the issue for the NFD Management support?
An issue for this has been created as #4465.