Bug #5003 (Closed): Congestion Marking too aggressive
Parent task: Feature #1624: Design and Implement Congestion Control
% Done: 100%
Description
The current implementation of active queue management in GenericLinkService is too aggressive, which can cause a drop in throughput.
Example with 2 NFD nodes connected via UDP tunnel (added 10ms RTT):
klaus@consumer:~/work$ ndncatchunks /100m
All segments have been received.
Time elapsed: 43.3606 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 19.346137 Mbit/s
Congestion marks: 13 (caused 13 window decreases)
Timeouts: 144 (caused 17 window decreases)
Retransmitted segments: 107 (0.446969%), skipped: 37
RTT min/avg/max = 10.520/22.440/427.402 ms
With congestion marks ignored:
klaus@consumer:~/work$ ndncatchunks --ignore-marks /100m
All segments have been received.
Time elapsed: 20.561 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 40.798586 Mbit/s
Congestion marks: 48 (caused 0 window decreases)
Timeouts: 1459 (caused 19 window decreases)
Retransmitted segments: 1389 (5.50732%), skipped: 70
RTT min/avg/max = 10.574/33.479/1006.184 ms
The queuing (and congestion marking) happens mostly inside NFD, since the links are faster than NFD can process packets (often the case in real networks too).
The solution is to implement a more faithful version of CoDel (see https://tools.ietf.org/html/rfc8289) than what was done in #4362.
There are two simplifications in the current code:
1. We measure the queue size (in bytes) rather than the queuing delay (in ms).
2. The first mark happens immediately after the threshold is exceeded, rather than after the threshold has been exceeded for a given time period (100 ms).
I think (2) is more significant, and I will look at it first (see the sketch below).
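For reference, a minimal sketch of what the CoDel-style check for point (2) could look like. This is not the actual GenericLinkService code; the class, member names, and the 64 KB threshold are hypothetical, and only the 100 ms interval and the interval/sqrt(count) schedule come from CoDel (RFC 8289):

// Hypothetical sketch of a CoDel-like marking decision, not the actual
// GenericLinkService implementation. shouldMark() would be called for every
// outgoing packet with the current send queue size.
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>

class CongestionMarker
{
public:
  bool
  shouldMark(size_t queueSize, std::chrono::steady_clock::time_point now)
  {
    if (queueSize < m_threshold) {
      // Queue drained below the threshold: reset the marking state.
      m_firstAboveTime.reset();
      m_markCount = 0;
      return false;
    }
    if (!m_firstAboveTime) {
      // Threshold just exceeded: start a 100 ms grace period instead of
      // marking immediately (this addresses simplification (2)).
      m_firstAboveTime = now;
      m_nextMarkTime = now + MARKING_INTERVAL;
      return false;
    }
    if (now >= m_nextMarkTime) {
      // Mark, then schedule the next mark at interval / sqrt(count), so that
      // marks become more frequent under persistent congestion (RFC 8289).
      ++m_markCount;
      m_nextMarkTime = now + std::chrono::duration_cast<std::chrono::steady_clock::duration>(
                               MARKING_INTERVAL / std::sqrt(m_markCount));
      return true;
    }
    return false;
  }

private:
  static constexpr std::chrono::milliseconds MARKING_INTERVAL{100};
  size_t m_threshold = 64 * 1024; // bytes; placeholder for the configured threshold
  std::optional<std::chrono::steady_clock::time_point> m_firstAboveTime;
  std::chrono::steady_clock::time_point m_nextMarkTime;
  uint64_t m_markCount = 0;
};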
Updated by Anonymous over 5 years ago
- Category set to Faces
- Parent task set to #1624
Updated by Anonymous about 5 years ago
We wanted to find out how many of these timeouts are real packet drops vs. just exceeding the consumer RTO limit.
Some measurements over Unix sockets, retrieving a 50 MB file:
klaus@localhost:~$ ndncatchunks --ignore-marks /50
All segments have been received.
Time elapsed: 2.04337 seconds
Segments received: 11916
Transferred size: 52428.8 kB
Goodput: 205.264109 Mbit/s
Congestion marks: 0 (caused 0 window decreases)
Timeouts: 3248 (caused 2 window decreases)
Retransmitted segments: 2650 (18.1931%), skipped: 598
RTT min/avg/max = 0.088/192.217/473.843 ms
With higher RTO and interest lifetime (values in ms):
klaus@localhost:~$ ndncatchunks --ignore-marks --min-rto 40000 --lifetime 100000 /50
All segments have been received.
Time elapsed: 5.9546 seconds
Segments received: 11916
Transferred size: 52428.8 kB
Goodput: 70.438089 Mbit/s
Congestion marks: 0 (caused 0 window decreases)
Timeouts: 0 (caused 0 window decreases)
Retransmitted segments: 0 (0%), skipped: 0
RTT min/avg/max = 0.370/2139.468/4118.719 ms
Updated by Anonymous about 5 years ago
Some more local (unix socket) measurements with a larger file (500MB), CS disabled:
With congestion marks ignored:
klaus@localhost:~$ ndncatchunks --ignore-marks /500
All segments have been received.
Time elapsed: 23.4087 seconds
Segments received: 119157
Transferred size: 524288 kB
Goodput: 179.177290 Mbit/s
Congestion marks: 11098 (caused 0 window decreases)
Timeouts: 2660 (caused 2 window decreases)
Retransmitted segments: 1905 (1.57357%), skipped: 755
RTT min/avg/max = 0.062/317.786/528.805 ms
With congestion marks honored:
klaus@localhost:~$ ndncatchunks /500
All segments have been received.
Time elapsed: 7.3627 seconds
Segments received: 119157
Transferred size: 524288 kB
Goodput: 569.669132 Mbit/s
Congestion marks: 30 (caused 16 window decreases)
Timeouts: 3545 (caused 2 window decreases)
Retransmitted segments: 3019 (2.47103%), skipped: 526
RTT min/avg/max = 0.298/9.548/465.668 ms
Increasing the minimum RTO:
klaus@localhost:~$ ndncatchunks --min-rto 1000 /500
All segments have been received.
Time elapsed: 6.50988 seconds
Segments received: 119157
Transferred size: 524288 kB
Goodput: 644.298549 Mbit/s
Congestion marks: 32 (caused 18 window decreases)
Timeouts: 0 (caused 0 window decreases)
Retransmitted segments: 0 (0%), skipped: 0
RTT min/avg/max = 0.208/7.186/236.157 ms
Updated by Anonymous about 5 years ago
Run in the scenario described above (UDP tunnel, 20ms RTT):
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 75.5894 seconds
Segments received: 47663
Transferred size: 209715 kB
Goodput: 22.195196 Mbit/s
Congestion marks: 110 (caused 2 window decreases)
Timeouts: 4678 (caused 42 window decreases)
Retransmitted segments: 4605 (8.81036%), skipped: 73
RTT min/avg/max = 20.878/49.811/805.386 ms
Consumer UDP Receive/Buffer Errors: 325
Producer UDP Receive/Buffer Errors: 1974
Updated by Anonymous about 5 years ago
Same scenario run again with higher timeouts:
klaus@consumer:~/work$ ndncatchunks --min-rto 2000 --lifetime 10000 /200
All segments have been received.
Time elapsed: 56.7384 seconds
Segments received: 47663
Transferred size: 209715 kB
Goodput: 29.569444 Mbit/s
Congestion marks: 10 (caused 2 window decreases)
Timeouts: 870 (caused 37 window decreases)
Retransmitted segments: 870 (1.79259%), skipped: 0
RTT min/avg/max = 20.752/57.494/918.524 ms
Consumer UDP Receive/Buffer Errors: 193
Producer UDP Receive/Buffer Errors: 677
Total: 870
With large enough timeout settings, all remaining timeouts are caused by UDP buffer overflows/drops.
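For anyone reproducing this, one way to reduce such drops (orthogonal to the marking logic) is to enlarge the UDP socket receive buffer. A hedged sketch using Boost.Asio, which NFD is built on; the 1 MB value is an arbitrary example, and on Linux the size the kernel actually grants is still capped by the net.core.rmem_max sysctl:

// Sketch: request a larger receive buffer on a UDP socket via Boost.Asio,
// then read the option back to see what the kernel actually applied.
#include <boost/asio.hpp>
#include <iostream>

int main()
{
  boost::asio::io_context io;
  boost::asio::ip::udp::socket socket(io, boost::asio::ip::udp::v4());

  socket.set_option(boost::asio::socket_base::receive_buffer_size(1024 * 1024));

  boost::asio::socket_base::receive_buffer_size applied;
  socket.get_option(applied);
  std::cout << "granted receive buffer: " << applied.value() << " bytes\n";
}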
Updated by Anonymous about 5 years ago
Moreover, we're getting the same number (or even more) of timeouts on a file 10x smaller (20M vs 200M):
ndncatchunks --min-rto 2000 --lifetime 10000 /20
All segments have been received.
Time elapsed: 6.94108 seconds
Segments received: 4767
Transferred size: 20971.5 kB
Goodput: 24.170902 Mbit/s
Congestion marks: 7 (caused 2 window decreases)
Timeouts: 1231 (caused 2 window decreases)
Retransmitted segments: 1231 (20.5235%), skipped: 0
RTT min/avg/max = 21.276/100.755/215.389 ms
That means that most of those timeouts likely happen during the slow-start phase.
Updated by Anonymous about 5 years ago
Some iperf results for comparison. Same scenario (3 nodes, 20ms RTT):
klaus@consumer:~/work$ iperf -e -l 256K -c 10.2.0.3
------------------------------------------------------------
Client connecting to 10.2.0.3, TCP port 5001 with pid 15837
Write buffer size: 256 KByte
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.1.0.1 port 52334 connected with 10.2.0.3 port 5001
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT
[ 3] 0.00-10.05 sec 125 MBytes 105 Mbits/sec 1/0 142 448K/25883 us
Updated by Davide Pesavento about 5 years ago
- Subject changed from Congestion Marking too agressive to Congestion Marking too aggressive
Updated by Davide Pesavento about 5 years ago
- Status changed from New to Code review
- Target version set to v0.7
Updated by Davide Pesavento about 5 years ago
I don't understand the context of these numbers you posted. Are they measured after applying the change? How do they compare with the behavior before?
Updated by Anonymous about 5 years ago
Yes, all the numbers on #note-3 and later are for the new design.
The original post contains the comparison between before and after the change: 19.346137 Mbit/s vs. 40.798586 Mbit/s
Updated by Anonymous about 5 years ago
Basically, the later comments are a follow-up discussion I had with Beichuan to determine:
- whether timeouts only/mostly happen during the start of the connection;
- whether those are real packet drops or just timeouts caused by the RTO setting;
- whether NFD + Unix sockets will drop any packets or queue them indefinitely.
Updated by Davide Pesavento about 5 years ago
Klaus Schneider wrote:
The original post contains the comparison between before and after the change: 19.346137 Mbit/s vs. 40.798586 Mbit/s
I'm still confused. The description compares a run with cong marks vs one without. I don't see any "before vs after". Please clarify.
Moreover, it'd be great to see the behavior in a few more scenarios, e.g. different link delays (1/10/100 ms or 1/20/200 ms or something like that), different transfer sizes (say, 1MB and 100MB).
Updated by Anonymous about 5 years ago
Note: the default UDP buffer capacity seems to be quite low, at only ~106 KB.
1570305326.735465 TRACE: [nfd.GenericLinkService] [id=264,local=udp4://10.1.0.1:6363,remote=udp4://10.2.0.3:6363] txqlen=768 threshold=65536 capacity=106496
Updated by Davide Pesavento about 5 years ago
But in any case, if the capacity is so low, it still doesn't make sense to slow down the throughput even more by adding congestion marks.
Maybe we should consider increasing the defaultCongestionThreshold? (currently at 64 KB)
Updated by Davide Pesavento about 5 years ago
Or change that std::min to std::max? Or have both an upper bound and a lower bound?
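Just to illustrate the "both bounds" idea (hypothetical names and values, not a patch against the current code), the derivation could use std::clamp instead of a single std::min:

// Hypothetical: bound the derived threshold from below and above,
// instead of only capping it with std::min.
#include <algorithm>
#include <cstddef>

constexpr size_t MIN_CONGESTION_THRESHOLD = 16 * 1024;      // example lower bound
constexpr size_t DEFAULT_CONGESTION_THRESHOLD = 64 * 1024;  // current default (upper bound)

size_t
computeThreshold(size_t sendBufferCapacity)
{
  // e.g. start from half the socket buffer capacity, then clamp it
  return std::clamp(sendBufferCapacity / 2,
                    MIN_CONGESTION_THRESHOLD, DEFAULT_CONGESTION_THRESHOLD);
}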
Updated by Anonymous about 5 years ago
If the buffer is too small, there is very little you can do via congestion marking:
- Lower threshold: you mark too many packets, resulting in poor throughput.
- Higher threshold: you mark very few packets and get lots of packet drops from the queue.
I think the default threshold is fine and, if anything, should be tuned (via nfdc) to the link capacity and the desired queuing delay. A higher threshold leads to higher average queuing delay, but also higher throughput.
The ideal would be to have the threshold approximate 5 ms of queuing delay, i.e., use a higher threshold for faster links.
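As a rough illustration of that sizing rule (purely for the arithmetic, not NFD code): with a 5 ms target queuing delay, a 100 Mbit/s link works out to about 62 KB, which is close to the current 64 KB default, while a 1 Gbit/s link would want roughly 625 KB:

// Illustrative only: derive a byte threshold from link capacity and a
// target queuing delay (threshold ~ capacity * delay).
#include <cstdint>
#include <iostream>

uint64_t
thresholdBytes(double linkCapacityMbps, double targetDelaySec)
{
  double bytesPerSec = linkCapacityMbps * 1e6 / 8.0;
  return static_cast<uint64_t>(bytesPerSec * targetDelaySec);
}

int main()
{
  std::cout << thresholdBytes(100, 0.005) << "\n";   // 100 Mbit/s ->  62500 bytes
  std::cout << thresholdBytes(1000, 0.005) << "\n";  // 1 Gbit/s   -> 625000 bytes
}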
Updated by Anonymous about 5 years ago
Okay, here's a proper comparison between old and new code. 10ms RTT, UDP tunnel, 100MByte file.
I used 3 runs, since there's a lot of variance.
Old code (100MB):
All segments have been received.
Time elapsed: 64.6259 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 12.980265 Mbit/s
Congestion marks: 12 (caused 9 window decreases)
Timeouts: 242 (caused 34 window decreases)
Retransmitted segments: 176 (0.733089%), skipped: 66
RTT min/avg/max = 11.236/24.991/1229.038 ms
klaus@consumer:~/work$ ndncatchunks /100
All segments have been received.
Time elapsed: 26.9613 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 31.113460 Mbit/s
Congestion marks: 15 (caused 11 window decreases)
Timeouts: 788 (caused 14 window decreases)
Retransmitted segments: 772 (3.1377%), skipped: 16
RTT min/avg/max = 10.894/24.501/542.259 ms
klaus@consumer:~/work$ ndncatchunks /100
All segments have been received.
Time elapsed: 21.4059 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 39.188221 Mbit/s
Congestion marks: 14 (caused 14 window decreases)
Timeouts: 12 (caused 1 window decreases)
Retransmitted segments: 12 (0.0503271%), skipped: 0
RTT min/avg/max = 10.974/18.564/59.997 ms
New code 100MB:
All segments have been received.
Time elapsed: 21.2545 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 39.467509 Mbit/s
Congestion marks: 98 (caused 4 window decreases)
Timeouts: 1943 (caused 16 window decreases)
Retransmitted segments: 1900 (7.3838%), skipped: 43
RTT min/avg/max = 11.060/50.522/772.800 ms
klaus@consumer:~/work$ ndncatchunks /100
All segments have been received.
Time elapsed: 14.9753 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 56.016126 Mbit/s
Congestion marks: 103 (caused 4 window decreases)
Timeouts: 1139 (caused 13 window decreases)
Retransmitted segments: 1076 (4.3199%), skipped: 63
RTT min/avg/max = 11.243/79.103/790.922 ms
klaus@consumer:~/work$ ndncatchunks /100
All segments have been received.
Time elapsed: 14.0394 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 59.750446 Mbit/s
Congestion marks: 33 (caused 5 window decreases)
Timeouts: 889 (caused 10 window decreases)
Retransmitted segments: 815 (3.30669%), skipped: 74
RTT min/avg/max = 10.952/27.316/459.914 ms
klaus@consumer:~/work$
General result: far fewer window decreases caused by congestion marks, plus higher throughput.
Updated by Anonymous about 5 years ago
Same measurement, but with 200 Megabyte file:
Old:
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 35.9728 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 23.319323 Mbit/s
Congestion marks: 19 (caused 14 window decreases)
Timeouts: 130 (caused 12 window decreases)
Retransmitted segments: 90 (0.376223%), skipped: 40
RTT min/avg/max = 11.299/23.571/433.105 ms
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 47.5256 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 17.650725 Mbit/s
Congestion marks: 10 (caused 9 window decreases)
Timeouts: 308 (caused 25 window decreases)
Retransmitted segments: 257 (1.06688%), skipped: 51
RTT min/avg/max = 10.840/25.944/509.276 ms
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 24.11 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 34.793028 Mbit/s
Congestion marks: 17 (caused 17 window decreases)
Timeouts: 0 (caused 0 window decreases)
Retransmitted segments: 0 (0%), skipped: 0
RTT min/avg/max = 11.209/17.683/58.173 ms
New:
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 31.3761 seconds
Segments received: 47663
Transferred size: 209715 kB
Goodput: 53.471389 Mbit/s
Congestion marks: 31 (caused 4 window decreases)
Timeouts: 523 (caused 17 window decreases)
Retransmitted segments: 453 (0.941475%), skipped: 70
RTT min/avg/max = 11.083/35.429/909.116 ms
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 27.983 seconds
Segments received: 47663
Transferred size: 209715 kB
Goodput: 59.955015 Mbit/s
Congestion marks: 1 (caused 1 window decreases)
Timeouts: 54 (caused 8 window decreases)
Retransmitted segments: 44 (0.0922297%), skipped: 10
RTT min/avg/max = 10.834/17.465/353.567 ms
klaus@consumer:~/work$ ndncatchunks /200
All segments have been received.
Time elapsed: 32.4897 seconds
Segments received: 47663
Transferred size: 209715 kB
Goodput: 51.638518 Mbit/s
Congestion marks: 2 (caused 1 window decreases)
Timeouts: 521 (caused 24 window decreases)
Retransmitted segments: 390 (0.811604%), skipped: 131
RTT min/avg/max = 11.039/22.453/754.475 ms
Updated by Davide Pesavento about 5 years ago
Klaus Schneider wrote:
I used 3 runs, since there's a lot of variance.
Did you disable the content store?
Updated by Anonymous about 5 years ago
Yes, I set "cs_max_packets 0" on both NFDs.
Updated by Anonymous about 5 years ago
Log level is "INFO" btw, with stderr redirected to a text file:
sudo nfd 2>log.txt
Updated by Davide Pesavento about 5 years ago
- Status changed from Code review to Closed
- % Done changed from 0 to 100