Bug #5003
Updated by Anonymous about 5 years ago
The current implementation of the active queue management in GenericLinkService is too aggressive, which can cause a drop in throughput. Example with 2 NFD nodes connected via a UDP tunnel (added 20ms RTT):

~~~
klaus@consumer:~/work$ ndncatchunks /100m
All segments have been received.
Time elapsed: 43.3606 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 19.346137 Mbit/s
Congestion marks: 13 (caused 13 window decreases)
Timeouts: 144 (caused 17 window decreases)
Retransmitted segments: 107 (0.446969%), skipped: 37
RTT min/avg/max = 10.520/22.440/427.402 ms
~~~

With congestion marks ignored:

~~~
klaus@consumer:~/work$ ndncatchunks --ignore-marks /100m
All segments have been received.
Time elapsed: 20.561 seconds
Segments received: 23832
Transferred size: 104858 kB
Goodput: 40.798586 Mbit/s
Congestion marks: 48 (caused 0 window decreases)
Timeouts: 1459 (caused 19 window decreases)
Retransmitted segments: 1389 (5.50732%), skipped: 70
RTT min/avg/max = 10.574/33.479/1006.184 ms
klaus@consumer:~/work$ ndncatchunks /100m
~~~

The queuing (and congestion marking) happens mostly inside NFD, since the links are faster than NFD can process (often the case in real networks too).

The solution is to implement a more faithful version of CoDel (see https://tools.ietf.org/html/rfc8289) than what was done in #4362. There are two simplifications in the current code:

1. We measure the queue size (in bytes) rather than the queuing delay (ms).
2. The first mark happens immediately after the queue exceeds the threshold, rather than after the threshold has been exceeded for a given time period (100ms).

I think (2) is more significant, and I will look at it first (see the sketch after this list).
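For reference, here is a minimal sketch of the persistence check that RFC 8289 describes for point (2): a packet is only marked after the queuing delay has stayed above the target for a full interval. This is not the GenericLinkService code; the class name, the `shouldMark()` signature, and the 5 ms / 100 ms constants are illustrative assumptions, and it omits CoDel's control law that spaces subsequent marks by interval/sqrt(count).

~~~
#include <chrono>
#include <optional>

class CongestionMarker
{
public:
  using Clock = std::chrono::steady_clock;

  // Decide whether the packet that just spent `sojournTime` in the queue
  // should carry a congestion mark.
  bool
  shouldMark(std::chrono::nanoseconds sojournTime)
  {
    auto now = Clock::now();

    if (sojournTime < TARGET) {
      // Delay dropped below the target: reset the persistence tracking.
      m_markTime.reset();
      return false;
    }

    if (!m_markTime) {
      // Delay just crossed the target: start the 100 ms grace period.
      m_markTime = now + INTERVAL;
      return false;
    }

    // Mark only if the delay has stayed above TARGET for a full INTERVAL.
    return now >= *m_markTime;
  }

private:
  static constexpr std::chrono::milliseconds TARGET{5};    // target queuing delay (assumed)
  static constexpr std::chrono::milliseconds INTERVAL{100}; // persistence interval (assumed)
  std::optional<Clock::time_point> m_markTime;
};
~~~

The key difference from the current code is that crossing the threshold only starts a timer; the mark is applied only if the condition persists, so short bursts do not trigger window decreases.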