Bug #4408
open
NFD throughput using large chunks is low
Added by susmit shannigrahi almost 7 years ago.
Updated almost 7 years ago.
Description
As we discussed on 12/20/2017 call, NFD performance is a bottleneck for large data applications such as High Energy Particle Physics (HEP).
Here are the numbers we observed using ndnputchunks and ndncatchunks. The RTT was around 3 ms and the link bandwidth was 10 Gbps.
We also noted the NFD throughput numbers; they are similar.
| Goodput | Init CWND | Step | link_MTU | NDN_MTU |
| 81.033611 Mbit/s | 500 | 200 | 9000 | 1M |
| 62.872308 Mbit/s | 500 | 200 | 9000 | 2M |
| 69.155263 Mbit/s | 500 | 200 | 9000 | 3M |
| 76.271195 Mbit/s | 500 | 200 | 9000 | 4M |
| 78.672476 Mbit/s | 500 | 200 | 9000 | 5M |
When we use just one NFD, with putchunks and catchunks connected to the same NFD over a Unix face, the performance numbers are a little better.
| Goodput | Init CWND | Step | link_MTU | NDN_MTU |
| 347.369347 Mbit/s | 500 | 200 | 9000 | 1M |
| 350.066353 Mbit/s | 500 | 200 | 9000 | 2M |
| 327.004883 Mbit/s | 500 | 200 | 9000 | 3M |
| 335.431581 Mbit/s | 500 | 200 | 9000 | 4M |
| 344.075071 Mbit/s | 500 | 200 | 9000 | 5M |
What's the exact putchunks and catchunks commands you ran?
The bandwidth-delay product is 10 Gbps * 3 ms = 30 Mbit = 3.75 MB. The congestion window cannot usefully be bigger than this, which means that for a 1 MB packet size the congestion window should be at most about 3.75 packets. The initial window size of 500 is far too big. Or do you have a different definition of window? It usually means the number of outstanding packets.
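To make that arithmetic concrete, here is a small stand-alone check (not NFD or ndn-tools code; the 1 MB segment size is taken from the NDN_MTU column of the first table):

// Sanity check of the bandwidth-delay-product argument above.
#include <iostream>

int main()
{
  const double bandwidthBitsPerSec = 10e9; // 10 Gbps link
  const double rttSec = 0.003;             // ~3 ms RTT
  const double segmentBytes = 1e6;         // 1 MB chunk (NDN_MTU = 1M)

  const double bdpBytes = bandwidthBitsPerSec * rttSec / 8; // 3.75e6 bytes
  std::cout << "BDP = " << bdpBytes / 1e6 << " MB\n"
            << "max useful cwnd = " << bdpBytes / segmentBytes
            << " segments\n";              // ~3.75 segments of 1 MB each
  return 0;
}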
- Status changed from In Progress to New
An issue cannot be In Progress without an assignee.
- Tracker changed from Task to Bug
Hey Susmit, I think you should retry your measurements with the very latest versions of NFD and ndn-tools. They include two improvements:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
- ndncatchunks can run without printing a line per packet, which makes it faster.
I can get better performance using normal-sized chunks (4KB):
klaus@Latitude-E7470:~$ ndncatchunks --aimd-ignore-cong-marks /bla > /dev/null
All segments have been received.
Time elapsed: 2582.85 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 322.662441 Mbit/s
Total # of retransmitted segments: 2222
Total # of received congestion marks: 152
RTT min/avg/max = 0.875/45.972/329.765 ms
I get even better performance when using the congestion marks:
ndncatchunks /bla > /dev/null
All segments have been received.
Time elapsed: 1285.99 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 648.052707 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 9
RTT min/avg/max = 0.847/17.817/174.590 ms
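As an aside on reading these summaries, the Goodput line is just the received payload divided by the elapsed time; a quick check (taking kB as 1000 bytes) reproduces the two numbers above:

// Recompute the Goodput values from the two catchunks summaries above
// (illustrative only, not the actual ndn-tools code).
#include <iostream>

int main()
{
  const double totalBytes = 104174e3;            // "Total size: 104174kB"
  const double elapsedMs[] = {2582.85, 1285.99}; // the two runs

  for (double ms : elapsedMs) {
    double mbps = totalBytes * 8 / (ms / 1e3) / 1e6;
    std::cout << mbps << " Mbit/s\n";            // ~322.66 and ~648.05 Mbit/s
  }
  return 0;
}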
- Start date deleted (12/20/2017)
Klaus Schneider wrote:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
Remember that the above still requires changing a flag in NFD's source code. You can wait for #4465 if you don't want to do that.
Davide Pesavento wrote:
Klaus Schneider wrote:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
Remember that the above still requires changing a flag in NFD's source code. You can wait for #4465 if you don't want to do that.
Which flag should I change?
susmit shannigrahi wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
Remember that the above still requires changing a flag in NFD's source code. You can wait for #4465 if you don't want to do that.
Which flag should I change?
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Eric Newberry wrote:
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Yeah, plus remember to do that for all involved hosts.
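For reference, a minimal sketch of the local edit Eric describes, assuming allowCongestionMarking is declared with a default member initializer inside GenericLinkService::Options (the exact file layout and surrounding members differ between NFD versions):

// daemon/face/generic-link-service.hpp -- sketch only, not verbatim NFD code.
// Options is a nested class of GenericLinkService; flipping the default makes
// every face created with default options perform congestion detection/marking.
class Options
{
public:
  // ... other link-service options ...

  /** \brief enables send queue congestion detection and marking
   */
  bool allowCongestionMarking = true; // changed from the default 'false'
};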
Looks like performance peaks around a 1 MB chunk size, at roughly 1.4 Gbps. See the attached figure.
This is on the same machine (Unix face).
$ ndncatchunks --aimd-initial-cwnd=100/200/300 --aimd-ignore-cong-marks /test > /dev/null
Why --aimd-ignore-cong-marks?
Davide Pesavento wrote:
Why --aimd-ignore-cong-marks?
Yeah, using the congestion marks (i.e., removing the --aimd-ignore-cong-marks option) should give you even higher performance.
Can you post the catchunks summary output? I'm especially interested in the number of retransmissions and the RTT.
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Thanks, Eric.
I noticed that you merged a change that incorporates the config file option. So this is no longer needed, right?
susmit shannigrahi wrote:
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Thanks, Eric.
I noticed that you merged a change that incorporates the config file option. So this is no longer needed, right?
Correct. We're planning to add this to management soon as well.
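For anyone finding this issue later: once that change is in your NFD build, the switch lives in nfd.conf instead of the source. The option name and section shown below are assumptions and should be checked against the nfd.conf.sample shipped with your version:

face_system
{
  general
  {
    ; assumed option name; verify against your nfd.conf.sample
    enable_congestion_marking yes
  }
}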