Bug #4408
NFD throughput using large chunks is low
Description
As we discussed on the 12/20/2017 call, NFD performance is a bottleneck for large-data applications such as High Energy Particle Physics (HEP).
Here are the numbers we observed using ndnputchunks and ndncatchunks. The RTT was around 3 ms and the link bandwidth was 10 Gbps.
We also noted the NFD throughput numbers; they are similar.
| Goodput | Init CWND | Step | Link MTU (bytes) | NDN MTU (chunk size) |
|---|---|---|---|---|
| 81.033611 Mbit/s | 500 | 200 | 9000 | 1 MB |
| 62.872308 Mbit/s | 500 | 200 | 9000 | 2 MB |
| 69.155263 Mbit/s | 500 | 200 | 9000 | 3 MB |
| 76.271195 Mbit/s | 500 | 200 | 9000 | 4 MB |
| 78.672476 Mbit/s | 500 | 200 | 9000 | 5 MB |
When we use just one NFD, with putchunks and catchunks connected to the same NFD (Unix face), the performance numbers are a little better.
| Goodput | Init CWND | Step | Link MTU (bytes) | NDN MTU (chunk size) |
|---|---|---|---|---|
| 347.369347 Mbit/s | 500 | 200 | 9000 | 1 MB |
| 350.066353 Mbit/s | 500 | 200 | 9000 | 2 MB |
| 327.004883 Mbit/s | 500 | 200 | 9000 | 3 MB |
| 335.431581 Mbit/s | 500 | 200 | 9000 | 4 MB |
| 344.075071 Mbit/s | 500 | 200 | 9000 | 5 MB |
Updated by Anonymous almost 7 years ago
What are the exact putchunks and catchunks commands you ran?
Updated by Lan Wang almost 7 years ago
The bandwidth-delay product is 10 Gbps * 3 ms = 30 Mbit = 3.75 MB. The congestion window cannot be bigger than this, which means that for a 1 MB packet size the congestion window should be at most 3.75 packets. The initial window size of 500 is too big. Or do you have a different definition of window? It usually means the number of outstanding packets.
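(Restating the arithmetic above in one line, with W_max denoting the largest useful window in packets; the symbol is just for illustration:)

```latex
\mathrm{BDP} = 10\,\mathrm{Gbit/s} \times 3\,\mathrm{ms} = 30\,\mathrm{Mbit} = 3.75\,\mathrm{MB},
\qquad
W_{\max} = \frac{\mathrm{BDP}}{\text{chunk size}} = \frac{3.75\,\mathrm{MB}}{1\,\mathrm{MB}} = 3.75\ \text{packets}.
```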
Updated by Junxiao Shi almost 7 years ago
- Status changed from In Progress to New
An issue cannot be In Progress without an assignee.
Updated by Anonymous almost 7 years ago
Hey Susmit, I think you should retry your measurements with the very latest versions of NFD and ndn-tools. They include two improvements:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
- ndncatchunks can run without printing a line per packet, which makes it faster.
I can get better performance using normal-sized chunks (4KB):
klaus@Latitude-E7470:~$ ndncatchunks --aimd-ignore-cong-marks /bla > /dev/null
All segments have been received.
Time elapsed: 2582.85 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 322.662441 Mbit/s
Total # of retransmitted segments: 2222
Total # of received congestion marks: 152
RTT min/avg/max = 0.875/45.972/329.765 ms
I get even better performance when using congestion marks:
ndncatchunks /bla > /dev/null
All segments have been received.
Time elapsed: 1285.99 milliseconds
Total # of segments received: 23676
Total size: 104174kB
Goodput: 648.052707 Mbit/s
Total # of retransmitted segments: 0
Total # of received congestion marks: 9
RTT min/avg/max = 0.847/17.817/174.590 ms
Updated by Davide Pesavento almost 7 years ago
- Start date deleted (12/20/2017)
Klaus Schneider wrote:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
Remember that the above still requires changing a flag in NFD's source code. You can wait for #4465 if you don't want to do that.
Updated by susmit shannigrahi almost 7 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
Remember that the above still requires changing a flag in NFD's source code. You can wait for #4465 if you don't want to do that.
Which flag should I change?
Updated by Eric Newberry almost 7 years ago
susmit shannigrahi wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:
- The basic congestion control scheme works now (on UDP, TCP, and Unix sockets)
Remember that the above still requires changing a flag in NFD's source code. You can wait for #4465 if you don't want to do that.
Which flag should I change?
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
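For reference, the kind of change Eric describes might look roughly like this (a minimal sketch based on NFD's daemon/face/generic-link-service.hpp; other members are elided, and depending on the NFD version the default may be set in the constructor or via an in-class initializer as shown here):

```cpp
// Sketch of GenericLinkService::Options (abridged; only the member
// relevant to this discussion is shown).
class Options
{
public:
  constexpr
  Options() noexcept
  {
  }

public:
  // ... other link-service options elided ...

  /** \brief enables send queue congestion detection and marking
   */
  bool allowCongestionMarking = false; // change this default to true to enable congestion marking
};
```

As Klaus notes below, the rebuilt NFD has to be deployed on all involved hosts, and once the config-file option discussed later in this thread (#4465) is merged, this source edit is no longer needed.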
Updated by Anonymous almost 7 years ago
Eric Newberry wrote:
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Yeah, plus remember to do that for all involved hosts.
Updated by susmit shannigrahi almost 7 years ago
- File throughput.pdf added
It looks like the performance peaks around a 1 MB chunk size, at about 1.4 Gbps. See the attached figure.
This is on the same machine (unix face).
$ ndncatchunks --aimd-initial-cwnd=100/200/300 --aimd-ignore-cong-marks /test > /dev/null
Updated by Anonymous almost 7 years ago
Davide Pesavento wrote:
why --aimd-ignore-cong-marks?
Yeah, using the congestion marks (i.e., removing the --aimd-ignore-cong-marks option) should give you even higher performance.
Can you post the catchunks summary output? I'm especially interested in the number of retransmissions and the RTT.
Updated by susmit shannigrahi almost 7 years ago
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Thanks, Eric.
I noticed that you merged a change that incorporates the config file option. So this is no longer needed, right?
Updated by Eric Newberry almost 7 years ago
susmit shannigrahi wrote:
In the GenericLinkService::Options constructor, you can change allowCongestionMarking to true.
Thanks, Eric.
I noticed that you merged a change that incorporates the config file option. So this is no longer needed, right?
Correct. We're planning to add this to management soon as well.