Feature #3636
closedndncatchunks: AIMD congestion control
100%
Description
The current version of chunks application uses a fixed window size and a “backoff and retry” strategy to deal with packet loss.
Update it with a TCP-like congestion control algorithm.
- Consumer uses packet timeout as signal of congestion;
- Consumer reacts to one packet loss event per RTT (to handle a burst of packet loss);
- Consumer takes one RTT sample per RTT;
- Consumer uses TCP's AIMD scheme to adjust congestion window size;
Files
Updated by Junxiao Shi over 8 years ago
- Project changed from NFD to ndn-tools
- Subject changed from Add congestion control to chunks application to ndncatchunks: TCP-like congestion control
- Description updated (diff)
- Category deleted (
Tools) - Start date deleted (
05/30/2016)
A design document should be uploaded before coding.
Also, I recommend exposing the choice of congestion control algorithm as a command line option.
The default could be the best available congestion control algorithm, but older algorithms should be kept so that ndncatchunks
can serve as a performance comparison tool.
Updated by Weiwei Liu over 8 years ago
- File Congestion_Control_Design_For_ndncatchunks.pdf Congestion_Control_Design_For_ndncatchunks.pdf added
Thanks for your suggestion.
I've attached a design document.
Updated by Davide Pesavento over 8 years ago
FYI, Andrea has recently implemented a rudimentary version of TCP congestion control in ndncatchunks
. The main reason was to prevent NFD from overloading due to too many incoming interests (it's pretty easy to hit the limit at the moment). I believe we can share the code, although I'm afraid it's more of a quick hack than a well-designed implementation.
Updated by Anonymous over 8 years ago
The design looks pretty solid and is practically the same that I used for my experimentation in ndnSIM (we can talk about it during the seminar on Thursday).
Just two comments:
"Consumer takes one RTT sample per RTT;"
Why is that?
slow start threshold, initial value: 200;
200 might be too low, if the optimal congestion window is higher than that. The only requirement for the sstreshhold is that it is "sufficiently high" and there is no harm in setting it to 100.000 or even to MAX_INTEGER.
Regarding Junxiao's comment: I think the choice of congestion control algorithm can be made more specific, by enabling/disabling certain features. For example:
- Enabling/Disabling the SACK mechanism (that is, only considering one loss event per RTT). The default should be enabled.
- Per loss event: Resetting the cwnd to 1 vs. resetting it to the sstresh (default: reset to sstresh).
- Consider gaps in sequence numbers (receiving out-of-order data packets) as congestion event? The default should be disabled.
Updated by Davide Pesavento over 8 years ago
Klaus Schneider wrote:
"Consumer takes one RTT sample per RTT;"
Why is that?
I think that's the traditional TCP behavior. See also https://tools.ietf.org/html/rfc6298#section-3 and https://tools.ietf.org/html/rfc7323#section-4.2
slow start threshold, initial value: 200;
200 might be too low, if the optimal congestion window is higher than that. The only requirement for the sstreshhold is that it is "sufficiently high" and there is no harm in setting it to 100.000 or even to MAX_INTEGER.
Agreed, ssthresh
should start from MAX_INT.
Updated by Weiwei Liu over 8 years ago
Thanks for all your comments and suggestions.
I just uploaded the code here: http://gerrit.named-data.net/2872. It's basically Shuo's original implementation of the design(except initSsthresh is modified to MAX_INT). And command line options for congestion control have not been added yet, we are still working on it.
Updated by Anonymous over 8 years ago
Davide Pesavento wrote:
Klaus Schneider wrote:"Consumer takes one RTT sample per RTT;"
Why is that?
I think that's the traditional TCP behavior. See also https://tools.ietf.org/html/rfc6298#section-3 and https://tools.ietf.org/html/rfc7323#section-4.2
Wouldn't it be more accurate to take the RTT of each data packet? What's the benefit of only using one per RTT?
Updated by Weiwei Liu over 8 years ago
In rfc7323, it says research has shown that "taking additional RTT samples within each RTT has little effect on the ultimate retransmission timeout value". But in Shuo's code, it seems that he actually takes RTT sample per data packet(retransmitted segments excluded) instead of per RTT. I'll confirm this with him.
Klaus Schneider wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:"Consumer takes one RTT sample per RTT;"
Why is that?
I think that's the traditional TCP behavior. See also https://tools.ietf.org/html/rfc6298#section-3 and https://tools.ietf.org/html/rfc7323#section-4.2
Wouldn't it be more accurate to take the RTT of each data packet? What's the benefit of only using one per RTT?
Updated by Weiwei Liu over 8 years ago
I was wrong, Shuo actually took care of that inside the rtt-estimator. So actually he does take one rtt sample per RTT. And in fact, he tried to take one rtt sample per packet, but somehow the performance was bad especially when transferring large file. So he switched to taking one rtt sample per rtt.
Weiwei Liu wrote:
In rfc7323, it says research has shown that "taking additional RTT samples within each RTT has little effect on the ultimate retransmission timeout value". But in Shuo's code, it seems that he actually takes RTT sample per data packet(retransmitted segments excluded) instead of per RTT. I'll confirm this with him.
Klaus Schneider wrote:
Davide Pesavento wrote:
Klaus Schneider wrote:"Consumer takes one RTT sample per RTT;"
Why is that?
I think that's the traditional TCP behavior. See also https://tools.ietf.org/html/rfc6298#section-3 and https://tools.ietf.org/html/rfc7323#section-4.2
Wouldn't it be more accurate to take the RTT of each data packet? What's the benefit of only using one per RTT?
Updated by Anonymous over 8 years ago
"And in fact, he tried to take one rtt sample per packet, but somehow the performance was bad especially when transferring large file."
I don't think that the performance should suffer by taking one rtt sample per packet. It's just one operation (for the exponential moving average) per packet and doesn't require any memory.
Can we try to figure out the root cause? Or implement both with a parameter to choose the rtt sampling method.
Updated by Anonymous over 8 years ago
- Subject changed from ndncatchunks: TCP-like congestion control to ndncatchunks: AIMD congestion control
Changed title according to Beichuan's request.
Updated by Anonymous over 8 years ago
- File NDN_SACK_full.pdf NDN_SACK_full.pdf added
Weiwei Liu wrote:
I was wrong, Shuo actually took care of that inside the rtt-estimator. So actually he does take one rtt sample per RTT. And in fact, he tried to take one rtt sample per packet, but somehow the performance was bad especially when transferring large file. So he switched to taking one rtt sample per rtt.
Here is Shuo's full report and my comments on why the performance might have been bad. I think the problem is not that the algorithm takes multiple rtts per sample.
First page:
The initial value of the cwnd should be 1 not 0, right?
The initial sstresh should be MAX_INTEGER
The discussion "A packet loss event happens" is a bit unclear. It should be made clear that "loss event" means "window decrease at the consumer" and not that a single packet was lost in the network.
On each "loss event" the congestion window should be decreased to cwnd/2, not to 1! Decreasing it to 1 makes the whole SACK adaptation pointless (the only reason we use it, is to prevent overly strong reductions of the cwnd). And that leads to poor performance.
Results:
In the linear topology, Design 2, you reduce the cwnd to 1 on each loss event. This might explain much more of the poor performance than the RTT estimation.
In the dumbbell topology, you got Design 2 right, and it performs much better compared to the linear topology, and almost as good as Design 3.
Moreover, Design 3 creates a much higher queuing delay. The average total delay for Design 3 is over 100 ms, while for Design 2 it is only about 60 ms.
Thus, "taking one RTT sample per RTT" creates a much higher delay at a small increase in throughput. It's easy to get a higher throughput when you constantly keep the router queues full.
Moreover, I think "taking one RTT sample" vs. "taking multiple samples per RTT" is not the right knob to adjust. When the typical RTO estimation is the problem, I would change the RTO setting (currently meanRtt + 4*VarRTT) directly or change the way you calculate the RTT variance (as it seems to be too low for sampling every packet).
Updated by Shuo Yang over 8 years ago
Regarding to Klaus' question about RTT sampling frequency, I've read corresponding RFC (7323 & 6298) and one related paper: on estimating end-to-end network path properties (http://www.icir.org/mallman/pubs/AP01/AP01.pdf).
I quoted the corresponding explanations below:
The [RFC6298] RTT estimator has weighting factors,
alpha and beta, based on an implicit assumption that at most one RTTM
will be sampled per RTT. (https://tools.ietf.org/html/rfc7323#section-4.2)[Ludwig00] and [Floyd05] have highlighted the problem that an
unmodified RTO calculation, which is updated with per-packet RTT
samples, will truncate the path history too soon. This can lead to
an increase in spurious retransmissions, when the path properties
vary in the order of a few RTTs, but a high number of RTT samples are
taken on a much shorter timescale. (https://tools.ietf.org/html/rfc7323#section-4.2)As we found for [WS95], timing every packet makes little difference over
timing only one packet per RTT, even though by timing every packet we run
many more measurements through the EWMAs per unit time.
This in turn causes the EWMAs to adapt SRTT and RTTVAR more quickly to
current network conditions, and to more rapidly lose memory of conditions
further in the past. (http://www.icir.org/mallman/pubs/AP01/AP01.pdf)
Also the paper concluded that the performance of the RTO estimators
is dominated by their minimum values, and to a lesser extent,
the timer granularity, while being virtually unaffected by how
often round-trip time measurements are made.
In https://tools.ietf.org/html/rfc7323#appendix-G, it gives a modified RTO calculation
for taking multiple RTT per window. I think it'd interesting to try it out in our application.
Updated by Anonymous over 8 years ago
Thanks for looking up the references. Yeah, I think we should implement the Appendix G as Shuo said:
Appendix G. RTO Calculation Modification
Taking multiple RTT samples per window would shorten the history
calculated by the RTO mechanism in [RFC6298], and the below algorithm
aims to maintain a similar history as originally intended by
[RFC6298].
It is roughly known how many samples a congestion window worth of
data will yield, not accounting for ACK compression, and ACK losses.
Such events will result in more history of the path being reflected
in the final value for RTO, and are uncritical. This modification
will ensure that a similar amount of time is taken into account for
the RTO estimation, regardless of how many samples are taken per
window:
ExpectedSamples = ceiling(FlightSize / (SMSS * 2))
alpha' = alpha / ExpectedSamples
beta' = beta / ExpectedSamples
Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".
Instead of using alpha and beta in the algorithm of [RFC6298], use
alpha' and beta' instead:
RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|
SRTT <- (1 - alpha') * SRTT + alpha' * R'
(for each sample R')
Updated by Weiwei Liu over 8 years ago
With the latest code (change https://gerrit.named-data.net/#/c/2872/), a "Segmentation fault" error occurs when running ndncatchunks. Could someone confirm this?
Updated by Junxiao Shi over 8 years ago
Latest code doesn't even compile due to #3665-7.
Updated by Weiwei Liu over 8 years ago
Yes, I noticed that.. But it should work with an old ndn-cxx version, right?
Updated by Weiwei Liu over 8 years ago
I think the problem is the following code in consumer.cpp, discover should be replaced with m_discover.
m_discover = std::move(discover);
m_pipeline = std::move(pipeline);
m_nextToPrint = 0;
m_bufferedData.clear();
discover->onDiscoverySuccess.connect(bind(&Consumer::startPipeline, this, _1));
discover->onDiscoveryFailure.connect(bind(&Consumer::onFailure, this, _1));
Updated by Davide Pesavento over 8 years ago
Weiwei Liu wrote:
I think the problem is the following code in consumer.cpp, discover should be replaced with m_discover.
Yes, that's correct. Can you submit a fix?
Consumer::run
is not unit-tested at all. Could you add a (very simple) test case for it as well?
Updated by Weiwei Liu over 8 years ago
Sure. In the test case, we just need to simply call consumer.run() right? And what can we check?
Davide Pesavento wrote:
Weiwei Liu wrote:
I think the problem is the following code in consumer.cpp, discover should be replaced with m_discover.
Yes, that's correct. Can you submit a fix?
Consumer::run
is not unit-tested at all. Could you add a (very simple) test case for it as well?
Updated by Weiwei Liu over 8 years ago
I submitted a fix for cosnumer::run:
https://gerrit.named-data.net/#/c/2957/
Updated by Weiwei Liu over 8 years ago
- Status changed from In Progress to Code review
Updated by Weiwei Liu over 8 years ago
backport c++ 17 std::clamp
https://gerrit.named-data.net/#/c/3147/
Updated by Junxiao Shi about 8 years ago
- Status changed from Code review to Closed
- % Done changed from 0 to 100