Bug #4551 (Closed)

catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted

Added by Chavoosh Ghasemi about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Start date:
03/20/2018
Due date:
% Done:

100%

Estimated time:
1.00 h

Description

I have ported the catchunks and putchunks tools to ndnSIM, and after running them with different bandwidths I noticed that for small bandwidths the printSummary function prints odd output, like the following:

All segments have been received.
Time elapsed: 591.6 milliseconds
Total # of segments received: 9
Total size: 35.225kB
Goodput: 476.335363 kbit/s
Total # of packet loss events: 2
Packet loss rate: 0.222222
Total # of retransmitted segments: 10
Total # of received congestion marks: 0
RTT min/avg/max = 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000/0.000/0.000 ms

As we can see, the value of minRTT is enormous. This is because we initialize minRTT with numeric_limits<double>::max() and apparently it never changes, so printSummary ends up printing the maximum value of the double type.
In any case, I am wondering why minRTT is never updated. Note also that the above output reports some packet loss, even though all segments were received successfully (i.e. 9 out of 9).
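
For reference, here is a minimal sketch (not the actual ndn-tools code; names and formatting are illustrative) of how such output arises when no RTT sample ever updates the statistics:

#include <iomanip>
#include <iostream>
#include <limits>

// Hypothetical stand-in for the pipeline's RTT bookkeeping, assuming minRtt
// is seeded with the largest representable double, as described above.
struct RttStats
{
  double minRtt = std::numeric_limits<double>::max(); // never updated if no sample arrives
  double avgRtt = 0.0;
  double maxRtt = 0.0;
};

int
main()
{
  RttStats stats; // no measurement was ever added, e.g. every segment was retransmitted
  std::cout << "RTT min/avg/max = " << std::fixed << std::setprecision(3)
            << stats.minRtt << "/" << stats.avgRtt << "/" << stats.maxRtt << " ms\n";
  // prints the ~309-digit value of numeric_limits<double>::max(), as in the report
}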

Actions #1

Updated by Chavoosh Ghasemi about 6 years ago

  • Description updated (diff)
Actions #2

Updated by Chavoosh Ghasemi about 6 years ago

  • Description updated (diff)
Actions #3

Updated by Anonymous about 6 years ago

None of the RTT measurements is updated, so it seems that the function "m_rttEstimator.addMeasurement" is never called.

Probably because all of your packets needed at least one retransmission, for which RTTs are (rightfully) not sampled.

Now the question is: Why do all of your packets experience at least one retransmission?
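
Skipping RTT samples for retransmitted packets is the usual Karn's-algorithm behavior. Roughly (a sketch only; the identifiers are assumptions, not the exact ndn-tools ones), the pipeline samples RTT only for segments satisfied by their first transmission:

#include <cstdint>
#include <map>

// Illustrative RTT estimator: a real one would also update min/avg/max and the RTO.
class RttEstimator
{
public:
  void
  addMeasurement(double rttMs)
  {
    m_lastRtt = rttMs;
    ++m_nSamples;
  }

  double m_lastRtt = 0.0;
  int m_nSamples = 0;
};

class Pipeline
{
public:
  void
  handleData(uint64_t segNo, double rttMs)
  {
    // Karn's algorithm: a retransmitted segment gives an ambiguous RTT sample,
    // so it is skipped. If *every* segment needs a retransmission,
    // addMeasurement() is never called and the summary stats keep their
    // initial values.
    if (m_retxCount[segNo] == 0) {
      m_rttEstimator.addMeasurement(rttMs);
    }
  }

  std::map<uint64_t, int> m_retxCount; // retransmissions per segment
  RttEstimator m_rttEstimator;
};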

Actions #4

Updated by Junxiao Shi about 6 years ago

  • Tracker changed from Task to Bug
  • Subject changed from Bug in "printSummary" function of chunks tools' pipeline to catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted

Why do all of your packets experience at least one retransmission?

With best-route strategy, if the first nexthop is bad (does not lead to Data) and the second nexthop is good, every chunk would need a retransmission.

With a better strategy, this can still happen if the file has only one chunk.

Actions #5

Updated by Chavoosh Ghasemi about 6 years ago

With best-route strategy, if the first nexthop is bad (does not lead to Data) and the second nexthop is good, every chunk would need a retransmission.

The scenario I tested ndnchunks on is very simple: three nodes, where node_1 is connected to node_2 and node_2 is connected to node_3, and the bandwidth and delay of the links are 1 Mbps and 10 ms, respectively. Note that node_1 is the consumer and node_3 is the producer. In this scenario there is only one nexthop (which does lead to the Data), so I think the reason is something else.

If the problem were packet drops, we should have seen more loss events (not only 2). It is clear that all the first Interests experience a timeout, so the second Interest fetches the Data (possibly from the intermediate node, i.e. node_2). Maybe changing the timeout value can mitigate this problem; however, even if that is the case, we should still ask what the summary ought to print in scenarios like this.

FYI, increasing the bandwidth to 10 Mbps mitigates the problem.
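
For what it's worth, a rough ndnSIM-style sketch of that topology is below (assuming ndnSIM 2.x helpers; the "ChunksConsumer"/"ChunksProducer" application names are placeholders for the ported tools, which are not part of stock ndnSIM):

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/point-to-point-module.h"
#include "ns3/ndnSIM-module.h"

using namespace ns3;

int
main(int argc, char* argv[])
{
  // Linear topology: node_1 (consumer) -- node_2 -- node_3 (producer)
  NodeContainer nodes;
  nodes.Create(3);

  PointToPointHelper p2p;
  p2p.SetDeviceAttribute("DataRate", StringValue("1Mbps")); // the low-bandwidth case from this report
  p2p.SetChannelAttribute("Delay", StringValue("10ms"));
  p2p.Install(nodes.Get(0), nodes.Get(1));
  p2p.Install(nodes.Get(1), nodes.Get(2));

  ndn::StackHelper ndnHelper;
  ndnHelper.InstallAll();

  // Placeholder applications standing in for the ported catchunks/putchunks
  ndn::AppHelper consumer("ChunksConsumer");
  consumer.SetPrefix("/chunks");
  consumer.Install(nodes.Get(0));

  ndn::AppHelper producer("ChunksProducer");
  producer.SetPrefix("/chunks");
  producer.Install(nodes.Get(2));

  ndn::GlobalRoutingHelper routingHelper;
  routingHelper.InstallAll();
  routingHelper.AddOrigins("/chunks", nodes.Get(2));
  ndn::GlobalRoutingHelper::CalculateRoutes();

  Simulator::Stop(Seconds(20.0));
  Simulator::Run();
  Simulator::Destroy();
  return 0;
}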

Actions #6

Updated by Anonymous about 6 years ago

"Loss events" is different from packet drops. The actual number of dropped packets is 10 (the same as retx).

Also, yes, you should increase the minimum RTO (say, to 2 seconds) and try again.

It might be that your low bandwidth leads to high queuing delays, which in turn cause the timeouts.
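
For concreteness, "increase the minimum RTO" amounts to raising the floor that the computed RTO is clamped to; a hypothetical sketch (the option and field names are illustrative, not the catchunks AIMD ones):

#include <algorithm>
#include <chrono>

// Hypothetical RTO options with a configurable floor and ceiling.
struct RtoOptions
{
  std::chrono::milliseconds minRto{2000};  // the suggested 2-second floor
  std::chrono::milliseconds maxRto{60000};
};

// The sRTT + k*RTTVAR computation happens elsewhere; here we only clamp the
// result, so a higher minRto prevents spurious timeouts under queuing delay.
std::chrono::milliseconds
clampRto(std::chrono::milliseconds computedRto, const RtoOptions& opts)
{
  return std::clamp(computedRto, opts.minRto, opts.maxRto);
}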

Actions #7

Updated by Anonymous about 6 years ago

BTW, what's your BW in the original measurement?

Actions #8

Updated by Chavoosh Ghasemi about 6 years ago

Klaus Schneider wrote:

BTW, what's your BW in the original measurement?

In the scenario whose output is shown above, the links are 1 Mbps with 10 ms delay. BTW, what should we do in scenarios like this? It does not make sense to print the maximum of the double type as min_rtt.

Actions #9

Updated by Anonymous about 6 years ago

Chavoosh Ghasemi wrote:

Klaus Schneider wrote:

BTW, what's your BW in the original measurement?

In the scenario whose output is shown above, the links are 1 Mbps with 10 ms delay. BTW, what should we do in scenarios like this? It does not make sense to print the maximum of the double type as min_rtt.

Well, we could print out an error message saying that there were no useful RTT measurements.

But more importantly you should fix your setup to avoid getting only timeouts in the first place.

Did changing the rto fix the problem?

Actions #10

Updated by Chavoosh Ghasemi about 6 years ago

Did changing the rto fix the problem?

No, it did not solve the problem. I've changed some other parameters, but the problem does not go away unless the bandwidth is increased.

Actions #11

Updated by Anonymous about 6 years ago

Chavoosh Ghasemi wrote:

Did changing the rto fix the problem?

No, it did not solve the problem. I've changed some other parameters, but the problem does not go away unless the bandwidth is increased.

So what's the smallest BW where it works, and the highest BW where it doesn't work?

Also, please post your catchunks commands and the output.

Actions #12

Updated by Anonymous about 6 years ago

Chavoosh just told me that the problem was that the PIT lifetime is fixed at 100ms.

Is there any way to increase that?

Actions #13

Updated by Anonymous about 6 years ago

  • Subject changed from catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted to Initial PIT Lifetime too low?
Actions #14

Updated by Junxiao Shi about 6 years ago

  • Subject changed from Initial PIT Lifetime too low? to catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted

Please do not change the issue title. The unusually large number is a real bug and must be fixed.
If you have concerns about the PIT lifetime, ask a question on the nfd-dev mailing list or report a bug in the NFD (not ndn-tools) issue tracker.

Actions #15

Updated by Anonymous about 6 years ago

Okay. It would be good for Chavoosh to post a full problem description (regarding the PIT lifetime) and the steps he took to fix it.

Actions #16

Updated by Chavoosh Ghasemi about 6 years ago

If you have concerns about the PIT lifetime, ask a question on the nfd-dev mailing list or report a bug in the NFD (not ndn-tools) issue tracker.

Although this problem is related to NFD, I do not think it should be tracked on the nfd-dev mailing list, since the default Interest lifetime is set in ndn-cxx (see ndn-cxx/src/interest.hpp:38); see also #2202. The default Interest lifetime is 4000 milliseconds. Previously I used 100 milliseconds, which caused at least one retransmission for every Interest packet. I increased it to 1000 milliseconds and the problem was solved. However, after checking the ndn-cxx source code, I realized the default should be 4000 milliseconds (which works properly).
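
For reference, a minimal ndn-cxx sketch of the setting involved (the helper function is illustrative; only Interest::setInterestLifetime and the DEFAULT_INTEREST_LIFETIME constant in interest.hpp come from ndn-cxx):

#include <ndn-cxx/interest.hpp>

// DEFAULT_INTEREST_LIFETIME (declared in ndn-cxx's interest.hpp) is 4000 ms.
// Overriding it with a value shorter than the end-to-end fetch delay, as was
// done in the failing runs, makes every first transmission time out.
void
configureLifetime(ndn::Interest& interest, bool useShortLifetime)
{
  if (useShortLifetime) {
    interest.setInterestLifetime(ndn::time::milliseconds(100)); // the problematic value
  }
  // otherwise the default of 4000 ms remains in effect
}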

For the scenario where we have no RTT measurements at all, we should print a warning message (e.g. "No RTT measurement is available") instead of the weird output shown in the problem description. If there are no objections, I'll change the source code and push it to Gerrit.
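
A possible shape for that change (a sketch only, not the actual Gerrit patch; it assumes the summary code can ask how many RTT samples were recorded):

#include <cstddef>
#include <iostream>

// Hypothetical summary printer: if no RTT sample was ever taken (e.g. every
// segment was retransmitted), print a notice instead of the sentinel values.
void
printRttSummary(std::ostream& os, std::size_t nRttSamples,
                double minRtt, double avgRtt, double maxRtt)
{
  if (nRttSamples == 0) {
    os << "No RTT measurement is available (all segments were retransmitted)\n";
    return;
  }
  os << "RTT min/avg/max = " << minRtt << "/" << avgRtt << "/" << maxRtt << " ms\n";
}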

Actions #17

Updated by Davide Pesavento about 6 years ago

  • Description updated (diff)
Actions #18

Updated by Davide Pesavento about 6 years ago

Chavoosh Ghasemi wrote:

Although this problem is related to NFD, I do not think it should be tracked on the nfd-dev mailing list, since the default Interest lifetime is set in ndn-cxx (see ndn-cxx/src/interest.hpp:38); see also #2202. The default Interest lifetime is 4000 milliseconds. Previously I used 100 milliseconds, which caused at least one retransmission for every Interest packet. I increased it to 1000 milliseconds and the problem was solved. However, after checking the ndn-cxx source code, I realized the default should be 4000 milliseconds (which works properly).

So, just to be clear, there is no problem with PIT lifetime, you were just setting an Interest lifetime that was too low, correct?

Actions #19

Updated by Chavoosh Ghasemi about 6 years ago

So, just to be clear, there is no problem with PIT lifetime, you were just setting an Interest lifetime that was too low, correct?

Correct.

Actions #20

Updated by Anonymous about 6 years ago

Chavoosh Ghasemi wrote:

If you have concerns about the PIT lifetime, ask a question on the nfd-dev mailing list or report a bug in the NFD (not ndn-tools) issue tracker.

Although this problem is related to NFD, I do not think it should be tracked on the nfd-dev mailing list, since the default Interest lifetime is set in ndn-cxx (see ndn-cxx/src/interest.hpp:38); see also #2202. The default Interest lifetime is 4000 milliseconds. Previously I used 100 milliseconds, which caused at least one retransmission for every Interest packet. I increased it to 1000 milliseconds and the problem was solved. However, after checking the ndn-cxx source code, I realized the default should be 4000 milliseconds (which works properly).

If you had told us about this change, this would have been much easier to debug...

Also, I think your example is still an extreme corner case, since if the PIT lifetime was even lower, you wouldn't get any replies at all.

Somehow you hit the sweet spot where the Interest lifetime (100ms) is too short for the initial Interests, but long enough for the follow-up Interests that hit the cache.

This is quite unlikely, so it's also unlikely that users would see your proposed warning message. In most cases either the RTT would display correctly, or catchunks would return a timeout message without any summary at all: "Reached the maximum number of timeout retries".

Actions #21

Updated by Chavoosh Ghasemi about 6 years ago

  • Assignee set to Chavoosh Ghasemi
Actions #22

Updated by Davide Pesavento about 6 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100