Bug #4551 (Closed)

catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted

Added by Chavoosh Ghasemi about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Start date:
03/20/2018
Due date:
% Done:

100%

Estimated time:
1.00 h

Description

I have ported the catchunks and putchunks tools to ndnSIM, and after running them with different bandwidths I noticed that for small bandwidths the printSummary function prints odd output, like the following:

All segments have been received.
Time elapsed: 591.6 milliseconds
Total # of segments received: 9
Total size: 35.225kB
Goodput: 476.335363 kbit/s
Total # of packet loss events: 2
Packet loss rate: 0.222222
Total # of retransmitted segments: 10
Total # of received congestion marks: 0
RTT min/avg/max = 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000/0.000/0.000 ms

As we can see, the value of minRTT is enormous. This is because we initialize minRTT with numeric_limits<double>::max() and apparently it never changes, so printSummary ends up printing the maximum value of the double type.
In any case, I am wondering why minRTT is never updated. Note also that the above output reports some packet loss, even though all segments were received successfully (i.e. 9 out of 9).
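
For reference, here is a minimal sketch (not the actual ndn-tools code; names and formatting are illustrative) of how such output arises when no RTT sample ever updates the statistics:

#include <iomanip>
#include <iostream>
#include <limits>

// Hypothetical stand-in for the pipeline's RTT bookkeeping, assuming minRtt
// is seeded with the largest representable double, as described above.
struct RttStats
{
  double minRtt = std::numeric_limits<double>::max(); // never updated if no sample arrives
  double avgRtt = 0.0;
  double maxRtt = 0.0;
};

int
main()
{
  RttStats stats; // no measurement was ever added, e.g. every segment was retransmitted
  std::cout << "RTT min/avg/max = " << std::fixed << std::setprecision(3)
            << stats.minRtt << "/" << stats.avgRtt << "/" << stats.maxRtt << " ms\n";
  // prints the ~309-digit value of numeric_limits<double>::max(), as in the report
}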

Actions #1

Updated by Chavoosh Ghasemi about 6 years ago

  • Description updated (diff)
Actions #2

Updated by Chavoosh Ghasemi about 6 years ago

  • Description updated (diff)
Actions #3

Updated by Anonymous about 6 years ago

None of the RTT measurements is updated, so it seems that the function "m_rttEstimator.addMeasurement" is never called.

Probably because all of your packets needed at least one retransmission, for which RTTs are (rightfully) not sampled.

Now the question is: Why do all of your packets experience at least one retransmission?
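
Skipping RTT samples for retransmitted packets is the usual Karn's-algorithm behavior. Roughly (a sketch only; the identifiers are assumptions, not the exact ndn-tools ones), the pipeline samples RTT only for segments satisfied by their first transmission:

#include <cstdint>
#include <map>

// Illustrative RTT estimator: a real one would also update min/avg/max and the RTO.
class RttEstimator
{
public:
  void
  addMeasurement(double rttMs)
  {
    m_lastRtt = rttMs;
    ++m_nSamples;
  }

  double m_lastRtt = 0.0;
  int m_nSamples = 0;
};

class Pipeline
{
public:
  void
  handleData(uint64_t segNo, double rttMs)
  {
    // Karn's algorithm: a retransmitted segment gives an ambiguous RTT sample,
    // so it is skipped. If *every* segment needs a retransmission,
    // addMeasurement() is never called and the summary stats keep their
    // initial values.
    if (m_retxCount[segNo] == 0) {
      m_rttEstimator.addMeasurement(rttMs);
    }
  }

  std::map<uint64_t, int> m_retxCount; // retransmissions per segment
  RttEstimator m_rttEstimator;
};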

Actions #4

Updated by Junxiao Shi about 6 years ago

  • Tracker changed from Task to Bug
  • Subject changed from Bug in "printSummary" function of chunks tools' pipeline to catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted

Why do all of your packets experience at least one retransmission?

With best-route strategy, if the first nexthop is bad (does not lead to Data) and the second nexthop is good, every chunk would need a retransmission.

With a better strategy, this can still happen if the file has only one chunk.

Actions #5

Updated by Chavoosh Ghasemi about 6 years ago

With best-route strategy, if the first nexthop is bad (does not lead to Data) and the second nexthop is good, every chunk would need a retransmission.

The scenario I tested ndnchunks on is very simple: three nodes, where node_1 is connected to node_2 and node_2 is connected to node_3, and the bandwidth and delay of the links are 1 Mbps and 10 ms, respectively. Note that node_1 is the consumer and node_3 is the producer. In this scenario there is only one nexthop (which does lead to the Data), so I think the reason is something else.

If the problem were packet drops, we should have seen more loss events (not only 2). It is clear that all the first Interests experience a timeout, so the second Interest fetches the Data (possibly from the intermediate node, i.e. node_2). Maybe changing the timeout value can mitigate this problem; however, even if that is the case, we should still ask what the summary ought to print in scenarios like this.

FYI, increasing the bandwidth to 10 Mbps mitigates the problem.
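
For what it's worth, a rough ndnSIM-style sketch of that topology is below (assuming ndnSIM 2.x helpers; the "ChunksConsumer"/"ChunksProducer" application names are placeholders for the ported tools, which are not part of stock ndnSIM):

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/point-to-point-module.h"
#include "ns3/ndnSIM-module.h"

using namespace ns3;

int
main(int argc, char* argv[])
{
  // Linear topology: node_1 (consumer) -- node_2 -- node_3 (producer)
  NodeContainer nodes;
  nodes.Create(3);

  PointToPointHelper p2p;
  p2p.SetDeviceAttribute("DataRate", StringValue("1Mbps")); // the low-bandwidth case from this report
  p2p.SetChannelAttribute("Delay", StringValue("10ms"));
  p2p.Install(nodes.Get(0), nodes.Get(1));
  p2p.Install(nodes.Get(1), nodes.Get(2));

  ndn::StackHelper ndnHelper;
  ndnHelper.InstallAll();

  // Placeholder applications standing in for the ported catchunks/putchunks
  ndn::AppHelper consumer("ChunksConsumer");
  consumer.SetPrefix("/chunks");
  consumer.Install(nodes.Get(0));

  ndn::AppHelper producer("ChunksProducer");
  producer.SetPrefix("/chunks");
  producer.Install(nodes.Get(2));

  ndn::GlobalRoutingHelper routingHelper;
  routingHelper.InstallAll();
  routingHelper.AddOrigins("/chunks", nodes.Get(2));
  ndn::GlobalRoutingHelper::CalculateRoutes();

  Simulator::Stop(Seconds(20.0));
  Simulator::Run();
  Simulator::Destroy();
  return 0;
}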

Actions #6

Updated by Anonymous about 6 years ago

"Loss events" is different from packet drops. The actual number of dropped packets is 10 (the same as retx).

Also, yes, you should increase the minimum RTO (say, to 2 seconds) and try again.

It might be that your low bandwidth leads to high queuing delays, which in turn cause the timeouts.
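
For concreteness, "increase the minimum RTO" amounts to raising the floor that the computed RTO is clamped to; a hypothetical sketch (the option and field names are illustrative, not the catchunks AIMD ones):

#include <algorithm>
#include <chrono>

// Hypothetical RTO options with a configurable floor and ceiling.
struct RtoOptions
{
  std::chrono::milliseconds minRto{2000};  // the suggested 2-second floor
  std::chrono::milliseconds maxRto{60000};
};

// The sRTT + k*RTTVAR computation happens elsewhere; here we only clamp the
// result, so a higher minRto prevents spurious timeouts under queuing delay.
std::chrono::milliseconds
clampRto(std::chrono::milliseconds computedRto, const RtoOptions& opts)
{
  return std::clamp(computedRto, opts.minRto, opts.maxRto);
}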

Actions #7

Updated by Anonymous about 6 years ago

BTW, what's your BW in the original measurement?

Actions #8

Updated by Chavoosh Ghasemi about 6 years ago

Klaus Schneider wrote:

BTW, what's your BW in the original measurement?

In the scenario whose output is shown above, the links are 1 Mbps with 10 ms delay. BTW, what should we do in scenarios like this? It does not make sense to print the maximum of the double type as min_rtt.

Actions #9

Updated by Anonymous about 6 years ago

Chavoosh Ghasemi wrote:

Klaus Schneider wrote:

BTW, what's your BW in the original measurement?

In the scenario whose output is shown above, the links are 1 Mbps with 10 ms delay. BTW, what should we do in scenarios like this? It does not make sense to print the maximum of the double type as min_rtt.

Well, we could print out an error message saying that there were no useful RTT measurements.

But more importantly you should fix your setup to avoid getting only timeouts in the first place.

Did changing the rto fix the problem?

Actions #10

Updated by Chavoosh Ghasemi about 6 years ago

Did changing the rto fix the problem?

No, it did not solve the problem. I've changed some other parameters, but the problem does not go away unless the bandwidth is increased.

Actions #11

Updated by Anonymous about 6 years ago

Chavoosh Ghasemi wrote:

Did changing the rto fix the problem?

No, it did not solve the problem. I've changed some other parameters, but the problem does not go away unless the bandwidth is increased.

So what's the smallest BW where it works, and the highest BW where it doesn't work?

Also, please post your catchunks commands and the output.

Actions #12

Updated by Anonymous about 6 years ago

Chavoosh just told me that the problem was that the PIT lifetime is fixed at 100ms.

Is there any way to increase that?

Actions #13

Updated by Anonymous about 6 years ago

  • Subject changed from catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted to Initial PIT Lifetime too low?
Actions #14

Updated by Junxiao Shi about 6 years ago

  • Subject changed from Initial PIT Lifetime too low? to catchunks: PipelineInterestsAimd::printSummary shows incorrect RTTs if all packets are retransmitted

Please do not change the issue title. The unusually large number is a real bug and must be fixed.
If you have concerns about the PIT lifetime, ask a question on the nfd-dev mailing list or report a bug in the NFD (not ndn-tools) issue tracker.

Actions #15

Updated by Anonymous about 6 years ago

Okay. It would be good for Chavoosh to post a full problem description (regarding the PIT lifetime) and the steps he took to fix it.

Actions #16

Updated by Chavoosh Ghasemi about 6 years ago

If you have concerns about the PIT lifetime, ask a question on the nfd-dev mailing list or report a bug in the NFD (not ndn-tools) issue tracker.

Although this problem is related to NFD, I do not think it should be tracked on the nfd-dev mailing list, since the default Interest lifetime is set in ndn-cxx (see ndn-cxx/src/interest.hpp:38); see also #2202. The default Interest lifetime is 4000 milliseconds. Previously I used 100 milliseconds, which caused at least one retransmission for every Interest packet. I increased it to 1000 milliseconds and the problem was solved. However, after checking the ndn-cxx source code, I realized the default should be 4000 milliseconds (which works properly).
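
For reference, a minimal ndn-cxx sketch of the setting involved (the helper function is illustrative; only Interest::setInterestLifetime and the DEFAULT_INTEREST_LIFETIME constant in interest.hpp come from ndn-cxx):

#include <ndn-cxx/interest.hpp>

// DEFAULT_INTEREST_LIFETIME (declared in ndn-cxx's interest.hpp) is 4000 ms.
// Overriding it with a value shorter than the end-to-end fetch delay, as was
// done in the failing runs, makes every first transmission time out.
void
configureLifetime(ndn::Interest& interest, bool useShortLifetime)
{
  if (useShortLifetime) {
    interest.setInterestLifetime(ndn::time::milliseconds(100)); // the problematic value
  }
  // otherwise the default of 4000 ms remains in effect
}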

For the scenario where we have no RTT measurements at all, we should print a warning message (e.g. "No RTT measurement is available") instead of the weird output shown in the problem description. If there are no objections, I'll change the source code and push it to Gerrit.
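
A possible shape for that change (a sketch only, not the actual Gerrit patch; it assumes the summary code can ask how many RTT samples were recorded):

#include <cstddef>
#include <iostream>

// Hypothetical summary printer: if no RTT sample was ever taken (e.g. every
// segment was retransmitted), print a notice instead of the sentinel values.
void
printRttSummary(std::ostream& os, std::size_t nRttSamples,
                double minRtt, double avgRtt, double maxRtt)
{
  if (nRttSamples == 0) {
    os << "No RTT measurement is available (all segments were retransmitted)\n";
    return;
  }
  os << "RTT min/avg/max = " << minRtt << "/" << avgRtt << "/" << maxRtt << " ms\n";
}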

Actions #17

Updated by Davide Pesavento about 6 years ago

  • Description updated (diff)
Actions #18

Updated by Davide Pesavento about 6 years ago

Chavoosh Ghasemi wrote:

Although this problem is related to NFD, I do not think it should be tracked on the nfd-dev mailing list, since the default Interest lifetime is set in ndn-cxx (see ndn-cxx/src/interest.hpp:38); see also #2202. The default Interest lifetime is 4000 milliseconds. Previously I used 100 milliseconds, which caused at least one retransmission for every Interest packet. I increased it to 1000 milliseconds and the problem was solved. However, after checking the ndn-cxx source code, I realized the default should be 4000 milliseconds (which works properly).

So, just to be clear, there is no problem with PIT lifetime, you were just setting an Interest lifetime that was too low, correct?

Actions #19

Updated by Chavoosh Ghasemi about 6 years ago

So, just to be clear, there is no problem with PIT lifetime, you were just setting an Interest lifetime that was too low, correct?

Correct.

Actions #20

Updated by Anonymous about 6 years ago

Chavoosh Ghasemi wrote:

If you have concerns about the PIT lifetime, ask a question on the nfd-dev mailing list or report a bug in the NFD (not ndn-tools) issue tracker.

Although this problem is related to NFD, I do not think it should be tracked on the nfd-dev mailing list, since the default Interest lifetime is set in ndn-cxx (see ndn-cxx/src/interest.hpp:38); see also #2202. The default Interest lifetime is 4000 milliseconds. Previously I used 100 milliseconds, which caused at least one retransmission for every Interest packet. I increased it to 1000 milliseconds and the problem was solved. However, after checking the ndn-cxx source code, I realized the default should be 4000 milliseconds (which works properly).

If you had told us about this change, this would have been much easier to debug...

Also, I think your example is still an extreme corner case, since if the PIT lifetime was even lower, you wouldn't get any replies at all.

Somehow you hit the sweet spot where the Interest lifetime (100ms) is too short for the initial Interests, but long enough for the follow-up Interests that hit the cache.

This is quite unlikely, so it's also unlikely that users would see your proposed warning message. In most cases either the RTT would display correctly, or catchunks would return a timeout message without any summary at all: "Reached the maximum number of timeout retries".

Actions #21

Updated by Chavoosh Ghasemi about 6 years ago

  • Assignee set to Chavoosh Ghasemi
Actions #22

Updated by Davide Pesavento about 6 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100