Bug #4604

ndncatchunks: impossible computed RTT values

Added by Nicholas Gordon almost 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Ryan Wickman
Start date:
Due date:
% Done:

100%

Estimated time:

Description

When using ndncatchunks, we got RTT values that couldn't possibly be correct, such as max RTT values that exceeded the total transfer time of the file. Initial suspects are a buffer under- or overflow, or perhaps an erroneous division by zero.


Related issues 1 (0 open, 1 closed)

Has duplicate ndn-tools - Bug #4745: ndncatchunks terminates without receiving every data chunk (Closed, Ryan Wickman)

Actions #1

Updated by Nicholas Gordon almost 6 years ago

At this time, further investigation is needed to establish reproducibility.

Actions #2

Updated by Nicholas Gordon almost 6 years ago

  • Assignee set to Ryan Wickman
Actions #3

Updated by Ryan Wickman almost 6 years ago

PipelineInterestAimd::SegmentState, an enum class, is used to keep track of the current state of each segment; however, this bookkeeping gets muddled when retransmissions come into play.
The error occurs when three Data packets with the same segment number arrive at PipelineInterestAimd::handleData() and the following events occur
(prior to this, assume that all three of these segments have timed out):

  • The first arrives; since its state is SegmentState::Retransmitted, its RTT will not be added to the measurements, and the state is then changed to SegmentState::RetxReceived.

  • The second arrives; due to the segment's state, the segment is erased from the unordered_map (PipelineInterestAimd::m_segmentInfo) and handleData() returns without doing anything else.

  • The third arrives; since the segment was already erased from the unordered_map, handleData() outputs an invalid RTT to the console and adds this RTT to the measurements (this is what causes the erroneously displayed min/avg/max RTTs).

Part of the problem is due to the unordered_map (m_segmentInfo) itself. When operator[] is used to access an element that isn't present, rather than report an error, it inserts and returns a value-initialized object of the mapped type. So when the third segment arrives at PipelineInterestAimd::handleData(), the lookup of the already erased segment returns such an empty object, and when its state is checked it will be 0 (i.e., SegmentState::FirstTimeSent).
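
The following is a minimal standalone sketch of this operator[] behavior (not the actual ndncatchunks code; the SegmentState enumerators are assumed from the description above):

  // Standalone sketch: std::unordered_map::operator[] silently inserts a
  // value-initialized element for a missing key instead of failing.
  #include <cstdint>
  #include <iostream>
  #include <unordered_map>

  enum class SegmentState { FirstTimeSent, InRetxQueue, Retransmitted, RetxReceived };

  struct SegmentInfo {
    SegmentState state; // no initializer: value-initialization zeroes this
    double rtt;
  };

  int main()
  {
    std::unordered_map<uint64_t, SegmentInfo> segmentInfo;
    segmentInfo.erase(1); // the second arrival already erased segment 1

    // Third arrival: operator[] inserts a fresh, value-initialized SegmentInfo...
    SegmentInfo& segInfo = segmentInfo[1];

    // ...whose state compares equal to FirstTimeSent (enumerator value 0),
    // so the bogus RTT sample would pass the state check and be recorded.
    std::cout << (segInfo.state == SegmentState::FirstTimeSent) << '\n'; // prints 1
    std::cout << segmentInfo.size() << '\n'; // prints 1: an element was inserted
  }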

Actions #4

Updated by Ryan Wickman almost 6 years ago

I propose the following fixes in PipelineInterestAimd::handleData() to resolve the issue.

To prevent the invalid RTTs from being displayed on the screen:

Before:

  SegmentInfo& segInfo = m_segmentInfo[recvSegNo];

After:

  if (m_segmentInfo.find(recvSegNo) == m_segmentInfo.end())
    return;

  SegmentInfo& segInfo = m_segmentInfo[recvSegNo];

To prevent retransmitted segments from being added as a measurement:

Before:

  if (segInfo.state == SegmentState::FirstTimeSent ||
      segInfo.state == SegmentState::InRetxQueue) {

After:

  if ((segInfo.state == SegmentState::FirstTimeSent ||
       segInfo.state == SegmentState::InRetxQueue) &&
      m_retxCount.find(recvSegNo) == m_retxCount.end()) {
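
For reference, here is a self-contained sketch (toy types, not the actual patch; the helper name canTakeRttMeasurement is made up for illustration) of how the two guards could be combined, reusing the iterator returned by find() so the map is probed only once:

  #include <cstdint>
  #include <iostream>
  #include <unordered_map>

  enum class SegmentState { FirstTimeSent, InRetxQueue, Retransmitted, RetxReceived };
  struct SegmentInfo { SegmentState state; };

  std::unordered_map<uint64_t, SegmentInfo> m_segmentInfo;
  std::unordered_map<uint64_t, int> m_retxCount;

  // Returns true if the RTT sample for this segment may be used as a measurement.
  bool canTakeRttMeasurement(uint64_t recvSegNo)
  {
    auto it = m_segmentInfo.find(recvSegNo);
    if (it == m_segmentInfo.end())
      return false; // segment already erased (e.g., a late duplicate): ignore it

    const SegmentInfo& segInfo = it->second;
    return (segInfo.state == SegmentState::FirstTimeSent ||
            segInfo.state == SegmentState::InRetxQueue) &&
           m_retxCount.find(recvSegNo) == m_retxCount.end(); // never retransmitted
  }

  int main()
  {
    m_segmentInfo[1] = {SegmentState::FirstTimeSent};
    std::cout << canTakeRttMeasurement(1) << ' ' << canTakeRttMeasurement(2) << '\n'; // prints "1 0"
  }
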
Actions #5

Updated by Ryan Wickman almost 6 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Anonymous almost 6 years ago

Sounds good. I would also suggest replacing operator[] with the map::at function (http://www.cplusplus.com/reference/map/map/at/):

  SegmentInfo& segInfo = m_segmentInfo.at(recvSegNo);

This is much safer, since it throws an exception if the element doesn't exist (rather than creating a new element).
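
A standalone illustration of the difference (not tied to the pipeline code):

  #include <iostream>
  #include <stdexcept>
  #include <unordered_map>

  int main()
  {
    std::unordered_map<int, int> m;
    try {
      m.at(42); // key absent: at() throws instead of inserting
    }
    catch (const std::out_of_range& e) {
      std::cout << "missing key detected: " << e.what() << '\n';
    }
    std::cout << "size = " << m.size() << '\n'; // prints 0: nothing was inserted
  }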

Feel free to push the change to Gerrit.

Actions #7

Updated by Davide Pesavento almost 6 years ago

Ryan Wickman wrote:

The error occurs when three Data segments, with the same segment number, arrive at PipelineInterestAimd::handleData()
...
(Prior to this assume that all three of these segments have timed out)

How is this possible then? If the segments have timed out, they shouldn't trigger an invocation of handleData() anymore.

Actions #8

Updated by Ryan Wickman almost 6 years ago

Here is a log of the exact events that transpire:

Requesting segment #1
segNo: #1 segInfo.state = SegmentState::FirstTimeSent;
m_segmentInfo.at(segNo).state = SegmentState::InRetxQueue: #1
enqueueForRetransmission: #1
Retransmitting segment #1
seg: #1 segInfo.state = SegmentState::Retransmitted;
m_segmentInfo.at(segNo).state = SegmentState::InRetxQueue: #1
enqueueForRetransmission: #1
Retransmitting segment #1
# of retries for segment #1 is 2
seg: #1 segInfo.state = SegmentState::Retransmitted;
m_segmentInfo.at(segNo).state = SegmentState::InRetxQueue: #1
enqueueForRetransmission: #1
Retransmitting segment #1
# of retries for segment #1 is 3
seg: #1 segInfo.state = SegmentState::Retransmitted;
CALLING handleData for recvSegNo: #1
Received segment #1, rtt=21.0426ms, rto=1000ms
Seg : #1 segInfo.state = SegmentState::RetxReceived
CALLING handleData for recvSegNo: #1
Erasing seg: #1
CALLING handleData for recvSegNo: #1
Received segment #1, rtt=2.0921e+07ms, rto=0ms
Measurement taken for recvSegNo: #1

It can be seen that this does, in fact, occur. As to why, I believe it is because the Interest lifetime is set to a value greater than the time required for a retransmission to occur.

How the Interest lifetime is set:

interest.setInterestLifetime(m_options.interestLifetime)

Here m_options.interestLifetime has the value of ndn::DEFAULT_INTEREST_LIFETIME (4 seconds).

How it is checked and enqueued for retransmission:

  if (timeElapsed.count() > segInfo.rto.count()) { // timer expired?
    hasTimeout = true;
    enqueueForRetransmission(entry.first);
  }
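
As a back-of-envelope check of this timing argument (the 1000 ms RTO is taken from the log above; ndn::DEFAULT_INTEREST_LIFETIME is 4 seconds in ndn-cxx):

  #include <chrono>
  #include <iostream>

  int main()
  {
    using namespace std::chrono;
    const milliseconds interestLifetime = 4000ms; // ndn::DEFAULT_INTEREST_LIFETIME
    const milliseconds rto = 1000ms;              // rto=1000ms in the log above
    // Several RTO-driven retransmissions fit inside one Interest lifetime, so
    // (under this note's hypothesis) Data for each can still reach handleData().
    std::cout << interestLifetime / rto << " RTO windows per Interest lifetime\n"; // prints 4
  }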
Actions #9

Updated by Davide Pesavento almost 6 years ago

Oh, you're talking about the RTO timeout, which does not cancel the pending Interest (i.e. Data can still arrive). I thought you were talking about the Interest timeout, which causes the invocation of handleLifetimeExpiration instead of handleData.

Actions #10

Updated by Davide Pesavento almost 6 years ago

Ryan Wickman wrote:

To prevent retransmitted segments from being added as a measurement:

Before:

  if (segInfo.state == SegmentState::FirstTimeSent ||
      segInfo.state == SegmentState::InRetxQueue) {

After:

  if ((segInfo.state == SegmentState::FirstTimeSent ||
       segInfo.state == SegmentState::InRetxQueue) &&
      m_retxCount.find(recvSegNo) == m_retxCount.end()) {

Yes, I pointed this out in https://redmine.named-data.net/issues/3902#note-20 and note-22; I guess nobody has fixed it since then...

Actions #11

Updated by Ryan Wickman almost 6 years ago

  • Status changed from In Progress to Code review
Actions #12

Updated by Davide Pesavento over 5 years ago

  • Subject changed from chunks: impossible computed RTT values to ndncatchunks: impossible computed RTT values
Actions #13

Updated by Davide Pesavento over 5 years ago

I had another look at this and I noticed that sendInterest calls removePendingInterest on the old (timed-out) Interest before sending out a new Interest for that segment. So I don't understand how the scenario outlined in note-3 can happen: handleData cannot be called more than once for any segment number, because there can be at most one pending Interest for any given segment.
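
A toy model (plain C++, not the ndn-cxx API) of the invariant being described: because re-expressing a segment first removes the old pending Interest, a segment can never have more than one pending Interest, and hence at most one handleData() invocation:

  #include <cassert>
  #include <cstdint>
  #include <iostream>
  #include <unordered_map>

  int main()
  {
    std::unordered_map<uint64_t, int> pending; // segment number -> pending Interest id
    int nextId = 0;

    auto sendInterest = [&](uint64_t segNo, bool isRetx) {
      if (isRetx)
        pending.erase(segNo);           // removePendingInterest on the old Interest
      pending.emplace(segNo, nextId++); // expressInterest for the new one
    };

    sendInterest(1, false); // initial expression
    sendInterest(1, true);  // 1st retransmission after RTO expiry
    sendInterest(1, true);  // 2nd retransmission
    assert(pending.count(1) == 1); // invariant: at most one pending Interest
    std::cout << "pending Interests for segment 1: " << pending.count(1) << '\n';
  }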

Actions #14

Updated by Davide Pesavento over 5 years ago

  • Status changed from Code review to Closed
  • % Done changed from 0 to 100
Actions #15

Updated by Davide Pesavento over 5 years ago

  • Has duplicate Bug #4745: ndncatchunks terminates without receiving every data chunk added