Bug #4604

ndncatchunks: impossible computed RTT values

Added by Nicholas Gordon almost 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Ryan Wickman
Start date:
Due date:
% Done:

100%

Estimated time:

Description

When using ndncatchunks, we got RTT values that couldn't possibly be correct, such as max RTT values that exceeded the total transfer time of the file. Initial suspects are a buffer under- or overflow, or perhaps an erroneous division by zero.


Related issues 1 (0 open, 1 closed)

Has duplicate ndn-tools - Bug #4745: ndncatchunks terminates without receiving every data chunk (Closed, Ryan Wickman)

Actions #1

Updated by Nicholas Gordon almost 6 years ago

At this time, further investigation is needed to establish reproducibility.

Actions #2

Updated by Nicholas Gordon almost 6 years ago

  • Assignee set to Ryan Wickman
Actions #3

Updated by Ryan Wickman almost 6 years ago

PipelineInterestAimd::SegmentState, an enum class, is used to keep track of the current state of each segment; however, this bookkeeping gets muddled when retransmissions come into play.
The error occurs when three Data packets with the same segment number arrive at PipelineInterestAimd::handleData() and the following events occur
(prior to this, assume that all three of these segments have timed out):

  • The first arrives; since its state is SegmentState::Retransmitted, its RTT will not be added to the measurements, and the state is then changed to SegmentState::RetxReceived.

  • The second arrives; due to the segment's state, the segment is erased from the unordered_map (PipelineInterestAimd::m_segmentInfo) and handleData() returns without doing anything else.

  • The third arrives; since the segment was already erased from the unordered_map, handleData() outputs an invalid RTT to the console and adds this RTT to the measurements (this is what causes the erroneously displayed min/avg/max RTTs).

Part of the problem is due to the unordered_map (m_segmentInfo) itself. When operator[] is used to access an element that isn't present, rather than report an error, it inserts and returns a value-initialized object of the mapped type. So when the third segment arrives at PipelineInterestAimd::handleData(), the lookup of the already erased segment returns such an empty object, and when its state is checked it will be 0 (i.e., SegmentState::FirstTimeSent).
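
The following is a minimal standalone sketch of this operator[] behavior (not the actual ndncatchunks code; the SegmentState enumerators are assumed from the description above):

  // Standalone sketch: std::unordered_map::operator[] silently inserts a
  // value-initialized element for a missing key instead of failing.
  #include <cstdint>
  #include <iostream>
  #include <unordered_map>

  enum class SegmentState { FirstTimeSent, InRetxQueue, Retransmitted, RetxReceived };

  struct SegmentInfo {
    SegmentState state; // no initializer: value-initialization zeroes this
    double rtt;
  };

  int main()
  {
    std::unordered_map<uint64_t, SegmentInfo> segmentInfo;
    segmentInfo.erase(1); // the second arrival already erased segment 1

    // Third arrival: operator[] inserts a fresh, value-initialized SegmentInfo...
    SegmentInfo& segInfo = segmentInfo[1];

    // ...whose state compares equal to FirstTimeSent (enumerator value 0),
    // so the bogus RTT sample would pass the state check and be recorded.
    std::cout << (segInfo.state == SegmentState::FirstTimeSent) << '\n'; // prints 1
    std::cout << segmentInfo.size() << '\n'; // prints 1: an element was inserted
  }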

Actions #4

Updated by Ryan Wickman almost 6 years ago

I propose the following fixes in PipelineInterestAimd::handleData() to resolve the issue.

To prevent the invalid RTTs from being displayed on the screen:

Before:

  SegmentInfo& segInfo = m_segmentInfo[recvSegNo];

After:

  if (m_segmentInfo.find(recvSegNo) == m_segmentInfo.end())
    return;

  SegmentInfo& segInfo = m_segmentInfo[recvSegNo];

To prevent retransmitted segments from being added as a measurement:

Before:

  if (segInfo.state == SegmentState::FirstTimeSent ||
      segInfo.state == SegmentState::InRetxQueue) {

After:

  if ((segInfo.state == SegmentState::FirstTimeSent ||
       segInfo.state == SegmentState::InRetxQueue) &&
      m_retxCount.find(recvSegNo) == m_retxCount.end()) {
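
For reference, here is a self-contained sketch (toy types, not the actual patch; the helper name canTakeRttMeasurement is made up for illustration) of how the two guards could be combined, reusing the iterator returned by find() so the map is probed only once:

  #include <cstdint>
  #include <iostream>
  #include <unordered_map>

  enum class SegmentState { FirstTimeSent, InRetxQueue, Retransmitted, RetxReceived };
  struct SegmentInfo { SegmentState state; };

  std::unordered_map<uint64_t, SegmentInfo> m_segmentInfo;
  std::unordered_map<uint64_t, int> m_retxCount;

  // Returns true if the RTT sample for this segment may be used as a measurement.
  bool canTakeRttMeasurement(uint64_t recvSegNo)
  {
    auto it = m_segmentInfo.find(recvSegNo);
    if (it == m_segmentInfo.end())
      return false; // segment already erased (e.g., a late duplicate): ignore it

    const SegmentInfo& segInfo = it->second;
    return (segInfo.state == SegmentState::FirstTimeSent ||
            segInfo.state == SegmentState::InRetxQueue) &&
           m_retxCount.find(recvSegNo) == m_retxCount.end(); // never retransmitted
  }

  int main()
  {
    m_segmentInfo[1] = {SegmentState::FirstTimeSent};
    std::cout << canTakeRttMeasurement(1) << ' ' << canTakeRttMeasurement(2) << '\n'; // prints "1 0"
  }
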
Actions #5

Updated by Ryan Wickman almost 6 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Anonymous almost 6 years ago

Sounds good. I would also suggest replacing operator[] with the map::at function (http://www.cplusplus.com/reference/map/map/at/):

  SegmentInfo& segInfo = m_segmentInfo.at(recvSegNo);

This is much safer, since it throws an exception if the element doesn't exist (rather than creating a new element).
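
A standalone illustration of the difference (not tied to the pipeline code):

  #include <iostream>
  #include <stdexcept>
  #include <unordered_map>

  int main()
  {
    std::unordered_map<int, int> m;
    try {
      m.at(42); // key absent: at() throws instead of inserting
    }
    catch (const std::out_of_range& e) {
      std::cout << "missing key detected: " << e.what() << '\n';
    }
    std::cout << "size = " << m.size() << '\n'; // prints 0: nothing was inserted
  }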

Feel free to push the change to Gerrit.

Actions #7

Updated by Davide Pesavento almost 6 years ago

Ryan Wickman wrote:

The error occurs when three Data segments, with the same segment number, arrive at PipelineInterestAimd::handleData()
...
(Prior to this assume that all three of these segments have timed out)

How is this possible then? If the segments have timed out, they shouldn't trigger an invocation of handleData() anymore.

Actions #8

Updated by Ryan Wickman almost 6 years ago

Here is a log of the exact events that transpire:

Requesting segment #1
segNo: #1 segInfo.state = SegmentState::FirstTimeSent;
m_segmentInfo.at(segNo).state = SegmentState::InRetxQueue: #1
enqueueForRetransmission: #1
Retransmitting segment #1
seg: #1 segInfo.state = SegmentState::Retransmitted;
m_segmentInfo.at(segNo).state = SegmentState::InRetxQueue: #1
enqueueForRetransmission: #1
Retransmitting segment #1
# of retries for segment #1 is 2
seg: #1 segInfo.state = SegmentState::Retransmitted;
m_segmentInfo.at(segNo).state = SegmentState::InRetxQueue: #1
enqueueForRetransmission: #1
Retransmitting segment #1
# of retries for segment #1 is 3
seg: #1 segInfo.state = SegmentState::Retransmitted;
CALLING handleData for recvSegNo: #1
Received segment #1, rtt=21.0426ms, rto=1000ms
Seg : #1 segInfo.state = SegmentState::RetxReceived
CALLING handleData for recvSegNo: #1
Erasing seg: #1
CALLING handleData for recvSegNo: #1
Received segment #1, rtt=2.0921e+07ms, rto=0ms
Measurement taken for recvSegNo: #1

It can be seen that this does, in fact, occur. As to why, I believe it is because the Interest lifetime is set to a value greater than the time required for a retransmission to occur.

How the Interest lifetime is set:

interest.setInterestLifetime(m_options.interestLifetime)

Here m_options.interestLifetime has the value of ndn::DEFAULT_INTEREST_LIFETIME (4 seconds).

How it is checked and enqueued for retransmission:

  if (timeElapsed.count() > segInfo.rto.count()) { // timer expired?
    hasTimeout = true;
    enqueueForRetransmission(entry.first);
  }
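
As a back-of-envelope check of this timing argument (the 1000 ms RTO is taken from the log above; ndn::DEFAULT_INTEREST_LIFETIME is 4 seconds in ndn-cxx):

  #include <chrono>
  #include <iostream>

  int main()
  {
    using namespace std::chrono;
    const milliseconds interestLifetime = 4000ms; // ndn::DEFAULT_INTEREST_LIFETIME
    const milliseconds rto = 1000ms;              // rto=1000ms in the log above
    // Several RTO-driven retransmissions fit inside one Interest lifetime, so
    // (under this note's hypothesis) Data for each can still reach handleData().
    std::cout << interestLifetime / rto << " RTO windows per Interest lifetime\n"; // prints 4
  }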
Actions #9

Updated by Davide Pesavento almost 6 years ago

Oh, you're talking about the RTO timeout, which does not cancel the pending Interest (i.e. Data can still arrive). I thought you were talking about the Interest timeout, which causes the invocation of handleLifetimeExpiration instead of handleData.

Actions #10

Updated by Davide Pesavento almost 6 years ago

Ryan Wickman wrote:

To prevent retransmitted segments from being added as a measurement:

Before:

  if (segInfo.state == SegmentState::FirstTimeSent ||
      segInfo.state == SegmentState::InRetxQueue) {

After:

  if ((segInfo.state == SegmentState::FirstTimeSent ||
       segInfo.state == SegmentState::InRetxQueue) &&
      m_retxCount.find(recvSegNo) == m_retxCount.end()) {

Yes, I pointed this out in https://redmine.named-data.net/issues/3902#note-20 and note-22; I guess nobody has fixed it since then...

Actions #11

Updated by Ryan Wickman almost 6 years ago

  • Status changed from In Progress to Code review
Actions #12

Updated by Davide Pesavento over 5 years ago

  • Subject changed from chunks: impossible computed RTT values to ndncatchunks: impossible computed RTT values
Actions #13

Updated by Davide Pesavento over 5 years ago

I had another look at this and I noticed that sendInterest calls removePendingInterest on the old (timed-out) Interest before sending out a new Interest for that segment. So I don't understand how the scenario outlined in note-3 can happen: handleData cannot be called more than once for any segment number, because there can be at most one pending Interest for any given segment.
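
A toy model (plain C++, not the ndn-cxx API) of the invariant being described: because re-expressing a segment first removes the old pending Interest, a segment can never have more than one pending Interest, and hence at most one handleData() invocation:

  #include <cassert>
  #include <cstdint>
  #include <iostream>
  #include <unordered_map>

  int main()
  {
    std::unordered_map<uint64_t, int> pending; // segment number -> pending Interest id
    int nextId = 0;

    auto sendInterest = [&](uint64_t segNo, bool isRetx) {
      if (isRetx)
        pending.erase(segNo);           // removePendingInterest on the old Interest
      pending.emplace(segNo, nextId++); // expressInterest for the new one
    };

    sendInterest(1, false); // initial expression
    sendInterest(1, true);  // 1st retransmission after RTO expiry
    sendInterest(1, true);  // 2nd retransmission
    assert(pending.count(1) == 1); // invariant: at most one pending Interest
    std::cout << "pending Interests for segment 1: " << pending.count(1) << '\n';
  }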

Actions #14

Updated by Davide Pesavento over 5 years ago

  • Status changed from Code review to Closed
  • % Done changed from 0 to 100
Actions #15

Updated by Davide Pesavento over 5 years ago

  • Has duplicate Bug #4745: ndncatchunks terminates without receiving every data chunk added