Task #3508: [ndncon] Chronosync misconfiguration leads to extensive CPU usage for ndncon and NFD - ndnrtc - NDN project issue tracking system

Task #3508

closed

[ndncon] Chronosync misconfiguration leads to extensive CPU usage for ndncon and NFD

Added by Peter Gusev over 9 years ago. Updated over 9 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Category:

Start date:

03/04/2016

Due date:

% Done:

100%

Estimated time:

Description

Observed behavior:

1. Start producer(s) on REMAP hub
2. Start ndncon connected to REMAP
3. Create chat
4. Wait for some time

Result:
NFD and ndncon CPU usage ramps up to 80% and 100%, respectively.

Problem description by Zhehao:

Given the current implementation, a problematic case arises when a sync broadcast interest brings back data that does not result in a root digest update (by design, this should not happen). The same interest is then sent again and brings back the same data, which causes the interest to be sent yet again with no interval, quickly flooding the local NFD (and only the local NFD). The spike goes away once that sync data times out.
For example, in onData of ndn-cpp's ChronoSync2013, if the update call does not change the root digest, the same interest is issued again instantly. Similar logic exists in my discovery.
Our previous (deterministic) spike issue, caused by an unintended namespace conflict, is the same thing: essentially, a different application is able to answer a sync interest with data that does not make sense, causing the sync interest sender to instantly retransmit the same interest and flood its local NFD.
(As a side note, if anyone in today's call is using a version of NdnCon from before your update, this spike should happen for someone else in the call. Assuming that everyone has the update, I'll still need to figure out the exact scenario that can cause this.)
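
To illustrate the feedback loop described above, here is a minimal sketch in Python. It is a toy timing model, not ndn-cpp code: the function name `count_reexpressions`, its parameters, and the millisecond figures are all assumptions chosen for illustration.

```python
def count_reexpressions(stale_lifetime_ms, rtt_ms, retry_delay_ms=0):
    """Count how often the same sync interest is sent while a stale
    data packet (one that never changes the root digest) stays available.

    All names and numbers are hypothetical; this only models timing.
    """
    if rtt_ms + retry_delay_ms <= 0:
        raise ValueError("each round must take a positive amount of time")

    now_ms = 0
    sent = 0
    while now_ms < stale_lifetime_ms:
        sent += 1                   # express the sync interest
        now_ms += rtt_ms            # the stale data comes straight back
        # The root digest is unchanged, so the same interest is
        # re-issued after retry_delay_ms (0 means "instantly").
        now_ms += retry_delay_ms
    return sent


if __name__ == "__main__":
    # Assumed figures: stale data lives 1000 ms, local round trip is 1 ms.
    print(count_reexpressions(1000, 1))        # no delay between retries
    print(count_reexpressions(1000, 1, 99))    # 99 ms pause between retries
```

With no pause, the number of interests is bounded only by the local round-trip time, which is consistent with the load showing up on the local NFD and on the application at the same time.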

The issue was considered resolved with this commit, though it was observed again right before the 3/2/2016 seminar. Need to check again.

Updated by Anonymous over 9 years ago

For example, in ChronoSync2013 in ndn-cpp's onData, if the update call does not cause update in root digest, the same interest gets issued again instantly.

Zhehao or Peter: The update call returns false if it didn't update the digest tree. Should we simply change onData to delay re-sending the interest in this case? (Normally this shouldn't happen, so I think the delay is OK.)
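
A minimal sketch of what such a delay could look like, written in Python rather than as a patch to ndn-cpp. The class `SyncClient`, the callbacks `express_interest` and `call_later`, and the value of `RETRY_DELAY_MS` are hypothetical stand-ins, not the library's API; real sync data carries more than a digest string.

```python
RETRY_DELAY_MS = 500  # assumed value; the thread does not specify one


class SyncClient:
    """Toy stand-in for a sync client; not the ndn-cpp ChronoSync2013 class."""

    def __init__(self, express_interest, call_later):
        self._express_interest = express_interest  # callable(digest)
        self._call_later = call_later              # callable(delay_ms, callback)
        self.root_digest = "00"

    def update(self, new_digest):
        """Return True only if the root digest actually changed."""
        if new_digest == self.root_digest:
            return False
        self.root_digest = new_digest
        return True

    def on_data(self, new_digest):
        if self.update(new_digest):
            # Normal case: the state moved on, ask for the next change now.
            self._express_interest(self.root_digest)
        else:
            # Unexpected case: nothing changed, so wait before asking again
            # instead of re-sending the same interest immediately.
            self._call_later(
                RETRY_DELAY_MS,
                lambda: self._express_interest(self.root_digest),
            )
```

The scheduler is passed in as a callback so the sketch stays independent of any particular event loop; an application would supply whatever timer mechanism it already uses.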

Updated by Zhehao Wang over 9 years ago

Jeff Thompson wrote:

For example, in ChronoSync2013 in ndn-cpp's onData, if the update call does not cause update in root digest, the same interest gets issued again instantly.

Zhehao or Peter: The update call returns false if it didn't update the digest tree. Should we simply change onData to delay re-sending the interest in this case? (Normally this shouldn't happen, so I think the delay is OK.)

Makes sense to me as a quick fix for unintentional local NFD floods.
We'll test without the change first, to make sure the issue is what we think it is.

Updated by Anonymous over 9 years ago

An obvious comment: Since we don't expect update() to return false, during testing you can update NDN-CPP to log a warning if it does.
https://github.com/named-data/ndn-cpp/blob/60ea5ee1856c3b1c4eee8458325558aa2401ddb9/src/sync/chrono-sync2013.cpp#L225
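
The same idea as a small Python sketch, assuming the update step can be wrapped by the caller. The wrapper `checked_update` and the logger name `sync_debug` are hypothetical; the actual change would go into the C++ code linked above.

```python
import logging

log = logging.getLogger("sync_debug")  # hypothetical logger name


def checked_update(update, content):
    """Call update(content) and warn if it reports no digest change.

    `update` is any callable returning True when the digest tree changed
    and False otherwise, mirroring the behaviour described in this thread.
    """
    updated = update(content)
    if not updated:
        log.warning(
            "update() did not change the root digest; content=%r", content
        )
    return updated
```

During testing, counting these warnings per second would show whether the CPU spike coincides with repeated no-op updates.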

Updated by Zhehao Wang over 9 years ago

Peter and I tried to confirm and reproduce this issue on Friday, but we weren't able to with the scenario Peter described. The issue could have been caused by people running an older version of NdnCon on Wednesday (0.7.4 or earlier), which still has the namespace collision.

Instead, we found that in the current version of NdnCon, users can't create a chatroom if a chatroom has already been discovered. This should be NdnCon-specific, and it could limit the tests we can do for this issue.

Meanwhile, if anyone was running NdnCon 0.7.4 or earlier on Mar 2nd before the seminar, please let us know and upgrade NdnCon.

Updated by Peter Gusev over 9 years ago

Status changed from In Progress to Closed
% Done changed from 50 to 100

I'm closing this and tracking chat creation issues in #3518.
