Task #3508
closed
[ndncon] Chronosync misconfiguration leads to extensive CPU usage for ndncon and NFD
100%
Description
Observed behavior:
- Start producer(s) on REMAP hub
- Start ndncon connected to REMAP
- Create chat
- Wait for some time
Result:
NFD and ndncon CPU usage ramps up to 80% and 100%, respectively.
Problem description by Zhehao:
Given the current implementation, a problematic case arises when a sync broadcast interest brings back data that does not result in a root digest update (by design, this should not happen). The same interest is then sent again and brings back the same data, so it gets re-sent with no interval in between, quickly flooding the local NFD (and only the local NFD). The spike goes away once the sync data times out.
For example, in ChronoSync2013 in ndn-cpp's onData, if the update call does not cause update in root digest, the same interest gets issued again instantly. Similar logic exists in my discovery.
Our previous (deterministic) spike issue, caused by an unintended namespace conflict, is the same thing: a different application is able to answer a sync interest with data that does not make sense, causing the sync interest sender to instantly retransmit the same interest and flood its local NFD.
(So as a side note, if anyone in today's call is using a version of NdnCon from before your update, this spike should happen for someone else in the call; assuming that everyone has the update, I'll still need to figure out the exact scenario that can cause this.)
The issue was considered resolved with this commit, though it was observed again right before the 3/2/2016 seminar. Need to check again.
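To get a feel for the scale of the loop described above, here is a toy model (not ndn-cpp code). It counts how many times the same interest goes out before the stale sync data expires, assuming each interest is answered after a fixed round trip and re-sent immediately. `LoopModel`, `countInterestsBeforeExpiry`, and both timing fields are hypothetical names and values chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the retransmission loop. The timings are assumptions,
// not measurements from NFD or ndncon.
struct LoopModel {
  std::uint64_t roundTripUs;  // assumed time for the local NFD to answer one interest
  std::uint64_t freshnessUs;  // assumed time until the stale sync data times out
};

// Counts interests sent while the stale data is still being returned,
// with the interest re-sent as soon as each data packet arrives.
std::uint64_t countInterestsBeforeExpiry(const LoopModel& model)
{
  assert(model.roundTripUs > 0);
  std::uint64_t sent = 0;
  for (std::uint64_t now = 0; now < model.freshnessUs; now += model.roundTripUs)
    ++sent;
  return sent;
}
```

With an assumed 100 µs round trip and 1 s until timeout, `countInterestsBeforeExpiry(LoopModel{100, 1000000})` returns 10000, i.e. thousands of interests per second hitting the local NFD from a single stale reply.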
Updated by Anonymous almost 9 years ago
For example, in ChronoSync2013 in ndn-cpp's onData, if the update call does not cause update in root digest, the same interest gets issued again instantly.
Zhehao or Peter: The update call returns false if it didn't update the digest tree. Should we simply change onData to delay re-sending the interest in this case? (Normally this shouldn't happen, so I think the delay is OK.)
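A minimal sketch of the decision proposed above, assuming the caller already has the bool returned by the update call. `resendDelay` and `noUpdateDelay` are hypothetical names, not part of ndn-cpp, and the actual delay value would be up to the implementation.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: decides how long to wait before re-expressing the
// sync interest. digestTreeUpdated stands in for the result of the update call.
std::chrono::milliseconds
resendDelay(bool digestTreeUpdated, std::chrono::milliseconds noUpdateDelay)
{
  assert(noUpdateDelay.count() > 0);
  // Normal case: the digest changed, so the next interest carries a new
  // digest and can be sent right away.
  if (digestTreeUpdated)
    return std::chrono::milliseconds(0);
  // Unexpected case: the same interest would fetch the same data again,
  // so wait before re-sending instead of looping with no interval.
  return noUpdateDelay;
}
```

The caller would schedule the re-send after the returned duration; since the no-update case is not expected in normal operation, the added delay should not affect regular sync latency.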
Updated by Zhehao Wang almost 9 years ago
Jeff Thompson wrote:
For example, in ChronoSync2013 in ndn-cpp's onData, if the update call does not cause update in root digest, the same interest gets issued again instantly.
Zhehao or Peter: The update call returns false if it didn't update the digest tree. Should we simply change onData to delay re-sending the interest in this case? (Normally this shouldn't happen, so I think the delay is OK.)
Makes sense to me as a quick fix for unintentional local NFD floods.
We'll first test without the change, to make sure the issue is what we think it is.
Updated by Anonymous almost 9 years ago
An obvious comment: Since we don't expect update() to return false, during testing you can update NDN-CPP to log a warning if it does.
https://github.com/named-data/ndn-cpp/blob/60ea5ee1856c3b1c4eee8458325558aa2401ddb9/src/sync/chrono-sync2013.cpp#L225
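A rough sketch of such a test-time warning, written as a standalone class rather than a patch to ndn-cpp. `NoUpdateMonitor` and its methods are hypothetical, `updateResult` stands in for the value returned by the update call, and the log format is made up.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical test-time helper: logs a warning and keeps a count whenever
// the update result is false.
class NoUpdateMonitor {
public:
  // Returns true if a warning was logged for this data packet.
  bool record(bool updateResult, const std::string& interestName,
              std::ostream& log = std::cerr)
  {
    assert(!interestName.empty());
    if (updateResult)
      return false;  // expected case: the digest tree was updated
    ++count_;
    log << "WARNING: sync data for " << interestName
        << " did not update the digest tree (occurrence " << count_ << ")\n";
    return true;
  }

  unsigned count() const { return count_; }

private:
  unsigned count_ = 0;
};
```

A rapidly growing count for the same interest name during a test run would point to the retransmission loop discussed in this issue.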
Updated by Zhehao Wang over 8 years ago
Peter and I tried to confirm and reproduce this issue on Friday, but we weren't able to with the scenario Peter described. The issue could have been caused by people running an older version of NdnCon on Wednesday (0.7.4 or earlier), which still has the namespace collision.
Instead, we found that in the current version of NdnCon, users can't create a chatroom if a chatroom has already been discovered. This should be NdnCon-specific, and could be limiting the tests that we can do for this issue.
Meanwhile, if anyone was running NdnCon 0.7.4 or earlier on Mar 2nd before the seminar, please let us know and upgrade NdnCon.
Updated by Peter Gusev over 8 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100
I'm closing this and tracking chat creation issues in #3518.