https://redmine.named-data.net/https://redmine.named-data.net/favicon.ico?14759811232015-03-09T09:23:18ZNDN project issue tracking systemNFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=89712015-03-09T09:23:18ZDavide Pesavento
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/8971/diff?detail_id=8209">diff</a>)</li></ul> NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90112015-03-09T15:14:07ZJunxiao Shi
<ul><li><strong>Subject</strong> changed from <i>Long delays for some Content Store Lookups</i> to <i>CS lookup takes long time if Interest Name covers many entries</i></li><li><strong>Category</strong> set to <i>Tables</i></li><li><strong>Target version</strong> set to <i>v0.4</i></li></ul><p>As stated in <a class="issue tracker-3 status-5 priority-2 priority-default closed" title="Task: Rewrite ContentStore based on ContentStore stub (Closed)" href="https://redmine.named-data.net/issues/2254">#2254</a> note-13:</p>
<blockquote>
<p>ContentStore has worst case performance: all Data packets in the ContentStore will be visited.<br><br>
The worst case scenario happens when Interest Name=ndn:/, simple selectors reject all Data packets.</p>
</blockquote>
<p>This Bug is an observation of this worst case scenario:</p>
<ol>
<li>ndnmap client <a href="https://github.com/WU-ARL/ndnmap/blob/6807cda91fde2f2509095977b27994f505134188/nfdDataCollection/nfdStatusCollectorClient.cpp#L161" class="external">requests FaceStatus dataset</a> <a href="http://lists.named-data.net/mailman/private/operators/2015-March/000893.html" class="external">every 1300ms</a>.<br>
<a class="wiki-page" href="https://redmine.named-data.net/projects/nfd/wiki/FaceMgmt">FaceStatus dataset</a> has FreshnessPeriod set to <a href="https://github.com/named-data/NFD/blob/15b12e759d4472f391fd9be1bfdfa92f9c2e1c81/core/notification-stream.hpp#L64" class="external">1000ms</a>, which means every request from ndnmap client would generate a new version</li>
<li>Data packets of all versions are cached in ContentStore under <code>ndn:/localhost/nfd/faces/list</code> prefix.<br>
Without CS eviction, this namespace grows by 2769 children per hour.</li>
<li>CS lookup for an Interest with <code>ndn:/localhost/nfd/faces/list</code> prefix visits all entries under this prefix until a Data matching Selectors is found.<br>
However, the Interest from ndnmap client has MustBeFresh (as required by <a class="wiki-page" href="https://redmine.named-data.net/projects/nfd/wiki/StatusDataset">StatusDataset</a>) and no cached Data can satisfy it.<br>
This causes all entries to be visited, and therefore the duration of this lookup grows linearly as more entries are inserted.</li>
</ol>
<p>This problem is confirmed without ndnmap client:</p>
<ol>
<li>enable DEBUG log for ContentStore</li>
<li><code>for I in $(seq 36000); do nfd-status -f; sleep 0.1; done</code></li>
<li><p>parse NFD logs with:</p>
<pre><code>awk '
BEGIN {
T = 0
}
$3=="[ContentStore]" && $4=="find" && $5=="/localhost/nfd/faces/list" {
T = $1
next
}
$3=="[ContentStore]" && T>0 {
if ($4=="no-match") {
print int(($1 - T) * 1000000)
}
T = 0
}
' nfd.log
</code></pre></li>
<li><p>visualize the trend of parsed dataset<br><br>
X-axis is number of requests; Y-axis is duration of lookup<br><br>
<img src="http://i57.tinypic.com/13zyji1.png" alt="trend" /></p></li>
</ol>
<p>One possible solution is to <strong>relax system requirements</strong>: limit the number of visited entries on each CS lookup, and return "no-match" when exceeding this limit.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90122015-03-09T16:41:49ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>What is the difference in how the CS is searched for these two:</p>
<pre><code>1425669061.723974 DEBUG: [ContentStore] find /localhost/nfd/faces/events R
1425669061.725418 DEBUG: [ContentStore] no-match
1425669105.051430 DEBUG: [ContentStore] find /localhost/nfd/faces/list R
1425669105.074600 DEBUG: [ContentStore] no-match
</code></pre>
<p>Neither finds a match. One takes under 2ms and the other over 20ms.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90172015-03-09T20:30:09ZJunxiao Shi
<ul></ul><p><code>/localhost/nfd/faces/lists</code> grows by 2769 children per hour, assuming one dataset request per 1300ms;<br><br>
<code>/localhost/nfd/faces/events</code> grows by 2 children per face create+destroy, which is much slower than the namespace above.</p>
<p>It's worth noting that this Bug isn't introduced in <a class="issue tracker-3 status-5 priority-2 priority-default closed" title="Task: Rewrite ContentStore based on ContentStore stub (Closed)" href="https://redmine.named-data.net/issues/2254">#2254</a>. Ilya's CS is expected to have same problem, because algorithm is similar.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90252015-03-10T04:42:02ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>Junxiao Shi wrote:</p>
<blockquote>
<p>However, the Interest from ndnmap client has MustBeFresh (as required by <a class="wiki-page" href="https://redmine.named-data.net/projects/nfd/wiki/StatusDataset">StatusDataset</a>) and no cached Data can satisfy it.<br><br>
This causes all entries to be visited, and therefore the duration of this lookup grows linearly as more entries are inserted.</p>
</blockquote>
<p>So, if we have MustBeFresh, don't we know ahead of time that we are not and by definition can not find<br>
data in the CS that will satisfy this Interest? If so, then why not skip the CS lookup?<br>
Is the thought that MustBeFresh should be rare and we don't want to spend the processing time<br>
checking for that on all Interests?</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90272015-03-10T08:35:51ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>One additional question. Is there a way to mark data so it does not get cached?<br>
If we could do that with the ndnmap and nfd-status requests then they<br>
would not interfere with application interests and data.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90282015-03-10T08:45:32ZDavide Pesavento
<ul></ul><p>John DeHart wrote:</p>
<blockquote>
<p>So, if we have MustBeFresh, don't we know ahead of time that we are not and by definition can not find<br>
data in the CS that will satisfy this Interest?</p>
</blockquote>
<p>Uhm no, you can have data in the CS that is fresh (i.e. whose FreshnessPeriod hasn't expired yet).</p>
<blockquote>
<p>One additional question. Is there a way to mark data so it does not get cached? If we could do that with the ndnmap and nfd-status requests then they would not interfere with application interests and data.</p>
</blockquote>
<p>IIUC, the <code>NoCache</code> caching policy should help in this case, see <a href="http://redmine.named-data.net/projects/nfd/wiki/LocalControlHeader#CachingPolicy-feature">http://redmine.named-data.net/projects/nfd/wiki/LocalControlHeader#CachingPolicy-feature</a></p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90302015-03-10T09:34:23ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>Ahh. Ok. For some reason I had interpretted MustBeFresh to mean must come from the original producer.</p>
<blockquote>
<p>IIUC, the <code>NoCache</code> caching policy should help in this case, see <a href="http://redmine.named-data.net/projects/nfd/wiki/LocalControlHeader#CachingPolicy-feature">http://redmine.named-data.net/projects/nfd/wiki/LocalControlHeader#CachingPolicy-feature</a></p>
</blockquote>
<p>I see NoCache in ndn-cxx but I don't see it used anywhere in NFD except in one test. Has it been implemented?</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90312015-03-10T09:49:01ZAnonymous
<ul></ul><p>I don't think so. It should be feature <a class="issue tracker-2 status-9 priority-2 priority-default closed" title="Feature: faces/enable-local-control: CachingPolicy (Abandoned)" href="https://redmine.named-data.net/issues/2184">#2184</a>, but the implementation is "blocked" by the management refactoring (in the sense of avoiding repeat work). The management refactoring is blocked by a question about extracting about identities from signed Interests.</p>
<p>I'm on the hook for <a class="issue tracker-2 status-9 priority-2 priority-default closed" title="Feature: faces/enable-local-control: CachingPolicy (Abandoned)" href="https://redmine.named-data.net/issues/2184">#2184</a>. If it's useful to you, and Alex + Junxiao are willing to approve code, I'm willing to implement it for you ASAP.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90322015-03-10T10:06:39ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>I am not sure NoCache makes sense for this. I suspect that this data is handled<br>
the way it is to avoid a DOS type attack and if we change the caching policy for it so<br>
that all requests go all the way to nfd management code then we have probably opened<br>
that doorway for attack.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90332015-03-10T10:30:47ZJunxiao Shi
<ul></ul><p>One possible solution is to <strong>relax system requirements</strong>: limit the number of visited entries on each CS lookup, and return "no-match" when exceeding this limit.</p>
<p><strong>NoCache</strong> can mask this Bug, but it cannot solve the general problem in note-2.<br><br>
The following issues are needed to use NoCache in <a class="wiki-page" href="https://redmine.named-data.net/projects/nfd/wiki/StatusDataset">StatusDataset</a> from NFD Management: <a class="issue tracker-2 status-5 priority-2 priority-default closed" title="Feature: Recognize CachingPolicy=NoCache (Closed)" href="https://redmine.named-data.net/issues/2185">#2185</a>, <a class="issue tracker-2 status-5 priority-4 priority-high2 closed" title="Feature: InMemoryStorage for management (Closed)" href="https://redmine.named-data.net/issues/2182">#2182</a>.<br><br>
Without <a class="issue tracker-2 status-5 priority-4 priority-high2 closed" title="Feature: InMemoryStorage for management (Closed)" href="https://redmine.named-data.net/issues/2182">#2182</a>, <a class="wiki-page" href="https://redmine.named-data.net/projects/nfd/wiki/StatusDataset">StatusDataset</a> with multiple segments won't work.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90392015-03-10T11:23:53ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>I think limiting the number of visited entries makes sense. What number do you propose?</p>
<p>I'm also not sure what the difference is for these requests if we specify the ChildSelector or not.</p>
<p>We currently set the ChildSelector to 1 for the ndnmap requests which gives us a RightMost find.<br>
We do this because that is what nfd-status does.</p>
<p>What happens if we set it to 0 and use a LeftMost find?</p>
<p>I'm trying that right now and the worst case time for the find is about 2.5 ms.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90492015-03-10T20:21:15ZJunxiao Shi
<ul></ul><blockquote>
<p>I think limiting the number of visited entries makes sense. What number do you propose?</p>
</blockquote>
<p>I don't have a specific number in mind. This needs benchmark.</p>
<blockquote>
<p>What happens if we set it to 0 and use a LeftMost find?<br><br>
I'm trying that right now and the worst case time for the find is about 2.5 ms.</p>
</blockquote>
<p>Count of visited entries is the same when there's no match.<br><br>
Rightmost lookup procedure calls <code>lower_bound</code> many times (linear to count of unique next-components under Interest Name).<br>
Leftmost lookup procedure calls <code>lower_bound</code> only once, so it is faster.</p>
<p>"Visiting an entry" means calling the procedure that determines whether the Interest including simple Selectors matches the cached Data.<br>
Merely comparing the Name to another entry doesn't count as a visit.</p>
<hr>
<p>Given these stored entries:</p>
<ul>
<li>/P/v0/s0, /P/v0/s1, .., /P/v0/s99</li>
<li>/P/v1/s0, /P/v1/s1, .., /P/v1/s99</li>
<li>..</li>
<li>/P/v9/s0, /P/v9/s1, .., /P/v9/s99</li>
</ul>
<p>Current rightmost lookup procedure "<a href="http://redmine.named-data.net/attachments/download/149/repo-index_20140617.pptx" class="external">PROC-B</a>" is optimized for the expected case that there's a match near the rightmost end (eg. /P/v9/s0), and minimizes count of visited entries in this expected case.<br><br>
However, when there's no match due to simple Selectors (such as MustBeFresh), the procedure would visit all entries, and also perform 10 <code>lower_bound</code>s (to locate /P/v9/s0, /P/v8/s0, .., /P/v0/s0).</p>
<p>An alternate procedure "<a href="http://redmine.named-data.net/attachments/download/89/tables-concept-cs_20140117.pptx" class="external">PROC-A</a>" has slightly better worst case performance, but its expected case performance is worse: all entries under Interest Name are always visited, regardless of whether there's a match; but it does not call <code>lower_bound</code>.</p>
<p>Yet another procedure "PROC-C" can visit the entries backwards.<br>
It shall be better than PROC-A in expected case, and has same worst case performance as PROC-A.</p>
<p>When there's a match (at /P/v9/s0):</p>
<table><thead>
<tr>
<th></th>
<th>count of visited entries</th>
<th>count of <code>lower_bound</code>s</th>
</tr>
</thead><tbody>
<tr>
<td>PROC-A</td>
<td>1000</td>
<td>1</td>
</tr>
<tr>
<td>PROC-B</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>PROC-C</td>
<td>100</td>
<td>1</td>
</tr>
</tbody></table>
<p>When there's no match:</p>
<table><thead>
<tr>
<th></th>
<th>count of visited entries</th>
<th>count of <code>lower_bound</code>s</th>
</tr>
</thead><tbody>
<tr>
<td>PROC-A</td>
<td>1000</td>
<td>1</td>
</tr>
<tr>
<td>PROC-B</td>
<td>1000</td>
<td>11</td>
</tr>
<tr>
<td>PROC-C</td>
<td>1000</td>
<td>1</td>
</tr>
</tbody></table>
<hr>
<p>Back to this Bug: the namespace of stored entries after one hour of ndnmap client execution looks like:<br>
(suppose FaceStatus dataset has 2 segments)</p>
<ul>
<li>/localhost/nfd/faces/list/v0/s0, /localhost/nfd/faces/list/v0/s1</li>
<li>/localhost/nfd/faces/list/v1/s0, /localhost/nfd/faces/list/v1/s1</li>
<li>..</li>
<li>/localhost/nfd/faces/list/v2768/s0, /localhost/nfd/faces/list/v2768/s1</li>
</ul>
<p>When there's a match (at /localhost/nfd/faces/list/v2768/s0):</p>
<table><thead>
<tr>
<th></th>
<th>count of visited entries</th>
<th>count of <code>lower_bound</code>s</th>
</tr>
</thead><tbody>
<tr>
<td>PROC-A</td>
<td>5536</td>
<td>1</td>
</tr>
<tr>
<td>PROC-B</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>PROC-C</td>
<td>2</td>
<td>1</td>
</tr>
</tbody></table>
<p>When there's no match:</p>
<table><thead>
<tr>
<th></th>
<th>count of visited entries</th>
<th>count of <code>lower_bound</code>s</th>
</tr>
</thead><tbody>
<tr>
<td>PROC-A</td>
<td>5536</td>
<td>1</td>
</tr>
<tr>
<td>PROC-B</td>
<td>5536</td>
<td>2770</td>
</tr>
<tr>
<td>PROC-C</td>
<td>5536</td>
<td>1</td>
</tr>
</tbody></table>
<p>I wish the ContentStore could select a procedure based on "what kind of namespace it is".<br>
In the examples above, PROC-C has better average performance if there are many versions but few segments per version, while PROC-B has better average performance is there are few versions but many segments per version.<br><br>
However, the ContentStore index is an ordered container, which doesn't allow the algorithm to make this choice.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=90532015-03-11T07:16:09ZJohn DeHartjohn.d.dehart@gmail.com
<ul></ul><p>That helped a lot. I am now convinced that the ndnmap interests should ask for LeftMost<br>
since we know that the expected result is no-match, we don't what to spend the worst-case<br>
time of RightMost.</p>
<p>I'll run some more tests to make sure nothing else weird happens.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=117872015-07-27T16:01:16ZJunxiao Shi
<ul></ul><p>20150727 conference call discussed this problem.</p>
<p>Lixia points out that this would be a DDoS attack.</p>
<p>Alex proposes that ContentStore could consider Name+Exclude only: Name must be an exact match of Name without implicit digest, and Exclude can only reject implicit digests.<br><br>
This shall allow the ContentStore to visit only a small number of entries (all having the same Name without implicit digest), but require all other Interests to be processed by the producer.</p>
NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=139632015-12-22T13:19:25ZAlex Afanasyev
<ul><li><strong>Target version</strong> changed from <i>v0.4</i> to <i>v0.5</i></li></ul> NFD - Bug #2626: CS lookup takes long time if Interest Name covers many entrieshttps://redmine.named-data.net/issues/2626?journal_id=219972018-01-23T23:23:48ZDavide Pesavento
<ul><li><strong>Target version</strong> deleted (<del><i>v0.5</i></del>)</li></ul>