https://redmine.named-data.net/https://redmine.named-data.net/favicon.ico?14759811232014-06-26T23:45:28ZNDN project issue tracking systemNFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=35162014-06-26T23:45:28ZAlex Afanasyev
<ul></ul><p>Can you link/reference the benchmark?</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=35172014-06-26T23:46:15ZJunxiao Shi
<ul></ul><p>The benchmark is done with a modified version of <a href="https://github.com/named-data/NFD/tree/4a771368949db5e0bf824cd60ed0a9af9d5d67c5/tests/other/cs-smoketest.cpp" class="external"><code>tests/other/cs-smoketest.cpp</code></a>.</p>
<ul>
<li>add <code>data->getFullName()</code> in the dataWorkload preparation loop, so that implicit digest is computed before CS insertion; or comment out this line, and CS insertion algorithm would have to compute implicit digest</li>
<li>replace the loop of varying nInsertions with a single execution of 70000 insertions; this is needed because dataWorkload array would contain implicit digests after the first execution</li>
</ul>
<p>The results are:</p>
<ul>
<li>CS insertion computes implicit digest: 62.0152 seconds</li>
<li>pre-computed implicit digest - CS insertion does not compute implicit digest: 32.3143 seconds</li>
<li>time saving: 48%</li>
</ul>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=35192014-06-26T23:53:03ZJunxiao Shi
<ul><li><strong>Related to</strong> <i><a class="issue tracker-3 status-5 priority-2 priority-default closed" href="/issues/1707">Task #1707</a>: Reduce implicit digest computation in Interest::matchesData</i> added</li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=46402014-08-27T09:55:27ZJunxiao Shi
<ul><li><strong>Assignee</strong> set to <i>Yi Huang</i></li><li><strong>Target version</strong> set to <i>v0.3</i></li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=46982014-09-01T21:47:00ZYi Huangethanhuang1991@gmail.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=46992014-09-01T21:50:04ZJunxiao Shi
<ul></ul><p>@Yi, don't hurry with the implementation of this Task. ContentStore needs a major refactoring, and this is only part of it. Do read the slides.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=48422014-09-12T12:59:54ZJunxiao Shi
<ul></ul><p>@Yi did a benchmark on a Macbook Pro:</p>
<ul>
<li>CS insertion computes implicit digest: 0.443175 seconds</li>
<li>pre-computed implicit digest - CS insertion does not compute implicit digest: 0.165358 seconds</li>
<li>time saving: 63%</li>
</ul>
<p>Implicit digest computation consumes a greater percentage of time in CS insertion on Macbook.</p>
<p>Another observation is: the same 70000 insertions is 100+ times faster than my benchmark on a Ubuntu 12.04 VM.<br><br>
It's surprising to see such a significant difference.<br>
This difference is likely to be more than hardware.</p>
<p>I don't remember what compiler options I was using when doing the benchmark.<br>
Yi should repeat the benchmark on Ubuntu using the same compiler options that were used on Macbook.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=49832014-09-24T15:29:23ZJunxiao Shi
<ul></ul><p>Since <code>tables-concept-cs_20140117.pptx</code> is written, <code>Data::getFullName</code> function is introduced in ndn-cxx.<br>
This function computes the implicit digest upon first invocation, and caches the result internally.<br>
It would be a duplicate to store a copy of full Name outside of the <code>Data</code> object.</p>
<p>A solution to this problem is: create a <a href="http://en.cppreference.com/w/cpp/concept/Compare" class="external">Compare</a> of Data where <code>Data::getFullName</code> invocation is minimized, in order to establish an ordered sequence.<br>
CS algorithms never expressly test whether implicit digest has been computed, but only obtain full Name when it's required for a comparison.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=50092014-09-28T16:54:55ZYi Huangethanhuang1991@gmail.com
<ul></ul><p>Here is my design of implementing CS:</p>
<p>What does not change:</p>
<ul>
<li>All public functions have the same name so there is no need to rewrite code outside the CS.</li>
<li>I did not think of a better way to keep the cleaning index. Therefore, I am keeping the use of boost::multi_index_container.</li>
<li>There is not many changes to current CS Entry. I just removed things that manipulate skiplist iterators.</li>
</ul>
<p>Using SkipList implementation from repo-ng:</p>
<ul>
<li>The SkipList can be created with a type and a comparator of that type. Inserting and Finding elements takes a object of that type. Here's my design of using the SkipList:
<ul>
<li>To reduce unnecessary digest computation, the comparator function should first compare "getName()". If "getName()" appear to be the same on both side, compare "getFullName()".</li>
<li>Since inserting and finding in SkipList requires a object of the type of element in the list, we need to construct a new CS entry once Cs::insert is called (same for Cs::find). This is because the SkipList cannot search object by index. I don't know whether this is tolerable overhead.</li>
</ul></li>
</ul>
<p>When insert to CS:</p>
<ol>
<li>A new CS entry with new data will be created.</li>
<li>Call SkipList::find to see whether there is an entry with the same name (including digest if needed) already there in SkipList.</li>
<li>If yes:
<ol>
<li>check unsolicited. If old entry is not unsolicited and new entry is, discard the new entry and return. </li>
<li>Update stale time then return.</li>
</ol></li>
<li>If no:
<ol>
<li>Evict one entry if Cs is full.</li>
<li>Insert the new entry to CleanupIndex</li>
<li>Insert the new entry into SkipList then return.</li>
</ol></li>
</ol>
<p>When lookup (This is basically the steps described in Junxiao's slides. Moving forward in SkipList is done by using iterator provided by SkipList in 2nd step):</p>
<ol>
<li>A new CS entry with new data will be created.</li>
<li>Use "lower_bound()" to locate starting point. And set nameLength to the number of components in the Interest, set lastMatch to nil.</li>
<li>If last component in Interest Name may be an implicit digest, compute the digest of current CS entry.</li>
<li>If Interest Name is not a prefix of current CS entry's Name plus implicit digest if computed, goto step 9.</li>
<li>if current CS entry violates MinSuffixComponents, MaxSuffixComponents, PublisherPublicKeyLocator, Exclude, MustBeFresh selectors, go to step 8.</li>
<li>If ChildSelector prefers leftmost child, return current CS entry.</li>
<li>If ChildSelector prefers rightmost child, and ((lastMatch is nil) or (current CS entry and lastMatch have different nameLength-th component)), set lastMatch to current CS entry.</li>
<li>Move to next CS entry in the ordered sequence, and goto step 3.</li>
<li>Return lastMatch.</li>
</ol>
<p>When evict entry:</p>
<ol>
<li>Unsolicited entries are evicted first.</li>
<li>Stale entries are evicted next.</li>
<li>If there is no unsolicited or stale entry, other entries are evicted by the order they are created.</li>
<li>This can be maintained by boost::multi_index_container.</li>
</ol>
<p>For periodically cleanup:</p>
<ul>
<li>We decided not to do this during our last meeting with Beichuan.</li>
</ul>
<p>Additional questions:</p>
<ul>
<li>Since now we are using SkipList from repo-ng, I think we need to make NFD depends on it. I am not very familiar with waf. How do I do that?</li>
<li>Like I mentioned above. Is creating new CS entry just for finding existing entry in SkipList a tolerable overhead?</li>
</ul>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=50102014-09-28T17:20:55ZJunxiao Shi
<ul><li><strong>File</strong> <a href="/attachments/149">repo-index_20140617.pptx</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/149/repo-index_20140617.pptx">repo-index_20140617.pptx</a> added</li></ul><p>An alternative to SkipList is to add an ordered index to the multi-index container, where the order is determined by full Name.<br>
Please explain why SkipList is chosen over an ordered index.</p>
<p>If SkipList is proven to be beneficial, constructing an Entry during lookups is acceptable because time complexity of doing so is constant.<br>
Such Entry should be placed on the stack rather than the heap.</p>
<p>Please justify the choice of eviction single entry during insertion, over periodical mass evictions.</p>
<p>repo-index_20140617.pptx describes two Interest matching algorithms when ChildSelector=rightmost.<br>
The algorithm in note-9 is the "alternate" described in these slides, and it is inefficient in certain cases.<br>
I suggest using the first algorithm.</p>
<p>NFD should never depend on repo-ng.<br>
I suggest moving SkipList to ndn-cxx, if SkipList is proven to be beneficial for ContentStore.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=50112014-09-29T13:02:27ZYi Huangethanhuang1991@gmail.com
<ul></ul><p>I am actually not sure whether our skiplist is going to out perform multi-index container. I did a quick look up on multi-index container's performance. It says the implementation is either matching or better than performance of STL containers. </p>
<p><a href="http://www.boost.org/doc/libs/1_56_0/libs/multi_index/doc/performance.html">http://www.boost.org/doc/libs/1_56_0/libs/multi_index/doc/performance.html</a></p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=50122014-09-29T13:25:02ZJunxiao Shi
<ul></ul><p>Neither STL nor Boost has a SkipList.<br>
The SkipList is implemented by repo-ng author, and is not STL container.</p>
<p>Please do some analysis or benchmark to compare which one of the following is faster:</p>
<ul>
<li>SkipList for name ordering, Multi-Index Container for cleanup</li>
<li>Multi-Index Container for both name ordering and cleanup</li>
</ul>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=52122014-10-08T14:11:53ZAlex Afanasyev
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-2 priority-default closed" href="/issues/1207">Feature #1207</a>: CS policy interface</i> added</li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=53842014-10-17T08:34:45ZJunxiao Shi
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>New</i></li><li><strong>Assignee</strong> deleted (<del><i>Yi Huang</i></del>)</li></ul><p>Yi is assigned to other tasks, so this Task is back to New status. The whole ContentStore redesign work is blocked by <em>ImplicitSha256DigestComponent</em> spec approval and library support.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=58012014-11-05T09:26:32ZJunxiao Shi
<ul><li><strong>Blocked by</strong> <i><a class="issue tracker-3 status-5 priority-2 priority-default closed" href="/issues/1640">Task #1640</a>: ImplicitSha256DigestComponent</i> added</li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=61252014-11-17T21:25:16ZJunxiao Shi
<ul></ul><p>A possible starting point of this Task is <a href="http://gerrit.named-data.net/447">http://gerrit.named-data.net/447</a> patchset 4.<br><br>
Although the commit is called "stub", it's a much cleaner ContentStore.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=67722014-12-14T20:51:04ZJunxiao Shi
<ul><li><strong>Related to</strong> deleted (<i><a class="issue tracker-2 status-5 priority-2 priority-default closed" href="/issues/1207">Feature #1207</a>: CS policy interface</i>)</li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=67742014-12-14T20:51:17ZJunxiao Shi
<ul><li><strong>Blocked by</strong> <i><a class="issue tracker-3 status-5 priority-2 priority-default closed" href="/issues/2254">Task #2254</a>: Rewrite ContentStore based on ContentStore stub</i> added</li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=67972014-12-15T13:00:23ZJunxiao Shi
<ul><li><strong>Assignee</strong> set to <i>Shashwat Vidhu Sher</i></li></ul> NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=75772015-01-27T00:05:56ZAlex Afanasyev
<ul></ul><p>Does the merged CS implementation implements the delayed digest calculation or not yet?</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=75922015-01-27T12:28:07ZShashwat Vidhu Shervidhusher@gmail.com
<ul><li><strong>Assignee</strong> deleted (<del><i>Shashwat Vidhu Sher</i></del>)</li></ul><p>Dropping the task due to parallel committments</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=76672015-01-28T18:13:56ZJunxiao Shi
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Junxiao Shi</i></li></ul><p>Before I begin implementation, here's a baseline benchmark:</p>
<table><thead>
<tr>
<th>test case</th>
<th>platform</th>
<th>regular</th>
<th>without digest computation</th>
<th>saving</th>
</tr>
</thead><tbody>
<tr>
<td>find(miss)-insert</td>
<td>Ubuntu 14.10</td>
<td>7412611</td>
<td>4023103</td>
<td>46%</td>
</tr>
<tr>
<td>find(miss)-insert</td>
<td>OSX 10.9</td>
<td>3037630</td>
<td>1814464</td>
<td>40%</td>
</tr>
<tr>
<td>insert-find(hit)</td>
<td>Ubuntu 14.10</td>
<td>6799756</td>
<td>3731573</td>
<td>45%</td>
</tr>
<tr>
<td>insert-find(hit)</td>
<td>OSX 10.9</td>
<td>3028404</td>
<td>1825252</td>
<td>40%</td>
</tr>
</tbody></table>
<p>Time reported is the total duration for the timed portion, in microseconds.<br><br>
"without digest computation" comes from a modified benchmark where <code>Data::getFullName</code> is invoked on every Data packet before timed portion.<br><br>
"saving" is calculated as (1 - "without digest computation" / regular).</p>
<p>The projected saving is an upper bound of the actual saving when this Task completes, because:</p>
<ul>
<li>Digest computation is necessary when relative order between a Data packet and another Data packet with same Name must be determined, and in certain query conditions.</li>
<li>Additional complexity caused by this Task would increase constant factor.</li>
</ul>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=76692015-01-28T19:07:16ZJunxiao Shi
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>40</i></li></ul><p><a href="http://gerrit.named-data.net/1688">http://gerrit.named-data.net/1688</a></p>
<p>The initial implementation is uploaded as commit 56591cd366368630b84bb06fbd3778ba71b12012.</p>
<p>Benchmark result is unsatisfactory:</p>
<table><thead>
<tr>
<th>test case</th>
<th>platform</th>
<th>current</th>
<th>projected</th>
<th>56591cd3</th>
</tr>
</thead><tbody>
<tr>
<td>find(miss)-insert</td>
<td>Ubuntu 14.10</td>
<td>7412611</td>
<td>4023103</td>
<td>7130155</td>
</tr>
<tr>
<td>find(miss)-insert</td>
<td>OSX 10.9</td>
<td>3037630</td>
<td>1814464</td>
<td>2904101</td>
</tr>
<tr>
<td>insert-find(hit)</td>
<td>Ubuntu 14.10</td>
<td>6799756</td>
<td>3731573</td>
<td>7106507</td>
</tr>
<tr>
<td>insert-find(hit)</td>
<td>OSX 10.9</td>
<td>3028404</td>
<td>1825252</td>
<td>2915203</td>
</tr>
</tbody></table>
<p>There might be hidden <code>getFullName</code> call somewhere.<br>
If I pre-compute the digest, benchmark results are: 3776805, 1754864, 3874197, 1758086.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=76742015-01-28T19:42:39ZJunxiao Shi
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Code review</i></li><li><strong>% Done</strong> changed from <i>40</i> to <i>100</i></li></ul><p>I added a counter of expected <code>getFullName</code> calls into commit 56591cd366368630b84bb06fbd3778ba71b12012, and it's surprising to see that <code>getFullName</code> is called 600000 times in either test case.</p>
<p>This is caused by the benchmark design:<br>
The benchmark generates 50000 unique Names, and each Name is reused four times to make 200000 Data packets, which are inserted into a ContentStore of 50000 capacity.<br>
The 50001st Data being inserted has the same Name as the 1st Data. This requires a comparison between them in order to determine the relative order, which would require <code>getFullName</code> on both Data packets.</p>
<p>The solution is to change the benchmark to generate 100000 unique Names.<br>
The 100001st Data being inserted will have the same Name as the 1st Data, but the 1st Data is already evicted by that time.<br><br>
The new benchmark is in commit 9b842ad6913a2066aa74fa58eedb151916a9d33e.</p>
<p>New benchmark results (including baseline) are: (note that each test case now has 400000 iterations)</p>
<table><thead>
<tr>
<th>test case</th>
<th>platform</th>
<th>current</th>
<th>projected</th>
<th>9b842ad6</th>
</tr>
</thead><tbody>
<tr>
<td>find(miss)-insert</td>
<td>Ubuntu 14.10</td>
<td>18374417</td>
<td>9441264</td>
<td>9591541</td>
</tr>
<tr>
<td>find(miss)-insert</td>
<td>OSX 10.9</td>
<td>7625666</td>
<td>4877360</td>
<td>4722637</td>
</tr>
<tr>
<td>insert-find(hit)</td>
<td>Ubuntu 14.10</td>
<td>17559825</td>
<td>9859690</td>
<td>9975561</td>
</tr>
<tr>
<td>insert-find(hit)</td>
<td>OSX 10.9</td>
<td>7574497</td>
<td>4962484</td>
<td>4799287</td>
</tr>
</tbody></table>
<p>I believe this result is good enough.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=77992015-02-01T19:12:57ZJunxiao Shi
<ul></ul><blockquote>
<p>I can think of one potential issue with multiple data packets with the same name (different digests). What is the intended behavior for CS? Keep just one copy?</p>
</blockquote>
<p>At API level, ContentStore is not obligated to store or keep any packet.<br><br>
In current design, the ContentStore will store all Data packets with same Name but different digests.<br>
This behavior is tested in <code>DigestOrder</code> and <code>DigestExclude</code> test cases.</p>
<p>The computation cost is bounded:<br>
When second Data packet with same Name arrives, implicit digests are computed for both Data packets.<br>
When third or more Data packet with same Name arrives, implicit digest is computed for the incoming Data packets, while cached Data packets already have their implicit digests computed.<br><br>
At most two implicit digest computations can occur per insertion.</p>
NFD - Task #1706: Reduce implicit digest computation in ContentStorehttps://redmine.named-data.net/issues/1706?journal_id=78492015-02-02T20:32:34ZJunxiao Shi
<ul><li><strong>Status</strong> changed from <i>Code review</i> to <i>Closed</i></li></ul>