Reduce implicit digest computation in ContentStore
Every time a Data packet is inserted to the ContentStore, its full Name (including the implicit digest component) is requested, which involves implicit digest computation; benchmark shows that implicit digest computation accounts for 48% time in a find-insert scenario.
The purpose of requesting a full Name upon insertion is to determine the relative order between two Data packets that have the same Name (without implicit digest), and to determine whether a cached Data can satisfy an Interest that may contain an implicit digest as its last Name component or exclude a range of implicit digests.
However, most Data packets have unique Names so their ordering can be determined without knowing the full Name, and most Interests do not appear to have an implicit digest as its last Name component and do not appear to exclude a range of implicit digests.
Therefore, we can optimize for the expected case, to eliminate implicit digest computation unless when necessary.
This Task is to implement the idea in
tables-concept-cs_20140117.pptx, so that implicit digest computation is performed only when necessary.
#2 Updated by Junxiao Shi about 5 years ago
The benchmark is done with a modified version of
data->getFullName()in the dataWorkload preparation loop, so that implicit digest is computed before CS insertion; or comment out this line, and CS insertion algorithm would have to compute implicit digest
- replace the loop of varying nInsertions with a single execution of 70000 insertions; this is needed because dataWorkload array would contain implicit digests after the first execution
The results are:
- CS insertion computes implicit digest: 62.0152 seconds
- pre-computed implicit digest - CS insertion does not compute implicit digest: 32.3143 seconds
- time saving: 48%
#7 Updated by Junxiao Shi about 5 years ago
@Yi did a benchmark on a Macbook Pro:
- CS insertion computes implicit digest: 0.443175 seconds
- pre-computed implicit digest - CS insertion does not compute implicit digest: 0.165358 seconds
- time saving: 63%
Implicit digest computation consumes a greater percentage of time in CS insertion on Macbook.
Another observation is: the same 70000 insertions is 100+ times faster than my benchmark on a Ubuntu 12.04 VM.
It's surprising to see such a significant difference.
This difference is likely to be more than hardware.
I don't remember what compiler options I was using when doing the benchmark.
Yi should repeat the benchmark on Ubuntu using the same compiler options that were used on Macbook.
#8 Updated by Junxiao Shi almost 5 years ago
tables-concept-cs_20140117.pptx is written,
Data::getFullName function is introduced in ndn-cxx.
This function computes the implicit digest upon first invocation, and caches the result internally.
It would be a duplicate to store a copy of full Name outside of the
A solution to this problem is: create a Compare of Data where
Data::getFullName invocation is minimized, in order to establish an ordered sequence.
CS algorithms never expressly test whether implicit digest has been computed, but only obtain full Name when it's required for a comparison.
#9 Updated by Yi Huang almost 5 years ago
Here is my design of implementing CS:
What does not change:
- All public functions have the same name so there is no need to rewrite code outside the CS.
- I did not think of a better way to keep the cleaning index. Therefore, I am keeping the use of boost::multi_index_container.
- There is not many changes to current CS Entry. I just removed things that manipulate skiplist iterators.
Using SkipList implementation from repo-ng:
- The SkipList can be created with a type and a comparator of that type. Inserting and Finding elements takes a object of that type. Here's my design of using the SkipList:
- To reduce unnecessary digest computation, the comparator function should first compare "getName()". If "getName()" appear to be the same on both side, compare "getFullName()".
- Since inserting and finding in SkipList requires a object of the type of element in the list, we need to construct a new CS entry once Cs::insert is called (same for Cs::find). This is because the SkipList cannot search object by index. I don't know whether this is tolerable overhead.
When insert to CS:
- A new CS entry with new data will be created.
- Call SkipList::find to see whether there is an entry with the same name (including digest if needed) already there in SkipList.
- If yes:
- check unsolicited. If old entry is not unsolicited and new entry is, discard the new entry and return.
- Update stale time then return.
- If no:
- Evict one entry if Cs is full.
- Insert the new entry to CleanupIndex
- Insert the new entry into SkipList then return.
When lookup (This is basically the steps described in Junxiao's slides. Moving forward in SkipList is done by using iterator provided by SkipList in 2nd step):
- A new CS entry with new data will be created.
- Use "lower_bound()" to locate starting point. And set nameLength to the number of components in the Interest, set lastMatch to nil.
- If last component in Interest Name may be an implicit digest, compute the digest of current CS entry.
- If Interest Name is not a prefix of current CS entry's Name plus implicit digest if computed, goto step 9.
- if current CS entry violates MinSuffixComponents, MaxSuffixComponents, PublisherPublicKeyLocator, Exclude, MustBeFresh selectors, go to step 8.
- If ChildSelector prefers leftmost child, return current CS entry.
- If ChildSelector prefers rightmost child, and ((lastMatch is nil) or (current CS entry and lastMatch have different nameLength-th component)), set lastMatch to current CS entry.
- Move to next CS entry in the ordered sequence, and goto step 3.
- Return lastMatch.
When evict entry:
- Unsolicited entries are evicted first.
- Stale entries are evicted next.
- If there is no unsolicited or stale entry, other entries are evicted by the order they are created.
- This can be maintained by boost::multi_index_container.
For periodically cleanup:
- We decided not to do this during our last meeting with Beichuan.
- Since now we are using SkipList from repo-ng, I think we need to make NFD depends on it. I am not very familiar with waf. How do I do that?
- Like I mentioned above. Is creating new CS entry just for finding existing entry in SkipList a tolerable overhead?
#10 Updated by Junxiao Shi almost 5 years ago
An alternative to SkipList is to add an ordered index to the multi-index container, where the order is determined by full Name.
Please explain why SkipList is chosen over an ordered index.
If SkipList is proven to be beneficial, constructing an Entry during lookups is acceptable because time complexity of doing so is constant.
Such Entry should be placed on the stack rather than the heap.
Please justify the choice of eviction single entry during insertion, over periodical mass evictions.
repo-index_20140617.pptx describes two Interest matching algorithms when ChildSelector=rightmost.
The algorithm in note-9 is the "alternate" described in these slides, and it is inefficient in certain cases.
I suggest using the first algorithm.
NFD should never depend on repo-ng.
I suggest moving SkipList to ndn-cxx, if SkipList is proven to be beneficial for ContentStore.
#11 Updated by Yi Huang almost 5 years ago
I am actually not sure whether our skiplist is going to out perform multi-index container. I did a quick look up on multi-index container's performance. It says the implementation is either matching or better than performance of STL containers.
#12 Updated by Junxiao Shi almost 5 years ago
Neither STL nor Boost has a SkipList.
The SkipList is implemented by repo-ng author, and is not STL container.
Please do some analysis or benchmark to compare which one of the following is faster:
- SkipList for name ordering, Multi-Index Container for cleanup
- Multi-Index Container for both name ordering and cleanup
#22 Updated by Junxiao Shi over 4 years ago
- Status changed from New to In Progress
- Assignee set to Junxiao Shi
Before I begin implementation, here's a baseline benchmark:
|test case||platform||regular||without digest computation||saving|
Time reported is the total duration for the timed portion, in microseconds.
"without digest computation" comes from a modified benchmark where
Data::getFullName is invoked on every Data packet before timed portion.
"saving" is calculated as (1 - "without digest computation" / regular).
The projected saving is an upper bound of the actual saving when this Task completes, because:
- Digest computation is necessary when relative order between a Data packet and another Data packet with same Name must be determined, and in certain query conditions.
- Additional complexity caused by this Task would increase constant factor.
#23 Updated by Junxiao Shi over 4 years ago
- % Done changed from 0 to 40
The initial implementation is uploaded as commit 56591cd366368630b84bb06fbd3778ba71b12012.
Benchmark result is unsatisfactory:
There might be hidden
getFullName call somewhere.
If I pre-compute the digest, benchmark results are: 3776805, 1754864, 3874197, 1758086.
#24 Updated by Junxiao Shi over 4 years ago
- Status changed from In Progress to Code review
- % Done changed from 40 to 100
I added a counter of expected
getFullName calls into commit 56591cd366368630b84bb06fbd3778ba71b12012, and it's surprising to see that
getFullName is called 600000 times in either test case.
This is caused by the benchmark design:
The benchmark generates 50000 unique Names, and each Name is reused four times to make 200000 Data packets, which are inserted into a ContentStore of 50000 capacity.
The 50001st Data being inserted has the same Name as the 1st Data. This requires a comparison between them in order to determine the relative order, which would require
getFullName on both Data packets.
The solution is to change the benchmark to generate 100000 unique Names.
The 100001st Data being inserted will have the same Name as the 1st Data, but the 1st Data is already evicted by that time.
The new benchmark is in commit 9b842ad6913a2066aa74fa58eedb151916a9d33e.
New benchmark results (including baseline) are: (note that each test case now has 400000 iterations)
I believe this result is good enough.
#25 Updated by Junxiao Shi over 4 years ago
I can think of one potential issue with multiple data packets with the same name (different digests). What is the intended behavior for CS? Keep just one copy?
At API level, ContentStore is not obligated to store or keep any packet.
In current design, the ContentStore will store all Data packets with same Name but different digests.
This behavior is tested in
DigestExclude test cases.
The computation cost is bounded:
When second Data packet with same Name arrives, implicit digests are computed for both Data packets.
When third or more Data packet with same Name arrives, implicit digest is computed for the incoming Data packets, while cached Data packets already have their implicit digests computed.
At most two implicit digest computations can occur per insertion.