Project

General

Profile

Feature #4050

Content Store Manager

Added by Davide Pesavento about 2 years ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Management
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:

Description

Design Content Store management protocol CsMgmt and implement the corresponding manager for it.

In this issue, the protocol shall support retrieving and updating CS config, including:

  • Get the maximum number of entries
  • Set the maximum number of entries (entries in excess are evicted)
  • Enable/disable the CS
    • This could be split into two orthogonal flags: (a) whether to cache incoming Data or not; and (b) whether to serve Data from the cache or not

Related issues

Related to NFD - Feature #4219: ContentStore hit/miss countersClosed

Related to NFD - Feature #4317: Content Store enumerationNew

Related to NFD - Feature #4318: Content Store flush/erase commandClosed

Related to NFD - Task #4498: 'nfdc cs' should be accepted as alias of 'nfdc cs info'Closed

History

#1 Updated by Junxiao Shi about 2 years ago

What's the application scenario, for each proposed feature?

#2 Updated by Davide Pesavento almost 2 years ago

Monitoring/measurement/experimentation.

#3 Updated by Qi Zhao almost 2 years ago

I am also interested in this feature. I think the controller should have this feature and it can help with experimentation as well. NDN-CC should also provide this feature as well.

#4 Updated by Davide Pesavento almost 2 years ago

  • Assignee set to Davide Pesavento

#5 Updated by Junxiao Shi almost 2 years ago

My main concern is on the scalability of the enumeration operation as part of the Management protocol.
On the NDN testbed, each router has 65536-entry CS. Suppose each record is 50 octets, and each Data packet in the dataset carries 8000 octets of payload, a full enumerate would produce 409 Data packets. Creating and signing 409 Data packets takes more than 1 second (according to #1589-1 benchmark on v1 KeyChain), which means NFD cannot process any new packets for more than 1 second, which is unacceptable.
Management thread (#3565) can eliminate the signing overhead, but it still requires the CS to be locked for some time in order to copy the packet names.

Monitoring/measurement/experimentation.

These purposes can be achieved with ndnSIM, where the scenario code has direct access to the nfd::Cs object. We just need to add methods to flush and enable/disable the CS.

#6 Updated by Davide Pesavento almost 2 years ago

Junxiao Shi wrote:

My main concern is on the scalability of the enumeration operation as part of the Management protocol.

Yes, that is my concern as well. However:

  1. The management protocol should not be constrained by current implementation limitations or inefficiencies, a future version of NFD may remove these limitations or a different forwarder may not have them in the first place.
  2. Enumeration can be a privileged operation.
  3. I'm not sure we actually need full CS enumeration. In fact, it would be enough as a first step to have a "query" mgmt interface that given a name + selectors returns any matching entry, basically wrapping Cs::find().
  4. Additionally, in the case of filtered enumeration, the CS manager can return only the first N matching entries, where N is a configuration parameter.

Monitoring/measurement/experimentation.

These purposes can be achieved with ndnSIM

I'm not interested in simulations.

#7 Updated by Junxiao Shi almost 2 years ago

it would be enough as a first step to have a "query" mgmt interface that given a name + selectors returns any matching entry, basically wrapping Cs::find().

You already have it (untested snippet):

auto interest = make_shared<Interest>("/A");
interest->setTag(make_shared<lp::NextHopFaceId>(254));
face.expressInterest(*interest, nullptr, nullptr, nullptr);

This Interest can only be forwarded to the special "content store" face, which means it will retrieve any Data present in the CS, but will be Nacked (with standards-conforming forwarding plane implementation) if the Data is missing.

#8 Updated by Davide Pesavento almost 2 years ago

Junxiao Shi wrote:

interest->setTag(make_shared<lp::NextHopFaceId>(254));

My understanding is that: (1) there's no guarantee that NextHopFaceId will be respected, and (2) everything concerning FaceId 254 is an internal implementation detail and must not be relied upon.

#9 Updated by Davide Pesavento almost 2 years ago

Another question is whether we need a new "cs" privilege for this manager or not.

Also, can we add the "max number of entries" to the GeneralStatus dataset, or do we need to create a separate CsStatus dataset for just one number?

#10 Updated by Davide Pesavento almost 2 years ago

  • Status changed from New to In Progress

#11 Updated by Davide Pesavento almost 2 years ago

  • Assignee changed from Davide Pesavento to Pauline Chan

#12 Updated by Junxiao Shi almost 2 years ago

can we add the "max number of entries" to the GeneralStatus dataset, or do we need to create a separate CsStatus dataset for just one number?

This can belong to status/general, if there's no plan for more fields.

#13 Updated by Davide Pesavento almost 2 years ago

The only 2 things I can think of right now are:

  1. Obtain the current state of the "cache" and "serve" flags (the "get" counterpart of the "set" described above).
  2. Get statistics on the CS, e.g. hit/miss count. However, I would expect monitoring software to poll these counters much more frequently than other relatively static values, so keeping them separate might not be a bad idea.

Finally, status/general already provides NCsEntries, so it seems natural to add a MaxCsEntries there.

#14 Updated by Pauline Chan almost 2 years ago

Draft of the specification:
https://gist.github.com/pchan1/29d603fb490ac3fc706ce3caba6eafeb

Suggestions are welcomed

#15 Updated by Junxiao Shi almost 2 years ago

I've reviewed design as of revision 797a7998bac6dc65ddeb9eb1c4bde9bd7c76f372.

  • I do not understand this sentence:

    This command does not have a hard limit and does not account for limits that exceed memory space

  • What's a use case for a CS with "cache" enabled but "serve" disabled?

  • What's a use case for a CS with "serve" enabled but "cache" disabled?

  • Why are set-limit and set-flags commands needed?
    Similar functionality is already supported as (1) modify nfd.conf (2) send SIGHUP to trigger config reload.
    What's a use case that cannot be supported through existing config reload mechanism?
    Hint: a valid use case should require runtime changes of each setting, and the justification needs to state why modifying nfd.conf is impractical.

  • Assuming there is a compelling use case that requires commands, why not defining a single command that controls all attributes of the CS including capacity limit, serve/cache flags, and eviction policy?

  • Should flush command support flushing only Data packets under a certain prefix?
    If so, erase is a better name.

  • list dataset could be huge.
    Suppose the CS is at default capacity of 65536 packets and is completely full, and the average entry length is 32 octets, the dataset would have 263 segments each with 8000 octets of payload.
    Generating and signing so many Data packets would cause a 664ms delay (according to #1589-1 benchmark), which is unacceptable.
    However, this is not a protocol problem.

#16 Updated by Davide Pesavento almost 2 years ago

Junxiao Shi wrote:

  • I do not understand this sentence:

    This command does not have a hard limit and does not account for limits that exceed memory space

It means that we make no attempt to limit the CS limit based on currently available RAM or other metrics. So if you ask for a 50GB limit on a 1GB machine, you'll probably trigger an OOM condition.

  • What's a use case for a CS with "cache" enabled but "serve" disabled?

For testing/experimentation, it can simplify or accelerate cache warm-up in some wireless scenarios. Also for demonstration purposes (I saw a CICN demo where they turned off "serving data from the CS").

  • What's a use case for a CS with "serve" enabled but "cache" disabled?

We don't really have a strong use case in mind at the moment.
However, if the cache-on serve-off use case above is approved, then we still need at least 2 bits to represent all 3 possible states, therefore adding the cache-off serve-on combination comes basically for free.

  • Why are set-limit and set-flags commands needed? Similar functionality is already supported as (1) modify nfd.conf (2) send SIGHUP to trigger config reload. What's a use case that cannot be supported through existing config reload mechanism? Hint: a valid use case should require runtime changes of each setting, and the justification needs to state why modifying nfd.conf is impractical.

Changing nfd.conf requires root privileges.
Also, I could ask the same thing about StrategyChoice management...

  • Assuming there is a compelling use case that requires commands, why not defining a single command that controls all attributes of the CS including capacity limit, serve/cache flags, and eviction policy?

What's your idea, specifically? We're open to suggestions on this.

  • Should flush command support flushing only Data packets under a certain prefix? If so, erase is a better name.

Maybe, but we didn't have a use case. However, I agree that the protocol can use "erase" as command verb, with one parameter (the prefix), and it doesn't necessarily have to match nfdc command line.

  • list dataset could be huge. Suppose the CS is at default capacity of 65536 packets and is completely full, and the average entry length is 32 octets, the dataset would have 263 segments each with 8000 octets of payload. Generating and signing so many Data packets would cause a 664ms delay (according to #1589-1 benchmark), which is unacceptable. However, this is not a protocol problem.

We already discussed about this (see above). We'll probably make this a "query" operation rather than "list".
For "query", the question is whether it finds only a single match or multiple matches. In the latter case, one can probably construct a query that matches every packet in the CS, so the scaling problem is still there. However, a CS Manager implementation may decide to truncate the response if it's larger than a predefined threshold.

#17 Updated by Junxiao Shi over 1 year ago

Changing nfd.conf requires root privileges.

This reason is invalid. The sysadmin can create a UNIX group as the owner of /etc/ndn/nfd.conf, and add users who are authorized to modify this file to the group.

Also, I could ask the same thing about StrategyChoice management...

Initially NFD management was designed to rely entirely on nfdc commands. table.strategy_choice config option was added later.

why not defining a single command that controls all attributes of the CS including capacity limit, serve/cache flags, and eviction policy?

What's your idea, specifically? We're open to suggestions on this.

I'd prefer a single set-config command for all these settings, because they can be changed together.
In particular, changing eviction policy requires clearing the entire CS. If both eviction policy and capacity limit are to be changed, it's more efficient to do them together. Otherwise, assuming there is a clear method on the existing eviction policy that is faster than repeatedly calling evict, in case the caller lowers capacity limit and then changes eviction policy, there is useless work during the first command for evicting excess entries first, given the second command would clear the CS anyways.

Symmetrically, there should be a config StatusDataset that reveals the current config.

The set of available eviction policies needs a separate cs/available-policies dataset, to support nfdc autocompletion.

I agree that the protocol can use "erase" as command verb, with one parameter (the prefix), and it doesn't necessarily have to match nfdc command line.

erase is more general and should be preferred. My use case is to clean up poisoned Data.

For "query", the question is whether it finds only a single match or multiple matches. In the latter case, one can probably construct a query that matches every packet in the CS, so the scaling problem is still there. However, a CS Manager implementation may decide to truncate the response if it's larger than a predefined threshold.

query definitely should support multiple matches. The desired number of entries should be provided as a parameter in the Interest, with an implementation-defined upper bound. Paging support is unnecessary.
Also, the Dispatcher's StatusDataset signing can be optimized: it can defer signing a segment until an Interest asking for this segment arrives. This still requires signing all the segments, but it does not block NFD for long time, and gives other faces a chance to compete for NFD forwarding processing.

#18 Updated by Davide Pesavento over 1 year ago

#19 Updated by Davide Pesavento over 1 year ago

Junxiao Shi wrote:

Changing nfd.conf requires root privileges.

This reason is invalid. The sysadmin can create a UNIX group as the owner of /etc/ndn/nfd.conf, and add users who are authorized to modify this file to the group.

Your solution is inferior, as it is too coarse-grained (all or nothing). With NFD mgmt, the admin can grant only the "cs" privilege to a user, disallowing all other changes.

I agree that the protocol can use "erase" as command verb, with one parameter (the prefix), and it doesn't necessarily have to match nfdc command line.

erase is more general and should be preferred. My use case is to clean up poisoned Data.

Sure. What I meant was that we can have a nfdc cs flush command (similar to ip address flush) in addition to nfdc cs erase <prefix>.

#20 Updated by Junxiao Shi over 1 year ago

I agree with note-19.

I feel this scope of this issue is too board and thus it's going to take very long time (I'm estimating more than 6 months). I suggest splitting get/set config, enumeration, and deleting data as three issues. Then, we can start with get/set config which is the easiest.

#21 Updated by Davide Pesavento over 1 year ago

  • Assignee changed from Pauline Chan to Davide Pesavento
  • % Done changed from 0 to 10

#22 Updated by Junxiao Shi over 1 year ago

  • Description updated (diff)
  • Target version set to v0.7

Enumeration is split to #4317.
Flush is split to #4318.
Hit/miss counters are provided by #4219.

#23 Updated by Davide Pesavento over 1 year ago

  • Status changed from In Progress to Feedback
  • Assignee deleted (Davide Pesavento)
  • Target version deleted (v0.7)

#24 Updated by Davide Pesavento over 1 year ago

#25 Updated by Davide Pesavento over 1 year ago

  • Related to Feature #4318: Content Store flush/erase command added

#26 Updated by Junxiao Shi over 1 year ago

  • Status changed from Feedback to In Progress
  • Assignee set to Junxiao Shi
  • Target version set to v0.7
  • % Done changed from 10 to 20

Protocol update for cs/config command and cs/info dataset extension are in CsMgmt rev5 and ControlCommand rev34.

#27 Updated by Davide Pesavento over 1 year ago

Junxiao Shi wrote:

Protocol update for cs/config command and cs/info dataset extension are in CsMgmt rev5 and ControlCommand rev34.

LGTM.

I suggest adding NCsEntries to cs/info as well, so that we have both capacity and current # of entries in the same dataset.

#28 Updated by Junxiao Shi over 1 year ago

I suggest adding NCsEntries to cs/info as well, so that we have both capacity and current # of entries in the same dataset.

Agreed. This is added in CsMgmt rev7.

#29 Updated by Junxiao Shi over 1 year ago

  • % Done changed from 20 to 30

https://gerrit.named-data.net/4453 declares cs/config command in ndn-cxx.

#30 Updated by Junxiao Shi over 1 year ago

why don't we require the client to set the mask appropriately, as in FaceUpdateCommand

Agreed.
ControlCommand rev35 and FaceMgmt rev73 and CsMgmt rev8 update the protocol. Definition of Mask field has been moved to ControlCommand page.

#31 Updated by Junxiao Shi over 1 year ago

https://gerrit.named-data.net/4453 patchset3 implements the updated protocol.

#32 Updated by Junxiao Shi over 1 year ago

  • % Done changed from 30 to 40

https://gerrit.named-data.net/4467 adds CS enablement flags to Cs class.

#33 Updated by Junxiao Shi over 1 year ago

https://gerrit.named-data.net/4468 adds Capacity and CS enablement flags to CsInfo struct.
It is a design choice to not have getFlags and to use the name getNStored instead of getNCsEntries. These function names are suitable for the semantics of this struct, instead of matching the encoding.

#34 Updated by Davide Pesavento over 1 year ago

Junxiao Shi wrote:

It is a design choice [...] to use the name getNStored instead of getNCsEntries.

Why not getSize or getNEntries?

These function names are suitable for the semantics of this struct, instead of matching the encoding.

This sentence is completely meaningless.

#35 Updated by Junxiao Shi over 1 year ago

Why not getSize or getNEntries?

getSize does not indicate whether the unit is “entries” or “octets”.
getNEntries is not better than getNStored, ie “number of stored entries”.

the field in the serialized/deserialized way is called NCsEntries (I thought it is the same until check). I think it should simply match that one and be called getNCsEntries / setNCsEntries

No.

  • TLV-TYPE does not have a name. It’s just a number.
  • The constant naming that number is indeed “NCsEntries”. This constant name is defined in the context of general status dataset. “CS” part would be duplicate in the context of CsInfo struct.
  • Public API of CsInfo is designed to make sense to the caller, not to match the encoding.

#36 Updated by Davide Pesavento over 1 year ago

Junxiao Shi wrote:

Why not getSize or getNEntries?

getSize does not indicate whether the unit is “entries” or “octets”.

Neither does getNStored.

getNEntries is not better than getNStored, ie “number of stored entries”.

getNEntries provides a more uniform naming since "entries" is used elsewhere in management with the same meaning. And see also the previous point.

the field in the serialized/deserialized way is called NCsEntries (I thought it is the same until check). I think it should simply match that one and be called getNCsEntries / setNCsEntries

No.

  • TLV-TYPE does not have a name. It’s just a number.
  • The constant naming that number is indeed “NCsEntries”. This constant name is defined in the context of general status dataset. “CS” part would be duplicate in the context of CsInfo struct.

True. But getNEntries doesn't contain this redundancy.

#37 Updated by Junxiao Shi about 1 year ago

CsInfo::wireDecode has a bug: when it encounters Capacity element, the value is incorrectly assigned to m_nHits instead of m_capacity. This is fixed in https://gerrit.named-data.net/4512. This bug was not caught earlier because:

  • The unit test passes the wire encoding to wireDecode, and then checks the decoded CsInfo equals the original.
  • To check the decoded CsInfo equals the original, the unit test uses operator==.
  • operator== does not invoke individual getters, but simply checks the encodings are equal.
  • wireDecode caches the encoding, so operator== is comparing the cached encoding.
  • As a result, there's no test coverage checking the correctness in decoded fields.

I have changed the test case to check the return value of each getter. To avoid the same mistake in the future, TLV struct decoding test cases should always check with individual getters.

#38 Updated by Junxiao Shi about 1 year ago

  • % Done changed from 40 to 70

https://gerrit.named-data.net/4514 implements cs/config command.

https://gerrit.named-data.net/4513 adds capacity and enablement flags to cs/info dataset (kept as draft until Change 4512 merges).

#39 Updated by Junxiao Shi about 1 year ago

Remaining part of this issue is adding nfdc commands. Design is below, but implementation needs to wait after https://gerrit.named-data.net/4489 to avoid conflicts.

nfdc cs info will switch to a multi-line format similar to nfdc status.

CS information:
  capacity=17274
     admit=on
     serve=off
  nEntries=4615
     nHits=8958
   nMisses=13540

There will be a new nfdc cs config command:

nfdc cs config [capacity <CAPACITY>] [admit on|off] [serve on|off]

#40 Updated by Davide Pesavento about 1 year ago

Junxiao Shi wrote:

Remaining part of this issue is adding nfdc commands. Design is below, but implementation needs to wait after https://gerrit.named-data.net/4489 to avoid conflicts.

4489 has been merged.

I agree with the proposed syntax for config and output format for info.

#41 Updated by Davide Pesavento about 1 year ago

  • Related to Task #4498: 'nfdc cs' should be accepted as alias of 'nfdc cs info' added

#42 Updated by Davide Pesavento about 1 year ago

#43 Updated by Davide Pesavento about 1 year ago

Junxiao, do you think you can finish this ASAP so that we can have it in v0.6.1?

#44 Updated by Davide Pesavento about 1 year ago

#45 Updated by Junxiao Shi about 1 year ago

  • % Done changed from 70 to 90

https://gerrit.named-data.net/4597 introduces text::OnOff helper for printing boolean as 'on' or 'off'.

https://gerrit.named-data.net/4598 adds nfdc cs config command. Addition to nfdc cs info will come in another commit.

#46 Updated by Junxiao Shi about 1 year ago

  • Status changed from In Progress to Code review
  • % Done changed from 90 to 100

https://gerrit.named-data.net/4604 nfdc cs info includes config fields; also updating XML, XSD, XSL, tested manually with online validator.
Extras: the other vertical output in nfdc status show is implemented alike. xml::Flag struct prints optional XML element for boolean field, so that unit test of formatItemXml does not need both true and false cases to achieve full coverage.

#47 Updated by Junxiao Shi about 1 year ago

  • Status changed from Code review to Closed

Also available in: Atom PDF