Project

General

Profile

Actions

Feature #2828

open

Add Pattern Inference in NDN Regular Expression

Added by Qiuhan Ding over 9 years ago. Updated over 9 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Utils
Target version:
-
Start date:
05/20/2015
Due date:
% Done:

0%

Estimated time:

Description

Add Pattern Inference in NDN Regular Expression to support new trust schema.

Pattern Inference is to derive patterns of original regex pattern with additional knowledge. Basically, we pass a list of arguments to regex, the number of which will be equal to the number of back references in the original pattern. Then the regex will match its back reference with these arguments and update the regex. New pattern derived from the updated regex will get all its back references filled.

There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two). name will replace the sub group with a new pattern, "null" will remove the sub group. "NA" will leave the sub group as it is.

For example: If I have a pattern like:

<ndn><ucla>(<>)(<>*)

And I should pass a vector with two elements.
The return of vector ["/qiuhan", "/mac/temp"] would be

<ndn><ucla><qiuhan><mac><temp>

The return of vector ["/qiuhan", "NA"] would be

<ndn><ucla><qiuhan>(<>*)

The return of vector ["/qiuhan", "null"] would be

<ndn><ucla><qiuhan>


Related issues 1 (0 open1 closed)

Blocked by ndn-cxx - Task #2829: Write New Trust Schema Design SpecificationClosedYingdi Yu05/20/2015

Actions
Actions #1

Updated by Junxiao Shi over 9 years ago

Please explain what is pattern inference.

Please upload the "new schema".

Actions #2

Updated by Yingdi Yu over 9 years ago

  • Category set to Security
  • Assignee set to Qiuhan Ding
Actions #3

Updated by Yingdi Yu over 9 years ago

  • Blocked by Task #2829: Write New Trust Schema Design Specification added
Actions #4

Updated by Qiuhan Ding over 9 years ago

Junxiao Shi wrote:

Please upload the "new schema".

About the new schema spec, I will update it soon...

Actions #5

Updated by Junxiao Shi over 9 years ago

  • Tracker changed from Task to Feature
  • Category changed from Security to Utils

Regex should be part of Utils, not Security.

Security schema changes should go into a separate issue.

Actions #6

Updated by Qiuhan Ding over 9 years ago

  • Description updated (diff)
Actions #7

Updated by Qiuhan Ding over 9 years ago

  • Description updated (diff)
Actions #8

Updated by Junxiao Shi over 9 years ago

  • Description updated (diff)
Actions #9

Updated by Junxiao Shi over 9 years ago

Why does pattern inference require complete knowledge?
Requiring complete knowledge means all captures are gone.

I think partial knowledge inference is more useful.
For example: Regex("<ndn><ucla>(<>)(<>*)").infer("ndn:/qiuhan", NA) could return Regex("<ndn><ucla><qiuhan>(<>*)").

Why does pattern inference take Names as knowledge?

I think taking partial patterns is more useful.
For example: Regex("<ndn><ucla>(<>)").infer("<ab*c>") could return Regex("<ndn><ucla><ab*c>").

This allows the result regex to do non-exact matching.

Also, error conditions should be defined.

What happens if the input knowledge doesn't match the capture group in the first place?
For example, Regex("<ndn><ucla>(<ab*c>)").infer("ndn:/ddd").

Actions #10

Updated by Yingdi Yu over 9 years ago

Junxiao Shi wrote:

Why does pattern inference require complete knowledge?
Requiring complete knowledge means all captures are gone.

I think partial knowledge inference is more useful.
For example: Regex("<ndn><ucla>(<>)(<>*)").infer("ndn:/qiuhan", NA) could return Regex("<ndn><ucla><qiuhan>(<>*)").

NA should be supported.

Why does pattern inference take Names as knowledge?

I think taking partial patterns is more useful.
For example: Regex("<ndn><ucla>(<>)").infer("<ab*c>") could return Regex("<ndn><ucla><ab*c>").

This allows the result regex to do non-exact matching.

I am afraid this feature is too general, and we do not use it. The implementation of such feature is non-trivial, I would defer the feature.

Also, error conditions should be defined.

What happens if the input knowledge doesn't match the capture group in the first place?
For example, Regex("<ndn><ucla>(<ab*c>)").infer("ndn:/ddd").

Agree.

Actions #11

Updated by Qiuhan Ding over 9 years ago

  • Description updated (diff)
Actions #12

Updated by Junxiao Shi over 9 years ago

New pattern derived from the updated regex will get all its back references filled.

"all" is inaccurate when "NA" is supported.

There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two).

What's the parameter type?

  • Name: ambigious, because Name("NA") is a valid Name
  • std::string: possibly inefficient
  • another type that is implicitly convertible from Name and has special values for "null" and "NA": unnecessary to declare "case insensitive" because special values should come from named constants

name will replace the sub group with a new pattern,

This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.

std::vector<name::Component> should be used instead.

"null" will remove the sub group.

How does this differ from passing an empty Name?

"NA" will leave the sub group as it is.

Actions #13

Updated by Qiuhan Ding over 9 years ago

Junxiao Shi wrote:

New pattern derived from the updated regex will get all its back references filled.

"all" is inaccurate when "NA" is supported.

There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two).

What's the parameter type?

  • Name: ambigious, because Name("NA") is a valid Name
  • std::string: possibly inefficient
  • another type that is implicitly convertible from Name and has special values for "null" and "NA": unnecessary to declare "case insensitive" because special values should come from named constants

name will replace the sub group with a new pattern,

This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.

std::vector<name::Component> should be used instead.

My previous design is that when pass a name the back reference matcher will first check whether the name matches the regex(using original match function in matcher that accept Name type as input), then get the matching result. So if change to a vector of Component, do you suggest that I add another match function to match this? Another point is that the names we pass into the pattern inference are usually derived from another name's sub groups(the "id as a function" as new schema spec described). In that case, I choose to use expand function to get the sub groups name. Do you think I need to change this function to return a vector of Component?

"null" will remove the sub group.

How does this differ from passing an empty Name?

"NA" will leave the sub group as it is.

Actions #14

Updated by Yingdi Yu over 9 years ago

Just had a discussion with Qiuhan and Haitao, we feel that although na is generally correct as an input to pattern inference, the algorithm and implementation is really complicated. And indeed schema does not need this feature. Therefore, to avoid blocking the schema implementation, we can defer the implementation of supporting na. We will add the na support only if it is needed at some other places.

If Junxiao insists to have this feature, please provide a valid algorithm for matching pattern against pattern. Otherwise we will go with the implementation without na.

Actions #15

Updated by Junxiao Shi over 9 years ago

note-9 does not insist in having NA feature. It's asking for a spec confirmation.

I know nothing about how regex is implemented.

Actions #16

Updated by Junxiao Shi over 9 years ago

name will replace the sub group with a new pattern,

This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.

std::vector<name::Component> should be used instead.

My previous design is that when pass a name the back reference matcher will first check whether the name matches the regex(using original match function in matcher that accept Name type as input), then get the matching result. So if change to a vector of Component, do you suggest that I add another match function to match this? Another point is that the names we pass into the pattern inference are usually derived from another name's sub groups(the "id as a function" as new schema spec described). In that case, I choose to use expand function to get the sub groups name. Do you think I need to change this function to return a vector of Component?

Yes, the APIs shall be changed to take a RelativeName aka "vector of Components".

Name is always absolute as decided in #1962 note-2. It's semantically wrong to treat some Name as relative.

Actions #17

Updated by Junxiao Shi over 9 years ago

In commit:5c1ca16b75c64dcd0ae075cd8292526803d70407, the spec says:

When nullptr is passed, the relevant sub-pattern will be removed.
For example, if the original pattern is <ndn><edu>(<>)(<>*)<>+ and ["ucla", nullptr] is passed to it.
The inferred pattern would be <ndn><edu><ucla><>+.

I had doubt on this definition:

This is either redundant or wrong.

If the sub-pattern can match an empty PartialName, this is redundant, because caller can pass an empty PartialName instead of a nullptr.

If the sub-pattern cannot match an empty PartialName, this is wrong, because the inferred pattern could match a name that is not matched by original pattern. Inferred pattern should always match a subset (not a superset) of what the original pattern could match.

For example, '([])(<>)<>+'.infer({nullptr, "ucla"}) shouldn't return <>+ because <>+ would accept ndn:/ndn/ucla/irl but the original regex ([])(<>)<>+ would reject ndn:/ndn/ucla/irl

Qiuhan explained:

Pattern derivation is not designed for matching a subset of original pattern.

It is used for deriving signer's name patterns from packet name pattern (the original pattern in this case).

So it is not necessary that the signer's pattern should reject a name that a packet pattern rejects.

There is an example in the schema docs:

rule {
  id key
  name (<>*)(<>)<KEY>[id]<ID-CERT>[version]
  signer key($1,null)|root()
}

This explanation seems reasonable. I'm copying it here so I can remember this.

Actions

Also available in: Atom PDF