Feature #2828: Add Pattern Inference in NDN Regular Expression - ndn-cxx - NDN project issue tracking system

Actions

Copy link

Feature #2828

open

Add Pattern Inference in NDN Regular Expression

Added by Qiuhan Ding about 10 years ago. Updated almost 10 years ago.

Status:

New

Priority:

Normal

Assignee:

Qiuhan Ding

Category:

Utils

Target version:

Start date:

05/20/2015

Due date:

% Done:

Estimated time:

Description

Add Pattern Inference in NDN Regular Expression to support new trust schema.

Pattern Inference is to derive patterns of original regex pattern with additional knowledge. Basically, we pass a list of arguments to regex, the number of which will be equal to the number of back references in the original pattern. Then the regex will match its back reference with these arguments and update the regex. New pattern derived from the updated regex will get all its back references filled.

There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two). name will replace the sub group with a new pattern, "null" will remove the sub group. "NA" will leave the sub group as it is.

For example: If I have a pattern like:

<ndn><ucla>(<>)(<>*)

And I should pass a vector with two elements.
The return of vector ["/qiuhan", "/mac/temp"] would be

<ndn><ucla><qiuhan><mac><temp>

The return of vector ["/qiuhan", "NA"] would be

<ndn><ucla><qiuhan>(<>*)

The return of vector ["/qiuhan", "null"] would be

<ndn><ucla><qiuhan>

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Junxiao Shi about 10 years ago

Please explain what is pattern inference.

Please upload the "new schema".

Actions

Copy link

Updated by Yingdi Yu about 10 years ago

Category set to Security
Assignee set to Qiuhan Ding

Actions

Copy link

Updated by Yingdi Yu about 10 years ago

Blocked by Task #2829: Write New Trust Schema Design Specification added

Actions

Copy link

Updated by Qiuhan Ding about 10 years ago

Junxiao Shi wrote:

Please upload the "new schema".

About the new schema spec, I will update it soon...

Actions

Copy link

Updated by Junxiao Shi about 10 years ago

Tracker changed from Task to Feature
Category changed from Security to Utils

Regex should be part of Utils, not Security.

Security schema changes should go into a separate issue.

Actions

Copy link

Updated by Qiuhan Ding about 10 years ago

Description updated (diff)

Actions

Copy link

Updated by Qiuhan Ding about 10 years ago

Description updated (diff)

Actions

Copy link

Updated by Junxiao Shi about 10 years ago

Description updated (diff)

Actions

Copy link

Updated by Junxiao Shi about 10 years ago

Why does pattern inference require complete knowledge?
Requiring complete knowledge means all captures are gone.

I think partial knowledge inference is more useful.
For example: Regex("<ndn><ucla>(<>)(<>*)").infer("ndn:/qiuhan", NA) could return Regex("<ndn><ucla><qiuhan>(<>*)").

Why does pattern inference take Names as knowledge?

I think taking partial patterns is more useful.
For example: Regex("<ndn><ucla>(<>)").infer("<ab*c>") could return Regex("<ndn><ucla><ab*c>").

This allows the result regex to do non-exact matching.

Also, error conditions should be defined.

What happens if the input knowledge doesn't match the capture group in the first place?
For example, Regex("<ndn><ucla>(<ab*c>)").infer("ndn:/ddd").

Actions

Copy link

#10

Updated by Yingdi Yu about 10 years ago

Junxiao Shi wrote:

Why does pattern inference require complete knowledge?
Requiring complete knowledge means all captures are gone.

I think partial knowledge inference is more useful.
For example: Regex("<ndn><ucla>(<>)(<>*)").infer("ndn:/qiuhan", NA) could return Regex("<ndn><ucla><qiuhan>(<>*)").

NA should be supported.

Why does pattern inference take Names as knowledge?

I think taking partial patterns is more useful.
For example: Regex("<ndn><ucla>(<>)").infer("<ab*c>") could return Regex("<ndn><ucla><ab*c>").

This allows the result regex to do non-exact matching.

I am afraid this feature is too general, and we do not use it. The implementation of such feature is non-trivial, I would defer the feature.

Also, error conditions should be defined.

What happens if the input knowledge doesn't match the capture group in the first place?
For example, Regex("<ndn><ucla>(<ab*c>)").infer("ndn:/ddd").

Agree.

Actions

Copy link

#11

Updated by Qiuhan Ding about 10 years ago

Description updated (diff)

Actions

Copy link

#12

Updated by Junxiao Shi about 10 years ago

New pattern derived from the updated regex will get all its back references filled.

"all" is inaccurate when "NA" is supported.

There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two).

What's the parameter type?

Name: ambigious, because Name("NA") is a valid Name
std::string: possibly inefficient
another type that is implicitly convertible from Name and has special values for "null" and "NA": unnecessary to declare "case insensitive" because special values should come from named constants

name will replace the sub group with a new pattern,

This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.

std::vector<name::Component> should be used instead.

"null" will remove the sub group.

How does this differ from passing an empty Name?

"NA" will leave the sub group as it is.

Actions

Copy link

#13

Updated by Qiuhan Ding about 10 years ago

Junxiao Shi wrote:

New pattern derived from the updated regex will get all its back references filled.

"all" is inaccurate when "NA" is supported.

There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two).

What's the parameter type?

Name: ambigious, because Name("NA") is a valid Name

std::string: possibly inefficient

another type that is implicitly convertible from Name and has special values for "null" and "NA": unnecessary to declare "case insensitive" because special values should come from named constants

name will replace the sub group with a new pattern,

This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.

std::vector<name::Component> should be used instead.

My previous design is that when pass a name the back reference matcher will first check whether the name matches the regex(using original match function in matcher that accept Name type as input), then get the matching result. So if change to a vector of Component, do you suggest that I add another match function to match this? Another point is that the names we pass into the pattern inference are usually derived from another name's sub groups(the "id as a function" as new schema spec described). In that case, I choose to use expand function to get the sub groups name. Do you think I need to change this function to return a vector of Component?

"null" will remove the sub group.

How does this differ from passing an empty Name?

"NA" will leave the sub group as it is.

Actions

Copy link

#14

Updated by Yingdi Yu about 10 years ago

Just had a discussion with Qiuhan and Haitao, we feel that although na is generally correct as an input to pattern inference, the algorithm and implementation is really complicated. And indeed schema does not need this feature. Therefore, to avoid blocking the schema implementation, we can defer the implementation of supporting na. We will add the na support only if it is needed at some other places.

If Junxiao insists to have this feature, please provide a valid algorithm for matching pattern against pattern. Otherwise we will go with the implementation without na.

Actions

Copy link

#15

Updated by Junxiao Shi about 10 years ago

note-9 does not insist in having NA feature. It's asking for a spec confirmation.

I know nothing about how regex is implemented.

Actions

Copy link

#16

Updated by Junxiao Shi about 10 years ago

name will replace the sub group with a new pattern,

This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.

std::vector<name::Component> should be used instead.

My previous design is that when pass a name the back reference matcher will first check whether the name matches the regex(using original match function in matcher that accept Name type as input), then get the matching result. So if change to a vector of Component, do you suggest that I add another match function to match this? Another point is that the names we pass into the pattern inference are usually derived from another name's sub groups(the "id as a function" as new schema spec described). In that case, I choose to use expand function to get the sub groups name. Do you think I need to change this function to return a vector of Component?

Yes, the APIs shall be changed to take a RelativeName aka "vector of Components".

Name is always absolute as decided in #1962 note-2. It's semantically wrong to treat some Name as relative.

Actions

Copy link

#17

Updated by Junxiao Shi almost 10 years ago

In commit:5c1ca16b75c64dcd0ae075cd8292526803d70407, the spec says:

When nullptr is passed, the relevant sub-pattern will be removed.
For example, if the original pattern is <ndn><edu>(<>)(<>*)<>+ and ["ucla", nullptr] is passed to it.
The inferred pattern would be <ndn><edu><ucla><>+.

I had doubt on this definition:

This is either redundant or wrong.

If the sub-pattern can match an empty PartialName, this is redundant, because caller can pass an empty PartialName instead of a nullptr.

If the sub-pattern cannot match an empty PartialName, this is wrong, because the inferred pattern could match a name that is not matched by original pattern. Inferred pattern should always match a subset (not a superset) of what the original pattern could match.

For example, '([])(<>)<>+'.infer({nullptr, "ucla"}) shouldn't return <>+ because <>+ would accept ndn:/ndn/ucla/irl but the original regex ([])(<>)<>+ would reject ndn:/ndn/ucla/irl

Qiuhan explained:

Pattern derivation is not designed for matching a subset of original pattern.

It is used for deriving signer's name patterns from packet name pattern (the original pattern in this case).

So it is not necessary that the signer's pattern should reject a name that a packet pattern rejects.

There is an example in the schema docs:

rule {
  id key
  name (<>*)(<>)<KEY>[id]<ID-CERT>[version]
  signer key($1,null)|root()
}

This explanation seems reasonable. I'm copying it here so I can remember this.

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

NFD » ndn-cxx

Tags

Feature #2828

Add Pattern Inference in NDN Regular Expression

Updated by Junxiao Shi about 10 years ago

Updated by Yingdi Yu about 10 years ago

Updated by Yingdi Yu about 10 years ago

Updated by Qiuhan Ding about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Qiuhan Ding about 10 years ago

Updated by Qiuhan Ding about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Yingdi Yu about 10 years ago

Updated by Qiuhan Ding about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Qiuhan Ding about 10 years ago

Updated by Yingdi Yu about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Junxiao Shi almost 10 years ago