Feature #2828
openAdd Pattern Inference in NDN Regular Expression
0%
Description
Add Pattern Inference in NDN Regular Expression to support new trust schema.
Pattern Inference is to derive patterns of original regex pattern with additional knowledge. Basically, we pass a list of arguments to regex, the number of which will be equal to the number of back references in the original pattern. Then the regex will match its back reference with these arguments and update the regex. New pattern derived from the updated regex will get all its back references filled.
There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two). name will replace the sub group with a new pattern, "null" will remove the sub group. "NA" will leave the sub group as it is.
For example: If I have a pattern like:
<ndn><ucla>(<>)(<>*)
And I should pass a vector with two elements.
The return of vector ["/qiuhan", "/mac/temp"] would be
<ndn><ucla><qiuhan><mac><temp>
The return of vector ["/qiuhan", "NA"] would be
<ndn><ucla><qiuhan>(<>*)
The return of vector ["/qiuhan", "null"] would be
<ndn><ucla><qiuhan>
Updated by Junxiao Shi over 9 years ago
Please explain what is pattern inference.
Please upload the "new schema".
Updated by Yingdi Yu over 9 years ago
- Category set to Security
- Assignee set to Qiuhan Ding
Updated by Yingdi Yu over 9 years ago
- Blocked by Task #2829: Write New Trust Schema Design Specification added
Updated by Qiuhan Ding over 9 years ago
Junxiao Shi wrote:
Please upload the "new schema".
About the new schema spec, I will update it soon...
Updated by Junxiao Shi over 9 years ago
- Tracker changed from Task to Feature
- Category changed from Security to Utils
Regex should be part of Utils, not Security.
Security schema changes should go into a separate issue.
Updated by Junxiao Shi over 9 years ago
Why does pattern inference require complete knowledge?
Requiring complete knowledge means all captures are gone.
I think partial knowledge inference is more useful.
For example: Regex("<ndn><ucla>(<>)(<>*)").infer("ndn:/qiuhan", NA)
could return Regex("<ndn><ucla><qiuhan>(<>*)")
.
Why does pattern inference take Name
s as knowledge?
I think taking partial patterns is more useful.
For example: Regex("<ndn><ucla>(<>)").infer("<ab*c>")
could return Regex("<ndn><ucla><ab*c>")
.
This allows the result regex to do non-exact matching.
Also, error conditions should be defined.
What happens if the input knowledge doesn't match the capture group in the first place?
For example, Regex("<ndn><ucla>(<ab*c>)").infer("ndn:/ddd")
.
Updated by Yingdi Yu over 9 years ago
Junxiao Shi wrote:
Why does pattern inference require complete knowledge?
Requiring complete knowledge means all captures are gone.
I think partial knowledge inference is more useful.
For example:Regex("<ndn><ucla>(<>)(<>*)").infer("ndn:/qiuhan", NA)
could returnRegex("<ndn><ucla><qiuhan>(<>*)")
.
NA
should be supported.
Why does pattern inference take
Name
s as knowledge?
I think taking partial patterns is more useful.
For example:Regex("<ndn><ucla>(<>)").infer("<ab*c>")
could returnRegex("<ndn><ucla><ab*c>")
.
This allows the result regex to do non-exact matching.
I am afraid this feature is too general, and we do not use it. The implementation of such feature is non-trivial, I would defer the feature.
Also, error conditions should be defined.
What happens if the input knowledge doesn't match the capture group in the first place?
For example,Regex("<ndn><ucla>(<ab*c>)").infer("ndn:/ddd")
.
Agree.
Updated by Junxiao Shi over 9 years ago
New pattern derived from the updated regex will get all its back references filled.
"all" is inaccurate when "NA" is supported.
There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two).
What's the parameter type?
Name
: ambigious, becauseName("NA")
is a valid Namestd::string
: possibly inefficient- another type that is implicitly convertible from
Name
and has special values for "null" and "NA": unnecessary to declare "case insensitive" because special values should come from named constants
name will replace the sub group with a new pattern,
This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.
std::vector<name::Component>
should be used instead.
"null" will remove the sub group.
How does this differ from passing an empty Name?
"NA" will leave the sub group as it is.
Updated by Qiuhan Ding over 9 years ago
Junxiao Shi wrote:
New pattern derived from the updated regex will get all its back references filled.
"all" is inaccurate when "NA" is supported.
There are three possible types of arguments: name, "null" or "NA" (case insensitive for the last two).
What's the parameter type?
Name
: ambigious, becauseName("NA")
is a valid Namestd::string
: possibly inefficient- another type that is implicitly convertible from
Name
and has special values for "null" and "NA": unnecessary to declare "case insensitive" because special values should come from named constantsname will replace the sub group with a new pattern,
This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.
std::vector<name::Component>
should be used instead.
My previous design is that when pass a name the back reference matcher will first check whether the name matches the regex(using original match function in matcher that accept Name type as input), then get the matching result. So if change to a vector of Component, do you suggest that I add another match function to match this? Another point is that the names we pass into the pattern inference are usually derived from another name's sub groups(the "id as a function" as new schema spec described). In that case, I choose to use expand function to get the sub groups name. Do you think I need to change this function to return a vector of Component?
"null" will remove the sub group.
How does this differ from passing an empty Name?
"NA" will leave the sub group as it is.
Updated by Yingdi Yu over 9 years ago
Just had a discussion with Qiuhan and Haitao, we feel that although na
is generally correct as an input to pattern inference, the algorithm and implementation is really complicated. And indeed schema does not need this feature. Therefore, to avoid blocking the schema implementation, we can defer the implementation of supporting na
. We will add the na
support only if it is needed at some other places.
If Junxiao insists to have this feature, please provide a valid algorithm for matching pattern against pattern. Otherwise we will go with the implementation without na
.
Updated by Junxiao Shi over 9 years ago
note-9 does not insist in having NA feature. It's asking for a spec confirmation.
I know nothing about how regex is implemented.
Updated by Junxiao Shi over 9 years ago
name will replace the sub group with a new pattern,
This implies the Name is relative, which contradicts with the decision in #1962 note-2 and therefore is semantically incorrect.
std::vector<name::Component>
should be used instead.My previous design is that when pass a name the back reference matcher will first check whether the name matches the regex(using original match function in matcher that accept Name type as input), then get the matching result. So if change to a vector of Component, do you suggest that I add another match function to match this? Another point is that the names we pass into the pattern inference are usually derived from another name's sub groups(the "id as a function" as new schema spec described). In that case, I choose to use expand function to get the sub groups name. Do you think I need to change this function to return a vector of Component?
Yes, the APIs shall be changed to take a RelativeName
aka "vector of Components".
Name is always absolute as decided in #1962 note-2. It's semantically wrong to treat some Name as relative.
Updated by Junxiao Shi over 9 years ago
In commit:5c1ca16b75c64dcd0ae075cd8292526803d70407, the spec says:
When nullptr is passed, the relevant sub-pattern will be removed.
For example, if the original pattern is<ndn><edu>(<>)(<>*)<>+
and ["ucla", nullptr] is passed to it.
The inferred pattern would be<ndn><edu><ucla><>+
.
I had doubt on this definition:
This is either redundant or wrong.
If the sub-pattern can match an empty PartialName, this is redundant, because caller can pass an empty PartialName instead of a nullptr.
If the sub-pattern cannot match an empty PartialName, this is wrong, because the inferred pattern could match a name that is not matched by original pattern. Inferred pattern should always match a subset (not a superset) of what the original pattern could match.
For example, '([])(<>)<>+'.infer({nullptr, "ucla"}) shouldn't return <>+ because <>+ would accept ndn:/ndn/ucla/irl but the original regex ([])(<>)<>+ would reject ndn:/ndn/ucla/irl
Qiuhan explained:
Pattern derivation is not designed for matching a subset of original pattern.
It is used for deriving signer's name patterns from packet name pattern (the original pattern in this case).
So it is not necessary that the signer's pattern should reject a name that a packet pattern rejects.
There is an example in the schema docs:
rule {
id key
name (<>*)(<>)<KEY>[id]<ID-CERT>[version]
signer key($1,null)|root()
}
This explanation seems reasonable. I'm copying it here so I can remember this.