Revision 3 (Yingdi Yu, 03/19/2014 11:19 AM) → Revision 4/7 (Yingdi Yu, 03/19/2014 11:57 AM)
# NDN Regular Expression

NDN regular expression matching is done at two levels: the name level and the name component level.

We use `<` and `>` to enclose a name component matcher, which specifies the pattern of a single name component. The component pattern is expressed using the [Perl Regular Expression Syntax](http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html). For example, `<ab*c>` can match the 1st, 3rd, and 4th components of `/ac/dc/abc/abbc`, but it cannot match the 2nd component. As a special case, `<>` is a wildcard matcher that can match **ANY** component.

Note that a component matcher can match only one name component. To match a name, you need to specify the pattern of the name in terms of name component matchers. For example, `<ndn><edu><ucla>` can match the name `/ndn/edu/ucla`. To describe more complicated name patterns, we borrow some syntax from standard regular expressions.

## NDN Regex Syntax

### Anchors

A `'^'` character shall match the start of a name. For example, `^<ndn>` shall match any name starting with a component `ndn`, and it will exclude a name like `/local/broadcast`.

A `'$'` character shall match the end of a name. For example, `^<ndn><edu>$` shall match only one name: `/ndn/edu`.

### Repeats

A component matcher can be followed by a repeat syntax to indicate how many times the preceding component can be matched.

Syntax `*` for zero or more times. For example, `^<ndn><KEY><>*<ID-CERT>` shall match `/ndn/KEY/ID-CERT/`, or `/ndn/KEY/edu/ID-CERT`, or `/ndn/KEY/edu/ksk-12345/ID-CERT` and so on.

Syntax `+` for one or more times. For example, `^<ndn><KEY><>+<ID-CERT>` shall match `/ndn/KEY/edu/ID-CERT`, or `/ndn/KEY/edu/ksk-12345/ID-CERT` and so on, but it cannot match `/ndn/KEY/ID-CERT/`.

Syntax `?` for zero or one time. For example, `^<ndn><KEY><>?<ID-CERT>` shall match `/ndn/KEY/ID-CERT/`, or `/ndn/KEY/edu/ID-CERT`, but it cannot match `/ndn/KEY/edu/ksk-12345/ID-CERT`.

Repetition can also be bounded: `{n}` for exactly `n` times, `{n,}` for at least `n` times, `{,n}` for at most `n` times, and `{n, m}` for `n` to `m` times.

Note that repeat matching is **greedy**, that is, it will consume as many matching components as possible. Non-greedy repeat matching and possessive repeat matching are not supported for now.

### Sets

A name component set is a bracket expression starting with `'['` and ending with `']'`. It defines a set of name components and matches any single name component that is a member of that set.

Unlike standard regular expressions, NDN regular expressions support only the **Single Components Set**, that is, you have to list all the set members one by one between the brackets. For example, `^[<ndn><localhost>]` shall match any name starting with either a component `ndn` or `localhost`.

When a name component set starts with a `'^'`, the set becomes a **Negation Set**, that is, it matches the complement of the name components it contains. For example, `^[^<ndn>]` shall match any name that does not start with a component `ndn`.

Some other types of sets, such as Range Set, will be supported later.

Note that a component set can be repeated as well.

### Sub-pattern and Back Reference

A section beginning with `(` and ending with `)` acts as a marked sub-pattern. Whatever matched the sub-pattern is split out into a separate field by the matching algorithm. For example, `^([^<DNS>]*)<DNS>(<>*)<NS>` shall match an NDN DNS data name; the first sub-pattern captures the zone prefix, while the second sub-pattern captures the entry name.

Marked sub-patterns can be referred to by a back-reference `\n`. The same example above shall match the name `/ndn/edu/ucla/DNS/irl/NS/123456`, and the back reference `\1\2` shall extract `/ndn/edu/ucla/irl` out of the name.

Note that marked sub-patterns can also be repeated.
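
To make the syntax above concrete, here is a minimal Python sketch that translates an NDN regex into an ordinary Python `re` pattern over the textual form of a name. It is an illustration only, not the NDN library's implementation: the helpers `ndn_to_re`, `matches`, and `extract` are hypothetical names, Python's `re` syntax stands in for the Perl syntax used inside `<...>`, and component patterns are assumed not to contain `>`, `]`, or anything that matches `/`. The DNS pattern is written with a repeat on the negation set (`[^<DNS>]*`) so that the first sub-pattern can span several components.

```python
import re


def ndn_to_re(pattern: str) -> str:
    """Translate an NDN regex into a Python `re` pattern (hypothetical helper).

    Names are handled as plain strings such as "/ndn/edu/ucla".
    Assumption: component patterns contain no '>' or ']' and never match '/'.
    """

    def component(body: str) -> str:
        # "<>" is the wildcard; otherwise the body must cover one whole component.
        inner = "[^/]+" if body == "" else "(?:" + body + ")"
        return "/" + inner + "(?=/|$)"

    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "<":  # single component matcher
            j = pattern.index(">", i)
            out.append("(?:" + component(pattern[i + 1:j]) + ")")
            i = j + 1
        elif ch == "[":  # component set, possibly negated
            j = pattern.index("]", i)
            body = pattern[i + 1:j]
            members = "|".join(component(m) for m in re.findall(r"<([^>]*)>", body))
            if body.startswith("^"):
                # Negation set: any one component that is not a listed member.
                out.append("(?:(?!" + members + ")" + component("") + ")")
            else:
                out.append("(?:" + members + ")")
            i = j + 1
        else:
            # Anchors, repeats, and parentheses carry over unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


def matches(pattern: str, name: str) -> bool:
    """Return True if the NDN regex matches somewhere in the name."""
    return re.search(ndn_to_re(pattern), name) is not None


def extract(pattern: str, name: str, groups=(1, 2)):
    """Concatenate the chosen marked sub-patterns, like the back reference \\1\\2."""
    m = re.search(ndn_to_re(pattern), name)
    return None if m is None else "".join(m.group(g) for g in groups)


if __name__ == "__main__":
    # Component-level matching of <ab*c> against each component of /ac/dc/abc/abbc.
    print([matches("^<ab*c>$", "/" + c) for c in ["ac", "dc", "abc", "abbc"]])
    # Sub-patterns and back reference on the DNS example.
    print(extract("^([^<DNS>]*)<DNS>(<>*)<NS>", "/ndn/edu/ucla/DNS/irl/NS/123456"))
    # /ndn/edu/ucla/irl
```

Translating to a string pattern keeps the sketch short, and greedy repeats come for free from `re`. A real implementation would more likely match component by component, which avoids the restrictions on what a component pattern may contain.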