Yingdi Yu, 03/19/2014 12:00 PM
# NDN Regular Expression

NDN regular expression matching is done at two levels: the name level and the name component level.

We use `<` and `>` to enclose a name component matcher, which specifies the pattern of a single name component.
The component pattern is expressed using the [Perl Regular Expression Syntax](http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html).
For example, `<ab*c>` can match the 1st, 3rd, and 4th components of `/ac/dc/abc/abbc`, but it cannot match the 2nd component.
A special case is `<>`, a wildcard matcher that can match **ANY** component.
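
The component-level rule can be illustrated with a minimal Python sketch. It assumes that a component matcher must match the whole component, and it uses Python's `re` module as a stand-in for the Perl syntax (the two agree for a simple pattern like `ab*c`, but they are not identical in general). The helper name `component_matches` is hypothetical and not part of any NDN library.

```python
import re

def component_matches(pattern, component):
    """Return True if the whole component matches the pattern.

    An empty pattern stands for the `<>` wildcard and accepts any component.
    (Assumption: a matcher must cover the entire component, not a substring.)
    """
    if pattern == "":
        return True
    return re.fullmatch(pattern, component) is not None

name = "/ac/dc/abc/abbc"
components = name.strip("/").split("/")

# One result per component of /ac/dc/abc/abbc for the matcher <ab*c>.
print([component_matches("ab*c", c) for c in components])
# [True, False, True, True]
```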

Note that a component matcher can match only one name component.
In order to match a name, you need to specify the pattern of the name in terms of name component matchers.
For example, `<ndn><edu><ucla>` can match the name `/ndn/edu/ucla`.
To describe more complicated name patterns, we borrow some syntax from standard regular expressions.

## NDN Regex Syntax

### Anchors

A `'^'` character shall match the start of a name.
For example, `^<ndn>` shall match any name starting with a component `ndn`, and it will exclude a name like `/local/broadcast`.

A `'$'` character shall match the end of a name.
For example, `^<ndn><edu>$` shall match only one name: `/ndn/edu`.
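
The effect of the two anchors can be sketched over plain component lists. The snippet below is only a model of the semantics described above for patterns made of literal components; the helper names `components` and `match_literal` are hypothetical.

```python
def components(name):
    """Split an NDN URI such as /ndn/edu into its list of components."""
    return [c for c in name.split("/") if c]

def match_literal(pattern, name, anchor_end):
    """Model ^<c1><c2>... (anchor_end=False) or ^<c1><c2>...$ (anchor_end=True).

    `pattern` is a list of literal component strings.
    """
    comps = components(name)
    if anchor_end:
        # With `$`, the pattern must cover the whole name.
        return comps == pattern
    # With only `^`, the pattern must cover a prefix of the name.
    return comps[:len(pattern)] == pattern

print(match_literal(["ndn"], "/ndn/edu/ucla", anchor_end=False))        # True
print(match_literal(["ndn"], "/local/broadcast", anchor_end=False))     # False
print(match_literal(["ndn", "edu"], "/ndn/edu", anchor_end=True))       # True
print(match_literal(["ndn", "edu"], "/ndn/edu/ucla", anchor_end=True))  # False
```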

### Repeats

A component matcher can be followed by a repeat syntax to indicate how many times the preceding component can be matched.

Syntax `*` for zero or more times.
For example, `^<ndn><KEY><>*<ID-CERT>` shall match `/ndn/KEY/ID-CERT/`, or `/ndn/KEY/edu/ID-CERT`, or `/ndn/KEY/edu/ksk-12345/ID-CERT` and so on.

Syntax `+` for one or more times.
For example, `^<ndn><KEY><>+<ID-CERT>` shall match `/ndn/KEY/edu/ID-CERT`, or `/ndn/KEY/edu/ksk-12345/ID-CERT` and so on, but it cannot match `/ndn/KEY/ID-CERT/`.

Syntax `?` for zero or one time.
For example, `^<ndn><KEY><>?<ID-CERT>` shall match `/ndn/KEY/ID-CERT/`, or `/ndn/KEY/edu/ID-CERT`, but it cannot match `/ndn/KEY/edu/ksk-12345/ID-CERT`.

Repetition can also be bounded:

`{n}` for exactly `n` times. `{n,}` for at least `n` times. `{,n}` for at most `n` times. And `{n,m}` for `n` to `m` times.

Note that the repeat matching is **greedy**,
that is, it will consume as many matched components as possible.
We do not support non-greedy repeat matching and possessive repeat matching for now.

### Sets

A name component set is a bracket expression starting with `'['` and ending with `']'`.
It defines a set of name components and matches any single name component that is a member of that set.

Unlike the standard regular expression,
NDN regular expression only supports **Single Components Set**,
that is, you have to list all the set members one by one between the brackets.
For example, `^[<ndn><localhost>]` shall match any name starting with either a component `ndn` or `localhost`.

When a name component set starts with a `'^'`, the set becomes a **Negation Set**,
that is, it matches the complement of the name components it contains.
For example, `^[^<ndn>]` shall match any name that does not start with a component `ndn`.

Some other types of sets, such as Range Set, will be supported later.

Note that a component set can be repeated as well.

### Sub-pattern and Back Reference

A section beginning with `(` and ending with `)` acts as a marked sub-pattern.
Whatever matched the sub-pattern is split out in a separate field by the matching algorithms.
For example, `^([^<DNS>]*)<DNS>(<>*)<NS>` shall match a data name of an NDN DNS NS record,
and the first sub-pattern captures the zone name while the second sub-pattern captures the relative record name.

Marked sub-patterns can be referred to by a back-reference `\n`.
The same example above shall match a name `/ndn/edu/ucla/DNS/irl/NS/123456`,
and a back reference `\1\2` shall extract `/ndn/edu/ucla/irl` out of the name.

Note that marked sub-patterns can also be repeated.
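
The DNS example can be reproduced with a hand-written Python regex over the URI string. This is a sketch rather than the NDN matcher itself. It assumes the first sub-pattern is allowed to repeat (`[^<DNS>]*`) so that it can span a multi-component zone name such as `/ndn/edu/ucla`, and the names `ZONE`, `LABEL`, and `NS_RECORD` are introduced here for illustration only.

```python
import re

# Hand translation of ^([^<DNS>]*)<DNS>(<>*)<NS> into a regex over the URI.
ZONE = r"((?:/(?!DNS(?=/|$))[^/]+)*)"  # \1: leading components other than DNS
LABEL = r"((?:/[^/]+)*)"               # \2: any components between DNS and NS
NS_RECORD = re.compile("^" + ZONE + r"/DNS(?=/|$)" + LABEL + r"/NS(?=/|$)")

m = NS_RECORD.search("/ndn/edu/ucla/DNS/irl/NS/123456")
print(m.group(1))         # /ndn/edu/ucla
print(m.group(2))         # /irl
print(m.expand(r"\1\2"))  # /ndn/edu/ucla/irl
```

`Match.expand` plays the role of the back reference here: it substitutes the captured sub-patterns into the template `\1\2` and yields the concatenated name.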