Yingdi Yu, 03/19/2014 12:00 PM

# NDN Regular Expression

NDN regular expression matching is done at two levels: the name level and the name component level.

We use `<` and `>` to enclose a name component matcher, which specifies the pattern of a name component.
The component pattern is expressed using the [Perl Regular Expression Syntax](http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html).
For example, `<ab*c>` can match the 1st, 3rd, and 4th components of `/ac/dc/abc/abbc`, but it cannot match the 2nd component.
A special case is `<>`, a wildcard matcher that can match **ANY** component.

Note that a component matcher can match only one name component.
To match a name, you need to specify a name pattern built from name component matchers.
For example, `<ndn><edu><ucla>` can match the name `/ndn/edu/ucla`.
To describe more complicated name patterns, we borrow some syntax from standard regular expressions.
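The two matching levels can be illustrated with a minimal Python sketch. It is not the NDN library's implementation: `component_matches` and `name_matches` are hypothetical helpers, Python's `re` module stands in for the Perl syntax, and the sketch assumes that a component pattern must cover the whole component and that a plain matcher sequence must cover the whole name.

```python
import re

def component_matches(matcher: str, component: str) -> bool:
    """Test one name component against one `<...>` matcher (hypothetical helper)."""
    pattern = matcher[1:-1]          # drop the enclosing `<` and `>`
    if pattern == "":                # `<>` is the wildcard matcher
        return True
    # Assumption: the pattern has to cover the whole component.
    return re.fullmatch(pattern, component) is not None

def name_matches(matchers: list[str], name: str) -> bool:
    """Match a plain sequence of component matchers against a whole name."""
    components = [c for c in name.split("/") if c]
    # Each matcher consumes exactly one component, so the lengths must agree.
    return len(matchers) == len(components) and all(
        component_matches(m, c) for m, c in zip(matchers, components)
    )

per_component = [component_matches("<ab*c>", c) for c in ["ac", "dc", "abc", "abbc"]]
print(per_component)                                                # [True, False, True, True]
print(name_matches(["<ndn>", "<edu>", "<ucla>"], "/ndn/edu/ucla"))  # True
```

Because every matcher consumes exactly one component, a three-matcher sequence can never match a two-component name in this sketch.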

## NDN Regex Syntax

### Anchors

A `'^'` character shall match the start of a name.
For example, `^<ndn>` shall match any name starting with the component `ndn`, and it will exclude a name like `/local/broadcast`.

A `'$'` character shall match the end of a name.
For example, `^<ndn><edu>$` shall match only one name: `/ndn/edu`.
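One way to experiment with the anchors is to translate a flat NDN pattern into an ordinary Python regex over the name's URI string. The sketch below is an illustration only: `translate` and `matches` are hypothetical helpers that handle just `^`, `$`, and `<...>` matchers, and they assume component patterns never contain `/`, `<`, or `>`.

```python
import re

def translate(ndn_regex: str) -> str:
    """Turn a flat NDN regex (only ^, $ and <...>) into a Python regex (hypothetical)."""
    out = []
    for token in re.findall(r"\^|\$|<[^<>]*>", ndn_regex):
        if token in ("^", "$"):
            out.append(token)                    # anchors carry over unchanged
        else:
            inner = token[1:-1] or "[^/]+"       # `<>` matches any one component
            out.append("/(?:" + inner + ")(?=/|$)")  # must end at a component boundary
    return "".join(out)

def matches(ndn_regex: str, name: str) -> bool:
    return re.search(translate(ndn_regex), name) is not None

print(matches("^<ndn>", "/ndn/edu/ucla"))        # True
print(matches("^<ndn>", "/local/broadcast"))     # False
print(matches("^<ndn><edu>$", "/ndn/edu"))       # True
print(matches("^<ndn><edu>$", "/ndn/edu/ucla"))  # False
```

The lookahead `(?=/|$)` keeps a matcher such as `<ndn>` from accepting only the prefix of a longer component such as `ndnx`.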

### Repeats

A component matcher can be followed by a repeat syntax to indicate how many times the preceding component matcher can be matched.

Syntax `*` for zero or more times.
For example, `^<ndn><KEY><>*<ID-CERT>` shall match `/ndn/KEY/ID-CERT/`, or `/ndn/KEY/edu/ID-CERT`, or `/ndn/KEY/edu/ksk-12345/ID-CERT`, and so on.

Syntax `+` for one or more times.
For example, `^<ndn><KEY><>+<ID-CERT>` shall match `/ndn/KEY/edu/ID-CERT`, or `/ndn/KEY/edu/ksk-12345/ID-CERT`, and so on, but it cannot match `/ndn/KEY/ID-CERT/`.

Syntax `?` for zero or one time.
For example, `^<ndn><KEY><>?<ID-CERT>` shall match `/ndn/KEY/ID-CERT/`, or `/ndn/KEY/edu/ID-CERT`, but it cannot match `/ndn/KEY/edu/ksk-12345/ID-CERT`.

Repetition can also be bounded:

`{n}` for exactly `n` times, `{n,}` for at least `n` times, `{,n}` for at most `n` times, and `{n,m}` for `n` to `m` times.

Note that repeat matching is **greedy**, that is, it will consume as many matching components as possible.
Non-greedy and possessive repeat matching are not supported for now.
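The repeat operators and their greedy behaviour can be tried out with hand-translated Python regexes over the URI string. This is only a sketch of the semantics described above, not the library's matcher: `ANY` is an assumed stand-in for the wildcard matcher `<>`, and the names used are illustrative.

```python
import re

ANY = r"(?:/[^/]+)"   # assumed Python stand-in for the wildcard matcher `<>`
END = r"(?=/|$)"      # the preceding component must end here

star = re.compile("^/ndn/KEY" + ANY + "*/ID-CERT" + END)   # ^<ndn><KEY><>*<ID-CERT>
plus = re.compile("^/ndn/KEY" + ANY + "+/ID-CERT" + END)   # ^<ndn><KEY><>+<ID-CERT>
opt = re.compile("^/ndn/KEY" + ANY + "?/ID-CERT" + END)    # ^<ndn><KEY><>?<ID-CERT>
bounded = re.compile("^/ndn" + ANY + "{1,2}$")             # ^<ndn><>{1,2}$

print(bool(star.search("/ndn/KEY/ID-CERT")))               # True  (zero repeats)
print(bool(plus.search("/ndn/KEY/ID-CERT")))               # False (needs at least one)
print(bool(plus.search("/ndn/KEY/edu/ID-CERT")))           # True
print(bool(opt.search("/ndn/KEY/edu/ksk-12345/ID-CERT")))  # False (at most one)
print(bool(bounded.search("/ndn/a/b")))                    # True
print(bool(bounded.search("/ndn/a/b/c")))                  # False

# Greedy matching: the repeat takes as many components as it can, so the
# match ends at the last ID-CERT rather than the first one.
greedy = re.compile("^/ndn/KEY(" + ANY + "*)/ID-CERT" + END)
captured = greedy.search("/ndn/KEY/a/ID-CERT/b/ID-CERT").group(1)
print(captured)                                            # /a/ID-CERT/b
```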

### Sets

A name component set is a bracket expression starting with `'['` and ending with `']'`.
It defines a set of name components and matches any single name component that is a member of that set.

Unlike standard regular expressions, NDN regular expressions support only the **Single Components Set**,
that is, you have to list all the set members one by one between the brackets.
For example, `^[<ndn><localhost>]` shall match any name starting with either the component `ndn` or `localhost`.

When a name component set starts with a `'^'`, the set becomes a **Negation Set**,
that is, it matches the complement of the name components it contains.
For example, `^[^<ndn>]` shall match any name that does not start with the component `ndn`.

Some other types of sets, such as Range Set, will be supported later.

Note that a component set can be repeated as well.
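Set membership and negation can be sketched at the component level in a few lines of Python. `set_matches` is a hypothetical helper written for this illustration; it assumes every member is a `<...>` matcher whose pattern must cover the whole component.

```python
import re

def set_matches(set_expr: str, component: str) -> bool:
    """Match one component against `[<a><b>]` or `[^<a>]` (hypothetical helper)."""
    body = set_expr[1:-1]                      # drop the enclosing `[` and `]`
    negated = body.startswith("^")             # a leading ^ makes a negation set
    members = re.findall(r"<([^<>]*)>", body)  # the listed member patterns
    hit = any(re.fullmatch(m, component) for m in members)
    return hit != negated                      # negation flips the result

first = "/local/broadcast".split("/")[1]       # first component: "local"
print(set_matches("[<ndn><localhost>]", "ndn"))        # True
print(set_matches("[<ndn><localhost>]", "localhost"))  # True
print(set_matches("[<ndn><localhost>]", first))        # False
print(set_matches("[^<ndn>]", first))                  # True
print(set_matches("[^<ndn>]", "ndn"))                  # False
```

Like a single component matcher, a set consumes exactly one name component, whether or not it is negated.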

### Sub-pattern and Back Reference

A section beginning with `(` and ending with `)` acts as a marked sub-pattern.
Whatever matched the sub-pattern is split out into a separate field by the matching algorithm.
For example, `^([^<DNS>]*)<DNS>(<>*)<NS>` shall match the data name of an NDN DNS NS record;
the first sub-pattern captures the zone name, while the second sub-pattern captures the relative record name.

Marked sub-patterns can be referred to by a back-reference `\n`, where `n` is the index of the sub-pattern.
The example above shall match the name `/ndn/edu/ucla/DNS/irl/NS/123456`,
and a back reference `\1\2` shall extract `/ndn/edu/ucla/irl` out of the name.

Note that marked sub-patterns can also be repeated.
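The DNS example can be reproduced with Python's `re` module by translating the pattern by hand. This is a sketch under stated assumptions rather than the library's API: `ANY` and `NOT_DNS` are assumed stand-ins for `<>` and `[^<DNS>]`, the first sub-pattern is repeated (`[^<DNS>]*`) so that it can capture a multi-component zone name, and `Match.expand` plays the role of the back reference `\1\2`.

```python
import re

ANY = r"(?:/[^/]+)"                      # assumed stand-in for `<>`
NOT_DNS = r"(?:/(?!DNS(?:/|$))[^/]+)"    # assumed stand-in for `[^<DNS>]`

# Hand translation of ^([^<DNS>]*)<DNS>(<>*)<NS>
pattern = re.compile("^(" + NOT_DNS + "*)/DNS(" + ANY + "*)/NS(?=/|$)")

m = pattern.search("/ndn/edu/ucla/DNS/irl/NS/123456")
zone = m.group(1)               # first marked sub-pattern
label = m.group(2)              # second marked sub-pattern
combined = m.expand(r"\1\2")    # back reference \1\2

print(zone)      # /ndn/edu/ucla
print(label)     # /irl
print(combined)  # /ndn/edu/ucla/irl
```

Each marked sub-pattern ends up in its own field (`group(1)`, `group(2)`), and the back reference simply concatenates the captured components.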