Project

General

Profile

Actions

Regex » History » Revision 5

« Previous | Revision 5/7 (diff) | Next »
Yingdi Yu, 03/19/2014 12:00 PM


NDN Regular Expression

NDN regular expression matching is done at two levels: one at the name level and one at the name component level.

We use < and > to enclose a name component matcher which specifies the pattern of a name component.
The component pattern is expressed using the Perl Regular Expression Syntax.
For example, <ab*c> can match the 1st, 3rd, and 4th components of /ac/dc/abc/abbc, but it cannot match the 2nd component.
A special case is that <> is a wildcard matcher that can match ANY component.

Note that a component match can match only one name component.
In order to match a name, you need to specify the pattern of a name based on the name component matchers.
For example, <ndn><edu><ucla> can match the name /ndn/edu/ucla.
In order to describe a more complicated name pattern, we borrow some syntaxes from the standard regular expressions.

NDN Regex Syntax

Anchors

A '^' character shall match the start of a name.
For example, ^<ndn> shall match any names starting with a component ndn, and it will exclude a name like /local/broadcast.

A '$' character shall match the end of a name.
For example, ^<ndn><edu>$ shall match only one name: /ndn/edu.

Repeats

A component matcher can be followed by a repeat syntax to indicate how many times the preceding component can be matched.

Syntax * for zero or more times.
For example, ^<ndn><KEY><>*<ID-CERT> shall match /ndn/KEY/ID-CERT/, or /ndn/KEY/edu/ID-CERT, or /ndn/KEY/edu/ksk-12345/ID-CERT and so on.

Syntax + for one or more times.
For example, ^<ndn><KEY><>+<ID-CERT> shall match /ndn/KEY/edu/ID-CERT, or /ndn/KEY/edu/ksk-12345/ID-CERT and so on, but it cannot match /ndn/KEY/ID-CERT/.

Syntax ? for zero or one times.
For example, ^<ndn><KEY><>?<ID-CERT> shall match /ndn/KEY/ID-CERT/, or /ndn/KEY/edu/ID-CERT, but it cannot match /ndn/KEY/edu/ksk-12345/ID-CERT.

Repetition can also be bounded:

{n} for exactly n times. {n,} for at least n times. {,n} for at most n times. And {n, m} for n to m times.

Note that the repeat matching is greedy,
that is it will consume as many matched components as possible.
We do not support non-greedy repeat matching and possessive repeat matching for now.

Sets

Name component set is a bracket-expression starting with '[' and ending with ']',
it defines a set of name components, and matches any single name component that is a member of that set.

Unlike the standard regular expression,
NDN regular expression only supports Single Components Set,
that is, you have to list all the set members one by one between the bracket.
For example, ^[<ndn><localhost>] shall match any names starting with either a component ndn" or localhost.

When a name component set starts with a '^', the set becomes a Negation Set,
that is, it matches the complement of the name components it contains.
For example, ^[^<ndn>] shall match any names that does not start with a component ndn.

Some other types of sets, such as Range Set, will be supported later.

Note that component set can be repeated as well.

Sub-pattern and Back Reference

A section beginning ( and ending ) acts as a marked sub-pattern.
Whatever matched the sub-pattern is split out in a separate field by the matching algorithms.
For example ^([^<DNS>])<DNS>(<>*)<NS> shall match a data name of NDN DNS NS record,
and the first sub-pattern captures the zone name while the second sub-pattern captures the relative record name.

Marked sub-patterns can be referred to by a back-reference \n.
The same example above shall match a name /ndn/edu/ucla/DNS/irl/NS/123456,
and a back reference \1\2 shall extract /ndn/edu/ucla/irl out of the name.

Note that marked sub-patterns can be also repeated.

Updated by Yingdi Yu over 10 years ago · 7 revisions