regex - How to compose regexes in code
I'm writing regexes for the IRC protocol ABNF message format. The following is a short example of the regexes I'm writing:
    // digit = %x30-39 ; 0-9
    // "[0-9]"
    static const std::string digit("[\x30-\x39]");
I use the previous definitions to form more complex ones, and it gets complex, fast. I'm having problems with the more complicated regexes, specifically with composing them:
    // hexdigit = digit / "a" / "b" / "c" / "d" / "e" / "f"
    // "[[0-9]abcdef]"
    static const std::string hexdigit("[" + digit + "abcdef]");
A "hexdigit" is a "digit" or a "hex-letter".
Note: I don't care that the RFC defines the "hexdigit" letters (abcdef) as being uppercase. I'm going with what the RFC says, and I don't plan on changing its requirements.
    const std::regex digit(dapps::regex::digit);
    assert(std::regex_match("0", digit));
    assert(std::regex_match("1", digit));
    assert(std::regex_match("2", digit));
    assert(std::regex_match("3", digit));
    assert(std::regex_match("4", digit));
    assert(std::regex_match("5", digit));
    assert(std::regex_match("6", digit));
    assert(std::regex_match("7", digit));
    assert(std::regex_match("8", digit));
    assert(std::regex_match("9", digit));
    assert(!std::regex_match("10", digit));
In the code above, matching "digit" works as intended by the ABNF.
However, "hexdigit" comes out as invalid (or at least unintended) regex syntax:
    [[0-9]abcdef]
rather than
    [0-9abcdef]
and trying to match with it doesn't work:
    const std::regex hexdigit(dapps::regex::hexdigit);
    assert(std::regex_match("0", hexdigit));
    assert(std::regex_match("1", hexdigit));
    assert(std::regex_match("2", hexdigit));
    assert(std::regex_match("3", hexdigit));
    assert(std::regex_match("4", hexdigit));
    assert(std::regex_match("5", hexdigit));
    assert(std::regex_match("6", hexdigit));
    assert(std::regex_match("7", hexdigit));
    assert(std::regex_match("8", hexdigit));
    assert(std::regex_match("9", hexdigit));
    assert(std::regex_match("a", hexdigit));
    assert(std::regex_match("b", hexdigit));
    assert(std::regex_match("c", hexdigit));
    assert(std::regex_match("d", hexdigit));
    assert(std::regex_match("e", hexdigit));
    assert(std::regex_match("f", hexdigit));
    assert(!std::regex_match("10", hexdigit));
Consequently, if I make "digit" not have the "single character in range" selector ([ ]), I can't use "digit" to match a "digit".
I may be going about this the wrong way entirely, so my question is: do I need to keep both versions, one with and one without the brackets, or is there an easier way altogether to compose regexes?
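For comparison, here is a minimal sketch of the "keep both versions" option, assuming each class body is stored without brackets and the brackets are added only when a complete regex is needed. The namespace `sets`, the function `bracket`, and the `*_re` constants are hypothetical names, not part of the question's `dapps::regex` code.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical layout: store the *contents* of each ABNF character class
// without brackets, so bodies can be concatenated into larger classes.
namespace sets {
const std::string digit("\x30-\x39");           // digit = %x30-39, i.e. "0-9"
const std::string hexletter("abcdef");          // lowercase, as in the question
const std::string hexdigit(digit + hexletter);  // "0-9abcdef"
}  // namespace sets

// Turn a bracket-less body into a one-character class: "0-9" -> "[0-9]".
inline std::string bracket(const std::string& body) { return "[" + body + "]"; }

// The bracketed versions are derived from the bodies, never written twice.
const std::string digit_re = bracket(sets::digit);        // "[0-9]"
const std::string hexdigit_re = bracket(sets::hexdigit);  // "[0-9abcdef]"
```

This removes the duplication, but it only works for merging single-character classes; it does not help once a rule has to combine longer sub-expressions.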
Rather than melding the 2 character classes as you have attempted, which would have needed to produce
    [0-9abcdef]
construct an alternation, i.e. a logical OR, via the pipe char |, and bracket the joined terms in a non-capturing group:
    (?:[0-9]|[abcdef])
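Applied to the question's own definitions, the alternation could look like the sketch below. It assumes the default `std::regex` grammar (ECMAScript), which supports `(?: )` non-capturing groups; `digit` is kept exactly as the question defines it, with its brackets.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The question's bracketed definition of digit, unchanged.
static const std::string digit("[\x30-\x39]");  // "[0-9]"

// Compose by alternation instead of merging class bodies: each operand
// stays a complete regex, and (?: ) groups them without capturing.
static const std::string hexdigit("(?:" + digit + "|[abcdef])");
// hexdigit == "(?:[0-9]|[abcdef])"
```

Because `digit` keeps its brackets, the same string works both on its own and as a building block, so only one version needs to be maintained.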
The benefit of this approach is that you can join any 2 expressions this way, character class or otherwise, e.g. digit or whitespace:
    (?:[0-9]|\s)
so the same technique applies to every rule you compose.
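Since any complete sub-expression can be an operand, the joining can be wrapped in a small helper. A sketch, assuming a hypothetical function `alternation` (not a standard library facility) that takes the operands as strings:

```cpp
#include <cassert>
#include <initializer_list>
#include <regex>
#include <string>

// Hypothetical helper: join any number of complete regexes into one
// non-capturing alternation, e.g. {"[0-9]", "\\s"} -> "(?:[0-9]|\\s)".
// Operands can be character classes, escapes, or longer sub-expressions.
inline std::string alternation(std::initializer_list<std::string> operands) {
    std::string out = "(?:";
    bool first = true;
    for (const std::string& operand : operands) {
        if (!first) out += '|';
        out += operand;
        first = false;
    }
    out += ')';
    return out;
}

static const std::string digit("[0-9]");
// digit or whitespace, as in the example above
static const std::string digit_or_space = alternation({digit, "\\s"});
// hexdigit = digit / "a" / ... / "f"
static const std::string hexdigit = alternation({digit, "[abcdef]"});
```

Each ABNF `/` then maps to one operand of `alternation`, which keeps the C++ close to the grammar text.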
A minor point: you can code [abcdef] as
    [a-f]
and/or make it case insensitive with
    [a-fA-F]
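Both variants are sketched below. The second one uses the standard `std::regex::icase` flag instead of spelling out both ranges; that flag is an alternative not mentioned in the answer, and the constant names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// [a-f] is shorthand for [abcdef]; adding the A-F range accepts both cases.
static const std::string hexdigit_ci("(?:[0-9]|[a-fA-F])");

// Alternative: keep only [a-f] and let the regex engine ignore case.
static const std::regex hexdigit_icase("(?:[0-9]|[a-f])", std::regex::icase);
```

Spelling out `[a-fA-F]` keeps the pattern a plain string that can still be composed further, while `icase` applies to the whole compiled regex at once.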