regex - How to compose regexes in code


I am writing regexes for the IRC protocol's ABNF message format. The following is a short example of the kind of regex I am writing.

// digit      =  %x30-39                 ; 0-9
// "[0-9]"
static const std::string digit("[\x30-\x39]");

I use previous definitions to form more complex ones, and it gets complex fast. What I am having problems with, for the more complicated regexes, is composing them:

// hexdigit = digit / "a" / "b" / "c" / "d" / "e" / "f"
// "[[0-9]abcdef]"
static const std::string hexdigit("[" + digit + "abcdef]");

A "hexdigit" is a "digit" or a "hex-letter".

Note: I don't care that the RFC defines the "hexdigit" letters (abcdef) as being uppercase. I am going by what the RFC says, and I don't plan on changing its requirements.

const std::regex digit(dapps::regex::digit);
assert(std::regex_match("0", digit));
assert(std::regex_match("1", digit));
assert(std::regex_match("2", digit));
assert(std::regex_match("3", digit));
assert(std::regex_match("4", digit));
assert(std::regex_match("5", digit));
assert(std::regex_match("6", digit));
assert(std::regex_match("7", digit));
assert(std::regex_match("8", digit));
assert(std::regex_match("9", digit));
assert(!std::regex_match("10", digit));

In the code above, matching "digit" works as intended in the ABNF.

However, "hexdigit" comes out as illegal regex syntax:

[[0-9]abcdef] 

rather than

[0-9abcdef] 

and trying to match against it won't work:

const std::regex hexdigit(dapps::regex::hexdigit);
assert(std::regex_match("0", hexdigit));
assert(std::regex_match("1", hexdigit));
assert(std::regex_match("2", hexdigit));
assert(std::regex_match("3", hexdigit));
assert(std::regex_match("4", hexdigit));
assert(std::regex_match("5", hexdigit));
assert(std::regex_match("6", hexdigit));
assert(std::regex_match("7", hexdigit));
assert(std::regex_match("8", hexdigit));
assert(std::regex_match("9", hexdigit));
assert(std::regex_match("a", hexdigit));
assert(std::regex_match("b", hexdigit));
assert(std::regex_match("c", hexdigit));
assert(std::regex_match("d", hexdigit));
assert(std::regex_match("e", hexdigit));
assert(std::regex_match("f", hexdigit));
assert(!std::regex_match("10", hexdigit));

Consequently, if I make "digit" not have the "single character in range" selector ([ ]), then I can't use "digit" on its own to match a "digit".

I may be going about this the wrong way entirely, so my question is: do I need to keep both versions, one with and one without brackets, or is there an easier way altogether to compose regexes?

Rather than meld the two character classes as you have attempted, which should have been:

[0-9abcdef] 

construct an alternation, i.e. a logical OR, via the pipe character |, and wrap the joined terms in a non-capturing group:

(?:[0-9]|[abcdef]) 

The benefit of this approach is that you can join any two expressions this way, character class or otherwise, e.g. a digit or whitespace:

(?:[0-9]|\s) 

So it can be applied generally.
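As a rough sketch of how the alternation approach could look with the std::string building blocks from the question: the helper name `alternation`, the name `hexletter`, and the `full_match` wrapper are made up for illustration and are not part of the original code. It assumes the default ECMAScript grammar of std::regex.

```cpp
#include <cassert>
#include <initializer_list>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): joins sub-patterns with '|'
// and wraps the result in a non-capturing group so it can be nested safely.
inline std::string alternation(std::initializer_list<std::string> alternatives) {
    std::string out = "(?:";
    bool first = true;
    for (const std::string& alt : alternatives) {
        if (!first) out += '|';
        out += alt;
        first = false;
    }
    out += ')';
    return out;
}

// Building blocks, each a complete pattern on its own.
static const std::string digit("[0-9]");
static const std::string hexletter("[abcdef]");  // assumed name, for illustration
// Composed rule: evaluates to "(?:[0-9]|[abcdef])".
static const std::string hexdigit = alternation({digit, hexletter});

// Returns true if the whole of `text` matches `pattern`.
inline bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

Because every rule stays a complete pattern, "digit" can still be used on its own, and composed rules can be passed to `alternation` again to build larger ones, so there is no need to keep a second, bracket-less version of each definition.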


A minor point: you can code [abcdef] as [a-f], and/or you can make it case insensitive with [a-fA-F].
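A small sketch of that minor point, in case it helps: the variable names and the `is_full_match` wrapper are assumptions for illustration only, and whether accepting uppercase is desirable depends on how strictly you want to follow the RFC.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Assumed names, for illustration only.
static const std::string hex_lower("[a-f]");      // same character set as [abcdef]
static const std::string hex_either("[a-fA-F]");  // accepts either case

// Returns true if the whole of `text` matches `pattern` (default ECMAScript grammar).
inline bool is_full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

With these definitions, `is_full_match("c", hex_lower)` holds while `is_full_match("C", hex_lower)` does not, and `hex_either` accepts both.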

