.net - Regex: not arbitrary non capturing group
I'm trying to write a regex that covers all of my cases. I have to parse XML and capture some properties. For example:
<item p2="2"/>
<item p1="1" p2="2"/>
<item p1="1" p2="2" p3="3"/>
<item p1="1" p2="2" p3="3" p4="4"/>
<item p1="1" p2="2" p3="3" p4="4" p5="5"/>
I have to capture the value of the "p2" property, and I know that "p2" is always present in the line. I also want to capture the value of the "p4" property, which is not always present.
At first I tried to satisfy the first four cases (the first four lines in the example), and I wrote this regular expression:
\<item.+?p2=\"(?<val1>\d+)".*?(?:p4=\"(?<val2>\d+)\")?\/\>
And it works fine. The "val1" group always returns a value, and the "val2" group returns a value only if the "p4" property is present.
But to cover the last case:
<item p1="1" p2="2" p3="3" p4="4" p5="5"/>
I modified the regular expression like this:
\<item.+?p2=\"(?<val1>\d+)".*?(?:p4=\"(?<val2>\d+)\")?.*?\/\>
______________________________________________________^^^
And I found that the "val1" group still returns values, but the "val2" group no longer returns a value in any of the cases.
Could you tell me what I've missed, and how to write a regular expression that covers all of the cases?
XML is not a regular language, so regular expressions are not the way to go. You need a parser.
There are many ways to do this. One is to load the XML document into the XmlDocument class and use the SelectNodes method with an XPath query to find the list of items. Once you have them, you can use foreach on each XmlNode found and use its Attributes collection to get the data you want.
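The parser approach described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the ItemReader and XPathDemo names, the ReadItems helper, and the <items> root element are assumptions made for illustration (an XML document needs a single root, which the sample lines in the question do not show).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class ItemReader
{
    // Hypothetical helper: returns (p2, p4) for every <item> that has a p2
    // attribute. P4 is null when the attribute is absent.
    public static List<(string P2, string P4)> ReadItems(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<(string P2, string P4)>();
        // XPath: every <item> element, at any depth, that carries a p2 attribute.
        foreach (XmlNode node in doc.SelectNodes("//item[@p2]"))
        {
            // The Attributes indexer returns null for a missing attribute.
            XmlAttribute p4 = node.Attributes["p4"];
            result.Add((node.Attributes["p2"].Value, p4?.Value));
        }
        return result;
    }
}

class XPathDemo
{
    static void Main()
    {
        // Assumed wrapper element <items>; the question only shows the item lines.
        string xml = @"<items>
            <item p2=""2""/>
            <item p1=""1"" p2=""2"" p3=""3"" p4=""4"" p5=""5""/>
        </items>";

        foreach (var (p2, p4) in ItemReader.ReadItems(xml))
            Console.WriteLine("p2: '{0}', p4: '{1}'", p2, p4 ?? "(absent)");
    }
}
```

Because the parser deals with attribute order, whitespace, and quoting, the position of p4 relative to p5 or any other attribute no longer matters.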
If you must use regular expressions, then you need to put the last .*? inside the non-capturing group. What you have done is give the regex permission to omit the p4 match and match .*? instead. Putting the .*? inside the group removes that possibility. Note that this will be slow (it may even suffer from catastrophic backtracking) and will not handle all of the complexities of XML. Here is a program that demonstrates:
using System;
using System.Text.RegularExpressions;

class Program {
    static void Main() {
        // With IgnorePatternWhitespace, whitespace in the pattern is ignored
        // and everything from # to the end of the line is a comment.
        var regex = new Regex(@"
            \<item           # match <item
            .+?              # match 1 or more characters, as few as possible
            p2=              # match p2=
            \""              # match the opening quote
            (?<val1>\d+)     # capture 1 or more decimal digits into val1
            ""               # match the closing quote
            .*?              # match 0 or more characters, as few as possible
            (?:              # begin non-capturing group
                p4=          # match p4=
                \""          # match the opening quote
                (?<val2>\d+) # capture 1 or more decimal digits into val2
                \""          # match the closing quote
                .*?          # match 0 or more characters, as few as possible
            )?               # match 0 or 1 p4s
            />               # match />
        ", RegexOptions.IgnorePatternWhitespace);

        Test(regex, @"<item p2=""2""/>", "2", string.Empty);
        Test(regex, @"<item p1=""1"" p2=""2""/>", "2", string.Empty);
        Test(regex, @"<item p1=""1"" p2=""2"" p3=""3""/>", "2", string.Empty);
        Test(regex, @"<item p1=""1"" p2=""2"" p3=""3"" p4=""4""/>", "2", "4");
        Test(regex, @"<item p1=""1"" p2=""2"" p3=""3"" p4=""4"" p5=""5""/>", "2", "4");
    }

    // Matches the input and reports whether val1 and val2 equal the expected values.
    static void Test(Regex regex, string test, string p2, string p4) {
        Match m = regex.Match(test);
        string p2Group = m.Groups["val1"].Value;
        string p4Group = m.Groups["val2"].Value;
        Console.WriteLine("test: '{0}'", test);
        Console.WriteLine("p2: '{0}' - {1}", p2Group, p2Group == p2 ? "success" : "fail");
        Console.WriteLine("p4: '{0}' - {1}", p4Group, p4Group == p4 ? "success" : "fail");
    }
}
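To see why the placement of the trailing .*? matters, the two patterns can be run side by side on the five-attribute line. This is a small sketch; the PatternComparison class, the Outside and Inside constants, and the P4 helper are names made up for illustration. With the .*? outside the optional group, the engine can skip the group (zero repetitions) and let the lazy .*? consume everything up to />, so val2 never participates in the match.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Pattern from the question: the trailing .*? sits outside the optional group.
    public const string Outside =
        @"\<item.+?p2=""(?<val1>\d+)"".*?(?:p4=""(?<val2>\d+)"")?.*?/>";

    // Pattern from the answer: the trailing .*? sits inside the optional group.
    public const string Inside =
        @"\<item.+?p2=""(?<val1>\d+)"".*?(?:p4=""(?<val2>\d+)"".*?)?/>";

    // Hypothetical helper: returns the val2 capture, or null if the group
    // did not participate in the match.
    public static string P4(string pattern, string input)
    {
        Group g = Regex.Match(input, pattern).Groups["val2"];
        return g.Success ? g.Value : null;
    }

    static void Main()
    {
        string input = @"<item p1=""1"" p2=""2"" p3=""3"" p4=""4"" p5=""5""/>";
        Console.WriteLine("outside: '{0}'", P4(Outside, input) ?? "(no capture)");
        Console.WriteLine("inside:  '{0}'", P4(Inside, input) ?? "(no capture)");
    }
}
```

Both patterns match the whole line; only the second one is forced to go through p4 before it can reach the attributes that follow it.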