.net - Regex: not arbitrary non capturing group


I'm trying to write a regex that covers all my cases. I have to parse XML and capture some properties. Example:

<item p2="2"/>
<item p1="1" p2="2"/>
<item p1="1" p2="2" p3="3"/>
<item p1="1" p2="2" p3="3" p4="4"/>
<item p1="1" p2="2" p3="3" p4="4" p5="5"/>

I have to capture the value of the "p2" property, and I know that "p2" is always present in the line. I also want to capture the value of the "p4" property, which is not always present.

At first I tried to satisfy the first 4 cases (the first 4 lines in the example) and wrote a regular expression like this:

\<item.+?p2=\"(?<val1>\d+)".*?(?:p4=\"(?<val2>\d+)\")?\/\> 

And it works fine. The "val1" group always returns a value, and the "val2" group returns a value if the "p4" property is present.

But to cover the last case:

<item p1="1" p2="2" p3="3" p4="4" p5="5"/> 

I modified the regular expression like this:

\<item.+?p2=\"(?<val1>\d+)".*?(?:p4=\"(?<val2>\d+)\")?.*?\/\>
______________________________________________________^^^

And I found that the "val1" group still returns values, but the "val2" group no longer returns a value in any of the cases.

Could you tell me what I've missed, and how to write a regular expression that covers all my cases?

Here is an example in a regex tester.

XML is not a regular language, therefore using regular expressions is not the way to go. You need a parser.

There are many ways to do this. One is to load the XML document into the XmlDocument class and use the SelectNodes method with an XPath query to find the list of items. Once you have them, you can use a foreach over each XmlNode found and use its Attributes collection to get the data you want.
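The parser approach described above could look roughly like the following. This is only a minimal sketch: the ItemReader class, the ReadP2AndP4 method name, and the "//item" XPath query are illustrative assumptions, and it assumes the item elements sit inside a single root element so that the input is well-formed XML.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, named here only for illustration.
static class ItemReader
{
    // Returns one (p2, p4) pair per <item> element.
    // A value is null when the corresponding attribute is absent.
    public static List<(string P2, string P4)> ReadP2AndP4(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        var result = new List<(string P2, string P4)>();

        // "//item" selects every <item> element anywhere in the document.
        foreach (XmlNode node in doc.SelectNodes("//item"))
        {
            // The Attributes indexer returns null for a missing attribute,
            // so ?. yields null instead of throwing.
            string p2 = node.Attributes["p2"]?.Value;
            string p4 = node.Attributes["p4"]?.Value;
            result.Add((p2, p4));
        }

        return result;
    }
}
```

Called with the sample lines wrapped in a root element such as <items>...</items>, ReadP2AndP4 returns "2" for p2 on every item and null for p4 on the items that have no p4 attribute, so the optional property needs no special handling beyond a null check.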

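If you must use regular expressions, you need to put the last .*? inside the non-capturing group. What you have done is give the regex permission to omit the p4 part and match it with the final .*? instead. Because the quantifiers are lazy and the group is optional, the engine skips the group at the first opportunity and lets the trailing .*? consume everything up to the />. Putting the .*? inside the group removes that possibility. Note that this will be slow (it may suffer from catastrophic backtracking) and will not handle all the complexities of XML. Here is a program that demonstrates it: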

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var regex = new Regex(@"
        \<item                  # match <item
        .+?                     # match 1 or more characters, as few as possible
        p2=                     # match p2=
        \""                     # match the opening quote
        (?<val1>\d+)            # capture 1 or more decimal digits into val1
        ""                      # match the closing quote
        .*?                     # match 0 or more characters, as few as possible
        (?:                     # begin non-capturing group
            p4=                 # match p4=
            \""                 # match the opening quote
            (?<val2>\d+)        # capture 1 or more decimal digits into val2
            \""                 # match the closing quote
            .*?                 # match 0 or more characters, as few as possible
        )?                      # match the p4 group 0 or 1 times
        />                      # match />
        ", RegexOptions.IgnorePatternWhitespace);

        Test(regex, @"<item p2=""2""/>", "2", string.Empty);
        Test(regex, @"<item p1=""1"" p2=""2""/>", "2", string.Empty);
        Test(regex, @"<item p1=""1"" p2=""2"" p3=""3""/>", "2", string.Empty);
        Test(regex, @"<item p1=""1"" p2=""2"" p3=""3"" p4=""4""/>", "2", "4");
        Test(regex, @"<item p1=""1"" p2=""2"" p3=""3"" p4=""4"" p5=""5""/>", "2", "4");
    }

    // Runs the regex against one input and reports whether both groups hold the expected values.
    static void Test(Regex regex, string test, string p2, string p4)
    {
        Match m = regex.Match(test);

        // A group that did not participate in the match has an empty Value.
        string p2Group = m.Groups["val1"].Value;
        string p4Group = m.Groups["val2"].Value;

        Console.WriteLine("test: '{0}'", test);
        Console.WriteLine("p2: '{0}' - {1}", p2Group, p2Group == p2 ? "success" : "fail");
        Console.WriteLine("p4: '{0}' - {1}", p4Group, p4Group == p4 ? "success" : "fail");
    }
}
