I am currently using C# to implement a project (previously using C++) and had a query on the response.ExtractRegExp() function.
I am having trouble extracting a list of regular expression matches: I am only getting one extracted value back, even though there are multiple possible matches on the web page.
For example:
string regEx = "PS-[0-9]{8}-[0-9]{5}";
ExtractionCursor cursor = new ExtractionCursor();
When I iterate through the matchRegEx list it only seems to contain one value. The code sample in the C# API help suggests that this approach should populate the list with all of the matches for the regular expression. Am I doing something wrong?
I could get around this by dumping the web page contents to a string and using the .NET Regex functionality, but wondered whether I could do it with ExtractRegExp.
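For reference, here is a minimal sketch of that .NET fallback, using only the standard System.Text.RegularExpressions classes. The RegexFallback class, the ExtractAll helper and the sample page content are hypothetical and only there for illustration; in a real script the string would come from the downloaded page rather than a literal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexFallback
{
    // Hypothetical helper: returns every non-overlapping match of
    // 'pattern' found in 'pageContent', in the order they appear.
    public static string[] ExtractAll(string pageContent, string pattern)
    {
        MatchCollection found = Regex.Matches(pageContent, pattern);
        string[] values = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            values[i] = found[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // Made-up content standing in for the real page source.
        string pageContent =
            "<td>PS-20150101-00001</td><td>PS-20150102-00002</td>";
        string regEx = "PS-[0-9]{8}-[0-9]{5}";

        string[] values = ExtractAll(pageContent, regEx);
        Console.WriteLine("Number of regex matches: " + values.Length);
        foreach (string value in values)
        {
            Console.WriteLine("Match value: " + value);
        }
    }
}
```

Running the same pattern over the same text this way is also a quick check on whether the expression itself only matches once, or whether the difference lies in how ExtractRegExp is being called.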
I’ve had a look and am able to populate multiple matches with ExtractRegExp, so it looks like your RegEx may only be matching once against the response you are extracting from. Are you able to share the page source so that I can reproduce this locally? Feel free to email it to me if you’re worried about sharing it on the forum (my username with a dot between and @testplant.com).
RegExpMatchList matches = response4.ExtractRegExp(new ExtractionCursor(), @"url\(.*\)");
WriteMessage("Number of regex matches: " + matches.Count);
foreach (RegExpMatch match in matches)
{
WriteMessage("Match value: " + match.Match);
}
The expression used is probably a bit more straightforward than yours, but it simply extracts a bunch of URLs in this case:
00:00:04:524 Message Number of regex matches: 5
00:00:04:525 Message Match value: url(templates/greenhouse/images/cellpic2.jpg)
00:00:04:525 Message Match value: url(templates/greenhouse/images/)
00:00:04:525 Message Match value: url(templates/greenhouse/images/cellpic1.gif)
00:00:04:525 Message Match value: url("templates/greenhouse/formIE.css")
00:00:04:525 Message Match value: url('/phpBB2/images/bg_forum.png')
When you manually put it through the .NET RegEx engine, are you using the Response object’s Content property?