Extracting multiple regular expression matches from a web page


I am currently using C# to implement a project (previously I was using C++) and have a question about the response.ExtractRegExp() function.

I am having trouble extracting a list of regular expression matches: I only get one extracted value back, even though there are multiple possible matches on the web page.

For example:

            string regEx = "PS-[0-9]{8}-[0-9]{5}";
            ExtractionCursor cursor = new ExtractionCursor();

            RegExpMatchList matchRegEx = response25.ExtractRegExp(cursor, regEx);

When I iterate through the matchRegEx list it only seems to have one value. The code sample in the C# API help suggests that this approach should populate the list with all matches of the regular expression. Am I doing something wrong?

I could get around this by dumping the web page contents to a string and using the .NET regex functionality, but wondered whether I could do it with ExtractRegExp.
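
For reference, the .NET fallback mentioned above might look something like the minimal sketch below. It uses only the standard System.Text.RegularExpressions classes. The class name PageRegexFallback, the helper FindAll and the sample text are made up for illustration, and the step of getting the page contents into a string is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the tool's API.
public static class PageRegexFallback
{
    // Returns the value of every non-overlapping match of pattern in content,
    // in the order the matches appear.
    public static List<string> FindAll(string content, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(content, pattern))
            values.Add(m.Value);
        return values;
    }

    public static void Main()
    {
        // Made-up sample text standing in for the real page contents.
        string content = "<td>PS-20130115-00042</td><td>PS-20130116-00107</td>";

        foreach (string value in FindAll(content, "PS-[0-9]{8}-[0-9]{5}"))
            Console.WriteLine("Match value: " + value);
    }
}
```

Regex.Matches returns all non-overlapping matches rather than just the first, so running the same pattern through it is also a quick way to check how many matches the page text really contains.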

Many Thanks,

Hi Paul,

I’ve had a look and am able to populate multiple matches so it looks like your RegEx may only be returning a single match. Are you able to share the page source so that I can reproduce locally? Feel free to email this to me if you’re worried about sharing it on the forum (my username with a dot between and @testplant.com).

My test was to navigate to http://forums.testplant.com/phpBB2/ and insert the following into the generated code:

            RegExpMatchList matches = response4.ExtractRegExp(new ExtractionCursor(), @"url\(.*\)");

            WriteMessage("Number of regex matches: " + matches.Count);
            foreach (RegExpMatch match in matches)
                WriteMessage("Match value: " + match.Match);

The expression used is probably a bit more straightforward than yours, but it simply extracts a bunch of URLs in this case:

00:00:04:524	Message		Number of regex matches: 5
00:00:04:525	Message		Match value: url(templates/greenhouse/images/cellpic2.jpg)
00:00:04:525	Message		Match value: url(templates/greenhouse/images/)
00:00:04:525	Message		Match value: url(templates/greenhouse/images/cellpic1.gif)
00:00:04:525	Message		Match value: url("templates/greenhouse/formIE.css")
00:00:04:525	Message		Match value: url('/phpBB2/images/bg_forum.png')

When you manually put it through the .NET RegEx engine, are you using the Response object’s Content property?