[eggPlant Functional] XML Data -- reading and writing

NOTE: In Eggplant for Mac beginning with version 4.1, SenseTalk includes built-in support for working with XML data (see Chapter 16, “Working with Trees and XML,” of the SenseTalk Reference manual for full details). The example script presented here is still of interest as an example of parsing text formats, or for use with Eggplant for Linux or Windows.


The XMLParser suite attached below contains two scripts. The XMLParser script includes two handlers that can be called as functions:

  • parseXMLString() takes an XML string and returns a SenseTalk property list containing data extracted from it
  • generateXML() does the reverse, taking a property list and returning the data in an XML format
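A minimal usage sketch of the two handlers. It assumes the XMLParser script is in the same suite and that `start using` makes its handlers callable by name from another script. The sample XML string is made up for illustration, and the exact shape of the returned property list depends on the parser, so inspect it rather than relying on a particular layout:

```sensetalk
-- Assumption: make the handlers in the XMLParser script available here
start using "XMLParser"

-- Made-up sample document, for illustration only
put "<person><name>Jeanne</name></person>" into xmlText

put parseXMLString(xmlText) into xmlData  -- property list extracted from the XML
put xmlData                               -- inspect the structure the parser produced
put generateXML(xmlData)                  -- turn the property list back into XML text
```

The TestXMLParsing script in the suite remains the better reference for how the author intended the handlers to be called.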

The second script, TestXMLParsing, shows how you might use these functions to work with XML data.

These scripts are intended as an example – a starting point for your own XML processing, rather than a finished product. They work well for the simple case shown here, but are not a complete solution for all possible XML formats, which can vary widely.

One particular known limitation: the current scripts do not handle XML “entities” (such as &lt; to represent a “<” character). For now, extending the script to deal with entities is left as an exercise for the reader. :wink:
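One possible starting point for that exercise is a small decoding function. This is a hypothetical helper (decodeBasicEntities is not part of the XMLParser suite), and it assumes your SenseTalk version provides the `replace every occurrence of ... with ... in ...` command. It handles only the five predefined XML entities, not numeric references such as `&#60;`:

```sensetalk
function decodeBasicEntities xmlText
	-- Hypothetical helper: decode the five predefined XML entities.
	-- Apply it to text content AFTER the tags have been split out,
	-- so a decoded "<" is not mistaken for the start of a tag.
	replace every occurrence of "&lt;" with "<" in xmlText
	replace every occurrence of "&gt;" with ">" in xmlText
	replace every occurrence of "&quot;" with quote in xmlText
	replace every occurrence of "&apos;" with "'" in xmlText
	-- "&amp;" goes last so that "&amp;lt;" becomes the literal text "&lt;", not "<"
	replace every occurrence of "&amp;" with "&" in xmlText
	return xmlText
end decodeBasicEntities
```

The order matters: decoding `&amp;` first would turn `&amp;lt;` into `&lt;` and then into `<`, which is wrong.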

Hey Doug,

I just tried parseXMLString. I have two XML files with UTF-8 encoding: one contains only ASCII characters, and the other contains some Chinese characters.

I wrote a small script that reads the content of each XML file into a string and calls parseXMLString. It worked well for the pure-ASCII file, but it failed with a conversion error on the file with Chinese characters.

You may need to set the defaultStringEncoding global property to a different encoding before reading the file. By default, the encoding is set to UTF8, which is what you said the file uses, but maybe it’s actually something else. Try writing a simple test script that just sets the encoding, reads the file and displays its contents (using a ‘put’ statement). Then try some different encodings to see if you can find one that will read the Chinese characters properly.
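A test script along those lines might look like this sketch. The handler name showFileWithEncoding is hypothetical, the file path is the one used later in this thread, and only "UTF8" is used here because it is the encoding name mentioned above; check the SenseTalk Reference for the other encoding names your version accepts before calling it again with them:

```sensetalk
-- Read and display the file with one encoding; call again with other
-- encoding names supported by your version to compare the results
showFileWithEncoding "/tmp/NonASCII.txt", "UTF8"

to showFileWithEncoding filePath, encodingName
	-- Hypothetical helper: switch the encoding used for reading text files
	set the defaultStringEncoding to encodingName
	put "--- Reading " & filePath & " as " & encodingName & " ---"
	put file filePath  -- check whether the Chinese characters display correctly
end showFileWithEncoding
```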

(Note: the defaultStringEncoding property was introduced in Eggplant 3.0 – if you have an older version you’ll have to find a different solution, perhaps using some external tool to convert the file format)

Hey Doug,

Finally I found the bug that causes the “NSCharacterConversionException Conversion to encoding 30 failed for string…”

This statement in your XMLParser.script has the problem.

the item number of endTag in items index+1 to last of tokenList

If there are non-ASCII characters in the list, it throws an NSCharacterConversionException.

I attach my simple test script here.

NonASCII.txt (this file is in UTF-8 encoding.)
(,
,,
,,
,,
,桌面,
,,
,,
,,)

main.script
set the defaultStringEncoding to "UTF8"

put file "/tmp/NonASCII.txt" into testData

put testData into list
put "" into var

//I am able to output each item
repeat with n = 1 to the number of items in list
	put "item " & n & item n of list
end repeat

//failed here.
put the item number of var within list

Could you please take a look at this issue? I am trying to customize your XMLParser for our own use.

Thanks.
Jeanne

Thanks for the clear description of the problem! I’ve verified that there is a bug with the “item number of var within list” operator when working with non-ASCII text. We’ll see about correcting that in a future version. In the meantime there are plenty of other ways to accomplish the same thing.
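One such alternative is to compare the items one at a time instead of using “the item number of ... within ...”. This is a sketch of a hypothetical workaround function (findItemIndex is not part of the posted suite); it returns the position of the first matching item, or 0 if nothing matches:

```sensetalk
function findItemIndex target, source
	-- Hypothetical workaround for "the item number of ... within ...":
	-- check each item in turn and return the position of the first match
	repeat with n = 1 to the number of items in source
		if item n of source is target then return n
	end repeat
	return 0 -- not found
end findItemIndex
```

In the XMLParser statement quoted earlier, the call would become `findItemIndex(endTag, items index+1 to last of tokenList)`, which, like the original expression, returns a position relative to that sub-range. The loop is slower than a built-in lookup on long lists, but it avoids the failing operator.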

I have posted an updated version of the XMLParser script and suite which seems to work fine (in my limited testing) with the Chinese characters you posted. This version can also properly handle some XML documents that contain a tag nested inside another tag of the same name (XML-RPC documents, for example).

Please download the new version and let us know if there are still problems.

Hey Doug,

I got a 404 “file not found” error when trying to download the zip file. Would you please check?

Thanks.
Jeanne

Sorry. It was there for a little while… :oops: It seems we had a little instability related to our new website design, but all should be well again now. Please try again.

Has anyone had any better luck implementing a faster and more robust XMLParser for Eggplant/SenseTalk?

A relatively simple XML document of a few dozen nodes takes nearly 10 seconds to parse.
And the lack of XML entity support is a pretty big weakness.
The limited MaxCallDepth can also cause problems when trying to parse anything more than the simplest XML.

If Eggplant had native XML parsing, regular expressions, and no hard-coded MaxCallDepth, that would be killer!

Pv

Please note that in the Mac version beginning in Eggplant 4.1, SenseTalk includes built-in support for XML parsing and generation, through the new Tree structure. This is many times faster than the example script given here, and provides many other benefits as well.

See Chapter 16, “Working with Trees and XML,” in the SenseTalk Reference manual for full details.

One word says it all: woot!

This chapter isn’t included in the Windows documentation. Am I to take that as this functionality doesn’t exist in the Windows version?

Actually, while writing this I read the release notes and saw that it’s listed as an exclusion from the Windows version. Any idea when this functionality will be included in the Windows version? It would be extremely useful for me, and I guess for a few others.

Thanks. 8)

I wouldn’t look for it any time soon. It’s going to need to be completely reimplemented for Windows and Linux.

I would expect it to be a priority, seeing as a Windows eggPlant license costs substantially more than the Mac version, yet it doesn’t have all the features.

The reason it’s not available on Windows and Linux is because the underlying library does not exist on those platforms and, as Matt indicated, will need to be re-written. This is spelled out in the Windows Exclusions section of the Release Notes. That said, it absolutely is something we hope to implement for a future release of eggPlant: Windows but no specific date has been set.

If you would prefer to work with the Mac version of the product we would be happy to accommodate you. Please contact your account representative.

Please note:
The XML and Tree functionality of SenseTalk is now available in the latest versions of eggPlant for Windows and Linux, as well as Mac (version 11.2 and later). So go download the latest version and enjoy it!