Problems reading text with underscores

jswanson · February 13, 2013, 1:14pm

I am creating a list from items displayed in a window, and when the text includes an underscore I get different results depending on the words. We just recently installed 12.1 on Linux systems.

Install_Default -> becomes Install.Default
Background_White -> becomes BackgroundWmte
Desert_Sand -> becomes “Desert Sand”

I have Contrast: on
The results are the same whether or not I include Background Color in the search properties. I have changed the tolerance from 20 to 80 and I get the same results.

What else can I do to accurately read text that includes an “_”?

EggplantMatt · February 18, 2013, 1:22am

The OCR engine has its origins in reading printed documents, so it tends to see the underscore as just a line like those on a form, and it generally discards or ignores it. You might have some luck if you use the “ValidCharacters” flag and include the underscore in the list, but this means that you also need to include the alphabet (upper and lower case) and any other characters that you need to read: when you use that flag, eggPlant will only return the characters that are included in the list. Here’s some sample code that creates a list of all the alphabet characters (plus a few elements of punctuation) and adds the underscore:

put A..z as list into alphas
put readText((143,123,268,174),validCharacters:(alphas & "_"))

jswanson · February 18, 2013, 10:51am

Quick question: Do I have to specify every letter or is the “A..z” sufficient?

put A..z as list into alphas

SenseTalkDoug · February 18, 2013, 6:12pm

The A..z notation is a “range” that includes every character from capital A through lowercase z (which, as Matt mentioned, also includes some punctuation characters that happen to have values that fall in between). So yes, that is sufficient to include all of the letters in between.

To be safe, though, you should use quotes around the letters “A” and “z”. And if you want to be explicit about exactly which characters are allowed, you might do it like this:

set alphaAndUnderscore to "A".."Z" &&& "a".."z" &&& "_"
put readText((143,123,268,174),validCharacters:alphaAndUnderscore)
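
To see which punctuation characters a capital A through lowercase z range picks up, here is a small side illustration. It is written in Python rather than SenseTalk, and it assumes the range is ordered by ASCII character code, which the thread implies but does not spell out. Under that assumption, the underscore is itself one of the characters between “Z” and “a”.

```python
# Illustration only: plain Python, not SenseTalk.
# Assumes the A-to-z range follows ASCII character codes.

# Characters whose codes lie strictly between "Z" (90) and "a" (97).
between = [chr(code) for code in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

# The full span from "A" (65) through "z" (122), for comparison.
full_range = [chr(code) for code in range(ord("A"), ord("z") + 1)]
print(len(full_range))    # 58 characters: 52 letters plus the 6 above
print("_" in full_range)  # True
```

Whether eggPlant's validCharacters handling treats the range the same way is not confirmed here, so the explicit list of upper case, lower case, and “_” remains the clearer way to state exactly which characters are allowed.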