OCR Accuracy with "8" and "9"

crb417 · March 3, 2015, 10:59am

I am using OCR to read black text on a white background. I am frequently running into the following issues:

“8” will be read as “3”
“9” will be read as “3”
A string that is only one character long is not recognized at all

I have already tried turning contrast on, setting the “backgroundcolor” to white, and altering the contrast tolerance. Any other ideas for how to improve the accuracy of OCR on these numbers?

I also have these same issues when reading black text on a grey background. I have tried setting the “backgroundcolor” to grey here as well.

I am using eggPlant 14.21 on Red Hat 6. My Unit Under Test is also a Red Hat 6 machine.

EggplantMatt · March 3, 2015, 11:37am

These are all common issues with eggPlant v14. The OCR engine was updated and significantly improved in v15 and generally does a better job of reading text. It will recognize a single character where v14 almost never would.

Note too that the “backgroundcolor” property is not used by the OCR functionality and has no effect on it. It applies only to the TIG approach to finding text, which is a completely different mechanism. When working with contrast in OCR, you need to set the contrastColor property.


crb417 · March 10, 2015, 11:16am

I have upgraded to eggPlant version 15.10-Linux.

This solved the issues above, but version 15 seems to have its own set of OCR problems. Here is what I have observed:

1. Strings that contain only the character 8 are not recognized at all. Examples: “8”, “88”, “888”
2. OCR inserts spurious spaces into strings. I am aware of the IgnoreSpaces option, but it is not a solution for me because some of the strings I read are expected to contain spaces.
3. Using the Contrast, ContrastColor, and ContrastTolerance settings makes OCR substantially worse in my case. When reading text on a light grey background, I set Contrast to on, set ContrastColor to the background's RGB values, and tried many ContrastTolerance values without success. I started with ContrastTolerance = 20, as recommended here:

http://docs.testplant.com/?q=content/working-ocr

Removing the contrast options helps, but I am still left with issues 1 and 2.

Are these common issues with version 15?
