Using OCR for Textual Locators

Using an OCR engine during a test lets you use text-based locators. These work by capturing your device's screen image during the test, then analyzing the text on the screen and breaking it up into objects that can be validated in your automated test code. Each object includes the text of a word, its size, and its location on screen. Although this can sometimes be less reliable than true native locators, it can be incredibly useful for testing scenarios where an element is not available in the application DOM, such as text inside an image or a video.
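
To make the idea of "text broken up into objects" concrete, here is a minimal sketch in plain Java. The `OcrWord` and `OcrWords` classes, their fields, and the `findFirst` helper are hypothetical names used for illustration only and are not part of the Rokuality API. The sketch assumes each recognized word carries its text plus a pixel bounding box.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of one word reported by an OCR engine (not a Rokuality class).
final class OcrWord {
    final String text;
    final int x;      // left edge of the bounding box, in pixels
    final int y;      // top edge of the bounding box, in pixels
    final int width;  // bounding box width, in pixels
    final int height; // bounding box height, in pixels

    OcrWord(String text, int x, int y, int width, int height) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // Center of the bounding box, useful when asserting where the text appears.
    int centerX() { return x + width / 2; }
    int centerY() { return y + height / 2; }
}

// Hypothetical lookup helper over a list of OCR results.
final class OcrWords {
    private OcrWords() {}

    // Returns the first word whose text matches, ignoring case.
    static Optional<OcrWord> findFirst(List<OcrWord> words, String text) {
        return words.stream()
                .filter(w -> w.text.equalsIgnoreCase(text))
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for one screen's OCR output.
        List<OcrWord> screen = Arrays.asList(
                new OcrWord("Settings", 100, 40, 120, 30),
                new OcrWord("Play", 300, 500, 80, 40));

        findFirst(screen, "play").ifPresent(w ->
                System.out.println(w.text + " centered at " + w.centerX() + "," + w.centerY()));
    }
}
```

A test built on this kind of structure would typically assert that a word is present and, where layout matters, that its position falls within an expected region of the screen.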

Rokuality supports Tesseract, Google Vision, Amazon Rekognition, and Amazon Textract. Google Vision and the Amazon engines require an additional paid service, but Tesseract is a no-cost, open source alternative that is available to all users.

Tesseract


You can use Tesseract as your OCR engine during a test provided Tesseract is installed and properly set up, with any required language packages, on the machine running Rokuality. Results are typically a little less reliable than those from Google Vision or the Amazon OCR engines, but as a free, open source option it may meet your application's needs.

DeviceCapabilities capabilities = new DeviceCapabilities();

capabilities.addCapability("OCRType", "Tesseract");

capabilities.addCapability("TesseractBinary", "/path/to/your/tesseract/binary");


// OPTIONAL: 3-character language code; defaults to 'eng'

capabilities.addCapability("TesseractLanguage", "eng");
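
Because the Tesseract binary must exist on the machine running Rokuality, it can help to check the path before starting the driver. The following is a minimal sketch using only the Java standard library. The `TesseractPreflight` class and its `isUsableBinary` method are hypothetical helpers, not part of the Rokuality API, and the check only confirms that the file exists and is executable, not that it is a working Tesseract install.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight helper; not part of the Rokuality API.
final class TesseractPreflight {
    private TesseractPreflight() {}

    // True when the path points at a regular file the current user may execute.
    static boolean isUsableBinary(String binaryPath) {
        Path path = Paths.get(binaryPath);
        return Files.isRegularFile(path) && Files.isExecutable(path);
    }

    public static void main(String[] args) {
        // Pass the real binary location as the first argument; the default is a placeholder.
        String binary = args.length > 0 ? args[0] : "/path/to/your/tesseract/binary";
        if (isUsableBinary(binary)) {
            System.out.println("Tesseract binary looks usable: " + binary);
        } else {
            System.out.println("No executable file found at: " + binary);
        }
    }
}
```

Running a check like this before setting the `TesseractBinary` capability surfaces a bad path as a clear message rather than as an OCR failure later in the test.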

Google Vision


You can use Google Vision as your OCR engine during a test provided you have a valid Google Vision account set up. You must also set the path to the .json service file containing your service key in your DeviceCapabilities before starting the driver.

DeviceCapabilities capabilities = new DeviceCapabilities();

capabilities.addCapability("OCRType", "GoogleVision");

capabilities.addCapability("GoogleCredentials", "/path/to/your/vision/authkey.json");

Amazon Rekognition/Textract


You can use Amazon Rekognition or Amazon Textract as your OCR engine during a test provided you have a valid AWS account set up, with an API key whose IAM role allows Textract or Rekognition use. You must then create the standard AWS credential file containing your API key ID and secret key. Note that Rekognition has a limit of 50 words per page evaluation, whereas Textract has no such limit.

DeviceCapabilities capabilities = new DeviceCapabilities();


// change value to 'AmazonRekognition' to use Rekognition instead of Textract

capabilities.addCapability("OCRType", "AmazonTextract");

capabilities.addCapability("AWSCredentials", "/path/to/your/aws/credential/file");
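
The standard AWS credential file is an INI-style text file that stores the key pair under the names `aws_access_key_id` and `aws_secret_access_key`, usually beneath a `[default]` profile header. As a rough sketch, the check below confirms that both entries are present before the file is handed to Rokuality. The `AwsCredentialFileCheck` class and `hasRequiredKeys` method are hypothetical names, not part of Rokuality or the AWS SDK, and the sketch ignores profiles entirely, so treat it as a sanity check rather than a full parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sanity check for an AWS credential file; not part of Rokuality or the AWS SDK.
final class AwsCredentialFileCheck {
    private AwsCredentialFileCheck() {}

    // True when the lines contain both key names, ignoring comments and profile headers.
    static boolean hasRequiredKeys(List<String> lines) {
        Set<String> keys = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines, comments (# or ;), and profile headers such as [default].
            if (trimmed.isEmpty() || trimmed.startsWith("#")
                    || trimmed.startsWith(";") || trimmed.startsWith("[")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq > 0) {
                keys.add(trimmed.substring(0, eq).trim());
            }
        }
        return keys.contains("aws_access_key_id") && keys.contains("aws_secret_access_key");
    }

    public static void main(String[] args) {
        // Placeholder values only; a real file holds your own key ID and secret key.
        List<String> sample = Arrays.asList(
                "[default]",
                "aws_access_key_id = YOUR_ACCESS_KEY_ID",
                "aws_secret_access_key = YOUR_SECRET_ACCESS_KEY");
        System.out.println("Required keys present: " + hasRequiredKeys(sample));
    }
}
```

In a real test setup the lines would come from reading the file whose path you pass as the `AWSCredentials` capability, for example with `java.nio.file.Files.readAllLines`.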