HDMI Automation - PlayStation, TiVo, Cable Set-Top Boxes, and more!

The Rokuality App lets you automate any device that has an HDMI output and accepts Bluetooth or IR commands as input. This includes, but is not limited to, PlayStation consoles, TiVo devices, and cable set-top boxes.


The Rokuality App

The Rokuality App is a standalone app for Mac and Windows that operates in two modes: first, as a test debugger/builder that lets you debug your apps and construct your tests; and second, as a server that executes and distributes your tests to your devices.


Note that you can start and run the server headlessly if you download the standalone jar and provide the serveronly=true system property at launch. This is useful when running on a Linux machine or in a build/test server environment:

java -Dserveronly=true -Dport=7777 -jar /path/to/Rokuality_version.jar
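Once the server has been launched headlessly, it can be handy to confirm from your test machine that something is actually listening on the chosen port before starting a session. The following is a minimal sketch using only the Java standard library. ServerCheck and isListening are hypothetical names, not part of the Rokuality bindings, and the check only proves that a TCP port is open, not that the process behind it is a Rokuality server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the Rokuality bindings.
final class ServerCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // 7777 matches the -Dport value used in the launch command above;
        // "localhost" is an assumption, use your server's address instead.
        boolean up = isListening("localhost", 7777, 1000);
        System.out.println(up ? "Server port is open" : "Server port is not reachable");
    }
}
```

A check like this could run once in a suite-level setup step so that a misconfigured port fails fast with a clear message.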

Optionally, you can provide a desired set of capabilities and start a manual test session against your device. The test debugger includes a number of UI tools geared toward helping you construct your automated tests. The tool can also be used to provide remote access to your devices or to share devices across remote resources.


The Rokuality Platform

The Rokuality open source platform is a rich toolset for writing robust, end-to-end automation tests in a language of your choice! Simply choose a supported language from below and write your tests. Then download the Rokuality app and launch a server instance to route your tests to your devices.

Available Languages:





How to get the Rokuality language bindings:









implementation 'com.rokuality:rokuality-java:1.5.4'

HDMI Device Requirements and Setup


Automated testing on HDMI devices requires the following:

  1. You must connect your HDMI device's HDMI output to the HDMI input of an encoder on your network. We recommend the J-Tech H.264 Encoder as it is inexpensive, efficient, and easy to configure and connect to your device. Once configured, you can access the HDMI output of your device from anywhere on your network. This allows the Rokuality app to query your device and perform image and OCR evaluations against your device's screen. See the "Why Do I Need an Encoder?" section for details.

  2. You must have a Logitech Harmony Hub with your device set up as a device and XMPP enabled on the hub. See the "Why Do I Need a Harmony?" and "Configuring Your Harmony" sections for details. This is a low-cost, extensible solution that allows the Rokuality app to send IR and Bluetooth commands to your device without requiring any "magic box" hardware.


Starting a Driver and Connecting to your Device

Once you've added the bindings of your choice to your test project, installed the Rokuality app, and confirmed it is listening on an available port, you can initiate a new driver instance and connect to your device under test.

// your server ip address and listening port

String serverUrl = "http://your_server_ip_address:your_server_port_number";


// initiate your driver with a DeviceCapabilities object (built as shown in the next section)

HDMIDriver driver = new HDMIDriver(serverUrl, capabilities);


Device Capabilities on Session Start

The capabilities object passed to your driver constructor controls certain functionality at session start. For HDMI devices, the only required capabilities are shown below; all others are optional.

// your server ip and listening port

String serverUrl = "http://your_server_ip:your_listening_port";

// Declare a new DeviceCapabilities object

DeviceCapabilities capabilities = new DeviceCapabilities();


// Indicates we want an HDMI test session

capabilities.addCapability("Platform", "HDMI");

// The RTSP URL of your encoder

capabilities.addCapability("EncoderUrl", "your_encoder_rtsp_url");


// The name of your device as saved in Harmony

capabilities.addCapability("DeviceName", "device_name");


// Pass the capabilities and start the test

HDMIDriver hdmiDriver = new HDMIDriver(serverUrl, capabilities);

All Device Capabilities



Platform: Required - String Value. Indicates the target platform. For HDMI automation, the required value is HDMI.



DeviceName: Required - String Value. The name of your device as saved in your Harmony Hub during Harmony setup.



EncoderUrl: Required - String Value. The RTSP URL of the encoder connected to your device under test.



ImageMatchSimilarity: Optional - Double Value. The default image match similarity, used only during Image locator evaluations. A lower value allows greater tolerance of dissimilarities between the image locator and the screen, BUT also increases the possibility of a false positive. Defaults to 0.90.



ScreenSizeOverride: Optional - String Value. A 'WIDTHxHEIGHT' cap that all screen image captures will be resized to prior to match evaluation. Useful if you want to enforce test consistency across multiple device types, developer machines, or CI environments. For example, a value of '1800x1200' ensures that all image captures are resized to those dimensions before the locator evaluation happens, no matter what the actual device screen size is.



OCRType: Optional but HIGHLY recommended - String Value. The OCR type. Currently supported options are 'Tesseract', 'GoogleVision', 'AmazonRekognition', or 'AmazonTextract'. If the capability is set to 'GoogleVision', you MUST have a valid Google Vision account set up and provide the 'GoogleCredentials' capability with a valid file path to the OAuth2 .json file containing valid credentials for the Google Vision service. If the capability is set to 'AmazonRekognition' or 'AmazonTextract', you MUST have a valid AWS account with an IAM role that has Rekognition or Textract privileges, plus an AWS API access key id and secret key stored in a credentials file, and you MUST provide the 'AWSCredentials' capability with a valid file path to that file. If the capability is set to 'Tesseract', you MUST have Tesseract installed on your machine. See the using OCR section for details. This is HIGHLY recommended for HDMI automation: without this capability you will be limited to locating elements ONLY by image snippets.



GoogleCredentials: Optional - String Value. The path to a valid .json Google Auth service key file. Required if the 'OCRType' capability is set to 'GoogleVision'. The .json service key must exist on the machine triggering the tests, and the Google account for the service must have permissions for the Cloud Vision API. See the using OCR section for details.



AWSCredentials: Optional - String Value. The path to a valid AWS credentials file with an API access key id and secret key. Required if the 'OCRType' capability is set to 'AmazonRekognition' or 'AmazonTextract'. The credentials file must exist on the machine triggering the tests. See the using OCR section for details.



Optional - Integer Value. A delay in milliseconds between image collections during a test session, e.g. 250. If you experience HTTP 400 errors during image collection, a small pause here can help alleviate them. Defaults to 0.



Optional - String Value. If using the 'OCRType' capability with value 'Tesseract', this capability must be provided with the absolute path to your Tesseract binary as installed on your machine, e.g. '/usr/local/bin/tesseract'. See the using OCR section for details.



Optional - String Value. If using the 'OCRType' capability with value 'Tesseract', this capability can be provided with the 3-character ISO 639-2 language code you wish to use, e.g. 'eng'. Defaults to 'eng'. See the using OCR section for details.
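Because the screen size cap described above is passed as a plain 'WIDTHxHEIGHT' string, a typo such as '1800*1200' is easy to miss until a test misbehaves. The sketch below shows one way to validate the value on the client side before adding it as a capability. ScreenSizeSpec is a hypothetical helper, not part of the Rokuality bindings, and it assumes the lowercase 'x' separator shown in the '1800x1200' example; how the server itself parses the value is not covered here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Rokuality bindings.
final class ScreenSizeSpec {

    // Positive integers of up to five digits, separated by a lowercase 'x'.
    private static final Pattern FORMAT =
            Pattern.compile("([1-9]\\d{0,4})x([1-9]\\d{0,4})");

    final int width;
    final int height;

    private ScreenSizeSpec(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Parses 'WIDTHxHEIGHT' or throws IllegalArgumentException if malformed.
    static ScreenSizeSpec parse(String value) {
        Matcher m = (value == null) ? null : FORMAT.matcher(value);
        if (m == null || !m.matches()) {
            throw new IllegalArgumentException(
                    "Expected 'WIDTHxHEIGHT' such as '1800x1200' but got: " + value);
        }
        return new ScreenSizeSpec(
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        ScreenSizeSpec size = parse("1800x1200");
        System.out.println("width=" + size.width + ", height=" + size.height);
    }
}
```

Validating up front keeps a malformed value from reaching the session at all, which is usually easier to debug than an unexpected image match result later.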


Finding Elements During Test

During your test, you can identify and locate elements in a variety of ways:

OCR Text Locators

OCR text-based locators work by capturing the screen image of your device and then evaluating whether the text appears within the image. Note that for this locator type to be available, you must provide the OCRType capability along with everything it requires. See the device capabilities section and the using OCR section for additional details.

Element element = driver.finder().findElement(By.Text("text to find on screen"));

Image Snippet Locators

Image snippet locators allow you to provide a partial image snippet that you expect to exist within the device screen. The device screen is then captured and searched to see if it contains the expected snippet. This is very useful if you wish to verify images, colors, logos, and similar content in your application. Be cautioned, however, that this is the most fragile of all locator types, as the image snippet must reliably match the screen of the device for the evaluation to succeed. See the Device Capabilities section above, as the 'ImageMatchSimilarity' and 'ScreenSizeOverride' capabilities can help with this. Even so, you still need to ensure that the image snippet you're passing is an apples-to-apples comparison against the device screen.

Note that your image snippet MUST be in .png format and can be either the absolute path to an image snippet on your machine or an accessible URL pointing to an image snippet. The latter is useful if you wish to query your application's image content from a remote content API and then dynamically search for those images within your test.

// Finds an element by a .png image snippet saved on the user's file system

Element eleFromFile = driver.finder().findElement(By.Image("/path/to/image.png"));

// Finds an element by a .png image snippet available at a public url

Element eleFromUrl = driver.finder().findElement(By.Image("http://urltoimage.png"));

Element Timeouts and NoSuchElement Exceptions

If a locator can't be found within the application, a NoSuchElementException is thrown and the test fails. In the scenarios above, this failure happens immediately because no implicit wait was applied to the locator searches. If we apply a timeout (in milliseconds), we can reduce flake in our tests: the locator is searched for continuously until it is either found or the timeout expires and a NoSuchElementException is thrown. The implicit element timeout lasts for the duration of the driver session, or until a new value is set that overrides it.

// Sets an element timeout applied to all locator searches

// if set the server will look for the element until it is either found

// or the timeout is exceeded and a NoSuchElementException is thrown


driver.finder().findElement(By.Text("text to find"));
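To make the retry behavior concrete, the search-until-found-or-timeout logic described above can be pictured as a simple polling loop. The following is an illustrative sketch only, not the server's actual implementation: PollingSearch, findWithTimeout, and the poll interval parameter are hypothetical, and java.util.NoSuchElementException stands in for the exception thrown by the bindings.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Supplier;

// Illustrative only: a generic polling loop, not the Rokuality server code.
final class PollingSearch {

    // Repeats the search until it yields a value or timeoutMs has elapsed.
    // A timeout of 0 means a single attempt that fails immediately.
    static <T> T findWithTimeout(Supplier<Optional<T>> search,
                                 long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            Optional<T> result = search.get();
            if (result.isPresent()) {
                return result.get();
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new NoSuchElementException(
                        "Not found within " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new NoSuchElementException("Interrupted while waiting");
            }
        }
    }

    public static void main(String[] args) {
        // Simulates text that only appears on screen at the third capture.
        int[] attempts = {0};
        Supplier<Optional<String>> search = () ->
                ++attempts[0] >= 3 ? Optional.of("Hello World!") : Optional.empty();
        String found = findWithTimeout(search, 2000, 50);
        System.out.println("Found '" + found + "' after " + attempts[0] + " attempts");
    }
}
```

The trade-off is the usual one for implicit waits: a longer timeout tolerates slow screen transitions, but it also lengthens every genuinely failing lookup by the full timeout.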


Elements as Objects

Once our locator has been found, a matching Element object will be returned which will contain information about its location, size, and confidence (if relevant).

Element element = driver.finder().findElement(By.Text("Hello World!"));







Finding Multiple Elements and Checking for Element Presence

The scenarios above search for a single element within the application, and if the designated locator does not find it, a NoSuchElementException is thrown and the test fails. But what if we only want to check whether an element is present? Or what if the locator matches multiple elements on the device screen?

Multiple Match Locators

Locators that find multiple matches for an element will return a collection of those elements which can be iterated over as follows:

List<Element> elements = driver.finder().findElements(By.Text("multi match locator"));

Element Presence

This same approach can be used to check whether an element is present within the application. Unlike the single-element search, the multi-match search will NOT throw a NoSuchElementException when no matching element is found. Instead, the collection will be empty and we can branch on that result.

boolean elementPresent = driver.finder().findElements(By.Text("locator")).size() > 0;

Remote Control and User Interaction


A user can perform remote control button presses and drive the application as a real user would through the remote control APIs. This includes navigating the UI of the app and pausing, playing, fast forwarding, or rewinding media in flight. You can view all button options available for your device and send the appropriate command to your Harmony hub as follows:

// get available remote commands

String buttonOptions = hdmiDriver.remote().getButtonOptions();

System.out.println("Button options: " + buttonOptions);

// send remote control options


By default, the delay between remote control commands is 0 milliseconds, meaning consecutive commands are sent as quickly as possible. In some cases this can lead to test flake because commands arrive back to back too quickly. If this happens, you can add a delay between remote control button presses as follows:

// sets a delay between remote control button presses to 2 seconds.

// this will last for the duration of the session or until a new value is set.

// If not set, defaults to 0


Getting Screen Artifacts (Image, Recordings, and More)


During the course of a test, a user can get screen artifacts such as the screen image and screen size. It's also possible to get the screen recording of the device from session start until the time of capture which is incredibly useful for reporting and test debugging.

// get the screen size



// get the screen image



// get the screen sub image from starting x,y with width/height

driver.screen().getImage(1, 1, 300, 300);


// get the screen recording of the test session from start to now


Setting and Retrieving Session Status


You can set the status of an active session to "passed", "failed", "broken", or "in progress" which will be retained in server memory for the duration of the session or until a new value is set. This is useful if you want to set the status of a test and then communicate result status with a reporting framework/service in a teardown or after test method. By default, the session status is "in progress" unless the user has updated it during the course of the session.

// sets the session status


// can be retrieved at any point the session is active

SessionStatus status = driver.options().getSessionStatus();

Assert.assertEquals(SessionStatus.PASSED, status);

Properly Stopping Your Session on Test Complete


It's important that when your test is complete, you properly stop your driver and release your device! This terminates the session under test and frees up the available thread for additional testing. If you don't properly release the device, back end cleanup will eventually run and release the device for further testing.

// stops the driver and releases your available thread back to your plan

// should be called as the last action of your test


Why Do I Need an Encoder?


HDMI device automation requires you to set up an HDMI encoder on your network and connect your HDMI device to it. This allows the Rokuality app to query your device's UI during a test and perform OCR and image-based analysis. Many encoders, such as the recommended J-Tech H.264 Encoder, are low cost, efficient, and allow you to connect to your devices from anywhere on your network. Other similar solutions on the market ship you a proprietary "magic box" device, which limits your ability to scale to more devices and across multiple developers/resources. Purchasing and supplying your own encoder limits cost, increases your ability to scale and control your own hardware, and frees you from depending on hardware provided by a single automation provider.

Why Do I Need a Harmony?


HDMI device automation requires a Logitech Harmony Hub to drive the user input. The hub is a low-cost ($60) device that provides both IR and Bluetooth capability in an easily available, wireless solution. Since you don't need an IR blaster or a cabled solution, you can scale across additional devices easily, and the Rokuality app can drive the Harmony and control your devices under test. Other similar solutions on the market ship you a "magic box" device that requires you to plug in your device under test, which is proprietary and limits the ability to scale to more devices and across multiple developers. The Harmony approach is cheap, effective, and scalable.

NOTE - We are aware of a recent decision by Logitech to gradually phase out the Harmony line of products, although they have agreed to continue supporting the products in circulation. We are actively working on an alternate solution to Harmony, which will be coming soon and will have limited (if any) impact on your tests!

Configuring Your Harmony


To set up your Harmony hub and prepare it for automating your devices under test:

  1. Download the Harmony mobile app for iOS or Android.

  2. During setup it will walk you through adding your device under test. Be sure to keep track of what you name your device when you pair it with your Harmony as that will be used later during your tests as your "DeviceName" capability. This will allow the server to communicate with your device from your test code and drive it like a real user.

  3. Enable XMPP on the hub. In the Harmony app, go to Settings > Harmony Setup > Add/Edit Devices & Activities > Hub > Enable XMPP.