XBox Automation


The Rokuality App

The Rokuality App is a standalone application for Mac and Windows that operates in two modes: first, as a test debugger/builder that lets you debug your apps and construct your tests; and second, as a server that executes and distributes your tests to your devices.


Note that you can start and run the server headlessly by downloading the standalone jar and providing the serveronly=true system property at launch. This is useful when running on a Linux machine or from a build/test server environment:

java -Dserveronly=true -Dport=7777 -jar /path/to/Rokuality_version.jar
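If you start the headless server from a Java build or test harness rather than a shell, the same command can be assembled programmatically. The sketch below only builds the argument list shown above and launches nothing; the class and method names (`HeadlessServerCommand`, `build`) are hypothetical helpers and are not part of Rokuality.

```java
import java.util.List;

public class HeadlessServerCommand {

    // Builds the same argument list as the shell command shown above.
    // jarPath and port are supplied by the caller; nothing is launched here.
    public static List<String> build(String jarPath, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return List.of(
                "java",
                "-Dserveronly=true",
                "-Dport=" + port,
                "-jar",
                jarPath);
    }

    public static void main(String[] args) {
        List<String> command = build("/path/to/Rokuality_version.jar", 7777);
        // Print the command; pass the list to new ProcessBuilder(command) to start it.
        System.out.println(String.join(" ", command));
    }
}
```

Keeping the command as a list avoids shell quoting problems when the jar path contains spaces.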

Optionally, you can provide a desired set of capabilities and start a manual test session against your device. The test debugger includes a number of UI tools geared toward helping you construct your automated tests. Additionally, the tool can be used to provide remote access to your devices or to share devices across remote resources.


The Rokuality Platform

The Rokuality open source platform is a rich tool-set for writing robust, end-to-end automation tests in a language of your choice. Simply choose a supported language from below and write your tests. Then download the Rokuality app and launch a server instance to route your tests to your devices.

Available Languages:





How to get the Rokuality language bindings:









implementation 'com.rokuality:rokuality-java:1.5.4'

XBox Device Requirements and Setup


Automated testing on XBox requires the following:

  1. Google Chrome browser installed on the machine running the Rokuality server. The server uses headless Chrome to handle tasks such as installing, launching, and uninstalling the XBox appxbundles. You won't see Chrome launch, as it runs in headless mode.

  2. Your XBox must be in dev kit mode. Enabling developer mode on your XBox is straightforward, but it does require a $19 Microsoft developer account, which allows you to automate 3 boxes from a single dev account.

  3. You must have the XBox dev console available for remote access with NO username/password set. If properly set up, you should be able to access your dev console remotely at https://yourxboxip:11443
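Before starting a session, it can help to confirm that the dev console port is reachable from the machine running the Rokuality server. The following is a minimal sketch using only the Java standard library; `DevConsoleCheck` and its methods are hypothetical helpers, not part of Rokuality. A successful TCP connection only shows that the port is open. It does not verify that the console has no username/password set.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DevConsoleCheck {

    // Port taken from the dev console URL in the requirements above.
    static final int DEV_CONSOLE_PORT = 11443;

    // Builds the dev console URL for a given XBox IP address.
    public static String consoleUrl(String xboxIp) {
        return "https://" + xboxIp + ":" + DEV_CONSOLE_PORT;
    }

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // Any I/O failure (refused, timed out, unknown host) is reported as false.
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass your XBox IP address as the first argument.
        String xboxIp = args.length > 0 ? args[0] : "your_xbox_ip_address";
        System.out.println(consoleUrl(xboxIp) + " reachable: "
                + isReachable(xboxIp, DEV_CONSOLE_PORT, 2000));
    }
}
```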


Starting a Driver and Connecting to your Device

Once you've added the bindings of your choice to your test project, and the Rokuality app is installed and listening on an available port, you can initiate a new Driver instance and connect to your device under test.

// your server ip address and listening port

String serverUrl = "http://your_server_ip_address:your_server_port_number";


// initiate your driver

// capabilities is a DeviceCapabilities object (see the next section)
XBoxDriver driver = new XBoxDriver(serverUrl, capabilities);


Device Capabilities on Session Start

The capabilities object passed to your driver constructor controls certain behavior at session start. For XBox devices, the required capabilities are shown below; all others are optional.

// your server ip and listening port

String serverUrl = "http://your_server_ip:your_listening_port";

// Declare a new DeviceCapability object

DeviceCapabilities capabilities = new DeviceCapabilities();


// Indicates we want an XBox One test session

capabilities.addCapability("Platform", "XBox");

// The IP address of your XBox

capabilities.addCapability("DeviceIPAddress", "your_xbox_ip_address");


// App location (path or url to an appx or appxbundle)

capabilities.addCapability("AppPackage", "/path/or/url/to/your/app");

// Identify the friendly name of the app package

capabilities.addCapability("App", "app_name");


// Pass the capabilities and start the test

XBoxDriver xboxDriver = new XBoxDriver(serverUrl, capabilities);

All Device Capabilities



Platform: Required - String Value. Indicates the target platform. For XBox automation, the required value is XBox.



AppPackage: Required - String Value. The absolute path or url to an XBox appxbundle package to be installed on the device.



DeviceIPAddress: Required - String Value. The IP address of your XBox device. Note that the XBox must be on the same network as your Rokuality server.



App: Required - String Value. The friendly name of your application appxbundle to be installed. Note that if you provide this cap but omit the AppPackage capability, the Rokuality app will attempt to launch an application that was previously installed by the user.



ImageMatchSimilarity: Optional - Double Value. The default image match similarity, used only during Image locator evaluations. A lower value allows greater tolerance of dissimilarities between the image locator and the screen, BUT also increases the possibility of a false positive. Defaults to .90



ScreenSizeOverride: Optional - String Value. A 'WIDTHxHEIGHT' value that all screen image captures will be resized to prior to match evaluation. Useful if you want to enforce test consistency across multiple device types, developer machines, or CI environments. For example, a value of '1800x1200' ensures that all image captures are resized to those dimensions before the locator evaluation happens, no matter what the actual device screen size is.



OCRType: Optional but HIGHLY recommended - String Value. The OCR type. Currently supported options are 'Tesseract', 'GoogleVision', 'AmazonRekognition', or 'AmazonTextract'. If the capability is set to 'GoogleVision', you MUST have a valid Google Vision account set up and provide the 'GoogleCredentials' capability with a valid file path to the OAuth2 .json file containing valid credentials for the Google Vision service. If the capability is set to 'AmazonRekognition' or 'AmazonTextract', you MUST have a valid AWS account with an IAM role that has Rekognition or Textract privileges, an AWS api access key id and secret key stored in a credentials file, and you MUST provide the 'AWSCredentials' capability with a valid file path to that credentials file. If the capability is set to 'Tesseract', you MUST have Tesseract installed on your machine. See the using OCR section for details. This is HIGHLY recommended for XBox automation: without this capability you will be limited to locating elements ONLY by image snippets.



GoogleCredentials: Optional - String Value. The path to a valid .json Google Auth key service file. Required if the 'OCRType' capability is set to 'GoogleVision'. The .json service key must exist on the machine triggering the tests, and the Google account for the service must have permissions for the Cloud Vision API. See the using OCR section for details.



AWSCredentials: Optional - String Value. The path to a valid AWS credential file with an api key id and secret key. Required if the 'OCRType' capability is set to 'AmazonRekognition' or 'AmazonTextract'. The credential file must exist on the machine triggering the tests. See the using OCR section for details.



Optional - String Value. Used to indicate the XBox package type to be installed. Options are appx and appxbundle. If omitted we assume the package is an appxbundle.



Optional - Integer Value. A delay in milliseconds (e.g. 250) between image collections during a test session. If you experience 400 http errors during image collection, a small pause here can help alleviate this. Defaults to 0



Optional - String Value. If using the 'OCRType' capability with value 'Tesseract', this capability must be provided with the absolute path to your Tesseract binary as installed on your machine, e.g. '/usr/local/bin/tesseract'. See the using OCR section for details.



Optional - String Value. If using the 'OCRType' capability with value 'Tesseract', this capability can be provided with the 3-character ISO 639-2 language code you wish to use, i.e. 'eng'. See the using OCR section for details. Defaults to 'eng'.
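The dependencies between 'OCRType' and its companion credential capabilities are easy to get wrong, so a small client-side check before session start can save a failed run. The sketch below is an assumption-laden illustration: it models capabilities as a plain `Map<String, String>` rather than the Rokuality `DeviceCapabilities` class, and `OcrCapabilityCheck` is a hypothetical helper. It encodes only the rules stated in the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class OcrCapabilityCheck {

    // Returns a list of human-readable problems; an empty list means the
    // OCR-related capabilities are consistent with the rules described above.
    public static List<String> validate(Map<String, String> caps) {
        List<String> problems = new ArrayList<>();
        String ocrType = caps.get("OCRType");
        if (ocrType == null) {
            // OCRType is optional; without it only image locators are available.
            return problems;
        }
        switch (ocrType) {
            case "GoogleVision":
                if (!caps.containsKey("GoogleCredentials")) {
                    problems.add("OCRType 'GoogleVision' requires the 'GoogleCredentials' capability");
                }
                break;
            case "AmazonRekognition":
            case "AmazonTextract":
                if (!caps.containsKey("AWSCredentials")) {
                    problems.add("OCRType '" + ocrType + "' requires the 'AWSCredentials' capability");
                }
                break;
            case "Tesseract":
                // Requires a local Tesseract install; the capability carrying
                // the binary path is not checked in this sketch.
                break;
            default:
                problems.add("Unsupported OCRType: " + ocrType);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Missing GoogleCredentials, so one problem is reported.
        System.out.println(validate(Map.of("OCRType", "GoogleVision")));
    }
}
```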


Finding Elements During Test

During your test, you can identify and locate elements in a variety of ways:

OCR Text Locators

OCR text-based locators work by capturing the screen image of your device and then evaluating whether the text appears within the image. Note that for this locator type to be available, you must provide the OCRType capability with all the necessary requirements. See the device capabilities section and the using OCR section for additional details.

Element element = driver.finder().findElement(By.Text("text to find on screen"));

Image Snippet Locators

Image snippet locators allow you to provide a partial image snippet that you expect to exist within the device screen. The device screen is then captured and searched to see if it contains the expected snippet. This is very useful if you wish to verify images, colors, logos, etc. in your application. Be cautioned, however, that this is the most fragile of all locator types, as the image snippet must reliably match the screen of the device for the evaluation to succeed. The 'ImageMatchSimilarity' and 'ScreenSizeOverride' capabilities (see the Device Capabilities section above) can help with this, but you still need to ensure that the image snippet you're passing is an apples-to-apples comparison against the device screen.

Note that your image snippet MUST be in .png format. It can be either the absolute path to an image snippet on your machine or the url of an image snippet that is accessible over the network. The latter is useful if you wish to query your application's image content from a remote content api and then dynamically search for those images within your test.

// Finds an element by a .png image snippet saved on the users file system

Element eleFromFile = driver.finder().findElement(By.Image("/path/to/image.png"));

// Finds an element by a .png image snippet available at a public url

Element eleFromUrl = driver.finder().findElement(By.Image("http://urltoimage.png"));
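Because a malformed 'ScreenSizeOverride' value can silently undermine image matching, it may be worth validating the 'WIDTHxHEIGHT' string on the client before passing it as a capability. The sketch below is illustrative only: `ScreenSizeOverride` here is a hypothetical helper class, and how the Rokuality server itself parses the value is not documented in this section.

```java
import java.util.Locale;

public class ScreenSizeOverride {

    public final int width;
    public final int height;

    private ScreenSizeOverride(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Parses a 'WIDTHxHEIGHT' string such as "1800x1200".
    // Throws IllegalArgumentException (or its subclass NumberFormatException)
    // when the value is malformed or a dimension is not positive.
    public static ScreenSizeOverride parse(String value) {
        String[] parts = value.trim().toLowerCase(Locale.ROOT).split("x");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected WIDTHxHEIGHT but got: " + value);
        }
        int w = Integer.parseInt(parts[0].trim());
        int h = Integer.parseInt(parts[1].trim());
        if (w <= 0 || h <= 0) {
            throw new IllegalArgumentException("Dimensions must be positive: " + value);
        }
        return new ScreenSizeOverride(w, h);
    }

    public static void main(String[] args) {
        ScreenSizeOverride size = parse("1800x1200");
        System.out.println(size.width + " by " + size.height);
    }
}
```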

Element Timeouts and NoSuchElement Exceptions

If a locator can't be found within the application, a NoSuchElementException is thrown and the test fails. In the scenarios above, this failure happens immediately because no implicit wait was applied to the locator searches. If we apply a timeout (in milliseconds), we can reduce flakiness in our tests: the locator is searched for continuously until it is either found or the timeout expires and a NoSuchElementException is thrown. The implicit element timeout lasts for the duration of the driver session, or until a new value is set that overrides it.

// Sets an element timeout applied to all locator searches

// if set the server will look for the element until it is either found

// or the timeout is exceeded and a NoSuchElementException is thrown


driver.finder().findElement(By.Text("text to find"));
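To make the timeout semantics concrete, here is a minimal, library-independent sketch of "search until found or until the timeout expires". It is not the Rokuality server's implementation: `PollingFinder`, the poll interval parameter, and the use of `java.util.NoSuchElementException` as a stand-in for the Rokuality exception are all assumptions made for illustration.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Supplier;

public class PollingFinder {

    // Repeatedly calls 'search' until it yields a value or timeoutMs elapses.
    // A timeout of 0 means a single attempt, i.e. the failure is immediate.
    public static <T> T findWithTimeout(Supplier<Optional<T>> search,
                                        long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            Optional<T> result = search.get();
            if (result.isPresent()) {
                return result.get();
            }
            // Subtraction keeps the comparison correct if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                throw new NoSuchElementException("not found within " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new NoSuchElementException("interrupted while waiting");
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated search that only succeeds on the third attempt.
        String found = findWithTimeout(
                () -> ++attempts[0] >= 3 ? Optional.of("Hello World!") : Optional.<String>empty(),
                2000, 50);
        System.out.println(found + " after " + attempts[0] + " attempts");
    }
}
```

The same trade-off applies to real sessions: a longer timeout tolerates slow screen transitions at the cost of slower failures when an element is genuinely absent.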


Elements as Objects

Once our locator has been found, a matching Element object will be returned which will contain information about its location, size, and confidence (if relevant).

Element element = driver.finder().findElement(By.Text("Hello World!"));







Finding Multiple Elements and Checking for Element Presence

The scenarios above search for a single element within our application. In those cases, if the element is not found by the designated locator, a NoSuchElementException is thrown and our test fails. But what if we only want to check whether an element is present? Or what if the locator in question matches multiple elements on the device screen?

Multiple Match Locators

Locators that find multiple matches for an element will return a collection of those elements which can be iterated over as follows:

List<Element> elements = driver.finder().findElements(By.Text("multi match locator"));

Element Presence

This same approach can be used to check whether an element is present within the application. Using the multi match search will NOT throw a NoSuchElementException in the event a matching element is not found. In that event the collection will be empty and we can perform logic based on that scenario.

boolean elementPresent = driver.finder().findElements(By.Text("locator")).size() > 0;

Remote Control and User Interaction


A user can perform remote control button presses and drive the application as a user would with the remote control APIs. A user can navigate the UI of their app, and pause/play/fast forward/rewind media in flight.


Almost all remote control buttons are supported.





In addition to remote control button presses, it is also possible to send a string of literal characters to the device to perform searches or easily interact with the XBox keyboard. Note that the XBox virtual keyboard must be visible on screen for this to have any effect.

driver.remote().sendKeys("text to type");

Getting Screen Artifacts (Image, Recordings, and More)


During the course of a test, a user can get screen artifacts such as the screen image and screen size. It's also possible to get the screen recording of the device from session start until the time of capture, which is very useful for reporting and test debugging.

// get the screen size



// get the screen image



// get the screen sub image from starting x,y with width/height

driver.screen().getImage(1, 1, 300, 300);


// get the screen recording of the test session from start to now


Getting Information About the Device


Information about the device under test such as the device type, model number, and more can be retrieved during the session as follows:

XBoxDeviceInfo deviceInfo =;

Setting and Retrieving Session Status


You can set the status of an active session to "passed", "failed", "broken", or "in progress" which will be retained in server memory for the duration of the session or until a new value is set. This is useful if you want to set the status of a test and then communicate result status with a reporting framework/service in a teardown or after test method. By default, the session status is "in progress" unless the user has updated it during the course of the session.

// sets the session status


// can be retrieved at any point the session is active

SessionStatus status = driver.options().getSessionStatus();

Assert.assertEquals(SessionStatus.PASSED, status);

Properly Stopping Your Session on Test Complete


It's important that when your test is complete, you properly stop your driver and release your device! This terminates the session under test and frees up the available thread for additional testing. If you don't properly release the device, back end cleanup will eventually run and release the device for further testing.

// stops the driver and releases your available thread back to your plan

// should be called as the last action of your test