Debugging Your Workflows

Workflows rarely work exactly right on the first try. Nobody is perfect, and there are a number of moving parts. This section discusses how to debug your workflows so you can keep making progress during development.

This is an ordered list that will walk you down the levels of the stack to help identify the root cause.

LED State

The first thing you should do is check that the LEDs are in the expected state. See this guide. Confirm that the device is connected (three white LEDs). If it is not, figure out why (weak or no cell coverage, a captive portal on WiFi, a dead battery, a device that is not activated, etc.).

Radio Check

The next thing to do is to test that basic communications are working. Tap the Assistant button. The Relay's LEDs should turn blue, the device should speak its name and the channel it is on (e.g., "Alice is on Main"), and then the LEDs should return to their original state. Since these actions and speech come from the Relay server, this is a good way to verify that you have round-trip connectivity to the Relay server and that the device is properly activated. This doesn't involve anything from your workflow server.

Try a Built-In Workflow

As an additional check that the Relay server and device are in good shape, try triggering a pre-registered built-in workflow, such as holding down the Assistant button and speaking the phrase "battery". There won't be a confirmation beep upon the recognized phrase. After a moment, the device should speak something like "The battery level is 85 percent." This also helps to verify that the Relay server can hear your voice and transcribe what you said.

You can also try a built-in workflow that isn't automatically registered, such as the "hello world" and "LED demo" ones described in the Quick Start.

Workflow Server is Listening

The next thing to do is to verify that your workflow server is up and running and listening for incoming connection requests. Unless you have disabled default logging in the Relay SDK, you should see a log message like this from your workflow app when you start it:

Relay SDK WebSocket Server listening => 8080
INFO:relay.workflow:Relay workflow server (relay-sdk-python/2.0.0) listening on localhost port 8080 with plaintext
11:11:42 [1] [] [DBG] Add "/hello_world" to SamplesLibrary.HelloWorldWorkflow 
7/13/2022 11:11:42 AM [Info] Server started at ws://0.0.0.0:8080 (actual port 8080)

Check that this message appears in verbose mode, and that the port number is as you expected.

You can also check with the operating system that the port is listening as expected:

$ ss -tlnp
State  Recv-Q Send-Q  Local Address:Port    Peer Address:Port Process                                 
LISTEN 0      100           0.0.0.0:8080         0.0.0.0:*     users:(("python3",pid=3958835,fd=6))
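The same port check can be scripted, which is handy when you want to run it from a test or a deploy script. The following is a minimal sketch using only the Python standard library. The function name is_port_listening is hypothetical, and the host and port you pass are whatever your own workflow server uses (8080 in the log output above).

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, is_port_listening("127.0.0.1", 8080) should return True while your workflow app is running. Note that this only confirms that something accepts TCP connections on that port; it says nothing about TLS or websockets.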

Remember that you need to have TLS set up somewhere, either via a reverse proxy or directly in the workflow server.

Verify Workflow Registration

Check that you don't have a typo in your workflow registration, especially in the URI/URL of your workflow server so that the Relay server tries to connect to your workflow server at the correct address. A typo here is surprisingly common. Also check the Type to make sure you are registering the correct kind of trigger. And don't forget to check that the workflow is "installed" on the device that you are currently using, otherwise the trigger will be ignored by the Relay server.

$ relay workflow list --extended --no-truncate
=== Installed Workflows

 ID                                            Name                 Type              Uri                                                       Args                                       Installed on               
 ───────────────────────────────────────────── ──────────────────── ───────────────── ───────────────────────────────────────────────────────── ────────────────────────────────────────── ────────────────────────── 
 wf_mywf_eRzs9APSs7tP9ADlTE0et9CMA             mywf                 phrases:test      wss://d484-173-42-72-3.ngrok.io/hellopath                                                            all devices
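Because typos in the registered URI are so common, it can help to check the value mechanically against what your workflow app expects. The sketch below uses only the Python standard library. The function name check_workflow_uri is hypothetical, and it assumes that you register a wss URI (as in the listing above) and that you know the path your app registers in its source code.

```python
from typing import List
from urllib.parse import urlparse


def check_workflow_uri(registered_uri: str, expected_path: str) -> List[str]:
    """Return a list of problems found in a registered workflow URI."""
    problems = []
    parsed = urlparse(registered_uri)
    # Assumption: the workflow server is reached over TLS, so the scheme is wss.
    if parsed.scheme != "wss":
        problems.append(f"scheme is {parsed.scheme!r}, expected 'wss'")
    if not parsed.hostname:
        problems.append("hostname is missing")
    # The path must match what the workflow app registers for in its code.
    if parsed.path != expected_path:
        problems.append(f"path is {parsed.path!r}, expected {expected_path!r}")
    return problems
```

An empty list means no problem was detected. Copy the URI from the Uri column of relay workflow list and pass the path from your own source code as expected_path.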

If you receive the error missing_capability during registration, contact Relay Support. You can also check for this capability yourself by running the whoami CLI command and looking for workflow_sdk: true in the Capabilities:

$ relay whoami
=== You are

Name:               Alice Smith
Email:              [email protected]
Default Subscriber: ee4547e0-aaef-4956-b853-ade0ba234567
Auth User ID:       98071da3-572b-4552-be88-8907a9cdef01
Relay User ID:      VIRT1cMBYVBIPZ3kR7aojmYaRL
Capabilities:       workflow_sdk: true, indoor_positioning: true, ui_nfc: true, calling: true, calling_between_devices_support: true, enable_audit_logs: true

Workflow Server Connection

Using a PC web browser, ideally located outside of your enterprise, copy the wss URI from the relay workflow list -x --no-truncate command shown above and paste it into the URL field of your browser, swapping out the wss protocol for https. When you do this, the browser should display a websocket header error. For example, Chrome and Firefox would say:

Failed to open a WebSocket connection: empty Connection header.
You cannot access a WebSocket server directly with a browser. You need a WebSocket client.

This is expected and positive. It means that the browser was able to establish a TLS connection with your workflow server and exchange an HTTP message. The browser is trying to fetch a web page from your workflow server, but the workflow server can handle only websockets, which is why the workflow server returns this expected error message. Again, this is goodness for this exercise.

Badness would be a different error, such as:

  • hostname not found.
  • timeout connecting to server.
  • connection refused.
  • invalid SSL certificate (your connection is not private, connection not secure).
  • etc.
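If a browser outside your network is not convenient, the failure categories above can be approximated from a script. The following is a sketch using Python's standard socket and ssl modules. The function name probe_tls and the wording of the returned strings are hypothetical; the script only attempts a TCP connection followed by a TLS handshake with certificate verification, and does not speak the websocket protocol.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify the outcome of a TCP connection plus TLS handshake."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except socket.gaierror:
        return "hostname not found"
    except (socket.timeout, TimeoutError):
        return "timeout connecting to server"
    except ConnectionRefusedError:
        return "connection refused"
    except OSError as exc:
        return f"network error: {exc}"
    try:
        # The default context verifies the certificate chain and the hostname.
        context = ssl.create_default_context()
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return f"TLS ok ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        return f"invalid certificate: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        return f"TLS handshake failed: {exc}"
    finally:
        raw.close()
```

Pass the hostname from your registered wss URI. A result starting with "TLS ok" corresponds to the good case described above; the other results map onto the list of bad outcomes.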

Button Triggers Require Assistant Channel

Is your trigger a button press (single or double tap)? If so, you need to be on the Assistant channel for that button press to be recognized as a trigger. Otherwise it will just be interpreted as a short outgoing audio message (talk). And be sure you are pressing the big Talk button on the face of the device, not the smaller buttons on the edge of the device.

There is a way to encourage the device to be on the Assistant channel. In Dash, you can configure the Assistant channel as "Home Channel" for the device, and add a "Home Channel Timeout" that will automatically return the device to the Home Channel after as little as one minute of inactivity on any other channel. These settings can be found in Dash under Account -> Users -> Channels. However, be aware that these settings do not prevent the user from navigating to another channel (assuming other channels are available).

Trigger Phrase Acknowledged on Device

In the case where you are doing a spoken phrase trigger, there are a few things to look for:

  1. Immediately when you hold the button to begin speaking the trigger phrase, the LEDs on the device should turn blue, and the device should give one short vibrate. When doing this you can use the Assistant button no matter what channel you are currently on, or use the Talk button when on the Assistant channel.
  2. When you let go of the button, the LEDs should return to their previous state.
  3. Shortly after speaking the trigger phrase and letting go of the button, the device should give a short confirmation beep. This confirmation beep comes from the Relay server and indicates that your phrase was successfully transcribed and recognized as a configured trigger, and shortly the Relay server should attempt to establish a websocket connection to your workflow server and send you a START event. This is what you want to see happen.
  4. If instead of getting a confirmation beep there is no feedback from the Relay server (no beep, no vibrate), then this indicates that the Relay server was unable to successfully transcribe your spoken phrase. In other words, it couldn't tell what words you said. The listen action will wait for you to press the talk button again and speak a new phrase, even multiple times until successfully transcribed, up until when it times out. A failure to transcribe is not an error in listen. Try speaking with clearer enunciation. And don't hold the Relay device too close to your mouth. Also verify that you are pressing down the button sufficiently long before you begin speaking, and letting go of the button sufficiently long after you are done speaking, so your audio doesn't get clipped. It is surprisingly common for users to let go of the button before they are completely done speaking.
  5. If instead of getting a confirmation beep there is a "dum-dum" sound and a quick triple vibration, this indicates that the Relay server was able to transcribe your speech, but the transcribed phrase does not match any configured trigger phrase. In other words, there isn't a workflow registered for the trigger phrase you spoke. Check that the phrase you spoke actually matches a configured trigger phrase that appears in relay workflow list.
  6. Another possible reason for the "dum-dum" sound (without a quick triple vibration) is that a previous instance of your workflow is still running while you are trying to create a new workflow instance on the same device. Since not more than one instance of the same workflow name can run on a device at a time, the Relay server will reject the trigger for the new workflow instance. See below for how to deal with this scenario in the heading titled "No Duplicate Workflow Instances".
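The feedback cases in the list above amount to a small decision table. As a quick reference, here is that table sketched in Python. The function name diagnose_feedback and the wording of the returned messages are hypothetical; the mapping itself is taken from the numbered items above.

```python
def diagnose_feedback(beep: bool, dum_dum: bool, triple_vibrate: bool) -> str:
    """Map the device feedback after a spoken trigger to its likely meaning."""
    if beep:
        # Item 3: the phrase was transcribed and matched a configured trigger.
        return "trigger recognized; expect a websocket connection and START event"
    if dum_dum and triple_vibrate:
        # Item 5: transcribed, but no workflow is registered for that phrase.
        return "phrase transcribed but it matches no registered trigger phrase"
    if dum_dum:
        # Item 6: a previous instance of the same workflow may still be running.
        return "a previous instance of this workflow may still be running"
    # Item 4: no beep and no vibrate means the speech was not transcribed.
    return "speech not transcribed; speak again before the listen times out"
```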

No Duplicate Workflow Instances

When registering your workflow, you provided a name for that workflow via the -n option. That name is more than just a human-readable tag. When a trigger occurs, the Relay server will attempt to create an instance of the configured workflow. However, if a workflow of the same name is already running on the device, the Relay server will reject the new workflow instance and your workflow code won't be invoked.

When developing workflows it is common to unintentionally leave a workflow running, when there is a problem in the flow of your code and the terminate method is not invoked. So what you end up with is a workflow instance that is unintentionally hanging around in an unexpected state, which prevents you from starting a new instance of the same workflow application. When you speak a phrase trigger and you get a "dum-dum" sound and you don't see a new instance of your workflow, this may be why. To verify whether this is the case, tap the Assistant button on the device to hear what channel it says. If the channel name it says is the interaction name (e.g., "hello interaction") then the server is still trying to execute your workflow. If your workflow has terminated or timed out, it should say the name of the channel it was on before the workflow started.

If your workflow seems to only partially execute, it is similarly suggested to tap the assistant button to see if it says the interaction name as the channel.

There is a CLI command to list your running workflow instances in the Relay server, so you can check if there is an undesired instance still running. Here is an example of that CLI command when there is a workflow still running.

$ relay workflow instance list
=== Workflow Instance

 ID                       Workflow id                         Name      Triggered By    Status  
 ──────────────────────── ─────────────────────────────────── ───────── ─────────────── ─────── 
 Uf879BgZwjmLrLETmiHpofB  wf_internal2_dvzYUlxsdKB3uzSJUXgnpD internal2 990007560020123 running
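If you check for leftover instances often, you may want to parse this CLI output in a script. The sketch below is one way to do that in Python. It assumes the table layout shown above (a header row, then a row of ─ rules that marks the column boundaries, then the data rows); that layout is inferred from this sample and is not a documented format, and the function names are hypothetical.

```python
import re
from typing import Dict, List


def parse_cli_table(output: str) -> List[Dict[str, str]]:
    """Parse a CLI table whose columns are delimited by a row of ─ rules."""
    lines = output.splitlines()
    for index, line in enumerate(lines):
        # Each run of ─ characters gives the start and end of one column.
        spans = [match.span() for match in re.finditer("─+", line)]
        if spans and index > 0:
            header = lines[index - 1]
            names = [header[start:end].strip() for start, end in spans]
            rows = []
            for row in lines[index + 1:]:
                if not row.strip():
                    break  # a blank line ends the table
                cells = [row[start:end].strip() for start, end in spans]
                rows.append(dict(zip(names, cells)))
            return rows
    return []


def running_instances(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the rows whose Status column reads 'running'."""
    return [row for row in rows if row.get("Status") == "running"]
```

You could feed it the captured output of relay workflow instance list, for example via subprocess.run(..., capture_output=True, text=True).stdout.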

The easiest way to get rid of an undesired "hanging around" workflow instance in the Relay server is to kill your workflow application, such as hitting Ctrl-C to stop its execution. When you kill the workflow application on your workflow server, it should automatically cause any existing websockets to close, and that closure is transmitted to the Relay server. When the Relay server sees the websocket close, it will automatically terminate the workflow because it no longer has a way to communicate with the workflow application. Now you should be able to perform the trigger again, and the Relay server should be able to create a new instance of your workflow.

Similarly, when creating an interaction via startInteraction, you are required to provide a name for the interaction. The same behavior applies to the interaction name as to the workflow name, so a "hanging around" interaction can prevent a new interaction of the same name from getting created. The workaround is also the same: stop the execution of the workflow (e.g., with Ctrl-C), and the Relay server will end the interaction and terminate the workflow, so you can create a new one.

Websocket Established

When you perform a trigger that is correctly registered, whether spoken phrase or button press or something else, you want to see that the Relay server establishes a websocket to your workflow app. When the default logging mode is unchanged in the Relay SDK, the SDK should generate a log message such as the following:

Workflow new connection on /hellopath
workflow started from /hellopath
17:43:10.013 [8] [] [INF] [b5554d19-2fb7-437f-a619-92ed8c15280b/127.0.0.1:34846/notifypath] OnOpen
2022-09-28 17:36:35 INFO  Relay:100 - Workflow instance started for hellopath

You should see these with the default log level of INFO. If you don't see this websocket get established, there may be a registration mismatch, a listening problem on your workflow server, a connectivity problem between the Relay server and your workflow server, etc. Here it is surprisingly common for the path part of the registered URI to not match what the workflow app self-registers for in its source code.

Enabling SDK Verbose Mode

All of the SDKs include logging information that prints to standard output when your workflow app starts and while your workflow runs. It can display the messages that travel over the websocket. This may be more information than you need, but it's better to see something than not to be able to see anything. The next thing to do is enable verbose mode in the Relay SDK. Each SDK should have instructions on how to do that in its README.md file, which is also located on GitHub. We'll use the verbose output in the steps below, starting with looking for the START event.

Start Event Received

Immediately after the websocket is established, you want to see that the Relay server sends a START event which is received by your workflow application. When verbose mode is enabled in the Relay SDK, the SDK should generate a log message such as the following (look for wf_api_start_event):

DEBUG: 2022-08-05 10:09:01,648: [hello:140130310166560] recv: {'_type': 'wf_api_start_event', 'trigger': {'args': {'phrase': 'test', 'source_uri': 'urn:relay-resource:name:device:Alice'}, 'type': 'phrase'}}
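When you are scanning a long verbose log, it can be easier to pull the trigger details out of the START event programmatically. The sketch below assumes the Python SDK's log format shown above, where the text after "recv: " is the printed form of a Python dict; that is an observation from this sample, not a documented guarantee. The function name parse_start_event is hypothetical.

```python
import ast
from typing import Optional


def parse_start_event(log_line: str) -> Optional[dict]:
    """Return the trigger dict from a START event log line, else None."""
    marker = "recv: "
    position = log_line.find(marker)
    if position == -1:
        return None
    payload = log_line[position + len(marker):].strip()
    try:
        # literal_eval safely parses the dict text without executing code.
        message = ast.literal_eval(payload)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(message, dict):
        return None
    if message.get("_type") != "wf_api_start_event":
        return None
    return message.get("trigger")
```

For the sample line above, the returned trigger has type 'phrase' and its args contain the spoken phrase and the source_uri of the device that triggered the workflow.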

Now you have confirmation that the URI/URL of your workflow server has the correct protocol+hostname, and the path part is correct.

Analytics Log

Assuming a workflow gets started, when the workflow ends (normally or abnormally), there will be an entry in the analytics log on the Relay server. This entry will have a timestamp, a reason string, and a note of which action was the last one executed in the workflow. So if you are struggling to understand at what point your workflow is ending, this can provide some help. For more information on how to fetch this data, see the page titled Using Analytics in the heading titled "System Entries".

Add Logging to Your Code

Once the websocket is connected and the START event is received, the SDK should call the handler method that you registered for the START event. Now the rest is up to you. One thing you could do at this point is to generate your own log message as the first thing you do in your START handler:

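import logging

import relay.workflow

# Module-level logger for your own workflow messages.
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
...
# Pass log_level so the SDK's own messages are logged at DEBUG as well.
wf_server = relay.workflow.Server('0.0.0.0', 8080, log_level=logging.DEBUG)
...

@my_workflow.on_start
async def start_handler(workflow, trigger):
    # Log first, so you can confirm the handler was invoked and see the trigger.
    logger.debug(f'I have been called with: {trigger}')

    ...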

Consider doing the same at other points in your workflow code, such as the interaction lifecycle handler. This will help you confirm that your workflow code execution is flowing in the expected path, and the values you are working with are as expected.
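If you find yourself adding the same entry and exit messages to many handlers, a small decorator can do it for you. The following is a sketch using only the Python standard library. The decorator name log_calls is hypothetical, and it assumes your handlers are coroutine functions (async def), as in the START handler above. Whether your SDK accepts a wrapped coroutine as a handler is also an assumption you should verify.

```python
import asyncio
import functools
import logging

logger = logging.getLogger(__name__)


def log_calls(handler):
    """Wrap an async handler so that entry, exit, and errors are logged."""
    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        logger.debug("entering %s", handler.__name__)
        try:
            result = await handler(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback, then we re-raise.
            logger.exception("%s raised an exception", handler.__name__)
            raise
        logger.debug("leaving %s", handler.__name__)
        return result
    return wrapper


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    @log_calls
    async def demo_handler(trigger):
        return trigger["type"]

    print(asyncio.run(demo_handler({"type": "phrase"})))
```

Seeing "entering" without a matching "leaving" for a handler is a quick hint about where the flow of your workflow code stopped.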

Action Requests Get Responses

When the verbose mode is enabled in the SDK, it will show you the messages that are exchanged across the websocket. Most of these will be in a request/response pairing: your workflow sending a request, and the Relay server replying with a response. Seeing the requests get sent and the expected responses coming back will give you confidence that your workflow is functioning as expected. Here is an example for say:

_send action => {
  _id: '9d48d77c93b2b71413a139e5cecab76e',
  _type: 'wf_api_say_request',
  _target: {
    uris: [
      'urn:relay-resource:name:interaction:hello%20world?device=urn%3Arelay-resource%3Aname%3Adevice%3AFrog'
    ]
  },
  text: 'Hello World!',
  lang: 'en-US'
}
onMessage {
  _id: '9d48d77c93b2b71413a139e5cecab76e',
  _type: 'wf_api_say_response',
  id: '9d48d77c93b2b71413a139e5cecab76e'
}
_waitForEventCondition#responseListener => {
  _id: '9d48d77c93b2b71413a139e5cecab76e',
  _type: 'wf_api_say_response',
  id: '9d48d77c93b2b71413a139e5cecab76e'
}
processing event 9d48d77c93b2b71413a139e5cecab76e of type wf_api_say_response
2022-07-28 09:56:22,949 - relay.workflow - DEBUG - [hello:139643426739344] send: {"_type": "wf_api_say_request", "_target": {"uris": ["urn:relay-resource:name:interaction:hello%20world?device=urn%3Arelay-resource%3Aname%3Adevice%3AFrog"]}, "text": "Hello World!", "lang": "en-US", "_id": "8131d84351744e538a3559d51322f1d3"}
2022-07-28 09:56:22,998 - relay.workflow - DEBUG - [hello:139643426739344] recv: {'_id': '8131d84351744e538a3559d51322f1d3', '_type': 'wf_api_say_response', 'id': '8131d84351744e538a3559d51322f1d3'}
10:02:27 [11] [] [DBG] [cfbb69e2-89f2-4ee0-a318-8114af5c3cf4/127.0.0.1:44698/hello_world] Send JSON: "{\"text\":\"Hello World!\",\"lang\":\"en-US\",\"_id\":\"8D95AA540156B6DB038AEC3239E345A1\",\"_type\":\"wf_api_say_request\",\"_target\":{\"uris\":[\"urn:relay-resource:name:interaction:hello%20world?device=urn%3Arelay-resource%3Aname%3Adevice%3AFrog\"]}}"
10:02:27 [10] [] [DBG] [cfbb69e2-89f2-4ee0-a318-8114af5c3cf4/127.0.0.1:44698/hello_world] OnMessage JSON: "{\"_id\":\"8D95AA540156B6DB038AEC3239E345A1\",\"_type\":\"wf_api_say_response\",\"id\":\"8D95AA540156B6DB038AEC3239E345A1\"}"

If the request was successful, you should see the corresponding response, such as the wf_api_say_request and wf_api_say_response pair above. If the request was unsuccessful, you may get a wf_api_error_response and an END event. If this happens:

  • if the SDK method you call requires an interaction URN as the "target" parameter, check that you are using an interaction URN here instead of a device URN or group URN. Many of the SDK methods require an interaction to be explicitly started.
  • check the other parameters in the SDK method you call to verify they are well-formed, are of the correct type, and have valid data.
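The request/response pairing described in this section can be checked mechanically once you have the messages as dicts. The sketch below assumes what the sample logs suggest: request types end in _request, response types end in _response, and a response carries the same _id as its request. Treat that as an inference from the samples rather than a protocol guarantee. The function name unanswered_requests is hypothetical.

```python
from typing import Dict, Iterable


def unanswered_requests(messages: Iterable[dict]) -> Dict[str, str]:
    """Return {_id: _type} for requests that never received a response."""
    pending = {}
    for message in messages:
        message_type = message.get("_type", "")
        message_id = message.get("_id")
        if message_type.endswith("_request"):
            pending[message_id] = message_type
        elif message_type.endswith("_response"):
            # A response with a matching _id settles the request.
            pending.pop(message_id, None)
    return pending
```

Any entry left in the result points at the action whose response never arrived, which is a good place to start checking the target and the other parameters.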

After an error is sent to your workflow from the Relay server, you can fetch entries from the system log to get a bit of metadata about it. See the Using Analytics page for more info on that.

There is a rate limit to the number of workflows/interactions/actions that can be performed. If you hit this limit, you should get back an HTTP response code 429 "Too Many Requests", where the error message may contain some wording about an AUP (Acceptable Use Policy). This rate limit is in place to handle a "runaway" loop that may put excessive burden on the Relay server, and you shouldn't encounter it outside of that scenario. If you have a use case that is hitting the rate limit that isn't a runaway loop, please contact Relay Support for assistance.
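One way to catch a runaway loop in your own code before the server-side limit does is a small local guard that raises when actions are issued too quickly. The following is a sketch only: the class name LoopGuard is hypothetical, and the thresholds you choose are your own; they are not the Relay server's actual limits, which this guide does not state.

```python
import time
from collections import deque


class LoopGuard:
    """Raise if check() is called more than max_calls times per window."""

    def __init__(self, max_calls: int, per_seconds: float, clock=time.monotonic):
        self._max_calls = max_calls
        self._per_seconds = per_seconds
        self._clock = clock
        self._timestamps = deque()

    def check(self) -> None:
        now = self._clock()
        # Drop timestamps that have fallen out of the sliding window.
        while self._timestamps and now - self._timestamps[0] >= self._per_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self._max_calls:
            raise RuntimeError("possible runaway loop: too many actions")
        self._timestamps.append(now)
```

Call guard.check() just before each action inside a loop. During development, a RuntimeError raised locally is easier to trace than an error returned from the server.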

Other Tips

  • When making an SDK call to perform an action, it will likely require the parameter "target". This parameter identifies which device(s) you want the action to occur on. There are different types of targets, namely interaction targets and device targets. Make sure you are using the correct target type. The guides and API reference should make it clear which type should be used.
  • If you are sending a notification out to a group of devices via broadcast or alert, ensure that you are passing in the correct group URN. The best way to create a URN for a group is through the URN construct/parse functions that are provided with the SDK. This is done by passing the ID or name of your group to the groupId() or groupName() function, respectively.
  • Getting/setting a variable can be helpful when you need to save a value for later retrieval in the workflow. However, keep in mind that workflows can only store variables with string values. To store an integer value, save it as a string with setVar() and then retrieve it using the SDK's getNumberVar() function.
  • As noted in the Say and Listen section, if you don't see a listen response, that likely means that the device was not able to transcribe your speech to text. It will continue to wait for subsequent spoken audio that it can parse or until the request times out before continuing on through the workflow.
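Since passing the wrong kind of target is a recurring theme in these tips, a quick sanity check on the URN before you call an action can save time. The sketch below infers the layout urn:relay-resource:name:TYPE:VALUE from the device and interaction URNs that appear in the log samples in this guide; that layout is an assumption drawn from those samples, and the SDK's own URN construct/parse functions remain the better tool where available. The function names are hypothetical.

```python
from typing import Optional


def urn_resource_type(urn: str) -> Optional[str]:
    """Return the resource type of a Relay URN, e.g. 'device', else None."""
    # Drop any query part, such as ?device=... on interaction URNs.
    base = urn.split("?", 1)[0]
    parts = base.split(":", 4)
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "relay-resource":
        return None
    return parts[3]


def require_interaction(urn: str) -> str:
    """Raise ValueError unless the URN looks like an interaction URN."""
    found = urn_resource_type(urn)
    if found != "interaction":
        raise ValueError(f"expected an interaction URN, got {found!r}")
    return urn
```

Calling require_interaction() on the target right before an action that needs an interaction turns a confusing server-side error into an immediate, local one.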