Say and Listen

Voice-centric communication is one of the core features of the Relay device. It allows an active worker to interact with your IT system without a screen or keyboard, and it can be complemented by a visual element (LEDs) and a touch element (vibration). In this section, we will discuss how to use some of the device's voice-centric communication functions.

URN Type for Say and Listen

The target URN for these methods must be an interaction URN. Interaction URNs are formatted as follows:
urn:relay-resource:name:interaction:my_test?device=urn%3Arelay-resource%3Aname%3Adevice%3AName
Here, 'Name' is the name of the device you are interacting with. This interaction URN is sent to your workflow with the INTERACTION_STARTED event after you call startInteraction.
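
To make the URN format concrete, here is a minimal Python sketch that builds and parses a string of that shape using only the standard library. The helper names build_interaction_urn and device_urn_from_interaction are hypothetical and not part of any Relay SDK. In a real workflow you would normally use the URN delivered with the INTERACTION_STARTED event rather than construct one yourself.

```python
from urllib.parse import parse_qs, quote


def build_interaction_urn(interaction_name, device_name):
    """Build an interaction URN string in the format shown above (hypothetical helper)."""
    device_urn = f"urn:relay-resource:name:device:{device_name}"
    # safe="" forces ':' to be percent-encoded as %3A, matching the example URN.
    encoded_device = quote(device_urn, safe="")
    return f"urn:relay-resource:name:interaction:{interaction_name}?device={encoded_device}"


def device_urn_from_interaction(interaction_urn):
    """Extract and decode the device URN from the 'device' query parameter."""
    _, _, query = interaction_urn.partition("?")
    return parse_qs(query)["device"][0]


urn = build_interaction_urn("my_test", "Name")
print(urn)
print(device_urn_from_interaction(urn))
```

The second helper shows that the device URN is simply the percent-decoded value of the device query parameter.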

Say

Letting the device speak to the user is how you send messages to a group or prompt a user for input. The say() function uses text-to-speech (TTS) so that the device can "talk" to the user.

In the Hello World workflow, the device prompted the user for their name by asking "What is your name?". This was done using the sayAndWait() function. The say() function returns quickly and does not block. The sayAndWait() function blocks your workflow until the speech finishes streaming to the device, which makes it more convenient for serial processing.

For example, the Hello World workflow uses sayAndWait() so that the question "What is your name?" finishes before the workflow moves on to the next line of code, where it listens for the user to state their name using the Talk button. The following code asks the user for their name:

await workflow.sayAndWait(interaction_uri, `What is your name?`)
await workflow.say_and_wait(interaction_uri, 'What is your name?')
await Relay.SayAndWait(this, interactionUri, "What is your name?");
relay.sayAndWait(interactionUri, "What is your name?");
api.SayAndWait(interactionUri, "What is your name?", sdk.ENGLISH)

The sayAndWait() function in this example takes two parameters: a target and a string of text (the Go example also passes a language). The target parameter identifies the device that should receive your message. In this case, the target is the device that triggered the workflow, and for which we started an interaction. The text parameter is what you would like to say to the user, such as a question or a message.

Events

As the speech is streamed to and played out on the device, the Relay server generates two events and sends them to your workflow application. The PROMPT_START event is fired when the speech begins to stream to the device. The PROMPT_STOP event is fired when the speech is done streaming to the device. If you choose to use the asynchronous, non-blocking say() function, this is how you tell when it is complete. The sayAndWait() function automatically blocks until it receives the PROMPT_STOP event, so you don't need to look for that event yourself.
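
The relationship between the two calls and the two events can be illustrated with a small asyncio simulation. This is only a sketch under stated assumptions: FakeWorkflow is a hypothetical stand-in that imitates the behavior described above, not the Relay SDK and not how the SDK is implemented. The log entries simply record when the simulated start and stop events occur.

```python
import asyncio


class FakeWorkflow:
    """Hypothetical stand-in for a workflow object; NOT the Relay SDK."""

    def __init__(self):
        self.log = []
        self._prompt_stopped = asyncio.Event()
        self._task = None

    async def _stream_speech(self, text):
        self.log.append("PROMPT_START")
        await asyncio.sleep(0.01)  # simulates audio streaming to the device
        self.log.append("PROMPT_STOP")
        self._prompt_stopped.set()

    async def say(self, text):
        """Non-blocking: schedule the speech and return right away."""
        self._prompt_stopped.clear()
        # Keep a reference so the background task is not garbage collected.
        self._task = asyncio.create_task(self._stream_speech(text))

    async def say_and_wait(self, text):
        """Blocking: start the speech, then wait for the simulated stop event."""
        await self.say(text)
        await self._prompt_stopped.wait()


async def demo():
    workflow = FakeWorkflow()
    await workflow.say("first")
    workflow.log.append("after say")  # runs before the speech has even started
    await workflow._prompt_stopped.wait()
    await workflow.say_and_wait("second")
    workflow.log.append("after say_and_wait")  # runs only after the stop event
    return workflow.log


print(asyncio.run(demo()))
```

In the resulting log, "after say" appears before the first start event, while "after say_and_wait" appears only after the second stop event, which is the difference between the non-blocking and blocking variants.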

Listen

The Relay device is able to listen to words or phrases that are spoken into the device, and transcribe them into text (speech-to-text).

No Passive Listening

Note that Relay devices never listen passively for keywords, and they never send passively captured audio off-device, unlike some other smart assistants. With Relay, you have to press the Talk button to capture audio.

For example, in the Hello World workflow in the Create your first workflow section, you needed to listen for the user's name in order to greet the user by name later (i.e., so the device can say "Hello (user's name)..."). The listen() function is how your Relay device accepts spoken information and transcribes it to text, so that you can store that input in your code for use by other functions or actions. For example, the following code makes the Relay device listen for the user's name, and then sets the userProvidedName variable to the user's response:

const { text: userProvidedName } = await workflow.listen(interaction_uri)
user_provided_name = await workflow.listen(interaction_uri)
var userProvidedName = await Relay.Listen(this, interactionUri);
String userProvidedName = relay.listen(interactionUri, "request_1");
userProvidedName := api.Listen(interactionUri, []string {}, false, sdk.ENGLISH, 30)

If you expect a response from a list of known phrases, then you can improve transcription accuracy by specifying that list as follows:

await workflow.say(interaction_uri, `Would you like a hot or cold drink?`)
const temp = await workflow.listen(interaction_uri, ['hot','cold'])
await workflow.say(interaction_uri, 'Would you like a hot or cold drink?')
temp = await workflow.listen(interaction_uri, 'request1', ['hot', 'cold'])
await Relay.Say(this, interactionUri, "Would you like a hot or cold drink?");
var temp = await Relay.Listen(this, interactionUri, ["hot", "cold"]);
relay.say(interactionUri, "Would you like a hot or cold drink?");
String temp = relay.listen(interactionUri, "request_1", new String[]{"hot", "cold"});
api.Say(interactionUri, "Would you like a hot or cold drink?", sdk.ENGLISH)
temp := api.Listen(interactionUri, []string {"hot", "cold"}, false, sdk.ENGLISH, 30)

Note that the returned result is not limited to that list of phrases. Providing the list helps the speech-to-text function in Relay identify a match when the spoken word is "hot" or "cold" (with various inflections, etc.), but other input (e.g., "purple") will be returned as transcribed. If you want to limit the transcribed text to one of those phrases, you'll need to handle that yourself in your workflow logic.
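
The filtering described above could be handled along the following lines. This is a minimal sketch, assuming the transcript arrives as a plain string. The helper name match_phrase and its normalization rules (trimming whitespace, lowercasing, dropping trailing punctuation) are illustrative choices, not part of the Relay SDK.

```python
def match_phrase(transcript, allowed):
    """Return the allowed phrase that the transcript matches, or None."""
    # Normalize case, surrounding whitespace, and trailing punctuation.
    normalized = transcript.strip().lower().rstrip(".!?")
    for phrase in allowed:
        if normalized == phrase.lower():
            return phrase
    return None


print(match_phrase("Hot.", ["hot", "cold"]))    # hot
print(match_phrase("purple", ["hot", "cold"]))  # None
```

When the helper returns None, a workflow might repeat the question and listen again rather than continue with an unexpected value.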

If the device cannot understand what was spoken, it waits for the user to hold down the Talk button again and repeat the phrase. The listen call blocks until the speech is successfully parsed into text, and only then does the workflow continue. Your workflow should therefore consider a timeout for this.
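
One way to apply such a timeout in an asyncio-based workflow is to wrap the awaited listen call in asyncio.wait_for. The sketch below uses two hypothetical coroutines in place of a real listen call, so it runs without a device. The helper name listen_with_timeout and the fallback value are assumptions made for illustration.

```python
import asyncio


async def listen_with_timeout(listen_call, timeout_seconds, default=None):
    """Await a listen-style coroutine, returning `default` if it takes too long."""
    try:
        return await asyncio.wait_for(listen_call, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the pending call at this point.
        return default


async def fake_listen_answers():
    """Hypothetical stand-in: the user answers almost immediately."""
    await asyncio.sleep(0.01)
    return "hot"


async def fake_listen_never_answers():
    """Hypothetical stand-in: the user never presses the Talk button."""
    await asyncio.sleep(3600)
    return "unreachable"


async def demo():
    first = await listen_with_timeout(fake_listen_answers(), 1.0)
    second = await listen_with_timeout(
        fake_listen_never_answers(), 0.05, default="no answer"
    )
    return first, second


print(asyncio.run(demo()))  # ('hot', 'no answer')
```

Some SDKs may also accept a timeout argument on the listen call itself, so check the reference for your language before adding a wrapper like this.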

Events

During a listen, after the user has held down the Talk button, spoken something into the device, released the Talk button, and the device has successfully completed the speech-to-text transcription, it will fire the SPEECH event.

Setting a language

The say and listen functions can also take an additional parameter that sets the language, so that the device can speak or transcribe a language other than English. Relay supports the following languages:

  • ENGLISH = en-US
  • GERMAN = de-DE
  • SPANISH = es-ES
  • FRENCH = fr-FR
  • ITALIAN = it-IT
  • RUSSIAN = ru-RU
  • SWEDISH = sv-SE
  • TURKISH = tr-TR
  • HINDI = hi-IN
  • ICELANDIC = is-IS
  • JAPANESE = ja-JP
  • KOREAN = ko-KR
  • POLISH = pl-PL
  • PORTUGUESE = pt-BR
  • NORWEGIAN = nb-NO
  • DUTCH = nl-NL
  • CHINESE = zh

In the above example where we ask 'Would you like a hot or cold drink?', we can specify the language of the text parameter by passing the enum value 'fr-FR' as the third parameter. If we also provide the text message in French, the device will speak that French phrase.

// 'Would you like a hot or cold drink?'
await workflow.say(interaction_uri, `Envie d'une boisson chaude ou froide?`, 'fr-FR')
await workflow.say(interaction_uri, "Envie d'une boisson chaude ou froide?", 'fr-FR')
await Relay.Say(this, interactionUri, "Envie d'une boisson chaude ou froide?", "fr-FR");
relay.say(interactionUri, "Envie d'une boisson chaude ou froide?", LanguageType.French);
api.Say(interactionUri, "Envie d'une boisson chaude ou froide?", sdk.FRENCH)

We can do the same with the sayAndWait() and listen() functions. When 'fr-FR' is passed as the language parameter to the listen() function, the device listens for a phrase spoken in French, and then attempts to generate a text transcription in French.

const temp = await workflow.listen(interaction_uri, ['chaude','froide'], 'fr-FR')
temp = await workflow.listen(interaction_uri, 'request1', ['chaude', 'froide'], 'fr-FR')
var temp = await Relay.Listen(this, interactionUri, ["chaude", "froide"], "fr-FR");
String temp = relay.listen(interactionUri, "request_1", new String[] {"chaude", "froide"}, LanguageType.French);
temp := api.Listen(interactionUri, []string {"chaude", "froide"}, false, sdk.FRENCH, 30)
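
If the language is chosen at runtime (for example, from a user setting), it can help to validate it before passing it to say or listen. The following is a minimal sketch that uses a subset of the codes listed in this section. The LANGUAGE_CODES dictionary and the language_code helper are hypothetical and not part of the Relay SDK.

```python
# A subset of the language codes listed in this section.
LANGUAGE_CODES = {
    "ENGLISH": "en-US",
    "GERMAN": "de-DE",
    "SPANISH": "es-ES",
    "FRENCH": "fr-FR",
    "JAPANESE": "ja-JP",
}


def language_code(name):
    """Look up a language code by name, failing loudly on unknown names."""
    key = name.strip().upper()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name!r}")
    return LANGUAGE_CODES[key]


print(language_code("French"))  # fr-FR
```

Raising an error on an unknown name surfaces a misconfigured language early, instead of sending an unsupported value to the device.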