Mix tips for IVR developers

This document provides tips for IVR developers and points out important factors to consider when creating your Mix project, designing your Mix.dialog conversations, and using the Dialog as a Service runtime API.

Create a project in Mix

This section describes how to choose channel, modalities, and engine pack versions for IVR projects.

Channel and modalities

When creating a project in Mix you specify the communication channels you intend to support. The predefined channels have recommended modalities preselected for you but you can customize the channels to suit your needs. You can even create and name your own custom channel.

Mix offers three project templates with default channels and modalities. For example, the IVR (Interactive Voice Response) predefined channel is meant to support the classic interactive touchtone telephone system using voice and DTMF, with prerecorded audio files for prompts (called messages in Mix.dialog).

A modality specifies a format used to exchange information with users, such as text-to-speech (TTS), audio, text, and so on. Modalities determine the options that are available for a channel in Mix.dialog. For example, if you select the DTMF modality, you will be able to map DTMF keys to entity values in the DTMF properties of question and answer nodes. If you don’t select the DTMF modality for any of the channels in your project, these properties will not be available, nor will any DTMF-related settings appear in the Project Settings panel for your project.

Consider the following factors when making your modality selections for an IVR project:

Unless you're creating an omnichannel project, you can skip the Rich Text and Interactivity modalities.

Once you have specified the channels and modalities for your project, you must select a use case (General is the default), followed by the languages you wish to support. The available languages depend on the selected use case. For more information, see Create a project.

Engine packs

When creating a project in Mix you must select the engine pack version that corresponds to the engines installed in your self-hosted environment.

This ensures that the resources generated for your project (ASR DLMs, NLU models, and Dialog models) are compatible with the engine versions you have installed. The engine pack version also determines the tooling features you can access. In the Mix tools, features introduced in a later engine pack version are not available until you upgrade to the engine pack version supporting these new features. This ensures that changes introduced in any hosted engines in the Mix runtime environment will not impact existing projects.

An engine pack version includes a major and a minor version number. Speech Suite 11 is only compatible with the 2.x releases of engine packs. To determine which specific engine pack version to use, select the version corresponding to your installed engines.

For the list of engine packs available for Speech Suite deployments, see Speech Suite.

If you are creating a project in Mix that needs to be deployed to an IVR environment with Speech Suite 11 (11.0.9 and up) and the Nuance Dialog engine—whether these components are self-hosted or Nuance-hosted (Nuance-Hosted IVR)—you need to select an engine pack that matches the versions of Speech Suite and Dialog engine installed in the IVR environment.

For example, consider the following Speech Suite 11 deployment:

For this deployment, you would select the 2.0 engine pack.

To determine your installed engine versions and recommended engine pack version to select for your project:

For more information about engine packs, see Manage engine packs and data packs.

Prepare audio and grammar files

In order for the application to retrieve audio and grammar files at runtime, you must place the files in specific directories relative to the client application.

Prerecorded messages

The expected location for prerecorded audio files follows this format:

basePath/language/prompts/library/channel/

Where:

Mix.dialog can generate filenames for your messages. When Mix builds the Dialog model for an application, it uses Audio File ID as the base filename for messages. If Audio File ID is not defined, Mix falls back to Message ID. For more information, see Add a message for your project.

The filename extension—.wav (default), .vox, or .ulaw—is set in your project settings.

VoiceXML Connector automatically appends the version query parameter, which refers to a specific Dialog build. (Mix generates a unique version number for each deployment.) Example: http://webserver:8080/myContent/en-US/prompts/default/IVRVoiceVA/promptName.wav?version=1.0_123456789

In the absence of a new Dialog build, VoiceXML Connector lets you explicitly specify version with an arbitrary value to make sure your VoiceXML application uses the latest version of a resource—in this case, an audio file—instead of any cached version. For more information, refer to the VoiceXML Connector documentation.
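To make the convention concrete, here is a minimal sketch that assembles such a URI. The helper function and host name are illustrative, not part of any Mix API:

```python
# Illustrative helper: the directory convention and version parameter follow
# the format described above; the host and version values are placeholders.
def audio_uri(base_path, language, library, channel, audio_file_id,
              extension="wav", version=None):
    uri = f"{base_path}/{language}/prompts/{library}/{channel}/{audio_file_id}.{extension}"
    if version is not None:  # normally appended for you by VoiceXML Connector
        uri += f"?version={version}"
    return uri

print(audio_uri("http://webserver:8080/myContent", "en-US", "default",
                "IVRVoiceVA", "promptName", version="1.0_123456789"))
# -> http://webserver:8080/myContent/en-US/prompts/default/IVRVoiceVA/promptName.wav?version=1.0_123456789
```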

Dynamic message references

Dialog designs that support the Audio Script modality also support dynamic audio file references in messages. Rather than having the message point to a fixed audio file, you use a data access node to retrieve a reference to an audio file, whether from the client application or from an external data store or endpoint. For more information, see DynamicMessageReference schema.

Dynamic messages

Mix.dialog gives you the flexibility to use HTML markup in Rich Text messages, and SSML tags in TTS and Audio Script messages. You can also create dynamic messages by adding placeholders (annotations) to be replaced at runtime based on what the user said or other circumstances. See Dynamic messages for more information.

Use the same directory structure as described above. For a message that contains dynamic data along with static text, the Dialog build references separate audio files with a suffix, such as _01, _03, and so on, appended to the base filename.

Runtime representation of a dynamic message for audio playback

"audio": [{
        "text": "Adding ",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/Report_quantity_added_ini_01_01.wav?version=preview_1.0_1634153564362"
    }, {
        "text": "",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/cpr/silence.natnum.precpr.wav?version=preview_1.0_1634153564362"
    }, {
        "text": "",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/cpr/silence.global.natnum.wav?version=preview_1.0_1634153564362"
    }, {
        "text": "5",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/cpr/m.natnum.units05.wav?version=preview_1.0_1634153564362"
    }, {
        "text": "",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/cpr/silence.global.natnum.wav?version=preview_1.0_1634153564362"
    }, {
        "text": "",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/cpr/silence.natnum.postcpr.wav?version=preview_1.0_1634153564362"
    }, {
        "text": " units to your order",
        "mask": false,
        "bargeInDisabled": false,
        "uri": "en-US/prompts/default/IVRchannel/Report_quantity_added_ini_01_03.wav?version=preview_1.0_1634153564362"
    }
]

For example, consider this audio script message, whose runtime representation is shown above:

[Image: audio script message with dynamic content in the Mix.dialog editor]

At runtime, the Dialog service represents this dynamic message as separate parts, each with its own audio file reference.

See Message actions, in the Dialog as a Service gRPC API documentation, for more information.

Dynamic concatenated audio packages

Annotations that represent variables or entities of certain types (date, time, temperature, and so on) support output formatting. At runtime, some VoiceXML applications can play Audio Script messages by using audio files identified in a package via the proprietary Nuance CPR algorithm for dynamic concatenated audio. If a recorded audio file cannot be found in the recorded audio package, TTS playback is used as a fallback. References to files located in dynamic concatenated audio packages have the same format with the addition of the /cpr subdirectory before the prompt name, as shown in this example:

http://webserver:8080/myContent/en-US/prompts/default/IVRVoiceVA/cpr/silence.global.date.wav?version=1.0_123456789

For more information, including on the formatting options available for the Audio Script, TTS, and Rich Text modalities, see Output formatting options. Note that support for dynamic concatenated audio is limited to specific languages.

Grammars for recognizing speech input

Mix supports uncompiled (.grxml) and precompiled (.gram) grammar file formats. Grammar files must be placed in specific directories, relative to the client application, following a similar pattern to audio files:

basePath/language/grammars/channel/filename

Where:

VoiceXML Connector automatically appends the version query parameter, which refers to a specific Dialog build. (Mix generates a unique version number for each deployment.) Example: http://webserver:8080/myContent/en-US/grammars/IVRVoiceVA/myGrammar.grxml?version=1.0_123456789

In the absence of a new Dialog build, VoiceXML Connector lets you explicitly specify version with an arbitrary value to make sure your VoiceXML application uses the latest version of a resource—in this case, a grammar—instead of any cached version. For more information, refer to the VoiceXML Connector documentation.
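For example, you might append a value you control and bump it whenever you update the grammar file (the value below is arbitrary):

http://webserver:8080/myContent/en-US/grammars/IVRVoiceVA/myGrammar.grxml?version=myGrammar_rev2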

DTMF mappings and DTMF grammars

For IVR dialogs, you can also define mappings to handle DTMF input. You can either define the mappings in Mix.dialog or use a DTMF grammar file; see Use and manage grammars, below, for guidance on choosing between them. Place DTMF grammar files in a directory relative to the client application, the same as for speech recognition grammar files.

Use and manage grammars

Whether you're using Mix.nlu to recognize the caller's request or response, a Nuance Recognizer speech grammar (.grxml or .gram), DTMF, or a combination, VoiceXML Connector converts this into a <grammar> tag for your application. Mix.dialog provides the flexibility to handle different combinations of models and grammars, at any question and answer nodes in your dialog design.

For example, you might use:

| Scenario | Example use cases | Enabling for entity collection and confirmation | Command handling |
|---|---|---|---|
| ASR/NLU models only | NLU collection (intent and entities) | Default, no actions required. | Commands must be part of the NLU model. See Configure global commands. |
| ASR/NLU models, and DTMF | Menu-type collection, or simple input | See Enable DTMF. | Speech commands must be part of the NLU model. DTMF mappings can be specified in Mix.dialog, or through an external DTMF grammar. |
| NR grammar only | Alphanumeric collection, or complex speech-only recognition | See Enable Nuance Recognizer grammars. | Speech commands must be defined in an external grammar, and also configured in Mix.dialog. |
| NR grammar, and DTMF | Collecting strings of digits, or reusing grammars from an existing IVR project | See Enable Nuance Recognizer grammars and Enable DTMF. | Speech commands must be defined in an external grammar. DTMF mappings can be specified in Mix.dialog, or through an external DTMF grammar. |
| DTMF only | Language selection | See DTMF-only input states. | DTMF mappings can be specified directly in Mix.dialog, or through an external DTMF grammar. |

Enable Nuance Recognizer grammars

Turn on Reference speech grammars in nodes for the appropriate channels in your project settings, and then specify the required speech grammars for entity collection (and for command overrides, if needed), in the desired question and answer nodes.

A question and answer node that collects an entity for which a Nuance Recognizer built-in grammar exists can reference the built-in speech grammar in a similar fashion. In such cases, the grammar reference is a URI specifying the type and name of the built-in grammar, any desired parameters, and the name of the entity being collected. For example, the node that collects an entity called ACCOUNT_NUMBER (based on nuance_CARDINAL_NUMBER) as a 7-digit string can reference the built-in digits speech grammar as:
builtin:grammar/digits?length=7;entity=ACCOUNT_NUMBER;
Where grammar indicates that we’re using a speech grammar, digits is the name of the desired built-in grammar, and ACCOUNT_NUMBER is the name of the entity to collect.
This requires VoiceXML Connector 0.17 or later. Refer to your Speech Suite documentation for more information on built-in grammars.
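As a trivial illustration of this URI format, the following sketch composes such references. The helper itself is hypothetical; only the builtin:grammar syntax comes from the product:

```python
# Hypothetical helper: only the builtin:grammar URI syntax is defined by the
# product; this function simply assembles it from its parts.
def builtin_speech_grammar(name, entity, **params):
    query = "".join(f"{key}={value};" for key, value in params.items())
    return f"builtin:grammar/{name}?{query}entity={entity};"

print(builtin_speech_grammar("digits", "ACCOUNT_NUMBER", length=7))
# -> builtin:grammar/digits?length=7;entity=ACCOUNT_NUMBER;
```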

If you wish to support speech interaction at confirmation turns, see Specify grammars for confirmation.

Note that you can also specify speech grammar files for commands, in which case you must also specify a speech grammar for the entity in focus, at every question and answer node that must support command entity values.

Enable DTMF

Prerequisite: Your Mix project must have a channel that supports DTMF interaction. See Manage targets, for more information.

The ability to reference DTMF grammar files in question and answer nodes is enabled by default, in your project settings, for channels that support DTMF.

Any question and answer node can reference external DTMF grammar files: see Specify grammars.

A question and answer node that collects an entity for which a Nuance Recognizer built-in grammar exists can reference the built-in DTMF grammar in a similar fashion. In such cases, the grammar reference is a URI specifying the type and name of the built-in grammar, any desired parameters, and the name of the entity being collected. For example, the node that collects an entity called ACCOUNT_NUMBER (based on nuance_CARDINAL_NUMBER) as a 7-digit string can reference the built-in digits DTMF grammar as:
builtin:dtmf/digits?length=7;entity=ACCOUNT_NUMBER;
Where dtmf indicates that we’re using a DTMF grammar, digits is the name of the desired built-in grammar, and ACCOUNT_NUMBER is the name of the entity to collect.
This requires VoiceXML Connector 0.17 or later. Refer to your Speech Suite documentation for more information on built-in grammars.

For question and answer nodes that collect a list entity, you can also set DTMF mappings directly in the node, if desired. VoiceXML Connector can interpret DTMF input based on the dtmf_mappings information in the ExecuteResponse payload, without the need for an external DTMF grammar reference. See RecognitionSettings, in the Dialog as a Service gRPC API documentation, for more information.
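As a rough sketch of what a client does with this information, the following resolves a key press against such mappings. The dictionary shape, including the "dtmf_key" and "value" fields, is an assumption for illustration only; see RecognitionSettings in the DLGaaS reference for the actual schema:

```python
# Illustrative only: the mapping fields below are assumed for this sketch,
# not the verbatim RecognitionSettings schema.
def resolve_dtmf(dtmf_key, dtmf_mappings):
    """Map a caller's key press to an entity value."""
    for mapping in dtmf_mappings:
        if mapping.get("dtmf_key") == dtmf_key:
            return mapping.get("value")
    return None  # no mapping: fall back to a DTMF grammar, or treat as nomatch

mappings = [{"dtmf_key": "1", "value": "checking"},
            {"dtmf_key": "2", "value": "savings"}]
print(resolve_dtmf("2", mappings))  # -> savings
```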

In your project settings, you can specify a global DTMF grammar to support DTMF interaction at confirmation turns; see Specify grammars for confirmation.

Note that you can also specify DTMF grammar files for commands.

Set global properties and behaviors

[Example: Speech settings]

[Example: timeout-related settings for speech interaction]

[Example: DTMF settings]

Use the Project Settings panel of Mix.dialog to define settings and behaviors that determine how your application handles commands and events, as well as collection, confirmation, and recovery. See Global settings and behaviors. You can set most parameters globally, for all channels; some are also, or only, available for specific channels, or for specific entities. Some settings are also available at the node level, for question and answer nodes and message nodes, or at the message level.

Note the limitations associated with some VoiceXML configuration parameters. This table shows VoiceXML parameters applicable to speech interaction, and their Mix.dialog equivalent:

| VoiceXML parameter | Mix.dialog setting | Limitations |
|---|---|---|
| speedvsaccuracy | Speed vs accuracy, in the Speech settings category (global, channel level, entity level, node level) | Not supported by either Nuance Recognizer or Dragon Voice |
| bargein | Enabled by default; can be disabled at specific nodes (speech settings of question and answer nodes, node settings of message nodes), and for individual messages | |
| bargeintype value hotword | Barge-in type, in the Speech settings category (global, channel level, entity level, node level); speech is enabled by default | Not supported by Dragon Voice |
| sensitivity | Sensitivity, in the Speech settings category (global, channel level, entity level, node level) | |
| timeout | Timeout, in the Collection settings category (global, channel level, entity level, node level), and the Confirmation settings category (global, channel level) | |
| completetimeout | Complete timeout, in the Collection settings category (global, channel level, entity level, node level), and the Confirmation settings category (global, channel level) | Ignored by Dragon Voice |
| incompletetimeout | Incomplete timeout, in the Collection settings category (global, channel level, entity level, node level), and the Confirmation settings category (global, channel level) | |
| maxspeechtimeout | Max Speech timeout, in the Collection settings category (global, channel level, entity level, node level), and the Confirmation settings category (global, channel level) | |

The table below shows VoiceXML parameters applicable to DTMF interaction, and the corresponding settings in Mix.dialog. These settings are available in Mix.dialog under the DTMF settings category:

| VoiceXML parameter | Mix.dialog setting |
|---|---|
| interdigittimeout | Interdigit timeout |
| termtimeout | Terminating timeout |
| termchar | Terminating character |

The required TTS language and voice should be installed in the environment and referenced in the appropriate document for your environment. Refer to the VoiceXML Connector documentation for more information.

Understand confidence evaluation

[Example: confidence thresholds for collection turns]

[Example: confirmation strategy and confidence threshold for confirmation turns]

[Example: confirmation strategy type for a specific entity]

[Example: confidence thresholds for a specific entity]

At runtime, the Dialog service handles confidence scores that VoiceXML Connector relays from your recognition engine (Nuance Recognizer or Dragon Voice) along with recognition results. When there are multiple results, the Dialog service chooses the best one, based on context and confidence score. The Dialog service then uses the applicable Confirmation strategy type, Confidence-high threshold, and Confidence-low threshold, to evaluate whether to accept the result, reject it, or elicit a confirmation turn. In Mix.dialog, you can find the thresholds and confirmation strategy types in your project settings.

At collection turns:

At confirmation turns, if the confidence score of the collected utterance is lower than the applicable Confidence-low threshold, the recognition result is rejected, and this generates a nomatch event.

The global confirmation strategy type is Never, by default. You can change the global strategy, or set a different confirmation strategy for specific channels, or for specific entities. You can also set confidence thresholds by language, for collection or for confirmation, for specific channels, or for specific entities.
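Paraphrased in code, the evaluation works roughly as follows. This is a sketch of the documented behavior, not Dialog service source, and the threshold values shown are arbitrary examples:

```python
# Sketch of the documented evaluation; the Dialog service performs this server
# side. Threshold values here are arbitrary examples; set yours in the
# Mix.dialog project settings.
def evaluate_recognition(confidence, low=0.3, high=0.7):
    if confidence < low:
        return "reject"   # generates a nomatch event
    if confidence < high:
        return "confirm"  # elicit a confirmation turn, per the confirmation strategy
    return "accept"

for score in (0.2, 0.5, 0.9):
    print(score, evaluate_recognition(score))
```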

Handle escalations

In your dialog design, you can use external actions nodes set up for a transfer action to pass information to your VoiceXML application. When your application transfers the user to a live agent, it can make this information available to the agent, for example.

Use throw event actions in question and answer nodes, message nodes, or decision nodes to throw custom events, or global command events when you want to handle a situation the same way as the corresponding global command. For example, at the point in your dialog flow where you want to transfer the user to a live agent, you can set a throw event action to throw the Escalate event.

Configure event handlers to catch error events, command events, or events thrown via throw event actions. The destination for an event handler can be an external actions node. It can also be a node of any type, which you might use to catch the event and perform additional logic before transitioning to the external actions node that will trigger the transfer action.

At runtime, VoiceXML Connector relays all the variables from the external actions node's Send Data parameters to your VoiceXML application. Eventually, the application returns information, which VoiceXML Connector relays to the Dialog service. Depending on the outcome of the transfer action, the Dialog service follows either the success path (user was transferred successfully) or the failure path (user could not be transferred) in your dialog flow.
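In client-side pseudo-code, the exchange looks roughly like this. The dictionary shapes and the returnCode convention below are assumptions for illustration, not verbatim DLGaaS messages:

```python
# Illustrative sketch: real clients exchange DLGaaS gRPC messages; the field
# names and returnCode convention below are assumptions, not the actual schema.
def handle_transfer(escalation_data, transfer_call):
    """React to a transfer action relayed from an external actions node."""
    agent_context = escalation_data.get("data", {})  # Send Data variables
    succeeded = transfer_call(agent_context)         # your telephony transfer logic
    # Return the outcome so the Dialog service can follow the success or
    # failure path in the dialog flow.
    return {"returnCode": "0" if succeeded else "1"}

# Example wiring with a stubbed transfer function:
print(handle_transfer({"data": {"department": "billing"}}, lambda ctx: True))
# -> {'returnCode': '0'}
```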

Limitations

This section lists current limitations and feature gaps to consider when using Mix for a VoiceXML application.

Messages/prompts

Global commands

External resources

Recognition/interpretation

Voices/languages

Try mode

The Try mode supports rudimentary testing of IVR dialogs. For complete testing, you need to deploy your VoiceXML application and test it with VoiceXML Connector.

FAQs

This section answers common how-to questions.

Specify DNIS and ANI

When your IVR application invokes the dialog model, it can pass on the caller’s DNIS and ANI information (retrieved from telephony) to the dialog model for reporting purposes. Your dialog design can refer to this information through these fields of the userData predefined variable:

See Exchanging session data, in the Dialog as a Service gRPC API documentation, for more information.

Enable hotword detection

Nuance Recognizer supports hotword detection. To enable hotword detection, set Barge-in type to hotword, at the appropriate level, in Mix.dialog.

Mark data as sensitive

For applications involving information that must be masked in application logs, Mix.dialog lets you mark specific entities and variables as sensitive. In the StartRequestPayload of your client application, set suppress_log_user_data to True to disable logging of user data for ASR, NLU, TTS, and Dialog, as sketched below. For more information, see:
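For example, in a Python gRPC client (a sketch: the import path is a placeholder for the stubs generated from the DLGaaS proto files, and the model URN is illustrative):

```python
# Sketch only: the module name and model URN below are placeholders; use the
# stubs generated from the DLGaaS proto files in your environment.
from dlg_pb2 import StartRequest, StartRequestPayload, ResourceReference  # placeholder module

payload = StartRequestPayload(
    model_ref=ResourceReference(uri="urn:nuance-mix:tag:model/MyApp/mix.dialog"),  # illustrative URN
    suppress_log_user_data=True,  # disable logging of user data for ASR, NLU, TTS, and Dialog
)
request = StartRequest(payload=payload)
```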

Handle noinput or hang-up events

Your dialog design can handle situations such as when a caller doesn't speak or hangs up. Create event handlers to catch the predefined events MaxNoinput and UserDisconnect at the appropriate level. When a caller hangs up, this generates a UserDisconnect event. When your application has prompted a caller for the same piece of information the maximum applicable number of times (default is 3) without eliciting any response, this generates a MaxNoinput event. You can set Maximum number of no inputs to the desired value in the Collection settings category (global, channel level, entity level, node level). For more information, see:

Play message on hold

For any data access nodes in your dialog design, you can set a message to be played while the caller is waiting for a backend system to return the desired data, and specify when to start playing the message, and for how long. For more information, see Specify a latency message, in the Mix.dialog documentation.

Export a grammar specification document

See Manage grammars, for a detailed description of the grammar specification document and how to export it from Mix.dialog.

Generate filenames for the Audio Script modality

Mix.dialog can automatically generate static filenames for all messages in your dialog design. You can also generate a filename for individual messages. You cannot generate a filename for messages that are not used. However, you can enter one manually (maximum 255 characters), if desired.

When you build your dialog model, if a message requires multiple audio files, Mix appends a suffix to the static filename based on the order in which the separate files are to play. If there are messages for which Audio File ID is blank, Mix generates filenames based on Message ID instead. The filename extension—.wav (default), .vox, or .ulaw—is set in your project settings, for all channels or for individual channels that support the Audio Script modality.

For more information, see:

Pass in VoiceXML parameters

For some VoiceXML parameters, there are no equivalent settings or node properties in Mix.dialog. If you want to set such a parameter in your dialog design, you can create a variable with the same name as the VoiceXML parameter, set it to the desired value, and pass it on to your client application as a Send Data parameter, via a data access node, external actions node, or question and answer node. Alternatively, you might prefer setting some VoiceXML properties in the stub application that invokes your dialog model.

Download configuration files and models

For VoiceXML applications, Mix lets you download NLU, ASR, and Dialog models, along with configuration files that will allow you to deploy your models to a self-hosted environment, as required. See Download models for an application configuration, for details.

Change log

Below are changes made to the Mix tips for IVR developers documentation.

2022-08-05

Updated Channel and modalities, Enable DTMF, and Limitations, to reflect the ability to add, modify, and disable channels after a Mix project has been created

2022-06-06

Added Download configuration files and models

2021-11-22

Added Engine packs to describe how to select an engine pack version for IVR projects

2021-10-13

Updated Dynamic messages to reflect the ability to apply TTS output formatting for Audio Script messages in Mix.dialog

2021-09-28

Updated FAQs and Limitations to reflect that it is now possible to specify, directly in your dialog design, latency messages (messages on hold) to play while data access operations are pending

2021-08-23

Added FAQs

2021-08-18

Added Limitations