Dialog as a Service gRPC API

DLGaaS allows conversational AI applications to interact with Mix dialogs


Dialog as a Service is Nuance's omni-channel conversation engine. The Dialog as a Service API allows client applications to interact with conversational agents created with the Mix.dialog web tool. These interactions are situated within a cohesive conversational session that keeps track of the ongoing context of the conversation, similar to what we do during the back and forth of a conversation with a person.

The gRPC protocol provided by Dialog as a Service lets a client application interact with a dialog from any programming language supported by gRPC.

gRPC is an open source RPC (remote procedure call) framework used to create services. It uses HTTP/2 for transport and protocol buffers to define the service interface and the structure of its messages. Dialog as a Service uses the proto3 version of protocol buffers.

Version: v1

This release supports version v1 of the Dialog as a Service protocol. See gRPC setup to download the proto files and get started.

Dialog essentials

From an end-user's perspective, a dialog-enabled app is one that understands natural language, can respond in kind, and, where appropriate, can extend the conversation by following up the user's turn with appropriate questions and suggestions, all the while maintaining a memory of the context of what happened earlier in the conversation.

Dialogs are created using Mix.dialog; see Creating Mix.dialog Applications for more information. This document describes how to access a dialog at runtime from a client application using the DLGaaS gRPC API.

This section introduces concepts that you will need to understand to write your client application.

What is a conversation?

The flow of the DLGaaS API is based on the metaphor of a conversation between two parties: one human user, who enters text and speech inputs through some sort of client app UI, and a Dialog agent running on a server. The API provides the interface between the client app and the Dialog agent. The model here is a conversation between a person and an agent of an organization or company that the person wants to contact.

Similar to a person dealing with a human agent, the human user is assumed to have some purpose in the conversation. They will come to the conversation with an intent, and the goal of the agent is to help understand that intent and help the person achieve it. The person might also introduce a new intent during the conversation.

To understand how the flow of the API works, it helps to reflect for a moment on what a conversation is. In its simplest form, a conversation is a series of more or less real-time exchanges between two people over a period of time. People take turns speaking and communicate with each other in a back-and-forth pattern.

Structure of a conversation

Taken a little abstractly, a conversation between a user and a Dialog agent could look like this:

What this looks like in the API

In the Dialog client runtime API, you use a Start request to establish a conversation. This creates a session on the Dialog side to hold the conversation and any resources it needs for a set timeframe.

The dialog proceeds in a series of steps, where at each step, the client app sends input from the user and possibly data, and the Dialog agent responds by sending informational messages, prompts for input, references to files for the client to use or play, or requests to the client for data. The way this works depends on the type of input:

The flow of the API is structured around steps of user input followed by the agent response. The agent's response at any step is a reply to the client input in the same step. Remember, though, that in conversations with an agent, human or virtual, the agent generally also drives or steers the conversation; for example, it might open the interaction with "Welcome to our store. How may I help you?" In Mix.dialog, you create a dialog flow, and this flow drives the conversation.

By convention, an agent will generally start off the conversation and then continue to direct the flow of the conversation toward getting any additional information needed to fulfill the user's request. As well, when a user gives input, it is generally in response to something asked for in the previous step of the conversation by the agent. And when data is sent in a step, it is a response to a request for data in the previous step.

At the start of the conversation, the client app needs a way to "poke" the API so that it replies with the initial greeting prompts, without sending any input. The API enables you to do this by sending a first Execute request with an empty payload. The Dialog agent then responds with its standard initial greeting prompts, and the conversation is underway.

See Client app development for a more detailed description of how to access and use the API to carry out a conversation.


A session represents an ongoing conversation between a user and the Dialog service for the purpose of carrying out some task or tasks, where the context of the conversation is maintained for the duration. For example, consider the following scenario for a coffee app:

A session is started by the client, and ends when the natural flow of the conversation is complete or the session times out.

The length of a session is flexible and can handle different types of dialog, from a short burst of interaction carrying out one task for a user to a series of interactions carrying out multiple tasks over an extended period of time.

Session ID

The interactions between the client application and the Dialog service for this scenario occur in the same session. A session is identified by a session ID. Each request and response exchanged between the client app and the Dialog service for that specific conversation must include that session ID referencing the conversation, and its context. If you do not provide a session ID, a new session is created and you are provided with a new session ID.

Session context

A session holds a context of the history of the conversation. This context is a memory of what the user said previously and what intents were identified previously. The context improves the performance of the dialog agent in subsequent interactions by giving additional hints to help with interpreting what the user is saying and wants to do. For example, if someone has just booked a flight to Boston, and then asks to book a hotel, it is quite likely the person wants to book a hotel in the Boston area, starting the same day as the flight arrives.

The session context is maintained throughout the lifetime of the session and added to as the conversation proceeds.

Session lifetime

A session's length in time is bounded by a session timeout limit, after which an idle session terminates if not already closed by the conclusion of the natural dialog flow.

Configure session lifetime

The default limit is 900 seconds (15 minutes). You can configure it, up to a maximum of 259200 seconds (72 hours), when you start the dialog with the Start method.
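A client might check the requested timeout before building the Start request. The following is a minimal sketch: resolve_session_timeout is a hypothetical helper, the StartRequest field it would feed (assumed here to be session_timeout_sec) should be confirmed in dlg_messages.proto, and rejecting non-positive values is a client-side assumption rather than documented service behavior.

```python
from typing import Optional

DEFAULT_SESSION_TIMEOUT_SEC = 900      # default session timeout (15 minutes)
MAX_SESSION_TIMEOUT_SEC = 259200       # documented maximum (72 hours)


def resolve_session_timeout(requested_sec: Optional[int] = None) -> int:
    """Return the timeout to send with the Start request (hypothetical helper)."""
    if requested_sec is None:
        # Nothing requested: use the service default
        return DEFAULT_SESSION_TIMEOUT_SEC
    if not 0 < requested_sec <= MAX_SESSION_TIMEOUT_SEC:
        # Assumption: non-positive values are rejected on the client side
        raise ValueError(
            f"session timeout must be between 1 and {MAX_SESSION_TIMEOUT_SEC} seconds, "
            f"got {requested_sec}"
        )
    return requested_sec
```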

For more information on session IDs and session timeout values, see Step 3. Start conversation.

Check remaining session lifetime

Using the session ID, a client application can check whether the session is still active and get an estimate of how much time is left in the session using the Status method. For more information, see Step 5. Check session status.

Reset session time remaining

For asynchronous channels, you may need to keep the session going for longer than the configured timeout allows. The client application can reset the time remaining in the session to the original limit by sending an Execute, ExecuteStream, or Update request.

If you simply want to reset the time remaining to keep the session alive without otherwise advancing the conversation, send an UpdateRequest specifying the session ID but with the payload left empty. For more information, see Step 6. Update session data.
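A long-lived asynchronous client could schedule these empty updates shortly before the timeout would expire. This sketch assumes a hypothetical send_empty_update callable that wraps the Update call for one session ID; the 60-second safety margin is an arbitrary choice. Because Execute and ExecuteStream requests also reset the time remaining, a real client might restart the timer after each turn.

```python
import threading
from typing import Callable


def keepalive_interval_sec(session_timeout_sec: int, safety_margin_sec: int = 60) -> int:
    """Seconds to wait between keep-alive updates (hypothetical helper)."""
    if session_timeout_sec <= safety_margin_sec:
        # Very short timeouts: refresh at half the timeout instead
        return max(1, session_timeout_sec // 2)
    return session_timeout_sec - safety_margin_sec


def run_keepalive(send_empty_update: Callable[[], None],
                  session_timeout_sec: int,
                  stop_event: threading.Event,
                  safety_margin_sec: int = 60) -> None:
    """Call send_empty_update periodically until stop_event is set."""
    interval = keepalive_interval_sec(session_timeout_sec, safety_margin_sec)
    # Event.wait returns False on timeout (time to refresh) and True once the event is set
    while not stop_event.wait(interval):
        send_empty_update()
```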

Session data

Each session has memory designated to hold data related to the session. This includes contextual information about the user inputs during the session as well as session variables.

Session variables

Variables of different types can be used to hold data needed during a session. Dialog includes several useful predefined variables. You can also create new user-defined variables of various types in Mix.dialog.

For both predefined and user-defined variables, values can be assigned:

Each variable type has its own access methods, which let you retrieve variable values, and components of those values, in Mix.dialog. You can use them to define conditions, create dynamic message content, and make assignments to other variables.

Assigning variables through data transfer

In some situations, you may want to send variable data from the client application to the Dialog service for use during the session. For example, at the beginning of a session, you might want to send the user's geographical location, name, phone number, and so on. You might also want to update the same values mid-session. Data transfers can also be used during the session to provide wordsets specifying the relevant options for dynamic list entities.

Note: You can only assign values for variables that have already been defined in Mix.dialog, whether predefined or user-defined.
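As a sketch of how a client might prepare session data for the Start request, the helper below rejects names that the Mix.dialog project does not define, mirroring the note above. build_start_payload and the defined_variables list are hypothetical, and the payload keys (model_ref.uri, data) are assumptions to check against the proto files.

```python
from typing import Any, Dict, Iterable


def build_start_payload(model_urn: str,
                        data: Dict[str, Any],
                        defined_variables: Iterable[str]) -> Dict[str, Any]:
    """Build a Start payload dict carrying session data (hypothetical helper).

    Only variables already defined in Mix.dialog can be assigned, so any
    other name is rejected before the request is sent.
    """
    unknown = sorted(set(data) - set(defined_variables))
    if unknown:
        raise ValueError("Variables not defined in Mix.dialog: " + ", ".join(unknown))
    # Assumed field names: model_ref.uri for the model URN, data for the variables
    return {"model_ref": {"uri": model_urn}, "data": dict(data)}
```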

For more information, see Exchanging session data.

Session data lifetime

Values for variables stored in the session persist for the lifetime of the session or until the variable is updated or cleared during the session.

Playing messages and providing user input

The client application is responsible for playing messages to the user (for example, "What can I do for you today?") and for collecting and returning the user input to the Dialog service (for example, "I want a cappuccino").

Messages can be provided to the user in the form of:

The client app can then send the user input to the Dialog service in a few ways:

Orchestration with other Mix services

To support the Dialog service, different natural language and speech tasks will generally be required, depending on the channels your application is using and the types of input you are dealing with. You may need one or more of the following:

The Dialog service does not itself perform these tasks but relies on other services to carry them out.

The Mix platform offers a set of Conversational AI services to handle these tasks:

Your client application can handle these tasks either with the Mix services, or by using third party services.

The Dialog service offers special integration with the other Mix services: properly formatted requests sent to DLGaaS automatically trigger calls to them. Rather than you calling the other Mix services separately, Dialog orchestrates with them behind the scenes. The Dialog service:

  1. Prepares and forwards a request to the specific Mix service
  2. Receives the response from the Mix service
  3. Prepares and forwards this response to the client application bundled as part of the standard DLGaaS response to the initial DLGaaS request

For orchestrated ASRaaS and TTSaaS requests, the DLGaaS service supports streaming of the audio input/output in both directions.

For more details about how to format inputs to trigger orchestration with Mix services, see Client app development.

Alternatively, if you prefer, you can directly handle the orchestration with the other Mix services or even third party tools rather than leaving it to Dialog.

Nodes and actions


You create applications in Mix.dialog using nodes. Each node performs a specific task, such as asking a question, playing a message, or performing recognition. As you add nodes and connect them to one another, the dialog flow takes shape in the form of a graph.

At specific points in the dialog, when the Dialog service requires input from the client application, it sends an action to the client app. In the context of DLGaaS, the following Mix.dialog nodes trigger a call to the DLGaaS API and send a corresponding action:

Question and answer

The objective of the question and answer node is to collect user input. It sends a message to the client application and expects user input, which can be speech audio, a text utterance, or a natural language understanding interpretation. For example, in the coffee app, the dialog may tell the client app to ask the user "What type of coffee would you like today?" and then to return the user's answer.

The message specified in a question and answer node is sent to the client application as a question and answer action. To continue the flow, the client application must then return the user input to the question and answer node.

See Question and answer actions for details.

Data access

A data access node expects data from a data source to continue the flow. The data source can either be a backend server or the client app, and this is configurable in Mix.dialog. For example, in a coffee app, the dialog may ask the client application to query the price of the order or to retrieve the name of the user.

When Mix.dialog is configured for client-side data access, information is sent to the client application in a data access action, identifying what data the Dialog service needs and providing any input data needed to retrieve that information. It also provides information to help the client application smooth over any delays while waiting for the data access. To continue the flow, the client application must return the requested data to DLGaaS.

See Data access actions for details.

When Mix.dialog is configured for server-side backend data access, DLGaaS sends the client application a continue action and waits for a response before proceeding with the data access. The continue action provides information to help the client application smooth over any delays while DLGaaS communicates with the backend server. To continue the flow, the client application must respond to DLGaaS.

See Continue actions for details.

External actions: Transfer and End

There are two types of external action nodes: Transfer and End.

Message node

The message node plays a message. The message specified in a message node is sent to the client application as a message action.

See Message actions for details.


Most dialog applications can support multiple channels and languages, so your API requests need to specify which channel and language to use for an interaction. You do this with a selector.

Selectors can be sent as part of a:

A selector is the combination of:

You do not need to send the selector at each interaction. If the selector is not included, the values of the previous interaction will be used.
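If a client wants to avoid resending an unchanged selector, it can remember the last one it sent. The sketch below is hypothetical client-side bookkeeping; it assumes the selector fields are channel, language, and library, which should be confirmed against the Selector message in the proto files.

```python
from typing import Dict, Optional


class SelectorTracker:
    """Hypothetical client-side helper: include a selector only when it changes."""

    def __init__(self) -> None:
        self._last: Optional[Dict[str, str]] = None

    def selector_for_request(self, channel: str, language: str,
                             library: str) -> Optional[Dict[str, str]]:
        current = {"channel": channel, "language": language, "library": library}
        if current == self._last:
            # Unchanged: omit the selector and let Dialog reuse the previous values
            return None
        self._last = current
        return current
```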

Prerequisites from Mix

Before developing your gRPC application, you need a Mix project that provides a dialog application as well as authorization credentials.

  1. Create a Mix project:
  2. Generate a "secret" and client ID of your Mix project: see Authorize your client application. Later you will use these credentials to request an access token to run your application.
  3. Learn the URL to call the Dialog service: see Accessing a runtime service.
    • For DLGaaS, this is:

gRPC setup

Install gRPC for your programming language, for example Python

$ pip install --upgrade pip
$ pip install grpcio
$ pip install grpcio-tools

Unzipped proto files

├── Your client apps here
└── nuance                        
    ├── dlg                       
    │   └── v1                    
    │       ├── common
    │       │   └── dlg_common_messages.proto
    │       ├── dlg_interface.proto
    │       └── dlg_messages.proto    
    ├── asr                       
    │   └── v1                    
    │       ├── recognizer.proto
    │       ├── resource.proto    
    │       └── result.proto      
    ├── tts                       
    │   └── v1                    
    │       └── nuance_tts_v1.proto
    ├── nlu                       
    │   └── v1                    
    │       ├── interpretation-common.proto
    │       ├── multi-intent-interpretation.proto
    │       ├── result.proto
    │       ├── runtime.proto         
    │       └── single-intent-interpretation.proto
    └── rpc
        ├── error_details.proto
        ├── status.proto
        └── status_code.proto

For Python, use protoc to generate stubs

$ echo "Pulling support files"
$ mkdir -p google/api
$ curl > google/api/annotations.proto
$ curl > google/api/http.proto

$ echo "generate the stubs for support files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/http.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/annotations.proto
$ echo "generate the stubs for the DLGaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ --grpc_python_out=./ nuance/dlg/v1/dlg_interface.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/dlg/v1/dlg_messages.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/dlg/v1/common/dlg_common_messages.proto

$ echo "generate the stubs for the ASRaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ --grpc_python_out=. nuance/asr/v1/recognizer.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./  nuance/asr/v1/resource.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/asr/v1/result.proto

$ echo "generate the stubs for the TTSaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ --grpc_python_out=./ nuance/tts/v1/nuance_tts_v1.proto

$ echo "generate the stubs for the NLUaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ --grpc_python_out=./ nuance/nlu/v1/runtime.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/nlu/v1/result.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/nlu/v1/interpretation-common.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/nlu/v1/single-intent-interpretation.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/nlu/v1/multi-intent-interpretation.proto

$ echo "generate the stubs for supporting files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/rpc/error_details.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/rpc/status.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/rpc/status_code.proto

Final structure of protos and stubs for DLGaaS files after unzip and protoc compilation

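├── Your client apps here
└── nuance
    ├── dlg
    │   └── v1
    │       ├── common
    │       │   ├── dlg_common_messages.proto
    │       │   └── dlg_common_messages_pb2.py
    │       ├── dlg_interface.proto
    │       ├── dlg_interface_pb2.py
    │       ├── dlg_interface_pb2_grpc.py
    │       ├── dlg_messages.proto
    │       └── dlg_messages_pb2.py
    ├── asr
    │   └── v1
    │       ├── recognizer.proto
    │       ├── recognizer_pb2.py
    │       ├── recognizer_pb2_grpc.py
    │       ├── resource.proto
    │       ├── resource_pb2.py
    │       ├── result.proto
    │       └── result_pb2.py
    ├── tts
    │   └── v1
    │       ├── nuance_tts_v1.proto
    │       ├── nuance_tts_v1_pb2.py
    │       └── nuance_tts_v1_pb2_grpc.py
    ├── nlu
    │   └── v1
    │       ├── interpretation-common.proto
    │       ├── interpretation_common_pb2.py
    │       ├── multi-intent-interpretation.proto
    │       ├── multi_intent_interpretation_pb2.py
    │       ├── result.proto
    │       ├── result_pb2.py
    │       ├── runtime.proto
    │       ├── runtime_pb2.py
    │       ├── runtime_pb2_grpc.py
    │       ├── single-intent-interpretation.proto
    │       └── single_intent_interpretation_pb2.py
    └── rpc
        ├── error_details.proto
        ├── status.proto
        └── status_code.proto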

The basic steps in using the Dialog as a Service gRPC protocol are:

  1. Install gRPC for the programming language of your choice, including C++, Java, Python, Go, Ruby, C#, Node.js, and others. See gRPC Documentation for a complete list and instructions on using gRPC with each language.
  2. Download the zip file containing the gRPC .proto files for the Dialog service. These files contain a generic version of the functions or classes that can interact with the dialog service.
    See Note about packaged proto files below.
  3. Unzip the file in a location that your applications can access, for example in the directory that contains or will contain your client apps.
  4. Generate client stub files in your programming language from the proto files. Depending on your programming language, the stubs may consist of one file or multiple files per proto file. These stub files contain the methods and fields from the proto files as implemented in your programming language. You will consult the stubs in conjunction with the proto files. See gRPC API.
  5. Write your client app, referencing the functions or classes in the client stub files. See Client app development for details and a scenario.

Note about packaged proto files

The DLGaaS API provides features that require that you install the ASR, TTS, and NLU proto files, as well as certain supporting files:

For your convenience, these files are packaged with the DLGaaS proto files available here, and this documentation provides instructions for generating the stub files.

As such, the following files are packaged with this documentation:

Client app development

This section describes the main steps in a typical client application that interacts with a Mix.dialog application. In particular, it provides an overview of the different methods and messages used in a sample order coffee application.

Sample dialog exchange

To illustrate how to use the API, this document uses the following simple dialog exchange between an end user and a dialog application:


The DialogService is the main entry point to the Nuance Dialog service.

A typical workflow for accessing a dialog application at runtime is as follows:

  1. The client application requests the access token from the Nuance authorization server.
  2. The client application opens a secure channel using the access token.
  3. The client application creates a new conversation sending a StartRequest to the DialogService. The service returns a session ID, which is used at each interaction to keep the same conversation. The client application also sends an ExecuteRequest message with the session ID and an empty payload to kick off the conversation.
  4. As the user interacts with the dialog, the client application sends one of the following messages, as often as necessary:
    • The ExecuteRequest message for text input and data exchange.
      An ExecuteResponse is returned to the client application when a question and answer node, a data access node, or an external actions node is encountered in the dialog flow.
    • The StreamInput message for audio input (ASR) and/or audio output (TTS).
      A StreamOutput is returned to the client application.
  5. Optionally, at any point during the conversation, the client application can check that the session is still active by sending a StatusRequest message.
  6. Optionally, at any point during the conversation, the client application can update session variables by sending an UpdateRequest message.
  7. The client application closes the conversation by sending a StopRequest message.
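The workflow above can be summarized as a driver loop for a text-only conversation. DialogClient is a hypothetical wrapper around the DialogService stub (its start, execute, and stop methods stand in for StartRequest, ExecuteRequest, and StopRequest), and the check for end_action assumes response payloads converted to dictionaries keyed by proto field names. Token handling, streaming, Status, and Update calls are left out of this sketch.

```python
from typing import Any, Dict, Iterable, List, Optional, Protocol

Payload = Dict[str, Any]


class DialogClient(Protocol):
    """Hypothetical wrapper around the DialogService stub (illustration only)."""

    def start(self) -> str: ...  # StartRequest; returns the session ID
    def execute(self, session_id: str, user_text: Optional[str]) -> Payload: ...  # ExecuteRequest
    def stop(self, session_id: str) -> None: ...  # StopRequest


def run_text_conversation(client: DialogClient, user_turns: Iterable[str]) -> List[Payload]:
    """Run one text conversation and return every response payload received."""
    session_id = client.start()
    # An ExecuteRequest with empty user_text kicks off the conversation
    payloads = [client.execute(session_id, None)]
    try:
        for text in user_turns:
            if "end_action" in payloads[-1]:
                break  # the dialog flow has ended; stop sending input
            payloads.append(client.execute(session_id, text))
    finally:
        # Always close the session, even if a request fails
        client.stop(session_id)
    return payloads
```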

This workflow is shown in the following high-level sequence flow:


For a detailed sequence flow diagram, see Detailed sequence flow.

Step 1. Generate token

Get token and run simple Mix client (


# Remember to change the colon (:) in your CLIENT_ID to code %3A
export MY_TOKEN="`curl -s -u "$CLIENT_ID:$SECRET" "" \
-d "grant_type=client_credentials" -d "scope=dlg" \
| python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])'`"

python --serverUrl "" --token $MY_TOKEN --modelUrn "$1" --textInput "$2" 

Nuance Mix uses the OAuth 2.0 protocol for authorization. To call the Dialog runtime service, your client application must request and then provide an access token. The token expires after a short period of time, so it must be regenerated frequently.

Your client application uses the client ID and secret from the Mix.dashboard (see Prerequisites from Mix) to generate an access token from the Nuance authorization server, available at the following URL:

The token may be generated in several ways, either as part of the client application or as a script file. This Python example uses a Linux script to generate a token and store it in an environment variable. The token is then passed to the application, where it is used to create a secure connection to the Dialog service.

The curl command in these scripts generates a JSON object including the access_token field that contains the token, then uses Python tools to extract the token from the JSON. The resulting environment variable contains only the token.
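The same client-credentials request can be made from Python with only the standard library. This is a sketch under assumptions: the token URL is a placeholder for the authorization server URL given above, and build_token_request, parse_access_token, and fetch_access_token are hypothetical helpers. As in the script, the client ID is URL-encoded so that its colon becomes %3A before the HTTP Basic credentials are built.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(client_id: str, secret: str, token_url: str) -> urllib.request.Request:
    """Build a client-credentials token request (hypothetical helper)."""
    # URL-encode the client ID so that ':' becomes '%3A', as in the curl script
    encoded_id = urllib.parse.quote(client_id, safe="")
    credentials = f"{encoded_id}:{secret}".encode("utf-8")
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": "dlg"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def parse_access_token(response_body: bytes) -> str:
    """Extract the access_token field from the JSON token response."""
    return json.loads(response_body)["access_token"]


def fetch_access_token(client_id: str, secret: str, token_url: str, timeout: float = 10.0) -> str:
    """Request a token over the network and return only the token string."""
    request = build_token_request(client_id, secret, token_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_access_token(response.read())
```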

In this scenario, the colon (:) in the client ID must be changed to the code %3A so curl can parse the value correctly:


Step 2. Authorize the service

import logging

import grpc

log = logging.getLogger(__name__)

def create_channel(args):
    log.debug("Adding CallCredentials with token %s", args.token)
    call_credentials = grpc.access_token_call_credentials(args.token)

    log.debug("Creating secure gRPC channel")
    channel_credentials = grpc.ssl_channel_credentials()
    channel_credentials = grpc.composite_channel_credentials(channel_credentials, call_credentials)
    channel = grpc.secure_channel(args.serverUrl, credentials=channel_credentials)

    return channel

You authorize the service by creating a secure gRPC channel, providing:

Step 3. Start the conversation

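from google.protobuf.json_format import MessageToDict
from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *

def start_request(stub, model_ref_dict, session_id, selector_dict={}, timeout=None):
    selector = Selector(channel=selector_dict.get('channel'),
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    start_payload = StartRequestPayload(model_ref=model_ref_dict)
    start_req = StartRequest(session_id=session_id,
                             selector=selector,
                             payload=start_payload,
                             session_timeout_sec=timeout)
    log.debug(f'Start Request: {start_req}')
    start_response, call = stub.Start.with_call(start_req)
    response = MessageToDict(start_response)
    log.debug(f'Start Request Response: {response}')
    return response, call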

To start the conversation, you need to do two things:

Start a new session

Before you can start the new conversation, the client app first needs to send a StartRequest message with the following information:

A new unique session ID is generated and returned as a response; for example:

'payload': {'session_id': 'b8cba63a-f681-11e9-ace9-d481d7843dbd'}

The client app must then use the same session ID in all subsequent requests that apply to this conversation.

Additional notes on session IDs

Kick off the conversation

The client app needs to signal to Dialog to start the conversation.

Send an empty ExecuteRequest to Dialog to get started. Include the session ID but leave the user_text field of the payload user_input empty.

payload_dict = {
    "user_input": {
        "user_text": None
    }
}
response, call = execute_request(stub,
                                 session_id=session_id,
                                 selector_dict=selector_dict,
                                 payload_dict=payload_dict)

Step 4. Step through the dialog

At each step, the client app sends input to advance the dialog to the next step. This can take one of four different forms depending on the place in the dialog.

Step 4a. Interact with the user (text input)

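from google.protobuf.json_format import MessageToDict, ParseDict
from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *

def execute_request(stub, session_id, selector_dict={}, payload_dict={}):
    selector = Selector(channel=selector_dict.get('channel'),
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    # ParseDict fills user_input or requested_data (including its data Struct) from the dict
    execute_payload = ParseDict(payload_dict, ExecuteRequestPayload())
    execute_request = ExecuteRequest(session_id=session_id,
                                     selector=selector,
                                     payload=execute_payload)
    log.debug(f'Execute Request: {execute_payload}')
    execute_response, call = stub.Execute.with_call(execute_request)
    response = MessageToDict(execute_response)
    log.debug(f'Execute Response: {response}')
    return response, call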

Interactions that use text input and do not require streaming are done through multiple ExecuteRequest calls, providing the following information:

ExecuteResponse for output

The dialog runtime app returns the Execute response payload when a question and answer node, a data access node, or an external actions node is encountered in the dialog flow. This payload provides the actions to be performed by the client application.

There are many types of actions that can be requested by the dialog application:

For example, the following question and answer action indicates that the message "Hello! How can I help you today?" must be displayed to the user:

Note: Examples in this section are shown in JSON format for readability. However, in an actual client application, content is sent and received as protobuf objects.

"payload": {
    "messages": [],
    "qa_action": {
        "message": {
            "nlg": [],
            "visual": [{
                "text": "Hello! How can I help you today?"
            }],
            "audio": []
        }
    }
}

A question and answer node expects input from the user to continue the flow. This can be provided as text (either to be interpreted by Nuance or as already interpreted input) in the next ExecuteRequest call. To provide the user input as audio, use the StreamInput request, as described in Step 4b.
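A client typically inspects each response payload to decide what to do next. The sketch below assumes the response was converted with MessageToDict(response, preserving_proto_field_name=True), so keys match the proto field names shown above (by default MessageToDict produces camelCase keys such as qaAction). The list of action field names is an assumption to confirm against dlg_messages.proto, and classify_payload is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# Assumed action field names in the ExecuteResponse payload
ACTION_KEYS = ("qa_action", "da_action", "continue_action", "escalation_action", "end_action")


def classify_payload(payload: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (action_name, visual_texts) for one response payload dict."""
    action_name = next((key for key in ACTION_KEYS if key in payload), "none")
    # Informational messages come first, then the action's own message (if any)
    messages = list(payload.get("messages", []))
    action = payload.get(action_name, {})
    if "message" in action:
        messages.append(action["message"])
    texts: List[str] = []
    for message in messages:
        for visual in message.get("visual", []):
            texts.append(visual.get("text", ""))
    return action_name, texts
```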

Step 4b. Interact with the user (using audio)

def execute_stream_request(args, stub, session_id, selector_dict={}):
    # Receive stream outputs from Dialog
    stream_outputs = stub.ExecuteStream(build_stream_input(args, session_id, selector_dict))
    log.debug(f'execute_responses: {stream_outputs}')
    responses = []
    audio = bytearray(b'')

    for stream_output in stream_outputs:
        if stream_output:
            # Extract execute response from the stream output
            response = MessageToDict(stream_output.response)
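            if response:
                responses.append(response)
            # Synthesized audio (if any) arrives in the audio field of the StreamOutput
            audio += stream_output.audio.audio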
    return responses, audio

def build_stream_input(args, session_id, selector_dict):
    selector = Selector(channel=selector_dict.get('channel'),
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))

    if args.audioFile:
        with open(args.audioFile, mode='rb') as file:
            audio_buffer = file.read()

        # Hard code packet_size_byte for simplicity sake (approximately 100ms of 16KHz mono audio)
        packet_size_byte = 3217
        # len() gives the byte count; sys.getsizeof() would also count object overhead
        audio_size = len(audio_buffer)
        audio_packets = [ audio_buffer[x:x + packet_size_byte] for x in range(0, audio_size, packet_size_byte) ]

        # For simplicity sake, let's assume the audio file is PCM 16KHz
        user_input = None
        asr_control_v1 = {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}
    else:
        # Text interpretation as normal
        asr_control_v1 = None
        audio_packets = [b'']
        user_input = UserInput(user_text=args.textInput)

    # Build execute request object
    execute_payload = ExecuteRequestPayload(user_input=user_input)
    execute_request = ExecuteRequest(session_id=session_id,
                                     selector=selector,
                                     payload=execute_payload)

    # For simplicity sake, let's assume the audio file is PCM 16KHz
    tts_control_v1 = {'audio_params': {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}}
    first_packet = True
    for audio_packet in audio_packets:
        if first_packet:
            first_packet = False

            # Only first packet should include the request header
            stream_input = StreamInput(
                request=execute_request,
                asr_control_v1=asr_control_v1,
                tts_control_v1=tts_control_v1,
                audio=audio_packet)
            log.debug(f'Stream input initial: {stream_input}')
        else:
            stream_input = StreamInput(audio=audio_packet)

        yield stream_input

Interactions with the user that require audio streaming are done through multiple ExecuteStream calls. ExecuteStream takes a stream of StreamInput messages and returns a stream of StreamOutput messages, which lets you handle audio input and audio output smoothly.
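Packet size follows directly from the audio format: 100 ms of 16 kHz, 16-bit mono PCM is 16000 samples per second times 2 bytes per sample times 0.1 s, or 3200 bytes. The helpers below are a hypothetical sketch for splitting raw PCM into packets of that size; note that len() gives the byte count of a buffer, whereas sys.getsizeof() also counts Python object overhead.

```python
from typing import Iterator


def pcm_packet_size(sample_rate_hz: int, chunk_ms: int,
                    bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes in chunk_ms milliseconds of uncompressed PCM audio."""
    return sample_rate_hz * bytes_per_sample * channels * chunk_ms // 1000


def chunk_audio(audio: bytes, packet_size: int) -> Iterator[bytes]:
    """Split raw audio into packets; the last packet may be shorter."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    # Iterate over the byte length of the buffer, not its in-memory object size
    for offset in range(0, len(audio), packet_size):
        yield audio[offset:offset + packet_size]
```

The first packet produced this way would be the one sent together with the request header; later packets carry audio only.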

StreamInput for input

The StreamInput message can be used to:

The StreamInput message has the following fields:

StreamOutput for output

ExecuteStream returns a StreamOutput, which has the following fields:

Note that speech responses do not necessarily need to use synthesized speech from TTS. Another option is to use recorded speech audio files. For more information, see Providing speech response using recorded speech audio.

Additional details on handling speech input and output in your application are available under Reference topics.

Step 4c. Send requested data

If the last ExecuteResponse included a data access action requesting a client-side fetch of specified data, the client app needs to fetch the data and return it in the requested_data field of the ExecuteRequest payload. In this case, the payload does not contain user input. This happens when the dialog reaches a data access node that is configured for client-side data access. For more information, see Data access actions.

payload_dict = {
    "requested_data": {
        "id": "get_coffee_price",
        "data": {
            "coffee_price": "4.25",
            "returnCode": "0"
        }
    }
}

response, call = execute_request(stub,
                    session_id=session_id,
                    selector_dict=selector_dict,
                    payload_dict=payload_dict
                )
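In the sample Python app later in this document, execute_request() reads only user_input from payload_dict, so forwarding requested_data needs a slightly different path. One option, sketched here as an assumption rather than the documented approach, is to build the payload as a plain dict and convert it with google.protobuf.json_format.ParseDict(payload, ExecuteRequestPayload()), which accepts the proto field names used above. The hypothetical helper below only builds the dict, and it always adds the required returnCode.

```python
def build_requested_data_payload(action_id: str, data: dict, return_code: str = "0") -> dict:
    """Build an ExecuteRequest payload that answers a client-side data access action.

    action_id must echo the id of the data access action (for example "get_coffee_price").
    returnCode is required; "0" signals success. Values are converted to strings
    to match the examples in this document.
    """
    fields = {key: str(value) for key, value in data.items()}
    fields["returnCode"] = str(return_code)
    return {"requested_data": {"id": action_id, "data": fields}}


payload = build_requested_data_payload("get_coffee_price", {"coffee_price": 4.25})
print(payload)
# {'requested_data': {'id': 'get_coffee_price', 'data': {'coffee_price': '4.25', 'returnCode': '0'}}}
```

The same helper also covers the Transfer case described under Transfer actions, where only a returnCode may need to be sent back.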

Step 4d. Proceed with server-side data fetch

If Dialog is performing a server-side data fetch that may take some time, and a latency message has been configured in Mix.dialog, Dialog can send a message, as part of a Continue action, for the client app to play while the user waits.

To move on, the client app signals that it is ready for Dialog to proceed. As when you first start a conversation, send an ExecuteRequest that includes the session ID but leaves the user_text field of the user_input payload empty.

payload_dict = {
    "user_input": {
        "user_text": None
    }
}

response, call = execute_request(stub,
                    session_id=session_id,
                    selector_dict=selector_dict,
                    payload_dict=payload_dict
                )
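Each ExecuteResponse tells the client app what to do next through the action it carries, so a client loop usually branches on that action. The sketch below inspects a response that has been converted with MessageToDict. This conversion produces camelCase keys such as qaAction and endAction, as the sample output shows. The sketch also accepts the snake_case keys used in the reference examples. The function names and the returned labels are hypothetical.

```python
from typing import Optional, Tuple

# Action fields an ExecuteResponse payload can carry, as named in this document.
ACTION_KEYS = {
    "qa_action": "qaAction",
    "da_action": "daAction",
    "continue_action": "continueAction",
    "escalation_action": "escalationAction",
    "end_action": "endAction",
}


def next_action(response: dict) -> Tuple[Optional[str], dict]:
    """Return (action_name, action_body) for an ExecuteResponse dict.

    Accepts snake_case or camelCase keys; returns (None, {}) if no known action is present.
    """
    payload = response.get("payload", {})
    for snake, camel in ACTION_KEYS.items():
        for key in (snake, camel):
            if key in payload:
                return snake, payload[key]
    return None, {}


def follow_up_payload(action_name: Optional[str]) -> Optional[dict]:
    """Payload for the next ExecuteRequest when no user input is needed."""
    if action_name == "continue_action":
        # Step 4d: send only the session ID; user_text stays empty.
        return {"user_input": {"user_text": None}}
    return None
```

For example, the last response in the sample output below, {'payload': {'messages': [...], 'endAction': {...}}}, is classified as end_action.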

Step 5. Check session status

def status_request(stub, session_id):
    status_request = StatusRequest(session_id=session_id)
    log.debug(f'Status Request: {status_request}')
    status_response, call = stub.Status.with_call(status_request)
    response = MessageToDict(status_response)
    log.debug(f'Status Response: {response}')
    return response, call

In a client application using asynchronous communication modalities such as text messaging, the client does not always know whether a session is still active or has expired. To check whether the session is still active and, if so, how much time is left in it, the client app sends a StatusRequest message. This message has one field, session_id, which identifies the session to check.

Some notes:

A StatusResponse message is returned giving the approximate time left in the session. The status code can be one of the following:
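A client that resumes asynchronous conversations can turn the outcome of a Status call into a simple decision. This sketch relies only on two codes from the gRPC error code table in this document: OK (0) when the session exists, and NOT_FOUND (5) when no session matches the ID. It takes the remaining time as a plain number because the exact StatusResponse field name is not shown here. The function name and the 30-second threshold are hypothetical. With grpcio, a blocking call that fails raises grpc.RpcError, and the numeric value of a grpc.StatusCode member is code.value[0].

```python
from typing import Optional

GRPC_OK = 0
GRPC_NOT_FOUND = 5


def should_start_new_session(status_code: int, remaining_sec: Optional[float],
                             min_remaining_sec: float = 30.0) -> bool:
    """Decide whether to start a new session instead of reusing the current one.

    status_code is the numeric gRPC code of the Status call; remaining_sec is the
    approximate time left reported in the StatusResponse (None if unavailable).
    """
    if status_code == GRPC_NOT_FOUND:
        return True  # the session expired or never existed
    if status_code != GRPC_OK:
        # Other codes (for example UNAUTHENTICATED) need a different fix.
        raise RuntimeError(f"Status call failed with gRPC code {status_code}")
    # Too little time left to comfortably finish another turn.
    return remaining_sec is not None and remaining_sec < min_remaining_sec
```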

Step 6. Update session data

def update_request(stub, session_id, update_data, client_data, user_id):
    update_payload = UpdateRequestPayload(
    update_request = UpdateRequest(session_id=session_id, 
    log.debug(f'Update Request: {update_request}')
    update_response, call = stub.Update.with_call(update_request)
    response = MessageToDict(update_response)
    log.debug(f'Update Response: {response}')
    return response, call

To update session data, the client app sends the UpdateRequest message; this message has the following fields:

Some notes:

An empty UpdateResponse is returned. The status code can be one of the following:

Step 7. Stop the conversation

def stop_request(stub, session_id=None):
    stop_req = StopRequest(session_id=session_id)
    log.debug(f'Stop Request: {stop_req}')
    stop_response, call = stub.Stop.with_call(stop_req)
    response = MessageToDict(stop_response)
    log.debug(f'Stop Response: {response}')
    return response, call

To stop the conversation, the client app sends the StopRequest message; this message has the following fields:

The StopRequest message removes the session state, so the session ID for this conversation should not be used in the short term for any new interactions, to prevent any confusion when analyzing logs.

Note: If the dialog application concludes with an External Actions node of type End, your client application does not need to send the StopRequest message, since the End node closes the session. If the client app sends a StopRequest after the dialog has reached an End node, the StatusCode.NOT_FOUND error code is returned, because the session is already closed and cannot be found.
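Following this note, a client app can skip the StopRequest when the last ExecuteResponse already contained an end action, and it can treat NOT_FOUND from a Stop call as benign. The sketch below is a minimal version of that logic. It works on MessageToDict output (camelCase endAction, as in the sample output) and also accepts end_action. The helper names are hypothetical.

```python
GRPC_OK = 0
GRPC_NOT_FOUND = 5


def needs_stop_request(last_response: dict) -> bool:
    """False when the dialog already ended through an External Actions End node."""
    payload = last_response.get("payload", {})
    return "endAction" not in payload and "end_action" not in payload


def stop_outcome(status_code: int) -> str:
    """Classify the numeric gRPC code returned by a Stop call."""
    if status_code == GRPC_OK:
        return "stopped"
    if status_code == GRPC_NOT_FOUND:
        return "already closed"  # for example, an End node closed the session
    return "error"
```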

Detailed sequence flow

Sample Python app

import argparse
import logging

import uuid

from google.protobuf.json_format import MessageToJson, MessageToDict

import grpc
from grpc import StatusCode

from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *
from nuance.dlg.v1.dlg_interface_pb2 import *
from nuance.dlg.v1.dlg_interface_pb2_grpc import *

log = logging.getLogger(__name__)

def parse_args():
    parser = argparse.ArgumentParser(
        usage="%(prog)s [-options]",
        add_help=False,  # -h/--help is added explicitly below
        formatter_class=lambda prog: argparse.HelpFormatter(
            prog, max_help_position=45, width=100)
    )

    options = parser.add_argument_group("options")
    options.add_argument("-h", "--help", action="help",
                         help="Show this help message and exit")
    options.add_argument("--token", nargs="?", help=argparse.SUPPRESS)
    options.add_argument("-s", "--serverUrl", metavar="url", nargs="?",
                         help="Dialog server URL, default=localhost:8080", default='localhost:8080')
    options.add_argument('--modelUrn', nargs="?",
                         help="Dialog App URN, e.g. urn:nuance-mix:tag:model/A2_C16/mix.dialog")
    options.add_argument("--textInput", metavar="text", nargs="?",
                         help="Text to perform interpretation on")

    return parser.parse_args()

def create_channel(args):    
    log.debug("Adding CallCredentials with token %s" % args.token)
    call_credentials = grpc.access_token_call_credentials(args.token)

    log.debug("Creating secure gRPC channel")
    channel_credentials = grpc.ssl_channel_credentials()
    channel_credentials = grpc.composite_channel_credentials(channel_credentials, call_credentials)
    channel = grpc.secure_channel(args.serverUrl, credentials=channel_credentials)

    return channel

def read_session_id_from_response(response_obj):
    try:
        session_id = response_obj.get('payload').get('sessionId', None)
    except Exception as e:
        raise Exception("Invalid JSON Object or response object") from e
    if session_id:
        return session_id
    else:
        raise Exception("Session ID is not present or some error occurred")

def start_request(stub, model_ref_dict, session_id, selector_dict={}):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    start_payload = StartRequestPayload(model_ref=model_ref_dict)
    start_req = StartRequest(session_id=session_id, 
                        selector=selector, 
                        payload=start_payload)
    log.debug(f'Start Request: {start_req}')
    start_response, call = stub.Start.with_call(start_req)
    response = MessageToDict(start_response)
    log.debug(f'Start Request Response: {response}')
    return response, call

def execute_request(stub, session_id, selector_dict={}, payload_dict={}):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))
    input = UserInput(user_text=payload_dict.get('user_input').get('userText'))
    execute_payload = ExecuteRequestPayload(
                        user_input=input)
    execute_request = ExecuteRequest(session_id=session_id, 
                        selector=selector, 
                        payload=execute_payload)
    log.debug(f'Execute Request: {execute_payload}')
    execute_response, call = stub.Execute.with_call(execute_request)
    response = MessageToDict(execute_response)
    log.debug(f'Execute Response: {response}')
    return response, call

def execute_stream_request(args, stub, session_id, selector_dict={}):
    # Receive stream outputs from Dialog
    stream_outputs = stub.ExecuteStream(build_stream_input(args, session_id, selector_dict))
    log.debug(f'execute_responses: {stream_outputs}')
    responses = []
    audio = bytearray(b'')

    for stream_output in stream_outputs:
        if stream_output:
            # Extract execute response from the stream output
            response = MessageToDict(stream_output.response)
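            if response: 
                responses.append(response)
            # Append any synthesized audio carried by this stream output
            audio += stream_output.audio.audio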
    return responses, audio

def build_stream_input(args, session_id, selector_dict):
    selector = Selector(channel=selector_dict.get('channel'), 
                        library=selector_dict.get('library'),
                        language=selector_dict.get('language'))

    # parse_args() above defines no --audioFile option, so treat it as optional
    if getattr(args, 'audioFile', None):
        with open(args.audioFile, mode='rb') as file:
            audio_buffer = file.read()

        # Hard code packet_size_byte for simplicity's sake (approximately 100ms of 16KHz mono audio)
        packet_size_byte = 3217
        # Use len(): sys.getsizeof() returns the memory size of the bytes object, not the audio length
        audio_size = len(audio_buffer)
        audio_packets = [ audio_buffer[x:x + packet_size_byte] for x in range(0, audio_size, packet_size_byte) ]

        # For simplicity's sake, let's assume the audio file is PCM 16KHz
        user_input = None
        asr_control_v1 = {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}

    else:
        # Text interpretation as normal
        asr_control_v1 = None
        audio_packets = [b'']
        user_input = UserInput(user_text=args.textInput)

    # Build execute request object
    execute_payload = ExecuteRequestPayload(user_input=user_input)
    execute_request = ExecuteRequest(session_id=session_id, 
                        selector=selector, 
                        payload=execute_payload)

    # For simplicity's sake, let's assume the audio file is PCM 16KHz
    tts_control_v1 = {'audio_params': {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}}
    first_packet = True
    for audio_packet in audio_packets:
        if first_packet:
            first_packet = False

            # Only first packet should include the request header
            stream_input = StreamInput(
                request=execute_request,
                asr_control_v1=asr_control_v1,
                tts_control_v1=tts_control_v1,
                audio=audio_packet)
            log.debug(f'Stream input initial: {stream_input}')
        else:
            stream_input = StreamInput(audio=audio_packet)

        yield stream_input

def stop_request(stub, session_id=None):
    stop_req = StopRequest(session_id=session_id)
    log.debug(f'Stop Request: {stop_req}')
    stop_response, call = stub.Stop.with_call(stop_req)
    response = MessageToDict(stop_response)
    log.debug(f'Stop Response: {response}')
    return response, call

def main():
    args = parse_args()
    log_level = logging.DEBUG
    logging.basicConfig(
        format='%(asctime)s %(levelname)-5s: %(message)s', level=log_level)
    with create_channel(args) as channel:
        stub = DialogServiceStub(channel)
        model_ref_dict = {
            "uri": args.modelUrn,
            "type": 0
        }
        selector_dict = {
            "channel": "default",
            "language": "en-US",
            "library": "default"
        }
        response, call = start_request(stub, 
                                       model_ref_dict=model_ref_dict, 
                                       session_id=None,
                                       selector_dict=selector_dict)
        session_id = read_session_id_from_response(response)
        log.debug(f'Session: {session_id}')
        assert call.code() == StatusCode.OK
        log.debug(f'Initial request, no input from the user to get initial prompt')
        payload_dict = {
            "user_input": {
                "userText": None
            }
        }
        response, call = execute_request(stub, 
                                         session_id=session_id, 
                                         selector_dict=selector_dict,
                                         payload_dict=payload_dict)
        assert call.code() == StatusCode.OK
        log.debug(f'Second request, passing in user input')
        payload_dict = {
            "user_input": {
                "userText": args.textInput
            }
        }
        response, call = execute_request(stub, 
                                         session_id=session_id, 
                                         selector_dict=selector_dict,
                                         payload_dict=payload_dict)
        assert call.code() == StatusCode.OK
        # An End node already closes the session; a StopRequest would then return NOT_FOUND
        if 'endAction' not in response.get('payload', {}):
            response, call = stop_request(stub, 
                                          session_id=session_id)
            assert call.code() == StatusCode.OK

if __name__ == '__main__':
    main()
The sample Python application consists of these files:


To run this sample app, you need:


To run this sample application:

Step 1. Download the sample app here and unzip it in a working directory (for example, /home/userA/dialog-sample-python-app).

Step 2. Download the gRPC .proto files here and unzip the files in the sample app working directory.

Step 3. Navigate to the sample app working directory and install the required dependencies. The details will depend on the platform and command shell you are using.

For a POSIX OS using bash:

python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install grpcio
pip install grpcio-tools
pip install uuid

For Windows using cmd.exe:

python -m venv env
env\Scripts\activate
python -m pip install --upgrade pip
pip install grpcio
pip install grpcio-tools
pip install uuid

For Windows using the Git Bash shell, the commands are almost the same, but activate the environment with source env/Scripts/activate instead of env\Scripts\activate.

Step 4. Generate the stubs:

echo "Pulling support files"
mkdir -p google/api
curl > google/api/annotations.proto
curl > google/api/http.proto
echo "generate the stubs for support files"
python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/http.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/annotations.proto
echo "generate the stubs for the DLGaaS gRPC files"
python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_interface.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/dlg/v1/dlg_messages.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/dlg/v1/common/dlg_common_messages.proto
echo "generate the stubs for the ASRaaS gRPC files"
python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/recognizer.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/asr/v1/resource.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/asr/v1/result.proto
echo "generate the stubs for the TTSaaS gRPC files"
python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/tts/v1/nuance_tts_v1.proto
echo "generate the stubs for the NLUaaS gRPC files"
python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/runtime.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/nlu/v1/result.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/nlu/v1/interpretation-common.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/nlu/v1/single-intent-interpretation.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=. nuance/nlu/v1/multi-intent-interpretation.proto
echo "generate the stubs for supporting files"
python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/rpc/error_details.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/rpc/status.proto
python -m grpc_tools.protoc --proto_path=./ --python_out=./ nuance/rpc/status_code.proto

Step 5. Edit the run script to add your CLIENT_ID and SECRET. These are your Mix credentials as described in Generate token.

export MY_TOKEN="`curl -s -u "$CLIENT_ID:$SECRET" \
 "" \
 -d "grant_type=client_credentials" -d "scope=dlg" \
 | python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])'"

python --serverUrl "" --token $MY_TOKEN --modelUrn "$1" --textInput "$2"

Step 6. Run the application using the script file, passing it the URN and a text to interpret:

./ modelUrn textInput


For example:

$ ./ "urn:nuance-mix:tag:model/TestMixClient/mix.dialog" "I want a double espresso"

An output similar to the following is provided:

2020-12-07 17:04:05,414 DEBUG: Creating secure gRPC channel
2020-12-07 17:04:05,420 DEBUG: Start Request: selector {
  channel: "default"
  language: "en-US"
  library: "default"
}
payload {
  model_ref {
    uri: "urn:nuance-mix:tag:model/TestMixClient/mix.dialog"
  }
}

2020-12-07 17:04:05,945 DEBUG: Start Request Response: {'payload': {'sessionId': '92705444-cd59-4a04-b79c-e67203f04f0d'}}
2020-12-07 17:04:05,948 DEBUG: Session: 92705444-cd59-4a04-b79c-e67203f04f0d
2020-12-07 17:04:05,949 DEBUG: Initial request, no input from the user to get initial prompt
2020-12-07 17:04:05,952 DEBUG: Execute Request: user_input {
}

2020-12-07 17:04:06,193 DEBUG: Execute Response: {'payload': {'messages': 
[{'visual': [{'text': 'Hello and welcome to the coffee app.'}], 'view': {}}], 
'qaAction': {'message': {'visual': [{'text': 'What can I get you today?'}]}, 
'data': {}, 'view': {}}}}
2020-12-07 17:04:06,198 DEBUG: Second request, passing in user input
2020-12-07 17:04:06,199 DEBUG: Execute Request: user_input {
  user_text: "I want a double espresso"
}

2020-12-07 17:04:06,791 DEBUG: Execute Response: {'payload': {'messages': 
[{'visual': [{'text': 'Perfect, a double espresso coming right up!'}], 'view': 
{}}], 'endAction': {'data': {}, 'id': 'End dialog'}}}

Reference topics

This section provides more detailed information about objects used in the gRPC API.

Note: Examples in this section are shown in JSON format for readability. However, in an actual client application, content is sent and received as protobuf objects.

Status messages and codes

gRPC error codes

In addition to the standard gRPC error codes, DLGaaS uses the following codes:

gRPC code Message Indicates
0 OK Normal operation
5 NOT_FOUND The resource specified could not be found; for example:
  • No session corresponding to the session ID specified
  • No model found for the URN specified
  • Incorrect language code specified
  • Incorrect channel name specified

Troubleshooting: Make sure that the resource provided exists or that you have specified it correctly. See URN for details on the URN syntax.
9 FAILED_PRECONDITION ASRaaS and/or NLUaaS returned a status code in the 400 range.
11 OUT_OF_RANGE The provided session timeout is not in the expected range.

Troubleshooting: Specify a value between 0 and 90000 seconds (default is 900 seconds) and try again.
12 UNIMPLEMENTED The API version was not found or is not available on the URL specified. For example, a client using DLGaaS v1 is trying to access the URL.

Troubleshooting: See URLs to runtime services for the supported URLs.
13 INTERNAL There was an issue on the server side, or interactions between subsystems have failed.

Troubleshooting: Contact Nuance.
16 UNAUTHENTICATED The credentials specified are incorrect or expired.

Troubleshooting: Make sure that you have generated the access token and that you are providing the credentials as described in Authorize your client application. Note that the token needs to be regenerated regularly. See Access token lifetime for details.
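When logging failures, it can help to map the numeric codes in the table above to its troubleshooting hints. The dictionary below restates that table in short form. The function name and the fallback message for unlisted codes are illustrative.

```python
# gRPC codes used by DLGaaS, restated from the table above: code -> (name, hint)
DLGAAS_GRPC_CODES = {
    0: ("OK", "Normal operation."),
    5: ("NOT_FOUND", "Check the session ID, model URN, language code, and channel name."),
    9: ("FAILED_PRECONDITION", "ASRaaS and/or NLUaaS returned a status code in the 400 range."),
    11: ("OUT_OF_RANGE", "Use a session timeout between 0 and 90000 seconds."),
    12: ("UNIMPLEMENTED", "Check that the URL supports the API version you are using."),
    13: ("INTERNAL", "Server-side issue; contact Nuance."),
    16: ("UNAUTHENTICATED", "Regenerate the access token and check the credentials."),
}


def describe_grpc_code(code: int) -> str:
    """Return 'NAME (code): hint' for a numeric gRPC status code."""
    name, hint = DLGAAS_GRPC_CODES.get(code, ("UNKNOWN", "See the standard gRPC status codes."))
    return f"{name} ({code}): {hint}"


print(describe_grpc_code(5))
# NOT_FOUND (5): Check the session ID, model URN, language code, and channel name.
```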

HTTP return codes

In addition to the standard HTTP error codes, DLGaaS uses the following codes:

HTTP code Message Indicates
200 OK Normal operation
400 BAD_REQUEST Server cannot process the request due to client error such as a malformed request
401 UNAUTHORIZED The credentials specified are incorrect or expired.

Troubleshooting: Make sure that you have generated the access token and that you are providing the credentials as described in Authorize your client application. Note that the token needs to be regenerated regularly. See Access token lifetime for details.
404 NOT_FOUND The resource specified could not be found; for example:
  • No session corresponding to the session ID specified
  • No model found for the URN specified
  • The path of the HTTP endpoint includes a typo (for example, incorrect version)

Troubleshooting: Make sure that the resource provided exists or that you have specified it correctly. See URN for details on the URN syntax.
500 INTERNAL_SERVER_ERROR There was an issue on the server side.
Troubleshooting: Contact Nuance.

Values in the 400 range indicate an error in the request that your client app sent. Values in the 500 range indicate an internal error within DLGaaS or another Mix service.
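The error details in the examples that follow are text fragments containing grpc_message and grpc_status fields. With grpcio, a comparable string is typically available from debug_error_string() on the raised RpcError. The regex-based sketch below extracts those two values for logging. It assumes the fragment format shown in the examples and is not an official parser. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

_STATUS_RE = re.compile(r'"grpc_status":\s*(\d+)')
_MESSAGE_RE = re.compile(r'"grpc_message":"((?:[^"\\]|\\.)*)"')


def parse_error_details(details: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (grpc_status, grpc_message) from an error string; None for each missing part."""
    status = _STATUS_RE.search(details)
    message = _MESSAGE_RE.search(details)
    return (int(status.group(1)) if status else None,
            message.group(1) if message else None)


print(parse_error_details('"grpc_message":"Could not find session for [12345]","grpc_status":5}"'))
# (5, 'Could not find session for [12345]')
```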


Incorrect URN

"grpc_message":"model [urn:nuance:mix/eng-USA/coffee_app_typo/mix.dialog] could not be found","grpc_status":5

Incorrect channel

"grpc_message":"channel is invalid, supported values are [Omni Channel VA, default] (error code: 5)","grpc_status":5}"

Session not found

"grpc_message":"Could not find session for [12345]","grpc_status":5}"

Incorrect credentials

"{"error":{"code":401,"status":"Unauthorized","reason":"Token is expired","message":"Access credentials are invalid"}\n","grpc_status":16}"

Message actions

Example message action as part of QA Action

{
  "payload": {
    "messages": [],
    "qa_action": {
      "message": {
        "nlg": [{
            "text": "What type of coffee would you like?"
          }
        ],
        "visual": [{
            "text": "What <b>type</b> of coffee would you like? For the list of options, see the <a href=\"\">menu</a>."
          }
        ],
        "audio": [{
            "text": "What type of coffee would you like? ",
            "uri": "en-US/prompts/default/default/Message_ini_01.wav?version=1.0_1602096507331"
          }
        ]
      }
    }
  }
}

A message action indicates that a message should be played to the user. A message can be provided as:

Message actions can be configured in the following Mix.dialog nodes:

Message nodes

A message node is used to play or display a message. The message specified in a message node is sent to the client application as a message action. A message node also performs non-recognition actions, such as playing a message, assigning a variable, or defining the next node in the dialog flow.

Messages configured in message nodes are cumulative: they are sent only when the dialog flow reaches a question and answer node, a data access node, or an external actions node. For example, consider the following dialog flow:

multiple messages

This would be handled as follows:

  1. The Dialog service sends an ExecuteResponse when encountering the question and answer node, with the following messages:
    # First ExecuteResponse
    {
      "payload": {
        "messages": [{
            "nlg": [],
            "visual": [{
                "text": "Hey there!"
              }
            ],
            "audio": []
          }, {
            "nlg": [],
            "visual": [{
                "text": "Welcome to the coffee app."
              }
            ],
            "audio": []
          }
        ],
        "qa_action": {
          "message": {
            "nlg": [],
            "visual": [{
                "text": "What can I do for you today?"
              }
            ],
            "audio": []
          }
        }
      }
    }
  2. The client application sends an ExecuteRequest with the user input.
  3. The Dialog service sends an ExecuteResponse when encountering the end node, with the following message action:
    # Second ExecuteResponse
    {
      "payload": {
        "messages": [{
            "nlg": [],
            "visual": [{
                "text": "Goodbye."
              }
            ],
            "audio": []
          }
        ],
        "end_action": {}
      }
    }

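Because messages accumulate in this way, a text client typically displays every entry in messages, in order, followed by the message of the action that ended the turn. The minimal sketch below reads only the visual channel. It accepts both the snake_case keys shown above and the camelCase keys produced by MessageToDict. The function name is hypothetical.

```python
from typing import List

ACTION_KEYS = ("qa_action", "qaAction", "da_action", "daAction",
               "continue_action", "continueAction",
               "escalation_action", "escalationAction",
               "end_action", "endAction")


def visual_texts(response: dict) -> List[str]:
    """Collect visual text from the messages list, then from the action's own message."""
    payload = response.get("payload", {})
    texts = [item["text"]
             for message in payload.get("messages", [])
             for item in message.get("visual", [])]
    for key in ACTION_KEYS:
        action = payload.get(key)
        if action:
            texts.extend(item["text"] for item in action.get("message", {}).get("visual", []))
    return texts
```

Applied to the first ExecuteResponse above, this returns the two greetings followed by "What can I do for you today?".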
Using variables in messages

Messages can include variables. For example, in a coffee application, you might want to personalize the greeting message:

"Hello Miranda ! What can I do for you today?"

Variables are configured in Mix.dialog. They are resolved by the Dialog engine and then returned to the client application. For example:

{
    "payload": {
        "messages": [],
        "qa_action": {
            "message": {
                "nlg": [],
                "visual": [
                    {
                        "text": "Hello Miranda ! What can I do for you today?"
                    }
                ],
                "audio": []
            }
        }
    }
}

Question and answer actions

A question and answer action is returned by a question and answer node. A question and answer node is the basic node type in dialog applications. It first plays a message and then recognizes user input.

The message specified in a question and answer node is sent to the client application as a message action.

The client application must then return the user input to the question and answer node. This can be provided in four ways:

In a question and answer node, the dialog flow is stopped until the client application has returned the user input.

Sending data

A question and answer node can specify data to send to the client application. This data is configured in Mix.dialog, in the Send Data tab of the question and answer node. For the procedure, see Send data to the client application in the Mix.dialog documentation.

For example, in the coffee application, you might want to send entities that you have collected in a previous node (COFFEE_TYPE and COFFEE_SIZE) as well as data that you have retrieved from an external system (the user's rewards card number):

Send Data tab

This data is sent to the client application in the data field of the qa_action; for example:

{
    "payload": {
        "messages": [],
        "qa_action": {
            "message": {
                "nlg": [],
                "visual": [
                    {
                        "text": "Your order was processed. Would you like anything else today?"
                    }
                ],
                "audio": [],
                "view": {
                    "id": "",
                    "name": ""
                }
            },
            "data": {
                "rewardsCard": "5367871902680912",
                "COFFEE_TYPE": "espresso",
                "COFFEE_SIZE": "lg"
            }
        }
    }
}

Interactive elements

Question and answer actions can include interactive elements to be displayed by the client app, such as clickable buttons or links.

For example, in a web version of the coffee application, you may want to display Yes/No buttons so that users can confirm their selection for an entity named answer which takes values of Yes or No:


Interactive elements are configured in Mix.dialog in question and answer nodes. For the procedure, see Define interactive elements in the Mix.dialog documentation.

For example, for the Yes/No buttons scenario above, you could configure two elements, one for each button, as follows:


This information is sent to the client app in the selectable field of the qa_action. For example:

{
    "payload": {
        "messages": [],
        "qa_action": {
            "message": {
                "nlg": [],
                "visual": [{
                        "text": "So you want a double espresso , is that it?"
                    }
                ],
                "audio": []
            },
            "selectable": {
                "selectable_items": [{
                        "value": {
                            "id": "answer",
                            "value": "yes"
                        },
                        "description": "Image of green checkmark",
                        "display_text": "Yes",
                        "display_image_uri": "/resources/images/green_checkmark.png"
                    }, {
                        "value": {
                            "id": "answer",
                            "value": "no"
                        },
                        "description": "Image of Red X",
                        "display_text": "No",
                        "display_image_uri": "/resources/images/red_x.png"
                    }
                ]
            }
        }
    }
}

The application is then responsible for displaying the elements (in this case, the two buttons) and for returning the choice made by the user in the selected_item field of the ExecuteRequest payload. For example:

{
    "payload": {
        "user_input": {
            "selected_item": {
                "id": "answer",
                "value": "no"
            }
        }
    }
}

In both cases the field "id" corresponds to the name of the entity as defined in Mix.dialog or Mix.nlu.
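Returning a selection amounts to copying the chosen item's value object into selected_item. The minimal sketch below works on the qa_action JSON shown above. The function names are hypothetical, and matching on display_text is only one possible way to map a button click back to an item.

```python
from typing import List


def selectable_items(response: dict) -> List[dict]:
    """Return the selectable items of a qa_action, or an empty list."""
    qa_action = response.get("payload", {}).get("qa_action", {})
    return qa_action.get("selectable", {}).get("selectable_items", [])


def selected_item_payload(items: List[dict], display_text: str) -> dict:
    """Build the ExecuteRequest payload for the item whose button the user clicked."""
    for item in items:
        if item.get("display_text") == display_text:
            # The value object carries the entity name ("id") and the chosen value.
            return {"user_input": {"selected_item": dict(item["value"])}}
    raise ValueError(f"No selectable item labelled {display_text!r}")
```

Clicking "No" in the example above yields {"user_input": {"selected_item": {"id": "answer", "value": "no"}}}, which matches the payload shown.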

Data access actions

A data access action tells the client app that the dialog needs data from the client to continue the flow. For example, consider the following use cases:

Data access actions are configured in Mix.dialog in data access nodes. The configurations in these nodes specify:

Data access actions are sent only when the data access node has enabled client-side fetching.

Data access nodes can also be configured in Mix.dialog for server-side fetching directly from a backend server without going through the DLGaaS API. In that case a Continue action is sent instead.

See Exchange data with an external system for additional details.

Using the data access API in the client app

When a data access node is configured for client-side fetching, data access information is sent and received as follows:

For example, in the coffee app use case, if a user says "I want a double espresso," the dialog will send data access action information to the client application in the ExecuteResponsePayload:

{
  "payload": {
    "messages": [
      {
        "nlg": [],
        "visual": [
          {
            "text": "Great! A large espresso coming right up.",
            "mask": false,
            "barge_in_disabled": false
          }
        ],
        "audio": [],
        "view": {
          "id": "",
          "name": ""
        }
      }
    ],
    "da_action": {
      "id": "get_coffee_price",
      "message": {
        "nlg": [],
        "visual": [
          {
            "text": "Hold on a moment while we ring that up.",
            "mask": false,
            "barge_in_disabled": false
          }
        ],
        "audio": []
      },
      "view": {
        "id": "sample class",
        "name": "sample type"
      },
      "data": {
        "COFFEE_TYPE": "espresso",
        "COFFEE_SIZE": "lg"
      },
      "message_settings": {
        "delay": "500ms",
        "minimum": "0ms"
      }
    }
  }
}


The client application uses this information to perform the action required by the dialog, in this case fetching the price of the coffee based on the user's choice. While retrieving the data, it plays the message to the user according to the specified message settings.
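The message_settings values arrive as duration strings such as "500ms". The sketch below converts them to seconds and decides when the latency message may start and stop. It assumes that delay is how long to wait before playing the message, and that minimum is the shortest time the message plays once started. Check the Mix.dialog documentation for the exact semantics. The function names are hypothetical.

```python
import re

_DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*$")


def duration_seconds(value: str) -> float:
    """Convert "500ms" or "30s" to seconds; an empty string means 0."""
    if not value:
        return 0.0
    match = _DURATION_RE.match(value)
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number / 1000 if unit == "ms" else number


def should_play_latency_message(elapsed_sec: float, message_settings: dict) -> bool:
    """Play the message only if the fetch is still running once the delay has passed."""
    return elapsed_sec >= duration_seconds(message_settings.get("delay", ""))


def earliest_message_stop(started_at_sec: float, message_settings: dict) -> float:
    """Earliest time at which the message may stop, given its minimum play time."""
    return started_at_sec + duration_seconds(message_settings.get("minimum", ""))
```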

When the client gets the coffee price from the data source, it returns the value in the coffee_price variable, inside the data field of requested_data in the ExecuteRequestPayload. Note that data also includes a returnCode.

{
  "selector": {
    "channel": "ivr",
    "language": "en-US",
    "library": "default"
  },
  "payload": {
    "requested_data": {
      "id": "get_coffee_price",
      "data": {
        "coffee_price": "4.25",
        "returnCode": "0"
      }
    }
  }
}

The returnCode is required, otherwise the Execute request will fail. A returnCode of "0" indicates a successful interaction.

Data access action sequence flow

The following sequence diagram shows a data access action exchange. For simplicity, only the payload of the requests and responses related to the data access feature is shown.

data access flow

Continue actions

Self-hosted environments: Latency messages require version 1.1 (or later) of the Dialog service. IVR applications using Nuance Speech Suite with VoiceXML Connector 1.0 or earlier do not support the fetching properties, or the continue action interaction for server-side fetching.

A continue action is used when a data access node uses a backend server connection to access the required data.

In this case, DLGaaS pauses before continuing on with the data access step, and sends an ExecuteResponse containing a continue action to the client app.

The continue action provides the client app with information useful for smoothing over any latency or delays while DLGaaS tries to access the data from the backend server. This includes:

For example, in the coffee app use case, if a user says "I want a double espresso," the dialog will send continue action information to the client application in the ExecuteResponsePayload:

{
  "payload": {
    "messages": [
      {
        "nlg": [],
        "visual": [
          {
            "text": "Great! A large espresso coming right up!",
            "mask": false,
            "bargeInDisabled": false
          }
        ],
        "audio": [],
        "view": {
          "id": "",
          "name": ""
        }
      }
    ],
    "continueAction": {
      "message": {
        "nlg": [],
        "visual": [
          {
            "text": "Hold on a moment while we ring that up.",
            "mask": false,
            "bargeInDisabled": false
          }
        ],
        "audio": []
      },
      "view": {
        "id": "sample class",
        "name": "sample type"
      },
      "id": "DataAccess",
      "messageSettings": {
        "delay": "500ms",
        "minimum": "0ms"
      },
      "backendConnectionSettings": {
        "fetchTimeout": "30s",
        "connectTimeout": ""
      }
    }
  }
}

To continue the flow, the client app must send an ExecuteRequest to DLGaaS containing only the current session_id.

DLGaaS proceeds to attempt to retrieve the data from the backend server, and in the meantime, the client app can play the provided message to keep the user informed and engaged while waiting for the response from DLGaaS.

DLGaaS will then continue with the flow as configured in the dialog.

continue action flow

Continue action settings are configured in Mix.dialog in the data access node settings, under Latency message and Backend connection overrides. See Set up a data access node in the Mix.dialog documentation for more details.

Transfer actions

An external actions node of type "Transfer" in Mix.dialog sends an Escalation action in the DLGaaS API. This action can be used, for example, to escalate to an IVR agent. Any data set in the Transfer node is sent as part of the Escalation action data field.

To continue the flow, the client application must return data in the requested_data field of the ExecuteRequestPayload. At a minimum, this data must include a returnCode. It can also include data requested by the dialog, if any. The returnCode is required, otherwise the Execute request will fail. A returnCode of "0" indicates a successful interaction.

For example, consider a scenario where the Transfer action is used to escalate to an agent to confirm a customer's data, as shown in the following Mix.dialog node:

Transfer actions

This transfer action sends the userName and userID variables to the client application in an escalation_action, as follows:

{
    "payload": {
        "messages": [],
        "escalation_action": {
            "data": {
                "userName": "Miranda Smith",
                "userID": "MIRS82734"
            },
            "id": "TransferToAgent"
        }
    }
}

The client application transfers the call and then returns a returnCode to the dialog to report the status of the transfer. If the transfer was successful, a returnCode of "0" is returned. For example:

{
    "selector": {
        "channel": "default",
        "language": "en-US",
        "library": "default"
    },
    "payload": {
        "requested_data": {
            "id": "TransferToAgent",
            "data": {
                "returnCode": "0"
            }
        }
    }
}

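Building the request that reports the outcome of a transfer can be sketched like this. It is a minimal sketch that uses dicts mirroring the JSON above. The function name is hypothetical, and the default selector values are copied from the example rather than required by the API.

```python
from typing import Optional

# Hypothetical default selector, copied from the example above.
DEFAULT_SELECTOR = {"channel": "default", "language": "en-US", "library": "default"}


def build_transfer_result(session_id: str, escalation_action: dict,
                          return_code: str = "0",
                          extra_data: Optional[dict] = None) -> dict:
    """Build the ExecuteRequest reporting the outcome of a Transfer (escalation) action."""
    data = dict(extra_data or {})          # any values the dialog requested
    data["returnCode"] = str(return_code)  # required; "0" means success
    return {
        "session_id": session_id,
        "selector": dict(DEFAULT_SELECTOR),
        "payload": {
            "requested_data": {
                "id": escalation_action["id"],  # echo the action's node ID
                "data": data,
            }
        },
    }
```

The returnCode is set last so that extra data cannot accidentally overwrite the required field.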
End actions

An external actions node of type "End" returns an End action, which indicates the end of the dialog. It includes the ID that identifies the node in the Mix.dialog application as well as any data that you set for this node. For example:

{
  "payload": {
    "messages": [{
        "nlg": [],
        "visual": [{
            "text": "Perfect, a double espresso coming right up!"
        }],
        "audio": []
    }],
    "end_action": {
      "data": {
        "returnCode": "0"
      },
      "id": "CoffeeApp End node"
    }
  }
}

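A client typically inspects each ExecuteResponse payload to decide what to do next. The sketch below shows one way to do that. It only knows the action field names that appear in the examples in this section, and the function names are hypothetical.

```python
from typing import Optional, Tuple

# Action fields that appear in the ExecuteResponse payload examples in this document.
ACTION_FIELDS = (
    ("end_action", "end"),
    ("escalation_action", "transfer"),
    ("qa_action", "question_and_answer"),
    ("continueAction", "continue"),
    ("continue_action", "continue"),
)


def classify_action(payload: dict) -> Tuple[Optional[str], Optional[dict]]:
    """Return (kind, action) for the first known action field in the payload."""
    for field, kind in ACTION_FIELDS:
        if field in payload:
            return kind, payload[field]
    return None, None


def end_of_dialog(payload: dict) -> Tuple[bool, Optional[str]]:
    """Return (True, returnCode) if the dialog has ended, else (False, None)."""
    kind, action = classify_action(payload)
    if kind != "end":
        return False, None
    return True, action.get("data", {}).get("returnCode")
```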
Interpreting text user input

Interpretation of user input provided as text can be performed either by the Nuance Mix Platform (using NLUaaS) or by an external system.

Nuance Mix Platform performs interpretation

Example: Interpretation is performed by Nuance

{
  "payload": {
    "user_input": {
      "user_text": "I want a large coffee"
    }
  }
}

When the Nuance Mix Platform is responsible for interpreting user input, the client application sends the text collected from the end user in the user_text field of the Execute request input message. The user text is sent to NLUaaS, which performs interpretation and returns the results to DLGaaS.

External system performs interpretation

Example: Interpretation is performed by an external system (simple format)

{
  "payload": {
    "user_input": {
      "interpretation": {
        "confidence": 1.0,
        "utterance": "I want a large americano",
        "data": {
          "COFFEE_SIZE": "LG",
          "COFFEE_TYPE": "americano"
        },
        "slot_literals": {
          "COFFEE_SIZE": "large",
          "COFFEE_TYPE": "americano"
        }
      }
    }
  }
}

Example: Interpretation is performed by an external system (NLUaaS format)

{
  "payload": {
    "user_input": {
      "nluaas_interpretation": {
        "literal": "i want a double espresso",
        "interpretations": [{
          "single_intent_interpretation": {
            "intent": "ORDER_COFFEE",
            "confidence": 1,
            "origin": "GRAMMAR",
            "entities": {
              "COFFEE_SIZE": {
                "entities": [{
                  "text_range": {
                    "start_index": 9,
                    "end_index": 15
                  },
                  "confidence": 1,
                  "origin": "GRAMMAR",
                  "string_value": "lg"
                }]
              },
              "COFFEE_TYPE": {
                "entities": [{
                  "text_range": {
                    "start_index": 16,
                    "end_index": 24
                  },
                  "confidence": 1,
                  "origin": "GRAMMAR",
                  "string_value": "espresso"
                }]
              }
            }
          }
        }]
      }
    }
  }
}

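When an external system is responsible for interpreting user input, the client application sends the results of this interpretation in one of the following fields of user_input: interpretation, for results in the simple format, or nluaas_interpretation, for results in the NLUaaS format.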

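The two external-interpretation payloads can be assembled programmatically. The sketch below does this with dicts that mirror the examples above, and the function names are hypothetical. The text_range helper treats end_index as exclusive, which matches the example offsets (9 to 15 for "double"). It also locates only the first occurrence of each literal, so repeated words would need explicit offsets.

```python
def text_range(utterance: str, literal: str) -> dict:
    """Character offsets of the first occurrence of literal (end_index is exclusive)."""
    start = utterance.find(literal)
    if start < 0:
        raise ValueError(f"{literal!r} not found in {utterance!r}")
    return {"start_index": start, "end_index": start + len(literal)}


def simple_user_input(utterance: str, data: dict, slot_literals: dict,
                      confidence: float = 1.0) -> dict:
    """user_input in the simple interpretation format."""
    return {"user_input": {"interpretation": {
        "confidence": confidence,
        "utterance": utterance,
        "data": dict(data),
        "slot_literals": dict(slot_literals),
    }}}


def nluaas_user_input(utterance: str, intent: str, entities: dict,
                      confidence: float = 1, origin: str = "GRAMMAR") -> dict:
    """user_input in the NLUaaS format; entities maps name -> (literal, string_value)."""
    return {"user_input": {"nluaas_interpretation": {
        "literal": utterance,
        "interpretations": [{"single_intent_interpretation": {
            "intent": intent,
            "confidence": confidence,
            "origin": origin,
            "entities": {
                name: {"entities": [{
                    "text_range": text_range(utterance, literal),
                    "confidence": confidence,
                    "origin": origin,
                    "string_value": value,
                }]}
                for name, (literal, value) in entities.items()
            },
        }}],
    }}}
```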
Performing speech recognition on audio input

The workflow to perform speech recognition on audio input is as follows:

  1. The Dialog service sends an ExecuteResponse with a question and answer action, indicating that it requires user input.
  2. The client application sends a first StreamInput message with the asr_control_v1, request, and control_message parameters to DLGaaS; this lets DLGaaS know to expect audio and provides parameters and resources to facilitate and tune the transcription.
  3. The client application sends additional StreamInput messages to stream the audio.
  4. The client application sends an empty StreamInput message to indicate the end of audio.
    The audio is recognized, interpreted, and returned to the dialog application, which continues its flow.
  5. The Dialog service returns the corresponding ExecuteResponse in a single StreamOutput.
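The client side of steps 2 to 4 maps naturally onto a generator of StreamInput messages. The sketch below models each message as a dict matching the JSON example that follows. The chunk size, default selector, and function name are assumptions. The optional control_message is omitted, as it is in the example. With generated gRPC stubs, a generator like this would typically be passed to the streaming call, using the generated message classes instead of dicts.

```python
from typing import Iterator, Optional


def asr_stream_inputs(session_id: str, audio: bytes, chunk_size: int = 4096,
                      sample_rate_hz: int = 16000,
                      selector: Optional[dict] = None) -> Iterator[dict]:
    """Yield StreamInput-shaped dicts: request and ASR control first,
    remaining audio chunks next, then an empty message."""
    if selector is None:
        selector = {"channel": "default", "language": "en-US", "library": "default"}
    chunks = [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)] or [b""]
    # Step 2: the first message announces audio and carries the ASR parameters.
    yield {
        "request": {"session_id": session_id, "selector": selector, "payload": {}},
        "asr_control_v1": {"audio_format": {"pcm": {"sample_rate_hz": sample_rate_hz}}},
        "audio": chunks[0],
    }
    # Step 3: stream the rest of the audio.
    for chunk in chunks[1:]:
        yield {"audio": chunk}
    # Step 4: an empty message marks the end of audio.
    yield {}
```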

This can be seen in the detailed sequence flow. For example, assuming that the user says "I want an espresso", the client application sends a series of StreamInput messages with the following content:

# First StreamInput
{
    "request": {
        "session_id": "1c2c9822-45d5-460d-8696-d3fa9d8af8c2",
        "selector": {
            "channel": "default",
            "language": "en-US",
            "library": "default"
        },
        "payload": {}
    },
    "asr_control_v1": {
        "audio_format": {
            "pcm": {
                "sample_rate_hz": 16000
            }
        }
    },
    "audio": "RIFF4\373\000\00..."
}

# Additional StreamInputs with audio bytes
{
    "audio": "...audio_bytes..."
}

# Final empty StreamInput to indicate end of audio
{}

Once audio has been recognized, interpreted, and handled by DLGaaS, the following StreamOutput is returned:


# StreamOutput

{
    "response": {
        "payload": {
            "messages": [],
            "qa_action": {
                "message": {
                    "nlg": [{
                        "text": "What size coffee would you like? "
                    }],
                    "visual": [{
                        "text": "What size coffee would you like?"
                    }],
                    "audio": [] // References to recorded audio files, if any are defined.
                }
            }
        }
    }
}

Handling unusable ASR audio

DLGaaS handles unusable ASR audio as follows:

By default, if ASRaaS does not return a valid hypothesis, the dialog flow is determined by the dialog application, according to the processing defined for the NO_INPUT and NO_MATCH events in Mix.dialog.

In some cases, you may want the client application to handle the dialog flow if a valid hypothesis is not returned. This is done by setting the end_stream_no_valid_hypotheses parameter of the StreamInput asr_control_v1 message to true. When this is enabled, the stream is closed and the last StreamOutput message contains the ASR result in the asr_result field. The client application is then responsible for determining the next step in the dialog flow.
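Opting in and detecting the hand-back can be sketched as follows. Only end_stream_no_valid_hypotheses and asr_result come from the text above. The detection rule, that the final StreamOutput has an asr_result but no dialog response, is an assumption for illustration and should be checked against real responses. The function names are hypothetical.

```python
def asr_control(sample_rate_hz: int = 16000,
                client_handles_no_match: bool = False) -> dict:
    """asr_control_v1 contents; optionally ask DLGaaS to end the stream on no valid hypothesis."""
    control = {"audio_format": {"pcm": {"sample_rate_hz": sample_rate_hz}}}
    if client_handles_no_match:
        control["end_stream_no_valid_hypotheses"] = True
    return control


def client_must_decide(stream_outputs: list) -> bool:
    """Assumed heuristic: the last StreamOutput carries asr_result and no dialog response."""
    if not stream_outputs:
        return False
    last = stream_outputs[-1]
    return "asr_result" in last and "response" not in last
```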

Handling DTMF input in IVR applications

For Interactive Voice Response (IVR) applications, you may also want to use Dual-tone multi-frequency (DTMF) inputs, for example from a telephone keypad.

This could include single key inputs that correspond to one of a set of options, for example, for a menu, as defined by a DTMF mapping in Mix.dialog. It could also include a sequence of key inputs, for example to key in an account or identification number, to be interpreted by an external DTMF grammar referenced in Mix.dialog.

DTMF inputs can be handled by an integration between Mix.dialog and Nuance Speech Suite using Nuance VoiceXML Connector. Speech Suite uses DTMF mappings or DTMF grammars from Dialog to interpret DTMF input in terms of Dialog entities. It then returns the interpretation of the input to the Dialog service to advance the dialog.

For more details on such integrations and on configuring Mix.dialog to handle DTMF inputs, see Mix tips for IVR developers.

Generating synthesized speech output

Generation of synthesized speech output can be performed either by the Nuance Mix Platform (TTSaaS) or by a third-party text-to-speech system. Speech synthesis carried out by Nuance TTSaaS can be orchestrated either by Dialog or by the client application.

Synthesizing an audio output message using TTS with Dialog orchestration

  1. The client application sends a StreamInput message with the tts_control_v1 and request parameters to DLGaaS.
    The dialog application continues the dialog according to the ExecuteRequest provided in the request parameter.
  2. If the dialog is configured to support the TTS modality, speech audio for the text is synthesized and the audio is streamed back to the application in a series of StreamOutput messages.

Note: When DLGaaS calls TTSaaS through the StreamInput request, it specifies the ssml input type, which lets you use SSML tags to tune the synthesized TTS output. For more information about SSML tags, see the TTSaaS documentation.

For example, assuming that the user typed "I want an espresso", the client application sends a single StreamInput message with the following content:

# StreamInput
{
    "request": {
        "session_id": "1c2c9822-45d5-460d-8696-d3fa9d8af8c2",
        "selector": {
            "channel": "default",
            "language": "en-US",
            "library": "default"
        },
        "payload": {
            "user_input": {
                "user_text": "I want an espresso"
            }
        }
    },
    "tts_control_v1": {
        "audio_params": {
            "audio_format": {
                "pcm": {
                    "sample_rate_hz": 16000
                }
            }
        }
    }
}

Once the user text has been interpreted and handled by DLGaaS, the following series of StreamOutput messages is returned:

Note: The StreamOutput includes the audio field because a TTS message was defined (as shown in the nlg field). If no TTS message was specified, no audio would have been returned.

# First StreamOutput
{
    "response": {
        "payload": {
            "messages": [],
            "qa_action": {
                "message": {
                    "nlg": [{
                        "text": "What size coffee would you like? "
                    }],
                    "visual": [{
                        "text": "What size coffee would you like?"
                    }],
                    "audio": []
                }
            }
        }
    },
    "audio": "RIFF4\373\000\00.."
}

# Additional StreamOutputs with audio bytes
{
    "audio": "...audio_bytes..."
}

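On the client side, a TTS-enabled request and the resulting audio stream can be handled as sketched below. The request dict mirrors the StreamInput example above. The collector keeps the first dialog response and concatenates the audio bytes from every StreamOutput. The function names and the default selector are hypothetical. The sketch also assumes the audio arrives as raw bytes, as it would in gRPC, rather than as the escaped strings shown in the JSON.

```python
from typing import Iterable, Optional, Tuple


def tts_stream_input(session_id: str, user_text: str, sample_rate_hz: int = 16000,
                     selector: Optional[dict] = None) -> dict:
    """A single StreamInput carrying user text plus TTS parameters."""
    if selector is None:
        selector = {"channel": "default", "language": "en-US", "library": "default"}
    return {
        "request": {
            "session_id": session_id,
            "selector": selector,
            "payload": {"user_input": {"user_text": user_text}},
        },
        "tts_control_v1": {
            "audio_params": {"audio_format": {"pcm": {"sample_rate_hz": sample_rate_hz}}}
        },
    }


def collect_tts_output(stream_outputs: Iterable[dict]) -> Tuple[Optional[dict], bytes]:
    """Return the first dialog response and all synthesized audio, in order."""
    response = None
    audio = bytearray()
    for output in stream_outputs:
        if response is None and "response" in output:
            response = output["response"]
        chunk = output.get("audio")
        if chunk:
            audio.extend(chunk)
    return response, bytes(audio)
```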
TTS with orchestration by client app

Self-hosted environments: This feature requires version 1.3 of the Dialog service. The VoiceXML Connector does not support this feature.

To support alternative text-to-speech solutions, DLGaaS includes the current conversation language and the TTS voice settings configured in Mix.dialog in the ExecuteResponse payload messages. The active language tells the client application which language to generate speech in. If you are using Mix TTSaaS, the voice information tells the client application which Nuance voice profile to request in a TTSaaS SynthesisRequest.

Language and TTS voice parameters

{
  "payload": {
    "messages": [],
    "qa_action": {
      "message": {
        "nlg": [{
            "text": "What type of coffee would you like?"
        }],
        "visual": [{
            "text": "What <b>type</b> of coffee would you like? For the list of options, see the <a href=\"\">menu</a>."
        }],
        "language": "en-us",
        "tts_parameters": {
            "voice": {
                "name": "Evan",
                "model": "enhanced",
                "gender": "MALE",
                "language": "en-us"
            }
        }
      }
    }
  }
}

The nlg text of the ExecuteResponse payload messages provides the text to pass to TTSaaS if you are doing your own orchestration. Otherwise, it serves as a text backup if TTSaaS fails.

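For client-side orchestration, the fields above can be gathered into one job per message. The sketch below does this. The TtsJob type and the function name are hypothetical. How the values map onto a TTSaaS SynthesisRequest, or onto another engine's request, is left to the client; check the TTSaaS documentation for the exact request fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsJob:
    """What a client needs to synthesize one message with its own TTS orchestration."""
    text: str
    language: Optional[str]
    voice_name: Optional[str]
    voice_model: Optional[str]


def tts_job_from_message(message: dict) -> TtsJob:
    # The nlg entries hold the text intended for speech output.
    text = " ".join(item["text"].strip() for item in message.get("nlg", []) if item.get("text"))
    voice = message.get("tts_parameters", {}).get("voice", {})
    return TtsJob(
        text=text,
        language=message.get("language"),
        voice_name=voice.get("name"),
        voice_model=voice.get("model"),
    )
```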
Note that there are some important points to remember in your design and configuration in Mix.dialog:

Performing both speech recognition and TTS in a single call

  1. The client application sends a StreamInput message with the asr_control_v1, tts_control_v1, and request parameters to DLGaaS; this lets DLGaaS know to expect audio.
  2. The client application streams the audio in additional StreamInput messages.
    The audio is recognized, interpreted, and returned to the dialog application, which continues its flow. If the corresponding ExecuteResponse includes a TTS message, this message is synthesized and the audio is streamed back to the application in a series of StreamOutput messages.

Note about performing speech recognition and TTS in a dialog application

The speech recognition and TTS features provided as part of the DLGaaS API should be used in relation to your Mix.dialog, that is:

To perform recognition or TTS outside of a Mix.dialog, use the ASRaaS (speech recognition) and TTSaaS (text to speech) services directly.

Providing speech response using recorded speech audio

TTS synthesized speech is one way to provide speech responses in voice or omni-channel applications. Another option is to use recorded audio files.

This second option is available when an Audio Script message has been defined in Mix.dialog for the interaction. When using this option, you need to pre-record and store speech audio files within the client application. In this case, the StreamOutput response from DLGaaS includes, within the payload of its response field, local URI references for the appropriate audio file(s) to retrieve and play.

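The message contents of both the messages and qa_action fields in the payload contain an audio field with one or more Message.Audio messages. These give details for recorded audio versions of the message contents. Message.Audio contains two key fields: text, the text of the message or message segment, and uri, a reference to the recorded audio file for that text. The uri field is omitted for segments that have no recorded audio.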

Audio files and naming

Dialog expects recorded audio files related to a message to have file names derived systematically from the Audio File ID, or, if that is not specified, from the Message ID in Mix.dialog. How the file names are specified depends on whether the message is static or dynamic.

Static message audio file naming

Static messages have fixed contents and are the same every time they are used. An example of this is a standard greeting message or question posed routinely to the user.

For example, suppose in a banking application, the application sends an initial greeting message with a question to open the interaction, as follows:

"Welcome to your personal banking app. How may I help you today?"

In the case of a static message, the client application receives a payload message with one Message.Audio entry providing reference to a single audio file. Only one file is needed because the contents are fixed and can be recorded in one piece. If an Audio File ID is available, the file name is of the form Audio_File_ID. If only a Message ID is available, the file will instead be named Message_ID.

For the example above, the following payload message audio field contents would be returned:

{
    "response": {
        "payload": {
            "messages": [],
            "qa_action": {
                "message": {
                    "visual": [{ "text": "Welcome to your personal banking app. How may I help you today?"}],
                    "audio": [{"text": "Welcome to your personal banking app. How may I help you today?", "uri": "en-US/prompts/default/IVRVoiceVA/welcomeAudio.wav?version=1.0_1612217879954"}]
                }
            }
        }
    }
}

Dynamic message audio file naming

In dynamic messages, all or part of the content depends on the values of session variables. As such, the full contents of the message are known only at runtime.

For example, suppose that in a banking application you want to read back the details of the requested transaction to the user and get their confirmation. So in the case of a funds transfer scenario, the message might be defined in Mix.dialog as follows:

"You have chosen to transfer AMOUNT from SOURCE_ACCOUNT to DESTINATION_ACCOUNT. Is this correct?"

Here, AMOUNT, SOURCE_ACCOUNT, and DESTINATION_ACCOUNT are placeholders for values of variables only known at runtime based on what the user says. The rest of the message is static content that is always the same.

In the case of a dynamic message with placeholders for variable values, the message is broken into parts representing the different static and dynamic segments in the message. The client application receives a payload message with multiple Message.Audio entries providing reference to either static audio files or fallback text for TTS.

Suppose that at runtime, you have:

  • AMOUNT: $500
  • SOURCE_ACCOUNT: chequing
  • DESTINATION_ACCOUNT: savings

The message breaks down into seven segments, alternating between static and dynamic content:

  1. You have chosen to transfer (static)
  2. $500 (dynamic)
  3. from (static)
  4. chequing (dynamic)
  5. to (static)
  6. savings (dynamic)
  7. Is this correct? (static)

Seven audio entries are sent within the response payload representing the static and dynamic segments.

If the message has an Audio File ID transferBetweenAccounts, and .wav was set as the desired audio file format in Mix.dialog, then Mix.dialog expects four recorded audio files, one for each of the four static segments, with the following file names:

  • transferBetweenAccounts_01.wav
  • transferBetweenAccounts_03.wav
  • transferBetweenAccounts_05.wav
  • transferBetweenAccounts_07.wav

Here the number at the end of each file name is the segment number within the message.

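The naming rules can be expressed as two small helpers. This is a sketch with hypothetical function names. The two-digit zero padding of segment numbers is inferred from the example URIs, and the helpers do not reproduce any official algorithm.

```python
from typing import List, Optional, Sequence, Tuple


def static_audio_file_name(audio_file_id: Optional[str], message_id: Optional[str],
                           ext: str = "wav") -> str:
    """Static message: the Audio File ID wins; otherwise fall back to the Message ID."""
    base = audio_file_id or message_id
    if not base:
        raise ValueError("need an Audio File ID or a Message ID")
    return f"{base}.{ext}"


def static_segment_file_names(base_id: str, segments: Sequence[Tuple[str, bool]],
                              ext: str = "wav") -> List[str]:
    """Dynamic message: segments is a list of (text, is_static) pairs; static
    segments are named by their 1-based position in the message."""
    return [f"{base_id}_{position:02d}.{ext}"
            for position, (_text, is_static) in enumerate(segments, start=1)
            if is_static]
```

Applied to the seven-segment transfer message, the second helper yields the four static file names listed above.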
For the dynamic segments, text is provided so that the client application can make a runtime request for TTS audio.

Here's the payload message audio field contents for the same example:

# StreamOutput
{
    "response": {
        "payload": {
            "messages": [],
            "qa_action": {
                "message": {
                    "visual": [{ "text": "You have chosen to transfer $500 from chequing to savings. Is this correct?"}],
                    "audio": [
                        {"text": "You have chosen to transfer", "uri": "en-US/prompts/default/IVRVoiceVA/transferBetweenAccounts_01.wav?version=1.0_1612217879954"},
                        {"text": "$500"},
                        {"text":"from", "uri": "en-US/prompts/default/IVRVoiceVA/transferBetweenAccounts_03.wav?version=1.0_1612217879954"},
                        {"text": "chequing" },
                        {"text": "to", "uri": "en-US/prompts/default/IVRVoiceVA/transferBetweenAccounts_05.wav?version=1.0_1612217879954" },
                        {"text": "savings" },
                        {"text": "Is this correct?", "uri": "en-US/prompts/default/IVRVoiceVA/transferBetweenAccounts_07.wav?version=1.0_1612217879954" }
                    ]
                }
            }
        }
    }
}

For the static segments with URIs, the client application can try to retrieve the audio files at the expected location. For the dynamic segments with only text, the client application would need to obtain synthesized speech by sending the text segments to TTS.

Once the recorded audio files and the TTS audio are all obtained, the client application can play the segments in order to render the complete message.


DynamicMessageReference is a predefined variable schema in Mix.dialog used for audio messages.

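This schema includes two fields: audioFileName, the name of the recorded audio file to play, and ttsBackup, the text to synthesize if that file is not available.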

To use this, do the following in Mix.dialog:

  1. Create a variable based on this schema
  2. Create a data access node to obtain the field values for the variable at runtime from the client application or a backend data source
  3. Put the variable as a dynamic placeholder under Audio Script modality in the message definition in Mix.dialog.

At runtime, Dialog gets the audioFileName and ttsBackup values from the data source and sends them to the client application as part of a response payload Message.Audio. The client application can then handle them in the same way as a static message audio file.

TTS backup

In any case where either no URI is provided for a segment of the message or the audio file is not available at runtime, the backup text can be used to generate audio via TTS. The client application needs to make a separate request to TTS to generate speech for that text.

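Deciding, segment by segment, whether to play a recorded file or fall back to TTS can be sketched as follows. The is_available callback and the ("file", ...) / ("tts", ...) plan format are hypothetical. Actual file lookup, including any handling of the ?version= query string, and the TTS request itself are left to the client.

```python
from typing import Callable, List, Tuple


def playback_plan(audio_entries: List[dict],
                  is_available: Callable[[str], bool]) -> List[Tuple[str, str]]:
    """Map Message.Audio entries to ordered ("file", uri) or ("tts", text) steps."""
    plan = []
    for entry in audio_entries:
        uri = entry.get("uri")
        if uri and is_available(uri):
            plan.append(("file", uri))
        else:
            # No URI, or the recorded file is missing: use the backup text for TTS.
            plan.append(("tts", entry.get("text", "")))
    return plan
```

Playing the plan in order reproduces the full message, with TTS filling in both the dynamic segments and any missing recordings.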
Dynamic concatenated audio

When Mix dialogs are driven by VoiceXML applications, Audio script messages for certain supported languages are played using audio files from dynamic concatenated audio packages. In this case, speech audio for both static and dynamic content is put together and played from recorded concatenated audio files with intonation and formatting driven by message formatting applied in Mix.dialog. For more information see Dynamic concatenated audio playback options.


This reference topic clarifies the use of inline wordsets to improve Dialog's ability to make sense of user inputs.

What is a wordset?

In ASRaaS and NLUaaS, wordsets are used to help boost performance of recognition and interpretation of values for dynamic list entities. Dynamic list entities are list entities where the entity can take on several different values, and where the set of possible values can only be fully specified at runtime. Wordsets are collections of words brought in at runtime to dynamically specify the allowed values for one or more entities. In DLGaaS, wordsets are passed in to data access nodes using dynamic entity data variables.

Use cases for wordsets

There are two different scenarios where wordsets can be useful:

Wordsets improve performance for interpretation and recognition by more completely delineating the possible values that ASRaaS and NLUaaS should expect to encounter in the present context for specified entities.

Inline vs compiled wordsets in ASRaaS and NLUaaS

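In ASRaaS and NLUaaS, wordsets can be passed to the service in one of two ways: as inline wordsets, supplied with the request at runtime, or as compiled wordsets, compiled ahead of time and referenced by URN.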

Inline wordsets are used for entities with a modest number of possible values (No more than 100 total items). Inline wordsets are:

Compiled wordsets are used in ASRaaS and NLUaaS for entities with a large number of possible values (hundreds to thousands). Examples include all of a person's personal contacts, the staff directory of a large hospital, or a list of possible medication names.

Because of the size of these wordsets, trying to pass them in to be compiled at runtime adds undesirable or impractical amounts of latency. As a solution, ASRaaS and NLUaaS provide APIs to compile wordsets ahead of time. The Training API in ASRaaS and the Wordset API in NLUaaS provide this functionality. Once compiled, the wordsets can be referenced by URN at runtime using the regular runtime APIs of each service. This reduces the amount of latency added by using the wordset.

For details on using compiled wordsets, see Referencing compiled resources. The rest of this section focuses on how to use inline wordsets.

Passing inline wordsets: client-side vs server-side

Inline wordsets and server-side data integration

Client-side data integration

In DLGaaS, inline wordsets can be passed into the session at runtime through data transfers from external systems. This could either be from the client application or from a server-side data connection. For example:

When inline wordsets are used with Dialog, the accuracy and confidence levels for recognition and interpretation of dynamic list entities are boosted. This improves the overall ability of the Dialog to understand what your users want to do and route the Dialog accordingly to fulfill that intent.

Using wordsets with Dialog

To use inline wordsets with Dialog:

  1. Define one or more dynamic list entities in Mix.dialog or Mix.nlu by creating list entities and marking them as dynamic.
  2. Add at least a few initial values and literals for each dynamic list entity in Mix.dialog or Mix.nlu.
  3. In Mix.nlu, for each dynamic list entity, create some annotated samples containing the entity, and train your NLU model.
  4. Create a new dynamic entity data variable in Mix.dialog. Note: Dynamic entity data objects are classified as simple objects in Mix.dialog.
  5. If a data access node is to be used for the data exchange, create and configure a data access node in Mix.dialog to get the dynamic entity data variable created earlier. The data access node also needs to be configured for either client-side or server-side integration with the data source.
  6. Create a question and answer node in Mix.dialog to collect your dynamic list entity.
  7. Set up your data source, whether server-side or client-side, to provide the dynamic entity data variable containing the wordset data to the data access node.

Wordsets schema

The inline wordset data is passed in the form of a dynamic entity data variable object.

A dynamic entity data variable contains one field, variable_name. This corresponds to the name of the variable created in Mix.dialog and configured to be collected by a data access node. The value for this field is a dynamic entity data object.

A dynamic entity data object contains a wordset for boosting one or more dynamic list entities. It has one or more fields with names of the form entity_name. Here, each entity_name corresponds to the name of one dynamic list entity that is being provided with values. The value for each entity_name field is an array of dynamic entity data items. Each dynamic entity data item describes one value for the corresponding dynamic list entity. In DLGaaS, the following fields can be used:

Element | Type | Description | Used by
canonical | String | The value of the entity | ASR, NLU, DLG
literal | String | The written or spoken form of the value; doubles as the value when canonical is omitted | ASR, NLU, DLG
spoken | Array | (Optional) One or more additional spoken forms of the value; used by ASR and ignored by NLU | ASR
label | String | (Optional) A label, such as the text to show on a button | DLG
image_url | String | (Optional) A link (URL or relative path) to the image to use on a button | DLG
description | String | (Optional) A description | DLG

As can be seen in the table, some of these fields are used by NLU and/or ASR, while others are used only by DLG. label, image_url, and description are used in DLGaaS only to identify how to display the options in an interactive element.

The example below shows the format for a dynamic entity data variable object holding a cold drinks wordset for a coffee shop application.

Here moreCoffeeTypes is the dynamic entity data variable set in Dialog.

COFFEE_TYPE is an entity to be boosted with a wordset. Associated with this is an array. The two entries within the array hold details related to two possible values for the entity, cold brew coffee and iced cappuccino.

{
    "moreCoffeeTypes": {
        "COFFEE_TYPE": [
            {
                "canonical": "cold_brew",
                "literal": "cold brew",
                "spoken": [
                    "cold brew"
                ],
                "label": "Cold brew coffee",
                "image_url": "",
                "description": "Cafe Italia's famous and refreshing cold brew coffee. Great for summer."
            },
            {
                "canonical": "ice_capp",
                "literal": "iced cappuccino",
                "spoken": [
                    "iced kapucheeno",
                    "iced kapacheeno"
                ],
                "label": "Iced cappuccino",
                "image_url": "",
                "description": "A frosty, slushy burst of coffee to beat the heat."
            }
        ]
    }
}

For more information, see Dynamic entity data specification.
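A data source can assemble and sanity-check this object before returning it. The sketch below uses the field names from the table above and the 100-item inline limit mentioned earlier. Treating literal as required is an assumption, and the function name is hypothetical.

```python
INLINE_WORDSET_LIMIT = 100  # inline wordsets: no more than 100 total items
ITEM_FIELDS = {"canonical", "literal", "spoken", "label", "image_url", "description"}


def dynamic_entity_variable(variable_name: str, entities: dict) -> dict:
    """Wrap {entity_name: [items]} in a dynamic entity data variable object."""
    total = sum(len(items) for items in entities.values())
    if total > INLINE_WORDSET_LIMIT:
        raise ValueError(f"{total} items exceeds the inline limit of {INLINE_WORDSET_LIMIT}")
    for entity_name, items in entities.items():
        for item in items:
            unknown = set(item) - ITEM_FIELDS
            if unknown:
                raise ValueError(f"{entity_name}: unknown fields {sorted(unknown)}")
            if "literal" not in item:  # assumption: treat literal as required
                raise ValueError(f"{entity_name}: every item needs a literal")
    return {variable_name: {name: [dict(item) for item in items]
                            for name, items in entities.items()}}
```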

Set up your data source

Your data source provides the wordset data to a data access node. The data source can use either server-side integration or client-side integration.

Server-side integration

Set up a RESTful endpoint at the server URL specified in the data access node. The endpoint takes in the specified inputs and returns the specified dynamic entity data variable according to the Wordsets schema.

For details on how to do this, see Exchanging data from the dialog application.
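For local experiments, such an endpoint can be sketched with Python's standard library. The sketch makes several assumptions. The data access node is assumed to send a POST with a JSON body. The endpoint returns a fixed, hypothetical wordset regardless of the inputs. WordsetHandler and serve are illustrative names. The real request contract is described in Exchanging data from the dialog application.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical wordset returned for every request; a real endpoint would
# select values based on the inputs sent by the data access node.
WORDSET_RESPONSE = {
    "moreCoffeeTypes": {
        "COFFEE_TYPE": [
            {"canonical": "cold_brew", "literal": "cold brew", "spoken": ["cold brew"]},
            {"canonical": "ice_capp", "literal": "iced cappuccino",
             "spoken": ["iced kapucheeno", "iced kapacheeno"]},
        ]
    }
}


class WordsetHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: node inputs arrive as a JSON body (unused in this sketch).
        length = int(self.headers.get("Content-Length", 0))
        inputs = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(WORDSET_RESPONSE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence default request logging


def serve(port: int = 8080) -> None:
    """Run the sketch endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), WordsetHandler).serve_forever()
```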

Client-side integration

Set up a script in your client application to handle the data access action. This script takes in the specified inputs and returns the specified dynamic entity data variable according to the Wordsets schema.

For details on how to do this, see Data access actions.
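On the client side, answering the data access action follows the same requested_data pattern as the Transfer action shown earlier. The sketch below makes several assumptions. The action is a dict with id and data fields, like the escalation action example. fetch_wordset is a hypothetical lookup supplied by your application. The returnCode of "0" mirrors the Transfer example and should be confirmed against Data access actions.

```python
from typing import Callable


def handle_data_access_action(session_id: str, action: dict,
                              fetch_wordset: Callable[[dict], dict]) -> dict:
    """Answer a client-side data access action with a dynamic entity data variable."""
    inputs = action.get("data", {})     # input variables sent by the node
    data = dict(fetch_wordset(inputs))  # e.g. {"moreCoffeeTypes": {...}}
    data.setdefault("returnCode", "0")  # assumption: report success as for Transfer
    return {
        "session_id": session_id,
        "payload": {"requested_data": {"id": action["id"], "data": data}},
    }
```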

Behind the scenes behavior

Once the dynamic entity data variable is pulled into Dialog, it is available afterwards during the session for as long as needed.

Whenever a call to ASRaaS and/or NLUaaS is triggered by a DLGaaS ExecuteStream or Execute request, the wordset contained in the dynamic entity data variable will be added to the call.

For each such call to ASRaaS or NLUaaS, the dynamic entity data object is extracted from the dynamic entity data variable object, and added by Dialog as an inline wordset resource.

Recommendations/best practices

Wordsets with multiple different dynamic list entities can be passed into Dialog for use during the session.

If you're unsure about the size of your inline wordset, test the latency.

For more details on setting up wordsets in Mix.dialog, see Dynamic list entities.

Referencing compiled resources

Self-hosted environments: Use of the ExternalResourceReferences variable requires version 1.1 (or later) of the Dialog service. IVR applications using the Speech Suite platform with VoiceXML Connector do not yet support fetching external NLU and ASR resources. Projects using the Speech Suite platform only support inline wordsets.

This reference topic clarifies the use of compiled resources by reference to improve Dialog's ability to make sense of user speech and text inputs.

As mentioned in the wordsets section, the APIs of NLUaaS and ASRaaS allow you to compile resources ahead of time and then reference these resources by URN at runtime. The resources are then shared with ASRaaS and NLUaaS to improve recognition and interpretation.

DLGaaS supports passing in ASRaaS and NLUaaS references at runtime to be used by calls made by DLGaaS to ASRaaS and NLUaaS. This is accomplished using a session variable called ExternalResourceReferences.

Types of resources

The following types of resources can be referenced using an ExternalResourceReferences variable.

Service | Resource type | Description | URN format
NLU | COMPILED_WORDSET | App-level NLU compiled wordset. Provides values for a dynamic list entity relevant to all users of the app. | urn:nuance-mix:tag:wordset:lang/context_tag/name/lang/mix.nlu
NLU | COMPILED_WORDSET | User-level NLU compiled wordset. Provides values for a dynamic list entity specific to the current user. | urn:nuance-mix:tag:wordset:lang/context_tag/name/lang/mix.nlu?=user_id=user_id
ASR | COMPILED_WORDSET | App-level ASR compiled wordset. Provides values for a dynamic list entity relevant to all users of the app. | urn:nuance-mix:tag:wordset:lang/context_tag/name/lang/mix.asr
ASR | COMPILED_WORDSET | User-level ASR compiled wordset. Provides values for a dynamic list entity specific to the current user. | urn:nuance-mix:tag:wordset:lang/context_tag/name/lang/mix.asr?=user_id=user_id
ASR | DOMAIN_LM | ASR domain language model. Additional model that supplements a base language model and improves recognition of specialized terms that are common in a specific knowledge domain but rare in everyday speech. | urn:nuance-mix:tag:model/context_tag/mix.asr?=language=lang
ASR | SETTINGS | ASR settings. | See the ASR settings entry in the example below
ASR | SPEAKER_PROFILE | ASR speaker profile for the current user_id. Contains data that improves recognition performance for the current user based on qualities of the speaker and channel. | N/A

For the URNs:

Note that speaker profiles do not need a URI. Speaker profiles are specified by the current user_id, which is passed in with requests in the DLGaaS API.
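The URN formats in the table can be filled in with small helpers, as sketched below. The function names are hypothetical. The sketch assumes that the segment right after "wordset:" is the literal text "lang" and that the later lang placeholder is the language code. Confirm both against the ASRaaS and NLUaaS documentation for your resources.

```python
from typing import Optional


def wordset_urn(context_tag: str, name: str, language: str, service: str,
                user_id: Optional[str] = None) -> str:
    """Compiled wordset URN; service is "nlu" or "asr"; user_id makes it user-level."""
    if service not in ("nlu", "asr"):
        raise ValueError("service must be 'nlu' or 'asr'")
    # Assumption: the first "lang" segment is literal text, not a placeholder.
    urn = f"urn:nuance-mix:tag:wordset:lang/{context_tag}/{name}/{language}/mix.{service}"
    if user_id:
        urn += f"?=user_id={user_id}"
    return urn


def domain_lm_urn(context_tag: str, language: str) -> str:
    """ASR domain language model URN."""
    return f"urn:nuance-mix:tag:model/{context_tag}/mix.asr?=language={language}"
```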

For more information on recognition and interpretation resources, see:

Passing in ExternalResourceReferences

ExternalResourceReferences can be passed into Dialog in three different ways:

Using compiled resources with Dialog

To use compiled resources by reference with Dialog:

  1. If using compiled wordset resources:
    • Define one or more dynamic list entities in Mix.dialog or Mix.nlu by creating list entities and marking them as dynamic.
    • Add at least a few initial values and literals for each dynamic list entity in Mix.dialog or Mix.nlu.
    • In Mix.nlu, for each dynamic list entity, create some annotated samples containing the entity, and train your NLU model.
  2. If applicable, create and configure a data access node or an external actions node of Transfer action type with the predefined ExternalResourceReferences variable as a get data parameter to fetch references to the compiled resources.
  3. Create a question and answer node in Mix.dialog to collect your inputs on which the compiled resources will be applied.
  4. If using a data access node or external actions node, set up a data source to provide the value for the ExternalResourceReferences variable to be sent to Dialog.

ExternalResourceReferences schema

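The value of ExternalResourceReferences is an object with two fields: NLUResources, a list of resources to improve NLU interpretation, and ASRResources, a list of resources to improve ASR recognition.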

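Each resource entry can have up to three fields: uri, the URN of the resource (not needed for speaker profiles); resourceType, such as COMPILED_WORDSET, DOMAIN_LM, SPEAKER_PROFILE, or SETTINGS; and weight_value, shown in the example below for a domain language model.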

The code sample below shows the format of an ExternalResourceReferences object. See above for the details that need to be specified for each URN to identify the resource.

{
    // Resources to improve NLU interpretation
    "NLUResources": [
        // NLU compiled wordset
        {
            "uri": "urn:nuance-mix:tag:wordset:lang/contextTag/resourceName/lang/mix.nlu?=user_id=userId",
            "resourceType": "COMPILED_WORDSET"
        }
    ],
    // Resources to improve ASR recognition
    "ASRResources": [
        // ASR compiled wordset
        {
            "uri": "urn:nuance-mix:tag:wordset:lang/contextTag/resourceName/lang/mix.asr",
            "resourceType": "COMPILED_WORDSET"
        },
        // ASR domain language model
        {
            "uri": "urn:nuance-mix:tag:model/contextTag/mix.asr?=language=lang",
            "resourceType": "DOMAIN_LM",
            "weight_value": 0.7
        },
        // ASR speaker profile
        {
            "resourceType": "SPEAKER_PROFILE"
        },
        // ASR settings
        {
            "uri": "urn:nuance-mix:tag:settings/names-places/asr",
            "resourceType": "SETTINGS"
        }
    ]
}

Use of ExternalResourceReferences

As with other session variables, once ExternalResourceReferences is set, the resources remain available for the rest of the session: DLGaaS adds references to them in all subsequent calls to ASRaaS and NLUaaS.

Updating ExternalResourceReferences values

If the client application passes in a value for the ExternalResourceReferences variable again, this will overwrite the earlier values, and the new values will be used from that point forward.
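The ExternalResourceReferences object can also be assembled in code, for example in the data source that supplies its value. The following Python sketch is illustrative only: `resource_entry` and `external_resource_references` are hypothetical helpers, not part of any Nuance SDK, and the only rule they enforce is the one stated in this section, that speaker profiles need no URI. Because a new value overwrites the previous one, the sketch always builds the complete set of resources rather than a partial update.

```python
import json
from typing import Optional


def resource_entry(resource_type: str, uri: Optional[str] = None,
                   weight_value: Optional[float] = None) -> dict:
    """Build one resource entry (hypothetical helper).

    Speaker profiles are tied to the session's user_id and need no uri;
    this sketch requires a uri for every other resource type.
    """
    entry = {"resourceType": resource_type}
    if uri:
        entry["uri"] = uri
    elif resource_type != "SPEAKER_PROFILE":
        raise ValueError(f"{resource_type} entries need a uri")
    if weight_value is not None:
        entry["weight_value"] = weight_value
    return entry


def external_resource_references(nlu_resources=(), asr_resources=()) -> dict:
    """Assemble the complete value; sending it again replaces the old one."""
    return {
        "NLUResources": list(nlu_resources),
        "ASRResources": list(asr_resources),
    }


refs = external_resource_references(
    nlu_resources=[
        resource_entry(
            "COMPILED_WORDSET",
            "urn:nuance-mix:tag:wordset:lang/contextTag/resourceName/lang/mix.nlu?=user_id=userId",
        )
    ],
    asr_resources=[
        resource_entry(
            "DOMAIN_LM",
            "urn:nuance-mix:tag:model/contextTag/mix.asr?=language=lang",
            weight_value=0.7,
        ),
        resource_entry("SPEAKER_PROFILE"),
    ],
)
print(json.dumps(refs, indent=2))
```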

Exchanging session data

In addition to data requested by data access actions, you can send data from the client application to the Dialog service with the following methods:

This data can include:

userData predefined variable

Example: StartRequest payload with session data

{
  "selector": {
    "channel": "default",
    "language": "en-US",
    "library": "default"
  },
  "payload": {
    "data": {
      "userData": {
        "timezone": "America/Cancun",
        "userGlobalID": "123123123",
        "userChannelID": "",
        "userAuxiliaryID": "7319434000843499",
        "systemID": "4561 9219 9923",
        "location": {
          "latitude": "21.161908",
          "longitude": "-86.8515279"
        }
      },
      "preferred_coffee": "espresso",
      "user_name": "Miranda"
    }
  }
}

Example: UpdateRequest payload with session data

{
  "session_id": "27f8e613-f624-429b-8c11-d2465dbc2692",
  "payload": {
    "data": {
      "userData": {
        "timezone": "America/Cancun",
        "userGlobalID": "123123123",
        "userChannelID": "",
        "userAuxiliaryID": "7319434000843499",
        "systemID": "4561 9219 9923",
        "location": {
          "latitude": "21.161908",
          "longitude": "-86.8515279"
        }
      },
      "preferred_coffee": "cappuccino",
      "user_name": "Sam"
    }
  }
}

All dialog projects include the userData predefined variable, which can be set in the StartRequest payload or in the UpdateRequest payload to provide end user data such as the user's timezone, location, and so on.

The JSON code shows an example of how to pass userData in the StartRequest and UpdateRequest payloads. This data can then be used in the dialog application.
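The request shapes above can be produced with two small helpers. This Python sketch builds JSON-style dicts only; `start_request` and `update_request` are hypothetical names, and `payload.model_ref` (the Dialog model reference described under StartRequestPayload) is left out, as in the examples. With the generated Python stubs, a dict like this could be turned into the corresponding request message with protobuf's `google.protobuf.json_format.ParseDict`.

```python
def start_request(user_data: dict, variables: dict, channel: str = "default",
                  language: str = "en-US", library: str = "default") -> dict:
    """JSON-style StartRequest carrying userData plus Mix.dialog variables.

    payload.model_ref is omitted here, as in the examples above.
    """
    return {
        "selector": {"channel": channel, "language": language, "library": library},
        "payload": {"data": {"userData": user_data, **variables}},
    }


def update_request(session_id: str, user_data: dict, variables: dict) -> dict:
    """JSON-style UpdateRequest for a session that already exists."""
    return {
        "session_id": session_id,
        "payload": {"data": {"userData": user_data, **variables}},
    }


req = start_request(
    {"timezone": "America/Cancun", "userGlobalID": "123123123"},
    {"preferred_coffee": "espresso", "user_name": "Miranda"},
)
upd = update_request(
    "27f8e613-f624-429b-8c11-d2465dbc2692",
    {"timezone": "America/Cancun"},
    {"preferred_coffee": "cappuccino", "user_name": "Sam"},
)
```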

For a description of the userData variable, see userData schema in the Mix.dialog documentation.

Variables defined in Mix.dialog

You can set variables that were previously defined in Mix.dialog in the StartRequest or UpdateRequest. For example, let's say that the user name and preferred coffee are stored on the user's phone, and you'd like to use them in your dialog application to customize your messages:

To implement this scenario:

  1. Create variables in Mix.dialog (for example, user_name and preferred_coffee). See Manage variables in the Mix.dialog documentation for details.
  2. Use the variables in the dialog; for example, the following message node includes the user_name value in the initial prompt:
  3. Send the values of user_name and preferred_coffee as key-value pairs in the StartRequestPayload or UpdateRequestPayload.
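On the client side, the personalized prompt arrives as visual text in the ExecuteResponse. The following Python sketch collects that text from a JSON-style response dict. It is a simplification: it looks only at `payload.messages` and `payload.qa_action.message`, although other actions can carry messages too, and the helper name `visual_texts` is hypothetical.

```python
def visual_texts(execute_response: dict) -> list:
    """Collect visual text strings from an ExecuteResponse-shaped dict.

    Only payload.messages and payload.qa_action.message are inspected.
    """
    payload = execute_response.get("payload", {})
    messages = list(payload.get("messages", []))
    qa_action = payload.get("qa_action", {})
    if "message" in qa_action:
        messages.append(qa_action["message"])
    return [item["text"] for msg in messages for item in msg.get("visual", [])]


response = {
    "payload": {
        "messages": [],
        "qa_action": {
            "message": {
                "nlg": [],
                "visual": [{"text": "Hello Miranda ! What can I do for you today?"}],
                "audio": [],
            }
        },
    }
}
print(visual_texts(response))
```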

The dialog app can then include the user name in the next prompt:

{
    "payload": {
        "messages": [],
        "qa_action": {
            "message": {
                "nlg": [],
                "visual": [
                    { "text": "Hello Miranda ! What can I do for you today?" }
                ],
                "audio": []
            }
        }
    }
}

Note: Variable values must be sent in the expected format and within the range of expected values; otherwise, the variable is not updated. For example, the language session variable expects a language and country code in xx-XX format (for example, en-US) chosen from the languages configured in the project. Setting a language that the project does not support, or using an incorrect format such as en, leaves the language variable unchanged.
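Because an invalid value is silently ignored rather than rejected, a client may want to validate values before sending them. The following Python sketch checks a language code against the xx-XX format and against a hypothetical `project_languages` set; the client would have to obtain that set from its own configuration, which is an assumption of this sketch.

```python
import re

# xx-XX: two lowercase letters, a hyphen, two uppercase letters (e.g. en-US)
LANGUAGE_CODE = re.compile(r"[a-z]{2}-[A-Z]{2}")


def checked_language(code: str, project_languages: set) -> str:
    """Return code unchanged, or raise for values DLGaaS would ignore."""
    if not LANGUAGE_CODE.fullmatch(code):
        raise ValueError(f"{code!r} is not in xx-XX format")
    if code not in project_languages:
        raise ValueError(f"{code!r} is not configured in this project")
    return code
```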

Simple variable types

Simple variables created in Mix.dialog are of a specified type. When you send a variable, whether in the StartRequest payload or in a data access action, you must make sure to send the data in the right format so that it can be used by the dialog application.

This table lists the types of simple variables and describes how to send them to the dialog application. The JSON code then shows examples of how to pass this type of data in a data access action.

For more information, see Variable types in the Mix.dialog documentation.

{
    "selector": {
        "channel": "default",
        "language": "en-US",
        "library": "default"
    },
    "payload": {
        "requested_data": {
            "id": "DataAccess",
            "data": {
                "returnCode": "0",
                "sampleString": "This is a sample string",
                "sampleAlphanumeric": "1-2 This is an alphanumeric string.",
                "sampleDigits": "12",
                "sampleBoolean": "true",
                "sampleInt": 27,
                "sampleDecimal": 12.34,
                "sampleAmount": {
                    "unit": "USD",
                    "number": 10.5
                },
                "sampleDate": "20201014",
                "sampleTime": "1212a",
                "sampleDistance": {
                    "modifier": "LE",
                    "unit": "km",
                    "number": 10
                },
                "sampleTemperature": {
                    "unit": "C",
                    "number": 32
                }
            }
        }
    }
}
Variable type Description
String String of characters
Alphanumeric String of alphanumeric characters (a-z, A-Z, 0-9)
Digits String of digits (0-9)
Boolean Boolean (true, false)
Integer Whole number
Decimal Decimal-point number
Amount Amount, including currency. Specify the amount in an object with the following elements:
  • unit: Unit of currency, such as USD
  • number: Number of units
The currency is dependent on the grammar. For example, if the en-US grammar is used, the only currency accepted is USD.
Date Date (YYYYMMDD)
Time Time. Specify as a string using the format HHMMx, where x is one of the following:
  • a: for AM
  • p: for PM
Distance Distance, including unit and modifier. Specify the distance in an object with the following elements:
  • modifier: Modifier such as LT for "less than"
  • unit: Unit of distance, such as km
  • number: Number of units
See the nuance_DISTANCE schema for the unit and modifier values supported.
Temperature Temperature, including unit. Specify the temperature in an object with the following elements:
  • unit: Unit of temperature, such as C
  • number: Number of units
See the nuance_TEMPERATURE schema for the unit values supported.
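Date and Time values are plain strings with fixed layouts, so it is easy to get them wrong by hand. The following Python sketch formats them from standard `datetime` objects. The helper names are hypothetical, and the zero-padded 12-hour clock used for Time is an assumption based on the HHMMx format and the "1212a" sample value.

```python
from datetime import date, time


def mix_date(d: date) -> str:
    """Date variable: YYYYMMDD."""
    return d.strftime("%Y%m%d")


def mix_time(t: time) -> str:
    """Time variable: HHMMx, where x is 'a' (AM) or 'p' (PM).

    Assumes a zero-padded 12-hour clock; midnight and noon map to hour 12.
    """
    hour12 = t.hour % 12 or 12
    suffix = "a" if t.hour < 12 else "p"
    return f"{hour12:02d}{t.minute:02d}{suffix}"


def mix_amount(number: float, unit: str = "USD") -> dict:
    """Amount variable: currency unit plus number of units."""
    return {"unit": unit, "number": number}


data = {
    "sampleDate": mix_date(date(2020, 10, 14)),
    "sampleTime": mix_time(time(0, 12)),
    "sampleAmount": mix_amount(10.5),
}
print(data)
```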

Disabling logging

Sensitive flagging and partial redaction

By default, the values of any entities and variables marked as 'sensitive' in Mix.dialog and Mix.nlu are redacted from Dialog and NLU payload logs in the Nuance Mix runtime event logs. This is called partial redaction. The content of the text exchanges for both sides of the conversation remains partially readable, but sensitive information is redacted.

Complete redaction

If you want to suppress logging of the contents of conversations more broadly and completely, set the suppress_log_user_data flag in the StartRequestPayload to true. This completely disables logging of the contents of the conversation for Dialog and, whenever the other services are orchestrated by Dialog, also triggers corresponding flags to suppress logging of contents in ASR, NLU, and TTS. This is the master redact button when you want the event logs to retain nothing of the words or data transmitted during the conversation.

See Managing sensitive information in an application in the Nuance Mix Runtime Event Logs documentation for more details.

User ID

You can specify a user ID in the StartRequest, ExecuteRequest, UpdateRequest, and StopRequest. This user ID is converted into an unreadable format and stored in call logs and user-specific files. It can be used for:

Note: The user_id value can accept any UTF-8 characters.


Dialog as a Service provides three protocol buffer (.proto) files to define the Dialog service for gRPC. These files contain the building blocks of your dialog applications:

Once you have transformed the proto files into functions and classes in your programming language using gRPC tools, you can call these functions from your client application to start a conversation with a user, collect the user's input, obtain the action to perform, and so on.

See Client app development for a scenario using Python that provides an overview of the different methods and messages used in a sample order coffee application. For other languages, consult the gRPC and Protocol Buffer documentation:

Field names in proto and stub files

In this section, the names of the fields are shown as they appear in the proto files. To see how they are generated in your programming language, consult your generated files. For example:

Proto file Python Go Java
session_id session_id SessionId sessionId or getSessionId
selector selector Selector selector or setSelector
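As a rough illustration of the naming pattern in the table, the Python sketch below converts a plain snake_case proto field name into the Go field name and the Java field and accessor names. It approximates the generators' behavior for simple lowercase names only; the helper names are hypothetical, and your generated files remain the authority.

```python
def go_field_name(proto_name: str) -> str:
    """Approximate Go name: session_id -> SessionId."""
    return "".join(part[:1].upper() + part[1:] for part in proto_name.split("_"))


def java_field_name(proto_name: str) -> str:
    """Approximate Java name: session_id -> sessionId."""
    go_name = go_field_name(proto_name)
    return go_name[:1].lower() + go_name[1:]


def java_accessors(proto_name: str) -> tuple:
    """Java getter and builder setter: getSessionId, setSessionId."""
    go_name = go_field_name(proto_name)
    return f"get{go_name}", f"set{go_name}"


for field in ("session_id", "selector", "asr_control_v1"):
    print(field, go_field_name(field), java_field_name(field), java_accessors(field))
```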

For details, see the Protocol Buffers documentation for:

Proto files structure

Structure of DLGaaS proto files

Proto files

DialogService


Name Request Type Response Type Description
Start StartRequest StartResponse Starts a conversation. Returns a StartResponse object.
Status StatusRequest StatusResponse Returns the status of a session. Returns gRPC status 0 (OK) if found, 5 (NOT_FOUND) if no session was found. Returns a StatusResponse object.
Update UpdateRequest UpdateResponse Updates the state of a session without advancing the conversation. Returns an UpdateResponse object.
Execute ExecuteRequest ExecuteResponse Used to continuously interact with the conversation based on end user input or events. Returns an ExecuteResponse object that will contain data related to the dialog interactions and that can be used by the client to interact with the end user.
ExecuteStream StreamInput stream StreamOutput stream Performs recognition on streamed audio using ASRaaS and provides speech synthesis using TTSaaS.
Stop StopRequest StopResponse Ends a conversation and performs cleanup. Returns a StopResponse object.
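For a text-only conversation, these methods are used in sequence: Start once, Execute for each turn, and Stop to clean up. The Python sketch below shows that sequence; it is not Nuance sample code. `client` stands for a hypothetical wrapper around the generated DialogService stub whose `start`, `execute`, and `stop` methods take and return JSON-style dicts, and sending an empty payload on the first Execute to obtain the opening prompt is an assumption of this sketch. Only question and answer and end actions are handled.

```python
def run_text_conversation(client, selector: dict, start_payload: dict, ask_user):
    """Start a session, Execute until an end_action arrives, then Stop.

    ask_user(qa_action) returns the user's reply as text (hypothetical hook).
    """
    start = client.start({"selector": selector, "payload": start_payload})
    session_id = start["payload"]["session_id"]
    try:
        # Assumption: the first Execute carries no user input.
        payload = {}
        while True:
            response = client.execute(
                {"session_id": session_id, "selector": selector, "payload": payload}
            )["payload"]
            if "end_action" in response:
                return response["end_action"]
            if "qa_action" not in response:
                # da_action, continue_action, escalation_action: not handled here
                raise NotImplementedError(sorted(response))
            payload = {"user_input": {"user_text": ask_user(response["qa_action"])}}
    finally:
        # Stop ends the conversation and performs cleanup, even on errors.
        client.stop({"session_id": session_id})
```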

StartRequest

Request object used by the Start method.

Field Type Description
session_id string Optional session ID. If not provided then one will be generated.
selector common.Selector Selector providing the channel and language used for the conversation.
payload common.StartRequestPayload Payload of the Start request.
session_timeout_sec uint32 Session timeout value (in seconds), after which the session is terminated. The maximum is configured in the deployment.
user_id string Identifies a specific user within the application. See User ID.
client_data map<string,string> Map of client-supplied key-value pairs to inject into the call log. Optional.
Example: "client_data": { "param1": "value1", "param2": "value2" }

StartResponse

Response object used by the Start method.

Field Type Description
payload common.StartResponsePayload Payload of the Start response. Contains session ID.

StatusRequest

Request object used by the Status method. For more information about the Status method, see Step 5. Check session status.

Field Type Description
session_id string ID for the session.

StatusResponse

Response object used by the Status method.

Field Type Description
session_remaining_sec uint32 Remaining session time to live (TTL) value in seconds, after which the session is terminated.
Note: The TTL may be a few seconds off based on how long the round trip of the request took.

UpdateRequest

Request object used by the Update method. For more information about the Update method, see Step 6. Update session data.

Field Type Description
session_id string ID for the session.
payload common.UpdateRequestPayload Payload of the Update request.
client_data map<string,string> Map of client-supplied key-value pairs to inject into the call log. Optional.
Example: "client_data": { "param1": "value1", "param2": "value2" }
user_id string Identifies a specific user within the application. See User ID.

UpdateResponse

Response object used by the Update method. Currently empty.

ExecuteRequest

Request object used by the Execute method.

Field Type Description
session_id string ID for the session.
selector common.Selector Selector providing the channel and language used for the conversation.
payload common.ExecuteRequestPayload Payload of the Execute request.
user_id string Identifies a specific user within the application. See User ID.

ExecuteResponse

Response object used by the Execute method. This object carries a payload, which instructs the client app to play messages to the user (as needed) and to perform one action: a question and answer, data access, escalation, end, or continue action.

Field Type Description
payload common.ExecuteResponsePayload Payload of the Execute response.

StreamInput

Performs recognition on streamed audio using ASRaaS and requests speech synthesis using TTSaaS.

asr_control_v1 (and control_message if applicable) must be sent as part of the first StreamInput message in order for DLGaaS to chain the audio stream with ASRaaS. Audio is then sent in the subsequent StreamInput messages.

Field Type Description
request ExecuteRequest Standard DLGaaS ExecuteRequest. Used to continue the dialog interactions.
asr_control_v1 AsrParamsV1 Defines audio recognition parameters to be forwarded to the ASR service to initiate audio streaming. The contents of this message correspond to those of the recognition_init_message field used in the first message of the ASR input stream.
audio bytes Subsequent message containing audio samples in the selected encoding for recognition.
tts_control_v1 TtsParamsV1 Parameters to be forwarded to the TTS service.
control_message nuance.asr.v1.ControlMessage Optional input message to be forwarded to the ASR service. This corresponds to the optional control_message field used in the first message of the ASR input stream. ASR uses this message to start the recognition no-input timer if it was disabled by a stall_timers recognition flag in asr_control_v1. See the ASRaaS RecognitionRequest documentation for details.
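The ordering rule above (asr_control_v1 in the first StreamInput, audio only afterwards) is sketched below in Python, with plain dicts standing in for StreamInput messages. With generated stubs you would yield StreamInput messages from a generator and pass it to the ExecuteStream call. Putting the ExecuteRequest and tts_control_v1 in the first message as well is an assumption of this sketch, and the 640-byte chunk size (20 ms of 16 kHz, 16-bit mono PCM) is an arbitrary illustrative choice.

```python
from typing import Iterator, Optional


def audio_chunks(audio: bytes, chunk_size: int = 640) -> Iterator[bytes]:
    """Split raw audio; 640 bytes = 20 ms of 16 kHz, 16-bit mono PCM."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def stream_inputs(execute_request: dict, asr_params: dict,
                  tts_params: Optional[dict], audio: bytes) -> Iterator[dict]:
    """Yield StreamInput-shaped dicts: control first, then audio only."""
    first = {"request": execute_request, "asr_control_v1": asr_params}
    if tts_params is not None:
        first["tts_control_v1"] = tts_params
    yield first
    for chunk in audio_chunks(audio):
        yield {"audio": chunk}


asr_params = {"audio_format": {"pcm": {"sample_rate_hz": 16000}}}
messages = list(stream_inputs({"session_id": "s1"}, asr_params, None, b"\x00" * 1500))
```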

StreamOutput

Streams the requested TTS output and returns ASR results.

Field Type Description
response ExecuteResponse Standard DLGaaS ExecuteResponse; used to continue the dialog interactions.
audio nuance.tts.v1.SynthesisResponse TTS output. See the TTSaaS SynthesisResponse documentation for details.
asr_result nuance.asr.v1.Result Output message containing the transcription result, including the result type, the start and end times, metadata about the transcription, and one or more transcription hypotheses. See the ASRaaS Result documentation for details.
asr_status nuance.asr.v1.Status Output message indicating the status of the transcription. See the ASRaaS Status documentation for details.
asr_start_of_speech nuance.asr.v1.StartOfSpeech Output message containing the start-of-speech message. See the ASRaaS StartOfSpeech documentation for details.

StopRequest

Request object used by the Stop method.

Field Type Description
session_id string ID for the session.
user_id string Identifies a specific user within the application. See User ID.

StopResponse

Response object used by the Stop method. Currently empty; reserved for future use.

Fields reference


AsrParamsV1

Parameters to be forwarded to the ASR service. See Step 4b. Interact with the user (using audio) for details.

Field Type Description
audio_format nuance.asr.v1.AudioFormat Audio codec type and sample rate. See the ASRaaS AudioFormat documentation for details.
utterance_detection_mode nuance.asr.v1.EnumUtteranceDetectionMode How end of utterance is determined. Defaults to SINGLE. See the ASRaaS EnumUtteranceDetectionMode documentation for details.
recognition_flags nuance.asr.v1.RecognitionFlags Flags to fine tune recognition. See the ASRaaS RecognitionFlags documentation for details.
result_type nuance.asr.v1.EnumResultType Whether final, partial, or immutable results are returned. See the ASRaaS EnumResultType documentation for details.
no_input_timeout_ms uint32 Maximum silence, in ms, allowed while waiting for user input after recognition timers are started. Default (0) means server default, usually no timeout. See the ASRaaS Timers documentation for details.
recognition_timeout_ms uint32 Maximum duration, in ms, of recognition turn. Default (0) means server default, usually no timeout. See the ASRaaS Timers documentation for details.
utterance_end_silence_ms uint32 Minimum silence, in ms, that determines the end of an utterance. Default (0) means server default, usually 500ms or half a second. See the ASRaaS Timers documentation for details.
speech_detection_sensitivity float A balance between detecting speech and noise (breathing, etc.), from 0 to 1. 0 means ignore all noise, 1 means interpret all noise as speech. Default is 0.5. See the ASRaaS Timers documentation for details.
max_hypotheses uint32 Maximum number of n-best hypotheses to return. Default (0) means a server default, usually 10 hypotheses.
end_stream_no_valid_hypotheses bool Determines whether the dialog application or the client application handles the dialog flow when ASRaaS does not return a valid hypothesis. When set to false (default), the dialog flow is determined by the Mix.dialog application, according to the processing defined for the NO_INPUT and NO_MATCH events. To configure the streaming request so that the stream is closed if ASRaaS does not return a valid hypothesis, set to true. See Handling unusable ASR audio for details.
resources nuance.asr.v1.RecognitionResource Repeated. Resources (DLMs, wordsets, builtins) to improve recognition. See the ASRaaS RecognitionResource documentation for details.
speech_domain string Mapping to internal weight sets for language models in the data pack. Values depend on the data pack.
formatting nuance.asr.v1.Formatting Specifies how the transcription results are presented, using keywords for formatting schemes and options supported by the data pack. See ASRaaS Formatting for details.


BackendConnectionSettings

Settings configured for a data access node backend connection.

Field Type Description
fetch_timeout string Number of milliseconds allowed for fetching the data before timing out.
connect_timeout string Connect timeout in milliseconds.


ContinueAction

The continue action provides the client application with information for handling latency or delays when a data access node uses a server-side (backend) data connection. The client application must respond to the continue action to initiate the data access.

Field Type Description
message Message Latency message to be played to the user while waiting for the backend data access.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the Continue action node in the dialog application.
message_settings MessageSettings Settings to be used along with messages returned to the present user.
backend_connection_settings BackendConnectionSettings Backend settings that will be used by DLGaaS for connecting to and fetching from the backend.


DAAction

A Data Access action is associated with a Data access node using client-side data access. It provides the client application with data needed to perform the data access as well as a message to play to the user while waiting for the data access to complete.

Field Type Description
id string ID identifying the Data Access node in the dialog application.
message Message Message to be played to the user while waiting for the data access to complete.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
message_settings MessageSettings Settings to be used along with messages played to the present user.


DialogEvent

Message used to indicate an event that occurred during the dialog interactions.

Field Type Description
type DialogEvent.EventType Type of event being triggered.
message string Optional message providing additional information about the event.
event_name string Name of custom event. Must be set to the name of the custom event defined in Mix.dialog. See Manage events for details. Applies only when DialogEvent.EventType is set to CUSTOM.


DialogEvent.EventType

The possible event types that can occur on the client side of interactions.

Name Number Description
SUCCESS 0 Everything went as expected.
ERROR 1 An unexpected problem occurred.
NO_INPUT 2 End user has not provided any input.
NO_MATCH 3 End user provided unrecognizable input.
HANGUP 4 End user has hung up. Currently used for IVR interactions.
CUSTOM 5 Custom event. You must set field event_name in DialogEvent to the name of the custom event defined in Mix.dialog.
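The rule that event_name is set only for CUSTOM events can be checked before the ExecuteRequest is sent. This Python sketch builds an ExecuteRequestPayload-shaped dict carrying only a DialogEvent (an event takes precedence over user input anyway). The helper name is hypothetical, and RepeatMenu in the usage line stands in for a custom event that would have to exist in your Mix.dialog project.

```python
EVENT_TYPES = {"SUCCESS": 0, "ERROR": 1, "NO_INPUT": 2, "NO_MATCH": 3,
               "HANGUP": 4, "CUSTOM": 5}


def dialog_event_payload(event_type: str, message: str = "",
                         event_name: str = "") -> dict:
    """ExecuteRequestPayload-shaped dict that carries a DialogEvent."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type {event_type!r}")
    if event_type == "CUSTOM" and not event_name:
        raise ValueError("CUSTOM events need the Mix.dialog event_name")
    if event_type != "CUSTOM" and event_name:
        raise ValueError("event_name applies only to CUSTOM events")
    event = {"type": event_type}
    if message:
        event["message"] = message
    if event_name:
        event["event_name"] = event_name
    return {"dialog_event": event}


print(dialog_event_payload("CUSTOM", event_name="RepeatMenu"))  # hypothetical event
```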


EndAction

End action, indicating that the dialog has ended.

Field Type Description
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the End Action node in the dialog application.


EscalationAction

Escalation action to be performed by the client application.

Field Type Description
message Message Message to be played as part of the escalation action.
view View View details for this action.
data google.protobuf.Struct Map of data exchanged in this node.
id string ID identifying the External Action node in the dialog application.


ExecuteRequestPayload

Payload sent with the Execute request. If both an event and a user input are provided, the event has precedence. For example, if an error event is provided, the input will be ignored.

Field Type Description
user_input UserInput Input provided to the Dialog engine.
dialog_event DialogEvent Used to pass in events that can drive the flow. Optional; if an event is not passed, the operation is assumed to be successful.
requested_data RequestData Data that was previously requested by the engine.


ExecuteResponsePayload

Payload returned after the Execute method is called. Specifies the action to be performed by the client application.

Field Type Description
messages Message Repeated. Message action to be performed by the client application.
qa_action QAAction Question and answer action to be performed by the client application.
da_action DAAction Data access action to be performed by the client application for a data access node that uses a client-side data connection.
escalation_action EscalationAction Escalation action to be performed by the client application.
end_action EndAction End action to be performed by the client application.
continue_action ContinueAction Continue action to be performed by the client application for a data access node that uses a server-side data connection.
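Since each response carries messages plus one action, a client typically branches on which action field is present. This Python sketch does that for a JSON-style payload dict, assuming unset fields are simply absent (as with protobuf's `json_format.MessageToDict`, which omits unset message fields by default). The helper name `next_action` is hypothetical.

```python
ACTION_FIELDS = ("qa_action", "da_action", "escalation_action",
                 "end_action", "continue_action")


def next_action(payload: dict):
    """Return (messages, action_name, action) from an ExecuteResponsePayload dict."""
    present = [name for name in ACTION_FIELDS if name in payload]
    if len(present) != 1:
        raise ValueError(f"expected exactly one action, found {present}")
    name = present[0]
    return payload.get("messages", []), name, payload[name]


messages, name, action = next_action({
    "messages": [{"visual": [{"text": "Welcome!"}]}],
    "qa_action": {"message": {"visual": [{"text": "What can I get you?"}]}},
})
print(name, len(messages))
```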


Message

Specifies the message to be played to the user. See Message actions for details.

Field Type Description
nlg Message.Nlg Repeated. Text to be played using Text-to-speech.
visual Message.Visual Repeated. Text to be displayed to the user (for example, in a chat).
audio Message.Audio Repeated. Prompt to be played from an audio file.
view View View details for this message.
language string Message language in xx-XX format, e.g. en-US.
tts_parameters TTSParameters Voice parameters for TTS to be used when TTSaaS is orchestrated separately from DLGaaS.


Message.Audio

Field Type Description
text string Text to be used as TTS backup if the audio file cannot be played.
uri string URI to the audio file, in the following format:
For example: en-US/prompts/default/Omni_Channel_VA/Message_ini_01.wav?version=1.0_1602096507331
See here for more details on how the filename portion is generated.
mask bool When set to true, indicates that the text contains sensitive data that will be masked in logs.
barge_in_disabled bool When set to true, indicates that barge-in is disabled.


TTSParameters

Field Type Description
voice Voice TTSaaS voice to be used.


Voice

Field Type Description
name string The voice's name, e.g. 'Evan'. Mandatory for SynthesisRequest.
model string The voice's quality model, e.g. 'standard' or 'enhanced'. Mandatory for SynthesisRequest.
gender EnumGender Voice gender. Default ANY for SynthesisRequest.
language string Language associated with the voice in xx-XX format, e.g. en-US.


EnumGender

TTSaaS voice gender.

Name Number Description
ANY 0 Any gender voice. Default for SynthesisRequest.
MALE 1 Male voice.
FEMALE 2 Female voice.
NEUTRAL 3 Neutral gender voice.


Message.Nlg

Field Type Description
text string Text to be played using Text-to-speech.
mask bool When set to true, indicates that the text contains sensitive data that will be masked in logs.
barge_in_disabled bool When set to true, indicates that barge-in is disabled.


Message.Visual

Field Type Description
text string Text to be displayed to the user (for example, in a chat).
mask bool When set to true, indicates that the text contains sensitive data that will be masked in logs.
barge_in_disabled bool When set to true, indicates that barge-in is disabled.


MessageSettings

Settings to be used with messages returned by DAAction or ContinueAction.

Field Type Description
delay string Time in ms to wait before presenting user with message.
minimum string Time in ms to display/play message to user.


QAAction

Question and answer action to be performed by the client application.

Field Type Description
message Message Message to be played as part of the question and answer action.
data google.protobuf.Struct Map of data exchanged in this node.
view View View details for this action.
selectable Selectable Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.
recognition_settings RecognitionSettings Configuration information to be used during recognition.
mask bool When set to true, indicates that the Question and Answer node is meant to collect an entity that will hold sensitive data to be masked in logs.


RecognitionSettings

Configuration information to be used during recognition.

Field Type Description
dtmf_mappings DtmfMapping Array of DTMF mappings configured in Mix.dialog.
collection_settings CollectionSettings Collection settings configured in Mix.dialog.
speech_settings SpeechSettings Speech settings configured in Mix.dialog.
dtmf_settings DtmfSettings DTMF settings configured in Mix.dialog.


CollectionSettings

Collection settings configured in Mix.dialog.

Field Type Description
timeout string Time, in ms, to wait for speech once a prompt has finished playing before throwing a NO_INPUT event.
complete_timeout string Duration of silence, in ms, to determine the user has finished speaking. The timer starts when the recognizer has a well-formed hypothesis.
incomplete_timeout string Duration of silence, in ms, to determine the user has finished speaking. The timer starts when the user stops speaking.
max_speech_timeout string Maximum duration, in ms, of an utterance collected from the user.


DtmfMapping

DTMF mappings configured in Mix.dialog. See Set DTMF mappings for details.

Field Type Description
id string Name of the entity to which the DTMF mapping applies.
value string Entity value to map to a DTMF key.
dtmf_key string DTMF key associated with this entity value. Valid values are: 0-9, *, #
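A client that collects DTMF input itself can use the dtmf_mappings in RecognitionSettings to translate a key press into an entity value. This Python sketch shows only the lookup; the helper name and the COFFEE_SIZE entity in the usage example are hypothetical, and how the client then reports the result to Dialog is left out.

```python
VALID_DTMF_KEYS = set("0123456789*#")


def dtmf_lookup(mappings: list, key: str):
    """Return (entity id, entity value) mapped to a DTMF key, or None."""
    if key not in VALID_DTMF_KEYS:
        raise ValueError(f"invalid DTMF key {key!r}")
    for mapping in mappings:
        if mapping["dtmf_key"] == key:
            return mapping["id"], mapping["value"]
    return None


mappings = [  # hypothetical mappings, shaped like RecognitionSettings.dtmf_mappings
    {"id": "COFFEE_SIZE", "value": "small", "dtmf_key": "1"},
    {"id": "COFFEE_SIZE", "value": "large", "dtmf_key": "2"},
]
print(dtmf_lookup(mappings, "2"))
```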


DtmfSettings

DTMF settings configured in Mix.dialog.

Field Type Description
inter_digit_timeout string Maximum time, in ms, allowed between each DTMF character entered by the user.
term_timeout string Maximum time, in ms, to wait for an additional DTMF character before terminating the input.
term_char string Character that terminates a DTMF input.


SpeechSettings

Speech settings configured in Mix.dialog.

Field Type Description
sensitivity string Level of sensitivity to speech. 1.0 means highly sensitive to quiet input, while 0.0 means least sensitive to noise.
barge_in_type string Barge-in type; possible values: "speech" (interrupt a prompt by using any word) and "hotword" (interrupt a prompt by using a specific hotword).
speed_vs_accuracy string Desired balance between speed and accuracy. 0.0 means fastest recognition, while 1.0 means best accuracy.


RequestData

Data that was requested by the dialog application.

Field Type Description
id string ID used by the dialog application to identify which node requested the data.
data google.protobuf.Struct Map of keys to json objects of the data requested.


ResourceReference

Reference object of the resource to use for the request (for example, URN or URL of the model).

Field Type Description
uri string Reference (for example, the URL or URN for the Dialog model).
type ResourceReference.EnumResourceType Type of resource.


ResourceReference.EnumResourceType

Name Number Description
APPLICATION_MODEL 0 Dialog application model.


Selectable

Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.

Field Type Description
selectable_items Selectable.SelectableItem Repeated. Ordered list of interactive elements.


Selectable.SelectableItem

Field Type Description
value Selectable.SelectableItem.SelectedValue Key-value pair of entity information (name and value) for the interactive element. A selected key-value pair is passed in an ExecuteRequest when the user interacts with the element.
description string Description of the interactive element.
display_text string Label to display for this interactive element.
display_image_uri string URI of image to display for this interactive element.


Selectable.SelectableItem.SelectedValue

Field Type Description
id string Name of the entity being collected.
value string Entity value corresponding to the interactive element.
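The round trip for interactive elements can be sketched as follows: the client shows the display_text of each item, and when the user picks one, it sends that item's value back as selected_item. This Python sketch uses JSON-style dicts; the helper names and the COFFEE_TYPE entity in the usage example are hypothetical.

```python
def render_choices(selectable: dict) -> list:
    """Labels for the interactive elements, in the order they were returned."""
    return [item.get("display_text", "")
            for item in selectable.get("selectable_items", [])]


def selected_item_payload(selectable: dict, index: int) -> dict:
    """ExecuteRequestPayload-shaped dict for the element the user picked."""
    item = selectable["selectable_items"][index]
    return {"user_input": {"selected_item": item["value"]}}


selectable = {  # hypothetical QAAction.selectable content
    "selectable_items": [
        {"value": {"id": "COFFEE_TYPE", "value": "espresso"}, "display_text": "Espresso"},
        {"value": {"id": "COFFEE_TYPE", "value": "latte"}, "display_text": "Latte"},
    ]
}
print(render_choices(selectable))
print(selected_item_payload(selectable, 1))
```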


Selector

Provides channel and language used for the conversation. See Selectors for details.

Field Type Description
channel string Optional: Channel that this conversation is going to use (for example, WebVA). Note: Replace any spaces or slashes in the name of the channel with the underscore character (_).
language string Optional: Language to use for this conversation. This sets the language session variable. The format is xx-XX, for example, "en-US".
library string Optional: Library to use for this conversation. Advanced customization reserved for future use. Always use the default value for now, which is default.


StartRequestPayload

Payload sent with the Start request.

Field Type Description
model_ref ResourceReference Reference object for the Dialog model.
data google.protobuf.Struct Session variables data sent in the request as a map of key-value pairs.
suppress_log_user_data bool Set to true to disable logging for ASR, NLU, TTS, and Dialog.


StartResponsePayload

Payload returned after the Start method is called. If a session ID is not provided in the request, a new one is generated and should be used for subsequent calls.

Field Type Description
session_id string Returns session ID to use for subsequent calls.


UpdateRequestPayload

Payload sent with the Update request.

Field Type Description
data google.protobuf.Struct Map of key-value pairs of session variables to update.


TtsParamsV1

Parameters to be forwarded to the TTS service. See Step 4b. Interact with the user (using audio) for details.

Field Type Description
audio_params nuance.tts.v1.
Output audio parameters, such as encoding and volume. See the TTSaaS AudioParameters documentation for details.
voice nuance.tts.v1.Voice The voice to use for audio synthesis. See the TTSaaS Voice documentation for details.


UserInput

Provides input to the Dialog engine. The client application sends either the text collected from the user, to be interpreted by Mix, or an interpretation that was performed externally.

Note: Provide only one of the following fields: user_text, interpretation, selected_item, nluaas_interpretation.

Field Type Description
user_text string Text collected from end user.
interpretation UserInput.Interpretation Interpretation that was done externally (for example, Nuance Recognizer for VoiceXML). This can be used for simple interpretations that include entities with string values only. Use nluaas_interpretation for interpretations that include complex entities.
selected_item Selectable.SelectableItem. Value of the element selected by the end user.
nluaas_interpretation nuance.nlu.v1.InterpretResult Interpretation that was done externally (for example, Nuance Recognizer for VoiceXML), provided in the NLUaaS format. See Interpreting text user input for an example. Note that DLGaaS currently supports only single-intent interpretations.
input_mode string Optional: Input mode. Used for reporting. Current values are dtmf/voice. Applies to user_text and nluaas_interpretation input only.
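Only one of user_text, interpretation, selected_item, and nluaas_interpretation should be provided, so a client can check this before sending. This is a minimal client-side sketch over a dict representation of UserInput, and validate_user_input is a hypothetical helper. Treating input_mode on other input types as an error is a conservative assumption based on the input_mode description. It is not documented server behavior.

```python
from typing import Any, Dict

# Exactly one of these fields should carry the user's input.
INPUT_FIELDS = ("user_text", "interpretation", "selected_item", "nluaas_interpretation")
# input_mode is described as applying only to these input types.
INPUT_MODE_FIELDS = ("user_text", "nluaas_interpretation")


def validate_user_input(user_input: Dict[str, Any]) -> str:
    """Return the name of the single input field that is set, or raise ValueError."""
    present = [name for name in INPUT_FIELDS if user_input.get(name) is not None]
    if len(present) != 1:
        raise ValueError(f"provide exactly one of {INPUT_FIELDS}; got {present}")
    if "input_mode" in user_input and present[0] not in INPUT_MODE_FIELDS:
        # Conservative client-side choice, not a documented server rule.
        raise ValueError("input_mode applies to user_text and nluaas_interpretation only")
    return present[0]


print(validate_user_input({"user_text": "I want to pay my bill", "input_mode": "voice"}))
```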


Sends interpretation data.

Field Type Description
confidence float Required: Value from 0..1 that indicates the confidence of the interpretation.
input_mode string Optional: Input mode. Current values are dtmf/voice (but input mode is not limited to these).
utterance string Raw collected text.
data UserInput.Interpretation. Repeated. Data from the interpretation of intents and entities. For example, INTENT:BILL_PAY or AMOUNT:100.
slot_literals UserInput.Interpretation. Repeated. Slot literals from the interpretation of the entities. The slot literal provides the exact words used by the user. For example, AMOUNT: One hundred dollars.
slot_formatted_literals UserInput.Interpretation. Repeated. Slot formatted literals from the interpretation of the entities.
slot_confidences UserInput.Interpretation. Repeated. Slot confidences from the interpretation of the entities.
alternative_interpretations UserInput.Interpretation Repeated. Alternative interpretations possible from the interaction, that is, the n-best list.
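To illustrate the Interpretation fields, the following sketch builds a simple external interpretation for the BILL_PAY example. It assumes a dict representation in which the repeated entry fields (data, slot_literals, slot_confidences) are modeled as Python dicts, and make_interpretation is a hypothetical helper. Entity values are converted to strings because this message is intended for entities with string values only. For complex entities, use nluaas_interpretation instead.

```python
from typing import Dict


def make_interpretation(
    utterance: str,
    intent: str,
    entities: Dict[str, object],
    literals: Dict[str, str],
    confidence: float,
    slot_confidences: Dict[str, float],
    input_mode: str = "voice",
) -> dict:
    """Build a dict mirroring UserInput.Interpretation for a single-intent result."""
    # Overall and per-entity confidences must lie in the range 0..1.
    for name, value in [("confidence", confidence), *slot_confidences.items()]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}: confidence must be between 0 and 1, got {value}")
    return {
        "confidence": confidence,
        "input_mode": input_mode,
        "utterance": utterance,  # raw collected text
        # The intent and entity values go in data; entry values are strings.
        "data": {"INTENT": intent, **{key: str(val) for key, val in entities.items()}},
        # Exact words the user said for each entity.
        "slot_literals": dict(literals),
        "slot_confidences": dict(slot_confidences),
    }


interpretation = make_interpretation(
    utterance="pay one hundred dollars",
    intent="BILL_PAY",
    entities={"AMOUNT": 100},
    literals={"AMOUNT": "one hundred dollars"},
    confidence=0.92,
    slot_confidences={"AMOUNT": 0.88},
)
print(interpretation["data"])  # {'INTENT': 'BILL_PAY', 'AMOUNT': '100'}
```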


Field Type Description
key string Key of the data.
value string Value of the data.


Field Type Description
key string Name of the entity.
value float Value from 0..1 that indicates the confidence of the interpretation for this entity.


Field Type Description
key string Name of the entity.
value string Literal value of the entity.


Field Type Description
key string Name of the entity.
value string Literal value of the entity.


Specifies view details for this action.

Field Type Description
id string Class or CSS defined for the view details in the node.
name string Type defined for the view details in the node.

Scalar Value Types

.proto Type Notes C++ Type Java Type Python Type
double double double float
float float float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long
uint32 Uses variable-length encoding. uint32 int int/long
uint64 Uses variable-length encoding. uint64 long int/long
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long
sfixed32 Always four bytes. int32 int int
sfixed64 Always eight bytes. int64 long int/long
bool bool boolean boolean
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode
bytes May contain any arbitrary sequence of bytes. string ByteString str
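The Notes column refers to protobuf's base-128 varint encoding. The following sketch computes encoded sizes to show three things: why negative int32 values are costly (they are sign-extended to 64 bits), how the ZigZag mapping used by sint32 avoids that cost, and where fixed32 and fixed64 become smaller than varints. It follows the protobuf encoding rules and counts only the value bytes, not the field tag. The function names are illustrative.

```python
def varint_size(value: int) -> int:
    """Bytes used by protobuf's base-128 varint encoding of a non-negative integer."""
    if value < 0:
        raise ValueError("varints encode non-negative integers; map negatives first")
    size = 1
    while value >= 0x80:  # each byte carries 7 bits of payload
        value >>= 7
        size += 1
    return size


def int32_size(n: int) -> int:
    # int32 and int64 sign-extend negative values to 64 bits before encoding.
    return varint_size(n & 0xFFFFFFFFFFFFFFFF)


def zigzag32(n: int) -> int:
    # sint32 ZigZag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def sint32_size(n: int) -> int:
    return varint_size(zigzag32(n))


print(int32_size(-1), sint32_size(-1))             # 10 1
print(varint_size(2**28 - 1), varint_size(2**28))  # 4 5
```

For example, -1 takes 10 bytes as int32 but only 1 byte as sint32. A uint32 value of 2^28 takes 5 bytes as a varint, one more than the fixed four bytes of fixed32.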

Change log


Updates to Session lifetime. The maximum configurable session time limit has been increased from 24 hours to 72 hours.




The proto files have been updated. To use the new fields:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


The proto files have been updated. To use the new fields:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


A Message returned as part of an ExecuteResponse now includes the current active language for the conversation, which lets the client application detect when the dialog changes the language. The message also includes information about the TTS voice configured for the message: the name of the voice, the quality model, the gender, and the language to which the voice applies.

The TTS voice information is useful if you need to orchestrate with TTSaaS separately from Dialog using a TTSaaS SynthesisRequest.

Being aware when the active language changes is useful if the client application uses a third-party text-to-speech solution.

For more information about handling TTSaaS orchestration in the client application, see Generating synthesized speech output.

The proto files have been updated. To use the new fields:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


Minor updates to the sample app run script in Client app development.


Updates to Disabling logging.


Added new content about Handling DTMF input in IVR applications.


To use the new fields:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use the new field:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.



To use this new method:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use this new method:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use the new fields:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use the new resources field:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use the new input_mode field:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use this feature:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


Added more information about URIs for audio files.


To use this feature:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.





To use this feature:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.



To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.


To use these features:

  1. Download the latest version of the proto files.
  2. Generate the client stubs from the proto files as described in gRPC setup.



First release of this new version.