NLU Model Specification (TRSX)
Mix.nlu supports the creation of a project using an XML file that conforms to the TRSX specification.
The TRSX file format represents a Mix.nlu project in an XML file. A Mix.nlu project consists of an ontology and associated data. The ontology defines the schema of the model, and the data includes training samples and dictionaries. Both parts can be defined in a TRSX file.
TRSX provides the developer flexibility in managing their model data. With TRSX, you can manage an entire model in a single file outside of Mix and import the model into a Mix.nlu project. You can also manage training data in separate TRSX files and import them individually.
TRSX is designed to be a universal file format. Nuance will maintain and update the format with additional features and also add new functionality to Mix.nlu for handling TRSX.
The TRSX specification is defined, owned, and maintained by Nuance Communications.
Overview
TRSX is short for "TRaining Set XML."
A TRSX file has the following main sets of data:
- Metadata
  - Entry nodes with key-value pairs
- Sources
  - List of sources used to label data
- Ontology
  - Intents
  - Entities
  - Intent/Entity relationships, known as Links
  - Entity/Entity relationships, defined as Relations
- Dictionaries
  - List of entities included in a List type entity
- Samples (the training set)
  - Training samples with annotations for entities
The TRSX specification allows for representing a complete Mix.nlu project. Keeping all aspects of the project in a single file is possible and may be simpler in some situations. However, if you're dealing with larger projects, it might be worth keeping the ontology separate (and shared across languages, e.g., ontology.trsx.xml) as well as having separate TRSX files for dictionaries (e.g., music-list.trsx.xml) and samples (e.g., samples-group-a.trsx.xml, samples-group-b.trsx.xml).
This section describes TRSX version 2.6, which is currently deployed in the Mix environment.
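As a sketch of the split-file layout described above, a samples-only file such as samples-group-a.trsx.xml might contain just a project node with a samples child. The intent and entity names are borrowed from the search example later in this section, and the sketch assumes they are already defined in the project, for example through a previously imported ontology.trsx.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- samples-group-a.trsx.xml: hypothetical file holding only training samples. -->
<!-- Assumes the SEARCH intent and SEARCH_QUERY entity exist in the target project. -->
<project xmlns:nuance="https://developer.nuance.com/mix/nlu/trsx"
         xml:lang="en-us"
         nuance:version="2.6">
  <samples>
    <sample intentref="SEARCH" count="1">search for <annotation conceptref="SEARCH_QUERY">cats</annotation></sample>
  </samples>
</project>
```

Because every child of the project node is optional in the schema, a file like this remains a valid TRSX document even though it carries no ontology of its own.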
Specification
Project node
Project XML meta
<project xmlns:nuance="https://developer.nuance.com/mix/nlu/trsx"
         xml:lang="en-us"
         nuance:version="2.6">
  <metadata/>
  <sources/>
  <ontology/>
  <dictionaries/>
  <samples/>
</project>
Project schema
<xs:element name='project'>
<xs:annotation>
<xs:documentation>Mix.nlu Project: Metadata, Sources, Ontology,
Dictionaries &amp; Samples.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='1' ref='metadata'>
<xs:annotation>
<xs:documentation>Metadata for the project.</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element minOccurs='0' maxOccurs='1' ref='sources'>
<xs:annotation>
<xs:documentation>Sources for the project.</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element minOccurs='0' maxOccurs='1' ref='ontology'>
<xs:annotation>
<xs:documentation>Ontology for the project.</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element minOccurs='0' maxOccurs='1' ref='dictionaries'>
<xs:annotation>
<xs:documentation>Dictionary values (entity literals and their values).</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='samples'>
<xs:annotation>
<xs:documentation>Samples part of the training set. Samples may have annotations.</xs:documentation>
</xs:annotation>
</xs:element>
</xs:sequence>
<xs:attribute ref='nuance:version' use='required'/>
<xs:attribute ref='xml:lang' default='en-us'/>
<xs:attribute ref='nuance:enginePackVersion'/>
</xs:complexType>
</xs:element>
A project encapsulates a large part of what is in the Mix.nlu tool. The project node is the primary container for a Mix.nlu model. Projects are defined per language.
Project node specification
The project node is defined as follows:
- The project node contains zero-one metadata, sources, ontology, and dictionaries nodes.
- The project node contains zero-many samples nodes.
A project node has the following attributes:
Attribute | Required? | Description |
---|---|---|
nuance:version | Required | TRSX spec version. For example: 2.6. |
xml:lang | Optional | Language of the project. See Languages and Voice for the list of valid codes. |
nuance:enginePackVersion | Optional | Engine pack version of the project when exported. For example: 3.8. See Engine packs for more information. |
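The following fragment is a minimal sketch of a project node that sets all three attributes, using the example values from the table above. The empty ontology child is only a placeholder.

```xml
<!-- Sketch only: attribute values are the examples given in the attribute table. -->
<project xmlns:nuance="https://developer.nuance.com/mix/nlu/trsx"
         xml:lang="en-us"
         nuance:version="2.6"
         nuance:enginePackVersion="3.8">
  <ontology/>
</project>
```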
Metadata node
Metadata XML meta
<metadata>
  <entry key="str">str</entry>
</metadata>
Metadata schema
<xs:element name='metadata'>
<xs:annotation>
<xs:documentation>Project Metadata: Captures data that is retained upon import.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' name='entry'>
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name='key' use='required'
type='xs:NCName'/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
Real example using metadata
```xml
<metadata>
  <entry key="description">My search engine model</entry>
  <entry key="version">1.0.0</entry>
  <entry key="author">Bennie Jets</entry>
</metadata>
```
A project's metadata lets you manage extra details about your project or TRSX file that are not part of the training data itself, such as author or version.
Metadata node specification
The metadata node is defined as follows:
- The metadata node contains zero-many entry nodes.
- The entry node contains the key-value pair that specifies the metadata.
- The entry node has the attribute key, which is a string.
- The entry's value can be any string.
Any information that you define will be preserved upon import.
Sources node
Sources XML meta
<sources>
  <source name="name" uri="uri" version="string" type="type"/>
</sources>
Sources schema
<xs:element name='sources'>
<xs:annotation>
<xs:documentation>Sources: List of Sources used to label imported data.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='source'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='source'>
<xs:complexType>
<xs:attribute name='name' use='required' type='xs:NCName'/>
<xs:attribute name='displayName' use='optional' type='xs:string'/>
<xs:attribute name='uri' use='optional' type='xs:anyURI'/>
<xs:attribute name='version' use='optional' type='xs:string'/>
<xs:attribute name='type' use='optional' type='source_type'/>
<xs:attribute name='useForOOV' use='optional' type='xs:boolean'/>
</xs:complexType>
</xs:element>
<xs:simpleType name="source_type">
<xs:restriction base="xs:string">
<xs:enumeration value="CUSTOM"/>
<xs:enumeration value="PREBUILT"/>
<xs:enumeration value="REJECTION"/>
</xs:restriction>
</xs:simpleType>
Real example using sources
<sources>
  <source name="DTV_Domain" uri="http://localhost:80/my_local_dtv_domain" version="1.0" type="PREBUILT"/>
  <source name="IOT_Domain" type="CUSTOM"/>
  <source name="NCSRef_Rejection" uri="http://localhost:80/ncs_ref_rejection_model" version="1.0" type="REJECTION"/>
</sources>
The Sources node provides a list of sources used to label imported data to identify its origin. To label data, you set its sourceref attribute to the name of the source.
For example, assuming the following source:
<source name="DTV_Domain" uri="http://localhost:80/my_local_dtv_domain" version="1.0" type="PREBUILT"/>
To label an entity with this source, you would define it as follows:
<concept name="CHANNEL" sourceref="DTV_Domain"/>
All data not explicitly associated with a source declared in this node will be associated with a default source named nuance_custom_data. Note that this default source is not visible in the TRSX file.
Source node specification
The sources node contains zero-many source nodes.
The source node has the following attributes:
Attribute | Required? | Description |
---|---|---|
name | Required | Name of the source data. For example, DTV_Domain. |
displayName | Optional | Name of the source as displayed in the Mix.nlu interface. |
uri | Optional | URI of the source. For example, http://localhost:80/my_local_dtv_domain. |
version | Optional | Version of the source. For example, 1.0. |
type | Optional | Type of source. Can be one of the following: CUSTOM (Custom data), PREBUILT (Pre-built model), REJECTION (Rejection model). |
useForOOV | Optional | If the useForOOV attribute is true, the source sample nuance_custom_data is used for DLM building. Default is true. |
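As an illustration of the optional attributes, the sketch below declares a custom source with a display name and useForOOV turned off, then labels an intent and a sample with it through sourceref. The TURN_ON_LIGHT intent, the display name, and the sample text are hypothetical.

```xml
<!-- Sketch only: TURN_ON_LIGHT, the displayName value, and the sample text are made up. -->
<project xmlns:nuance="https://developer.nuance.com/mix/nlu/trsx"
         xml:lang="en-us"
         nuance:version="2.6">
  <sources>
    <source name="IOT_Domain" displayName="IoT domain" type="CUSTOM" useForOOV="false"/>
  </sources>
  <ontology>
    <intents>
      <!-- sourceref must match the name attribute of a declared source -->
      <intent name="TURN_ON_LIGHT" sourceref="IOT_Domain"/>
    </intents>
  </ontology>
  <samples>
    <sample intentref="TURN_ON_LIGHT" sourceref="IOT_Domain">turn on the light</sample>
  </samples>
</project>
```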
Ontology node
Ontology XML meta
<ontology>
  <intents>
    <intent name="str">
      <links>
        <link conceptref="str"/>
      </links>
    </intent>
  </intents>
  <concepts>
    <concept name="str" freetext="bool">
      <settings>
        <setting name="str" value="str"/>
      </settings>
      <relations>
        <relation type="str" conceptref="str" />
      </relations>
    </concept>
  </concepts>
</ontology>
Ontology schema
<xs:element name='ontology'>
<xs:annotation>
<xs:documentation>Project Ontology: Data contract consisting of Intents, Entities, and their relations and properties.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='1' ref='intents'/>
<xs:element minOccurs='0' maxOccurs='1' ref='concepts'/>
</xs:sequence>
<xs:attribute name='base' use='optional' type='xs:anyURI'/>
</xs:complexType>
</xs:element>
<xs:element name='intents'>
<xs:annotation>
<xs:documentation>Ontology Intents: App use cases, utilized as classifications for samples.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='intent'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='intent'>
<xs:annotation>
<xs:documentation>Intent: A particular use case, e.g. "turn_tv_on" or "stop_microwave".</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='1' ref='links'/>
</xs:sequence>
<xs:attribute name='name' use='required' type='xs:NCName'/>
<xs:attribute name='sourceref' use='optional' type='xs:NCName'/>
</xs:complexType>
</xs:element>
<xs:element name='links'>
<xs:annotation>
<xs:documentation>An intent is linked to a set of entities. Samples within the intent can be annotated only with the linked entities.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='link'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='link'>
<xs:annotation>
<xs:documentation>Intent link: Specific entities that are applicable in the context of the defined Intent.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:attribute name='conceptref' use='required' type='xs:NCName'/>
<xs:attribute name='sourceref' use='optional' type='xs:NCName'/>
</xs:complexType>
</xs:element>
<xs:element name='concepts'>
<xs:annotation>
<xs:documentation>Ontology entities: Specific details as part of the given use case, to guide decision making.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='concept'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='concept'>
<xs:annotation>
<xs:documentation>Entity: Possibly list based (static or dynamic) or freetext - these are parsed within the context of an Intent.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:all>
<xs:element minOccurs='0' maxOccurs='1' ref='settings'/>
<xs:element minOccurs='0' maxOccurs='1' ref='regex'/>
<xs:element minOccurs='0' maxOccurs='1' ref='relations'/>
</xs:all>
<xs:attribute name='name' use='required' type='xs:NCName'/>
<xs:attribute name='dataType' use='optional' type='data_type'/>
<xs:attribute name='freetext' use='optional' type='xs:boolean' default='false'/>
<xs:attribute name='dynamic' use='optional' type='xs:boolean' default='false'/>
<xs:attribute name='ruleGrammarFileName' use='optional' type='xs:string'/>
<xs:attribute name='sourceref' use='optional' type='xs:NCName'/>
</xs:complexType>
</xs:element>
<xs:simpleType name='data_type'>
<xs:restriction base='xs:string'>
<xs:enumeration value='not_set'/>
<xs:enumeration value='no_format'/>
<xs:enumeration value='yes_no'/>
<xs:enumeration value='boolean'/>
<xs:enumeration value='number'/>
<xs:enumeration value='digits'/>
<xs:enumeration value='alphanum'/>
<xs:enumeration value='date'/>
<xs:enumeration value='time'/>
<xs:enumeration value='amount'/>
<xs:enumeration value='distance'/>
<xs:enumeration value='temperature'/>
</xs:restriction>
</xs:simpleType>
<xs:element name='regex'/>
<xs:element name='relations'>
<xs:annotation>
<xs:documentation>Entity Relations: It is possible to inherit from an entity in some fashion. The following types can be described: isA, hasA, and hasReferrers. </xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='relation'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='relation'>
<xs:annotation>
<xs:documentation>Relation: Type of linkage between entities -- isA, hasA, or hasReferrers.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:attribute name='type' use='required' type='relation_type'/>
<xs:attribute name='conceptref' use='required' type='xs:NCName'/>
<xs:attribute name='sourceref' use='optional' type='xs:NCName'/>
</xs:complexType>
</xs:element>
<xs:simpleType name='relation_type'>
<xs:restriction base='xs:string'>
<xs:enumeration value='isA'/>
<xs:enumeration value='hasA'/>
<xs:enumeration value='hasReferrers'/>
</xs:restriction>
</xs:simpleType>
<xs:element name='settings'>
<xs:annotation>
<xs:documentation>Entity Settings: Properties on the entity that can be altered, e.g. "canonicalize".</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='setting'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='setting'>
<xs:annotation>
<xs:documentation>Entity Setting: Specific name-value pair to assign to the Entity.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:attribute name='name' use='required'/>
<xs:attribute name='value' use='required'/>
</xs:complexType>
</xs:element>
Real example using ontology. In this example, there is one use case, search, and two entities are used as part of that use case: the query and the engine to use when searching.
<ontology base="https://developer.nuance.com/mix/nlu/trsx/ontology-1.0.xml">
  <intents>
    <intent name="SEARCH">
      <links>
        <link conceptref="SEARCH_ENGINE"/>
        <link conceptref="SEARCH_QUERY"/>
      </links>
    </intent>
  </intents>
  <concepts>
    <concept name="SEARCH_QUERY" freetext="true"/>
    <concept name="SEARCH_ENGINE"/>
  </concepts>
</ontology>
An ontology is a formal specification of the semantic schema of your model.
In Mix.nlu, the ontology includes the set of intents and entities, and the relationships between them. The intents and entities you define in the ontology are used to annotate your training data, and thus form the interface between the NLU and your client application.
The ontology is the central schema for organizing your model and its sample data. The intents, entities, and associations between them are all stored in the ontology.
This section first introduces the terminology and notions that apply to intents and entities. The specification is then provided.
Intent-entity links
Each intent is linked to zero or more entities. These are the entities that you can use when annotating entities in training samples that belong to the intent. They are also the full set of entities that can be returned in the JSON NLU results for this intent. You can think of the links as parameters to a method.
Samples labeled with an intent will only allow annotation of entities that are linked to the intent.
Entity data types
The following entity data types can be included in a TRSX file:
Type | Description |
---|---|
not_set | Data type is not set. |
no_format | Text data without any special format. Refers to the Generic data type in Mix.nlu and Mix.dialog. |
yes_no | Yes or no. |
boolean | True or false. |
number | A numerical quantity. |
digits | A sequence of digits from 0-9. |
alphanum | A sequence of letters or numbers, A-Z, a-z, 0-9. |
date | A YYYYMMDD date. |
time | An HHMM time. |
amount | A quantity with units, defined by the magnitude and units. |
distance | A measure of distance, including magnitude and distance unit. |
temperature | A measure of temperature, including possibly signed magnitude and units. |
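A data type is assigned through the dataType attribute of a concept node. The fragment below is a sketch of that syntax only. The entity names are hypothetical, and it does not show any relations or dictionaries that a real project might also need for these entities.

```xml
<!-- Sketch only: entity names are illustrative; dataType values come from the table above. -->
<concepts>
  <concept name="PAYMENT_DATE" dataType="date"/>
  <concept name="PAYMENT_AMOUNT" dataType="amount"/>
  <concept name="CONFIRMATION" dataType="yes_no"/>
  <concept name="ACCOUNT_ID" dataType="alphanum"/>
</concepts>
```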
Entity collection methods
A collection method describes how the set of possible values of an entity is enumerated or defined.
Entity collection methods are as follows:
Type | Description |
---|---|
List | Default collection method of entities with data type no_format (generic). |
Dynamic List | If the dynamic attribute is true, the entity is defined as a Dynamic List. Default is false. |
Freeform | If the freetext attribute is true, the entity is defined as a Freeform entity. Default is false. |
Rule-based | Rule-based entities are defined with a GrXML grammar. To define an entity as rule-based, specify the name of the GrXML file in the ruleGrammarFileName attribute. |
Relationship | Entities can be extended by relationships to other entities. To define a relationship entity, do not assign freetext or dynamic. Instead, use a relations node. |
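The sketch below shows one concept node per collection method, to make the attribute combinations concrete. All entity names and the grammar file name are hypothetical.

```xml
<!-- Sketch only: entity names and the .grxml file name are hypothetical. -->
<concepts>
  <!-- List: neither freetext nor dynamic is set -->
  <concept name="SEARCH_ENGINE" dataType="no_format"/>
  <!-- Dynamic List -->
  <concept name="CONTACT_NAME" dynamic="true"/>
  <!-- Freeform -->
  <concept name="SEARCH_QUERY" freetext="true"/>
  <!-- Rule-based: points at a GrXML grammar file -->
  <concept name="ACCOUNT_NUMBER" ruleGrammarFileName="account_number.grxml"/>
  <!-- Relationship: no freetext or dynamic, only a relations node -->
  <concept name="ORIGIN_CITY">
    <relations>
      <relation type="isA" conceptref="CITY"/>
    </relations>
  </concept>
  <concept name="CITY"/>
</concepts>
```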
Predefined entities
Nuance predefined entities can be used simply by referencing them via conceptref in ontology relation nodes and in sample annotations.
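As a sketch, a custom entity might reference a predefined entity through an isA relation as shown below. The name nuance_CALENDARX is assumed here to be a predefined date and time entity, so check the predefined entity list for the names available in your environment. DEPARTURE_DATE is hypothetical.

```xml
<!-- Sketch only: nuance_CALENDARX is an assumed predefined entity name; DEPARTURE_DATE is made up. -->
<concepts>
  <concept name="DEPARTURE_DATE">
    <relations>
      <!-- The predefined entity is referenced by name; it is not declared as a concept here. -->
      <relation type="isA" conceptref="nuance_CALENDARX"/>
    </relations>
  </concept>
</concepts>
```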
Entity relationships
Entities can have three types of relationships with other entities.
Type | Description |
---|---|
isA | An entity can be related to zero-one entity with the isA relationship. |
hasA | An entity can be related to zero-many entities with the hasA relationship. |
hasReferrers | An entity can be referred to as a moment (REF_MOMENT), a person (REF_PERSON), a place (REF_PLACE), or a thing (REF_THING). See Anaphoras for details. |
Note: isA and hasA are mutually exclusive.
For example, consider a flight application with the following entities: city, origin_city, destination_city, and itinerary. The following relationships could be defined:
- origin_city isA city
- destination_city isA city
- itinerary hasA origin_city
- itinerary hasA destination_city
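The relationships listed above could be expressed in the ontology roughly as follows. This is a sketch of the concepts node only, using the entity names from the flight example.

```xml
<!-- Sketch of the flight example: each entity uses either isA or hasA, never both. -->
<concepts>
  <concept name="city"/>
  <concept name="origin_city">
    <relations>
      <relation type="isA" conceptref="city"/>
    </relations>
  </concept>
  <concept name="destination_city">
    <relations>
      <relation type="isA" conceptref="city"/>
    </relations>
  </concept>
  <concept name="itinerary">
    <relations>
      <relation type="hasA" conceptref="origin_city"/>
      <relation type="hasA" conceptref="destination_city"/>
    </relations>
  </concept>
</concepts>
```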
Ontology node specification
The ontology node is defined as follows:
- The ontology node contains zero-one intents node and concepts node.
- The ontology node contains the optional attribute base, which is currently not used.
Intents node
The intents node contains the ontology intents and is defined as follows:
- The intents node contains zero-many intent nodes.
- Each intent node defines a specific intent and contains zero-one links node.
Each intent node has the following attributes:
Attribute | Required? | Description |
---|---|---|
name | Required | Name of the intent. For example: FIND_MOVIE. |
sourceref | Optional | Source of the intent. For example: DTV_Domain. |
Links node
An intent is linked to a set of entities. The links node describes the entities that can be used in sample annotations for this intent and are returned in the JSON results.
The links node contains zero-many link nodes.
Each link node has the following attributes:
Attribute | Required? | Description |
---|---|---|
conceptref | Required | Name of the entity associated with this intent. This entity is defined in a concept node (see below). |
sourceref | Optional | Source of the link. For example: DTV_Domain. |
Concepts node
The concepts node contains the ontology entities and is defined as follows:
- The concepts node contains zero-many concept nodes.
- Each concept node defines a single entity.
- Each concept node contains zero-one settings, relations, and regex node.
Each concept node has the following attributes:
Attribute | Required? | Description |
---|---|---|
name | Required | Name of the entity. |
dataType | Optional | Type of data the entity contains. See Entity data types for more information. |
freetext | Optional | true for a freeform entity; otherwise, false (default). See Entity collection methods. |
dynamic | Optional | true for a dynamic entity; otherwise, false (default). See Entity collection methods. |
ruleGrammarFileName | Optional | For rule-based entities, specifies the name of the GrXML file that defines the entity and its relative location in the TRSX file. See Entity collection methods. |
sourceref | Optional | Source of the entity. For example: DTV_Domain. |
Relations node
The relations node specifies the relation between entities. Relations can be of type isA, hasA, or hasReferrers. See Entity relationships for more information.
The relations node contains zero-many relation nodes.
Each relation node has the following attributes:
Attribute | Required? | Description |
---|---|---|
type | Required | Type of relation. Can be of type isA, hasA, or hasReferrers. See Entity relationships. |
conceptref | Required | Name of entity to which the relation applies. |
sourceref | Optional | Source of the relation. For example: DTV_Domain. |
Notes:
- The relations node cannot contain two isA relations.
- isA and hasA are mutually exclusive.
Settings node
Real example using settings
<ontology base="http://localhost:8080/resources/ontology-1.0.xml">
  <concepts>
    <concept name="PAYMENT_DATE"/>
    <concept name="CCV">
      <settings>
        <setting name="isSensitive" value="true"/>
      </settings>
    </concept>
    <concept name="CREDIT_CARD_NUMBER">
      <settings>
        <setting name="isSensitive" value="true"/>
      </settings>
    </concept>
  </concepts>
</ontology>
The settings node defines settings that apply to the entity.
Setting | Required? | Description |
---|---|---|
isSensitive | Optional | When true, user sensitive data is masked in logs. Default value is false. |
canonicalize | Optional | Specifies whether canonicalization is enabled or not. When true, values are returned for entities in the JSON results. Set to false if values are not required, for example, for performance optimization. Default value is true. |
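Both settings use the same name-value syntax. The sketch below turns canonicalization off for one entity and marks another as sensitive. The setting names come from the table above, while the choice of entities is only illustrative.

```xml
<!-- Sketch only: setting names are from the settings table; entity names are illustrative. -->
<concepts>
  <concept name="SEARCH_ENGINE">
    <settings>
      <!-- Values are not needed in the results for this entity -->
      <setting name="canonicalize" value="false"/>
    </settings>
  </concept>
  <concept name="CREDIT_CARD_NUMBER">
    <settings>
      <setting name="isSensitive" value="true"/>
    </settings>
  </concept>
</concepts>
```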
Regex node
The regex node defines the regex pattern for regex-based entities. For example, a phone number entity might have the following regex node: (\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}
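As a sketch of how such a pattern might sit in the ontology, the fragment below places it as the text content of a regex child of the concept node. This placement is an assumption: the schema declares the regex element without a type, so it does not say where the pattern goes. PHONE_NUMBER is a hypothetical entity name.

```xml
<!-- Sketch only: assumes the pattern is the text content of the regex element. -->
<concept name="PHONE_NUMBER">
  <regex>(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}</regex>
</concept>
```

A pattern that contains XML special characters such as < or & would need to be escaped (for example as &lt; or &amp;) or wrapped in a CDATA section to keep the file well-formed.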
Dictionaries node
Dictionaries XML meta
<dictionaries>
  <dictionary conceptref="str">
    <entry literal="str" value="str"/>
  </dictionary>
</dictionaries>
Dictionaries schema
<xs:element name='dictionaries'>
<xs:annotation>
<xs:documentation>Dictionaries: Instances of list type Entities (static or dynamic type), that can have canonical values.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='dictionary'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='dictionary'>
<xs:annotation>
<xs:documentation>Entity Dictionary: Entity will have entries defining the entities.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' name='entry' type='dictionary_entry'/>
</xs:sequence>
<xs:attribute name='conceptref' use='required' type='xs:NCName'/>
</xs:complexType>
</xs:element>
<xs:complexType name='dictionary_entry'>
<xs:annotation>
<xs:documentation>Entity Dictionary Entry: Entity that has a 'literal', i.e. the surface form, and the 'value' for ancillary use.</xs:documentation>
</xs:annotation>
<xs:attribute name='literal' use='required'/>
<xs:attribute name='value' use='optional'/>
<xs:attribute name='protected' use='optional' type='xs:boolean' default='false'/>
<xs:attribute name='sourceref' use='optional' type='xs:NCName'/>
</xs:complexType>
Real example using dictionaries. The dictionaries consist of literal-value pairs that relate to a specific entity. This example showcases an entity, SEARCH_ENGINE, which is defined as a List-style entity. The value is optional. However, the entries for wikipedia and duck duck go show how the value attribute can be used to map several literals to one canonical value.
<dictionaries>
  <dictionary conceptref="SEARCH_ENGINE">
    <entry literal="bing" value="bing"/>
    <entry literal="duck duck go" value="duckduckgo"/>
    <entry literal="duckduckgo" value="duckduckgo"/>
    <entry literal="google" value="google"/>
    <entry literal="wiki" value="wikipedia"/>
    <entry literal="wikipedia" value="wikipedia"/>
    <entry literal="wikipédia" value="wikipedia"/>
    <entry literal="yahoo" value="yahoo"/>
  </dictionary>
</dictionaries>
A List entity can have an associated dictionary. The dictionary is the list of spoken forms that correspond to entities or 'mentions' that are part of the entity. For example, a City entity can have literals such as "New York City", "New York", and "The Big Apple".
A dictionary entry always has a literal and can optionally have a value.
The literal represents the exact spoken text that is present within an utterance (what the user said). For example, in the query “I’d like a large t-shirt”, the literal corresponding to the entity [TSHIRT_SIZE] is “large”. Other literals might be “small”, “medium”, “big”, “very big”, and “extra large”. When you annotate samples, you select a range of text to tag with an entity.
The value corresponds to what is returned in the JSON NLU results whenever the input utterance has a matching literal for this entity, through a process called canonicalization. Values can be any string you want. For example, a CoffeeType entity could have the literals "coffee" and "americano", and both literals would correspond to the value "americano".
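The CoffeeType example above could be written as the following dictionary. This is a sketch that assumes a List entity named CoffeeType is defined in the ontology.

```xml
<!-- Sketch only: assumes a List entity named CoffeeType exists in the ontology. -->
<dictionaries>
  <dictionary conceptref="CoffeeType">
    <!-- Two different literals canonicalize to the same value -->
    <entry literal="coffee" value="americano"/>
    <entry literal="americano" value="americano"/>
  </dictionary>
</dictionaries>
```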
Dictionaries node specification
The dictionaries node is defined as follows:
- The dictionaries node contains zero-many dictionary nodes.
- Each dictionary node contains zero-many entry nodes.
- Each dictionary node contains a required attribute conceptref, which defines the entity the entries apply to.
Each entry node has the following attributes:
Attribute | Required? | Description |
---|---|---|
literal | Required | Text that is present within an utterance. The literal is tokenized and normalized. |
value | Optional | Value returned in the JSON NLU results. See Dictionaries node for information about values. |
protected | Optional | Used to identify data that is confidential and should not be exposed. Default value is false. Do not set this field to true, otherwise you will not be able to access the data. |
sourceref | Optional | Source of the entry. For example: DTV_Domain. |
Samples node
Sample XML meta
<samples>
  <sample intentref="str" description="str" count="int" excluded="bool">
    token1 token2 token3
    <annotation conceptref="str">
      token4 <annotation conceptref="str">token5</annotation>
    </annotation>
    token6
  </sample>
</samples>
Sample schema
<xs:element name='samples'>
<xs:annotation>
<xs:documentation>Samples: Training set data that is used to train on; data has weights, intent classifications, and may be excluded.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='sample'/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name='sample'>
<xs:annotation>
<xs:documentation>Sample: A specific instance of an utterance in the training set. The utterance may have annotations.</xs:documentation>
</xs:annotation>
<xs:complexType mixed='true'>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='annotation'/>
</xs:sequence>
<xs:attribute name='description' use='optional' type='xs:string'/>
<xs:attribute name='count' use='optional' type='xs:integer'/>
<xs:attribute name='intentref' use='optional' type='xs:NCName'/>
<xs:attribute name='excluded' use='optional' type='xs:boolean' default='false'/>
<xs:attribute name='fullyVerified' use='optional' type='xs:boolean' default='false'/>
<xs:attribute name='protected' use='optional' type='xs:boolean' default='false'/>
<xs:attribute name='sourceref' use='optional' type='xs:NCName'/>
</xs:complexType>
</xs:element>
<xs:element name='annotation'>
<xs:annotation>
<xs:documentation>Sample Annotation: Annotations are used to tag entities present within utterances.</xs:documentation>
</xs:annotation>
<xs:complexType mixed='true'>
<xs:sequence>
<xs:element minOccurs='0' maxOccurs='unbounded' ref='annotation'/>
</xs:sequence>
<xs:attribute name='conceptref' use='required' type='xs:NCName'/>
</xs:complexType>
</xs:element>
Real example using samples. This example shows several samples, each annotated with entities and labeled with a single intent, the search use case.
<sample intentref="SEARCH" count="1">I'd like to find<annotation conceptref="SEARCH_QUERY">good coffee places nearby</annotation>on<annotation conceptref="SEARCH_ENGINE">bing</annotation>
</sample>
<sample intentref="SEARCH" count="1">look up<annotation conceptref="SEARCH_QUERY">Edward Snowden</annotation>on<annotation conceptref="SEARCH_ENGINE">duckduckgo</annotation>
</sample>
<sample intentref="SEARCH" count="1">
<annotation conceptref="SEARCH_ENGINE">google</annotation>
<annotation conceptref="SEARCH_QUERY">how long does a sequoia live</annotation>?
</sample>
The samples consist of phrases or sentences that are used to train your NLU model. Samples are labeled with intents and annotated with entities.
Samples node specification
The samples node is defined as follows:
- The samples node contains zero-many sample nodes.
- Each sample node contains zero-many annotation nodes.
- Each annotation node specifies an annotation for the sample. It has the required conceptref attribute, which specifies the entity for the annotation.
- The annotation node can be nested if the ontology concept has the appropriate relations defined.
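As a sketch of nested annotations, the sample below reuses the flight entities from the Entity relationships section, where itinerary hasA origin_city and destination_city. The BOOK_FLIGHT intent is hypothetical and is assumed to be linked to the itinerary entity.

```xml
<!-- Sketch only: BOOK_FLIGHT is hypothetical; itinerary is assumed to have hasA relations
     to origin_city and destination_city in the ontology. -->
<samples>
  <sample intentref="BOOK_FLIGHT" count="1">book a flight <annotation conceptref="itinerary">from <annotation conceptref="origin_city">Boston</annotation> to <annotation conceptref="destination_city">Denver</annotation></annotation></sample>
</samples>
```

The outer annotation covers the whole span "from Boston to Denver", and the inner annotations tag the parts of that span that belong to the related entities.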
Each sample node has the following attributes:
Attribute | Required? | Description |
---|---|---|
description | Optional | Description of the sample. |
count | Optional | Relative frequency of this sample being spoken in your application. |
intentref | Optional | The intent it expresses or is part of. |
excluded | Optional | Specifies whether the sample should be excluded from the training set when building a model. Default value is false, i.e. the sample is included. |
fullyVerified | Optional | Specifies whether the sample has been assigned an intent and annotation is complete and verified as correct. Default value is false, i.e. sample is not annotation-assigned. See Verify samples for details. |
protected | Optional | Used to identify data that is confidential and should not be exposed. Default value is false. Do not set this field to true, otherwise you will not be able to access the data. |
sourceref | Optional | Source of the sample. For example: DTV_Domain. |
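The sketch below combines several of these attributes on two samples for the SEARCH intent from the earlier examples. The description text, the counts, and the second utterance are made up for illustration.

```xml
<!-- Sketch only: description text, counts, and the second utterance are illustrative. -->
<samples>
  <!-- A frequent, verified sample that is used in training -->
  <sample intentref="SEARCH" count="5" description="Most common phrasing" fullyVerified="true">search for <annotation conceptref="SEARCH_QUERY">cats</annotation></sample>
  <!-- A sample kept in the file but left out of the training set -->
  <sample intentref="SEARCH" count="1" excluded="true">search for <annotation conceptref="SEARCH_QUERY">asdf</annotation></sample>
</samples>
```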
Sample TRSX
search-v1.trsx.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns:nuance="https://developer.nuance.com/mix/nlu/trsx" xml:lang="en-US" nuance:version="2.0">
<metadata>
<entry key="description">Sample model with a freeform entity and a list entity</entry>
<entry key="short_name">Search Engine Query Sample Model</entry>
<entry key="source">Nuance Communications</entry>
<entry key="type">sample</entry>
<entry key="version">1.0.0</entry>
</metadata>
<ontology base="https://developer.nuance.com/mix/nlu/trsx/ontology-1.0.xml">
<intents>
<intent name="SEARCH">
<links>
<link conceptref="SEARCH_ENGINE"/>
<link conceptref="SEARCH_QUERY"/>
</links>
</intent>
</intents>
<concepts>
<concept name="SEARCH_QUERY" freetext="true"/>
<concept name="SEARCH_ENGINE"/>
</concepts>
</ontology>
<dictionaries>
<dictionary conceptref="SEARCH_ENGINE">
<entry literal="bing" value="bing"/>
<entry literal="duck duck go" value="duckduckgo"/>
<entry literal="duckduckgo" value="duckduckgo"/>
<entry literal="google" value="google"/>
<entry literal="wiki" value="wikipedia"/>
<entry literal="wikipedia" value="wikipedia"/>
<entry literal="wikipédia" value="wikipedia"/>
<entry literal="yahoo" value="yahoo"/>
</dictionary>
</dictionaries>
<samples>
<sample intentref="SEARCH" count="1">I'd like to find<annotation conceptref="SEARCH_QUERY">good coffee places nearby</annotation>on<annotation conceptref="SEARCH_ENGINE">bing</annotation>
</sample>
<sample intentref="SEARCH" count="1">look up<annotation conceptref="SEARCH_QUERY">Edward Snowden</annotation>on<annotation conceptref="SEARCH_ENGINE">duckduckgo</annotation>
</sample>
<sample intentref="SEARCH" count="1">
<annotation conceptref="SEARCH_ENGINE">google</annotation>
<annotation conceptref="SEARCH_QUERY">how long does a sequoia live</annotation>?</sample>
<sample intentref="SEARCH" count="1">find me some<annotation conceptref="SEARCH_QUERY">cat pictures</annotation>on<annotation conceptref="SEARCH_ENGINE">bing</annotation>
</sample>
<sample intentref="SEARCH" count="1">search<annotation conceptref="SEARCH_QUERY">cheap flights</annotation>in<annotation conceptref="SEARCH_ENGINE">yahoo</annotation>
</sample>
<sample intentref="SEARCH" count="1">I'd like to search for<annotation conceptref="SEARCH_QUERY">the list of lists of lists</annotation>on<annotation conceptref="SEARCH_ENGINE">wikipedia</annotation>please</sample>
<sample intentref="SEARCH" count="1">search<annotation conceptref="SEARCH_QUERY">do a barrel roll</annotation>on<annotation conceptref="SEARCH_ENGINE">google</annotation>
</sample>
<sample intentref="SEARCH" count="1">search for<annotation conceptref="SEARCH_QUERY">chinese chicken salad recipes</annotation>
</sample>
<sample intentref="SEARCH" count="1">search for<annotation conceptref="SEARCH_QUERY">cats</annotation>
</sample>
<sample intentref="SEARCH" count="1">search for<annotation conceptref="SEARCH_QUERY">cat toys</annotation>on<annotation conceptref="SEARCH_ENGINE">yahoo</annotation>
</sample>
<sample intentref="SEARCH" count="1">search for<annotation conceptref="SEARCH_QUERY">thai noodles</annotation>on<annotation conceptref="SEARCH_ENGINE">bing</annotation>
</sample>
<sample intentref="SEARCH" count="1">search<annotation conceptref="SEARCH_QUERY">how to fix a leaky tap</annotation>
</sample>
<sample intentref="SEARCH" count="1">search<annotation conceptref="SEARCH_ENGINE">google</annotation>for<annotation conceptref="SEARCH_QUERY">a new car</annotation>
</sample>
<sample intentref="SEARCH" count="1">search on<annotation conceptref="SEARCH_ENGINE">google</annotation>for<annotation conceptref="SEARCH_QUERY">tree pruning</annotation>
</sample>
</samples>
</project>
The use case supported in this sample is search. This example showcases freetext entities and list entities.
This sample TRSX file has the following characteristics:
- Defined as an XML file using the TRSX 2.0 specification
- Contains metadata pertaining to the nature of the project
- Contains an ontology with a single use case: SEARCH
- The SEARCH use case supports specifying a SEARCH_ENGINE and a SEARCH_QUERY
- The SEARCH_QUERY entity is a freetext type entity
- The SEARCH_ENGINE entity is of list type, and various engines have been enumerated and canonicalized
- The samples present are all annotated with the respective entities
Import errors
Mix.nlu may return errors and warnings during the import.
Unless specified otherwise (see IMPORT_ERROR_TRSX_PARSING), the import process will complete and then return the errors or warnings, if any. The difference between these two severity levels is as follows:
- Error: A serious issue occurred. You may want to fix it and re-import the file.
- Warning: The import process was successful, but a non-critical issue occurred.
To solve the issues
Each error message below provides a recommended action for solving the issue. You have two options for solving issues:
- Fix the error using the Mix.nlu tool. For example, if a sample was imported but not annotated, you can annotate it in the Mix.nlu tool. This is the recommended method.
- Fix the error in the TRSX file and re-import it. However, this solution is not optimal since you will get warnings for all the items that have already been imported.
ANNOTATION_ERROR_ANNOTATING_WITH_CONCEPT
Unable to annotate a sample's tokens with an entity.
The severity and action depend on the additional message provided:
Message 1: "Annotation with concept entity_name has been skipped because the entity is hidden"
Description: The sample was annotated with a hidden entity. This can happen, for example, if you try to annotate a sample with an entity that is used to define a predefined entity (for example, if you annotate a sample with entity nuance_QUANTITY_ABS, which is a sub-entity of the nuance_QUANTITY predefined entity).
Severity: Warning.
Result: The sample was added without annotation.
Action: Annotate your samples with an entity that is defined in your ontology. To get the list of entities that you can use:
- Log in to the Mix Nuance Dev portal.
- Select your project.
- Go to Mix.nlu.
- Use an entity that is listed in the Entities column.
Message 2: "Annotating a full sample with a freetext concept is not allowed"
Description: The complete sample was annotated with an entity of type freetext.
Severity: Error.
Result: The sample was added without annotation.
Action: Apply the freetext entity to a meaningful phrase inside the sample, not to the whole sample.
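As an illustration of Message 2, the fragment below reuses the SEARCH intent and the freetext SEARCH_QUERY entity from the sample TRSX above. The sample text is invented for this sketch. The first sample should trigger the error because the annotation covers the entire sample; the second annotates only the query phrase.

```xml
<samples>
  <!-- Triggers the error: the freetext annotation spans the whole sample,
       so the sample is added without annotation -->
  <sample intentref="SEARCH" count="1"><annotation conceptref="SEARCH_QUERY">cheap flights to boston</annotation></sample>
  <!-- Carrier words stay outside the freetext annotation -->
  <sample intentref="SEARCH" count="1">search for <annotation conceptref="SEARCH_QUERY">cheap flights to boston</annotation></sample>
</samples>
```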
ANNOTATION_ERROR_ANNOTATING_WITH_INTENT
Unable to annotate a sample's tokens with an intent.
The severity and action depend on the additional message provided:
Message 1: "Intent intent_name not recognized"
Description: The sample was annotated with an intent that does not exist.
Severity: Warning.
Result: The sample was added without annotation.
Action: Annotate your sample with an intent that is defined in your ontology. To get the list of intents that you can use:
- Log in to the Mix Nuance Dev portal.
- Select your project.
- Go to Mix.nlu.
- Use an intent that is listed in the Intents area.
Message 2: "Intent was not provided"
Description: The intent name was not specified in the sample.
Severity: Warning.
Result: The sample was added without annotation.
Action: Annotate the sample with an existing intent.
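The two messages above can be pictured with the following fragment, which reuses names from the sample TRSX. It is a sketch under two assumptions: that a misspelled intentref (here the invented value SERCH) produces Message 1, and that omitting the intentref attribute is what produces Message 2.

```xml
<samples>
  <!-- Message 1 (assumed trigger): SERCH is not an intent in the ontology -->
  <sample intentref="SERCH" count="1">search for <annotation conceptref="SEARCH_QUERY">cats</annotation></sample>
  <!-- Message 2 (assumed trigger): no intentref attribute -->
  <sample count="1">search for <annotation conceptref="SEARCH_QUERY">cat toys</annotation></sample>
  <!-- Corrected: intentref names an intent defined under <intents> -->
  <sample intentref="SEARCH" count="1">search for <annotation conceptref="SEARCH_QUERY">cat pictures</annotation></sample>
</samples>
```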
IMPORT_ERROR_TRSX_PARSING
The TRSX file you are trying to import is invalid.
The action depends on the additional message provided:
Message 1: "Failed to create data sources! Please check content of your TRSX file"
Description: There was an issue with the Sources section of the TRSX. The URI may be malformed or the data source already exists.
Severity: Error.
Result: The TRSX file was not imported.
Action: Fix the issue and try again.
Other messages: "An unexpected error occurred while loading a '.trsx' file", "Could not load XML document", or "Unable to parse TRSX file"
Description: The file is not a well-formed XML document.
Severity: Error.
Result: The TRSX file was not imported.
Action: Verify the validity of your XML file using an online XML validator. You can download the TRSX schema here. Try the import again when your file is valid.
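Two frequent causes of a file that is not well-formed are an unescaped & in sample text and an annotation element that is never closed. These are general XML rules rather than TRSX-specific ones. The fragment below reuses names from the sample TRSX with invented sample text and shows the well-formed versions.

```xml
<samples>
  <!-- A literal ampersand in sample text must be written as an entity reference -->
  <sample intentref="SEARCH" count="1">search for <annotation conceptref="SEARCH_QUERY">salt &amp; pepper shakers</annotation></sample>
  <!-- Each annotation element is closed before the sample element closes -->
  <sample intentref="SEARCH" count="1">look up <annotation conceptref="SEARCH_QUERY">tom &amp; jerry</annotation> on <annotation conceptref="SEARCH_ENGINE">google</annotation></sample>
</samples>
```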
IMPORT_WARNING_DUPLICATE_SAMPLE_LITERAL
Description: By default, you cannot include duplicate samples in your TRSX file, unless the allowDuplicate query parameter is enabled. If this parameter is not enabled and the TRSX file includes duplicate samples (even if they have different annotations), this message is returned. An additional message similar to the following may also be provided:
- "This sample already exists in the imported model data_source_name and cannot be modified."
- "The sample already exists in the imported model data_source_name in intent_name."
- "The sample already exists in intent_name."
- "The sample already exists in the imported model data_source_name but isn't assigned to any intent."
- "The sample already exists in the model but isn't assigned to any intent."
Severity: Error.
Result: The sample was not imported.
Action: Modify the duplicate sample or enable the allowDuplicate query parameter.
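For illustration, the fragment below shows a pair of samples that would count as duplicates under the description above: the literal text is the same and only the annotation differs. Names are taken from the sample TRSX.

```xml
<samples>
  <sample intentref="SEARCH" count="1">search for <annotation conceptref="SEARCH_QUERY">cats</annotation></sample>
  <!-- Same literal text as the sample above, without annotation:
       still a duplicate unless allowDuplicate is enabled -->
  <sample intentref="SEARCH" count="1">search for cats</sample>
</samples>
```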
IMPORT_WARNING_IGNORED_LINE
Description: A line was ignored because it is empty or commented out.
Severity: Warning.
Result: The line was ignored.
Action: No action.
NORMALIZATION_ERROR_QA_CHECKER_VALIDATION
Description: A serious error occurred when tokenizing a sample. This can occur when the annotations are not compliant with the tokens returned by the tokenizing system.
For example, consider the following sample:
pie à la mode
The tokenizer splits this utterance into the following tokens: pie and à la mode
This means that the whole phrase à la mode must be annotated, and not only mode. If you annotate this sample as follows:
<sample intentref="order_dessert" count="1"><annotation conceptref="dessert_type">pie</annotation> à la <annotation conceptref="dessert_flavor">mode</annotation></sample>
The NORMALIZATION_ERROR_QA_CHECKER_VALIDATION will be generated.
Severity: Error.
Result: The sample was added without annotation.
Action: When you encounter this error, you can try to annotate the problematic sample in the Mix.nlu user interface, as follows:
- Log in to the Mix Nuance Dev portal.
- Create a project with the appropriate language.
- Go to Mix.nlu.
- Select NO_INTENT.
- Type the problematic sample and see how you can apply annotations.
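Based on the tokenization described above, an annotation that respects the token boundaries covers the whole à la mode token. The sketch below reuses the intent and entity names from the failing sample; the actual tokenization for your language can be confirmed in the Mix.nlu user interface using the steps above.

```xml
<sample intentref="order_dessert" count="1"><annotation conceptref="dessert_type">pie</annotation> <annotation conceptref="dessert_flavor">à la mode</annotation></sample>
```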
ONTOLOGY_ERROR_CREATING_CONCEPT
Unable to create an entity.
The severity and action depend on the additional message provided:
Message 1: "An exception occurred while creating concept entity_name: A concept cannot have multiple modifier properties active simultaneously"
Description: An entity cannot be of type freetext and dynamic at the same time.
Severity: Error.
Result: The entity was not added.
Action: Specify a single entity type.
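A sketch of the conflict behind Message 1. It assumes that the dynamic modifier is written as a dynamic attribute on the concept element, in the same way as the freetext attribute shown in the sample TRSX; the entity name PLAYLIST_NAME is hypothetical.

```xml
<concepts>
  <!-- Expected to fail with Message 1: freetext and dynamic are both active
       (the dynamic attribute is an assumption in this sketch) -->
  <concept name="PLAYLIST_NAME" freetext="true" dynamic="true"/>
  <!-- A single modifier only, as in the sample TRSX -->
  <concept name="SEARCH_QUERY" freetext="true"/>
</concepts>
```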
Message 2: "An exception occurred while creating concept entity_name: message_details"
Description: An internal error occurred while creating the entity.
Severity: Error.
Result: The entity was not added.
Action: Please contact Nuance Support.
Message 3: "An invalid data source source_name has been provided for concept entity_name"
Description: The data source specified does not exist.
Severity: Error.
Result: The entity was not added.
Action: Make sure to specify a name for a source that already exists.
ONTOLOGY_ERROR_CREATING_INTENT
Unable to create an intent.
The severity and action depend on the additional message provided:
Message 1: "An invalid data source source_name has been provided for intent intent_name"
Description: The data source specified does not exist.
Severity: Error.
Result: The intent was not created.
Action: Make sure to specify a name for a source that already exists.
Message 2: "An exception occurred while creating intent intent_name: message_details"
Description: An internal error occurred while creating the intent.
Severity: Error.
Result: The intent was not added.
Action: Please contact Nuance Support.
ONTOLOGY_ERROR_CREATING_RELATIONSHIP
Unable to create a relationship between entities.
The severity and action depend on the additional message provided:
Message 1: "An invalid data source source_name has been provided for 'hasA' relationship between intent intent_name and concept entity_name" or "An invalid data source source_name has been provided for hasA/isA relationship between parent node parent_name and child node child_name."
Description: The data source specified does not exist.
Severity: Error.
Result: The relationship was not created.
Action: Make sure to specify a name for a source that already exists.
Message 2: "Error while retrieving intent intent_name from the ontology in order to create a 'hasA' relationship"
Description: The intent specified for the relationship does not exist.
Severity: Error.
Result: The relationship was not created.
Action: Specify an intent that is defined in your ontology. To get the list of intents that you can use:
- Log in to the Mix Nuance Dev portal.
- Select your project.
- Go to Mix.nlu.
- Use an intent that is listed in the Intents area.
Message 3: "Error while retrieving node node_name from the ontology in order to create a hasA/isA relationship" or "Error while retrieving either node_name or node_name node from the ontology in order to create a hasA/isA relationship"
Description: The entity or intent specified for the relationship does not exist.
Severity: Error.
Result: The relationship was not created.
Action: Specify an existing entity or intent.
Message 4: "Error occurred while creating 'hasA' relationship between intent intent_name and concept entity_name" or "Error occurred while creating relationship between parent node parent_name and child node child_name"
Description: An internal error occurred while creating the relationship.
Severity: Error.
Result: The relationship was not added.
Action: Please contact Nuance Support.
ONTOLOGY_ERROR_CREATING_SAMPLE
Unable to create a sample.
The severity and action depend on the additional message provided:
Message 1: "Sample sample exceeded maximum length of length_limit"
Description: The sample provided is too long. A sample can contain a maximum of 400 characters.
Severity: Error.
Result: The sample was not created.
Action: Edit the sample so that it includes no more than 400 characters.
Message 2: "Impossible to create sample - The literal is empty."
Description: The sample provided is empty.
Severity: Error.
Result: The sample was not created.
Action: Specify a non-empty sample.
Message 3: "An invalid data source source_name has been provided for sample sample"
Description: The data source specified does not exist.
Severity: Error.
Result: The sample was not created.
Action: Make sure to specify a name for a source that already exists.
Message 4: "Annotation on sample sample_literal with concept entity_name failed"
Description: An internal error occurred while creating the sample.
Severity: Error.
Result: The sample was not added.
Action: Please contact Nuance Support.
SEMANTIC_ERROR_ADDING_PATTERN
Unable to add a pattern (i.e., an entity's dictionary entry).
The severity and action depend on the additional message provided:
Message 1: "Error while retrieving concept entity_name from the ontology in order to add patterns - pattern_name."
Description: The entity name specified could not be found.
Severity: Error.
Result: The pattern was not added since the entity could not be found.
Action: Make sure to add the patterns to an entity that is defined in your ontology. To get the list of entities that you can use:
- Log in to the Mix Nuance Dev portal.
- Select your project.
- Go to Mix.nlu.
- Use an entity that is listed in the Entities column.
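As an example of Message 1, the fragment below assumes the ontology from the sample TRSX, which defines SEARCH_ENGINE. The first dictionary uses a misspelled conceptref (an invented typo), so its entries have no entity to attach to; the second matches the declared name.

```xml
<dictionaries>
  <!-- Not found: the ontology defines SEARCH_ENGINE, not SERCH_ENGINE -->
  <dictionary conceptref="SERCH_ENGINE">
    <entry literal="bing" value="bing"/>
  </dictionary>
  <!-- Matches the concept name declared under <concepts> -->
  <dictionary conceptref="SEARCH_ENGINE">
    <entry literal="bing" value="bing"/>
  </dictionary>
</dictionaries>
```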
Message 2: "Concept entity_name has some relationships and must not be associated to literals"
Description: The entity specified has relationships, so you cannot add a dictionary entry to it.
Severity: Error.
Result: The pattern was not added.
Action: Make sure to add the patterns to an entity that does not have relationships.
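A sketch of the situation behind Message 2, shown as a fragment from inside the project node. The relations and relation markup and the entity names CITY and DEPARTURE_CITY are assumptions for illustration and are not shown in this section; check them against the Ontology node specification before relying on them.

```xml
<concepts>
  <concept name="CITY"/>
  <!-- Assumed markup: DEPARTURE_CITY has an isA relationship to CITY -->
  <concept name="DEPARTURE_CITY">
    <relations>
      <relation type="isA" conceptref="CITY"/>
    </relations>
  </concept>
</concepts>
<dictionaries>
  <!-- Expected to fail: DEPARTURE_CITY has a relationship, so it cannot take literals -->
  <dictionary conceptref="DEPARTURE_CITY">
    <entry literal="montreal" value="montreal"/>
  </dictionary>
  <!-- Literals belong on the entity without relationships -->
  <dictionary conceptref="CITY">
    <entry literal="montreal" value="montreal"/>
  </dictionary>
</dictionaries>
```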
Message 3: "An invalid data source source_name has been provided for pattern pattern_name"
Description: The data source specified does not exist.
Severity: Error.
Result: The pattern was not added.
Action: Make sure to specify a name for a source that already exists.
Message 4: "Pattern pattern_name has been dropped because it cannot be associated with concept entity_name"
Description: You are trying to add a pattern to a predefined entity. This is not allowed.
Severity: Warning.
Result: The pattern was not added.
Action: No action.
Message 5: "Error while adding pattern pattern_name to concept entity_name"
Description: The tokenized version of the pattern is a duplicate of an existing pattern. For example, consider the following patterns:
- it's
- it 's (notice the extra space after "it").
While these patterns are different in their raw format, once tokenized they are identical ("it's"). This is not allowed.
Severity: Warning.
Result: The pattern was not added.
Action: No action.
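The two patterns from Message 5 would look like this as dictionary entries. The entity name CONTRACTION and the value it_is are hypothetical and used only for this sketch.

```xml
<dictionary conceptref="CONTRACTION">
  <entry literal="it's" value="it_is"/>
  <!-- Tokenizes to the same pattern as the entry above, so it is not added -->
  <entry literal="it 's" value="it_is"/>
</dictionary>
```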
Other messages: "Internal error - no pattern was tokenized!", "Couldn't tokenize/normalize pattern: pattern_name"
Description: An internal error occurred.
Severity: Error.
Result: The pattern was added but was not tokenized.
Action: Please contact Nuance Support.
TRSX schema (XSD)
You can download the following schemas to validate your TRSX XML file:
Change log
2022-09-28
- TRSX file was updated and is now at version 2.6.
- dataType attribute included in Concepts node specification.
- Added entity data types section to the Ontology node.
- Renamed former entity types section to entity collection methods.
- Added predefined entities section to Ontology node.
2022-02-23
- Added the isSensitive attribute to the Settings node specification.
- Added a settings node schema (code sample).
2022-01-31
- TRSX file was updated.
- Nuance XSD file was updated.
- Added the nuance:enginePackVersion attribute to the Project node specification.
- Added the useForOOV attribute to the Source node specification.
2020-08-11
- TRSX file was updated.
2020-07-14
- TRSX file was updated and is now at version 2.5.
- Added the fullyVerified attribute to the Samples node specification.
2020-06-24
- TRSX file was updated.
- Added rule-based entity collection method.
2020-03-31
- TRSX file was updated and is now at version 2.4.
- Added regex node.
2019-12-11
Below are changes made to the NLU Model Specification (TRSX) documentation since the initial Beta release:
- "Concepts" were renamed to "Entities".