Working with data packs
Data packs are maintained by Nuance speech scientists to remain current with popular vocabulary and language use.
Mix uses two types of data packs when building models for your applications:
- Nuance Data Pack (NDP), also known as a core data pack, used for automatic speech recognition and tokenization. Each data pack provides a base language model that enables the speech recognition engine and the text processing engine to recognize and transcribe the most common terms and constructs in the language and locale. The main identifier for this type of data pack is the locale plus a topic (gen, for General, in most cases). Additional topics are available upon request to provide a specialized, yet still general, knowledge of a domain or specific area of interest. Nuance data packs are used by both the ASR and the NLU engines. When you add a new project in Mix, you select the data pack topic and locale to use. If specialized topics are available to your organization, they are listed for selection as well.
- QuickNLP (QNLP) data pack, used for semantic understanding. These data packs enable the natural language engine to derive intent and reveal the meaning behind text and spoken input, by leveraging AI-based speech technology and powerful machine learning models. QuickNLP data packs are used by the NLU engine.
Currently, this section covers the QNLP data pack versions available in Mix, including the predefined entities they support. More information about NDP models will be added in the future.
Understanding the link between projects and QNLP data pack versions
Each project is associated with a specific version of the QuickNLP/NLU data pack. When you create a new project, it is associated with the current data pack version, and it keeps using that version when you rebuild your model. For example, if the QNLP data pack version is 6.6.11 when you create a project, your model continues to be built with version 6.6.11 even after newer data pack versions become available.
Data pack versions may not be fully backward-compatible. As a result, if you create a new project and import into it the TRSX content of a project associated with an older data pack version, you may encounter issues.
For example, data packs contain predefined entities, which save you the trouble of defining entities that are common to many applications, such as monetary amounts, Boolean values, calendar items (dates, times, or both), cardinal and ordinal numbers, and so on. Let's say that the nuance_DURATION predefined entity changed between the 5.x and 6.x data packs. An application with a 5.x data pack would expect a response of {"DURATION_ABS":{"HOUR":2}}, but the new data pack would return a different format: {"DURATION_ABS":{"UNIT":"hour","NUMBER":2}}. This change may break your application.
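If an application has to cope with a format change like the hypothetical nuance_DURATION one above, one option is to read entity values through a small adapter that accepts both shapes. The Python sketch below assumes exactly the two illustrative formats from this example (a unit name used as a key, such as `HOUR`, versus a `UNIT`/`NUMBER` pair); the function `read_duration_abs` and the `_OLD_UNIT_KEYS` table are hypothetical, and real data packs may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical unit keys that the "old" illustrative format uses directly as keys.
_OLD_UNIT_KEYS = {"HOUR": "hour", "MINUTE": "minute", "SECOND": "second", "DAY": "day"}


def read_duration_abs(value: Dict[str, Any]) -> Tuple[float, str]:
    """Return (number, unit) from either illustrative DURATION_ABS shape.

    Old shape (assumed): {"DURATION_ABS": {"HOUR": 2}}
    New shape (assumed): {"DURATION_ABS": {"UNIT": "hour", "NUMBER": 2}}
    """
    abs_part = value["DURATION_ABS"]
    if "UNIT" in abs_part and "NUMBER" in abs_part:
        return float(abs_part["NUMBER"]), abs_part["UNIT"]
    for key, unit in _OLD_UNIT_KEYS.items():
        if key in abs_part:
            return float(abs_part[key]), unit
    raise ValueError(f"Unrecognized DURATION_ABS shape: {abs_part!r}")


old = {"DURATION_ABS": {"HOUR": 2}}
new = {"DURATION_ABS": {"UNIT": "hour", "NUMBER": 2}}
print(read_duration_abs(old) == read_duration_abs(new))  # True
```

An adapter like this only smooths over format changes you already know about; it does not replace retesting an application after moving it to a different data pack version.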
Predefined entities
This section describes the predefined entities that are included with Mix QuickNLP (QNLP) data pack version 6.x.
Some notes:
- All Nuance predefined entities are namespaced with "nuance_", including all subnodes.
- Some entities may not be available in all languages.
- Mix.dialog does not support the nuance_CALENDARX predefined entity. If you want users to provide either a date or a time in a single conversation turn, use separate DATE and TIME entities. See Predefined entities in Mix.dialog.
Numeric value ranges for predefined entities
Most predefined entities use a mix of regular expression (regex) and GrXML grammars. In general, GrXML grammars parse expressions written as words (for example, "five degrees Celsius"), and regex grammars parse expressions that include Arabic numerals, punctuation, or symbols (such as "5°C"). However, GrXML grammars can also parse expressions with Arabic numerals (for example, "December 10 2018"), and a regex may parse expressions that include words (for example, "-5 degrees" or "5:15 tomorrow").
For regex grammars, there is no limit on the number of digits in an Arabic numeral, for any entity that has regex values in the data pack. This applies to all languages.
For GrXML grammars, the accepted range of numeric values is given for each predefined entity (where applicable). These ranges are based on US English, and coverage may vary by language. For example, in languages where large numbers are expressed as compound words (such as Finnish and the Germanic languages), the ranges may be narrower, because a grammar cannot list every possible compound for large numbers. Languages that have not yet been used in a deployed application may also have narrower coverage, because less feedback is available for them.
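Because only spoken-word (GrXML) input is bounded, an application or test suite can flag values that fall outside the documented coverage. The table in this Python sketch copies the US English upper bounds listed in the entity sections of this page; the names `GRXML_MAX_EN_US` and `within_spoken_coverage` are hypothetical, and other languages may have narrower limits.

```python
# Upper bounds for spoken-word (GrXML) input, copied from the US English
# "Numeric value range" notes on this page. Arabic numerals parsed by regex
# have no digit limit, so these bounds apply to spoken words only.
GRXML_MAX_EN_US = {
    "nuance_AMOUNT": 99_999_999_999,
    "nuance_CARDINAL_NUMBER": 99_999_999,
    "nuance_DISTANCE": 99_999_999,
    "nuance_DURATION": 99_999_999,
    "nuance_ORDINAL_NUMBER": 31,
    "nuance_QUANTITY": 99_999_999,  # documented for nuance_QUANTITY_REL
    "nuance_TEMPERATURE": 9_999,  # whole numbers only
}


def within_spoken_coverage(entity: str, value: float, spoken_as_words: bool) -> bool:
    """Return True if the value is inside the documented coverage for its input form."""
    if not spoken_as_words:
        return True  # regex grammars: no limit on the number of digits
    limit = GRXML_MAX_EN_US.get(entity)
    if limit is None:
        raise KeyError(f"No documented GrXML range for {entity}")
    # Compare magnitudes; treating negative values this way is an assumption.
    return abs(value) <= limit
```

For example, `within_spoken_coverage("nuance_ORDINAL_NUMBER", 32, True)` is `False`, while the same value typed as "32nd" (regex input) passes.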
nuance_AMOUNT
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_AMOUNT": {
"type": "object",
"properties": {
"nuance_UNIT": {
"type": "string"
},
"nuance_NUMBER": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
]
}
},
"required": [
"nuance_UNIT",
"nuance_NUMBER"
]
}
},
"additionalProperties": false
}
A sum of money intended for use in banking actions and payments. The currency depends on the grammar. For example, with the en-US grammar, the only accepted currency is the US dollar.
Sample language:
- five dollars
- $4.75
For example, "five dollars" returns this response (formatted in JSON):
{ "nuance_NUMBER":5, "nuance_UNIT":"USD" }
Numeric value range
<= 99,999,999,999
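A minimal Python sketch for turning the nuance_AMOUNT response shown above into an exact decimal amount plus a currency code. It assumes the flat response shape from the "five dollars" example (`nuance_NUMBER` and `nuance_UNIT` at the top level) and that `nuance_NUMBER` may arrive as a number or as a numeric string; `parse_amount` is a hypothetical helper name.

```python
import json
from decimal import Decimal
from typing import Tuple


def parse_amount(entity_json: str) -> Tuple[Decimal, str]:
    """Return (amount, currency) from a nuance_AMOUNT value."""
    value = json.loads(entity_json)
    # Going through str() keeps floats such as 4.75 from carrying binary
    # rounding noise into the Decimal.
    amount = Decimal(str(value["nuance_NUMBER"]))
    currency = str(value["nuance_UNIT"])
    return amount, currency


amount, currency = parse_amount('{ "nuance_NUMBER":5, "nuance_UNIT":"USD" }')
print(amount, currency)  # 5 USD
```

`Decimal` is used instead of `float` because monetary values usually need exact arithmetic downstream.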
nuance_BOOLEAN
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_BOOLEAN": {
"type": "boolean"
}
},
"additionalProperties": false
}
Declaration of true or false.
Sample language:
- yes
- that's absolutely right
- it is not
- not really
nuance_CALENDARX
{
"$schema": "http://json-schema.org/draft-04/schema#",
"definitions": {
"__main__.TimeAbs": {
"type": "object",
"properties": {
"nuance_TIME_ABS": {
"type": "object",
"properties": {
"nuance_AMPM": {
"type": "string",
"pattern": "[am|pm]"
},
"nuance_MINUTE": {
"type": "number"
},
"nuance_HOUR": {
"type": "number"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
},
"additionalProperties": false
},
"__main__.TimeRel": {
"type": "object",
"properties": {
"nuance_TIME_REL": {
"type": "object",
"properties": {
"nuance_INCREMENT": {
"type": "number"
},
"nuance_STEP": {
"type": "string",
"pattern": "[hour|minute]"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
},
"additionalProperties": false
},
"__main__.DateAbs": {
"type": "object",
"properties": {
"nuance_DATE_ABS": {
"type": "object",
"properties": {
"nuance_DAY": {
"type": "number"
},
"nuance_MONTH": {
"type": "number"
},
"nuance_YEAR": {
"type": "number"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
},
"additionalProperties": false
},
"__main__.DateRel": {
"type": "object",
"properties": {
"nuance_DATE_REL": {
"type": "object",
"properties": {
"nuance_NAMED_DAY": {
"type": "string"
},
"nuance_DAY_OF_WEEK": {
"type": "number",
"pattern": "[1-7]"
},
"nuance_INCREMENT": {
"type": "number"
},
"nuance_STEP": {
"type": "string",
"pattern": "[day|week|month|year]"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
},
"additionalProperties": false
},
"__main__.Date": {
"type": "object",
"properties": {
"nuance_DATE": {
"anyOf": [
{
"$ref": "#/definitions/__main__.DateAbs"
},
{
"$ref": "#/definitions/__main__.DateRel"
}
]
}
},
"additionalProperties": false
},
"__main__.Time": {
"type": "object",
"properties": {
"nuance_TIME": {
"anyOf": [
{
"$ref": "#/definitions/__main__.TimeAbs"
},
{
"$ref": "#/definitions/__main__.TimeRel"
}
]
}
},
"additionalProperties": false
},
"__main__.CalendarRange": {
"type": "object",
"properties": {
"nuance_CALENDAR_RANGE_START": {
"nuance_CALENDAR": {
"anyOf": [
{
"$ref": "#/definitions/__main__.Date"
},
{
"$ref": "#/definitions/__main__.Time"
}
]
}
},
"nuance_CALENDAR_RANGE_END": {
"nuance_CALENDAR": {
"anyOf": [
{
"$ref": "#/definitions/__main__.Date"
},
{
"$ref": "#/definitions/__main__.Time"
}
]
}
}
},
"additionalProperties": false
}
},
"type": "object",
"properties": {
"nuance_CALENDARX": {
"type": "object",
"properties": {
"oneOf": [
{
"nuance_CALENDAR": {
"anyOf": [
{
"$ref": "#/definitions/__main__.Date"
},
{
"$ref": "#/definitions/__main__.Time"
}
]
},
"nuance_CALENDAR_RANGE": {
"$ref": "#/definitions/__main__.CalendarRange"
}
}
]
}
}
},
"additionalProperties": false
}
Defines a discrete calendar event in terms of date, time, or both. Additionally, the calendar event may be in absolute or relative terms. Named holidays are also represented.
Sample language:
- July
- 5:00
- around 5:30
- in 9 hours
- within the next 2 hours
- in the first week of april
- between 1pm and 4pm
- from Monday to Friday
- Christmas
Examples
The input "from Monday to Friday" returns this response (formatted in JSON):
{ "nuance_CALENDAR_RANGE": { "nuance_CALENDAR_RANGE_START": { "nuance_CALENDAR": { "nuance_DATE": { "nuance_DATE_REL": { "nuance_INCREMENT": 0, "nuance_DAY_OF_WEEK": 2 } } } }, "nuance_CALENDAR_RANGE_END": { "nuance_CALENDAR": { "nuance_DATE": { "nuance_DATE_REL": { "nuance_INCREMENT": 0, "nuance_DAY_OF_WEEK": 6 } } } } } }
The input "Christmas" returns this response (in JSON):
{ "nuance_CALENDAR": { "nuance_DATE": { "nuance_DATE_ABS": { "nuance_DAY": 25, "nuance_MONTH": 12 } } } }
Numeric value range
- For nuance_CALENDARX, nuance_DATE_REL, and nuance_TIME_REL: <= 120. (This grammar also covers Arabic numerals, for example, "in 120 hours".)
- For nuance_YEAR: 1900-2050 (as Arabic numerals in GrXML for English, and as compounds for de-DE).
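In the "from Monday to Friday" example, nuance_DAY_OF_WEEK is 2 for Monday and 6 for Friday, which suggests that 1 is Sunday. The Python sketch below walks a nuance_CALENDAR_RANGE response and names the start and end weekdays under that inferred numbering; the numbering and the helper names `weekday_name` and `range_weekdays` are assumptions, not documented behavior.

```python
import json
from typing import Optional, Tuple

# Assumed numbering, inferred from the example above: 1 = Sunday ... 7 = Saturday.
_DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def weekday_name(day_of_week: int) -> str:
    if not 1 <= day_of_week <= 7:
        raise ValueError(f"nuance_DAY_OF_WEEK out of range: {day_of_week}")
    return _DAY_NAMES[day_of_week - 1]


def _day_of_week(bound: dict) -> Optional[int]:
    """Extract nuance_DAY_OF_WEEK from a range start/end node, if present."""
    date = bound.get("nuance_CALENDAR", {}).get("nuance_DATE", {})
    return date.get("nuance_DATE_REL", {}).get("nuance_DAY_OF_WEEK")


def range_weekdays(response: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (start, end) weekday names; None where a bound has no day of week."""
    rng = response["nuance_CALENDAR_RANGE"]
    names = []
    for key in ("nuance_CALENDAR_RANGE_START", "nuance_CALENDAR_RANGE_END"):
        dow = _day_of_week(rng.get(key, {}))
        names.append(weekday_name(dow) if dow is not None else None)
    return names[0], names[1]


example = json.loads(
    '{ "nuance_CALENDAR_RANGE": { "nuance_CALENDAR_RANGE_START": { "nuance_CALENDAR": '
    '{ "nuance_DATE": { "nuance_DATE_REL": { "nuance_INCREMENT": 0, "nuance_DAY_OF_WEEK": 2 } } } }, '
    '"nuance_CALENDAR_RANGE_END": { "nuance_CALENDAR": { "nuance_DATE": { "nuance_DATE_REL": '
    '{ "nuance_INCREMENT": 0, "nuance_DAY_OF_WEEK": 6 } } } } } }'
)
print(range_weekdays(example))  # ('Monday', 'Friday')
```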
nuance_CARDINAL_NUMBER
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_CARDINAL_NUMBER": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
]
}
},
"additionalProperties": false
}
A non-fractional, whole number denoting quantity (1, 2, 3) as opposed to an ordinal number (denoting order like first, second, third). Cardinal numbers can be described in natural speech up to and including a million, but any number larger than that must be dictated as a string of digits.
Choose nuance_CARDINAL_NUMBER instead of nuance_NUMBER or nuance_DOUBLE for numerical entities that must have no fraction or decimal point.
Sample language:
- 27
- forty three
- oh one two three
Numeric value range
<= 99,999,999
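The schema allows nuance_CARDINAL_NUMBER to be either a number or a string, so consuming code should accept both. This Python sketch converts either form to an `int` and rejects values with a fractional part. When the string form appears is not documented here, so the handling of digit strings (for example, a leading zero as in "0123") is an assumption; `to_cardinal` is a hypothetical name.

```python
from typing import Union


def to_cardinal(value: Union[int, float, str]) -> int:
    """Convert a nuance_CARDINAL_NUMBER value (number or string) to int."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("A Boolean is not a cardinal number")
    if isinstance(value, str):
        text = value.strip()
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"Not a whole number: {value!r}")
        return int(text)  # note: leading zeros, as in "0123", are dropped
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"Not a whole number: {value!r}")
    return int(value)
```

If the digits themselves matter (for example, an identifier read out digit by digit), keep the original string instead of converting it.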
nuance_DISTANCE
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_DISTANCE": {
"type": "object",
"properties": {
"oneOf": [{
"nuance_DISTANCE_ABS": {
"type": "object",
"properties": {
"nuance_UNIT": {
"type": "string",
"pattern": "[km|m|cm|mm|mi|yd|ft|in]"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "fuzzy"
},
"nuance_NUMBER": {
"anyOf": [{
"type": "number"
},
{
"type": "string"
}
]
}
}
}
},
{
"nuance_DISTANCE_REL": {
"type": "object",
"properties": {
"nuance_DISTANCE_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"anyOf": [{
"type": "number"
},
{
"type": "string"
}
]
},
"nuance_UNIT": {
"type": "string",
"pattern": "[km|m|cm|mm|mi|yd|ft|in]"
}
}
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[GT|LT|GE|LE|EQ|INC]"
}
},
"required": ["nuance_MODIFIER"]
}
}
]
}
}
},
"additionalProperties": false
}
Amount of space between two things or people.
Sample language:
- five meters
- 42 km
- seven inches
Examples
The input "more than three km" returns this response (JSON format):
{ "nuance_DISTANCE_REL": { "nuance_DISTANCE_ABS": { "nuance_UNIT": "km", "nuance_NUMBER": 3 }, "nuance_MODIFIER": "GT" } }
The input "two and a half meters" returns (in JSON):
{ "nuance_DISTANCE_ABS": { "nuance_UNIT": "m", "nuance_NUMBER": 2.5 } }
Numeric value range
<= 99,999,999
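A Python sketch that normalizes nuance_DISTANCE values to meters, following the response shapes in the two examples above (the modifier sits next to nuance_DISTANCE_ABS inside nuance_DISTANCE_REL). The conversion factors are the exact international definitions for the units in the schema; the function name `distance_in_meters` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Exact conversion factors to meters for the units listed in the schema.
_TO_METERS = {
    "km": 1000.0,
    "m": 1.0,
    "cm": 0.01,
    "mm": 0.001,
    "mi": 1609.344,
    "yd": 0.9144,
    "ft": 0.3048,
    "in": 0.0254,
}


def distance_in_meters(value: Dict) -> Tuple[float, Optional[str]]:
    """Return (meters, modifier) for a nuance_DISTANCE_ABS or nuance_DISTANCE_REL value.

    The modifier is "fuzzy" or None for absolute distances, or a comparison
    code such as "GT" for relative ones.
    """
    if "nuance_DISTANCE_REL" in value:
        rel = value["nuance_DISTANCE_REL"]
        abs_part, modifier = rel["nuance_DISTANCE_ABS"], rel.get("nuance_MODIFIER")
    else:
        abs_part = value["nuance_DISTANCE_ABS"]
        modifier = abs_part.get("nuance_MODIFIER")
    number = float(abs_part["nuance_NUMBER"])  # schema allows number or string
    return number * _TO_METERS[abs_part["nuance_UNIT"]], modifier
```

With the "more than three km" response, this returns `(3000.0, "GT")`; with "two and a half meters", it returns `(2.5, None)`.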
nuance_DOUBLE
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_DOUBLE": {
"anyOf": [
{
"type": "number"
},
{
"type": "string"
}
]
}
},
"additionalProperties": false
}
Fractions and decimal-point numbers.
Sample language:
- 2.7
- one and two tenths
- point 5
Numeric value range
- For the whole part and decimal part in decimal-point numbers: <= 99,999,999
- For fractions: denominators up to tenths
nuance_DURATION
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_DURATION": {
"type": "object",
"properties": {
"oneOf": [{
"nuance_DURATION_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
},
"nuance_UNIT": {
"type": "string",
"pattern": "[millisecond|second|minute|hour|day|week|month|year]"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
}, {
"nuance_DURATION_REL": {
"type": "object",
"properties": {
"nuance_DURATION_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
},
"nuance_UNIT": {
"type": "string",
"pattern": "[millisecond|second|minute|hour|day|week|month|year]"
}
}
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[GT|LT|GE|LE|EQ|INC]"
}
},
"required": ["nuance_MODIFIER"]
}
}
]
}
}
},
"additionalProperties": false
}
Period of time described in absolute (hours, minutes, seconds, and days) or relative terms (for example, "two hours more").
Sample language:
- 7 hours
- 2.5 hours
- around two hours and fourteen minutes
- half an hour
- all day
- less than thirty minutes
Examples
The input "a year and a month" returns this response (JSON format):
{ "nuance_DURATION_ABS": { "nuance_NUMBER": 13, "nuance_UNIT": "month" } }
The input "more than an hour" returns (in JSON):
{ "nuance_DURATION_REL": { "nuance_DURATION_ABS": { "nuance_NUMBER": 1, "nuance_UNIT": "hour" }, "nuance_MODIFIER": "GT" } }
Numeric value range
<= 99,999,999
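A Python sketch that converts a nuance_DURATION value into a `datetime.timedelta`, keeping any modifier alongside it. The unit names come from the schema pattern. Rejecting "month" and "year" is a design choice of this sketch, because their length varies (so "a year and a month", which returns 13 months, would raise and need calendar logic instead). `duration_to_timedelta` is a hypothetical name.

```python
from datetime import timedelta
from typing import Dict, Optional, Tuple

# Fixed-length units only; "month" and "year" vary in length and are rejected.
_UNIT_KW = {
    "millisecond": "milliseconds",
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}


def duration_to_timedelta(value: Dict) -> Tuple[timedelta, Optional[str]]:
    """Return (timedelta, modifier) for a nuance_DURATION_ABS or nuance_DURATION_REL value."""
    if "nuance_DURATION_REL" in value:
        rel = value["nuance_DURATION_REL"]
        abs_part, modifier = rel["nuance_DURATION_ABS"], rel.get("nuance_MODIFIER")
    else:
        abs_part = value["nuance_DURATION_ABS"]
        modifier = abs_part.get("nuance_MODIFIER")
    unit = abs_part["nuance_UNIT"]
    if unit not in _UNIT_KW:
        raise ValueError(f"Unit {unit!r} has no fixed length; use calendar logic")
    return timedelta(**{_UNIT_KW[unit]: abs_part["nuance_NUMBER"]}), modifier
```

With the "more than an hour" response, this returns `(timedelta(hours=1), "GT")`.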
nuance_DURATION_RANGE
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_DURATION_RANGE": {
"type": "object",
"properties": {
"nuance_DURATION_RANGE_START": {
"type": "object",
"properties": {
"nuance_DURATION": {
"type": "object",
"properties": {
"oneOf": [{
"nuance_DURATION_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
},
"nuance_UNIT": {
"type": "string",
"pattern": "[millisecond|second|minute|hour|day|week|month|year]"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
}, {
"nuance_DURATION_REL": {
"type": "object",
"properties": {
"nuance_DURATION_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
},
"nuance_UNIT": {
"type": "string",
"pattern": "[millisecond|second|minute|hour|day|week|month|year]"
}
}
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[GT|LT|GE|LE|EQ|INC]"
}
}
}
}
]
}
}
}
},
"nuance_DURATION_RANGE_END": {
"type": "object",
"properties": {
"nuance_DURATION": {
"type": "object",
"properties": {
"oneOf": [{
"nuance_DURATION_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
},
"nuance_UNIT": {
"type": "string",
"pattern": "[millisecond|second|minute|hour|day|week|month|year]"
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[fuzzy]"
}
}
}
}, {
"nuance_DURATION_REL": {
"type": "object",
"properties": {
"nuance_DURATION_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
},
"nuance_UNIT": {
"type": "string",
"pattern": "[millisecond|second|minute|hour|day|week|month|year]"
}
}
},
"nuance_MODIFIER": {
"type": "string",
"pattern": "[GT|LT|GE|LE|EQ|INC]"
}
}
}
}
]
}
}
}
}
}
}
},
"additionalProperties": false
}
Represents a duration expressed as a range between two durations. A range must have at least a start or an end value, and most often has both.
Example
The input "between two and three months" returns this response (formatted in JSON):
{ "nuance_DURATION_RANGE_START": { "nuance_DURATION": { "nuance_DURATION_ABS": { "nuance_NUMBER": 2, "nuance_UNIT": "month" } } }, "nuance_DURATION_RANGE_END": { "nuance_DURATION": { "nuance_DURATION_ABS": { "nuance_NUMBER": 3, "nuance_UNIT": "month" } } } }
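Since a range may carry only a start or only an end, code that reads nuance_DURATION_RANGE should treat each bound as optional. This Python sketch returns `(number, unit)` pairs for both bounds, with `None` for a missing one, following the response shape in the example above. For a relative bound it keeps only the inner nuance_DURATION_ABS and drops the modifier, which is a simplification; the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Bound = Optional[Tuple[float, str]]


def _abs_duration(bound: Optional[Dict]) -> Bound:
    """Return (number, unit) from a range bound, or None if the bound is absent."""
    if not bound:
        return None
    duration = bound["nuance_DURATION"]
    abs_part = duration.get("nuance_DURATION_ABS")
    if abs_part is None:  # relative bound: keep the inner absolute value only
        abs_part = duration["nuance_DURATION_REL"]["nuance_DURATION_ABS"]
    return float(abs_part["nuance_NUMBER"]), abs_part["nuance_UNIT"]


def duration_range(value: Dict) -> Tuple[Bound, Bound]:
    """Return (start, end); either side may be None because only one bound is required."""
    return (
        _abs_duration(value.get("nuance_DURATION_RANGE_START")),
        _abs_duration(value.get("nuance_DURATION_RANGE_END")),
    )
```

For "between two and three months", this returns `((2.0, "month"), (3.0, "month"))`.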
nuance_EXPIRY_DATE
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_EXPIRY_DATE": {
"type": "object",
"properties": {
"nuance_DATE_ABS": {
"type": "object",
"properties": {
"nuance_MONTH": {
"type": "number"
},
"nuance_YEAR": {
"type": "number"
}
}
}
},
"required": [
"nuance_DATE_ABS"
]
}
},
"additionalProperties": false
}
This entity is used for credit card expiry dates.
Sample language:
- June twenty fourteen
- one two slash two two
- 05/21
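A Python sketch of a typical use: checking whether a card has expired. It reads the nuance_DATE_ABS month and year from the schema above and assumes the card stays valid through the last day of its expiry month. It also assumes that a two-digit year (as spoken or typed in "05/21") may come back as 21 and means 2021; neither assumption is documented here. `card_is_expired` is a hypothetical name.

```python
import calendar
from datetime import date


def card_is_expired(value: dict, today: date) -> bool:
    """Return True if a nuance_EXPIRY_DATE value is before `today`.

    Assumes validity through the end of the expiry month, and maps two-digit
    years to 20xx.
    """
    abs_part = value["nuance_DATE_ABS"]
    month = int(abs_part["nuance_MONTH"])
    year = int(abs_part["nuance_YEAR"])
    if year < 100:
        year += 2000
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return today > date(year, month, last_day)
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check testable.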
nuance_GENERIC_ORDER
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_GENERIC_ORDER": {
"anyOf": [
{
"type": "string",
"pattern": "[min|max|succ|prec|penultimate]"
},
{
"type": "number"
}
]
}
},
"required": [
"nuance_GENERIC_ORDER"
],
"additionalProperties": false
}
This entity extends nuance_ORDINAL_NUMBER (1st, 2nd, 3rd, ..., 31st) to represent special cases for expressing minimum, maximum, previous, successive, and so on.
Sample language:
- first
- latest
- second
- thirty first
nuance_GLOBAL
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_GLOBAL": {
"type": "string",
"pattern": "[help|repeat]"
}
},
"required": [
"nuance_GLOBAL"
],
"additionalProperties": false
}
Generic commands for navigating a voice system.
Sample language:
- i need help
- go back
- sign in
- reset
- cancel
nuance_NUMBER
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "object",
"properties": {
"oneOf": [{
"nuance_CARDINAL_NUMBER": {
"anyOf": [{
"type": "number"
}, {
"type": "string"
}
]
}
}, {
"nuance_DOUBLE": {
"anyOf": [{
"type": "number"
}, {
"type": "string"
}
]
}
}
]
}
}
},
"additionalProperties": false
}
Represents a number (integer, fraction, or decimal number). Numbers can be described in natural speech up to and including a million.
Choose nuance_NUMBER instead of nuance_CARDINAL_NUMBER or nuance_DOUBLE for numerical entities that can be whole numbers, a fraction, or include a decimal point.
Sample language:
- twenty seven
- four and a half
- 49
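Because nuance_NUMBER wraps either nuance_CARDINAL_NUMBER or nuance_DOUBLE, consuming code has to check which one it received. This Python sketch returns an `int` for a cardinal and an exact `fractions.Fraction` for a double, so that values such as "four and a half" are not rounded. It assumes the object passed in is the content of nuance_NUMBER; `read_number` is a hypothetical name.

```python
from fractions import Fraction
from typing import Dict, Union


def read_number(value: Dict) -> Union[int, Fraction]:
    """Return an int for nuance_CARDINAL_NUMBER or an exact Fraction for nuance_DOUBLE."""
    if "nuance_CARDINAL_NUMBER" in value:
        return int(value["nuance_CARDINAL_NUMBER"])
    if "nuance_DOUBLE" in value:
        # Fraction parses decimal strings such as "2.7" exactly; str() first
        # so that a float value is read by its printed decimal form.
        return Fraction(str(value["nuance_DOUBLE"]))
    raise ValueError(f"Unexpected nuance_NUMBER content: {value!r}")
```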
nuance_ORDINAL_NUMBER
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_ORDINAL_NUMBER": {
"type": "number"
}
},
"required": [
"nuance_ORDINAL_NUMBER"
],
"additionalProperties": false
}
Number that defines a position in a series. Handles values up to and including 31st.
Sample language:
- second
- tenth
- thirteenth
- 5th
Numeric value range
<= 31st (because this entity has mostly been used for the day of the month; some languages, such as Russian, have broader coverage)
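Since nuance_ORDINAL_NUMBER is mostly used for the day of the month, but not every month has 31 days, an application may want to validate the ordinal against a specific month. A Python sketch using the standard `calendar` module; where the month and year come from (another entity or the current date) is left to the application, and `day_of_month` is a hypothetical name.

```python
import calendar
from datetime import date
from typing import Optional


def day_of_month(ordinal: int, year: int, month: int) -> Optional[date]:
    """Combine a nuance_ORDINAL_NUMBER with a month; None if that day does not exist."""
    days_in_month = calendar.monthrange(year, month)[1]
    if 1 <= ordinal <= days_in_month:
        return date(year, month, ordinal)
    return None
```

For example, "thirtieth" in February yields `None`, which the dialog can treat as a cue to reprompt.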
nuance_QUANTITY
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_QUANTITY": {
"type": "object",
"properties": {
"oneOf": [{
"nuance_QUANTITY_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
}
}
}
}, {
"nuance_QUANTITY_REL": {
"type": "object",
"properties": {
"nuance_MODIFIER": {
"type": "string",
"pattern": "[GT|LT|GE|LE|EQ|INC|NONE|SOME|LOTS]"
},
"nuance_QUANTITY_ABS": {
"type": "object",
"properties": {
"nuance_NUMBER": {
"type": "number"
}
}
}
},
"required": ["nuance_MODIFIER"]
}
}
]
}
}
},
"additionalProperties": false
}
Defines values of magnitude, either relative (for example, "greater than nine", < 100, "some", "lots") or absolute (for example, "ten", 20). Note that nuance_QUANTITY_REL does not have built-in grammars.
Sample language:
- more than five
- a lot
- 10
- at least ten
- > 25
- minimum 16
Numeric value range
<= 99,999,999 (for nuance_QUANTITY_REL)
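A relative quantity can be turned into a predicate that the application applies to its own data. This Python sketch does that with the `operator` module. The response shape (for example, `{"nuance_QUANTITY_REL": {"nuance_MODIFIER": "GT", "nuance_QUANTITY_ABS": {"nuance_NUMBER": 5}}}` for "more than five") is assumed from the schema rather than shown on this page, and the meanings of the comparison codes are inferred from their names and from the "more than three km" → "GT" example. INC, NONE, SOME, and LOTS are left to the application; `quantity_predicate` is a hypothetical name.

```python
import operator
from typing import Callable, Dict

# Comparison modifiers; INC, NONE, SOME, and LOTS are deliberately not mapped.
_COMPARATORS = {
    "GT": operator.gt,
    "LT": operator.lt,
    "GE": operator.ge,
    "LE": operator.le,
    "EQ": operator.eq,
}


def quantity_predicate(value: Dict) -> Callable[[float], bool]:
    """Build a test such as `x > 5` from a nuance_QUANTITY_ABS or nuance_QUANTITY_REL value."""
    if "nuance_QUANTITY_ABS" in value:
        target = value["nuance_QUANTITY_ABS"]["nuance_NUMBER"]
        return lambda x: x == target
    rel = value["nuance_QUANTITY_REL"]
    modifier = rel["nuance_MODIFIER"]
    if modifier not in _COMPARATORS:
        raise ValueError(f"Modifier {modifier!r} needs application-specific handling")
    compare = _COMPARATORS[modifier]
    target = rel["nuance_QUANTITY_ABS"]["nuance_NUMBER"]
    return lambda x: compare(x, target)
```

The same mapping applies to the GT, LT, GE, LE, and EQ modifiers of nuance_DISTANCE_REL and nuance_DURATION_REL.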
nuance_TEMPERATURE
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"nuance_TEMPERATURE": {
"type": "object",
"properties": {
"nuance_UNIT": {
"type": "string",
"pattern": "[C|F|K]"
},
"nuance_NUMBER": {
"type": "number"
}
},
"required": [
"nuance_NUMBER"
]
}
},
"additionalProperties": false
}
Temperature as expressed in degrees Celsius or Fahrenheit. Temperature can be expressed as a whole number or a floating point value.
Sample language:
- 23 degrees Celsius
- eighteen point five degrees Fahrenheit
- 28°C
- 115°F
Example
The input "twenty two degrees celsius" returns this response (formatted as JSON):
{ "nuance_NUMBER": 22, "nuance_UNIT": "C" }
Numeric value range
- Up to 4 digits (<= 9,999) for whole numbers
- For the whole part and decimal part in decimal-point numbers: <= 99,999,999
- For fractions: denominators up to tenths
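A Python sketch that converts a nuance_TEMPERATURE value to degrees Celsius. The schema allows C, F, and K as units but only requires nuance_NUMBER, so this sketch returns `None` when the unit is missing rather than guessing a default; that choice, and the name `to_celsius`, are assumptions.

```python
from typing import Dict, Optional


def to_celsius(value: Dict) -> Optional[float]:
    """Convert a nuance_TEMPERATURE value to degrees Celsius (None if no unit is given)."""
    number = float(value["nuance_NUMBER"])
    unit = value.get("nuance_UNIT")
    if unit is None:
        return None  # unit is optional in the schema; let the application decide
    if unit == "C":
        return number
    if unit == "F":
        return (number - 32) * 5 / 9
    if unit == "K":
        return number - 273.15
    raise ValueError(f"Unknown temperature unit: {unit!r}")
```

With the "twenty two degrees celsius" response, this returns `22.0`.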