Comparison of available intentizer types

When creating a new NLU, you can select different types of intentizer models. Note that the Complex model requires a special feature to be enabled at the company level. Contact your system administrator to enable it.

Two models are available: Simple and Complex.

Summary
- Simple: lightweight, universal, multilingual model
- Complex: heavy model, mainly for banking

Domain
- Simple: universal
- Complex: finance

Supported languages
- Simple: multilingual, 16 languages (Arabic, Simplified Chinese, Traditional Chinese, English, French, German, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Thai, Turkish, Russian)
- Complex: mainly Polish

Support for multi-intentions
Classification quality (1)⭐⭐⭐⭐⭐⭐⭐⭐⭐
Classification performance (2)⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Model training time(3)⭐⭐⭐⭐⭐⭐⭐⭐
Size of the resulting model (4)⭐⭐⭐⭐⭐⭐⭐⭐

Preprocessing (Simple)

This mechanism applies a series of operations to the input text to strip unwanted characters and normalize it to a uniform form. The operations are:

- matching all characters that are not alphanumeric and replacing each matched sequence with a single space, which removes unwanted characters such as punctuation marks, emoticons, etc.
- matching every run of two or more spaces and replacing it with a single space, which normalizes repeated spaces.
- removing whitespace characters (spaces, tabs) from the beginning and end of the text, which also eliminates any leading or trailing space left by the previous steps.
- converting all letters in the text to lowercase, which ensures uniform letter case.
- converting Unicode characters into their ASCII equivalents. For example, diacritical characters are replaced with their base letters (e.g., “ł” is replaced with “l”).

As a result, the preprocessed text is normalized to a uniform form, free of punctuation marks, repeated spaces and unwanted Unicode characters.


Examples:
- “What will the weather be like tomorrow?” -> “what will the weather be like tomorrow”
- “ By when do I get a response to my complaint?????” -> “by when do i get a response to my complaint”
- “sign me up for a doctor tomorrow 😁” -> “sign me up for a doctor tomorrow”
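
The operations listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's actual implementation: the function names, the regular expressions, the treatment of the underscore as a non-alphanumeric character, and the small mapping for letters such as “ł” (which have no Unicode decomposition) are all hypothetical.

```python
import re
import unicodedata

# Assumption: letters without a Unicode decomposition need an explicit mapping.
# This is an illustrative subset, not the product's actual table.
_EXTRA_ASCII = str.maketrans({"ł": "l", "đ": "d", "ø": "o", "ß": "ss"})


def to_ascii(text: str) -> str:
    """Replace diacritical characters with their base ASCII letters."""
    text = text.translate(_EXTRA_ASCII)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks and anything else that has no ASCII form.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def preprocess_simple(text: str) -> str:
    """Apply the five documented steps in the documented order."""
    text = re.sub(r"[\W_]+", " ", text)  # non-alphanumeric runs -> one space
    text = re.sub(r" {2,}", " ", text)   # collapse repeated spaces
    text = text.strip()                  # trim leading/trailing whitespace
    text = text.lower()                  # uniform letter case
    return to_ascii(text)                # diacritics -> ASCII equivalents


if __name__ == "__main__":
    for phrase in (
        "What will the weather be like tomorrow?",
        " By when do I get a response to my complaint?????",
        "sign me up for a doctor tomorrow 😁",
    ):
        print(repr(preprocess_simple(phrase)))
```

In this sketch, characters with no ASCII equivalent are simply dropped in the last step; how the real model handles non-Latin scripts is not described here.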

Preprocessing (Complex)

This mechanism normalizes the input text to a uniform form. The following operation is applied:

- converting all letters in the text to lowercase, which ensures uniform letter case.

Examples:
- “What will the weather be like tomorrow?” -> “what will the weather be like tomorrow ?”

  1. F1-score: Simple - 0.89 (precision - 0.92, recall - 0.85), Complex - 0.94 (precision - 0.94, recall - 0.93), measured on a test set of 2807 samples across 91 classes
  2. Simple - 58.82 phrases/s, Complex - 47.62 phrases/s, measured on a test set of 2807 samples
  3. Simple - 107 s, Complex - 25 min, for a training set of 9825 samples divided into 91 intent classes
  4. Simple - 350 MB (feature extractor) + 1 MB per model, Complex - 3.5 GB per model
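
To put the throughput figures from note 2 in more familiar terms, the short sketch below converts phrases per second into an average time per phrase and an approximate time to classify the whole test set. It assumes phrases are processed one at a time at a constant rate, which this page does not state; the function and variable names are illustrative.

```python
TEST_SET_SIZE = 2807  # samples in the test set (note 2)

# Throughput in phrases per second, as reported in note 2.
THROUGHPUT = {"simple": 58.82, "complex": 47.62}


def latency_ms(phrases_per_second: float) -> float:
    """Average time per phrase in milliseconds, assuming sequential processing."""
    return 1000.0 / phrases_per_second


def total_seconds(phrases_per_second: float, n_phrases: int) -> float:
    """Approximate time to classify n_phrases at a constant rate."""
    return n_phrases / phrases_per_second


if __name__ == "__main__":
    for name, rate in THROUGHPUT.items():
        print(
            f"{name}: {latency_ms(rate):.1f} ms/phrase, "
            f"{total_seconds(rate, TEST_SET_SIZE):.1f} s for the test set"
        )
```

Under these assumptions, the reported rates correspond to roughly 17 ms per phrase for the Simple model and roughly 21 ms per phrase for the Complex model.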