Common mistakes

What are the most common mistakes in building data sets for NLU?

❗️

MISTAKE #1

The same phrases can be used many times, both in the entry intents and in later states of the bot. You must not do this!

This mistake usually happens when the same phrases are copied back and forth between intents without checking whether they are correct answers to the question the bot asked in that state. It also reduces the effectiveness of the model, because identical examples assigned to different intents give it conflicting training signals.

How to avoid it? Every time you add new phrases to the bot, check the bot's flow and the question that leads to the given intent.
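Alongside the manual flow review, a simple automated check can flag phrases that ended up in more than one intent. The sketch below assumes the training data can be exported as a Python dictionary mapping intent names to example phrases; this format, the intent names and the functions `normalize` and `find_duplicate_phrases` are hypothetical and not part of any particular NLU platform.

```python
from collections import defaultdict
from typing import Dict, List, Set


def normalize(phrase: str) -> str:
    # Lowercase and collapse whitespace so trivial variants count as duplicates.
    return " ".join(phrase.lower().split())


def find_duplicate_phrases(intents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return every normalized phrase that appears in more than one intent,
    mapped to the set of intents that contain it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for intent, phrases in intents.items():
        for phrase in phrases:
            owners[normalize(phrase)].add(intent)
    # Keep only phrases shared by at least two different intents.
    return {p: names for p, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical training data: intent name -> example phrases.
    training_data = {
        "OFFER": ["I am calling about insurance", "I want to buy a policy"],
        "INSURANCE": ["I am calling  about insurance", "I have a claim"],
        "INFORMATION": ["it is about insurance"],
    }
    for phrase, names in sorted(find_duplicate_phrases(training_data).items()):
        print(f"{phrase!r} appears in: {', '.join(sorted(names))}")
```

Any phrase this check reports should be reviewed against the bot's flow and kept in only one intent.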

❗️

MISTAKE #2

If we want a given statement to be handled in two places, we can simply add the same phrase to two intents and the model will deal with it. You must not do this!

For example, suppose we want the customer to be able to say "it is about insurance" or "I am calling about insurance" and have these phrases recognized by different intents. This is a case of phrase ambiguity.

In this case, we do not know whether the caller already has insurance or wants to buy it. Adding such a phrase to both the OFFER intent and the INSURANCE intent does not solve this problem.

These phrases should be included in only one intent in the bot, e.g. the INFORMATION intent. This intent disambiguates or clarifies what the user wants and guides them onto the right track, e.g. with a prompt such as: "If you are calling about your existing insurance, say 'insurance'; if it's about purchasing insurance, say 'offer'."
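The routing step after the INFORMATION intent can be sketched as follows. This is a minimal illustration under assumptions: the keyword matching, the `ROUTES` table and the function `route_after_information` are hypothetical, and a real bot would typically let its NLU model and flow editor handle this step. The point it shows is that the ambiguous phrase lives in a single intent, and the caller's follow-up answer decides between INSURANCE and OFFER.

```python
from typing import Optional

CLARIFYING_PROMPT = (
    "If you are calling about your existing insurance, say 'insurance'; "
    "if it's about purchasing insurance, say 'offer'."
)

# Hypothetical mapping from the keyword the caller is asked to say
# to the intent (or state) where the conversation should continue.
ROUTES = {
    "insurance": "INSURANCE",
    "offer": "OFFER",
}


def route_after_information(answer: str) -> Optional[str]:
    """Return the target intent for the caller's answer to the clarifying
    prompt, or None if the prompt should be repeated."""
    # Lowercase and strip surrounding punctuation from each word.
    words = {w.strip(".,!?'\"") for w in answer.lower().split()}
    targets = {ROUTES[w] for w in words if w in ROUTES}
    # Exactly one keyword: the caller's intent is now unambiguous.
    if len(targets) == 1:
        return targets.pop()
    # No keyword, or both keywords: still ambiguous, so ask again.
    return None


if __name__ == "__main__":
    print(CLARIFYING_PROMPT)
    print(route_after_information("Offer, please"))  # OFFER
```

Returning `None` for an answer containing both keywords keeps the bot from guessing, which mirrors the reason the phrase was moved to a single disambiguating intent in the first place.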