Training phrases

Conciseness

Phrases in the training dataset should be direct and simple, just like typical human-bot communication. When talking or writing to bots, people tend to be clear and concise. User utterances are often closer to search engine queries than to well-formed sentences.

Of course, users who do not yet realize that they are writing or talking to a bot might build long and complex sentences. For the sake of the training dataset’s efficacy, however, it is best to keep training phrases direct and concise.

Examples:

Bad phrase: I’m calling because I have a credit card and well I was hoping it provides some kind of insurance but I didn’t find anything about it is it possible that you checked that for me
Good phrase: can you check if my credit card provides some insurance

Bad phrase: I’ve been abroad currently I have a personal account with your bank and I wanted to ask if there is a possibility to open a foreign currency account online cause I can’t go to the bank personally for now
Good phrase: is it possible to open a foreign currency account online if I have a personal account

As you can see, good training phrases consist of a simple opening phrase:
Can you / Is it possible / I wanted to know if, etc.
And words conveying the essential meaning:
important verbs and keywords: credit card, provide insurance, open, foreign currency account, online

Ultimately, recognition is based on these content words. This means that if a user provides a "longish", non-standard utterance that contains some of these words, the model will still be able to assign an appropriate intent to it.
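The role of content words can be illustrated with a small sketch. It is not how the recognition model works internally; it only shows that the long and the concise credit card phrases above share the same keywords once filler words are removed. The stop-word list and the `content_words` helper are assumptions made for this illustration.

```python
import re

# Hypothetical, deliberately small stop-word list for this illustration only.
STOP_WORDS = {
    "a", "about", "and", "anything", "because", "but", "can", "didn't",
    "for", "have", "i", "i'm", "if", "is", "it", "me", "my", "of",
    "some", "that", "was", "well", "you",
}


def content_words(utterance):
    """Return the set of lowercase tokens that are not in STOP_WORDS."""
    # Normalize typographic apostrophes so "I’m" and "I'm" are the same token.
    text = utterance.lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+", text)
    return {t for t in tokens if t not in STOP_WORDS}


long_phrase = (
    "I’m calling because I have a credit card and well I was hoping it "
    "provides some kind of insurance but I didn’t find anything about it "
    "is it possible that you checked that for me"
)
short_phrase = "can you check if my credit card provides some insurance"

# Keywords that both phrasings have in common.
shared = content_words(long_phrase) & content_words(short_phrase)
print(sorted(shared))  # ['card', 'credit', 'insurance', 'provides']
```

The concise phrase carries the same keywords with far less noise, which is why it makes the better training example.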

Commonness

Searching for unique ways of asking things usually brings no benefit. It’s better to focus on the most common ways of expressing an idea and add simple paraphrases. The more such similar, concise utterances the dataset contains, the better the recognition.

Bot question: Do you have a business account in our bank?
Bad phrase: oh I used to but decided to move it to another bank
Good phrase: no not currently

Bot question: Do you have a loan in our bank?
Bad phrase: unfortunately
Good phrase: I do yes

Bot question: Do you use our mobile app?
Bad phrase: once in a blue moon
Good phrase: not often to be honest

Try to avoid abstract phrases that require complex meaning synthesis, unless you need a specific expression to be recognized and associated with a specific intent.

The chances that "once in a blue moon" occurs in user input are very low, so it is best to provide the model with representative data that it is likely to encounter.

It is also good practice to review user utterances once the bot goes public. You can then observe what user expressions look like and which utterances are most common, and add them to your training set to enhance recognition.
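One simple way to carry out such a review is to count how often each normalized utterance appears in the conversation logs. The sketch below assumes the utterances have already been exported as a list of strings; the `logged_utterances` sample data and both helper names are hypothetical.

```python
from collections import Counter


def normalize(utterance):
    """Lowercase the utterance and collapse repeated whitespace."""
    return " ".join(utterance.lower().split())


def most_common_utterances(utterances, top_n=2):
    """Return the top_n most frequent normalized utterances with their counts."""
    counts = Counter(normalize(u) for u in utterances if u.strip())
    return counts.most_common(top_n)


# Hypothetical sample of logged user replies, for illustration only.
logged_utterances = [
    "Yes", "yes ", "I do", "no not currently",
    "once in a blue moon", "yes", "i do",
]

top = most_common_utterances(logged_utterances)
print(top)  # [('yes', 3), ('i do', 2)]
```

Frequent entries are good candidates for the training set, while one-off expressions such as "once in a blue moon" can usually be left out.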

