Advice regarding intents

  1. No classifier reaches 100% quality: aim high, but accept that there will always be examples where it does not behave as expected.
  2. Realistic Examples: use samples that could plausibly occur in the real world, preferably collected from production.
  3. Clear Class Definition: assign each sample unambiguously to one class, avoiding cases where the same or similarly meaningful sample ends up in two or more classes.
  4. Diversity: the more diverse the data represented by the samples, the better the classifier's performance.
  5. Samples of Various Lengths: Include samples of varying lengths, as this reflects real-world scenarios and helps the model handle different input sizes more effectively.
  6. Fewer classes means better quality – avoid increasing the number of classes beyond what the design actually requires; each additional class degrades classification quality. Possible solutions: merge classes with similar meanings, or move some classes to a separate model.
  7. Avoid carrying unnecessary information (noise) in samples, such as account numbers, names, or addresses – skillful use of entities is better suited for this purpose.
  8. Detect problematic phrases and add them to the training set.
  9. Adequate number of samples per class (assuming a few-shot model, this means a dozen or more).
  10. Balanced dataset – as similar a number of samples per class as possible.
  11. Data augmentation – for example: generating additional training samples, such as variants containing spelling errors.
  12. The score/probability at the classifier's output is a somewhat arbitrary value – treat it as a relative measure of confidence rather than a calibrated probability.
  13. Quality of training samples matters – remove duplicates, errors, inconsistencies.
  14. Data anonymization – use of entities.
  15. Contextual filtering – removing words/characters from phrases that carry irrelevant information.
  16. If there are examples for which the classifier does not return the expected response (locally), this does not mean the classifier is of poor quality (globally) – see point 1. Global quality can be measured by performing cross-validation or by testing on a larger set of samples, and improved iteratively by adding problematic samples and checking whether this noticeably affects classification quality overall.
  17. Optionally: Use negative examples that do not fit any of the assumed classes as a separate/dedicated intent class.
  18. Optionally: Iterative refinement – adding new phrases from actual conversations to the training set.
  19. Optionally: Use data visualization/dimensionality reduction techniques to illustrate how phrases look in terms of similarity and how they are spaced apart, for example: Visualizing Word Vectors with t-SNE.
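
The augmentation idea from point 11 can be sketched in a few lines. This is a minimal, purely illustrative typo generator – the function name and the sample phrase are hypothetical, and a production setup would use a richer set of perturbations:

```python
import random

def augment_with_typos(sample: str, n_variants: int = 3, seed: int = 0) -> list[str]:
    """Generate variants of a sample with simple character-level typos
    (swap, drop, duplicate) to simulate real-world spelling errors."""
    rng = random.Random(seed)  # fixed seed so augmentation is reproducible
    variants = []
    for _ in range(n_variants):
        chars = list(sample)
        i = rng.randrange(len(chars) - 1)
        op = rng.choice(["swap", "drop", "dup"])
        if op == "swap":        # transpose two adjacent characters
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "drop":      # delete one character
            del chars[i]
        else:                   # duplicate one character
            chars.insert(i, chars[i])
        variants.append("".join(chars))
    return variants

print(augment_with_typos("check my account balance"))
```

Each variant differs from the original by at most one character edit, which keeps the augmented samples close to plausible user typos.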
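
The cross-validation loop from point 16 can be sketched with the standard library alone. The toy keyword "model" below is only a stand-in for a real intent classifier, and the sample phrases are invented for illustration:

```python
from collections import defaultdict
from typing import Callable

def cross_validate(samples: list[tuple[str, str]],
                   train_fn: Callable[[list], Callable[[str], str]],
                   k: int = 5) -> float:
    """Average held-out accuracy over k folds (round-robin split)."""
    folds = [samples[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train_fn(train)
        correct = sum(predict(text) == label for text, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def train_keyword_clf(train):
    """Toy stand-in for a real model: vote by word/label co-occurrence counts."""
    word_labels = defaultdict(lambda: defaultdict(int))
    for text, label in train:
        for word in text.lower().split():
            word_labels[word][label] += 1
    def predict(text):
        votes = defaultdict(int)
        for word in text.lower().split():
            for label, n in word_labels[word].items():
                votes[label] += n
        return max(votes, key=votes.get) if votes else "unknown"
    return predict

samples = [
    ("what is my balance", "balance"),
    ("show balance please", "balance"),
    ("check my balance", "balance"),
    ("block my card", "card"),
    ("card was stolen", "card"),
    ("new card please", "card"),
] * 2
print(cross_validate(samples, train_keyword_clf, k=3))
```

In practice the same loop is run with the real classifier plugged in as `train_fn`, and the averaged score is tracked while problematic samples are added iteratively.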
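
Before reaching for t-SNE on real embeddings (point 19), even a crude pairwise similarity over bag-of-words vectors gives a first feel for how close training phrases sit to each other. This is a standard-library sketch with hypothetical phrases, not a substitute for embedding-based visualization:

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two phrases."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(n * n for n in va.values()))
            * math.sqrt(sum(n * n for n in vb.values())))
    return dot / norm if norm else 0.0

phrases = ["check my balance", "show my balance", "block my card"]
for i in range(len(phrases)):
    for j in range(i + 1, len(phrases)):
        print(f"{phrases[i]!r} vs {phrases[j]!r}: "
              f"{cosine_sim(phrases[i], phrases[j]):.2f}")
```

Pairs that score unexpectedly high across different classes are good candidates for the class-merging or class-definition review from points 3 and 6.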