Advice regarding intents
- The classifier's quality should aim for 100%, but it will never actually reach 100%; there will always be examples where it does not perform as expected.
- Real-World Examples: Use realistic samples, preferably taken from production.
- Clear Class Definition: Clearly define the class for each sample, avoiding cases where identical or semantically similar samples end up in two or more classes.
- Diversity: The more diverse the data represented by the samples, the better the classifier's performance.
- Samples of Various Lengths: Include samples of varying lengths, as this reflects real-world scenarios and helps the model handle different input sizes more effectively.
- Fewer classes are better – avoid increasing the number of classes beyond what the design actually requires; each additional class decreases classification quality. Possible solutions: merging classes with similar meanings, or using a separate model.
- Avoid transferring unnecessary information (noise) in samples, such as account numbers, names, addresses, etc. – skillful use of entities is better suited for this purpose (see the anonymization sketch after this list).
- Detect problematic phrases and add them to the training set.
- Adequate number of samples per class (assuming a few-shot model, this means roughly a dozen or more).
- Balanced dataset – keep the number of samples per class as similar as possible (see the dataset hygiene sketch after this list).
- Data augmentation – for example, generating additional samples containing spelling errors, typos, etc. (see the augmentation sketch after this list).
- The score/probability at the classifier's output is a somewhat arbitrary value; treat it as a relative measure rather than an exact probability.
- Quality of training samples matters – remove duplicates, errors, inconsistencies.
- Data anonymization – use of entities.
- Contextual filtering – removing words/characters from phrases that carry irrelevant information.
- If there are examples for which the classifier does not return the expected response (locally), it does not mean the classifier is of poor quality (globally) – see the first point. Global quality can be assessed with cross-validation or by testing on a larger set of samples, and then improved iteratively: add the problematic samples to the training set and check whether this noticeably affects global classification quality (see the cross-validation sketch after this list).
- Optionally: Use negative examples that do not fit any of the assumed classes as a separate/dedicated intent class.
- Optionally: Iterative refinement – adding new phrases from actual conversations to the training set.
- Optionally: Use data visualization/dimensionality reduction techniques to illustrate how phrases compare in terms of similarity and how they are spaced apart, for example: Visualizing Word Vectors with t-SNE (see the visualization sketch after this list).
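
A minimal sketch of regex-based anonymization and contextual filtering: noisy values (account numbers, e-mail addresses, amounts) are replaced with entity placeholders before a phrase is used as a training sample. The patterns and placeholder names are illustrative assumptions, not a complete anonymization solution; in practice a dedicated entity/NER component usually handles this.

```python
import re

# Illustrative patterns only; adapt them to the entities used in your project.
PATTERNS = [
    (re.compile(r"\b\d{26}\b"), "{ACCOUNT_NUMBER}"),          # bank account number
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "{EMAIL}"),  # e-mail address
    (re.compile(r"\b\d+(?:[.,]\d+)?\s?(?:USD|EUR)\b"), "{AMOUNT}"),
]

def anonymize(phrase: str) -> str:
    for pattern, placeholder in PATTERNS:
        phrase = pattern.sub(placeholder, phrase)
    # Contextual filtering: collapse whitespace left over after substitutions.
    return re.sub(r"\s+", " ", phrase).strip()

print(anonymize("transfer 100 USD to 12345678901234567890123456 from john@example.com"))
# -> "transfer {AMOUNT} to {ACCOUNT_NUMBER} from {EMAIL}"
```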
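
A minimal sketch of the dataset hygiene steps mentioned above: removing duplicate samples and checking class balance. The sample data and intent names are hypothetical; adapt them to your own training-set format.

```python
from collections import Counter

# Hypothetical training set: (phrase, intent) pairs.
samples = [
    ("I want to check my balance", "check_balance"),
    ("I want to check my balance", "check_balance"),  # duplicate
    ("show my account balance", "check_balance"),
    ("block my card please", "block_card"),
    ("please block my card", "block_card"),
]

# Remove exact duplicates while preserving order.
seen = set()
deduplicated = []
for phrase, intent in samples:
    key = (phrase.lower().strip(), intent)
    if key not in seen:
        seen.add(key)
        deduplicated.append((phrase, intent))

# Report the class distribution to spot imbalance between intents.
counts = Counter(intent for _, intent in deduplicated)
for intent, count in counts.most_common():
    print(f"{intent}: {count} samples")
```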
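
A minimal sketch of the augmentation idea: generating extra variants of existing phrases with simple spelling errors (adjacent character swaps). This is a deliberately simple assumption; in practice you may prefer more realistic typo models or paraphrasing.

```python
import random

def add_typo(phrase: str, rng: random.Random) -> str:
    """Swap two adjacent characters inside a random word to simulate a typo."""
    words = phrase.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return phrase
    i = rng.choice(candidates)
    w = words[i]
    pos = rng.randrange(len(w) - 1)
    words[i] = w[:pos] + w[pos + 1] + w[pos] + w[pos + 2:]
    return " ".join(words)

rng = random.Random(42)  # fixed seed for reproducible augmentation
original = "I would like to block my card"
augmented = [add_typo(original, rng) for _ in range(3)]
print(augmented)
```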
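
A minimal sketch of assessing global quality with cross-validation, assuming a simple TF-IDF + logistic regression baseline; the actual classifier in your setup may be different, and the sample data is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical training set: phrases and their intent labels.
phrases = [
    "what is my account balance", "show me my balance", "how much money do I have",
    "check balance please", "balance on my savings account",
    "block my card", "my card was stolen, block it", "please block the credit card",
    "I lost my card, block it now", "card blocking request",
]
labels = ["check_balance"] * 5 + ["block_card"] * 5

# Simple baseline pipeline; replace with your actual model.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation gives a global estimate of classification quality.
scores = cross_val_score(model, phrases, labels, cv=5, scoring="accuracy")
print(f"accuracy per fold: {scores}")
print(f"mean accuracy: {scores.mean():.2f}")
```

Adding problematic phrases to the training set and re-running this evaluation shows whether the fix helps globally or only for the specific example.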
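
A minimal sketch of the optional visualization step, using TF-IDF vectors as a stand-in for whatever phrase embeddings your model actually produces. The perplexity is set low only because this hypothetical dataset is tiny.

```python
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

phrases = [
    "what is my balance", "show my balance", "how much money do I have",
    "block my card", "my card was stolen", "please block the credit card",
]
labels = ["check_balance"] * 3 + ["block_card"] * 3

# Vectorize phrases; with a real model you would use its embeddings instead.
vectors = TfidfVectorizer().fit_transform(phrases).toarray()

# Reduce to 2D; perplexity must be smaller than the number of samples.
points = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)

for label in set(labels):
    idx = [i for i, l in enumerate(labels) if l == label]
    plt.scatter(points[idx, 0], points[idx, 1], label=label)
for i, phrase in enumerate(phrases):
    plt.annotate(phrase, (points[i, 0], points[i, 1]), fontsize=8)
plt.legend()
plt.title("Phrase similarity (t-SNE projection)")
plt.show()
```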