Precision and recall are metrics that are calculated for each individual intent, along with F1-score. They are derived from the automated testing that runs every time an intent model is trained. You can use them, together with F1-score, to inform your decision making when you’re looking to optimize your intent classification model.
To calculate precision we compare the number of true positive evaluations to the total of true positive and false positive evaluations:
Precision = true positives / (true positives + false positives)
Precision is especially relevant when the cost of a false positive outweighs the cost of a false negative, in other words, when the cost of acting (responding with an answer in Conversational AI Cloud) is higher than the cost of not acting (responding with a fallback event). Optimizing for precision means that you prefer giving only the optimal answer over giving an answer that may or may not be the most relevant. A very precise model is a very “pure” model: it may not answer all questions, but the ones it does answer are more than likely correct.
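As a minimal sketch (the function name and counts below are hypothetical, not part of Conversational AI Cloud), the precision formula can be expressed as:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP): of all the questions the intent
    matched, what fraction did it match correctly?"""
    return true_positives / (true_positives + false_positives)

# Hypothetical test run for one intent: 40 correct matches, 10 incorrect ones.
print(precision(40, 10))  # 0.8
```

A high value here means the intent rarely fires on questions it shouldn’t, even if it misses some it should have answered.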
To calculate recall we compare the number of true positive evaluations to the total of true positive and false negative evaluations:
Recall = true positives / (true positives + false negatives)
Recall is especially relevant when the cost of a false negative outweighs the cost of a false positive, in other words, when the cost of not acting (responding with a fallback event in Conversational AI Cloud) is higher than the cost of acting (responding with an answer). Optimizing for recall means that you favor giving an answer to the question over giving the optimal answer. A model with high recall will succeed in answering a large share of the incoming questions you have content for, even though some of those questions shouldn’t have been answered at all.
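The recall formula can be sketched the same way (again with hypothetical counts):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): of all the questions the intent
    should have matched, what fraction did it actually match?"""
    return true_positives / (true_positives + false_negatives)

# Hypothetical test run: 40 correct matches, 20 questions missed
# (answered with a fallback instead of the intent's answer).
print(round(recall(40, 20), 3))  # 0.667
```

A high value here means the intent rarely falls back on questions it has content for, even if some of its answers are not the most relevant ones.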
F1-score is the harmonic mean of an intent’s (or model’s) precision and recall. The challenge of using precision and recall on their own to drive your optimization efforts is that favoring one usually means sacrificing the other: optimizing for precision will likely result in a lower recall, and vice versa. The goal of F1-score is to combine these two metrics into a single metric that can be used to evaluate the overall performance of an intent, or the entire model. The formula to calculate F1-score looks as follows:
F1 score = 2 * ( ( precision * recall ) / ( precision + recall ) )
F1-score is a more convenient way to evaluate the overall performance of an intent classification model. Depending on the situation you might want to favor recall for one intent and precision for another, aiming to strike a healthy balance between the two across your entire intent model.
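The formula above can be sketched as follows; the example values are hypothetical, and illustrate why the harmonic mean is used: an intent that scores well on only one of the two metrics is penalized more heavily than a plain average would suggest.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall:
    F1 = 2 * (precision * recall) / (precision + recall)."""
    return 2 * (precision * recall) / (precision + recall)

# A lopsided intent: high precision, low recall.
print(f1_score(0.9, 0.3))  # 0.45, well below the arithmetic average of 0.6

# A balanced intent with the same average scores higher.
print(f1_score(0.6, 0.6))  # 0.6
```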
Learn more about optimizing your intent model.