The F-score (or F-measure) combines the precision and the recall of a test
into a single score. The precision p is the number of correct positive results
(true positives) divided by the number of all results returned as positive
(true positives plus false positives), and the recall r is the number of
correct positive results divided by the number of positive results that should
have been returned (true positives plus false negatives).
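
As a minimal sketch of these two definitions, the following Python function
computes p and r from raw counts. The function name, the argument names
(tp, fp, fn) and the choice to return 0.0 when a denominator is zero are
assumptions made for illustration; the text above does not prescribe them.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from counts of true positives (tp),
    false positives (fp) and false negatives (fn)."""
    returned = tp + fp   # all results reported as positive
    relevant = tp + fn   # all positives that should have been returned
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    p = tp / returned if returned else 0.0
    r = tp / relevant if relevant else 0.0
    return p, r


# Hypothetical counts: 8 correct positives, 2 false alarms, 4 misses.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)
```

With these made-up counts, precision is 8/10 and recall is 8/12.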
The traditional or balanced F-score (F1 score) is the harmonic mean of
precision and recall; it reaches its best value at 1 and its worst at 0.
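
Written out with p for precision and r for recall, the harmonic mean takes
the following form (the subscript notation F_1 is used here for the F1
score):

\[
F_1 = \frac{2}{\frac{1}{p} + \frac{1}{r}} = \frac{2\,p\,r}{p + r}
\]

For example, p = 0.8 and r = 0.6 give F_1 = 0.96 / 1.4, or about 0.686,
which is lower than the arithmetic mean of 0.7 because the harmonic mean
is pulled toward the smaller of the two values.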
The general formula involves a positive real β so that the F-score measures
the effectiveness of retrieval with respect to a user who attaches β times
as much importance to recall as to precision.
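
The sketch below uses the commonly stated form of the general formula,
F_β = (1 + β²) p r / (β² p + r), which reduces to the harmonic mean when
β = 1. The function name f_beta, its default argument and the handling of
a zero denominator are illustrative assumptions, not part of the text above.

```python
def f_beta(p, r, beta=1.0):
    """F-score for precision p and recall r, weighting recall
    beta times as much as precision."""
    if beta <= 0:
        raise ValueError("beta must be a positive real number")
    denom = beta ** 2 * p + r
    # Assumption: when p and r are both zero the score is reported as 0.0.
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / denom


# Hypothetical values: precision 0.8, recall 0.6.
f1 = f_beta(0.8, 0.6)                # balanced F1 score
f2 = f_beta(0.8, 0.6, beta=2.0)      # recall weighted more heavily
f_half = f_beta(0.8, 0.6, beta=0.5)  # precision weighted more heavily
print(f1, f2, f_half)
```

Because recall is the lower of the two values in this example, raising β
lowers the score and lowering β raises it.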