202602141353
Status: #reference
Tags: Statistics, Machine Learning
State: #nascent
# F1-Score
The harmonic mean of Precision (1-false discovery proportion) and Recall (Statistical Power). It is defined as follows:

$$F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R}$$
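To make the definition concrete, here is a minimal sketch computing $F_1$ directly from confusion-matrix counts (the counts below are made up for illustration):

```python
def f1_from_counts(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # 1 - false discovery proportion
    recall = tp / (tp + fn)     # statistical power / sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives
p, r, f1 = f1_from_counts(80, 20, 40)
print(p, r, f1)  # precision = 0.8, recall ≈ 0.667, F1 ≈ 0.727
```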
This unnatural expression can be understood easily once we look at its partial derivatives. Indeed, we would intuitively want a score that:
- Increases when precision increases
- Increases when recall increases
- Is "bad" when either one is too low
- Shows diminishing returns in increasing either one
After all, if we care about both, an increase in either one should be rewarded; an increase in one at the detriment of the other should be punished somewhat; and past a given threshold of recall/precision, further gains are marginal and therefore not worth rewarding much more.
We see that the partial of the $F_1$ score with respect to precision is

$$\frac{\partial F_1}{\partial P} = \frac{2R^2}{(P+R)^2},$$

and with respect to recall,

$$\frac{\partial F_1}{\partial R} = \frac{2P^2}{(P+R)^2}.$$

This tells us that the $F_1$ score's sensitivity to precision is governed by recall, and vice versa: when recall is low, $\partial F_1 / \partial P$ is small, so improving precision alone barely moves the score.
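The partial derivative of $F_1$ with respect to precision is $2R^2/(P+R)^2$; a quick central finite difference confirms it numerically (the values of $P$ and $R$ are chosen arbitrarily):

```python
def f1(P, R):
    return 2 * P * R / (P + R)

def f1_partial_p(P, R):
    # Closed-form partial derivative of F1 with respect to precision
    return 2 * R**2 / (P + R)**2

P, R, h = 0.7, 0.4, 1e-6
numeric = (f1(P + h, R) - f1(P - h, R)) / (2 * h)  # central difference
analytic = f1_partial_p(P, R)
print(numeric, analytic)  # both ≈ 0.2645
```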
Regarding the sensitivity to precision, if recall is high and precision increases, the increase in $F_1$ is large at first, but the gain shrinks as $P$ grows, because the denominator $(P+R)^2$ grows with it.
Observe that the harmonic mean stays closer to the lower of precision and recall, whereas the arithmetic mean is easily impressed if either metric is high. For example, if the precision is $0.9$ and the recall is $0.1$, the arithmetic mean is $0.5$, but $F_1 = \frac{2 \cdot 0.9 \cdot 0.1}{0.9 + 0.1} = 0.18$.
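The same comparison in code, sweeping recall while precision is held at $0.9$ (a small sketch):

```python
def f1(P, R):
    return 2 * P * R / (P + R)

P = 0.9
for R in [0.1, 0.3, 0.5, 0.9]:
    arith = (P + R) / 2
    print(f"R={R}: arithmetic={arith:.2f}, F1={f1(P, R):.2f}")
# F1 stays close to the smaller of the two metrics, while the
# arithmetic mean is pulled up by the high precision alone.
```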
Finally, to show the diminishing returns (and confirm everything we already explained), it suffices to look at the plot generated by the following code.
```python
import numpy as np
import matplotlib.pyplot as plt

def f1_partial_p(P, R):
    """Partial derivative of F1 with respect to precision."""
    return (2 * R**2) / (P + R)**2

def arithmetic_mean(P, R):
    return (P + R) / 2

def f1_score(P, R):
    return (2 * P * R) / (P + R)

# Setup for Partial Derivative Plot
p_range = np.linspace(0.01, 1, 200)
recalls = [0.2, 0.5, 0.8, 1.0]

plt.figure(figsize=(12, 5))

# Plot 1: The Partial Derivative (Sensitivity & Diminishing Returns)
plt.subplot(1, 2, 1)
for r in recalls:
    plt.plot(p_range, f1_partial_p(p_range, r), label=f'Recall = {r}', lw=2)
plt.title('Sensitivity of $F_1$ to Precision\n($\\partial F_1 / \\partial P$)', fontsize=13)
plt.xlabel('Precision ($P$)', fontsize=11)
plt.ylabel('Rate of Change in $F_1$', fontsize=11)
plt.legend(title='Recall Value')
plt.grid(True, linestyle='--', alpha=0.6)
plt.ylim(0, 2.1)

# Plot 2: F1 Score vs Arithmetic Mean (The "Penalization" Property)
plt.subplot(1, 2, 2)
fixed_r = 0.8
plt.plot(p_range, f1_score(p_range, fixed_r), label='$F_1$ Score', lw=3, color='tab:blue')
plt.plot(p_range, arithmetic_mean(p_range, fixed_r), label='Arithmetic Mean', lw=2, linestyle='--', color='tab:red')
plt.title(f'Comparison: $F_1$ vs. Arithmetic Mean\n(Fixed Recall = {fixed_r})', fontsize=13)
plt.xlabel('Precision ($P$)', fontsize=11)
plt.ylabel('Score', fontsize=11)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()
plt.savefig('f1_analysis_plots.png')
plt.show()
```

## Relevant Links

| File                                     | Folder    | Last Modified                |
| ---------------------------------------- | --------- | ---------------------------- |
| (FBeta) E-Score                          | 1. Cosmos | 11:29 AM - February 25, 2026 |
| Interpolated Precision                   | 1. Cosmos | 2:10 AM - February 25, 2026  |
| Precision (1-false discovery proportion) | 1. Cosmos | 1:54 AM - February 25, 2026  |