202602141353
Status: #reference
Tags: Statistics, Machine Learning
State: #nascent

F1-Score

The harmonic mean of Precision (1-false discovery proportion) and Recall (Statistical Power). It is defined as follows:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
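As a quick sanity check, the score can be computed directly from confusion-matrix counts, since precision is $TP/(TP+FP)$ (one minus the false discovery proportion) and recall is $TP/(TP+FN)$. The helper below is a minimal sketch, not part of the note itself:

```python
def f1_from_counts(tp, fp, fn):
    """F1 from confusion-matrix counts (illustrative helper).

    precision = TP / (TP + FP)   # 1 - false discovery proportion
    recall    = TP / (TP + FN)   # statistical power / sensitivity
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# With precision = recall = 0.8, the harmonic mean is also 0.8.
print(f1_from_counts(tp=80, fp=20, fn=20))
```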

This unnatural expression is easier to understand once we look at its partial derivatives. Intuitively, if we care about both quantities, we want a score that:

- rewards an increase in either precision or recall;
- punishes, at least somewhat, an increase in one at the detriment of the other;
- offers diminishing returns, since past a given threshold of recall/precision the increase in performance is marginal and therefore not worth rewarding much more.

We see that the partial derivative of $F_1$ with respect to Precision is:

$$\frac{\partial F_1}{\partial\,\text{Precision}} = \frac{2 \times \text{Recall}^2}{(\text{Precision}+\text{Recall})^2}$$

and with respect to Recall is:

$$\frac{\partial F_1}{\partial\,\text{Recall}} = \frac{2 \times \text{Precision}^2}{(\text{Precision}+\text{Recall})^2}$$

This tells us that the $F_1$ has all of our desired properties. Indeed, both derivatives are strictly positive, so $F_1$ is monotonically increasing in each argument: it rises whenever either precision or recall rises.
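For completeness, the first derivative follows from the quotient rule applied to the definition (writing $P$ and $R$ for precision and recall; the second is symmetric):

$$\frac{\partial F_1}{\partial P} = \frac{2R(P+R) - 2PR}{(P+R)^2} = \frac{2R^2}{(P+R)^2} > 0$$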

Regarding the sensitivity to precision: since the derivative grows with $\text{Recall}^2$, an increase in precision raises $F_1$ more when recall is high than when recall is low. Symmetrically, the exact same thing holds for recall.

Observe that the harmonic mean stays closer to the lower of precision and recall, whereas the arithmetic mean is easily inflated when either metric is high. For example, if the precision is 0.10 but the recall is 0.98, the arithmetic mean yields $(0.98+0.10)/2 = 0.54$, whereas the harmonic mean gives $(2 \times 0.98 \times 0.10)/(0.98+0.10) \approx 0.18$.
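The numbers above can be checked in a couple of lines (a small sketch; `P` and `R` are just the example values):

```python
P, R = 0.10, 0.98

arithmetic = (P + R) / 2
harmonic = 2 * P * R / (P + R)  # this is exactly the F1 score

print(f"arithmetic mean: {arithmetic:.2f}")  # 0.54
print(f"harmonic mean:   {harmonic:.2f}")    # 0.18
```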

Finally, to show the diminishing returns (and confirm everything we already explained), it suffices to look at the following plots.

```python
import numpy as np
import matplotlib.pyplot as plt

def f1_partial_p(P, R):
    """Partial derivative of F1 with respect to precision: 2R^2 / (P + R)^2."""
    return (2 * R**2) / (P + R)**2

def arithmetic_mean(P, R):
    return (P + R) / 2

def f1_score(P, R):
    return (2 * P * R) / (P + R)

# Setup for partial derivative plot
p_range = np.linspace(0.01, 1, 200)
recalls = [0.2, 0.5, 0.8, 1.0]
plt.figure(figsize=(12, 5))

# Plot 1: The partial derivative (sensitivity & diminishing returns)
plt.subplot(1, 2, 1)
for r in recalls:
    plt.plot(p_range, f1_partial_p(p_range, r), label=f'Recall = {r}', lw=2)
plt.title('Sensitivity of $F_1$ to Precision\n($\\partial F_1 / \\partial P$)', fontsize=13)
plt.xlabel('Precision ($P$)', fontsize=11)
plt.ylabel('Rate of Change in $F_1$', fontsize=11)
plt.legend(title='Recall Value')
plt.grid(True, linestyle='--', alpha=0.6)
plt.ylim(0, 2.1)

# Plot 2: F1 score vs arithmetic mean (the "penalization" property)
plt.subplot(1, 2, 2)
fixed_r = 0.8
plt.plot(p_range, f1_score(p_range, fixed_r), label='$F_1$ Score', lw=3, color='tab:blue')
plt.plot(p_range, arithmetic_mean(p_range, fixed_r), label='Arithmetic Mean', lw=2, linestyle='--', color='tab:red')
plt.title(f'Comparison: $F_1$ vs. Arithmetic Mean\n(Fixed Recall = {fixed_r})', fontsize=13)
plt.xlabel('Precision ($P$)', fontsize=11)
plt.ylabel('Score', fontsize=11)
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()
plt.savefig('f1_analysis_plots.png')
plt.show()
```

## Relevant Links

| File                                     | Folder    | Last Modified                |
| ---------------------------------------- | --------- | ---------------------------- |
| (FBeta) E-Score                          | 1. Cosmos | 11:29 AM - February 25, 2026 |
| Interpolated Precision                   | 1. Cosmos | 2:10 AM - February 25, 2026  |
| Precision (1-false discovery proportion) | 1. Cosmos | 1:54 AM - February 25, 2026  |