The Brier Score is a scoring metric for probability forecasts of binary events, such as Probability of Precipitation (POP). It is defined as $\text{BS} = \frac{1}{n}\sum_{i=1}^{n}(f_i - o_i)^2$, where $n$ is the number of forecasts, $f_i$ is the forecast probability (0 to 1), and $o_i$ is the binary outcome (0 if it didn’t rain, 1 if it did). It ranges from 0 (perfect) to 1 (worst).
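The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `brier_score` and the five-day forecast record are hypothetical, chosen only to show the arithmetic.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    for f, o in zip(forecasts, outcomes):
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"forecast {f} is outside [0, 1]")
        if o not in (0, 1):
            raise ValueError(f"outcome {o} is not 0 or 1")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical five-day POP record: issued probabilities and what happened.
pop = [0.1, 0.7, 0.3, 0.9, 0.0]
rained = [0, 1, 1, 1, 0]
print(round(brier_score(pop, rained), 4))  # → 0.12
```

The squared errors are 0.01, 0.09, 0.49, 0.01 and 0, so the mean is 0.60 / 5 = 0.12. Most of the penalty comes from the single day where 30% was forecast and it rained.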
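A Brier Score can be “decomposed” into three parts: Reliability (does it rain 30% of the time when you say 30%?), Resolution (can you tell the difference between a 10% day and a 90% day?), and Uncertainty (which depends only on how often it rains in the sample, not on the forecaster). The parts combine as $\text{BS} = \text{REL} - \text{RES} + \text{UNC}$, so a small reliability error and a large resolution both lower (improve) the score. Professionals use the Brier Score decomposition to check whether a model is reliable but lacks resolution. For example, a model that always forecasts the climatological average is perfectly reliable, yet it never singles out a specific storm.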
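The decomposition can be sketched in Python as below. It follows the standard (Murphy) decomposition with forecasts grouped by their exact issued value, in which case $\text{REL} = \frac{1}{n}\sum_k n_k(f_k - \bar{o}_k)^2$, $\text{RES} = \frac{1}{n}\sum_k n_k(\bar{o}_k - \bar{o})^2$ and $\text{UNC} = \bar{o}(1 - \bar{o})$. Here $n_k$ is the number of times forecast value $f_k$ was issued, $\bar{o}_k$ is how often it rained on those days, and $\bar{o}$ is the overall rain frequency. The function name and sample data are hypothetical. Many verification tools instead bin forecasts into ranges (e.g. 0 to 10%), which makes the identity only approximate.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) such that BS = REL - RES + UNC.

    Forecasts are grouped by their exact value, which makes the identity exact.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n  # overall (sample climatological) rain frequency

    # Collect the outcomes observed each time a given probability was issued.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        n_k = len(obs)
        obs_freq = sum(obs) / n_k  # how often it rained when f was forecast
        reliability += n_k * (f - obs_freq) ** 2  # penalty: f vs. what happened
        resolution += n_k * (obs_freq - base_rate) ** 2  # reward: departing from climate
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical: rain on 3 of 10 days, model always issues the 30% base rate.
rain = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
climatology = [0.3] * len(rain)
rel, res, unc = brier_decomposition(climatology, rain)
print(rel, res, unc)  # rel and res are both 0.0; unc is about 0.21
```

The climatology-only model has zero reliability error and zero resolution, so its whole Brier Score equals the uncertainty term. This is the “reliable but no resolution” case described above.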
A “proper” scoring rule is one where the forecaster gets the best expected score by reporting their honest belief. The Brier Score is strictly proper, so any other forecast does worse on average. If you try to “game” the system by always forecasting 0% or 100%, your Brier Score will eventually suffer when the rare events you ruled out actually happen. This is why the Brier Score is a standard measure for verifying POP forecasts.
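Why honesty wins can be checked directly. If rain truly has probability $p$ and you issue forecast $f$, the expected Brier score for that day is $p(1-f)^2 + (1-p)f^2 = (f-p)^2 + p(1-p)$, which is smallest at $f = p$. The sketch below searches a grid of forecasts and compares the best one with an all-or-nothing 0% or 100% call. The helper names `expected_brier` and `best_forecast` are hypothetical.

```python
def expected_brier(f, p):
    """Expected Brier score for issuing forecast f when rain truly has probability p."""
    return p * (1 - f) ** 2 + (1 - p) * f ** 2


def best_forecast(p, steps=100):
    """Search forecasts 0, 1/steps, ..., 1 for the lowest expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: expected_brier(f, p))


for p in (0.05, 0.3, 0.8):
    honest = best_forecast(p)
    extreme = float(round(p))  # "gamed" forecast: 0% or 100%, whichever is nearer
    print(f"p={p}: best f={honest:.2f} (E={expected_brier(honest, p):.3f}), "
          f"extreme f={extreme:.0f} (E={expected_brier(extreme, p):.3f})")
```

At $f = p$ the expected score is $p(1-p)$, which is always below the all-or-nothing call's expected score of $\min(p, 1-p)$ whenever $0 < p < 1$.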
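In a desert where it rarely rains, it is easy to get a “good” (low) Brier Score by simply forecasting 0% every day. This is why the raw score is rarely used alone. Instead, professionals judge the score against a baseline forecast for the local environment, usually through the Brier Skill Score, $\text{BSS} = 1 - \text{BS}/\text{BS}_{\text{ref}}$, where $\text{BS}_{\text{ref}}$ is the Brier Score of a reference forecast such as climatology (the local frequency of rain). A BSS of 1 is perfect, 0 means no better than the reference, and negative values mean worse than the reference.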
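A skill-score sketch follows. It assumes the reference is sample climatology (the observed rain frequency issued every day); long-term station climatology is an equally common choice. The desert data and helper names are hypothetical, and `brier_score` is repeated so the block runs on its own.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_skill_score(forecasts, outcomes, reference=None):
    """BSS = 1 - BS / BS_ref: 1 is perfect, 0 matches the reference, < 0 is worse.

    Without an explicit reference, assume sample climatology: the observed
    base rate issued as the forecast on every day.
    """
    if reference is None:
        base_rate = sum(outcomes) / len(outcomes)
        reference = [base_rate] * len(outcomes)
    bs_ref = brier_score(reference, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; BSS is undefined")
    return 1 - brier_score(forecasts, outcomes) / bs_ref


# Hypothetical desert year: rain on 4 of 365 days, forecaster always says 0%.
days = 365
rain = [1] * 4 + [0] * (days - 4)
always_dry = [0.0] * days
print(round(brier_score(always_dry, rain), 4))        # low raw score
print(round(brier_skill_score(always_dry, rain), 4))  # slightly below zero
```

Forecasting 0% every day scores $4/365 \approx 0.011$, which looks excellent in isolation, but its skill score is $1 - 365/361 \approx -0.011$: slightly worse than simply issuing the base rate every day.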