After Hurricane Irma passed over Florida in September 2017, I analysed the quality of the forecasts that the US National Hurricane Center had issued for it. When Hurricane Dorian made its way to the US coast, I showed the Irma blog post to my friend Chris, a traffic modeler and forecaster. In response, Chris asked how NHC forecast quality had developed since.
Raising that question sufficed to rekindle my interest in hurricane forecasts, and I decided to analyse the quality of the NHC forecasts over a wider range of events. The questions I attempted to answer are: how does forecast quality…
… vary by forecast lead time?
… develop over time?
… vary by storm type?
NHC forecast advisories
The US National Hurricane Center issues forecasts for the path and intensity of tropical depressions, storms and cyclones that may make landfall in the United States. These forecast advisories are posted on the NHC web portal in a number of formats, including text bulletins and cone plots (the links point to examples of each). The animation below is a composite of all graphic advisories issued during the lifetime of Hurricane Dorian.
If and when the NHC is tracking a storm, its forecast advisories are issued at least four times daily, at 3am, 9am, 3pm and 9pm (all times in UTC). The forecast cycle for each advisory begins 3 hours prior to the issuance of the advisory products, i.e. at 12am, 6am, 12pm or 6pm. In the present analysis, these latter “reference times” are referred to as T0 (“T-Zero”). Each advisory contains predicted positions at T0 plus 12h, 24h, 36h, 48h, 72h, 96h and 120h, here referred to as forecast lead times.
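To make that bookkeeping concrete, here is a minimal sketch in Python of how an advisory’s issuance time maps to T0 and to the times at which the forecast positions are valid. The function names are mine, not part of any NHC tooling.

```python
from datetime import datetime, timedelta

# Lead times (hours after T0) at which the advisories list predicted positions.
LEAD_TIMES_H = [12, 24, 36, 48, 72, 96, 120]

def reference_time(issued_at):
    """T0: the start of the forecast cycle, 3 hours before the advisory is issued."""
    return issued_at - timedelta(hours=3)

def valid_times(issued_at):
    """Map each lead time to the UTC time at which the forecast position is valid."""
    t0 = reference_time(issued_at)
    return {h: t0 + timedelta(hours=h) for h in LEAD_TIMES_H}

# Example: the 09:00 UTC advisory has T0 = 06:00 UTC, so its 120 h forecast
# is valid at 06:00 UTC five days later.
print(valid_times(datetime(2019, 9, 1, 9)))
```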
The advisories contain both forecast and observed locations of the storm’s center, which makes it possible to verify the forecasts. In the present analysis, only forecasts of the location of the storm’s eye are verified; no attempt was made to verify other variables such as wind speed and direction.
The data
The data was obtained by downloading the text bulletins of the forecasts issued from 2006 through October 2019 that have been archived on the NHC web portal, and extracting the relevant passages. The plot below gives a sense of the number of storms included in the analysis.
Some manual checks were done to ensure that the downloaded and parsed data correspond to the data in the original forecasts. Note that the data contains locations at a resolution of 1/10th of a degree. The advisories acknowledge that the observed locations themselves are uncertain: a location is only accurate to within a certain number of nautical miles. In the present analysis, this observational uncertainty is not taken into account.
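To give an idea of the extraction involved, below is a minimal parsing sketch in Python. It assumes the typical layout of the text bulletins, with the observed position on a line of the form “CENTER LOCATED NEAR <lat>N <lon>W AT <day/time>Z” and predicted positions on “FORECAST VALID …” and “OUTLOOK VALID …” lines; the regular expressions are my own, and the scripts actually used for the analysis may well look different.

```python
import re

# Positions in the bulletins are given to 1/10th of a degree. The patterns below
# are my own sketch of the extraction, not an official format specification.
CENTER_RE = re.compile(
    r"CENTER LOCATED NEAR\s+(\d+\.\d)([NS])\s+(\d+\.\d)([EW])\s+AT\s+(\d{2}/\d{4}Z)")
FORECAST_RE = re.compile(
    r"(?:FORECAST|OUTLOOK) VALID\s+(\d{2}/\d{4}Z)\s+(\d+\.\d)([NS])\s+(\d+\.\d)([EW])")

def signed(value, hemisphere):
    """Turn '27.5' plus a hemisphere letter into a signed decimal degree."""
    return float(value) * (-1.0 if hemisphere in "SW" else 1.0)

def parse_bulletin(text):
    """Return (observed position, list of forecast positions) from one text bulletin."""
    observed = None
    m = CENTER_RE.search(text)
    if m:
        lat, ns, lon, ew, when = m.groups()
        observed = (when, signed(lat, ns), signed(lon, ew))
    forecasts = [
        (when, signed(lat, ns), signed(lon, ew))
        for when, lat, ns, lon, ew in FORECAST_RE.findall(text)
    ]
    return observed, forecasts
```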
Results
The plot below shows the forecast errors of all available forecast-observation pairs versus forecast lead time. As expected, the error grows with lead time, largely due to the chaotic nature of the atmosphere.
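Assuming the error is measured as the great-circle distance between the predicted and the later observed eye position, a haversine computation along the following lines would do (the constant and function name are mine).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_NM = 3440.1  # mean Earth radius in nautical miles

def great_circle_nm(lat1, lon1, lat2, lon2):
    """Haversine distance, in nautical miles, between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlam = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * asin(sqrt(a))

# Example: a forecast position of 26.5N 76.5W verified against an observed 27.0N 77.2W.
print(round(great_circle_nm(26.5, -76.5, 27.0, -77.2), 1), "nm")
```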
I then, for each lead time separately, plotted the median annual forecast error versus the year in which the forecast was issued. The plot shows a general decrease in forecast error from 2006 through 2014. Since then, forecast errors have remained more or less constant, with a slight increase observed in the present calendar year, 2019. While the median only partly describes the full record of errors, an analysis of the error distributions (not shown here) yields similar results.
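Given a table of verified pairs, the aggregation itself is short. A sketch with pandas, assuming a DataFrame with columns year, lead_time_h and error_nm (the column names are mine and the numbers are made-up placeholders, not results from the analysis):

```python
import pandas as pd

# Hypothetical verification table: one row per forecast-observation pair.
pairs = pd.DataFrame({
    "year":        [2006, 2006, 2014, 2014, 2019, 2019],
    "lead_time_h": [24,   120,  24,   120,  24,   120],
    "error_nm":    [65.0, 310.0, 40.0, 180.0, 45.0, 200.0],
})

# Median forecast error per issuance year, separately for each lead time.
median_errors = (
    pairs.groupby(["lead_time_h", "year"])["error_nm"]
         .median()
         .unstack("year")
)
print(median_errors)
```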
I subsequently explored how median annual forecast errors vary with storm type. The idea behind this is that the NHC may give more attention to hurricanes than to weaker storms, and that this attention may result in higher forecast quality. This appears to be confirmed by the graph below.
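One way to obtain the storm type is to read it off the advisory itself, which names the system (tropical depression, tropical storm or hurricane) at the top of the bulletin. A minimal sketch, assuming that naming convention holds for all bulletins (the function and the “Other” fallback are mine):

```python
import re

# A forecast advisory names the system at the top, e.g. a line beginning with
# "HURRICANE <NAME> FORECAST/ADVISORY NUMBER ...". The mapping below is my own
# simplification; subtropical and post-tropical systems fall through to "Other".
TYPE_RE = re.compile(r"^(HURRICANE|TROPICAL STORM|TROPICAL DEPRESSION)\b", re.MULTILINE)

def storm_type(bulletin_text):
    """Classify a bulletin by the storm type named in its header."""
    m = TYPE_RE.search(bulletin_text)
    return m.group(1).title() if m else "Other"

print(storm_type("HURRICANE DORIAN FORECAST/ADVISORY NUMBER  33\n..."))  # -> "Hurricane"
```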
From the storm names, their gender may be ascertained. An interesting question then is whether a difference in forecast quality can be observed depending on storm gender. Indeed, the plot below appears to suggest that females are less predictable than males, which many of us may either confirm or refute on the basis of personal, anecdotal evidence.
Discussion
In the above analysis, ‘forecast quality’ was reduced to ‘characteristics of the difference between the forecast and observed storm eye location’. Obviously, there is more to forecast quality than that. A more elaborate analysis would have to include a more robust statistical treatment of the findings. It could also include further analysis of which factors contribute to forecast quality. For example, quality may vary by basin (Atlantic versus Eastern Pacific versus Central Pacific), by forecaster, by the number of preceding storms within a season (due to increased experience of the forecasters) and by the level of potential adverse consequences. It could also well vary by factors such as the level of agency funding, the overall level of forecaster experience and many more.
Storm Donald
Storms tend to trigger forecaster action. An example of the reverse case was observed when US President Trump decided to publish a hurricane forecast independently of the NHC. The source of that forecast was unclear; its quality, clearly poor. ‘Storm Donald’ has not been included in the above analysis, and I can only hope that the NHC forecasters have made it through that storm okay.
You verified forecasts. So what?
Analysing the NHC advisories required a significant amount of resources, and one could ask: why bother? Indeed, this is a valid question. Verification, while fun, shouldn’t be done simply for the heck of it. Rather, there should be some wider objective: a set of questions that may be answered using the results of a verification exercise.
When I showed a first version of the present blog post to Chris, he asked me various questions, including
Has improved forecast quality resulted in increased benefits, in terms of damage reduction?
If so, how does that benefit relate to the investments made into improved forecasts?
Why has forecast quality kind of leveled off post 2014 or so?
Why are forecasts for hurricanes so much better than those for weaker tropical storms?
Why do some of the forecasts have errors of well over 1,000 miles?
These are questions he would not have asked unless he had seen the
results of the analysis of forecast quality. The point I’m trying to
make here is: forecast verification results in a set of questions that,
in turn, may well lead to improved forecasts. And this is exactly one of
the reasons why forecasts should be verified. And while it is important
that forecasters verify their own forecasts, I would argue that
forecast verification information should also be available to a wider
audience than that. A large, diverse audience will ask questions that a
smaller group would never have thought of, thus opening up novel
avenues of improvement. Scary, yes – but so worthwhile.
Post scriptum: additional use of the data
I do not intend to try and publish any of the above in the peer-reviewed scientific literature. However, feel free to contact me if you would like to bring the analysis to a level that is fit for that. If you can make a convincing case that you indeed can, I’m happy to share the data and my scripts with you. Contact me through email at jan@forecastverification.com.