The value of forecasters-in-the-loop in real-time flood forecasting in the age of machine learning
This study evaluates machine learning (ML) models for hydrological (streamflow and flood) forecasting. Previous research compared ML against simplified baselines or reanalysis data. This study instead benchmarks ML against a real operational forecasting system, that of the California Nevada River Forecast Center (CNRFC). The CNRFC system combines physics-based hydrological modeling in the Community Hydrologic Prediction System (CHPS) with human forecasters actively in the loop. The comparison covers both general forecasts and flood alerting at lead times up to 96 hours. The setup deliberately favors ML: the ML models receive observed (effectively perfect) precipitation inputs, while the operational CNRFC system must work with imperfect, biased weather forecasts.
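The summary does not name the verification scores used. The following is therefore only a minimal sketch of how forecasts from two systems could be scored per lead time. It assumes two common choices:

- the Nash-Sutcliffe efficiency (NSE) for general streamflow skill;
- contingency-table scores for flood alerting above a flow threshold: probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI).

The function names (`nse`, `alert_scores`, `score_by_lead_time`), the threshold, and the toy flow series are hypothetical. None of them is taken from the study.

```python
"""Hypothetical sketch: scoring streamflow forecasts separately for each lead time.

NSE and contingency-table alert scores are assumed metric choices,
not necessarily the ones used in the study.
"""
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the observed mean."""
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - sq_err / variance


@dataclass
class AlertScores:
    hits: int
    misses: int
    false_alarms: int

    @property
    def pod(self) -> float:
        """Probability of detection: share of observed floods that were forecast."""
        n = self.hits + self.misses
        return self.hits / n if n else float("nan")

    @property
    def far(self) -> float:
        """False alarm ratio: share of flood forecasts that did not verify."""
        n = self.hits + self.false_alarms
        return self.false_alarms / n if n else float("nan")

    @property
    def csi(self) -> float:
        """Critical success index: hits over all observed floods and issued alerts."""
        n = self.hits + self.misses + self.false_alarms
        return self.hits / n if n else float("nan")


def alert_scores(obs: Sequence[float], sim: Sequence[float], threshold: float) -> AlertScores:
    """Count hits, misses and false alarms for flows at or above `threshold`."""
    hits = misses = false_alarms = 0
    for o, s in zip(obs, sim):
        observed, forecast = o >= threshold, s >= threshold
        if observed and forecast:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1
    return AlertScores(hits, misses, false_alarms)


def score_by_lead_time(
    obs: Sequence[float],
    forecasts_by_lead: Dict[int, Sequence[float]],
    threshold: float,
) -> Dict[int, Tuple[float, AlertScores]]:
    """Score each lead time (hours) separately; every forecast series must be
    aligned with `obs` at the same valid times."""
    return {
        lead: (nse(obs, fc), alert_scores(obs, fc, threshold))
        for lead, fc in sorted(forecasts_by_lead.items())
    }


if __name__ == "__main__":
    # Toy flows in m^3/s, invented for illustration only.
    observed = [10.0, 20.0, 80.0, 120.0, 60.0, 30.0]
    forecasts = {
        24: [12.0, 22.0, 75.0, 110.0, 55.0, 28.0],
        96: [15.0, 35.0, 50.0, 90.0, 105.0, 40.0],
    }
    for lead, (eff, alerts) in score_by_lead_time(observed, forecasts, threshold=100.0).items():
        print(f"{lead:>3} h  NSE={eff:.3f}  POD={alerts.pod:.2f}  "
              f"FAR={alerts.far:.2f}  CSI={alerts.csi:.2f}")
```

The sketch scores each lead time separately against the same observations. This is what lets two systems be compared on how gracefully their skill degrades. A single score pooled over all lead times would hide that difference.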
Despite this advantage, the human-guided operational system consistently outperformed the ML models at every lead time tested. Forecasters compensated for inaccurate precipitation inputs and kept forecasts reliable through their domain expertise, something the ML models could not replicate. The operational system also degraded more gracefully as lead time increased, whereas ML performance deteriorated more sharply. The study concludes that human expertise remains irreplaceable in real-world forecasting. It warns that current ML capabilities are often overstated when models are evaluated only in controlled, retrospective settings rather than against true operational benchmarks.