Towards multimodal geospatial reasoning: a foundation model approach for disaster detection from social media, news, and weather data
The timely detection of disasters is essential for effective emergency response. Traditional satellite-based monitoring provides accurate hazard observations but suffers from acquisition delays and weather-dependent imaging conditions. Therefore, recent research increasingly uses rapidly available digital data such as social media, news, and weather observations. However, most approaches analyse these sources in isolation and lack standardised evaluation. This publication addresses the gap using a grid-based framework that quantifies disaster detection accuracy relative to satellite-derived reference data. Within this framework, it introduces a multimodal geospatial reasoning method that employs generative Language Models (LMs) to interpret heterogeneous information. Across two case studies on the 2024 Central Europe floods and the 2025 Southern California wildfires, LM-based detection outperformed traditional hotspot and anomaly detection while requiring only ten content items per prediction. Results were robust across prompt variants, and Automatic Prompt Optimisation provided only moderate gains. Overall, this research offers the first systematic evaluation of Bluesky, GDELT, and weather data for disaster detection and shows that Foundation Models (FMs) can act as efficient zero-shot or few-shot detectors of natural-hazard-induced disasters.
Overall, the results show that both Bluesky posts and GDELT news headlines provide measurable, although limited, signals for detecting natural-hazard-induced disaster events. GDELT performed slightly better than Bluesky, likely because news headlines more consistently reference affected locations. In contrast, weather anomaly detection produced near-random results. When the same data sources were analysed with the LM-based geospatial reasoning approach, performance improved substantially: the multimodal configuration outperformed all statistical baselines while using only ten content items per prediction. Zero-shot configurations provided the best balance between precision and recall for both classes, with the score-based data retrieval variant performing best for the 2024 Central Europe floods and the cell-based retrieval variant for the 2025 Southern California wildfires. This suggests that LMs can effectively interpret evidence from multiple heterogeneous sources and detect disaster events even when data are sparse. Multimodal integration generally improved disaster detection accuracy across both statistical and LM-based approaches, with the multimodal LM configurations outperforming all statistical ensembles. Combining Bluesky and GDELT data produced the most stable results across all disaster detection methods, suggesting that social media and news media hold complementary information.
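
To make the reported per-class precision and recall concrete, the following minimal sketch shows how grid-cell predictions could be scored against satellite-derived reference labels. It is an illustration under stated assumptions, not the evaluation code used in the study: the function name `per_class_scores`, the `(row, col)` cell identifiers, the binary labelling, and the class names `disaster` and `no_disaster` are all hypothetical.

```python
from typing import Dict, Hashable


def per_class_scores(
    predicted: Dict[Hashable, bool],
    reference: Dict[Hashable, bool],
) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1 for both classes over shared grid cells.

    Assumption: each grid cell carries a binary label (True = disaster),
    and `reference` stands in for satellite-derived reference data.
    """
    # Only score cells that appear in both the prediction and the reference.
    cells = predicted.keys() & reference.keys()
    scores: Dict[str, Dict[str, float]] = {}

    for label, name in ((True, "disaster"), (False, "no_disaster")):
        tp = sum(1 for c in cells if predicted[c] == label and reference[c] == label)
        fp = sum(1 for c in cells if predicted[c] == label and reference[c] != label)
        fn = sum(1 for c in cells if predicted[c] != label and reference[c] == label)

        # Guard against empty denominators when a class is never predicted or present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        scores[name] = {"precision": precision, "recall": recall, "f1": f1}

    return scores


# Hypothetical 2x2 grid: keys are (row, col) cell indices.
predicted = {(0, 0): True, (0, 1): True, (1, 0): False, (1, 1): False}
reference = {(0, 0): True, (0, 1): False, (1, 0): True, (1, 1): False}
scores = per_class_scores(predicted, reference)
```

Reporting both classes separately matters here because disaster cells are typically far rarer than unaffected cells, so a single accuracy figure could hide poor detection of the affected class.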