Disaster storylines and knowledge graphs from global news with large language models and retrieval-augmented generation
This study aims to produce structured storylines and knowledge graphs for more than 3,000 global disaster events from 2014 to 2024. It processes news articles from the Europe Media Monitor (EMM) through a retrieval-augmented generation (RAG) pipeline that pairs the text generation of large language models (LLMs) with semantic extraction of factual knowledge from unstructured news text. The resulting storylines and graphs capture multi-hazard dynamics, inter-event relationships, and interactions between physical hazards and societal responses.
It concludes that, by transforming diverse textual sources into a well-organized, reusable format, the dataset provides an openly available resource designed for interoperability and reuse in line with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Beyond research, potential operational advantages include faster data validation and encoding for loss-and-damage tracking, quicker situational awareness and decision support during emergencies, and improved machine-to-machine data exchange.
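
The retrieve-then-extract flow summarized above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the bag-of-words cosine retrieval, the prompt wording, and the three sample articles are all assumptions made for illustration, and the LLM call is left out entirely (the triples passed to `triples_to_graph` are hand-written stand-ins for what a model might return).

```python
import json
import math
import re
from collections import Counter

# Made-up sample articles (not EMM data), used only to exercise the sketch.
ARTICLES = [
    "Heavy rainfall caused flooding in the river valley and displaced residents.",
    "The flooding triggered landslides that blocked roads.",
    "The city council approved a new budget for public parks.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, articles, k=2):
    """Return the k articles most similar to the query (retrieval step)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        articles,
        key=lambda art: cosine(q, Counter(tokenize(art))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task, passages):
    """Assemble a prompt that grounds the generation step in retrieved text."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Context:\n{context}\n\n"
        f"Task: {task}\n"
        "Return (subject, relation, object) triples."
    )


def triples_to_graph(triples):
    """Convert (subject, relation, object) triples into a node/edge dict."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    edges = [{"source": s, "relation": r, "target": o} for s, r, o in triples]
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    passages = retrieve("flooding landslides impact", ARTICLES, k=2)
    print(build_prompt("Summarize the hazard chain.", passages))
    # Hypothetical LLM output, written by hand for this example.
    triples = [
        ("heavy rainfall", "causes", "flooding"),
        ("flooding", "triggers", "landslides"),
    ]
    print(json.dumps(triples_to_graph(triples), indent=2))
```

Serializing the graph as plain JSON is one simple way to support machine-to-machine exchange; a production system would more likely use an embedding-based retriever and a standard graph vocabulary.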