Cultivating trust in AI for disaster management
Artificial intelligence applied in disaster management must be reliable, accurate, and, above all, transparent. But what does transparency in AI mean, why do we need it, and how is it achieved?
When a wildfire spreads, an earthquake strikes, or a storm makes landfall, the public depends on timely, precise, and accurate information from authoritative sources so it can effectively respond and recover. People also expect vital infrastructure and services (telecommunications, evacuation routes, and first-response measures, to name a few) to be in place to reduce the impacts when disaster hits.
In communities and countries around the world, humans, aided by traditional technologies such as physics-based flood models and numerical weather forecasts, meet these disaster management expectations as best they can. Increasingly, studies show that artificial intelligence (AI) can build on and supplement these traditional technologies [Sun et al., 2020]. However, AI-based disaster management tools, like their non-AI counterparts, can fail. Furthermore, the complexity of some AI algorithms makes it difficult to pinpoint causes of failure.
Transparency with respect to the data, training, evaluation, and limitations of AI for disaster management is therefore critical to ensure the safety and robustness of these tools. Transparency also cultivates trust among end users, including disaster management agencies, first responders, and individuals, enabling them to make informed decisions with confidence. Here we highlight examples of how AI is already contributing to disaster management and identify steps to foster transparency.
AI and the Four Phases of Disaster Management
Disaster management refers to strategies intended to offset the impacts of hazards. Traditionally, these strategies consider four phases of intervention: mitigation, preparedness, response, and recovery [Sun et al., 2020] (Figure 1).
Mitigation includes actions taken well in advance of a disaster, such as purchasing insurance to minimize potential financial burdens and constructing barriers to hold back future flooding. Preparedness refers to actions taken as a disaster becomes imminent, including forecasting and monitoring its progress, preparing shelters, and stockpiling disaster supplies. The response phase covers actions taken during a disaster, such as providing humanitarian assistance and sending out search and rescue missions. Finally, the recovery phase refers to actions after most of the damage has occurred, including impact assessment, debris removal, and reconstruction.
AI technologies, including machine learning (ML) algorithms trained to recognize patterns in data sets, are showing great promise for beneficial use in all four phases, even outperforming some traditional tools in terms of accuracy and efficiency.
For mitigation, AI is being used to identify vulnerabilities in critical infrastructure and to inform urban planning. For example, the resilience technology company One Concern is combining AI with virtual representations (known as digital twins) of natural and built environments in Japan to visualize possible impacts of disasters on critical infrastructure, including power grids, roads, and airports. In Europe, the DestinE project is building digital twins of Earth systems and funding extensive ML development to better understand the effects of climate and extreme weather events.
Using a different approach, Gazzea et al. [2023] applied AI to understand traffic during hurricanes (Figure 2), which can help urban planners strategically position sensors to capture traffic flow and enhance situational awareness during disasters. Researchers are also using AI to produce maps of landscape susceptibility (e.g., to landslides) to guide infrastructure development [e.g., Azarafza et al., 2021].
Fig. 2. Using a combination of satellite imagery and data on roadways, AI can be trained to optimize the placement of traffic sensors to best capture traffic flows and enhance situational awareness during disasters. Credit: Adapted from Gazzea et al. [2023], CC BY 4.0
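Sensor placement of the kind shown in Figure 2 is often framed as a coverage optimization problem. The sketch below illustrates the idea with a simple greedy maximum-coverage heuristic over hypothetical candidate sites; it is a minimal illustration of the concept, not the method of Gazzea et al. [2023].

```python
# Minimal sketch: sensor placement as greedy maximum coverage.
# Hypothetical sites and road segments; not the method of Gazzea et al. [2023].

def greedy_sensor_placement(candidates, coverage, budget):
    """Pick up to `budget` sites that together observe the most road segments.

    candidates: list of candidate site IDs
    coverage:   dict mapping each site ID to the set of segments it observes
    budget:     number of sensors available
    """
    chosen, covered = [], set()
    for _ in range(budget):
        if not candidates:
            break
        # Choose the site that adds the most not-yet-covered segments.
        best = max(candidates, key=lambda s: len(coverage[s] - covered))
        if not coverage[best] - covered:
            break  # no remaining site adds new coverage
        chosen.append(best)
        covered |= coverage[best]
        candidates.remove(best)
    return chosen, covered

# Toy example: three candidate intersections with overlapping views.
coverage = {"A": {"r1", "r2", "r3"}, "B": {"r3", "r4"}, "C": {"r4", "r5", "r6"}}
sites, covered = greedy_sensor_placement(list(coverage), coverage, budget=2)
print(sites)  # ['A', 'C']: together they cover all six segments
```

Because coverage gains of this kind shrink as sites are added, the greedy choice is a standard baseline with known approximation guarantees, although learned approaches can exploit richer information, such as observed traffic flows.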
AI can support preparedness by contributing to forecasts. For example, AltaML is training AI with data on historical fires, regional weather, and forest conditions to predict wildfires. In addition, the European Centre for Medium-Range Weather Forecasts, enabled by additional funding from member states, runs and publishes publicly available weather forecasts using AI models from Google DeepMind (GraphCast), NVIDIA (FourCastNet), and Huawei (Pangu-Weather) and its own Artificial Intelligence Forecasting System.
Benefits of these algorithms include low computational costs and high accuracy with respect to global metrics and certain extreme weather events. General disadvantages include the black box nature of AI-generated forecasts: As AI models grow in complexity, it can become increasingly difficult to understand how they reach their decisions. Other uncertainties relate to how climate change will affect future weather regimes. Pangu-Weather, for example, has faced more specific challenges, including overly smooth forecasts, growing bias at longer lead times, and difficulty predicting tropical cyclone intensity [Ben Bouallègue et al., 2024]. However, the growing involvement of domain experts in developing data-driven forecasting models can help address shortcomings and expand forecasting capabilities.
AI's ability to detect and monitor hazards can also enhance preparedness. For example, ALERTCalifornia and CAL FIRE are using AI to recognize smoke and other fire indicators in footage from 1,050 cameras distributed across California and to alert local fire departments, applications that are especially useful for monitoring remote regions. During its first 2 months of operation, the agencies' system correctly identified 77 fires before they were reported via 911. Pano AI is similarly applying AI to data from rotating ultrahigh-definition cameras, satellites, field sensors, and other sources to detect smoke rapidly. Detections verified by human analysts are then communicated to first responders, cutting response times to fires.
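A minimal sketch of this detect-then-verify pattern, with hypothetical camera IDs and smoke scores (neither CAL FIRE's nor Pano AI's actual implementation), might look like the following.

```python
# Minimal sketch of camera-based smoke alerting with human verification:
# a model flags candidate detections, and only those confirmed by an analyst
# are forwarded to responders. All names and scores are hypothetical.
from queue import Queue

review_queue: Queue = Queue()

def flag_candidates(frame_scores, threshold=0.8):
    """Enqueue camera frames whose smoke score exceeds the threshold."""
    for camera_id, score in frame_scores:
        if score >= threshold:
            review_queue.put((camera_id, score))

def analyst_review(confirm) -> list:
    """A human analyst confirms or rejects each queued detection."""
    confirmed = []
    while not review_queue.empty():
        detection = review_queue.get()
        if confirm(detection):
            confirmed.append(detection)  # forwarded to first responders
    return confirmed

flag_candidates([("cam_12", 0.93), ("cam_47", 0.55), ("cam_301", 0.88)])
print(analyst_review(lambda d: d[1] > 0.9))  # [('cam_12', 0.93)]
```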
During the response phase, AI can provide situational awareness and decision support for disaster management efforts. For instance, research has demonstrated the ability of AI to sift through geolocated social media posts and find clusters of emergency messages, which could help identify where response efforts may need to be prioritized [Powers et al., 2023]. In the Real-time Artificial Intelligence for Decision Support via RPAS Data Analytics (AIDERS) project, data collected by sensors on board remotely piloted aircraft systems (RPAS) are being analyzed using AI to support actionable decisions for first responders in emergency situations. Following a disaster, AI can be used to detect differences in pre- and postdisaster aerial or satellite imagery to ascertain the extent of damages [e.g., Kaur et al., 2023].
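As a rough illustration of the clustering step, the sketch below groups hypothetical geolocated posts with DBSCAN from scikit-learn using a haversine distance; it is not the pipeline of Powers et al. [2023], just one plausible way to find such clusters.

```python
# Minimal sketch: finding spatial clusters of emergency posts with DBSCAN.
# Hypothetical coordinates; one plausible approach, not a published pipeline.
import numpy as np
from sklearn.cluster import DBSCAN

# Geolocated posts already flagged as emergency related (latitude, longitude).
posts_deg = np.array([
    [34.05, -118.25], [34.06, -118.24], [34.05, -118.26],  # dense cluster
    [36.77, -119.42],                                      # isolated post
])

EARTH_RADIUS_KM = 6371.0
eps_km = 2.0  # posts within roughly 2 km of each other are grouped

db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are in radians
    min_samples=2,
    metric="haversine",
).fit(np.radians(posts_deg))

print(db.labels_)  # [0 0 0 -1]: one cluster plus one noise point
```

Cluster labels like these could then be ranked by message volume or severity to suggest where response efforts may need to be prioritized.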
The Importance of Transparency
Despite many positive examples, applying AI in disaster management also presents substantial challenges. One is understanding how models arrive at their results and thus whether they are reliable. Humans must be able to evaluate the quality of AI-generated information before using it to make important decisions. Often, however, end users are not provided information about how an AI model is trained and evaluated.
Take Google's Android Earthquake Alert System, which for some parts of the world pools anonymized accelerometer data from individual Android phones onto Google servers, applies AI algorithms to detect seismic events, and triggers alerts if a detected event meets or exceeds magnitude 4.5. During the 2023 Kahramanmaraş earthquake sequence along the Türkiye-Syria border, however, reception of the alerts was reportedly patchy, despite circumstances that should have enhanced the system's reliability: The largest earthquakes exceeded the magnitude threshold, there was a high density of Android users in the region, and, because the shaking occurred at night, many of these phones were likely stationary, at least early in the sequence.
A seeming lack of transparency from the company about how the system operates, how well it works, and how users responded to surveys about the system's functionality following the event has raised concerns about the system's reliability and about if and where it failed (i.e., during the training of the AI model, the detection of the seismic event, or the triggering of the alert). By contrast, the inner workings of other early-warning systems, such as the U.S. Geological Survey's ShakeAlert system, which relies on data from a network of seismometers and issues alerts from a publicly accountable entity, are far more transparent.
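To see why locating a failure matters, consider a skeletal version of such an alerting pipeline, with stages mirroring the three possible failure points just mentioned; all names and values here are hypothetical placeholders rather than Google's implementation.

```python
# Minimal sketch of a three-stage alerting pipeline: model training (stage 1,
# offline), event detection (stage 2), and alert triggering (stage 3).
# Hypothetical placeholders only, not Google's implementation.
from dataclasses import dataclass
from typing import Optional

MAGNITUDE_THRESHOLD = 4.5  # alerts are issued at or above this magnitude

@dataclass
class Detection:
    magnitude: float
    latitude: float
    longitude: float

def detect_event(accelerometer_windows) -> Optional[Detection]:
    """Stage 2: detection. A model trained in stage 1 would run here."""
    ...  # placeholder: the trained model estimates event parameters

def should_alert(detection: Optional[Detection]) -> bool:
    """Stage 3: triggering. A failure here is distinct from stages 1 and 2."""
    return detection is not None and detection.magnitude >= MAGNITUDE_THRESHOLD

# A magnitude 7.8 detection, like the largest 2023 event, crosses the threshold.
print(should_alert(Detection(magnitude=7.8, latitude=37.2, longitude=37.0)))  # True
```

Without transparency about each stage, an undelivered alert cannot be attributed to a mistrained model, a missed detection, or faulty trigger logic, which is precisely the concern raised above.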
Sometimes, AI algorithms might not perform as expected because of deficiencies in the data used for training or as operational input. For example, biases in training data (related to, say, the selection of data collection sites [McGovern et al., 2024]) can distort model outputs. Or if sensors are not sufficiently sensitive (to detect wildfires in remote or otherwise inaccessible regions, for example), an AI algorithm might miss signals, resulting in failures to warn residents.
Another avenue by which AI is being applied in disaster management is in chatbots. By generating text (see, e.g., the Strengthening Disasters Prevention approaches in Eastern Africa chatbot from the United Nations Educational, Scientific and Cultural Organization), AI can provide guidance and decision support during disasters. However, AI chatbots can raise additional challenges if they "hallucinate," that is, if they generate convincing but unreliable answers that could be misleading and even dangerous.
Because of the complexity of many AI systems, identifying points of failure is difficult, and potential risks may be hard to spot in advance, especially for underinformed users. Thus, it is important for the developers of these tools to be transparent about the quality, suitability, accessibility, and comprehensiveness of data used in an AI model; about how the model, its components, and its training algorithm function; and about limitations in capability and applicability [Mittelstadt et al., 2019].
Furthermore, this transparency must also be meaningful. In other words, the information provided to stakeholders should be complete and understandable to enable informed decision-making. AI developers should implement fail-safes (e.g., involving human oversight), consider the AI literacy of human end users (which affects their ability to interpret output information), and combat cognitive biases such as automation bias, users' tendency to rely too heavily on algorithms. Together, these approaches can ensure the safety and robustness of AI tools for disaster management, enhance trust in them, enable the replication of methods, and contribute to more efficient transfer of knowledge and capacity sharing among current and potential users of the technology.
Steps to Success
Two deliberate steps that academic researchers, companies, and others developing AI-based tools for operational disaster management can take to foster transparency are sharing comprehensive documentation and undergoing regular independent audits.
Documentation such as open-access metadata, data sheets, and other publications should disclose the origins and characteristics of AI training data according to the FAIR (findable, accessible, interoperable, and reusable) principles and should detail how these data have been processed. Considerations of data quality (including information about missing values or biases) and privacy, as well as ethical considerations such as whether the data are equitably shared, should also be documented. These approaches safeguard sensitive information and build public confidence in the resulting AI application.
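As one hypothetical example, a machine-readable datasheet capturing these elements might look like the sketch below; every field and value is illustrative.

```python
# Minimal sketch of a machine-readable datasheet for training data, loosely
# following the FAIR principles. All fields and values are illustrative.
import json

datasheet = {
    "title": "Example wildfire camera training set",  # findable
    "identifier": "doi:10.xxxx/example",              # findable (placeholder DOI)
    "access": "https://example.org/data",             # accessible (hypothetical URL)
    "format": "GeoTIFF imagery + CSV labels",         # interoperable
    "license": "CC BY 4.0",                           # reusable
    "collection_sites": ["descriptions of where and how data were gathered"],
    "known_biases": "sparse coverage of remote regions; few nighttime frames",
    "missing_values": "smoke labels absent for 3% of frames",
    "privacy": "no personally identifiable imagery retained",
}

print(json.dumps(datasheet, indent=2))
```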
During training and evaluation of an AI model, it is important to follow best practices to ensure the reproducibility and validity of the model. Modeling methods, decisions, limitations, and ethical considerations should be disclosed and documented in a short file called a model card [Mitchell et al., 2019], which can be disseminated in publicly available materials on platforms such as GitHub and GitLab or in open-access journals.
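A minimal model card, with invented contents, might be drafted and versioned alongside the code like this:

```python
# Minimal sketch of a model card [Mitchell et al., 2019] written to disk so it
# can be versioned with the code on GitHub or GitLab. Contents are invented
# for illustration, not results from a real model.
from pathlib import Path

MODEL_CARD = """\
# Model Card: Example Flash-Flood Warning Model

## Intended use
Early warning for gauged river basins; not validated for ungauged basins.

## Training data
1990-2020 gauge records (see accompanying datasheet); few extreme events.

## Evaluation
Hit rate 0.85 and false-alarm ratio 0.12 on held-out 2018-2020 seasons.

## Limitations and ethical considerations
Performance degrades beyond 6-hour lead times; outputs are advisory only.
"""

Path("MODEL_CARD.md").write_text(MODEL_CARD)
```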
We advise using white box algorithms (e.g., causal trees) that are inherently interpretable or, if a higher level of complexity is necessary, combining black box models (e.g., deep neural networks) with explainability methods. Explainability methods help justify the recommendations, decisions, or actions of an AI model. Some of these methods provide a local explanation for why a decision was made for a single prediction, whereas others provide global insights into general model behavior [e.g., Mamalakis et al., 2022]. For example, if a user wants to understand why an AI system provided a specific earthquake detection, a local explanation would suffice. If, however, the user wants to understand why an earthquake detection system repeatedly fails, a global explanation would have greater value.
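The contrast can be made concrete with an inherently interpretable decision tree: printing the learned rules gives a global explanation, whereas tracing the path of a single input gives a local one. The sketch below uses synthetic data and invented feature names.

```python
# Minimal sketch contrasting global and local explanations with a white box
# decision tree. Data are synthetic; feature names are invented.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = ["peak_amplitude", "duration", "frequency", "snr"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Global explanation: the full set of learned rules describes overall behavior.
print(export_text(tree, feature_names=feature_names))

# Local explanation: the path one input takes through the tree explains why
# that single prediction was made.
node_indicator = tree.decision_path(X[:1])
print("nodes visited for sample 0:", node_indicator.indices.tolist())
```

For black box models, post hoc explainability tools would play the role that the printed rules and the traced path play here.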
When an AI model is operationally deployed, it is important to convey its uncertainties and thresholds clearly to users. For instance, what level of uncertainty in the model output is tolerable, and what threshold must be crossed for an early warning of an earthquake or flash flood to be triggered? Also, what technical requirements (e.g., Internet connectivity) must be met for such a warning to be received?
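A sketch of how such thresholds might be made explicit in code, with illustrative (not operational) values, follows; the intermediate review band is one possible human-in-the-loop fail-safe.

```python
# Minimal sketch of making a warning threshold and its tolerance explicit to
# users. Probabilities and thresholds are illustrative, not operational.

WARN_THRESHOLD = 0.7         # issue a warning at or above this probability
UNCERTAIN_BAND = (0.4, 0.7)  # defer to a human forecaster inside this band

def decide(p_event: float) -> str:
    """Map a model's event probability to a user-facing decision."""
    if p_event >= WARN_THRESHOLD:
        return "WARN"
    if UNCERTAIN_BAND[0] <= p_event < UNCERTAIN_BAND[1]:
        return "REVIEW"  # human-in-the-loop fail-safe
    return "NO ACTION"

for p in (0.85, 0.55, 0.2):
    print(f"P(event)={p:.2f} -> {decide(p)}")
```

Publishing values like these, along with delivery requirements such as Internet connectivity, lets users judge in advance when a warning will and will not reach them.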
Finally, it is imperative for developers to conduct regular audits and public reporting of the AI systems they are designing and implementing. Independent evaluations using relevant performance metrics and benchmarks, supported by organizations such as the U.S. Government Accountability Office and the U.S. Department of the Interior, should assess the effectiveness and fairness (the absence of bias in data and algorithms [see Gevaert et al., 2021]) of AI applications. Public reporting of findings promotes transparency and encourages continuous improvement.
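For instance, an audit of a warning system might report standard verification metrics such as the probability of detection and the false-alarm ratio; the sketch below computes them from synthetic outcomes.

```python
# Minimal sketch of verification metrics an independent audit might report:
# probability of detection (POD) and false-alarm ratio (FAR).
# The observed/warned records below are synthetic.
import numpy as np

observed = np.array([1, 1, 0, 1, 0, 0, 1, 0])  # 1 = event occurred
warned   = np.array([1, 0, 0, 1, 1, 0, 1, 0])  # 1 = warning issued

hits         = int(np.sum((warned == 1) & (observed == 1)))
misses       = int(np.sum((warned == 0) & (observed == 1)))
false_alarms = int(np.sum((warned == 1) & (observed == 0)))

pod = hits / (hits + misses)                # fraction of events warned
far = false_alarms / (hits + false_alarms)  # fraction of warnings that were false

print(f"POD={pod:.2f}, FAR={far:.2f}")  # POD=0.75, FAR=0.25
```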
In addition to sharing comprehensive documentation and undergoing regular auditing, we emphasize the importance of integrating stakeholders and end users in the development of AI-based systems. Such multiparty and interdisciplinary collaboration is vital for disaster management, in which various groups must work harmoniously in critical conditions [Kuglitsch et al., 2022].
Through partnerships among AI researchers, natural hazard experts, disaster management experts, policymakers, and members of affected communities, AI development becomes more inclusive and responsive to diverse needs and more transparent to all levels of stakeholders. Within these partnerships, adoption of harmonized terminologies around disaster risk and AI, such as those produced by the International Strategy for Disaster Reduction, the United Nations Office for Disaster Risk Reduction, and the Focus Group on AI for Natural Disaster Management (FG-AI4NDM, which has transitioned into the U.N. Global Initiative on Resilience to Natural Hazards through AI Solutions), can lead to clearer communication across disciplines. Furthermore, adopting policies at the national level (e.g., U.S. federal Executive Order 13960 on promoting trustworthy AI in the government) and regional level (e.g., the European Union's AI Act), as well as standards at the global level (e.g., the technical reports produced by FG-AI4NDM), can help foster transparency.
These steps toward collaboration and transparency, especially in the documentation, implementation, and public presentation of AI systems, are critical to the success of AI for helping at-risk communities worldwide through disaster mitigation, preparedness, response, and recovery.
CC BY-NC-ND 3.0