in

Identification of tropical cyclone–related flash floods from hazard narratives using a large language model–based approach


Abstract

This study employs a Large Language Model (LLM)–based approach to identify tropical cyclones (TCs)-related flash floods and understand their interrelationships using hazard narratives from the National Centers for Environmental Information Storm Events Database. A processing pipeline comprising two LLMs and a series of validation procedures was developed to extract the contributing factors of flash floods and classify them classified as either TC-related factors or external weather and environmental conditions. The analysis identified 6,470 flash floods across the contiguous United States related to 135 North Atlantic and East Pacific TCs between 2007 and 2024. The spatial proximity of flash floods to TC tracks varied substantially, suggesting multiple TC-related flood-generating mechanisms. The contributing factors to flash floods extracted from hazard narratives indicated that floods primarily driven by TC processes occurred closer to the TCs, whereas events influenced by external factors extended well beyond the typical TC rainfall range.

Similar content being viewed by others

Climate change exacerbates compound flooding from recent tropical cyclones

Basin-informed flood frequency analysis using deep learning exhibits consistent projected regional patterns over CONUS

The conterminous United States are projected to become more prone to flash floods in a high-end emissions scenario

Introduction

Floodings are among the world’s deadliest natural hazards, and have significant social, economic, and environmental impacts1. In the United States (U.S.), tropical cyclones (TCs) and their associated heavy rainfall are major causes of inland flooding2,3, and rainfall-induced flood deaths occurred in more TCs than in any other hazard4. Inland flooding can be broadly classified into two categories: flash flooding and riverine flooding. Flash floods are rapid flooding events occurring within six hours of a triggering event, typically heavy rainfall5. Owing to their rapid onset and complex hydrometeorological characteristics, flash floods pose severe threats to life, property, and infrastructure, and present significant challenges for forecasting, early warning, and flood risk management1,3,6. Recent deadly flash floods from Hurricane Helene in North Carolina (2024)7 and the remnants of Hurricane Hilary (2023) in the southwestern U.S.8 have drawn renewed attention to how TCs can interact with other weather systems and environmental conditions to produce flooding hundreds of kilometers away and even after the storm has dissipated. Research has found that most TC-related flood fatalities occur inland, where communities may not understand that they are at risk of TC-induced flooding3. Thus, there is a critical need for a comprehensive assessment of TC-related flash floods, including their spatial patterns and contributing factors. Such knowledge can strengthen public awareness, guide local preparedness efforts, and improve flood risk management.

The climatology of TC-associated flooding in the U.S. has been extensively examined at local, regional, and continental scales2,9,10,11,12. Most of these studies identify floods as TC-associated when river discharge or flooding occurs within defined spatial and temporal windows of the storm track. A spatial window of 500 km from TC tracks is commonly used2,11, as previous studies have shown that this distance captures the majority of TC rainfall and effectively distinguishes TC rain from that produced by other weather systems13,14,15. Consequently, prior studies primarily focused on flash floods driven directly by the rainfall fields of active TCs whose locations were being tracked, overlooking events beyond these spatial ranges. However, previous studies have shown that TC- and remnant-induced environmental changes, particularly enhanced moisture, can interact with mid-latitude systems to produce flooding more than 1000 km away from the parent TC, including some of the most severe events on record16,17,18,19. Therefore, it is essential to incorporate these types of floods when assessing the full scope of TC-related hydrological and societal impacts.

Previous studies seeking to identify the meteorological causes of flooding often require extensive datasets to characterize the complex interrelationships between storms and resulting flood events3,20. As demonstrated by prior work, these analyses typically rely on data from multiple mesoscale and synoptic-scale sources (e.g., weather charts, reanalysis data, and radar observations), along with visual inspection and expert judgment, to accurately attribute flood causation3,20. Consequently, such approaches can be time-intensive, making them impractical for large-scale or systematic assessments of TC-related flooding. In addition, most studies focus primarily on meteorological aspects, while other critical factors—such as soil moisture and topography21—are overlooked due to data limitations and the need for specialized expertise3.

In contrast, narrative descriptions provided in hazard reports and individual studies capture the detailed features, context, and impacts of hazard events. These narratives, often prepared by domain experts, synthesize diverse qualitative information sources, making them invaluable for understanding the complex interrelationships among hazards. For example, one of the most comprehensive hazard databases in the U.S.—the National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI) Storm Events Database22—includes narrative accounts describing the causes and impacts of recorded events. Recent advances in Natural Language Processing (NLP), particularly in Large Language Models (LLMs), now enable causal reasoning and the extraction of structured information from such large volumes of narrative data. Researchers have utilized NLP and LLMs in various tasks in the atmospheric science and hazard research, including identifying disasters interrelationships23, reasoning adverse weather conditions24,25, solving atmospheric science problems26, recognizing TC-related named entities27, and extracting disaster impacts28. Despite their promising capabilities, the potential and challenges of using LLMs and hazard narratives to advance the understanding of complex hazard interrelationships have yet to be fully explored.

This study aims to employs an LLM-based approach to identify flash floods influenced by North Atlantic and East Pacific TCs across the contiguous United States (CONUS) from 2007 to 2024, using hazard narratives provided in the NCEI Storm Events Database. We developed a processing pipeline that integrates two LLMs with a series of validation procedures to extract factors causally contributing to flash floods from the narratives. The extracted factors were subsequently classified as either TC-related or external factors independent of TCs. We then examined the extracted results across three aspects. First, we provided an overview of TC-related flash flood events and their associated human and economic losses. Second, we measured the distance from flash floods to TC tracks to understand their spatial relationship. Third, we examined frequently contributing TC and external factors described in the narratives to better understand how TCs contributed to flash floods and their spatial characteristics. Together, these results provide an improved understanding of TC-related flash floods—where they occur and the conditions that contribute to them—which can support forecasting, mitigation, and public education. We also discuss the feasibility of using LLMs for large-scale analysis of text-based weather and hazard data, along with the study’s limitations and future research directions.

Results

Overview of TC-related flash floods

From 2007 to 2024, a total of 6470 flash floods in the CONUS were identified as related to North Atlantic and East Pacific TCs, representing 9.5% of all flash floods recorded during that time22. However, these TC-related flash floods caused 318 direct fatalities, accounting for 26.5% of the total 1197 flash flood deaths during the study period. About 2.3% of TC-related flash floods caused direct death, nearly double the 1.0% observed for all flash flood events6,22. In terms of direct property damage, TC-related flash floods resulted in a total of $64.1 billion (2024 USD), representing 73.2% of the total flash flood damage of $88.0 billion. This high proportion is largely driven by the extreme losses associated with Hurricane Harvey (2017)6,29. Overall, 38.7% of TC-related flash floods caused property damage, slightly higher than the 36.8% observed for all flash flood events.

Figure 1 presents county-level TC-related flash flood frequency and damages from 2007 to 2024. A total of 1328 counties across 42 CONUS states plus Washington, D.C. experienced at least one TC-related flash flood event (Fig. 1a). In the east CONUS, the most affected areas were along the Atlantic and Gulf coasts, particularly the Carolinas and the Northeast. Florida showed a local minimum in flash flood frequency, consistent with prior studies indicating that much of its flooding is from slow-rise river floods rather than flash floods30. Although not a primary region directly impacted by TC landfalls, the Southwest still experiences frequent TC-related flood events. Even some northern areas, such as counties in Utah and Washington, have recorded such events10. The spatial distributions of these TC-related flood events are generally consistent with previous TC flooding studies2,10. In terms of damage, a total of 118 counties experienced at least one direct death caused by TC-related flash floods (Fig. 1b), and 806 counties experienced direct property damage (Fig. 1c). Both loss metrics reveal high damage clusters in Texas—primarily due to Hurricane Harvey (2017)29—and in western North Carolina, largely attributed to Hurricane Helene (2024)7. It is noteworthy that the damage does not show a sharp decline inland, as flash flood frequency does (Fig. 1a). Significant losses from TC-related flash floods occurred in inland regions that were otherwise less frequently impacted by TCs.

Fig. 1: County-level distribution of TC-related flash floods during 2007–2024.

a Event frequency. b Direct fatalities. c Directproperty damage (million 2024 USD). In all panels, darker shading indicates higher values, lighter shading indicates lower values,and and gray denotes counties without TC-related flash flooding or without reported fatalities or damage during the study period.

Full size image

Spatial proximity of flash floods to TCs

This section examines the spatial relationship between storm location and flood onset. The spatial distribution of flash floods and the flood-inducing TC tracks are shown in Fig. 2. The locations of flash floods were represented by the midpoints between their reported start and end coordinates. These 6470 flash flood events were related to 135 TCs, including 90 North Atlantic TCs and 45 East Pacific TCs. Among these TCs, 65 TCs (44 East Pacific TCs and 21 North Atlantic TCs) did not make landfall over CONUS, and 31 TCs (26 East Pacific TCs and 5 North Atlantic TCs) never moved within 500 km of CONUS, such as Hurricane Joaquin (2015), which were related to severe flooding in South Carolina31. These results suggest that TCs can trigger flash floods far from the storm track. To quantify this, we measured the distance between each flash flood location and the nearest point on the corresponding storm track. For flash floods starting during active TC periods with valid track data, distances were calculated from the flood location to the TC track on the day the flood began. In addition, 17.9% of flash floods started after TC dissipation—when storm positions were no longer tracked. As a TC dissipates, its remnants often interact with and become increasingly influenced by large-scale weather systems, resulting in heavy rainfall10,19. For these flash floods, we measured distances from the flood locations to the nearest points along the entire TC track to assess proximity to the TC during its lifetime.

Fig. 2: Locations of flash floods and related TC tracks during 2007–2024.

a–f Flash flood locations and TC tracks for successiveperiods: a 2007–2009, b 2010–2012, c 2013–2015, d 2016–2018, e 2019–2021, and f 2022–2024. In all panels, blue pointsindicate reported flash flood locations, red lines represent TC tracks, and the gray buffer shows areas within 500 km of the contiguousUnited States.

Full size image

Table 1 summarizes the flood-to-TC distances of flash floods occurring during and after the TC active lifecycle. For flash floods starting within the TC active lifecycle, the average distance from the flash flood to the same-day track segment is approximately 334.9 km, with half of the events located within 163.7 km. This aligns with previous findings that TC-related rainfall and flooding typically occur within a 500 km range2,32. About 22% of flash floods were located outside the 500 km of the storm track on the same day. The flash floods that began after TC dissipation occurred at significantly greater distances from the storm track, likely due to remnant lows being advected north or northwest and continuing to generate rainfall even after the TC was officially classified as dissipated19,33. The average distance was 896.8 km, with over 75% of these events occurring beyond 500 km from the entire TC track. Notably, the distance distributions for both groups exhibit large standard deviations, with maximum distances extending up to 2000 km.

Table 1 Descriptive summary of the distance (km) from the flash flood events to the storm tracks
Full size table

The proximity of flash flood events to the storm track also shows geographic variation (Fig. 3). Figure 3a, b show the flood-to-TC distance occurring during the active TC lifecycle. For most counties in the coastal states along the Gulf and East Coasts, as well as along the inland trajectory from Texas northeastward to Ohio, the average distance is generally less than 500 km. This spatial pattern corresponds to the spatial pattern of landfalling North Atlantic TC tracks, which frequently generate flooding in close proximity to the storm path2. Small clusters of counties over the coast, along the Appalachian region and in the Midwest initiated flash flood events when the storm was, on average, positioned to the south, with over half of these events occurring at distances exceeding 500 km (Fig. 3b). The southwestern CONUS, which is primarily impacted by East Pacific TCs, experienced flash floods that tended to start when the TC was located farther away. Figure 4c, d illustrate the distance of the flash flood occurring after the active TC lifecycle. Post-TC dissipation events were concentrated in the Northeast and southwestern CONUS, with additional scattered clusters across the Midwest and Midsouth (Fig. 3d). On average, these events occurred at greater distances from the storm track, with numerous counties—especially in the Northeast, central, and western regions—recording events at distances exceeding 1000 km (Fig. 3c).

Fig. 3: County-level spatial proximity of flash floods to related TCs.

a Mean distance from flash floods beginning during the TCactive phase to the same-day TC track. b Percentage of flash floods occurring more than 500 km from the same-day TC track. c Mean distance from flash floods beginning after the TC active phase to the full TC track. d Percentage of flash floods occurringmore than 500 km from the full TC track. In all panels, darker shading indicates higher values, lighter shading indicates lower values,and gray denotes counties without events of this type.

Full size image

In summary, the spatial proximity of flash flood events to TCs exhibits great variability, with many occurring outside the typical spatial and temporal range of active TCs. The relatively long distance suggests that a TC-related floods can develop either prior to the arrival of a TC as a result of predecessor rainfall16,17,18, after the TC has passed or dissipated10,33, or from a TC that never directly approached the affected location31. These results also suggest that numerous floods are triggered not solely by TC rainfall, but also by complex interactions between TCs and external environmental factors.

Contributing factors to TC-related flash floods

This section examines how the TCs and external factors contribute to flash floods and their spatial characteristics. The 6470 flash floods were associated with 978 storm episodes, as defined in the Storm Events Database, which groups events linked to the same storm system34. Each episode includes a narrative summarizing the synoptic conditions and storm evolution. Further details are provided in the Methods section. The TC and external factors extracted from the narratives were normalized in categories and Fig. 4a shows the categories that are present in > 5% of 978 TC-related flood episodes and their interconnections.

Fig. 4: Frequent TC and external contributing factors and their combinations.

a Network graph of TC and external contributing factors present in more than 5% of episodes. Green nodes represent TC factors and orange nodes represent external factors; node size reflects episode frequency, and links indicate co-occurrence within the same episode. b Frequent combinations of TC and external factors occurring in more than 5% of episodes. Green bars indicate TC factors-only combinations, and orange bars indicate combinations involving both TC and external factors.

Full size image

TC factors mentioned in these narratives predominantly fall into four categories: The name of the cyclone, remnants, TC rain, and moisture (Fig. 4a). TC names and remnants denote the corresponding storm systems. Almost all the episodes (~94.8%) refer to the storm by its name (e.g., Hurricane Ian) as expected, while about 46.4% of TC-related flood episodes explicitly indicate it’s the remnants of the storm that contribute to the flash floods. TC rain (~51.6% of the episodes) and TC-associated moisture (~40.5% of the episodes) were the two primary features identified as contributing TC factors to flash floods, together accounting for 84.8% of episodes. In the remaining episodes, TC roles were often implied through storm names or remnants only rather than explicitly stated.

Approximately 45.3% of 978 flood episodes reported external factors contributing alongside TCs (Fig. 4a). Among these episodes, approximately 33.9% of episodes mentioned the influence of synoptic-scale systems, including fronts (~23.7%), upper-level enhancements (~10.7%), and low-pressure systems (~5.3%). These synoptic features can provide the lifting mechanisms for widespread heavy rainfall and flood-inducing storms3,16,20,35. Environmental instability and diurnal heating, which also contribute to lifting mechanisms for thunderstorm and heavy rainfall36,37, were reported in about 6.8% episodes. Approximately 6.6% of episodes reported the environmental moisture sources in addition to TC moisture, and about 5.2% of episodes reported the wind and flow features facilitating moisture transport, such as the low-level jet. These factors are consistent with previous research on ingredients that contribute to heavy rainfall that triggers flash floods37,38. In addition to atmospheric conditions, approximately 6.6% of episodes reported preexisting surface conditions (e.g., saturated soil from antecedent rainfall) or topography as contributing factors. Although less frequently mentioned, these factors are important considerations in flash flood forecasting21 and were associated with several high-impact events. For example, rainfall from Hurricane Matthew (2016) fell on already saturated soils caused by above-normal precipitation in September in North Carolina, resulting in catastrophic flash flooding39.

We analyzed the co-occurrence frequency of these factors to assess how their interactions contribute to the flash floods. As TC names appeared in nearly all episodes, episodes were classified into two groups—active TCs and remnants—depending on whether the narratives explicitly referenced storm remnants. Figure 4b shows the most frequent factor combinations. First, active TCs and remnants exhibit different patterns: active TCs are more often linked to rainfall, while remnants contribute slightly more through moisture. Second, remnants and TC moisture commonly co-occurred with external systems such as fronts, upper-level features, and environmental instability. These findings are consistent with previous studies on TC-induced remote rainfall, which highlight that TCs supply abundant moisture that subsequently interacts with environmental lifting mechanisms to produce heavy precipitation40. As a TC dissipates and loses its organized structure, its remnants often interact with and become increasingly influenced by large-scale weather systems41.

Because the combinations of factors reflect distinct TC–flood interrelationships, flash flood events were classified according to four distinct groups, namely active TC-only, remnants-only, active TC + external factors, and remnants + external factors, to examine their locations and spatial relationships to the TCs. Figure 5 shows box plots of flood–TC distances. The Kruskal–Wallis test with Dunn’s post-hoc analysis revealed significant differences (p < 0.01) in flood-to-TC distance among groups, except distance between the two groups involving external factors before TC dissipation (Fig. 5a). Figure 6 illustrates county-level flash flood frequency across the four groups.

Fig. 5: Box plots of flash flood–to–TC distance for four groups.

a Distances for flash flood events beginning during the TC active phase. b Distances for flash flood events beginning after TC dissipation.

Full size image
Fig. 6: County-level frequency of flash floods by event type.

a Active TC-only events. b Remnants-only events. c Active TC + external factors events. d Remnants + external factors events. In all panels, darker shading indicates higher frequencies, lighter shading indicates lower frequencies, and gray denotes counties without that event type.

Full size image

The active TC–only group includes 2545 flash floods with no explicit indication of contributions from remnants or external factors. Approximately 96% of floods in this group occurred before TC dissipation. These events occurred close to the storm center at the time of flood onset, with an average distance of 175.2 km (St. Dev:204.1 km) (Fig. 5a). The extreme distance values (> 1182 km, 99th percentile) were associated with one North Atlantic TC (2020 Hurricane Cristobal) and four East Pacific TCs (2014 Hurricane Marie, 2015 Hurricane Dolores, 2018 Hurricane John, and 2023 Hurricane Hilary). This group of flash floods occurred mainly in the counties along the Gulf and East coasts (Fig. 6a). Two counties in the southwest also had a high frequency of active TC-only flash floods, while this region has fewer TCs directly hitting this area. We found that active TC–only events in these southwestern counties typically referenced the TC name or its moisture contribution, without explicit mention of external factors (e.g., Hurricane Hilary, 2023).

The remnants–only group includes 1186 flash flood events associated with TC remnants, with no explicit indication of external contributing factors. Approximately 90% of floods in this group occurred before TC dissipation. These events were located further from the track with an average distance of 219.9 km (St. Dev:234.6 km) (Fig. 5a). The extreme distance values of this group (>1099 km, 99th percentile) were associated with two North Atlantic TCs (2015 Tropical Storm Bill and 2024 Hurricane Debby) and two East Pacific TC (2015 Hurricane Linda and 2022 Hurricane Kay). The remnants-only events primarily occurred along the Northeast coast, with scattered clusters in the mid-South, inland Texas counties, and the Southwest (Fig. 6b).

The active TC + external factors group includes 1098 flash flood events with no clear mention of remnants but with reference to one or more external factors. Approximately 91% of floods in this group occurred before TC dissipated. As expected, the flash floods in this group occurred farther from the storm, averaging 608.2 km (SD: 533.4 km) from the storm track on the same day, with some events occurring as far as 1000–2000 km away (Fig. 5a). The Carolinas, as well as the southwest, observed the major clusters of this type of event (Fig. 6c).

Lastly, the remnants + external factors group includes 1641 flash flood events with mentions of contribution from both TC remnants and external factors. Only about 51.2% of events started within the TC active cycle, averaging 609.1 km (St.Dev: 489.9 km) from the storm track on the same day (Fig. 5a). The remaining events in this group occurred after TC dissipation and represented the majority of flash floods observed in the post-dissipation period. These post-dissipation events were located with an average distance of 1032.7 km (St.Dev: 559.6 km) away from the any part of the entire TC track (Fig. 5b). Remnants + external factors events had a broad pattern from South Carolina to Maine, in the Midwest and inland South, and in the Southwest from southern California to New Mexico (Fig. 6d).

Discussion

Understanding TC contributions to flash floods is crucial; however, conventional approaches often overlook events occurring beyond typical spatial ranges influenced by TC–environment interactions, largely due to data and method limitations. This study developed an LLM-based framework to extract contributing factors to flash floods from hazard narratives, identify TC-related events, and analyze their interrelationships. The results identified 6470 flash floods across the CONUS that were associated with 135 North Atlantic and East Pacific TCs during 2007–2024. Flash floods exhibited substantial variability in both location and timing relative to the parent TC. By examining the flash floods contributing factors extracted from the narratives, we found that approximately half of the flood episodes involved contributions from both TC-related and external environmental conditions. These differing flood-generation mechanisms are associated with distinct spatial characteristics of the flash floods. Flash floods driven solely by active TCs or the remnants typically occurred near the storm track during the TC’s active phase. In contrast, the flash floods that were influenced by TCs or their remnants in combination with external weather systems and environmental conditions tend to develop beyond TC rain range and after TC dissipation. Furthermore, TC–flash flood relationships exhibit clear spatial variability. Areas along the Gulf and Atlantic Coasts were more frequently affected by floods associated with active TCs, whereas northern and inland regions, such as Northeast, central, and Southwest, experienced a higher occurrence of flash floods contributed by TC remnants and environmental factors.

To our knowledge, this study is among the first to employ LLM-based text mining of large-scale weather datasets to examine complex multi-hazard interactions. It advances the understanding of TC-related flash flood climatology by integrating both direct and indirect TC-induced events identified from expert-curated narratives. The quantification of the distance between flood events and TCs suggests that an inland location may experience flooding either before the cyclone makes its closest approach or even when it is not directly in the cyclone’s path. In addition, several of these remote flood events were associated with weaker storms and occurred after TC dissipation. This information can be valuable for improving public awareness and preparedness regarding flash flood safety, particularly for the inland regions beyond the direct TC track3. The findings also highlight the critical role of TC moisture and remnants, in conjunction with environmental factors, in generating floods—particularly those occurring beyond the typical spatial extent of TC influence. With an increase in TC moisture convergence and precipitation rate is anticipated due to climate warming, inland flood risks are expected to rise, highlighting the need for improved TC flood risk management, early warning, and emergency managment42,43.

In addition, this work evaluated the feasibility of the proposed LLM-based framework. As detailed in the Methods section, the model demonstrated strong performance in extracting causal contributing factors to flash floods and labeling them as TC-related or external. The knowledge derived from the extracted information was broadly consistent with findings from previous studies using independent datasets and quantitative analyses, further confirming the robustness of the results. This study underscores the potential of LLMs for analyzing text-based weather and hazard data. Given that much meteorological information—such as forecasts, warnings, and post-event reports—is conveyed in textual form, applying LLMs offers a novel and effective approach to extract and synthesize insights that conventional analytical methods often overlook.

This study has several limitations. First, the extracted information depends on the quality of the hazard report narratives. Although the Storm Events Database is considered one of the best publicly available collections of hazard reports in the U.S., and its records undergo quality control by local offices as well as the NCEI, biases and limitations may still exist due to reporting practices, human judgment, and varying data sources44. Second, LLM methods may have ambiguity or uncertainty depending on how the narrative describes the scenario. For example, while diurnal and topographic effects are important for heavy rainfall and flooding, these factors may be implied only through references to flood timing (e.g., afternoon) or location (e.g., Blue Ridge) rather than explicitly stated, leading to potential omissions in the extracted outputs. Future studies should carefully design and refine prompting strategies to better capture such implicit information within narrative text. Third, we presented contributing factor combinations for the same episode; however, complex spatial and temporal relationships may exist, as an episode can last up to five days34. Fully understanding the mechanisms of rainfall and flooding will require further work incorporating additional data and modeling approaches. For example, analysis of rainstorm structures as detected by ground-based radars can establish regions covered by TC rainbands and provide precise measurements of flood event distances from the TC’s edge.

Methods

Data

This study obtained the storm names, tracks, and timing of TCs that originated from the East Pacific and North Atlantic basins from IBTrACS (International Best Track Archive for Climate Stewardship)45. A total of 654 TCs were included in the analysis to examine their contribution to the flash floods over the COUNS during the 18-year period from 2007 to 2024. The flash flood event information was obtained from the NOAA NCEI Storm Events Database, which documents severe weather events and unusual meteorological phenomena in the U.S. since 195022. This dataset is regarded as the best available and official archive of storm events in the U.S. and has been widely used in flash flood and weather hazards studies3,6,30,44,46.

Inside the Storm Events Database, the data are structured in the following way. First, each record in the Storm Event database represents an individual hazardous event and its detailed information, such as hazard type (e.g., tropical storm, flash flood, or tornado), timing, locations, and loss information34. This work focused on flash flood events. The Storm Events Database reports several flood types, including flash floods, floods, coastal floods, and storm surges. The NCEI distinguishes flash floods from other flood types in the database. According to guidelines provided by the NWS, flash floods are defined as “a life-threatening, rapid rise of water into a normally dry area, beginning within minutes to several hours of the causative event (e.g., intense rainfall, dam failure, ice jam)”34. For each flash flood, the start and end times of the events were reported in the local time zone, and the locations were reported by the impacted county and by the start and end locations of the affected area in latitude and longitude coordinates. Flash flood losses include direct fatalities and property damages reported in actual U.S. dollars.

Second, each event is part of a storm episode, which represents an entire storm system and may encompass various types of weather events. All events within a single episode are considered as associated with the same synoptic meteorological system34. Each episode includes an episode narrative, prepared by the NWS Weather Forecast Office that warned on the event, which provides a detailed textual description of the synoptic meteorology associated with the episode. For flooding events, a summary related to the rainfall or other conditions causing the floods are included. In particular, for TC-related events, the names of the TCs are included in the episode narratives of all related individual events. Since narratives have been available for all episodes only since October 2006, we focused on the period from 2007 to 2024 to ensure the analysis covered 18 complete hurricane seasons.

Information extraction from the narratives

Our method for extracting TC and external factors casually contributing to flash floods from episode narratives uses a pipeline consisting of four main components, as illustrated in Fig. 7. The first component performed episode selection from the Storm Event database using a keyword filtering. The second component used LLM prompting to extract information about casual TC-related and external contributing factors to flash floods, storing the result in a semi-structured format. The third component conducted validation checks of the two LLMs outputs with the original narratives. The fourth component post-processed the semi-structured LLM outputs by exploring the word frequency and normalizing all extracted information. Below we describe each of the four components in more depth.

Fig. 7: Information extraction pipeline with four main modules.

a Text selection: flash flood episodes selected from the Storm Events Database using keyword filtering and filters on hazard type, time, and location. b LLM prompting: two LLMs used to extract contributing factors from the narratives. c Validation: LLM outputs checked against original narratives to generate the verified results. d Post-processing: extracted information analyzed through word-frequency exploration and normalization.

Full size image

Selecting flash flood episode narratives

We first filtered the Storm Events Database by hazard type, time, and location, identifying 20,394 episodes of flash flood events that occurred across the CONUS between 2007 and 2024. We then used a list of 654 East Pacific and North Atlantic TC names (e.g., “Ian” or “Helene) to perform a keyword-based search of the episode narratives, identifying episodes whose narratives contained a TC name corresponding to the year in which the TC occurred. This process yielded 1061 episodes involving flash floods that mentioned a TC name, which were used as inputs for LLM prompting. The episode narratives varied in length, with an average of 141 (St. Dev: 212) tokens per episode.

LLM Prompting

In the core component of our information extraction pipeline, we fed the selected narratives into a LLM along with a sequence of prompts designed to identify causal contributing factors from TCs and external sources that are independent of TCs to the flash flood events. We constructed few-shot prompts to extract three levels of information: (1) whether the TC contributed to the flash floods; (2) the specific factors contributing to the flash floods; and (3) the classification of each factor as either TC-related or external. TC-related factors were defined as features or environmental conditions directly associated with, or primarily influenced by, TC. Examples include rainfall and thunderstorms generated by TCs, as well as tropical moisture transported by TC circulation. In contrast, external factors refer to weather systems or environmental conditions unrelated to TCs. Common examples of external factors include fronts, troughs, upper-level enhancement, topographic effects, atmospheric instability, diurnal heating cycles, and surface conditions37. We also requested the export of intermediate steps to aid in decision-making regarding how identified TC and external factors led to the flash floods. In particular, factors such as rainfall or thunderstorms may be classified as either TC-related or intermediate; the following method was applied to make this determination: If no external factors were mentioned in producing the rainfall, the rainfall was considered primarily a TC factor; otherwise, if the rainfall resulted from the interaction of a TC and external factors (e.g., a cold front), it was classified as an intermediate step. The prompts we applied are provided in Supplementary Text S1. We instructed LLM to generate outputs in CSV format to facilitate downstream processing and analysis.

The prompt engineering process utilized several LLMs, including GPT-4.147, DeepSeek-v3.2-exp48, and Gemini-2.0-Flash (01-21)49. The formal analysis utilized the OpenAI GPT-4.1 and DeepSeek-v3.2-exp chat streaming APIs as the LLMs. During the inference stage, the temperature parameter is fixed at 0.3 to promote more deterministic outputs; the nucleus sampling threshold (top-p) is set to 1.0, ensuring that the entire probability mass is considered; the maximum token limit is determined by the default configuration of the respective models, and the top-k parameter is not specified in the streaming API.

Validation and evaluation of LLM outputs

The raw output generated by the LLM required validation and consistency checks prior to inclusion in the analysis. To ensure data reliability, we conducted a series of validation procedures and compiled a verified list of TC-related episodes, along with their contributing factors as defined in the narratives. First, we identified the episodes that were related to TCs. Non–TC-related episodes were defined as those in which the LLM outputs either did not report any contributing factors, as instructed, or reported only external factors without any TC-related factors. At the episode level, the agreement between the two LLMs—OpenAI GPT-4.1 and DeepSeek-v3.2-exp chat—was 99.4%, with only six episodes showing disagreement. We manually went through all 1061 episodes to verify the results and ultimately identified 978 TC-related episodes.

We combined the extracted factors from both LLMs for the 978 TC-related episodes and validated them against the original narratives. The two models collectively extracted 9409 contributing factors. We compared the outputs of the two LLMs using similarity metrics, including a fuzzy token set ratio threshold of 65 and a Jaccard similarity score threshold of 0.3. The two models exhibited an overall agreement of approximately 87%, with around 13% of the extracted factors differing between them. To ensure accuracy, all TC-related episodes were manually reviewed to validate the extracted TC and external factors against the original narratives, with particular attention given to unmatched cases. This process was supplemented by both exact and fuzzy word-matching techniques to determine whether the extracted information was explicitly present in the narrative rather than inferred or hallucinated by the models. Extracted factors were evaluated using defined criteria to ensure accuracy. A factor was considered a true positive if it (1) appeared in the original narrative or as a reasonable lexical variant; (2) represented a weather system or environmental condition contributing to the flash flood; and (3) was correctly labeled as either “TC” or “External.” Because LLM-generated boundaries can vary, similar expressions (e.g., “frontal boundary,” “a weak frontal boundary,” or “front”) were all accepted when contextually valid. After validation, a finalized list of TC-related and external contributing factors was compiled for all TC-related flash flood events.

To address this ambiguity in entity extraction, we adopted an adjusted “manual score” approach suggested by Dagdelen et al.50 to determine whether the extracted phrase accurately reflects the source information, accounting for equivalent or variant expressions. Evaluation scores, including precision, recall, and F1, were calculated at both the episode and factor levels. While the precision (Eq. (1)) and recall (Eq. (2)) assess the prevalence of false negatives and false positives, the F1 (Eq. (3)) balances the importance of precision and recall and is preferable to accuracy for class-imbalanced datasets.

$${precision}=frac{{No}.{of; true; postive}}{{No}.{of; true; positive}+{No}.{of; false; positive}}$$
(1)
$${recall}=frac{{No}.{of; true; positive}}{{No}.{of; true; positive}+{No}.{of; false; negative}}$$
(2)
$$F1=frac{2times {precision}times {recall}}{{precision}+{recall}}$$
(3)

The evaluation results are presented in Table 2. For the task of identifying whether a flash flood episode is related to a TC, both LLMs achieved a F1 score of 99%, demonstrating strong performance. The false negatives involved predecessor rainfall events or ambiguous cases where a TC was mentioned in the broader synoptic environment without a clear causal link to the flooding. The false positives primarily arose when TC name matches referred to place names rather than storms (e.g., Florence County, Ida County) or the TC being mentioned for unrelated purposes.

Table 2 Model manual validation score
Full size table

Both models achieved F1 scores of 85–90% for extracting and labeling contributing factors. TC-factors showed marginally lower precision, largely from false positives involving non-contributing elements such as strong winds, tornadoes, or repeated terms like “flooding.” False negatives mainly reflected missed factors, such as cases where moisture was not correctly recognized as TC-related. For external factors, false positives mainly stemmed from misclassifying TC-related elements like tropical moisture as external, or from reporting non-contributing features (e.g., high tides). The false negatives were caused by the missing key environmental contributors to TCs. Although this does not constitute the most rigorous evaluation of the method, the strong agreement between the two models and with the original narratives nevertheless demonstrates the feasibility of extracting weather and environmental conditions and causal relationships from text-based data.

Extracted information post-processing

After validating the extracted factors, we performed a word frequency analysis using all confirmed contributing factors as an exploratory step. Word frequency was calculated after stop word removal and lemmatization, and the results were in the form of word clouds (Supplementary Figs. S1 and S2). These contributing factors generally align with previous research on the TC, heavy rainfall, and flash floods3,37. To prepare the data for quantitative analysis, each validated factor was mapped to a predefined category using keyword matching, fuzzy matching, and rule-based methods, guided by terminology from the National Weather Service Weather Glossary5, Glossary of National Hurricane Center Terms51, American Meteorological Society Glossary of Meteorology52, and prior studies on tropical cyclone rainfall and flooding3,20,35,37,38. The categories and exemplary factors are shown in Table 3.

Table 3 Factor categorizing terms
Full size table

Spatial and statistical analysis

Utilizing the extracted information, we identified 978 TC-related episodes encompassing 6470 flash flood events. We first summarized flash flood events by their impacted counties, direct fatalities, and direct property damage. The direct property damage reported in the Storm Event database was adjusted for inflation using the consumer price index metric obtained from the U.S. Bureau of Labor Statistics53. All damage values were scaled to bring monetary values from the year of the event to the final data year (2024).

Second, we analyzed the spatial proximity of flash flood events relative to their corresponding TCs using a Geographic Information System (GIS). Flash flood locations were represented by midpoints derived from start and end coordinates of impacted areas reported in Storm Events. Flash flood start times were converted to UTC and compared with TC active periods from IBTrACS, defined as the interval between TC origination and dissipation or cessation of tracking. For the flash flood events that started within the TC active cycle, the flood-to-TC distance was calculated from the flash flood midpoint to the nearest point on the TC track segment on the same day. For events that occurred after the TC’s active period—when no valid TC track was recorded—we measured the distance to the nearest point along the full TC track. All spatial analyses were performed using Python and ArcPy (ArcGIS Pro v3.3, Environmental Systems Research Institute, Inc.). These distance metrics were mapped at the county level to provide a geographic overview of where flash floods tend to start relative to TC locations.

Last, we investigated the complex interactions between the TC and flash floods by examining the contributed factors to flash floods described in the narratives. Frequent TC and external factors and their interconnections were visualized using a network graph. Flash flood events were further classified according to their contributing factors: initially by the presence or absence of external factors, and subsequently by whether the episode was associated with a TC remnant. This classification resulted in four groups, namely active TC-only events, remnants-only events, active TC + external factors events, and remnants + external factors events. The Kruskal–Wallis test and Dunn’s post-hoc tests were applied to determine whether the distance metrics varied significantly among the four groups. Our working hypothesis is that flash floods influenced by TC and external factors occur at greater distances from the storm center.

Data availability

All the datasets analyzed in this study can be obtained from the following open sources. Tropical cyclone best track data are available at https://doi.org/10.25921/82ty-9e16. Storm Event Database is provided by the National Centers for Environmental Information at www.ncdc.noaa.gov/stormevents/.

References

  1. World Meteorological Organization. Flash Flood Guidance System: Response to one of the deadliest hazards. World Meteorological Organization Available at: https://wmo.int/media/magazine-article/flash-flood-guidance-system-response-one-of-deadliest-hazards (accessed 17 Jul 2025). (WMO, 2020).

  2. Villarini, G., Goska, R., Smith, J. A. & Vecchi, G. A. North Atlantic tropical cyclones and U.S. flooding. Bull. Am. Meteorol. Soc. 95, 1381–1388 (2014).

    Article 

    Google Scholar 

  3. Ashley, S. T. & Ashley, W. S. The storm morphology of deadly flooding events in the United States. Int. J. Climatol. 28, 493–503 (2008).

    Article 

    Google Scholar 

  4. Rappaport, E. N. Fatalities in the United States from Atlantic tropical cyclones: new data and interpretation. Bull. Am. Meteorol. Soc. 95, 341–346 (2014).

    Article 

    Google Scholar 

  5. National Weather Service (NWS). National Weather Service Instruction 10–1605, NWSPD 10–16: Storm Data Preparation. National Weather Service Available at: https://www.weather.gov (accessed 5 Jul 2025). (NWS, 2021).

  6. Ahmadalipour, A. & Moradkhani, H. A data-driven analysis of flash flood hazard, fatalities, and damages over the CONUS during 1996–2017. J. Hydrol. 578, 124106 (2019).

    Article 

    Google Scholar 

  7. National Hurricane Center (NHC). Tropical Cyclone Report: AL092024 (Helene). National Hurricane Center Available at: https://www.nhc.noaa.gov/data/tcr/AL092024_Helene.pdf (accessed 17 Jun 2025). (NHC, 2024).

  8. National Hurricane Center (NHC). Tropical Cyclone Report: EP092023 (Hilary). National Hurricane Center Available at: https://www.nhc.noaa.gov/data/tcr/EP092023_Hilary.pdf (accessed 17 Jun 2025). (NHC, 2023).

  9. Villarini, G., Smith, J. A., Baeck, M. L., Marchok, T. & Vecchi, G. A. Characterization of rainfall distribution and flooding associated with U.S. landfalling tropical cyclones: analyses of Hurricanes Frances, Ivan, and Jeanne (2004). J. Geophys. Res. 116, D23 (2011).

    Google Scholar 

  10. Barth, N. A., Villarini, G. & White, K. Contribution of eastern North Pacific tropical cyclones and their remnants on flooding in the western United States. Int. J. Climatol. 38, 5441–5446 (2018).

    Article 

    Google Scholar 

  11. Liu, M., Smith, J. A., Yang, L. & Vecchi, G. A. Tropical cyclone flooding in the Carolinas. J. Hydrometeorol. 23, 53–70 (2022).

    Article 

    Google Scholar 

  12. Gori, A., Lin, N. & Smith, J. Assessing compound flooding from landfalling tropical cyclones on the North Carolina coast. Water Resour. Res. 56, e2019WR026788 (2020).

    Article 

    Google Scholar 

  13. Zhou, Y. & Matyas, C. J. Spatial characteristics of storm-total rainfall swaths associated with tropical cyclones over the Eastern United States. Int. J. Climatol. 37, 557–569 (2017).

    Article 

    Google Scholar 

  14. Jiang, H., Liu, C. & Zipser, E. J. A TRMM-based tropical cyclone cloud and precipitation feature database. J. Appl. Meteorol. Climatol. 50, 1255–1274 (2011).

    Article 

    Google Scholar 

  15. Prat, O. P. & Nelson, B. R. Precipitation contribution of tropical cyclones in the southeastern United States from 1998 to 2009 using TRMM satellite data. J. Clim. 26, 1047–1062 (2013).

    Article 

    Google Scholar 

  16. Galarneau, T. J., Bosart, L. F. & Schumacher, R. S. Predecessor rain events ahead of tropical cyclones. Mon. Weather Rev. 138, 3272–3297 (2010).

    Article 

    Google Scholar 

  17. Moore, B. J., Bosart, L. F., Keyser, D. & Jurewicz, M. L. Synoptic-scale environments of predecessor rain events occurring east of the Rocky Mountains in association with Atlantic basin tropical cyclones. Mon. Weather Rev. 141, 1022–1047 (2013).

    Article 

    Google Scholar 

  18. Rowe, S. T. & Villarini, G. Flooding associated with predecessor rain events over the Midwest United States. Environ. Res. Lett. 8, 024007 (2013).

    Article 

    Google Scholar 

  19. Ritchie, E. A., Wood, K. M., Gutzler, D. S. & White, S. R. The influence of eastern Pacific tropical cyclone remnants on the southwestern United States. Mon. Weather Rev. 139, 192–210 (2011).

    Article 

    Google Scholar 

  20. Mullens, E. D. Meteorological cause and characteristics of widespread heavy precipitation in the Texas Gulf watershed 2003–2018. Int J Climatol 41, 3743–3760 (2021).

    Article 

    Google Scholar 

  21. Gourley, J. J. & Clark, R. A. III. Real-time flash flood forecasting. In Oxford Encyclopedia of Natural Hazard Science (Oxford Univ. Press, https://doi.org/10.1093/acrefore/9780199389407.013.298 (2018).

  22. National Center for Environmental Information (NCEI). Storm Events Database. NOAA NCEI (2024). Available at: https://www.ncdc.noaa.gov/stormevents (accessed 20 Mar 2025).

  23. Liu, B., Guo, H. & Wang, H. An AI-driven approach to extract interrelationships between disasters. Int. J. Disaster Risk Reduct. 121, 105417 (2025).

    Article 

    Google Scholar 

  24. Zafarmomen, N. & Samadi, V. Can large language models effectively reason about adverse weather conditions? Environ. Model. Softw 188, 106421 (2025).

    Article 

    Google Scholar 

  25. Walczak, S. & Dinh, L. A text mining analytic approach for distinguishing between disaster and non-disaster zones from tweets. Int. J. Disaster Risk Reduct 118, 105233 (2025).

    Article 

    Google Scholar 

  26. Li, C., Deng, W., Lu, M. & Yuan, B. AtmosSci-Bench: evaluating the recent advance of large language model for atmospheric science. Preprint at https://doi.org/10.48550/arXiv.2502.01159 (2025) (accessed 27 Mar 2025).

  27. Zhou, Y. & Liu, P. Assessing multi-hazards related to tropical cyclones through large language models and geospatial approaches. Environ. Res. Lett. 19, 124069 (2024).

    Article 

    Google Scholar 

  28. Li, N. et al. Using LLMs to build a database of climate extreme impacts. In Proc. 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024) 93–110 (Association for Computational Linguistics, Bangkok, Thailand, https://doi.org/10.18653/v1/2024.climatenlp-1.7 (NLP, 2024).

  29. National Hurricane Center (NHC). Tropical Cyclone Report: AL092017 (Harvey). National Hurricane Center Available at: https://www.nhc.noaa.gov/data/tcr/AL092017_Harvey.pdf (accessed 17 Jun 2025). (NHC, 2018).

  30. Dougherty, E. & Rasmussen, K. L. Climatology of flood-producing storms and their associated rainfall characteristics in the United States. Mon. Weather Rev. 147, 3861–3877 (2019).

    Article 

    Google Scholar 

  31. Marciano, C. G. & Lackmann, G. M. The South Carolina flood of October 2015: moisture transport analysis and the role of Hurricane Joaquin. J. Hydrometeorol 18, 2973–2990 (2017).

    Article 

    Google Scholar 

  32. Jiang, H., Ramirez, E. M. & Cecil, D. J. Convective and rainfall properties of tropical cyclone inner cores and rainbands from 11 years of TRMM data. Mon. Weather Rev. 141, 431–450 (2013).

    Article 

    Google Scholar 

  33. National Weather Service. Climatology of tropical cyclone remnants for northern Illinois & northwest Indiana. https://www.weather.gov/lot/tropical_climatology. (accessed 17 Jun 2025).

  34. National Weather Service (NWS). National Weather Service Instruction 10–1605, NWSPD 10–16: Storm Data Preparation. National Weather Service (2021). Available at: https://www.weather.gov (accessed 5 Jul 2025).

  35. Konrad, C. E. & Perry, L. B. Relationships between tropical cyclones and heavy rainfall in the Carolina region of the USA Int. J. Climatol. 30, 522–534 (2010).

    Article 

    Google Scholar 

  36. Bowman, K. P. & Fowler, M. D. The diurnal cycle of precipitation in tropical cyclones. J. Clim. 28, 5325–5334 (2015).

    Article 

    Google Scholar 

  37. Lamers, A. et al. Forecasting tropical cyclone rainfall and flooding hazards and impacts. Trop. Cyclone Res. Rev. 12, 100–112 (2023).

    Google Scholar 

  38. Doswell, C. A., Brooks, H. E. & Maddox, R. A. Flash flood forecasting: an ingredients-based methodology. Weather Forecast 11, 560–581 (1996).

    2.0.CO;2″ data-track-item_id=”10.1175/1520-0434(1996)011<0560:FFFAIB>2.0.CO;2″ data-track-value=”article reference” data-track-action=”article reference” href=”https://doi.org/10.1175%2F1520-0434%281996%29011%3C0560%3AFFFAIB%3E2.0.CO%3B2″ aria-label=”Article reference 38″ data-doi=”10.1175/1520-0434(1996)011<0560:FFFAIB>2.0.CO;2″>Article 

    Google Scholar 

  39. Davis, C. Five years later, five lessons learned from Matthew–North Carolina State Climate Office. https://climate.ncsu.edu/blog/2021/10/five-years-later-five-lessons-learned-from-matthew/ (2021) (Accessed on Oct 31 2025).

  40. Schumacher, R. S., Galarneau, T. J. & Bosart, L. F. Distant effects of a recurving tropical cyclone on rainfall in a midlatitude convective system: a high-impact predecessor rain event. Mon. Weather Rev 139, 650–667 (2011).

    Article 

    Google Scholar 

  41. Lai, E. Tropical cyclone rainfall and flood forecasting. Global Guide Tropical Cyclone Forecast. (World Meteorological Organization, 2017).

  42. Knutson, T. et al. Tropical cyclones and climate change assessment: part II: projected response to anthropogenic warming. Bull. Am. Meteorol. Soc. 101, E303–E322 (2020).

    Article 

    Google Scholar 

  43. Maxwell, J. T. et al. Recent increases in tropical cyclone precipitation extremes over the US east coast. Proc. Natl. Acad. Sci. USA 118, e2105636118 (2021).

    Article 
    CAS 

    Google Scholar 

  44. Marjerison, R. D., Walter, M. T., Sullivan, P. J. & Colucci, S. J. Does population affect the location of flash flood reports? J. Appl. Meteorol. Climatol 55, 1953–1963 (2016).

    Article 

    Google Scholar 

  45. Knapp, K. R., Diamond, H. J., Kossin, J. P., Kruk, M. C. & Schreck III, C. J. M. International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4. North Atlantic version. NOAA National Centers for Environmental Information (2018). https://doi.org/10.25921/82ty-9e16 (accessed 1 Dec 2024).

  46. Taszarek, M. et al. Severe convective storms across Europe and the United States. Part I: climatology of lightning, large hail, severe wind, and tornadoes. J. Clim 33, 10239–10261 (2020).

    Article 

    Google Scholar 

  47. OpenAI et al. GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2024) (accessed 20 Mar 2025).

  48. DeepSeek-AI et al. DeepSeek-V3 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2412.19437 (2025) (accessed 18 Apr 2025).

  49. Google Cloud. Gemini 2.0 Flash | Generative AI on Vertex AI. https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-0-flash. (accessed 20 Mar 2025).

  50. Dagdelen, J. et al. Structured information extraction from scientific text with large language models. Nat. Commun. 15, 1418 (2024).

    Article 
    CAS 

    Google Scholar 

  51. National Hurricane Center. Glossary of NHC terms. https://www.nhc.noaa.gov/aboutgloss.shtml (accessed 27 May 2025).

  52. American Meteorological Society. Glossary of Meteorology. https://www.ametsoc.org/ams/publications/glossary-of-meteorology/(accessed 17 July 2025).

  53. Bureau of Labor Statistics. Consumer price index. https://www.bls.gov/cpi/ (accessed 24 Jul 2025).

Download references

Acknowledgements

Y.Z. acknowledges the support from the Embry-Riddle Aeronautical University Faculty Research Development Program and the College of Aviation Small Equipment Grant. H.L. acknowledges the support from the U.S. National Science Foundation (DMS-2424605).

Author information

Authors and Affiliations

Authors

Contributions

Y.Z. conceived the idea, collected the data, and conducted the preliminary analysis. Y.Z. and P.L. developed the methodology. Y.Z. and C.M. investigated the preliminary results. Y.Z. wrote the original draft. Y.Z., C.M., P.L., and H.L. reviewed and edited the manuscript.

Corresponding author

Correspondence to
Yao Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

44304_2025_156_MOESM1_ESM

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Zhou, Y., Matyas, C., Liu, P. et al. Identification of tropical cyclone–related flash floods from hazard narratives using a large language model–based approach.
npj Nat. Hazards 2, 104 (2025). https://doi.org/10.1038/s44304-025-00156-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Version of record:

  • DOI: https://doi.org/10.1038/s44304-025-00156-6


Source: Resources - nature.com

Tariff familiarity sustains household water conservation

Author Correction: Sociality predicts orangutan vocal phenotype

Back to Top