USACE Coastal and Hydraulics Laboratory Quality Controlled, Consistent Measurement Archive

As NDBC publish their historical and real time in situ wave and meteorological data in multiple online locations, USACE developed a methodology to combine these data sources and develop a unique USACE QCC Measurement Archive that is fully self-describing. This required merging the manually quality controlled data that is stored on the NDBC website with the lower quality netCDF data with metadata files for the same stations that are stored at NCEI. The NOAA DODS source was not included as those data are exact copies of what is found within the NDBC historical station pages.

As mentioned, the NDBC website historical station pages contain the cleanest data, which have been subjected to manual QA/QC by NDBC Mission Control data analysts. Data collected during service periods (when the buoys were physically on board ships for maintenance) were removed during the manual QA/QC, and are typically not present within the NDBC website data. However, this data source contains no metadata other than date and time. This lack of metadata allows for the erroneous inclusion of unidentifiable data from historical time periods when the moored buoys were adrift (inaccurate wave readings, wind, temperature, etc.). Additionally, although NDBC switched to a redundant meteorological sensor paradigm during the last decade, only a single value per variable is available per time stamp per station on the NDBC website. This is because NDBC toggles the release of primary and secondary sensor data to ensure that the highest quality data are published. However, the NDBC website contains no associated metadata indicating when these data release switches occur, and hence instrumentation usage is indeterminable. Users often need these sensor details, for example the wind sensor height above sea level, to extrapolate wind speed to additional heights above the moored buoy. The NDBC website also does not store uncorrected non-directional spectral energy estimates ($c_{11}^{m}$).
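To illustrate why the sensor height matters, the sketch below adjusts a wind speed measured at anemometer height to a different height using a simple power-law profile. This is a Python illustration only: the function name, the example numbers and the exponent of 0.11 (a value often quoted for near-neutral conditions over open water) are assumptions, not part of the archive's processing.

```python
def adjust_wind_speed(speed_ms, sensor_height_m, target_height_m, alpha=0.11):
    """Power-law wind profile: U(z2) = U(z1) * (z2 / z1) ** alpha.

    alpha is an assumed shear exponent; it is not taken from the archive.
    """
    if sensor_height_m <= 0 or target_height_m <= 0:
        raise ValueError("heights must be positive")
    return speed_ms * (target_height_m / sensor_height_m) ** alpha


# Hypothetical example: 8.0 m/s measured at 4 m, adjusted to 10 m.
u10 = adjust_wind_speed(8.0, sensor_height_m=4.0, target_height_m=10.0)
print(round(u10, 2))
```

Without a verified sensor height for each record, `sensor_height_m` cannot be set reliably, which is the gap the metadata in this archive is meant to close.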

Conversely, the NDBC netCDF data stored at NCEI include metadata such as time-stamped GPS positions, instrumentation metadata, data quality flags [1], and data release flags (indicating which data were released to the real-time stream). These GPS positions allow for the identification of data that were collected while NDBC moored buoys were adrift. For ease of data source identification, these NDBC netCDF files stored at NCEI will be referred to as NCEI netCDF data below. However, readers should remember that these are all NDBC data, with time-paired values that are collected from the same, unique sensor.

This NCEI netCDF data source also includes both the primary and secondary redundant meteorological sensor outputs, with metadata, as well as uncorrected non-directional spectral energy estimates ($c_{11}^{m}$). These primary and secondary sensor variables are only found within these NCEI netCDF datasets. However, since 2011, these netCDF data are pulled from the NDBC real-time data stream, which is only subjected to automated QA/QC protocols that flag but do not remove suspect data [1]. Prior to 2011, the NDBC data were stored in an encoded Trusted Data Format (TDF), but these data were converted into netCDF format in early 2020.

Of note is that the NCEI netCDF structures differ for data stored before and after the 2011 switch to netCDF file usage. Throughout the historical netCDF dataset, the file structures are non-uniform and depend on the data collected during file-specific time periods. Additionally, the pre-2011 netCDF files contain a nominal, fixed deployment position that is repeated for each date/time stamp within the datasets. Furthermore, these pre-2011 netCDF files contain erroneous spectral wave frequency bands that are not included in the NDBC website datasets (and do not match any wave instrumentation frequencies that NDBC has historically deployed). Both formats include instrumentation metadata that are inconsistent not only across the years, but also within the group attributes of individual netCDF files.

Therefore, to mitigate these identified data source issues [10], the USACE QCC Measurement Archive process utilized a methodology (Fig. 1) that combines each dataset’s advantages to develop a best available historical NDBC measurement dataset. For example, the GPS data included within the post-2011 NCEI netCDF files were used to detect data that fell outside a reasonable radius of the moored buoy. Conversely, the NDBC website data were used to isolate which primary or secondary sensor data were released to the public. This was achieved by matching the individual NDBC variable values to the equivalent primary or secondary NCEI netCDF values, thereby identifying the correct netCDF metadata. Additional outlier QA/QC variable checks, together with station and metadata verification (provided by literature reviews and historical NDBC buoy deployment log books), allowed for the development of a best available, self-described USACE QCC Measurement Archive.

Fig. 1. Flowchart of the USACE QCC Measurement Archive methodology. This flowchart outlines input data sources, station and metadata verification, selected ‘best’ data sets and output netCDF files.

The USACE QCC Measurement Archive methodology consists of two phases. The first phase processes the historical data, while the second phase annually appends newly available data to the historical database. The data archive routine involves a six-step process (Fig. 1) for each buoy station: (1) download, (2) concatenation, (3) metadata verification, (4) comparison, geographical QA/QC and metadata attachment, (5) best dataset selection, and (6) netCDF data file creation. Finally, these netCDF files are uploaded to the buoy section of the USACE CHL Data Server.

These steps were automated using scripts developed in R software [11]. Where necessary, each script was subset to handle the particular idiosyncrasies [10] of the NDBC and NCEI netCDF data archives. To process all of the historical NDBC data (1970–2021), steps two to five in phase one required ~400k CPU hours at the Department of Defense (DOD) Supercomputing Resource Center.

The following steps outline the methodologies utilized within this USACE QCC Measurement Archive development. For more detailed information, please see the USACE QCC Measurement Archive Standard Operating Procedure (SOP) document that is stored in the Archive GitHub (

    Step 1: Download. Historical NDBC data for all NDBC stations are downloaded from the NDBC website and the NCEI archives. Source-specific archive download links are listed in the USACE QCC Measurement Archive SOP. Data from the storage-specific file types (detailed below) are extracted for concatenation in step 2.

    The NDBC website stores data in zipped yearly and monthly files as standard meteorological (stdmet), spectral wave density (swden), spectral wave (alpha1) direction (swdir), spectral wave (alpha2) direction (swdir2), spectral wave (r1) direction (swr1), and spectral wave (r2) direction data (swr2). These files require unzipping. Included within the NDBC stdmet datasets are collected meteorological and bulk wave data in the following structure: wind direction (°), wind speed (m/s), wind gusts (m/s), significant wave height (m), dominant wave period (seconds), average wave period (seconds), mean wave direction (°), air pressure at sea level (hPa), air temperature (°C), water temperature (°C), dew point temperature (°C), visibility (miles) and tide (ft). Visibility and tide are no longer collected by NDBC, and are disregarded.

    The NCEI website stores monthly NDBC files per year in netCDF format. All available data and metadata are extracted from these netCDF files. These files contain the same NDBC data as listed above, but also include additional wave spectral parameters such as uncorrected spectral energy wave data ($c_{11}^{m}$), spectral wave co- and quad-spectra, and four wave data quality assurance parameters that are produced by the NDBC wave processing procedure [12].

    The NCEI netCDF file formats differ significantly before and after January 2011. After January 2011, these netCDF structures varied throughout the years as NDBC buoy structures and netCDF creation procedures changed. Each format requires format-specific code to extract the data from the variable fields.

    For example, the pre-2011 netCDF files consistently contain all variables directly within the main file directory. However, the post-2011 netCDF files are structured by ‘payload’, with subset sensor fields (e.g. ‘anemometer_1’), which in turn have their own subset variable fields (e.g. ‘wind_speed’, ‘wind_direction’) with associated quality control and release flags. Therefore, users have to navigate through the payload and sensor subfields to discover the variable data with their associated metadata.

    Importantly, these ‘payload’ fields do not only refer to the on-board computer system that serves the sensor suites, e.g. NDBC’s Automated Reporting Environmental System [13] (ARES), but also delineate between sensor suites with available primary and secondary sensor data (e.g. ‘payload_1’, ‘payload_2’). Conversely, these primary and secondary sensor data (e.g. ‘air_temperature_1’ and ‘air_temperature_2’) may be subset within a single ‘payload’. Of note is that these multiple payloads often contain duplicated data.

    These ‘payload’ fields are also important when extracting data captured by NDBC Self-Contained Ocean Observations Payloads (SCOOP), as these netCDF files resemble the physical structure of the buoy stations with their modular sensor assembly. For example, the NCEI netCDF July 2020 data file for station 41009 includes 5 payload subsections. ‘payload_1’ contains an ‘anemometer_1’ sensor suite, which contains subset wind variables and data flags; a ‘barometer_1’ suite, with subset air pressure variables and flags; and a ‘gps_1’ sensor suite, with subset lat and lon variables, etc. ‘payload_2’ contains a second set of ‘anemometer_1’, ‘barometer_1’ and ‘gps_1’ suites, plus ‘air_temperature_sensor_1’ and ‘humidity_sensor_1’ suites. ‘payload_3’ contains a single ‘gps_1’ field (lat and lon variables with flags), while payloads 4 and 5 house ‘wave_sensor_1’ and ‘ocean_temperature_sensor_1’ sensor suites respectively, both with their own ‘gps_1’ data. In this example, ‘payload_1’ represents an R.M. Young sensor, while ‘payload_2’ is listed as a MetPak Weather Station instrument in the netCDF sensor suite attributes.

    NDBC is in the process of redesigning these netCDF file formats to be more user friendly. However, they do not plan to reformat their archive datasets. For more details on the NDBC and NCEI netCDF file formats and code extraction descriptions, please see the USACE QCC Measurement Archive SOP within the Archive GitHub.
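The payload, sensor and variable nesting described in this step can be traversed recursively. The following Python sketch is illustrative only: it walks any object that exposes dict-like `.groups` and `.variables` attributes (the interface offered by `Dataset` and `Group` objects in the netCDF4 Python library) and uses a small stand-in tree so that it runs without a data file. The flag variable names `wind_speed_qc` and `wind_speed_release` are hypothetical.

```python
from types import SimpleNamespace


def walk_variables(group, path=()):
    """Yield (slash-joined path, variable) for every variable in a group tree."""
    for name, var in group.variables.items():
        yield "/".join(path + (name,)), var
    for name, sub in group.groups.items():
        yield from walk_variables(sub, path + (name,))


def make_group(variables=None, groups=None):
    # Stand-in for a netCDF group so the sketch runs without a real file.
    return SimpleNamespace(variables=variables or {}, groups=groups or {})


# Hypothetical miniature of a post-2011 file layout.
root = make_group(
    variables={"time": [0, 600]},
    groups={
        "payload_1": make_group(groups={
            "anemometer_1": make_group(variables={
                "wind_speed": [7.1, 7.4],
                "wind_speed_qc": [0, 0],        # hypothetical flag name
                "wind_speed_release": [1, 1],   # hypothetical flag name
            }),
        }),
        "payload_2": make_group(groups={
            "anemometer_1": make_group(variables={"wind_speed": [7.0, 7.5]}),
        }),
    },
)

paths = [p for p, _ in walk_variables(root)]
wind_paths = [p for p in paths if p.endswith("/wind_speed")]
print(wind_paths)
```

Listing every path first makes duplicated variables across payloads visible before any values are extracted. Pre-2011 files, which hold all variables at the top level, would simply yield paths with no group prefix.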

    Step 2: Concatenation. This step merges the yearly and monthly data files to produce a single time series of concatenated stdmet data, and time series files for each individual spectral wave variable. The concatenated stdmet data format mirrors the NDBC website data formats. To handle the NDBC data, this step manages differing yearly file formats and spectral frequencies; concatenates multiple date and time columns into one field; and removes redundant date, time and tide columns in stdmet data. This step allocates the spectral data into the standard NDBC 38 frequencies (old wave sensors) and 47 frequencies (new wave sensors). Finally, this step converts the NDBC r1 and r2 values to their correct units (NDBC r1 and r2 data are scaled by 100 to reduce storage requirements, so these data should be multiplied by 0.01).
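As a rough Python sketch of two of these operations, the code below collapses separate date and time columns into a single time stamp and rescales r1 values by 0.01 while merging several files into one series. The row layout, the function names and the choice to let a later record overwrite an earlier one with the same time stamp are simplifying assumptions; time stamps are treated as UTC.

```python
from datetime import datetime, timezone

R_SCALE = 0.01  # r1 and r2 are stored multiplied by 100


def to_timestamp(year, month, day, hour, minute=0):
    """Collapse separate date/time columns into one UTC datetime."""
    return datetime(int(year), int(month), int(day), int(hour), int(minute),
                    tzinfo=timezone.utc)


def concatenate(files):
    """Merge per-file row lists into one time-sorted series.

    Each row is (year, month, day, hour, minute, r1_values); this layout
    is assumed for illustration and is not the NDBC file format itself.
    """
    merged = {}
    for rows in files:
        for *date_cols, r1_scaled in rows:
            stamp = to_timestamp(*date_cols)
            merged[stamp] = [v * R_SCALE for v in r1_scaled]
    return sorted(merged.items())


# Two hypothetical "files", deliberately out of chronological order.
files = [
    [(2020, 1, 1, 1, 0, [50, 75])],
    [(2019, 12, 31, 23, 0, [100])],
]
series = concatenate(files)
print(series[0][0].isoformat(), series[0][1])
```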

    To handle the NCEI data, this step concatenates the stdmet data to create a dataset that matches the NDBC website data nomenclature. This step also removes data that were flagged as erroneous by automated NDBC QA/QC protocols. As unit standards vary between the NCEI and NDBC website archives, this step converts the NCEI netCDF pressure units to match the NDBC units (Pa to hPa), and converts the air, water and dew point temperatures from Kelvin to degrees Celsius to match NDBC data. This step also performs outlier QA/QC, where it removes zero (‘0’) wind gust values when no wind speed values are present; direction values greater than 360°; obvious variable outliers; and duplicated netCDF data points that are ~5–10 seconds apart. To handle the erroneous netCDF spectral frequency data, the code advances through the spectral data and matches the available spectral frequency data to the appropriate 38 frequencies (old wave sensors) or 47 frequencies (new wave sensors).
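The unit conversions and some of the outlier checks listed above can be sketched in Python as follows. The record layout (dictionaries with epoch-second `time` values), the field names and the 10-second duplicate window are assumptions for illustration; the check for "obvious variable outliers" is not shown because the source does not state its thresholds.

```python
def convert_units(rec):
    """Return a copy with pressure in hPa and temperatures in degrees Celsius."""
    out = dict(rec)
    if out.get("pressure") is not None:
        out["pressure"] = out["pressure"] / 100.0       # Pa -> hPa
    for key in ("air_temp", "water_temp", "dew_point"):
        if out.get(key) is not None:
            out[key] = out[key] - 273.15                # K -> deg C
    return out


def clean(rec):
    """Blank zero gusts that have no wind speed, and directions above 360."""
    out = dict(rec)
    if out.get("wind_gust") == 0 and out.get("wind_speed") is None:
        out["wind_gust"] = None
    for key in ("wind_dir", "wave_dir"):
        if out.get(key) is not None and out[key] > 360:
            out[key] = None
    return out


def drop_near_duplicates(records, window_s=10):
    """Keep the first of any records spaced at most window_s seconds apart."""
    kept = []
    for rec in sorted(records, key=lambda r: r["time"]):
        if kept and rec["time"] - kept[-1]["time"] <= window_s:
            continue
        kept.append(rec)
    return kept


# Hypothetical records: the second is a near-duplicate of the first.
sample = [
    {"time": 0, "pressure": 101325.0, "air_temp": 288.15,
     "wind_speed": None, "wind_gust": 0, "wind_dir": 400},
    {"time": 7, "pressure": 101325.0, "air_temp": 288.15,
     "wind_speed": None, "wind_gust": 0, "wind_dir": 400},
    {"time": 600, "pressure": 101300.0, "air_temp": 288.65,
     "wind_speed": 6.2, "wind_gust": 8.0, "wind_dir": 185},
]
cleaned = [clean(convert_units(r)) for r in drop_near_duplicates(sample)]
print(len(cleaned), cleaned[1]["wind_dir"])
```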

      Step 3: Verify metadata. This step is applied solely to the NCEI netCDF data files to validate the netCDF metadata with NDBC-sourced, buoy specific metadata spreadsheets. These metadata spreadsheets were constructed from the NDBC database and original NDBC service technician log books, and provide accurate station and sensor information. Scripts verify or insert missing hull type, payload and mooring type; and verify or insert missing instrument processing systems (for wave data only), instrumentation names and sensor deployment heights. If none are available, metadata fields are augmented with pre-set hull-specific instrumentation specifications that were sourced from online references (for hull-specific instrumentation specifications, please see the USACE QCC Measurement Archive SOP).
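The verify-or-insert logic with a hull-specific fallback might look like the Python sketch below. The field names, the hull label `HULL_A`, the default height and the precedence order (spreadsheet first, then the netCDF file, then hull defaults) are all assumptions for illustration; the actual rules are documented in the SOP.

```python
# hull_type must come first so that later fields can use it for defaults.
FIELDS = ("hull_type", "payload", "mooring_type", "anemometer_height_m")


def verify_metadata(file_meta, sheet_meta, hull_defaults):
    """Return (verified metadata, source of each field)."""
    verified, source = {}, {}
    for field in FIELDS:
        if sheet_meta.get(field) is not None:
            verified[field], source[field] = sheet_meta[field], "spreadsheet"
        elif file_meta.get(field) is not None:
            verified[field], source[field] = file_meta[field], "netcdf"
        else:
            hull = verified.get("hull_type")
            default = hull_defaults.get(hull, {}).get(field)
            verified[field] = default
            source[field] = "hull_default" if default is not None else "missing"
    return verified, source


# Hypothetical inputs; none of these values come from NDBC records.
file_meta = {"hull_type": None, "payload": "ARES", "anemometer_height_m": None}
sheet_meta = {"hull_type": "HULL_A", "mooring_type": "all-chain"}
hull_defaults = {"HULL_A": {"anemometer_height_m": 4.0}}

verified, source = verify_metadata(file_meta, sheet_meta, hull_defaults)
print(verified)
print(source)
```

Recording the source of each field alongside its value keeps it clear which entries were verified against log books and which were filled from generic hull specifications.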

      Step 4: Compare, geographically QA/QC and attach metadata. Compare: Although these data originate from the same sensor, storage protocols resulted in different time stamps for the same records in each archive. This step compares the NDBC and NCEI sourced data by matching the datasets by nearest date and time (to the minute), after which geographical data are appended to the NDBC datasets.

      As the NDBC data is manually QA/QC’d and does not contain data collected during buoy maintenance operations, these data were considered as a date/time reference to quality control the fixed positions of the pre-2011 netCDF datasets. In other words, if data were present within the NCEI dataset, but not within the NDBC dataset, those NCEI data records were removed.
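A minimal Python sketch of this nearest-time pairing is given below, assuming both series are sorted lists of epoch seconds and that "to the minute" can be approximated by a 60-second tolerance (both assumptions). NCEI records left without an NDBC partner are reported as candidates for removal; the special handling of NCEI data that pre-date the NDBC record is not covered here.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Index of the value in sorted_times closest to t, or None if empty."""
    i = bisect_left(sorted_times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_times)]
    if not candidates:
        return None
    return min(candidates, key=lambda j: abs(sorted_times[j] - t))


def pair_records(ndbc_times, ncei_times, tolerance_s=60):
    """Pair each NDBC time with the nearest NCEI time within tolerance_s.

    Returns (pairs, unmatched), where pairs maps NDBC index -> NCEI index
    and unmatched lists NCEI indices with no NDBC counterpart.
    """
    pairs = {}
    for k, t in enumerate(ndbc_times):
        j = nearest_index(ncei_times, t)
        if j is not None and abs(ncei_times[j] - t) <= tolerance_s:
            pairs[k] = j
    matched = set(pairs.values())
    unmatched = [j for j in range(len(ncei_times)) if j not in matched]
    return pairs, unmatched


# Hypothetical time stamps in epoch seconds.
ndbc_times = [3000, 6600]
ncei_times = [0, 3010, 6590, 9000]
pairs, unmatched = pair_records(ndbc_times, ncei_times)
print(pairs, unmatched)
```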

      Of interest are the datasets within the NCEI netCDF files that pre-date any data published on the NDBC website. These data are likely from sensor and processing tests conducted during deployments that were intentionally not released to the public. These early data are included in the USACE QCC Measurement Archive but have quality control (QC) flags that rate them as unreliable. For more information on these earlier datasets, please reference the technical note on utilizing NDBC data, ERDC/CHL CHETN-I-100 [10].

      Geographically QA/QC: Each dataset is filtered to remove GPS positions and associated data that are not within a one (1) degree radius (~60 nautical miles) of the NDBC station watch circles (the surface area through which a buoy can travel while tethered to a specific location by a mooring). This radius allows for fluctuations in NDBC deployment locations over the decades, as tests showed that radii of less than one degree removed a significant amount of viable data (see Fig. 2 in the Technical Validation section). Users may wish to further filter their specific datasets to remove additional data points that are outside their target deployment locations; a task now easily achievable with the fully-described, verified metadata included within this USACE QCC Measurement Archive [13].

      Two methods are used to geographically QA/QC these data: 1) a sorted table of value occurrences to find the most common latitude and longitude positions (using the assumption that the buoy held its correct station for the majority of its life cycle); 2) a manual confirmation and insertion of the primary station locations that were sourced from NDBC buoy specific metadata spreadsheets. This manual step was relevant for buoys that did not consistently hold their stations due to high vandalism rates or strong currents.
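The first of these methods, combined with the one-degree filter, can be sketched in Python as below. Rounding positions to two decimal places before counting and measuring the radius as a plain Euclidean distance in degrees (ignoring the convergence of meridians) are simplifying assumptions, and the coordinates are invented for the example.

```python
from collections import Counter
from math import hypot


def most_common_position(positions, ndigits=2):
    """Most frequently reported (lat, lon) after rounding to ndigits places."""
    rounded = [(round(lat, ndigits), round(lon, ndigits))
               for lat, lon in positions]
    return Counter(rounded).most_common(1)[0][0]


def within_radius(positions, centre, radius_deg=1.0):
    """Flag each position as inside (True) or outside (False) the radius."""
    return [hypot(lat - centre[0], lon - centre[1]) <= radius_deg
            for lat, lon in positions]


# Hypothetical GPS fixes: four near the station, one far adrift.
positions = [(28.5, -80.18)] * 3 + [(28.51, -80.18), (31.0, -78.0)]
centre = most_common_position(positions)
flags = within_radius(positions, centre)
print(centre, flags)
```

For the second method, the manually confirmed station location would simply replace `centre` before the filter is applied.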

      Assign metadata: Once the data are geographically QA/QC’d, this step assigns verified metadata (from step 3) to the NDBC stdmet datasets as follows. Station-specific hull type, water depth, payload and mooring type are appended to the NDBC stdmet datasets from the NDBC-sourced, buoy specific metadata spreadsheets. These NDBC Buoy Metadata Spreadsheets and the verified NCEI netCDF metadata are then used to assign the correct primary or secondary sensor designation, which includes metadata such as instrument processing systems (for waves) and instrumentation information (names, deployment heights etc.), to the NDBC stdmet datasets by matching the time paired NDBC variable values with the exact NCEI values.
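The value-matching idea behind the sensor designation can be illustrated with the short Python sketch below. The function names, the label strings and the metadata dictionary are hypothetical, and the optional rounding argument is an added convenience for sources that store different numbers of decimal places.

```python
def designate_sensor(ndbc_value, primary_value, secondary_value, ndigits=None):
    """Label which NCEI sensor value the released NDBC value matches."""
    def norm(v):
        return round(v, ndigits) if (v is not None and ndigits is not None) else v

    if ndbc_value is None:
        return "unknown"
    is_primary = norm(ndbc_value) == norm(primary_value)
    is_secondary = norm(ndbc_value) == norm(secondary_value)
    if is_primary and is_secondary:
        return "ambiguous"      # both sensors reported the same value
    if is_primary:
        return "primary"
    if is_secondary:
        return "secondary"
    return "unmatched"


def attach_metadata(record, label, sensor_meta):
    """Return a copy of the record with the matched sensor's metadata added."""
    out = dict(record)
    out["sensor"] = label
    out.update(sensor_meta.get(label, {}))
    return out


# Hypothetical sensor metadata and a time-paired wind speed record.
sensor_meta = {
    "primary": {"instrument": "sensor A", "height_m": 4.0},
    "secondary": {"instrument": "sensor B", "height_m": 3.5},
}
label = designate_sensor(7.4, primary_value=7.1, secondary_value=7.4)
record = attach_metadata({"time": 600, "wind_speed": 7.4}, label, sensor_meta)
print(record)
```

When both sensors report the same value the designation cannot be resolved from the values alone, so the sketch reports that case separately instead of guessing.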

      Step 5: Create best dataset. This step selects a combination of the geographically QA/QC’d datasets that were created in step 4 above. These best available, self-describing datasets (Fig. 1) include:

    • NDBC website wind direction, wind speed, wind gust, air pressure at sea level, air temperature, sea surface temperature, significant wave height, dominant and peak periods, mean wave direction, spectral c11, alpha1, alpha2, r1, r2, with their now fully-described, verified metadata.

    • NCEI netCDF spectral $c_{11}^{m}$. These data are retained within the USACE QCC Measurement Archive to allow for bulk wave parameter re-calculations without the influences of NDBC shore-side processing protocols.

    • Verified station metadata obtained from the NDBC Buoy Metadata Spreadsheets.

    • NCEI netCDF data for the above variables that pre-date the NDBC datasets (where applicable).

      Step 6: Create netCDF data files. This step creates monthly netCDF NDBC data files that collate all of the best available data variables that were selected in step 5 above. For easy access by the USACE and user community, these month-long netCDF data files are stored on the USACE CHL Data Server and are updated annually. A static copy of the historical data (1970–2021) is located within the USACE Knowledge Core Library Datasets [13].
