PointStat: Python embedding to read Argo netCDF files to verify ocean temperature forecast at 50 m depth

model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp.conf

Scientific Objective

This use case utilizes the ASCII2NC tool with Python embedding to natively read Argo netCDF files, a common source of ocean profile data for operational entities. These observations are then used by the PointStat tool to verify the RTOFS ocean temperature forecast at 50 m depth.

Datasets

Forecast: RTOFSv2.3 forecast data pre-processed into 0.1 degree lat-lon grid
Observations: three netCDF files from Argo
Climatology: two monthly climatology files from WOA23
Sea Ice Mask: a mask file to exclude forecast grid points with sea ice concentration > 15%
Location: All of the input data required for this use case can be found in the marine_and_cryosphere sample data tarball. Navigate to the METplus releases page and download the sample data tarball for the appropriate release: https://github.com/dtcenter/METplus/releases. The tarball should be unpacked into the directory that you will set as the value of INPUT_BASE. See the Running METplus section for more information.
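
A minimal sketch of unpacking the tarball with Python is shown below; the tarball file name and the INPUT_BASE directory are placeholders for whatever file you download and wherever you choose to stage the data.

import tarfile

# placeholders: substitute the tarball you downloaded and your INPUT_BASE directory
input_base = "/path/to/input"
with tarfile.open("sample_data-marine_and_cryosphere.tgz") as tarball:
    tarball.extractall(path=input_base)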

METplus Components

This use case calls ASCII2NC to read in Argo netCDF files and then PointStat for verification against RTOFS model data.

METplus Workflow

ASCII2NC is the first tool called. Using a Python script, it pulls in three Argo files covering the Atlantic, Pacific, and Indian Oceans. These observations are converted into a netCDF file, which is then read by PointStat as the observation dataset. PointStat also pulls in an RTOFS forecast of ocean temperature at 50 m depth, whose valid time falls within the range of available observation times, and two monthly climatology files from WOA23 that are used to compute a daily climatology for the valid date. A 24-hour (18Z to 18Z) window is allowed for the observational data (see the sketch after the run time listing below), and duplicate_flag = UNIQUE is set so that only the observation closest to the forecast valid time is used at a given location. Temperature thresholds are set to correspond to operational usage, and the CTC, CTS, CNT, SL1L2, and SAL1L2 line types are requested. The use case processes the following run time:

Valid: 2023-03-18 00Z
Forecast lead: 24 hours
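
The 18Z-to-18Z observation window follows from the 00Z valid time and the window offsets set later in this configuration (OBS_POINT_STAT_WINDOW_BEGIN = -21600 and OBS_POINT_STAT_WINDOW_END = 64800 seconds). A minimal sketch of that arithmetic:

from datetime import datetime, timedelta

valid = datetime(2023, 3, 18, 0)           # valid time: 2023-03-18 00Z
window_beg = timedelta(seconds=-21600)     # OBS_POINT_STAT_WINDOW_BEGIN
window_end = timedelta(seconds=64800)      # OBS_POINT_STAT_WINDOW_END

print(valid + window_beg)   # 2023-03-17 18:00:00
print(valid + window_end)   # 2023-03-18 18:00:00 -> a 24-hour (18Z to 18Z) window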

METplus Configuration

METplus first loads all of the default configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line with the -c option, i.e. -c parm/use_cases/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp.conf

###
# Purpose: Example METplus configuration file to verify RTOFS ocean temperature
#          forecasts at 50 m depth with Argo profile data and WOA23 climatology
#          using python embedding.
# Contributors: L. Gwen Chen (lichuan.chen@noaa.gov), George McCabe, 
#               John Halley Gotway, and Daniel Adriaansen
# Date: 22 March 2023
###

[config]

###
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
###

PROCESS_LIST = ASCII2NC,PointStat

###
# Time Info
# LOOP_BY options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
#   INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
#   VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
# LEAD_SEQ is the list of forecast leads to process
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#timing-control
###

LOOP_BY = VALID
VALID_TIME_FMT = %Y%m%d
VALID_BEG = 20230318
VALID_END = 20230318
VALID_INCREMENT = 24H

LEAD_SEQ = 024

###
# File I/O
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#directory-and-filename-template-info
###

PY_EMBED_SCRIPT = {PARM_BASE}/use_cases/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/read_argo_metplus.py

INPUT_FILE = {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/argo/atlantic_ocean/{valid?fmt=%Y%m%d}_prof.nc {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/argo/indian_ocean/{valid?fmt=%Y%m%d}_prof.nc {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/argo/pacific_ocean/{valid?fmt=%Y%m%d}_prof.nc

ASCII2NC_INPUT_DIR =
ASCII2NC_INPUT_TEMPLATE = "{PY_EMBED_SCRIPT} {INPUT_FILE}"

ASCII2NC_OUTPUT_DIR = {OUTPUT_BASE}/prep
ASCII2NC_OUTPUT_TEMPLATE = argo.{valid?fmt=%Y%m%d}.nc

ASCII2NC_SKIP_IF_OUTPUT_EXISTS = False

ASCII2NC_FILE_WINDOW_BEGIN = 0
ASCII2NC_FILE_WINDOW_END = 0

FCST_POINT_STAT_INPUT_DIR = {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp
FCST_POINT_STAT_INPUT_TEMPLATE = rtofs.{init?fmt=%Y%m%d}/rtofs_glo_3dz_f{lead?fmt=%3H}_daily_3ztio.argo.nc

OBS_POINT_STAT_INPUT_DIR = {ASCII2NC_OUTPUT_DIR}
OBS_POINT_STAT_INPUT_TEMPLATE = {ASCII2NC_OUTPUT_TEMPLATE}

POINT_STAT_OUTPUT_DIR = {OUTPUT_BASE}/stats
POINT_STAT_OUTPUT_TEMPLATE = rtofs.{valid?fmt=%Y%m%d}

###
# ASCII2NC Settings
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#ascii2nc
###

#ASCII2NC_CONFIG_FILE =

ASCII2NC_WINDOW_BEGIN = 0
ASCII2NC_WINDOW_END = 0

ASCII2NC_INPUT_FORMAT = python

ASCII2NC_MASK_GRID =
ASCII2NC_MASK_POLY =
ASCII2NC_MASK_SID =

ASCII2NC_TIME_SUMMARY_FLAG = False
ASCII2NC_TIME_SUMMARY_RAW_DATA = False
ASCII2NC_TIME_SUMMARY_BEG = 000000
ASCII2NC_TIME_SUMMARY_END = 235959
ASCII2NC_TIME_SUMMARY_STEP = 300
ASCII2NC_TIME_SUMMARY_WIDTH = 600
ASCII2NC_TIME_SUMMARY_GRIB_CODES = 11, 204, 211
ASCII2NC_TIME_SUMMARY_VAR_NAMES =
ASCII2NC_TIME_SUMMARY_TYPES = min, max, range, mean, stdev, median, p80
ASCII2NC_TIME_SUMMARY_VALID_FREQ = 0
ASCII2NC_TIME_SUMMARY_VALID_THRESH = 0.0

###
# Field Info
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#field-info
###

POINT_STAT_ONCE_PER_FIELD = False

FCST_VAR1_NAME = temperature
FCST_VAR1_LEVELS = "(0,@50,*,*)"
FCST_VAR1_OPTIONS = set_attr_lead = "{lead?fmt=%3H}"; set_attr_level = "Z50";
OBS_VAR1_NAME = TEMP
OBS_VAR1_LEVELS = Z48-52
OBS_VAR1_OPTIONS = set_attr_units = "degC";
BOTH_VAR1_THRESH = >=0, >=26.5

###
# PointStat Settings
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#pointstat
###

OBS_POINT_STAT_WINDOW_BEGIN = -21600
OBS_POINT_STAT_WINDOW_END = 64800
POINT_STAT_MET_CONFIG_OVERRIDES = duplicate_flag = UNIQUE; obs_summary = NEAREST;

POINT_STAT_OFFSETS = 0

MODEL = RTOFS
OBTYPE = ARGO
POINT_STAT_DESC = NA
POINT_STAT_OUTPUT_PREFIX = {MODEL}_{OBTYPE}_temp_Z50

POINT_STAT_REGRID_TO_GRID = NONE
POINT_STAT_REGRID_METHOD = BILIN
POINT_STAT_REGRID_WIDTH = 2

POINT_STAT_MESSAGE_TYPE = ARGO

POINT_STAT_MASK_GRID =
POINT_STAT_MASK_SID =
POINT_STAT_MASK_POLY = {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/rtofs.{init?fmt=%Y%m%d}/mask.global.nc

# Set up climatology files and interpolation methods
POINT_STAT_CLIMO_MEAN_FILE_NAME = {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/woa23/woa23_decav91C0_t03_04.nc, {INPUT_BASE}/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/woa23/woa23_decav91C0_t04_04.nc
POINT_STAT_CLIMO_MEAN_FIELD = {name = "t_an"; level = "(0,@50,*,*)";}
POINT_STAT_CLIMO_MEAN_REGRID_METHOD = BILIN
POINT_STAT_CLIMO_MEAN_REGRID_WIDTH = 2
POINT_STAT_CLIMO_MEAN_REGRID_VLD_THRESH = 0.5
POINT_STAT_CLIMO_MEAN_REGRID_SHAPE = SQUARE
POINT_STAT_CLIMO_MEAN_TIME_INTERP_METHOD = DW_MEAN
POINT_STAT_CLIMO_MEAN_DAY_INTERVAL = 31
POINT_STAT_CLIMO_MEAN_HOUR_INTERVAL = 6

POINT_STAT_CLIMO_CDF_WRITE_BINS = False

# Set up output files
POINT_STAT_OUTPUT_FLAG_CTC = STAT
POINT_STAT_OUTPUT_FLAG_CTS = STAT
POINT_STAT_OUTPUT_FLAG_CNT = STAT
POINT_STAT_OUTPUT_FLAG_SL1L2 = STAT
POINT_STAT_OUTPUT_FLAG_SAL1L2 = STAT
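
The climatology settings above time-interpolate the WOA23 March and April monthly means to the 18 March valid date using the DW_MEAN (distance-weighted mean) method. The sketch below is only a conceptual illustration of that weighting, assuming the monthly fields are treated as valid at mid-month; it is not MET's exact implementation.

from datetime import datetime

import numpy as np

valid = datetime(2023, 3, 18)
march_time = datetime(2023, 3, 15)   # assumed reference time for the March field
april_time = datetime(2023, 4, 15)   # assumed reference time for the April field

march_field = np.full((10, 10), 20.0)   # placeholder 50 m temperature fields
april_field = np.full((10, 10), 22.0)

# weight each monthly field by its closeness in time to the valid date
d_march = (valid - march_time).total_seconds()
d_april = (april_time - valid).total_seconds()
w_march = d_april / (d_march + d_april)
w_april = d_march / (d_march + d_april)

daily_climo = w_march * march_field + w_april * april_field
print(round(w_march, 3), round(w_april, 3), daily_climo[0, 0])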

MET Configuration

METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.

YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!

If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings

Note

See the ASCII2NC MET Configuration and PointStat MET Configuration sections of the User's Guide for more information on the environment variables used in the files below:

////////////////////////////////////////////////////////////////////////////////
//
// Default ascii2nc configuration file
//
////////////////////////////////////////////////////////////////////////////////

//
// The parameters listed below are used to summarize the ASCII data read in
//

//
// Time periods for the summarization
// obs_var (string array) is added and works like grib_code (int array)
// when the obs name is given instead of grib_code
//
${METPLUS_TIME_SUMMARY_DICT}

//
// Mapping of input little_r report types to output message types
//
message_type_map = [
   { key = "FM-12 SYNOP";  val = "ADPSFC"; },
   { key = "FM-13 SHIP";   val = "SFCSHP"; },
   { key = "FM-15 METAR";  val = "ADPSFC"; },
   { key = "FM-18 BUOY";   val = "SFCSHP"; },
   { key = "FM-281 QSCAT"; val = "ASCATW"; },
   { key = "FM-32 PILOT";  val = "ADPUPA"; },
   { key = "FM-35 TEMP";   val = "ADPUPA"; },
   { key = "FM-88 SATOB";  val = "SATWND"; },
   { key = "FM-97 ACARS";  val = "AIRCFT"; }
];

//
// Indicate a version number for the contents of this configuration file.
// The value should generally not be modified.
//
//version = "V10.0";

tmp_dir = "${MET_TMP_DIR}";

${METPLUS_MET_CONFIG_OVERRIDES}
////////////////////////////////////////////////////////////////////////////////
//
// Point-Stat configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Output model name to be written
//
// model =
${METPLUS_MODEL}

//
// Output description to be written
// May be set separately in each "obs.field" entry
//
// desc =
${METPLUS_DESC}

////////////////////////////////////////////////////////////////////////////////

//
// Verification grid
//
// regrid = {
${METPLUS_REGRID_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// May be set separately in each "field" entry
//
censor_thresh = [];
censor_val    = [];
cat_thresh    = [ NA ];
cnt_thresh    = [ NA ];
cnt_logic     = UNION;
wind_thresh   = [ NA ];
wind_logic    = UNION;
eclv_points   = 0.05;
//hss_ec_value =
${METPLUS_HSS_EC_VALUE}
rank_corr_flag = FALSE;

//
// Forecast and observation fields to be verified
//
fcst = {
  ${METPLUS_FCST_FILE_TYPE}
  ${METPLUS_FCST_FIELD}
}

obs = {
  ${METPLUS_OBS_FILE_TYPE}
  ${METPLUS_OBS_FIELD}
}
////////////////////////////////////////////////////////////////////////////////

//
// Point observation filtering options
// May be set separately in each "obs.field" entry
//
// message_type =
${METPLUS_MESSAGE_TYPE}
sid_exc        = [];

//obs_quality_inc =
${METPLUS_OBS_QUALITY_INC}

//obs_quality_exc =
${METPLUS_OBS_QUALITY_EXC}

duplicate_flag = NONE;
obs_summary    = NONE;
obs_perc_value = 50;

//
// Mapping of message type group name to comma-separated list of values.
//
//message_type_group_map =
${METPLUS_MESSAGE_TYPE_GROUP_MAP}

////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}


//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}

//
// May be set separately in each "obs.field" entry
//
//climo_cdf = {
${METPLUS_CLIMO_CDF_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Point observation time window
//
// obs_window = {
${METPLUS_OBS_WINDOW_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Verification masking regions
//
//mask = {
${METPLUS_MASK_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
ci_alpha  = [ 0.05 ];

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// Interpolation methods
//
//interp = {
${METPLUS_INTERP_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// HiRA verification method
//
//hira = {
${METPLUS_HIRA_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Statistical output types
//
//output_flag = {
${METPLUS_OUTPUT_FLAG_DICT}

////////////////////////////////////////////////////////////////////////////////
// Threshold for SEEPS p1 (Probability of being dry)

//seeps_p1_thresh =
${METPLUS_SEEPS_P1_THRESH}

////////////////////////////////////////////////////////////////////////////////

tmp_dir = "${MET_TMP_DIR}";

// output_prefix =
${METPLUS_OUTPUT_PREFIX}
//version        = "V10.0.0";

////////////////////////////////////////////////////////////////////////////////

${METPLUS_MET_CONFIG_OVERRIDES}

Python Embedding

This use case uses one Python script to read observation data.

parm/use_cases/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp/read_argo_metplus.py

#! /usr/bin/env python3

import sys
import os
import datetime

import netCDF4
import numpy
from numpy.ma import is_masked

# set to true to output more info
DEBUG = False

# constant values that will be used for every observation
MESSAGE_TYPE = 'ARGO'
ELEVATION = 'NA'
LEVEL = 'NA'
QC_STRING = '1'

# skip data if PRES_ADJUSTED_ERROR value is greater than this value
MAX_PRESSURE_ERROR = 20

"""Read and format the input 11-column observations:
(1)  string:  Message_Type
(2)  string:  Station_ID
(3)  string:  Valid_Time(YYYYMMDD_HHMMSS)
(4)  numeric: Lat(Deg North)
(5)  numeric: Lon(Deg East)
(6)  numeric: Elevation(msl)
(7)  string:  Var_Name(or GRIB_Code)
(8)  numeric: Level
(9)  numeric: Height(msl or agl)
(10) string:  QC_String
(11) numeric: Observation_Value
"""


def get_string_value(var):
    """!Get string value from NetCDF variable. The string variables are stored
    as bytes, so decode them and strip off any whitespace before or after.

    @param var NetCDF variable to read
    @returns string value from variable
    """
    return var[:].tobytes().decode('utf-8').strip()


def get_val_check_qc(nc_obj, field_name, idx_p, idx_l=None, error_max=None):
    """!Get field value unless quality control checks do not pass.
    The conditions to skip data are as follows:
    1) If {field_name}_QC is masked.
    2) If {field_name} is masked.
    3) If {field_name}_QC value is not equal to 1 (recommended from ARGO docs).
    4) If error_max is set, skip if {field_name}_ERROR is not masked and greater
    than the error_max value.

    @param nc_obj NetCDF object
    @param field_name name of field to read
    @param idx_p index of profile (1st dimension of data)
    @param idx_l (optional) index of level (2nd dimension of data). Defaults to
    None which assumes field is not 2D
    @param error_max (optional) value to compare to {field_name}_ERROR. Skip if
    error value is less than this value
    @returns numpy.float64 field value if all QC checks pass or None
    """
    if idx_l is None:
        qc = nc_obj.variables[f'{field_name}_QC'][idx_p]
        field = nc_obj.variables[field_name][idx_p]
        data_str = f'{field_name}[{idx_p}]'
    else:
        qc = nc_obj.variables[f'{field_name}_QC'][idx_p][idx_l]
        field = nc_obj.variables[field_name][idx_p][idx_l]
        data_str = f'{field_name}[{idx_p}][{idx_l}]'

    qc_mask = is_masked(qc)
    field_mask = is_masked(field)

    if qc_mask:
        if DEBUG:
            print(f"Skip {data_str} {field_name}_QC is masked")
        return None

    if field_mask:
        if DEBUG:
            print(f"Skip {data_str} {field_name} is masked")
        return None

    if int(qc) != 1:
        if DEBUG:
            print(f"Skip {data_str} {field_name}_QC value ({int(qc)}) != 1")
        return None

    if error_max:
        err = nc_obj.variables.get(f'{field_name}_ERROR')
        if err:
            err = err[idx_p] if idx_l is None else err[idx_p][idx_l]
            if not is_masked(err) and err > error_max:
                print(f"Skip {data_str} {field_name}_ERROR > {error_max}")
                return None

    return numpy.float64(field)


def get_valid_time(ref_dt, nc_obj, idx):
    """!Get valid time by adding julian days to the reference date time.

    @param ref_dt Datetime object of reference date time
    @param nc_obj NetCDF object
    @param idx index of profile to read
    @returns string of valid time in YYYYMMDD_HHMMSS format
    """
    julian_days = nc_obj.variables['JULD'][idx]
    day_offset = datetime.timedelta(days=float(julian_days))
    valid_dt = ref_dt + day_offset
    return valid_dt.strftime('%Y%m%d_%H%M%S')


def get_lat_lon(nc_obj, idx):
    """!Read latitude and longitude values from NetCDF file at profile index.

    @param nc_obj NetCDF object
    @param idx index of profile to read
    @returns tuple of lat and lon values as floats
    """
    return (numpy.float64(nc_obj.variables['LATITUDE'][idx]),
            numpy.float64(nc_obj.variables['LONGITUDE'][idx]))


if len(sys.argv) < 2:
    print(f"ERROR: {__file__} - Must provide at least 1 input file argument")
    sys.exit(1)

is_ok = True
input_files = []
for arg in sys.argv[1:]:
    if arg.endswith('debug'):
        print('Debugging output turned on')
        DEBUG = True
        continue

    input_file = os.path.expandvars(arg)
    if not os.path.exists(input_file):
        print(f'ERROR: Input file does not exist: {input_file}')
        is_ok = False
        continue

    input_files.append(input_file)

if not is_ok:
    sys.exit(1)

print(f'Number of input files: {len(input_files)}')

point_data = []
for input_file in input_files:
    print(f'Processing file: {input_file}')

    nc_in = netCDF4.Dataset(input_file, 'r')

    # get reference date time
    time_str = get_string_value(nc_in.variables['REFERENCE_DATE_TIME'])
    reference_date_time = datetime.datetime.strptime(time_str, '%Y%m%d%H%M%S')

    # get number of profiles and levels
    num_profiles = nc_in.dimensions['N_PROF'].size
    num_levels = nc_in.dimensions['N_LEVELS'].size

    new_point_data = []
    for index_p in range(0, num_profiles):
        # check QC and mask of JULD to skip profiles with bad time info
        if get_val_check_qc(nc_in, 'JULD', index_p) is None:
            continue

        valid_time = get_valid_time(reference_date_time, nc_in, index_p)
        station_id = get_string_value(
            nc_in.variables['PLATFORM_NUMBER'][index_p]
        )
        lat, lon = get_lat_lon(nc_in, index_p)

        # loop through levels
        for index_l in range(0, num_levels):
            # read pressure data to get height in meters of sea water (msw)
            height = get_val_check_qc(nc_in, 'PRES_ADJUSTED', index_p, index_l,
                                      error_max=MAX_PRESSURE_ERROR)
            if height is None:
                continue

            # get temperature and ocean salinity values
            for var_name in ('TEMP', 'PSAL'):
                observation_value = get_val_check_qc(nc_in,
                                                     f'{var_name}_ADJUSTED',
                                                     index_p, index_l)
                if observation_value is None:
                    continue

                point = [
                    MESSAGE_TYPE, station_id, valid_time, lat, lon, ELEVATION,
                    var_name, LEVEL, height, QC_STRING, observation_value,
                ]
                new_point_data.append(point)
                if DEBUG:
                    print(', '.join([str(val) for val in point]))

    point_data.extend(new_point_data)
    nc_in.close()

print("     point_data: Data Length:\t" + repr(len(point_data)))
print("     point_data: Data Type:\t" + repr(type(point_data)))

Running METplus

Pass the use case configuration file to the run_metplus.py script along with any user-specific system configuration files if desired:

run_metplus.py /path/to/METplus/parm/use_cases/model_applications/marine_and_cryosphere/PointStat_fcstRTOFS_obsARGO_climoWOA23_temp.conf /path/to/user_system.conf

See Running METplus for more information.

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in the prep and stats/rtofs.20230318 directories (relative to OUTPUT_BASE) and will contain the following files:

  • argo.20230318.nc

  • point_stat_RTOFS_ARGO_temp_Z50_240000L_20230318_000000V.stat
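
A minimal sketch for taking a quick look at the continuous statistics (CNT) lines in the .stat file; the path below is a placeholder that assumes the OUTPUT_BASE set in your user configuration.

# placeholder path: substitute your OUTPUT_BASE
stat_file = ("/path/to/output/stats/rtofs.20230318/"
             "point_stat_RTOFS_ARGO_temp_Z50_240000L_20230318_000000V.stat")

with open(stat_file) as stats:
    for line in stats:
        if " CNT " in line:
            print(line.rstrip())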

Keywords

Note

  • PointStatToolUseCase

  • ASCII2NCToolUseCase

  • PythonEmbeddingFileUseCase

  • ClimatologyUseCase

  • MarineAndCryosphereAppUseCase

Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.
