5.2.11.5. Grid-Stat: Verification of TC forecasts against merged TDR data

model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF.conf

Scientific Objective

To provide useful statistical information on the relationship between merged Tail Doppler Radar (TDR) data in NetCDF format and a gridded forecast. These values can be used to assess the skill of the prediction. The TDR data is available every 0.5 km AGL, so the TC forecasts need to be in height coordinates to be compared with the TDR data.

Datasets

Forecast: HAFS zonal wind
Observation: HRD TDR merged_zonal_wind
Location of model forecast and TDR files: All of the input data required for this use case can be found in the sample data tarball. Click here to download.
This tarball should be unpacked into the directory that you will set as the value of INPUT_BASE. See the ‘Running METplus’ section for more information.
TDR Data Source: Hurricane Research Division (HRD). Contact: Paul Reasor (paul.reasor@noaa.gov).
The dataset used in this use case is a subset of the Merged Analysis (v2d_combined_xy_rel_merged_ships.nc).
Thanks to HRD for providing the dataset.

METplus Components

The observations in this use case contain data mapped onto Cartesian grids with a horizontal grid spacing of 2 km and a vertical grid spacing of 0.5 km. Hence the model output needs to be on height levels (km) instead of pressure levels. Both the observations and the model output are available with the release. The instructions below describe how the input to the use case was prepared. The Hurricane Analysis and Forecast System (HAFS) output (pressure levels, in GRIB2 format) is converted to height levels (in NetCDF4 format) using the METcalcpy vertical interpolation routine. Under the METcalcpy/examples directory, users can modify vertical_interp_hwrf.sh or create a similar file for their own output. $DATA_DIR is the top-level output directory where the pressure-level data resides. The --input and --output arguments should point to the input and output file names, respectively. The --config argument points to a YAML file, which users should edit if needed. For this use case, only zonal wind (u) at four vertical levels (200 m, 2000 m, 4000 m, and 6000 m) is provided. The use case compares the HAFS 2 km zonal wind (u) data against TDR's merged_zonal_wind at 2 km. The user needs to run the shell script to get the height-level output in NetCDF4 format. This use case utilizes METplus Python embedding to read the TDR data and compare it to gridded forecast data using GridStat.
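As a rough illustration of the pressure-to-height conversion (this is a sketch of the idea, not the METcalcpy routine itself), a single model column can be interpolated onto fixed heights like this; the wind and height values below are hypothetical:

```python
import numpy as np

def interp_to_height(field_on_plevels, height_on_plevels, target_heights_m):
    """Interpolate one model column from pressure levels to fixed heights (m)."""
    # np.interp requires monotonically increasing sample points, so sort by height
    order = np.argsort(height_on_plevels)
    return np.interp(target_heights_m,
                     np.asarray(height_on_plevels)[order],
                     np.asarray(field_on_plevels)[order])

# Hypothetical column: zonal wind (m/s) and geopotential height (m) on 4 levels
u = np.array([5.0, 10.0, 15.0, 20.0])
z = np.array([100.0, 1500.0, 3000.0, 7000.0])
print(interp_to_height(u, z, [200.0, 2000.0, 4000.0, 6000.0]))
```

The real routine works on full 3-D GRIB2 fields and writes NetCDF4, but the per-column operation is the same shape: interpolate each variable from its native levels onto the requested heights.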

METplus Workflow

The use case runs a Python embedding script (GridStat_fcstHAFS_obsTDR_NetCDF/read_tdr.py) to read the TDR data, then runs Grid-Stat to compute statistics against the HAFS model output in height coordinates.

It processes the following run times: Valid at 2019-08-29 12Z

Forecast lead times: 0, 6, 12, and 18 hours

The mission number (e.g., CUSTOM_LOOP_LIST = 190829H1)

Height level (for TDR: OBS_VERT_LEVEL_KM = 2, HAFS: FCST_VAR1_LEVELS = "(0,1,*,*)")
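Because LOOP_BY = VALID with a single valid time, each lead time implies a different model initialization (init = valid - lead). A quick sketch of the resulting init/valid pairs:

```python
from datetime import datetime, timedelta

valid = datetime.strptime("2019082912", "%Y%m%d%H")  # VALID_BEG
lead_hours = [0, 6, 12, 18]                          # LEAD_SEQ

for lead in lead_hours:
    init = valid - timedelta(hours=lead)
    print(f"lead {lead:02d}h: init {init:%Y%m%d%H} -> valid {valid:%Y%m%d%H}")
```

So the 18-hour lead pairs the 2019-08-28 18Z initialization with the fixed 2019-08-29 12Z valid time.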

METplus Configuration

METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line with the -c option, i.e. -c parm/use_cases/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF.conf

# GridStat METplus Configuration

# section heading for [config] variables - all items below this line and
# before the next section heading correspond to the [config] section
[config]

# List of applications to run - only GridStat for this case
PROCESS_LIST = GridStat

# time looping - options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
#   INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
#   VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
LOOP_BY = VALID

# Format of VALID_BEG and VALID_END using % items
# %Y = 4 digit year, %m = 2 digit month, %d = 2 digit day, etc.
# see www.strftime.org for more information
# %Y%m%d%H expands to YYYYMMDDHH
VALID_TIME_FMT = %Y%m%d%H

# Start time for METplus run - must match VALID_TIME_FMT
VALID_BEG = 2019082912

# End time for METplus run - must match VALID_TIME_FMT
VALID_END = 2019082912

# Increment between METplus runs (in seconds if no units are specified)
#  Must be >= 60 seconds
VALID_INCREMENT = 21600 

# List of forecast leads to process for each run time (init or valid)
# In hours if units are not specified
# If unset, defaults to 0 (don't loop through forecast leads)
LEAD_SEQ = 0,6,12,18

# Order of loops to process data - Options are times, processes
# Not relevant if only one item is in the PROCESS_LIST
# times = run all wrappers in the PROCESS_LIST for a single run time, then
#   increment the run time and run all wrappers again until all times have
#   been evaluated.
# processes = run the first wrapper in the PROCESS_LIST for all times
#   specified, then repeat for the next item in the PROCESS_LIST until all
#   wrappers have been run
LOOP_ORDER = times


# Verbosity of MET output - overrides LOG_VERBOSITY for GridStat only
LOG_GRID_STAT_VERBOSITY = 200

# Location of MET config file to pass to GridStat
# References CONFIG_DIR from the [dir] section
GRID_STAT_CONFIG_FILE = {CONFIG_DIR}/GridStatConfig_wrapped
GRID_STAT_OUTPUT_FLAG_FHO = BOTH
GRID_STAT_OUTPUT_FLAG_CTC = STAT
GRID_STAT_OUTPUT_FLAG_CTS = STAT
GRID_STAT_OUTPUT_FLAG_CNT = STAT
GRID_STAT_OUTPUT_FLAG_SL1L2 = STAT
GRID_STAT_OUTPUT_FLAG_ECLV = NONE

# grid to remap data. Value is set as the 'to_grid' variable in the 'regrid' dictionary
# See MET User's Guide for more information
GRID_STAT_REGRID_TO_GRID = OBS

# Name to identify model (forecast) data in output
MODEL = HAFS

# Name to identify observation data in output
OBTYPE = TDR

# add list of missions separated by commas
CUSTOM_LOOP_LIST = 190829H1

# List of variables to compare in GridStat - FCST_VAR1 variables correspond
#  to OBS_VAR1 variables
# Note [FCST/OBS/BOTH]_GRID_STAT_VAR<n>_NAME can be used instead if different evaluations
# are needed for different tools

FCST_VAR1_NAME =  u
FCST_VAR1_OPTIONS = set_attr_init="{init?fmt=%Y%m%d_%H%M%S}"; set_attr_valid="{valid?fmt=%Y%m%d_%H%M%S}"; set_attr_lead="{lead?fmt=%H}";

# FCST_VAR<n>_LEVELS dimensions are (valid_time, lev, latitude, longitude)
FCST_VAR1_LEVELS =  "(0,1,*,*)"
FCST_GRID_STAT_INPUT_DATATYPE = NETCDF_NCCF

# Location of the TDR file
TC_RADAR_FILE = {OBS_GRID_STAT_INPUT_DIR}/merged_zonal_wind_tdr.nc

# Obs vertical level in km
OBS_VERT_LEVEL_KM = 2


# Name of observation variable 1
# In this example the variable is merged_zonal_wind
#
OBS_VAR1_NAME = {PARM_BASE}/use_cases/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF/read_tdr.py {TC_RADAR_FILE} merged_zonal_wind {custom?fmt=%s} {OBS_VERT_LEVEL_KM} 

#Thresholds for categorical statistics
FCST_VAR1_THRESH = gt10.0, gt20.0, lt-10.0, lt-20.0
OBS_VAR1_THRESH = gt10.0, gt20.0, lt-10.0, lt-20.0

# Time relative to valid time (in seconds) to allow files to be considered
#  valid. Set both BEGIN and END to 0 to require the exact time in the filename
FCST_GRID_STAT_FILE_WINDOW_BEGIN = 0
FCST_GRID_STAT_FILE_WINDOW_END = 0

# MET GridStat neighborhood values
# See the MET User's Guide GridStat section for more information
# width value passed to nbrhd dictionary in the MET config file
GRID_STAT_NEIGHBORHOOD_WIDTH = 1

# shape value passed to nbrhd dictionary in the MET config file
GRID_STAT_NEIGHBORHOOD_SHAPE = SQUARE

# cov thresh list passed to nbrhd dictionary in the MET config file
GRID_STAT_NEIGHBORHOOD_COV_THRESH = >=0.5

# Set to true to run GridStat separately for each field specified
# Set to false to create one run of GridStat per run time that
#   includes all fields specified.
GRID_STAT_ONCE_PER_FIELD = False

# Set to true if forecast data is probabilistic
FCST_IS_PROB = false

# Only used if FCST_IS_PROB is true - sets probabilistic threshold
FCST_GRID_STAT_PROB_THRESH = ==0.1

# Set to true if observation data is probabilistic
#  Only used if configuring forecast data as the 'OBS' input
OBS_IS_PROB = false

# Only used if OBS_IS_PROB is true - sets probabilistic threshold
OBS_GRID_STAT_PROB_THRESH = ==0.1

GRID_STAT_OUTPUT_PREFIX = {MODEL}_vs_{OBTYPE}

# End of [config] section and start of [dir] section
[dir]

# location of configuration files used by MET applications
CONFIG_DIR={PARM_BASE}/met_config

# directory containing forecast input to GridStat
FCST_GRID_STAT_INPUT_DIR = {INPUT_BASE}/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF/hafs_height

# directory containing observation input to GridStat
OBS_GRID_STAT_INPUT_DIR = {INPUT_BASE}/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF/obs

# directory containing climatology mean input to GridStat
# Not used in this example
GRID_STAT_CLIMO_MEAN_INPUT_DIR =

# directory containing climatology mean input to GridStat
# Not used in this example
GRID_STAT_CLIMO_STDEV_INPUT_DIR =

# directory to write output from GridStat
GRID_STAT_OUTPUT_DIR = {OUTPUT_BASE}/model_applications/tc_and_extra_tc/tdr

# End of [dir] section and start of [filename_templates] section
[filename_templates]

# Template to look for forecast input to GridStat relative to FCST_GRID_STAT_INPUT_DIR
FCST_GRID_STAT_INPUT_TEMPLATE = dorian05l.{init?fmt=%Y%m%d%H}.hafsprs.synoptic.0p03.f{lead?fmt=%HHH}.nc4

# Template to look for observation input to GridStat relative to OBS_GRID_STAT_INPUT_DIR
OBS_GRID_STAT_INPUT_TEMPLATE = PYTHON_NUMPY


# Optional subdirectories relative to GRID_STAT_OUTPUT_DIR to write output from GridStat
GRID_STAT_OUTPUT_TEMPLATE = {init?fmt=%Y%m%d%H}

# Template to look for climatology input to GridStat relative to GRID_STAT_CLIMO_MEAN_INPUT_DIR
# Not used in this example
GRID_STAT_CLIMO_MEAN_INPUT_TEMPLATE =

# Template to look for climatology input to GridStat relative to GRID_STAT_CLIMO_STDEV_INPUT_DIR
# Not used in this example
GRID_STAT_CLIMO_STDEV_INPUT_TEMPLATE =

# Used to specify one or more verification mask files for GridStat
# Not used for this example
GRID_STAT_VERIFICATION_MASK_TEMPLATE =
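To see how FCST_GRID_STAT_INPUT_TEMPLATE resolves for a given run, here is a sketch that mimics (but does not call) METplus's template substitution for the 6-hour lead:

```python
from datetime import datetime, timedelta

valid = datetime(2019, 8, 29, 12)
lead = 6  # hours
init = valid - timedelta(hours=lead)

# {init?fmt=%Y%m%d%H} becomes the 10-digit init time;
# {lead?fmt=%HHH} becomes the zero-padded 3-digit lead hour
fname = f"dorian05l.{init:%Y%m%d%H}.hafsprs.synoptic.0p03.f{lead:03d}.nc4"
print(fname)
```

METplus searches for that filename under FCST_GRID_STAT_INPUT_DIR for each (init, lead) pair in the loop.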

MET Configuration

METplus sets environment variables based on the values in the METplus configuration file. These variables are referenced in the MET configuration file. YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS! If there is a setting in the MET configuration file that is not controlled by an environment variable, you can add additional environment variables to be set only within the METplus environment using the [user_env_vars] section of the METplus configuration files. See the ‘User Defined Config’ section on the ‘System Configuration’ page of the METplus User’s Guide for more information.
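For instance, an extra variable can be exposed to the MET configuration environment like this (the variable name and value are hypothetical, for illustration only):

```
[user_env_vars]
# Made available in the environment when METplus calls the MET tools
MY_EXTRA_SETTING = some_value
```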

////////////////////////////////////////////////////////////////////////////////
//
// Grid-Stat configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Output model name to be written
//
// model =
${METPLUS_MODEL}

//
// Output description to be written
// May be set separately in each "obs.field" entry
//
// desc =
${METPLUS_DESC}

//
// Output observation type to be written
//
// obtype =
${METPLUS_OBTYPE}

////////////////////////////////////////////////////////////////////////////////

//
// Verification grid
//
// regrid = {
${METPLUS_REGRID_DICT}

////////////////////////////////////////////////////////////////////////////////

censor_thresh    = [];
censor_val       = [];
cat_thresh       = [];
cnt_thresh       = [ NA ];
cnt_logic        = UNION;
wind_thresh      = [ NA ];
wind_logic       = UNION;
eclv_points      = 0.05;
nc_pairs_var_suffix = "";
//nc_pairs_var_name =
${METPLUS_NC_PAIRS_VAR_NAME}
rank_corr_flag   = FALSE;

//
// Forecast and observation fields to be verified
//
fcst = {
  ${METPLUS_FCST_FILE_TYPE}
  ${METPLUS_FCST_FIELD}
}
obs = {
  ${METPLUS_OBS_FILE_TYPE}
  ${METPLUS_OBS_FIELD}
}

////////////////////////////////////////////////////////////////////////////////

//
// Climatology mean data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}


//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}

//
// May be set separately in each "obs.field" entry
//
//climo_cdf = {
${METPLUS_CLIMO_CDF_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Verification masking regions
//
// mask = {
${METPLUS_MASK_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
ci_alpha  = [ 0.05 ];

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// Data smoothing methods
//
//interp = {
${METPLUS_INTERP_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Neighborhood methods
//
nbrhd = {
   field      = BOTH;
   // shape =
   ${METPLUS_NBRHD_SHAPE}
   // width =
   ${METPLUS_NBRHD_WIDTH}
   // cov_thresh =
   ${METPLUS_NBRHD_COV_THRESH}
   vld_thresh = 1.0;
}

////////////////////////////////////////////////////////////////////////////////

//
// Fourier decomposition
// May be set separately in each "obs.field" entry
//
fourier = {
   wave_1d_beg = [];
   wave_1d_end = [];
}

////////////////////////////////////////////////////////////////////////////////

//
// Gradient statistics
// May be set separately in each "obs.field" entry
//
gradient = {
   dx = [ 1 ];
   dy = [ 1 ];
}

////////////////////////////////////////////////////////////////////////////////

//
// Distance Map statistics
// May be set separately in each "obs.field" entry
//
distance_map = {
   baddeley_p        = 2;
   baddeley_max_dist = NA;
   fom_alpha         = 0.1;
   zhu_weight        = 0.5;
}

////////////////////////////////////////////////////////////////////////////////

//
// Statistical output types
//
//output_flag = {
${METPLUS_OUTPUT_FLAG_DICT}

//
// NetCDF matched pairs output file
// May be set separately in each "obs.field" entry
//
// nc_pairs_flag = {
${METPLUS_NC_PAIRS_FLAG_DICT}

////////////////////////////////////////////////////////////////////////////////

//grid_weight_flag =
${METPLUS_GRID_WEIGHT_FLAG}
tmp_dir          = "/tmp";
// output_prefix =
${METPLUS_OUTPUT_PREFIX}

////////////////////////////////////////////////////////////////////////////////

${METPLUS_MET_CONFIG_OVERRIDES}

Note that the ${METPLUS_*} variables referenced in the MET configuration file above are set by METplus.

Python Embedding

This use case uses a Python embedding script to read the input data:

parm/use_cases/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF/read_tdr.py

import os
import sys

sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))

import tdr_utils

if len(sys.argv) < 5:
    print("Must specify an input file, variable name, mission ID (YYMMDDID), and level (in km)")
    sys.exit(1)

# Read the input file as the first argument
input_file   = os.path.expandvars(sys.argv[1])
var_name     = sys.argv[2]
mission_name = sys.argv[3]
level_km     = float(sys.argv[4])

met_data, attrs = tdr_utils.main(input_file, var_name, mission_name, level_km)

The above script imports another script called tdr_utils.py in the same directory:

parm/use_cases/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF/tdr_utils.py

from netCDF4 import Dataset
import numpy as np
import datetime as dt
import os
import sys
from time import gmtime, strftime

# Return valid time
def get_valid_time(input_file, mission_name):
    f = Dataset(input_file, 'r')
    mid = f.variables['mission_ID'][:].tolist().index(mission_name)
    valid_time = calculate_valid_time(f, mid)
    valid_time_mid = valid_time.strftime("%Y%m%d%H%M") 
    return valid_time_mid

def calculate_valid_time(f, mid):
  merge_year_np  = np.array(f.variables['merge_year'][mid])
  merge_month_np = np.array(f.variables['merge_month'][mid])
  merge_day_np   = np.array(f.variables['merge_day'][mid])
  merge_hour_np  = np.array(f.variables['merge_hour'][mid])
  merge_min_np   = np.array(f.variables['merge_min'][mid])
  valid_time     = dt.datetime(merge_year_np,merge_month_np,merge_day_np,merge_hour_np,merge_min_np,0)
  return valid_time

def read_inputs():
    # Read the input file as the first argument
    input_file   = os.path.expandvars(sys.argv[1])
    var_name     = sys.argv[2]
    mission_name = sys.argv[3]
    level_km     = float(sys.argv[4])
    return input_file, var_name, mission_name, level_km

def main(input_file, var_name, mission_name, level_km):
  ###########################################

  ##
  ##  input file specified on the command line
  ##  load the data into the numpy array
  ##


    try:
      # Print some output to verify that this script ran
      print("Input File:      " + repr(input_file))
      print("Variable Name:   " + repr(var_name))

      # Read input file
      f = Dataset(input_file, 'r')

      # Find the requested mission name 
      mid = f.variables['mission_ID'][:].tolist().index(mission_name)

      # Find the requested level value 
      lid = f.variables['level'][:].tolist().index(level_km)

      # Read the requested variable
      data = np.float64(f.variables[var_name][mid,:,:,lid])

      # Expect that dimensions are ordered (lat, lon)
      # If (lon, lat), transpose the data
      if(f.variables[var_name].dimensions[0] == 'lon'):
         data = data.transpose()

      print("Mission (index): " + repr(mission_name) + " (" + repr(mid) + ")")
      print("Level (index):   " + repr(level_km) + " (" + repr(lid) + ")")
      print("Data Range:      " + repr(np.nanmin(data)) + " to " + repr(np.nanmax(data)))

      # Replace NaN values with the MET missing data value (-9999)
      data[np.isnan(data)] = -9999

      # Flip the data along the latitude dimension
      data = data[::-1]

      # Store a deep copy of the data for MET
      met_data = data.reshape(200,200).copy()

      print("Data Shape:      " + repr(met_data.shape))
      print("Data Type:       " + repr(met_data.dtype))

    except NameError:
      print("Trouble reading input file: " + input_file)


    ###############################################################################

    # Determine LatLon grid information

    # Read in coordinate data
    merged_lon  = np.array(f.variables['merged_longitudes'][mid,0,:])
    merged_lat  = np.array(f.variables['merged_latitudes'][mid,:,0])

    # Time data:
    valid_time = calculate_valid_time(f, mid)
    init_time = valid_time

    ###########################################

    ##
    ##  create the metadata dictionary
    ##

    ###########################################
    attrs = {
      'valid': valid_time.strftime("%Y%m%d_%H%M%S"),
      'init' : valid_time.strftime("%Y%m%d_%H%M%S"),
      'lead':  '00',
      'accum': '06',
      'mission_id': mission_name,

      'name':      var_name,
      'long_name': var_name,
      'level':     str(level_km) + "km",
      'units':     str(getattr(f.variables[var_name], "units")),

      'grid': {
          'name':       var_name,
          'type' :      'LatLon',
          'lat_ll' :    float(min(merged_lat)),
          'lon_ll' :    float(min(merged_lon)),
          'delta_lat' : float(merged_lat[1]-merged_lat[0]),
          'delta_lon' : float(merged_lon[1]-merged_lon[0]),
          'Nlat' :      len(merged_lat),
          'Nlon' :      len(merged_lon),
      }
    }

    print("Attributes:      " + repr(attrs))
    return met_data, attrs

if __name__ == '__main__':
    if len(sys.argv) < 5:
        print("Must specify an input file, variable name, mission ID (YYMMDDID), and level (in km)")
        sys.exit(1)

    input_file, var_name, mission_name, level_km = read_inputs()

    met_data, attrs = main(input_file, var_name, mission_name, level_km)
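MET's Python embedding for gridded data expects a 2-D NumPy array named met_data plus a metadata dictionary named attrs shaped like the one tdr_utils.main() builds. A minimal standalone sketch with synthetic data (the corner coordinates and grid spacing below are hypothetical, not taken from the TDR file):

```python
import numpy as np

# Synthetic 200x200 field standing in for merged_zonal_wind at 2 km
met_data = np.full((200, 200), -9999.0)
met_data[90:110, 90:110] = 15.0  # a small patch of 15 m/s wind

# Metadata in the same shape that tdr_utils.main() returns
attrs = {
    'valid': '20190829_120000',
    'init':  '20190829_120000',
    'lead':  '00',
    'accum': '06',
    'name':      'merged_zonal_wind',
    'long_name': 'merged_zonal_wind',
    'level':     '2.0km',
    'units':     'm/s',
    'grid': {
        'name':      'merged_zonal_wind',
        'type':      'LatLon',
        'lat_ll':    24.0,    # hypothetical lower-left corner
        'lon_ll':    -80.0,
        'delta_lat': 0.02,    # hypothetical grid spacing (deg)
        'delta_lon': 0.02,
        'Nlat':      200,
        'Nlon':      200,
    },
}

# The grid dimensions must match the data array shape
assert met_data.shape == (attrs['grid']['Nlat'], attrs['grid']['Nlon'])
```

GridStat picks up met_data and attrs from the script's namespace when OBS_GRID_STAT_INPUT_TEMPLATE is set to PYTHON_NUMPY, as in this use case.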

Running METplus

This use case can be run two ways:

  1. Passing in GridStat_fcstHAFS_obsTDR_NetCDF.conf then a user-specific system configuration file:

    run_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF.conf -c /path/to/user_system.conf
    
  2. Modifying the configurations in parm/metplus_config, then passing in GridStat_fcstHAFS_obsTDR_NetCDF.conf:

    run_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/tc_and_extra_tc/GridStat_fcstHAFS_obsTDR_NetCDF.conf
    

The former method is recommended. Whether you add them to a user-specific configuration file or modify the metplus_config files, the following variables must be set correctly:

  • INPUT_BASE - Path to directory where sample data tarballs are unpacked (See Datasets section to obtain tarballs). This is not required to run METplus, but it is required to run the examples in parm/use_cases

  • OUTPUT_BASE - Path where METplus output will be written. This must be in a location where you have write permissions

  • MET_INSTALL_DIR - Path to location where MET is installed locally

Example User Configuration File:

[dir]
INPUT_BASE = /path/to/sample/input/data
OUTPUT_BASE = /path/to/output/dir
MET_INSTALL_DIR = /path/to/met-X.Y

NOTE: All of these items must be found under the [dir] section.

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in model_applications/tc_and_extra_tc/tdr (relative to OUTPUT_BASE) and will contain the following files:

  • grid_stat_HAFS_vs_TDR_000000L_20190829_120000V_fho.txt

  • grid_stat_HAFS_vs_TDR_000000L_20190829_120000V_pairs.nc

  • grid_stat_HAFS_vs_TDR_000000L_20190829_120000V.stat

The use case is run for four lead times valid at 2019-08-29 12Z, so four directories (one per initialization time) will be generated, each containing files similar to those above.

Keywords
