Note

Go to the end to download the full example code.

PointStat: Use Python embedding and METcalcpy to calculate and verify CTP/HI

model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI.conf

Scientific Objective

This use case examines short-term land-atmosphere coupling within the UFS global forecast system (GFS). The specific configuration is a GFSv17 pre-release version (HR1). Land-atmosphere coupling is important for many processes, such as prediction of convection, convective clouds, as well as near surface sensible weather (e.g. winds, humidity, temperature). Here the CTP-HI process diagnostic uses morning surface and lower atmosphere conditions to assess the convective coupling in wet or dry coupling phases. This permits further assessment of the model near surface and lower atmosphere biases in an integrated phase space, using CTP versus HI.

Version Added

METplus version 6.1

Datasets

Forecast: Global Forecast System (GFS) version 17 prototype. Global 1-degree grid including temperature, specific humidity, and pressure.

Observation: Upper air radiosonde observations from the Global Data Assimilation System (GDAS) in PREPBUFR format.

Climatology: None

Location: All of the input data required for this use case can be found in a sample data tarball. Each use case category will have one or more sample data tarballs. It is only necessary to download the tarball with the use case’s dataset and not the entire collection of sample data. Click here to access the METplus releases page and download sample data for the appropriate release: https://github.com/dtcenter/METplus/releases This tarball should be unpacked into the directory that you will set the value of INPUT_BASE. See Running METplus section for more information.

METplus Components

This use cases uses PB2NC, GenVxMask, and PointStat along with Python embedding and user scripting. For each call to PointStat, Python embedding is used to calculate the CTP and Humidity Index diagnostics and pass those diagnostics to PointStat for verification.

METplus Workflow

Beginning time (VALID_BEG): 2020-08-05 12:00

End time (VALID_END): 2020-08-05 12:00

Increment between beginning and end times (VALID_INCREMENT): 12 Hours

Sequence of forecast leads to process (LEAD_SEQ): 60

Only a single time is used to demonstrate the workflow for this use case. For each time, PB2NC is used to convert the upper-air radiosonde observations from PREPBUFR format to NetCDF. Then, a Python user script is used to create a text file with the locations of upper-air sites to use for verification. This text file is used by GenVxMask to mask out any points in the forecast outside of a user-defined radius around each upper-air site. Then, PointStat is called which uses Python embedding to compute diagnostics using METcalcpy to use for verification. Point stat is used to produce matched pairs (forecast/observation pairs) of both diagnostics for downstream land-atmosphere process evaluation.

METplus Configuration

METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line, i.e. parm/use_cases/model_applications/s2s/SeriesAnalysis_fcstCFSv2_obsGHCNCAMS_climoStandardized_MultiStatisticTool.conf

[config]

# Documentation for this use case can be found at
# https://metplus.readthedocs.io/en/latest/generated/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI.html

# For additional information, please see the METplus Users Guide.
# https://metplus.readthedocs.io/en/latest/Users_Guide

###########################################################
# General METplus Wrappers Settings
###########################################################

#LOG_LEVEL=DEBUG

############################################################
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
############################################################

PROCESS_LIST = PB2NC,UserScript(mask_script),GenVxMask,PointStat(CTP),PointStat(HumIndex),UserScript(plot_script)

############################################################
# Time Info
# LOOP_BY options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
# INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
# VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
# LEAD_SEQ is the list of forecast leads to process
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#timing-control
############################################################

LOOP_BY = VALID
VALID_TIME_FMT = %Y%m%d%H
VALID_BEG = 2020080512
VALID_END = 2020080512
VALID_INCREMENT = 12H
LEAD_SEQ = 60

############################################################
# PB2NC settings for processing GDAS observations
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#pb2nc
############################################################

PB2NC_INPUT_DIR={INPUT_BASE}/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/obs/prepbufr
PB2NC_INPUT_TEMPLATE=prepbufr.gdas.{valid?fmt=%Y%m%d%H}.nr
PB2NC_MESSAGE_TYPE=ADPUPA
PB2NC_OUTPUT_DIR={OUTPUT_BASE}/obs/netcdf
PB2NC_OUTPUT_TEMPLATE=prepbufr.gdas.{valid?fmt=%Y%m%d%H}.nc
PB2NC_OBS_WINDOW_BEGIN=-1800
PB2NC_OBS_WINDOW_END=+1800
PB2NC_SKIP_IF_OUTPUT_EXISTS=False
PB2NC_OBS_BUFR_VAR_LIST=ZOB,QOB,TOB,POB,UOB,VOB,D_DPT
PB2NC_QUALITY_MARK_THRESH=10

############################################################
# UserScript Settings for mask_script
# This script is used to create a lat/lon ASCII file of
# RAOB locations (ADPUPA) from PB2NC output for use
# with gen_vx_mask
############################################################

[mask_script]
USER_SCRIPT_RUNTIME_FREQ=RUN_ONCE_PER_INIT_OR_VALID
USER_SCRIPT_OUTPUT_DIR={OUTPUT_BASE}/gen_vx_mask
USER_SCRIPT_OUTPUT_TEMPLATE=pb2nc_adpupa_latlon.txt
USER_SCRIPT_SITES_TO_INCLUDE=7
USER_SCRIPT_COMMAND=python3 {PARM_BASE}/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/create_raob_mask_file.py {PB2NC_OUTPUT_DIR}/{PB2NC_OUTPUT_TEMPLATE} {USER_SCRIPT_OUTPUT_DIR}/{USER_SCRIPT_OUTPUT_TEMPLATE} {USER_SCRIPT_SITES_TO_INCLUDE}

############################################################
# UserScript Settings for plot_script
# This script is used to create a graphic from the matched
# forecast and observation pairs created with PointStat
############################################################

[plot_script]
USER_SCRIPT_RUNTIME_FREQ=RUN_ONCE
USER_SCRIPT_OUTPUT_DIR={OUTPUT_BASE}
USER_SCRIPT_COMMAND=python3 {PARM_BASE}/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/plot_2panel_ctp_hi.py {USER_SCRIPT_OUTPUT_DIR}

############################################################
# GenVxMask Settings
# For creating masks around RAOB sites
############################################################

[config]
GEN_VX_MASK_INPUT_MASK_TEMPLATE={OUTPUT_BASE}/gen_vx_mask/pb2nc_adpupa_latlon.txt
GEN_VX_MASK_INPUT_TEMPLATE="gaussian 0 496 240"
GEN_VX_MASK_SKIP_IF_OUTPUT_EXISTS=True
GEN_VX_MASK_OPTIONS=-type circle -thresh "<=100"
GEN_VX_MASK_OUTPUT_TEMPLATE={OUTPUT_BASE}/gen_vx_mask/raob_masks.nc

############################################################
# PointStat Settings
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#pointstat
############################################################

POINT_STAT_CONFIG_FILE = {PARM_BASE}/met_config/PointStatConfig_wrapped
POINT_STAT_MESSAGE_TYPE = ADPSFC
POINT_STAT_OUTPUT_DIR={OUTPUT_BASE}/point_stat
POINT_STAT_OUTPUT_FLAG_MPR = BOTH

############################################################
# Field Info
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#field-info
############################################################

FCST_POINT_STAT_INPUT_TEMPLATE=PYTHON_NUMPY
OBS_POINT_STAT_PYEMBED_DEBUG=False

[CTP]
OBS_POINT_STAT_INPUT_TEMPLATE=PYTHON_NUMPY={PARM_BASE}/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_ctp_obs_gdas.py {OUTPUT_BASE}/obs/netcdf/prepbufr.gdas.{valid?fmt=%Y%m%d%H}.nc {OBS_POINT_STAT_PYEMBED_DEBUG}
OBS_VAR1_NAME=CTP
OBS_VAR1_LEVELS=L0
FCST_VAR1_NAME={PARM_BASE}/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_ctp_fcst_HR1.py {INPUT_BASE}/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/fcst/{init?fmt=%Y%m%d}/gfs.conus.atmf{lead?fmt=%3H}.nc tmp pfull {GEN_VX_MASK_OUTPUT_TEMPLATE}
FCST_VAR1_LEVELS=L0
POINT_STAT_OUTPUT_TEMPLATE=CTP

[HumIndex]
OBS_POINT_STAT_INPUT_TEMPLATE=PYTHON_NUMPY={PARM_BASE}/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_hi_obs_gdas.py {OUTPUT_BASE}/obs/netcdf/prepbufr.gdas.{valid?fmt=%Y%m%d%H}.nc {OBS_POINT_STAT_PYEMBED_DEBUG}
OBS_VAR1_NAME=HI
OBS_VAR1_LEVELS=L0
FCST_VAR1_NAME={PARM_BASE}/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_hi_fcst_HR1.py {INPUT_BASE}/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/fcst/{init?fmt=%Y%m%d}/gfs.conus.atmf{lead?fmt=%3H}.nc tmp pfull spfh {GEN_VX_MASK_OUTPUT_TEMPLATE}
FCST_VAR1_LEVELS=L0
POINT_STAT_OUTPUT_TEMPLATE=HI

[user_env_vars]
MET_PYTHON_DIR={MET_INSTALL_DIR}/share/met/python

MET Configuration

METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.

YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!

If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings

PointStatConfig_wrapped

////////////////////////////////////////////////////////////////////////////////
//
// Point-Stat configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Output model name to be written
//
// model =
${METPLUS_MODEL}

//
// Output description to be written
// May be set separately in each "obs.field" entry
//
// desc =
${METPLUS_DESC}

////////////////////////////////////////////////////////////////////////////////

//
// Verification grid
//
// regrid = {
${METPLUS_REGRID_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// May be set separately in each "field" entry
//
censor_thresh = [];
censor_val    = [];
cat_thresh    = [ NA ];
cnt_thresh    = [ NA ];
cnt_logic     = UNION;
wind_thresh   = [ NA ];
wind_logic    = UNION;
eclv_points   = 0.05;
//hss_ec_value =
${METPLUS_HSS_EC_VALUE}
rank_corr_flag = FALSE;

//
// Forecast and observation fields to be verified
//
fcst = {
  ${METPLUS_FCST_FILE_TYPE}
  //field = [
  ${METPLUS_FCST_FIELD}
  ${METPLUS_FCST_CLIMO_MEAN_DICT}
  ${METPLUS_FCST_CLIMO_STDEV_DICT}
}

obs = {
  ${METPLUS_OBS_FILE_TYPE}
  //field = [
  ${METPLUS_OBS_FIELD}
  ${METPLUS_OBS_CLIMO_MEAN_DICT}
  ${METPLUS_OBS_CLIMO_STDEV_DICT}
}
////////////////////////////////////////////////////////////////////////////////

//
// Point observation filtering options
// May be set separately in each "obs.field" entry
//
// message_type =
${METPLUS_MESSAGE_TYPE}
sid_exc        = [];

//obs_quality_inc =
${METPLUS_OBS_QUALITY_INC}

//obs_quality_exc =
${METPLUS_OBS_QUALITY_EXC}

//duplicate_flag =
${METPLUS_DUPLICATE_FLAG}

//obs_summary =
${METPLUS_OBS_SUMMARY}

//obs_perc_value =
${METPLUS_OBS_PERC_VALUE}

//
// Mapping of message type group name to comma-separated list of values.
//
//message_type_group_map =
${METPLUS_MESSAGE_TYPE_GROUP_MAP}

//obtype_as_group_val_flag =
${METPLUS_OBTYPE_AS_GROUP_VAL_FLAG}

////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}


//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}

//
// May be set separately in each "obs.field" entry
//
//climo_cdf = {
${METPLUS_CLIMO_CDF_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Land/Sea mask
// For LANDSF message types, only use forecast grid points where land = TRUE.
// For WATERSF message types, only use forecast grid points where land = FALSE.
// land_mask.flag may be set separately in each "obs.field" entry.
//
//land_mask = {
${METPLUS_LAND_MASK_DICT}

//
// Topography
// For SURFACE message types, only use observations where the topo - station
// elevation difference meets the use_obs_thresh threshold.
// For the observations kept, when interpolating forecast data to the
// observation location, only use forecast grid points where the topo - station
// difference meets the interp_fcst_thresh threshold.
// topo_mask.flag may be set separately in each "obs.field" entry.
//
//topo_mask = {
${METPLUS_TOPO_MASK_DICT}

//lapse_rate_correction = {
${METPLUS_LAPSE_RATE_CORRECTION_DICT}

//msl_agl_conversion = {
${METPLUS_MSL_AGL_CONVERSION_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Point observation time window
//
// obs_window = {
${METPLUS_OBS_WINDOW_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Verification masking regions
//
//mask = {
${METPLUS_MASK_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
ci_alpha  = [ 0.05 ];

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// Interpolation methods
//
//interp = {
${METPLUS_INTERP_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// HiRA verification method
//
//hira = {
${METPLUS_HIRA_DICT}

////////////////////////////////////////////////////////////////////////////////
// Threshold for SEEPS p1 (Probability of being dry)

//seeps_p1_thresh =
${METPLUS_SEEPS_P1_THRESH}

////////////////////////////////////////////////////////////////////////////////

//
// Statistical output types
//
//output_flag = {
${METPLUS_OUTPUT_FLAG_DICT}

////////////////////////////////////////////////////////////////////////////////

//ugrid_dataset =
${METPLUS_UGRID_DATASET}

//ugrid_max_distance_km =
${METPLUS_UGRID_MAX_DISTANCE_KM}

//ugrid_coordinates_file =
${METPLUS_UGRID_COORDINATES_FILE}

////////////////////////////////////////////////////////////////////////////////

//point_weight_flag =
${METPLUS_POINT_WEIGHT_FLAG}

tmp_dir = "${MET_TMP_DIR}";

// output_prefix =
${METPLUS_OUTPUT_PREFIX}
//version        = "V10.0.0";

////////////////////////////////////////////////////////////////////////////////

${METPLUS_MET_CONFIG_OVERRIDES}

Python Embedding

This use case has four Python embedding scripts, one for each diagnostic (CTP/HI) for both the forecast and observations

pyembed_ctp_fcst_HR1.py
pyembed_ctp_obs_gdas.py
pyembed_hi_fcst_HR1.py
pyembed_hi_obs_gdas.py

In addition to the basic package requirements for Python embedding, these Python scripts also require

The forecast Python embedding scripts require the data variable names and the path to the mask file created with GenVxMask, while the observation Python embedding scripts only require the output file from PB2NC.

parm/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_ctp_fcst_HR1.py

import multiprocessing
import numpy as np
import os
import sys
import xarray as xr
from itertools import repeat
from metcalcpy.diagnostics.land_surface import calc_ctp
from metpy.units import units

print("\nSTARTING pyembed_ctp_fcst_HR1.py\n")

# Obtain the command line arguments
input_file = sys.argv[1]
tmpvarname = sys.argv[2]
prsvarname = sys.argv[3]
mask_file  = sys.argv[4]

# Open the input_file as an Xarray Dataset
if os.path.splitext(input_file)[1]=='.nc':
  ds = xr.open_dataset(input_file)
  ds = ds[[tmpvarname,prsvarname,'pressfc']]
else:
  print("FATAL! pyembed_ctp_fcst_HR1.py.")
  print("Unable to open input file.")
  sys.exit(1)

# Determine the input dims
indims = ds.sizes
if ('grid_xt' not in ds.coords) or ('grid_yt' not in ds.coords):
  print("FATAL! unexpected dimension names in FCST file.")
  sys.exit(1)
else:
  ny = indims['grid_yt']
  nx = indims['grid_xt']

# Open the mask file
maskdata = xr.open_dataset(mask_file)

# Add the mask variable to the data
ds['maskvar'] = xr.DataArray(maskdata['RAOB_SITES'].values,dims=['grid_yt','grid_xt'],coords={'grid_yt':ds.grid_yt,'grid_xt':ds.grid_xt})

# The files that were used to develop this use case need special treatment of the pressure field.
# Find the "bk_interp" attribute
try:
  bk = ds.attrs['bk']
except KeyError:
  print("ERROR! Required attribute \"bk\" not found in:")
  print(input_file)
  print("UNABLE TO CONTINUE.")
  exit(1)

# Reverse bk, sfc pressure is -1 so make it item 0
bk = bk[::-1]

# The adjustment is at the half levels, but pressures are at the full levels.
# Average each pair of data to create n-1 number of bk values to use.
bk_interp = np.array([np.mean([bk[n],bk[n+1]]) for n in range(0,len(bk)-1)])

# Filter out values where bk=0
bk_interp = bk_interp[bk_interp>0.0]

# The model data are on terrain-following levels so it can't have a constant z-coordinate.
# Thus, we define a new z-coordinate "z0" of integers representing the levels
z0 = xr.DataArray(range(0,len(bk_interp)),dims=['z0'],coords={'z0':range(0,len(bk_interp))},attrs={'units':'levelnumber'})

# Next get the temperature data. It's stored upside-down so reverse it along the vertical dimension
tmp3d = ds[tmpvarname].squeeze().reindex(pfull=ds.pfull[::-1])

# Subset the temperature data so it only has data where the bk_interp variable is available
tmp3d = tmp3d.isel(pfull=slice(0,len(z0)))

# Change the vertical coordinate and dimension for the temperature data to be z0
tmp3d = tmp3d.rename({'pfull':'z0'}).assign_coords({'z0':z0})
tmp3d = tmp3d*units('degK')

# Create the 3D pressure variable
prs3d = xr.DataArray(bk_interp,dims=['z0'],coords={'z0':z0},attrs={'units':'Pa'}).broadcast_like(tmp3d)
prs3d = (prs3d*(ds['pressfc'].squeeze()))*units('Pa').to('hPa')

# Get a pool of workers
mp = multiprocessing.Pool(multiprocessing.cpu_count()-2)

# Stack the data in the x-y dimension into a single dimension named "sid".
# This treats each grid cell/column like a "site"
tmpstack = tmp3d.stack(sid=("grid_yt","grid_xt"))
prsstack = prs3d.stack(sid=("grid_yt","grid_xt"))
mskstack = ds['maskvar'].stack(sid=("grid_yt","grid_xt"))

# Create an Xarray DataArray like the stacked variables to hold the results
resstack = xr.full_like(mskstack,-9999.).rename('ctp')

# Subset the data to only the points where the mask is
prs_mask = prsstack[:,mskstack>0]
tmp_mask = tmpstack[:,mskstack>0]

print("")
print("COMPUTING CTP FOR %10d CELLS." % (int(tmpstack[:,mskstack>0].sizes['sid'])))
print("")
result = mp.starmap(calc_ctp,([prs_mask,tmp_mask,sidx] for sidx in list(range(0,tmp_mask.sizes['sid']))))
result = [x.m for x in result]

# Re-populate the stacked array with the values at the correct locations
resstack[mskstack>0] = result

# Unstack the data from the `sid` dimension back to just grid_xt and grid_yt (2D) and obtain the NumPy N-D array
met_data = resstack.unstack()
met_data = met_data.reindex(grid_yt=met_data.grid_yt[::-1])
met_data.to_netcdf('test.nc')
met_data = met_data.values

grid_attrs = {}
grid_attrs['type'] = 'Gaussian'
grid_attrs['name'] = 'HR1'
grid_attrs['lon_zero'] = 0.0
grid_attrs['nx'] = nx
grid_attrs['ny'] = ny

attrs = {}
attrs['valid'] = '20200805_120000'
attrs['init'] = '20200803_000000'
attrs['lead'] = '600000'
attrs['accum'] = '000000'
attrs['name'] = 'testing'
attrs['long_name'] = 'long_test'
attrs['level'] = 'surface'
attrs['units'] = 'test'
attrs['fill_value'] = -9999.
attrs['grid'] = grid_attrs

parm/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_ctp_obs_gdas.py

import sys
import numpy as np
import pandas as pd
from met.point_nc import nc_point_obs
from metcalcpy.diagnostics import land_surface
from metpy.units import units

print("\nSTARTING pyembed_ctp_obs_gdas.py\n")

pd.set_option('display.max_rows', None)

# Get the input PB2NC output filename as the input to this script
pb2nc = sys.argv[1]
DEBUG = sys.argv[2]
DEBUG = True if DEBUG in ['True','yes','true','Yes','YES','TRUE'] else False

# Get the Pandas dataframe of the PB2NC data
df = nc_point_obs(pb2nc).to_pandas()

# Group the 11-column data by station. This will effectively create "soundings" for each site
groups = df.groupby('sid')
print(f"FOUND {groups.ngroups} SITES TO PROCESS")

# The first row of each group contains the metadata we want to retain
point_data = groups.first().reset_index()[['sid','typ','vld','lat','lon','elv']]

# Filter out stations to not process here
point_data['site_digit'] = point_data['sid'].astype('str').str[0]
point_data = point_data[point_data['site_digit'].isin(['7'])]
point_data = point_data.drop(['site_digit'],axis=1)

# Array to hold the CTP values for each station
ctp = np.array([])

# Process each group, which is defined as a single site
# Each site will have the MET 11-column data.
for name,group in groups:

  if DEBUG:
    print("")
    print(f"PROCESSING SITE: {name}")
  
  # First, make sure there is only one valid time
  timegrp = group.groupby('vld')
  if timegrp.ngroups>1:
    if DEBUG:
      print("INFO: FOUND MULTIPLE SOUNDINGS FOR THIS SITE.")
      print("USING THE FIRST")
    timegrp_name = [sg_name for sg_name,sg_df in timegrp]
    prof = timegrp.get_group(timegrp_name[0])
  else:
    prof = group

  # Filter out stations that we don't want to process
  if name not in point_data['sid'].values.tolist():
    continue

  # In each group here, there are rows for each variable at each pressure level.
  # Thus, we can subset based on TMP and just use the pressure data for TMP.
  sub = prof[prof['var']=='TMP']

  # Ensure the temperature subset has data
  if len(sub)==0:
    if DEBUG:
      print("ERROR! NO TMP DATA.")
      print(f"UNABLE TO COMPUTE CTP FOR SID: {name}")
    ctp = np.append(ctp,-9999.)
    continue

  # Subset out the data for this site into individual arrays
  # and add units using MetPy
  tmpsub = sub['obs'].astype('float').values*units('degK')
  prssub = sub['lvl'].astype('float').values*units('hPa')
  qcsub = sub['qc'].astype('int').values

  # The pressures must exceed 300 hPa above the lowest in the sounding
  if max(prssub.m)<= min(prssub.m+300.0):
    if DEBUG:
      print("ERROR! SOUNDING TOP PRESSURE DOES NOT EXCEED 300 hPa ABOVE THE LOWEST PRESSURE.")
      print(f"UNABLE TO COMPUTE CTP FOR SID: {name}")
    ctp = np.append(ctp,-9999.)
  else:
    # Append the CTP value
    thisctp = land_surface.calc_ctp(prssub,tmpsub,-1)
    ctp = np.append(ctp,thisctp.m)

# After each station is processed, add in the missing 11-column data
# lvl --> set to 1000.0
# hgt --> set to 0
# qc --> set to 'NA' for now
# var --> set to "CTP"
# typ --> reset from ADPUPA to ADPSFC
point_data['obs'] = ctp
point_data['lvl'] = [1000.0]*len(point_data)
point_data['hgt'] = [0]*len(point_data)
point_data['qc'] = ['NA']*len(point_data)
point_data['var'] = ['CTP']*len(point_data)
point_data['typ'] = ['ADPSFC']*len(point_data)

# Assign proper dtypes
met_col_dtypes = {'typ':'string',
                  'sid':'string',
                  'vld':'string',
                  'lat':'float64',
                  'lon':'float64',
                  'elv':'float64',
                  'var':'string',
                  'lvl':'float64',
                  'hgt':'float64',
                  'qc':'string',
                  'obs':'float64'}
point_data = point_data.astype(met_col_dtypes)

# Reorder the columns to be correct
point_data = point_data[['typ','sid','vld','lat','lon','elv','var','lvl','hgt','qc','obs']]

# Convert to MET object
point_data = point_data.values.tolist()

parm/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_hi_fcst_HR1.py

import multiprocessing
import numpy as np
import os
import sys
import xarray as xr
from itertools import repeat
from metcalcpy.diagnostics.land_surface import calc_humidity_index
from metpy.units import units
from metpy.calc import dewpoint_from_specific_humidity

print("\nSTARTING pyembed_hi_fcst_HR1.py\n")

# Obtain the command line arguments
input_file = sys.argv[1]
tmpvarname = sys.argv[2]
prsvarname = sys.argv[3]
sphvarname = sys.argv[4]
mask_file  = sys.argv[5]

# Open the input_file as an Xarray Dataset
if os.path.splitext(input_file)[1]=='.nc':
  ds = xr.open_dataset(input_file)
  ds = ds[[tmpvarname,prsvarname,sphvarname,'pressfc']]
else:
  print("FATAL! pyembed_hi_fcst_HR1.py.")
  print("Unable to open input file.")
  sys.exit(1)

# Determine the input dims
indims = ds.sizes
if ('grid_xt' not in ds.coords) or ('grid_yt' not in ds.coords):
  print("FATAL! unexpected dimension names in FCST file.")
  sys.exit(1)
else:
  ny = indims['grid_yt']
  nx = indims['grid_xt']

# Open the mask file
maskdata = xr.open_dataset(mask_file)

# Add the mask variable to the data
ds['maskvar'] = xr.DataArray(maskdata['RAOB_SITES'].values,dims=['grid_yt','grid_xt'],coords={'grid_yt':ds.grid_yt,'grid_xt':ds.grid_xt})

# The files that were used to develop this use case need special treatment of the pressure field.
# Find the "bk_interp" attribute
try:
  bk = ds.attrs['bk']
except KeyError:
  print("ERROR! Required attribute \"bk\" not found in:")
  print(input_file)
  print("UNABLE TO CONTINUE.")
  exit(1)

# Reverse bk, sfc pressure is -1 so make it item 0
bk = bk[::-1]

# The adjustment is at the half levels, but pressures are at the full levels.
# Average each pair of data to create n-1 number of bk values to use.
bk_interp = np.array([np.mean([bk[n],bk[n+1]]) for n in range(0,len(bk)-1)])

# Filter out values where bk=0
bk_interp = bk_interp[bk_interp>0.0]

# The model data are on terrain-following levels so it can't have a constant z-coordinate.
# Thus, we define a new z-coordinate "z0" of integers representing the levels
z0 = xr.DataArray(range(0,len(bk_interp)),dims=['z0'],coords={'z0':range(0,len(bk_interp))},attrs={'units':'levelnumber'})

# Next get the temperature and specific humidity data. It's stored upside-down so reverse it along the vertical dimension
tmp3d = ds[tmpvarname].squeeze().reindex(pfull=ds.pfull[::-1])
sph3d = ds[sphvarname].squeeze().reindex(pfull=ds.pfull[::-1])

# Subset the temperature and specific humidity data so it only has data where the bk_interp variable is available
tmp3d = tmp3d.isel(pfull=slice(0,len(z0)))
sph3d = sph3d.isel(pfull=slice(0,len(z0)))

# Change the vertical coordinate and dimension for the temperature and specific humidity data to be z0
tmp3d = tmp3d.rename({'pfull':'z0'}).assign_coords({'z0':z0})
tmp3d = tmp3d*units('degK')
sph3d = sph3d.rename({'pfull':'z0'}).assign_coords({'z0':z0})
sph3d = sph3d*units('kg/kg')

# Create the 3D pressure variable
prs3d = xr.DataArray(bk_interp,dims=['z0'],coords={'z0':z0},attrs={'units':'Pa'}).broadcast_like(tmp3d)
prs3d = (prs3d*(ds['pressfc'].squeeze()))*units('Pa').to('hPa')

# Compute dewpoint temperature from specific humidity
dew3d = dewpoint_from_specific_humidity(prs3d,sph3d)
dew3d = dew3d*units('degK')

# Get a pool of workers
mp = multiprocessing.Pool(multiprocessing.cpu_count()-2)

# Stack the data in the x-y dimension into a single dimension named "sid".
# This treats each grid cell/column like a "site"
tmpstack = tmp3d.stack(sid=("grid_yt","grid_xt"))
prsstack = prs3d.stack(sid=("grid_yt","grid_xt"))
dewstack = dew3d.stack(sid=("grid_yt","grid_xt"))
mskstack = ds['maskvar'].stack(sid=("grid_yt","grid_xt"))

# Create an Xarray DataArray like the stacked variables to hold the results
resstack = xr.full_like(mskstack,-9999.).rename('humidity_index')

# Subset the data to only the points where the mask is
prs_mask = prsstack[:,mskstack>0]
tmp_mask = tmpstack[:,mskstack>0]
dew_mask = dewstack[:,mskstack>0]

print("")
print("COMPUTING HUM. INDEX FOR %10d CELLS." % (int(tmpstack[:,mskstack>0].sizes['sid'])))
print("")
result = mp.starmap(calc_humidity_index,([prs_mask,tmp_mask,dew_mask,sidx] for sidx in list(range(0,tmp_mask.sizes['sid']))))
result = [x.m for x in result]

# Re-populate the stacked array with the values at the correct locations
resstack[mskstack>0] = result

# Unstack the data from the `sid` dimension back to just grid_xt and grid_yt (2D) and obtain the NumPy N-D array
met_data = resstack.unstack()
met_data = met_data.reindex(grid_yt=met_data.grid_yt[::-1])
met_data.to_netcdf('test.nc')
met_data = met_data.values

grid_attrs = {}
grid_attrs['type'] = 'Gaussian'
grid_attrs['name'] = 'HR1'
grid_attrs['lon_zero'] = 0.0
grid_attrs['nx'] = nx
grid_attrs['ny'] = ny

attrs = {}
attrs['valid'] = '20200805_120000'
attrs['init'] = '20200803_000000'
attrs['lead'] = '600000'
attrs['accum'] = '000000'
attrs['name'] = 'testing'
attrs['long_name'] = 'long_test'
attrs['level'] = 'surface'
attrs['units'] = 'test'
attrs['fill_value'] = -9999.
attrs['grid'] = grid_attrs

parm/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/pyembed_hi_obs_gdas.py

import sys
import numpy as np
import pandas as pd
from met.point_nc import nc_point_obs
from metcalcpy.diagnostics import land_surface
from metpy.units import units

print("\nSTARTING pyembed_hi_obs_gdas.py\n")

pd.set_option('display.max_rows', None)

# Get the input PB2NC output filename as the input to this script
pb2nc = sys.argv[1]
DEBUG = sys.argv[2]
DEBUG = True if DEBUG in ['True','yes','true','Yes','YES','TRUE'] else False

# Get the Pandas dataframe of the PB2NC data
df = nc_point_obs(pb2nc).to_pandas()

# Group the 11-column data by station. This will effectively create "soundings" for each site
groups = df.groupby('sid')
print(f"FOUND {groups.ngroups} SITES TO PROCESS")

# The first row of each group contains the metadata we want to retain
point_data = groups.first().reset_index()[['sid','typ','vld','lat','lon','elv']]

# Filter out stations to not process here
point_data['site_digit'] = point_data['sid'].astype('str').str[0]
point_data = point_data[point_data['site_digit'].isin(['7'])]
point_data = point_data.drop(['site_digit'],axis=1)

# Array to hold the HI values for each station
hi = np.array([])

# Process each group, which is defined as a single site
# Each site will have the MET 11-column data.
for name,group in groups:

  if DEBUG:
    print("")
    print(f"PROCESSING SITE: {name}")
  
  # First, make sure there is only one valid time
  timegrp = group.groupby('vld')
  if timegrp.ngroups>1:
    if DEBUG:
      print("INFO: FOUND MULTIPLE SOUNDINGS FOR THIS SITE.")
      print("USING THE FIRST")
    timegrp_name = [sg_name for sg_name,sg_df in timegrp]
    prof = timegrp.get_group(timegrp_name[0])
  else:
    prof = group

  # Filter out stations that we don't want to process
  if name not in point_data['sid'].values.tolist():
    continue

  # For the current sounding, pull out the TMP and DPT rows
  tmpsub = prof[prof['var']=='TMP']
  dewsub = prof[prof['var']=='DPT']
  
  # Store number of rows for each variable
  ntmp = len(tmpsub)
  ndew = len(dewsub)

  # If there is no TMP or DPT data, skip this site.
  if (ntmp==0) or (ndew==0):
    if DEBUG:
      print("ERROR! NO TMP OR DPT DATA!")
      print(f"UNABLE TO COMPUTE HI FOR SID: {name}")
    hi = np.append(hi,-9999.)
    continue

  # Pull out the actual data values and assign units with MetPy
  tmparr = tmpsub['obs'].astype('float').values*units('degK')
  dewarr = dewsub['obs'].astype('float').values*units('degK')
  prsarr = tmpsub['lvl'].astype('float').values*units('hPa')
  
  # The pressures must exceed 300 hPa above the lowest in the sounding
  if np.max(prsarr.m)<= np.min(prsarr.m+300.0):
    if DEBUG:
      print("ERROR! SOUNDING TOP PRESSURE DOES NOT EXCEED 300 hPa ABOVE THE LOWEST PRESSURE.")
      print(f"UNABLE TO COMPUTE HI FOR SID: {name}")
    hi = np.append(hi,-9999.)
  elif not len(prsarr)==len(tmparr)==len(dewarr):
    if DEBUG:
      print("ERROR! UNEQUAL LENGTH DATA.")
      print(f"UNABLE TO COMPUTE HI FOR SID: {name}")
      print(f"FOUND {ntmp} TMP OBS")
      print(f"FOUND {ndew} DEW OBS")
    hi = np.append(hi,-9999.)
  else:
    # Append the HI value
    thishi = land_surface.calc_humidity_index(prsarr,tmparr,dewarr,-1)
    hi = np.append(hi,thishi.m)

# After each station is processed, add in the missing 11-column data
# lvl --> set to 1000.0
# hgt --> set to 0
# qc --> set to 'NA' for now
# var --> set to "CTP"
# typ --> reset from ADPUPA to ADPSFC
point_data['obs'] = hi
point_data['lvl'] = [1000.0]*len(point_data)
point_data['hgt'] = [0]*len(point_data)
point_data['qc'] = ['NA']*len(point_data)
point_data['var'] = ['HI']*len(point_data)
point_data['typ'] = ['ADPSFC']*len(point_data)

# Assign proper dtypes
met_col_dtypes = {'typ':'string',
                  'sid':'string',
                  'vld':'string',
                  'lat':'float64',
                  'lon':'float64',
                  'elv':'float64',
                  'var':'string',
                  'lvl':'float64',
                  'hgt':'float64',
                  'qc':'string',
                  'obs':'float64'}
point_data = point_data.astype(met_col_dtypes)

# Reorder the columns to be correct
point_data = point_data[['typ','sid','vld','lat','lon','elv','var','lvl','hgt','qc','obs']]

# Convert to MET object
point_data = point_data.values.tolist()

For more information on the basic requirements to utilize Python Embedding in METplus, please refer to the MET User’s Guide section on Python embedding.

User Scripting

This use case uses a Python script to create an input file for GenVxMask to use based on the upper-air sites the user wishes to include. In the METplus configuration file, a user can set the USER_SCRIPT_SITES_TO_INCLUDE configuration item. This is a comma-separated string of integers representing the first digit in the WMO station ID of upper-air sites to include. For example, in the United States many of the upper-air sites begin with “7”. If a user wants to use only U.S. sites, they would set this just to 7. To include other countries, include more digits. By default, this is set to only 7 in order to decrease the total run time of the use case for testing and demonstration.

A second user script is used to plot matched forecast and observation pairs (MPR) from PointStat MPR text files in order to visualize the two metrics in the same space.

In addition to the basic package requirements for Python embedding, the plot_2panel_ctp_hi.py Python script also requires

Matplotlib

parm/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI/plot_2panel_ctp_hi.py

import glob
import matplotlib.pyplot as plt
import os
import pandas as pd
import sys

outdir = sys.argv[1]

# Directories where point_stat MPR files are
ctp_dir = f'{outdir}/point_stat/CTP'
hmi_dir = f'{outdir}/point_stat/HI'

# Get a list of files for CIP and HI and sort them
ctp_files = glob.glob(os.path.join(ctp_dir,'*_mpr.txt'))
ctp_files.sort()
hmi_files = glob.glob(os.path.join(hmi_dir,'*_mpr.txt'))
hmi_files.sort()

# Determine the MPR column names and store them for later
ctp_cols = pd.read_csv(ctp_files[0],nrows=1,header=None)[0].str.split(expand=True).iloc[0].tolist()
hmi_cols = pd.read_csv(hmi_files[0],nrows=1,header=None)[0].str.split(expand=True).iloc[0].tolist()

# Read each MPR file into a dataframe without the header
ctp_list = [pd.read_csv(x,header=None,skiprows=1)[0].str.split(expand=True) for x in ctp_files]
hmi_list = [pd.read_csv(x,header=None,skiprows=1)[0].str.split(expand=True) for x in hmi_files]

# Concatenate all the MPR files into a single dataframe
ctp_df = pd.concat(ctp_list)
hmi_df = pd.concat(hmi_list)

# Add the columns
ctp_df.columns = ctp_cols
hmi_df.columns = hmi_cols

# Align the sites between CTP/HMI- sometimes the calculation will fail at some sites leading
# to an inconsistent match between sites with both, or only 1 of the two metrics
ctp_df = ctp_df[ctp_df['OBS_SID'].isin(hmi_df['OBS_SID'])]
hmi_df = hmi_df[hmi_df['OBS_SID'].isin(ctp_df['OBS_SID'])]

# Sort each subset based on some criteria. Note that there could be multiple forecast lead times
# at the same valid time, for each station.
ctp_sort = ctp_df.sort_values(['FCST_LEAD','FCST_VALID_BEG','OBS_SID'])
hmi_sort = hmi_df.sort_values(['FCST_LEAD','FCST_VALID_BEG','OBS_SID'])

fig, (ax1, ax2) = plt.subplots(2,1,figsize=(22,15))
fcst = ax1.scatter(hmi_sort['FCST'].astype('float'),ctp_sort['FCST'].astype('float'),c='k',s=50)
obs = ax1.scatter(hmi_sort['OBS'].astype('float'),ctp_sort['OBS'].astype('float'),c='b',s=50)
ax1.legend([fcst,obs],['FCST','OBS'],fontsize=18)
ax1.set_ylabel('CTP (J/kg)',fontsize=24)
ax1.set_xlabel('Humidity Index (C)',fontsize=24)
ax1.set_xlim([-10,100])
ax1.set_ylim([-1100,1000])
ax1.set_yticks([-1000,-750,-500,-250,0,250,500,750,1000])
ax1.set_xticks(ax1.get_xticks())
ax1.set_yticklabels(ax1.get_yticks(),fontsize=24)
ax1.set_xticklabels(ax1.get_xticks(),fontsize=24)
ax1.axhline(0.0,linewidth=2,c='k')
ax1.grid(which='both')
nsites = int(len(hmi_sort))
ax1.set_title(f'Convective Triggering Potential vs. Humidity Index for {nsites} Fcst/Obs Pairs',loc='center',fontsize=32)
ax2.scatter(hmi_sort['FCST'].astype('float')-hmi_sort['OBS'].astype('float'),ctp_sort['FCST'].astype('float')-ctp_sort['OBS'].astype('float'),c='r',s=50)
ax2.set_ylabel('CTP [FCST-OBS] (J/kg)',fontsize=24)
ax2.set_xlabel('Humidity Index [FCST-OBS] (C)',fontsize=24)
ax2.axhline(0.0,linewidth=2,c='k')
ax2.axvline(0.0,linewidth=2,c='k')
ax2.grid(which='both')
ax2.set_xlim([-50,50])
ax2.set_ylim([-1100,1000])
ax2.set_yticks([-1000,-750,-500,-250,0,250,500,750,1000])
ax2.set_xticks(ax2.get_xticks())
ax2.set_yticklabels(ax2.get_yticks(),fontsize=24)
ax2.set_xticklabels(ax2.get_xticks(),fontsize=24)
plt.savefig(f'{outdir}/compare_ctp_hi.png')

Running METplus

Note

Prior to running this use case, the user should ensure that the version of Python in their environment meets the requirements for the Python UserScripts. These requirements are noted earlier in this documentation. Additionally, if the user is using a version of MET that was not compiled against a Python installation meeting the requirements for Python embedding (also described earlier), then the user should set the MET_PYTHON_EXE configuration variable under the [user_env_var] section of their local METplus configuration file to “python3”. When a user runs the Linux command “which python3”, it should return the path to the Python installation that meets the requirements for either the Python UserScripts, and if needed, Python embedding scripts. This will ensure the correct Python is used for Python embedding (if needed), and the Python UserScripts.

Pass the use case configuration file to the run_metplus.py script along with any user-specific system configuration files if desired:

run_metplus.py /path/to/METplus/parm/use_cases/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI.conf /path/to/user_system.conf

See Running METplus for more information.

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in {OUTPUT_BASE}/model_applications/land_surface/PointStat_fcstUFS_obsGDAS_CTP_HI and will contain the following files:

* point_stat/CTP/point_stat_600000L_20200805_120000V_mpr.txt
* point_stat/CTP/point_stat_600000L_20200805_120000V.stat
* point_stat/HI/point_stat_600000L_20200805_120000V_mpr.txt
* point_stat/HI/point_stat_600000L_20200805_120000V.stat
* compare_ctp_hi.png
* gen_vx_mask/pb2nc_adpupa_latlon.txt
* gen_vx_mask/raob_masks.nc
* obs/netcdf/prepbufr.gdas.2020080512.nc

#
# Each file should contain corresponding statistics for the line type(s) requested.

Keywords

Note

PointStatToolUseCase
PythonEmbeddingFileUseCase
METcalcpyUseCase
LandSurfaceAppUseCase
UserScriptUseCase
VxDataGDASPREP

Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.

sphinx_gallery_thumbnail_path = ‘_static/land_surface-PointStat_fcstUFS_obsGDAS_CTP_HI.png’

Gallery generated by Sphinx-Gallery