StatAnalysis: IODAv1

model_applications/data_assimilation/StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface.conf

Scientific Objective

This use case demonstrates the Stat-Analysis tool and the ingestion of HofX netCDF files that have been output from the Joint Effort for Data Assimilation Integration (JEDI) data assimilation system. JEDI uses “IODA” formatted files, which are netCDF files with specific requirements for variables and naming conventions. These files hold observations to be assimilated into forecasts, in this case the FV3-based Hurricane Analysis and Forecast System (HAFS). HAFS performs TC initialization by using synthetic observations of conventional variables to relocate a tropical cyclone as informed by a vortex tracker, in this case for Tropical Storm Dorian.

In this case, 100224 observations from 2019082418 are used. These were converted from prepBUFR files via a Fortran ioda-converter provided by the Joint Center for Satellite Data Assimilation, which oversees the development of JEDI. The variables used are t, q, u, and v.

The first component of JEDI to be incorporated into operational systems will be the Unified Forward Operator (UFO), which will replace the GSI observer in global EnKF forecasts. UFO is a component of HofX, which maps the background forecast to observation space to form O minus B pairs. The HofX application of JEDI takes the input IODA files and adds an additional variable, <variable_name>@hofx, to be paired with <variable_name>@ObsValue. These HofX files are used as input to form Matched Pair (MPR) formatted lists via Python embedding. Stat-Analysis then performs a filter job and writes the filtered MPR formatted columns to an ASCII file.
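The @ObsValue/@hofx pairing described above can be sketched with plain NumPy on synthetic data. This is an illustration only: the variable naming and the large fill value marking missing observations follow the conventions used in this use case, but the arrays here are made up.

```python
import numpy as np

# Synthetic IODA-style columns for one variable (illustrative values only)
obs_value = np.array([288.1, 1.0e9, 290.4, 287.6])  # <var>@ObsValue; 1e9 marks missing
hofx      = np.array([287.9, 289.0, 290.9, 287.1])  # <var>@hofx (background in obs space)

# Keep only locations with real observations, then form O minus B pairs
valid = obs_value < 1e9
omb = obs_value[valid] - hofx[valid]

print("paired obs:", int(valid.sum()))
print("O-B:", omb)
```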

Datasets

Data source: JEDI HofX output files in IODA format
Location: All of the input data required for this use case can be found in the met_test sample data tarball. Visit the METplus releases page and download the sample data for the appropriate release: https://github.com/dtcenter/METplus/releases
The tarball should be unpacked into the directory to which you will set the value of INPUT_BASE. See the Running METplus section for more information.

METplus Components

This use case utilizes the METplus StatAnalysis wrapper to search for files that are valid for the given case and generate a command to run the MET tool stat_analysis.

METplus Workflow

StatAnalysis is the only tool called in this example. It processes the following run times:

Valid: 2019-08-24_18Z
Forecast lead: 6 hours

METplus Configuration

METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line with the -c option, i.e. -c parm/use_cases/model_applications/data_assimilation/StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface.conf

[config]

# Documentation for this use case can be found at
# https://metplus.readthedocs.io/en/latest/generated/model_applications/data_assimilation/StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface.html

# For additional information, please see the METplus Users Guide.
# https://metplus.readthedocs.io/en/latest/Users_Guide

###
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
###

PROCESS_LIST = StatAnalysis


###
# Time Info
# LOOP_BY options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
#   INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
#   VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
# LEAD_SEQ is the list of forecast leads to process
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#timing-control
###

LOOP_BY = VALID

VALID_TIME_FMT = %Y%m%d%H
VALID_BEG=2005080700
VALID_END=2005080700
VALID_INCREMENT = 12H

LEAD_SEQ = 0


###
# File I/O
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#directory-and-filename-template-info
###

MODEL1_STAT_ANALYSIS_LOOKIN_DIR = python {PARM_BASE}/use_cases/model_applications/data_assimilation/StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface/read_ioda_mpr.py {INPUT_BASE}/model_applications/data_assimilation/hofx_dir

STAT_ANALYSIS_OUTPUT_DIR = {OUTPUT_BASE}/model_applications/data_assimilation/StatAnalysis_HofX

MODEL1_STAT_ANALYSIS_DUMP_ROW_TEMPLATE = dump.out


###
# StatAnalysis Settings
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#statanalysis
###

MODEL1 = NA
MODEL1_OBTYPE = NA

STAT_ANALYSIS_JOB_NAME = filter
STAT_ANALYSIS_JOB_ARGS = -out_line_type CNT -dump_row [dump_row_file] -line_type MPR

MODEL_LIST =
DESC_LIST =
FCST_LEAD_LIST =
OBS_LEAD_LIST =
FCST_VALID_HOUR_LIST = 
FCST_INIT_HOUR_LIST =
OBS_VALID_HOUR_LIST =
OBS_INIT_HOUR_LIST =
FCST_VAR_LIST =
OBS_VAR_LIST =
FCST_UNITS_LIST =
OBS_UNITS_LIST =
FCST_LEVEL_LIST =
OBS_LEVEL_LIST =
VX_MASK_LIST =
INTERP_MTHD_LIST =
INTERP_PNTS_LIST =
FCST_THRESH_LIST =
OBS_THRESH_LIST =
COV_THRESH_LIST =
ALPHA_LIST =
LINE_TYPE_LIST =

GROUP_LIST_ITEMS =
LOOP_LIST_ITEMS = MODEL_LIST
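Conceptually, the job settings above correspond to invoking the MET tool directly with a filter job that reads MPR lines via Python embedding. The following is a sketch of such a command, with placeholder paths; the exact command METplus generates may differ:

```
stat_analysis \
  -lookin python /path/to/read_ioda_mpr.py /path/to/hofx_dir \
  -job filter -line_type MPR -out_line_type CNT -dump_row /path/to/dump.out
```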

MET Configuration

METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.

YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!

If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings

Note

See the StatAnalysis MET Configuration section of the User’s Guide for more information on the environment variables used in the file below:

////////////////////////////////////////////////////////////////////////////////
//
// STAT-Analysis configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Filtering input STAT lines by the contents of each column
//
${METPLUS_MODEL}
${METPLUS_DESC}

${METPLUS_FCST_LEAD}
${METPLUS_OBS_LEAD}

${METPLUS_FCST_VALID_BEG}
${METPLUS_FCST_VALID_END}
${METPLUS_FCST_VALID_HOUR}

${METPLUS_OBS_VALID_BEG}
${METPLUS_OBS_VALID_END}
${METPLUS_OBS_VALID_HOUR}

${METPLUS_FCST_INIT_BEG}
${METPLUS_FCST_INIT_END}
${METPLUS_FCST_INIT_HOUR}

${METPLUS_OBS_INIT_BEG}
${METPLUS_OBS_INIT_END}
${METPLUS_OBS_INIT_HOUR}

${METPLUS_FCST_VAR}
${METPLUS_OBS_VAR}

${METPLUS_FCST_UNITS}
${METPLUS_OBS_UNITS}

${METPLUS_FCST_LEVEL}
${METPLUS_OBS_LEVEL}

${METPLUS_OBTYPE}

${METPLUS_VX_MASK}

${METPLUS_INTERP_MTHD}

${METPLUS_INTERP_PNTS}

${METPLUS_FCST_THRESH}
${METPLUS_OBS_THRESH}
${METPLUS_COV_THRESH}

${METPLUS_ALPHA}

${METPLUS_LINE_TYPE}

column = [];

weight = [];

////////////////////////////////////////////////////////////////////////////////

//
// Array of STAT-Analysis jobs to be performed on the filtered data
//
${METPLUS_JOBS}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
out_alpha = 0.05;

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// WMO mean computation logic
//
wmo_sqrt_stats   = [ "CNT:FSTDEV",  "CNT:OSTDEV",  "CNT:ESTDEV",
                     "CNT:RMSE",    "CNT:RMSFA",   "CNT:RMSOA",
                     "VCNT:FS_RMS", "VCNT:OS_RMS", "VCNT:RMSVE",
                     "VCNT:FSTDEV", "VCNT:OSTDEV" ];

wmo_fisher_stats = [ "CNT:PR_CORR", "CNT:SP_CORR",
                     "CNT:KT_CORR", "CNT:ANOM_CORR" ];

////////////////////////////////////////////////////////////////////////////////

//hss_ec_value =
${METPLUS_HSS_EC_VALUE}
rank_corr_flag = FALSE;
vif_flag       = FALSE;

tmp_dir = "${MET_TMP_DIR}";

//version        = "V10.0";

${METPLUS_MET_CONFIG_OVERRIDES}

Python Embedding

This use case uses a Python embedding script to read input data

parm/use_cases/model_applications/data_assimilation/StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface/read_ioda_mpr.py

from __future__ import print_function

import pandas as pd
import os
from glob import glob
import sys
import xarray as xr
import datetime as dt

########################################################################

def read_netcdfs(files, dim):
    paths = sorted(glob(files))
    datasets = [xr.open_dataset(p) for p in paths]
    combined = xr.concat(datasets, dim)
    return combined

########################################################################
print('Python Script:\t', sys.argv[0])

# Input is directory of .nc or .nc4 files

if len(sys.argv) == 2:
    # Read the input file as the first argument
    input_dir = os.path.expandvars(sys.argv[1])
    try:
        print("Input File:\t" + repr(input_dir))
       
        # Read all from a directory 
        ioda_data = read_netcdfs(input_dir+'/*.nc*', dim='nlocs')
        
        # Grab variables list
        var_list = ioda_data['variable_names@VarMetaData'].isel(nlocs=[0]).str.decode('utf-8').values
        var_list = [i.strip() for i in var_list[0] if i]

        # Use only nlocs dimension to ensure a table
        ioda_data = ioda_data.drop_dims('nvars')
        ioda_df = ioda_data.to_dataframe()
           
        nlocs = len(ioda_df.index)
        print('Number of locations in set: ' + str(nlocs)) 

        # Decode strings
        ioda_df.loc[:,'datetime@MetaData'] = ioda_df.loc[:,'datetime@MetaData'].str.decode('utf-8') 
        ioda_df.loc[:,'station_id@MetaData'] = ioda_df.loc[:,'station_id@MetaData'].str.decode('utf-8')

        # Datetime format. Need YYYYMMDD_HHMMSS from YYYY-MM-DDTHH:MM:SSZ.
        time = ioda_df.loc[:,'datetime@MetaData'].values.tolist()

        for i in range(0,nlocs):        
            temp = dt.datetime.strptime(time[i], '%Y-%m-%dT%H:%M:%SZ')
            time[i] = temp.strftime('%Y%m%d_%H%M%S')
            
        ioda_df.loc[:,'datetime@MetaData'] = time

        mpr_data = []
        var_list = [i for i in var_list if i+'@hofx' in ioda_df.columns]

        for var_name in var_list:
            
            # Subset the needed columns
            ioda_df_var = ioda_df[['datetime@MetaData','station_id@MetaData',var_name+'@ObsType',
                                'latitude@MetaData','longitude@MetaData','air_pressure@MetaData',
                                var_name+'@hofx',var_name+'@ObsValue',
                                var_name+'@PreQC']]
            
            # Find locations with real ObsValues (1e9 is the missing-data fill value);
            # copy so the later column assignments do not trigger SettingWithCopyWarning
            ioda_df_var = ioda_df_var[ioda_df_var[var_name+'@ObsValue'] < 1e9].copy()
            nlocs = len(ioda_df_var.index)
            print(var_name+' has '+str(nlocs)+' obs.')
            
            # Add additional columns
            ioda_df_var['lead'] = '000000'
            ioda_df_var['MPR'] = 'MPR'
            ioda_df_var['nobs'] = nlocs
            ioda_df_var['index'] = range(0,nlocs)
            ioda_df_var['varname'] = var_name
            ioda_df_var['na'] = 'NA'

            # Arrange columns in MPR format
            cols = ['na','na','lead','datetime@MetaData','datetime@MetaData','lead','datetime@MetaData',
                    'datetime@MetaData','varname','na','lead','varname','na','na',
                    var_name+'@ObsType','na','na','lead','na','na','na','na','MPR',
                    'nobs','index','station_id@MetaData','latitude@MetaData','longitude@MetaData',
                    'air_pressure@MetaData','na',var_name+'@hofx',var_name+'@ObsValue',
                    var_name+'@PreQC','na','na']
            
            ioda_df_var = ioda_df_var[cols]

            # Into a list and all to strings
            mpr_data = mpr_data + [list( map(str,i) ) for i in ioda_df_var.values.tolist() ]
            
            print("Total Length:\t" + repr(len(mpr_data)))

    except NameError:
        print("Can't find the input files or the variables.")
        # var_list is undefined if reading the input files failed
        if 'var_list' in locals():
            print("Variables in this file:\t" + repr(var_list))
else:
    print("ERROR: read_ioda_mpr.py -> Must specify directory of files.\n")
    sys.exit(1)

########################################################################
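The script above hands its results to MET through the Python embedding interface for MPR input, which reads a variable named mpr_data from the script: a list of rows, where each row is a list of strings, one string per MPR column. A minimal illustration of that contract (the row here is schematic, not a complete MPR line):

```python
# Schematic row values; a real MPR line has many more columns
row_values = ["NA", "NA", "000000", "20190824_180000", "air_temperature", 288.1, 287.9]

# Every value must be a string, matching the map(str, ...) step in the script above
mpr_row = list(map(str, row_values))
mpr_data = [mpr_row]

print(len(mpr_data), "row(s); all strings:", all(isinstance(v, str) for v in mpr_row))
```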

Running METplus

It is recommended to run this use case by passing in StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface.conf, followed by a user-specific system configuration file:

run_metplus.py -c /path/to/StatAnalysis_fcstHAFS_obsPrepBufr_JEDI_IODA_interface.conf -c /path/to/user_system.conf

The following METplus configuration variables must be set correctly to run this example:

  • INPUT_BASE - Path to the directory where sample data tarballs are unpacked (see the Datasets section to obtain tarballs).

  • OUTPUT_BASE - Path where METplus output will be written. This must be in a location where you have write permissions.

  • MET_INSTALL_DIR - Path to the location where MET is installed locally.

Example User Configuration File:

[dir]
INPUT_BASE = /path/to/sample/input/data
OUTPUT_BASE = /path/to/output/dir
MET_INSTALL_DIR = /path/to/met-X.Y

NOTE: All of these items must be found under the [dir] section.

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in model_applications/data_assimilation/StatAnalysis_HofX (relative to OUTPUT_BASE) and will contain the following file:

  • dump.out

Keywords

  • StatAnalysisToolUseCase

  • PythonEmbeddingFileUseCase

  • TCandExtraTCAppUseCase

  • NOAAEMCOrgUseCase

  • IODA2NCToolUseCase

Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.
