Note

Go to the end to download the full example code.

StatAnalysis: Met Office LFRic UGRID

model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed.conf

Scientific Objective

This use case demonstrates the use of python embedding to ingest and perform verification on an unstructured grid. This foregoes the need to interpolate to a regular grid as a step in the verification process, thereby avoiding any incurred interpolation error in the process.

In particular, this use case ingests a UK MET Office LFRic forecast file in NetCDF format, which resides in the UGRID format of the cubed-sphere. The python library Iris was developed to perform analysis on various UGRID formats, and is employed here to ingest the file as well as perform direct interpolation from the native forecast grid to observation locations, thereby forming matched pairs to pass to stat_analysis. In order to perform the interpolation using a nearest-neighbors approach, the geovista python package is also used to form a KD tree to be used in identifying the interpolation points to be used. This package is located at https://github.com/bjlittle/geovista/ and can be installed from a development version. It is also required to install the pyvista python package. ASCII files containing observations are also ingested.

The python embedding script itself performs the interpolation in time, and for this use case thins the observation data in order to reduce the run time. It is also noted that the observations for this use case were fabricated and correlated observation-forecast pairs are not expected.

Version Added

METplus version 5.0

Datasets

Forecast: UK MET Office LFRic forecast files in UGRID NetCDF format

Observation: Observations in ASCII format

Climatology: None

Location: All of the input data required for this use case can be found in a sample data tarball. Each use case category will have one or more sample data tarballs. It is only necessary to download the tarball with the use case’s dataset and not the entire collection of sample data. Click here to access the METplus releases page and download sample data for the appropriate release: https://github.com/dtcenter/METplus/releases This tarball should be unpacked into the directory that you will set the value of INPUT_BASE. See Running METplus section for more information.

METplus Components

This use case utilizes the METplus StatAnalysis wrapper to search for files that are valid for the given case and generate a command to run the MET tool stat_analysis.

METplus Workflow

Beginning time (VALID_BEG): 2021050500

End time (VALID_END): 2021050500

Increment between beginning and end times (VALID_INCREMENT): 6H

Sequence of forecast leads to process (LEAD_SEQ): 0

StatAnalysis is the only tool called in this example. It processes the following run times:

Valid: 2021-05-05_00Z

Forecast lead: 12 hour

METplus Configuration

METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line, i.e. parm/use_cases/model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed.conf

[config]

# Documentation for this use case can be found at
# https://metplus.readthedocs.io/en/latest/generated/model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed.html

# For additional information, please see the METplus Users Guide.
# https://metplus.readthedocs.io/en/latest/Users_Guide

###
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
###

PROCESS_LIST = StatAnalysis


###
# Time Info
# LOOP_BY options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
#   INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
#   VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
# LEAD_SEQ is the list of forecast leads to process
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#timing-control
###

LOOP_BY = VALID

VALID_TIME_FMT = %Y%m%d%H
VALID_BEG=2021050500
VALID_END=2021050500
VALID_INCREMENT = 6H

LEAD_SEQ = 0


###
# File I/O
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#directory-and-filename-template-info
###

MODEL1_STAT_ANALYSIS_LOOKIN_DIR = python {PARM_BASE}/use_cases/model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed/ugrid_lfric_mpr.py {INPUT_BASE}/model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed/fcst_data/lfric_ver_20210505_0000.nc {INPUT_BASE}/model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed/obs_data

STAT_ANALYSIS_OUTPUT_DIR = {OUTPUT_BASE}/StatAnalysis_UGRID
STAT_ANALYSIS_OUTPUT_TEMPLATE = job.out
MODEL1_STAT_ANALYSIS_DUMP_ROW_TEMPLATE = dump.out


###
# StatAnalysis Settings
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#statanalysis
###

MODEL1 = NA
MODEL1_OBTYPE = NA

STAT_ANALYSIS_JOB_NAME = aggregate_stat
STAT_ANALYSIS_JOB_ARGS = -out_line_type CNT -dump_row [dump_row_file] -line_type MPR -by FCST_VAR

MODEL_LIST =
DESC_LIST =
FCST_LEAD_LIST =
OBS_LEAD_LIST =
FCST_VALID_HOUR_LIST = 
FCST_INIT_HOUR_LIST =
OBS_VALID_HOUR_LIST =
OBS_INIT_HOUR_LIST =
FCST_VAR_LIST =
OBS_VAR_LIST =
FCST_UNITS_LIST =
OBS_UNITS_LIST =
FCST_LEVEL_LIST =
OBS_LEVEL_LIST =
VX_MASK_LIST =
INTERP_MTHD_LIST =
INTERP_PNTS_LIST =
FCST_THRESH_LIST =
OBS_THRESH_LIST =
COV_THRESH_LIST =
ALPHA_LIST =
LINE_TYPE_LIST =

GROUP_LIST_ITEMS =
LOOP_LIST_ITEMS = MODEL_LIST

MET Configuration

METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.

YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!

If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings

STATAnalysisConfig_wrapped

////////////////////////////////////////////////////////////////////////////////
//
// STAT-Analysis configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Filtering input STAT lines by the contents of each column
//
//model = [
${METPLUS_MODEL}

//desc  = [
${METPLUS_DESC}

//fcst_lead = [
${METPLUS_FCST_LEAD}

//obs_lead  = [
${METPLUS_OBS_LEAD}

//fcst_valid_beg  =
${METPLUS_FCST_VALID_BEG}

//fcst_valid_end  =
${METPLUS_FCST_VALID_END}

fcst_valid_inc  = [];
fcst_valid_exc  = [];

//fcst_valid_hour = [
${METPLUS_FCST_VALID_HOUR}


//obs_valid_beg   =
${METPLUS_OBS_VALID_BEG}

//obs_valid_end   =
${METPLUS_OBS_VALID_END}

obs_valid_inc   = [];
obs_valid_exc   = [];

//obs_valid_hour  = [
${METPLUS_OBS_VALID_HOUR}


//fcst_init_beg   =
${METPLUS_FCST_INIT_BEG}

//fcst_init_end   =
${METPLUS_FCST_INIT_END}

fcst_init_inc   = [];
fcst_init_exc   = [];

//fcst_init_hour  = [
${METPLUS_FCST_INIT_HOUR}


//obs_init_beg    =
${METPLUS_OBS_INIT_BEG}

//obs_init_end    =
${METPLUS_OBS_INIT_END}

obs_init_inc    = [];
obs_init_exc    = [];

//obs_init_hour   = [
${METPLUS_OBS_INIT_HOUR}


//fcst_var = [
${METPLUS_FCST_VAR}
//obs_var  = [
${METPLUS_OBS_VAR}

//fcst_units = [
${METPLUS_FCST_UNITS}
//obs_units  = [
${METPLUS_OBS_UNITS}

//fcst_lev = [
${METPLUS_FCST_LEVEL}
//obs_lev  = [
${METPLUS_OBS_LEVEL}

//obtype = [
${METPLUS_OBTYPE}

//vx_mask = [
${METPLUS_VX_MASK}

//interp_mthd = [
${METPLUS_INTERP_MTHD}

//interp_pnts = [
${METPLUS_INTERP_PNTS}

//fcst_thresh = [
${METPLUS_FCST_THRESH}
//obs_thresh = [
${METPLUS_OBS_THRESH}
//cov_thresh = [
${METPLUS_COV_THRESH}

//alpha = [
${METPLUS_ALPHA}

//line_type = [
${METPLUS_LINE_TYPE}

//column = [
${METPLUS_COLUMN}

//weight = [
${METPLUS_WEIGHT}

////////////////////////////////////////////////////////////////////////////////

//
// Array of STAT-Analysis jobs to be performed on the filtered data
//
//jobs = [
${METPLUS_JOBS}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
out_alpha = 0.05;

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// WMO mean computation logic
//
wmo_sqrt_stats   = [ "CNT:FSTDEV",  "CNT:OSTDEV",  "CNT:ESTDEV",
                     "CNT:RMSE",    "CNT:RMSFA",   "CNT:RMSOA",
                     "VCNT:FS_RMS", "VCNT:OS_RMS", "VCNT:RMSVE",
                     "VCNT:FSTDEV", "VCNT:OSTDEV" ];

wmo_fisher_stats = [ "CNT:PR_CORR", "CNT:SP_CORR",
                     "CNT:KT_CORR", "CNT:ANOM_CORR" ];

////////////////////////////////////////////////////////////////////////////////

//
// Skill score index options
//
//ss_index_name =
${METPLUS_SS_INDEX_NAME}
//ss_index_vld_thresh =
${METPLUS_SS_INDEX_VLD_THRESH}

////////////////////////////////////////////////////////////////////////////////

//hss_ec_value =
${METPLUS_HSS_EC_VALUE}
rank_corr_flag = FALSE;
vif_flag       = FALSE;

tmp_dir = "${MET_TMP_DIR}";

//version        = "V10.0";

${METPLUS_MET_CONFIG_OVERRIDES}

Python Embedding

This use case uses a Python embedding script to read input data.

parm/use_cases/model_applications/unstructured_grids/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed/ugrid_lfric_mpr.py

from __future__ import print_function

import math
import pandas as pd
import numpy as np
import os
from glob import glob
import sys
import xarray as xr
import datetime as dt
import iris
from iris.experimental.ugrid import PARSE_UGRID_ON_LOAD
#geovista from https://github.com/bjlittle/geovista/
import geovista as gv
import geovista.theme
from geovista.common import to_cartesian
import netCDF4
import pyvista as pv
from pykdtree.kdtree import KDTree

from pathlib import Path
from typing import Optional

import matplotlib.pyplot as plt

print(f"{iris.__version__=}")
print(f"{gv.__version__=}")

########################################################################

def read_ascii_obs(files):
    paths = sorted(glob(files))
    datasets = [pd.read_table(p, header=None, delim_whitespace=True) for p in paths]
    combined = pd.concat(datasets)
    return combined

def load_ugrid(
    fname: str,
    data: Optional[bool] = False,
    constraint: Optional[str] = None,
    verbose: Optional[bool] = False
) -> pv.PolyData:
#    fname = BASE_DIR / fname
    with PARSE_UGRID_ON_LOAD.context():
        cube = iris.load_cube(fname, constraint=constraint)
        
    if cube.ndim > 1:
        cube = cube[(0,) * (cube.ndim - 1)]
    
    if verbose:
        print(cube)
    
    data = cube.data if data else None
        
    face_node = cube.mesh.face_node_connectivity
    indices = face_node.indices_by_location()
    lons, lats = cube.mesh.node_coords

    mesh = gv.Transform.from_unstructured(
        lons.points,
        lats.points,
        indices,
        data=data,
        start_index=face_node.start_index,
        name=cube.name(),
    )

    if data is None:
        mesh.active_scalars_name = None
    
    return mesh

def info(mesh: pv.PolyData) -> None:
    print(f"The mesh is a C{int(math.sqrt(mesh.n_cells / 6))}, with 6 panels, {int(mesh.n_cells / 6):,d} cells per panel, and {mesh.n_cells:,d} cells.")

def find_nearest(tree, points, poi, k):
    # lat/lon to xyz
    xyz = to_cartesian(*poi)
    
    # find the k nearest euclidean neighbours
    dist, idxs = tree.query(xyz, k=k)
    
    if idxs.ndim > 1:
        idxs = idxs[0]
    
    # retieve the associated xyz points of the k nearest neighbours
    nearest = points[idxs]
    
    return xyz, nearest, idxs

def to_centers(mesh: pv.PolyData) -> pv.PolyData:
    tmp = mesh.copy()
    tmp.clear_cell_data()
    tmp.clear_point_data()
    tmp.clear_field_data()
    return tmp.cell_centers()

########################################################################
print('Python Script:\t', sys.argv[0])

# Input is directory of .nc lfric files and a directory of ascii obs filess

if len(sys.argv) == 3:
    # Read the input file as the first argument
    input_fcst_dir = os.path.expandvars(sys.argv[1])
    input_obs_dir = os.path.expandvars(sys.argv[2])
    try:
        print("Input Forecast Dir:\t" + repr(input_fcst_dir))
        print("Input Observations Dir:\t" + repr(input_obs_dir))

        #Read all obs from directory
        obs_data = read_ascii_obs(input_obs_dir+'/*.ascii')
        print(obs_data.shape)
        obs_data = obs_data.iloc[::1000, :]#thin for testing
        obs_data = obs_data.rename(columns={0:'message_type', 1:'station_id', 2:'obs_valid_time', 3:'obs_lat', 4:'obs_lon', \
                             5:'elevation', 6:'var_name', 7:'level', 8:'height', 9:'qc_string', 10:'obs_value'})
   
        obs_vars = ['UGRD', 'VGRD', 'TMP', 'RH']
        fcst_vars = ['u10m', 'v10m', 't1p5m', 'rh1p5m']

        #open the netcdf forecast to access data values and list of times
        fcst_data = xr.open_dataset(input_fcst_dir)
        fcst_times = pd.to_datetime(fcst_data.coords['time_centered'])

        match_df = pd.DataFrame(columns=['message_type', 'station_id', 'obs_valid_time', 'obs_lat', 'obs_lon', \
                'elevation', 'var_name', 'level', 'height', 'qc_string', 'obs_value', 'idx_nearest, fcst_value'])

        for idx1, (obs_var, fcst_var) in enumerate(zip(obs_vars, fcst_vars)):
        
            #load forecast as an iris cube
            fcst_mesh = load_ugrid(input_fcst_dir, constraint=fcst_var)
            info(fcst_mesh)
        
            #get indices of nearest cell center
            fcst_centers = to_centers(fcst_mesh)
            points = fcst_centers.points
            tree = KDTree(points)

            #get the forecast data values loaded
            fcst_df = fcst_data[fcst_var].to_dataframe()
            print(fcst_df)
        
            #get obs data for variable
            var_data = obs_data.loc[obs_data['var_name'] == obs_var].reset_index(drop=True)

            for idx2, row in var_data.iterrows():
                xyz, nearest, idx_nearest = find_nearest(tree, points, [row['obs_lat'], row['obs_lon']], k=1)
                var_data.at[idx2,'idx_nearest'] = int(idx_nearest)

                #get the obs time, search for closest in the forecast data
                time = dt.datetime.strptime(row['obs_valid_time'],'%Y%m%d_%H%M%S')
                match_time = min(fcst_times, key=lambda  d: abs(d - time))
                match_idx = np.argmin(np.abs(fcst_times - time))

                #add matched fcst value to data
                var_data.at[idx2, 'fcst_value'] = fcst_df.loc[(match_idx,int(idx_nearest)), fcst_var]
                var_data.at[idx2, 'fcst_lat'] = fcst_df.loc[(match_idx,int(idx_nearest)), 'Mesh2d_face_x']
                var_data.at[idx2, 'fcst_lon'] = fcst_df.loc[(match_idx,int(idx_nearest)), 'Mesh2d_face_y']
                var_data.at[idx2, 'fcst_time'] = fcst_df.loc[(match_idx,int(idx_nearest)), 'time_centered']
                
            #check results
            #with pd.option_context('display.max_rows', None):
            #    print(var_data[['obs_lat','fcst_lat','obs_lon','fcst_lon','obs_value','fcst_value','obs_valid_time','fcst_time']])
            with pd.option_context('display.max_columns', 500, 'display.max_rows', 100, 'display.width', 500):
                print(var_data)
            ob_vals = var_data['obs_value'].values
            f_vals = var_data['fcst_value'].values

            match_df = pd.concat([match_df, var_data], ignore_index=True)

        nlocs = len(match_df.index)
        print('Number of locations in matched set: ' + str(nlocs)) 
            
        # Add additional columns
        match_df['lead'] = '000000'
        match_df['MPR'] = 'MPR'
        match_df['nobs'] = nlocs
        match_df['index'] = range(0,nlocs)
        match_df['na'] = 'NA'
        match_df['QC'] = '0'

        # Arrange columns in MPR format
        cols = ['na','na','lead','obs_valid_time','obs_valid_time','lead','obs_valid_time',
                'obs_valid_time','var_name','na','lead','var_name','na','na',
                'var_name','na','na','lead','na','na','na','na','MPR',
                'nobs','index','station_id','obs_lat','obs_lon',
                'level','na','fcst_value','obs_value',
                'QC','na','na','na','na','na']
        
        match_df = match_df[cols]

        # Into a list and all to strings
        mpr_data = [list( map(str,i) ) for i in match_df.values.tolist() ]

    except NameError: 
        print("Can't find the input files or the variables.")
        print("Variables in this file:\t" + repr(var_list))
else:
    print("ERROR: ugrid_lfric_mpr.py -> Must specify directory of files.\n")
    sys.exit(1)

########################################################################

For more information on the basic requirements to utilize Python Embedding in METplus, please refer to the MET User’s Guide section on Python embedding.

User Scripting

User Scripting is not used in this use case.

Running METplus

Pass the use case configuration file to the run_metplus.py script along with any user-specific system configuration files if desired:

run_metplus.py /path/to/StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed.conf /path/to/user_system.conf

See Running METplus for more information.

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in StatAnalysis_UGRID (relative to OUTPUT_BASE) and will contain the following file:

dump.out

Keywords

Note

StatAnalysisToolUseCase
PythonEmbeddingFileUseCase
UnstructureGridsUseCase

Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.

sphinx_gallery_thumbnail_path = ‘_static/unstructured_grids-StatAnalysis_fcstLFRIC_UGRID_obsASCII_PyEmbed.png’

Gallery generated by Sphinx-Gallery