UserScript: Reformat MET .stat ECNT data, calculate aggregation statistics, and generate a spread skill plot

model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot_ecnt_spread_skill.py

Scientific Objective

This use case illustrates how to use MET .stat output (ECNT linetype data) to generate a spread-skill plot using a subset of the METplus Analysis Tools (METdataio, METcalcpy, and METplotpy). The METdataio METreformat module extracts and reformats the ECNT linetype data, the METcalcpy agg_stat module performs the aggregation, and the METplotpy line plot generates the spread-skill plot.

Datasets

  • Forecast dataset: RRFS GEFS (Rapid Refresh Forecast System Global Ensemble Forecast System)

  • Observation dataset: None

Input: MET .stat files from the MET ensemble-stat tool for RRFS for 20220506

Location: All the input data required for this use case can be found in the met_test sample data tarball (sample_data-short_range.tgz).

Sample data for the appropriate release can be downloaded from the METplus releases page: https://github.com/dtcenter/METplus/releases

See the Running METplus section for more information.

This tarball should be unpacked into the directory corresponding to the value of INPUT_BASE in the User Configuration File section.
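If you prefer to unpack the tarball from Python rather than with the tar command, the standard-library tarfile module can do it. The sketch below is an illustration only: unpack_sample_data is a hypothetical helper, and the tarball and INPUT_BASE paths in the usage line are placeholders.

```python
import tarfile
from pathlib import Path


def unpack_sample_data(tarball, input_base):
    """Extract a gzipped sample-data tarball into INPUT_BASE (hypothetical helper).

    Returns the list of member names stored in the archive.
    """
    dest = Path(input_base)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters that reject unsafe paths.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


# Usage (placeholder paths):
# unpack_sample_data("sample_data-short_range.tgz", "/path/to/sample/input/data")
```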

External Dependencies

You will need to use the version of Python that is required for the METplus version in use. Refer to the Installation section of the User’s Guide for basic Python requirements: https://metplus.readthedocs.io/en/latest/Users_Guide/installation.html

The METplus Analysis Tools (METdataio, METcalcpy, and METplotpy) have the following additional third-party Python package requirements. The version numbers are listed in the requirements.txt file at the top-level directory of each repository.

  • lxml

  • pandas

  • pyyaml

  • numpy

  • netcdf4

  • xarray

  • scipy

  • metpy

  • pint

  • python-dateutil

  • kaleido (python-kaleido)

  • plotly

  • matplotlib
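One way to confirm that the packages above are visible to the Python interpreter METplus will use is to check that each one can be located with importlib. This is a sketch only: the import names are assumed from the package names (pyyaml imports as yaml, netcdf4 as netCDF4, python-dateutil as dateutil), and it checks presence, not the versions pinned in requirements.txt.

```python
import importlib.util

# Import names for the packages listed above. Several differ from the
# pip/conda package name (pyyaml -> yaml, netcdf4 -> netCDF4,
# python-dateutil -> dateutil).
REQUIRED_MODULES = [
    "lxml", "pandas", "yaml", "numpy", "netCDF4", "xarray", "scipy",
    "metpy", "pint", "dateutil", "kaleido", "plotly", "matplotlib",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```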

METplus Components

This use case runs the UserScript wrapper tool to run user-provided scripts, in this case reformat_ecnt_linetype.py, aggregate_ecnt.py, and plot_spread_skill.py. It also requires the METdataio, METcalcpy, and METplotpy source code to reformat the MET .stat output, perform aggregation, and generate the plot. Clone the METdataio, METcalcpy, and METplotpy repositories under the same base directory as the METPLUS_BASE directory (e.g., if the METPLUS_BASE directory is /home/username/working/METplus, clone the METdataio, METcalcpy, and METplotpy source code into the /home/username/working directory). The repositories are located at:

  • METdataio: https://github.com/dtcenter/METdataio

  • METcalcpy: https://github.com/dtcenter/METcalcpy

  • METplotpy: https://github.com/dtcenter/METplotpy

Define the OUTPUT_BASE, INPUT_BASE, and MET_INSTALL_DIR settings in the user configuration file. For instructions on how to set up the user configuration file, refer to the User Configuration File section.
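The METDATAIO_BASE, METCALCPY_BASE, and METPLOTPY_BASE settings in the use case configuration are defined as {METPLUS_BASE}/../<repository>, so the three repositories must sit next to the METplus directory. A hypothetical helper such as the one below can confirm that layout before a run; it is a sketch, not part of METplus.

```python
from pathlib import Path

REPOS = ("METdataio", "METcalcpy", "METplotpy")


def analysis_tool_dirs(metplus_base):
    """Return the expected repository paths, mirroring {METPLUS_BASE}/../<repo>."""
    parent = Path(metplus_base).resolve().parent
    return {name: parent / name for name in REPOS}


def missing_repos(metplus_base):
    """List the repositories that are not cloned next to METPLUS_BASE."""
    return [name for name, path in analysis_tool_dirs(metplus_base).items()
            if not path.is_dir()]


# Usage (placeholder path):
# print(missing_repos("/home/username/working/METplus"))
```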

METplus Workflow

This use case reads in the MET .stat output that contains the ECNT linetype (from the MET ensemble-stat tool). The .stat output MUST reside under one directory; if .stat files are spread among multiple directories, they must be consolidated under a single directory. The use case runs three processes in sequence: reformatting, aggregating, and plotting.
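A minimal sketch of that consolidation step, assuming the .stat files can simply be copied into one directory that lies outside all of the source directories. consolidate_stat_files is a hypothetical helper, not part of METplus; it refuses to overwrite a file of the same name rather than silently losing data.

```python
import shutil
from pathlib import Path


def consolidate_stat_files(source_dirs, dest_dir):
    """Copy every *.stat file found (recursively) under source_dirs into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        for stat_file in sorted(Path(src).rglob("*.stat")):
            target = dest / stat_file.name
            if target.exists():
                # Two source directories contain a file with the same name.
                raise FileExistsError(f"{target} already exists; rename {stat_file} first")
            shutil.copy2(stat_file, target)
            copied.append(target)
    return copied
```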

METplus Configuration

METplus first loads all the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line with the -c option, i.e. -c parm/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.conf
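METplus reads its configuration with its own tooling, but the layering described above, where a value set in a later file overrides the same value from an earlier one, can be illustrated with the standard-library configparser. This is a conceptual sketch under that assumption, not how METplus itself parses its files; load_layered_config is a hypothetical name.

```python
import configparser


def load_layered_config(paths):
    """Read config files in order; a key set in a later file overrides earlier ones."""
    # Interpolation is disabled so METplus-style {VAR} values are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    # Preserve key case (e.g. INPUT_BASE); ConfigParser lowercases keys by default.
    config.optionxform = str
    read_ok = config.read(paths)
    return config, read_ok


# Usage (placeholder paths):
# config, _ = load_layered_config(["defaults.conf", "use_case.conf", "user_system.conf"])
# print(config["config"]["INPUT_BASE"])
```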

[config]

# Documentation for this use case can be found at
# https://metplus.readthedocs.io/en/latest/generated/model_applications/short-range/UserScript_fcstRRFS_obsOnly_Reformat_Aggregate_Plot.html

# For additional information, please see the METplus Users Guide.
# https://metplus.readthedocs.io/en/latest/Users_Guide

###
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
###

PROCESS_LIST = UserScript(reformatter), UserScript(aggregate), UserScript(plotting)

###
# Time Info
# LOOP_BY options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
#   INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
#   VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
# LEAD_SEQ is the list of forecast leads to process
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#timing-control
###

LOOP_BY = VALID
VALID_TIME_FMT = %Y%m%d_%H%M%S
VALID_BEG = 20220506_000000

USER_SCRIPT_RUNTIME_FREQ = RUN_ONCE
LOOP_ORDER = processes

###
# UserScript Settings
# https://metplus.readthedocs.io/en/latest/Users_Guide/wrappers.html#userscript
###


[user_env_vars]
REFORMAT_YAML_CONFIG_NAME = {PARM_BASE}/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot/reformat_ecnt.yaml
AGGREGATE_YAML_CONFIG_NAME = {PARM_BASE}/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot/aggregate_ecnt.yaml
PLOTTING_YAML_CONFIG_NAME = {PARM_BASE}/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot/plot_spread_skill.yaml
REFORMAT_INPUT_BASE = {INPUT_BASE}/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot
REFORMAT_OUTPUT_BASE = {OUTPUT_BASE}/reformatted
AGGREGATE_INPUT_BASE = {REFORMAT_OUTPUT_BASE}
AGGREGATE_OUTPUT_BASE = {OUTPUT_BASE}/aggregated
PLOT_INPUT_BASE = {AGGREGATE_OUTPUT_BASE}
PLOT_OUTPUT_BASE = {OUTPUT_BASE}/plot
METDATAIO_BASE = {METPLUS_BASE}/../METdataio
METCALCPY_BASE = {METPLUS_BASE}/../METcalcpy
METPLOTPY_BASE = {METPLUS_BASE}/../METplotpy
PYTHONPATH = {METDATAIO_BASE}:{METDATAIO_BASE}/METdbLoad:{METDATAIO_BASE}/METdbLoad/ush:{METDATAIO_BASE}/METreformat:{METCALCPY_BASE}:{METCALCPY_BASE}/metcalcpy:{METPLOTPY_BASE}:{METPLOTPY_BASE}/metplotpy/plots


[reformatter]
USER_SCRIPT_COMMAND = {PARM_BASE}/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot/reformat_ecnt_linetype.py

[aggregate]
USER_SCRIPT_COMMAND = {PARM_BASE}/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot/aggregate_ecnt.py

[plotting]
USER_SCRIPT_COMMAND = {PARM_BASE}/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot/plot_spread_skill.py
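The [user_env_vars] entries chain together: AGGREGATE_INPUT_BASE points at REFORMAT_OUTPUT_BASE and PLOT_INPUT_BASE points at AGGREGATE_OUTPUT_BASE, so each step reads what the previous step wrote. The hypothetical helper below expands {VAR} references recursively to show how those values resolve. It is an illustration only, not METplus's own substitution code, which also handles time templates and other syntax this sketch ignores.

```python
import os
import re

# Matches simple {NAME} references; templates such as {valid?fmt=%Y} are left alone.
_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+)\}")


def resolve(name, settings, _depth=0):
    """Expand {VAR} references in settings[name] recursively (hypothetical helper)."""
    if _depth > 20:
        raise ValueError(f"possible circular reference while resolving {name}")
    value = settings[name]
    return _PLACEHOLDER.sub(lambda m: resolve(m.group(1), settings, _depth + 1), value)


if __name__ == "__main__":
    # Placeholder values mirroring the use case configuration above.
    example = {
        "OUTPUT_BASE": "/path/to/output/dir",
        "METPLUS_BASE": "/home/username/working/METplus",
        "REFORMAT_OUTPUT_BASE": "{OUTPUT_BASE}/reformatted",
        "AGGREGATE_INPUT_BASE": "{REFORMAT_OUTPUT_BASE}",
        "METDATAIO_BASE": "{METPLUS_BASE}/../METdataio",
    }
    print(resolve("AGGREGATE_INPUT_BASE", example))
    # normpath collapses the "/../" so the sibling directory is explicit.
    print(os.path.normpath(resolve("METDATAIO_BASE", example)))
```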

MET Configuration

There are no MET tools used in this use case. The use case uses MET .stat output as input for the reformatting step.

Python Embedding

There is no Python embedding in this use case.

Python Scripts

This use case uses Python scripts to invoke the METdataio reformatter, the METcalcpy aggregator, and the METplotpy line plot.

The following Python script (from METdataio) is used to reformat the MET .stat ECNT linetype data into a format that can be used by the aggregating script.

#!/usr/bin/env python3


import os
import time
import logging

from METdbLoad.ush.read_data_files import ReadDataFiles
from METdbLoad.ush.read_load_xml import XmlLoadFile
from METreformat.write_stat_ascii import WriteStatAscii
from metcalcpy.util import read_env_vars_in_config as readconfig


logger = logging.getLogger(__name__)

def main():

    # Read in the YAML configuration file.  Environment variables in
    # the configuration file are supported.
    input_config_file = os.getenv("REFORMAT_YAML_CONFIG_NAME", "reformat_ecnt.yaml")
    settings = readconfig.parse_config(input_config_file)
    logger.info(settings)

    # Replace the need for an XML specification file by passing the
    # XmlLoadFile and ReadDataFiles parameters directly.
    rdf_obj: ReadDataFiles = ReadDataFiles()
    xml_loadfile_obj: XmlLoadFile = XmlLoadFile(None)

    # Retrieve all the filenames in the input_data_dir specified in the YAML config file
    load_files = xml_loadfile_obj.filenames_from_template(settings['input_data_dir'],
                                                          {})

    flags = xml_loadfile_obj.flags
    line_types = xml_loadfile_obj.line_types
    beg_read_data = time.perf_counter()
    rdf_obj.read_data(flags, load_files, line_types)
    end_read_data = time.perf_counter()
    time_to_read = end_read_data - beg_read_data
    logger.info(f"Time to read input .stat data files using METdbLoad: {time_to_read}")
    file_df = rdf_obj.stat_data

    # Check if the output file already exists, if so, delete it to avoid
    # appending output from subsequent runs into the same file.
    existing_output_file = os.path.join(settings['output_dir'], settings['output_filename'])
    logger.info(f"Checking if {existing_output_file} already exists")
    if os.path.exists(existing_output_file):
        logger.info(f"Removing existing output file {existing_output_file}")
        os.remove(existing_output_file)

    # Write stat file in ASCII format
    stat_lines_obj: WriteStatAscii = WriteStatAscii(settings)
    stat_lines_obj.write_stat_ascii(file_df, settings)


if __name__ == "__main__":
    # Show INFO-level messages; the logging default (WARNING) would hide them.
    logging.basicConfig(level=logging.INFO)
    main()

This Python script (from METcalcpy) is used to calculate aggregation statistics for the ECNT linetype.

#!/usr/bin/env python3


import os
import time
import logging
import pandas as pd
import yaml
from metcalcpy.util import read_env_vars_in_config as readconfig
from metcalcpy.agg_stat import AggStat

logger = logging.getLogger(__name__)

def main():
    '''
       Read in the config file (with ENVIRONMENT variables defined in the
       UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.conf).  Invoke METcalcpy agg_stat module to
       calculate the aggregation statistics, clean up the data so it is compatible with the METplotpy line plot,
       and write a tab-separated ASCII file.

    '''

    start_agg_step = time.time()

    # Read in the YAML configuration file.  Environment variables in
    # the configuration file are supported.
    try:
        input_config_file = os.getenv("AGGREGATE_YAML_CONFIG_NAME", "aggregate_ecnt.yaml")
        settings = readconfig.parse_config(input_config_file)
        logger.info(settings)
    except yaml.YAMLError as exc:
        # Without valid settings there is nothing to aggregate, so stop here.
        logger.error(exc)
        raise

    # Calculate the aggregation statistics using METcalcpy agg_stat
    agg_begin = time.time()
    # Create the output directory; exist_ok avoids an error if it already exists.
    os.makedirs(os.environ["AGGREGATE_OUTPUT_BASE"], exist_ok=True)

    agg_stat = AggStat(settings)
    agg_stat.calculate_stats_and_ci()
    agg_finish = time.time()
    time_for_aggregation = agg_finish - agg_begin
    logger.info(f"Total time for calculating aggregation statistics (in seconds): {time_for_aggregation}")

    # Add a 'dummy' column (fcst_valid with the same values as fcst_lead)
    # to the output data.  The aggregation was based
    # on the fcst_lead BUT the line plot requires a *second* time-related
    # column (e.g. fcst_init_beg, fcst_valid) to identify unique
    # points.  In this case, the aggregated data already consists of
    # unique points. If any other time column were used, this
    # step would not be required.
    output_file = settings['agg_stat_output']
    df = pd.read_csv(output_file, sep='\t')
    df['fcst_valid'] = df['fcst_lead']
    # index=False stops pandas from writing its row index as an extra column.
    df.to_csv(output_file, sep='\t', index=False)

    finish_agg = time.time()
    total_agg_step = finish_agg - start_agg_step
    logger.info(f"Total time for performing the aggregation step (in sec): {total_agg_step}")


if __name__ == "__main__":
    # Show INFO-level messages; the logging default (WARNING) would hide them.
    logging.basicConfig(level=logging.INFO)
    main()

Finally, this Python script (from METplotpy) is used to generate a spread-skill plot using the METplotpy line plot code.

#!/usr/bin/env python3


import os
from time import perf_counter
import logging
import yaml
import metcalcpy.util.read_env_vars_in_config as readconfig
from metplotpy.plots.line import line

def main():

    # Read in the YAML configuration file.  Environment variables in
    # the configuration file are supported.
    try:
        input_config_file = os.getenv("PLOTTING_YAML_CONFIG_NAME", "plot_spread_skill.yaml")
        settings = readconfig.parse_config(input_config_file)
        logging.info(settings)
    except yaml.YAMLError as exc:
        # Without valid settings the plot cannot be created, so stop here.
        logging.error(exc)
        raise

    try:
        start = perf_counter()
        plot = line.Line(settings)
        plot.save_to_file()
        plot.write_html()
        plot.write_output_file()
        end = perf_counter()
        execution_time = end - start
        plot.line_logger.info(f"Finished creating line plot, execution time: {execution_time} seconds")
    except ValueError as val_er:
        print(val_er)

if __name__ == "__main__":
    # Show INFO-level messages; the logging default (WARNING) would hide them.
    logging.basicConfig(level=logging.INFO)
    main()

Running METplus

This use case can be run two ways:

1) Passing in UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.conf, then a user-specific system configuration file:

    run_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.conf -c /path/to/user_system.conf

2) Modifying the configurations in parm/metplus_config, then passing in UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.conf:

    run_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/short_range/UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.conf

The former method is recommended. Whether you add them to a user-specific configuration file or modify the metplus_config files, the following variables must be set correctly:

  • INPUT_BASE - Path to the directory where sample data tarballs are unpacked (see the Datasets section to obtain tarballs). This is not required to run METplus, but it is required to run the examples in parm/use_cases

  • OUTPUT_BASE - Path where METplus output will be written. This must be in a location where you have write permissions

  • MET_INSTALL_DIR - Path to location where MET is installed locally

and for the [exe] section, you will need to define the location of non-MET executables. If the executable is in the user’s path, METplus will find it from the name. If the executable is not in the path, specify the full path to the executable here (e.g., RM = /bin/rm). The following executables are required for performing series analysis use cases:

Example User Configuration File:

[config]
INPUT_BASE = /path/to/sample/input/data
OUTPUT_BASE = /path/to/output/dir
MET_INSTALL_DIR = /path/to/met-X.Y


[exe]
RM = /path/to/rm
CUT = /path/to/cut
TR = /path/to/tr
NCAP2 = /path/to/ncap2
CONVERT = /path/to/convert
NCDUMP = /path/to/ncdump
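The lookup rule described above (a bare name is searched for on the user's path, otherwise a full path is given) can be mimicked with shutil.which when checking a configuration before a run. resolve_executable is a hypothetical helper written for illustration; METplus performs its own lookup.

```python
import os
import shutil


def resolve_executable(value):
    """Return a usable path for an [exe] entry, or None (hypothetical helper).

    A bare name such as "rm" is looked up on PATH; a value that contains a
    directory component is treated as a full path and must be an executable file.
    """
    if os.path.dirname(value):
        return value if os.path.isfile(value) and os.access(value, os.X_OK) else None
    return shutil.which(value)


# Usage: report entries from the example [exe] section that cannot be found.
# for name in ("rm", "cut", "tr", "ncap2", "convert", "ncdump"):
#     print(name, resolve_executable(name))
```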

Expected Output

A successful run of the reformat, aggregate, and plot steps of the use case will output the following both to the screen and to the log file:

INFO: METplus has successfully finished running.

Reformat Output

The reformatted ensemble-stat ECNT linetype data should exist in the location specified in the user configuration file (OUTPUT_BASE). Verify that the ensemble_stat_ecnt.data file exists. The file now holds all the statistics in the stat_name and stat_value columns, with each ECNT statistic labeled by its name (e.g., crps, crpss, rmse), and the confidence limit values in the stat_btcl and stat_btcu columns.
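To check the reformatted file and look at a few statistics, something like the following pandas sketch can be used. It assumes the file is tab-separated with a header row and that statistic names appear in stat_name as in the examples above (matched case-insensitively here, since the exact case is an assumption). summarize_reformatted is a hypothetical helper; the path is whatever output_dir and output_filename are set to in reformat_ecnt.yaml.

```python
import pandas as pd

REQUIRED_COLUMNS = ["stat_name", "stat_value", "stat_btcl", "stat_btcu"]


def summarize_reformatted(path, stats=("RMSE", "SPREAD_PLUS_OERR")):
    """Load a reformatted ECNT file and keep the rows for the requested statistics."""
    df = pd.read_csv(path, sep="\t")
    missing = set(REQUIRED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"{path} is missing columns: {sorted(missing)}")
    # Case-insensitive match on the statistic name.
    wanted = {s.lower() for s in stats}
    subset = df[df["stat_name"].str.lower().isin(wanted)]
    return subset[REQUIRED_COLUMNS]


# Usage (placeholder path):
# print(summarize_reformatted("/path/to/output/dir/reformatted/ensemble_stat_ecnt.data"))
```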

Aggregation Output

The METcalcpy agg_stat module is used to calculate aggregated statistics and confidence intervals for each series (line) point.
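The sketch below illustrates the idea of an aggregated statistic with a confidence interval for a single series point; it is not METcalcpy's implementation. It assumes each case has equal weight, pools per-case RMSE values as the square root of the mean of their squares (the RMSE of the pooled errors when every case has the same number of matched pairs), and uses a simple percentile bootstrap for the interval. pooled_rmse and bootstrap_ci are hypothetical names.

```python
import math
import random


def pooled_rmse(values):
    """Combine per-case RMSE values (equal weights) as sqrt(mean(RMSE^2))."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def bootstrap_ci(values, stat=pooled_rmse, n_boot=1000, alpha=0.05, seed=0):
    """Return (estimate, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample cases with replacement and recompute the statistic each time.
    replicates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * (n_boot - 1))]
    upper = replicates[int((1 - alpha / 2) * (n_boot - 1))]
    return stat(values), lower, upper
```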

Plot Output

A spread-skill plot of temperature showing RMSE, SPREAD_PLUS_OERR, and the SPREAD_PLUS_OERR/RMSE ratio is created in the output location specified in the user configuration file (OUTPUT_BASE). The plot is named short-range_UserScript_fcstRRFS_fcstOnly_Reformat_Aggregate_Plot.png.
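The ratio line is SPREAD_PLUS_OERR divided by RMSE at each point. A minimal sketch, assuming both series are already aligned by forecast lead (for example, read from the aggregated output), is shown below; spread_skill_ratio is a hypothetical helper.

```python
import math


def spread_skill_ratio(spread_plus_oerr, rmse):
    """Element-wise SPREAD_PLUS_OERR / RMSE for each forecast lead.

    A ratio near 1 is generally read as spread consistent with the error;
    a ratio below 1 means the spread is smaller than the error (under-dispersion).
    """
    if len(spread_plus_oerr) != len(rmse):
        raise ValueError("series must have the same length")
    # Undefined where RMSE is zero.
    return [s / r if r != 0 else math.nan for s, r in zip(spread_plus_oerr, rmse)]
```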

Keywords

Note

  • UserScriptUseCase

  • ShortRangeAppUseCase

  • METdataioUseCase

  • METcalcpyUseCase

  • METplotpyUseCase

Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.
