SeriesAnalysis: Verifying seasonal models across conditional times

model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCN_SelectClimateRegimes.conf

Scientific Objective

This use case uses conditional time periods for verification, relying on METplus tool instances and valid listing to create specific verifiction windows. Instead of aggregating statistics over the entire hindcast period, this configuration allows users to assess model skill during specific climate regimes, such as El Niño, La Niña, or Neutral years.

Version Added

METplus version 13.0

Datasets

Forecast: Climate Forecast System ver 2 (CFSv2) and Seasonal Forecast System (SFS) Baseline, 1 degree resolution, 2m temperature

Observation: Global Historical Climatology Network ver. 2, Climatology Anomaly Monitoring System (GHCNCAMS) 1 degree resolution, 2m temperature

Climatology: None

Note that this use case ingests the same dataset used in two other use cases in the this category, and in an effort to mimize data duplication, paths to the data may reference another use case name

Location: All of the input data required for this use case can be found in a sample data tarball. Each use case category will have one or more sample data tarballs. It is only necessary to download the tarball with the use case’s dataset and not the entire collection of sample data. Click here to access the METplus releases page and download sample data for the appropriate release: https://github.com/dtcenter/METplus/releases This tarball should be unpacked into the directory that you will set the value of INPUT_BASE. See Running METplus section for more information.

METplus Components

The only tool this use case calls is SeriesAnalysis, but there are four instances of SeriesAnalysis. A Python script is used for ingesting forecast and observation data, once for each year of data in the CFSv2 and SFS ensembles.

METplus Workflow

For the initial instance of SeriesAnalysis, the timing info is:

Beginning time (INIT_BEG): 1994-11-01

End time (INIT_END): 2020-11-01

Increment between beginning and end times (INIT_INCREMENT): 1 year

Sequence of forecast leads to process (LEAD_SEQ): None

For the remaining instances of SeriesAnalysis, the VALID_LIST setting is utilized. The following details what each instance has for each VALID_LIST.

ElNino instance:

VALID_LIST : 1997110100, 2002110100, 2004110100, 2006110100, 2009110100, 2014110100, 2015110100, 2018110100

LaNina instance:

VALID_LIST : 1995110100, 1998110100, 1999110100, 2000110100, 2005110100, 2007110100, 2008110100, 2010110100, 2011110100, 2016110100, 2017110100, 2020110100

Neutral instance:

VALID_LIST : 199610100, 2001110100, 2003110100, 2012110100, 2013110100, 2019110100

With an increment of 1 year, all November 1st’s from 1994 to 2020 are processed for a total of 27 years for the first instance of SeriesAnalysis. In the following three instances, the VALID_LIST controls how many years are read in, ranging from six to twelve years. This use case utilizes Python Embedding to process the forecast and observation inputs, reading user-set variables like TIME_PERIOD and CLIM to fine-tune what temporal size the means are calcuated over and what climatology period is available, respectively. Outputs of ME, MAE, and RMSE, among others, provide climate-useful statistical output. Instead of utilizing LEAD_SEQ, this use case uses the CUSTOM_LOOP_LIST for 1, 2, and 3 month leads. This is done so that the Python script can access an actual number rather than a date for processing.

METplus Configuration

METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line, e.g. parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes.conf

[config]

# For additional information, please see the METplus Users Guide.
# https://metplus.readthedocs.io/en/latest/Users_Guide

###
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
###

PROCESS_LIST = SeriesAnalysis, SeriesAnalysis(ElNino), SeriesAnalysis(LaNina), SeriesAnalysis(Neutral)

###
# Field Info
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#field-info
###

# USER INPUT
# Note that MODEL can be a single or group of models 
# Individual models should have a definition of their name and number of ensemble members on approx line 8 of function_library.py
# MODEL_GROUPS should be defined in function_library.py approx line 15
# FIELD1 can be raw, anom, std_anom, clim_mean, clim_std, lower_tercile, middle_tercile, upper_tercile
# TIME_PERIOD can be monthly or seasonal means
# SERIES_ANALYSIS_CUSTOM_LOOP_LIST loops through leads 1, 2, and 3 in this example.
MODEL = CFSandSFS
FIELD1 = raw
CLIM = 1994_2020
TIME_PERIOD = seasonal

# CUSTOM LOOP LIST 
# Taking the place of LEAD
SERIES_ANALYSIS_CUSTOM_LOOP_LIST = 01, 02, 03

LOOP_BY = INIT
INIT_TIME_FMT = %Y%m%d%H
INIT_BEG=1994110100
INIT_END=2020110100
INIT_INCREMENT = 1Y

# RUN 1 - All years
FCST_SERIES_ANALYSIS_INPUT_TEMPLATE = {init?fmt=%Y}
OBS_SERIES_ANALYSIS_INPUT_TEMPLATE = {init?fmt=%Y}

FCST_SERIES_ANALYSIS_INPUT_DATATYPE = PYTHON_NUMPY
OBS_SERIES_ANALYSIS_INPUT_DATATYPE = PYTHON_NUMPY

FCST_VAR1_NAME = {CONFIG_DIR}/wrapper_combined.py 11 {custom?fmt=%s} tmp2m {CLIM} {MODEL} {INPUT_BASE}/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SingleOrMultimodel/fcst {TIME_PERIOD} MET_PYTHON_INPUT_ARG {FIELD1}
FCST_VAR1_LEVELS = "(*,*)"
OBS_VAR1_NAME = {CONFIG_DIR}/wrapper_combined.py 11 {custom?fmt=%s} tmp2m {CLIM} obs {INPUT_BASE}/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SingleOrMultimodel/obs {TIME_PERIOD} MET_PYTHON_INPUT_ARG {FIELD1}
OBS_VAR1_LEVELS = "(*,*)"

CONFIG_DIR = {PARM_BASE}/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes

SERIES_ANALYSIS_VLD_THRESH = 0.7
SERIES_ANALYSIS_BLOCK_SIZE = 360*181
SERIES_ANALYSIS_CAT_THRESH = >=0.5

SERIES_ANALYSIS_OUTPUT_PREFIX = 

SERIES_ANALYSIS_RUNTIME_FREQ = RUN_ONCE

SERIES_ANALYSIS_STAT_LIST = ME, MAE, RMSE, TOTAL, FBAR, OBAR

SERIES_ANALYSIS_ONCE_PER_FIELD = False


SERIES_ANALYSIS_OUTPUT_DIR = {OUTPUT_BASE}/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes
SERIES_ANALYSIS_OUTPUT_TEMPLATE = Bias_SeriesAnalysis_tmp2m_{FIELD1}_{MODEL}_IC11_Lead{custom?fmt=%s}{TIME_PERIOD}_{CLIM}.nc

# Run 2 - El Nino
[ElNino]
LOOP_BY = VALID
VALID_BEG=1994110100
VALID_END=2020110100
VALID_TIME_FMT = %Y%m%d%H
VALID_LIST = 1997110100,2002110100,2004110100,2006110100,2009110100,2014110100,2015110100,2018110100

FCST_SERIES_ANALYSIS_INPUT_TEMPLATE = {valid?fmt=%Y}
OBS_SERIES_ANALYSIS_INPUT_TEMPLATE = {valid?fmt=%Y}

SERIES_ANALYSIS_OUTPUT_TEMPLATE = Bias_SeriesAnalysis_tmp2m_{FIELD1}_{MODEL}_IC11_Lead{custom?fmt=%s}{TIME_PERIOD}_{CLIM}_ElNino.nc

# Run 3 - La Nina
[LaNina]
LOOP_BY = VALID
VALID_BEG=1994110100
VALID_END=2020110100
VALID_TIME_FMT = %Y%m%d%H
VALID_LIST = 1995110100,1998110100,1999110100,2000110100,2005110100,2007110100,2008110100,2010110100,2011110100,2016110100,2017110100,2020110100

FCST_SERIES_ANALYSIS_INPUT_TEMPLATE = {valid?fmt=%Y}
OBS_SERIES_ANALYSIS_INPUT_TEMPLATE = {valid?fmt=%Y}

SERIES_ANALYSIS_OUTPUT_TEMPLATE = Bias_SeriesAnalysis_tmp2m_{FIELD1}_{MODEL}_IC11_Lead{custom?fmt=%s}{TIME_PERIOD}_{CLIM}_LaNina.nc


# Run 4 - Neutral
[Neutral]
LOOP_BY = VALID
VALID_BEG=1994110100
VALID_END=2020110100
VALID_TIME_FMT = %Y%m%d%H
VALID_LIST = 1996110100,2001110100,2003110100,2012110100,2013110100,2019110100

FCST_SERIES_ANALYSIS_INPUT_TEMPLATE = {valid?fmt=%Y}
OBS_SERIES_ANALYSIS_INPUT_TEMPLATE = {valid?fmt=%Y}

SERIES_ANALYSIS_OUTPUT_TEMPLATE = Bias_SeriesAnalysis_tmp2m_{FIELD1}_{MODEL}_IC11_Lead{custom?fmt=%s}{TIME_PERIOD}_{CLIM}_Neutral.nc

MET Configuration

METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.

YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!

If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings

SeriesAnalysisConfig_wrapped
////////////////////////////////////////////////////////////////////////////////
//
// Series-Analysis configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Output model name to be written
//
//model =
${METPLUS_MODEL}

//
// Output description to be written
//
//desc =
${METPLUS_DESC}

//
// Output observation type to be written
//
//obtype =
${METPLUS_OBTYPE}

////////////////////////////////////////////////////////////////////////////////

//
// Verification grid
// May be set separately in each "field" entry
//
//regrid = {
${METPLUS_REGRID_DICT}

////////////////////////////////////////////////////////////////////////////////

censor_thresh = [];
censor_val    = [];
//cat_thresh =
${METPLUS_CAT_THRESH}
cnt_thresh    = [ NA ];
cnt_logic     = UNION;

//
// Forecast and observation fields to be verified
//
fcst = {
   ${METPLUS_FCST_FILE_TYPE}
   ${METPLUS_FCST_CAT_THRESH}
   //field = [
   ${METPLUS_FCST_FIELD}
   ${METPLUS_FCST_CLIMO_MEAN_DICT}
   ${METPLUS_FCST_CLIMO_STDEV_DICT}
}
obs = {
   ${METPLUS_OBS_FILE_TYPE}
   ${METPLUS_OBS_CAT_THRESH}
   //field = [
   ${METPLUS_OBS_FIELD}
   ${METPLUS_OBS_CLIMO_MEAN_DICT}
   ${METPLUS_OBS_CLIMO_STDEV_DICT}
}

////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}


//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}

//climo_cdf = {
${METPLUS_CLIMO_CDF_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
ci_alpha  = [ 0.05 ];

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// Verification masking regions
//
//mask = {
${METPLUS_MASK_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Gradient statistics
// May be set separately in each "obs.field" entry
//
//gradient = {
${METPLUS_GRADIENT_DICT}

//
// Number of grid points to be processed concurrently.  Set smaller to use
// less memory but increase the number of passes through the data.
//
//block_size =
${METPLUS_BLOCK_SIZE}

//
// Ratio of valid matched pairs to compute statistics for a grid point
//
//vld_thresh =
${METPLUS_VLD_THRESH}

////////////////////////////////////////////////////////////////////////////////

//
// Statistical output types
//
//output_stats = {
${METPLUS_OUTPUT_STATS_DICT}

////////////////////////////////////////////////////////////////////////////////

//hss_ec_value =
${METPLUS_HSS_EC_VALUE}
rank_corr_flag = FALSE;

tmp_dir = "${MET_TMP_DIR}";

//version        = "V10.0";

////////////////////////////////////////////////////////////////////////////////

${METPLUS_TIME_OFFSET_WARNING}
${METPLUS_MET_CONFIG_OVERRIDES}

Python Embedding

This use case utilizes two Python scripts. The first, function_library.py, serves the purpose of data handling. It open and reads in NMME and observational data, while also calculating climatologies, anomalies, and tercile probabilities using CPC methodologies (including non-normal assumption for precipitation terciles, or other variables as needed). For more simple changes, users can add additional models under MODEL_SPECS (in the format ‘model_name’: nMembers); and additional model groupings can be added under MODEL_GROUPS, in the format ‘short_name’: [‘list’, ‘of’, ‘models’]. Both of these are near the top of the function library. Additional observed datasets can be added under the file_map and var_map, under create_obs_anomalies. The second script, wrapper_combined.py, serves as the interface between python logic and METplus. Based on options in the METplus config file, it formats model and observational data (e.g., standardizing to lat x lon grids) and holds it in memory for METplus to ingest. It also feature options for flags (FLIP_OBS, FLIP_MODELS) to handle latitude orientation mismatches, ensuring data is geometrically correct before MET sees it.

parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes/function_library.py
import numpy as np
import xarray as xr
import os


# --------------------------------------------------------------------------------------------------
# 1. Define nMembers in each model
MODEL_SPECS = {
    'CFSv2': 24, 'NCAR_CCSM4': 10, 'GEM5_NEMO': 10, 'NASA_GEOS5v2': 4,
    'CanCM4i': 10, 'GFDL_SPEAR': 15, 'GEM5.2_NEMO': 20, 'CanESM5': 20,
    'SFS_Baseline': 9, 'NCAR_CESM1': 10, 'obs': 1
}

# 2. Define Model Groups
MODEL_GROUPS = {
    'NMME': ['CFSv2', 'NCAR_CCSM4', 'GEM5.2_NEMO', 'NASA_GEOS5v2', 'CanESM5', 'GFDL_SPEAR'],
    'miniNMME': ['CFSv2', 'NCAR_CCSM4', 'GFDL_SPEAR'],
    'miniNMME_addSFS': ['CFSv2', 'NCAR_CCSM4', 'GFDL_SPEAR', 'SFS_Baseline'],
    'miniNMME_replaceCFSwSFS': ['SFS_Baseline', 'NCAR_CCSM4', 'GFDL_SPEAR'],
    'NMMEwithCFS': ['CFSv2', 'NCAR_CCSM4', 'NCAR_CESM1', 'GFDL_SPEAR', 'GEM5.2_NEMO', 'CanESM5', 'NASA_GEOS5v2'],
    'NMMEwithSFS': ['SFS_Baseline', 'NCAR_CCSM4', 'NCAR_CESM1', 'GFDL_SPEAR', 'GEM5.2_NEMO', 'CanESM5', 'NASA_GEOS5v2'],
    'CFSandSFS': ['SFS_Baseline', 'CFSv2'],
}
# --------------------------------------------------------------------------------------------------







# --------------------------------------------------------------------------------------------------
def setup(model_input_list, clim_per_str):
    print('Running setup for model data choices and climate period...')

    if len(model_input_list) == 1 and model_input_list[0] in MODEL_GROUPS:
        selected_models = MODEL_GROUPS[model_input_list[0]]
    else:
        selected_models = model_input_list

    final_model_dict = {}
    for m in selected_models:
        if m in MODEL_SPECS:
            final_model_dict[m] = MODEL_SPECS[m]
        else:
            raise ValueError(f"Model '{m}' not supported.")

    n_ens = sum(final_model_dict.values())

    try:
        start_year, end_year = map(int, clim_per_str.split('_'))
        years = np.arange(start_year, end_year + 1, 1)
    except ValueError:
        raise ValueError(f"Climatology format '{clim_per_str}' is invalid.")

    print(f"    Climatology Period: {clim_per_str} (Years: {len(years)})")
    print(f"    Total Ensemble Members (nEns): {n_ens}")

    return {
        'model_dict': final_model_dict,
        'years': years,
        'model_out': list(final_model_dict.keys()),
        'climo': clim_per_str,
        'mems_total': n_ens
    }
# --------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------
def get_time_indices(init, lead, init_year, config):
    init_idx = int(init) - 1
    lead_int = int(lead)
    init_yr_int = int(init_year)

    absolute_month_index = init_idx + lead_int
    years_added = absolute_month_index // 12
    target_month_idx = absolute_month_index % 12
    target_year = init_yr_int + years_added

    month_abbrs = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    season_abbrs = ['DJF', 'JFM', 'FMA', 'MAM', 'AMJ', 'MJJ', 'JJA', 'JAS', 'ASO', 'SON', 'OND', 'NDJ']

    target_lead_name = month_abbrs[target_month_idx]
    target_season_name = season_abbrs[target_month_idx]

    print(f"    Date Info: Init {init} (Year {init_year}) + Lead {lead} --> Valid: {target_lead_name} ({target_season_name}) Year {target_year}")

    return {
        'target_year': target_year,
        'lead_name': target_lead_name,
        'season_name': target_season_name,
    }
# --------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------
def open_and_process_models(base_path, variable, time_period,
                            init_month, lead_time, init_year,
                            config, return_members=False):

    mode_str = "Full Ensemble" if return_members else "Ensemble Mean"
    print(f"Opening target data for {variable} ({mode_str})...")

    model_order = config['model_out']
    ds_list = []

    # We need to know the grid shape in case everything fails.
    # Hardcoding 1x1 Global grid (181, 360) based on your metadata.
    # If using different grids, you might need to read a "sample" file first.
    GRID_SHAPE = (181, 360)

    for model_name in model_order:
        n_members = config['model_dict'][model_name]
        file_name = f"{model_name}.{variable}.{init_year}{init_month}.fcst.nc"
        full_path = os.path.join(base_path, model_name, file_name)

        try:
            # Check if file exists first
            if not os.path.exists(full_path):
                print(f"    WARNING: File missing: {full_path}")
                continue

            ds = xr.open_dataset(full_path, decode_times=False)

            # Check if data variable exists
            if 'fcst' not in ds:
                print(f"    WARNING: 'fcst' variable missing in {full_path}")
                continue

            ds_subset = ds.isel(ensmem=slice(0, n_members))
            ds_list.append(ds_subset)

        except Exception as e:
            print(f"    WARNING: Error reading {full_path}: {e}")
            continue

    # --- FAIL-SAFE BLOCK ---
    if not ds_list:
        print(f"    WARNING: No valid data found for {init_year}. Returning -9999 to MET.")
        # Return -9999 instead of np.nan
        return np.full(GRID_SHAPE, -9999.0)

    # Combine data
    ds_combined = xr.concat(ds_list, dim='ensmem')
    fcst_data = ds_combined['fcst'].values
    fcst_data = np.where(np.abs(fcst_data) < 9999, fcst_data, np.nan)

    # --- NAN CHECK ---
    # If the file opened but contains only NaNs
    if np.isnan(fcst_data).all():
         print(f"    WARNING: Data for {init_year} is all-NaN. Returning -9999 to MET.")
         return np.full(GRID_SHAPE, -9999.0)

    lead_int = int(lead_time)

    # Time Slicing Logic
    try:
        if time_period == 'monthly':
            fcst_proc = fcst_data[:, lead_int, :, :]
        elif time_period == 'seasonal':
            leads_to_avg = slice(lead_int - 1, lead_int + 2)
            fcst_subset = fcst_data[:, leads_to_avg, :, :]
            fcst_proc = np.nanmean(fcst_subset, axis=1)
    except IndexError:
         print(f"    WARNING: Lead time {lead_int} out of bounds for {init_year}. Returning NaNs.")
         return np.full(GRID_SHAPE, np.nan)

    if return_members:
        fcst = fcst_proc
    else:
        fcst = np.nanmean(fcst_proc, axis=0)

    # Unit Conversion
    if variable == 'prate':
        fcst = fcst * 86400
    elif variable in ['tmp2m', 'tmpsfc']:
        fcst = fcst - 273.15
        if variable == 'tmpsfc':
             fcst = np.where(fcst > 600, np.nan, fcst)

    print(f"    Shape of fcst array: {fcst.shape}")
    return fcst
# --------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------
def calc_clim(base_path, variable, time_period,
              init_month, lead_time, config,
              return_members=False):

    mode_str = "Member-Wise" if return_members else "Ensemble Mean"
    print(f"Calculating {mode_str} Climatology for {variable}...")

    clim_years = config['years']
    model_order = config['model_out']
    lead_int = int(lead_time)
    history_stack = []

    for year in clim_years:
        year_str = str(year)
        current_year_models = []

        for model_name in model_order:
            n_members = config['model_dict'][model_name]
            file_name = f"{model_name}.{variable}.{year_str}{init_month}.fcst.nc"
            full_path = os.path.join(base_path, model_name, file_name)

            # --- SAFETY CHECK 1: File Existence ---
            if not os.path.exists(full_path):
                print(f"    WARNING: File missing for year {year_str}: {full_path}")
                continue # Skip this model, try next model or next year

            try:
                ds = xr.open_dataset(full_path, decode_times=False)
                ds_subset = ds.isel(ensmem=slice(0, n_members))
                current_year_models.append(ds_subset)
            except Exception as e:
                print(f"    WARNING: Error reading {full_path}: {e}")
                continue

        # --- SAFETY CHECK 2: Did we find ANY models for this year? ---
        if not current_year_models:
            print(f"    WARNING: No valid model data found for year {year_str}. Skipping this year in climatology.")
            continue # Skip to the next year

        # Combine models (if multi-model) or just take the single model
        try:
            ds_combined = xr.concat(current_year_models, dim='ensmem')
            fcst_data = ds_combined['fcst'].values
            fcst_data = np.where(np.abs(fcst_data) < 9999, fcst_data, np.nan)
        except Exception as e:
            print(f"    WARNING: Error combining data for {year_str}: {e}. Skipping.")
            continue

        # --- SAFETY CHECK 3: Is the data All-NaNs? (The 2003 Fix) ---
        if np.isnan(fcst_data).all():
            print(f"    WARNING: Data for {year_str} is entirely NaN. Skipping this year in climatology.")
            continue

        # Process Time Period (Monthly vs Seasonal)
        if time_period == 'monthly':
            data_proc = fcst_data[:, lead_int, :, :]
        elif time_period == 'seasonal':
            leads_to_avg = slice(lead_int - 1, lead_int + 2)
            fcst_subset = fcst_data[:, leads_to_avg, :, :]
            data_proc = np.nanmean(fcst_subset, axis=1)

        history_stack.append(data_proc)

    # --- FINAL CHECK: Did we get enough years? ---
    if len(history_stack) == 0:
        raise RuntimeError("CRITICAL ERROR: No valid years found for climatology. Check paths and data.")
    full_hist_array = np.stack(history_stack, axis=0)

    # Unit Conversions
    if variable == 'prate':
        full_hist_array = full_hist_array * 86400
    elif variable in ['tmp2m', 'tmpsfc']:
        full_hist_array = full_hist_array - 273.15
        if variable == 'tmpsfc':
            full_hist_array = np.where(full_hist_array > 600, np.nan, full_hist_array)

    # Statistics Calculation
    if return_members:
        clim = np.nanmean(full_hist_array, axis=0)
        stddev = np.nanstd(full_hist_array, axis=0)
        anoms = full_hist_array - clim
        ptiles = np.nanpercentile(anoms, [33, 66], axis=0)
    else:
        yearly_means = np.nanmean(full_hist_array, axis=1)
        clim = np.nanmean(yearly_means, axis=0)
        stddev = np.nanstd(yearly_means, axis=0)
        anoms = yearly_means - clim
        ptiles = np.nanpercentile(anoms, [33, 66], axis=0)

    print(f"    Climatology successfully calculated using {len(history_stack)} valid years.")
    print(f"    clim shape: {clim.shape}")
    return clim, stddev, ptiles
# --------------------------------------------------------------------------------------------------


# --------------------------------------------------------------------------------------------------
def calc_anom(fcst, clim, stddev):
    print('Calculating anomalies...')
    anom = fcst - clim
    with np.errstate(divide='ignore', invalid='ignore'):
        std_anom = anom / stddev
    std_anom = np.where(np.isfinite(std_anom), std_anom, np.nan)
    print(f'    anom Shape: {anom.shape}')
    return anom, std_anom
# --------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------
def create_terciles(fcst, clim, stddev, ptiles, variable):
    print(f'Creating tercile probabilities for {variable} (Member-Wise)...')
    anoms = fcst - clim
    with np.errstate(divide='ignore', invalid='ignore'):
        std_anoms = anoms / stddev
    std_anoms = np.where(np.isfinite(std_anoms), std_anoms, np.nan)

    if variable in ['tmp2m', 'tmpsfc']:
        thresh_low, thresh_high = -0.43, 0.43
        field = std_anoms
    else:
        thresh_low, thresh_high = ptiles[0, :, :, :], ptiles[1, :, :, :]
        field = anoms

    ut_ones = np.where(field > thresh_high, 1, 0)
    lt_ones = np.where(field < thresh_low, 1, 0)
    mt_ones = np.where((field >= thresh_low) & (field <= thresh_high), 1, 0)

    total_members = field.shape[0]
    ut_prob = np.nansum(ut_ones, axis=0) / total_members
    lt_prob = np.nansum(lt_ones, axis=0) / total_members
    mt_prob = np.nansum(mt_ones, axis=0) / total_members

    terciles = np.array([lt_prob, mt_prob, ut_prob])
    print(f'    terciles shape: {terciles.shape}')
    return terciles
# --------------------------------------------------------------------------------------------------



# --------------------------------------------------------------------------------------------------
def create_obs_anomalies(base_path, clim_period, variable, init_month, lead_time, time_period):
    """
    The Corrected Xarray Implementation.
    - Precision: float64
    - Logic: Vectorized (No loops)
    - Boundary: Truncated (No time wrapping)
    - NaNs: Handled via nanmean
    - FIXED: Slicing logic now aligns strictly with file start year
    """
    import xarray as xr
    import numpy as np
    import os

    # 1. Setup Logic
    init = int(init_month) - 1
    lead = int(lead_time)

    # This finds the correct month index (0=Jan, 1=Feb) regardless of lead time
    target_month_idx = (init + lead) % 12

    # We do NOT want to use (init + lead) as the start index, because if init+lead > 12,
    # it skips the first year of data in the file. We want all available years.
    clim_str = clim_period.replace('_', '-')

    file_map = {
        'tmp2m': f"ghcn_cams.1x1.{clim_str}.mon.nc",
        'prate': f"cmap.1x1.{clim_str}.mon.nc",
        'tmpsfc': f"oisstv2.1.1x1.{clim_str}.mon.nc",
        'soilm1m': f"ERA5.soilm1m.1x1.{clim_str}.mon.nc"
    }
    var_map = {'tmp2m': 'tmp2m', 'prate': 'precip', 'tmpsfc': 'tmpsfc', 'soilm1m': 'soilm'}

    if '/cpc/nmme/' in base_path:
        obs_path = "/cpc/home/jinfanti/MET/S2S/PythonWrapper_NMME/nmme_met_development/obs_data/"
    else:
        obs_path = base_path

    full_path = os.path.join(obs_path, file_map[variable])

    # 2. Load Data (Xarray automatically promotes to float64)
    try:
        ds = xr.open_dataset(full_path, decode_times=False)
        obs_raw = ds[var_map.get(variable, variable)].values.astype(np.float64)
    except Exception as e:
        raise FileNotFoundError(f"Error loading {full_path}: {e}")

    # 3. Seasonal Means Logic
    if time_period == 'seasonal':
        term_minus = np.roll(obs_raw, 1, axis=0)
        term_plus = np.roll(obs_raw, -1, axis=0)

        term_minus[0] = np.nan
        term_plus[-1] = np.nan

        stack_season = np.stack([term_minus, obs_raw, term_plus], axis=0)

        with np.errstate(invalid='ignore'):
            obs_processed = np.nanmean(stack_season, axis=0)

    else:
        obs_processed = obs_raw

    # 4. Climatology Calculation
    # Slice: Start at target month, step by 12.
    clim_slice = obs_processed[target_month_idx::12, :, :]
    obs_clim = np.nanmean(clim_slice, axis=0)
    obs_stddev = np.nanstd(clim_slice, axis=0)

    # 5. Verification Slicing
    # We use target_month_idx instead of verif_start_idx.
    # This ensures that if the file starts in 1994, Index 0 is the 1994 target season.
    verif_slice = obs_processed[target_month_idx::12, :, :]

    # 6. Anomaly Calculation
    obs_anom = verif_slice - obs_clim

    with np.errstate(divide='ignore', invalid='ignore'):
        obs_std_anom = obs_anom / obs_stddev

    obs_std_anom = np.where(np.isfinite(obs_std_anom), obs_std_anom, np.nan)

    return verif_slice, obs_anom, obs_std_anom, obs_clim, obs_stddev
# --------------------------------------------------------------------------------------------------


# --------------------------------------------------------------------------------------------------
def create_obs_terciles(obs_anom, obs_std_anom, variable):
    print(f'Creating observed terciles for {variable}...')
    if variable in ['tmp2m', 'tmpsfc']:
        thresh_low, thresh_high = -0.43, 0.43
        field = obs_std_anom
    else:
        thresh_low = np.percentile(obs_anom, 33, axis=0)
        thresh_high = np.percentile(obs_anom, 66, axis=0)
        field = obs_anom

    ut_ones = np.where(field > thresh_high, 1, 0)
    lt_ones = np.where(field < thresh_low, 1, 0)
    mt_ones = np.where((field >= thresh_low) & (field <= thresh_high), 1, 0)

    terciles_obs = np.stack([lt_ones, mt_ones, ut_ones], axis=0)
    print(f'    terciles_obs shape: {terciles_obs.shape}')
    return terciles_obs
# --------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------
def create_dominant_tercile_fcst(terciles_fcst):
    max_indices = np.argmax(terciles_fcst, axis=0)
    dominant_tercile_model = max_indices + 1.0
    mask_check = np.nansum(terciles_fcst, axis=0)
    dominant_tercile_model = np.where(mask_check == 0, np.nan, dominant_tercile_model)
    dominant_tercile_model = np.where(np.isnan(mask_check), np.nan, dominant_tercile_model)
    return dominant_tercile_model
# --------------------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------------------
def create_dominant_tercile_obs(terciles_obs):
    max_indices = np.argmax(terciles_obs, axis=0)
    dominant_tercile_obs = max_indices + 1.0
    mask_check = np.nansum(terciles_obs, axis=0)
    dominant_tercile_obs = np.where(mask_check == 0, np.nan, dominant_tercile_obs)
    dominant_tercile_obs = np.where(np.isnan(mask_check), np.nan, dominant_tercile_obs)
    return dominant_tercile_obs
# --------------------------------------------------------------------------------------------------

parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes/wrapper_combined.py
import sys
import os
import time
import numpy as np
import datetime as dt
import xarray as xr
from dateutil.relativedelta import *
import function_library as fl

# =================================================================
# USER CONFIGURATION: ORIENTATION FLAGS
# =================================================================
FLIP_OBS = True      # Set to True to flip Observation data (North <-> South)
FLIP_MODELS = False   # Set to True to flip Model data (North <-> South)
# =================================================================

def main():
    start_time = time.time()
    print("-------------------------------------------------------------------")
    print("Starting NMME Pre-processing Wrapper (Manual Flip Enforced)")
    print("-------------------------------------------------------------------")

    if len(sys.argv) < 10:
        print("Usage: python wrapper_combined.py <init> <lead> <var> <clim_per> <models> <path> <time_per> <target_yr> <field>")
        sys.exit(1)

    init_month, lead_time, variable = sys.argv[1], sys.argv[2], sys.argv[3]
    clim_period, model_arg = sys.argv[4], sys.argv[5]
    base_path, time_period = sys.argv[6], sys.argv[7]
    target_year, field = sys.argv[8], sys.argv[9]

    model_input = [model_arg]
    init_time_str = target_year + init_month
    init_time = dt.datetime.strptime(init_time_str, "%Y%m")
    lead_val = int(lead_time)
    val_time = init_time + relativedelta(months=lead_val)

    # Unit Logic
    if variable in ['tmpsfc', 'tmp2m']:
        units = 'deg C'
    elif variable == 'prate':
        units = 'mm/day'
    else:
        units = 'unitless'
        if 'tercile' in field: units = 'percent'

    try:
        config = fl.setup(model_input, clim_period)

        # Parse Start Year for Obs logic
        try:
            sep = '_' if '_' in clim_period else '-'
            file_start_year = int(clim_period.split(sep)[0])
        except:
            file_start_year = 1982
            print("WARNING: Defaulting obs start year to 1982.")

        v = None
        is_observation = (model_arg == 'obs')

        # Determine if we need to flip based on the flags at the top
        should_flip = False

        # ==================================================================
        # BRANCH A: OBSERVATIONS
        # ==================================================================
        if is_observation:
            print(">>> MODE: OBSERVATION PROCESSING")
            should_flip = FLIP_OBS

            # Process Data
            verif, anom, std_anom, clim, std = fl.create_obs_anomalies(
                base_path, clim_period, variable, init_month, lead_time, time_period
            )

            target_idx = int(target_year) - file_start_year
            if target_idx < 0 or target_idx >= anom.shape[0]:
                raise IndexError(f"Target year {target_year} is outside range.")

            if field == 'raw': v = verif[target_idx, :, :]
            elif field == 'anom': v = anom[target_idx, :, :]
            elif field == 'std_anom': v = std_anom[target_idx, :, :]
            elif field == 'clim_mean': v = clim
            elif field == 'clim_std': v = std
            elif 'tercile' in field:
                obs_terciles = fl.create_obs_terciles(anom, std_anom, variable)
                if field == 'lower_tercile': v = obs_terciles[0, target_idx, :, :]
                elif field == 'middle_tercile': v = obs_terciles[1, target_idx, :, :]
                elif field == 'upper_tercile': v = obs_terciles[2, target_idx, :, :]

        # ==================================================================
        # BRANCH B: MODELS
        # ==================================================================
        else:
            print(">>> MODE: MODEL FORECAST PROCESSING")
            should_flip = FLIP_MODELS

            # Process Data
            if field == 'raw':
                v = fl.open_and_process_models(base_path, variable, time_period, init_month, lead_time, target_year, config, return_members=False)
            elif 'clim' in field:
                clim_mean, std_mean, _ = fl.calc_clim(base_path, variable, time_period, init_month, lead_time, config, return_members=False)
                v = clim_mean if field == 'clim_mean' else std_mean
            elif field in ['anom', 'std_anom']:
                clim_mean, std_mean, _ = fl.calc_clim(base_path, variable, time_period, init_month, lead_time, config, return_members=False)
                fcst_mean = fl.open_and_process_models(base_path, variable, time_period, init_month, lead_time, target_year, config, return_members=False)
                anom, std_anom = fl.calc_anom(fcst_mean, clim_mean, std_mean)
                v = anom if field == 'anom' else std_anom
            elif 'tercile' in field:
                clim_mem, std_mem, ptiles_mem = fl.calc_clim(base_path, variable, time_period, init_month, lead_time, config, return_members=True)
                fcst_mem = fl.open_and_process_models(base_path, variable, time_period, init_month, lead_time, target_year, config, return_members=True)
                probs = fl.create_terciles(fcst_mem, clim_mem, std_mem, ptiles_mem, variable)
                if field == 'lower_tercile': v = probs[0,:,:]
                elif field == 'middle_tercile': v = probs[1,:,:]
                elif field == 'upper_tercile': v = probs[2,:,:]

        # ------------------------------------------------------------------
        # FINAL PROCESSING
        # ------------------------------------------------------------------
        if v is None: raise NameError("Data variable 'v' was not assigned.")

        var = np.float64(v)
        var[var < -800] = np.nan
        var[var > 800]  = np.nan
        met_data = np.squeeze(var).copy()

        # --- FLIP LOGIC ---
        if should_flip:
            print("    ACTION: Flipping data (np.flipud) to align with MET Grid.")
            met_data = np.flipud(met_data).copy()
        else:
            print("    ACTION: Data orientation preserved (No Flip applied).")

        # Grid Definition (South-to-North)
        grid_data = {
            'name': 'Global_1x1', 'type': 'LatLon',
            'lat_ll': -90.0, 'lon_ll': 0.0, 'delta_lat': 1.0, 'delta_lon': 1.0,
            'Nlat': 181, 'Nlon': 360,
        }

        attrs = {
            'valid': str(val_time.strftime("%Y%m%d"))+'_'+str(val_time.strftime("%H%M%S")),
            'init': str(init_time.strftime("%Y%m%d"))+'_'+str(init_time.strftime("%H%M%S")),
            'lead':  lead_time,
            'name':  variable,
            'accum': '00', 'level': 'ground', 'units': units,
            'long_name': f"{variable} {field}",
            'grid': grid_data
        }

    except Exception as e:
        print(f"ERROR: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)

    elapsed_time = time.time() - start_time
    print(f"Elapsed time: {elapsed_time:.2f} seconds")
    return met_data, attrs

print(f"PYTHON SCRIPT ARGUMENTS: {sys.argv}")
met_data, attrs = main()

For more information on the basic requirements to utilize Python Embedding in METplus, please refer to the MET User’s Guide section on Python embedding.

User Scripting

This use case does not use additional scripts.

Running METplus

Pass the use case configuration file to the run_metplus.py script along with any user-specific system configuration files if desired:

run_metplus.py /path/to/METplus/parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes.conf /path/to/user_system.conf

See Running METplus for more information.

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output will be in a directory named SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes relative to OUTPUT BASE and will contain 12 files. These can be grouped into three lead times (1, 2, and 3 month) and follow this naming scheme:

  • Bias_SeriesAnalysis_tmp2m_raw_CFSandSFS_IC11_Lead0xseasonal_1994_2020_ElNino.nc

  • Bias_SeriesAnalysis_tmp2m_raw_CFSandSFS_IC11_Lead0xseasonal_1994_2020_LaNina.nc

  • Bias_SeriesAnalysis_tmp2m_raw_CFSandSFS_IC11_Lead0xseasonal_1994_2020.nc

  • Bias_SeriesAnalysis_tmp2m_raw_CFSandSFS_IC11_Lead0xseasonal_1994_2020_Neutral.nc

Where x is the lead time value. In each netCDF file, six variable fields are present (not including the lat/lon fields). These correspond to the statistics that were requested in the configuration file, and are as follows:

  • series_cnt_ME(lat, lon)

  • series_cnt_MAE(lat, lon)

  • series_cnt_RMSE(lat, lon)

  • series_cnt_TOTAL(lat, lon)

  • series_cnt_FBAR(lat, lon)

  • series_cnt_OBAR(lat, lon)

Keywords

Note

  • SeriesAnalysisUseCase

  • PythonEmbeddingFileUseCase

  • CustomStringLoopingUseCase

  • S2SMMEAppUseCase

Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.

sphinx_gallery_thumbnail_path = ‘_static/s2s_mme-SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SelectClimateRegimes.png’

Gallery generated by Sphinx-Gallery