Note
Go to the end to download the full example code.
SeriesAnalysis: Probabilistic verification across tercile climatology
model_applications/s2s_mme/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend.conf
Scientific Objective
When verifying climatology data, it is sometimes more advantageous to calculate the climatology fields across the data itself, rather than ingesting another dataset. This use case expands on this concept, with a static probability field of 0.33 being read in as the climatology mean file and acting as the tercile that the data can be measured against. The forecast is calculated in probabilistic space with the aid of Python Embedding and demonstrates how S2S verification space can be more easily accessed in METplus.
Version Added
METplus version 13.0
Datasets
Forecast: Seasonal Forecast System (SFS) Baseline, 1 degree resolution, 2m temperature
Observation: Global Historical Climatology Network ver. 2, Climatology Anomaly Monitoring System (GHCNCAMS) 1 degree resolution, 2m temperature
Climatology: A self-created constant field of 0.33
Note that this use case ingests the same dataset used in two other use cases in the this category, and in an effort to mimize data duplication, paths to the data may reference another use case name
Location: All of the input data required for this use case can be found in a sample data tarball. Each use case category will have one or more sample data tarballs. It is only necessary to download the tarball with the use case’s dataset and not the entire collection of sample data. Click here to access the METplus releases page and download sample data for the appropriate release: https://github.com/dtcenter/METplus/releases This tarball should be unpacked into the directory that you will set the value of INPUT_BASE. See Running METplus section for more information.
METplus Components
The only tool this use case calls is SeriesAnalysis, but there are four instances of SeriesAnalysis. A Python script is used for ingesting forecast and observation data, once for each year of data in the CFSv2 and SFS ensembles.
METplus Workflow
Beginning time (INIT_BEG): 1994-11-01
End time (INIT_END): 2020-11-01
Increment between beginning and end times (INIT_INCREMENT): 1 year
Sequence of forecast leads to process (LEAD_SEQ): None
With an increment of 1 year, all November 1st’s from 1994 to 2020 are processed for a total of 27 years in SeriesAnalysis. This use case utilizes Python Embedding which moves the input into probabilistic space when it is returned to SeriesAnalysis, so all thresholds are unitless. To focus on instances where the lower tercile (0.33) is met, the observation categorical threshold is set to >=0.5, where the observation field that is preprocessed by Python Embedding is already set to 0 or 1, to indicate an observed lower tercile event occurred or not. Additionally, the lower tercile is set using the climatology mean input file.
METplus Configuration
METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line, e.g. parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend.conf
[config]
# For additional information, please see the METplus Users Guide.
# https://metplus.readthedocs.io/en/latest/Users_Guide
###
# Processes to run
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#process-list
###
PROCESS_LIST = SeriesAnalysis
###
# Field Info
# https://metplus.readthedocs.io/en/latest/Users_Guide/systemconfiguration.html#field-info
###
# USER INPUT
# Note that MODEL can be a single or group of models
# Individual models should have a definition of their name and number of ensemble members on approx line 8 of function_library_withDetrend.py
# MODEL_GROUPS should be defined in function_library_withDetrend.py approx line 15
# FIELD1 can be raw, anom, std_anom, clim_mean, clim_std, lower_tercile, middle_tercile, upper_tercile
# TIME_PERIOD can be monthly or seasonal means
MODEL = SFS_Baseline
OBTYPE = OBS
FIELD1 = lower_tercile
VARIABLE = tmp2m
CLIM = 1994_2020
TIME_PERIOD = seasonal
# CUSTOM LOOP LIST
# Taking the place of LEAD
SERIES_ANALYSIS_CUSTOM_LOOP_LIST = 01
LOOP_BY = INIT
INIT_TIME_FMT = %Y%m%d%H
INIT_BEG=1994110100
INIT_END=2020110100
INIT_INCREMENT = 1Y
FCST_SERIES_ANALYSIS_INPUT_TEMPLATE = {init?fmt=%Y}
OBS_SERIES_ANALYSIS_INPUT_TEMPLATE = {init?fmt=%Y}
FCST_SERIES_ANALYSIS_INPUT_DATATYPE = PYTHON_NUMPY
OBS_SERIES_ANALYSIS_INPUT_DATATYPE = PYTHON_NUMPY
FCST_VAR1_NAME = {CONFIG_DIR}/wrapper_combined_withDetrend.py 11 {custom?fmt=%s} tmp2m {CLIM} {MODEL} {INPUT_BASE}/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SingleOrMultimodel/fcst {TIME_PERIOD} MET_PYTHON_INPUT_ARG {FIELD1}
FCST_VAR1_LEVELS = "(*,*)"
OBS_VAR1_NAME = {CONFIG_DIR}/wrapper_combined_withDetrend.py 11 {custom?fmt=%s} tmp2m {CLIM} obs {INPUT_BASE}/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SingleOrMultimodel/obs {TIME_PERIOD} MET_PYTHON_INPUT_ARG {FIELD1}
OBS_VAR1_LEVELS = "(*,*)"
CONFIG_DIR = {PARM_BASE}/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend
SERIES_ANALYSIS_VLD_THRESH = 0.7
SERIES_ANALYSIS_BLOCK_SIZE = 360*181
FCST_IS_PROB = TRUE
FCST_CAT_THRESH = ==0.1
OBS_SERIES_ANALYSIS_CAT_THRESH = >=0.5
SERIES_ANALYSIS_CLIMO_MEAN_DAY_INTERVAL = NA
SERIES_ANALYSIS_CLIMO_MEAN_HOUR_INTERVAL = NA
SERIES_ANALYSIS_CLIMO_STDEV_DAY_INTERVAL = NA
SERIES_ANALYSIS_CLIMO_STDEV_HOUR_INTERVAL = NA
SERIES_ANALYSIS_CLIMO_MEAN_FILE_TYPE = NETCDF_NCCF
SERIES_ANALYSIS_CLIMO_MEAN_FILE_NAME = {INPUT_BASE}/model_applications/s2s_mme/SeriesAnalysis_fcstCFSandSFS_obsGHCNCAMS_SingleOrMultimodel/climo/climo_const.nc
SERIES_ANALYSIS_CLIMO_MEAN_FIELD = {name="probability"; level="(*,*)";}
SERIES_ANALYSIS_CLIMO_STDEV_FILE_NAME =
SERIES_ANALYSIS_CLIMO_STDEV_FIELD =
SERIES_ANALYSIS_CLIMO_CDF_DIRECT_PROB = TRUE
SERIES_ANALYSIS_OUTPUT_PREFIX =
SERIES_ANALYSIS_RUNTIME_FREQ = RUN_ONCE
SERIES_ANALYSIS_OUTPUT_STATS_PSTD = BRIER, RELIABILITY, RESOLUTION, UNCERTAINTY, ROC_AUC, BSS, BRIERCL, TOTAL
SERIES_ANALYSIS_ONCE_PER_FIELD = False
SERIES_ANALYSIS_OUTPUT_DIR = {OUTPUT_BASE}/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend
SERIES_ANALYSIS_OUTPUT_TEMPLATE = BSS_SeriesAnalysis_tmp2m_{FIELD1}_{MODEL}_IC11_Lead{custom?fmt=%s}{TIME_PERIOD}_{CLIM}.nc
MET Configuration
METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.
YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!
If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings
SeriesAnalysisConfig_wrapped
////////////////////////////////////////////////////////////////////////////////
//
// Series-Analysis configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////
//
// Output model name to be written
//
//model =
${METPLUS_MODEL}
//
// Output description to be written
//
//desc =
${METPLUS_DESC}
//
// Output observation type to be written
//
//obtype =
${METPLUS_OBTYPE}
////////////////////////////////////////////////////////////////////////////////
//
// Verification grid
// May be set separately in each "field" entry
//
//regrid = {
${METPLUS_REGRID_DICT}
////////////////////////////////////////////////////////////////////////////////
censor_thresh = [];
censor_val = [];
//cat_thresh =
${METPLUS_CAT_THRESH}
cnt_thresh = [ NA ];
cnt_logic = UNION;
//
// Forecast and observation fields to be verified
//
fcst = {
${METPLUS_FCST_FILE_TYPE}
${METPLUS_FCST_CAT_THRESH}
//field = [
${METPLUS_FCST_FIELD}
${METPLUS_FCST_CLIMO_MEAN_DICT}
${METPLUS_FCST_CLIMO_STDEV_DICT}
}
obs = {
${METPLUS_OBS_FILE_TYPE}
${METPLUS_OBS_CAT_THRESH}
//field = [
${METPLUS_OBS_FIELD}
${METPLUS_OBS_CLIMO_MEAN_DICT}
${METPLUS_OBS_CLIMO_STDEV_DICT}
}
////////////////////////////////////////////////////////////////////////////////
//
// Climatology data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}
//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}
//climo_cdf = {
${METPLUS_CLIMO_CDF_DICT}
////////////////////////////////////////////////////////////////////////////////
//
// Confidence interval settings
//
ci_alpha = [ 0.05 ];
boot = {
interval = PCTILE;
rep_prop = 1.0;
n_rep = 0;
rng = "mt19937";
seed = "";
}
////////////////////////////////////////////////////////////////////////////////
//
// Verification masking regions
//
//mask = {
${METPLUS_MASK_DICT}
////////////////////////////////////////////////////////////////////////////////
//
// Gradient statistics
// May be set separately in each "obs.field" entry
//
//gradient = {
${METPLUS_GRADIENT_DICT}
//
// Number of grid points to be processed concurrently. Set smaller to use
// less memory but increase the number of passes through the data.
//
//block_size =
${METPLUS_BLOCK_SIZE}
//
// Ratio of valid matched pairs to compute statistics for a grid point
//
//vld_thresh =
${METPLUS_VLD_THRESH}
////////////////////////////////////////////////////////////////////////////////
//
// Statistical output types
//
//output_stats = {
${METPLUS_OUTPUT_STATS_DICT}
////////////////////////////////////////////////////////////////////////////////
//hss_ec_value =
${METPLUS_HSS_EC_VALUE}
rank_corr_flag = FALSE;
tmp_dir = "${MET_TMP_DIR}";
//version = "V10.0";
////////////////////////////////////////////////////////////////////////////////
${METPLUS_TIME_OFFSET_WARNING}
${METPLUS_MET_CONFIG_OVERRIDES}
Python Embedding
This use case utilizes two Python scripts. The first, function_library_withDetrend.py, serves the purpose of data handling. It open and reads in NMME and observational data, while also calculating the tercile probabilities evaluated in METplus using CPC methodologies (including non-normal assumption for precipitation terciles, or other variables as needed). For more simple changes, users can add additional models under MODEL_SPECS (in the format ‘model_name’: nMembers); and additional model groupings can be added under MODEL_GROUPS, in the format ‘short_name’: [‘list’, ‘of’, ‘models’]. Both of these are near the top of the function library. Additional observed datasets can be added under the file_map and var_map, under create_obs_anomalies. This python has the addiitonal enhancement of a dynamic “smart masking” logic. This procedure identifies the valid domain of any input variable (e.g., Land for temperature, Ocean for SST) and converts NaN values within that domain to zeros to ensuring non-events are statistically counted as “misses.” It also re-applies the original data mask to the invalid regions (e.g., oceans for land-only data), preventing invalid verification over undefined areas regardless of the variable type. The second script, wrapper_combined_withDetrend.py, serves as the interface between python logic and METplus. Based on options in the METplus config file, it formats model and observational data (e.g., standardizing to lat x lon grids) and holds it in memory for METplus to ingest. It also feature options for flags (FLIP_OBS, FLIP_MODELS) to handle latitude orientation mismatches, ensuring data is geometrically correct before MET sees it. This use case could also be made to detrend the data, by updating the DETREND_DATA setting in function_library_withDetrend.py to TRUE.
parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend/function_library_withDetrend.py
import numpy as np
import xarray as xr
import os
# --------------------------------------------------------------------------------------------------
# 1. Define nMembers in each model
MODEL_SPECS = {
'CFSv2': 24, 'NCAR_CCSM4': 10, 'GEM5_NEMO': 10, 'NASA_GEOS5v2': 4,
'CanCM4i': 10, 'GFDL_SPEAR': 15, 'GEM5.2_NEMO': 20, 'CanESM5': 20,
'SFS_Baseline': 9, 'NCAR_CESM1': 10, 'obs': 1
}
# 2. Define Model Groups
MODEL_GROUPS = {
'NMME': ['CFSv2', 'NCAR_CCSM4', 'GEM5.2_NEMO', 'NASA_GEOS5v2', 'CanESM5', 'GFDL_SPEAR'],
'miniNMME': ['CFSv2', 'NCAR_CCSM4', 'GFDL_SPEAR'],
'miniNMME_addSFS': ['CFSv2', 'NCAR_CCSM4', 'GFDL_SPEAR', 'SFS_Baseline'],
'miniNMME_replaceCFSwSFS': ['SFS_Baseline', 'NCAR_CCSM4', 'GFDL_SPEAR'],
'NMMEwithCFS': ['CFSv2', 'NCAR_CCSM4', 'NCAR_CESM1', 'GFDL_SPEAR', 'GEM5.2_NEMO', 'CanESM5', 'NASA_GEOS5v2'],
'NMMEwithSFS': ['SFS_Baseline', 'NCAR_CCSM4', 'NCAR_CESM1', 'GFDL_SPEAR', 'GEM5.2_NEMO', 'CanESM5', 'NASA_GEOS5v2'],
'CFSandSFS': ['SFS_Baseline', 'CFSv2'],
}
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def detrend_grid(data_array, years):
"""
Linearly detrends data at every grid point over the time dimension (axis 0).
data_array: Shape (Time, Lat, Lon)
years: 1D array of years corresponding to Time axis.
Returns:
detrended_data: Shape (Time, Lat, Lon) with trend removed (residuals + mean)
slope_grid: Shape (Lat, Lon) - Slope per year
intercept_grid: Shape (Lat, Lon)
"""
print(" Calculating linear trend for removal...")
n_t, n_lat, n_lon = data_array.shape
# Flatten spatial dims for vectorized polyfit
y_reshaped = data_array.reshape(n_t, -1)
# Handle NaNs: mask them out or polyfit will fail.
valid_mask = np.isfinite(y_reshaped).all(axis=0)
# Prepare Output
detrended_flat = y_reshaped.copy()
slope_flat = np.zeros(y_reshaped.shape[1])
intercept_flat = np.zeros(y_reshaped.shape[1])
# x vector
x = years
if np.any(valid_mask):
y_valid = y_reshaped[:, valid_mask]
# np.polyfit returns [slope, intercept]
coeffs = np.polyfit(x, y_valid, 1)
slopes = coeffs[0, :]
intercepts = coeffs[1, :]
# Calculate Trend Line
# trend = slope * x + intercept
trend = np.outer(x, slopes) + intercepts
# Remove trend but keep the mean!
# Standard Detrend: Data - Trend_Line + Mean
means = np.mean(y_valid, axis=0)
detrended_valid = y_valid - trend + means
# Store back
detrended_flat[:, valid_mask] = detrended_valid
slope_flat[valid_mask] = slopes
intercept_flat[valid_mask] = intercepts
return detrended_flat.reshape(n_t, n_lat, n_lon), slope_flat.reshape(n_lat, n_lon), intercept_flat.reshape(n_lat, n_lon)
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def setup(model_input_list, clim_per_str):
print('Running setup for model data choices and climate period...')
if len(model_input_list) == 1 and model_input_list[0] in MODEL_GROUPS:
selected_models = MODEL_GROUPS[model_input_list[0]]
else:
selected_models = model_input_list
final_model_dict = {}
for m in selected_models:
if m in MODEL_SPECS:
final_model_dict[m] = MODEL_SPECS[m]
else:
raise ValueError(f"Model '{m}' not supported.")
n_ens = sum(final_model_dict.values())
try:
start_year, end_year = map(int, clim_per_str.split('_'))
years = np.arange(start_year, end_year + 1, 1)
except ValueError:
raise ValueError(f"Climatology format '{clim_per_str}' is invalid.")
print(f" Climatology Period: {clim_per_str} (Years: {len(years)})")
print(f" Total Ensemble Members (nEns): {n_ens}")
return {
'model_dict': final_model_dict,
'years': years,
'model_out': list(final_model_dict.keys()),
'climo': clim_per_str,
'mems_total': n_ens
}
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def get_time_indices(init, lead, init_year, config):
init_idx = int(init) - 1
lead_int = int(lead)
init_yr_int = int(init_year)
absolute_month_index = init_idx + lead_int
years_added = absolute_month_index // 12
target_month_idx = absolute_month_index % 12
target_year = init_yr_int + years_added
month_abbrs = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
season_abbrs = ['DJF', 'JFM', 'FMA', 'MAM', 'AMJ', 'MJJ', 'JJA', 'JAS', 'ASO', 'SON', 'OND', 'NDJ']
target_lead_name = month_abbrs[target_month_idx]
target_season_name = season_abbrs[target_month_idx]
print(f" Date Info: Init {init} (Year {init_year}) + Lead {lead} --> Valid: {target_lead_name} ({target_season_name}) Year {target_year}")
return {
'target_year': target_year,
'lead_name': target_lead_name,
'season_name': target_season_name,
}
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def open_and_process_models(base_path, variable, time_period,
init_month, lead_time, init_year,
config, return_members=False):
mode_str = "Full Ensemble" if return_members else "Ensemble Mean"
print(f"Opening target data for {variable} ({mode_str})...")
model_order = config['model_out']
ds_list = []
# We need to know the grid shape in case everything fails.
GRID_SHAPE = (181, 360)
for model_name in model_order:
n_members = config['model_dict'][model_name]
file_name = f"{model_name}.{variable}.{init_year}{init_month}.fcst.nc"
full_path = os.path.join(base_path, model_name, file_name)
try:
# Check if file exists first
if not os.path.exists(full_path):
print(f" WARNING: File missing: {full_path}")
continue
ds = xr.open_dataset(full_path, decode_times=False)
# Check if data variable exists
if 'fcst' not in ds:
print(f" WARNING: 'fcst' variable missing in {full_path}")
continue
ds_subset = ds.isel(ensmem=slice(0, n_members))
ds_list.append(ds_subset)
except Exception as e:
print(f" WARNING: Error reading {full_path}: {e}")
continue
# --- FAIL-SAFE BLOCK ---
if not ds_list:
print(f" WARNING: No valid data found for {init_year}. Returning -9999 to MET.")
return np.full(GRID_SHAPE, -9999.0)
# Combine data
ds_combined = xr.concat(ds_list, dim='ensmem')
fcst_data = ds_combined['fcst'].values
fcst_data = np.where(np.abs(fcst_data) < 9999, fcst_data, np.nan)
# --- NAN CHECK ---
if np.isnan(fcst_data).all():
print(f" WARNING: Data for {init_year} is all-NaN. Returning -9999 to MET.")
return np.full(GRID_SHAPE, -9999.0)
lead_int = int(lead_time)
# Time Slicing Logic
try:
if time_period == 'monthly':
fcst_proc = fcst_data[:, lead_int, :, :]
elif time_period == 'seasonal':
leads_to_avg = slice(lead_int - 1, lead_int + 2)
fcst_subset = fcst_data[:, leads_to_avg, :, :]
fcst_proc = np.nanmean(fcst_subset, axis=1)
except IndexError:
print(f" WARNING: Lead time {lead_int} out of bounds for {init_year}. Returning NaNs.")
return np.full(GRID_SHAPE, np.nan)
if return_members:
fcst = fcst_proc
else:
fcst = np.nanmean(fcst_proc, axis=0)
# Unit Conversion
if variable == 'prate':
fcst = fcst * 86400
elif variable in ['tmp2m', 'tmpsfc']:
fcst = fcst - 273.15
if variable == 'tmpsfc':
fcst = np.where(fcst > 600, np.nan, fcst)
print(f" Shape of fcst array: {fcst.shape}")
return fcst
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def _load_history_stack(base_path, variable, time_period, init_month, lead_time, config):
"""
Helper to avoid code duplication in calc_clim and get_trend_value.
Returns array of shape: (Years, Members, Lat, Lon)
"""
clim_years = config['years']
model_order = config['model_out']
lead_int = int(lead_time)
history_stack = []
for year in clim_years:
year_str = str(year)
current_year_models = []
for model_name in model_order:
n_members = config['model_dict'][model_name]
file_name = f"{model_name}.{variable}.{year_str}{init_month}.fcst.nc"
full_path = os.path.join(base_path, model_name, file_name)
if not os.path.exists(full_path):
print(f" WARNING: File missing for year {year_str}: {full_path}")
continue
try:
ds = xr.open_dataset(full_path, decode_times=False)
ds_subset = ds.isel(ensmem=slice(0, n_members))
current_year_models.append(ds_subset)
except Exception as e:
print(f" WARNING: Error reading {full_path}: {e}")
continue
if not current_year_models:
print(f" WARNING: No valid model data found for year {year_str}. Skipping.")
continue
try:
ds_combined = xr.concat(current_year_models, dim='ensmem')
fcst_data = ds_combined['fcst'].values
fcst_data = np.where(np.abs(fcst_data) < 9999, fcst_data, np.nan)
except Exception as e:
print(f" WARNING: Error combining data for {year_str}: {e}. Skipping.")
continue
if np.isnan(fcst_data).all():
print(f" WARNING: Data for {year_str} is entirely NaN. Skipping.")
continue
if time_period == 'monthly':
data_proc = fcst_data[:, lead_int, :, :]
elif time_period == 'seasonal':
leads_to_avg = slice(lead_int - 1, lead_int + 2)
fcst_subset = fcst_data[:, leads_to_avg, :, :]
data_proc = np.nanmean(fcst_subset, axis=1)
history_stack.append(data_proc)
if len(history_stack) == 0:
raise RuntimeError("CRITICAL ERROR: No valid years found for climatology.")
full_hist_array = np.stack(history_stack, axis=0) # (Years, Members, Lat, Lon)
# Unit Conversions
if variable == 'prate':
full_hist_array = full_hist_array * 86400
elif variable in ['tmp2m', 'tmpsfc']:
full_hist_array = full_hist_array - 273.15
if variable == 'tmpsfc':
full_hist_array = np.where(full_hist_array > 600, np.nan, full_hist_array)
return full_hist_array
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def get_trend_value(base_path, variable, time_period, init_month, lead_time, config, target_year):
"""
Helper to get the scalar trend value to remove from a specific target year for Models.
Calculates the ensemble mean history, fits trend, evaluates trend at target year.
Returns grid of adjustments to SUBTRACT from forecast.
"""
# 1. Get History (Raw)
clim_years = config['years']
hist_array = _load_history_stack(base_path, variable, time_period, init_month, lead_time, config)
# 2. Ensemble Mean of History
hist_ens_mean = np.nanmean(hist_array, axis=1) # (Years, Lat, Lon)
# 3. Calculate Trend
_, slope, intercept = detrend_grid(hist_ens_mean, clim_years)
# 4. Calculate Removal Value
# Trend to remove = (Slope * TargetYear + Intercept) - Mean
# Note: detrend_grid returns (Data - Trend + Mean).
# We want just the "Trend anomaly" to subtract from the raw forecast.
# Trend Anomaly = (Slope * Year + Intercept) - Mean
hist_mean = np.mean(hist_ens_mean, axis=0)
trend_val = (slope * int(target_year) + intercept) - hist_mean
return trend_val
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def calc_clim(base_path, variable, time_period,
init_month, lead_time, config,
return_members=False, detrend=False):
mode_str = "Member-Wise" if return_members else "Ensemble Mean"
print(f"Calculating {mode_str} Climatology for {variable} (Detrend={detrend})...")
clim_years = config['years']
full_hist_array = _load_history_stack(base_path, variable, time_period, init_month, lead_time, config)
# full_hist_array is (Years, Members, Lat, Lon)
# Detrending Logic
if detrend:
# We detrend the ENSEMBLE MEAN, then apply that adjustment to members if needed
# 1. Calc Ens Mean
ens_mean_hist = np.nanmean(full_hist_array, axis=1) # (Years, Lat, Lon)
# 2. Calc Trend on Ens Mean
detrended_mean, slope, intercept = detrend_grid(ens_mean_hist, clim_years)
# 3. Calculate the adjustment required per year
# adjustment = Original_Mean - Detrended_Mean
adjustment = ens_mean_hist - detrended_mean
# Expand dims to match members: (Years, 1, Lat, Lon)
adjustment = adjustment[:, np.newaxis, :, :]
# 4. Apply to full array
full_hist_array = full_hist_array - adjustment
# Statistics Calculation
if return_members:
clim = np.nanmean(full_hist_array, axis=0) # This will basically be the mean of the detrended series
stddev = np.nanstd(full_hist_array, axis=0)
anoms = full_hist_array - clim
ptiles = np.nanpercentile(anoms, [33, 66], axis=0)
else:
yearly_means = np.nanmean(full_hist_array, axis=1)
clim = np.nanmean(yearly_means, axis=0)
stddev = np.nanstd(yearly_means, axis=0)
anoms = yearly_means - clim
ptiles = np.nanpercentile(anoms, [33, 66], axis=0)
print(f" Climatology successfully calculated using {full_hist_array.shape[0]} valid years.")
print(f" clim shape: {clim.shape}")
return clim, stddev, ptiles
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def calc_anom(fcst, clim, stddev):
print('Calculating anomalies...')
anom = fcst - clim
with np.errstate(divide='ignore', invalid='ignore'):
std_anom = anom / stddev
std_anom = np.where(np.isfinite(std_anom), std_anom, np.nan)
print(f' anom Shape: {anom.shape}')
return anom, std_anom
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def create_terciles(fcst, clim, stddev, ptiles, variable):
print(f'Creating tercile probabilities for {variable} (Member-Wise)...')
anoms = fcst - clim
with np.errstate(divide='ignore', invalid='ignore'):
std_anoms = anoms / stddev
std_anoms = np.where(np.isfinite(std_anoms), std_anoms, np.nan)
if variable in ['tmp2m', 'tmpsfc']:
thresh_low, thresh_high = -0.43, 0.43
field = std_anoms
else:
thresh_low, thresh_high = ptiles[0, :, :, :], ptiles[1, :, :, :]
field = anoms
ut_ones = np.where(field > thresh_high, 1, 0)
lt_ones = np.where(field < thresh_low, 1, 0)
mt_ones = np.where((field >= thresh_low) & (field <= thresh_high), 1, 0)
total_members = field.shape[0]
ut_prob = np.nansum(ut_ones, axis=0) / total_members
lt_prob = np.nansum(lt_ones, axis=0) / total_members
mt_prob = np.nansum(mt_ones, axis=0) / total_members
terciles = np.array([lt_prob, mt_prob, ut_prob])
print(f' terciles shape: {terciles.shape}')
return terciles
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def create_obs_anomalies(base_path, clim_period, variable, init_month, lead_time, time_period, detrend=False):
import xarray as xr
import numpy as np
import os
# 1. Setup Logic
init = int(init_month) - 1
lead = int(lead_time)
target_month_idx = (init + lead) % 12
clim_str = clim_period.replace('_', '-')
file_map = {
'tmp2m': f"ghcn_cams.1x1.{clim_str}.mon.nc",
'prate': f"cmap.1x1.{clim_str}.mon.nc",
'tmpsfc': f"oisstv2.1.1x1.{clim_str}.mon.nc",
'soilm1m': f"ERA5.soilm1m.1x1.{clim_str}.mon.nc"
}
var_map = {'tmp2m': 'tmp2m', 'prate': 'precip', 'tmpsfc': 'tmpsfc', 'soilm1m': 'soilm'}
if '/cpc/nmme/' in base_path:
obs_path = "/cpc/home/jinfanti/MET/S2S/PythonWrapper_NMME/nmme_met_development/obs_data/"
else:
obs_path = base_path
full_path = os.path.join(obs_path, file_map[variable])
# 2. Load Data
try:
# decode_times=True reads the actual years from the file (e.g. 1994-2020)
ds = xr.open_dataset(full_path, decode_times=True)
if 'time' in ds.coords:
time_obj = ds['time']
elif 'T' in ds.coords:
time_obj = ds['T']
else:
raise KeyError("Could not find 'time' or 'T' coordinate in file.")
all_years = time_obj.dt.year.values
obs_raw = ds[var_map.get(variable, variable)].values.astype(np.float64)
except Exception as e:
raise FileNotFoundError(f"Error loading {full_path}: {e}")
# 3. Seasonal Means Logic
if time_period == 'seasonal':
term_minus = np.roll(obs_raw, 1, axis=0)
term_plus = np.roll(obs_raw, -1, axis=0)
term_minus[0] = np.nan
term_plus[-1] = np.nan
stack_season = np.stack([term_minus, obs_raw, term_plus], axis=0)
with np.errstate(invalid='ignore'):
obs_processed = np.nanmean(stack_season, axis=0)
else:
obs_processed = obs_raw
# 4. Slicing
obs_target_season = obs_processed[target_month_idx::12, :, :]
years = all_years[target_month_idx::12]
if len(years) != obs_target_season.shape[0]:
start_year_actual = int(all_years[0])
n_years = obs_target_season.shape[0]
years = np.arange(start_year_actual, start_year_actual + n_years)
# ----------------------------------------------------------------------
# DETRENDING OBS (Strict Model Methodology)
# ----------------------------------------------------------------------
if detrend:
print(" Calculating linear trend for removal (Obs - Strict Model Method)...")
data_array = obs_target_season
n_t, n_lat, n_lon = data_array.shape
# Flatten spatial dims for vectorized polyfit
y_reshaped = data_array.reshape(n_t, -1)
# STRICT MASK: Only include points where ALL time steps are finite.
valid_mask = np.isfinite(y_reshaped).all(axis=0)
# --- DEBUG PRINT: FORCED LOCATION (SIBERIA) ---
# 65N is index 155 (since -90 is index 0)
# 105E is index 105
# Formula: Lat_Index * 360 + Lon_Index
forced_idx = 155 * 360 + 105
debug_idx = None
# Check if Siberia point is valid (it should be for land data)
if 0 <= forced_idx < y_reshaped.shape[1] and valid_mask[forced_idx]:
debug_idx = forced_idx
print(f" (Debug Target: Siberia 65N, 105E found at index {forced_idx})")
else:
# Fallback to first valid point if Siberia is missing/masked
print(f" (Debug Target: Siberia index {forced_idx} invalid/masked. Falling back to first valid point.)")
valid_indices = np.where(valid_mask)[0]
debug_idx = valid_indices[0] if len(valid_indices) > 0 else None
# ---------------------------------------------
detrended_flat = y_reshaped.copy()
x = years.astype(np.float64)
if np.any(valid_mask):
y_valid = y_reshaped[:, valid_mask]
# Use np.polyfit (Degree 1) - Exactly as used in models
coeffs = np.polyfit(x, y_valid, 1)
slopes = coeffs[0, :]
intercepts = coeffs[1, :]
# Calculate Trend Line
trend = np.outer(x, slopes) + intercepts
# Remove trend but keep the mean
# Formula: Data - Trend_Line + Mean
means = np.mean(y_valid, axis=0)
detrended_valid = y_valid - trend + means
# Store back
detrended_flat[:, valid_mask] = detrended_valid
# --- DEBUG PRINT: COLUMN FORMAT ---
if debug_idx is not None:
print(f"\n --- DEBUG DETRENDING OBS (Flat Index: {debug_idx}) ---")
print(f" {'Year':<6} | {'Raw Val':<12} | {'Detrended':<12}")
print(" " + "-"*36)
for i, yr in enumerate(years):
raw_val = y_reshaped[i, debug_idx]
new_val = detrended_flat[i, debug_idx]
print(f" {int(yr):<6} | {raw_val:<12.4f} | {new_val:<12.4f}")
print(" " + "-"*36 + "\n")
else:
print("\n --- DEBUG: No valid gridpoints found to print! ---\n")
# ----------------------------------
# Reshape back to (Time, Lat, Lon)
obs_target_season = detrended_flat.reshape(n_t, n_lat, n_lon)
# ----------------------------------------------------------------------
# 5. Climatology Calculation
obs_clim = np.nanmean(obs_target_season, axis=0)
obs_stddev = np.nanstd(obs_target_season, axis=0)
# 6. Verification Slicing
verif_slice = obs_target_season
# 7. Anomaly Calculation
obs_anom = verif_slice - obs_clim
with np.errstate(divide='ignore', invalid='ignore'):
obs_std_anom = obs_anom / obs_stddev
obs_std_anom = np.where(np.isfinite(obs_std_anom), obs_std_anom, np.nan)
return verif_slice, obs_anom, obs_std_anom, obs_clim, obs_stddev
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def create_obs_terciles(obs_anom, obs_std_anom, variable):
print(f'Creating observed terciles for {variable}...')
if variable in ['tmp2m', 'tmpsfc']:
thresh_low, thresh_high = -0.43, 0.43
field = obs_std_anom
else:
thresh_low = np.percentile(obs_anom, 33, axis=0)
thresh_high = np.percentile(obs_anom, 66, axis=0)
field = obs_anom
ut_ones = np.where(field > thresh_high, 1, 0)
lt_ones = np.where(field < thresh_low, 1, 0)
mt_ones = np.where((field >= thresh_low) & (field <= thresh_high), 1, 0)
terciles_obs = np.stack([lt_ones, mt_ones, ut_ones], axis=0)
print(f' terciles_obs shape: {terciles_obs.shape}')
return terciles_obs
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def create_dominant_tercile_fcst(terciles_fcst):
max_indices = np.argmax(terciles_fcst, axis=0)
dominant_tercile_model = max_indices + 1.0
mask_check = np.nansum(terciles_fcst, axis=0)
dominant_tercile_model = np.where(mask_check == 0, np.nan, dominant_tercile_model)
dominant_tercile_model = np.where(np.isnan(mask_check), np.nan, dominant_tercile_model)
return dominant_tercile_model
# --------------------------------------------------------------------------------------------------
# --------------------------------------------------------------------------------------------------
def create_dominant_tercile_obs(terciles_obs):
max_indices = np.argmax(terciles_obs, axis=0)
dominant_tercile_obs = max_indices + 1.0
mask_check = np.nansum(terciles_obs, axis=0)
dominant_tercile_obs = np.where(mask_check == 0, np.nan, dominant_tercile_obs)
dominant_tercile_obs = np.where(np.isnan(mask_check), np.nan, dominant_tercile_obs)
return dominant_tercile_obs
# --------------------------------------------------------------------------------------------------
parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend/wrapper_combined_withDetrend.py
import sys
import os
import time
import numpy as np
import datetime as dt
import xarray as xr
from dateutil.relativedelta import *
import function_library_withDetrend as fl
# =================================================================
# USER CONFIGURATION: ORIENTATION & PROCESSING FLAGS
# =================================================================
FLIP_OBS = True # Set to True to flip Observation data (North <-> South)
FLIP_MODELS = False # Set to True to flip Model data (North <-> South)
DETREND_DATA = False # Set to True to remove linear trend based on clim period
# =================================================================
def main():
start_time = time.time()
print("-------------------------------------------------------------------")
print("Starting NMME Pre-processing Wrapper (Manual Flip Enforced)")
if DETREND_DATA:
print(">>> DETRENDING ACTIVATED: Linear trend will be removed.")
print("-------------------------------------------------------------------")
if len(sys.argv) < 10:
print("Usage: python wrapper_combined.py <init> <lead> <var> <clim_per> <models> <path> <time_per> <target_yr> <field>")
sys.exit(1)
init_month, lead_time, variable = sys.argv[1], sys.argv[2], sys.argv[3]
clim_period, model_arg = sys.argv[4], sys.argv[5]
base_path, time_period = sys.argv[6], sys.argv[7]
target_year, field = sys.argv[8], sys.argv[9]
model_input = [model_arg]
init_time_str = target_year + init_month
init_time = dt.datetime.strptime(init_time_str, "%Y%m")
lead_val = int(lead_time)
val_time = init_time + relativedelta(months=lead_val)
# Unit Logic
if variable in ['tmpsfc', 'tmp2m']:
units = 'deg C'
elif variable == 'prate':
units = 'mm/day'
else:
units = 'unitless'
if 'tercile' in field: units = 'percent'
try:
config = fl.setup(model_input, clim_period)
# Parse Start Year for Obs logic
try:
sep = '_' if '_' in clim_period else '-'
file_start_year = int(clim_period.split(sep)[0])
except:
file_start_year = 1982
print("WARNING: Defaulting obs start year to 1982.")
v = None
is_observation = (model_arg == 'obs')
# Determine if we need to flip based on the flags at the top
should_flip = False
# ==================================================================
# BRANCH A: OBSERVATIONS
# ==================================================================
if is_observation:
print(">>> MODE: OBSERVATION PROCESSING")
should_flip = FLIP_OBS
# Process Data
# Note: The detrend flag is passed here. If True, the function:
# 1. Detrends the entire history (removing trend from climatology)
# 2. Detrends the verification slice (removing trend from target)
verif, anom, std_anom, clim, std = fl.create_obs_anomalies(
base_path, clim_period, variable, init_month, lead_time, time_period,
detrend=DETREND_DATA
)
target_idx = int(target_year) - file_start_year
if target_idx < 0 or target_idx >= anom.shape[0]:
raise IndexError(f"Target year {target_year} is outside range.")
# --- SMART MASKING START ---
# Capture the Ocean Mask (Truth) from the raw anomaly field before we process further.
# Anywhere 'anom' is NaN is truly missing data (Ocean).
obs_mask = np.isnan(anom[target_idx, :, :])
# --- SMART MASKING END ---
if field == 'raw': v = verif[target_idx, :, :]
elif field == 'anom': v = anom[target_idx, :, :]
elif field == 'std_anom': v = std_anom[target_idx, :, :]
elif field == 'clim_mean': v = clim
elif field == 'clim_std': v = std
elif 'tercile' in field:
obs_terciles = fl.create_obs_terciles(anom, std_anom, variable)
if field == 'lower_tercile': v = obs_terciles[0, target_idx, :, :]
elif field == 'middle_tercile': v = obs_terciles[1, target_idx, :, :]
elif field == 'upper_tercile': v = obs_terciles[2, target_idx, :, :]
# ==================================================================
# BRANCH B: MODELS
# ==================================================================
else:
print(">>> MODE: MODEL FORECAST PROCESSING")
should_flip = FLIP_MODELS
# Process Data
if field == 'raw':
v = fl.open_and_process_models(base_path, variable, time_period, init_month, lead_time, target_year, config, return_members=False)
# Note: Detrending raw model fields is skipped in this simple implementation
if DETREND_DATA:
print(" WARNING: 'raw' model field is NOT detrended in this simple implementation.")
elif 'clim' in field:
# Calculate Climatology (Pass Detrend Flag)
clim_mean, std_mean, _ = fl.calc_clim(base_path, variable, time_period, init_month, lead_time, config, return_members=False, detrend=DETREND_DATA)
v = clim_mean if field == 'clim_mean' else std_mean
elif field in ['anom', 'std_anom']:
# 1. Calc Climatology (Detrended if flag is True)
clim_mean, std_mean, _ = fl.calc_clim(base_path, variable, time_period, init_month, lead_time, config, return_members=False, detrend=DETREND_DATA)
# 2. Open Raw Forecast
fcst_mean = fl.open_and_process_models(base_path, variable, time_period, init_month, lead_time, target_year, config, return_members=False)
# 3. Apply Detrending to Forecast Target Year
if DETREND_DATA:
trend_val = fl.get_trend_value(base_path, variable, time_period, init_month, lead_time, config, target_year)
fcst_mean = fcst_mean - trend_val
print(f" Applied trend removal to forecast for year {target_year}")
# 4. Calculate Anomaly
anom, std_anom = fl.calc_anom(fcst_mean, clim_mean, std_mean)
v = anom if field == 'anom' else std_anom
elif 'tercile' in field:
# Member-wise logic
# 1. Calc Climatology stats (Detrended if flag is True)
clim_mem, std_mem, ptiles_mem = fl.calc_clim(base_path, variable, time_period, init_month, lead_time, config, return_members=True, detrend=DETREND_DATA)
# 2. Open Raw Forecast Members
fcst_mem = fl.open_and_process_models(base_path, variable, time_period, init_month, lead_time, target_year, config, return_members=True)
# 3. Apply Detrending to Forecast Members
if DETREND_DATA:
trend_val = fl.get_trend_value(base_path, variable, time_period, init_month, lead_time, config, target_year)
fcst_mem = fcst_mem - trend_val
print(f" Applied trend removal to forecast members for year {target_year}")
# 4. Create Terciles
probs = fl.create_terciles(fcst_mem, clim_mem, std_mem, ptiles_mem, variable)
if field == 'lower_tercile': v = probs[0,:,:]
elif field == 'middle_tercile': v = probs[1,:,:]
elif field == 'upper_tercile': v = probs[2,:,:]
# ------------------------------------------------------------------
# FINAL PROCESSING
# ------------------------------------------------------------------
if v is None: raise NameError("Data variable 'v' was not assigned.")
var = np.float64(v)
var[var < -800] = np.nan
var[var > 800] = np.nan
# 1. FIX LAND NON-EVENTS: Turn NaNs into 0.0 for Tercile fields
# This fixes the issue where valid land "Misses" were being ignored by MET.
if 'tercile' in field:
var = np.nan_to_num(var, nan=0.0)
# 2. FIX OCEAN MASK: Restore NaNs where data was originally missing
# This prevents the ocean from being counted as valid "0.0" data.
if is_observation and 'obs_mask' in locals():
# Safety check: ensure shapes match before applying mask
if var.shape == obs_mask.shape:
var[obs_mask] = np.nan
print(" ACTION: Re-applied ocean mask to Observation data.")
met_data = np.squeeze(var).copy()
# --- FLIP LOGIC ---
if should_flip:
print(" ACTION: Flipping data (np.flipud) to align with MET Grid.")
met_data = np.flipud(met_data).copy()
else:
print(" ACTION: Data orientation preserved (No Flip applied).")
# Grid Definition (South-to-North)
grid_data = {
'name': 'Global_1x1', 'type': 'LatLon',
'lat_ll': -90.0, 'lon_ll': 0.0, 'delta_lat': 1.0, 'delta_lon': 1.0,
'Nlat': 181, 'Nlon': 360,
}
attrs = {
'valid': str(val_time.strftime("%Y%m%d"))+'_'+str(val_time.strftime("%H%M%S")),
'init': str(init_time.strftime("%Y%m%d"))+'_'+str(init_time.strftime("%H%M%S")),
'lead': lead_time,
'name': variable,
'accum': '00', 'level': 'ground', 'units': units,
'long_name': f"{variable} {field}",
'grid': grid_data
}
except Exception as e:
print(f"ERROR: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
elapsed_time = time.time() - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")
return met_data, attrs
print(f"PYTHON SCRIPT ARGUMENTS: {sys.argv}")
met_data, attrs = main()
For more information on the basic requirements to utilize Python Embedding in METplus, please refer to the MET User’s Guide section on Python embedding.
User Scripting
This use case does not use additional scripts.
Running METplus
Pass the use case configuration file to the run_metplus.py script along with any user-specific system configuration files if desired:
run_metplus.py /path/to/METplus/parm/use_cases/model_applications/s2s_mme/SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend.conf /path/to/user_system.conf
See Running METplus for more information.
Expected Output
A successful run will output the following both to the screen and to the logfile:
INFO: METplus has successfully finished running.
Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output will be in a directory named SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend relative to OUTPUT BASE and will contain one file. The netCDF file will contain eight variable fields (not including the lat/lon fields). These correspond to the statistics that were requested in the configuration file, and are as follows:
series_pstd_BRIER_obsge0.5(lat, lon)
series_pstd_RELIABILITY_obsge0.5(lat, lon)
series_pstd_RESOLUTION_obsge0.5(lat, lon)
series_pstd_UNCERTAINTY_obsge0.5(lat, lon)
series_pstd_ROC_AUC_obsge0.5(lat, lon)
series_pstd_BSS_obsge0.5(lat, lon)
series_pstd_BRIERCL_obsge0.5(lat, lon)
series_pstd_TOTAL_obsge0.5(lat, lon)
Keywords
Note
SeriesAnalysisUseCase
PythonEmbeddingFileUseCase
CustomStringLoopingUseCase
S2SMMEAppUseCase
Navigate to the METplus Quick Search for Use Cases page to discover other similar use cases.
sphinx_gallery_thumbnail_path = ‘_static/s2s_mme-SeriesAnalysis_fcstSFS_obsGHCNCAMS_Prob_SmartMasking_Detrend.png’