5.2.9.7. Grid-Stat and Series-Analysis: BMKG APIK Seasonal Forecast

model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast.conf

Scientific Objective

Seasonal forecasting, with a time horizon of one to many months (typically 6 to 9 months), poses new challenges to tools developed primarily for weather forecasts that cover a few days. Two challenges stand out: (1) a dramatically expanded time variable, and (2) verification that is by design backward oriented, using extensive hindcasts over past decades rather than the rapid verification possible in short-range weather forecasting. The scientific objective of the seasonal forecast use case therefore involves expanding the options for describing time as well as the strategic selection of hindcasts.

Time:

METplus commonly expresses time intervals in minutes, hours, and days. Month and year intervals were previously not supported because these units do not have a constant length. METplus was therefore modified to support these intervals by determining the offset relative to a given time.
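The idea of resolving a month interval as an offset from a given time can be sketched in a few lines of Python. This is an illustration only, not the METplus implementation: the helper name add_months and the choice to clamp the day in short months are assumptions made for this example.

```python
import calendar
from datetime import datetime


def add_months(start: datetime, months: int) -> datetime:
    """Return `start` shifted by a whole number of calendar months."""
    # Count months since year 0 so that year rollover falls out of divmod.
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Assumption: clamp the day for short months (Jan 31 + 1 month -> Feb 28/29).
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


# Valid times for a July 1982 initialization and leads of 1 to 6 months.
init = datetime(1982, 7, 1)
valid_times = [add_months(init, lead) for lead in range(1, 7)]
```

Because the offset is computed from the calendar rather than from a fixed number of seconds, a lead of "1 month" from July 1 lands on August 1 regardless of how many days July has.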

Input data:

The input data from seasonal forecasts is generally based on daily, weekly, decadal (10-day), monthly or seasonal time-integrated intervals. The time variable is therefore often no longer a simple snapshot of the system but instead represents an average, a sum (precipitation), or a particular statistic (maximum wind, minimum temperature, wind variability) over the integration period. This requires some adjustment from the traditional approach in forecast verification, where the forecast time (“valid-time”) is simply a snapshot out of a continuous run.
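As a small illustration of what "time-integrated" means here, the following sketch reduces a 10-day series of daily values to a sum, a mean, and a maximum. The numbers are invented for the example and do not come from the use case data.

```python
from statistics import mean

# Hypothetical daily precipitation totals (mm) for one 10-day period.
daily_precip_mm = [0.0, 12.5, 3.0, 0.0, 0.0, 40.0, 8.5, 0.0, 1.0, 5.0]

period_sum = sum(daily_precip_mm)    # accumulated precipitation over the period
period_mean = mean(daily_precip_mm)  # average daily amount
period_max = max(daily_precip_mm)    # wettest single day
```

Each of the three values describes the whole period, so a verification that uses them compares period statistics rather than the state at a single valid time.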

Hindcasts:

The objective of seasonal forecasts is no longer the exact location and intensity of one particular weather event, such as a storm, a frontal passage, or high wind conditions. Rather, seasonal forecasting focuses on the statistical properties over a period of time, be it a 10-day interval, a month, or even a three-month season. Verifying a new, forward-looking seasonal forecast requires assessing the forecast system’s ability to appropriately forecast that long-range behavior of the weather (here, only atmospheric verification is considered, but the same concept would apply to ocean or any other long-range forecast system). Because weather properties commonly change significantly over the course of the season, samples to verify the prognostic system cannot be taken from the days, weeks, or months immediately before the forecast. Hindcasting in the seasonal context requires a complete set of forecasts based on the same season but during past years. A current July 1, 2019 forecast therefore requires July 1 forecasts for as many past years as possible, provided that the forecast system is the same as the one used for the current forecast cycle. Operational centers offer hindcasts, sometimes also called “re-forecasts”, produced with the current, most up-to-date forecast system. MET and METplus therefore need to be able to extract the appropriate collection of past forecasts. This includes identifying the init dates with the same Julian day of year in forecast cycles from past years, and then identifying the lead times of interest, generally ranging from one to six or more months.
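Selecting the matching hindcast initializations can be sketched as follows. The function name hindcast_inits and the policy of skipping years in which the calendar date does not exist are assumptions for this illustration; the year range 1982 to 2010 is the one used later in this use case.

```python
from datetime import date


def hindcast_inits(current_init: date, first_year: int, last_year: int) -> list:
    """Return the same calendar init date for each year in [first_year, last_year]."""
    inits = []
    for year in range(first_year, last_year + 1):
        try:
            inits.append(current_init.replace(year=year))
        except ValueError:
            # Assumption: Feb 29 has no counterpart in non-leap years, so skip them.
            continue
    return inits


# Hindcast initializations that correspond to a July 1, 2019 forecast.
past_inits = hindcast_inits(date(2019, 7, 1), 1982, 2010)
```

For the July 1 example this yields 29 past initializations, one per hindcast year.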

Verification:

The verification steps can then utilize the existing collection of verification tools. In comparison to weather forecasts, the only difference is that the data, as stated above, are not snapshots but time-integrated values (averages, sums, statistics) that represent a whole period of time. The verification then focuses on comparisons of these derivatives of the forecast simulations. In practice, a further step might be added prior to, or as a key step during, verification: the formation of anomalies of the forecasts relative to long-term expected averages. A rainfall forecast can therefore be verified in both an absolute and an anomaly context. Some analyses might focus on the exceedance of an extreme rainfall threshold, for example 500 mm per month. At the same time, the same forecast might be verified for the 3-month rainfall average in comparison with the long-term expected mean. The verification might then assess how well the system can foresee the occurrence of below-average rainfall over the season, and possibly some selected thresholds there (e.g., the ability to forecast mean seasonal rainfall below the 10th percentile of seasonal rainfall). Finally, flexibility in formulating forecast verification strategies is important because forecast skill might vary by location, the timing within the seasonal cycle, or the state of the evolving coupled system (the rapid onset of a strong El Nino will lead to significantly different forecast skill compared to a neutral state in the Pacific). Memory from past months, for example when considering accumulated soil moisture, might also influence forecast skill. Seasonal forecast verification therefore requires an understanding of the climate system; MET and METplus then need to offer the flexibility to tailor verification strategies and potentially to craft conditional approaches.
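The two views described above, anomalies relative to a long-term mean and exceedance of a fixed threshold, can be illustrated for a single grid point. The rainfall values are invented, and the code is a plain-Python sketch rather than what GridStat or SeriesAnalysis do internally.

```python
from statistics import mean

# Hypothetical monthly rainfall totals (mm) at one grid point, one per hindcast year.
fcst = [320.0, 510.0, 275.0, 640.0, 410.0]
obs = [300.0, 560.0, 350.0, 480.0, 390.0]

# Anomalies relative to each series' own long-term mean.
fcst_mean = mean(fcst)
obs_mean = mean(obs)
fcst_anom = [f - fcst_mean for f in fcst]
obs_anom = [o - obs_mean for o in obs]

# 2x2 contingency counts for exceedance of a 500 mm per month threshold.
threshold = 500.0
pairs = list(zip(fcst, obs))
hits = sum(f > threshold and o > threshold for f, o in pairs)
false_alarms = sum(f > threshold and o <= threshold for f, o in pairs)
misses = sum(f <= threshold and o > threshold for f, o in pairs)
correct_negatives = sum(f <= threshold and o <= threshold for f, o in pairs)
```

The same forecast series thus feeds both a continuous comparison (anomalies) and a categorical one (contingency counts), which is why the configuration below requests both continuous and categorical statistics.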

Overall, seasonal forecasts don’t require a new verification approach. They do, however, place demands on the flexibility of dealing with a significantly expanded range of the time variable, as well as on the logistic infrastructure needed to select appropriate hindcast samples from long hindcast or re-forecast archives. Scientifically, the challenges lie mostly in the appropriate formulation of verification questions that address specific forecast objectives. Compared to weather forecasting, seasonal forecasts need to draw their skill from slowly changing components in the coupled Earth system while acknowledging the high-frequency noise of weather superposed on these ‘climatologically’ evolving background conditions. In many regions of the world, the noise might dominate that background climate and forecast skill is low. It is therefore the task of seasonal forecast verification to identify where there is actually skill for particular properties of the forecasts over a wide range of lead times. The skill might depend on location, on the timing within the seasonal cycle, or even on the evolving state of the coupled system.

Datasets

All datasets are traditionally in netCDF format. Grids are either regular Gaussian latitude/longitude grids or Lambert-conformal WRF grids.

The forecast datasets contain weekly, monthly or seasonally integrated data. In this use case, the time format is monthly. Since the verification is done on the hindcasts rather than the current forecast (verifying the latter would require waiting another 6 months), the key identifiers are the month of initialization and the lead time of the forecast of interest.

The hindcast data, the ‘observational’ data that is to be compared to the forecast, is a collection of datasets stored in a format equivalent to that of the forecast. The hindcast ensemble is identified through the year in the filename (as well as in the time variable inside the netCDF file).

Forecast Datasets:

NMME

* variable of interest: pr (precipitation: cumulative monthly sum)
* format of precipitation variable: time,lat,lon (here dimensions: 29,181,361), with the time variable representing 29 samples of the same Julian init time of hindcasts over the past 29 years.

Hindcast Datasets:

Observational Dataset:

CPC precipitation reference data (same format and grid)

METplus Components

This use case loops over initialization years and processes forecast lead months with GridStat. It also processes the output of GridStat using two calls to SeriesAnalysis.
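In the configuration shown further down, the two SeriesAnalysis calls are told apart by instance names in parentheses: PROCESS_LIST lists SeriesAnalysis(climo) and SeriesAnalysis(full_stats), and a [full_stats] section supplies settings for the second one. The sketch below only shows how such a list can be split into wrapper and instance names; parse_process_list is a hypothetical helper written for this example, not the parser METplus uses.

```python
import re

# Value of PROCESS_LIST as it appears in this use case's configuration.
process_list = "GridStat, SeriesAnalysis(climo), SeriesAnalysis(full_stats)"


def parse_process_list(value: str) -> list:
    """Split a PROCESS_LIST string into (wrapper, instance) pairs."""
    pairs = []
    for item in value.split(","):
        # A wrapper name, optionally followed by an instance name in parentheses.
        match = re.fullmatch(r"\s*(\w+)(?:\((\w+)\))?\s*", item)
        if match is None:
            raise ValueError(f"Unrecognized process entry: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


steps = parse_process_list(process_list)
```

Entries without parentheses get None as their instance, which corresponds to running the wrapper with the settings from the main configuration only.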

External Dependencies

You will need to use a version of Python 3.6+ that has the following packages installed:

netCDF4

METplus Workflow

The following tools are used for each run time: GridStat

This example loops by initialization time. Each initialization time is July of each year from 1982 to 2010. For each init time it will run once, processing forecast leads 1 month through 5 months. The following times are processed:

Run times:

Init: 1982-07
Forecast leads: 1 month, 2 months, 3 months, 4 months, 5 months

Init: 1983-07
Forecast leads: 1 month, 2 months, 3 months, 4 months, 5 months

Init: 1984-07
Forecast leads: 1 month, 2 months, 3 months, 4 months, 5 months

Init: 1985-07
Forecast leads: 1 month, 2 months, 3 months, 4 months, 5 months

…

Init: 2009-07
Forecast leads: 1 month, 2 months, 3 months, 4 months, 5 months

Init: 2010-07
Forecast leads: 1 month, 2 months, 3 months, 4 months, 5 months

METplus Configuration

# Grid to Grid APIK Verification - S2S Use Case 1: Comparison of NMME hindcasts to CPC observations

[config]

# List of applications to run
PROCESS_LIST = GridStat, SeriesAnalysis(climo), SeriesAnalysis(full_stats)

# loop by INIT time (options are INIT, VALID, or ?)
LOOP_BY = INIT

# Format of INIT_BEG and INIT_END
INIT_TIME_FMT = %Y%m

# Start time for METplus run
INIT_BEG = 198207

# End time for METplus run
INIT_END = 201007

INIT_INCREMENT = 1Y

# list of forecast leads to process  (JLV-NOTE: This only works for grid_stat and example wrappers right now, using feature_281_py_embed)
LEAD_SEQ = 1m, 2m, 3m, 4m, 5m, 6m

#SERIES_ANALYSIS_CUSTOM_LOOP_LIST =

# Options are times, processes
# times = run all items in the PROCESS_LIST for a single initialization
# time, then repeat until all times have been evaluated.
# processes = run each item in the PROCESS_LIST for all times
#   specified, then repeat for the next item in the PROCESS_LIST.
LOOP_ORDER = processes

FCST_GRID_STAT_VAR1_NAME = pr
FCST_GRID_STAT_VAR1_LEVELS = "({valid?fmt=%Y%m01_000000},*,*)"
FCST_GRID_STAT_VAR1_THRESH = >0, >50, >100, >150, >200, >250, >300, >400, >500

OBS_GRID_STAT_VAR1_NAME = precip
OBS_GRID_STAT_VAR1_LEVELS = "({valid?fmt=%Y%m01_000000},*,*)"
OBS_GRID_STAT_VAR1_THRESH = >0, >50, >100, >150, >200, >250, >300, >400, >500

FCST_SERIES_ANALYSIS_VAR1_NAME = FCST_precip_FULL
FCST_SERIES_ANALYSIS_VAR1_LEVELS = "(*,*)"

OBS_SERIES_ANALYSIS_VAR1_NAME = OBS_precip_FULL
OBS_SERIES_ANALYSIS_VAR1_LEVELS = "(*,*)"

# description of data to be processed
# used in output file path
MODEL = NMME
OBTYPE = CPC

# location of grid_stat MET config file
GRID_STAT_CONFIG_FILE = {PARM_BASE}/met_config/GridStatConfig_wrapped

GRID_STAT_OUTPUT_FLAG_CTC = STAT
GRID_STAT_OUTPUT_FLAG_CNT = STAT
GRID_STAT_OUTPUT_FLAG_SL1L2 = STAT

GRID_STAT_NC_PAIRS_FLAG_APPLY_MASK = FALSE

GRID_STAT_NC_PAIRS_VAR_NAME = precip

# variables to describe format of forecast data
FCST_IS_PROB = false

# variables to describe format of observation data
#  none needed

# Increase verbosity of MET tools
#LOG_MET_VERBOSITY=4

GRID_STAT_OUTPUT_PREFIX = {MODEL}-hindcast_{CURRENT_OBS_NAME}_vs_{OBTYPE}_IC{init?fmt=%Y%b}_V{valid?fmt=%Y%2m%d}

# sets the desc variable in the SeriesAnalysis config file
SERIES_ANALYSIS_DESC = hindcast

# sets the cat_thresh variable in the SeriesAnalysis config file
SERIES_ANALYSIS_CAT_THRESH = >=50, >=100, >=150, >=200, >=250, >=300, >=400, >=500

# sets the vld_thresh variable in the SeriesAnalysis config file
SERIES_ANALYSIS_VLD_THRESH = 0.50

# sets the block_size variable in the SeriesAnalysis config file
SERIES_ANALYSIS_BLOCK_SIZE = 360*181

# set to True to add the -paired flag to the SeriesAnalysis command
SERIES_ANALYSIS_IS_PAIRED = False

# MET Configuration file passed to SeriesAnalysis
SERIES_ANALYSIS_CONFIG_FILE = {PARM_BASE}/met_config/SeriesAnalysisConfig_wrapped

# If True/yes, run plot_data_plane on output from Series-Analysis to generate
# images for each stat item listed in SERIES_ANALYSIS_STAT_LIST
SERIES_ANALYSIS_GENERATE_PLOTS = no

# If True/yes, run convert on output from Series-Analysis to generate
# a gif using images in groups of name/level/stat
SERIES_ANALYSIS_GENERATE_ANIMATIONS = no

# grid to remap data. Value is set as the 'to_grid' variable in the 'regrid' dictionary
# See MET User's Guide for more information
#SERIES_ANALYSIS_REGRID_TO_GRID = NONE

SERIES_ANALYSIS_RUNTIME_FREQ = RUN_ONCE_PER_LEAD

SERIES_ANALYSIS_RUN_ONCE_PER_STORM_ID = False

# used for SeriesAnalysis(climo) instance
SERIES_ANALYSIS_STAT_LIST = OBAR

[full_stats]

SERIES_ANALYSIS_STAT_LIST = TOTAL, FBAR, OBAR, ME, MAE, RMSE, ANOM_CORR, PR_CORR
SERIES_ANALYSIS_CTS_LIST = BASER, CSI, GSS

SERIES_ANALYSIS_CLIMO_MEAN_INPUT_DIR = {SERIES_ANALYSIS_OUTPUT_DIR}
SERIES_ANALYSIS_CLIMO_MEAN_INPUT_TEMPLATE = series_analysis_{MODEL}_{OBTYPE}_stats_F{lead?fmt=%2m}_climo.nc


[dir]

# input and output data directories
FCST_GRID_STAT_INPUT_DIR = {INPUT_BASE}/model_applications/s2s/NMME/hindcast/monthly
OBS_GRID_STAT_INPUT_DIR = {INPUT_BASE}/model_applications/s2s/NMME/obs
GRID_STAT_OUTPUT_DIR = {OUTPUT_BASE}/model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast/GridStat

BOTH_SERIES_ANALYSIS_INPUT_DIR = {GRID_STAT_OUTPUT_DIR}

SERIES_ANALYSIS_OUTPUT_DIR = {OUTPUT_BASE}/model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast/SeriesAnalysis

# used in full_stats instance file only
SERIES_ANALYSIS_CLIMO_MEAN_INPUT_DIR =


[filename_templates]

# format of filenames
# FCST
FCST_GRID_STAT_INPUT_TEMPLATE = nmme_pr_hcst_{init?fmt=%b}IC_{valid?fmt=%2m}_*.nc

# ANLYS
OBS_GRID_STAT_INPUT_TEMPLATE = obs_cpc_pp.1x1.nc

BOTH_SERIES_ANALYSIS_INPUT_TEMPLATE = grid_stat_{MODEL}-hindcast_precip_vs_{OBTYPE}_IC{init?fmt=%Y%b}_V{valid?fmt=%Y%2m}01_*pairs.nc

SERIES_ANALYSIS_OUTPUT_TEMPLATE = series_analysis_{MODEL}_{OBTYPE}_stats_F{lead?fmt=%2m}_{instance?fmt=%s}.nc

# used in full_stat instance only
SERIES_ANALYSIS_CLIMO_MEAN_INPUT_TEMPLATE =
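The FCST_GRID_STAT_VAR1_LEVELS and OBS_GRID_STAT_VAR1_LEVELS settings above use a valid-time tag to pick one time slice out of the (time, lat, lon) variable. The sketch below mimics that expansion under the assumption that the fmt string is applied to the valid time in strftime style; expand_valid_level is a hypothetical helper, not a METplus function.

```python
from datetime import datetime


def expand_valid_level(valid: datetime) -> str:
    """Mimic the level string "({valid?fmt=%Y%m01_000000},*,*)" for one valid time."""
    # "%Y%m" comes from the valid time; "01_000000" is literal text in the format,
    # so the slice is always the first day of the month at time 000000.
    return "(" + valid.strftime("%Y%m01_000000") + ",*,*)"


# A July 1982 initialization with a 1 month lead is valid in August 1982.
level = expand_valid_level(datetime(1982, 8, 1))
```

Under that assumption, each lead month addresses a different time slice of the same file, while the two asterisks keep the full latitude and longitude ranges.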

MET Configuration

METplus sets environment variables based on user settings in the METplus configuration file. See How METplus controls MET config file settings for more details.

YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS!

If there is a setting in the MET configuration file that is currently not supported by METplus you’d like to control, please refer to: Overriding Unsupported MET config file settings
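The wrapped MET configuration files contain placeholders such as ${METPLUS_MODEL}, and the commented hints next to them (for example "// model =") suggest that each variable carries a complete setting. The sketch below imitates that substitution in Python to show the effect. The two values are assumed for illustration; at run time METplus builds the real ones from the METplus configuration, and MET resolves the variables when it reads the file.

```python
from string import Template

# Assumed values, for illustration only; METplus sets the real ones at run time.
env = {
    "METPLUS_MODEL": 'model = "NMME";',
    "METPLUS_OBTYPE": 'obtype = "CPC";',
}

# Two placeholder lines in the style of the wrapped configuration files.
wrapped_snippet = "${METPLUS_MODEL}\n${METPLUS_OBTYPE}\n"

# Replace each placeholder with the text stored in the matching variable.
resolved = Template(wrapped_snippet).substitute(env)
```

Seen this way, setting one of these variables by hand has no lasting effect, because METplus overwrites it before each MET call.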

GridStatConfig_wrapped

Note

See the GridStat MET Configuration section of the User’s Guide for more information on the environment variables used in the file below:

////////////////////////////////////////////////////////////////////////////////
//
// Grid-Stat configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Output model name to be written
//
// model =
${METPLUS_MODEL}

//
// Output description to be written
// May be set separately in each "obs.field" entry
//
// desc =
${METPLUS_DESC}

//
// Output observation type to be written
//
// obtype =
${METPLUS_OBTYPE}

////////////////////////////////////////////////////////////////////////////////

//
// Verification grid
//
// regrid = {
${METPLUS_REGRID_DICT}

////////////////////////////////////////////////////////////////////////////////

censor_thresh    = [];
censor_val       = [];
cat_thresh  	 = [];
cnt_thresh  	 = [ NA ];
cnt_logic   	 = UNION;
wind_thresh 	 = [ NA ];
wind_logic  	 = UNION;
eclv_points      = 0.05;
nc_pairs_var_suffix = "";
//nc_pairs_var_name =
${METPLUS_NC_PAIRS_VAR_NAME}
rank_corr_flag   = FALSE;

//
// Forecast and observation fields to be verified
//
fcst = {
  ${METPLUS_FCST_FILE_TYPE}
  ${METPLUS_FCST_FIELD}
}
obs = {
  ${METPLUS_OBS_FILE_TYPE}
  ${METPLUS_OBS_FIELD}
}

////////////////////////////////////////////////////////////////////////////////

//
// Climatology mean data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}


//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}

//
// May be set separately in each "obs.field" entry
//
//climo_cdf = {
${METPLUS_CLIMO_CDF_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Verification masking regions
//
// mask = {
${METPLUS_MASK_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
ci_alpha  = [ 0.05 ];

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// Data smoothing methods
//
//interp = {
${METPLUS_INTERP_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Neighborhood methods
//
nbrhd = {
   field      = BOTH;
   // shape =
   ${METPLUS_NBRHD_SHAPE}
   // width =
   ${METPLUS_NBRHD_WIDTH}
   // cov_thresh =
   ${METPLUS_NBRHD_COV_THRESH}
   vld_thresh = 1.0;
}

////////////////////////////////////////////////////////////////////////////////

//
// Fourier decomposition
// May be set separately in each "obs.field" entry
//
fourier = {
   wave_1d_beg = [];
   wave_1d_end = [];
}

////////////////////////////////////////////////////////////////////////////////

//
// Gradient statistics
// May be set separately in each "obs.field" entry
//
gradient = {
   dx = [ 1 ];
   dy = [ 1 ];
}

////////////////////////////////////////////////////////////////////////////////

//
// Distance Map statistics
// May be set separately in each "obs.field" entry
//
distance_map = {
   baddeley_p        = 2;
   baddeley_max_dist = NA;
   fom_alpha         = 0.1;
   zhu_weight        = 0.5;
}

////////////////////////////////////////////////////////////////////////////////

//
// Statistical output types
//
//output_flag = {
${METPLUS_OUTPUT_FLAG_DICT}

//
// NetCDF matched pairs output file
// May be set separately in each "obs.field" entry
//
// nc_pairs_flag = {
${METPLUS_NC_PAIRS_FLAG_DICT}

////////////////////////////////////////////////////////////////////////////////

//grid_weight_flag =
${METPLUS_GRID_WEIGHT_FLAG}
tmp_dir          = "/tmp";
// output_prefix =
${METPLUS_OUTPUT_PREFIX}

////////////////////////////////////////////////////////////////////////////////

${METPLUS_MET_CONFIG_OVERRIDES}

SeriesAnalysisConfig_wrapped

Note

See the SeriesAnalysis MET Configuration section of the User’s Guide for more information on the environment variables used in the file below:

////////////////////////////////////////////////////////////////////////////////
//
// Series-Analysis configuration file.
//
// For additional information, see the MET_BASE/config/README file.
//
////////////////////////////////////////////////////////////////////////////////

//
// Output model name to be written
//
${METPLUS_MODEL}

//
// Output description to be written
//
${METPLUS_DESC}

//
// Output observation type to be written
//
${METPLUS_OBTYPE}

////////////////////////////////////////////////////////////////////////////////

//
// Verification grid
// May be set separately in each "field" entry
//
${METPLUS_REGRID_DICT}

////////////////////////////////////////////////////////////////////////////////

censor_thresh = [];
censor_val    = [];
${METPLUS_CAT_THRESH}
cnt_thresh    = [ NA ];
cnt_logic     = UNION;

//
// Forecast and observation fields to be verified
//
fcst = {
   ${METPLUS_FCST_FILE_TYPE}
   ${METPLUS_FCST_FIELD}
}
obs = {
   ${METPLUS_OBS_FILE_TYPE}
   ${METPLUS_OBS_FIELD}
}

////////////////////////////////////////////////////////////////////////////////

//
// Climatology data
//
//climo_mean = {
${METPLUS_CLIMO_MEAN_DICT}


//climo_stdev = {
${METPLUS_CLIMO_STDEV_DICT}

////////////////////////////////////////////////////////////////////////////////

//
// Confidence interval settings
//
ci_alpha  = [ 0.05 ];

boot = {
   interval = PCTILE;
   rep_prop = 1.0;
   n_rep    = 0;
   rng      = "mt19937";
   seed     = "";
}

////////////////////////////////////////////////////////////////////////////////

//
// Verification masking regions
//
mask = {
   grid = "";
   poly = "";
}

//
// Number of grid points to be processed concurrently.  Set smaller to use
// less memory but increase the number of passes through the data.
//
${METPLUS_BLOCK_SIZE}

//
// Ratio of valid matched pairs to compute statistics for a grid point
//
${METPLUS_VLD_THRESH}

////////////////////////////////////////////////////////////////////////////////

//
// Statistical output types
//
output_stats = {
   fho    = [];
   ctc    = [];
   ${METPLUS_CTS_LIST}
   mctc   = [];
   mcts   = [];
   ${METPLUS_STAT_LIST}
   sl1l2  = [];
   sal1l2 = [];
   pct    = [];
   pstd   = [];
   pjc    = [];
   prc    = [];
}

////////////////////////////////////////////////////////////////////////////////

rank_corr_flag = FALSE;
tmp_dir        = "/tmp";
//version        = "V10.0";

////////////////////////////////////////////////////////////////////////////////

${METPLUS_MET_CONFIG_OVERRIDES}

Running METplus

This use case can be run two ways:

Passing in GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast.conf then a user-specific system configuration file:

run_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast.conf -c /path/to/user_system.conf

Modifying the configurations in parm/metplus_config, then passing in GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast.conf:

run_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast.conf

The former method is recommended. Whether you add them to a user-specific configuration file or modify the metplus_config files, the following variables must be set correctly:

INPUT_BASE - Path to directory where sample data tarballs are unpacked (See Datasets section to obtain tarballs). This is not required to run METplus, but it is required to run the examples in parm/use_cases
OUTPUT_BASE - Path where METplus output will be written. This must be in a location where you have write permissions
MET_INSTALL_DIR - Path to location where MET is installed locally

Example User Configuration File:

[dir]
INPUT_BASE = /path/to/sample/input/data
OUTPUT_BASE = /path/to/output/dir
MET_INSTALL_DIR = /path/to/met-X.Y

NOTE: All of these items must be found under the [dir] section.
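Before launching a run, it can help to check that the three variables are present in the [dir] section of the user configuration file. The following is a minimal sketch using Python's configparser; missing_dir_settings is a hypothetical helper for this example and is not part of METplus.

```python
import configparser

REQUIRED = ("INPUT_BASE", "OUTPUT_BASE", "MET_INSTALL_DIR")

# Same content as the example user configuration file above.
user_conf_text = """
[dir]
INPUT_BASE = /path/to/sample/input/data
OUTPUT_BASE = /path/to/output/dir
MET_INSTALL_DIR = /path/to/met-X.Y
"""


def missing_dir_settings(conf_text: str) -> list:
    """Return the required [dir] variables that are absent or empty."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(conf_text)
    if not parser.has_section("dir"):
        return list(REQUIRED)
    return [
        name for name in REQUIRED
        if not parser.get("dir", name, fallback="").strip()
    ]


missing = missing_dir_settings(user_conf_text)
```

An empty result means all three variables are set; the check does not verify that the paths exist or are writable.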

Expected Output

A successful run will output the following both to the screen and to the logfile:

INFO: METplus has successfully finished running.

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast/GridStat (relative to OUTPUT_BASE)

For each month and year there will be two files written:

* grid_stat_NMME-hindcast_precip_vs_CPC_IC{%Y%b}01_2301360000L_20081001_000000V.stat
* grid_stat_NMME-hindcast_precip_vs_CPC_IC{%Y%b}01_2301360000L_20081001_000000V_pairs.nc

Output from SeriesAnalysis will be found in model_applications/s2s/GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast/SeriesAnalysis (relative to OUTPUT_BASE)

For each month there will be two files written:

* series_analysis_NMME_CPC_stats_ICJul_{%m}_climo.nc
* series_analysis_NMME_CPC_stats_ICJul_{%m}_full_stats.nc

Keywords

Note

GridStatToolUseCase, NetCDFFileUseCase, LoopByMonthFeatureUseCase, NCAROrgUseCase, RuntimeFreqUseCase
