5.2.9.2. WeatherRegime Calculation: RegridDataPlane, PcpCombine, and WeatherRegime python code

model_applications/ s2s/ UserScript_obsERA_obsOnly_WeatherRegime.py

Scientific Objective

To perform a weather regime analysis using 500 mb height data. There are 2 pre- processing steps, RegridDataPlane and PcpCombine, and 3 steps in the weather regime analysis, elbow, EOFs, and K means. The elbow and K means steps begin with K means clustering. Elbow then computes the sum of squared distances for clusters 1 - 14 and draws a straight line from the sum of squared distance for the clusters. This helps determine the optimal cluster number by examining the largest difference between the curve and the straight line. The EOFs step is optional. It computes an empirical orthogonal function analysis. The K means step uses clustering to compute the frequency of occurrernce and anomalies for each cluster to give the most common weather regimes.

Datasets

  • Forecast dataset: None

  • Observation dataset: ERA Reanlaysis 500 mb height.

External Dependencies

You will need to use a version of Python 3.6+ that has the following packages installed:

* numpy
* netCDF4
* datetime
* pylab
* scipy
* sklearn
* eofs

If the version of Python used to compile MET did not have these libraries at the time of compilation, you will need to add these packages or create a new Python environment with these packages.

If this is the case, you will need to set the MET_PYTHON_EXE environment variable to the path of the version of Python you want to use. If you want this version of Python to only apply to this use case, set it in the [user_env_vars] section of a METplus configuration file.:

[user_env_vars] MET_PYTHON_EXE = /path/to/python/with/required/packages/bin/python

METplus Components

This use case runs the weather regime driver script which runs the steps the user lists in STEPS_OBS. The possible steps are regridding, time averaging, computing the elbow (ELBOW), plotting the elbow (PLOTELBOW), computing EOFs (EOF), plotting EOFs (PLOTEOF), computing K means (KMEANS), and plotting the K means (PLOTKMEANS). Regridding and time averaging are set up in the UserScript .conf file and are formatted as follows: PROCESS_LIST = RegridDataPlane(regrid_obs), PcpCombine(daily_mean_obs), UserScript(script_wr)

The other steps are listed in the weather regime analsysis .conf file in the following format: OBS_STEPS = ELBOW+PLOTELBOW+EOF+PLOTEOF+KMEANS+PLOTKMEANS

METplus Workflow

The weather regime python code is run for each time for the forecast and observations data. This example loops by valid time. This version is set to only process the weather regime steps (ELBOW, PLOTELBOW, EOF, PLOTEOF, KMEANS, PLOTKMEANS), omitting the REGRID and TIMEAVE pre-processing steps. However, the configurations for pre-processing are available for user reference.

METplus Configuration

METplus first loads all of the configuration files found in parm/metplus_config, then it loads any configuration files passed to METplus via the command line i.e. parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime.py. The file UserScript_obsERA_obsOnly_WeatherRegime.conf runs the python program, however UserScript_obsERA_obsOnly_Blocking/WeatherRegime_obsERA_obsOnly.conf sets the variables for all steps of the Weather Regime use case including data paths.

# UserScript wrapper for Weather Regime Analysis

[config]
# All steps, including pre-processing:
# PROCESS_LIST = RegridDataPlane(regrid_obs), PcpCombine(daily_mean_obs), UserScript(script_wr)
# Weather Regime Analysis only:
PROCESS_LIST = UserScript(script_wr)

# time looping - options are INIT, VALID, RETRO, and REALTIME
# If set to INIT or RETRO:
#   INIT_TIME_FMT, INIT_BEG, INIT_END, and INIT_INCREMENT must also be set
# If set to VALID or REALTIME:
#   VALID_TIME_FMT, VALID_BEG, VALID_END, and VALID_INCREMENT must also be set
LOOP_BY = VALID

# Format of VALID_BEG and VALID_END using % items
# %Y = 4 digit year, %m = 2 digit month, %d = 2 digit day, etc.
# see www.strftime.org for more information
# %Y%m%d%H expands to YYYYMMDDHH
VALID_TIME_FMT = %Y%m%d%H

# Start time for METplus run - must match VALID_TIME_FMT
VALID_BEG = 1979120100

# End time for METplus run - must match VALID_TIME_FMT
VALID_END = 2017022800

# Increment between METplus runs (in seconds if no units are specified)
#  Must be >= 60 seconds
VALID_INCREMENT = 86400

# List of forecast leads to process for each run time (init or valid)
# In hours if units are not specified
# If unset, defaults to 0 (don't loop through forecast leads)
LEAD_SEQ = 0

# Only Process DJF
SKIP_TIMES = "%m:begin_end_incr(3,11,1)", "%m%d:0229"

# Order of loops to process data - Options are times, processes
# Not relevant if only one item is in the PROCESS_LIST
# times = run all wrappers in the PROCESS_LIST for a single run time, then
#   increment the run time and run all wrappers again until all times have
#   been evaluated.
# processes = run the first wrapper in the PROCESS_LIST for all times
#   specified, then repeat for the next item in the PROCESS_LIST until all
#   wrappers have been run
LOOP_ORDER = processes

# location of configuration files used by MET applications
CONFIG_DIR={PARM_BASE}/use_cases/model_applications/s2s


# Regridding Pre-Processing Step
[regrid_obs]
# Start time for METplus run - must match VALID_TIME_FMT
VALID_BEG = 1979120100

# End time for METplus run - must match VALID_TIME_FMT
VALID_END = 2017022818

# Increment between METplus runs in seconds. Must be >= 60
VALID_INCREMENT = 21600

# REGRID_DATA_PLANE (Pre Processing Step 1), currently turned off
# Run regrid_data_plane on forecast data
OBS_REGRID_DATA_PLANE_RUN = True

# If true, process each field individually and write a file for each
# If false, run once per run time passing in all fields specified
OBS_DATA_PLANE_ONCE_PER_FIELD = False

# Name of input field to process
OBS_REGRID_DATA_PLANE_VAR1_INPUT_FIELD_NAME = Z

# Level of input field to process
OBS_REGRID_DATA_PLANE_VAR1_INPUT_LEVEL = P500

# Name of output field to create
OBS_REGRID_DATA_PLANE_VAR1_OUTPUT_FIELD_NAME = Z500

# Mask to use for regridding
# A 1 degree latitude/longitude grid running 24 to 54 degrees latitude
# and 230 to 300 degrees longitude
REGRID_DATA_PLANE_VERIF_GRID = latlon 71 31 54 230 -1.0 1.0

# Method to run regrid_data_plane, not setting this will default to NEAREST
REGRID_DATA_PLANE_METHOD = BILIN

# Regridding width used in regrid_data_plane, not setting this will default to 1
REGRID_DATA_PLANE_WIDTH = 2

# input and output data directories for each application in PROCESS_LIST
OBS_REGRID_DATA_PLANE_INPUT_DIR = {INPUT_BASE}/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime/ERA/OrigData
OBS_REGRID_DATA_PLANE_OUTPUT_DIR = {OUTPUT_BASE}/s2s/UserScript_obsERA_obsOnly_WeatherRegime/ERA/Regrid

# format of filenames
# Input and output ERA Interim
OBS_REGRID_DATA_PLANE_INPUT_TEMPLATE = {valid?fmt=%Y%m}/ei.oper.an.pl.regn128sc.{valid?fmt=%Y%m%d%H}
OBS_REGRID_DATA_PLANE_OUTPUT_TEMPLATE = {valid?fmt=%Y%m%d}/Z500_6hourly_{init?fmt=%Y%m%d%H}_NH.nc


# Daily Mean Pre-Processing Step
[daily_mean_obs]
# Start time for METplus run
VALID_BEG = 1979120118

# End time for METplus run
VALID_END = 2017022818

# run pcp_combine on obs data
OBS_PCP_COMBINE_RUN = True

# method to run pcp_combine on forecast data
# Options are ADD, SUM, SUBTRACT, DERIVE, and USER_DEFINED
OBS_PCP_COMBINE_METHOD = DERIVE
OBS_PCP_COMBINE_STAT_LIST = MEAN

# field name and level of 1 hr accumulation in forecast files
OBS_PCP_COMBINE_INPUT_ACCUMS = 6
OBS_PCP_COMBINE_INPUT_NAMES = Z500
OBS_PCP_COMBINE_INPUT_LEVELS = "(*,*)"
OBS_PCP_COMBINE_INPUT_OPTIONS = convert(x) = x / 9.81; set_attr_valid = "{valid?fmt=%Y%m%d_%H%M%S?shift=-64800}";

# Convert height and derive mean over 24 hours
OBS_PCP_COMBINE_OUTPUT_ACCUM = 24
OBS_PCP_COMBINE_DERIVE_LOOKBACK = 24

# Name output variable Z500
OBS_PCP_COMBINE_OUTPUT_NAME = Z500

# input and output data directories for each application in PROCESS_LIST
OBS_PCP_COMBINE_INPUT_DIR = {OUTPUT_BASE}/s2s/UserScript_obsERA_obsOnly_WeatherRegime/ERA/Regrid
OBS_PCP_COMBINE_OUTPUT_DIR = {OUTPUT_BASE}/s2s/UserScript_obsERA_obsOnly_WeatherRegime/ERA/Daily

# Input ERA Interim
OBS_PCP_COMBINE_INPUT_TEMPLATE = {valid?fmt=%Y%m%d}/Z500_6hourly_{valid?fmt=%Y%m%d%H}_NH.nc
OBS_PCP_COMBINE_OUTPUT_TEMPLATE = Z500_daily_{valid?fmt=%Y%m%d?shift=-64800}_NH.nc


# Run the Weather Regime Script
[script_wr]
# User Script Commands
USER_SCRIPT_CUSTOM_LOOP_LIST = nc

# Run the user script once
USER_SCRIPT_RUNTIME_FREQ = RUN_ONCE

# Command to run the user script with input configuration file
USER_SCRIPT_COMMAND = {METPLUS_BASE}/parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime/WeatherRegime_driver.py {METPLUS_BASE}/parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime/WeatherRegime_obsERA_obsOnly.conf dir.INPUT_BASE={INPUT_BASE} dir.OUTPUT_BASE={OUTPUT_BASE} dir.MET_INSTALL_DIR={MET_INSTALL_DIR}

MET Configuration

METplus sets environment variables based on the values in the METplus configuration file. These variables are referenced in the MET configuration file. YOU SHOULD NOT SET ANY OF THESE ENVIRONMENT VARIABLES YOURSELF! THEY WILL BE OVERWRITTEN BY METPLUS WHEN IT CALLS THE MET TOOLS! If there is a setting in the MET configuration file that is not controlled by an environment variable, you can add additional environment variables to be set only within the METplus environment using the [user_env_vars] section of the METplus configuration files. See the ‘User Defined Config’ section on the ‘System Configuration’ page of the METplus User’s Guide for more information.

See the following files for more information about the environment variables set in this configuration file.

parm/use_cases/met_tool_wrapper/RegridDataPlane/RegridDataPlane.py parm/use_cases/met_tool_wrapper/PCPCombine/PCPCOmbine_derive.py

Python Scripts

This use case uses Python scripts to perform the blocking calculation

parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime/WeatherRegime_driver.py: This script calls the requested steps in the blocking analysis for a forecast, observation, or both. The possible steps are computing the elbow, computing EOFs, and computing weather regimes using k means clustering.

parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime/WeatherRegime.py: This script runs the requested steps, containing the code for computing the bend in the elbow, computing EOFs, and computing weather regimes using k means clustering

parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime/Blocking_WeatherRegime_util.py: This script contains functions used by both the blocking anwd weather regime analysis, including the code for determining which steps the user wants to run, and finding and reading the input files in the format from the output pre-processing steps

#!/usr/bin/env python3
import sys
import os
import numpy as np
import netCDF4

from WeatherRegime import WeatherRegimeCalculation
from metplus.util import pre_run_setup, config_metplus, get_start_end_interval_times, get_lead_sequence
from metplus.util import get_skip_times, skip_time, is_loop_by_init, ti_calculate, do_string_sub, getlist
from metplotpy.contributed.weather_regime import plot_weather_regime as pwr
from Blocking_WeatherRegime_util import find_input_files, parse_steps, read_nc_met

def main():

    all_steps = ["ELBOW","PLOTELBOW","EOF","PLOTEOF","KMEANS","PLOTKMEANS"]

    inconfig_list = sys.argv[1:]
    steps_list_fcst,steps_list_obs,config_list = parse_steps(inconfig_list)
    config = pre_run_setup(config_list)

    if not steps_list_obs and not steps_list_fcst:
        print('No processing steps requested for either the model or observations,')
        print('no data will be processed')


    ######################################################################
    # Blocking Calculation and Plotting
    ######################################################################
    # Set up the data
    steps_obs = WeatherRegimeCalculation(config,'OBS')
    steps_fcst = WeatherRegimeCalculation(config,'FCST')

    # Check to see if there is a plot directory
    oplot_dir = config.getstr('WeatherRegime','WR_PLOT_OUTPUT_DIR','')
    if not oplot_dir:
        obase = config.getstr('config','OUTPUT_BASE')
        oplot_dir = obase+'/'+'plots'
    if not os.path.exists(oplot_dir):
        os.makedirs(oplot_dir)

    elbow_config = config_metplus.replace_config_from_section(config,'WeatherRegime')
    elbow_config_init = config.find_section('WeatherRegime','INIT_BEG')
    elbow_config_valid = config.find_section('WeatherRegime','VALID_BEG')
    use_init =  is_loop_by_init(elbow_config)


    if ("ELBOW" in steps_list_obs) or ("EOF" in steps_list_obs) or ("KMEANS" in steps_list_obs):
        obs_infiles,yr_obs,mth_obs,day_obs,yr_full_obs = find_input_files(elbow_config, use_init, 'OBS_WR_TEMPLATE')
        obs_invar = config.getstr('WeatherRegime','OBS_WR_VAR','')
        z500_obs,lats_obs,lons_obs,year_obs = read_nc_met(obs_infiles,yr_obs,obs_invar)
        z500_detrend_obs,z500_detrend_2d_obs = steps_obs.weights_detrend(lats_obs,lons_obs,z500_obs)

    if ("ELBOW" in steps_list_fcst) or ("EOF" in steps_list_fcst) or("KMEANS" in steps_list_fcst):
        fcst_infiles,yr_fcst, mth_fcst,day_fcst,yr_full_fcst = find_input_files(elbow_config, use_init, 'FCST_WR_TEMPLATE')
        fcst_invar = config.getstr('WeatherRegime','FCST_WR_VAR','')
        z500_fcst,lats_fcst,lons_fcst,year_fcst = read_nc_met(fcst_infiles,yr_fcst,fcst_invar)
        z500_detrend_fcst,z500_detrend_2d_fcst = steps_fcst.weights_detrend(lats_fcst,lons_fcst,z500_fcst)


    if ("ELBOW" in steps_list_obs):
        print('Running Obs Elbow')
        K_obs,d_obs,mi_obs,line_obs,curve_obs = steps_obs.run_elbow(z500_detrend_2d_obs)

    if ("ELBOW" in steps_list_fcst):
        print('Running Forecast Elbow')
        K_fcst,d_fcst,mi_fcst,line_fcst,curve_fcst = steps_fcst.run_elbow(z500_detrend_2d_fcst)

    if ("PLOTELBOW" in steps_list_obs):
        if not ("ELBOW" in steps_list_obs):
            raise Exception('Must run observed Elbow before plotting observed elbow.')
        print('Creating Obs Elbow plot')
        elbow_plot_title = config.getstr('WeatherRegime','OBS_ELBOW_PLOT_TITLE','Elbow Method For Optimal k')
        elbow_plot_outname = oplot_dir+'/'+config.getstr('WeatherRegime','OBS_ELBOW_PLOT_OUTPUT_NAME','obs_elbow')
        pwr.plot_elbow(K_obs,d_obs,mi_obs,line_obs,curve_obs,elbow_plot_title,elbow_plot_outname)

    if ("PLOTELBOW" in steps_list_fcst):
        if not ("ELBOW" in steps_list_fcst):
            raise Exception('Must run forecast Elbow before plotting forecast elbow.')
        print('Creating Forecast Elbow plot')
        elbow_plot_title = config.getstr('WeatherRegime','FCST_ELBOW_PLOT_TITLE','Elbow Method For Optimal k')
        elbow_plot_outname = oplot_dir+'/'+config.getstr('WeatherRegime','FCST_ELBOW_PLOT_OUTPUT_NAME','fcst_elbow')
        pwr.plot_elbow(K_fcst,d_fcst,mi_fcst,line_fcst,curve_fcst,elbow_plot_title,elbow_plot_outname)


    if ("EOF" in steps_list_obs):
        print('Running Obs EOF')
        eof_obs,pc_obs,wrnum_obs,variance_fractions_obs = steps_obs.Calc_EOF(z500_obs)
        z500_detrend_2d_obs = steps_obs.reconstruct_heights(eof_obs,pc_obs,z500_detrend_2d_obs.shape)

    if ("EOF" in steps_list_fcst):
        print('Running Forecast EOF')
        eof_fcst,pc_fcst,wrnum_fcst,variance_fractions_fcst = steps_fcst.Calc_EOF(z500_fcst)
        z500_detrend_2d_fcst = steps_fcst.reconstruct_heights(eof_fcst,pc_fcst,z500_detrend_2d_fcst.shape)

    if ("PLOTEOF" in steps_list_obs):
        if not ("EOF" in steps_list_obs):
            raise Exception('Must run observed EOFs before plotting observed EOFs.')
        print('Plotting Obs EOFs')
        pltlvls_str = getlist(config.getstr('WeatherRegime','EOF_PLOT_LEVELS',''))
        pltlvls = [float(pp) for pp in pltlvls_str]
        eof_plot_outname = oplot_dir+'/'+config.getstr('WeatherRegime','OBS_EOF_PLOT_OUTPUT_NAME','obs_eof')
        pwr.plot_eof(eof_obs,wrnum_obs,variance_fractions_obs,lons_obs,lats_obs,eof_plot_outname,pltlvls)

    if ("PLOTEOF" in steps_list_fcst):
        if not ("EOF" in steps_list_fcst):
            raise Exception('Must run forecast EOFs before plotting forecast EOFs.')
        print('Plotting Forecast EOFs')
        pltlvls_str = getlist(config.getstr('WeatherRegime','EOF_PLOT_LEVELS',''))
        pltlvls = [float(pp) for pp in pltlvls_str]
        eof_plot_outname = oplot_dir+'/'+config.getstr('WeatherRegime','OBS_EOF_PLOT_OUTPUT_NAME','obs_eof')
        pwr.plot_eof(eof_fcst,wrnum_fcst,variance_fractions_fcst,lons_fcst,lats_fcst,eof_plot_outname,pltlvls)


    if ("KMEANS" in steps_list_obs):
        print('Running Obs K Means')
        kmeans_obs,wrnum_obs,perc_obs = steps_obs.run_K_means(z500_detrend_2d_obs,yr_full_obs,mth_obs,day_obs,z500_obs.shape)

    if ("KMEANS" in steps_list_fcst):
        print('Running Forecast K Means')
        kmeans_fcst,wrnum_fcst,perc_fcst = steps_fcst.run_K_means(z500_detrend_2d_fcst,yr_full_fcst,mth_fcst,day_fcst,z500_fcst.shape)

    if ("PLOTKMEANS" in steps_list_obs):
        if not ("KMEANS" in steps_list_obs):
            raise Exception('Must run observed Kmeans before plotting observed Kmeans.')
        print('Plotting Obs K Means')
        pltlvls_str = getlist(config.getstr('WeatherRegime','KMEANS_PLOT_LEVELS',''))
        pltlvls = [float(pp) for pp in pltlvls_str]
        kmeans_plot_outname = oplot_dir+'/'+config.getstr('WeatherRegime','OBS_KMEANS_PLOT_OUTPUT_NAME','obs_kmeans')
        pwr.plot_K_means(kmeans_obs,wrnum_obs,lons_obs,lats_obs,perc_obs,kmeans_plot_outname,pltlvls)

    if ("PLOTKMEANS" in steps_list_fcst):
        if not ("KMEANS" in steps_list_fcst):
            raise Exception('Must run forecast Kmeans before plotting forecast Kmeans.')
        print('Plotting Forecast K Means')
        pltlvls_str = getlist(config.getstr('WeatherRegime','KMEANS_PLOT_LEVELS',''))
        pltlvls = [float(pp) for pp in pltlvls_str]
        kmeans_plot_outname = oplot_dir+'/'+config.getstr('WeatherRegime','FCST_KMEANS_PLOT_OUTPUT_NAME','fcst_kmeans')
        pwr.plot_K_means(kmeans_fcst,wrnum_fcst,lons_fcst,lats_fcst,perc_fcts,kmeans_plot_outname,pltlvls)


if __name__ == "__main__":
    main()
import os
import numpy as np
from pylab import *
from sklearn.cluster import KMeans
import scipy
import netCDF4 as nc4
from scipy import stats,signal
from numpy import ones,vstack
from numpy.linalg import lstsq
from eofs.standard import Eof
from metplus.util import config_metplus, get_start_end_interval_times, get_lead_sequence
from metplus.util import get_skip_times, skip_time, is_loop_by_init, ti_calculate


class WeatherRegimeCalculation():
    """Contains the programs to perform a Weather Regime Analysis
    """
    def __init__(self,config,label):

        self.wrnum = config.getint('WeatherRegime',label+'_WR_NUMBER',6)
        self.numi = config.getint('WeatherRegime',label+'_NUM_CLUSTERS',20)
        self.NUMPCS = config.getint('WeatherRegime',label+'_NUM_PCS',10)
        self.wr_outfile_type = config.getstr('WeatherRegime',label+'_WR_OUTPUT_FILE_TYPE','text')
        self.wr_outfile_dir = config.getstr('WeatherRegime','WR_OUTPUT_FILE_DIR','')
        self.wr_outfile = config.getstr('WeatherRegime',label+'_WR_OUTPUT_FILE',label+'_WeatherRegime')


    def get_cluster_fraction(self, m, label):
        return (m.labels_==label).sum()/(m.labels_.size*1.0)


    def weights_detrend(self,lats,lons,indata):

        arr_shape = indata.shape

        ##Set up weight array
        cos = lats * np.pi / 180.0
        way = np.cos(cos)
        if len(lats.shape) == 1:
            weightf = np.repeat(way[:,np.newaxis],len(lons),axis=1)
        else:
            weightf = way

        #Remove trend and seasonal cycle
        atemp = np.zeros(arr_shape)
        for i in np.arange(0,len(indata[0,:,0,0]),1):
            atemp[:,i] = signal.detrend(atemp[:,i],axis=0)
            atemp[:,i] = (indata[:,i] - np.nanmean(indata[:,i],axis=0)) * weightf

        a = atemp
        atemp=0

        #Reshape into time X space
        a1 = np.reshape(a,[len(a[:,0,0,0])*len(a[0,:,0,0]),len(lons)*len(lats)])

        return a,a1


    def run_elbow(self,a1):

        k=KMeans(n_clusters=self.wrnum, random_state=0)  #Initilize cluster centers

        #Calculate sum of squared distances for clusters 1-15
        kind = np.arange(1,self.numi,1)
        Sum_of_squared_distances = []
        K = range(1,self.numi)
        for k in K:
            km = KMeans(n_clusters=k)
            km = km.fit(a1)
            Sum_of_squared_distances.append(km.inertia_)

        # Calculate the bend of elbow
        points = [(K[0],Sum_of_squared_distances[0]),(K[-1],Sum_of_squared_distances[-1])]
        x_coords, y_coords = zip(*points)
        A = vstack([x_coords,ones(len(x_coords))]).T
        m, c = lstsq(A, y_coords,rcond=None)[0]
        line = m*kind + c
        curve = Sum_of_squared_distances
        curve=np.array(curve)*10**-10
        line = line*10**-10

        d=[]
        for i in np.arange(0,self.numi-1,1):
            p1=np.array([K[0],curve[0]])
            p2=np.array([K[-1],curve[-1]])
            p3=np.array([K[i],curve[i]])
            d=np.append(d,np.cross(p2-p1,p3-p1)/np.linalg.norm(p2-p1))

        mi = np.where(d==d.min())[0]
        print('Optimal Cluster # = '+str(mi+1)+'')

        return K,d,mi,line,curve


    def Calc_EOF(self,eofin):

        #Remove trend and seasonal cycle
        for d in np.arange(0,len(eofin[0,:,0,0]),1):
            eofin[:,d] = signal.detrend(eofin[:,d],axis=0)
            eofin[:,d] = eofin[:,d] - np.nanmean(eofin[:,d],axis=0)

        #Reshape into time X space
        arr_shape = eofin.shape
        arrdims = len(arr_shape)
        eofin = np.reshape(eofin,(np.prod(arr_shape[0:arrdims-2]),arr_shape[arrdims-2]*arr_shape[arrdims-1]))

        # Use EOF solver and get PCs and EOFs
        solver = Eof(eofin,center=False)
        pc = solver.pcs(npcs=self.NUMPCS,pcscaling=1)
        eof = solver.eofsAsCovariance(neofs=self.NUMPCS,pcscaling=1)
        eof = np.reshape(eof,(self.NUMPCS,arr_shape[arrdims-2],arr_shape[arrdims-1]))

        #Get variance fractions
        variance_fractions = solver.varianceFraction(neigs=self.NUMPCS) * 100
        print(variance_fractions)

        return eof, pc, self.wrnum, variance_fractions


    def reconstruct_heights(self,eof,pc,reshape_arr):

        rssize = len(reshape_arr)
        eofshape = eof.shape
        eosize = len(eofshape)

        #reconstruction. If NUMPCS=nt, then a1=a0
        eofs=np.reshape(eof,(self.NUMPCS,eofshape[eosize-2]*eofshape[eosize-1]))
        a1=np.matmul(pc,eofs)

        return a1


    def run_K_means(self,a1,yr,mth,day,arr_shape):

        arrdims = len(arr_shape)

        k=KMeans(n_clusters=self.wrnum, random_state=0)

        #fit the K-means algorithm to the data
        f=k.fit(a1)

        #Obtain the cluster anomalies
        y=f.cluster_centers_

        #Obtain cluster labels for each day [Reshape to Year,day]
        wr = f.labels_
        wr = np.reshape(wr,arr_shape[0:arrdims-2])

        yf = np.reshape(y,[self.wrnum,arr_shape[arrdims-2],arr_shape[arrdims-1]]) # reshape cluster anomalies to latlon

        #Get frequency of occurence for each cluster
        perc=np.zeros(self.wrnum)
        for ii in np.arange(0,self.wrnum,1):
            perc[ii] = self.get_cluster_fraction(f,ii)

        #Sort % from low to high
        ii = np.argsort(perc)
        print(perc[ii])

        #Reorder
        perc = perc[ii]
        input=yf[ii]
        ii = ii[::-1]

        #Reorder from max to min and relabel
        wrc = wr*1.0/1.0
        for i in np.arange(0,self.wrnum,1):
            wrc[wr==ii[i]] = i

        perc = perc[::-1]
        input = input[::-1]

        #Save Label data [YR,DAY]
        # Make some conversions first
        wrc_shape = wrc.shape
        yr_1d = np.array(yr)
        mth_1d = np.array(mth)
        day_1d = np.array(day)
        wrc_1d = np.reshape(wrc,len(mth))

        # netcdf file
        if self.wr_outfile_type=='netcdf':
            wr_full_outfile = self.wr_outfile_dir+'/'+self.wr_outfile+'.nc'

            if os.path.isfile(wr_full_outfile):
                os.remove(wr_full_outfile)

            # Create CF compliant time unit
            rdate = datetime.datetime(int(yr_1d[0]),int(mth_1d[0]),int(day_1d[0]),0,0,0)
            cf_diffdays = np.zeros(len(yr_1d))
            ymd_arr = np.empty(len(yr_1d))
            for dd in range(len(yr_1d)):
                loopdate = datetime.datetime(int(yr_1d[dd]),int(mth_1d[dd]),int(day_1d[dd]),0,0,0)
                cf_diffdays[dd] = (loopdate - rdate).days
                ymd_arr[dd] = yr_1d[dd]+mth_1d[dd]+day_1d[dd]

            nc = nc4.Dataset(wr_full_outfile, 'w')
            nc.createDimension('time', len(mth))
            nc.Conventions = "CF-1.7"
            nc.title = "Weather Regime Classification"
            nc.institution = "NCAR DTCenter"
            nc.source = "Weather Regime METplus use-case"

            ncti = nc.createVariable('time','d',('time'))
            nc.variables['time'].long_name = "time"
            nc.variables['time'].standard_name = "time"
            nc.variables['time'].units = "days since "+rdate.strftime('%Y-%m-%d %H:%M:%S')
            nc.variables['time'].calendar = "gregorian"

            ncdate = nc.createVariable('date','i',('time'))
            nc.variables['date'].long_name = "date YYYYMMDD"
       
            ncnum = nc.createVariable('wrnum','i',('time'),fill_value=-9999.0)
            nc.variables['wrnum'].long_name = "weather_regime_number"

            ncti[:] = cf_diffdays
            ncdate[:] = ymd_arr
            ncnum[:] = wrc_1d
            nc.close()

        # text file
        if self.wr_outfile_type=='text':
           wr_full_outfile = self.wr_outfile_dir+'/'+self.wr_outfile+'.txt'

           if os.path.isfile(wr_full_outfile):
                os.remove(wr_full_outfile)

           otdata = np.array([yr_1d, mth_1d, day_1d, wrc_1d])
           otdata = otdata.T

           with open(wr_full_outfile, 'w+') as datafile_id:
               np.savetxt(datafile_id, otdata, fmt=['%6s','%3s','%4s','%6s'], header='Year Month Day WeatherRegime')


        return input, self.wrnum, perc
import os
import netCDF4
import numpy as np
from metplus.util import pre_run_setup, config_metplus, get_start_end_interval_times, get_lead_sequence
from metplus.util import get_skip_times, skip_time, is_loop_by_init, ti_calculate, do_string_sub

def parse_steps(config_list):

    steps_config_part_fcst = [s for s in config_list if "FCST_STEPS" in s]
    steps_list_fcst = []

    steps_config_part_obs = [s for s in config_list if "OBS_STEPS" in s]
    steps_list_obs = []

    # Setup the Steps
    if steps_config_part_fcst:
        steps_param_fcst = steps_config_part_fcst[0].split("=")[1]
        steps_list_fcst = steps_param_fcst.split("+")
        config_list.remove(steps_config_part_fcst[0])
    if steps_config_part_obs:
        steps_param_obs = steps_config_part_obs[0].split("=")[1]
        steps_list_obs = steps_param_obs.split("+")
        config_list.remove(steps_config_part_obs[0])

    config = pre_run_setup(config_list)
    if not steps_config_part_fcst:
        steps_param_fcst = config.getstr('config','FCST_STEPS','')
        steps_list_fcst = steps_param_fcst.split("+")
    if not steps_config_part_obs:
        steps_param_obs = config.getstr('config','OBS_STEPS','')
        steps_list_obs = steps_param_obs.split("+")

    return steps_list_fcst, steps_list_obs, config_list 


def find_input_files(inconfig, use_init, intemplate, secondtemplate=''):
    loop_time, end_time, time_interval = get_start_end_interval_times(inconfig)
    skip_times = get_skip_times(inconfig)

    start_mth = loop_time.strftime('%m')
    template = inconfig.getraw('config',intemplate)
    if secondtemplate:
        template2 = inconfig.getraw('config',secondtemplate)
        file_list2 = []

    file_list = []
    yr_list = []
    mth_list = []
    day_list = []
    yr_full_list = []
    if use_init:
        timname = 'init'
    else:
        timname = 'valid'
    input_dict = {}
    input_dict['loop_by'] = timname
    pmth = start_mth
    while loop_time <= end_time:
        lead_seq = get_lead_sequence(inconfig)
        for ls in lead_seq:
            new_time = loop_time + ls
            input_dict[timname] = loop_time
            input_dict['lead'] = ls

            outtimestuff = ti_calculate(input_dict)
            if skip_time(outtimestuff, skip_times):
                continue
            cmth = outtimestuff['valid'].strftime('%m')
            filepath = do_string_sub(template, **outtimestuff)
            mth_list.append(cmth)
            day_list.append(outtimestuff['valid'].strftime('%d'))
            yr_full_list.append(outtimestuff['valid'].strftime('%Y'))
            if secondtemplate:
                filepath2 = do_string_sub(template2, **outtimestuff)
                if os.path.exists(filepath) and os.path.exists(filepath2):
                    file_list.append(filepath)
                    file_list2.append(filepath2)
                else:
                    file_list.append('')
                    file_list2.append('')
            else:
                if os.path.exists(filepath):
                    file_list.append(filepath)
                else:
                    file_list.append('')

            if (int(cmth) == int(start_mth)) and (int(pmth) != int(start_mth)):
                yr_list.append(int(outtimestuff['valid'].strftime('%Y')))
            pmth = cmth

        loop_time += time_interval

    if secondtemplate:
        file_list = [file_list,file_list2]
    yr_list.append(int(outtimestuff['valid'].strftime('%Y')))

    if all('' == fn for fn in file_list):
        raise Exception('No input files found as given: '+template)

    return file_list, yr_list, mth_list, day_list, yr_full_list

def read_nc_met(infiles,yrlist,invar):

    print("Reading in Data")

    #Find the first non empty file name so I can get the variable sizes
    locin = next(sub for sub in infiles if sub)
    indata = netCDF4.Dataset(locin)
    lats = indata.variables['lat'][:]
    lons = indata.variables['lon'][:]
    invar_arr = indata.variables[invar][:]
    indata.close()

    var_3d = np.empty([len(infiles),len(invar_arr[:,0]),len(invar_arr[0,:])])

    for i in range(0,len(infiles)):

        #Read in the data
        if infiles[i]:
            indata = netCDF4.Dataset(infiles[i])
            new_invar = indata.variables[invar][:]
            init_time_str = indata.variables[invar].getncattr('init_time')
            valid_time_str = indata.variables[invar].getncattr('valid_time')
            indata.close()
        else:
            new_invar = np.empty((1,len(var_3d[0,:,0]),len(var_3d[0,0,:])),dtype=np.float)
            new_invar[:] = np.nan
        var_3d[i,:,:] = new_invar

    yr = np.array(yrlist)
    if len(var_3d[:,0,0])%float(len(yrlist)) != 0:
        lowval = int(len(var_3d[:,0,0])/float(len(yrlist)))
        newarrlen = (lowval+1) * float(len(yrlist))
        arrexp = int(newarrlen - len(var_3d[:,0,0]))
        arrfill = np.empty((arrexp,len(var_3d[0,:,0]),len(var_3d[0,0,:])),dtype=np.float)
        arrfill[:] = np.nan
        var_3d = np.append(var_3d,arrfill,axis=0)
    sdim = len(var_3d[:,0,0])/float(len(yrlist))
    var_4d = np.reshape(var_3d,[len(yrlist),int(sdim),len(var_3d[0,:,0]),len(var_3d[0,0,:])])

    return var_4d,lats,lons,yr

Running METplus

This use case is run in the following ways:

  1. Passing in UserScript_obsERA_obsOnly_WeatherRegime.py then a user-specific system configuration file:

    master_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime.py -c /path/to/user_system.conf
    
  2. Modifying the configurations in parm/metplus_config, then passing in UserScript_obsERA_obsOnly_WeatherRegime.py:

    master_metplus.py -c /path/to/METplus/parm/use_cases/model_applications/s2s/UserScript_obsERA_obsOnly_WeatherRegime.py
    

The following variables must be set correctly:

  • INPUT_BASE - Path to directory where sample data tarballs are unpacked (See Datasets section to obtain tarballs). This is not required to run METplus, but it is required to run the examples in parm/use_cases

  • OUTPUT_BASE - Path where METplus output will be written. This must be in a location where you have write permissions

  • MET_INSTALL_DIR - Path to location where MET is installed locally

Example User Configuration File:

[dir]
INPUT_BASE = /path/to/sample/input/data
OUTPUT_BASE = /path/to/output/dir
MET_INSTALL_DIR = /path/to/met-X.Y

Expected Output

Refer to the value set for OUTPUT_BASE to find where the output data was generated. Output for this use case will be found in model_applications/s2s/WeatherRegime (relative to OUTPUT_BASE) and will contain output for the steps requested. This may include the regridded data, daily averaged files, and a weather regime output file. In addition, output elbow, EOF, and Kmeans weather regime plots can be generated. The location of these output plots can be specified as WR_OUTPUT_DIR. If it is not specified, plots will be sent to model_applications/s2s/WeatherRegime/plots (relative to OUTPUT_BASE).

Keywords

sphinx_gallery_thumbnail_path = ‘_static/s2s-OBS_ERA_weather_regime.png’

Total running time of the script: ( 0 minutes 0.000 seconds)

Gallery generated by Sphinx-Gallery