4. Using METdbLoad

The METdbLoad module provides support for inserting MET output data (as .stat files) into a relational database (mysql, mariadb, or aurora).

4.1. Create Database

Before using the METdbLoad module, the database must exist and have the proper permissions (i.e. grant privileges to insert, delete, update, and index).

Data must be loaded into a database which has the prefix 'mv_', (e.g. mv_met_data). This database must be structured with the METviewer mv_mysql.sql schema:

cd [install]/metviewer
mysql -u[db_username] -p[db_password] -e'create database [db_name];'
mysql -u[db_username] -p[db_password] [db_name] < sql/mv_mysql.sql
  • Replace db_usenamer with the username

  • Replace db_passwd with the password

  • Replace db_name with the appropriate database name that begins with mv (e.g. mv_xyz )

Note

Use the mysql command line client (https://dev.mysql.com/doc/refman/en/mysql.html) to run the above instructions.

The mv_mysql.sql file is located in the METdataio/METdbLoad/sql/ directory.

Create MET linetype tables

mysql -udbuser -pdbpasswd dbtable_name < /path-to-METdataio/METdbLoad/sql/mv_mysql.sql

  • Replace dbuser with the username

  • Replace dbpasswd with the password

  • Replace dbtable_name with the appropriate database name that begins with mv (e.g. mv_xyz )

  • Replace path-to-METdataio with the path where the METdataio source code is saved

  • Note the use of the redirection symbol ‘<’ in the command

4.2. Create an XML Load Specification Document

The XML load specification document contains information about the MET data, database connection information, and the location of data to be loaded. This is used by the METdbLoad script met_db_load.py to load the MET data. The XML load specification is validated against a schema to check that the file is valid. This validation is necessary to prevent extremely large payloads or recursive payloads that can compromise the loading of data. The elements in the XML load specification file must adhere to the order specified by the XML schema and conform to size and number of element limitations. Click to expand the example XML load specifications and the XML validation schema below.

An example XML specification file that is valid

For defining data organized by dates using the folder_tmpl element:

<load_spec>
    <connection>
        <host>yourhost:1234</host>
        <database>your_database_name</database>
        <user>your_db_username</user>
        <password>your_password</password>
    </connection>

    <date_list name="folder_dates">
        <start>2022050100</start>
        <end>2022051200</end>
        <inc>86400</inc>
        <format>yyyyMMddHH</format>
    </date_list>
    <date_list name="valid_dates">
        <start>2022050100</start>
        <end>2022051200</end>
        <inc>0600</inc>
        <format>yyyyMMddHH</format>
    </date_list>

    <verbose>false</verbose>
    <insert_size>1</insert_size>
    <stat_header_db_check>False</stat_header_db_check>
    <mode_header_db_check>false</mode_header_db_check>
    <drop_indexes>FALSE</drop_indexes>
    <apply_indexes>true</apply_indexes>
    <load_stat>True</load_stat>
    <load_mode>false</load_mode>
    <load_mpr>false</load_mpr>
    <group>Regional Ensemble</group>

    <folder_tmpl>
        /var/autofs/mnt/hostmachine/projects/RRFS/prototype/met_out/{config}/{fcst_init}/{mem}/{valid_times}/metprd/{met_out}/
    </folder_tmpl>

    <load_val>
        <field name="config">
            <val>HREF_lag_offset</val>
            <val>RTPS</val>
        </field>
        <field name="fcst_init">
            <date_list name="folder_dates"/>
        </field>
        <field name="mem">
            <val>mem01</val>
            <val>mem02</val>
            <val>mem03</val>
            <val>mem04</val>
            <val>mem05</val>
            <val>mem06</val>
            <val>mem07</val>
            <val>mem08</val>
            <val>mem09</val>
            <val>mem10</val>
        </field>
        <field name="valid_times">
            <date_list name="valid_dates"/>
        </field>
        <field name="met_out">
            <val>grid_stat_cmn</val>
            <val>point_stat_cmn</val>
        </field>
    </load_val>
    <line_type>
        <val>CNT</val>
        <val>ORANK</val>
        <val>SEEPS_MPR</val>
        <val>ME</val>
    </line_type>
    <load_xml>true</load_xml>
    <load_note>Load HREF and RTPS data for Spring 2022.</load_note>

</load_spec>
Another example XML specification file that is valid

For specifying a list of input data files using the load_files element:

<load_spec>
  <connection>
    <host>yourhost:1234</host>
    <database>your_database_name</database>
    <user>your_db_username</user>
    <password>your_password</password>
  </connection>

  <verbose>true</verbose>
  <insert_size>1</insert_size>
  <stat_header_db_check>False</stat_header_db_check>
  <mode_header_db_check>false</mode_header_db_check>
  <drop_indexes>FALSE</drop_indexes>
  <apply_indexes>true</apply_indexes>
  <load_stat>True</load_stat>
  <load_mode>false</load_mode>
  <load_mpr>false</load_mpr>
  <group>Regional Ensemble</group>

  <load_files>
    <file>/met_out/mode/mode_MASK_POLY_300000L_20120410_180000V_060000A_cts.txt</file>
    <file>/met_out/rhist/ensemble_stat_RRFS_GEFS_GF.SPP.SPPT_RETOP_MRMS_20220507_120000V.stat</file>
  </load_files>

  
  <load_xml>true</load_xml>
  <load_note>Load MODE and RHIST for load_files.</load_note>
  
</load_spec>
The XML schema that is used to validate the XML specification file
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Define special types used to impose size limitations on payload-->

    <xs:simpleType name="inserttype">
        <!-- only insert size of 1 -->
        <xs:restriction base="xs:positiveInteger">
            <xs:enumeration value="1"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="truefalsetype">
        <!--Checking for true or false values only (allow for case-insensitive) -->
        <xs:restriction base="xs:string">
            <xs:pattern value="((t|T)(r|R)(u|U)(e|E))|((f|F)(a|A)(l|L)(s|S)(e|E))"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="usernametype">
        <!--Limit the username to 32 characters (mysql maximum) -->
        <xs:restriction base="xs:string">
            <xs:pattern value="[\s\-_a-zA-z0-9]+"/>
            <xs:minLength value="3"/>
            <xs:maxLength value="32"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="hostnametype">
        <!--Limit the hostname to 60 characters (mysql maximum) add 7 for port number -->
        <xs:restriction base="xs:string">
            <xs:pattern value="([.\-_a-zA-z0-9]+:\d+)"/>
            <xs:minLength value="3"/>
            <xs:maxLength value="67"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="dbnametype">
        <!--Limit the  database name to max 124 characters based on Stack Overflow  -->
        <xs:restriction base="xs:string">
            <xs:pattern value="[\-_a-zA-z0-9]+"/>
            <xs:minLength value="3"/>
            <xs:maxLength value="124"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="pwdtype">
        <!--Limit the password length  -->
        <xs:restriction base="xs:string">
            <xs:minLength value="3"/>
            <xs:maxLength value="30"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="foldertmpl">
        <!--Checking for folder template name, limit to max 4096 characters -->
        <xs:restriction base="xs:string">
            <!-- Allow _, #, -, / in path -->
            <!-- https://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs -->
            <xs:pattern value=".+"/>
            <xs:minLength value="3"/>
            <xs:maxLength value="4096"/>

        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="mgmtsystype">
        <!--Checking for mariadb|aurora|mysql values in management_system-->
        <xs:restriction base="xs:string">
            <xs:enumeration value="mysql"/>
            <xs:enumeration value="aurora"/>
            <xs:enumeration value="mariadb"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="limitedstringtype">
        <!--Limit the database group and description  -->
        <xs:restriction base="xs:string">
            <xs:pattern value="[\s\-._,a-zA-z0-9]+"/>
            <xs:minLength value="1"/>
            <xs:maxLength value="300"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="linetype">
        <xs:restriction base="xs:string">
            <xs:minLength value="2"/>
            <xs:maxLength value="25"/>
            <xs:pattern value="[_a-zA-Z0-9]+"/>
        </xs:restriction>
    </xs:simpleType>


    <!-- COMPLEX TYPES -->
    <xs:element name="load_spec">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="connection">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element type="mgmtsystype" name="management_system" minOccurs="0"/>
                            <xs:element type="hostnametype" name="host"/>
                            <xs:element type="dbnametype" name="database"/>
                            <xs:element type="usernametype" name="user"/>
                            <xs:element type="pwdtype" name="password"/>
                            <xs:element type="truefalsetype" name="local_infile" minOccurs="0"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>

                <xs:element name="date_list" minOccurs="0" maxOccurs="5">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element type="xs:int" name="start"/>
                            <xs:element type="xs:int" name="end"/>
                            <xs:element type="xs:int" name="inc"/>
                            <xs:element type="xs:string" name="format"/>
                        </xs:sequence>
                        <xs:attribute type="xs:string" name="name"/>
                    </xs:complexType>
                </xs:element>

                <!-- Header and other flags -->
                <xs:element type="truefalsetype" name="verbose"/>
                <xs:element type="xs:positiveInteger" name="insert_size"/>
                <xs:element type="truefalsetype" name="stat_header_db_check" minOccurs="0"/>
                <xs:element type="truefalsetype" name="mode_header_db_check" minOccurs="0"/>
                <xs:element type="truefalsetype" name="mtd_header_db_check" minOccurs="0"/>
                <xs:element type="truefalsetype" name="drop_indexes" minOccurs="0"/>
                <xs:element type="truefalsetype" name="apply_indexes" minOccurs="0"/>
                <xs:element type="truefalsetype" name="load_stat" minOccurs="0"/>
                <xs:element type="truefalsetype" name="load_mode" minOccurs="0"/>
                <xs:element type="truefalsetype" name="load_mtd" minOccurs="0"/>
                <xs:element type="truefalsetype" name="load_mpr" minOccurs="0"/>
                <xs:element type="truefalsetype" name="load_orank" minOccurs="0"/>
                <xs:element type="truefalsetype" name="force_dup_file" minOccurs="0"/>

                <xs:element type="limitedstringtype" name="group" minOccurs="0"/>
                <xs:element type="limitedstringtype" name="description" minOccurs="0"/>

                <xs:element name="load_files" minOccurs="0">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element type="xs:string" name="file" minOccurs="1" maxOccurs="200"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>

                <xs:element type="foldertmpl" name="folder_tmpl" minOccurs="0"/>

                <xs:element name="load_val" minOccurs="0">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="field" minOccurs="0" maxOccurs="20">
                                <xs:complexType>
                                    <xs:sequence>
                                        <xs:element name="date_list" minOccurs="0" maxOccurs="5">
                                            <xs:complexType>
                                                <xs:simpleContent>
                                                    <xs:extension base="xs:string">
                                                        <xs:attribute type="xs:string" name="name"/>
                                                    </xs:extension>
                                                </xs:simpleContent>
                                            </xs:complexType>
                                        </xs:element>
                                        <xs:element type="xs:string" name="val" minOccurs="0" maxOccurs="100"/>
                                    </xs:sequence>
                                    <xs:attribute type="xs:string" name="name"/>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>

                <xs:element name="line_type" minOccurs="0" maxOccurs="1">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element type="linetype" name="val" maxOccurs="800" minOccurs="0"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>

                <xs:element type="xs:string" name="load_xml" minOccurs="0"/>
                
                <xs:element type="xs:string" name="load_note" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

4.2.1. Create the XML Specification File

The XML load specification file contains the database connection information, data file location, and other instructions for loading the data into the database.

Create the XML load specification file by copying the example specification file METdataio/METdbLoad/test/Examples/example_load_specification.xml file to the workspace location.

Do not save this XML specification file where it can be read by anyone who should not have access to this information, as this file contains the database password.

cp $METDATAIO_HOME/METdbLoad/test/Examples/example_load_specification.xml  path-to-your-dir/load_specification.xml
  • $METDATAIO_HOME is the path to the location of the cloned or forked METdataio source code.

  • Replace the path-to-your-dir with the actual path to where this file will be saved.

Change directory to the location where the example_load_specification.xml file was copied.

cd path-to-your-dir/load_specification.xml
  • Replace path-to-your-dir with the full path where the XML specification file will be saved.

Make the necessary edits to the required elements and delete any optional, unused/irrelevant elements, based on the explanation below (click to expand). Remember to update the username and password that is applicable to the database.

Note

The order of the elements in the XML load specification file is crucial. DO NOT modify the order of the described elements (below).

The following is an explanation of the required and optional elements and any limitations

These are element names. The XML angle brackets (<>) as seen in the XML load specification file are omitted for simplicity. Indentation is used to indicate hierarchical relationships between elements.

load_spec
  • mandatory

  • top-level tag/element

  • container for other elements that define connection information, flags, data input, etc.

The following elements pertain to logging into the database.

connection
  • mandatory

  • tag for connection information

management_system
  • optional

  • indicates which database is in use

  • recognized/expected values are one of the following:

    • aurora

    • mysql

    • mariadb

  • delete this element if not using

host
  • mandatory

  • name of host/machine where database is installed

  • format is hostname:port number

  • minimum number of characters is 3

  • maximum number of characters is 67

  • allowable characters (combinations of any of these):

  • upper and lower alphabetical characters (English)

  • digits 0-9

  • ., -, _ (period, dash, underscore)

database
  • mandatory

  • name of the database

  • maximum number of characters for database name is 124

  • allowable characters (combination of any of these):
    • _,- (underscore, dash)

    • upper and lower case alphabetical characters (English)

    • digits 0-9

user
  • mandatory

  • user name

  • minimum number of characters is 3

  • maximum number of characters is 32

  • allowable characters (combination of any of these):

    • upper and lower case alphabetical characters (English)

    • digits 0-9

    • _,- (underscore, dash)

password
  • mandatory

  • the password to access the database

  • minimum number of characters is 3

  • maximum number of characters is 30

  • all characters are allowed

local_infile
  • optional

  • argument passed into 3rd party Python library pymysql

    • for establishing a connection to a MySQL server

    • indicate whether the input file is local

    • default is False

    • enables use of the LOAD DATA LOCAL command

    • Accepted value: - Boolean value: True or False

      • True if loading local data

      • False otherwise

    • delete this element if loading of local data is not needed

      • METdataio sets default to False if this element is absent

The following elements are used to define the format of multiple input data directories that are (optionally) organized by datetime.

date_list
  • optional

  • for describing data organized in datetime subdirectories

  • omit date_list entries if data resides in a singular directory

  • multiple date_list elements are allowed

    • maximum number of date_lists is 5

    • differentiate different date_list definitions by the name attribute (i.e. name=)

      Example:

    /var/autofs/mnt/hostmachine/projects/RRFS/prototype/met_out/{config}/{fcst_init}/{mem}/{valid_times}/metprd/{met_out}

    • the fcst_init and valid_times subdirectories are based on datetime

    • assign the fcst_init subdirectory to a descriptively named date_list attribute:

      e.g. <date_list name=”folder_dates”>

      • this attribute name will be used in the load_val element within the folder_tmpl element to describe the {fcst_init} subdirectory template

    • assign the valid_times subdirectory to a descriptively named date_list attribute:

      e.g. <date_list name=”valid_dates”>

      • this attribute name will be used in the load_val element within the folder_tmpl element to describe the {valid_times} subdirectory template

start
  • mandatory if date_list is being used

  • start datetime

end
  • mandatory if date_list is being used

  • end datetime

inc
  • mandatory if date_list is being used

  • increment/step size between start and end time

    • Example, if 6-hour increment:

    • set inc to 0600

    • <inc>0600</inc>

format
  • mandatory if date_list is being used

  • format of the datetime

    • For example, to specify 4 digit year, 2 digit month, 2 digit day, and 2 digit hour:
      • <format>yyyyMMddHH</format>

The following elements define various flags

verbose
  • mandatory

  • indicates the desired volume of output from the load module

    • TRUE resulting in more information

    • FALSE resulting in less information

insert_size
  • mandatory

  • An integer indicating the number of MET output file rows inserted with each INSERT statement

    • This value is most often 1

stat_header_db_check
  • optional

  • indicate whether a database query check for stat header information should be performed

  • True or False (case insensitive)

    • WARNING enabling this feature (i.e. set to True) could significantly increase load time

mode_header_db_check
  • optional

  • indicate whether a database query check for the MODE header information should be performed

  • True or False (case insensitive)

    • WARNING enabling this feature (i.e. set to True) could significantly increase load time

mtd_header_db_check
  • optional

  • indicate whether a database query check for the MODE TD header information should be performed

  • True or False (case insensitive)

    • WARNING enabling this feature (i.e. set to True) could significantly increase load time

drop_indexes
  • optional

  • indicate whether to drop database indexes before loading new data

  • True or False (case insensitive)

apply_indexes
  • optional

  • indicate whether to apply database indexes

  • True or False (case insensitive)

load_stat
  • optional

  • indicate whether or not to load STAT data

  • True or False (case insensitive)

load_mode
  • optional

  • indicate whether or not to load MODE data

  • True or False (case insensitive)

load_mtd
  • optional

  • indicate whether or not to load MODE TD (MODE Time Domain) data

  • True or False (case insensitive)

load_mpr
  • optional

  • indicate whether or not to load MPR (matched pair) data

  • True or False (case insensitive)

load_orank
  • optional

  • indicate whether or not to load ORANK (observed rank) data

  • True or False (case insensitive)

force_dup_file
  • optional

  • indicate whether or not to force load paths/files that already exist

  • True or False (case insensitive)

The following elements indicate which group the database should be assigned and a description.

group
  • optional

  • the name of the database group (databases are grouped in METviewer: e.g. Testing)

  • if undefined, the database will be placed under the NO GROUP group

  • minimum number of characters is 1

  • maximum number of characters is 300

  • acceptable characters (English), any combination:

    • upper and/or lower case alphabetic characters

    • any digits 0-9

    • _, . , - (underscore, period, dash)

description
  • optional

  • description of the database

  • minimum number of characters is 1

  • maximum number of characters is 300

  • acceptable characters (English), any combination:

    • upper and/or lower case alphabetic characters

    • any digits 0-9

    • _, . , - (underscore, period, dash)

Determine whether to use folder_tmpl or load_files to define where the input data resides (descriptions below):

The following defines the location of the input data to be loaded into the database based on data organized by datetime (and/or any other criteria).

folder_tmpl
  • for data in subdirectories that are datetimes OR data that resides under one directory

  • only one folder template element is permitted (i.e. only one <folder_tmpl> … </folder_tmpl> )

  • NOTE the date_list element MUST BE DEFINED (see above in the date_list description) if any subdirectories are based on datetime

Specify the directory where the data is located in one of the following methods:

Using value templates for directories:

Example:

/var/autofs/mnt/hostmachine/projects/RRFS/prototype/met_out/{config}/{fcst_init}/{mem}/{valid_times}/metprd/{met_out}

  • data is organized into various directories based on datetime, and other criteria

  • use { } around “variable” names (in XML, these indicate attribute value templates)

    config, fcst_init, mem, valid_times, and met_out are attribute value template values that must be defined under the load_val element (for more details, refer to the load_val description below)

Specify a single directory where all data reside:

Example:

/var/autofs/mnt/hostmachine/projects/RRFS/prototype/met_out/mem00/metprd/all_runs

  • all datafiles are located under this directory (indicate the full path)

load_val
  • optional if folder_tmpl specifies a single directory where all data resides

  • mandatory if folder_tmpl has datetime subdirectories
    • field elements correspond to each attribute value template (i.e. variable names enclosed in {})

field
  • mandatory if folder_tmpl has subdirectories that are datetimes

  • each field element defines the attribute value template in the directory structure (i.e. the variable inside the {}) - val elements that can specify more subdirectories

    • optional

    • necessary when specifying subdirectories that are not datetimes (e.g. /path-to/model_A/air_quality/)

    • maximum number of val elements: 100

    • date_list elements for subdirectories that are datetimes

      • optional

      • mandatory when subdirectories are datetimes (e.g. /path-to/20240101/model/)

      • maximum number of date_list elements: 5

For this folder_tmpl example, which has datetime subdirectories:

/var/autofs/mnt/hostmachine/projects/RRFS/prototype/met_out/{config}/{fcst_init}/{mem}/{valid_times}/metprd/{met_out}

The following are the name attributes for the field element for the above example:

config
  • corresponds to the {config} template:

    <field name=”config”>

val
  • for defining non-datetime subdirectories

    • maximum number of val elements is 100

e.g.:

<field name=”config”>

<val>HREF_lag_offset</val>

<val>RTPS</val>

</field>

fcst_init
  • corresponds to the {fcst_init} template:

    <field name=”fcst_init”>

date_list
  • the name attribute corresponds to one of the date_list attribute names

    <field name=”fcst_init”>

    <date_list name=”folder_dates”/>

    </field>

    The “folder_dates” name attribute was assigned in the date_list portion of the XML load specification file

    <date_list name=”folder_dates”>

    <start>2022050100</start>

    <end>2022051200</end>

    <inc>86400</inc>

    <format>yyyyMMddHH</format>

    </date_list>

mem
  • corresponds to the {mem} template:

    <field name=”mem”>

val
  • for defining non-datetime subdirectories

    • maximum number of val elements is 100

    e.g.:

    <field name=”mem”>

    <val>mem01</val>

    <val>mem02</val>

    <val>mem03</val>

    <val>mem04</val>

    <val>mem05</val>

    <val>mem06</val>

    <val>mem07</val>

    <val>mem08</val>

    <val>mem09</val>

    <val>mem10</val>

    </field>

valid_times
  • corresponds to the {valid_times} template:

    <field name=”valid_times”>

date_list
  • the name attribute corresponds to one of the date_list attribute names

    <field name=”valid_times”>

    <date_list name=”valid_dates”>

    </field>

    the “valid_dates” name attribute value was assigned in the date_list portion of the XML load specification file

    <date_list name=”valid_dates”>

    <start>2022050100</start>

    <end>2022051200</end>

    <inc>0600</inc>

    <format>yyyyMMddHH</format>

    </date_list>

met_out
  • corresponds to the {met_out} template

val
  • for defining non-datetime subdirectories

    • maximum number of val elements is 100

e.g.:

<field name=”met_out”>

<val>grid_stat_cmn</val>

<val>point_stat_cmn</val>

</field>

For this folder_tmpl example, which has all the data under one directory (no datetime subdirectories):

/var/autofs/mnt/hostmachine/projects/RRFS/prototype/met_out/metprd/point_stat

The following defines the location of specific input data files to be loaded into the database.

load_files
  • for specifying the location of individual data files that are in different directories

file
  • mandatory for defining location of specific data files

    • maximum number of data files: 200

    Example:

    <load_files>

    <file>/met_out/mode/mode_MASK_POLY_300000L_20120410_180000V_060000A_cts.txt</file>

    <file>/met_out/rhist/ensemble_stat_RRFS_GEFS_GF.SPP.SPPT_RETOP_MRMS_20220507_120000V.stat</file>

    </load_files>

The following describe the linetypes to load.

line_type
  • optional

  • which MET output linetypes to load

  • if omitted, then all linetypes will be loaded

  • maximum number of line_type elements: 1

val element defines which MET output linetypes to load

val
  • the MET output linetype

    • linetype name (refer to the MET User’s Guide for a complete list of linetypes)

      Example:

      <line_type>

      <val>ECNT</val>

      <val>VL1L2</val>

      <val>SAL1L2</val>

      </line_type>

The following option allows one to indicate whether to save the XML commands into the database.

load_xml
  • optional

  • Option to save the XML into the database

  • Only takes effect when the load_note element is present

  • Acceptable values: TRUE or FALSE (case-insensitive)

  • Default value: TRUE

Example:

<load_xml>true</load_xml>

The following allows one to create a note into the instance_info database table.

load_note
  • optional

  • Add a descriptive “note” into the database

Example:

<load_note>Load HREF and RTPS data from Spring 2022</load_note>

4.2.2. Load Data

Now the MET data can be loaded in the database using the met_db_load.py script in the path-to-METdataio-source/METdbLoad/ush directory. The path-to-METdataio-source is the directory where the METdataio source code is saved.

Note

Only data files with the .stat extension will be loaded

The usage statement for met_db_load.py:
INFO:root:--- *** --- Start METdbLoad --- *** ---

usage: python met_db_load.py [-h] [-index] xmlfile [tmpdir [tmpdir ...]]

positional arguments:
  xmlfile     Please provide required xml load_spec filename
  tmpdir      Optional - when different directory wanted for tmp file

optional arguments:
  -h, --help  show this help message and exit

  -index      Only process index, do not load data
cd /path-to-METdataio-source/METdataio/METdbLoad/ush

python met_db_load.py /path-to/load_specification.xml

If logging is desired, redirect output to a log file (via &> command):

python met_db_load.py /path-to/load_specification.xml  &> log/your_logname.log &
  • Replace path-to-METdataio-source to the path to the location of the cloned or forked METdataio source code.

  • Replace the path-to with the location where the load_specification.xml XML load specification file was saved.

Refer to the section Create the XML Specification File and expand the drop-down instructions “The following is an explanation of the required and optional elements and any limitations” for details on what is expected in the XML load specification file.

4.3. Troubleshooting

Error:

ERROR: Caught class com.mysql.jdbc.exceptions.MySQLIntegrityConstraintViolationException: Duplicate entry ‘CT07-NMM-LIN-R2-0-2005-07-15 12:00:00-2005-07-15 12:00:00-0-2005’ for key 2

Solution:

This error is caused by trying to insert a stat_header record into the database when an identical one already exists. If identical stat_header information is present in more than one stat file, set the <stat_header_db_check> value to true. This setting will reduce performance, because the stat_header table is checked for duplicate stat_header each time a row is inserted. However, if a stat_header row already exists in the table with the insert information, then the existing record will be used instead of trying to insert a duplicate.

Error:

ERROR:root: (1049, “Unknown database ‘mv_test’”) in run_sql Error when connecting to database

Solution:

This error is caused when attempting to load data into a database that does not exist. You will need to create the database, set up the appropriate privileges as outlined above, and load the schema using the mv_mysql.sql file.

Error:

/full-path-to/xyz.xml is not valid and may contain a recursive payload or an excessively large payload

Solution:

This error is typically encountered when one of the following conditions exist as a result of failing the XML validation step:

  • the order of the elements in the XML load specification file is inconsistent with the order expected

  • the XML load specification file is missing one or more mandatory elements

  • one or more elements has exceeded size limits specified in the XML schema

  • there are additional XML elements that are not expected

Refer to the section **Create the XML Load Specification File to verify that the XML load specification file is correct.**