Get started
The R-factor scripts can be used to:
- Compute the erosivity \(EI_{30}\) values for a number of stations and years. 
- Use the computed \(EI_{30}\) values to compute an R-value. 
On this page, an example application can be found. Pandas is used as package to format input data for the computations, for which a default is defined in the section below. The package holds two specific example functions to load files (see Matlab KU Leuven) to the pandas format (see section File handling).
From 10’ rain data to EI for a single station/year
The input of the erosivity algorithm is a rainfall time series with a 10’ resolution as a Pandas DataFrame, e.g.
| idx | datetime | rain_mm | station | 
|---|---|---|---|
| 0 | 2018-01-01 02:10:00 | 0.27 | P01 | 
| 1 | 2018-01-01 02:20:00 | 0.02 | P01 | 
| 2 | 2018-01-01 03:10:00 | 0.48 | P01 | 
| … | … | … | … | 
The data can be derived from any source and contain more columns (e.g. the
measurement station identifier), but the datetime, rain_mm and
station are required to apply the erosivity algorithm. The data can
contain both data of a single year/station  as multiple years/stations (see
further to calculate multiple stations together. Make sure that the
station column is present also for the single station case.
Note
The pandas input data can not hold NULL (NaN) or 0! The user needs to delete/interpolate these values beforehand.
Erosivity (EI30-values) for a single station/year combination can be computed
(make sure to activating the conda environment, conda activate rfactor).
The rfactor.compute_erosivity() function applies the algorithm to a
DataFrame containing data for a single station/year, e.g. for the data in
DataFrame df_rain.
from rfactor import compute_erosivity, rain_energy_verstraeten2006, maximum_intensity
erosivity = compute_erosivity(df_rain, rain_energy_verstraeten2006, maximum_intensity)
Note
1. Other rain energy functions that are implemented are Brown and Foster
(see rfactor.rfactor.rain_energy_brown_and_foster1987()) and McGregor
(see rfactor.rfactor.rain_energy_mcgregor1995())
2. Equally to note above, the
rfactor.rfactor.rain_energy_verstraeten2006() is the
default method to deriver rain energy per unit depth for an event. Also
here, users can implement their custom function.
3. Note that preimplemented energy functions are valid for ten minute rainfall input data.
The output is a DataFrame with the intermediate results and the cumulative erosivity of each of the defined events:
| event_idx | datetime | … | all_event_rain_cum | erosivity_cum | station | year | 
|---|---|---|---|---|---|---|
| 2 | 2018-01-01 14:30:00 | … | 1.08 | 5.01878 | P01 | 2018 | 
| 3 | 2018-01-02 16:30:00 | … | 12.37 | 8.00847 | P01 | 2018 | 
| 4 | 2018-01-04 09:10:00 | … | 20.13 | 8.33275 | P01 | 2018 | 
| 5 | 2018-01-05 02:20:00 | … | 22.47 | 8.61462 | P01 | 2018 | 
| … | … | … | … | … | … | … | 
As the “erosivity_cum” column contains the cumulative erosity over the events, the last value is the R-factor of the year/station:
erosivity["erosivity_cum"].iloc[-1]
Other relevant derived statistics, such as the monthly and biweekly based R-factor can be calculated using the existing Pandas functionalities:
erosivity.resample("M", on="datetime")["erosivity"].sum()  # Monthly value
erosivity.resample("SM", on="datetime")["erosivity"].sum()  # Biweekly value
Calculating multiple station/year combinations
When data are available from multiple stations over multiple years in a single
DataFrame, the rfactor.compute_erosivity() function applies the
erosivity algorithm on each year/station combination in the input rain
DataFrame. To do so, an additional column with the station name is
required:
| datetime | rain_mm | station | |
|---|---|---|---|
| 0 | 2018-01-01 02:10:00 | 0.27 | P01 | 
| 1 | 2018-01-01 02:20:00 | 0.02 | P01 | 
| 2 | 2018-01-01 03:10:00 | 0.48 | P01 | 
| … | … | … | … | 
| 10 | 2019-01-01 01:10:00 | 0.52 | P01 | 
| 11 | 2019-01-01 01:20:00 | 0.20 | P01 | 
| … | … | … | … | 
| 123 | 2018-01-01 00:10:00 | 0.02 | P02 | 
| 124 | 2018-01-01 00:20:00 | 0.32 | P02 | 
| … | … | … | … | 
from rfactor import compute_erosivity, maximum_intensity
erosivity = compute_erosivity(df_rain)
The output is very similar to the previous section, but the data contains now multiple years and/or stations:
| event_idx | datetime | … | all_event_rain_cum | erosivity_cum | station | year | 
|---|---|---|---|---|---|---|
| 2 | 2018-01-01 14:30:00 | … | 1.08 | 5.01878 | P01 | 2018 | 
| 3 | 2018-01-02 16:30:00 | … | 12.37 | 8.00847 | P01 | 2018 | 
| … | … | … | … | … | … | … | 
| 1 | 2019-01-04 09:10:00 | … | 20.13 | 8.33275 | P01 | 2019 | 
| 2 | 2019-01-05 02:20:00 | … | 22.47 | 8.61462 | P01 | 2019 | 
| … | … | … | … | … | … | … | 
To derive the R-factor for each year/station in the data set, one can use the existing Pandas functionalities:
erosivity.groupby(["station", "year"])["erosivity_cum"].last().reset_index()
File handling
This package provides an example processing function in the
rfactor.process module to enable compatibility of the input format with
the required data format defined in this package (see previous section).
- rfactor.process.load_rain_file_matlab_legacy(): This is the processing function used to process the- Matlab KU-Leuvenfile legacy.
This file-format can be loaded with the defined processing function, i.e.
from pathlib import Path
from rfactor.process import load_rain_file, load_rain_file_matlab_legacy
# Load a Matlab-file
fname = Path("/PATH/TO/YOUR/RAINFALL/DATA/FOLDER/P01_001_2018.txt")
from_matlab = load_rain_file_matlab_legacy(fname)
# or
fname = Path("/PATH/TO/YOUR/RAINFALL/DATA/FOLDER/P01_001_2018.txt")
from_matlab = load_rain_file(fname, load_rain_file_matlab_legacy)
The load_rain_file function is a helper function that checks if the output format of the processing function is valid. This implies users can implement custom load functions that return dataframes with following definition (column name: type):
- datetime: datetime64[ns] 
- station: str 
- value: float 
Or a folder containing multiple files can be loaded:
from pathlib import Path
from rfactor.process import load_rain_file_matlab_legacy, load_rain_folder
# Load an entire set of Matlab-legacy files
folder = Path("/PATH/FOLDER/CONTAINING/MATLABFORMAT/FILES")
from_matlab = load_rain_folder(folder, load_rain_file_matlab_legacy)
Note
Do not forget to use a pathlib.Path defined file name or
folder name.
In the next subsection, an example is provided.
Matlab KU-Leuven legacy
The input files are defined by text files (extension: .txt) that
hold non-zero rainfall timeseries. The data are split per station and
per year with a specific datafile tag (format: SOURCE_STATION_YEAR.txt):
- KMI_6414_2004.txt 
- KMI_6414_2005.txt 
- … 
- KMI_6434_2003.txt 
- KMI_6434_2004.txt 
- … 
The content of each of this file is a non-zero rainfall timeseries (no header, space delimited):
9390 1.00
9470 0.20
9480 0.50
10770 0.10
... ...
with the first column being the timestamp from the start of the year (minutes) , and second the rainfall depth (in mm).
Custom example
An example of a custom function is posted below, holding removal of 0 and NaN values and an example for interpolation:
def load_rain_file_example(file_path, interpolate=False):
    """Load any txt file which is formatted in the correct format.
    The input files are defined by tab delimited files (extension: ``.txt``) that
    hold rainfall timeseries. The data are split per monitoring station and the file
    name should be the station identifier. The file should contain two columns:
    - *Date/Time*
    - *Value [millimeter]*
    Parameters
    ----------
    file_path : pathlib.Path
        File path (comma delimited, .CSV-extension) with rainfall data according to
        defined format:
        - *datetime*: ``%d-%m-%Y %H:%M:%S``-format
        - *Value [millimeter]*: str (containing floats and '---'-identifier)
        Headers are not necessary for the columns.
    interpolate: bool
        Interpolate NaN yes/no
    Returns
    -------
    rain : pandas.DataFrame
        DataFrame with rainfall time series. Contains the following columns:
        - *datetime* (pandas.Timestamp): Time stamp.
        - *minutes_since* (float): Minutes since start of year.
        - *station* (str): station identifier.
        - *rain_mm* (float): Rain in mm.
    Example
    -------
    1. Example of a rainfall file:
    ::
        01-01-2019 00:00,"0"
        01-01-2019 00:05,"0.03"
        01-01-2019 00:10,"0.04"
        01-01-2019 00:15,"0"
        01-01-2019 00:20,"0"
        01-01-2019 00:25,"---"
        01-01-2019 00:30,"0"
    Notes
    -----
    Strings ``---`` in column *Value [millimeter]* -identifiers are converted to
    NaN-values (np.nan). Note that the values in string should be convertable to float
    (except ``---``).
    """
    df = pd.read_csv(file_path, sep="\t", header=None, names=['datetime', 'rain_mm'])
    if not {"datetime", "rain_mm"}.issubset(df.columns):
        msg = (
            f"File '{file_path}' should should contain columns 'datetime' and "
            f"'Value [millimeter]'"
        )
        raise KeyError(msg)
    df["datetime"] = pd.to_datetime(df["datetime"])
    df["start_year"] = pd.to_datetime(
        [f"01-01-{x} 00:00:00" for x in df["datetime"].dt.year],
    )
    station, year = _extract_metadata_from_file_path(file_path)
    df["station"] = station
    nan = ["---", ""]
    df.loc[df["rain_mm"].isin(nan), "rain_mm"] = np.nan
    df.loc[df["rain_mm"] < 0, "rain_mm"] = np.nan
    if interpolate:
        df["rain_mm"] = df["rain_mm"].interpolate(method="linear")
    # remove 0
    df = df[df["rain_mm"] != 0]
    # remove NaN
    df = df[~df["rain_mm"].isna()]
    df["rain_mm"] = df["rain_mm"].astype(np.float64)
    return df[["datetime", "station", "rain_mm"]]
folder = Path("/PATH/FOLDER/CONTAINING/CUSTOMFORMAT/FILES")
from_matlab = load_rain_folder(folder, load_rain_file_example, interpolate=True)
Output erosivity
To export the resulting DataFrame with erosivity values into the legacy output format:
from pathlib import Path
from rfactor.process import load_rain_folder, load_rain_file
# Works both on a single station/year as multiple station/year combinations
write_erosivity_data(erosivity, Path("/PATH/TO/YOUR/EROSIVITY/OUTPUT"))
Analyse R-values
The R-value is determined by the number of years and stations the users wishes to consider to compute the R value. By using Pandas DataFrame to store the erosivity, all funtionalities for slicing/filtering/plotting/… are available directly.
For example, consider one wants to compute the R-value for 2017 and 2018, for Ukkel (stations: KMI_6447 and KMI_FS3):
erosivity_selected = erosivity[(erosivity["year"].isin([2017, 2018])) &
                   (erosivity["station"].isin(['KMI\_6447', 'KMI\_FS3']))]