rfactor package

rfactor.rfactor module

exception rfactor.rfactor.RFactorInputError[source]

Bases: Exception

Raised when input data do not conform to the input format required by rfactor.

exception rfactor.rfactor.RFactorKeyError[source]

Bases: Exception

Raised when input data are missing required column names.

exception rfactor.rfactor.RFactorTypeError[source]

Bases: Exception

Raised when a column of the input data has the wrong data type.

rfactor.rfactor.compute_erosivity(rain, energy_method=<function rain_energy_verstraeten2006>, intensity_method=<function maximum_intensity>, **kwargs)[source]

Calculate erosivity for each year/station combination

Parameters:
  • rain (pandas.DataFrame) –

    DataFrame with rainfall time series. Needs to contain the following columns:

    • datetime (pandas.Timestamp): Time stamp

    • rain_mm (float): Rain in mm

    • station (str): Measurement station identifier

  • energy_method (Callable, default rain_energy_verstraeten2006) – Function to compute the rain energy per unit depth

  • intensity_method (Callable, default maximum_intensity) – Function to derive the maximal rain intensity (over 30min).

Returns:

all_erosivity – DataFrame with erosivity output for each event.

  • station (str)

  • year (int)

  • tag (str): unique tag for each year-station pair.

  • event_rain_cum (float): Cumulative rain for each event

  • all_events_cum (float): Cumulative rain over the whole timeseries

  • max_30min_intensity (float): Maximal 30min intensity for each event

  • event_energy (float): Rain energy per unit depth for each event

  • erosivity (float): Erosivity for each event

  • erosivity_cum (float): Cumulative erosivity over all events together

Return type:

pandas.DataFrame

Notes

  1. NaN- and 0-values are removed from the input timeseries.
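
The note above can be illustrated with a dependency-free sketch. The record layout (plain tuples) and the helper name `drop_nan_and_zero` are assumptions made for this example; the package itself works on a pandas.DataFrame.

```python
import math

# Hypothetical records: (datetime string, rain_mm, station).
records = [
    ("2024-01-01 00:00", 0.0, "P01"),
    ("2024-01-01 00:10", 1.2, "P01"),
    ("2024-01-01 00:20", float("nan"), "P01"),
    ("2024-01-01 00:30", 0.4, "P01"),
]


def drop_nan_and_zero(rows):
    """Keep only rows whose rain depth is neither NaN nor zero."""
    return [row for row in rows if not math.isnan(row[1]) and row[1] != 0]


non_zero = drop_nan_and_zero(records)
print(non_zero)
```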

rfactor.rfactor.maximum_intensity(df)[source]

Maximum rain intensity for 30-min interval (Pandas rolling) expressed as mm/hour

The implementation uses a rolling window of the chosen interval to derive the maximal intensity.

Parameters:

df (pandas.DataFrame) –

DataFrame with rainfall time series. Needs to contain the following columns:

  • datetime (pandas.Timestamp): Timestamp

  • rain_mm (float): Rain in mm. No NaN or 0-values allowed

Returns:

maxprecip_30min – Maximal 30-minute intensity during event (in mm/h).

Return type:

float
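
A minimal sketch of the rolling-window idea, written without pandas. The function name `max_30min_intensity` and the right-closed window `(t - 30 min, t]` are assumptions for this illustration and may differ from the package's implementation.

```python
from datetime import datetime, timedelta


def max_30min_intensity(timestamps, rain_mm, window=timedelta(minutes=30)):
    """Return the maximum rolling-window rain sum, expressed in mm/h.

    Each window covers (t - window, t], i.e. it is closed on the right.
    """
    best = 0.0
    for t_end in timestamps:
        total = sum(
            depth
            for t, depth in zip(timestamps, rain_mm)
            if t_end - window < t <= t_end
        )
        best = max(best, total)
    # Convert a depth per window to an hourly intensity (factor 2 for 30 min).
    return best * (timedelta(hours=1) / window)


start = datetime(2024, 1, 1, 0, 30)
times = [start + timedelta(minutes=10 * i) for i in range(4)]
rain = [10.5, 5.2, 1.0, 0.02]
print(max_30min_intensity(times, rain))
```

In this example the wettest 30-minute window holds the first three records (16.7 mm in total), which corresponds to roughly 33.4 mm/h.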

rfactor.rfactor.maximum_intensity_interpolate(df)[source]

Maximum rain intensity for 30-min interval (Matlab clone Fix). This implementation is a corrected version of the Python translation of the original Matlab implementation by [3].

Changes to the original script are:
  • In the if-statement ‘if timestamps[-1] - timestamps[0] <= 30:’ this method calculates the total amount of rain during the interval, whereas the original method only looks at the first rainfall entry.

  • In the same if-statement, the *2 was removed, since this multiplication is already done in the ‘return’ step of the model. The extra *2 caused the model to strongly overestimate the rainfall during short rainfall events.

Parameters:

df (pandas.DataFrame) –

DataFrame with rainfall time series. Needs to contain the following columns:

  • datetime (pandas.Timestamp): Time stamp

  • rain_mm (float): Rain in mm. No NaN or 0-values allowed

  • event_rain_cum (float): Cumulative rain in mm

Returns:

maxprecip_30min – Maximal 30-minute intensity during event (in mm/h).

Return type:

float

Notes

The Python and original Matlab implementation linearly interpolate zero and NaN-values within one event.

rfactor.rfactor.maximum_intensity_matlab_clone(df)[source]

Maximum rain intensity for 30-min interval (Matlab clone).

The implementation is a direct Python translation of the original Matlab implementation by Verstraeten et al. (2006) [3].

Parameters:

df (pandas.DataFrame) –

DataFrame with rainfall time series. Needs to contain the following columns:

  • datetime (pandas.Timestamp): Time stamp

  • rain_mm (float): Rain in mm. No NaN or 0-values allowed

  • event_rain_cum (float): Cumulative rain in mm

Returns:

maxprecip_30min – Maximal 30-minute intensity during event (in mm/h).

Return type:

float

Notes

The Python and original Matlab implementation linearly interpolate zero and NaN-values within one event.

rfactor.rfactor.rain_energy_brown_and_foster1987(rain)[source]

Calculate rain energy per unit depth according to Brown and Foster.

Brown and Foster is applied considering a 10-minute interval input rainfall data set.

Parameters:

rain (numpy.ndarray) – Rain (mm)

Returns:

energy – Energy per unit depth.

Return type:

float

Notes

The rain energy per unit depth \(e_r\) (\(\text{MJ}.\text{mm}^{-1}. \text{ha}^{-1}\)) is defined by [4] and [5]:

\[e_r = 0.29\left(1 - 0.72\exp(-0.05\,i_r)\right)\]

with

  • \(i_r\) the rain intensity for every 10-min increment (mm \(\text{h}^{-1}\) ).

The rain energy is multiplied by the volume of rain (per 10 minutes) and summed per event to compute the total energy of the event. The formula applies for a 10 minute rainfall input data set.
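
The computation described above can be sketched as follows. The function names are hypothetical, and the conversion from a 10-minute depth to an hourly intensity (multiplying by 6) is an assumption that follows from the stated 10-minute input resolution.

```python
import math


def unit_energy_brown_foster(intensity_mm_h):
    """Rain energy per unit depth e_r (MJ mm^-1 ha^-1) for intensity i_r (mm/h)."""
    return 0.29 * (1 - 0.72 * math.exp(-0.05 * intensity_mm_h))


def event_energy(rain_10min_mm):
    """Sum e_r times rain depth over all 10-minute increments of one event."""
    total = 0.0
    for depth in rain_10min_mm:
        intensity = depth * 6  # mm per 10 min -> mm/h
        total += unit_energy_brown_foster(intensity) * depth
    return total


print(event_energy([1.0, 2.5, 0.5]))
```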

References

rfactor.rfactor.rain_energy_mcgregor1995(rain)[source]

Calculate rain energy per unit depth according to McGregor with 10 minute interval data.

McGregor is applied considering a 10-minute interval input rainfall data set.

Parameters:

rain (numpy.ndarray) – Rain (mm)

Returns:

energy – Energy per unit depth.

Return type:

float

Notes

The rain energy per unit depth \(e_r\) (\(\text{MJ}.\text{mm}^{-1}. \text{ha}^{-1}\)) is defined by [6]:

\[e_r = 0.29\left(1 - 0.72\exp(-0.08\,i_r)\right)\]

with

  • \(i_r\) the rain intensity for every 10-min increment (mm \(\text{h}^{-1}\) ).

The rain energy is multiplied by the volume of rain (per 10 minutes) and summed per event to compute the total energy of the event. The formula applies for a 10 minute rainfall input data set.

References

rfactor.rfactor.rain_energy_verstraeten2006(rain)[source]

Calculate rain energy per unit depth according to Salles/Verstraeten with 10 minute interval data.

Verstraeten is applied considering a 10-minute interval input rainfall data set.

Parameters:

rain (numpy.ndarray) – Rain (mm)

Returns:

energy – Energy per unit depth.

Return type:

float

Notes

The rain energy per unit depth \(e_r\) (\(\text{MJ}.\text{mm}^{-1}. \text{ha}^{-1}\)) for an application for Flanders/Belgium is defined by [1], [2] and [3]:

\[e_r = 0.1112i_r^{0.31}\]

with

  • \(i_r\) the rain intensity for every 10-min increment (mm \(\text{h}^{-1}\) ).

The rain energy is multiplied by the volume of rain (per 10 minutes) and summed per event to compute the total energy of the event. The formula applies for a 10 minute rainfall input data set.
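
To see how the three unit-energy formulas documented in this module differ, the sketch below evaluates them at a few intensities. The function names are hypothetical; the coefficients are copied from the formulas as written in this reference.

```python
import math


def e_brown_foster(i_r):
    """Brown and Foster: 0.29 * (1 - 0.72 * exp(-0.05 * i_r))."""
    return 0.29 * (1 - 0.72 * math.exp(-0.05 * i_r))


def e_mcgregor(i_r):
    """McGregor: 0.29 * (1 - 0.72 * exp(-0.08 * i_r))."""
    return 0.29 * (1 - 0.72 * math.exp(-0.08 * i_r))


def e_verstraeten(i_r):
    """Salles/Verstraeten: 0.1112 * i_r ** 0.31."""
    return 0.1112 * i_r ** 0.31


# Print e_r (MJ mm^-1 ha^-1) for a low, moderate and high intensity (mm/h).
for i_r in (1.0, 10.0, 50.0):
    print(i_r, e_brown_foster(i_r), e_mcgregor(i_r), e_verstraeten(i_r))
```

The two exponential formulas level off towards 0.29 at high intensities, whereas the power law keeps increasing with intensity.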

References

rfactor.process module

class rfactor.process.RainfallFilesIOMsg[source]

Bases: str

Print message as a string

rfactor.process.compute_diagnostics(rain)[source]

Compute diagnostics for input rainfall.

This function computes coverage (per year, station) and missing rainfall for each month (per year, station).

Parameters:

rain (pandas.DataFrame) –

DataFrame with rainfall time series. Contains at least the following columns:

  • rain_mm (float): Rain in mm

  • datetime (pandas.Timestamp): Time stamp

  • station (str): station name

  • year (int): year of the measurement

  • tag (str): tag identifier, formatted as STATION_YEAR

Returns:

diagnostics – Diagnostics per station, year with coverage and identifier for no-rain per month. Computed based on non-zero rainfall timeseries.

  • station (str): station identifier.

  • year (int): year.

  • coverage (float): percentage coverage non-zero timeseries (see Notes).

In addition, one column is added per month (ids 1 to 12):

  • months (int): 1: no rain observed in month, 0: rain observed.

Return type:

pandas.DataFrame

Notes

The coverage is computed as:

\[C = 100*[1-\frac{\text{number of NULL-data}} {\text{length of non-zero timeseries}}]\]
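
As a worked example of the coverage formula, the sketch below uses a hypothetical helper `coverage`; it is not the package's implementation.

```python
def coverage(n_null, n_total):
    """Coverage C in percent: 100 * (1 - n_null / n_total)."""
    if n_total <= 0:
        raise ValueError("the timeseries must contain at least one record")
    return 100 * (1 - n_null / n_total)


# 5 NULL values in a non-zero timeseries of 100 records: about 95 % coverage.
print(coverage(n_null=5, n_total=100))
```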

rfactor.process.compute_rainfall_statistics(df_rainfall, df_station_metadata=None)[source]

Compute general statistics for rainfall timeseries.

Statistics (number of records, min, max, median and years of data) are computed for each measurement station.

Parameters:
  • df_rainfall (pandas.DataFrame) – See rfactor.process.load_rain_file()

  • df_station_metadata (pandas.DataFrame) –

    Dataframe holding station metadata. This dataframe has one mandatory column:

    • station (str): Name or code of the measurement station

    • x (float): X-coordinate of measurement station.

    • y (float): Y-coordinate of measurement station.

Returns:

df_statistics – Apart from the station, x, y when df_station_metadata is provided, the following columns are returned:

  • year (list): List of the years for which data is available for the station.

  • records (int): Total number of records for the station.

  • min (float): Minimal measured value for the station.

  • median (float): Median measured value for the station.

  • max (float): Maximal measured value for the station.

Return type:

pandas.DataFrame
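
The per-station statistics listed above can be sketched with the standard library. The record layout and the function name `rainfall_statistics` are assumptions for illustration; the package returns a pandas.DataFrame instead of a dictionary.

```python
from statistics import median

# Hypothetical records: (station, year, rain_mm).
records = [
    ("P01", 2018, 0.2), ("P01", 2018, 1.4), ("P01", 2019, 0.6),
    ("P02", 2018, 3.0), ("P02", 2018, 0.1),
]


def rainfall_statistics(rows):
    """Group by station and summarise years, record count, min, median, max."""
    by_station = {}
    for station, year, rain_mm in rows:
        by_station.setdefault(station, []).append((year, rain_mm))
    stats = {}
    for station, values in by_station.items():
        depths = [depth for _, depth in values]
        stats[station] = {
            "year": sorted({year for year, _ in values}),
            "records": len(depths),
            "min": min(depths),
            "median": median(depths),
            "max": max(depths),
        }
    return stats


stats = rainfall_statistics(records)
print(stats["P01"])
```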

rfactor.process.get_rfactor_station_year(erosivity, stations=None, years=None)[source]

Get R-factor at end of every year for each station from cumulative erosivity.

Parameters:
Returns:

erosivity – Updated with:

  • year (int): year

  • station (str): station

  • erosivity_cum (float): cumulative erosivity at end of year and at station.

Return type:

pandas.DataFrame
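
Conceptually, the R-factor of a station and year is the last value of the cumulative erosivity in that year. The sketch below illustrates this on plain tuples; the function name, the data and the filtering behaviour of `stations` and `years` are assumptions inferred from the signature.

```python
# Hypothetical event rows, ordered in time: (station, year, erosivity_cum).
events = [
    ("P01", 2018, 120.0), ("P01", 2018, 640.5), ("P01", 2018, 1105.2),
    ("P01", 2019, 88.0), ("P01", 2019, 930.7),
    ("P02", 2018, 75.3), ("P02", 2018, 810.0),
]


def rfactor_station_year(rows, stations=None, years=None):
    """Return {(station, year): last cumulative erosivity}, optionally filtered."""
    result = {}
    for station, year, erosivity_cum in rows:
        if stations is not None and station not in stations:
            continue
        if years is not None and year not in years:
            continue
        result[(station, year)] = erosivity_cum  # later rows overwrite earlier ones
    return result


rfactors = rfactor_station_year(events, years=[2018])
print(rfactors)
```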

rfactor.process.load_rain_file(file_path, load_fun, **kwargs)[source]

Load a file with rainfall data using a given load function

Parameters:
  • file_path (pathlib.Path) – File path with rainfall data. Note that files in the folder should follow the input data format defined in the load_fun.

  • load_fun (Callable) –

    Please check the required input/output format for the files of the used load functions. The output of this function must comply with:

    • datetime (datetime64[ns]): timestamp, timezone naive

    • station (object): name of station, must be formatted as a string.

    • value (float): in mm

  • kwargs – Keyword arguments for load_fun

Returns:

rain – DataFrame with rainfall time series. Contains at least the following columns:

  • rain_mm (float): Rain in mm

  • datetime (pandas.Timestamp): Time stamp

  • minutes_since (float): Minutes since start of year.

  • station (str): station name

  • year (int): year of the measurement

  • tag (str): tag identifier, formatted as STATION_YEAR

Return type:

pandas.DataFrame
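
The derived columns year, tag and minutes_since can be computed from a timestamp and a station name, as in this sketch. The helper name `derive_columns` is hypothetical and the example handles a single record rather than a DataFrame.

```python
from datetime import datetime


def derive_columns(timestamp, station):
    """Derive year, tag and minutes since the start of the year for one record."""
    year = timestamp.year
    start_of_year = datetime(year, 1, 1)
    minutes_since = (timestamp - start_of_year).total_seconds() / 60
    return {"year": year, "tag": f"{station}_{year}", "minutes_since": minutes_since}


record = derive_columns(datetime(2024, 1, 7, 12, 30), "P01")
print(record)
```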

rfactor.process.load_rain_file_flanders(file_path, interpolate=None, interval=inf, threshold_outliers=None)[source]

Example load function developed in the context of Flanders.

Translates the input file at file_path to the default input data format used by this package. This function can serve as an example for users writing their own load functions. The file is tab-delimited (extension: .txt) and holds the timeseries for one location. The name of the file is the tag that will be used.

Parameters:
  • file_path (pathlib.Path) –

    File path (tab delimited, .txt-extension). Headerless

    • %d-%m-%Y %H:%M:%S-format

    • float

  • interpolate (str, default None) – Interpolation method to use for NaN-Values. Possible values: see pandas.DataFrame.interpolate.

  • interval (int, default np.inf) – The max interval length over which NaN values are interpolated. The value needs to fit the index of the timeseries. For example, a timeseries with resolution of 10 min will have a maximum interval length of 6 hours if the interval value is set to 36 (36 * 10 min = 6 hours).

  • threshold_outliers (int, default None) – Set rainfall values above this threshold to NaN.

Returns:

rain – DataFrame with rainfall time series. Contains the following columns:

  • datetime (pandas.Timestamp): Time stamp.

  • minutes_since (float): Minutes since start of year.

  • station (str): station identifier.

  • rain_mm (float): Rain in mm.

Return type:

pandas.DataFrame

Example

  1. Example of a rainfall file:

2024-01-01 00:00:00     0.0
2024-01-01 00:10:00     0.0
2024-01-01 00:20:00     0.0
2024-01-01 00:30:00     10.5
2024-01-01 00:40:00     5.2
2024-01-01 00:50:00     1
2024-01-01 01:00:00     0.02
2024-01-01 01:10:00

Notes

  1. Current function is not maintained until further notice.
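
A custom load function could parse such a file line by line, as in the sketch below. The helper `parse_line` is hypothetical; its default date format is the one stated in the parameter list and can be changed to match the file at hand. Empty values and values above `threshold_outliers` become NaN.

```python
import math
from datetime import datetime


def parse_line(line, station, fmt="%d-%m-%Y %H:%M:%S", threshold_outliers=None):
    """Parse one tab-delimited 'timestamp<TAB>value' line into a record."""
    stamp, _, value = line.rstrip("\n").partition("\t")
    rain_mm = float(value) if value.strip() else math.nan  # empty value -> NaN
    if threshold_outliers is not None and rain_mm > threshold_outliers:
        rain_mm = math.nan  # values above the threshold are treated as outliers
    return {
        "datetime": datetime.strptime(stamp.strip(), fmt),
        "station": station,
        "rain_mm": rain_mm,
    }


print(parse_line("01-01-2024 00:30:00\t10.5", station="P01"))
print(parse_line("01-01-2024 01:10:00\t", station="P01"))
```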

rfactor.process.load_rain_file_matlab_legacy(file_path)[source]

Load (legacy Matlab) file format of rainfall data of a single station/year.

The input files are defined by text files (extension: .txt) that hold non-zero rainfall timeseries. The data are split per station and per year with a specific datafile tag (file name format: SOURCE_STATION_YEAR.txt). The data should not contain headers, with the first column defined as ‘minutes since the start of the year’ and the second as the rainfall depth during the t last minutes (t is the temporal resolution of the timeseries).

Parameters:

file_path (pathlib.Path) – File path with rainfall data according to defined format, see notes.

Returns:

rain – DataFrame with rainfall time series. Contains the following columns:

  • minutes_since (int): Minutes since the start of the year

  • rain_mm (float): Rain in mm

  • datetime (pandas.Timestamp): Time stamp

  • station (str): station name

Return type:

pandas.DataFrame

Example

  1. Example of a rainfall file:

9390 1.00
9470 0.20
9480 0.50
10770 0.10
...  ...
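
The legacy format stores minutes since the start of the year and encodes station and year in the file name. A minimal sketch of decoding both follows; the helper names and the file name are hypothetical, and the name is assumed to contain exactly three underscore-separated parts.

```python
from datetime import datetime, timedelta
from pathlib import Path


def parse_legacy_name(file_path):
    """Split a SOURCE_STATION_YEAR.txt file name into station and year."""
    source, station, year = Path(file_path).stem.split("_")
    return station, int(year)


def minutes_to_datetime(minutes_since, year):
    """Convert 'minutes since the start of the year' to a timestamp."""
    return datetime(year, 1, 1) + timedelta(minutes=minutes_since)


station, year = parse_legacy_name("KMI_6414_2004.txt")  # hypothetical file name
print(station, year, minutes_to_datetime(9390, year))
```

With these assumptions, the first record of the example file (9390 minutes) falls on 7 January at 12:30.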

rfactor.process.load_rain_folder(folder_path, load_fun, **kwargs)[source]

Load all (legacy Matlab format) files of rainfall data in a folder

Parameters:
  • folder_path (pathlib.Path) – Folder path with rainfall data, see also rfactor.process.load_rain_file(). Folder must contain txt files.

  • load_fun (Callable) –

    Please check the required input format for the files in the above listed functions. The (custom) function must output:

    • datetime (datetime64[ns]): timestamp, timezone naive

    • station (object): name of station, must be formatted as a string.

    • value (float): in mm

  • kwargs – Keyword arguments for load_fun

Returns:

rain – See definition in rfactor.process.load_rain_file()

Return type:

pandas.DataFrame

rfactor.process.write_erosivity_data(df, folder_path)[source]

Write output erosivity (legacy Matlab format) to a folder.

Written data are split up per year and station (file name format: SOURCE_STATION_YEAR.txt) and do not contain any headers. The columns (no header!) in the written text files represent the following:

  • days_since (float): Days since the start of the year.

  • erosivity_cum (float): Cumulative erosivity over events.

  • all_event_rain_cum (float): Cumulative rain over events.

Parameters:
  • df (pandas.DataFrame) –

    DataFrame with rfactor/erosivity time series. Can contain multiple columns, but should have at least the following:

    • datetime (pandas.Timestamp): Time stamp

    • station (str): Station identifier

    • erosivity_cum (float): Cumulative erosivity over events

    • all_event_rain_cum (float): Cumulative rain over events

  • folder_path (pathlib.Path) – Folder path to save data according to legacy Matlab format, see rfactor.process.load_rain_file().
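
One headerless output row could be built as in the sketch below. The helper name `legacy_row`, the space delimiter and the number of decimals are assumptions for illustration; the actual layout written by the package may differ.

```python
from datetime import datetime


def legacy_row(timestamp, erosivity_cum, all_event_rain_cum):
    """Format one headerless output row: days_since, erosivity_cum, rain_cum."""
    start_of_year = datetime(timestamp.year, 1, 1)
    days_since = (timestamp - start_of_year).total_seconds() / 86400
    # Space-separated with fixed decimals: an assumed layout, not the package's.
    return f"{days_since:.4f} {erosivity_cum:.2f} {all_event_rain_cum:.2f}"


row = legacy_row(datetime(2018, 1, 7, 12, 0), 12.5, 30.1)
print(row)
```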

rfactor.valid module

rfactor.valid.valid_column(rain, req_col)[source]

Check that the input dataframe has the required columns

Parameters:
  • rain (pd.DataFrame) – Dataframe to test

  • req_col (set) – Required columns in dataframe, e.g. {"datetime", "rain_mm"}
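
A column check of this kind boils down to a set difference. In the sketch below, `MissingColumnsError` and `check_columns` are hypothetical names, and the column names are passed as a plain list instead of a DataFrame.

```python
class MissingColumnsError(Exception):
    """Hypothetical stand-in for an error such as RFactorKeyError."""


def check_columns(columns, req_col):
    """Raise if any required column name is absent from `columns`."""
    missing = set(req_col) - set(columns)
    if missing:
        raise MissingColumnsError(f"missing required columns: {sorted(missing)}")


# Passes silently: both required columns are present.
check_columns(["datetime", "rain_mm", "station"], {"datetime", "rain_mm"})
```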

rfactor.valid.valid_const_freq(rain)[source]

Check if rainfall input data has a constant frequency

Parameters:

rain (pandas.DataFrame)
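
A constant frequency means that all consecutive timestamps are equally spaced. A minimal sketch of such a test, using a hypothetical helper on a list of timestamps:

```python
from datetime import datetime, timedelta


def has_constant_frequency(timestamps):
    """Return True if all consecutive timestamps are equally spaced."""
    steps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(steps) <= 1


start = datetime(2024, 1, 1)
regular = [start + timedelta(minutes=10 * i) for i in range(6)]
irregular = regular[:3] + regular[4:]  # one 10-minute record is missing
print(has_constant_frequency(regular), has_constant_frequency(irregular))
```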

rfactor.valid.valid_freq(df_freq, req_freq=None)[source]

Test for valid frequency of input data

The frequency of the input data is tested against a defined frequency. Usage of the R-factor package is limited to resolutions of 1 minute or coarser.

Parameters:
  • df_freq (pandas.DatetimeIndex.freq) – Temporal frequency in the rainfall data

  • req_freq (int, default None) – Required frequency (minutes). If None, the frequency of the data should be at least 1 minute.

rfactor.valid.valid_rainfall_timeseries(func=None, req_col={'datetime', 'rain_mm'}, req_freq=None)[source]

Customisable decorator to check pandas input rainfall data for functions used in this package.

Parameters:
  • func (callable, default None)

  • req_col (set) – See rfactor.valid.valid_column()

  • req_freq (int) – See rfactor.valid.valid_freq()

Returns:

decorator – Return the execution of the actual decorator

Return type:

callable

Notes

Use super decorator to allow for decorator inputs
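
The pattern of a decorator that works both with and without arguments can be sketched as follows. The name `check_input` and the dict-of-columns input are assumptions for this illustration; this is not the package's decorator.

```python
import functools


def check_input(func=None, *, req_col=frozenset({"datetime", "rain_mm"})):
    """Decorator usable as @check_input or @check_input(req_col=...)."""

    def decorator(inner):
        @functools.wraps(inner)
        def wrapper(rain, *args, **kwargs):
            missing = set(req_col) - set(rain)  # `rain` is a dict of columns here
            if missing:
                raise KeyError(f"missing required columns: {sorted(missing)}")
            return inner(rain, *args, **kwargs)

        return wrapper

    # Used without parentheses: func is the decorated function itself.
    return decorator if func is None else decorator(func)


@check_input
def total_rain(rain):
    return sum(rain["rain_mm"])


@check_input(req_col={"datetime", "rain_mm", "station"})
def stations(rain):
    return sorted(set(rain["station"]))


data = {"datetime": [0, 10], "rain_mm": [1.5, 0.5], "station": ["P01", "P01"]}
print(total_rain(data), stations(data))
```

The outer function returns either the decorator (when called with keyword arguments) or the already wrapped function (when applied directly), which is what allows both spellings.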