rfactor package

rfactor.rfactor module

exception rfactor.rfactor.RFactorInputError[source]

Bases: Exception

Raised when the input data do not conform to the input format required by rfactor.

exception rfactor.rfactor.RFactorKeyError[source]

Bases: Exception

Raised when the input data are missing required column names.

exception rfactor.rfactor.RFactorTypeError[source]

Bases: Exception

Raised when the data type of an input data column is wrong.

rfactor.rfactor.compute_erosivity(rain, energy_method=<function rain_energy_verstraeten2006>, intensity_method=<function maximum_intensity>)[source]

Calculate erosivity for each year/station combination

Parameters:
  • rain (pandas.DataFrame) –

    DataFrame with rainfall time series. Needs to contain the following columns:

    • datetime (pandas.Timestamp): Time stamp

    • rain_mm (float): Rain in mm

    • station (str): Measurement station identifier

  • energy_method (Callable, default rain_energy_verstraeten2006) – Function to compute the rain energy per unit depth

  • intensity_method (Callable, default maximum_intensity) – Function to derive the maximal rain intensity (over 30min).

Returns:

all_erosivity – DataFrame with erosivity output for each event.

  • datetime (pandas.Timestamp): Time stamp

  • event_rain_cum (float): Cumulative rain for each event

  • max_30min_intensity (float): Maximal 30min intensity for each event

  • event_energy (float): Rain energy per unit depth for each event

  • erosivity (float): Erosivity for each event

  • all_events_cum (float): Cumulative rain over all events together

  • erosivity_cum (float): Cumulative erosivity over all events together

  • tag (str): Unique tag for each year/station pair.

Return type:

pandas.DataFrame

Notes

NaN- and 0-values are removed from the input timeseries.

rfactor.rfactor.maximum_intensity(df)[source]

Maximum rain intensity for 30-min interval (Pandas rolling) expressed as mm/hour

The implementation uses a rolling window of the chosen interval to derive the maximal intensity.

Parameters:

df (pandas.DataFrame) –

DataFrame with rainfall time series. Needs to contain the following columns:

  • datetime (pandas.Timestamp): Timestamp

  • rain_mm (float): Rain in mm. No NaN or 0-values allowed

Returns:

maxprecip_30min – Maximal 30-minute intensity during event (in mm/h).

Return type:

float
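The rolling-window idea can be sketched without pandas. This is only an illustration under stated assumptions, not the package's implementation: the helper name max_30min_intensity is hypothetical, the event is a plain list of (timestamp, rain_mm) tuples, and each window is taken as closed on the right, i.e. (t - 30 min, t].

```python
from datetime import datetime, timedelta


def max_30min_intensity(records, window=timedelta(minutes=30)):
    """Return the maximal 30-minute rain intensity (mm/h) of one event.

    records: list of (timestamp, rain_mm) tuples (hypothetical input layout).
    """
    best = 0.0
    for t_end, _ in records:
        # Sum all rain that fell in the window (t_end - 30 min, t_end].
        depth = sum(r for t, r in records if t_end - window < t <= t_end)
        best = max(best, depth)
    # Convert a depth per 30 minutes into an hourly intensity (factor 2).
    return best * (timedelta(hours=1) / window)


start = datetime(2019, 1, 1, 0, 0)
event = [
    (start + timedelta(minutes=10 * i), mm)
    for i, mm in enumerate([1.0, 2.0, 3.0, 1.0, 0.5])
]
intensity = max_30min_intensity(event)
print(intensity)  # 12.0 (6 mm in the wettest 30 minutes)
```

The wettest 30-minute window here holds 6 mm, which corresponds to 12 mm/h.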

rfactor.rfactor.maximum_intensity_interpolate(df)[source]

Maximum rain intensity for 30-min interval (Matlab clone Fix). This implementation is a fixed version of the Python-translation of the original Matlab implementation by [3].

Changes to the original script are:
  • In the if-statement ‘if timestamps[-1] - timestamps[0] <= 30:’ this method calculates the total amount of rain during the interval, whereas the original method only looks at the first rainfall entry.

  • In the same if-statement, the *2 was removed, since this multiplication is already applied in the ‘return’ step. Applying *2 twice causes the rainfall intensity of short rainfall events to be strongly overestimated.
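The effect of the two changes can be illustrated with simple arithmetic. The numbers below are made up, and the snippet only mirrors the behaviour described in the list above; it is not taken from the Matlab or Python source.

```python
# Hypothetical event shorter than 30 minutes: 10-minute rain depths in mm.
rain_mm = [1.2, 0.8, 0.5]

# Change 1: use the total event rain instead of only the first entry.
first_entry = rain_mm[0]
total_depth = sum(rain_mm)

# Change 2: convert the 30-minute depth to mm/h once instead of twice.
doubled_twice = total_depth * 2 * 2  # *2 in the if-branch and again on return
doubled_once = total_depth * 2       # *2 applied on return only

print(first_entry, total_depth)
print(doubled_twice, doubled_once)
```

With the factor applied twice, the intensity of this short event comes out twice as high as with a single conversion to mm/h.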

Parameters:

df (pandas.DataFrame) –

DataFrame with rainfall time series. Needs to contain the following columns:

  • datetime (pandas.Timestamp): Time stamp

  • rain_mm (float): Rain in mm. No NaN or 0-values allowed

  • event_rain_cum (float): Cumulative rain in mm

Returns:

maxprecip_30min – Maximal 30-minute intensity during event (in mm/h).

Return type:

float

Notes

The Python and original Matlab implementations linearly interpolate zero and NaN-values within one event.

rfactor.rfactor.maximum_intensity_matlab_clone(df)[source]

Maximum rain intensity for 30-min interval (Matlab clone).

The implementation is a direct Python-translation of the original Matlab implementation by Verstraeten et al. (2006) [3].

Parameters:

df (pandas.DataFrame) –

DataFrame with rainfall time series. Needs to contain the following columns:

  • datetime (pandas.Timestamp): Time stamp

  • rain_mm (float): Rain in mm. No NaN or 0-values allowed

  • event_rain_cum (float): Cumulative rain in mm

Returns:

maxprecip_30min – Maximal 30-minute intensity during event (in mm/h).

Return type:

float

Notes

The Python and original Matlab implementations linearly interpolate zero and NaN-values within one event.

rfactor.rfactor.rain_energy_brown_and_foster1987(rain)[source]

Calculate rain energy per unit depth according to Brown and Foster.

Brown and Foster is applied considering a 10-minute interval input rainfall data set.

Parameters:

rain (numpy.ndarray) – Rain (mm)

Returns:

energy – Energy per unit depth.

Return type:

float

Notes

The rain energy per unit depth \(e_r\) (\(\text{MJ}.\text{mm}^{-1}. \text{ha}^{-1}\)) is defined by [4] and [5]:

\[e_r = 0.29*(1-0.72*\exp(-0.05*i_r))\]

with

  • \(i_r\) the rain intensity for every 10-min increment (mm \(\text{h}^{-1}\) ).

The rain energy is multiplied by the volume of rain (per 10 minutes) and summed per event to compute the total energy of the event. The formula applies for a 10 minute rainfall input data set.
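As a rough sketch of the computation described above (not the package's own code), the formula can be evaluated per 10-minute increment and summed over an event. The function names unit_energy_brown_foster and event_energy are hypothetical, and the input is assumed to be a plain list of 10-minute rain depths in mm.

```python
import math


def unit_energy_brown_foster(intensity_mm_h):
    """Rain energy per unit depth e_r for a rain intensity i_r in mm/h."""
    return 0.29 * (1 - 0.72 * math.exp(-0.05 * intensity_mm_h))


def event_energy(rain_mm, step_minutes=10):
    """Sum e_r times rain depth over all increments of one event."""
    total = 0.0
    for depth in rain_mm:
        intensity = depth * 60 / step_minutes  # mm per time step -> mm/h
        total += unit_energy_brown_foster(intensity) * depth
    return total


print(unit_energy_brown_foster(0.0))   # lower bound: 0.29 * (1 - 0.72)
print(event_energy([1.0, 2.5, 0.4]))   # hypothetical three-step event
```

The unit energy increases with intensity and approaches 0.29 for very intense rain.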

References

rfactor.rfactor.rain_energy_mcgregor1995(rain)[source]

Calculate rain energy per unit depth according to McGregor with 10 minute interval data.

McGregor is applied considering a 10-minute interval input rainfall data set.

Parameters:

rain (numpy.ndarray) – Rain (mm)

Returns:

energy – Energy per unit depth.

Return type:

float

Notes

The rain energy per unit depth \(e_r\) (\(\text{MJ}.\text{mm}^{-1}. \text{ha}^{-1}\)) is defined by [6]:

\[e_r = 0.29*(1-0.72*\exp(-0.08*i_r))\]

with

  • \(i_r\) the rain intensity for every 10-min increment (mm \(\text{h}^{-1}\) ).

The rain energy is multiplied by the volume of rain (per 10 minutes) and summed per event to compute the total energy of the event. The formula applies for a 10 minute rainfall input data set.

References

rfactor.rfactor.rain_energy_verstraeten2006(rain)[source]

Calculate rain energy per unit depth according to Salles/Verstraeten with 10 minute interval data.

Verstraeten is applied considering a 10-minute interval input rainfall data set.

Parameters:

rain (numpy.ndarray) – Rain (mm)

Returns:

energy – Energy per unit depth.

Return type:

float

Notes

The rain energy per unit depth \(e_r\) (\(\text{MJ}.\text{mm}^{-1}. \text{ha}^{-1}\)) for an application for Flanders/Belgium is defined by [1], [2] and [3]:

\[e_r = 0.1112i_r^{0.31}\]

with

  • \(i_r\) the rain intensity for every 10-min increment (mm \(\text{h}^{-1}\) ).

The rain energy is multiplied by the volume of rain (per 10 minutes) and summed per event to compute the total energy of the event. The formula applies for a 10 minute rainfall input data set.
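A minimal sketch of this power law for 10-minute data follows. The function name unit_energy_verstraeten and the sample depths are assumptions for illustration only; the package's own implementation may differ in detail.

```python
def unit_energy_verstraeten(intensity_mm_h):
    """Rain energy per unit depth e_r for a rain intensity i_r in mm/h."""
    return 0.1112 * intensity_mm_h ** 0.31


# Hypothetical event: rain depth (mm) per 10-minute increment.
rain_mm = [0.2, 1.0, 0.5]

# A 10-minute depth corresponds to six times that depth per hour.
intensities = [depth * 6 for depth in rain_mm]

# Total event energy: unit energy times depth, summed over the increments.
event_energy = sum(
    unit_energy_verstraeten(i) * depth for i, depth in zip(intensities, rain_mm)
)
print(event_energy)
```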

References

rfactor.process module

class rfactor.process.RainfallFilesIOMsg[source]

Bases: str

Print message as a string

rfactor.process.compute_diagnostics(rain)[source]

Compute diagnostics for input rainfall.

This function computes coverage (per year, station) and missing rainfall for each month (per year, station).

Parameters:

rain (pandas.DataFrame) –

DataFrame with rainfall time series. Contains at least the following columns:

  • rain_mm (float): Rain in mm

  • datetime (pandas.Timestamp): Time stamp

  • station (str): station name

  • year (int): year of the measurement

  • tag (str): tag identifier, formatted as STATION_YEAR

Returns:

diagnostics – Diagnostics per station, year with coverage and identifier for no-rain per month. Computed based on non-zero rainfall timeseries.

  • station (str): station identifier.

  • year (int): year.

  • coverage (float): percentage coverage of the non-zero timeseries (see Notes).

Supplemented with one column per month (ids 1 to 12):

  • months (int): 1: no rain observed in month, 0: rain observed.

Return type:

pandas.DataFrame

Notes

The coverage is computed as:

\[C = 100*[1-\frac{\text{number of NULL-data}} {\text{length of non-zero timeseries}}]\]
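The coverage formula can be checked with a small worked example. This sketch assumes that NULL-data are represented as NaN and that the list passed in is the non-zero timeseries; the function name coverage is hypothetical.

```python
import math


def coverage(values):
    """Percentage of records in the series that are not missing (NaN)."""
    n_null = sum(1 for v in values if math.isnan(v))
    return 100 * (1 - n_null / len(values))


series = [0.3, float("nan"), 1.2, 0.1]
print(coverage(series))  # 75.0 (one of four records is missing)
```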

rfactor.process.compute_rainfall_statistics(df_rainfall, df_station_metadata=None)[source]

Compute general statistics for rainfall timeseries.

Statistics (number of records, min, max, median and years of data) are computed for each measurement station.

Parameters:
  • df_rainfall (pandas.DataFrame) – See rfactor.process.load_rain_file()

  • df_station_metadata (pandas.DataFrame) –

    Dataframe holding station metadata. Only the station column is mandatory; the columns are:

    • station (str): Name or code of the measurement station

    • x (float): X-coordinate of measurement station.

    • y (float): Y-coordinate of measurement station.

Returns:

df_statistics – In addition to station, x and y (when df_station_metadata is provided), the following columns are returned:

  • year (list): List of the years for which data is available for the station.

  • records (int): Total number of records for the station.

  • min (float): Minimal measured value for the station.

  • median (float): Median measured value for the station.

  • max (float): Maximal measured value for the station.

Return type:

pandas.DataFrame

rfactor.process.get_rfactor_station_year(erosivity, stations=None, years=None)[source]

Get R-factor at end of every year for each station from cumulative erosivity.

Parameters:
Returns:

erosivity – Updated with:

  • year (int): year

  • station (str): station

  • erosivity_cum (float): cumulative erosivity at end of year and at station.

Return type:

pandas.DataFrame
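Conceptually, the R-factor of a station/year pair is the last cumulative erosivity value of that year. The following sketch shows this selection on plain dictionaries; it assumes that erosivity_cum restarts for every station and year, and the function name rfactor_per_station_year is hypothetical.

```python
from datetime import datetime


def rfactor_per_station_year(rows):
    """Map (station, year) to the cumulative erosivity of its last event.

    rows: iterable of dicts with 'datetime', 'station' and 'erosivity_cum'.
    """
    latest = {}
    for row in sorted(rows, key=lambda r: r["datetime"]):
        key = (row["station"], row["datetime"].year)
        latest[key] = row["erosivity_cum"]  # later events overwrite earlier ones
    return latest


rows = [
    {"datetime": datetime(2018, 3, 1), "station": "A", "erosivity_cum": 10.0},
    {"datetime": datetime(2018, 11, 5), "station": "A", "erosivity_cum": 42.5},
    {"datetime": datetime(2019, 6, 1), "station": "A", "erosivity_cum": 7.0},
]
result = rfactor_per_station_year(rows)
print(result)
```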

rfactor.process.load_rain_file(file_path, load_fun, **kwargs)[source]

Load a rainfall data file with a given load function.

Parameters:
  • file_path (pathlib.Path) – File path with rainfall data. Note that the file should follow the input data format defined in the load_fun.

  • load_fun (Callable) –

    Please check the required input/output format for the files of the used load functions. The output of this function must comply with:

    • datetime (datetime64[ns]): timestamp, timezone naive

    • station (object): name of station, must be formatted as a string.

    • value (float): in mm

  • kwargs – Keyword arguments for load_fun

Returns:

rain – DataFrame with rainfall time series. Contains at least the following columns:

  • rain_mm (float): Rain in mm

  • datetime (pandas.Timestamp): Time stamp

  • minutes_since (float): Minutes since start of year.

  • station (str): station name

  • year (int): year of the measurement

  • tag (str): tag identifier, formatted as STATION_YEAR

Return type:

pandas.DataFrame

rfactor.process.load_rain_file_flanders(file_path, interpolate=False)[source]

Load any txt file which is formatted in the correct format.

The input files are defined by tab delimited files (extension: .txt) that hold rainfall timeseries. The data are split per monitoring station and the file name should be the station identifier. The file should contain two columns:

  • Date/Time

  • Value [millimeter]

Parameters:
  • file_path (pathlib.Path) –

    File path (comma delimited, .CSV-extension) with rainfall data according to defined format:

    • datetime: %d-%m-%Y %H:%M:%S-format

    • Value [millimeter]: str (containing floats and the ‘---’ identifier)

    Headers are not necessary for the columns.

  • interpolate (bool) – Whether to interpolate NaN values.

Returns:

rain – DataFrame with rainfall time series. Contains the following columns:

  • datetime (pandas.Timestamp): Time stamp.

  • minutes_since (float): Minutes since start of year.

  • station (str): station identifier.

  • rain_mm (float): Rain in mm.

Return type:

pandas.DataFrame

Example

  1. Example of a rainfall file:

01-01-2019 00:00,"0"
01-01-2019 00:05,"0.03"
01-01-2019 00:10,"0.04"
01-01-2019 00:15,"0"
01-01-2019 00:20,"0"
01-01-2019 00:25,"---"
01-01-2019 00:30,"0"

Notes

  1. The --- identifiers in the column Value [millimeter] are converted to NaN-values (np.nan). All other values in this column must be convertible to float.

  2. This function is currently not covered by unit tests until further notice.
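The conversion described in note 1 can be sketched with the standard library. This is not the loader itself: the helper parse_rain is hypothetical, the sample rows are taken from the example above, and the timestamp pattern follows that example (no seconds), which is an assumption.

```python
import csv
import io
import math
from datetime import datetime

# Three rows in the layout of the example file shown above.
sample = '''01-01-2019 00:05,"0.03"
01-01-2019 00:25,"---"
01-01-2019 00:30,"0"
'''


def parse_rain(text):
    """Return a list of (datetime, rain_mm) tuples; '---' becomes NaN."""
    rows = []
    for stamp, value in csv.reader(io.StringIO(text)):
        when = datetime.strptime(stamp, "%d-%m-%Y %H:%M")
        rain_mm = math.nan if value == "---" else float(value)
        rows.append((when, rain_mm))
    return rows


rows = parse_rain(sample)
print(rows)
```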

rfactor.process.load_rain_file_matlab_legacy(file_path)[source]

Load (legacy Matlab) file format of rainfall data of a single station/year.

The input files are text files (extension: .txt) that hold non-zero rainfall timeseries. The data are split per station and per year with a specific datafile tag (file name format: SOURCE_STATION_YEAR.txt). The data should not contain headers; the first column is ‘minutes since the start of the year’ and the second column is the rainfall depth during the last t minutes (t is the temporal resolution of the timeseries).

Parameters:

file_path (pathlib.Path) – File path with rainfall data according to defined format, see notes.

Returns:

rain – DataFrame with rainfall time series. Contains the following columns:

  • minutes_since (int): Minutes since the start of the year

  • rain_mm (float): Rain in mm

  • datetime (pandas.Timestamp): Time stamp

  • station (str): station name

Return type:

pandas.DataFrame
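How the file name and the ‘minutes since the start of the year’ column combine into a timestamp can be sketched as follows. The helper parse_legacy_line and the file name KMI_ST01_2018.txt are hypothetical, and the sketch assumes that neither SOURCE nor STATION contains an underscore.

```python
from datetime import datetime, timedelta
from pathlib import Path


def parse_legacy_line(line, file_path):
    """Convert one 'minutes rain' line into a record with a timestamp."""
    # File name format: SOURCE_STATION_YEAR.txt
    _source, station, year = Path(file_path).stem.split("_")
    minutes, rain_mm = line.split()
    when = datetime(int(year), 1, 1) + timedelta(minutes=int(minutes))
    return {
        "minutes_since": int(minutes),
        "rain_mm": float(rain_mm),
        "datetime": when,
        "station": station,
    }


row = parse_legacy_line("9390 1.00", "KMI_ST01_2018.txt")
print(row["datetime"])  # 2018-01-07 12:30:00
```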

Example

  1. Example of a rainfall file:

9390 1.00
9470 0.20
9480 0.50
10770 0.10
...  ...

rfactor.process.load_rain_folder(folder_path, load_fun, **kwargs)[source]

Load all (legacy Matlab format) files of rainfall data in a folder

Parameters:
  • folder_path (pathlib.Path) – Folder path with rainfall data, see also rfactor.process.load_rain_file(). Folder must contain txt files.

  • load_fun (Callable) –

    Please check the required input format for the files in the above listed functions. The (custom) function must output:

    • datetime (datetime64[ns]): timestamp, timezone naive

    • station (object): name of station, must be formatted as a string.

    • value (float): in mm

  • kwargs – Keyword arguments for load_fun

Returns:

rain – See definition in rfactor.process.load_rain_file()

Return type:

pandas.DataFrame

rfactor.process.write_erosivity_data(df, folder_path)[source]

Write output erosivity (in legacy Matlab format) to a folder.

Written data are split up for each year and station (file name format: SOURCE_STATION_YEAR.txt) and do not contain any headers. The columns (no header!) in the written text files represent the following:

  • days_since (float): Days since the start of the year.

  • erosivity_cum (float): Cumulative erosivity over events.

  • all_event_rain_cum (float): Cumulative rain over events.

Parameters:
  • df (pandas.DataFrame) –

    DataFrame with rfactor/erosivity time series. Can contain multiple columns, but should have at least the following:

    • datetime (pandas.Timestamp): Time stamp

    • station (str): Station identifier

    • erosivity_cum (float): Cumulative erosivity over events

    • all_event_rain_cum (float): Cumulative rain over events

  • folder_path (pathlib.Path) – Folder path to save data according to legacy Matlab format, see rfactor.process.load_rain_file().

rfactor.valid module

rfactor.valid.valid_column(rain, req_col)[source]

Check that the input dataframe has the required columns

Parameters:
  • rain (pd.DataFrame) – Dataframe to test

  • req_col (set) – Required columns in dataframe, e.g. {"datetime", "rain_mm"}

rfactor.valid.valid_const_freq(rain)[source]

Check if rainfall input data has constant frequency

Parameters:

rain (pandas.DataFrame)
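One way to test for a constant frequency is to compare the differences between consecutive timestamps. The sketch below works on a plain list of datetimes and is only an illustration; the name has_constant_frequency is hypothetical and the package may perform this check differently.

```python
from datetime import datetime, timedelta


def has_constant_frequency(timestamps):
    """Return True when all consecutive time steps are equal."""
    steps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(steps) <= 1


start = datetime(2019, 1, 1)
regular = [start + timedelta(minutes=10 * i) for i in range(5)]
irregular = regular[:2] + [regular[4]]  # drop two records to create a gap

print(has_constant_frequency(regular))    # True
print(has_constant_frequency(irregular))  # False
```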

rfactor.valid.valid_freq(df_freq, req_freq=None)[source]

Test for valid frequency of input data

The frequency of the input data is tested against a defined frequency. Use of the R-factor package is limited to time steps of at least 1 minute.

Parameters:
  • df_freq (pandas.DatetimeIndex.freq) – Temporal frequency in the rainfall data

  • req_freq (int, default None) – Required frequency (minutes). If None, the frequency of the data should be at least 1 minute.

rfactor.valid.valid_rainfall_timeseries(func=None, req_col={'datetime', 'rain_mm'}, req_freq=None)[source]

Customisable decorator to check pandas input rainfall data for functions used in this package.

Parameters:
  • func (callable, default None)

  • req_col (set) – See rfactor.valid.valid_column()

  • req_freq (int) – See rfactor.valid.valid_freq()

Returns:

decorator – Return the execution of the actual decorator

Return type:

callable

Notes

Uses an outer (super) decorator so that the decorator itself can accept inputs.
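The general pattern of a decorator that works both with and without arguments can be sketched as below. This is a generic illustration, not the package's code: requires_columns is a hypothetical name, only the column check is shown, and the rainfall input is a plain dict of columns instead of a DataFrame.

```python
import functools


def requires_columns(func=None, *, req_col=frozenset({"datetime", "rain_mm"})):
    """Usable as @requires_columns or as @requires_columns(req_col=...)."""

    def decorator(f):
        @functools.wraps(f)
        def wrapper(rain, *args, **kwargs):
            # Iterating over a mapping yields its keys (the column names).
            missing = set(req_col) - set(rain)
            if missing:
                raise KeyError(f"Missing required columns: {sorted(missing)}")
            return f(rain, *args, **kwargs)

        return wrapper

    # Used without parentheses: func is the decorated function itself.
    if func is not None:
        return decorator(func)
    # Used with arguments: return the actual decorator.
    return decorator


@requires_columns
def total_rain(rain):
    return sum(rain["rain_mm"])


@requires_columns(req_col={"datetime", "rain_mm", "station"})
def station_names(rain):
    return sorted(set(rain["station"]))


print(total_rain({"datetime": [0, 10], "rain_mm": [1.0, 2.0]}))  # 3.0
```

Passing func=None by default is what lets the outer function tell the two call styles apart.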