Calibration

[ ]:
%load_ext autoreload
%autoreload 2

Aim

The aim of this notebook is to post process calibration runs and to define a suitable set of ktc values for a given input dataset.

Note: - This notebook assumes you already did modelruns with the ‘calibrate’-option enabled. This is explained in another tutorial.

[ ]:
from pywatemsedem import calibrate
from pywatemsedem.calibrate import calculate_model_efficiency
from pathlib import Path
import pandas as pd

Observation data

We start by reading a file with the measured sediment values. Here, we use the calibration data of Flanders that is stored in the flanders-subpackage.

[ ]:
df_cal = pd.read_csv(Path("..") / ".." / "src" / "data" / "example_calibration_data.csv", sep=';')

As you can see, this pandas.DataFrame contains the sediment yield (sy, tons) and specific sediment yield (ssy, tons/ha) for several catchments. Also the area of each catchment is stored in this table.

[ ]:
df_cal.head(10)

Not all catchments are included in this calibration. We select the ones that are included.

[ ]:
df_cal = df_cal[df_cal["include"] == 1]

Simulation data

For several catchments in this dataset a calibration run is available in the datafolder. We will use these runs in this example.

The content of results_calibration should be (for setting scenario equal to 1, see below):

├── cathment_name1
│ ├── …
│ ├── scenario_1
│ │ ├── …
│ │ ├── modeloutput/calibration.txt
│ │ ├── …
│ ├── …
├── catchment_name2
│ ├── …
│ ├── scenario_1
│ │ ├── …
│ │ ├── modeloutput/calibration.txt
│ │ ├── …
├── …
[ ]:
datafolder=Path(r"results_calibration")
scenario_nr = 1
[ ]:
datafolder.exists()

All calibration runs in the datafolder were done for a range ktc low an ktc high values between 0 and 20. Therefore, we make a Calibration instance with these data.

[ ]:
caldata = calibrate.Calibration(0,20, 0, 20, 20)

the Calibration-object calculates automatically some usefull and commonly used variables in the data processing of a calibration. For example a template dataframe where all combinations of ktc-values are already present:

[ ]:
caldata.df_template.head(5)
[ ]:
df_cal

We loop over all directories in the datafolder, read the calibration output file, look-up the sy, ssy and area for the catchemnt in the calibration dataframe and process the modeloutput:

[ ]:
lst_df = []
cols = ["sy","include","area","ssy"]

for folder in datafolder.iterdir():

    if folder.is_dir():

        name=folder.stem
        cond = df_cal["name"]==name
        sy,include,area,ssy=df_cal.loc[cond,cols].values.tolist()[0]

        if include==1:
            f = datafolder / name / f'scenario_{scenario_nr}/modeloutput/calibration.txt'
            df = calibrate.process_calibrationrun_output(f, sy, name, area,endpoint_coefficient=0.)
            lst_df.append(df)

Next, we merge all this processed model output to a single dataframe. This dataframe contains a row for every model run for every catchment with a different combination of ktc values.

Evaluate

[ ]:
df_calibration_results = calibrate.merge_calibration_results(lst_df,caldata)
[ ]:
df_calibration_results.head(10)

Now, it is time to calculate the model efficiency for every combination of ktc-values

[ ]:
df_me = calculate_model_efficiency(df_cal, df_calibration_results)

The above function creates a pandas.DataFrame with the model efficiency (according to Nash and Suttcliff, 1970) for the modelled sediment yield (sy) and specific sediment yield (ssy)

[ ]:
df_me.head(10)

We can plot these data too:

[ ]:
calibrate.plot_model_efficiency(df_me, caldata, sy=False)
[ ]:
calibrate.plot_model_efficiency(df_me, caldata, sy=True)

We see several combinations of ktc-values with a high model efficiency for the sediment yield. However, we want to select a combination where the slope between the observed and modelled sediment yield is around 1. Therefore, we calculate a linear regression for all combinations of ktc-values:

[ ]:
df_regressions = calibrate.calculate_regressions(df_calibration_results, caldata)
[ ]:
df_regressions.head(10)

Both dataframes with metrics (df_regressions and df_me) can be joined on ktc_low and ktc_high:

[ ]:
df_metrics = pd.merge(left=df_regressions, right=df_me, on=['ktc_low', 'ktc_high'])
[ ]:
df_metrics.head(10)

According to Verstraeten et al. (2006), the ratio between ktc low and ktc high must be around 1/3. Therefore, we add a column with this ratio to the metrics dataframe

[ ]:
df_metrics = calibrate.calculate_ratio_ktc(df_metrics)
[ ]:
df_metrics.head(10)

Now, we have a dataframe with all the information we need to select the best combination of ktc values!

We define some bounderies on the slope and ratio:

[ ]:
lower_rico=0.90
upper_rico=1.10
lower_ratio=0.20
upper_ratio=0.40

We make a selection

[ ]:
selection_ratio = ((df_metrics['ratio_ktc'] >= lower_ratio) &
                (df_metrics['ratio_ktc'] < upper_ratio))

selection_sy = ((df_metrics['rico_sy'] >= lower_rico) &
                (df_metrics['rico_sy'] < upper_rico) &
               selection_ratio)

selection_ssy = ((df_metrics['rico_ssy'] >= lower_rico) &
                (df_metrics['rico_ssy'] < upper_rico) &
               selection_ratio)

And apply the selection on the dataframe

[ ]:
df_metrics[selection_sy]
[ ]:
df_metrics[selection_ssy]

The combination of ktc-values 4/12 has a perfect ratio (1/3), a high ME for both sy and ssy, and the slope of the regression lies arround 1. This is a perfect combination of ktc vaues, and thus the result of the calibration!

[ ]:
fig,ax = calibrate.plot_regression(df_calibration_results, 3, 12)
[ ]:
fig,ax = calibrate.plot_regression(df_calibration_results, 3, 12, 'ssy')
[ ]: