Calibration¶
[ ]:
%load_ext autoreload
%autoreload 2
Aim¶
The aim of this notebook is to post process calibration runs and to define a suitable set of ktc values for a given input dataset.
Note: - This notebook assumes you already did modelruns with the ‘calibrate’-option enabled. This is explained in another tutorial.
[ ]:
from pywatemsedem import calibrate
from pywatemsedem.calibrate import calculate_model_efficiency
from pathlib import Path
import pandas as pd
Observation data¶
We start by reading a file with the measured sediment values. Here, we use the calibration data of Flanders that is stored in the flanders-subpackage.
[ ]:
df_cal = pd.read_csv(Path("..") / ".." / "src" / "data" / "example_calibration_data.csv", sep=';')
As you can see, this pandas.DataFrame contains the sediment yield (sy, tons) and specific sediment yield (ssy, tons/ha) for several catchments. Also the area of each catchment is stored in this table.
[ ]:
df_cal.head(10)
Not all catchments are included in this calibration. We select the ones that are included.
[ ]:
df_cal = df_cal[df_cal["include"] == 1]
Simulation data¶
For several catchments in this dataset a calibration run is available in the datafolder
. We will use these runs in this example.
The content of results_calibration
should be (for setting scenario equal to 1, see below):
[ ]:
datafolder=Path(r"results_calibration")
scenario_nr = 1
[ ]:
datafolder.exists()
All calibration runs in the datafolder
were done for a range ktc low an ktc high values between 0 and 20. Therefore, we make a Calibration instance with these data.
[ ]:
caldata = calibrate.Calibration(0,20, 0, 20, 20)
the Calibration-object calculates automatically some usefull and commonly used variables in the data processing of a calibration. For example a template dataframe where all combinations of ktc-values are already present:
[ ]:
caldata.df_template.head(5)
[ ]:
df_cal
We loop over all directories in the datafolder, read the calibration output file, look-up the sy, ssy and area for the catchemnt in the calibration dataframe and process the modeloutput:
[ ]:
lst_df = []
cols = ["sy","include","area","ssy"]
for folder in datafolder.iterdir():
if folder.is_dir():
name=folder.stem
cond = df_cal["name"]==name
sy,include,area,ssy=df_cal.loc[cond,cols].values.tolist()[0]
if include==1:
f = datafolder / name / f'scenario_{scenario_nr}/modeloutput/calibration.txt'
df = calibrate.process_calibrationrun_output(f, sy, name, area,endpoint_coefficient=0.)
lst_df.append(df)
Next, we merge all this processed model output to a single dataframe. This dataframe contains a row for every model run for every catchment with a different combination of ktc values.
Evaluate¶
[ ]:
df_calibration_results = calibrate.merge_calibration_results(lst_df,caldata)
[ ]:
df_calibration_results.head(10)
Now, it is time to calculate the model efficiency for every combination of ktc-values
[ ]:
df_me = calculate_model_efficiency(df_cal, df_calibration_results)
The above function creates a pandas.DataFrame with the model efficiency (according to Nash and Suttcliff, 1970) for the modelled sediment yield (sy) and specific sediment yield (ssy)
[ ]:
df_me.head(10)
We can plot these data too:
[ ]:
calibrate.plot_model_efficiency(df_me, caldata, sy=False)
[ ]:
calibrate.plot_model_efficiency(df_me, caldata, sy=True)
We see several combinations of ktc-values with a high model efficiency for the sediment yield. However, we want to select a combination where the slope between the observed and modelled sediment yield is around 1. Therefore, we calculate a linear regression for all combinations of ktc-values:
[ ]:
df_regressions = calibrate.calculate_regressions(df_calibration_results, caldata)
[ ]:
df_regressions.head(10)
Both dataframes with metrics (df_regressions
and df_me
) can be joined on ktc_low and ktc_high:
[ ]:
df_metrics = pd.merge(left=df_regressions, right=df_me, on=['ktc_low', 'ktc_high'])
[ ]:
df_metrics.head(10)
According to Verstraeten et al. (2006), the ratio between ktc low and ktc high must be around 1/3. Therefore, we add a column with this ratio to the metrics dataframe
[ ]:
df_metrics = calibrate.calculate_ratio_ktc(df_metrics)
[ ]:
df_metrics.head(10)
Now, we have a dataframe with all the information we need to select the best combination of ktc values!
We define some bounderies on the slope and ratio:
[ ]:
lower_rico=0.90
upper_rico=1.10
lower_ratio=0.20
upper_ratio=0.40
We make a selection
[ ]:
selection_ratio = ((df_metrics['ratio_ktc'] >= lower_ratio) &
(df_metrics['ratio_ktc'] < upper_ratio))
selection_sy = ((df_metrics['rico_sy'] >= lower_rico) &
(df_metrics['rico_sy'] < upper_rico) &
selection_ratio)
selection_ssy = ((df_metrics['rico_ssy'] >= lower_rico) &
(df_metrics['rico_ssy'] < upper_rico) &
selection_ratio)
And apply the selection on the dataframe
[ ]:
df_metrics[selection_sy]
[ ]:
df_metrics[selection_ssy]
The combination of ktc-values 4/12 has a perfect ratio (1/3), a high ME for both sy and ssy, and the slope of the regression lies arround 1. This is a perfect combination of ktc vaues, and thus the result of the calibration!
[ ]:
fig,ax = calibrate.plot_regression(df_calibration_results, 3, 12)
[ ]:
fig,ax = calibrate.plot_regression(df_calibration_results, 3, 12, 'ssy')
[ ]: