Development guidelines¶
Want to contribute code or functionalities to the pywatemsedem
package? Great and welcome on board!
We use a number of development tools to support us in improving the code quality. No magic bullet or free lunch, but just a set of tools as any craftman has tools to support him/her doing a better job.
For development purposes using conda, make sure to first run pip install -e .[develop]
environment
to prepare the development environment and install all development tools. (When using tox -e dev
this
is already done).
When starting on the development of the pywatemsedem
package, makes sure to be familiar with the following tools. Do
not hesitate to ask the other developers when having trouble using these tools.
Pre-commit hooks¶
To ensure a more common code formatting and limit the git diff, make sure to install the pre-commit hooks. The
required dependencies are included in the development requirements installed when running pip install -e .[develop]
.
Warning
Install the pre-commit
hooks before your first git commit to the package!
pre-commit install
on the main level of the package (pywatemsedem
folder, location where the file .pre-commit-config.yaml
is located)
If you just want to run the hooks on your files to see the effect (not during a git commit), you can use the command at any time:
pre-commit run --all-files
Unit testing with pytest¶
Run the test suite using the pytest
package, from within the main package
folder (pywatemsedem):
pytest
Or using tox (i.e. in a separate environment)
tox --
You will receive information on the test status and the test coverage of the unit tests. Make sure you have defined SAGA and WaTEM/SEDEM, see here.
Documentation with sphinx¶
Build the documentation locally with Sphinx (tox):
tox -e docs
which will create the docs in the docs/_build/html
folder. The docs/_build
directory itself is
left out of version control (and we rather keep it as such ;-)).
Github actions¶
Apart from these tools you can run locally, we use github actions to run these tests also on the cloud. You can see the results at https://github.com/watem-sedem/pywatemsedem/actions
Git LFS¶
Git LFS, or large file support, is used in this repository to store gis files in the repository. To use this functionality you need to install git lfs. See https://git-lfs.github.com/ for instructions and more information.
The .gitattributes
-file in the root folder contains the file extensions wich are
stored under LFS. For now, only files within the test folder are stored under
LFS.
Naming things¶
To provide structure in the naming of methods, functions, classes,… we propose to conform the following guidelines.
Class, function/methods names follow the standard naming conventions as defined
in the PEP8 guidelines. Additionaly, methods/functions start - whenever
possible - with an active verb on the action they perform (does_something()
)
, e.g. load_raster()
, clip_shape()
, transform_rasterprop()
Variable names follow the PEP8 guidelines, but provide additional context:
vectorfiles (incl. shapefiles etc):
vct_variable
rasterfiles:
rst_raster
raw tekstfiles (txt):
txt_variable
numpy array:
arr_variable
pandas:
df_variable
geopandas df:
gdf_variable
files:
`fname_variable
Utilities¶
Load raster: load rasters as numpy arrays. The
profile
raster metadata holds all metedata that geograpical defines the array. Seepywatemsedem.core.utils.load_raster()
:
from pywatemsedem.core.utils import load_raster
rst = "landuse.tif" # can be any raster extension supported by GDAL
arr,profile = load_raster(rst)
Write raster: write numpy arrays to rasters. See
pywatemsedem.core.utils.write_arr_as_rst()
:
from pywatemsedem.core.utils import write_arr_as_rst
rst_out = "landuse_int64.tif"
write_arr_as_rst(arr, rst_out, "int64", profile):
The dtype of the array and output raster is a required function input parameter, and can be used to save disk memory (i.e. use int64 instead of float64).
Guideline for unified coding style¶
A number of guidelines are given in order to obtain a degree of unified style in the pywatemsedem code. Following guidelines are given:
Pandas: access columns via
df["test"]
rather thandf.test
.Use Pandas dataframes to transfer non-raster data between modules/functions, use numpy arrays to perform numerical operations. In case of numerical operations (> 1 operation), write independent functions which use numpy arrays as input and output numpy arrays that can be stored in a dataframe.
Describing DataFrames in docstrings¶
As Numpy docstring does not provide default rules on describing a parameter or
returned variable that represents a Pandas.DataFrame
or a dict
, we
include these as follows (equivalent for parameters versus returns section):
Returns
-------
df_name: pandas.DataFrame
The DataFrame ...whatever you need to say... and contains the
following columns:
- *colunm_name_1* (int): description 1
- *colunm_name_2* (float): description 2
- *colunm_name_3* (datetime): description 3
other_returned_var : float
Description of a none df variable
Similar for a dictionary:
Returns
-------
df_name: dict
The dict ...whatever you need to say... and contains:
- *key_1* (int): description 1
- *key_2* (float): description 2
- *key_3* (datetime): description 3
other_returned_var : float
Description of a none df variable
In case it would be a dict where each element would be coming from different landuses,… and so the datatypes of the keys/values are the same for each item, you can use this in the docstring. E.g. a dict
Returns
-------
df_name: dict of {str : float}
The dict ...whatever you need to say... with the landuse classes as keys and the
area (in ha) as values.
other_returned_var : float
Description of a none df variable
Note
The empty lines are important for sphinx to convert this to a clean list.
Detail alert: the format variable: type is used as constructor for every variable in the documentation (and not variable : type).
Postprocess file structure¶
The postprocess filestructure file src/pywatemsedem/data/postprocess_files.csv holds an overview of all files that are used within postprocessing (either WaTEM/SEDEM input data, output data, intermediate processing files, etc..). This file can be used to add files to the WaTEM/SEDEM package. Do note that this table is only used to define the references to files within the WaTEM/SEDEM Python code. These references are saved in Python in a dictionary. Each line holds the definition for one file:
tag_variable & prefix_variable (str): define the dictionary key (e.g.
rst_aspect
) as the filename structure defined above.folder, filename, argument and extension (str): hold the dictionary value and defines the filename. Python string formating is used in filename to define the specific arguments needed to recognize the file (e.g. filename: buffers_%s_s%s and argument bekken, scenario will fill in the catchment name and scenario number in the filename).
mandatory (int): indicates whether a file is mandatory to create/load.
condition (str) (only postprocess): indicates the condition which is coupled to the existence of a file, e.g. if a
rst_buffers
file is loaded, then the optionInclude buffers
in the model code is set to one. This option is only required for the postprocessing.default_value (int) (only postprocess): the default value given to an empty raster (usefull if a file was not mandatory, but if it does have to be loaded for merging with other scenario’s).
generate_nodata (int) (only postprocess): generate a no data file to define model domain (i.e. some inputfiles have 0 as nodata value for the modeldomain, but also have 0 in the model domain).
postprocess (int) (only postprocess): indicate whether file has to be loaded within postprocessing script.
Package release¶
Before releasing, please check the pinned versions of the dependencies, and - if necessary- adapt in the
environment.yml
-file. The CI will create sdist/wheels and publish these to gitea when git tags are
added, making releasing straight forward. In order to publish a new release,
the following steps:
git checkout master, git pull origin master
(work on up to date master branch)Update the
CHANGELOG.rst
with the changes for this new releasegit commit -m 'Update changelog for release X.X.X' CHANGELOG.rst
git push origin master
Add git tags:
git tag X.X.X
Push the git tags:
git push --tags
When all test pass, drone CI will publish a pre-release on gitea. To convert this to release:
On the release page of the repository, draft a new release using the latest git tag
Copy past the changes from the changelog in the dialog and publish release
Note
Run the flanders WS end-to-end test and validate results before creating a new release. To run these tests, see here