Vectorisation
Functions for converting raster data to vector formats, notably shapefile (.shp) and .kml, as well as the non-geographic formats .csv and .pkl.
Key functions
vectorise_from_band()
This function uses GDAL to vectorise specific layers of the change report .geotiff.
- pyeo.vectorisation.band_naming(band: int, log: Logger)
This function provides a variable name (string) based on the input integer.
- Parameters:
band (int) – the band to interpret as a name. Band numbering starts at 1, not 0.
log (logging.Logger) – the logger object
- Returns:
band_name
- Return type:
str
- pyeo.vectorisation.boundingBoxToOffsets(bbox: list, geot: object) → list[float]
This function calculates offsets from the provided bounding box and geotransform.
- Parameters:
bbox (list[float]) – bounding box coordinates within a list.
geot (object) – Geotransform object.
- Returns:
List of offsets (floats) as [row1, row2, col1, col2].
- Return type:
list[float]
Notes
The original implementation of this function was written by Konrad Hafen and can be found at: https://opensourceoptions.com/blog/zonal-statistics-algorithm-with-python-in-4-steps/
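The offset calculation can be sketched as follows. This is a minimal illustration, not the pyeo source: the name bbox_to_offsets is hypothetical, the bounding box is assumed to be ordered (min_x, max_x, min_y, max_y) as returned by OGR's GetEnvelope(), and geot is assumed to be a six-element, north-up GDAL geotransform.

```python
def bbox_to_offsets(bbox, geot):
    """Convert a bounding box to pixel offsets [row1, row2, col1, col2].

    Assumptions (not confirmed by the pyeo documentation):
    bbox is (min_x, max_x, min_y, max_y) and geot is a GDAL-style
    geotransform (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
    """
    col1 = int((bbox[0] - geot[0]) / geot[1])
    col2 = int((bbox[1] - geot[0]) / geot[1]) + 1
    # geot[5] is negative for north-up rasters, so max_y gives the first row
    row1 = int((bbox[3] - geot[3]) / geot[5])
    row2 = int((bbox[2] - geot[3]) / geot[5]) + 1
    return [row1, row2, col1, col2]


# Hypothetical 10 m raster with its top-left corner at (500000, 6000000)
example_geot = (500000.0, 10.0, 0.0, 6000000.0, 0.0, -10.0)
example_bbox = (500100.0, 500200.0, 5999700.0, 5999900.0)
offsets = bbox_to_offsets(example_bbox, example_geot)
```

The extra pixel added to row2 and col2 ensures the window fully covers a feature whose edge falls partway through a pixel.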
- pyeo.vectorisation.clean_zero_nodata_vectorised_band(vectorised_band_path: str, log: Logger)
This function removes 0s and nodata values from the vectorised bands.
- Parameters:
vectorised_band_path (str) – path to the band to filter
log (logging.Logger) – The logger object
- Returns:
filename
- Return type:
str
- pyeo.vectorisation.geotFromOffsets(row_offset, col_offset, geot)
This function calculates a new geotransform from offsets.
- Parameters:
row_offset (int) –
col_offset (int) –
geot (object) – Geotransform object.
- Returns:
new_geot
- Return type:
float
Notes
The original implementation of this function was written by Konrad Hafen and can be found at: https://opensourceoptions.com/blog/zonal-statistics-algorithm-with-python-in-4-steps/
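As a rough sketch of the idea, a new geotransform for a raster window can be derived by shifting the origin by the offsets while keeping the pixel size. The name geot_from_offsets is hypothetical, and the sketch assumes a six-element, north-up GDAL geotransform with no rotation terms; the pyeo implementation may differ.

```python
def geot_from_offsets(row_offset, col_offset, geot):
    """Return a geotransform whose origin is shifted by the given offsets.

    Assumes geot is (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
    """
    return [
        geot[0] + col_offset * geot[1],  # new top-left x
        geot[1],                         # pixel width is unchanged
        0.0,
        geot[3] + row_offset * geot[5],  # new top-left y (geot[5] is negative)
        0.0,
        geot[5],                         # pixel height is unchanged
    ]


# Hypothetical 10 m raster; the window starts 10 rows down and 10 columns across
example_geot = (500000.0, 10.0, 0.0, 6000000.0, 0.0, -10.0)
window_geot = geot_from_offsets(10, 10, example_geot)
```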
- pyeo.vectorisation.merge_and_calculate_spatial(rb_ndetections_zstats_df: DataFrame, rb_confidence_zstats_df: DataFrame, rb_first_changedate_zstats_df: DataFrame, path_to_vectorised_binary_filtered: str, write_csv: bool, write_shapefile: bool, write_kml: bool, write_pkl: bool, change_report_path: str, log: Logger, epsg: int, level_1_boundaries_path: str, tileid: str, delete_intermediates: bool = True)
This function takes the zonal statistics Pandas DataFrames and performs a table join to the vectorised binary polygons that are the basis of the vectorised change report.
- Parameters:
rb_ndetections_zstats_df (pd.DataFrame()) – Pandas DataFrame object for report band 5 (ndetections)
rb_confidence_zstats_df (pd.DataFrame()) – Pandas DataFrame object for report band 9 (confidence)
rb_first_changedate_zstats_df (pd.DataFrame()) – Pandas DataFrame object for report band 4 (approved first change date)
path_to_vectorised_binary_filtered (str) – Path to the vectorised binary shapefile
write_pkl (bool) – whether to write the output to a .pkl file
write_csv (bool) – whether to write the output to a .csv file
write_shapefile (bool) – whether to write the output to a shapefile
write_kml (bool) – whether to write the output to a .kml file
change_report_path (str) – the path of the original change_report tiff, used for filenaming if saving outputs
log (logging.Logger) – a logging object
epsg (int) – the EPSG code to work with, specified in the .ini
level_1_boundaries_path (str) – path to the administrative boundaries to filter by, specified in the .ini
tileid (str) – tileid to work with
delete_intermediates (bool) – a boolean indicating whether to delete or keep intermediate files. Defaults to True.
- Returns:
output_vector_files – list of output vector files created
- Return type:
list[str]
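The table join described above might look like the following pandas sketch. The column names (id, area_px, rb5_max, rb9_mean, rb4_min) and the sample values are invented for illustration; the actual key and column names used by pyeo are not given in this documentation.

```python
import pandas as pd

# Stand-in for the attribute table of the vectorised binary polygons
polygons = pd.DataFrame({"id": [0, 1, 2], "area_px": [12, 40, 7]})

# Stand-ins for the three zonal statistics tables (hypothetical columns)
ndetections = pd.DataFrame({"id": [0, 1, 2], "rb5_max": [3, 5, 1]})
confidence = pd.DataFrame({"id": [0, 1, 2], "rb9_mean": [0.8, 0.6, 0.9]})
first_changedate = pd.DataFrame({"id": [0, 1, 2], "rb4_min": [20230105, 20230212, 20230301]})

# Left-join each statistics table onto the polygons using the shared key
merged = polygons
for stats in (ndetections, confidence, first_changedate):
    merged = merged.merge(stats, on="id", how="left")
```

A left join keeps every polygon even if one of the statistics tables has no matching row for it.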
- pyeo.vectorisation.setFeatureStats(fid, min, max, mean, median, sd, sum, count, report_band)
This function sets the feature stats to calculate from the array.
- Parameters:
fid (int) –
min (int) –
max (int) –
mean (float) –
median (float) –
sd (float) –
sum (int) –
count (int) –
report_band (int) –
- Returns:
featstats
- Return type:
dict
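One plausible shape for such a dictionary is sketched below. The function name, the "id" key and the "rb<band>_" key prefix are assumptions made for illustration only; the key names pyeo actually uses are not documented here.

```python
def set_feature_stats(fid, min_, max_, mean, median, sd, sum_, count, report_band):
    """Collect per-feature statistics into a dict keyed by report band.

    The key naming scheme below is hypothetical.
    """
    prefix = f"rb{report_band}_"
    return {
        "id": fid,
        prefix + "min": min_,
        prefix + "max": max_,
        prefix + "mean": mean,
        prefix + "median": median,
        prefix + "sd": sd,
        prefix + "sum": sum_,
        prefix + "count": count,
    }


featstats = set_feature_stats(7, 1, 3, 2.0, 2.0, 0.5, 6, 3, report_band=5)
```

Prefixing the keys with the band number keeps columns distinct when statistics from several report bands are later joined into one table.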
- pyeo.vectorisation.vectorise_from_band(change_report_path: str, band: int, log: Logger)
This function takes the path to a change report raster and a band number, and vectorises that band.
- Parameters:
change_report_path (str) – path to a change report raster
band (int) – an integer from 1 to 18 indicating the band to vectorise. The integer follows GDAL numbering, i.e. it starts at 1 rather than 0 as in Python.
log (logging.Logger) – the logger object
- Returns:
out_filename – the output path of the vectorised band
- Return type:
str
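For orientation, vectorising a single raster band with GDAL is commonly done with gdal.Polygonize. The sketch below shows that general technique and is not the pyeo implementation: the function name polygonise_band, the layer name and the "DN" field name are all assumptions.

```python
def polygonise_band(raster_path, band, out_path):
    """Vectorise one band of a raster to a shapefile (illustrative sketch).

    band uses GDAL numbering, so the first band is 1.
    """
    # Imported here so this sketch can be loaded where GDAL is not installed
    from osgeo import gdal, ogr, osr

    src = gdal.Open(raster_path)
    src_band = src.GetRasterBand(band)

    # Reuse the raster's projection for the output layer
    srs = osr.SpatialReference()
    srs.ImportFromWkt(src.GetProjection())

    driver = ogr.GetDriverByName("ESRI Shapefile")
    dst = driver.CreateDataSource(out_path)
    layer = dst.CreateLayer("band", srs=srs, geom_type=ogr.wkbPolygon)
    # Field that receives the pixel value of each polygon (hypothetical name)
    layer.CreateField(ogr.FieldDefn("DN", ogr.OFTInteger))

    # No mask band; write pixel values into field index 0 ("DN")
    gdal.Polygonize(src_band, None, layer, 0, [], callback=None)

    # Dereference the datasets so GDAL flushes and closes the files
    dst = None
    src = None
    return out_path
```

Each connected region of equal pixel values becomes one polygon, which is why zero and nodata regions typically need to be filtered out afterwards.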
- pyeo.vectorisation.zonal_statistics(raster_path: str, shapefile_path: str, report_band: int, log: Logger)
This function calculates zonal statistics on a raster.
- Parameters:
raster_path (str) – the path to the raster to obtain the values from.
shapefile_path (str) – the path to the shapefile which we will use as the “zones”.
report_band (int) – the band to run zonal statistics on.
log (logging.Logger) – the logger object
- Returns:
zstats_df
- Return type:
pd.DataFrame
Notes
The raster at raster_path must have even dimensions, e.g. (10980, 10980), not (10979, 10979).
The original implementation of this function was written by Konrad Hafen and can be found at: https://opensourceoptions.com/blog/zonal-statistics-algorithm-with-python-in-4-steps/
Aspects of this function were amended to accommodate updates to GDAL, OGR and numpy.ma.MaskedArray.
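The core of a zonal statistics step can be sketched with a NumPy masked array: pixels outside the zone, or equal to nodata, are masked before the statistics are computed. This is a simplified illustration under stated assumptions (the function name zone_stats, a nodata value of 0 and a ready-made boolean zone array are all hypothetical), and it omits the rasterisation of each polygon that a full implementation needs.

```python
import numpy as np


def zone_stats(values, in_zone, nodata=0):
    """Compute summary statistics for the pixels of one zone.

    values  : 2D array of raster values for the zone's window
    in_zone : 2D boolean array, True where the pixel lies inside the zone
    """
    # Mask pixels that are nodata or fall outside the zone
    masked = np.ma.MaskedArray(
        values,
        mask=np.logical_or(values == nodata, np.logical_not(in_zone)),
    )
    if masked.count() == 0:
        return None  # no valid pixels in this zone
    return {
        "min": float(masked.min()),
        "max": float(masked.max()),
        "mean": float(masked.mean()),
        "median": float(np.ma.median(masked)),
        "sd": float(masked.std()),
        "sum": float(masked.sum()),
        "count": int(masked.count()),
    }


# Hypothetical 2 x 3 window; the zone covers the top row and the first pixel of the second row
window = np.array([[1, 2, 0], [3, 4, 5]])
zone = np.array([[True, True, True], [True, False, False]])
stats = zone_stats(window, zone)
```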