Vectorisation

Functions for converting raster data to vector formats, notably shapefiles (.shp) and .kml, as well as the non-geographic formats .csv and .pkl.

Key functions

vectorise_from_band(): uses GDAL to vectorise specific layers of the change report GeoTIFF.

pyeo.vectorisation.band_naming(band: int, log: Logger)

This function provides a variable name (string) based on the input integer.

Parameters:
  • band (int) – the band to interpret as a name. Band numbering starts from 1, not 0.

  • log (logging.Logger) – the logger object

Returns:

band_name

Return type:

str

pyeo.vectorisation.boundingBoxToOffsets(bbox: list, geot: object) → list[float]

This function calculates offsets from the provided bounding box and geotransform.

Parameters:
  • bbox (list[float]) – bounding box coordinates within a list.

  • geot (object) – Geotransform object.

Returns:

List of offsets (floats) as [row1, row2, col1, col2].

Return type:

list[float]

Notes

The original implementation of this function was written by Konrad Hafen and can be found at: https://opensourceoptions.com/blog/zonal-statistics-algorithm-with-python-in-4-steps/
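
The offset calculation can be sketched in plain Python as below. This is a minimal illustration of the approach described in the blog post linked above, not pyeo's own code. It assumes that `bbox` is ordered `(min_x, max_x, min_y, max_y)` and that `geot` is a six-element GDAL-style geotransform for a north-up raster. The name `bounding_box_to_offsets_sketch` is hypothetical, and the sketch returns integers rather than floats.

```python
def bounding_box_to_offsets_sketch(bbox, geot):
    """Convert a bounding box in map coordinates to pixel offsets.

    Assumption: bbox = (min_x, max_x, min_y, max_y) and geot is a north-up
    geotransform (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
    """
    origin_x, pixel_width, _, origin_y, _, pixel_height = geot
    col1 = int((bbox[0] - origin_x) / pixel_width)
    col2 = int((bbox[1] - origin_x) / pixel_width) + 1
    # pixel_height is negative, so the northern edge (max_y) gives the first row
    row1 = int((bbox[3] - origin_y) / pixel_height)
    row2 = int((bbox[2] - origin_y) / pixel_height) + 1
    return [row1, row2, col1, col2]


# Illustrative values only: a 10 m grid with its origin at (500000, 6000000)
geot = (500000.0, 10.0, 0.0, 6000000.0, 0.0, -10.0)
bbox = (500100.0, 500200.0, 5999800.0, 5999900.0)
offsets = bounding_box_to_offsets_sketch(bbox, geot)
```

The end offsets are one larger than the last pixel touched, so the row and column pairs can be used directly as half-open slice bounds.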

pyeo.vectorisation.clean_zero_nodata_vectorised_band(vectorised_band_path: str, log: Logger)

This function removes 0s and nodata values from the vectorised bands.

Parameters:
  • vectorised_band_path (str) – path to the band to filter

  • log (logging.Logger) – The logger object

Returns:

filename

Return type:

str

pyeo.vectorisation.geotFromOffsets(row_offset, col_offset, geot)

This function calculates a new geotransform from offsets.

Parameters:
  • row_offset (int) – the row offset, in pixels

  • col_offset (int) – the column offset, in pixels

  • geot (object) – Geotransform object.

Returns:

new_geot

Return type:

float

Notes

The original implementation of this function was written by Konrad Hafen and can be found at: https://opensourceoptions.com/blog/zonal-statistics-algorithm-with-python-in-4-steps/
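
As a rough illustration, shifting a geotransform by pixel offsets amounts to moving its origin while keeping the pixel size. The sketch below follows the approach in the blog post linked above and is not pyeo's own code. It assumes a six-element GDAL-style geotransform with no rotation, and the name `geot_from_offsets_sketch` is hypothetical.

```python
def geot_from_offsets_sketch(row_offset, col_offset, geot):
    """Return a geotransform whose origin is shifted by the given pixel offsets."""
    return [
        geot[0] + col_offset * geot[1],  # new origin x
        geot[1],                         # pixel width is unchanged
        0.0,                             # assumption: no rotation
        geot[3] + row_offset * geot[5],  # new origin y (geot[5] is negative)
        0.0,                             # assumption: no rotation
        geot[5],                         # pixel height is unchanged
    ]


# Illustrative values only: a 10 m grid with its origin at (500000, 6000000)
geot = (500000.0, 10.0, 0.0, 6000000.0, 0.0, -10.0)
new_geot = geot_from_offsets_sketch(10, 10, geot)
```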

pyeo.vectorisation.merge_and_calculate_spatial(rb_ndetections_zstats_df: DataFrame, rb_confidence_zstats_df: DataFrame, rb_first_changedate_zstats_df: DataFrame, path_to_vectorised_binary_filtered: str, write_csv: bool, write_shapefile: bool, write_kml: bool, write_pkl: bool, change_report_path: str, log: Logger, epsg: int, level_1_boundaries_path: str, tileid: str, delete_intermediates: bool = True)

This function takes the zonal statistics Pandas DataFrames and performs a table join to the vectorised binary polygons that are the basis of the vectorised change report.

Parameters:
  • rb_ndetections_zstats_df (pd.DataFrame()) – Pandas DataFrame object for report band 5 (ndetections)

  • rb_confidence_zstats_df (pd.DataFrame()) – Pandas DataFrame object for report band 9 (confidence)

  • rb_first_changedate_zstats_df (pd.DataFrame()) – Pandas DataFrame object for report band 4 (approved first change date)

  • path_to_vectorised_binary_filtered (str) – Path to the vectorised binary shapefile

  • write_pkl (bool) – whether to write to pkl

  • write_csv (bool) – whether to write to csv

  • write_shapefile (bool) – whether to write to shapefile

  • write_kml (bool) – whether to write to kml file

  • change_report_path (str) – the path of the original change_report tiff, used for filenaming if saving outputs

  • log (logging.Logger) – a logging object

  • epsg (int) – the epsg to work with, specified in .ini

  • level_1_boundaries_path (str) – path to the administrative boundaries to filter by, specified in the .ini

  • tileid (str) – tileid to work with

  • delete_intermediates (bool) – a boolean indicating whether to delete or keep intermediate files. Defaults to True.

Returns:

output_vector_files – list of output vector files created

Return type:

list[str]
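
The join step can be pictured with a plain-Python sketch. This is not how pyeo implements it (the function works on Pandas DataFrames and vector files); it only shows the idea of attaching each zonal statistics table to the polygon records through a shared key. The key name `id`, the column names, the values and the function name `join_zstats_sketch` are all hypothetical.

```python
def join_zstats_sketch(polygons, *zstats_tables, key="id"):
    """Left-join each statistics table onto the polygon records by key."""
    joined = []
    for polygon in polygons:
        row = dict(polygon)  # copy so the input records are not modified
        for table in zstats_tables:
            # take the first record with a matching key, if there is one
            match = next((r for r in table if r[key] == polygon[key]), {})
            row.update(match)
        joined.append(row)
    return joined


# Made-up records standing in for the polygons and the three statistics tables
polygons = [{"id": 0, "geometry": "POLYGON A"}, {"id": 1, "geometry": "POLYGON B"}]
ndetections = [{"id": 0, "rb5_max": 3}, {"id": 1, "rb5_max": 7}]
confidence = [{"id": 0, "rb9_mean": 0.8}, {"id": 1, "rb9_mean": 0.4}]
first_changedate = [{"id": 0, "rb4_min": 20230105}, {"id": 1, "rb4_min": 20230214}]

report = join_zstats_sketch(polygons, ndetections, confidence, first_changedate)
```

Each polygon keeps its geometry and gains one set of columns per report band, which is the shape of table that can then be written out in the requested formats.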

pyeo.vectorisation.setFeatureStats(fid, min, max, mean, median, sd, sum, count, report_band)

This function collects the statistics calculated from the array for a single feature into a dictionary.

Parameters:
  • fid (int) –

  • min (int) –

  • max (int) –

  • mean (float) –

  • median (float) –

  • sd (float) –

  • sum (int) –

  • count (int) –

  • report_band (int) –

Returns:

featstats

Return type:

dict
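
A minimal sketch of such a dictionary is shown below. The key names, including the `rb<report_band>_` prefix and the `id` key, are assumptions made for illustration and may not match the keys pyeo actually uses. The function name `set_feature_stats_sketch` is hypothetical.

```python
def set_feature_stats_sketch(fid, min, max, mean, median, sd, sum, count, report_band):
    """Pack per-feature statistics into a dictionary.

    The argument names mirror the documented signature, so they shadow the
    built-ins min, max and sum inside this function.
    """
    prefix = f"rb{report_band}"  # hypothetical key prefix
    return {
        "id": fid,  # hypothetical key name for the feature ID
        f"{prefix}_min": min,
        f"{prefix}_max": max,
        f"{prefix}_mean": mean,
        f"{prefix}_median": median,
        f"{prefix}_sd": sd,
        f"{prefix}_sum": sum,
        f"{prefix}_count": count,
    }


# Illustrative values only
featstats = set_feature_stats_sketch(0, 1, 4, 2.5, 2.5, 1.118, 10, 4, report_band=5)
```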

pyeo.vectorisation.vectorise_from_band(change_report_path: str, band: int, log: Logger)

This function takes the path of a change report raster and a band number, and vectorises that band.

Parameters:
  • change_report_path (str) – path to a change report raster

  • band (int) – an integer from 1 to 18 indicating the band to vectorise. The integer follows GDAL numbering, i.e. it starts at 1 rather than 0 as in Python.

  • log (logging.Logger) – the logger object

Returns:

out_filename – the output path of the vectorised band

Return type:

str

pyeo.vectorisation.zonal_statistics(raster_path: str, shapefile_path: str, report_band: int, log: Logger)

This function calculates zonal statistics on a raster.

Parameters:
  • raster_path (str) – the path to the raster to obtain the values from.

  • shapefile_path (str) – the path to the shapefile which we will use as the “zones”.

  • report_band (int) – the band to run zonal statistics on.

  • log (logging.Logger) – the logger object

Returns:

zstats_df

Return type:

pd.DataFrame

Notes

The raster at raster_path must have even dimensions, e.g. (10980, 10980) rather than (10979, 10979).

The original implementation of this function was written by Konrad Hafen and can be found at: https://opensourceoptions.com/blog/zonal-statistics-algorithm-with-python-in-4-steps/

Aspects of this function were amended to accommodate library updates from GDAL, OGR and numpy.ma.MaskedArray().
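
The per-zone calculation can be illustrated with a small standard-library sketch. It is not the pyeo implementation, which works with GDAL, OGR and masked NumPy arrays; here the raster is a nested list and the zone mask is supplied by hand rather than derived from a shapefile. The function name `zone_stats_sketch`, the nodata value and the choice of population standard deviation are assumptions.

```python
import statistics


def zone_stats_sketch(values, zone_mask, nodata):
    """Summarise the raster values inside one zone, ignoring nodata cells."""
    inside = [
        v
        for value_row, mask_row in zip(values, zone_mask)
        for v, in_zone in zip(value_row, mask_row)
        if in_zone and v != nodata
    ]
    if not inside:
        return None  # the zone covers no valid pixels
    return {
        "min": min(inside),
        "max": max(inside),
        "mean": statistics.fmean(inside),
        "median": statistics.median(inside),
        "sd": statistics.pstdev(inside),  # assumption: population standard deviation
        "sum": sum(inside),
        "count": len(inside),
    }


# Made-up 3 x 3 raster; the zone covers the first two rows and 0 is nodata
values = [[1, 2, 0], [3, 4, 0], [9, 9, 9]]
zone_mask = [[True, True, True], [True, True, True], [False, False, False]]
stats = zone_stats_sketch(values, zone_mask, nodata=0)
```

Repeating this for every polygon and collecting the resulting dictionaries as rows gives a table of the kind returned as zstats_df.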