Queries and Downloads

Functions for querying and downloading data from the Copernicus Dataspace Ecosystem (CDSE).

Key Functions

To interact with the CDSE, two key functions are required.

  1. query_dataspace_by_polygon() Queries the Copernicus Dataspace Ecosystem API for products between two dates that conform to the Area of Interest and maximum cloud cover supplied.

Once the appropriate products are identified, the products listed in the query DataFrame (pd.DataFrame) can be downloaded using:

  2. download_s2_data_from_dataspace() Passes a DataFrame of L2A and L1C products to download_dataspace_product(), which handles authentication errors, URL redirects and token refreshes.
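The two steps above can be sketched as a single helper. This is a minimal illustration, not the library's own workflow: the helper name query_then_download, the logger name and the default values are assumptions, and the keyword arguments are taken from the function signatures listed further down this page. The module is passed in as an argument so that the sketch does not depend on how pyeo is installed.

```python
import logging


def query_then_download(qd, aoi_wkt, start_date, end_date, username, password,
                        l1c_dir, l2a_dir, max_cloud_cover=20, max_records=100):
    """Query the CDSE for products, then download them.

    `qd` is expected to be the imported pyeo.queries_and_downloads module.
    The default cloud cover and record limits are illustrative assumptions.
    """
    log = logging.getLogger("pyeo_example")  # hypothetical logger name

    # Step 1: query for products matching the dates, AOI and cloud cover.
    products = qd.query_dataspace_by_polygon(
        max_cloud_cover=max_cloud_cover,
        start_date=start_date,        # YYYY-MM-DD
        end_date=end_date,            # YYYY-MM-DD
        area_of_interest=aoi_wkt,     # geometry as a WKT string
        max_records=max_records,
        log=log,
    )

    # Step 2: hand the query result to the download wrapper.
    qd.download_s2_data_from_dataspace(
        product_df=products,
        l1c_directory=l1c_dir,
        l2a_directory=l2a_dir,
        dataspace_username=username,
        dataspace_password=password,
        log=log,
    )
    return products
```

In use, this would be called as query_then_download(qd, ...) after `from pyeo import queries_and_downloads as qd`, with a valid CDSE username and password.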

SAFE Files

Sentinel-2 data is downloaded in the form of a .SAFE file; all download functions will end with data in this structure. This is a directory structure that contains the imagery, metadata and supplementary data of a Sentinel-2 image. The rasters themselves are in the GRANULE/[granule_id]/IMG_DATA/[resolution]/ folder; each band is contained in its own .jp2 file. For full details, see https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/data-formats
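As an illustration of this layout, the following sketch collects the band rasters of a downloaded product using only the standard library. The helper name list_band_files is hypothetical and is not part of pyeo; it assumes only the GRANULE/[granule_id]/IMG_DATA/ layout described above.

```python
from pathlib import Path


def list_band_files(safe_dir):
    """Return every .jp2 file below GRANULE/<granule_id>/IMG_DATA/, sorted by path."""
    safe_path = Path(safe_dir)
    # rglob searches IMG_DATA recursively, so files are found whether or not
    # they sit inside a resolution subfolder.
    return sorted(
        jp2
        for img_data in safe_path.glob("GRANULE/*/IMG_DATA")
        for jp2 in img_data.rglob("*.jp2")
    )
```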

There are two ways to refer to a given Sentinel-2 product: the UUID and the product ID. The UUID is an alphanumeric string (e.g. 22e7af63-07ad-4076-8541-f6655388dc5e), whereas the product ID is a human-readable string (more or less) containing all the information needed for unique identification of a product, split by the underscore character.
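To make the underscore-separated structure concrete, here is a small sketch that splits a product ID into named parts. The field names are labels chosen for this example, following the seven-part compact Sentinel-2 naming convention as an assumption; the product ID in the usage note is made up for illustration. Neither is taken from pyeo.

```python
from typing import NamedTuple


class ProductIdParts(NamedTuple):
    # Field names are illustrative labels, not pyeo identifiers.
    mission: str
    product_level: str
    sensing_start: str
    baseline: str
    relative_orbit: str
    tile: str
    discriminator: str


def split_product_id(product_id):
    """Split a seven-part product ID on underscores, ignoring a trailing .SAFE."""
    name = product_id[:-5] if product_id.endswith(".SAFE") else product_id
    parts = name.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated parts, got {len(parts)}")
    return ProductIdParts(*parts)
```

For example, split_product_id("S2A_MSIL2A_20190613T123002_N0212_R052_T48MXU_20190613T160000.SAFE").tile gives "T48MXU".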

Query Data Structure

The legacy query functions return a dictionary. Each key of the dictionary is the UUID of a product; each value is a further set of nested dictionaries containing information about the product to be downloaded. The CDSE query function, query_dataspace_by_polygon(), returns a pandas DataFrame instead.

Data Download Source

The only download source currently provided is via the Copernicus Dataspace Ecosystem (CDSE): https://documentation.dataspace.copernicus.eu/APIs/SentinelHub/Catalog.html

  • Copernicus DataSpace Ecosystem

    Images are downloaded in .zip format, and pyeo handles the unzipping and the conversion from .jp2 to .tif. Users do not need to be registered with the CDSE to query images, but pyeo expects the user to have provided a valid username and password. The main change from SciHub to CDSE is that Sentinel-2 products are no longer archived beyond a certain time-frame, i.e. the products in the CDSE are always online.

Legacy Download Sources

There are three other legacy download sources which this library no longer supports as they are deprecated in favour of the now active CDSE.

  • Scihub

    The Copernicus Open-Access Hub was the default option for downloading Sentinel-2 images. Images are downloaded in .zip format, and then automatically unzipped. Users are required to register with a username and password before downloading, and each username is limited to two concurrent downloads. Scihub is entirely free. Older images are moved to the long-term archive and have to be requested.

  • AWS

    Sentinel data is also publicly hosted on Amazon Web Services. This storage is provided by Sinergise, and is normally updated a few hours after new products are made available. There is a small charge associated with downloading this data. To access the AWS repository, you are required to register an Amazon Web Services account (including providing payment details) and obtain an API key for that account. See https://aws.amazon.com/s3/pricing/ for pricing details; the relevant table is Data Transfer Pricing for the EU (Frankfurt) region. There is no limit to the concurrent downloads for the AWS bucket.

  • USGS

    Landsat data is hosted and provided by the US Geological Survey. You can sign up at https://ers.cr.usgs.gov/register/

Functions

pyeo.queries_and_downloads.build_dataspace_request_string(max_cloud_cover: int, start_date: str, end_date: str, area_of_interest: str, max_records: int) → str

This function:

Builds the API product request string based on given properties and constraints.

Parameters:
  • max_cloud_cover (int) – Maximum cloud cover to allow in the queried products

  • start_date (str) – Starting date of the observations (YYYY-MM-DD format)

  • end_date (str) – Ending date of the observations (YYYY-MM-DD format)

  • area_of_interest (str) – Area of interest geometry as a string in WKT format

  • max_records (int) – Maximum number of products to show per query (queries with very high numbers may not complete in time)

Returns:

request_string – API Request String

Return type:

str
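The general idea of building such a request string can be sketched with the standard library. This is not pyeo's implementation: the base URL is a placeholder and the query parameter names are assumptions made for this example, so the real endpoint and parameter names should be checked against the CDSE documentation.

```python
from urllib.parse import urlencode

# Placeholder endpoint, NOT the real CDSE URL.
BASE_URL = "https://example.invalid/search.json"


def build_request_string(max_cloud_cover, start_date, end_date,
                         area_of_interest, max_records):
    """Assemble a URL-encoded query string from the search constraints."""
    params = {
        # All parameter names below are hypothetical.
        "cloudCover": f"[0,{max_cloud_cover}]",
        "startDate": start_date,          # YYYY-MM-DD
        "completionDate": end_date,       # YYYY-MM-DD
        "geometry": area_of_interest,     # WKT string
        "maxRecords": max_records,
    }
    # urlencode escapes the spaces, commas and parentheses in the WKT geometry.
    return f"{BASE_URL}?{urlencode(params)}"
```

Encoding matters here because WKT geometries contain characters that are not valid in a raw URL.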

pyeo.queries_and_downloads.check_for_s2_data_by_date(aoi_path, start_date, end_date, conf, cloud_cover=100, tile_id='None', verbose=False, producttype=None, filename=None)

Gets all the products between start_date and end_date. Wraps sent2_query to avoid having passwords and long-format timestamps in code.

Parameters:
  • aoi_path (str) – Path to a geojson file containing a polygon of the outline of the area you wish to download. See www.geojson.io for a tool to build these. If no geojson file is provided, a tile_id is required. In that case, aoi_path should point to the root directory for the processing run in which all subdirectories will be created.

  • start_date (str) – Start date in the format yyyymmdd.

  • end_date (str) – End date of the query in the format yyyymmdd

  • conf (dict) –

    Output from a configuration file containing your username and password for the ESA hub. If needed, this can be dummied with a dictionary of the following format:

    conf={'sent_2':{'user':'your_username', 'pass':'your_pass'}}
    

  • cloud_cover (int) – The maximum level of cloud cover in images to be downloaded. Default: 100 (all images returned)

  • tile_id (str) – Sentinel-2 granule ID - only required if no geojson file is given and tile-based processing is selected. Default: ‘None’ - no tile-based search but aoi-based search

  • verbose (boolean) – If True, log additional text output.

  • producttype (str) – Sentinel-2 product type to be used in the query. Default: None

  • filename (str) – Sentinel-2 file name pattern to be used in the query. Default: None

Returns:

result – A dictionary of Sentinel 2 products.

Return type:

dict
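The conf dictionary described above can be produced from an INI-style configuration file. The sketch below assumes a file with a [sent_2] section holding user and pass entries; that layout, and the helper name conf_from_ini_text, are assumptions for illustration rather than pyeo's documented configuration format.

```python
import configparser


def conf_from_ini_text(ini_text):
    """Turn INI text into a nested dict such as {'sent_2': {'user': ..., 'pass': ...}}."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {section: dict(parser[section]) for section in parser.sections()}
```

Keeping credentials in a separate file and loading them this way avoids hard-coding passwords in scripts.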

pyeo.queries_and_downloads.download_dataspace_product(product_uuid: str, dataspace_username: str, dataspace_password: str, product_name: str, safe_directory: str, log: Logger) → None

This function:

Downloads a Sentinel-2 product using the given product UUID from the ESA servers.

Parameters:
  • product_uuid (str) – UUID of the product to download

  • dataspace_username (str) – username used to access the CDSE.

  • dataspace_password (str) – password associated with username.

  • product_name (str) – Name of the product

  • safe_directory (str) – The directory (path) to write the SAFE files to

  • log (logging.Logger) – Log object to write to.

Return type:

None

Notes

Registration to the Copernicus Dataspace Ecosystem (CDSE) is free. Register here: https://dataspace.copernicus.eu

pyeo.queries_and_downloads.download_from_aws_with_rollback(product_id: str, folder: str, uuid: str, user: str, passwd: str) → None

Attempts to download a single product from AWS using product_id; if not found, falls back to Scihub using the UUID.

Parameters:
  • product_id (str) – The product ID (L2A…)

  • folder (str) – The folder to download the .SAFE file to.

  • uuid (str) – The product UUID (4dfB4-432df….)

  • user (str) – Scihub username

  • passwd (str) – Scihub password

Return type:

None

pyeo.queries_and_downloads.download_from_scihub(product_uuid: str, out_folder: str, user: str, passwd: str) → None

Downloads and unzips product_uuid from scihub.

Parameters:
  • product_uuid (str) – The product UUID (e.g. 4dfB4-432df….)

  • out_folder (str) – The folder to save the .SAFE file to

  • user (str) – Scihub username

  • passwd (str) – Scihub password

Returns:

  • 0 (No error)

  • 1 (HTTP Error in server response)

Notes

If interrupted mid-download, there will be a .incomplete file in the download folder. You might need to remove this for further processing.

Copernicus Open Access Hub no longer stores all products online for immediate retrieval. Offline products can be requested from the Long Term Archive (LTA) and should become available within 24 hours. Copernicus Open Access Hub’s quota currently permits users to request an offline product every 30 minutes.

A product’s availability can be checked with a regular OData query by evaluating the Online property value or by using the is_online() convenience method. When trying to download an offline product with download() it will trigger its retrieval from the LTA.

Given a list of offline and online products, download_all() will download online products, while concurrently triggering the retrieval of offline products from the LTA. Offline products that become online while downloading will be added to the download queue. download_all() terminates when the download queue is empty, even if not all products were retrieved from the LTA. We suggest repeatedly calling download_all() to download all products, either manually or using a third-party library, e.g. tenacity.

Source: https://sentinelsat.readthedocs.io/en/latest/api_overview.html
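The suggestion to call download_all() repeatedly can be sketched as a plain retry loop. In this sketch, download_all is a hypothetical callable that takes a list of product IDs and returns the IDs it managed to retrieve; it stands in for the real downloader and does not reproduce the sentinelsat method's signature or return value.

```python
import time


def download_until_done(download_all, pending, max_rounds=5, wait_seconds=0.0):
    """Call download_all repeatedly until nothing is pending or rounds run out.

    Returns the set of product IDs that were still not retrieved.
    """
    remaining = set(pending)
    for _ in range(max_rounds):
        if not remaining:
            break
        # Ask only for what is still missing; remove whatever was retrieved.
        remaining -= set(download_all(sorted(remaining)))
        if remaining and wait_seconds:
            # Give offline products time to come online before retrying.
            time.sleep(wait_seconds)
    return remaining
```

A library such as tenacity offers the same pattern with configurable back-off; the loop above only shows the control flow.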

pyeo.queries_and_downloads.download_landsat_data(products, out_dir, conf)

Given an output from landsat_query, will download all L1C products to out_dir.

Parameters:
  • products (str) – Dictionary of landsat products; must include downloadUrl and displayId

  • out_dir (str) – Directory to save Landsat files in. Folder structure is out_dir->displayId->products

  • conf (dict) – Dictionary containing USGS login credentials. See docs for landsat_query().

pyeo.queries_and_downloads.download_s2_data(new_data, l1_dir, l2_dir, source='scihub', user=None, passwd=None, try_scihub_on_fail=False)

Downloads S2 imagery from AWS, google_cloud or scihub. new_data is a dict from Sentinel_2.

Parameters:
  • new_data (dict) – A query dictionary containing the products you want to download

  • l1_dir (str) – The directory to download level 1 products to.

  • l2_dir (str) – The directory to download level 2 products to.

  • source ({'scihub', 'aws'}) – The source to download the data from. Can be ‘scihub’ or ‘aws’; see section introduction for details

  • user (str, optional) – The username for sentinelhub

  • passwd (str, optional) – The password for sentinelhub

  • try_scihub_on_fail (bool, optional) – If true, this function will fall back to downloading from Scihub on a failure of any other downloader. Defaults to False.

Raises:

BadDataSource – Raised when passed either a bad datasource or a bad image ID

pyeo.queries_and_downloads.download_s2_data_from_dataspace(product_df: DataFrame, l1c_directory: str, l2a_directory: str, dataspace_username: str, dataspace_password: str, log: Logger) → None

This function:

Wraps around download_dataspace_product, providing the necessary directories dependent on product type (L1C/L2A).

Parameters:
  • product_df (pd.DataFrame) – A Pandas DataFrame containing the products to download.

  • l1c_directory (str) – The path to the L1C download directory.

  • l2a_directory (str) – The path to the L2A download directory.

  • dataspace_username (str) – The username registered with the Copernicus Open Access Dataspace.

  • dataspace_password (str) – The password registered with the Copernicus Open Access Dataspace.

  • log (logging.Logger) – Log object to write to.

Return type:

None

pyeo.queries_and_downloads.download_s2_data_from_df(new_data, l1_dir, l2_dir, source='scihub', user=None, passwd=None, try_scihub_on_fail=False)

Downloads S2 imagery from AWS, google_cloud or scihub. new_data is a dataframe of Sentinel-2 query results.

Parameters:
  • new_data (pandas dataframe) – A query dataframe containing the products you want to download

  • l1_dir (str) – The directory to download level 1 products to.

  • l2_dir (str) – The directory to download level 2 products to.

  • source ({'scihub', 'aws'}) – The source to download the data from. Can be ‘scihub’ or ‘aws’; see section introduction for details

  • user (str, optional) – The username for sentinelhub

  • passwd (str, optional) – The password for sentinelhub

  • try_scihub_on_fail (bool, optional) – If true, this function will fall back to downloading from Scihub on a failure of any other downloader. Defaults to False.

Raises:

BadDataSource – Raised when passed either a bad datasource or a bad image ID

pyeo.queries_and_downloads.download_s2_pairs(l1_dir, l2_dir, conf)

Given a pair of folders, one containing l1 products and the other containing l2 products, will query and download missing data. At the end of the run, you will have two folders with a set of paired L1 and L2 products.

Parameters:
  • l1_dir (str) – The directory to download level 1 products to. May contain existing products.

  • l2_dir (str) – The directory to download level 2 products to. May contain existing products.

  • conf (dict) – A dictionary containing [‘sent_2’][‘user’] and [‘sent_2’][‘pass’]

pyeo.queries_and_downloads.filter_non_matching_s2_data(query_output)

Filters a query such that it only contains paired level 1 and level 2 data products.

Parameters:

query_output (dict) – Query list

Returns:

filtered_query – A dictionary of products containing only L1 and L2 data.

Return type:

dict

pyeo.queries_and_downloads.filter_to_l1_data(query_output)

Takes list of products from check_for_s2_data_by_date and removes all non Level 1 products.

Parameters:

query_output (dict) – A dictionary of products from a S2 query

Returns:

filtered_query – A dictionary of products containing only the L1C data products

Return type:

dict

pyeo.queries_and_downloads.filter_to_l2_data(query_output)

Takes list of products from check_for_s2_data_by_date and removes all non Level 2A products.

Parameters:

query_output (dict) – A dictionary of products from a S2 query

Returns:

filtered_query – A dictionary of products containing only the L2A data products

Return type:

dict
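The level filters above share one pattern: keep only the entries of the query dictionary whose processing level matches. The sketch below shows that pattern in isolation. The helper filter_by_level is hypothetical, and the accessor passed as get_level is an assumption standing in for a function such as get_query_level(), which is documented further down as returning ‘Level-1C’ or ‘Level-2A’.

```python
def filter_by_level(query_output, level, get_level):
    """Keep only the query entries whose processing level equals `level`.

    `get_level` maps one query item to its level string,
    e.g. 'Level-1C' or 'Level-2A'.
    """
    return {
        uuid: item
        for uuid, item in query_output.items()
        if get_level(item) == level
    }
```

Returning a new dictionary rather than deleting keys in place leaves the original query result available for further filtering.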

pyeo.queries_and_downloads.filter_unique_dataspace_products(l1c_products: DataFrame, l2a_products: DataFrame, log: Logger) → DataFrame

Returns the L1C products that have no counterpart L2A product.

Parameters:
  • l1c_products (pd.DataFrame) – Pandas DataFrame containing a list of L1C products.

  • l2a_products (pd.DataFrame) – Pandas DataFrame containing a list of L2A products.

  • log (logging.Logger) – logging object.

Returns:

unique_l1c_products – A Pandas DataFrame of L1C products that do not have counterpart L2A products.

Return type:

pd.DataFrame

pyeo.queries_and_downloads.filter_unique_l1c_and_l2a_data(df: DataFrame, log: Logger)

This function:

Filters a dataframe from a query result such that it contains only unique Sentinel-2 datatakes, based on ‘beginposition’. Retains L2A metadata and only retains L1C metadata if no L2A product for that datatake has been found.

Parameters:

df (pd.DataFrame) – pandas dataframe with query results

Returns:

l1c, l2a – Pandas dataframes containing only unique L1C and L2A datatakes

Return type:

Tuple[pd.DataFrame, pd.DataFrame]
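The selection rule described above (keep L2A per datatake, keep L1C only where no L2A exists) can be illustrated on plain dictionaries. This is a sketch of the rule, not the pandas-based implementation: the ‘beginposition’ key comes from the description above, while the "level" key and its "L1C"/"L2A" values are assumptions made for this example.

```python
def split_unique_datatakes(records):
    """Return (l1c, l2a) lists with one record per 'beginposition'.

    L2A records are always kept; an L1C record is kept only when no L2A
    record shares its 'beginposition'.
    """
    l2a = {}
    for rec in records:
        if rec["level"] == "L2A":
            # setdefault keeps the first record seen for each datatake.
            l2a.setdefault(rec["beginposition"], rec)

    l1c = {}
    for rec in records:
        if rec["level"] == "L1C" and rec["beginposition"] not in l2a:
            l1c.setdefault(rec["beginposition"], rec)

    return list(l1c.values()), list(l2a.values())
```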

pyeo.queries_and_downloads.get_access_token(dataspace_username: str | None = None, dataspace_password: str | None = None, refresh_token: str | None = None) → str

This function:

Creates an access token to use during download for verification purposes.

Parameters:
  • dataspace_username (str) – The username registered with the Copernicus Open Access Dataspace

  • dataspace_password (str) – The password registered with the Copernicus Open Access Dataspace

  • refresh_token (str, optional) – An existing refresh token. If supplied, it is used to refresh an old access token; if omitted (default None), a new access token is requested.

Returns:

response

Return type:

str
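The choice between requesting a new token and refreshing an old one can be sketched as the construction of the request payload. The token URL and the "cdse-public" client ID below reflect the public CDSE documentation as understood at the time of writing and should be treated as assumptions to verify; the helper build_token_payload is hypothetical and is not pyeo's implementation.

```python
# Assumed CDSE token endpoint; verify against the current CDSE documentation.
TOKEN_URL = (
    "https://identity.dataspace.copernicus.eu"
    "/auth/realms/CDSE/protocol/openid-connect/token"
)


def build_token_payload(username=None, password=None, refresh_token=None):
    """Build the form data for a token request.

    A refresh token takes priority; otherwise username and password are required.
    """
    if refresh_token is not None:
        return {
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "client_id": "cdse-public",  # assumed public client ID
        }
    if username is None or password is None:
        raise ValueError("provide username and password, or a refresh token")
    return {
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": "cdse-public",  # assumed public client ID
    }
```

The payload would then be sent as form data in a POST request to TOKEN_URL, with the access token read from the JSON response.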

pyeo.queries_and_downloads.get_granule_identifiers(safe_product_id)

Returns the parts of a Sentinel-2 product name that uniquely identify that granule at a moment in time.

Parameters:

safe_product_id (str) – The filename of a SAFE product

Returns:

  • satellite (str) – A string of either “S2A” or “S2B”

  • intake_date (str) – The timestamp of the data intake of this granule

  • orbit number (str) – The orbit number of this granule

  • granule (str) – The ID of this granule

pyeo.queries_and_downloads.get_query_datatake(query_item)

Gets the datatake timestamp of a query item.

Parameters:

query_item (dict) – An item from a query results dictionary.

Returns:

timestamp – The timestamp of that item’s datatake in the format yyyymmddThhmmss (Ex: 20190613T123002)

Return type:

str
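A timestamp in this yyyymmddThhmmss format can be converted to a datetime object with the standard library, which is useful for sorting or comparing datatakes. The helper name parse_datatake_timestamp is hypothetical.

```python
from datetime import datetime


def parse_datatake_timestamp(timestamp):
    """Parse a 'yyyymmddThhmmss' string, e.g. '20190613T123002', into a datetime."""
    return datetime.strptime(timestamp, "%Y%m%dT%H%M%S")
```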

pyeo.queries_and_downloads.get_query_filename(query_item)

Gets the filename element of a query

Parameters:

query_item (dict) – An item from a query results dictionary.

Returns:

filename – The filename element of that item.

Return type:

str

pyeo.queries_and_downloads.get_query_granule(query_item)

Gets the granule ID (ex: 48MXU) of a query

Parameters:

query_item (dict) – An item from a query results dictionary.

Returns:

granule_id – The granule ID of that item.

Return type:

str

pyeo.queries_and_downloads.get_query_level(query_item)

Returns the processing level of the query item.

Parameters:

query_item (dict) – An item from a query results dictionary.

Returns:

query_level – A string of either ‘Level-1C’ or ‘Level-2A’.

Return type:

str

pyeo.queries_and_downloads.get_query_processing_time(query_item)

Returns the processing timestamps of a query item

Parameters:

query_item (dict) – An item from a query results dictionary.

Returns:

processing_time – The date processing timestamp in the format yyyymmddThhmmss (Ex: 20190613T123002)

Return type:

str

pyeo.queries_and_downloads.landsat_query(conf, geojsonfile, start_date, end_date, cloud=50)

Queries the USGS dataset LANDSAT_8_C1 for imagery between the start_date and end_date, inclusive. This downloads all imagery touched by the bounding box of the provided geojson file.

Parameters:
  • conf (dict) – A dictionary with [‘landsat’][‘user’] and [‘landsat’][‘pass’] values, containing your USGS credentials.

  • geojsonfile (str) – The geojson file

  • start_date (str) – The start date, in “yyyymmdd” format. Will truncate any longer string.

  • end_date (str) – The end query date, in “yyyymmdd” format. Will truncate any longer string.

  • cloud (float) – The maximum cloud cover to return.

Returns:

products – A list of products; each item being a dictionary returned from the USGS API. See https://earthexplorer.usgs.gov/inventory/documentation/datamodel#Scene

Return type:

list of dict

pyeo.queries_and_downloads.load_api_key(path_to_api)

Returns an API key from a single-line text file containing that API key

Parameters:

path_to_api (str) – The path to a text file containing only the API key

Returns:

api_key – Returns the API key

Return type:

str

pyeo.queries_and_downloads.planet_query(aoi_path, start_date, end_date, out_path, api_key, item_type='PSScene4Band', search_name='auto', asset_type='analytic', threads=5)

Downloads data from Planetlabs for a given time period in the given AOI

Parameters:
  • aoi_path (str) – Filepath of a single-polygon geojson containing the aoi

  • start_date (str) – the inclusive start of the time window in UTC format

  • end_date (str) – the inclusive end of the time window in UTC format

  • out_path (filepath-like object) – A path to the output folder. Any identically-named imagery will be overwritten

  • item_type (str) – Image type to download (see Planet API docs)

  • search_name (str) – A name to refer to the search (required for large searches)

  • asset_type (str) – Planet asset type to download (see Planet API docs)

  • threads (int) – The number of downloads to perform concurrently

Notes

IMPORTANT: Will not run for searches returning greater than 250 items.

pyeo.queries_and_downloads.query_dataspace_by_polygon(max_cloud_cover: int, start_date: str, end_date: str, area_of_interest: str, max_records: int, log: Logger) → DataFrame

This function:

Returns a DataFrame of available Sentinel-2 imagery from the Copernicus Dataspace API.

Parameters:
  • max_cloud_cover (int) – Maximum Cloud Cover

  • start_date (str) – Start date of the images to query from in (YYYY-MM-DD) format

  • end_date (str) – End date of the images to query from in (YYYY-MM-DD) format

  • area_of_interest (str) – Region of interest centroid in WKT format

  • max_records (int) – Maximum records to return

  • log (logging.Logger) – Log object to write to.

Return type:

pd.DataFrame

pyeo.queries_and_downloads.query_for_corresponding_image(prod, conf)

Queries Copernicus Hub for the corresponding l1/l2 image to ‘prod’

Parameters:
  • prod (str) – The product name to query

  • conf (dict) – A dictionary containing [‘sent_2’][‘user’] and [‘sent_2’][‘pass’]

Returns:

out – A Sentinel-2 product dictionary

Return type:

dict

pyeo.queries_and_downloads.read_aoi(aoi_path)

Opens the geojson file for the aoi. If FeatureCollection, return the first feature.

Parameters:

aoi_path (str) – The path to the geojson file

Returns:

aoi_dict – A dictionary translation of the feature inside the .json file

Return type:

dict
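The behaviour described above (open the geojson, return the first feature of a FeatureCollection) can be sketched with the json module. The helper name first_feature is hypothetical, and returning the object unchanged when it is not a FeatureCollection is an assumption of this sketch.

```python
import json


def first_feature(aoi_path):
    """Load a geojson file; for a FeatureCollection, return its first feature."""
    with open(aoi_path, encoding="utf-8") as f:
        aoi = json.load(f)
    if aoi.get("type") == "FeatureCollection":
        return aoi["features"][0]
    # Assumption: any other geojson object is returned as loaded.
    return aoi
```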

pyeo.queries_and_downloads.sent2_query(user, passwd, geojsonfile, start_date, end_date, cloud=100, tile_id='None', start_row=0, producttype=None, filename=None)

This function:

Fetches a list of Sentinel-2 products

Parameters:
  • user (str) – Username for ESA hub. Register at https://scihub.copernicus.eu/dhus/#/home

  • passwd (str) – password for the ESA Open Access hub

  • geojsonfile (str) – Path to a geojson file containing a polygon of the outline of the area you wish to download. See www.geojson.io for a tool to build these. If no geojson file is provided, a tile_id is required. In that case, geojsonfile should point to the root directory for the processing run in which all subdirectories will be created.

  • start_date (str) – Date of beginning of search in the format YYYY-MM-DDThh:mm:ssZ (ISO standard)

  • end_date (str) – Date of end of search in the format YYYY-MM-DDThh:mm:ssZ. See https://www.w3.org/TR/NOTE-datetime, or use check_for_s2_data_by_date

  • cloud (int, optional) – The maximum cloud cover percentage (as calculated by Copernicus) to download. Defaults to 100%

  • tile_id (str) – Sentinel-2 granule ID - only required if no geojson file is given and tile-based processing is selected. Default: ‘None’ - do the search by geojson extent.

  • start_row (int) – integer of the start row of the query results, can be 0,100,200,… if more than 100 results are returned

  • producttype (str) – string describing the product type, e.g. ‘S2MSI2A’ or ‘S2MSI1C’

  • filename (str) – file name pattern to be used in the query

Returns:

products – A dictionary of Sentinel-2 granule products that are touched by your AOI polygon, keyed by product ID. Returns both level 1 and level 2 data.

Return type:

dict

Notes

If you get a ‘request too long’ error, it is likely that your polygon is too complex. The following functions download by granule; there is no need to have a precise polygon at this stage.
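Because sent2_query expects long-format timestamps while check_for_s2_data_by_date takes yyyymmdd dates, a conversion between the two is needed somewhere. The sketch below shows one way to do it; the helper name to_query_timestamp is hypothetical, and the choice of midnight for start dates and 23:59:59 for end dates is an assumption, since this page does not state how the wrapper pads the time of day.

```python
from datetime import datetime


def to_query_timestamp(date_yyyymmdd, end_of_day=False):
    """Convert 'yyyymmdd' to 'YYYY-MM-DDThh:mm:ssZ'.

    The time of day is 00:00:00, or 23:59:59 when end_of_day is True.
    """
    day = datetime.strptime(date_yyyymmdd, "%Y%m%d")
    if end_of_day:
        day = day.replace(hour=23, minute=59, second=59)
    return day.strftime("%Y-%m-%dT%H:%M:%SZ")
```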

pyeo.queries_and_downloads.shapefile_to_wkt(shapefile_path)

Converts a shapefile to a well-known text (wkt) format

Parameters:

shapefile_path (str) – Path to the shapefile to convert

Returns:

wkt – A WKT string containing the geometry of the first feature of the first layer of the shapefile

Return type:

str