EEDL Full API Reference

Submodules

eedl.image module

class eedl.image.EEDLImage(**kwargs)[source]

Bases: object

The main class that does all the work. Any use of this package ultimately instantiates this class for each export the user wants to do. As we refine this, we may be able to provide just a single function in this module named “export” or something of that sort for people who don’t need access to control class behavior. That will likely follow all the other enhancements, like converting the exports into async code.

The class has no required arguments as of 6/16/2023, but that may change. Any arguments provided get applied directly to the class and override any defaults. It may be good to set crs and scale here to make sure they are set correctly for the images you plan to export, but those can also be provided as kwargs to the .export method.

Note that in the arguments below, the ones prefixed by zonal are only used if you configure the mosaic_and_zonal callback on the TaskRegistry. Otherwise, when calling the zonal_stats method, you need to provide the parameters there.

Options at class instantiation include:

Parameters:

crs (Optional[str]) – Coordinate Reference System to use for exports, in a format Earth Engine understands.
scale (Optional[int]) – Scale parameter to pass to Earth Engine for export. Defaults to 30.
tile_size (Optional[int]) – The number of pixels per side of tiles to export
export_folder (Optional[Union[str, Path]]) – The name of the folder in the chosen export location that will be created for the export
cloud_bucket (Optional[str]) – The name of the Google Cloud storage bucket to use for exports. Setting this parameter doesn’t automatically configure output to the bucket: when running .export you also need to specify a cloud export (instead of a drive export).
output_folder (Optional[Union[str, Path]]) – The folder, local to your system running the code, to export the finished images and optional zonal statistics files to.
zonal_polygons (Optional[Union[str, Path]]) – The path to a fiona-compatible polygon vector data file (e.g. a shapefile, geopackage layer, or other). Only used with the mosaic_and_zonal callback. See note above.
zonal_stats_to_calc (Optional[Tuple]) – Tuple of zonal statistics to calculate. For example ('min', 'max', 'mean'). See the documentation for the rasterstats package for full options. Only used with the mosaic_and_zonal callback. See note above.
zonal_keep_fields (Optional[Tuple]) – Which fields should be preserved (passed through) from the spatial input data to the output zonal data. You will want to at least define the row’s ID/key value here (in a tuple, such as ('ID',)) so you can join the zonal stats back to spatial data, but you can optionally include any other fields as well. Only used with the mosaic_and_zonal callback. See note above.
zonal_use_points (bool) – Switch rasterstats to extract using gen_point_query instead of gen_zonal_stats. See rasterstats package documentation for complete information. gen_point_query gets the values of a raster at all vertex locations when provided with a polygon or line. If provided points, it extracts those point values. Interpolation is set to nearest to perform an exact extraction of the cell values. In this codebase’s usage, it’s assumed that the “features” parameter to the zonal stats function will be a points dataset (still in the same CRS as the raster) when use_points is True. Additionally, when this is True, the stats argument is ignored, as only a single value will be extracted as the attribute value in the output CSV. Default is False. Only used with the mosaic_and_zonal callback. See note above.
zonal_output_filepath (Optional[Union[str, Path]]) – Only used with the mosaic_and_zonal callback. See note above.
zonal_inject_constants (dict) – Only used with the mosaic_and_zonal callback. See note above.
zonal_nodata_value (int) – Only used with the mosaic_and_zonal callback. See note above.
zonal_all_touched (bool) – Only used with the mosaic_and_zonal callback. See note above.

static check_mosaic_exists(download_location: str | Path, export_folder: str | Path, filename: str)[source]

Checks whether the mosaic for an export already exists so that processing can be skipped. This function isn’t ideal because it duplicates information: you need to pass in values that are also provided elsewhere and assume the file naming matches, rather than calculating the paths earlier in the process. That’s currently necessary because the task registry sets the download location, and avoiding it would require a big refactor that’s probably not worth it.

download_results(download_location: str | Path, callback: str | None = None, drive_wait: int = 15) → None[source]

Handles the download and optional postprocessing of the current image to the folder specified by download_location. Users of EEDL won’t need to invoke this except in very advanced situations. The Task Registry automatically invokes this method when it detects that the image has completed exporting.

Parameters:

download_location (Union[str, Path]) – The directory where the results should be downloaded to. Expects a string path or a Pathlib Path object.
callback (Optional[str]) – The callback to run once the image has been downloaded, given as a string (for example, mosaic_and_zonal).
drive_wait (int) – The amount of time in seconds to wait to allow for files that Earth Engine reports have been exported to actually populate. Default is 15 seconds.

Returns:

None

export(image: Image, filename_suffix: str, export_type: str = 'drive', clip: Geometry | None = None, strict_clip: bool | None = False, drive_root_folder: str | Path | None = None, **export_kwargs: Unpack) → None[source]

Handles the exporting of an image.

Determines the values to provide to tile and export the image, then calls Earth Engine’s image export functionality, creates an export task, and starts the task running. Importantly, this code does not wait for the task to finish exporting, or do any downloading of the image. It just starts the export process on EE’s servers in a way that can be tracked later. This is by design because if you want to export many images, what is most efficient is to start all of the image tasks, and then wait for all of them at once.

The drive_root_folder parameter must be set if export_type is drive, but is optional when export_type is cloud. When export_type is cloud you must provide the name of the Google Cloud storage bucket to export to in the bucket parameter. Both export types have configuration requirements and limitations discussed in Working with Export Locations - Drive vs. Cloud.

Note that this method returns None - if you wish to save an object for tracking and to obtain the path to the mosaicked image after everything is downloaded, keep the whole class instance object (or save it into a list, etc).

Parameters:

image (ee.image.Image) – Image for export.
filename_suffix (str) – The unique identifier used internally to identify images.
export_type (str) – Specifies how the image should be exported. Either “cloud” or “drive”. Defaults to “drive”.
clip (Optional[ee.geometry.Geometry]) – Defines the region of interest for export - does not perform a strict clip, which is often slower. Instead, it uses the Earth Engine export’s “region” parameter to clip the results to the bounding box of the clip geometry. To clip to the actual geometry, set strict_clip to True.
strict_clip (Optional[bool]) – When set to True, performs a true clip on the result so that it’s not just the bounding box but also the actual clipping geometry. Defaults to False.
drive_root_folder (Optional[Union[str, Path]]) – The folder on your computer that has the root of your Google Drive installation (e.g. G:\My Drive on Windows) if “drive” is the provided export type.
bucket (Optional[str]) – The Google Cloud bucket to place the exported images into if using the cloud export_type.
export_kwargs (Unpack[EExportDict]) – An optional dictionary of keyword arguments that gets passed directly to Earth Engine’s image export method. Overrides any values EEDL manually calculates, if a key is set here that is also set elsewhere (such as on the class or derived in the method).

Returns:

None
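The difference between clip and strict_clip can be illustrated without Earth Engine. The sketch below is not EEDL code: the triangle, the pixel location, and the helper functions are all hypothetical, and it only shows that a bounding-box clip keeps locations that a strict clip to the geometry would drop.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a list of vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounding_box(point: Point, polygon: List[Point]) -> bool:
    """True if the point falls inside the polygon's bounding box."""
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    return min_x <= point[0] <= max_x and min_y <= point[1] <= max_y


def in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting point-in-polygon test against the actual geometry."""
    x, y = point
    inside = False
    # Walk each edge, pairing every vertex with the next one (wrapping around).
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical clip geometry (a right triangle) and a pixel centre to test.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
pixel = (3.0, 3.0)

kept_by_default_clip = in_bounding_box(pixel, triangle)  # bounding-box behaviour
kept_by_strict_clip = in_polygon(pixel, triangle)        # strict_clip behaviour
print(kept_by_default_clip, kept_by_strict_clip)
```

The pixel lies inside the triangle’s bounding box but outside the triangle itself, which is the kind of location that only strict_clip removes.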

property last_task_status: Dict[str, str]

Returns the status of the image’s task on Earth Engine’s servers, as last reported when it was checked. Reading this property does not re-check the task status. Instead, those checks are scheduled via the TaskRegistry and stored for access here.

The task status reported by Earth Engine comes back as a dictionary, which is what is stored here. We pay the most attention to the “STATE” key, which indicates where processing of the export is in the lifecycle. See https://developers.google.com/earth-engine/guides/processing_environments#task_lifecycle for more information on task lifecycles and possible values.

Earth Engine’s documentation also has an example with the full set of keys and example values for a task status dictionary: https://developers.google.com/earth-engine/guides/python_install#create-an-export-task

Returns: Dict[str, str]: The value of the private variable “_last_task_status”.

mosaic() → None[source]

Mosaics the individual pieces of the image into the complete image.

EEDL works by configuring Earth Engine to tile large exports - while Earth Engine has limits on individual image sizes for export, it can slice large images into tiles so that each individual image fits within those limits. EEDL sets Earth Engine to tile exports, then tracks the individual pieces (whether one or thousands) through downloading. In this function, it then mosaics those pieces back into one image you can use locally, so you don’t need to handle the individual tiles (and all of their edges).

Returns: None
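To give a rough sense of how many pieces a tiled export can produce, the sketch below estimates a tile count. It assumes square tiles of tile_size pixels per side and counts partial tiles at the edges as whole files; the image dimensions and the helper name are hypothetical, and Earth Engine’s actual tiling may differ.

```python
import math


def estimate_tile_count(width_px: int, height_px: int, tile_size: int) -> int:
    """Estimate how many tiles cover an image of the given pixel dimensions."""
    columns = math.ceil(width_px / tile_size)  # partial tiles still count as a file
    rows = math.ceil(height_px / tile_size)
    return columns * rows


# Hypothetical 30000 x 20000 pixel image exported with 12800-pixel tiles.
tiles = estimate_tile_count(30000, 20000, 12800)
print(tiles)
```

Whatever the count, the mosaic step is what turns those pieces back into a single local image.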

mosaic_and_zonal() → None[source]

A callback that takes no parameters and runs both the mosaic and zonal stats methods. The zonal stats configuration comes from the zonal parameters set on the class instance instead of from arguments. Users of EEDL could invoke this directly, but it’s really designed to be invoked via a callback on the Task Registry, with prior code configuring the zonal statistics extractions.

zonal_stats(polygons: str | Path, keep_fields: Tuple[str, ...] = ('UniqueID', 'CLASS2'), stats: Tuple[str, ...] = ('min', 'max', 'mean', 'median', 'std', 'count', 'percentile_10', 'percentile_90'), report_threshold: int = 1000, write_batch_size: int = 2000, use_points: bool = False, inject_constants: dict | None = None, nodata_value: int = -9999, all_touched: bool = False) → None[source]

Parameters:

polygons (Union[str, Path])
keep_fields (tuple[str, ...])
stats (tuple[str, ...])
report_threshold (int) – After how many iterations should it print out the feature number it’s on. Defaults to 1000. Set to None to disable.
write_batch_size (int) – How many zones should we store up before writing to the disk? Defaults to 2000.
use_points (bool)
inject_constants (Optional[dict])
nodata_value (int)
all_touched (bool)

Returns:

None

class eedl.image.EEExportDict[source]

Bases: TypedDict

bucket: NotRequired[str | None]

crs: str | None

description: str

fileDimensions: int | None

fileNamePrefix: str

folder: NotRequired[str | Path | None]

maxPixels: int | float

region: NotRequired[Geometry]

scale: int | float
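The keys above correspond to the keyword arguments passed to Earth Engine’s image export. As a sketch of the documented precedence (values in export_kwargs override values EEDL calculates), a plain dictionary merge behaves the same way. The function name and all values below are hypothetical, and this is not necessarily how EEDL implements the override.

```python
from typing import Any, Dict


def merge_export_arguments(calculated: Dict[str, Any], export_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Combine calculated export arguments with user-supplied overrides."""
    # In a dict merge, later entries win, so user-supplied keys take precedence.
    return {**calculated, **export_kwargs}


# Hypothetical values standing in for what would be derived before export.
calculated = {
    "description": "demo_image",
    "fileNamePrefix": "demo_image",
    "crs": "EPSG:4326",
    "scale": 30,
    "maxPixels": 1e12,
    "fileDimensions": 12800,
}

merged = merge_export_arguments(calculated, {"scale": 10})
print(merged["scale"], merged["crs"])
```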

class eedl.image.TaskRegistry[source]

Bases: object

The TaskRegistry class makes it convenient to manage arbitrarily many Earth Engine images that are in varying states of being downloaded.

COMPLETE_STATUSES = ['COMPLETED']

FAILED_STATUSES = ['CANCEL_REQUESTED', 'CANCELLED', 'FAILED']

INCOMPLETE_STATUSES = ('READY', 'UNSUBMITTED', 'RUNNING')
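The three constants above partition Earth Engine task states into groups. The sketch below copies their values and adds a hypothetical classify_state helper (not part of EEDL) to show how a state string maps onto a group.

```python
# Values copied from the class attributes listed above.
COMPLETE_STATUSES = ["COMPLETED"]
FAILED_STATUSES = ["CANCEL_REQUESTED", "CANCELLED", "FAILED"]
INCOMPLETE_STATUSES = ("READY", "UNSUBMITTED", "RUNNING")


def classify_state(state: str) -> str:
    """Map a task state string to 'complete', 'failed', 'incomplete', or 'unknown'."""
    if state in COMPLETE_STATUSES:
        return "complete"
    if state in FAILED_STATUSES:
        return "failed"
    if state in INCOMPLETE_STATUSES:
        return "incomplete"
    return "unknown"  # a state not covered by any of the three groups


for example_state in ("RUNNING", "COMPLETED", "CANCELLED"):
    print(example_state, "->", classify_state(example_state))
```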

add(image: EEDLImage) → None[source]

Adds an EEDLImage to the registry’s list of images.

Parameters: image (EEDLImage) – The image object to be added to the list of images
Returns: None

property complete_tasks: List[EEDLImage]

List of EEDLImage objects whose Earth Engine tasks have completed.

Returns: List of EEDLImage objects whose Earth Engine tasks have completed.
Return type: List[EEDLImage]

download_ready_images(download_location: str | Path) → None[source]

Downloads all images that are ready to be downloaded.

Parameters: download_location (Union[str, Path]) – Destination for downloaded files.
Returns: None

property downloadable_tasks: List[EEDLImage]

List of EEDLImage objects whose tasks have neither been cancelled nor failed.

Returns: List of EEDLImage objects whose tasks have neither been cancelled nor failed.
Return type: List[EEDLImage]

property failed_tasks: List[EEDLImage]

List of EEDLImage objects whose tasks have either been cancelled or have failed.

Returns: List of EEDLImage objects whose tasks have failed or have been cancelled.
Return type: List[EEDLImage]

property incomplete_tasks: List[EEDLImage]

List of EEDLImage objects whose tasks have not completed yet.

Returns: List[EEDLImage]: List of EEDLImage objects whose tasks have not completed yet.

log_error(error_type: str, error_message: str)[source]

Parameters:

error_type (str) – Options “ee”, “local” to indicate whether it was an error on Earth Engine’s side or on the local processing side
error_message (str) – The error message to print to the log file

Returns:

None

setup_log(log_file_path: str | Path, mode='a')[source]

wait_for_images(download_location: str | Path, sleep_time: int = 10, callback: str | None = None, try_again_disk_full: bool = True, on_failure: str = 'log') → None[source]

Tells EEDL to wait until there are no more incomplete or downloadable tasks left. It will block execution of any following code until all images have been downloaded and processed. Any code that runs afterward can rely on the images being downloaded to disk.

Parameters:

download_location (Union[str, Path]) – Destination for downloaded files.
sleep_time (int) – Time between checking if downloads are available (complete) in seconds. Defaults to 10 seconds. This is also the interval where task statuses are updated on image objects.
callback (Optional[str]) – Optional callback function. Executed after image has been downloaded and allows for processing of completed images even while waiting for other images to complete on Earth Engine’s servers.
try_again_disk_full (bool) – Will continuously retry to download images that are ready, even if it fails to do so initially because the disk is full. This allows you to get the warning that the disk is full, then clear out space to allow processing to complete, without restarting the exports or processing.
on_failure (str) – *Needs language*

Returns:

None
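The blocking behavior described above amounts to a polling loop. The sketch below simulates one with made-up task names and scripted state sequences; wait_for_all and its arguments are hypothetical and do not call Earth Engine or EEDL. It only illustrates why images that finish early can be processed while others are still running.

```python
import time
from typing import Callable, Dict, List

INCOMPLETE = ("READY", "UNSUBMITTED", "RUNNING")


def wait_for_all(
    states: Dict[str, List[str]],
    sleep_time: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll simulated tasks until none are incomplete; return the download order.

    `states` maps a task name to the states it reports, one per poll. The last
    state repeats once the list is exhausted, so it must not be an incomplete
    state or the loop would never end.
    """
    downloaded: List[str] = []
    finished = set()
    poll = 0
    while len(finished) < len(states):
        for name, sequence in states.items():
            if name in finished:
                continue
            state = sequence[min(poll, len(sequence) - 1)]
            if state in INCOMPLETE:
                continue  # check this task again on the next poll
            finished.add(name)
            if state == "COMPLETED":
                downloaded.append(name)  # a real run would download and process here
        if len(finished) < len(states):
            sleep(sleep_time)  # wait before checking the remaining tasks
        poll += 1
    return downloaded


order = wait_for_all({
    "a": ["RUNNING", "COMPLETED"],
    "b": ["COMPLETED"],
    "c": ["RUNNING", "RUNNING", "FAILED"],
})
print(order)
```

Task “b” is handled on the first poll even though “a” and “c” are still running, and the failed task “c” is never downloaded.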

eedl.image.download_images_in_folder(source_location: str | Path, download_location: str | Path, prefix: str) → None[source]

Handles pulling data from Google Drive over to a local location, filtering by a filename prefix and folder.

Parameters:

source_location (Union[str, Path]) – Directory to search for files.
download_location (Union[str, Path]) – Destination for files with the specified prefix.
prefix (str) – A prefix to use to filter items in the folder - only files where the name matches this prefix will be moved.

Returns:

None
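As a minimal sketch of prefix-filtered moving between folders, the example below uses only the standard library and temporary directories. move_files_with_prefix and the file names are hypothetical; this is an illustration of the idea, not the function’s actual implementation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_files_with_prefix(source_location: Path, download_location: Path, prefix: str) -> List[str]:
    """Move files whose names start with `prefix`; return the moved file names."""
    download_location.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source_location.iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            shutil.move(str(path), str(download_location / path.name))
            moved.append(path.name)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "drive_export"  # stands in for a synced Drive folder
    target = Path(tmp) / "local"
    source.mkdir()
    for name in ("et_2021_tile1.tif", "et_2021_tile2.tif", "ndvi_2021_tile1.tif"):
        (source / name).write_bytes(b"")  # empty placeholder files

    moved_names = move_files_with_prefix(source, target, "et_2021")
    remaining_names = sorted(p.name for p in source.iterdir())

print(moved_names, remaining_names)
```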

eedl.helpers module

class eedl.helpers.CollectionExtractor(**kwargs)[source]

Bases: object

This is a simple layer on top of EEDLImage that will export each item in a collection. Importantly, it will attempt to minimize any regridding of the raster by not doing any kind of strict boundary filtering. You can provide a collection filtered to a geometry and then it will export all of the images inside without any kind of clipping. If you choose the same CRS for output as the input, this should avoid regridding the raster.

collection: ImageCollection | None = None

collection_band: str | None = None

mosaic_by_date: bool | None = True

time_end: str | None = None

time_start: str | None = None

class eedl.helpers.GroupedCollectionExtractor(**kwargs)[source]

Bases: object

The GroupedCollectionExtractor is currently the most powerful tool in the package, though it has some limits based on its assumptions.

Using this class for an export allows you to provide spatial data (as polygons) indicating regions of interest (ROIs), as well as a separate set of spatial data indicating the polygons to extract data for within each ROI. With that information, the class will export every image from a collection within the date range you provide, clipped to each ROI, then obtain zonal statistics for the polygons within each ROI.

You’ll need to pay special attention to the parameters to the initialization function.

on_error

What to do when an error occurs. Options are “log”, which writes it to string and disk, and “raise” which raises the error as an exception and stops execution of the class.

Type: str, “log”

skip_existing

Type: bool, True

keep_image_objects

Whether to store the EEDLImage objects as part of this class, so they can be accessed when it’s done. By default we don’t, to avoid using the RAM on large exports. This does not specify anything about whether the image data is written to disk - that happens automatically and by default.

Type: bool, False

extract()[source]

eedl.helpers.mosaic_by_date(image_collection)[source]

Adapted to Python from code found via https://gis.stackexchange.com/a/343453/1955

Parameters:

image_collection – An image collection

Returns:

ee.ImageCollection

eedl.merge module

A tool to merge separate timeseries outputs into a single data frame or DB table

eedl.merge.merge_csvs_in_folder(folder_path, output_path, sqlite_db=None, sqlite_table=None)[source]

eedl.merge.merge_many(base_folder, subfolder_name='alfalfa_et')[source]

eedl.merge.merge_outputs(file_mapping, date_field: str = 'et_date', sqlite_db: str | None = None, sqlite_table: str | None = None) → DataFrame[source]

Makes output zonal stats files into a data frame and adds a datetime field. Merges all inputs into one DF, and can optionally insert into a sqlite database.

Parameters:

file_mapping – A set of tuples with a path to a file and a time value (string or datetime) to associate with it.
date_field (str) – Defaults to “et_date”.
sqlite_db (Optional[str]) – Name of a sqlite database.
sqlite_table (Optional[str]) – Name of a table in the database.

Returns:

Pandas data frame with all file and time data.

Return type:

pandas.DataFrame
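The file_mapping argument pairs each zonal stats file with a time value. The sketch below builds such pairs from file names; the naming convention (a trailing _YYYY-MM-DD before the extension), the helper name, and the file names are all assumptions for illustration, and merge_outputs itself is not called here.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Tuple


def build_file_mapping(csv_paths: List[Path]) -> List[Tuple[Path, datetime]]:
    """Pair each CSV path with the date parsed from the end of its file name."""
    mapping = []
    for path in csv_paths:
        date_text = path.stem.rsplit("_", 1)[-1]  # text after the last underscore
        mapping.append((path, datetime.strptime(date_text, "%Y-%m-%d")))
    return mapping


# Hypothetical zonal statistics outputs, one per date.
paths = [Path("zonal_et_2021-01-01.csv"), Path("zonal_et_2021-02-01.csv")]
file_mapping = build_file_mapping(paths)

for csv_path, when in file_mapping:
    print(csv_path.name, when.date())
```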

eedl.merge.plot_merged(df: DataFrame, et_field: str, date_field: str = 'et_date', uniqueid: str = 'UniqueID') → Plot[source]

Creates a seaborn plot of the data

Parameters:

df (pandas.DataFrame) – Data source for the plot.
et_field (str) – Name of the variable on the x-axis.
date_field (str) – Name of the variable on the y-axis. Default is “et_date”.
uniqueid (str) – Defines additional data subsets that transforms should operate on independently. Default is “UniqueID”.

Returns:

A seaborn objects plot.

Return type:

so.Plot

eedl.mosaic_rasters module

eedl.mosaic_rasters.mosaic_folder(folder_path: str | Path, output_path: str | Path, prefix: str = '') → None[source]

*Needs language*

Parameters:

folder_path (Union[str, Path]) – Location of the folder.
output_path (Union[str, Path]) – Output destination.
prefix (str) – Used to find the files of interest.

Returns:

None

eedl.mosaic_rasters.mosaic_rasters(raster_paths: Sequence[str | Path], output_path: str | Path, add_overviews: bool = True) → None[source]

Adapted from https://gis.stackexchange.com/a/314580/1955 and https://www.gislite.com/tutorial/k8024 along with other basic lookups on GDAL Python bindings

Parameters:

raster_paths (Sequence[Union[str, Path]]) – Locations of the rasters to mosaic
output_path (Union[str, Path]) – Output destination
add_overviews (bool)

Returns:

None

eedl.zonal module

eedl.zonal.zonal_stats(features: str | Path | Collection, raster: str | Path | None, output_folder: str | Path | None, filename: str, keep_fields: Iterable[str] = ('UniqueID', 'CLASS2'), stats: Iterable[str] = ('min', 'max', 'mean', 'median', 'std', 'count', 'percentile_10', 'percentile_90'), report_threshold: int = 1000, write_batch_size: int = 2000, use_points: bool = False, inject_constants: dict = {}, nodata_value: int = -9999, **kwargs) → str | Path | None[source]

If the raster and the polygons are not in the same CRS, this function will produce bad output.

Parameters:

features (Union[str, Path, fiona.Collection]) – Location to the features.
raster (Union[str, Path, None]) – Location of the raster.
output_folder (Union[str, Path, None]) – Output destination.
filename (str) – Name of the file.
keep_fields (Iterable[str]) – Fields to preserve (pass through) from the input features to the output.
stats (Iterable[str]) – The various statistical measurements to be computed.
report_threshold (int) – The number of iterations before it prints out the feature number it’s on. Default is 1000. Set to None to disable.
write_batch_size (int) – The number of zones that should be stored up before writing to disk.
use_points (bool) – Switch rasterstats to extract using gen_point_query instead of gen_zonal_stats. See rasterstats package documentation for complete information. gen_point_query gets the values of a raster at all vertex locations when provided with a polygon or line. If provided points, it extracts those point values. Interpolation is set to nearest to perform an exact extraction of the cell values. In this codebase’s usage, it’s assumed that the “features” parameter to this function will be a points dataset (still in the same CRS as the raster) when use_points is True. Additionally, when this is True, the stats argument to this function is ignored, as only a single value will be extracted as the attribute value in the output CSV. Default is False.
inject_constants (dict) – A dictionary of field: value mappings to inject into every row. Useful if you plan to merge the data later. For example, a raster may be a single variable and date, and we’re extracting many rasters. So for each zonal call, you could do something like inject_constants = {'date': '2021-01-01', 'variable': 'et'}, which would produce headers in the CSV for “date” and “variable” and added values in the CSV of “2021-01-01”, “et”.
kwargs – Passed through to rasterstats

Returns:

Return type:

Union[str, Path, None]
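The effect of inject_constants on the output can be sketched with the standard library. The rows, field names, and helper below are hypothetical and the code does not call rasterstats or EEDL; it only shows constant fields being added to every row of a CSV.

```python
import csv
import io
from typing import Any, Dict, List


def write_rows_with_constants(rows: List[Dict[str, Any]], inject_constants: Dict[str, Any]) -> str:
    """Add the constant fields to every row and return the rows as CSV text."""
    merged_rows = [{**row, **inject_constants} for row in rows]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(merged_rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged_rows)
    return buffer.getvalue()


# Hypothetical zonal results for two features.
rows = [{"UniqueID": "A1", "mean": 2.5}, {"UniqueID": "B2", "mean": 3.0}]
csv_text = write_rows_with_constants(rows, {"date": "2021-01-01", "variable": "et"})
print(csv_text)
```

Because every row carries the same date and variable values, outputs from many rasters can later be stacked into one table without losing track of which raster each row came from.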

eedl.google_cloud module

eedl.google_cloud.download_export(bucket_name: str, output_folder: str | Path, prefix: str, delimiter: str = '/', autodelete: bool = True) → None[source]

Downloads a blob from the specified bucket.

Modified from Google Cloud sample documentation at: https://cloud.google.com/storage/docs/samples/storage-download-file#storage_download_file-python and https://cloud.google.com/storage/docs/samples/storage-list-files-with-prefix

Parameters:

bucket_name (str) – Name of the Google Cloud Storage Bucket to pull data from.
output_folder (Union[str, Path]) – Destination folder for exported data.
prefix (str) – A prefix to use to filter items in the bucket - only URLs where the path matches this prefix will be returned - defaults to all files.
delimiter (str) – Delimiter used for getting the list of blobs in the Google Cloud Storage Bucket. Defaults to “/”.
autodelete (bool) – Whether to delete blobs once their contents have been downloaded. Defaults to True.

Returns:

None

eedl.google_cloud.download_public_export(bucket_name: str, output_folder: str | Path, prefix: str = '') → None[source]

Parameters:

bucket_name (str) – Name of the Google Cloud Storage Bucket to pull data from.
output_folder (Union[str, Path]) – Destination folder for exported data.
prefix (str) – A prefix to use to filter items in the bucket - only URLs where the path matches this prefix will be returned - defaults to all files.

Returns:

None

eedl.google_cloud.get_public_export_urls(bucket_name: str, prefix: str = '') → List[str][source]

Downloads items from a public Google Cloud Storage Bucket without using a GCloud login. Filters only to files with the specified prefix.

Parameters:

bucket_name (str) – Name of the Google Cloud Storage Bucket to pull data from.
prefix (str) – A prefix to use to filter items in the bucket - only URLs where the path matches this prefix will be returned - defaults to all files.

Returns:

A list of urls.

Return type:

List[str]