EEDL Full API Reference

Submodules

eedl.image module

class eedl.image.EEDLImage(**kwargs)[source]

Bases: object

The main class of the package. Instantiate it once for each export you want to run. As the package is refined, this module may gain a single convenience function (named “export” or similar) for users who don’t need to control class behavior. That will likely come after the other planned enhancements, such as converting the exports to async code.

The class has no required arguments as of 6/16/2023, but that may change. Any arguments provided get applied directly to the class and override any defaults. Options include:

Parameters:

crs (Optional[str]) – Coordinate Reference System to use for exports in a format Earth Engine understands, such as “EPSG:3310”
tile_size (Optional[int]) – The number of pixels per side of tiles to export
export_folder (Optional[Union[str, Path]]) – The name of the folder in the chosen export location that will be created for the export
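
The way keyword arguments override defaults can be pictured with a small stand-in class. This is only a sketch of the pattern described above, not the EEDLImage source: the class name ExportSettings and the default values are hypothetical.

```python
from pathlib import Path
from typing import Optional, Union


class ExportSettings:
    """Hypothetical stand-in showing kwargs overriding class-level defaults."""

    # Assumed defaults for illustration only; the real defaults may differ.
    crs: Optional[str] = None
    tile_size: Optional[int] = None
    export_folder: Optional[Union[str, Path]] = None

    def __init__(self, **kwargs):
        # Each provided keyword becomes an instance attribute, shadowing
        # the class-level default of the same name.
        for key, value in kwargs.items():
            setattr(self, key, value)


settings = ExportSettings(crs="EPSG:3310", tile_size=12800)
```

Attributes that are not passed in (here, export_folder) keep their class-level default.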

static check_mosaic_exists(download_location: str | Path, export_folder: str | Path, filename: str)[source]

Checks whether the mosaic for an export already exists so that processing can be skipped. This function isn’t ideal because it duplicates information: the caller has to pass in the same values used elsewhere and assume the file naming matches, rather than the paths being calculated earlier in the process. That is currently necessary because the task registry sets the download location. Avoiding it would require a large refactor that is probably not worth the effort.

download_results(download_location: str | Path, callback: str | None = None, drive_wait: int = 15) → None[source]

Parameters:

download_location (Union[str, Path]) – The directory where the results should be downloaded to. Expects a string path or a Pathlib Path object.
callback (Optional[str]) – The callback function is called once the image has been downloaded.
drive_wait (int) – The amount of time in seconds to wait to allow for files that Earth Engine reports have been exported to actually populate. Default is 15 seconds.

Returns:

None

export(image: Image, filename_suffix: str, export_type: str = 'drive', clip: Geometry | None = None, strict_clip: bool | None = False, drive_root_folder: str | Path | None = None, **export_kwargs: Unpack) → None[source]

Handles the exporting of an image.

Parameters:

image (ee.image.Image) – Image for export
filename_suffix (str) – The unique identifier used internally to identify images.
export_type (str) – Specifies how the image should be exported. Either “cloud” or “drive”. Defaults to “drive”.
clip (Optional[ee.geometry.Geometry]) – Defines the region of interest for export - does not perform a strict clip, which is often slower. Instead, it uses the Earth Engine export’s “region” parameter to clip the results to the bounding box of the clip geometry. To clip to the actual geometry, set strict_clip to True.
strict_clip (Optional[bool]) – When set to True, performs a true clip on the result so that it follows the actual clipping geometry rather than just its bounding box. Defaults to False.
drive_root_folder (Optional[Union[str, Path]]) – The folder for exporting if “drive” is selected

Returns:

None
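
The difference between the default bounding-box clip and strict_clip can be illustrated without Earth Engine. The sketch below is plain Python with hypothetical helper names (bounding_box, in_box, in_polygon). It does not call the Earth Engine API and only shows why a bounding-box clip can keep pixels that lie outside the clip geometry.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def bounding_box(polygon: List[Point]) -> Box:
    """Return (min_x, min_y, max_x, max_y) for a list of vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(point: Point, box: Box) -> bool:
    """True if the point falls inside the bounding box (edges included)."""
    min_x, min_y, max_x, max_y = box
    return min_x <= point[0] <= max_x and min_y <= point[1] <= max_y


def in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Only edges that straddle the point's y value can be crossed
        # by a horizontal ray cast from the point toward +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A right triangle: its bounding box is the full 0..10 square.
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
box = bounding_box(triangle)

pixel = (8.0, 8.0)  # Inside the box, outside the triangle.
kept_by_box_clip = in_box(pixel, box)
kept_by_strict_clip = in_polygon(pixel, triangle)
```

The pixel at (8, 8) survives the bounding-box test but not the polygon test, which is the kind of pixel a strict clip would remove.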

property last_task_status: Dict[str, str]

Allows reading the private variable “_last_task_status”

Returns: Dict[str, str]: Return the private variable “_last_task_status”

mosaic() → None[source]

Mosaics the individual images into the full image

Returns:

None

mosaic_and_zonal() → None[source]: A callback that takes no parameters, but runs mosaic and zonal stats. Runs zonal stats by allowing the user to set all the zonal params on the class instance instead of passing them as params

zonal_stats(polygons: str | Path, keep_fields: Tuple[str, ...] = ('UniqueID', 'CLASS2'), stats: Tuple[str, ...] = ('min', 'max', 'mean', 'median', 'std', 'count', 'percentile_10', 'percentile_90'), report_threshold: int = 1000, write_batch_size: int = 2000, use_points: bool = False, inject_constants: dict = {}, nodata_value: int = -9999, all_touched: bool = False) → None[source]

Parameters:

polygons (Union[str, Path]) –
keep_fields (tuple[str, ...]) –
stats (tuple[str, ...]) –
report_threshold (int) – After how many iterations should it print out the feature number it’s on. Defaults to 1000. Set to None to disable.
write_batch_size (int) – How many zones should we store up before writing to the disk? Defaults to 2000.
use_points (bool) –

Returns:

None
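
The write_batch_size idea, buffering rows and writing them to disk in groups, can be sketched with the standard library. The function name write_in_batches and the in-memory buffer are assumptions for illustration. This is not the package's implementation.

```python
import csv
import io
from typing import Dict, Iterable, List


def write_in_batches(rows: Iterable[Dict[str, object]], fieldnames: List[str],
                     handle, write_batch_size: int = 2000) -> int:
    """Buffer rows and flush them to the CSV handle one batch at a time.

    Returns the number of flushes performed.
    """
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    batch: List[Dict[str, object]] = []
    flushes = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= write_batch_size:
            writer.writerows(batch)
            batch.clear()
            flushes += 1
    if batch:  # Write whatever is left after the loop ends.
        writer.writerows(batch)
        flushes += 1
    return flushes


# Hypothetical zone records; a StringIO stands in for a file on disk.
buffer = io.StringIO()
zones = ({"UniqueID": i, "mean": i * 0.5} for i in range(5))
flush_count = write_in_batches(zones, ["UniqueID", "mean"], buffer,
                               write_batch_size=2)
```

A larger batch size means fewer writes at the cost of holding more rows in memory.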

class eedl.image.EEExportDict[source]

Bases: TypedDict

bucket: NotRequired[str | None]

crs: str | None

description: str

fileDimensions: int | None

fileNamePrefix: str

folder: NotRequired[str | Path | None]

maxPixels: int | float

region: NotRequired[Geometry]

scale: int | float

class eedl.image.TaskRegistry[source]

Bases: object

The TaskRegistry class makes it convenient to manage arbitrarily many Earth Engine images that are in varying states of being downloaded.

COMPLETE_STATUSES = ['COMPLETED']

FAILED_STATUSES = ['CANCEL_REQUESTED', 'CANCELLED', 'FAILED']

INCOMPLETE_STATUSES = ('READY', 'UNSUBMITTED', 'RUNNING')
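
As a rough illustration of how these three constants partition task states, the sketch below groups some made-up tasks by status. The constants are copied from the listing above. The functions classify_status and group_by_status and the task names are hypothetical and are not part of TaskRegistry.

```python
from typing import Dict, List, Tuple

COMPLETE_STATUSES = ["COMPLETED"]
FAILED_STATUSES = ["CANCEL_REQUESTED", "CANCELLED", "FAILED"]
INCOMPLETE_STATUSES = ("READY", "UNSUBMITTED", "RUNNING")


def classify_status(state: str) -> str:
    """Map an Earth Engine task state string to a coarse category."""
    if state in COMPLETE_STATUSES:
        return "complete"
    if state in FAILED_STATUSES:
        return "failed"
    if state in INCOMPLETE_STATUSES:
        return "incomplete"
    return "unknown"


def group_by_status(tasks: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (name, state) pairs by their coarse category."""
    groups: Dict[str, List[str]] = {}
    for name, state in tasks:
        groups.setdefault(classify_status(state), []).append(name)
    return groups


# Made-up task names and states for demonstration.
example_tasks = [
    ("tile_a", "COMPLETED"),
    ("tile_b", "RUNNING"),
    ("tile_c", "FAILED"),
    ("tile_d", "READY"),
]
grouped = group_by_status(example_tasks)
```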

add(image: Image) → None[source]

Adds an Earth Engine image to the list of Earth Engine images.

Parameters:

image (ee.image.Image) – Earth Engine image to be added to the list of images

Returns:

None

property complete_tasks: List[Image]

List of Earth Engine images.

Returns:

List of Earth Engine images.

Return type:

List[ee.image.Image]

download_ready_images(download_location: str | Path) → None[source]

Downloads all images that are ready to be downloaded.

Parameters:

download_location (Union[str, Path]) – Destination for downloaded files.

Returns:

None

property downloadable_tasks: List[Image]

List of Earth Engine images that have neither been cancelled nor failed.

Returns:

List of Earth Engine images that have neither been cancelled nor failed.

Return type:

List[ee.image.Image]

property failed_tasks: List[Image]

List of Earth Engine images that have either been cancelled or failed.

Returns:

List of Earth Engine images that have failed or have been cancelled.

Return type:

List[ee.image.Image]

property incomplete_tasks: List[Image]

List of Earth Engine images that have not been completed yet.

Returns: List[ee.image.Image]: List of Earth Engine images that have not been completed yet.

log_error(error_type: str, error_message: str)[source]

Parameters:

error_type – Options “ee”, “local” to indicate whether it was an error on Earth Engine’s side or on the local processing side
error_message – The error message to print to the log file

setup_log(log_file_path: str | Path, mode='a')[source]

wait_for_images(download_location: str | Path, sleep_time: int = 10, callback: str | None = None, try_again_disk_full: bool = True, on_failure: str = 'log') → None[source]

Blocks until there are no incomplete or downloadable tasks left.

Parameters:

download_location (Union[str, Path]) – Destination for downloaded files.
sleep_time (int) – Time in seconds between checks of whether the disk is full. Defaults to 10 seconds.
callback (Optional[str]) – Optional callback function. Executed after an image has been downloaded.
try_again_disk_full (bool) – If True, keeps retrying the download of ready images when the disk is full.

Returns:

None
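
The blocking behavior described above amounts to a polling loop: check for pending work, download what is ready, sleep, and repeat. The sketch below shows that general pattern. The function wait_until_done and its callables are hypothetical stand-ins, and the real method's handling of callbacks, full disks, and failures is not modeled.

```python
import time
from typing import Callable, List, Optional


def wait_until_done(get_pending: Callable[[], List[str]],
                    download_ready: Callable[[], None],
                    sleep_time: float = 10,
                    max_checks: Optional[int] = None) -> int:
    """Poll until get_pending() returns an empty list.

    Returns the number of polling passes performed. max_checks is a
    safety valve added for this sketch so the loop cannot run forever.
    """
    checks = 0
    while get_pending():
        download_ready()
        checks += 1
        if max_checks is not None and checks >= max_checks:
            break
        time.sleep(sleep_time)
    return checks


# Simulated work queue: each pass "downloads" one pending item.
pending = ["tile_a", "tile_b", "tile_c"]


def fake_download() -> None:
    pending.pop()


passes = wait_until_done(lambda: pending, fake_download, sleep_time=0)
```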

eedl.image.download_images_in_folder(source_location: str | Path, download_location: str | Path, prefix: str) → None[source]

Handles pulling data from Google Drive over to a local location, filtering by a filename prefix and folder

Parameters:

source_location (Union[str, Path]) – Directory to search for files.
download_location (Union[str, Path]) – Destination for files with the specified prefix.
prefix (str) – A prefix to use to filter items in the folder - only files where the name matches this prefix will be moved.

Returns:

None
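
Filtering a folder by filename prefix and moving the matches can be sketched with pathlib and shutil. The helper name move_files_with_prefix and the file names are assumptions for this example, which operates on a temporary local directory rather than an actual Google Drive folder.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Union


def move_files_with_prefix(source_location: Union[str, Path],
                           download_location: Union[str, Path],
                           prefix: str) -> List[Path]:
    """Move files whose names start with prefix; return their new paths."""
    source = Path(source_location)
    destination = Path(download_location)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            target = destination / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "drive_folder"
    dst = Path(tmp) / "local"
    src.mkdir()
    # Made-up file names: two match the prefix, one does not.
    for name in ("et_2021_tile1.tif", "et_2021_tile2.tif", "other.tif"):
        (src / name).write_bytes(b"")
    moved_names = [p.name for p in move_files_with_prefix(src, dst, "et_2021")]
    remaining_names = sorted(p.name for p in src.iterdir())
```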

eedl.merge module

A tool to merge separate timeseries outputs into a single data frame or DB table

eedl.merge.merge_csvs_in_folder(folder_path, output_path, sqlite_db=None, sqlite_table=None)[source]

eedl.merge.merge_many(base_folder, subfolder_name='alfalfa_et')[source]

eedl.merge.merge_outputs(file_mapping, date_field: str = 'et_date', sqlite_db: str | None = None, sqlite_table: str | None = None) → DataFrame[source]

Makes output zonal stats files into a data frame and adds a datetime field. Merges all inputs into one DF, and can optionally insert into a sqlite database.

Parameters:

file_mapping – A set of tuples with a path to a file and a time value (string or datetime) to associate with it.
date_field (str) – Defaults to “et_date”.
sqlite_db (Optional[str]) – Name of a sqlite database.
sqlite_table (Optional[str]) – Name of a table in the database.

Returns:

Pandas data frame with all file and time data.

Return type:

pandas.DataFrame
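
The shape of file_mapping and the effect of date_field can be illustrated with a dependency-free sketch. It uses the csv module and in-memory text handles instead of pandas and file paths, so it differs from merge_outputs in both input and return type. The function name merge_rows and the sample data are hypothetical.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, List, TextIO, Tuple, Union


def merge_rows(file_mapping: Iterable[Tuple[TextIO, Union[str, datetime]]],
               date_field: str = "et_date") -> List[Dict[str, object]]:
    """Read each CSV handle and tag every row with its associated time."""
    merged: List[Dict[str, object]] = []
    for handle, time_value in file_mapping:
        if isinstance(time_value, str):
            # Accept ISO-format strings such as "2021-01-01".
            time_value = datetime.fromisoformat(time_value)
        for row in csv.DictReader(handle):
            row[date_field] = time_value
            merged.append(row)
    return merged


# Two made-up zonal stats outputs, one per date.
jan = io.StringIO("UniqueID,mean\n1,2.5\n2,3.0\n")
feb = io.StringIO("UniqueID,mean\n1,4.0\n")
merged = merge_rows([(jan, "2021-01-01"), (feb, datetime(2021, 2, 1))])
```

Each input contributes its rows unchanged, plus one extra column holding the time value paired with that input.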

eedl.merge.plot_merged(df: DataFrame, et_field: str, date_field: str = 'et_date', uniqueid: str = 'UniqueID') → Plot[source]

Creates a seaborn plot of the data

Parameters:

df (pandas.DataFrame) – Data source for the plot.
et_field (str) – Name of the variable on the x-axis.
date_field (str) – Name of the variable on the y-axis. Default is “et_date”.
uniqueid (str) – Defines additional data subsets that transforms should operate on independently. Default is “UniqueID”.

Returns:

Returns a seaborn object plot.

Return type:

so.Plot

eedl.mosaic_rasters module

eedl.mosaic_rasters.mosaic_folder(folder_path: str | Path, output_path: str | Path, prefix: str = '') → None[source]

Testing

Parameters:

folder_path (Union[str, Path]) – Location of the folder.
output_path (Union[str, Path]) – Output destination.
prefix (str) – Used to find the files of interest.

Returns:

None

eedl.mosaic_rasters.mosaic_rasters(raster_paths: Sequence[str | Path], output_path: str | Path, add_overviews: bool = True) → None[source]

Adapted from https://gis.stackexchange.com/a/314580/1955 and https://www.gislite.com/tutorial/k8024 along with other basic lookups on GDAL Python bindings

Parameters:

raster_paths (Sequence[Union[str, Path]]) – Location of the raster
output_path (Union[str, Path]) – Output destination
add_overviews (bool) –

Returns:

None

eedl.zonal module

eedl.zonal.zonal_stats(features: str | Path | Collection, raster: str | Path | None, output_folder: str | Path | None, filename: str, keep_fields: Iterable[str] = ('UniqueID', 'CLASS2'), stats: Iterable[str] = ('min', 'max', 'mean', 'median', 'std', 'count', 'percentile_10', 'percentile_90'), report_threshold: int = 1000, write_batch_size: int = 2000, use_points: bool = False, inject_constants: dict = {}, nodata_value: int = -9999, **kwargs) → str | Path | None[source]

If the raster and the polygons are not in the same CRS, this function will produce bad output.

Parameters:

features (Union[str, Path, fiona.Collection]) – Location to the features.
raster (Union[str, Path, None]) – Location of the raster.
output_folder (Union[str, Path, None]) – Output destination.
filename (str) – Name of the file.
keep_fields (Iterable[str]) – Fields that will be used.
stats (Iterable[str]) – The various statistical measurements to be computed.
report_threshold (int) – The number of iterations before it prints out the feature number it’s on. Default is 1000. Set to None to disable.
write_batch_size (int) – The number of zones that should be stored up before writing to disk.
use_points (bool) – Switch rasterstats to extract using gen_point_query instead of gen_zonal_stats. See the rasterstats package documentation for complete information. gen_point_query gets the values of a raster at all vertex locations when provided with a polygon or line. If provided points, it extracts those point values. Interpolation is set to nearest to perform an exact extraction of the cell values. In this codebase’s usage, the “features” parameter to this function is assumed to be a points dataset (still in the same CRS as the raster) when use_points is True. Additionally, when this is True, the stats argument to this function is ignored, because only a single value is extracted as the attribute value in the output CSV. Default is False.
inject_constants – A dictionary of field: value mappings to inject into every row. Useful if you plan to merge the data later. For example, a raster may represent a single variable and date, and you may be extracting from many rasters. For each zonal call, you could pass something like inject_constants = {"date": "2021-01-01", "variable": "et"}, which would produce “date” and “variable” headers in the CSV, with the values “2021-01-01” and “et” in every row.
kwargs – Passed through to rasterstats

Returns:

Return type:

Union[str, Path, None]
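
The effect of inject_constants on the output rows can be sketched as a dictionary merge followed by a CSV write. The helper inject and the sample zone records are hypothetical. The sketch only mirrors the behavior described for the parameter and is not the function's code.

```python
import csv
import io
from typing import Dict, List


def inject(rows: List[Dict[str, object]],
           inject_constants: Dict[str, object]) -> List[Dict[str, object]]:
    """Return copies of rows with the constant fields added to each one."""
    return [{**row, **inject_constants} for row in rows]


# Made-up per-zone results for a single raster.
zone_rows = [
    {"UniqueID": "a1", "mean": 1.5},
    {"UniqueID": "b2", "mean": 2.0},
]
constants = {"date": "2021-01-01", "variable": "et"}
output_rows = inject(zone_rows, constants)

# Write to an in-memory CSV to show the resulting headers.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(output_rows[0].keys()))
writer.writeheader()
writer.writerows(output_rows)
csv_lines = buffer.getvalue().splitlines()
```

Because every row carries the same date and variable, outputs from many rasters can later be concatenated without losing track of which raster each row came from.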

eedl.google_cloud module

eedl.google_cloud.download_export(bucket_name: str, output_folder: str | Path, prefix: str, delimiter: str = '/', autodelete: bool = True) → None[source]

Downloads a blob from the specified bucket.

Modified from Google Cloud sample documentation at: https://cloud.google.com/storage/docs/samples/storage-download-file#storage_download_file-python and https://cloud.google.com/storage/docs/samples/storage-list-files-with-prefix

Parameters:

bucket_name (str) – Name of the Google Cloud Storage Bucket to pull data from.
output_folder (Union[str, Path]) – Destination folder for exported data.
prefix (str) – A prefix to use to filter items in the bucket - only URLs where the path matches this prefix will be returned - defaults to all files.
delimiter (str) – Delimiter used for getting the list of blobs in the Google Cloud Storage Bucket. Defaults to “/”
autodelete (bool) – Whether to delete blobs once their contents have been downloaded. Defaults to True.

Returns:

None

eedl.google_cloud.download_public_export(bucket_name: str, output_folder: str | Path, prefix: str = '') → None[source]

Parameters:

bucket_name (str) – Name of the Google Cloud Storage Bucket to pull data from.
output_folder (Union[str, Path]) – Destination folder for exported data.
prefix (str) – A prefix to use to filter items in the bucket - only URLs where the path matches this prefix will be returned - defaults to all files.

Returns:

None

eedl.google_cloud.get_public_export_urls(bucket_name: str, prefix: str = '') → List[str][source]

Downloads items from a public Google Cloud Storage Bucket without using a GCloud login. Filters only to files with the specified prefix.

Parameters:

bucket_name (str) – Name of the Google Cloud Storage Bucket to pull data from.
prefix (str) – A prefix to use to filter items in the bucket - only URLs where the path matches this prefix will be returned - defaults to all files.

Returns:

A list of urls.

Return type:

List[str]
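
Prefix filtering of public object URLs can be sketched without any network access. The example assumes the object names are already known and uses the https://storage.googleapis.com/BUCKET/OBJECT form for public objects. The helper public_urls, the bucket name, and the object names are hypothetical, and how the package actually lists the bucket contents is not shown.

```python
from typing import Iterable, List
from urllib.parse import quote


def public_urls(bucket_name: str, object_names: Iterable[str],
                prefix: str = "") -> List[str]:
    """Build public URLs for objects whose names start with prefix.

    Names ending in "/" are treated as folder placeholders and skipped,
    which is an assumption made for this sketch.
    """
    base = f"https://storage.googleapis.com/{bucket_name}/"
    return [
        base + quote(name)  # quote() leaves "/" unescaped by default.
        for name in object_names
        if name.startswith(prefix) and not name.endswith("/")
    ]


# Made-up bucket contents.
names = ["exports/et_1.tif", "exports/", "logs/run.txt"]
urls = public_urls("my-bucket", names, prefix="exports/")
all_urls = public_urls("my-bucket", names)
```

With the default empty prefix, every file in the listing is returned.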