geoglue package


Submodules#

geoglue.cds module#

This module uses ECMWF’s cdsapi to download ERA5 hourly data and provides utilities to time-shift the data to a particular timezone.

class geoglue.cds.CdsDataset(instant: xr.Dataset, accum: xr.Dataset)#

Bases: NamedTuple

Tuple containing instant and accumulated variables from cdsapi

accum: Dataset#

Accumulated variables, such as total precipitation and surface solar radiation

assign_coords(coords: dict) CdsDataset#

Assigns coordinates to instant and accumulated variable datasets

daily() CdsDataset#

Returns CdsDataset corresponding to daily aggregation, mean for instant, sum for accumulated variables
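
As a rough illustration of the aggregation rule described above (mean for instant variables, sum for accumulated ones), the following standalone sketch uses plain Python lists instead of xarray datasets. The helper names chunk_days, daily_mean and daily_sum are hypothetical and are not part of geoglue.

```python
def chunk_days(hourly):
    """Split a list of hourly values into consecutive 24-hour days."""
    if len(hourly) % 24 != 0:
        raise ValueError("expected a whole number of days")
    return [hourly[i:i + 24] for i in range(0, len(hourly), 24)]


def daily_mean(hourly):
    """Daily mean, the rule described for instant variables."""
    return [sum(day) / len(day) for day in chunk_days(hourly)]


def daily_sum(hourly):
    """Daily sum, the rule described for accumulated variables."""
    return [sum(day) for day in chunk_days(hourly)]


# Two days of made-up hourly data
temperature = [float(h) for h in range(48)]  # instant variable
precipitation = [0.5] * 48                   # accumulated variable

print(daily_mean(temperature))
print(daily_sum(precipitation))
```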

daily_max() Dataset#

Daily maximum of instant variable dataset

daily_min() Dataset#

Daily minimum of instant variable dataset

equals(other: CdsDataset) bool#
get_time_dim() str#

Returns time dimension of the dataset

instant: Dataset#

Instant variables such as temperature and wind speed

property is_hourly: bool#

Returns whether dataset has hourly intervals

isel(*args, **kwargs) CdsDataset#

Select slices by index from both instant and accumulated datasets

sel(*args, **kwargs) CdsDataset#

Select slices from both instant and accumulated datasets

class geoglue.cds.CdsPath(instant: Path | None, accum: Path | None)#

Bases: NamedTuple

Tuple containing paths to instant and accumulated variables from cdsapi

accum: Path | None#

Path to accumulated variable dataset

as_dataset(drop_vars: list[str] = ['number', 'expver', 'surface']) CdsDataset#

Returns opened datasets for instant and accumulated variables

Parameters:

drop_vars – Variables to drop, default=[‘number’, ‘expver’, ‘surface’]

Returns:

Dataset corresponding to CdsPath

Return type:

CdsDataset

exists() bool#

Returns True if dataset exists

instant: Path | None#

Path to instant variable dataset

class geoglue.cds.DatasetPool(paths: Iterable[Path], shift_hours: int = 0, stub: str = 'era5')#

Bases: object

Collection of ERA5 reanalysis data

get_current_year(start_date: date | str, end_date: date | str) CdsDataset#
path(year: int, month: int | None = None, part: bool = False) CdsPath#

Returns CdsPath corresponding to a particular year

path_min_part_year(year: int) CdsPath#

Returns CdsPath of the earliest available month of a partially downloaded year

weekly_reduce(year: int, vartype: Literal['instant', 'accum'], how_daily: Literal['mean', 'min', 'max', 'sum'] | None = None, how_weekly: Literal['mean', 'min', 'max', 'sum'] | None = None, window: int = 0, time_dim: str = 'valid_time') Dataset#

Returns aggregated weekly dataset, time-shifted to local timezone.

Dataset is aggregated to isoweeks, with week starting on Monday.

Parameters:
  • year – Year to return weekly dataset for

  • vartype – One of instant, accum to select instantaneous or accumulative variables

  • how_daily – One of ‘min’, ‘max’, ‘mean’, default=’mean’. Operation to aggregate from hourly to daily data. Ignored for accum vars, where sum is used.

  • how_weekly – One of ‘min’, ‘max’, ‘mean’, default=’mean’. Operation to aggregate from daily to weekly data. Ignored for accum vars, where sum is used.

  • window – Number of weeks to include before the first ISO week (first Monday of the year). This is useful when performing rolling operations which require window elements to be present to avoid NaNs.

  • time_dim – Time dimension to use, default=’valid_time’

Returns:

Dataset resampled to weekly frequency, with weeks starting on Monday (ISO weeks)

Return type:

xr.Dataset
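
The grouping into ISO weeks can be sketched without xarray. The example below is only an illustration of Monday-based ISO week aggregation using the standard library; the function weekly_mean and the sample data are hypothetical, and the actual method additionally handles time-shifting and the window parameter.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_mean(daily):
    """Average daily values by (ISO year, ISO week); ISO weeks start on Monday."""
    groups = defaultdict(list)
    for day, value in daily.items():
        iso = day.isocalendar()
        groups[(iso[0], iso[1])].append(value)
    return {week: sum(v) / len(v) for week, v in sorted(groups.items())}


start = date(2021, 1, 4)  # a Monday, first day of ISO week 1 of 2021
daily = {start + timedelta(days=i): float(i + 1) for i in range(14)}
weekly = weekly_mean(daily)
print(weekly)
```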

class geoglue.cds.ReanalysisSingleLevels(region: ZonedBaseRegion, variables: list[str], path: Path | None = None, stub: str = 'era5', data_format: Literal['grib', 'netcdf'] = 'grib', admin_in_name: bool = False)#

Bases: object

Fetch ERA5 reanalysis data from cdsapi for a particular country

Parameters:
  • region (Region) – Region for which to download data

  • variables (list[str]) – List of variables to fetch

  • path (Path | None) – Data path to download data to, optional. If not specified, downloads data to the default path, ~/.local/share/geoglue.

  • stub (str) – Stub to use in filename, default=`era5`. This is used as part of the downloaded filename, e.g. VNM-2-2020-stub.accum.nc

  • data_format (Literal['grib', 'netcdf']) – Data format to download files in, one of grib or netcdf, default=`grib`. Downloading data in GRIB format allows downloading more variables. GRIB files are converted to netCDF, so both options result in identical data files.

get(year: int, skip_exists: bool = True) CdsPath | None#

Fetches hourly data for a particular year.

An API key is needed for this function to work, see instructions at https://cds.climate.copernicus.eu/how-to-api

Parameters:
  • year – Data is downloaded for this year

  • skip_exists – Skip downloading if zipfile or extracted contents exist, default True

Returns:

Path of netCDF file that was written to disk

Return type:

CdsPath

get_current_year(start_date: date | str, end_date: date | str, skip_exists: bool = True) list[CdsPath] | None#

Fetches hourly data for a particular date range for the current year

get_dataset_pool() DatasetPool#

Returns DatasetPool corresponding to downloaded data

geoglue.cds.chunk_months(cds_data: CdsPath, stub: str, folder: Path) list[CdsPath]#
geoglue.cds.concat(a: CdsDataset, b: CdsDataset, time_dim: str = 'valid_time') CdsDataset#
geoglue.cds.era5_extract_hourly_data(file: Path, extract_path: Path) CdsPath#

Extracts hourly data from downloaded zip file

Parameters:
  • file – zip file to open

  • extract_path – Path to extract to

Returns:

Path to extracted dataset

Return type:

CdsPath

geoglue.cds.get_latest_era5_date() date#

Gets latest date when ERA5 data is available

ERA5 releases data with a lag of 5 days
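
A minimal sketch of the lag arithmetic, assuming the latest date is simply the current date minus five days. The function name latest_available is hypothetical, and the exact rule used by geoglue is not shown in this reference.

```python
from datetime import date, timedelta

ERA5_LAG_DAYS = 5  # lag stated in the description above


def latest_available(today: date) -> date:
    """Most recent date covered, assuming a fixed lag behind `today`."""
    return today - timedelta(days=ERA5_LAG_DAYS)


print(latest_available(date(2024, 3, 3)))
```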

geoglue.cds.get_timezone_offset_hours(offset: str) int | None#

Returns timezone offset in hours. Non-hourly offsets return None

Parameters:

offset – String in the form [+-]HH:MM

Examples

>>> get_timezone_offset_hours("+05:00")
5
>>> get_timezone_offset_hours("-04:00")
-4
>>> get_timezone_offset_hours("+01:30")  # returns None

Returns:

Timezone offset in hours, or None if the offset is fractional
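
The behaviour shown in the examples above could be reproduced with a standalone sketch along these lines. The name offset_to_hours is hypothetical, and raising ValueError on malformed input is an assumption; the reference does not say how geoglue handles strings that do not match [+-]HH:MM.

```python
import re
from typing import Optional

_OFFSET = re.compile(r"([+-])(\d{2}):(\d{2})")


def offset_to_hours(offset: str) -> Optional[int]:
    """Parse [+-]HH:MM into whole hours; return None for non-hourly offsets."""
    match = _OFFSET.fullmatch(offset)
    if match is None:
        # Assumption: malformed input is treated as an error
        raise ValueError(f"expected [+-]HH:MM, got {offset!r}")
    sign, hours, minutes = match.groups()
    if minutes != "00":
        return None
    return int(hours) if sign == "+" else -int(hours)


print(offset_to_hours("+05:00"), offset_to_hours("-04:00"), offset_to_hours("+01:30"))
```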

geoglue.cds.grib_to_netcdf(file: Path, path: Path | None = None) CdsPath#

Converts GRIB to netCDF

Parameters:
  • file – GRIB file to open

  • path – Parent folder to save netCDF files, optional. If not specified, writes to the same folder as the GRIB file

Returns:

Path to converted netCDF dataset

Return type:

CdsPath

geoglue.cds.is_end_of_month(d: date) bool#
geoglue.cds.timeshift_hours(ds1: Dataset, ds2: Dataset, shift: int, dim: str = 'valid_time') Dataset#

Timeshift dataset by shift hours.

If shift is a positive integer (longitude east), then that many hours are taken from the end of ds1 and attached onto ds2, with the end of ds2 clipped to ensure that ds2 size remains the same.

If shift is a negative integer (longitude west), then that many hours are taken from the beginning of ds2 and attached onto ds1, with the beginning of ds1 clipped to ensure that ds1 size remains the same.

Checks are performed to ensure that ds1 and ds2 are contiguous in time, and that they are hourly data.

Parameters:
  • ds1 – First dataset, comprises most of the data in returned timeshifted dataset when shift < 0

  • ds2 – Second dataset, comprises most of the data in returned timeshifted dataset when shift > 0

  • shift – Hours to timeshift, from [-12, 12], excluding 0.

  • dim – Name of the time dimension, optional

Return type:

Timeshifted dataset

Raises:

ValueError

  • Raised when shift is zero, as no timeshift would be performed

  • Raised when shift is not in [-12, 12]
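
The slicing described above can be sketched with plain lists standing in for hourly datasets. This is an illustration only: timeshift_lists is a hypothetical name, it does not relabel any time coordinate, and it skips the contiguity and hourly checks that the real function performs.

```python
def timeshift_lists(ds1, ds2, shift):
    """Shift hourly values between two consecutive periods of equal length."""
    if shift == 0:
        raise ValueError("shift must be non-zero")
    if not -12 <= shift <= 12:
        raise ValueError("shift must be in [-12, 12]")
    if shift > 0:
        # Last `shift` hours of ds1 go in front of ds2;
        # the end of ds2 is clipped so the length is unchanged.
        return ds1[-shift:] + ds2[:-shift]
    n = -shift
    # First `n` hours of ds2 are appended to ds1;
    # the beginning of ds1 is clipped so the length is unchanged.
    return ds1[n:] + ds2[:n]


day1 = list(range(0, 24))   # hours 0..23
day2 = list(range(24, 48))  # hours 24..47

east = timeshift_lists(day1, day2, 2)
west = timeshift_lists(day1, day2, -3)
print(east[0], east[-1], west[0], west[-1])
```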

geoglue.cds.timeshift_hours_cdsdataset(ds1: CdsDataset, ds2: CdsDataset, shift: int, dim: str = 'valid_time') CdsDataset#

Timeshift CdsDataset by an integer number of hours

This applies timeshift_hours() to the instant and accum parts of a CdsDataset. The main difference from applying timeshift_hours() directly is that the shift value for the accum dataset is offset by -1. This is because accumulated and mean-rate variables cover the hour leading up to the time-stamp: data time-stamped YYYY/MM/DD 00:00 represents the accumulation or mean rate over the period 23:00 to 00:00 on the previous day (YYYY/MM/DD-1). See https://confluence.ecmwf.int/display/CKB/ERA5+family+post-processed+daily+statistics+documentation for context.

Parameters:
  • ds1 – First dataset, comprises most of the data in returned timeshifted dataset when shift <= 0

  • ds2 – Second dataset, comprises most of the data in returned timeshifted dataset when shift > 0

  • shift – Hours to timeshift, from [-12, 12], excluding 0.

  • dim – Name of the time dimension, optional

Return type:

Timeshifted dataset

Raises:

ValueError – Raised when shift not in [-12, 12]
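
The time-stamp convention for accumulated variables can be made concrete with a small standard-library sketch. The helper accumulation_interval is hypothetical; it only shows which hour a given time-stamp covers under the convention described above.

```python
from datetime import date, datetime, timedelta


def accumulation_interval(stamp: datetime):
    """Return the (start, end) of the hour that an accumulated value covers."""
    return stamp - timedelta(hours=1), stamp


# A value stamped at midnight belongs to the last hour of the previous day
start, end = accumulation_interval(datetime(2020, 1, 2, 0, 0))
print(start, end)
```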

geoglue.region module#

This module contains the Region class that has functions to fetch geospatial data (from GADM or geoBoundaries) for a particular country, as well as structures to make working with arbitrary shapefiles easier. It also supports calculating extents or geospatial bounds, and calculating timezone offsets.

class geoglue.region.AdministrativeLevel(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin: int, admin_file: str | Path, pk: str)#

Bases: ZonedBaseRegion

Represents a specific administrative level

admin: int#

Administrative level

admin_file: str | Path#

Path to shapefile

pk: str#

Column ID that is used as primary key to identify regions in shapefile, indexed by administrative level.

read() GeoDataFrame#
class geoglue.region.BaseCountry(name: str, url: str, bbox: Bbox, iso3: str | None)#

Bases: BaseRegion

Base class for all country level classes

class geoglue.region.BaseRegion(name: str, url: str, bbox: Bbox, iso3: str | None)#

Bases: object

Base class for all regions containing common fields

bbox: Bbox#

Geospatial bounding box

iso3: str | None#

If specified, the ISO3 code of the country that the region is a subdivision of

name: str#

Region identifier without spaces

url: str#

URL from which data was downloaded

class geoglue.region.Country(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin_files: Mapping[int, str | Path], pk: dict[int, str] | str)#

Bases: Region, BaseCountry

Subclass of Region that restricts name to country ISO3 codes

admin(adm: int) CountryAdministrativeLevel#
class geoglue.region.CountryAdministrativeLevel(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin: int, admin_file: str | Path, pk: str)#

Bases: AdministrativeLevel, BaseCountry

Subclass of AdministrativeLevel that restricts name to country ISO3 codes

class geoglue.region.Region(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin_files: Mapping[int, str | Path], pk: dict[int, str] | str)#

Bases: ZonedBaseRegion

Represents a geospatial region with a fixed time zone

admin(adm: int) AdministrativeLevel#
admin_files: Mapping[int, str | Path]#

Path to shapefiles, indexed by administrative level

pk: dict[int, str] | str#

Column ID that is used as primary key to identify regions in shapefile, indexed by administrative level.

If str, is the same for every administrative level

read_admin(admin: int) GeoDataFrame#

Reads a region shapefile

class geoglue.region.ZonedBaseRegion(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str)#

Bases: BaseRegion

Base class for all regions with a fixed time zone

tz: str#

Timezone offset from UTC.

Expressed as [+-]HH:MM, e.g. +01:00 for CET timezone

geoglue.region.gadm(iso3: str, localize_date: datetime = datetime.datetime(2022, 1, 1, 0, 0), data_path: Path = PosixPath('/home/docs/.local/share/geoglue'), tzoffset: str | None = None) Country#

Returns GADM Region data

Parameters:
  • iso3 (str) – Country ISO3 code

  • localize_date (datetime.datetime) – Date where timezone is localised to, default=2022-01-01. See get_timezone() for information about this parameter

  • data_path (Path | None) – Optional. If specified, sets the data path where shapefiles will be downloaded, otherwise defaults to ~/.local/share/geoglue

  • tzoffset (str | None) – Optional, specifies timezone offset as [+-]HH:MM from UTC. If not specified is automatically inferred from country ISO3 code. Auto-detection is only performed for countries with one timezone, and this parameter is mandatory for countries spanning multiple timezones.

Returns:

Region data representing GADM information for a country at a particular admin level

Return type:

Region

geoglue.region.geoboundaries(iso3: str, localize_date: datetime = datetime.datetime(2022, 1, 1, 0, 0), data_path: Path = PosixPath('/home/docs/.local/share/geoglue'), tzoffset: str | None = None) Region#

Returns geoBoundaries Region data

Parameters:
  • iso3 (str) – Country ISO3 code

  • localize_date (datetime.datetime) – Date where timezone is localised to, default=2022-01-01. See get_timezone() for information about this parameter

  • data_path (Path | None) – Optional. If specified, sets the data path where shapefiles will be downloaded, otherwise defaults to ~/.local/share/geoglue

  • tzoffset (str | None) – Optional, specifies timezone offset as [+-]HH:MM from UTC. If not specified is automatically inferred from country ISO3 code. Auto-detection is only performed for countries with one timezone, and this parameter is mandatory for countries spanning multiple timezones.

Returns:

Region data representing geoBoundaries information for a country at a particular admin level

Return type:

Region

geoglue.region.get_bbox(path: str | Path) Bbox#

Gets bounding box of a shapefile

geoglue.region.get_region(name: str, file: str | Path | None = None, fallback: Literal['gadm', 'geoboundaries'] = 'gadm', **kwargs) Region#

Returns region from file or fallback to GADM or geoBoundaries

Parameters:
  • name (str) – Name of the region, e.g. ‘VNM’, ‘HCMC’

  • file (str | Path | None) – TOML file from which regions should be read. If not specified, fallback to GADM or geoBoundaries

  • fallback (Literal["gadm", "geoboundaries"]) – Default fallback provider, used when file is not specified or region name not found in the TOML file

  • **kwargs – Extra parameters passed to gadm() or geoboundaries()

Return type:

Region

geoglue.region.get_timezone(iso3: str, localize_date: datetime) str | None#

Returns unique timezone offset for a country with ISO3 code

Parameters:
  • iso3 (str) – ISO3 code of country

  • localize_date (datetime.datetime) – Date used to localize the timezone obtained from pytz. Timezone names (such as Europe/Berlin) do not have a fixed offset due to daylight savings time changes, and the same timezone can have a different offset, usually in summer months. The exact date when DST starts also varies by year according to local policy shifts. We pick a specific date here to ensure that the localization is reproducible. The date is taken to be in the middle of winter in the Northern hemisphere when DST does not apply and the time offset follows standard time. For countries in the Southern hemisphere, the choice of this date may lead to non-standard (daylight savings) time being used.

Returns:

Timezone offset as [+-]HH:MM from UTC if unique timezone found, None otherwise

Return type:

str | None
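
The [+-]HH:MM format of the returned offset can be illustrated as follows. This sketch only formats a UTC offset such as the one returned by datetime.utcoffset() on a localised datetime; the name format_offset is hypothetical, and the timezone lookup by ISO3 code is not reproduced here.

```python
from datetime import datetime, timedelta, timezone


def format_offset(offset: timedelta) -> str:
    """Render a UTC offset as [+-]HH:MM."""
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


# A fixed-offset timezone stands in for a timezone localised on 2022-01-01
localized = datetime(2022, 1, 1, tzinfo=timezone(timedelta(hours=7)))
print(format_offset(localized.utcoffset()))
```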

geoglue.memoryraster module#

MemoryRaster class to read and operate on raster files with metadata entirely in memory.

While rasterio offers low-level access to read and manipulate geospatial raster files (such as GeoTIFF), it does not have easy-to-use higher-level functions for standard operations on rasters such as projection to a different coordinate system, resampling, or zonal statistics. This module defines a MemoryRaster class that holds raster metadata, converts to and from rasterio.DatasetReader, and provides functions for plotting, reprojection and resampling. These functions are intended to make working with rasters in Python as easy as with R’s terra package.

class geoglue.memoryraster.MemoryRaster(data: ndarray | MaskedArray | DataArray, transform: Affine, crs: str | CRS | None, nodata: int | float, origin_path: Path | None = None, dtype: str = 'float64', driver: str = 'GTiff')#

Bases: object

Class to operate on rasters in-memory.

While MemoryRaster can be constructed directly by passing the parameters below, in normal practice, it is constructed by reading from a GeoTIFF file or xarray object, using read() or from_xarray().

Parameters:
  • data (np.ndarray) – Data to consider as raster

  • transform (affine.Affine) – Affine transformation associated with raster

  • crs (str | pyproj.crs.CRS | None) – Coordinate reference system associated with raster

  • nodata (int | float) – Data value indicating NA

  • origin_path (Path | None) – Path to source file, optional, default=None. This attribute is populated if the MemoryRaster is read from a file

  • dtype (str) – numpy dtype of array, default=’float64’

  • driver (str) – rasterio driver, optional, default=’GTiff’

as_rasterio(zfill: bool = False)#

Returns MemoryRaster as a rasterio dataset

astype(t) MemoryRaster#

Returns a new MemoryRaster with type cast to t

property bbox: Bbox#

Bounding box of MemoryRaster

checksum() str#
crop(bbox: Bbox) MemoryRaster#

Crop a MemoryRaster to bounds

crs: str | CRS | None#
data: ndarray | MaskedArray | DataArray#
driver: str = 'GTiff'#
dtype: str = 'float64'#
static from_xarray(da: DataArray, c_longitude='longitude', c_latitude='latitude', nodata: int | float | None = None) MemoryRaster#

Creates MemoryRaster from xarray, assumes EPSG:4326

Parameters:
  • da – xarray DataArray from which to create MemoryRaster

  • c_longitude – Longitude axis in dataarray, default=’longitude’

  • c_latitude – Latitude axis in dataarray, default=’latitude’

  • nodata – Data value representing NA, optional. If not specified, tries to read from xarray attributes such as GRIB_missingValue, nodata, _FillValue

Return type:

MemoryRaster

property griddes: CdoGriddes#

Returns grid description that can be used by cdo to resample

property height#

Height of raster image

property is_lonlat#

Returns whether grid is longitude and latitude

mask(geometry: GeoDataFrame | GeoSeries | list[Polygon], crop: bool = True) MemoryRaster#

Mask raster file with a set of geometries

Parameters:
  • geometry – GeoDataFrame or GeoSeries

  • crop – Whether to crop the extent to the geometry specified, default=True. This is passed directly to rasterio.mask.mask.

Return type:

MemoryRaster

max() float#

Maximum value in raster

min() float#

Minimum value in raster

nodata: int | float#
origin_path: Path | None = None#
plot(cmap: str = 'viridis', fill_nodata=None, **kwargs)#

Plots a MemoryRaster using sensible defaults

property profile#
static read(file: str | Path, crs: str | None = None, resampling: Resampling = Resampling.bilinear) MemoryRaster#

Reads from a file supported by rasterio

Parameters:
  • file – File to read from, must be openable by rasterio

  • crs – Coordinate reference system to project to

  • resampling – If reprojecting to another CRS, resampling strategy to use. Must be a strategy supported by rasterio

Return type:

MemoryRaster

resample(dst: MemoryRaster, resampling: Resampling) MemoryRaster#

Resamples source raster to match destination mask

This function is meant to be used for resampling MemoryRaster objects, usually those created from GeoTIFF files. For data already in netCDF format, we recommend using the resampling functions of Climate Data Operators (cdo), for which we provide a wrapper in geoglue.resample.

Parameters:
  • dst – Destination MemoryRaster

  • resampling – Resampling method, one of rasterio.enums.Resampling

See also

geoglue.resample

Resample module with wrappers for cdo resample

property shape#

Shape (width, height) of raster image

sum() float#

Sum of non-null values in raster

transform: Affine#
property width#

Width of raster image

zonal_stats(geometry: GeoDataFrame, ops: str | list[str] | Callable, weights: MemoryRaster | None = None, **kwargs) DataFrame | GeoDataFrame#

Calculate zonal statistics using exactextract

Parameters:
  • geometry (gpd.GeoDataFrame) – Geometry dataframe, usually read from a shapefile

  • ops (str | list[str] | Callable) – exactextract operation(s) to perform

  • weights (MemoryRaster | None) – Optional, if specified uses the supplied raster to perform weighted zonal statistics

  • **kwargs – Extra parameters passed directly to exactextract.exact_extract()

Returns:

A copy of the geometry dataframe with additional column(s) with the zonal statistics requested. Each separate zonal statistic is given a column in the data

Return type:

pd.DataFrame | gpd.GeoDataFrame

geoglue.memoryraster.get_numpy_dtype(t: str)#
geoglue.memoryraster.grid_size(da: DataArray, axis: str) float#

geoglue.resample module#

geoglue.resample.remapbil_sparse(infile: str | Path, griddes_file: str, outfile: str | Path, eps: float = 1e-06, tmp_path: Path = PosixPath('.')) Path#

Sparse bilinear resampling

Resampling a raster with standard CDO remapbil can cause issues such as NaNs moving into coastal regions for covariates where the variable is only defined on land (soil moisture, vegetation). This implementation divides a zero-filled resampled DataArray by a resampled mask (non-NA=1, NA=0). A low epsilon threshold is used to discard small mask contributions, which avoids blowing up the output near edges with NaN cells.

Parameters:
  • infile – Input data file or xarray.DataArray

  • griddes_file – Target griddes file

  • outfile – Output resampled file path, if not specified, generated from infile by

  • eps – epsilon value that is used as a threshold for mask

  • tmp_path – Temporary folder to use for intermediate files, defaults to $CWD

Returns:

Returns sparse resampled DataArray

Return type:

xr.DataArray
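
The idea of dividing a zero-filled interpolation by an interpolated mask can be shown for a single target point. In this sketch, sparse_interp is a hypothetical helper and weights are assumed to be the bilinear interpolation weights of the neighbouring source cells, summing to 1; the actual function operates on files through CDO.

```python
import math


def sparse_interp(values, weights, eps=1e-6):
    """Interpolate neighbouring cells while ignoring NaN cells.

    The zero-filled weighted sum is divided by the weighted sum of the
    mask (1 for valid cells, 0 for NaN), so valid neighbours are
    renormalised instead of being contaminated by NaN.
    """
    filled = [0.0 if math.isnan(v) else v for v in values]
    mask = [0.0 if math.isnan(v) else 1.0 for v in values]
    numerator = sum(w * f for w, f in zip(weights, filled))
    denominator = sum(w * m for w, m in zip(weights, mask))
    # Below the threshold there is too little valid data to trust the ratio
    return numerator / denominator if denominator > eps else math.nan


coastal = sparse_interp([2.0, math.nan], [0.5, 0.5])     # land cell next to ocean
ocean = sparse_interp([math.nan, math.nan], [0.5, 0.5])  # no valid neighbours
inland = sparse_interp([1.0, 3.0], [0.25, 0.75])         # plain interpolation
print(coastal, ocean, inland)
```

A plain weighted sum would return NaN for the coastal point, which is the NaN spreading that the mask normalisation avoids.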

geoglue.resample.resample(resampling: Literal['remapbil', 'remapdis', 'sremapbil'], infile: str | Path, target: MemoryRaster | CdoGriddes | DataArray, outfile: str | Path | None = None, skip_exists=True) Path#

Resamples input file to output file using CDO’s resampling to a target raster grid

Parameters:
  • resampling

    Resampling type to use, must be one of remapbil, remapdis or sremapbil:

    • remapbil is bilinear resampling

    • remapdis is distance-weighted average remapping

    • sremapbil is sparse bilinear resampling that uses a non-NaN/NaN mask to normalise values to avoid NaN spreading from land-ocean boundaries

  • infile – Input file to read

  • target – Target MemoryRaster whose grid to resample to, or a CdoGriddes, or an xr.DataArray

  • outfile – Output resampled file path, if not specified, generated from infile by affixing .resampled to the path

  • skip_exists – Whether to skip resampling if outfile exists (default=True)

Returns:

Resampled dataset path

Return type:

Path

geoglue.resample.resampled_dataset(resampling: Literal['remapbil', 'remapdis'], data: str | Path | Dataset, target: MemoryRaster | DataArray) Iterator[Dataset]#

Context manager version of geoglue.resample.resample().

Parameters:
  • resampling – Resampling type to use, must be one of remapbil or remapdis

  • data – Input file to read or xarray dataset

  • target – Target MemoryRaster or xr.DataArray whose grid to resample to

Yields:

xr.Dataset – Resampled dataset

Example

>>> from geoglue.resample import resampled_dataset
>>> from geoglue import MemoryRaster
>>> pop = MemoryRaster.read("VNM_ppp_2000_1km_Aggregated_UNadj.tif")
>>> with resampled_dataset("remapbil", "somefile.nc", pop) as ds:
...     print(ds)

geoglue.types module#

Common types used in geoglue

class geoglue.types.Bbox(minx: int | float, miny: int | float, maxx: int | float, maxy: int | float)#

Bases: NamedTuple

Geographic bounding box

as_polygon() Polygon#
coverage_fraction(other: Bbox) float#
enlarge(by: int | float) Bbox#

Enlarges a Bbox by the same extent in all directions

static from_string(s: str) Bbox#

Returns Bbox from standard string representation

static from_xarray(da: DataArray | Dataset) Bbox#
property geodetic_area_km2: float#
int() Bbox#
property lat_slice: slice#
property lon_slice: slice#
maxx: int | float#

Eastern bounds, maximum longitude

maxy: int | float#

Northern bounds, maximum latitude

minx: int | float#

Western bounds, minimum longitude

miny: int | float#

Southern bounds, minimum latitude

overlap_fraction(other: Bbox) float#
property safe_name: str#
to_list(spec: str) list[int | float]#

Returns Bbox converted to list of numbers in different order

The default and standard bbox order is minx,miny,maxx,maxy. Certain applications expect the bbox coordinates in a different order. This method takes a spec string and returns a list in that order.

Parameters:

spec (str) – Either a fully specified string like “maxx,minx,maxy,miny” or a shorthand. Supported shorthands are “cdsapi” for supplying bbox parameters to ECMWF’s cdsapi

Returns:

Returns a list of bbox coordinates in specified order

Return type:

list[int | float]
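
A sketch of how a spec string could be turned into a reordered list is given below. BboxSketch is a hypothetical stand-in for Bbox, and the expansion of the "cdsapi" shorthand to north, west, south, east (the order of cdsapi's area parameter) is an assumption about how geoglue resolves it.

```python
from typing import NamedTuple, Union

Number = Union[int, float]


class BboxSketch(NamedTuple):
    minx: Number
    miny: Number
    maxx: Number
    maxy: Number

    def to_list(self, spec: str) -> list:
        """Return coordinates in the order given by a comma-separated spec."""
        # Assumption: "cdsapi" expands to north, west, south, east
        shorthands = {"cdsapi": "maxy,minx,miny,maxx"}
        names = [n.strip() for n in shorthands.get(spec, spec).split(",")]
        for name in names:
            if name not in self._fields:
                raise ValueError(f"unknown bbox field: {name!r}")
        return [getattr(self, name) for name in names]


bbox = BboxSketch(102.0, 8.0, 110.0, 24.0)
print(bbox.to_list("cdsapi"))
print(bbox.to_list("minx,miny,maxx,maxy"))
```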

class geoglue.types.CdoGriddes(gridtype: str, gridsize: int, xsize: int, ysize: int, xname: str, yname: str, xfirst: float, xinc: float, yfirst: float, yinc: float, ylongname: str = 'latitude', yunits: str = 'degrees_north', xlongname: str = 'longitude', xunits: str = 'degrees_east')#

Bases: object

Grid specification used by Climate Data Operators (CDO)

This class represents a grid description as specified by the Climate Data Operators (cdo) program, with functionality to read and write grid descriptions from files.

approx_equal(other: CdoGriddes, rtol=1e-05, atol=1e-08) bool#

Approximate equality testing, with absolute (atol) and relative (rtol) tolerance

static from_dataset(ds: Dataset | DataArray) CdoGriddes#
static from_file(file: str | Path, base: CdoGriddes | None = None, **kwargs) CdoGriddes#
get_bbox() Bbox#
gridsize: int#
gridtype: str#
write(file: str | Path)#
xfirst: float#
xinc: float#
xlongname: str = 'longitude'#
xname: str#
xsize: int#
xunits: str = 'degrees_east'#
yfirst: float#
yinc: float#
ylongname: str = 'latitude'#
yname: str#
ysize: int#
yunits: str = 'degrees_north'#
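
CDO grid description files are plain text with one key = value pair per line. The sketch below shows how a subset of the fields listed above might be serialised; GriddesSketch and to_text are hypothetical, and the exact output of CdoGriddes.write() is not documented here, so treat the layout as an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class GriddesSketch:
    gridtype: str
    gridsize: int
    xsize: int
    ysize: int
    xname: str
    yname: str
    xfirst: float
    xinc: float
    yfirst: float
    yinc: float

    def to_text(self) -> str:
        """Serialise fields as `key = value` lines, in declaration order."""
        return "\n".join(f"{f.name} = {getattr(self, f.name)}" for f in fields(self))


# A made-up 2 x 2 regular longitude-latitude grid at 0.25 degree spacing
grid = GriddesSketch("lonlat", 4, 2, 2, "longitude", "latitude", 102.0, 0.25, 8.0, 0.25)
text = grid.to_text()
print(text)
```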

geoglue.zonal_stats module#

Perform zonal statistics using exactextract

geoglue.zonal_stats.zonal_stats(da: DataArray, geom: GeoDataFrame, operation: str = 'mean(coverage_weight=area_spherical_km2)', weights: MemoryRaster | None = None, include_cols: list[str] | None = None) DataFrame#

Return zonal statistics for a particular data array.

Note that this function does not perform certain pre-processing steps, as they may not be required in general. See the functions mentioned below for more information.

Parameters:
  • da (xr.DataArray) – xarray DataArray to perform zonal statistics on. Must have ‘latitude’, ‘longitude’ and a time coordinate

  • geom (gpd.GeoDataFrame) – DataFrame containing a geometry column specifying the zones over which to calculate statistics

  • operation (str) – Zonal statistics operation. For a full list of operations, see https://isciences.github.io/exactextract/operations.html. Default operation is to calculate the mean with a spherical area coverage weight.

  • weights (MemoryRaster | None) – Optional, if specified, uses the specified raster to perform weighted zonal statistics.

  • include_cols (list[str] | None) – Optional, if specified, only includes these columns. If not specified, returns all columns except the geometry column

Returns:

The DataFrame specified by the geom parameter, with one additional column, value, containing the zonal statistic for the corresponding geometry.

Return type:

pd.DataFrame

See also

zonal_stats_xarray

Version of this function that returns an xarray DataArray

geoglue.util.sort_lonlat

Function to sort latitude and longitude

geoglue.util.crop_dataset_to_geometry

Function to crop dataset to geometry if dataset and geometry do not match
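
To give a feel for the default operation, the sketch below computes a mean for one zone in which each cell is weighted by the area it contributes, taken here as coverage fraction times cell area. The function area_weighted_mean and the numbers are hypothetical; exactextract computes coverage fractions and spherical cell areas itself.

```python
def area_weighted_mean(values, coverage, cell_area_km2):
    """Mean of cell values weighted by covered area (fraction * cell area)."""
    weights = [c * a for c, a in zip(coverage, cell_area_km2)]
    total = sum(weights)
    if total == 0:
        raise ValueError("zone does not overlap any cell")
    return sum(v * w for v, w in zip(values, weights)) / total


# One cell fully inside the zone, one cell half covered
zone_mean = area_weighted_mean([10.0, 20.0], [1.0, 0.5], [100.0, 100.0])
print(zone_mean)
```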

geoglue.zonal_stats.zonal_stats_xarray(da: DataArray, geom: GeoDataFrame, operation: str = 'mean(coverage_weight=area_spherical_km2)', weights: MemoryRaster | None = None, region_col: str | None = None) DataArray#

Return zonal statistics for a DataArray.

Note that this function does not perform certain pre-processing steps, as they may not be required in general. See the functions mentioned below for more information.

Parameters:
  • da (xr.DataArray) – xarray DataArray to perform zonal statistics on. Must have ‘latitude’, ‘longitude’ and a time coordinate

  • geom (gpd.GeoDataFrame) – DataFrame containing a geometry column specifying the zones over which to calculate statistics

  • operation (str) – Zonal statistics operation. For a full list of operations, see https://isciences.github.io/exactextract/operations.html. Default operation is to calculate the mean with a spherical area coverage weight.

  • weights (MemoryRaster | None) – Optional, if specified, uses the specified raster to perform weighted zonal statistics.

  • region_col (str | None) – Column to use as elements of the region coordinate, optional. If not specified, is set to the first column in the geometry that has unique values for each row.

Returns:

DataArray with region and date as coordinates

Return type:

xr.DataArray

See also

zonal_stats

Version of this function that returns a DataFrame

geoglue.util.sort_lonlat

Function to sort latitude and longitude

geoglue.util.crop_dataset_to_geometry

Function to crop dataset to geometry if dataset and geometry do not match