geoglue package#
Submodules#
geoglue.cds module#
This module uses ECMWF’s cdsapi to download ERA5 hourly data and provides
utilities to time-shift the data to a particular timezone.
- class geoglue.cds.CdsDataset(instant: xr.Dataset, accum: xr.Dataset)#
Bases:
NamedTuple
Tuple containing instant and accumulated variables from cdsapi
- accum: Dataset#
Accumulated variables, such as total precipitation and surface solar radiation
- assign_coords(coords: dict) CdsDataset#
Assigns coordinates to instant and accumulated variable datasets
- daily() CdsDataset#
Returns CdsDataset corresponding to daily aggregation, mean for instant, sum for accumulated variables
- daily_max() Dataset#
Daily maximum of instant variable dataset
- daily_min() Dataset#
Daily minimum of instant variable dataset
- equals(other: CdsDataset) bool#
- get_time_dim() str#
Returns time dimension of the dataset
- instant: Dataset#
Instant variables such as temperature and wind speed
- property is_hourly: bool#
Returns whether dataset has hourly intervals
- isel(*args, **kwargs) CdsDataset#
Select slices by index from both instant and accumulated datasets
- sel(*args, **kwargs) CdsDataset#
Select slices from both instant and accumulated datasets
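The aggregation rule described for daily() (mean for instant variables, sum for accumulated ones) can be sketched with plain Python lists standing in for xarray datasets. The helper name daily_aggregate and the fixed chunking into 24 values per day are assumptions made for illustration; this is not geoglue's implementation.

```python
from statistics import fmean

def daily_aggregate(hourly, how):
    """Collapse hourly values (24 per day) into one value per day.

    how="mean" mirrors instant variables, how="sum" mirrors accumulated ones.
    """
    if len(hourly) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly data")
    reducers = {"mean": fmean, "sum": sum, "min": min, "max": max}
    reduce_day = reducers[how]
    return [reduce_day(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]

# Two days of hourly data: day 1 is all 1.0, day 2 is all 2.0
hourly = [1.0] * 24 + [2.0] * 24
instant_daily = daily_aggregate(hourly, "mean")  # one mean per day
accum_daily = daily_aggregate(hourly, "sum")     # one total per day
```

The "min" and "max" reducers correspond to what daily_min() and daily_max() are described as returning for the instant dataset.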
- class geoglue.cds.CdsPath(instant: Path | None, accum: Path | None)#
Bases:
NamedTuple
Tuple containing paths to instant and accumulated variables from cdsapi
- accum: Path | None#
Path to accumulated variable dataset
- as_dataset(drop_vars: list[str] = ['number', 'expver', 'surface']) CdsDataset#
Returns opened datasets for instant and accumulated variables
- Parameters:
drop_vars – Variables to drop, default=[‘number’, ‘expver’, ‘surface’]
- Returns:
Dataset corresponding to CdsPath
- Return type:
CdsDataset
- exists() bool#
Returns True if dataset exists
- instant: Path | None#
Path to instant variable dataset
- class geoglue.cds.DatasetPool(paths: Iterable[Path], shift_hours: int = 0, stub: str = 'era5')#
Bases:
object
Collection of ERA5 reanalysis data
- get_current_year(start_date: date | str, end_date: date | str) CdsDataset#
- path(year: int, month: int | None = None, part: bool = False) CdsPath#
Returns CdsPath corresponding to a particular year
- path_min_part_year(year: int) CdsPath#
Returns CdsPath of the earliest available month of a partially downloaded year
- weekly_reduce(year: int, vartype: Literal['instant', 'accum'], how_daily: Literal['mean', 'min', 'max', 'sum'] | None = None, how_weekly: Literal['mean', 'min', 'max', 'sum'] | None = None, window: int = 0, time_dim: str = 'valid_time') Dataset#
Returns aggregated weekly dataset, time-shifted to local timezone.
Dataset is aggregated to isoweeks, with week starting on Monday.
- Parameters:
year – Year to return weekly dataset for
vartype – One of instant, accum to select instantaneous or accumulative variables
how_daily – One of ‘min’, ‘max’, ‘mean’, default=’mean’. Operation to aggregate from hourly to daily data. Ignored for accum vars, where sum is used.
how_weekly – One of ‘min’, ‘max’, ‘mean’, default=’mean’. Operation to aggregate from daily to weekly data. Ignored for accum vars, where sum is used.
window – Number of weeks to include before the first ISO week (first Monday of the year). This is useful when performing rolling operations which require window elements to be present to avoid NaNs.
time_dim – Time dimension to use, default=’valid_time’
- Returns:
Dataset resampled to weekly frequency, with weeks starting on Monday (ISO weeks)
- Return type:
xr.Dataset
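The ISO-week grouping that weekly_reduce describes (weeks starting on Monday) can be sketched with the standard library alone. The function name weekly_groups and the dict-of-dates input are hypothetical; the real method works on xarray datasets and also handles the time shift and the window parameter, which this sketch leaves out.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_groups(daily, how="mean"):
    """Group {date: value} into ISO weeks (Monday start) and reduce each week."""
    weeks = defaultdict(list)
    for day in sorted(daily):
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        weeks[(iso[0], iso[1])].append(daily[day])
    if how == "sum":
        return {week: sum(vals) for week, vals in weeks.items()}
    return {week: sum(vals) / len(vals) for week, vals in weeks.items()}

# 2024-01-01 is a Monday, so these 14 days fill ISO weeks 1 and 2 of 2024
start = date(2024, 1, 1)
daily = {start + timedelta(days=i): float(i) for i in range(14)}
weekly_mean = weekly_groups(daily, "mean")
weekly_sum = weekly_groups(daily, "sum")
```

Keying on the ISO year as well as the ISO week matters at year boundaries, where the first days of January can belong to the last ISO week of the previous year.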
- class geoglue.cds.ReanalysisSingleLevels(region: ZonedBaseRegion, variables: list[str], path: Path | None = None, stub: str = 'era5', data_format: Literal['grib', 'netcdf'] = 'grib', admin_in_name: bool = False)#
Bases:
object
Fetch ERA5 reanalysis data from cdsapi for a particular country
- Parameters:
region (Region) – Region for which to download data
variables (list[str]) – List of variables to fetch
path (Path | None) – Data path to download data to, optional. If not specified, downloads data to the default path, ~/.local/share/geoglue.
stub (str) – Stub to use in filename, default=`era5`. This is used as part of the downloaded filename, e.g. VNM-2-2020-stub.accum.nc
data_format (Literal['grib', 'netcdf']) – Data format to download files in, one of grib or netcdf, default=`grib`. Downloading data in GRIB format allows downloading more variables. GRIB files are converted to netCDF, so both options result in identical data files.
- get(year: int, skip_exists: bool = True) CdsPath | None#
Fetches hourly data for a particular year.
An API key is needed for this function to work, see instructions at https://cds.climate.copernicus.eu/how-to-api
- Parameters:
year – Data is downloaded for this year
skip_exists – Skip downloading if zipfile or extracted contents exist, default True
- Returns:
Path of netCDF file that was written to disk
- Return type:
CdsPath | None
- get_current_year(start_date: date | str, end_date: date | str, skip_exists: bool = True) list[CdsPath] | None#
Fetches hourly data for a particular date range for the current year
- get_dataset_pool() DatasetPool#
Returns DatasetPool corresponding to downloaded data
- geoglue.cds.concat(a: CdsDataset, b: CdsDataset, time_dim: str = 'valid_time') CdsDataset#
- geoglue.cds.era5_extract_hourly_data(file: Path, extract_path: Path) CdsPath#
Extracts hourly data from downloaded zip file
- Parameters:
file – zip file to open
extract_path – Path to extract to
- Returns:
Path to extracted dataset
- Return type:
CdsPath
- geoglue.cds.get_latest_era5_date() date#
Gets latest date when ERA5 data is available
ERA5 releases data with a lag of 5 days
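The 5-day lag stated above amounts to simple date arithmetic. The sketch below uses a hypothetical function name and an injectable today argument so that it can be checked against a fixed date; how geoglue determines the date internally is not shown in this reference.

```python
from datetime import date, timedelta

ERA5_LAG_DAYS = 5  # lag stated in the description above

def latest_era5_date(today=None):
    """Return the most recent date for which ERA5 data should be available."""
    today = today or date.today()
    return today - timedelta(days=ERA5_LAG_DAYS)

# Crossing a leap-year February boundary
latest = latest_era5_date(date(2024, 3, 3))
```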
- geoglue.cds.get_timezone_offset_hours(offset: str) int | None#
Returns timezone offset in hours. Non-hourly offsets return None
- Parameters:
offset – String in the form [+-]HH:MM
Examples
>>> get_timezone_offset_hours("+05:00")
5
>>> get_timezone_offset_hours("-04:00")
-4
>>> get_timezone_offset_hours("+01:30")  # returns None
- Returns:
Timezone offset in hours, or None for a fractional offset
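A minimal sketch of the parsing behaviour described above, written from the documented input format and examples rather than from geoglue's source. The function name is hypothetical, and raising ValueError on malformed input is an assumption of this sketch.

```python
import re

def timezone_offset_hours(offset):
    """Parse "[+-]HH:MM" and return whole hours, or None for fractional offsets."""
    match = re.fullmatch(r"([+-])(\d{2}):(\d{2})", offset)
    if match is None:
        # Assumption: malformed strings are rejected rather than returning None
        raise ValueError(f"offset must look like [+-]HH:MM, got {offset!r}")
    sign, hours, minutes = match.groups()
    if int(minutes) != 0:
        return None  # e.g. +01:30 is not a whole number of hours
    return int(hours) if sign == "+" else -int(hours)
```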
- geoglue.cds.grib_to_netcdf(file: Path, path: Path | None = None) CdsPath#
Converts GRIB to netCDF
- Parameters:
file – GRIB file to open
path – Parent folder to save netCDF files, optional. If not specified write to the same folder as the GRIB file
- Returns:
Path to converted netCDF dataset
- Return type:
CdsPath
- geoglue.cds.is_end_of_month(d: date) bool#
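is_end_of_month is listed without a description. Going by its name and signature alone, a plausible reading is a check for the last calendar day of a month; the sketch below implements that reading as an assumption, under a hypothetical name.

```python
import calendar
from datetime import date

def end_of_month(d):
    """Return True when d is the last calendar day of its month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    return d.day == calendar.monthrange(d.year, d.month)[1]
```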
- geoglue.cds.timeshift_hours(ds1: Dataset, ds2: Dataset, shift: int, dim: str = 'valid_time') Dataset#
Timeshift dataset by shift hours.
If shift is a positive integer (longitude east), then that many hours are taken from the end of ds1 and attached onto ds2, with the end of ds2 clipped to ensure that ds2 size remains the same.
If shift is a negative integer (longitude west), then that many hours are taken from the beginning of ds2 and attached onto ds1, with the beginning of ds1 clipped to ensure that ds1 size remains the same.
Checks are performed to ensure that ds1 and ds2 are contiguous in time, and that they are hourly data.
- Parameters:
ds1 – First dataset, comprises most of the data in returned timeshifted dataset when shift < 0
ds2 – Second dataset, comprises most of the data in returned timeshifted dataset when shift > 0
shift – Hours to timeshift, from [-12, 12], excluding 0.
dim – Name of the time dimension, optional
- Returns:
Timeshifted dataset
- Raises:
ValueError – Raised when shift is zero (no timeshift would be performed), or when shift is not in [-12, 12]
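The splice described for timeshift_hours can be illustrated with plain lists of hourly values in place of xarray datasets. This is a sketch of the documented behaviour only: the name timeshift_lists is hypothetical, and the contiguity and hourly-frequency checks are omitted.

```python
def timeshift_lists(ds1, ds2, shift):
    """Shift hourly values by `shift` hours across two contiguous periods."""
    if shift == 0 or not -12 <= shift <= 12:
        raise ValueError("shift must be a non-zero integer in [-12, 12]")
    if shift > 0:
        # Take the last `shift` hours of ds1, prepend them to ds2,
        # and clip the end of ds2 so the length stays the same.
        return ds1[-shift:] + ds2[:-shift]
    # Take the first |shift| hours of ds2, append them to ds1,
    # and clip the beginning of ds1 so the length stays the same.
    n = -shift
    return ds1[n:] + ds2[:n]

day1 = list(range(0, 24))    # hours 0..23 of the first period
day2 = list(range(24, 48))   # hours 24..47 of the second period
east = timeshift_lists(day1, day2, 2)    # begins 2 hours before day2
west = timeshift_lists(day1, day2, -2)   # ends 2 hours into day2
```

In both directions the result has the same length as one input period, which matches the note that the dataset size remains the same.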
- geoglue.cds.timeshift_hours_cdsdataset(ds1: CdsDataset, ds2: CdsDataset, shift: int, dim: str = 'valid_time') CdsDataset#
Timeshift CdsDataset by an integer number of hours
This applies timeshift_hours() to the instant and accum parts of a CdsDataset. The main difference from applying timeshift_hours() directly is that the shift value for the accum dataset is offset by -1. This is because accumulated and mean-rate variables represent the hour leading up to the time-stamp: data time-stamped YYYY/MM/DD 00:00 represents the accumulation or mean rate for the period 23:00 to 00:00, that is, the last hour of the date YYYY/MM/DD-1. See https://confluence.ecmwf.int/display/CKB/ERA5+family+post-processed+daily+statistics+documentation for context.
- Parameters:
ds1 – First dataset, comprises most of the data in returned timeshifted dataset when shift <= 0
ds2 – Second dataset, comprises most of the data in returned timeshifted dataset when shift > 0
shift – Hours to timeshift, from [-12, 12], excluding 0.
dim – Name of the time dimension, optional
- Returns:
Timeshifted dataset
- Raises:
ValueError – Raised when shift not in [-12, 12]
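The time-stamp convention for accumulated variables described above can be made concrete with a small sketch. Both helper names are hypothetical, and reading "shift by -1" as shift - 1 is an interpretation of the description, not something confirmed against geoglue's source.

```python
from datetime import datetime, timedelta

def accumulation_period(timestamp):
    """Return the (start, end) of the hour that an accumulated value covers."""
    return timestamp - timedelta(hours=1), timestamp

def accum_shift(shift):
    """Assumed shift for the accum dataset: one hour less than for instant."""
    return shift - 1

# A value stamped at midnight covers the last hour of the previous day
start, end = accumulation_period(datetime(2020, 1, 2, 0, 0))
```

For a region at +07:00, this reading would shift the instant dataset by 7 hours and the accum dataset by 6.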
geoglue.region module#
This module contains the Region class that has functions to fetch geospatial data (from GADM or geoBoundaries) for a particular country, as well as structures to make working with arbitrary shapefiles easier. It also supports calculating extents or geospatial bounds, and calculating timezone offsets.
- class geoglue.region.AdministrativeLevel(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin: int, admin_file: str | Path, pk: str)#
Bases:
ZonedBaseRegion
Represents a specific administrative level
- admin: int#
Administrative level
- admin_file: str | Path#
Path to shapefile
- pk: str#
Column ID that is used as primary key to identify regions in shapefile, indexed by administrative level.
- read() GeoDataFrame#
- class geoglue.region.BaseCountry(name: str, url: str, bbox: Bbox, iso3: str | None)#
Bases:
BaseRegion
Base class for all country level classes
- class geoglue.region.BaseRegion(name: str, url: str, bbox: Bbox, iso3: str | None)#
Bases:
object
Base class for all regions containing common fields
- iso3: str | None#
If specified, the ISO3 code of the country that the region is a subdivision of
- name: str#
Region identifier without spaces
- url: str#
URL from which data was downloaded
- class geoglue.region.Country(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin_files: Mapping[int, str | Path], pk: dict[int, str] | str)#
Bases:
Region, BaseCountry
Subclass of Region that restricts name to country ISO3 codes
- admin(adm: int) CountryAdministrativeLevel#
- class geoglue.region.CountryAdministrativeLevel(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin: int, admin_file: str | Path, pk: str)#
Bases:
AdministrativeLevel, BaseCountry
Subclass of AdministrativeLevel that restricts name to country ISO3 codes
- class geoglue.region.Region(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str, admin_files: Mapping[int, str | Path], pk: dict[int, str] | str)#
Bases:
ZonedBaseRegion
Represents a geospatial region with a fixed time zone
- admin(adm: int) AdministrativeLevel#
- admin_files: Mapping[int, str | Path]#
Path to shapefiles, indexed by administrative level
- pk: dict[int, str] | str#
Column ID that is used as primary key to identify regions in shapefile, indexed by administrative level.
If str, is the same for every administrative level
- read_admin(admin: int) GeoDataFrame#
Reads a region shapefile
- class geoglue.region.ZonedBaseRegion(name: str, url: str, bbox: Bbox, iso3: str | None, tz: str)#
Bases:
BaseRegion
Base class for all regions with a fixed time zone
- tz: str#
Timezone offset from UTC.
Expressed as [+-]HH:MM, e.g. +01:00 for CET timezone
- geoglue.region.gadm(iso3: str, localize_date: datetime = datetime.datetime(2022, 1, 1, 0, 0), data_path: Path = PosixPath('/home/docs/.local/share/geoglue'), tzoffset: str | None = None) Country#
Returns GADM Region data
- Parameters:
iso3 (str) – Country ISO3 code
localize_date (datetime.datetime) – Date where timezone is localised to, default=2022-01-01. See get_timezone() for information about this parameter
data_path (Path | None) – Optional. If specified, sets the data path where shapefiles will be downloaded, otherwise defaults to ~/.local/share/geoglue
tzoffset (str | None) – Optional, specifies timezone offset as [+-]HH:MM from UTC. If not specified, it is automatically inferred from the country ISO3 code. Auto-detection is only performed for countries with one timezone, and this parameter is mandatory for countries spanning multiple timezones.
- Returns:
Region data representing GADM information for a country at a particular admin level
- Return type:
Country
- geoglue.region.geoboundaries(iso3: str, localize_date: datetime = datetime.datetime(2022, 1, 1, 0, 0), data_path: Path = PosixPath('/home/docs/.local/share/geoglue'), tzoffset: str | None = None) Region#
Returns geoBoundaries Region data
- Parameters:
iso3 (str) – Country ISO3 code
localize_date (datetime.datetime) – Date where timezone is localised to, default=2022-01-01. See get_timezone() for information about this parameter
data_path (Path | None) – Optional. If specified, sets the data path where shapefiles will be downloaded, otherwise defaults to ~/.local/share/geoglue
tzoffset (str | None) – Optional, specifies timezone offset as [+-]HH:MM from UTC. If not specified, it is automatically inferred from the country ISO3 code. Auto-detection is only performed for countries with one timezone, and this parameter is mandatory for countries spanning multiple timezones.
- Returns:
Region data representing geoBoundaries information for a country at a particular admin level
- Return type:
Region
- geoglue.region.get_region(name: str, file: str | Path | None = None, fallback: Literal['gadm', 'geoboundaries'] = 'gadm', **kwargs) Region#
Returns region from file or fallback to GADM or geoBoundaries
- Parameters:
name (str) – Name of the region, e.g. ‘VNM’, ‘HCMC’
file (str | Path | None) – TOML file from which regions should be read. If not specified, fallback to GADM or geoBoundaries
fallback (Literal["gadm", "geoboundaries"]) – Default fallback provider, used when file is not specified or region name not found in the TOML file
**kwargs – Extra parameters passed to gadm() or geoboundaries()
- Return type:
Region
- geoglue.region.get_timezone(iso3: str, localize_date: datetime) str | None#
Returns unique timezone offset for a country with ISO3 code
- Parameters:
iso3 (str) – ISO3 code of country
localize_date (datetime.datetime) – Date used to localize the timezone obtained from pytz. Timezone names (such as Europe/Berlin) do not have a fixed offset due to daylight savings time changes, and the same timezone can have a different offset, usually in summer months. The exact date when DST starts also varies by year according to local policy shifts. We pick a specific date here to ensure that the localization is reproducible. The date is taken to be in the middle of winter in the Northern hemisphere when DST does not apply and the time offset follows standard time. For countries in the Southern hemisphere, the choice of this date may lead to non-standard (daylight savings) time being used.
- Returns:
Timezone offset as [+-]HH:MM from UTC if unique timezone found, None otherwise
- Return type:
str | None
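The [+-]HH:MM strings that get_timezone returns can be produced from a UTC offset with a few lines of arithmetic. The sketch below covers only that formatting step: the function name is hypothetical, and the offsets are passed in as fixed timedelta values rather than looked up through pytz for a given localize_date.

```python
from datetime import timedelta

def format_utc_offset(offset):
    """Format a UTC offset given as a timedelta into [+-]HH:MM."""
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"

# Europe/Berlin is UTC+1 in winter (standard time) and UTC+2 in summer (DST),
# which is why the localization date changes the reported offset.
winter = format_utc_offset(timedelta(hours=1))
summer = format_utc_offset(timedelta(hours=2))
```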
geoglue.memoryraster module#
MemoryRaster class to read and operate on raster files with metadata entirely in memory.
While rasterio offers low level access to read and manipulate geospatial raster files (such as GeoTIFF), it does not have easy to use higher level functions for standard operations on rasters such as projection to a different coordinate system, resampling, or zonal statistics. This module defines a MemoryRaster class to contain metadata about rasters, conversion to and from rasterio.DatasetReader, and functions for plotting, reprojection and resampling. These functions are intended to make working with rasters in Python as easy as with R’s terra package.
- class geoglue.memoryraster.MemoryRaster(data: ndarray | MaskedArray | DataArray, transform: Affine, crs: str | CRS | None, nodata: int | float, origin_path: Path | None = None, dtype: str = 'float64', driver: str = 'GTiff')#
Bases:
object
Class to operate on rasters in-memory.
While MemoryRaster can be constructed directly by passing the parameters below, in normal practice it is constructed by reading from a GeoTIFF file or xarray object, using read() or from_xarray().
- Parameters:
data (np.ndarray) – Data to consider as raster
transform (affine.Affine) – Affine transformation associated with raster
crs (str | pyproj.crs.CRS | None) – Coordinate reference system associated with raster
nodata (int | float) – Data value indicating NA
origin_path (Path | None) – Path to source file, optional, default=None. This attribute is populated if the MemoryRaster is read from a file
dtype (str) – numpy dtype of array, default=’float64’
driver (str) – rasterio driver, optional, default=’GTiff’
- as_rasterio(zfill: bool = False)#
Returns MemoryRaster as a rasterio dataset
- astype(t) MemoryRaster#
Returns a new MemoryRaster with type cast to t
- checksum() str#
- crop(bbox: Bbox) MemoryRaster#
Crop a MemoryRaster to bounds
- crs: str | CRS | None#
- data: ndarray | MaskedArray | DataArray#
- driver: str = 'GTiff'#
- dtype: str = 'float64'#
- static from_xarray(da: DataArray, c_longitude='longitude', c_latitude='latitude', nodata: int | float | None = None) MemoryRaster#
Creates MemoryRaster from xarray, assumes EPSG:4326
- Parameters:
da – xarray DataArray from which to create MemoryRaster
c_longitude – Longitude axis in dataarray, default=’longitude’
c_latitude – Latitude axis in dataarray, default=’latitude’
nodata – Data value representing NA, optional. If not specified, tries to read from xarray attributes such as GRIB_missingValue, nodata, _FillValue
- Return type:
MemoryRaster
- property griddes: CdoGriddes#
Returns grid description that can be used by cdo to resample
- property height#
Height of raster image
- property is_lonlat#
Returns whether grid is longitude and latitude
- mask(geometry: GeoDataFrame | GeoSeries | list[Polygon], crop: bool = True) MemoryRaster#
Mask raster file with a set of geometries
- Parameters:
geometry – GeoDataFrame or GeoSeries
crop – Whether to crop the extent to the geometry specified, default=True. This is passed directly to rasterio.mask.mask.
- Return type:
MemoryRaster
- max() float#
Maximum value in raster
- min() float#
Minimum value in raster
- nodata: int | float#
- origin_path: Path | None = None#
- plot(cmap: str = 'viridis', fill_nodata=None, **kwargs)#
Plots a MemoryRaster using sensible defaults
- property profile#
- static read(file: str | Path, crs: str | None = None, resampling: Resampling = Resampling.bilinear) MemoryRaster#
Reads from a file supported by rasterio
- Parameters:
file – File to read from, must be openable by rasterio
crs – Coordinate reference system to project to
resampling – If reprojecting to another CRS, resampling strategy to use. Must be a strategy supported by rasterio
- Return type:
MemoryRaster
- resample(dst: MemoryRaster, resampling: Resampling) MemoryRaster#
Resamples source raster to match destination mask
This function is meant to be used for resampling MemoryRaster, usually those created from GeoTIFF files. For data already in netCDF format, we recommend using Climate Data Operator (cdo)’s resampling functions, for which we provide a wrapper in geoglue.resample.
- Parameters:
dst – Destination MemoryRaster
resampling – Resampling method, one of rasterio.enums.Resampling
See also
geoglue.resample
Resample module with wrappers for cdo resample
- property shape#
Shape (width, height) of raster image
- sum() float#
Sum of non-null values in raster
- transform: Affine#
- property width#
Width of raster image
- zonal_stats(geometry: GeoDataFrame, ops: str | list[str] | Callable, weights: MemoryRaster | None = None, **kwargs) DataFrame | GeoDataFrame#
Calculate zonal statistics using exactextract
- Parameters:
geometry (gpd.GeoDataFrame) – Geometry dataframe, usually read from a shapefile
ops (str | list[str] | Callable) – exactextract operation(s) to perform
weights (MemoryRaster | None) – Optional, if specified uses the supplied raster to perform weighted zonal statistics
**kwargs – Extra parameters passed directly to exactextract.exact_extract()
- Returns:
A copy of the geometry dataframe with additional column(s) with the zonal statistics requested. Each separate zonal statistic is given a column in the data
- Return type:
pd.DataFrame | gpd.GeoDataFrame
- geoglue.memoryraster.get_numpy_dtype(t: str)#
- geoglue.memoryraster.grid_size(da: DataArray, axis: str) float#
geoglue.resample module#
- geoglue.resample.remapbil_sparse(infile: str | Path, griddes_file: str, outfile: str | Path, eps: float = 1e-06, tmp_path: Path = PosixPath('.')) Path#
Sparse bilinear resampling
Resampling a raster with standard CDO remapbil can cause issues such as NaNs moving into coastal regions for covariates where the variable is only defined on land (soil moisture, vegetation). This implementation uses a zero-filled resampled DataArray divided by a resampled mask (non-NA=1, NA=0). A low epsilon threshold is used to mask out small contributions, which avoids blowing up the output near edges with NaN cells.
- Parameters:
infile – Input data file or xarray.DataArray
griddes_file – Target griddes file
outfile – Output resampled file path, if not specified, generated from infile by
eps – epsilon value that is used as a threshold for mask
tmp_path – Temporary folder to use for intermediate files, defaults to $CWD
- Returns:
Returns sparse resampled DataArray
- Return type:
xr.DataArray
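The zero-fill and mask-normalisation idea behind remapbil_sparse can be shown on a one-dimensional example, using linear interpolation to cell midpoints as a stand-in for CDO's bilinear remapping. Everything here (function names, the midpoint grid, plain lists instead of files) is an assumption made for illustration; only the divide-by-resampled-mask step and the eps threshold come from the description above.

```python
import math

def interp_midpoints(values):
    """Linear interpolation to the midpoint of each neighbouring pair."""
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def sparse_interp(values, eps=1e-6):
    """Interpolate while ignoring NaN cells instead of letting them spread."""
    filled = [0.0 if math.isnan(v) else v for v in values]  # zero-filled data
    mask = [0.0 if math.isnan(v) else 1.0 for v in values]  # non-NA=1, NA=0
    num = interp_midpoints(filled)
    den = interp_midpoints(mask)
    # Keep a cell only when enough valid data contributed to it
    return [n / d if d > eps else math.nan for n, d in zip(num, den)]

land = [2.0, 4.0, math.nan, math.nan]  # valid "land" cells next to NaN "ocean"
plain = interp_midpoints(land)   # NaN spreads into the coastal midpoint
sparse = sparse_interp(land)     # coastal midpoint keeps the valid neighbour
```

Dividing by the interpolated mask rescales each output cell by the fraction of valid input that contributed to it, so a midpoint with one valid neighbour takes that neighbour's value instead of becoming NaN.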
- geoglue.resample.resample(resampling: Literal['remapbil', 'remapdis', 'sremapbil'], infile: str | Path, target: MemoryRaster | CdoGriddes | DataArray, outfile: str | Path | None = None, skip_exists=True) Path#
Resamples input file to output file using CDO’s resampling to a target raster grid
- Parameters:
resampling – Resampling type to use, must be one of remapbil, remapdis or sremapbil:
- remapbil is bilinear resampling
- remapdis is distance-weighted average remapping
- sremapbil is sparse bilinear resampling that uses a non-NaN/NaN mask to normalise values to avoid NaN spreading from land-ocean boundaries
infile – Input file to read
target – Target MemoryRaster whose grid to resample to, or a CdoGriddes, or an xr.DataArray
outfile – Output resampled file path, if not specified, generated from infile by affixing .resampled to the path
skip_exists – Whether to skip resampling if outfile exists (default=True)
- Returns:
Resampled dataset path
- Return type:
Path
- geoglue.resample.resampled_dataset(resampling: Literal['remapbil', 'remapdis'], data: str | Path | Dataset, target: MemoryRaster | DataArray) Iterator[Dataset]#
Context manager version of geoglue.resample.resample().
- Parameters:
resampling – Resampling type to use, must be one of remapbil or remapdis
data – Input file to read or xarray dataset
target – Target MemoryRaster or xr.DataArray whose grid to resample to
- Yields:
xr.Dataset – Resampled dataset
Example
>>> from geoglue.resample import resampled_dataset
>>> from geoglue import MemoryRaster
>>> pop = MemoryRaster.read("VNM_ppp_2000_1km_Aggregated_UNadj.tif")
>>> with resampled_dataset("remapbil", "somefile.nc", pop) as ds:
...     print(ds)
geoglue.types module#
Common types used in geoglue
- class geoglue.types.Bbox(minx: int | float, miny: int | float, maxx: int | float, maxy: int | float)#
Bases:
NamedTuple
Geographic bounding box
- as_polygon() Polygon#
- property geodetic_area_km2: float#
- property lat_slice: slice#
- property lon_slice: slice#
- maxx: int | float#
Eastern bounds, maximum longitude
- maxy: int | float#
Northern bounds, maximum latitude
- minx: int | float#
Western bounds, minimum longitude
- miny: int | float#
Southern bounds, minimum latitude
- property safe_name: str#
- to_list(spec: str) list[int | float]#
Returns Bbox converted to list of numbers in different order
The default and standard bbox order is minx,miny,maxx,maxy. Certain applications expect the bbox coordinates in a different order. This method takes a spec string and returns a list in that order.
- Parameters:
spec (str) – Either a fully specified string like “maxx,minx,maxy,miny” or a shorthand. Supported shorthands are “cdsapi” for supplying bbox parameters to ECMWF’s cdsapi
- Returns:
Returns a list of bbox coordinates in specified order
- Return type:
list[int | float]
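A minimal sketch of the reordering that to_list describes, using a stand-in class rather than geoglue's Bbox. The class name BboxSketch is hypothetical, and the expansion of the "cdsapi" shorthand to north, west, south, east is an assumption based on the order the CDS API expects for its area request key.

```python
from typing import NamedTuple

class BboxSketch(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float

    def to_list(self, spec):
        """Return coordinates in the order named by a comma-separated spec."""
        # Assumption: "cdsapi" expands to north, west, south, east
        shorthands = {"cdsapi": "maxy,minx,miny,maxx"}
        names = [n.strip() for n in shorthands.get(spec, spec).split(",")]
        unknown = [n for n in names if n not in self._fields]
        if unknown:
            raise ValueError(f"unknown bbox fields in spec: {unknown}")
        return [getattr(self, n) for n in names]

bbox = BboxSketch(minx=102.0, miny=8.0, maxx=110.0, maxy=24.0)
standard = bbox.to_list("minx,miny,maxx,maxy")
for_cds = bbox.to_list("cdsapi")
```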
- class geoglue.types.CdoGriddes(gridtype: str, gridsize: int, xsize: int, ysize: int, xname: str, yname: str, xfirst: float, xinc: float, yfirst: float, yinc: float, ylongname: str = 'latitude', yunits: str = 'degrees_north', xlongname: str = 'longitude', xunits: str = 'degrees_east')#
Bases:
object
Grid specification used by Climate Data Operators (CDO)
This class represents a grid description as specified by the Climate Data Operators (cdo) program, with functionality to read and write grid descriptions from files.
- approx_equal(other: CdoGriddes, rtol=1e-05, atol=1e-08) bool#
Approximate equality testing, with absolute (atol) and relative (rtol) tolerance
- static from_dataset(ds: Dataset | DataArray) CdoGriddes#
- static from_file(file: str | Path, base: CdoGriddes | None = None, **kwargs) CdoGriddes#
- gridsize: int#
- gridtype: str#
- write(file: str | Path)#
- xfirst: float#
- xinc: float#
- xlongname: str = 'longitude'#
- xname: str#
- xsize: int#
- xunits: str = 'degrees_east'#
- yfirst: float#
- yinc: float#
- ylongname: str = 'latitude'#
- yname: str#
- ysize: int#
- yunits: str = 'degrees_north'#
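CDO grid description files are plain text with one key = value pair per line, which maps naturally onto the fields listed above. The sketch below renders a reduced set of those fields from a stand-in dataclass; the class name GriddesSketch and the to_text method are hypothetical and do not reflect how CdoGriddes.write() is implemented.

```python
from dataclasses import dataclass, fields

@dataclass
class GriddesSketch:
    gridtype: str
    gridsize: int
    xsize: int
    ysize: int
    xfirst: float
    xinc: float
    yfirst: float
    yinc: float

    def to_text(self):
        """Render the grid as `key = value` lines, one per field."""
        return "\n".join(
            f"{f.name} = {getattr(self, f.name)}" for f in fields(self)
        )

# A 3 x 2 regular lon/lat grid at 0.25 degree spacing
grid = GriddesSketch("lonlat", 6, 3, 2, 102.0, 0.25, 8.0, 0.25)
text = grid.to_text()
```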
geoglue.zonal_stats module#
Perform zonal statistics using exactextract
- geoglue.zonal_stats.zonal_stats(da: DataArray, geom: GeoDataFrame, operation: str = 'mean(coverage_weight=area_spherical_km2)', weights: MemoryRaster | None = None, include_cols: list[str] | None = None) DataFrame#
Return zonal statistics for a particular data array.
Note that this function does not perform certain pre-processing steps, as they may not be required in general. See the functions mentioned below for more information.
- Parameters:
da (xr.DataArray) – xarray DataArray to perform zonal statistics on. Must have ‘latitude’, ‘longitude’ and a time coordinate
geom (gpd.GeoDataFrame) – DataFrame containing a geometry column specifying the zones over which to calculate statistics
operation (str) – Zonal statistics operation. For a full list of operations, see https://isciences.github.io/exactextract/operations.html. Default operation is to calculate the mean with a spherical area coverage weight.
weights (MemoryRaster | None) – Optional, if specified, uses the specified raster to perform weighted zonal statistics.
include_cols (list[str] | None) – Optional, if specified, only includes these columns. If not specified, returns all columns except the geometry column
- Returns:
The DataFrame specified by the geom parameter, with one additional column, value, containing the zonal statistic for the corresponding geometry.
- Return type:
pd.DataFrame
See also
zonal_stats_xarray
Version of this function that returns an xarray DataArray
geoglue.util.sort_lonlat
Function to sort latitude and longitude
geoglue.util.crop_dataset_to_geometry
Function to crop dataset to geometry if dataset and geometry do not match
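The default operation, a mean with a spherical area coverage weight, can be understood as weighting each grid cell by the fraction of the cell covered by the zone times the cell's area on a sphere. The sketch below works that out for two cells in pure Python. The function names and the Earth radius constant are assumptions, and exactextract's own computation may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch

def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a lon/lat cell on a sphere."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band

def weighted_mean(values, coverage, areas):
    """Mean of cell values weighted by coverage fraction times cell area."""
    weights = [c * a for c, a in zip(coverage, areas)]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# A 1-degree cell at the equator is larger than one at 60 degrees north,
# so its value counts for more in the zonal mean.
areas = [cell_area_km2(0, 1, 1), cell_area_km2(60, 61, 1)]
mean = weighted_mean([10.0, 20.0], coverage=[1.0, 1.0], areas=areas)
```

With equal areas the result reduces to the ordinary mean, which is why the spherical weight matters mainly for zones spanning a wide range of latitudes.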
- geoglue.zonal_stats.zonal_stats_xarray(da: DataArray, geom: GeoDataFrame, operation: str = 'mean(coverage_weight=area_spherical_km2)', weights: MemoryRaster | None = None, region_col: str | None = None) DataArray#
Return zonal statistics for a DataArray.
Note that this function does not perform certain pre-processing steps, as they may not be required in general. See the functions mentioned below for more information.
- Parameters:
da (xr.DataArray) – xarray DataArray to perform zonal statistics on. Must have ‘latitude’, ‘longitude’ and a time coordinate
geom (gpd.GeoDataFrame) – DataFrame containing a geometry column specifying the zones over which to calculate statistics
operation (str) – Zonal statistics operation. For a full list of operations, see https://isciences.github.io/exactextract/operations.html. Default operation is to calculate the mean with a spherical area coverage weight.
weights (MemoryRaster | None) – Optional, if specified, uses the specified raster to perform weighted zonal statistics.
region_col (str | None) – Column to use as elements of the region coordinate, optional. If not specified, is set to the first column in the geometry that has unique values for each row.
- Returns:
DataArray with region and date as coordinates
- Return type:
xr.DataArray
See also
zonal_stats
Version of this function that returns a DataFrame
geoglue.util.sort_lonlat
Function to sort latitude and longitude
geoglue.util.crop_dataset_to_geometry
Function to crop dataset to geometry if dataset and geometry do not match