STAC Catalogs¶
Overview¶
DummyXarray provides comprehensive support for STAC (SpatioTemporal Asset Catalog), a standardized way to describe geospatial assets. STAC support enables you to:
- Export datasets as STAC Items and Collections
- Import datasets from STAC catalogs
- Validate spatial and temporal metadata
- Integrate with the broader STAC ecosystem
Installation¶
STAC support requires optional dependencies:
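The extra name below is an assumption (check the project's packaging metadata for the exact spelling):
pip install "dummyxarray[stac]"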
This installs:
- pystac - Core STAC library
- python-dateutil - Date parsing utilities
Core Concepts¶
STAC Items¶
A STAC Item represents a single spatiotemporal asset (e.g., a dataset for a specific time and location).
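For orientation, here is a minimal Item built by hand with pystac. This is only a sketch (the id, geometry, and timestamp are illustrative); it shows the pieces that ds.to_stac_item() assembles for you:
import pystac
from datetime import datetime, timezone
# The anatomy of an Item: an id, a GeoJSON geometry, a bbox,
# a nominal datetime, and a free-form properties dict.
item = pystac.Item(
    id='example-item',
    geometry={'type': 'Polygon',
              'coordinates': [[[-90, -45], [90, -45], [90, 45],
                               [-90, 45], [-90, -45]]]},
    bbox=[-90, -45, 90, 45],
    datetime=datetime(2020, 1, 1, tzinfo=timezone.utc),
    properties={},
)
print(item.to_dict()['type'])  # 'Feature'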
STAC Collections¶
A STAC Collection groups multiple related STAC Items together with shared metadata.
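In plain pystac this looks roughly like the sketch below (ids and extents are illustrative): a Collection is shared metadata plus the spatial and temporal extent that its Items fall within:
import pystac
from datetime import datetime, timezone
# A Collection requires an id, a description, and an extent.
extent = pystac.Extent(
    spatial=pystac.SpatialExtent([[-180, -90, 180, 90]]),
    temporal=pystac.TemporalExtent([[datetime(2020, 1, 1, tzinfo=timezone.utc), None]]),
)
collection = pystac.Collection(
    id='example-collection',
    description='Items sharing this spatial/temporal extent',
    extent=extent,
    license='CC-BY-4.0',
)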
Basic Usage¶
Creating STAC Items¶
Convert a dataset to a STAC Item:
from dummyxarray import DummyDataset
import numpy as np
# Create a dataset with spatial coordinates
ds = DummyDataset()
ds.add_dim('time', 10)
ds.add_dim('lat', 64)
ds.add_dim('lon', 128)
ds.add_coord('lat', ['lat'], data=np.linspace(-45, 45, 64))
ds.add_coord('lon', ['lon'], data=np.linspace(-90, 90, 128))
ds.add_variable('temperature', ['time', 'lat', 'lon'],
                attrs={'units': 'K', 'long_name': 'Air Temperature'})
# Add metadata
ds.attrs.update({
    'title': 'Climate Model Output',
    'description': 'Temperature data from climate simulation',
    'time_coverage_start': '2020-01-01T00:00:00Z',
    'time_coverage_end': '2020-12-31T23:59:59Z'
})
# Convert to STAC Item
item = ds.to_stac_item(
    id='climate-2020',
    geometry={
        'type': 'Polygon',
        'coordinates': [[
            [-90, -45], [90, -45], [90, 45], [-90, 45], [-90, -45]
        ]]
    }
)
print(f"STAC Item ID: {item.id}")
print(f"Geometry: {item.geometry['type']}")
print(f"Properties: {list(item.properties.keys())}")
Loading from STAC Items¶
Import a dataset from a STAC Item:
# Load from STAC Item
loaded_ds = DummyDataset.from_stac_item(item)
print(f"Dimensions: {loaded_ds.dims}")
print(f"Variables: {list(loaded_ds.variables.keys())}")
print(f"Title: {loaded_ds.attrs['title']}")
Creating STAC Collections¶
Convert a dataset to a STAC Collection:
# Create a collection
collection = ds.to_stac_collection(
    id='climate-collection',
    description='Climate model output collection',
    license='CC-BY-4.0'
)
print(f"Collection ID: {collection.id}")
print(f"Description: {collection.description}")
print(f"Number of items: {len(list(collection.get_items()))}")
Multi-Asset Collections¶
Create a collection from multiple datasets:
# Create multiple datasets
temperature_ds = DummyDataset()
temperature_ds.attrs['title'] = 'Temperature Data'
# ... configure temperature_ds ...
precipitation_ds = DummyDataset()
precipitation_ds.attrs['title'] = 'Precipitation Data'
# ... configure precipitation_ds ...
wind_ds = DummyDataset()
wind_ds.attrs['title'] = 'Wind Data'
# ... configure wind_ds ...
# Create collection from multiple datasets
collection = DummyDataset.create_stac_collection(
    [temperature_ds, precipitation_ds, wind_ds],
    collection_id='climate-2020',
    description='Complete climate dataset for 2020',
    license='CC-BY-4.0'
)
print(f"Collection contains {len(list(collection.get_items()))} items")
File I/O¶
Saving STAC Items¶
Save a STAC Item to a JSON file:
# Save to file
ds.save_stac_item(
    'climate_item.json',
    id='climate-2020',
    geometry={
        'type': 'Polygon',
        'coordinates': [[[-90, -45], [90, -45], [90, 45], [-90, 45], [-90, -45]]]
    }
)
# Load from file
loaded_ds = DummyDataset.load_stac_item('climate_item.json')
Saving STAC Collections¶
Save a STAC Collection to a JSON file:
# Save collection
ds.save_stac_collection(
    'climate_collection.json',
    id='climate-collection',
    description='Climate model output'
)
# Load collection
loaded_ds = DummyDataset.load_stac_collection('climate_collection.json')
Spatial Metadata¶
Adding Spatial Extent¶
Explicitly add spatial extent information:
ds = DummyDataset()
# Add spatial extent
ds.add_spatial_extent(
    lat_bounds=(-60, 85),
    lon_bounds=(-180, 180)
)
print(f"Spatial bounds: {ds.attrs['geospatial_bounds']}")
Automatic Bbox Extraction¶
DummyXarray automatically extracts bounding boxes from:
- geospatial_bounds attribute (highest priority)
- Lat/lon coordinates (fallback)
ds = DummyDataset()
ds.add_dim('lat', 64)
ds.add_dim('lon', 128)
# Bbox will be automatically extracted from these coordinates
ds.add_coord('lat', ['lat'], data=np.linspace(-45, 45, 64))
ds.add_coord('lon', ['lon'], data=np.linspace(-90, 90, 128))
item = ds.to_stac_item(id='auto-bbox')
print(f"Automatic bbox: {item.bbox}") # [-90, -45, 90, 45]
Validating Spatial Metadata¶
Validate spatial metadata before creating STAC Items:
# Validate spatial metadata
validation = ds.validate_spatial_metadata()
if validation['valid']:
    print("✓ Spatial metadata is valid")
else:
    print("✗ Spatial metadata issues:")
    for issue in validation['issues']:
        print(f" - {issue}")
Temporal Metadata¶
Inferring Temporal Extent¶
Automatically infer temporal extent from time coordinates:
ds = DummyDataset()
ds.add_dim('time', 366)  # 2020 is a leap year, so 366 daily steps
# Add time coordinate with units
ds.add_coord('time', ['time'],
             data=np.arange(366),
             attrs={'units': 'days since 2020-01-01'})
# Infer temporal extent
start, end = ds.infer_temporal_extent()
print(f"Start: {start}")  # 2020-01-01
print(f"End: {end}")      # 2020-12-31
print(f"Coverage start: {ds.attrs['time_coverage_start']}")
print(f"Coverage end: {ds.attrs['time_coverage_end']}")
Integration with Intake Catalogs¶
STAC and Intake catalogs can be used together:
# Save both catalog types
ds.save_intake_catalog('catalog.yaml', name='climate-data')
ds.save_stac_item('catalog.json', id='climate-data')
# Load from either format
intake_ds = DummyDataset.from_intake_catalog('catalog.yaml', 'climate-data')
stac_ds = DummyDataset.load_stac_item('catalog.json')
Advanced Features¶
Custom Properties¶
Add custom properties to STAC Items:
item = ds.to_stac_item(
    id='climate-2020',
    properties={
        'model': 'CESM2',
        'scenario': 'SSP2-4.5',
        'institution': 'NCAR',
        'processing_level': 'L3'
    }
)
print(f"Model: {item.properties['model']}")
Geometry from Attributes¶
Use geometry from dataset attributes:
ds.attrs['geospatial_bounds'] = {
    'type': 'Polygon',
    'coordinates': [[
        [-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]
    ]]
}
# Geometry will be extracted automatically
item = ds.to_stac_item(id='global-coverage')
print(f"Geometry type: {item.geometry['type']}")
Collection Metadata¶
Collections preserve dataset metadata:
collection = ds.to_stac_collection(
    id='climate-collection',
    description='Climate model output'
)
# Access collection metadata
print(f"Dimensions: {collection.extra_fields.get('dims')}")
print(f"Variables: {collection.extra_fields.get('variables')}")
Best Practices¶
1. Always Include Spatial Information¶
# Good: Explicit spatial extent
ds.add_spatial_extent(lat_bounds=(-90, 90), lon_bounds=(-180, 180))
# Also good: Spatial coordinates
ds.add_coord('lat', ['lat'], data=np.linspace(-90, 90, 180))
ds.add_coord('lon', ['lon'], data=np.linspace(-180, 180, 360))
2. Use Descriptive IDs¶
# Good: Descriptive, unique IDs
item = ds.to_stac_item(id='temperature-2020-01-global-daily')
# Avoid: Generic IDs
item = ds.to_stac_item(id='item1') # Not descriptive
3. Include Temporal Information¶
ds.attrs.update({
    'time_coverage_start': '2020-01-01T00:00:00Z',
    'time_coverage_end': '2020-12-31T23:59:59Z'
})
4. Validate Before Export¶
# Validate spatial metadata
validation = ds.validate_spatial_metadata()
if not validation['valid']:
    print("Fix these issues before creating STAC Item:")
    for issue in validation['issues']:
        print(f" - {issue}")
else:
    item = ds.to_stac_item(id='validated-item')
5. Use Appropriate Licenses¶
collection = ds.to_stac_collection(
    id='open-data',
    license='CC-BY-4.0'  # Use standard license identifiers
)
Error Handling¶
Missing Spatial Data¶
ds = DummyDataset() # No spatial coordinates
try:
    item = ds.to_stac_item(id='no-spatial')
    # Item will be created, but bbox will be None
    print(f"Bbox: {item.bbox}")  # None
except Exception as e:
    print(f"Error: {e}")
Invalid Geometry¶
# Validate before creating
validation = ds.validate_spatial_metadata()
if validation['valid']:
    item = ds.to_stac_item(id='valid-item')
Collection with No Items¶
try:
    # This will raise an error
    collection = DummyDataset.create_stac_collection(
        [],  # Empty list
        collection_id='empty'
    )
except Exception as e:
    print(f"Error: {e}")  # "At least one dataset is required"
Complete Example¶
Here's a complete workflow from dataset creation to STAC export:
from dummyxarray import DummyDataset
import numpy as np
# 1. Create dataset with spatial/temporal coordinates
ds = DummyDataset()
ds.add_dim('time', 12)
ds.add_dim('lat', 180)
ds.add_dim('lon', 360)
ds.add_coord('time', ['time'],
             data=np.arange(12),
             attrs={'units': 'months since 2020-01-01'})
ds.add_coord('lat', ['lat'], data=np.linspace(-90, 90, 180))
ds.add_coord('lon', ['lon'], data=np.linspace(-180, 180, 360))
# 2. Add variables
ds.add_variable('temperature', ['time', 'lat', 'lon'],
                attrs={'units': 'K', 'standard_name': 'air_temperature'})
ds.add_variable('precipitation', ['time', 'lat', 'lon'],
                attrs={'units': 'mm/day', 'standard_name': 'precipitation_flux'})
# 3. Add global metadata
ds.attrs.update({
    'title': 'Global Climate Data 2020',
    'description': 'Monthly global temperature and precipitation',
    'institution': 'Climate Research Center',
    'source': 'Climate Model v2.0'
})
# 4. Infer temporal extent
start, end = ds.infer_temporal_extent()
print(f"Temporal coverage: {start} to {end}")
# 5. Validate spatial metadata
validation = ds.validate_spatial_metadata()
print(f"Spatial validation: {'✓ Valid' if validation['valid'] else '✗ Invalid'}")
# 6. Create and save STAC Item
ds.save_stac_item(
    'climate_2020.json',
    id='global-climate-2020',
    properties={
        'model': 'ClimateModel-v2',
        'resolution': '1-degree'
    }
)
# 7. Create and save STAC Collection
ds.save_stac_collection(
    'climate_collection.json',
    id='climate-monthly',
    description='Monthly global climate data',
    license='CC-BY-4.0'
)
print("✓ STAC catalogs created successfully!")
See Also¶
- Spatial Metadata Guide - Detailed spatial metadata handling
- Geospatial Workflows - End-to-end geospatial examples
- Intake Catalogs - Alternative catalog format
- CF Compliance - CF conventions for geospatial data