YAML Export
Export and import dataset specifications using YAML format.
Exporting to YAML
from dummyxarray import DummyDataset
ds = DummyDataset()
ds.set_global_attrs(title="My Dataset")
ds.add_dim("time", 12)
ds.add_variable("temperature", ["time"], attrs={"units": "K"})
# Get YAML string
yaml_str = ds.to_yaml()
print(yaml_str)
# Save to file
ds.save_yaml("dataset_spec.yaml")
YAML Structure
The exported YAML has four top-level keys: dimensions, coordinates, variables, and attrs (the global attributes). For example:
dimensions:
  time: 12
  lat: 64
  lon: 128
coordinates:
  time:
    dims:
    - time
    attrs:
      units: days since 2000-01-01
    encoding:
      dtype: int32
    has_data: true
variables:
  temperature:
    dims:
    - time
    - lat
    - lon
    attrs:
      long_name: Temperature
      units: K
    encoding:
      dtype: float32
      chunks: [6, 32, 64]
    has_data: true
attrs:
  title: My Dataset
  institution: DKRZ
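Because the specification is plain nested data, its internal consistency can be checked mechanically. The following is a minimal sketch, not part of dummyxarray: it assumes the YAML has already been parsed into a plain Python dict with the layout shown above, and `check_spec` is a hypothetical helper name. It verifies that every dimension a coordinate or variable refers to is declared, and that any `chunks` entry has one value per dimension.

```python
# Plain-dict mirror of the YAML layout shown above (what a YAML parser would return).
spec = {
    "dimensions": {"time": 12, "lat": 64, "lon": 128},
    "coordinates": {
        "time": {
            "dims": ["time"],
            "attrs": {"units": "days since 2000-01-01"},
            "encoding": {"dtype": "int32"},
            "has_data": True,
        },
    },
    "variables": {
        "temperature": {
            "dims": ["time", "lat", "lon"],
            "attrs": {"long_name": "Temperature", "units": "K"},
            "encoding": {"dtype": "float32", "chunks": [6, 32, 64]},
            "has_data": True,
        },
    },
    "attrs": {"title": "My Dataset", "institution": "DKRZ"},
}


def check_spec(spec):
    """Return a list of problems found in a spec dict (hypothetical helper)."""
    problems = []
    dims = spec.get("dimensions", {})
    for section in ("coordinates", "variables"):
        for name, entry in spec.get(section, {}).items():
            entry_dims = entry.get("dims", [])
            # Every referenced dimension must be declared under "dimensions".
            for dim in entry_dims:
                if dim not in dims:
                    problems.append(f"{section}.{name}: undeclared dimension {dim!r}")
            # If chunks are given, there must be one chunk size per dimension.
            chunks = entry.get("encoding", {}).get("chunks")
            if chunks is not None and len(chunks) != len(entry_dims):
                problems.append(f"{section}.{name}: chunks and dims differ in length")
    return problems


print(check_spec(spec))  # [] for the example above
```

A check like this could run before a specification is shared or committed, so that structural mistakes surface early.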
Loading from YAML
import numpy as np
from dummyxarray import DummyDataset
# Load the specification (structure and metadata only)
ds = DummyDataset.load_yaml("dataset_spec.yaml")
# The structure is loaded, but not the data
print(ds.dims)
print(ds.variables.keys())
# Add data later; the array shape must match the variable's dims (here: time, lat, lon)
ds.variables["temperature"].data = np.random.rand(12, 64, 128)
# Convert to xarray
xr_ds = ds.to_xarray()
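The shape passed when adding data has to agree with the dimension sizes in the specification. As a hedged sketch, the shape can be derived rather than hard-coded. It assumes the dimension sizes are available as a dict-like mapping of name to size; `expected_shape` is a hypothetical helper, not a dummyxarray function.

```python
def expected_shape(dim_sizes, var_dims):
    """Return the array shape implied by a variable's dimension names."""
    # A dimension that is absent or still a None placeholder has no usable size.
    missing = [d for d in var_dims if dim_sizes.get(d) is None]
    if missing:
        raise ValueError(f"dimensions without a size: {missing}")
    return tuple(dim_sizes[d] for d in var_dims)


dim_sizes = {"time": 12, "lat": 64, "lon": 128}
shape = expected_shape(dim_sizes, ["time", "lat", "lon"])
print(shape)  # (12, 64, 128)
```

The resulting tuple could then be used to allocate an array of the right size, for example with `np.zeros(shape)`.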
Use Cases
1. Documentation
Export a dataset specification so that its structure can be included in project documentation.
2. Templates
Create reusable templates:
# Create template
template = DummyDataset()
template.set_global_attrs(Conventions="CF-1.8")
template.add_dim("time", None) # Placeholder
template.add_variable("temperature", ["time"], attrs={"units": "K"})
template.save_yaml("templates/temperature_timeseries.yaml")
# Use template
ds = DummyDataset.load_yaml("templates/temperature_timeseries.yaml")
ds.dims["time"] = 365 # Set actual size
3. Version Control
Because the specification is a plain-text file, changes to the dataset structure can be tracked in git like any other source file.
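To illustrate what such a change looks like in review, here is a small sketch that uses Python's standard `difflib` module to produce the same kind of unified diff git would show. The two YAML strings are made-up examples that stand in for two committed versions of `dataset_spec.yaml`.

```python
import difflib

# Two versions of a specification file (illustrative content only).
old_spec = """\
dimensions:
  time: 12
variables:
  temperature:
    dims:
    - time
    attrs:
      units: K
"""

# The new version changes only the size of the time dimension.
new_spec = old_spec.replace("time: 12", "time: 24")

diff = list(
    difflib.unified_diff(
        old_spec.splitlines(),
        new_spec.splitlines(),
        fromfile="dataset_spec.yaml (old)",
        tofile="dataset_spec.yaml (new)",
        lineterm="",
    )
)
print("\n".join(diff))
```

A structural change shows up as a one-line diff, which is much easier to review than a change buried in a binary data file.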
4. Collaboration
Share specifications with collaborators:
# Person A creates spec
ds = DummyDataset()
ds.set_global_attrs(title="Shared Dataset")
ds.add_dim("time", 100)
ds.save_yaml("shared_spec.yaml")
# Person B loads and uses
ds_loaded = DummyDataset.load_yaml("shared_spec.yaml")
# Add their data...
JSON Export
You can also export to JSON:
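As a library-independent sketch (it does not use dummyxarray's own JSON method, whose name is not given here), the same nested structure can be serialized with Python's standard `json` module. This assumes the specification is available as a plain dict; the dict below is an illustrative stand-in.

```python
import json

# Illustrative spec with the same layout as the YAML export.
spec = {
    "dimensions": {"time": 12},
    "variables": {
        "temperature": {"dims": ["time"], "attrs": {"units": "K"}, "has_data": False},
    },
    "attrs": {"title": "My Dataset"},
}

# Serialize with indentation so the file stays readable and diff-friendly.
json_str = json.dumps(spec, indent=2)
print(json_str)

# Round trip: parsing the JSON gives back the same nested structure.
restored = json.loads(json_str)
```

JSON carries the same information as the YAML form, so the choice between them is mostly a matter of tooling and readability.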
Limitations
YAML export saves only the structure and metadata (dimensions, coordinates, variables, attributes, and encoding); the data arrays themselves are not written.
The has_data field records whether data was present when the spec was created.
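Since the arrays are not stored, `has_data` can serve as a checklist of what needs to be re-attached after loading. A minimal sketch follows, assuming the specification has been parsed into a plain dict; `needs_data` is a hypothetical helper, not part of dummyxarray.

```python
def needs_data(spec):
    """List coordinate and variable names whose spec says data existed at export time."""
    names = []
    for section in ("coordinates", "variables"):
        for name, entry in spec.get(section, {}).items():
            if entry.get("has_data"):
                names.append(name)
    return names


# Illustrative spec: "pressure" never had data, the other two did.
spec = {
    "coordinates": {"time": {"dims": ["time"], "has_data": True}},
    "variables": {
        "temperature": {"dims": ["time"], "has_data": True},
        "pressure": {"dims": ["time"], "has_data": False},
    },
}
print(needs_data(spec))  # ['time', 'temperature']
```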