IOMixin

Provides serialization and format conversion functionality.

Overview

The IOMixin enables exporting and importing datasets in multiple formats:

  • YAML - Human-readable configuration format
  • JSON - Structured data format
  • xarray - Convert to/from xarray.Dataset
  • Zarr - Write directly to Zarr storage
  • NetCDF - Via xarray conversion (ds.to_xarray().to_netcdf(...))
  • Intake Catalogs - Export and import Intake catalog YAML files
  • STAC - Export and import STAC Items and Collections (requires the optional stac dependency)

Key Methods

Export Methods

  • to_dict() - Export as a Python dictionary
  • to_json(**kwargs) - Export as a JSON string (indent defaults to 2)
  • to_yaml() - Export as a YAML string
  • save_yaml(path) - Save to a YAML file
  • to_xarray(validate=True) - Convert to xarray.Dataset
  • to_zarr(store_path, mode="w", validate=True) - Write to a Zarr store
  • to_intake_catalog(name, description, driver, data_path, **kwargs) - Export as Intake catalog YAML
  • save_intake_catalog(path, name, description, driver, data_path, **kwargs) - Save Intake catalog to file
  • to_stac_item(id, geometry=None, properties=None, assets=None, **kwargs) - Export as a STAC Item
  • to_stac_collection(id, description=None, extent=None, **kwargs) - Export as a STAC Collection
  • save_stac_item(path, ...) / save_stac_collection(path, ...) - Save STAC Items/Collections to JSON files

Import Methods

  • from_xarray(xr_dataset, include_data=False) - Create from an xarray.Dataset (class method)
  • load_yaml(path) - Load from a YAML file (class method)
  • from_intake_catalog(catalog_source, source_name=None) - Load from an Intake catalog (class method)
  • load_intake_catalog(path, source_name=None) - Load from an Intake catalog file (class method)
  • from_stac_item(item) - Create from a STAC Item (class method)
  • load_stac_item(path, **kwargs) / load_stac_collection(path, ...) - Load from STAC JSON files (class methods)

Usage
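
The snippets below assume DummyDataset is importable from the package top level and that ds is an existing DummyDataset instance:

from dummyxarray import DummyDataset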

# Export to YAML
ds.save_yaml("template.yaml")

# Load from YAML
ds = DummyDataset.load_yaml("template.yaml")

# Convert to xarray
xr_ds = ds.to_xarray()

# Import from xarray
ds = DummyDataset.from_xarray(xr_ds)

# Write to Zarr
ds.to_zarr("output.zarr")
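
# Write to NetCDF via xarray conversion
# (assumes a netCDF backend such as netCDF4 is installed)
ds.to_xarray().to_netcdf("output.nc")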

# Export to Intake catalog
catalog_yaml = ds.to_intake_catalog(
    name="my_data", 
    description="My dataset",
    driver="zarr"
)
ds.save_intake_catalog("catalog.yaml", name="my_data")

# Import from Intake catalog (the two calls below are equivalent for file paths)
loaded_ds = DummyDataset.from_intake_catalog("catalog.yaml", "my_data")
loaded_ds = DummyDataset.load_intake_catalog("catalog.yaml", "my_data")

API Reference

Mixin providing I/O capabilities.

Source code in src/dummyxarray/io.py
class IOMixin:
    """Mixin providing I/O capabilities."""

    def to_dict(self):
        """
        Export dataset structure to a dictionary.

        Returns
        -------
        dict
            Dictionary representation of the dataset
        """
        return {
            "dimensions": self.dims,
            "coordinates": {k: v.to_dict() for k, v in self.coords.items()},
            "variables": {k: v.to_dict() for k, v in self.variables.items()},
            "attrs": self.attrs,
        }

    def to_json(self, **kwargs):
        """
        Export dataset structure to JSON string.

        Parameters
        ----------
        **kwargs
            Additional arguments passed to json.dumps

        Returns
        -------
        str
            JSON representation
        """
        # Set default indent if not provided
        if "indent" not in kwargs:
            kwargs["indent"] = 2
        return json.dumps(self.to_dict(), **kwargs)

    def to_yaml(self):
        """
        Export dataset structure to YAML string.

        Returns
        -------
        str
            YAML representation
        """
        return yaml.dump(self.to_dict(), sort_keys=False)

    def save_yaml(self, path):
        """
        Save dataset specification to a YAML file.

        Parameters
        ----------
        path : str
            Output file path
        """
        with open(path, "w") as f:
            f.write(self.to_yaml())

    @classmethod
    def load_yaml(cls, path):
        """
        Load dataset specification from a YAML file.

        Parameters
        ----------
        path : str
            Input file path

        Returns
        -------
        DummyDataset
            Loaded dataset (without data arrays)
        """
        # Import here to avoid circular dependency
        from .core import DummyArray

        with open(path) as f:
            spec = yaml.safe_load(f)

        ds = cls()

        ds.dims.update(spec.get("dimensions", {}))

        for name, info in spec.get("coordinates", {}).items():
            ds.coords[name] = DummyArray(
                dims=info["dims"], attrs=info["attrs"], data=None, encoding=info.get("encoding", {})
            )

        for name, info in spec.get("variables", {}).items():
            ds.variables[name] = DummyArray(
                dims=info["dims"], attrs=info["attrs"], data=None, encoding=info.get("encoding", {})
            )

        ds.attrs.update(spec.get("attrs", {}))

        return ds

    @classmethod
    def from_xarray(cls, xr_dataset, include_data=False):
        """
        Create a DummyDataset from an existing xarray.Dataset.

        This captures all metadata (dimensions, coordinates, variables, attributes,
        and encoding) from an xarray.Dataset without the actual data arrays
        (unless include_data=True).

        Parameters
        ----------
        xr_dataset : xarray.Dataset
            The xarray Dataset to extract metadata from
        include_data : bool, default False
            If True, include the actual data arrays. If False, only capture
            metadata structure.

        Returns
        -------
        DummyDataset
            A new DummyDataset with the structure and metadata from xr_dataset

        Examples
        --------
        >>> import xarray as xr
        >>> import numpy as np
        >>> xr_ds = xr.Dataset({
        ...     "temperature": (["time", "lat"], np.random.rand(10, 5))
        ... })
        >>> dummy_ds = DummyDataset.from_xarray(xr_ds)
        >>> print(dummy_ds.dims)
        {'time': 10, 'lat': 5}
        """
        # Import here to avoid circular dependency
        from .core import DummyArray

        ds = cls()

        # Copy global attributes
        ds.attrs.update(dict(xr_dataset.attrs))

        # Extract dimensions
        for dim_name, dim_size in xr_dataset.sizes.items():
            ds.dims[dim_name] = dim_size

        # Extract coordinates
        for coord_name, coord_var in xr_dataset.coords.items():
            ds.coords[coord_name] = DummyArray(
                dims=list(coord_var.dims),
                attrs=dict(coord_var.attrs),
                data=coord_var.values if include_data else None,
                encoding=dict(coord_var.encoding) if hasattr(coord_var, "encoding") else {},
            )

        # Extract data variables
        for var_name, var in xr_dataset.data_vars.items():
            ds.variables[var_name] = DummyArray(
                dims=list(var.dims),
                attrs=dict(var.attrs),
                data=var.values if include_data else None,
                encoding=dict(var.encoding) if hasattr(var, "encoding") else {},
            )

        return ds

    def to_xarray(self, validate=True):
        """
        Convert to a real xarray.Dataset.

        Parameters
        ----------
        validate : bool, default True
            Whether to validate the dataset before conversion

        Returns
        -------
        xarray.Dataset
            The constructed xarray Dataset

        Raises
        ------
        ValueError
            If validation fails or if any variable/coordinate is missing data
        """
        import xarray as xr

        if validate:
            self.validate(strict_coords=False)

        coords = {}
        for name, arr in self.coords.items():
            if arr.data is None:
                raise ValueError(f"Coordinate '{name}' missing data.")
            coords[name] = (arr.dims, arr.data, arr.attrs)

        variables = {}
        for name, arr in self.variables.items():
            if arr.data is None:
                raise ValueError(f"Variable '{name}' missing data.")
            variables[name] = (arr.dims, arr.data, arr.attrs)

        ds = xr.Dataset(data_vars=variables, coords=coords, attrs=self.attrs)

        # Apply encodings
        for name, arr in self.variables.items():
            if arr.encoding:
                ds[name].encoding = arr.encoding

        for name, arr in self.coords.items():
            if arr.encoding:
                ds[name].encoding = arr.encoding

        return ds

    def to_zarr(self, store_path, mode="w", validate=True):
        """
        Write dataset to Zarr format.

        Parameters
        ----------
        store_path : str
            Path to Zarr store
        mode : str, default "w"
            Write mode ('w' for write, 'a' for append)
        validate : bool, default True
            Whether to validate before writing

        Returns
        -------
        zarr.hierarchy.Group
            The Zarr group
        """
        ds = self.to_xarray(validate=validate)
        return ds.to_zarr(store_path, mode=mode)

    def to_intake_catalog(
        self,
        name="dataset",
        description="Dataset generated by dummyxarray",
        driver="zarr",
        data_path=None,
        **kwargs,
    ):
        """
        Convert dataset to Intake catalog format.

        Parameters
        ----------
        name : str, default "dataset"
            Name for the data source in the catalog
        description : str, default "Dataset generated by dummyxarray"
            Description of the data source
        driver : str, default "zarr"
            Intake driver to use (zarr, netcdf, xarray, etc.)
        data_path : str, optional
            Path to the actual data file. If None, uses template path
        **kwargs
            Additional arguments to pass to the driver

        Returns
        -------
        str
            YAML string representing the Intake catalog

        Examples
        --------
        >>> ds = DummyDataset()
        >>> ds.add_dim("time", 12)
        >>> ds.add_variable("temperature", dims=["time"], attrs={"units": "K"})
        >>> catalog_yaml = ds.to_intake_catalog(
        ...     name="my_dataset",
        ...     description="Temperature data",
        ...     data_path="data/my_dataset.zarr"
        ... )
        """
        # Build catalog structure
        catalog = {
            "metadata": {
                "version": 1,
                "description": f"Intake catalog for {name}",
            }
        }

        # Add dataset-level parameters if any
        if hasattr(self, "attrs") and self.attrs:
            catalog["metadata"]["dataset_attrs"] = dict(self.attrs)

        # Build sources section
        sources = {}

        # Default data path template if not provided
        if data_path is None:
            data_path = "{{ CATALOG_DIR }}/" + name + ".zarr"

        source_entry = {
            "description": description,
            "driver": driver,
            "args": {"urlpath": data_path, **kwargs},
        }

        # Add metadata about the dataset structure
        source_metadata = {}

        # Add dimension information
        if hasattr(self, "dims") and self.dims:
            source_metadata["dimensions"] = dict(self.dims)

        # Add coordinate information
        if hasattr(self, "coords") and self.coords:
            coord_info = {}
            for coord_name, coord_arr in self.coords.items():
                coord_info[coord_name] = {
                    "dims": coord_arr.dims,
                    "attrs": dict(coord_arr.attrs) if coord_arr.attrs else {},
                }
                if coord_arr.encoding:
                    encoding = dict(coord_arr.encoding)
                    # Convert tuples to lists for YAML compatibility
                    for key, value in encoding.items():
                        if isinstance(value, tuple):
                            encoding[key] = list(value)
                    coord_info[coord_name]["encoding"] = encoding
            source_metadata["coordinates"] = coord_info

        # Add variable information
        if hasattr(self, "variables") and self.variables:
            var_info = {}
            for var_name, var_arr in self.variables.items():
                var_info[var_name] = {
                    "dims": var_arr.dims,
                    "attrs": dict(var_arr.attrs) if var_arr.attrs else {},
                }
                if var_arr.encoding:
                    encoding = dict(var_arr.encoding)
                    # Convert tuples to lists for YAML compatibility
                    for key, value in encoding.items():
                        if isinstance(value, tuple):
                            encoding[key] = list(value)
                    var_info[var_name]["encoding"] = encoding
            source_metadata["variables"] = var_info

        if source_metadata:
            source_entry["metadata"] = source_metadata

        sources[name] = source_entry
        catalog["sources"] = sources

        return yaml.dump(catalog, sort_keys=False)

    def save_intake_catalog(
        self,
        path,
        name="dataset",
        description="Dataset generated by dummyxarray",
        driver="zarr",
        data_path=None,
        **kwargs,
    ):
        """
        Save Intake catalog to a YAML file.

        Parameters
        ----------
        path : str
            Output file path for the catalog YAML
        name : str, default "dataset"
            Name for the data source in the catalog
        description : str, default "Dataset generated by dummyxarray"
            Description of the data source
        driver : str, default "zarr"
            Intake driver to use (zarr, netcdf, xarray, etc.)
        data_path : str, optional
            Path to the actual data file. If None, uses template path
        **kwargs
            Additional arguments to pass to the driver
        """
        catalog_yaml = self.to_intake_catalog(
            name=name, description=description, driver=driver, data_path=data_path, **kwargs
        )

        with open(path, "w") as f:
            f.write(catalog_yaml)

    @classmethod
    def from_intake_catalog(cls, catalog_source, source_name=None):
        """
        Create a DummyDataset from an Intake catalog.

        Parameters
        ----------
        catalog_source : str or dict
            Either a path to a YAML catalog file or a dictionary containing
            the catalog structure
        source_name : str, optional
            Name of the source to use from the catalog. If None and catalog
            contains only one source, that source will be used automatically.

        Returns
        -------
        DummyDataset
            A new DummyDataset with the structure from the catalog

        Raises
        ------
        ValueError
            If catalog format is invalid or source_name is not found
        FileNotFoundError
            If catalog_source is a file path that doesn't exist

        Examples
        --------
        >>> # Load from file
        >>> ds = DummyDataset.from_intake_catalog("catalog.yaml", "climate_data")

        >>> # Load from dictionary
        >>> catalog_dict = yaml.safe_load(catalog_yaml)
        >>> ds = DummyDataset.from_intake_catalog(catalog_dict, "climate_data")
        """
        from pathlib import Path

        import yaml

        # Load catalog
        if isinstance(catalog_source, (str, Path)):
            # Load from file
            try:
                with open(catalog_source) as f:
                    catalog = yaml.safe_load(f)
            except FileNotFoundError as err:
                raise FileNotFoundError(f"Catalog file not found: {catalog_source}") from err
        elif isinstance(catalog_source, dict):
            # Use provided dictionary
            catalog = catalog_source
        else:
            raise ValueError("catalog_source must be a file path or dictionary")

        # Validate catalog structure
        if not isinstance(catalog, dict):
            raise ValueError("Catalog must be a dictionary")

        if "sources" not in catalog:
            raise ValueError("Catalog must contain 'sources' section")

        sources = catalog["sources"]
        if not sources:
            raise ValueError("Catalog sources section cannot be empty")

        # Determine which source to use
        if source_name is None:
            if len(sources) == 1:
                source_name = list(sources.keys())[0]
            else:
                raise ValueError(
                    "Multiple sources found in catalog. " "Please specify source_name explicitly."
                )

        if source_name not in sources:
            available_sources = list(sources.keys())
            raise ValueError(
                f"Source '{source_name}' not found in catalog. "
                f"Available sources: {available_sources}"
            )

        source = sources[source_name]

        # Create new DummyDataset
        ds = cls()

        # Extract dataset attributes from catalog metadata if available
        if "metadata" in catalog:
            catalog_metadata = catalog["metadata"]
            if "dataset_attrs" in catalog_metadata:
                ds.attrs.update(catalog_metadata["dataset_attrs"])

        # Extract source metadata if available
        source_metadata = source.get("metadata", {})

        # Add dimensions
        if "dimensions" in source_metadata:
            for dim_name, dim_size in source_metadata["dimensions"].items():
                ds.add_dim(dim_name, dim_size)

        # Add coordinates
        if "coordinates" in source_metadata:
            for coord_name, coord_info in source_metadata["coordinates"].items():
                coord_attrs = coord_info.get("attrs", {})
                coord_encoding = coord_info.get("encoding", {})
                ds.add_coord(
                    coord_name, dims=coord_info["dims"], attrs=coord_attrs, encoding=coord_encoding
                )

        # Add variables
        if "variables" in source_metadata:
            for var_name, var_info in source_metadata["variables"].items():
                var_attrs = var_info.get("attrs", {})
                var_encoding = var_info.get("encoding", {})
                ds.add_variable(
                    var_name, dims=var_info["dims"], attrs=var_attrs, encoding=var_encoding
                )

        # Add catalog-specific attributes
        ds.attrs.update(
            {
                "intake_catalog_source": source_name,
                "intake_driver": source.get("driver", "unknown"),
                "intake_description": source.get("description", ""),
            }
        )

        return ds

    @classmethod
    def load_intake_catalog(cls, path, source_name=None):
        """
        Load a DummyDataset from an Intake catalog YAML file.

        This is a convenience method that wraps from_intake_catalog() for file loading.

        Parameters
        ----------
        path : str
            Path to the catalog YAML file
        source_name : str, optional
            Name of the source to use from the catalog

        Returns
        -------
        DummyDataset
            A new DummyDataset with the structure from the catalog
        """
        return cls.from_intake_catalog(path, source_name)

    def to_stac_item(self, id, geometry=None, properties=None, assets=None, **kwargs):
        """
        Convert the dataset to a STAC Item.

        Requires the 'stac' optional dependency.

        Parameters
        ----------
        id : str
            Unique identifier for the STAC Item
        geometry : dict, optional
            GeoJSON geometry dictionary
        properties : dict, optional
            Additional properties for the STAC Item
        assets : dict, optional
            Dictionary of pystac.Asset objects
        **kwargs
            Additional arguments passed to pystac.Item

        Returns
        -------
        pystac.Item
            The generated STAC Item
        """
        try:
            from .stac import dataset_to_stac_item
        except ImportError as e:
            raise ImportError(
                "STAC support requires 'pystac' and other optional dependencies. "
                "Install with: pip install 'dummyxarray[stac]'"
            ) from e

        return dataset_to_stac_item(
            self, id=id, geometry=geometry, properties=properties, assets=assets, **kwargs
        )

    def to_stac_collection(
        self,
        id: str,
        description: Optional[str] = None,
        extent: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> "Collection":
        """
        Create a STAC Collection from this dataset.

        Parameters
        ----------
        id : str
            Unique identifier for the collection
        description : str, optional
            Detailed description of the collection
        extent : dict, optional
            Spatial and temporal extent of the collection. If not provided,
            will attempt to extract from dataset attributes.
        **kwargs
            Additional arguments passed to pystac.Collection

        Returns
        -------
        pystac.Collection
            The generated STAC Collection

        Examples
        --------
        >>> ds = DummyDataset()
        >>> collection = ds.to_stac_collection(
        ...     id="my-collection",
        ...     description="A collection of dummy data"
        ... )
        """
        try:
            from pystac import Collection
        except ImportError as e:
            raise ImportError(
                "STAC support requires 'pystac' and other optional "
                "dependencies. Install with: pip install 'dummyxarray[stac]'"
            ) from e

        # Create a default extent if not provided
        if extent is None:
            extent = self._get_default_stac_extent()

        # Create the collection
        collection = Collection(
            id=id,
            description=description or self.attrs.get("description", ""),
            extent=extent,
            **kwargs,
        )

        # Add dataset metadata
        self._add_collection_metadata(collection)

        return collection

    def _get_default_stac_extent(self) -> Dict[str, Any]:
        """Generate a default STAC extent from dataset attributes."""
        from dateutil.parser import parse
        from pystac import SpatialExtent, TemporalExtent

        extent = {"spatial": None, "temporal": None}

        # Try to get spatial extent
        if (
            hasattr(self, "attrs")
            and isinstance(self.attrs, dict)
            and "geospatial_bounds" in self.attrs
        ):
            coords = self.attrs["geospatial_bounds"]["coordinates"][0]
            lons = [c[0] for c in coords]
            lats = [c[1] for c in coords]
            bbox = [min(lons), min(lats), max(lons), max(lats)]
            extent["spatial"] = SpatialExtent(bboxes=[bbox])

        # Try to get temporal extent
        time_start = None
        time_end = None
        if hasattr(self, "attrs") and isinstance(self.attrs, dict):
            time_start = self.attrs.get("time_coverage_start")
            time_end = self.attrs.get("time_coverage_end", time_start)

        def _parse_dt(val):
            if val is None:
                return None
            if isinstance(val, str):
                try:
                    return parse(val)
                except (ValueError, TypeError):
                    return None
            return val

        start_dt = _parse_dt(time_start)
        end_dt = _parse_dt(time_end)
        if start_dt is not None or end_dt is not None:
            extent["temporal"] = TemporalExtent(intervals=[[start_dt, end_dt]])

        return extent

    def _add_collection_metadata(self, collection: "Collection") -> None:
        """Add dataset metadata to a STAC Collection."""
        if hasattr(self, "dims"):
            collection.extra_fields["dims"] = dict(self.dims)
        if hasattr(self, "variables"):
            collection.extra_fields["variables"] = list(self.variables.keys())

    @classmethod
    def from_stac_item(cls, item):
        """
        Create a DummyDataset from a STAC Item.

        Parameters
        ----------
        item : pystac.Item
            The STAC Item to convert

        Returns
        -------
        DummyDataset
            A new DummyDataset with the structure from the STAC Item
        """
        try:
            from .stac import stac_item_to_dataset
        except ImportError as e:
            raise ImportError(
                "STAC support requires 'pystac' and other optional dependencies. "
                "Install with: pip install 'dummyxarray[stac]'"
            ) from e

        return stac_item_to_dataset(item)

    def save_stac_item(
        self: "D",
        path: str,
        id: Optional[str] = None,
        geometry: Optional[Dict[str, Any]] = None,
        properties: Optional[Dict[str, Any]] = None,
        assets: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> None:
        """
        Save the dataset as a STAC Item to a file.

        Parameters
        ----------
        path : str
            Path where to save the STAC Item JSON file
        id : str, optional
            Unique identifier for the STAC Item. If not provided, will use the dataset name or a UUID.
        geometry : dict, optional
            GeoJSON geometry dict (required if not in dataset.attrs)
        properties : dict, optional
            Additional properties for the STAC Item
        assets : dict, optional
            Dictionary of asset information
        **kwargs
            Additional arguments passed to to_stac_item()
        """
        try:
            from pystac import Item  # noqa: F401
        except ImportError as e:
            raise ImportError(
                "STAC file operations require 'pystac'. "
                "Install with: pip install 'dummyxarray[stac]'"
            ) from e

        # Create parent directories if they don't exist
        Path(path).parent.mkdir(parents=True, exist_ok=True)

        # Convert to STAC Item
        item = self.to_stac_item(
            id=id or f"item-{str(uuid.uuid4())}",
            geometry=geometry,
            properties=properties,
            assets=assets,
            **kwargs,
        )

        # Save to file
        item.save_object(dest_href=path)

    def save_stac_collection(
        self: "D", path: str, id: Optional[str] = None, description: Optional[str] = None, **kwargs
    ) -> None:
        """
        Save the dataset as a STAC Collection to a file.

        Parameters
        ----------
        path : str
            Path where to save the STAC Collection JSON file
        id : str, optional
            Unique identifier for the STAC Collection
        description : str, optional
            Description of the collection
        **kwargs
            Additional arguments passed to to_stac_collection()
        """
        try:
            from pystac import Collection  # noqa: F401
        except ImportError as e:
            raise ImportError(
                "STAC file operations require 'pystac'. "
                "Install with: pip install 'dummyxarray[stac]'"
            ) from e

        # Create parent directories if they don't exist
        Path(path).parent.mkdir(parents=True, exist_ok=True)

        # Convert to STAC Collection
        collection = self.to_stac_collection(
            id=id or f"collection-{str(uuid.uuid4())}",
            description=description or "A collection of STAC items",
            **kwargs,
        )

        # Save to file
        collection.save_object(dest_href=path)

    @classmethod
    def load_stac_item(cls: type[D], path: str, **kwargs) -> D:
        """
        Load a STAC Item from a file and convert it to a DummyDataset.

        Parameters
        ----------
        path : str
            Path to the STAC Item JSON file
        **kwargs
            Additional arguments passed to from_stac_item()

        Returns
        -------
        DummyDataset
            The loaded dataset
        """
        try:
            from pystac import Item
        except ImportError as e:
            raise ImportError(
                "STAC file operations require 'pystac'. "
                "Install with: pip install 'dummyxarray[stac]'"
            ) from e

        item = Item.from_file(path)
        return cls.from_stac_item(item, **kwargs)

    @classmethod
    def load_stac_collection(
        cls: type[D], path: str, item_loader: Optional[Callable[[Any], D]] = None, **kwargs
    ) -> Union[D, List[D]]:
        """
        Load a STAC Collection from a file and convert it to one or more DummyDatasets.

        Parameters
        ----------
        path : str
            Path to the STAC Collection JSON file
        item_loader : callable, optional
            Function to handle loading of individual items in the collection.
            If not provided, returns a list of DummyDatasets.
        **kwargs
            Additional arguments passed to from_stac_item()

        Returns
        -------
        DummyDataset or list of DummyDataset
            The loaded dataset(s)
        """
        try:
            from pystac import Collection
        except ImportError as e:
            raise ImportError(
                "STAC file operations require 'pystac'. "
                "Install with: pip install 'dummyxarray[stac]'"
            ) from e

        collection = Collection.from_file(path)

        if item_loader is not None:
            return item_loader(collection)

        # If no item_loader provided, attempt to convert the collection.
        # This will return:
        # - a DummyDataset (if no resolvable items)
        # - or a list of DummyDatasets (if items are available)
        return cls.from_stac_collection(collection, **kwargs)

to_dict

to_dict()

Export dataset structure to a dictionary.

Returns:

  • dict - Dictionary representation of the dataset

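A quick sketch of the returned structure (top-level keys taken from the source above):

spec = ds.to_dict()
print(sorted(spec))  # ['attrs', 'coordinates', 'dimensions', 'variables']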

to_json

to_json(**kwargs)

Export dataset structure to JSON string.

Parameters:

  • **kwargs - Additional arguments passed to json.dumps (default: {})

Returns:

  • str - JSON representation

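All keyword arguments are forwarded to json.dumps, so the default two-space indent can be overridden:

compact = ds.to_json(indent=None, separators=(",", ":"))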

to_yaml

to_yaml()

Export dataset structure to YAML string.

Returns:

  • str - YAML representation

save_yaml

save_yaml(path)

Save dataset specification to a YAML file.

Parameters:

  • path (str, required) - Output file path

load_yaml classmethod

load_yaml(path)

Load dataset specification from a YAML file.

Parameters:

  • path (str, required) - Input file path

Returns:

  • DummyDataset - Loaded dataset (without data arrays)

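A hedged round-trip sketch; the loaded template restores structure but leaves data arrays as None:

template = DummyDataset.load_yaml("template.yaml")
print(template.dims)   # dimension sizes restored from the spec
print(template.attrs)  # global attributes restored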

from_xarray classmethod

from_xarray(xr_dataset, include_data=False)

Create a DummyDataset from an existing xarray.Dataset.

This captures all metadata (dimensions, coordinates, variables, attributes, and encoding) from an xarray.Dataset without the actual data arrays (unless include_data=True).

Parameters:

  • xr_dataset (xarray.Dataset, required) - The xarray Dataset to extract metadata from
  • include_data (bool, default False) - If True, include the actual data arrays. If False, only capture metadata structure.

Returns:

  • DummyDataset - A new DummyDataset with the structure and metadata from xr_dataset

Examples:

>>> import xarray as xr
>>> import numpy as np
>>> xr_ds = xr.Dataset({
...     "temperature": (["time", "lat"], np.random.rand(10, 5))
... })
>>> dummy_ds = DummyDataset.from_xarray(xr_ds)
>>> print(dummy_ds.dims)
{'time': 10, 'lat': 5}

to_xarray

to_xarray(validate=True)

Convert to a real xarray.Dataset.

Parameters:

  • validate (bool, default True) - Whether to validate the dataset before conversion

Returns:

  • xarray.Dataset - The constructed xarray Dataset

Raises:

  • ValueError - If validation fails or if any variable/coordinate is missing data

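Conversion requires every coordinate and variable to carry data, so a metadata-only template fails; a sketch of that failure mode (xr_ds as in the from_xarray example above):

meta_only = DummyDataset.from_xarray(xr_ds)  # include_data=False, arrays stay None
try:
    meta_only.to_xarray(validate=False)
except ValueError as e:
    print(e)  # e.g. "Variable 'temperature' missing data."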

to_zarr

to_zarr(store_path, mode='w', validate=True)

Write dataset to Zarr format.

Parameters:

  • store_path (str, required) - Path to Zarr store
  • mode (str, default "w") - Write mode ('w' for write, 'a' for append)
  • validate (bool, default True) - Whether to validate before writing

Returns:

  • zarr.hierarchy.Group - The Zarr group

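A round-trip sketch (assumes the zarr package is installed):

ds.to_zarr("output.zarr", mode="w")

import xarray as xr
round_tripped = xr.open_zarr("output.zarr")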

to_intake_catalog

to_intake_catalog(
    name="dataset",
    description="Dataset generated by dummyxarray",
    driver="zarr",
    data_path=None,
    **kwargs
)

Convert dataset to Intake catalog format.

Parameters:

  • name (str, default "dataset") - Name for the data source in the catalog
  • description (str, default "Dataset generated by dummyxarray") - Description of the data source
  • driver (str, default "zarr") - Intake driver to use (zarr, netcdf, xarray, etc.)
  • data_path (str, default None) - Path to the actual data file. If None, uses a template path
  • **kwargs - Additional arguments to pass to the driver

Returns:

  • str - YAML string representing the Intake catalog

Examples:

>>> ds = DummyDataset()
>>> ds.add_dim("time", 12)
>>> ds.add_variable("temperature", dims=["time"], attrs={"units": "K"})
>>> catalog_yaml = ds.to_intake_catalog(
...     name="my_dataset",
...     description="Temperature data",
...     data_path="data/my_dataset.zarr"
... )
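
The generated YAML can be opened with intake itself; a hedged sketch, assuming intake and a matching driver plugin (e.g. intake-xarray for the zarr driver) are installed:

import intake

cat = intake.open_catalog("catalog.yaml")
source = cat["my_dataset"]  # source name as passed to to_intake_catalog
xr_ds = source.to_dask()    # intake-xarray sources expose to_dask()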

save_intake_catalog

save_intake_catalog(
    path,
    name="dataset",
    description="Dataset generated by dummyxarray",
    driver="zarr",
    data_path=None,
    **kwargs
)

Save Intake catalog to a YAML file.

Parameters:

  • path (str, required) - Output file path for the catalog YAML
  • name (str, default "dataset") - Name for the data source in the catalog
  • description (str, default "Dataset generated by dummyxarray") - Description of the data source
  • driver (str, default "zarr") - Intake driver to use (zarr, netcdf, xarray, etc.)
  • data_path (str, default None) - Path to the actual data file. If None, uses a template path
  • **kwargs - Additional arguments to pass to the driver

from_intake_catalog classmethod

from_intake_catalog(catalog_source, source_name=None)

Create a DummyDataset from an Intake catalog.

Parameters:

  • catalog_source (str or dict, required) - Either a path to a YAML catalog file or a dictionary containing the catalog structure
  • source_name (str, default None) - Name of the source to use from the catalog. If None and the catalog contains only one source, that source is used automatically.

Returns:

  • DummyDataset - A new DummyDataset with the structure from the catalog

Raises:

  • ValueError - If the catalog format is invalid or source_name is not found
  • FileNotFoundError - If catalog_source is a file path that doesn't exist

Examples:

>>> # Load from file
>>> ds = DummyDataset.from_intake_catalog("catalog.yaml", "climate_data")
>>> # Load from dictionary
>>> catalog_dict = yaml.safe_load(catalog_yaml)
>>> ds = DummyDataset.from_intake_catalog(catalog_dict, "climate_data")

load_intake_catalog classmethod

load_intake_catalog(path, source_name=None)

Load a DummyDataset from an Intake catalog YAML file.

This is a convenience method that wraps from_intake_catalog() for file loading.

Parameters:

  • path (str, required) - Path to the catalog YAML file
  • source_name (str, default None) - Name of the source to use from the catalog

Returns:

  • DummyDataset - A new DummyDataset with the structure from the catalog

to_stac_item

to_stac_item(
    id,
    geometry=None,
    properties=None,
    assets=None,
    **kwargs
)

Convert the dataset to a STAC Item.

Requires the 'stac' optional dependency.

Parameters:

  • id (str, required) - Unique identifier for the STAC Item
  • geometry (dict, default None) - GeoJSON geometry dictionary
  • properties (dict, default None) - Additional properties for the STAC Item
  • assets (dict, default None) - Dictionary of pystac.Asset objects
  • **kwargs - Additional arguments passed to pystac.Item

Returns:

  • pystac.Item - The generated STAC Item

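A hedged sketch; the id, geometry, and datetime values here are illustrative:

item = ds.to_stac_item(
    id="example-item",
    geometry={"type": "Point", "coordinates": [0.0, 0.0]},
    properties={"datetime": "2024-01-01T00:00:00Z"},
)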

to_stac_collection

to_stac_collection(
    id: str,
    description: Optional[str] = None,
    extent: Optional[Dict[str, Any]] = None,
    **kwargs
) -> Collection

Create a STAC Collection from this dataset.

Parameters:

  • id (str, required) - Unique identifier for the collection
  • description (str, default None) - Detailed description of the collection
  • extent (dict, default None) - Spatial and temporal extent of the collection. If not provided, will attempt to extract from dataset attributes.
  • **kwargs - Additional arguments passed to pystac.Collection

Returns:

  • pystac.Collection - The generated STAC Collection

Examples:

>>> ds = DummyDataset()
>>> collection = ds.to_stac_collection(
...     id="my-collection",
...     description="A collection of dummy data"
... )

from_stac_item classmethod

from_stac_item(item)

Create a DummyDataset from a STAC Item.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `pystac.Item` | The STAC Item to convert | *required* |

Returns:

| Type | Description |
|---|---|
| `DummyDataset` | A new DummyDataset with the structure from the STAC Item |

Source code in src/dummyxarray/io.py
@classmethod
def from_stac_item(cls, item):
    """
    Create a DummyDataset from a STAC Item.

    Parameters
    ----------
    item : pystac.Item
        The STAC Item to convert

    Returns
    -------
    DummyDataset
        A new DummyDataset with the structure from the STAC Item
    """
    try:
        from .stac import stac_item_to_dataset
    except ImportError as e:
        raise ImportError(
            "STAC support requires 'pystac' and other optional dependencies. "
            "Install with: pip install 'dummyxarray[stac]'"
        ) from e

    return stac_item_to_dataset(item)
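
Example: a round-trip sketch, continuing from the `to_stac_item` example above; the identifier and geometry are illustrative:

```python
# Export this dataset's structure to a STAC Item, then rebuild it
item = ds.to_stac_item(
    id="roundtrip-item",
    geometry={"type": "Point", "coordinates": [0.0, 0.0]},
)
restored = DummyDataset.from_stac_item(item)  # structure only, not data values
```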

save_stac_item

save_stac_item(
    path: str,
    id: Optional[str] = None,
    geometry: Optional[Dict[str, Any]] = None,
    properties: Optional[Dict[str, Any]] = None,
    assets: Optional[Dict[str, Any]] = None,
    **kwargs
) -> None

Save the dataset as a STAC Item to a file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path at which to save the STAC Item JSON file | *required* |
| `id` | `str` | Unique identifier for the STAC Item. If not provided, a UUID-based identifier is generated. | `None` |
| `geometry` | `dict` | GeoJSON geometry dict (required if not in `dataset.attrs`) | `None` |
| `properties` | `dict` | Additional properties for the STAC Item | `None` |
| `assets` | `dict` | Dictionary of asset information | `None` |
| `**kwargs` | | Additional arguments passed to `to_stac_item()` | `{}` |

Source code in src/dummyxarray/io.py
def save_stac_item(
    self: "D",
    path: str,
    id: Optional[str] = None,
    geometry: Optional[Dict[str, Any]] = None,
    properties: Optional[Dict[str, Any]] = None,
    assets: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> None:
    """
    Save the dataset as a STAC Item to a file.

    Parameters
    ----------
    path : str
        Path at which to save the STAC Item JSON file
    id : str, optional
        Unique identifier for the STAC Item. If not provided, a
        UUID-based identifier is generated.
    geometry : dict, optional
        GeoJSON geometry dict (required if not in dataset.attrs)
    properties : dict, optional
        Additional properties for the STAC Item
    assets : dict, optional
        Dictionary of asset information
    **kwargs
        Additional arguments passed to to_stac_item()
    """
    try:
        from pystac import Item  # noqa: F401
    except ImportError as e:
        raise ImportError(
            "STAC file operations require 'pystac'. "
            "Install with: pip install 'dummyxarray[stac]'"
        ) from e

    # Create parent directories if they don't exist
    Path(path).parent.mkdir(parents=True, exist_ok=True)

    # Convert to STAC Item
    item = self.to_stac_item(
        id=id or f"item-{str(uuid.uuid4())}",
        geometry=geometry,
        properties=properties,
        assets=assets,
        **kwargs,
    )

    # Save to file
    item.save_object(dest_href=path)
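
Example: a minimal sketch with an illustrative path and geometry:

```python
# Parent directories ("stac/items/") are created automatically; omitting
# `id` falls back to a generated "item-<uuid>" identifier
ds.save_stac_item(
    "stac/items/my-item.json",
    id="my-item",
    geometry={"type": "Point", "coordinates": [0.0, 0.0]},
)
```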

save_stac_collection

save_stac_collection(
    path: str,
    id: Optional[str] = None,
    description: Optional[str] = None,
    **kwargs
) -> None

Save the dataset as a STAC Collection to a file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path at which to save the STAC Collection JSON file | *required* |
| `id` | `str` | Unique identifier for the STAC Collection | `None` |
| `description` | `str` | Description of the collection | `None` |
| `**kwargs` | | Additional arguments passed to `to_stac_collection()` | `{}` |

Source code in src/dummyxarray/io.py
def save_stac_collection(
    self: "D", path: str, id: Optional[str] = None, description: Optional[str] = None, **kwargs
) -> None:
    """
    Save the dataset as a STAC Collection to a file.

    Parameters
    ----------
    path : str
        Path at which to save the STAC Collection JSON file
    id : str, optional
        Unique identifier for the STAC Collection
    description : str, optional
        Description of the collection
    **kwargs
        Additional arguments passed to to_stac_collection()
    """
    try:
        from pystac import Collection  # noqa: F401
    except ImportError as e:
        raise ImportError(
            "STAC file operations require 'pystac'. "
            "Install with: pip install 'dummyxarray[stac]'"
        ) from e

    # Create parent directories if they don't exist
    Path(path).parent.mkdir(parents=True, exist_ok=True)

    # Convert to STAC Collection
    collection = self.to_stac_collection(
        id=id or f"collection-{str(uuid.uuid4())}",
        description=description or "A collection of STAC items",
        **kwargs,
    )

    # Save to file
    collection.save_object(dest_href=path)
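
Example: a minimal sketch with illustrative metadata:

```python
# Omitting `id` and `description` falls back to a generated
# "collection-<uuid>" identifier and a generic description
ds.save_stac_collection(
    "stac/collection.json",
    id="my-collection",
    description="A collection of dummy data",
)
```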

load_stac_item classmethod

load_stac_item(path: str, **kwargs) -> D

Load a STAC Item from a file and convert it to a DummyDataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the STAC Item JSON file | *required* |
| `**kwargs` | | Additional arguments passed to `from_stac_item()` | `{}` |

Returns:

| Type | Description |
|---|---|
| `DummyDataset` | The loaded dataset |

Source code in src/dummyxarray/io.py
@classmethod
def load_stac_item(cls: type[D], path: str, **kwargs) -> D:
    """
    Load a STAC Item from a file and convert it to a DummyDataset.

    Parameters
    ----------
    path : str
        Path to the STAC Item JSON file
    **kwargs
        Additional arguments passed to from_stac_item()

    Returns
    -------
    DummyDataset
        The loaded dataset
    """
    try:
        from pystac import Item
    except ImportError as e:
        raise ImportError(
            "STAC file operations require 'pystac'. "
            "Install with: pip install 'dummyxarray[stac]'"
        ) from e

    item = Item.from_file(path)
    return cls.from_stac_item(item, **kwargs)
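
Example: loading the item written by the `save_stac_item` sketch above (the path is illustrative):

```python
ds = DummyDataset.load_stac_item("stac/items/my-item.json")
```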

load_stac_collection classmethod

load_stac_collection(
    path: str,
    item_loader: Optional[Callable[[Any], D]] = None,
    **kwargs
) -> Union[D, List[D]]

Load a STAC Collection from a file and convert it to one or more DummyDatasets.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the STAC Collection JSON file | *required* |
| `item_loader` | `callable` | Function that handles loading of the individual items in the collection. If not provided, the collection itself is converted, yielding a single DummyDataset or a list of DummyDatasets. | `None` |
| `**kwargs` | | Additional arguments passed to `from_stac_item()` | `{}` |

Returns:

| Type | Description |
|---|---|
| `DummyDataset` or `list` of `DummyDataset` | The loaded dataset(s) |

Source code in src/dummyxarray/io.py
@classmethod
def load_stac_collection(
    cls: type[D], path: str, item_loader: Optional[Callable[[Any], D]] = None, **kwargs
) -> Union[D, List[D]]:
    """
    Load a STAC Collection from a file and convert it to one or more DummyDatasets.

    Parameters
    ----------
    path : str
        Path to the STAC Collection JSON file
    item_loader : callable, optional
        Function to handle loading of individual items in the collection.
        If not provided, the collection itself is converted, yielding a
        single DummyDataset or a list of DummyDatasets.
    **kwargs
        Additional arguments passed to from_stac_item()

    Returns
    -------
    DummyDataset or list of DummyDataset
        The loaded dataset(s)
    """
    try:
        from pystac import Collection
    except ImportError as e:
        raise ImportError(
            "STAC file operations require 'pystac'. "
            "Install with: pip install 'dummyxarray[stac]'"
        ) from e

    collection = Collection.from_file(path)

    if item_loader is not None:
        return item_loader(collection)

    # If no item_loader provided, attempt to convert the collection.
    # This will return:
    # - a DummyDataset (if no resolvable items)
    # - or a list of DummyDatasets (if items are available)
    return cls.from_stac_collection(collection, **kwargs)
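
Example: the default call converts the collection directly; the `item_loader` variant shows the hook with a hypothetical callback that receives the parsed `pystac.Collection`:

```python
# Default: yields a DummyDataset or a list of DummyDatasets
result = DummyDataset.load_stac_collection("stac/collection.json")

# Hypothetical custom loader: keep only the first resolvable item
def first_item_only(collection):
    first = next(iter(collection.get_items()))  # pystac Catalog/Collection API
    return DummyDataset.from_stac_item(first)

ds = DummyDataset.load_stac_collection(
    "stac/collection.json", item_loader=first_item_only
)
```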