DataGenerationMixin¶
Provides realistic random data generation for testing and prototyping.
Overview¶
The DataGenerationMixin populates datasets with realistic random data:
- Smart generation - Appropriate ranges based on variable type
- Reproducible - Use seeds for consistent results
- Type-aware - Different strategies for coordinates vs variables
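The reproducibility point can be illustrated with NumPy alone. This is a minimal sketch that assumes the mixin seeds a NumPy random generator; the source does not say which seeding mechanism it uses, so `np.random.default_rng` here is an assumption chosen for illustration.

```python
import numpy as np

# Assumption: seeding works like NumPy's Generator API.
# Two generators built from the same seed produce identical draws.
first = np.random.default_rng(42).standard_normal(5)
second = np.random.default_rng(42).standard_normal(5)
assert np.array_equal(first, second)

# A different seed gives a different sequence.
other = np.random.default_rng(43).standard_normal(5)
```

In practice this means a test that calls populate_with_random_data(seed=42) twice on identically built datasets should be able to compare the results for equality.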
Key Methods¶
- populate_with_random_data(seed=None) - Fill all arrays with data
- _generate_coordinate_data(coord_name, size) - Generate coordinate data
- _generate_variable_data(var_name, shape, attrs) - Generate variable data
Data Generation Strategies¶
Coordinates¶
- time - Sequential integers (0, 1, 2, ...)
- lat - Uniform distribution from -90 to 90
- lon - Uniform distribution from -180 to 180
- lev/plev - Decreasing pressure levels
- Default - Sequential integers
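The coordinate rules above could be sketched roughly as follows. This is not the library's implementation: `sketch_coordinate_data` is a hypothetical helper, the 1000 to 10 hPa endpoints for pressure levels are assumed, and "uniform distribution" is read here as random uniform draws (evenly spaced values via `np.linspace` would be the other plausible reading).

```python
import numpy as np


def sketch_coordinate_data(coord_name, size, rng):
    """Hypothetical stand-in for a coordinate generator; not the library's code."""
    if coord_name == "lat":
        # Uniform draws within the valid latitude range
        return rng.uniform(-90.0, 90.0, size)
    if coord_name == "lon":
        # Uniform draws within the valid longitude range
        return rng.uniform(-180.0, 180.0, size)
    if coord_name in ("lev", "plev"):
        # Decreasing pressure levels; the 1000 and 10 hPa endpoints are assumed
        return np.linspace(1000.0, 10.0, size)
    # "time" and any unrecognized name: sequential integers 0, 1, 2, ...
    return np.arange(size)


rng = np.random.default_rng(0)
lat = sketch_coordinate_data("lat", 64, rng)
plev = sketch_coordinate_data("plev", 5, rng)
time = sketch_coordinate_data("time", 4, rng)
```

Dispatching on the coordinate name keeps each rule independent, so adding a new coordinate type only needs one more branch.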
Variables¶
Based on variable name and units:
- temperature - Realistic ranges (250 to 310 K, or -20 to 40 °C)
- precipitation - Non-negative, skewed distribution
- wind - Appropriate ranges for wind components
- humidity - 0-100% range
- Default - Standard normal distribution
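One way to picture the variable rules above is a name-and-units dispatch like the sketch below. `sketch_variable_data` is a hypothetical helper, not the library's code; the substring matching, the exponential distribution for precipitation, and the wind spread of 5 are all assumptions chosen to satisfy the stated properties.

```python
import numpy as np


def sketch_variable_data(var_name, shape, attrs, rng):
    """Hypothetical stand-in for a variable generator; not the library's code."""
    name = var_name.lower()
    units = attrs.get("units", "")
    if "temp" in name:
        # Kelvin range when units say "K"; otherwise assume degrees Celsius
        low, high = (250.0, 310.0) if units == "K" else (-20.0, 40.0)
        return rng.uniform(low, high, shape)
    if "precip" in name:
        # Exponential draws are non-negative and right-skewed
        return rng.exponential(1.0, shape)
    if "wind" in name:
        # Wind components may be negative; the spread of 5 is an assumption
        return rng.normal(0.0, 5.0, shape)
    if "humid" in name:
        # Relative humidity as a percentage
        return rng.uniform(0.0, 100.0, shape)
    # Fallback: standard normal distribution
    return rng.standard_normal(shape)


rng = np.random.default_rng(42)
temperature = sketch_variable_data("temperature", (10, 64), {"units": "K"}, rng)
precipitation = sketch_variable_data("precipitation", (100,), {}, rng)
humidity = sketch_variable_data("humidity", (50,), {}, rng)
```

The returned arrays match the requested shape, so they can be assigned directly to a variable defined over the same dimensions.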
Usage¶
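from dummyxarray import DummyDataset

# Build an empty dataset, then declare its dimensions, coordinates, and variables
ds = DummyDataset()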
ds.add_dim("time", 10)
ds.add_dim("lat", 64)
ds.add_coord("time", dims=["time"])
ds.add_coord("lat", dims=["lat"])
ds.add_variable("temperature", dims=["time", "lat"])
# Populate with random data
ds.populate_with_random_data(seed=42)
# Now all arrays have data
print(ds.coords["time"].data.shape) # (10,)
print(ds.variables["temperature"].data.shape) # (10, 64)
API Reference¶
Mixin providing data generation capabilities.
Source code in src/dummyxarray/data_generation.py
populate_with_random_data ¶
Populate all variables and coordinates with random but meaningful data.
This method generates random data based on variable metadata (units, standard_name, etc.) to create realistic-looking test datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | Random seed for reproducibility | None |
Returns:
| Type | Description |
|---|---|
| self | Returns self for method chaining |
Examples:
>>> ds = DummyDataset()
>>> ds.add_dim("time", 10)
>>> ds.add_dim("lat", 5)
>>> ds.add_coord("time", ["time"], attrs={"units": "days"})
>>> ds.add_variable("temperature", ["time", "lat"],
... attrs={"units": "K"})
>>> ds.populate_with_random_data(seed=42)
>>> print(ds.coords["time"].data)
[0 1 2 3 4 5 6 7 8 9]