DataGenerationMixin
Provides realistic random data generation for testing and prototyping.
Overview
The DataGenerationMixin populates datasets with realistic random data:
- Smart generation - Appropriate ranges based on variable type
- Reproducible - Use seeds for consistent results
- Type-aware - Different strategies for coordinates vs variables
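The reproducibility point can be illustrated with a minimal sketch that uses only Python's standard library, not the library's own code. The helper name `sample_values` is hypothetical; the idea is simply that a generator seeded with the same value yields the same sequence.

```python
import random


def sample_values(n, seed=None):
    """Draw n standard-normal values; a fixed seed gives repeatable output."""
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


first = sample_values(5, seed=42)
second = sample_values(5, seed=42)
other = sample_values(5, seed=7)

print(first == second)  # True: same seed, same values
print(first == other)   # False: a different seed gives different values
```

Passing `seed=None` leaves the generator seeded from system entropy, so each run then differs.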
Key Methods
- populate_with_random_data(seed=None) - Fill all arrays with data
- _generate_coordinate_data(coord_name, size) - Generate coordinate data
- _generate_variable_data(var_name, shape, attrs) - Generate variable data
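One way these three methods could fit together is sketched below. This is an illustration under stated assumptions, not the library's source: `SketchGenerationMixin`, `TinyDataset`, the dict-based storage, and the `_rng` attribute are all hypothetical, and the helpers use only the simplest strategies (sequential integers and standard normal values in a flat list).

```python
import random


class SketchGenerationMixin:
    """Illustrative stand-in for the mixin; not the library's source."""

    def populate_with_random_data(self, seed=None):
        # One seeded generator drives every array, so a fixed seed
        # reproduces the whole dataset.
        self._rng = random.Random(seed)
        for name, coord in self.coords.items():
            size = self.dims[coord["dims"][0]]
            coord["data"] = self._generate_coordinate_data(name, size)
        for name, var in self.variables.items():
            shape = tuple(self.dims[d] for d in var["dims"])
            var["data"] = self._generate_variable_data(name, shape, var["attrs"])
        return self  # returning self allows method chaining

    def _generate_coordinate_data(self, coord_name, size):
        # Simplest strategy: sequential integers.
        return list(range(size))

    def _generate_variable_data(self, var_name, shape, attrs):
        # Simplest strategy: standard normal values in a flat list.
        count = 1
        for extent in shape:
            count *= extent
        return [self._rng.gauss(0.0, 1.0) for _ in range(count)]


class TinyDataset(SketchGenerationMixin):
    """Hypothetical host class holding plain dicts instead of real arrays."""

    def __init__(self):
        self.dims = {"time": 3, "lat": 2}
        self.coords = {"time": {"dims": ["time"], "data": None}}
        self.variables = {
            "temperature": {"dims": ["time", "lat"], "attrs": {}, "data": None}
        }


ds = TinyDataset().populate_with_random_data(seed=42)
print(ds.coords["time"]["data"])                 # [0, 1, 2]
print(len(ds.variables["temperature"]["data"]))  # 6
```

The mixin owns no state of its own beyond the generator; it relies on the host class to provide `dims`, `coords`, and `variables`, which is the usual contract for a mixin.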
Data Generation Strategies
Coordinates
- time - Sequential integers (0, 1, 2, ...)
- lat - Uniform distribution from -90 to 90
- lon - Uniform distribution from -180 to 180
- lev/plev - Decreasing pressure levels
- Default - Sequential integers
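The coordinate rules above can be sketched as a simple dispatch on the coordinate name. This is a stdlib-only illustration with several assumptions: the function names are hypothetical, "uniform" is read here as evenly spaced values (the library could equally draw random samples), and the 1000 to 100 hPa pressure range is chosen purely for illustration.

```python
def evenly_spaced(start, stop, size):
    """Return size values from start to stop inclusive, evenly spaced."""
    if size == 1:
        return [float(start)]
    step = (stop - start) / (size - 1)
    return [start + i * step for i in range(size)]


def sketch_coordinate(name, size):
    """Pick a generation strategy from the coordinate name."""
    if name == "lat":
        return evenly_spaced(-90.0, 90.0, size)
    if name == "lon":
        return evenly_spaced(-180.0, 180.0, size)
    if name in ("lev", "plev"):
        # Decreasing pressure levels; the hPa bounds are illustrative.
        return evenly_spaced(1000.0, 100.0, size)
    # "time" and any unrecognised name fall back to sequential integers.
    return list(range(size))


print(sketch_coordinate("time", 4))  # [0, 1, 2, 3]
print(sketch_coordinate("lat", 5))   # [-90.0, -45.0, 0.0, 45.0, 90.0]
print(sketch_coordinate("plev", 3))  # [1000.0, 550.0, 100.0]
```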
Variables
Based on variable name and units:
- temperature - Realistic ranges (250 to 310 K, or -20 to 40 °C)
- precipitation - Non-negative, skewed distribution
- wind - Appropriate ranges for wind components
- humidity - 0-100% range
- Default - Standard normal distribution
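A rough sketch of how name- and unit-based selection might look, using only the standard library. Everything here is an assumption made for illustration: the function name `sketch_variable`, the substring matching on the variable name, the unit strings treated as Celsius, the exponential distribution for precipitation, and the ±20 range for wind. The result is a flat list rather than a shaped array.

```python
import math
import random


def sketch_variable(name, shape, attrs=None, seed=None):
    """Return prod(shape) values drawn from a range suited to the variable."""
    attrs = attrs or {}
    rng = random.Random(seed)
    count = math.prod(shape)
    units = attrs.get("units", "").lower()
    lname = name.lower()

    if "temp" in lname:
        # Kelvin range unless the units look like Celsius.
        celsius = units in {"degc", "°c", "celsius"}
        low, high = (-20.0, 40.0) if celsius else (250.0, 310.0)
        return [rng.uniform(low, high) for _ in range(count)]
    if "precip" in lname:
        # Exponential samples are non-negative and right-skewed.
        return [rng.expovariate(1.0) for _ in range(count)]
    if "wind" in lname:
        # Symmetric range for wind components; bounds are illustrative.
        return [rng.uniform(-20.0, 20.0) for _ in range(count)]
    if "humid" in lname:
        return [rng.uniform(0.0, 100.0) for _ in range(count)]
    # Fallback: standard normal distribution.
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


temps = sketch_variable("temperature", (10, 64), {"units": "K"}, seed=42)
rain = sketch_variable("precipitation", (10, 64), seed=42)
print(len(temps))                                # 640
print(all(250.0 <= t <= 310.0 for t in temps))   # True
print(all(r >= 0.0 for r in rain))               # True
```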
Usage
from dummyxarray import DummyDataset  # import path assumed from the package name

ds = DummyDataset()
ds.add_dim("time", 10)
ds.add_dim("lat", 64)
ds.add_coord("time", dims=["time"])
ds.add_coord("lat", dims=["lat"])
ds.add_variable("temperature", dims=["time", "lat"])
# Populate with random data
ds.populate_with_random_data(seed=42)
# Now all arrays have data
print(ds.coords["time"].data.shape) # (10,)
print(ds.variables["temperature"].data.shape) # (10, 64)
API Reference
Mixin providing data generation capabilities.
Source code in src/dummyxarray/data_generation.py
populate_with_random_data
Populate all variables and coordinates with random but meaningful data.
This method generates random data based on variable metadata (units, standard_name, etc.) to create realistic-looking test datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | Random seed for reproducibility | None |
Returns:
| Type | Description |
|---|---|
| self | Returns self for method chaining |
Examples:
>>> ds = DummyDataset()
>>> ds.add_dim("time", 10)
>>> ds.add_dim("lat", 5)
>>> ds.add_coord("time", ["time"], attrs={"units": "days"})
>>> ds.add_variable("temperature", ["time", "lat"],
... attrs={"units": "K"})
>>> ds.populate_with_random_data(seed=42)
>>> print(ds.coords["time"].data)
[0 1 2 3 4 5 6 7 8 9]