Importing from ncdump Headers¶
dummyxarray can parse ncdump -h output to create dataset structures from existing NetCDF files.
Overview¶
The from_ncdump_header() function parses the output of ncdump -h and creates a DummyDataset
with the same structure. This is useful for:
- Replicating structures - Copy metadata from existing datasets
- Template creation - Use real datasets as templates
- Quick inspection - Understand dataset structure without loading data
- Documentation - Extract and document dataset schemas
Basic Usage¶
Step 1: Get ncdump Header¶
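Run ncdump -h against the source file and capture the output (this assumes the ncdump utility from the NetCDF tools is installed and on your PATH; the filenames are placeholders):
ncdump -h your_data.nc > header.txt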
Step 2: Import to DummyDataset¶
from dummyxarray import from_ncdump_header
# Read the header file
with open('header.txt', 'r') as f:
    header_text = f.read()
# Create DummyDataset
ds = from_ncdump_header(header_text)
# The dataset now has the same structure
print(ds)
What Gets Imported¶
The parser extracts:
- ✅ Dimensions - Including UNLIMITED dimensions
- ✅ Variables - With dimensions and data types
- ✅ Coordinates - Automatically detected
- ✅ Attributes - Variable and global attributes
- ✅ Metadata - All CF-compliant metadata
Note: Data arrays are NOT imported, only the structure and metadata.
Complete Example¶
from dummyxarray import from_ncdump_header
# Example ncdump output
header = """
netcdf climate_data {
dimensions:
    time = UNLIMITED ; // (365 currently)
    lat = 64 ;
    lon = 128 ;
variables:
    double time(time) ;
        time:units = "days since 2000-01-01" ;
        time:calendar = "gregorian" ;
        time:axis = "T" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:axis = "Y" ;
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:axis = "X" ;
    float temperature(time, lat, lon) ;
        temperature:units = "K" ;
        temperature:standard_name = "air_temperature" ;
        temperature:long_name = "Air Temperature" ;

// global attributes:
        :Conventions = "CF-1.8" ;
        :title = "Climate Model Output" ;
}
"""
# Import structure
ds = from_ncdump_header(header)
# Check what was imported
print(f"Dimensions: {ds.dims}")
print(f"Coordinates: {list(ds.coords.keys())}")
print(f"Variables: {list(ds.variables.keys())}")
print(f"Global attrs: {ds.attrs}")
Coordinate Detection¶
The parser automatically identifies coordinates using this rule:
A variable is a coordinate if:
- It has exactly one dimension
- The dimension name matches the variable name
# These are detected as coordinates:
double time(time) ; # ✓ Coordinate
double lat(lat) ; # ✓ Coordinate
double lon(lon) ; # ✓ Coordinate
# These are detected as variables:
float temperature(time, lat, lon) ; # ✗ Variable (multi-dim)
float bounds(lat, nv) ; # ✗ Variable (dims don't match name)
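The rule can be expressed as a small predicate. This is a minimal sketch of the detection logic described above, not the library's internal implementation; the helper name is hypothetical:
def looks_like_coordinate(var_name, var_dims):
    """Sketch: a coordinate has exactly one dimension, named after itself."""
    return len(var_dims) == 1 and var_dims[0] == var_name

print(looks_like_coordinate("time", ["time"]))                       # True
print(looks_like_coordinate("temperature", ["time", "lat", "lon"]))  # False
print(looks_like_coordinate("bounds", ["lat", "nv"]))                # False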
Handling UNLIMITED Dimensions¶
UNLIMITED dimensions are handled automatically:
# ncdump shows:
# time = UNLIMITED ; // (365 currently)
# Parser extracts the current size (365)
ds = from_ncdump_header(header)
print(ds.dims['time']) # 365
If no current size is specified, the dimension size will be None:
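# Sketch: an UNLIMITED dimension with no "currently" comment
header_no_size = """
netcdf obs {
dimensions:
    time = UNLIMITED ;
variables:
    double time(time) ;
}
"""
ds = from_ncdump_header(header_no_size)
print(ds.dims['time'])  # None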
Working with Imported Datasets¶
Once imported, you can work with the dataset normally:
Populate with Data¶
# Add random data for testing
ds.populate_with_random_data(seed=42)
# Now convert to xarray
xr_ds = ds.to_xarray()
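As a quick sanity check (assuming the climate_data header from the complete example above), the converted arrays should match the declared dimension sizes:
print(xr_ds['temperature'].shape)  # (365, 64, 128)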
Validate CF Compliance¶
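Imported structures can be checked immediately, using the same validate_cf() call shown under Workflow 3 below:
result = ds.validate_cf()
print(f"Warnings: {len(result['warnings'])}")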
Modify Structure¶
# Add new variables
ds.add_variable(
    "humidity",
    dims=["time", "lat", "lon"],
    attrs={"units": "%", "standard_name": "relative_humidity"}
)
# Update attributes
ds.assign_attrs(history="Modified with dummyxarray")
Export as Template¶
# Save as YAML template
ds.save_yaml("template.yaml")
# Later, load and reuse (import assumed from the package top level)
from dummyxarray import DummyDataset
ds2 = DummyDataset.load_yaml("template.yaml")
History Tracking¶
By default, history is recorded when importing:
ds = from_ncdump_header(header, record_history=True)
# View construction history
history = ds.get_history()
print(f"Operations: {len(history)}")
# Export as Python code
python_code = ds.export_history('python')
print(python_code)
Disable history if not needed:
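ds = from_ncdump_header(header, record_history=False)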
Supported Features¶
Dimensions¶
- ✅ Fixed-size dimensions
- ✅ UNLIMITED dimensions with current size
- ✅ UNLIMITED dimensions without size (→ None)
Variables¶
- ✅ All NetCDF data types (double, float, int, etc.)
- ✅ Multi-dimensional variables
- ✅ Coordinate variables
- ✅ Variable attributes
Attributes¶
- ✅ String attributes
- ✅ Numeric attributes (int, float)
- ✅ Array attributes
- ✅ Global attributes
Not Supported¶
- ❌ Data arrays (only structure)
- ❌ Groups (NetCDF-4 feature)
- ❌ User-defined types
- ❌ Compound types
Practical Workflows¶
Workflow 1: Replicate Existing Dataset¶
# Get structure from an existing file (the "!" prefix is a Jupyter shell escape)
!ncdump -h existing_data.nc > structure.txt
# Import structure
with open('structure.txt') as f:
    ds = from_ncdump_header(f.read())
# Populate with new data
ds.populate_with_random_data()
# Save as new file
ds.to_zarr("new_data.zarr")
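Outside a notebook, the header can be captured without the shell escape. A sketch using only the standard library (assumes the ncdump binary is installed):
import subprocess
from dummyxarray import from_ncdump_header

# Run `ncdump -h` and capture its stdout as text
header_text = subprocess.run(
    ["ncdump", "-h", "existing_data.nc"],
    capture_output=True, text=True, check=True,
).stdout
ds = from_ncdump_header(header_text)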
Workflow 2: Document Dataset Schema¶
# Import structure
ds = from_ncdump_header(header_text)
# Export as YAML documentation
ds.save_yaml("dataset_schema.yaml")
# Export history as Python script
with open('create_dataset.py', 'w') as f:
    f.write(ds.export_history('python'))
Workflow 3: Validate and Fix Metadata¶
# Import existing structure
ds = from_ncdump_header(header_text)
# Check CF compliance
result = ds.validate_cf()
# Fix issues
ds.infer_axis()
ds.set_axis_attributes()
# Re-validate
result = ds.validate_cf()
print(f"Warnings: {len(result['warnings'])}")
Tips and Best Practices¶
- Always validate - Run validate_cf() after importing
- Check coordinates - Verify coordinate detection is correct
- Handle UNLIMITED - Be aware of None dimension sizes (see the sketch after this list)
- Add encoding - Set chunks and compression for new data
- Document changes - Use history tracking for reproducibility
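For the UNLIMITED tip, a small guard catches unsized dimensions before data generation (a sketch that only assumes ds.dims behaves like a mapping, as the examples above suggest):
unsized = [name for name, size in ds.dims.items() if size is None]
if unsized:
    print(f"Set sizes for these dimensions before populating: {unsized}")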
Limitations¶
- Only parses metadata, not data
- Assumes standard ncdump format
- May not handle all edge cases
- Groups and complex types not supported
See Also¶
- CF Compliance - Validate imported datasets
- History Tracking - Track modifications
- YAML Export - Save as templates
- Examples - More ncdump import examples