Importing from ncdump Headers¶
dummyxarray can parse ncdump -h output to create dataset structures from existing NetCDF files.
Overview¶
The from_ncdump_header() function parses the output of ncdump -h and creates a DummyDataset
with the same structure. This is useful for:
- Replicating structures - Copy metadata from existing datasets
- Template creation - Use real datasets as templates
- Quick inspection - Understand dataset structure without loading data
- Documentation - Extract and document dataset schemas
Basic Usage¶
Step 1: Get ncdump Header¶
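Run ncdump -h against the source file and capture the output (this assumes the ncdump utility from the NetCDF tools is installed and on your PATH; the filenames are placeholders):
ncdump -h your_data.nc > header.txt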
Step 2: Import to DummyDataset¶
from dummyxarray import from_ncdump_header
# Read the header file
with open('header.txt', 'r') as f:
    header_text = f.read()
# Create DummyDataset
ds = from_ncdump_header(header_text)
# The dataset now has the same structure
print(ds)
What Gets Imported¶
The parser extracts:
- ✅ Dimensions - Including UNLIMITED dimensions
- ✅ Variables - With dimensions and data types
- ✅ Coordinates - Automatically detected
- ✅ Attributes - Variable and global attributes
- ✅ Metadata - All CF-compliant metadata
Note: Data arrays are NOT imported, only the structure and metadata.
Complete Example¶
from dummyxarray import from_ncdump_header
# Example ncdump output
header = """
netcdf climate_data {
dimensions:
    time = UNLIMITED ; // (365 currently)
    lat = 64 ;
    lon = 128 ;
variables:
    double time(time) ;
        time:units = "days since 2000-01-01" ;
        time:calendar = "gregorian" ;
        time:axis = "T" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:axis = "Y" ;
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:axis = "X" ;
    float temperature(time, lat, lon) ;
        temperature:units = "K" ;
        temperature:standard_name = "air_temperature" ;
        temperature:long_name = "Air Temperature" ;

// global attributes:
        :Conventions = "CF-1.8" ;
        :title = "Climate Model Output" ;
}
"""
# Import structure
ds = from_ncdump_header(header)
# Check what was imported
print(f"Dimensions: {ds.dims}")
print(f"Coordinates: {list(ds.coords.keys())}")
print(f"Variables: {list(ds.variables.keys())}")
print(f"Global attrs: {ds.attrs}")
Coordinate Detection¶
The parser automatically identifies coordinates using this rule:
A variable is a coordinate if:
- It has exactly one dimension
- The dimension name matches the variable name
# These are detected as coordinates:
double time(time) ; # ✓ Coordinate
double lat(lat) ; # ✓ Coordinate
double lon(lon) ; # ✓ Coordinate
# These are detected as variables:
float temperature(time, lat, lon) ; # ✗ Variable (multi-dim)
float bounds(lat, nv) ; # ✗ Variable (dims don't match name)
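The rule can be expressed as a small predicate. This is a minimal sketch of the detection logic described above, not the library's internal implementation; the helper name is hypothetical:
def looks_like_coordinate(var_name, var_dims):
    """Sketch: a coordinate has exactly one dimension, named after itself."""
    return len(var_dims) == 1 and var_dims[0] == var_name

print(looks_like_coordinate("time", ["time"]))                       # True
print(looks_like_coordinate("temperature", ["time", "lat", "lon"]))  # False
print(looks_like_coordinate("bounds", ["lat", "nv"]))                # False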
Handling UNLIMITED Dimensions¶
UNLIMITED dimensions are handled automatically:
# ncdump shows:
# time = UNLIMITED ; // (365 currently)
# Parser extracts the current size (365)
ds = from_ncdump_header(header)
print(ds.dims['time']) # 365
If no current size is specified, the dimension size will be None:
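# Sketch: an UNLIMITED dimension with no "currently" comment
header_no_size = """
netcdf obs {
dimensions:
    time = UNLIMITED ;
variables:
    double time(time) ;
}
"""
ds = from_ncdump_header(header_no_size)
print(ds.dims['time'])  # None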
Working with Imported Datasets¶
Once imported, you can work with the dataset normally:
Populate with Data¶
# Add random data for testing
ds.populate_with_random_data(seed=42)
# Now convert to xarray
xr_ds = ds.to_xarray()
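As a quick sanity check (assuming the climate_data header from the complete example above), the converted arrays should match the declared dimension sizes:
print(xr_ds['temperature'].shape)  # (365, 64, 128)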
Validate CF Compliance¶
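Imported structures can be checked immediately, using the same validate_cf() call shown under Workflow 3 below:
result = ds.validate_cf()
print(f"Warnings: {len(result['warnings'])}")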
Modify Structure¶
# Add new variables
ds.add_variable(
    "humidity",
    dims=["time", "lat", "lon"],
    attrs={"units": "%", "standard_name": "relative_humidity"}
)
# Update attributes
ds.assign_attrs(history="Modified with dummyxarray")
Export as Template¶
# Save as YAML template
ds.save_yaml("template.yaml")
# Later, load and reuse (import assumed from the package top level)
from dummyxarray import DummyDataset
ds2 = DummyDataset.load_yaml("template.yaml")
History Tracking¶
By default, history is recorded when importing:
ds = from_ncdump_header(header, record_history=True)
# View construction history
history = ds.get_history()
print(f"Operations: {len(history)}")
# Export as Python code
python_code = ds.export_history('python')
print(python_code)
Disable history if not needed:
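ds = from_ncdump_header(header, record_history=False)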
Supported Features¶
Dimensions¶
- ✅ Fixed-size dimensions
- ✅ UNLIMITED dimensions with current size
- ✅ UNLIMITED dimensions without size (→ None)
Variables¶
- ✅ All NetCDF data types (double, float, int, etc.)
- ✅ Multi-dimensional variables
- ✅ Coordinate variables
- ✅ Variable attributes
Attributes¶
- ✅ String attributes
- ✅ Numeric attributes (int, float)
- ✅ Array attributes
- ✅ Global attributes
Not Supported¶
- ❌ Data arrays (only structure)
- ❌ Groups (NetCDF-4 feature)
- ❌ User-defined types
- ❌ Compound types
Practical Workflows¶
Workflow 1: Replicate Existing Dataset¶
# Get structure from an existing file (the "!" prefix is a Jupyter shell escape)
!ncdump -h existing_data.nc > structure.txt
# Import structure
with open('structure.txt') as f:
    ds = from_ncdump_header(f.read())
# Populate with new data
ds.populate_with_random_data()
# Save as new file
ds.to_zarr("new_data.zarr")
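Outside a notebook, the header can be captured without the shell escape. A sketch using only the standard library (assumes the ncdump binary is installed):
import subprocess
from dummyxarray import from_ncdump_header

# Run `ncdump -h` and capture its stdout as text
header_text = subprocess.run(
    ["ncdump", "-h", "existing_data.nc"],
    capture_output=True, text=True, check=True,
).stdout
ds = from_ncdump_header(header_text)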
Workflow 2: Document Dataset Schema¶
# Import structure
ds = from_ncdump_header(header_text)
# Export as YAML documentation
ds.save_yaml("dataset_schema.yaml")
# Export history as Python script
with open('create_dataset.py', 'w') as f:
    f.write(ds.export_history('python'))
Workflow 3: Validate and Fix Metadata¶
# Import existing structure
ds = from_ncdump_header(header_text)
# Check CF compliance
result = ds.validate_cf()
# Fix issues
ds.infer_axis()
ds.set_axis_attributes()
# Re-validate
result = ds.validate_cf()
print(f"Warnings: {len(result['warnings'])}")
Tips and Best Practices¶
- Always validate - Run validate_cf() after importing
- Check coordinates - Verify coordinate detection is correct
- Handle UNLIMITED - Be aware of None dimension sizes (see the sketch after this list)
- Add encoding - Set chunks and compression for new data
- Document changes - Use history tracking for reproducibility
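For the UNLIMITED tip, a small guard catches unsized dimensions before data generation (a sketch that only assumes ds.dims behaves like a mapping, as the examples above suggest):
unsized = [name for name, size in ds.dims.items() if size is None]
if unsized:
    print(f"Set sizes for these dimensions before populating: {unsized}")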
Limitations¶
- Only parses metadata, not data
- Assumes standard ncdump format
- May not handle all edge cases
- Groups and complex types not supported
See Also¶
- CF Compliance - Validate imported datasets
- History Tracking - Track modifications
- YAML Export - Save as templates
- Examples - More ncdump import examples