History Tracking¶

dummyxarray automatically tracks all operations performed on datasets, enabling reproducible workflows and dataset provenance.

Overview¶

Every operation on a DummyDataset is automatically recorded, including:

- Dataset initialization
- Adding dimensions, coordinates, and variables
- Setting attributes
- Renaming operations
- And more...

This history can be:

- Exported as Python code, JSON, or YAML
- Visualized as text, DOT graphs, or Mermaid diagrams
- Replayed to recreate datasets
- Reset to start tracking from a clean state

Basic Usage¶

Getting History¶

from dummyxarray import DummyDataset

ds = DummyDataset()
ds.add_dim("time", 10)
ds.add_coord("time", dims=["time"], attrs={"units": "days"})
ds.assign_attrs(title="My Dataset")

# Get the operation history
history = ds.get_history()
print(history)

Output:

[
    {'func': '__init__', 'args': {}},
    {'func': 'add_dim', 'args': {'name': 'time', 'size': 10}},
    {'func': 'add_coord', 'args': {'name': 'time', 'dims': ['time'], 'attrs': {'units': 'days'}}},
    {'func': 'assign_attrs', 'args': {'title': 'My Dataset'}}
]
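Because the history is a plain list of dictionaries, it can be inspected with ordinary Python. The following is a minimal sketch that works on the list printed above, copied in as a literal so it runs without dummyxarray installed. `ops_touching` is a hypothetical helper name used for illustration and is not part of the library.

```python
# History literal copied from the output above.
history = [
    {"func": "__init__", "args": {}},
    {"func": "add_dim", "args": {"name": "time", "size": 10}},
    {"func": "add_coord", "args": {"name": "time", "dims": ["time"], "attrs": {"units": "days"}}},
    {"func": "assign_attrs", "args": {"title": "My Dataset"}},
]


def ops_touching(history, name):
    """Return the function names of entries whose 'name' argument equals `name`.

    Hypothetical helper for illustration; not part of dummyxarray.
    """
    return [entry["func"] for entry in history if entry["args"].get("name") == name]


time_ops = ops_touching(history, "time")
print(time_ops)
```

Entries without a `name` argument, such as `__init__` and `assign_attrs`, are simply skipped by the `.get()` lookup.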

Exporting History¶

As Python Code¶

Export history as executable Python code:

python_code = ds.export_history('python')
print(python_code)

Output:

ds = DummyDataset()
ds.add_dim(name='time', size=10)
ds.add_coord(name='time', dims=['time'], attrs={'units': 'days'})
ds.assign_attrs(title='My Dataset')
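To make the mapping from history entries to code concrete, here is a rough sketch of how such an export could be produced from the history list. It is an illustration only: `render_python` and its parameters are hypothetical names, and the library's own exporter may work differently.

```python
def render_python(history, var="ds", cls="DummyDataset"):
    """Render a history list as Python source, one call per line.

    Hypothetical sketch; not dummyxarray's implementation.
    """
    lines = []
    for entry in history:
        # repr() keeps quotes around strings and renders lists and dicts as literals.
        kwargs = ", ".join(f"{key}={value!r}" for key, value in entry["args"].items())
        if entry["func"] == "__init__":
            lines.append(f"{var} = {cls}({kwargs})")
        else:
            lines.append(f"{var}.{entry['func']}({kwargs})")
    return "\n".join(lines)


# History literal copied from the "Getting History" output.
history = [
    {"func": "__init__", "args": {}},
    {"func": "add_dim", "args": {"name": "time", "size": 10}},
    {"func": "add_coord", "args": {"name": "time", "dims": ["time"], "attrs": {"units": "days"}}},
    {"func": "assign_attrs", "args": {"title": "My Dataset"}},
]

code = render_python(history)
print(code)
```

Rendering every argument as a keyword is what lets the generated code stay readable even when the original call used positional arguments.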

As JSON¶

json_history = ds.export_history('json')
print(json_history)

As YAML¶

yaml_history = ds.export_history('yaml')
print(yaml_history)
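The exact layout of the exported JSON and YAML is not shown here. As a rough illustration of why a text format is workable for this kind of history, the sketch below round-trips the history list from the earlier example through Python's standard `json` module. It does not call `export_history` and makes no claim about the library's output format.

```python
import json

# History literal copied from the "Getting History" output.
history = [
    {"func": "__init__", "args": {}},
    {"func": "add_dim", "args": {"name": "time", "size": 10}},
    {"func": "add_coord", "args": {"name": "time", "dims": ["time"], "attrs": {"units": "days"}}},
    {"func": "assign_attrs", "args": {"title": "My Dataset"}},
]

# Serialize to text, then parse it back.
text = json.dumps(history, indent=2)
restored = json.loads(text)

# Strings, integers, lists, and dicts survive the round trip unchanged.
print(restored == history)
```

One caveat with plain JSON is that tuples come back as lists, so arguments recorded as tuples would not compare equal after a round trip.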

Visualizing History¶

Text Format¶

print(ds.visualize_history(format='text'))

Output:

Dataset Construction History
============================
1. __init__()
2. add_dim(name='time', size=10)
3. add_coord(name='time', dims=['time'], attrs={'units': 'days'})
4. assign_attrs(title='My Dataset')

Summary:
  Total operations: 4
  Operation breakdown:
    __init__: 1
    add_dim: 1
    add_coord: 1
    assign_attrs: 1
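The summary section of the text view is essentially a count of entries per function name. A minimal sketch of that tally, assuming only the history structure shown earlier and using `collections.Counter` from the standard library, could look like this. It is not the library's implementation.

```python
from collections import Counter

# History literal copied from the "Getting History" output.
history = [
    {"func": "__init__", "args": {}},
    {"func": "add_dim", "args": {"name": "time", "size": 10}},
    {"func": "add_coord", "args": {"name": "time", "dims": ["time"], "attrs": {"units": "days"}}},
    {"func": "assign_attrs", "args": {"title": "My Dataset"}},
]

# Count how many times each operation appears.
breakdown = Counter(entry["func"] for entry in history)

print(f"Total operations: {len(history)}")
for func, count in breakdown.items():
    print(f"  {func}: {count}")
```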

DOT Format (Graphviz)¶

Generate a graph visualization:

dot_graph = ds.visualize_history(format='dot')
print(dot_graph)

# Save to file and render
with open('history.dot', 'w') as f:
    f.write(dot_graph)

# Render with: dot -Tpng history.dot -o history.png

Mermaid Format¶

Generate a Mermaid diagram (works in GitHub, GitLab, documentation):

mermaid_diagram = ds.visualize_history(format='mermaid')
print(mermaid_diagram)

Output:

graph TD
    A[__init__] --> B[add_dim: time=10]
    B --> C[add_coord: time]
    C --> D[assign_attrs: title]
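For readers who want to build a similar diagram from a stored history list, the following sketch chains the operations top-down. It is a simplified assumption-laden version: `to_mermaid` is a hypothetical helper, node ids are `N0`, `N1`, ... rather than letters, and labels contain only the function name, so its output is not identical to what `visualize_history` produces.

```python
def to_mermaid(history):
    """Build a linear Mermaid flowchart from a history list.

    Hypothetical sketch; labels carry only the function name.
    """
    funcs = [entry["func"] for entry in history]
    lines = ["graph TD"]
    if len(funcs) == 1:
        lines.append(f"    N0[{funcs[0]}]")
    for i in range(1, len(funcs)):
        # Declare the first node's label on the first edge; later sources reuse their id.
        source = f"N0[{funcs[0]}]" if i == 1 else f"N{i - 1}"
        lines.append(f"    {source} --> N{i}[{funcs[i]}]")
    return "\n".join(lines)


history = [
    {"func": "__init__", "args": {}},
    {"func": "add_dim", "args": {"name": "time", "size": 10}},
    {"func": "add_coord", "args": {"name": "time", "dims": ["time"]}},
]

diagram = to_mermaid(history)
print(diagram)
```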

Replaying History¶

From History List¶

Recreate a dataset from its history:

# Get history from original dataset
history = ds.get_history()

# Create new dataset by replaying history
new_ds = DummyDataset.replay_history(history)

# Verify they're equivalent
assert new_ds.dims == ds.dims
assert new_ds.attrs == ds.attrs
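Conceptually, replaying means calling each recorded method again with its recorded arguments. The sketch below shows that idea on a tiny stand-in class so it runs on its own. `MiniDataset` and `replay` are hypothetical names for illustration; this is not how `DummyDataset.replay_history` is necessarily implemented.

```python
class MiniDataset:
    """Tiny stand-in used only for this sketch; not the real DummyDataset."""

    def __init__(self):
        self.dims = {}
        self.attrs = {}

    def add_dim(self, name, size):
        self.dims[name] = size

    def assign_attrs(self, **kwargs):
        self.attrs.update(kwargs)


def replay(history, cls=MiniDataset):
    """Rebuild an object by re-invoking each recorded call in order."""
    if not history or history[0]["func"] != "__init__":
        raise ValueError("history must start with an __init__ entry")
    obj = cls(**history[0]["args"])
    for entry in history[1:]:
        # Look the method up by name and pass the recorded keyword arguments.
        getattr(obj, entry["func"])(**entry["args"])
    return obj


history = [
    {"func": "__init__", "args": {}},
    {"func": "add_dim", "args": {"name": "time", "size": 10}},
    {"func": "assign_attrs", "args": {"title": "My Dataset"}},
]

rebuilt = replay(history)
print(rebuilt.dims, rebuilt.attrs)
```

Because dispatch is by method name, a replay like this only works when every recorded `func` still exists on the target class with compatible keyword arguments.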

From JSON/YAML¶

# Export history
json_history = ds.export_history('json')

# Later, replay from JSON string
recreated_ds = DummyDataset.replay_history(json_history)

Use Cases for Replay¶

- Version Control: store history in git and replay it to recreate datasets
- Documentation: include history in papers and reports
- Debugging: replay to understand how a dataset was created
- Templates: share dataset creation workflows

Provenance Tracking¶

Track what changed in each operation:

ds = DummyDataset()
ds.add_dim("time", 10)
ds.add_coord("time", dims=["time"])
ds.rename_dims(time="t")

# Get provenance information
provenance = ds.get_provenance()
print(provenance)

Output:

[
    {'index': 1, 'func': 'add_dim', 'provenance': {'added': ['time']}},
    {'index': 2, 'func': 'add_coord', 'provenance': {'added': ['time']}},
    {
        'index': 3,
        'func': 'rename_dims',
        'provenance': {'renamed': {'time': 't'}, 'removed': ['time'], 'added': ['t']}
    }
]
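Provenance records can be folded into a picture of which names exist after all operations. The sketch below does this for the records from this example, copied in as a literal. `surviving_names` is a hypothetical helper, and since the records shown do not say whether a name refers to a dimension, coordinate, or variable, the result is a coarse set of names rather than a full description of the dataset.

```python
# Provenance records copied from the example output.
provenance = [
    {"index": 1, "func": "add_dim", "provenance": {"added": ["time"]}},
    {"index": 2, "func": "add_coord", "provenance": {"added": ["time"]}},
    {
        "index": 3,
        "func": "rename_dims",
        "provenance": {"renamed": {"time": "t"}, "removed": ["time"], "added": ["t"]},
    },
]


def surviving_names(provenance):
    """Apply each record's removals, then its additions, in order."""
    names = set()
    for record in provenance:
        change = record["provenance"]
        names.difference_update(change.get("removed", []))
        names.update(change.get("added", []))
    return names


final_names = surviving_names(provenance)
print(final_names)
```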

Visualizing Provenance¶

# Compact view
print(ds.visualize_provenance(format='compact'))

# Detailed view
print(ds.visualize_provenance(format='detailed'))

Resetting History¶

Start fresh while keeping the current dataset state:

# Create dataset with many operations
ds = DummyDataset()
ds.add_dim("time", 10)
ds.add_dim("lat", 64)
ds.add_coord("time", dims=["time"])
# ... many more operations ...

# Reset history
ds.reset_history()

# Now history only contains __init__
print(len(ds.get_history()))  # 1

# New operations are tracked from this point
ds.add_variable("temp", dims=["time", "lat"])
print(len(ds.get_history()))  # 2

When to Reset¶

- After importing from xarray
- When starting a new workflow phase
- To simplify history for documentation
- Before sharing templates

Complete Example¶

from dummyxarray import DummyDataset

# Create dataset with tracked operations
ds = DummyDataset()
ds.assign_attrs(Conventions="CF-1.8", title="Temperature Data")

ds.add_dim("time", 12)
ds.add_dim("lat", 64)
ds.add_dim("lon", 128)

ds.add_coord("time", dims=["time"], attrs={"units": "days since 2000-01-01"})
ds.add_coord("lat", dims=["lat"], attrs={"units": "degrees_north"})
ds.add_coord("lon", dims=["lon"], attrs={"units": "degrees_east"})

ds.add_variable(
    "temperature",
    dims=["time", "lat", "lon"],
    attrs={"standard_name": "air_temperature", "units": "K"}
)

# Export history as Python code
python_code = ds.export_history('python')
with open('create_dataset.py', 'w') as f:
    f.write(python_code)

# Visualize history
print(ds.visualize_history(format='text'))

# Save history as JSON for version control
import json
history = ds.get_history()
with open('dataset_history.json', 'w') as f:
    json.dump(history, f, indent=2)

# Get provenance information
provenance = ds.get_provenance()
print(f"\nTotal operations with provenance: {len(provenance)}")
print(f"Items added: {sum(len(p['provenance'].get('added', [])) for p in provenance)}")

History for DummyArray¶

Individual arrays also track their history:

from dummyxarray import DummyArray

arr = DummyArray(dims=["time"], attrs={"units": "K"})
arr.assign_attrs(long_name="Temperature", standard_name="air_temperature")

# Get array history
history = arr.get_history()
print(history)

# Replay array history
new_arr = DummyArray.replay_history(history)

Best Practices¶

- Export history regularly: save it to version control
- Use meaningful operation names: this makes history easier to understand
- Reset after imports: clean the history when importing from xarray
- Visualize for documentation: include Mermaid diagrams in docs
- Replay for testing: verify that dataset creation is reproducible
- Track provenance: understand what changed and when

Advanced: Custom History Recording¶

History recording can be disabled:

# Create dataset without history tracking
ds = DummyDataset(_record_history=False)

# No history is recorded
ds.add_dim("time", 10)
print(ds.get_history())  # None

This can be useful for:
- Performance-critical code
- Temporary datasets
- Internal operations
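
To illustrate what opting out of recording means in practice, here is a minimal sketch of a class that records calls only when asked to. `Recorder` and its `record_history` parameter are hypothetical and exist only for this example; they are not dummyxarray's internals. The choice to return `None` when recording is off mirrors the behaviour shown in the example above.

```python
class Recorder:
    """Minimal illustration of opt-out history recording; not dummyxarray code."""

    def __init__(self, record_history=True):
        # None means "recording disabled"; a list means "recording enabled".
        self._history = [{"func": "__init__", "args": {}}] if record_history else None
        self.dims = {}

    def _record(self, func, **args):
        if self._history is not None:
            self._history.append({"func": func, "args": args})

    def add_dim(self, name, size):
        self.dims[name] = size
        self._record("add_dim", name=name, size=size)

    def get_history(self):
        # Return a copy so callers cannot mutate the internal list.
        return None if self._history is None else list(self._history)


tracked = Recorder()
tracked.add_dim("time", 10)

untracked = Recorder(record_history=False)
untracked.add_dim("time", 10)

print(tracked.get_history())
print(untracked.get_history())
```

Both objects end up in the same state; only the bookkeeping differs, which is why skipping it can help in performance-critical or throwaway code.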

Next Steps¶

- Learn about CF Compliance validation
- See Examples for history tracking workflows
- Check the API Reference for all history methods