PyCauset Container Format
This page specifies the on-disk container format for .pycauset files.
If you want a conceptual and user-oriented explanation (how to save/load safely, copying rules, mental model), start here:
Goals
The container format is designed to satisfy three constraints at once:
1) mmap-friendly payload: large payload bytes live at a stable, aligned offset.
2) deterministic load: loading selects an active header slot in \(O(1)\).
3) crash-consistent metadata updates: metadata updates are append-only and committed by flipping a header slot.
High-level layout
1) Fixed 4096-byte header with a preamble + two header slots (A/B).
2) Payload region at payload_offset (aligned to 4096 bytes).
3) Append-only metadata block at metadata_offset (aligned to 16 bytes).
Endianness
- Containers are little-endian only.
- A header endian marker allows fast failure on unsupported endianness.
Alignment
- payload_offset is aligned to 4096 bytes.
- metadata_offset is aligned to 16 bytes.
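As a small worked example of the alignment rules, a writer can round a candidate offset up to the next boundary. The helper name align_up is hypothetical and not part of the format.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (hypothetical helper)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (offset + alignment - 1) // alignment * alignment


# A payload of 100 bytes starting at 4096 ends at 4196, so the next
# 16-byte-aligned position for a metadata block is 4208.
metadata_offset = align_up(4096 + 100, 16)
```

An offset that is already aligned is returned unchanged, so the helper is safe to apply unconditionally.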
Fixed header (4096 bytes)
The file begins with a fixed header region:
- A 16-byte preamble.
- Two 128-byte header slots (A and B).
- The remainder reserved (zero in the current format).
Preamble (offset 0)
| Field | Type | Notes |
|---|---|---|
| magic | 8 bytes | ASCII PYCAUSET |
| format_version | u32 | current = 1 |
| endian | u8 | 1 = little-endian |
| header_bytes | u16 | current = 4096 |
| reserved0 | u8[1] | must be 0 |
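A minimal sketch of reading the preamble with Python's struct module. It assumes the fields are packed in table order with no padding, which is consistent with the stated 16-byte size (8 + 4 + 1 + 2 + 1). The function name parse_preamble is hypothetical, and rejecting any version other than 1 is a choice made for this sketch, not a rule stated by the format.

```python
import struct

# magic, format_version, endian, header_bytes, reserved0 (little-endian, unpadded)
PREAMBLE = struct.Struct("<8sIBHB")


def parse_preamble(buf: bytes) -> dict:
    """Decode and sanity-check the 16-byte preamble at offset 0."""
    if len(buf) < PREAMBLE.size:
        raise ValueError("file too short for preamble")
    magic, version, endian, header_bytes, reserved0 = PREAMBLE.unpack_from(buf, 0)
    if magic != b"PYCAUSET":
        raise ValueError("bad magic")
    if endian != 1:
        raise ValueError("unsupported endianness")  # fast failure
    if version != 1:
        raise ValueError("unsupported format_version")
    if header_bytes != 4096 or reserved0 != 0:
        raise ValueError("unexpected header_bytes or reserved0")
    return {"format_version": version, "endian": endian, "header_bytes": header_bytes}
```

Checking the magic and endian marker first lets a reader fail before touching any offsets.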
Header slots (A and B)
Each slot is 128 bytes and stores the authoritative pointers.
| Field | Type | Notes |
|---|---|---|
| generation | u64 | monotonic; higher wins |
| payload_offset | u64 | aligned to 4096 |
| payload_length | u64 | bytes |
| metadata_offset | u64 | aligned to 16 |
| metadata_length | u64 | bytes |
| hot_offset | u64 | 0 in v1 |
| hot_length | u64 | 0 in v1 |
| slot_crc32 | u32 | CRC32 of the first 7 fields (56 bytes) |
| slot_reserved | u8[68] | must be 0 |
Slot validity (v1):
- slot_crc32 matches
- offsets/lengths are in-range for file size
- alignment constraints satisfied
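The validity checks above could be sketched as follows. Two assumptions are not confirmed by this page: that the CRC32 variant is the one computed by zlib.crc32, and that "in-range" means offset + length does not exceed the file size. The function name parse_slot is hypothetical.

```python
import struct
import zlib

SLOT_SIZE = 128
SLOT_FIELDS = struct.Struct("<7Q")  # generation .. hot_length (56 bytes)
FIELD_NAMES = (
    "generation", "payload_offset", "payload_length",
    "metadata_offset", "metadata_length", "hot_offset", "hot_length",
)


def parse_slot(raw: bytes, file_size: int):
    """Return the slot fields as a dict, or None if the slot is invalid."""
    if len(raw) != SLOT_SIZE:
        return None
    (stored_crc,) = struct.unpack_from("<I", raw, SLOT_FIELDS.size)
    # Assumption: slot_crc32 is the zlib/IEEE CRC32 of the first 56 bytes.
    if zlib.crc32(raw[:SLOT_FIELDS.size]) != stored_crc:
        return None
    slot = dict(zip(FIELD_NAMES, SLOT_FIELDS.unpack_from(raw, 0)))
    if slot["payload_offset"] % 4096 or slot["metadata_offset"] % 16:
        return None
    if slot["payload_offset"] + slot["payload_length"] > file_size:
        return None
    if slot["metadata_offset"] + slot["metadata_length"] > file_size:
        return None
    return slot
```

Returning None instead of raising keeps the caller free to try the other slot before deciding that the load fails.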
Active slot selection:
- Choose the valid slot with the highest generation.
- If neither slot is valid, loading fails.
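A minimal sketch of the selection rule, taking each slot as either a dict with a generation key or None when the slot failed validation. The function name select_active_slot is hypothetical. This page does not say what happens when both slots are valid with equal generations; the sketch happens to return slot A in that case.

```python
from typing import Optional, Tuple


def select_active_slot(slot_a: Optional[dict], slot_b: Optional[dict]) -> Tuple[str, dict]:
    """Pick the valid slot with the highest generation, or fail."""
    candidates = [
        (name, slot)
        for name, slot in (("A", slot_a), ("B", slot_b))
        if slot is not None
    ]
    if not candidates:
        raise ValueError("no valid header slot; cannot load container")
    # max() keeps the first maximum, so a tie resolves to slot A here.
    return max(candidates, key=lambda item: item[1]["generation"])
```

Because only two fixed-size slots are examined, the cost is constant regardless of how many metadata blocks have been appended.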
Payload region
The payload is raw bytes suitable for memory mapping:
- Starts at payload_offset.
- Spans payload_length bytes.
- Interpretation is defined by identity metadata (shape, dtype, matrix type, payload_layout).
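One way to get a zero-copy view of the payload is to map the whole file read-only and slice it. This is a sketch using the standard mmap module, not PyCauset's own loader, and the function name map_payload is hypothetical. Mapping from offset 0 sidesteps the platform rule that a non-zero mmap offset must be a multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap


def map_payload(path: str, payload_offset: int, payload_length: int):
    """Return (mapping, view) where view covers only the payload bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the mapping outlives the file object.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    end = payload_offset + payload_length
    if end > len(mm):
        mm.close()
        raise ValueError("payload range exceeds file size")
    view = memoryview(mm)[payload_offset:end]
    return mm, view
```

The caller should call view.release() before mm.close(), since a mapping cannot be closed while a memoryview still exports its buffer.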
Metadata blocks (append-only)
Metadata is stored as blocks after the payload. The active header slot points to the authoritative block.
Metadata framing (at metadata_offset)
| Field | Type | Notes |
|---|---|---|
| block_magic | 4 bytes | ASCII PCMB |
| block_version | u32 | v1 = 1 |
| encoding_version | u32 | typed-metadata encoding version; v1 = 1 |
| reserved0 | u32 | must be 0 |
| payload_length | u64 | bytes of encoded metadata payload |
| payload_crc32 | u32 | CRC32 of encoded metadata payload |
| reserved1 | u32 | must be 0 |
| payload | bytes | length = payload_length |
If framing or CRC fails, loading fails deterministically.
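The framing check can be sketched as below. It assumes the fields are packed in table order with no padding (32 bytes before the payload) and that the CRC32 variant matches zlib.crc32; neither is confirmed here. The function name read_metadata_block is hypothetical, and decoding the returned bytes is left out because the typed-metadata encoding is not specified on this page.

```python
import struct
import zlib

# block_magic, block_version, encoding_version, reserved0,
# payload_length, payload_crc32, reserved1
BLOCK_HEADER = struct.Struct("<4sIIIQII")


def read_metadata_block(buf: bytes, metadata_offset: int) -> bytes:
    """Validate framing and CRC, then return the encoded metadata payload."""
    start = metadata_offset + BLOCK_HEADER.size
    if start > len(buf):
        raise ValueError("metadata block header out of range")
    magic, block_ver, enc_ver, res0, length, crc, res1 = BLOCK_HEADER.unpack_from(
        buf, metadata_offset
    )
    if magic != b"PCMB":
        raise ValueError("bad block_magic")
    if block_ver != 1 or enc_ver != 1 or res0 != 0 or res1 != 0:
        raise ValueError("unsupported or malformed block framing")
    payload = bytes(buf[start:start + length])
    if len(payload) != length:
        raise ValueError("metadata payload truncated")
    # Assumption: payload_crc32 is the zlib/IEEE CRC32 of the payload bytes.
    if zlib.crc32(payload) != crc:
        raise ValueError("metadata payload CRC mismatch")
    return payload
```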
Typed metadata map (v1)
The encoded metadata payload is a single top-level map with string keys.
Reserved namespaces:
- identity/header keys: rows, cols, matrix_type, data_type, payload_layout, payload_uuid, ...
- view: system-managed view-state
- properties: user-facing gospel assertions
- cached: cached-derived values plus validity metadata
- provenance: optional non-semantic provenance
Readers ignore unknown keys.
Crash-consistent metadata update rule
To update metadata without scanning:
1) Append the new metadata block to the end of the file.
2) Ensure it is fully written (and flushed if applicable).
3) Write the inactive header slot with generation = active.generation + 1 and the new metadata pointer.
4) Optionally flush the header region.
This guarantees deterministic \(O(1)\) load and never moves the payload region.
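The four steps can be sketched as a single function. Several details here are assumptions not stated on this page: slot A sits at byte 16 and slot B at byte 144, metadata_length covers the 32-byte framing plus the payload, and CRC32 is the zlib variant. The function name commit_metadata is hypothetical.

```python
import os
import struct
import zlib

# Assumption: slot A directly follows the 16-byte preamble, slot B follows slot A.
SLOT_OFFSETS = {"A": 16, "B": 16 + 128}


def commit_metadata(path: str, active: dict, inactive_name: str, encoded: bytes) -> int:
    """Append a metadata block, then publish it via the inactive header slot."""
    with open(path, "r+b") as f:
        # 1) Append the new block at the next 16-byte boundary past end of file.
        end = f.seek(0, os.SEEK_END)
        meta_off = (end + 15) // 16 * 16
        frame = struct.pack(
            "<4sIIIQII", b"PCMB", 1, 1, 0, len(encoded), zlib.crc32(encoded), 0
        )
        f.write(bytes(meta_off - end) + frame + encoded)
        # 2) Make sure the block is on disk before any slot points at it.
        f.flush()
        os.fsync(f.fileno())
        # 3) Write the inactive slot with generation + 1 and the new pointer.
        fields = struct.pack(
            "<7Q",
            active["generation"] + 1,
            active["payload_offset"],
            active["payload_length"],
            meta_off,
            len(frame) + len(encoded),  # assumption: length includes framing
            0,
            0,
        )
        slot = fields + struct.pack("<I", zlib.crc32(fields)) + bytes(68)
        f.seek(SLOT_OFFSETS[inactive_name])
        f.write(slot)
        # 4) Optionally flush the header region.
        f.flush()
        os.fsync(f.fileno())
    return meta_off
```

If a crash interrupts step 1 or 2, the previously active slot still points at the old block. If it interrupts step 3, a torn slot write fails its CRC check and the old slot remains the only valid one.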
Debugging notes
When a .pycauset file fails to load
1) Confirm magic PYCAUSET and version.
2) Inspect header slot A/B:
- CRC valid?
- offsets/lengths in-range?
- alignments satisfied?
- which slot is active?
3) Validate the metadata block framing and CRC.
Developer tooling:
python/pycauset/_internal/storage_debug.py exposes summarize_container(path).