Memory & Data Architecture
This document details the internal architecture of the PyCauset memory system, the matrix/vector class hierarchy, and the type system.
1. Memory System: Tiered Storage
PyCauset is designed to handle causal sets that exceed physical RAM. To achieve this, it adopts a tiered storage architecture where objects can live in RAM or be backed by files on disk.
- Persistence: Objects survive process termination if saved.
- Virtual Memory: The OS handles paging, allowing datasets larger than RAM.
- Interprocess Communication: Memory-mapped files allow multiple processes (e.g., a viewer and a solver) to share data with zero copy overhead.
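The zero-copy sharing described above can be sketched with the standard library alone: one mapping writes values into a file-backed buffer, and a second, independent mapping of the same file sees them without any serialization step. This is a minimal illustration, not PyCauset's API.

```python
# Minimal sketch of file-backed, zero-copy data sharing via mmap (stdlib only).
import mmap
import os
import struct
import tempfile

# Create a backing file large enough for four float64 values (4 * 8 bytes).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 32)

# "Writer" side: map the file and store values in place.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:
        struct.pack_into("<4d", m, 0, 1.0, 2.0, 3.0, 4.0)

# "Reader" side: an independent mapping of the same file sees the data.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:
        values = struct.unpack_from("<4d", m, 0)
```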
The PersistentObject Base Class
All matrix and vector classes inherit from PersistentObject.
- MemoryMapper: A member object that handles the low-level `mmap` (Windows/Linux) calls.
- Lifecycle:
  - Creation: Creates a temporary file in the system temp directory (or a specified path).
  - Access: Maps the file into the process's address space.
  - Destruction: Unmaps the view. If the object was temporary, the file is deleted.
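The creation/access/destruction lifecycle can be sketched as a small Python class. `SketchMapper` is a hypothetical stand-in for the C++ `MemoryMapper`, with illustrative names; it shows the three lifecycle stages, including deletion of temporary backing files.

```python
# Hedged sketch of a MemoryMapper-style lifecycle (hypothetical class, stdlib only).
import mmap
import os
import tempfile

class SketchMapper:
    """Illustrative stand-in for MemoryMapper; not the real implementation."""

    def __init__(self, size, path=None):
        self.temporary = path is None
        if self.temporary:
            # Creation: temporary file in the system temp directory.
            fd, self.path = tempfile.mkstemp(suffix=".tmp")
            os.close(fd)
        else:
            self.path = path
        with open(self.path, "wb") as f:
            f.truncate(size)  # allocate the backing bytes
        self._file = open(self.path, "r+b")
        # Access: map the file into this process's address space.
        self.view = mmap.mmap(self._file.fileno(), size)

    def close(self):
        self.view.close()         # Destruction: unmap the view
        self._file.close()
        if self.temporary:
            os.remove(self.path)  # temporary objects delete their backing file

mapper = SketchMapper(64)
mapper.view[:4] = b"data"
stored = bytes(mapper.view[:4])
backing_path = mapper.path
mapper.close()
cleaned_up = not os.path.exists(backing_path)
```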
Snapshot immutability (mutation policy)
PyCauset distinguishes between:
- persisted `.pycauset` snapshots on disk, and
- runtime working copies that may be mutated.
Policy (Release 1):
- Loading a `.pycauset` file yields a snapshot-backed view.
- Payload writes do not implicitly overwrite the snapshot file.
- Mutations transition to a copy-on-write working copy, so snapshot payload bytes are not overwritten by incidental edits.
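The copy-on-write transition can be illustrated with a toy view class (hypothetical, not the PyCauset implementation): reads go to the shared snapshot buffer until the first write, which clones it into a private working copy, so the snapshot bytes are never mutated in place.

```python
# Illustrative copy-on-write sketch; class and method names are invented.
class CowView:
    def __init__(self, snapshot):
        self._snapshot = snapshot
        self._working = None  # populated lazily on first write

    def read(self, i):
        buf = self._working if self._working is not None else self._snapshot
        return buf[i]

    def write(self, i, value):
        if self._working is None:
            # First mutation: clone the snapshot into a working copy.
            self._working = bytearray(self._snapshot)
        self._working[i] = value

snap = bytearray(b"\x01\x02\x03")
view = CowView(snap)
view.write(0, 9)  # mutates the working copy, not the snapshot
```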
Developer reference:
File Format (.pycauset)
.pycauset is a single-file binary container designed for mmap-friendly payload access and sparse, typed metadata.
Authoritative plans:
- `documentation/internals/plans/completed/R1_STORAGE_PLAN.md` (container format)
- `documentation/internals/plans/completed/R1_PROPERTIES_PLAN.md` (metadata semantics)
Container summary
At a high level, each .pycauset file contains:
- A fixed-size header region that selects the active header slot (A/B).
- A raw payload region at a stable, aligned offset (so it can be memory-mapped efficiently).
- A typed metadata block (sparse map) that can be appended/updated without shifting the payload.
This design keeps the "heavy" numeric data mmap-friendly while allowing metadata evolution without scans or full rewrites.
- Alignment: The payload is laid out exactly as it would be in memory (e.g., row-major order for dense matrices, bit-packed words for bit matrices).
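A byte-level sketch of the three-region layout, under stated assumptions: the header size, alignment, magic string, and field order below are illustrative, not the actual `.pycauset` wire format. It shows why the payload can be mmapped independently and why metadata can grow without shifting the payload.

```python
# Illustrative three-region container layout; all sizes/fields are assumptions.
import struct

HEADER_SIZE = 64       # assumed fixed-size header region
PAYLOAD_ALIGN = 4096   # assumed page alignment so the payload mmaps cleanly

payload = b"\x00" * 128               # raw numeric bytes (row-major, bit-packed, ...)
payload_offset = PAYLOAD_ALIGN        # first aligned offset past the header
metadata = b'{"properties": {}}'      # typed metadata serialized after the payload

# Header records where the payload and metadata live, so metadata can be
# appended/updated later without moving payload bytes.
header = struct.pack(
    "<8sQQQ",
    b"SKETCH\x00\x00",                # magic (illustrative)
    payload_offset,
    len(payload),
    payload_offset + len(payload),    # metadata offset: appended past the payload
)
container = header.ljust(payload_offset, b"\x00") + payload + metadata
```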
Typed metadata is the only schema
All metadata is stored as a typed top-level map (no parallel “legacy”/flattened metadata schema).
Important namespaces:
- `view`: view-state that affects interpretation/derived values (e.g., scalar/transpose/conjugation).
- `properties`: user-facing gospel assertions (semantic hints; not truth-validated).
- `cached`: cached-derived values (both small scalars and big-blob references).
Big-blob caches (generalized mechanism)
A big-blob cache is a cached-derived value that is itself large (often another matrix/vector) and therefore must be persisted as an independent .pycauset object.
Persistence model:
- The base snapshot stores a typed entry at `cached.<name>`.
- For big blobs, `cached.<name>.value` is a reference:
  - `ref_kind`: currently `sibling_object_store`
  - `object_id`: UUID hex
- The referenced object lives in a sibling object store directory: `BASE.pycauset.objects/<object_id>.pycauset`
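Resolving a big-blob reference to a file path follows directly from the layout above. The helper below is an illustrative sketch (the real helpers live in `persistence.py`); `posixpath` is used to keep the separator deterministic for the example.

```python
# Sketch of resolving a big-blob reference to its sibling object store path.
# Function name is hypothetical; see persistence.py for the real helpers.
import posixpath
import uuid

def sketch_object_store_path(base_path, object_id):
    # The object store directory sits next to the base snapshot:
    #   BASE.pycauset.objects/<object_id>.pycauset
    store_dir = base_path + ".objects"
    return posixpath.join(store_dir, object_id + ".pycauset")

object_id = uuid.uuid4().hex  # UUID hex, per the reference format above
blob_path = sketch_object_store_path("/data/causet.pycauset", object_id)
```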
Validity model:
- `cached.<name>.signature` must allow \(O(1)\) validation against the base object (no scans).
- Signatures use `payload_uuid` (a per-snapshot identifier that changes whenever payload bytes are persisted).
- If the reference is missing, unreadable, or stale, it is treated as a cache miss (ignored): a `PyCausetStorageWarning` is emitted, the load continues, and the cache is not implicitly recomputed.
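The validity check itself is a single comparison, which is what makes it \(O(1)\): the cached signature records the base snapshot's `payload_uuid` at compute time, and validation never touches payload bytes. A minimal sketch, with illustrative names:

```python
# O(1) cache validation via payload_uuid; field names are illustrative.
def cache_is_valid(cached_signature, base_payload_uuid):
    # One dictionary lookup and one equality check: no payload scan.
    return cached_signature.get("payload_uuid") == base_payload_uuid

base_uuid = "a" * 32

fresh = cache_is_valid({"payload_uuid": base_uuid}, base_uuid)
stale = cache_is_valid({"payload_uuid": "b" * 32}, base_uuid)  # payload re-persisted
missing = cache_is_valid({}, base_uuid)                        # treated as a miss
```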
Implementation reference (Python):
- `python/pycauset/_internal/persistence.py`
  - Object store layout helpers: `object_store_path_for_id`, `new_object_id`
  - Typed ref read/write: `try_get_cached_big_blob_ref`, `write_cached_big_blob_ref`
- `python/pycauset/_internal/big_blob_cache.py`
  - Reusable big-blob cache plumbing for operations: `compute_view_signature`, `try_load_cached_matrix`, `persist_cached_object`
Loading Process
1. Python opens the `.pycauset` container and reads the fixed header.
2. Python selects the active header slot (A/B), validates it, and reads the typed metadata block.
3. Python instantiates the appropriate C++ class (e.g., `_TriangularBitMatrix`), passing the filename and the payload offset.
4. C++ calls `mmap` on the file, applying the payload offset to map only the raw payload region.
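Step 4 can be demonstrated in Python: `mmap` accepts an `offset`, but it must be a multiple of the allocation granularity, which is one reason the container keeps the payload at an aligned offset. This sketch maps only the payload region of a throwaway file.

```python
# Sketch of mapping only the payload region by passing an offset to mmap.
import mmap
import os
import tempfile

offset = mmap.ALLOCATIONGRANULARITY   # payload offset must be granularity-aligned
payload = b"\x07" * 16

path = os.path.join(tempfile.mkdtemp(), "demo.pycauset")
with open(path, "wb") as f:
    f.write(b"\x00" * offset)         # stand-in for header + padding
    f.write(payload)

with open(path, "r+b") as f:
    # Map len(payload) bytes starting at the aligned payload offset.
    with mmap.mmap(f.fileno(), len(payload), offset=offset) as m:
        mapped = bytes(m[:])          # only the payload region is visible
```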
2. Matrix & Vector Class Hierarchy
The system is built on a hierarchy designed to separate storage management from mathematical operations.
```mermaid
classDiagram
    class PersistentObject {
        +shared_ptr~MemoryMapper~ mapper_
        +complex scalar_
        +bool is_transposed_
        +uint64_t rows_
        +uint64_t cols_
        +initialize_storage()
        +copy_storage()
        +ensure_unique()
        +clone()
    }
    class MatrixBase {
        <<abstract>>
        +uint64_t rows()
        +uint64_t cols()
        +uint64_t base_rows()
        +uint64_t base_cols()
        +uint64_t size()
        +get_element_as_double(i, j)*
        +multiply_scalar()
        +add_scalar()
        +transpose()
    }
    class VectorBase {
        <<abstract>>
        +uint64_t size()
        +get_element_as_double(i)*
        +transpose()
    }
    class DenseMatrix~T~ {
        +T* data()
        +read(i, j)
        +write(i, j, value)
    }
    class TriangularMatrix~T~ {
        +vector~uint64_t~ row_offsets_
        +read(i, j) T
        +write(i, j, value)
    }
    class DiagonalMatrix~T~ {
        +read(i, j) T
        +write(i, j, value)
        +get_diagonal(i)
    }
    class IdentityMatrix~T~ {
        +get_element_as_double(i, j)
        +add(IdentityMatrix)
    }
    class DenseVector~T~ {
        +T* data()
        +read(i)
        +write(i, value)
    }
    class UnitVector {
        +uint64_t active_index_
        +read(i)
    }
    PersistentObject <|-- MatrixBase
    PersistentObject <|-- VectorBase
    MatrixBase <|-- DenseMatrix
    MatrixBase <|-- TriangularMatrix
    MatrixBase <|-- DiagonalMatrix
    DiagonalMatrix <|-- IdentityMatrix
    VectorBase <|-- DenseVector
    VectorBase <|-- UnitVector
```
Core Concepts
Lazy Evaluation & Metadata
To maintain performance with large matrices, we avoid iterating over data whenever possible.
- Scalars: Multiplying a matrix by a scalar \(k\) does not multiply every element in memory. Instead, it updates `PersistentObject::scalar_`.
- Transposition: Transposing a matrix usually just toggles the `PersistentObject::is_transposed_` flag. This is a metadata view: the backing storage stays in row-major order.
Important: MatrixBase.rows() / MatrixBase.cols() are logical dimensions and account for transpose metadata. MatrixBase.base_rows() / MatrixBase.base_cols() are the backing storage dimensions.
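The logical-vs-backing distinction can be sketched in a few lines (class and attribute names are illustrative, mirroring the C++ members listed in the diagram): transposing flips a flag in O(1), and the logical dimensions swap while the backing dimensions stay put.

```python
# Sketch of logical vs. backing dimensions under a transpose flag (illustrative).
class SketchView:
    def __init__(self, base_rows, base_cols):
        self.base_rows_ = base_rows       # backing storage dimensions
        self.base_cols_ = base_cols
        self.is_transposed_ = False

    def transpose(self):
        # O(1): metadata only; no bytes are moved.
        self.is_transposed_ = not self.is_transposed_

    def rows(self):
        # Logical dimension: accounts for transpose metadata.
        return self.base_cols_ if self.is_transposed_ else self.base_rows_

    def cols(self):
        return self.base_rows_ if self.is_transposed_ else self.base_cols_

v = SketchView(3, 5)
v.transpose()
```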
Storage vs. View
The data on disk is the "canonical" storage. The C++ object is a "view" onto that data.
* Raw Data: mapper_->get_data() returns the raw bytes.
* View: The class (e.g., DenseMatrix) interprets those bytes (as int, double, etc.) and applies metadata (scalar, transpose).
For dense matrices, MatrixBase.size() is NumPy-aligned: it returns the total number of logical elements (rows * cols).
Indexing and slicing (backend invariants)
Release 1 implements NumPy-style 2D indexing for dense matrices only:
- Basic indexing (view): integer/`slice`/`...` with unit steps produces a view that shares the underlying mapper. Logical shape comes from the parsed slice; backing shape/offsets remain on the base. Transpose/conjugate metadata is preserved.
- Advanced indexing (copy): 1D integer arrays (negative indices wrap) and 1D boolean masks are supported per axis. Any use of arrays returns a copy; two array axes must broadcast (equal length or length 1).
- Assignments: the RHS may be a scalar, a NumPy 0/1/2-D array, or a dense matrix. NumPy 2D broadcast rules must hold; otherwise the setter raises. Casting RHS arrays triggers `PyCausetDTypeWarning`; narrowing or float→int casts also trigger `PyCausetOverflowRiskWarning`.
- Unsupported: `None`/`newaxis` and slicing of structured/triangular matrices.
- Kernel guardrail: Offsets in views are not yet honored by matmul/qr/lu/inverse kernels; calls with nonzero offsets throw and should be materialized via `copy()` first.
- Persistence policy (pending): Persisted sources must prefer view reuse; oversized slices on in-RAM sources should fail deterministically instead of implicitly spilling/snapshotting (not yet implemented).
Code touchpoints:
* Slice parsing and dispatch: src/bindings/bind_matrix.cpp (SliceInfo, parse_matrix_subscript, dense_getitem, dense_setitem).
* View metadata: MatrixBase (logical_rows/cols, row_offset/col_offset) and DenseMatrix view constructor/accessors.
* Kernel guards: DenseMatrix::multiply, inverse, qr, lu reject nonzero view offsets.
See also: pycauset.MatrixBase, Matrix Guide, NumPy Alignment Protocol
3. Type System and Dispatch
This section explains how PyCauset handles different data types (double, float, bool) and how operations are dispatched to the correct implementation.
Philosophy: Anti-Promotion
A core principle of PyCauset's type system is Anti-Promotion.
- Traditional Approach: Many libraries (like early NumPy or MATLAB) aggressively promote everything to `double` (Float64) to ensure precision.
- PyCauset Approach: We respect the user's choice of type. If a user provides `Float32` data, they likely want the performance benefits of `Float32`. We should not silently promote it to `Float64` unless absolutely necessary (e.g., when mixing types).
Rules:
1. Float32 op Float32 -> Float32
2. Float32 op Float64 -> Float64 (Promotion allowed for mixed types)
The Dispatcher
We avoid virtual function overhead for every single element access. Instead, we use a Templated Dispatcher pattern at the operation level (e.g., Matrix Multiplication, Addition).
CPU Dispatch
On the CPU, we use C++ templates to generate specialized code for each type.
```cpp
// Conceptual example
template <typename T>
void matmul_impl(const MatrixBase& A, const MatrixBase& B, MatrixBase& C) {
    // ... optimized code for type T ...
}

void dispatch_matmul(const MatrixBase& A, const MatrixBase& B, MatrixBase& C) {
    if (A.dtype() == DType::Float32 && B.dtype() == DType::Float32) {
        matmul_impl<float>(A, B, C);
    } else if (A.dtype() == DType::Float64) {
        matmul_impl<double>(A, B, C);
    }
    // ...
}
```
GPU Dispatch
On the GPU, CudaDevice::matmul inspects the DType of the operands and routes the call to the appropriate cuBLAS function.
- `DType::Float64` -> `cublasDgemm`
- `DType::Float32` -> `cublasSgemm`
- `DType::Float16` -> `cublasHgemm` (Tensor Cores)
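The routing above is a plain lookup; a minimal sketch (a Python table standing in for the C++ dispatch, not actual CUDA bindings) shows the shape of it, including the failure mode for an unsupported dtype:

```python
# Sketch of the dtype-to-cuBLAS routing table; not actual CUDA bindings.
CUBLAS_GEMM = {
    "float64": "cublasDgemm",
    "float32": "cublasSgemm",
    "float16": "cublasHgemm",  # Tensor Cores
}

def route_matmul(dtype):
    try:
        return CUBLAS_GEMM[dtype]
    except KeyError:
        raise TypeError("no GEMM kernel for dtype " + repr(dtype))

routine = route_matmul("float32")
```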
Memory Layout
To support this efficient dispatch, the backend works with raw, dtype-tagged memory.
- Host memory lives behind `MemoryMapper` (`mapper_->get_data()`), which provides access to the mapped bytes.
- Matrix/Vector views interpret those bytes as typed storage and apply metadata (scalar, transpose).
- Device memory (CUDA) uses dtype-specific buffers to avoid constant casting or reallocation.