Cooperative Compute Architecture
Status: Experimental (Phase 1 Implemented)
Last Updated: 2025-12-08
Overview
The Cooperative Compute Architecture (CCA) is designed to unify the CPU, GPU, and I/O subsystems of PyCauset. Instead of operating in isolation, these systems communicate via a Lookahead Protocol.
The Lookahead Protocol
The core concept is Intent-Based I/O. Solvers (CPU/GPU) declare their memory access patterns before starting computation. The I/O subsystem uses these hints to optimize data placement (prefetching, caching, pinning).
1. Memory Hints
Defined in `include/pycauset/core/MemoryHints.hpp`.
| Pattern | Description | I/O Action |
|---|---|---|
| `Sequential` | Reading 0..N | Prefetch contiguous pages. |
| `Strided` | Reading columns | Scatter-gather prefetch (Windows) or batched `madvise` (Linux). |
| `Random` | Graph traversal | (Future) Load hot pages based on index. |
| `Once` | Stream processing | Prefetch + Auto-Discard. |
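
To make the table concrete, here is a minimal sketch of what a hint type covering these four patterns could look like. The field names, the `AccessPattern` enum, and the `sequential`/`once` factories are assumptions made for illustration; only `MemoryHint` and `MemoryHint::strided(...)` are named in this document, and the real definition in `MemoryHints.hpp` may differ.

```cpp
#include <cstddef>

// Hypothetical layout; the real MemoryHint in MemoryHints.hpp may differ.
enum class AccessPattern { Sequential, Strided, Random, Once };

struct MemoryHint {
    AccessPattern pattern;
    std::size_t offset;  // byte offset of the first element touched
    std::size_t length;  // bytes per contiguous block
    std::size_t stride;  // bytes between block starts (Strided only)
    std::size_t count;   // number of blocks (1 unless Strided)

    // One contiguous run of `length` bytes, expected to be reused.
    static MemoryHint sequential(std::size_t offset, std::size_t length) {
        return MemoryHint{AccessPattern::Sequential, offset, length, 0, 1};
    }

    // `count` blocks of `block` bytes, each `stride` bytes apart.
    static MemoryHint strided(std::size_t offset, std::size_t block,
                              std::size_t stride, std::size_t count) {
        return MemoryHint{AccessPattern::Strided, offset, block, stride, count};
    }

    // One contiguous run that is read a single time and can then be discarded.
    static MemoryHint once(std::size_t offset, std::size_t length) {
        return MemoryHint{AccessPattern::Once, offset, length, 0, 1};
    }

    // Total bytes the solver expects to read under this hint.
    std::size_t bytes_touched() const { return length * count; }
};
```

For example, reading one column of a row-major matrix with 1000 rows of 8000 bytes each would be `MemoryHint::strided(0, 8, 8000, 1000)`: a thousand 8-byte blocks, one per row.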
2. Component Interaction
- Solver: Analyzes the operation (e.g., Matrix Multiplication).
- Solver: Calls `matrix->hint(MemoryHint::strided(...))`.
- PersistentObject: Forwards hint to `IOAccelerator`.
- IOAccelerator: Translates hint to OS-specific syscalls (`PrefetchVirtualMemory` / `madvise`).
- Solver: Executes computation (now with fewer page faults).
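
The hand-off described above can be sketched with simplified stand-ins for the three components. Everything here is illustrative: the class bodies, the `log` member, and `solver_read_column` are hypothetical, and the stand-in `IOAccelerator` records what it would do rather than issuing real syscalls.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All types below are simplified stand-ins, not the PyCauset classes.
enum class AccessPattern { Sequential, Strided };

struct MemoryHint {
    AccessPattern pattern;
    std::size_t offset, length, stride, count;
};

// Stand-in for IOAccelerator: records the action instead of calling the OS.
class IOAccelerator {
public:
    void process_hint(const MemoryHint& h) {
        log.push_back(h.pattern == AccessPattern::Strided
                          ? "prefetch " + std::to_string(h.count) + " strided blocks"
                          : "prefetch " + std::to_string(h.length) + " contiguous bytes");
    }
    std::vector<std::string> log;  // hypothetical, for inspection only
};

// Stand-in for PersistentObject: owns the mapping and forwards hints.
class PersistentObject {
public:
    explicit PersistentObject(IOAccelerator& io) : io_(io) {}
    void hint(const MemoryHint& h) { io_.process_hint(h); }

private:
    IOAccelerator& io_;
};

// Solver side: declare intent first, then compute.
inline void solver_read_column(PersistentObject& matrix,
                               std::size_t rows, std::size_t cols) {
    const std::size_t elem = sizeof(double);
    // One element per row, each a full row width apart.
    matrix.hint(MemoryHint{AccessPattern::Strided, 0, elem, cols * elem, rows});
    // ... the column walk itself would run here, ideally on prefetched pages ...
}
```

The point of the ordering is that the hint is issued before the first load, so the I/O layer has a chance to start paging data in while the solver sets up its kernel.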
Implementation Status
Phase 1: Core Definitions (Complete)
- `MemoryHint` struct defined.
- `PersistentObject::hint()` API added.
- `IOAccelerator::process_hint()` stub added (handles `Sequential`).
Phase 2: I/O Intelligence (Complete)
- Implemented `Strided` support in `IOAccelerator`.
- Added `prefetch_ranges_impl` for Windows (using `PrefetchVirtualMemory` with scatter-gather lists) and Linux (batched `madvise`).
- Verified with unit tests.
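
As a rough illustration of the Linux path, the sketch below expands a strided access into page-aligned ranges, merges ranges that touch, and then issues one `madvise(MADV_WILLNEED)` per merged range. `PageRange`, `strided_page_ranges`, and `prefetch_ranges_sketch` are hypothetical names, and this is not the actual `prefetch_ranges_impl`. It assumes a power-of-two page size, a non-zero stride, and a page-aligned mapping base.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#if defined(__linux__)
#include <sys/mman.h>
#endif

struct PageRange {
    std::size_t offset;  // page-aligned byte offset from the mapping base
    std::size_t length;  // multiple of the page size
};

// Expand a strided access into page-aligned ranges, merging neighbours that
// touch or overlap. Assumes page_size is a power of two and stride > 0.
inline std::vector<PageRange> strided_page_ranges(std::size_t offset, std::size_t block,
                                                  std::size_t stride, std::size_t count,
                                                  std::size_t page_size) {
    std::vector<PageRange> out;
    const std::size_t mask = ~(page_size - 1);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t start = offset + i * stride;
        const std::size_t begin = start & mask;                         // round down
        const std::size_t end = (start + block + page_size - 1) & mask; // round up
        if (!out.empty() && begin <= out.back().offset + out.back().length) {
            const std::size_t prev_end = out.back().offset + out.back().length;
            out.back().length = std::max(prev_end, end) - out.back().offset;
        } else {
            out.push_back({begin, end - begin});
        }
    }
    return out;
}

// Advise the kernel about each merged range; returns how many calls succeeded.
// On non-Linux builds this is a no-op that returns 0.
inline std::size_t prefetch_ranges_sketch(void* base, const std::vector<PageRange>& ranges) {
    std::size_t ok = 0;
#if defined(__linux__)
    for (const PageRange& r : ranges) {
        if (madvise(static_cast<char*>(base) + r.offset, r.length, MADV_WILLNEED) == 0) {
            ++ok;
        }
    }
#else
    (void)base;
    (void)ranges;
#endif
    return ok;
}
```

Merging matters when the stride is smaller than a page: eight 8-byte blocks spaced 1024 bytes apart collapse into a single two-page range, so the kernel sees one call instead of eight.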
Phase 3: Solver Integration (Complete)
- Updated `CpuSolver::matmul_impl` to emit hints.
  - Detects transposed matrices and emits `Strided` hints.
  - Emits `Sequential` hints for standard access.
- Verified compilation and linking.
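
The hint selection described above might look roughly like the following. `MatrixView`, its `transposed` flag, and `hint_for_operand` are hypothetical stand-ins; the sketch assumes row-major storage of `double` elements and does not reproduce the actual logic in `CpuSolver::matmul_impl`.

```cpp
#include <cstddef>

enum class AccessPattern { Sequential, Strided };

struct MemoryHint {
    AccessPattern pattern;
    std::size_t offset, length, stride, count;
};

// Hypothetical view of a row-major matrix of doubles.
// rows/cols describe the physical storage, not the logical (transposed) shape.
struct MatrixView {
    std::size_t rows, cols;
    bool transposed;  // true when the kernel reads storage column by column
};

// Pick the hint that matches how the kernel will walk the operand's storage.
inline MemoryHint hint_for_operand(const MatrixView& m) {
    const std::size_t elem = sizeof(double);
    if (m.transposed) {
        // A logical row is a physical column: one element per storage row,
        // each a full row width apart.
        return MemoryHint{AccessPattern::Strided, 0, elem, m.cols * elem, m.rows};
    }
    // Standard access streams through the whole buffer front to back.
    return MemoryHint{AccessPattern::Sequential, 0, m.rows * m.cols * elem, 0, 1};
}
```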
Phase 4: Pinned Memory (Complete)
- Implemented "Pinning Budget" in `MemoryGovernor`.
  - `try_pin_memory(size)`: Atomic check-and-reserve.
  - `unpin_memory(size)`: Release budget.
  - Default Limit: 20% of RAM or 4GB (whichever is smaller).
- Verified with `test_memory_governor.cpp`.
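
One common way to implement an atomic check-and-reserve is a compare-and-swap loop, sketched below. `PinBudget` is a hypothetical standalone class, not the actual `MemoryGovernor`; the method names follow this document, but the signatures, the `limit()`/`used()` accessors, and the reading of "4GB" as 4 GiB are assumptions.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the pinning budget inside MemoryGovernor.
class PinBudget {
public:
    // Limit is 20% of RAM or 4 GiB, whichever is smaller (4 GiB is assumed).
    explicit PinBudget(std::uint64_t total_ram_bytes)
        : limit_(std::min<std::uint64_t>(total_ram_bytes / 5, 4ull << 30)) {}

    // Reserve `size` bytes only if the budget allows it. The check and the
    // reservation happen as one atomic step, so concurrent callers cannot
    // both succeed past the limit.
    bool try_pin_memory(std::uint64_t size) {
        std::uint64_t used = used_.load(std::memory_order_relaxed);
        do {
            if (size > limit_ - used) return false;  // would exceed the budget
        } while (!used_.compare_exchange_weak(used, used + size,
                                              std::memory_order_acq_rel,
                                              std::memory_order_relaxed));
        return true;
    }

    // Return previously reserved bytes; the caller must pass the same size
    // it successfully reserved.
    void unpin_memory(std::uint64_t size) {
        used_.fetch_sub(size, std::memory_order_acq_rel);
    }

    std::uint64_t limit() const { return limit_; }
    std::uint64_t used() const { return used_.load(std::memory_order_relaxed); }

private:
    const std::uint64_t limit_;
    std::atomic<std::uint64_t> used_{0};
};
```

A plain load followed by a separate add would let two threads each see room for their request and overshoot together; the loop retries with the fresh value whenever another thread changed the counter in between.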