DType / Complex / Overflow Plan (Implementation)
Status (2025-12-16): This implementation plan is complete. Phase 1 complete; Phase 2 complete (int8/int16/int32/int64 + uint8/uint16/uint32/uint64 + float16 end-to-end); Phase 3 complete (complex floats are now first-class end-to-end on CPU for the current core-op surface); Phase 4 complete (support matrix declared + enforced in tests/tools). Optional backlog items are still listed below.
Scope update (2025-12)
This plan originally sketched “complex permutations for all base dtypes” (including complex int* and complex bit).
Current project direction: complex support is limited to complex floats only (complex_float16, complex_float32, complex_float64). Complex permutations of non-float dtypes (complex int* / complex bit) are a non-goal by design due to high implementation surface area (promotion/overflow/kernels/persistence/tests) with low practical payoff for PyCauset’s workloads.
As a result:
- Phase 2 includes first-class float16 as a general dtype, plus the full signed/unsigned integer width set.
- Phase 3 (complex) should be interpreted as “complex float integration”, with complex_float16 implemented after float16 readiness.
Phase completion status
- Phase 0 — Documentation & policy grounding: Complete
- Phase 1 — Centralize promotion + overflow policies: Complete
- Phase 2 — Scalar system expansion: Complete (int8/int16/int32/int64, uint8/uint16/uint32/uint64, float16 end-to-end through factories/promotion/CPU dispatch/persistence/bindings/NumPy for the core op surface)
- Phase 3 — Complex system integration: Complete (complex_float16/32/64 are first-class dtypes through factories/promotion/CPU dispatch/persistence/bindings/NumPy for core ops)
- Phase 4 — Coverage enforcement: Complete (support matrix declared + enforced in tests/tools)
This file is an implementation plan. The authoritative dtype behavior documentation lives in:
documentation/internals/DType System.md
0) Problem statement
PyCauset supports several fundamentally different scalar/storage types (bit-packed bit, integers, floats) plus a partially-separate complex system. Adding a new operation currently requires touching multiple layers and remembering many dtype-specific corner cases:
- type/promotion rules are split between global helpers and per-op frontends,
- CPU kernels often dispatch on “result dtype” and omit some types,
- complex numbers are currently not a first-class `MatrixBase` dtype and therefore drift from the main dispatch/type-resolution path,
- missing coverage is easy to ship because there is no single enforceable "support matrix".
This document proposes a new, centralized dtype architecture that:
- makes complex floats first-class in the scalar type system,
- adds multiple integer widths (signed/unsigned),
- defines explicit promotion + overflow policies,
- keeps the “anti-promotion / smallest type” ethos,
- keeps performance and out-of-core constraints as first-class concerns.
1) Key constraints (from project philosophy + recent decisions)
- Scale-first: matrices may be 100GB+; memory blowups are unacceptable.
- Underpromotion default: when PyCauset underpromotes, it means compute and result storage both use the smallest selected dtype.
- No silent widening for accuracy: no hidden “compute in float64 then downcast” in the default path.
- Bit matrices are numeric for arithmetic ops: treat `bit` values as 0/1 numeric values for arithmetic ops (e.g., `+`, `*`, `dot`, `matmul`). Bitwise ops are explicit and must preserve bit-packed storage.
- Overflow behavior: integer overflow is a runtime error. PyCauset does not auto-promote to avoid overflow.
- Overflow warning: for large integer matmul, run a worst-case bound preflight and emit a warning when overflow looks plausible.
- Complex floats are first-class: complex support is limited to float base dtypes. `complex_float32`/`complex_float64` are BLAS-backed where applicable (native complex types `complex64`/`complex128`). `complex_float16` is implemented as a first-class dtype using a two-plane float16 storage model.
- Complex non-floats are a non-goal: `complex int*`/`complex bit` are intentionally unsupported to avoid a large promotion/overflow/kernel/persistence surface area with low payoff.
- Fundamental-kind rule (bit/int/float): PyCauset never "promotes down" across fundamental kinds. If an operation mixes kinds, the result kind is the higher kind required by the operation's semantics.
2) Terminology
- Scalar type: the per-element numeric type (bit/int/float plus width and flags).
- Matrix structure: dense/triangular/symmetric/etc. (storage layout and indexing constraints).
- Operation (op): add/subtract/elementwise multiply/matmul/inverse/eigvals/etc.
- Promotion policy: rules for selecting result dtypes for mixed-input ops.
- Overflow policy: what happens when integer arithmetic overflows.
3) Proposed scalar type model (flags/permutations)
Represent scalar types as:
- `kind`: `bit` | `int` | `float`
- `width_bits`: for int/float (8/16/32/64), and 1 for bit
- `flags`: a small set of orthogonal modifiers
  - `complex` (supported for float scalar types only)
  - `unsigned` (valid only for `int`)
Examples:
- `bit` = (bit, 1, {})
- `int16` = (int, 16, {})
- `uint16` = (int, 16, {unsigned})
- `float16` = (float, 16, {})
- `complex float16` = (float, 16, {complex})
- `float32` = (float, 32, {})
- `complex float32` (complex64) = (float, 32, {complex})
- `float64` = (float, 64, {})
- `complex float64` (complex128) = (float, 64, {complex})
Supported scalar set (initial target)
- bit
- int8/int16/int32/int64
- uint8/uint16/uint32/uint64
- float16/float32/float64
- complex_float16/complex_float32/complex_float64
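The (kind, width_bits, flags) model above can be sketched as a small descriptor. This is an illustrative Python sketch, not the project's actual C++ API; the name `ScalarType` and the validation rules simply mirror the constraints stated in this section (complex only for floats, unsigned only for ints).

```python
from dataclasses import dataclass, field

VALID_FLAGS = {"complex", "unsigned"}

@dataclass(frozen=True)
class ScalarType:
    kind: str            # "bit" | "int" | "float"
    width_bits: int      # 8/16/32/64 for int/float, 1 for bit
    flags: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        assert self.kind in ("bit", "int", "float")
        assert self.flags <= VALID_FLAGS
        # Plan decision: complex is supported for float base dtypes only.
        if "complex" in self.flags:
            assert self.kind == "float"
        # unsigned is valid only for int.
        if "unsigned" in self.flags:
            assert self.kind == "int"

BIT = ScalarType("bit", 1)
UINT16 = ScalarType("int", 16, frozenset({"unsigned"}))
COMPLEX_FLOAT16 = ScalarType("float", 16, frozenset({"complex"}))
```

Making the descriptor frozen/hashable is deliberate: it lets the promotion resolver use scalar types directly as table keys.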
4) Complex implementation strategy
4.1 Complex floats (performance path)
- Implement `complex_float32` (complex64) and `complex_float64` (complex128) as true complex numeric types.
- Prefer BLAS-backed complex GEMM where applicable.
4.2 Complex float16 (two-plane storage path)
- Represent `complex_float16` as two float16 planes (real + imag).
- Motivation: there is no ubiquitous, efficient "native complex half" representation across the stack, and forcing complex-half into complex-float32 would violate the "smallest type" ethos.
- Persistence must round-trip as a single complex dtype (one logical object, two payload planes).
4.3 Explicit non-goals
- Complex permutations of non-float dtypes (`complex int*`, `complex bit`) are intentionally out of scope.
- If/when we ever revisit this, it must be driven by concrete workloads and come with a scoped support matrix (ops × dtype) rather than a blanket "closure" rule.
This plan does not assume automatic widening in integer matmul. Under the current policy:
- integer overflow throws, and
- the system does not silently widen storage to avoid overflow.
If we ever decide that a particular op’s semantic result dtype must be wider (e.g., a count-producing op), that must be a named, explicit promotion rule and must be documented as semantics, not an overflow workaround.
5) Promotion policy (centralized, op-specific)
Create a single authoritative table/function:
- `resolve_result_scalar(op, a_scalar, b_scalar) -> scalar`
- `resolve_result_structure(op, a_structure, b_structure) -> structure`
Design principles:
- Default to the smallest dtype that can represent the result per op semantics.
- Mixed float precision underpromotes by default (compute+store in the smaller float), with a configurable option to promote instead.
- Complex is a flag: complex-ness is preserved unless an op is explicitly defined to drop it.
- Unsigned is preserved where meaningful; if an op can generate negatives, rules must define whether to promote to signed or throw.
5.1 Fundamental kinds (bit / int / float) and “no promote down”
PyCauset distinguishes three fundamental kinds:
- `bit` (bit-packed boolean storage; special rules allowed)
- `int` (signed/unsigned integers)
- `float` (float16/float32/float64)
Rules:
1) No promote down across kinds. If kinds differ, the result kind cannot be the “lower” kind.
2) When a float participates, the result kind is float. Example: matmul(bit, float64) -> float64.
3) When only integers/bits participate, the result kind is integer unless the op is explicitly bitwise.
4) Underpromotion applies within a kind, not across kinds. Example: matmul(float32, float64) -> float32 by default.
This strikes a balance:
- it preserves the “smallest type” ethos where it is meaningful (within float precision),
- it avoids absurd outcomes like underpromoting a float computation to bit storage,
- it keeps `bit` special (bitwise ops remain bitwise; numeric ops may change kind).
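Rules 1–4 above can be condensed into a few lines of logic. This is a hedged sketch (function name and tuple encoding are illustrative); the bit-special numeric widening rules from 5.2 are deliberately omitted here.

```python
KIND_ORDER = {"bit": 0, "int": 1, "float": 2}

def resolve_result_kind(op, a, b):
    """a, b = (kind, width_bits) pairs. Returns the result (kind, width_bits)."""
    (ka, wa), (kb, wb) = a, b
    # Rules 1-3: never promote down across kinds; the higher kind wins.
    kind = max(ka, kb, key=KIND_ORDER.get)
    # Rule 4: underpromotion applies *within* a kind -- pick the smallest
    # width among operands that already have the result kind (a lower-kind
    # operand never drags the width down across kinds).
    widths = [w for k, w in (a, b) if k == kind]
    return (kind, min(widths))

assert resolve_result_kind("matmul", ("bit", 1), ("float", 64)) == ("float", 64)
assert resolve_result_kind("matmul", ("float", 32), ("float", 64)) == ("float", 32)
```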
5.2 Bit is special (scale-first exceptions)
Bit matrices/vectors are used to represent large binary structures (e.g., spacetime relations) where the storage is often 10s–100s of GB.
As a result:
- Bitwise ops (e.g., NOT/AND/OR/XOR) should preserve `bit` and stay bit-packed.
- Numeric ops that inherently create non-binary results (e.g., `bit + bit`, `matmul(bit, bit)` producing integer counts) may require widening to `int` or `float`.
For such numeric ops, widening can be prohibitively expensive. Therefore, for bit we allow explicit, op-specific behavior:
- supported with a documented widening result kind, or
- error-by-design unless the user explicitly requests a widened dtype.
The support matrix must record which choice is made for each op.
Config hooks:
- `promotion_policy.float_mixed`: `underpromote_warn` (default) | `promote` | `underpromote_no_warn`
Warning controls (exact API TBD, but must exist):
- `warning_policy.float_underpromotion`: on by default when `promotion_policy.float_mixed=underpromote_warn`
- `warning_policy.int_reduction_acc_widen`: on by default; emitted when `dot`/`matmul` widens the accumulator dtype
- `warning_policy.int_overflow_risk_preflight`: on by default for "large" integer matmul; emitted when conservative bounds indicate plausible overflow in the requested output dtype
6) Overflow policy
6.1 Runtime behavior
- Overflow is a hard error.
- PyCauset does not auto-promote storage to avoid overflow.
6.1.1 Why this focuses on integer overflow (and not float overflow)
Floating-point overflow is real (e.g., float32 can overflow to +inf), but it behaves differently:
- IEEE-754 overflow typically becomes `inf` (and may raise a floating-point flag), which then propagates.
- This is often detectable after the fact (e.g., `isfinite` checks), whereas integer overflow in C++ can be undefined behavior or silent wrap depending on the implementation.
Policy-wise:
- For integers: overflow must throw (no silent wrap).
- For floats: overflow results in `inf`/`nan` according to IEEE-754; optional "finite-check" validation can exist as a debug/strict mode, but it is not the default because scanning 100GB+ outputs is expensive.
6.2 Preflight warning for large integer matmul
For integer matmul (and potentially some other high-risk ops), run a cheap preflight to estimate overflow risk:
1) sample blocks/rows to estimate max_abs(A) and max_abs(B) (including scalar metadata factors if they apply)
2) compute a conservative bound: \(\text{bound} = K \cdot \max|A| \cdot \max|B|\), where \(K\) is the inner dimension (for square matmul, \(K = N\)).
If the bound approaches/exceeds the target dtype max value, emit a warning:
PyCausetWarning: matmul(<lhs_dtype>, <rhs_dtype>) may overflow <out_dtype> (conservative bound). Consider requesting a wider output dtype or scaling.
Notes:
- This is a heuristic. It should warn on risk; it does not guarantee overflow will happen.
- It avoids inner-loop overflow checks in the performance-critical kernel.
Documentation requirement:
- Add an “Overflow” section/doc describing the policy, the preflight warning, and user mitigations.
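A minimal sketch of the preflight heuristic, assuming `max_abs_a`/`max_abs_b` come from the sampling step above. Function and dtype-limit names are illustrative, not the actual PyCauset API.

```python
import warnings

INT_MAX = {"int16": 2**15 - 1, "int32": 2**31 - 1, "int64": 2**63 - 1}

def preflight_matmul_overflow(lhs_dtype, rhs_dtype, out_dtype,
                              k, max_abs_a, max_abs_b):
    """Cheap, conservative overflow-risk check; warns, never throws."""
    bound = k * max_abs_a * max_abs_b   # conservative: K * max|A| * max|B|
    if bound >= INT_MAX[out_dtype]:
        warnings.warn(
            f"matmul({lhs_dtype}, {rhs_dtype}) may overflow {out_dtype} "
            f"(conservative bound {bound}). Consider requesting a wider "
            f"output dtype or scaling.")
    return bound
```

Note the asymmetry matching the policy: this check runs once per call on sampled summaries, so the hot kernel loop carries no per-element overflow checks.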
6.3 Reduction-aware accumulator width (dot/matmul) + required warning
Some integer reductions (especially dot/matmul) can overflow the accumulator even when inputs are representable and the requested output dtype is unchanged.
To keep integer math defined and to uphold “overflow throws” without requiring expensive per-multiply-add overflow checks inside the hot loop, PyCauset uses a reduction-aware accumulator width for integer reductions.
Key clarifications (scale-first):
- This rule is about the accumulator dtype (compute registers / local scratch), not about materializing inputs.
- In particular, `bit` inputs stay bit-packed; `matmul(bit, int16)` does not expand the `bit` matrix to `int32` elements.
- This rule does not silently widen the result storage dtype. If the user requests `int16` output, the result is stored as `int16` and overflow remains a hard error (typically detected at the final cast from the wider accumulator).
6.3.1 Accumulator-width selection (deterministic / conservative)
For matmul/dot over integer kinds (including bit treated as numeric 0/1), choose an accumulator dtype wide enough that the worst-case bound for the reduction fits.
For C = A @ B with inner dimension K:
- Use a conservative magnitude bound based on dtype limits (no sampling required):
  - For `bit`, \(\max|A| = 1\).
  - For integer dtypes, \(\max|A|\) and \(\max|B|\) may be taken as the maximum representable magnitude for their dtypes (e.g., 32767 for int16). This is conservative and ensures accumulator selection is correctness-preserving without needing an extra pass over out-of-core data.
This is intentionally conservative: it is designed to be computed cheaply and to be correct without relying on probabilistic assumptions.
Optionally (future optimization): when it is cheap relative to the matmul itself and does not force an extra out-of-core pass, tighten the bound using exact streaming summaries such as row popcounts for bit and per-column max-abs for the integer operand.
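The selection rule above amounts to "smallest signed accumulator whose max covers the worst-case reduction bound". A hedged sketch (dtype tables and the function name are illustrative):

```python
DTYPE_MAX_ABS = {"bit": 1, "int8": 127, "int16": 32767, "int32": 2**31 - 1}
SIGNED_ACCS = [("int16", 2**15 - 1), ("int32", 2**31 - 1), ("int64", 2**63 - 1)]

def choose_accumulator(lhs_dtype, rhs_dtype, k):
    """Deterministic: pick the smallest signed accumulator such that
    K * max|A| * max|B| (dtype-limit bound, no data pass) cannot overflow."""
    bound = k * DTYPE_MAX_ABS[lhs_dtype] * DTYPE_MAX_ABS[rhs_dtype]
    for name, limit in SIGNED_ACCS:
        if bound <= limit:
            return name
    raise OverflowError("no integer accumulator wide enough; overflow throws")

# matmul(bit, int16) with K=50: bound = 50 * 1 * 32767 = 1_638_350 -> int32
assert choose_accumulator("bit", "int16", 50) == "int32"
```

Because the bound uses only dtype limits and K, the choice is reproducible across runs and devices, which keeps the accumulator-widen warning deterministic too.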
6.3.2 User-visible warning (required)
Whenever the chosen accumulator dtype is wider than what a reader would naively expect from the inputs (e.g., matmul(bit, int16) accumulating into int32), PyCauset must emit a warning so users understand what is happening.
The warning must include:
- operation name (e.g., `matmul`/`dot`)
- lhs dtype and rhs dtype
- chosen accumulator dtype
- output storage dtype (explicitly stating whether it changed or not)
- reason (reduction-aware widening to keep integer overflow defined)
Suggested warning text (exact wording not required, but content is):
PyCausetWarning: matmul(bit, int16) will accumulate in int32 (reduction-aware integer width). Output dtype remains int16; overflow still throws on cast. Bit input remains bit-packed (no materialization).
Noise control:
- Warn once per call site (or once per unique `(op, lhs_dtype, rhs_dtype, out_dtype, acc_dtype)` tuple) to avoid spam.
- Provide a user-facing way to silence/route warnings (Python `warnings.warn(...)` category, and/or a context flag).
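One way to get both requirements (warn-once per unique tuple, plus user-routable suppression) with the standard `warnings` machinery. The category name `PyCausetWarning` follows the warning text earlier in this doc; the dedup-set approach is an illustrative sketch, not the shipped implementation.

```python
import warnings

class PyCausetWarning(UserWarning):
    pass

_seen = set()

def warn_acc_widen(op, lhs, rhs, out, acc):
    key = (op, lhs, rhs, out, acc)       # warn once per unique tuple
    if key in _seen:
        return
    _seen.add(key)
    warnings.warn(
        f"{op}({lhs}, {rhs}) will accumulate in {acc} (reduction-aware "
        f"integer width). Output dtype remains {out}; overflow still "
        f"throws on cast.", PyCausetWarning, stacklevel=2)
```

Users then silence or route it with the standard machinery, e.g. `warnings.filterwarnings("ignore", category=PyCausetWarning)`.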
7) Enforceable op coverage (“support matrix”)
Introduce an explicit coverage matrix that enumerates for each operation:
- required scalar families (bit/int/float + complex)
- supported widths
- supported structures (dense/triangular/symmetric/etc.)
- required behaviors (defined, error-by-design, or unimplemented)
Goal:
- When a new op is added, missing dtype coverage becomes a failing test/tool run, not a surprise at runtime.
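A toy version of what "executable support matrix" means in practice: declarations are data, and a checker (run as a CI test) fails when a declared-supported combination lacks an implementation. Op/dtype names are illustrative; the real matrix would also track structures and devices.

```python
# Declared behavior per (op, dtype): "defined" | "error-by-design" | "unimplemented"
SUPPORT = {
    ("matmul", "float32"): "defined",
    ("matmul", "complex_float16"): "defined",
    ("matmul", "complex int32"): "error-by-design",
}

# What the codebase actually implements (would be discovered/registered).
IMPLEMENTED = {("matmul", "float32"), ("matmul", "complex_float16")}

def check_support_matrix():
    """Fails (like a CI test) if declared support lacks an implementation."""
    missing = [key for key, status in SUPPORT.items()
               if status == "defined" and key not in IMPLEMENTED]
    if missing:
        raise AssertionError(f"declared but unimplemented: {missing}")

check_support_matrix()  # passes for this toy declaration
```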
8) Implementation sequence (phased)
Phase 0 — Documentation & policy grounding (Complete)
- Update project philosophy to explicitly define underpromotion and overflow behavior.
- Add roadmap entry for multi-int widths + unsigned.
- Add this plan doc.
Phase 1 — Centralize promotion + overflow policies (Complete)
- Single promotion resolver per op.
- Central overflow policy + preflight warning for integer matmul.
- Reduction-aware accumulator width for integer `dot`/`matmul` + required user warning when accumulator widens.
- Add mandatory tests for resolver correctness, warning emission, and reduction accumulator selection (see "Mandatory tests").
Phase 2 — Scalar system expansion (Complete)
- Add integer widths + unsigned.
- Ensure constructors, IO, numpy interop, and basic ops exist.
Phase 3 — Complex system integration (Complete)
- Core complex-float dtype integration is implemented (CPU + persistence + Python/NumPy for key ops).
- See “Phase 3 — Complex system integration (Detailed)” in Section 8.1.
Phase 4 — Coverage enforcement (Complete)
- Support matrix exists and is executed by unit tests and a dev checker tool, so declared support can’t silently regress.
8.1) Phase 3 — Complex system integration (Detailed)
Objective: Make complex float dtypes first-class and integrate them into the same end-to-end pipeline as real dtypes (frontend allocation → promotion resolver → CPU/GPU dispatch → persistence → Python).
User-facing requirement: complex float dtypes must behave like normal dtypes on the frontend. For example, pc.complex_float16 (or equivalent public token) must be a valid dtype= argument to Matrix/Vector factories.
Scope for Phase 3: expand complex support to float base dtypes only:
- `float16` → `complex_float16` (two float16 planes)
- `float32` → `complex_float32` (a.k.a. `complex64`)
- `float64` → `complex_float64` (a.k.a. `complex128`)
Out of scope: complex permutations of non-float dtypes (complex int*, complex bit).
3.x Phase 3 status update (2025-12-16)
Completed in the current codebase:
- First-class complex float dtypes exist end-to-end: `complex_float16/32/64`.
- Storage:
  - `complex_float32/64`: dense storage uses native complex element types.
  - `complex_float16`: two-plane float16 storage (real+imag) for both matrices and vectors.
- Dispatch/promotion:
  - promotion resolver supports complex results for matmul/add/sub/elementwise, plus dot/matvec/vecmat/outer.
  - CPU solver contains complex implementations for dot/matvec/vecmat/outer and vector elementwise/scalar ops.
- Python/NumPy/persistence:
  - dtype tokens + factory inference + `np.array(...)` interop + container persistence round-trip.
  - `dot` returns Python `complex` when either operand is complex.
Optional backlog (not required for plan completion):
- Ensure solver/eigensystem outputs use first-class complex dtypes end-to-end (no parallel complex object model).
- BLAS/cBLAS complex GEMM path for dense complex matmul on CPU (and GPU complex where applicable).
- Expand complex coverage across additional operations beyond the current core set.
3.0 Replace legacy ComplexMatrix / ComplexVector (compat layer)
Current state (updated 2025-12-16):
- First-class complex float matrices/vectors now exist as `MatrixBase`/`VectorBase` dtypes (`complex_float16/32/64`).
- The legacy `ComplexMatrix`/`ComplexVector` concept may still exist in some solver/eigensystem return paths. That legacy path is now considered technical debt (it drifts from the first-class dtype pipeline).
Plan (still valid):
- Ensure any remaining solver/eigensystem paths route through first-class complex dtype matrices/vectors.
- Long-term goal: complex is a normal `MatrixBase`/`VectorBase` dtype, so `LinearAlgebra` and `ComputeDevice` don't need a parallel complex universe.
Frontend contract note:
- Provide explicit dtype tokens for complex floats (at minimum: `complex_float16`, `complex_float32`, `complex_float64`).
- These tokens must normalize through the same dtype normalization funnel as real dtypes and participate in the same factory code paths.
3.1 Make “complex” first-class in the scalar type model
Requirement: represent scalar types as (kind, width_bits, flags) where flags includes at least {complex, unsigned}.
Implementation direction:
- Introduce a `ScalarType` descriptor (or equivalent) that can represent:
  - base dtype (`float16`/`float32`/`float64`)
  - flags (`complex`)
- Plumb this through the type-resolution path so promotion is defined as:
  `resolve_result_scalar(op, a_scalar, b_scalar) -> scalar`
Design constraint (to match the frontend requirement):
- Even though complex can be represented as `(base_dtype + complex flag)`, it must be treated as a distinct dtype identity for:
  - promotion resolution,
- dispatch selection,
- persistence metadata,
- and the support-matrix enforcement (coverage must be tracked per complex permutation).
Back-compat note:
- The existing `DataType` enum can remain as a legacy base-type id during migration, but Phase 3 must ensure complex-ness is not "out-of-band" anymore.
3.2 Storage strategy for complex (by base kind)
We intentionally use two different representations depending on the float width, to balance performance and scale-first storage efficiency.
3.2.1 Complex floats (performance path)
- `complex_float32` (complex64) and `complex_float64` (complex128) are true complex numeric types.
- Implement dense complex storage as contiguous `std::complex<float>`/`std::complex<double>` (or ABI-compatible equivalent).
- Route matmul to BLAS complex GEMM where possible.
- GPU: use cuBLAS complex GEMM when available.
3.2.2 Complex float16 (two-plane storage path)
- Represent `complex_float16` as two float16 planes of equal shape:
  - real plane: float16
  - imag plane: float16
- Motivation: avoid forcing half-precision complex values into float32 complex storage, and avoid depending on a non-portable "native complex half" ABI.
Important clarification:
- “Two-plane storage” is an implementation detail. The object is still a single complex-typed matrix/vector from the API perspective, and it must round-trip via persistence as a complex dtype (not as two unrelated real objects).
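Two-plane arithmetic is just the textbook four-real-products decomposition. A NumPy sketch of a two-plane `complex_float16` elementwise multiply, cross-checked against a native complex computation in float32 (layout and function name are illustrative):

```python
import numpy as np

def twoplane_mul(ar, ai, br, bi):
    """(ar + i*ai) * (br + i*bi) on separate float16 planes via 4 real products."""
    rr = (ar * br - ai * bi).astype(np.float16)   # real plane of the product
    ii = (ar * bi + ai * br).astype(np.float16)   # imag plane of the product
    return rr, ii

rng = np.random.default_rng(0)
ar, ai, br, bi = (rng.standard_normal((4, 4)).astype(np.float16) for _ in range(4))
rr, ii = twoplane_mul(ar, ai, br, bi)

# Reference: native complex arithmetic in float32 precision.
ref = (ar.astype(np.float32) + 1j * ai.astype(np.float32)) \
    * (br.astype(np.float32) + 1j * bi.astype(np.float32))
assert np.allclose(rr, ref.real, atol=5e-2)
assert np.allclose(ii, ref.imag, atol=5e-2)
```

This is also the shape of the validation test required in 3.4: plane-wise results must agree with NumPy complex results to within float16 tolerances.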
3.3 First-class complex matrices/vectors in the core object model
Hard requirement: complex objects must participate in factories, persistence, and dispatch the same way other dtypes do.
Minimum deliverables:
- A `MatrixBase`-derived complex matrix implementation for:
  - `complex_float32`/`complex_float64` (dense)
  - `complex_float16` (two-plane storage)
- A `VectorBase`-derived complex vector implementation (same split).
Interface hazards to address explicitly (to avoid “biting us later”):
- Many existing code paths use `get_element_as_double(...)`. For complex dtypes, this must never silently drop the imaginary part.
- Either implement `get_element_as_double` as a hard error for complex matrices, or ensure it is only used behind a "real-only" guard.
- Complex-aware paths must use `get_element_as_complex(...)`.
- `ComputeDevice::multiply_scalar` currently takes `double`; Phase 3 must define the complex-scalar story:
  - either add complex-scalar device entry points, or
  - restrict complex-scalar multiply to frontend methods that dispatch to complex kernels.
3.4 Operation coverage policy for complex
Phase 3 does not require “every op supports every complex dtype” on day one, but it must make coverage enforceable:
- For each op in the canonical LinearAlgebra surface (at least `LinearAlgebra.hpp`):
  - declare complex propagation rules (preserve complex, drop complex, or error-by-design)
  - declare result dtype selection rules (including for `bit` special cases)
- Ensure the resolver has explicit rows for complex permutations.
Coverage principle (mathematical independence):
- Complex permutations must be treated as separate coverage targets even when they reuse plane-wise kernels.
- “Works because it decomposes into two real ops” is not a substitute for tests: each complex dtype/op combination must be explicitly tested (or explicitly error-by-design with a stable error).
Specific expectations:
- `complex_float32`/`complex_float64`: add/sub/elementwise/matmul must work on CPU.
  - GPU support is optional, but routing must be correct (fallback to CPU when unsupported).
- `complex_float16`: add/sub/elementwise/matmul must work on CPU.
  - if implemented via two-plane arithmetic, correctness must be validated vs NumPy complex computations.
3.5 Persistence format for complex
Current implementation note (updated 2025-12-16):
- `complex_float16` uses a two-plane in-memory layout (real + imag), but is persisted as a single contiguous raw payload containing both planes back-to-back.
- Typed metadata records the dtype identity (`complex_float16`) and the normal shape/layout fields; there is no need for multi-member payloads to round-trip correctly.
Future option (not required for correctness):
- Multi-member payloads could still be introduced later for tooling/inspection convenience, but would be an on-disk format enhancement rather than a correctness requirement.
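The "single contiguous payload, two planes back-to-back" layout can be illustrated with NumPy. Metadata field names here are assumptions for the sketch; the point is that one logical object (dtype identity + shape) round-trips to one payload.

```python
import numpy as np

def pack_complex_float16(real, imag):
    """Serialize a two-plane complex_float16 object as one payload."""
    meta = {"dtype": "complex_float16", "shape": real.shape}
    payload = np.concatenate([real.ravel(), imag.ravel()]) \
                .astype(np.float16).tobytes()        # real plane, then imag plane
    return meta, payload

def unpack_complex_float16(meta, payload):
    flat = np.frombuffer(payload, dtype=np.float16)
    n = flat.size // 2                               # planes are equal-sized
    shape = meta["shape"]
    return flat[:n].reshape(shape), flat[n:].reshape(shape)

real = np.arange(6, dtype=np.float16).reshape(2, 3)
imag = -real
meta, payload = pack_complex_float16(real, imag)
r2, i2 = unpack_complex_float16(meta, payload)
assert np.array_equal(r2, real) and np.array_equal(i2, imag)
```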
3.6 GPU/CPU selection policy
Match project intent:
- Default behavior: benchmark/poll hardware once, then pick the fastest device.
- If GPU does not support a dtype/op/structure, fall back to CPU.
- Avoid exploding “one kernel per infinitesimal device” by using:
- a small set of coarse regimes (dtype/shape thresholds)
- a micro-benchmark-derived speedup factor
3.7 Tests (keep the explosion under control)
The only way this stays maintainable is if we separate:
- pure-logic resolver tests (exhaustive across dtype permutations), from
- kernel correctness tests (representative shapes), from
- error-by-design tests (stable error messages).
Phase 3 must add a minimal “complex smoke matrix” for the LinearAlgebra surface:
- `complex_float64`: add/sub/elementwise/matmul correctness vs NumPy
- `complex_float32`: same, smaller shapes + tolerances
- `complex_float16`: add/sub/elementwise/matmul correctness vs NumPy (two-plane storage) + persistence round-trip
9) Mandatory tests
These tests are required. They exist to prevent dtype coverage drift and to catch correctness/performance regressions early.
9.1 Pure-logic dtype resolution tests
Add unit tests (no kernels) that exercise the resolver tables/functions. At minimum:
- Fundamental kind rule: never promote down across `bit -> int -> float`.
- Float underpromotion: e.g., `matmul(float32, float64) -> float32` by default.
- Complex flag behavior: for each op, verify complex propagation/behavior is explicit (preserve/drop/error-by-design) and covered.
- Unsigned flag behavior: verify signed/unsigned mixing rules are explicit and tested.
- Error-by-design paths: verify they error with stable, specific messages.
These tests should be table-driven and exhaustive across the supported dtype set for each resolver entry.
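The table-driven shape looks like this. The `resolve` function below is a toy stand-in (underpromote float width, preserve the complex flag) so the example is self-contained; the real tests would import the actual resolver and enumerate every supported dtype pair per op.

```python
FLOAT_WIDTH = {"float16": 16, "float32": 32, "float64": 64}

def resolve(op, a, b):
    """Toy float/complex-float resolver: underpromote width, preserve complex."""
    def split(dtype):
        return (dtype.startswith("complex_"), dtype.replace("complex_", ""))
    (ca, base_a), (cb, base_b) = split(a), split(b)
    base = base_a if FLOAT_WIDTH[base_a] <= FLOAT_WIDTH[base_b] else base_b
    return ("complex_" if (ca or cb) else "") + base

CASES = [
    # (op, lhs, rhs, expected result dtype)
    ("matmul", "float32", "float64", "float32"),               # underpromotion
    ("add", "complex_float64", "float64", "complex_float64"),  # preserve complex
    ("add", "complex_float16", "complex_float32", "complex_float16"),
]
for op, lhs, rhs, expected in CASES:
    assert resolve(op, lhs, rhs) == expected, (op, lhs, rhs)
```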
9.2 Kernel/integration correctness tests
Add tests that validate numeric correctness and overflow behavior for representative ops and shapes:
- `dot`/`matmul` integer correctness across widths.
- Overflow throws deterministically (no silent wrap).
For reduction-aware accumulator widening specifically, add at least one test where:
- `matmul(bit, int16)` (or `dot(bit, int16)`) produces a value that would overflow an `int16` accumulator but fits in `int32` output.
- The test asserts:
- correct numeric result,
- accumulator-widen warning is emitted and mentions: op name, lhs/rhs dtypes, accumulator dtype, and output dtype.
9.3 Warning tests (user-facing behavior)
Add Python-level tests (and C++ tests where applicable) that validate warnings are:
- emitted when required,
- de-duplicated (warn-once policy),
- informative (message includes the dtypes involved and what is happening),
- suppressible/routable via a user-facing control.
Warnings to cover:
- float underpromotion warning (if enabled)
- integer overflow-risk preflight warning (heuristic)
- integer reduction accumulator-widen warning (deterministic)
9.4 Scale-first regression tests (bit materialization guard)
Add a regression test that guards the key scale-first property for bit operands:
- `bit` inputs must remain bit-packed during `dot`/`matmul` (no full materialization to an int/float element buffer).
Implementation note (testability): this may require a test-only hook (e.g., allocation tracer, “materialized_bit_elements” counter, or a debug trace flag) so the test can assert that no allocation proportional to A.numel() * sizeof(int32) occurred.
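One shape such a test-only hook could take: a debug counter that the kernel bumps whenever it expands bit elements, wrapped in an asserting context manager. All names here (`MATERIALIZED_BIT_ELEMENTS`, `assert_no_bit_materialization`) are hypothetical sketches of the idea, not existing PyCauset APIs.

```python
import contextlib

# Debug counter a bit kernel would increment on any bit-element expansion.
MATERIALIZED_BIT_ELEMENTS = 0

@contextlib.contextmanager
def assert_no_bit_materialization():
    before = MATERIALIZED_BIT_ELEMENTS
    yield
    assert MATERIALIZED_BIT_ELEMENTS == before, "bit operand was materialized"

def fake_bit_dot(bits, other):
    # A well-behaved kernel reads packed words directly and never touches
    # MATERIALIZED_BIT_ELEMENTS; this stand-in models that behavior.
    return sum(b * x for b, x in zip(bits, other))

with assert_no_bit_materialization():
    fake_bit_dot([1, 0, 1], [5, 7, 9])   # passes: no materialization counted
```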
9.5 Support-matrix completeness test
The support matrix must be executable as a test/tool:
- It must fail CI if an op claims support for a dtype/structure/device combination that lacks an implementation or test coverage.
10) Acceptance criteria
- Adding a new operation requires changing:
  - the op implementation,
  - one promotion rule table,
  - one coverage declaration,
  - tests.
  It must not require "hunt across the codebase".
- Complex dtypes are supported for float base dtypes only (`complex_float16/32/64`).
- Overflow behavior is consistent:
- overflow throws,
- large integer matmul emits a risk warning when appropriate,
- no auto-promotion to avoid overflow.
11) Open questions (to confirm before implementation)
- Exact list of supported ops for “core coverage” in the support matrix (minimal set to enforce first).
- Whether unsigned + signed mixing rules should default to promoting to signed or throwing in ops that can go negative.
- Default behavior for numeric ops on `bit` when the semantic result is not representable in `bit` without widening: default widen vs error-by-design unless the caller explicitly requests an output dtype.