Skip to content

pycauset.matmul

pycauset.matmul(a, b)

Perform matrix multiplication.

For native objects, this routes through the standard compute boundary (AutoSolver / device routing) and dispatches to optimized C++ implementations.

For block matrices (constructed via pycauset.matrix(block_grid) where every element is matrix-like), pycauset.matmul(a, b) preserves “once block, always block” by returning a thunked block-matrix result with partition refinement (union of shared boundaries via SubmatrixView tiling).

This function is NumPy-like:

  • matrix-matrix: (m, k) @ (k, n) -> (m, n)
  • matrix-vector: (m, k) @ (k,) -> (m,)
  • vector-matrix: (k,) @ (k, n) -> (1, n) (row-vector semantics)
  • vector-vector: (k,) @ (k,) -> scalar (dot)

Parameters

  • a (MatrixBase or VectorBase or BlockMatrix): Left operand.
  • b (MatrixBase or VectorBase or BlockMatrix): Right operand.

Shape rule: a.cols() == b.rows().

Returns

  • MatrixBase or VectorBase or scalar: The result. The specific type and shape depend on input ranks.

When either operand is a BlockMatrix, the result is a BlockMatrix (typically holding lazy ThunkBlock output blocks).

Evaluation (block matrices)

Block-matrix results are semi-lazy:

  • Triggers (evaluate minimal required blocks): element access (e.g. C[i, j]), conversion to NumPy (np.asarray(C)), persistence (pycauset.save(C, ...)), and crossing the compute boundary.
  • Non-triggers: repr(C), str(C), and partition metadata access.

Determinism:

  • Fixed k order for each output block’s accumulation.
  • Accumulator dtype is chosen from operand dtype metadata only (folding add-result dtype across terms) before evaluation.
  • Leaf matmul runs through the public compute boundary (AutoSolver) per block; complex blocks route to CPU on CUDA builds.

See Block Matrices for details.

Examples

import pycauset as pc

# Dense matmul
A = pc.matrix(((1.0, 2.0), (3.0, 4.0)))
B = pc.matrix(((5.0,), (6.0,)))
C = pc.matmul(A, B)
assert C.shape == (2, 1)

# Block matmul (returns a BlockMatrix)
blk = pc.matrix(((1.0, 0.0), (0.0, 1.0)))
BM = pc.matrix(((blk, blk), (blk, blk)))
out = pc.matmul(BM, BM)

# Accessing an element triggers evaluation of the needed output block
_ = out[0, 0]

See also