atlas_q.linalg_robust#

Robust Linear Algebra Operations with Fallback Cascade

Provides GPU-first SVD with automatic fallback to CPU and jitter-based recovery for numerically unstable matrices.

Author: ATLAS-Q Contributors
Date: October 2025
License: MIT


Overview#

The linalg_robust module provides GPU-accelerated linear algebra operations with automatic fallback mechanisms for numerical stability. This is critical for reliable tensor network simulations, where ill-conditioned matrices can arise from accumulated roundoff error or from singular configurations.

Numerical Challenges in MPS

Matrix Product State operations require frequent Singular Value Decompositions (SVD) and QR factorizations. Challenges include:

  1. Ill-conditioned matrices: Condition numbers κ > 10⁶ cause numerical instabilities

  2. Near-zero singular values: Singular values below ~10⁻¹⁴ (approaching double-precision machine epsilon, ≈ 2.2 × 10⁻¹⁶) can lead to convergence failures

  3. GPU limitations: cuSOLVER occasionally fails on borderline cases that CPU LAPACK handles

  4. Accumulated roundoff: Deep circuits accumulate floating-point errors

Fallback Cascade Strategy

The robust operations implement a three-tier strategy:

\[\begin{split}\text{SVD}(A) = \begin{cases} \text{cuSOLVER}(A) & \text{if converges} \\ \text{cuSOLVER}(A + \epsilon \cdot I) & \text{if cuSOLVER fails, }\epsilon = 10^{-12} \\ \text{LAPACK}(A) & \text{if GPU fails (always succeeds)} \end{cases}\end{split}\]

The jitter regularization adds a tiny diagonal perturbation \(\epsilon I\) to improve conditioning without significantly altering the decomposition.
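
The cascade can be sketched in a few lines of PyTorch. The following is a minimal illustration of the strategy above, not the actual atlas_q implementation; the driver strings mirror the diagnostic values listed under Error Handling below.

import torch

def svd_with_fallback(X, jitter=1e-12):
    # Illustrative sketch of the cascade above; not the actual
    # atlas_q implementation.
    try:
        U, S, Vh = torch.linalg.svd(X, full_matrices=False)
        return U, S, Vh, 'torch_cuda'
    except RuntimeError:
        pass  # cuSOLVER failed to converge
    try:
        Xj = X.clone()
        # Perturb the main diagonal in place (works for rectangular X too)
        Xj.diagonal(dim1=-2, dim2=-1).add_(jitter)
        U, S, Vh = torch.linalg.svd(Xj, full_matrices=False)
        return U, S, Vh, 'torch_cuda_jitter'
    except RuntimeError:
        pass  # jittered retry also failed
    # CPU LAPACK path: slower, but always succeeds
    U, S, Vh = torch.linalg.svd(X.cpu(), full_matrices=False)
    return U.to(X.device), S.to(X.device), Vh.to(X.device), 'torch_cpu'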

Key Guarantees

  • Always succeeds: Falls back to CPU if GPU fails

  • Minimal overhead: ~99% of operations succeed on GPU

  • Diagnostic tracking: Returns which driver succeeded

  • Automatic integration: Used transparently by AdaptiveMPS

Functions#

robust_svd

Robust SVD with fallback cascade: CUDA → jitter → CPU

robust_qr

Robust QR decomposition with fallback

condition_number

Compute condition number from singular values

robust_svd#

atlas_q.linalg_robust.robust_svd(X)[source]#

Robust SVD with fallback cascade: CUDA → jitter → CPU

Args:

X: Input tensor to decompose

Returns:

U, S, Vh, driver_used

Strategy:

  1. Try torch.linalg.svd on GPU (cuSOLVER backend)

  2. If it fails, add a small jitter and retry on GPU

  3. If it still fails, fall back to CPU SVD

  4. Return which driver succeeded for diagnostics

Performs singular value decomposition with a three-stage fallback cascade:

  1. Direct CUDA SVD using cuSOLVER backend

  2. GPU SVD with small jitter added for numerical stability

  3. CPU fallback (always succeeds, slower)

Returns the decomposition along with which driver succeeded for diagnostic purposes.

robust_qr#

atlas_q.linalg_robust.robust_qr(X)[source]#

Robust QR decomposition with fallback

Args:

X: Input tensor to decompose

Returns:

Q, R, driver_used

Performs QR decomposition with fallback:

  1. Direct CUDA QR

  2. CPU fallback if GPU fails

Returns Q and R matrices along with driver information.
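
The same pattern as the SVD cascade, with two stages instead of three; the following is an illustrative sketch only, not the library's actual code.

import torch

def qr_with_fallback(X):
    # Illustrative two-stage sketch; not the actual atlas_q implementation.
    try:
        Q, R = torch.linalg.qr(X)
        return Q, R, 'torch_cuda'
    except RuntimeError:
        # CPU fallback: copy to host, factor, copy back
        Q, R = torch.linalg.qr(X.cpu())
        return Q.to(X.device), R.to(X.device), 'torch_cpu'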

condition_number#

atlas_q.linalg_robust.condition_number(S)[source]#

Compute condition number from singular values

Args:

S: Singular values (sorted descending)

Returns:

Condition number (σ_max / σ_min)

Computes the condition number \(\kappa = \sigma_{\text{max}} / \sigma_{\text{min}}\) from singular values. Large condition numbers (> 10⁶) indicate ill-conditioned matrices that may benefit from higher precision or regularization.
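
Given this definition, the computation reduces to a ratio of extreme singular values. A minimal sketch, assuming S is sorted in descending order (as torch.linalg.svd returns it); the condition_number_sketch name and the zero-division floor are illustrative, not part of the library API.

import torch

def condition_number_sketch(S, floor=1e-30):
    # kappa = sigma_max / sigma_min; the floor is a hypothetical guard
    # against division by an exactly-zero singular value.
    return (S[0] / S[-1].clamp_min(floor)).item()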

Examples#

Basic SVD with automatic fallback:

import torch
from atlas_q.linalg_robust import robust_svd

X = torch.randn(100, 50, dtype=torch.complex64, device='cuda')

U, S, Vh, driver = robust_svd(X)

print(f"SVD succeeded using: {driver}")
print(f"Singular values: {S[:5]}")

Checking condition number:

from atlas_q.linalg_robust import robust_svd, condition_number

X = torch.randn(50, 50, dtype=torch.complex64, device='cuda')
U, S, Vh, driver = robust_svd(X)

cond = condition_number(S)
print(f"Condition number: {cond:.2e}")

if cond > 1e6:
    print("Warning: Matrix is ill-conditioned")

QR decomposition:

from atlas_q.linalg_robust import robust_qr

X = torch.randn(100, 50, dtype=torch.complex64, device='cuda')
Q, R, driver = robust_qr(X)

print(f"QR succeeded using: {driver}")

# Verify orthogonality
I = torch.matmul(Q.conj().T, Q)
error = torch.norm(I - torch.eye(50, device='cuda'))
print(f"Orthogonality error: {error:.2e}")

Integration with AdaptiveMPS:

from atlas_q.adaptive_mps import AdaptiveMPS
import torch

# AdaptiveMPS automatically uses robust_svd internally
mps = AdaptiveMPS(num_qubits=20, bond_dim=16, device='cuda')

# Apply gates - robust SVD handles any numerical issues
H = torch.tensor([[1, 1], [1, -1]], dtype=torch.complex64) / torch.sqrt(torch.tensor(2.0))
H = H.to('cuda')

for q in range(20):
    mps.apply_single_qubit_gate(q, H)

# Check statistics to see fallback usage
stats = mps.stats_summary()
print(f"GPU SVD usage: {stats['cuda_svd_pct']:.1f}%")
print(f"CPU fallback usage: {stats['cpu_fallback_pct']:.1f}%")

Error Handling#

The robust linear algebra routines never raise exceptions for numerical failures. Instead, they automatically fall back through strategies until one succeeds. This ensures simulations continue even with challenging numerical conditions.

Driver Return Values:

  • 'torch_cuda' - Successful GPU SVD/QR

  • 'torch_cuda_jitter' - GPU SVD with jitter regularization

  • 'torch_cpu' - CPU fallback was required
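
A common pattern is to treat any value other than 'torch_cuda' as worth surfacing:

import torch
from atlas_q.linalg_robust import robust_svd

X = torch.randn(64, 64, dtype=torch.complex64, device='cuda')
U, S, Vh, driver = robust_svd(X)
if driver != 'torch_cuda':
    print(f"SVD took a fallback path: {driver}")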

Handling ill-conditioned matrices:

import torch
from atlas_q.linalg_robust import robust_svd, condition_number

# Create an ill-conditioned matrix
A = torch.randn(50, 50, dtype=torch.complex64, device='cuda')
A[:, 0] = A[:, 1] * 1e-10  # Make column 0 a tiny multiple of column 1 (near-linearly-dependent columns)

U, S, Vh, driver = robust_svd(A)

cond = condition_number(S)
print(f"Condition number: {cond:.2e}")
print(f"Driver used: {driver}")

# Even with κ ~ 10¹⁰, robust_svd succeeds
# Likely falls back to jitter or CPU

Performance Considerations#

Fallback Frequency

In typical MPS simulations:

  • torch_cuda: 98-99% of SVDs (fast GPU path)

  • torch_cuda_jitter: 0.5-1% (negligible overhead)

  • torch_cpu: 0.1-0.5% (2-10× slower but rare)

Timing Comparison (50×50 complex64 matrix)

SVD Performance#

Method              Time (µs)   Relative   Success Rate
cuSOLVER (direct)   45          1.0×       98%
cuSOLVER + jitter   48          1.07×      99.9%
CPU LAPACK          320         7.1×       100%

The overhead of fallback logic is negligible (< 1µs).

Memory Usage

  • GPU path: Workspace O(mn) on GPU

  • CPU fallback: Temporary copy to CPU (8mn bytes for complex64; see the worked example after this list)

  • Jitter: No additional memory (in-place modification)
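
As a concrete instance of the 8mn-byte estimate above, the CPU path for a 4096×4096 complex64 matrix temporarily needs 128 MiB of host memory:

m, n = 4096, 4096
bytes_copied = 8 * m * n                   # 8 bytes per complex64 element
print(f"{bytes_copied / 2**20:.0f} MiB")   # 128 MiB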

When Fallbacks Occur

Common scenarios triggering fallbacks:

  1. Deep circuits (> 100 layers): Accumulated roundoff errors

  2. Small singular values: σ_min < 10⁻¹²

  3. Highly entangled states: κ > 10⁸

  4. Malformed gates: Non-unitary or near-singular

Optimization Strategies

  1. Use complex128 for deep circuits: Reduces roundoff accumulation

    mps = AdaptiveMPS(num_qubits=50, bond_dim=32, device='cuda', dtype=torch.complex128)
    
  2. Monitor condition numbers: Alert on κ > 10⁶

    stats = mps.stats_summary()
    if stats['max_condition_number'] > 1e6:
        print("Warning: Ill-conditioned matrices detected")
    
  3. Periodic recanonicalization: Reset numerical errors

    if step % 1000 == 0:
        mps.canonicalize(chi_max=64)  # Full SVD sweep
    
  4. Increase truncation threshold: eps_bond=1e-8 → 1e-6 for stability (sketched below)
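
For item 4, the kwarg notation suggests eps_bond is an AdaptiveMPS constructor parameter; assuming so (unverified here), the change would look like:

from atlas_q.adaptive_mps import AdaptiveMPS

# Assumes eps_bond is accepted by the AdaptiveMPS constructor,
# as the kwarg notation in item 4 suggests.
mps = AdaptiveMPS(num_qubits=50, bond_dim=32, device='cuda', eps_bond=1e-6)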

Best Practices#

Development/Debugging

  • Enable verbose logging to track fallback frequency (or tally drivers by hand, as sketched after this list)

  • Plot condition numbers vs. circuit depth

  • Test with both float32 and float64 to identify numerical issues
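
If verbose logging is not available in your build, fallback frequency can be tallied by hand. A minimal sketch, where matrices stands in for whatever tensors your workload produces:

from collections import Counter
from atlas_q.linalg_robust import robust_svd

drivers = Counter()
for X in matrices:  # `matrices`: your own iterable of test tensors
    *_, driver = robust_svd(X)
    drivers[driver] += 1

total = sum(drivers.values())
for name, count in drivers.most_common():
    print(f"{name}: {100 * count / total:.1f}%")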

Production

  • Monitor CPU fallback rate (should be < 1%)

  • Alert if rate > 5% (indicates systematic numerical problems; see the snippet after this list)

  • Use mixed precision: complex64 for most operations, complex128 for final result
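
Using the stats_summary() fields shown earlier, the alert rule can be encoded directly (mps is an AdaptiveMPS instance, as in the examples above):

stats = mps.stats_summary()
if stats['cpu_fallback_pct'] > 5.0:
    # Threshold from the guidance above: > 5% indicates systematic problems
    print("Warning: CPU fallback rate above 5%; check conditioning/precision")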

Benchmarking

Compare GPU-only vs. robust implementations:

import time
import torch
from atlas_q.linalg_robust import robust_svd

A = torch.randn(100, 100, dtype=torch.complex64, device='cuda')

# GPU-only (may fail on ill-conditioned inputs)
torch.cuda.synchronize()  # drain pending GPU work before timing
start = time.time()
U, S, Vh = torch.linalg.svd(A)
torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
gpu_time = time.time() - start

# Robust (always succeeds)
torch.cuda.synchronize()
start = time.time()
U, S, Vh, driver = robust_svd(A)
torch.cuda.synchronize()
robust_time = time.time() - start

print(f"GPU: {gpu_time*1e6:.1f} µs")
print(f"Robust: {robust_time*1e6:.1f} µs ({robust_time/gpu_time:.2f}× overhead)")
print(f"Driver: {driver}")

Typical overhead: < 5% for normal matrices.

Limitations#

Complex128 Performance

  • ~2× slower than complex64 on GPU

  • Worth it for circuits > 50 layers or κ > 10⁸

CPU Fallback Latency

  • Can cause 10× slowdown if frequent (> 10%)

  • Consider cuQuantum backend for better GPU stability

Jitter Side Effects

  • Jitter ε = 10⁻¹² alters singular values by O(ε) (checked empirically after this list)

  • Negligible for typical thresholds (> 10⁻¹⁰)

  • May affect very high-precision requirements
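
The O(ε) bound follows from Weyl's inequality for singular values, |σᵢ(A + E) − σᵢ(A)| ≤ ‖E‖₂, with ‖εI‖₂ = ε, and is easy to check empirically:

import torch

A = torch.randn(50, 50, dtype=torch.complex128)
eps = 1e-12

Aj = A.clone()
Aj.diagonal().add_(eps)  # the jitter perturbation A + eps*I

S, Sj = torch.linalg.svdvals(A), torch.linalg.svdvals(Aj)
# Weyl's inequality bounds the shift by eps = 1e-12
print(f"max singular-value shift: {(S - Sj).abs().max().item():.2e}")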

Use Cases#

Critical for:

  • Deep quantum circuits (> 50 layers)

  • High-entanglement states (χ > 64)

  • Long-running simulations (hours)

  • Production systems requiring reliability

Optional for:

  • Shallow circuits (< 20 layers)

  • Low entanglement (χ < 32)

  • Exploratory development
