Numerical Stability#

Numerical stability is critical for reliable quantum simulation with MPS. This document explains sources of numerical error, condition number analysis, robust linear algebra techniques, mixed precision strategies, and best practices for maintaining accuracy throughout long simulations.

Overview #

Why Numerical Stability Matters #

MPS simulations involve:

Many operations: 1000+ gate applications in deep circuits
Ill-conditioned linear algebra: SVD of matrices with widely varying singular values
Error accumulation: Small errors compound over time
Truncation approximations: Discarding information must be controlled

Consequences of poor stability:

SVD convergence failures (simulation crash)
Negative probabilities in measurement
Norm drift (\(\|\psi\| \neq 1\))
Unphysical results (energy not conserved in TDVP)
Wrong ground state in VQE

Goals:

Maintain accuracy within user-specified tolerances
Detect numerical issues early
Apply corrective measures automatically when possible
Provide diagnostics for debugging

Floating-Point Arithmetic #

IEEE 754 Standard #

Modern GPUs and CPUs use IEEE 754 floating-point representation:

Complex64 (single precision complex):

Real/Imag parts: 32-bit float (1 sign + 8 exponent + 23 mantissa bits)
Machine epsilon: \(\epsilon \approx 1.2 \times 10^{-7}\)
Relative accuracy: ~7 decimal digits
Range: \(\pm 10^{\pm 38}\)

Complex128 (double precision complex):

Real/Imag parts: 64-bit float (1 sign + 11 exponent + 52 mantissa bits)
Machine epsilon: \(\epsilon \approx 2.2 \times 10^{-16}\)
Relative accuracy: ~16 decimal digits
Range: \(\pm 10^{\pm 308}\)

Rounding Errors #

Every floating-point operation introduces rounding error:

\[\text{fl}(a \odot b) = (a \odot b)(1 + \delta), \quad |\delta| \leq \epsilon_{\text{mach}}\]

where \(\odot\) is any arithmetic operation (+, -, ×, ÷).

Example: Sum of n numbers accumulates error \(O(n \epsilon)\):

import torch

# Accumulation of rounding errors
n = 10000
x = torch.ones(n, dtype=torch.float32) / n

# Exact sum should be 1.0
sum_forward = x.sum()
print(f"Forward sum: {sum_forward:.15f}")  # ~1.000000119

# Better: Kahan summation (compensated summation)
from atlas_q.numerics import kahan_sum
sum_compensated = kahan_sum(x)
print(f"Compensated sum: {sum_compensated:.15f}")  # Closer to 1.0

Loss of Significance #

Subtracting nearly equal numbers causes catastrophic cancellation:

\[a - b = (1.23456789 - 1.23456780) \times 10^{8} = 9.0 \times 10^{1}\]

Leading digits cancel, leaving only low-order (inaccurate) digits.

ATLAS-Q mitigation: Use numerically stable formulas. Example:

# Bad: Numerical cancellation when x ≈ 0
def bad_formula(x):
    return (torch.exp(x) - 1) / x

# Good: Use Taylor series for small |x|
def stable_formula(x):
    # exp(x) - 1 = x + x²/2 + x³/6 + ...
    # (exp(x) - 1) / x = 1 + x/2 + x²/6 + ...
    small_x = torch.abs(x) < 1e-4
    result = torch.empty_like(x)
    result[small_x] = 1.0 + x[small_x] / 2.0 + x[small_x]**2 / 6.0
    result[~small_x] = (torch.exp(x[~small_x]) - 1) / x[~small_x]
    return result

Sources of Numerical Error #

1. Floating-Point Arithmetic #

Magnitude: \(O(\epsilon_{\text{mach}})\) per operation

Accumulation: After \(n\) operations, error grows to \(O(n \epsilon_{\text{mach}})\)

Example: Gate application involves ~10 tensor contractions

Complex64: Error per gate ~ \(10^{-6}\)
Complex128: Error per gate ~ \(10^{-15}\)

For 1000 gates: - Complex64: Accumulated error ~ \(10^{-3}\) (may be significant) - Complex128: Accumulated error ~ \(10^{-12}\) (usually negligible)

2. Ill-Conditioned Matrices #

Definition: Matrix \(A\) is ill-conditioned if small perturbations cause large changes in solutions.

Condition number:

\[\kappa(A) = \|A\| \|A^{-1}\| = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}\]

Error amplification:

\[\frac{\|\Delta x\|}{\|x\|} \leq \kappa(A) \frac{\|\Delta b\|}{\|b\|}\]

where \(Ax = b\).

Example in MPS:

from atlas_q.adaptive_mps import AdaptiveMPS
import torch

mps = AdaptiveMPS(num_qubits=30, bond_dim=64, device='cuda')

# Apply many gates - bond tensors become ill-conditioned
for i in range(100):
    for j in range(29):
        mps.apply_cnot(j, j+1)

# Check condition number of a bond tensor
tensor = mps.tensors[15].reshape(-1, mps.tensors[15].shape[-1])
U, S, Vh = torch.linalg.svd(tensor, full_matrices=False)
cond = S.max() / S.min()
print(f"Condition number: {cond:.2e}")  # May be 10^6 or higher

if cond > 1e6:
    print("Warning: Ill-conditioned tensor - consider using complex128")

3. SVD Truncation Error #

Truncation discards small singular values:

\[\|\Theta - \tilde{\Theta}\|_F = \sqrt{\sum_{i=\chi+1}^{r} \sigma_i^2}\]

Local error (single truncation): Typically \(10^{-8}\) to \(10^{-12}\)

Global error (many truncations): Bounded by root-sum-square:

\[\epsilon_{\text{global}} \leq \sqrt{\sum_{t=1}^{N} \epsilon_t^2}\]

Monitoring in ATLAS-Q:

mps = AdaptiveMPS(
    num_qubits=40,
    bond_dim=64,
    truncation_threshold=1e-10,
    device='cuda'
)

# Apply gates
for i in range(100):
    mps.apply_h(i % 40)
    mps.apply_cnot(i % 39, (i % 39) + 1)

# Check accumulated error
global_error = mps.statistics.total_truncation_error
print(f"Global truncation error: {global_error:.2e}")

if global_error > 1e-5:
    print("Warning: Accumulated truncation error may affect results")

4. Canonical Form Drift #

MPS tensors should maintain orthonormality constraints (canonical form), but rounding errors cause drift:

\[\sum_{\alpha,s} A^{[i]}_{\alpha,s,\beta} (A^{[i]}_{\alpha,s,\beta'})^* \approx \delta_{\beta,\beta'}\]

Drift accumulation: After many operations without recanonicalizing, orthonormality is lost.

Consequences:

Incorrect overlap calculations
Norm drift (\(\langle\psi|\psi\rangle \neq 1\))
Unstable TDVP evolution

Mitigation:

# Recanonicalize periodically
for step in range(1000):
    mps.apply_cnot(step % 39, (step % 39) + 1)

    if step % 100 == 0:
        mps.canonicalize(center=20)  # Restore orthonormality
        mps.normalize()               # Ensure unit norm

Condition Numbers #

Mathematical Definition #

For a matrix \(A\), the condition number in the 2-norm is:

\[\kappa(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}\]

Interpretation:

\(\kappa(A) = 1\): Perfectly conditioned (e.g., unitary matrices)
\(\kappa(A) < 10^3\): Well-conditioned
\(10^3 < \kappa(A) < 10^6\): Moderately conditioned
\(\kappa(A) > 10^6\): Ill-conditioned (numerical issues likely)
\(\kappa(A) > 1/\epsilon_{\text{mach}}\): Effectively singular

Computing Condition Numbers #

import torch

def condition_number(tensor):
    """
    Compute condition number of 2D tensor.
    """
    U, S, Vh = torch.linalg.svd(tensor, full_matrices=False)
    return (S.max() / S.min()).item()

# Example: Check MPS tensor conditioning
from atlas_q.adaptive_mps import AdaptiveMPS

mps = AdaptiveMPS(num_qubits=30, bond_dim=64, device='cuda')

# Apply deep circuit
for layer in range(50):
    for i in range(29):
        mps.apply_cnot(i, i+1)

# Check conditioning at each site
for i, tensor in enumerate(mps.tensors):
    mat = tensor.reshape(-1, tensor.shape[-1])
    cond = condition_number(mat)
    if cond > 1e6:
        print(f"Site {i}: condition number = {cond:.2e} (ill-conditioned!)")

Impact on Linear Algebra #

SVD stability: SVD is numerically stable even for ill-conditioned matrices. Small singular values are computed accurately (relative to their magnitude).

Matrix inversion: Inverting ill-conditioned matrix amplifies errors by factor \(\kappa(A)\).

Linear solves: Solving \(Ax = b\) loses \(\log_{10}(\kappa(A))\) digits of accuracy.

Example:

# Ill-conditioned 4×4 Hilbert matrix
H = torch.tensor([
    [1.0, 1/2, 1/3, 1/4],
    [1/2, 1/3, 1/4, 1/5],
    [1/3, 1/4, 1/5, 1/6],
    [1/4, 1/5, 1/6, 1/7]
], dtype=torch.float64)

cond = condition_number(H)
print(f"Condition number: {cond:.2e}")  # ~15,514

# Solve Hx = b
b = torch.randn(4, dtype=torch.float64)
x = torch.linalg.solve(H, b)

# Check residual
residual = torch.norm(H @ x - b) / torch.norm(b)
print(f"Relative residual: {residual:.2e}")  # Should be small

# Perturb b slightly
b_perturbed = b + 1e-10 * torch.randn(4, dtype=torch.float64)
x_perturbed = torch.linalg.solve(H, b_perturbed)

# Large change in solution due to ill-conditioning
relative_change = torch.norm(x_perturbed - x) / torch.norm(x)
print(f"Relative change in x: {relative_change:.2e}")  # ~ κ(H) * 1e-10

Canonical Forms and Orthogonality #

Canonical forms maintain numerical stability by enforcing orthonormality constraints.

Left-Canonical Form #

Site tensors satisfy:

\[\sum_{\alpha,s} A^{[i]}_{\alpha,s,\beta} (A^{[i]}_{\alpha,s,\beta'})^* = \delta_{\beta,\beta'}\]

Interpretation: Right indices of \(A^{[i]}\) form orthonormal basis.

Construction via QR decomposition:

import torch

def left_canonicalize_site(tensor):
    """
    Left-canonicalize MPS tensor via QR.

    Input: tensor of shape (chi_left, d, chi_right)
    Output: Q (left-canonical), R (to absorb into next site)
    """
    chi_left, d, chi_right = tensor.shape
    mat = tensor.reshape(chi_left * d, chi_right)
    Q, R = torch.linalg.qr(mat)
    Q_tensor = Q.reshape(chi_left, d, -1)
    return Q_tensor, R

# Example: Left-canonicalize entire MPS
from atlas_q.adaptive_mps import AdaptiveMPS

mps = AdaptiveMPS(num_qubits=20, bond_dim=64, device='cuda')

# Apply some gates
for i in range(19):
    mps.apply_cnot(i, i+1)

# Left-canonicalize (sweep left-to-right)
mps.canonicalize(direction='left')

# Verify orthonormality
tensor = mps.tensors[10]
chi_left, d, chi_right = tensor.shape
mat = tensor.reshape(chi_left * d, chi_right)
should_be_identity = mat.T.conj() @ mat
error = torch.norm(should_be_identity - torch.eye(chi_right, device='cuda'))
print(f"Orthonormality error: {error:.2e}")  # Should be ~ 1e-7 (complex64)

Right-Canonical Form #

Site tensors satisfy:

\[\sum_{\beta,s} A^{[i]}_{\alpha,s,\beta} (A^{[i]}_{\alpha',s,\beta})^* = \delta_{\alpha,\alpha'}\]

Interpretation: Left indices of \(A^{[i]}\) form orthonormal basis.

Construction via RQ decomposition (QR on transpose).

Mixed-Canonical Form #

Combination of left and right canonical:

Sites \(i < c\): Left-canonical
Site \(c\): Center (no constraint)
Sites \(i > c\): Right-canonical

Benefits:

Efficient norm calculation: \(\|\psi\|^2 = \text{Tr}(A^{[c]\dagger} A^{[c]})\)
Efficient overlap: \(\langle\phi|\psi\rangle\) computed via single contraction at center
TDVP stability: Tangent space projection requires canonical form

ATLAS-Q canonicalization:

mps = AdaptiveMPS(num_qubits=30, bond_dim=64, device='cuda')

# Apply gates (loses canonicalization)
for i in range(200):
    mps.apply_cnot(i % 29, (i % 29) + 1)

# Recanonicalize with center at site 15
mps.canonicalize(center=15)

# Now sites 0-14 are left-canonical, 16-29 are right-canonical

Importance for TDVP #

TDVP requires canonical form for numerical stability:

from atlas_q.tdvp import TDVP

mps = AdaptiveMPS(num_qubits=20, bond_dim=64, device='cuda')
# mps starts in canonical form

# TDVP maintains canonicalization internally
tdvp = TDVP(
    mps=mps,
    hamiltonian=hamiltonian_mpo,
    dt=0.05,
    method='one_site',
    normalize=True,       # Renormalize each step
    check_canonical=True  # Verify canonical form (debug mode)
)

for step in range(100):
    energy = tdvp.evolve_step()

    if step % 10 == 0:
        # Check if canonicalization maintained
        orthogonality_error = mps.check_orthogonality()
        print(f"Step {step}: orthogonality error = {orthogonality_error:.2e}")

Truncation Error Analysis #

Local Truncation Error #

Single truncation discards singular values \(\sigma_{\chi+1}, \sigma_{\chi+2}, \ldots\):

\[\epsilon_{\text{local}} = \|\Theta - \tilde{\Theta}\|_F = \sqrt{\sum_{i=\chi+1}^{r} \sigma_i^2}\]

Optimal approximation (Eckart-Young-Mirsky theorem): SVD truncation minimizes Frobenius norm error.

Energy-based truncation:

\[\sum_{\alpha=1}^{\chi} \sigma_\alpha^2 \geq (1 - \epsilon^2) \sum_{\alpha=1}^{r} \sigma_\alpha^2\]

Retains singular values until target fraction of weight captured.

Global Error Bounds #

After \(N\) truncations with local errors \(\epsilon_1, \epsilon_2, \ldots, \epsilon_N\):

Best-case bound (errors cancel):

\[\epsilon_{\text{global}} \leq \sum_{i=1}^{N} \epsilon_i\]

Typical bound (root-sum-square):

\[\epsilon_{\text{global}} \leq \sqrt{\sum_{i=1}^{N} \epsilon_i^2}\]

Conservative bound: Depends on circuit structure and error accumulation pattern.

ATLAS-Q tracking:

from atlas_q.adaptive_mps import AdaptiveMPS

mps = AdaptiveMPS(
    num_qubits=40,
    bond_dim=64,
    truncation_threshold=1e-10,
    track_error=True,  # Enable error tracking
    device='cuda'
)

# Apply 500 gates
for i in range(500):
    mps.apply_h(i % 40)
    mps.apply_cnot(i % 39, (i % 39) + 1)

# Access error statistics
stats = mps.statistics
print(f"Total truncations: {stats.num_truncations}")
print(f"Max local error: {stats.max_truncation_error:.2e}")
print(f"Average local error: {stats.avg_truncation_error:.2e}")
print(f"Global error (RSS): {stats.total_truncation_error:.2e}")

# Per-bond error distribution
import matplotlib.pyplot as plt
plt.bar(range(len(stats.truncation_errors_per_bond)),
        stats.truncation_errors_per_bond)
plt.xlabel('Bond index')
plt.ylabel('Accumulated error')
plt.title('Truncation Error Distribution')
plt.show()

Error Mitigation Strategies #

1. Adaptive truncation threshold:

# Start with aggressive truncation, tighten if error grows
mps = AdaptiveMPS(
    num_qubits=50,
    bond_dim=64,
    truncation_threshold=1e-8,  # Initial threshold
    adaptive_mode=True,
    device='cuda'
)

max_global_error = 1e-5

for i in range(1000):
    mps.apply_cnot(i % 49, (i % 49) + 1)

    # Check global error periodically
    if i % 100 == 0:
        global_error = mps.statistics.total_truncation_error
        if global_error > max_global_error * 0.8:
            # Tighten threshold
            mps.truncation_threshold *= 0.1
            print(f"Tightened truncation to {mps.truncation_threshold:.2e}")

2. Increase bond dimension:

If global error exceeds tolerance despite tight truncation threshold, increase \(\chi\).

3. Periodic compression:

Apply truncation less frequently to reduce accumulated error:

# Truncate every 10 gates instead of every gate
mps = AdaptiveMPS(num_qubits=30, bond_dim=128, device='cuda')

for i in range(100):
    # Apply 10 gates without truncation
    for j in range(10):
        gate_index = i * 10 + j
        mps.apply_cnot(gate_index % 29, (gate_index % 29) + 1,
                      truncate=False)

    # Single truncation after 10 gates
    mps.compress()  # Global compression

Robust Linear Algebra #

Robust SVD #

ATLAS-Q implements robust SVD with multiple fallback strategies:

from atlas_q.linalg_robust import robust_svd
import torch

def robust_svd(tensor, threshold=1e-14):
    """
    Robust SVD with automatic fallbacks.

    Fallback sequence:
    1. torch.linalg.svd on GPU
    2. Add Tikhonov regularization if condition number high
    3. Fallback to CPU if GPU fails
    4. Try different LAPACK drivers (gesvd, gesvdj)
    """
    try:
        # Try standard GPU SVD
        U, S, Vh = torch.linalg.svd(tensor, full_matrices=False)
    except RuntimeError as e:
        # GPU SVD failed - check for ill-conditioning
        if "convergence" in str(e).lower():
            # Add Tikhonov regularization
            reg = threshold * torch.eye(tensor.shape[-1],
                                         device=tensor.device,
                                         dtype=tensor.dtype)
            regularized = tensor @ tensor.T.conj() + reg
            U, S_squared, _ = torch.linalg.svd(regularized)
            S = torch.sqrt(S_squared)
            Vh = (U.T.conj() @ tensor) / S.unsqueeze(1)
        else:
            # Fallback to CPU
            tensor_cpu = tensor.cpu()
            U, S, Vh = torch.linalg.svd(tensor_cpu, full_matrices=False)
            U = U.to(tensor.device)
            S = S.to(tensor.device)
            Vh = Vh.to(tensor.device)

    # Filter small singular values
    keep = S > threshold
    return U[:, keep], S[keep], Vh[keep, :]

Usage in ATLAS-Q:

from atlas_q.adaptive_mps import AdaptiveMPS

# robust_svd used automatically in adaptive truncation
mps = AdaptiveMPS(
    num_qubits=40,
    bond_dim=128,
    use_robust_linalg=True,  # Enable robust linear algebra
    device='cuda'
)

Robust QR Decomposition #

QR with column pivoting for rank-deficient matrices:

def robust_qr(tensor, threshold=1e-14):
    """
    QR decomposition with column pivoting for numerical stability.
    """
    Q, R, P = torch.linalg.qr(tensor, mode='reduced', pivot=True)

    # Truncate small diagonal elements of R
    diag_R = torch.diag(R)
    keep = torch.abs(diag_R) > threshold

    Q_truncated = Q[:, keep]
    R_truncated = R[keep, :]

    # Undo permutation
    P_inv = torch.argsort(P)
    R_truncated = R_truncated[:, P_inv]

    return Q_truncated, R_truncated

Eigendecomposition Stability #

For Hermitian matrices (e.g., density matrices, Hamiltonians):

def robust_eigh(tensor, threshold=1e-14):
    """
    Robust eigendecomposition for Hermitian matrices.
    """
    # Ensure Hermiticity (symmetrize)
    tensor_sym = (tensor + tensor.T.conj()) / 2

    # Eigendecomposition
    eigenvalues, eigenvectors = torch.linalg.eigh(tensor_sym)

    # Filter small/negative eigenvalues
    keep = eigenvalues > threshold
    return eigenvalues[keep], eigenvectors[:, keep]

# Example: Reduced density matrix
from atlas_q.adaptive_mps import AdaptiveMPS

mps = AdaptiveMPS(num_qubits=20, bond_dim=64, device='cuda')

# Reduced density matrix for sites 0-9
rho = mps.reduced_density_matrix(sites=range(10))

# Eigendecomposition
eigenvalues, eigenvectors = robust_eigh(rho)

# Check: eigenvalues should sum to 1, all non-negative
assert torch.abs(eigenvalues.sum() - 1.0) < 1e-6
assert torch.all(eigenvalues >= -1e-10)  # Allow small numerical negative

Mixed Precision Strategies #

Complex64 vs Complex128 #

Complex64 (default):

Advantages: 2× faster, 2× less memory
Sufficient for: Most quantum circuits, VQE with shallow ansätze, QAOA
Limitations: Accumulates error faster, less stable for ill-conditioned problems

Complex128:

Advantages: Higher precision, better stability for long simulations
Required for: Long-time TDVP, deep circuits (>1000 gates), high-precision chemistry
Cost: 2× slower, 2× more memory

Automatic Precision Promotion #

ATLAS-Q can automatically promote to higher precision when detecting instability:

from atlas_q.adaptive_mps import AdaptiveMPS, DTypePolicy
import torch

# Configure automatic promotion policy
policy = DTypePolicy(
    default=torch.complex64,
    promote_if_cond_gt=1e6,        # Promote if condition number > 10^6
    promote_if_error_gt=1e-5,      # Promote if global error > 10^-5
    promote_to=torch.complex128
)

mps = AdaptiveMPS(
    num_qubits=40,
    bond_dim=64,
    dtype_policy=policy,
    device='cuda'
)

# Simulation proceeds with complex64
for i in range(500):
    mps.apply_cnot(i % 39, (i % 39) + 1)

    # Automatic promotion if instability detected
    if mps.statistics.promoted_to_higher_precision:
        print(f"Promoted to complex128 at gate {i}")
        break

Manual Precision Control #

# Always use complex128 for maximum stability
mps = AdaptiveMPS(
    num_qubits=30,
    bond_dim=128,
    dtype=torch.complex128,
    device='cuda'
)

# Or convert existing MPS
mps = mps.to(dtype=torch.complex128)

Diagnostics and Monitoring #

Built-in Diagnostics #

from atlas_q.adaptive_mps import AdaptiveMPS
from atlas_q.diagnostics import MPSDiagnostics

mps = AdaptiveMPS(num_qubits=30, bond_dim=64, device='cuda')

# Apply circuit
for i in range(200):
    mps.apply_cnot(i % 29, (i % 29) + 1)

# Run comprehensive diagnostics
diag = MPSDiagnostics(mps)

# Check norm
norm = diag.check_norm()
print(f"MPS norm: {norm:.10f}")  # Should be close to 1.0
if abs(norm - 1.0) > 1e-6:
    print("Warning: Norm drift detected")

# Check orthogonality (if in canonical form)
orth_error = diag.check_orthogonality()
print(f"Orthogonality error: {orth_error:.2e}")

# Check for unphysical values
has_nan = diag.check_for_nan()
has_inf = diag.check_for_inf()
print(f"Contains NaN: {has_nan}, Contains Inf: {has_inf}")

# Condition numbers
max_cond = diag.max_condition_number()
print(f"Max condition number: {max_cond:.2e}")

# Entanglement entropy sanity check
max_entropy = diag.max_entanglement_entropy()
theoretical_max = torch.log(torch.tensor(2.0)) * 15  # For half-system
print(f"Max entropy: {max_entropy:.2f}, Theoretical max: {theoretical_max:.2f}")

Energy Conservation (TDVP)#

For Hamiltonian time evolution, energy should be conserved:

from atlas_q.tdvp import TDVP

tdvp = TDVP(mps=mps, hamiltonian=hamiltonian_mpo, dt=0.05, method='one_site')

energies = []
for step in range(500):
    E = tdvp.evolve_step()
    energies.append(E)

    if step % 50 == 0:
        energy_drift = abs(E - energies[0])
        print(f"Step {step}: E = {E:.8f}, drift = {energy_drift:.2e}")

# Check energy conservation
max_drift = max(abs(E - energies[0]) for E in energies)
if max_drift > 1e-6:
    print(f"Warning: Energy drift {max_drift:.2e} exceeds tolerance")
    print("Consider: reducing time step, using complex128, or 2-site TDVP")