atlas_q.cuquantum_backend#
cuQuantum Backend Integration
Optional NVIDIA cuQuantum acceleration for MPS operations. Provides 2-10× speedup on compatible NVIDIA GPUs.
Features:
- cuTensorNet for tensor contractions and SVD
- cuStateVec for state-vector operations
- Automatic fallback to PyTorch if cuQuantum is unavailable
- Version compatibility handling
Author: ATLAS-Q Contributors
Date: October 2025
- class atlas_q.cuquantum_backend.CuQuantumConfig(use_cutensornet=True, use_custatevec=True, workspace_size=1073741824, algorithm='auto', device='cuda')[source]#
Bases: object
Configuration for cuQuantum backend.
- class atlas_q.cuquantum_backend.CuQuantumBackend(config=None)[source]#
Bases: object
Optional cuQuantum backend for accelerated tensor operations.
Automatically falls back to PyTorch if cuQuantum is not available.
Methods
contract(tensors, indices[, optimize]) - Tensor contraction with optional cuQuantum acceleration.
svd(tensor[, chi_max, cutoff]) - Compute SVD with optional cuQuantum acceleration.
- __init__(config=None)[source]#
Initialize cuQuantum backend.
- Args:
config: Configuration options (uses defaults if None)
- svd(tensor, chi_max=None, cutoff=1e-14)[source]#
Compute SVD with optional cuQuantum acceleration.
- Args:
tensor: Input tensor (2D after reshaping)
chi_max: Maximum bond dimension (truncation)
cutoff: Singular value cutoff threshold
- Returns:
U, S, Vdagger tensors
- class atlas_q.cuquantum_backend.CuStateVecBackend(config=None)[source]#
Bases: object
Optional cuStateVec backend for state-vector operations.
Provides accelerated gate application and measurements.
Methods
apply_gate(state, gate, qubits) - Apply quantum gate to state vector.
- atlas_q.cuquantum_backend.get_backend(config=None)[source]#
Get global cuQuantum backend instance.
- Args:
config: Optional configuration (uses default if None)
- Returns:
CuQuantumBackend instance
- atlas_q.cuquantum_backend.get_statevec_backend(config=None)[source]#
Get global cuStateVec backend instance.
- Args:
config: Optional configuration
- Returns:
CuStateVecBackend instance
- atlas_q.cuquantum_backend.benchmark_backend(n_trials=10, matrix_size=256)[source]#
Benchmark cuQuantum vs PyTorch performance.
- Args:
n_trials: Number of benchmark trials
matrix_size: Size of test matrices
- Returns:
Dictionary with timing results
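The timing dictionary returned by benchmark_backend can be illustrated with a standalone analogue; the sketch below times repeated NumPy SVDs as a stand-in workload (the real function compares cuQuantum against PyTorch, and the dictionary keys here are assumptions for illustration):

```python
import time
import numpy as np

def benchmark_svd(n_trials=10, matrix_size=256, seed=0):
    """Time repeated SVDs and report aggregate statistics (illustrative analogue)."""
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(n_trials):
        m = rng.standard_normal((matrix_size, matrix_size))
        t0 = time.perf_counter()
        np.linalg.svd(m, full_matrices=False)  # the timed operation
        times.append(time.perf_counter() - t0)
    return {"mean_s": float(np.mean(times)),
            "min_s": float(np.min(times)),
            "n_trials": n_trials}

results = benchmark_svd(n_trials=3, matrix_size=64)
```

The min is usually the more stable statistic across trials, since the mean absorbs one-off scheduling noise.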
Overview#
The cuquantum_backend module provides optional NVIDIA cuQuantum acceleration for tensor operations in ATLAS-Q. Key features include:
cuTensorNet integration for accelerated tensor contractions and SVD
cuStateVec support for state-vector operations
Automatic fallback to PyTorch if cuQuantum is unavailable
Version compatibility handling
2-10× speedup on compatible NVIDIA GPUs
This module is optional and ATLAS-Q functions normally without it, using PyTorch as the backend.
Installation#
To enable cuQuantum acceleration:
pip install cuquantum-python
Requires an NVIDIA GPU with CUDA support and the cuQuantum library (typically a ~320 MB download).
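To confirm the install worked, the Python bindings can be probed without importing ATLAS-Q; the module name `cuquantum` matches the pip package above:

```python
import importlib.util

# True if the cuQuantum Python bindings are importable in this environment.
available = importlib.util.find_spec("cuquantum") is not None
print("cuQuantum available:", available)
```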
Classes#
CuQuantumConfig - Configuration for cuQuantum backend.
CuQuantumBackend - Optional cuQuantum backend for accelerated tensor operations.
CuQuantumConfig#
CuQuantumBackend#
- class atlas_q.cuquantum_backend.CuQuantumBackend(config=None)[source]#
Bases: object
Optional cuQuantum backend for accelerated tensor operations. Primary interface for cuQuantum-accelerated operations.
Automatically falls back to PyTorch if cuQuantum is not available.
Methods
__init__([config]) - Initialize cuQuantum backend.
svd(tensor[, chi_max, cutoff]) - Compute SVD with optional cuQuantum acceleration.
contract(tensors, indices[, optimize]) - Tensor contraction with optional cuQuantum acceleration.
Automatically detects cuQuantum availability and falls back to PyTorch if:
cuQuantum is not installed
Initialization fails
Individual operations fail
- __init__(config=None)[source]#
Initialize cuQuantum backend.
- Args:
config: Configuration options (uses defaults if None)
- svd(tensor, chi_max=None, cutoff=1e-14)[source]#
Compute SVD with optional cuQuantum acceleration.
- Args:
tensor: Input tensor (2D after reshaping)
chi_max: Maximum bond dimension (truncation)
cutoff: Singular value cutoff threshold
- Returns:
U, S, Vdagger tensors
Examples#
Basic usage with automatic detection:
from atlas_q.cuquantum_backend import CuQuantumBackend
import torch
# Backend automatically detects cuQuantum
backend = CuQuantumBackend()
if backend.available:
    print(f"cuQuantum {backend.version} detected")
else:
    print("Using PyTorch backend")
# Perform SVD (uses cuQuantum if available, else PyTorch)
tensor = torch.randn(100, 50, dtype=torch.complex64, device='cuda')
U, S, Vt = backend.svd(tensor, chi_max=32)
print(f"Truncated to {len(S)} singular values")
Custom configuration:
from atlas_q.cuquantum_backend import CuQuantumBackend, CuQuantumConfig
import torch
# Configure cuQuantum parameters
config = CuQuantumConfig(
    use_cutensornet=True,
    use_custatevec=True,
    workspace_size=2 * 1024**3,  # 2 GB workspace
    algorithm='gesvdj',          # Jacobi SVD
    device='cuda:0'
)
backend = CuQuantumBackend(config)
# SVD with custom config
tensor = torch.randn(200, 100, dtype=torch.complex64, device='cuda:0')
U, S, Vt = backend.svd(tensor, chi_max=64, cutoff=1e-12)
Integration with AdaptiveMPS:
from atlas_q.adaptive_mps import AdaptiveMPS
from atlas_q.cuquantum_backend import CuQuantumBackend
# Initialize cuQuantum backend
cu_backend = CuQuantumBackend()
# Create MPS (automatically uses cuQuantum if available)
mps = AdaptiveMPS(
    num_qubits=30,
    bond_dim=16,
    device='cuda'
)
# AdaptiveMPS will use cuQuantum backend internally if available
# Apply gates - operations accelerated by cuQuantum
import torch
H = torch.tensor([[1, 1], [1, -1]], dtype=torch.complex64) / torch.sqrt(torch.tensor(2.0))
H = H.to('cuda')
for q in range(30):
    mps.apply_single_qubit_gate(q, H)
Tensor contraction:
from atlas_q.cuquantum_backend import CuQuantumBackend
import torch
backend = CuQuantumBackend()
# Create tensors
A = torch.randn(10, 20, 30, dtype=torch.complex64, device='cuda')
B = torch.randn(30, 40, 50, dtype=torch.complex64, device='cuda')
# Contract using Einstein notation
C = backend.contract([A, B], 'ijk,klm->ijlm', optimize='auto')
print(f"Contraction result shape: {C.shape}")
Checking cuQuantum availability:
from atlas_q.cuquantum_backend import CUQUANTUM_AVAILABLE, CUQUANTUM_VERSION
if CUQUANTUM_AVAILABLE:
    print(f"cuQuantum version {CUQUANTUM_VERSION} is available")
    print("GPU-accelerated operations enabled")
else:
    print("cuQuantum not available")
    print("Install with: pip install cuquantum-python")
Performance Considerations#
Speedup Factors#
Expected speedup with cuQuantum:
SVD operations: 2-5× for χ > 64
Tensor contractions: 3-10× for large tensors
Overall MPS operations: 1.5-3× average
GPU Requirements#
NVIDIA GPU with CUDA compute capability 7.0+ (Volta, Turing, Ampere, Hopper)
CUDA Toolkit 11.0+
Recommended: A100, H100, or RTX 4090
Memory Usage#
cuQuantum requires additional GPU workspace memory (configurable via workspace_size). Default is 1GB, but larger workspaces can improve performance for large tensors.
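One way to size the workspace is as a fraction of total GPU memory, capped at a few gigabytes; the 25% fraction and 4 GB cap below are assumptions for illustration, not an ATLAS-Q recommendation:

```python
def workspace_bytes(total_gpu_bytes, fraction=0.25, cap=4 * 1024**3):
    """Pick a cuQuantum workspace size: a fraction of GPU memory, capped."""
    return min(int(total_gpu_bytes * fraction), cap)

# e.g. a 16 GB GPU -> 4 GB workspace; a 4 GB GPU -> 1 GB workspace
size = workspace_bytes(16 * 1024**3)
```

The resulting value can be passed as `workspace_size` to CuQuantumConfig.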
Fallback Behavior#
The backend gracefully handles failures:
If cuQuantum is not installed: All operations use PyTorch
If initialization fails: Falls back to PyTorch with warning
If individual operations fail: Automatic PyTorch fallback for that operation
Warnings are printed to help diagnose issues, but simulation continues.
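The policy above follows a simple try/warn/fall-back pattern; this is an illustrative sketch of that pattern, not the actual ATLAS-Q implementation:

```python
import warnings

def svd_with_fallback(matrix, fast_svd, reference_svd):
    """Try the accelerated path; warn and fall back to the reference on any error."""
    try:
        return fast_svd(matrix)          # accelerated (cuQuantum) path
    except Exception as exc:             # any failure triggers the fallback
        warnings.warn(f"accelerated SVD failed ({exc}); using PyTorch fallback")
        return reference_svd(matrix)

# Demo with stand-in callables: the fast path raises, the fallback succeeds.
def failing_fast_svd(m):
    raise RuntimeError("no GPU")

result = svd_with_fallback([[1, 0], [0, 1]], failing_fast_svd,
                           lambda m: ("U", "S", "Vh"))
```

Because the fallback is per-operation, one failed call does not disable acceleration for the rest of the simulation.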
Compatibility#
Tested with:
cuQuantum 23.x - 25.x
CUDA 11.8+
PyTorch 2.0+
NVIDIA A100, H100, RTX 4090
Version-specific features are auto-detected and handled.
Troubleshooting#
cuQuantum not detected#
from atlas_q.cuquantum_backend import CUQUANTUM_AVAILABLE
if not CUQUANTUM_AVAILABLE:
    # cuQuantum is missing; install it with:
    # pip install cuquantum-python
    pass
Out of memory errors#
Reduce workspace size:
from atlas_q.cuquantum_backend import CuQuantumConfig, CuQuantumBackend
config = CuQuantumConfig(workspace_size=512 * 1024**2) # 512MB
backend = CuQuantumBackend(config)
Slower than PyTorch#
cuQuantum overhead is only worthwhile for larger tensors. For small systems (χ < 32), PyTorch may be faster. Consider disabling cuQuantum for small simulations.
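The crossover can be encoded as a simple heuristic; the `choose_backend` helper and the χ = 32 threshold below are illustrative assumptions drawn from the guidance above, and should be tuned by measuring on your own hardware:

```python
def choose_backend(chi, crossover=32):
    """Pick a backend from the bond dimension: below the crossover,
    cuQuantum launch overhead tends to dominate, so prefer PyTorch."""
    return "pytorch" if chi < crossover else "cuquantum"

small = choose_backend(16)    # "pytorch": overhead dominates
large = choose_backend(128)   # "cuquantum": acceleration pays off
```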
Best Practices#
When to Use cuQuantum
χ > 32: Significant speedup (1.5-3×)
χ > 64: Major speedup (2-5×)
Long-running simulations: Reduced wall-clock time
Multi-GPU: cuQuantum’s distributed capabilities
When to Use PyTorch
χ < 32: Overhead dominates
Rapid prototyping: Simpler setup
CPU-only systems: cuQuantum requires NVIDIA GPU
Optimization Tips
Increase workspace_size for better performance (2-4GB recommended)
Use ‘gesvdj’ algorithm for moderate χ (32-128)
Enable both cuTensorNet and cuStateVec for best results
Monitor fallback rate - should be < 1%
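One way to monitor the fallback rate is to count fallback warnings emitted during a run; this sketch assumes fallbacks are reported via warnings.warn (as described under "Fallback Behavior"), and the helper itself is hypothetical:

```python
import warnings

def fallback_rate(operation, n_calls):
    """Run operation n_calls times and return the fraction that warned about a fallback."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        for i in range(n_calls):
            operation(i)
    hits = sum("fallback" in str(w.message).lower() for w in caught)
    return hits / n_calls

# Demo: a stand-in operation that falls back on every fifth call.
def op(i):
    if i % 5 == 0:
        warnings.warn("operation used PyTorch fallback")

rate = fallback_rate(op, 20)  # 4 of 20 calls fell back -> 0.2
```

A sustained rate above about 1% suggests a configuration or memory problem worth investigating.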
Use Cases#
Ideal Applications
Large-scale tensor network simulations (χ > 64)
Production systems requiring maximum performance
Multi-hour simulations where 2× speedup matters
Research requiring state-of-the-art performance
Not Recommended
Small systems (N < 20, χ < 32)
Educational/tutorial code (unnecessary complexity)
Systems without NVIDIA GPUs
See Also#
atlas_q.adaptive_mps - MPS operations accelerated by cuQuantum
atlas_q.linalg_robust - Alternative robust linear algebra
Triton GPU Kernels - Custom GPU kernels for specific operations
Integrate cuQuantum - Integration guide
How to Optimize Performance - Performance optimization
References#
NVIDIA cuQuantum SDK, https://developer.nvidia.com/cuquantum-sdk
cuTensorNet Documentation, https://docs.nvidia.com/cuda/cuquantum/cutensornet/index.html
Lykov et al., “Tensor network quantum simulator with step-dependent parallelization,” arXiv:2212.14703 (2022).