How to Handle Large Quantum Systems#
This guide shows strategies for simulating large quantum systems (50+ qubits, high entanglement) that exceed single-device memory or require prohibitive computation time.
Problem#
You need to:
Simulate systems with 50-200+ qubits
Handle highly entangled states requiring large bond dimensions
Work within memory constraints (GPU/CPU RAM limits)
Reduce computation time for large-scale simulations
Scale simulations across multiple devices or nodes
Prerequisites#
Understanding of MPS representation and bond dimensions (see MPS Basics)
Familiarity with ATLAS-Q basic usage
Access to GPU(s) or multi-node cluster for distributed simulations
Knowledge of your hardware memory limits
Strategy 1: Adaptive Truncation#
Let bond dimensions grow dynamically based on entanglement.
Configure Adaptive Truncation#
from atlas_q.adaptive_mps import AdaptiveMPS
# Start with small χ, grow adaptively
mps = AdaptiveMPS(
    num_qubits=50,
    bond_dim=8,                 # Initial χ (small)
    chi_max_per_bond=256,       # Maximum χ allowed
    truncation_threshold=1e-8,  # SVD cutoff
    adaptive_mode=True,         # Enable adaptive growth
    device='cuda'
)
# Apply entangling gates - χ grows automatically
for i in range(49):
    mps.apply_cnot(i, i + 1)
# Check bond dimensions after evolution
bond_dims = mps.get_bond_dimensions()
print(f"Bond dimensions: {bond_dims}")
print(f"Max χ reached: {max(bond_dims)}")
print(f"Average χ: {sum(bond_dims)/len(bond_dims):.1f}")
The MPS starts small and grows only where entanglement is high, saving memory.
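As a rough check, you can compare the adaptive footprint against a fixed-χ baseline from the bond dimensions reported above; the sketch below is plain arithmetic (it assumes complex64 tensors at 8 bytes per element) and reuses only the get_bond_dimensions() call shown earlier.
bytes_per_elem = 8  # complex64
d = 2               # physical (qubit) dimension
dims = [1] + list(bond_dims) + [1]  # add the trivial edge bonds
adaptive_bytes = sum(dims[i] * d * dims[i + 1] * bytes_per_elem for i in range(len(dims) - 1))
chi_fixed = max(bond_dims)
fixed_bytes = (len(bond_dims) + 1) * chi_fixed * d * chi_fixed * bytes_per_elem
print(f"Adaptive: {adaptive_bytes / 1024**2:.2f} MB vs fixed χ={chi_fixed}: {fixed_bytes / 1024**2:.2f} MB")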
Per-Bond Limits#
Set different limits for different regions:
# Higher limits in middle (high entanglement), lower at edges
chi_limits = [32, 64, 128, 256, 256, 128, 64, 32]
mps = AdaptiveMPS(num_qubits=len(chi_limits)+1, bond_dim=16, device='cuda')
# Apply per-bond limits
for i, chi_max in enumerate(chi_limits):
    mps.set_bond_max_dimension(bond_index=i, chi_max=chi_max)
This heterogeneous approach allocates resources where needed most.
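If you do not want to hard-code a profile, a simple heuristic that ramps χ up toward the middle of the chain works for any length. The make_chi_profile helper below is illustrative only (the shape of the ramp is an assumption, not an ATLAS-Q default) and reuses the set_bond_max_dimension call shown above.
def make_chi_profile(num_qubits, chi_edge=32, chi_center=256):
    """Heuristic per-bond χ limits: small at the edges, large in the middle."""
    n_bonds = num_qubits - 1
    profile = []
    for i in range(n_bonds):
        frac = min(i + 1, n_bonds - i) / (n_bonds / 2)  # ~0 near the edges, 1 in the middle
        profile.append(int(chi_edge + (chi_center - chi_edge) * min(frac, 1.0)))
    return profile
chi_limits = make_chi_profile(num_qubits=20)
mps = AdaptiveMPS(num_qubits=20, bond_dim=16, device='cuda')
for i, chi_max in enumerate(chi_limits):
    mps.set_bond_max_dimension(bond_index=i, chi_max=chi_max)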
Monitor Truncation Error#
Ensure adaptive truncation maintains accuracy:
from atlas_q.diagnostics import MPSStatistics
mps = AdaptiveMPS(
    num_qubits=60,
    bond_dim=8,
    chi_max_per_bond=128,
    adaptive_mode=True,
    device='cuda'
)
stats = MPSStatistics(mps)
# Apply circuit
for i in range(59):
    mps.apply_cnot(i, i + 1)
# Check truncation statistics
summary = stats.summary()
print(f"Total truncation error: {summary['cumulative_truncation_error']:.2e}")
print(f"Max single-step error: {summary['max_truncation_error']:.2e}")
print(f"SVD count: {summary['svd_count']}")
# If the error is too high, increase the limits
if summary['cumulative_truncation_error'] > 1e-5:
    print("Warning: high truncation error, consider increasing chi_max_per_bond")
Strategy 2: Memory Budget Management#
Constrain total memory usage.
Set Global Memory Budget#
# Limit MPS to 4 GB
mps = AdaptiveMPS(
    num_qubits=50,
    bond_dim=16,
    chi_max_per_bond=128,
    memory_budget_mb=4096,  # 4 GB limit
    device='cuda'
)
# The MPS automatically adjusts χ to stay within the budget:
# higher-entanglement regions may get larger χ, while
# lower-entanglement regions get smaller χ to compensate.
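To see where a budget was actually spent, inspect the per-bond dimensions after running a circuit; the snippet below reuses the get_bond_dimensions() call introduced in Strategy 1 and is otherwise plain Python.
for i in range(49):
    mps.apply_cnot(i, i + 1)
dims = mps.get_bond_dimensions()
print(f"χ per bond: min={min(dims)}, max={max(dims)}, mean={sum(dims)/len(dims):.1f}")
# Bonds pinned at chi_max_per_bond show where the budget is concentrated
saturated = [i for i, chi in enumerate(dims) if chi >= 128]
print(f"Saturated bonds: {saturated}")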
Check Memory Usage#
Monitor memory consumption:
import torch
def get_memory_usage():
    """Return current GPU memory usage (allocated, reserved) in GB."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3
        reserved = torch.cuda.memory_reserved() / 1024**3
        return allocated, reserved
    return 0.0, 0.0
# Before simulation
alloc_before, res_before = get_memory_usage()
print(f"Memory before: {alloc_before:.2f} GB allocated, {res_before:.2f} GB reserved")
# Create large MPS
mps = AdaptiveMPS(num_qubits=80, bond_dim=64, device='cuda')
# After simulation
alloc_after, res_after = get_memory_usage()
print(f"Memory after: {alloc_after:.2f} GB allocated, {res_after:.2f} GB reserved")
print(f"MPS uses: {alloc_after - alloc_before:.2f} GB")
Estimate Memory Requirements#
def estimate_mps_memory(num_qubits, bond_dim, dtype=torch.complex64):
    """
    Estimate MPS memory usage for the core tensors only.
    MPS memory ≈ n_qubits × χ² × d × sizeof(dtype),
    where d = 2 (qubit dimension) and χ = bond_dim.
    Peak usage during gate application and SVD is higher.
    """
    d = 2  # Qubit dimension
    bytes_per_element = 8 if dtype == torch.complex64 else 16
    # Memory per tensor: χ_left × d × χ_right, assuming χ_left ≈ χ_right ≈ χ
    memory_per_tensor = bond_dim * d * bond_dim * bytes_per_element
    total_memory = num_qubits * memory_per_tensor
    return total_memory / 1024**3  # Convert to GB
# Check before allocating
for n in [50, 100, 150, 200]:
    for chi in [32, 64, 128]:
        mem_gb = estimate_mps_memory(n, chi)
        print(f"n={n:3d}, χ={chi:3d}: ~{mem_gb * 1024:.2f} MB")
# Output:
# n= 50, χ= 32: ~0.78 MB
# n= 50, χ= 64: ~3.12 MB
# n= 50, χ=128: ~12.50 MB
# n=100, χ= 32: ~1.56 MB
# n=100, χ= 64: ~6.25 MB
# n=100, χ=128: ~25.00 MB
# n=150, χ= 32: ~2.34 MB
# n=150, χ= 64: ~9.38 MB
# n=150, χ=128: ~37.50 MB
# n=200, χ= 32: ~3.12 MB
# n=200, χ= 64: ~12.50 MB
# n=200, χ=128: ~50.00 MB
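To go the other way and pick the largest χ a budget allows, you can invert the estimate above; the max_chi_for_budget helper below is plain arithmetic on that formula, not an ATLAS-Q API.
def max_chi_for_budget(num_qubits, budget_gb, dtype=torch.complex64):
    """Largest power-of-two χ whose core-tensor estimate fits within the budget."""
    chi = 2
    while estimate_mps_memory(num_qubits, chi * 2, dtype) <= budget_gb:
        chi *= 2
    return chi
chi = max_chi_for_budget(num_qubits=100, budget_gb=4.0)
print(f"χ ≤ {chi} fits in 4 GB")  # pair with memory_budget_mb=4096 if desired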
Strategy 3: Distributed MPS#
Split MPS across multiple GPUs or nodes.
Bond-Parallel Distribution#
Distribute bond dimensions across devices:
from atlas_q.distributed_mps import DistributedMPS, DistributedConfig
# 4 GPUs, split bond dimension
config = DistributedConfig(
    mode='bond_parallel',
    world_size=4,
    backend='nccl',  # NVIDIA GPUs
    device_ids=[0, 1, 2, 3]
)
# Bond dimension 256 split across 4 GPUs = 64 per GPU
mps = DistributedMPS(
    num_qubits=60,
    bond_dim=256,
    config=config
)
# Operations are automatically distributed
for i in range(59):
    mps.apply_cnot(i, i + 1)
from atlas_q.mpo_ops import MPOBuilder
H = MPOBuilder.heisenberg_hamiltonian(n_sites=60, Jx=1.0, Jy=1.0, Jz=1.0, device='cuda')
energy = mps.expectation(H)  # Uses all 4 GPUs
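As a sanity check on the "64 per GPU" split quoted above, you can estimate the per-device footprint with the estimate_mps_memory helper from Strategy 2; this is rough back-of-the-envelope arithmetic, not a DistributedMPS call.
world_size, chi_total = 4, 256
chi_per_gpu = chi_total // world_size
per_gpu_gb = estimate_mps_memory(num_qubits=60, bond_dim=chi_per_gpu)
print(f"χ per GPU: {chi_per_gpu}, ~{per_gpu_gb * 1024:.1f} MB of core tensors per device")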
Qubit-Parallel Distribution#
Distribute qubits across devices:
# 8 GPUs, split qubits
config = DistributedConfig(
    mode='qubit_parallel',
    world_size=8,
    backend='nccl'
)
# 200 qubits split across 8 GPUs = 25 qubits per GPU
mps = DistributedMPS(
    num_qubits=200,
    bond_dim=32,
    config=config
)
# Each GPU handles a segment of the chain;
# communication is needed only for gates spanning segment boundaries.
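To gauge how much cross-device communication a circuit will trigger, count the two-qubit gates that straddle a segment boundary; the sketch below is plain Python bookkeeping and does not depend on the DistributedMPS API.
num_qubits, world_size = 200, 8
seg = num_qubits // world_size  # 25 qubits per device
gates = [('CNOT', [i, i + 1], []) for i in range(num_qubits - 1)]
boundary = [q for _, q, _ in gates if len(q) == 2 and q[0] // seg != q[1] // seg]
print(f"{len(boundary)} of {len(gates)} gates cross a device boundary")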
Multi-Node Distribution#
Scale across compute nodes:
# Launch with torchrun or mpirun, e.g.:
#   torchrun --nproc_per_node=4 --nnodes=2 script.py
import os
import torch
import torch.distributed as dist
# Initialize the distributed backend
dist.init_process_group(backend='nccl')
local_rank = int(os.environ.get('LOCAL_RANK', 0))
torch.cuda.set_device(local_rank)
config = DistributedConfig(
    mode='bond_parallel',
    world_size=dist.get_world_size(),
    backend='nccl'
)
# Total: 2 nodes × 4 GPUs = 8 devices
# Bond dimension 512 split across 8 devices = 64 per device
mps = DistributedMPS(
    num_qubits=100,
    bond_dim=512,
    config=config
)
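In multi-process runs it also helps to guard printing behind the rank and to shut the process group down cleanly; this is standard torch.distributed housekeeping rather than anything ATLAS-Q-specific.
if dist.get_rank() == 0:
    print(f"Running on {dist.get_world_size()} devices")
# ... apply gates, measure observables ...
dist.barrier()                  # Wait for all ranks to finish
dist.destroy_process_group()    # Clean shutdown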
Strategy 4: Circuit Cutting#
Partition large circuits into smaller subcircuits.
Basic Circuit Cutting#
from atlas_q.circuit_cutting import CircuitCutter, CuttingConfig
# Define large circuit (100 qubits)
gates = []
for i in range(100):
    gates.append(('H', [i], []))
for i in range(99):
    gates.append(('CNOT', [i, i + 1], []))
# Configure cutter
config = CuttingConfig(
    max_partition_size=20,   # Each partition ≤ 20 qubits
    max_cuts=4,              # Allow up to 4 cuts
    optimization='min_cuts'  # Minimize number of cuts
)
cutter = CircuitCutter(config)
# Partition circuit
partitions, cut_locations = cutter.partition_circuit(gates, n_partitions=5)
print(f"Original: 100 qubits")
print(f"Partitions: {len(partitions)}, Cuts: {len(cut_locations)}")
print(f"Classical overhead: {4**len(cut_locations)}×")
# Simulate each partition
from atlas_q.adaptive_mps import AdaptiveMPS
partition_results = []
for i, partition in enumerate(partitions):
    mps = AdaptiveMPS(num_qubits=partition.n_qubits, bond_dim=32, device='cuda')
    # Apply partition gates
    for gate_name, qubits, params in partition.gates:
        # Apply gate...
        pass
    partition_results.append(mps)
    print(f"Partition {i}: {partition.n_qubits} qubits simulated")
# Reconstruct full result
final_result = cutter.reconstruct(partition_results, cut_locations)
Minimize Cutting Overhead#
# Try different strategies
strategies = ['min_cuts', 'balanced_size', 'min_entanglement']
for strategy in strategies:
    config = CuttingConfig(
        max_partition_size=25,
        max_cuts=5,
        optimization=strategy
    )
    cutter = CircuitCutter(config)
    partitions, cuts = cutter.partition_circuit(gates, n_partitions=4)
    overhead = 4 ** len(cuts)
    print(f"{strategy:20s}: {len(cuts)} cuts, {overhead}× overhead")
# Choose the strategy with acceptable overhead
# Typical: 2-3 cuts → 16-64× overhead is manageable
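To decide whether the exponential reconstruction cost is acceptable, a back-of-the-envelope runtime estimate helps; the sketch below is plain arithmetic, and t_partition_s (the time to simulate one partition once) is a number you measure or assume yourself.
def cutting_runtime_estimate(n_cuts, n_partitions, t_partition_s):
    """Rough wall-clock estimate: each of the ~4^k variants re-runs every partition."""
    return (4 ** n_cuts) * n_partitions * t_partition_s
for k in range(1, 5):
    t = cutting_runtime_estimate(n_cuts=k, n_partitions=5, t_partition_s=2.0)
    print(f"{k} cuts: ~{t / 60:.1f} minutes before any parallelism")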
Strategy 5: Compress Special States#
Use specialized representations for structured states.
Periodic States (Shor’s Algorithm)#
from atlas_q.quantum_hybrid_system import PeriodicState
# For period finding the register holds a superposition Σ_x |x⟩|f(x)⟩ with f(x + r) = f(x)
# PeriodicState stores only one period: O(r) memory instead of O(2^n)
N = 15        # Number to factor
a = 7         # Coprime to N
n_qubits = 8  # Sufficient for N=15
# Create periodic state
periodic_state = PeriodicState(
    num_qubits=n_qubits,
    function=lambda x: (a**x) % N,
    device='cuda'
)
# Memory: O(period) instead of O(2^n_qubits)
# For N=15, a=7: the period is 4, saving 2^8 / 4 = 64× memory
# Apply QFT to find period
periodic_state.apply_qft()
period = periodic_state.measure_period()
print(f"Period: {period}")
Matrix Product States with Symmetries#
Exploit quantum number conservation:
# For systems with conserved quantum numbers (total Sz, particle number)
# Use symmetry-adapted MPS (not fully implemented, concept shown)
# Example: Conserve total Sz = 0
def create_symmetric_mps(num_qubits, total_sz=0):
    """Create an MPS restricted to a fixed-Sz symmetry sector."""
    # Only keeps states with Sz = total_sz
    # Reduces the effective Hilbert space dimension
    # Memory savings proportional to the reduction
    pass
# For 20 qubits with Sz=0:
# Full space: 2^20 ≈ 1M states
# Sz=0 sector: C(20,10) = 184,756 states
# Savings: ~5.7×
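The sector sizes quoted above follow directly from binomial coefficients; the snippet below computes them for a few chain lengths and is ordinary Python, not an ATLAS-Q API.
import math
for n in [20, 40, 60]:
    full = 2 ** n
    sector = math.comb(n, n // 2)  # basis states with total Sz = 0
    print(f"n={n}: full {full:.2e} states, Sz=0 sector {sector:.2e}, savings {full / sector:.1f}×")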
Strategy 6: Hybrid Classical-Quantum#
Combine exact quantum simulation with classical approximations.
Compress Low-Entanglement Regions#
# Identify low-entanglement regions and compress them aggressively
def adaptive_compression(mps):
    """Compress each bond based on its local entanglement entropy."""
    from atlas_q.diagnostics import bond_entropy_from_S
    for i in range(mps.num_qubits - 1):
        S_i = mps.get_schmidt_values(bond=i)
        entropy_i = bond_entropy_from_S(S_i)
        if entropy_i < 0.5:
            # Low entanglement: compress hard
            mps.set_bond_max_dimension(bond_index=i, chi_max=16)
        elif entropy_i < 2.0:
            # Medium entanglement
            mps.set_bond_max_dimension(bond_index=i, chi_max=64)
        else:
            # High entanglement: keep a large bond
            mps.set_bond_max_dimension(bond_index=i, chi_max=256)
mps = AdaptiveMPS(num_qubits=100, bond_dim=16, device='cuda')
# Evolve the system
for _ in range(100):
    # Apply gates...
    pass
# Adaptively compress
adaptive_compression(mps)
Classical Pre-Processing#
Use classical simulation for Clifford parts:
from atlas_q.stabilizer_backend import StabilizerSimulator
# Circuit has Clifford (H, S, CNOT) and non-Clifford (T) gates
clifford_gates = [('H', [i], []) for i in range(50)]
clifford_gates += [('CNOT', [i, i + 1], []) for i in range(49)]
# Simulate the Clifford part with stabilizers (fast, O(n²))
sim = StabilizerSimulator(n_qubits=50)
for gate_name, qubits, _ in clifford_gates:
    if gate_name == 'H':
        sim.h(qubits[0])
    elif gate_name == 'CNOT':
        sim.cnot(qubits[0], qubits[1])
# Convert to MPS for the non-Clifford part
mps = sim.to_mps(bond_dim=32, device='cuda')
# Now apply the expensive non-Clifford gates
import cmath
import math
import torch
T = torch.tensor([[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
                 dtype=torch.complex64, device='cuda')
for i in range(50):
    mps.apply_single_qubit_gate(i, T)
Benchmarking and Validation#
Verify Large-Scale Simulation Correctness#
# Compare a small-scale simulation with a large-scale one
def validate_scaling(num_qubits_small, num_qubits_large, chi):
    """Verify that the large simulation is consistent with the small one."""
    # Small system (single device)
    mps_small = AdaptiveMPS(num_qubits=num_qubits_small, bond_dim=chi, device='cuda')
    H_small = MPOBuilder.ising_hamiltonian(n_sites=num_qubits_small, J=1.0, h=0.5, device='cuda')
    for i in range(num_qubits_small - 1):
        mps_small.apply_cnot(i, i + 1)
    E_small = mps_small.expectation(H_small)
    # Large system (distributed or compressed)
    config = DistributedConfig(mode='bond_parallel', world_size=2)
    mps_large = DistributedMPS(num_qubits=num_qubits_large, bond_dim=chi, config=config)
    H_large = MPOBuilder.ising_hamiltonian(n_sites=num_qubits_large, J=1.0, h=0.5, device='cuda')
    for i in range(num_qubits_large - 1):
        mps_large.apply_cnot(i, i + 1)
    E_large = mps_large.expectation(H_large)
    # Check that the energy per qubit is consistent
    # (small deviations from boundary effects and truncation are expected)
    E_per_qubit_small = E_small / num_qubits_small
    E_per_qubit_large = E_large / num_qubits_large
    print(f"Small ({num_qubits_small} qubits): {E_per_qubit_small:.6f} per qubit")
    print(f"Large ({num_qubits_large} qubits): {E_per_qubit_large:.6f} per qubit")
    rel_diff = abs(E_per_qubit_small - E_per_qubit_large) / abs(E_per_qubit_small)
    print(f"Relative difference: {rel_diff:.2%}")
validate_scaling(20, 100, chi=64)
Summary#
To handle large quantum systems in ATLAS-Q:
Adaptive truncation: Start with small χ, grow where needed (chi_max_per_bond, adaptive_mode=True)
Memory budgets: Set hard limits on total memory (memory_budget_mb)
Distributed MPS: Split across GPUs/nodes (bond-parallel or qubit-parallel)
Circuit cutting: Partition circuits into subcircuits (CircuitCutter)
Compress special states: Use PeriodicState for period-finding, symmetry-adapted MPS
Hybrid approaches: Clifford pre-processing with stabilizers, adaptive compression
Validation: Verify large-scale results match small-scale benchmarks
These strategies enable simulations of 50-200+ qubit systems that would otherwise exceed memory or time constraints.
See Also#
Advanced Features Tutorial - Distributed MPS and circuit cutting details
Performance Model - Scaling analysis and bottlenecks
How to Optimize Performance - General performance optimization
Parallel Computation - Multi-GPU and multi-node setup
Tensor Networks - MPS structure and truncation theory