Protein Mutational Scanning#
Apply spatial coherence to deep mutational scanning (DMS) data to identify functional domains and essential residues.
Overview#
Deep mutational scanning generates fitness landscapes across proteins:
Position: Residue number
Response: Fitness effect (growth rate, binding, activity)
Spatial coherence identifies regions where mutations have consistent effects, distinguishing functional cores from tolerant regions.
Basic Example#
from phaselab.protein.mutscan import (
MutScanLandscape,
analyze_mutscan_coherence,
)
# Your DMS data
landscape = MutScanLandscape(
positions=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
effects=[-0.1, -0.2, -2.5, -2.8, -3.0, -2.7, -0.3, -0.1, 0.0, 0.1],
protein_id="TEM1",
protein_name="TEM-1 β-Lactamase",
effect_type="fitness",
)
# Analyze coherence
result = analyze_mutscan_coherence(
landscape,
window=5, # Residues for local coherence
essential_threshold=-1.0, # Effect threshold for essential
)
# View results
print(f"Functional domains: {len(result.functional_domains)}")
print(f"Essential regions: {len(result.essential_regions)}")
print(f"Variable regions: {len(result.variable_regions)}")
Finding Essential Domains#
# Essential domains: low fitness + high coherence
for domain in result.essential_regions:
print(f"Essential: residues {domain.start}-{domain.end}")
print(f" Mean fitness effect: {domain.mean_effect:.2f}")
print(f" Coherence: {domain.coherence:.3f}")
print(f" Likely function: {domain.annotation}")
# Variable regions: mutations tolerated
for region in result.variable_regions:
print(f"Tolerant: residues {region.start}-{region.end}")
print(f" Good for engineering: mutations here may be neutral")
Per-Residue Coherence Profile#
from phaselab.protein.mutscan import local_coherence_profile
# Get coherence at each position
profile = local_coherence_profile(landscape, window=5)
for pos, coh in zip(profile.positions, profile.coherence):
status = "essential" if coh > 0.7 else "variable"
print(f"Residue {pos}: coherence={coh:.3f} ({status})")
Loading Real DMS Data#
from phaselab.protein.mutscan import load_mutscan_data
# From MaveDB-style file
landscape = load_mutscan_data(
'dms_scores.csv',
position_col='position',
effect_col='score',
protein_id='GFP',
protein_name='Green Fluorescent Protein',
)
# From pandas DataFrame
import pandas as pd
df = pd.read_csv('my_dms.csv')
landscape = MutScanLandscape(
positions=df['residue'].values,
effects=df['fitness'].values,
protein_id='MYC',
)
Mapping to Structure#
Map coherence values to a PDB file for visualization:
from phaselab.protein.mutscan import map_coherence_to_structure
# Map to B-factor column
map_coherence_to_structure(
result,
pdb_file='protein.pdb',
output_file='protein_coherence.pdb',
chain='A',
)
# Visualize in PyMOL:
# spectrum b, blue_white_red
# Blue = low coherence (variable)
# Red = high coherence (essential)
Interpretation Guide#
Coherence |
Fitness Effect |
Interpretation |
|---|---|---|
High (>0.7) |
Strong negative |
Essential functional domain |
High (>0.7) |
Near zero |
Structural (tolerates mutation but constrained) |
Low (<0.4) |
Variable |
Flexible region, good for engineering |
Low (<0.4) |
Strong negative |
Context-dependent (mutation effect varies) |
Use Cases#
Protein Engineering: Target low-coherence regions for mutations
Drug Discovery: High-coherence essential domains are drug targets
Variant Interpretation: High-coherence positions are likely pathogenic
Structure Validation: Compare coherence to structural features
Next Steps#
Binding Landscape Analysis - Protein-ligand binding analysis
/user_guide/howtos/integrate_with_alphafold - Combine with structure prediction