TOPO tutorial
Step-by-Step Tutorial: Setting Up TOPO Coarse-Grained Simulations
This tutorial provides a comprehensive guide for setting up and running coarse-grained molecular dynamics simulations using the TOPO model, based on the example setup in the testing/Quyen folder.
Table of Contents
- Prerequisites
- Understanding the Required Files
- Step-by-Step Setup
- Configuration File Details
- Running the Simulation
- Output Files
- Troubleshooting
Prerequisites
Before starting, ensure you have the following installed:
- Python 3.x (with required packages: openmm, numpy, parmed, topo)
- OpenMM library for molecular dynamics simulations
- STRIDE (optional, for secondary structure analysis)
- CUDA (optional, for GPU acceleration)
Install the TOPO package and dependencies:
pip install openmm numpy parmed
# Install topo package (adjust path as needed)
pip install -e /path/to/topo
Understanding the Required Files
The simulation setup requires several input files:
1. PDB Structure File (P0CX28_clean.pdb)
- Contains the atomic coordinates of your protein structure
- Must be a valid PDB format file
- Should contain alpha-carbon (CA) atoms or full atom structure (CA atoms will be extracted automatically)
- Important: Ensure there are no missing residues in the structure
2. Configuration File (md.ini)
- Contains all simulation parameters
- Uses INI format with an [OPTIONS] section
- Controls simulation length, temperature, pressure, output frequency, etc.
3. Domain Definition File (domain.yaml) - Optional
- Defines protein domains for contact-based interactions
- YAML format specifying residue ranges and interaction strengths
- Used to scale non-bonded interactions between different domains
4. STRIDE Output File (stride.dat) - Optional
- Contains secondary structure assignments from STRIDE analysis
- Used to identify hydrogen bonds and secondary structure elements
- Can be generated by running STRIDE on your PDB file
5. Simulation Script (run_simulation.py)
- Main Python script that reads the configuration and runs the simulation
- Handles model building, force field setup, and MD integration
Step-by-Step Setup
Step 1: Prepare Your Protein Structure
- Obtain or prepare your PDB file
# Example: Your structure file should be named appropriately
# e.g., P0CX28_clean.pdb
- Verify the structure
- Check that the PDB file is valid
- Ensure no missing residues (the code will warn but not fix this)
- Verify the structure contains the residues you want to simulate
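The missing-residue check can be approximated before running anything. The sketch below is not part of TOPO: the helper names `ca_residue_numbers` and `find_gaps` are hypothetical, and it assumes standard fixed-column PDB ATOM records. A jump in residue numbering does not always mean a residue is missing (numbering may be intentionally non-consecutive), so treat the result as a hint to inspect the file.

```python
def ca_residue_numbers(pdb_text, chain=None):
    """Return residue sequence numbers of CA atoms found in ATOM records."""
    numbers = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout: atom name in columns 13-16,
        # chain ID in column 22, residue number in columns 23-26.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        numbers.append(int(line[22:26]))
    return numbers


def find_gaps(numbers):
    """Return (previous, next) pairs where numbering is not consecutive."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a != 1]
```

For example, `find_gaps(ca_residue_numbers(open("P0CX28_clean.pdb").read()))` returns an empty list when the CA residue numbers are consecutive.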
Step 2: Generate STRIDE Output (Optional but Recommended)
If you want to use secondary structure information for contact detection:
# Run STRIDE on your PDB file
stride P0CX28_clean.pdb > stride.dat
The STRIDE output will contain:
- Secondary structure assignments (helix, strand, coil, turns)
- Hydrogen bond information
- Detailed residue-by-residue structure assignments
Note: If you don’t provide stride_output_file in the config, the system may attempt to run STRIDE automatically, but it’s better to provide a pre-generated file.
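To inspect a pre-generated stride.dat yourself, a minimal parsing sketch follows. It is not how TOPO reads the file; the function name `parse_stride_assignments` is hypothetical, and the field layout of the `ASG` lines (residue name, chain, PDB residue number, ordinal number, one-letter code, state name) is an assumption you should confirm against your own STRIDE output.

```python
def parse_stride_assignments(stride_text):
    """Map (chain, PDB residue number) to a one-letter secondary structure code."""
    assignments = {}
    for line in stride_text.splitlines():
        # Only per-residue assignment records are of interest here.
        if not line.startswith("ASG"):
            continue
        fields = line.split()
        # Assumed layout: ASG, residue name, chain, PDB residue number,
        # ordinal residue number, one-letter code, full state name, ...
        chain, pdb_resnum, code = fields[2], fields[3], fields[5]
        # Residue numbers stay strings so insertion codes are preserved.
        assignments[(chain, pdb_resnum)] = code
    return assignments
```

Counting the values of the returned dictionary gives a quick view of how many residues fall into each secondary structure class.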
Step 3: Create Domain Definition File (Optional)
If your protein has multiple domains with different interaction strengths:
- Create domain.yaml:
n_residues: 106
intra_domains:
  A:
    residues: [1-106]
    strength: 2.5044
- Format explanation:
  - n_residues: Total number of residues in the protein
  - intra_domains: Define domains (A, B, C, etc.)
    - residues: Residue range for the domain (can be [1-50] or [1-50, 60-100])
    - strength: Scaling factor for intra-domain contacts
- For multi-domain proteins, you can define multiple domains:
n_residues: 200
intra_domains:
  Domain1:
    residues: [1-100]
    strength: 2.5
  Domain2:
    residues: [101-200]
    strength: 2.5
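A domain definition can be sanity-checked before a run. The sketch below is an illustration, not TOPO's own parser: `expand_residues` and `check_domains` are hypothetical helpers, and it assumes that range entries such as `1-100` arrive as strings once the YAML is loaded. Whether TOPO permits a residue to belong to two domains is not stated here, so overlaps are reported only as something worth reviewing.

```python
def expand_residues(spec):
    """Expand entries such as "1-50" or "7" into a sorted list of residue numbers."""
    residues = set()
    for item in spec:
        text = str(item).strip()
        if "-" in text:
            start, end = (int(part) for part in text.split("-"))
            if start > end:
                raise ValueError(f"range {text!r} runs backwards")
            residues.update(range(start, end + 1))
        else:
            residues.add(int(text))
    return sorted(residues)


def check_domains(n_residues, intra_domains):
    """Return human-readable notes about out-of-range or overlapping residues."""
    notes = []
    owner = {}
    for name, domain in intra_domains.items():
        for res in expand_residues(domain["residues"]):
            if not 1 <= res <= n_residues:
                notes.append(f"{name}: residue {res} is outside 1..{n_residues}")
            elif res in owner:
                notes.append(f"{name}: residue {res} is also in {owner[res]}")
            else:
                owner[res] = name
    return notes
```

An empty list from `check_domains` means every listed residue lies within `1..n_residues` and no residue appears in two domains.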
Step 4: Create Configuration File (md.ini)
Create or modify md.ini with your simulation parameters:
[OPTIONS]
# Simulation parameters
md_steps = 100_000 # Total number of MD steps
dt = 0.015 # Time step in picoseconds
nstxout = 5000 # Steps between writing coordinates/checkpoint
nstlog = 5000 # Steps between writing log file
nstcomm = 100 # Frequency for center of mass motion removal
# Model selection
model = topo # Currently only 'topo' is supported
# Temperature coupling
tcoupl = yes # Enable temperature coupling
ref_t = 300 # Reference temperature in Kelvin
tau_t = 0.05 # Temperature coupling time constant (ps^-1)
# Pressure coupling (requires PBC)
pcoupl = no # Enable pressure coupling
ref_p = 1 # Reference pressure in bar
frequency_p = 25 # Pressure coupling frequency
# Periodic boundary conditions
pbc = no # Enable periodic boundary conditions
box_dimension = 30 # Box size in nm (can be single value or [x, y, z])
# Input files
protein_code = P0CX28_clean # Prefix for output files
pdb_file = P0CX28_clean.pdb # Input structure file
domain_def = domain.yaml # Domain definition file (optional)
stride_output_file = stride.dat # STRIDE output file (optional)
# Output files
checkpoint = P0CX28_clean.chk # Checkpoint file name
# Hardware settings
device = GPU # Use 'GPU' or 'CPU'
ppn = 4 # Number of threads (only for CPU)
# Simulation control
restart = no # Restart from checkpoint
minimize = no # Perform energy minimization (ignored if restart=yes)
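If you want to read the same file from your own analysis scripts, Python's standard `configparser` handles this layout. This is a sketch and not necessarily how run_simulation.py parses the file; `load_options` is a hypothetical helper, and the fallback values are the defaults documented in this tutorial's parameter tables. The `inline_comment_prefixes` setting is needed because the file places `#` comments after values.

```python
import configparser


def load_options(ini_text):
    """Parse a few [OPTIONS] values from md.ini-style text."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(ini_text)
    opts = parser["OPTIONS"]
    return {
        # int() accepts underscores, so "100_000" parses as 100000.
        "md_steps": opts.getint("md_steps", fallback=1000),
        "dt": opts.getfloat("dt", fallback=0.01),
        "nstxout": opts.getint("nstxout", fallback=10),
        "nstlog": opts.getint("nstlog", fallback=10),
        "ref_t": opts.getfloat("ref_t", fallback=300.0),
        # getboolean understands yes/no, true/false, on/off and 1/0.
        "tcoupl": opts.getboolean("tcoupl", fallback=True),
        "pcoupl": opts.getboolean("pcoupl", fallback=False),
    }
```

Calling `load_options(open("md.ini").read())` returns a plain dictionary with typed values.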
Step 5: Verify File Structure
Your working directory should contain:
working_directory/
├── P0CX28_clean.pdb # Input structure
├── md.ini # Configuration file
├── domain.yaml # Domain definition (optional)
├── stride.dat # STRIDE output (optional)
└── run_simulation.py # Simulation script
Configuration File Details
Simulation Parameters
| Parameter | Description | Default | Units |
|---|---|---|---|
| md_steps | Total number of MD steps | 1000 | steps |
| dt | Integration time step | 0.01 | ps |
| nstxout | Steps between coordinate/checkpoint writes | 10 | steps |
| nstlog | Steps between log file writes | 10 | steps |
| nstcomm | Frequency of COM motion removal | 100 | steps |
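These parameters combine by simple arithmetic: total model time is md_steps × dt, and the number of writes is md_steps divided by the write interval. The sketch below uses a hypothetical helper, `run_summary`, and assumes one write every nstxout (or nstlog) steps with nothing written at step 0. With the Step 4 values (100,000 steps at 0.015 ps) it gives about 1.5 ns of model time and 20 coordinate writes.

```python
def run_summary(md_steps, dt_ps, nstxout, nstlog):
    """Summarize run length and output counts implied by the settings."""
    time_ps = md_steps * dt_ps
    return {
        "time_ps": time_ps,
        "time_ns": time_ps / 1000.0,
        # Integer division: a partial interval at the end produces no write.
        "coordinate_writes": md_steps // nstxout,
        "log_writes": md_steps // nstlog,
    }


summary = run_summary(md_steps=100_000, dt_ps=0.015, nstxout=5000, nstlog=5000)
print(summary)
```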
Temperature and Pressure
| Parameter | Description | Default | Units |
|---|---|---|---|
| tcoupl | Enable temperature coupling | True | boolean |
| ref_t | Reference temperature | 300.0 | K |
| tau_t | Temperature coupling time constant | 0.01 | ps^-1 |
| pcoupl | Enable pressure coupling | False | boolean |
| ref_p | Reference pressure | 1.0 | bar |
| frequency_p | Pressure coupling frequency | 25 | steps |
Important Notes:
- If pcoupl = yes, then pbc must also be yes
- box_dimension can be a single number (cubic box) or [x, y, z] (rectangular box)
- Pressure coupling requires periodic boundary conditions
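The two rules above are easy to check before launching a run. This is a sketch with hypothetical helper names (`normalize_box`, `check_coupling`), not TOPO's own validation; it assumes box lengths are given in nm, as in the md.ini comments.

```python
def normalize_box(box_dimension):
    """Turn a single number or a 3-item sequence into an (x, y, z) tuple."""
    if isinstance(box_dimension, (int, float)):
        # A single value means a cubic box.
        return (float(box_dimension),) * 3
    box = tuple(float(v) for v in box_dimension)
    if len(box) != 3:
        raise ValueError("box_dimension needs one value or exactly three")
    return box


def check_coupling(pcoupl, pbc, box_dimension=None):
    """Raise ValueError for inconsistent settings; return the box or None."""
    if pcoupl and not pbc:
        raise ValueError("pcoupl = yes requires pbc = yes")
    if not pbc:
        return None
    if box_dimension is None:
        raise ValueError("pbc = yes requires box_dimension")
    return normalize_box(box_dimension)
```

For instance, `check_coupling(False, True, 30)` returns `(30.0, 30.0, 30.0)`, while `check_coupling(True, False)` raises `ValueError`.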
File Paths
| Parameter | Description | Required |
|---|---|---|
| pdb_file | Input PDB structure file | Yes |
| protein_code | Prefix for all output files | Yes |
| domain_def | Domain definition YAML file | No |
| stride_output_file | STRIDE output file | No |
| checkpoint | Checkpoint file name | Yes |
Hardware Settings
| Parameter | Description | Options |
|---|---|---|
| device | Compute device | GPU or CPU |
| ppn | Number of CPU threads | Integer (only used if device=CPU) |
Simulation Control
| Parameter | Description | Default |
|---|---|---|
| restart | Restart from checkpoint | False |
| minimize | Energy minimization | True (if not restarting) |
Running the Simulation
Basic Execution
Run the simulation using the configuration file:
python run_simulation.py -f md.ini
Or using the longer flag:
python run_simulation.py -input md.ini
What Happens During Execution
- Reading Configuration: The script reads md.ini and parses all parameters
- Model Building:
  - Loads the PDB structure
  - Extracts alpha-carbon atoms only
  - Builds bonds, angles, and torsions
  - Sets up force field parameters
  - Adds non-bonded interactions (contacts, electrostatics)
- System Setup:
  - Adds center of mass motion remover
  - Sets up integrator (Langevin dynamics)
  - Configures platform (GPU/CPU)
- Initialization:
  - Sets initial positions (shifted to origin)
  - Initializes velocities at reference temperature
  - Optionally performs energy minimization
- Simulation:
  - Runs MD steps
  - Writes coordinates, checkpoints, and log files at specified intervals
- Finalization:
  - Writes final structure
  - Saves checkpoint
Monitoring Progress
The script prints progress information:
Reading simulation parameters from md.ini file...
Setting number of simulation steps to: 100000
Setting timestep for integration of equations of motion to: 0.015 ps
...
Model built successfully...
Simulation started
--- Finished in X seconds ---
Output Files
After running the simulation, you’ll get several output files:
1. Initial Structure ({protein_code}_init.pdb)
- PDB file of the initial structure after model building
- Useful for visualization and verification
2. Topology File ({protein_code}.psf)
- PSF format topology file
- Contains atom, bond, angle, and dihedral information
3. Trajectory File ({protein_code}.dcd)
- Binary trajectory file containing coordinates at each nstxout step
- Can be analyzed with MD analysis tools (VMD, MDAnalysis, etc.)
4. Log File ({protein_code}.log)
- Text file with simulation progress
- Contains: step, time, potential energy, kinetic energy, total energy, temperature, speed, remaining time
- Tab-separated format for easy parsing
5. Checkpoint File ({protein_code}.chk)
- Binary checkpoint file for restarting simulations
- Saved every nstxout steps
- Contains complete simulation state
6. Final Structure ({protein_code}_final.pdb)
- PDB file of the final frame
- Last structure from the simulation
Example Output Files (for protein_code = P0CX28_clean):
P0CX28_clean_init.pdb # Initial structure
P0CX28_clean.psf # Topology
P0CX28_clean.dcd # Trajectory
P0CX28_clean.log # Log file
P0CX28_clean.chk # Checkpoint
P0CX28_clean_final.pdb # Final structure
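Because the log file is tab-separated, it can be loaded with a few lines of Python. The sketch below assumes the first non-empty line is a header row, possibly prefixed with `#` and with quoted column names; the column names in the usage note are hypothetical, so check the header of your own log. Values are kept as strings because some columns, such as remaining time, may not be numeric.

```python
def read_log(log_text):
    """Parse tab-separated log text into a list of {column: value} dicts."""
    rows = [line.split("\t") for line in log_text.splitlines() if line.strip()]
    if not rows:
        return []
    # Drop a leading '#' and surrounding quotes from the header names.
    header = [name.lstrip("#").strip().strip('"') for name in rows[0]]
    return [dict(zip(header, values)) for values in rows[1:]]


def column_as_floats(records, name):
    """Extract one numeric column, e.g. for plotting or averaging."""
    return [float(record[name]) for record in records]
```

With a column named, say, `Temperature (K)`, the call `column_as_floats(read_log(open("P0CX28_clean.log").read()), "Temperature (K)")` yields the temperature series.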
Restarting Simulations
To continue a simulation from a checkpoint:
- Set restart parameters in md.ini:
restart = yes
checkpoint = P0CX28_clean.chk
- Update md_steps:
  - Set to the total number of steps you want (including previous steps)
  - The script will calculate remaining steps automatically
- Run normally:
python run_simulation.py -f md.ini
Important: When restarting:
- minimize is automatically set to False
- The simulation continues from the checkpoint step
- Trajectory and log files are appended (not overwritten)
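Since md_steps is the total and not the number of additional steps, the arithmetic is worth spelling out. In this sketch, `remaining_steps` is a hypothetical helper that mirrors the description above; it is not taken from the script itself. A run that stopped at step 100,000 and is restarted with md_steps = 250_000 would perform 150,000 further steps.

```python
def remaining_steps(md_steps, checkpoint_step):
    """Steps still to run when md_steps is the total, including earlier steps."""
    if checkpoint_step < 0:
        raise ValueError("checkpoint_step cannot be negative")
    # If the checkpoint is already at or past the target, nothing is left.
    return max(md_steps - checkpoint_step, 0)


print(remaining_steps(md_steps=250_000, checkpoint_step=100_000))
```

Leaving md_steps unchanged on restart therefore results in no additional steps under this reading.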
Troubleshooting
Common Issues
- Missing Dependencies
  ImportError: No module named 'topo'
  Solution: Install the topo package or add it to your Python path
- Invalid PDB File
  Error reading PDB file
  Solution: Verify your PDB file is valid and contains CA atoms
- GPU Not Available
  CUDA error or GPU not found
  Solution:
  - Check CUDA installation
  - Set device = CPU in md.ini
  - Verify GPU is accessible: nvidia-smi
- Missing Residues Warning
  Warning: Missing residues detected
  Solution: The code will warn but not fix missing residues. Manually fix your PDB file before running
- STRIDE Not Found
  STRIDE executable not found
  Solution:
  - Install STRIDE and add to PATH, or
  - Provide pre-generated stride_output_file in config, or
  - Set stride_output_file to empty/None
- Domain Definition Errors
  Error parsing domain.yaml
  Solution:
  - Verify YAML syntax
  - Check residue ranges match your protein
  - Ensure n_residues matches actual residue count
- Pressure Coupling Without PBC
  AssertionError: Pressure coupling requires PBC
  Solution: Set pbc = yes and provide box_dimension when using pressure coupling
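Several of these issues can be caught before launching a run. The following sketch uses only the Python standard library; `preflight` is a hypothetical helper, not part of TOPO. Note that finding `nvidia-smi` on the PATH only suggests a driver is installed and does not guarantee that OpenMM can use the GPU.

```python
import shutil
from pathlib import Path


def preflight(workdir, required=(), optional=(), executables=()):
    """Report missing input files and command-line tools before a run."""
    workdir = Path(workdir)
    return {
        "missing_required": [f for f in required if not (workdir / f).is_file()],
        "missing_optional": [f for f in optional if not (workdir / f).is_file()],
        # shutil.which returns None when a program is not on the PATH.
        "missing_executables": [e for e in executables if shutil.which(e) is None],
    }
```

A typical call for the example in this tutorial would be `preflight(".", required=["P0CX28_clean.pdb", "md.ini", "run_simulation.py"], optional=["domain.yaml", "stride.dat"], executables=["stride", "nvidia-smi"])`.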
Performance Tips
- GPU Acceleration: Use device = GPU for faster simulations (if available)
- Time Step: Larger dt values (0.015-0.02 ps) are typically stable for coarse-grained models
- Output Frequency: Increase nstxout and nstlog to write less often and save disk space, especially for long simulations
- Checkpoint Frequency: Set nstxout appropriately to balance restart capability and I/O overhead
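To see how nstxout affects disk usage, a rough trajectory size estimate helps. The sketch assumes a DCD file stores three 4-byte floats per particle per frame and ignores headers and per-frame record markers, so it slightly underestimates the real size; `estimate_dcd_bytes` is a hypothetical helper. Since the model keeps one particle per residue (alpha-carbons only), `n_particles` is the residue count.

```python
def estimate_dcd_bytes(n_particles, md_steps, nstxout):
    """Approximate trajectory size: frames x particles x 3 coordinates x 4 bytes."""
    frames = md_steps // nstxout
    return frames * n_particles * 3 * 4


# 106-residue example from this tutorial, at two write intervals.
for interval in (5000, 500):
    size = estimate_dcd_bytes(n_particles=106, md_steps=100_000, nstxout=interval)
    print(f"nstxout={interval}: about {size / 1024:.0f} KiB")
```

Writing ten times more often produces a trajectory ten times larger, and checkpoints are written at the same interval.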
Example Workflow Summary
Here’s a complete example workflow:
# 1. Prepare structure
# (Ensure P0CX28_clean.pdb exists)
# 2. Generate STRIDE output
stride P0CX28_clean.pdb > stride.dat
# 3. Create domain.yaml (if needed)
# Edit domain.yaml with your domain definitions
# 4. Create/edit md.ini
# Set all parameters appropriately
# 5. Run simulation
python run_simulation.py -f md.ini
# 6. Check outputs
ls -lh P0CX28_clean.*
tail -f P0CX28_clean.log # Monitor progress
# 7. Analyze results (using your preferred tools)
# VMD, MDAnalysis, custom scripts, etc.
Additional Resources
- TOPO Documentation: See docs/ folder for detailed API documentation
- Examples: Check examples/ folder for more simulation setups
- OpenMM Documentation: https://openmm.org/documentation
- STRIDE: http://webclu.bio.wzw.tum.de/stride/
Quick Reference: Minimal Setup
For a minimal working setup, you only need:
- PDB file: Your protein structure
- md.ini with minimal options:
[OPTIONS]
pdb_file = your_protein.pdb
protein_code = your_protein
checkpoint = your_protein.chk
device = CPU
All other parameters will use defaults. Add optional files (domain.yaml, stride.dat) as needed for your specific simulation requirements.