TOPO tutorial

Step-by-Step Tutorial: Setting Up TOPO Coarse-Grained Simulations

This tutorial provides a comprehensive guide for setting up and running coarse-grained molecular dynamics simulations using the TOPO model, based on the example setup in the testing/Quyen folder.

Table of Contents

  1. Prerequisites
  2. Understanding the Required Files
  3. Step-by-Step Setup
  4. Configuration File Details
  5. Running the Simulation
  6. Output Files
  7. Troubleshooting

Prerequisites

Before starting, ensure you have the following installed:

  • Python 3.x (with required packages: openmm, numpy, parmed, topo)
  • OpenMM library for molecular dynamics simulations
  • STRIDE (optional, for secondary structure analysis)
  • CUDA (optional, for GPU acceleration)

Install the TOPO package and dependencies:

pip install openmm numpy parmed
# Install topo package (adjust path as needed)
pip install -e /path/to/topo

Understanding the Required Files

The simulation setup requires several input files:

1. PDB Structure File (P0CX28_clean.pdb)

  • Contains the atomic coordinates of your protein structure
  • Must be a valid PDB format file
  • Should contain alpha-carbon (CA) atoms or full atom structure (CA atoms will be extracted automatically)
  • Important: Ensure there are no missing residues in the structure

2. Configuration File (md.ini)

  • Contains all simulation parameters
  • Uses INI format with an [OPTIONS] section
  • Controls simulation length, temperature, pressure, output frequency, etc.

3. Domain Definition File (domain.yaml) - Optional

  • Defines protein domains for contact-based interactions
  • YAML format specifying residue ranges and interaction strengths
  • Used to scale non-bonded interactions between different domains

4. STRIDE Output File (stride.dat) - Optional

  • Contains secondary structure assignments from STRIDE analysis
  • Used to identify hydrogen bonds and secondary structure elements
  • Can be generated by running STRIDE on your PDB file

5. Simulation Script (run_simulation.py)

  • Main Python script that reads the configuration and runs the simulation
  • Handles model building, force field setup, and MD integration

Step-by-Step Setup

Step 1: Prepare Your Protein Structure

  1. Obtain or prepare your PDB file
    # Example: Your structure file should be named appropriately
    # e.g., P0CX28_clean.pdb
    
  2. Verify the structure
    • Check that the PDB file is valid
    • Ensure no missing residues (the code will warn but not fix this)
    • Verify the structure contains the residues you want to simulate
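
The checks above can be partly automated. The following is a minimal sketch, not part of TOPO: the helper name check_ca_trace is hypothetical, and it assumes standard fixed-width PDB ATOM records. It counts CA atoms and reports places where residue numbering within a chain jumps by more than one, which often (but not always) indicates missing residues.

```python
def check_ca_trace(pdb_text):
    """Return (n_ca, gaps) for the CA atoms found in PDB-format text.

    gaps is a list of (chain, previous_resnum, next_resnum) tuples where
    the residue numbering within a chain jumps by more than one.
    """
    last_seen = {}  # chain ID -> last residue number seen on that chain
    n_ca = 0
    gaps = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, altLoc 17,
        # chain ID 22, residue number 23-26 (1-based).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        # Keep only the first alternate location (blank or 'A').
        if line[16:17] not in (" ", "A"):
            continue
        chain = line[21]
        resnum = int(line[22:26])
        n_ca += 1
        prev = last_seen.get(chain)
        if prev is not None and resnum - prev > 1:
            gaps.append((chain, prev, resnum))
        last_seen[chain] = resnum
    return n_ca, gaps
```

To use it, read the file with open("P0CX28_clean.pdb").read() and pass the text in. An empty gaps list means the numbering is continuous; it does not prove the structure is otherwise valid.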

Step 2: Generate STRIDE Output (Optional)

If you want to use secondary structure information for contact detection:

# Run STRIDE on your PDB file
stride P0CX28_clean.pdb > stride.dat

The STRIDE output will contain:

  • Secondary structure assignments (helix, strand, coil, turns)
  • Hydrogen bond information
  • Detailed residue-by-residue structure assignments

Note: If you don’t provide stride_output_file in the config, the system may attempt to run STRIDE automatically, but it’s better to provide a pre-generated file.
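
To sanity-check a generated stride.dat before pointing the config at it, a short script can tally the secondary structure codes. This is a sketch under an assumption about the file layout: it treats whitespace-separated lines starting with ASG as per-residue records with the chain in the third field, the PDB residue number in the fourth, and the one-letter code in the sixth. Compare that against your own file before relying on it. The function name is hypothetical and is not how TOPO itself reads the file.

```python
from collections import Counter


def read_stride_assignments(stride_text):
    """Return a list of (chain, residue_number, ss_code) from STRIDE output.

    Residue numbers are kept as strings so insertion codes survive.
    """
    records = []
    for line in stride_text.splitlines():
        fields = line.split()
        # Assumed ASG layout: ASG, residue name, chain, PDB residue number,
        # ordinal number, one-letter code, full name, phi, psi, area.
        if len(fields) >= 6 and fields[0] == "ASG":
            records.append((fields[2], fields[3], fields[5]))
    return records


def count_codes(records):
    """Count how many residues carry each secondary structure code."""
    return Counter(code for _chain, _resnum, code in records)
```

If the number of records differs from the number of residues in the PDB file, the two files are probably out of sync.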

Step 3: Create Domain Definition File (Optional)

If your protein has multiple domains with different interaction strengths:

  1. Create domain.yaml:
    n_residues: 106
    intra_domains:
      A:
        residues: [1-106]
        strength: 2.5044
    
  2. Format explanation:
    • n_residues: Total number of residues in the protein
    • intra_domains: Define domains (A, B, C, etc.)
      • residues: Residue range for the domain (can be [1-50] or [1-50, 60-100])
      • strength: Scaling factor for intra-domain contacts
  3. For multi-domain proteins, you can define multiple domains:
    n_residues: 200
    intra_domains:
      Domain1:
        residues: [1-100]
        strength: 2.5
      Domain2:
        residues: [101-200]
        strength: 2.5
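
A quick consistency check can catch residue ranges that fall outside n_residues before a run. The sketch below is an illustration only: expand_residues and check_domains are hypothetical helpers, not TOPO functions, and TOPO's own parser may accept or reject different inputs. It works on an already-loaded dictionary; with PyYAML, yaml.safe_load should give entries such as [1-106] as the list of strings ['1-106'], which is the form assumed here.

```python
def expand_residues(spec):
    """Expand entries such as "1-50" or 60 into a sorted list of residue numbers."""
    residues = set()
    for item in spec:
        text = str(item).strip()
        if "-" in text:
            start, end = (int(part) for part in text.split("-", 1))
            if start > end:
                raise ValueError(f"range {text!r} runs backwards")
            residues.update(range(start, end + 1))
        else:
            residues.add(int(text))
    return sorted(residues)


def check_domains(config):
    """Return one message per domain whose residues fall outside 1..n_residues."""
    problems = []
    n = config["n_residues"]
    for name, domain in config["intra_domains"].items():
        outside = [r for r in expand_residues(domain["residues"]) if not 1 <= r <= n]
        if outside:
            problems.append(
                f"{name}: residues {outside[0]}..{outside[-1]} outside 1..{n}"
            )
    return problems
```

An empty list from check_domains means every residue in every domain lies within 1 to n_residues.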
    

Step 4: Create Configuration File (md.ini)

Create or modify md.ini with your simulation parameters:

[OPTIONS]
# Simulation parameters
md_steps = 100_000          # Total number of MD steps
dt = 0.015                  # Time step in picoseconds
nstxout = 5000              # Steps between writing coordinates/checkpoint
nstlog = 5000               # Steps between writing log file
nstcomm = 100               # Frequency for center of mass motion removal

# Model selection
model = topo                # Currently only 'topo' is supported

# Temperature coupling
tcoupl = yes                # Enable temperature coupling
ref_t = 300                 # Reference temperature in Kelvin
tau_t = 0.05                # Temperature coupling time constant (ps^-1)

# Pressure coupling (requires PBC)
pcoupl = no                 # Enable pressure coupling
ref_p = 1                   # Reference pressure in bar
frequency_p = 25            # Pressure coupling frequency

# Periodic boundary conditions
pbc = no                    # Enable periodic boundary conditions
box_dimension = 30          # Box size in nm (can be single value or [x, y, z])

# Input files
protein_code = P0CX28_clean # Prefix for output files
pdb_file = P0CX28_clean.pdb # Input structure file
domain_def = domain.yaml    # Domain definition file (optional)
stride_output_file = stride.dat  # STRIDE output file (optional)

# Output files
checkpoint = P0CX28_clean.chk  # Checkpoint file name

# Hardware settings
device = GPU                # Use 'GPU' or 'CPU'
ppn = 4                     # Number of threads (only for CPU)

# Simulation control
restart = no                # Restart from checkpoint
minimize = no               # Perform energy minimization (ignored if restart=yes)
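
If you want to inspect or validate md.ini from your own scripts, Python's standard configparser can read it. Two details of the file above matter: the trailing # comments are only stripped when inline_comment_prefixes is set, and values such as 100_000 are accepted by int(). The sketch below is an assumption about one reasonable way to read the file, not a description of how run_simulation.py parses it; read_options is a hypothetical name, and the fallbacks mirror the defaults listed in this tutorial.

```python
import configparser


def read_options(ini_text):
    """Parse the [OPTIONS] section of an md.ini-style string into a dict."""
    # inline_comment_prefixes makes "value   # comment" parse as "value".
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(ini_text)
    opts = parser["OPTIONS"]
    return {
        "md_steps": opts.getint("md_steps", fallback=1000),
        "dt": opts.getfloat("dt", fallback=0.01),
        "ref_t": opts.getfloat("ref_t", fallback=300.0),
        # getboolean understands yes/no, true/false, on/off and 1/0.
        "tcoupl": opts.getboolean("tcoupl", fallback=True),
        # pdb_file is required, so a missing key raises KeyError.
        "pdb_file": opts["pdb_file"],
    }
```

For a file on disk, pass open("md.ini").read() as ini_text.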

Step 5: Verify File Structure

Your working directory should contain:

working_directory/
├── P0CX28_clean.pdb        # Input structure
├── md.ini                   # Configuration file
├── domain.yaml              # Domain definition (optional)
├── stride.dat               # STRIDE output (optional)
└── run_simulation.py        # Simulation script

Configuration File Details

Simulation Parameters

Parameter   Description                                  Default   Units
md_steps    Total number of MD steps                     1000      steps
dt          Integration time step                        0.01      ps
nstxout     Steps between coordinate/checkpoint writes   10        steps
nstlog      Steps between log file writes                10        steps
nstcomm     Frequency of COM motion removal              100       steps
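
These parameters together fix the simulated time and the amount of output. With the example values from md.ini (md_steps = 100_000, dt = 0.015, nstxout = 5000), the run covers 100,000 × 0.015 ps = 1,500 ps = 1.5 ns and writes roughly 100,000 / 5,000 = 20 trajectory frames, one every 75 ps. The helper below is a hypothetical convenience for this arithmetic; whether an extra frame is written at step 0 depends on the script and is not assumed here.

```python
def run_summary(md_steps, dt_ps, nstxout):
    """Return total simulated time, frame count and frame spacing for a run."""
    total_ps = md_steps * dt_ps
    return {
        "total_ns": total_ps / 1000.0,     # 1 ns = 1000 ps
        "frames": md_steps // nstxout,     # one frame every nstxout steps
        "ps_per_frame": nstxout * dt_ps,   # simulated time between frames
    }
```

The same calculation with nstlog in place of nstxout gives the number of log lines.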

Temperature and Pressure

Parameter     Description                          Default   Units
tcoupl        Enable temperature coupling          True      boolean
ref_t         Reference temperature                300.0     K
tau_t         Temperature coupling time constant   0.01      ps^-1
pcoupl        Enable pressure coupling             False     boolean
ref_p         Reference pressure                   1.0       bar
frequency_p   Pressure coupling frequency          25        steps

Important Notes:

  • Pressure coupling requires periodic boundary conditions: if pcoupl = yes, then pbc must also be yes
  • box_dimension can be a single number (cubic box) or [x, y, z] (rectangular box)
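
The two rules above are easy to check before launching a run. The following sketch is illustrative only: both function names are hypothetical, and it assumes box_dimension arrives as the raw string from md.ini, either a single number or a bracketed list of three numbers.

```python
def parse_box_dimension(value):
    """Return the box as an (x, y, z) tuple in nm from '30' or '[30, 30, 40]'."""
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        parts = [float(p) for p in text[1:-1].split(",")]
        if len(parts) != 3:
            raise ValueError("box_dimension list must have exactly 3 values")
        return tuple(parts)
    side = float(text)  # a single number means a cubic box
    return (side, side, side)


def check_pressure_settings(pcoupl, pbc):
    """Raise ValueError if pressure coupling is requested without PBC."""
    if pcoupl and not pbc:
        raise ValueError("pcoupl = yes requires pbc = yes")
```

For example, parse_box_dimension("30") gives a cubic box of side 30 nm, and check_pressure_settings(True, False) raises an error instead of letting the run fail later.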

File Paths

Parameter            Description                   Required
pdb_file             Input PDB structure file      Yes
protein_code         Prefix for all output files   Yes
domain_def           Domain definition YAML file   No
stride_output_file   STRIDE output file            No
checkpoint           Checkpoint file name          Yes

Hardware Settings

Parameter   Description             Options
device      Compute device          GPU or CPU
ppn         Number of CPU threads   Integer (only used if device=CPU)

Simulation Control

Parameter   Description               Default
restart     Restart from checkpoint   False
minimize    Energy minimization       True (if not restarting)

Running the Simulation

Basic Execution

Run the simulation using the configuration file:

python run_simulation.py -f md.ini

Or, equivalently, using the -input flag:

python run_simulation.py -input md.ini

What Happens During Execution

  1. Reading Configuration: The script reads md.ini and parses all parameters
  2. Model Building:
    • Loads the PDB structure
    • Extracts alpha-carbon atoms only
    • Builds bonds, angles, and torsions
    • Sets up force field parameters
    • Adds non-bonded interactions (contacts, electrostatics)
  3. System Setup:
    • Adds center of mass motion remover
    • Sets up integrator (Langevin dynamics)
    • Configures platform (GPU/CPU)
  4. Initialization:
    • Sets initial positions (shifted to origin)
    • Initializes velocities at reference temperature
    • Optionally performs energy minimization
  5. Simulation:
    • Runs MD steps
    • Writes coordinates, checkpoints, and log files at specified intervals
  6. Finalization:
    • Writes final structure
    • Saves checkpoint

Monitoring Progress

The script prints progress information:

Reading simulation parameters from md.ini file...
Setting number of simulation steps to: 100000
Setting timestep for integration of equations of motion to: 0.015 ps
...
Model built successfully...
Simulation started
--- Finished in X seconds ---

Output Files

After running the simulation, you’ll get several output files:

1. Initial Structure ({protein_code}_init.pdb)

  • PDB file of the initial structure after model building
  • Useful for visualization and verification

2. Topology File ({protein_code}.psf)

  • PSF format topology file
  • Contains atom, bond, angle, and dihedral information

3. Trajectory File ({protein_code}.dcd)

  • Binary trajectory file with one frame written every nstxout steps
  • Can be analyzed with MD analysis tools (VMD, MDAnalysis, etc.)

4. Log File ({protein_code}.log)

  • Text file with simulation progress
  • Contains: step, time, potential energy, kinetic energy, total energy, temperature, speed, remaining time
  • Tab-separated format for easy parsing
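
Because the log is tab-separated, it can be loaded with the standard csv module. The sketch below assumes the first line is a header row naming the columns; the exact column names in your log may differ from the ones used in any example, so inspect the file first. read_log is a hypothetical helper, not part of TOPO.

```python
import csv
import io


def read_log(log_text):
    """Parse a tab-separated log with one header line into a list of dicts."""
    reader = csv.reader(io.StringIO(log_text), delimiter="\t")
    # Tolerate a leading '#' and quoted names in the header line.
    header = [name.lstrip("#").strip().strip('"') for name in next(reader)]
    rows = []
    for values in reader:
        if values:  # skip blank lines
            rows.append(dict(zip(header, values)))
    return rows
```

Values are returned as strings; convert the columns you need with float() or int() before plotting or averaging.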

5. Checkpoint File ({protein_code}.chk)

  • Binary checkpoint file for restarting simulations
  • Saved every nstxout steps
  • Contains complete simulation state

6. Final Structure ({protein_code}_final.pdb)

  • PDB file of the final frame
  • Last structure from the simulation

Example Output Files (for protein_code = P0CX28_clean):

P0CX28_clean_init.pdb    # Initial structure
P0CX28_clean.psf         # Topology
P0CX28_clean.dcd         # Trajectory
P0CX28_clean.log         # Log file
P0CX28_clean.chk         # Checkpoint
P0CX28_clean_final.pdb   # Final structure

Restarting Simulations

To continue a simulation from a checkpoint:

  1. Set restart parameters in md.ini:
    restart = yes
    checkpoint = P0CX28_clean.chk
    
  2. Update md_steps:
    • Set to the total number of steps you want (including previous steps)
    • The script will calculate remaining steps automatically
  3. Run normally:
    python run_simulation.py -f md.ini
    

Important: When restarting:

  • minimize is automatically set to False
  • The simulation continues from the checkpoint step
  • Trajectory and log files are appended (not overwritten)
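
Since md_steps is the total across all runs, the number of steps actually executed on restart is the difference between md_steps and the step stored in the checkpoint. A minimal sketch of that arithmetic follows; the function is hypothetical, and the script's own handling of edge cases may differ.

```python
def remaining_steps(md_steps, checkpoint_step):
    """Return how many steps are left when restarting at checkpoint_step."""
    if checkpoint_step < 0:
        raise ValueError("checkpoint_step cannot be negative")
    # If md_steps was not increased past the checkpoint, nothing is left to run.
    return max(md_steps - checkpoint_step, 0)
```

For example, to extend a finished 100,000-step run by 50,000 steps, set md_steps = 150_000; remaining_steps(150_000, 100_000) is then 50,000. Leaving md_steps at 100_000 would leave nothing to run.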

Troubleshooting

Common Issues

  1. Missing Dependencies
    ImportError: No module named 'topo'
    

    Solution: Install the topo package or add it to your Python path

  2. Invalid PDB File
    Error reading PDB file
    

    Solution: Verify your PDB file is valid and contains CA atoms

  3. GPU Not Available
    CUDA error or GPU not found
    

    Solution:

    • Check CUDA installation
    • Set device = CPU in md.ini
    • Verify GPU is accessible: nvidia-smi
  4. Missing Residues Warning
    Warning: Missing residues detected
    

    Solution: The code will warn but not fix missing residues. Manually fix your PDB file before running

  5. STRIDE Not Found
    STRIDE executable not found
    

    Solution:

    • Install STRIDE and add to PATH, or
    • Provide pre-generated stride_output_file in config, or
    • Set stride_output_file to empty/None
  6. Domain Definition Errors
    Error parsing domain.yaml
    

    Solution:

    • Verify YAML syntax
    • Check residue ranges match your protein
    • Ensure n_residues matches actual residue count
  7. Pressure Coupling Without PBC
    AssertionError: Pressure coupling requires PBC
    

    Solution: Set pbc = yes and provide box_dimension when using pressure coupling
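
Several of the issues above (missing input files, STRIDE or nvidia-smi not on PATH) can be caught with a short pre-flight check before starting a run. This is a sketch using only the Python standard library; the function name and default arguments are hypothetical, and a missing optional tool is reported for information rather than treated as fatal.

```python
import os
import shutil


def preflight(workdir, required=("md.ini",), optional_tools=("stride", "nvidia-smi")):
    """Return a list of human-readable problems found in workdir and on PATH."""
    problems = []
    for name in required:
        if not os.path.isfile(os.path.join(workdir, name)):
            problems.append(f"missing file: {name}")
    for tool in optional_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"not on PATH: {tool}")
    return problems
```

For example, preflight(".", required=("md.ini", "P0CX28_clean.pdb")) lists anything missing from the current directory; an empty list means all checks passed.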

Performance Tips

  1. GPU Acceleration: Use device = GPU for faster simulations (if available)
  2. Time Step: Larger dt values (0.015-0.02 ps) are typically stable for coarse-grained models
  3. Output Frequency: Increase nstxout and nstlog (write less often) to save disk space, especially for long simulations
  4. Checkpoint Frequency: Set nstxout appropriately to balance restart capability and I/O overhead

Example Workflow Summary

Here’s a complete example workflow:

# 1. Prepare structure
# (Ensure P0CX28_clean.pdb exists)

# 2. Generate STRIDE output
stride P0CX28_clean.pdb > stride.dat

# 3. Create domain.yaml (if needed)
# Edit domain.yaml with your domain definitions

# 4. Create/edit md.ini
# Set all parameters appropriately

# 5. Run simulation
python run_simulation.py -f md.ini

# 6. Check outputs
ls -lh P0CX28_clean.*
tail -f P0CX28_clean.log  # Monitor progress

# 7. Analyze results (using your preferred tools)
# VMD, MDAnalysis, custom scripts, etc.

Additional Resources

  • TOPO Documentation: See docs/ folder for detailed API documentation
  • Examples: Check examples/ folder for more simulation setups
  • OpenMM Documentation: https://openmm.org/documentation
  • STRIDE: http://webclu.bio.wzw.tum.de/stride/

Quick Reference: Minimal Setup

For a minimal working setup, you only need:

  1. PDB file: Your protein structure
  2. md.ini with minimal options:
    [OPTIONS]
    pdb_file = your_protein.pdb
    protein_code = your_protein
    checkpoint = your_protein.chk
    device = CPU
    

All other parameters will use defaults. Add optional files (domain.yaml, stride.dat) as needed for your specific simulation requirements.



