[modeller_usage] Best workflow for modeling a disordered loop in a complicated structure

Hello,

I have used Modeller to try to model a loop region in a tetrameric structure.

To reduce computation time, I started with just the dimer (which I think also makes sense biologically) and ran the script below with an .ali file I created. The region of interest is a disordered loop, roughly residues 399-417. I then used the evaluate-model script to pick the best model, which I used as the initial model for the loop.py script. I run loop.py to generate many loop models (1000), and when it finishes I take the top 20 by DOPE score and try to analyze them. Any advice on how I could improve my method would be great. I really am at the edge of my understanding here.

I've edited the name of the protein. I can include it as soon as I can verify that this can stay anonymous.

Thanks!!!

Scripts and files:

1st) Build the dimer model using a template PDB and an .ali file I made myself for the dimer (BIO headers in Chimera, then deletion of one dimer of the tetramer).

Input:

>P1;ABCD

structureX:ABCD.pdb: 3 :A:+990:B:::-1.00:-1.00

--TSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVE

RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF

YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY

VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT

VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEL

AQELHVWTDKSSLFEELQSLDIFLAS----------------SIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF

ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES/--TSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVE

RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF

YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY

VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT

VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEL

AQELHVWTDKSSLFEELQSLDIFLAS----------------SIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF

ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES*

>P1;X

sequence:ABCD: : : : ::: 0.00: 0.00

MSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFDMDYTLAVYKSPEYESLGFELTVE

RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF

YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY

VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT

VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEF

AQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF

ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES/MSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFDMDYTLAVYKSPEYESLGFELTVE

RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF

YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY

VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT

VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEF

AQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF

ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES*

# Homology modeling by the automodel class
# Demonstrates how to build multi-chain models, and symmetry restraints
from modeller import *
from modeller.automodel import *    # Load the automodel class

log.verbose()

# Override the 'special_restraints' and 'user_after_single_model' methods:
class MyModel(automodel):
    def special_restraints(self, aln):
        # Constrain the A and B chains to be identical
        s1 = selection(self.chains['A']).only_atom_types('CA')
        s2 = selection(self.chains['B']).only_atom_types('CA')
        self.restraints.symmetry.append(symmetry(s1, s2, 1))

    def user_after_single_model(self):
        # Report on symmetry violations greater than 1A after building
        # each model:
        self.restraints.symmetry.report(1)

env = environ()
# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

# Be sure to use 'MyModel' rather than 'automodel' here!
a = MyModel(env,
            alnfile  = 'WT_dimer_NoCterm.ali',  # alignment filename
            knowns   = 'ABCD',                  # codes of the templates
            sequence = 'ABCD')                  # code of the target
a.starting_model = 1                # index of the first model
a.ending_model   = 20               # index of the last model
                                    # (determines how many models to calculate)
a.make()                            # do homology modeling

2nd) Run the evaluate-model script and pick the top model to use as input for the loop.py script below:

# Loop refinement of an existing model
from modeller import *
from modeller.automodel import *

log.verbose()
env = environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

# Create a new class based on 'loopmodel' so that we can redefine
# select_loop_atoms (necessary)
class MyLoop(loopmodel):
    # This routine picks the residues to be refined by loop modeling
    def select_loop_atoms(self):
        # disordered loop, residues 399-417 of chain A
        return selection(self.residue_range('399:A', '417:A'))

m = MyLoop(env,
           inimodel='ABCD_Top_Model.pdb',   # initial model of the target
           sequence='ABCD')                 # code of the target

m.loop.starting_model = 1           # index of the first loop model
m.loop.ending_model   = 1000        # index of the last loop model
m.loop.md_level = refine.slow       # loop refinement method; refine.slow is
                                    # slower than refine.fast but is meant
                                    # to give better models

m.make()

3rd) Run the model_energies script.

Then I rank the models using this Perl one-liner (it just cats the logs and displays the DOPE score and the file name). It can run in any Unix terminal.

cat model*.log |perl -lane' print if /DOPE\ score/ || /ABCD.BL/'| perl -pe 's/ +/\t/g'|perl -pe 's/:\t//g' |cut -f3,4|perl -pe 's/\n/\t/'|perl -pe 's/(\d)\t(ABCD)/$1\n$2/g'|sort -k2,2n|head -20
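For anyone who prefers Python, the same ranking could be sketched roughly as below. This is only a sketch under assumptions: it assumes each log mentions the loop model file name (something like ABCD.BL00010001.pdb) before a line of the form "DOPE score : <number>", which is what the Perl patterns above look for. The function name rank_by_dope and both regular expressions are my own and are not part of Modeller.

```python
import re

# Assumed log layout (not verified against every Modeller version):
# a line naming the loop model file, followed later by a "DOPE score : N" line.
MODEL_RE = re.compile(r"(\S+\.BL\d+\.pdb)")
DOPE_RE = re.compile(r"DOPE score\s*:\s*(-?\d+(?:\.\d+)?)")


def rank_by_dope(log_text, top_n=20):
    """Return up to top_n (model, score) pairs, lowest (best) DOPE first."""
    scores = {}
    current = None  # most recently seen model file name
    for line in log_text.splitlines():
        model_match = MODEL_RE.search(line)
        if model_match:
            current = model_match.group(1)
        dope_match = DOPE_RE.search(line)
        if dope_match and current is not None:
            scores[current] = float(dope_match.group(1))
            current = None  # wait for the next model name
    # DOPE is an energy-like score, so more negative is better.
    return sorted(scores.items(), key=lambda item: item[1])[:top_n]
```

To use it, read all the model*.log files into one string (for example by joining the contents of the files returned by glob.glob('model*.log')) and pass that string to rank_by_dope.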

4th) Take the top 20 models and analyze them.

Any help with where I'm going wrong or how I can improve would be awesome. How many loop models would you recommend generating with loop.py to get a good answer? Is there a way to interpret the models as a whole? For example, to say something like "60% of the time residue A is next to residue B and 20% of the time it's over in this pocket, therefore..." etc.
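To make the ensemble question concrete, here is a minimal sketch of the kind of statistic I have in mind: the fraction of models in which two residues are in contact. It is plain Python with no Modeller calls. The assumptions are mine: a "contact" means the two CA atoms are within 8 Å (an arbitrary cutoff), the models are standard fixed-column PDB files, and the names ca_coords and contact_frequency are made up for illustration.

```python
import math


def ca_coords(pdb_text):
    """Map (chain ID, residue number) -> (x, y, z) for CA atoms of one model."""
    coords = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB ATOM record: atom name in columns 13-16,
        # chain in column 22, residue number in columns 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54]))
    return coords


def contact_frequency(models, res_a, res_b, cutoff=8.0):
    """Fraction of models where the CA atoms of res_a and res_b are within cutoff.

    models is an iterable of PDB file contents (strings); res_a and res_b
    are (chain ID, residue number) tuples. Models missing either residue
    are skipped. The 8 A default cutoff is an assumption, not a standard.
    """
    hits = 0
    counted = 0
    for pdb_text in models:
        coords = ca_coords(pdb_text)
        if res_a in coords and res_b in coords:
            counted += 1
            if math.dist(coords[res_a], coords[res_b]) <= cutoff:
                hits += 1
    return hits / counted if counted else 0.0
```

For example, contact_frequency(texts, ('A', 405), ('B', 120)) over the top 20 loop models would give the fraction of those models in which the two (hypothetical) residues are close. Whether a fraction computed over 20 DOPE-ranked models means anything statistically is exactly what I am unsure about.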

My major aim is to compare different ligand-bound crystal states, or mutation states, and show that the models created differ between them. I would then use this to design wet-lab experiments to test those differences.

Thanks again