I have used MODELLER to try to model a loop region in a tetrameric structure.
To reduce computation time I started with just the dimer (which I think also makes sense biologically) and ran the automodel script below with an .ali alignment file I created. The region of interest is a disordered loop, roughly residues 399-417. I then used the model evaluation script to pick the best model, which I used as the input model for the loop.py script. I run loop.py to generate 1000 loop models, take the top 20 by DOPE score, and try to analyze them. Any advice on how I could improve this method would be great; I really am at the edge of my understanding here.
I've changed the name of the protein. I can include it as soon as I can verify that this can stay anonymous.
>P1;ABCD
structureX:ABCD.pdb: 3 :A:+990:B:::-1.00:-1.00
--TSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVE
RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF
YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY
VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT
VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEL
AQELHVWTDKSSLFEELQSLDIFLAS----------------SIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF
ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES/--TSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVE
RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF
YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY
VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT
VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEL
AQELHVWTDKSSLFEELQSLDIFLAS----------------SIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF
ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES*
>P1;X
sequence:ABCD: : : : ::: 0.00: 0.00
MSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFDMDYTLAVYKSPEYESLGFELTVE
RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF
YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY
VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT
VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEF
AQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF
ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES/MSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFDMDYTLAVYKSPEYESLGFELTVE
RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF
YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY
VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT
VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEF
AQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF
ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES*
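As a sanity check on the loop boundaries, the target residues that have no template coordinates can be listed directly from the two aligned sequences. This is only a minimal sketch in plain Python: the function name `template_gap_runs` and the short toy sequences are made up for illustration, and it assumes the model is numbered from 1 within each chain, with `/` marking chain breaks as in the alignment above.

```python
def template_gap_runs(template, target):
    """List (chain_index, start, end) target ranges aligned to '-' in the template.

    Residue numbers are 1-based and restart after every '/' chain break.
    """
    template = template.rstrip('*')
    target = target.rstrip('*')
    if len(template) != len(target):
        raise ValueError("aligned sequences differ in length")
    runs = []
    chain, resnum, start = 0, 0, None
    for t, q in zip(template, target):
        if q == '/':
            # Chain break: close any open run and restart the numbering
            if start is not None:
                runs.append((chain, start, resnum))
                start = None
            chain += 1
            resnum = 0
            continue
        if q != '-':
            resnum += 1
        missing = (t == '-' and q != '-')
        if missing and start is None:
            start = resnum
        elif not missing and start is not None:
            # The run ended on the previous target residue
            runs.append((chain, start, resnum - 1 if q != '-' else resnum))
            start = None
    if start is not None:
        runs.append((chain, start, resnum))
    return runs


# Toy two-chain alignment (not the real sequences)
toy_template = "--TSWAS----SIQ/--TSWAS----SIQ*"
toy_target = "MSTSWASELYKSIQ/MSTSWASELYKSIQ*"
for chain, first, last in template_gap_runs(toy_template, toy_target):
    print("chain %d: residues %d-%d have no template" % (chain, first, last))
```

Running the same function on the full template and target strings would show which residue range the loop refinement needs to cover at minimum.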
# Homology modeling by the automodel class
#
# Demonstrates how to build multi-chain models, and symmetry restraints
#
from modeller import *
from modeller.automodel import *    # Load the automodel class

log.verbose()

# Override the 'special_restraints' and 'user_after_single_model' methods:
class MyModel(automodel):
    def special_restraints(self, aln):
        # Constrain the CA atoms of chains A and B to be identical
        s1 = selection(self.chains['A']).only_atom_types('CA')
        s2 = selection(self.chains['B']).only_atom_types('CA')
        self.restraints.symmetry.append(symmetry(s1, s2, 1))

    def user_after_single_model(self):
        # Report on symmetry violations greater than 1A after building
        # each model:
        self.restraints.symmetry.report(1)

env = environ()
# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

# Be sure to use 'MyModel' rather than 'automodel' here!
a = MyModel(env,
            alnfile  = 'WT_dimer_NoCterm.ali',  # alignment filename
            knowns   = 'ABCD',                  # codes of the templates
            sequence = 'ABCD')                  # code of the target

a.starting_model = 1                # index of the first model
a.ending_model   = 20               # index of the last model
                                    # (determines how many models to calculate)
a.make()                            # do homology modeling
2) Run the model evaluation script and pick the top model to use as input for the loop.py script below:
# Loop refinement of an existing model
from modeller import *
from modeller.automodel import *

log.verbose()
env = environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

# Create a new class based on 'loopmodel' so that we can redefine
# select_loop_atoms (necessary)
class MyLoop(loopmodel):
    # This routine picks the residues to be refined by loop modeling
    def select_loop_atoms(self):
        # Loop residues 399-417 of chain A only (chain B is not refined here)
        return selection(self.residue_range('399:A', '417:A'))

m = MyLoop(env,
           inimodel='ABCD_Top_Model.pdb',  # initial model of the target
           sequence='ABCD')                # code of the target

m.loop.starting_model = 1           # index of the first loop model
m.loop.ending_model   = 1000        # index of the last loop model
m.loop.md_level = refine.slow       # thorough loop refinement; refine.fast
                                    # is quicker but gives lower-quality
                                    # models
m.make()
3) Run the model_energies script.
I then rank the models with this Perl one-liner, which pulls the DOPE score and the file name out of the logs and sorts by score. It runs in any Unix terminal.
cat model*.log |perl -lane' print if /DOPE\ score/ || /ABCD.BL/'| perl -pe 's/ +/\t/g'|perl -pe 's/:\t//g' |cut -f3,4|perl -pe 's/\n/\t/'|perl -pe 's/(\d)\t(ABCD)/$1\n$2/g'|sort -k2,2n|head -20
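The same ranking could probably be done in Python without parsing logs. The sketch below assumes that DOPE assessment is switched on before `m.make()` (for example `m.loop.assess_methods = assess.DOPE`), so that each entry of `m.loop.outputs` is a dict with `'name'`, `'failure'` and `'DOPE score'` keys. That is my understanding of the MODELLER output list and should be checked against the installed version. The helper name `rank_by_dope` and the sample entries are hypothetical.

```python
def rank_by_dope(outputs, top_n=20):
    """Return (file name, DOPE score) for the top_n models, best (lowest) first.

    'outputs' is assumed to look like MODELLER's m.loop.outputs:
    a list of dicts with 'name', 'failure' and 'DOPE score' keys.
    """
    # Skip models whose optimization failed or that carry no score
    ok = [o for o in outputs
          if o.get('failure') is None and o.get('DOPE score') is not None]
    # Lower (more negative) DOPE scores rank first
    ok.sort(key=lambda o: o['DOPE score'])
    return [(o['name'], o['DOPE score']) for o in ok[:top_n]]


# Made-up entries standing in for m.loop.outputs
sample = [
    {'name': 'ABCD.BL00010001.pdb', 'failure': None, 'DOPE score': -101500.0},
    {'name': 'ABCD.BL00020001.pdb', 'failure': None, 'DOPE score': -102300.0},
    {'name': 'ABCD.BL00030001.pdb', 'failure': 'optimizer error'},
]
for name, score in rank_by_dope(sample, top_n=2):
    print("%s\t%.1f" % (name, score))
```

In the loop script this would be called as `rank_by_dope(m.loop.outputs)` after `m.make()` has finished.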
4) Take the top 20 models and analyze them.
Any help with where I'm going wrong or how I can improve would be awesome. How many loop models would you recommend generating in loop.py to get a good answer? Is there a way to interpret the models as a whole? For example, to say something like "60% of the time residue A is next to residue B, and 20% of the time it's over in this pocket, therefore ...".
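A minimal sketch of the kind of ensemble summary meant here: for a set of models, count the fraction in which two residues are in contact. It uses only standard fixed-column PDB parsing and the standard library (`math.dist` needs Python 3.8 or later). The 8 Å CA-CA cutoff, the function names and the residue numbers in the usage note are illustrative assumptions, not an established protocol.

```python
import math


def ca_coords(pdb_lines):
    """Map (chain, residue number) to CA coordinates from PDB ATOM records."""
    coords = {}
    for line in pdb_lines:
        if line.startswith('ATOM') and line[12:16].strip() == 'CA':
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54]))
    return coords


def load_models(paths):
    """Read CA coordinates from each PDB file in 'paths'."""
    models = []
    for path in paths:
        with open(path) as handle:
            models.append(ca_coords(handle))
    return models


def contact_fraction(models, res_a, res_b, cutoff=8.0):
    """Fraction of models with the CA atoms of res_a and res_b within cutoff (A)."""
    if not models:
        raise ValueError("no models given")
    hits = 0
    for coords in models:
        if math.dist(coords[res_a], coords[res_b]) <= cutoff:
            hits += 1
    return hits / len(models)
```

With a list of the top 20 file names, `contact_fraction(load_models(top20_paths), ('A', 405), ('A', 120))` would give the fraction of models in which loop residue 405 sits near residue 120 (both numbers are placeholders). Repeating this for each loop residue against candidate partner residues gives a contact-frequency table that can be compared between ligand-bound or mutant states.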
My main aim is to compare different ligand-bound crystal states, or mutation states, and show that the resulting models differ between them, and then to use this to design wet-lab experiments that test the predictions.
Thanks again