Dear Modeller Users,
I am trying to use
Modeller to model a comparative protein structure for the type I
collagen.
I came to this trying to reproduce a model from this paper: doi 10.1021/nl103943u (
Nano Lett. 2011, 11, 2
)
It
also used Modeller to create such a model. I contacted the 1st author
for instructions on how he created the model, but unfortunately he said
not to remember anymore (it has been 10 years since).
I also want to mention that before posting this, I read the
Modeller manual and several questions and answers posted in the mail
listing. They helped me get to this point.
In the following I will describe in some details what I have done, with what files and how:
All necessary files for reproduction are attached in this email (I tried to attach all files - pdb, target fasta, outputs, etc. - , but it exceeded the limit of 100 KB).
1 - Target: P02452 (CO1A1_HUMAN) and
P08123 (CO1A2_HUMAN)
The Targets are in the Fasta format. I converted them to .ali files in the PIR format, as suggested by Ben Webb in https://salilab.org/archives/modeller_usage/2010/msg00072.html .
(see FASTAtoPIR.py in the attached files).
I manually edited the protein code and field2 of the .ali files (based on what I read in the manual - File formats B.1 Alignment file (PIR))
That is how it looks like now:
COL1A1.ali
___________________________________________________________________
>P1;COL1A1
sequence:COL1A1: : : : :::-1.00:-1.00
MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRICVCDNGKVLCDDVIC
...*
___________________________________________________________________
2 - Template: 3hr2.pdb
It is very important to me to get a model as close to this structure as possible, since it comprises the collagen fibrillar structure.
3 - Alignment: I decided to align each chain and sequence individually, as many answers in the mail listing suggested.
First of all I tried to align the COL1A1.ali with the chain A of the 3hr2.pdb structure.
I aligned the sequences using salign (align2d was taking too long).
The 3hr2.pdb contains 2 non standard amino acids:
HYP hydroxyproline
LYZ Hydroxylysine
I added - env.io.hetatm = True - in order to read them.
1A_salign.py
___________________________________________________________________
env = environ()
aln = alignment(env)
env.io.hetatm = True
mdl = model(env, file='3hr2', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='3hr2A', atom_files='3hr2.pdb')
aln.append(file='COL1A1.ali', align_codes='COL1A1')
aln.salign()
aln.write(file='COL1A1-3hr2A.ali', alignment_format='PIR')
aln.write(file='COL1A1-3hr2A.pap', alignment_format='PAP')
___________________________________________________________________
I added to restyp.lib:
HETATM | HYP | O | | HYP | hydroxyproline
HETATM | LYZ | K | | LYS | lysine, +1
I have HYP topology and parameters, but since I don't have them for LYZ I chose to treat LYZ as LYS (Lysine).
I do not know if I can do it. Is it better to do so or to simply let Modeller continue with the following warning?
read_pd_459W> Residue type LYZ not recognized. 'automodel' model building will treat this residue as a rigid body.
4 - Automodel: I
created 10 models and chose the one with lowest DOPE score (all scored
1.0 in the GA341 score), in my case, COL1A1.B99990010.pdb.
I included HYP topology and parameters information to the top_heav.lib and par.lib files from the Modeller Library. Furthermore I added the
HB1 (new type in HYP top) atom type information to solv.lib, radii.lib and radii14.lib.
The HYP information was taken from
top_all36_prot_modify_res.str and par_all36_prot_modify_res.str in the CHARMm FF.
2A_model-single.py
___________________________________________________________________
from modeller.automodel import *
env = environ()
env.io.hetatm = True
env.libs.topology.read('${LIB}/top_heav.lib')
env.libs.parameters.read('$(LIB)/par.lib') # from manual
a = automodel(env, alnfile='COL1A1-3hr2A.ali',
knowns='3hr2A', sequence='COL1A1',
assess_methods=(assess.DOPE,
assess.GA341))
a.starting_model = 1
a.ending_model = 10
a.make()
___________________________________________________________________
5 - Final model: I
also ran the 3A_evaluate_model and 4A_plot_profiles python scripts and
after that opened the 3hr2 A chain and the final model in VMD.
The
model looks good, however, it is not as close to the 3hr2 structured as
I wished. As a mechanical engineer I am no used to those models nor to
homology modeling and have been learning all here by reading the manual,
instructions and mail listing. Did I make many/any mistakes? Where and
what? Can the model become better (more close to the 3hr2 template
structure)? If yes, how? How to know if a model is "good enough"? Can I
proceed like that and create the models for the chains B and C and
complete the collagen structure?
I am afraid to lose the fibrillar collagen structure (which in this case is similar to a crystalline periodic structure) from 3hr2.pdb if the model is not very similar to the the 3hr2 structure.
Thank
you in advance for reading up to here. I am looking forward to reading your observations, suggestions and corrections.
They are all very welcome.
Amadeus