Hello Cvetan,
from my own experience I know that Modeller is quite picky about the
sequences in the alignment matching the PDB files exactly (as it should be).
Anyhow, I would not use ClustalW alignments in the first place. When you model
a particular protein using several templates, you are *much* better off using
structural alignments instead of sequence-based alignments alone. You could
use, for example, T-Coffee together with SAP (or maybe FUGUE, but I have no
experience with it). You would then use your PDB files from the start in the
alignment process, not unrelated GenBank sequences, and Modeller would thus
find all the residues it needs in the alignment. Alternatively, there are
structural alignments readily available at HOMSTRAD for a large number of
proteins.
Note also that Modeller comes with a file modlib/CHAINS_all.seq. If you take
the template sequences from this file for your ClustalW alignments, they
should work with Modeller.
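If you would rather check your own PDB files, here is a minimal sketch in
plain Python (it is not part of Modeller) of how the template sequence could
be read straight from the ATOM records, compared with the aligned sequence,
and written as a PIR entry including the "second" line. The function names
and the code "1abc" are made up for illustration, non-standard residues are
simply written as "X", and the second line follows the PIR layout described
in the Modeller manual, so please check it against your Modeller version.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_residues(pdb_lines, chain_id):
    """Return [(residue_label, one_letter_code), ...] for one chain.

    Only ATOM records of the first model are used. The residue label is the
    residue number plus the insertion code, if there is one.
    """
    residues = []
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        label = line[22:27]  # columns 23-27: residue number + insertion code
        if label in seen:
            continue  # further atoms of a residue already recorded
        seen.add(label)
        # Assumption: anything outside the 20 standard residues becomes "X".
        residues.append((label.strip(), THREE_TO_ONE.get(line[17:20], "X")))
    return residues


def same_sequence(aligned_seq, residues):
    """True if the aligned sequence, without gaps, equals the PDB sequence."""
    ungapped = aligned_seq.replace("-", "").replace("*", "").upper()
    return ungapped == "".join(aa for _, aa in residues)


def pir_entry(code, chain_id, residues, width=75):
    """Format one template as a PIR entry with the second (structure) line."""
    first, last = residues[0][0], residues[-1][0]
    seq = "".join(aa for _, aa in residues) + "*"
    # Fields: type:code:first:chain:last:chain:name:source:resolution:R-factor
    second = "structureX:%s:%s:%s:%s:%s::::" % (
        code, first, chain_id, last, chain_id)
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">P1;" + code, second] + body) + "\n"
```

For a real file you would do something like
chain_residues(open("1abc.pdb").readlines(), "A") and print the result of
pir_entry("1abc", "A", ...). Using the PDB code as the entry code also keeps
the ID codes in the alignment in step with the atom file names.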
Hope this helps,
Kind regards,
Karsten.
> I'm trying to model 400+ proteins based on ~100 templates. I have an
> alignment file of 1500+ sequences comprising the templates, targets and
> others. ClustalW was used to align the sequences.
>
> I have a few problems.
> - The sequences in the alignment file do not match the amino acids present
>   in the PDB files. _Generally_ the PDB files contain more residues than
>   specified in the aligned sequence. Therefore I have to either concatenate
>   the PDB files or specify the appropriate residues in the alignment file.
> - The ID codes in the alignment file do not match the atom file names.
> - There is no "second" line in each entry in the alignment file.
>
> Although all this can be done manually, I can't help but wonder if there is
> a way to automate/expedite the process. A paper published by Sanchez and
> Sali (1998) mentioned a Perl script that allowed for rapid progress through
> the various steps involved with modelling. Suggestions will be most
> appreciated.
>
> Some of the PDB files are complexes. If it can be avoided I'd prefer not
> to use these structures. However, if I do decide to use some of them, I
> plan to minimise the energy of the protein (minus the ligand) via MD (CNS)
> before using it as a template. What are people's thoughts about this?
>
> Many thanks