Thanks Karsten,
Your message was helpful. I think I'll try T-COFFEE/SAP along with the
ALIGN2D routine. I've developed a few programs/scripts that helped me correct
sequence ID codes to match atom files. The TOP scripts also came in very
handy in preparing these files.
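
For anyone hitting the same mismatch, here is a minimal sketch of the general idea (not the actual programs mentioned above). It reads the residues straight from the ATOM records of one chain and writes a PIR entry whose ID matches the atom file code, including the "second" line Modeller expects. The function names and the choice to map non-standard residues to "X" are my own assumptions for illustration; chain breaks, HETATM residues and alternate locations are not handled.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_residues(pdb_lines, chain_id):
    """Return [(residue_number, one_letter_code), ...] for one chain.

    Only ATOM records are read, using the fixed PDB columns:
    residue name in columns 18-20, chain ID in column 22,
    residue number plus insertion code in columns 23-27.
    """
    residues = []
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use only the first model of a multi-model file
        if not line.startswith("ATOM") or line[21:22] != chain_id:
            continue
        key = line[22:27]
        if key in seen:
            continue  # further atoms of a residue already recorded
        seen.add(key)
        # Assumption: anything outside the standard 20 becomes "X".
        residues.append((key.strip(), THREE_TO_ONE.get(line[17:20], "X")))
    return residues


def pir_entry(code, pdb_lines, chain_id):
    """Build a PIR alignment entry for one chain of an atom file."""
    residues = chain_residues(pdb_lines, chain_id)
    if not residues:
        raise ValueError("no ATOM records found for chain %r" % chain_id)
    first, last = residues[0][0], residues[-1][0]
    sequence = "".join(letter for _, letter in residues)
    # Second line: type:code:first residue:chain:last residue:chain,
    # followed by four optional fields that are left empty here.
    header = "structureX:%s:%s:%s:%s:%s::::" % (
        code, first, chain_id, last, chain_id)
    return ">P1;%s\n%s\n%s*\n" % (code, header, sequence)
```

With a hypothetical atom file 1abc.pdb, one would call something like pir_entry("1abc", open("1abc.pdb").readlines(), "A") and feed the resulting sequences to the alignment program, so that every residue Modeller finds in the atom file is also in the alignment.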
Cvetan
Quoting Karsten Suhre:
> Hello Cvetan,
>
> from my own experience I know that Modeller is quite picky about correct
> sequences (as it should be).
>
> Anyhow, I would not use ClustalW alignments in the first place. When you
> model a particular protein using several templates, you are *much* better
> off using structural alignments instead of sequence-based alignments alone.
> You could use, for example, T-COFFEE together with SAP (or maybe Fugue, but
> I have no experience with it). You would then use your PDB files from the
> start in the alignment process, not unrelated GenBank sequences, and
> Modeller would thus find all residues it needs in the alignment.
> Alternatively, there are structural alignments readily available at
> Homstrad for a large number of proteins.
>
> Note also that Modeller comes with a file modlib/CHAINS_all.seq. If you took
> the sequences from this file for your ClustalW alignments, it should work
> with Modeller.
>
> Hope this helps,
>
> Kind regards,
>
> Karsten.
>
> > I'm trying to model 400+ proteins based on ~100 templates. I have an
> > alignment file of 1500+ sequences comprising the templates, targets and
> > others. ClustalW was used to align the sequences.
> >
> > I have a few problems.
> > - The sequences in the alignment file do not match the amino acids present
> >   in the pdb files. _Generally_ the pdb files contain more residues than
> >   specified in the aligned sequence. Therefore I have to either truncate
> >   the pdb files or specify the appropriate residue range in the
> >   alignment file.
> > - The ID codes in the alignment file do not match the atom file names.
> > - There is no "second" line in each entry in the alignment file.
> >
> > Although all this can be done manually, I can't help but wonder if there is
> > a way to automate/expedite the process. A paper published by Sanchez and
> > Sali (1998) mentioned a Perl script that allowed for rapid progress through
> > the various steps involved with modelling. Suggestions will be most
> > appreciated.
> >
> > Some of the pdb files are complexes. If it can be avoided, I'd prefer not
> > to use these structures. However, if I do decide to use some of them, I
> > plan to minimise the energy of the protein (minus the ligand) via MD (CNS)
> > before using it as a template. What are people's thoughts about this?
> >
> > Many thanks
>