Subject: Re: [modeller_usage] PDB updates for Modeller
From: Eswar Narayanan <>
Date: Mon, 4 Oct 2004 17:37:20 -0700
Cc: Modeller Usage mailing list <>
On Oct 3, 2004, at 7:47 AM, Bruno Afonso wrote:
> Eswar Narayanan wrote:
>>> Since I know this PDB is probably a good model and it won't come up
>>> in seq_search, I was wondering how I could manually update
>>> CHAINS_all.seq or create my own sequence database.
>> The latest release of MODELLER (version 7v7, released last month) has
>> a new command called SEQFILTER that can be used to cluster PDB
>> sequences. You can use MAKE_CHAINS (also in the latest release) to
>> collect the PDB chains prior to running SEQFILTER.
> There was a mistake in my previous e-mail. :) The PDB sequence is
> missing, which is *bad*, not good. I'm sorry to ask these questions,
> but I'm still puzzled as to how to deal with this:
If you already know exactly which template(s) you will use, you do
not have to use SEQUENCE_SEARCH to "identify" them. You can use any of
the alignment commands (ALIGN, ALIGN2D, etc.) to create your alignment
and then build your model from that alignment.
> 1) What are the criteria for making chains_all.seq? I ask because
> clearly not all of the PDB is there :) and some of the sequences there
> have resolution values as high as 5.0 angstroms...
One usually wants to use a non-redundant version of the PDB to search
for templates. One way to build it is to first select the sequences of
all X-ray structures that were solved at a resolution better than
3.5 angstroms, are longer than 30 residues, have no more than 10
non-standard residues, and have at least 30 standard residues. All of
these can be specified as options to MAKE_CHAINS. You can then cluster
the selected sequences with SEQFILTER to remove redundancy at a chosen
sequence identity threshold (usually 30% or 95%).
Ben has put these files on the web at
http://salilab.org/modeller/supplemental.html. They contain the
representative sequences derived from PDB files at 30% and 95% sequence
identity. All X-ray and NMR PDB chains, with no limit on resolution,
that are at least 30 aa long, have more than 30 standard residues and
have no more than 10 non-standard residues were used to generate these
files. They are simply the output of SEQFILTER on last week's release
(09-28-04) of the PDB.
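For anyone who wants to see the logic of the procedure described above (filter the chains, then cluster them at an identity threshold), here is a minimal pure-Python sketch. It is NOT MODELLER's MAKE_CHAINS or SEQFILTER and uses no MODELLER API; the Chain record, the use of "X" for non-standard residues, the crude ungapped identity measure and the greedy "best resolution first" clustering order are all illustrative assumptions. A real tool would compute identity from a proper sequence alignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chain:
    code: str                    # hypothetical chain ID, e.g. "1abcA"
    sequence: str                # one-letter codes; "X" marks a non-standard residue (assumption)
    method: str                  # "X-RAY" or "NMR" (assumed labels)
    resolution: Optional[float]  # in angstroms; None when not applicable (e.g. NMR)


def passes_filters(chain: Chain, max_resolution: float = 3.5, min_length: int = 30,
                   max_nonstd: int = 10, min_std: int = 30) -> bool:
    """Apply the selection criteria described in the e-mail to one chain."""
    if chain.method != "X-RAY" or chain.resolution is None:
        return False
    if chain.resolution >= max_resolution:  # "better than 3.5 A" means numerically smaller
        return False
    nonstd = chain.sequence.count("X")
    std = len(chain.sequence) - nonstd
    return len(chain.sequence) > min_length and nonstd <= max_nonstd and std >= min_std


def naive_identity(a: str, b: str) -> float:
    """Crude ungapped identity over the shorter sequence (stand-in for a real alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def cluster(chains: List[Chain], threshold: float) -> List[Chain]:
    """Greedy clustering; assumes every chain has a resolution (i.e. passed the filters)."""
    reps: List[Chain] = []
    # Best resolution first, then longest, so those chains become representatives.
    for c in sorted(chains, key=lambda ch: (ch.resolution, -len(ch.sequence))):
        # Keep c only if it is below the identity threshold to every representative so far.
        if all(naive_identity(c.sequence, r.sequence) < threshold for r in reps):
            reps.append(c)
    return reps


# Toy data (invented for illustration only).
chains = [
    Chain("1aaaA", "ACDEFGHIKL" * 4, "X-RAY", 1.8),
    Chain("1bbbA", "ACDEFGHIKL" * 4, "X-RAY", 2.5),  # identical to 1aaaA
    Chain("1cccA", "MNPQRSTVWY" * 4, "X-RAY", 2.0),
    Chain("1dddA", "MNPQRSTVWY" * 4, "NMR", None),    # dropped: not X-ray
    Chain("1eeeA", "ACDEF" * 4, "X-RAY", 1.5),        # dropped: only 20 residues
]
kept = [c for c in chains if passes_filters(c)]
reps = cluster(kept, threshold=0.95)
print([c.code for c in reps])  # prints ['1aaaA', '1cccA']
```

With these toy chains, clustering at 95% keeps 1aaaA and 1cccA: 1bbbA is redundant with the better-resolved 1aaaA, and the other two never pass the filters. Lowering the threshold to 0.30 gives the more aggressive, 30%-style reduction.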
> 2) Can't I make a chains_all.seq-like file with MY criteria without
> writing my own script? I.e., is there a "right way"(TM) to do it?
See the comments above.
> 3) I can use MAKE_CHAINS and then load the .chn file as a database,
> but that requires me to first find the good PDB entries that aren't in
> Modeller's database, which kind of misses the whole point. I was using
> Modeller to find the good ones in the first place.
> Thanks for the tip on SEQFILTER, but my problem was that the sequence
> was missing from Modeller's default database in the first place ;-)
---
Eswar Narayanan, Ph.D
Mission Bay Genentech Hall
600 16th Street, Suite N474Q
University of California - San Francisco
San Francisco, CA 94143-2240 (CA 94158 for courier)
Tel +1 (415) 514-4233; Fax +1 (415) 514-4231
http://www.salilab.org/~eashwar