All the sequences and segments are defined in the alignment
array. The first block of sequences, the one containing the segments,
consists of the first ALIGN_BLOCK sequences. The regions
corresponding to the segments are defined by the last
entry in the alignment as contiguous blocks of non-gap
residues. Any standard single-character residue code may be used. The
segments must be separated by gap residues, '-'. The remaining
sequences, starting at ALIGN_BLOCK + 1, form the
second block of sequences. The alignment
of the sequences within each of the two blocks does not change.
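The way segments are read from the last alignment entry can be illustrated with a minimal Python sketch. It is not part of the program itself; the function name find_segments and the toy row are hypothetical, and the sketch only assumes that a segment is a maximal run of non-gap characters.

```python
def find_segments(row, gap="-"):
    """Return (start, end) pairs, end exclusive, for each run of non-gap characters.

    Hypothetical helper for illustration; positions are 0-based alignment columns.
    """
    segments = []
    start = None
    for pos, ch in enumerate(row):
        if ch != gap and start is None:
            start = pos                     # a new segment begins here
        elif ch == gap and start is not None:
            segments.append((start, pos))   # the gap closes the current segment
            start = None
    if start is not None:                   # segment running to the end of the row
        segments.append((start, len(row)))
    return segments


# Toy segment-defining entry: two segments separated by gap residues.
print(find_segments("--AAAA----GGG--"))  # [(2, 6), (10, 13)]
```

Because any residue code may be used inside a segment, only the gap character matters for locating the segment boundaries.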
A sample alignment file is
The enumeration of alignments explores all possible combinations
of alignments between each segment and the second block of sequences:
The starting position of each segment is varied relative to
the input alignment in the interval from SEGMENT_SHIFT[ ]
to SEGMENT_SHIFT[ ]. There have to be at least MIN_LOOP_LENGTH[ ]
and MIN_LOOP_LENGTH[ ] residues that are not in any segment
before and after each segment, respectively. The location of the
N-terminus of each segment is varied relative to its location in the
input alignment in the interval from SEGMENT_GROWTH_N[ ] to
SEGMENT_GROWTH_N[ ]. Similarly, the location of the C-terminus of
each segment is varied relative to its location in the input alignment
in the interval from SEGMENT_GROWTH_C[ ] to SEGMENT_GROWTH_C[ ].
The shortening and lengthening of the segments may be useful in
determining the best anchor regions for modeling of a loop.
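The enumeration described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the program's implementation: all function names are hypothetical, each segment is given one inclusive (low, high) range for the shift and for each terminus, a positive growth value is assumed to lengthen the segment, and loops are measured along a single line of a given length.

```python
from itertools import product


def segment_variants(start, end, shift, grow_n, grow_c):
    """Yield (new_start, new_end) placements for one segment (end exclusive).

    shift, grow_n and grow_c are inclusive (low, high) integer ranges.
    Assumed convention: positive grow_n moves the N-terminus left,
    positive grow_c moves the C-terminus right.
    """
    for s, gn, gc in product(range(shift[0], shift[1] + 1),
                             range(grow_n[0], grow_n[1] + 1),
                             range(grow_c[0], grow_c[1] + 1)):
        new_start = start + s - gn
        new_end = end + s + gc
        if new_start < new_end:             # discard empty segments
            yield (new_start, new_end)


def loops_ok(placement, length, min_loop):
    """Check the loop before the first, between, and after the last segment."""
    bounds = [0] + [x for seg in placement for x in seg] + [length]
    loops = [bounds[k + 1] - bounds[k] for k in range(0, len(bounds), 2)]
    return all(loop >= need for loop, need in zip(loops, min_loop))


def enumerate_placements(segments, shifts, grow_n, grow_c, min_loop, length):
    """Yield every combination of segment placements that satisfies the loops."""
    per_segment = [list(segment_variants(s, e, sh, gn, gc))
                   for (s, e), sh, gn, gc in zip(segments, shifts, grow_n, grow_c)]
    for placement in product(*per_segment):
        if loops_ok(placement, length, min_loop):
            yield placement


segments = [(2, 6), (10, 13)]
shifts = [(-1, 1), (0, 0)]
grow_n = [(0, 0), (0, 0)]
grow_c = [(0, 0), (0, 1)]
found = list(enumerate_placements(segments, shifts, grow_n, grow_c,
                                  min_loop=[2, 2, 2], length=15))
print(found)  # [((2, 6), (10, 13)), ((3, 7), (10, 13))]
```

The number of candidate placements is the product of the per-segment counts, which is why the total grows very quickly with the number of segments and the width of the ranges.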
Each alignment is scored according to the
similarity scoring matrix specified by filename RR_FILE. This matrix may
contain residue-gap scores, with the gap treated as residue type 21; if it
does not, the residue-gap score is set to the smallest value in the matrix.
The score for an alignment is obtained by summing the scores over only those
alignment positions that correspond to the segments (no gap penalty is added
for loops). When there is more than one sequence in either of the two blocks,
the position score is the average of all pairwise comparisons between the two
blocks of sequences. When the number of positions in the alignment changes
(i.e., the segments grow or shrink), the scores are not comparable to each
other.
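The scoring rule can be illustrated with a short Python sketch. It is an illustration only: the toy matrix, the function names, and the dictionary representation of the matrix are assumptions and do not reflect the RR_FILE format.

```python
def pair_score(a, b, matrix, gap="-"):
    """Look up a symmetric residue-residue score.

    If a residue-gap pair is missing from the matrix, fall back to the
    smallest value in the matrix, as described in the text.
    """
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    if gap in (a, b):
        return min(matrix.values())
    raise KeyError((a, b))


def position_score(col_a, col_b, matrix):
    """Average of all pairwise scores between the two blocks at one position."""
    total = sum(pair_score(a, b, matrix) for a in col_a for b in col_b)
    return total / (len(col_a) * len(col_b))


def alignment_score(block_a, block_b, segment_columns, matrix):
    """Sum position scores over segment columns only; loops add no penalty."""
    score = 0.0
    for col in segment_columns:
        col_a = [seq[col] for seq in block_a]
        col_b = [seq[col] for seq in block_b]
        score += position_score(col_a, col_b, matrix)
    return score


# Toy matrix with made-up values, for illustration only.
toy = {("A", "A"): 4, ("A", "G"): 0, ("G", "G"): 6}
print(alignment_score(["AG", "AA"], ["AG"], [0, 1], toy))  # 7.0
```

Since the result is a sum over segment positions, two alignments with different total segment lengths sum over different numbers of terms, which is why their scores cannot be compared directly.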
It is feasible to enumerate on the order of different alignments
in less than one hour of CPU time.
In general, two runs are required. In the first run, the alignments
are scored and a histogram of the scores is written to file
FILE. Then this file must be inspected to determine the cutoff
SEGMENT_CUTOFF. In the second run, all the alignments with a score
higher than SEGMENT_CUTOFF are written to files in the
PIR format, using the standard file naming convention:
OUTPUT_DIRECTORY/ROOT_NAME FILE_ID FILE_EXT (the components are concatenated),
where the file name also carries the alignment file counter. In addition, the
alignments
are also written out in the PAP format for easier inspection by
eye. Thus, SEGMENT_CUTOFF has to be set to a very large value in
the first run, to avoid writing alignment files. During a run,
a message is written to the log every SEGMENT_REPORT
alignments; this is useful for monitoring progress during very
long runs.
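The two-run procedure can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names, the bin width, and the toy scores are hypothetical, and no files are read or written.

```python
from collections import Counter


def score_histogram(scores, bin_width=1.0):
    """Count scores per bin; each key is the lower edge of its bin."""
    return Counter(bin_width * (score // bin_width) for score in scores)


def select_alignments(scored, cutoff):
    """Return the names of the alignments that score above the cutoff."""
    return [name for name, score in scored if score > cutoff]


# Made-up scores standing in for the enumerated alignments.
scored = [("aln1", 3.2), ("aln2", 3.9), ("aln3", 7.5), ("aln4", 12.0)]

# First run: a very large cutoff selects nothing, so only the histogram is produced.
assert select_alignments(scored, float("inf")) == []
histogram = score_histogram(score for _, score in scored)
print(sorted(histogram.items()))  # [(3.0, 2), (7.0, 1), (12.0, 1)]

# Second run: a cutoff chosen after inspecting the histogram.
print(select_alignments(scored, 5.0))  # ['aln3', 'aln4']
```

Choosing the cutoff from the histogram keeps the second run from writing more alignment files than can reasonably be inspected.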