Alignment.append() — read sequences and/or their alignment

append(file, align_codes='all', atom_files=None, remove_gaps=True, alignment_format='PIR', io=None, allow_alternates=False)
Output:
end_of_file

This command reads the sequence(s) and/or their alignment from a text file. Only sequences with the specified codes are read in; align_codes = 'all' can be used to read all sequences. The sequences are added to any currently in the alignment.

file can be either a file name or a readable file handle (see modfile.File()).

There are several alignment formats:

  1. The 'PIR' format resembles that of the PIR sequence database. It is described in Section B.1 and is used for comparative modeling because it can carry additional information about each protein that is useful for automated access to the atomic coordinates.

  2. The 'FASTA' format resembles the 'PIR' format but lacks the second ‘comment’ line and the star at the end of each sequence.

  3. The 'PAP' format is nicer to look at but contains less information and is not used by other programs. When used in conjunction with PDB files, the PDB files must contain exactly the residues in the sequences in the 'PAP' file; i.e., it is not possible to use only a segment of a PDB file. In addition, the 'PAP' protein codes must be expandable into proper PDB atom filenames, as described in Section 5.1.3. Alternatively, a list of PDB file names can be specified with the atom_files parameter, in the same order as the sequences read from the alignment file. (atom_files is not used for other alignment formats.) The protein sequence can now start in any column (this was limited to column 11 before release 5).

  4. The 'QUANTA' format can be used to communicate with the QUANTA program. Do not mix the 'QUANTA' format with any other format, because it contains residue numbers that do not occur in the other formats and are difficult to guess correctly. MODELLER can write out alignments in the 'QUANTA' format but cannot read them in.

  5. The 'INSIGHT' format is very similar to the 'PAP' format and can sometimes be used to communicate with the INSIGHTII program. When used in conjunction with PDB files, the same rules as for the 'PAP' format apply.

  6. The 'PSS' format is the .horiz format used by PSI-PRED to report secondary structure predictions of sequences. The confidence of each prediction is also reported as an integer value between 0 (low) and 9 (high).
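
The difference between the 'PIR' and 'FASTA' formats described in item 2 can be illustrated with a small pure-Python sketch. It does not use MODELLER and is not how MODELLER converts files; the function name pir_to_fasta and the two toy records are hypothetical. The sketch assumes well-formed records with no comment lines before the headers.

```python
def pir_to_fasta(pir_text):
    """Drop the PIR description line and the terminating '*' of each record."""
    out = []
    skip_next = False
    for line in pir_text.splitlines():
        if line.startswith('>'):
            out.append(line)       # keep the header line unchanged
            skip_next = True       # the next line is the PIR description
        elif skip_next:
            skip_next = False      # discard the description line
        else:
            residues = line.strip().rstrip('*')
            if residues:           # skip blank lines and a lone '*'
                out.append(residues)
    return '\n'.join(out) + '\n'


# Two toy PIR records (hypothetical codes and sequences).
pir = """>P1;seq1
sequence:seq1::::::::
ACDEFG-HIK
LMNP*
>P1;seq2
sequence:seq2::::::::
ACDE-GGHIK
LMNP*
"""

fasta = pir_to_fasta(pir)
print(fasta)
```

In practice, the conversion is done by reading the file with append() and writing it back out with Alignment.write(), as in the example at the end of this section.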

If remove_gaps = True, positions with gaps (or whitespace) in all selected sequences are removed from the alignment.
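
The effect of remove_gaps can be sketched in plain Python. This is only an illustration of the rule stated above, not MODELLER's implementation; the helper name drop_all_gap_columns and the toy sequences are hypothetical, and the sketch assumes that all sequences have the same aligned length.

```python
def drop_all_gap_columns(seqs, gap_chars='- '):
    """Remove alignment positions that are gaps (or whitespace) in every sequence."""
    # Indices of columns where at least one sequence has a residue.
    keep = [i for i, column in enumerate(zip(*seqs))
            if not all(c in gap_chars for c in column)]
    return [''.join(s[i] for i in keep) for s in seqs]


# Columns 3 and 4 contain only gaps and are dropped; the gaps in
# columns 2 and 6 stay because another sequence has a residue there.
aligned = ['AC--DE',
           'A---DE',
           'AG--D-']
cleaned = drop_all_gap_columns(aligned)
print(cleaned)   # ['ACDE', 'A-DE', 'AGD-']
```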

The io argument is needed because 'PIR' files can contain empty sequences or ranges; in that case, the sequence or range is read from the corresponding PDB file.

If allow_alternates = True, and a 'PIR' file uses ‘.’ to force MODELLER to read the sequence range from the corresponding PDB file (see Section B.1), the search for matches between the alignment sequence and the PDB sequence is made more flexible. In addition to an exact equivalence of one-letter codes, each residue's alternate (as defined by the STD column in 'modlib/restyp.lib') also counts as a match. For example, B (ASX) in the alignment is considered a match for N (ASN) in the PDB, while G (GLY) in the alignment matches any non-standard residue in the PDB for which an explicit equivalence has not been defined (the DEFATM behavior in 'modlib/restyp.lib'). The alignment sequence is then modified to match the exact sequence from the PDB. This is useful if the alignment sequence is extracted from a database containing 'cleaned' sequences, e.g. one created by SequenceDB.read().
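
The matching rule for allow_alternates can be sketched as follows. This is a simplified illustration and not MODELLER's code: the ALTERNATES table is hypothetical and contains only the B/N pairing mentioned above (the real equivalences come from 'modlib/restyp.lib'), the G (GLY) fallback for non-standard residues is left out, and both sequences are assumed to have the same length.

```python
# Hypothetical table: alignment code -> PDB codes it may also stand for.
# Only the B (ASX) / N (ASN) pairing from the text is included.
ALTERNATES = {'B': {'N'}}


def reconcile(aln_seq, pdb_seq, alternates=ALTERNATES):
    """Return the PDB sequence if every position matches exactly or via an alternate."""
    if len(aln_seq) != len(pdb_seq):
        raise ValueError('sequences differ in length')
    for pos, (a, p) in enumerate(zip(aln_seq, pdb_seq)):
        if a != p and p not in alternates.get(a, ()):
            raise ValueError('mismatch at position %d: %s vs %s' % (pos, a, p))
    # The alignment sequence is replaced by the exact PDB sequence.
    return pdb_seq


matched = reconcile('ACBD', 'ACND')
print(matched)   # ACND
```

Without the alternates table, the B/N pair at the third position would be a mismatch; in MODELLER such a mismatch is reported as a SequenceMismatchError.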

For 'PIR' and 'FASTA' files, the end_of_file variable is set to 1 if MODELLER reached the end of the file during the read, or 0 otherwise.

This command can raise a FileFormatError if the alignment file format is invalid, or a SequenceMismatchError if a 'PIR' sequence does not match the one read from the PDB file (when an empty range is given).

Example: examples/commands/read_alignment.py

# Example for: Alignment.append(), Alignment.write(),
#              Alignment.check()

# Read an alignment, write it out in the 'PAP' format, and
# check the alignment of the N-1 structures as well as the
# alignment of the N-th sequence with each of the N-1 structures.

from modeller import *

log.level(output=1, notes=1, warnings=1, errors=1, memory=0)
env = Environ()
env.io.atom_files_directory = ['../atom_files']

aln = Alignment(env)
aln.append(file='toxin.ali', align_codes='all')
aln.write(file='toxin.pap', alignment_format='PAP')
aln.write(file='toxin.fasta', alignment_format='FASTA')
aln.check()