This command calculates percentage residue identities for all pairs of sequences in the current alignment. The percentage residue identity is defined as the number of identical residues divided by the length of the shorter sequence.
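The definition above can be sketched in plain Python. This is only an illustration of the formula, not MODELLER's implementation: the function names `percent_identity` and `identity_table`, the use of `-` as the gap character, and the choice to count sequence length as the number of non-gap residues are all assumptions made for this sketch.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage identity of two aligned sequences (hypothetical helper).

    Identical residues are counted over aligned, non-gap positions and
    divided by the number of residues in the shorter sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Count positions where both sequences carry the same residue.
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    # Assumption: sequence length means residues only, gaps excluded.
    shorter = min(len(seq_a) - seq_a.count(gap), len(seq_b) - seq_b.count(gap))
    if shorter == 0:
        return 0.0
    return 100.0 * identical / shorter


def identity_table(aligned_seqs, gap="-"):
    """Symmetric matrix of pairwise percentage identities."""
    n = len(aligned_seqs)
    return [[percent_identity(aligned_seqs[i], aligned_seqs[j], gap)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    seqs = ["ACD-EF", "ACDGE-", "AC--EF"]
    for row in identity_table(seqs):
        print(" ".join("%6.1f" % value for value in row))
```

Dividing by the shorter sequence means that a short sequence fully contained in a longer one scores 100%, which is worth keeping in mind when comparing fragments against full-length proteins.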
In addition to writing its results to the log file, this routine writes the pairwise sequence distances to the file matrix_file. That file can be used directly as input to the tree-making programs of the PHYLIP package, such as KITSCH [Felsenstein, 1985], and to the environ.dendrogram() and environ.principal_components() commands. A more general version of this command, which allows a user-specified measure of residue-residue differences, is alignment.compare_sequences().
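For orientation, the sketch below formats a small distance matrix in the square layout that PHYLIP's distance programs read: a first line with the number of sequences, then one line per sequence with a name padded to 10 characters followed by its distances. The helper name `format_phylip_matrix`, the sequence names, and the distance values are hypothetical, and the exact layout and distance definition that id_table() writes to matrix_file are not specified in this section.

```python
def format_phylip_matrix(names, distances):
    """Return a square distance matrix in PHYLIP's text layout.

    Hypothetical helper for illustration; not part of MODELLER.
    """
    if len(names) != len(distances):
        raise ValueError("need one row of distances per name")
    lines = ["%5d" % len(names)]
    for name, row in zip(names, distances):
        if len(row) != len(names):
            raise ValueError("distance matrix must be square")
        # PHYLIP reads the first 10 characters of each line as the name.
        label = name[:10].ljust(10)
        lines.append(label + " ".join("%8.4f" % d for d in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up names and distances, for illustration only.
    names = ["seqA", "seqB", "seqC"]
    distances = [
        [0.00, 0.20, 0.35],
        [0.20, 0.00, 0.40],
        [0.35, 0.40, 0.00],
    ]
    print(format_phylip_matrix(names, distances))
```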
# Example for: alignment.id_table(), alignment.compare_sequences(),
#              misc.principal_components(), misc.dendrogram()

# Pairwise sequence identity between sequences in the alignment.

from modeller import *

env = environ()
env.io.atom_files_directory = '../atom_files'

# Read all entries in this alignment:
aln = alignment(env, file='toxin.ali')

# Access pairwise properties:
s1 = aln[0]
s2 = aln[1]
print("%s and %s have %d equivalences, and are %.2f%% identical" %
      (s1, s2, s1.get_num_equiv(s2), s1.get_sequence_identity(s2)))

# Calculate pairwise sequence identities:
aln.id_table(matrix_file='toxin_id.mat')

# Calculate pairwise sequence similarities:
mdl = model(env, file='2ctx', model_segment=('1:', '71:'))
aln.compare_sequences(mdl, rr_file='$(LIB)/as1.sim.mat', max_gaps_match=1,
                      matrix_file='toxin.mat', variability_file='toxin.var')
mdl.write(file='2ctx.var')

# Do principal components clustering using sequence similarities:
env.principal_components(matrix_file='toxin.mat', file='toxin.princ')

# Dendrogram in the log file:
env.dendrogram(matrix_file='toxin.mat', cluster_cut=-1.0)