
Re: [modeller_usage] origin of blosum62.sim.mat and as1.sim.mat



Joshua A. Speidel wrote:
> I am curious about the origin of the blosum62.sim.mat and as1.sim.mat.
>
> When I look at the blosum62.sim.mat, I can't figure out the relationship
> between the numbers there, and the original blosum62 matrix. Initially,
> I thought that the Henikoff & Henikoff values were just scaled to
> between 0 and 1000, but that didn't seem to work.
Close. The relationship is m = (9 + h) * 50, where h is the original H&H 
blosum62 value and m is the value used in Modeller. The original blosum62 
similarity measure is first shifted from the -4 to 11 range to the 5 
to 20 range (so that we can use 0 similarity for comparing a residue 
with a gap) and then multiplied by 50, which gives values between 250 
and 1000 on Modeller's 0 to 1000 scale.
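
As a quick check of that relationship, here is a minimal Python sketch. The
function names are made up for illustration and are not part of Modeller; only
the formula m = (9 + h) * 50 comes from the explanation above.

```python
def blosum_to_modeller(h: int) -> int:
    """Convert an original blosum62 score h to the Modeller value m."""
    # Shift by 9 so the lowest score (-4) becomes 5, keeping 0 free for gaps,
    # then multiply by 50 to land on the 0 to 1000 scale.
    return (9 + h) * 50


def modeller_to_blosum(m: int) -> int:
    """Invert the conversion (valid for values produced by the formula above)."""
    return m // 50 - 9


if __name__ == "__main__":
    for h in (-4, 0, 4, 11):
        print(h, "->", blosum_to_modeller(h))
```

The lowest blosum62 score (-4) maps to 250 and the highest (11) maps to 1000,
so no residue pair ever scores as low as a gap.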
> I can't find a reference for ALBASE3 mentioned in as1.sim.mat.
I believe that was one of the databases of protein structures used in 
the derivation of the original Modeller restraints - see the '93 
Modeller paper. As far as I understand it, it was derived in a very 
similar way to blosum62, just using the Modeller structure set rather 
than the BLOCKS set used for blosum.
> Finally, if I wanted to use my own matrix, is it necessary for me to
> scale it to between 0 and 1000?
Yes, and the first line of the file should read either #DISTANCE or 
#SIMILARITY to identify whether it is a distance or similarity matrix.
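
One possible way to do the scaling for a custom similarity matrix is sketched
below in Python. This is an assumption, not something Modeller requires: it
maps the smallest score to 250 and the largest to 1000, by analogy with the
blosum62 conversion, so that 0 stays below every residue pair. The function
name and the floor/ceiling parameters are hypothetical. The sketch only
rescales the numbers; apart from the #DISTANCE or #SIMILARITY first line, the
file layout is not covered here and is best copied from an existing file such
as blosum62.sim.mat.

```python
def scale_similarity(scores, floor=250, ceiling=1000):
    """Linearly rescale a matrix (list of rows) into [floor, ceiling].

    floor and ceiling are illustrative defaults chosen to mirror the
    blosum62 mapping (-4 -> 250, 11 -> 1000); they are not Modeller settings.
    """
    lo = min(min(row) for row in scores)
    hi = max(max(row) for row in scores)
    if hi == lo:
        raise ValueError("matrix has no spread; cannot rescale")
    span = hi - lo
    return [
        [round(floor + (v - lo) * (ceiling - floor) / span) for v in row]
        for row in scores
    ]


if __name__ == "__main__":
    toy = [[4, -1], [-1, 5]]  # made-up scores, for illustration only
    print(scale_similarity(toy))
```

With the default floor and ceiling, a matrix whose scores span -4 to 11 comes
out the same as under the (9 + h) * 50 rule.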
	Ben Webb, Modeller Caretaker
--
modeller-care@salilab.org             http://www.salilab.org/modeller/
Modeller mail list: http://salilab.org/mailman/listinfo/modeller_usage