BLOSUM matrices were constructed by Henikoff and Henikoff.
- They derived the BLOSUM matrices from a set of aligned, ungapped regions of protein families, taken from a protein database called the BLOCKS database. The procedure is as follows:
- Whenever two sequences were found to have sequence identity greater than some threshold L%, they were put into the same cluster. In this way the sequences from each block were grouped into clusters.
- Then they counted the frequency Aab of observing residue a in one cluster aligned against residue b in another cluster. To correct for cluster size, each occurrence was weighted by 1/(n1 n2), where n1 and n2 are the respective cluster sizes.
- From Aab they estimated the residue frequencies qa and the pair frequencies Pab. From these they derived the score matrix entries using the standard log-odds equation s(a,b) = log(Pab / (qa qb)).
- The resulting log odds matrices were scaled and rounded to the nearest integer value.
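The steps above can be sketched in Python. This is a minimal illustration, not the Henikoffs' actual code: the function names, the single-linkage clustering, the normalisation of Aab by its grand total, and the scale factor of 2 with base-2 logarithms (half-bit units) are all assumptions made for the example. Each block is taken to be a list of equal-length ungapped sequences.

```python
import math
from collections import defaultdict
from itertools import permutations


def percent_identity(s, t):
    """Percentage of aligned positions where two equal-length sequences agree."""
    return 100.0 * sum(x == y for x, y in zip(s, t)) / len(s)


def cluster_block(seqs, threshold):
    """Group sequences so that any pair with identity > threshold shares a cluster.

    Assumption: single-linkage clustering implemented with union-find.
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if percent_identity(seqs[i], seqs[j]) > threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, s in enumerate(seqs):
        clusters[find(i)].append(s)
    return list(clusters.values())


def weighted_pair_counts(blocks, threshold):
    """Compute Aab: a in one cluster aligned against b in another, weighted by 1/(n1 n2)."""
    A = defaultdict(float)
    for block in blocks:
        clusters = cluster_block(block, threshold)
        # Ordered pairs of distinct clusters, so A comes out symmetric.
        for c1, c2 in permutations(clusters, 2):
            w = 1.0 / (len(c1) * len(c2))
            for s in c1:
                for t in c2:
                    for a, b in zip(s, t):
                        A[(a, b)] += w
    return A


def log_odds_matrix(A, scale=2.0):
    """Estimate Pab and qa from Aab, then return rounded scores scale * log2(Pab / (qa qb)).

    Assumption: Pab = Aab / sum(A) and qa = sum over b of Pab.
    Pairs that were never observed get no entry (their log-odds is undefined).
    """
    total = sum(A.values())
    p = {ab: v / total for ab, v in A.items()}
    q = defaultdict(float)
    for (a, _), v in p.items():
        q[a] += v
    return {
        (a, b): round(scale * math.log2(p_ab / (q[a] * q[b])))
        for (a, b), p_ab in p.items()
    }


if __name__ == "__main__":
    toy_blocks = [["AC", "AC", "AA"]]  # hypothetical toy data, one block
    counts = weighted_pair_counts(toy_blocks, threshold=62)
    print(log_odds_matrix(counts))
```

On real data the blocks would hold many sequences over the full 20-letter amino acid alphabet; the toy block here only shows how the two identical sequences collapse into one cluster and share a total weight of 1.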
BLOSUM62 is the standard for ungapped alignments, and BLOSUM50 is commonly used for gapped alignments.
(Lower values of L correspond to longer evolutionary times and are suited to searches for more distantly related sequences.)