- Path: sparky!uunet!biosci!carson.u.washington.edu
- From: henikoff@carson.u.washington.edu (Steven Henikoff)
- Newsgroups: bionet.announce
- Subject: New release of the Blocks Database for searching
- Message-ID: <1k19eoINN48v@shelley.u.washington.edu>
- Date: 25 Jan 93 17:53:28 GMT
- Sender: kristoff@net.bio.net
- Distribution: bionet
- Organization: University of Washington, Seattle
- Lines: 96
- Approved: bionews-moderator@net.bio.net
-
-
-                    BLOCKS E-MAIL SEARCHER
-
-               S Agus, JG Henikoff, S Henikoff
-           Copyright 1993 Fred Hutchinson Center
-
- Release 6.0 of the Blocks Database is now available for searching. Blocks are
- short multiply aligned ungapped segments corresponding to the most highly
- conserved regions of proteins. A database of blocks has been constructed by
- successive application of the automated PROTOMAT system to individual entries
- in the PROSITE catalog of protein groups keyed to the SWISS-PROT protein
- sequence databank. The rationale behind searching a database of blocks is
- that information from multiply aligned sequences is present in a concentrated
- form, reducing background and increasing sensitivity to distant
- relationships. If a particular block scores highly, it is possible that the
- sequence is related to the group of sequences the block represents.
- Typically, a group of proteins has more than one region in common and their
- relationship is represented as a series of blocks separated by unaligned
- regions. If a second block for a group also scores highly in the search, the
- evidence that the sequence is related to the group is strengthened; it is
- strengthened further if a third block also scores highly, and so on. The
- new database consists of 2302 blocks based on 619 protein groups documented
- in PROSITE 10.00, which is keyed to SWISS-PROT 24. This represents an 11%
- increase in the number of groups over the previous release, which was based
- on PROSITE 9.00 keyed to SWISS-PROT 22.
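
To make the idea of scoring a sequence against a block concrete, here is a
minimal Python sketch. It is not the scoring method used by the Blocks
searcher: the log-odds profile, the uniform background, the pseudocount, and
the toy segments are all assumptions chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_profile(segments, pseudocount=1.0):
    """Turn equal-length ungapped segments into per-column log-odds scores."""
    width = len(segments[0])
    if any(len(seg) != width for seg in segments):
        raise ValueError("all segments in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(seg[col] for seg in segments)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, query):
    """Slide the profile along the query; return (best score, offset)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


# Hypothetical toy block and query, invented for this example only.
toy_block = ["GDSGG", "GDSGS", "GDAGG", "GNSGG"]
profile = block_to_profile(toy_block)
score, offset = best_window(profile, "MKTAYGDSGGPLVAA")
print(f"best score {score:.2f} at offset {offset}")
```

Repeating this for every block of a group and noting how many of them score
highly mirrors the reasoning above: each additional high-scoring block adds
evidence that the query belongs to the group.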
-
- For a detailed help file, send a blank e-mail message as follows:
-
- To: blocks@howard.fhcrc.org
- Subject: help
-
- Or just send a protein or DNA sequence in FASTA, Genepro, GenBank, EMBL,
- SWISS-PROT, or PIR format (DNA is automatically translated in all six reading
- frames for searching). Here is an example of a protein query in FASTA format:
-
- To: blocks@howard.fhcrc.org
- Subject:
- >YCZ2_YEAST Hypothetical 40.1 KD protein in HMR 3' region
- MKAVVIEDGKAVVKEGVPIPELEEGFVLIKTLAVAGNPTDWAHIDYKVGPQGSILGCDAA
- GQIVKLGPAVDPKDFSIGDYIYGFIHGSSVRFPSNGAFAEYSAISTVVAYKSPNELKFLG
- EDVLPAGPVRSLEGAATIPVSLTTAGLVLTYNLGLNLKWEPSTPQRNGPILLWGGATAVG
- QSLIQLANKLNGFTKIIVVASRKHEKLLKEYGADQLFDYHDIDVVEQIKHKYNNISYLVD
- CVANQNTLQQVYKCAADKQDATVVELTNLTEENVKKENRRQNVTIDRTRLYSIGGHEVPF
- GGITFPADPEARRAATEFVKFINPKISDGQIHHIPARVYKNGLYDVPRILEDIKIGKNSG
- EKLVAVLN
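
The six-frame translation of DNA queries mentioned above can be sketched in
Python as follows. This is only an illustration under stated assumptions: it
uses the standard genetic code, marks incomplete or ambiguous codons with X,
and the function names and the short DNA string are made up for the example.
It is not the searcher's own code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate both strands at offsets 0, 1 and 2."""
    dna = dna.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Made-up DNA whose first frame encodes MKAV, the start of the query above.
for name, protein in six_frames("ATGAAAGCTGTTTAA").items():
    print(name, protein)
```

Each of the six translations would then be searched as if it were a separate
protein query.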
-
- For any group represented in the current database, the blocks and full
- PROSITE documentation can be obtained by sending the command 'GET BL?????'
- in the subject line of a blank message, where ????? is the five-digit number
- of the PROSITE group. For example, 'get bl00044' retrieves the block(s) and
- the PROSITE.DAT and PROSITE.DOC entries for PROSITE group PS00044.
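
The mapping from a PROSITE accession to the GET command can be sketched in
Python as below. The function names are hypothetical, and the sketch assumes
that accessions always have the form PS followed by five digits, as in the
example above. It only builds the header text and does not send any mail.

```python
import re

BLOCKS_ADDRESS = "blocks@howard.fhcrc.org"  # address given in this announcement


def blocks_get_subject(prosite_accession):
    """Map e.g. 'PS00044' to the subject line 'get bl00044'."""
    match = re.fullmatch(r"PS(\d{5})", prosite_accession.strip().upper())
    if match is None:
        raise ValueError(
            f"expected an accession like PS00044, got {prosite_accession!r}"
        )
    return f"get bl{match.group(1)}"


def request_headers(prosite_accession):
    """Return the To and Subject lines for a blank retrieval message."""
    subject = blocks_get_subject(prosite_accession)
    return f"To: {BLOCKS_ADDRESS}\nSubject: {subject}\n"


print(request_headers("PS00044"))
```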
-
- The Blocks Database has also been used to construct amino acid substitution
- matrices, referred to as the 'BLOSUM' (for BLOcks SUbstitution Matrices)
- series. We have obtained improved results using FASTA, BLASTP and other
- programs with these matrices (see Henikoff, S. & Henikoff, J.G. (1992) "Amino
- acid substitution matrices from protein blocks" PNAS 89:10915-10919). The
- single best overall matrix, BLOSUM 62, is recommended for general use. Here
- is BLOSUM 62 formatted for use with BLAST:
- ------------------------------cut here---------------------------------------
- # BLAST version of BLOSUM 62 matrix made from BLOCKS v. 5.0 and
- # scaled in half-bits. B, Z and X columns are based on amino acid
- # frequencies from SwissProt 22. * column uses minimum score.
-   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
-   4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
-  -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
-  -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
-  -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
-   0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
-  -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
-  -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
-   0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
-  -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
-  -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
-  -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
-  -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
-  -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
-  -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
-  -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
-   1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
-   0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
-  -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
-  -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
-   0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
-  -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
-  -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
-   0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
-  -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
-
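
For readers who want to use the matrix outside BLAST, the following Python
sketch reads the text above and scores an ungapped comparison. It assumes that
the unlabeled rows follow the same residue order as the column header (the
diagonal values support this), and the function names are illustrative only.
Because the scores are in half-bits, a score s corresponds to an odds ratio of
2^(s/2).

```python
MATRIX_TEXT = """\
# BLAST version of BLOSUM 62 matrix made from BLOCKS v. 5.0 and
# scaled in half-bits. B, Z and X columns are based on amino acid
# frequencies from SwissProt 22. * column uses minimum score.
A R N D C Q E G H I L K M F P S T W Y V B Z X *
4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0 -2 -1 0 -4
-1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3 -1 0 -1 -4
-2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 3 0 -1 -4
-2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 4 1 -1 -4
0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
-1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2 0 3 -1 -4
-1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 1 4 -1 -4
0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3 -1 -2 -1 -4
-2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3 0 0 -1 -4
-1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 -3 -3 -1 -4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 -4 -3 -1 -4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2 0 1 -1 -4
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1 -3 -1 -1 -4
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 -3 -3 -1 -4
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2 -2 -1 -2 -4
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2 0 0 0 -4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0 -1 -1 0 -4
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3 -4 -3 -2 -4
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1 -3 -2 -1 -4
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 -3 -2 -1 -4
-2 -1 3 4 -3 0 1 -1 0 -3 -4 0 -3 -3 -2 0 -1 -4 -3 -3 4 1 -1 -4
-1 0 0 1 -3 3 4 -2 0 -3 -3 1 -1 -3 -1 0 -1 -3 -2 -2 1 4 -1 -4
0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1 -1 -1 -4
-4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 1
"""


def parse_matrix(text):
    """Read the matrix into a dict keyed by (row residue, column residue)."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[1:]
    # Assumption: row i belongs to the i-th residue of the header line.
    if len(body) != len(header) or any(len(r) != len(header) for r in body):
        raise ValueError("matrix is not square")
    return {(r, c): int(v)
            for r, row in zip(header, body)
            for c, v in zip(header, row)}


def ungapped_score(matrix, seq_a, seq_b):
    """Sum the pairwise scores of two equal-length, ungapped segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have the same length")
    return sum(matrix[a, b] for a, b in zip(seq_a.upper(), seq_b.upper()))


def half_bits_to_odds(score):
    """Convert a half-bit score to the odds ratio it represents."""
    return 2.0 ** (score / 2.0)


BLOSUM62 = parse_matrix(MATRIX_TEXT)
total = ungapped_score(BLOSUM62, "MKAV", "MRAV")
print(total, "half-bits =", total / 2, "bits")
```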