http://bioinformatics.burnham-inst.org/cd-hi
The program removes redundant sequences and generates a database containing only the representatives, so the output database is much smaller. Using a clustered database can not only save time in database searching and result parsing but also increase search sensitivity.
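To make the idea of keeping only representatives concrete, here is a minimal sketch in Python. It is not the program's actual algorithm, which this page does not describe. It assumes a simple greedy scheme (longest sequence first) and uses `difflib.SequenceMatcher` as a crude stand-in for a real sequence-identity measure. The names `identity` and `pick_representatives`, and the 0.9 threshold, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_representatives(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy clustering sketch (assumption, not the program's documented method).

    Sequences are visited longest first. Each one joins the first existing
    representative it matches at or above the threshold; otherwise it becomes
    a new representative. Returns {representative name: [member names]}.
    """
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Made-up example sequences: "b" is "a" with the last residue removed.
    example = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSR",
        "c": "GGGGGGGGGGPPPPPPPPPP",
    }
    for rep, members in pick_representatives(example).items():
        print(rep, members)
```

The keys of the returned dictionary play the role of the smaller output database: only those sequences would need to be searched, while the member lists record which redundant sequences each representative stands for.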
The program was written by Weizhong Li (UCSD, San Diego Supercomputer Center, La Jolla, CA 92093; email: liwz@sdsc.edu) at Adam Godzik's lab (The Burnham Institute, La Jolla, CA 92037; email: adam@burnham-inst.org).
This program is free to download.