BLAST Program Selection Guide
By blast-help group, NCBI User Service, NCBI, NLM, NIH, 8600 Rockville Pike, Bethesda, MD 20894

Table of Contents
1. Introduction
2. BLAST Database Content
3. Program Selection Table
4. Explanation for the program choices given in Tables 3.1 and 3.2
5. Explanation for the program choices given in Table 3.3
6. Explanation on Special Purpose Pages
7. Appendices

1. Introduction
NCBI has provided BLAST sequence analysis services for over a decade. For many users, the first question they face is "Which BLAST program should I use?" To help users answer this question, we have constructed a table called the "BLAST Program Selection Guide." It is divided into several categories according to the nature and size of the query and the primary goal of the search. Starting from the query sequence on the left and cross-referencing to the right, a user will arrive at the specific BLAST program best suited for that search. This document is also available in PDF (1056656 bytes).

2. BLAST Database Content
To discuss BLAST program selection, we first need to know what databases are available and what sequences they contain. Here we take a look at the common BLAST databases. According to their content, they are grouped into nucleotide and protein databases. These databases and their detailed compositions are listed in the two tables below. NCBI also provides specialized BLAST databases such as the vector screening database, a variety of genome databases for different organisms, and trace databases. The database names are being standardized. Their contents are also described here. This is mostly for the three model organisms, i.e. human, mouse, and rat. For other organisms, the content of their genome BLAST pages will be listed when their special BLAST pages are discussed.

(1 of 24)12/17/2004 12:36:22 PM

Table 2.1 Content of Protein Sequence Databases

Database    Content Description
nr          Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
refseq      Protein sequences from the NCBI Reference Sequence project.
swissprot   Last major release of the SWISS-PROT protein sequence database (no incremental updates).
pat         Proteins from the Patent division of GenBank.
month       All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
pdb         Sequences derived from the 3-dimensional structure records from the Protein Data Bank.
env_nr      Non-redundant CDS translations from env_nt entries.
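
Several database names are reused across the two groups: "nr" and "pat", for example, appear in both Table 2.1 and Table 2.2. The following is a minimal Python sketch of why a database name should be paired with its molecule type. The dictionary, the set, and the function names are illustrative assumptions for this guide and are not part of any NCBI tool or API. The nucleotide names are limited to the ten that Table 2.2 lists by name.

```python
# Database names copied from Tables 2.1 and 2.2 of this guide.
# All structures and function names below are hypothetical, not an NCBI API.

PROTEIN_DBS = {
    "nr": "Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF",
    "refseq": "Protein sequences from the NCBI Reference Sequence project",
    "swissprot": "Last major release of SWISS-PROT (no incremental updates)",
    "pat": "Proteins from the Patent division of GenBank",
    "month": "New or revised entries released in the last 30 days",
    "pdb": "Sequences from Protein Data Bank 3-D structure records",
    "env_nr": "Non-redundant CDS translations from env_nt entries",
}

# Only the names listed in the name column of Table 2.2.
NUCLEOTIDE_DB_NAMES = {
    "nr", "refseq_mrna", "refseq_genomic", "est", "est_human",
    "est_mouse", "est_others", "gss", "htgs", "pat",
}


def shared_names():
    """Return the names that occur in both the protein and nucleotide lists."""
    return sorted(set(PROTEIN_DBS) & NUCLEOTIDE_DB_NAMES)


def resolve(name, molecule):
    """Return (molecule, name) if the database is listed for that molecule type."""
    if molecule == "protein":
        known = set(PROTEIN_DBS)
    elif molecule == "nucleotide":
        known = NUCLEOTIDE_DB_NAMES
    else:
        raise ValueError("molecule must be 'protein' or 'nucleotide'")
    if name not in known:
        raise KeyError(f"{name!r} is not listed as a {molecule} database")
    return (molecule, name)


if __name__ == "__main__":
    print(shared_names())                  # names that need a molecule type
    print(resolve("nr", "protein"))        # the protein nr of Table 2.1
    print(resolve("nr", "nucleotide"))     # the nucleotide nr of Table 2.2
```

Keying on the pair (molecule type, name) rather than the name alone keeps the protein "nr" and the nucleotide "nr" distinct, which matters because their contents differ as described in the two tables.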

Table 2.2 Nucleotide Databases for BLAST

Database         Content Description
nr               All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
refseq_mrna      mRNA sequences from NCBI Reference Sequence Project.
refseq_genomic   Genomic sequences from NCBI Reference Sequence Project.
est              Database of GenBank + EMBL + DDBJ sequences from EST division.
est_human        Human subset of est.
est_mouse        Mouse subset of est.
est_others       Subset of est other than human or mouse.
gss              Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs             Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr.
pat              Nucleotides from the Patent division of GenBank.

Sequences derived from the 3-dimensional structure records from the Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record.
All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days.
Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and...
