Revision as of 11:59, 7 January 2016

Description: To find regions of local similarity between sequences.
SHARCNET Package information: see BLAST software page in web portal
Full list of SHARCNET supported software


SHARCNET provides a multithread-safe build of NCBI BLAST+ compiled from source with the Intel compiler. A serial build of legacy NCBI BLAST 2.2.20 remains on some clusters. Users are encouraged to migrate to BLAST+ as soon as possible.

Version Selection

module load blast/2.2.28+

Once the module is loaded, the following binaries are available in your path by default:

[roberpj@hnd19:/opt/sharcnet/blast/2.2.28+/bin] ls
align_format_unit_test    blastn                    gene_info_unit_test  seqdb_demo
bdbloader_unit_test       blastp                seqdb_perf
blastdb_aliastool         blast_services_unit_test  makeblastdb          seqdb_unit_test
blastdbcheck              blast_unit_test           makembindex          tblastn
blastdbcmd                blastx                    makeprofiledb        tblastx
blastdbcp                 convert2blastmask         psiblast             test_pcre
blastdb_format_unit_test  datatool                  rpsblast   
blast_formatter           deltablast                rpstblastn           windowmasker
blast_format_unit_test    dustmasker                seedtop    
blastinput_unit_test      gene_info_reader          segmasker            writedb_unit_test

Job Submission

BLAST 2.2.20

sqsub -r 1h -o ofile%J blastall -p blastn  -i ech_query.fas -d yeast.nt -o  blastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastx  -i ech_query.fas -d yeast.aa -o  blastx_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastn -i ech_query.fas -d yeast.nt -o tblastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastp  -i ech_query.fas -d yeast.aa -o  blastp_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastx -i ech_query.fas -d yeast.nt -o tblastx_test.typ

BLAST 2.2.28+

makeblastdb -in igSeqProt.fa -dbtype prot -out db/blast/igSeqProt -hash_index
sqsub -r 60m -q threaded -n 4 -o ofile.%J blastp -task blastp -db swissprot \
             -query test.txt -out job.out -evalue 0.001 -num_threads 4

Example Job

Step 1) Prepare blast database:

ssh saw-dev1
gunzip env_nr.gz
mkdir -p db/blast/
makeblastdb -in env_nr -dbtype prot -out db/blast/sorted_env_nr -max_file_sz 500MB
ls db/blast/sorted_env_nr.*.*
db/blast/sorted_env_nr.00.phr  db/blast/sorted_env_nr.01.phr  db/blast/sorted_env_nr.02.phr
db/blast/  db/blast/  db/blast/
db/blast/sorted_env_nr.00.psq  db/blast/sorted_env_nr.01.psq  db/blast/sorted_env_nr.02.psq

Step 2) Create input file:

echo -e ">test\nGGG" > test.txt
cat test.txt

Step 3) Perform a query:

sqsub -r 1h -q threaded -n 4 --mpp=1G -o ofile.%J blastp -query test.txt \
   -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 4
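
The submission above requests as many cores from the scheduler (-n) as blastp is told to use (-num_threads). The following is a minimal sketch of a wrapper that keeps the two values in sync. The helper name blast_sqsub_cmd is hypothetical and not part of SHARCNET or BLAST+; the run time, memory and e-value settings are simply copied from the Example Job. It only prints the command so it can be inspected before being run.

```shell
# Hypothetical helper (not part of SHARCNET or BLAST+): print an sqsub command
# whose core request (-n) always matches the blastp thread count (-num_threads).
blast_sqsub_cmd() {
    threads="$1"   # number of cores and blastp threads
    query="$2"     # FASTA query file
    db="$3"        # database prefix created by makeblastdb
    out="$4"       # blastp output file
    printf 'sqsub -r 1h -q threaded -n %s --mpp=1G -o ofile.%%J ' "$threads"
    printf 'blastp -query %s -db %s -task blastp -out %s -evalue 0.001 -num_threads %s\n' \
        "$query" "$db" "$out" "$threads"
}

# Print the command for the Example Job with 4 threads.
blast_sqsub_cmd 4 test.txt db/blast/sorted_env_nr test.out
```

Once the printed line looks right, it can be pasted into the shell to submit the job.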

General Notes

Env Variables

Create a ~/.ncbirc file in your home directory to set BLAST configuration values such as the BLASTDB database search path.
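
A minimal sketch of such a file, assuming the only setting needed is the BLASTDB database path. The helper name ncbirc_contents and the $HOME/db/blast location are illustrative choices, not SHARCNET defaults:

```shell
# Hypothetical helper: print a minimal NCBI configuration file.
# $1 = directory that holds the BLAST databases (an assumed location).
ncbirc_contents() {
    printf '; NCBI configuration file read by the BLAST+ programs\n'
    printf '[BLAST]\n'
    printf '; directory searched for databases given to -db\n'
    printf 'BLASTDB=%s\n' "$1"
}

# Print the file for inspection; redirect to ~/.ncbirc to install it.
ncbirc_contents "$HOME/db/blast"
```

Saving the output with ncbirc_contents "$HOME/db/blast" > ~/.ncbirc should let -db sorted_env_nr resolve without the db/blast/ prefix.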


Interactive Jobs

When running BLAST from the command line, it is recommended to use iqaluk or one of orca's development nodes. For instance, to run the Example Job interactively on iqaluk with different thread counts and get a timing result for each, do:


time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 1
real	0m7.202s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 4
real	0m2.990s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 16
real	0m1.959s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 32
real	0m2.158s

Scaling Tests

Beyond about 4 threads, additional threads give little speedup relative to the extra resources requested. The following table, generated from jobs run on hound, illustrates the point:

ncpus   real time   mpp
-n 1    12.739s     500M
-n 2     8.834s     1.0G
-n 4     4.051s     1.0G   (optimal)
-n 8     3.737s     1.5G
-n 12    3.004s     2.5G
-n 16    2.790s     3.0G
-n 24    3.929s     3.0G
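
The table can be turned into speedup and parallel efficiency figures to make the trade-off explicit. This is a minimal sketch using the usual definitions, speedup = T(1) / T(n) and efficiency = speedup / n. The helper name scaling_table is hypothetical, and the input is the ncpus and real time columns from the table above:

```shell
# Hypothetical helper: read "ncpus seconds" pairs on stdin and print
# speedup = T(1)/T(n) and efficiency = speedup/n for each row.
# The first input row must be the single-thread timing.
scaling_table() {
    awk 'NR == 1 { t1 = $2 }
         { printf "%2d  %7.3fs  speedup %.2f  efficiency %3.0f%%\n",
                  $1, $2, t1 / $2, 100 * t1 / ($2 * $1) }'
}

# Timings copied from the hound table above.
scaling_table <<'EOF'
1 12.739
2 8.834
4 4.051
8 3.737
12 3.004
16 2.790
24 3.929
EOF
```

Among the multi-threaded runs, efficiency is highest at -n 4, which matches the row marked optimal.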

Cmd Line Args

Command line arguments for any of the blast binaries found under /opt/sharcnet/blast/version/bin can be shown by using the -h switch as follows:

[roberpj@hnd19:~] blastn -h
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-max_hsps_per_subject int_value] [-penalty penalty]
    [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
    [-template_type type] [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]
   Nucleotide-Nucleotide BLAST 2.2.28+
Use '-help' to print detailed descriptions of command line arguments


o Blast Homepage

o BLAST+ Release Notes

o BLAST Command Line Applications User Manual

o BLAST FTP Download Site

o Blast Download Mirror Readme

o RefSeq: NCBI Reference Sequence Database