
Revision as of 15:46, 25 July 2017

BLAST
Description: To find regions of local similarity between sequences.
SHARCNET Package information: see BLAST software page in web portal
Full list of SHARCNET supported software


Note: Some of the information on this page is for our legacy systems only. The page is scheduled for an update to make it applicable to Graham.


Introduction

SHARCNET provides a multithread-safe build of NCBI BLAST+ compiled from source with the Intel compiler. A serial build of legacy NCBI BLAST 2.2.20 remains on some clusters. Users are encouraged to migrate to BLAST+ as soon as possible.

Version Selection

module load blast/2.2.28+

Once the module is loaded, the following binaries are available in your path:

[roberpj@hnd19:/opt/sharcnet/blast/2.2.28+/bin] ls
align_format_unit_test    blastn                    gene_info_unit_test  seqdb_demo
bdbloader_unit_test       blastp                    legacy_blast.pl      seqdb_perf
blastdb_aliastool         blast_services_unit_test  makeblastdb          seqdb_unit_test
blastdbcheck              blast_unit_test           makembindex          tblastn
blastdbcmd                blastx                    makeprofiledb        tblastx
blastdbcp                 convert2blastmask         psiblast             test_pcre
blastdb_format_unit_test  datatool                  rpsblast             update_blastdb.pl
blast_formatter           deltablast                rpstblastn           windowmasker
blast_format_unit_test    dustmasker                seedtop              windowmasker_2.2.22_adapter.py
blastinput_unit_test      gene_info_reader          segmasker            writedb_unit_test

Job Submission

BLAST 2.2.20

sqsub -r 1h -o ofile%J blastall -p blastn  -i ech_query.fas -d yeast.nt -o  blastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastx  -i ech_query.fas -d yeast.aa -o  blastx_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastn -i ech_query.fas -d yeast.nt -o tblastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastp  -i ech_query.fas -d yeast.aa -o  blastp_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastx -i ech_query.fas -d yeast.nt -o tblastx_test.typ

BLAST 2.2.28+

makeblastdb -in igSeqProt.fa -dbtype prot -out db/blast/igSeqProt -hash_index
sqsub -r 60m -q threaded -n 4 -o ofile.%J blastn -task blastn -db swissprot \
             -query test.txt -out job.out -evalue 0.001 -num_threads 4

Example Job

Step 1) Prepare blast database:

ssh saw.sharcnet.ca
ssh saw-dev1
wget http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/FASTA/env_nr.gz
gunzip env_nr.gz
mkdir -p db/blast/
makeblastdb -in env_nr -dbtype prot -out db/blast/sorted_env_nr -max_file_sz 500MB
ls db/blast/sorted_env_nr.*.*
db/blast/sorted_env_nr.00.phr  db/blast/sorted_env_nr.01.phr  db/blast/sorted_env_nr.02.phr
db/blast/sorted_env_nr.00.pin  db/blast/sorted_env_nr.01.pin  db/blast/sorted_env_nr.02.pin
db/blast/sorted_env_nr.00.psq  db/blast/sorted_env_nr.01.psq  db/blast/sorted_env_nr.02.psq

Step 2) Create input file:

echo -e ">test\nGGG" > test.txt
cat test.txt
>test
GGG

Step 3) Perform a query:

sqsub -r 1h -q threaded -n 4 --mpp=1G -o ofile.%J blastp -query test.txt \
   -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 4
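
The Step 3 command gives the thread count twice: once to the scheduler (-n) and once to BLAST (-num_threads). The following is a minimal sketch of a wrapper that keeps the two in sync. The function name blastp_submit_cmd is hypothetical, and the run time, memory and e-value settings are simply copied from Step 3. It only prints the command, so it can be inspected before anything is submitted.

```shell
# Print (do not run) an sqsub command for a threaded blastp search.
# The same thread count is used for sqsub -n and blastp -num_threads.
# Usage: blastp_submit_cmd THREADS QUERY DB OUT
# blastp_submit_cmd is a hypothetical helper, not part of BLAST or sqsub.
blastp_submit_cmd() {
    threads=$1
    query=$2
    db=$3
    out=$4
    # %%J prints a literal %J, which sqsub replaces with the job id.
    printf 'sqsub -r 1h -q threaded -n %s --mpp=1G -o ofile.%%J blastp -query %s -db %s -task blastp -out %s -evalue 0.001 -num_threads %s\n' \
        "$threads" "$query" "$db" "$out" "$threads"
}

blastp_submit_cmd 4 test.txt db/blast/sorted_env_nr test.out
```

Once the printed line looks right, it can be pasted into the shell to submit the job. File names containing spaces are not quoted by this sketch.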

Step 4) Check results in output file test.out:

cat test.out
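
The file written in Step 3 uses BLAST's default report format, which is meant for reading rather than scripting. If the search is rerun with -outfmt 6 added (tabular output; this is an assumption, since the example job does not use it), each hit becomes one tab-separated line with the query ID in column 1 and the bit score in column 12. The sketch below keeps the best-scoring hit per query. The function name best_hits and the file name test.tsv are hypothetical.

```shell
# Keep only the highest-bit-score hit for each query.
# Input: BLAST+ tabular output (-outfmt 6, default columns) on stdin.
# Column 1 is the query ID and column 12 is the bit score.
# best_hits is a hypothetical helper name.
best_hits() {
    awk -F '\t' '
        !($1 in best) || $12 + 0 > score[$1] {
            best[$1]  = $0        # remember the whole line
            score[$1] = $12 + 0   # and its bit score as a number
        }
        END { for (q in best) print best[q] }
    ' | sort
}

# Usage, assuming test.tsv was produced with -outfmt 6:
#   best_hits < test.tsv
```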

General Notes

Env Variables

Create a ~/.ncbirc file in your home directory, for example:

[NCBI]
Data=/opt/sharcnet/blast/version/data
[BLAST]
BLASTDB=/work/username/your/db/directory
BLASTMAT=/opt/sharcnet/blast/current/data
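
As a sketch, the same file can be generated from a script so that the directories are set in one place. The function name write_ncbirc is hypothetical. The three directories are passed as arguments because the paths shown here contain placeholders (version, username) that must be replaced with real values.

```shell
# Write an .ncbirc-style file with the [NCBI] and [BLAST] sections.
# Usage: write_ncbirc TARGET DATA_DIR DB_DIR MATRIX_DIR
# write_ncbirc is a hypothetical helper; TARGET is overwritten if it exists.
write_ncbirc() {
    target=$1
    data_dir=$2
    db_dir=$3
    matrix_dir=$4
    cat > "$target" <<EOF
[NCBI]
Data=$data_dir
[BLAST]
BLASTDB=$db_dir
BLASTMAT=$matrix_dir
EOF
}

# Usage (writes a scratch copy so an existing ~/.ncbirc is left alone):
#   write_ncbirc ./ncbirc.new /opt/sharcnet/blast/version/data \
#       /work/username/your/db/directory /opt/sharcnet/blast/current/data
```

Review the generated file, then move it to ~/.ncbirc.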

Interactive Jobs

When running BLAST from the command line, it is recommended to use iqaluk or one of orca's development nodes. For instance, to run the Example Job interactively on iqaluk with 1, 4, 16 and 32 threads and get a timing result for each, do:

ssh iqaluk.sharcnet.ca

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 1
real	0m7.202s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 4
real	0m2.990s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 16
real	0m1.959s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 32
real	0m2.158s

Scaling Tests

Beyond a few threads, the speedup is small compared to the additional resources required. The following table, generated from jobs run on hound, illustrates the point:

ncpus   real time   mpp
-----------------------
-n 1    12.739s     500M
-n 2     8.834s     1.0G
-n 4     4.051s     1.0G   (optimal)
-n 8     3.737s     1.5G
-n 12    3.004s     2.5G
-n 16    2.790s     3.0G
-n 24    3.929s     3.0G
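
To make the diminishing returns explicit, the timings can be converted to speedup (time with 1 thread divided by time with n threads) and parallel efficiency (speedup divided by n). The following is a minimal sketch using awk. The function name speedup_table is hypothetical, and the input is the ncpus and real time columns copied from the table.

```shell
# Print "ncpus speedup efficiency" for each "ncpus seconds" line on stdin.
# The first line is taken as the baseline and is assumed to be the 1-thread run.
# speedup_table is a hypothetical helper name.
speedup_table() {
    awk 'NR == 1 { base = $2 }
         { s = base / $2; printf "%d %.2f %.2f\n", $1, s, s / $1 }'
}

# Timings from the scaling table (threads, seconds).
speedup_table <<'EOF'
1 12.739
2 8.834
4 4.051
8 3.737
12 3.004
16 2.790
24 3.929
EOF
```

For this data the speedup at 4 threads is 3.14 (efficiency 0.79), while 16 threads reach only 4.57 (efficiency 0.29), which is consistent with -n 4 being marked as optimal.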

Cmd Line Args

Command line arguments for any of the BLAST binaries found under /opt/sharcnet/blast/version/bin can be shown with the -h switch, for example:

[roberpj@hnd19:~] blastn -h
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-max_hsps_per_subject int_value] [-penalty penalty]
    [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
    [-template_type type] [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]
DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.2.28+
Use '-help' to print detailed descriptions of command line arguments

References

o BLAST Homepage
http://blast.ncbi.nlm.nih.gov/Blast.cgi

o BLAST+ Release Notes
http://www.ncbi.nlm.nih.gov/books/NBK131777/

o BLAST Command Line Applications User Manual
http://www.ncbi.nlm.nih.gov/books/NBK1763/

o BLAST FTP Download Site
http://www.ncbi.nlm.nih.gov/books/NBK62345/

o BLAST Download Mirror Readme
http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/README

o RefSeq: NCBI Reference Sequence Database
http://www.ncbi.nlm.nih.gov/refseq/