Latest revision as of 09:04, 6 June 2019

This page is scheduled for deletion because it is either redundant with information available on the CC wiki, or the software is no longer supported.
BLAST
Description: To find regions of local similarity between sequences.
SHARCNET Package information: see BLAST software page in web portal
Full list of SHARCNET supported software


Introduction

SHARCNET provides a multithread-safe build of NCBI BLAST+ compiled from source. A serial build of legacy NCBI BLAST 2.2.20 remains on some clusters. Users are encouraged to migrate to BLAST+ as soon as possible.

Which BLAST Program to Use

On both national platforms (e.g. Graham, Cedar) and legacy systems, once the module is loaded, the following table shows which BLAST program to use:

Program   Query Type   DB Type      Comparison              Note
----------------------------------------------------------------------------------------------------------------
blastn    Nucleotide   Nucleotide   Nucleotide-Nucleotide
blastp    Protein      Protein      Protein-Protein
tblastn   Protein      Nucleotide   Protein-Protein         The database is translated into protein
blastx    Nucleotide   Protein      Protein-Protein         The queries are translated into protein
tblastx   Nucleotide   Nucleotide   Protein-Protein         The queries and database are translated into protein
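
The table above can also be expressed as a small lookup, which may be handy in wrapper scripts. This is only a sketch: the function name pick_blast_program and the "nucl"/"prot"/"translate_both" keywords are illustrative choices, not part of BLAST+.

```shell
#!/bin/sh
# Sketch: map query/database sequence types to a BLAST+ program name,
# following the table above. The function name and keywords are hypothetical.
# Usage: pick_blast_program QUERY_TYPE DB_TYPE [translate_both]
pick_blast_program() {
    case "$1:$2:${3:-}" in
        nucl:nucl:)               echo blastn ;;
        prot:prot:)               echo blastp ;;
        prot:nucl:)               echo tblastn ;;
        nucl:prot:)               echo blastx ;;
        nucl:nucl:translate_both) echo tblastx ;;
        *) echo "unsupported combination: $*" >&2; return 1 ;;
    esac
}

pick_blast_program nucl prot   # prints: blastx
```

The third argument is needed only to tell tblastx apart from blastn, since both take a nucleotide query and a nucleotide database.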

Usage on National Platforms (i.e. Graham, Cedar)

Version Selection

Loading version 2.6.0+:

$ module load gcc/5.4.0 blast+/2.6.0

Loading version 2.7.1+:

$ module load gcc/7.3.0 blast+/2.7.1

Blast DB

On national platforms, the BLAST databases are available at the following path, as described in more detail on the Compute Canada documentation wiki:

 /cvmfs/ref.mugqic/genomes/blast_db
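
As a convenience, BLAST+ also searches the directory named in the BLASTDB environment variable, so a database can then be given by name only. The following is a minimal sketch that assumes the path above is mounted on the current host; the listing step is skipped when it is not.

```shell
#!/bin/sh
# Sketch: point BLAST+ at the shared database directory so that databases
# can be named without their full path (for example "-db nr").
export BLASTDB=/cvmfs/ref.mugqic/genomes/blast_db

# List the databases found there, if BLAST+ is loaded and the path is mounted.
if command -v blastdbcmd >/dev/null 2>&1 && [ -d "$BLASTDB" ]; then
    blastdbcmd -list "$BLASTDB"
else
    echo "blastdbcmd or $BLASTDB not available on this host" >&2
fi
```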

Running a Sample Interactive Query

Step 1) Create input file:

$ echo -e ">test\nGGG" > test.txt
$ cat test.txt
>test
GGG

Step 2) Start an interactive session:

$ salloc --time=1:0:0 --mem=1G --ntasks=1 --cpus-per-task=4 --account=def-someuser

Step 3) Perform a query:

$ blastp -query test.txt -db /cvmfs/ref.mugqic/genomes/blast_db/nr -task blastp -out test.out  -evalue 0.001 -num_threads 4

Step 4) Check results in output file test.out:

$ cat test.out

Step 5) End the interactive session:

$ exit
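
The same query can presumably be run non-interactively as a batch job. The sketch below writes a Slurm job script that reuses the resource requests, module versions and paths from the steps above; the file name blast_job.sh is an arbitrary choice, and def-someuser is the same placeholder account used in Step 2.

```shell
#!/bin/sh
# Sketch: write a Slurm batch script that mirrors the interactive steps.
# The file name blast_job.sh is an arbitrary choice for illustration.
cat > blast_job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=1:0:0
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --account=def-someuser

module load gcc/7.3.0 blast+/2.7.1

# Match the thread count to the cores Slurm granted (falls back to 1).
blastp -query test.txt -db /cvmfs/ref.mugqic/genomes/blast_db/nr \
    -task blastp -out test.out -evalue 0.001 \
    -num_threads "${SLURM_CPUS_PER_TASK:-1}"
EOF

echo "wrote blast_job.sh; submit it with: sbatch blast_job.sh"
```

Taking the thread count from SLURM_CPUS_PER_TASK keeps -num_threads in step with --cpus-per-task if the request is changed later.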

Usage on Legacy Systems

Version Selection

$ module load blast/2.6.0+

Job Submission

BLAST 2.2.20

sqsub -r 1h -o ofile%J blastall -p blastn  -i ech_query.fas -d yeast.nt -o  blastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastx  -i ech_query.fas -d yeast.aa -o  blastx_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastn -i ech_query.fas -d yeast.nt -o tblastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastp  -i ech_query.fas -d yeast.aa -o  blastp_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastx -i ech_query.fas -d yeast.nt -o tblastx_test.typ
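
For users migrating from the legacy commands above, the most common blastall options have direct BLAST+ counterparts. The following sketch translates only -p, -i, -d, -o, -e and -a; the function name blastall_to_plus is hypothetical, and any other option is rejected rather than guessed at.

```shell
#!/bin/sh
# Sketch: translate a few common legacy blastall options to BLAST+ syntax.
# Only -p, -i, -d, -o, -e and -a are handled; the function name is
# hypothetical and other blastall options are rejected.
blastall_to_plus() {
    OPTIND=1
    prog="" args=""
    while getopts "p:i:d:o:e:a:" opt; do
        case "$opt" in
            p) prog="$OPTARG" ;;                        # program name becomes the binary
            i) args="$args -query $OPTARG" ;;           # input query file
            d) args="$args -db $OPTARG" ;;              # database name
            o) args="$args -out $OPTARG" ;;             # output file
            e) args="$args -evalue $OPTARG" ;;          # expectation value cutoff
            a) args="$args -num_threads $OPTARG" ;;     # number of processors
            *) return 1 ;;
        esac
    done
    [ -n "$prog" ] || { echo "missing -p program" >&2; return 1; }
    echo "$prog$args"
}

blastall_to_plus -p blastn -i ech_query.fas -d yeast.nt -o blastn_test.typ
# prints: blastn -query ech_query.fas -db yeast.nt -out blastn_test.typ
```

Note that the BLAST+ blastn binary defaults to the megablast task, so adding -task blastn may be needed to stay close to the legacy blastn behaviour.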

BLAST 2.6.0+

makeblastdb -in igSeqProt.fa -dbtype prot -out db/blast/igSeqProt -hash_index
sqsub -r 60m -q threaded -n 4 -o ofile.%J blastp -task blastp -db swissprot \
             -query test.txt -out job.out -evalue 0.001 -num_threads 4

Example Job

Step 1) Prepare blast database:

ssh saw.sharcnet.ca
ssh saw-dev1
wget http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/FASTA/env_nr.gz
gunzip env_nr.gz
mkdir -p db/blast/
makeblastdb -in env_nr -dbtype prot -out db/blast/sorted_env_nr -max_file_sz 500MB
ls db/blast/sorted_env_nr.*.*
db/blast/sorted_env_nr.00.phr  db/blast/sorted_env_nr.01.phr  db/blast/sorted_env_nr.02.phr
db/blast/sorted_env_nr.00.pin  db/blast/sorted_env_nr.01.pin  db/blast/sorted_env_nr.02.pin
db/blast/sorted_env_nr.00.psq  db/blast/sorted_env_nr.01.psq  db/blast/sorted_env_nr.02.psq

Step 2) Create input file:

echo -e ">test\nGGG" > test.txt
cat test.txt
>test
GGG

Step 3) Perform a query:

sqsub -r 1h -q threaded -n 4 --mpp=1G -o ofile.%J blastp -query test.txt \
   -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 4

Step 4) Check results in output file test.out:

cat test.out
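
Before submitting the query in Step 3, it can help to confirm that every database volume written in Step 1 is complete. A minimal sketch, assuming a protein database with the .phr/.pin/.psq file layout shown in Step 1; the function name count_protein_db_volumes is hypothetical.

```shell
#!/bin/sh
# Sketch: count the volumes of a protein BLAST database by looking for the
# .phr/.pin/.psq triplets that makeblastdb writes.
# The function name is hypothetical.
count_protein_db_volumes() {
    prefix="$1"
    count=0
    # Covers both a single-volume database and numbered volumes (.00, .01, ...).
    for pin in "$prefix".pin "$prefix".[0-9][0-9].pin; do
        [ -f "$pin" ] || continue
        base="${pin%.pin}"
        # A usable volume needs all three files.
        if [ -f "$base.phr" ] && [ -f "$base.psq" ]; then
            count=$((count + 1))
        else
            echo "incomplete volume: $base" >&2
        fi
    done
    echo "$count"
}

count_protein_db_volumes db/blast/sorted_env_nr
```

For the file listing shown in Step 1 this reports 3 volumes; a result of 0 suggests the -db prefix is wrong or makeblastdb did not finish.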

General Notes

Env Variables

Create a ~/.ncbirc file in your home directory, for example:

[NCBI]
Data=/opt/sharcnet/blast/version/data
[BLAST]
BLASTDB=/work/username/your/db/directory
BLASTMAT=/opt/sharcnet/blast/current/data
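
The file can also be generated from a script, which avoids typing the paths twice. This sketch uses a hypothetical helper, write_ncbirc, and assumes that Data and BLASTMAT point at the same data directory; the paths passed in are the placeholders from the example above and must be replaced with real locations.

```shell
#!/bin/sh
# Sketch: generate an .ncbirc like the one above from a few arguments.
# write_ncbirc is a hypothetical helper. Assumption: Data and BLASTMAT
# refer to the same data directory.
# Usage: write_ncbirc TARGET_FILE DB_DIR DATA_DIR
write_ncbirc() {
    target="$1" dbdir="$2" datadir="$3"
    cat > "$target" <<EOF
[NCBI]
Data=$datadir
[BLAST]
BLASTDB=$dbdir
BLASTMAT=$datadir
EOF
}

# Written to the current directory here to avoid clobbering an existing ~/.ncbirc.
write_ncbirc ./ncbirc.example /work/username/your/db/directory /opt/sharcnet/blast/current/data
cat ./ncbirc.example
```

After checking the result, copy it to ~/.ncbirc.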

Interactive Jobs

When running BLAST from the command line, it is recommended to use iqaluk, or an interactive session (started with the 'salloc' command) on national platforms (e.g. Graham, Cedar). For instance, to run the Example Job interactively on iqaluk and time it with 1, 4, 16 and 32 threads:

ssh iqaluk.sharcnet.ca

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 1
real	0m7.202s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 4
real	0m2.990s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 16
real	0m1.959s

time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out  -evalue 0.001 -num_threads 32
real	0m2.158s

Scaling Tests

Beyond about 4 threads, the speedup is small relative to the additional resources required. The following table, generated from jobs run on hound, illustrates the point:

ncpus   real time   mpp
-----------------------
-n 1    12.739s     500M
-n 2     8.834s     1.0G
-n 4     4.051s     1.0G   (optimal)
-n 8     3.737s     1.5G
-n 12    3.004s     2.5G
-n 16    2.790s     3.0G
-n 24    3.929s     3.0G
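
To make the trend in the table easier to read, speedup and parallel efficiency can be derived from the timings, taking speedup as T(1)/T(n) and efficiency as speedup/n. The sketch below does this with awk; the function name scaling_report is illustrative, and the input is simply the ncpus and real time columns from the table above.

```shell
#!/bin/sh
# Sketch: speedup and parallel efficiency from the timings in the table above.
# speedup = T(1) / T(n); efficiency = speedup / n.
# Input: one "ncpus seconds" pair per line, with the 1-thread run first.
scaling_report() {
    awk 'NR == 1 { t1 = $2 }
         { s = t1 / $2
           printf "%-3d %6.3fs  speedup %.2f  efficiency %.2f\n", $1, $2, s, s / $1 }'
}

scaling_report <<'EOF'
1  12.739
2  8.834
4  4.051
8  3.737
12 3.004
16 2.790
24 3.929
EOF
```

With these numbers, -n 4 gives a speedup of about 3.1, while efficiency falls below 0.5 from 8 threads onward, which is consistent with -n 4 being marked as optimal.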

Cmd Line Args

Command line arguments for any of the blast binaries found under /opt/sharcnet/blast/version/bin can be shown by using the -h switch as follows:

[roberpj@hnd19:~] blastn -h
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-max_hsps_per_subject int_value] [-penalty penalty]
    [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
    [-template_type type] [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]
DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.2.28+
Use '-help' to print detailed descriptions of command line arguments

References

o Blast Homepage
http://blast.ncbi.nlm.nih.gov/Blast.cgi

o BLAST+ Release Notes
http://www.ncbi.nlm.nih.gov/books/NBK131777/

o BLAST Command Line Applications User Manual
http://www.ncbi.nlm.nih.gov/books/NBK1763/

o BLAST FTP Download Site
http://www.ncbi.nlm.nih.gov/books/NBK62345/

o Blast Download Mirror Readme
http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/README

o RefSeq: NCBI Reference Sequence Database
http://www.ncbi.nlm.nih.gov/refseq/