Latest revision as of 10:04, 6 June 2019
This page is scheduled for deletion because it is either redundant with information available on the CC wiki, or the software is no longer supported.
BLAST |
---|
Description: To find regions of local similarity between sequences. |
SHARCNET Package information: see BLAST software page in web portal |
Full list of SHARCNET supported software |
Introduction
SHARCNET provides a multithread-safe build of NCBI BLAST+ compiled from source. A serial build of legacy NCBI BLAST 2.2.20 remains on some clusters. Users are advised to migrate to BLAST+ as soon as possible.
Which BLAST Program to Use
On both national platforms (e.g. graham, cedar) and legacy systems, once the module is loaded, use the following table to choose the appropriate BLAST program:
Program | Query Type | DB Type | Comparison | Note |
---|---|---|---|---|
blastn | Nucleotide | Nucleotide | Nucleotide-Nucleotide | |
blastp | Protein | Protein | Protein-Protein | |
tblastn | Protein | Nucleotide | Protein-Protein | The database is translated into protein |
blastx | Nucleotide | Protein | Protein-Protein | The queries are translated into protein |
tblastx | Nucleotide | Nucleotide | Protein-Protein | The queries and database are translated into protein |
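The table above can be read as a lookup from query and database type to program name. A minimal sketch of that lookup, assuming a hypothetical helper named pick_blast_program (not part of BLAST+); for nucleotide against nucleotide it returns blastn, with tblastx left as the translated alternative:

```shell
# pick_blast_program is a hypothetical helper that mirrors the table above.
# Arguments: query type and database type, each "nucleotide" or "protein".
pick_blast_program() {
    query_type=$1
    db_type=$2
    case "$query_type:$db_type" in
        nucleotide:nucleotide) echo "blastn" ;;   # tblastx compares translations instead
        protein:protein)       echo "blastp" ;;
        protein:nucleotide)    echo "tblastn" ;;  # database is translated
        nucleotide:protein)    echo "blastx" ;;   # queries are translated
        *) echo "unknown sequence types: $query_type / $db_type" >&2
           return 1 ;;
    esac
}

# A protein query against a nucleotide database:
pick_blast_program protein nucleotide   # prints "tblastn"
```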
Usage on National Platforms (e.g. Graham, Cedar)
Version Selection
Loading version 2.6.0+:
$ module load gcc/5.4.0 blast+/2.6.0
Loading version 2.7.1+:
$ module load gcc/7.3.0 blast+/2.7.1
Blast DB
On national platforms, the BLAST databases are available at the following path, as described in more detail on the Compute Canada documentation wiki:
/cvmfs/ref.mugqic/genomes/blast_db
Running a Sample Interactive Query
Step 1) Create input file:
$ echo -e ">test\nGGG" >> test.txt
$ cat test.txt
>test
GGG
Step 2) Start an interactive session:
$ salloc --time=1:0:0 --mem=1G --ntasks=1 --cpus-per-task=4 --account=def-someuser
Step 3) Perform a query:
$ blastp -query test.txt -db /cvmfs/ref.mugqic/genomes/blast_db/nr -task blastp -out test.out -evalue 0.001 -num_threads 4
Step 4) Check results in output file test.out:
$ cat test.out
Step 5) End the interactive session:
$ exit
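The interactive steps above could also be run as a batch job. The following is a sketch only, not a SHARCNET-provided script: it reuses the resource options from the salloc example as #SBATCH directives, keeps def-someuser as a placeholder account, and assumes test.txt already exists in the submission directory. It takes the thread count from SLURM_CPUS_PER_TASK so that -num_threads matches the allocation.

```shell
#!/bin/bash
# def-someuser below is a placeholder account, as in the salloc example.
#SBATCH --time=1:0:0
#SBATCH --mem=1G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --account=def-someuser

# Load BLAST+ when the module command is available (i.e. on the cluster).
if type module >/dev/null 2>&1; then
    module load gcc/7.3.0 blast+/2.7.1
fi

# Match the BLAST thread count to the cores Slurm granted (1 if unset).
threads=${SLURM_CPUS_PER_TASK:-1}
db=/cvmfs/ref.mugqic/genomes/blast_db/nr

if command -v blastp >/dev/null 2>&1 && [ -f test.txt ]; then
    blastp -query test.txt -db "$db" -task blastp -out test.out \
           -evalue 0.001 -num_threads "$threads"
else
    echo "blastp or test.txt not found; nothing was run" >&2
fi
```

Saved under a name such as blast_job.sh (a hypothetical filename), it would be submitted with sbatch blast_job.sh, and the results would appear in test.out as in Step 4.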
Usage on Legacy Systems
Version Selection
$ module load blast/2.6.0+
Job Submission
BLAST 2.2.20
sqsub -r 1h -o ofile%J blastall -p blastn -i ech_query.fas -d yeast.nt -o blastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastx -i ech_query.fas -d yeast.aa -o blastx_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastn -i ech_query.fas -d yeast.nt -o tblastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastp -i ech_query.fas -d yeast.aa -o blastp_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastx -i ech_query.fas -d yeast.nt -o tblastx_test.typ
BLAST 2.6.0+
makeblastdb -in igSeqProt.fa -dbtype prot -out db/blast/igSeqProt -hash_index
sqsub -r 60m -q threaded -n 4 -o ofile.%J blastp -task blastp -db swissprot -query test.txt -out job.out -evalue 0.001 -num_threads 4
Example Job
Step 1) Prepare blast database:
ssh saw.sharcnet.ca
ssh saw-dev1
wget http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/FASTA/env_nr.gz
gunzip env_nr.gz
mkdir -p db/blast/
makeblastdb -in env_nr -dbtype prot -out db/blast/sorted_env_nr -max_file_sz 500MB
ls db/blast/sorted_env_nr.*.*
db/blast/sorted_env_nr.00.phr db/blast/sorted_env_nr.01.phr db/blast/sorted_env_nr.02.phr
db/blast/sorted_env_nr.00.pin db/blast/sorted_env_nr.01.pin db/blast/sorted_env_nr.02.pin
db/blast/sorted_env_nr.00.psq db/blast/sorted_env_nr.01.psq db/blast/sorted_env_nr.02.psq
Step 2) Create input file:
echo -e ">test\nGGG" >> test.txt
cat test.txt
>test
GGG
Step 3) Perform a query:
sqsub -r 1h -q threaded -n 4 --mpp=1G -o ofile.%J blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 4
Step 4) Check results in output file test.out:
cat test.out
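Before submitting the query in Step 3, it can help to confirm that makeblastdb in Step 1 produced a complete set of volume files. A small sketch, assuming a hypothetical helper named check_blast_db (not a BLAST+ tool) that checks each .phr volume of a protein database for its matching .pin and .psq files:

```shell
# check_blast_db is a hypothetical helper: for a protein database prefix,
# confirm that every volume has its .phr, .pin and .psq files.
check_blast_db() {
    prefix=$1
    status=0
    found=0
    # Multi-volume databases use prefix.NN.phr; single-volume ones use prefix.phr
    for phr in "$prefix".*.phr "$prefix".phr; do
        [ -e "$phr" ] || continue      # skip glob patterns that matched nothing
        found=1
        base=${phr%.phr}
        for ext in pin psq; do
            if [ ! -e "$base.$ext" ]; then
                echo "missing: $base.$ext" >&2
                status=1
            fi
        done
    done
    if [ "$found" -eq 0 ]; then
        echo "no volumes found for $prefix" >&2
        status=1
    fi
    return "$status"
}

# Check the database built in Step 1.
if check_blast_db db/blast/sorted_env_nr; then
    echo "database looks complete"
fi
```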
General Notes
Env Variables
Create a ~/.ncbirc file in your home account such as:
[NCBI]
Data=/opt/sharcnet/blast/version/data
[BLAST]
BLASTDB=/work/username/your/db/directory
BLASTMAT=/opt/sharcnet/blast/current/data
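The file can also be generated from a script. The sketch below assumes a hypothetical helper named write_ncbirc and, for simplicity, uses one data directory for both Data and BLASTMAT (the listing above uses two different paths). It writes to a scratch file by default in the example so that an existing ~/.ncbirc is not overwritten.

```shell
# write_ncbirc is a hypothetical helper that writes an .ncbirc-style file.
# Arguments: database directory, BLAST data directory, output file
# (the output file defaults to ~/.ncbirc).
write_ncbirc() {
    db_dir=$1
    data_dir=$2
    out=${3:-$HOME/.ncbirc}
    cat > "$out" <<EOF
[NCBI]
Data=$data_dir
[BLAST]
BLASTDB=$db_dir
BLASTMAT=$data_dir
EOF
}

# Placeholder paths taken from the text; replace them with your own.
write_ncbirc /work/username/your/db/directory \
             /opt/sharcnet/blast/current/data \
             ./ncbirc.example
cat ./ncbirc.example
```

With BLASTDB set this way, a database can be given to -db by name rather than by full path.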
Interactive Jobs
When running BLAST from the command line, it is recommended to use iqaluk, or an interactive session (the 'salloc' command) on national platforms (e.g. graham, cedar). For instance, to run the Example Job interactively on iqaluk with 1, 4, 16 and 32 threads and compare the timings:
ssh iqaluk.sharcnet.ca
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 1
real 0m7.202s
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 4
real 0m2.990s
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 16
real 0m1.959s
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 32
real 0m2.158s
Scaling Tests
At larger thread counts the speedup is small relative to the additional resources required, and at 24 threads the run time increases again. The following table, generated from jobs run on hound, illustrates the point:
ncpus | real time | mpp |
---|---|---|
-n 1 | 12.739s | 500M |
-n 2 | 8.834s | 1.0G |
-n 4 | 4.051s | 1.0G (optimal) |
-n 8 | 3.737s | 1.5G |
-n 12 | 3.004s | 2.5G |
-n 16 | 2.790s | 3.0G |
-n 24 | 3.929s | 3.0G |
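One way to read the table is in terms of speedup (single-thread time divided by n-thread time) and parallel efficiency (speedup divided by n). A minimal sketch using awk, with a hypothetical helper named speedup and the timings taken from the table above:

```shell
# speedup is a hypothetical helper: given the 1-thread time, the n-thread
# time and n, print the speedup and the parallel efficiency.
speedup() {
    awk -v t1="$1" -v tn="$2" -v n="$3" \
        'BEGIN { s = t1 / tn; printf "%.2f %.2f\n", s, s / n }'
}

speedup 12.739 4.051 4     # -n 4 row:  prints "3.14 0.79"
speedup 12.739 2.790 16    # -n 16 row: prints "4.57 0.29"
```

Going from 4 to 16 threads raises the speedup only from about 3.1 to about 4.6 while the efficiency falls from about 0.79 to about 0.29, which is consistent with the 4-thread row being marked optimal.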
Cmd Line Args
Command line arguments for any of the blast binaries found under /opt/sharcnet/blast/version/bin can be shown by using the -h switch as follows:
[roberpj@hnd19:~] blastn -h
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-max_hsps_per_subject int_value] [-penalty penalty]
    [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
    [-template_type type] [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]
DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.2.28+
Use '-help' to print detailed descriptions of command line arguments
References
o Blast Homepage
http://blast.ncbi.nlm.nih.gov/Blast.cgi
o BLAST+ Release Notes
http://www.ncbi.nlm.nih.gov/books/NBK131777/
o BLAST Command Line Applications User Manual
http://www.ncbi.nlm.nih.gov/books/NBK1763/
o BLAST FTP Download Site
http://www.ncbi.nlm.nih.gov/books/NBK62345/
o Blast Download Mirror Readme
http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/README
o RefSeq: NCBI Reference Sequence Database
http://www.ncbi.nlm.nih.gov/refseq/