BLAST |
---|
Description: To find regions of local similarity between sequences. |
SHARCNET Package information: see BLAST software page in web portal |
Introduction
SHARCNET provides a multithread-safe build of NCBI BLAST+ compiled from source with the Intel compiler. A serial build of legacy NCBI BLAST 2.2.20 remains on some clusters. Users are encouraged to migrate to BLAST+ as soon as possible.
Version Selection
module load blast/2.2.28+
Once the module is loaded the following binaries will be available by default in your path:
[roberpj@hnd19:/opt/sharcnet/blast/2.2.28+/bin] ls
align_format_unit_test    blastn                    gene_info_unit_test  seqdb_demo
bdbloader_unit_test       blastp                    legacy_blast.pl      seqdb_perf
blastdb_aliastool         blast_services_unit_test  makeblastdb          seqdb_unit_test
blastdbcheck              blast_unit_test           makembindex          tblastn
blastdbcmd                blastx                    makeprofiledb        tblastx
blastdbcp                 convert2blastmask         psiblast             test_pcre
blastdb_format_unit_test  datatool                  rpsblast             update_blastdb.pl
blast_formatter           deltablast                rpstblastn           windowmasker
blast_format_unit_test    dustmasker                seedtop              windowmasker_2.2.22_adapter.py
blastinput_unit_test      gene_info_reader          segmasker            writedb_unit_test
Job Submission
BLAST 2.2.20
sqsub -r 1h -o ofile%J blastall -p blastn -i ech_query.fas -d yeast.nt -o blastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastx -i ech_query.fas -d yeast.aa -o blastx_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastn -i ech_query.fas -d yeast.nt -o tblastn_test.typ
sqsub -r 1h -o ofile%J blastall -p blastp -i ech_query.fas -d yeast.aa -o blastp_test.typ
sqsub -r 1h -o ofile%J blastall -p tblastx -i ech_query.fas -d yeast.nt -o tblastx_test.typ
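The five submissions above differ only in the program name, the database and the output file. A minimal sketch of generating them in a loop follows. The function name `legacy_blast_cmds` is hypothetical (it is not part of BLAST or SHARCNET), and the function only prints the commands so they can be reviewed before anything is submitted.

```shell
#!/bin/sh
# Print (do not run) one sqsub line per legacy BLAST program.
# legacy_blast_cmds is a hypothetical helper name used for illustration.
legacy_blast_cmds() {
    query=ech_query.fas
    for prog in blastn blastx tblastn blastp tblastx; do
        # blastx and blastp search a protein database (yeast.aa);
        # blastn, tblastn and tblastx search a nucleotide database (yeast.nt).
        case "$prog" in
            blastx|blastp) db=yeast.aa ;;
            *)             db=yeast.nt ;;
        esac
        echo "sqsub -r 1h -o ofile%J blastall -p $prog -i $query -d $db -o ${prog}_test.typ"
    done
}

legacy_blast_cmds
```

On a cluster where `sqsub` and `blastall` are available, piping the output to `sh` would submit the jobs.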
BLAST 2.2.28+
makeblastdb -in igSeqProt.fa -dbtype prot -out db/blast/igSeqProt -hash_index
sqsub -r 60m -q threaded -n 4 -o ofile.%J blastp -task blastp -db swissprot -query test.txt -out job.out -evalue 0.001 -num_threads 4
Example Job
Step 1) Prepare blast database:
ssh saw.sharcnet.ca
ssh saw-dev1
wget http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/FASTA/env_nr.gz
gunzip env_nr.gz
mkdir -p db/blast/
makeblastdb -in env_nr -dbtype prot -out db/blast/sorted_env_nr -max_file_sz 500MB
ls db/blast/sorted_env_nr.*.*
db/blast/sorted_env_nr.00.phr  db/blast/sorted_env_nr.01.phr  db/blast/sorted_env_nr.02.phr
db/blast/sorted_env_nr.00.pin  db/blast/sorted_env_nr.01.pin  db/blast/sorted_env_nr.02.pin
db/blast/sorted_env_nr.00.psq  db/blast/sorted_env_nr.01.psq  db/blast/sorted_env_nr.02.psq
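With `-max_file_sz`, the database is split into numbered volumes, each with a `.phr`, `.pin` and `.psq` file as in the listing above. As a rough sanity check before submitting a job, one could confirm that every volume is complete. The sketch below assumes the multi-volume protein layout shown above (`PREFIX.NN.ext`); `check_blastdb_volumes` is a hypothetical helper name, not a BLAST tool.

```shell
#!/bin/sh
# check_blastdb_volumes PREFIX
# For every PREFIX.NN.psq volume, verify that the matching .phr and .pin exist.
# Returns 0 if all volumes are complete, 1 if a file is missing or no volume is found.
check_blastdb_volumes() {
    prefix=$1
    found=0
    status=0
    for psq in "$prefix".*.psq; do
        [ -e "$psq" ] || continue        # the glob matched nothing
        found=1
        base=${psq%.psq}
        for ext in phr pin; do
            if [ ! -e "$base.$ext" ]; then
                echo "missing: $base.$ext" >&2
                status=1
            fi
        done
    done
    if [ "$found" -ne 1 ]; then
        echo "no volumes found for $prefix" >&2
        return 1
    fi
    return "$status"
}

# Example: check_blastdb_volumes db/blast/sorted_env_nr && echo "database looks complete"
```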
Step 2) Create input file:
echo -e ">test\nGGG" >> test.txt
cat test.txt
>test
GGG
Step 3) Perform a query:
sqsub -r 1h -q threaded -n 4 --mpp=1G -o ofile.%J blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 4
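In the submission above, the core count requested from the scheduler (`-n`) and the BLAST thread count (`-num_threads`) have to agree. One way to keep them in step is to derive both from a single variable. The sketch below only prints the command; `blastp_submit_cmd` is a hypothetical name, and the database path assumes the `db/blast/sorted_env_nr` prefix created in Step 1.

```shell
#!/bin/sh
# blastp_submit_cmd [THREADS]
# Print an sqsub line whose -n and -num_threads values always match.
# THREADS defaults to 4, the value used in the example job.
blastp_submit_cmd() {
    threads=${1:-4}
    echo "sqsub -r 1h -q threaded -n $threads --mpp=1G -o ofile.%J" \
         "blastp -query test.txt -db db/blast/sorted_env_nr -task blastp" \
         "-out test.out -evalue 0.001 -num_threads $threads"
}

blastp_submit_cmd 4
```

The memory request (`--mpp=1G`) is left fixed here; it may need raising for higher thread counts.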
General Notes
Env Variables
Create a ~/.ncbirc file in your home directory, for example:
[NCBI]
Data=/opt/sharcnet/blast/version/data
[BLAST]
BLASTDB=/work/username/your/db/directory
BLASTMAT=/opt/sharcnet/blast/current/data
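The configuration file can also be generated from a script, which is convenient when the database directory changes between projects. This is a minimal sketch: `write_ncbirc` is a hypothetical helper name, and `version` in the `Data` path is the placeholder from the text above, to be replaced with the module version in use.

```shell
#!/bin/sh
# write_ncbirc FILE DBDIR
# Write a BLAST configuration file to FILE with BLASTDB set to DBDIR.
# The "version" path component is a placeholder carried over from the text.
write_ncbirc() {
    target=$1
    dbdir=$2
    cat > "$target" <<EOF
[NCBI]
Data=/opt/sharcnet/blast/version/data
[BLAST]
BLASTDB=$dbdir
BLASTMAT=/opt/sharcnet/blast/current/data
EOF
}

# Example: write_ncbirc "$HOME/.ncbirc" /work/username/your/db/directory
```

Note that the function overwrites the target file, so an existing ~/.ncbirc should be backed up first.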
Interactive Jobs
When running BLAST from the command line, it is recommended to use iqaluk or one of orca's development nodes. For instance, to run the Example Job interactively on iqaluk with 1, 4, 16 and 32 threads and time each run, do:
ssh iqaluk.sharcnet.ca
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 1
real 0m7.202s
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 4
real 0m2.990s
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 16
real 0m1.959s
time blastp -query test.txt -db db/blast/sorted_env_nr -task blastp -out test.out -evalue 0.001 -num_threads 32
real 0m2.158s
Scaling Tests
At larger thread counts the speedup is minimal compared to the additional resources required. The following table, generated from jobs run on hound, illustrates the point:
ncpus    real time    mpp
---------------------------------
-n 1     12.739s      500M
-n 2     8.834s       1.0G
-n 4     4.051s       1.0G (optimal)
-n 8     3.737s       1.5G
-n 12    3.004s       2.5G
-n 16    2.790s       3.0G
-n 24    3.929s       3.0G
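To make the trade-off concrete, the timings in the table can be turned into speedup (time on 1 core divided by time on n cores) and parallel efficiency (speedup divided by n). The sketch below does this with awk on the values copied from the table; `scaling_summary` is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# Print "ncpus speedup efficiency" for the hound timings in the table above.
# speedup = T(1) / T(n); efficiency = speedup / n.
scaling_summary() {
    awk 'NR == 1 { t1 = $2 }
         { printf "%d %.2f %.2f\n", $1, t1 / $2, t1 / $2 / $1 }' <<EOF
1 12.739
2 8.834
4 4.051
8 3.737
12 3.004
16 2.790
24 3.929
EOF
}

scaling_summary
```

For this data set the efficiency falls below one half beyond four threads, which is consistent with the table marking -n 4 as optimal.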
Cmd Line Args
Command line arguments for any of the blast binaries found under /opt/sharcnet/blast/version/bin can be shown by using the -h switch as follows:
[roberpj@hnd19:~] blastn -h
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-max_hsps_per_subject int_value] [-penalty penalty]
    [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
    [-template_type type] [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.2.28+

Use '-help' to print detailed descriptions of command line arguments
References
o Blast Homepage
http://blast.ncbi.nlm.nih.gov/Blast.cgi
o BLAST+ Release Notes
http://www.ncbi.nlm.nih.gov/books/NBK131777/
o BLAST Command Line Applications User Manual
http://www.ncbi.nlm.nih.gov/books/NBK1763/
o BLAST FTP Download Site
http://www.ncbi.nlm.nih.gov/books/NBK62345/
o Blast Download Mirror Readme
http://mirrors.vbi.vt.edu/mirrors/ftp.ncbi.nih.gov/blast/db/README
o RefSeq: NCBI Reference Sequence Database
http://www.ncbi.nlm.nih.gov/refseq/