Revision as of 13:19, 4 April 2012
| MPIBLAST |
|---|
| Description: Parallel implementation of NCBI BLAST |
| SHARCNET Package information: see MPIBLAST software page in web portal |
| Full list of SHARCNET supported software |
h1. Example 1 - DROSOPH
Copy sample problem files into a directory under work. Use 1.5.0 or 1.6.0 depending on which cluster you are on:
mkdir /work/$USER/testmpiblast1; cd /work/$USER/testmpiblast1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.aa.gz > drosoph.aa
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt
Create a hidden configuration file that defines the shared storage location between nodes and a local storage directory on each compute node, as follows:
cd /work/$USER/testmpiblast1
echo "[mpiBLAST]" > .ncbirc
echo "Shared=/scratch/$USER/testmpiblast1" >> .ncbirc
echo "Local=/tmp" >> .ncbirc
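The three echo commands above can also be written as a single here-document, which makes it easier to reuse the same layout for other run directories. The following is only a sketch: the function name write_ncbirc and the throwaway demo directory are assumptions made for illustration and are not part of mpiBLAST.

```shell
#!/bin/sh
# write_ncbirc is a hypothetical helper, not part of mpiBLAST.
# It writes the same three lines as the echo commands in this section.
write_ncbirc() {
    run_dir=$1     # directory the job will be submitted from
    shared_dir=$2  # storage visible to every node
    local_dir=$3   # per-node storage, e.g. /tmp
    cat > "$run_dir/.ncbirc" <<EOF
[mpiBLAST]
Shared=$shared_dir
Local=$local_dir
EOF
}

# Demonstration in a throwaway directory so no real file is overwritten.
demo_dir=$(mktemp -d)
write_ncbirc "$demo_dir" "/scratch/${USER:-someuser}/testmpiblast1" /tmp
cat "$demo_dir/.ncbirc"
```

For the real run the first argument would be the work directory, for example _/work/$USER/testmpiblast1_.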
Create the shared directory under scratch where the partitioned database will be stored. Note that files under scratch will eventually expire and be deleted automatically by the system.
mkdir /scratch/$USER/testmpiblast1
From _/work/$USER/testmpiblast1_ execute the following command to partition the database. After it completes, verify that the partition files were created in the shared scratch directory. For this example, choosing N=32 doubles the execution time compared to N=16, so N should be chosen carefully based on scaling tests.
Version 1.5.0 clusters run:
mpiformatdb.sh "-N 16 -i drosoph.nt -o T -p F"
Version 1.6.0 clusters run:
mpiformatdb -N 16 -i drosoph.nt -o T -p F
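One way to verify the partition step is to count the distinct fragment numbers present in the shared directory. The sketch below assumes that fragment files are named <database>.<digits>.<extension> (for example drosoph.nt.000.nhr); this naming pattern is an assumption and should be checked against what actually appears in the directory. The helper count_fragments is hypothetical, and the demo creates empty placeholder files rather than a real database.

```shell
#!/bin/sh
# count_fragments is a hypothetical helper, not an mpiBLAST tool.
# Assumption: fragment files are named <db>.<digits>.<ext>.
count_fragments() {
    dir=$1   # shared directory named in .ncbirc
    db=$2    # database name that was passed to -i
    # Escape dots so that "drosoph.nt" is matched literally by sed.
    db_re=$(printf '%s' "$db" | sed 's/\./\\./g')
    # Keep only the numeric fragment id of matching files, then count distinct ids.
    ls "$dir" 2>/dev/null \
        | sed -n "s/^${db_re}\.\([0-9][0-9]*\)\..*/\1/p" \
        | sort -u | wc -l | tr -d ' '
}

# Demonstration with empty placeholder files in a temporary directory.
demo=$(mktemp -d)
for i in 000 001 002; do
    touch "$demo/drosoph.nt.$i.nhr" "$demo/drosoph.nt.$i.nin" "$demo/drosoph.nt.$i.nsq"
done
count_fragments "$demo" drosoph.nt   # prints 3
```

On the cluster, calling count_fragments /scratch/$USER/testmpiblast1 drosoph.nt would give a number that can be compared with the value passed to -N.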
Submit a short test job to the queue with a 15m time limit. If all goes well, output results will be written to _drosoph.out_ and the total execution wall time will be approximately 3 seconds.
sqsub -t -r 15m -n 16 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --removedb
Sample output is included in _/opt/sharcnet/mpiblast/current/examples/ROSOPH.out_ for comparison with your _drosoph.out_ output file.
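The comparison itself can be scripted. The sketch below uses a byte-for-byte check with cmp, which may be stricter than necessary if the two reports differ only in incidental details; compare_outputs is a hypothetical helper name, and the demo uses small temporary files instead of real BLAST reports.

```shell
#!/bin/sh
# compare_outputs is a hypothetical helper for illustration only.
# It prints "identical" or "different" and returns a matching exit status.
compare_outputs() {
    mine=$1        # output file produced by your job
    reference=$2   # sample output shipped with the examples
    if cmp -s "$mine" "$reference"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Demonstration with small temporary files.
demo=$(mktemp -d)
printf 'hit A\nhit B\n' > "$demo/mine.out"
printf 'hit A\nhit B\n' > "$demo/same.out"
printf 'hit A\nhit C\n' > "$demo/other.out"
compare_outputs "$demo/mine.out" "$demo/same.out"            # prints identical
compare_outputs "$demo/mine.out" "$demo/other.out" || true   # prints different
```

When the files differ, running diff on the same two paths shows which lines changed.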
h1. Example 2 - BIOBREW
Copy sample problem files into a directory under work.
mkdir /work/$USER/testmpiblast2; cd /work/$USER/testmpiblast2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq
Create a hidden configuration file that defines the shared storage location between nodes and a local storage directory on each compute node, as follows:
cd /work/$USER/testmpiblast2
echo "[mpiBLAST]" > .ncbirc
echo "Shared=/work/$USER/mpiformatdbs/testmpiblast2" >> .ncbirc
echo "Local=/tmp" >> .ncbirc
Create the shared directory under work where formatted databases will be stored. In this example the database is saved under work for long-term retention and sharing.
mkdir /work/$USER/mpiformatdbs; mkdir /work/$USER/mpiformatdbs/testmpiblast2
From _/work/$USER/testmpiblast2_ execute the following command to partition the database. After it completes, verify that the database files were created in the shared work directory. Note that doubling N to 32 in this example improves performance by only 10%, so it is not worth the additional processors.
Version 1.5.0 clusters run:
mpiformatdb.sh "-N 16 -i Hs.seq.uniq -o T -p F"
Version 1.6.0 clusters run:
mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F
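The scaling remarks in both examples can be put into numbers by dividing the observed speedup by the factor by which N grew. The sketch below is a worked calculation, not a benchmark: relative_efficiency is a hypothetical helper, the 6 second figure for Example 1 is derived from "approximately 3 seconds" and "doubles the execution time", and the 27 second figure for Example 2 assumes that "10%" means a 10% shorter run than the approximate 30 seconds.

```shell
#!/bin/sh
# relative_efficiency is a hypothetical helper for illustration only.
# Arguments: base_time base_N new_time new_N (times in seconds).
relative_efficiency() {
    awk -v t1="$1" -v n1="$2" -v t2="$3" -v n2="$4" 'BEGIN {
        speedup = t1 / t2    # greater than 1 means the larger run was faster
        scale   = n2 / n1    # factor by which N was increased
        printf "%.2f\n", speedup / scale
    }'
}

# Example 1: about 3 s at N=16, about 6 s at N=32 (time doubles).
relative_efficiency 3 16 6 32      # prints 0.25
# Example 2: about 30 s at N=16, assumed 27 s at N=32 (10% shorter).
relative_efficiency 30 16 27 32    # prints 0.56
```

A value of 1.00 would mean that doubling N halved the run time; values well below 1.00, as in both cases here, indicate that the extra processors are mostly wasted.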
Submit a short test job to the queue; the command below requests a 55m time limit. If all goes well, output results will be written to _biobrew.out_ and the total execution wall time will be approximately 30 seconds.
sqsub -t -r 55m -n 16 -q mpi -o ofile%J mpiblast -p blastn -d Hs.seq.uniq -i il2ra.in -o biobrew.out
Sample output is included in _/opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out_ for comparison with your _biobrew.out_ output file.
h1. MPIBLAST Command Line Arguments
mpiblast.sh --help -p [blast program name] -d [database] -i [query file]
mpiformatdb.sh --help
formatdb 2.2.15 arguments:
-t Title for database file [String] Optional
-i Input file(s) for formatting [File In] Optional
-l Logfile name: [File Out] Optional
default = formatdb.log
-p Type of file
T - protein
F - nucleotide [T/F] Optional
default = T
-o Parse options
T - True: Parse SeqId and create indexes.
F - False: Do not parse SeqId. Do not create indexes.
[T/F] Optional
default = F
-a Input file is database in ASN.1 format (otherwise FASTA is expected)
T - True,
F - False.
[T/F] Optional
default = F
-b ASN.1 database in binary mode
T - binary,
F - text mode.
[T/F] Optional
default = F
-e Input is a Seq-entry [T/F] Optional
default = F
-n Base name for BLAST files [String] Optional
-v Database volume size in millions of letters [Integer] Optional
default = 0
range from 0 to <NULL>
-s Create indexes limited only to accessions - sparse [T/F] Optional
default = F
-V Verbose: check for non-unique string ids in the database [T/F] Optional
default = F
-L Create an alias file with this name
use the gifile arg (below) if set to calculate db size
use the BLAST db specified with -i (above) [File Out] Optional
-F Gifile (file containing list of gi's) [File In] Optional
-B Binary Gifile produced from the Gifile specified above [File Out] Optional
-N Number of database volumes [Integer] Optional
default = 0
range from 1 to 250
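Since -N is limited to the range 1 to 250, a small wrapper can reject out-of-range values before anything is run. The sketch below only prints the command in the form used for version 1.6.0 in the examples on this page; formatdb_cmd is a hypothetical helper, and the fixed "-o T -p F" options are simply copied from those examples.

```shell
#!/bin/sh
# formatdb_cmd is a hypothetical helper for illustration only.
# It checks N against the documented range and prints the command
# without running it.
formatdb_cmd() {
    n=$1       # requested number of database volumes
    input=$2   # FASTA file to format
    case $n in
        ''|*[!0-9]*)
            echo "N must be a whole number" >&2
            return 1 ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt 250 ]; then
        echo "N must be between 1 and 250" >&2
        return 1
    fi
    echo "mpiformatdb -N $n -i $input -o T -p F"
}

formatdb_cmd 16 drosoph.nt                  # prints the command for N=16
formatdb_cmd 300 drosoph.nt || echo "rejected"
```

On version 1.5.0 clusters the printed options would instead be passed as a single quoted string to mpiformatdb.sh, as shown in the examples.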