Revision as of 14:19, 4 April 2012 by Isaac (Talk | contribs) (Created page with "{{Software |package_name=MPIBLAST |package_description=Parallel implementation of NCBI BLAST |package_idnumber=55 }} h1. Example 1 - DROSOPH Copy sample problem files into a dir...")

MPIBLAST
Description: Parallel implementation of NCBI BLAST
SHARCNET Package information: see MPIBLAST software page in web portal
Full list of SHARCNET supported software


h1. Example 1 - DROSOPH

Copy the sample problem files into a directory under work. The paths below use version 1.6.0; substitute 1.5.0 if that is the version installed on your cluster:

mkdir /work/$USER/testmpiblast1; cd /work/$USER/testmpiblast1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.aa.gz > drosoph.aa
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt

Create a hidden configuration file that defines the storage location shared between nodes and a local storage directory on each compute node:

cd /work/$USER/testmpiblast1
echo "[mpiBLAST]" > .ncbirc
echo "Shared=/scratch/$USER/testmpiblast1" >> .ncbirc
echo "Local=/tmp" >> .ncbirc
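The same file can be written in one step with a here-document. This is only a sketch: _write_ncbirc_ is a hypothetical helper name, not something shipped with mpiBLAST, and it writes exactly the three lines produced by the echo commands above.

```shell
# write_ncbirc is a hypothetical helper, not part of mpiBLAST.
# It writes the [mpiBLAST] section of .ncbirc into the given directory.
write_ncbirc() {
    # $1 = directory that will hold .ncbirc
    # $2 = Shared path (visible to all nodes)
    # $3 = Local path (per-node scratch space)
    cat > "$1/.ncbirc" <<EOF
[mpiBLAST]
Shared=$2
Local=$3
EOF
}

# Example call using the paths from this example:
# write_ncbirc "/work/$USER/testmpiblast1" "/scratch/$USER/testmpiblast1" /tmp
```

Passing the paths as arguments keeps the helper reusable for the second example, where only the Shared path differs.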

Create the shared directory under scratch where the partitioned database will be stored. Note that files under scratch will eventually expire and be deleted automatically by the system.

mkdir /scratch/$USER/testmpiblast1

From _/work/$USER/testmpiblast1_ run the following command to partition the database. After it completes, verify that the partition files were created in the shared scratch directory. For this example, choosing N=32 doubles the execution time compared to N=16, so N should be chosen carefully based on scaling tests.

Version 1.5.0 clusters run:  mpiformatdb.sh "-N 16 -i drosoph.nt -o T -p F"
Version 1.6.0 clusters run:  mpiformatdb -N 16 -i drosoph.nt -o T -p F
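One way to carry out the verification step is to count the fragment files in the shared directory. The sketch below is an assumption-laden example: _count_fragments_ is a hypothetical helper, and it assumes each nucleotide fragment has a sequence file named _<db>.NNN.nsq_ (for example _drosoph.nt.000.nsq_). Check a directory listing first if your version names its files differently.

```shell
# count_fragments is a hypothetical helper, not part of mpiBLAST.
# Assumption: every nucleotide fragment has a file <db>.NNN.nsq
# (three digits) directly inside the Shared directory.
count_fragments() {
    # $1 = Shared directory, $2 = database name (e.g. drosoph.nt)
    n=0
    for f in "$1"/"$2".[0-9][0-9][0-9].nsq; do
        # If nothing matches, the pattern stays literal and -f fails.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example call for this example's database:
# count_fragments "/scratch/$USER/testmpiblast1" drosoph.nt
```

Compare the printed count with the N you passed to the partitioning command.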

Submit a short test job to the queue with a 15m time limit. If all goes well, the results will be written to _drosoph.out_ and the total wall time will be approximately 3 seconds.

sqsub -t -r 15m -n 16 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --removedb

Sample output is included in _/opt/sharcnet/mpiblast/current/examples/DROSOPH.out_ so that you can compare it with your _drosoph.out_ output file.
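The comparison can be scripted with the standard _cmp_ utility. This is a minimal sketch: _same_output_ is a hypothetical helper, and it tests for byte-for-byte equality only, so a "different" result may still come from harmless differences and is a prompt to inspect the files with _diff_, not proof that the run failed.

```shell
# same_output is a hypothetical helper: it reports whether two files
# are byte-for-byte identical and returns non-zero when they are not.
same_output() {
    # $1 = your output file, $2 = reference output file
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Example call, where SAMPLE holds the path of the sample output file:
# same_output drosoph.out "$SAMPLE"
```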

h1. Example 2 - BIOBREW

Copy sample problem files into a directory under work.

mkdir /work/$USER/testmpiblast2; cd /work/$USER/testmpiblast2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq

Create a hidden configuration file that defines the storage location shared between nodes and a local storage directory on each compute node:

cd /work/$USER/testmpiblast2
echo "[mpiBLAST]" > .ncbirc
echo "Shared=/work/$USER/mpiformatdbs/testmpiblast2" >> .ncbirc
echo "Local=/tmp" >> .ncbirc

Create the shared directory under work where the formatted databases will be stored. In this example the database is saved under work for long-term retention and sharing.

mkdir -p /work/$USER/mpiformatdbs/testmpiblast2

From _/work/$USER/testmpiblast2_ run the following command to partition the database. After it completes, verify that the database files were created in the shared work directory. Note that doubling N to 32 in this example improves performance by only 10% and is therefore not worthwhile.

Version 1.5.0 clusters run:  mpiformatdb.sh "-N 16 -i Hs.seq.uniq -o T -p F" 
Version 1.6.0 clusters run:  mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F
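The scaling remark above can be made concrete by computing speedup and parallel efficiency from two measured wall times. The sketch below is illustrative only: _scaling_efficiency_ is a hypothetical helper, and the wall times in the example call (30 s and 27 s) are made-up values standing in for your own measurements.

```shell
# scaling_efficiency is a hypothetical helper, not part of mpiBLAST.
# speedup    = baseline time / comparison time
# efficiency = speedup / (comparison N / baseline N)
scaling_efficiency() {
    # $1 = baseline N, $2 = baseline wall time in seconds
    # $3 = comparison N, $4 = comparison wall time in seconds
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" 'BEGIN {
        speedup = t1 / t2
        printf "speedup=%.2f efficiency=%.2f\n", speedup, speedup / (n2 / n1)
    }'
}

# Illustrative call with made-up timings for N=16 and N=32:
# scaling_efficiency 16 30 32 27
```

An efficiency near 1.00 means the extra processes are well used. A value near 0.50 after doubling N means the extra processes contribute almost nothing.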

Submit a short test job to the queue with a 55m time limit. If all goes well, the results will be written to _biobrew.out_ and the total wall time will be approximately 30 seconds.

sqsub -t -r 55m -n 16 -q mpi -o ofile%J mpiblast -p blastn -d Hs.seq.uniq -i il2ra.in -o biobrew.out

Sample output is included in _/opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out_ so that you can compare it with your _biobrew.out_ output file.

h1. MPIBLAST Command Line Arguments


mpiblast.sh --help

-p [blast program name]
-d [database]
-i [query file]

mpiformatdb.sh --help
formatdb 2.2.15   arguments:
 -t  Title for database file [String]  Optional
 -i  Input file(s) for formatting [File In]  Optional
 -l  Logfile name: [File Out]  Optional
   default = formatdb.log
 -p  Type of file
        T - protein   
        F - nucleotide [T/F]  Optional
   default = T
 -o  Parse options
        T - True: Parse SeqId and create indexes.
        F - False: Do not parse SeqId. Do not create indexes.
 [T/F]  Optional
   default = F
 -a  Input file is database in ASN.1 format (otherwise FASTA is expected)
        T - True, 
        F - False.
 [T/F]  Optional
   default = F
 -b  ASN.1 database in binary mode
        T - binary, 
        F - text mode.
 [T/F]  Optional
   default = F
 -e  Input is a Seq-entry [T/F]  Optional
   default = F
 -n  Base name for BLAST files [String]  Optional
 -v  Database volume size in millions of letters [Integer]  Optional
   default = 0
   range from 0 to <NULL>
 -s  Create indexes limited only to accessions - sparse [T/F]  Optional
   default = F
 -V  Verbose: check for non-unique string ids in the database [T/F]  Optional
   default = F
 -L  Create an alias file with this name
       use the gifile arg (below) if set to calculate db size
       use the BLAST db specified with -i (above) [File Out]  Optional
 -F  Gifile (file containing list of gi's) [File In]  Optional
 -B  Binary Gifile produced from the Gifile specified above [File Out]  Optional
 -N  Number of database volumes [Integer]  Optional
   default = 0
   range from 1 to 250
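
The flags used in the two examples map onto this help text as follows: -N is the number of database volumes, -i the input file, -o T parses SeqIds and creates indexes, and -p F marks the input as nucleotide. The sketch below only prints the command line for a given version so it can be checked before running; _formatdb_command_ is a hypothetical helper, and the two calling forms are the ones shown in the examples above.

```shell
# formatdb_command is a hypothetical helper, not part of mpiBLAST.
# It prints (does not run) the partitioning command for a given version.
formatdb_command() {
    # $1 = mpiBLAST version, $2 = number of volumes, $3 = input file
    # -o T: parse SeqIds and create indexes; -p F: nucleotide input
    args="-N $2 -i $3 -o T -p F"
    case "$1" in
        1.5.0) echo "mpiformatdb.sh \"$args\"" ;;  # wrapper takes one quoted string
        1.6.0) echo "mpiformatdb $args" ;;         # arguments passed directly
        *) echo "unknown version: $1" >&2; return 1 ;;
    esac
}

# Example call:
# formatdb_command 1.6.0 16 drosoph.nt
```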