Revision as of 17:58, 6 June 2012
| MPIBLAST |
|---|
| Description: Parallel implementation of NCBI BLAST |
| SHARCNET Package information: see MPIBLAST software page in web portal |
| Full list of SHARCNET supported software |
GETTING STARTED
Mpiblast is not loaded by default on the clusters, so load the module before submitting any jobs:
module load mpiblast/1.6.0
EXAMPLE1 - DROSOPH
Copy sample problem files (fasta database and input) into a directory under work:
mkdir -p /work/$USER/samples/mpiblast/test1; rm /work/$USER/samples/mpiblast/test1/*; cd /work/$USER/samples/mpiblast/test1
cp /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.in drosoph.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/drosoph.nt.gz > drosoph.nt
Create a hidden configuration file that defines a Shared storage location visible to all nodes and a Local storage directory available on each compute node, where $USER should be replaced with your username:
[username@orc-login1:/work/roberpj/samples/mpiblast/test1] vi .ncbirc
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
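If you would rather not edit the file interactively, the same configuration can be written with a here-document. This is only a minimal sketch, not part of the SHARCNET instructions: write_ncbirc is a hypothetical helper name, and it relies on the unquoted EOF delimiter so that the shell expands $USER and fills in your username for you.

```shell
# Hypothetical helper: write the example 1 .ncbirc into the given directory.
# The here-document delimiter is unquoted, so $USER is expanded by the shell.
write_ncbirc() {
    target_dir="$1"
    cat > "$target_dir/.ncbirc" <<EOF
[BLAST]
BLASTDB=/scratch/$USER/mpiblasttest1
BLASTMAT=/work/$USER/samples/mpiblast/test1
[mpiBLAST]
Shared=/scratch/$USER/mpiblasttest1
Local=/tmp
EOF
}
```

It would be called with the sample directory as its argument, for example write_ncbirc /work/$USER/samples/mpiblast/test1.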
Partition the database into 8 fragments in a scratch storage location local to the cluster:
mkdir /scratch/$USER/mpiblasttest1; rm -f /scratch/$USER/mpiblasttest1/*; cd /work/$USER/samples/mpiblast/test1
mpiformatdb -N 8 -i drosoph.nt -o T -p F -n /scratch/$USER/testmpiblast1
Submit a short job with a 15m time limit on 10 cores (8 plus 2). If all goes well, output results will be written to drosoph.out and the execution time will appear in ofile%J, where %J is the job number:
cd /work/$USER/samples/mpiblast/test1
sqsub -r 15m -n 10 -q mpi -o ofile%J mpiblast -d drosoph.nt -i drosoph.in -p blastn -o drosoph.out --use-parallel-write --use-virtual-frags
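Both examples on this page request two cores more than the number of database fragments (8 fragments with -n 10 here, 16 fragments with -n 18 in the second example). Assuming that pattern is what you want, the value for -n can be derived rather than typed by hand; cores_for_fragments is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: core count to request for a given number of fragments,
# following the "fragments plus 2" pattern used by the examples on this page.
cores_for_fragments() {
    echo $(( $1 + 2 ))
}

cores_for_fragments 8     # prints 10
cores_for_fragments 16    # prints 18
```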
Sample output computed previously with BLASTN 2.2.15 [Oct-15-2006] is included in /opt/sharcnet/mpiblast/1.6.0/examples/ROSOPH.out for comparison with your newly generated drosoph.out file.
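One simple way to make the comparison is with diff. The sketch below is an illustration only: same_results is a hypothetical helper name, and it checks for byte-for-byte equality, so a "different" answer may reflect nothing more than header or version lines and should be followed by reading the actual diff output.

```shell
# Hypothetical helper: report whether two BLAST output files are identical.
# diff -q only tests for equality; run plain diff to see what changed.
same_results() {
    if diff -q "$1" "$2" > /dev/null; then
        echo "identical"
    else
        echo "different"
    fi
}
```

Pass the reference file as the first argument and your own drosoph.out as the second.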
EXAMPLE2 - BIOBREW
This example shows some extra options and switches that may be useful for debugging and for dealing with larger databases. As before, copy the sample problem files into a directory under work:
mkdir /work/$USER/samples/mpiblast/test2; rm /work/$USER/samples/mpiblast/test2/*; cd /work/$USER/samples/mpiblast/test2
cp /opt/sharcnet/mpiblast/1.6.0/examples/il2ra.in il2ra.in
gunzip -c /opt/sharcnet/mpiblast/1.6.0/examples/Hs.seq.uniq.gz > Hs.seq.uniq
Create a hidden configuration file using the vi editor that defines a Shared storage location visible to all nodes and a Local storage directory available on each compute node, as shown here, where $USER should be replaced with your username. The Data directory is not yet populated or used in this example and can therefore be omitted. If the Local and Shared directories are to be the same, replace --copy-via=mpi with --copy-via=none, as demonstrated in the sqsub commands below.
[username@orc-login1:/work/roberpj/samples/mpiblast/test2] vi .ncbirc
[NCBI]
Data=/opt/sharcnet/mpiblast/1.6.0/data
[BLAST]
BLASTDB=/work/$USER/mpiblasttest2
BLASTMAT=/work/$USER/samples/mpiblast/test2
[mpiBLAST]
Shared=/work/$USER/mpiblasttest2
Local=/tmp
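The rule about matching Local and Shared directories can be checked mechanically. The following is a rough sketch under stated assumptions: copy_via_option is a hypothetical helper name, and it assumes the configuration file contains plain Shared=<path> and Local=<path> lines like those in the example above. It only encodes the "same directory means --copy-via=none" rule; --copy-via=none may still be chosen deliberately when the directories differ, as in job B below.

```shell
# Hypothetical helper: print the --copy-via option suggested by a .ncbirc file.
# Assumes plain "Shared=<path>" and "Local=<path>" lines in the file.
copy_via_option() {
    rc_file="$1"
    shared=$(sed -n 's/^Shared=//p' "$rc_file")
    local_dir=$(sed -n 's/^Local=//p' "$rc_file")
    if [ "$shared" = "$local_dir" ]; then
        echo "--copy-via=none"    # fragments are already where they will be read
    else
        echo "--copy-via=mpi"     # fragments are copied from Shared to Local
    fi
}
```

It takes the path of the configuration file as its only argument, for example copy_via_option .ncbirc.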
Partition the database into 16 fragments directly in the work directory:
mkdir -p /work/$USER/mpiblasttest2; rm -f /work/$USER/mpiblasttest2/*; cd /work/$USER/samples/mpiblast/test2
mpiformatdb -N 16 -i Hs.seq.uniq -o T -p F
Submit a couple of short jobs, each with a 15m time limit. If all goes well, output results will be written to biobrew.out and the execution time will appear in the corresponding ofile%J, where %J is the job number.
A) In this job submission, fragment files are first copied from work to local /tmp before being used (appropriate if work is slow). Usage of the profile option is also shown in this example:
cd /work/$USER/samples/mpiblast/test2; rm -f oTime*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=mpi -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrew.out --time-profile=oTime
B) In this job submission, fragment files are used in place on work. Usage of the debug option is also shown in this example:
cd /work/$USER/samples/mpiblast/test2; rm -f oLog*
sqsub -r 15m -n 18 -q mpi -o ofile%J mpiblast --use-parallel-write --copy-via=none -d Hs.seq.uniq -i il2ra.in -p blastn -o biobrew.out --debug=oLog
Finally, compare /opt/sharcnet/mpiblast/1.6.0/examples/BIOBREW.out, computed previously with BLASTN 2.2.15 [Oct-15-2006], with your newly generated biobrew.out output file to verify the results, and submit a ticket if there are any problems.
MPIBLAST BINARIES (command line arguments)
[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiblast -help
mpiBLAST requires the following options: -d [database] -i [query file] -p [blast program name]
[roberpj@orc-login1:/opt/sharcnet/mpiblast/1.6.0/bin] ./mpiformatdb --help
Executing: formatdb -
formatdb 2.2.20 arguments:
-t Title for database file [String] Optional
-i Input file(s) for formatting [File In] Optional
-l Logfile name: [File Out] Optional
default = formatdb.log
-p Type of file
T - protein
F - nucleotide [T/F] Optional
default = T
-o Parse options
T - True: Parse SeqId and create indexes.
F - False: Do not parse SeqId. Do not create indexes.
[T/F] Optional
default = F
-a Input file is database in ASN.1 format (otherwise FASTA is expected)
T - True,
F - False.
[T/F] Optional
default = F
-b ASN.1 database in binary mode
T - binary,
F - text mode.
[T/F] Optional
default = F
-e Input is a Seq-entry [T/F] Optional
default = F
-n Base name for BLAST files [String] Optional
-v Database volume size in millions of letters [Integer] Optional
default = 4000
-s Create indexes limited only to accessions - sparse [T/F] Optional
default = F
-V Verbose: check for non-unique string ids in the database [T/F] Optional
default = F
-L Create an alias file with this name
use the gifile arg (below) if set to calculate db size
use the BLAST db specified with -i (above) [File Out] Optional
-F Gifile (file containing list of gi's) [File In] Optional
-B Binary Gifile produced from the Gifile specified above [File Out] Optional
-T Taxid file to set the taxonomy ids in ASN.1 deflines [File In] Optional
-N Number of database volumes [Integer] Optional
default = 0