
Note: users should consult with one of SHARCNET's HPTC Software Analysts prior to engaging in HPC development with Java. While there are many situations where Java-based development is acceptable in a parallel environment, any numerically intensive, long-running application is typically poorly served by a Java implementation and risks wasting vast computational resources. Solutions such as this one (MPJ Express) are effectively "home-rolled" approaches to interprocess communication in Java, and portability and future compatibility with our systems are likely to be an issue.


SHARCNET has struggled for years to provide Java support to MPI programmers in a way that integrates well with our batch scheduling system, and isn't so fragile that code breaks as JDKs are updated independently of the parallel programming framework. Options evaluated in the past include mpiJava, which most people find quickly on their own using a search engine; however, that software is overly reliant on its JNI underpinnings, is closely tied to the painfully out-of-date JDK 1.2 (typically of little interest to today's Java programmers), and hasn't been seriously maintained since 2003. Attempts to compile this code on our systems, even with JDK 1.2 installed, have failed outright due to changes in the kernel code that mpiJava was hooking into.

I have always been a strong advocate for Java programmers within SHARCNET, and have wanted to provide a usable MPI framework for some time. Early in 2009, a user group brought MPJ Express to my attention, and after experimenting with it for a while I am optimistic that it will provide just such a tool for those Java programmers willing and able to use it. It can be easily installed on just about any SHARCNET system, and together with a minor modification and a submission script I wrote for this purpose, it can be submitted and run productively in our batch scheduling environment.

We hope this software provides a reasonable parallel option to our Java users, and would love to receive feedback from those who do use it. Feel free to update this page with your experiences/tips, and forward any bugs and/or problems to David McCaughan, or submit a ticket in the SHARCNET problem tracking system.

What is MPJ Express?

MPJ Express is an implementation of Java bindings for the MPI standard. It implements thread-safe messaging to remain compliant with Java's threading model. In response to a number of early attempts to introduce MPI-like functionality to Java, the Message-Passing Working Group of the Java Grande Forum was formed in late 1998, and an initial draft of Java bindings for MPI was presented at Supercomputing 1998. A number of changes were made to the API over time, and the current API is now known as MPJ [1]. It can be viewed as a modern, portable and, more importantly, maintained framework for MPI in Java along the lines of the mpiJava project, which is now so old as to be unusable in most modern environments.

It should be noted that there are no "official" Java bindings for MPI, and this software represents one of a small number of attempts to provide a parallel programming framework for Java programmers. Until such a time as MPI bindings are formally defined and supported for the Java platform, a Java programmer who wants to take advantage of MPI-parallelism will have to use a framework such as this one.
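To give a flavour of what MPJ code looks like, here is a minimal point-to-point sketch. This is our own illustration rather than an example from the MPJ distribution (the class name PingPong and the values used are arbitrary), and it assumes the standard MPJ bindings provided by MPJ Express:

import mpi.*;

public class PingPong {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int me = MPI.COMM_WORLD.Rank();
        int[] buf = new int[1];
        if (me == 0) {
            buf[0] = 42;
            // Send(buffer, offset, count, datatype, destination, tag)
            MPI.COMM_WORLD.Send(buf, 0, 1, MPI.INT, 1, 99);
        } else if (me == 1) {
            MPI.COMM_WORLD.Recv(buf, 0, 1, MPI.INT, 0, 99);
            System.out.println("rank 1 received " + buf[0] + " from rank 0");
        }
        MPI.Finalize();
    }
}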

Internal SHARCNET testing has found this to be a very friendly framework that does not depend on the kind of complex underlying JNI integration that tends to break code over time, particularly where a project is poorly supported. We present these instructions here so that SHARCNET users who wish to pursue parallel Java implementations have an option available that is accessible, and relatively easy to install and configure in their user accounts.

Obtaining the Software

In order to use MPJ Express, you will need the following:

  • JDK v1.5 or newer (tested with JDK 1.6)
    • all SHARCNET systems should have default Java installations > 1.6.x at this point
    • note that some systems (e.g. requin at the time of writing) have an older system-wide default, although a newer version is available in /opt/sharcnet/jdk; this particular package is not available as a module, so you will need to adjust your PATH manually in such cases - see below.
  • MPJ Express (tested with v0.38 - current as of 2012/11/19)

Installing the Software

Assuming an adequate Java installation is present, we need only install the MPJ Express software. We will then make a few small modifications so that it works properly with our systems and is compatible with a batch submission environment.

Verify Java version

You need to ensure that when you invoke the Java tools you are getting the version you expect (rather than an ancient Java JRE install that may be left lying around on an old system). This is easily accomplished by modifying your PATH so that the appropriate JDK installation appears before anything already listed. First, let's verify you're accessing a sufficiently current version of Java:

orc129:~% java -version
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.3.el6_3.1-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)

The above example was run on one of Orca's devel nodes (you may not be able to run it from the login node due to system limits). The version is 1.7, so we're golden. The following would be obtained on requin's login node:

req769:~% java -version
java version "1.4.2"
gcj (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

v1.4.2 - ouch. However, as noted above, there is a more current JDK installation (see /opt/sharcnet/jdk); you just have to set the path manually, as this isn't currently available as a module. Looking more closely at that directory structure, the binary directory we're after is actually /opt/sharcnet/jdk/current/bin. If you use a bash shell, you can add (or modify) a line in your ~/.bashrc file:

export PATH="/opt/sharcnet/jdk/current/bin:${PATH}"

Once you have made this modification, you can either source that configuration file (source ~/.bashrc), or simply log out and back in, then verify what is being invoked when you type java or javac, e.g.:

req769:~% java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

Giddy-up. If it still reports the old version, something is wrong with your PATH setting; verify what is actually being resolved, e.g.:
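req769:~% which java
/opt/sharcnet/jdk/current/bin/java

(This assumes the /opt/sharcnet/jdk path from the export above; any other result means your PATH edit isn't being picked up.)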

Install MPJ Express

The basic install of MPJ is very simple: unarchive the tarball/zip file you downloaded in a location that is convenient to you; somewhere in /work is likely ideal. We'll assume you unpack it in /work/user (the archive unpacks into an mpj-v0_38 subdirectory):
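For example (the archive name shown here is an assumption; use whatever file you actually downloaded):

cd /work/user
tar -xzvf mpj-v0_38.tar.gz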


MPJ Express expects an MPJ_HOME environment variable to be set, and the path to the MPJ binaries should be added to the PATH variable---in this example, we have simply modified the PATH as noted above. MPJ Express also requires access to the mpj.jar library for compilation---this can be specified explicitly at compile time, but it can also be added to a CLASSPATH environment variable so that it is found automatically (as with the PATH, you must append this to any CLASSPATH that already exists). Again, using the bash shell, the following lines should appear in ~/.bashrc by the time you are done:

export MPJ_HOME="/work/user/mpj-v0_38"
export PATH="${PATH}:${MPJ_HOME}/bin"
export CLASSPATH="${MPJ_HOME}/lib/mpj.jar"
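If your environment already sets a CLASSPATH, append the jar to it rather than replacing it, e.g.:

export CLASSPATH="${MPJ_HOME}/lib/mpj.jar:${CLASSPATH}"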

Note that if you're working on requin (or any system with the "old" SHARCNET environment), you also need to add the path to the Java binaries as noted above, in which case your PATH line may look more like:

export PATH="/opt/sharcnet/jdk/current/bin:${PATH}:${MPJ_HOME}/bin"

Again, source the configuration file (or log out and back in) and verify that you can now access the MPJ binaries using the which command:

req769:~% which mpjboot
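/work/user/mpj-v0_38/bin/mpjboot

(The exact path reflects wherever you unpacked MPJ Express; if which finds nothing, re-check the PATH line above.)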

Edit mpjboot and mpjhalt Scripts

The ${MPJ_HOME}/bin/mpjboot and ${MPJ_HOME}/bin/mpjhalt scripts as provided are relatively simple: they read a number of hostnames (on which to start the communication daemons) from a file, then use ssh to shell to those hosts and explicitly invoke or kill the daemon. The issue on SHARCNET clusters is that you do not get a full environment when executing commands with ssh, so the mpjboot script will appear to do nothing (it actually does try to start the daemons, but is unable to do so). mpjhalt is not affected by this problem. The solution is to modify the start-up process so that ssh invokes a bash login shell, for which we can more easily ensure the full environment settings. Edit the ${MPJ_HOME}/bin/mpjboot script as follows:

A line that reads:

ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_xxxx start;"

should be changed to read:

ssh $host "bash -l -c \"cd $MPJ_HOME/bin;./mpjdaemon_linux_xxxx start;\""

Note that in both mpjboot and mpjhalt the script attempts to identify the correct system architecture and execute the appropriate command; if you are having issues, you can simply comment out all the options that do not apply to the system in question. Just ensure that the uncommented line matches the architecture of the system on which you plan to run. The only SHARCNET systems that can currently run MPJ are those based on Opteron and Xeon processors (for which x86_64 is likely the version you should be using). When you are done, the only uncommented ssh lines in the two files should be, in mpjboot:

ssh $host "bash -l -c \"cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_64 start;\"

and in mpjhalt:

ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_64 stop;"

assuming you are running the x86_64 variant of the software.

Add a Job-starting Wrapper Script

The way that MPJ expects to operate is that you provide a machines file listing all the hosts on which you want to be able to run your jobs; the mpjboot script starts communication daemons on those hosts, and the job starter then distributes processes amongst them. It is critical that you do not run the program in this way on SHARCNET production clusters.

All of our computing resources are allocated by a batch scheduler. If you start processes on compute nodes that are already running jobs, you will hammer the performance of both your job and the ones already running on those nodes. No one will appreciate this, and it should never be done. What we want to do is use the scheduler to get a node allocation for us, then write out an appropriate machines file and start our job. Fortunately, our scheduler supports this in an indirect way. We will make use of the --nompirun option to sqsub, which prevents the normal MPI job starter from running our job and simply starts a single copy of whatever we submit. We will submit the following script, which parses the node allocation out of the environment and writes an appropriate machines file, boots the MPJ environment (pausing for a short time to ensure the environment has time to initialize properly---you can get connection failures if the job is launched before the environment is fully initialized), launches your job, and then cleans up the environment when it is done:

This script is appropriate to "old environment" systems such as requin ONLY


#!/bin/bash
# Submission wrapper for MPJ Express jobs --- "old environment" systems
# (e.g. requin) ONLY.  Usage: <class> [args ...]

EXE="$*"
NPROC=0

# The node allocation as provided by the scheduler, in the form
# "req1 2 req2 2 ..."; on requin's LSF-based environment this is
# ${LSB_MCPU_HOSTS} (an assumption here --- adjust if your system
# provides the allocation in a different variable)
NODELIST="${LSB_MCPU_HOSTS}"

MACHINEFILE=$(mktemp -p . machine.tmp.XXXXXXXXXX)

# CLI validation - other than "there should be arguments to this script"
# we'll keep validation simple at this point
if [ -z "${EXE}" ]; then
    echo ">>> error: no executable/args specified"
    exit 1
fi

echo "00> ALERT: !! This script should only be used for MPJ submission on requin !!"

echo "01> NODELIST = ${NODELIST}"

for VALUE in ${NODELIST}; do
    # This is pretty specific: the list of nodes will be of the format
    # "req1 2 req2 2 ..." --- so, if after deleting the longest match after
    # (and including) a leading number something is left in the string then
    # that is the machine name (assumes no machine name starts with numbers)
    if [ -n "${VALUE%%[0-9]*}" ]; then
        MACHINE=${VALUE}
        echo "02> MACHINE = ${MACHINE} - writing"
        echo ${MACHINE} >> ${MACHINEFILE}
    else
        NPROC=$((NPROC + VALUE))
        echo "02> NPROC = ${NPROC} - accumulating"
    fi
done

echo "03> ...booting MPJ environment (pausing 30s to permit time for init.)"
mpjboot ${MACHINEFILE}
sleep 30

echo "04> ...executing job [${EXE}] on ${NPROC} processors"
 -machinesfile ${MACHINEFILE} -np ${NPROC} ${EXE}

echo "05> ...cleaning up"
mpjhalt ${MACHINEFILE}
echo "05> The machinefile for this run was: ${MACHINEFILE}; it has not been"
echo "05> deleted in case problems with execution result in an unclean exit."
echo "05> Please run \"mpjhalt ${MACHINEFILE}\" to ensure the MPJ run-time has"
echo "05> been terminated properly (the machine file can then be removed)."

echo "...done"
exit 0

We'll assume you save the above script as (the name is arbitrary; this page uses "" throughout), and ensure you set the permissions such that it can be executed by name, i.e.:

chmod +x

You can place this file in ${MPJ_HOME}/bin so it is easily accessible and kept together with the other MPJ-related binaries. Note that this script generates some "status" messages that will appear in the output; I figured this was useful in case of issues with the job not running properly. Better some feedback than none.

Testing the Installation

You should now test the basic installation to ensure it is all working properly. We'll use a simple "Hello, world!" program provided by the developers for this purpose. Copy the following program into a file called (Java requires the file name to match the public class name):

import mpi.*;

public class World {
    public World() {}
    public static void main(String args[]) throws Exception {
        MPI.Init(args);    // initialize the MPJ run-time before any communication
        int me = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();
        System.out.println("Hi from <" + me + ">");
        MPI.Finalize();
    }
}

Assuming you have set the CLASSPATH as described above, you can compile this as you would any other Java program:
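javac

If javac complains that package mpi does not exist, your CLASSPATH is not picking up mpj.jar.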


You can now attempt to submit this job; we will submit it to the test queue so it executes immediately. Submit the job as follows (details regarding the command-line arguments appear in the following section):

sqsub -r 10m -q mpi -t -n 4 --nompirun -o OUTPUT.TXT ./ World

Once the job completes, you can check the contents of OUTPUT.TXT. Amidst a slew of debugging-related output from the MPJ framework and submission wrapper script you should see the output from the MPJ program:

Hi from <2>
Hi from <0>
Hi from <1>
Hi from <3>

and the summary appended by the scheduler should indicate the job completed successfully.

Once you know it is running properly, feel free to comment out or remove all the "trace" output from the submission script if you find it bothersome.

Running MPJ Jobs

Assuming you have everything described in the previous sections working and tested, you should now be able to submit MPJ jobs to our batch scheduling system. The rationale for some of these instructions was covered in previous sections (see esp. #Add a Job-starting Wrapper Script).

Key points:

  • submit the job to the mpi queue (you can make use of the test queue as you normally would, if desired: -t)
  • you must submit using the --nompirun option to sqsub
  • you must submit the provided wrapper script ( with additional arguments - see below) as the executable for the job

The script treats its arguments as the executable (class) and arguments to be run as the MPJ job. By way of example, if the job you wish to run is the main program in Myprog.class, and it expects three floating-point arguments, then where in a serial world you would execute:

java Myprog 1.1 2.2 3.3

you would submit it to the batch scheduler using the script instead:

./ Myprog 1.1 2.2 3.3

Now, keeping in mind that we must use the --nompirun option to prevent our scheduler from invoking the MPI job starter (so that we can invoke the MPJ job starter instead), consider the situation where we want to run the above example as a parallel job on eight processors (we're assuming it has been written as an appropriate MPJ program). Since our scheduler demands a run-time estimate, we'll just pick a number (10 minutes) for our example. A destination for all output is also required, via the -o option; in this example we direct output to a file called OUTPUT.TXT. The complete submission command would look like:

sqsub -q mpi -n 8 --nompirun -r 10m -o OUTPUT.TXT ./ Myprog 1.1 2.2 3.3

Other than the --nompirun option, and the need to use a job starter script rather than the executable directly, job submission via SQ is otherwise the same and you can refer to documentation for the various SQ tools (sqsub, sqjobs, etc.) for additional details as necessary.

Update: compiling and using MPJ Express on orca

The ticket provides full information on how to download, compile, and run MPJ Express in the production queue on orca.

Additional References

The following links provide extensive additional reference and documentation for the installation and use of the MPJ software, and API details for programmers.

If you have questions or would like assistance performing a local installation of this software, please contact David McCaughan, or open a problem ticket in our problem tracking system.