MPJ: MPI-like message passing for Java



INTRODUCTION AND BACKGROUND
A likely prerequisite for parallel programming in a distributed environment is a good message passing API. Java comes with various ready-made packages for communication, notably an easy-to-use interface to BSD sockets, and the Remote Method Invocation (RMI) mechanism. Interesting as these interfaces are, it is questionable whether parallel programmers will find them especially convenient. Sockets and remote procedure calls have been around for approximately as long as parallel computing has been fashionable, and neither of them has been popular in that field. Both of these communication models are optimized for client-server programming, whereas the parallel computing world is mainly concerned with a more symmetric model, where communications occur in groups of interacting peers.
This peer-to-peer model of communication is captured in the successful Message Passing Interface (MPI) standard, established in 1994 [1]. MPI directly supports the Single Program Multiple Data (SPMD) model of parallel computing. The mpiJava interface is an object-oriented Java binding to MPI, supporting the MPI 1.1 subset of that standard. In some cases the extra runtime information available in Java objects allows argument lists to be simplified relative to the C++ binding. In other cases restrictions of Java, especially the fact that all arguments are passed by value in Java, force some changes to argument lists. But in general mpiJava adheres closely to earlier standards.
The implementation of mpiJava is through JNI wrappers to native MPI software. Interfacing Java to MPI is not always trivial. We often see low-level conflicts between the Java runtime and the interrupt mechanisms used in MPI implementations. The situation is improving as JDK matures, and the mpiJava software now works reliably on top of Solaris MPI implementations and various shared memory platforms. A port to Windows NT (based on WMPI) is available, and other ports are in progress.
Other work in progress includes development of demonstrator applications, and Java-specific extensions such as support for direct communication of serializable objects.

JavaMPI: automatic generation of MPI wrappers
In principle, the binding of an existing MPI library to Java using JNI amounts to either dynamically linking the library to the Java virtual machine, or linking the library to the object code produced by a standalone Java compiler. Complications stem from the fact that Java data formats are in general different from those of C. Java implementations have to use JNI, which allows C functions to access Java data and perform format conversion if necessary. Such an interface is fairly convenient for writing new C code to be called from Java, but is not adequate for linking existing native code.
Clearly an additional interface layer must be written in order to bind a legacy library to Java. A large library like MPI has over a hundred exported functions, so it is preferable to automate the creation of the additional interface layer. The Java-to-C interface generator (JCI) [11] takes as input a header file containing the C function prototypes of the native library. It outputs a number of files comprising the additional interface: a file of C stub-functions; files of Java class and native method declarations; and shell scripts for doing the compilation and linking. The JCI tool generates a C stub-function and a Java native method declaration for each exported function of the MPI library. Every C stub-function takes arguments whose types correspond directly to those of the Java native method, and converts the arguments into the form expected by the C library function.
As the JavaMPI bindings have been generated automatically from the C prototypes of MPI functions, they are very close to the C binding. However, there is nothing to prevent one from parting with the C-style binding and adopting a Java-style object-oriented approach, grouping MPI functions into a hierarchy of classes.

MPIJ: an MPI-like implementation in pure Java
MPIJ is a completely Java-based implementation of MPI which runs as part of the Distributed Object Group Metacomputing Architecture (DOGMA) system. MPIJ implements a large subset of MPI-like functionality including all modes of point-to-point communication, intracommunicator operations, groups, and user-defined reduction operations. Notable capabilities that are not yet implemented include process topologies, intercommunicators, and user-defined datatypes, but these are arguably needed for legacy code only.
MPIJ communication uses native marshaling of primitive Java types. On Win32 platforms this technique allows MPIJ to achieve communication speeds comparable to, and in some instances exceeding, native MPI implementations [12]. Our performance evaluation experiments show that Java communication speed would be greatly increased if native marshaling were a core Java function.
A key feature of a pure Java MPI-like implementation is the ability to function on applet-based nodes. In MPIJ, this provides a flexible method for creating clusters of workstations without the need to install any system or user software related to the message-passing environment on the participating nodes.

Rationale
The MPI standard is explicitly object-based. The C and Fortran bindings rely on 'opaque objects' that can be manipulated only by acquiring object handles from constructor functions, and passing the handles to suitable functions in the library. The C++ binding specified in the MPI-2 standard collects these objects into suitable class hierarchies and defines most of the library functions as class member functions. The draft MPJ API specification follows this model, lifting the structure of its class hierarchy directly from the C++ binding.
The initial specification builds directly on the MPI-1 infrastructure provided by the MPI Forum, together with language bindings motivated by the C++ bindings of MPI-2. The purpose of this phase of the effort is to provide an immediate, ad hoc standardization for common message passing programs in Java, as well as to provide a basis for conversion between C, C++, Fortran 77, and Java. Eventually, support for other parts of MPI-2 also belongs here, particularly dynamic process management. The position of the working group was that the initial MPI-centric API should subsequently be extended with more object-oriented, Java-centric features, although the exact requirements for this later phase have not yet been established.
The major classes of the MPJ specification are illustrated in Figure 1. The class MPJ only has static members. It acts as a module containing global services, such as initialization, and many global constants, including the default communicator COMM_WORLD. The most important class in the package is the communicator class Comm. All communication functions in MPJ are members of Comm or its subclasses. As usual in MPI, a communicator stands for a 'collective object' logically shared by a group of processes. The processes communicate, typically, by addressing messages to their peers through the common communicator. A class that will be important in the following discussion is the Datatype class. This describes the type of the elements in the message buffers passed to send, receive, and all other communication functions.

Example and data types
In general the point-to-point communication operations are realized as methods of the Comm class. The basic point-to-point communication operations are send and receive; their use is illustrated in Figure 2. Consider, for example, the MPJ analogue of the operation MPI_SEND. The method prototype is:

    void Comm.send(Object buf, int offset, int count,
                   Datatype datatype, int dest, int tag)

    buf       send buffer array
    offset    initial offset in send buffer
    count     number of items to send
    datatype  data type of each item in send buffer
    dest      rank of destination
    tag       message tag

The data part of the message consists of a sequence of count values, each of the type indicated by datatype. The actual argument associated with buf must be an array with elements of corresponding type. The value offset is a subscript in this array, defining the position of the first item of the message. The elements of buf may have primitive type or class type. If the elements are objects, they must be serializable objects. If the datatype argument represents an MPI-compatible basic type, its value must be consistent with the element type of buf. MPJ derived data types each have a unique base type, one of the nine basic types enumerated above. If the datatype argument of a communication function represents an MPJ derived type, its base type must agree with the Java element type of the associated buf argument. Alternatively, if it were decided to remove derived types from MPJ, datatype arguments could be removed from many functions, and Java runtime inquiries could be used internally to extract the element type of the buffer. ‡
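To make the buffer convention concrete, the following self-contained sketch mimics how an implementation might interpret the (buf, offset, count) arguments of send; the class and method names here are illustrative stand-ins, not part of the draft API.

```java
import java.util.Arrays;

// Sketch: the message payload selected by (buf, offset, count).
// 'payload' copies exactly what an MPJ send would transmit.
public class BufferConvention {
    static double[] payload(double[] buf, int offset, int count) {
        double[] msg = new double[count];
        System.arraycopy(buf, offset, msg, 0, count);
        return msg;
    }

    public static void main(String[] args) {
        double[] buf = {0.0, 1.0, 2.0, 3.0, 4.0};
        // Send 3 items starting at subscript 1: elements 1.0, 2.0, 3.0.
        System.out.println(Arrays.toString(payload(buf, 1, 3)));
    }
}
```

Note that offset and count select a contiguous section of the Java array; this is the mechanism MPJ uses in place of C-style pointer arithmetic on buffers.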

MPJ as an MPI-like language binding
MPJ does not have the status of an official language binding for MPI. But, as a matter of interest, this section will compare some surface features of the Java API with standard MPI language bindings.
All MPJ classes belong to the package mpj. Conventions for capitalization, etc., in class and member names generally follow the recommendations of Sun's Java code conventions [13]. In general these conventions are consistent with the naming conventions of the MPI 2.0 C++ standard. Exceptions to this rule include the use of lower case for the first letters of method names, and avoidance of underscore in variable names.
With MPI opaque objects replaced by Java objects, MPI destructors can be absorbed into Java object destructors (finalize methods), called automatically by the Java garbage collector. MPJ adopts this strategy as the general rule, and explicit calls to destructor functions are typically omitted from the Java user code. An exception is made for the Comm classes. In MPI the destructor for a communicator is a collective operation, and the user must ensure that calls are made at consistent times on all processes involved. Automatic garbage collection would not guarantee this. Hence the MPJ Comm class has an explicit free method.

Some options allowed for derived data types in the C and Fortran bindings are absent from MPJ. In particular, the Java virtual machine does not support any concept of a global linear address space. Therefore, physical memory displacements between fields in objects are unavailable or ill-defined. This puts some limits on the possible uses of any analogues of the MPI_TYPE_STRUCT type constructor. In practice the MPJ struct data type constructor has been further restricted in a way that makes it impossible to send mixed basic data types in a single message. However, this should not be a serious problem, since the set of basic data types in MPJ is extended to include serializable Java objects.

‡ Alternatively, methods like send could be overloaded to accept buffers with elements of the nine basic types. The disadvantage of this approach is that it leads to a major proliferation in the number of methods.
Array size arguments are often omitted in MPJ, because they can be picked up within the function by reading the length member of the array argument. A crucial exception is for message buffers, where an explicit count is always given. Message buffers aside, typical array arguments to MPI functions (e.g. vectors of request structures) are small arrays. If subsections of these must be passed to an MPI function, the sections can be copied to smaller arrays at little cost. In contrast, message buffers are typically large and copying them is expensive, so it is worthwhile to pass an extra size argument to select a subset. (Moreover, if derived data types are being used, the required value of the count argument is always different to the buffer length.) C and Fortran both have ways of treating a section of an array, offset from the beginning of the array, as if it was an array in its own right. Java does not have any such mechanism. To provide the same flexibility in MPJ, an explicit integer offset parameter also accompanies any buffer argument. This defines the position in the Java array of the first element actually treated as part of the buffer.
The C and Fortran languages define a straightforward mapping (or 'sequence association') between their multidimensional arrays and equivalent one-dimensional arrays. In MPI a multidimensional array passed as a message buffer argument is generally treated like a one-dimensional array with the same element type. Offsets in the buffer (such as offsets occurring in derived data types) behave like offsets in the effective one-dimensional array. In Java the relationship between multidimensional arrays and one-dimensional arrays is different. An 'n-dimensional array' is equivalent to a one-dimensional array of (n−1)-dimensional arrays. In the MPJ interface message buffers are always treated as one-dimensional arrays. The element type may be an object, which may itself have array type. Hence, multidimensional arrays can appear as message buffers, but the interpretation and behavior are significantly different.
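The difference can be illustrated with a small self-contained sketch (names illustrative): a Java double[][] is a one-dimensional array of row objects, and recovering the C/Fortran-style one-dimensional layout requires an explicit copy.

```java
import java.util.Arrays;

// Sketch: reconstructing the C/Fortran 'sequence association' layout
// from a Java array-of-arrays. In Java this flattening is an explicit,
// element-copying step; in C or Fortran it is implicit in the memory layout.
public class ArrayModel {
    static double[] flatten(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++)
            System.arraycopy(a[i], 0, flat, i * cols, cols);
        return flat;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        // 'a' itself is a one-dimensional array whose elements are rows;
        // a[0] is an independent array object, not a slice of a flat block.
        System.out.println(Arrays.toString(flatten(a)));
    }
}
```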
Unlike the standard MPI interfaces, the MPJ methods do not return explicit error codes. Instead, the Java exception mechanism is used to report errors.
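The following self-contained sketch shows the style of error reporting this implies. MPJException here is a local stand-in; the actual exception hierarchy of the draft API is still an open issue (see 'Other issues' below).

```java
// Sketch: MPJ-style error reporting via exceptions rather than the
// integer return codes of the C and Fortran MPI bindings.
// MPJException is an illustrative stand-in for the draft API's exceptions.
class MPJException extends Exception {
    MPJException(String msg) { super(msg); }
}

public class ErrorStyle {
    static void send(Object buf, int dest) throws MPJException {
        if (dest < 0) throw new MPJException("invalid rank: " + dest);
        // ... actual communication would happen here ...
    }

    public static void main(String[] args) {
        try {
            send(new double[4], -1);
        } catch (MPJException e) {
            // The error arrives as an exception, not a status code to test.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```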

Complete draft API
The Appendix of this paper lists the public interfaces of all the classes. Of course this only defines syntax. A more complete description of the semantics of all methods is available in [4].

OPEN ISSUES
The API described in [4] is not assumed to be 'final'. It was originally presented as a starting point for discussion. In this section we will mention some areas we consider to be open to improvement.

Derived data types
It is unclear whether a Java interface should support MPI-like derived data types. A proposal for a Java-compatible subset of derived types is included in the draft specification document [4], but deleting it would simplify the API significantly. In particular, datatype arguments for buffers could be dropped.
One factor in favor of including MPI-like derived data types in MPJ is the support for legacy MPI applications. The possible need to interact with native code that uses derived data types is probably best supported by including derived data types in the MPJ API specification.
It has been argued that the functionality of derived data types is already provided by Java objects, and supporting both only adds unneeded complexity. But in fact there are good reasons to retain some additional functionality of derived data types. Any scientific code, written in Java or otherwise, will benefit from the ability to efficiently and conveniently send sections (subsets) of program arrays. In MPI, this is one of the most useful roles of the so-called derived data types, and MPJ object data types do not address this requirement. The discussion of whether derived data types are to be supported in MPJ should therefore be closely linked with the discussion of how true 'scientific' (multi-dimensional) arrays, allowing Fortran-90-like sectioning operations, should be handled.
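The array-section role of derived data types can be sketched as follows: an MPI vector type with a stride selects, for example, a column of a row-major matrix, whereas in Java such a gather must currently be coded by hand (all names here are illustrative).

```java
import java.util.Arrays;

// Sketch: gathering a strided array section, the job an MPI vector
// derived datatype performs without copying into a temporary.
public class Sections {
    // Gather 'count' elements starting at 'offset', spaced 'stride' apart.
    static double[] gather(double[] buf, int offset, int count, int stride) {
        double[] out = new double[count];
        for (int i = 0; i < count; i++)
            out[i] = buf[offset + i * stride];
        return out;
    }

    public static void main(String[] args) {
        // A 2 x 3 matrix stored row-major in a flat array.
        double[] m = {1, 2, 3, 4, 5, 6};
        // Column 1 is the section {2, 5}: offset 1, count 2, stride 3.
        System.out.println(Arrays.toString(gather(m, 1, 2, 3)));
    }
}
```

A derived-type mechanism would let the message layer perform this selection internally, avoiding the user-level copy for large buffers.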

Multidimensional arrays
Some specific support for communicating multidimensional arrays would be desirable. In the current proposal, sending a multidimensional array involves either sending one row at a time or using Java object serialization, both of which will introduce performance bottlenecks. For instance, our experience has shown that MPIJ sends a 200 × 200 array of doubles over Fast Ethernet much faster when multidimensional array support is included than when individual rows are sent. More detailed analysis of this problem is presented in [12,14].
Trying to fix the problem for standard Java multidimensional arrays is probably the wrong approach. There is a deeper problem that the Java 'array-of-arrays' model for multidimensional arrays is not especially well-suited for 'scientific' computation. This issue is being actively addressed by other groups in the Java Grande Forum. In particular the work by IBM on the Array package [15], which has been adopted by the Java Grande Numerics working group, is very relevant. A more complete MPJ specification should probably include mechanisms for efficiently communicating standardized 'scientific' arrays, and their sections.
In fact, if a standard like the Array package were adopted, and if it supported description of array sections (without copying elements), it is quite likely that the remaining arguments in favor of keeping an MPI-like derived data type mechanism would go away.

Overloaded communication operations
It has been suggested that many of the communication operations should be overloaded to provide simplified variants that omit arguments like offset, count (and possibly datatype). This suggestion is not included in the current proposal, but it could be added. The primary argument in favor is that it simplifies user code. For instance,

    MPJ.COMM_WORLD.send(message, 0, message.length, MPJ.CHAR, 1, 99);

becomes

    MPJ.COMM_WORLD.send(message, MPJ.CHAR, 1, 99);

The obvious counter-argument is that this very significantly increases the total number of methods in the API. A possible compromise is to provide overloaded versions only of specific common functions, such as the point-to-point communication functions (the argument against this, in turn, is that it looks inconsistent).
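A minimal sketch of how such overloading would work, using local stand-in methods (the signatures and names are illustrative, not the draft API): the short form simply defaults offset to 0 and count to the full buffer length.

```java
// Sketch: overloaded send variants. The long form takes explicit
// offset and count; the short form forwards to it with defaults.
public class Overloads {
    static String send(char[] buf, int offset, int count, int dest, int tag) {
        // Stand-in for real communication: report what would be sent.
        return new String(buf, offset, count) + " -> rank " + dest;
    }

    // Simplified variant: whole buffer, offset 0.
    static String send(char[] buf, int dest, int tag) {
        return send(buf, 0, buf.length, dest, tag);
    }

    public static void main(String[] args) {
        char[] message = {'h', 'i'};
        System.out.println(send(message, 1, 99));
    }
}
```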

Other issues
The current draft MPJ specification supports all MPI-like error handling using the Java exception model. An alternative suggestion that has been put forward is that all MPJ exceptions be derived from two classes: MPJException and MPJRuntimeException. Subclasses of MPJException would represent errors that the user would be required to catch, whereas subclasses of MPJRuntimeException would represent uncommon or unusual errors. It has also been suggested that certain MPJ exceptions could carry subexceptions when the cause of the error is another exception. Whether or not to utilize MPI-like user-defined and predefined error handlers is also an open question. In principle, these error handlers could still serve a purpose in addition to the exception mechanism mentioned above.
It has been suggested that the specification of user-defined operations could be simplified. In the current proposal, which is modelled after a procedural approach, a more complex operation is created in two phases: users first define a function and then create a new operation object of class Op from it. This results in the creation of an extra class (UserFunction) which is not really necessary. An alternative approach would be to simply have users define subclasses of the class Op with a named method (for example, call). This design would also eliminate the overhead associated with the extra level of method invocation.
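The alternative can be sketched as follows, with local stand-ins for Op and a reduction (the class and method names are illustrative): the user subclasses Op directly and overrides call, with no separate UserFunction object.

```java
// Sketch: the suggested simplification for user-defined operations.
// Users subclass Op and override 'call' directly; no UserFunction class.
abstract class Op {
    // Combine invec into inoutvec elementwise, as in an MPI reduction.
    abstract void call(double[] invec, double[] inoutvec);
}

class SumOp extends Op {
    void call(double[] invec, double[] inoutvec) {
        for (int i = 0; i < invec.length; i++)
            inoutvec[i] += invec[i];
    }
}

public class UserOpDemo {
    public static void main(String[] args) {
        double[] a = {1, 2}, b = {10, 20};
        new SumOp().call(a, b); // b now holds the elementwise sum
        System.out.println(b[0] + " " + b[1]);
    }
}
```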
A profiling interface for MPJ has not yet been defined. A possible general design approach is for profiling class and method names to exactly match those of the non-profiling classes and methods. Implementors would then place the compiled binary files in different locations. As Java linking is always dynamic, this would allow users to enable or disable profiling by simply selecting the appropriate codebase (e.g. by changing the CLASSPATH environment variable).

DISCUSSION AND CONCLUSION
An initial goal of the Java Grande Message Passing working group was to promote a standardized MPI binding for Java. It became apparent that this road was likely to produce a collision of interest with the existing MPI community, and the name of the new API was changed to MPJ. MPJ was designated an 'MPI-like' specification. The current specification is available in [4]. This specification is essentially complete and self-contained, but as discussed in Section 4, it is not necessarily considered 'final'.
Because the proposed API was designed on object-oriented principles, most of the original MPI specification actually maps very naturally into Java. So long as one accepts the Java Grande premise that Java is an excellent basis for technical computing, an MPI-like approach to parallel computing seems very promising, indeed more promising than some have assumed. But there remain non-obvious issues about supporting basic MPI functionality. Some of the more difficult ones boil down to the lack of a good model of scientific arrays in Java. This issue is somewhat outside the purview of this working group, but is being actively discussed by the Java Grande Numerics working group [16].
Reference implementations of the MPJ specification are currently (March 2000) under development. An implementation based on JNI wrappers to native MPI will be created by adapting the mpiJava wrappers [7]. While this is a good approach in some situations, it has various disadvantages and conflicts with the ethos of Java, where pure-Java, write-once-run-anywhere software is the order of the day. A design for a pure-Java reference implementation of MPJ has also been outlined [17]. In this case, the design goals were that the system should be as easy to install on distributed systems as we can reasonably make it, and that it be sufficiently robust to be usable in an Internet environment.
Back in 1994, MPI-1 was originally designed with relatively static platforms in mind. To better support computing in volatile Internet environments, modern message passing designs for Java will have to support (at least) features such as dynamic spawning of process groups and parallel client/server interfaces as introduced in the MPI-2 specification. In addition, a natural framework for dynamically discovering new compute resources and establishing connections between running programs already exists in Sun's Jini project [18], and one line of investigation is into MPJ implementations operating in the Jini framework.
Closely modeled as it is on the MPI standards, the existing MPJ specification should be regarded as a first phase in a broader program to define a more Java-centric high-performance message-passing environment. In the future, a detachment from legacy implementations involving Java on top of native methods will be emphasized. We should consider the possibility of layering the messaging middleware over standard transports and other Java-compliant middleware (like CORBA). In a sense, the middleware developed at this level should offer a choice of emphasis between performance and generality, while always supporting portability. We note an opportunity to study and standardize aspects of real-time and fault-aware programs, drawing on the concepts learned in the MPI/RT activity [19]. For performance, we should seek to take advantage of what has been learned since MPI-1 and MPI-2 were finalized, or what was ignored in MPI standardization for various reasons; for instance, drawing on the body of knowledge developed within the MPI/RT Forum. From here we may at least glean design hints concerning channel abstractions, and a more direct use of object-oriented design for message passing than was seen in MPI-1 or MPI-2. The value of this type of messaging middleware in the embedded and real-time Java application spaces should also be considered.
Of course, a primary goal of the work mentioned above, both current and future, should be to offer MPI-like services to Java programs in an upward-compatible fashion. The purposes are twofold: performance and portability.