Civil-Comp Proceedings
ISSN 1759-3433
CCP: 100
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON ENGINEERING COMPUTATIONAL TECHNOLOGY
Edited by: B.H.V. Topping
Paper 74

Efficient Parallelization of Java Applications for Semantic Web by means of the Message-Passing Interface

A. Cheptsov

HLRS - High Performance Computing Center Stuttgart, University of Stuttgart, Germany

Full Bibliographic Reference for this paper
A. Cheptsov, "Efficient Parallelization of Java Applications for Semantic Web by means of the Message-Passing Interface", in B.H.V. Topping, (Editor), "Proceedings of the Eighth International Conference on Engineering Computational Technology", Civil-Comp Press, Stirlingshire, UK, Paper 74, 2012. doi:10.4203/ccp.100.74
Keywords: semantic web, Java, parallelization, message-passing interface, random indexing, Open MPI.

Summary
Driven by the concepts of portability and interoperability, Java has become a widely accepted general-purpose programming language with a large existing code base and a large programmer community. Among other areas, Java has been widely adopted in data-centric computing, such as information retrieval and the semantic web, which have a potential demand for parallel and high-performance computing. Recent advances in those communities require their Java applications to scale up to vast and rapidly increasing volumes of data, e.g. data coming from millions of sensors in the Smart Cities domain. Java, however, largely lacks mechanisms that would enable applications to scale beyond a single NUMA node and across the network interconnect of a modern supercomputing system. As a result, a number of software applications developed in Java have been facing performance and scalability issues in view of the growing computation demands. The message-passing interface (MPI) has proved to be an efficient solution for a wide variety of parallel applications developed in "traditional" high performance computing languages such as C and Fortran. This paper demonstrates that the design features of Java prevent the native MPI realization from scaling massively on production high performance computing systems. In response to this challenge, a solution is presented that overcomes the performance issues of the native implementation by integrating it with a highly scalable C realization such as Open MPI.

MPI is one of the most efficient techniques for implementing parallel applications. Since the emergence of open source libraries such as MPJ Express and mpiJava, MPI can be used to parallelize Java applications as well. Nevertheless, this technique was long underestimated for Java development for many reasons; perhaps the most important of them is the complexity of applying a process-based programming model. This paper introduces an MPI-based programming model for parallelizing Java applications. Following a common parallelization strategy based on domain decomposition, an MPI-parallel version of a pilot Semantic Web application (Airhead), which performs random indexing and search in large semantically annotated text sets, is implemented. The technique described allows any other Java application to apply MPI efficiently to parallelize the available serial code with minimal knowledge of distributed-memory parallelization. For the pilot application, a speed-up of 33 on 16 compute nodes has already been achieved. With this experience, the authors encourage other researchers to apply MPI-based parallelization to their respective Java applications.
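
The domain-decomposition idea mentioned in the summary can be illustrated with a minimal sketch in plain Java. It is not the paper's implementation and does not use any MPI library: the class name, the helper methods, and the word-counting "work" are hypothetical stand-ins, and the loop over ranks only simulates what separate MPI processes would do in parallel. In a real MPI run, the rank and the number of processes would be obtained from the communicator, and the final sum would be computed by a reduction rather than a loop.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of block domain decomposition over a document set.
public class DomainDecompositionSketch {

    /** Returns {start, endExclusive}: the contiguous block of items owned by a rank. */
    public static int[] blockRange(int totalItems, int numRanks, int rank) {
        int base = totalItems / numRanks;
        int remainder = totalItems % numRanks;
        // The first `remainder` ranks each take one extra item, so block sizes differ by at most one.
        int start = rank * base + Math.min(rank, remainder);
        int length = base + (rank < remainder ? 1 : 0);
        return new int[] { start, start + length };
    }

    /** Stand-in for per-document work (assumption: counts whitespace-separated words). */
    public static int countWords(String document) {
        String trimmed = document.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Work done by one rank: it processes only its own block of documents. */
    public static int localWork(List<String> documents, int numRanks, int rank) {
        int[] range = blockRange(documents.size(), numRanks, rank);
        int sum = 0;
        for (int i = range[0]; i < range[1]; i++) {
            sum += countWords(documents.get(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("a b c", "d e", "f", "g h i j", "k l");
        int numRanks = 3; // simulated number of processes
        int total = 0;
        for (int rank = 0; rank < numRanks; rank++) {
            int local = localWork(documents, numRanks, rank);
            System.out.println("rank " + rank + " local result = " + local);
            total += local; // stands in for a sum reduction across processes
        }
        System.out.println("total = " + total);
    }
}
```

The block distribution keeps each rank's share contiguous, which suits serial code that already iterates over a document list: only the loop bounds change per process, and the partial results are combined once at the end.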
