The DORII Middleware. MPI-Start

Overview and general information

MPI-Start is a set of scripts that supports the execution of parallel applications on the Grid. It hides from the users of the infrastructure the complexity of configuring and launching parallel grid jobs through the Workload Management System (WfMS), and lets them use MPI tools and libraries conveniently.

MPI-Start has a component-based architecture. The current version supports three frameworks, one each for the MPI implementation, the scheduler and the file distribution method. In addition, MPI-Start provides a mechanism for callback hooks. All these parts are bound together by the mpi-start core.

The main features of MPI-Start are:

  • Offers a single, stable interface to the WfMS. This means that the WfMS only needs to implement support for mpi-start. The information about which MPI implementation the user requested is passed through the WfMS to mpi-start and generally needs no further processing by the WfMS (exception: PACX-MPI).
  • Easily extensible via a component (plugin) based architecture. It is enough to write an extra component to support new MPI implementations, new schedulers etc.
  • Handles compatibility issues between different schedulers and MPI implementations. In case the MPI implementation is not able to understand the information provided by a certain scheduling system, mpi-start will perform all necessary tasks to convert the scheduler information into valid information for the MPI implementation.
  • To be highly portable, mpi-start is written entirely in POSIX shell script. This also makes it easy for site administrators to adapt it to the local site if required.
  • mpi-start is completely relocatable, which allows it to support remote injection: mpi-start does not need to be installed on a remote site, because it can be sent along with the job.
  • All MPI implementations require the executable to be present on every execution node. Therefore, on a non-shared file system, mpi-start automatically distributes the binary to the remote nodes.
  • mpi-start lets the user register callback functions (hooks) that are called before the parallel job starts and after it finishes.
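
To illustrate the kind of scheduler-to-MPI conversion mentioned in the list above, here is a minimal sketch. It is not taken from the MPI-Start sources. It assumes a scheduler that writes one hostname per allocated slot into a node file and an MPI launcher that expects one "host:slots" line per host; the function name nodefile_to_machinefile is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of MPI-Start): collapse a node file that
# lists one hostname per slot into "host:slots" lines, keeping the order
# in which hosts first appear.
nodefile_to_machinefile() {
    awk '
        NF == 0        { next }               # skip empty lines
        !($1 in count) { order[++n] = $1 }    # remember first-seen order
                       { count[$1]++ }        # each line stands for one slot
        END {
            for (i = 1; i <= n; i++)
                print order[i] ":" count[order[i]]
        }
    ' "$1"
}

# Usage (the file names are placeholders):
#   nodefile_to_machinefile "$HOME/nodes.txt" > machinefile
```

A real plugin would also have to cover schedulers that report slots in other formats; the sketch only shows the general idea of translating one representation into another.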

Extensive debugging features have been implemented as well. These debugging features are available to the user and allow the easy remote detection of any problem during the startup of the job.

The workflow of MPI-Start is given as a list here:

  1. check for scheduler plugin
  2. activate scheduler plugin
  3. get machinefile
  4. check for MPI plugin
  5. check DORII/EGEE environment
  6. parameter adjustments depending on scheduler/MPI
  7. activate MPI plugin and set MPI implementation specific parameters
  8. check user pre-run hooks
  9. check file system
  10. choose file distribution plugin if FS is non-shared
  11. distribute files using the file distribution plugin
  12. check for external MPI tools
  13. execute MPI application with generated command
  14. check user post-run hook
  15. clean up files using the file distribution plugin
  16. return the value of the mpirun/mpiexec command
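
Steps 8, 13, 14 and 16 of this workflow can be sketched in POSIX shell as follows. This is an illustration under stated assumptions, not the actual mpi-start code: the names run_with_hooks, pre_run_hook and post_run_hook, and the convention of passing the launcher command as arguments, are placeholders chosen for the example.

```sh
#!/bin/sh
# Illustrative sketch only: run optional user hooks around a launcher
# command and hand back the exit status of that command.
run_with_hooks() {
    # Step 8: call the pre-run hook if the user has defined one.
    # Assumption of this sketch: a failing pre-run hook aborts the run.
    if command -v pre_run_hook >/dev/null 2>&1; then
        pre_run_hook || return $?
    fi

    # Step 13: execute the generated command (given here as arguments).
    "$@"
    rc=$?

    # Step 14: call the post-run hook if the user has defined one.
    # Its own status is ignored so that it cannot mask the launcher's.
    if command -v post_run_hook >/dev/null 2>&1; then
        post_run_hook
    fi

    # Step 16: return the value of the launcher command.
    return "$rc"
}
```

With shell functions pre_run_hook and post_run_hook defined by the user, a call such as run_with_hooks mpirun -np 4 ./app would run the hooks around the launcher and return the launcher's exit status.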

Installation

MPI-Start is installed on all Computing Elements (CEs) where MPI libraries are available.

Besides the version of MPI-Start preinstalled on a CE, users can optionally specify another version for their grid job. In that case the required version of MPI-Start is attached to the grid job as a component of the InputSandbox. The version to be used is specified through the environment variable $I2G_MPI_Start, which initially refers to the default version available on the CE.
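
A job wrapper script might choose between the two versions along the following lines. This is a hedged sketch: the function name select_mpi_start, its argument and the check for an executable file are assumptions made for illustration; only the variable name comes from the text above.

```sh
#!/bin/sh
# Hypothetical helper: prefer an MPI-Start copy shipped with the job
# (for example through the InputSandbox); otherwise keep the site default
# that $I2G_MPI_Start initially refers to.
select_mpi_start() {
    bundled=${1:-}    # path of the copy sent along with the job (assumed)

    # Use the bundled copy only if it exists and is executable.
    if [ -n "$bundled" ] && [ -x "$bundled" ]; then
        I2G_MPI_Start=$bundled
    fi

    # Neither a bundled copy nor a site default is available.
    if [ -z "${I2G_MPI_Start:-}" ]; then
        echo "no MPI-Start version available" >&2
        return 1
    fi

    export I2G_MPI_Start
}

# Usage (the path is a placeholder):
#   select_mpi_start "$PWD/mpi-start/bin/mpi-start" && "$I2G_MPI_Start"
```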

Documentation and technical support

More details about MPI-Start can be found at http://www.hlrs.de/organization/amt/projects/mpi-start/. For technical support or to report problems, please refer to the DORII Helpdesk service available at https://dorii-helpdesk.grid.elettra.trieste.it/

DORII project receives funding from the EC's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° RI-211693.