MPI-Start is a set of scripts that supports the execution of parallel applications on the Grid. It hides from the users of the infrastructure the complexity of how the Workload Management System launches and configures parallel grid jobs, and lets them use MPI tools and libraries with little extra effort.
MPI-Start has a component-based architecture. The current version supports three frameworks, one each for the MPI implementation, the scheduler and the file distribution method. In addition, MPI-Start provides a mechanism for callback hooks. All these parts are bound together by the mpi-start core.
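A minimal sketch of how a component-based design of this kind could select a plugin by name in POSIX shell. The names activate_plugin, mpi_openmpi_activate, mpi_mpich2_activate and MPI_LAUNCHER are hypothetical and are not taken from the mpi-start sources; the launcher values are placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: each plugin is a shell function named
# <framework>_<name>_activate. Not the real mpi-start layout.

mpi_openmpi_activate() { MPI_LAUNCHER="mpirun"; }    # placeholder value
mpi_mpich2_activate()  { MPI_LAUNCHER="mpiexec"; }   # placeholder value

# activate_plugin FRAMEWORK NAME
# Calls the matching activate function, or fails if none is defined.
activate_plugin() {
    plugin_fn="${1}_${2}_activate"
    if command -v "$plugin_fn" >/dev/null 2>&1; then
        "$plugin_fn"
    else
        echo "no $1 plugin named '$2'" >&2
        return 1
    fi
}

# Example: the core asks for the MPI plugin requested by the user.
activate_plugin mpi openmpi && echo "launcher: $MPI_LAUNCHER"
```

With this layout, supporting a new MPI implementation would only mean adding one more function; the core code that calls activate_plugin stays unchanged.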
The main features of MPI-Start are:
Offers a single, stable interface to the WfMS, so the WfMS only needs to implement support for mpi-start. The information about which MPI implementation the user requested is passed through the WfMS to mpi-start and needs no further processing by the WfMS (exception: PACX-MPI).
Easily extensible via a component (plugin) based architecture. It is enough to write an extra component to support new MPI implementations, new schedulers etc.
Handles compatibility issues between different schedulers and MPI implementations. In case the MPI implementation is not able to understand the information provided by a certain scheduling system, mpi-start will perform all necessary tasks to convert the scheduler information into valid information for the MPI implementation.
To be highly portable, mpi-start is written entirely as POSIX shell scripts. This also makes it easy for site administrators to modify it for the local site if required.
Mpi-start is completely relocatable. This feature allows mpi-start to offer support for remote injection. Remote injection means that mpi-start does not need to be installed on a remote site - it is possible to send mpi-start along with the job.
All MPI implementations require the executable to be present on every execution node. Therefore, on a non-shared file system, mpi-start automatically distributes the binary to the remote nodes.
Mpi-start lets the user register callback functions (hooks) that are called before the parallel job starts and after it finishes.
Extensive debugging features are implemented as well. They are available to the user and make it easy to detect problems remotely during the startup of the job.
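As an illustration of the callback feature, a hook file could look like the sketch below. The function names pre_run_hook and post_run_hook, and the use of I2G_MPI_APPLICATION for the executable name, are assumptions here; check them against the documentation of the installed MPI-Start version.

```shell
#!/bin/sh
# Sketch of a user hook file. Function and variable names are assumed,
# not confirmed by this page.

# Called before the parallel job starts, e.g. to compile the application.
pre_run_hook() {
    echo "pre-run: preparing ${I2G_MPI_APPLICATION:-<unset>}"
    return 0
}

# Called after the parallel job has finished, e.g. to pack output files.
post_run_hook() {
    echo "post-run: collecting output"
    return 0
}
```

Returning 0 from each function signals success in the usual shell sense; how a non-zero return is treated depends on the MPI-Start version.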
MPI-Start runs through the following steps, in order:
check for scheduler plugin
activate scheduler plugin
get machinefile
check for MPI plugin
check DORII/EGEE environment
parameter adjustments depending on scheduler/MPI
activate MPI plugin and set MPI implementation specific parameters
check user pre-run hooks
check file system
choose file distribution plugin if FS is non-shared
distribute files using the file distribution plugin
check for external MPI tools
execute MPI application with generated command
check user post-run hooks
clean up files using the file distribution plugin
return the value of the mpirun/mpiexec command
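To illustrate the kind of conversion meant by the machinefile and parameter-adjustment steps: some schedulers list a host once per allocated slot, while an MPI launcher may expect one line per host with a slot count. The following is a minimal sketch under that assumption about the input and output formats; the function name nodefile_to_hostcounts is hypothetical, and this is not the actual mpi-start conversion code.

```shell
#!/bin/sh
# nodefile_to_hostcounts FILE
# Input:  one hostname per line, repeated once per allocated slot.
# Output: "host count" per distinct host, in order of first appearance.
nodefile_to_hostcounts() {
    awk 'NF {
             if (!($1 in count)) order[++n] = $1   # remember first appearance
             count[$1]++
         }
         END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$1"
}

# Demo with a temporary file standing in for the scheduler's nodefile.
demo_nodefile=$(mktemp)
printf 'node1\nnode1\nnode2\n' > "$demo_nodefile"
nodefile_to_hostcounts "$demo_nodefile"
rm -f "$demo_nodefile"
```

For the demo input, the function reports two slots on node1 and one on node2. A real plugin would then rewrite these pairs into whatever syntax the selected MPI implementation expects.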
MPI-Start is installed on all CEs where MPI libraries are available.
Besides the version of MPI-Start preinstalled on a CE, users can optionally specify another version for their grid job. In that case the required version of MPI-Start is attached to the grid job as a component of the InputSandbox. The version of MPI-Start to use is specified through the environment variable $I2G_MPI_START, which initially refers to the default version available on the CE.
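A job wrapper could choose between a copy of MPI-Start shipped in the InputSandbox and the preinstalled default roughly as sketched below. The helper name choose_mpi_start and the sandbox path ./mpi-start/bin/mpi-start are hypothetical, and the variable is written in upper case as I2G_MPI_START, which is an assumption about its exact spelling.

```shell
#!/bin/sh
# choose_mpi_start SANDBOX_COPY
# Prints the path of the MPI-Start script to use: the sandbox copy if it
# exists and is executable, otherwise the CE default from I2G_MPI_START.
choose_mpi_start() {
    if [ -x "$1" ]; then
        echo "$1"
    else
        echo "${I2G_MPI_START:-mpi-start}"
    fi
}

# Example: prefer a copy unpacked from the InputSandbox (hypothetical path).
I2G_MPI_START=$(choose_mpi_start "./mpi-start/bin/mpi-start")
export I2G_MPI_START
echo "using MPI-Start at: $I2G_MPI_START"
```

Falling back to the existing value keeps the job working on CEs where no private copy was sent along.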