Saturday 12 February 2011

Using MPI_Comm_spawn

This article presents how to use MPI_Comm_spawn and the problems associated with it. It first shows the profiler code, then the display code, and finally discusses the problems.


Spawning the interface: the profiler's point of view


In order to spawn the interface, the PATH variable was exported so that it contains the path to the mpidisplay executable, the simple interface developed for this test. The test basically counts the number of calls to some of the MPI communication functions.
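
As a side note, the MPI standard also reserves a "path" info key for MPI_Comm_spawn, which could replace the exported PATH variable. A small untested sketch follows (implementations are free to ignore reserved keys, and "/path/to/display" is a placeholder):

// untested alternative to exporting PATH: pass the executable's directory
// through the reserved "path" info key ("/path/to/display" is a placeholder)
MPI_Info info;

MPI_Info_create(&info);
MPI_Info_set(info, "path", "/path/to/display");

MPI_Comm_spawn("mpidisplay", MPI_ARGV_NULL, 1, info, 0,
               MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

MPI_Info_free(&info);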

The spawning actually occurs in the overloaded MPI_Init function:

int world_rank;
MPI_Comm intercomm = MPI_COMM_NULL;
int intercomm_child_rank = 0;

int MPI_Init(int* argc, char ***argv)
{
    int ret;

    ret = PMPI_Init(argc, argv);

    PMPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    fprintf(stderr, "!profiler(%d)! MPI_Init()\n", world_rank);

    // spawn the interface
    MPI_Comm_spawn("mpidisplay", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    return ret;
}


This simply starts the display when the profiler is launched through mpiexec, and links the two together through the intercommunicator. Since the spawn is done over MPI_COMM_SELF, each MPI process gets its own display. But as soon as MPI_Finalize is called, both processes are killed and the interface is closed. A trick was therefore used to make the profiler wait for the child to be closed before it stops running.

The idea is that the display sends a message to the profiler when it is closed, and that the profiler waits on this message with an asynchronous receive posted from the beginning. When MPI_Finalize is called in the profiler, an MPI_Wait on that message is performed, which basically blocks until the display has closed. The profiler also sends information about its imminent death, so the display can show it if needed.


#define CHILD "mpidisplay"
#define CHILD_ARGS MPI_ARGV_NULL

int world_rank;
MPI_Comm intercomm = MPI_COMM_NULL;
int intercomm_child_rank = 0;

static Intra_message quitmessage[INTRA_MESSAGE_SIZE];
MPI_Request dead_child = MPI_REQUEST_NULL;

int MPI_Init(int* argc, char ***argv)
{
    int ret;
    Intra_message message[INTRA_MESSAGE_SIZE];

    ret = PMPI_Init(argc, argv);

    PMPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    fprintf(stderr, "!profiler(%d)! MPI_Init()\n", world_rank);

    // spawn the interface
    MPI_Comm_spawn(CHILD, CHILD_ARGS, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    // tell the display that this rank has initialized
    sprintf(message, "%d Init", world_rank);

    PMPI_Ssend(message, INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE,
               intercomm_child_rank, 0, intercomm);

    // post an asynchronous receive for the display's "I'm dying" message
    PMPI_Irecv(&(quitmessage[0]), INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE,
               intercomm_child_rank, 0, intercomm, &dead_child);

    return ret;
}

int MPI_Finalize(void)
{
    int ret;

    fprintf(stderr, "!profiler!(%d): MPI_Finalize()\n", world_rank);

    if ( dead_child != MPI_REQUEST_NULL )
    {
        Intra_message message[INTRA_MESSAGE_SIZE];

        // announce the imminent death to the display
        sprintf(message, "%d Finalize", world_rank);

        PMPI_Ssend(message, INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE,
                   intercomm_child_rank, 0, intercomm);

        fprintf(stderr, "!profiler!%d is waiting for its child...\n", world_rank);

        // wait for the display to quit
        PMPI_Wait(&dead_child, MPI_STATUS_IGNORE);
        fprintf(stderr, "!profiler!%d finished waiting...\n", world_rank);
    }

    ret = PMPI_Finalize();

    return ret;
}

Spawning the interface: the display's point of view


The display was implemented using Qt, and is therefore written in C++. The MPI calls are the same, just organized in an object-oriented fashion (using the MPI C++ bindings).

When the child is spawned, it can retrieve its parent's information; it does so in order to get the special intercommunicator, and then simply uses normal MPI communication over it.
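
For reference, the same parent lookup exists in the plain C bindings; a minimal sketch, not taken from the actual display code:

MPI_Comm parent;

MPI_Comm_get_parent(&parent);

// MPI_COMM_NULL means this process was not started with MPI_Comm_spawn
if ( parent == MPI_COMM_NULL )
    fprintf(stderr, "Not a spawned process!\n");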

The MPIWatch class was written to handle the MPI communication. It implements the singleton design pattern. The MPI initialization code is therefore placed in the global call that creates the object, and is normally performed only once (as the object is kept around until the end of the program).


MPIWatch* MPIWatch::getWatcher(void)
{
    if ( instance == 0 )
    {
        int parentSize;

        MPI::Init();
        parent = MPI::Comm::Get_parent();

        if ( parent == MPI::COMM_NULL )
        {
            std::cerr << "Cannot connect with the parent program! Aborting." << std::endl;
            MPI::Finalize();
            return 0;
        }

        parentSize = parent.Get_remote_size();

        if ( parentSize != 1 )
        {
            std::cerr << "Parent communicator size is " << parentSize << "! It should be 1. Aborting." << std::endl;
            parent.Abort(-1);
            return 0;
        }

        instance = new MPIWatch();
    }

    return instance;
}

The way the instance catches messages will be discussed later. Basically, MPIWatch performs synchronous receives from its parent and pushes the results onto a stack that is read by the interface.
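
To give an idea of that loop, here is a minimal sketch, assuming MPIWatch is a QThread and that pushMessage() and stopRequested are hypothetical names for the stack interface and the stop flag:

// hypothetical sketch of the receive loop; pushMessage() and stopRequested
// are assumed names, and Intra_message is assumed to be a char-like type
void MPIWatch::run(void)
{
    Intra_message message[INTRA_MESSAGE_SIZE];

    while ( ! stopRequested )
    {
        // blocking receive from the parent (rank 0 of the remote group)
        parent.Recv(message, INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE, 0, 0);

        // push the message on the stack read by the interface
        pushMessage(QString(message));
    }
}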

When the window is closed, the MPIWatch object has to be destroyed, and the quit message is therefore sent to the parent.


bool MPIWatch::delWatcher()
{
    if ( ! instance )
        return false;

    if ( instance->isRunning() )
        return false;

    QString s(MESSAGE_QUIT);

    // tell the parent that the display is quitting
    parent.Ssend(s.toStdString().c_str(), INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE, 0, 0);

    MPI::Finalize();
    parent = MPI::COMM_NULL;

    delete instance;
    instance = 0;

    return true;
}

Problems with spawned instances


The major issue with the spawned interface is the actual call to MPI_Finalize. When either the child or the parent calls it, the ORTE process (the daemon that handles the MPI communication in Open MPI; MPICH should have something similar) kills the other. Therefore, even with the trick of making the profiler wait for the display, the execution would not always terminate properly. It is actually rather bizarre that there is no proper way of doing so.

A bit more research will certainly be done on that problem, to see whether closing the communicator can be effective. But there is not much advantage compared to a typical client-server application, and the next development tests will be done in that direction.
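
For the record, here is a sketch of what closing the connection could look like, assuming MPI_Comm_disconnect behaves as the MPI-2 standard describes; the call is collective, so both sides must perform it, and it has not been tested here:

// untested sketch: sever the dynamic connection before finalizing, so the
// runtimes can in principle terminate independently

// profiler side, at the end of the overloaded MPI_Finalize:
PMPI_Comm_disconnect(&intercomm);   // sets intercomm to MPI_COMM_NULL

// display side, in MPIWatch::delWatcher(), before MPI::Finalize():
parent.Disconnect();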
