Sunday 20 February 2011

Using MPI sockets

This article presents how to use MPI to create a remote socket and use it through MPI calls. Remember that two parts need to communicate: the profiler - the library that uses the profiling interface of MPI to profile the program - and the display - the program that shows the information sent by the profiler.

First of all, some research was done to find out whether a socket could be created with MPI on the profiler side and then be reached from an ordinary socket library on the display side. So far no examples were found using that approach, and as this is only a technical test, no real implementation was attempted that way.

The approach used here is to bind the profiler and display communicators with a technique similar to MPI_Comm_spawn, but one that doesn't require the two programs to be tied together. This is done using MPI_Open_port and its companion functions (MPI_Comm_accept on the profiler side, MPI_Comm_connect on the display side).

The code wasn't modified much from the MPI_Comm_spawn approach, as you are going to see. The reference used to understand and develop this approach was the MPI standard itself: 5.4.6. Client/Server Examples
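For reference, here is a condensed sketch of the pattern those examples describe, stripped of the profiler/display specifics (the integer exchanged and the argument handling are purely illustrative, not part of the actual code):

/* Server side: open a port, accept one connection, exchange a value. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm client;
    int value = 42;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("listening on '%s'\n", port_name);

    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    MPI_Send(&value, 1, MPI_INT, 0, 0, client);

    MPI_Comm_disconnect(&client);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}

/* Client side: connect to the port string given on the command line. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    MPI_Comm server;
    int value;

    MPI_Init(&argc, &argv);
    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    MPI_Recv(&value, 1, MPI_INT, 0, 0, server, MPI_STATUS_IGNORE);
    printf("received %d\n", value);

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}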


The profiler side - server side


The global idea of this approach is for the profiler to open a port and wait for a display to connect to it. The idea can be pushed further, if needed, to allow several displays to connect to a single profiler (sharing the view of the program across several displays, for example).
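For instance, nothing in the API prevents the profiler from calling MPI_Comm_accept several times on the same port and keeping one intercommunicator per display. The sketch below only shows the idea; MAX_DISPLAYS, displays[] and accept_displays() are made-up names, not part of the current code:

#include <mpi.h>

#define MAX_DISPLAYS 4

static MPI_Comm displays[MAX_DISPLAYS];
static int      ndisplays = 0;

/* Accept up to 'wanted' displays on an already opened port.
   Each MPI_Comm_accept blocks until one more display connects. */
void accept_displays(char* port_name, int wanted)
{
    while ( ndisplays < wanted && ndisplays < MAX_DISPLAYS )
    {
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                        &displays[ndisplays]);
        ndisplays++;
    }
}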

What was actually modified from the spawn example is the way the profiler and the display are connected together. Rather than calling MPI_Comm_spawn, MPI_Open_port was used, and a few lines were added just before finalizing the execution.


Opening the port

int start_child(char* command, char* argv[])
{
    MPI_Open_port(MPI_INFO_NULL, port_name);

    /* child doesn't find it...
    sprintf(published, "%s-%d", PROFNAME, world_rank);
    MPI_Publish_name(published, MPI_INFO_NULL, port_name); */

    fprintf(stderr, "!profiler!(%d) open port '%s'\n", world_rank, port_name);
    fprintf(stderr, "!profiler!(%d) waiting for a child...\n", world_rank);

    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);

    fprintf(stderr, "!profiler!(%d) got a child!\n", world_rank);

    int r;
    MPI_Comm_rank(intercomm, &r); /* rank in the local group of the intercommunicator */
    fprintf(stderr, "!profiler!(%d) is %d on parent!\n", world_rank, r);

    // post the receive that will tell us the child is gone ("I'm dying")
    if ( PMPI_Irecv(&(quitmessage[0]), INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE,
                    CHILD_RANK, INTERCOMM_TAG, intercomm, &dead_child) != MPI_SUCCESS )
    {
        fprintf(stderr, "!profiler!(%d) communication failed!\n", world_rank);
        intercomm = MPI_COMM_NULL;
        return FAILURE;
    }

    char mess[INTRA_MESSAGE_SIZE];
    sprintf(mess, "%d IsYourFather", world_rank);

    sendto_child(mess);

    PMPI_Barrier(MPI_COMM_WORLD);

    return SUCCESS;
}

Finalizing the communication

int wait_child(char* mess)
{
    // tell the child we are finishing ("send my death")
    if ( sendto_child(mess) == SUCCESS )
    {
        // then wait for the child's own death notification
        if ( PMPI_Wait(&dead_child, MPI_STATUS_IGNORE) == MPI_SUCCESS )
        {
            fprintf(stderr, "!profiler!(%d) received its child death!\n", world_rank);
            //MPI_Unpublish_name(published, MPI_INFO_NULL, port_name);
            MPI_Close_port(port_name);
            return SUCCESS;
        }
    }

    return FAILURE;
}
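For completeness, this is roughly what the matching sequence could look like on the display side, assuming it shares the same message size, datatype and tag constants as the profiler (a sketch using the C bindings with made-up names; the real display uses the C++ bindings shown below):

/* Sketch only: the display answers the PMPI_Irecv posted in start_child()
   by sending its own "I'm dying" message before disconnecting. */
void tell_parent_i_am_dying(MPI_Comm* parent, int my_rank)
{
    char mess[INTRA_MESSAGE_SIZE];

    sprintf(mess, "%d IsDying", my_rank);

    /* rank 0 is assumed to be the profiler in the remote group */
    MPI_Send(mess, INTRA_MESSAGE_SIZE, INTRA_MESSAGE_MPITYPE,
             0, INTERCOMM_TAG, *parent);

    MPI_Comm_disconnect(parent);
}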

The display side - the client side


On the display side, the same kind of modification had to be done. Rather than using information from the parent's communicator, a connection to a port is performed.


The MPIWatch::getWatcher method

MPIWatch* MPIWatch::getWatcher(char port_name[])
{
    if ( instance == 0 )
    {
        MPI::Init();

        std::cout << "Try to connect to " << port_name << std::endl;

        parent = MPI::COMM_WORLD.Connect(port_name, MPI::INFO_NULL, 0);

        if ( parent == MPI::COMM_NULL )
        {
            std::cerr << "Cannot connect with the parent program! Aborting." << std::endl;
            MPI::Finalize();
            return 0;
        }

        std::cout << "Connection with parent completed!" << std::endl;

        instance = new MPIWatch();
    }

    return instance;
}

Running it!

The main difference here is that in the previous version the display was started automatically by the spawn. Now it has to be started separately, and actually one per MPI process. Some attempts were made to use the name publication described in the standard (see the reference further up), but for an unknown reason the display part never found the name published by the profiler. So for now, one port is opened per MPI process - or one name was published - and each display connects to one of them through a command line argument.

Console 1: run MPI

$> mpiexec -n 2 mpi_ring
!profiler!(0) open port '3449421824.0;tcp://192.168.0.2:48251+3449421825.0;tcp://192.168.0.2:36965:300'
!profiler!(1) open port '3449421824.0;tcp://192.168.0.2:48251+3449421825.1;tcp://192.168.0.2:52304:300'

Console 2-3: run the display

$> mpidisplay '3449421824.0;tcp://192.168.0.2:48251+3449421825.0;tcp://192.168.0.2:36965:300'
$> mpidisplay '3449421824.0;tcp://192.168.0.2:48251+3449421825.1;tcp://192.168.0.2:52304:300'

The current implementation is a little more complicated to run than the spawn version, but it doesn't end with any error code. It also allows more flexibility for the future: more than one display on a single profiler, and any other idea that requires a more flexible approach than a spawned process (like being able to connect a display in the middle of a run and disconnect it at will, to see whether the program is deadlocked, etc.).

Limitations

The port information is rather long, and it is not very user friendly to have to look up the profiler output and copy/paste the port string into the display. Further investigation has to be made on that part, either to find the cause of the name publication problem, or to find a way to look for the port in a more automatic fashion.

The actual name publication idea was to publish a name to look up, like 'profiler-mpirank' - or with any string given by the user instead of 'profiler'. This would allow the display to be started with a single command that only needs two pieces of information: the base name of the profiler and the number of MPI processes to connect to!
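In terms of code, the intended flow would look roughly like the fragment below (a sketch of the MPI name publication calls; 'profiler' stands for the user-chosen base name, and the surrounding variables are the ones already used in start_child() and getWatcher()):

/* Profiler side: publish the port under a predictable service name. */
char service[64];
sprintf(service, "profiler-%d", world_rank);
MPI_Publish_name(service, MPI_INFO_NULL, port_name);

/* Display side: rebuild the same service name, look the port up,
   then connect exactly as before. */
char looked_up[MPI_MAX_PORT_NAME];
MPI_Lookup_name(service, MPI_INFO_NULL, looked_up);
MPI_Comm_connect(looked_up, MPI_INFO_NULL, 0, MPI_COMM_SELF, &parent);

One lead worth checking for the lookup failure: with some MPI implementations, names published in one mpiexec job are only visible to a separately started job when a common name server is running (ompi-server in the case of Open MPI).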

The other limitation is not a real one, but more of a bug in the current implementation. A barrier was added to wait for every MPI process to get a display; this isn't much of a problem in itself, as no high performance is required for this project. The problem arises when a display is closed while the program is running: the current implementation doesn't catch it, and deadlocks. Further investigation will obviously be done on that problem later on.

Source code

As for the previous version, the source code is available on http://www.megaupload.com/?d=ZXJGHBPQ. It is a test version, not very clean, and buggy (as explained above). Later on, a post will describe how to use the library with an MPI code in C.

Further work

The preliminary technical overview of the project is about to be over. Now that the basic techniques of the project are set up, a more detailed reasoning will be done on the project's functional requirements. As part of the Project Preparation course of the MSc, a risk analysis and a workplan for the overall project also have to be done, and they will be published here as well.
