Real-time visualisation of MPI programs
David Henty
One of the problems with MPI programming is that it is very difficult to debug incorrect programs. Tools like VAMPIR can display the communication patterns of MPI programs by producing a trace file during execution and letting the user view the file as a timeline afterwards. Unfortunately, this is only useful if the program runs to completion, which is usually not the case when you have a bug! It would also be useful to track MPI communications at runtime for training and education purposes. New users could see what their own programs are doing, or run standard examples and follow their execution to understand concepts such as synchronous/asynchronous modes and blocking/non-blocking operations.
The project is to develop a tool/library that, for each MPI process, pops up a window showing real-time information about its execution. For example, it could simply say what routine is being called ("Currently in MPI_Send"), give more details ("Calling MPI_Send to send 14 real numbers to rank 4") or display the operations graphically (e.g. boxes showing all the pending sends and receives, or animations showing messages matching up at runtime). This tool would then be run on a set of test programs, from simple examples all the way to full applications, to see how useful it is in practice. Possible extensions include halting execution until the user hits a button ("click here to continue"), which could be very useful in illustrating concepts such as collective communications: the routine will not complete until the user has clicked "go" for all MPI processes. Another possibility would be to display where in the source code each process is at any one time.
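As a rough illustration of the "more details" text mode, the sketch below formats the kind of message quoted above. The function name describe_send and its parameters are hypothetical, chosen only for this example; the real tool might build its messages quite differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of MPI): write a one-line description of a
 * send into buf. Returns the value of snprintf, i.e. the number of
 * characters the full message needs, excluding the terminating '\0'. */
int describe_send(char *buf, size_t len, int count,
                  const char *type_name, int dest)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "Calling MPI_Send to send %d %s numbers to rank %d",
                    count, type_name, dest);
}
```

Calling describe_send(line, sizeof line, 14, "real", 4) fills line with "Calling MPI_Send to send 14 real numbers to rank 4", which could then be written to the window belonging to that process.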
It is quite simple to do this in practice, as the MPI library has a separate "profiling interface" that allows the user to intercept any MPI call easily: every MPI routine is also available under a name with the prefix PMPI_. The tool would supply its own version of each MPI routine that displays information about the call in some way (e.g. writes text to a window) and then calls the real routine through its PMPI_ name.
The tool could easily be developed and tested on a single workstation with all MPI processes displaying information on the same screen. However, it would be more interesting to run on a real cluster like the EPCC training room machines. Here, a window would appear on each screen where an MPI process was running and there would be interactions between different machines in the room. A user at one screen might have to call to a user at another screen for them to initiate a receive operation so that the first user's synchronous send can complete.
The tool should work with both C and Fortran, but will itself be developed in C. A good knowledge of C programming is therefore required. Previous experience in graphics programming would be useful but not essential.