Existing approaches
This report discusses the goals of the project and, above all, how they can be fulfilled. The first step was to look for existing software that is known to be in use and that already addresses some of these goals.
Vampir
Vampir is a tool used to display information about communication patterns. It writes a log file while the MPI code runs; once the execution has finished, an external program reads this file and offers several useful views showing the latencies on the network and possible communication problems (such as late sender or late receiver patterns).
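As a rough illustration of the late sender and late receiver patterns mentioned above, the following Python sketch classifies point-to-point messages from the times at which each side entered its call. The `Message` record, the `classify` function and the sample timestamps are hypothetical and are not taken from Vampir or its log format. The late receiver case only costs time when the send is synchronous, which is assumed here.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One point-to-point message taken from a (hypothetical) trace."""
    sender: int
    receiver: int
    send_start: float  # time at which the sender entered its send call
    recv_start: float  # time at which the receiver entered its receive call


def classify(msg, tolerance=0.0):
    """Return (pattern, waiting_time) for a single message.

    Assumes synchronous sends, so that a sender arriving first has to
    wait until the matching receive is posted.
    """
    gap = msg.send_start - msg.recv_start
    if gap > tolerance:
        # The receiver was already waiting when the send started.
        return "late sender", gap
    if -gap > tolerance:
        # The sender was ready first and waits for the receive.
        return "late receiver", -gap
    return "on time", 0.0


# Made-up timestamps, for illustration only.
trace = [
    Message(sender=0, receiver=1, send_start=5.0, recv_start=2.0),
    Message(sender=1, receiver=2, send_start=1.0, recv_start=4.0),
    Message(sender=2, receiver=0, send_start=3.0, recv_start=3.0),
]

for msg in trace:
    pattern, waited = classify(msg)
    print(f"{msg.sender} -> {msg.receiver}: {pattern} (waited {waited})")
```

In this sketch the waiting time is simply the difference between the two entry times. A real analysis tool would also have to match sends to receives and correct for clock differences between nodes, which is left out here.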
On Ness, Vampir is activated by loading a module that substitutes another compiler. As mpicc does for MPI, this compiler presumably wraps the usual one and adds further include and library paths.
Even though the file format is available, the software itself is not free to use.
Scalasca
Scalasca is another tool that provides analysis of an MPI code. It is problem based, meaning that it tries to spot possible slowdowns in the program and highlight them. To do so, it analyses the communication patterns as well as the actual data pattern used. It supports hybrid development (MPI and OpenMP, for example).
Scalasca is free to use, although the software itself is copyrighted. By design it is quite a complex tool that gives a lot of detail about the code it analyses.
XMPI
XMPI is a legacy tool that was used with the LAM/MPI implementation (now part of Open MPI). It provides statistical information about a running MPI-1 program, as well as a real-time view (snapshot) of the processes (waiting state, current messages in queue, etc.). However, it is no longer supported (the last update was in March 2008) and it only ever worked with the LAM/MPI implementation.
Motivation
Scalasca and Vampir are the two main tools used on HPC systems to analyse an MPI code. Both of them, however, provide an after-execution analysis of the program: they are used to tune and improve the performance of a working code. XMPI is the only tool that might help to show the state of a program while it is running, through its snapshot view, but it is no longer supported.
The goal of this project is to develop a tool for beginners that helps them understand why a given code works or does not work. The aim is therefore not a deep analysis of the code, and the performance of the code is not a concern. The project should provide a simple library and GUI for beginners in MPI development; it will help to illustrate possible mistakes and provide a simple way to display information about a running code in real time.
To summarise, this project aims to generate a global view of the program as it executes, to help understand how it works, or why it does not. The result sits between a parallel debugger (as a real-time view of the program's actions is displayed) and a profiling tool (with the information about the ongoing communications).