Saturday 16 April 2011

Original Requirements

Requirements


The requirements form the basis for organising the development of the project. According to David Henty, the main problems beginners face are:

  1. Incorrect communication patterns resulting in a deadlock.
  2. Sending the wrong data.
  3. Not waiting for asynchronous (non-blocking) requests to complete.



Example of a state view.

To address the first problem, a synchronised view of the communication is needed. It should provide the state of each process (for example: waiting for process 1: Recv) and the pattern of messages already exchanged. This is close to what Vampir provides, but displayed while the program is running rather than afterwards.



Example of a 2D array view.

Detecting when the wrong data is sent requires the software to know which arrays matter to the user. A mechanism to register data is therefore needed, both to simplify development and to improve readability. The user would then be able to highlight operations on a given array and display graphical information whenever part of it is used.



Deliverables


Two components will be delivered. The first is a library (mpi_wrapper) that acts as a profiler: it uses the MPI profiling interface (PMPI) to gather information about the executing program. The second is a program that displays the information gathered by the profiler. The profiler and the display therefore need to communicate, which will introduce some slowdown in the original MPI program.


Functional requirements


  1. Communication profiling
    1. point to point: communication from one process to another.
    2. global: use of collective communication routines.
    3. using different communicators: registering and selecting the communicators to profile.
    4. communication time: display the time when the communication occurred and the duration of the operation.
    5. step by step view: provide a blocking, synchronised view of the communication, showing step by step what is going on for each process.
    6. display communication with graphical “animation”: display ongoing communication with a simple animation, based on the step by step view.
    7. generate a log file: either using standard log formats (Vampir’s or Scalasca’s) or a dedicated one.
  2. Data view
    1. register an array: by registering an array with the profiler, see information about it whenever it is used in communications.
    2. display graphical view of registered data: when registered data is used in a communication, display which part of it is transferred.
    3. recognise derived data types: support the graphical view for simple derived types first (vectors, subarrays) and more complex ones later.

Non-functional requirements


The project aims to make the tool as transparent as possible for the user (using the profiler should only require adding a header and a few compiler options); it will therefore avoid adding extra function calls. However, some functionalities need explicit calls, and therefore new functions: for example, the data view needs an explicit list of the arrays to look at. The aim is to keep extra calls to a minimum, so that the user code works easily both with and without the library.


The project is driven by a "teaching tool" goal. Development will therefore focus on a solid backbone library usable for potential future development, rather than on a Swiss Army knife of partially implemented functionalities. If the project leads to a well-developed tool that provides interesting features, it will be published on the Internet and the code will be released under an open source licence.


The project does not aim to produce a high-performance tool. Analysing and displaying real-time information will obviously introduce a delay. Nevertheless, the tool will try to be efficient in its memory usage: it is important that it is reliable and does not need an enormous amount of RAM to work.

Thursday 14 April 2011

Meeting 5 [21/03] & Existing Approaches

The 5th meeting focussed only on the report and presentation, which were due at the end of March. This note will therefore mainly focus on the report itself.

Existing approaches


The report discusses the goals of the project and, especially, how they can be fulfilled. The first step was to find existing software that is known to be used and that already addresses some of these goals.

Vampir


Vampir is a tool used to display information about communication patterns. It stores the information in a log file and, using a separate program after the MPI code has finished executing, provides several useful views showing network latencies and possible communication problems (such as late sender or late receiver patterns).

On Ness, Vampir is activated by loading a module that uses another compiler. Much as mpicc does for MPI, this compiler presumably adds further include and library linking paths to the standard compiler.

Even though the file format is available, the software itself is not free to use.

Vampir Official website

Scalasca


Scalasca is another tool that provides analysis of an MPI code. It is problem-based, meaning that it tries to spot possible slowdowns in the program and highlight them. To do so, it analyses the communication patterns as well as the data patterns actually used. It supports hybrid programs (for example, MPI combined with OpenMP).

Scalasca is free to use, but the software itself is copyrighted. By design, it is quite a complicated tool that gives a lot of detail about a running code.

Scalasca Official Website

XMPI


XMPI is a legacy tool that was used with the LAM/MPI implementation (now part of Open MPI). It provides statistical information about a running MPI-1 program, and also a real-time view (snapshot) of the processes (waiting state, messages currently in the queue, etc.). However, it is no longer supported (the last update was in March 2008) and only worked with the LAM/MPI implementation.

XMPI Official website

Motivation


Scalasca and Vampir are the two tools most commonly used on HPC systems to analyse MPI codes. However, both provide a post-execution analysis of the program: they are used to tune and improve the performance of a code that already works. XMPI is the only one of these tools that can show the current state of a running program, through its snapshot view, but it is no longer supported.

The goal of this project is to develop a tool for beginners, helping them understand why a given code works or not. The aim is therefore not a deep analysis of the code, and the performance of the code is not an issue. The project should provide a simple library and GUI for beginners in MPI development; it will help illustrate possible mistakes and provide a simple way to display information about a running code in real time.


To summarise, this project aims to generate a global view of the program as it executes, to help understand how it works (or does not work). The result lies between a parallel debugger (as a real-time view of the program's actions is displayed) and a profiling tool (with information about the ongoing communications).