
Overview

Numerical applications demand ever more computing and storage capacity. Computers are meeting this demand, but for computing power they do so mainly through parallelism rather than higher processor frequencies. Hence, applications are becoming more and more parallel. However, they have to deal with a huge diversity of architectures, such as multicore machines, possibly enhanced with GPUs, grouped into clusters that take part in grids or clouds. Such resources are handled by resource management services (RMS), for example (local/grid) operating systems, batch systems, or grid middleware systems; they all provide some kind of job management service to deal with application activity, such as thread/process scheduling.

On the other hand, programming models have evolved to meet the requirements of applications, such as multiphysics applications, by providing more and more abstract concepts that ease application development. However, they require a dedicated framework to translate such concepts into the concepts handled by RMS. Such a framework can be as simple as a runtime, or it can be quite complex, as for example the SALOME framework. The level of complexity is related to the distance between the concepts of the programming model and those of the RMS. As the lifetime of an application is far longer than that of the hardware, modern programming models tend to be resource independent, and thus they require a complex framework to bridge the gap to RMS.

The problem addressed by this project is to reconcile the two layers, programming model frameworks (PMF) and RMS, with respect to a number of tasks that they both try to handle independently. A PMF needs knowledge of the resources to select the most efficient transformation of abstract programming concepts into executable ones. However, the actual management of resources is done by the RMS in an opaque way, based on a simple abstraction of applications.
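To make the gap between the two layers concrete, here is a minimal sketch in Python of what a PMF could do if an RMS exposed even a coarse description of the resources. It is an illustration only: `ResourceView`, `Mapping`, `candidate_mappings` and `select_mapping` are hypothetical names invented for this example and do not correspond to the interfaces of SALOME, DIET, XtreemOS or any other software named on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResourceView:
    """Hypothetical summary that an RMS could expose to a PMF."""
    nodes: int
    cores_per_node: int


@dataclass(frozen=True)
class Mapping:
    """One concrete way to run an abstract component (hypothetical)."""
    name: str
    processes: int
    threads_per_process: int


def candidate_mappings(view: ResourceView) -> List[Mapping]:
    """Enumerate two transformations of the same abstract component."""
    total_cores = view.nodes * view.cores_per_node
    return [
        Mapping("one-process-per-core", total_cores, 1),
        Mapping("one-process-per-node", view.nodes, view.cores_per_node),
    ]


def select_mapping(view: ResourceView, max_processes: int) -> Mapping:
    """Keep the mappings within the process limit, then pick the one using most cores.

    `max_processes` stands in for a constraint the PMF would know about
    (for example a licence or memory limit); it is an assumption of this sketch.
    """
    feasible = [m for m in candidate_mappings(view) if m.processes <= max_processes]
    if not feasible:
        raise ValueError("no mapping satisfies the process limit")
    # On ties, max() keeps the first candidate in enumeration order.
    return max(feasible, key=lambda m: m.processes * m.threads_per_process)


if __name__ == "__main__":
    view = ResourceView(nodes=4, cores_per_node=8)
    print(select_mapping(view, max_processes=16).name)
```

Without something like `ResourceView`, the PMF has to fix the structure of the application before submitting it, which is the situation described above.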

The two main goals of this project are to set up such a cooperation in a way that is as general as possible with respect to programming models and RMS, and to develop algorithms for efficient resource selection. In particular, the project targets the SALOME platform and the GRID-TLSE expert site as examples of programming models, and Marcel/PadicoTM, DIET and XtreemOS as examples of, respectively, a multithread scheduler/communication manager, a grid middleware and a distributed operating system.

Objectives

The long-term goal of the project is to contribute to the simplification of high performance application development by improving the performance of resource independent programming models. Such programming models are meant to support a huge variety of hardware configurations, but at the price of a potentially complex transformation of their programming artifacts to the actual resources. However, resource management services hide the nature and the structure of the resources: they expect to receive an application whose structure is fixed, and thus they are not able to efficiently support the aforementioned programming models. The immediate goal of this project is therefore to establish generic cooperation mechanisms between resource management services and programming model frameworks, so as to reach the long-term goal of having simple programming models that achieve high performance.

As RMS and PMF are handled by different teams with distinct goals (application completion time versus system throughput, for example), it is very important to identify generic cooperation mechanisms so as not to tie a particular PMF to a particular RMS. This requires understanding how to handle resource management between two levels that cannot be merged, while letting each level optimize its own objectives. This is particularly true now that RMS aim not only to optimize application throughput but also to become green by managing power consumption.

Research Agenda

This project is subdivided into a project management task (Task 1) and three scientific tasks (Tasks 2, 3 and 4). As the success of this project depends on the cooperation of independent layers, we have decided that all partners will be involved in all tasks. Let us detail the tasks before explaining the global process. Task 1 contains administrative, dissemination and promotion activities. The involvement of the partners in other projects eases the promotion of the results of this project. XtreemOS is a European Integrated Project with high visibility. DIET is used in some production projects such as Decrypthon (AFM-IBM-CNRS), and a startup company is being created. The GRID-TLSE expert site is already used by partners of ANR SOLSTICE to make their linear problems (matrices) available; at the end of February 2009, some computational services and scenarios will be proposed. Lastly, EDF makes heavy use of numerical simulations and parallel machines.

The general idea of the scientific program is that each task represents a different logical stage. Task 2 is an analysis-oriented task with two goals. First, it aims at providing bibliographic studies of programming model frameworks and resource management services. Second, through a careful analysis of the actual architectural models, Task 2 will identify which elements require cooperation between the two levels. Task 3 is an algorithm-oriented task which studies how to set up a collaboration between the two levels. We have decided to group into the same task the work on architectural modifications of existing software to provide the targeted services and the design of algorithms able to benefit from such services. The main motivation is that both kinds of work are closely related: the algorithms can only benefit from the services provided by RMS, and RMS may have to provide new services for the algorithms.

Task 4 is a validation-oriented task which will gather development and experimental activities. The development activity mainly covers modifications of available software, either programming models or RMS, as well as the implementation of the algorithmic strategies designed in Task 3. The experimental activity will take care of running experiments and gathering their results.

There is a clear dependency between Tasks 2, 3 and 4. Task 2 identifies issues, Task 3 proposes solutions, and Task 4 validates these solutions by modifying software if needed and running experiments.

