StarPU Handbook
 All Data Structures Files Functions Variables Typedefs Enumerations Enumerator Macros Groups Pages
SimGrid Support

StarPU can use Simgrid in order to simulate execution on an arbitrary platform. This was tested with simgrid from 3.11 to 3.16, and 3.18 to 3.19. Other versions may have compatibility issues. 3.17 notably does not build at all.

Preparing Your Application For Simulation

There are a few technical details which need to be handled for an application to be simulated through Simgrid.

If the application uses gettimeofday to make its performance measurements, the real time will be used, which will be bogus. To get the simulated time, it has to use starpu_timing_now() which returns the virtual timestamp in us.

For some technical reason, the application's .c file which contains main() has to be recompiled with starpu_simgrid_wrap.h, which in the simgrid case will # define main() into starpu_main(), and it is libstarpu which will provide the real main() and will call the application's main().

To be able to test with crazy data sizes, one may want to only allocate application data if STARPU_SIMGRID is not defined. Passing a NULL pointer to starpu_data_register functions is fine, data will never be read/written to by StarPU in Simgrid mode anyway.

To be able to run the application with e.g. CUDA simulation on a system which does not have CUDA installed, one can fill the cuda_funcs with (void*)1, to express that there is a CUDA implementation, even if one does not actually provide it. StarPU will not actually run it in Simgrid mode anyway by default (unless the STARPU_CODELET_SIMGRID_EXECUTE flag is set in the codelet)

static struct starpu_codelet cl11 =
{
.cpu_funcs = {chol_cpu_codelet_update_u11},
.cpu_funcs_name = {"chol_cpu_codelet_update_u11"},
#ifdef STARPU_USE_CUDA
.cuda_funcs = {chol_cublas_codelet_update_u11},
#elif defined(STARPU_SIMGRID)
.cuda_funcs = {(void*)1},
#endif
.nbuffers = 1,
.modes = {STARPU_RW},
.model = &chol_model_11
};

Calibration

The idea is to first compile StarPU normally, and run the application, so as to automatically benchmark the bus and the codelets.

$ ./configure && make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
   is not calibrated, forcing calibration for this run. Use the
   STARPU_CALIBRATE environment variable to control this.
$ ...
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST PASSED

Note that we force to use the scheduler dmda to generate performance models for the application. The application may need to be run several times before the model is calibrated.

Simulation

Then, recompile StarPU, passing --enable-simgrid to ./configure. Make sure to keep all other ./configure options the same, and notably options such as –enable-maxcudadev.

$ ./configure --enable-simgrid

To specify the location of SimGrid, you can either set the environment variables SIMGRID_CFLAGS and SIMGRID_LIBS, or use the configure options --with-simgrid-dir, --with-simgrid-include-dir and --with-simgrid-lib-dir, for example

$ ./configure --with-simgrid-dir=/opt/local/simgrid

You can then re-run the application.

$ make
$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
TEST FAILED !!!

It is normal that the test fails: since the computation are not actually done (that is the whole point of simgrid), the result is wrong, of course.

If the performance model is not calibrated enough, the following error message will be displayed

$ STARPU_SCHED=dmda ./examples/matvecmult/matvecmult
[starpu][_starpu_load_history_based_model] Warning: model matvecmult
    is not calibrated, forcing calibration for this run. Use the
    STARPU_CALIBRATE environment variable to control this.
[starpu][_starpu_simgrid_execute_job][assert failure] Codelet
    matvecmult does not have a perfmodel, or is not calibrated enough

The number of devices can be chosen as usual with STARPU_NCPU, STARPU_NCUDA, and STARPU_NOPENCL, and the amount of GPU memory with STARPU_LIMIT_CUDA_MEM, STARPU_LIMIT_CUDA_devid_MEM, STARPU_LIMIT_OPENCL_MEM, and STARPU_LIMIT_OPENCL_devid_MEM.

Simulation On Another Machine

The simgrid support even permits to perform simulations on another machine, your desktop, typically. To achieve this, one still needs to perform the Calibration step on the actual machine to be simulated, then copy them to your desktop machine (the $STARPU_HOME/.starpu directory). One can then perform the Simulation step on the desktop machine, by setting the environment variable STARPU_HOSTNAME to the name of the actual machine, to make StarPU use the performance models of the simulated machine even on the desktop machine.

If the desktop machine does not have CUDA or OpenCL, StarPU is still able to use simgrid to simulate execution with CUDA/OpenCL devices, but the application source code will probably disable the CUDA and OpenCL codelets in thatcd sc case. Since during simgrid execution, the functions of the codelet are actually not called by default, one can use dummy functions such as the following to still permit CUDA or OpenCL execution.

Simulation Examples

StarPU ships a few performance models for a couple of systems: attila, mirage, idgraf, and sirocco. See section Simulated benchmarks for the details.

Simulations On Fake Machines

It is possible to build fake machines which do not exist, by modifying the platform file in $STARPU_HOME/.starpu/sampling/bus/machine.platform.xml by hand: one can add more CPUs, add GPUs (but the performance model file has to be extended as well), change the available GPU memory size, PCI memory bandwidth, etc.

Tweaking Simulation

The simulation can be tweaked, to be able to tune it between a very accurate simulation and a very simple simulation (which is thus close to scheduling theory results), see the STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST environment variables.

MPI Applications

StarPU-MPI applications can also be run in simgrid mode. It needs to be compiled with smpicc, and run using the starpu_smpirun script, for instance:

$ STARPU_SCHED=dmda starpu_smpirun -platform cluster.xml -hostfile hostfile ./mpi/tests/pingpong

Where cluster.xml is a Simgrid-MPI platform description, and hostfile the list of MPI nodes to be used. StarPU currently only supports homogeneous MPI clusters: for each MPI node it will just replicate the architecture referred by STARPU_HOSTNAME.

Debugging Applications

By default, simgrid uses its own implementation of threads, which prevents gdb from being able to inspect stacks of all threads. To be able to fully debug an application running with simgrid, pass the –cfg=contexts/factory:thread option to the application, to make simgrid use system threads, which gdb will be able to manipulate as usual.

Memory Usage

Since kernels are not actually run and data transfers are not actually performed, the data memory does not actually need to be allocated. This allows for instance to simulate the execution of applications processing very big data on a small laptop.

The application can for instance pass 1 (or whatever bogus pointer) to starpu data registration functions, instead of allocating data. This will however require the application to take care of not trying to access the data, and will not work in MPI mode, which performs transfers.

Another way is to pass the STARPU_MALLOC_SIMULATION_FOLDED flag to the starpu_malloc_flags() function. This will make it allocate a memory area which one can read/write, but optimized so that this does not actually consume memory. Of course, the values read from such area will be bogus, but this allows the application to keep e.g. data load, store, initialization as it is, and also work in MPI mode.

Note however that notably Linux kernels refuse obvious memory overcommitting by default, so a single allocation can typically not be bigger than the amount of physical memory, see https://www.kernel.org/doc/Documentation/vm/overcommit-accounting This prevents for instance from allocating a single huge matrix. Allocating a huge matrix in several tiles is not a problem, however. sysctl vm.overcommit_memory=1 can also be used to allow such overcommit.

Note however that this folding is done by remapping the same file several times, and Linux kernels will also refuse to create too many memory areas. sysctl vm.max_map_count can be used to check and change the default (65535). By default, StarPU uses a 1MiB file, so it hopefully fits in the CPU cache. This however limits the amount of such folded memory to a bit below 64GiB. The STARPU_MALLOC_SIMULATION_FOLD environment variable can be used to increase the size of the file.