"Best practices for programming with openMP on NUMA systems" By ...

By [email protected]

Abstract: NUMA systems challenge developers of multithreaded software to allocate data so that as many memory accesses as possible are local. The fundamental concepts behind the penalty of remote versus local memory accesses are presented, together with experimental data collected on AMD-based systems. A set of examples then serves as a guideline for improving OpenMP applications with heavy memory access requirements (both latency-sensitive and bandwidth-sensitive applications). Runtime setup is another important component of exploiting NUMA systems properly with OpenMP applications, so it is covered as part of these best practices.

NUMA concepts

best practices with openMP on NUMA systems

Performance metrics
• Summary: it is all about “feeding the beast”.
• To “crunch” (i.e. process) data on the processor, you have to feed it with that data. The faster you feed it, the more data it crunches per second.
• FLOP/s is the rate at which data is crunched on the core, compute unit, processor, or node.
• GB/s is the rate at which that core, compute unit, processor, or node is fed with data (bytes).
• FLOP/s and GB/s are related performance metrics.
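One common way to make the link between the two metrics concrete is a roofline-style bound: a kernel that performs a given number of floating-point operations per byte moved cannot crunch faster than the feeding rate allows. The sketch below assumes that model; the helper name `attainable_gflops` and its parameters are hypothetical and do not come from the slides.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): roofline-style upper bound on
 * the crunching rate of a kernel.
 *   peak_gflops    : peak compute rate of the core/processor/node (GFLOP/s)
 *   bandwidth_gbs  : sustained feeding rate from memory (GB/s)
 *   flops_per_byte : floating-point operations per byte moved by the kernel
 * Returns the smaller of the compute peak and the rate the memory can feed. */
double attainable_gflops(double peak_gflops, double bandwidth_gbs,
                         double flops_per_byte)
{
    assert(peak_gflops >= 0.0 && bandwidth_gbs >= 0.0 && flops_per_byte >= 0.0);

    /* GFLOP/s that the memory system can sustain for this kernel. */
    double fed_gflops = bandwidth_gbs * flops_per_byte;

    return fed_gflops < peak_gflops ? fed_gflops : peak_gflops;
}
```

As an illustration with made-up numbers (not measurements): a loop such as `a[i] = b[i] + s * c[i]` on doubles performs 2 FLOPs per 24 bytes moved, so at a feeding rate of 20 GB/s it is limited to roughly 1.7 GFLOP/s, however high the compute peak is.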

Measuring the feeding rate (GB/s)
• Single threaded
for (i=0;i
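A minimal sketch of how a single-threaded feeding rate could be measured, assuming a plain array-copy kernel timed with the POSIX monotonic clock. The kernel, the helper name `measure_copy_gbs`, and its parameters are assumptions for illustration and are not necessarily what the original slide used.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds from the POSIX monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical helper: copy n doubles from b to a on one thread, `reps`
 * times, and return the best observed rate in GB/s (1 GB = 1e9 bytes).
 * Returns -1.0 if the arrays cannot be allocated. */
double measure_copy_gbs(size_t n, int reps)
{
    assert(n > 0 && reps > 0);

    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }

    /* Touch every page before timing, so page allocation is not measured. */
    for (size_t i = 0; i < n; i++) {
        a[i] = 0.0;
        b[i] = (double)i;
    }

    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        for (size_t i = 0; i < n; i++)
            a[i] = b[i];
        double dt = now_seconds() - t0;

        /* Each element is read once from b and written once to a. */
        double gbs = 2.0 * (double)n * sizeof(double) / dt * 1e-9;
        if (dt > 0.0 && gbs > best)
            best = gbs;
    }

    /* Read a result so the copy cannot be discarded as unused. */
    volatile double sink = a[n / 2];
    (void)sink;

    free(a);
    free(b);
    return best;
}
```

For a meaningful number, the arrays should be much larger than the last-level cache, for example `measure_copy_gbs(64u * 1024u * 1024u, 5)` for two 512 MB arrays. Note that the thread that initializes the arrays is also the one that copies them, which matters on a NUMA system where Linux by default places a page on the node of the thread that first touches it.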