GPU Computing with CUDA

The “New” Moore’s Law
• Computers no longer get faster, just wider
• You must re-think your algorithms to be parallel!
• Data-parallel computing is the most scalable solution


Enter the GPU
• Massive economies of scale
• Massively parallel

Graphical processors
• The graphics processing unit (GPU) on commodity video cards has evolved into an extremely flexible and powerful processor
  – Programmability
  – Precision
  – Power

• GPGPU: an emerging field seeking to harness GPUs for general-purpose computation


Parallel Computing on a GPU
• 8-series GPUs deliver 25 to 200+ GFLOPS on compiled parallel C applications
• Available in laptops, desktops, and clusters
• GPU parallelism is doubling every year
• Programming model scales transparently
• Multithreaded SPMD model uses application data parallelism and thread parallelism

[Images: GeForce 8800, Tesla D870, Tesla S870]
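The SPMD model above can be sketched in a few lines of CUDA: every thread runs the same kernel and uses its block and thread indices to select its own data element. A minimal sketch (the kernel name `scale` and the launch parameters are illustrative, not from the slides):

```cuda
// SPMD: one program, many threads. Each thread derives a unique
// global index from its block and thread coordinates and
// processes exactly one array element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the last block may be partially full
        data[i] *= factor;
}

// Host-side launch: 256 threads per block, enough blocks to cover n.
// scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
```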

Computational Power
• GPUs are fast…
  – 3.0 GHz dual-core Pentium 4: 24.6 GFLOPS
  – NVIDIA GeForceFX 7800: 165 GFLOPS
  – 1066 MHz FSB Pentium Extreme Edition: 8.5 GB/s
  – ATI Radeon X850 XT Platinum Edition: 37.8 GB/s
• GPUs are getting faster, faster
  – CPUs: 1.4× annual growth
  – GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth



Flexible and Precise
• Modern GPUs are deeply programmable
  – Programmable pixel, vertex, and video engines
  – Solidifying high-level language support
• Modern GPUs support high precision
  – 32-bit floating point throughout the pipeline
  – High enough for many (not all) applications


GPU for graphics
• GPUs designed for & driven by video games
  – Programming model unusual
  – Programming idioms tied to computer graphics
  – Programming environment tightly constrained
• Underlying architectures are:
  – Inherently parallel
  – Rapidly evolving (even in basic feature set!)
  – Largely secret


General purpose GPUs
• The power and flexibility of GPUs make them an attractive platform for general-purpose computation
• Example applications range from in-game physics simulation to conventional computational science
• Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor
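The "computational coprocessor" goal maps onto an allocate/copy/launch/copy-back round trip between host and device. A hedged sketch using the CUDA runtime API (error checking omitted; `square_on_gpu` is an illustrative helper, not from the slides):

```cuda
#include <cuda_runtime.h>

// Each thread squares one element in place.
__global__ void square(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

// The GPU acts as a coprocessor: the host allocates device memory,
// ships the input over, launches the kernel, and pulls the result back.
void square_on_gpu(float *host, int n) {
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    square<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```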


Previous GPGPU Constraints
• Dealing with graphics API
  – Working with the corner cases of the graphics API
• Addressing modes
  – Limited texture size/dimension
• Instruction sets
  – Lack of integer & bit ops
• Communication limited
  – Between pixels
• Shader capabilities
  – Limited outputs

[Diagram: fragment-program model — input registers, textures, and constants feed a fragment program with temp registers; output registers write to the framebuffer (FB); state is per thread, per shader, per context]


Enter CUDA
• Scalable parallel programming model
• Minimal extensions to familiar C/C++ environment
• Heterogeneous serial-parallel computing
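"Minimal extensions" is concrete: a `__global__` qualifier marks code that runs on the GPU, and a `<<<blocks, threads>>>` launch configuration replaces the serial loop. A sketch using the classic SAXPY example (y = a·x + y):

```cuda
// __global__ marks a function compiled for the GPU but callable
// from ordinary host C/C++ code.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];   // one element per thread
}

// The launch replaces the serial for-loop (d_x, d_y are device pointers):
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```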

Sound Bite

GPUs + CUDA = The Democratization of Parallel Computing

Massively parallel computing has become a commodity technology




Example CUDA applications:
• Interactive visualization of volumetric white matter connectivity
• Ionic placement for molecular dynamics simulation on GPU
• Financial simulation of LIBOR model with swaptions
• GLAME@lab: an M-script API for GPU linear algebra
• Fluid mechanics in Matlab using .mex file CUDA function
• Astrophysics N-body simulation
• Ultrasound medical imaging for cancer diagnostics
• Highly optimized object-oriented molecular dynamics
• Cmatch exact string matching