
SC|05 Tutorial: High Performance Parallel Programming with Unified Parallel C (UPC)

Tarek El-Ghazawi, The George Washington U. ([email protected])
Phillip Merkey and Steve Seidel, Michigan Technological U. ({merk,steve}@mtu.edu)

UPC Tutorial

November 14, 2005


UPC Tutorial Web Site
This site contains the UPC code segments discussed in this tutorial: http://www.upc.mtu.edu/SC05-tutorial


UPC Home Page http://www.upc.gwu.edu


UPC textbook now available
http://www.upcworld.org
• UPC: Distributed Shared Memory Programming, by Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick
• Wiley, May 2005
• ISBN: 0-471-22048-5


Section 1: The UPC Language (El-Ghazawi)
• Introduction
• UPC and the PGAS Model
• Data Distribution
• Pointers
• Worksharing and Exploiting Locality
• Dynamic Memory Management
(10:15am - 10:30am break)
• Synchronization
• Memory Consistency


Section 2: UPC Systems (Merkey & Seidel)
• Summary of current UPC systems
  - Cray X-1
  - Hewlett-Packard
  - Berkeley
  - Intrepid
  - MTU
• UPC application development tools
  - totalview
  - upc_trace
  - performance toolkit interface
  - performance model


Section 3: UPC Libraries
• Collective Functions (Seidel)
  - Bucket sort example
  - UPC collectives
  - Synchronization modes
  - Collectives performance
  - Extensions
(Noon – 1:00pm lunch)
• UPC-IO (El-Ghazawi)
  - Concepts
  - Main Library Calls
  - Library Overview

Section 4: UPC Applications Development
• Two case studies of application design (Merkey)
  - histogramming
    • locks revisited
    • generalizing the histogram problem
    • programming the sparse case
    • implications of the memory model
  (2:30pm – 2:45pm break)
  - generic science code (advection):
    • shared multi-dimensional arrays
    • implications of the memory model
• UPC tips, tricks, and traps (Seidel)

Introduction
• UPC – Unified Parallel C
• Set of specs for a parallel C
  - v1.0 completed February of 2001
  - v1.1.1 in October of 2003
  - v1.2 in May of 2005
• Compiler implementations by vendors and others
• Consortium of government, academia, and HPC vendors including IDA CCS, GWU, UCB, MTU, UMN, ARSC, UMCP, U of Florida, ANL, LBNL, LLNL, DoD, DoE, HP, Cray, IBM, Sun, Intrepid, Etnus, …


Introduction (cont.)
• UPC compilers are now available for most HPC platforms and clusters
  - Some are open source
• A debugger is available and a performance analysis tool is in the works
• Benchmarks, programming examples, and compiler testing suite(s) are available
• Visit www.upcworld.org or upc.gwu.edu for more information


Parallel Programming Models
• What is a programming model?
  - An abstract virtual machine
  - A view of data and execution
  - The logical interface between architecture and applications
• Why Programming Models?
  - Decouple applications and architectures
    • Write applications that run effectively across architectures
    • Design new architectures that can effectively support legacy applications
• Programming Model Design Considerations
  - Expose modern architectural features to exploit machine power and improve performance
  - Maintain Ease of Use

Programming Models
• Common Parallel Programming models
  - Data Parallel
  - Message Passing
  - Shared Memory
  - Distributed Shared Memory
  - …
• Hybrid models
  - Shared Memory under Message Passing
  - …


Programming Models
[Figure: processes/threads and their address spaces under each model]
• Message Passing: MPI
• Shared Memory: OpenMP
• DSM/PGAS: UPC


The Partitioned Global Address Space (PGAS) Model
[Figure: threads Th0 … Thn-1 above memory partitions M0 … Mn-1 of a single address space, with a location x in M0. Legend: Thread/Process, Address Space, Memory Access]
• Aka the DSM model
• Concurrent threads with a partitioned shared space
  - Similar to shared memory
  - Memory partition Mi has affinity to thread Thi
• (+)ive:
  - Helps exploit locality
  - Simple statements, as in shared memory (SM)
• (-)ive:
  - Synchronization
• UPC, also CAF and Titanium

What is UPC?
• Unified Parallel C
• An explicit parallel extension of ISO C
• A partitioned shared memory parallel programming language


UPC Execution Model
• A number of threads working independently in a SPMD fashion
  - MYTHREAD specifies thread index (0..THREADS-1)
  - Number of threads specified at compile-time or runtime
• Synchronization when needed
  - Barriers
  - Locks
  - Memory consistency control
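The SPMD idea can be sketched in plain C, without a UPC compiler. This is a sequential emulation, not UPC: the parameters `mythread` and `threads` stand in for MYTHREAD and THREADS, and the helper names `spmd_body` and `spmd_sum` as well as the limit `MAX_THREADS` are hypothetical, chosen only for illustration. The end of the first loop plays the role of a barrier.

```c
#define MAX_THREADS 64   /* illustrative limit for this sketch only */

/* Work done by one emulated thread: sum the values i, i + threads,
 * i + 2*threads, ... below n, starting at i = mythread.
 * `mythread` and `threads` play the roles of MYTHREAD and THREADS. */
static long spmd_body(int mythread, int threads, long n)
{
    long partial = 0;
    for (long i = mythread; i < n; i += threads)
        partial += i;
    return partial;
}

/* Sequential stand-in for an SPMD run: every thread id executes the
 * same body. Finishing the first loop stands in for a barrier, so no
 * partial result is read before every thread has written its own.
 * Returns -1 if the thread count is outside 1..MAX_THREADS. */
static long spmd_sum(int threads, long n)
{
    long partial[MAX_THREADS] = {0};
    long total = 0;

    if (threads < 1 || threads > MAX_THREADS)
        return -1;

    for (int t = 0; t < threads; t++)   /* phase 1: independent work */
        partial[t] = spmd_body(t, threads, n);

    /* a real parallel run would synchronize with a barrier here */

    for (int t = 0; t < threads; t++)   /* phase 2: combine results */
        total += partial[t];
    return total;
}
```

Calling `spmd_sum(4, 100)` runs the same body for thread indices 0 through 3 and then combines the four partial sums. In a real UPC program the threads would run concurrently and the combining step would need one of the synchronization mechanisms listed above.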


UPC Memory Model
[Figure: the global address space. A shared space partitioned among Thread 0, Thread 1, …, Thread THREADS-1 sits above the private spaces Private 0, Private 1, …, Private THREADS-1]
• A pointer-to-shared can reference all locations in the shared space, but there is data-thread affinity
• A private pointer may reference addresses in its private space or its local portion of the shared space
• Static and dynamic memory allocations are supported for both shared and private memory
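The difference between the two kinds of pointer can be modeled in plain C. This is a minimal sketch, not UPC and not how any UPC runtime represents pointers: the type `shared_ref` and the helpers `shared_ref_at` and `can_use_private_pointer` are hypothetical names. It assumes a shared array laid out round-robin, one element per thread, which is UPC's default layout.

```c
/* Plain-C model of a pointer-to-shared: which thread's partition holds
 * the element, and where it sits inside that partition. */
typedef struct {
    int thread;   /* thread the element has affinity to */
    int offset;   /* position within that thread's partition */
} shared_ref;

/* Locate element i of a shared array laid out round-robin over
 * `threads` threads (one element per thread per round). */
static shared_ref shared_ref_at(int i, int threads)
{
    shared_ref r = { i % threads, i / threads };
    return r;
}

/* A private pointer may only refer to the local portion of the shared
 * space, so it is usable only by the thread that owns the element. */
static int can_use_private_pointer(shared_ref r, int mythread)
{
    return r.thread == mythread;
}
```

With two threads, elements 0 and 2 map to thread 0 and elements 1 and 3 map to thread 1. Any thread can name any element through a `shared_ref`, but only the owning thread passes the `can_use_private_pointer` check.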


User’s General View

A collection of threads operating in a single global address space, which is logically partitioned among threads. Each thread has affinity with a portion of the globally shared address space. Each thread also has a private space.


A First Example: Vector addition
[Figure: elements of v1 and v2 laid out across Thread 0 and Thread 1 by iteration number: v1[0], v1[2] with Thread 0; v1[1], v1[3] with Thread 1]

//vect_add.c
#include
#define N 100*THREADS
shared int v1[N], v2[N], v1plusv2[N];
void main() { int i; for(i=0; i