
Qunying Huang, Phil Yang, Hannes Wu, Kai Liu, Jing Li
Center for Intelligent Spatial Computing, George Mason University & NASA
http://cisc.gmu.edu/

Agenda
• Cloud computing for Earth Science
  - Presentation (45 mins)
  - Question & Discussion (15 mins)
• Demos
  - Deploying GEOSS Clearinghouse onto Amazon cloud computing (15 mins, demo only because of cost)
  - Deploying GEOSS Clearinghouse onto GMU CISC cloud computing (15 mins + 30 mins user interaction)

Qunying Huang, Chaowei Yang, Huayi Wu, Kai Liu, Jing Li
Joint Center of Intelligent Computing, George Mason University
ESIP Summer Meeting, July 20th, 2010
Contact: http://cisc.gmu.edu [email protected]

Outline
• Introduction
• Cloud computing definition
• Cloud computing examples
• Cloud computing platform test
• Spatial cloud computing
• Summary (benefits & future research directions)

Introduction
• Many scientific problems are data-intensive and computationally intensive
• High-performance computing support
  - Distributed computing
  - Grid computing
  - Cloud computing

Figure: The growth of cloud computing (from http://www.zdnet.com/blog/hichecliffe)

Cloud Computing
• Definition
  - "A computing Cloud is a set of network enabled services, providing scalable, QoS guaranteed, inexpensive computing platforms on demand, which could be accessed in a simple and pervasive way." (Liu and Orban, 2008)

Figure: Cloud computing overview (mobile devices, client computers, database/storage, application servers/clusters)

Cloud computing service models
• Software as a Service (SaaS)
  - Almost any IT service
  - Users: end users
• Platform as a Service (PaaS)
  - Platform for developing and delivering applications, abstracted from the infrastructure
  - Users: developers
• Infrastructure as a Service (IaaS)
  - On-demand sharing of physical infrastructure
  - Users: system administrators
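One way to read the three service models is by which layers of the stack the provider manages and which are left to the user. The sketch below encodes that split; the five-layer stack and the exact cut points are a common simplification assumed for illustration, not a definition taken from the slide.

```python
# Simplified software stack, from the bottom up (an assumption for illustration).
LAYERS = ["hardware", "virtualization", "operating system", "runtime", "application"]

# Number of bottom layers the provider manages under each model
# (assumed cut points: IaaS stops at virtualization, PaaS at the runtime).
PROVIDER_MANAGED = {"IaaS": 2, "PaaS": 4, "SaaS": 5}


def user_managed(model: str) -> list[str]:
    """Return the layers left to the cloud user under the given service model."""
    return LAYERS[PROVIDER_MANAGED[model]:]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "->", user_managed(model) or ["(nothing: just use the service)"])
```

This lines up with the user roles on the slide: system administrators manage everything from the operating system upward on IaaS, developers supply only the application on PaaS, and end users simply consume SaaS.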

Cloud computing
• Defining characteristics
  - On-demand self-service
  - Multi-tenancy
  - Measured services
  - Device- and location-independent resource pooling
  - Rapid elasticity
• Enabling technologies
  - Virtualization
  - Web 2.0
  - Web services & SOA
  - World-wide distributed storage & file systems
  - Parallel & distributed programming models
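Two of the defining characteristics, rapid elasticity and measured services, can be illustrated in a few lines: a proportional rule that grows or shrinks a pool of instances toward a target utilization, and pay-per-use metering. The rule, its target, its bounds, and any price you pass in are hypothetical; real providers' autoscaling policies differ.

```python
import math


def desired_instances(current: int, avg_cpu: float, target_cpu: float = 0.6,
                      min_n: int = 1, max_n: int = 20) -> int:
    """Rapid elasticity (hypothetical rule): resize the pool so that the
    average CPU utilization (0.0 to 1.0) moves toward target_cpu."""
    if current < 1:
        return min_n
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp to the allowed pool size.
    return max(min_n, min(max_n, wanted))


def metered_cost(instance_hours: float, price_per_hour: float) -> float:
    """Measured service: the bill is proportional to instance-hours consumed."""
    return instance_hours * price_per_hour
```

With the defaults, a pool of 10 instances running at 100% average CPU would be grown to 17, and an idle pool shrinks back to the minimum of one instance, so the user pays only for what the load actually requires.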

Virtualization
• Foundation of the cloud
  - Isolated runtime environment
  - Disaster recovery
  - Hides the heterogeneity of the infrastructure
  - Allows partitioning and isolation of physical resources
• Types
  - Full virtualization
  - Para-virtualization
  - Hardware virtualization / hardware-assisted virtualization

Virtualization (continued)
• Xen: para-virtualization; used by Amazon EC2, GoGrid, 21vianet CloudEx, and RackSpace Mosso
• Hardware virtualization: WAH; Qemu/VirtualBox; KVM (Kernel-based Virtual Machine)
• VMware: Workstation product; VMware ESX Server (full virtualization); used by AT&T Synaptic and Verizon CaaS
• Microsoft Azure
• Joyent Accelerator

Figure: Virtual machines (VMs) running on a hypervisor on top of the hardware

Virtual Infrastructure Middleware (VIM)
• Provides a uniform view of the resource pool
• Places and replaces VMs dynamically on a pool of physical infrastructure
• Scheduling & monitoring
• Networking
• Life-cycle management and monitoring of VMs

Figure: Virtual machines run on hypervisors over the physical infrastructure, managed by a VIM (OpenNebula, Eucalyptus, Nimbus, Hadoop)
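Placing VMs dynamically on a pool of physical hosts is essentially a bin-packing problem. The sketch below uses the classic first-fit decreasing heuristic. It is a generic illustration, not the scheduler actually used by OpenNebula, Eucalyptus, or Nimbus, and the host names and VM sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu_free: int   # free virtual CPUs
    mem_free: int   # free memory in GB
    vms: list[str] = field(default_factory=list)


def place_vms(requests: dict[str, tuple[int, int]], hosts: list[Host]) -> dict[str, str]:
    """First-fit decreasing: place the largest VMs first, each on the first
    host with enough free CPU and memory. Returns {vm_name: host_name}."""
    placement = {}
    # Sort by (cpu, mem) demand, largest first; ties keep their original order.
    for vm, (cpu, mem) in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        for host in hosts:
            if host.cpu_free >= cpu and host.mem_free >= mem:
                host.cpu_free -= cpu
                host.mem_free -= mem
                host.vms.append(vm)
                placement[vm] = host.name
                break
        else:
            # No host had room: a real VIM might queue the request or add capacity.
            raise RuntimeError(f"no host can fit {vm} ({cpu} CPU, {mem} GB)")
    return placement
```

Placing the largest requests first reduces fragmentation: small VMs are packed into the gaps the large ones leave behind.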

Amazon Cloud Services
• Elastic Compute Cloud: EC2 (IaaS)
• Simple Storage Service: S3 (IaaS)
• Elastic Block Store: EBS (IaaS)
• SimpleDB: SDB (PaaS)
• Simple Queue Service: SQS (PaaS)
• CloudFront: S3-based content delivery network (PaaS)
• Consistent AWS web services API

Amazon EC2
• "A web service that provides resizable compute capacity in the cloud"
• EC2 saves a bootable VM root image as an "Amazon Machine Image" (AMI)

Figure: EC2 instances with Elastic Block Store (EBS) volumes run under Xen virtualization on physical servers; Simple Storage Service (S3) hosts the virtual machine images (AMIs)

How to Deploy Applications on Amazon EC2
• Prepare an AMI
  - From scratch, or
  - Based on a public AMI, then customized
• Launch the AMI as an Amazon EC2 instance
• Access the instance through SSH
• Configure/run applications
• Register the configured instance as a new AMI
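The workflow above maps onto a handful of EC2 API calls. The sketch assumes the AWS SDK for Python (boto3). It takes the EC2 client as a parameter, so it can be exercised without AWS credentials. The AMI IDs, key name, and `configure` callback are hypothetical placeholders.

```python
def deploy_and_register(ec2, base_ami: str, instance_type: str, key_name: str,
                        configure, new_ami_name: str) -> str:
    """Launch base_ami, run the caller's configuration step, register a new AMI.

    `ec2` is expected to behave like a boto3 EC2 client (boto3.client("ec2")).
    `configure` is a hypothetical callback that receives the instance ID and
    does the SSH access and application configuration.
    """
    # Launch the AMI as a single EC2 instance.
    resp = ec2.run_instances(ImageId=base_ami, InstanceType=instance_type,
                             KeyName=key_name, MinCount=1, MaxCount=1)
    instance_id = resp["Instances"][0]["InstanceId"]

    # Block until the instance is running before trying to configure it.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Access the instance through SSH and configure/run applications.
    configure(instance_id)

    # Register the configured instance as a new AMI and return its ID.
    image = ec2.create_image(InstanceId=instance_id, Name=new_ami_name)
    return image["ImageId"]
```

A real call would look like `deploy_and_register(boto3.client("ec2"), "ami-xxxxxxxx", "c1.medium", "my-key", my_configure, "geoss-clearinghouse")`, where every argument after the client is a placeholder.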

The GEOSS Clearinghouse
• Metadata catalogue search facility
• Earth Observation (EO) data, services, and related resources can be discovered and accessed

Deployment of GEOSS Clearinghouse on Amazon EC2
• Launch a CentOS AMI
• Authorize network access
• SSH into the Amazon EC2 instance
• Transfer the GEOSS Clearinghouse code to the virtual server
• Install PostgreSQL/PostGIS
• Restore the GEOSS Clearinghouse database
• Install Tomcat, Jetty, or another servlet container
• Configure the servlet container
• Start the servlet container
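In EC2, the "Authorize network access" step amounts to opening ports in the instance's security group: SSH for the administrator and the servlet container's HTTP port for Clearinghouse users. The sketch assumes a boto3 EC2 client and Tomcat/Jetty on their default port 8080. The security group ID and administrator address range are hypothetical.

```python
def authorize_web_and_ssh(ec2, group_id: str, admin_cidr: str,
                          http_port: int = 8080) -> list[dict]:
    """Open SSH to the administrator's network and HTTP to everyone.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    permissions = [
        # SSH (port 22), restricted to the administrator's address range.
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": admin_cidr}]},
        # Servlet container HTTP port, open to all Clearinghouse users.
        {"IpProtocol": "tcp", "FromPort": http_port, "ToPort": http_port,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=permissions)
    return permissions
```

Restricting SSH to a known address range while leaving only the web port public keeps the attack surface of the catalogue server small.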

Amazon EC2 Standard Linux Instance Types

Amazon EC2 High-Memory Linux Instance Types

Type | CPU | Memory | Storage | Platform | I/O | AWS Name | Cost
High-Memory Extra Large | 6.5 ECU (2 virtual cores with 3.25 ECU each) | 17.1 GB | 420 GB | 64-bit | High | m2.xlarge | $0.50 per hour
High-Memory Double Extra Large | 13 ECU (4 virtual cores with 3.25 ECU each) | 34.2 GB | 850 GB | 64-bit | High | m2.2xlarge | $1.20 per hour
High-Memory Quadruple Extra Large | 26 ECU (8 virtual cores with 3.25 ECU each) | 68.4 GB | 1690 GB | 64-bit | High | m2.4xlarge | $2.40 per hour

(ECU = EC2 Compute Unit)

Amazon EC2 High-CPU Linux Instance Types

Type | CPU | Memory | Storage | Platform | I/O | AWS Name | Cost
High-CPU Medium | 5 ECU (2 virtual cores with 2.5 ECU each) | 1.7 GB | 370 GB | 32-bit | Medium | c1.medium | $0.17 per hour
High-CPU Extra Large | 20 ECU (8 virtual cores with 2.5 ECU each) | 7.5 GB | 1810 GB | 64-bit | High | c1.xlarge | $0.68 per hour
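The instance tables lend themselves to a quick cost comparison. The snippet normalizes each hourly price by the instance's total EC2 Compute Units, using only the figures listed in the tables.

```python
# (AWS name, total EC2 Compute Units, USD per hour), from the instance tables.
INSTANCES = [
    ("m2.xlarge", 6.5, 0.50),
    ("m2.2xlarge", 13.0, 1.20),
    ("m2.4xlarge", 26.0, 2.40),
    ("c1.medium", 5.0, 0.17),
    ("c1.xlarge", 20.0, 0.68),
]


def cost_per_ecu_hour(ecu: float, price: float) -> float:
    """Price of one EC2 Compute Unit for one hour."""
    return price / ecu


# List instance types from cheapest to most expensive per unit of compute.
for name, ecu, price in sorted(INSTANCES, key=lambda r: cost_per_ecu_hour(r[1], r[2])):
    print(f"{name:11s} ${cost_per_ecu_hour(ecu, price):.3f} per ECU-hour")
```

At these prices, both High-CPU types come to about $0.034 per ECU-hour, against roughly $0.077 to $0.092 for the High-Memory types. That gap only matters, of course, if the workload can actually use all the compute units it pays for.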

Figure: Amazon EC2 instance performance test (GetCapabilities): average response time vs. concurrent request number (1 to 160) for the Large, Extra Large, High-Memory Extra Large, and High-CPU Extra Large instances

• Only one core of the VM is utilized, so CPU speed is the primary factor
• The High-CPU Medium instance should be used
  - Only $0.17 per hour
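The recommendation follows from simple arithmetic. If the service keeps only one core busy, the extra cores on larger instances are paid for but sit idle, so the relevant figure is the price per ECU on a single core. The sketch uses the per-core ECU ratings and prices from the instance tables. Treating the workload as strictly single-threaded is the simplifying assumption from the finding above.

```python
# (AWS name, virtual cores, ECU per core, USD per hour), from the instance tables.
INSTANCES = [
    ("m2.xlarge", 2, 3.25, 0.50),
    ("c1.medium", 2, 2.5, 0.17),
    ("c1.xlarge", 8, 2.5, 0.68),
]


def single_threaded_view(cores: int, ecu_per_core: float, price: float) -> tuple[float, float]:
    """Return (USD per hour per ECU actually used, fraction of paid cores idle)
    for a workload that keeps only one core busy."""
    used_ecu = ecu_per_core            # only one core does useful work
    idle_fraction = (cores - 1) / cores
    return price / used_ecu, idle_fraction


for name, cores, ecu_per_core, price in INSTANCES:
    cost, idle = single_threaded_view(cores, ecu_per_core, price)
    print(f"{name:10s} ${cost:.3f} per used ECU-hour, {idle:.0%} of cores idle")
```

The High-Memory Extra Large instance has the fastest individual cores (3.25 ECU). However, per ECU actually used, c1.medium is more than twice as cheap ($0.068 vs. about $0.154), which is consistent with choosing High-CPU Medium.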

Figure: Performance improvement in average response time (0 to 0.4) vs. concurrent request number (20 to 160) for the Extra Large instance (8 EC2 Compute Units: 4 virtual cores with 2 ECU each), the High-Memory Extra Large instance (6.5 ECU: 2 virtual cores with 3.25 ECU each), and the High-CPU Extra Large instance (20 ECU: 8 virtual cores with 2.5 ECU each)
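A test of this kind can be reproduced with a small harness. The harness fires N simultaneous requests, averages their latency, and repeats the measurement for each concurrency level on the figures' x-axis. This is a sketch of the general method, not the authors' actual test tool. The request function and any endpoint it calls are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def average_response_time(send_request, concurrency: int) -> float:
    """Issue `concurrency` requests at once and return their mean latency in seconds.

    `send_request` is any zero-argument callable that performs one request,
    e.g. an HTTP GET of a GetCapabilities URL (hypothetical here).
    """
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    # One worker thread per simultaneous request.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(concurrency)))
    return mean(latencies)


def sweep(send_request, levels=(1, 20, 40, 60, 80, 100, 120, 140, 160)):
    """Measure the average response time at each concurrency level."""
    return {n: average_response_time(send_request, n) for n in levels}
```

For example, `sweep(lambda: urllib.request.urlopen(URL).read())` would exercise a service at a hypothetical `URL`. Running the same sweep against each instance type gives one curve per instance.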

CISC Cloud Computing Platform

Spatial Cloud Computing Architecture (layers, top to bottom)
• Users: cloud users; local users and administrators
• Spatial Cloud Portal
• Geospatial Middleware
• Virtual Infrastructure Middleware
• Public cloud (EC2, Elastic Hosts…) and local infrastructure

Geospatial Middleware
• Integrate spatial constraints and principles
  - Scheduling/resource allocation
  - Parallelization: parallelization methods/parallelization degree
• Geospatial capabilities
  - Kernel GIS functions as services
  - Standardized interfaces: OGC WPS (Web Processing Service)
  - Community tools & API
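Exposing kernel GIS functions through OGC WPS means a client can invoke any process with the same request pattern. The sketch below builds a WPS 1.0.0 Execute request in key-value-pair (HTTP GET) encoding. The endpoint, the process identifier `Buffer`, and its inputs are hypothetical, since the slide does not say which processes the CISC middleware exposes.

```python
from urllib.parse import urlencode


def wps_execute_url(endpoint: str, identifier: str, inputs: dict[str, str]) -> str:
    """Build an OGC WPS 1.0.0 Execute request using key-value-pair encoding.

    DataInputs is a semicolon-separated list of name=value pairs; urlencode
    percent-encodes it so it travels as a single query parameter.
    """
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    }
    return f"{endpoint}?{urlencode(params)}"
```

For example, `wps_execute_url("http://example.org/wps", "Buffer", {"distance": "10", "layer": "roads"})` yields a URL that any WPS 1.0.0 server exposing such a process could answer. Only the identifier and inputs change from one GIS function to the next.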

Spatial principles (Yang et al., 2010)
• Physical phenomena are continuous, while digital representations are discrete, for both space and time
  - Closer things are more related
  - Multiple scales
• Physical phenomena are heterogeneous in space and time
  - Higher resolution includes more information
  - Phenomena evolve at different speeds
  - The longer or bigger a dynamic process, the more exchanges are needed among neighbors
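The principle "closer things are more related" (Tobler's first law of geography) underlies common spatial techniques such as inverse distance weighting (IDW) interpolation. The sketch illustrates the principle only; it is not a method from Yang et al. (2010). The power parameter 2 is a conventional default, not a value from the slides.

```python
import math


def idw(points: list[tuple[float, float, float]], x: float, y: float,
        power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y) from (xi, yi, value) samples.

    Each sample gets weight 1 / d**power, so nearby samples influence the
    estimate more than distant ones.
    """
    num = den = 0.0
    for xi, yi, value in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value  # exact hit on a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den
```

An estimate taken next to one sample stays close to that sample's value, and a point midway between two samples gets their plain average.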

Application Example 1: Server site selection

Application Example 2: Parallelization

Figure: Execution time (mins, 0 to 350) on 24 processors for different decomposition methods (longitude × latitude): 1×24, 2×12, 3×8, 4×6, 6×4, 8×3, 12×2, and 24×1
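The chart compares ways of splitting the model domain over 24 processors. One reason the choice matters is the spatial principle that bigger dynamic processes need more exchanges among neighbors: each subdomain trades boundary data with adjacent subdomains, and elongated subdomains have longer boundaries. The sketch enumerates the eight decompositions and computes each subdomain's perimeter for a hypothetical square 240 × 240 grid. It does not model computational load or reproduce the measured times.

```python
def decompositions(processors: int) -> list[tuple[int, int]]:
    """All (longitude parts, latitude parts) splits that use every processor."""
    return [(p, processors // p) for p in range(1, processors + 1) if processors % p == 0]


def boundary_cells(nlon: int, nlat: int, plon: int, plat: int) -> float:
    """Perimeter (in grid cells) of one subdomain: a proxy for the halo data
    each processor must exchange with its neighbors at every time step."""
    return 2 * (nlon / plon + nlat / plat)


NLON, NLAT = 240, 240  # hypothetical model grid size
for plon, plat in decompositions(24):
    print(f"{plon}*{plat}: {boundary_cells(NLON, NLAT, plon, plat):.0f} boundary cells per subdomain")
```

On a square grid, the near-square 4×6 and 6×4 splits minimize the boundary (200 cells vs. 500 for 1×24). A real model domain, with its uneven grid and workload, may favor a different split.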

Application Example 3: Multilevel visualization

Cloud computing example: Deployment of a GeoNetwork instance through the CISC Cloud

Benefits
• Integrates a set of open-source components into a seamless, self-service platform
• Provides high-capacity computing, storage, and network connectivity
• Uses a virtualized, scalable approach to achieve cost and energy efficiency
• Creates new opportunities for national, international, state, and local partners to leverage research easily

Challenges
• Network bottlenecks
  - Data transfer
• Performance unpredictability
• Privacy
• Scalable storage
  - Amazon EBS
• Bugs in large distributed systems
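Why data transfer becomes a bottleneck is easiest to see with a back-of-the-envelope estimate: transfer time is data volume divided by effective bandwidth. The dataset size and link speed below are hypothetical, chosen only to show the order of magnitude.

```python
def transfer_time_seconds(size_bytes: float, bandwidth_bps: float,
                          efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link of bandwidth_bps bits per second.

    efficiency < 1.0 models protocol overhead and contention (an assumption).
    """
    return size_bytes * 8 / (bandwidth_bps * efficiency)


# Hypothetical example: 1 TB of Earth science data over a 20 Mbit/s link.
seconds = transfer_time_seconds(1e12, 20e6)
print(f"{seconds:.0f} s = {seconds / 86400:.1f} days")
```

At that rate, even at full link efficiency, a 1 TB dataset takes 400,000 seconds (about 4.6 days) to upload before any computation in the cloud can start.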

Future
• IaaS will become increasingly standardized and commoditized
  - Cross-cloud implementations (e.g., AWS- and vCloud-based)
  - Cross-cloud tools and middleware will become available to enable interoperability and portability across different clouds
• IaaS providers will increasingly add new utilities and PaaS capabilities
• PaaS will become the battleground for determining the future of cloud computing
• PaaS will integrate with applications that use mobile devices and sensors

Conclusion
• Cloud computing is not just a trend
• We are at a prescient time
• Technologies
  - Cloud architecture
  - Open data standards
  - Platform-independent languages
• Spatial cloud computing
  - Parallelizing and scheduling
  - Geospatial middleware

References
• Amazon Elastic Compute Cloud (Amazon EC2). http://www.amazon.com/ec2 (accessed June 7, 2010).
• Armbrust, M., Fox, A., Griffith, R., et al., 2009. Above the Clouds: A Berkeley View of Cloud Computing. University of California, Berkeley, Berkeley, CA. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html (accessed March 12, 2010).
• Cisco, 2009. http://www.cisco.com/en/US/solutions/collateral/ns340/ns858/Virtualization_Blueprint.pdf
• Google App Engine. http://appengine.google.com
• Microsoft Azure. http://www.microsoft.com/azure/
• Nimbus. The Nimbus Cloud. http://www.nimbusproject.org/
• OpenNebula, 2010. http://Opennebula.org
• Wang, S., and Liu, Y., 2009. TeraGrid GIScience Gateway: Bridging Cyberinfrastructure and GIScience. International Journal of Geographical Information Science, 23(5): 631-656.
• Wiki, 2009. Wiki Cloud Computing. http://en.wikipedia.org/wiki/Cloud_computing (accessed April 29, 2009).
• Xie, J., Yang, C., Zhou, B., and Huang, Q., 2009. High performance computing for the simulation of dust storms. Computers, Environment and Urban Systems (in press).
• Yang, C., Wu, H., Huang, Q., Li, Z., and Li, J., 2010. Spatial computing for supporting physical sciences. Proceedings of the National Academy of Sciences (in press).

Thank You!
Chaowei Yang [email protected]

Pointers
• Portal: http://aws.amazon.com
• Blog: http://aws.typepad.com
• CISC: http://cisc.gmu.edu
• EC2: http://aws.amazon.com/ec2
• S3: http://aws.amazon.com/s3
• Resource Center: http://aws.amazon.com/resources
• Forums: http://aws.amazon.com/forums